As a Python learner, I've faced several challenges, and one of the hardest topics to understand so far has been concurrency. It can be incredibly confusing at first. This blog post aims to simplify concurrency with a few examples and an analogy. So, let's get started.
Why does Concurrency Matter?
When writing Python programs, you might find yourself needing to work on multiple tasks at once. This is where concurrency comes in. Concurrency lets your program make progress on several tasks during overlapping periods of time, rather than finishing each one before starting the next. This can significantly reduce total running time, particularly when tasks spend much of their time waiting.
The Magic of Python concurrent.futures
Python's concurrent.futures module simplifies concurrent programming by providing a high-level interface for asynchronously executing callables (functions and methods). ThreadPoolExecutor and ProcessPoolExecutor are the two executor classes in this module. They let you run tasks concurrently using threads or processes, respectively.
When deciding between ThreadPoolExecutor and ProcessPoolExecutor, consider the following analogy - ThreadPoolExecutor is like having multiple chefs in a shared kitchen, while ProcessPoolExecutor is like having multiple chefs, each with their own kitchen.
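ThreadPoolExecutor is ideal for I/O-bound tasks, which spend most of their time waiting for external resources, such as reading files or downloading data. While one thread waits, another can run, so sharing a single process is acceptable and efficient. On the other hand, ProcessPoolExecutor is better suited for CPU-bound tasks, where heavy computations are performed. In standard CPython, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so threads do not speed up pure-Python computation, whereas separate processes can run on separate CPU cores.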
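To make the shared-kitchen half of the analogy concrete, here is a minimal sketch. The names cook, cook_all, prepared_dishes, and kitchen_lock are made up for illustration. It shows worker threads all writing into one list that lives in the same process.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Shared "kitchen": one list that every worker thread can see and modify.
prepared_dishes = []
kitchen_lock = threading.Lock()  # guards access to the shared list


def cook(dish):
    """Hypothetical task: record a dish in the shared list and return it."""
    with kitchen_lock:
        prepared_dishes.append(dish)
    return dish


def cook_all(dishes):
    """Run cook() for every dish on a thread pool and wait for all of them."""
    with ThreadPoolExecutor(max_workers=3) as executor:
        # list() collects every result before the pool shuts down.
        return list(executor.map(cook, dishes))


if __name__ == "__main__":
    cook_all(["soup", "salad", "pasta"])
    # Every thread appended to the same list, in whatever order it ran.
    print(sorted(prepared_dishes))
```

If the same function ran under ProcessPoolExecutor, each worker process would append to its own copy of prepared_dishes, and the list in the main program would stay empty. That is the "own kitchen" side of the analogy: nothing is shared unless you pass it back as a return value.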
Examples
In our examples, we will be using ThreadPoolExecutor, as our tasks primarily involve waiting for external resources or events rather than heavy computations.
The first example doesn't use concurrency, demonstrating how tasks are executed sequentially. The second example employs executor.map(), which executes tasks concurrently and returns results in the order they were submitted.
The third example makes use of executor.submit() along with concurrent.futures.as_completed(), which also executes tasks concurrently but allows you to process results as they become available, regardless of the order of submission. By analyzing these examples, you will gain a better understanding of how concurrency works in Python.
Let's use an analogy of a post office with multiple mailing clerks to better understand how concurrent.futures works. Imagine you have a stack of letters you want to mail. Each letter needs to be processed by a mailing clerk, who stamps and sends it. However, the time it takes a clerk to process a letter may vary.
1. Without Concurrency
This script does not use any concurrency. Instead, it iterates through the letters list using a for loop and calls the mail_letter() function sequentially for each letter.
By comparing this script to the upcoming examples with concurrent.futures, you can appreciate the efficiency and time-saving benefits of using concurrency in your Python projects.
    import time
    import random

    def mail_letter(letter):
        duration = random.randint(1, 5)
        print(f"Started mailing letter {letter} (duration: {duration}s)")
        time.sleep(duration)
        print(f"Finished mailing letter {letter}")
        return f"Letter {letter} mailed"

    if __name__ == '__main__':
        letters = ['A', 'B', 'C', 'D', 'E']
        results = []
        for letter in letters:
            result = mail_letter(letter)
            results.append(result)
        print("Mailing Results:")
        for result in results:
            print(result)
Here's a line-by-line explanation of the code.
- Import required modules
- def mail_letter(letter): defines a function called mail_letter that takes a single argument, letter.
- duration = random.randint(1, 5): inside the function, generates a random integer between 1 and 5 (inclusive) and assigns it to the variable duration.
- time.sleep(duration): pauses the execution of the function for the number of seconds specified by duration. This simulates the time it takes to mail the letter.
- return f"Letter {letter} mailed": returns a string indicating that the letter has been mailed.
- if __name__ == '__main__': checks if the script is being run as the main program (not being imported as a module).
- letters = ['A', 'B', 'C', 'D', 'E']: creates a list of letters that need to be mailed.
- results = []: initializes an empty list called results to store the results of mailing each letter.
- result = mail_letter(letter): calls the mail_letter() function with the current letter and assigns the returned result to the variable result.
- for result in results: iterates through each result in the results list.
- print(result): prints the current result.
As you can see below, without concurrency, the mailing process takes longer as each letter is mailed one at a time, and the program must wait for each letter to finish mailing before starting the next one. It took around 18 seconds to complete.
#output
Started mailing letter A (duration: 2s)
Finished mailing letter A
Started mailing letter B (duration: 3s)
Finished mailing letter B
Started mailing letter C (duration: 4s)
Finished mailing letter C
Started mailing letter D (duration: 4s)
Finished mailing letter D
Started mailing letter E (duration: 5s)
Finished mailing letter E
Mailing Results:
Letter A mailed
Letter B mailed
Letter C mailed
Letter D mailed
Letter E mailed
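The 18-second figure is simply the sum of the five durations in that run (2 + 3 + 4 + 4 + 5). The script above does not measure time itself, so here is a minimal sketch of how one might do it with time.perf_counter(). The fixed durations are an assumption: they are the values from the run above divided by 100, so the sketch finishes quickly. The mail_sequentially helper is made up for illustration.

```python
import time


def mail_letter(letter, duration):
    """Simulate mailing by sleeping for a fixed duration (in seconds)."""
    time.sleep(duration)
    return f"Letter {letter} mailed"


def mail_sequentially(durations):
    """Mail letters one by one; return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = [mail_letter(letter, d) for letter, d in durations.items()]
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    # Assumed durations: the run above, scaled down by a factor of 100.
    durations = {"A": 0.02, "B": 0.03, "C": 0.04, "D": 0.04, "E": 0.05}
    results, elapsed = mail_sequentially(durations)
    total = sum(durations.values())
    print(f"Elapsed: {elapsed:.2f}s (sum of durations: {total:.2f}s)")
```

Because each call has to finish before the next one starts, the elapsed time should come out at roughly the sum of the durations.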
2. With Concurrency (executor.map)
In this example, we use executor.map() to apply the mail_letter() function to each letter in the letters list concurrently. The results are returned in the order that the tasks were submitted, and we print the mailing results in the same order.
    import concurrent.futures
    import time
    import random

    def mail_letter(letter):
        duration = random.randint(1, 5)
        print(f"Started mailing letter {letter} (duration: {duration}s)")
        time.sleep(duration)
        print(f"Finished mailing letter {letter}")
        return f"Letter {letter} mailed"

    if __name__ == '__main__':
        letters = ['A', 'B', 'C', 'D', 'E']
        with concurrent.futures.ThreadPoolExecutor() as executor:
            results = list(executor.map(mail_letter, letters))
        print("Mailing Results:")
        for result in results:
            print(result)
Here's a line-by-line explanation of the code, excluding parts already explained in the previous example, and focusing on the differences related to concurrent.futures:
- import concurrent.futures: imports the concurrent.futures module, which provides a high-level interface for asynchronously executing callables.
- if __name__ == '__main__': (same as before) checks if the script is being run as the main program (not being imported as a module).
- letters = ['A', 'B', 'C', 'D', 'E']: (same as before) creates a list of letters that need to be mailed.
- with concurrent.futures.ThreadPoolExecutor() as executor: creates a ThreadPoolExecutor instance as a context manager, which manages the life cycle of a pool of worker threads that will be used to execute tasks concurrently.
- results = list(executor.map(mail_letter, letters)): uses the executor.map() method to apply the mail_letter() function to each item in the letters list concurrently. It returns an iterator that yields the results in the same order as the input. The iterator is then converted to a list and assigned to the variable results.
- print("Mailing Results:"): (same as before) prints a message to indicate that the mailing results will be displayed.
- for result in results: (same as before) iterates through each result in the results list.
- print(result): (same as before) prints the current result.
This script demonstrates how to mail a list of letters concurrently using ThreadPoolExecutor and the executor.map() method, which allows the tasks to finish faster than in the previous example.
Started mailing letter A (duration: 3s)
Started mailing letter B (duration: 2s)
Started mailing letter C (duration: 1s)
Started mailing letter D (duration: 4s)
Started mailing letter E (duration: 1s)
Finished mailing letter C
Finished mailing letter E
Finished mailing letter B
Finished mailing letter A
Finished mailing letter D
Mailing Results:
Letter A mailed
Letter B mailed
Letter C mailed
Letter D mailed
Letter E mailed
The output above demonstrates how the concurrent execution took place.
- Each letter's mailing process started almost simultaneously because multiple threads were running the mail_letter() function concurrently. You can see that the "Started mailing letter" messages were printed in order (A, B, C, D, E), but with different random durations.
- The mailing process for each letter finished at a different time, depending on the randomly assigned duration. This is evident from the "Finished mailing letter" messages, which were not printed in the original order (A, B, C, D, E). Instead, the letters with shorter durations finished earlier.
- The Mailing Results section displays the results of the mailing process, which are returned in the same order as the input letters (A, B, C, D, E). The executor.map() method ensures that the results are ordered according to the input sequence, even though the tasks finish at different times. This is why you see the results as "Letter A mailed", "Letter B mailed", and so on, in the original order.
In summary, the output demonstrates how ThreadPoolExecutor and the executor.map() method enabled the mail_letter() function to run concurrently for each letter, starting the tasks almost simultaneously and finishing them depending on their randomly assigned durations. The final results are displayed in the same order as the input letters, thanks to the executor.map() method.
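To put a number on the speed-up, the sketch below times executor.map() with fixed durations. It also sets max_workers explicitly, which the example above does not do (it relies on the library's default pool size), so you can see how the number of clerks changes the total time. The mail_concurrently helper and the (letter, duration) pairs are assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def mail_letter(item):
    """Simulate mailing; item is a (letter, duration_in_seconds) pair."""
    letter, duration = item
    time.sleep(duration)
    return f"Letter {letter} mailed"


def mail_concurrently(items, max_workers):
    """Mail all items on a thread pool; return (results, elapsed_seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in input order, not completion order.
        results = list(executor.map(mail_letter, items))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # Assumed durations, scaled down so the sketch runs in a few seconds.
    items = [("A", 0.3), ("B", 0.2), ("C", 0.1), ("D", 0.4), ("E", 0.1)]
    for workers in (5, 2, 1):
        results, elapsed = mail_concurrently(items, max_workers=workers)
        print(f"{workers} worker(s): {elapsed:.2f}s, first result: {results[0]}")
```

With five workers, the elapsed time should be close to the longest single duration. With one worker, it should approach the sum of all durations, just like the sequential example.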
3. With Concurrency (as_completed)
In this final example, we use executor.submit() instead of executor.map() to submit the mail_letter() function to the executor for each letter in the letters list. We store the returned Future objects in a dictionary called futures and use concurrent.futures.as_completed() to process the results as they become available, regardless of the order in which they were submitted.
    import concurrent.futures
    import time
    import random

    def mail_letter(letter):
        duration = random.randint(1, 5)
        print(f"Started mailing letter {letter} (duration: {duration}s)")
        time.sleep(duration)
        print(f"Finished mailing letter {letter}")
        return f"Letter {letter} mailed"

    if __name__ == '__main__':
        letters = ['A', 'B', 'C', 'D', 'E']
        with concurrent.futures.ThreadPoolExecutor() as executor:
            futures = {executor.submit(mail_letter, letter): letter for letter in letters}
            for future in concurrent.futures.as_completed(futures):
                letter = futures[future]
                result = future.result()
                print(f"Result: {result}")
- Import required modules (same as before)
- def mail_letter(letter): (same as before) defines a function called mail_letter that takes a single argument, letter. The function generates a random duration, prints a message indicating the mailing process has started, waits for the duration, prints a message indicating the mailing process has finished, and returns a string indicating that the letter has been mailed.
- with concurrent.futures.ThreadPoolExecutor() as executor: (same as before) creates a ThreadPoolExecutor instance as a context manager, which manages the life cycle of a pool of worker threads that will be used to execute tasks concurrently.
- futures = {executor.submit(mail_letter, letter): letter for letter in letters}: uses a dictionary comprehension to submit a mail_letter task for every letter in the letters list to the ThreadPoolExecutor. The executor.submit() method returns a concurrent.futures.Future object representing the result of a computation that may not have completed yet. The dictionary maps these Future objects to their corresponding letters.
- for future in concurrent.futures.as_completed(futures): iterates over the Future objects in the futures dictionary as they complete (regardless of the order they were submitted). This allows processing the results as soon as they become available.
- letter = futures[future]: retrieves the letter associated with the current Future object from the futures dictionary.
- result = future.result(): retrieves the result of the current Future object. Because as_completed() only yields finished futures, this call returns immediately here.
- print(f"Result: {result}"): prints the result for the current letter.
Started mailing letter A (duration: 2s)
Started mailing letter B (duration: 5s)
Started mailing letter C (duration: 2s)
Started mailing letter D (duration: 1s)
Started mailing letter E (duration: 2s)
Finished mailing letter D
Result: Letter D mailed
Finished mailing letter A
Result: Letter A mailed
Finished mailing letter C
Result: Letter C mailed
Finished mailing letter E
Result: Letter E mailed
Finished mailing letter B
Result: Letter B mailed
- Similar to the second example, the mailing process for each letter started almost simultaneously because multiple threads were running the mail_letter() function concurrently. The "Started mailing letter" messages were printed in order (A, B, C, D, E), with different random durations assigned.
- As before, the mailing process for each letter finished at a different time, depending on its randomly assigned duration. The "Finished mailing letter" messages indicate the completion order, which is not necessarily the same as the original order of the letters.
- Unlike the second example, in this case, we display the results immediately after each letter's mailing process is finished.
- The concurrent.futures.as_completed() function allows us to iterate through the Future objects as they complete, regardless of their submission order. This is why you see the "Result:" lines interspersed between the "Finished mailing letter" messages, reflecting the order in which the tasks finished.
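The example above looks up letter = futures[future] but never really needs it. One situation where that mapping earns its keep is error handling: future.result() re-raises any exception that occurred inside the task, and the dictionary tells you which letter it belonged to. The sketch below assumes a made-up rule that letter C has no stamp and therefore fails. The fixed durations and the mail_and_report helper are also assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def mail_letter(letter, duration):
    """Simulate mailing; letter "C" fails (a made-up rule for this sketch)."""
    time.sleep(duration)
    if letter == "C":
        raise ValueError(f"Letter {letter} has no stamp")
    return f"Letter {letter} mailed"


def mail_and_report(durations):
    """Return (mailed, failed): results in completion order, errors by letter."""
    mailed, failed = [], {}
    with ThreadPoolExecutor(max_workers=len(durations)) as executor:
        futures = {
            executor.submit(mail_letter, letter, duration): letter
            for letter, duration in durations.items()
        }
        for future in as_completed(futures):
            letter = futures[future]  # which letter does this future belong to?
            try:
                mailed.append(future.result())
            except ValueError as exc:
                # result() re-raises the exception from the worker thread.
                failed[letter] = str(exc)
    return mailed, failed


if __name__ == "__main__":
    mailed, failed = mail_and_report({"A": 0.3, "B": 0.1, "C": 0.2})
    print("Mailed:", mailed)
    print("Failed:", failed)
```

One failed letter does not stop the others: the remaining futures still complete and are reported in the order they finish.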
Conclusion
In conclusion, we've explored the concurrent.futures module in Python and how it can help you execute tasks concurrently. We covered three examples, highlighting the differences between sequential execution and two concurrent approaches. We hope these simple examples and explanations have made concurrency easier for you to understand. Happy coding!