This article provides a simple introduction to multiprocessing in Python, with code examples.
The code and examples in this article are for Python 3. Python 2 is no longer supported. If you are still using it, you should migrate your projects to the latest version of Python to ensure security and compatibility.
What is Multiprocessing?
Usually, Python runs your code, line-by-line, in a single process until it has completed. Multiprocessing allows you to run multiple sets of instructions simultaneously in separate processes, enabling your application to do more than one thing at a time.
Why Use Multiprocessing?
Simple, single-process Python apps are fine for simple tasks like operating on text files or automating basic jobs. As you start to build more complex applications, you will want to use the full power of your computer by assigning work to other CPU cores, both to increase performance and to perform multiple tasks at the same time.
By default, a single Python process cannot make use of all available processing cores: in the standard CPython interpreter, the Global Interpreter Lock (GIL) lets only one thread execute Python code at a time. With multiprocessing, each spawned process gets its own instance of the Python interpreter, so that more of the system's processing capacity can be used.
For example, you might have an application which processes large text files. If you had 100 text files that took 1 minute each to process, a single-process Python application would take 100 minutes to complete the task. Modern computers have multiple CPU cores which can all execute tasks concurrently. By using multiprocessing on a 4 core CPU, the tasks could be completed in roughly 25 minutes instead of 100, assuming the files can be processed independently and ignoring the small overhead of starting the extra processes.
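The arithmetic behind that estimate can be sketched as follows. This is a best-case calculation only: the helper name estimated_minutes is made up for illustration, and it assumes that every task takes the same time, that tasks are independent, and that starting processes costs nothing.

```python
import math


def estimated_minutes(num_tasks, minutes_per_task, num_workers):
    """Best-case wall-clock time when tasks are spread evenly over workers.

    Hypothetical helper for illustration; real workloads will be slower.
    """
    # Each worker handles its tasks one after another, so the total time is
    # the number of "rounds" needed multiplied by the time for one task.
    rounds = math.ceil(num_tasks / num_workers)
    return rounds * minutes_per_task


if __name__ == '__main__':
    print(estimated_minutes(100, 1, 1))  # single process: 100
    print(estimated_minutes(100, 1, 4))  # four worker processes: 25
```

Note that the result rounds up: 10 tasks on 4 workers would need 3 rounds, because the work cannot be divided perfectly evenly.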
Multiprocessing is different to threading, which is covered later in this article.
Using the multiprocessing Library
Multiprocessing functionality in Python is provided by the multiprocessing library which is included with all Python 3 installations.
Here is a simple example of multiprocessing in action:
    # Import the multiprocessing library.
    import multiprocessing
    # Import the time module, which is used to demonstrate that the processes run simultaneously.
    import time

    # Define a function which will contain the code to be run in parallel.
    def myProcess():
        # Wait 1 second
        time.sleep(1)
        # Print a message
        print('I am processing!')

    # Process-launching code should only run in the main module of a Python application.
    # Child processes may import this module when they start, so the following check
    # MUST be present to launch processes reliably.
    if __name__ == '__main__':
        # Create a loop that will repeat 3 times, launching 3 processes
        for i in range(3):
            # Create the process object for the separate process
            proc = multiprocessing.Process(target=myProcess)
            # Launch the process
            proc.start()
The comments in the code above should explain what is happening. In summary:
- The required libraries are imported.
- A function is defined which waits 1 second, then prints a message.
- We check that the script is the main module for this Python application, and has not been imported, using if __name__ == '__main__'
- 3 child processes are created using the multiprocessing.Process class, which takes myProcess as the target function to run in each new process, and are launched with start()
- If the main process quits, it will not necessarily end the child processes it has spawned – see the Daemons section below for how to create processes that will terminate with the main process
You'll notice that time.sleep is used to wait a second before printing a message in the child processes. This demonstrates that the processes really do run concurrently. When run, this Python code prints 'I am processing!' three times, each print call being made from a separate process. The proof is in the delay: all three messages appear together after a wait of about 1 second, rather than printing one at a time with a delay between each.
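As a rough illustration of each child being a separate instance of Python, the sketch below prints the operating system process ID from the main process and from three children. The function names describe_worker and report are hypothetical, and the args parameter (a tuple of arguments passed to the target function) is an addition not used elsewhere in this article.

```python
import multiprocessing
import os


def describe_worker(label):
    # Build a message that includes the ID of the process running this code
    return f"{label} is running in process {os.getpid()}"


def report(label):
    # Target function for each child process
    print(describe_worker(label))


if __name__ == '__main__':
    print(describe_worker('main'))
    for i in range(3):
        # args must be a tuple, hence the trailing comma
        proc = multiprocessing.Process(target=report, args=(f'child-{i}',))
        proc.start()
```

Each line of output should show a different process ID, although the actual numbers and the order of the child messages will vary from run to run.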
Daemons
A daemon is a process that runs in the background, and is not under the direct control of the user. In the context of Python multiprocessing, daemons are treated as background processes which will be automatically terminated before the main process exits. Daemon processes cannot spawn their own processes.
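Because daemon processes are terminated when the main process exits, any output they have not yet produced at that point is lost. If the main process finishes first, a daemon's messages may never appear.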
To run a process as a daemon, set the daemon flag to True:
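    import multiprocessing

    def myProcess():
        print('I am processing!')

    if __name__ == '__main__':
        for i in range(3):
            # daemon=True marks the process as a background (daemon) process
            proc = multiprocessing.Process(target=myProcess, daemon=True)
            proc.start()
        # The main process exits here, so the daemons may be terminated
        # before they have had a chance to print anything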
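To make the termination behaviour easier to observe, here is a minimal sketch in which the daemon is given a task that outlasts the main process. The names background_task and make_process are hypothetical, and the sleep durations are arbitrary values chosen for the demonstration.

```python
import multiprocessing
import time


def background_task():
    # Simulate a long-running background job
    time.sleep(5)
    # Not expected to appear: the main process exits after about 1 second
    print('Background task finished')


def make_process(as_daemon):
    # Create (but do not start) a process with the requested daemon flag
    return multiprocessing.Process(target=background_task, daemon=as_daemon)


if __name__ == '__main__':
    proc = make_process(as_daemon=True)
    proc.start()
    time.sleep(1)
    print('Main process exiting; daemon flag is', proc.daemon)
```

Changing as_daemon to False should make the script run for the full 5 seconds and print the background message, because the main process waits for non-daemon children before it exits.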
Waiting for Processes to Finish
The join() method is used to wait for a multiprocessing process to finish before continuing.
join() blocks the calling code until the process has completed, whether it's a daemon process or not.
    import multiprocessing
    import time

    def slowProcess():
        time.sleep(5)
        print('I am a slow process!')

    if __name__ == '__main__':
        proc = multiprocessing.Process(target=slowProcess)
        proc.start()
        # Wait here until slowProcess has finished
        proc.join()
        print('This text was delayed')
To see this working, comment out proc.join(): the last line of text will then print immediately, because the script no longer waits for the spawned process to complete.
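join() also accepts an optional timeout in seconds, which can be combined with is_alive() and the exitcode attribute to check on a process without waiting indefinitely. The following is a small sketch of that idea; the function name slow_process and the durations are assumptions made for the example.

```python
import multiprocessing
import time


def slow_process(seconds):
    # Simulate work by sleeping for the given number of seconds
    time.sleep(seconds)


if __name__ == '__main__':
    proc = multiprocessing.Process(target=slow_process, args=(2,))
    proc.start()

    # Wait for at most half a second, then carry on regardless
    proc.join(timeout=0.5)
    print('Still running after 0.5 seconds:', proc.is_alive())  # True

    # Now wait for as long as it takes
    proc.join()
    print('Exit code:', proc.exitcode)  # 0 for a normal exit
```

A timed join() does not stop the process when the timeout expires; it simply returns control to the caller.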
Processes can be terminated before they complete by using the terminate() method. This is useful when a process is thought to have frozen, or the task it is performing is no longer required.
    import multiprocessing
    import time

    def slowProcess():
        time.sleep(5)
        print('I am a slow process!')

    if __name__ == '__main__':
        proc = multiprocessing.Process(target=slowProcess)
        proc.start()
        # Stop the process before it has finished
        proc.terminate()
        # Wait for the terminated process to be cleaned up
        proc.join()
It is good practice to join() the process after terminating it to ensure that Python has an up-to-date status for the terminated process.
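One way to see why the join() matters is to inspect the process before and after it. The sketch below uses a hypothetical status helper built on is_alive() and exitcode. On Unix-like systems a terminated process typically reports a negative exit code (the negative of the signal number); the value may differ on other platforms, so treat the printed numbers as indicative only.

```python
import multiprocessing
import time


def slow_process():
    time.sleep(5)


def status(proc):
    # Summarise the state of a process as a short string
    if proc.is_alive():
        return 'running'
    return f'stopped (exit code {proc.exitcode})'


if __name__ == '__main__':
    proc = multiprocessing.Process(target=slow_process)
    proc.start()
    proc.terminate()
    # Immediately after terminate() the process may or may not have stopped yet
    print('Before join:', status(proc))
    proc.join()
    # After join() the process has definitely stopped
    print('After join:', status(proc))
```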
What is Threading?
Threading has similar syntax and terminology to multiprocessing, but the purpose is different. While multiprocessing is focussed on performance, assigning tasks to run in parallel on additional processing cores, threading is just about letting more than one thing happen at once within the limitations of a single Python process.
In standard CPython, threading does not allow for parallel computation on multiple CPU cores. Threads take turns running within one process, so splitting CPU-heavy work across threads adds switching overhead without making it faster. Threading is meant for waiting on network requests (like retrieving data from the internet), or monitoring user input while other tasks are performed, not for making CPU-bound code faster.
Using the threading Library
Threading is handled through the Python 3 threading library, which is included in all Python 3 installations.
The syntax is similar to that used by the multiprocessing library:
    import threading

    def myThread():
        print("I am a thread!")

    if __name__ == '__main__':
        for i in range(3):
            thread = threading.Thread(target=myThread)
            thread.start()
Above, a function called myThread is defined and used as the target of three threads. The three calls run concurrently, but within the same Python process.
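A quick way to confirm that threads live inside one process is to record the process ID seen by each thread. In this sketch the names record_identity and run_threads are hypothetical. It also uses join(), covered further down, so that the results are complete before they are read.

```python
import os
import threading


def record_identity(results):
    # Store this thread's name along with the ID of the process it runs in.
    # Appending to a list from several threads is safe in CPython.
    results.append((threading.current_thread().name, os.getpid()))


def run_threads(count):
    results = []
    threads = [
        threading.Thread(target=record_identity, args=(results,))
        for _ in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == '__main__':
    print('Main process ID:', os.getpid())
    for name, pid in run_threads(3):
        print(name, 'ran in process', pid)
```

Every thread should report the same process ID as the main program, in contrast to child processes, which each get their own.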
As threads are all run under the same Python process, they have access to the same environment and variables. This can cause problems if two threads try to modify the same variable at the same time. That’s where the lock comes in.
A lock does not attach to the variable itself. Instead, each thread acquires a shared Lock object before modifying the variable and releases it afterwards. While one thread holds the lock, other threads that try to acquire it must wait their turn.
To do this, use the acquire() and release() methods of the Lock object:
    import threading

    # Create a global variable which multiple threads will manipulate
    myVariable = 0

    # Create a global Lock object which these threads will use to coordinate access
    lock = threading.Lock()

    def myThread():
        global myVariable
        # Acquire the lock; other threads block here until it is released
        lock.acquire()
        # Alter the variable and copy the new value while holding the lock
        myVariable += 1
        count = myVariable
        # Release the lock
        lock.release()
        print("I am a thread! The count is " + str(count))

    if __name__ == '__main__':
        for i in range(3):
            thread = threading.Thread(target=myThread)
            thread.start()
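Lock objects can also be used with Python's with statement, which acquires the lock on entry and releases it on exit, even if an exception is raised in between. The sketch below shows that form under a heavier load; the names counter, add_many and run are hypothetical, and the thread and iteration counts are arbitrary.

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(times):
    global counter
    for _ in range(times):
        # Equivalent to lock.acquire() ... lock.release(), but the release
        # happens automatically when the block ends
        with lock:
            counter += 1


def run(num_threads, times):
    global counter
    counter = 0
    threads = [
        threading.Thread(target=add_many, args=(times,))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


if __name__ == '__main__':
    print(run(4, 10000))  # 40000
```

Because every increment happens while the lock is held, no updates are lost and the final count equals the number of threads multiplied by the number of increments.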
Waiting for Threads to Finish
Use the join() method to wait for a thread to complete before continuing:
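    import threading

    def myThread():
        print("I am a thread!")

    if __name__ == '__main__':
        threads = []
        for i in range(3):
            thread = threading.Thread(target=myThread)
            thread.start()
            threads.append(thread)
        # Wait for every thread to finish before continuing.
        # Joining inside the loop above would run the threads one at a time.
        for thread in threads:
            thread.join()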
Should I use Multiprocessing or Threading?
Whether to use multiprocessing or threading will depend on what you are trying to achieve.
If you are working on a large set of data and want to process it faster, splitting the job up and distributing it to multiple processes using multiprocessing will usually result in faster execution. Attempting to do the same with threading will not give the same performance improvement for CPU-bound work, because the threads share a single Python process.
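One common way to split a job across processes is multiprocessing.Pool, which is part of the same library but is not otherwise covered in this article. The sketch below is a minimal example under assumed inputs: count_words and the documents list are hypothetical stand-ins for a real processing function and real data.

```python
import multiprocessing


def count_words(text):
    # Stand-in for an expensive, CPU-bound processing step
    return len(text.split())


if __name__ == '__main__':
    documents = ['one two three', 'four five', 'six']
    # Create a pool of 4 worker processes and share the items among them
    with multiprocessing.Pool(processes=4) as pool:
        counts = pool.map(count_words, documents)
    # map() returns results in the same order as the inputs
    print(counts)  # [3, 2, 1]
```

For a job this small the cost of starting the workers outweighs any gain; the approach pays off when each item takes a meaningful amount of CPU time.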
If you are simply waiting for input, or data from a network, threading allows you to continue executing other code while you wait for the result, without having to spawn new processes.
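The waiting case can be sketched with time.sleep standing in for a slow network request. The names fake_download and download_all are hypothetical, and the half-second delay is an arbitrary value; a real program would make an actual request here.

```python
import threading
import time


def fake_download(name, results):
    # Pretend to wait for a network response
    time.sleep(0.5)
    results[name] = 'done'


def download_all(names):
    results = {}
    threads = [
        threading.Thread(target=fake_download, args=(name, results))
        for name in names
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == '__main__':
    results, elapsed = download_all(['a', 'b', 'c'])
    print(results)
    print(f'Took about {elapsed:.1f} seconds')
```

The three waits overlap, so the total should be close to half a second rather than the one and a half seconds that three downloads in sequence would take.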