Python Multithreading vs Multiprocessing

Posted by ChenRiang on April 18, 2021

My first impression of Python multithreading and multiprocessing was that they work in pretty much the same way. However, this is wrong, and in this article we will look at the differences between the two.

Multithreading vs Multiprocessing

Let’s look at the code snippet below to understand more.

import multiprocessing
import threading
import time


def hello_func(name):
    # Simulate a blocking wait, then report which worker ran
    time.sleep(2)
    print("greeting from - ", name)


# The guard is needed on platforms that start child processes with "spawn"
# (Windows, macOS), because each child re-imports this module
if __name__ == "__main__":
    t1 = threading.Thread(target=hello_func, args=("t1",))
    t2 = threading.Thread(target=hello_func, args=("t2",))

    # start threads
    t1.start()
    t2.start()

    # wait for threads to complete
    t1.join()
    t2.join()

    p1 = multiprocessing.Process(target=hello_func, args=("p1",))
    p2 = multiprocessing.Process(target=hello_func, args=("p2",))

    # start processes
    p1.start()
    p2.start()

    # wait for processes to complete
    p1.join()
    p2.join()

In the code snippet above, we spawn two threads and two processes, each of which executes the function hello_func.

Output:

greeting from -  t1
greeting from -  t2
greeting from -  p1
greeting from -  p2

From the result above, multithreading seems to do exactly the same thing as multiprocessing.
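Part of the reason the two look alike is that hello_func only sleeps, so the workers never compete for the CPU. As a rough sketch (the helper timed_greetings and the shortened 0.2 second delay are my own assumptions, not part of the snippet above), two sleeping threads should finish in roughly one delay rather than two:

```python
import threading
import time


def timed_greetings(names, delay):
    """Hypothetical helper: run one sleeping thread per name.

    Returns the sorted names that finished and the elapsed seconds.
    """
    seen = []

    def hello_func(name):
        time.sleep(delay)   # blocking wait, stands in for I/O
        seen.append(name)   # appending from several threads is safe in CPython

    start = time.perf_counter()
    threads = [threading.Thread(target=hello_func, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(seen), time.perf_counter() - start


if __name__ == "__main__":
    seen, elapsed = timed_greetings(["t1", "t2"], delay=0.2)
    print(seen, "finished in about", round(elapsed, 2), "seconds")
```

The exact timing depends on the machine, so treat the printed number as an indication only.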

To understand the differences, let’s look at the code snippet below:

import multiprocessing
import threading
import time


def heavy_calculation(n, name):
    # CPU-bound work: sum the integers below n in pure Python
    count = 0
    for i in range(n):
        count += i
    print(name, " done calculation")


# Guard needed where child processes re-import this module ("spawn" start method)
if __name__ == "__main__":
    # perf_counter is a monotonic clock intended for measuring elapsed time
    t_start = time.perf_counter()
    t1 = threading.Thread(target=heavy_calculation, args=(10000000, "t1"))
    t2 = threading.Thread(target=heavy_calculation, args=(10000000, "t2"))

    t1.start()
    t2.start()

    t1.join()
    t2.join()
    print("computation time for multithreading : ", (time.perf_counter() - t_start))

    p_start = time.perf_counter()
    p1 = multiprocessing.Process(target=heavy_calculation, args=(10000000, "p1"))
    p2 = multiprocessing.Process(target=heavy_calculation, args=(10000000, "p2"))

    p1.start()
    p2.start()

    p1.join()
    p2.join()

    print("computation time for multiprocessing : ", (time.perf_counter() - p_start))

In the code snippet above, we spawn two threads and two processes to execute the function heavy_calculation, which runs a CPU-intensive loop.

Output:

t2  done calculation
t1  done calculation
computation time for multithreading :  0.8291773796081543
p2  done calculation
p1  done calculation
computation time for multiprocessing :  0.4506714344024658

In this run, multithreading took about 0.83 seconds to run the function, while multiprocessing took only about 0.45 seconds.

But, why?

Multiprocessing achieves true parallelism, but multithreading does not, because Python’s global interpreter lock (GIL) ensures that only one thread executes Python code at a time within a process.
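To make "within a process" concrete, here is a small sketch that prints the operating-system process ID seen by each worker. The helpers show_pid and pids_seen_by_threads are hypothetical names of my own. Threads should all report the main process ID (one interpreter, one GIL), while each child process reports its own ID, since every process runs its own interpreter with its own GIL:

```python
import multiprocessing
import os
import threading


def show_pid(label):
    # Print and return the ID of the process running this call
    pid = os.getpid()
    print(label, "runs in process", pid)
    return pid


def pids_seen_by_threads(count):
    """Hypothetical helper: collect the PID observed inside each thread."""
    pids = []
    threads = [
        threading.Thread(target=lambda i=i: pids.append(show_pid(f"t{i}")))
        for i in range(1, count + 1)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pids


if __name__ == "__main__":
    show_pid("main")
    pids_seen_by_threads(2)  # both threads print the main PID

    procs = [
        multiprocessing.Process(target=show_pid, args=(f"p{i}",))
        for i in (1, 2)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()  # each child prints its own, different PID
```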

When to use?

In short, use

  • multithreading for I/O-intensive tasks
  • multiprocessing for CPU-intensive tasks

Threads in Python are best used for I/O tasks because a thread that is waiting on I/O releases the GIL so that other threads can run, and because threads can easily share results with each other within a process. Processes, in contrast, need to pickle and combine their results (which takes time). However, because of the GIL, threads provide no parallelism benefit for CPU-intensive tasks.
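The difference in how results are shared can be sketched as follows. This is a minimal illustration under my own assumptions: the function names are hypothetical, and a multiprocessing.Queue is just one of several ways to send results back from a child process. Threads append straight to a list owned by the caller, while each process has to put its result on the queue, where it is pickled and sent back to the parent:

```python
import multiprocessing
import threading


def square_into_list(results, n):
    # Threads share memory, so appending to the caller's list just works
    results.append(n * n)


def square_into_queue(queue, n):
    # A child process has its own memory; the value is pickled and sent back
    queue.put(n * n)


def squares_with_threads(numbers):
    results = []
    threads = [
        threading.Thread(target=square_into_list, args=(results, n))
        for n in numbers
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


def squares_with_processes(numbers):
    queue = multiprocessing.Queue()
    procs = [
        multiprocessing.Process(target=square_into_queue, args=(queue, n))
        for n in numbers
    ]
    for p in procs:
        p.start()
    results = [queue.get() for _ in procs]  # collect results before joining
    for p in procs:
        p.join()
    return sorted(results)


if __name__ == "__main__":
    print(squares_with_threads([1, 2, 3]))
    print(squares_with_processes([1, 2, 3]))
```

Both calls should print the same list of squares; the process version simply pays for the extra step of moving each result between processes.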