multiprocessing in Python

Date: 2019/10/25 Tags: Python



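Python offers many ways to run computations in parallel, such as multiprocessing, pyspark, and thread pools. They differ in how they are used, and each comes with its own limitations.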

multiprocessing

import multiprocessing as mp
from math import sin
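if __name__ == "__main__":
    # The guard is required where workers are started with "spawn" (Windows, and macOS since Python 3.8)
    with mp.Pool(4) as p:  # pool of four worker processes
        result = p.map(sin, range(10000))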

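multiprocessing uses pickle to serialize the function and its arguments before sending them to the worker processes. pickle serializes a function by reference (its module and qualified name), so it cannot handle lambdas or functions that are not defined at the top level of a module. This makes multiprocessing fairly restrictive to use.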
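The limitation can be seen directly with pickle, without starting any processes. This is a small sketch; the helper names `square`, `make_adder` and `can_pickle` are illustrative only.

```python
import pickle


def square(x):
    # Defined at module top level, so pickle can store it by name
    return x * x


def make_adder(n):
    # The inner function is a "local object" that pickle cannot look up by name
    def add(x):
        return x + n
    return add


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(can_pickle(square))           # True
print(can_pickle(lambda x: x + x))  # False
print(can_pickle(make_adder(1)))    # False
```

The exact exception type for the nested function differs between Python versions, which is why the helper catches several.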

dill

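There is a known workaround: replace pickle with dill, which can serialize lambdas and nested functions. In practice this only requires using pathos.multiprocessing instead of multiprocessing. The example above can then be written with a lambda: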

import pathos.multiprocessing as mp

p = mp.ProcessingPool(4)  # pathos pool that serializes tasks with dill

result = p.map(lambda x: x + x, range(10000))
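pathos hides the serialization step. Roughly, dill can ship a lambda because it serializes the function's code instead of a reference to its name. The following stdlib-only sketch illustrates that idea. It is a simplified assumption, not dill's actual implementation: it ignores closures, default arguments and referenced globals, and `marshal` data is only valid for the same Python version. The helper names are hypothetical.

```python
import marshal
import types


def dumps_by_value(func):
    # Serialize the bytecode itself rather than the function's importable name
    return marshal.dumps(func.__code__)


def loads_by_value(data, globals_=None):
    # Rebuild a function object from the serialized code object
    code = marshal.loads(data)
    return types.FunctionType(code, globals_ if globals_ is not None else {})


payload = dumps_by_value(lambda x: x + x)  # pickle.dumps would fail on this lambda
double = loads_by_value(payload)
print(double(21))  # 42
```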

IPython Parallel

Start an IPython cluster with four engines:

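ipcluster start --n=4

Then connect to it from Python:

# IPython.parallel moved to the separate ipyparallel package in IPython 4.0
from ipyparallel import Client

p = Client()[:]  # DirectView over all engines
p.use_dill()     # serialize with dill so lambdas can be sent to the engines

result = p.map_sync(lambda x: x + x, range(1000))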

Reference: http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization