multiprocessing in Python
Date: 2019/10/25 Tags: Python
In Python there are several ways to run computations in parallel, such as multiprocessing, pyspark, and thread pools; they differ in how they are used and in their limitations.
multiprocessing
import multiprocessing as mp
from math import sin

p = mp.Pool(4)                      # a pool of 4 worker processes
result = p.map(sin, range(10000))   # works: sin is a top-level function, so pickle can serialize it
multiprocessing uses pickle for serialization, and pickle cannot serialize lambdas or functions that are not defined at the top level of a module, which makes it fairly restrictive.
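To see the limitation concretely, here is a minimal sketch showing pickle rejecting a lambda while a top-level function serializes fine:

import pickle
from math import sin

pickle.dumps(sin)                 # a top-level function serializes without trouble
try:
    pickle.dumps(lambda x: x + x)
except (pickle.PicklingError, AttributeError) as e:
    print("pickle cannot serialize a lambda:", e)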
dill
This problem has been noticed before: the workaround is to replace pickle with dill and multiprocessing with pathos.multiprocessing. The example above can then be written as:
from pathos.multiprocessing import ProcessingPool as Pool

p = Pool(4)
result = p.map(lambda x: x + x, range(10000))   # lambdas are fine: pathos serializes with dill
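The difference is just the serializer underneath. As a quick check (assuming dill is installed), dill can round-trip the lambda that pickle rejects:

import dill

payload = dill.dumps(lambda x: x + x)   # succeeds where pickle.dumps fails
restored = dill.loads(payload)
print(restored(21))                     # -> 42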
IPython Parallel
Start an ipcluster with 4 engines:
ipcluster start --n=4
# IPython.parallel has since been split out into the ipyparallel package
from ipyparallel import Client

p = Client()[:]      # a DirectView over all running engines
p.use_dill()         # switch the view's serialization to dill so lambdas work
p.map_sync(lambda x: x + x, range(1000))
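For a quick sanity check (a sketch assuming the four engines above are running), the view can also run ad-hoc callables on every engine, and the parallel map should agree with the serial result:

p.apply_sync(lambda: 2 + 2)    # one result per engine, e.g. [4, 4, 4, 4]
assert p.map_sync(lambda x: x + x, range(1000)) == [x + x for x in range(1000)]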
Reference: http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization