Not all Pandas dataframes are shared in a multiprocessing list


I first asked this question on StackOverflow, but I hope some of you can explain the behaviour and hopefully lead us to a solution.

The StackOverflow question is here:

I’ve also added an error callback and managed to get an error:

RemoteError(‘Traceback (most recent call last):
File “lib\multiprocessing\”, line 228, in serve_client
request = recv()
File “lib\multiprocessing\”, line 251, in recv
return _ForkingPickler.loads(buf.getbuffer())
AttributeError: Can’t get attribute ‘DataFrame’ on <module ‘pandas.core.frame’ from ‘lib\site-packages\pandas\core\>’
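
For anyone who wants to surface worker errors the same way, here is a minimal sketch of an error callback on a pool. It is not my actual code: `failing_task` and `collect_errors` are hypothetical names, and the worker simply raises so that the callback has something to report.

```python
import multiprocessing as mp


def failing_task(x):
    # Hypothetical worker that always fails, standing in for the real task.
    raise ValueError(f"task {x} failed")


def collect_errors(n_tasks=3):
    errors = []  # filled in the parent process by the callback
    with mp.Pool(processes=2) as pool:
        for i in range(n_tasks):
            # error_callback receives the exception instance raised in the worker.
            pool.apply_async(failing_task, (i,), error_callback=errors.append)
        pool.close()
        pool.join()  # wait until every task and callback has finished
    return errors


if __name__ == "__main__":
    for err in collect_errors():
        print(repr(err))
```

Without an error callback (or a call to `.get()` on the result), an exception raised inside `apply_async` is stored on the result object and never shown, which is why the error above did not appear at first.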

I’ve looked through the GitHub tracker and found an issue that looks a lot like mine: #2440. There are a few differences, though:

  • I’m using multiprocessing instead of threading. Because of this, we can use a multiprocessing.Pool and a special list object to share objects.
  • In our example, we don’t actually change the dataframe in the different processes. We’re only adding it to the list of shared objects.
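
The setup described in the two points above can be sketched roughly as follows. This is a simplified illustration rather than the exact code from the StackOverflow question: it assumes the shared list comes from `multiprocessing.Manager().list()`, and `add_frame`, `run` and the column names are hypothetical.

```python
import multiprocessing as mp

import pandas as pd


def add_frame(args):
    shared, i = args
    # The dataframe is only created and appended, never modified afterwards.
    # Appending pickles the frame and sends it to the manager process.
    shared.append(pd.DataFrame({"task": [i], "value": [i * i]}))


def run(n_tasks=4):
    with mp.Manager() as manager:
        shared = manager.list()  # assumed to be the "special list object"
        with mp.Pool(processes=2) as pool:
            pool.map(add_frame, [(shared, i) for i in range(n_tasks)])
        # Copy the frames out before the manager shuts down.
        return list(shared)


if __name__ == "__main__":
    frames = run()
    print(len(frames), "dataframes were shared")
```

In a working setup, the number of dataframes in the list should equal the number of tasks; the problem reported here is that some of them go missing.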

Output of pd.show_versions()


commit: None
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2 (I’ve also tested this with pandas version 0.22.0, which I believe was the latest)
nose: 1.3.7
pip: 10.0.0
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.13.1
scipy: 1.0.1
statsmodels: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
matplotlib: 2.1.1
openpyxl: 2.4.9
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
httplib2: None
apiclient: None
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
boto: 2.48.0
pandas_datareader: None

If you need anything else, let me know. We appreciate all the work you’ve done!

