I have a shuffled series with a bunch of sinvalues in float16, like this:
tdata.time_sin
110405276 -0.183105
175560878 -0.301270
...
130331292 -0.158813
6782127 -0.282471
Name: time_sin, Length: 18490389, dtype: float16
There’s no NaN values, everything’s a sinus of something:
tdata.time_sin[np.isnan(tdata.time_sin) == True].count()
0
But for some reason, mean()
chokes somewhere in the middle like it’s overflowing:
tdata.time_sin.mean()
nan
tdata.time_sin[:328720].mean()
0.0
tdata.time_sin[:328721].mean()
nan
tdata.time_sin[328719:328722]
117467643 -0.639648
85318746 0.956055
10829780 0.112000
Name: time_sin, dtype: float16
And it works fine when converted to float32:
foo = tdata.time_sin.astype(np.float32)
foo.mean()
0.20143597
Is this weird or am I missing something about float16?
This behavior persists after pickling and loading and sorting by index, although it now chokes much earlier:
zzz = pickle.load(open('timesin.pkl', 'rb'))
bb = zzz.sort_index()
bb[:74351].mean()
-0.0
bb[:74352].mean()
nan
bb[74350:74355]
749371 -0.898438
749393 -0.898438
749432 -0.898438
749447 -0.898438
749479 -0.898438
Name: time_sin, dtype: float16
Problem description
Expected Output
Output of pd.show_versions()
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-119-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: None
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.3.1
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
You have an overflow. Take the mean over a ratio,(df[col] / n).mean() * n, where n is large enough.
To know how large n needs to be you can compute the sum of the column once cast into float32, and compare to the largest float16.