It seems that fetch_openml is not caching the data.
If I run:
from sklearn.datasets import fetch_openml
import time
time0 = time.time()
cifar10 = fetch_openml('CIFAR_10', cache=True)
print('First time: {:.0f}s'.format(time.time()-time0))
time0 = time.time()
cifar10 = fetch_openml('CIFAR_10', cache=True)
print('Second time: {:.0f}s'.format(time.time()-time0))
I obtain:
First time: 44s
Second time: 45s
I expected the second call to be much faster than the first one.
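One possible explanation, which is an assumption and not confirmed by the timings above, is that `cache=True` only avoids re-downloading the file, while the parsing step still runs on every call. If that is the case, a workaround is to store the already-parsed result on disk yourself. The sketch below shows the idea with the standard-library `pickle` module. `load_with_cache`, `slow_loader`, and the temporary cache directory are hypothetical names used for illustration, and `slow_loader` is a stand-in for the real `fetch_openml` call so the example runs without network access.

```python
import pickle
import tempfile
import time
from pathlib import Path


def load_with_cache(name, loader, cache_dir):
    """Return loader(name), reusing a pickled copy on disk when one exists."""
    cache_file = Path(cache_dir) / f"{name}.pkl"
    if cache_file.exists():
        # Cache hit: skip the expensive loader entirely.
        with cache_file.open("rb") as f:
            return pickle.load(f)
    # Cache miss: run the loader once and store its result.
    result = loader(name)
    with cache_file.open("wb") as f:
        pickle.dump(result, f)
    return result


def slow_loader(name):
    # Hypothetical stand-in for fetch_openml(name); the sleep mimics its cost.
    time.sleep(0.5)
    return {"name": name, "data": list(range(10))}


cache_dir = tempfile.mkdtemp()  # hypothetical cache location

time0 = time.time()
first = load_with_cache("CIFAR_10", slow_loader, cache_dir)
first_elapsed = time.time() - time0

time0 = time.time()
second = load_with_cache("CIFAR_10", slow_loader, cache_dir)
second_elapsed = time.time() - time0

print("First time: {:.2f}s".format(first_elapsed))
print("Second time: {:.2f}s".format(second_elapsed))
```

To try this with the real dataset, the loader could be something like `lambda n: fetch_openml(n, cache=True)`, assuming the returned object can be pickled. Only load pickle files that you created yourself.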
Versions:
Python dependencies:
pip: 20.2.4
setuptools: 49.6.0.post20201009
sklearn: 0.23.2
numpy: 1.19.2
scipy: 1.5.2
Cython: None
pandas: None
matplotlib: 3.3.2
joblib: 0.17.0
threadpoolctl: 2.1.0
Built with OpenMP: True
Thank you very much, this solved my problem with some minor changes 😉