fetch_openml does not cache the data

It seems that fetch_openml is not caching the data.

If I run:

from sklearn.datasets import fetch_openml
import time

time0 = time.time()
cifar10 = fetch_openml('CIFAR_10', cache=True)
print('First time: {:.0f}s'.format(time.time()-time0))

time0 = time.time()
cifar10 = fetch_openml('CIFAR_10', cache=True)
print('Second time: {:.0f}s'.format(time.time()-time0))

I obtain:

First time: 44s
Second time: 45s

I expected the second call to be much faster than the first.
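One way to narrow this down is to check whether anything was actually written to the scikit-learn data directory. The following is a minimal sketch, not part of the original report: `list_cached_files` is a hypothetical helper, and the assumption that `fetch_openml` stores its files in an `openml` subfolder of `get_data_home()` should be verified against the installed version. If files are present and the second call is still slow, that would suggest the time is spent parsing the cached file rather than downloading it.

```python
from pathlib import Path


def list_cached_files(cache_root):
    """Return (relative_path, size_in_bytes) for every file under cache_root."""
    root = Path(cache_root).expanduser()
    if not root.is_dir():
        return []  # nothing has been cached at this location yet
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


# Typical use (requires scikit-learn; the "openml" subfolder is an assumption):
#   from sklearn.datasets import get_data_home
#   for name, size in list_cached_files(Path(get_data_home()) / "openml"):
#       print(f"{size:>12d}  {name}")
```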

Versions:

Python dependencies:
          pip: 20.2.4
   setuptools: 49.6.0.post20201009
      sklearn: 0.23.2
        numpy: 1.19.2
        scipy: 1.5.2
       Cython: None
       pandas: None
   matplotlib: 3.3.2
       joblib: 0.17.0
threadpoolctl: 2.1.0

Built with OpenMP: True

Author: Fantashit

1 thought on “fetch_openml does not cache the data”

  1. Thank you very much, this solved my problem with some minor changes 😉

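    from joblib import Memory
    from sklearn.datasets import fetch_openml

    # Store cached results on disk under ./tmp
    memory = Memory('./tmp')

    # Wrap fetch_openml so that its return value is cached
    fetch_openml_cached = memory.cache(fetch_openml)

    # Slow on the first call; later calls load the cached result
    cifar10 = fetch_openml_cached('CIFAR_10')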
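The effect of wrapping a function with `memory.cache` can be seen without downloading anything. This is a minimal sketch under stated assumptions: `slow_load` is a hypothetical stand-in for an expensive loader such as `fetch_openml`, and the cache lives in a temporary directory rather than `./tmp`.

```python
import tempfile
import time

from joblib import Memory

calls = []  # records every real execution of slow_load


def slow_load(name):
    """Hypothetical stand-in for an expensive loader such as fetch_openml."""
    calls.append(name)
    time.sleep(0.2)  # simulate download plus parsing
    return {"name": name, "n_samples": 3}


# Keep the cache in a throwaway directory for this demonstration.
cache_dir = tempfile.mkdtemp()
memory = Memory(cache_dir, verbose=0)
slow_load_cached = memory.cache(slow_load)

start = time.perf_counter()
first = slow_load_cached("demo")   # runs slow_load and stores the result
first_time = time.perf_counter() - start

start = time.perf_counter()
second = slow_load_cached("demo")  # same argument: result is read from disk
second_time = time.perf_counter() - start

print(f"First call: {first_time:.2f}s, second call: {second_time:.2f}s")
```

Because the cached result is keyed on the function arguments, it is not refreshed if the dataset changes upstream; calling `memory.clear()` discards the stored results.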
    
    
