Cache
Utility for caching iterables.
This can be used to save a stream of data to disk so that iterating over it the next time is faster. How much time is saved depends on the nature of the stream: the more processing happens in a stream, the more time is saved. Even when no processing is done apart from reading the data, the cache still saves some time because it stores the data with the pickle binary protocol. It can thus improve speed in common cases such as reading from a CSV file.
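To make the pickle argument concrete, here is a minimal standard-library sketch of the underlying idea; it is not river's implementation. A tiny CSV (made-up data) is parsed once into (x, y) pairs, each pair is written with pickle.dump as a separate record, and later passes replay the records with pickle.load until EOFError is raised, skipping the CSV parsing and float conversion. The helper names parse_csv, dump_stream and load_stream are hypothetical.

```python
import csv
import io
import os
import pickle
import tempfile

# A small in-memory CSV standing in for a file on disk (made-up data)
CSV_TEXT = "x1,x2,y\n0.5,1.0,1\n0.1,0.3,0\n"


def parse_csv(text):
    """Hypothetical stream: parse each row into an (x, y) pair."""
    for row in csv.DictReader(io.StringIO(text)):
        y = int(row.pop("y"))
        x = {name: float(value) for name, value in row.items()}
        yield x, y


def dump_stream(stream, path):
    """Pickle each item of the stream to path, one record after another."""
    with open(path, "wb") as f:
        for item in stream:
            pickle.dump(item, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_stream(path):
    """Read the pickled records back until the end of the file."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


path = os.path.join(tempfile.mkdtemp(), "rows.pkl")
dump_stream(parse_csv(CSV_TEXT), path)  # the parsing cost is paid once
rows = list(load_stream(path))          # later passes skip CSV parsing entirely
```

Writing one pickle record per item keeps the cache streamable: items can be replayed one at a time without loading the whole file into memory.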
Parameters
- directory (default: None)
The path where the pickled data streams are stored. If not provided, it is inferred automatically whenever possible; otherwise, an exception is raised.
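The fallback rule for directory can be sketched as below. The per-OS default table is an assumption made for illustration (it is consistent with the /tmp location shown in the example output, but it is not taken from river's source), and resolve_directory is a hypothetical helper.

```python
import platform

# Assumed per-OS defaults, for illustration only
DEFAULT_DIRECTORIES = {"Linux": "/tmp", "Darwin": "/tmp"}


def resolve_directory(directory=None, system=None):
    """Return directory if given, else an assumed per-OS default, else raise."""
    if directory is not None:
        return directory
    system = system or platform.system()
    default = DEFAULT_DIRECTORIES.get(system)
    if default is None:
        raise ValueError(f"no default cache directory for {system!r}, please provide one")
    return default
```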
Attributes
- keys (set)
The set of keys that are being cached.
Examples
import time
from river import datasets
from river import stream
dataset = datasets.Phishing()
cache = stream.Cache()
The cache can be used by wrapping it around an iterable. Because this is the first time we are iterating over the data, nothing is cached yet.
tic = time.time()
for x, y in cache(dataset, key='phishing'):
    pass
toc = time.time()
print(toc - tic) # doctest: +SKIP
0.012813
If we do the same thing again, we can see that the loop is now faster, because the data is read back from the cache.
tic = time.time()
for x, y in cache(dataset, key='phishing'):
    pass
toc = time.time()
print(toc - tic) # doctest: +SKIP
0.001927
Printing the cache gives an overview of it. The first line is the location of the cache, and each following line shows a cached key along with the size of its data.
cache # doctest: +SKIP
/tmp
phishing - 125.2KiB
Finally, we can clear the stream from the cache.
cache.clear('phishing')
cache # doctest: +SKIP
/tmp
There is also a clear_all method, which removes all the items in the cache.
cache.clear_all()
Methods
__call__
Iterate over stream, caching its items under key the first time and reading them back from the cache on later passes.
Parameters
- stream
- key (default: None)
clear
Delete the cached stream associated with the given key.
Parameters
- key (str)
clear_all
Delete all the cached streams.
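The three methods can be pictured with the following simplified, hypothetical MiniCache class; it is not river's implementation. It assumes one pickle file per key inside directory, makes key mandatory (whereas the documented __call__ defaults it to None), and only adds a key to keys once the stream has been fully consumed, so a partially read stream is never replayed as if it were complete.

```python
import os
import pickle
import tempfile


class MiniCache:
    """Hypothetical, simplified cache with one pickle file per key."""

    def __init__(self, directory):
        self.directory = directory
        self.keys = set()

    def _path(self, key):
        return os.path.join(self.directory, f"{key}.pkl")

    def __call__(self, stream, key):
        path = self._path(key)
        if key in self.keys:
            # Cached: replay the pickled items and ignore the given stream
            with open(path, "rb") as f:
                while True:
                    try:
                        yield pickle.load(f)
                    except EOFError:
                        return
        else:
            # Not cached: pass items through while writing them to disk
            with open(path, "wb") as f:
                for item in stream:
                    pickle.dump(item, f)
                    yield item
            # Only reached once the stream has been fully consumed
            self.keys.add(key)

    def clear(self, key):
        """Delete the file for key and forget the key."""
        os.remove(self._path(key))
        self.keys.remove(key)

    def clear_all(self):
        """Delete every cached stream; iterate over a copy since clear mutates keys."""
        for key in list(self.keys):
            self.clear(key)


cache = MiniCache(tempfile.mkdtemp())
data = [(1, True), (2, False)]
first = list(cache(iter(data), key="demo"))   # reads data and writes demo.pkl
second = list(cache(iter([]), key="demo"))    # served from demo.pkl
keys_before = set(cache.keys)
cache.clear_all()
```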