iter_csv
Iterates over rows from a CSV file.
Reading CSV files can be quite slow. If you're going to loop through the same file multiple times, we recommend that you use the stream.Cache utility.
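The cost being avoided is the repeated parsing. As a rough illustration of the idea only (this is not how stream.Cache is implemented, and `parse_rows` is a hypothetical helper written for this example), the sketch below parses an in-memory CSV once, keeps the parsed rows, and replays them on later passes:

```python
import csv
import io

# Hypothetical in-memory CSV standing in for a large file on disk.
RAW = "name,rating\nPlanet Earth II,9.5\nChernobyl,9.4\n"


def parse_rows(buffer):
    """Parse every row once, casting the rating to a float."""
    for row in csv.DictReader(buffer):
        row["rating"] = float(row["rating"])
        yield row


# First pass: pay the parsing cost and keep the parsed rows.
cached_rows = list(parse_rows(io.StringIO(RAW)))

# Later passes: replay the cached rows without reading the CSV again.
for epoch in range(3):
    total = sum(row["rating"] for row in cached_rows)
```

Holding every row in a list only works when the data fits in memory, which is one reason to prefer the stream.Cache utility recommended above.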
Parameters
- filepath_or_buffer
  Either a string indicating the location of a file, or a buffer object that has a read method.
- target (Union[str, List[str]]), defaults to None
  A single target column is assumed if a string is passed. A multiple output scenario is assumed if a list of strings is passed. A None value will be assigned to each y if this parameter is omitted.
- converters (dict), defaults to None
  All values in the CSV are interpreted as strings by default. You can use this parameter to cast values to the desired type. This should be a dict mapping feature names to callables used to parse their associated values. Note that a callable may be a type, such as float and int.
- parse_dates (dict), defaults to None
  A dict mapping feature names to a format passed to the datetime.datetime.strptime method.
- drop (List[str]), defaults to None
  Fields to ignore.
- drop_nones, defaults to False
  Whether or not to drop fields where the value is a None.
- fraction, defaults to 1.0
  Sampling fraction.
- compression, defaults to 'infer'
  For on-the-fly decompression of on-disk data. If this is set to 'infer' and filepath_or_buffer is a path, then the decompression method is inferred for the following extensions: '.gz', '.zip'.
- seed (int), defaults to None
  If specified, the sampling will be deterministic.
- field_size_limit (int), defaults to None
  If not None, this will be passed to the csv.field_size_limit function.
- kwargs
  All other keyword arguments are passed to the underlying csv.DictReader.
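To make the interplay between target, converters, parse_dates and drop more concrete, here is a minimal sketch built directly on csv.DictReader. It illustrates the documented behaviour under simplifying assumptions and is not the library's actual implementation; the function name `iter_rows`, its reduced signature, and the choice to return multiple targets as a dict are assumptions made for this example.

```python
import csv
import datetime as dt
import io


def iter_rows(buffer, target=None, converters=None, parse_dates=None, drop=None):
    """Yield (x, y) pairs from a CSV buffer (hypothetical helper)."""
    converters = converters or {}
    parse_dates = parse_dates or {}
    drop = set(drop or [])

    for row in csv.DictReader(buffer):
        # Ignore the fields listed in `drop`.
        x = {k: v for k, v in row.items() if k not in drop}

        # Every value starts out as a string; cast the requested ones.
        for name, cast in converters.items():
            if name in x:
                x[name] = cast(x[name])

        # Parse date fields with the given strptime format.
        for name, fmt in parse_dates.items():
            if name in x:
                x[name] = dt.datetime.strptime(x[name], fmt)

        # Separate the target(s) from the features.
        if isinstance(target, list):
            y = {name: x.pop(name) for name in target}
        elif target is not None:
            y = x.pop(target)
        else:
            y = None

        yield x, y


data = "name,year,rating\nChernobyl,2019,9.4\n"
x, y = next(iter_rows(
    io.StringIO(data),
    target="rating",
    converters={"rating": float},
    parse_dates={"year": "%Y"},
    drop=["name"],
))
```

Here `x` holds only the parsed year, because name was dropped and rating was moved into `y`.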
Examples
Although this function is designed to handle different kinds of inputs, the most common use case is to read a file from disk. We'll first create a small CSV file to illustrate.
>>> tv_shows = '''name,year,rating
... Planet Earth II,2016,9.5
... Planet Earth,2006,9.4
... Band of Brothers,2001,9.4
... Breaking Bad,2008,9.4
... Chernobyl,2019,9.4
... '''
>>> with open('tv_shows.csv', mode='w') as f:
...     _ = f.write(tv_shows)
We can now go through the rows one by one. We can use the converters parameter to cast
the rating field value as a float. We can also convert the year to a datetime via
the parse_dates parameter.
>>> from river import stream
>>> params = {
...     'converters': {'rating': float},
...     'parse_dates': {'year': '%Y'}
... }
>>> for x, y in stream.iter_csv('tv_shows.csv', **params):
...     print(x, y)
{'name': 'Planet Earth II', 'year': datetime.datetime(2016, 1, 1, 0, 0), 'rating': 9.5} None
{'name': 'Planet Earth', 'year': datetime.datetime(2006, 1, 1, 0, 0), 'rating': 9.4} None
{'name': 'Band of Brothers', 'year': datetime.datetime(2001, 1, 1, 0, 0), 'rating': 9.4} None
{'name': 'Breaking Bad', 'year': datetime.datetime(2008, 1, 1, 0, 0), 'rating': 9.4} None
{'name': 'Chernobyl', 'year': datetime.datetime(2019, 1, 1, 0, 0), 'rating': 9.4} None
The value of y is always None because we haven't provided a value for the target
parameter. Here is an example where a target is provided:
>>> dataset = stream.iter_csv('tv_shows.csv', target='rating', **params)
>>> for x, y in dataset:
...     print(x, y)
{'name': 'Planet Earth II', 'year': datetime.datetime(2016, 1, 1, 0, 0)} 9.5
{'name': 'Planet Earth', 'year': datetime.datetime(2006, 1, 1, 0, 0)} 9.4
{'name': 'Band of Brothers', 'year': datetime.datetime(2001, 1, 1, 0, 0)} 9.4
{'name': 'Breaking Bad', 'year': datetime.datetime(2008, 1, 1, 0, 0)} 9.4
{'name': 'Chernobyl', 'year': datetime.datetime(2019, 1, 1, 0, 0)} 9.4
Finally, let's delete the example file.
>>> import os; os.remove('tv_shows.csv')
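The fraction, seed and compression parameters listed under Parameters are not exercised by the examples above. The following standard-library sketch shows one way such behaviour could work: compression is inferred from a '.gz' extension, and rows are kept with a given probability using a seeded random generator. The helpers `open_maybe_compressed` and `sample_rows` are hypothetical names introduced for this illustration, and the sampling rule is an assumption rather than the library's implementation.

```python
import csv
import gzip
import os
import random
import tempfile


def open_maybe_compressed(path):
    """Open a text file, inferring gzip compression from a '.gz' extension."""
    if path.endswith(".gz"):
        return gzip.open(path, mode="rt", newline="")
    return open(path, mode="r", newline="")


def sample_rows(path, fraction=1.0, seed=None):
    """Yield each row with probability `fraction` (hypothetical helper)."""
    rng = random.Random(seed)
    with open_maybe_compressed(path) as f:
        for row in csv.DictReader(f):
            if fraction < 1.0 and rng.random() > fraction:
                continue  # Row not selected for the sample.
            yield row


# Write a small gzip-compressed CSV file to a temporary directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "numbers.csv.gz")
with gzip.open(path, mode="wt", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["n"])
    writer.writerows([i] for i in range(100))

# The same seed gives the same sample on every pass.
first = [row["n"] for row in sample_rows(path, fraction=0.5, seed=42)]
second = [row["n"] for row in sample_rows(path, fraction=0.5, seed=42)]
everything = list(sample_rows(path))

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(tmp_dir)
```

Fixing the seed makes the sample reproducible across passes, which matters when the same file is read several times.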