OrdinalEncoder¶

Ordinal encoder.

This transformer maps the values of each feature to integers. It can be useful when a feature has string values (i.e. categorical variables).

Parameters¶

unknown_value

Type → int | None

Default → 0

The value to use for unknown categories seen during transform_one. Unknown categories will be mapped to an integer once they are seen during learn_one. This value can be set to None in order to map categories to None if they've never been seen before.

none_value

Type → int

Default → -1

The value to encode None with.
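
To make the interplay of unknown_value and none_value concrete, here is a minimal pure-Python sketch of an encoder with the same two parameters. It is not river's implementation: the class name SketchOrdinalEncoder, the _next_code helper, and the rule that fresh codes skip the two reserved values are assumptions, chosen so that the sketch is consistent with the outputs shown in the Examples section.

```python
import itertools
from collections import defaultdict


class SketchOrdinalEncoder:
    """Illustrative stand-in, NOT river's OrdinalEncoder."""

    def __init__(self, unknown_value=0, none_value=-1):
        self.unknown_value = unknown_value
        self.none_value = none_value
        # Outer dict: feature -> inner dict; inner dict: category -> code.
        self.categories = defaultdict(dict)
        # One auto-incrementing counter per feature (assumption).
        self._counters = defaultdict(itertools.count)

    def _next_code(self, feature):
        # Assumption: fresh codes never collide with the reserved values.
        reserved = {self.unknown_value, self.none_value}
        code = next(self._counters[feature])
        while code in reserved:
            code = next(self._counters[feature])
        return code

    def transform_one(self, x):
        # None -> none_value, unseen category -> unknown_value, else its code.
        return {
            name: self.none_value
            if value is None
            else self.categories[name].get(value, self.unknown_value)
            for name, value in x.items()
        }

    def learn_one(self, x):
        # Assign a code to every non-None category not seen before.
        for name, value in x.items():
            if value is not None and value not in self.categories[name]:
                self.categories[name][value] = self._next_code(name)


encoder = SketchOrdinalEncoder()
x = {"country": "France", "place": None}
print(encoder.transform_one(x))  # {'country': 0, 'place': -1}
encoder.learn_one(x)
print(encoder.transform_one(x))  # {'country': 1, 'place': -1}

lenient = SketchOrdinalEncoder(unknown_value=None)
print(lenient.transform_one({"country": "Sweden"}))  # {'country': None}
```

In this sketch, None is never stored as a category, so none_value is returned no matter how often None has been observed.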

Attributes¶

categories

A dict of dicts. The outer dict maps each feature to its inner dict. The inner dict maps each category to its code.
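
As an illustration of this nested layout, the snippet below uses a plain dict literal whose codes are inferred from the outputs in the Examples section (it is not read from a live encoder) and inverts it to decode integers back to categories. The names categories and decoders are local to this sketch.

```python
# Codes inferred from the Examples section, not taken from a fitted encoder.
categories = {
    "country": {"France": 1, "Sweden": 2, "Russia": 3},
    "place": {"Taco Bell": 1, "Burger King": 2, "Starbucks": 3},
}

# Invert each inner dict to go from code back to category.
decoders = {
    feature: {code: category for category, code in mapping.items()}
    for feature, mapping in categories.items()
}

print(decoders["country"][2])  # Sweden
print(decoders["place"][3])    # Starbucks
```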

Examples¶

from river import preprocessing

X = [
    {"country": "France", "place": "Taco Bell"},
    {"country": None, "place": None},
    {"country": "Sweden", "place": "Burger King"},
    {"country": "France", "place": "Burger King"},
    {"country": "Russia", "place": "Starbucks"},
    {"country": "Russia", "place": "Starbucks"},
    {"country": "Sweden", "place": "Taco Bell"},
    {"country": None, "place": None},
]

encoder = preprocessing.OrdinalEncoder()
for x in X:
    print(encoder.transform_one(x))
    encoder.learn_one(x)

{'country': 0, 'place': 0}
{'country': -1, 'place': -1}
{'country': 0, 'place': 0}
{'country': 1, 'place': 2}
{'country': 0, 'place': 0}
{'country': 3, 'place': 3}
{'country': 2, 'place': 1}
{'country': -1, 'place': -1}

import pandas as pd

xb1 = pd.DataFrame(X[0:4], index=[0, 1, 2, 3])
xb2 = pd.DataFrame(X[4:8], index=[4, 5, 6, 7])

encoder = preprocessing.OrdinalEncoder()
encoder.transform_many(xb1)

   country  place
0        0      0
1       -1     -1
2        0      0
3        0      0

encoder.learn_many(xb1)
encoder.transform_many(xb2)

   country  place
4        0      0
5        0      0
6        2      1
7       -1     -1

Methods¶

learn_many

Update with a mini-batch of features.

Many transformers are stateless and therefore have nothing to do during the learn_many step. For this reason, the default behavior of this method is to do nothing. Transformers that do need to update their state during learn_many can override it.

Parameters

X — 'pd.DataFrame'
y — defaults to None

learn_one

Update with a set of features x.

Many transformers are stateless and therefore have nothing to do during the learn_one step. For this reason, the default behavior of this method is to do nothing. Transformers that do need to update their state during learn_one can override it.

Parameters

x — 'dict'
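
The no-op default described above can be sketched with a few hypothetical classes. None of them is part of river: SketchTransformer stands in for a base class whose learn_one does nothing, Doubler is a stateless transformer that inherits that default, and RunningMaxScaler is a stateful one that overrides it.

```python
class SketchTransformer:
    """Hypothetical base class: learning is a no-op by default."""

    def learn_one(self, x):
        # Stateless transformers have nothing to update.
        return None

    def transform_one(self, x):
        raise NotImplementedError


class Doubler(SketchTransformer):
    """Stateless: inherits the no-op learn_one."""

    def transform_one(self, x):
        return {name: 2 * value for name, value in x.items()}


class RunningMaxScaler(SketchTransformer):
    """Stateful: overrides learn_one to track the largest value per feature."""

    def __init__(self):
        self.max_seen = {}

    def learn_one(self, x):
        for name, value in x.items():
            self.max_seen[name] = max(value, self.max_seen.get(name, value))

    def transform_one(self, x):
        # Features never seen, or whose max is 0, are passed through unchanged.
        return {
            name: value / self.max_seen[name] if self.max_seen.get(name) else value
            for name, value in x.items()
        }


doubler = Doubler()
doubler.learn_one({"a": 3})             # does nothing
print(doubler.transform_one({"a": 3}))  # {'a': 6}

scaler = RunningMaxScaler()
scaler.learn_one({"a": 4})
print(scaler.transform_one({"a": 2}))   # {'a': 0.5}
```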

transform_many

Transform a mini-batch of features.

Parameters

X — 'pd.DataFrame'

Returns

pd.DataFrame: A new DataFrame.

transform_one

Transform a set of features x.

Parameters

x — 'dict'

Returns

dict: The transformed values.