TransformerUnion¶
Packs multiple transformers into a single one.
Pipelines allow you to apply steps sequentially, meaning the output of a step becomes the input of the next one. In many cases, however, you may want to pass the output of a step to multiple steps. This simple transformer allows you to do so. In other words, it enables you to apply particular steps to different parts of an input. A typical example is when you want to scale numeric features and one-hot encode categorical features.
This transformer is essentially a list of transformers. Whenever it is updated, it loops through each transformer and updates them. Meanwhile, calling transform_one collects the output of each transformer and merges them into a single dictionary.
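To make the merging behavior concrete, here is a minimal sketch that mimics it with two plain functions standing in for transformers; the function names and feature keys are made up for illustration and are not part of River's API.
def scale_revenue(x):
    return {'revenue_scaled': x['revenue'] / 100}

def flag_place(x):
    return {'is_taco_bell': x['place'] == 'Taco Bell'}

x = {'place': 'Taco Bell', 'revenue': 42}
merged = {**scale_revenue(x), **flag_place(x)}
# merged == {'revenue_scaled': 0.42, 'is_taco_bell': True}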
Parameters¶
-
transformers
Ideally, a list of (name, estimator) tuples. A name is automatically inferred if none is provided.
Examples¶
Take the following dataset:
X = [
    {'place': 'Taco Bell', 'revenue': 42},
    {'place': 'Burger King', 'revenue': 16},
    {'place': 'Burger King', 'revenue': 24},
    {'place': 'Taco Bell', 'revenue': 58},
    {'place': 'Burger King', 'revenue': 20},
    {'place': 'Taco Bell', 'revenue': 50}
]
As an example, let's assume we want to compute two aggregates of this dataset. We therefore define two feature_extraction.Agg transformers and initialize a TransformerUnion with them:
from river import compose
from river import feature_extraction
from river import stats
mean = feature_extraction.Agg(
    on='revenue', by='place',
    how=stats.Mean()
)
count = feature_extraction.Agg(
    on='revenue', by='place',
    how=stats.Count()
)
agg = compose.TransformerUnion(mean, count)
We can now update all the transformers and obtain their combined output with single calls to learn_one and transform_one:
from pprint import pprint
for x in X:
    agg = agg.learn_one(x)
    pprint(agg.transform_one(x))
{'revenue_count_by_place': 1, 'revenue_mean_by_place': 42.0}
{'revenue_count_by_place': 1, 'revenue_mean_by_place': 16.0}
{'revenue_count_by_place': 2, 'revenue_mean_by_place': 20.0}
{'revenue_count_by_place': 2, 'revenue_mean_by_place': 50.0}
{'revenue_count_by_place': 3, 'revenue_mean_by_place': 20.0}
{'revenue_count_by_place': 3, 'revenue_mean_by_place': 50.0}
Note that you can use the + operator as a shorthand notation:
agg = mean + count
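As a quick sanity check, the shorthand should build the same kind of object as the explicit constructor; the line below is a hedged illustration and should evaluate to True.
isinstance(mean + count, compose.TransformerUnion)
True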
This allows you to build complex pipelines in a very terse manner. For instance, we can create a pipeline that scales each feature and fits a logistic regression like so:
from river import linear_model as lm
from river import preprocessing as pp
model = (
    (mean + count) |
    pp.StandardScaler() |
    lm.LogisticRegression()
)
Which is equivalent to the following code:
model = compose.Pipeline(
    compose.TransformerUnion(mean, count),
    pp.StandardScaler(),
    lm.LogisticRegression()
)
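To actually fit this pipeline we need a binary target. The following is a hedged sketch that invents a label, namely whether revenue exceeds 30, purely for illustration; the resulting probabilities are not meaningful.
for x in X:
    y = x['revenue'] > 30  # made-up binary target, for illustration only
    model = model.learn_one(x, y)

model.predict_proba_one(X[0])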
Note that you can access any part of a TransformerUnion by name:
model['TransformerUnion']['Agg']
Agg (
    on="revenue"
    by=['place']
    how=Mean ()
)
model['TransformerUnion']['Agg1']
Agg (
    on="revenue"
    by=['place']
    how=Count ()
)
You can also manually provide a name for each step:
agg = compose.TransformerUnion(
    ('Mean revenue by place', mean),
    ('# by place', count)
)
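With explicit names, each part can then be retrieved with the name you chose. Here's a brief hedged example reusing the union defined just above; it should display the mean aggregator, in the same way as shown earlier.
agg['Mean revenue by place']
Agg (
    on="revenue"
    by=['place']
    how=Mean ()
)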
Mini-batch example:
import pandas as pd

X = pd.DataFrame([
    {"place": 2, "revenue": 42},
    {"place": 3, "revenue": 16},
    {"place": 3, "revenue": 24},
    {"place": 2, "revenue": 58},
    {"place": 3, "revenue": 20},
    {"place": 2, "revenue": 50},
])
Since we need a transformer with mini-batch support for this demonstration, we shall use a StandardScaler.
from river import compose
from river import preprocessing
agg = (
    compose.Select("place") +
    (compose.Select("revenue") | preprocessing.StandardScaler())
)
_ = agg.learn_many(X)
agg.transform_many(X)
   place   revenue
0      2  0.441250
1      3 -1.197680
2      3 -0.693394
3      2  1.449823
4      3 -0.945537
5      2  0.945537
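A mini-batch union can also sit at the head of a pipeline. The snippet below is a hedged sketch rather than part of the example above: the target series y is made up, and it assumes linear_model.LinearRegression, which supports learn_many and predict_many.
from river import linear_model

y = pd.Series([1.0, 0.0, 0.0, 1.0, 0.0, 1.0])  # made-up target, for illustration only

model = agg | linear_model.LinearRegression()
model.learn_many(X, y)
model.predict_many(X)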
Methods¶
learn_many
Update each transformer.
Parameters
- X — 'pd.DataFrame'
- y — 'pd.Series | None' — defaults to None
learn_one
Update each transformer.
Parameters
- x — 'dict'
- y — defaults to None
transform_many
Passes the data through each transformer and packs the results together.
Parameters
- X — 'pd.DataFrame'
transform_one
Passes the data through each transformer and packs the results together.
Parameters
- x — 'dict'