Skip to content

FuncTransformer

Wraps a function to make it usable in a pipeline.

There is often a need to apply an arbitrary transformation to a set of features. For instance, this could involve parsing a date and then extracting the hour from said date. If you're processing a stream of data, then you can do this yourself by calling the necessary code at your leisure. On the other hand, if you want to do this as part of a pipeline, then you need to follow a simple convention.

To use a function as part of a pipeline, take as input a dict of features and output a dict. Once you have initialized this class with your function, then you can use it like you would use any other (unsupervised) transformer.

It is up to you if you want your function to be pure or not. By pure we refer to a function that doesn't modify its input. However, we recommend writing pure functions because this reduces the chances of inserting bugs into your pipeline.

Parameters

  • func

    Typetyping.Callable[[dict], dict]

    A function that takes as input a dict and outputs a dict.

Examples

from pprint import pprint
import datetime as dt
from river import compose

x = {'date': '2019-02-14'}

def parse_date(x):
    date = dt.datetime.strptime(x['date'], '%Y-%m-%d')
    x['is_weekend'] = date.day in (5, 6)
    x['hour'] = date.hour
    return x

t = compose.FuncTransformer(parse_date)
pprint(t.transform_one(x))
{'date': '2019-02-14', 'hour': 0, 'is_weekend': False}

The above example is not pure because it modifies the input. The following example is pure and produces the same output:

def parse_date(x):
    date = dt.datetime.strptime(x['date'], '%Y-%m-%d')
    return {'is_weekend': date.day in (5, 6), 'hour': date.hour}

t = compose.FuncTransformer(parse_date)
pprint(t.transform_one(x))
{'hour': 0, 'is_weekend': False}

The previous example doesn't include the date feature because it returns a new dict. However, a common usecase is to add a feature to an existing set of features. You can do this in a pure way by unpacking the input dict into the output dict:

def parse_date(x):
    date = dt.datetime.strptime(x['date'], '%Y-%m-%d')
    return {'is_weekend': date.day in (5, 6), 'hour': date.hour, **x}

t = compose.FuncTransformer(parse_date)
pprint(t.transform_one(x))
{'date': '2019-02-14', 'hour': 0, 'is_weekend': False}

You can add FuncTransformer to a pipeline just like you would with any other transformer.

from river import naive_bayes

pipeline = compose.FuncTransformer(parse_date) | naive_bayes.MultinomialNB()
pipeline
Pipeline (
  FuncTransformer (
    func="parse_date"
  ),
  MultinomialNB (
    alpha=1.
  )
)

If you provide a function without wrapping it, then the pipeline will do it for you:

pipeline = parse_date | naive_bayes.MultinomialNB()

Methods

learn_many

Update with a mini-batch of features.

A lot of transformers don't actually have to do anything during the learn_many step because they are stateless. For this reason the default behavior of this function is to do nothing. Transformers that however do something during the learn_many can override this method.

Parameters

  • X'pd.DataFrame'

learn_one

Update with a set of features x.

A lot of transformers don't actually have to do anything during the learn_one step because they are stateless. For this reason the default behavior of this function is to do nothing. Transformers that however do something during the learn_one can override this method.

Parameters

  • x'dict'

transform_many

Transform a mini-batch of features.

Parameters

  • X'pd.DataFrame'

Returns

pd.DataFrame: A new DataFrame.

transform_one

Transform a set of features x.

Parameters

  • x'dict'

Returns

dict: The transformed values.