Shift¶
Shifts a data stream by returning past values.
This can be used to compute statistics over past data. For instance, if you're computing daily averages, then shifting by 7 will be equivalent to computing averages from a week ago.
Shifting values is useful when you're calculating an average over a target value. Indeed, in this case it's important to shift the values in order not to introduce leakage. The recommended way to do this is to feature_extraction.TargetAgg
, which already takes care of shifting the target values once.
Parameters¶
-
amount – defaults to
1
Shift amount. The
get
method will return thet - amount
value, wheret
is the current moment. -
fill_value – defaults to
None
This value will be returned by the
get
method if not enough values have been observed.
Attributes¶
- name
Examples¶
It is rare to have to use Shift
by itself. A more common usage is to compose it with
other statistics. This can be done via the |
operator.
>>> from river import stats
>>> stat = stats.Shift(1) | stats.Mean()
>>> for i in range(5):
... stat = stat.update(i)
... print(stat.get())
0.0
0.0
0.5
1.0
1.5
A common usecase for using Shift
is when computing statistics on shifted data. For
instance, say you have a dataset which records the amount of sales for a set of shops. You
might then have a shop
field and a sales
field. Let's say you want to look at the
average amount of sales per shop. You can do this by using a feature_extraction.Agg
. When
you call transform_one
, you're expecting it to return the average amount of sales,
without including today's sales. You can do this by prepending an instance of
stats.Mean
with an instance of stats.Shift
.
>>> from river import feature_extraction
>>> agg = feature_extraction.Agg(
... on='sales',
... how=stats.Shift(1) | stats.Mean(),
... by='shop'
... )
Let's define a little example dataset.
>>> X = iter([
... {'shop': 'Ikea', 'sales': 10},
... {'shop': 'Ikea', 'sales': 15},
... {'shop': 'Ikea', 'sales': 20}
... ])
Now let's call the learn_one
method to update our feature extractor.
>>> x = next(X)
>>> agg = agg.learn_one(x)
At this point, the average defaults to the initial value of stats.Mean
, which is 0.
>>> agg.transform_one(x)
{'sales_mean_of_shift_1_by_shop': 0.0}
We can now update our feature extractor with the next data point and check the output.
>>> agg = agg.learn_one(next(X))
>>> agg.transform_one(x)
{'sales_mean_of_shift_1_by_shop': 10.0}
>>> agg = agg.learn_one(next(X))
>>> agg.transform_one(x)
{'sales_mean_of_shift_1_by_shop': 12.5}
Methods¶
clone
Return a fresh estimator with the same parameters.
The clone has the same parameters but has not been updated with any data. This works by looking at the parameters from the class signature. Each parameter is either - recursively cloned if it's a River classes. - deep-copied via copy.deepcopy
if not. If the calling object is stochastic (i.e. it accepts a seed parameter) and has not been seeded, then the clone will not be idempotent. Indeed, this method's purpose if simply to return a new instance with the same input parameters.
get
Return the current value of the statistic.
revert
Revert and return the called instance.
Parameters
- x
update
Update and return the called instance.
Parameters
- x