Adaptive Model Rules.
AMRules [1] is a rule-based algorithm for incremental regression tasks. It relies on the Hoeffding bound to build its rule set, similarly to Hoeffding Trees, and uses the Variance-Ratio heuristic to evaluate candidate splits. This rule-based regressor also has capabilities not usually found in decision trees.
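To make the role of the Hoeffding bound concrete, here is a minimal sketch of an expansion test. It assumes the standard form of the bound and a test on the ratio between the two best split merits, as is common in Hoeffding-based learners. The function names and the exact decision rule are illustrative assumptions, not River's internal code.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: float) -> float:
    """Hoeffding bound for a variable with the given range after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_expand(best: float, second_best: float, delta: float, n: float, tau: float) -> bool:
    """Hypothetical expansion test on the ratio of the two best split merits."""
    if best <= 0:
        return False  # no candidate split reduces the variance
    ratio = second_best / best  # lies in [0, 1], so the range is 1
    eps = hoeffding_bound(1.0, delta, n)
    # Expand when the best split is confidently better than the runner-up,
    # or when the bound is already tighter than the tie-breaking threshold.
    return ratio + eps < 1.0 or eps < tau


# Values below are made up for illustration only.
print(should_expand(best=1.0, second_best=0.5, delta=0.01, n=50, tau=0.05))
print(should_expand(best=1.0, second_best=0.9, delta=0.01, n=50, tau=0.05))
```

With few observations the bound is wide, so only clearly superior splits pass. As more weight is observed the bound shrinks and eventually falls below the tie-breaking threshold.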
Firstly, each created decision rule has a built-in drift detection mechanism. Every time a drift is detected, the affected decision rule is removed. In addition, AMRules' rules also have anomaly detection capabilities. After a warm-up period, each rule tests whether or not the incoming instances are anomalies. Anomalous instances are not used for training.
Whenever no rule covers an incoming example, a default rule is used to learn from it. A rule covers an instance when all of the rule's literals (tests joined by the logical operation and) match the input case. The default rule is also used to predict examples not covered by any rule in the rule set.
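The coverage test and the fallback to the default rule can be sketched in a few lines. The `Literal` type, the `covers` and `select_rules` helpers, and the example rules are all hypothetical and only illustrate the logic described above; they do not mirror River's internal classes.

```python
import operator
from typing import Callable, Dict, List, NamedTuple


class Literal(NamedTuple):
    """A single test on one feature, e.g. temp <= 20."""
    feature: str
    op: Callable[[float, float], bool]
    value: float


def covers(literals: List[Literal], x: dict) -> bool:
    # A rule covers x only if every literal matches (logical "and").
    return all(lit.feature in x and lit.op(x[lit.feature], lit.value) for lit in literals)


def select_rules(rule_set: Dict[str, List[Literal]], x: dict) -> List[str]:
    # Fall back to the default rule when no rule in the set covers x.
    matched = [name for name, literals in rule_set.items() if covers(literals, x)]
    return matched or ["default"]


rule_set = {
    "r1": [Literal("temp", operator.le, 20.0), Literal("humidity", operator.gt, 0.5)],
    "r2": [Literal("temp", operator.gt, 20.0)],
}

print(select_rules(rule_set, {"temp": 15.0, "humidity": 0.8}))  # ['r1']
print(select_rules(rule_set, {"temp": 15.0, "humidity": 0.2}))  # ['default']
```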
n_min (int) – defaults to
The total weight that must be observed by a rule between expansion attempts.
delta (float) – defaults to
The split test significance. The split confidence is given by
1 - delta.
tau (float) – defaults to
The tie-breaking threshold.
pred_type (str) – defaults to
The prediction strategy used by the decision rules. Can be either:
- "mean": outputs the target mean within the partitions defined by the decision rules.
- "model": always uses instances of the model passed as pred_model to make predictions.
- "adaptive": dynamically selects between "mean" and "model" for each incoming example. The most accurate option at the moment will be used.
pred_model (base.Regressor) – defaults to
The regression model that will be replicated for every rule when pred_type is either "model" or "adaptive".
splitter (river.tree.splitter.base.Splitter) – defaults to
The Splitter or Attribute Observer (AO) used to monitor the class statistics of numeric features and perform splits. Splitters are available in the tree.splitter module. Different splitters are available for classification and regression tasks. Classification and regression splitters can be distinguished by their property is_target_class. This is an advanced option. Special care must be taken when choosing different splitters. By default, tree.splitter.TEBSTSplitter is used if splitter is None.
drift_detector (base.DriftDetector) – defaults to
The drift detection model that is used by each rule. Care must be taken to avoid triggering too many false alarms or delaying the detection of concept drift for too long. By default, drift.ADWIN is used if drift_detector is None.
alpha (float) – defaults to
The exponential decaying factor applied to the learning models' absolute errors, which are monitored if pred_type='adaptive'. Must be between 0 and 1. The closer to 1, the more importance is given to past observations. Conversely, as its value approaches 0, the most recently observed errors have more influence on the final decision.
anomaly_threshold (float) – defaults to
The threshold below which instances will be considered anomalies by the rules.
m_min (int) – defaults to
The minimum total weight a rule must observe before it starts to skip anomalous instances during training.
ordered_rule_set (bool) – defaults to
If True, only the first rule that covers an instance will be used for training or prediction. If
False, all the rules covering an instance will be updated during training, and the predictions for an instance will be the average prediction of all rules covering that example.
min_samples_split (int) – defaults to
The minimum number of samples each partition of a binary split candidate must have to be considered valid.
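The interaction between alpha and pred_type="adaptive" can be illustrated with a small faded-error tracker. This is a sketch under an assumed update rule (the previous error is multiplied by alpha before the new absolute error is added). The `FadedError` class and the error sequences are hypothetical, and the library may compute the faded error differently.

```python
class FadedError:
    """Exponentially faded sum of absolute errors (assumed update rule)."""

    def __init__(self, alpha: float):
        self.alpha = alpha
        self.value = 0.0

    def update(self, y_true: float, y_pred: float) -> None:
        # Older errors are discounted by alpha at every step.
        self.value = self.alpha * self.value + abs(y_true - y_pred)


def pick_strategy(alpha: float, mean_errors, model_errors) -> str:
    """Return the strategy with the lowest faded error after replaying the errors."""
    mean_tracker, model_tracker = FadedError(alpha), FadedError(alpha)
    for e_mean, e_model in zip(mean_errors, model_errors):
        mean_tracker.update(e_mean, 0.0)
        model_tracker.update(e_model, 0.0)
    return "mean" if mean_tracker.value <= model_tracker.value else "model"


# Made-up absolute errors: "mean" starts badly and then becomes exact.
mean_errors = [4.0, 4.0, 0.0, 0.0, 0.0]
model_errors = [1.0, 1.0, 1.0, 1.0, 1.0]

print(pick_strategy(0.5, mean_errors, model_errors))   # 'mean'
print(pick_strategy(0.99, mean_errors, model_errors))  # 'model'
```

With a small alpha the early mistakes of the mean predictor are forgotten quickly, so it wins. With alpha close to 1 those mistakes still dominate after five examples.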
The number of detected concept drifts.
>>> from river import datasets
>>> from river import drift
>>> from river import evaluate
>>> from river import metrics
>>> from river import preprocessing
>>> from river import rules

>>> dataset = datasets.TrumpApproval()

>>> model = (
...     preprocessing.StandardScaler() |
...     rules.AMRules(
...         delta=0.01,
...         n_min=50,
...         drift_detector=drift.ADWIN()
...     )
... )

>>> metric = metrics.MAE()

>>> evaluate.progressive_val_score(dataset, model, metric)
MAE: 1.117705
Aggregated anomaly score computed using all the rules that cover the input instance.
Returns the mean anomaly score, the standard deviation of the score, and the proportion of rules that cover the instance (support). A support of zero means that the default rule was used, since no other rule covered the instance.
typing.Tuple[float, float, float]: mean_anomaly_score, std_anomaly_score, support
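The shape of the returned tuple can be illustrated with a small aggregation helper. This is only a sketch: the function name, the use of the population standard deviation, and the handling of the default rule are assumptions, and the per-rule scores are made up.

```python
import statistics
from typing import List, Tuple


def aggregate_scores(
    covering_scores: List[float], n_rules: int, default_score: float
) -> Tuple[float, float, float]:
    """Combine per-rule anomaly scores into (mean, std, support)."""
    if not covering_scores:
        # No rule covers the instance: only the default rule contributes.
        return default_score, 0.0, 0.0
    mean = statistics.fmean(covering_scores)
    std = statistics.pstdev(covering_scores)  # population std (assumption)
    support = len(covering_scores) / n_rules
    return mean, std, support


# Two of four rules cover the instance.
print(aggregate_scores([-0.5, -1.0], n_rules=4, default_score=-0.2))
# No rule covers the instance, so the support is zero.
print(aggregate_scores([], n_rules=4, default_score=-0.2))
```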
Return an explanation of how x is predicted.
str: A representation of the rules that cover the input and their prediction.
Fits to a set of features x and a real-valued target y.
- x (dict)
- y (numbers.Number)
- w (int) – defaults to
Predict the output of features x.
- x (dict)
Number: The prediction.
AMRules treats all non-numerical inputs as nominal features. All instances of numbers.Number will be treated as continuous, even if they represent integer categories.
When using nominal features, pred_type should be set to "mean"; otherwise, errors will be thrown while trying to update the underlying rules' prediction models. Prediction strategies other than "mean" can be used, as long as the prediction model passed to pred_model supports nominal features.
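The distinction between continuous and nominal inputs can be checked ahead of time with the standard numbers module. The helpers below are hypothetical and assume the type check amounts to an isinstance test against numbers.Number, as the note suggests. Casting an integer category to a string is one way to have it treated as nominal.

```python
import numbers


def split_features(x: dict):
    """Separate features into (continuous, nominal) by their Python type."""
    numeric = {k: v for k, v in x.items() if isinstance(v, numbers.Number)}
    nominal = {k: v for k, v in x.items() if not isinstance(v, numbers.Number)}
    return numeric, nominal


def as_nominal(x: dict, keys) -> dict:
    """Cast the chosen features to str so they are no longer numbers.Number."""
    return {k: (str(v) if k in keys else v) for k, v in x.items()}


x = {"temp": 21.5, "zone": 3, "color": "red"}

numeric, nominal = split_features(x)
print(sorted(numeric), sorted(nominal))  # ['temp', 'zone'] ['color']

numeric, nominal = split_features(as_nominal(x, {"zone"}))
print(sorted(numeric), sorted(nominal))  # ['temp'] ['color', 'zone']
```

Note that Python booleans are also instances of numbers.Number, so they would fall on the continuous side under this check.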
[1] Duarte, J., Gama, J. and Bifet, A., 2016. Adaptive model rules from high-speed data streams. ACM Transactions on Knowledge Discovery from Data (TKDD), 10(3), pp.1-22.