EBSTSplitter
iSOUP-Tree's Extended Binary Search Tree (E-BST).
This class implements the Extended Binary Search Tree [1] (E-BST) structure, using the variant employed by Osojnik et al. [2] in the iSOUP-Tree algorithm. The structure is used to monitor the distribution of the target(s) with respect to the values of a numerical input feature.
Proposed along with Fast Incremental Model Tree with Drift Detection [1] (FIMT-DD), E-BST was the first attribute observer (AO) proposed for incremental Hoeffding Tree regressors. This AO stores all observations seen between splits in an extended binary search tree. E-BST keeps the input feature values together with the statistics of the target(s) needed to calculate the split heuristic at any time. To reduce time and memory costs, E-BST implements a memory management routine that prunes the worst split candidates from the binary tree.
In this variant, only the left branch statistics are stored and the complete split-enabling statistics are calculated with an in-order traversal of the binary search tree.
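The Python sketch below illustrates the left-only storage scheme under simplifying assumptions: target statistics are kept as a weighted count, sum and sum of squares, the tree is not balanced, and no memory management is performed. The names Stats, Node, EBST, insert and candidates are hypothetical and are not River's API. Each node accumulates the statistics of the observations routed through it whose feature value is less than or equal to its key; an in-order traversal then recovers the cumulative left-branch statistics of every threshold, and the right-branch statistics are obtained by subtracting them from the totals.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional, Tuple


@dataclass
class Stats:
    """Hypothetical weighted target statistics: total weight, weighted sum, weighted sum of squares."""

    w: float = 0.0
    s: float = 0.0
    sq: float = 0.0

    def add(self, y: float, weight: float = 1.0) -> None:
        self.w += weight
        self.s += weight * y
        self.sq += weight * y * y

    def __add__(self, other: Stats) -> Stats:
        return Stats(self.w + other.w, self.s + other.s, self.sq + other.sq)

    def __sub__(self, other: Stats) -> Stats:
        return Stats(self.w - other.w, self.s - other.s, self.sq - other.sq)


@dataclass
class Node:
    key: float
    # Statistics of the observations in this subtree whose feature value is <= key.
    left_stats: Stats = field(default_factory=Stats)
    left: Optional[Node] = None
    right: Optional[Node] = None


class EBST:
    def __init__(self) -> None:
        self.root: Optional[Node] = None
        self.total = Stats()  # statistics of every observation (pre-split distribution)

    def insert(self, x: float, y: float, weight: float = 1.0) -> None:
        self.total.add(y, weight)
        if self.root is None:
            self.root = Node(x)
            self.root.left_stats.add(y, weight)
            return
        node = self.root
        while True:
            if x <= node.key:
                # x lies on the left-hand side of this node's threshold.
                node.left_stats.add(y, weight)
                if x == node.key:
                    return
                if node.left is None:
                    node.left = Node(x)
                    node.left.left_stats.add(y, weight)
                    return
                node = node.left
            else:
                # Going right leaves this node's left-branch statistics unchanged.
                if node.right is None:
                    node.right = Node(x)
                    node.right.left_stats.add(y, weight)
                    return
                node = node.right

    def _in_order(self, node: Optional[Node], acc: Stats) -> Iterator[Tuple[float, Stats]]:
        # acc holds the statistics of all observations below this subtree's key range.
        if node is None:
            return
        yield from self._in_order(node.left, acc)
        cumulative = acc + node.left_stats  # everything with feature value <= node.key
        yield node.key, cumulative
        yield from self._in_order(node.right, cumulative)

    def candidates(self) -> Iterator[Tuple[float, Stats, Stats]]:
        """Yield (threshold, left-branch stats, right-branch stats) in increasing threshold order."""
        for key, left in self._in_order(self.root, Stats()):
            yield key, left, self.total - left
```

Inserting a value only updates the nodes on its search path where it goes left or matches the key, which keeps updates cheap. The largest threshold always yields an empty right branch, so a real splitter would skip it when evaluating merits.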
Attributes
- is_numeric
  Determine whether or not the splitter works with numerical features.
- is_target_class
  Check which kind of learning task the splitter is designed for. If True, the splitter works with classification trees; otherwise, it is designed for regression trees.
Methods
best_evaluated_split_suggestion
Get the best split suggestion given a criterion and the target's statistics.
Parameters
- criterion (river.tree.split_criterion.base.SplitCriterion)
- pre_split_dist (Union[List, Dict])
- att_idx (Hashable)
- binary_only (bool): defaults to True
Returns
BranchFactory: Suggestion of the best attribute split.
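To illustrate how a split suggestion can be derived from stored target statistics, the sketch below scores each candidate threshold with variance reduction. That criterion is an assumption made for illustration; the method itself uses whichever SplitCriterion is passed in. Statistics are hypothetical (weight, sum, sum of squares) tuples, each candidate pairs a threshold with the cumulative statistics of the observations at or below it, and the function names are illustrative only.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical representation: (total weight, weighted sum, weighted sum of squares) of the target.
TargetStats = Tuple[float, float, float]


def variance(stats: TargetStats) -> float:
    w, s, sq = stats
    if w <= 0:
        return 0.0
    mean = s / w
    # Clamp tiny negative values caused by floating-point rounding.
    return max(sq / w - mean * mean, 0.0)


def variance_reduction(total: TargetStats, left: TargetStats) -> float:
    # Right-branch statistics are whatever is not in the left branch.
    right = (total[0] - left[0], total[1] - left[1], total[2] - left[2])
    w = total[0]
    return (
        variance(total)
        - (left[0] / w) * variance(left)
        - (right[0] / w) * variance(right)
    )


def best_split(
    total: TargetStats, candidates: Iterable[Tuple[float, TargetStats]]
) -> Optional[Tuple[float, float]]:
    """Return (threshold, merit) of the best binary split, or None if none is valid."""
    best: Optional[Tuple[float, float]] = None
    for threshold, left in candidates:
        if left[0] <= 0 or left[0] >= total[0]:
            continue  # one of the branches would be empty
        merit = variance_reduction(total, left)
        if best is None or merit > best[1]:
            best = (threshold, merit)
    return best
```

For the observations (x, y) = (3, 1), (1, 2), (5, 3), (1, 4), (4, 5), the threshold 3 gives the largest variance reduction, 2/3, while thresholds 1 and 4 give none.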
cond_proba
Not implemented in regression splitters.
Parameters
- att_val
- target_val (Union[bool, str, int])
remove_bad_splits
Remove bad splits.
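Based on FIMT-DD's [1] procedure for removing bad split candidates from the E-BST. This mechanism is triggered every time a split attempt fails. The rationale is to remove points whose split merit is much worse than that of the best candidate overall (for which the growth decision already failed). Let $m_1$ be the merit of the best split candidate and $m_2$ the merit of the second best. The ratio $r = m_2/m_1$, together with the Hoeffding bound $\epsilon$, decides whether a split is made: a split occurs when $r < 1 - \epsilon$. A split candidate with merit $m_i$ is considered bad if $m_i/m_1 < r - 2\epsilon$. In that case its merit ratio falls below the lower bound of $r$, so its true merit relative to the best candidate is small and the candidate can be safely removed.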
Parameters
- criterion
- last_check_ratio (float)
- last_check_vr (float)
- last_check_e (float)
- pre_split_dist (Union[List, Dict])
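In FIMT-DD, a split attempt fails when the ratio $r = m_2/m_1$ between the merits of the second-best and best candidates is not below $1 - \epsilon$, where $\epsilon$ is the Hoeffding bound, and a candidate with merit $m_i$ is then treated as bad when $m_i/m_1 < r - 2\epsilon$. The sketch below applies that test to a dictionary of candidate merits. Mapping last_check_ratio, last_check_vr and last_check_e to $r$, $m_1$ and $\epsilon$ is an assumption based on the parameter names, and the helpers should_split and find_bad_splits are hypothetical. The sketch only flags candidates and leaves the tree itself untouched.

```python
from typing import Dict, List


def should_split(merit_ratio: float, hoeffding_bound: float) -> bool:
    """Split when r = m_2 / m_1 is confidently below 1 (tie-breaking is ignored here)."""
    return merit_ratio < 1 - hoeffding_bound


def find_bad_splits(
    merits: Dict[float, float],
    last_check_ratio: float,
    last_check_vr: float,
    last_check_e: float,
) -> List[float]:
    """Return the thresholds whose merit ratio m_i / m_1 falls below r - 2 * epsilon."""
    if last_check_vr <= 0:
        return []  # no meaningful reference merit, so keep every candidate
    bound = last_check_ratio - 2 * last_check_e
    return sorted(t for t, m in merits.items() if m / last_check_vr < bound)
```

With merits {1.0: 2.0, 2.0: 1.9, 3.0: 1.6, 4.0: 0.5, 5.0: 1.2} and $\epsilon = 0.1$, the ratio $r = 0.95$ does not trigger a split, and the thresholds 4.0 and 5.0 (ratios 0.25 and 0.6, both below 0.75) are flagged as bad.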
update
Update statistics of this observer given an attribute value, its target value and the weight of the instance observed.
Parameters
- att_val
- target_val (Union[bool, str, int, numbers.Number])
- sample_weight (float)
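Because update receives a sample_weight, the target statistics kept in each node need to support weighted updates. The left-only storage scheme also needs a way to subtract one set of statistics from another. The sketch below shows one way to do both with a hypothetical WeightedVar class: West's weighted incremental algorithm for the mean and variance, and the inverse of Chan et al.'s merge formula for subtraction. River's own statistics objects may be implemented differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WeightedVar:
    """Hypothetical running weighted mean and variance of a target."""

    n: float = 0.0     # sum of sample weights
    mean: float = 0.0
    m2: float = 0.0    # weighted sum of squared deviations from the mean

    def update(self, y: float, sample_weight: float = 1.0) -> None:
        if sample_weight <= 0:
            return  # this sketch ignores non-positive weights
        self.n += sample_weight
        delta = y - self.mean
        self.mean += sample_weight * delta / self.n
        # West's update: uses the deviation before and after moving the mean.
        self.m2 += sample_weight * delta * (y - self.mean)

    def __sub__(self, other: WeightedVar) -> WeightedVar:
        """Statistics of the observations in self that are not in other."""
        n = self.n - other.n
        if n <= 0:
            return WeightedVar()
        mean = (self.n * self.mean - other.n * other.mean) / n
        delta = mean - other.mean
        # Invert Chan et al.'s merge: M2 = M2_a + M2_b + delta^2 * n_a * n_b / n.
        m2 = self.m2 - other.m2 - delta * delta * other.n * n / self.n
        return WeightedVar(n, mean, max(m2, 0.0))

    @property
    def variance(self) -> float:
        return self.m2 / self.n if self.n > 0 else 0.0
```

Tracking the mean and the sum of squared deviations, rather than raw sums of squares, is less prone to catastrophic cancellation when target values are large relative to their spread.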