EBSTSplitter
iSOUP-Tree's Extended Binary Search Tree (E-BST).
This class implements the Extended Binary Search Tree (E-BST) structure [1], using the variant employed by Osojnik et al. [2] in the iSOUP-Tree algorithm. This structure is used to observe the distribution of the target space.
Proposed alongside the Fast Incremental Model Tree with Drift Detection (FIMT-DD) [1], E-BST was the first attribute observer (AO) proposed for incremental Hoeffding Tree regressors. This AO stores all observations between splits in an extended binary search tree. E-BST keeps the input feature values and the target statistics needed to compute the split heuristic at any time. To reduce time and memory costs, E-BST implements a memory management routine that prunes the worst split candidates from the binary tree.
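To make "target statistics needed to compute the split heuristic" concrete, the sketch below keeps three weighted sums per partition (total weight, sum of targets, sum of squared targets) and derives a split merit from them. Variance reduction is assumed here as the heuristic purely for illustration; `VarStats` and `variance_reduction` are hypothetical names, not part of the library.

```python
from dataclasses import dataclass


@dataclass
class VarStats:
    """Hypothetical weighted sufficient statistics of a numeric target."""

    n: float = 0.0   # sum of sample weights
    s: float = 0.0   # weighted sum of targets
    sq: float = 0.0  # weighted sum of squared targets

    def update(self, y: float, w: float = 1.0) -> None:
        self.n += w
        self.s += w * y
        self.sq += w * y * y

    def __sub__(self, other: "VarStats") -> "VarStats":
        # Statistics of the complement partition (e.g. right = total - left)
        return VarStats(self.n - other.n, self.s - other.s, self.sq - other.sq)

    def variance(self) -> float:
        # Weighted population variance E[y^2] - E[y]^2; zero for an empty partition
        if self.n <= 0:
            return 0.0
        mean = self.s / self.n
        return max(self.sq / self.n - mean * mean, 0.0)


def variance_reduction(total: VarStats, left: VarStats) -> float:
    """Merit of splitting `total` into `left` and its complement."""
    if total.n <= 0:
        return 0.0
    right = total - left
    return (
        total.variance()
        - (left.n / total.n) * left.variance()
        - (right.n / total.n) * right.variance()
    )


# Usage: targets 1, 1 go left; targets 5, 5 go right
total, left = VarStats(), VarStats()
for y in (1.0, 1.0, 5.0, 5.0):
    total.update(y)
for y in (1.0, 1.0):
    left.update(y)
print(variance_reduction(total, left))  # 4.0
```

Because the right-side statistics follow from total minus left, only one side has to be stored per candidate, which is the property the left-only variant relies on.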
In this variant, only the left branch statistics are stored and the complete split-enabling statistics are calculated with an in-order traversal of the binary search tree.
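A minimal pure-Python sketch of the left-only idea follows, written for illustration and not taken from the library. Each hypothetical `Node` stores the statistics of the observations in its subtree whose feature value is less than or equal to its key; an in-order traversal then accumulates these to recover the full left and right statistics of every split candidate. Only total weight and target sum are tracked, to keep it short.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Stats:
    n: float = 0.0  # sum of weights
    s: float = 0.0  # weighted sum of targets

    def __add__(self, other: Stats) -> Stats:
        return Stats(self.n + other.n, self.s + other.s)

    def __sub__(self, other: Stats) -> Stats:
        return Stats(self.n - other.n, self.s - other.s)


@dataclass
class Node:
    key: float
    # Statistics of this node plus its left subtree (all x <= key below this node)
    left_stats: Stats = field(default_factory=Stats)
    left: Optional[Node] = None
    right: Optional[Node] = None


class LeftOnlyEBST:
    """Hypothetical E-BST that stores only left-branch statistics."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None
        self.total = Stats()

    def insert(self, x: float, y: float, w: float = 1.0) -> None:
        obs = Stats(w, w * y)
        self.total = self.total + obs
        if self.root is None:
            self.root = Node(x, obs)
            return
        node = self.root
        while True:
            if x <= node.key:
                # Observation falls on the left side of this split point
                node.left_stats = node.left_stats + obs
                if x == node.key:
                    return
                if node.left is None:
                    node.left = Node(x, obs)
                    return
                node = node.left
            else:
                # Right-side observations are not recorded at this node
                if node.right is None:
                    node.right = Node(x, obs)
                    return
                node = node.right

    def candidates(self) -> Iterator[tuple[float, Stats, Stats]]:
        """Yield (key, left, right) for every split point, in key order."""

        def walk(node: Optional[Node], prefix: Stats) -> Iterator[tuple[float, Stats, Stats]]:
            # `prefix` holds every observation smaller than all keys in this subtree
            if node is None:
                return
            yield from walk(node.left, prefix)
            left = prefix + node.left_stats
            yield node.key, left, self.total - left
            yield from walk(node.right, left)

        yield from walk(self.root, Stats())


tree = LeftOnlyEBST()
for x, y in [(3.0, 30.0), (1.0, 10.0), (2.0, 20.0), (5.0, 50.0), (4.0, 40.0)]:
    tree.insert(x, y)
for key, left, right in tree.candidates():
    print(key, left.n, right.n)
```

Keeping a single set of statistics per node avoids storing the right side twice; the cost is that right-branch statistics only become available during the traversal, as total minus left.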
Attributes
- is_numeric: Determine whether or not the splitter works with numerical features.
- is_target_class: Indicate which kind of learning task the splitter is designed for. If True, the splitter works with classification trees; otherwise, it is designed for regression trees.
Methods
best_evaluated_split_suggestion
Get the best split suggestion given a criterion and the target's statistics.
Parameters
- criterion: 'SplitCriterion'
- pre_split_dist: 'list | dict'
- att_idx: 'base.typing.FeatureName'
- binary_only: 'bool', defaults to True
Returns
BranchFactory: Suggestion of the best attribute split.
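Conceptually, the suggestion amounts to scoring every stored split point and keeping the best one. The brute-force sketch below does this over plain lists: it aggregates observations per distinct feature value (much as each E-BST node holds one key) and scores each threshold with variance reduction. `best_split` is a hypothetical helper, variance reduction is an assumed criterion, and the sketch returns a plain tuple rather than a `BranchFactory`.

```python
def _variance(n: float, s: float, sq: float) -> float:
    # Weighted population variance from sufficient statistics
    if n <= 0:
        return 0.0
    mean = s / n
    return max(sq / n - mean * mean, 0.0)


def best_split(xs, ys, ws=None):
    """Return (threshold, merit); observations with x <= threshold go left."""
    if ws is None:
        ws = [1.0] * len(xs)
    # One statistics bucket per distinct feature value, like one E-BST node per key
    agg = {}
    for x, y, w in zip(xs, ys, ws):
        n, s, sq = agg.get(x, (0.0, 0.0, 0.0))
        agg[x] = (n + w, s + w * y, sq + w * y * y)
    tn = sum(v[0] for v in agg.values())
    ts = sum(v[1] for v in agg.values())
    tsq = sum(v[2] for v in agg.values())
    total_var = _variance(tn, ts, tsq)

    best_threshold, best_merit = None, float("-inf")
    ln = ls = lsq = 0.0
    # Scan keys in sorted order; the largest key would leave the right side empty
    for key in sorted(agg)[:-1]:
        n, s, sq = agg[key]
        ln, ls, lsq = ln + n, ls + s, lsq + sq
        rn, rs, rsq = tn - ln, ts - ls, tsq - lsq
        merit = (
            total_var
            - (ln / tn) * _variance(ln, ls, lsq)
            - (rn / tn) * _variance(rn, rs, rsq)
        )
        if merit > best_merit:
            best_threshold, best_merit = key, merit
    return best_threshold, best_merit


print(best_split([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 5.0, 5.0]))  # (2.0, 4.0)
```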
cond_proba
Not implemented in regression splitters.
Parameters
- att_val
- target_val: 'base.typing.ClfTarget'
remove_bad_splits
Remove bad splits.
Based on the procedure used by FIMT-DD [1] to remove bad split candidates from the E-BST. This mechanism is triggered every time a split attempt fails. The rationale is to remove points whose split merit is much worse than that of the best candidate overall (for which the growth decision already failed). Let
Parameters
- criterion
- last_check_ratio: 'float'
- last_check_vr: 'float'
- last_check_e: 'float'
- pre_split_dist: 'list | dict'
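The filtering step can be sketched as follows, under an assumption drawn from the FIMT-DD literature: r is the ratio between the second-best and the best merit, ε is the Hoeffding bound, and a candidate with merit m_i is flagged as bad when m_i / m_best < r - 2ε. The mapping of `last_check_ratio`, `last_check_vr` and `last_check_e` onto r, m_best and ε is inferred from the parameter names only; `bad_split_keys` and `hoeffding_bound` are hypothetical helpers, and the actual pruning of tree nodes is omitted.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: float) -> float:
    # epsilon = sqrt(R^2 * ln(1/delta) / (2n)), the standard Hoeffding bound
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def bad_split_keys(merits, best_merit, ratio, epsilon):
    """Keys whose merit ratio to the best candidate is below ratio - 2 * epsilon.

    Assumed mapping: ratio ~ last_check_ratio, best_merit ~ last_check_vr,
    epsilon ~ last_check_e (inferred from the parameter names).
    """
    if best_merit <= 0:
        return []
    cutoff = ratio - 2.0 * epsilon
    return [key for key, merit in merits.items() if merit / best_merit < cutoff]


merits = {1.0: 0.2, 2.0: 4.0, 3.0: 3.6, 4.0: 1.0}  # hypothetical candidate merits
print(bad_split_keys(merits, best_merit=4.0, ratio=3.6 / 4.0, epsilon=0.05))  # [1.0, 4.0]
```

The 2ε margin makes the filter conservative: a candidate is only discarded when its merit ratio sits clearly below the lower end of the interval within which the observed ratio r is trusted.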
update
Update statistics of this observer given an attribute value, its target value and the weight of the instance observed.
Parameters
- att_val
- target_val: 'base.typing.Target'
- sample_weight: 'float'
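As a hedged illustration of `sample_weight`, the sketch below assumes the common convention that a weight w counts as w repeated observations in the sufficient statistics. For brevity it uses a hypothetical `TinyObserver` that keeps one statistics bucket per distinct `att_val` in a dict instead of a binary search tree.

```python
from dataclasses import dataclass


@dataclass
class TargetStats:
    """Hypothetical weighted target statistics kept for one feature value."""

    n: float = 0.0   # total weight
    s: float = 0.0   # weighted sum of targets
    sq: float = 0.0  # weighted sum of squared targets

    def update(self, y: float, w: float = 1.0) -> None:
        # Assumed convention: weight w contributes like w copies of the observation
        self.n += w
        self.s += w * y
        self.sq += w * y * y


class TinyObserver:
    """Hypothetical observer with one bucket per distinct att_val (no tree)."""

    def __init__(self) -> None:
        self.buckets = {}

    def update(self, att_val, target_val, sample_weight=1.0):
        bucket = self.buckets.setdefault(att_val, TargetStats())
        bucket.update(target_val, sample_weight)


weighted, repeated = TinyObserver(), TinyObserver()
weighted.update(1.5, 3.0, sample_weight=2.0)
repeated.update(1.5, 3.0)
repeated.update(1.5, 3.0)
print(weighted.buckets[1.5] == repeated.buckets[1.5])  # True
```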