BernoulliNB
Bernoulli Naive Bayes.
The Bernoulli Naive Bayes model learns from co-occurrences between binary features, such as whether a word appears in a document, and discrete classes. The input vector must contain positive values, such as counts or TF-IDF values, which are binarized according to true_threshold.
Parameters
- alpha
Default → 1.0
Additive (Laplace/Lidstone) smoothing parameter (use 0 for no smoothing).
- true_threshold
Default → 0.0
Threshold for binarizing (mapping to booleans) features.
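To make the two parameters concrete, here is a minimal sketch in plain Python rather than river's own code. The helper names `binarize` and `smoothed_estimate` are hypothetical. The sketch assumes that a feature counts as present when its value is strictly greater than the threshold, and that smoothing follows the usual Bernoulli form with two outcomes per feature.

```python
def binarize(x, true_threshold=0.0):
    """Map each feature value to True when it is strictly above the threshold (assumed rule)."""
    return {feature: value > true_threshold for feature, value in x.items()}


def smoothed_estimate(n_present, n_class, alpha=1.0):
    """Additively smoothed estimate of P(feature present | class).

    The factor 2 reflects the two possible outcomes of a binary feature.
    """
    return (n_present + alpha) / (n_class + 2 * alpha)


counts = {"Chinese": 2, "Beijing": 1}
print(binarize(counts))                      # both features map to True
print(binarize(counts, true_threshold=1.0))  # only "Chinese" stays True

# A feature never seen with a class that has 3 documents:
print(smoothed_estimate(0, 3, alpha=0.0))  # 0.0, which would zero out the whole product
print(smoothed_estimate(0, 3, alpha=1.0))  # 0.2
```

With alpha set to 0, a single unseen feature drives the class likelihood to zero, which is why a small positive value is the usual choice.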
Attributes
- class_counts (collections.Counter)
Number of times each class has been seen.
- feature_counts (collections.defaultdict)
Total frequencies per feature and class.
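The following sketch shows what these two attributes might hold after training on a small corpus. It is an illustration written with the standard library only, not river's implementation. It assumes that feature_counts maps each feature to a per-class Counter and that a word is counted once per document, as Bernoulli counting implies.

```python
from collections import Counter, defaultdict

docs = [
    ("Chinese Beijing Chinese", "yes"),
    ("Chinese Chinese Shanghai", "yes"),
    ("Chinese Macao", "yes"),
    ("Tokyo Japan Chinese", "no"),
]

class_counts = Counter()
feature_counts = defaultdict(Counter)

for sentence, label in docs:
    class_counts[label] += 1
    # Bernoulli counting: a word counts once per document, however often it repeats.
    for word in set(sentence.split()):
        feature_counts[word][label] += 1

print(class_counts)               # Counter({'yes': 3, 'no': 1})
print(feature_counts["Chinese"])  # Counter({'yes': 3, 'no': 1})
print(feature_counts["Tokyo"])    # Counter({'no': 1})
```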
Examples
import pandas as pd
from river import compose
from river import feature_extraction
from river import naive_bayes
docs = [
    ("Chinese Beijing Chinese", "yes"),
    ("Chinese Chinese Shanghai", "yes"),
    ("Chinese Macao", "yes"),
    ("Tokyo Japan Chinese", "no")
]
model = compose.Pipeline(
    ("tokenize", feature_extraction.BagOfWords(lowercase=False)),
    ("nb", naive_bayes.BernoulliNB(alpha=1))
)
for sentence, label in docs:
    model = model.learn_one(sentence, label)
model["nb"].p_class("yes")
0.75
model["nb"].p_class("no")
0.25
model.predict_proba_one("test")
{'yes': 0.883..., 'no': 0.116...}
model.predict_one("test")
'yes'
You can train the model and make predictions in mini-batch mode using the learn_many and predict_many methods.
df_docs = pd.DataFrame(docs, columns=["docs", "y"])
X = pd.Series([
    "Chinese Beijing Chinese",
    "Chinese Chinese Shanghai",
    "Chinese Macao",
    "Tokyo Japan Chinese"
])
y = pd.Series(["yes", "yes", "yes", "no"])
model = compose.Pipeline(
    ("tokenize", feature_extraction.BagOfWords(lowercase=False)),
    ("nb", naive_bayes.BernoulliNB(alpha=1))
)
model = model.learn_many(X, y)
unseen = pd.Series(["Taiwanese Taipei", "Chinese Shanghai"])
model.predict_proba_many(unseen)
         no       yes
0  0.116846  0.883154
1  0.047269  0.952731
model.predict_many(unseen)
0    yes
1    yes
dtype: object
Methods
joint_log_likelihood
Computes the joint log likelihood of input features.
Parameters
- x — 'dict'
Returns
dict: Mapping between classes and joint log likelihood.
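As an illustration of what such a mapping contains, the sketch below computes a joint log likelihood by hand in plain Python. It is not river's code. The counts are those the four documents in the Examples section would produce under Bernoulli counting, and the estimate assumes the usual additive smoothing with alpha = 1. The names `ALPHA`, `p_feature_given_class` and `joint_log_likelihood` here are local to the sketch.

```python
import math

# Per-class document counts and per-feature presence counts for the example corpus.
class_counts = {"yes": 3, "no": 1}
feature_counts = {
    "Chinese": {"yes": 3, "no": 1},
    "Beijing": {"yes": 1},
    "Shanghai": {"yes": 1},
    "Macao": {"yes": 1},
    "Tokyo": {"no": 1},
    "Japan": {"no": 1},
}
ALPHA = 1.0  # smoothing parameter assumed for this sketch


def p_feature_given_class(feature, label):
    """Smoothed probability that the feature is present in a document of this class."""
    present = feature_counts[feature].get(label, 0)
    return (present + ALPHA) / (class_counts[label] + 2 * ALPHA)


def joint_log_likelihood(x, true_threshold=0.0):
    """Return {class: log P(class) + sum of log P(feature state | class)}."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        # Every known feature contributes, whether present or absent in x.
        for feature in feature_counts:
            p = p_feature_given_class(feature, label)
            active = x.get(feature, 0) > true_threshold
            score += math.log(p if active else 1.0 - p)
        scores[label] = score
    return scores


jll = joint_log_likelihood({"Chinese": 1, "Shanghai": 1})
print(jll)
```

Unlike the multinomial variant, absent features also contribute, through the 1 - p term, so the sum runs over every feature the model has seen.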
joint_log_likelihood_many
Computes the joint log likelihood of input features.
Parameters
- X — 'pd.DataFrame'
Returns
pd.DataFrame: Input samples joint log likelihood.
learn_many
Learn from a batch of count vectors.
Parameters
- X — 'pd.DataFrame'
- y — 'pd.Series'
Returns
MiniBatchClassifier: self
learn_one
Updates the model with a single observation.
Parameters
- x — 'dict'
- y — 'base.typing.ClfTarget'
Returns
Classifier: self
p_class
p_class_many
p_feature_given_class
predict_many
Predict the outcome for each given sample.
Parameters
- X — 'pd.DataFrame'
Returns
pd.Series: The predicted labels.
predict_one
Predict the label of a set of features x.
Parameters
- x — 'dict'
- kwargs
Returns
base.typing.ClfTarget | None: The predicted label.
predict_proba_many
Return probabilities using the log-likelihoods in a mini-batch setting.
Parameters
- X — 'pd.DataFrame'
predict_proba_one
Return probabilities using the log-likelihoods.
Parameters
- x — 'dict'
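Turning log-likelihoods into probabilities is typically done by normalizing with the log-sum-exp trick. The sketch below assumes that approach and is not taken from river's source. The helper `normalize` is hypothetical, and the two input values are hand-computed joint likelihoods for the unseen document "test" from the Examples section, assuming the standard Bernoulli estimate with alpha = 1.

```python
import math


def normalize(jll):
    """Turn joint log-likelihoods into probabilities using the log-sum-exp trick."""
    highest = max(jll.values())
    # Subtracting the maximum before exponentiating avoids underflow.
    lse = highest + math.log(sum(math.exp(v - highest) for v in jll.values()))
    return {label: math.exp(v - lse) for label, v in jll.items()}


# Hand-computed joint likelihoods for the document "test" (no known word present).
jll = {"yes": math.log(0.020736), "no": math.log(0.25 * 8 / 729)}
proba = normalize(jll)
print(proba)
```

The result agrees with the probabilities shown for "test" in the Examples section, roughly 0.883 for "yes" and 0.117 for "no".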