ComplementNB

Naive Bayes classifier for multinomial models.

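The Complement Naive Bayes model learns from co-occurrences between features, such as word counts, and discrete classes. Instead of estimating the feature statistics of a class from that class's own samples, it estimates them from the samples of all the other classes (the complement of the class), which makes ComplementNB well suited to imbalanced datasets. The input vector must contain non-negative values, such as counts or TF-IDF values.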
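Complement Naive Bayes gathers the statistics for a class c from the samples of every class except c. The pure-Python sketch below (not River's internal code) shows that step on a subset of the word counts from the example sentences further down; the helper name `complement_counts` and the hand-built counts are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Word counts from the example sentences, grouped as feature -> {class: count}.
# One token of the butcher sentence is omitted for brevity.
feature_counts = defaultdict(Counter)
feature_counts["food"].update({"health": 4, "butcher": 1})
feature_counts["meat"].update({"health": 2, "butcher": 1})
feature_counts["brain"].update({"health": 1})
feature_counts["job"].update({"health": 1})
feature_counts["kitchen"].update({"butcher": 9})


def complement_counts(feature_counts, c):
    """Count each feature over every class except c (the complement of c)."""
    return {
        f: sum(n for label, n in per_class.items() if label != c)
        for f, per_class in feature_counts.items()
    }


print(complement_counts(feature_counts, "health"))
# {'food': 1, 'meat': 1, 'brain': 0, 'job': 0, 'kitchen': 9}
```

Because the rare class's statistics are drawn from the plentiful data of the other classes, they tend to be less noisy than estimates built from its few own samples.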

Parameters

alpha – defaults to 1.0

Additive (Laplace/Lidstone) smoothing parameter (use 0 for no smoothing).
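As a rough illustration of what alpha does, the sketch below uses the textbook Lidstone estimate (count + alpha) / (total + alpha * n_features). This generic formula is assumed for illustration; River may apply alpha at a slightly different point in its computation, and the function name `smoothed_probability` is hypothetical.

```python
import math


def smoothed_probability(count, total, n_features, alpha=1.0):
    """Lidstone estimate of a feature's probability; alpha=1 gives Laplace smoothing."""
    return (count + alpha) / (total + alpha * n_features)


# A feature never observed among 12 counted tokens, with a 5-word vocabulary.
print(smoothed_probability(0, 12, 5))           # 1/17, small but non-zero
print(smoothed_probability(0, 12, 5, alpha=0))  # 0.0

# With alpha=0 the log-probability of an unseen feature is undefined:
# math.log(0.0) raises ValueError.
counts = [4, 2, 1, 1, 0]
probs = [smoothed_probability(n, sum(counts), len(counts)) for n in counts]
print(math.isclose(sum(probs), 1.0))  # True
```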

Attributes

class_dist (proba.Multinomial)

Class prior probability distribution.

feature_counts (collections.defaultdict)

Total frequencies per feature and class.

class_totals (collections.Counter)

Total frequencies per class.
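The two counting attributes can be pictured with the following sketch. It assumes that each observation x maps features to frequencies (as BagOfWords produces) and mirrors feature_counts and class_totals with plain collections objects; the `update` function is a hypothetical stand-in for learn_one, and River's actual bookkeeping may keep additional state.

```python
from collections import Counter, defaultdict

feature_counts = defaultdict(Counter)  # feature -> {class: total frequency}
class_totals = Counter()               # class -> total frequency of all its features


def update(x, y):
    """Add the feature frequencies of one observation x with label y."""
    for f, frequency in x.items():
        feature_counts[f][y] += frequency
        class_totals[y] += frequency


update({"food": 2, "meat": 1, "brain": 1}, "health")
update({"food": 2, "meat": 1, "job": 1}, "health")
print(feature_counts["food"])  # Counter({'health': 4})
print(class_totals["health"])  # 8
```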

Examples

>>> from river import feature_extraction
>>> from river import naive_bayes

>>> sentences = [
...     ('food food meat brain', 'health'),
...     ('food meat ' + 'kitchen ' * 9 + 'job' * 5, 'butcher'),
...     ('food food meat job', 'health')
... ]

>>> model = feature_extraction.BagOfWords() | ('nb', naive_bayes.ComplementNB)

>>> for sentence, label in sentences:
...     model = model.learn_one(sentence, label)

>>> model['nb'].p_class('health') == 2 / 3
True
>>> model['nb'].p_class('butcher') == 1 / 3
True

>>> model.predict_proba_one('food job meat')
{'health': 0.9409689355477155, 'butcher': 0.05903106445228467}

You can also train the model and make predictions in mini-batch mode with the learn_many, predict_many and predict_proba_many methods, which work on pandas objects.

>>> import pandas as pd

>>> docs = [
...     ('food food meat brain', 'health'),
...     ('food meat ' + 'kitchen ' * 9 + 'job' * 5, 'butcher'),
...     ('food food meat job', 'health')
... ]

>>> docs = pd.DataFrame(docs, columns=['X', 'y'])

>>> X, y = docs['X'], docs['y']

>>> model = feature_extraction.BagOfWords() | ('nb', naive_bayes.ComplementNB)

>>> model = model.learn_many(X, y)

>>> model['nb'].p_class('health') == 2 / 3
True

>>> model['nb'].p_class('butcher') == 1 / 3
True

>>> model['nb'].p_class_many()
    butcher    health
0  0.333333  0.666667

>>> model.predict_proba_one('food job meat')
{'butcher': 0.05903106445228467, 'health': 0.9409689355477155}

>>> model.predict_proba_one('Taiwanese Taipei')
{'butcher': 0.3769230769230768, 'health': 0.6230769230769229}

>>> unseen_data = pd.Series(
...     ['food job meat', 'Taiwanese Taipei'], name='X', index=['river', 'rocks'])

>>> model.predict_proba_many(unseen_data)
        butcher    health
river  0.059031  0.940969
rocks  0.376923  0.623077

>>> model.predict_many(unseen_data)
river    health
rocks    health
dtype: object

Methods

clone

Return a fresh estimator with the same parameters.

The clone has the same parameters but has not been updated with any data. This works by looking at the parameters from the class signature. Each parameter is either:

- recursively cloned if it is a River class, or
- deep-copied via copy.deepcopy otherwise.

If the calling object is stochastic (i.e. it accepts a seed parameter) and has not been seeded, then the clone is not guaranteed to behave identically to the original. Indeed, this method's purpose is simply to return a new instance with the same input parameters.
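The cloning mechanism described above can be sketched in a few lines of plain Python. This is not River's code: the `Estimator`, `ToyNB` and `Wrapper` classes are hypothetical, and the sketch assumes that every constructor parameter is stored as an attribute of the same name.

```python
import copy
import inspect


class Estimator:
    """Hypothetical base class standing in for River estimators."""

    def clone(self):
        params = {}
        # inspect.signature on a class lists its __init__ parameters, minus self.
        for name in inspect.signature(type(self)).parameters:
            value = getattr(self, name)
            if isinstance(value, Estimator):
                params[name] = value.clone()  # nested estimator: clone recursively
            else:
                params[name] = copy.deepcopy(value)  # anything else: deep copy
        return type(self)(**params)


class ToyNB(Estimator):
    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = {}  # learned state, not a parameter, so it is not carried over


class Wrapper(Estimator):
    def __init__(self, model):
        self.model = model


trained = ToyNB(alpha=0.5)
trained.counts["food"] = 3
fresh = trained.clone()
print(fresh.alpha, fresh.counts)  # 0.5 {}

outer = Wrapper(trained).clone()
print(outer.model is trained, outer.model.counts)  # False {}
```

Only constructor parameters survive the copy, which is why the clone starts without any learned data.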

joint_log_likelihood

Computes the joint log likelihood of input features.

Parameters

x (dict)

Returns

dict: Mapping between classes and joint log likelihood.
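For intuition, the sketch below scores each class with the standard Complement Naive Bayes rule (without weight normalization): the score of class c is minus the sum, over the features of x, of the feature's frequency times the log of its smoothed probability under the complement of c. A word that is frequent in the other classes therefore lowers the score of c. The formula and the `complement_jll` helper are assumptions for illustration; River's own computation differs in its details, so these numbers do not reproduce the probabilities shown in the examples.

```python
import math
from collections import Counter, defaultdict


def complement_jll(x, feature_counts, classes, alpha=1.0):
    """Score each class c as -sum_f x[f] * log P(f | complement of c)."""
    vocabulary = set(feature_counts) | set(x)
    scores = {}
    for c in classes:
        # Feature counts accumulated over every class except c.
        comp = {
            f: sum(n for label, n in feature_counts.get(f, {}).items() if label != c)
            for f in vocabulary
        }
        total = sum(comp.values()) + alpha * len(vocabulary)  # Lidstone denominator
        scores[c] = -sum(
            freq * math.log((comp[f] + alpha) / total) for f, freq in x.items()
        )
    return scores


feature_counts = defaultdict(Counter)
feature_counts["food"].update({"health": 4, "butcher": 1})
feature_counts["meat"].update({"health": 2, "butcher": 1})
feature_counts["brain"].update({"health": 1})
feature_counts["job"].update({"health": 1})
feature_counts["kitchen"].update({"butcher": 9})

scores = complement_jll({"food": 1, "job": 1, "meat": 1}, feature_counts, ["health", "butcher"])
print(max(scores, key=scores.get))  # health
```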

joint_log_likelihood_many

Computes the joint log likelihood of input features.

Parameters

X (pandas.core.frame.DataFrame)

Returns

DataFrame: Joint log likelihood of each input sample.

learn_many

Updates the model with a term-frequency or TF-IDF pandas dataframe.

Parameters

X (pandas.core.frame.DataFrame)
y (pandas.core.series.Series)

Returns

self

learn_one

Updates the model with a single observation.

Parameters

x (dict)
y (Union[bool, str, int])

Returns

Classifier: self

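p_class

Return the prior probability of a given class, i.e. the fraction of observed samples that carry that label.

p_class_many

Return the prior probability of every class as a single-row DataFrame.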
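The class prior is the fraction of observations carrying each label. The plain-Python sketch below (not River's implementation) recomputes the values checked in the examples from the labels of the three training sentences.

```python
from collections import Counter

labels = ["health", "butcher", "health"]  # labels of the three example sentences
class_counts = Counter(labels)
n_samples = sum(class_counts.values())
priors = {c: count / n_samples for c, count in class_counts.items()}
print(priors)  # {'health': 0.6666666666666666, 'butcher': 0.3333333333333333}
```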

predict_many

Predict the labels of a DataFrame X.

Parameters

X (pandas.core.frame.DataFrame)

Returns

Series: Series of predicted labels.

predict_one

Predict the label of a set of features x.

Parameters

x (dict)

Returns

typing.Union[bool, str, int]: The predicted label.

predict_proba_many

Return probabilities using the log-likelihoods in a mini-batch setting.

Parameters

X (pandas.core.frame.DataFrame)

predict_proba_one

Return probabilities using the log-likelihoods.

Parameters

x (dict)
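Turning joint log likelihoods into probabilities amounts to a softmax over the per-class scores. A numerically safe way to do this is the log-sum-exp trick, sketched below under the assumption that a higher score means a more likely class; the helper name `proba_from_jll` is hypothetical. Since the softmax preserves ordering, the class with the highest probability is always the class with the highest score.

```python
import math


def proba_from_jll(jll):
    """Normalize per-class joint log likelihoods into probabilities summing to 1."""
    if not jll:
        return {}
    # Subtracting the maximum keeps exp() in range: math.exp(1000.0) would overflow.
    m = max(jll.values())
    lse = m + math.log(sum(math.exp(v - m) for v in jll.values()))
    return {c: math.exp(v - lse) for c, v in jll.items()}


proba = proba_from_jll({"health": 1000.0, "butcher": 997.0})
print(max(proba, key=proba.get))  # health
print(math.isclose(sum(proba.values()), 1.0))  # True
```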
