AdPredictor¶
Bayesian online probit regression for click-through-rate prediction.
AdPredictor, developed at Microsoft for CTR prediction in Bing's sponsored search [1], keeps a Gaussian belief over each feature weight rather than a point estimate. It predicts via a probit link and learns in a single pass, scaling each weight's step size by its own uncertainty. It is well suited to the sparse, high-cardinality categorical data of ad logs: it yields well-calibrated probabilities and exposes uncertainty that can drive exploration. Like the other linear models, its cost per example scales only with the number of active features. On dense, low-dimensional numeric data, plain logistic regression is simpler and just as good.
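The belief update described above can be sketched in plain Python. This is a minimal illustration based on the update equations in the reference paper, not River's implementation: the class name `TinyAdPredictor` and its methods are hypothetical, every weight is assumed to start from a zero-mean, unit-variance prior, and the `epsilon` dynamics and bias handling are left out.

```python
import math


def normal_pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def normal_cdf(t):
    """Standard normal CDF, written with erfc for accuracy in the left tail."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


class TinyAdPredictor:
    """Hypothetical, illustrative class; not River's AdPredictor."""

    def __init__(self, beta=0.1):
        self.beta = beta
        self.mean = {}  # posterior mean per feature (assumed prior mean: 0)
        self.var = {}   # posterior variance per feature (assumed prior variance: 1)

    def _totals(self, active):
        total_mean = sum(self.mean.get(k, 0.0) for k in active)
        total_var = self.beta ** 2 + sum(self.var.get(k, 1.0) for k in active)
        return total_mean, total_var

    def predict(self, active):
        # Probit link: P(click) = Phi(sum of means / sqrt(beta^2 + sum of variances))
        total_mean, total_var = self._totals(active)
        return normal_cdf(total_mean / math.sqrt(total_var))

    def learn(self, active, clicked):
        y = 1.0 if clicked else -1.0
        total_mean, total_var = self._totals(active)
        total_std = math.sqrt(total_var)
        t = y * total_mean / total_std
        # Truncated-Gaussian correction terms; the floor only avoids division by zero.
        v = normal_pdf(t) / max(normal_cdf(t), 1e-300)
        w = v * (v + t)
        for k in active:
            var_k = self.var.get(k, 1.0)
            # Uncertain weights (large variance) take larger steps.
            self.mean[k] = self.mean.get(k, 0.0) + y * (var_k / total_std) * v
            # Each observation shrinks the variance of the weights it touched.
            self.var[k] = var_k * (1.0 - (var_k / total_var) * w)


model = TinyAdPredictor(beta=0.1)
print(model.predict(["a", "b"]))        # 0.5 before any data
for _ in range(50):
    model.learn(["a", "b"], True)
print(model.predict(["a", "b"]) > 0.9)  # True
```

Because the step for each weight is proportional to its own variance, rarely seen features keep learning quickly while frequently seen ones settle down.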
Features are expected to form a sparse active set: a key in x is active when its value is truthy (the magnitude is otherwise ignored), so one-hot encode or bucket inputs. Use preprocessing.OneHotEncoder(drop_zeros=True) to feed only the present categories.
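To make the active-set rule concrete, here is a small sketch that applies ordinary Python truthiness to a feature dictionary. The helper `active_keys` and the feature names are hypothetical and exist only for illustration; they are not part of River.

```python
def active_keys(x):
    """Hypothetical helper: return the keys whose values are truthy."""
    return {key for key, value in x.items() if value}


x = {
    "ad=42": 1,           # active
    "query=shoes": True,  # active
    "position": 3,        # active, but the magnitude 3 carries no information
    "is_mobile": 0,       # inactive: 0 is falsy
    "campaign": "",       # inactive: the empty string is falsy
}
print(sorted(active_keys(x)))  # ['ad=42', 'position', 'query=shoes']

# Bucketing turns the numeric value into its own key, so the value is not lost.
bucketed = {"ad=42": 1, "query=shoes": 1, f"position={3}": 1}
print(sorted(active_keys(bucketed)))  # ['ad=42', 'position=3', 'query=shoes']
```

Under this rule `{"position": 3}` and `{"position": 1}` look identical to the model, which is why numeric fields should be bucketed into separate keys.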
Parameters¶
- beta
  Type → float
  Default → 0.1
  Standard deviation of the per-example label noise; the prediction step size shrinks as beta grows.
- prior_probability
  Type → float
  Default → 0.5
  Base-rate CTR used to initialise the bias weight, so the model predicts this value before seeing any data. The bias is calibrated lazily from the number of active features in the first observed example.
- epsilon
  Type → float
  Default → 0.05
  Variance-dynamics rate in [0, 1). Each update nudges weights a fraction epsilon of the way back toward the unit prior, preventing variances from collapsing and letting the model track drift. Set to 0 to disable.
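The roles of prior_probability and epsilon can be illustrated with two small formulas. This is a sketch under stated assumptions, not River's code: the function names are hypothetical, the bias calibration is simply the probit prediction solved for the bias mean (assuming all other weights have zero mean), and the relaxation step follows the dynamics form given in the reference paper, with a zero-mean, unit-variance prior.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def calibrated_bias_mean(prior_probability, beta, total_variance):
    """Bias mean that makes the initial probit prediction equal the base rate.

    Solves Phi(bias / sqrt(beta^2 + total_variance)) = prior_probability,
    where total_variance is the summed variance of all active weights
    (bias included), which depends on how many features are active.
    """
    scale = (beta ** 2 + total_variance) ** 0.5
    return STD_NORMAL.inv_cdf(prior_probability) * scale


def relax_toward_prior(mean, var, epsilon, prior_mean=0.0, prior_var=1.0):
    """Blend a posterior with the prior, weighting the prior by epsilon."""
    precision = (1.0 - epsilon) / var + epsilon / prior_var
    new_var = 1.0 / precision
    new_mean = new_var * ((1.0 - epsilon) * mean / var + epsilon * prior_mean / prior_var)
    return new_mean, new_var


# Two active features plus the bias, each with unit variance (an assumption).
bias = calibrated_bias_mean(prior_probability=0.02, beta=0.1, total_variance=3.0)
initial_prediction = STD_NORMAL.cdf(bias / (0.1 ** 2 + 3.0) ** 0.5)
print(round(initial_prediction, 6))  # 0.02

# A confident weight drifts slightly back toward the prior.
print(relax_toward_prior(mean=2.0, var=0.25, epsilon=0.05))
```

With epsilon set to 0 the relaxation returns the posterior unchanged, which matches the "set to 0 to disable" behaviour described above.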
Examples¶
from river import linear_model
model = linear_model.AdPredictor(beta=0.1, prior_probability=0.5)
With no data seen yet, the model predicts the base rate:
model.predict_proba_one({"a": 1, "b": 1})[True]
0.5
After repeatedly seeing the same clicked impression, its belief moves up:
for _ in range(50):
    model.learn_one({"a": 1, "b": 1}, True)
model.predict_proba_one({"a": 1, "b": 1})[True] > 0.9
True
On a real ad-click stream, one-hot encode the fields into a sparse active set and evaluate with progressive validation:
from river import datasets, metrics, preprocessing
dataset = datasets.CriteoAds()
model = preprocessing.OneHotEncoder(drop_zeros=True) | linear_model.AdPredictor()
metric = metrics.ROCAUC()
for x, y in dataset.take(10_000):
    metric.update(y, model.predict_proba_one(x))
    model.learn_one(x, y)
metric
ROCAUC: 64.95%
Methods¶
learn_one
Update the model with a set of features x and a label y.
Parameters
- x — dict[base.typing.FeatureName, Any]
- y — base.typing.ClfTarget
predict_one
Predict the label of a set of features x.
Parameters
- x — dict[base.typing.FeatureName, Any]
- kwargs — Any
Returns
base.typing.ClfTarget | None: The predicted label.
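The relationship between the predicted label and the predicted probabilities can be sketched as follows. This assumes AdPredictor follows the usual convention of River classifiers, where the label is the most probable entry of predict_proba_one and None is returned when no probabilities are available; the function name below is hypothetical.

```python
def label_from_probabilities(y_pred):
    """Hypothetical helper: pick the most probable label, or None if empty."""
    if not y_pred:
        return None
    return max(y_pred, key=y_pred.get)


print(label_from_probabilities({False: 0.3, True: 0.7}))  # True
print(label_from_probabilities({}))                       # None
```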
predict_proba_one
Predict the probability of each label for a dictionary of features x.
Parameters
- x — dict[base.typing.FeatureName, Any]
Returns
dict[base.typing.ClfTarget, float]: A dictionary that associates a probability with each label.
[1] Graepel, T., Candela, J.Q., Borchert, T. and Herbrich, R., 2010. Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine. ICML 2010.