Skip to content

Newton

Online Newton Step (ONS) optimizer.

This optimizer uses second-order information (i.e. the Hessian of the cost function) in addition to first-order information (i.e. the gradient of the cost function). It maintains the matrix

\[A_t = \epsilon I + \sum_{s=1}^{t} g_s g_s^\intercal\]

where \(g_s\) is the gradient at step \(s\), and applies the update

\[w_{t+1} = w_t - \eta A_t^{-1} g_t.\]

The inverse \(A_t^{-1}\) is never computed from scratch. Instead it is updated in-place at each step with the Sherman-Morrison formula (see utils.math.sherman_morrison), which turns the rank-1 update \(A_t = A_{t-1} + g_t g_t^\intercal\) into a single rank-1 update of the inverse. This costs \(O(d^2)\) time and memory, where \(d\) is the number of features seen so far.

Parameters

  • lr (mutable)

    Default0.1

    Learning rate. This corresponds to \(1 / \gamma\) in the paper.

  • eps

    Default1e-05

    Regularization term used to initialize the Hessian to \(\epsilon I\). Its inverse \(1 / \epsilon\) is used as the starting value of \(A_t^{-1}\), so a smaller eps allows larger steps early on.

Attributes

  • learning_rate

Examples

from river import datasets
from river import evaluate
from river import linear_model
from river import metrics
from river import optim
from river import preprocessing

dataset = datasets.Phishing()
optimizer = optim.Newton()
model = (
    preprocessing.StandardScaler() |
    linear_model.LogisticRegression(optimizer)
)
metric = metrics.F1()

evaluate.progressive_val_score(dataset, model, metric)
F1: 73.62%

Methods

look_ahead

Updates a weight vector before a prediction is made.

Parameters: w (dict): A dictionary of weight parameters. The weights are modified in-place. Returns: The updated weights.

Parameters

  • wdict

step

Updates a weight vector given a gradient.

Parameters

  • wdict | VectorLike
  • gdict | VectorLike

Returns

dict | VectorLike: The updated weights.

References