# Regression

Regression is about predicting a numeric output for a given sample. A labeled regression sample is made up of a set of features and a numeric target. The target is usually continuous, but it may also be discrete. We'll use the Trump approval rating dataset as an example.

from river import datasets

dataset = datasets.TrumpApproval()
dataset

Donald Trump approval ratings.

This dataset was obtained by reshaping the data used by FiveThirtyEight for analyzing Donald
Trump's approval ratings. It contains 5 features, which are approval ratings collected by
5 polling agencies. The target is the approval rating from FiveThirtyEight's model. The goal of
this task is to see if we can reproduce FiveThirtyEight's model.

Name  TrumpApproval
Samples  1,001
Features  6
Sparse  False
Path  /Users/max.halford/projects/river/river/datasets/trump_approval.csv.gz


This dataset is a streaming dataset which can be looped over.

for x, y in dataset:
    pass


Let's take a look at the first sample.

x, y = next(iter(dataset))
x

{'ordinal_date': 736389,
'gallup': 43.843213,
'ipsos': 46.19925042857143,
'morning_consult': 48.318749,
'rasmussen': 44.104692,
'you_gov': 43.636914000000004}
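
Each sample is a plain Python dict that maps feature names to values, paired with a numeric target. To make that shape concrete, here is a minimal sketch of a generator that yields `(x, y)` pairs one at a time. The rows, the feature names `poll_a` and `poll_b`, and the helper `iter_samples` are all made up for illustration and are not part of River.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical in-memory rows; the values are invented for illustration only.
ROWS = [
    {"poll_a": 43.0, "poll_b": 45.0, "target": 44.0},
    {"poll_a": 42.0, "poll_b": 46.0, "target": 44.5},
]


def iter_samples(rows: Iterable[dict]) -> Iterator[Tuple[Dict[str, float], float]]:
    """Yield (features, target) pairs one at a time, like a streaming dataset."""
    for row in rows:
        # Everything except the target column is treated as a feature.
        features = {name: value for name, value in row.items() if name != "target"}
        yield features, row["target"]


x, y = next(iter_samples(ROWS))
```

Any iterable with this shape can be looped over with `for x, y in ...`, just like the dataset above.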


A regression model's goal is to learn to predict a numeric target y from a set of features x. We'll attempt to do this with a nearest neighbors model.

from river import neighbors

model = neighbors.KNNRegressor()
model.predict_one(x)

0.0


The model hasn't been trained on any data, and therefore outputs a default value of 0.

The model can be trained on the sample, which will update the model's state.

model.learn_one(x, y)


If we try to make a prediction on the same sample, we can see that the output is different, because the model has learned something.

model.predict_one(x)

43.75505
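
To build some intuition for this behavior, here is a toy nearest neighbors regressor written in plain Python. It is only a sketch under simple assumptions (Euclidean distance, an unweighted mean of the neighbors' targets, a fixed-size window of recent samples) and is not River's implementation; the class name `TinyKNNRegressor` and its parameters are hypothetical.

```python
import math
from collections import deque


class TinyKNNRegressor:
    """Toy online k-NN regressor (hypothetical, not River's implementation)."""

    def __init__(self, n_neighbors=3, window_size=100):
        self.n_neighbors = n_neighbors
        # Only the most recent samples are kept, so memory stays bounded.
        self.window = deque(maxlen=window_size)

    def learn_one(self, x, y):
        # Learning simply stores the labeled sample.
        self.window.append((x, y))

    def predict_one(self, x):
        if not self.window:
            return 0.0  # nothing seen yet, fall back to a default value

        def distance(stored):
            # Euclidean distance over the union of feature names.
            keys = set(x) | set(stored)
            return math.sqrt(sum((x.get(k, 0.0) - stored.get(k, 0.0)) ** 2 for k in keys))

        nearest = sorted(self.window, key=lambda sample: distance(sample[0]))
        nearest = nearest[: self.n_neighbors]
        # Predict the mean target of the closest stored samples.
        return sum(target for _, target in nearest) / len(nearest)
```

With a single stored sample, the mean of the nearest targets is just that sample's target, which is consistent with the prediction moving away from 0.0 as soon as one sample has been learned.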


Typically, an online model makes a prediction, and then learns once the ground truth reveals itself. The prediction and the ground truth can be compared to measure the model's correctness. If you have a dataset available, you can loop over it, make a prediction, update the model, and compare the model's output with the ground truth. This is called progressive validation.

from river import metrics

model = neighbors.KNNRegressor()

metric = metrics.MAE()

for x, y in dataset:
    y_pred = model.predict_one(x)
    model.learn_one(x, y)
    metric.update(y, y_pred)

metric

MAE: 0.31039
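
The metric is updated one observation at a time, so it never needs to store past predictions. As a rough sketch of how a mean absolute error can be maintained incrementally, consider the class below. The name `RunningMAE` is hypothetical and this is not how River's `metrics.MAE` is necessarily implemented.

```python
class RunningMAE:
    """Mean absolute error kept as a running mean (illustrative sketch)."""

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # current mean of the absolute errors

    def update(self, y_true, y_pred):
        self.n += 1
        error = abs(y_true - y_pred)
        # Incremental mean: shift the old mean towards the new error.
        self.mean += (error - self.mean) / self.n

    def get(self):
        return self.mean


mae = RunningMAE()
mae.update(1.0, 0.0)  # absolute error of 1
mae.update(3.0, 1.0)  # absolute error of 2
print(mae.get())
```

The update costs constant time and memory per sample, which is what makes this kind of metric suitable for streams.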


This is a common way to evaluate an online model. In fact, there is a dedicated evaluate.progressive_val_score function that does this for you.

from river import evaluate

model = neighbors.KNNRegressor()
metric = metrics.MAE()

evaluate.progressive_val_score(dataset, model, metric)

MAE: 0.31039
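
For reference, the predict-then-learn loop behind progressive validation can be written as a small generic helper. The sketch below is plain Python and only mimics the idea; it is not the actual implementation of `evaluate.progressive_val_score`. The helper `progressive_val`, the `LastValue` baseline, and the three-sample stream are all hypothetical.

```python
def progressive_val(stream, model):
    """Predict, score, then learn on each sample; return the mean absolute error."""
    total_error, n = 0.0, 0
    for x, y in stream:
        y_pred = model.predict_one(x)  # predict before the label is revealed
        total_error += abs(y - y_pred)
        n += 1
        model.learn_one(x, y)          # only now does the model see the label
    return total_error / n if n else 0.0


class LastValue:
    """Hypothetical baseline that predicts the previously seen target."""

    def __init__(self):
        self.last = 0.0

    def predict_one(self, x):
        return self.last

    def learn_one(self, x, y):
        self.last = y


# Made-up stream of (features, target) pairs.
stream = [({"t": 1}, 10.0), ({"t": 2}, 12.0), ({"t": 3}, 11.0)]
score = progressive_val(stream, LastValue())
```

Because each prediction is made before the model learns from the corresponding label, every sample serves as both test data and training data, without any leakage.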


That concludes the getting started introduction to regression! You can now move on to the next steps.