TREC07

TREC's 2007 Spam Track dataset.

The data contains 75,419 chronologically ordered items, i.e. 3 months of emails delivered to a particular server in 2007. Spam messages represent 66.6% of the dataset. The goal is to predict whether or not an email is spam.

The available raw features are: sender, recipients, date, subject, body.
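As a rough illustration of how such a chronologically ordered stream might be consumed, the sketch below assumes each item is a `(features, label)` pair, where `features` is a dict keyed by the raw feature names above and `label` is `True` for spam. The `toy_stream` function, the `running_spam_rate` helper, and all field values are hypothetical placeholders, not records from the dataset.

```python
from typing import Dict, Iterable, Iterator, Tuple

Sample = Tuple[Dict[str, str], bool]


def toy_stream() -> Iterator[Sample]:
    """Yield made-up emails in arrival order (placeholder values only)."""
    yield ({"sender": "alice@example.com", "recipients": "bob@example.com",
            "date": "day 1", "subject": "Meeting notes",
            "body": "See attached."}, False)
    yield ({"sender": "promo@example.net", "recipients": "bob@example.com",
            "date": "day 2", "subject": "You won",
            "body": "Claim your prize now."}, True)
    yield ({"sender": "deals@example.org", "recipients": "bob@example.com",
            "date": "day 3", "subject": "Cheap offer",
            "body": "Limited time only."}, True)


def running_spam_rate(stream: Iterable[Sample]) -> Iterator[float]:
    """Yield the fraction of spam seen so far after each email."""
    n_spam = 0
    for n_seen, (_, is_spam) in enumerate(stream, start=1):
        n_spam += int(is_spam)
        yield n_spam / n_seen


rates = list(running_spam_rate(toy_stream()))
```

Processing one email at a time, in order, mirrors the setting in which a spam filter only ever sees past messages when it classifies a new one.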

Attributes

  • desc

    Return the description from the docstring.

  • is_downloaded

    Indicate whether or not the data has been correctly downloaded.

  • path

Methods

download
take

Iterate over the first k samples.

Parameters

  • k (int)
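A minimal sketch of what a `take`-style method could look like, assuming the dataset is iterable and that `take(k)` simply truncates the iteration after `k` items. `ToyDataset` is a hypothetical stand-in written for this example, not the library's class, and the use of `itertools.islice` is an assumption about one reasonable implementation.

```python
from itertools import islice


class ToyDataset:
    """Hypothetical stand-in for a dataset object (illustration only)."""

    def __init__(self, samples):
        self._samples = samples

    def __iter__(self):
        # Each iteration restarts from the first sample.
        return iter(self._samples)

    def take(self, k: int):
        """Lazily iterate over at most the first k samples."""
        return islice(self, k)


# Made-up (features, label) pairs; even-numbered mails are marked as spam.
dataset = ToyDataset([({"subject": f"mail {i}"}, i % 2 == 0) for i in range(5)])

first_two = list(dataset.take(2))
everything = list(dataset.take(10))  # asking for more than exists is safe
```

Because `islice` is lazy, only the requested samples are read, which is convenient for quickly inspecting a few items of a large dataset.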