TREC's 2007 Spam Track dataset.
The data contains 75,419 chronologically ordered items, i.e. 3 months of emails delivered to a particular server in 2007. Spam messages represent 66.6% of the dataset. The goal is to predict whether an email is spam or not.
The available raw features are: sender, recipients, date, subject, body.
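
As a rough illustration of how these raw fields might be turned into model inputs, the sketch below builds a made-up record and derives a few numeric features from it. The field names follow the list above; the value formats (a comma-separated recipients string, a plain-text date) and the `simple_features` helper are assumptions for illustration, not part of the dataset's documented interface.

```python
# Hypothetical record: the keys follow the raw feature list, the values are invented.
email = {
    "sender": "alice@example.com",
    "recipients": "bob@example.com,carol@example.com",
    "date": "2007-04-08 12:00:00",
    "subject": "Meeting notes",
    "body": "Here are the notes from today.",
}


def simple_features(record):
    """Derive a few numeric features from the raw text fields (illustrative only)."""
    recipients = [r for r in record["recipients"].split(",") if r.strip()]
    return {
        "n_recipients": len(recipients),
        "subject_length": len(record["subject"]),
        "body_word_count": len(record["body"].split()),
    }


features = simple_features(email)
```

A real spam classifier would more likely rely on text features such as word counts or TF-IDF over the subject and body; the counts here only show the shape of the transformation.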
Return the description from the docstring.
Indicate whether or not the data has been correctly downloaded.
Iterate over the first k samples.
- k (int)
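
Taking k samples from a chronologically ordered stream can be sketched with `itertools.islice`. The `take` helper and the stand-in stream of `(features, label)` pairs below are assumptions for illustration; they do not reproduce the dataset class's actual implementation.

```python
from itertools import islice


def take(stream, k):
    """Return an iterator over at most the first k items of an iterable."""
    return islice(stream, k)


# Stand-in stream of (features, label) pairs; not the real dataset.
stream = (({"subject": f"message {i}"}, i % 3 != 0) for i in range(10))

first_three = list(take(stream, 3))
labels = [label for _, label in first_three]
```

Because `islice` is lazy, only the first k items are read from the underlying stream, which matters when the full dataset is large.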