The difference between spam corpora

by Martijn Grooten

All,

I have been wondering whether any research has been done on the differences between different (kinds of) spam corpora*; I believe this is the right place to ask. (Oh, and hello, I am kind of new here too; a lurker for quite some time, but not sure if I've posted before.)

* throughout this email, by corpus I mean all emails in a live mail stream, used in real time.

To test a spam filter or an anti-spam method, or to do research on spam, one inevitably needs a spam corpus. As the spam sent to one email address, or even one corporation, is unlikely to be representative of all the spam sent globally during that period, most people add the spam sent to one or more spam traps to their test. There is nothing wrong with this approach, but, at least in theory, a lot of spam will not end up in such traps: mailings sent by dodgy ESPs; spam sent to addresses harvested from Outlook address books; spam sent to addresses obtained by hacking a company's customer database (or, perhaps more likely here in the UK, spam sent to addresses from a CD-ROM found on a train).

I am not sure how large a proportion of spam is of this latter kind, but I think it would be interesting to find out. Over the past months I have sent both our corporate mail stream and the spam from a distributed spam trap through a number of spam filters, and the difference in performance was striking: many products let through ten or more times as much corporate spam as spam trap spam. Now, ease of filtering is just one way of quantifying a difference between spam corpora, but these results have led me to believe that spam traps, extremely useful as they are, don't show the full picture.
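
For anyone who wants to run the same comparison on their own streams, here is a minimal sketch of the arithmetic. It assumes that "letting through" is measured as a miss rate (missed spam divided by all spam in the corpus); the function names and the counts are hypothetical and for illustration only, they are not my measured results.

```python
def miss_rate(missed: int, total: int) -> float:
    """Fraction of the spam in a corpus that the filter let through."""
    if total <= 0:
        raise ValueError("corpus must contain at least one spam message")
    return missed / total


def miss_ratio(corp_missed: int, corp_total: int,
               trap_missed: int, trap_total: int) -> float:
    """Corporate miss rate divided by spam trap miss rate."""
    trap_rate = miss_rate(trap_missed, trap_total)
    if trap_rate == 0:
        # The filter caught every trap message, so the ratio is unbounded.
        return float("inf")
    return miss_rate(corp_missed, corp_total) / trap_rate


# Hypothetical counts for one filter (assumed, not real data).
corporate = {"missed": 50, "total": 1000}
trap = {"missed": 40, "total": 10000}

ratio = miss_ratio(corporate["missed"], corporate["total"],
                   trap["missed"], trap["total"])
print(f"corporate spam is missed {ratio:.1f} times as often as trap spam")
```

Comparing rates rather than raw counts keeps the result independent of how much larger one stream is than the other.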

Martijn.

Virus Bulletin Ltd, The Pentagon, Abingdon, OX14 3YP, England.
Company Reg No: 2388295. VAT Reg No: GB 532 5598 33.
_______________________________________________
Asrg mailing list
Asrg@...
http://www.irtf.org/mailman/listinfo/asrg