hostkarma/uribl_black disparity

Hi,
Over the past few days I have been looking more closely at email that wasn't tagged but that I thought should have been, and vice versa, using various factors such as URIBL_BLACK and JMF_W.

I'm very surprised that hosts that look obviously legitimate are on the URIBL_BLACK list, like receiveeweek.com. Even more interesting is a bunch of FNs that hit both URIBL_BLACK and JMF_W. In many cases I'm not sure which is correct, because they are not always so cut-and-dried. For example, there was a Citi Bank email (whitelisted) that happened to use an image server (csnimages.com) that is in URIBL_BLACK.

While I don't think that particular email should have been tagged as spam, it's only an example, and I hoped someone would be interested enough to check out a list I created of the disparities I've seen over the last day or so. It's too long to include here, so I've created a pastebin for it:

http://pastebin.com/m4a1561b5

I realize this type of thing could happen for many reasons, not the least of which is an otherwise-legitimate host that has been compromised and is now used to send spam. However, many on my list are quite persistent, like blr-events.com and eturbonews.com, and I have no idea whether they are legitimate or bogus.

Whatever the case, there are definitely mistakes, and I'd like to help correct them. Ideas appreciated. I'd be glad to gather more info if necessary.

Thanks
Alex
Re: hostkarma/uribl_black disparity

MySQL Student wrote:
> Over the past few days I have been investigating more closely email
> that wasn't tagged that I thought should have been, and
> vice-versa, using various factors, such as URIBL_BLACK and JMF_W.

Very interesting. Here's a quick testing script (ymmv on log file syntax):

#########
#!/bin/sh

# helper function, see below
_sacount() {
  zgrep -h "spamd: result: ${3:+Y}" /var/log/mail.lo* \
    | egrep -c "$1${2:+.*$2|$2.*$1}"
}

# Usage: sa_count RULE1 [RULE2]
# Counts messages marked as RULE1 (and RULE2 if given)
sa_count() {
  # quote the arguments so that an empty RULE2 is still passed as $2
  c=`_sacount "$1" "$2"`
  sc=`_sacount "$1" "$2" spam`
  echo "Found $c ($sc spam) matching ${2:+both} $1${2:+ and $2}."
}

sa_count RCVD_IN_HOSTKARMA_W URIBL_BLACK
sa_count RCVD_IN_DNSWL URIBL_BLACK
sa_count URIBL_BLACK
sa_count .   # show total numbers
#########

My output (note, I greylist):

Found 54 (11 spam) matching both RCVD_IN_HOSTKARMA_W and URIBL_BLACK.
Found 25 (16 spam) matching both RCVD_IN_DNSWL and URIBL_BLACK.
Found 1981 (1919 spam) matching URIBL_BLACK.
Found 123273 (3791 spam) matching ..

I don't have data on whether there were FPs or FNs involved. (And yes, zgrep is perfectly content to deal with uncompressed files.)
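
To look at the individual messages behind those counts, a variant that prints the matching log lines instead of counting them might help. This is only a rough sketch under the same log-format caveat as above: sa_list is a made-up name, and it assumes spamd logs non-spam results as "spamd: result: ." followed by the score, which may not hold for every setup.

```shell
#!/bin/sh

# Usage: sa_list RULE1 RULE2 [LOGFILE...]
# Prints spamd result lines that hit both rules but were NOT marked as spam,
# i.e. candidates for a closer FN look.
# Assumption: ham results are logged as "spamd: result: . <score> - RULES...".
sa_list() {
  r1=$1; r2=$2; shift 2
  # fall back to the same log glob as the counting script
  [ $# -gt 0 ] || set -- /var/log/mail.lo*
  zgrep -h "spamd: result: \. " "$@" \
    | grep -E "$r1.*$r2|$r2.*$r1"
}
```

For example, sa_list RCVD_IN_HOSTKARMA_W URIBL_BLACK would list the non-spam results that hit both lists, which can then be checked by hand against the actual mail.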