Spam PDF


Re: Spam PDF

by John Rudd

bgodette@... wrote:

> John Rudd wrote:
>> Robert Schetterer wrote:
>>> arni wrote:
>>>> Raymond Myren wrote:
>>>>> Hello,
>>>>>
>>>>> Just today I started receiving spam mails with attached .pdf files
>>>>> with a spam image.
>>>>> Any ideas how to stop this spam type?
>>>>>
>>>>> \raymond
>>>> As I have said several times on this mailing list now, I've never had any
>>>> of these mails get through. Here is how the current ones score:
>>>>
>>>> X-Spam-Status: Yes, score=16.6 required=5.0 tests=BAYES_99,BOTNET,
>>>>     BOTNET_NORDNS,DCC_CHECK,DKIM_POLICY_SIGNSOME,HTML_MESSAGE,LOGINHASH1,
>>>>     LOGINHASH2,MIME_HTML_MOSTLY,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_PBL,RDNS_NONE
>>>>     autolearn=no version=3.2.0
>>>> X-Spam-Report:
>>>>     *  5.5 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
>>>>     *      [score: 1.0000]
>>>>     *  0.1 RDNS_NONE Delivered to trusted network by a host with no rDNS
>>>>     *  2.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
>>>>     *      [Blocked - see <http://www.spamcop.net/bl.shtml?85.138.88.254>]
>>>>     *  0.9 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL
>>>>     *      [85.138.88.254 listed in zen.spamhaus.org]
>>>>     *  3.0 BOTNET Relay might be a spambot or virusbot
>>>>     *      [botnet0.7,ip=85.138.88.254,nordns]
>>>>     *  0.0 DKIM_POLICY_SIGNSOME Domain Keys Identified Mail: policy says domain
>>>>     *      signs some mails
>>>>     *  0.0 BOTNET_NORDNS Relay's IP address has no PTR record
>>>>     *      [botnet_nordns,ip=85.138.88.254]
>>>>     *  0.0 MIME_HTML_MOSTLY BODY: Multipart message mostly text/html MIME
>>>>     *  0.0 HTML_MESSAGE BODY: HTML included in message
>>>>     *  1.5 LOGINHASH2 BODY: mail has been classified as spam @ unknown company,
>>>>     *      Germany
>>>>     *  1.5 LOGINHASH1 BODY: mail has been classified as spam @ LogIn&Solutions
>>>>     *      AG, Germany
>>>>     *  2.2 DCC_CHECK Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
>>>>
>>>> arni
>>>>
>>> You are in luck: you are a "late receiver" of that spam, so it was
>>> detected by others before you (look at your headers). But it wasn't
>>> detected by, e.g., a plain pdf_spam rule/solution (like fuzzy_ocr etc.),
>>> which is what I am looking for.
>> His success didn't depend upon that luck.  Even without the LOGINHASH*
>> and DCC_CHECK, or even BAYES, he still had a high enough score to flag
>> it as spam.
>>
>>
> Actually it did, take away the spamtrap fed blackholes (PBL and SPAMCOP)
> and the spamtrap fed BAYES as well and it scores a whopping 3.1 thanks
> to the BOTNET plugin (which is amazing btw). That hit was all from
> late-receiver effect.
>

Actually, it didn't.  The assertion is that if someone else hadn't seen
this exact message first, then SA wouldn't have caught it.

The PBL (which isn't spamtrap fed, it's collected from ISP published
and/or contributed data) would have caught this based upon issues that
have nothing at all to do with this message, and most likely nothing at
all to do with this current round of spam.  It would be based upon the
host provider's policy that this host shouldn't send email to the internet.

Similarly, the SPAMCOP listing is most likely not related to _this_
message.  It is more likely an ongoing abuse issue, so the fact that the
host fed a spamtrap at spamcop at some point in the past does not mean
that they were "lucky to catch this message".  The odds are that the
SPAMCOP listing has nothing to do with this message.


I would make the same characterization of BAYES.  You don't have to see
a specific message in the past in order for BAYES to catch it.
Therefore, you're not depending upon "luckily not being the first person
to see a given message".


Just resting upon BAYES, BOTNET, and PBL, you're not "lucky to have
caught the message because you're a late receiver".  You've caught the
message due to a combination of policy, misuse, and historical
characteristics of spam in general being used to train your system.
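
For reference, the arithmetic behind the two competing claims can be checked against the X-Spam-Report quoted above. This is only a minimal sketch in Python: the rule scores are copied from that report, while the helper name `score_without` is made up for illustration. Note that the per-rule scores add up to 16.7 rather than the reported 16.6, presumably because the report shows rounded values.

```python
# Rule scores copied from the X-Spam-Report quoted in this thread.
SCORES = {
    "BAYES_99": 5.5,
    "RDNS_NONE": 0.1,
    "RCVD_IN_BL_SPAMCOP_NET": 2.0,
    "RCVD_IN_PBL": 0.9,
    "BOTNET": 3.0,
    "DKIM_POLICY_SIGNSOME": 0.0,
    "BOTNET_NORDNS": 0.0,
    "MIME_HTML_MOSTLY": 0.0,
    "HTML_MESSAGE": 0.0,
    "LOGINHASH2": 1.5,
    "LOGINHASH1": 1.5,
    "DCC_CHECK": 2.2,
}
REQUIRED = 5.0  # "required=5.0" in the X-Spam-Status header


def score_without(*dropped):
    """Total score with the named rules removed (hypothetical helper)."""
    total = sum(s for rule, s in SCORES.items() if rule not in dropped)
    return round(total, 1)


# John Rudd's case: drop LOGINHASH*, DCC_CHECK and BAYES.
rudd = score_without("LOGINHASH1", "LOGINHASH2", "DCC_CHECK", "BAYES_99")

# bgodette's case: additionally drop the two blacklist hits.
bgodette = score_without("LOGINHASH1", "LOGINHASH2", "DCC_CHECK", "BAYES_99",
                         "RCVD_IN_PBL", "RCVD_IN_BL_SPAMCOP_NET")

print(rudd, rudd >= REQUIRED)
print(bgodette, bgodette >= REQUIRED)
```

The first case comes to 6.0, still above the 5.0 threshold; the second comes to 3.1, the figure bgodette quotes. So the disagreement is not about the arithmetic but about whether the PBL and SpamCop hits count as "late receiver" effects.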




Re: Spam PDF

by SARE Webmaster

Raymond Dijkxhoorn wrote:

> Hi!
>
>>> Jun 27 14:50:03 vmx80 MailScanner[4491]: Message l5RCnxP8019756 from
>>> 212.127.254.149 (idqct@...) to quicknet.nl is spam,
>>> SpamAssassin (not cached, score=24.191, required 5, BAYES_50 0.00,
>>> BODY_EMPTY 0.50, GMD_PDF_BAD_FUZZY 20.00, GMD_PDF_HORIZ 0.25,
>>> GMD_PDF_STOX 1.00, PROLO_NO_URI 0.01, RCVD_IN_WHOIS_BOGONS 2.43)
>
>> Where did those GMD rules come from?
>
> Will be announced later on.
>

Until it's publicly released, you can request it with a simple email to
us; see http://www.rulesemporium.com/plugins.htm#pdfinfo

Do not reply here, as I only read the digest, and I expect that subject
hardcoded so I can filter properly ;)



Re: Spam PDF

by John Thompson-4

Raymond Myren wrote:

> Just today I started receiving spam mails with attached .pdf files with
> a spam image.
> Any ideas how to stop this spam type?

Nothing, yet. But since these appear to be an image file encapsulated in
a .pdf, it may be possible to get FuzzyOCR to parse them for spam text.


--

-John Thompson (john@...)
 Appleton WI USA

Re: Spam PDF

by bgodette

> Actually, it didn't.  The assertion is that if someone else hadn't seen
> this exact message first, then SA wouldn't have caught it.

No, the assertion is that if someone else hadn't seen prior abuse from
the sending host first (not this exact message), then SA wouldn't have
caught that particular message. That assertion happens to be true for
the blacklists, and true for BAYES as well since it would have had to
have seen headers (since the payload is vastly different) that look like
this sending host in the recent past and been told that it was SPAM.

>
> The PBL (which isn't spamtrap fed, it's collected from ISP published
> and/or contributed data) would have caught this based upon issues that
> have nothing at all to do with this message, and most likely nothing at
> all to do with this current round of spam.  It would be based upon the
> host provider's policy that this host shouldn't send email to the internet.

Which means, some time, in the past, for whatever reasons that
particular IP address did something against someone's policy to end up
on that list. The important part being "in the past".

> Similarly, the SPAMCOP listing is most likely not related to _this_
> message.  It is more likely an ongoing abuse issue, so the fact that the
> host fed a spamtrap at spamcop at some point in the past does not mean
> that they were "lucky to catch this message".  The odds are that the
> SPAMCOP listing has nothing to do with this message.

Spamcop automatically delists IP addresses over time; to be relisted,
someone/something has to report new abuse. If you happen to receive the
message before anyone has reported the new abuse, well, it won't be listed.

> I would make the same characterization of BAYES.  You don't have to see
> a specific message in the past in order for BAYES to catch it.
> Therefore, you're not depending upon "luckily not being the first person
> to see a given message".

Explain how BAYES will have any matching tokens to work on if it's from a
fresh, never before seen by your system, zombie and there's no message
body other than the attachment? All you have to work with is headers
which you've never seen before and MIME boundaries which you've never
seen before.

> Just resting upon BAYES, BOTNET, and PBL, you're not "lucky to have
> caught the message because you're a late receiver".  You've caught the
> message due to a combination of policy, misuse, and historical
> characteristics of spam in general being used to train your system.

All of which needs prior examples/reporting of messages similar to the
one you're trying to detect, that's what "historical characteristics of
spam" means.

Re: Spam PDF

by Dallas Engelken

John Thompson wrote:

> Raymond Myren wrote:
>
>> Just today I started receiving spam mails with attached .pdf files with
>> a spam image.
>> Any ideas how to stop this spam type?
>
> Nothing, yet. But since these appear to be an image file encapsulated in
> a .pdf, it may be possible to get FuzzyOCR to parse them for spam text.

As was stated earlier...

Until it's publicly released, you can request a solution from SARE with a
simple email via the information at
http://www.rulesemporium.com/plugins.htm#pdfinfo

--
Dallas Engelken
dallase@...
http://uribl.com


Re: Spam PDF

by bgodette

arni wrote:
> bgodette@... wrote:
>> Actually it did, take away the spamtrap fed blackholes (PBL and SPAMCOP)
>> and the spamtrap fed BAYES as well and it scores a whopping 3.1 thanks
>> to the BOTNET plugin (which is amazing btw). That hit was all from
>> late-receiver effect.
>>
> That sounds a bit like "if we stopped trying to detect spam, we'd fail
> to catch it"
>
Sounds more like "if we didn't rely on other people to have seen this
particular abusive host before us and our learning system to have seen
past examples of spam that looks a whole lot like this one from headers
alone to detect this particular spam, we'd fail to catch it until we've
trained our system and the abusive host has been reported to various lists".

That's what makes policy (e.g. MTA checks, BOTNET) and behavior based
detection work as well as it does, it's proactive instead of reactive.

Re: Spam PDF

by John Rudd

bgodette@... wrote:
>> Actually, it didn't.  The assertion is that if someone else hadn't seen
>> this exact message first, then SA wouldn't have caught it.
>
> No, the assertion is that if someone else hadn't seen prior abuse from
> the sending host first (not this exact message), then SA wouldn't have
> caught that particular message. That assertion happens to be true for
> the blacklists, and true for BAYES as well since it would have had to
> have seen headers (since the payload is vastly different) that look like
> this sending host in the recent past and been told that it was SPAM.

Your assertion about bayes is not well supported.  It might have been
flagged by bayes for reasons that have _NOTHING_ to do with the received
headers.


>> The PBL (which isn't spamtrap fed, it's collected from ISP published
>> and/or contributed data) would have caught this based upon issues that
>> have nothing at all to do with this message, and most likely nothing at
>> all to do with this current round of spam.  It would be based upon the
>> host provider's policy that this host shouldn't send email to the internet.
>
> Which means, some time, in the past, for whatever reasons that
> particular IP address did something against someone's policy to end up
> on that list. The important part being "in the past".

No, it means that the ISP, or possibly net block user, told Spamhaus
"it's an end user IP address, and not a mail server".  There might be
_NO_ previous abuse from that IP address, and they'll still be listed.
The "policy" here is NOT the recipient's policy, the sendering network
owner's policy.


>> Similarly, the SPAMCOP listing is most likely not related to _this_
>> message.  It is more likely an ongoing abuse issue, so the fact that the
>> host fed a spamtrap at spamcop at some point in the past does not mean
>> that they were "lucky to catch this message".  The odds are that the
>> SPAMCOP listing has nothing to do with this message.
>
> Spamcop automatically delists IP addresses over time; to be relisted,
> someone/something has to report new abuse. If you happen to receive the
> message before anyone has reported the new abuse, well, it won't be listed.

It could have been recent abuse from an entirely different message
batch.  In other words, maybe that IP sent a standard stock scam
yesterday, and today it sent the pdf spam ... and this person was the
first one to receive that pdf spam message.  No previous recipient of
the same message.  But they'll still be listed at spamcop.


>> I would make the same characterization of BAYES.  You don't have to see
>> a specific message in the past in order for BAYES to catch it.
>> Therefore, you're not depending upon "luckily not being the first person
>> to see a given message".
>
> Explain how BAYES will have any matching tokens to work on if it's from a
> fresh, never before seen by your system, zombie and there's no message
> body other than the attachment? All you have to work with is headers
> which you've never seen before and MIME boundaries which you've never
> seen before.

There are more headers than just the received headers.  And, I honestly
don't know whether an attachment's raw data is analyzed by bayes.  My
assumption is that it is.


>> Just resting upon BAYES, BOTNET, and PBL, you're not "lucky to have
>> caught the message because you're a late receiver".  You've caught the
>> message due to a combination of policy, misuse, and historical
>> characteristics of spam in general being used to train your system.
>
> All of which needs prior examples/reporting of messages similar to the
> one you're trying to detect, that's what "historical characteristics of
> spam" means.

BOTNET does _NOT_ need prior reporting.  And the prior reporting the PBL
requires has nothing to do with abuse.  Further, BAYES does not depend
upon the received headers.  But even if you're right about bayes, your
claim that "all of which needs prior..." is at least 2/3 wrong, if not
3/3 wrong.




Re: Spam PDF

by arni-2

bgodette@... wrote:

> Sounds more like "if we didn't rely on other people to have seen this
> particular abusive host before us and our learning system to have seen
> past examples of spam that looks a whole lot like this one from headers
> alone to detect this particular spam, we'd fail to catch it until we've
> trained our system and the abusive host has been reported to various lists".
>
> That's what makes policy (e.g. MTA checks, BOTNET) and behavior based
> detection work as well as it does, it's proactive instead of reactive.

I have no spam that doesn't score at least BAYES_80. BAYES_80 is 3.5 points
here and BOTNET is 3 points here, which makes 6.5 total and a bust.

It doesn't have anything to do with being a late receiver, as I receive this
spam on a whole lot of addresses and not just one. Please don't tell me you
think I'm a late receiver on all of them.

arni

Re: Spam PDF

by Robert Schetterer

Dallas Engelken wrote:

> John Thompson wrote:
>> Raymond Myren wrote:
>>
>>> Just today I started receiving spam mails with attached .pdf files with
>>> a spam image.
>>> Any ideas how to stop this spam type?
>>
>> Nothing, yet. But since these appear to be an image file encapsulated in
>> a .pdf, it may be possible to get FuzzyOCR to parse them for spam text.
>
> As was stated earlier...
>
> Until it's publicly released, you can request a solution from SARE with a
> simple email via the information at
> http://www.rulesemporium.com/plugins.htm#pdfinfo

Hi Dallas,
I am happy to report that your rules matched all the PDF spam (I had 4)
caught in the past at my servers.
Good work!

--
Best regards

Robert Schetterer

https://www.schetterer.org
Germany


Re: Spam PDF

by John Rudd

John Rudd wrote:

> The "policy" here is NOT the recipient's policy, the sendering network
> owner's policy.


That was a rather mangled sentence...


The "policy" that is the P in PBL is not the recipient's spam/abuse/etc.
policy, it's the sending network owner's policy about who should or
shouldn't be allowed to send email out to the internet instead of going
through a network-owner controlled mail server.
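
As background for the PBL point: a DNS blacklist such as zen.spamhaus.org (the zone named in the report earlier in this thread) is queried by reversing the IPv4 octets and prepending them to the zone name, so the lookup involves only the connecting IP address and nothing from the message itself. A minimal sketch in Python follows; the function names are illustrative, `is_listed` needs network access, and treating any answer as "listed" is a simplification.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNS name used to look up an IPv4 address in a DNSBL."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone="zen.spamhaus.org"):
    """Return True if the DNSBL answers for this address (needs network).

    Simplification: any A record is treated as a listing; a real client
    would also inspect the returned 127.0.0.x code.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# The relay address from the report quoted earlier in this thread.
print(dnsbl_query_name("85.138.88.254"))
```

The printed name is `254.88.138.85.zen.spamhaus.org`; whether it resolves today depends on the list's current contents and on the resolver used.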

Re: Spam PDF

by Dallas Engelken

Robert Schetterer wrote:

> Dallas Engelken wrote:
>> John Thompson wrote:
>>> Raymond Myren wrote:
>>>
>>>> Just today I started receiving spam mails with attached .pdf files with
>>>> a spam image.
>>>> Any ideas how to stop this spam type?
>>>
>>> Nothing, yet. But since these appear to be an image file encapsulated in
>>> a .pdf, it may be possible to get FuzzyOCR to parse them for spam text.
>>
>> As was stated earlier...
>>
>> Until it's publicly released, you can request a solution from SARE with a
>> simple email via the information at
>> http://www.rulesemporium.com/plugins.htm#pdfinfo
>
> Hi Dallas,
> I am happy to report that your rules matched all the PDF spam (I had 4)
> caught in the past at my servers.
> Good work!

Good, as expected.   Thanks for the feedback.

--
Dallas Engelken
dallase@...
http://uribl.com


Re: Spam PDF

by Claude Frantz-2

Raymond Myren wrote:

> Just today I started receiving spam mails with attached .pdf files with
> a spam image.
> Any ideas how to stop this spam type?

I was able to decode to plain text using the following commands:

cat report.pdf | acroread -toPostScript  -level2 -saveVM | ps2ascii

Finally, very simple.

Claude

Re: Spam PDF

by Raymond Dijkxhoorn

Hi!

>> Just today I started receiving spam mails with attached .pdf files with a
>> spam image.
>> Any ideas how to stop this spam type?

> I was able to decode to plain text using the following commands:
>
> cat report.pdf | acroread -toPostScript  -level2 -saveVM | ps2ascii

And this scales? :)

Bye,
Raymond.

Re: Spam PDF

by Claude Frantz-2

Raymond Dijkxhoorn wrote:

>> I was able to decode to plain text using the following commands:
>>
>> cat report.pdf | acroread -toPostScript  -level2 -saveVM | ps2ascii
>
> And this scales? :)

It worked for me on one example of the many similar SPAM messages I have
received. It will probably not work on every one. Give it a try and report
your own results to us.

Claude

Re: Spam PDF

by Raymond Dijkxhoorn

Hi Claude,

>>> I was able to decode to plain text using the following commands:
>>>
>>> cat report.pdf | acroread -toPostScript  -level2 -saveVM | ps2ascii

>> And this scales? :)

> It worked for me on one example of the many similar SPAM messages I have
> received. It will probably not work on every one. Give it a try and report
> your own results to us.

No, I tested acroread, but it's not exactly a lightweight tool for these
conversions. You might almost be better off opening the PDFs and filtering
them manually ;)

If you get a couple of thousand an hour, like we do now, it ain't fun with
acroread.

Bye,
Raymond.

Re: Spam PDF

by Loren Wilton

>>> I was able to decode to plain text using the following commands:
>>>
>>> cat report.pdf | acroread -toPostScript  -level2 -saveVM | ps2ascii

There are two forms of these PDF spams.  The first ones had plain text and
looked very professional.  The second wave is image spam wrapped in a PDF,
and has all the usual ugly spammer tricks in the image to try to make it
unreadable by spam tools.  What it mostly does is makes it unreadable by
people, of course.

        Loren



Re: Spam PDF

by Claude Frantz-2

Just another command sequence which worked well on a file containing an
image too:

gs -sOutputFile=hugo -sDEVICE=pnmraw -dNOPAUSE -dBATCH -r600x600 hugo.pdf
cat hugo | pamthreshold -simple -threshold 0.5 | pamtopnm | ocrad --format=utf8

This could be a base for another prep and scanset for FuzzyOcr.

Just some ideas....

Claude
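
The two commands above could be driven from a script roughly as follows. This is a sketch only, in Python: the tool flags are exactly those in Claude's message, while the function names, the temporary file and the timeout are assumptions added for illustration. Ghostscript, Netpbm and ocrad must be installed for `ocr_pdf` to run.

```python
import os
import subprocess
import tempfile


def gs_command(pdf_path, out_path):
    """Ghostscript call from the message: render the PDF as raw PNM."""
    return ["gs", "-sOutputFile=" + out_path, "-sDEVICE=pnmraw",
            "-dNOPAUSE", "-dBATCH", "-r600x600", pdf_path]


def ocr_commands():
    """The threshold / convert / OCR stages, in pipeline order."""
    return [["pamthreshold", "-simple", "-threshold", "0.5"],
            ["pamtopnm"],
            ["ocrad", "--format=utf8"]]


def ocr_pdf(pdf_path, timeout=60):
    """Run the whole chain and return the recognised text (needs the tools)."""
    with tempfile.TemporaryDirectory() as tmp:
        pnm = os.path.join(tmp, "page.pnm")
        # Argument lists are passed directly, so no shell parses the file name.
        subprocess.run(gs_command(pdf_path, pnm), check=True, timeout=timeout,
                       stdout=subprocess.DEVNULL)
        with open(pnm, "rb") as fh:
            data = fh.read()
    # Feed each stage's output to the next, as the shell pipe does.
    for cmd in ocr_commands():
        data = subprocess.run(cmd, input=data, check=True, timeout=timeout,
                              stdout=subprocess.PIPE).stdout
    return data.decode("utf-8", errors="replace")
```

The per-stage timeout is there because rendering at 600 dpi is not cheap; on a busy mail server a lower resolution or a stricter limit may be needed.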

Re: Spam PDF

by Ralf Hildebrandt

* Raymond Dijkxhoorn <raymond@...>:

> No, I tested acroread, but it's not exactly a lightweight tool for these
> conversions. You might almost be better off opening the PDFs and filtering
> them manually ;)
>
> If you get a couple of thousand an hour, like we do now, it ain't fun with
> acroread.

Why not use pdf2ascii?

--
Ralf Hildebrandt (i.A. des IT-Zentrums)         Ralf.Hildebrandt@...
Charite - Universitätsmedizin Berlin            Tel.  +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-Berlin    Fax.  +49 (0)30-450 570-962
IT-Zentrum Standort CBF                    send no mail to plonk@...

Re: Spam PDF

by Yet Another Ninja

On 6/29/2007 1:27 PM, Ralf Hildebrandt wrote:

> * Raymond Dijkxhoorn <raymond@...>:
>
>> No, I tested acroread, but it's not exactly a lightweight tool for these
>> conversions. You might almost be better off opening the PDFs and filtering
>> them manually ;)
>>
>> If you get a couple of thousand an hour, like we do now, it ain't fun with
>> acroread.
>
> Why not use pdf2ascii?
>

Why not use PDFinfo?


Re: Spam PDF

by Andy Sutton-2

On Fri, 2007-06-29 at 12:58 +0200, Claude Frantz wrote:
> I was able to decode to plain text using the following commands:
>
> cat report.pdf | acroread -toPostScript  -level2 -saveVM | ps2ascii
>
> Finally, very simple.

Don't forget to filter escapes, or you might get a .pdf that includes
somethin' nasty (like a cd /; rm -rf *).  :)

--
- Andy

I myself am made entirely of flaws, stitched together with good intentions.
  - Augusten Burroughs
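
One possible reading of Andy's warning, sketched in Python: treat whatever text comes out of the PDF as untrusted data, strip terminal escape sequences and other control characters before logging or displaying it, and quote it if it ever has to be passed through a shell. The thread does not say exactly which escapes he meant, so the regular expressions and the function name below are assumptions.

```python
import re
import shlex

# ANSI/terminal escape sequences of the form ESC [ ... final-byte.
_ANSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
# Remaining control characters, keeping tab (\x09) and newline (\x0a).
_CTRL = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def strip_escapes(text):
    """Remove escape sequences and control characters from extracted text."""
    text = _ANSI.sub("", text)
    return _CTRL.sub("", text)


extracted = "buy \x1b[31mnow\x1b[0m\x07\n"
print(repr(strip_escapes(extracted)))

# Text that looks like a command is harmless while it stays data; if it must
# be handed to a shell, quote it rather than interpolating it raw.
print(shlex.quote("cd /; rm -rf *"))
```

The safer design is to avoid the shell altogether and pass the text to other programs on standard input or as a single list argument.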
