Bayesian filtering not kicking in, but it's trained.

by RinkWorks

I'm trying to run SpamAssassin 3.1.7 as root on a Linux machine (Debian Etch, Perl 5.8.8), with individual user Bayes databases.  Everything seems to be working except that I'm getting no BAYES_* scores for anything.  So, when reading mail for the 'ss1' user (which is me), I see lots of SpamAssassin headers but no BAYES scores.  However, ~ss1/.spamassassin is populated with bayes_seen and bayes_toks (no bayes_journal), and I am able to run sa-learn as the 'ss1' user and see these files being updated with the new data.

As far as autolearn goes, some emails are "autolearn=ham" but the rest are "autolearn=no" -- I don't see that I'm getting anything being autolearned as spam, but maybe I haven't gotten anything recently that scored high enough for that.  No idea if the data on the autolearned hams is actually making it to the right bayes database.

Anyway, spamd is running as root.  It's started with "/etc/init.d/spamassassin start" but the process that ultimately results has these arguments:

/usr/sbin/spamd --create-prefs --max-children 5 --helper-home-dir -d --pidfile=/var/run/spamd.pid

I'm not sure if I'm supposed to run "spamassassin -D --lint" as the 'ss1' user or the 'root' user, so here are both:

If I run "spamassassin -D --lint" as the 'ss1' user, grepping for "bayes", I get this:

[32082] dbg: config: read file /usr/share/spamassassin/23_bayes.cf
[32082] dbg: bayes: tie-ing to DB file R/O /home/ss1/.spamassassin/bayes_toks
[32082] dbg: bayes: tie-ing to DB file R/O /home/ss1/.spamassassin/bayes_seen
[32082] dbg: bayes: found bayes db version 3
[32082] dbg: bayes: DB journal sync: last sync: 0
[32082] dbg: bayes: DB journal sync: last sync: 0
[32082] dbg: bayes: corpus size: nspam = 2655, nham = 786
[32082] dbg: bayes: score = 0.168968394084945
[32082] dbg: bayes: DB journal sync: last sync: 0
[32082] dbg: bayes: untie-ing
[32082] dbg: bayes: untie-ing db_toks
[32082] dbg: bayes: untie-ing db_seen

If I run "spamassassin -D --lint" as the 'root' user, grepping for "bayes", I get this:

[32666] dbg: config: read file /usr/share/spamassassin/23_bayes.cf
[32666] dbg: bayes: no dbs present, cannot tie DB R/O: /root/.spamassassin/bayes_toks
[32666] dbg: bayes: no dbs present, cannot tie DB R/O: /root/.spamassassin/bayes_toks
[32666] dbg: bayes: no dbs present, cannot tie DB R/O: /root/.spamassassin/bayes_toks
[32666] dbg: bayes: not scoring message, returning undef
[32666] dbg: bayes: opportunistic call attempt failed, DB not readable

...but that's expected, right?  I'm running as root, which doesn't have its own bayes database, but I want to have individual user bayes databases, and so mail sent to 'ss1' should be using the bayes files in ~ss1/.spamassassin/bayes.  Right?  Or is this the problem?

Thanks so much in advance for any help any of you can give.

Re: Bayesian filtering not kicking in, but it's trained.

by Matus UHLAR - fantomas

On 05.09.07 08:28, RinkWorks wrote:
> Subject: Bayesian filtering not kicking in, but it's trained.

Is it trained with enough spam and ham?

--
Matus UHLAR - fantomas, uhlar@... ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
WinError #98652: Operation completed successfully.

Re: Bayesian filtering not kicking in, but it's trained.

by RinkWorks

Matus UHLAR - fantomas wrote:
> Is it trained with enough spam and ham?

Yes.  I've got the defaults of 200 hams and 200 spams required, and as you can see from the -D output, I've got 2655 spams and 786 hams that it currently knows about in the ss1 user's bayes data files.
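
The comparison made above (corpus size against the 200/200 minimums) can be scripted. This is only a minimal sketch: `bayes_ready` is a hypothetical helper, not part of SpamAssassin, and it assumes the debug line format shown earlier in this thread. The two optional arguments stand in for the bayes_min_spam_num and bayes_min_ham_num settings and default to 200.

```shell
# Hypothetical helper: read "spamassassin -D --lint" debug output on stdin
# and report whether the Bayes corpus meets the minimum counts.
# $1 = minimum spam count, $2 = minimum ham count (both default to 200).
bayes_ready() {
    awk -v min_spam="${1:-200}" -v min_ham="${2:-200}" '
        /corpus size: nspam = / {
            line = $0
            sub(/.*nspam = /, "", line); nspam = line + 0   # leading number only
            sub(/.*nham = /, "", line);  nham  = line + 0
            found = 1
            exit                                            # jump to END
        }
        END {
            if (!found) { print "no corpus size line found"; exit 2 }
            if (nspam >= min_spam && nham >= min_ham) {
                print "ready: nspam=" nspam " nham=" nham
                exit 0
            }
            print "not ready: nspam=" nspam " nham=" nham
            exit 1
        }'
}
```

Typical use would be `spamassassin -D --lint 2>&1 | bayes_ready`, run as the user whose database you want to check. It only describes the database that this particular invocation opened, which may not be the one used when mail is actually scanned.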

Re: Bayesian filtering not kicking in, but it's trained.

by Matus UHLAR - fantomas

On 05.09.07 08:28, RinkWorks wrote:
> I'm trying to run Spam Assassin 3.1.7 as root on a Linux machine (Debian
> Etch, Perl 5.8.8), with individual user Bayes databases.  Everything seems
> to be working except that I'm getting no BAYES_* scores for anything.  So,
> when reading mail for the 'ss1' user (which is me), I see lots of
> SpamAssassin headers but no BAYES scores.  However, ~ss1/.spamassassin is
> populated with bayes_seen and bayes_toks (no bayes_journal), and I am able
> to run sa-learn as the 'ss1' user and see these files being updated with the
> new data.

> If I run "spamassassin -D --lint" as the 'ss1' user, grepping for "bayes", I
> get this:
>
> [32082] dbg: config: read file /usr/share/spamassassin/23_bayes.cf
> [32082] dbg: bayes: tie-ing to DB file R/O /home/ss1/.spamassassin/bayes_toks
> [32082] dbg: bayes: tie-ing to DB file R/O /home/ss1/.spamassassin/bayes_seen
> [32082] dbg: bayes: found bayes db version 3
> [32082] dbg: bayes: DB journal sync: last sync: 0
> [32082] dbg: bayes: DB journal sync: last sync: 0
> [32082] dbg: bayes: corpus size: nspam = 2655, nham = 786
> [32082] dbg: bayes: score = 0.168968394084945
> [32082] dbg: bayes: DB journal sync: last sync: 0
> [32082] dbg: bayes: untie-ing
> [32082] dbg: bayes: untie-ing db_toks
> [32082] dbg: bayes: untie-ing db_seen

This shows that SpamAssassin did check the Bayes database and that the spam
probability of the scanned message is 0.168968394084945, which should be
matched by BAYES_20.
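
To make the mapping from score to rule concrete, here is a rough sketch. `bayes_bucket` is a hypothetical helper, and the range boundaries are assumed to be the stock ones from 23_bayes.cf (0.01, 0.05, 0.20, 0.40, 0.60, 0.80, 0.95, 0.99, 1.00); check your own copy of that file, since a local configuration may redefine them.

```shell
# Hypothetical helper: print the BAYES_* rule whose range contains the score.
# Upper bounds are ASSUMED from the stock 23_bayes.cf; each range is treated
# as [lower, upper), with the last one also accepting exactly 1.00.
bayes_bucket() {
    awk -v p="$1" 'BEGIN {
        n = split("0.01 0.05 0.20 0.40 0.60 0.80 0.95 0.99 1.00", hi, " ")
        split("BAYES_00 BAYES_05 BAYES_20 BAYES_40 BAYES_50 BAYES_60 BAYES_80 BAYES_95 BAYES_99", name, " ")
        for (i = 1; i <= n; i++) {
            if (p + 0 < hi[i] + 0 || i == n) { print name[i]; exit }
        }
    }'
}

bayes_bucket 0.168968394084945
```

With the score from the debug output above, this prints BAYES_20.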

Have you turned Bayes filtering off somewhere? use_bayes_rules 0?
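
One way to answer that question is to search the configuration files for the relevant switches. The sketch below is an illustration only: `find_bayes_switches` is a hypothetical helper, and the file locations in the usage note are assumptions about a typical Debian layout, not something confirmed in this thread. It looks for both use_bayes and use_bayes_rules and ignores commented-out lines.

```shell
# Hypothetical helper: list every active use_bayes / use_bayes_rules line
# in the given files as file:line:text. Lines starting with "#" are skipped
# because the pattern requires the option name to come first on the line.
find_bayes_switches() {
    grep -HnE '^[[:space:]]*use_bayes(_rules)?[[:space:]]+[01]' "$@" 2>/dev/null
}
```

For example: `find_bayes_switches /etc/spamassassin/*.cf ~/.spamassassin/user_prefs` (assumed paths). No output means neither option is set explicitly in those files.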

> As far as autolearn goes, some emails are "autolearn=ham" but the rest are
> "autolearn=no" -- I don't see that I'm getting anything being autolearned as
> spam, but maybe I haven't gotten anything recently that scored high enough
> for that.  No idea if the data on the autolearned hams is actually making it
> to the right bayes database.

Do you reject mail that scores above some value? The default threshold for
autolearning spam is 12, so if you reject that mail, you'll never autolearn spam...

--
Matus UHLAR - fantomas, uhlar@... ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Fighting for peace is like fucking for virginity...

Re: Bayesian filtering not kicking in, but it's trained.

by RinkWorks

Matus UHLAR - fantomas wrote:
> Have you turned Bayes filtering off somewhere? use_bayes_rules 0?

No.  That's what's so confusing.  But there's an update now.  Apparently at some point
yesterday, BAYES tests just suddenly started showing up.  I wasn't doing anything at the
time; it just suddenly started kicking in.  That doesn't make a whole lot of sense to me
unless I had *just* autolearned enough spams and hams for Bayesian filtering to take hold.
But as I say, I was hundreds of hams and thousands of spams over the minimum long before
that.

So it's a mystery, I guess, but case closed.  But thank you very much for giving this matter
your attention.

Re: Bayesian filtering not kicking in, but it's trained.

by Anthony Peacock

Hi,

RinkWorks wrote:

> Matus UHLAR - fantomas wrote:
>> Have you turned Bayes filtering off somewhere? use_bayes_rules 0?
>
> No.  That's what's so confusing.  But there's an update now.  Apparently at
> some point yesterday, BAYES tests just suddenly started showing up.  I wasn't
> doing anything at the time; it just suddenly started kicking in.  That doesn't
> make a whole lot of sense to me unless I had *just* autolearned enough spams
> and hams for Bayesian filtering to take hold.  But as I say, I was hundreds of
> hams and thousands of spams over the minimum long before that.
>
> So it's a mystery, I guess, but case closed.  But thank you very much for
> giving this matter your attention.

To me this sounds like the Bayes database you are looking at when you
check the number of learnt messages is not the same one used when
scanning emails.  Are you running the checks on the Bayes database as
the same user that SA runs as normally?
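
A quick way to compare which database files each invocation actually opens is to pull the paths out of the debug output. This is a sketch under the assumption that the "tie-ing to DB file" and "no dbs present, cannot tie DB" lines look like the ones posted earlier in this thread; `bayes_db_paths` is a hypothetical helper name.

```shell
# Hypothetical helper: read "spamassassin -D --lint" debug output on stdin
# and print each distinct Bayes DB path it tried to open (found or not).
bayes_db_paths() {
    sed -n \
        -e 's|.*bayes: tie-ing to DB file R/[OW] ||p' \
        -e 's|.*bayes: no dbs present, cannot tie DB R/[OW]: ||p' |
    sort -u
}
```

Running `spamassassin -D --lint 2>&1 | bayes_db_paths` once as the account you train with and once as the account that scans mail should show whether the two point at the same directory.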

--
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
"A CAT scan should take less time than a PET scan.  For a CAT scan,
  they're only looking for one thing, whereas a PET scan could result in
  a lot of things."    - Carl Princi, 2002/07/19

Re: Bayesian filtering not kicking in, but it's trained.

by maillist-8

RinkWorks wrote:
> I'm trying to run Spam Assassin 3.1.7 as root
Let me stop you right there.  You cannot run spamd as root.  It drops
privs, and runs as user "nobody".
>
> /usr/sbin/spamd --create-prefs --max-children 5 --helper-home-dir -d --pidfile=/var/run/spamd.pid
It would be best to create a spamd user, and start with this:

/usr/sbin/spamd --create-prefs --username=spamd --max-children 5 --helper-home-dir -d --pidfile=/var/run/spamd.pid

You can specify a bayes_path in your config, and run sa-learn as root if you'd like.


-=Aubrey=-




Re: Bayesian filtering not kicking in, but it's trained.

by Kris Deugau

maillist wrote:
> RinkWorks wrote:
>> I'm trying to run Spam Assassin 3.1.7 as root
> Let me stop you right there.  You cannot run spamd as root.  It drops
> privs, and runs as user "nobody".

Not quite correct...  spamd will drop privs to nobody *for that call* if
spamc is run by root without -u <non-root-username>.

Otherwise, per-user configs using system users don't work, because if
spamd doesn't run as root, it can't fork and drop privileges (or
whatever the exact process is;  IIRC it changed a while back) to the
calling user.

I'm happily running 3.1.9's spamd as root, calling spamc from individual
.procmailrc files on several systems.

(I have had to switch to calling "spamassassin" for root's mail
filtering, however.)
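
As an illustration of the point about -u: when the caller is root (or another account that is not the recipient), the recipient has to be named explicitly on the spamc command line. The wrapper below is a sketch only; `scan_as` is a hypothetical name, and how your MTA or delivery agent would supply the recipient is left open.

```shell
# Hypothetical wrapper: filter the message on stdin through spamc on behalf
# of a named, non-root user so spamd can use that user's preferences.
# $1 = recipient's system user name.
scan_as() {
    user="$1"
    if [ -z "$user" ] || [ "$user" = "root" ]; then
        echo "scan_as: need a non-root recipient user name" >&2
        return 64
    fi
    spamc -u "$user"    # -u tells spamd which user to process this message as
}
```

Usage would look like `scan_as ss1 < message.txt`. When spamc is already run by the recipient, as in a per-user .procmailrc, the -u argument is not needed.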

-kgd

Re: Bayesian filtering not kicking in, but it's trained.

by RinkWorks

RinkWorks wrote:
> So it's a mystery, I guess, but case closed.  But thank you very much for giving this matter
> your attention.

I was wrong -- the case is still open.  But I found out why Bayes wasn't working and then kicked in.

Basically, I discovered that SpamAssassin wasn't paying attention to the whitelist_from statements in my user_prefs file.  So I wondered if it was using a different .spamassassin directory somewhere.  Sure enough, there's a /var/spool/exim4/.spamassassin directory.  Bayesian filtering wasn't working and then suddenly kicked in because *THAT* directory's Bayes database hadn't gotten enough hams and spams yet, but eventually it autolearned enough of both to kick in.

That directory is owned by the "Debian-exim4" user, which is the user that owns the exim4 daemon process.  However, the "spamd" processes are running as root.

There must be a way to have spamd run in a way that it looks at each individual user's .spamassassin directory instead of the mail daemon user.  I'd think that would be a common thing.  But I can't figure out how to set it up that way.

Anybody know?

Just to reiterate from before, when "/etc/init.d/spamassassin start" runs, I get a process that looks like this:

root     25165  0.0  1.3  32176 28780 ?        SNs  Sep07   0:05 /usr/sbin/spamd --create-prefs --max-children 5 --helper-home-dir -d --pidfile=/var/run/spamd.pid


Thanks in advance.

Re: Bayesian filtering not kicking in, but it's trained.

by Jerry Durand

On Sat, 2007-09-08 at 10:25 -0700, RinkWorks wrote:

> Basically, I discovered that Spam Assassin wasn't paying attention to the
> whitelist_from statements in my user_prefs file.  So I wondered if it was
> using a different .spamassassin directory somewhere.  Sure enough, there's a
> /var/spool/exim4/.spamassassin directory.  The reason why Bayesian filtering
> wasn't working, then suddenly kicked in, is because *THAT* directory's
> bayesian filtering database hadn't gotten enough hams and spams yet, but
> eventually it autolearned enough of both to kick in.
>
> That directory is owned by the "Debian-exim4" user, which is the user that
> owns the exim4 daemon process.  However, the "spamd" processes are running
> as root.

Sounds like the default installation of OS X Server.  For that the fix
is deleting one of the directories and putting in a link to the other
one.  Crude, but it works.

--
Jerry Durand, Durand Interstellar, Inc.
Los Gatos, California, USA, www.interstellar.com
tel: +1.408.356.3886, USA:  866-356-3886, Skype:  jerrydurand


Re: Bayesian filtering not kicking in, but it's trained.

by Kris Deugau

RinkWorks wrote:
> There must be a way to have spamd run in a way that it looks at each
> individual user's .spamassassin directory instead of the mail daemon user.
> I'd think that would be a common thing.  But I can't figure out how to set
> it up that way.

Whether you can do that depends on how you're calling SA.  If you really
want per-user preferences, SA should be called as one of the last steps
in your mail processing.

It sounds like you've got SA called as an MTA content filter from Exim,
rather than from procmail or maildrop (or whatever else you might be
using) just before delivery.  You **MAY** be able to modify your
existing call to SA enough to do what you want, but content filtering in
the MTA is prone to conflicts between what different recipients want done.

My own per-user setups call SA from individual .procmailrc files, at
which time there is only one recipient for the message, and it's clear
who that recipient is.

-kgd