Bayesian filtering not kicking in, but it's trained.

I'm trying to run Spam Assassin 3.1.7 as root on a Linux machine (Debian Etch, Perl 5.8.8), with individual user Bayes databases. Everything seems to be working except that I'm getting no BAYES_* scores for anything. So, when reading mail for the 'ss1' user (which is me), I see lots of SpamAssassin headers but no BAYES scores. However, ~ss1/.spamassassin is populated with bayes_seen and bayes_toks (no bayes_journal), and I am able to run sa-learn as the 'ss1' user and see these files being updated with the new data.

As far as autolearn goes, some emails are "autolearn=ham" but the rest are "autolearn=no" -- I don't see that I'm getting anything being autolearned as spam, but maybe I haven't gotten anything recently that scored high enough for that. No idea if the data on the autolearned hams is actually making it to the right bayes database.

Anyway, spamd is running as root. It's started with "/etc/init.d/spamassassin start" but the process that ultimately results has these arguments:

/usr/sbin/spamd --create-prefs --max-children 5 --helper-home-dir -d --pidfile=/var/run/spamd.pid

I'm not sure if I'm supposed to run "spamassassin -D --lint" as the 'ss1' user or the 'root' user, so here are both:

If I run "spamassassin -D --lint" as the 'ss1' user, grepping for "bayes", I get this:

[32082] dbg: config: read file /usr/share/spamassassin/23_bayes.cf
[32082] dbg: bayes: tie-ing to DB file R/O /home/ss1/.spamassassin/bayes_toks
[32082] dbg: bayes: tie-ing to DB file R/O /home/ss1/.spamassassin/bayes_seen
[32082] dbg: bayes: found bayes db version 3
[32082] dbg: bayes: DB journal sync: last sync: 0
[32082] dbg: bayes: DB journal sync: last sync: 0
[32082] dbg: bayes: corpus size: nspam = 2655, nham = 786
[32082] dbg: bayes: score = 0.168968394084945
[32082] dbg: bayes: DB journal sync: last sync: 0
[32082] dbg: bayes: untie-ing
[32082] dbg: bayes: untie-ing db_toks
[32082] dbg: bayes: untie-ing db_seen

If I run "spamassassin -D --lint" as the 'root' user, grepping for "bayes", I get this:

[32666] dbg: config: read file /usr/share/spamassassin/23_bayes.cf
[32666] dbg: bayes: no dbs present, cannot tie DB R/O: /root/.spamassassin/bayes_toks
[32666] dbg: bayes: no dbs present, cannot tie DB R/O: /root/.spamassassin/bayes_toks
[32666] dbg: bayes: no dbs present, cannot tie DB R/O: /root/.spamassassin/bayes_toks
[32666] dbg: bayes: not scoring message, returning undef
[32666] dbg: bayes: opportunistic call attempt failed, DB not readable

...but that's expected, right? I'm running as root, which doesn't have its own bayes database, but I want to have individual user bayes databases, and so mail sent to 'ss1' should be using the bayes files in ~ss1/.spamassassin/bayes. Right? Or is this the problem?

Thanks so much in advance for any help any of you can give.
|
|
Re: Bayesian filtering not kicking in, but it's trained.

On 05.09.07 08:28, RinkWorks wrote:
> Subject: Bayesian filtering not kicking in, but it's trained.

Is it trained with enough spams and hams?

--
Matus UHLAR - fantomas, uhlar@... ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
WinError #98652: Operation completed successfully.
|
|
Re: Bayesian filtering not kicking in, but it's trained.

Yes. I've got the defaults of 200 hams and 200 spams required, and as you can see from the -D output, I've got 2655 spams and 786 hams that it currently knows about in the ss1 user's bayes data files.
|
|
Re: Bayesian filtering not kicking in, but it's trained.

On 05.09.07 08:28, RinkWorks wrote:
> I'm trying to run Spam Assassin 3.1.7 as root on a Linux machine (Debian
> Etch, Perl 5.8.8), with individual user Bayes databases. Everything seems
> to be working except that I'm getting no BAYES_* scores for anything. So,
> when reading mail for the 'ss1' user (which is me), I see lots of
> SpamAssassin headers but no BAYES scores. However, ~ss1/.spamassassin is
> populated with bayes_seen and bayes_toks (no bayes_journal), and I am able
> to run sa-learn as the 'ss1' user and see these files being updated with the
> new data.

> If I run "spamassassin -D --lint" as the 'ss1' user, grepping for "bayes", I
> get this:
>
> [32082] dbg: config: read file /usr/share/spamassassin/23_bayes.cf
> [32082] dbg: bayes: tie-ing to DB file R/O /home/ss1/.spamassassin/bayes_toks
> [32082] dbg: bayes: tie-ing to DB file R/O /home/ss1/.spamassassin/bayes_seen
> [32082] dbg: bayes: found bayes db version 3
> [32082] dbg: bayes: DB journal sync: last sync: 0
> [32082] dbg: bayes: DB journal sync: last sync: 0
> [32082] dbg: bayes: corpus size: nspam = 2655, nham = 786
> [32082] dbg: bayes: score = 0.168968394084945
> [32082] dbg: bayes: DB journal sync: last sync: 0
> [32082] dbg: bayes: untie-ing
> [32082] dbg: bayes: untie-ing db_toks
> [32082] dbg: bayes: untie-ing db_seen

This shows that spamassassin did check the Bayes database, and that the spam probability of the scanned message is 0.168968394084945, which should be matched by BAYES_20.

Haven't you turned bayes filtering off somewhere? use_bayes_rules 0?

> As far as autolearn goes, some emails are "autolearn=ham" but the rest are
> "autolearn=no" -- I don't see that I'm getting anything being autolearned as
> spam, but maybe I haven't gotten anything recently that scored high enough
> for that. No idea if the data on the autolearned hams is actually making it
> to the right bayes database.

Do you reject mails with a score over some value? The default threshold for autolearning spam is 12 (bayes_auto_learn_threshold_spam); if you reject mail at or below that score, you'll never autolearn spam...

--
Matus UHLAR - fantomas, uhlar@... ; http://www.fantomas.sk/
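
To make the BAYES_20 remark concrete, here is a minimal Python sketch of how a Bayes probability maps onto the BAYES_* rule names. The ranges are an assumption based on the stock 23_bayes.cf shipped with SpamAssassin 3.1.x, the function name bayes_rule_for is hypothetical, and treating each upper bound as inclusive is also an assumption; check your own 23_bayes.cf before relying on it.

```python
# Assumed ranges, modelled on the stock 23_bayes.cf of SpamAssassin 3.1.x.
# Each entry is (upper bound of the range, rule name), in ascending order.
BAYES_RANGES = [
    (0.01, "BAYES_00"),
    (0.05, "BAYES_05"),
    (0.20, "BAYES_20"),
    (0.40, "BAYES_40"),
    (0.60, "BAYES_50"),
    (0.80, "BAYES_60"),
    (0.95, "BAYES_80"),
    (0.99, "BAYES_95"),
    (1.00, "BAYES_99"),
]


def bayes_rule_for(prob):
    """Return the BAYES_* rule name whose range contains prob (hypothetical helper)."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("Bayes probability must be between 0 and 1")
    for upper, name in BAYES_RANGES:
        # Assumption: the upper bound of each range is inclusive.
        if prob <= upper:
            return name
    raise AssertionError("unreachable: the last range ends at 1.0")


# The score from the --lint debug output quoted above.
print(bayes_rule_for(0.168968394084945))  # BAYES_20
```

If the debug output shows a score like this but no BAYES_* rule appears in the delivered mail's headers, the scan that produced the headers probably did not use the same database.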
|
|
Re: Bayesian filtering not kicking in, but it's trained.

No. That's what's so confusing. But there's an update now. Apparently at some point yesterday, BAYES tests just suddenly started showing up. I wasn't doing anything at the time; it just suddenly started kicking in. That doesn't make a whole lot of sense to me unless I had *just* autolearned enough spams and hams for Bayesian filtering to take hold. But as I say, I was hundreds of hams and thousands of spams over the minimum long before that.

So it's a mystery, I guess, but case closed. But thank you very much for giving this matter your attention.
|
|
Re: Bayesian filtering not kicking in, but it's trained.

Hi,

RinkWorks wrote:
> Matus UHLAR - fantomas wrote:
>> Haven't you turned bayes filtering off somewhere? use_bayes_rules 0?
>
> No. That's what's so confusing. But there's an update now. Apparently at
> some point yesterday, BAYES tests just suddenly started showing up. I wasn't
> doing anything at the time; it just suddenly started kicking in. That doesn't
> make a whole lot of sense to me unless I had *just* autolearned enough spams
> and hams for Bayesian filtering to take hold. But as I say, I was hundreds of
> hams and thousands of spams over the minimum long before that.
>
> So it's a mystery, I guess, but case closed. But thank you very much for
> giving this matter your attention.

To me this sounds like the Bayes database you are looking at when you check the number of learnt messages is not the same one used when scanning emails. Are you running the checks on the Bayes database as the same user that SA normally runs as?

--
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW: http://www.chime.ucl.ac.uk/~rmhiajp/
"A CAT scan should take less time than a PET scan. For a CAT scan, they're only looking for one thing, whereas a PET scan could result in a lot of things." - Carl Princi, 2002/07/19
|
|
Re: Bayesian filtering not kicking in, but it's trained.

RinkWorks wrote:
> I'm trying to run Spam Assassin 3.1.7 as root

Let me stop you right there. You cannot run spamd as root. It drops privs, and runs as user "nobody".

> /usr/sbin/spamd --create-prefs --max-children 5 --helper-home-dir -d
> --pidfile=/var/run/spamd.pid

It would be best to create a spamd user, and start with this:

/usr/sbin/spamd --create-prefs --username=spamd --max-children 5 --helper-home-dir -d --pidfile=/var/run/spamd.pid

You can specify a bayes_path in your config, and run sa-learn as root if you'd like.

-=Aubrey=-
|
|
Re: Bayesian filtering not kicking in, but it's trained.

maillist wrote:
> RinkWorks wrote:
>> I'm trying to run Spam Assassin 3.1.7 as root
> Let me stop you right there. You cannot run spamd as root. It drops
> privs, and runs as user "nobody".

Not quite correct... spamd will drop privs to nobody *for that call* if spamc is run by root without -u <non-root-username>.

Otherwise, per-user configs using system users don't work, because if spamd doesn't run as root, it can't fork and drop privileges (or whatever the exact process is; IIRC it changed a while back) to the calling user.

I'm happily running 3.1.9's spamd as root, calling spamc from individual .procmailrc files on several systems. (I have had to switch to calling "spamassassin" for root's mail filtering, however.)

-kgd
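
As a rough illustration of the per-user call described above, the sketch below pipes a message through spamc with -u, so that a spamd running as root can switch to that user for the scan. The helper names are hypothetical, and it assumes spamc is on the PATH and spamd is already running.

```python
import subprocess


def spamc_command(user):
    """Build the spamc argument list for scanning on behalf of `user` (hypothetical helper)."""
    return ["spamc", "-u", user]


def scan_for_user(user, raw_message):
    """Pipe raw_message (bytes) through spamc and return the rewritten message.

    Assumes a spamd daemon is reachable with spamc's default settings.
    """
    result = subprocess.run(
        spamc_command(user),
        input=raw_message,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

For example, scan_for_user("ss1", message_bytes) would ask spamd to use the 'ss1' user's preferences and Bayes files, where message_bytes is the raw mail read from a file or from stdin.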
|
|
Re: Bayesian filtering not kicking in, but it's trained.

I was wrong -- the case is still open. But I found out why Bayes wasn't working and then kicked in.

Basically, I discovered that Spam Assassin wasn't paying attention to the whitelist_from statements in my user_prefs file. So I wondered if it was using a different .spamassassin directory somewhere. Sure enough, there's a /var/spool/exim4/.spamassassin directory. The reason why Bayesian filtering wasn't working, then suddenly kicked in, is that *THAT* directory's Bayes database hadn't gotten enough hams and spams yet, but eventually it autolearned enough of both to kick in.

That directory is owned by the "Debian-exim4" user, which is the user that owns the exim4 daemon process. However, the "spamd" processes are running as root.

There must be a way to have spamd run in a way that it looks at each individual user's .spamassassin directory instead of the mail daemon user's. I'd think that would be a common thing. But I can't figure out how to set it up that way. Anybody know?

Just to reiterate from before, when "/etc/init.d/spamassassin start" runs, I get a process that looks like this:

root 25165 0.0 1.3 32176 28780 ? SNs Sep07 0:05 /usr/sbin/spamd --create-prefs --max-children 5 --helper-home-dir -d --pidfile=/var/run/spamd.pid

Thanks in advance.
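
One way to see which Bayes database is actually being updated is to compare the candidate directories side by side. This is a minimal Python sketch, assuming the two directories named in this thread; describe_bayes_dirs is a hypothetical helper, not part of SpamAssassin.

```python
import os
import pwd
import time


def describe_bayes_dirs(dirs):
    """Return one report line per directory: owner, size and last change of bayes_toks."""
    lines = []
    for d in dirs:
        toks = os.path.join(d, "bayes_toks")
        if not os.path.exists(toks):
            lines.append(f"{d}: no bayes_toks")
            continue
        st = os.stat(toks)
        try:
            owner = pwd.getpwuid(st.st_uid).pw_name
        except KeyError:
            # The uid has no passwd entry; fall back to the number.
            owner = str(st.st_uid)
        changed = time.strftime("%Y-%m-%d %H:%M", time.localtime(st.st_mtime))
        lines.append(f"{d}: owner={owner} size={st.st_size} modified={changed}")
    return lines


if __name__ == "__main__":
    # Directories taken from this thread; adjust for your own system.
    candidates = [
        os.path.expanduser("~ss1/.spamassassin"),
        "/var/spool/exim4/.spamassassin",
    ]
    for line in describe_bayes_dirs(candidates):
        print(line)
```

Running it before and after a few messages arrive shows which bayes_toks file has a changing modification time, and therefore which database the mail scans are really using.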
|
|
Re: Bayesian filtering not kicking in, but it's trained.

On Sat, 2007-09-08 at 10:25 -0700, RinkWorks wrote:
> Basically, I discovered that Spam Assassin wasn't paying attention to the
> whitelist_from statements in my user_prefs file. So I wondered if it was
> using a different .spamassassin directory somewhere. Sure enough, there's a
> /var/spool/exim4/.spamassassin directory. The reason why Bayesian filtering
> wasn't working, then suddenly kicked in, is because *THAT* directory's
> bayesian filtering database hadn't gotten enough hams and spams yet, but
> eventually it autolearned enough of both to kick in.
>
> That directory is owned by the "Debian-exim4" user, which is the user that
> owns the exim4 daemon process. However, the "spamd" processes are running
> as root.

Sounds like the default installation of OS X Server. For that the fix is deleting one of the directories and putting in a link to the other one. Crude, but it works.

--
Jerry Durand, Durand Interstellar, Inc.
Los Gatos, California, USA, www.interstellar.com
tel: +1.408.356.3886, USA: 866-356-3886, Skype: jerrydurand
|
|
Re: Bayesian filtering not kicking in, but it's trained.

RinkWorks wrote:
> There must be a way to have spamd run in a way that it looks at each
> individual user's .spamassassin directory instead of the mail daemon user.
> I'd think that would be a common thing. But I can't figure out how to set
> it up that way.

Whether you can do that depends on how you're calling SA. If you really want per-user preferences, SA should be called as one of the last steps in your mail processing.

It sounds like you've got SA called as an MTA content filter from Exim, rather than from procmail or maildrop (or whatever else you might be using) just before delivery. You **MAY** be able to modify your existing call to SA enough to do what you want, but content filtering in the MTA is prone to conflicts between what different recipients want done.

My own per-user setups call SA from individual .procmailrc files, at which time there is only one recipient for the message, and it's clear who that recipient is.

-kgd