bayes_seen = 256GB

SpamAssassin-3.2.0
FreeBSD 6.2

The file bayes_seen has grown to 256GB (274992939008 bytes)! How do I cap the size of this file? I don't want it to grow larger than, say, 800MB at the most.

Thanks.
|
|
Re: bayes_seen = 256GB

On Wed, Sep 19, 2007 at 02:11:19PM -0700, mfahey wrote:
> SpamAssassin-3.2.0
> Freebsd6.2
>
> The file bayes_seen has grown in size to 256GB! (274992939008)
> How do I cap the size limit of this file? I want to have it not grow larger
> then say 800mb at the most!

You need to expire old bayes tokens. The limit is set not as a size, but as a count of tokens. The default is 150,000 tokens iirc, but you can set it yourself by setting bayes_expiry_max_db_size to whatever value you want. Pretty much any number you'd reasonably choose will put you at less than 800MB. :)

To make it expire, either set bayes_auto_expire to 1 and let it expire tokens automatically, or run sa-learn --force-expire and sa-learn --sync.

--
Gus
|
|
Re: bayes_seen = 256GB

On Wed, Sep 19, 2007 at 03:23:50PM -0600, Mr. Gus wrote:
> > The file bayes_seen has grown in size to 256GB! (274992939008)
> > How do I cap the size limit of this file? I want to have it not grow larger
> > then say 800mb at the most!
>
> You need to expire old bayes tokens. The limit is set not as a size, but as

Expiring bayes tokens does nothing to the bayes_seen file. There is no expiry for bayes_seen.

If the seen file is bigger than you'd like, I'd just rm the file.

--
Randomly Selected Tagline:
"No animals were harmed in this production. Any resemblance to other smoking cats, real or imagined, is purely coincidental." - Richard Basile
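The "just rm the file" advice can be wrapped in a small size check, for example from a cron job. This is only a minimal sketch: the function name `remove_if_oversized`, the 800MB cap, and the `~/.spamassassin/bayes_seen` location are assumptions for illustration (a site-wide setup may point bayes_path somewhere else), and whether it is safe to remove the file while spamd is running is not something this thread settles.

```python
from pathlib import Path


def remove_if_oversized(path: Path, max_bytes: int) -> bool:
    """Delete path if it exists and is larger than max_bytes.

    Returns True if the file was deleted, False otherwise.
    """
    try:
        size = path.stat().st_size
    except FileNotFoundError:
        return False  # nothing to do
    if size <= max_bytes:
        return False
    path.unlink()
    return True


if __name__ == "__main__":
    # Assumed per-user location; adjust to wherever bayes_path points.
    seen_file = Path.home() / ".spamassassin" / "bayes_seen"
    if remove_if_oversized(seen_file, 800 * 1024 * 1024):
        print(f"removed {seen_file}")
```

Only bayes_seen is touched here; the token database is left alone, which matches the point made above that the seen file is separate from it.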
|
|
Re: bayes_seen = 256GB

On Wed, 19 Sep 2007 at 15:23 -0600, mrgus@... confabulated:
> To make it expire, either set bayes_auto_expire to 1 and let it expire
> tokens automatically, or run sa-learn --force-expire and sa-learn --sync.

Doesn't --force-expire also do a sync?
|
|
R: bayes_seen = 256GB

> -----Original Message-----
> From: mfahey [mailto:mfahey@...]
> Sent: Wednesday, September 19, 2007 23:11
> To: users@...
> Subject: bayes_seen = 256GB
>
> SpamAssassin-3.2.0
> Freebsd6.2
>
> The file bayes_seen has grown in size to 256GB! (274992939008)
> How do I cap the size limit of this file? I want to have it not grow
> larger then say 800mb at the most!

What about this:

bayes_expiry_max_db_size <max-num-of-tokens>

Despite its name, this configuration directive takes the maximum number of tokens to keep in the Bayes db. The docs say 150,000 tokens are roughly equivalent to 8MB. You could scale it to a number that suits your needs.

Giampaolo
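The scaling suggested above is simple proportional arithmetic. The sketch below assumes the "150,000 tokens is roughly 8MB" figure quoted from the docs holds linearly, which is only an approximation; the function name is made up for illustration. Note also that, as pointed out elsewhere in this thread, this limit applies to the token database and does nothing for bayes_seen.

```python
# Reference figures as quoted in the thread; treat them as rough estimates.
REFERENCE_TOKENS = 150_000
REFERENCE_SIZE_MB = 8


def tokens_for_target_size(target_mb: float) -> int:
    """Estimate how many tokens correspond to roughly target_mb megabytes."""
    tokens_per_mb = REFERENCE_TOKENS / REFERENCE_SIZE_MB  # 18,750 tokens per MB
    return int(target_mb * tokens_per_mb)


if __name__ == "__main__":
    # An 800MB target works out to about 15 million tokens.
    print(f"bayes_expiry_max_db_size {tokens_for_target_size(800)}")
```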
|
|
Re: bayes_seen = 256GB

mfahey wrote:
> SpamAssassin-3.2.0
> Freebsd6.2
>
> The file bayes_seen has grown in size to 256GB! (274992939008)
> How do I cap the size limit of this file? I want to have it not grow larger
> then say 800mb at the most!
>
> Thanks.

maintenance query that deletes rows over 2 weeks old.
|
|
Re: bayes_seen = 256GB

Theo and all. I know this topic comes up on occasion, but I am not sure I've ever seen an explanation as to why the bayes_seen file is not auto-pruned along with the bayes db file. Since tokens expire in the main DB file, what is the purpose of having a seen file to unlearn tokens which may have long ago been purged? IMO, it would seem logical to also purge the seen file on some sort of cycle so it can't grow so excessively large.
|
|
Re: bayes_seen = 256GB

Dave Koontz wrote:
> Theo and all. I know this topic comes up on occasion, but I am not sure
> I've ever seen an explanation as to why the bayes_seen file is not auto
> pruned along with the bayes db file. Since tokens expire in the main DB
> file, what is the purpose of having a seen file to unlearn tokens which
> may have long ago been purged? IMO, it would seem logical to also
> purge the seen file at some sort of cycle so it can't grow so
> excessively large.

In order to expire from bayes_seen you have to know that there are no longer any tokens from a given msg in the bayes_token database. This is a hard problem, mapping tokens to msgs, so it wasn't done. Likewise no one ever did anything about expiring the bayes_seen entries.

Sounds like a good project, there might even be a bugzilla enhancement opened already.

Patches are welcome.

Michael
|
|
Re: bayes_seen = 256GB

Thanks Michael. I don't see anything in bugzilla, so I am adding this to the list (see Bug 5652). BTW, the link on the submission page for "bug writing guidelines" generates a 404 error, so I will take my best guess here. My request is below.

I'd love to take this on myself, but I am far from a Perl expert. Any Perl / SA gurus out there who can look at this? Complaints from average users keep coming in to this list; generally they only notice this flaw after they run out of resources.

Bugzilla #5652 - bayes_seen - auto expire
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5652

---
The bayes_seen db grows without any purge cycle, even if previously learned tokens have long been expired from the main bayes db. Users who are not SA-savvy often complain of oversized seen db files, at times from 250MB to 4GB in size.

Request for a new process and variable to control the seen db size... perhaps:

Bayes_Unlearn_Threshold_days

where a user could enter a value for how many days to keep the seen DB tokens, expiring those older than that threshold. Perhaps a DEFAULT value of 7 days would be in order, as most spam campaigns last a single day at most. A 30 day purge should be more than safe for most anyone and beats a non-expiry system.
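The Bayes_Unlearn_Threshold_days request amounts to age-based pruning of seen entries. The sketch below is not SpamAssassin code: it uses a plain dict as a stand-in for the seen db and assumes every entry carries a `learned_at` timestamp, which (as discussed in this thread) the real bayes_seen does not store. All names are hypothetical.

```python
import time
from typing import Dict, Optional, Tuple

SECONDS_PER_DAY = 86_400

# Hypothetical stand-in for the seen db: message ID -> (flag, learned_at).
# The learned_at field is an assumed addition, not part of the real format.
SeenDb = Dict[str, Tuple[str, float]]


def prune_seen(seen: SeenDb, threshold_days: int, now: Optional[float] = None) -> int:
    """Remove entries learned more than threshold_days ago; return the number removed."""
    if now is None:
        now = time.time()
    cutoff = now - threshold_days * SECONDS_PER_DAY
    stale = [msgid for msgid, (_flag, learned_at) in seen.items() if learned_at < cutoff]
    for msgid in stale:
        del seen[msgid]
    return len(stale)
```

With a 7 day threshold, an entry learned 10 days ago would be dropped and one learned yesterday kept. A dropped message can no longer be recognised as already learned, which is the trade-off the request accepts.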
|
|
R: bayes_seen = 256GB

> -----Original Message-----
> From: Michael Parker [mailto:parkerm@...]
>
> In order to expire from bayes_seen you have to know that there are no
> longer any tokens from a given msg in the bayes_token database. This is
> a hard problem, mapping tokens to msgs, so it wasn't done.

This could be achieved with a many-to-many table mapping message IDs (bayes_seen entries) to their tokens (bayes_token entries). This many-to-many relation may be keyed on message IDs only, by the way. Was this discarded because a many-to-many relation is regarded as overkill?

> Likewise no one ever did anything about expiring the bayes_seen
> entries.

I guess this would need a further key on bayes_seen: the time of insertion in the db. Was this discarded because DB_File (and BerkeleyDB) doesn't allow multiple keys on databases?

It seems to me that most enhancements to the Bayes database would require switching to BerkeleyDB and waiting for a version that implements BerkeleyDB's secondary-database semantics; otherwise most of them would be possible only on SQL-based storage.

Giampaolo
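To make the many-to-many idea concrete, here is a minimal sketch using an in-memory SQLite database. The table and column names (`seen`, `token`, `seen_token`) and all helper functions are invented for illustration and do not mirror SpamAssassin's actual SQL schema. A seen entry becomes a candidate for removal once none of its tokens remain in the token table.

```python
import sqlite3
from typing import Iterable, List

SCHEMA = """
CREATE TABLE seen (msgid TEXT PRIMARY KEY, flag TEXT);
CREATE TABLE token (token TEXT PRIMARY KEY);
-- Link table: which tokens were learned from which message.
CREATE TABLE seen_token (msgid TEXT, token TEXT, PRIMARY KEY (msgid, token));
"""


def open_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def record_message(conn: sqlite3.Connection, msgid: str, flag: str,
                   tokens: Iterable[str]) -> None:
    """Record a learned message together with its tokens and the links between them."""
    tokens = list(tokens)
    conn.execute("INSERT OR REPLACE INTO seen VALUES (?, ?)", (msgid, flag))
    conn.executemany("INSERT OR IGNORE INTO token VALUES (?)", [(t,) for t in tokens])
    conn.executemany("INSERT OR IGNORE INTO seen_token VALUES (?, ?)",
                     [(msgid, t) for t in tokens])


def expire_token(conn: sqlite3.Connection, token: str) -> None:
    """Simulate token expiry by removing the token row (links are left in place)."""
    conn.execute("DELETE FROM token WHERE token = ?", (token,))


def expirable_msgids(conn: sqlite3.Connection) -> List[str]:
    """Return seen entries that no longer have any surviving token."""
    rows = conn.execute("""
        SELECT s.msgid FROM seen AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM seen_token AS st
            JOIN token AS t ON t.token = st.token
            WHERE st.msgid = s.msgid)
        ORDER BY s.msgid
    """)
    return [row[0] for row in rows]
```

The link table grows with every token of every learned message, which gives a feel for why this mapping is described in the thread as a hard problem.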
|
|
Re: bayes_seen = 256GB

On Wed, Sep 19, 2007 at 05:55:20PM -0400, Dave Koontz wrote:
> Theo and all. I know this topic comes up on occasion, but I am not sure
> I've ever seen an explanation as to why the bayes_seen file is not auto
> pruned along with the bayes db file. Since tokens expire in the main DB
> file, what is the purpose of having a seen file to unlearn tokens which
> may have long ago been purged? IMO, it would seem logical to also
> purge the seen file at some sort of cycle so it can't grow so
> excessively large.

Sure, patches welcome. :)

Seriously, it would require someone to write the code to deal with expiry, and to upgrade people's seen files (or otherwise handle that situation), etc. At a minimum, just adding in a timestamp would help, but if you wanted to have some mapping of tokens to message, then that's a whole huge thing. Oh, and you'd need to support SQL and DBM, of course.

Since you can just rm the seen file or do a "delete from" in SQL, and have everything continue to work, it hasn't been considered a priority. But if you think it's important enough to get in, we're happy to accept the patch to implement it.

--
Randomly Selected Tagline:
Forgetfulness, n.: A gift of God bestowed upon debtors in compensation for their destitution of conscience.
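For SQL-backed setups, the "delete from" option looks roughly like the sketch below. It runs against an in-memory SQLite database purely for demonstration; the table names follow the bayes_seen and bayes_token names used in this thread, but the columns are simplified assumptions, and `clear_seen` and `make_demo_db` are hypothetical helpers.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway database with simplified, assumed table layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE bayes_seen (msgid TEXT PRIMARY KEY, flag TEXT);
        CREATE TABLE bayes_token (token TEXT PRIMARY KEY,
                                  spam_count INTEGER, ham_count INTEGER);
    """)
    return conn


def clear_seen(conn: sqlite3.Connection) -> int:
    """Delete every row from bayes_seen and return how many rows were removed."""
    (count,) = conn.execute("SELECT COUNT(*) FROM bayes_seen").fetchone()
    conn.execute("DELETE FROM bayes_seen")
    conn.commit()
    return count
```

Only the seen table is emptied; the token table, which is what scoring relies on, is left untouched.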
|
|
Re: R: bayes_seen = 256GB

Thanks for all the posts. We are running global bayes filtering. I gather, then, that the only way is to remove bayes* and restart spamd. I've tried expiring tokens before and it does not reduce the size of bayes_seen. Can someone post the relevant info to the dev list? Maybe it will get implemented in the next version! Being persistent will get this added. Resetting the bayes files when processing about 1 million messages per day tends to let a lot of spam through, and is becoming less of an option.

Thanks.
|
|
|
Re: bayes_seen = 256GB

If tokens are expired from the DB based on time, and assuming *all* tokens older than some date are expired, wouldn't it be reasonable to prune bayes_seen to the expiry date after the expiry run?

Of course this assumes bayes_seen has date stamps in the sequential data, which may well not be the case; but they could perhaps be added.

Loren
|
|
Re: R: bayes_seen = 256GB

You missed the critical posts. Just manually rm bayes_seen and keep going. bayes_seen isn't the bayes database.

Loren
|
|
Re: bayes_seen = 256GB

"Loren Wilton" <lwilton@...> writes:
> If tokens are expired from the DB based on time, and assuming *all*
> tokens older than some date are expired, wouldn't it be reasonable to
> prune bayes_seen to the expiry date after the expiry run?

You cannot assume that all tokens earlier than some date have expired. A token (in bayes_token) is only expired when its last occurrence in an email was before the expiry interval. So it is perfectly possible for a token from the very first email ever learnt to still be in bayes years later.
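A toy model makes this point easy to see. It keeps one "last seen" time per token and expires on that value, as described above; the function names and the integer timestamps are assumptions for illustration, not SpamAssassin internals.

```python
from typing import Dict, Iterable


def learn(token_atime: Dict[str, int], tokens: Iterable[str], when: int) -> None:
    """Record tokens from a message, refreshing each token's last-seen time."""
    for tok in tokens:
        token_atime[tok] = max(when, token_atime.get(tok, when))


def expire(token_atime: Dict[str, int], cutoff: int) -> None:
    """Drop tokens whose last occurrence is older than cutoff."""
    for tok in [t for t, atime in token_atime.items() if atime < cutoff]:
        del token_atime[tok]


if __name__ == "__main__":
    db: Dict[str, int] = {}
    learn(db, ["hello", "pills"], when=1)      # very first message
    learn(db, ["hello", "invoice"], when=100)  # much later message reuses "hello"
    expire(db, cutoff=50)
    # "hello" survives even though it first appeared in the oldest message,
    # so that message still has a live token in the database.
    print(sorted(db))
```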
|
|
RE: bayes_seen = 256GB

I had the exact same problem with a MySQL setup. The problem was permissions: the MySQL user did not have delete permission on that table, so it could not remove the rows. Once I granted that permission, everything started working fine.
|
|
Re: bayes_seen = 256GB

On Thursday 20 September 2007 07:59, Graham Murray wrote:
> "Loren Wilton" <lwilton@...> writes:
> > If tokens are expired from the DB based on time, and assuming *all*
> > tokens older than some date are expired, wouldn't it be reasonable to
> > prune bayes_seen to the expiry date after the expiry run?
>
> You cannot assume that all tokens earlier than some date have expired. A
> token (in bayes_token) is only expired when its last occurrence in an
> email was before the expiry interval. So it is perfectly possible for a
> token from the very first email ever learnt to still be in bayes years
> later.

probably don't want to relearn an old message anyway. The Bayes system can record the message date (e.g. from the top Received: field), expire messages older than a certain age, and refuse to learn older messages, unless explicitly overridden (for example when populating a clean bayes DB with an initial corpus).

--
Magnus Holmgren holmgren@...
(No Cc of list mail needed, thanks)
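The "refuse to learn older messages unless overridden" idea could be sketched as a gate in front of the learner. Everything here is hypothetical: the function names, the 30 day default, and the choice to skip messages with no usable Received: date are assumptions, and the parsing only handles the common case where the timestamp follows the last semicolon of the header.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime
from typing import Optional


def top_received_date(raw_message: str) -> Optional[datetime]:
    """Return the timestamp of the topmost Received: header, or None if unavailable."""
    received = message_from_string(raw_message).get_all("Received")
    if not received:
        return None
    # The timestamp normally follows the last ';' in a Received: header.
    _, sep, date_part = received[0].rpartition(";")
    if not sep:
        return None
    try:
        stamp = parsedate_to_datetime(date_part.strip())
    except (TypeError, ValueError):
        return None
    if stamp.tzinfo is None:
        # Assumption: treat timestamps without zone information as UTC.
        stamp = stamp.replace(tzinfo=timezone.utc)
    return stamp


def should_learn(raw_message: str, now: datetime,
                 max_age_days: int = 30, force: bool = False) -> bool:
    """Decide whether a message is recent enough to learn from.

    force=True bypasses the age check, e.g. when seeding a clean db from a corpus.
    """
    if force:
        return True
    stamp = top_received_date(raw_message)
    if stamp is None:
        return False  # assumption: skip messages we cannot date
    return now - stamp <= timedelta(days=max_age_days)
```

Here `now` is expected to be a timezone-aware datetime, for example `datetime.now(timezone.utc)`.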