Bad performance of Bayes with MySQL cluster

View: New views
9 Messages — Rating Filter:   Alert me  

Bad performance of Bayes with MySQL cluster

by Jorn Argelo-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi All,

I'm running spamassassin 3.2.5 on RHEL 5.3 x86_64. We have three boxes, and all three of them are sharing the same bayes DB using a MySQL cluster, version 7.0.6 (based on 5.1.34). The cluster has 2 datanodes with a quadcore and 4 GB of memory. Everything is working fine, even the AWL in SQL, except for Bayes. The bayes database currently houses a bit less than 500k tokens and the database size is not very big either, as the datanodes have less than 1 GB of storage in use. I've followed the instructions from the Spamassassin wiki, and I also used the supplied bayes_mysql.sql file to create my tables. In case anyone is interested, you can find the cluster.ini and the my.cnf used on the SQL nodes here:

http://www.wcborstel.com/web/mysql/my.cnf
http://www.wcborstel.com/web/mysql/cluster.ini

I've been doing quite a bit of research and so on. First I thought it were the settings of my cluster, as I knew there was a lot to be tuned. Things like query cache sizes, thread cache, table cache, specific NDB settings et cetera. Unfortunately that didn't have seemed to help. I came to the conclusion that the bayes table was simply too heavily used. I have scantimes of 30-200+ seconds with bayes enabled, while I have scantimes under 8 seconds when disabling bayes.

Now the problem at the first glance seems to be, from my perspective (please correct me if I'm wrong), the actual queries being done. For every mail being scanned by spamassassin, it seems to be doing the "SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime FROM bayes_token" query every time. This effectively requesting the entire bayes_token table, which can take up to 10-20 seconds. Now one would think that this is a nice canidate to cache. I would agree, unfortunately the MySQL query cache is not very efficient here, seeing as the atime of a token is being updated continuously. In other words, the cache is pretty much invalid most of the time. My Qcache hits is also very low (I noticed 8k inserts with about 250 cache hits). It seems that the query cache is either not suitable for this or I am doing something majorly wrong :)

Here is how I came to my findings. Note I removed some SELECT RPAD rows to avoid spammyness (they show essentially the same as the other rows anyway):


mysql> show processlist\G
*************************** 1. row ***************************
     Id: 1
   User: system user
   Host:
     db:
Command: Daemon
   Time: 0
  State: Waiting for event from ndbcluster
   Info: NULL
FROM bayes_token
    
*************************** 3. row ***************************
     Id: 1464
   User: bayes
   Host: ::ffff:1.2.3.4:57082
     db: spamd
Command: Query
   Time: 13
  State: Sending data
   Info: SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime
                     FROM bayes_token
    
   
*************************** 5. row ***************************
     Id: 1479
   User: bayes
   Host: ::ffff:1.2.3.4:57133
     db: spamd
Command: Query
   Time: 24
  State: Searching rows for update
   Info: UPDATE bayes_token SET atime = '1250259027' WHERE id = '3' AND token IN ('e?5?U','?;?6','?e?F?','?
    
*************************** 8. row ***************************
     Id: 1485
   User: bayes
   Host: ::ffff:1.2.3.4:57148
     db: spamd
Command: Query
   Time: 18
  State: Sending data
   Info: SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime
                     FROM bayes_token
    
*************************** 9. row ***************************
     Id: 1487
   User: bayes
   Host: ::ffff:1.2.3.4:57155
     db: spamd
Command: Query
   Time: 18
  State: Sending data
   Info: SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime
                     FROM bayes_token
    
    
12 rows in set (0.00 sec)

As you can see, row #9 has been executing for 18 seconds already. I was first playing around with trying to create some additional indexes, but I've seen a couple of SELECT queries where the indexes where actually used and that was pretty quick. Now I am by far not a MySQL guru, so again, if anyone has any info in regards to creating additional indexes I would love to hear them. Currently I don't have any indexes other than those provided by the bayes_mysql.sql file.

Currently I'm running my mail servers without bayes where they are performing fine. Does anyone have any recommendations or experiences with this? Or perhaps is there more information needed? Also will adding more memory to my datanodes solve anything?

Thanks a lot for any feedback.

Best regards,

Jorn Argelo



__________ Information from ESET NOD32 Antivirus, version of virus signature database 4336 (20090814) __________

The message was checked by ESET NOD32 Antivirus.

http://www.eset.com

Re: Bad performance of Bayes with MySQL cluster

by Henrik K :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Fri, Aug 14, 2009 at 07:43:37PM +0200, Jorn Argelo wrote:

> Hi All,
>
> I'm running spamassassin 3.2.5 on RHEL 5.3 x86_64. We have three boxes,  
> and all three of them are sharing the same bayes DB using a MySQL  
> cluster, version 7.0.6 (based on 5.1.34). The cluster has 2 datanodes  
> with a quadcore and 4 GB of memory. Everything is working fine, even the  
> AWL in SQL, except for Bayes. The bayes database currently houses a bit  
> less than 500k tokens and the database size is not very big either, as  
> the datanodes have less than 1 GB of storage in use. I've followed the  
> instructions from the Spamassassin wiki, and I also used the supplied  
> bayes_mysql.sql file to create my tables. In case anyone is interested,  
> you can find the cluster.ini and the my.cnf used on the SQL nodes here:
>
> http://www.wcborstel.com/web/mysql/my.cnf

skip-innodb

That's pretty much the reason. You _need_ to use InnoDB as it has row level
locking. MyISAM just kills Bayes.

> Now the problem at the first glance seems to be, from my perspective  
> (please correct me if I'm wrong), the actual queries being done. For  
> every mail being scanned by spamassassin, it seems to be doing the  
> "SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime FROM  
> bayes_token" query every time. This effectively requesting the entire  
> bayes_token table

What you are seeing are expiry runs.

As you right now use MyISAM, the whole table is locked for such operations
so you are pretty much hosed.

In any case, you should use "bayes_auto_expire 0" and run expire for example
once every night when traffic is slower.

> It seems that the query cache is either not suitable for this or I am
> doing something majorly wrong :)

You are right. Better to disable completely if there's nothing else running
that uses it and save little CPU.


Re: Bad performance of Bayes with MySQL cluster

by Jorn Argelo-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Henrik K wrote:
On Fri, Aug 14, 2009 at 07:43:37PM +0200, Jorn Argelo wrote:
  
Hi All,

I'm running spamassassin 3.2.5 on RHEL 5.3 x86_64. We have three boxes,  
and all three of them are sharing the same bayes DB using a MySQL  
cluster, version 7.0.6 (based on 5.1.34). The cluster has 2 datanodes  
with a quadcore and 4 GB of memory. Everything is working fine, even the  
AWL in SQL, except for Bayes. The bayes database currently houses a bit  
less than 500k tokens and the database size is not very big either, as  
the datanodes have less than 1 GB of storage in use. I've followed the  
instructions from the Spamassassin wiki, and I also used the supplied  
bayes_mysql.sql file to create my tables. In case anyone is interested,  
you can find the cluster.ini and the my.cnf used on the SQL nodes here:

http://www.wcborstel.com/web/mysql/my.cnf
    

skip-innodb

That's pretty much the reason. You _need_ to use InnoDB as it has row level
locking. MyISAM just kills Bayes.
  
Actually I'm using NDB and not MyISAM. I need a clustered storage engine, otherwise the bayes DB can't really be shared. If I create an InnoDB table on one SQL node, it doesn't show up at the other SQL node, while this is the case with an NDB storage engine.

What I can do however, is point all mailservers to one SQL node. I just need to synchronize the bayes_token table to the other SQL node I guess. Do you have an idea about this?
  
Now the problem at the first glance seems to be, from my perspective  
(please correct me if I'm wrong), the actual queries being done. For  
every mail being scanned by spamassassin, it seems to be doing the  
"SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime FROM  
bayes_token" query every time. This effectively requesting the entire  
bayes_token table
    

What you are seeing are expiry runs.

As you right now use MyISAM, the whole table is locked for such operations
so you are pretty much hosed.

In any case, you should use "bayes_auto_expire 0" and run expire for example
once every night when traffic is slower.
  
Thanks for this, I was not aware of it. Running expiry runs manually is done by sa-learn --force-expiry, correct?
  
It seems that the query cache is either not suitable for this or I am
doing something majorly wrong :)
    

You are right. Better to disable completely if there's nothing else running
that uses it and save little CPU.
  
Good to know. There will be other applications running on it as well so I'll reduce the size of the query cache for a good bit.

Thanks a lot for your feedback.

Jorn


__________ Information from ESET NOD32 Antivirus, version of virus signature database 4336 (20090814) __________

The message was checked by ESET NOD32 Antivirus.

http://www.eset.com

Re: Bad performance of Bayes with MySQL cluster

by Henrik K :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Sat, Aug 15, 2009 at 09:50:41AM +0200, Jorn Argelo wrote:

> Henrik K wrote:
>> On Fri, Aug 14, 2009 at 07:43:37PM +0200, Jorn Argelo wrote:
>>  
>>> Hi All,
>>>
>>> I'm running spamassassin 3.2.5 on RHEL 5.3 x86_64. We have three
>>> boxes,  and all three of them are sharing the same bayes DB using a
>>> MySQL  cluster, version 7.0.6 (based on 5.1.34). The cluster has 2
>>> datanodes  with a quadcore and 4 GB of memory. Everything is working
>>> fine, even the  AWL in SQL, except for Bayes. The bayes database
>>> currently houses a bit  less than 500k tokens and the database size
>>> is not very big either, as  the datanodes have less than 1 GB of
>>> storage in use. I've followed the  instructions from the Spamassassin
>>> wiki, and I also used the supplied  bayes_mysql.sql file to create my
>>> tables. In case anyone is interested,  you can find the cluster.ini
>>> and the my.cnf used on the SQL nodes here:
>>>
>>> http://www.wcborstel.com/web/mysql/my.cnf
>>>    
>>
>> skip-innodb
>>
>> That's pretty much the reason. You _need_ to use InnoDB as it has row level
>> locking. MyISAM just kills Bayes.
>>  
> Actually I'm using NDB and not MyISAM. I need a clustered storage  
> engine, otherwise the bayes DB can't really be shared. If I create an  
> InnoDB table on one SQL node, it doesn't show up at the other SQL node,  
> while this is the case with an NDB storage engine.

Ah right sorry.. I have no idea on NDB and how it performs for SA.

> What I can do however, is point all mailservers to one SQL node. I just  
> need to synchronize the bayes_token table to the other SQL node I guess.  
> Do you have an idea about this?

MySQL replication? Maybe search on spamassassin-users archives to find
experiences.

> Thanks for this, I was not aware of it. Running expiry runs manually is  
> done by sa-learn --force-expiry, correct?

Yep.


Re: [Solved] Bad performance of Bayes with MySQL cluster

by Jorn Argelo-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Henrik K wrote:

> On Sat, Aug 15, 2009 at 09:50:41AM +0200, Jorn Argelo wrote:
>  
>> Henrik K wrote:
>>    
>>> On Fri, Aug 14, 2009 at 07:43:37PM +0200, Jorn Argelo wrote:
>>>  
>>>      
>>>> Hi All,
>>>>
>>>> I'm running spamassassin 3.2.5 on RHEL 5.3 x86_64. We have three
>>>> boxes,  and all three of them are sharing the same bayes DB using a
>>>> MySQL  cluster, version 7.0.6 (based on 5.1.34). The cluster has 2
>>>> datanodes  with a quadcore and 4 GB of memory. Everything is working
>>>> fine, even the  AWL in SQL, except for Bayes. The bayes database
>>>> currently houses a bit  less than 500k tokens and the database size
>>>> is not very big either, as  the datanodes have less than 1 GB of
>>>> storage in use. I've followed the  instructions from the Spamassassin
>>>> wiki, and I also used the supplied  bayes_mysql.sql file to create my
>>>> tables. In case anyone is interested,  you can find the cluster.ini
>>>> and the my.cnf used on the SQL nodes here:
>>>>
>>>> http://www.wcborstel.com/web/mysql/my.cnf
>>>>    
>>>>        
>>> skip-innodb
>>>
>>> That's pretty much the reason. You _need_ to use InnoDB as it has row level
>>> locking. MyISAM just kills Bayes.
>>>  
>>>      
>> Actually I'm using NDB and not MyISAM. I need a clustered storage  
>> engine, otherwise the bayes DB can't really be shared. If I create an  
>> InnoDB table on one SQL node, it doesn't show up at the other SQL node,  
>> while this is the case with an NDB storage engine.
>>    
>
> Ah right sorry.. I have no idea on NDB and how it performs for SA.
>
>  
>> What I can do however, is point all mailservers to one SQL node. I just  
>> need to synchronize the bayes_token table to the other SQL node I guess.  
>> Do you have an idea about this?
>>    
>
> MySQL replication? Maybe search on spamassassin-users archives to find
> experiences.
>
>  
>> Thanks for this, I was not aware of it. Running expiry runs manually is  
>> done by sa-learn --force-expiry, correct?
>>    
>
> Yep.
>
>
>  
In case anybody else comes across the same, I've kicked out the MySQL
cluster and now using MySQL with multi-master replication. There we can
use InnoDB and this definitely solved all of the problems I had with
bayes. Scantimes are now below 1 second. I don't have much load as of
yet, so I expect this to increase somewhat during business hours, but
all in all things look a lot more promising. I've used this howto:
http://capttofu.livejournal.com/1752.html

Thanks for the pointers, Henrik.

Regards,
Jorn



__________ Information from ESET NOD32 Antivirus, version of virus signature database 4336 (20090814) __________

The message was checked by ESET NOD32 Antivirus.

http://www.eset.com



RE: [Solved] Bad performance of Bayes with MySQL cluster

by Jonas Akrouh Larsen-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> >
> In case anybody else comes across the same, I've kicked out the MySQL
> cluster and now using MySQL with multi-master replication. There we can
> use InnoDB and this definitely solved all of the problems I had with
> bayes. Scantimes are now below 1 second. I don't have much load as of
> yet, so I expect this to increase somewhat during business hours, but
> all in all things look a lot more promising. I've used this howto:
> http://capttofu.livejournal.com/1752.html
>

Hi Jorn

I'm running the same kind of multimaster mysql replication scheme (my sql
ndoes are in different datacenters so a cluster was not a good option for
us)

However I've seen several reports of people having problem with bayes and
replication, since with multimaster replication you have to manually be sure
there is no overlap in row id's etc.

I cant remember 100% if it was an old issue or if it still should be an
issue, I just thought I'd send you this warning, that replication might
break if you habve a bayes DB on the dbms and the 2 masters write the same
id/token whatever to the DB.

So you might want to test thoroughly before you deploy in production, or at
least if it breaks later you might remember this mail :)



Med venlig hilsen / Best regards
 
Jonas Akrouh Larsen
 
TechBiz ApS
Laplandsgade 4, 2. sal
2300 København S
 
Office: 7020 0979
Direct: 3336 9974
Mobile: 5120 1096
Fax:    7020 0978
Web: www.techbiz.dk




Re: [Solved] Bad performance of Bayes with MySQL cluster

by Jorn Argelo-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Jonas A. Larsen wrote:

>> In case anybody else comes across the same, I've kicked out the MySQL
>> cluster and now using MySQL with multi-master replication. There we can
>> use InnoDB and this definitely solved all of the problems I had with
>> bayes. Scantimes are now below 1 second. I don't have much load as of
>> yet, so I expect this to increase somewhat during business hours, but
>> all in all things look a lot more promising. I've used this howto:
>> http://capttofu.livejournal.com/1752.html
>>
>>    
>
> Hi Jorn
>
> I'm running the same kind of multimaster mysql replication scheme (my sql
> ndoes are in different datacenters so a cluster was not a good option for
> us)
>
> However I've seen several reports of people having problem with bayes and
> replication, since with multimaster replication you have to manually be sure
> there is no overlap in row id's etc.
>
> I cant remember 100% if it was an old issue or if it still should be an
> issue, I just thought I'd send you this warning, that replication might
> break if you habve a bayes DB on the dbms and the 2 masters write the same
> id/token whatever to the DB.
>
> So you might want to test thoroughly before you deploy in production, or at
> least if it breaks later you might remember this mail :)
>  
Hi Johas,

Thanks a lot for your heads-up here.

Actually I was realising this myself as well, especially since Bayes
doesn't seem to be using MySQL's AUTO_INCREMENT property for its ids.
I've seen a few articles where the different masters used different
offsets for their AUTO_INCREMENT columns. Frankly this wasn't going to
work for the bayes_token table at the very least. And even if it did
work, you'd end up with different IDs, same token and different count
values (purely theoretical if my logic isn't failing).

For the sake of the KISS concept I decided to scrap the multi-master
replication and made it a master-slave setup only, where all 3 boxes
point to one master. Then the slave is present as a backup server only.
I was expecting to need 2 masters because of performance reasons, but
the single master works like a charm in production. I'm sticking to that
for now and works surprisingly well with InnoDB and the
my-4GB-huge-InnoDB.cnf (or whatever it's called).

Cheers,
Jorn

>
>
> Med venlig hilsen / Best regards
>  
> Jonas Akrouh Larsen
>  
> TechBiz ApS
> Laplandsgade 4, 2. sal
> 2300 København S
>  
> Office: 7020 0979
> Direct: 3336 9974
> Mobile: 5120 1096
> Fax:    7020 0978
> Web: www.techbiz.dk
>
>
>
>
>
> __________ Information from ESET NOD32 Antivirus, version of virus signature database 4343 (20090817) __________
>
> The message was checked by ESET NOD32 Antivirus.
>
> http://www.eset.com
>
>
>  



__________ Information from ESET NOD32 Antivirus, version of virus signature database 4343 (20090817) __________

The message was checked by ESET NOD32 Antivirus.

http://www.eset.com



Re: [Solved] Bad performance of Bayes with MySQL cluster

by John Hardin :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tue, 18 Aug 2009, Jorn Argelo wrote:

> For the sake of the KISS concept I decided to scrap the multi-master
> replication and made it a master-slave setup only, where all 3 boxes
> point to one master. Then the slave is present as a backup server only.
> I was expecting to need 2 masters because of performance reasons, but
> the single master works like a charm in production. I'm sticking to that
> for now and works surprisingly well with InnoDB and the
> my-4GB-huge-InnoDB.cnf (or whatever it's called).

I assume you're using autolearning. Master+multislave can support
distributed scoring if you're manually learning, as sa-learn can be
configured to talk to the master while the SA daemons cal query the
slave(s).

Suggestion: open a feature request bug to allow bayes autolearn to use a
different database connection string than bayes scoring. That way you
could configure all the daemons' autolearns to write to the master, but
distribute their scoring queries across X number of replicated slaves...

--
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@...    FALaholic #11174     pgpk -a jhardin@...
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The real opiate of the masses isn't religion; it's the belief that
   somewhere there is a benefit that can be delivered without a
   corresponding cost.                       -- Tom of "Radio Free NJ"
-----------------------------------------------------------------------
  7 days until the 1930th anniversary of the destruction of Pompeii

Re: [Solved] Bad performance of Bayes with MySQL cluster

by furban :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

John Hardin wrote:
Suggestion: open a feature request bug to allow bayes autolearn to use a
different database connection string than bayes scoring. That way you
could configure all the daemons' autolearns to write to the master, but
distribute their scoring queries across X number of replicated slaves...
There are to feature requests active.
Bug 4508 asking for this problem here.
Bug 5998 asking for solving the problem with a master-master replication database
In both Bugs there is a patch included to solve the problem. I just asked the SpamAssassin Team if these patches will be included in SA3.3 but the answer was:

--- Comment #6 from Justin Mason <jm@jmason.org> 2009-10-30 05:51:10 UTC ---
(In reply to comment #5)
> Bug 5998. This feature will be included in SA 3.3?
Nope -- Michael is about the nearest thing we have on the dev team to our SQL
expert.  if he vetoes it, that's a big problem....

> What about Bug 4508? I like more to have the possibility to use another server
> for writing data than to read from.
> I'm using SQLGrey in this way today.

without someone reviewing and approving the patches, we can't apply them....

So it seemed that some more people need to test the patches.......