|
View:
New views
1 Messages
—
Rating Filter:
Alert me
|
|
|
Too many connections: A funny storyThis started off as a frantic late-night cry for help. I was tearing my
hair out. In the process of writing it, my error became clear. It's funny enough for me to take the time to edit and pass on. Someday somebody may find it useful. I suddenly had a problem with connections living too long, resulting in MySQL "too many connections" open errors. Database is Mysql 4.1.22 running on RedHat 4.8. Application is a "not-really-web" application that, however, runs on Tomcat 6.0 for a variety of reasons that needn't concern us. There are no long-lasting transactions or humongous queries - database activity is light and only a small part of the application. There are no EJB's or container-managed anything. Application uses Hibernate 3.2 and I religiously close all transactions and sessions. Three application servers are served by this database. One resides within the same LAN as the database server, the other two are geographically distant. Prior to the spit hitting the fan last week, all access to the database was through a separate application colocated on the same machine as the database that decrypted the data coming across the wire, then fed it to Hibernate and the database. This old home-brewed encryption-cum-message-queueing engine was a real pain to maintain and it was decided to replace it with direct connects to the database machine that used MYSQL's SSL support for the encryption. All the testing of this new architecture was done within the LAN, and it seemed to work great. We rolled it out first to the single machine that was on the same LAN as the DB server. Everything ran better. But when we rolled it out to the two distant machines, we soon experienced Too Many Connections errors. Many more connections were being left in SLEEP state by MySQL than were needed or than the c3p0 connection pooling was supposed to allow. Now, here's the weird part. We started collecting statistics on how many connections MySQL had open every five minutes. The server that was on the same LAN as the database NEVER varied above or below the minimum number of connections specified in the C3P0 settings. The other two servers would leave connections for lengthy periods of time, often coming near to doubling what C3p0 wanted to let through. MYSQL was using the default configuration setting of 28800 seconds (8hrs) to time out connections. (And yet, for some reason that is still unknown, in spite of this, it was killing the sessions after 7200 seconds.) Which was fine by me - it was saving my keyster until I figured this out. Ironically, it happened that the worst spikes in open connections happened when the load was LOWEST! This led us to the supposition that the firewall was somehow involved. The firewall was timing out connections from the two remote servers every five minutes: the same limit as C3P0 was placing on the idle_test_period. If the firewall got there first, it might just kill the socket without talking to MYSQL, just before C3P0 was about to tickle it. (This suggests, by the way, that there is another use for the idle_test_period: as a keepalive mechanism to prevent firewalls from shutting sockets without telling the database). So we boosted the Firewall's timeout to 6 minutes. We thought we had it solved. It didn't do any good! The number of open connections quickly passed C3P0's max and once load went down at night, the number of open connections GREW! That's when I started to tear my hair out and start this message. These were the C3P0 settings via Hibernate. No separate C3P0 settings were being done. <property name="hibernate.c3p0.min_size">5</property> <property name="hibernate.c3p0.max_size">20</property> <property name="hibernate.c3p0.timeout">600</property> <property name="hibernate.c3p0.max_statements">50</property> <property name="hibernate.c3p0.acquire_increment">2</property> <property name="hibernate.c3p0.idle_test period">300</property> Say,what's that last one? "hibernate.c3p0.idle_test period"? Shouldn't there be an underscore between "test" and "period"? D'Oh!!!!!! So, this meant we weren't doing ANY idle testing (I learned from the docs) and no keep-alives were being sent, thus ALL connections were being closed by the firewall without telling the database. Which explains why the worst spikes happened at low-volume times: at higher volume times other client accesses caused the threads to be reused while at lower volume times, the firewall got there first. Got my fix deployed and immediately the crazy mountain-shaped graphs turned flat as a board. Turns out the minimum size was good enough 99% of the time. C3P0, it turned out, worked like a charm. Anyway, I thought someone might get a kick out of this war story and maybe someone down the road will Google and find an idea that's useful to them. Thanks for a great product. Once you set it up right, it just works. ------------------------------------------------------------------------------ Come build with us! The BlackBerry(R) Developer Conference in SF, CA is the only developer event you need to attend this year. Jumpstart your developing skills, take BlackBerry mobile applications to market and stay ahead of the curve. Join us from November 9 - 12, 2009. Register now! http://p.sf.net/sfu/devconference _______________________________________________ c3p0-users mailing list c3p0-users@... https://lists.sourceforge.net/lists/listinfo/c3p0-users |
| Free embeddable forum powered by Nabble | Forum Help |