Too many connections: A funny story

View: New views
1 Messages — Rating Filter:   Alert me  

Too many connections: A funny story

by stevecoh1 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

This started off as a frantic late-night cry for help.  I was tearing my
hair out.  In the process of writing it, my error became clear.  It's
funny enough for me to take the time to edit and pass on.  Someday
somebody may find it useful.

I suddenly had a problem with connections living too long, resulting in
MySQL "too many connections" open errors.

Database is Mysql  4.1.22 running on RedHat 4.8.

Application is a "not-really-web" application that, however, runs on
Tomcat 6.0 for a variety of reasons that needn't concern us.   There are
no long-lasting transactions or humongous queries - database activity is
light and only a small part of the application.  There are no EJB's or
container-managed anything.  Application uses Hibernate 3.2 and I
religiously close all transactions and sessions.

Three application servers are served by this database.  One resides
within the same LAN as the database server, the other two are
geographically distant.  Prior to the spit hitting the fan last week,
all access to the database was through a separate application colocated
on the same machine as the database that decrypted the data coming
across the wire, then fed it to Hibernate and the database.  This old
home-brewed encryption-cum-message-queueing engine was a real pain to
maintain and it was decided to replace it with direct connects to the
database machine that used MYSQL's SSL support for the encryption.

All the testing of this new architecture was done within the LAN, and it
seemed to work great.  We rolled it out first to the single machine that
was on the same LAN as the DB server.  Everything ran better.  But when
we rolled it out to the two distant machines, we soon experienced Too
Many Connections errors.  Many more connections were being left in SLEEP
state by MySQL than were needed or than the c3p0 connection pooling was
supposed to allow.

Now, here's the weird part.  We started collecting statistics on how
many connections MySQL had open every five minutes.  The server that was
on the same LAN as the database NEVER varied above or below the minimum
number of connections specified in the C3P0 settings.  The other two
servers would leave connections for lengthy periods of time, often
coming near to doubling what C3p0 wanted to let through.  MYSQL was
using the default configuration setting of 28800 seconds (8hrs) to time
out connections.  (And yet, for some reason that is still unknown, in
spite of this, it was killing the sessions after 7200 seconds.)  Which
was fine by me - it was saving my keyster until I figured this out.

Ironically, it happened that the worst spikes in open connections
happened when the load was LOWEST!  This led us to the supposition that
the firewall was somehow involved.  The firewall was timing out
connections from the two remote servers every five minutes: the same
limit as C3P0 was placing on the idle_test_period.  If the firewall got
there first, it might just kill the socket without talking to MYSQL,
just before C3P0 was about to tickle it.  (This suggests, by the way,
that there is another use for the idle_test_period: as a keepalive
mechanism to prevent firewalls from shutting sockets without telling the
database).  So we boosted the Firewall's timeout to 6 minutes.  We
thought we had it solved.

It didn't do any good!  The number of open connections quickly passed
C3P0's max and once load went down at night, the number of open
connections GREW!  That's when I started to tear my hair out and start
this message.

These were the C3P0 settings via Hibernate. No separate C3P0 settings
were being done.

<property name="hibernate.c3p0.min_size">5</property>
<property name="hibernate.c3p0.max_size">20</property>
<property name="hibernate.c3p0.timeout">600</property>
<property name="hibernate.c3p0.max_statements">50</property>
<property name="hibernate.c3p0.acquire_increment">2</property>
<property name="hibernate.c3p0.idle_test period">300</property>

Say,what's that last one? "hibernate.c3p0.idle_test period"?  Shouldn't
there be an underscore between "test" and "period"?  D'Oh!!!!!!  So,
this meant we weren't doing ANY idle testing (I learned from the docs)
and no keep-alives were being sent, thus ALL connections were being
closed by the firewall without telling the database.  Which explains why
the worst spikes happened at low-volume times: at higher volume times
other client accesses caused the threads to be reused while at lower
volume times, the firewall got there first.  Got my fix deployed and
immediately the crazy mountain-shaped graphs turned flat as a board.  
Turns out the minimum size was good enough 99% of the time.  C3P0, it
turned out, worked like a charm.

Anyway, I thought someone might get a kick out of this war story and
maybe someone down the road will Google and find an idea that's useful
to them.  Thanks for a great product.  Once you set it up right, it just
works.

------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
c3p0-users mailing list
c3p0-users@...
https://lists.sourceforge.net/lists/listinfo/c3p0-users