I wanted to share with the community some of the issues we have
endured over the past week and what we have learned and changed as a
result. As brief background, we have been running our Sakai-based
system (Oncourse) in parallel with the legacy system for 2+ years.
The beginning of our fall semester marked the first time that the
entire legacy user base transitioned to the Sakai-based system. This
additional user base represents an overall 2x increase in load on our
production system (16x app servers and 1x Oracle 10.0.1.x server).

The first sign of trouble was that DBCP was having trouble obtaining
and maintaining connections to Oracle. DBCP has a nasty bug that can
be triggered under these conditions and results in a deadlock. To
work around the DBCP bug, we switched to the c3p0 connection pool.
c3p0 behaves much more stably and predictably under heavy load and
recovers better from connection issues with Oracle. It is a drop-in
replacement for DBCP, and I am going to recommend that Sakai switch
to this connection pool as the default in the 2.5 release.

Next we started troubleshooting Oracle settings to get the instance
sized to handle the additional load being thrown at it. The one
thing that we think made more difference than anything was turning
OFF Oracle's automatic memory management (AMM). With AMM turned off,
we went through some iterations of increasing db_cache_size,
shared_pool_size, large_pool_size, and sga_max_size. We eventually
over-tuned these settings and started causing swapping in the OS. We
have now backed those down to reasonable values and Oracle seems to
be performing well.
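
To illustrate the kind of sanity check that guards against over-tuning,
here is a minimal Java sketch. Every number in it is hypothetical (these
are not the values from the Oncourse instance), and the class and method
names are made up for illustration. It assumes manual sizing, where the
individually sized components have to fit under sga_max_size, and where
the SGA plus everything else on the box (PGA, OS, other processes) has
to fit in physical RAM to stay out of swap.

```java
/** Hypothetical helper: checks a manual SGA sizing plan against its limits. */
final class SgaBudget {

    /** Sum of the manually sized components named in the post, in MB. */
    static long componentTotalMb(long dbCacheMb, long sharedPoolMb, long largePoolMb) {
        return dbCacheMb + sharedPoolMb + largePoolMb;
    }

    /** True when the sized components fit under sga_max_size. */
    static boolean fitsInSga(long componentTotalMb, long sgaMaxMb) {
        return componentTotalMb <= sgaMaxMb;
    }

    /** True when the SGA plus all other memory demand exceeds physical RAM. */
    static boolean risksSwapping(long sgaMaxMb, long otherMemoryMb, long physicalRamMb) {
        return sgaMaxMb + otherMemoryMb > physicalRamMb;
    }

    public static void main(String[] args) {
        // All values below are invented for the example.
        long total = componentTotalMb(4096, 1024, 256);
        long sgaMaxMb = 6144;
        long otherMemoryMb = 3072; // PGA, OS, other processes (a guess)
        long physicalRamMb = 8192;

        System.out.println("components (MB): " + total);
        System.out.println("fits in SGA:     " + fitsInSga(total, sgaMaxMb));
        System.out.println("risks swapping:  " + risksSwapping(sgaMaxMb, otherMemoryMb, physicalRamMb));
    }
}
```

The point of the sketch is only that the second check matters as much as
the first: a plan can fit comfortably inside sga_max_size and still push
the host into swap.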

The areas of the application that are still giving us trouble are
related to running out of heap space in the JVM. We have run the app
servers with 1GB of heap for 2+ years with no issues, but with the
current load we are seeing we bumped up heap to 2GB (the max for
32-bit architecture). We have now returned service to a level of
stability, but we are still running dangerously close to max heap.
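
For anyone who wants to see how close their own app servers are running
to max heap, a minimal sketch using only the standard java.lang.Runtime
API follows. The class name and the 90% warning threshold are
assumptions for illustration, not something we run in production.

```java
/** Hypothetical helper: reports how much of the maximum heap is in use. */
final class HeapHeadroom {

    /** Fraction of the maximum heap currently in use (0.0 to 1.0). */
    static double usedFraction(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        return (double) usedBytes / (double) maxBytes;
    }

    /** True when heap usage is at or above the given fraction of max heap. */
    static boolean isNearLimit(long usedBytes, long maxBytes, double threshold) {
        return usedFraction(usedBytes, maxBytes) >= threshold;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();                      // upper bound set by -Xmx
        long used = rt.totalMemory() - rt.freeMemory(); // heap currently occupied

        System.out.printf("heap used: %d MB of %d MB (%.0f%%)%n",
                used >> 20, max >> 20, 100.0 * usedFraction(used, max));

        // 0.90 is an arbitrary threshold chosen for this example.
        if (isNearLimit(used, max, 0.90)) {
            System.out.println("WARNING: heap usage is close to the configured maximum");
        }
    }
}
```

Note that a single reading taken just before a garbage collection can
look worse than it is, so a check like this is more useful sampled over
time than as a one-off.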

The next steps, from a software perspective, are to replace the
XML-based storage mechanisms with normalized relational database
tables. There are a few code paths that we are aware of that consume
extreme amounts of memory due to the loading of XML documents:
Resources (especially quota calculation), Assignments (especially
download zip file), and Calendar. We plan on pursuing a migration to
64-bit app servers as an insurance plan (more max heap), but we
(Sakai) need to put some concerted focus on removing XML-based
storage.
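
To make the cost of the XML code paths concrete, here is a rough Java
sketch of a quota-style calculation over XML-stored records. This is not
Sakai's actual code: the "resource" element and "content-length"
attribute are hypothetical names, and the sketch uses a streaming (StAX)
parser from the JDK rather than whatever the real code paths use. Even
in this lighter form, every document has to be fetched and parsed on
each calculation, which is the work a normalized table with a numeric
size column would avoid.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: sums a size attribute across XML-stored records. */
final class QuotaSketch {

    /** Reads the (hypothetical) content-length attribute from one record. */
    static long sizeOf(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "resource".equals(reader.getLocalName())) {
                        String value = reader.getAttributeValue(null, "content-length");
                        return value == null ? 0L : Long.parseLong(value);
                    }
                }
                return 0L; // no resource element found
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse record", e);
        }
    }

    /** Total bytes across all records; one parse per record. */
    static long totalBytes(List<String> records) {
        long total = 0L;
        for (String xml : records) {
            total += sizeOf(xml);
        }
        return total;
    }

    public static void main(String[] args) {
        // Invented sample records.
        List<String> records = Arrays.asList(
                "<resource id=\"a\" content-length=\"100\"/>",
                "<resource id=\"b\" content-length=\"250\"/>");
        System.out.println("total bytes: " + totalBytes(records));
    }
}
```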
L
You can see a complete change log here:
https://oncourse.iu.edu/access/wiki/site/3001b886-1069-4fb7-00d5-8db4b3a85f74/home.html
Lance Speelmon +1 (317) 278-9053
Manager Online Development / Sakai Release Manager