
Re: Serious scalability issues with Oncourse at Indiana U

by Speelmon, Lance Day


We peaked at 120k web requests / 5 min last week with user presence  
enabled.  In an attempt to throttle database activity, we disabled  
user presence and are running at peaks of 100k web requests / 5 min  
this week.  Here is what we are seeing from the database perspective:

We typically run around 15 transactions/second but we have seen it as  
high as 110 transactions / second.
Average Response Time to the user  hovers around 1 second.
CPUs in use vary mostly from 6 to 9, with the occasional spike
hitting 10+ CPUs for a short period of time.
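
For comparison with the per-second figures asked about in the quoted question below, the 5-minute request counts can be converted to a rough average rate. This is only a back-of-the-envelope sketch; the function name `per_second` is made up for illustration, and it assumes the load is spread evenly across the 5-minute window, which real traffic is not.

```python
def per_second(requests, window_minutes=5):
    """Average requests per second over a window of the given length."""
    return requests / (window_minutes * 60)

# Peak figures quoted above, averaged over the 5-minute window.
peak_with_presence = per_second(120_000)     # user presence enabled
peak_without_presence = per_second(100_000)  # user presence disabled

print(f"with presence:    {peak_with_presence:.0f} requests/second")
print(f"without presence: {peak_without_presence:.0f} requests/second")
```

So the web tier averages roughly 400 requests/second at the peak with presence enabled and roughly 333 requests/second without it.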


Thanks, L


Lance Speelmon  +1 (317) 278-9053
Manager Online Development / Sakai Release Manager


On Sep 5, 2007, at 4:33 PM, R.P. Aditya wrote:

> This is interesting, thanks for sending this -- what is the load (hits per
> second or db executions or user transactions per second) you are seeing?
>
> Thanks,
> Adi
>
> On Wed, Sep 05, 2007 at 04:26:34PM -0400, Lance Speelmon wrote:
> > I wanted to share with the community some of the issues we have
> > endured over the past week and what we have learned and changed as a
> > result.  As a short background, we have been running our Sakai based
> > system (Oncourse) in parallel with the legacy system for 2+ years.
> > The beginning of our fall semester marked the first time where all of
> > the legacy user base transitioned to the Sakai based system.  This
> > additional user base represents an overall 2x increase in load on our
> > production system (16x app servers and 1x Oracle 10.0.1.x server).
> >
> > The first sign of trouble came from the fact that DBCP was having
> > trouble obtaining and maintaining connections to Oracle.  DBCP has a
> > nasty bug that can be triggered in these kinds of conditions that
> > results in a deadlock situation.  To resolve the DBCP bugs, we
> > switched to the c3p0 connection pool.  C3p0 behaves much more stably
> > and predictably under heavy load and can recover better from
> > connection issues with Oracle.  This is a drop-in replacement for
> > DBCP and I am going to recommend that Sakai switch to this connection
> > pool as a default in the 2.5 release.
> >
> > Next we started troubleshooting Oracle settings to get the instance
> > sized to handle the additional load being thrown at it.  The one
> > thing that we think made more difference than anything was turning OFF
> > Oracle's automatic memory management (AMM).  With AMM turned off, we
> > then went through some iterations of increasing db_cache_size,
> > shared_pool_size, large_pool_size, and sga_max_size.  We eventually
> > over-tuned these settings and started causing swapping in the OS.  We
> > now have backed those down to a reasonable number and Oracle seems to
> > be performing well.
> >
> > The areas of the application that are still giving us trouble are
> > related to running out of heap space in the JVM.  We have run the app
> > servers with 1GB of heap for 2+ years with no issues, but with the
> > current load we are seeing we bumped up heap to 2GB (the max for 32-bit
> > architecture).  We have now returned service to a level of
> > stability, but we are still running dangerously close on max heap.
> > The next steps, from a software perspective, are to replace the XML-based
> > storage mechanisms with normalized relational database tables.
> > There are a few code paths that we are aware of that consume extreme
> > amounts of memory due to the loading of XML documents - Resources
> > (especially quota calculation), Assignments (especially download zip
> > file), and Calendar.  We plan on pursuing a migration to 64-bit app
> > servers as an insurance plan (more max heap), but we (Sakai) need to
> > put some concerted focus on removing XML-based storage.
> >
> > L
> >
> > You can see a complete change log here:
> > https://oncourse.iu.edu/access/wiki/site/3001b886-1069-4fb7-00d5-8db4b3a85f74/home.html
> >
> >
> > Lance Speelmon  +1 (317) 278-9053
> > Manager Online Development / Sakai Release Manager
> >
> > ----------------------
> > This automatic notification message was sent by Sakai Collab
> > (https://collab.sakaiproject.org/portal) from the WG: Production site.
> > You can modify how you receive notifications at My Workspace > Preferences.
> >
>
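
The over-tuning described in the quoted message (raising the Oracle memory parameters until the OS started swapping) can be guarded against with a simple budget check before changing settings. The sketch below is illustrative only: `check_memory_budget` is a hypothetical helper, every size in it is an invented placeholder rather than an Oncourse value, and `other_reserved` is an assumed allowance for everything outside the SGA (OS, session memory, and so on).

```python
GB = 1024 ** 3

def check_memory_budget(physical_ram, sga_max_size, components, other_reserved):
    """Return a list of problems; an empty list means the budget fits."""
    problems = []
    # The individually sized areas all live inside the SGA ceiling.
    if sum(components.values()) > sga_max_size:
        problems.append("SGA components exceed sga_max_size")
    # The SGA ceiling plus everything else must fit in RAM, or the OS swaps.
    if sga_max_size + other_reserved > physical_ram:
        problems.append("sga_max_size plus reserved memory exceeds physical RAM")
    return problems

# Hypothetical figures, NOT taken from the Oncourse configuration.
components = {
    "db_cache_size": 6 * GB,
    "shared_pool_size": 2 * GB,
    "large_pool_size": 1 * GB,
}
over_tuned = check_memory_budget(16 * GB, 14 * GB, components, 4 * GB)
backed_off = check_memory_budget(16 * GB, 10 * GB, components, 4 * GB)

print("over-tuned:", over_tuned)
print("backed off:", backed_off)
```

With these placeholder numbers the first configuration is flagged because the SGA ceiling plus the reserved allowance is larger than physical RAM, while the backed-off configuration passes both checks.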

----------------------
This automatic notification message was sent by Sakai Collab (https://collab.sakaiproject.org/portal) from the DG: Development (a.k.a. sakai-dev) site.
You can modify how you receive notifications at My Workspace > Preferences.
