performance-related question

by Michael Sokolov-2

We have run up against a performance problem with a reasonably large
(but not huge, I would say) data collection.  Even the most basic
queries are running unacceptably slowly on this collection, so I am
wondering if there is something very obviously broken about our
configuration.  The machine is not the beefiest: it has a single 2.4 GHz
processor and only 1GB RAM, but I am trying to find out what
performance we can wring from it before moving up to a better one.  The
JVM is allocated 800MB, and I have:

 <db-connection cacheSize="400M" collectionCache="96M" database="native"
        files="webapp/WEB-INF/data" pageSize="4096">

in the conf.xml

The collection of interest has 27700 documents, of varying size.  A
large number (say 1/4 to 1/2) are binary (images).  None is larger than
an article or book chapter.  Many are smaller (say a page or two of XML).

This query:

for $doc in collection ('/bopp.bfldev')
return $doc

takes 12 seconds to evaluate; the number of results returned is limited
to 1 by the client. We need to get this under 1 second.

for $doc in subsequence(collection ('/bopp.bfldev'),1,1)
return $doc

takes the same time.

The log shows only:

2009-11-05 12:06:15,842 [P1-49] DEBUG (XQuery.java [compile]:155) -
Query diagnostics:
for  <5>
    $doc in collection("/bopp.bfldev")
return <6>
    $doc
 
2009-11-05 12:06:15,843 [P1-49] DEBUG (XQuery.java [compile]:161) -
Compilation took 6 ms
2009-11-05 12:06:27,348 [P1-49] DEBUG (XQuery.java [execute]:231) -
Execution took 11,498 ms


The returned document is quite small, so I don't think there's a
serialization problem.

My one concern is that possibly there are too many collections:
currently about 6400 if the following is a correct measure (it includes
collections outside the one we are running the query on, but that is by
far the largest of them):

charlestown:/proj/exist/eXist> find webapp/WEB-INF/data/fs -type d | wc
   6399    6399  402505

Is that likely to cause problems?  I could restructure our paths to
avoid that.


Any ideas, folks?  I have a checkpoint release tomorrow and would really
love to speed this up a bit!  Thanks

-Mike


------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open

Re: performance-related question

by Adam Retter-3

> for $doc in collection ('/bopp.bfldev')
> return $doc

Isn't this just retrieving all 27700 documents?




--
Adam Retter

eXist Developer
{ United Kingdom }
adam@...
irc://irc.freenode.net/existdb


Re: performance-related question

by Wolfgang Meier-2

> for $doc in collection ('/bopp.bfldev')
> return $doc
>
> takes 12 seconds to evaluate; the number of results returned is limited
> to 1 by the client.

A query like this should return instantly.

> My one concern is that possibly there are too many collections:
> currently about 6400

Ok, that's the only explanation I have. Does the second query execute
faster? What happens if you increase the collectionCache setting in
conf.xml?

If you can't figure it out, I can offer to have a look at your data
(unless it's confidential) if you send it to me within the next few hours.

Wolfgang


Re: performance-related question

by Michael Sokolov-2

So I increased the collectionCache setting from 96 to 256 MB and the speed improved from 12 to 3 seconds.  I tried fiddling with it, making it a bit bigger within the various limits, but that seems to be about the best I can manage.

Thanks for your offer of help, Wolfgang - I'll get in touch off-list

-Mike


Re: performance-related question

by Michael Sokolov-2

Update for the list:

I believe the problem has been solved by setting collectionCacheSize (which accepts a fixed maximum number of collections to cache) rather than collectionCache (which is supposed to control the cache by a byte size limit, but apparently isn't working right at the moment).

thanks, Wolf


Re: performance-related question

by Wolfgang Meier-2

> I believe the problem has been solved by setting collectionCacheSize (which
> accepts a fixed maximum number of collections to cache), rather than
> collectionCache (which is supposed to control the cache by a byte size
> limit, but apparently isn't working right at the moment)

If someone else experiences issues with queries spanning a few
thousand collections or more, here's what we found.
The default setting for the collection cache in conf.xml is:

<db-connection collectionCache="48M"/>

The cache is supposed to grow on demand up to 48M. Unfortunately, this
doesn't seem to work in 1.4 (and maybe 1.2.x as well). In my tests,
the cache size remained fixed at 128 collections and didn't grow. This
causes a lot of IO if there are several thousand collections in the db
and results in a significant performance loss at query time.

Fortunately, there's an alternative setting which allows us to force
the collection cache to a fixed size (specified in terms of
collections cached):

<db-connection collectionCacheSize="10000"/>

So if your DB has 10000 collections, they would all fit into memory.
Well, this isn't a perfect solution (as you have no control over the
memory consumed), but it's ok as a workaround.
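
As a rough sketch, here is how the workaround might look when applied to
the db-connection element quoted in the first message of this thread.
The other attribute values are copied from that message. The figure of
10000 is only the illustrative value used above, not a tuned setting;
anything comfortably above the roughly 6400 collections reported should
serve the same purpose. Dropping collectionCache in favour of
collectionCacheSize is an assumption based on the "rather than" wording
earlier in the thread.

```xml
<!-- Sketch only: collectionCacheSize replaces collectionCache="96M".
     10000 is the example figure from this thread, not a recommendation. -->
<db-connection cacheSize="400M" collectionCacheSize="10000" database="native"
        files="webapp/WEB-INF/data" pageSize="4096">
    <!-- child elements of db-connection stay as they were -->
</db-connection>
```

Changes to conf.xml are normally picked up only after the database is
restarted.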

I'll try to find a fix for the dynamic cache over the weekend.

Wolfgang


Re: performance-related question

by Wolfgang Meier-2

> I'll try to find a fix for the dynamic cache over the weekend.

I fixed the collection cache. It now actually grows to the
specified limit, so the default setting:

<db-connection collectionCache="48M"/>

will indeed be sufficient to hold a few thousand collections. You can
check the current size via JMX.

A few people already reported a significant performance increase :-)

Wolfgang
