out of heap space, every day


by Brian Whitman

This may be more of a general Java question than a solr one, but I'm a
bit confused.

We have a largish solr index, about 8M documents, the data dir is  
about 70G. We're getting about 500K new docs a week, as well as about  
1 query/second.

Recently (when we crossed about the 6M threshold) resin has been  
stopping with the following:

/usr/local/resin/log/stdout.log:[12:08:21.749] [28304] HTTP/1.1 500  
Java heap space
/usr/local/resin/log/stdout.log:[12:08:21.749]  
java.lang.OutOfMemoryError: Java heap space

Only a restart of resin will get it going again, and then it'll crash  
again within 24 hours.

It's a 4GB machine and we run it with args="-J-mx2500m -J-ms2000m". We
can't really raise this any higher on the machine.

Are there 'native' memory requirements for solr as a function of
index size? Does a 70GB index require some minimum amount of wired
RAM? Or is there some misconfiguration with resin or solr or my
system? I don't really know Java well, but it seems strange that the
VM can't page RAM out to disk or do something else besides
stopping the server.

Re: out of heap space, every day

by Yonik Seeley

On Dec 4, 2007 10:46 AM, Brian Whitman <brian.whitman@...> wrote:
> Are there 'native' memory requirements for solr as a function of
> index size?

For faceting and sorting, yes.  For normal search, no.

-Yonik

Re: out of heap space, every day

by Brian Whitman

>
> For faceting and sorting, yes.  For normal search, no.
>

Interesting you mention that, because one of the other changes since
last week, besides the index growing, is that we added a sort on an
sint field to the queries.

Is it reasonable that an sint sort would require over 2.5GB of heap on
an 8M-document index? Is there any empirical data on how much RAM that
will need?

Re: out of heap space, every day

by Yonik Seeley

On Dec 4, 2007 10:59 AM, Brian Whitman <brian.whitman@...> wrote:

> >
> > For faceting and sorting, yes.  For normal search, no.
> >
>
> Interesting you mention that, because one of the other changes since
> last week besides the index growing is that we added a sort to an
> sint field on the queries.
>
> Is it reasonable that a sint sort would require over 2.5GB of heap on
> a 8M index? Is there any empirical data on how much RAM that will need?

int[maxDoc()] + String[nTerms()] + size_of_all_unique_terms.
Then double that to allow for a warming searcher.

One can decrease this memory usage by using an "integer" instead of an
"sint" field if you don't need range queries.  The memory usage would
then drop to a straight int[maxDoc()] (4 bytes per document).

-Yonik
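
As a rough illustration of the "integer" case described above, the following sketch works out 4 bytes per document for the thread's 8M-document index, doubled to allow for a warming searcher. The function name and the decimal-megabyte conversion are illustrative choices, not anything taken from Solr or Lucene.

```python
def int_sort_cache_bytes(max_doc: int, warming_searcher: bool = True) -> int:
    """Estimate heap used to sort on a plain integer field.

    Follows the rule of thumb from the message above: one 4-byte int
    per document, doubled while a second (warming) searcher is alive.
    """
    per_searcher = 4 * max_doc  # int[maxDoc()]
    return per_searcher * (2 if warming_searcher else 1)


# 8M documents, as in the original question.
estimate = int_sort_cache_bytes(8_000_000)
print(estimate / 1_000_000)  # 64.0 (decimal MB)
```

By this estimate the integer-field sort cache stays in the tens of megabytes, well under the 2.5GB heap mentioned earlier in the thread.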

Re: out of heap space, every day

by bigdaddy_

Hello,

I am also fighting heap exhaustion, though in my case during the
indexing step. I was able to reduce the problem, but not fix it, by
setting the thread stack size to 64k with "-Xss64k". The minimum size
is OS-specific, but the VM will tell you if you set it too small. You
can try it; it may help.

Brian


Re: out of heap space, every day

by Mike Klaas

On 4-Dec-07, at 8:10 AM, Brian Carmalt wrote:

> Hello,
>
> I am also fighting with heap exhaustion, however during the  
> indexing step. I was able to minimize, but not fix the problem
> by setting the thread stack size to 64k with "-Xss64k". The minimum  
> size is os specific, but the VM will tell
> you if you set the size too small. You can try it, it may help

This seems surprising unless you are positively hammering Solr with  
tons of different threads during indexing.  It's probably not worth  
using more than # processors + a few.

-Mike
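
A minimal sketch of the "# processors + a few" sizing suggested above, assuming a client that posts batches of documents from a thread pool. `post_batch` is a hypothetical placeholder rather than a real Solr client call, and the default of 2 extra threads is an arbitrary reading of "a few".

```python
import os
from concurrent.futures import ThreadPoolExecutor


def indexing_worker_count(extra: int = 2) -> int:
    """Number of indexing threads: processor count plus a few."""
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    return cpus + extra


def post_batch(batch):
    """Hypothetical stand-in for sending one batch of docs to Solr."""
    return len(batch)


batches = [["doc1", "doc2"], ["doc3"], ["doc4", "doc5", "doc6"]]

# Cap concurrency instead of opening one thread per batch.
with ThreadPoolExecutor(max_workers=indexing_worker_count()) as pool:
    sent = sum(pool.map(post_batch, batches))

print(sent)  # 6
```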

RE: out of heap space, every day

by Lance Norskog

Thanks!

I've seen a few formulae like this go by over the months. Can someone
please make a wiki page for memory and processing estimation with
locality properties?  Or is there a Lucene page we can use?

Lance


Re: out of heap space, every day

by Brian Whitman

>
> int[maxDoc()] + String[nTerms()] + size_of_all_unique_terms.
> Then double that to allow for a warming searcher.
>

This is great, but can you help me parse this? Assume 8M docs and I'm
sorting on an int field that is unix time (seconds since epoch). For
the purposes of the experiment, assume every doc was indexed at a
unique time.

so..

(int[8000000] + String[8000000], each term is 16 chars + 8000000*4) * 2

that's 384MB by my calculation. Is that right?
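
The arithmetic behind the 384MB figure can be reproduced as below. This is one reading of the expression above, and it rests on assumptions that are not confirmed in the thread: each 16-character term is counted as 16 bytes, and the trailing `8000000*4` is taken to be a 4-byte reference per slot in the String array. Java chars are 2 bytes each, so the real term storage would be larger than this.

```python
NUM_DOCS = 8_000_000
BYTES_PER_INT = 4    # one int per document
BYTES_PER_TERM = 16  # assumption: 16 chars counted as 16 bytes
BYTES_PER_REF = 4    # assumption: 4-byte reference per String[] slot

ord_array = NUM_DOCS * BYTES_PER_INT   # int[8000000]
term_data = NUM_DOCS * BYTES_PER_TERM  # every term unique, 16 bytes each
ref_array = NUM_DOCS * BYTES_PER_REF   # String[8000000]

one_searcher = ord_array + term_data + ref_array
with_warming = 2 * one_searcher        # doubled for a warming searcher

print(with_warming / 1_000_000)  # 384.0 (decimal MB)
```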



RE: out of heap space, every day

by Lance Norskog

"String[nTerms()]": Does this mean that you compare the first term, then
the second, etc.? Otherwise I don't understand how to compare multiple
terms in two records.

Lance


Re: out of heap space, every day

by Yonik Seeley

On Dec 4, 2007 3:11 PM, Norskog, Lance <lance@...> wrote:
> "String[nTerms()]": Does this mean that you compare the first term, then
> the second, etc.? Otherwise I don't understand how to compare multiple
> terms in two records.

Lucene sorting only supports a single term per document for a field.
The String array stores the values of all the unique terms (so
nTerms() above should be numberUniqueTerms).

See Lucene's FieldCache.StringIndex

-Yonik

Re: out of heap space, every day

by Charles Hornberger

> See Lucene's FieldCache.StringIndex

To understand just what's getting stored for each string field, you
may also want to look at the createValue() method of the inner Cache
object instantiated as stringsIndexCache in FieldCacheImpl.java (line
399 in HEAD):

http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/FieldCacheImpl.java?view=markup

-Charlie

Re: out of heap space, every day

by Charles Hornberger

It seems to me that another way to write the formula -- borrowing
Python syntax -- is:

4 * numDocs + 38 * len(uniqueTerms) + 2 * sum([len(t) for t in uniqueTerms])

That's 4 bytes per document, plus 38 bytes per term, plus 2 bytes *
the sum of the lengths of the terms. (Numbers taken from
http://martin.nobilitas.com/java/sizeof.html)

Does that seem right?

-Charlie
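
The formula above can be written as a small runnable function, sketched here under the message's own constants (4 bytes per document, 38 bytes of per-term overhead, 2 bytes per character). The function names are illustrative, and the average term length of 10 in the second example is a made-up value for scale, not the actual encoded length of an sint term.

```python
def string_index_bytes(num_docs, unique_terms):
    """Charlie's formula applied to an explicit list of unique terms."""
    return (4 * num_docs
            + 38 * len(unique_terms)
            + 2 * sum(len(t) for t in unique_terms))


def string_index_bytes_from_counts(num_docs, num_terms, avg_term_len):
    """Same formula when only counts are known (avoids building the list)."""
    return 4 * num_docs + 38 * num_terms + 2 * num_terms * avg_term_len


# Tiny check: 5 docs, 3 unique terms totalling 6 characters.
print(string_index_bytes(5, ["a", "bb", "ccc"]))  # 146

# Scale of the thread's scenario: 8M docs, every term unique,
# with a hypothetical average term length of 10 characters.
one_searcher = string_index_bytes_from_counts(8_000_000, 8_000_000, 10)
print(one_searcher / 1_000_000)      # 496.0 (decimal MB)
print(2 * one_searcher / 1_000_000)  # 992.0 with a warming searcher
```

With these inputs the per-term overhead (38 bytes x 8M terms) is the largest of the three components, which is why the number of unique terms matters so much in this estimate.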
