12 Messages
out of heap space, every day
This may be more of a general Java question than a Solr one, but I'm a bit confused.

We have a largish Solr index, about 8M documents; the data dir is about 70G. We're getting about 500K new docs a week, as well as about 1 query/second.

Recently (when we crossed about the 6M threshold) Resin has been stopping with the following:

/usr/local/resin/log/stdout.log:[12:08:21.749] [28304] HTTP/1.1 500 Java heap space
/usr/local/resin/log/stdout.log:[12:08:21.749] java.lang.OutOfMemoryError: Java heap space

Only a restart of Resin will get it going again, and then it'll crash again within 24 hours.

It's a 4GB machine and we run it with args="-J-mx2500m -J-ms2000m". We can't really raise this any higher on the machine.

Are there 'native' memory requirements for Solr as a function of index size? Does a 70GB index require some minimum amount of wired RAM? Or is there some misconfiguration with Resin or Solr or my system? I don't really know Java well, but it seems strange that the VM can't page RAM out to disk or do something other than stop the server.
Re: out of heap space, every day
On Dec 4, 2007 10:46 AM, Brian Whitman <brian.whitman@...> wrote:
> Are there 'native' memory requirements for solr as a function of
> index size?

For faceting and sorting, yes. For normal search, no.

-Yonik
Re: out of heap space, every day
> For faceting and sorting, yes. For normal search, no.

Interesting you mention that, because one of the other changes since last week, besides the index growing, is that we added a sort on an sint field to the queries.

Is it reasonable that an sint sort would require over 2.5GB of heap on an 8M-document index? Is there any empirical data on how much RAM that will need?
Re: out of heap space, every day
On Dec 4, 2007 10:59 AM, Brian Whitman <brian.whitman@...> wrote:
> > For faceting and sorting, yes. For normal search, no.
>
> Interesting you mention that, because one of the other changes since
> last week besides the index growing is that we added a sort to an
> sint field on the queries.
>
> Is it reasonable that a sint sort would require over 2.5GB of heap on
> a 8M index? Is there any empirical data on how much RAM that will need?

int[maxDoc()] + String[nTerms()] + size_of_all_unique_terms.
Then double that to allow for a warming searcher.

One can decrease this memory usage by using an "integer" instead of an "sint" field if you don't need range queries. The memory usage would then drop to a straight int[maxDoc()] (4 bytes per document).

-Yonik
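A minimal Python sketch of Yonik's estimate, for anyone who wants to plug in their own numbers. The helper names are hypothetical, not a Solr or Lucene API. It assumes 4-byte object references (as on a 32-bit JVM) and 2 bytes per Java char, and it ignores per-String object overhead, so treat the string-field result as a floor rather than a prediction.

```python
def string_sort_cache_bytes(max_doc, n_unique_terms, total_term_chars,
                            bytes_per_char=2, ref_size=4, searchers=2):
    """Rough heap estimate for sorting on a string-encoded field such as sint.

    Hypothetical helper, not a Solr or Lucene API. Follows
    int[maxDoc()] + String[numUniqueTerms] + size_of_all_unique_terms,
    times the number of searchers holding a cache (2 while one is warming).
    Assumes 4-byte references and ignores per-String object overhead.
    """
    order = 4 * max_doc                        # int[maxDoc()]: one ordinal per document
    lookup = ref_size * n_unique_terms         # String[]: one reference per unique term
    text = bytes_per_char * total_term_chars   # the characters of every unique term
    return (order + lookup + text) * searchers


def int_sort_cache_bytes(max_doc, searchers=2):
    """Rough estimate for a plain "integer" field: a straight int[maxDoc()]."""
    return 4 * max_doc * searchers


# 8M documents sorted on an "integer" field instead of "sint":
print(int_sort_cache_bytes(8_000_000) / 1e6)  # 64.0 (MB, including the warming searcher)
```

The design choice here is to take counts rather than the terms themselves, so the estimate can be run for millions of terms without building them in memory.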
Re: out of heap space, every day
Hello,

I am also fighting with heap exhaustion, though during the indexing step. I was able to reduce, but not fix, the problem by setting the thread stack size to 64k with "-Xss64k". The minimum size is OS-specific, but the VM will tell you if you set the size too small. You can try it; it may help.

Brian
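A back-of-the-envelope sketch of what -Xss changes: the memory reserved for stacks is roughly the number of threads times the per-thread stack size. The 200-thread count and the 512 KB comparison size below are hypothetical; the default stack size varies by platform and JVM. On HotSpot JVMs thread stacks are allocated outside the -Xmx heap, so shrinking them frees native memory rather than Java heap directly.

```python
KB = 1024


def stack_reservation_bytes(threads, stack_kb):
    """Approximate memory reserved for thread stacks: one stack per thread."""
    return threads * stack_kb * KB


# Hypothetical comparison: 200 threads at 512 KB stacks vs. 64 KB stacks.
before = stack_reservation_bytes(200, 512)
after = stack_reservation_bytes(200, 64)
print(f"{before / KB / KB:.1f} MB -> {after / KB / KB:.1f} MB")  # 100.0 MB -> 12.5 MB
```

The saving only becomes large when the thread count is large, which is the situation the next reply questions.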
Re: out of heap space, every day
On 4-Dec-07, at 8:10 AM, Brian Carmalt wrote:
> Hello,
>
> I am also fighting with heap exhaustion, however during the
> indexing step. I was able to minimize, but not fix the problem
> by setting the thread stack size to 64k with "-Xss64k". The minimum
> size is os specific, but the VM will tell
> you if you set the size too small. You can try it, it may help

This seems surprising unless you are positively hammering Solr with tons of different threads during indexing. It's probably not worth using more than (number of processors + a few) threads.

-Mike
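A minimal Python sketch of capping an indexing client's concurrency at roughly the number of processors plus a few, as suggested above. `index_batch` is a hypothetical placeholder for whatever posts one batch of documents to Solr, and using 2 extra workers for "a few" is an arbitrary assumption.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def index_batch(batch):
    """Hypothetical stand-in for posting one batch of documents to Solr."""
    return len(batch)


def index_all(batches, extra_threads=2):
    # "# processors + a few"; os.cpu_count() may return None, so fall back to 1.
    workers = (os.cpu_count() or 1) + extra_threads
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Returns the total number of documents submitted.
        return sum(pool.map(index_batch, batches))


if __name__ == "__main__":
    batches = [["doc"] * 100 for _ in range(50)]
    print(index_all(batches))  # 5000
```

Bounding the pool keeps the number of concurrent requests, and therefore server-side threads, predictable no matter how many batches are queued.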
RE: out of heap space, every day
Thanks!

I've seen a few formulae like this go by over the months. Can someone please make a wiki page for memory and processing estimation with locality properties? Or is there a Lucene page we can use?

Lance
Re: out of heap space, every day
> int[maxDoc()] + String[nTerms()] + size_of_all_unique_terms.
> Then double that to allow for a warming searcher.

This is great, but can you help me parse this? Assume 8M docs and I'm sorting on an int field that is Unix time (seconds since epoch). For the purposes of the experiment, assume every doc was indexed at a unique time. So:

(int[8000000] + String[8000000], each term is 16 chars + 8000000*4) * 2

That's 384MB by my calculation. Is that right?
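One reading of the arithmetic above that reproduces the 384MB, as a quick Python check. It assumes 4-byte String references and counts one byte per term character. Java strings of this era store 2 bytes per char and also carry per-object overhead, so the second figure, with 2-byte chars, still leaves that overhead out. Sizes are decimal megabytes.

```python
docs = 8_000_000   # maxDoc(): every doc has a unique timestamp,
terms = docs       # so there are as many unique terms as documents
term_len = 16      # chars per term, as assumed in the message


def estimate_mb(bytes_per_char):
    order = 4 * docs                           # int[maxDoc()]
    lookup = 4 * terms                         # String[] references (assumed 4 bytes each)
    text = bytes_per_char * term_len * terms   # term contents
    return (order + lookup + text) * 2 / 1e6   # x2 for the warming searcher


print(estimate_mb(1))  # 384.0, matching the figure above
print(estimate_mb(2))  # 640.0 with 2-byte chars, still excluding String object overhead
```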
RE: out of heap space, every day
"String[nTerms()]": Does this mean that you compare the first term, then the second, etc.? Otherwise I don't understand how to compare multiple terms in two records.

Lance
Re: out of heap space, every day
On Dec 4, 2007 3:11 PM, Norskog, Lance <lance@...> wrote:
> "String[nTerms()]": Does this mean that you compare the first term, then
> the second, etc.? Otherwise I don't understand how to compare multiple
> terms in two records.

Lucene sorting only supports a single term per document for a field. The String array stores the values of all the unique terms (so nTerms() above should be numberUniqueTerms).

See Lucene's FieldCache.StringIndex

-Yonik
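A toy Python model of the structure described above, loosely following Lucene's FieldCache.StringIndex, which pairs an `order` array (one int per document) with a sorted `lookup` array of the unique terms. This is an illustration, not Lucene's code; reserving slot 0 of `lookup` for documents with no term is an assumption modeled on FieldCacheImpl. Because each document has at most one term, sorting only ever compares two ints, so there is no "first term, then second term" comparison.

```python
def build_string_index(doc_values):
    """Build (order, lookup) for a field with at most one term per document.

    doc_values[i] is the single term of document i, or None if it has none.
    lookup holds the unique terms in sorted order, with slot 0 reserved for
    "no value"; order[i] is the position of document i's term in lookup.
    """
    lookup = [None] + sorted({v for v in doc_values if v is not None})
    position = {term: i for i, term in enumerate(lookup)}
    order = [position[v] for v in doc_values]
    return order, lookup


def sort_docs(order):
    """Sort document ids by comparing their int ordinals only."""
    return sorted(range(len(order)), key=lambda doc: order[doc])


order, lookup = build_string_index(["pear", "apple", None, "pear", "fig"])
print(lookup)            # [None, 'apple', 'fig', 'pear']
print(order)             # [3, 1, 0, 3, 2]
print(sort_docs(order))  # [2, 1, 4, 0, 3]
```

Note that `order` grows with the number of documents while `lookup` grows with the number of unique terms, which is why the two appear as separate terms in the memory formula.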
Re: out of heap space, every day
> See Lucene's FieldCache.StringIndex

To understand just what's getting stored for each string field, you may also want to look at the createValue() method of the inner Cache object instantiated as stringsIndexCache in FieldCacheImpl.java (line 399 in HEAD):

http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/FieldCacheImpl.java?view=markup

-Charlie
Re: out of heap space, every day
It seems to me that another way to write the formula, borrowing Python syntax, is:

    4 * numDocs + 38 * len(uniqueTerms) + 2 * sum([len(t) for t in uniqueTerms])

That's 4 bytes per document, plus 38 bytes per term, plus 2 bytes times the sum of the lengths of the terms. (Numbers taken from http://martin.nobilitas.com/java/sizeof.html)

Does that seem right?

-Charlie
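Charlie's formula as a runnable Python function, applied to the thread's example of 8M documents, each with a unique 16-character term. The 38-bytes-per-term and 2-bytes-per-char constants come from Charlie's message and assume a 32-bit JVM of that era; they are estimates, not measurements. Doubling for a warming searcher follows Yonik's note. The function names are hypothetical.

```python
def charlie_estimate(num_docs, num_unique_terms, total_term_chars):
    """4 * numDocs + 38 * len(uniqueTerms) + 2 * (sum of term lengths), in bytes."""
    return 4 * num_docs + 38 * num_unique_terms + 2 * total_term_chars


def charlie_estimate_from_terms(num_docs, unique_terms):
    """Same formula, taking the unique terms themselves as in the original."""
    return charlie_estimate(num_docs, len(unique_terms),
                            sum(len(t) for t in unique_terms))


docs = 8_000_000
per_searcher = charlie_estimate(docs, docs, 16 * docs)
print(per_searcher / 1e6)      # 592.0 (MB for one searcher)
print(2 * per_searcher / 1e6)  # 1184.0 (MB with a warming searcher)
```

Taking counts instead of a list avoids materializing 8M strings just to size them; the list-based variant is there for small sanity checks.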