Plugin Performance Issues


by entdeveloper

We recently created a custom class for our spellchecking implementation in Solr.  We decided to include the class in a custom jar and deployed it to the /lib directory in solr_home to use it as a plugin.

After a while (about 12 hours), Solr's heap usage slowly starts to rise, and we eventually run into swap issues, which end up killing our performance.  We've tried several different things to solve the problem, originally thinking it was our code, but on one of our servers the new code in the plugin wasn't even being used.

Has anyone else experienced this?  I'm wondering if this is a side effect of using plugins in general, perhaps something going on with Solr's custom class loading.

We're using Tomcat 6 and Solr 1.3 by the way.

Re: Plugin Performance Issues

by Otis Gospodnetic


Hi,

Could it simply be that you really do need all the memory that the JVM starts consuming over time?  How large a heap are you using, is Solr the only webapp in your Tomcat, and are you using sorting or faceting?

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----

> From: CameronL <cameron.developer@...>
> To: solr-user@...
> Sent: Wednesday, July 1, 2009 2:37:40 PM
> Subject: Plugin Performance Issues
> ...


Re: Plugin Performance Issues

by entdeveloper

Our max heap was configured at 5GB.  It had been running fine until we tried to deploy a new queryConverter for our SpellcheckComponent.  After that, we raised our heap to 8GB and still had issues.

Solr is the only webapp running on Tomcat.

We are using sorting and faceting, but again, we hadn't had problems until deploying this plugin.  Also, seeing as how it's only spellchecking related (and we have a separate RequestHandler that only handles spellchecking, while leaving the SpellcheckComponent out of our standard RequestHandler), I'm not entirely convinced that it's related to our code, but it could be.  Just trying to get a sense if other plugins have had similar problems, just by the nature of using Solr's resource loading from the /lib directory.


Re: Plugin Performance Issues

by Otis Gospodnetic


Hi,

A 5GB heap sounds quite big, let alone an 8GB heap.  I would try simple stuff like jmap to see what's eating the memory, and if that doesn't work I'd try using a profiler.

Turn off norms if you don't need them.  If you have date fields and sort by them, either use trie-based fields for those dates or round the dates to a coarser granularity.
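
As a rough first check before reaching for jmap or a profiler, it can help to see whether the growth is in the Java heap at all or in non-heap memory. The sketch below is only an illustration using the standard java.lang.management beans; the class name MemorySnapshot is made up for this example, and it reports on the JVM it runs in, so to inspect Solr it would have to run inside the same JVM (or the same beans would have to be read over JMX).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not part of Solr): summarizes heap and non-heap usage
// of the current JVM using the standard platform MXBeans.
final class MemorySnapshot {

    // Formats one usage record as "label used=N MB committed=N MB".
    static String format(String label, MemoryUsage usage) {
        long mb = 1024L * 1024L;
        return String.format("%-32s used=%d MB committed=%d MB",
                label, usage.getUsed() / mb, usage.getCommitted() / mb);
    }

    // Builds a multi-line report: overall heap, overall non-heap, then each pool.
    static String report() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        StringBuilder out = new StringBuilder();
        out.append(format("heap", memory.getHeapMemoryUsage())).append('\n');
        out.append(format("non-heap", memory.getNonHeapMemoryUsage())).append('\n');
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage != null) { // getUsage() may return null for an invalid pool
                out.append(format("  " + pool.getName(), usage)).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

If the heap figure stays flat while the process keeps growing, the memory is going somewhere these beans do not fully cover (for example native allocations), and a heap dump alone may not show it.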

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----

> From: CameronL <cameron.developer@...>
> To: solr-user@...
> Sent: Wednesday, July 1, 2009 4:43:11 PM
> Subject: Re: Plugin Performance Issues
> ...


Re: Plugin Performance Issues

by hossman


: I'm not entirely convinced that it's related to our code, but it could be.
: Just trying to get a sense if other plugins have had similar problems, just
: by the nature of using Solr's resource loading from the /lib directory.

Plugins aren't something that every Solr user uses -- but enough people use
them that if there was a fundamental memory leak just from loading plugin
jars i'm guessing more people would be complaining.

I use plugins in several solr instances, and i've never noticed any
problems like you describe -- but i don't personally use tomcat.

Otis is right on the money: you need to use profiling tools to really look
at the heap and see what's taking up all that ram.

Alternately: a quick way to rule out the special plugin class loader would
be to embed your custom handler directly into the solr.war ("The Old Way"
on the SolrPlugins wiki) ... if you still have problems, then the cause
isn't the plugin classloader.
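
When comparing the two deployment styles, one way to confirm which class loader actually served a given class is to print its loader chain. This is only a sketch with standard JDK calls; the class name LoaderChain is invented for the example, and it would need to run inside the webapp (with the plugin class name passed in) to say anything about Solr itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Solr): lists the class loaders, from the
// defining loader up through its parents, for a given class.
final class LoaderChain {

    // Returns loader class names from the defining loader upward, ending with
    // "bootstrap" (the JDK represents the bootstrap loader as null).
    static List<String> chainFor(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap");
        return chain;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // With no argument, inspect this class itself.
        Class<?> type = args.length > 0 ? Class.forName(args[0]) : LoaderChain.class;
        System.out.println(type.getName() + " <- " + chainFor(type));
    }
}
```

A class packaged in the war should show the container's webapp loader first, while a class picked up from the lib directory should show an additional loader ahead of it.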





-Hoss


Re: Plugin Performance Issues

by entdeveloper

This is an issue we experienced a while back.  We once again tried to load a custom class as a plugin jar from the lib directory and began experiencing severe memory problems again.  The code in our jar wasn't being used at all...the class was only referenced in the schema.  I find it strange that no one else has experienced this, but we're not doing anything particularly complex, which still leads me to believe that there is something strange going on with Solr's class loading for this lib directory.  Perhaps it is something specific to our environment (specs below)?

java version "1.6.0_05"
Java(TM) SE Runtime Environment (build 1.6.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 10.0-b19, mixed mode)

Tomcat 6.0.16

Linux 2.6.9-35.ELsmp #1 SMP Thu Jun 1 14:31:29 PDT 2006 x86_64 x86_64 x86_64 GNU/Linux

Max heap set to 1GB.

With the jars in the plugin directory, RAM usage increases by 1.5 - 2GB, increasing at about 200MB/hr.



Re: Plugin Performance Issues

by Grant Ingersoll-6

I would guess that your code is being used.  I'm not sure what you
mean by it "was only referenced in the schema".  That implies usage to
me.  Is it a new field type?  What is your plugin doing?

Have you tried setting breakpoints at method entry points in your
plugin and starting up Solr w/ a debugger attached?

-Grant

On Oct 28, 2009, at 4:54 PM, entdeveloper wrote:

> ...

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


Re: Plugin Performance Issues

by entdeveloper

Here is where our custom class is referenced in the schema:

<fieldtype name="text_lc" class="solr.TextField" tokenized="false">
  <analyzer type="index">
    <tokenizer class="my.custom.TokenizerFactory"/>
    <filter class="my.custom.FilterFactory" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>

As you can see, we built our own field type to be used at index time to essentially act as a sort of KeywordTokenizer, but removing stopwords.  We share a schema.xml for both master and slave servers for convenience, but we only do indexing on the master server.  However, with this schema in place on the slaves, as well as our custom.jar in the solrHome/lib directory, we run into these issues where the memory usage grows and grows without explanation.

We've done this before (earlier in this thread) with a custom spelling implementation too, and we ran into the same problem. We have since given up on that fix, but this is our very next attempt at deploying custom code using Solr's plugin capability.  Unfortunately, we got the same results.  In fact, in a previous try, we had simply dropped one of our custom plugin jars into the lib directory but forgot to deploy the new solrconfig or schema files that referenced the classes in there, and the issue still occurred.

Anyway, for now we've been able to get around this by packaging the solr.war with our custom jars in the WEB-INF/lib. Although this is more proper anyway, it's not nearly as convenient as being able to drop jars into an external lib directory and let solr pick up our classes that way.  I'm still curious if this is unique to our environment or if there's a bug with solr's classloading for the plugin functionality.



Re: Plugin Performance Issues

by hossman


: <fieldtype name="text_lc" class="solr.TextField" tokenized="false">
:   <analyzer type="index">
:     <tokenizer class="my.custom.TokenizerFactory"/>
:     <filter class="my.custom.FilterFactory" words="stopwords.txt"/>
:     <filter class="solr.LowerCaseFilterFactory"/>
:     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
:   </analyzer>
: </fieldtype>
        ...
: only do indexing on the master server.  However, with this schema in place
: on the slaves, as well as our custom.jar in the solrHome/lib directory, we
: run into these issues where the memory usage grows and grows without
: explanation.

...even if you only do indexing on the master, having a single analyzer
defined for a field means it's used at both index and query time (even
though you say 'type="index"') so a memory leak in either of your custom
factories could cause a problem on a query box.

This however concerns me...

: fact, in a previous try, we had simply dropped one of our custom plugin jars
: into the lib directory but forgot to deploy the new solrconfig or schema
: files that referenced the classes in there, and the issue still occurred.

...this i can't think of a rational explanation for.  Can you elaborate on
what you can do to create this problem .. ie: does the memory usage grow
even when solr doesn't get any requests? or does it happen when searches are
executed? or when commits happen? etc...

If the problem is as easy to reproduce as you describe, can you please
generate some heap dumps against a server that isn't processing any
queries -- one from when the server first starts up, and one from when the
server crashes from an OOM (there's a JVM option for generating heap dumps
on OOM that i can't think of off the top of my head)
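
For reference, the HotSpot option for dumping the heap on OOM is -XX:+HeapDumpOnOutOfMemoryError (optionally with -XX:HeapDumpPath=... to choose where the file goes). For the "first starts up" snapshot, a dump can also be requested from inside the JVM. The sketch below assumes a HotSpot-based JVM and a JDK newer than the 1.6 mentioned in this thread; the class name HeapDumper and the file name are made up for the example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of Solr): writes an .hprof heap dump of the
// current JVM using the HotSpot-specific diagnostic MXBean.
final class HeapDumper {

    // Dumps the heap to 'target' (which must not already exist). With
    // liveOnly=true only reachable objects are written.
    static Path dump(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dump(dir.resolve("startup.hprof"), true);
        System.out.println("wrote " + file + " (" + Files.size(file) + " bytes)");
    }
}
```

Two dumps taken this way (or one taken this way and one written on OOM) can then be compared in any heap analysis tool that reads the hprof format.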



-Hoss


Re: Plugin Performance Issues

by entdeveloper

Interesting...I guess I had logically assumed that having type="index" meant it wasn't used for query time, but I see why that's not possible.  Here's the thing though: We had one field defined using this fieldtype and we deployed the new schema to solr when we started seeing the issue.  However, we had not yet released our code that was using the new field (obviously we have to make the change on the solr end before the code, so we asynchronously do this offset by a few days).  So the field that was of that fieldtype wasn't even being queried against.

The problem for us would be pretty easy to reproduce, but I don't think our sys admins would appreciate experimenting with our production solr servers.  We can pretty much only reproduce on our live environment because that's the only environment that's really getting regular (100 qps) traffic, so I guess you could say that it is traffic related.  

Just some other notes, we have a distributed index across 3 shards.  We also regularly pick up snapshots from the master server about once per hour, so whatever commits happen during snapinstalling may affect it, but the timeline of the memory growing doesn't really line up with those commits.

Anyway, I know it all seems like a mystery, and I apologize if it seems like I'm being vague, but the issue really is that simple.  Hopefully if someone else ever experiences it they can come up with a better explanation why.  Until then, we decided to just deploy our custom classes "the old way" by exploding the war and placing the jars in there - not nearly as convenient, but we haven't experienced any problems doing it this way (same code and config btw, so since the only difference is using the lib directory vs. not, that's most likely the problem).

Thanks for your help


Re: Plugin Performance Issues

by hossman


: the thing though: We had one field defined using this fieldtype and we
: deployed the new schema to solr when we started seeing the issue.  However,
: we had not yet released our code that was using the new field (obviously we
: have to make the change on the solr end before the code, so we
: asynchronously do this offset by a few days).  So the field that was of that
: fieldtype wasn't even being queried against.

but even then: having the Factory declared as part of a fieldtype means
the factory is going to get instantiated -- so without any information
about what the factory does in its init/inform methods, there's really no
way to guess what might be causing the behavior you are seeing.

: The problem for us would be pretty easy to reproduce, but I don't think our
: sys admins would appreciate experimenting with our production solr servers.

I completely understand that, but at this point i can't reproduce what
you're seeing, and i haven't seen anyone else say that they can reproduce
it either -- the simplest explanation being that it's probably not a bug
in Solr, but it might be a bug in your code.

I don't know how else to say this but: if you don't show us some code
that other people can use to try and reproduce, we can't really help you.



-Hoss