>
> 2. The profiler showed that a lot of time is spent on GC because Solr
> creates new SolrDocument instances for every document it retrieves
> for every query. We solved this by patching BinaryResponseWriter (and
> later InplaceResponseBuilder, created by the previous patch) so that
> it uses a custom SolrCache to cache SolrDocument instances.
>
Would this work in a non-embedded mode?
> 3. We noticed that a lot of CPU cycles are spent on copying values
> from one Map to another (from Lucene Document to SolrDocument
> instances) when creating new SolrDocument instances. So we created
> the class SolrDocumentWrapper, which doesn't use its own Map instance
> but works as a wrapper around the given one, avoiding unnecessary
> memory usage and data copying.
>
> These changes improved our performance considerably. We got rid of
> the load on GC and the IO load created by reading the index.
>
> What do you think, guys? Does it make sense to include all this
> stuff in Solr?
Sounds good -- In the EmbeddedSolr design, I think we were mostly
thinking of the 'standard' use case, where only the first 20-100
results are converted to SolrDocument. Any improvement that makes this
work better is welcome!
Do you want to create an issue for 2 & 3? If the changes you have
made generally improve EmbeddedSolrServer and do not hurt anything
else, it would be great to get this into core...
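
For anyone following along, the wrapper idea in point 3 could be sketched roughly as below. This is only an illustration under my own assumptions, not the actual patch: `DocumentView` and its methods are hypothetical names, and it uses a plain `java.util.Map` instead of the real Lucene/Solr types. The point is that the view holds a reference to the map it is given, so nothing is copied.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a document that wraps an existing field map
 * instead of copying it. Not the real SolrDocumentWrapper.
 */
public class DocumentView {
    // Reference to the caller's map; no entries are copied.
    private final Map<String, Object> backing;

    public DocumentView(Map<String, Object> backing) {
        this.backing = backing;
    }

    /** Looks the value up directly in the wrapped map. */
    public Object getFieldValue(String name) {
        return backing.get(name);
    }

    /** Returns the wrapped map's key view, again without copying. */
    public Collection<String> getFieldNames() {
        return backing.keySet();
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("id", "doc-1");
        stored.put("title", "hello");

        DocumentView view = new DocumentView(stored);
        System.out.println(view.getFieldValue("title"));

        // Because the map is shared, later changes are visible through the view.
        stored.put("title", "changed");
        System.out.println(view.getFieldValue("title"));
    }
}
```

The trade-off with this approach is that the wrapper and the original map share state, so whoever owns the map must not reuse or mutate it while the document is still being written out.
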
thanks
ryan