Boosting for most recent documents

Hi,

I'm trying to find a way to get the most recent entry for a searched term. For example, say I have documents with a field named "user". If I search for user:vivek, I want to get the matching document that was indexed most recently. I can think of two ways:

1) Sort by a timestamp field. With millions of documents this becomes a huge memory problem; we have seen OOM errors with sorting before.
2) Boost the most recent document. I'm not sure how to do this. Basically, we want the most recent document to score higher than any other, so that we can retrieve just 10 records and sort them in the application by the timestamp field to get the most recent document matching the keyword.

Any suggestions on how this can be done?

Thanks,
-vivek
|
|
Re: Boosting for most recent documents

Sort by the internal Lucene document ID and pick the highest one. That might do the job for you.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: vivek sar <vivextra@...>
> To: solr-user <solr-user@...>
> Sent: Wednesday, July 8, 2009 8:34:16 PM
> Subject: Boosting for most recent documents
|
|
Re: Boosting for most recent documents

Thanks Otis. I have a distributed index using Solr multi-core. Basically, I have 6 indexer instances running on 3 different boxes. A couple of questions:

1) Is it possible to sort on document ID across multiple shards? How is that done?
2) How would I boost the most recent document at index time?

Thanks,
-vivek

On Wed, Jul 8, 2009 at 7:47 PM, Otis Gospodnetic<otis_gospodnetic@...> wrote:
> Sort by the internal Lucene document ID and pick the highest one. That might do the job for you.
|
|
Re: Boosting for most recent documents

Ah, with multiple indices you can't rely on the max Lucene doc ID. I think you have to go with the timestamp approach.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: vivek sar <vivextra@...>
> To: solr-user@...
> Sent: Thursday, July 9, 2009 1:13:54 PM
> Subject: Re: Boosting for most recent documents
>
> 1) Is it possible to sort on document id for multiple-shards? How is that done?
> 2) How would boost by most recent doc at index time?
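A minimal sketch of how the timestamp approach could work across shards, in the spirit of the "retrieve a few records and sort in the application" idea from the first message. It assumes each shard has already returned its own candidate hits as dictionaries with an ISO-8601 UTC `timestamp` string; the function name, the field names, and the sample data are hypothetical and not part of Solr.

```python
import heapq
from typing import Dict, Iterable, List


def newest_across_shards(shard_results: Iterable[List[Dict[str, str]]],
                         limit: int = 10) -> List[Dict[str, str]]:
    """Merge per-shard hits and return the `limit` newest ones.

    Assumes every hit carries an ISO-8601 UTC "timestamp" string, which
    sorts correctly as plain text (hypothetical field name).
    """
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(limit, all_hits, key=lambda hit: hit["timestamp"])


# Hypothetical hits from two shards for the query user:vivek.
shard_a = [
    {"id": "a1", "timestamp": "2009-07-08T20:34:16Z"},
    {"id": "a2", "timestamp": "2009-07-09T13:13:54Z"},
]
shard_b = [
    {"id": "b1", "timestamp": "2009-07-09T17:58:00Z"},
]

newest = newest_across_shards([shard_a, shard_b], limit=1)
print(newest[0]["id"])
```

The merge itself only needs memory proportional to the hits fetched, but it is only correct if each shard's candidate list already contains that shard's newest match, which is the part the rest of this thread is about.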
|
|
Re: Boosting for most recent documents

How do we sort by internal doc ID (say, on one index only) using Solr? I saw a couple of threads saying that it (Sort.INDEXORDER) is not supported in Solr:

http://www.nabble.com/sort-by-index-id-descending--td16124009.html#a16124009
http://www.nabble.com/Reverse-sorting-by-index-order-td1321032.html#a1321032

Has index-order support been added in Solr 1.4? How do we use it? Is there any documentation?

Thanks,
-vivek

On Thu, Jul 9, 2009 at 2:21 PM, Otis Gospodnetic<otis_gospodnetic@...> wrote:
> Ah, with multiple indices you can't rely on the max Lucene doc Id. I think you have to do with the timestamp approach.
|
|
Re: Boosting for most recent documents

With a time stamp you can use a function query to boost the score of newer documents:

http://wiki.apache.org/solr/SolrRelevancyFAQ#head-b1b1cdedcb9cd9bfd9c994709b4d7e540359b1fd

Bill

On Thu, Jul 9, 2009 at 5:58 PM, vivek sar <vivextra@...> wrote:
> How do we sort by internal doc id (say on one index only) using Solr?
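A rough sketch of what such a boosted request might look like, built as a URL only and not sent anywhere. It assumes the standard query parser, where a `_val_:"..."` clause adds a function query's value to the score; the host, port, path, and the `creationDate` field name are assumptions for illustration. The `+` in front of `user:` marks that clause as required, because a function query on its own matches every document.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust to the actual core or shard.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def recency_boosted_query(user: str,
                          date_field: str = "creationDate",
                          rows: int = 10) -> str:
    """Build a select URL whose score is raised for newer documents."""
    # recip(x, m, a, b) computes a / (m * x + b); rord() is the reverse
    # ordinal of the field value, so newer dates get a smaller x.
    boost = f"recip(rord({date_field}),1,1000,1000)"
    params = {
        # Required match on the user field, plus an optional scoring clause.
        "q": f'+user:{user} _val_:"{boost}"',
        "rows": rows,
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


url = recency_boosted_query("vivek")
print(url)
```

Because the boost is added to the relevance score and does not replace it, the newest document is not guaranteed to rank first; that is why the original plan still re-sorts the top few hits by timestamp in the application.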
|
|
Re: Boosting for most recent documents

Thanks Bill. A couple of questions:

1) Would the function query load all unique terms (for that field) into memory the way sorting (field cache) does? If so, that wouldn't work for us: we can have over 5 billion records spread across multiple shards (up to 10 indexer instances), and loading everything into memory would surely kill the process.

2) Would the function query work on a multi-shard query? For example, with recip(rord(creationDate),1,1000,1000), would it automatically apply the function to the combined result from all the shards, or would it run on each individual shard and get results from them?

I would still be interested in knowing whether Solr supports Sort.INDEXORDER and, if so, how.

Thanks,
-vivek

On Thu, Jul 9, 2009 at 8:27 PM, Bill Au<bill.w.au@...> wrote:
> With a time stamp you can use a function query to boost the score of newer
> documents:
> http://wiki.apache.org/solr/SolrRelevancyFAQ#head-b1b1cdedcb9cd9bfd9c994709b4d7e540359b1fd
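To make the function in question 2 concrete, here is a small worked sketch of the arithmetic behind recip(rord(creationDate),1,1000,1000). In Solr, recip(x,m,a,b) evaluates to a / (m*x + b), and rord gives the reverse ordinal of a document's field value, so the newest value has rord 1. The Python helper names below are illustrative only.

```python
def recip(x: float, m: float, a: float, b: float) -> float:
    """Solr-style reciprocal function: a / (m * x + b)."""
    return a / (m * x + b)


def recency_boost(reverse_ordinal: int) -> float:
    """Boost for a document whose date has the given reverse ordinal."""
    return recip(reverse_ordinal, 1, 1000, 1000)


# The newest date (rord = 1) gets a boost just under 1; the boost halves
# by the 1000th newest distinct date and keeps shrinking after that.
for rord in (1, 1000, 1_000_000):
    print(rord, round(recency_boost(rord), 6))
```

Since ordinals are assigned relative to the terms of a single index, it seems likely that the same date would receive a different rord on different shards, so the boosts would not be strictly comparable across shards.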
|
|
Re: Boosting for most recent documents

Hi,

Does anyone know if Solr supports sorting by internal document IDs, i.e., like Sort.INDEXORDER in Lucene? If so, how?

Also, does anyone have any insight into whether a function query loads unique terms into memory (like field sorts do)?

Thanks,
-vivek

On Fri, Jul 10, 2009 at 10:26 AM, vivek sar<vivextra@...> wrote:
> I would still be interested in knowing if Solr supports
> Sort.IndexOrder - if so, how?
|
|
Re: Boosting for most recent documents

: Does anyone know if Solr supports sorting by internal document ids,
: i.e, like Sort.INDEXORDER in Lucene? If so, how?

It does not. In Solr, the decision to make "score desc" the default search order meant there is no way to request simple docId ordering.

: Also, if anyone have any insight on if function query loads up unique
: terms (like field sorts) in memory or not.

It uses the exact same FieldCache as sorting.

-Hoss
|
|
Re: Boosting for most recent documents

Hi,

A related question on "getting the latest records first". After trying a few of the suggested ways (function query, index-time boosting) of getting the latest first, I settled for a simple "sort" parameter:

sort=field+asc

According to the wiki, http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), Lucene caches "4 bytes * the number of documents" plus the unique terms for the sorted field in the FieldCache. This is done so that subsequent sort requests can be served from the cache. So, for example, the memory usage with 1 billion records in one indexer instance would be:

1) 1 billion records
2) Sort on a timestamp field (rounded to the hour) for 1 year: 8760 unique terms (negligible)
3) Total memory requirement for sorting on this single field: around 1G * 4 bytes = 4 GB

So even if I run only one sort query a day, 4 GB would still be required at all times. Is there any way to tell Solr/Lucene to release the memory once the query has run? Basically, I don't want the cache. I've commented out all the cache parameters in solrconfig.xml, but the very first time I run the sort query I still see memory jump by 4 GB and stay there.

Is there any way to keep Lucene/Solr from using so much memory for sorting, so that my application can scale (i.e., so the sorting memory requirement isn't a function of the number of documents)?

Thanks,
-vivek

On Thu, Jul 16, 2009 at 3:10 PM, Chris Hostetter<hossman_lucene@...> wrote:
> : Also, if anyone have any insight on if function query loads up unique
> : terms (like field sorts) in memory or not.
>
> It uses the exact same FieldCache as sorting.
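The estimate in this message can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the "4 bytes * the number of documents" figure quoted from the wiki; it ignores the per-term storage, and the function name is made up for illustration.

```python
def sort_cache_bytes(num_docs: int, bytes_per_doc: int = 4) -> int:
    """Per-document part of the sort cache: one fixed-size entry per doc."""
    return num_docs * bytes_per_doc


num_docs = 1_000_000_000          # 1 billion records in one indexer instance
unique_terms = 24 * 365           # hourly timestamps over one year

estimate = sort_cache_bytes(num_docs)
gib = estimate / 2**30

print(unique_terms)               # number of distinct hourly values
print(estimate, round(gib, 2))    # bytes, and the same figure in GiB
```

The per-document array dominates: the cost grows with the number of documents in the index no matter how few distinct timestamps there are or how few documents a query matches.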
|
|
Re: Boosting for most recent documents

On Mon, Aug 3, 2009 at 2:46 PM, vivek sar<vivextra@...> wrote:
> So, if I run only one sort query once in a day there would still be
> 4GB required at all time. Is there any way to tell Solr/Lucene to
> release the memory once the query has been run? Basically I don't want
> cache. I've commented out all the cache parameters in the
> solrconfig.xml, but I still see the very first time I run the sort
> query the memory jumps by 4 G and remains there.

There is currently no way to tell Lucene not to cache the FieldCache entry it uses for sorting. If you call commit, though, a new searcher will be opened and the memory should be released.

-Yonik
http://www.lucidimagination.com
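As a sketch of the commit suggestion, the snippet below builds (but does not send) an XML commit message for Solr's update handler. The host, port, and path are assumptions for illustration, and whether a commit is acceptable after each daily sort depends on the deployment.

```python
import urllib.request

# Hypothetical update endpoint; adjust to the actual Solr instance.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_commit_request(update_url: str = SOLR_UPDATE_URL) -> urllib.request.Request:
    """Build a POST request carrying Solr's XML <commit/> message."""
    return urllib.request.Request(
        update_url,
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


request = build_commit_request()
print(request.get_method(), request.full_url)

# Sending is left out on purpose; against a live server it would be:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The next sort on the same field would presumably rebuild the cache entry for the new searcher, so this frees the memory between runs without lowering the peak.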