Re: result grouping?

by Luis Neves-3

Yonik Seeley wrote:

> On 1/3/07, Ryan McKinley <ryantxu@...> wrote:
>> thanks.  Yes, the presentation layer could group results, but that is
>> not practical if i want to show the first 20 results out of 200,000
>> matches.
>>
>> Nutch groups the results by site.  Any idea how they do it?
>
> Good question.
> Off the top of my head, one could use a priority queue that can change
> its size dynamically.  One could increment a group count for each hit
> (like faceted search with the FieldCache) and if the group count
> exceeds "n", then you increment the size of the priority queue to
> allow an additional item to be collected to compensate.
>
> -Yonik
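
The idea quoted above can be sketched outside of Lucene. This is only a rough illustration in Python, not Solr or Lucene code: the function name `collapsed_top_k` and the `(score, doc_id, group)` hit tuples are made up for the example, and the final pass that drops the surplus hits per group is my assumption about how the enlarged queue would be used.

```python
import heapq
from collections import defaultdict

def collapsed_top_k(hits, k, n):
    """Return up to k hits, best first, keeping at most n hits per group.

    hits: iterable of (score, doc_id, group) tuples in any order
          (hypothetical shape, chosen for this sketch only).
    """
    capacity = k                      # queue starts at the page size
    group_counts = defaultdict(int)   # hits seen so far per group
    heap = []                         # min-heap: weakest collected hit on top

    for score, doc_id, group in hits:
        group_counts[group] += 1
        if group_counts[group] > n:
            # This group is over its quota, so grow the queue by one
            # slot to compensate for a hit that may be dropped later.
            capacity += 1
        heapq.heappush(heap, (score, doc_id, group))
        if len(heap) > capacity:
            heapq.heappop(heap)       # evict the weakest hit

    # Walk the collected hits from best to worst, skipping any hit
    # whose group already has n entries in the result.
    kept = defaultdict(int)
    results = []
    for score, doc_id, group in sorted(heap, reverse=True):
        if kept[group] < n:
            kept[group] += 1
            results.append((doc_id, group, score))
            if len(results) == k:
                break
    return results
```

For example, with `k=3` and `n=1`, three hits from "a.com" followed by weaker hits from "b.com" and "c.com" yield one hit per site. The cost is memory: if nearly every match belongs to the same group, the queue grows toward the full match count.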

You might as well say that I have to change the dilithium crystals in the flux
capacitor :-)

One of the reasons I like Solr so much is because I get impressive results
without having to know Lucene, which is something that will have to change
because I also need this feature.

Not knowing much about the internals of Solr/Lucene, I had a look at the Facet
code in search of ideas, but from what I could see the facet counts are
calculated after the Documents are added to the response. It seems to me that
any kind of grouping has to be done before that... right?

Could you explain in more detail where I should look?

Can the TopFieldDocCollector/TopFieldDocs classes be used to this end?

I'm immersing myself in Lucene, but it will take some time.

Side note: Over here, besides Solr, we also use the "FAST" search platform, and
they call this feature "Field collapsing":
<http://www.fastsearch.com/glossary.aspx?m=48&amid=299>
I like the syntax they use:
"&collapseon=<fieldname>&collapsenum=N" -> Collapse, but keep N number of
collapsed documents
For some reason they can only collapse on numeric fields (int32).
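
To make the intended semantics of those two parameters concrete, here is a small Python sketch. It is not FAST's implementation: it assumes the results are already ranked, reads "collapsenum=N" as "keep at most N documents per distinct value of the collapse field", and uses a made-up function name (`collapse_ranked`) and plain dicts as documents.

```python
from urllib.parse import parse_qs

def collapse_ranked(docs, query_string):
    """Filter an already-ranked list of docs (dicts) according to
    collapseon/collapsenum parameters in query_string.

    Assumed semantics (not taken from FAST documentation): keep at
    most collapsenum documents per distinct value of the collapseon
    field, preserving the original ranking order.
    """
    params = parse_qs(query_string.lstrip("&"))
    field = params["collapseon"][0]
    limit = int(params["collapsenum"][0])

    seen = {}        # field value -> number of docs kept so far
    collapsed = []
    for doc in docs:
        key = doc[field]
        if seen.get(key, 0) < limit:
            seen[key] = seen.get(key, 0) + 1
            collapsed.append(doc)
    return collapsed
```

Called with "&collapseon=site&collapsenum=1", this keeps only the best-ranked document for each value of the hypothetical "site" field.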

Regards,
Luis Neves