MoreLikeThis SearchHandler, offer string-distance to avoid duplicate return-docs

by aldana

hi,

i am using the MoreLikeThis handler to query similar ads. the index contains these docs:
-------
X, which is the base for the more-like-this query

A
B1
B2 <- identical to B1
B1' <- not identical, but very similar to B1
X' <- not identical, but very similar to X
------

in the query result i expect {A,B1} to be returned. the identical or very similar {B2,B1',X'} should be discarded.

looking at http://wiki.apache.org/solr/MoreLikeThis i cannot see any option to achieve this. or is there maybe a trick when configuring mlt.qf?

what i would expect in the configuration is:
- the possibility to pass a distance function for certain fields
- an upper threshold for that distance function, so that too-similar docs are excluded (a kind of 'negative' boost)
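
To make the two wishes above concrete, here is a minimal sketch of the same idea done on the client side, after the MoreLikeThis results have come back. It is not a MoreLikeThis option. The function names, the `text` field, the `max_similarity` parameter and the sample ads are all made up for illustration. Python's `difflib.SequenceMatcher` stands in for the string measure; it returns a similarity ratio between 0 and 1 rather than a true distance, so the threshold is an upper bound on similarity.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def filter_near_duplicates(base, hits, field, max_similarity=0.9):
    """Keep hits, in ranked order, that are not too similar to the base doc
    or to any hit already kept (hypothetical helper, not part of Solr)."""
    kept = []
    for hit in hits:
        references = [base] + kept
        if all(similarity(hit[field], ref[field]) <= max_similarity
               for ref in references):
            kept.append(hit)
    return kept


# Hypothetical ads mirroring the X / A / B1 / B2 / B1' / X' example.
base = {"id": "X", "text": "red mountain bike, 21 gears, good condition"}
hits = [
    {"id": "A", "text": "blue city bike with basket, nearly new"},
    {"id": "B1", "text": "kids bike, 16 inch, with training wheels"},
    {"id": "B2", "text": "kids bike, 16 inch, with training wheels"},
    {"id": "B1'", "text": "kids bike, 16 inch, with training wheel"},
    {"id": "X'", "text": "red mountain bike, 21 gears, good condition!"},
]

kept = filter_near_duplicates(base, hits, field="text")
print([h["id"] for h in kept])  # ['A', 'B1']
```

The filter is greedy: the first doc of each near-duplicate cluster wins, so it assumes the hits arrive in the ranking order you want to preserve. Each hit is compared against every kept hit, which is fine for a page of results but not for a whole index.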




manuel aldana
aldana((at))gmx.de
software-engineering blog: http://www.aldana-online.de

Re: MoreLikeThis SearchHandler, offer string-distance to avoid duplicate return-docs

by Otis Gospodnetic

Hi,

I'd start by looking at SOLR-236 and finding the place where hit-hit similarity could be plugged in, in place of the current check for hit-hit pairs with identical fields.
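
The difference between the two checks can be sketched outside Solr. The snippet below is plain Python and not code from the SOLR-236 patch; `collapse`, `identical_field`, `similar_field` and the `min_ratio` parameter are hypothetical names. It only illustrates the plug-in point: the collapsing loop stays the same, and the "same group" test is swapped from field equality to a similarity predicate.

```python
from difflib import SequenceMatcher


def identical_field(field):
    """Group test based on exact equality of one field."""
    return lambda a, b: a[field] == b[field]


def similar_field(field, min_ratio):
    """Group test based on string similarity of one field (assumed measure)."""
    return lambda a, b: SequenceMatcher(None, a[field], b[field]).ratio() >= min_ratio


def collapse(hits, same_group):
    """Keep the first hit of every group; same_group(a, b) is the plug-in point."""
    representatives = []
    for hit in hits:
        if not any(same_group(hit, rep) for rep in representatives):
            representatives.append(hit)
    return representatives


# Hypothetical hits: B2 is identical to B1, B1' differs by one character.
hits = [
    {"id": "B1", "text": "kids bike, 16 inch, with training wheels"},
    {"id": "B2", "text": "kids bike, 16 inch, with training wheels"},
    {"id": "B1'", "text": "kids bike, 16 inch, with training wheel"},
]

by_identity = collapse(hits, identical_field("text"))
by_similarity = collapse(hits, similar_field("text", min_ratio=0.9))

print([h["id"] for h in by_identity])    # ['B1', "B1'"]
print([h["id"] for h in by_similarity])  # ['B1']
```

With the equality test the near-duplicate B1' survives; with the similarity test it is folded into B1's group.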

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



----- Original Message ----

> From: aldana <aldana@...>
> To: solr-user@...
> Sent: Fri, November 6, 2009 12:24:24 PM
> Subject: MoreLikeThis SearchHandler, offer string-distance to avoid duplicate return-docs