|
View:
New views
3 Messages
—
Rating Filter:
Alert me
|
|
|
is it possible to make lucene searches match based on per doc field:termcount?Hi All, I hope someone can offer some advice.
I want to extend lucene to search in a particular way(if it cant already): I want to index docs, each with file containing several terms something like: doc1=>myfield:a doc2=>myfield:a,b doc3=>myfield:a,b,c doc4=>myfield:a,b,c,d so far nothing new. I want to query for matching docs such that a query something like myfield:(a or b) should only return docs if the doc itself is FULLY matched. ie, for the query myfield:(a or b) , only doc1 and doc2 should match. So the rules are its only a match if the termcount for each doc is <=the termcount of the query(for that field) AND ALL the terms in the doc were matched a few more examples just to clarify: myfield:(a or b or d) would match doc4 myfield:(a or b or c or d) would match ALL the docs here (this one works anyway but only because it uses all the terms that exist) myfield:(a) would match doc1 order is not important (but might be a nice have) can anyone tell me if its possible to make lucene do this, and perhaps offer a starting point? thanks Jason. --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@... For additional commands, e-mail: java-user-help@... |
|
|
Re: is it possible to make lucene searches match based on per doc field:termcount?On Nov 5, 2009, at 4:31 PM, Jason Eacott wrote: > Hi All, I hope someone can offer some advice. > I want to extend lucene to search in a particular way(if it cant > already): > > I want to index docs, each with file containing several terms > something like: > doc1=>myfield:a > doc2=>myfield:a,b > doc3=>myfield:a,b,c > doc4=>myfield:a,b,c,d > so far nothing new. > > I want to query for matching docs such that a query something like > myfield:(a or b) should only return docs if the doc itself is FULLY > matched. > ie, for the query myfield:(a or b) , only doc1 and doc2 should match. > So the rules are its only a match if the termcount for each doc is > <=the termcount of the query(for that field) AND ALL the terms in > the doc were matched > > a few more examples just to clarify: > myfield:(a or b or d) would match doc4 > myfield:(a or b or c or d) would match ALL the docs here (this one > works anyway but only because it uses all the terms that exist) > myfield:(a) would match doc1 > > order is not important (but might be a nice have) > > can anyone tell me if its possible to make lucene do this, and > perhaps offer a starting point? Would overriding http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/DefaultSimilarity.html#coord%28int,%20int%29 help? -------------------------- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@... For additional commands, e-mail: java-user-help@... |
|
|
Re: is it possible to make lucene searches match based on per doc field:termcount?looks like it, thanks!
but if I index multiple copies of the same value, ie: myfield:a,a,b and search with myfield:(a or b) or perhaps myfield:(a or a or b) will I be able to tell the difference? is this apparently duplicate data kept as part of the query? (I'd like to be able to do this too) Cheers Jason. Grant Ingersoll wrote: > > On Nov 5, 2009, at 4:31 PM, Jason Eacott wrote: > >> Hi All, I hope someone can offer some advice. >> I want to extend lucene to search in a particular way(if it cant >> already): >> >> I want to index docs, each with file containing several terms >> something like: >> doc1=>myfield:a >> doc2=>myfield:a,b >> doc3=>myfield:a,b,c >> doc4=>myfield:a,b,c,d >> so far nothing new. >> >> I want to query for matching docs such that a query something like >> myfield:(a or b) should only return docs if the doc itself is FULLY >> matched. >> ie, for the query myfield:(a or b) , only doc1 and doc2 should match. >> So the rules are its only a match if the termcount for each doc is >> <=the termcount of the query(for that field) AND ALL the terms in the >> doc were matched >> >> a few more examples just to clarify: >> myfield:(a or b or d) would match doc4 >> myfield:(a or b or c or d) would match ALL the docs here (this one >> works anyway but only because it uses all the terms that exist) >> myfield:(a) would match doc1 >> >> order is not important (but might be a nice have) >> >> can anyone tell me if its possible to make lucene do this, and perhaps >> offer a starting point? > > Would overriding > http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/DefaultSimilarity.html#coord%28int,%20int%29 help? > > > > -------------------------- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using > Solr/Lucene: > http://www.lucidimagination.com/search > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscribe@... > For additional commands, e-mail: java-user-help@... > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@... For additional commands, e-mail: java-user-help@... |
| Free embeddable forum powered by Nabble | Forum Help |