Re: Boolean retrieval

Check out BooleanFilter in contrib/queries. It can be wrapped in a ConstantScoreQuery.

On 4 Jul 2009, at 17:37, Lukas Michelbacher <michells@...> wrote:

> This is about an experiment comparing plain Boolean retrieval with
> vector-space-based retrieval.
>
> I would like to disable all of Lucene's scoring mechanisms and just
> run a true Boolean query that returns exactly the documents that match a
> query specified in Boolean syntax (OR, AND, NOT). No scoring or sorting
> required.
>
> As far as I can see, this is not supported out of the box. Which classes
> would I have to modify?
>
> Would it be enough to create a subclass of Similarity and to ignore all
> terms but one (coord, say) and make this term return 1 if the query
> matches the document and 0 otherwise?
>
> Lukas
>
> --
> Lukas Michelbacher
> Institute for Natural Language Processing
> Universität Stuttgart
> email: michells@...

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@...
For additional commands, e-mail: java-user-help@...

Re: Boolean retrieval

It is also possible to use the HitCollector API and simply ignore the score values.

Regards,
Paul Elschot

On Saturday 04 July 2009 21:14:41 Mark Harwood wrote:
> Check out BooleanFilter in contrib/queries. It can be wrapped in a
> ConstantScoreQuery.

Re: Boolean retrieval

To test my Boolean queries, I have a small test collection where each document contains one of 1024 possible combinations of the strings "aaa", "bbb", ... "jjj".

I tried wrapping a Boolean query like this (it's based on an older post to this list [1]):

    private static TermsFilter getTermsFilter(String field, String text) {
        TermsFilter tf = new TermsFilter();
        tf.addTerm(new Term(field, text));
        return tf;
    }

    Query q = new QueryParser("f1", new StandardAnalyzer()).parse("(aaa AND bbb) OR ccc");
    IndexSearcher searcher = new IndexSearcher(indexDir);
    TopDocCollector collector = new TopDocCollector(1024);

    BooleanQuery bc = (BooleanQuery) q;
    BooleanFilter finalFilter = new BooleanFilter();
    BooleanFilter boolFilt = new BooleanFilter();

    // add each clause of the original query to the filter
    for (BooleanClause clause : bc.getClauses()) {
        boolFilt.add(new FilterClause(getTermsFilter("f1", clause.getQuery().toString()), clause.getOccur()));
        System.out.println(clause.getQuery().toString());
    }

    finalFilter.add(new FilterClause(boolFilt, BooleanClause.Occur.MUST));
    ConstantScoreQuery csq = new ConstantScoreQuery(finalFilter);
    searcher.search(csq, finalFilter, collector);

    ScoreDoc[] hits = collector.topDocs().scoreDocs;
    System.out.println("Found " + collector.getTotalHits() + " hits");

The result is 0 hits (should be 640).

[1] tinyurl.com/ml52ye

2009/7/4 Mark Harwood <markharw00d@...>:
> Check out booleanfilter in contrib/queries. It can be wrapped in a
> constantScoreQuery

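The expected count of 640 can be checked without Lucene. The sketch below is only a library-free stand-in, not Lucene code: it assumes each of the 1024 documents can be modelled as a 10-bit mask (bit 0 for "aaa", bit 1 for "bbb", and so on), and the class and method names are made up for illustration. It evaluates "(aaa AND bbb) OR ccc" against every document and returns the matching ids with no scoring or ranking.

```java
import java.util.ArrayList;
import java.util.List;

// Library-free sanity check; this does not use Lucene.
// Each document is a 10-bit mask: bit 0 = "aaa", bit 1 = "bbb", ..., bit 9 = "jjj".
class ExpectedHitCount {
    static final int TERMS = 10;

    // True if the document (bitmask) contains the term with the given index.
    static boolean has(int doc, int termIndex) {
        return (doc & (1 << termIndex)) != 0;
    }

    // Evaluates "(aaa AND bbb) OR ccc" against one document.
    static boolean matches(int doc) {
        return (has(doc, 0) && has(doc, 1)) || has(doc, 2);
    }

    // Returns the ids of all matching documents; no scores, no ranking.
    static List<Integer> search() {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < (1 << TERMS); doc++) {
            if (matches(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("Found " + search().size() + " hits"); // prints "Found 640 hits"
    }
}
```

The number follows from counting: 512 documents contain "ccc", and a further 128 contain both "aaa" and "bbb" but not "ccc", which gives 640.
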
Re: Boolean retrieval

Seems a long-winded way of producing a BooleanFilter, but I guess you are trying to work with user input in the form of query strings.

The bug in your code is that clause.getQuery().toString() is not producing terms that are in your index: the first call to getTermsFilter passes the string "+f1:aaa +f1:bbb", which is not a term in the index.

Given that the requirement is to ignore scoring, I would recommend (as someone else suggested) looking at the IndexSearcher.search method that takes a HitCollector and simply accumulating all results, regardless of score.

----- Original Message ----
From: Lukas Michelbacher <mmmasterluke@...>
To: java-user@...
Sent: Tuesday, 7 July, 2009 9:53:24
Subject: Re: Boolean retrieval

To test my Boolean queries, I have a small test collection where each document contains one of 1024 possible combinations of the strings "aaa", "bbb", ... "jjj".

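To see why the top-level clauses are not plain terms: "(aaa AND bbb) OR ccc" parses into a nested query whose first clause is itself a Boolean query, so any conversion has to recurse into sub-queries rather than call toString() on them. The following is a minimal, library-free sketch of that recursion. Node, TermNode, BoolNode and Occur are hypothetical stand-ins, not Lucene classes, and the MUST/SHOULD/MUST_NOT rule is a simplified assumption rather than Lucene's exact semantics. Documents are again assumed to be 10-bit masks, one per term combination.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical, library-free model of a nested Boolean query; none of these
// types are Lucene classes.
class NestedBooleanSketch {
    static final int TERMS = 10;          // "aaa" .. "jjj"
    static final int DOCS = 1 << TERMS;   // one document per term combination

    enum Occur { MUST, SHOULD, MUST_NOT }

    interface Node {
        BitSet eval(); // returns a fresh set of matching document ids
    }

    // Leaf: matches documents whose bitmask contains the given term.
    static final class TermNode implements Node {
        final int termIndex;
        TermNode(int termIndex) { this.termIndex = termIndex; }

        public BitSet eval() {
            BitSet bits = new BitSet(DOCS);
            for (int doc = 0; doc < DOCS; doc++) {
                if ((doc & (1 << termIndex)) != 0) bits.set(doc);
            }
            return bits;
        }
    }

    // Inner node: combines child results; children may be BoolNodes themselves.
    static final class BoolNode implements Node {
        final List<Occur> occurs = new ArrayList<>();
        final List<Node> children = new ArrayList<>();

        BoolNode add(Occur occur, Node child) {
            occurs.add(occur);
            children.add(child);
            return this;
        }

        public BitSet eval() {
            BitSet must = null;
            BitSet should = new BitSet(DOCS);
            BitSet mustNot = new BitSet(DOCS);
            for (int i = 0; i < children.size(); i++) {
                BitSet child = children.get(i).eval(); // recurse into sub-queries
                switch (occurs.get(i)) {
                    case MUST:
                        if (must == null) must = child; else must.and(child);
                        break;
                    case SHOULD:
                        should.or(child);
                        break;
                    case MUST_NOT:
                        mustNot.or(child);
                        break;
                }
            }
            // Simplified rule: MUST clauses decide if present, otherwise any SHOULD.
            BitSet result = (must != null) ? must : should;
            result.andNot(mustNot);
            return result;
        }
    }

    // Builds "(aaa AND bbb) OR ccc" with the same nesting a parser would produce.
    static Node example() {
        Node aaaAndBbb = new BoolNode()
                .add(Occur.MUST, new TermNode(0))
                .add(Occur.MUST, new TermNode(1));
        return new BoolNode()
                .add(Occur.SHOULD, aaaAndBbb)
                .add(Occur.SHOULD, new TermNode(2));
    }

    public static void main(String[] args) {
        System.out.println("Found " + example().eval().cardinality() + " hits");
    }
}
```

Looping over only the two top-level SHOULD clauses, as the posted code does, would treat the whole "aaa AND bbb" sub-query as if it were a single term.
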
Re: Boolean retrieval

On Tue, Jul 7, 2009 at 5:39 AM, mark harwood <markharw00d@...> wrote:
> Given the requirement is to ignore scoring I would recommend (as someone
> else suggested) looking at the IndexSearcher.search method that takes a
> HitCollector and simply accumulate all results, regardless of score.

Make that "Collector" (new as of 2.9).

HitCollector is the old way (deprecated as of 2.9), which always pre-computed the score of each hit and passed the score to the collect method. Collector, by contrast, makes scoring optional: your collect method must actually request the score. For example, sorting by field makes use of this in 2.9 by making scoring optional.

Mike

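The difference between pushing a pre-computed score into every collect call and letting the collector ask for it can be illustrated without Lucene. The sketch below is an analogy only: DocCollector, MatchOnlyCollector and the search loop are hypothetical names invented here and do not reproduce Lucene's Collector API. It shows a collector that records matches and never triggers a score computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

// Library-free analogy for optional scoring; these are NOT Lucene's classes.
class OptionalScoringSketch {
    // Hypothetical collector: receives doc ids only and may ask for scores.
    interface DocCollector {
        void setScorer(IntToDoubleFunction scorer); // calling the scorer is optional
        void collect(int doc);
    }

    // Pure Boolean retrieval: records every match and never requests a score.
    static final class MatchOnlyCollector implements DocCollector {
        final List<Integer> docs = new ArrayList<>();
        public void setScorer(IntToDoubleFunction scorer) { /* scores not needed */ }
        public void collect(int doc) { docs.add(doc); }
    }

    static int scoreCalls = 0; // counts how often a score was actually computed

    // Stand-in for a search loop that feeds matching doc ids to the collector.
    static void search(int[] matchingDocs, DocCollector collector) {
        IntToDoubleFunction scorer = doc -> {
            scoreCalls++;
            return 1.0 / (1 + doc); // placeholder score, purely illustrative
        };
        collector.setScorer(scorer);
        for (int doc : matchingDocs) {
            collector.collect(doc);
        }
    }

    public static void main(String[] args) {
        MatchOnlyCollector collector = new MatchOnlyCollector();
        search(new int[] {3, 5, 8}, collector);
        System.out.println(collector.docs + " collected, scorer called " + scoreCalls + " times");
    }
}
```

A collector that did need scores would keep the scorer it was handed and call it inside collect, paying the cost only for the hits it cares about.
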
Re: Boolean retrieval

> Seems a long-winded way of producing a BooleanFilter but I guess you are
> trying to work with user input in the form of query strings.

Yes, I am. I had the same impression but I couldn't figure out a more straightforward way.

> The bug in your code is that clause.getQuery().toString() is not producing
> terms that are in your index - the first call to getTermsFilter passes the
> string "+f1:aaa +f1:bbb" which is not a term in the index.

OK, thanks. I'll have a look at that.

> Given the requirement is to ignore scoring I would recommend (as someone
> else suggested) looking at the IndexSearcher.search method that takes a
> HitCollector and simply accumulate all results, regardless of score.

I think that's what I'm going to end up doing. I was just curious to know what was wrong with my initial approach.

Thanks for your comments.

Lukas

Re: Boolean retrieval

> Make that "Collector" (new as of 2.9).
>
> HitCollector is the old (deprecated as of 2.9) way, which always
> pre-computed the score of each hit and passed the score to the collect
> method.

Where can I find docs for 2.9? Do I just have to check out the Lucene trunk and run javadoc there?

Re: Boolean retrieval

tsuraan wrote:
>> Make that "Collector" (new as of 2.9).
>>
>> HitCollector is the old (deprecated as of 2.9) way, which always
>> pre-computed the score of each hit and passed the score to the collect
>> method.
>
> Where can I find docs for 2.9? Do I just have to check out the lucene
> trunk and run javadoc there?

http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/all/index.html

Koji

Re: Boolean retrieval

> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/all/index.html
>
> Koji

Thanks!