Multifield query parser and phrase query behaviour from 1.3 to 1.4

Hi All,

I'm using a multifield query parser to generate weighted queries across different fields. For instance, perl developer gives me:

  +(title:perl^10.0 keywords:perl company:perl^3.0) +(title:developer^10.0 keywords:developer company:developer^3.0)

In both solr 1.3 and solr 1.4 (from 12 Oct 2009), a query like "d'affaire" gives me:

  title:"d affaire"^10.0 keywords:"d affaire" company:"d affaire"^3.0

NB: "d" is not a stopword.

That's the first thing I don't get: since "d'affaire" is parsed as two separate tokens, 'd' and 'affaire', why do these phrase queries appear?

When I use the analysis interface of solr, "d'affaire" gives (for query or indexing, since the analyzer is the same):

  term position      1      2
  term text          d      affaire
  term type          word   word
  source start,end   0,1    2,9

You can't see it in this email, but 'd' and 'affaire' are both purple, indicating a match with the query tokens.

I don't really get why these two tokens are subsequently put together in a phrase query.

In solr 1.3, it didn't seem to be a problem though. title:"d affaire" matches documents where the title contains "d'affaire" and all is fine. That's the behaviour we should expect, since the title field uses exactly the same analyzer at index and query time.

Since I started using solr 1.4, title:"d affaire" does not give any results back.

Is there any behaviour change that could be responsible for this, and what's the correct way to fix it?

Thanks for your help.

Jerome.

--
Jerome Eteve.
http://www.eteve.net
jerome@...
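
For readers following along, the weighted expansion described in this first message can be sketched roughly as below. This is only an illustration in Python, not the actual Solr or Lucene multi-field parser: the function name build_multifield_query and the BOOSTS mapping are hypothetical, and the field names and boosts are simply the ones quoted in the message.

```python
# Toy sketch of a multi-field weighted expansion (NOT the real Solr parser).
# build_multifield_query and BOOSTS are hypothetical names for illustration.

def build_multifield_query(user_query, boosts):
    """Turn each whitespace-separated word into a required clause over all fields."""
    clauses = []
    for word in user_query.split():
        parts = []
        for field, boost in boosts.items():
            # A boost of 1.0 is the default, so it is not printed.
            suffix = "" if boost == 1.0 else f"^{boost}"
            parts.append(f"{field}:{word}{suffix}")
        clauses.append("+(" + " ".join(parts) + ")")
    return " ".join(clauses)


# Field weights taken from the example in the message above.
BOOSTS = {"title": 10.0, "keywords": 1.0, "company": 3.0}

print(build_multifield_query("perl developer", BOOSTS))
```

The sketch only covers words that the analyzer leaves as a single token; what happens when one word yields several tokens is the subject of the rest of the thread.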
Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4

On Tue, Oct 27, 2009 at 8:44 AM, Jérôme Etévé <jerome.eteve@...> wrote:
> I don't really get why these two tokens are subsequently put together
> in a phrase query.

That's the way the Lucene query parser has always worked... phrase queries are made if multiple tokens are produced from one field query.

> In solr 1.3, it didn't seem to be a problem though. title:"d affaire"
> matches document where title contains "d'affaire" and all is fine.

This should not have changed between 1.3 and 1.4...
What's the fieldType and its definition for your title field?

-Yonik
http://www.lucidimagination.com
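
The rule stated in this reply (one token gives a term query, several tokens from one field query give a phrase query) can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the Lucene query parser: toy_analyze is a hypothetical stand-in for the field's analyzer that lowercases and splits on non-word characters, and field_query is a hypothetical helper.

```python
import re

# Toy illustration of the rule above; NOT the Lucene query parser.

def toy_analyze(text):
    # Hypothetical analyzer: lowercase, then split on runs of non-word characters.
    return [t for t in re.split(r"\W+", text.lower()) if t]


def field_query(field, text):
    """Build a term query for one token, a phrase query for several."""
    tokens = toy_analyze(text)
    if not tokens:
        return ""
    if len(tokens) == 1:
        return f"{field}:{tokens[0]}"
    phrase = " ".join(tokens)
    return f'{field}:"{phrase}"'


print(field_query("title", "perl"))       # single token: term query
print(field_query("title", "d'affaire"))  # two tokens: phrase query
```

With this toy analyzer, "d'affaire" yields the tokens d and affaire, so the helper produces title:"d affaire", the same shape as the query reported in the first message.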
Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4

Hum,

That's probably because of our own customized types/tokenizers/filters.

I tried reindexing and querying our data using the default solr type 'textgen' and it works fine.

I need to investigate which features of the new lucene 2.9 API are not implemented in our own tokenizers etc...

Thanks.

Jerome.

2009/10/27 Yonik Seeley <yonik@...>:
> This should not have changed between 1.3 and 1.4...
> What's the fieldType and its definition for your title field?

--
Jerome Eteve.
http://www.eteve.net
jerome@...
Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4

Actually here is the difference between the textgen analysis pipeline and ours.

For the phrase "ingenieur d'affaire senior", our pipeline gives right after our tokenizer:

  term position      1          2      3        4
  term text          ingenieur  d      affaire  senior

'd' and 'affaire' are separated as different tokens straight away. Our filters have no later effect for this phrase.

* The textgen pipeline uses a whitespace tokenizer, so it gives first:

  term position      1          2          3
  term text          ingenieur  d'affaire  senior
  term type          word       word       word
  source start,end   0,9        10,19      20,26

* Then a word delimiter filter splits the token "d'affaire" (and generates the concatenation):

  term position      1          2      3        4
  term text          ingenieur  d      affaire  senior
  term type          word       word   word     word
  source start,end   0,9        10,11  12,19    20,26

  plus the concatenated token: daffaire (term type word, source start,end 10,19)

Could you see a reason why title:"d affaire" works with textgen but not with our type?

Thanks!

Jerome.

--
Jerome Eteve.
http://www.eteve.net
jerome@...
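
The two textgen steps compared above (whitespace tokenization, then splitting on intra-word delimiters with a concatenated token) can be mimicked with a short Python sketch. It is a toy under stated assumptions and not Solr's word delimiter filter: whitespace_tokens and split_on_delimiters are hypothetical helpers, only character offsets are tracked, and term positions are deliberately left out.

```python
import re

# Toy model of the two textgen steps; NOT Solr's actual filters.

def whitespace_tokens(text):
    # One (term, start, end) triple per run of non-whitespace characters.
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def split_on_delimiters(tokens):
    """Split each token on non-alphanumeric characters and add the concatenation."""
    out = []
    for term, start, end in tokens:
        parts = [(m.group(), start + m.start(), start + m.end())
                 for m in re.finditer(r"[^\W_]+", term)]
        out.extend(parts)
        if len(parts) > 1:
            # The concatenated token spans the offsets of the original token.
            out.append(("".join(p[0] for p in parts), start, end))
    return out


step1 = whitespace_tokens("ingenieur d'affaire senior")
step2 = split_on_delimiters(step1)
for token in step2:
    print(token)
```

The offsets produced for d, affaire and daffaire agree with the analysis output quoted in the message, which suggests that both pipelines end up with the same d and affaire tokens for this phrase.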
Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4

Mea maxima culpa,

I had foolishly set the option omitTermFreqAndPositions="true" in an attempt to save space, which drops the term positions that phrase queries need. It works when this is set to 'false'.

However, even when it's set to 'true', the highlighting of a field continues to work even if the search doesn't. Does the highlighter use a different strategy to match the query terms in the fields?

Cheers!

Jerome.
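
Why omitting positions breaks phrase queries but not single-term queries can be seen in a toy inverted index. The Python sketch below is an assumption-laden illustration, not Lucene's index format: build_index, term_query and phrase_query are hypothetical helpers, and the choice to return no hits when positions are missing is made here only to mirror the empty result reported in this thread. In Solr, positions are omitted when omitTermFreqAndPositions="true" is set on the field.

```python
from collections import defaultdict

# Toy inverted index; NOT Lucene's data structures.

def build_index(docs, omit_positions):
    index = defaultdict(dict)  # term -> {doc_id: list of positions, or None}
    for doc_id, tokens in docs.items():
        for pos, term in enumerate(tokens):
            if omit_positions:
                index[term][doc_id] = None  # only "term occurs in doc" is kept
            else:
                index[term].setdefault(doc_id, []).append(pos)
    return index


def term_query(index, term):
    return set(index.get(term, {}))


def phrase_query(index, terms):
    candidates = set.intersection(*(term_query(index, t) for t in terms))
    hits = set()
    for doc_id in candidates:
        first = index[terms[0]][doc_id]
        if first is None:
            continue  # no positions stored: adjacency cannot be checked
        for start in first:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits


DOCS = {1: ["ingenieur", "d", "affaire", "senior"], 2: ["affaire", "d", "etat"]}
with_pos = build_index(DOCS, omit_positions=False)
without_pos = build_index(DOCS, omit_positions=True)

print(phrase_query(with_pos, ["d", "affaire"]))
print(phrase_query(without_pos, ["d", "affaire"]))
print(term_query(without_pos, "affaire"))
```

In this sketch the single-term lookup still finds both documents without positions, while the phrase lookup has nothing to verify adjacency against.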
Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4

: However, even when it's set to 'true', the highlighting of a field
: continues to work even if the search doesn't.
: Does the highlighter use a different strategy to match the query terms
: in the fields?

If it has term vectors, it uses them; otherwise it re-analyzes the stored fields.

-Hoss
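
The second path mentioned in this answer, re-analyzing the stored field text, can be sketched in Python as follows. This is a toy and not Solr's highlighter: highlight is a hypothetical function, the tokenization is a simple regular expression, the <em> markers are just defaults of this sketch, and the term-vector path is not modelled. The point is only that this kind of matching works from the stored text and never consults the positions in the index.

```python
import re

# Toy highlighter that re-tokenizes the stored text; NOT Solr's highlighter.

def highlight(stored_text, query_terms, pre="<em>", post="</em>"):
    wanted = {t.lower() for t in query_terms}
    out, last = [], 0
    for m in re.finditer(r"[^\W_]+", stored_text):
        if m.group().lower() in wanted:
            out.append(stored_text[last:m.start()])  # text before the match
            out.append(pre + m.group() + post)       # the wrapped match
            last = m.end()
    out.append(stored_text[last:])                   # remaining tail
    return "".join(out)


print(highlight("Ingenieur d'affaire senior", ["d", "affaire"]))
```

Because the stored text is tokenized again at highlight time, the matches are found whether or not the index kept positions, which is consistent with highlighting continuing to work while the phrase search returned nothing.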