[jira] Issue Comment Edited: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734015#action_12734015 ]

Luis Alves edited comment on LUCENE-1486 at 7/22/09 7:57 AM:
-------------------------------------------------------------

I share the same opinion as Michael. The implementation has a lot of undefined and undocumented behaviors, simply because it reuses the queryparser to parse the text inside a phrase. All of the Lucene syntax needs to be accounted for in this design, but that does not seem to be the case. Adriano described some of the problems, such as a phrase inside a phrase and position reporting for errors.

I also have a lot of concerns about allowing the full Lucene syntax inside phrases. Trying to restrict it by throwing exceptions for particular cases does not seem like the best design.

Here is an example with OR, AND and parentheses, combined with a proximity search:

"(( jakarta OR green) AND (blue AND orange) AND black~0.5) apache"~10

What should a user expect from this query without looking at the code? I'm not sure. Does it even make sense to support such complex syntax? In my opinion, no.

I think we should define the subset of the language we want to support inside phrases, with well-defined behavior. If Mark describes all the syntax he wants to support inside phrases, I don't mind implementing a new parser for it.

My view is that contrib is probably a better place for this code until we find an implementation that meets two goals. It should not impose as many restrictions on changes to the original queryparser, and it should define a clear syntax for use inside phrases.

> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
>                 Key: LUCENE-1486
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1486
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4
>            Reporter: Mark Harwood
>            Assignee: Mark Harwood
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: ComplexPhraseQueryParser.java, junit_complex_phrase_qp_07_21_2009.patch, junit_complex_phrase_qp_07_22_2009.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in QueryParser itself. This works as a proof of concept for much of the query parser syntax. Examples from the Junit test include:
> checkMatches("\"j* smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
> checkMatches("\"(jo* -john) smith\"", "2"); // boolean logic works
> checkMatches("\"jo* smith\"~2", "1,2,3"); // position logic works.
>
> checkBadQuery("\"jo* id:1 smith\""); //mixing fields in a phrase is bad
> checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad
> checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported
> Code plus Junit test to follow...

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...
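The issue's first example, "j* smyth~", treats each phrase element as a pattern rather than as a single term. The sketch below is a minimal illustration of that idea. It assumes each element is first expanded against the index vocabulary into the concrete terms it accepts. The class name PhraseElementExpansion, the sample vocabulary and the fuzzy rule (Levenshtein distance of at most 1) are illustrative assumptions, not code from the attached patch. Lucene's FuzzyQuery is configured with a minimum-similarity threshold instead of a fixed edit distance.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: expand one phrase element into the concrete terms it accepts. */
final class PhraseElementExpansion {

    /** Expands "prefix*" wildcards, "term~" fuzzy terms, or plain terms against a vocabulary. */
    static List<String> expand(String element, List<String> vocabulary) {
        List<String> accepted = new ArrayList<>();
        for (String term : vocabulary) {
            boolean ok;
            if (element.endsWith("*")) {
                // Trailing wildcard: accept any term that shares the prefix.
                ok = term.startsWith(element.substring(0, element.length() - 1));
            } else if (element.endsWith("~")) {
                // Fuzzy: assumed rule of edit distance <= 1 (not Lucene's similarity formula).
                ok = editDistance(term, element.substring(0, element.length() - 1)) <= 1;
            } else {
                ok = term.equals(element);
            }
            if (ok) {
                accepted.add(term);
            }
        }
        return accepted;
    }

    /** Classic dynamic-programming Levenshtein distance. */
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        List<String> vocabulary = List.of("john", "jonathon", "jackson", "smith", "smyth", "percival");
        System.out.println(expand("j*", vocabulary));     // [john, jonathon, jackson]
        System.out.println(expand("smyth~", vocabulary)); // [smith, smyth]
    }
}
```

After expansion, the phrase only has to check that one accepted term per element appears in the right positions.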
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734148#action_12734148 ]

Mark Harwood commented on LUCENE-1486:
--------------------------------------

I'll try to catch up with some of the issues raised here:

> What do you mean on the last check by phrase inside phrase, I don't see any phrase inside a phrase

Correct, the "inner phrase" example was a term not a phrase. This is perhaps a better example:

checkBadQuery("\"jo* \"percival smith\" \""); //phrases inside phrases is bad

> I'm trying now to figure out what is supported

The Junit is currently the main form of documentation. Unlike the XMLQueryParser, which has a DTD, there is no formal syntax definition that captures the logic. Here is a basic summary of the supported syntax and how it differs from the normal, non-phrase use of the same operators:

* Wildcard/fuzzy/range clauses can be used to define a phrase element (as opposed to simply single terms)
* Brackets are used to group/define the acceptable variations for a given phrase element e.g. "(john OR jonathon) smith"
* "AND" is irrelevant - there is effectively an implied "AND_NEXT_TO" binding all phrase elements

To move this forward, I suggest we consider one of these options:

1) Keep in core and improve error reporting and documentation
2) Move into "contrib" as experimental
3) Retain in core but simplify it to support only the simplest syntax (as in my Britney~ example)
4) Re-engineer the QueryParser.jj to support a formally defined syntax for acceptable "within phrase" operators e.g. *, ~, ( )

Here is how I see each option:

- 1) is achievable if we carefully define where the existing parser breaks, e.g. ANDs and nested brackets.
- 2) is unnecessary if we can achieve 1).
- 3) would be a shame, because we would lose useful features over some very convoluted edge cases.
- 4) is beyond my JavaCC skills.
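One way to read Mark's summary is as follows. With no slop, element i of the phrase must match the token at position x + i for some start position x, and each bracketed element may offer several alternatives. The sketch below models only that exact-adjacency case. The class name AndNextTo and the use of a Set of acceptable terms for each element are hypothetical choices for illustration, not the data structures in ComplexPhraseQueryParser.java.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical model of an exact complex phrase: element i must sit at position x + i. */
final class AndNextTo {

    /** Each element is the set of terms acceptable at that phrase position. */
    static boolean matches(List<Set<String>> elements, List<String> tokens) {
        for (int start = 0; start + elements.size() <= tokens.size(); start++) {
            boolean all = true;
            for (int i = 0; i < elements.size() && all; i++) {
                all = elements.get(i).contains(tokens.get(start + i));
            }
            if (all) {
                return true; // found a window where every element matches in sequence
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "(john OR jonathon) smith": first element has two alternatives, second has one.
        List<Set<String>> phrase = List.of(Set.of("john", "jonathon"), Set.of("smith"));
        System.out.println(matches(phrase, List.of("jonathon", "smith")));         // true
        System.out.println(matches(phrase, List.of("john", "percival", "smith"))); // false
    }
}
```

Under this reading, the bracketed group always occupies exactly one position. So "(john OR jonathon) smith" matches "jonathon smith" but not "john percival smith".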
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734150#action_12734150 ]

Mark Miller commented on LUCENE-1486:
-------------------------------------

My first thought is that if we can address some of the issues brought up, there is no reason to keep this out of core, IMHO.

My second thought is that a lot of this concern probably stems from the fact that these guys (or one of them) have to duplicate this in the new QueryParser code in contrib. That could be reason enough to move it to contrib. But it doesn't solve the longer-term problem of what happens when the old QueryParser is removed. At that point it would need to be replaced, or dropped from contrib.

With the new info from Mark H, how hard would it be to create a new imp for the new parser that did a lot of this, in a more defined way? It seems you basically just want to be able to use multiterm queries and group/or things, right? We could even relax a little if we have to. This hasn't been released, so there is still a lot of wiggle room I think. But there does have to be a resolution with this and the new parser at some point either way.
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734241#action_12734241 ]

Adriano Crestani commented on LUCENE-1486:
------------------------------------------

Hi Mark H.,

Thanks for the response. Some comments inline:

> Correct, the "inner phrase" example was a term not a phrase. This is perhaps a better example:
> checkBadQuery("\"jo* \"percival smith\" \""); //phrases inside phrases is bad

I think you did not get what I meant. Even with your new example there is no inner phrase. It is a phrase <"jo* ">, followed by a term <percival>, followed by another term <smith>, followed by an empty phrase <" ">. So with your change the junit test passes, but for the wrong reason: the exception complains about the empty phrase, not about an inner phrase. (I still don't see how you can type an inner phrase with the current syntax.) It's not a big deal, but I'm trying to understand it and to point out what is probably a wrong test. I hope this is clear; let me know if it isn't.

> The Junit is currently the main form of documentation

But it is not ideal, because the source code (junit code) is not included in the binary release. The ideal place would be the javadocs.

> * Wildcard/fuzzy/range clauses can be used to define a phrase element (as opposed to simply single terms)
> * Brackets are used to group/define the acceptable variations for a given phrase element e.g. "(john OR jonathon) smith"
> * "AND" is irrelevant - there is effectively an implied "AND_NEXT_TO" binding all phrase elements

Thanks, now it's clearer to me what is supported and what is not. I have some questions. I understand the implicit AND_NEXT_TO operator between the queries inside the phrase.
However, what happens if the user does not type any explicit boolean operator between two terms inside parentheses, as in "(query parser) lucene"? Is the operator between 'query' and 'parser' the implicit AND_NEXT_TO, or the default boolean operator (usually OR)?

And what happens if I type "(query AND parser) lucene"? In my view it is "(query AND parser) AND_NEXT_TO lucene". To me that means: find any document that contains the term 'query' and the term 'parser' at position x, and the term 'lucene' at position x+1. Is this the expected behaviour?

> 1) Keep in core and improve error reporting and documentation
> 2) Move into "contrib" as experimental
> 3) Retain in core but simplify it to support only the simplest syntax (as in my Britney~ example)
> 4) Re-engineer the QueryParser.jj to support a formally defined syntax for acceptable "within phrase" operators e.g. *, ~, ( )

1 is good, but I would prefer 4 too. Documentation and throwing the right exceptions are necessary. I just don't feel comfortable with the complex phrase query parser relying on the main query parser syntax, because any change to the main one could easily break the complex phrase QP. Anyway, 4 can be done in the future :)

Mark M.:

> With the new info from Mark H, how hard would it be to create a new imp for the new parser that did a lot of this, in a more defined way? It seems you basically just want to be able to use multiterm queries and group/or things, right? We could even relax a little if we have to. This hasn't been released, so there is still a lot of wiggle room I think. But there does have to be a resolution with this and the new parser at some point either way.

Yes, I am working on the new query parser code. I recently started reading through the ComplexPhraseQP to understand how it works, so that I can reproduce its behaviour using the new QP framework. I first tried to look at this QP as a user, and I could not figure out exactly what I can and cannot do with it.
I think we are now hitting a big problem, which is related to documentation. That is why I started raising these questions: others could have the same issues in the future. So yes, I can start coding an equivalent QP using the new QP framework. I'm just asking questions and trying to understand everything before I start coding. I don't want to code anything that will throw ConcurrentModificationExceptions, so I'm raising these issues now, before I start moving it to the new QP.

Best Regards,
Adriano Crestani Campos
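Adriano's first point about the "inner phrase" test can be checked mechanically. Assume the parser pairs double quotes from left to right, so each quote is closed by the very next one. Then the test string "jo* "percival smith" " splits into a phrase, two bare terms, and a phrase that contains only a space, and no inner phrase ever forms. QuotePairing is a hypothetical sketch written for this illustration. It ignores backslash escapes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tokenizer: pairs double quotes left to right (no escape handling). */
final class QuotePairing {

    static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < query.length()) {
            char c = query.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '"') {
                int close = query.indexOf('"', i + 1); // the very next quote closes the phrase
                if (close < 0) {
                    throw new IllegalArgumentException("unterminated phrase at " + i);
                }
                tokens.add(query.substring(i, close + 1)); // keep the quotes to mark a phrase
                i = close + 1;
            } else {
                int end = i;
                while (end < query.length() && !Character.isWhitespace(query.charAt(end))
                        && query.charAt(end) != '"') {
                    end++;
                }
                tokens.add(query.substring(i, end)); // a bare term outside any quotes
                i = end;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The checkBadQuery input from the thread: "jo* "percival smith" "
        System.out.println(tokenize("\"jo* \"percival smith\" \""));
        // prints ["jo* ", percival, smith, " "]
    }
}
```

The output matches Adriano's reading: the test fails because of the trailing " " phrase, not because of any nesting.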
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734296#action_12734296 ]

Michael Busch commented on LUCENE-1486:
---------------------------------------

I think the best thing to do here is to define exactly what syntax is supposed to be supported (which Mark H. did in his latest comment), and then implement that syntax with the new queryparser. The new queryparser will enforce correct syntax and give meaningful exceptions when a query is entered that is not supported. I think we can still reuse big portions of Mark's patch: we should be able to write a new QueryBuilder that produces the new ComplexPhraseQuery.

Adriano/Luis: how long would it take to implement? Can we get it into 2.9? This would mean that these new features would go into contrib in 2.9 as part of the new query parser framework, and then be moved to core in 3.0. From 3.0 on, these new features would also be part of Lucene's main query syntax. Would this make sense?
[jira] Reopened: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch reopened LUCENE-1486:
-----------------------------------

Reopening this issue. We haven't made a final decision on how we want to go forward yet, but in any case there's remaining work here.
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734300#action_12734300 ]

Luis Alves commented on LUCENE-1486:
------------------------------------

Hi Mark H,

I would like to propose a fifth option:

5) Re-engineer the QueryParser.jj to support a formally defined syntax for acceptable "within phrase" operators e.g. *, ~, ( )

I propose doing this using the new QP implementation, and I can write the new JavaCC QP for it. This implies that the code would be in contrib in 2.9 and part of core in 3.0.

I also want to propose changing the complex phrase syntax to use single quotes. That way we can have both implementations for phrases. Here is a summary:

- the complex query parser would support all Lucene syntax, even for phrases
- we could add single-quoted text to identify complex phrases, with these rules:

1) Wildcard/fuzzy/range clauses can be used to define a phrase element (as opposed to simply single terms)
2) Brackets are used to group/define the acceptable variations for a given phrase element, e.g. "(john OR jonathon) smith"
3) supported operators: OR, *, ~, ( ), ?
4) disallow fields, proximity, boosting and operators on single-quoted phrases (I'm making an assumption here; Mark H, please comment)
5) single quotes need to be escaped; double quotes are treated as regular punctuation characters inside single-quoted strings

Mark H, can you please elaborate on these other operators: "+" "-" "^" "AND" "&&" "||" "NOT" "!" ":" "[" "]" "{" "}"?
Example: a query with a single-quoted complex phrase, followed by a term and a normal phrase:

query: '(john OR jonathon) smith~0.3 order*' order:sell "stock market"
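Luis's proposal can be sketched as a small lexer with four rules:

- single quotes delimit a complex phrase;
- double quotes delimit an ordinary phrase;
- a backslash escapes the next character, which allows the escaped single quotes of rule 5;
- a double quote inside single quotes is kept as ordinary text.

SingleQuoteLexer and its COMPLEX_PHRASE/PHRASE/TERM labels are hypothetical names made up for this sketch. Field syntax and the restrictions in rule 4 are not modeled; a bare term simply runs to the next whitespace.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lexer for the proposal: '...' = complex phrase, "..." = ordinary phrase. */
final class SingleQuoteLexer {

    static List<String> lex(String q) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < q.length()) {
            char c = q.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '\'' || c == '"') {
                StringBuilder body = new StringBuilder();
                int j = i + 1;
                // Scan to the matching quote; a backslash escapes the next character,
                // and the other quote character is treated as ordinary text.
                while (j < q.length() && q.charAt(j) != c) {
                    if (q.charAt(j) == '\\' && j + 1 < q.length()) {
                        j++;
                    }
                    body.append(q.charAt(j));
                    j++;
                }
                if (j >= q.length()) {
                    throw new IllegalArgumentException("unterminated quote at " + i);
                }
                out.add((c == '\'' ? "COMPLEX_PHRASE:" : "PHRASE:") + body);
                i = j + 1;
            } else {
                int j = i;
                while (j < q.length() && !Character.isWhitespace(q.charAt(j))) {
                    j++;
                }
                out.add("TERM:" + q.substring(i, j)); // fields such as order:sell stay unparsed
                i = j;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String query = "'(john OR jonathon) smith~0.3 order*' order:sell \"stock market\"";
        for (String token : lex(query)) {
            System.out.println(token);
        }
    }
}
```

Run on Luis's example, it prints three lines: COMPLEX_PHRASE:(john OR jonathon) smith~0.3 order*, then TERM:order:sell, then PHRASE:stock market. The complex phrase body would then be handed to a separate sub-parser that enforces rules 1 to 4.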
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734323#action_12734323 ]

Luis Alves commented on LUCENE-1486:
------------------------------------

Mark H -

Question 1) I also have a question about position. I added docs 5 and 6:

DocData docsContent[] = {
    new DocData("john smith", "1"),
    new DocData("johathon smith", "2"),
    new DocData("john percival smith goes on a b c vacation", "3"),
    new DocData("jackson waits tom", "4"),
    new DocData("johathon smith john", "5"),
    new DocData("johathon mary gomes smith", "6"),
};

For the test

checkMatches("\"(jo* -john) smyth\"", "2"); // boolean logic

should document 5 also be returned, or only doc 2? I'm assuming position always matters, so doc 5 should be returned. Is that correct?

Question 2) Should these 2 queries behave the same once we fix the problem?

// checkMatches("\"john -percival\"", "1"); // not logic doesn't work
// checkMatches("\"john (-percival)\"", "1"); // not logic doesn't work

Question 3)

checkMatches("\"jo* smith\"~2", "1,2,3,5"); // position logic works.

Doc 6 is also returned, so this does not seem to be working.

Question 4) The usage of AND and AND_NEXT_TO is confusing to me. The query

checkMatches("\"(jo* AND mary) smith\"", "1,2,5"); // boolean logic

returns 1,2,5 and not 6, but I was expecting only 6 to be returned. Can you describe the behavior here? It looks like the AND is converted into an OR; is that the case? What is the behavior you want to implement?
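Luis's questions can be explored with a toy model over his six documents. The model rests on these assumptions:

- phrase elements must appear in order;
- slop is the total number of positions skipped between consecutive elements;
- "-john" excludes a term only at that element's position;
- Question 4 is evaluated under two readings of AND.

The model uses smith rather than the smyth written in Question 1, as in the issue's original test, because an exact smyth matches none of these documents. PhraseQuestions, prefix and term are hypothetical names for this sketch, not the patch's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical model of Luis's test questions; not the ComplexPhraseQueryParser implementation. */
final class PhraseQuestions {

    static final String[] DOCS = {
        "john smith",                                 // 1
        "johathon smith",                             // 2
        "john percival smith goes on a b c vacation", // 3
        "jackson waits tom",                          // 4
        "johathon smith john",                        // 5
        "johathon mary gomes smith",                  // 6
    };

    /** Ordered match: total number of skipped positions between consecutive elements <= slop. */
    static boolean matches(List<Predicate<String>> elements, String[] tokens, int slop) {
        return match(elements, tokens, 0, -1, slop);
    }

    private static boolean match(List<Predicate<String>> elements, String[] tokens,
                                 int idx, int prevPos, int slopLeft) {
        if (idx == elements.size()) {
            return true;
        }
        int first = idx == 0 ? 0 : prevPos + 1;
        int last = idx == 0 ? tokens.length - 1 : Math.min(tokens.length - 1, prevPos + 1 + slopLeft);
        for (int p = first; p <= last; p++) {
            if (elements.get(idx).test(tokens[p])) {
                int used = idx == 0 ? 0 : p - prevPos - 1; // positions skipped before this element
                if (match(elements, tokens, idx + 1, p, slopLeft - used)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Returns the 1-based ids of matching documents. */
    static List<Integer> search(List<Predicate<String>> elements, int slop) {
        List<Integer> hits = new ArrayList<>();
        for (int d = 0; d < DOCS.length; d++) {
            if (matches(elements, DOCS[d].split(" "), slop)) {
                hits.add(d + 1);
            }
        }
        return hits;
    }

    static Predicate<String> prefix(String p) { return t -> t.startsWith(p); }

    static Predicate<String> term(String s) { return t -> t.equals(s); }

    public static void main(String[] args) {
        // Q1: "(jo* -john) smith" with positions enforced
        System.out.println(search(List.of(prefix("jo").and(term("john").negate()), term("smith")), 0)); // [2, 5]
        // Q3: "jo* smith"~2
        System.out.println(search(List.of(prefix("jo"), term("smith")), 2));                            // [1, 2, 3, 5, 6]
        // Q4 read as OR: "(jo* OR mary) smith"
        System.out.println(search(List.of(prefix("jo").or(term("mary")), term("smith")), 0));          // [1, 2, 5]
        // Q4 read as same-position AND: no single token is both jo* and mary
        System.out.println(search(List.of(prefix("jo").and(term("mary")), term("smith")), 0));         // []
    }
}
```

Under these assumptions the model gives the following results:

- Question 1 returns docs 2 and 5, which supports Luis's reading.
- In Question 3, doc 6 falls inside slop 2, because exactly two positions separate johathon and smith.
- In Question 4, reading AND as OR reproduces the observed 1,2,5, while a same-position AND (Adriano's "(query AND parser)" reading) matches nothing when each position holds a single token.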
[jira] Issue Comment Edited: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734323#action_12734323 ]

Luis Alves edited comment on LUCENE-1486 at 7/22/09 2:13 PM:
-------------------------------------------------------------

Mark H -

Question 1) I also have a question about position. I added docs 5 and 6:

DocData docsContent[] = {
    new DocData("john smith", "1"),
    new DocData("johathon smith", "2"),
    new DocData("john percival smith goes on a b c vacation", "3"),
    new DocData("jackson waits tom", "4"),
    new DocData("johathon smith john", "5"),
    new DocData("johathon mary gomes smith", "6"),
};

For the test

checkMatches("\"(jo* -john) smyth\"", "2"); // boolean logic

should document 5 be returned, or just doc 2? I'm assuming position always matters, so doc 5 should be returned. Is that correct?

Question 2) Should these 2 queries behave the same once we fix the problem?

// checkMatches("\"john -percival\"", "1"); // not logic doesn't work
// checkMatches("\"john (-percival)\"", "1"); // not logic doesn't work

Question 3)

checkMatches("\"jo* smith\"~2", "1,2,3,5"); // position logic works.

Doc 6 is also returned, so this feature does not seem to be working.

Question 4) The usage of AND and AND_NEXT_TO is confusing to me. The query

checkMatches("\"(jo* AND mary) smith\"", "1,2,5"); // boolean logic

returns 1,2,5 and not 6, but I was expecting only 6 to be returned. Can you describe the behavior here? It looks like the AND is converted into an OR. What is the behavior you want to implement?
[jira] Issue Comment Edited: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734323#action_12734323 ]

Luis Alves edited comment on LUCENE-1486 at 7/22/09 2:19 PM:
-------------------------------------------------------------

Mark H -

Question 1) I also have a question about position. I added docs 5 and 6:

{monospaced}
DocData docsContent[] = {
    new DocData("john smith", "1"),
    new DocData("johathon smith", "2"),
    new DocData("john percival smith goes on a b c vacation", "3"),
    new DocData("jackson waits tom", "4"),
    new DocData("johathon smith john", "5"),
    new DocData("johathon mary gomes smith", "6"),
};
{monospaced}

For the test

checkMatches("\"(jo* -john) smyth\"", "2"); // boolean logic with

would document 5 be returned, or only doc 2? I'm assuming position is always important and doc 5 is supposed to be returned, correct?

Question 2) Should these 2 queries behave the same when we fix the problem?

// checkMatches("\"john -percival\"", "1"); // not logic doesn't work
// checkMatches("\"john (-percival)\"", "1"); // not logic doesn't work

Question 3)

checkMatches("\"jo* smith\"~2", "1,2,3,5"); // position logic works.

Doc 6 is also returned, so this feature does not seem to be working.

Question 4) The usage of AND and AND_NEXT_TO is confusing to me. The query

checkMatches("\"(jo* AND mary) smith\"", "1,2,5"); // boolean logic with

returns 1,2,5 and not 6, but I was only expecting 6 to be returned. Can you describe what the behavior is here? It looks like the AND is converted into an OR. What is the behavior you want to implement?
[jira] Issue Comment Edited: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734323#action_12734323 ]

Luis Alves edited comment on LUCENE-1486 at 7/22/09 2:21 PM:
-------------------------------------------------------------

Mark H -

Question 1) I also have a question about position. I added docs 5 and 6:

{code:title=TestComplexPhraseQuery.java|borderStyle=solid}
...
DocData docsContent[] = {
    new DocData("john smith", "1"),
    new DocData("johathon smith", "2"),
    new DocData("john percival smith goes on a b c vacation", "3"),
    new DocData("jackson waits tom", "4"),
    new DocData("johathon smith john", "5"),
    new DocData("johathon mary gomes smith", "6"),
};
...
{code}

For the test

checkMatches("\"(jo* -john) smyth\"", "2"); // boolean logic with

would document 5 be returned, or only doc 2? I'm assuming position is always important and doc 5 is supposed to be returned, correct?

Question 2) Should these 2 queries behave the same when we fix the problem?

// checkMatches("\"john -percival\"", "1"); // not logic doesn't work
// checkMatches("\"john (-percival)\"", "1"); // not logic doesn't work

Question 3)

checkMatches("\"jo* smith\"~2", "1,2,3,5"); // position logic works.

Doc 6 is also returned, so this feature does not seem to be working.

Question 4) The usage of AND and AND_NEXT_TO is confusing to me. The query

checkMatches("\"(jo* AND mary) smith\"", "1,2,5"); // boolean logic with

returns 1,2,5 and not 6, but I was only expecting 6 to be returned. Can you describe what the behavior is here? It looks like the AND is converted into an OR. What is the behavior you want to implement?
[jira] Issue Comment Edited: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734323#action_12734323 ]

Luis Alves edited comment on LUCENE-1486 at 7/22/09 2:22 PM:
-------------------------------------------------------------

Mark H -

Question 1) I added docs 5 and 6:

{code:title=TestComplexPhraseQuery.java|borderStyle=solid}
...
DocData docsContent[] = {
    new DocData("john smith", "1"),
    new DocData("johathon smith", "2"),
    new DocData("john percival smith goes on a b c vacation", "3"),
    new DocData("jackson waits tom", "4"),
    new DocData("johathon smith john", "5"),
    new DocData("johathon mary gomes smith", "6"),
};
...
{code}

For the test

checkMatches("\"(jo* -john) smyth\"", "2"); // boolean logic with

would document 5 be returned, or only doc 2? I'm assuming position is always important and doc 5 is supposed to be returned, correct?

Question 2) Should these 2 queries behave the same when we fix the problem?

// checkMatches("\"john -percival\"", "1"); // not logic doesn't work
// checkMatches("\"john (-percival)\"", "1"); // not logic doesn't work

Question 3)

checkMatches("\"jo* smith\"~2", "1,2,3,5"); // position logic works.

Doc 6 is also returned, so this feature does not seem to be working.

Question 4) The usage of AND and AND_NEXT_TO is confusing to me. The query

checkMatches("\"(jo* AND mary) smith\"", "1,2,5"); // boolean logic with

returns 1,2,5 and not 6, but I was only expecting 6 to be returned. Can you describe what the behavior is here? It looks like the AND is converted into an OR. What is the behavior you want to implement?
[jira] Issue Comment Edited: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734323#action_12734323 ]

Luis Alves edited comment on LUCENE-1486 at 7/22/09 2:24 PM:
-------------------------------------------------------------

Mark H -

Question 1) I added docs 5 and 6:

{code:title=TestComplexPhraseQuery.java|borderStyle=solid}
...
DocData docsContent[] = {
    new DocData("john smith", "1"),
    new DocData("johathon smith", "2"),
    new DocData("john percival smith goes on a b c vacation", "3"),
    new DocData("jackson waits tom", "4"),
    new DocData("johathon smith john", "5"),
    new DocData("johathon mary gomes smith", "6"),
};
...
{code}

For the test

checkMatches("\"(jo* -john) smyth\"", "2"); // boolean logic with

would document 5 be returned, or only doc 2? I'm assuming position is always important and doc 5 is supposed to be returned. Is this the correct behavior?

Question 2) Should these 2 queries behave the same when we fix the problem?

// checkMatches("\"john -percival\"", "1"); // not logic doesn't work
// checkMatches("\"john (-percival)\"", "1"); // not logic doesn't work

Question 3) For the query

checkMatches("\"jo* smith\"~2", "1,2,3,5"); // position logic works.

doc 6 is also returned, so this feature does not seem to be working.

Question 4) The usage of AND and AND_NEXT_TO is confusing to me. The query

checkMatches("\"(jo* AND mary) smith\"", "1,2,5"); // boolean logic with

returns 1,2,5 and not 6, but I was only expecting 6 to be returned; it seems the AND is converted into an OR. What is the behavior you want to implement?
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734333#action_12734333 ]

Luis Alves commented on LUCENE-1486:
------------------------------------

Sorry for all the emails. I'm still new to JIRA, and only now did I realize that every edit I make sends an email. But now that I've found the preview button, it won't happen again. :)
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734337#action_12734337 ]

Mark Harwood commented on LUCENE-1486:
--------------------------------------

bq. I think it's not a big deal, but I'm just trying to understand and raise a probable wrong test.

Granted, the test fails for a reason other than the one for which I wanted it to fail. We can probably strike the test and leave a note saying phrase-within-a-phrase just does not make sense and is not supported.

bq. Is the operator between 'query' and 'parser' the implicit AND_NEXT_TO or the default boolean operator (usually OR)?

In brackets it's an OR. The brackets are used to suggest that the current phrase element at position X is composed of some choices that are evaluated as a subclause, in the same way that in normal query logic sub-clauses are defined in brackets, e.g. +a +(b OR c). There seems to be a reasonable logic to this. Ideally the ComplexPhraseQueryParser should explicitly turn this setting on while evaluating the bracketed innards of phrases, just in case the base class has AND as the default.

bq. Mark H, can you please elaborate more on these other operators "+" "-" "^" "AND" "&&" "||" "NOT" "!" ":" "[" "]" "{" "}".

OK, I'll try and deal with them one by one, but these are not necessarily definitive answers or guarantees of correctly implemented support:

OR, ||, +, AND, && ..... ignored. The implicit operator is AND_NEXT_TO, apart from in bracketed sections where all elements at this level are ORed.
^ ..... boosts are carried through from TermQuerys to SpanTermQuerys.
NOT, ! ..... creates SpanNotQueries.
[]{} ..... range queries are supported, as are wildcards (*, ?) and fuzzies (~).

bq. query: '(john OR jonathon) smith~0.3 order*' order:sell "stock market"

I'll post the XML query syntax equivalent of what should be parsed here shortly (just seen your next comment come in).
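Mark suggests that the parser should force OR as the default operator while it evaluates the bracketed innards of a phrase, in case the base class has AND as the default. Below is a minimal sketch of that save, force and restore pattern. It uses a hypothetical toy parser: `BracketedOrSketch` and its methods are invented for illustration. A real implementation would go through Lucene QueryParser's own `getDefaultOperator()`/`setDefaultOperator(...)` accessors instead.

```java
/**
 * Hypothetical toy parser (not Lucene's QueryParser) showing how to force
 * OR for bracketed phrase elements without leaking the change to the caller.
 */
class BracketedOrSketch {
    enum Operator { AND, OR }

    private Operator defaultOperator;

    BracketedOrSketch(Operator defaultOperator) {
        this.defaultOperator = defaultOperator;
    }

    Operator getDefaultOperator() {
        return defaultOperator;
    }

    // Stand-in for real clause parsing: joins the terms with the current default operator.
    private String parseClauses(String innards) {
        String[] terms = innards.trim().split("\\s+");
        return String.join(" " + defaultOperator + " ", terms);
    }

    // Parse "( ... )" inside a phrase with OR forced, then restore the caller's setting.
    String parseBracketedPhraseElement(String innards) {
        Operator saved = defaultOperator;
        defaultOperator = Operator.OR;
        try {
            return "(" + parseClauses(innards) + ")";
        } finally {
            defaultOperator = saved; // restored even if parsing throws
        }
    }

    public static void main(String[] args) {
        BracketedOrSketch parser = new BracketedOrSketch(Operator.AND);
        System.out.println(parser.parseBracketedPhraseElement("john jonathon")); // (john OR jonathon)
        System.out.println(parser.getDefaultOperator());                          // AND
    }
}
```

The `finally` block is the important part: if parsing the bracket fails, the caller's AND default is still restored.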
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734349#action_12734349 ]

Mark Harwood commented on LUCENE-1486:
--------------------------------------

{quote}for test checkMatches("\"(jo* -john) smyth\"", "2"); would document 5 be returned or just doc 2 should be returned, {quote}

I presume you mean smith, not smyth, here; otherwise nothing would match? If so, doc 5 should match, and position is relevant (subject to slop factors).

{quote}
Question 2) Should these 2 queries behave the same when we fix the problem
// checkMatches("\"john -percival\"", "1"); // not logic doesn't work
// checkMatches("\"john (-percival)\"", "1"); // not logic doesn't work
{quote}

I suppose there's an open question as to whether the second example is legal (the brackets are unnecessary).

{quote}
Question 3) checkMatches("\"jo* smith\"~2", "1,2,3,5"); // position logic works.
doc 6 is also returned, so this feature does not seem to be working.
{quote}

That looks like a bug related to the slop factor?

{quote}
Question 4) The usage of AND and AND_NEXT_TO is confusing to me the query
checkMatches("\"(jo* AND mary) smith\"", "1,2,5"); // boolean logic with
{quote}

ANDs are ignored and turned into ORs (see earlier comments), but maybe a query parse error should be thrown to emphasise this.
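Questions 1 and 3 both come down to position-by-position matching. Below is a toy model of it; it is not the ComplexPhraseQueryParser code. Each phrase element is a Java `Predicate` over a single token. Elements must appear in order, and the positions skipped between consecutive elements must add up to no more than the slop. The in-order, summed-gap rule is an assumption modeled on an in-order `SpanNearQuery`. `PositionalPhraseModel` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical toy model of in-order phrase matching with slop.
 * Each element is a predicate over one token; the positions skipped between
 * consecutive matched elements must sum to at most the slop.
 */
class PositionalPhraseModel {

    // Try to match element k at position pos, with slopLeft positions still skippable.
    static boolean matchFrom(String[] tokens, List<Predicate<String>> elements,
                             int k, int pos, int slopLeft) {
        if (!elements.get(k).test(tokens[pos])) return false;
        if (k == elements.size() - 1) return true;
        for (int next = pos + 1; next < tokens.length && next - pos - 1 <= slopLeft; next++) {
            int skipped = next - pos - 1;
            if (matchFrom(tokens, elements, k + 1, next, slopLeft - skipped)) return true;
        }
        return false;
    }

    static boolean matches(String doc, List<Predicate<String>> elements, int slop) {
        String[] tokens = doc.split(" ");
        for (int start = 0; start < tokens.length; start++) {
            if (matchFrom(tokens, elements, 0, start, slop)) return true;
        }
        return false;
    }

    // Returns 1-based ids of matching docs, like the ids used in checkMatches.
    static List<String> matchingIds(String[] docs, List<Predicate<String>> elements, int slop) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < docs.length; i++) {
            if (matches(docs[i], elements, slop)) ids.add(String.valueOf(i + 1));
        }
        return ids;
    }

    public static void main(String[] args) {
        String[] docs = { "john smith", "johathon smith",
                "john percival smith goes on a b c vacation", "jackson waits tom",
                "johathon smith john", "johathon mary gomes smith" };

        Predicate<String> joNotJohn = t -> t.startsWith("jo") && !t.equals("john"); // (jo* -john)
        Predicate<String> jo = t -> t.startsWith("jo");                            // jo*
        Predicate<String> smith = t -> t.equals("smith");

        System.out.println(matchingIds(docs, List.of(joNotJohn, smith), 0)); // [2, 5]
        System.out.println(matchingIds(docs, List.of(jo, smith), 2));        // [1, 2, 3, 5, 6]
    }
}
```

Under these assumptions, doc 5 matches "(jo* -john) smith" because "johathon smith" occurs at adjacent positions. Doc 6 matches "jo* smith"~2 because two positions separate "johathon" from "smith".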
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734355#action_12734355 ]

Mark Harwood commented on LUCENE-1486:
--------------------------------------

{quote}
query: '(john OR jonathon) smith~0.3 order*' order:sell "stock market"
{quote}

Would be parsed as follows (shown as equivalent XMLQueryParser syntax):

{code:xml}
<BooleanQuery>
  <Clause occurs="should">
    <SpanNear>
      <SpanOr>
        <SpanOrTerms>john jonathon</SpanOrTerms>
      </SpanOr>
      <SpanOr>
        <SpanOrTerms>smith smyth</SpanOrTerms>
      </SpanOr>
      <SpanOr>
        <SpanOrTerms>order orders</SpanOrTerms>
      </SpanOr>
    </SpanNear>
  </Clause>
  <Clause occurs="should">
    <TermQuery fieldName="order">sell</TermQuery>
  </Clause>
  <Clause occurs="should">
    <UserQuery>"stock market"</UserQuery>
  </Clause>
</BooleanQuery>
{code}
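In Mark's XML, each multi-term element (smith~0.3, order*) has been replaced by a SpanOrTerms list of concrete index terms. Below is a toy sketch of that rewrite step, run against the vocabulary of test docs 1-4. It assumes a fuzzy similarity of 1 - editDistance / (length of the shorter term) with a default threshold of 0.5, intended to approximate Lucene 2.x FuzzyQuery with no prefix. The class is hypothetical and skips Lucene's real term enumeration entirely.

```java
import java.util.TreeSet;

/**
 * Hypothetical toy, not ComplexPhraseQueryParser code: rewrites one phrase
 * element (plain term, "prefix*" wildcard, or "term~" / "term~0.3" fuzzy)
 * into the sorted set of vocabulary terms it stands for.
 */
class PhraseTermExpansion {

    // Classic Levenshtein edit distance (insert, delete, substitute each cost 1).
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Assumed similarity: 1 - distance / length of the shorter term.
    static boolean fuzzyMatch(String target, String term, float minSimilarity) {
        int shorter = Math.min(target.length(), term.length());
        if (shorter == 0) return false;
        float similarity = 1.0f - (float) editDistance(target, term) / shorter;
        return similarity >= minSimilarity;
    }

    static TreeSet<String> expand(String element, Iterable<String> vocabulary) {
        TreeSet<String> matches = new TreeSet<>();
        int tilde = element.indexOf('~');
        for (String term : vocabulary) {
            if (element.endsWith("*")) {
                // Wildcard: keep every term sharing the prefix.
                if (term.startsWith(element.substring(0, element.length() - 1))) matches.add(term);
            } else if (tilde >= 0) {
                // Fuzzy: optional threshold after '~', assumed default 0.5.
                String suffix = element.substring(tilde + 1);
                float minSimilarity = suffix.isEmpty() ? 0.5f : Float.parseFloat(suffix);
                if (fuzzyMatch(element.substring(0, tilde), term, minSimilarity)) matches.add(term);
            } else if (term.equals(element)) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] docs = { "john smith", "johathon smith",
                "john percival smith goes on a b c vacation", "jackson waits tom" };
        TreeSet<String> vocabulary = new TreeSet<>();
        for (String doc : docs) {
            for (String token : doc.split(" ")) vocabulary.add(token);
        }
        System.out.println(expand("j*", vocabulary));     // [jackson, johathon, john]
        System.out.println(expand("smyth~", vocabulary)); // [smith]
    }
}
```

Each expanded set then plays the role of one SpanOrTerms entry inside the span-near. For example, "j* smyth~" becomes [jackson, johathon, john] followed by [smith].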
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734398#action_12734398 ]

Adriano Crestani commented on LUCENE-1486:
------------------------------------------

{quote}
I propose doing this using the new QP implementation. (I can write the new JavaCC QP for this.) (This implies that the code will be in contrib in 2.9 and become part of core in 3.0.)
{quote}

That would be good!

{quote}
Granted, the test fails for a reason other than the one for which I wanted it to fail. We can probably strike the test and leave a note saying phrase-within-a-phrase just does not make sense and is not supported.
{quote}

Cool, I agree with removing it. But I still don't see how a user could type a phrase inside a phrase with the current syntax definition. Can you give me an example?

{quote}
In brackets it's an OR: the brackets are used to suggest that the current phrase element at position X is composed of some choices that are evaluated as a subclause, in the same way that in normal query logic sub-clauses are defined in brackets, e.g. +a +(b OR c). There seems to be a reasonable logic to this. Ideally the ComplexPhraseQueryParser should explicitly turn this setting on while evaluating the bracketed innards of phrases, just in case the base class has AND as the default.
{quote}

If we use the JavaCC implementation Luis suggested, we would already have a query parser that throws a ParseException whenever the user types an AND inside a phrase.

{quote}
OR, ||, +, AND, && ..... ignored
{quote}

So we should throw an exception if any of these is found inside a phrase. Silently ignoring them could confuse the user.
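The suggestion to reject these operators instead of silently ignoring them could look roughly like the check below. This is a hypothetical, stdlib-only sketch: `PhraseOperatorCheck` and `validate` are invented names, `IllegalArgumentException` stands in for the parser's ParseException, and terms are assumed to be whitespace-separated, so something like `john&&smith` written without spaces would slip through. The `-` prefix is deliberately allowed, since the JUnit tests use it for NOT inside phrases.

```java
import java.util.Set;

// Hypothetical validation step for the text between the phrase quotes (not Lucene code).
final class PhraseOperatorCheck {
    // Operators that are currently ignored inside phrases; here they are rejected instead.
    private static final Set<String> REJECTED = Set.of("OR", "AND", "||", "&&");

    /**
     * Throws IllegalArgumentException (a stand-in for ParseException) if the phrase body
     * contains a boolean operator, or a '+' (required) prefix such as "+john".
     */
    static void validate(String phraseBody) {
        // Brackets only group alternatives here, so treat them as separators.
        String[] tokens = phraseBody.replace('(', ' ').replace(')', ' ').trim().split("\\s+");
        for (String token : tokens) {
            if (REJECTED.contains(token) || token.startsWith("+")) {
                throw new IllegalArgumentException(
                        "Operator '" + token + "' is not supported inside a phrase: \"" + phraseBody + "\"");
            }
        }
    }

    public static void main(String[] args) {
        validate("(jo* -john) smith"); // accepted: wildcards and '-' (NOT) are allowed
        try {
            validate("(query AND parser) lucene");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```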
{quote}
Question 2) Should these 2 queries behave the same when we fix the problem?
// checkMatches("\"john -percival\"", "1"); // not logic doesn't work
// checkMatches("\"john (-percival)\"", "1"); // not logic doesn't work
I suppose there's an open question as to whether the second example is legal (the brackets are unnecessary).
{quote}

Yes, the brackets in the second one are unnecessary, but I don't think it's illegal. A user can type <(smith)> outside a phrase, so it makes sense to support it inside one as well.

{quote}
Question 3)
checkMatches("\"jo* smith\"~2", "1,2,3,5"); // position logic works.
doc 6 is also returned, so this feature does not seem to be working. That looks like a bug related to slop factor?
{quote}

I have not checked yet, but I think it's working fine. The slop is the number of position moves the phrase terms may make and still match the query. It matches doc 6 because the term <smith> moves two positions to the right and then matches "johathon mary gomes smith". Twice = slop 2 :)

{quote}
ANDs are ignored and turned into ORs (see earlier comments) but maybe a query parse error should be thrown to emphasise this.
{quote}

I think we could support AND as well, although I agree there are few cases where a user would need it. It would work as I explained before:

{quote}
What happens if I type "(query AND parser) lucene"? From my point of view it is: "(query AND parser) AND_NEXT_TO lucene". In other words: find any document that contains the term 'query' and the term 'parser' at position x, and the term 'lucene' at position x+1. Is this the expected behaviour?
{quote}
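The proposed `(query AND parser) AND_NEXT_TO lucene` semantics can be read as in the plain-Java sketch below, which models a field's positions as a list of term sets. The names are hypothetical, and this is not Lucene's span machinery. One consequence follows from the model: with an analyzer that emits a single token per position, two different terms can never share position x. Such a clause could therefore only match where tokens are stacked at the same position, for example synonyms injected with a position increment of 0.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of "(a AND b) AND_NEXT_TO c"; not Lucene code.
final class AndNextToSketch {
    /**
     * positions.get(i) holds every term indexed at position i of one field.
     * Returns true if some position x contains all of requiredAtX and position x + 1 contains next.
     */
    static boolean matches(List<Set<String>> positions, Set<String> requiredAtX, String next) {
        for (int x = 0; x + 1 < positions.size(); x++) {
            if (positions.get(x).containsAll(requiredAtX) && positions.get(x + 1).contains(next)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> required = Set.of("query", "parser");
        // One token per position: 'query' and 'parser' never share a position.
        List<Set<String>> plain = List.of(Set.of("query"), Set.of("parser"), Set.of("lucene"));
        // Stacked tokens: 'parser' indexed at the same position as 'query'.
        List<Set<String>> stacked = List.of(Set.of("query", "parser"), Set.of("lucene"));
        System.out.println(matches(plain, required, "lucene"));   // false
        System.out.println(matches(stacked, required, "lucene")); // true
    }
}
```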
[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735041#action_12735041 ]

Ahmet Arslan commented on LUCENE-1486:
--------------------------------------

Hi everyone,

I am using your ComplexPhraseQueryParser and have integrated it into Solr. I am interested in it mainly because it supports the OR operator and wildcards inside proximity searches. Specifically:

"(john johathon) smith"~10 and "j* smith"

Both work perfectly; thank you for your work.

I downloaded its source code from http://svn.apache.org/viewvc?view=rev&revision=791579 and then edited it a little, since I am using Lucene 2.4.1. I replaced these:

1-) TermRangeQuery with RangeQuery
2-) getConstantScoreRewrite() with getUseOldRangeQuery();
3-) setConstantScoreRewrite(false); with setUseOldRangeQuery(true);
4-) On line 168 of ComplexPhraseQueryParser.java there are two semicolons (;;)

I am not sure whether what I did is the right way to start using this query parser with the latest versions of Lucene/Solr. If it is not, can you suggest a better way, or tell me where to get the latest source code of the query parser?

I am having problems with multi-field searches. The query "(john johathon) smith"~10 works on the default field, e.g. text. But when I run the same query on another field (other than the default field):

title:"(john johathon) smith"~10

it throws the exception below:

Cannot have clause for field "text" nested in phrase for field "title"

When I ran the query with the field name distributed to all terms, it worked:

title:"(title:john title:johathon) title:smith"~10

Is there an easy way to set the field of all terms without specifying it on each one?

And about boosts in multi-field queries: is this query legal?
(default operator = OR, default field = text)

title:"(title:john title:johathon) title:smith"~10^1.5 OR "(john johathon) smith"~10^3.0

In short, I want to use this query parser to query multiple fields with different boosts.

I am not sure whether I am allowed to ask such questions here; if not, please accept my apologies.

Thank you for your consideration.

Ahmet Arslan
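If the parser does not apply a phrase's field to its bracketed terms, one client-side workaround for the multi-field question is to distribute the field name textually before parsing. That produces exactly the form reported above to work. The sketch is hypothetical: `PhraseFieldDistributor` is not part of Lucene or Solr. It handles only `field:"..."` phrases whose field name is a run of word characters, with simple whitespace-separated terms. It leaves unfielded phrases and any `~slop` or `^boost` suffix untouched, and it does not deal with escaped quotes.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client-side helper, not part of Lucene or Solr.
final class PhraseFieldDistributor {
    // field:"phrase body"; any ~slop or ^boost suffix after the closing quote is left as-is.
    private static final Pattern FIELDED_PHRASE = Pattern.compile("(\\w+):\"([^\"]*)\"");
    private static final Set<String> OPERATORS = Set.of("OR", "AND", "||", "&&");

    /** Rewrites title:"(a b) c" as title:"(title:a title:b) title:c"; other text is unchanged. */
    static String distribute(String query) {
        Matcher m = FIELDED_PHRASE.matcher(query);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String field = m.group(1);
            String rewritten = field + ":\"" + prefixTerms(field, m.group(2)) + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(rewritten));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Prefixes each bare term with "field:", keeping leading '(', '-', '+' and trailing ')'. */
    private static String prefixTerms(String field, String body) {
        StringBuilder sb = new StringBuilder();
        for (String token : body.trim().split("\\s+")) {
            int start = 0;
            while (start < token.length() && "(-+".indexOf(token.charAt(start)) >= 0) start++;
            int end = token.length();
            while (end > start && token.charAt(end - 1) == ')') end--;
            String core = token.substring(start, end);
            if (sb.length() > 0) sb.append(' ');
            sb.append(token, 0, start);
            // Terms that already name a field, and boolean operators, are left as they are.
            if (!core.isEmpty() && !core.contains(":") && !OPERATORS.contains(core)) {
                sb.append(field).append(':');
            }
            sb.append(core).append(token, end, token.length());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(distribute(
                "title:\"(john johathon) smith\"~10^1.5 OR \"(john johathon) smith\"~10^3.0"));
    }
}
```

Doing this in the parser itself (parsing the phrase innards against the phrase's field) would be the cleaner fix. The string rewrite only suits a quick integration where the query grammar is known to be this simple.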