[jira] Created: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

[jira] Updated: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1486:
---------------------------------------

    Fix Version/s:     (was: 2.4.1)

> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
>                 Key: LUCENE-1486
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1486
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4
>            Reporter: Mark Harwood
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: ComplexPhraseQueryParser.java, TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of PhraseQueries to allow more complex syntax, e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in QueryParser itself. This works as a proof of concept for much of the query parser syntax. Examples from the JUnit test include:
> checkMatches("\"j*   smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
> checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
> checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works.
>
> checkBadQuery("\"jo*  id:1 smith\""); //mixing fields in a phrase is bad
> checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases are bad
> checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases are not supported
> Code plus JUnit test to follow...
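The matching behaviour these checks describe can be approximated without Lucene. The following is a minimal, hypothetical sketch: the class and method names are illustrative and are not the attached ComplexPhraseQueryParser's API. Each phrase clause is expanded against a small in-memory term list, only `*` and `?` wildcards are handled (no fuzzy or boolean clauses), and the phrase matches when one alternative per clause occurs in order with at most `slop` skipped positions in total. A span-based implementation would presumably fill the same roles with an OR of span terms per clause inside an ordered "near" query, but that mapping is an assumption here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, Lucene-free sketch of "wildcards inside a phrase".
public class ComplexPhraseSketch {

    // Expands a clause containing '*' or '?' into every matching term in the dictionary.
    static List<String> expand(String clause, List<String> dictionary) {
        StringBuilder regex = new StringBuilder();
        for (char c : clause.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        Pattern pattern = Pattern.compile(regex.toString());
        List<String> matches = new ArrayList<>();
        for (String term : dictionary) {
            if (pattern.matcher(term).matches()) {
                matches.add(term);
            }
        }
        return matches;
    }

    // True if one alternative per clause occurs in order, skipping at most `slop` positions in total.
    static boolean phraseMatches(List<List<String>> clauses, String[] tokens, int slop) {
        if (clauses.isEmpty()) {
            return false;
        }
        for (List<String> alternatives : clauses) {
            if (alternatives.isEmpty()) {
                return false; // a clause that expands to no terms can never match
            }
        }
        for (int start = 0; start < tokens.length; start++) {
            if (matchFrom(clauses, 0, tokens, start, slop)) {
                return true;
            }
        }
        return false;
    }

    // Places clause i at position pos, then tries the next clause after any gap the remaining slop allows.
    private static boolean matchFrom(List<List<String>> clauses, int i, String[] tokens, int pos, int slopLeft) {
        if (!clauses.get(i).contains(tokens[pos])) {
            return false;
        }
        if (i == clauses.size() - 1) {
            return true;
        }
        for (int gap = 0; gap <= slopLeft && pos + 1 + gap < tokens.length; gap++) {
            if (matchFrom(clauses, i + 1, tokens, pos + 1 + gap, slopLeft - gap)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("a", "john", "jonathan", "jones", "smith", "smyth");
        List<List<String>> clauses = new ArrayList<>();
        clauses.add(expand("jo*", dictionary));   // [john, jonathan, jones]
        clauses.add(expand("smith", dictionary)); // [smith]
        String[] doc = {"jonathan", "a", "smith"};
        System.out.println(phraseMatches(clauses, doc, 0)); // false: "a" sits between the clauses
        System.out.println(phraseMatches(clauses, doc, 1)); // true: one skipped position is allowed
    }
}
```

A clause that expands to nothing makes the whole phrase fail rather than raise an error, which is one reasonable reading of "phrase clauses that produce no matches".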

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...


[jira] Updated: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

     [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood updated LUCENE-1486:
---------------------------------

    Attachment:     (was: ComplexPhraseQueryParser.java)


[jira] Updated: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

     [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood updated LUCENE-1486:
---------------------------------

    Attachment: ComplexPhraseQueryParser.java

Updated to cater for phrase clauses that produce no matches


[jira] Updated: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

     [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood updated LUCENE-1486:
---------------------------------

    Attachment:     (was: TestComplexPhraseQuery.java)


[jira] Updated: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

     [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood updated LUCENE-1486:
---------------------------------

    Attachment: TestComplexPhraseQuery.java

Updated JUnit test to cover phrases with clauses that produce no matches


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697810#action_12697810 ]

Ali Oral commented on LUCENE-1486:
----------------------------------

This issue is very interesting. I see that you use query rewrite for wildcard and fuzzy queries and then convert the results to SpanTermQuery instances. To avoid overflowing the clause limit, couldn't you use MultiPhraseQuery?
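The overflow mentioned here comes from rewriting: each wildcard or fuzzy clause expands into one clause per matching term, and Lucene's BooleanQuery caps the number of clauses (1024 by default, adjustable with BooleanQuery.setMaxClauseCount). The plain-Java sketch below uses hypothetical names and assumes the cap applies to each clause's own expansion. It shows a pre-check a parser could run before building span clauses, so that it fails with a clear message rather than partway through rewriting.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: check each clause's expansion against a clause cap before building the phrase.
public class ClauseBudget {

    // Assumed to mirror Lucene's default BooleanQuery max clause count (1024).
    static final int DEFAULT_MAX_CLAUSES = 1024;

    // Returns the index of the first clause whose expansion exceeds maxClauses, or -1 if all fit.
    static int firstOverflowingClause(List<List<String>> expandedClauses, int maxClauses) {
        for (int i = 0; i < expandedClauses.size(); i++) {
            if (expandedClauses.get(i).size() > maxClauses) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<List<String>> clauses = Arrays.asList(
                Arrays.asList("john", "jonathan", "jones"), // e.g. the expansion of jo*
                Arrays.asList("smith"));
        System.out.println(firstOverflowingClause(clauses, DEFAULT_MAX_CLAUSES)); // -1: everything fits
        System.out.println(firstOverflowingClause(clauses, 2));                   // 0: jo* expands to 3 terms
    }
}
```

The MultiPhraseQuery alternative suggested above takes an array of alternative terms per position (MultiPhraseQuery.add(Term[])) rather than building a BooleanQuery for each clause.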


[jira] Updated: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

     [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ali Oral updated LUCENE-1486:
-----------------------------

    Comment: was deleted

(was: This issue is very interesting. I see that you use query rewrite for wildcard and fuzzy queries and then convert them to spanTermQueries. In order to avoid overflowing clauses can't you use MultiPhrase query?)


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718281#action_12718281 ]

Mark Miller commented on LUCENE-1486:
-------------------------------------

What do you think about this for 2.9, Mark H.?

bq. The implementation feels a little hacky - this is arguably better handled in QueryParser itself. This works as a proof of concept for much of the query parser syntax.

That makes me think we might want to push this to 3.0. Or have you moved beyond that with all of these updates?


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718573#action_12718573 ]

Mark Harwood commented on LUCENE-1486:
--------------------------------------

Perhaps "hacky" was too strong a word. I think it's a reasonable approach to handling the complexity involved in this logic.

A colleague of mine has this running in production on a large installation with lots of users.


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719105#action_12719105 ]

Michael McCandless commented on LUCENE-1486:
--------------------------------------------

Is there some reason not to include this in QueryParser instead? I.e., doesn't it accept a superset of QueryParser's current syntax?


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719115#action_12719115 ]

Mark Harwood commented on LUCENE-1486:
--------------------------------------

The primary reason (and perhaps not a particularly good one) was that I didn't want to wade through the JavaCC grammar in the .jj file that generates QueryParser, and the required extensions could be made in a subclass.

Also, there is invariably a performance hit for supporting things like wildcards in phrase queries. Rather than adding another "off by default" flag to the main parser, plus conditional logic to test whether "wildcards etc. in phrases" are allowed, the subclass can be seen as a specialised extension for those who understand the trade-offs between functionality and performance.

I can sympathise with the purist approach of having all parser syntax defined in JavaCC, though.


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719639#action_12719639 ]

Mark Miller commented on LUCENE-1486:
-------------------------------------

Should this go in contrib rather than core? That seems to have been the approach so far; is there any reason to vary it here?

Well, actually, I see that the multi-field parser is in core. It makes sense to put subclasses there, I guess.

Do you think this is ready to commit, Mark? If so, I should be able to review it (unless you want to commit it yourself).


[jira] Updated: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

     [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-1486:
--------------------------------

    Attachment: LUCENE-1486.patch

Reformatted to Lucene code style, removed the author tag, removed a couple of unused fields, and converted to patch format.

Tests don't pass yet because it doesn't work quite correctly with the new constant-score multi-term queries.


[jira] Assigned: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

     [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller reassigned LUCENE-1486:
-----------------------------------

    Assignee: Mark Miller


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723699#action_12723699 ]

Mark Miller commented on LUCENE-1486:
-------------------------------------

Hey Mark, this doesn't work correctly with the new constant score rewrite mode. I'm hesitant to put something in core that only works with boolean expansion.

I'm not sure what needs to be done (I started and realized my interest wasn't high enough). Could you update this? Otherwise I'm tempted to push it off to 3.0...

Unless another brave soul steps up, of course. Or I may jump back in; my brain is fickle.
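A minimal sketch of why the constant score mode gets in the way, under a simplified model in which Rewritten, TermsOr and ConstantScore are hypothetical types, not Lucene classes. Boolean expansion leaves the individual terms visible. A constant-score rewrite returns something that matches documents but exposes no term list, so there is nothing to convert into positional alternatives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model (not Lucene classes): a rewritten wildcard is either an
// inspectable OR of terms (boolean expansion) or an opaque constant-score
// matcher that exposes no term list.
class RewriteModeSketch {

    interface Rewritten {}

    // Boolean expansion: the concrete terms remain visible to the caller.
    static final class TermsOr implements Rewritten {
        final List<String> terms;
        TermsOr(List<String> terms) { this.terms = terms; }
    }

    // Constant-score rewrite: conceptually matches documents, but hides the terms.
    static final class ConstantScore implements Rewritten {
        final String pattern;
        ConstantScore(String pattern) { this.pattern = pattern; }
    }

    static Rewritten rewritePrefix(String prefix, List<String> vocab, boolean constantScore) {
        if (constantScore) {
            return new ConstantScore(prefix + "*");
        }
        List<String> terms = new ArrayList<>();
        for (String t : vocab) {
            if (t.startsWith(prefix)) {
                terms.add(t);
            }
        }
        return new TermsOr(terms);
    }

    // Turning a phrase element into positional ("span") alternatives only
    // works when the individual terms can be inspected.
    static List<String> toSpanAlternatives(Rewritten r) {
        if (r instanceof TermsOr) {
            return ((TermsOr) r).terms;
        }
        throw new IllegalStateException("cannot inspect terms of a constant-score rewrite");
    }

    public static void main(String[] args) {
        List<String> vocab = Arrays.asList("joe", "john", "smith");
        System.out.println(toSpanAlternatives(rewritePrefix("jo", vocab, false))); // [joe, john]
        try {
            toSpanAlternatives(rewritePrefix("jo", vocab, true));
        } catch (IllegalStateException e) {
            System.out.println("constant-score: " + e.getMessage());
        }
    }
}
```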


[jira] Updated: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood updated LUCENE-1486:
---------------------------------

    Attachment: LUCENE-1486.patch

Added fix for ConstantScoreQuery changes


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723742#action_12723742 ]

Mark Harwood commented on LUCENE-1486:
--------------------------------------

The fix was relatively straightforward from what I could see: just temporarily unset the QueryParser's ConstantScoreRewrite mode during the pass that evaluates the query elements inside phrase queries. These clauses need to resolve to traditional BooleanQuerys full of TermQuery clauses so that they can be inspected and rewritten as their Span equivalents for complex phrases.

Should do the job.

Cheers
Mark
(Been far too busy with other things and missing getting my hands dirty here with Lucene!)
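The approach described above (switch the mode off for the phrase-element pass only, then put it back) can be sketched as follows. The parser class, its constantScoreRewrite flag and its methods are hypothetical stand-ins, not QueryParser's real API. The try/finally makes sure the user's setting is restored even if evaluating an element throws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parser (not Lucene's QueryParser): disable constant-score
// rewriting only while phrase elements are evaluated, then restore the
// caller's setting even if evaluation fails.
class TemporaryRewriteModeSketch {

    private boolean constantScoreRewrite = true; // the user's preferred mode
    private final List<String> vocabulary;

    TemporaryRewriteModeSketch(List<String> vocabulary) {
        this.vocabulary = vocabulary;
    }

    boolean isConstantScoreRewrite() {
        return constantScoreRewrite;
    }

    // Expands "prefix*" into concrete terms, or returns null to stand in for an
    // opaque constant-score query whose terms cannot be inspected.
    private List<String> expandWildcard(String element) {
        if (constantScoreRewrite) {
            return null;
        }
        String prefix = element.substring(0, element.length() - 1);
        List<String> terms = new ArrayList<>();
        for (String t : vocabulary) {
            if (t.startsWith(prefix)) {
                terms.add(t);
            }
        }
        return terms;
    }

    // The pass over the elements inside a phrase: each element must resolve to
    // a visible list of terms so it can later become a span-style OR.
    List<List<String>> evaluatePhraseElements(String[] elements) {
        boolean saved = constantScoreRewrite;
        constantScoreRewrite = false; // temporarily force boolean expansion
        try {
            List<List<String>> result = new ArrayList<>();
            for (String e : elements) {
                result.add(e.endsWith("*") ? expandWildcard(e) : Arrays.asList(e));
            }
            return result;
        } finally {
            constantScoreRewrite = saved; // restore the user's setting
        }
    }

    public static void main(String[] args) {
        TemporaryRewriteModeSketch p =
            new TemporaryRewriteModeSketch(Arrays.asList("joe", "john", "smith"));
        System.out.println(p.evaluatePhraseElements(new String[] {"jo*", "smith"})); // [[joe, john], [smith]]
        System.out.println(p.isConstantScoreRewrite()); // true
    }
}
```

Saving and restoring the flag keeps constant-score rewriting in effect for the rest of the query; only the phrase internals are forced into boolean expansion.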


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723747#action_12723747 ]

Mark Miller commented on LUCENE-1486:
-------------------------------------

Figured that's all it would take. I was just feeling a bit too lazy to try to understand the whole class after having it up in front of me for a few seconds :) Figured I'd try to pawn off a piece. I made some adjustments to the patch last time, but they were basically cosmetic.

Looks like I didn't escape much work this time, though. I'll review and commit shortly.

Thanks a lot.


[jira] Updated: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-1486:
--------------------------------

    Attachment: LUCENE-1486.patch


[jira] Updated: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-1486:
--------------------------------

    Attachment: LUCENE-1486.patch

Whoops, almost let some Java 1.5 slip by: the IllegalArgumentException(String, Throwable) constructor used in throw new IllegalArgumentException(pe.getMessage(), pe) is not available in Java 1.4.

Last patch. I'll commit later today.
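For reference, one Java 1.4-compatible way to keep the original exception as the cause is Throwable.initCause, which has existed since 1.4, whereas the IllegalArgumentException(String, Throwable) constructor only arrived in Java 5. This is a sketch: the nested ParseException class is a hypothetical stand-in, and the attached patch may resolve the issue differently. The code avoids generics and other Java 5 features.

```java
// Java 1.4-compatible exception chaining via Throwable.initCause.
class InitCauseSketch {

    // Hypothetical stand-in for the checked parse exception caught as "pe".
    static class ParseException extends Exception {
        ParseException(String message) {
            super(message);
        }
    }

    static IllegalArgumentException wrap(ParseException pe) {
        IllegalArgumentException iae = new IllegalArgumentException(pe.getMessage());
        iae.initCause(pe); // same effect as the Java 5 two-argument constructor
        return iae;
    }

    public static void main(String[] args) {
        try {
            try {
                throw new ParseException("example parse failure");
            } catch (ParseException pe) {
                throw wrap(pe);
            }
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage() + " (cause is ParseException: "
                + (e.getCause() instanceof ParseException) + ")");
        }
    }
}
```

initCause may only be called once, and only when no cause was set by a constructor, which holds here because the single-argument constructor leaves the cause unset.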
