[jira] Created: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727685#action_12727685 ]

Mark Harwood commented on LUCENE-1486:
--------------------------------------

Hi Mark,
Mind if I try committing this patch?
I've just switched from PC to Mac and my dev environment is all changed (Subclipse vs TortoiseSvn etc) and I wouldn't mind checking my config and commit rights still work in this new environment.
If anyone knows of any Mac/Subclipse-related "gotchas" I should be aware of, do let me know.

Cheers
Mark

> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
>                 Key: LUCENE-1486
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1486
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4
>            Reporter: Mark Harwood
>            Assignee: Mark Miller
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: ComplexPhraseQueryParser.java, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in QueryParser itself. This works as a proof of concept for much of the query parser syntax. Examples from the JUnit test include:
> checkMatches("\"j*   smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
> checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
> checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works.
>
> checkBadQuery("\"jo*  id:1 smith\""); //mixing fields in a phrase is bad
> checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad
> checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported
> Code plus Junit test to follow...

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727692#action_12727692 ]

Mark Miller commented on LUCENE-1486:
-------------------------------------

Please, by all means! :)



[jira] Assigned: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller reassigned LUCENE-1486:
-----------------------------------

    Assignee: Mark Harwood  (was: Mark Miller)



[jira] Closed: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Harwood closed LUCENE-1486.
--------------------------------

    Resolution: Fixed

Committed in 791579 -  http://svn.apache.org/viewvc?rev=791579&view=rev



[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733889#action_12733889 ]

Adriano Crestani commented on LUCENE-1486:
------------------------------------------

Hi,

I'm trying to understand what kind of syntax this query parser supports. I read the code and it does not say much. Is there any documentation (wiki, javadoc, etc.) that specifies the syntax? It's not clear to me.

Thanks in advance,
Adriano Crestani Campos



[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733893#action_12733893 ]

Mark Miller commented on LUCENE-1486:
-------------------------------------

You might check the test class - it has a few basic examples. It's not much different from what's posted in the summary.

Just experiment:

+    checkMatches("\"john smith\"", "1"); // Simple multi-term still works
+    checkMatches("\"j*   smyth~\"", "1,2"); // wildcards and fuzzies are OK in
+    // phrases
+    checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
+    checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works.
+    checkMatches("\"jo* [sma TO smZ]\" ", "1,2"); // range queries supported
+    checkMatches("\"john\"", "1,3"); // Simple single-term still works
+    checkMatches("\"(john OR johathon)  smith\"", "1,2"); // boolean logic with
+    // brackets works.
+    checkMatches("\"(jo* -john) smyth~\"", "2"); // boolean logic with
+    // brackets works.
+
+    // checkMatches("\"john -percival\"", "1"); // not logic doesn't work
+    // currently :(.
+
+    checkMatches("\"john  nosuchword*\"", ""); // phrases with clauses producing
+    // empty sets
+
+    checkBadQuery("\"jo*  id:1 smith\""); // mixing fields in a phrase is bad
+    checkBadQuery("\"jo* \"smith\" \""); // phrases inside phrases is bad



[jira] Updated: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adriano Crestani updated LUCENE-1486:
-------------------------------------

    Attachment: junit_complex_phrase_qp_07_21_2009.patch

Thanks for the quick response Mark!

OK, I'm now trying to figure out what is supported by reading the JUnit tests alone, and I ran into some issues:

What do you mean in the last check by a phrase inside a phrase? I don't see any phrase inside a phrase (I'm also not sure what that would look like, because there are no distinct open and close phrase delimiters). All I see is a phrase <"jo*">, followed by a term <smith> and an empty phrase <" ">. The check passes because the query parser throws an exception complaining about the empty phrase, which does not seem to be supported. When I changed the empty phrase to a valid phrase, the query parsed successfully (so the test case failed). As I said, I'm not sure exactly what you were trying to do there; could you explain it in more detail?

I'm also getting a java.util.ConcurrentModificationException when I type escaped double quotes inside a phrase. I suppose that's not supported, but shouldn't it throw a more meaningful exception?
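
As general background on the exception type mentioned above: the thread does not say where in the parser the ConcurrentModificationException originates, so the following is only a generic Java sketch, not code from ComplexPhraseQueryParser. The class name CmeDemo and the list contents are made up for illustration. It shows the usual way this exception arises (a collection is structurally modified while it is being iterated) and the iterator-based removal that avoids it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Generic illustration of the exception type; NOT code from the query parser.
class CmeDemo {
    /** Removes from the list while a for-each loop is iterating over it. */
    static boolean unsafeRemoveThrows() {
        List<String> clauses = new ArrayList<>(Arrays.asList("jo*", "john", "smith"));
        try {
            for (String clause : clauses) {
                if (clause.equals("jo*")) {
                    // Structural change made behind the iterator's back.
                    clauses.remove(clause);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Removes through the iterator, which keeps the iteration consistent. */
    static List<String> safeRemove() {
        List<String> clauses = new ArrayList<>(Arrays.asList("jo*", "john", "smith"));
        for (Iterator<String> it = clauses.iterator(); it.hasNext();) {
            if (it.next().equals("jo*")) {
                it.remove();
            }
        }
        return clauses;
    }
}
```

Whether the parser hits this pattern is an assumption; the sketch only shows why the exception says little about the query syntax that triggered it.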

I also have an issue with the parse exceptions: if one comes from inside a phrase, it does not report the correct position in the query string. I think it treats the beginning of the phrase as the beginning of the query, and it only prints the phrase that contains the problem.
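
One possible way to report a position relative to the whole query, sketched under the assumption that the phrase contents are parsed separately and that the index of the phrase's opening quote is known. ErrorPosition and toQueryColumn are hypothetical names and are not part of Lucene.

```java
// Hypothetical helper, NOT part of Lucene: maps an error column reported
// relative to a phrase's contents back to a column in the full query string.
class ErrorPosition {
    /**
     * @param phraseStartInQuery 0-based index of the phrase's opening quote
     * @param columnInPhrase     0-based column inside the phrase contents
     * @return 0-based column in the full query string
     */
    static int toQueryColumn(int phraseStartInQuery, int columnInPhrase) {
        // The phrase contents begin one character after the opening quote.
        return phraseStartInQuery + 1 + columnInPhrase;
    }
}
```

For the query foo "jo* id:1 smith", the opening quote is at index 4 and the offending id:1 starts at column 4 inside the phrase, so toQueryColumn(4, 4) gives 9, which is where id:1 starts in the full query string.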

I'm attaching some changes I made to the TestComplexPhraseQuery JUnit test that show the problems I'm getting; I think they are easier to understand if you read and run it.

Sorry for so many questions, but I'm just trying to understand exactly what this query parser does and does not support.

Thanks,
Adriano Crestani Campos



[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733933#action_12733933 ]

Mark Miller commented on LUCENE-1486:
-------------------------------------

You may have to wait for the author, Mark Harwood, to respond. I just reviewed the issue. A couple of points though:

bq. What do you mean on the last check by phrase inside phrase, I don't see any phrase inside a phrase (I'm not sure either what it would be, because there is no open and close phrase delimiter), all I see is a phrase <"jo*">, followed by a term <smith> and an empty phrase <" ">

It's kind of a phrase within a phrase (though the "smith" phrase could be turned into a term query) - unescaped: "jo* "smith" " - the full thing is phrase one, and smith is the inner phrase (though yes, only a term in the phrase).
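
To make the two readings concrete, here is a minimal sketch that pairs unescaped double quotes strictly left to right, which is how Adriano reads the test string. It is an illustration only and not the query parser's tokenizer; QuotePairing, segments and the PHRASE/TEXT/UNCLOSED labels are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT the query parser's tokenizer.
class QuotePairing {
    /** Splits a query on unescaped double quotes, pairing them left to right. */
    static List<String> segments(String query) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inPhrase = false;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\' && i + 1 < query.length()) {
                // Keep an escaped character (for example \") as literal text.
                current.append(c).append(query.charAt(++i));
            } else if (c == '"') {
                if (inPhrase) {
                    // Closing quote: everything since the opening quote is one phrase.
                    out.add("PHRASE<" + current + ">");
                } else if (current.toString().trim().length() > 0) {
                    // Opening quote: flush any text seen outside a phrase.
                    out.add("TEXT<" + current.toString().trim() + ">");
                }
                current.setLength(0);
                inPhrase = !inPhrase;
            } else {
                current.append(c);
            }
        }
        if (inPhrase) {
            out.add("UNCLOSED<" + current + ">");
        } else if (current.toString().trim().length() > 0) {
            out.add("TEXT<" + current.toString().trim() + ">");
        }
        return out;
    }
}
```

Under this pairing, the test string "jo* "smith" " yields a phrase containing jo* and a trailing space, then the bare text smith, then a phrase containing a single space, so no quote is ever nested inside another.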

If Mark Harwood doesn't have time to answer soon, I'll dig in more and respond to your other questions/comments.



[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733940#action_12733940 ]

Michael Busch commented on LUCENE-1486:
---------------------------------------

Looking at the problems Adriano is seeing, it almost seems like this was committed a bit prematurely. It seems like a lot of queries you could enter here are not really supported and might throw strange exceptions.

Maybe it should live in contrib for now (with experimental warnings)?



[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733946#action_12733946 ]

Mark Miller commented on LUCENE-1486:
-------------------------------------

I originally thought it might live in contrib as well (see above), but I'm personally fine with it being in core.

bq.  It seems like a lot of queries you could enter here are not really supported and might throw strange exceptions.

A lot of queries? I think Adriano is just having trouble with phrases inside phrases, which are unsupported. Other things that are not supported might throw exceptions too, but I think that's to be expected? I see what Adriano was talking about now - technically the first two quotes would match, and then the second two. I think Mark H was just demonstrating that you shouldn't try that query though - a user might think they are quoting smith, but for the example, it doesn't matter. I think he was just trying to show that you shouldn't try to "nest" phrases - even though they wouldn't be interpreted that way anyway.

It only supports a limited subset of the Lucene query language - perhaps we could improve the exceptions being thrown, but the exceptions the QueryParser throws often leave just as much to be desired. I don't think it's experimental because of that.

Personally, I think the class does what it intends - allows a limited subset of the Lucene query language in phrases. Though of course it could be improved.

I'll let Mark H respond though. I also don't mind seeing it moved to contrib, but I'm not sure anything glaring points to it being moved at the moment. It lives up to its limited contract I think.



[jira] Issue Comment Edited: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733946#action_12733946 ]

Mark Miller edited comment on LUCENE-1486 at 7/21/09 6:43 PM:
--------------------------------------------------------------

I originally thought it might live in contrib as well (see above), but I'm personally fine with it being in core.

bq.  It seems like a lot of queries you could enter here are not really supported and might throw strange exceptions.

A lot of queries? I think Adriano is just having trouble with phrases inside phrases, which are unsupported. Other things that are not supported might throw exceptions too, but I think that's to be expected? I see what Adriano was talking about now - technically the first two quotes would match, and then the second two. I think Mark H was just demonstrating that you shouldn't try that query though - a user might think they are quoting smith, but for the example, it doesn't matter. I think he was just trying to show that you shouldn't try to "nest" phrases - even though they wouldn't be interpreted that way anyway.

It only supports a limited subset of the Lucene query language - perhaps we could improve the exceptions being thrown, but the exceptions the QueryParser throws often leave just as much to be desired. I don't think it's experimental because of that.

Personally, I think the class does what it intends - allows a limited subset of the Lucene query language in phrases. Though of course it could be improved.

I'll let Mark H respond though. I also don't mind seeing it moved to contrib, but I'm not sure anything glaring points to it being moved at the moment. It lives up to its limited contract I think.




[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733956#action_12733956 ]

Adriano Crestani commented on LUCENE-1486:
------------------------------------------

{quote}
I see what Adriano was talking about now - technically the first 2 quotes would match, and then the second two - I think Mark H was just demonstrating that you shouldn't try that query though - a user might think they are quoting smith, but for the example, it doesn't matter. I think he was just trying to show that you shouldn't try to "nest" phrases - even though they wouldn't be interpreted that way anyway.
{quote}

Well, if you guessed his intention correctly, the comment is misleading: "phrases inside phrases is bad". But let's wait for his response.

{quote}
Other things that are not supported might throw exceptions too
{quote}

I think a user would expect a ParseException. Probably every query parser user catches ParseException and shows a friendly message to the end user. If the query parser starts throwing arbitrary exceptions to signal invalid syntax, every piece of software that uses the Lucene query parser is going to start crashing. For me, it's as if a compiler threw a segmentation fault every time you forgot a } in the code.


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733957#action_12733957 ]

Mark Miller commented on LUCENE-1486:
-------------------------------------

{quote}
I think a user would expect a ParseException. Probably every query parser user catches ParseException and shows a friendly message to the end user. If the query parser starts throwing arbitrary exceptions to signal invalid syntax, every piece of software that uses the Lucene query parser is going to start crashing. For me, it's as if a compiler threw a segmentation fault every time you forgot a } in the code.
{quote}

That's a fair point - addressable though - we can likely catch and rethrow in the worst case.

I'll admit, the ... non exactness ... of this parser troubled me at first - one of the reasons I liked contrib as a landing spot early on. I took it for what it is in the end I suppose. I think the shortfalls brought up so far can be addressed to a large degree though.
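
To make the catch-and-rethrow idea in this comment concrete, here is a minimal sketch. It is not taken from the patch: QuerySyntaxException is a hypothetical stand-in for the parser's checked exception, and SafeParse, parseOrRethrow and the toy parse step are names invented for illustration. It only shows the general pattern of converting unchecked failures into the one checked type that callers already handle.

```java
import java.util.function.Function;

// Hypothetical stand-in for the parser's checked exception; not Lucene's class.
class QuerySyntaxException extends Exception {
    QuerySyntaxException(String message, Throwable cause) {
        super(message, cause);
    }
}

class SafeParse {
    // Runs a parse step and converts any unchecked failure into the checked type,
    // keeping the original exception as the cause for debugging.
    static <T> T parseOrRethrow(String query, Function<String, T> parseStep)
            throws QuerySyntaxException {
        try {
            return parseStep.apply(query);
        } catch (RuntimeException e) {
            throw new QuerySyntaxException(
                    "Cannot parse '" + query + "': " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // Toy parse step (an assumption for this sketch) that fails with an
        // unchecked exception, as an overridden method without a throws clause might.
        Function<String, String> toyStep = q -> {
            if (q.contains("[")) {
                throw new IllegalArgumentException(
                        "range queries inside phrases not supported");
            }
            return q.trim();
        };
        try {
            parseOrRethrow("\"jo* [sma TO smZ]\"", toyStep);
        } catch (QuerySyntaxException e) {
            // The caller sees only the checked exception it already expects.
            System.out.println("Friendly error: " + e.getMessage());
        }
    }
}
```

With a wrapper like this at the outermost parse call, application code that already catches the checked exception would not need to change, whatever the inner methods throw.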


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733958#action_12733958 ]

Mark Miller commented on LUCENE-1486:
-------------------------------------

bq. Well, if you guessed his intention correctly, the comment is misleading: "phrases inside phrases is bad". But let's wait for his response.

I think that's a bit of a judgement call. We know that, the way the query is parsed, you cannot really ever do "phrases inside phrases". However, a user of this parser might think that, like the other syntax, perhaps you can use "phrases inside phrases" - and if you thought that, the example given is likely how you'd imagine it to work: the outside phrase, and then the inside phrase. I certainly agree some comments would clear it up, but I think it's a useful example.


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733961#action_12733961 ]

Adriano Crestani commented on LUCENE-1486:
------------------------------------------

{quote}
I'll admit, the ... non exactness ... of this parser troubled me at first - one of the reasons I liked contrib as a landing spot early on. I took it for what it is in the end I suppose. I think the shortfalls brought up so far can be addressed to a large degree though.
{quote}

I think contrib would be a good place for now, until it gets more stable and better documented.


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733966#action_12733966 ]

Mark Miller commented on LUCENE-1486:
-------------------------------------

bq. I think contrib would be a good place for now, until it gets more stable and better documented.

If Mark H thinks it should be moved, I won't disagree. But I still don't see a convincing reason. It could use some more documentation, but so could quite a few other classes in core. It's something of a subjective call, and more importantly, it can be addressed now.

I'm not yet convinced it's unstable - the only major issue I see so far is the exception issue - but that wouldn't seem to prompt a move to contrib, rather an update to address the concern. Moving to contrib is always an option, but I don't think it's the default move based on what's been brought up. The standard move would be to address whatever issues are brought up ... so far I am just seeing the exception issue as a large one, and I think that is fairly easily addressable.


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733982#action_12733982 ]

Michael Busch commented on LUCENE-1486:
---------------------------------------

{quote}
It only supports a limited subset of the Lucene query language - perhaps we could improve the exceptions being thrown, but the exceptions the queryparser throws often leave just as much to be desired. I don't think it's experimental because of that.
{quote}

Because it only supports a limited subset of the language, I feel like we could have taken a different approach here? Why not add the features that are supported and make sense to the main query parser?

The documentation currently does not tell me what is supported and what is not. And looking through the code, some methods now throw RuntimeExceptions, because the overridden methods themselves don't declare any exceptions. These things feel a bit unfinished.

I'm not saying these issues are not fixable. But maybe we should rethink the design. My biggest concern is that this new parser doesn't seem to have a well-defined syntax. So since it doesn't check if a query is actually valid or not, it might be hard to maintain. E.g. if you add new language features to the main QP, it's currently not defined what will happen if you use them with this one.

That's why I'm proposing to move it to contrib and mark it as experimental. Then we have more time to decide if the approach of adding the new features to the main QP makes more sense.


[jira] Commented: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734015#action_12734015 ]

Luis Alves commented on LUCENE-1486:
------------------------------------

I share the same opinion as Michael:
the implementation has a lot of undefined/undocumented behaviors,
simply because it reuses the queryparser to parse the text inside a phrase.
All of the Lucene syntax needs to be accounted for in this design, but that does not seem to be the case.

Examples are the problems Adriano described: phrases inside a phrase, and position reporting for errors.

I also have a lot of concerns about allowing the full Lucene syntax inside phrases;
trying to restrict it by throwing exceptions for particular cases does not seem like the best design.

Here is an example with OR, AND and parentheses combined with a proximity search:
"(( jakarta OR green) AND (blue AND orange)  AND black~2) apache"~10

What should a user expect from this query without looking at the code? I'm not sure.
Does it even make sense to support this complex syntax? In my opinion, no.

I think we should define the subset of the language we want to support inside phrases, with well-defined behavior.
If Mark describes all the syntax he wants to support inside phrases, I actually don't mind implementing a new parser for this.

My view is that contrib is probably a better place for this code until we figure out an implementation that does not impose as many restrictions on changes to the original queryparser and that describes a well-defined syntax to be applied inside phrases.
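
As a rough illustration of what an explicitly defined subset could look like, the sketch below checks the text inside a phrase before any parsing happens and reports the offset of the first construct outside the subset. The rules are assumptions drawn only from the checkBadQuery examples in the issue description (no nested quotes, no field prefixes, no range brackets); PhraseSubsetCheck and rejectReason are hypothetical names, and this is not how the patch works.

```java
import java.util.Optional;

class PhraseSubsetCheck {
    // Returns a reason, including the character offset, when the phrase body
    // uses syntax outside the assumed subset; returns empty when it looks fine.
    static Optional<String> rejectReason(String phraseBody) {
        for (int i = 0; i < phraseBody.length(); i++) {
            char c = phraseBody.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character that follows the backslash
                continue;
            }
            if (c == '"') {
                return Optional.of("nested quote at offset " + i);
            }
            if (c == ':') {
                return Optional.of("field prefix at offset " + i);
            }
            if (c == '[' || c == ']' || c == '{' || c == '}') {
                return Optional.of("range syntax at offset " + i);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Phrase bodies taken from the JUnit examples in the issue description.
        String[] bodies = {
            "j*   smyth~",
            "(jo* -john)  smith",
            "jo*  id:1 smith",
            "jo* \"smith\" ",
            "jo* [sma TO smZ]"
        };
        for (String body : bodies) {
            System.out.println(body + " -> "
                    + rejectReason(body).orElse("accepted"));
        }
    }
}
```

A check like this would also give error messages a position to point at, which is one of the gaps mentioned above. Whether boolean operators and nested parentheses belong in the subset is exactly the open question, so the sketch leaves them alone.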




[jira] Updated: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries


     [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luis Alves updated LUCENE-1486:
-------------------------------

    Attachment: junit_complex_phrase_qp_07_22_2009.patch

I added 2 test cases that return doc 3 but do not make much sense, just to prove the point that we need more docs describing the use case for the complex phrase qp and need to define which subset of the syntax we want to support.

        checkMatches("\"(goos~0.5 AND (mike OR smith) AND NOT ( percival AND john) ) vacation\"~3","3"); // proximity with fuzzy, OR, AND, NOT
        checkMatches("\"(goos~0.5 AND (mike OR smith) AND ( percival AND john) ) vacation\"~3","3"); // proximity with fuzzy, OR, AND



[jira] Issue Comment Edited: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries


    [ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734141#action_12734141 ]

Luis Alves edited comment on LUCENE-1486 at 7/22/09 7:55 AM:
-------------------------------------------------------------

I added 2 test cases that return doc 3.
These queries do not make much sense;
I added them just to prove the point that we need more information
describing the use case for the complex phrase qp.
We should also define the subset of the syntax we want to support inside phrases,
with well-defined behaviors.

        checkMatches("\"(goos~0.5 AND (mike OR smith) AND NOT ( percival AND john) ) vacation\"~3","3"); // proximity with fuzzy, OR, AND, NOT
        checkMatches("\"(goos~0.5 AND (mike OR smith) AND ( percival AND john) ) vacation\"~3","3"); // proximity with fuzzy, OR, AND


