[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729532#action_12729532 ]

David Sitsky commented on LUCENE-1567:
--------------------------------------

I will be out of the office on Friday, 10th of July.

--
Cheers,
David

Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia
Ph: +61 2 9280 0699
Web: http://www.nuix.com
Fax: +61 2 9212 6902

> New flexible query parser
> -------------------------
>
> Key: LUCENE-1567
> URL: https://issues.apache.org/jira/browse/LUCENE-1567
> Project: Lucene - Java
> Issue Type: New Feature
> Components: QueryParser
> Environment: N/A
> Reporter: Luis Alves
> Assignee: Grant Ingersoll
> Attachments: lucene_trunk_FlexQueryParser_2009July09_v4.patch,
>   lucene_trunk_FlexQueryParser_2009March24.patch,
>   lucene_trunk_FlexQueryParser_2009March26_v3.patch,
>   new_query_parser_src.tar,
>   QueryParser_restructure_meetup_june2009_v2.pdf
>
> From the "New flexible query parser" thread by Michael Busch:
>
> In my team at IBM we have used a different query parser than Lucene's in
> our products for quite a while. Recently we spent a significant amount
> of time refactoring the code and designing a very generic architecture,
> so that this query parser can easily be used for different products with
> varying query syntaxes.
>
> This work was originally driven by Andreas Neumann (who, however, left
> our team); most of the code was written by Luis Alves, who has been a
> bit active in Lucene in the past, and Adriano Campos, who joined our
> team at IBM half a year ago. Adriano is an Apache committer and PMC
> member on the Tuscany project and is getting familiar with Lucene now
> too.
>
> We think this code is much more flexible and extensible than the current
> Lucene query parser, and would therefore like to contribute it to
> Lucene. I'd like to give a very brief architecture overview here;
> Adriano and Luis can then answer more detailed questions, as they're
> much more familiar with the code than I am.
>
> The goal was to separate the syntax and the semantics of a query. E.g.
> 'a AND b', '+a +b' and 'AND(a,b)' could be different syntaxes for the
> same query. We distinguish the semantics of the different query
> components, e.g. whether and how to tokenize/lemmatize/normalize the
> different terms, or which Query objects to create for the terms. We
> wanted to be able to write a parser with a new syntax, while reusing the
> underlying semantics, as quickly as possible.
>
> In fact, Adriano is currently working on a 100% Lucene-syntax compatible
> implementation to make it easy for people who are using Lucene's query
> parser to switch.
>
> The query parser has three layers and its core is what we call the
> QueryNodeTree. It is a tree that initially represents the syntax of the
> original query, e.g. for 'a AND b':
>
>     AND
>    /   \
>   A     B
>
> The three layers are:
> 1. QueryParser
> 2. QueryNodeProcessor
> 3. QueryBuilder
>
> 1. The upper layer is the parsing layer, which simply transforms the
> query text string into a QueryNodeTree. Currently our implementations of
> this layer use javacc.
>
> 2. The query node processors do most of the work. This layer is in fact
> a configurable chain of processors. Each processor can walk the tree and
> modify nodes or even the tree's structure. That makes it possible, for
> example, to optimize the query before it is executed or to tokenize
> terms.
>
> 3. The third layer is also a configurable chain of builders, which
> transform the QueryNodeTree into Lucene Query objects.
>
> Furthermore, the query parser uses flexible configuration objects, which
> are based on AttributeSource/Attribute. It also uses message classes
> that allow resource bundles to be attached. This makes it possible to
> translate messages, which is an important feature of a query parser.
>
> This design allows us to develop different query syntaxes very quickly.
> Adriano wrote the Lucene-compatible syntax in a matter of hours, and the
> underlying processors and builders in a few days. We now have a 100%
> compatible Lucene query parser, which means the syntax is identical and
> all query parser test cases also pass on the new one, using a wrapper.
>
> Recent posts show that there is demand for query syntax improvements,
> e.g. improved range query syntax or operator precedence. There are
> already different QP implementations in Lucene+contrib; however, I think
> we did not keep them all up to date and in sync. This is not too
> surprising, because usually when fixes and changes are made to the main
> query parser, people don't make the corresponding changes in the contrib
> parsers. (I'm guilty here too.)
>
> With this new architecture it will be much easier to maintain different
> query syntaxes, as there is not much code in the first layer. All
> syntaxes would benefit from patches and improvements we make to the
> underlying layers, which will make supporting different syntaxes much
> more manageable.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...
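To make the three-layer idea from the issue description concrete, here is a minimal Java sketch. It is not the code from the attached patches: `Node`, `TextParser`, `NodeProcessor`, `Builder` and `FlexParserSketch` are hypothetical names invented for illustration, and the builder produces a plain string as a stand-in for a Lucene Query object. It only shows two syntaxes ('a AND b' and '+a +b') feeding the same processor and builder.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a node of the "QueryNodeTree" (not the patch's class).
final class Node {
    final String label;        // operator name ("AND") or the term text
    final List<Node> children; // empty for term nodes

    Node(String label, Node... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }
}

// Layer 1: turns query text into a syntax tree.
interface TextParser { Node parse(String query); }

// Layer 2: rewrites the tree (normalization, optimization, ...).
interface NodeProcessor { Node process(Node tree); }

// Layer 3: turns the final tree into the target query representation.
interface Builder<Q> { Q build(Node tree); }

final class FlexParserSketch {

    // Syntax A: "a AND b"
    static final TextParser INFIX =
        query -> conjunction(query.trim().split("\\s+AND\\s+"));

    // Syntax B: "+a +b" (assumes every token starts with '+')
    static final TextParser PREFIX_PLUS = query -> {
        String[] parts = query.trim().split("\\s+");
        String[] terms = new String[parts.length];
        for (int i = 0; i < parts.length; i++) {
            terms[i] = parts[i].substring(1); // strip the leading '+'
        }
        return conjunction(terms);
    };

    // Both syntaxes end up with the same tree shape: AND(term, term, ...).
    static Node conjunction(String[] terms) {
        Node[] kids = new Node[terms.length];
        for (int i = 0; i < terms.length; i++) {
            kids[i] = new Node(terms[i]);
        }
        return new Node("AND", kids);
    }

    // A processor that lower-cases every term node and keeps operators as-is.
    static final NodeProcessor LOWERCASE = new NodeProcessor() {
        public Node process(Node n) {
            if (n.children.isEmpty()) {
                return new Node(n.label.toLowerCase(Locale.ROOT));
            }
            Node[] kids = new Node[n.children.size()];
            for (int i = 0; i < kids.length; i++) {
                kids[i] = process(n.children.get(i));
            }
            return new Node(n.label, kids);
        }
    };

    // A builder that renders the conjunction as "+a +b"; a real builder
    // would create Lucene Query objects instead of a string.
    static final Builder<String> TO_STRING = tree -> {
        StringBuilder sb = new StringBuilder();
        for (Node child : tree.children) {
            if (sb.length() > 0) sb.append(' ');
            sb.append('+').append(child.label);
        }
        return sb.toString();
    };

    // Runs the pipeline: parse, then process, then build.
    static String run(TextParser parser, String query) {
        Node tree = parser.parse(query);
        tree = LOWERCASE.process(tree);
        return TO_STRING.build(tree);
    }

    public static void main(String[] args) {
        System.out.println(run(INFIX, "Apache AND Lucene"));     // +apache +lucene
        System.out.println(run(PREFIX_PLUS, "+Apache +Lucene")); // +apache +lucene
    }
}
```

Only the first stage differs between the two calls in `main`; the processor and builder are shared, which is the maintenance benefit the description argues for.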
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729551#action_12729551 ]

Michael Busch commented on LUCENE-1567:
---------------------------------------

Luis,

I think you need to modify the main build.xml, because the query parser contrib uses Java 1.5. For an example, look at the build.xml from Lucene 2.2.x. It had a contrib called gdata, which used JRE 1.5. (This was removed after 2.2, so you won't find it in the current build.xml anymore.)

Currently the build fails if the user runs JRE 1.4; it should instead skip the new query parser contrib. You can use this property, which is defined in common-build.xml:

{code}
<condition property="build-1-5-contrib">
  <equals arg1="1.5" arg2="${ant.java.version}" />
</condition>
{code}
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729815#action_12729815 ]

Adriano Crestani commented on LUCENE-1567:
------------------------------------------

Hi Luis,

I have been improving the code documentation lately. I will merge my diff with your new patch and submit the changes soon. I could also merge with trunk; it depends on when Luis's last patch is committed.

{quote}
Adriano when you have some time, can you write an interface for simple usage of the new QueryParser, and a simple implementation of the interface, that creates a textparser, creates a processor pipeline, and instantiates the Lucene builders?
{quote}

Good idea Luis! I was thinking about a class that would allow query parser implementors to "bundle" their processor, text parser and builder in it, so the user could simply use it without needing to know how it is implemented. I think the class should contain a method parse(String defaultField, String queryString) that returns whatever that query parser creates from it; in Lucene's case, a Query object. It should also have some setters and getters to access the internal processor, builder and text parser, if the user wishes to. I'm going to work more on the design and submit a patch containing it soon.

{quote}
And please add a simple junit that demonstrates the usage of that interface and ideally some documentation into the package.html of the new contrib package that will help users who want to use the queryparser to get started.
{quote}

I was also thinking about a wiki page that would guide Lucene users through migrating to the new query parser using this new interface. More suggestions?
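The "bundle" class proposed in this comment could look roughly like the sketch below. This is an assumption-laden illustration, not the class that was later submitted: `ParserHelperSketch`, `ParserHelperDemo` and the toy stages are hypothetical, the three stages are modeled with `java.util.function` types, and only the `parse(String defaultField, String queryString)` signature and the idea of getters and setters come from the comment itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/**
 * Hypothetical facade that bundles a text parser, a processor and a builder.
 * T is the intermediate tree type, Q the type of the resulting query.
 */
final class ParserHelperSketch<T, Q> {
    private BiFunction<String, String, T> textParser; // (defaultField, queryString) -> tree
    private UnaryOperator<T> processor;               // tree -> rewritten tree
    private Function<T, Q> builder;                   // tree -> query

    ParserHelperSketch(BiFunction<String, String, T> textParser,
                       UnaryOperator<T> processor,
                       Function<T, Q> builder) {
        this.textParser = textParser;
        this.processor = processor;
        this.builder = builder;
    }

    /** Runs the three stages in order and returns whatever the builder creates. */
    Q parse(String defaultField, String queryString) {
        T tree = textParser.apply(defaultField, queryString);
        tree = processor.apply(tree);
        return builder.apply(tree);
    }

    // Accessors so that callers can inspect or swap individual stages.
    BiFunction<String, String, T> getTextParser() { return textParser; }
    void setTextParser(BiFunction<String, String, T> p) { this.textParser = p; }
    UnaryOperator<T> getProcessor() { return processor; }
    void setProcessor(UnaryOperator<T> p) { this.processor = p; }
    Function<T, Q> getBuilder() { return builder; }
    void setBuilder(Function<T, Q> b) { this.builder = b; }
}

final class ParserHelperDemo {

    // A toy bundle: the "tree" is a flat list of field:term strings and the
    // "query" is their space-separated rendering.
    static ParserHelperSketch<List<String>, String> simpleHelper() {
        return new ParserHelperSketch<List<String>, String>(
            (field, text) -> {
                List<String> terms = new ArrayList<>();
                for (String token : text.trim().split("\\s+")) {
                    // Tokens without an explicit field get the default field.
                    terms.add(token.contains(":") ? token : field + ":" + token);
                }
                return terms;
            },
            terms -> {
                List<String> out = new ArrayList<>();
                for (String t : terms) out.add(t.toLowerCase(Locale.ROOT));
                return out;
            },
            terms -> String.join(" ", terms));
    }

    public static void main(String[] args) {
        ParserHelperSketch<List<String>, String> helper = simpleHelper();
        System.out.println(helper.parse("body", "Hello title:World")); // body:hello title:world

        // Swap the processing stage without touching parser or builder.
        helper.setProcessor(UnaryOperator.identity());
        System.out.println(helper.parse("body", "Hello title:World")); // body:Hello title:World
    }
}
```

A caller only needs `parse`; the setters exist for the case the comment mentions, where a user wants to reach the internal processor, builder or text parser.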
|
|
[jira] Updated: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luis Alves updated LUCENE-1567:
-------------------------------

Attachment: lucene_trunk_FlexQueryParser_2009July10_v5.patch

Fix for JDK 1.4 in build.xml.
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729935#action_12729935 ]

Luis Alves commented on LUCENE-1567:
------------------------------------

Hi Michael,

> For an example look into the build.xml from Lucene 2.2.x.

The Ant file for that Lucene 2.2 module does not follow the Lucene convention and uses a complex implementation, so I fixed the problem in a different way: I renamed contrib/queryparser/build.xml to build15.xml, and I changed contrib-crawl to include build15.xml when a JDK 1.5 is present.

I tested the default, build-contrib and javadocs-contrib targets; all work fine.

I just uploaded patch v5 with this fix.
|
|
[jira] Updated: (LUCENE-1567) New flexible query parser[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adriano Crestani updated LUCENE-1567: ------------------------------------- Attachment: lucene_1567_adriano_crestani_07_13_2009.patch Hey guys, Here is a patch containing some changes I did on top of last Luis' patch ( lucene_trunk_FlexQueryParser_2009July10_v5.patch): - javadoc reviewed and improved - 2 new classes: QueryParserHelper and LuceneQueryParserHelper, they make it easier to use the new query parser - added the ability to set the prefix length for fuzzy queries, it was still missing in the new query parser - resolved some TODOs - AnalyzerQueryNodeProcessor is now using only the new TokenStream API...is it required to be compatible with the old API even if it is in contrib? - I duplicated the test cases so they run using the query parser API directly, the query parser helpers and the query parser wrappers, this way we test the three ways the user can actually use the query parser. I think that is everything. I will keep reviewing and improving the documentation, I think there might be some broken javadoc links yet. I also would like to rename the package and everythiing else that does reference to "lucene2" to "lucene". I think it does not make sense to have a package name tied to a version. So, the package org.apache.lucene.queryParser.lucene2 would be renamed to org.apache.lucene.queryParser.lucene. I know it's kind of weird, because there are 2 "lucene" in the package declararion, but I think it's better than "lucene2". Anyway, suggestions about this are welcome :) ... if nobody replies I will feel free to rename it and submit a new patch soon. 
I will also work on writing a documentation for Lucene wiki that explains how to easily migrate from the old query parser to the new one, but I will only add it to the wiki when the code is committed to the trunk, it doesn't make sense a wiki documentation about something that is not even committed, agreed? Suggestions? Regards, Adriano Crestani Campos > New flexible query parser > ------------------------- > > Key: LUCENE-1567 > URL: https://issues.apache.org/jira/browse/LUCENE-1567 > Project: Lucene - Java > Issue Type: New Feature > Components: QueryParser > Environment: N/A > Reporter: Luis Alves > Assignee: Grant Ingersoll > Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf > > > From "New flexible query parser" thread by Micheal Busch > in my team at IBM we have used a different query parser than Lucene's in > our products for quite a while. Recently we spent a significant amount > of time in refactoring the code and designing a very generic > architecture, so that this query parser can be easily used for different > products with varying query syntaxes. > This work was originally driven by Andreas Neumann (who, however, left > our team); most of the code was written by Luis Alves, who has been a > bit active in Lucene in the past, and Adriano Campos, who joined our > team at IBM half a year ago. Adriano is Apache committer and PMC member > on the Tuscany project and getting familiar with Lucene now too. > We think this code is much more flexible and extensible than the current > Lucene query parser, and would therefore like to contribute it to > Lucene. 
> I'd like to give a very brief architecture overview here; Adriano and Luis can then answer more detailed questions, as they're much more familiar with the code than I am.
>
> The goal was to separate the syntax and semantics of a query. E.g. 'a AND b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. We distinguish the semantics of the different query components, e.g. whether and how to tokenize/lemmatize/normalize the different terms or which Query objects to create for the terms. We wanted to be able to write a parser with a new syntax, while reusing the underlying semantics, as quickly as possible.
>
> In fact, Adriano is currently working on a 100% Lucene-syntax compatible implementation to make it easy for people who are using Lucene's query parser to switch.
>
> The query parser has three layers and its core is what we call the QueryNodeTree. It is a tree that initially represents the syntax of the original query, e.g. for 'a AND b':
>
>    AND
>   /   \
>  A     B
>
> The three layers are:
> 1. QueryParser
> 2. QueryNodeProcessor
> 3. QueryBuilder
>
> 1. The upper layer is the parsing layer, which simply transforms the query text string into a QueryNodeTree. Currently our implementations of this layer use javacc.
>
> 2. The query node processors do most of the work. This layer is in fact a configurable chain of processors. Each processor can walk the tree and modify nodes or even the tree's structure. That makes it possible to e.g. do query optimization before the query is executed or to tokenize terms.
>
> 3. The third layer is also a configurable chain of builders, which transform the QueryNodeTree into Lucene Query objects.
>
> Furthermore the query parser uses flexible configuration objects, which are based on AttributeSource/Attribute. It also uses message classes that allow resource bundles to be attached. This makes it possible to translate messages, which is an important feature of a query parser.
>
> This design allows us to develop different query syntaxes very quickly. Adriano wrote the Lucene-compatible syntax in a matter of hours, and the underlying processors and builders in a few days. We now have a 100% compatible Lucene query parser, which means the syntax is identical and all query parser test cases pass on the new one too using a wrapper.
>
> Recent posts show that there is demand for query syntax improvements, e.g. improved range query syntax or operator precedence. There are already different QP implementations in Lucene+contrib, however I think we did not keep them all up to date and in sync. This is not too surprising, because usually when fixes and changes are made to the main query parser, people don't make the corresponding changes in the contrib parsers. (I'm guilty here too)
>
> With this new architecture it will be much easier to maintain different query syntaxes, as the actual code for the first layer is not very much. All syntaxes would benefit from patches and improvements we make to the underlying layers, which will make supporting different syntaxes much more manageable.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...
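Editorial sketch: the three layers described in the quoted issue text (syntax parser, processor chain, builder) can be illustrated with a toy pipeline. Everything below is hypothetical and is not the API of the attached patches: the names FlexParserSketch, Node, SyntaxParser, Processor and Builder are invented for illustration, the "query" produced at the end is a plain string rather than a Lucene Query object, and a single builder stands in for the configurable chain of builders. It only aims to show how two syntaxes ('a AND b' and '+a +b') could share the same processors and builder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FlexParserSketch {

    /** Hypothetical stand-in for a node of the query node tree. */
    public static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    /** Layer 1: turns query text into a tree (syntax only). */
    public interface SyntaxParser { Node parse(String text); }

    /** Layer 2: one link in the processor chain; returns a (possibly new) tree. */
    public interface Processor { Node process(Node root); }

    /** Layer 3: turns the processed tree into the final query (a string here). */
    public interface Builder { String build(Node root); }

    /** Toy infix syntax: "a AND b". */
    public static Node parseInfix(String text) {
        Node and = new Node("AND");
        for (String term : text.trim().split("\\s+AND\\s+")) {
            and.children.add(new Node(term));
        }
        return and;
    }

    /** Toy prefix syntax: "+a +b". */
    public static Node parsePlus(String text) {
        Node and = new Node("AND");
        for (String term : text.trim().split("\\s+")) {
            and.children.add(new Node(term.startsWith("+") ? term.substring(1) : term));
        }
        return and;
    }

    /** Example processor: lowercases leaf terms, leaves operator nodes alone. */
    public static Node lowercase(Node node) {
        if (node.children.isEmpty()) {
            return new Node(node.label.toLowerCase(Locale.ROOT));
        }
        Node copy = new Node(node.label);
        for (Node child : node.children) {
            copy.children.add(lowercase(child));
        }
        return copy;
    }

    /** Example builder: renders a flat AND node as required clauses. */
    public static String build(Node node) {
        if (node.children.isEmpty()) {
            return node.label;
        }
        StringBuilder sb = new StringBuilder();
        for (Node child : node.children) {
            if (sb.length() > 0) sb.append(' ');
            sb.append('+').append(build(child));
        }
        return sb.toString();
    }

    /** Runs the three layers in order: parse, process, build. */
    public static String run(SyntaxParser parser, List<Processor> chain,
                             Builder builder, String text) {
        Node tree = parser.parse(text);
        for (Processor p : chain) {
            tree = p.process(tree);
        }
        return builder.build(tree);
    }

    public static void main(String[] args) {
        List<Processor> chain = new ArrayList<>();
        chain.add(FlexParserSketch::lowercase);
        // Two different syntaxes, same processors and builder.
        System.out.println(run(FlexParserSketch::parseInfix, chain, FlexParserSketch::build, "A AND B"));
        System.out.println(run(FlexParserSketch::parsePlus, chain, FlexParserSketch::build, "+A +B"));
    }
}
```

In this sketch only the first layer differs between the two syntaxes, which is the maintenance argument made in the issue description: a fix to a processor or builder would apply to every syntax that feeds it.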
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730627#action_12730627 ]

Adriano Crestani commented on LUCENE-1567:
------------------------------------------

Ah, I also couldn't run "ant build-contrib" using Java 1.4, it fails, I even tried a clean trunk and it did not work. Were you able to run it using 1.4 Luis? I already opened a thread on the ML about this: http://markmail.org/thread/3fyldf7t423fhwbm
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730641#action_12730641 ]

Adriano Crestani commented on LUCENE-1567:
------------------------------------------

{quote}
Ah, I also couldn't run "ant build-contrib" using Java 1.4, it fails, I even tried a clean trunk and it did not work. Were you able to run it using 1.4 Luis? I already opened a thread on the ML about this: http://markmail.org/thread/3fyldf7t423fhwbm
{quote}

Mark Miller just replied to the thread and, based on his response, there is no need for contrib projects to be able to compile using JDK 1.4. So, Luis, could you roll back the changes you made to the build files?

Thanks,
Adriano Crestani Campos
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730653#action_12730653 ]

Mark Miller commented on LUCENE-1567:
-------------------------------------

Hang on a sec - it sounds like the target was 1.4 because this was going to replace a 1.4 core piece of functionality. I don't know that all of the details are fully straightened out though.

1. I'm not pro moving the QueryParser to contrib myself, unless we actually move forward on that 'modules' thread - if not, it doesn't appear very helpful to me.

2. If we move this to contrib, perhaps it can be 1.5? But then in 3.0, can we have 1.5 already? Or is that 3.1? If it's 3.1, then if we remove the deprecated query parser in 3.0, you won't have a Java 1.4 replacement to move to (of course we could keep the old QueryParser till 4.0 ... ).

I'm not clear that we can't add new functionality to 3.0 though. I know Mike has mentioned it, but I can't find where it says that - I just see that we can remove deprecations, not that we can't also add new features. I may be missing something though?

We should get things fully straightened out before you spend too much time switching between 1.4 and 1.5 though.
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730721#action_12730721 ]

Michael Busch commented on LUCENE-1567:
---------------------------------------

Mark, it seems like the best thing to do here is to add this as a 1.5 contrib for now and deprecate the core query parser. Then in 3.0 we would move the new one into core and remove the old one entirely. Since it will remain in the same package users won't have to change their code, just while they use 2.9 they have to put an extra jar in their classpath.

Looking at the latest patch, that's what it does (new one to contrib while deprecating old one).
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730735#action_12730735 ] Luis Alves commented on LUCENE-1567: ------------------------------------ Adriano, I will rollback the build.xml changes tomorrow, and use the convention that the "spatial" and "fast-vector-highlighter" modules use. On the package name "lucene2": I think during the Lucene 3.X development more parsers will be added to the QueryParser, and these parsers will also be lucene parsers and we will need different names. It is probably better to keep lucene2 on the package name, or use a name that makes a reference to the old queryparser. For example, in the future we could have: org.apache.lucene.queryParser.lucene2 <- lucene 2.X syntax org.apache.lucene.queryParser.lucene3 <- lucene 3.X syntax org.apache.lucene.queryParser.xml <- some XML syntax org.apache.lucene.queryParser.luceneBoolean <- boolean syntax org.apache.lucene.queryParser.explicit <- explict query language syntax I'll also help on the when wiki the code is committed to the trunk. > New flexible query parser > ------------------------- > > Key: LUCENE-1567 > URL: https://issues.apache.org/jira/browse/LUCENE-1567 > Project: Lucene - Java > Issue Type: New Feature > Components: QueryParser > Environment: N/A > Reporter: Luis Alves > Assignee: Grant Ingersoll > Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf > > > From "New flexible query parser" thread by Micheal Busch > in my team at IBM we have used a different query parser than Lucene's in > our products for quite a while. 
Recently we spent a significant amount > of time in refactoring the code and designing a very generic > architecture, so that this query parser can be easily used for different > products with varying query syntaxes. > This work was originally driven by Andreas Neumann (who, however, left > our team); most of the code was written by Luis Alves, who has been a > bit active in Lucene in the past, and Adriano Campos, who joined our > team at IBM half a year ago. Adriano is Apache committer and PMC member > on the Tuscany project and getting familiar with Lucene now too. > We think this code is much more flexible and extensible than the current > Lucene query parser, and would therefore like to contribute it to > Lucene. I'd like to give a very brief architecture overview here, > Adriano and Luis can then answer more detailed questions as they're much > more familiar with the code than I am. > The goal was it to separate syntax and semantics of a query. E.g. 'a AND > b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. > We distinguish the semantics of the different query components, e.g. > whether and how to tokenize/lemmatize/normalize the different terms or > which Query objects to create for the terms. We wanted to be able to > write a parser with a new syntax, while reusing the underlying > semantics, as quickly as possible. > In fact, Adriano is currently working on a 100% Lucene-syntax compatible > implementation to make it easy for people who are using Lucene's query > parser to switch. > The query parser has three layers and its core is what we call the > QueryNodeTree. It is a tree that initially represents the syntax of the > original query, e.g. for 'a AND b': > AND > / \ > A B > The three layers are: > 1. QueryParser > 2. QueryNodeProcessor > 3. QueryBuilder > 1. The upper layer is the parsing layer which simply transforms the > query text string into a QueryNodeTree. Currently our implementations of > this layer use javacc. > 2. 
The query node processors do most of the work. It is in fact a > configurable chain of processors. Each processors can walk the tree and > modify nodes or even the tree's structure. That makes it possible to > e.g. do query optimization before the query is executed or to tokenize > terms. > 3. The third layer is also a configurable chain of builders, which > transform the QueryNodeTree into Lucene Query objects. > Furthermore the query parser uses flexible configuration objects, which > are based on AttributeSource/Attribute. It also uses message classes that > allow to attach resource bundles. This makes it possible to translate > messages, which is an important feature of a query parser. > This design allows us to develop different query syntaxes very quickly. > Adriano wrote the Lucene-compatible syntax in a matter of hours, and the > underlying processors and builders in a few days. We now have a 100% > compatible Lucene query parser, which means the syntax is identical and > all query parser test cases pass on the new one too using a wrapper. > Recent posts show that there is demand for query syntax improvements, > e.g improved range query syntax or operator precedence. There are > already different QP implementations in Lucene+contrib, however I think > we did not keep them all up to date and in sync. This is not too > surprising, because usually when fixes and changes are made to the main > query parser, people don't make the corresponding changes in the contrib > parsers. (I'm guilty here too) > With this new architecture it will be much easier to maintain different > query syntaxes, as the actual code for the first layer is not very much. > All syntaxes would benefit from patches and improvements we make to the > underlying layers, which will make supporting different syntaxes much > more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
--------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscribe@... For additional commands, e-mail: java-dev-help@... |
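The three-layer design described in the issue text above can be illustrated with a small, self-contained sketch. This is not code from the attached patches: the class FlexParserSketch, its Node type, and every method name below are hypothetical, and the "builder" only renders a string as a stand-in for real Lucene Query objects. It assumes two toy syntaxes ('a AND b' and '+a +b') and shows that both can feed the same processor chain and builder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the three layers; none of these names come from the patch.
final class FlexParserSketch {

    // One node of the query tree: a term (leaf) or an operator with children.
    static final class Node {
        final String label;        // term text, or operator name such as "AND"
        final List<Node> children; // empty for terms
        final boolean term;

        Node(String label, List<Node> children, boolean term) {
            this.label = label;
            this.children = children;
            this.term = term;
        }

        static Node term(String text) {
            return new Node(text, new ArrayList<Node>(), true);
        }
    }

    // Layer 1, syntax A: infix conjunction, e.g. "a AND b".
    static Node parseInfix(String text) {
        List<Node> terms = new ArrayList<>();
        for (String part : text.trim().split("\\s+AND\\s+")) {
            terms.add(Node.term(part));
        }
        return new Node("AND", terms, false);
    }

    // Layer 1, syntax B: "+a +b". Assumes every token starts with '+'.
    static Node parsePlus(String text) {
        List<Node> terms = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            terms.add(Node.term(part.substring(1)));
        }
        return new Node("AND", terms, false);
    }

    // Layer 2, processor: normalizes term text (modifies nodes).
    static Node lowercaseTerms(Node node) {
        if (node.term) {
            return Node.term(node.label.toLowerCase(Locale.ROOT));
        }
        List<Node> kids = new ArrayList<>();
        for (Node child : node.children) {
            kids.add(lowercaseTerms(child));
        }
        return new Node(node.label, kids, false);
    }

    // Layer 2, processor: drops an operator that has one child (modifies structure).
    static Node collapseSingleChild(Node node) {
        if (node.term) {
            return node;
        }
        List<Node> kids = new ArrayList<>();
        for (Node child : node.children) {
            kids.add(collapseSingleChild(child));
        }
        return kids.size() == 1 ? kids.get(0) : new Node(node.label, kids, false);
    }

    // Layer 3, builder: renders the tree as text instead of building a Lucene Query.
    static String build(Node node) {
        if (node.term) {
            return node.label;
        }
        StringBuilder out = new StringBuilder(node.label).append('(');
        for (int i = 0; i < node.children.size(); i++) {
            if (i > 0) {
                out.append(',');
            }
            out.append(build(node.children.get(i)));
        }
        return out.append(')').toString();
    }

    // Runs parse -> processor chain -> build for any syntax.
    static String run(Function<String, Node> parser, String text) {
        List<UnaryOperator<Node>> chain = new ArrayList<>();
        chain.add(FlexParserSketch::lowercaseTerms);
        chain.add(FlexParserSketch::collapseSingleChild);
        Node tree = parser.apply(text);
        for (UnaryOperator<Node> processor : chain) {
            tree = processor.apply(tree);
        }
        return build(tree);
    }

    static String infix(String text) {
        return run(FlexParserSketch::parseInfix, text);
    }

    static String plus(String text) {
        return run(FlexParserSketch::parsePlus, text);
    }

    public static void main(String[] args) {
        System.out.println(infix("A AND b")); // AND(a,b)
        System.out.println(plus("+a +B"));    // AND(a,b)
    }
}
```

Only the two parse methods differ between the syntaxes; the processors and the builder are shared, which is the maintenance benefit the description argues for.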
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730780#action_12730780 ] Grant Ingersoll commented on LUCENE-1567: ----------------------------------------- Names that tack a "2" or some other number on the end are pretty much meaningless. I'd suggest finding something better that actually describes what the package contains. After all what is the "second" query parser? > New flexible query parser > ------------------------- > > Key: LUCENE-1567 > URL: https://issues.apache.org/jira/browse/LUCENE-1567 > Project: Lucene - Java > Issue Type: New Feature > Components: QueryParser > Environment: N/A > Reporter: Luis Alves > Assignee: Grant Ingersoll > Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf |
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730812#action_12730812 ] Mark Miller commented on LUCENE-1567: ------------------------------------- {quote}Mark, it seems like the best thing to do here is to add this as a 1.5 contrib for now and deprecate the core query parser. Then in 3.0 we would move the new one into core and remove the old one entirely. Since it will remain in the same package users won't have to change their code, just while they use 2.9 they have to put an extra jar in their classpath. Looking at the latest patch, that's what it does (new one to contrib while deprecating old one).{quote} Right, I think that does make sense, but can we actually go to 1.5 in 3.0? Thats what my main question is around. I know the 1.5 wiki says that we can, but Mike has indicated that 3.0 would just be a quick bug fix release with deprecations removed from 2.9. I thought I'd seen him say that 3.1 would actually be the first with 1.5? Mike M? |
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730815#action_12730815 ] Grant Ingersoll commented on LUCENE-1567: ----------------------------------------- 3.0 will be 1.5. See http://wiki.apache.org/lucene-java/Java_1.5_Migration |
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730821#action_12730821 ] Michael McCandless commented on LUCENE-1567: -------------------------------------------- Right, 3.0 is when we can first use 1.5 code. But, 3.0 will be a fast "mechanical" release after 2.9. This is just like the 1.9 -> 2.0 fast turnaround, *except* because we begin accepting 1.5 code in 3.0 we may make certain changes (switch to generics in certain APIs; move the new QueryParser into core; etc.). However we don't plan on doing any new features, etc in 3.0; that will first happen in 3.1. |
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730823#action_12730823 ] Mark Miller commented on LUCENE-1567: ------------------------------------- Yeah, I had seen that, I was just remembering an email or two from Mike that mentioned differently (waiting till 3.1) ... but I just found one of the threads discussing it and it looks like consensus shifted: http://www.lucidimagination.com/search/document/6d2b6488b4115/2_9_3_0_plan_java_1_5#6d2b6488b4115 |
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731016#action_12731016 ] Michael Busch commented on LUCENE-1567: --------------------------------------- {quote} except because we begin accepting 1.5 code in 3.0 we may make certain changes (switch to generics in certain APIs; move the new QueryParser into core; etc.). {quote} OK sounds like a plan then! The new QP code will not change, but we'll move it into core in 3.0. |
|
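The remark in the issue text about message classes with attached resource bundles can be illustrated with the standard java.util.ResourceBundle and java.text.MessageFormat classes. This is only a sketch of the general idea: the Message class, the bundle classes, and the UNEXPECTED_TOKEN key are hypothetical and are not the message classes contained in the patch.

```java
import java.text.MessageFormat;
import java.util.ListResourceBundle;
import java.util.Locale;
import java.util.ResourceBundle;

/** Hypothetical localizable parser message; not the classes from the patch. */
final class QueryMessageSketch {

    /** English message templates, keyed by message id. */
    public static final class English extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"UNEXPECTED_TOKEN", "Cannot parse query {0}: unexpected token {1}"},
            };
        }
    }

    /** German message templates for the same keys. */
    public static final class German extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                {"UNEXPECTED_TOKEN", "Anfrage {0} kann nicht geparst werden: unerwartetes Token {1}"},
            };
        }
    }

    /** A message holds only a key and its arguments; the text is resolved on display. */
    static final class Message {
        private final String key;
        private final Object[] args;

        Message(String key, Object... args) {
            this.key = key;
            this.args = args.clone();
        }

        /** Looks up the template for this key in the bundle and fills in the arguments. */
        String localize(ResourceBundle bundle, Locale locale) {
            return new MessageFormat(bundle.getString(key), locale).format(args);
        }
    }

    public static void main(String[] argv) {
        Message m = new Message("UNEXPECTED_TOKEN", "a AND AND b", "AND");
        System.out.println(m.localize(new English(), Locale.ENGLISH));
        System.out.println(m.localize(new German(), Locale.GERMAN));
    }
}
```

Because the parser would create only the key and the arguments, the same error can be reported in whatever language the calling application chooses, without touching parser code.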
|
[jira] Updated: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1567:
---------------------------------------
Fix Version/s: 2.9

> New flexible query parser
> -------------------------
>
> Key: LUCENE-1567
> URL: https://issues.apache.org/jira/browse/LUCENE-1567
> Project: Lucene - Java
> Issue Type: New Feature
> Components: QueryParser
> Environment: N/A
> Reporter: Luis Alves
> Assignee: Grant Ingersoll
> Fix For: 2.9
>
> Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, lucene_trunk_FlexQueryParser_2009July09_v4.patch, lucene_trunk_FlexQueryParser_2009July10_v5.patch, lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf
|
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731220#action_12731220 ]

Adriano Crestani commented on LUCENE-1567:
------------------------------------------

{quote}
except because we begin accepting 1.5 code in 3.0 we may make certain changes (switch to generics in certain APIs; move the new QueryParser into core; etc.). OK sounds like a plan then! The new QP code will not change, but we'll move it into core in 3.0.
{quote}

Thanks for the explanation!

{quote}
Names that tack a "2" or some other number on the end are pretty much meaningless. I'd suggest finding something better that actually describes what the package contains. After all what is the "second" query parser?
{quote}

I agree with Luis: it's a good idea to have a package for each different query parser implementation. I also agree with Grant that it does not make sense to have an implementation tied to a number. So, as the "lucene2" implementation contains the default/main Lucene query parser implementation, I would suggest renaming it to "defaultLucene", "default" or "main". I will give +1 for "default".

Regards,
Adriano Crestani Campos
|
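One practical point about the package names proposed in the comment above: default is a reserved word in Java, so it cannot appear as a segment of a package name, whereas defaultLucene and main are ordinary identifiers. The short check below uses the JDK's javax.lang.model.SourceVersion.isName to show this. The org.example.queryparser prefix is a placeholder chosen for illustration, not the package path used in the patch.

```java
import javax.lang.model.SourceVersion;

/** Checks candidate package names against Java's rules for qualified names. */
final class PackageNameCheck {

    /** True if every dot-separated segment is an identifier and not a keyword or literal. */
    static boolean isValidPackageName(String name) {
        return SourceVersion.isName(name);
    }

    public static void main(String[] args) {
        String prefix = "org.example.queryparser."; // hypothetical prefix
        for (String candidate : new String[] {"lucene2", "defaultLucene", "default", "main"}) {
            System.out.println(candidate + " -> " + isValidPackageName(prefix + candidate));
        }
    }
}
```

Of the four candidates, only default is rejected.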
|
|
[jira] Updated: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luis Alves updated LUCENE-1567:
-------------------------------
Attachment: lucene_trunk_FlexQueryParser_2009july15_v6.patch

- Undo the changes on the build file to skip queryparser module if jdk 1.4 was found.
- Include Adriano changes
|