|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720382#action_12720382 ]

Grant Ingersoll commented on LUCENE-1567:
-----------------------------------------

From the IP Clearance, consider yourself reminded:
{quote}
Remind active committers that they are responsible for ensuring that a Corporate CLA is recorded if such is required to authorize their contributions under their individual CLA.
{quote}

> New flexible query parser
> -------------------------
>
> Key: LUCENE-1567
> URL: https://issues.apache.org/jira/browse/LUCENE-1567
> Project: Lucene - Java
> Issue Type: New Feature
> Components: QueryParser
> Environment: N/A
> Reporter: Luis Alves
> Assignee: Grant Ingersoll
> Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf
>
>
> From "New flexible query parser" thread by Michael Busch
> in my team at IBM we have used a different query parser than Lucene's in
> our products for quite a while. Recently we spent a significant amount
> of time refactoring the code and designing a very generic
> architecture, so that this query parser can be easily used for different
> products with varying query syntaxes.
> This work was originally driven by Andreas Neumann (who, however, left
> our team); most of the code was written by Luis Alves, who has been a
> bit active in Lucene in the past, and Adriano Campos, who joined our
> team at IBM half a year ago. Adriano is an Apache committer and PMC member
> on the Tuscany project and is getting familiar with Lucene now too.
> We think this code is much more flexible and extensible than the current
> Lucene query parser, and would therefore like to contribute it to
> Lucene. I'd like to give a very brief architecture overview here;
> Adriano and Luis can then answer more detailed questions as they're much
> more familiar with the code than I am.
> The goal was to separate the syntax and semantics of a query. E.g. 'a AND
> b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
> We distinguish the semantics of the different query components, e.g.
> whether and how to tokenize/lemmatize/normalize the different terms or
> which Query objects to create for the terms. We wanted to be able to
> write a parser with a new syntax, while reusing the underlying
> semantics, as quickly as possible.
> In fact, Adriano is currently working on a 100% Lucene-syntax compatible
> implementation to make it easy for people who are using Lucene's query
> parser to switch.
> The query parser has three layers and its core is what we call the
> QueryNodeTree. It is a tree that initially represents the syntax of the
> original query, e.g. for 'a AND b':
>   AND
>  /   \
> A     B
> The three layers are:
> 1. QueryParser
> 2. QueryNodeProcessor
> 3. QueryBuilder
> 1. The upper layer is the parsing layer, which simply transforms the
> query text string into a QueryNodeTree. Currently our implementations of
> this layer use javacc.
> 2. The query node processors do most of the work. This layer is in fact a
> configurable chain of processors. Each processor can walk the tree and
> modify nodes or even the tree's structure. That makes it possible to
> e.g. do query optimization before the query is executed or to tokenize
> terms.
> 3. The third layer is also a configurable chain of builders, which
> transform the QueryNodeTree into Lucene Query objects.
> Furthermore the query parser uses flexible configuration objects, which
> are based on AttributeSource/Attribute. It also uses message classes that
> allow resource bundles to be attached. This makes it possible to translate
> messages, which is an important feature of a query parser.
> This design allows us to develop different query syntaxes very quickly.
> Adriano wrote the Lucene-compatible syntax in a matter of hours, and the
> underlying processors and builders in a few days. We now have a 100%
> compatible Lucene query parser, which means the syntax is identical and
> all query parser test cases pass on the new one too using a wrapper.
> Recent posts show that there is demand for query syntax improvements,
> e.g. improved range query syntax or operator precedence. There are
> already different QP implementations in Lucene+contrib; however, I think
> we did not keep them all up to date and in sync. This is not too
> surprising, because usually when fixes and changes are made to the main
> query parser, people don't make the corresponding changes in the contrib
> parsers. (I'm guilty here too)
> With this new architecture it will be much easier to maintain different
> query syntaxes, as there is not much code in the first layer.
> All syntaxes would benefit from patches and improvements we make to the
> underlying layers, which will make supporting different syntaxes much
> more manageable.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...
|
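For readers who want a concrete picture of the three layers described in the issue, here is a minimal sketch in Java. It is not taken from the attached patches: the class `FlexParserSketch`, the `Node` type, and every method name are hypothetical, the two toy parsers only understand flat 'a AND b' and '+a +b' input, and a plain string stands in for the Lucene Query object that a real builder would produce. It only aims to show how two syntaxes can share one tree, one processor chain, and one builder.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of the three layers; names are illustrative, not from the patch. */
final class FlexParserSketch {

    /** Tree node: an operator with children, or a term with no children. */
    static final class Node {
        final String label;        // operator name ("AND") or term text
        final List<Node> children; // empty for terms

        Node(String label, List<Node> children) {
            this.label = label;
            this.children = children;
        }

        static Node term(String text) {
            return new Node(text, Collections.<Node>emptyList());
        }

        boolean isTerm() {
            return children.isEmpty();
        }
    }

    // Layer 1 (parsing): two toy syntaxes that yield the same tree.
    static Node parseInfix(String query) { // e.g. "a AND b"
        List<Node> terms = new ArrayList<>();
        for (String t : query.trim().split("\\s+AND\\s+")) {
            terms.add(Node.term(t));
        }
        return new Node("AND", terms);
    }

    static Node parsePlus(String query) { // e.g. "+a +b"
        List<Node> terms = new ArrayList<>();
        for (String t : query.trim().split("\\s+")) {
            terms.add(Node.term(t.substring(1))); // drop the leading '+'
        }
        return new Node("AND", terms);
    }

    // Layer 2 (processing): run the tree through a configurable chain.
    static Node process(Node root, List<UnaryOperator<Node>> chain) {
        Node current = root;
        for (UnaryOperator<Node> processor : chain) {
            current = processor.apply(current);
        }
        return current;
    }

    /** Example processor: normalizes every term to lower case. */
    static Node lowercaseTerms(Node n) {
        if (n.isTerm()) {
            return Node.term(n.label.toLowerCase(Locale.ROOT));
        }
        List<Node> kids = new ArrayList<>();
        for (Node c : n.children) {
            kids.add(lowercaseTerms(c));
        }
        return new Node(n.label, kids);
    }

    // Layer 3 (building): a string stands in for a Lucene Query here.
    static String build(Node n) {
        if (n.isTerm()) {
            return n.label;
        }
        StringBuilder sb = new StringBuilder();
        for (Node c : n.children) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+');
            if (c.isTerm()) {
                sb.append(c.label);
            } else {
                sb.append('(').append(build(c)).append(')');
            }
        }
        return sb.toString();
    }

    /** Runs layers 2 and 3 on a tree produced by any layer-1 parser. */
    static String toQuery(Node tree) {
        List<UnaryOperator<Node>> chain = new ArrayList<>();
        chain.add(FlexParserSketch::lowercaseTerms);
        return build(process(tree, chain));
    }
}
```

With this sketch, `FlexParserSketch.toQuery(FlexParserSketch.parseInfix("A AND B"))` and `FlexParserSketch.toQuery(FlexParserSketch.parsePlus("+A +B"))` both return `+a +b`, which is the point of the design: only the first layer differs between syntaxes.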
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720385#action_12720385 ]

Michael Busch commented on LUCENE-1567:
---------------------------------------

But we still need to update the code before we can commit. Which patch do you need the MD5/SHA1 hash from?
|
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720386#action_12720386 ]

Grant Ingersoll commented on LUCENE-1567:
-----------------------------------------

OK, the only outstanding items for clearance are:
1. tarball and hash
2. Vote on Incubator for clearance.
|
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720403#action_12720403 ]

Adriano Crestani commented on LUCENE-1567:
------------------------------------------

Hi Michael,
I expect it will take one week at most!
|
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720404#action_12720404 ]

Michael Busch commented on LUCENE-1567:
---------------------------------------

Ok GO!
|
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720417#action_12720417 ]

Grant Ingersoll commented on LUCENE-1567:
-----------------------------------------

Commit is separate from IP Clearance and you can't commit until the clearance is accepted. I just need the tarball for the code that was referenced in the software grant along with a hash on it. In the grant, you have a file directory listing describing the code. Take that file listing, tar it up and run md5 on it.
|
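For anyone following along, the hash step Grant describes (tar up the listed files, then run md5 on the tarball) can also be done from Java. The sketch below is only an illustration, assuming the tarball already exists on disk; the class `TarballDigest` and its method names are hypothetical, while `java.security.MessageDigest` is the standard JDK API for MD5 and SHA-1.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: hex digests ("MD5", "SHA-1") of a tarball or byte array. */
final class TarballDigest {

    /** Creates a digest, turning an unknown algorithm name into an unchecked error. */
    private static MessageDigest newDigest(String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
    }

    /** Formats digest bytes as lower-case hex, two characters per byte. */
    private static String toHex(byte[] digest) {
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Digest of an in-memory byte array. */
    static String hexDigest(byte[] data, String algorithm) {
        return toHex(newDigest(algorithm).digest(data));
    }

    /** Digest of a file, streamed in chunks so large tarballs need not fit in memory. */
    static String hexDigestOfFile(Path file, String algorithm) throws IOException {
        MessageDigest md = newDigest(algorithm);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        return toHex(md.digest());
    }
}
```

A call such as `TarballDigest.hexDigestOfFile(java.nio.file.Paths.get("flexqueryparser.tar"), "MD5")` would give the value to post alongside the tarball; the file name here is made up for the example. Passing `"SHA-1"` instead yields the SHA1 hash mentioned earlier in the thread.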
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720422#action_12720422 ]

Michael Busch commented on LUCENE-1567:
---------------------------------------

OK that should be easy. We'll do that asap. Thanks for explaining, Grant.

> New flexible query parser
> -------------------------
>
> Key: LUCENE-1567
> URL: https://issues.apache.org/jira/browse/LUCENE-1567
> Project: Lucene - Java
> Issue Type: New Feature
> Components: QueryParser
> Environment: N/A
> Reporter: Luis Alves
> Assignee: Grant Ingersoll
> Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, QueryParser_restructure_meetup_june2009_v2.pdf
>
>
> From "New flexible query parser" thread by Michael Busch
> in my team at IBM we have used a different query parser than Lucene's in
> our products for quite a while. Recently we spent a significant amount
> of time in refactoring the code and designing a very generic
> architecture, so that this query parser can be easily used for different
> products with varying query syntaxes.
> This work was originally driven by Andreas Neumann (who, however, left
> our team); most of the code was written by Luis Alves, who has been a
> bit active in Lucene in the past, and Adriano Campos, who joined our
> team at IBM half a year ago. Adriano is an Apache committer and PMC member
> on the Tuscany project and getting familiar with Lucene now too.
> We think this code is much more flexible and extensible than the current
> Lucene query parser, and would therefore like to contribute it to
> Lucene. I'd like to give a very brief architecture overview here,
> Adriano and Luis can then answer more detailed questions as they're much
> more familiar with the code than I am.
> The goal was to separate syntax and semantics of a query. E.g. 'a AND
> b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
> We distinguish the semantics of the different query components, e.g.
> whether and how to tokenize/lemmatize/normalize the different terms or
> which Query objects to create for the terms. We wanted to be able to
> write a parser with a new syntax, while reusing the underlying
> semantics, as quickly as possible.
> In fact, Adriano is currently working on a 100% Lucene-syntax compatible
> implementation to make it easy for people who are using Lucene's query
> parser to switch.
> The query parser has three layers and its core is what we call the
> QueryNodeTree. It is a tree that initially represents the syntax of the
> original query, e.g. for 'a AND b':
>     AND
>    /   \
>   A     B
> The three layers are:
> 1. QueryParser
> 2. QueryNodeProcessor
> 3. QueryBuilder
> 1. The upper layer is the parsing layer which simply transforms the
> query text string into a QueryNodeTree. Currently our implementations of
> this layer use javacc.
> 2. The query node processors do most of the work. This layer is in fact a
> configurable chain of processors. Each processor can walk the tree and
> modify nodes or even the tree's structure. That makes it possible to
> e.g. do query optimization before the query is executed or to tokenize
> terms.
> 3. The third layer is also a configurable chain of builders, which
> transform the QueryNodeTree into Lucene Query objects.
> Furthermore the query parser uses flexible configuration objects, which
> are based on AttributeSource/Attribute. It also uses message classes that
> allow resource bundles to be attached. This makes it possible to translate
> messages, which is an important feature of a query parser.
> This design allows us to develop different query syntaxes very quickly.
> Adriano wrote the Lucene-compatible syntax in a matter of hours, and the
> underlying processors and builders in a few days. We now have a 100%
> compatible Lucene query parser, which means the syntax is identical and
> all query parser test cases pass on the new one too using a wrapper.
> Recent posts show that there is demand for query syntax improvements,
> e.g. improved range query syntax or operator precedence. There are
> already different QP implementations in Lucene+contrib, however I think
> we did not keep them all up to date and in sync. This is not too
> surprising, because usually when fixes and changes are made to the main
> query parser, people don't make the corresponding changes in the contrib
> parsers. (I'm guilty here too)
> With this new architecture it will be much easier to maintain different
> query syntaxes, as the first layer needs very little code.
> All syntaxes would benefit from patches and improvements we make to the
> underlying layers, which will make supporting different syntaxes much
> more manageable.
|
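For readers following the thread, the three layers in the quoted issue description can be sketched roughly as below. This is only a toy illustration under stated assumptions, not the contributed code: the names `FlexParserSketch`, `Node`, `parseInfix`, `parsePlus`, `lowercase`, `build` and `run` are hypothetical, the two "parsers" are hand-written string splits rather than javacc grammars, and the builder returns a plain string where the real design would produce Lucene Query objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class FlexParserSketch {

    // Syntax-neutral tree node: either a leaf term or an operator with children.
    static final class Node {
        final String label;        // term text, or operator name such as "AND"
        final boolean term;        // true for leaf terms
        final List<Node> children; // empty for leaf terms

        Node(String label, boolean term, List<Node> children) {
            this.label = label;
            this.term = term;
            this.children = children;
        }

        static Node term(String text) {
            return new Node(text, true, List.of());
        }
    }

    // Layer 1, syntax A: infix form such as "a AND b".
    static Node parseInfix(String text) {
        List<Node> terms = new ArrayList<>();
        for (String part : text.trim().split("\\s+AND\\s+")) {
            terms.add(Node.term(part));
        }
        return new Node("AND", false, terms);
    }

    // Layer 1, syntax B: prefix-plus form such as "+a +b".
    static Node parsePlus(String text) {
        List<Node> terms = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            terms.add(Node.term(part.substring(1))); // drop the leading '+'
        }
        return new Node("AND", false, terms);
    }

    // Layer 2: one processor; it walks the tree and normalizes terms to lower case.
    static Node lowercase(Node n) {
        if (n.term) {
            return Node.term(n.label.toLowerCase(Locale.ROOT));
        }
        List<Node> kids = new ArrayList<>();
        for (Node child : n.children) {
            kids.add(lowercase(child));
        }
        return new Node(n.label, false, kids);
    }

    // Layer 3: builder; a string stands in for a Lucene Query object here.
    static String build(Node n) {
        if (n.term) {
            return n.label;
        }
        StringBuilder sb = new StringBuilder();
        for (Node child : n.children) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(build(child));
        }
        return sb.toString();
    }

    // Run the processor chain in order, then hand the tree to the builder.
    static String run(Node tree, List<UnaryOperator<Node>> processors) {
        for (UnaryOperator<Node> processor : processors) {
            tree = processor.apply(tree);
        }
        return build(tree);
    }

    public static void main(String[] args) {
        List<UnaryOperator<Node>> chain = List.of(FlexParserSketch::lowercase);
        System.out.println(run(parseInfix("A AND B"), chain)); // prints "+a +b"
        System.out.println(run(parsePlus("+A +B"), chain));    // prints "+a +b"
    }
}
```

Both syntaxes produce the same tree, so the processor chain and the builder are shared; in this sketch a new syntax only needs another layer-1 method.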
|
|
[jira] Updated: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-1567:
----------------------------------

Attachment: new_query_parser_src.tar

MD5 (new_query_parser_src.tar) = b678596e3dea63e8e66e035d6dc7f45e

On Jul 4, 2009, at 5:17 PM, Michael Busch wrote:
{quote}
Hi Grant, attached is the tar file that includes the files that were listed in the software grant. These files contain all the IP of this new feature that was developed internally in IBM. However, the final patch that will be committed will look a bit different, due to discussions with the other committers, which of course now take place on the public mailing list.
{quote}

On 7/5/09 8:15 PM, Grant Ingersoll wrote:
{quote}
Please attach to the issue. No worries on the other part, just need the bits there for me to say they exist and align w/ the Grant. What we commit can be patched.
{quote}
|
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727531#action_12727531 ]

Mark Miller commented on LUCENE-1567:
-------------------------------------

I wonder if all of this was really necessary. Months ago, while doing some searching, I saw that at least one other Apache project (might have been on the legal email list?), asked about a large code contribution from a company, and the response was that if a guy at the company had a CLA on file (was a committer), and was part of the process, he could commit the large code contribution without all of this paperwork mumbo jumbo.

Of course, best to be thorough and complete, but I think we may not have to jump through these same hoops in the future. Not that that means much, as I say that with no authority or complete knowledge about it. But if someone wanted to research further ...
|
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728039#action_12728039 ]

Grant Ingersoll commented on LUCENE-1567:
-----------------------------------------

bq. I saw that at least one other Apache project

Just because someone else does it wrong... It's pretty clear in this case that the Grant is necessary.
|
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728042#action_12728042 ]

Mark Miller commented on LUCENE-1567:
-------------------------------------

Someone else didn't do it wrong - they asked and got an answer from someone from Apache that seemed to know what they were talking about, and seemed to have the authority/knowledge to give the answer they gave.

I'm not arguing one way or the other - just throwing what I saw out there. I'm sure you have more info on the subject than I do.
|
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser
[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728052#action_12728052 ]

Mark Miller commented on LUCENE-1567:
-------------------------------------

Hmmm - if you look at the strict letter of the law in Intellectual Property Clearance, then LocalLucene and Trie and a lot of other stuff also needed this clearance ... I may have just missed the process on those though.
|
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728085#action_12728085 ]

Grant Ingersoll commented on LUCENE-1567:
-----------------------------------------

LocalLucene did. Not sure about Trie. Anyway, this issue is not the place for this discussion.

> New flexible query parser
> -------------------------
>
> Key: LUCENE-1567
> URL: https://issues.apache.org/jira/browse/LUCENE-1567
> Project: Lucene - Java
> Issue Type: New Feature
> Components: QueryParser
> Environment: N/A
> Reporter: Luis Alves
> Assignee: Grant Ingersoll
> Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, QueryParser_restructure_meetup_june2009_v2.pdf
>
>
> From "New flexible query parser" thread by Michael Busch
> in my team at IBM we have used a different query parser than Lucene's in
> our products for quite a while. Recently we spent a significant amount
> of time in refactoring the code and designing a very generic
> architecture, so that this query parser can be easily used for different
> products with varying query syntaxes.
> This work was originally driven by Andreas Neumann (who, however, left
> our team); most of the code was written by Luis Alves, who has been a
> bit active in Lucene in the past, and Adriano Campos, who joined our
> team at IBM half a year ago. Adriano is an Apache committer and PMC member
> on the Tuscany project and getting familiar with Lucene now too.
> We think this code is much more flexible and extensible than the current
> Lucene query parser, and would therefore like to contribute it to
> Lucene. I'd like to give a very brief architecture overview here,
> Adriano and Luis can then answer more detailed questions as they're much
> more familiar with the code than I am.
> The goal was to separate syntax and semantics of a query. E.g. 'a AND
> b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
> We distinguish the semantics of the different query components, e.g.
> whether and how to tokenize/lemmatize/normalize the different terms or
> which Query objects to create for the terms. We wanted to be able to
> write a parser with a new syntax, while reusing the underlying
> semantics, as quickly as possible.
> In fact, Adriano is currently working on a 100% Lucene-syntax compatible
> implementation to make it easy for people who are using Lucene's query
> parser to switch.
> The query parser has three layers and its core is what we call the
> QueryNodeTree. It is a tree that initially represents the syntax of the
> original query, e.g. for 'a AND b':
>   AND
>  /   \
> A     B
> The three layers are:
> 1. QueryParser
> 2. QueryNodeProcessor
> 3. QueryBuilder
> 1. The upper layer is the parsing layer which simply transforms the
> query text string into a QueryNodeTree. Currently our implementations of
> this layer use javacc.
> 2. The query node processors do most of the work. It is in fact a
> configurable chain of processors. Each processor can walk the tree and
> modify nodes or even the tree's structure. That makes it possible to
> e.g. do query optimization before the query is executed or to tokenize
> terms.
> 3. The third layer is also a configurable chain of builders, which
> transform the QueryNodeTree into Lucene Query objects.
> Furthermore the query parser uses flexible configuration objects, which
> are based on AttributeSource/Attribute. It also uses message classes that
> allow resource bundles to be attached. This makes it possible to translate
> messages, which is an important feature of a query parser.
> This design allows us to develop different query syntaxes very quickly.
> Adriano wrote the Lucene-compatible syntax in a matter of hours, and the
> underlying processors and builders in a few days. We now have a 100%
> compatible Lucene query parser, which means the syntax is identical and
> all query parser test cases pass on the new one too using a wrapper.
> Recent posts show that there is demand for query syntax improvements,
> e.g. improved range query syntax or operator precedence. There are
> already different QP implementations in Lucene+contrib, however I think
> we did not keep them all up to date and in sync. This is not too
> surprising, because usually when fixes and changes are made to the main
> query parser, people don't make the corresponding changes in the contrib
> parsers. (I'm guilty here too)
> With this new architecture it will be much easier to maintain different
> query syntaxes, as the actual code for the first layer is not very much.
> All syntaxes would benefit from patches and improvements we make to the
> underlying layers, which will make supporting different syntaxes much
> more manageable.
|
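The three-layer design described in the quoted issue text (text parser, processor chain, builder) can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the patch: every name here (`FlexParserSketch`, `Node`, `parseInfix`, `parsePrefix`, `lowercaseTerms`, `build`, `run`) is hypothetical, the two toy parsers handle a single binary AND only, and the builder renders a plain string where the real parser would create Lucene Query objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch; none of these names come from the actual patch.
final class FlexParserSketch {

    /** A node of the query tree: a term leaf, or an operator with children. */
    static final class Node {
        final String label;         // operator name ("AND") or term text
        final List<Node> children;  // empty for term leaves

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isTerm() { return children.isEmpty(); }
    }

    // Layer 1, syntax A: infix "a AND b" (a single binary AND only).
    static Node parseInfix(String text) {
        String[] parts = text.trim().split("\\s+AND\\s+");
        if (parts.length != 2) throw new IllegalArgumentException("expected 'x AND y': " + text);
        return new Node("AND", new Node(parts[0]), new Node(parts[1]));
    }

    // Layer 1, syntax B: prefix "AND(a,b)". Produces the same tree as syntax A.
    static Node parsePrefix(String text) {
        String t = text.trim();
        if (!t.startsWith("AND(") || !t.endsWith(")")) {
            throw new IllegalArgumentException("expected 'AND(x,y)': " + text);
        }
        String[] parts = t.substring(4, t.length() - 1).split(",");
        if (parts.length != 2) throw new IllegalArgumentException("expected two operands: " + text);
        return new Node("AND", new Node(parts[0].trim()), new Node(parts[1].trim()));
    }

    // Layer 2: one processor in the chain; it walks the tree and normalizes terms.
    static Node lowercaseTerms(Node n) {
        if (n.isTerm()) return new Node(n.label.toLowerCase(Locale.ROOT));
        Node[] kids = new Node[n.children.size()];
        for (int i = 0; i < kids.length; i++) kids[i] = lowercaseTerms(n.children.get(i));
        return new Node(n.label, kids);
    }

    // Layer 3: builder. A string stands in for the Lucene Query objects.
    static String build(Node n) {
        if (n.isTerm()) return n.label;
        if (!n.label.equals("AND")) throw new IllegalArgumentException("unsupported operator: " + n.label);
        StringBuilder sb = new StringBuilder();
        for (Node child : n.children) {
            if (sb.length() > 0) sb.append(' ');
            sb.append('+');
            if (child.isTerm()) sb.append(build(child));
            else sb.append('(').append(build(child)).append(')');
        }
        return sb.toString();
    }

    // Runs the whole pipeline: parse, apply each processor in order, then build.
    static String run(Function<String, Node> parser, List<UnaryOperator<Node>> processors, String text) {
        Node tree = parser.apply(text);
        for (UnaryOperator<Node> p : processors) tree = p.apply(tree);
        return build(tree);
    }

    public static void main(String[] args) {
        List<UnaryOperator<Node>> chain = new ArrayList<>();
        chain.add(FlexParserSketch::lowercaseTerms);
        // Both lines print: +apple +banana
        System.out.println(run(FlexParserSketch::parseInfix, chain, "Apple AND Banana"));
        System.out.println(run(FlexParserSketch::parsePrefix, chain, "AND(Apple, Banana)"));
    }
}
```

Only the first layer differs between the two syntaxes; the processor chain and the builder are shared, which is the maintenance argument made in the issue text.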
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728090#action_12728090 ]

Mark Miller commented on LUCENE-1567:
-------------------------------------

bq. Anyway, this issue is not the place for this discussion.

Seems like a couple of comments about this here are appropriate to me. You're just being prickly, man. I was pointing out something that has relevance to this issue and to committers when dealing with similar issues in the future. My comment about LocalLucene and Trie was not an accusation, but an attempt to clarify what requires this and what doesn't. As a committer, it's important that this information is clear to me. As the PMC head, I'd think you'd be more helpful with the matter.
|
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728097#action_12728097 ]

Uwe Schindler commented on LUCENE-1567:
---------------------------------------

bq. Not sure about Trie

Trie was not the property of a company; it was my private idea (and even if I work at the University of Bremen, which sponsors me, it is not owned by the University. Scientific research in Germany is the scientist's responsibility). And the code was already Apache 2.0 licensed, so there was no problem to donate it. And now I am a committer and have already signed the CLA. If there is still a problem, I would open another issue about that.
|
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728102#action_12728102 ]

Mark Miller commented on LUCENE-1567:
-------------------------------------

I'll just keep my response out of JIRA to avoid taking over that issue:

According to http://incubator.apache.org/ip-clearance/index.html, it doesn't matter if it was your company's code or if you are a committer or if you have a CLA. If it was developed outside of Apache svn/mailing lists and was then donated, it says it needs the grant.

- Mark

--
- Mark
http://www.lucidimagination.com
|
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728103#action_12728103 ]

Grant Ingersoll commented on LUCENE-1567:
-----------------------------------------

bq. And the code was already Apache 2.0 licensed, so there was no problem to donate it.

This does not matter, nor does the license. I was unaware that it lived in public someplace else. If the code lives somewhere else in public, then it needs to go through a Software Grant, AIUI. Having it licensed as ASL just makes the paperwork a formality.

At any rate, as I said, the discussion of Trie, LocalLucene and when some generic piece of code needs a grant has nothing to do with this particular issue, so please, if you want to continue this conversation, start one on java-dev.
|
|
|
[jira] Commented: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729529#action_12729529 ]

Luis Alves commented on LUCENE-1567:
------------------------------------

Since all legal work is finally finished now, it is time for an updated patch with the latest fixes and improvements.

Below are the changes compared to the previous patch:
• moved the new queryparser to contrib
• deprecated old QueryParser classes in the core
• the new queryparser in contrib uses jdk 1.5
• patch compiles against current trunk
• rewrote the lucene testcases to use the new APIs
• created wrapper testcases that use wrapper classes
• created classes to overwrite the old QueryParser in the util folder, and make Lucene use the flexible query parser engine, without having to change your code.

I verified that all testcases are working, and that all contrib modules still compile fine.

Adriano, when you have some time, can you write an interface for simple usage of the new QueryParser, and a simple implementation of the interface that creates a textparser, creates a processor pipeline, and instantiates the lucene builders? And please add a simple junit that demonstrates the usage of that interface, and ideally some documentation in the package.html of the new contrib package that will help users who want to use the queryparser to get started.
|
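The simple-usage interface requested in Luis Alves's comment could look roughly like the sketch below: one entry point that hides the text parser, the processor pipeline, and the builder. This is a guess at the shape only; `SimpleQueryParser`, `PipelineQueryParser`, and `SimpleUsageDemo` are hypothetical names, and the demo stages (whitespace splitting, lower-casing, string joining) are stand-ins for the javacc parser, the real processors, and the Lucene builders.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

/** Hypothetical simple-usage interface: query text in, built query out. */
interface SimpleQueryParser<Q> {
    Q parse(String queryText);
}

/** Hypothetical implementation that wires the three stages together. */
final class PipelineQueryParser<T, Q> implements SimpleQueryParser<Q> {
    private final Function<String, T> textParser;     // text -> tree
    private final List<UnaryOperator<T>> processors;  // tree -> tree, applied in order
    private final Function<T, Q> builder;             // tree -> query object

    PipelineQueryParser(Function<String, T> textParser,
                        List<UnaryOperator<T>> processors,
                        Function<T, Q> builder) {
        this.textParser = textParser;
        this.processors = processors;
        this.builder = builder;
    }

    @Override
    public Q parse(String queryText) {
        T tree = textParser.apply(queryText);
        for (UnaryOperator<T> processor : processors) {
            tree = processor.apply(tree);
        }
        return builder.apply(tree);
    }
}

final class SimpleUsageDemo {
    // Stand-in stages: a list of terms plays the role of the query tree,
    // and a String plays the role of the built query.
    static SimpleQueryParser<String> newDemoParser() {
        Function<String, List<String>> textParser =
                text -> Arrays.asList(text.trim().split("\\s+"));
        UnaryOperator<List<String>> lowercase =
                terms -> terms.stream()
                              .map(t -> t.toLowerCase(Locale.ROOT))
                              .collect(Collectors.toList());
        Function<List<String>, String> builder = terms -> String.join(" ", terms);
        return new PipelineQueryParser<>(textParser, Collections.singletonList(lowercase), builder);
    }

    public static void main(String[] args) {
        System.out.println(newDemoParser().parse("Foo BAR")); // prints: foo bar
    }
}
```

Callers depend only on `SimpleQueryParser`, so a different syntax or processor chain can be swapped in without changing calling code.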
|
|
[jira] Updated: (LUCENE-1567) New flexible query parser

[ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luis Alves updated LUCENE-1567:
-------------------------------

    Attachment: lucene_trunk_FlexQueryParser_2009July09_v4.patch

patch compiles against current trunk

> New flexible query parser
> -------------------------
>
> Key: LUCENE-1567
> URL: https://issues.apache.org/jira/browse/LUCENE-1567
> Project: Lucene - Java
> Issue Type: New Feature
> Components: QueryParser
> Environment: N/A
> Reporter: Luis Alves
> Assignee: Grant Ingersoll
> Attachments: lucene_trunk_FlexQueryParser_2009July09_v4.patch,
>     lucene_trunk_FlexQueryParser_2009March24.patch,
>     lucene_trunk_FlexQueryParser_2009March26_v3.patch,
>     new_query_parser_src.tar,
>     QueryParser_restructure_meetup_june2009_v2.pdf
>
>
> From "New flexible query parser" thread by Michael Busch
>
> in my team at IBM we have used a different query parser than Lucene's in
> our products for quite a while. Recently we spent a significant amount
> of time in refactoring the code and designing a very generic
> architecture, so that this query parser can be easily used for different
> products with varying query syntaxes.
>
> This work was originally driven by Andreas Neumann (who, however, left
> our team); most of the code was written by Luis Alves, who has been a
> bit active in Lucene in the past, and Adriano Campos, who joined our
> team at IBM half a year ago. Adriano is an Apache committer and PMC
> member on the Tuscany project and getting familiar with Lucene now too.
>
> We think this code is much more flexible and extensible than the current
> Lucene query parser, and would therefore like to contribute it to
> Lucene. I'd like to give a very brief architecture overview here,
> Adriano and Luis can then answer more detailed questions as they're much
> more familiar with the code than I am.
>
> The goal was to separate syntax and semantics of a query. E.g. 'a AND
> b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
> We distinguish the semantics of the different query components, e.g.
> whether and how to tokenize/lemmatize/normalize the different terms or
> which Query objects to create for the terms. We wanted to be able to
> write a parser with a new syntax, while reusing the underlying
> semantics, as quickly as possible.
>
> In fact, Adriano is currently working on a 100% Lucene-syntax compatible
> implementation to make it easy for people who are using Lucene's query
> parser to switch.
>
> The query parser has three layers and its core is what we call the
> QueryNodeTree. It is a tree that initially represents the syntax of the
> original query, e.g. for 'a AND b':
>
>    AND
>   /   \
>  A     B
>
> The three layers are:
> 1. QueryParser
> 2. QueryNodeProcessor
> 3. QueryBuilder
>
> 1. The upper layer is the parsing layer which simply transforms the
> query text string into a QueryNodeTree. Currently our implementations of
> this layer use javacc.
>
> 2. The query node processors do most of the work. It is in fact a
> configurable chain of processors. Each processor can walk the tree and
> modify nodes or even the tree's structure. That makes it possible to
> e.g. do query optimization before the query is executed or to tokenize
> terms.
>
> 3. The third layer is also a configurable chain of builders, which
> transform the QueryNodeTree into Lucene Query objects.
>
> Furthermore the query parser uses flexible configuration objects, which
> are based on AttributeSource/Attribute. It also uses message classes that
> allow resource bundles to be attached. This makes it possible to
> translate messages, which is an important feature of a query parser.
>
> This design allows us to develop different query syntaxes very quickly.
> Adriano wrote the Lucene-compatible syntax in a matter of hours, and the
> underlying processors and builders in a few days. We now have a 100%
> compatible Lucene query parser, which means the syntax is identical and
> all query parser test cases pass on the new one too using a wrapper.
>
> Recent posts show that there is demand for query syntax improvements,
> e.g. improved range query syntax or operator precedence. There are
> already different QP implementations in Lucene+contrib, however I think
> we did not keep them all up to date and in sync. This is not too
> surprising, because usually when fixes and changes are made to the main
> query parser, people don't make the corresponding changes in the contrib
> parsers. (I'm guilty here too)
>
> With this new architecture it will be much easier to maintain different
> query syntaxes, as the actual code for the first layer is not very much.
> All syntaxes would benefit from patches and improvements we make to the
> underlying layers, which will make supporting different syntaxes much
> more manageable.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
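
As a rough illustration of the three layers described in the quoted issue text (syntax parser, processor chain, builder), here is a minimal, self-contained Java sketch. Every name in it (FlexQuerySketch, Node, Term, And, SyntaxParser, Processor, Builder, INFIX, PLUS, LOWERCASE, TO_TEXT) is hypothetical and is not taken from the attached patch. As a further assumption, the builder emits a plain string instead of a Lucene Query object so the example runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the three-layer idea; NOT the API of the attached patch.
final class FlexQuerySketch {

    /** Syntax-independent tree node (a stand-in for the QueryNodeTree). */
    interface Node {}

    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
    }

    static final class And implements Node {
        final List<Node> children;
        And(List<Node> children) { this.children = children; }
    }

    /** Layer 1: turns query text into a tree. One implementation per syntax. */
    interface SyntaxParser { Node parse(String query); }

    /** Layer 2: one link in the processor chain; may rewrite the tree. */
    interface Processor { Node process(Node tree); }

    /** Layer 3: turns the processed tree into the target query form. */
    interface Builder<Q> { Q build(Node tree); }

    // Toy syntax 1: "a AND b"
    static final SyntaxParser INFIX = query -> {
        List<Node> terms = new ArrayList<>();
        for (String part : query.trim().split("\\s+AND\\s+")) {
            terms.add(new Term(part));
        }
        return new And(terms);
    };

    // Toy syntax 2: "+a +b"
    static final SyntaxParser PLUS = query -> {
        List<Node> terms = new ArrayList<>();
        for (String part : query.trim().split("\\s+")) {
            terms.add(new Term(part.startsWith("+") ? part.substring(1) : part));
        }
        return new And(terms);
    };

    // Example processor: normalizes every term to lower case.
    static Node lowercase(Node tree) {
        if (tree instanceof Term) {
            return new Term(((Term) tree).text.toLowerCase(Locale.ROOT));
        }
        List<Node> out = new ArrayList<>();
        for (Node child : ((And) tree).children) {
            out.add(lowercase(child));
        }
        return new And(out);
    }

    // Example builder: renders the tree as text, marking each clause as required.
    static String render(Node tree) {
        if (tree instanceof Term) {
            return ((Term) tree).text;
        }
        StringBuilder sb = new StringBuilder();
        for (Node child : ((And) tree).children) {
            if (sb.length() > 0) sb.append(' ');
            String rendered = render(child);
            sb.append('+').append(child instanceof And ? "(" + rendered + ")" : rendered);
        }
        return sb.toString();
    }

    static final Processor LOWERCASE = FlexQuerySketch::lowercase;
    static final Builder<String> TO_TEXT = FlexQuerySketch::render;

    /** Runs the three layers in order: parse, process, build. */
    static <Q> Q run(String query, SyntaxParser parser, List<Processor> chain, Builder<Q> builder) {
        Node tree = parser.parse(query);
        for (Processor processor : chain) {
            tree = processor.process(tree);
        }
        return builder.build(tree);
    }

    public static void main(String[] args) {
        List<Processor> chain = Arrays.asList(LOWERCASE);
        // Two different syntaxes, same processors and builder, same result.
        System.out.println(run("Apache AND Lucene", INFIX, chain, TO_TEXT));
        System.out.println(run("+Apache +Lucene", PLUS, chain, TO_TEXT));
    }
}
```

In this sketch only the first layer differs between the two syntaxes; the processor chain and the builder are shared, which is the maintenance benefit the issue description argues for.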