[jira] Created: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter
--------------------------------------------------------------------------------------

                 Key: LUCENE-2015
                 URL: https://issues.apache.org/jira/browse/LUCENE-2015
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Analysis
            Reporter: Cédrik LIME
            Priority: Minor

This patch adds a couple of non-ASCII chars to ISOLatin1AccentFilter (namely: left & right single quotation marks, en dash, em dash) which we very frequently encounter in our projects. I know that this class is now deprecated; this improvement is for legacy code that hasn't migrated yet.

It also enables easy access to the ASCII folding technique used in ASCIIFoldingFilter for potential re-use in non-Lucene-related code.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...

[jira] Updated: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

[ https://issues.apache.org/jira/browse/LUCENE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cédrik LIME updated LUCENE-2015:
--------------------------------

    Attachment: Filters.patch

(UTF-8 encoding)

[jira] Commented: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

[ https://issues.apache.org/jira/browse/LUCENE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771481#action_12771481 ]

Robert Muir commented on LUCENE-2015:
-------------------------------------

Cédrik, is it possible to provide a patch without the formatting changes? I am having trouble seeing the changes you made to ASCIIFoldingFilter.

btw, I think ISOLatin1AccentFilter only stays around for back compat to support old indexes; in my opinion we should not modify it for this reason.

[jira] Commented: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

[ https://issues.apache.org/jira/browse/LUCENE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771496#action_12771496 ]

Cédrik LIME commented on LUCENE-2015:
-------------------------------------

Robert,

All I did is refactor the big switch(c) into its own method:
    public static final int foldToASCII(char c, char[] output, int outputPos)
and change the caller
    (public void foldToASCII(char[] input, int length))
accordingly.

I can submit a patch without formatting changes, but that means the source won't be nicely indented... Please advise.

As for the ISOLatin1AccentFilter patch, it really is to enable us to remove a workaround for an issue we had with some special (yet frequent) chars. Feel free to ignore it should you think this part is not relevant.
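
For readers following the refactoring described in the comment above, a minimal Java sketch of the per-character shape might look like this. The class name PerCharFoldSketch, the helper foldAll, and the handful of mappings are assumptions made for illustration; the attached patch and the full ASCIIFoldingFilter mapping table are not reproduced here.

```java
// Illustrative sketch only: a tiny, hypothetical subset of mappings,
// not the table used by Lucene's ASCIIFoldingFilter.
final class PerCharFoldSketch {

    /**
     * Folds one char into output starting at outputPos and returns the
     * next free write position. Chars without a mapping are copied as-is.
     */
    static int foldToASCII(char c, char[] output, int outputPos) {
        switch (c) {
            case '\u00E9': // e with acute
                output[outputPos++] = 'e';
                break;
            case '\u00DF': // sharp s expands to two chars
                output[outputPos++] = 's';
                output[outputPos++] = 's';
                break;
            case '\u2018': // left single quotation mark
            case '\u2019': // right single quotation mark
                output[outputPos++] = '\'';
                break;
            case '\u2013': // en dash
            case '\u2014': // em dash
                output[outputPos++] = '-';
                break;
            default:
                output[outputPos++] = c;
        }
        return outputPos;
    }

    /**
     * Caller loop: writes into one caller-supplied buffer, which must hold
     * at least 2 * length chars (the worst case for this sketch's mappings).
     * Returns the number of chars written.
     */
    static int foldAll(char[] input, int length, char[] output) {
        int outputPos = 0;
        for (int i = 0; i < length; i++) {
            outputPos = foldToASCII(input[i], output, outputPos);
        }
        return outputPos;
    }
}
```

Because the output buffer is passed in, a caller can allocate it once and reuse it across many calls, which is the property the later comments in this thread are concerned with.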

[jira] Commented: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

[ https://issues.apache.org/jira/browse/LUCENE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771524#action_12771524 ]

Robert Muir commented on LUCENE-2015:
-------------------------------------

Cédrik, in my opinion, it would be easier to see the patch without the formatting changes if possible.

Even if there is bad indentation currently, I think this should be corrected in a separate patch.

[jira] Updated: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

[ https://issues.apache.org/jira/browse/LUCENE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cédrik LIME updated LUCENE-2015:
--------------------------------

    Attachment: ASCIIFoldingFilter-no_formatting.patch
                ISOLatin1AccentFilter.patch

Here are the patches (UTF-8 encoding), 1 per filter.
I have removed the formatting on the switch(c) in ASCIIFoldingFilter for easier review.

[jira] Commented: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

[ https://issues.apache.org/jira/browse/LUCENE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771546#action_12771546 ]

Robert Muir commented on LUCENE-2015:
-------------------------------------

Cédrik, I think the idea of adding a public static method for folding is OK.

But I think it should essentially do what foldToASCII does, not operate on a single 'char'. We should avoid a single 'char' as a parameter; instead it should work on the entire char[], I think?

[jira] Commented: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

[ https://issues.apache.org/jira/browse/LUCENE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771557#action_12771557 ]

Cédrik LIME commented on LUCENE-2015:
-------------------------------------

Indeed, and that was my primary (internal) patch. But then you lose the shared "output" buffer between incrementToken() calls, and you end up creating char[]'s like there is no tomorrow, which may be a performance regression.

What I can do is /add/ a static method that operates on a char[], for convenient external use. What do you think?

[jira] Commented: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

[ https://issues.apache.org/jira/browse/LUCENE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771564#action_12771564 ]

Robert Muir commented on LUCENE-2015:
-------------------------------------

Cédrik, why would you create char[]'s like there is no tomorrow if you add a static method that operates on char[], for external use, but also use this within the incrementToken(), passing the tokenBuffer as an argument?

[jira] Commented: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

[ https://issues.apache.org/jira/browse/LUCENE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771568#action_12771568 ]

Uwe Schindler commented on LUCENE-2015:
---------------------------------------

We cannot apply the patch to ISOLatin1AccentFilter, as it would break indexes already using it. Because of that we migrated to ASCIIFoldingFilter and kept ISOLatin1AccentFilter alive. So we should leave it as it is.

To the buffer problem: for easy external use we could also provide an expert API that works like the current public foldToASCII method, which is memory efficient. But we may also provide String/StringBuilder converters for external use. Internally it cannot be better than it currently is :-)
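
One way to picture the two API layers suggested in the comment above is the following Java sketch: an "expert" method where the caller owns both buffers, and a convenience converter built on top of it. The class name FoldingApiSketch, both signatures, the MAX_EXPANSION constant, and the two mappings are hypothetical and chosen only for illustration; they are not taken from the attached patches.

```java
// Illustrative sketch only: hypothetical signatures and a two-entry mapping,
// not the API or the table of Lucene's ASCIIFoldingFilter.
final class FoldingApiSketch {

    /** Largest number of output chars one input char can become in this sketch. */
    static final int MAX_EXPANSION = 2;

    /**
     * Expert form: the caller owns both buffers, so nothing is allocated here.
     * The output must have room for MAX_EXPANSION * length chars after outputPos.
     * Returns the position just past the last char written.
     */
    static int foldToASCII(char[] input, int inputPos,
                           char[] output, int outputPos, int length) {
        final int end = inputPos + length;
        for (int i = inputPos; i < end; i++) {
            final char c = input[i];
            if (c < '\u0080') {
                output[outputPos++] = c;   // already ASCII: copy through
            } else if (c == '\u00C6') {    // AE ligature expands to two chars
                output[outputPos++] = 'A';
                output[outputPos++] = 'E';
            } else if (c == '\u00E9') {    // e with acute
                output[outputPos++] = 'e';
            } else {
                output[outputPos++] = c;   // no mapping in this sketch
            }
        }
        return outputPos;
    }

    /** Convenience form: allocates on every call, acceptable outside a hot loop. */
    static String foldToASCII(CharSequence text) {
        final char[] input = text.toString().toCharArray();
        final char[] output = new char[input.length * MAX_EXPANSION];
        final int written = foldToASCII(input, 0, output, 0, input.length);
        return new String(output, 0, written);
    }
}
```

In this arrangement a token filter could keep calling the expert form with its own reusable buffers, while external code calls the String-returning form and accepts the extra allocations.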

[jira] Commented: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

[ https://issues.apache.org/jira/browse/LUCENE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771576#action_12771576 ]

Cédrik LIME commented on LUCENE-2015:
-------------------------------------

Uwe,

ISOLatin1AccentFilter was already modified in Lucene 2.4: see LUCENE-1351

As for ASCIIFoldingFilter, I will take a second shot at an expert API next week. Stay tuned!

[jira] Commented: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

[ https://issues.apache.org/jira/browse/LUCENE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771578#action_12771578 ]

Robert Muir commented on LUCENE-2015:
-------------------------------------

> ISOLatin1AccentFilter was already modified in Lucene 2.4: see LUCENE-1351

That's interesting, so if someone has a < Lucene 2.4 index built with this filter, it's currently not compatible...

I guess no one has complained, but there could be some conditional logic based on Version to support those indexes...
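
The "conditional logic based on Version" idea in the comment above could look roughly like the following Java sketch. The enum MatchVersion is a hypothetical stand-in and not Lucene's Version class, and the choice of which characters are gated is arbitrary, made only to show the mechanism; it does not describe what LUCENE-1351 actually changed.

```java
// Illustrative sketch only: MatchVersion is a hypothetical stand-in for a
// match-version setting, and the gated characters are chosen arbitrarily.
final class VersionGatedFoldSketch {

    /** Ordered oldest to newest so compareTo can express "at least". */
    enum MatchVersion { V_OLD, V_NEW }

    private final MatchVersion matchVersion;

    VersionGatedFoldSketch(MatchVersion matchVersion) {
        this.matchVersion = matchVersion;
    }

    /** Folds one char, applying later-added mappings only for V_NEW and above. */
    char fold(char c) {
        switch (c) {
            case '\u00E9': // e with acute: mapped for every version
                return 'e';
            case '\u2018': // left single quotation mark: treated as a later addition
            case '\u2019': // right single quotation mark: treated as a later addition
                return matchVersion.compareTo(MatchVersion.V_NEW) >= 0 ? '\'' : c;
            default:
                return c;
        }
    }
}
```

An index built with the old behaviour would then be queried with a filter constructed for V_OLD, so the terms produced at query time match the terms already stored.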

[jira] Commented: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

[ https://issues.apache.org/jira/browse/LUCENE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771592#action_12771592 ]

Uwe Schindler commented on LUCENE-2015:
---------------------------------------

I would leave ISOLatin1AccentFilter as it is. No version logic for already deprecated classes: they are deprecated, so no support any more. Normally we would have removed it in 3.0; it is really only there to support old indexes, so no new features.

If nobody has complained until now, we do not need to care. Maybe the modifications were so special that only some of the terms in such indexes were affected and nobody noticed the difference.

[jira] Commented: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

[ https://issues.apache.org/jira/browse/LUCENE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771593#action_12771593 ]

Michael McCandless commented on LUCENE-2015:
--------------------------------------------

I think those changes to ISOLatin1AccentFilter predated our Version logic... I agree that had Version been around we probably should have used it.

[jira] Updated: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

[ https://issues.apache.org/jira/browse/LUCENE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cédrik LIME updated LUCENE-2015:
--------------------------------

    Attachment: ASCIIFoldingFilter-no_formatting.patch

As suggested by Robert, here is a new version of the ASCIIFoldingFilter patch which exposes the folding logic.
I have added 2 convenience methods that can operate on a char[] and on a CharSequence.

[jira] Commented: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

[ https://issues.apache.org/jira/browse/LUCENE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773014#action_12773014 ]

Robert Muir commented on LUCENE-2015:
-------------------------------------

Cédrik, thanks! At a glance this looks good to me... I can look at it more thoroughly later, I am heading out of town.

[jira] Commented: (LUCENE-2015) ASCIIFoldingFilter: expose folding logic + small improvements to ISOLatin1AccentFilter

[ https://issues.apache.org/jira/browse/LUCENE-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786710#action_12786710 ]

Mark Miller commented on LUCENE-2015:
-------------------------------------

For this type of stuff, "no one has complained" doesn't mean much. That's why these changes are so insidious: they are easy not to notice. Docs just disappear, and users likely don't know they ever existed. For some apps this is absolutely disastrous. We probably should have been more careful with LUCENE-1351, and we should be more careful in the future.