|
|
[jira] Created: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project
------------------------------------------------------------

Key: LUCENE-1728
URL: https://issues.apache.org/jira/browse/LUCENE-1728
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/analyzers
Reporter: Simon Willnauer
Fix For: 2.9

SmartChineseAnalyzer depends on a large dictionary that causes the analyzer jar to grow to as much as 3MB. The dictionary is quite big compared to all the other resources / class files contained in that jar.

Having a separate analyzer-cn contrib project enables footprint-sensitive users (e.g. using Lucene on a mobile phone) to include analyzer.jar without getting into trouble with disk space.

Moving SmartChineseAnalyzer to a separate project could also include a small refactoring, as Robert mentioned in [LUCENE-1722|https://issues.apache.org/jira/browse/LUCENE-1722]: several classes should be package protected, members and classes could be final, commented-out syserr and logging code should be removed, etc.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...
|
|
[jira] Updated: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-1728:
------------------------------------

Description: (one sentence appended to the original description)

I set this issue's target to 2.9; if we cannot make it by then, feel free to move it to 3.0.

Priority: Minor (was: Major)
|
|
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726155#action_12726155 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Simon, what do you think about the name for the new folder?

I am concerned users will be confused between analyzers/cjk, analyzers/cn and analyzer-cn, all of which are different. Should we name the new package analyzer-cnhmm or something to help clarify it?

I also intend to add a little wordage to the javadocs to help disambiguate this, whatever we decide to name it.

Thanks
|
|
[jira] Assigned: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer reassigned LUCENE-1728:
---------------------------------------

Assignee: Simon Willnauer
|
|
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726184#action_12726184 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

> I am concerned users will be confused between analyzers/cjk, analyzers/cn and analyzer-cn, all of which are different. Should we name the new package analyzer-cnhmm or something to help clarify it?

The name analyzer-cn was just an example, but I don't like analyzer-cnhmm. What about analyzer-smartcn? Definitely +1 for a less ambiguous name.

> I intend to also add a little wordage to the javadocs to help disambiguate this, whatever we decide to name it.

+1
|
|
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726187#action_12726187 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Simon, analyzer-smartcn works, and it's consistent with the name of the analyzer.

If I "svn move" the files in my local checkout and submit a patch, will that ensure that history is preserved? I am not an svn expert.
|
|
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726314#action_12726314 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

> If i "svn move" the files in my local, and submit a patch, will it ensure that history is preserved? I am not an svn expert.

No. An svn copy (or svn move) will not be reflected in a patch. I guess I should do the moving and commit it, and the refactoring should be done afterwards. Would that make sense to you?

simon
|
|
[jira] Issue Comment Edited: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726314#action_12726314 ]

Simon Willnauer edited comment on LUCENE-1728 at 7/1/09 9:14 PM:
-----------------------------------------------------------------

> If i "svn move" the files in my local, and submit a patch, will it ensure that history is preserved? I am not an svn expert.

No. An svn copy (or svn move) will not be reflected in a patch. If you "svn move A B" and create a patch from your local working copy, the history of A will be lost. I guess I should do the moving and commit it; the refactoring should be done afterwards. Would that make sense to you?

simon
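To illustrate the point about patches, here is a minimal sketch that uses plain `diff` on two throwaway directories rather than `svn diff` on a working copy (an assumption made so the example runs without a repository; the file names `A.java` and `B.java` are hypothetical, and `-N` is a widely supported but non-POSIX `diff` option). A rename appears only as one file removed and another added, so nothing in the patch text ties B back to the history of A.

```shell
#!/bin/sh
# Illustration only: plain diff stands in for "svn diff"; names are hypothetical.
set -eu

work=$(mktemp -d)
mkdir "$work/before" "$work/after"

# The "before" tree has A.java; the "after" tree has the same content as B.java.
printf 'class A {}\n' > "$work/before/A.java"
printf 'class A {}\n' > "$work/after/B.java"

# diff exits with status 1 when the trees differ, so keep set -e from aborting.
patch_text=$(cd "$work" && diff -ruN before after || true)

# The output holds a deletion hunk for A.java and an addition hunk for B.java.
printf '%s\n' "$patch_text"
rm -rf "$work"
```

This is why the thread settles on having a committer perform the move in the repository first and applying the content changes as a patch afterwards.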
|
|
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726319#action_12726319 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Simon, I think I've almost got it ready. I've got a set of svn moves etc. that you can run first, then a patch to apply over it.

This way the patch only reflects the real changes, and you can review what these are before changing anything in SVN. I'll upload it soon.
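The "moves first, patch second" split described here can be rehearsed as a dry run. The sketch below is hypothetical (the `plan_moves` helper and the two example paths are assumptions, not the script attached to the issue): it only prints the `svn move` commands so they can be reviewed, and it never touches a working copy.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the svn commands instead of running them.
set -eu

# Reads "source destination" pairs from stdin, one pair per line.
plan_moves() {
    while read -r src dst; do
        # Skip blank lines.
        [ -n "$src" ] || continue
        printf 'svn move %s %s\n' "$src" "$dst"
    done
}

# Example input with made-up paths; real paths would come from the refactoring plan.
planned=$(plan_moves <<'EOF'
old/dir/Foo.java new/dir/Foo.java
old/dir/Bar.java new/dir/Bar.java
EOF
)

printf '%s\n' "$planned"
```

Once the printed commands look right, a committer could run them in a clean checkout and then apply the content-only patch on top, which keeps the review focused on the real changes.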
|
|
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726354#action_12726354 ]

Uwe Schindler commented on LUCENE-1728:
---------------------------------------

After creating the new contrib, do not forget to add the javadocs generation of the "all/" subdir in the main build.xml! New contribs should also be added to the developers part of the site docs and so on. I can do that if you like after the whole thing is committed (I have done it several times in the last months for spatial, trie,...).

Another idea: we can do it without creating a new contrib and instead do it like contrib-bdb, which consists of 2 sub-contribs. There, the contrib folder of bdb is divided into two sub-folders; the build.xml of the main folder is just a "delegator" (or whatever you would call it) that delegates the ant targets to the build.xml files in the sub-folders. Using this approach we would still have only one contrib-analyzers main folder with two subdirs, which are two separate contrib modules (like the two bdb ones) but live in one folder.

This approach only helps with the source code layout; the user still gets the jar files in the main build folder directly under contrib. So I am not sure if this is really better than two really separate contribs.
|
|
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726362#action_12726362 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

> After creating the new contrib, do not forget to add the javadocs generation of the "all/" subdir in the main build.xml! Also new contribs should be added to the developers part in the site docs and so on. I can do that if you like after committing the whole thing (I have done it several times the last months for spatial, trie,...).

Uwe, I guess I will not be able to commit those changes. This reminds me that contrib committers should have access to those files too. Once I get this change in, I will send you a patch so you can get it in.

> This approach is only good for source code, the user still gets the jar files in the main build folder directly under contrib. So I am not sure, if this is really better than two really separate contribs.

I really like this approach as it keeps the code logically consistent. I think we should go for this approach; it makes much more sense to me.
|
|
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726453#action_12726453 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

> I really like this approach as it keeps the code logically consistent. I think we should go for this approach, that makes much more sense to me.

Simon, are you referring to Uwe's approach of splitting the analyzers contrib into two, or your (previous) approach of an analyzer-smartcn contrib?
|
|
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726456#action_12726456 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

> Simon, are you referring to Uwe's approach of splitting the analyzers contrib into two, or your (previous) approach of analyzer-smartcn contrib?

I refer to Uwe's approach.
|
|
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726458#action_12726458 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Great, I like this too. Any preference on names? contrib/analyzers/analyzers and contrib/analyzers/smartcn?

Once we figure that out, I can create an updated set of svn moves + patch for this approach.
|
|
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726818#action_12726818 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

> contrib/analyzers/analyzers and contrib/analyzers/smartcn?

+1 go ahead!
|
|
[jira] Updated: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1728:
--------------------------------

Attachment: LUCENE-1728.txt

Simon, below is the method I used to do the refactoring with this patch.

I know I am pressing the limits of what is a "refactoring" but in my opinion, this minor cleanup was necessary to prevent internal structures from being exposed:
* Use of two Tokenizers in the same analyzer was confusing, WordTokenizer is now a TokenFilter.
* Analyzer uses the standard WordListLoader rather than custom stuff.
* Rather than force SmartChineseAnalyzer to keep track of internal heavyweight structures, it implements reusableTokenStream, etc.

I added a few tests to ensure I didn't break anything in the SmartChineseAnalyzer.

{noformat}
## 1. clean svn checkout
## 2. run the following commands to refactor the files.
mkdir -p contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn contrib/analysis/smartcn/src/test/org/apache/lucene/analysis/cn contrib/analysis/smartcn/src/resources/org/apache/lucene/analysis/cn
svn add contrib/analysis
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/SmartChineseAnalyzer.java contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart/hhmm/* contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart/*.java contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn
svn delete contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart
svn move contrib/analyzers/src/test/org/apache/lucene/analysis/cn/TestSmartChineseAnalyzer.java contrib/analysis/smartcn/src/test/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/resources/org/apache/lucene/analysis/cn/stopwords.txt contrib/analysis/smartcn/src/resources/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/resources/org/apache/lucene/analysis/cn/smart/hhmm/* contrib/analysis/smartcn/src/resources/org/apache/lucene/analysis/cn
svn delete contrib/analyzers/src/resources/org/apache/lucene/analysis/cn
svn move contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/WordTokenizer.java contrib/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/WordTokenFilter.java
svn move contrib/analyzers contrib/analysis
## 3. eclipse "refresh" at project level.
## 4. set text-file encoding at project level to UTF-8
## 5. manually force text-file encoding as UTF-8 for contrib/analysis/analyzers/src/java/org/apache/lucene/analysis/cn/package.html
## this is an existing encoding issue that is corrected by this patch.
## 6. apply patch from clipboard (you may now remove the above hack and you will notice this file is now detected properly as UTF-8)
{noformat} |
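Two of the cleanups listed in the message above, a single tokenizer feeding a filter and per-thread reuse of the stream, can be illustrated with a minimal self-contained Java sketch. It does not use Lucene: `Stream`, `SentenceSource`, `WordFilter`, `Chain` and `analyze` are hypothetical stand-ins, whitespace splitting stands in for the dictionary-based word segmentation, and the `ThreadLocal` only approximates the kind of reuse that reusableTokenStream is meant to provide. It is not the code in LUCENE-1728.txt.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for illustration; none of these are Lucene classes.
final class AnalyzerChainSketch {

    /** Minimal pull-style token stream: next() returns null when exhausted. */
    interface Stream {
        String next();
    }

    /** The only tokenizer in the chain: splits raw text into sentences. */
    static final class SentenceSource implements Stream {
        private String[] sentences = new String[0];
        private int pos;

        void reset(String text) {
            // Split on ASCII sentence punctuation or the ideographic full stop.
            sentences = text.split("[.!?\u3002]\\s*");
            pos = 0;
        }

        @Override
        public String next() {
            return pos < sentences.length ? sentences[pos++] : null;
        }
    }

    /** A filter rather than a second tokenizer: turns sentences into words. */
    static final class WordFilter implements Stream {
        private final Stream input;
        private final Deque<String> pending = new ArrayDeque<>();

        WordFilter(Stream input) {
            this.input = input;
        }

        void reset() {
            pending.clear();
        }

        @Override
        public String next() {
            while (pending.isEmpty()) {
                String sentence = input.next();
                if (sentence == null) {
                    return null;
                }
                // Assumption: whitespace splitting replaces real word segmentation.
                for (String word : sentence.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        pending.add(word);
                    }
                }
            }
            return pending.poll();
        }
    }

    /** Source plus filter, built once per thread and reset for each new text. */
    static final class Chain {
        final SentenceSource source = new SentenceSource();
        final WordFilter words = new WordFilter(source);
    }

    /** Counts chain constructions so that reuse can be observed. */
    static final AtomicInteger CHAINS_BUILT = new AtomicInteger();

    private static final ThreadLocal<Chain> PER_THREAD = ThreadLocal.withInitial(() -> {
        CHAINS_BUILT.incrementAndGet();
        return new Chain();
    });

    /** Analyzes text with the calling thread's cached chain. */
    static List<String> analyze(String text) {
        Chain chain = PER_THREAD.get();
        chain.source.reset(text);
        chain.words.reset();
        List<String> tokens = new ArrayList<>();
        for (String t = chain.words.next(); t != null; t = chain.words.next()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("first sentence. second one."));
        System.out.println(analyze("another text"));
        System.out.println("chains built: " + CHAINS_BUILT.get());
    }
}
```

Calling `analyze` repeatedly from one thread resets the same `Chain` instead of constructing a new one, which is the point of keeping heavyweight structures out of the caller's hands.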
|
|
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728264#action_12728264 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

Robert, thanks for all that! I just had a brief look at it but looks good so far. I need to look over it again in the next days. Plan to commit it this week.

simon |
|
|
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731354#action_12731354 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

Robert, I don't think we should rename the directory analyzers to analysis. I would rather go for analyzers/common and analyzers/smartcn or a similar scheme.

simon |
|
|
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731357#action_12731357 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Simon, sounds good. I will update the patch / svn commands with that scheme (hopefully in the next day or 2) |
|
|
[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

[ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731360#action_12731360 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

cool thanks! |