[jira] Created: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project


[jira] Updated: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1728:
--------------------------------

    Attachment: LUCENE-1728.txt

Simon, I revised the patch. Here are the new instructions for the analyzers/common and analyzers/smartcn scheme.
Sorry for the delay.

{code}
## 1. clean svn checkout
## 2. run the following commands to refactor the files.

mkdir contrib/analyzers/common
mkdir -p contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn contrib/analyzers/smartcn/src/test/org/apache/lucene/analysis/cn contrib/analyzers/smartcn/src/resources/org/apache/lucene/analysis/cn
svn add contrib/analyzers/smartcn contrib/analyzers/common
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/SmartChineseAnalyzer.java contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart/hhmm/* contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart/*.java contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn
svn delete contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart
svn move contrib/analyzers/src/test/org/apache/lucene/analysis/cn/TestSmartChineseAnalyzer.java contrib/analyzers/smartcn/src/test/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/resources/org/apache/lucene/analysis/cn/stopwords.txt contrib/analyzers/smartcn/src/resources/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/resources/org/apache/lucene/analysis/cn/smart/hhmm/* contrib/analyzers/smartcn/src/resources/org/apache/lucene/analysis/cn
svn delete contrib/analyzers/src/resources/org/apache/lucene/analysis/cn
svn move contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/WordTokenizer.java contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/WordTokenFilter.java
svn move contrib/analyzers/build.xml contrib/analyzers/common
svn move contrib/analyzers/pom.xml.template contrib/analyzers/common
svn move contrib/analyzers/src contrib/analyzers/common

## 3. eclipse "refresh" at project level.
## 4. set text-file encoding at project level to UTF-8
## 5. manually force text-file encoding as UTF-8 for contrib/analyzers/common/src/java/org/apache/lucene/analysis/cn/package.html
##   this is an existing encoding issue that is corrected by this patch.
## 6. apply patch from clipboard (you may now remove the above hack and you will notice this file is now detected properly as UTF-8)
{code}


> Move SmartChineseAnalyzer & resources to own contrib project
> ------------------------------------------------------------
>
>                 Key: LUCENE-1728
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1728
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1728.txt, LUCENE-1728.txt
>
>
> SmartChineseAnalyzer depends on a large dictionary that causes the analyzer jar to grow up to 3MB. The dictionary is quite big compared to all the other resources / class files contained in that jar.
> Having a separate analyzer-cn contrib project enables footprint-sensitive users (e.g. using Lucene on a mobile phone) to include analyzer.jar without getting into trouble with disk space.
> Moving SmartChineseAnalyzer to a separate project could also include a small refactoring, as Robert mentioned in [LUCENE-1722|https://issues.apache.org/jira/browse/LUCENE-1722]: several classes should be package protected, members and classes could be final, commented-out syserr and logging code should be removed, etc.
> I set this issue's target to 2.9; if we cannot make it by then, feel free to move it to 3.0.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@...
For additional commands, e-mail: java-dev-help@...


[jira] Updated: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1728:
--------------------------------

    Attachment: LUCENE-1728.txt

Same patch, but this time I clicked ASF license... sorry!



[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733538#action_12733538 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

Robert, I have looked at this patch and, more importantly, at the source itself, and I increasingly get the impression that this analyzer and its related classes need more work than just moving them into one package and making everything package-private. As I understand it, the Hidden Markov Model segmenter is a feature that could be replaced by some other algorithm. Once you have such a feature relationship, I would prefer packages organized by feature, which lets you remove a single feature just by removing a whole package.
In other words, I would love to see a general refactoring of the code that exposes a tiny but common API in the base package, which is then used by the HHMM "feature". That is quite a bit of work, and I do not consider it 2.9 work.
So here is the question: do we keep the structure as it is and just move it to a new subdir to build a separate jar, or do we move the classes into one single package (as you did in the patch) and build up a clean HHMM package later in 3.*?
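
To illustrate the packaging-by-feature idea, here is a minimal sketch. It is not code from the patch or from Lucene: `SentenceSegmenter`, `WhitespaceSegmenter`, `CodePointSegmenter` and `countTokens` are all hypothetical names, and the two implementations are trivial stand-ins for a real feature such as the HHMM segmenter. The point is only that base-package code depends on a small interface, so an implementation can be swapped or removed as a unit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: none of these names exist in Lucene or in the patch. */
public class SegmenterSketch {

    /** The tiny common API that would live in the base package. */
    public interface SentenceSegmenter {
        List<String> segment(String sentence);
    }

    /** Stand-in for one feature package (for example, where an HHMM segmenter would go). */
    public static final class WhitespaceSegmenter implements SentenceSegmenter {
        @Override
        public List<String> segment(String sentence) {
            List<String> words = new ArrayList<>();
            for (String part : sentence.trim().split("\\s+")) {
                if (!part.isEmpty()) {
                    words.add(part);
                }
            }
            return words;
        }
    }

    /** A second, interchangeable feature: one token per Unicode code point. */
    public static final class CodePointSegmenter implements SentenceSegmenter {
        @Override
        public List<String> segment(String sentence) {
            List<String> words = new ArrayList<>();
            for (int i = 0; i < sentence.length(); ) {
                int cp = sentence.codePointAt(i);
                words.add(new String(Character.toChars(cp)));
                i += Character.charCount(cp);
            }
            return words;
        }
    }

    /** Base-package code sees only the interface, never a concrete feature. */
    public static int countTokens(SentenceSegmenter segmenter, String text) {
        return segmenter.segment(text).size();
    }

    public static void main(String[] args) {
        System.out.println(countTokens(new WhitespaceSegmenter(), "a b c"));
        // "\u4e2d\u6587" is the two-character string for "Chinese (language)".
        System.out.println(countTokens(new CodePointSegmenter(), "\u4e2d\u6587"));
    }
}
```

With a layout like this, deleting the package that holds one implementation would not touch the base package, which is the property described above.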

Besides the packaging, I found heaps of things I do not like very much in the code (not your patch :) and my fingertips get nervous when I see stuff like the AbstractDictionary hierarchy or those Singletons. I would really like to have this separation of CN and common analyzers in for 2.9; we just need to decide which way to go. I guess moving it over without changing code would be easiest.

simon




[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733544#action_12733544 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Simon, I agree with you, there is a ton of work to be done.

I also did not particularly like my method of moving everything into one package to hide the internals... and I 100% agree that a "correct" refactoring is quite a bit of work.

I don't want to sound like a complainer since I don't have a patch to fix these things, but I want to list some things that I would like to fix/refactor also.
* removal of the GB2312 dictionary dependency: this limits functionality to Simplified Chinese.
* use of Unicode categories (the Java Character class, etc.) instead of Utility.getCharType()
* support for code points outside of the BMP; this is necessary to support Traditional Chinese.
* a little more flexibility with tokenization. Honestly, I'm really not sold on indexing "words" for Chinese in the first place, but words + bigrams (overlapping tokens) would be nice.
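
A rough sketch of what three of these points could look like in plain Java follows. It is not taken from the analyzer: the class and method names are hypothetical. It relies only on JDK calls (`String.codePointAt`, `Character.charCount`, `Character.toChars`, and `Character.UnicodeScript`, the last of which needs Java 7 or later). It iterates by code point so that characters outside the BMP stay whole, classifies characters with the JDK's Unicode data, and emits overlapping bigrams over runs of Han characters.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not part of SmartChineseAnalyzer. */
public class CodePointBigrams {

    /** Splits text into code points, so supplementary characters are never cut in half. */
    public static List<String> codePoints(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            out.add(new String(Character.toChars(cp)));
            i += Character.charCount(cp);
        }
        return out;
    }

    /** True for Han ideographs, decided by the JDK's Unicode script data. */
    public static boolean isHan(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN;
    }

    /**
     * Overlapping bigrams over consecutive Han code points.
     * Any non-Han character ends the current run; an isolated Han character yields no bigram.
     */
    public static List<String> hanBigrams(String text) {
        List<String> bigrams = new ArrayList<>();
        String previous = null;
        for (String current : codePoints(text)) {
            if (!isHan(current.codePointAt(0))) {
                previous = null;
                continue;
            }
            if (previous != null) {
                bigrams.add(previous + current);
            }
            previous = current;
        }
        return bigrams;
    }

    public static void main(String[] args) {
        // U+20000 is a CJK Extension B ideograph outside the BMP (a surrogate pair in UTF-16).
        String text = "\u4e2d\u6587" + new String(Character.toChars(0x20000));
        System.out.println(codePoints(text).size());
        System.out.println(hanBigrams(text));
    }
}
```

A real token stream would also need offsets and a way to combine bigrams with word tokens; this sketch leaves both out.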

In the future it would be nice to add support for Traditional Chinese, and there is frequency data out there (libtabe: BSD license, etc.), but we need to refactor first.

As far as what to do for 2.9... I really don't know either, just let me know if you need a new patch :)




[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733547#action_12733547 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

bq. I don't want to sound like a complainer since I don't have a patch to fix these things, but I want to list some things that I would like to fix/refactor also.

 :) Pushing things forward is not complaining to me. I agree with your points; I did not look closely at implementation details but rather at structural things. Apparently we both agree that we have work to do on this, and I guess we can work out good solutions together in the future. Let's just move the classes into their own subdir as you already did and keep the structure as it is (with the smallest possible changes; some classes have to be moved). If you could provide a patch, I will commit the refactoring and we can open a new issue for 3.*.
This solution seems ideal, as the 2.9 release is quite close...


simon



[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733552#action_12733552 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Simon, OK, I will work on a patch that tries to maintain the package structure.

Other than package structure, is there anything in the patch you are uncomfortable with?
I can either revert any small fixes you don't like or create more test cases, whatever makes sense.




[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733556#action_12733556 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

bq. Other than package structure, is there anything in the patch you are uncomfortable with?
Not that I could tell. You can keep whatever fits the existing package structure, which means we might have to keep some classes public, etc.

Thanks for your patience! Good job!

simon



[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733560#action_12733560 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Simon, yes, because of the package structure some things may have to be public that should not be.

I'll see if I can improve the javadocs for anything that falls in this situation as a short-term workaround.




[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733986#action_12733986 ]

Michael Busch commented on LUCENE-1728:
---------------------------------------

So we are going to move everything currently under contrib/analyzers to contrib/analyzers/common?



[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733989#action_12733989 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Yeah, except SmartChineseAnalyzer. I am testing the latest patch (which keeps the previous SmartChineseAnalyzer package structure), regenerating docs, etc.

I will upload it in a few when I think it is good to go.



[jira] Updated: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-1728:
--------------------------------

    Attachment: LUCENE-1728.txt

Simon, here is the new patch. It also has the changes to build.xml and site.xml so that javadocs are correctly linked, and the regenerated docs.

{noformat}
## 1. clean svn checkout
## 2. run the following commands to refactor the files.

mkdir contrib/analyzers/common
mkdir -p contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn contrib/analyzers/smartcn/src/test/org/apache/lucene/analysis/cn contrib/analyzers/smartcn/src/resources/org/apache/lucene/analysis
svn add contrib/analyzers/smartcn contrib/analyzers/common
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/SmartChineseAnalyzer.java contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/java/org/apache/lucene/analysis/cn/smart contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/test/org/apache/lucene/analysis/cn/TestSmartChineseAnalyzer.java contrib/analyzers/smartcn/src/test/org/apache/lucene/analysis/cn
svn move contrib/analyzers/src/resources/org/apache/lucene/analysis/cn contrib/analyzers/smartcn/src/resources/org/apache/lucene/analysis
svn copy contrib/analyzers/build.xml contrib/analyzers/common
svn move contrib/analyzers/pom.xml.template contrib/analyzers/common
svn move contrib/analyzers/src contrib/analyzers/common
svn move contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/WordTokenizer.java contrib/analyzers/smartcn/src/java/org/apache/lucene/analysis/cn/smart/WordTokenFilter.java

## 3. eclipse "refresh" at project level.
## 4. set text-file encoding at project level to UTF-8
## 5. manually force text-file encoding as UTF-8 for contrib/analyzers/common/src/java/org/apache/lucene/analysis/cn/package.html
##   also manually force text-file encoding as UTF-8 for contrib/analyzers/common/src/java/org/apache/lucene/analysis/cjk/package.html
##   this is an existing encoding issue that is corrected by this patch.
## 6. apply patch from clipboard (you may now remove the above hack and you will notice the above files are now detected properly as UTF-8)
{noformat}
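
Steps 5 and 6 hinge on the two package.html files being read as UTF-8. As an optional check outside Eclipse, a small program like the following could confirm that a file's bytes decode as strict UTF-8. This is only a sketch and not part of the patch: `Utf8Check` and `isValidUtf8` are hypothetical names, and the file paths are whatever you pass on the command line.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper, not part of the patch. */
public class Utf8Check {

    /** Returns true if the bytes decode as UTF-8 without any malformed sequence. */
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Each argument is a path to check, e.g. one of the package.html files named in step 5.
        for (String path : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(path));
            System.out.println(path + ": " + (isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```

Note that a pure-ASCII file also passes this check, so it can only rule out files saved in a legacy encoding; it cannot prove that the editor will detect UTF-8 on its own.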



[jira] Resolved: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


     [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-1728.
-------------------------------------

    Resolution: Fixed

Robert, I just committed your patch. Thanks a lot for that.
I added the equals and hashCode methods back to the classes you removed them from, just in case.

@Uwe (or some other core committer): could you please prepare the documentation and top-level build.xml changes and commit them? Thanks! I think Robert already prepared everything in his patch.



[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734647#action_12734647 ]

Robert Muir commented on LUCENE-1728:
-------------------------------------

Simon, thanks!

Oh, the equals and hashCode methods were commented out in the original source (I removed the commented-out lines).

I was afraid to uncomment them (I didn't know why they were commented out),
but I shouldn't have deleted the commented-out lines... thanks for resolving this.
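
For readers following the equals/hashCode point, here is a minimal sketch of what a consistent pair looks like on a small value class. It is not the code that was committed: TokenSketch and its three fields are invented for illustration and are not meant to mirror any smartcn class.

```java
import java.util.Arrays;

// Hypothetical value class, used only to illustrate the equals/hashCode contract.
final class TokenSketch {
    private final char[] text;
    private final int startOffset;
    private final int endOffset;

    TokenSketch(char[] text, int startOffset, int endOffset) {
        this.text = text.clone(); // defensive copy so later changes to the caller's array have no effect
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof TokenSketch)) return false;
        TokenSketch other = (TokenSketch) obj;
        // Arrays.equals compares array contents; char[].equals would compare identity.
        return startOffset == other.startOffset
                && endOffset == other.endOffset
                && Arrays.equals(text, other.text);
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields equals() compares, so equal objects hash alike.
        int result = Arrays.hashCode(text);
        result = 31 * result + startOffset;
        result = 31 * result + endOffset;
        return result;
    }
}
```

The point of keeping the two methods together is that hash-based collections such as HashSet and HashMap only behave correctly when equal objects return the same hash code.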




[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734685#action_12734685 ]

Michael McCandless commented on LUCENE-1728:
--------------------------------------------

I'll commit the top-level changes for the web-site.  Thanks Robert!

For additional commands, e-mail: java-dev-help@...


[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734695#action_12734695 ]

Simon Willnauer commented on LUCENE-1728:
-----------------------------------------

bq. I'll commit the top-level changes for the web-site. Thanks Robert!
Thanks, Mike!

For additional commands, e-mail: java-dev-help@...


[jira] Commented: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734768#action_12734768 ]

Uwe Schindler commented on LUCENE-1728:
---------------------------------------

I committed the incorrect javadocs dirs for contrib/analysis in the main build.xml.
Revision: 797213

For additional commands, e-mail: java-dev-help@...


[jira] Issue Comment Edited: (LUCENE-1728) Move SmartChineseAnalyzer & resources to own contrib project

by JIRA jira@apache.org


    [ https://issues.apache.org/jira/browse/LUCENE-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734768#action_12734768 ]

Uwe Schindler edited comment on LUCENE-1728 at 7/23/09 1:33 PM:
----------------------------------------------------------------

I committed a fix for the incorrect javadocs dirs for contrib/analysis in the main build.xml.
Revision: 797213

      was (Author: thetaphi):
    I committed the incorrect javadocs dirs for contrib/analysis in the main build.xml.
Revision: 797213
 

For additional commands, e-mail: java-dev-help@...
