[jira] Created: (NUTCH-735) crawl-tool.xml must be read before nutch-site.xml when invoked using crawl command


[jira] Created: (NUTCH-735) crawl-tool.xml must be read before nutch-site.xml when invoked using crawl command

by JIRA jira@apache.org


crawl-tool.xml must be read before nutch-site.xml when invoked using crawl command
----------------------------------------------------------------------------------

                 Key: NUTCH-735
                 URL: https://issues.apache.org/jira/browse/NUTCH-735
             Project: Nutch
          Issue Type: Bug
          Components: web gui
    Affects Versions: 1.0.0
            Reporter: Susam Pal
            Priority: Minor


The inline documentation of 'conf/crawl-tool.xml' mentions:

{code:xml}
<!-- Do not modify this file directly.  Instead, copy entries that you -->
<!-- wish to modify from this file into nutch-site.xml and change them -->
<!-- there.  If nutch-site.xml does not already exist, create it.      -->
{code}

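However, I don't see any way of overriding the properties defined in 'conf/crawl-tool.xml'. In the code, 'conf/nutch-site.xml' is added to the configuration before 'conf/crawl-tool.xml', and a property defined in a resource added later overrides the same property from a resource added earlier (unless it is marked final). As a result, the values in 'conf/crawl-tool.xml' always win. Here are the relevant code snippets: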

*src/org/apache/nutch/crawl/Crawl.java:*

{code:java}
Configuration conf = NutchConfiguration.create();
conf.addResource("crawl-tool.xml");
JobConf job = new NutchJob(conf);
{code}

*src/org/apache/nutch/util/NutchConfiguration.java:*

{code:java}
conf.addResource("nutch-default.xml");
conf.addResource("nutch-site.xml");
{code}
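The override problem can be illustrated with a minimal sketch, assuming a simplified "last resource added wins" model. The `LayeredConfig` class, the property key `example.property` and its values are hypothetical and stand in for the real XML files; this is not Hadoop's `Configuration` API, and it ignores `<final>` properties. It reproduces the 1.0 ordering shown above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of layered configuration loading:
// resources are applied in the order they are added, and a property in a
// later resource replaces the value from an earlier one.
// NOT Hadoop's Configuration class; "final" properties are ignored.
class LayeredConfig {
    private final Map<String, String> values = new LinkedHashMap<>();
    private final List<String> loaded = new ArrayList<>();

    void addResource(String name, Map<String, String> props) {
        loaded.add(name);
        values.putAll(props); // later resources overwrite earlier keys
    }

    String get(String key) {
        return values.get(key);
    }

    List<String> resources() {
        return loaded;
    }
}

public class OriginalOrderDemo {
    // Hypothetical contents of the three files, reduced to one property.
    static final Map<String, String> NUTCH_DEFAULT = Map.of("example.property", "default");
    static final Map<String, String> NUTCH_SITE = Map.of("example.property", "site");
    static final Map<String, String> CRAWL_TOOL = Map.of("example.property", "crawl-tool");

    // Mirrors the 1.0 order: NutchConfiguration.create() adds nutch-default.xml
    // and nutch-site.xml, then Crawl adds crawl-tool.xml on top.
    static LayeredConfig createCrawlConfOriginal() {
        LayeredConfig conf = new LayeredConfig();
        conf.addResource("nutch-default.xml", NUTCH_DEFAULT);
        conf.addResource("nutch-site.xml", NUTCH_SITE);
        conf.addResource("crawl-tool.xml", CRAWL_TOOL);
        return conf;
    }

    public static void main(String[] args) {
        LayeredConfig conf = createCrawlConfOriginal();
        // crawl-tool.xml was added last, so it wins over nutch-site.xml.
        System.out.println(conf.resources() + " -> " + conf.get("example.property"));
    }
}
```

Running `main` prints `[nutch-default.xml, nutch-site.xml, crawl-tool.xml] -> crawl-tool`, i.e. the value a user puts in nutch-site.xml is silently discarded.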

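I have fixed this in the attached patch. 'crawl-tool.xml' is now loaded only when the crawl is invoked with the 'bin/nutch crawl' command, and in that case it is added to the configuration before 'nutch-site.xml', so that properties copied into 'nutch-site.xml' override it as the inline documentation describes.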
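The ordering after the fix can be sketched under the same simplified "last resource wins" assumption. The `resourceOrder` and `load` helpers, the `crawlCommand` flag and the `example.property` values are hypothetical illustrations of the behaviour described above, not the code from NUTCH-735v0.1.patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PatchedOrderDemo {
    // Hypothetical single-property contents of each resource file.
    static final Map<String, Map<String, String>> RESOURCES = Map.of(
        "nutch-default.xml", Map.of("example.property", "default"),
        "crawl-tool.xml", Map.of("example.property", "crawl-tool"),
        "nutch-site.xml", Map.of("example.property", "site"));

    // Resource order under the patched behaviour described in the issue:
    // crawl-tool.xml is loaded only for 'bin/nutch crawl', and before nutch-site.xml.
    static List<String> resourceOrder(boolean crawlCommand) {
        List<String> order = new ArrayList<>();
        order.add("nutch-default.xml");
        if (crawlCommand) {
            order.add("crawl-tool.xml");
        }
        order.add("nutch-site.xml"); // added last, so the user's overrides win
        return order;
    }

    // Later resources overwrite earlier ones (simplified; "final" is ignored).
    static Map<String, String> load(List<String> order) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (String name : order) {
            merged.putAll(RESOURCES.get(name));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> order = resourceOrder(true);
        System.out.println(order + " -> " + load(order).get("example.property"));
    }
}
```

Running `main` prints `[nutch-default.xml, crawl-tool.xml, nutch-site.xml] -> site`. If nutch-site.xml does not define a property, the crawl-tool.xml value still applies, which keeps the crawl defaults while allowing them to be overridden.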

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (NUTCH-735) crawl-tool.xml must be read before nutch-site.xml when invoked using crawl command

by JIRA jira@apache.org



     [ https://issues.apache.org/jira/browse/NUTCH-735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Susam Pal updated NUTCH-735:
----------------------------

    Attachment: NUTCH-735v0.1.patch

Attached patch.



[jira] Closed: (NUTCH-735) crawl-tool.xml must be read before nutch-site.xml when invoked using crawl command

by JIRA jira@apache.org



     [ https://issues.apache.org/jira/browse/NUTCH-735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney closed NUTCH-735.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 1.1
         Assignee: Doğacan Güney

Committed in rev. 782412.

Thanks!



[jira] Commented: (NUTCH-735) crawl-tool.xml must be read before nutch-site.xml when invoked using crawl command

by JIRA jira@apache.org



    [ https://issues.apache.org/jira/browse/NUTCH-735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717137#action_12717137 ]

Hudson commented on NUTCH-735:
------------------------------

Integrated in Nutch-trunk #838 (See [http://hudson.zones.apache.org/hudson/job/Nutch-trunk/838/])
     - crawl-tool.xml must be read before nutch-site.xml when invoked using crawl command. Patch by Susam Pal.

