[ https://issues.apache.org/jira/browse/NUTCH-735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney closed NUTCH-735.
-------------------------------
Resolution: Fixed
Fix Version/s: 1.1
Assignee: Doğacan Güney
Committed in rev. 782412.
Thanks!
> crawl-tool.xml must be read before nutch-site.xml when invoked using crawl command
> ----------------------------------------------------------------------------------
>
> Key: NUTCH-735
> URL: https://issues.apache.org/jira/browse/NUTCH-735
> Project: Nutch
> Issue Type: Bug
> Components: web gui
> Affects Versions: 1.0.0
> Reporter: Susam Pal
> Assignee: Doğacan Güney
> Priority: Minor
> Fix For: 1.1
>
> Attachments: NUTCH-735v0.1.patch
>
>
> The inline documentation of 'conf/crawl-tool.xml' mentions:
> {code:xml}
> <!-- Do not modify this file directly. Instead, copy entries that you -->
> <!-- wish to modify from this file into nutch-site.xml and change them -->
> <!-- there. If nutch-site.xml does not already exist, create it. -->
> {code}
> However, I don't see any way to override the properties defined in 'conf/crawl-tool.xml', because 'conf/nutch-site.xml' is added to the configuration before 'conf/crawl-tool.xml' in the code, so the values from 'crawl-tool.xml' take precedence. Here are the relevant code snippets:
> *src/org/apache/nutch/crawl/Crawl.java:*
> {code:java}
> Configuration conf = NutchConfiguration.create();
> conf.addResource("crawl-tool.xml");
> JobConf job = new NutchJob(conf);
> {code}
> *src/org/apache/nutch/tool/NutchConfiguration.java:*
> {code:java}
> conf.addResource("nutch-default.xml");
> conf.addResource("nutch-site.xml");
> {code}
> I have fixed this in the attached patch: 'crawl-tool.xml' is now added to the configuration before 'nutch-site.xml', and only when the crawl is invoked using the 'bin/nutch crawl' command.
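The ordering problem described above can be illustrated with a small standalone sketch. It does not use the Hadoop or Nutch classes; `ConfigOrderDemo` and its `load` method are hypothetical stand-ins that only model the "a resource added later overrides one added earlier" behaviour the report relies on, and the property name and values are illustrative assumptions rather than a copy of the shipped configuration files.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of a layered configuration: resources are applied in
// order, and a later resource overrides values set by an earlier one.
// This is NOT the Hadoop Configuration class, only a sketch of its ordering.
final class ConfigOrderDemo {

    // Merge resources in the given order; later entries win on key conflicts.
    static Map<String, String> load(List<Map<String, String>> resources) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> resource : resources) {
            merged.putAll(resource);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Illustrative property name and values (assumptions for this sketch).
        String key = "urlfilter.regex.file";
        Map<String, String> nutchDefault = Map.of(key, "regex-urlfilter.txt");
        Map<String, String> crawlTool = Map.of(key, "crawl-urlfilter.txt");
        Map<String, String> nutchSite = Map.of(key, "my-urlfilter.txt");

        // Order described in the report: nutch-site.xml is added before
        // crawl-tool.xml, so the site override is lost.
        Map<String, String> reported = load(List.of(nutchDefault, nutchSite, crawlTool));

        // Order after the fix: crawl-tool.xml is added before nutch-site.xml,
        // so the site override takes effect.
        Map<String, String> patched = load(List.of(nutchDefault, crawlTool, nutchSite));

        System.out.println("reported order: " + reported.get(key)); // crawl-urlfilter.txt
        System.out.println("patched order:  " + patched.get(key));  // my-urlfilter.txt
    }
}
```

In this model the only difference between the two cases is the position of the site-level map in the list, which mirrors why moving the 'crawl-tool.xml' load ahead of 'nutch-site.xml' is enough to make site overrides work.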