How to run fetch from local

by saravan.krish

I generated the segments during the crawl process and then downloaded them from crawldb to the local filesystem. The listing below shows the four segments I generated and downloaded. When I run fetch on one of these segments, I get the error below. How can I run fetch locally?

[nutch@devcluster01 search]$ ls -lrt db/segments/crawled_22/segments/
total 32
drwxr-xr-x 8 nutch users 4096 Oct 23 03:17 20091022065049
drwxr-xr-x 8 nutch users 4096 Oct 23 03:17 20091022065828
drwxr-xr-x 8 nutch users 4096 Oct 23 03:17 20091022071136
drwxr-xr-x 8 nutch users 4096 Oct 23 03:17 20091022104701
[nutch@devcluster01 search]$ bin/nutch fetch db/segments/crawled_22/segments/20091022065049
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting
Fetcher: segment: db/segments/crawled_22/segments/20091022065049
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://devcluster01:9000/user/nutch/db/segments/crawled_22/segments/20091022065049/crawl_generate
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:39)
        at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:101)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:797)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:969)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1003)
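The stack trace shows that the relative path passed to bin/nutch fetch was looked up on HDFS (hdfs://devcluster01:9000/user/nutch/...) rather than on the local disk. The Python sketch below is a simplified model, not Hadoop's actual code, of how a path without a scheme appears to be qualified in this case. A relative path is joined onto the HDFS home directory /user/<user>, and the result is prefixed with the default filesystem URI. That URI is assumed here to be hdfs://devcluster01:9000, as shown in the error. The helper names qualify and missing_crawl_generate are hypothetical. The second helper only checks which local segment directories contain the crawl_generate subdirectory that the fetcher asked for.

```python
import os
import posixpath
from typing import List
from urllib.parse import urlparse


def qualify(path: str, default_fs: str, user: str) -> str:
    """Simplified model (hypothetical, not Hadoop's code) of resolving a path.

    A path that already has a scheme (file://, hdfs://) is returned unchanged.
    A relative path is assumed to be relative to /user/<user>, and any
    scheme-less path is then placed on the default filesystem.
    """
    if urlparse(path).scheme:
        return path  # already fully qualified
    if not path.startswith("/"):
        path = posixpath.join("/user", user, path)
    return default_fs.rstrip("/") + posixpath.normpath(path)


def missing_crawl_generate(segments_dir: str) -> List[str]:
    """Return names of local segment dirs that lack a crawl_generate subdir."""
    missing = []
    for name in sorted(os.listdir(segments_dir)):
        seg = os.path.join(segments_dir, name)
        if os.path.isdir(seg) and not os.path.isdir(os.path.join(seg, "crawl_generate")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed values, taken from the error message in the post.
    seg = "db/segments/crawled_22/segments/20091022065049"
    print(qualify(seg, "hdfs://devcluster01:9000", "nutch") + "/crawl_generate")
    local_dir = "db/segments/crawled_22/segments"
    if os.path.isdir(local_dir):
        print("segments without crawl_generate:", missing_crawl_generate(local_dir))
```

Run from the same directory as the bin/nutch command, the first print reproduces the HDFS path from the exception. If missing_crawl_generate returns an empty list, the segments look complete on local disk. That would suggest the problem is where the path is being resolved, not what the segments contain.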