Yes, I am also running into this problem. Can anyone help?
On Sun, Jul 5, 2009 at 11:33 PM, xiao yang <yangxiao9901@...> wrote:
> I often get this error message while crawling the intranet.
> Is it a network problem? What can I do about it?
>
> $ bin/nutch crawl urls -dir crawl -depth 3 -topN 4
>
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> topN = 4
> Injector: starting
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: Selecting best-scoring urls due for fetch.
> Generator: starting
> Generator: segment: crawl/segments/20090705212324
> Generator: filtering: true
> Generator: topN: 4
> Generator: Partitioning selected urls by host, for politeness.
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> Exception in thread "main" java.io.IOException: Job failed!
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
> at org.apache.nutch.crawl.Generator.generate(Generator.java:524)
> at org.apache.nutch.crawl.Generator.generate(Generator.java:409)
> at org.apache.nutch.crawl.Crawl.main(Crawl.java:116)
>