Can anyone help? So disappointed.
On Fri, Jul 10, 2009 at 4:29 PM, lei wang <nutchmaillist@...> wrote:
> Yes, I am also running into this problem. Can anyone help?
>
>
> On Sun, Jul 5, 2009 at 11:33 PM, xiao yang <yangxiao9901@...> wrote:
>
>> I often get this error message while crawling the intranet.
>> Is it a network problem? What can I do about it?
>>
>> $bin/nutch crawl urls -dir crawl -depth 3 -topN 4
>>
>> crawl started in: crawl
>> rootUrlDir = urls
>> threads = 10
>> depth = 3
>> topN = 4
>> Injector: starting
>> Injector: crawlDb: crawl/crawldb
>> Injector: urlDir: urls
>> Injector: Converting injected urls to crawl db entries.
>> Injector: Merging injected urls into crawl db.
>> Injector: done
>> Generator: Selecting best-scoring urls due for fetch.
>> Generator: starting
>> Generator: segment: crawl/segments/20090705212324
>> Generator: filtering: true
>> Generator: topN: 4
>> Generator: Partitioning selected urls by host, for politeness.
>> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>> Exception in thread "main" java.io.IOException: Job failed!
>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>>     at org.apache.nutch.crawl.Generator.generate(Generator.java:524)
>>     at org.apache.nutch.crawl.Generator.generate(Generator.java:409)
>>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:116)
>>
>
>
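One commonly reported cause of `Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES` in Hadoop is that reduce tasks cannot fetch map output from a TaskTracker because the machine's hostname does not resolve properly. That is an assumption here; nothing in this thread confirms it. The minimal Python sketch below uses a hypothetical helper, `check_hostname_resolution`, that only checks how this machine resolves a hostname (its own by default); it does not contact Hadoop or Nutch.

```python
import ipaddress
import socket


def check_hostname_resolution(hostname=None):
    """Resolve a hostname (default: this machine's) to an IPv4 address.

    Returns (name, address, is_loopback); address is None when the name
    does not resolve. Hypothetical helper for illustration only: it does
    not talk to Hadoop or Nutch.
    """
    name = hostname or socket.gethostname()
    try:
        # IPv4 lookup through the system resolver (/etc/hosts, then DNS).
        addr = socket.gethostbyname(name)
    except OSError:
        # socket.gaierror and socket.herror are OSError subclasses.
        return name, None, False
    return name, addr, ipaddress.ip_address(addr).is_loopback


if __name__ == "__main__":
    name, addr, loopback = check_hostname_resolution()
    if addr is None:
        print(f"{name!r} does not resolve; check /etc/hosts or DNS")
    elif loopback:
        # Acceptable on a single machine, but other cluster nodes
        # cannot reach this host through a loopback address.
        print(f"{name!r} resolves to loopback address {addr}")
    else:
        print(f"{name!r} resolves to {addr}")
```

If the hostname does not resolve, adding an entry for it to `/etc/hosts` on each node is a fix people often report; whether it applies here depends on how the Hadoop setup is configured.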