Hi there,
I seem to have a serious problem with Hadoop during the map-reduce job for
MergeSegments.
I am out of ideas on this, so any suggestions would be very welcome.
Here is my setup:
RAM: 4G
JVM HEAP: 2G
mapred.child.java.opts = 1024M
hadoop-0.19.1-core.jar
nutch-1.0
Xen VPS.
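To double-check that the heap figure above is really what the JVM sees, a quick check along these lines can be run with the same JVM options as the merge. This is only a minimal sketch: the class name HeapCheck and its helper methods are made up for illustration and are not part of Nutch or Hadoop; the only real API used is java.lang.Runtime.maxMemory().

```java
// Hypothetical helper (not part of Nutch or Hadoop): prints the heap limit
// of the JVM it runs in, using only the standard Runtime API.
public class HeapCheck {

    // Convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Maximum heap the current JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return toMiB(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Running it as `java -Xmx2g HeapCheck` should report a value close to 2048; the exact number may come out somewhat lower depending on how the JVM accounts for its heap regions.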
After running a recrawl a few times, I end up with one segment that is
much larger than the newly generated ones. Here is my
segments structure when things blow up after the 5th recrawl:
segment1 = 674 MB (after several recrawls)
segment2 = 580k (last recrawl)
segment3 = 568k (last recrawl)
segment4 = 584k (last recrawl)
..
segment8 = 560k (last recrawl)
When I run mergeSegments, everything goes well until the map-reduce job
reaches 90%, and then we get a thread death. Here is the stack trace:
2009-11-05 10:54:16,874 INFO [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
2009-11-05 10:54:29,794 INFO [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
2009-11-05 10:54:55,194 INFO [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
2009-11-05 10:57:25,844 WARN [org.apache.hadoop.mapred.LocalJobRunner] job_local_0001
java.lang.ThreadDeath
    at java.lang.Thread.stop(Thread.java:715)
    at org.apache.hadoop.mapred.LocalJobRunner.killJob(LocalJobRunner.java:310)
    at org.apache.hadoop.mapred.JobClient$NetworkedJob.killJob(JobClient.java:315)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1239)
    at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:620)
    at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:665)
Any suggestions, please!
Thanks.