MergeSegments - map reduce thread death

by Fadzi Ushewokunze-2

Hi there,

It seems I have some serious problems with Hadoop during the map-reduce
job for MergeSegments.

I am out of ideas on this. Any suggestions would be very welcome.

Here is my setup:

RAM: 4 GB
JVM heap: 2 GB
mapred.child.java.opts = 1024M
hadoop-0.19.1-core.jar
nutch-1.0
Xen VPS.

After running a recrawl a few times, I end up with one segment that is
much larger than the newly generated ones. Here is my segments
structure when things blow up after the 5th recrawl:

segment1 = 674 MB (after several recrawls)
segment2 = 580 KB (last recrawl)
segment3 = 568 KB (last recrawl)
segment4 = 584 KB (last recrawl)
..
segment8 = 560 KB (last recrawl)

When I run mergeSegments, everything goes well until the map-reduce job
reaches about 90%, and then we get a thread death. Here is the stack trace:

2009-11-05 10:54:16,874 INFO  [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
2009-11-05 10:54:29,794 INFO  [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
2009-11-05 10:54:55,194 INFO  [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
2009-11-05 10:57:25,844 WARN  [org.apache.hadoop.mapred.LocalJobRunner] job_local_0001
java.lang.ThreadDeath
        at java.lang.Thread.stop(Thread.java:715)
        at org.apache.hadoop.mapred.LocalJobRunner.killJob(LocalJobRunner.java:310)
        at org.apache.hadoop.mapred.JobClient$NetworkedJob.killJob(JobClient.java:315)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1239)
        at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:620)
        at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:665)

Any suggestions, please!

Thanks.