reduce > heap space error

hi,

I am running on a single machine with 2G RAM and the Java heap space set to 1024m. The segments are tiny (fewer than 100 URLs), and during mergeSegments I get the exception below.

I have set mapred.child.java.opts=-Xmx512m but there is no change.

Any suggestions?

====>

2009-11-03 17:58:28,971 INFO [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
2009-11-03 17:58:38,448 INFO [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
2009-11-03 17:58:57,085 INFO [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
2009-11-03 17:59:34,723 INFO [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
2009-11-03 18:02:09,660 INFO [org.apache.hadoop.mapred.TaskRunner] Communication exception: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.Counters$Group.getCounterForName(Counters.java:327)
    at org.apache.hadoop.mapred.Counters.incrAllCounters(Counters.java:494)
    at org.apache.hadoop.mapred.Counters.sum(Counters.java:506)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.statusUpdate(LocalJobRunner.java:222)
    at org.apache.hadoop.mapred.Task$1.run(Task.java:418)
    at java.lang.Thread.run(Thread.java:619)

2009-11-03 18:02:10,376 WARN [org.apache.hadoop.mapred.LocalJobRunner] job_local_0001
java.lang.ThreadDeath
    at java.lang.Thread.stop(Thread.java:715)
    at org.apache.hadoop.mapred.LocalJobRunner.killJob(LocalJobRunner.java:310)
    at org.apache.hadoop.mapred.JobClient$NetworkedJob.killJob(JobClient.java:315)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1239)
    at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:620)
    at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:665)
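A note on why the child option may make no difference here: the trace comes from LocalJobRunner, and as far as I know local mode runs the map and reduce tasks inside the JVM that launched the job rather than forking child JVMs, so mapred.child.java.opts would not be consulted. If that holds, the heap that matters is the one given to the launching JVM. A minimal sketch to see what that JVM actually got (the class name HeapCheck is made up for illustration; run it with the same -Xmx you pass to Nutch):

```java
// Hypothetical helper, not part of Nutch or Hadoop.
// Reports the heap ceiling of the JVM this code is running in.
class HeapCheck {

    /** Maximum heap the current JVM will attempt to use, in whole megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMb() + " MB");
    }
}
```

For example, `java -Xmx1024m HeapCheck.java` (Java 11 or later) should report a value close to, but usually slightly below, 1024.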
Re: reduce > heap space error

If you add "-XX:-UseGCOverheadLimit" to the value of mapred.child.java.opts, you can bypass this exception. I don't know whether it has any side effects.

Example:
-Xmx512m -XX:-UseGCOverheadLimit

On Tue, Nov 3, 2009 at 7:50 AM, Fadzi Ushewokunze <fadzi@...> wrote:
> hi,
>
> i am running on a single machine; 2G RAM, and java heap space set at
> 1024m, the segments are quite - tiny less than 100 urls and during
> mergeSegments i get this exception below;
>
> i have set mapred.child.java.opts=-Xmx512m but there is no change;
>
> any suggestions?
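One detail worth checking with this suggestion: HotSpot -XX options are written without a space after the colon, so the flag is -XX:-UseGCOverheadLimit (it turns off the "GC overhead limit exceeded" check). To confirm which options a JVM really started with, something like the following sketch could be used. The class name JvmFlagCheck is an assumption for illustration; the RuntimeMXBean call is standard JDK.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, not part of Nutch or Hadoop.
// Lists the options the current JVM was started with and looks for one flag.
class JvmFlagCheck {

    /** True if the exact flag string appears among the JVM arguments. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not to main), e.g. -Xmx512m.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("-XX:-UseGCOverheadLimit present: "
                + hasFlag(jvmArgs, "-XX:-UseGCOverheadLimit"));
    }
}
```

This only reports on the JVM it runs in, so it helps when started with the same options as the process you are debugging.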
Re: reduce > heap space error + DiskChecker$DiskErrorException

hi there,

There was a little improvement; at least it's not running out of RAM anymore. But you're right, there seems to be a side effect.

I am now having what seem to be disk issues! I am running in a VPS, so I suspect that might have something to do with it. But what is the cause now?

==>>

00:36:28,912 INFO [TaskRunner] Task 'attempt_local_0001_m_000064_0' done.
00:36:29,104 INFO [MapTask] numReduceTasks: 1
00:36:29,104 INFO [MapTask] io.sort.mb = 100
00:36:29,240 INFO [MapTask] data buffer = 79691776/99614720
00:36:29,240 INFO [MapTask] record buffer = 262144/327680
00:36:29,260 INFO [CodecPool] Got brand-new decompressor
00:36:29,264 INFO [MapTask] Starting flush of map output
00:36:29,276 INFO [MapTask] Finished spill 0
00:36:29,280 INFO [TaskRunner] Task:attempt_local_0001_m_000065_0 is done. And is in the process of commiting
00:36:29,280 INFO [LocalJobRunner] file:/home/meda/workspace/web/crawl/segments/20091101171338/parse_text/part-00000/data:0+12655
00:36:29,280 INFO [TaskRunner] Task 'attempt_local_0001_m_000065_0' done.
00:36:38,533 WARN [LocalJobRunner] job_local_0001
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/file.out in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:381)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
    at org.apache.hadoop.mapred.MapOutputFile.getOutputFile(MapOutputFile.java:50)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:150)
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:620)
    at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:665)

On Tue, 2009-11-03 at 10:28 -0500, Kalaimathan Mahenthiran wrote:
> if you set the mapred.child.java.opts
> with additional value "-XX:-UseGCOverheadLimit" you can bypass this
> exception. I don't know if it has any side effects as a result of
> this..
> ex.
> -Xmx512m -XX:-UseGCOverheadLimit
Re: reduce > heap space error + DiskChecker$DiskErrorException

It seems this was a file permissions error; deleting the files generated by Hadoop in /tmp seems to have taken care of the disk error. I'm not sure whether this is the best thing to do?

But now it looks like there is a sudden thread death, with no explanation:

2009-11-04 14:56:41,613 WARN [org.apache.hadoop.mapred.LocalJobRunner] job_local_0001
java.lang.ThreadDeath
    at java.lang.Thread.stop(Thread.java:715)
    at org.apache.hadoop.mapred.LocalJobRunner.killJob(LocalJobRunner.java:310)
    at org.apache.hadoop.mapred.JobClient$NetworkedJob.killJob(JobClient.java:315)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1239)
    at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:620)
    at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:665)
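Since the disk error turned out to be a permissions problem under /tmp, a quick check before re-running might save a round trip. The sketch below tests whether the current user can use a given directory. It assumes the Hadoop scratch directory is the usual default of /tmp/hadoop-<user name>; if hadoop.tmp.dir or mapred.local.dir is set to something else in your config, pass that path as the first argument. The class name LocalDirCheck is made up for illustration.

```java
import java.io.File;

// Hypothetical helper, not part of Nutch or Hadoop.
// Checks whether the current user can read, write and traverse a directory.
class LocalDirCheck {

    /** True if dir exists, is a directory, and is readable, writable and traversable. */
    static boolean usable(File dir) {
        return dir.isDirectory() && dir.canRead() && dir.canWrite() && dir.canExecute();
    }

    public static void main(String[] args) {
        // Assumed default location; override with the first command-line argument.
        String assumedDefault = "/tmp/hadoop-" + System.getProperty("user.name");
        File dir = new File(args.length > 0 ? args[0] : assumedDefault);

        System.out.println(dir + " usable by current user: " + usable(dir));
        // getUsableSpace() returns 0 when the path does not exist.
        System.out.println("usable space (MB): " + dir.getUsableSpace() / (1024L * 1024L));
    }
}
```

If the directory was created by another user (for example by an earlier run as root), the check reports false; removing the directory or changing its ownership would then be the things to try.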
Re: reduce > heap space error + DiskChecker$DiskErrorException

Hello,

You should try copying your data to a local machine and running it there. A VPS imposes a lot of limits, depending on the technology used. In any case, Nutch is disk bound; a slow disk will give you very slow results. VPSs are always on commodity hardware, and I am almost sure there's a standard SATA drive behind yours, shared among 10 to 30 VPSs!

Regards,
Bartosz

fadzi@... wrote:
> seems this was a file permissions error; deleting files generated by
> hadoop in /tmp seems to have taken care of the the Disk error; not sure if
> - this is the best thing to do?
>
> but now looks like there is sudden thread death; no explanation:
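To put a rough number on the "slow disk" suspicion, the VPS and a local machine can be compared with a small write test. This is only a sketch, not a proper benchmark: it writes a temporary file sequentially, forces it to disk after each chunk, and reports MB/s. The class name DiskWriteProbe and the 64 MB size are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Nutch or Hadoop.
// Rough sequential-write probe for a directory.
class DiskWriteProbe {

    /** Writes the given number of megabytes to a temp file in dir and returns MB/s. */
    static double writeMbPerSec(Path dir, int megabytes) throws IOException {
        byte[] chunk = new byte[1024 * 1024];
        Path tmp = Files.createTempFile(dir, "probe", ".bin");
        try {
            long start = System.nanoTime();
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(chunk);
                for (int i = 0; i < megabytes; i++) {
                    buf.clear();                 // rewind so the same 1 MB is written again
                    while (buf.hasRemaining()) {
                        ch.write(buf);
                    }
                    ch.force(true);              // push data to the device, not just the cache
                }
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            return megabytes / Math.max(seconds, 1e-9);
        } finally {
            Files.deleteIfExists(tmp);           // always clean up the probe file
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        System.out.printf("sequential write in %s: %.1f MB/s%n", dir, writeMbPerSec(dir, 64));
    }
}
```

Running it against the crawl directory on both machines gives a like-for-like figure; results vary between runs, so repeat it a few times before drawing conclusions.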