reduce > heap space error

by Fadzi Ushewokunze-2

Hi,

I am running on a single machine with 2 GB of RAM and the Java heap set
to 1024m. The segments are quite tiny (fewer than 100 URLs), yet during
mergeSegments I get the exception below.

I have set mapred.child.java.opts=-Xmx512m, but nothing changes.

Any suggestions?


====>

2009-11-03 17:58:28,971 INFO  [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
2009-11-03 17:58:38,448 INFO  [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
2009-11-03 17:58:57,085 INFO  [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
2009-11-03 17:59:34,723 INFO  [org.apache.hadoop.mapred.LocalJobRunner] reduce > reduce
2009-11-03 18:02:09,660 INFO  [org.apache.hadoop.mapred.TaskRunner] Communication exception: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.Counters$Group.getCounterForName(Counters.java:327)
        at org.apache.hadoop.mapred.Counters.incrAllCounters(Counters.java:494)
        at org.apache.hadoop.mapred.Counters.sum(Counters.java:506)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.statusUpdate(LocalJobRunner.java:222)
        at org.apache.hadoop.mapred.Task$1.run(Task.java:418)
        at java.lang.Thread.run(Thread.java:619)

2009-11-03 18:02:10,376 WARN  [org.apache.hadoop.mapred.LocalJobRunner] job_local_0001
java.lang.ThreadDeath
        at java.lang.Thread.stop(Thread.java:715)
        at org.apache.hadoop.mapred.LocalJobRunner.killJob(LocalJobRunner.java:310)
        at org.apache.hadoop.mapred.JobClient$NetworkedJob.killJob(JobClient.java:315)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1239)
        at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:620)
        at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:665)


Re: reduce > heap space error

by Kalaimathan Mahenthiran

If you add "-XX:-UseGCOverheadLimit" to the value of
mapred.child.java.opts, you can bypass this exception. I don't know
whether it has any side effects.
For example:
-Xmx512m -XX:-UseGCOverheadLimit
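
For anyone checking whether these options actually take effect: the log in this thread comes from LocalJobRunner, and as far as I know local mode runs tasks inside the launching JVM, so mapred.child.java.opts may never be applied there. The sketch below is one way to see what the running JVM really got. The class name JvmOptionCheck and its helper methods are made up for illustration; it only uses standard JDK calls.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper (not part of Hadoop or Nutch): reports the heap limit
// and command-line options of the JVM it is running in.
class JvmOptionCheck {

    /** Maximum heap this JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** True if the exact option string appears in the given JVM argument list. */
    static boolean hasJvmArg(List<String> jvmArgs, String option) {
        return jvmArgs.contains(option);
    }

    public static void main(String[] args) {
        // Options passed to this JVM on its command line (e.g. -Xmx..., -XX:...).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("max heap (MB): " + maxHeapMb());
        System.out.println("JVM args: " + jvmArgs);
        System.out.println("-XX:-UseGCOverheadLimit present: "
                + hasJvmArg(jvmArgs, "-XX:-UseGCOverheadLimit"));
    }
}
```

Launch it the same way the merge job is launched. If the flag is missing or the max heap is not what you expect, the setting is not reaching that JVM.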


Re: reduce > heap space error + DiskChecker$DiskErrorException

by Fadzi Ushewokunze-2

Hi there,

There was a little improvement; at least it's not running out of RAM
anymore. But you're right, there seems to be a side effect.

I am now getting what look like disk issues! I am running on a VPS, so I
suspect that might have something to do with it.

What is the cause now?


==>>

00:36:28,912 INFO  [TaskRunner] Task 'attempt_local_0001_m_000064_0' done.
00:36:29,104 INFO  [MapTask] numReduceTasks: 1
00:36:29,104 INFO  [MapTask] io.sort.mb = 100
00:36:29,240 INFO  [MapTask] data buffer = 79691776/99614720
00:36:29,240 INFO  [MapTask] record buffer = 262144/327680
00:36:29,260 INFO  [CodecPool] Got brand-new decompressor
00:36:29,264 INFO  [MapTask] Starting flush of map output
00:36:29,276 INFO  [MapTask] Finished spill 0
00:36:29,280 INFO  [TaskRunner] Task:attempt_local_0001_m_000065_0 is done. And is in the process of commiting
00:36:29,280 INFO  [LocalJobRunner] file:/home/meda/workspace/web/crawl/segments/20091101171338/parse_text/part-00000/data:0+12655
00:36:29,280 INFO  [TaskRunner] Task 'attempt_local_0001_m_000065_0' done.
00:36:38,533 WARN  [LocalJobRunner] job_local_0001
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_local_0001/attempt_local_0001_m_000000_0/output/file.out in any of the configured local directories
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:381)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
        at org.apache.hadoop.mapred.MapOutputFile.getOutputFile(MapOutputFile.java:50)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:150)
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:620)
        at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:665)



Re: reduce > heap space error + DiskChecker$DiskErrorException

by Fadzi Ushewokunze-2


It seems this was a file permissions error; deleting the files generated
by Hadoop in /tmp seems to have taken care of the disk error. I'm not sure
whether this is the best thing to do, though.

But now it looks like there is a sudden thread death, with no explanation:

2009-11-04 14:56:41,613 WARN  [org.apache.hadoop.mapred.LocalJobRunner] job_local_0001
java.lang.ThreadDeath
        at java.lang.Thread.stop(Thread.java:715)
        at org.apache.hadoop.mapred.LocalJobRunner.killJob(LocalJobRunner.java:310)
        at org.apache.hadoop.mapred.JobClient$NetworkedJob.killJob(JobClient.java:315)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1239)
        at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:620)
        at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:665)
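
On the /tmp cleanup mentioned in this message: a quick way to tell a permissions problem from a missing directory is to test the local directory directly. This is only a sketch. LocalDirCheck is a made-up name, and the default path assumes Hadoop's stock hadoop.tmp.dir of /tmp/hadoop-${user.name}, which your configuration may override.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Hadoop or Nutch): checks whether the
// current user can use a directory that Hadoop would write local files to.
class LocalDirCheck {

    /** Returns "ok", or a short reason why the directory may be unusable. */
    static String check(Path dir) {
        if (!Files.exists(dir)) {
            return "missing";
        }
        if (!Files.isDirectory(dir)) {
            return "not a directory";
        }
        if (!Files.isReadable(dir) || !Files.isWritable(dir)) {
            return "not readable/writable by current user";
        }
        return "ok";
    }

    public static void main(String[] args) {
        // Assumption: the default hadoop.tmp.dir location; pass another path
        // as the first argument if your configuration differs.
        String defaultDir = "/tmp/hadoop-" + System.getProperty("user.name");
        Path dir = Paths.get(args.length > 0 ? args[0] : defaultDir);
        System.out.println(dir + ": " + check(dir));
    }
}
```

If the directory exists but is reported as not writable, files left behind by a run under a different user are a likely suspect, which would match what deleting them fixed here.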


Re: reduce > heap space error + DiskChecker$DiskErrorException

by Bartosz Gadzimski

Hello,

You should try copying your data to a local machine and running the merge
there. A VPS imposes a lot of limits, depending on the virtualization
technology used. In any case, Nutch is disk-bound, so a slow disk will
give you very slow results.
VPSs are always on commodity hardware; I am almost sure there is a
standard SATA drive, and that it is shared among 10 to 30 VPSs!
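
If you want a rough number for how slow the disk is, a crude sequential write timing like the one below can help compare the VPS with a local machine. It is a sketch, not a proper benchmark: DiskWriteTimer is a made-up name, the 64 MB size is arbitrary, and OS caching will still affect the result.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: times a sequential write of zero-filled blocks.
class DiskWriteTimer {

    /**
     * Writes totalMb megabytes to a temp file in dir, forces the data to
     * disk, deletes the file, and returns the elapsed time in nanoseconds.
     */
    static long timeSequentialWrite(Path dir, int totalMb) {
        ByteBuffer block = ByteBuffer.allocate(1024 * 1024); // 1 MB of zeros
        try {
            Path file = Files.createTempFile(dir, "write-test", ".bin");
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                long start = System.nanoTime();
                for (int i = 0; i < totalMb; i++) {
                    block.clear(); // rewind so the whole block is written again
                    while (block.hasRemaining()) {
                        ch.write(block);
                    }
                }
                ch.force(true); // ask the OS to flush data and metadata to disk
                return System.nanoTime() - start;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        int totalMb = 64; // arbitrary test size
        long nanos = timeSequentialWrite(dir, totalMb);
        double seconds = nanos / 1e9;
        System.out.printf("wrote %d MB in %.2f s (%.1f MB/s)%n", totalMb, seconds, totalMb / seconds);
    }
}
```

Run it once against the directory holding the segments on the VPS and once on the local machine; a large gap between the two figures would support the slow shared disk theory.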

Regards,
Bartosz
