Hadoop-2438


Hadoop-2438

by Miles Osborne

Has there been any progress / a work-around for this?

Currently I'm experimenting with Streaming and I've encountered what looks
like the same problem as described here:

https://issues.apache.org/jira/browse/HADOOP-2438

So, I get much the same errors (see below).

For this particular task, when I replace the mappers and reducers with the
identity operation (i.e. just pass the data through), all is well.  When I
instead try to do something more taxing (in this case, gathering together
all ngrams with the same prefix), I get these errors.
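
For readers unfamiliar with the task, here is a minimal sketch of what "gathering together all ngrams with the same prefix" could look like on the script side of a Streaming job. It is not the script from this thread (which was not posted). The function name group_by_prefix, the prefix_len parameter, and the input format (one whitespace-separated ngram per line, already sorted so that equal prefixes are adjacent) are all assumptions made for illustration.

```python
from itertools import groupby


def group_by_prefix(sorted_lines, prefix_len=2):
    """Yield (prefix, ngrams) pairs from lines sorted by prefix.

    Hypothetical helper: assumes one ngram per line, tokens separated
    by whitespace, and input already sorted so that ngrams sharing a
    prefix arrive next to each other (as after the shuffle/sort phase).
    """
    def prefix_of(line):
        # The prefix is the first `prefix_len` tokens of the ngram.
        return " ".join(line.split()[:prefix_len])

    stripped = (line.rstrip("\n") for line in sorted_lines)
    for prefix, group in groupby(stripped, key=prefix_of):
        # Only one group is materialised at a time.
        yield prefix, list(group)
```

In a real job this would be driven with something like `for prefix, ngrams in group_by_prefix(sys.stdin): ...`. Because groupby only holds the current group, the script's own memory use is bounded by the largest single prefix group rather than by the whole input.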

My guess is that this is something to do with caching / buffering, since I
presume that when the Stream mapper has real work to do, the associated Java
streamer buffers input until the Mapper signals that it can process more
data.  If the Mapper is busy, then a lot of data would get cached, causing
some internal buffer to overflow.

Miles


Date: Tue Jan 22 14:12:28 GMT 2008
java.io.IOException: Broken pipe
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
        at java.io.DataOutputStream.flush(DataOutputStream.java:106)
        at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:96)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)


        at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:107)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

java.io.IOException: MROutput/MRErrThread failed:java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2786)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.Text.write(Text.java:243)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:349)
        at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:344)

        at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:76)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

java.io.IOException: MROutput/MRErrThread failed:java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2786)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.Text.write(Text.java:243)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:349)
        at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:344)

        at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:76)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

RE: Hadoop-2438

by Joydeep Sen Sarma

> My guess is that this is something to do with caching / buffering, since I
> presume that when the Stream mapper has real work to do, the associated Java
> streamer buffers input until the Mapper signals that it can process more
> data.  If the Mapper is busy, then a lot of data would get cached, causing
> some internal buffer to overflow.

unlikely. the java buffer would be fixed size, and it would be written out to a unix pipe periodically. if the streaming mapper is not consuming data, the java side would quickly become blocked writing to this pipe rather than keep buffering.
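
The pipe behaviour described here can be reproduced outside Hadoop. The following is a rough sketch, assuming a POSIX system; feed_lines and its parameters are hypothetical names, and the child commands stand in for a streaming mapper. It shows that writing to a child that is still reading succeeds, while writing to a child that has already exited fails with a broken pipe, which is the Python counterpart of the "java.io.IOException: Broken pipe" in the traces.

```python
import subprocess
import sys


def feed_lines(child_cmd, lines, wait_for_exit_first=False):
    """Write lines to a child's stdin; return "ok" or "broken pipe".

    Hypothetical helper for illustration only. bufsize=0 makes the
    parent's writes go straight to the OS pipe, so a dead reader is
    reported on the write itself rather than on a later flush.
    """
    proc = subprocess.Popen(child_cmd, stdin=subprocess.PIPE, bufsize=0)
    if wait_for_exit_first:
        # Let the child die before we write, as a crashed mapper would.
        proc.wait()
    try:
        for line in lines:
            proc.stdin.write(line.encode("utf-8"))
        return "ok"
    except BrokenPipeError:
        # The read end of the pipe is closed: the child is gone.
        return "broken pipe"
    finally:
        try:
            proc.stdin.close()
        except BrokenPipeError:
            pass
        proc.wait()


# A child that consumes all of its input, and one that exits at once.
HEALTHY = [sys.executable, "-c", "import sys; sys.stdin.read()"]
DEAD = [sys.executable, "-c", "import sys; sys.exit(1)"]
```

Calling feed_lines(DEAD, ["a\n"], wait_for_exit_first=True) reports the broken pipe, whereas feed_lines(HEALTHY, ["a\n"]) completes normally. A child that is alive but slow would instead make the writer block once the OS pipe buffer fills, which is the back-pressure described above.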

the broken pipe case is extremely common and just tells you that the mapper died. the best thing to do is find the stderr log for the task (from the jobtracker ui) and see whether the mapper left something there before dying.
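
To make sure the stderr log actually contains something useful when the mapper dies, the script can report its own failure before exiting. A minimal sketch, assuming a Python streaming mapper; run_mapper and map_line are hypothetical names, not part of Hadoop Streaming:

```python
import sys
import traceback


def run_mapper(map_line, stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    """Apply map_line to every input line; log any failure to stderr.

    Hypothetical wrapper: map_line takes one input line (without the
    trailing newline) and returns an iterable of output lines.
    Returns a process exit code: 0 on success, 1 on failure.
    """
    line_no = 0
    try:
        for line_no, line in enumerate(stdin, start=1):
            for out in map_line(line.rstrip("\n")):
                stdout.write(out + "\n")
    except Exception:
        # Record where and why the mapper died before giving up.
        stderr.write("mapper failed at input line %d\n" % line_no)
        traceback.print_exc(file=stderr)
        return 1
    return 0
```

A script would end with something like `sys.exit(run_mapper(my_map_line))`, where my_map_line is the user's own function. This only covers Python exceptions; a mapper killed by the operating system (for example for using too much memory) would not get the chance to write anything.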


if streaming gurus are reading this: i am curious about one unrelated thing. the java map task does a 'flush()' on the buffered stream feeding the streaming mapper after every input line. that seemed like unnecessary overhead to me, and i am curious why it is done (there must be some rationale).



-----Original Message-----
From: milesosb@... on behalf of Miles Osborne
Sent: Tue 1/22/2008 6:26 AM
To: hadoop-user@...
Subject: Hadoop-2438
 