Hadoop-2438

Has there been any progress / a work-around for this?

Currently I'm experimenting with Streaming and I've encountered what looks like the same problem as described here:

https://issues.apache.org/jira/browse/HADOOP-2438

So, I get much the same errors (see below).

For this particular task, when I replace the mappers and reducers with the identity operation (i.e. just pass through the data) all is well. When instead I try to do something more taxing (in this case, gathering together all ngrams with the same prefix), I get these errors.

My guess is that this is something to do with caching / buffering, since I presume that when the Stream mapper has real work to do, the associated Java streamer buffers input until the Mapper signals that it can process more data. If the Mapper is busy, then a lot of data would get cached, causing some internal buffer to overflow.

Miles

> Date: Tue Jan 22 14:12:28 GMT 2008

java.io.IOException: Broken pipe
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:260)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:124)
    at java.io.DataOutputStream.flush(DataOutputStream.java:106)
    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:96)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:107)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

java.io.IOException: MROutput/MRErrThread failed:java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.io.Text.write(Text.java:243)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:349)
    at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:344)
    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:76)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

java.io.IOException: MROutput/MRErrThread failed:java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.io.Text.write(Text.java:243)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:349)
    at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:344)
    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:76)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:192)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
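
For readers trying to picture the task, here is a minimal sketch of a streaming mapper for "gathering together all ngrams with the same prefix". It is not the poster's script, which is not shown in the thread. It assumes one whitespace-separated n-gram per input line and takes the prefix to be every token except the last; the function names `prefix_records` and `main` are hypothetical. The mapper emits the prefix as the key and leaves the grouping to the shuffle, so it holds only one line in memory at a time. Hadoop Streaming by default treats the text up to the first tab of each output line as the key.

```python
import sys


def prefix_records(lines):
    """Yield 'prefix<TAB>ngram' for each input n-gram line.

    Assumptions (not from the thread): one whitespace-separated n-gram per
    line, and the prefix is every token except the last one.
    """
    for line in lines:
        tokens = line.split()
        if len(tokens) < 2:
            continue  # a unigram or blank line has no prefix to group on
        prefix = " ".join(tokens[:-1])
        yield prefix + "\t" + " ".join(tokens)


def main():
    # One record in, one record out: nothing is accumulated in the mapper.
    for record in prefix_records(sys.stdin):
        sys.stdout.write(record + "\n")
```

To use it as a mapper script, call `main()` at the bottom of the file. The reducer then sees all n-grams sharing a prefix on consecutive lines.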
RE: Hadoop-2438

> My guess is that this is something to do with caching / buffering, since I
> presume that when the Stream mapper has real work to do, the associated Java
> streamer buffers input until the Mapper signals that it can process more
> data. If the Mapper is busy, then a lot of data would get cached, causing
> some internal buffer to overflow.

Unlikely. The Java buffer would be fixed size, and it would write to a Unix pipe periodically. If the streaming mapper is not consuming data, the Java side would quickly become blocked writing to this pipe.

The broken pipe case is extremely common and just tells you that the mapper died. The best thing to do is find the stderr log for the task (from the JobTracker UI) and see whether the mapper left something there before dying.

If streaming gurus are reading this, I am curious about one unrelated thing: the Java map task does a 'flush()' on the buffered stream feeding the streaming mapper after every input line. That seemed like unnecessary overhead to me. I was curious why (there must be some rationale).

-----Original Message-----
From: milesosb@... on behalf of Miles Osborne
Sent: Tue 1/22/2008 6:26 AM
To: hadoop-user@...
Subject: Hadoop-2438
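
The point that a broken pipe just means the process on the reading end died can be reproduced outside Hadoop. The sketch below is a stand-in, not Hadoop's code: `feed_mapper` and `CRASHING_MAPPER` are hypothetical names, the child is a tiny Python process that writes a message to stderr and exits without reading its input, and the parent flushes after every record much as the reply describes. It assumes a POSIX system, where a write to a pipe with no reader fails with EPIPE.

```python
import subprocess
import sys

# Hypothetical stand-in for a streaming mapper that dies on startup:
# it leaves a reason on stderr and exits without reading any input.
CRASHING_MAPPER = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('mapper: fatal error\\n'); sys.exit(3)",
]


def feed_mapper(cmd, records):
    """Write records to the child's stdin, flushing after each one.

    Returns (broken_pipe_seen, child_exit_code).
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    broken = False
    try:
        for record in records:
            proc.stdin.write(record)
            proc.stdin.flush()  # per-record flush, as in the reply above
    except BrokenPipeError:
        # The reading end of the pipe is gone, i.e. the child has exited.
        broken = True
    finally:
        try:
            proc.stdin.close()  # may retry the failed flush and fail again
        except BrokenPipeError:
            pass
    return broken, proc.wait()


if __name__ == "__main__":
    # Far more data than a pipe can hold, so the writer must hit the error.
    records = (b"some ngram\t1\n" for _ in range(1_000_000))
    print(feed_mapper(CRASHING_MAPPER, records))
```

While the child is alive but not reading, the writer simply blocks once the pipe is full; the error appears only after the child exits. The writer learns nothing about why, which is the reason to go looking for the child's stderr output.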