[core-user] Help deflating output files

Hi all,

I'm using hadoop-streaming to execute Python jobs in an EC2 cluster. The output directory in HDFS has part-00000.deflate files - how can I decompress them back into regular text?

In my hadoop-site.xml, I unfortunately have:

<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>

Of course, I could re-build my AMIs without this option, but is there some way I can read my deflate files without going through that hassle? I'm hoping there's a command-line program to read these files, since none of my code is Java.

Thanks in advance for any help. :)

-- Jim R. Wilson (jimbojw)

RE: [core-user] Help deflating output files

You can run another map-only job to read and convert the deflated files and write them out in the format you want.

Runping

> -----Original Message-----
> From: Jim R. Wilson [mailto:wilson.jim.r@...]
> Sent: Wednesday, June 04, 2008 4:13 PM
> To: core-user@...
> Subject: [core-user] Help deflating output files

Re: [core-user] Help deflating output files

Has someone already written a generic inflater program? It would be a great util to add to the core :)

-- Jim

On Wed, Jun 4, 2008 at 7:27 PM, Runping Qi <runping@...> wrote:
> You can run another map-only job to read and convert the deflated files and
> write them out in the format you want.
>
> Runping

Re: [core-user] Help deflating output files

You can override this property by passing -jobconf mapred.output.compress=false to the hadoop binary, e.g.

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.18.0-streaming.jar \
        -input "/user/root/input" \
        -mapper 'cat' \
        -reducer 'wc -l' \
        -output "/user/root/output" \
        -jobconf mapred.job.name="Experiment" \
        -jobconf mapred.output.compress=false

-- Martin

Re: [core-user] Help deflating output files

Hi,

    br = new BufferedReader(new InputStreamReader(
            new java.util.zip.InflaterInputStream(new FileInputStream(currFile))));
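
Since the original question asked for something that does not require Java, here is a rough Python sketch of the same idea as the InflaterInputStream line above. It assumes the .deflate files are plain zlib streams, which is what java.util.zip.InflaterInputStream expects by default; that has not been verified against every Hadoop codec configuration. The function name inflate_stream and the script name inflate.py are made up for illustration.

```python
import sys
import zlib


def inflate_stream(src, dst, chunk_size=64 * 1024):
    """Copy zlib-compressed bytes from src to dst, decompressing as we go.

    src and dst are binary file-like objects. Assumes a single zlib
    stream (header + deflate data + checksum), not gzip or raw deflate.
    """
    decomp = zlib.decompressobj()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decomp.decompress(chunk))
    # Emit anything still buffered inside the decompressor.
    dst.write(decomp.flush())


if __name__ == "__main__":
    # Read compressed bytes on stdin, write plain text to stdout.
    inflate_stream(sys.stdin.buffer, sys.stdout.buffer)
```

Saved as inflate.py, this could be fed from HDFS with something like `hadoop fs -cat /user/root/output/part-00000.deflate | python inflate.py > part-00000.txt`. Working in chunks keeps memory use flat for large part files.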