hadoop and local files

hadoop and local files

by jerrro

Hello,

When launching a map-reduce job, I would like to copy a certain file to the local file system of each datanode (not to HDFS), so that my job can access it there. The file is around 500KB, so I don't think there will be much overhead. Is there a way to tell Hadoop to do that? I heard it is possible, but I'm not sure how. Also, how do I know where the file is copied to? I understood it may end up in /tmp or somewhere similar on the datanode.

Thanks.



Jerr.

Re: hadoop and local files

by pi_song

Call DistributedCache.addCacheFile() when you're setting up the JobConf.

Then call DistributedCache.getLocalCacheFiles() in your mapper or reducer to get the local path(s) of the copied file.


There is a simple example here: http://hadoop.apache.org/core/docs/r0.15.3/mapred_tutorial.html
Have a look at the last example!
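
For what it's worth, here is a rough sketch of the task-side half, i.e. what you might do with the local path once getLocalCacheFiles() has handed it to you. It uses only the JDK so it can be tried without a cluster. The class name SideFileLookup, the method load(), the /user/jerr/lookup.txt path in the comments, and the one-entry-per-line file format are all made up for illustration. The two Hadoop calls appear only in comments, and as far as I know the URI given to addCacheFile() should point at a file the framework can fetch (normally one already in HDFS).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: loads a small side file from the node's local disk.
//
// In a real job the driver would register the file, for example:
//   DistributedCache.addCacheFile(new URI("/user/jerr/lookup.txt"), conf); // hypothetical path
// and the task would ask where the local copy ended up, for example:
//   Path[] local = DistributedCache.getLocalCacheFiles(conf);
//   Set<String> entries = SideFileLookup.load(local[0].toString());
class SideFileLookup {

    // Reads the local copy once, assuming one entry per line (an assumption
    // of this sketch). Lines are trimmed and blank lines are skipped.
    static Set<String> load(String localPath) throws IOException {
        Set<String> entries = new HashSet<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(localPath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    entries.add(trimmed);
                }
            }
        }
        return entries;
    }
}
```

Loading the file once per task (rather than once per record) and keeping the result in a field is probably what you want for a 500KB file, since it fits in memory easily.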

Cheers,
Pi

