Use DistributedCache.addCacheFile() when you're setting up the JobConf,
and DistributedCache.getLocalCacheFiles() in your map or reduce methods to get the local paths the file was copied to.
There is a simple example here:
http://hadoop.apache.org/core/docs/r0.15.3/mapred_tutorial.html
Have a look at the last example!
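In case it helps, here is a rough sketch of how the two calls fit together, along the lines of what the tutorial does. The class names and the path /user/jerr/lookup.dat are made up for illustration. It assumes the old org.apache.hadoop.mapred API, and that the file has already been put somewhere the cluster can read it (e.g. HDFS), since the framework copies it from there to each task node's local disk.

```java
import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class CacheFileSketch {

  // Hypothetical location of the file on the shared file system.
  static final String CACHE_FILE = "/user/jerr/lookup.dat";

  // Job setup side: register the file while building the JobConf.
  static void registerCacheFile(JobConf conf) {
    DistributedCache.addCacheFile(new Path(CACHE_FILE).toUri(), conf);
  }

  // Task side: a hypothetical base class your Mapper or Reducer could extend.
  public static class CacheAwareTask extends MapReduceBase {

    // Local path of the cached copy on this node, or null if none was found.
    protected Path localFile;

    @Override
    public void configure(JobConf job) {
      try {
        // Returns the local file system paths of the cached files.
        Path[] localFiles = DistributedCache.getLocalCacheFiles(job);
        if (localFiles != null && localFiles.length > 0) {
          localFile = localFiles[0];
        }
      } catch (IOException e) {
        throw new RuntimeException("Could not locate the cached file", e);
      }
    }
  }
}
```

Doing the lookup in configure() means it happens once per task, and map() or reduce() can then open localFile with ordinary java.io classes. You don't need to know the directory in advance; just use whatever path comes back.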
Cheers,
Pi
jerrro wrote:
Hello,
When launching a map-reduce job, I would like to copy a certain file to the local file system of the datanodes (not to HDFS), so that my job can access it on the datanode. The file is around 500KB, so I don't think there will be much overhead. Is there a way to tell Hadoop to do that? I heard it is possible, but I'm not sure how. Also, how do I know where the file is copied to? I understood it can be copied to /tmp or somewhere like that on the datanode.
Thanks.
Jerr.