<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
	<id>tag:old.nabble.com,2006:forum-17067</id>
	<title>Nabble - Hadoop lucene-users</title>
	<updated>2009-11-04T03:46:11Z</updated>
	<link rel="self" type="application/atom+xml" href="http://old.nabble.com/Hadoop-lucene-users-f17067.xml" />
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Hadoop-lucene-users-f17067.html" />
	<subtitle type="html"></subtitle>
	
<entry>
	<id>tag:old.nabble.com,2006:post-26194789</id>
	<title>hadoop log file for localfile system</title>
	<published>2009-11-04T03:46:11Z</published>
	<updated>2009-11-04T03:46:11Z</updated>
	<author>
		<name>kulketa</name>
	</author>
	<content type="html">Hi, I am new to Hadoop and have been using it for a few weeks now. I tried a few map/reduce examples and could see the logs in jobtracker.jsp. There we can view the status of the job, every task, the config, counters, exceptions, and even anything written to System.out, System.err and syslog, which is really nice. The problem is that this log seems to be available only when using HDFS, not the local file system.
&lt;br&gt;I have to use the local file system, so is there any way to get logs for the local file system in Hadoop, or to use the same jobtracker.jsp for the local file system too? I am really stuck on this problem. It would be a great help if anyone could help me out.
&lt;br&gt;&lt;br&gt;Thanks in advance.
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/hadoop-log-file-for-localfile-system-tp26194789p26194789.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-26157876</id>
	<title>A way to input xml files in mapreduce</title>
	<published>2009-11-02T14:27:30Z</published>
	<updated>2009-11-02T14:27:30Z</updated>
	<author>
		<name>VIPUL SHARMA</name>
	</author>
	<content type="html">I am new to Hadoop and still learning most of the details. I am working on an application that will take input from lots of small XML files. Each XML file has some records that I want to parse and load into an HBase table. How should I go about parsing the XML files and feeding them to the map function? Should I have one mapper per XML file, or is there another way of doing this? Thanks for your help and time.
&lt;br&gt;&lt;br&gt;Thanks
&lt;br&gt;Vipul </content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/A-way-to-input-xml-files-in-mapreduce-tp26157876p26157876.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-25982924</id>
	<title>Problem in the way reducer and combiner operate(Merging)</title>
	<published>2009-10-20T14:30:47Z</published>
	<updated>2009-10-20T14:30:47Z</updated>
	<author>
		<name>nikjosh</name>
	</author>
	<content type="html">2009-10-19 17:54:16,221 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=SHUFFLE, sessionId=
&lt;br&gt;2009-10-19 17:54:17,632 INFO org.apache.hadoop.mapred.ReduceTask: ShuffleRamManager: MemoryLimit=78643200, MaxSingleShuffleLimit=19660800
&lt;br&gt;2009-10-19 17:54:17,761 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200910191750_0001_r_000000_0 Thread started: Thread for merging on-disk files
&lt;br&gt;2009-10-19 17:54:17,762 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200910191750_0001_r_000000_0 Thread waiting: Thread for merging on-disk files
&lt;br&gt;2009-10-19 17:54:17,763 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200910191750_0001_r_000000_0 Thread started: Thread for merging in memory files
&lt;br&gt;2009-10-19 17:54:17,782 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200910191750_0001_r_000000_0 Need another 2 map output(s) where 0 is already in progress
&lt;br&gt;2009-10-19 17:54:17,793 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200910191750_0001_r_000000_0: Got 0 new map-outputs &amp; number of known map outputs is 0
&lt;br&gt;2009-10-19 17:54:17,793 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200910191750_0001_r_000000_0 Scheduled 0 of 0 known outputs (0 slow hosts and 0 dup hosts)
&lt;br&gt;2009-10-19 17:54:22,816 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200910191750_0001_r_000000_0: Got 2 new map-outputs &amp; number of known map outputs is 2
&lt;br&gt;2009-10-19 17:54:22,818 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200910191750_0001_r_000000_0 Scheduled 1 of 2 known outputs (0 slow hosts and 1 dup hosts)
&lt;br&gt;2009-10-19 17:54:23,041 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 44 bytes (44 raw bytes) into RAM from attempt_200910191750_0001_m_000000_0
&lt;br&gt;2009-10-19 17:54:23,043 INFO org.apache.hadoop.mapred.ReduceTask: Read 44 bytes from map-output for attempt_200910191750_0001_m_000000_0
&lt;br&gt;2009-10-19 17:54:23,047 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_200910191750_0001_m_000000_0 -&amp;gt; (4, 8) from localhost
&lt;br&gt;2009-10-19 17:54:24,808 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200910191750_0001_r_000000_0 Scheduled 1 of 1 known outputs (0 slow hosts and 0 dup hosts)
&lt;br&gt;2009-10-19 17:54:24,824 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 44 bytes (44 raw bytes) into RAM from attempt_200910191750_0001_m_000001_0
&lt;br&gt;2009-10-19 17:54:24,825 INFO org.apache.hadoop.mapred.ReduceTask: Read 44 bytes from map-output for attempt_200910191750_0001_m_000001_0
&lt;br&gt;2009-10-19 17:54:24,825 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_200910191750_0001_m_000001_0 -&amp;gt; (4, 8) from localhost
&lt;br&gt;2009-10-19 17:54:25,800 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
&lt;br&gt;2009-10-19 17:54:25,817 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 0 files left.
&lt;br&gt;2009-10-19 17:54:25,973 INFO org.apache.hadoop.mapred.ReduceTask: Initiating in-memory merge with 2 segments...
&lt;br&gt;2009-10-19 17:54:25,984 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
&lt;br&gt;2009-10-19 17:54:25,985 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 88 bytes
&lt;br&gt;2009-10-19 17:54:25,991 FATAL mma.CombA: Running Identity Combiner... 
&lt;br&gt;2009-10-19 17:54:25,992 FATAL mma.CombA: Identity Key is1 Combiner processed #: 2
&lt;br&gt;2009-10-19 17:54:25,992 FATAL mma.CombA: Running Identity Combiner... 
&lt;br&gt;2009-10-19 17:54:25,992 FATAL mma.CombA: Identity Key is4 Combiner processed #: 1
&lt;br&gt;2009-10-19 17:54:25,993 FATAL mma.CombA: Running Identity Combiner... 
&lt;br&gt;2009-10-19 17:54:25,993 FATAL mma.CombA: Identity Key is3 Combiner processed #: 1
&lt;br&gt;2009-10-19 17:54:25,993 FATAL mma.CombA: Running Identity Combiner... 
&lt;br&gt;2009-10-19 17:54:25,993 FATAL mma.CombA: Identity Key is4 Combiner processed #: 1
&lt;br&gt;2009-10-19 17:54:25,993 FATAL mma.CombA: Running Identity Combiner... 
&lt;br&gt;2009-10-19 17:54:25,993 FATAL mma.CombA: Identity Key is3 Combiner processed #: 1
&lt;br&gt;2009-10-19 17:54:25,996 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200910191750_0001_r_000000_0 Merge of the 2 files in-memory complete. Local file is /var/lib/hadoop/cache/hadoop/mapred/local/taskTracker/jobcache/job_200910191750_0001/attempt_200910191750_0001_r_000000_0/output/map_0.out of size 86
&lt;br&gt;2009-10-19 17:54:25,996 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
&lt;br&gt;2009-10-19 17:54:25,998 INFO org.apache.hadoop.mapred.ReduceTask: Initiating final on-disk merge with 1 files
&lt;br&gt;2009-10-19 17:54:25,999 INFO org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
&lt;br&gt;2009-10-19 17:54:26,012 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 86 bytes
&lt;br&gt;2009-10-19 17:54:26,081 FATAL mma.RedA: THE KEY WAS 'PAVG' 
&lt;br&gt;2009-10-19 17:54:26,081 FATAL mma.RedA: AVG ENCTRD:0.0
&lt;br&gt;2009-10-19 17:54:26,082 FATAL mma.RedA: AVG ENCTRD:3.0
&lt;br&gt;2009-10-19 17:54:26,083 FATAL mma.RedA: Executed with key:1 Reducer with2.0
&lt;br&gt;2009-10-19 17:54:26,084 FATAL mma.RedA: &lt;i&gt;THE KEY WAS 'MIN' &lt;/i&gt;&lt;br&gt;2009-10-19 17:54:26,084 FATAL mma.RedA: MIN ENCTRD:3.0
&lt;br&gt;2009-10-19 17:54:26,084 FATAL mma.RedA: Executed with key:4 Reducer with1.0
&lt;br&gt;2009-10-19 17:54:26,084 FATAL mma.RedA:&lt;b&gt;&amp;nbsp;THE KEY WAS 'MAX' &lt;/b&gt;&lt;br&gt;2009-10-19 17:54:26,084 FATAL mma.RedA: MAX ENCTRD:3.0
&lt;br&gt;2009-10-19 17:54:26,084 FATAL mma.RedA: Executed with key:3 Reducer with1.0
&lt;br&gt;2009-10-19 17:54:26,084 FATAL mma.RedA: &lt;i&gt;THE KEY WAS 'MIN' &lt;/i&gt;&lt;br&gt;2009-10-19 17:54:26,084 FATAL mma.RedA: MIN ENCTRD:1.0
&lt;br&gt;2009-10-19 17:54:26,084 FATAL mma.RedA: Executed with key:4 Reducer with1.0
&lt;br&gt;2009-10-19 17:54:26,085 FATAL mma.RedA: &lt;b&gt;THE KEY WAS 'MAX' &lt;/b&gt;&lt;br&gt;2009-10-19 17:54:26,086 FATAL mma.RedA: MAX ENCTRD:2.0
&lt;br&gt;2009-10-19 17:54:26,087 FATAL mma.RedA: Executed with key:3 Reducer with1.0
&lt;br&gt;2009-10-19 17:54:26,215 INFO org.apache.hadoop.mapred.TaskRunner: Task 'attempt_200910191750_0001_r_000000_0' done.
&lt;br&gt;&lt;br&gt;The above is the log of my program, which simply does the following:
&lt;br&gt;1. Map outputs a kv pair; the key is set to the value UNPROCESSED (the key is an IntWritable).
&lt;br&gt;2. Combiner takes the kv pairs whose key is UNPROCESSED and, from the list of values for this key, finds the min, max and avg and outputs
&lt;br&gt;&amp;nbsp; &amp;nbsp;three kv pairs: one with key MIN, one with key MAX, and one with key AVG.
&lt;br&gt;&lt;br&gt;3. Reducer should then get the list of values for key MIN, the list for MAX, and the list for AVG.
&lt;br&gt;&lt;br&gt;Problem:
&lt;br&gt;&lt;br&gt;As seen from the log above, during the merging process the output isn't sorted by key, so my reducer seemingly gets two lists for MIN and two lists for MAX, instead of getting each key once with a single list of values.
&lt;br&gt;&lt;br&gt;&lt;br&gt;I wonder what the problem is; I suspect it is in the merging step.
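&lt;br&gt;&lt;br&gt;To illustrate what I think is happening, here is a small Python sketch. It is not Hadoop code: the key numbers are the ones printed in the log above, the values are illustrative placeholders, and the assumption that the reducer only groups adjacent equal keys of an already sorted stream is mine. If the combiner emits new keys and its output is not sorted again, the same key ends up in more than one group:
&lt;pre&gt;
```python
from itertools import groupby

# Keys in the order the reducer log shows them: 1 (PAVG), 4 (MIN), 3 (MAX), 4, 3.
# The values are illustrative placeholders, not the real job data.
combiner_output = [(1, 2.0), (4, 3.0), (3, 3.0), (4, 1.0), (3, 2.0)]


def reduce_groups(pairs):
    """Group adjacent pairs that share a key, the way a reducer walks a sorted stream."""
    return [(key, [value for _, value in group])
            for key, group in groupby(pairs, key=lambda kv: kv[0])]


# Stream left in combiner order: keys 4 and 3 each appear as two separate groups.
unsorted_groups = reduce_groups(combiner_output)

# Stream sorted by key first: every key forms exactly one group.
sorted_groups = reduce_groups(sorted(combiner_output, key=lambda kv: kv[0]))

print(unsorted_groups)
print(sorted_groups)
```
&lt;/pre&gt;
&lt;br&gt;The first list matches the order of keys in my reducer log (1, 4, 3, 4, 3), which is why I think the stream reaching the reducer is no longer ordered by the new keys.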
&lt;br&gt;&lt;br&gt;Does anyone know what's wrong in the above, from the log?</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Problem-in-the-way-reducer-and-combiner-operate%28Merging%29-tp25982924p25982924.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-25859056</id>
	<title>map function</title>
	<published>2009-10-12T09:40:40Z</published>
	<updated>2009-10-12T09:40:40Z</updated>
	<author>
		<name>hellpizza</name>
	</author>
	<content type="html">Can map function be called recursively?</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/map-function-tp25859056p25859056.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-25751507</id>
	<title>Is anyone interested in working in Spain (Madrid or Barcelona), in Telefónica R&amp;D as a Data Analysis Expert?</title>
	<published>2009-10-05T07:09:49Z</published>
	<updated>2009-10-05T07:09:49Z</updated>
	<author>
		<name>telefonicaid</name>
	</author>
	<content type="html">Please, check the vacancy:
&lt;br&gt;&lt;br&gt;Telefonica is in the process of converting customer data into information, using modern data analysis platforms to personalize our services. The successful candidate will: 
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 
&lt;br&gt;Technically lead teams of engineers and be responsible for guiding personalization products throughout the execution cycle, focusing specifically on analyzing, designing, implementing and tailoring our solutions to all the markets where Telefonica does business.
&lt;br&gt;&lt;br&gt;Take part in the Telefonica Group Data Analysis strategy through different global initiatives.
&lt;br&gt;&lt;br&gt;Drive the architecture and take decisions on technologies for developing global products implementing Telefonica personalization strategy.
&lt;br&gt;&lt;br&gt;&lt;br&gt;PROFILE:
&lt;br&gt;&lt;br&gt;Requirements:
&lt;br&gt;&lt;br&gt;-	Graduate degree in computer science or an engineering program (BA or MS degree in computer science, engineering or other technical field preferred). 
&lt;br&gt;-	Wide technical and business knowledge of modern data analysis platforms and personalization technologies, distributed databases, data modeling, parallel processing, distributed and cloud computing, recommendation systems, inference technologies, etc.
&lt;br&gt;-	Full knowledge of the technology and business in their area (companies, universities, official bodies, products, other operators, etc.)
&lt;br&gt;&lt;br&gt;&lt;br&gt;Experience:
&lt;br&gt;&lt;br&gt;-	Entrepreneurial drive, demonstrated ability to achieve stretch goals in an innovative and fast paced environment. 
&lt;br&gt;-	5+ years relevant experience in the Data Analysis &amp; Personalization Technologies area. 
&lt;br&gt;-	10+ years in architecting software products.
&lt;br&gt;-	Proven track record of product delivery in the real world (at least 5 complete product development cycles).
&lt;br&gt;-	Able to fit in well within an informal startup-type environment and to provide hands-on development.
&lt;br&gt;-	Participation in technical strategic decisions.
&lt;br&gt;&lt;br&gt;Skills:
&lt;br&gt;&lt;br&gt;-	Very strong technical skills and deep domain knowledge in the Area.
&lt;br&gt;-	Technical leadership and change focused. You will have to become a role model for the Data Analysis &amp; Personalization Technologies Group.
&lt;br&gt;-	Excellent technical vision and innovative skills.
&lt;br&gt;-	Sound knowledge in software architecture for data analysis and personalization products.
&lt;br&gt;-	Strong communication skills that enable you to interact within a global corporation at a senior management level.
&lt;br&gt;-	That hard-to-find mix of intelligence, integrity, domain knowledge, verbal agility, and diplomacy which allows you to rapidly earn the trust of technically astute teams across the company.
&lt;br&gt;&lt;br&gt;&lt;br&gt;Other interesting information:
&lt;br&gt;&lt;br&gt;-	Availability to travel.
&lt;br&gt;-	English Proficiency level; Spanish conversational level.
&lt;br&gt;-	Demonstrated strong performance in prior roles, with increasing levels of responsibility.
&lt;br&gt;-	The position will be based in Barcelona or Madrid, Spain.
&lt;br&gt;&lt;br&gt;Send your CV to: e.seleccion1@tid.es
&lt;br&gt;&lt;br&gt;Thanks a lot!</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Is-anyone-interested-in-working-in-Spain-%28Madrid-or-Barcelona%29%2C-in-Telef%C3%B3nica-R-D-as-a-Data-Analysis-Expert--tp25751507p25751507.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-25584701</id>
	<title>Exception in thread &quot;main&quot; org.apache.hadoop.ipc.RemoteException: java.io.IOExce</title>
	<published>2009-09-23T12:29:56Z</published>
	<updated>2009-09-23T12:29:56Z</updated>
	<author>
		<name>az49</name>
	</author>
	<content type="html">I am having an issue starting a job on a Hadoop pseudo-distributed cluster. I am getting the exception shown below. I tried to override mapred.local.dir with the -D parameter and by passing a custom config as well.
&lt;br&gt;&lt;br&gt;Here is the custom config declaration:
&lt;br&gt;&lt;br&gt;zeltoa01@hadoop01:~/dev/etl/Hadoop/ybhadoop$ cat etc/hadoop-cluster.xml
&lt;br&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot;?&amp;gt;
&lt;br&gt;&amp;lt;configuration&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;name&amp;gt;mapred.local.dir&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;value&amp;gt;${hadoop.tmp.dir}/mapred/local&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;/property&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;name&amp;gt;hadoop.tmp.dir&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt;value&amp;gt;/opt/hadoop-datastore&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;lt;description&amp;gt;A base for other temporary directories.&amp;lt;/description&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;lt;/property&amp;gt;
&lt;br&gt;&lt;br&gt;&lt;br&gt;zeltoa01@hadoop01:~/dev/etl/Hadoop/ybhadoop$ hadoop jar ybhadoop.jar com.yellowbook.data.hadoop.jobs.ListingsJobStandAlone -Dmapred.local.dir=/home/zeltoa01/mapred_dir/
&lt;br&gt;09/09/23 15:17:08 INFO mapred.FileInputFormat: Total input paths to process : 1
&lt;br&gt;Exception in thread &amp;quot;main&amp;quot; org.apache.hadoop.ipc.RemoteException: java.io.IOException: No valid local directories in property: mapred.local.dir
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.conf.Configuration.getLocalPath(Configuration.java:938)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.mapred.JobConf.getLocalPath(JobConf.java:279)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.mapred.JobInProgress.&amp;lt;init&amp;gt;(JobInProgress.java:256)
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.mapred.JobInProgress.&amp;lt;init&amp;gt;(JobInProgress.java:240)
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3024)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at java.lang.reflect.Method.invoke(Method.java:597)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at java.security.AccessController.doPrivileged(Native Method)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at javax.security.auth.Subject.doAs(Subject.java:396)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.ipc.Client.call(Client.java:739)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.mapred.$Proxy1.submitJob(Unknown Source)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:841)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:771)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1290)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at com.yellowbook.data.hadoop.jobs.ListingsJobStandAlone.run(Unknown Source)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at com.yellowbook.data.hadoop.jobs.ListingsJobStandAlone.main(Unknown Source)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at java.lang.reflect.Method.invoke(Method.java:597)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.util.RunJar.main(RunJar.java:185)
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Exception-in-thread-%22main%22-org.apache.hadoop.ipc.RemoteException%3A-java.io.IOExce-tp25584701p25584701.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-25177034</id>
	<title>Re: Delete replicated blocks?</title>
	<published>2009-08-27T11:00:21Z</published>
	<updated>2009-08-27T11:00:21Z</updated>
	<author>
		<name>Alex Loddengaard-3</name>
	</author>
	<content type="html">I don't know for sure, but running the rebalancer might do this for you.
&lt;br&gt;&lt;br&gt;&amp;lt;
&lt;br&gt;&lt;a href=&quot;http://hadoop.apache.org/common/docs/r0.20.0/hdfs_user_guide.html#Rebalancer&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://hadoop.apache.org/common/docs/r0.20.0/hdfs_user_guide.html#Rebalancer&lt;/a&gt;&lt;br&gt;&amp;gt;
&lt;br&gt;&lt;br&gt;Alex
&lt;br&gt;&lt;br&gt;On Thu, Aug 27, 2009 at 9:18 AM, Michael Thomas &amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=25177034&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;thomas@...&lt;/a&gt;&amp;gt;wrote:
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; dfs.replication is only used by the client at the time the files are
&lt;br&gt;&amp;gt; written. &amp;nbsp;Changing this setting will not automatically change the
&lt;br&gt;&amp;gt; replication level on existing files. &amp;nbsp;To do that, you need to use the
&lt;br&gt;&amp;gt; hadoop cli:
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; hadoop fs -setrep -R 1 /
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; --Mike
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Vladimir Klimontovich wrote:
&lt;br&gt;&amp;gt; &amp;gt; This will happen automatically.
&lt;br&gt;&amp;gt; &amp;gt; On Aug 27, 2009, at 6:04 PM, Andy Liu wrote:
&lt;br&gt;&amp;gt; &amp;gt;
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; I'm running a test Hadoop cluster, which had a dfs.replication value
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; of 3.
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; I'm now running out of disk space, so I've reduced dfs.replication to
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; 1 and
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; restarted my datanodes. &amp;nbsp;Is there a way to free up the over-replicated
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; blocks, or does this happen automatically at some point?
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt;
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; Thanks,
&lt;br&gt;&amp;gt; &amp;gt;&amp;gt; Andy
&lt;br&gt;&amp;gt; &amp;gt;
&lt;br&gt;&amp;gt; &amp;gt; ---
&lt;br&gt;&amp;gt; &amp;gt; Vladimir Klimontovich,
&lt;br&gt;&amp;gt; &amp;gt; skype: klimontovich
&lt;br&gt;&amp;gt; &amp;gt; GoogleTalk/Jabber: &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=25177034&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;klimontovich@...&lt;/a&gt;
&lt;br&gt;&amp;gt; &amp;gt; Cell phone: +7926 890 2349
&lt;br&gt;&amp;gt; &amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;br&gt;&lt;/div&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Delete-replicated-blocks--tp25173066p25177034.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-25175355</id>
	<title>Re: Delete replicated blocks?</title>
	<published>2009-08-27T09:18:00Z</published>
	<updated>2009-08-27T09:18:00Z</updated>
	<author>
		<name>Michael Thomas-13</name>
	</author>
	<content type="html">dfs.replication is only used by the client at the time the files are
&lt;br&gt;written. &amp;nbsp;Changing this setting will not automatically change the
&lt;br&gt;replication level on existing files. &amp;nbsp;To do that, you need to use the
&lt;br&gt;hadoop cli:
&lt;br&gt;&lt;br&gt;hadoop fs -setrep -R 1 /
&lt;br&gt;&lt;br&gt;--Mike
&lt;br&gt;&lt;br&gt;&lt;br&gt;Vladimir Klimontovich wrote:
&lt;div class='shrinkable-quote'&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; This will happen automatically.
&lt;br&gt;&amp;gt; On Aug 27, 2009, at 6:04 PM, Andy Liu wrote:
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt;&amp;gt; I'm running a test Hadoop cluster, which had a dfs.replication value
&lt;br&gt;&amp;gt;&amp;gt; of 3.
&lt;br&gt;&amp;gt;&amp;gt; I'm now running out of disk space, so I've reduced dfs.replication to
&lt;br&gt;&amp;gt;&amp;gt; 1 and
&lt;br&gt;&amp;gt;&amp;gt; restarted my datanodes. &amp;nbsp;Is there a way to free up the over-replicated
&lt;br&gt;&amp;gt;&amp;gt; blocks, or does this happen automatically at some point?
&lt;br&gt;&amp;gt;&amp;gt;
&lt;br&gt;&amp;gt;&amp;gt; Thanks,
&lt;br&gt;&amp;gt;&amp;gt; Andy
&lt;br&gt;&amp;gt; 
&lt;br&gt;&amp;gt; ---
&lt;br&gt;&amp;gt; Vladimir Klimontovich,
&lt;br&gt;&amp;gt; skype: klimontovich
&lt;br&gt;&amp;gt; GoogleTalk/Jabber: &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=25175355&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;klimontovich@...&lt;/a&gt;
&lt;br&gt;&amp;gt; Cell phone: +7926 890 2349
&lt;br&gt;&amp;gt; 
&lt;/div&gt;&lt;/div&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Delete-replicated-blocks--tp25173066p25175355.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-25175233</id>
	<title>Re: Delete replicated blocks?</title>
	<published>2009-08-27T07:05:37Z</published>
	<updated>2009-08-27T07:05:37Z</updated>
	<author>
		<name>Vladimir Klimontovich</name>
	</author>
	<content type="html">This will happen automatically.
&lt;br&gt;On Aug 27, 2009, at 6:04 PM, Andy Liu wrote:
&lt;br&gt;&lt;br&gt;&amp;gt; I'm running a test Hadoop cluster, which had a dfs.replication value &amp;nbsp;
&lt;br&gt;&amp;gt; of 3.
&lt;br&gt;&amp;gt; I'm now running out of disk space, so I've reduced dfs.replication &amp;nbsp;
&lt;br&gt;&amp;gt; to 1 and
&lt;br&gt;&amp;gt; restarted my datanodes. &amp;nbsp;Is there a way to free up the over-replicated
&lt;br&gt;&amp;gt; blocks, or does this happen automatically at some point?
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Thanks,
&lt;br&gt;&amp;gt; Andy
&lt;br&gt;&lt;br&gt;---
&lt;br&gt;Vladimir Klimontovich,
&lt;br&gt;skype: klimontovich
&lt;br&gt;GoogleTalk/Jabber: &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=25175233&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;klimontovich@...&lt;/a&gt;
&lt;br&gt;Cell phone: +7926 890 2349
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Delete-replicated-blocks--tp25173066p25175233.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-25173066</id>
	<title>Delete replicated blocks?</title>
	<published>2009-08-27T07:04:17Z</published>
	<updated>2009-08-27T07:04:17Z</updated>
	<author>
		<name>Andy Liu-3</name>
	</author>
	<content type="html">I'm running a test Hadoop cluster, which had a dfs.replication value of 3.
&lt;br&gt;I'm now running out of disk space, so I've reduced dfs.replication to 1 and
&lt;br&gt;restarted my datanodes. &amp;nbsp;Is there a way to free up the over-replicated
&lt;br&gt;blocks, or does this happen automatically at some point?
&lt;br&gt;&lt;br&gt;Thanks,
&lt;br&gt;Andy
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Delete-replicated-blocks--tp25173066p25173066.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-25173006</id>
	<title>Delete replicated blocks?</title>
	<published>2009-08-27T07:00:19Z</published>
	<updated>2009-08-27T07:00:19Z</updated>
	<author>
		<name>Andy Liu-3</name>
	</author>
	<content type="html">I'm running a test Hadoop cluster, which had
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Delete-replicated-blocks--tp25173006p25173006.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-24626239</id>
	<title>Hadoop performance using Mahout</title>
	<published>2009-07-23T06:54:15Z</published>
	<updated>2009-07-23T06:54:15Z</updated>
	<author>
		<name>nfantone</name>
	</author>
	<content type="html">First things first: I want to salute you all and thank you for developing a distributed engine such as Hadoop. It certainly helped me at work. I am now in the process of writing an application for user clustering based on their historical behavior as consumers. For clustering/classification algorithms I resorted to Apache Mahout.
&lt;br&gt;&lt;br&gt;Here's the thing: I generated a pretty small dataset of about 62MB and set up a small cluster of 5 datanodes and a namenode/jobtracker (running on the same machine). Of the datanodes, two have four-core processors and the remaining three have two cores (fourteen cores in total, which I count as slave nodes), and I tend to think that's more than enough processing power to finish the task in a reasonable time, which is exactly what is not happening. Each MR job is taking about 3 hours to complete, as shown by the jobtracker web UI:
&lt;br&gt;&lt;br&gt;Hadoop job_200907221734_0004
&lt;br&gt;Finished in: 2hrs, 34mins, 3sec
&lt;br&gt;&lt;br&gt;Hadoop job_200907221734_0005
&lt;br&gt;Finished in: 2hrs, 59mins, 34sec
&lt;br&gt;&lt;br&gt;The clustering algorithm runs several iterations of MR phases until it converges, and it takes more than 30 hours in total to complete. For such a small dataset, this is unacceptable, and I'm quite sure it has something to do with my cluster configuration and/or how blocks and their sizes are treated in HDFS. Moreover, and this is quite puzzling to me, every core on every machine is running at full capacity almost constantly and there doesn't seem to be any idle time in between tasks. Here are my .xml conf files (just the relevant lines):
&lt;br&gt;&lt;br&gt;(mapred-site.xml)
&lt;br&gt;&amp;lt;name&amp;gt;mapred.job.tracker&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;lt;value&amp;gt;hdfs://hadoop-jobtracker:54311/&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&lt;br&gt;&amp;lt;name&amp;gt;mapred.reduce.tasks&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;lt;value&amp;gt;98&amp;lt;/value&amp;gt; &amp;lt;-- 1.75*14*4 (as suggested by Hadoop's documentation)
&lt;br&gt;&lt;br&gt;&amp;lt;name&amp;gt;mapred.tasktracker.reduce.tasks.maximum&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;lt;value&amp;gt;4&amp;lt;/value&amp;gt;
&lt;br&gt;&lt;br&gt;&amp;lt;name&amp;gt;mapred.tasktracker.map.tasks.maximum&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;lt;value&amp;gt;4&amp;lt;/value&amp;gt;
&lt;br&gt;&lt;br&gt;&amp;lt;name&amp;gt;mapred.map.tasks&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;lt;value&amp;gt;17&amp;lt;/value&amp;gt; &amp;lt;-- With 4MB set as dfs.block.size and having -put the 62MB dataset file with -D dfs.block.size=4194304, there should be ~16 map tasks spawned.
&lt;br&gt;&lt;br&gt;&amp;lt;name&amp;gt;mapred.tasktracker.tasks.maximum&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;lt;value&amp;gt;20&amp;lt;/value&amp;gt;
&lt;br&gt;&lt;br&gt;(hdfs-site.xml)
&lt;br&gt;&amp;lt;name&amp;gt;dfs.replication&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;5&amp;lt;/value&amp;gt;
&lt;br&gt;&lt;br&gt;&amp;lt;name&amp;gt;dfs.block.size&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;lt;value&amp;gt;4194304&amp;lt;/value&amp;gt;
&lt;br&gt;&lt;br&gt;(core-site.xml)
&lt;br&gt;&amp;lt;name&amp;gt;hadoop.tmp.dir&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;lt;value&amp;gt;/usr/local/hadoop/hadoop-datastore/hadoop-${user.name}&amp;lt;/value&amp;gt;
&lt;br&gt;&lt;br&gt;&amp;lt;name&amp;gt;fs.default.name&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;lt;value&amp;gt;hdfs://hadoop-namenode:54310/&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
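&lt;br&gt;&lt;br&gt;The arithmetic behind the two task-count comments in the config above can be checked with a short sketch. It assumes the dataset is exactly 62 MiB and that one map task is spawned per HDFS block; the helper names are hypothetical and are not part of Hadoop.
&lt;pre&gt;
```python
import math

MB = 1024 * 1024


def expected_map_tasks(file_size_bytes, block_size_bytes):
    """Number of HDFS blocks for one file, assuming one map task per block."""
    return math.ceil(file_size_bytes / block_size_bytes)


def suggested_reduce_tasks(factor, slots):
    """The factor * slots rule of thumb quoted in the mapred.reduce.tasks comment."""
    return round(factor * slots)


# Values taken from the post: a 62MB file (assumed to be exactly 62 MiB),
# dfs.block.size=4194304, and the 1.75*14*4 product from the config comment.
maps = expected_map_tasks(62 * MB, 4194304)
reduces = suggested_reduce_tasks(1.75, 14 * 4)
print(maps, reduces)
```
&lt;/pre&gt;
Under these assumptions the numbers agree with the comments in the config: 16 blocks for the input file and 98 reduce tasks.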
&lt;br&gt;&lt;br&gt;Of course, hadoop-namenode and hadoop-jobtracker are both defined in /etc/hosts and they both reference the same IP. No firewall is enabled on the network. There don't seem to be any errors in the datanode/jobtracker logs, either. Is there something I should be taking into account that I currently am not? What could be the cause of such poor performance? Overhead due to copying small bits of data between the nodes, perhaps? Any pointers would be greatly appreciated.</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Hadoop-performance-using-Mahout-tp24626239p24626239.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-24520521</id>
	<title>Error in running hadoop examples</title>
	<published>2009-07-16T10:04:25Z</published>
	<updated>2009-07-16T10:04:25Z</updated>
	<author>
		<name>Pooja Dave</name>
	</author>
	<content type="html">Hi 
&lt;br&gt;&lt;br&gt;I am relatively new to using hadoop. &amp;nbsp;After installing hadoop on 3 machines, I tried running the word count example on one of the machines, running as a single node only. However, when I try to run the word count example using the following command in the terminal:
&lt;br&gt;&lt;br&gt;hadoop@user5:~$ /home/hadoop/Desktop/hadoop/bin/hadoop jar /home/hadoop/Desktop/hadoop/hadoop-0.19.1-examples.jar wordcount gutenberg gut-out
&lt;br&gt;&lt;br&gt;where : hadoop is my user account and gutenberg is where the txt files for the word count example are stored and gut-out is where the result is to be stored
&lt;br&gt;&lt;br&gt;it starts the map-reduce; however, the reduce gets stuck at 0% even though the map reaches 100%, and the output on the console is as follows. I need help. I have been stuck on this problem for 3 days!
&lt;br&gt;&lt;br&gt;09/07/16 12:32:01 INFO mapred.FileInputFormat: Total input paths to process : 3
&lt;br&gt;09/07/16 12:32:44 INFO mapred.JobClient: Running job: job_200907161230_0001
&lt;br&gt;09/07/16 12:32:45 INFO mapred.JobClient: &amp;nbsp;map 0% reduce 0%
&lt;br&gt;09/07/16 12:33:33 INFO mapred.JobClient: &amp;nbsp;map 1% reduce 0%
&lt;br&gt;09/07/16 12:33:37 INFO mapred.JobClient: &amp;nbsp;map 3% reduce 0%
&lt;br&gt;09/07/16 12:33:54 INFO mapred.JobClient: &amp;nbsp;map 5% reduce 0%
&lt;br&gt;09/07/16 12:33:57 INFO mapred.JobClient: &amp;nbsp;map 7% reduce 0%
&lt;br&gt;09/07/16 12:34:07 INFO mapred.JobClient: &amp;nbsp;map 9% reduce 0%
&lt;br&gt;09/07/16 12:34:14 INFO mapred.JobClient: &amp;nbsp;map 11% reduce 0%
&lt;br&gt;09/07/16 12:34:21 INFO mapred.JobClient: &amp;nbsp;map 12% reduce 0%
&lt;br&gt;09/07/16 12:34:29 INFO mapred.JobClient: &amp;nbsp;map 14% reduce 0%
&lt;br&gt;09/07/16 12:34:37 INFO mapred.JobClient: &amp;nbsp;map 16% reduce 0%
&lt;br&gt;09/07/16 12:34:44 INFO mapred.JobClient: &amp;nbsp;map 18% reduce 0%
&lt;br&gt;09/07/16 12:34:51 INFO mapred.JobClient: &amp;nbsp;map 20% reduce 0%
&lt;br&gt;09/07/16 12:34:58 INFO mapred.JobClient: &amp;nbsp;map 22% reduce 0%
&lt;br&gt;09/07/16 12:35:09 INFO mapred.JobClient: &amp;nbsp;map 24% reduce 0%
&lt;br&gt;09/07/16 12:35:41 INFO mapred.JobClient: &amp;nbsp;map 25% reduce 0%
&lt;br&gt;09/07/16 12:36:01 INFO mapred.JobClient: &amp;nbsp;map 27% reduce 0%
&lt;br&gt;09/07/16 12:36:10 INFO mapred.JobClient: &amp;nbsp;map 29% reduce 0%
&lt;br&gt;09/07/16 12:36:34 INFO mapred.JobClient: &amp;nbsp;map 31% reduce 0%
&lt;br&gt;09/07/16 12:36:58 INFO mapred.JobClient: &amp;nbsp;map 33% reduce 0%
&lt;br&gt;09/07/16 12:37:08 INFO mapred.JobClient: &amp;nbsp;map 35% reduce 0%
&lt;br&gt;09/07/16 12:37:15 INFO mapred.JobClient: &amp;nbsp;map 37% reduce 0%
&lt;br&gt;09/07/16 12:37:29 INFO mapred.JobClient: &amp;nbsp;map 38% reduce 0%
&lt;br&gt;09/07/16 12:37:31 INFO mapred.JobClient: &amp;nbsp;map 40% reduce 0%
&lt;br&gt;09/07/16 12:37:47 INFO mapred.JobClient: &amp;nbsp;map 42% reduce 0%
&lt;br&gt;09/07/16 12:37:48 INFO mapred.JobClient: &amp;nbsp;map 44% reduce 0%
&lt;br&gt;09/07/16 12:38:04 INFO mapred.JobClient: &amp;nbsp;map 46% reduce 0%
&lt;br&gt;09/07/16 12:38:06 INFO mapred.JobClient: &amp;nbsp;map 48% reduce 0%
&lt;br&gt;09/07/16 12:38:22 INFO mapred.JobClient: &amp;nbsp;map 49% reduce 0%
&lt;br&gt;09/07/16 12:38:23 INFO mapred.JobClient: &amp;nbsp;map 51% reduce 0%
&lt;br&gt;09/07/16 12:38:39 INFO mapred.JobClient: &amp;nbsp;map 53% reduce 0%
&lt;br&gt;09/07/16 12:38:40 INFO mapred.JobClient: &amp;nbsp;map 55% reduce 0%
&lt;br&gt;09/07/16 12:39:17 INFO mapred.JobClient: &amp;nbsp;map 59% reduce 0%
&lt;br&gt;09/07/16 12:39:37 INFO mapred.JobClient: Task Id : attempt_200907161230_0001_m_000000_0, Status : FAILED
&lt;br&gt;Too many fetch-failures
&lt;br&gt;09/07/16 12:39:37 WARN mapred.JobClient: Error reading task outputConnection refused
&lt;br&gt;09/07/16 12:39:37 WARN mapred.JobClient: Error reading task outputConnection refused
&lt;br&gt;09/07/16 12:39:43 INFO mapred.JobClient: &amp;nbsp;map 61% reduce 0%
&lt;br&gt;09/07/16 12:40:06 INFO mapred.JobClient: &amp;nbsp;map 64% reduce 0%
&lt;br&gt;09/07/16 12:40:25 INFO mapred.JobClient: &amp;nbsp;map 66% reduce 0%
&lt;br&gt;09/07/16 12:40:27 INFO mapred.JobClient: &amp;nbsp;map 68% reduce 0%
&lt;br&gt;09/07/16 12:40:46 INFO mapred.JobClient: &amp;nbsp;map 70% reduce 0%
&lt;br&gt;09/07/16 12:40:48 INFO mapred.JobClient: &amp;nbsp;map 72% reduce 0%
&lt;br&gt;09/07/16 12:41:06 INFO mapred.JobClient: &amp;nbsp;map 74% reduce 0%
&lt;br&gt;09/07/16 12:41:07 INFO mapred.JobClient: &amp;nbsp;map 75% reduce 0%
&lt;br&gt;09/07/16 12:41:27 INFO mapred.JobClient: &amp;nbsp;map 77% reduce 0%
&lt;br&gt;09/07/16 12:41:28 INFO mapred.JobClient: &amp;nbsp;map 79% reduce 0%
&lt;br&gt;09/07/16 12:41:44 INFO mapred.JobClient: &amp;nbsp;map 81% reduce 0%
&lt;br&gt;09/07/16 12:41:47 INFO mapred.JobClient: &amp;nbsp;map 83% reduce 0%
&lt;br&gt;09/07/16 12:42:03 INFO mapred.JobClient: &amp;nbsp;map 85% reduce 0%
&lt;br&gt;09/07/16 12:42:06 INFO mapred.JobClient: &amp;nbsp;map 87% reduce 0%
&lt;br&gt;09/07/16 12:42:42 INFO mapred.JobClient: &amp;nbsp;map 88% reduce 0%
&lt;br&gt;09/07/16 12:42:45 INFO mapred.JobClient: &amp;nbsp;map 90% reduce 0%
&lt;br&gt;09/07/16 12:43:37 INFO mapred.JobClient: &amp;nbsp;map 92% reduce 0%
&lt;br&gt;09/07/16 12:43:40 INFO mapred.JobClient: &amp;nbsp;map 94% reduce 0%
&lt;br&gt;09/07/16 12:44:30 INFO mapred.JobClient: &amp;nbsp;map 96% reduce 0%
&lt;br&gt;09/07/16 12:44:34 INFO mapred.JobClient: &amp;nbsp;map 98% reduce 0%
&lt;br&gt;09/07/16 12:45:21 INFO mapred.JobClient: &amp;nbsp;map 100% reduce 0%
&lt;br&gt;09/07/16 12:46:27 INFO mapred.JobClient: Task Id : attempt_200907161230_0001_m_000001_0, Status : FAILED
&lt;br&gt;Too many fetch-failures
&lt;br&gt;09/07/16 12:46:27 WARN mapred.JobClient: Error reading task outputConnection refused
&lt;br&gt;09/07/16 12:46:27 WARN mapred.JobClient: Error reading task outputConnection refused
&lt;br&gt;09/07/16 12:46:32 INFO mapred.JobClient: &amp;nbsp;map 98% reduce 0%
&lt;br&gt;09/07/16 12:46:34 INFO mapred.JobClient: &amp;nbsp;map 100% reduce 0%
&lt;br&gt;09/07/16 12:52:46 INFO mapred.JobClient: Task Id : attempt_200907161230_0001_m_000002_0, Status : FAILED
&lt;br&gt;Too many fetch-failures
&lt;br&gt;09/07/16 12:52:46 WARN mapred.JobClient: Error reading task outputConnection refused
&lt;br&gt;09/07/16 12:52:46 WARN mapred.JobClient: Error reading task outputConnection refused
&lt;br&gt;09/07/16 12:52:51 INFO mapred.JobClient: &amp;nbsp;map 98% reduce 0%
&lt;br&gt;09/07/16 12:53:01 INFO mapred.JobClient: &amp;nbsp;map 100% reduce 0%
&lt;br&gt;09/07/16 12:59:02 INFO mapred.JobClient: Task Id : attempt_200907161230_0001_m_000003_0, Status : FAILED
&lt;br&gt;Too many fetch-failures
&lt;br&gt;09/07/16 12:59:02 WARN mapred.JobClient: Error reading task outputConnection refused
&lt;br&gt;09/07/16 12:59:02 WARN mapred.JobClient: Error reading task outputConnection refused
&lt;br&gt;09/07/16 12:59:07 INFO mapred.JobClient: &amp;nbsp;map 98% reduce 0%
&lt;br&gt;09/07/16 12:59:15 INFO mapred.JobClient: &amp;nbsp;map 100% reduce 0%
&lt;br&gt;&lt;br&gt;&lt;br&gt;&amp;nbsp;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Error-in-running-hadoop-examples-tp24520521p24520521.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-24459151</id>
	<title>Re: could only be replicated to 0 nodes, instead of 1</title>
	<published>2009-07-13T03:25:02Z</published>
	<updated>2009-07-13T03:25:02Z</updated>
	<author>
		<name>Anthony.Fan</name>
	</author>
	<content type="html">The full error message is 
&lt;br&gt;09/07/02 16:28:09 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /user/hadoop/count/count/temp1 retries left 1
&lt;br&gt;09/07/02 16:28:12 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/count/count/temp1 could only be replicated to 0 nodes, instead of 1
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1280)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at java.lang.reflect.Method.invoke(Method.java:597)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.ipc.Client.call(Client.java:697)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at $Proxy0.addBlock(Unknown Source)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at java.lang.reflect.Method.invoke(Method.java:597)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at $Proxy0.addBlock(Unknown Source)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/could-only-be-replicated-to-0-nodes%2C-instead-of-1-tp24459104p24459151.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-24459104</id>
	<title>could only be replicated to 0 nodes, instead of 1</title>
	<published>2009-07-13T03:20:37Z</published>
	<updated>2009-07-13T03:20:37Z</updated>
	<author>
		<name>Anthony.Fan</name>
	</author>
	<content type="html">Hi, All
&lt;br&gt;&lt;br&gt;I just started using Hadoop a few days ago. I got the error message 
&lt;br&gt;&amp;quot; WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/count/count/temp1 could only be replicated to 0 nodes, instead of 1&amp;quot;
&lt;br&gt;while trying to copy data files to DFS after Hadoop is started.
&lt;br&gt;&lt;br&gt;I configured everything according to the instructions in &amp;quot;Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)&amp;quot;, and I don't know what's wrong. Besides, no error message is written to the log files during the process.
&lt;br&gt;&lt;br&gt;Also, according to &amp;quot;&lt;a href=&quot;http://localhost.localdomain:50070/dfshealth.jsp&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://localhost.localdomain:50070/dfshealth.jsp&lt;/a&gt;&amp;quot;, I have one live namenode. In the browser, I can even see that the first data file has been created in DFS, but its size is 0.
&lt;br&gt;&lt;br&gt;Things I've tried:
&lt;br&gt;1. Stop hadoop, re-format DFS and start hadoop again.
&lt;br&gt;2. Change &amp;quot;localhost&amp;quot; to &amp;quot;127.0.0.1&amp;quot;
&lt;br&gt;&lt;br&gt;But neither of them works.
&lt;br&gt;&lt;br&gt;Could anyone help me or give me a hint?
&lt;br&gt;&lt;br&gt;Thanks.
&lt;br&gt;&lt;br&gt;Anthony</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/could-only-be-replicated-to-0-nodes%2C-instead-of-1-tp24459104p24459104.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-24279988</id>
	<title>Re: SVN gone ?</title>
	<published>2009-06-30T14:28:34Z</published>
	<updated>2009-06-30T14:28:34Z</updated>
	<author>
		<name>marcusherou</name>
	</author>
	<content type="html">Yes, I seem to be an idiot: it is https... However, the page refers to http.
&lt;br&gt;&lt;br&gt;/M
&lt;br&gt;&lt;br&gt;On Tue, Jun 30, 2009 at 11:25 PM, Marcus Herou
&lt;br&gt;&amp;lt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=24279988&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;marcus.herou@...&lt;/a&gt;&amp;gt;wrote:
&lt;br&gt;&lt;div class='shrinkable-quote'&gt;&lt;br&gt;&amp;gt; Am I a total moron, or has the Subversion repo gone fishing?
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; I noticed that yesterday when I did a svn up.
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; I get a 404 on this url: &lt;a href=&quot;http://svn.apache.org/repos/asf/hadoop/core/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://svn.apache.org/repos/asf/hadoop/core/&lt;/a&gt;&lt;br&gt;&amp;gt; which is referred to from:
&lt;br&gt;&amp;gt; &lt;a href=&quot;http://hadoop.apache.org/core/version_control.html&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://hadoop.apache.org/core/version_control.html&lt;/a&gt;&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; Cheers
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; /Marcus
&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt; --
&lt;br&gt;&amp;gt; Marcus Herou CTO and co-founder Tailsweep AB
&lt;br&gt;&amp;gt; +46702561312
&lt;br&gt;&amp;gt; &lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=24279988&amp;i=1&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;marcus.herou@...&lt;/a&gt;
&lt;br&gt;&amp;gt; &lt;a href=&quot;http://www.tailsweep.com/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.tailsweep.com/&lt;/a&gt;&lt;br&gt;&amp;gt;
&lt;br&gt;&amp;gt;
&lt;/div&gt;&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;Marcus Herou CTO and co-founder Tailsweep AB
&lt;br&gt;+46702561312
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=24279988&amp;i=2&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;marcus.herou@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.tailsweep.com/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.tailsweep.com/&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/SVN-gone---tp24279924p24279988.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-24279924</id>
	<title>SVN gone ?</title>
	<published>2009-06-30T14:25:09Z</published>
	<updated>2009-06-30T14:25:09Z</updated>
	<author>
		<name>marcusherou</name>
	</author>
	<content type="html">Am I a total moron, or has the Subversion repo gone fishing?
&lt;br&gt;&lt;br&gt;I noticed that yesterday when I did a svn up.
&lt;br&gt;&lt;br&gt;I get a 404 on this url: &lt;a href=&quot;http://svn.apache.org/repos/asf/hadoop/core/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://svn.apache.org/repos/asf/hadoop/core/&lt;/a&gt;&lt;br&gt;which is referred to from:
&lt;br&gt;&lt;a href=&quot;http://hadoop.apache.org/core/version_control.html&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://hadoop.apache.org/core/version_control.html&lt;/a&gt;&lt;br&gt;&lt;br&gt;Cheers
&lt;br&gt;&lt;br&gt;/Marcus
&lt;br&gt;&lt;br&gt;-- 
&lt;br&gt;Marcus Herou CTO and co-founder Tailsweep AB
&lt;br&gt;+46702561312
&lt;br&gt;&lt;a href=&quot;http://old.nabble.com/user/SendEmail.jtp?type=post&amp;post=24279924&amp;i=0&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;marcus.herou@...&lt;/a&gt;
&lt;br&gt;&lt;a href=&quot;http://www.tailsweep.com/&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://www.tailsweep.com/&lt;/a&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/SVN-gone---tp24279924p24279924.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-24228728</id>
	<title>Can I post pig questions on this forum?</title>
	<published>2009-06-26T16:34:05Z</published>
	<updated>2009-06-26T16:34:05Z</updated>
	<author>
		<name>pmg</name>
	</author>
	<content type="html"></content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Can-I-post-pig-questions-on-this-forum--tp24228728p24228728.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-24214951</id>
	<title>hadoop lucene integration</title>
	<published>2009-06-25T21:53:40Z</published>
	<updated>2009-06-25T21:53:40Z</updated>
	<author>
		<name>m.harig</name>
	</author>
	<content type="html">hi all
&lt;br&gt;&lt;br&gt;I have work experience with Lucene, but I am new to Hadoop. I created an index with Lucene; could anyone please tell me how to use Hadoop's distributed file system with my Lucene index? &lt;b&gt;If possible, could anyone send me an example or a link&lt;/b&gt;&amp;nbsp;that I can use for my index? Please.</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/hadoop-lucene-integration-tp24214951p24214951.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-23569592</id>
	<title>BlockAlreadyExistsException</title>
	<published>2009-05-15T18:35:33Z</published>
	<updated>2009-05-15T18:35:33Z</updated>
	<author>
		<name>zxh116116</name>
	</author>
	<content type="html">I am using Nutch 1.0 to fetch data into Hadoop. I have 1 master and 10 slaves, all running Ubuntu with 4 GB of memory and a 1 TB hard disk. 
&lt;br&gt;My config is:
&lt;br&gt;master:
&lt;br&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot;?&amp;gt;
&lt;br&gt;&amp;lt;?xml-stylesheet type=&amp;quot;text/xsl&amp;quot; href=&amp;quot;configuration.xsl&amp;quot;?&amp;gt;
&lt;br&gt;&amp;lt;!--
&lt;br&gt;&amp;nbsp;Autogenerated by Cloudera's Configurator for Hadoop 0.1.0 on Fri May 15 06:49:30 2009
&lt;br&gt;--&amp;gt;
&lt;br&gt;&amp;lt;configuration&amp;gt;
&lt;br&gt;&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;dfs.block.size&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;134217728&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;dfs.data.dir&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;/data/filesystem/data&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;dfs.datanode.du.reserved&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;1073741824&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;dfs.datanode.handler.count&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;3&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;dfs.name.dir&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;/data/filesystem/namenode&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;dfs.namenode.handler.count&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;5&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;dfs.permissions&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;True&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;dfs.replication&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;3&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;fs.checkpoint.dir&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;/data/filesystem/secondary-nn&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;fs.default.name&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;hdfs://ubuntu76:9000&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;fs.trash.interval&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;1440&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;hadoop.tmp.dir&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;/tmp/hadoop-${user.name}&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;io.file.buffer.size&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;65536&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.child.java.opts&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;-Xmx1945m&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.child.ulimit&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;3983360&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.job.tracker&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;ubuntu76:9001&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.job.tracker.handler.count&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;5&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.local.dir&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;${hadoop.tmp.dir}/mapred/local&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.map.tasks.speculative.execution&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;true&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.reduce.parallel.copies&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;10&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.reduce.tasks&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;10&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.reduce.tasks.speculative.execution&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;false&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.tasktracker.map.tasks.maximum&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;1&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.tasktracker.reduce.tasks.maximum&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;1&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;tasktracker.http.threads&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;12&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;/configuration&amp;gt;
&lt;br&gt;&lt;br&gt;slave:
&lt;br&gt;&amp;lt;?xml version=&amp;quot;1.0&amp;quot;?&amp;gt;
&lt;br&gt;&amp;lt;?xml-stylesheet type=&amp;quot;text/xsl&amp;quot; href=&amp;quot;configuration.xsl&amp;quot;?&amp;gt;
&lt;br&gt;&amp;lt;!--
&lt;br&gt;&amp;nbsp;Autogenerated by Cloudera's Configurator for Hadoop 0.1.0 on Fri May 15 06:49:29 2009
&lt;br&gt;--&amp;gt;
&lt;br&gt;&amp;lt;configuration&amp;gt;
&lt;br&gt;&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;dfs.block.size&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;134217728&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;dfs.data.dir&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;/data/filesystem/data&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;dfs.datanode.du.reserved&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;1073741824&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;dfs.datanode.handler.count&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;3&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;dfs.name.dir&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;/data/filesystem/namenode&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;dfs.namenode.handler.count&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;5&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;dfs.permissions&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;True&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;dfs.replication&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;3&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;fs.checkpoint.dir&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;/data/filesystem/secondary-nn&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;fs.default.name&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;hdfs://ubuntu76:9000&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;fs.trash.interval&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;1440&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;hadoop.tmp.dir&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;/tmp/hadoop-${user.name}&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;io.file.buffer.size&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;65536&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.child.java.opts&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;-Xmx1945m&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.child.ulimit&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;3983360&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.job.tracker&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;ubuntu76:9001&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.job.tracker.handler.count&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;5&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.local.dir&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;/data/filesystem/mapred/local&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.map.tasks.speculative.execution&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;true&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.reduce.parallel.copies&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;10&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.reduce.tasks&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;10&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.reduce.tasks.speculative.execution&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;false&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.tasktracker.map.tasks.maximum&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;1&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.tasktracker.reduce.tasks.maximum&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;1&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;tasktracker.http.threads&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;12&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;final&amp;gt;true&amp;lt;/final&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;/configuration&amp;gt;
&lt;br&gt;&lt;br&gt;But when I execute the command &amp;quot;bin/nutch crawl urls -dir crawled -depth 3&amp;quot;, the datanode log file shows exceptions like this:
&lt;br&gt;2009-05-15 21:04:10,328 ERROR datanode.DataNode - DatanodeRegistration(113.45.58.77:50010, storageID=DS-1293122987-113.45.58.77-50010-1242435813708, infoPort=50075, ipcPort=50020):DataXceiver
&lt;br&gt;org.apache.hadoop.hdfs.server.datanode.BlockAlreadyExistsException: Block blk_5260319812111246094_1002 is valid, and cannot be written to.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.hdfs.server.datanode.FSDataset.writeToBlock(FSDataset.java:975)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.&amp;lt;init&amp;gt;(BlockReceiver.java:97)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:259)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at java.lang.Thread.run(Thread.java:619)
&lt;br&gt;2009-05-15 21:04:39,750 WARN &amp;nbsp;datanode.DataNode - DatanodeRegistration(113.45.58.77:50010, storageID=DS-1293122987-113.45.58.77-50010-1242435813708, infoPort=50075, ipcPort=50020):Failed to transfer blk_9220934476097358434_1017 to 113.45.58.78:50010 got java.net.SocketException: Original Exception : java.io.IOException: Connection reset by peer
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:418)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:519)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:199)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1108)
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at java.lang.Thread.run(Thread.java:619)
&lt;br&gt;Caused by: java.io.IOException: Connection reset by peer
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ... 8 more &amp;nbsp;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/BlockAlreadyExistsException-tp23569592p23569592.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-23149249</id>
	<title>Re: hadoop-a small doubt</title>
	<published>2009-04-20T22:06:05Z</published>
	<updated>2009-04-20T22:06:05Z</updated>
	<author>
		<name>Parul Kudtarkar</name>
	</author>
	<content type="html">Why exactly do you want a system outside the Hadoop cluster to access the namenode or datanode? If you simply want to write data from the local system to HDFS and then copy data back from HDFS to the local system, use the Hadoop file system shell commands.
&lt;br&gt;&lt;br&gt;Hope this helps!
&lt;br&gt;&lt;br&gt;&lt;blockquote class=&quot;quote light-black dark-border-color&quot;&gt;&lt;div class=&quot;quote light-border-color&quot;&gt;
&lt;div class=&quot;quote-author&quot; style=&quot;font-weight: bold;&quot;&gt;deepya wrote:&lt;/div&gt;
&lt;div class=&quot;quote-message shrinkable-quote&quot;&gt;Hi,
&lt;br&gt;&amp;nbsp; &amp;nbsp;I am SreeDeepya, doing an MTech at IIIT. I am working on a project named Cost Effective and Scalable Storage Server. I configured a small Hadoop cluster with only two nodes, one namenode and one datanode. I am new to Hadoop.
&lt;br&gt;I have a small doubt.
&lt;br&gt;&lt;br&gt;Can a system that is not in the Hadoop cluster access the namenode or the datanode? If yes, can you please tell me what configuration is necessary?
&lt;br&gt;&lt;br&gt;Thanks in advance.
&lt;br&gt;&lt;br&gt;SreeDeepya
&lt;/div&gt;
&lt;/div&gt;&lt;/blockquote&gt;
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/hadoop-a-small-doubt-tp22764615p23149249.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-23149085</id>
	<title>Copying files from HDFS to remote database</title>
	<published>2009-04-20T21:40:43Z</published>
	<updated>2009-04-20T21:40:43Z</updated>
	<author>
		<name>Parul Kudtarkar</name>
	</author>
	<content type="html">Our application uses Hadoop to parallelize jobs across an EC2 cluster. HDFS is used to store the output files. What would be the ideal way to copy output files from HDFS to remote databases?
&lt;br&gt;&lt;br&gt;Thanks,
&lt;br&gt;Parul V. Kudtarkar</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Copying-files-from-HDFS-to-remote-database-tp23149085p23149085.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-23136416</id>
	<title>Not able to subscribe to pig user/dev mailing list</title>
	<published>2009-04-20T06:14:16Z</published>
	<updated>2009-04-20T06:14:16Z</updated>
	<author>
		<name>Pallavi Palleti</name>
	</author>
	<content type="html">Hi all,
&lt;br&gt;&amp;nbsp;I am not able to subscribe to the Pig mailing lists (both dev and user). Here is the error message that I get when I try to confirm the subscription.
&lt;br&gt;&lt;br&gt;&lt;i&gt;Your message did not reach some or all of the intended recipients.
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; Subject:	pig-dev-sc.1239701669.ohbefaiphgajdgbcjmjg-pallavi.palleti=corp.aol.com@hadoop.apache.org
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; Sent:	14/04/2009 3:11 PM
&lt;br&gt;&lt;br&gt;The following recipient(s) cannot be reached:
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; pig-dev-request@hadoop.apache.org on 14/04/2009 3:12 PM
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; The message could not be delivered because the recipient's mailbox is full.
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;lt; omr-m35.mx.aol.com #5.2.2 SMTP; 552 spam score (6.2) exceeded threshold&amp;gt;
&lt;br&gt;&lt;/i&gt;&lt;br&gt;&lt;br&gt;I have tried a couple of times so far but have not been able to register. Has anyone else faced the same issue?
&lt;br&gt;&lt;br&gt;Thanks
&lt;br&gt;Pallavi</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Not-able-to-subscribe-to-pig-user-dev-mailing-list-tp23136416p23136416.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-23061794</id>
	<title>Re: hadoop-a small doubt</title>
	<published>2009-04-15T08:56:43Z</published>
	<updated>2009-04-15T08:56:43Z</updated>
	<author>
		<name>Pankil Doshi</name>
	</author>
	<content type="html">Hey,
&lt;br&gt;You can do that. That system should have the same username as the cluster machines, and of course it should be able to ssh to the namenode. It should also have Hadoop installed, and its hadoop-site.xml should be similar to the cluster's. Then you can access the namenode, HDFS, etc.
&lt;br&gt;&lt;br&gt;If you just want to see the web interface, that can be done easily from any system.
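&lt;br&gt;&lt;br&gt;As a rough sketch of what that could look like (not taken from the original poster's setup): the outside machine's hadoop-site.xml would point at the cluster's namenode and jobtracker. The host name namenode-host and the ports 9000 and 9001 below are placeholders assumed for illustration; use whatever values the cluster's own hadoop-site.xml has.
&lt;br&gt;&lt;br&gt;&amp;lt;configuration&amp;gt;
&lt;br&gt;&amp;lt;!-- namenode-host, 9000 and 9001 are hypothetical; copy the real values from the cluster --&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;fs.default.name&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;hdfs://namenode-host:9000&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;property&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;name&amp;gt;mapred.job.tracker&amp;lt;/name&amp;gt;
&lt;br&gt;&amp;nbsp;&amp;lt;value&amp;gt;namenode-host:9001&amp;lt;/value&amp;gt;
&lt;br&gt;&amp;lt;/property&amp;gt;
&lt;br&gt;&amp;lt;/configuration&amp;gt;
&lt;br&gt;&lt;br&gt;With that in place, commands such as bin/hadoop dfs -ls run on the outside machine should talk to the cluster's HDFS rather than the local file system.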
&lt;br&gt;&lt;blockquote class=&quot;quote light-black dark-border-color&quot;&gt;&lt;div class=&quot;quote light-border-color&quot;&gt;
&lt;div class=&quot;quote-author&quot; style=&quot;font-weight: bold;&quot;&gt;deepya wrote:&lt;/div&gt;
&lt;div class=&quot;quote-message shrinkable-quote&quot;&gt;Hi,
&lt;br&gt;&amp;nbsp; &amp;nbsp;I am SreeDeepya, doing an MTech at IIIT. I am working on a project named Cost Effective and Scalable Storage Server. I configured a small Hadoop cluster with only two nodes, one namenode and one datanode. I am new to Hadoop.
&lt;br&gt;I have a small doubt.
&lt;br&gt;&lt;br&gt;Can a system that is not in the Hadoop cluster access the namenode or the datanode? If yes, can you please tell me what configuration is necessary?
&lt;br&gt;&lt;br&gt;Thanks in advance.
&lt;br&gt;&lt;br&gt;SreeDeepya
&lt;/div&gt;
&lt;/div&gt;&lt;/blockquote&gt;
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/hadoop-a-small-doubt-tp22764615p23061794.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-22764615</id>
	<title>hadoop-a small doubt</title>
	<published>2009-03-28T22:29:24Z</published>
	<updated>2009-03-28T22:29:24Z</updated>
	<author>
		<name>deepya</name>
	</author>
	<content type="html">Hi,
&lt;br&gt;&amp;nbsp; &amp;nbsp;I am SreeDeepya, doing an MTech at IIIT. I am working on a project named Cost Effective and Scalable Storage Server. I configured a small Hadoop cluster with only two nodes, one namenode and one datanode. I am new to Hadoop.
&lt;br&gt;I have a small doubt.
&lt;br&gt;&lt;br&gt;Can a system that is not in the Hadoop cluster access the namenode or the datanode? If yes, can you please tell me what configuration is necessary?
&lt;br&gt;&lt;br&gt;Thanks in advance.
&lt;br&gt;&lt;br&gt;SreeDeepya</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/hadoop-a-small-doubt-tp22764615p22764615.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-22439420</id>
	<title>streaming inputformat: class not found</title>
	<published>2009-03-10T10:30:16Z</published>
	<updated>2009-03-10T10:30:16Z</updated>
	<author>
		<name>t-alleyne</name>
	</author>
	<content type="html">Hello,
&lt;br&gt;&lt;br&gt;I'm trying to run a MapReduce job on a data file in which the keys and values are on alternating rows. &amp;nbsp;E.g.
&lt;br&gt;&lt;br&gt;key1
&lt;br&gt;value1
&lt;br&gt;key2
&lt;br&gt;...
&lt;br&gt;&lt;br&gt;I've written my own InputFormat by extending FileInputFormat (the code for this class is below.) &amp;nbsp;The problem is that when I run hadoop streaming with the command
&lt;br&gt;&lt;br&gt;bin/hadoop jar contrib/streaming/hadoop-0.18.3-streaming.jar -mapper mapper.pl -reducer org.apache.hadoop.mapred.lib.IdentityReducer -input test.data -output test-output -file &amp;lt;pathToMapper.pl&amp;gt; -inputformat MyFormatter
&lt;br&gt;&lt;br&gt;I get the error
&lt;br&gt;&lt;br&gt;-inputformat : class not found : MyFormatter
&lt;br&gt;java.lang.RuntimeException: -inputformat : class not found : MyFormatter
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; at org.apache.hadoop.streaming.StreamJob.fail(StreamJob.java:550)
&lt;br&gt;...
&lt;br&gt;&lt;br&gt;I have tried putting the .java, .class, and .jar files of MyFormatter in the job jar using the -file parameter. &amp;nbsp;I have also tried putting them on HDFS using -copyFromLocal, but I still get the same error. &amp;nbsp;Can anyone give me some hints as to what the problem might be? &amp;nbsp;Also, I hacked together my formatter based on the Hadoop examples, so does it seem like it should properly process the input files I described above?
&lt;br&gt;&lt;br&gt;Trevis
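&lt;br&gt;&lt;br&gt;&lt;br&gt;import java.io.IOException;
&lt;br&gt;import org.apache.hadoop.io.LongWritable;
&lt;br&gt;import org.apache.hadoop.io.Text;
&lt;br&gt;import org.apache.hadoop.mapred.FileSplit;
&lt;br&gt;import org.apache.hadoop.mapred.InputSplit;
&lt;br&gt;import org.apache.hadoop.mapred.JobConf;
&lt;br&gt;import org.apache.hadoop.mapred.LineRecordReader;
&lt;br&gt;import org.apache.hadoop.mapred.RecordReader;
&lt;br&gt;import org.apache.hadoop.mapred.Reporter;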
&lt;br&gt;&lt;br&gt;public final class MyFormatter extends org.apache.hadoop.mapred.FileInputFormat&amp;lt;Text, Text&amp;gt;
&lt;br&gt;&lt;br&gt;{
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; @Override
&lt;br&gt;&amp;nbsp; &amp;nbsp; public RecordReader&amp;lt;Text, Text&amp;gt; getRecordReader( InputSplit split,
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; JobConf job, Reporter reporter ) throws IOException
&lt;br&gt;&amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; return new MyRecordReader( job, (FileSplit) split );
&lt;br&gt;&amp;nbsp; &amp;nbsp; }
&lt;br&gt;&amp;nbsp; &amp;nbsp; 
&lt;br&gt;&amp;nbsp; &amp;nbsp; 
&lt;br&gt;&amp;nbsp; &amp;nbsp; 
&lt;br&gt;&amp;nbsp; &amp;nbsp; static class MyRecordReader implements RecordReader&amp;lt;Text, Text&amp;gt;
&lt;br&gt;&amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; private LineRecordReader _in &amp;nbsp; = null;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; private LongWritable &amp;nbsp; &amp;nbsp; _junk = null;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; public MyRecordReader( JobConf job, FileSplit split ) throws IOException
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; _junk = new LongWritable();
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; _in = new LineRecordReader( job, split );
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; @Override
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; public void close() throws IOException
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; _in.close();
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; @Override
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; public Text createKey()
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; return new Text();
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; @Override
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; public Text createValue()
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; return new Text();
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; @Override
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; public long getPos() throws IOException
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; return _in.getPos();
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; @Override
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; public float getProgress() throws IOException
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; return _in.getProgress();
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; @Override
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; public boolean next( Text key, Text value ) throws IOException
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if ( _in.next( _junk, key ) )
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; if ( _in.next( _junk, value ) )
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; return true;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; key.clear();
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; value.clear();
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; return false;
&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }
&lt;br&gt;&amp;nbsp; &amp;nbsp; }
&lt;br&gt;}</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/streaming-inputformat%3A-class-not-found-tp22439420p22439420.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-20736731</id>
	<title>Re: bypass ssh to upload files on HDFS</title>
	<published>2008-11-28T07:56:59Z</published>
	<updated>2008-11-28T07:56:59Z</updated>
	<author>
		<name>jas69</name>
	</author>
	<content type="html">Thanks for your reply. Can you suggest which file in Hadoop I can change to eliminate the use of ssh?
&lt;br&gt;&lt;quote author=&quot;yossale&quot;&gt;&lt;br&gt;I'm not sure if this is what you meant, but you can start an HTTP service on one of the HDFS machines and use it to upload files from the local machine to HDFS, without using ssh. (remoteServer -&amp;gt; httpService on local, submit to local HDFS)
&lt;br&gt;&lt;br&gt;&lt;blockquote class=&quot;quote light-black dark-border-color&quot;&gt;&lt;div class=&quot;quote light-border-color&quot;&gt;
&lt;div class=&quot;quote-author&quot; style=&quot;font-weight: bold;&quot;&gt;jas69 wrote:&lt;/div&gt;
&lt;div class=&quot;quote-message&quot;&gt;&amp;nbsp;Hi,
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Please suggest a way to do this. I want to upload files from the local filesystem to HDFS without using the ssh service.
&lt;br&gt;&lt;br&gt;&amp;nbsp;Regards.
&lt;/div&gt;
&lt;/div&gt;&lt;/blockquote&gt;
&lt;/quote&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/bypass-ssh-to-upload-files-on-HDFS-tp20723090p20736731.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-20723496</id>
	<title>Re: bypass ssh to upload files on HDFS</title>
	<published>2008-11-27T10:06:06Z</published>
	<updated>2008-11-27T10:06:06Z</updated>
	<author>
		<name>yossale</name>
	</author>
	<content type="html">I'm not sure if this is what you meant, but you can start an HTTP service on one of the HDFS machines and use it to upload files from the local machine to HDFS, without using ssh. (remoteServer -&amp;gt; httpService on local, submit to local HDFS)
&lt;br&gt;&lt;br&gt;&lt;blockquote class=&quot;quote light-black dark-border-color&quot;&gt;&lt;div class=&quot;quote light-border-color&quot;&gt;
&lt;div class=&quot;quote-author&quot; style=&quot;font-weight: bold;&quot;&gt;jas69 wrote:&lt;/div&gt;
&lt;div class=&quot;quote-message&quot;&gt;&amp;nbsp;Hi,
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Please suggest a way to do this. I want to upload files from the local filesystem to HDFS without using the ssh service.
&lt;br&gt;&lt;br&gt;&amp;nbsp;Regards.
&lt;/div&gt;
&lt;/div&gt;&lt;/blockquote&gt;
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/bypass-ssh-to-upload-files-on-HDFS-tp20723090p20723496.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-20723090</id>
	<title>bypass ssh to upload files on HDFS</title>
	<published>2008-11-27T09:36:19Z</published>
	<updated>2008-11-27T09:36:19Z</updated>
	<author>
		<name>jas69</name>
	</author>
	<content type="html">&amp;nbsp;Hi,
&lt;br&gt;&lt;br&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Please suggest a way to do this. I want to upload files from the local filesystem to HDFS without using the ssh service.
&lt;br&gt;&lt;br&gt;&amp;nbsp;Regards.</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/bypass-ssh-to-upload-files-on-HDFS-tp20723090p20723090.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-20674353</id>
	<title>Block placement in HDFS</title>
	<published>2008-11-24T18:59:28Z</published>
	<updated>2008-11-24T18:59:28Z</updated>
	<author>
		<name>dennis81</name>
	</author>
	<content type="html">Hi everyone,
&lt;br&gt;&lt;br&gt;I was wondering whether it is possible to control the placement of the blocks of a file in HDFS. Is it possible to instruct HDFS about which nodes will hold the block replicas?
&lt;br&gt;&lt;br&gt;Thanks!</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Block-placement-in-HDFS-tp20674353p20674353.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-20488938</id>
	<title>Re: &quot;could only be replicated to 0 nodes, instead of 1&quot;</title>
	<published>2008-11-13T12:28:32Z</published>
	<updated>2008-11-13T12:28:32Z</updated>
	<author>
		<name>Arul Ganesh</name>
	</author>
	<content type="html">Hi,
&lt;br&gt;If you are getting this in a Windows environment (2003, 64-bit): we faced the same problem. We tried the following steps and it started working.
&lt;br&gt;1) Install Cygwin and ssh.
&lt;br&gt;2) Download the stable version of Hadoop, hadoop-0.17.2.1.tar.gz (as of 13/Nov/2008).
&lt;br&gt;3) Untar it via Cygwin (tar xvfz hadoop-0.17.2.1.tar.gz). Please DO NOT use WinZip to untar.
&lt;br&gt;4) Run the pseudo-distributed example provided in the quickstart (&lt;a href=&quot;http://hadoop.apache.org/core/docs/current/quickstart.html&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://hadoop.apache.org/core/docs/current/quickstart.html&lt;/a&gt;). We tried it and it worked.
&lt;br&gt;&lt;br&gt;Thanks
&lt;br&gt;Arul and Limin
&lt;br&gt;eBay Inc.,
&lt;br&gt;&lt;br&gt;&lt;br&gt;&lt;blockquote class=&quot;quote light-black dark-border-color&quot;&gt;&lt;div class=&quot;quote light-border-color&quot;&gt;
&lt;div class=&quot;quote-author&quot; style=&quot;font-weight: bold;&quot;&gt;jerrro wrote:&lt;/div&gt;
&lt;div class=&quot;quote-message shrinkable-quote&quot;&gt;I am trying to install/configure hadoop on a cluster with several computers. I followed exactly the instructions in the hadoop website for configuring multiple slaves, and when I run start-all.sh I get no errors - both datanode and tasktracker are reported to be running (doing ps awux | grep hadoop on the slave nodes returns two java processes). Also, the log files are empty - nothing is printed there. Still, when I try to use bin/hadoop dfs -put,
&lt;br&gt;I get the following error:
&lt;br&gt;&lt;br&gt;# bin/hadoop dfs -put w.txt w.txt
&lt;br&gt;put: java.io.IOException: File /user/scohen/w4.txt could only be replicated to 0 nodes, instead of 1
&lt;br&gt;&lt;br&gt;and a file of size 0 is created on the DFS (bin/hadoop dfs -ls shows it).
&lt;br&gt;&lt;br&gt;I couldn't find much information about this error, but I did see somewhere that it might mean that there are no datanodes running. But as I said, start-all does not give any errors. Any ideas what the problem could be?
&lt;br&gt;&lt;br&gt;Thanks.
&lt;br&gt;&lt;br&gt;Jerr.
&lt;/div&gt;
&lt;/div&gt;&lt;/blockquote&gt;
</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/%22could-only-be-replicated-to-0-nodes%2C-instead-of-1%22-tp14175780p20488938.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-20445877</id>
	<title>Hadoop Streaming  - running a jar file</title>
	<published>2008-11-11T10:50:35Z</published>
	<updated>2008-11-11T10:50:35Z</updated>
	<author>
		<name>Amit_Gupta</name>
	</author>
	<content type="html">Hi
&lt;br&gt;&lt;br&gt;I have a jar file which takes input from stdin and writes something on stdout. i.e. When I run 
&lt;br&gt;&lt;br&gt;java -jar A.jar &amp;lt; input 
&lt;br&gt;&lt;br&gt;It prints the required output.
&lt;br&gt;&lt;br&gt;However, when I run it as a mapper in hadoop streaming using the command
&lt;br&gt;&lt;br&gt;$HADOOP_HOME/bin/hadoop jar ....streaming.jar -input .. -output ... &amp;nbsp;-mapper 'java -jar A.jar' &amp;nbsp;-reducer NONE 
&lt;br&gt;&lt;br&gt;I get a broken pipe exception.
&lt;br&gt;&lt;br&gt;&lt;br&gt;The error message is
&lt;br&gt;&lt;br&gt;additionalConfSpec_:null
&lt;br&gt;null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
&lt;br&gt;packageJobJar: [/mnt/hadoop/HADOOP/hadoop-0.16.3/tmp/dir/hadoop-hadoop/hadoop-unjar45410/] [] /tmp/streamjob45411.jar tmpDir=null
&lt;br&gt;08/11/11 23:20:14 INFO mapred.FileInputFormat: Total input paths to process : 1
&lt;br&gt;08/11/11 23:20:14 INFO streaming.StreamJob: getLocalDirs(): [/mnt/hadoop/HADOOP/hadoop-0.16.3/tmp/mapred]
&lt;br&gt;08/11/11 23:20:14 INFO streaming.StreamJob: Running job: job_200811111724_0014
&lt;br&gt;08/11/11 23:20:14 INFO streaming.StreamJob: To kill this job, run:
&lt;br&gt;08/11/11 23:20:14 INFO streaming.StreamJob: /mnt/hadoop/HADOOP/hadoop-0.16.3/bin/../bin/hadoop job &amp;nbsp;-Dmapred.job.tracker=10.105.41.25:54311 -kill job_200811111724_0014
&lt;br&gt;08/11/11 23:20:15 INFO streaming.StreamJob: Tracking URL: &lt;a href=&quot;http://sayali:50030/jobdetails.jsp?jobid=job_200811111724_0014&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://sayali:50030/jobdetails.jsp?jobid=job_200811111724_0014&lt;/a&gt;&lt;br&gt;08/11/11 23:20:16 INFO streaming.StreamJob: &amp;nbsp;map 0% &amp;nbsp;reduce 0%
&lt;br&gt;08/11/11 23:21:00 INFO streaming.StreamJob: &amp;nbsp;map 100% &amp;nbsp;reduce 100%
&lt;br&gt;08/11/11 23:21:00 INFO streaming.StreamJob: To kill this job, run:
&lt;br&gt;08/11/11 23:21:00 INFO streaming.StreamJob: /mnt/hadoop/HADOOP/hadoop-0.16.3/bin/../bin/hadoop job &amp;nbsp;-Dmapred.job.tracker=10.105.41.25:54311 -kill job_200811111724_0014
&lt;br&gt;08/11/11 23:21:00 INFO streaming.StreamJob: Tracking URL: &lt;a href=&quot;http://sayali:50030/jobdetails.jsp?jobid=job_200811111724_0014&quot; target=&quot;_top&quot; rel=&quot;nofollow&quot;&gt;http://sayali:50030/jobdetails.jsp?jobid=job_200811111724_0014&lt;/a&gt;&lt;br&gt;08/11/11 23:21:00 ERROR streaming.StreamJob: Job not Successful!
&lt;br&gt;08/11/11 23:21:00 INFO streaming.StreamJob: killJob...
&lt;br&gt;Streaming Job Failed!
&lt;br&gt;&lt;br&gt;Could someone please help me with any ideas or pointers?
&lt;br&gt;&lt;br&gt;regards
&lt;br&gt;Amit
&lt;br&gt;&lt;br&gt;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Hadoop-Streaming----running-a-jar-file-tp20445877p20445877.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-20192214</id>
	<title>How does an offline Datanode come back up ?</title>
	<published>2008-10-27T10:22:46Z</published>
	<updated>2008-10-27T10:22:46Z</updated>
	<author>
		<name>wmitchell</name>
	</author>
	<content type="html">Hi All,
&lt;br&gt;&lt;br&gt;I've been working through Michael Noll's multi-node cluster setup example (Running_Hadoop_On_Ubuntu_Linux) for Hadoop, and I have a working setup. On my slave machine, which was running a datanode, I then killed the datanode process to simulate a failure on that machine. I had assumed that the namenode would poll its datanodes and attempt to bring up any node that goes down. But on my slave machine the datanode process is still down (I've checked with jps).
&lt;br&gt;&lt;br&gt;Obviously I'm missing something! Does Hadoop look after its datanodes? Is there a config setting that I may have missed? Do I need to create some sort of external tool to poll the nodes and attempt to bring up any that have gone down?
&lt;br&gt;&lt;br&gt;Thanks
&lt;br&gt;Will
&lt;br&gt;&amp;nbsp;</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/How-does-an-offline-Datanode-come-back-up---tp20192214p20192214.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19994780</id>
	<title>Hadoop with image processing</title>
	<published>2008-10-15T07:31:54Z</published>
	<updated>2008-10-15T07:31:54Z</updated>
	<author>
		<name>mrasitozdas</name>
	</author>
	<content type="html">!!! MEMBERS OF core-user@hadoop.apache.org, DON'T READ THIS
&lt;br&gt;----------------
&lt;br&gt;&lt;br&gt;Hi to all, I started to work on a hadoop-based project.
&lt;br&gt;In our application, there is a huge number of images with a regular pattern, differing in 4 parts/blocks.
&lt;br&gt;The system takes an image as input and looks for a similar image, checking whether all 4 parts match.
&lt;br&gt;(The system finds all the matches, even after finding one.)
&lt;br&gt;The parts are independent, so the result for each part is computed separately; these results are
&lt;br&gt;printed on the screen and then an average matching percentage is calculated from them.
&lt;br&gt;&lt;br&gt;(I can write more detailed information if needed)
&lt;br&gt;&lt;br&gt;Could you suggest a structure, or any ideas for getting a better result?
&lt;br&gt;&lt;br&gt;The images can be divided into 4 parts, I see that. But the folder structure of the images is important, and
&lt;br&gt;I have no idea how to handle that. The images are kept in a DB (this can be changed, if a folder structure is better).
&lt;br&gt;Would two stages of map-reduce operations be better? First, one map-reduce for each image,
&lt;br&gt;then a second map-reduce for every part of one image.
&lt;br&gt;But as far as I know, the slowest computation slows down the whole operation.
&lt;br&gt;&lt;br&gt;Thanks in advance..</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Hadoop-with-image-processing-tp19994780p19994780.html" />
</entry>

<entry>
	<id>tag:old.nabble.com,2006:post-19842438</id>
	<title>Searching Lucene Index built using Hadoop</title>
	<published>2008-10-06T10:26:43Z</published>
	<updated>2008-10-06T10:26:43Z</updated>
	<author>
		<name>Saranath</name>
	</author>
	<content type="html">I'm trying to index a large dataset using Hadoop+Lucene. I used the example under hadoop/trunk/src/contrib/index/ for indexing. I'm unable to find a way to search the index that was successfully built.
&lt;br&gt;&lt;br&gt;I tried copying the index shards over to one machine and merging them using IndexWriter.addIndexesNoOptimize().
&lt;br&gt;&lt;br&gt;I would like to hear your input on the best way to index and search large datasets.
&lt;br&gt;&lt;br&gt;Thanks,
&lt;br&gt;Saranath</content>
	<link rel="alternate" type="text/html" href="http://old.nabble.com/Searching-Lucene-Index-built-using-Hadoop-tp19842438p19842438.html" />
</entry>

</feed>
