Could only be replicated to 0 nodes, instead of 1
Hi.

I'm testing Hadoop in our lab, and started getting the following message when trying to copy a file:

Could only be replicated to 0 nodes, instead of 1

I have the following setup:

* 3 machines, 2 of them with only 80GB of space, and 1 with 1.5GB
* Two clients are copying files all the time (one of them is the 1.5GB machine)
* Replication is set to 2
* I let the 2 smaller machines run out of space, to test the behavior

Now one of the clients (the one located on the 1.5GB machine) works fine, while the other, external one is unable to copy and displays the error and the exception below.

Any idea whether this is expected in my scenario? Or how it can be solved?

Thanks in advance.

09/05/21 10:51:03 WARN dfs.DFSClient: NotReplicatedYetException sleeping /test/test.bin retries left 1
09/05/21 10:51:06 WARN dfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /test/test.bin could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1123)
    at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
    at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)

    at org.apache.hadoop.ipc.Client.call(Client.java:716)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2450)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2333)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922)

09/05/21 10:51:06 WARN dfs.DFSClient: Error Recovery for block null bad datanode[0]
java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2153)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1745)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1899)
|
|
Re: Could only be replicated to 0 nodes, instead of 1
Hi,

I have two suggestions:

i) Choose the right version (Hadoop 0.18 is good).
ii) Replication should be 3, as you have 3 nodes. (Indirectly, check that your configuration is correct!)

I am only suggesting this, as I am also new to Hadoop.

Ashish Pareek
|
|
Re: Could only be replicated to 0 nodes, instead of 1
It does not appear that any datanodes have connected to your namenode.

On the datanode machines, look in the hadoop logs directory at the datanode log files. There should be some information there that helps you diagnose the problem.

Chapter 4 of my book provides some detail on working with this problem.

--
Alpha Chapters of my book on Hadoop are available http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals
|
|
Re: Could only be replicated to 0 nodes, instead of 1
Hi.

> i) Choose a right version (Hadoop 0.18 is good)

I'm using 0.18.3.

> ii) Replication should be 3 as you have 3 nodes. (Indirectly see to it that your configuration is correct!)

Actually, I'm testing 2x replication on any number of DNs, to see how reliable it is.
|
|
Re: Could only be replicated to 0 nodes, instead of 1
Hi.

2009/5/21 jason hadoop <jason.hadoop@...>

> It does not appear that any datanodes have connected to your namenode.
> on the datanode machines look in the hadoop logs directory at the datanode log files.
> There should be some information there that helps you diagnose the problem.
>
> chapter 4 of my book provides some detail on work with this problem

The NameNode web panel shows that all DataNodes are connected. Also, as I said above, one client (the one located on the 1.5GB DataNode) is working OK.

Anything else that I can check?

Regards.
|
|
Re: Could only be replicated to 0 nodes, instead of 1
I think you should file a jira on this. Most likely this is what is happening:

* Two out of 3 DNs cannot take any more blocks.
* While picking nodes for a new block, the NN mostly skips the third DN as well, since '# active writes' on it is larger than '2 * avg'.
* Even if only one other block is being written on the 3rd DN, that is still greater than (2 * 1/3).

To test this: if you write just one block to an idle cluster, it should succeed.

Writing from the client on the 3rd DN succeeds since the local node is always favored.

This particular problem is not that severe on a large cluster, but HDFS should do the sensible thing.

Raghu.
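The rule Raghu describes can be modelled in a few lines. This is only a sketch of the explanation in this post, not the HDFS source: the function names, the node names and the (has_space, active_writes) tuples are made up for illustration, and the real NameNode considers more than these two inputs.

```python
def is_overloaded(active_writes, all_active_writes):
    """True if a node's active writes exceed twice the cluster-wide average."""
    avg = sum(all_active_writes) / len(all_active_writes)
    return active_writes > 2 * avg


def pick_targets(nodes):
    """Return the names of nodes that have space and pass the load check.

    nodes maps a (hypothetical) node name to (has_space, active_writes).
    """
    loads = [writes for _, writes in nodes.values()]
    return [
        name
        for name, (has_space, writes) in nodes.items()
        if has_space and not is_overloaded(writes, loads)
    ]


# Two full nodes, and one write already in flight on the only node with space:
# avg = 1/3, threshold = 2/3, and 1 > 2/3, so dn3 is rejected too.
busy = {"dn1": (False, 0), "dn2": (False, 0), "dn3": (True, 1)}
print(pick_targets(busy))   # []

# Same cluster, but idle: 0 > 0 is false, so dn3 is accepted.
idle = {"dn1": (False, 0), "dn2": (False, 0), "dn3": (True, 0)}
print(pick_targets(idle))   # ['dn3']
```

The sketch leaves out the "local node is always favored" case, which is why the client running on the third DN is not affected in the scenario above.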
|
|
Re: Could only be replicated to 0 nodes, instead of 1
On May 21, 2009, at 2:01 PM, Raghu Angadi wrote:

> This particular problem is not that severe on a large cluster but HDFS should do the sensible thing.

Hey Raghu,

If this analysis is right, I would add that it can happen even on large clusters! I've seen this error on our cluster when we're very full (>97%) and very few nodes have any empty space. This usually happens because we have two very large nodes (10x bigger than the rest of the cluster), and HDFS tends to distribute writes randomly, meaning the smaller nodes fill up quickly, until the balancer can catch up.

Brian
|
|
Re: Could only be replicated to 0 nodes, instead of 1
Brian Bockelman wrote:

> If this analysis is right, I would add it can happen even on large clusters! I've seen this error at our cluster when we're very full (>97%) and very few nodes have any empty space. This usually happens because we have two very large nodes (10x bigger than the rest of the cluster), and HDFS tends to distribute writes randomly -- meaning the smaller nodes fill up quickly, until the balancer can catch up.

Yes. This would bite whenever a large portion of nodes cannot accept blocks. In general, it can happen whenever less than half the nodes have any space left.

Raghu.
|
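One rough way to see where "less than half" comes from, under a simplifying assumption that is not stated in the thread: all in-flight writes are spread evenly over the nodes that still have space. With T writes, n nodes in total and k writable nodes, each writable node carries T/k writes, while the '2 * avg' threshold is 2T/n, so T/k > 2T/n exactly when k < n/2. The function name below is hypothetical.

```python
def writable_nodes_overloaded(total_nodes, writable_nodes, total_writes):
    """True if evenly loaded writable nodes exceed the '2 * avg' threshold.

    Assumes every in-flight write sits on a writable node and the writes
    are spread evenly; real clusters will be lumpier than this.
    """
    avg = total_writes / total_nodes                # average over ALL nodes
    per_writable = total_writes / writable_nodes    # load on each node with space
    return per_writable > 2 * avg


print(writable_nodes_overloaded(10, 4, 8))    # True: fewer than half have space
print(writable_nodes_overloaded(10, 5, 10))   # False: exactly half
print(writable_nodes_overloaded(10, 6, 12))   # False: more than half
```

In practice the load is checked per node, so some lightly loaded nodes may still be accepted; the sketch only shows why the average case tips over at the halfway point.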
|
Re: Could only be replicated to 0 nodes, instead of 1
Hi.

> I think you should file a jira on this. Most likely this is what is happening :

Will do. This goes to the DFS section, correct?

> * two out of 3 dns can not take anymore blocks.
> * While picking nodes for a new block, NN mostly skips the third dn as well since '# active writes' on it is larger than '2 * avg'.
> * Even if there is one other block is being written on the 3rd, it is still greater than (2 * 1/3).

Frankly, I'm not familiar enough with Hadoop's inner workings to understand this completely, but from what I gather, the NN doesn't like the 3rd DN because there are too many blocks on it compared to the other servers?

> To test this, if you write just one block to an idle cluster it should succeed.

What exactly is an "idle cluster"? One that nothing is being written to (including the 3rd DN)?

> Writing from the client on the 3rd dn succeeds since local node is always favored.

Makes sense.

> This particular problem is not that severe on a large cluster but HDFS should do the sensible thing.

Yes, I agree that this is a non-standard situation, but IMHO the best course of action would be to write anyway and throw a warning. There is one already that appears when there is not enough space for replication, and it explains the matter quite well. So a similar one would be great.
|
|
Re: Could only be replicated to 0 nodes, instead of 1
Hi.

> This usually happens because we have two very large nodes (10x bigger than the rest of the cluster), and HDFS tends to distribute writes randomly -- meaning the smaller nodes fill up quickly, until the balancer can catch up.

A bit off topic: do you run the balancer manually? Or do you have some scheduler that does it?
|
|
Re: Could only be replicated to 0 nodes, instead of 1
On May 21, 2009, at 3:10 PM, Stas Oskin wrote:

> A bit of topic, do you ran the balancer manually? Or you have some scheduler that does it?

crontab does it for us, once an hour. We're always importing data, so the cluster is always out of balance. If the previous balancer didn't exit, the new one will simply exit.

The real trick has been to make sure the balancer doesn't get stuck: a Nagios plugin makes sure that stdout has been printed to in the last hour or so; otherwise it kills the running balancer. Stuck balancers have been an issue in the past.

Brian
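A watchdog along the lines Brian describes might look roughly like the sketch below. It is not his Nagios plugin. It assumes the balancer's stdout is redirected to a log file and that its process ID is written to a pid file; both paths, the function names and the one-hour default are hypothetical.

```python
import os
import signal
import time


def is_stale(log_path, max_age_seconds=3600, now=None):
    """True if the log file has not been modified within max_age_seconds."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(log_path) > max_age_seconds


def kill_if_stuck(log_path, pid_path, max_age_seconds=3600):
    """Send SIGTERM to the balancer if its output has gone quiet.

    Returns True if a signal was sent, False if the log is still fresh.
    """
    if not is_stale(log_path, max_age_seconds):
        return False
    with open(pid_path) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGTERM)  # assumes the pid file is current
    return True
```

Run from cron or a monitoring check, kill_if_stuck("/path/to/balancer.out", "/path/to/balancer.pid") would leave a healthy balancer alone and stop one that has printed nothing for an hour, so that the next hourly run can start cleanly.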
|
|
Re: Could only be replicated to 0 nodes, instead of 1
> The real trick has been to make sure the balancer doesn't get stuck -- a Nagios plugin makes sure that the stdout has been printed to in the last hour or so, otherwise it kills the running balancer. Stuck balancers have been an issue in the past.

Thanks for the advice.
|
|
Re: Could only be replicated to 0 nodes, instead of 1
> I think you should file a jira on this. Most likely this is what is happening :

Here it is, hope it's OK:

https://issues.apache.org/jira/browse/HADOOP-5886
|
|
Re: Could only be replicated to 0 nodes, instead of 1
Stas Oskin wrote:

> Here it is - hope it's ok:
>
> https://issues.apache.org/jira/browse/HADOOP-5886

Looks good. I will add my earlier post as a comment. You could update the jira with any more tests.

Next time, it would be better to include larger stack traces, logs, etc. in subsequent comments rather than in the description.

Thanks,
Raghu.
|
|
Re: Could only be replicated to 0 nodes, instead of 1
> Next time, it would be better include larger stack traces, logs etc in subsequent comments rather than in the description.

Will do, thanks for the tip.
|
|
Re: Could only be replicated to 0 nodes, instead of 1
Hi.

I wonder if there was any progress with this issue?

Regards.