|
View:
New views
15 Messages
—
Rating Filter:
Alert me
|
|
|
HBase Exceptions on version 0.20.1Hello Everyone, We are experiencing many HBase exceptions while running the attached test program using HBase 0.20.1 and Hadoop 0.20.1 (r810220) releases on a Hadoop cluster with 14 data nodes. The HBase operational performance degrades quickly once certain error conditions are encountered especially if there are issues with the META region. Has anyone else seen these exceptions listed below? We ran the test with 3 region servers and then with 14 region servers. In both cases, we still got exceptions. In each test run, we shutdown HBase, removed the main /hbase folder, restarted HBase and then ran the test program. The test program uses a table named "fc_test". The program is started by the following command: /opt/hbase/bin/hbase org.apache.hadoop.hbase.util.TableTest 30 4000000 The attached HBase test program starts 30 threads. Each thread loops, inserting randomly generated keys along with a fixed data pattern of 4,000,000 bytes. After running the test for 2-12 hours, the table scanning of "fc_test" is broken and only a limited number of put operations are still succeeding. EXCEPTION LIST ------------------------------ 2009-10-15 03:28:39,005 DEBUG org.apache.hadoop.hbase.regionserver.Store: Major compaction triggered on store data; time since last major compaction 89677908ms 2009-10-15 03:28:39,012 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 4 file(s) into /hbase/fc_test/compaction.dir/1073627844, seqid=396374 2009-10-15 03:28:39,245 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-6054029382496843671_324379 from any node: java.io.IOException: No live nodes contain current block ------------------------------ 2009-10-15 07:41:43,426 DEBUG org.apache.zookeeper.ClientCnxn: Got ping response for sessionid:0x2244f6194a5000d after 1ms 2009-10-15 07:41:53,993 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x2244f6194a5000d to sun.nio.ch.SelectionKeyImpl@139f5ca <mailto:sun.nio.ch.SelectionKeyImpl@139f5ca> java.io.IOException: TIMED OUT at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) 2009-10-15 07:41:54,344 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Disconnected, type: None, path: null 2009-10-15 07:41:55,084 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server hfs-029014/10.1.29.14:2181 2009-10-15 07:41:55,085 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.1.29.19:39421 remote=hfs-029014/10.1.29.14:2181] ------------------------------ 2009-10-15 08:37:40,982 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020 caught: java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source) at sun.nio.ch.SocketChannelImpl.write(Unknown Source) at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1125) at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:615) at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:679) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:943) ------------------------------ 2009-10-15 07:57:28,025 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=4.652237MB (4878224), Free=987.6603MB (1035636848), Max=992.3125MB (1040515072), Counts: Blocks=0, Access=51003, Hit=0, Miss=51003, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%, Evicted/Run=NaN 2009-10-15 07:57:28,189 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException 2009-10-15 07:57:28,190 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_7166000434569842394_332632 2009-10-15 07:57:28,306 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 5 on 60020 took 2297ms appending an edit to hlog; editcount=146 ----------------------------- 2009-10-15 09:10:00,559 INFO org.apache.zookeeper.ClientCnxn: Exception while closing send thread for session 0x1244f6194780008 : Read error rc = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] 2009-10-15 09:10:00,739 INFO org.apache.zookeeper.ClientCnxn: Disconnecting ClientCnxn for session: 0x1244f6194780008 2009-10-15 09:10:00,739 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down ------------------------------ 2009-10-13 12:30:58,207 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to read master address from ZooKeeper. Retrying. Error was: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:331) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:240) at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1338) at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1370) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:427) at java.lang.Thread.run(Unknown Source) Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:892) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:327) ------------------------------ 2009-10-14 15:12:42,664 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1 org.apache.hadoop.hbase.Leases$LeaseStillHeldException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) at java.lang.reflect.Constructor.newInstance(Unknown Source) at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94) at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48) at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:571) at java.lang.Thread.run(Unknown Source) ------------------------------ 2009-10-14 15:12:43,566 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x1244f619478000d to sun.nio.ch.SelectionKeyImpl@15e32c4 <mailto:sun.nio.ch.SelectionKeyImpl@15e32c4> java.io.IOException: Session Expired at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589) at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) 2009-10-14 15:12:43,567 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Expired, type: None, path: null 2009-10-14 15:12:43,568 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expired ------------------------------ 2009-10-14 15:12:51,563 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.io.IOException: Cannot append; log is closed at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:584) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1445) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1244) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1208) at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1831) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) ------------------------------ 2009-10-14 15:13:51,852 INFO org.apache.hadoop.fs.FileSystem: Could not cancel cleanup thread, though no FileSystems are open 2009-10-14 15:13:51,864 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Shutting down HRegionServer: file system not available java.io.IOException: File system is not available at org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:125) at org.apache.hadoop.hbase.regionserver.HRegionServer.checkFileSystem(HRegionServer.java:901) at org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:849) at org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:832) at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1839) at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) Caused by: java.io.IOException: Call to hfs-029010/10.1.29.10:8020 failed on local exception: java.nio.channels.ClosedByInterruptException at org.apache.hadoop.ipc.Client.wrapException(Client.java:774) at org.apache.hadoop.ipc.Client.call(Client.java:742) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) at $Proxy1.getFileInfo(Unknown Source) at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at $Proxy1.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643) at org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:114) ... 9 more Caused by: java.nio.channels.ClosedByInterruptException at java.nio.channels.spi.AbstractInterruptibleChannel.end(Unknown Source) at sun.nio.ch.SocketChannelImpl.write(Unknown Source) at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) at java.io.BufferedOutputStream.flushBuffer(Unknown Source) at java.io.BufferedOutputStream.flush(Unknown Source) at java.io.DataOutputStream.flush(Unknown Source) at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:480) at org.apache.hadoop.ipc.Client.call(Client.java:720) ------------------------------ PREVIOUS EXCEPTIONS FROM OTHER TESTS INCLUDE 2009-10-13 00:00:00,508 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row out of range for ... .... at org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1527) at org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1559) at org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1627) at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2277) at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1768) at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) ------------------------------ 2009-10-12 10:06:57,809 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020, call put([B@1f5b92a, [Lorg.apache.hadoop.hbase.client.Put;@14f8e05) from 10.1.29.10:36017: error: java.io.IOException: Server not running, aborting java.io.IOException: Server not running, aborting at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299) at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1806) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) 2009-10-12 10:06:57,812 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=12, stores=24, storefiles=15, storefileIndexSize=8, memstoreSize=378, usedHeap=441, maxHeap=992, blockCacheSize=4878224, blockCacheFree=1035636848, blockCacheCount=0, blockCacheHitRatio=0 2009-10-12 10:06:57,813 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020, call put([B@1f52982, [Lorg.apache.hadoop.hbase.client.Put;@78283f) from 10.1.29.10:36017: error: java.io.IOException: Server not running, aborting java.io.IOException: Server not running, aborting at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299) at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1806) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) ------------------------------ . |
|
|
Re: HBase Exceptions on version 0.20.1Most of these exceptions look related to overloaded servers (GC pauses
causing timeouts, high IO wait tripping up the datanodes, etc). Have you turned on GC logging? Also, are you swapping on these nodes? Check out the performance tuning page here: http://wiki.apache.org/hadoop/PerformanceTuning The WrongRegionException at the end could be a fault but it's hard to know without seeing the entire context and knowing what the cluster was up to at that point. Performance can degrade as the JVMs fill up, get more and more fragmented, and the GC gets slower. Also, you are inserting 4MB values? Those are fairly large, at the upper-end of what you would want to put into HBase. Is this your actual use case? At the least you'll want to increase your region size (otherwise you're going to have at most 64 rows per region, often less), but also consider if HBase is the right place to store 4MB values. Hope that helps. JG elsif wrote: > Hello Everyone, > > We are experiencing many HBase exceptions while running the attached > test program using > HBase 0.20.1 and Hadoop 0.20.1 (r810220) releases on a Hadoop cluster > with 14 data nodes. > > The HBase operational performance degrades quickly once certain error > conditions are > encountered especially if there are issues with the META region. > > Has anyone else seen these exceptions listed below? > > We ran the test with 3 region servers and then with 14 region servers. > In both cases, we still got exceptions. > > In each test run, we shutdown HBase, removed the main /hbase > folder, restarted HBase and then ran the test program. > The test program uses a table named "fc_test". The program is started by > the following command: > > /opt/hbase/bin/hbase org.apache.hadoop.hbase.util.TableTest 30 4000000 > > The attached HBase test program starts 30 threads. Each thread loops, > inserting randomly generated keys > along with a fixed data pattern of 4,000,000 bytes. > > After running the test for 2-12 hours, the table scanning of "fc_test" > is broken and only a limited number of put > operations are still succeeding. > > > > EXCEPTION LIST > > ------------------------------ > 2009-10-15 03:28:39,005 DEBUG > org.apache.hadoop.hbase.regionserver.Store: Major compaction triggered > on store data; time since last major compaction 89677908ms > 2009-10-15 03:28:39,012 DEBUG > org.apache.hadoop.hbase.regionserver.Store: Started compaction of 4 > file(s) into /hbase/fc_test/compaction.dir/1073627844, seqid=396374 > 2009-10-15 03:28:39,245 INFO org.apache.hadoop.hdfs.DFSClient: Could not > obtain block blk_-6054029382496843671_324379 from any node: > java.io.IOException: No live nodes contain current block > ------------------------------ > 2009-10-15 07:41:43,426 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > response for sessionid:0x2244f6194a5000d after 1ms > 2009-10-15 07:41:53,993 WARN org.apache.zookeeper.ClientCnxn: Exception > closing session 0x2244f6194a5000d to sun.nio.ch.SelectionKeyImpl@139f5ca > <mailto:sun.nio.ch.SelectionKeyImpl@139f5ca> > java.io.IOException: TIMED OUT > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) > 2009-10-15 07:41:54,344 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, > state: Disconnected, type: None, path: null > 2009-10-15 07:41:55,084 INFO org.apache.zookeeper.ClientCnxn: Attempting > connection to server hfs-029014/10.1.29.14:2181 > 2009-10-15 07:41:55,085 INFO org.apache.zookeeper.ClientCnxn: Priming > connection to java.nio.channels.SocketChannel[connected > local=/10.1.29.19:39421 remote=hfs-029014/10.1.29.14:2181] > ------------------------------ > 2009-10-15 08:37:40,982 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server handler 6 on 60020 caught: java.nio.channels.ClosedChannelException > at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source) > at sun.nio.ch.SocketChannelImpl.write(Unknown Source) > at > org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1125) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:615) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:679) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:943) > ------------------------------ > 2009-10-15 07:57:28,025 DEBUG > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: > Total=4.652237MB (4878224), Free=987.6603MB (1035636848), Max=992.3125MB > (1040515072), Counts: Blocks=0, Access=51003, Hit=0, Miss=51003, > Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%, > Evicted/Run=NaN > 2009-10-15 07:57:28,189 INFO org.apache.hadoop.hdfs.DFSClient: Exception > in createBlockOutputStream java.io.EOFException > 2009-10-15 07:57:28,190 INFO org.apache.hadoop.hdfs.DFSClient: > Abandoning block blk_7166000434569842394_332632 > 2009-10-15 07:57:28,306 WARN org.apache.hadoop.hbase.regionserver.HLog: > IPC Server handler 5 on 60020 took 2297ms appending an edit to hlog; > editcount=146 > ----------------------------- > 2009-10-15 09:10:00,559 INFO org.apache.zookeeper.ClientCnxn: Exception > while closing send thread for session 0x1244f6194780008 : Read error rc > = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] > 2009-10-15 09:10:00,739 INFO org.apache.zookeeper.ClientCnxn: > Disconnecting ClientCnxn for session: 0x1244f6194780008 > 2009-10-15 09:10:00,739 INFO org.apache.zookeeper.ClientCnxn: > EventThread shut down > ------------------------------ > 2009-10-13 12:30:58,207 WARN > org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to read > master address from ZooKeeper. Retrying. Error was: > java.io.IOException: > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for /hbase/master > at > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:331) > at > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:240) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1338) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1370) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:427) > at java.lang.Thread.run(Unknown Source) > Caused by: org.apache.zookeeper.KeeperException$NoNodeException: > KeeperErrorCode = NoNode for /hbase/master > at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:892) > at > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:327) > ------------------------------ > 2009-10-14 15:12:42,664 WARN > org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1 > org.apache.hadoop.hbase.Leases$LeaseStillHeldException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) > at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown > Source) > at java.lang.reflect.Constructor.newInstance(Unknown Source) > at > org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94) > at > org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48) > at > org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:571) > at java.lang.Thread.run(Unknown Source) > ------------------------------ > 2009-10-14 15:12:43,566 WARN org.apache.zookeeper.ClientCnxn: Exception > closing session 0x1244f619478000d to sun.nio.ch.SelectionKeyImpl@15e32c4 > <mailto:sun.nio.ch.SelectionKeyImpl@15e32c4> > java.io.IOException: Session Expired > at > org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589) > at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) > 2009-10-14 15:12:43,567 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, > state: Expired, type: None, path: null > 2009-10-14 15:12:43,568 ERROR > org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session > expired > ------------------------------ > 2009-10-14 15:12:51,563 ERROR > org.apache.hadoop.hbase.regionserver.HRegionServer: > java.io.IOException: Cannot append; log is closed > at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:584) > at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1445) > at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1244) > at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1208) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1831) > at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > ------------------------------ > 2009-10-14 15:13:51,852 INFO org.apache.hadoop.fs.FileSystem: Could not > cancel cleanup thread, though no FileSystems are open > 2009-10-14 15:13:51,864 FATAL > org.apache.hadoop.hbase.regionserver.HRegionServer: Shutting down > HRegionServer: file system not available > java.io.IOException: File system is not available > at > org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:125) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.checkFileSystem(HRegionServer.java:901) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:849) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:832) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1839) > at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > Caused by: java.io.IOException: Call to hfs-029010/10.1.29.10:8020 > failed on local exception: java.nio.channels.ClosedByInterruptException > at org.apache.hadoop.ipc.Client.wrapException(Client.java:774) > at org.apache.hadoop.ipc.Client.call(Client.java:742) > at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) > at $Proxy1.getFileInfo(Unknown Source) > at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) > at $Proxy1.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643) > at > org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:114) > ... 9 more > Caused by: java.nio.channels.ClosedByInterruptException > at java.nio.channels.spi.AbstractInterruptibleChannel.end(Unknown > Source) > at sun.nio.ch.SocketChannelImpl.write(Unknown Source) > at > org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) > at java.io.BufferedOutputStream.flushBuffer(Unknown Source) > at java.io.BufferedOutputStream.flush(Unknown Source) > at java.io.DataOutputStream.flush(Unknown Source) > at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:480) > at org.apache.hadoop.ipc.Client.call(Client.java:720) > ------------------------------ > > PREVIOUS EXCEPTIONS FROM OTHER TESTS INCLUDE > > 2009-10-13 00:00:00,508 ERROR > org.apache.hadoop.hbase.regionserver.HRegionServer: > org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row > out of range for ... > .... > at > org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1527) > at > org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1559) > at > org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1627) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2277) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1768) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > ------------------------------ > 2009-10-12 10:06:57,809 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server handler 7 on 60020, call put([B@1f5b92a, > [Lorg.apache.hadoop.hbase.client.Put;@14f8e05) from 10.1.29.10:36017: > error: java.io.IOException: Server not running, aborting > java.io.IOException: Server not running, aborting > at > org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1806) > at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > 2009-10-12 10:06:57,812 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: > request=0.0, regions=12, stores=24, storefiles=15, storefileIndexSize=8, > memstoreSize=378, usedHeap=441, maxHeap=992, blockCacheSize=4878224, > blockCacheFree=1035636848, blockCacheCount=0, blockCacheHitRatio=0 > 2009-10-12 10:06:57,813 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server handler 3 on 60020, call put([B@1f52982, > [Lorg.apache.hadoop.hbase.client.Put;@78283f) from 10.1.29.10:36017: > error: java.io.IOException: Server not running, aborting > java.io.IOException: Server not running, aborting > at > org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1806) > at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > ------------------------------ > > > > > > . > |
|
|
Re: HBase Exceptions on version 0.20.1The cpu load on each of the nodes never goes above 1 and very little if
any swap is in use. The 4MB is our boundary case - the actual data will always be smaller. This was just a encapsulated test that reproduces our issue. I will re-test with the CMS collector and logging enabled and reply back. Thanks, elsif > Most of these exceptions look related to overloaded servers (GC pauses > causing timeouts, high IO wait tripping up the datanodes, etc). Have > you turned on GC logging? Also, are you swapping on these nodes? > > Check out the performance tuning page here: > > http://wiki.apache.org/hadoop/PerformanceTuning > > The WrongRegionException at the end could be a fault but it's hard to > know without seeing the entire context and knowing what the cluster > was up to at that point. > > Performance can degrade as the JVMs fill up, get more and more > fragmented, and the GC gets slower. > > Also, you are inserting 4MB values? Those are fairly large, at the > upper-end of what you would want to put into HBase. Is this your > actual use case? At the least you'll want to increase your region > size (otherwise you're going to have at most 64 rows per region, often > less), but also consider if HBase is the right place to store 4MB values. > > Hope that helps. > > JG > |
|
|
Re: HBase Exceptions on version 0.20.1The reason JG points to load as being a problem as all signs point to it: This is usually the culprit behind DFS "no live block" errors -- the namenode is too busy and/or falling behind, or the datanodes are falling behind, or actually failing. Also, in the log snippets you provide, HBase is complaining about writes to DFS (for the WAL) taking in excess of 2 seconds. Also highly indicative of load, write load. Shortly after this, Zookeeper sessions begin expiring, which is also usually indicative of overloading -- heartbeats miss their deadline.
When I see these signs on my test clusters, I/O wait is generally in excess of 40%. If your total CPU load is really just a few % (user + system + iowait), then I'd suggest you look at the storage layer. Is there anything in the datanode logs that seems like it might be relevant? What about the network? Gigabit? Any potential sources of contention? Are you tracking network utilization metrics during the test? Also, you might consider using Ganglia to monitor and correlate system metrics and HBase and HDFS metrics during your testing, if you are not doing this already. - Andy ________________________________ From: elsif <elsif.then@...> To: hbase-user@... Sent: Wed, October 21, 2009 7:47:43 AM Subject: Re: HBase Exceptions on version 0.20.1 The cpu load on each of the nodes never goes above 1 and very little if any swap is in use. The 4MB is our boundary case - the actual data will always be smaller. This was just a encapsulated test that reproduces our issue. I will re-test with the CMS collector and logging enabled and reply back. Thanks, elsif > Most of these exceptions look related to overloaded servers (GC pauses > causing timeouts, high IO wait tripping up the datanodes, etc). Have > you turned on GC logging? Also, are you swapping on these nodes? > > Check out the performance tuning page here: > > http://wiki.apache.org/hadoop/PerformanceTuning > > The WrongRegionException at the end could be a fault but it's hard to > know without seeing the entire context and knowing what the cluster > was up to at that point. > > Performance can degrade as the JVMs fill up, get more and more > fragmented, and the GC gets slower. > > Also, you are inserting 4MB values? Those are fairly large, at the > upper-end of what you would want to put into HBase. Is this your > actual use case? At the least you'll want to increase your region > size (otherwise you're going to have at most 64 rows per region, often > less), but also consider if HBase is the right place to store 4MB values. > > Hope that helps. > > JG > |
|
|
Re: HBase Exceptions on version 0.20.1While running the test on this cluster of 14 servers, the highest loads I see are 3.68 (0.0% wa) on the master node and 2.65 (3.4% wa) on the node serving the .META. region. All the machines are on a single gigabit switch dedicated to the cluster. The highest throughput between nodes has been 21.4MBps Rx on the node hosting the .META. region. There are 239 "Block blk_-xxx is not valid errors", 522 "BlockInfo not found in volumeMap" errors, and 208 "BlockAlreadyExistsException" found in the hadoop logs over 12 hours of running the test. I understand that I am loading the cluster - that is the point of the test, but I don't think that this should result in data loss. Failed inserts at the client level I can handle, but loss of data that was previously thought to be stored in hbase is a major issue. Are there plans to make hbase more resilient to load based failures? Regards, elsif Andrew Purtell wrote: > The reason JG points to load as being a problem as all signs point to it: This is usually the culprit behind DFS "no live block" errors -- the namenode is too busy and/or falling behind, or the datanodes are falling behind, or actually failing. Also, in the log snippets you provide, HBase is complaining about writes to DFS (for the WAL) taking in excess of 2 seconds. Also highly indicative of load, write load. Shortly after this, Zookeeper sessions begin expiring, which is also usually indicative of overloading -- heartbeats miss their deadline. > > When I see these signs on my test clusters, I/O wait is generally in excess of 40%. > > If your total CPU load is really just a few % (user + system + iowait), then I'd suggest you look at the storage layer. Is there anything in the datanode logs that seems like it might be relevant? > > What about the network? Gigabit? Any potential sources of contention? Are you tracking network utilization metrics during the test? > > Also, you might consider using Ganglia to monitor and correlate system metrics and HBase and HDFS metrics during your testing, if you are not doing this already. > > - Andy > > |
|
|
Re: HBase Exceptions on version 0.20.1On Wed, Oct 21, 2009 at 8:16 AM, elsif <elsif.then@...> wrote:
> > There are 239 "Block blk_-xxx is not valid errors", 522 "BlockInfo not > found in volumeMap" errors, and 208 "BlockAlreadyExistsException" found > in the hadoop logs over 12 hours of running the test. > Above are from application-level (hbase) or datanode logs? If you trace any of the above -- follow the block name -- in the NN are the blocks lost or do you see replicas taking over or recoveries triggered? > > I understand that I am loading the cluster - that is the point of the > test, but I don't think that this should result in data loss. Failed > inserts at the client level I can handle, but loss of data that was > previously thought to be stored in hbase is a major issue. Are there > plans to make hbase more resilient to load based failures? > > It looks like there'll be data loss going by a few of the exceptions you provide originally. Here's a couple of comments: "No live nodes contain current block" Usually we see this if the client-side hadoop has not been patched with hdfs-127/hadoop-4681. Your test program doesn't seem to have come across. Mind attaching it to an issue so I can try it? Going by the way you started your test program, you should have the hbase patched hadoop first in your CLASSPATH so you should be ok but maybe there is something about your environmnent frustrating hbase's using a patched hadoop? "java.io.IOException: TIMED OUT" Your regionserver or master timed out its zk session. GC or swapping or disk used by zk is under heavy i/o loading? "ClosedChannelException" Probably symptom of a RS shutdown because of events such as above. "Abandoning block..." Did this write to HLog fail? Its just an INFO level log out of DFSClient. "file system not available" What happened before this? Was this just an emission on regionserver shutdown? St.Ack |
|
|
Re: HBase Exceptions on version 0.20.1stack wrote:
> On Wed, Oct 21, 2009 at 8:16 AM, elsif <elsif.then@...> wrote: > > >> There are 239 "Block blk_-xxx is not valid errors", 522 "BlockInfo not >> found in volumeMap" errors, and 208 "BlockAlreadyExistsException" found >> in the hadoop logs over 12 hours of running the test. >> >> > > Above are from application-level (hbase) or datanode logs? If you trace any > of the above -- follow the block name -- in the NN are the blocks lost or do > you see replicas taking over or recoveries triggered? > > > are being requested after they are deleted (HBASE-1894): hdfs:2009-10-21 11:30:55,580 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.1.29.12:50010, storageID=DS-1209147803-10.1.29.12-50010-1651186044136, infoPort=50075, ipcPort=50020):Transmitted block blk_-6090437519946022724_469173 to /10.1.29.20:50010 hdfs:2009-10-21 11:30:59,938 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-6090437519946022724_469173 file /mnt/8d8aa987-017c-4f4c-822c-fe53f4163cee/current/subdir18/blk_-6090437519946022724 hdfs:2009-10-21 11:59:09,509 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.1.29.12:50010, storageID=DS-1209147803-10.1.29.12-50010-1651186044136, infoPort=50075, ipcPort=50020):Got exception while serving blk_-6090437519946022724_469173 to /10.1.29.14: hdfs:java.io.IOException: Block blk_-6090437519946022724_469173 is not valid. >> I understand that I am loading the cluster - that is the point of the >> test, but I don't think that this should result in data loss. Failed >> inserts at the client level I can handle, but loss of data that was >> previously thought to be stored in hbase is a major issue. Are there >> plans to make hbase more resilient to load based failures? >> >> >> > It looks like there'll be data loss going by a few of the exceptions you > provide originally. Here's a couple of comments: > > "No live nodes contain current block" Usually we see this if the > client-side hadoop has not been patched with hdfs-127/hadoop-4681. Your > test program doesn't seem to have come across. Mind attaching it to an > issue so I can try it? Going by the way you started your test program, you > should have the hbase patched hadoop first in your CLASSPATH so you should > be ok but maybe there is something about your environmnent frustrating > hbase's using a patched hadoop? > > this last run, the client was running against the hadoop-0.20.1-core.jar instead of the hadoop-0.20.1-hdfs127-core.jar. I will retest using the hdfs-127 jar. > "java.io.IOException: TIMED OUT" Your regionserver or master timed out its > zk session. GC or swapping or disk used by zk is under heavy i/o loading? > There are five zk servers in the test setup that share hardware with nodes that are also datanodes and regionservers. There are only 8 instances of this error and all of them occurred within 30 seconds of each other on the same node. > "ClosedChannelException" Probably symptom of a RS shutdown because of events > such as above. > > "Abandoning block..." Did this write to HLog fail? Its just an INFO level > log out of DFSClient. > Since these are just INFO level messages, they are not an issue? hbase:2009-10-21 11:40:57,082 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-1365065510928989381_470690 hdfs:2009-10-21 11:40:57,080 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-1365065510928989381_470690 src: /10.1.29.23:56117 dest: /10.1.29.23:50010 hdfs:2009-10-21 11:40:57,081 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-1365065510928989381_470690 received exception org.apache.hadoop.util.Shell$ExitCodeException: du: cannot access `/mnt/eac8b34a-a4a4-495b-8241-43cd4b21c28f/current/blk_-8854288022995097022_462986.meta': No such file or directory hbase:2009-10-21 11:36:12,609 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_4473876596208463127_470012 hdfs:2009-10-21 11:35:44,842 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_4473876596208463127_470012 src: /10.1.29.22:57916 dest: /10.1.29.14:50010 hdfs:2009-10-21 11:35:44,844 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_4473876596208463127_470012 received exception java.io.EOFException hdfs:2009-10-21 11:36:26,105 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_4473876596208463127_470012 src: /10.1.29.14:34537 dest: /10.1.29.21:50010 hdfs:2009-10-21 11:36:26,105 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_4473876596208463127_470012 received exception org.apache.hadoop.util.Shell$ExitCodeException: du: cannot access `/mnt/e9726517-c84c-4326-8859-44f8261a14fa/current/subdir26/subdir62/blk_4151284369367444478': No such file or directory hdfs:2009-10-21 11:36:12,605 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_4473876596208463127_470012 src: /10.1.29.22:34103 dest: /10.1.29.22:50010 hdfs:2009-10-21 11:36:12,609 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_4473876596208463127_470012 2 Exception java.io.EOFException hdfs:2009-10-21 11:36:12,609 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_4473876596208463127_470012 terminating hbase:2009-10-21 06:47:10,189 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6810931018745175038_422166 hdfs:2009-10-21 06:47:10,171 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-6810931018745175038_422166 src: /10.1.29.12:44933 dest: /10.1.29.12:50010 hdfs:2009-10-21 06:47:10,188 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-6810931018745175038_422166 received exception java.io.EOFException hdfs:2009-10-21 06:46:43,362 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-6810931018745175038_422166 src: /10.1.29.12:48908 dest: /10.1.29.14:50010 hdfs:2009-10-21 06:46:43,362 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-6810931018745175038_422166 received exception org.apache.hadoop.util.Shell$ExitCodeException: du: cannot access `/mnt/1a3e57e4-b514-4a85-83b1-ae9d12e3fb7a/current/subdir27/blk_-3769845572103484957_410554.meta': No such file or directory > "file system not available" What happened before this? Was this just an > emission on regionserver shutdown? > I don't have any "file system not available" messages this time around. > St.Ack > > Here is the source for the test program. For my tests I run two instances with the default settings: -- package org.apache.hadoop.hbase.util; import java.io.IOException; import java.io.FileInputStream; import java.lang.Integer; import org.apache.hadoop.hbase.HBaseConfiguration; import org.apache.hadoop.hbase.KeyValue; import org.apache.hadoop.hbase.client.HTable; import org.apache.hadoop.hbase.client.Get; import org.apache.hadoop.hbase.client.Put; import org.apache.hadoop.hbase.client.Result; import org.apache.hadoop.hbase.util.Bytes; /* create 'fc_test', {NAME => 'json', VERSIONS => 1}, {NAME => 'data', VERSIONS =>1} */ public class LoadTest extends Thread { protected int datasize = 4000000; public TableTest(int datasize) { this.datasize = datasize; } public void run() { try { FileInputStream random = new FileInputStream("/dev/urandom"); byte[] key = new byte[32]; byte[] data = new byte[datasize]; HBaseConfiguration hbaseConfiguration = new HBaseConfiguration(); HTable table = new HTable(hbaseConfiguration, "fc_test"); random.read(data, 0, datasize); while (true) { random.read(key, 0, 32); Put update = new Put(Bytes.toBytes(getHexString(key))); update.add(Bytes.toBytes("data"), Bytes.toBytes(""), data); table.put(update); } } catch (Exception e) { System.out.println(e.toString()); } } public static String getHexString(byte[] b) throws Exception { String result = ""; for (int i=0; i < b.length; i++) { result += Integer.toString( ( b[i] & 0xff ) + 0x100, 16).substring( 1 ); } return result; } public static void main(String args[]) throws IOException { Thread t = null; int i; int threads = 30; int datasize = 400000; try { threads = Integer.getInteger(args[0]).intValue(); datasize = Integer.getInteger(args[1]).intValue(); } catch (Exception e) { } System.out.println("Starting " + threads + " threads"); for (i = 0; i< threads; i++) { t = new TableTest(datasize); t.start(); } // wait for last thread try { t.join(); } catch (InterruptedException e) { } } } |
|
|
Re: HBase Exceptions on version 0.20.1> 239 "Block blk_-xxx is not valid errors",
> 522 "BlockInfo not found in volumeMap" errors, > and 208 "BlockAlreadyExistsException" I assume since you say they were found in Hadoop logs that these appeared in the datanode and/or namenode logs. If not, and instead these are from HBase logs, please correct my understanding. It seems to me that your HDFS is sick. That's not particularly helpful, I know, but HBase is a client application of HDFS and depends on its good functioning. Have you talked with anyone on or mailed the logs to hdfs-user@... ? If so, what did they say? > Are there plans to make hbase more resilient to load based failures? Yes, this is definitely something we have done and continue to do. It's hard if you can't trust your filesystem. Once I tried running HBase on top of KFS instead of HDFS. KFS did seem slower as is the conventional wisdom but I had bigger problems... the chunkservers would randomly abort on my x86_64 nodes and even after I gave Sriram system access for gdb stack dumps, there was no clear resolution. On the other hand, if you get it working, it has working sync and append. HDFS won't have a fully working sync until 0.21. YMMV. - Andy ________________________________ From: elsif <elsif.then@...> To: hbase-user@... Sent: Wed, October 21, 2009 8:16:40 AM Subject: Re: HBase Exceptions on version 0.20.1 While running the test on this cluster of 14 servers, the highest loads I see are 3.68 (0.0% wa) on the master node and 2.65 (3.4% wa) on the node serving the .META. region. All the machines are on a single gigabit switch dedicated to the cluster. The highest throughput between nodes has been 21.4MBps Rx on the node hosting the .META. region. There are 239 "Block blk_-xxx is not valid errors", 522 "BlockInfo not found in volumeMap" errors, and 208 "BlockAlreadyExistsException" found in the hadoop logs over 12 hours of running the test. I understand that I am loading the cluster - that is the point of the test, but I don't think that this should result in data loss. Failed inserts at the client level I can handle, but loss of data that was previously thought to be stored in hbase is a major issue. Are there plans to make hbase more resilient to load based failures? Regards, elsif Andrew Purtell wrote: > The reason JG points to load as being a problem as all signs point to it: This is usually the culprit behind DFS "no live block" errors -- the namenode is too busy and/or falling behind, or the datanodes are falling behind, or actually failing. Also, in the log snippets you provide, HBase is complaining about writes to DFS (for the WAL) taking in excess of 2 seconds. Also highly indicative of load, write load. Shortly after this, Zookeeper sessions begin expiring, which is also usually indicative of overloading -- heartbeats miss their deadline. > > When I see these signs on my test clusters, I/O wait is generally in excess of 40%. > > If your total CPU load is really just a few % (user + system + iowait), then I'd suggest you look at the storage layer. Is there anything in the datanode logs that seems like it might be relevant? > > What about the network? Gigabit? Any potential sources of contention? Are you tracking network utilization metrics during the test? > > Also, you might consider using Ganglia to monitor and correlate system metrics and HBase and HDFS metrics during your testing, if you are not doing this already. > > - Andy > > |
|
|
Re: HBase Exceptions on version 0.20.1Please ignore my earlier response. That does look suspiciously like client error.
Can you correlate the block /mnt/8d8aa987-017c-4f4c-822c-fe53f4163cee/current/subdir18/blk_-6090437519946022724 to a file? And this file to a region? What is in the master log as relates to this region around the time of the block errors? - Andy ________________________________ From: elsif <elsif.then@...> To: hbase-user@... Sent: Wed, October 21, 2009 2:36:12 PM Subject: Re: HBase Exceptions on version 0.20.1 stack wrote: > On Wed, Oct 21, 2009 at 8:16 AM, elsif <elsif.then@...> wrote: > > >> There are 239 "Block blk_-xxx is not valid errors", 522 "BlockInfo not >> found in volumeMap" errors, and 208 "BlockAlreadyExistsException" found >> in the hadoop logs over 12 hours of running the test. >> >> > > Above are from application-level (hbase) or datanode logs? If you trace any > of the above -- follow the block name -- in the NN are the blocks lost or do > you see replicas taking over or recoveries triggered? > > > are being requested after they are deleted (HBASE-1894): hdfs:2009-10-21 11:30:55,580 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.1.29.12:50010, storageID=DS-1209147803-10.1.29.12-50010-1651186044136, infoPort=50075, ipcPort=50020):Transmitted block blk_-6090437519946022724_469173 to /10.1.29.20:50010 hdfs:2009-10-21 11:30:59,938 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_-6090437519946022724_469173 file /mnt/8d8aa987-017c-4f4c-822c-fe53f4163cee/current/subdir18/blk_-6090437519946022724 hdfs:2009-10-21 11:59:09,509 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.1.29.12:50010, storageID=DS-1209147803-10.1.29.12-50010-1651186044136, infoPort=50075, ipcPort=50020):Got exception while serving blk_-6090437519946022724_469173 to /10.1.29.14: hdfs:java.io.IOException: Block blk_-6090437519946022724_469173 is not valid. >> I understand that I am loading the cluster - that is the point of the >> test, but I don't think that this should result in data loss. Failed >> inserts at the client level I can handle, but loss of data that was >> previously thought to be stored in hbase is a major issue. Are there >> plans to make hbase more resilient to load based failures? >> >> >> > It looks like there'll be data loss going by a few of the exceptions you > provide originally. Here's a couple of comments: > > "No live nodes contain current block" Usually we see this if the > client-side hadoop has not been patched with hdfs-127/hadoop-4681. Your > test program doesn't seem to have come across. Mind attaching it to an > issue so I can try it? Going by the way you started your test program, you > should have the hbase patched hadoop first in your CLASSPATH so you should > be ok but maybe there is something about your environmnent frustrating > hbase's using a patched hadoop? > > this last run, the client was running against the hadoop-0.20.1-core.jar instead of the hadoop-0.20.1-hdfs127-core.jar. I will retest using the hdfs-127 jar. > "java.io.IOException: TIMED OUT" Your regionserver or master timed out its > zk session. GC or swapping or disk used by zk is under heavy i/o loading? > There are five zk servers in the test setup that share hardware with nodes that are also datanodes and regionservers. There are only 8 instances of this error and all of them occurred within 30 seconds of each other on the same node. > "ClosedChannelException" Probably symptom of a RS shutdown because of events > such as above. > > "Abandoning block..." Did this write to HLog fail? Its just an INFO level > log out of DFSClient. > Since these are just INFO level messages, they are not an issue? hbase:2009-10-21 11:40:57,082 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-1365065510928989381_470690 hdfs:2009-10-21 11:40:57,080 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-1365065510928989381_470690 src: /10.1.29.23:56117 dest: /10.1.29.23:50010 hdfs:2009-10-21 11:40:57,081 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-1365065510928989381_470690 received exception org.apache.hadoop.util.Shell$ExitCodeException: du: cannot access `/mnt/eac8b34a-a4a4-495b-8241-43cd4b21c28f/current/blk_-8854288022995097022_462986.meta': No such file or directory hbase:2009-10-21 11:36:12,609 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_4473876596208463127_470012 hdfs:2009-10-21 11:35:44,842 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_4473876596208463127_470012 src: /10.1.29.22:57916 dest: /10.1.29.14:50010 hdfs:2009-10-21 11:35:44,844 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_4473876596208463127_470012 received exception java.io.EOFException hdfs:2009-10-21 11:36:26,105 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_4473876596208463127_470012 src: /10.1.29.14:34537 dest: /10.1.29.21:50010 hdfs:2009-10-21 11:36:26,105 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_4473876596208463127_470012 received exception org.apache.hadoop.util.Shell$ExitCodeException: du: cannot access `/mnt/e9726517-c84c-4326-8859-44f8261a14fa/current/subdir26/subdir62/blk_4151284369367444478': No such file or directory hdfs:2009-10-21 11:36:12,605 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_4473876596208463127_470012 src: /10.1.29.22:34103 dest: /10.1.29.22:50010 hdfs:2009-10-21 11:36:12,609 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder blk_4473876596208463127_470012 2 Exception java.io.EOFException hdfs:2009-10-21 11:36:12,609 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_4473876596208463127_470012 terminating hbase:2009-10-21 06:47:10,189 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6810931018745175038_422166 hdfs:2009-10-21 06:47:10,171 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-6810931018745175038_422166 src: /10.1.29.12:44933 dest: /10.1.29.12:50010 hdfs:2009-10-21 06:47:10,188 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-6810931018745175038_422166 received exception java.io.EOFException hdfs:2009-10-21 06:46:43,362 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-6810931018745175038_422166 src: /10.1.29.12:48908 dest: /10.1.29.14:50010 hdfs:2009-10-21 06:46:43,362 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-6810931018745175038_422166 received exception org.apache.hadoop.util.Shell$ExitCodeException: du: cannot access `/mnt/1a3e57e4-b514-4a85-83b1-ae9d12e3fb7a/current/subdir27/blk_-3769845572103484957_410554.meta': No such file or directory > "file system not available" What happened before this? Was this just an > emission on regionserver shutdown? > I don't have any "file system not available" messages this time around. > St.Ack > > Here is the source for the test program. For my tests I run two instances with the default settings: -- package org.apache.hadoop.hbase.util; import java.io.IOException; import java.io.FileInputStream; import java.lang.Integer; import org.apache.hadoop.hbase.HBaseConfiguration; import org.apache.hadoop.hbase.KeyValue; import org.apache.hadoop.hbase.client.HTable; import org.apache.hadoop.hbase.client.Get; import org.apache.hadoop.hbase.client.Put; import org.apache.hadoop.hbase.client.Result; import org.apache.hadoop.hbase.util.Bytes; /* create 'fc_test', {NAME => 'json', VERSIONS => 1}, {NAME => 'data', VERSIONS =>1} */ public class LoadTest extends Thread { protected int datasize = 4000000; public TableTest(int datasize) { this.datasize = datasize; } public void run() { try { FileInputStream random = new FileInputStream("/dev/urandom"); byte[] key = new byte[32]; byte[] data = new byte[datasize]; HBaseConfiguration hbaseConfiguration = new HBaseConfiguration(); HTable table = new HTable(hbaseConfiguration, "fc_test"); random.read(data, 0, datasize); while (true) { random.read(key, 0, 32); Put update = new Put(Bytes.toBytes(getHexString(key))); update.add(Bytes.toBytes("data"), Bytes.toBytes(""), data); table.put(update); } } catch (Exception e) { System.out.println(e.toString()); } } public static String getHexString(byte[] b) throws Exception { String result = ""; for (int i=0; i < b.length; i++) { result += Integer.toString( ( b[i] & 0xff ) + 0x100, 16).substring( 1 ); } return result; } public static void main(String args[]) throws IOException { Thread t = null; int i; int threads = 30; int datasize = 400000; try { threads = Integer.getInteger(args[0]).intValue(); datasize = Integer.getInteger(args[1]).intValue(); } catch (Exception e) { } System.out.println("Starting " + threads + " threads"); for (i = 0; i< threads; i++) { t = new TableTest(datasize); t.start(); } // wait for last thread try { t.join(); } catch (InterruptedException e) { } } } |
|
|
Re: HBase Exceptions on version 0.20.1Looks like the file was part of a compaction:
2009-10-21 11:58:38,434 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region fc_test,d46a332f60405431f0206d47143044676f1ff15239f24cb72c9743145d15f3e3,1256134371826 2009-10-21 11:58:38,442 DEBUG org.apache.hadoop.hbase.regionserver.Store: Skipped compaction of 1 file; compaction size of data: 173.6m; Skipped 3 files, size: 176812375 2009-10-21 11:58:38,451 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region fc_test,d46a332f60405431f0206d47143044676f1ff15239f24cb72c9743145d15f3e3,1256134371826 in 0sec 2009-10-21 11:58:38,451 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region fc_test,751960d5f43b5e53e806a89afbbd0b819f30de7369db613413022d99a77b54b8,1256135307641 2009-10-21 11:58:38,459 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size of data: 170.2m; Skipped 1 file(s), size: 131805190 2009-10-21 11:58:38,459 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 3 file(s) into /hbase/fc_test/compaction.dir/909519914, seqid=113712 2009-10-21 11:58:38,913 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-6090437519946022724_469173 from any node: java.io.IOException: No live nodes contain current block I don't see anything in the master log relating to this: 2009-10-21 11:58:31,254 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 1406 row(s) of meta region {server: 10.1.29.20:60020, regionname: .META.,,1, startKey: <>} complete 2009-10-21 11:58:31,254 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 . META. region(s) scanned 2009-10-21 11:59:26,548 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.1.29.20:60020, regionname: .META.,,1, startKey: <>} Andrew Purtell wrote: > Please ignore my earlier response. That does look suspiciously like client error. > > Can you correlate the block /mnt/8d8aa987-017c-4f4c-822c-fe53f4163cee/current/subdir18/blk_-6090437519946022724 to a file? And this file to a region? What is in the master log as relates to this region around the time of the block errors? > > - Andy > > |
|
|
Re: HBase Exceptions on version 0.20.1Thanks.
Can you grep your master log for all lines including fc_test,751960d5f43b5e53e806a89afbbd0b819f30de7369db613413022d99a77b54b8,1256135307641 ? Best regards, - Andy ________________________________ From: elsif <elsif.then@...> To: hbase-user@... Sent: Fri, October 23, 2009 12:45:23 PM Subject: Re: HBase Exceptions on version 0.20.1 Looks like the file was part of a compaction: 2009-10-21 11:58:38,434 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region fc_test,d46a332f60405431f0206d47143044676f1ff15239f24cb72c9743145d15f3e3,1256134371826 2009-10-21 11:58:38,442 DEBUG org.apache.hadoop.hbase.regionserver.Store: Skipped compaction of 1 file; compaction size of data: 173.6m; Skipped 3 files, size: 176812375 2009-10-21 11:58:38,451 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region fc_test,d46a332f60405431f0206d47143044676f1ff15239f24cb72c9743145d15f3e3,1256134371826 in 0sec 2009-10-21 11:58:38,451 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region fc_test,751960d5f43b5e53e806a89afbbd0b819f30de7369db613413022d99a77b54b8,1256135307641 2009-10-21 11:58:38,459 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size of data: 170.2m; Skipped 1 file(s), size: 131805190 2009-10-21 11:58:38,459 DEBUG org.apache.hadoop.hbase.regionserver.Store: Started compaction of 3 file(s) into /hbase/fc_test/compaction.dir/909519914, seqid=113712 2009-10-21 11:58:38,913 INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block blk_-6090437519946022724_469173 from any node: java.io.IOException: No live nodes contain current block I don't see anything in the master log relating to this: 2009-10-21 11:58:31,254 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 1406 row(s) of meta region {server: 10.1.29.20:60020, regionname: .META.,,1, startKey: <>} complete 2009-10-21 11:58:31,254 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 . META. region(s) scanned 2009-10-21 11:59:26,548 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.1.29.20:60020, regionname: .META.,,1, startKey: <>} Andrew Purtell wrote: > Please ignore my earlier response. That does look suspiciously like client error. > > Can you correlate the block /mnt/8d8aa987-017c-4f4c-822c-fe53f4163cee/current/subdir18/blk_-6090437519946022724 to a file? And this file to a region? What is in the master log as relates to this region around the time of the block errors? > > - Andy > > |
|
|
Re: HBase Exceptions on version 0.20.1I ran your program elsif against our cluster. It ran for about two days,
loaded up 1500 regions into 3-node cluster and then each of the RS's variously timed out against zookeeper and shut themselves down. I'll take a look at why the session timeout. The upload is pretty slow. My guess is that the client with its 30 threads is contending over the map of region locations (Ryan noted the last day that all writes are blocked while the search for new region location is being done). I can take a look into this too. St.Ack On Tue, Oct 20, 2009 at 2:12 PM, elsif <elsif.then@...> wrote: > > Hello Everyone, > > We are experiencing many HBase exceptions while running the attached > test program using > HBase 0.20.1 and Hadoop 0.20.1 (r810220) releases on a Hadoop cluster > with 14 data nodes. > > The HBase operational performance degrades quickly once certain error > conditions are > encountered especially if there are issues with the META region. > > Has anyone else seen these exceptions listed below? > > We ran the test with 3 region servers and then with 14 region servers. > In both cases, we still got exceptions. > > In each test run, we shutdown HBase, removed the main /hbase > folder, restarted HBase and then ran the test program. > The test program uses a table named "fc_test". The program is started by > the following command: > > /opt/hbase/bin/hbase org.apache.hadoop.hbase.util.TableTest 30 4000000 > > The attached HBase test program starts 30 threads. Each thread loops, > inserting randomly generated keys > along with a fixed data pattern of 4,000,000 bytes. > > After running the test for 2-12 hours, the table scanning of "fc_test" > is broken and only a limited number of put > operations are still succeeding. > > > > EXCEPTION LIST > > ------------------------------ > 2009-10-15 03:28:39,005 DEBUG > org.apache.hadoop.hbase.regionserver.Store: Major compaction triggered > on store data; time since last major compaction 89677908ms > 2009-10-15 03:28:39,012 DEBUG > org.apache.hadoop.hbase.regionserver.Store: Started compaction of 4 > file(s) into /hbase/fc_test/compaction.dir/1073627844, seqid=396374 > 2009-10-15 03:28:39,245 INFO org.apache.hadoop.hdfs.DFSClient: Could not > obtain block blk_-6054029382496843671_324379 from any node: > java.io.IOException: No live nodes contain current block > ------------------------------ > 2009-10-15 07:41:43,426 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > response for sessionid:0x2244f6194a5000d after 1ms > 2009-10-15 07:41:53,993 WARN org.apache.zookeeper.ClientCnxn: Exception > closing session 0x2244f6194a5000d to sun.nio.ch.SelectionKeyImpl@139f5ca > <mailto:sun.nio.ch.SelectionKeyImpl@139f5ca> > java.io.IOException: TIMED OUT > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) > 2009-10-15 07:41:54,344 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, > state: Disconnected, type: None, path: null > 2009-10-15 07:41:55,084 INFO org.apache.zookeeper.ClientCnxn: Attempting > connection to server hfs-029014/10.1.29.14:2181 > 2009-10-15 07:41:55,085 INFO org.apache.zookeeper.ClientCnxn: Priming > connection to java.nio.channels.SocketChannel[connected > local=/10.1.29.19:39421 remote=hfs-029014/10.1.29.14:2181] > ------------------------------ > 2009-10-15 08:37:40,982 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server handler 6 on 60020 caught: java.nio.channels.ClosedChannelException > at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source) > at sun.nio.ch.SocketChannelImpl.write(Unknown Source) > at > org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1125) > at > > org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:615) > at > > org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:679) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:943) > ------------------------------ > 2009-10-15 07:57:28,025 DEBUG > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: > Total=4.652237MB (4878224), Free=987.6603MB (1035636848), Max=992.3125MB > (1040515072), Counts: Blocks=0, Access=51003, Hit=0, Miss=51003, > Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%, > Evicted/Run=NaN > 2009-10-15 07:57:28,189 INFO org.apache.hadoop.hdfs.DFSClient: Exception > in createBlockOutputStream java.io.EOFException > 2009-10-15 07:57:28,190 INFO org.apache.hadoop.hdfs.DFSClient: > Abandoning block blk_7166000434569842394_332632 > 2009-10-15 07:57:28,306 WARN org.apache.hadoop.hbase.regionserver.HLog: > IPC Server handler 5 on 60020 took 2297ms appending an edit to hlog; > editcount=146 > ----------------------------- > 2009-10-15 09:10:00,559 INFO org.apache.zookeeper.ClientCnxn: Exception > while closing send thread for session 0x1244f6194780008 : Read error rc > = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] > 2009-10-15 09:10:00,739 INFO org.apache.zookeeper.ClientCnxn: > Disconnecting ClientCnxn for session: 0x1244f6194780008 > 2009-10-15 09:10:00,739 INFO org.apache.zookeeper.ClientCnxn: > EventThread shut down > ------------------------------ > 2009-10-13 12:30:58,207 WARN > org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to read > master address from ZooKeeper. Retrying. Error was: > java.io.IOException: > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for /hbase/master > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:331) > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:240) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1338) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1370) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:427) > at java.lang.Thread.run(Unknown Source) > Caused by: org.apache.zookeeper.KeeperException$NoNodeException: > KeeperErrorCode = NoNode for /hbase/master > at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:892) > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:327) > ------------------------------ > 2009-10-14 15:12:42,664 WARN > org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1 > org.apache.hadoop.hbase.Leases$LeaseStillHeldException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) > at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown > Source) > at java.lang.reflect.Constructor.newInstance(Unknown Source) > at > > org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94) > at > > org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48) > at > > org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:571) > at java.lang.Thread.run(Unknown Source) > ------------------------------ > 2009-10-14 15:12:43,566 WARN org.apache.zookeeper.ClientCnxn: Exception > closing session 0x1244f619478000d to sun.nio.ch.SelectionKeyImpl@15e32c4 > <mailto:sun.nio.ch.SelectionKeyImpl@15e32c4> > java.io.IOException: Session Expired > at > > org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589) > at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) > 2009-10-14 15:12:43,567 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, > state: Expired, type: None, path: null > 2009-10-14 15:12:43,568 ERROR > org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session > expired > ------------------------------ > 2009-10-14 15:12:51,563 ERROR > org.apache.hadoop.hbase.regionserver.HRegionServer: > java.io.IOException: Cannot append; log is closed > at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:584) > at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1445) > at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1244) > at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1208) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1831) > at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > ------------------------------ > 2009-10-14 15:13:51,852 INFO org.apache.hadoop.fs.FileSystem: Could not > cancel cleanup thread, though no FileSystems are open > 2009-10-14 15:13:51,864 FATAL > org.apache.hadoop.hbase.regionserver.HRegionServer: Shutting down > HRegionServer: file system not available > java.io.IOException: File system is not available > at > > org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:125) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.checkFileSystem(HRegionServer.java:901) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:849) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:832) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1839) > at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > Caused by: java.io.IOException: Call to hfs-029010/10.1.29.10:8020 > failed on local exception: java.nio.channels.ClosedByInterruptException > at org.apache.hadoop.ipc.Client.wrapException(Client.java:774) > at org.apache.hadoop.ipc.Client.call(Client.java:742) > at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) > at $Proxy1.getFileInfo(Unknown Source) > at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) > at > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) > at $Proxy1.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615) > at > > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643) > at > > org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:114) > ... 9 more > Caused by: java.nio.channels.ClosedByInterruptException > at java.nio.channels.spi.AbstractInterruptibleChannel.end(Unknown > Source) > at sun.nio.ch.SocketChannelImpl.write(Unknown Source) > at > > org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55) > at > > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) > at java.io.BufferedOutputStream.flushBuffer(Unknown Source) > at java.io.BufferedOutputStream.flush(Unknown Source) > at java.io.DataOutputStream.flush(Unknown Source) > at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:480) > at org.apache.hadoop.ipc.Client.call(Client.java:720) > ------------------------------ > > PREVIOUS EXCEPTIONS FROM OTHER TESTS INCLUDE > > 2009-10-13 00:00:00,508 ERROR > org.apache.hadoop.hbase.regionserver.HRegionServer: > org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row > out of range for ... > .... > at > org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1527) > at > > org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1559) > at > org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1627) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2277) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1768) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > ------------------------------ > 2009-10-12 10:06:57,809 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server handler 7 on 60020, call put([B@1f5b92a, > [Lorg.apache.hadoop.hbase.client.Put;@14f8e05) from 10.1.29.10:36017: > error: java.io.IOException: Server not running, aborting > java.io.IOException: Server not running, aborting > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1806) > at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > 2009-10-12 10:06:57,812 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: > request=0.0, regions=12, stores=24, storefiles=15, storefileIndexSize=8, > memstoreSize=378, usedHeap=441, maxHeap=992, blockCacheSize=4878224, > blockCacheFree=1035636848, blockCacheCount=0, blockCacheHitRatio=0 > 2009-10-12 10:06:57,813 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server handler 3 on 60020, call put([B@1f52982, > [Lorg.apache.hadoop.hbase.client.Put;@78283f) from 10.1.29.10:36017: > error: java.io.IOException: Server not running, aborting > java.io.IOException: Server not running, aborting > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1806) > at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > ------------------------------ > > > > > > . > > |
|
|
Re: HBase Exceptions on version 0.20.1This second run has been going for > 24 hours now. Its loaded up near 2k
regions. I'm using defaults from your script which is 30 threads and 400k cells. For 400k cells, I should probably be making adjustments upping the flush and region size. I'm also noticing that all load seems to land on a single regionserver. Did you notice that? Maybe I should use something other than java Random? Maybe I should md5 the Random output? But I just noticed that you ran with 4M cells. 4M cells are pretty big. HBase defaults are for cells much smaller. So I killed the above test and am now trying w/ 4M cells. I see that we're flushing every 16 or so edits. We're spending loads of our time just flushing and rolling WAL logs because the cells are all so large. Are 4M cells your use case? If so, I'll try and get you tunings that'll work better for these cell sizes. I've been watching ZooKeeper over last 24 hours. The average latency serving zk response is about 5ms. The longest response is 120ms (I see these stats by sending the quorum members the 'stat' command as in "echo stat|nc HOST PORT"). This time around I gave zk its own disk. My guess is that when I saw the TIMED OUT yesterday, it was probably zk client-side issue.. a long GC or so. I have GC logging running at the moment so will be able to correlate should it happen again. As for things slowing down, this third time loading I added in ganglia logging. Will let you know. As to the exceptions you were seeing, here's some more on those: On the "INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block...", I see those in my testing. The logs are at INFO level. They should probably be WARN-level instead. See the code below from DFSClient. The map of nodes carrying block replicas can shift while DFSClient is open (Looking at one particular block from uploading I see that the local block was removed from local datanode because the block was over-replicated). When we try to get the local block, we can fail. Its logged as "No live nodes contain current block". The current DFSClient map doesn't contain a node for the block. As you can see in the code, we'll then ask NameNode for new map of blocks for this file. As long as we get the block before cycle > failures, we'll see INFO-level loggings (For some reason I see 3 loggings often before we'll succeed and get the block). Its a failure if DFSClient throws an IOE out of here. I'm just seeing the INFO loggings in my testing. On this second test I added logging to DFSClient to see if I can figure why it takes 3 goes often to find the right block. private DNAddrPair chooseDataNode(LocatedBlock block) throws IOException { while (true) { DatanodeInfo[] nodes = block.getLocations(); try { DatanodeInfo chosenNode = bestNode(nodes, deadNodes); InetSocketAddress targetAddr = NetUtils.createSocketAddr(chosenNode.getName()); return new DNAddrPair(chosenNode, targetAddr); } catch (IOException ie) { String blockInfo = block.getBlock() + " file=" + src; if (failures >= maxBlockAcquireFailures) { throw new IOException("Could not obtain block: " + blockInfo); } if (nodes == null || nodes.length == 0) { LOG.info("No node available for block: " + blockInfo); } LOG.info("Could not obtain block " + block.getBlock() + " from any node: " + ie); try { Thread.sleep(3000); } catch (InterruptedException iex) { } deadNodes.clear(); //2nd option is to remove only nodes[blockId] openInfo(); failures++; continue; } } St.Ack On Tue, Oct 20, 2009 at 2:12 PM, elsif <elsif.then@...> wrote: > > Hello Everyone, > > We are experiencing many HBase exceptions while running the attached > test program using > HBase 0.20.1 and Hadoop 0.20.1 (r810220) releases on a Hadoop cluster > with 14 data nodes. > > The HBase operational performance degrades quickly once certain error > conditions are > encountered especially if there are issues with the META region. > > Has anyone else seen these exceptions listed below? > > We ran the test with 3 region servers and then with 14 region servers. > In both cases, we still got exceptions. > > In each test run, we shutdown HBase, removed the main /hbase > folder, restarted HBase and then ran the test program. > The test program uses a table named "fc_test". The program is started by > the following command: > > /opt/hbase/bin/hbase org.apache.hadoop.hbase.util.TableTest 30 4000000 > > The attached HBase test program starts 30 threads. Each thread loops, > inserting randomly generated keys > along with a fixed data pattern of 4,000,000 bytes. > > After running the test for 2-12 hours, the table scanning of "fc_test" > is broken and only a limited number of put > operations are still succeeding. > > > > EXCEPTION LIST > > ------------------------------ > 2009-10-15 03:28:39,005 DEBUG > org.apache.hadoop.hbase.regionserver.Store: Major compaction triggered > on store data; time since last major compaction 89677908ms > 2009-10-15 03:28:39,012 DEBUG > org.apache.hadoop.hbase.regionserver.Store: Started compaction of 4 > file(s) into /hbase/fc_test/compaction.dir/1073627844, seqid=396374 > 2009-10-15 03:28:39,245 INFO org.apache.hadoop.hdfs.DFSClient: Could not > obtain block blk_-6054029382496843671_324379 from any node: > java.io.IOException: No live nodes contain current block > ------------------------------ > 2009-10-15 07:41:43,426 DEBUG org.apache.zookeeper.ClientCnxn: Got ping > response for sessionid:0x2244f6194a5000d after 1ms > 2009-10-15 07:41:53,993 WARN org.apache.zookeeper.ClientCnxn: Exception > closing session 0x2244f6194a5000d to sun.nio.ch.SelectionKeyImpl@139f5ca > <mailto:sun.nio.ch.SelectionKeyImpl@139f5ca> > java.io.IOException: TIMED OUT > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) > 2009-10-15 07:41:54,344 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, > state: Disconnected, type: None, path: null > 2009-10-15 07:41:55,084 INFO org.apache.zookeeper.ClientCnxn: Attempting > connection to server hfs-029014/10.1.29.14:2181 > 2009-10-15 07:41:55,085 INFO org.apache.zookeeper.ClientCnxn: Priming > connection to java.nio.channels.SocketChannel[connected > local=/10.1.29.19:39421 remote=hfs-029014/10.1.29.14:2181] > ------------------------------ > 2009-10-15 08:37:40,982 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server handler 6 on 60020 caught: java.nio.channels.ClosedChannelException > at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source) > at sun.nio.ch.SocketChannelImpl.write(Unknown Source) > at > org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1125) > at > > org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:615) > at > > org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:679) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:943) > ------------------------------ > 2009-10-15 07:57:28,025 DEBUG > org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: > Total=4.652237MB (4878224), Free=987.6603MB (1035636848), Max=992.3125MB > (1040515072), Counts: Blocks=0, Access=51003, Hit=0, Miss=51003, > Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%, > Evicted/Run=NaN > 2009-10-15 07:57:28,189 INFO org.apache.hadoop.hdfs.DFSClient: Exception > in createBlockOutputStream java.io.EOFException > 2009-10-15 07:57:28,190 INFO org.apache.hadoop.hdfs.DFSClient: > Abandoning block blk_7166000434569842394_332632 > 2009-10-15 07:57:28,306 WARN org.apache.hadoop.hbase.regionserver.HLog: > IPC Server handler 5 on 60020 took 2297ms appending an edit to hlog; > editcount=146 > ----------------------------- > 2009-10-15 09:10:00,559 INFO org.apache.zookeeper.ClientCnxn: Exception > while closing send thread for session 0x1244f6194780008 : Read error rc > = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] > 2009-10-15 09:10:00,739 INFO org.apache.zookeeper.ClientCnxn: > Disconnecting ClientCnxn for session: 0x1244f6194780008 > 2009-10-15 09:10:00,739 INFO org.apache.zookeeper.ClientCnxn: > EventThread shut down > ------------------------------ > 2009-10-13 12:30:58,207 WARN > org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to read > master address from ZooKeeper. Retrying. Error was: > java.io.IOException: > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for /hbase/master > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:331) > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:240) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1338) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1370) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:427) > at java.lang.Thread.run(Unknown Source) > Caused by: org.apache.zookeeper.KeeperException$NoNodeException: > KeeperErrorCode = NoNode for /hbase/master > at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:892) > at > > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:327) > ------------------------------ > 2009-10-14 15:12:42,664 WARN > org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1 > org.apache.hadoop.hbase.Leases$LeaseStillHeldException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) > at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown > Source) > at java.lang.reflect.Constructor.newInstance(Unknown Source) > at > > org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94) > at > > org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48) > at > > org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:571) > at java.lang.Thread.run(Unknown Source) > ------------------------------ > 2009-10-14 15:12:43,566 WARN org.apache.zookeeper.ClientCnxn: Exception > closing session 0x1244f619478000d to sun.nio.ch.SelectionKeyImpl@15e32c4 > <mailto:sun.nio.ch.SelectionKeyImpl@15e32c4> > java.io.IOException: Session Expired > at > > org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589) > at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709) > at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) > 2009-10-14 15:12:43,567 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, > state: Expired, type: None, path: null > 2009-10-14 15:12:43,568 ERROR > org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session > expired > ------------------------------ > 2009-10-14 15:12:51,563 ERROR > org.apache.hadoop.hbase.regionserver.HRegionServer: > java.io.IOException: Cannot append; log is closed > at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:584) > at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1445) > at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1244) > at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1208) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1831) > at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > ------------------------------ > 2009-10-14 15:13:51,852 INFO org.apache.hadoop.fs.FileSystem: Could not > cancel cleanup thread, though no FileSystems are open > 2009-10-14 15:13:51,864 FATAL > org.apache.hadoop.hbase.regionserver.HRegionServer: Shutting down > HRegionServer: file system not available > java.io.IOException: File system is not available > at > > org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:125) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.checkFileSystem(HRegionServer.java:901) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:849) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:832) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1839) > at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > Caused by: java.io.IOException: Call to hfs-029010/10.1.29.10:8020 > failed on local exception: java.nio.channels.ClosedByInterruptException > at org.apache.hadoop.ipc.Client.wrapException(Client.java:774) > at org.apache.hadoop.ipc.Client.call(Client.java:742) > at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) > at $Proxy1.getFileInfo(Unknown Source) > at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) > at > > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) > at $Proxy1.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615) > at > > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643) > at > > org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:114) > ... 9 more > Caused by: java.nio.channels.ClosedByInterruptException > at java.nio.channels.spi.AbstractInterruptibleChannel.end(Unknown > Source) > at sun.nio.ch.SocketChannelImpl.write(Unknown Source) > at > > org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55) > at > > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) > at > org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) > at java.io.BufferedOutputStream.flushBuffer(Unknown Source) > at java.io.BufferedOutputStream.flush(Unknown Source) > at java.io.DataOutputStream.flush(Unknown Source) > at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:480) > at org.apache.hadoop.ipc.Client.call(Client.java:720) > ------------------------------ > > PREVIOUS EXCEPTIONS FROM OTHER TESTS INCLUDE > > 2009-10-13 00:00:00,508 ERROR > org.apache.hadoop.hbase.regionserver.HRegionServer: > org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row > out of range for ... > .... > at > org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1527) > at > > org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1559) > at > org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1627) > at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2277) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1768) > at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > ------------------------------ > 2009-10-12 10:06:57,809 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server handler 7 on 60020, call put([B@1f5b92a, > [Lorg.apache.hadoop.hbase.client.Put;@14f8e05) from 10.1.29.10:36017: > error: java.io.IOException: Server not running, aborting > java.io.IOException: Server not running, aborting > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1806) > at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > 2009-10-12 10:06:57,812 INFO > org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: > request=0.0, regions=12, stores=24, storefiles=15, storefileIndexSize=8, > memstoreSize=378, usedHeap=441, maxHeap=992, blockCacheSize=4878224, > blockCacheFree=1035636848, blockCacheCount=0, blockCacheHitRatio=0 > 2009-10-12 10:06:57,813 INFO org.apache.hadoop.ipc.HBaseServer: IPC > Server handler 3 on 60020, call put([B@1f52982, > [Lorg.apache.hadoop.hbase.client.Put;@78283f) from 10.1.29.10:36017: > error: java.io.IOException: Server not running, aborting > java.io.IOException: Server not running, aborting > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299) > at > > org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1806) > at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) > ------------------------------ > > > > > > . > > |
|
|
Re: HBase Exceptions on version 0.20.1On my last set of tests I used two instances of the test program - 60
threads total generating rows containing 400kB of data. The highest load appeared to be on the region serving meta. On my cluster things started to break down at about 20,000 regions with a heap size of 6MB. You can always lower your heap size to generate issues quicker. Thanks, elsif stack wrote: > This second run has been going for > 24 hours now. Its loaded up near 2k > regions. I'm using defaults from your script which is 30 threads and 400k > cells. For 400k cells, I should probably be making adjustments upping the > flush and region size. I'm also noticing that all load seems to land on a > single regionserver. Did you notice that? Maybe I should use something > other than java Random? Maybe I should md5 the Random output? > > But I just noticed that you ran with 4M cells. 4M cells are pretty big. > HBase defaults are for cells much smaller. > > So I killed the above test and am now trying w/ 4M cells. I see that we're > flushing every 16 or so edits. We're spending loads of our time just > flushing and rolling WAL logs because the cells are all so large. Are 4M > cells your use case? If so, I'll try and get you tunings that'll work > better for these cell sizes. > > I've been watching ZooKeeper over last 24 hours. The average latency > serving zk response is about 5ms. The longest response is 120ms (I see > these stats by sending the quorum members the 'stat' command as in "echo > stat|nc HOST PORT"). This time around I gave zk its own disk. My guess is > that when I saw the TIMED OUT yesterday, it was probably zk client-side > issue.. a long GC or so. I have GC logging running at the moment so will be > able to correlate should it happen again. > > As for things slowing down, this third time loading I added in ganglia > logging. Will let you know. > > As to the exceptions you were seeing, here's some more on those: > > On the "INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block...", I > see those in my testing. The logs are at INFO level. They should probably > be WARN-level instead. See the code below from DFSClient. The map of nodes > carrying block replicas can shift while DFSClient is open (Looking at one > particular block from uploading I see that the local block was removed from > local datanode because the block was over-replicated). When we try to get > the local block, we can fail. Its logged as "No live nodes contain current > block". The current DFSClient map doesn't contain a node for the block. As > you can see in the code, we'll then ask NameNode for new map of blocks for > this file. As long as we get the block before cycle > failures, we'll see > INFO-level loggings (For some reason I see 3 loggings often before we'll > succeed and get the block). Its a failure if DFSClient throws an IOE out of > here. I'm just seeing the INFO loggings in my testing. On this second test > I added logging to DFSClient to see if I can figure why it takes 3 goes > often to find the right block. > > private DNAddrPair chooseDataNode(LocatedBlock block) > throws IOException { > while (true) { > DatanodeInfo[] nodes = block.getLocations(); > try { > DatanodeInfo chosenNode = bestNode(nodes, deadNodes); > InetSocketAddress targetAddr = > NetUtils.createSocketAddr(chosenNode.getName()); > return new DNAddrPair(chosenNode, targetAddr); > } catch (IOException ie) { > String blockInfo = block.getBlock() + " file=" + src; > if (failures >= maxBlockAcquireFailures) { > throw new IOException("Could not obtain block: " + blockInfo); > } > > if (nodes == null || nodes.length == 0) { > LOG.info("No node available for block: " + blockInfo); > } > LOG.info("Could not obtain block " + block.getBlock() + " from any > node: " + ie); > try { > Thread.sleep(3000); > } catch (InterruptedException iex) { > } > deadNodes.clear(); //2nd option is to remove only nodes[blockId] > openInfo(); > failures++; > continue; > } > } > > St.Ack > > > On Tue, Oct 20, 2009 at 2:12 PM, elsif <elsif.then@...> wrote: > > >> Hello Everyone, >> >> We are experiencing many HBase exceptions while running the attached >> test program using >> HBase 0.20.1 and Hadoop 0.20.1 (r810220) releases on a Hadoop cluster >> with 14 data nodes. >> >> The HBase operational performance degrades quickly once certain error >> conditions are >> encountered especially if there are issues with the META region. >> >> Has anyone else seen these exceptions listed below? >> >> We ran the test with 3 region servers and then with 14 region servers. >> In both cases, we still got exceptions. >> >> In each test run, we shutdown HBase, removed the main /hbase >> folder, restarted HBase and then ran the test program. >> The test program uses a table named "fc_test". The program is started by >> the following command: >> >> /opt/hbase/bin/hbase org.apache.hadoop.hbase.util.TableTest 30 4000000 >> >> The attached HBase test program starts 30 threads. Each thread loops, >> inserting randomly generated keys >> along with a fixed data pattern of 4,000,000 bytes. >> >> After running the test for 2-12 hours, the table scanning of "fc_test" >> is broken and only a limited number of put >> operations are still succeeding. >> >> >> >> EXCEPTION LIST >> >> ------------------------------ >> 2009-10-15 03:28:39,005 DEBUG >> org.apache.hadoop.hbase.regionserver.Store: Major compaction triggered >> on store data; time since last major compaction 89677908ms >> 2009-10-15 03:28:39,012 DEBUG >> org.apache.hadoop.hbase.regionserver.Store: Started compaction of 4 >> file(s) into /hbase/fc_test/compaction.dir/1073627844, seqid=396374 >> 2009-10-15 03:28:39,245 INFO org.apache.hadoop.hdfs.DFSClient: Could not >> obtain block blk_-6054029382496843671_324379 from any node: >> java.io.IOException: No live nodes contain current block >> ------------------------------ >> 2009-10-15 07:41:43,426 DEBUG org.apache.zookeeper.ClientCnxn: Got ping >> response for sessionid:0x2244f6194a5000d after 1ms >> 2009-10-15 07:41:53,993 WARN org.apache.zookeeper.ClientCnxn: Exception >> closing session 0x2244f6194a5000d to sun.nio.ch.SelectionKeyImpl@139f5ca >> <mailto:sun.nio.ch.SelectionKeyImpl@139f5ca> >> java.io.IOException: TIMED OUT >> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) >> 2009-10-15 07:41:54,344 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, >> state: Disconnected, type: None, path: null >> 2009-10-15 07:41:55,084 INFO org.apache.zookeeper.ClientCnxn: Attempting >> connection to server hfs-029014/10.1.29.14:2181 >> 2009-10-15 07:41:55,085 INFO org.apache.zookeeper.ClientCnxn: Priming >> connection to java.nio.channels.SocketChannel[connected >> local=/10.1.29.19:39421 remote=hfs-029014/10.1.29.14:2181] >> ------------------------------ >> 2009-10-15 08:37:40,982 INFO org.apache.hadoop.ipc.HBaseServer: IPC >> Server handler 6 on 60020 caught: java.nio.channels.ClosedChannelException >> at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source) >> at sun.nio.ch.SocketChannelImpl.write(Unknown Source) >> at >> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1125) >> at >> >> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:615) >> at >> >> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:679) >> at >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:943) >> ------------------------------ >> 2009-10-15 07:57:28,025 DEBUG >> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: >> Total=4.652237MB (4878224), Free=987.6603MB (1035636848), Max=992.3125MB >> (1040515072), Counts: Blocks=0, Access=51003, Hit=0, Miss=51003, >> Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%, >> Evicted/Run=NaN >> 2009-10-15 07:57:28,189 INFO org.apache.hadoop.hdfs.DFSClient: Exception >> in createBlockOutputStream java.io.EOFException >> 2009-10-15 07:57:28,190 INFO org.apache.hadoop.hdfs.DFSClient: >> Abandoning block blk_7166000434569842394_332632 >> 2009-10-15 07:57:28,306 WARN org.apache.hadoop.hbase.regionserver.HLog: >> IPC Server handler 5 on 60020 took 2297ms appending an edit to hlog; >> editcount=146 >> ----------------------------- >> 2009-10-15 09:10:00,559 INFO org.apache.zookeeper.ClientCnxn: Exception >> while closing send thread for session 0x1244f6194780008 : Read error rc >> = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] >> 2009-10-15 09:10:00,739 INFO org.apache.zookeeper.ClientCnxn: >> Disconnecting ClientCnxn for session: 0x1244f6194780008 >> 2009-10-15 09:10:00,739 INFO org.apache.zookeeper.ClientCnxn: >> EventThread shut down >> ------------------------------ >> 2009-10-13 12:30:58,207 WARN >> org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to read >> master address from ZooKeeper. Retrying. Error was: >> java.io.IOException: >> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = >> NoNode for /hbase/master >> at >> >> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:331) >> at >> >> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:240) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1338) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1370) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:427) >> at java.lang.Thread.run(Unknown Source) >> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: >> KeeperErrorCode = NoNode for /hbase/master >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) >> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:892) >> at >> >> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:327) >> ------------------------------ >> 2009-10-14 15:12:42,664 WARN >> org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1 >> org.apache.hadoop.hbase.Leases$LeaseStillHeldException >> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) >> at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) >> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown >> Source) >> at java.lang.reflect.Constructor.newInstance(Unknown Source) >> at >> >> org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94) >> at >> >> org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48) >> at >> >> org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:571) >> at java.lang.Thread.run(Unknown Source) >> ------------------------------ >> 2009-10-14 15:12:43,566 WARN org.apache.zookeeper.ClientCnxn: Exception >> closing session 0x1244f619478000d to sun.nio.ch.SelectionKeyImpl@15e32c4 >> <mailto:sun.nio.ch.SelectionKeyImpl@15e32c4> >> java.io.IOException: Session Expired >> at >> >> org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589) >> at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709) >> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) >> 2009-10-14 15:12:43,567 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, >> state: Expired, type: None, path: null >> 2009-10-14 15:12:43,568 ERROR >> org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session >> expired >> ------------------------------ >> 2009-10-14 15:12:51,563 ERROR >> org.apache.hadoop.hbase.regionserver.HRegionServer: >> java.io.IOException: Cannot append; log is closed >> at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:584) >> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1445) >> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1244) >> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1208) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1831) >> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) >> at java.lang.reflect.Method.invoke(Unknown Source) >> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) >> at >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) >> ------------------------------ >> 2009-10-14 15:13:51,852 INFO org.apache.hadoop.fs.FileSystem: Could not >> cancel cleanup thread, though no FileSystems are open >> 2009-10-14 15:13:51,864 FATAL >> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutting down >> HRegionServer: file system not available >> java.io.IOException: File system is not available >> at >> >> org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:125) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.checkFileSystem(HRegionServer.java:901) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:849) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:832) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1839) >> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) >> at java.lang.reflect.Method.invoke(Unknown Source) >> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) >> at >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) >> Caused by: java.io.IOException: Call to hfs-029010/10.1.29.10:8020 >> failed on local exception: java.nio.channels.ClosedByInterruptException >> at org.apache.hadoop.ipc.Client.wrapException(Client.java:774) >> at org.apache.hadoop.ipc.Client.call(Client.java:742) >> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) >> at $Proxy1.getFileInfo(Unknown Source) >> at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) >> at java.lang.reflect.Method.invoke(Unknown Source) >> at >> >> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) >> at >> >> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) >> at $Proxy1.getFileInfo(Unknown Source) >> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615) >> at >> >> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453) >> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643) >> at >> >> org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:114) >> ... 9 more >> Caused by: java.nio.channels.ClosedByInterruptException >> at java.nio.channels.spi.AbstractInterruptibleChannel.end(Unknown >> Source) >> at sun.nio.ch.SocketChannelImpl.write(Unknown Source) >> at >> >> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55) >> at >> >> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) >> at >> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) >> at >> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) >> at java.io.BufferedOutputStream.flushBuffer(Unknown Source) >> at java.io.BufferedOutputStream.flush(Unknown Source) >> at java.io.DataOutputStream.flush(Unknown Source) >> at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:480) >> at org.apache.hadoop.ipc.Client.call(Client.java:720) >> ------------------------------ >> >> PREVIOUS EXCEPTIONS FROM OTHER TESTS INCLUDE >> >> 2009-10-13 00:00:00,508 ERROR >> org.apache.hadoop.hbase.regionserver.HRegionServer: >> org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row >> out of range for ... >> .... >> at >> org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1527) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1559) >> at >> org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1627) >> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2277) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1768) >> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) >> at java.lang.reflect.Method.invoke(Unknown Source) >> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) >> at >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) >> ------------------------------ >> 2009-10-12 10:06:57,809 INFO org.apache.hadoop.ipc.HBaseServer: IPC >> Server handler 7 on 60020, call put([B@1f5b92a, >> [Lorg.apache.hadoop.hbase.client.Put;@14f8e05) from 10.1.29.10:36017: >> error: java.io.IOException: Server not running, aborting >> java.io.IOException: Server not running, aborting >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1806) >> at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) >> at java.lang.reflect.Method.invoke(Unknown Source) >> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) >> at >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) >> 2009-10-12 10:06:57,812 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: >> request=0.0, regions=12, stores=24, storefiles=15, storefileIndexSize=8, >> memstoreSize=378, usedHeap=441, maxHeap=992, blockCacheSize=4878224, >> blockCacheFree=1035636848, blockCacheCount=0, blockCacheHitRatio=0 >> 2009-10-12 10:06:57,813 INFO org.apache.hadoop.ipc.HBaseServer: IPC >> Server handler 3 on 60020, call put([B@1f52982, >> [Lorg.apache.hadoop.hbase.client.Put;@78283f) from 10.1.29.10:36017: >> error: java.io.IOException: Server not running, aborting >> java.io.IOException: Server not running, aborting >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1806) >> at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) >> at java.lang.reflect.Method.invoke(Unknown Source) >> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) >> at >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) >> ------------------------------ >> >> >> >> >> >> . >> >> >> > > |
|
|
Re: HBase Exceptions on version 0.20.1So that's about 1500 regions per region server. Should be more like 200-250 I think because as a practical matter the recovery time for redeploying 1500 regions upon hardware failure will block clients for a long time. It may not even be possible if all region servers are almost at the point of overcommittment when this large chunk of regions suddenly comes up for reassignment. AFAIK, Google's BigTable is tuned for 100 per region server, 200 MB per region. This is to minimize churn and region unavailability time should a server fail and also allows the node to serve the data mostly out of RAM.
This is with default max file size (256 MB)? What happens if you up that to 1 GB? Set hbase.hregion.max.filesize property in hbase-site.xml to make this change globally. - Andy ________________________________ From: elsif <elsif.then@...> To: hbase-user@... Sent: Wed, November 11, 2009 6:44:34 AM Subject: Re: HBase Exceptions on version 0.20.1 On my last set of tests I used two instances of the test program - 60 threads total generating rows containing 400kB of data. The highest load appeared to be on the region serving meta. On my cluster things started to break down at about 20,000 regions with a heap size of 6MB. You can always lower your heap size to generate issues quicker. Thanks, elsif stack wrote: > This second run has been going for > 24 hours now. Its loaded up near 2k > regions. I'm using defaults from your script which is 30 threads and 400k > cells. For 400k cells, I should probably be making adjustments upping the > flush and region size. I'm also noticing that all load seems to land on a > single regionserver. Did you notice that? Maybe I should use something > other than java Random? Maybe I should md5 the Random output? > > But I just noticed that you ran with 4M cells. 4M cells are pretty big. > HBase defaults are for cells much smaller. > > So I killed the above test and am now trying w/ 4M cells. I see that we're > flushing every 16 or so edits. We're spending loads of our time just > flushing and rolling WAL logs because the cells are all so large. Are 4M > cells your use case? If so, I'll try and get you tunings that'll work > better for these cell sizes. > > I've been watching ZooKeeper over last 24 hours. The average latency > serving zk response is about 5ms. The longest response is 120ms (I see > these stats by sending the quorum members the 'stat' command as in "echo > stat|nc HOST PORT"). This time around I gave zk its own disk. My guess is > that when I saw the TIMED OUT yesterday, it was probably zk client-side > issue.. a long GC or so. I have GC logging running at the moment so will be > able to correlate should it happen again. > > As for things slowing down, this third time loading I added in ganglia > logging. Will let you know. > > As to the exceptions you were seeing, here's some more on those: > > On the "INFO org.apache.hadoop.hdfs.DFSClient: Could not obtain block...", I > see those in my testing. The logs are at INFO level. They should probably > be WARN-level instead. See the code below from DFSClient. The map of nodes > carrying block replicas can shift while DFSClient is open (Looking at one > particular block from uploading I see that the local block was removed from > local datanode because the block was over-replicated). When we try to get > the local block, we can fail. Its logged as "No live nodes contain current > block". The current DFSClient map doesn't contain a node for the block. As > you can see in the code, we'll then ask NameNode for new map of blocks for > this file. As long as we get the block before cycle > failures, we'll see > INFO-level loggings (For some reason I see 3 loggings often before we'll > succeed and get the block). Its a failure if DFSClient throws an IOE out of > here. I'm just seeing the INFO loggings in my testing. On this second test > I added logging to DFSClient to see if I can figure why it takes 3 goes > often to find the right block. > > private DNAddrPair chooseDataNode(LocatedBlock block) > throws IOException { > while (true) { > DatanodeInfo[] nodes = block.getLocations(); > try { > DatanodeInfo chosenNode = bestNode(nodes, deadNodes); > InetSocketAddress targetAddr = > NetUtils.createSocketAddr(chosenNode.getName()); > return new DNAddrPair(chosenNode, targetAddr); > } catch (IOException ie) { > String blockInfo = block.getBlock() + " file=" + src; > if (failures >= maxBlockAcquireFailures) { > throw new IOException("Could not obtain block: " + blockInfo); > } > > if (nodes == null || nodes.length == 0) { > LOG.info("No node available for block: " + blockInfo); > } > LOG.info("Could not obtain block " + block.getBlock() + " from any > node: " + ie); > try { > Thread.sleep(3000); > } catch (InterruptedException iex) { > } > deadNodes.clear(); //2nd option is to remove only nodes[blockId] > openInfo(); > failures++; > continue; > } > } > > St.Ack > > > On Tue, Oct 20, 2009 at 2:12 PM, elsif <elsif.then@...> wrote: > > >> Hello Everyone, >> >> We are experiencing many HBase exceptions while running the attached >> test program using >> HBase 0.20.1 and Hadoop 0.20.1 (r810220) releases on a Hadoop cluster >> with 14 data nodes. >> >> The HBase operational performance degrades quickly once certain error >> conditions are >> encountered especially if there are issues with the META region. >> >> Has anyone else seen these exceptions listed below? >> >> We ran the test with 3 region servers and then with 14 region servers. >> In both cases, we still got exceptions. >> >> In each test run, we shutdown HBase, removed the main /hbase >> folder, restarted HBase and then ran the test program. >> The test program uses a table named "fc_test". The program is started by >> the following command: >> >> /opt/hbase/bin/hbase org.apache.hadoop.hbase.util.TableTest 30 4000000 >> >> The attached HBase test program starts 30 threads. Each thread loops, >> inserting randomly generated keys >> along with a fixed data pattern of 4,000,000 bytes. >> >> After running the test for 2-12 hours, the table scanning of "fc_test" >> is broken and only a limited number of put >> operations are still succeeding. >> >> >> >> EXCEPTION LIST >> >> ------------------------------ >> 2009-10-15 03:28:39,005 DEBUG >> org.apache.hadoop.hbase.regionserver.Store: Major compaction triggered >> on store data; time since last major compaction 89677908ms >> 2009-10-15 03:28:39,012 DEBUG >> org.apache.hadoop.hbase.regionserver.Store: Started compaction of 4 >> file(s) into /hbase/fc_test/compaction.dir/1073627844, seqid=396374 >> 2009-10-15 03:28:39,245 INFO org.apache.hadoop.hdfs.DFSClient: Could not >> obtain block blk_-6054029382496843671_324379 from any node: >> java.io.IOException: No live nodes contain current block >> ------------------------------ >> 2009-10-15 07:41:43,426 DEBUG org.apache.zookeeper.ClientCnxn: Got ping >> response for sessionid:0x2244f6194a5000d after 1ms >> 2009-10-15 07:41:53,993 WARN org.apache.zookeeper.ClientCnxn: Exception >> closing session 0x2244f6194a5000d to sun.nio.ch.SelectionKeyImpl@139f5ca >> <mailto:sun.nio.ch.SelectionKeyImpl@139f5ca> >> java.io.IOException: TIMED OUT >> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906) >> 2009-10-15 07:41:54,344 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, >> state: Disconnected, type: None, path: null >> 2009-10-15 07:41:55,084 INFO org.apache.zookeeper.ClientCnxn: Attempting >> connection to server hfs-029014/10.1.29.14:2181 >> 2009-10-15 07:41:55,085 INFO org.apache.zookeeper.ClientCnxn: Priming >> connection to java.nio.channels.SocketChannel[connected >> local=/10.1.29.19:39421 remote=hfs-029014/10.1.29.14:2181] >> ------------------------------ >> 2009-10-15 08:37:40,982 INFO org.apache.hadoop.ipc.HBaseServer: IPC >> Server handler 6 on 60020 caught: java.nio.channels.ClosedChannelException >> at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(Unknown Source) >> at sun.nio.ch.SocketChannelImpl.write(Unknown Source) >> at >> org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1125) >> at >> >> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:615) >> at >> >> org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:679) >> at >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:943) >> ------------------------------ >> 2009-10-15 07:57:28,025 DEBUG >> org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: >> Total=4.652237MB (4878224), Free=987.6603MB (1035636848), Max=992.3125MB >> (1040515072), Counts: Blocks=0, Access=51003, Hit=0, Miss=51003, >> Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%, >> Evicted/Run=NaN >> 2009-10-15 07:57:28,189 INFO org.apache.hadoop.hdfs.DFSClient: Exception >> in createBlockOutputStream java.io.EOFException >> 2009-10-15 07:57:28,190 INFO org.apache.hadoop.hdfs.DFSClient: >> Abandoning block blk_7166000434569842394_332632 >> 2009-10-15 07:57:28,306 WARN org.apache.hadoop.hbase.regionserver.HLog: >> IPC Server handler 5 on 60020 took 2297ms appending an edit to hlog; >> editcount=146 >> ----------------------------- >> 2009-10-15 09:10:00,559 INFO org.apache.zookeeper.ClientCnxn: Exception >> while closing send thread for session 0x1244f6194780008 : Read error rc >> = -1 java.nio.DirectByteBuffer[pos=0 lim=4 cap=4] >> 2009-10-15 09:10:00,739 INFO org.apache.zookeeper.ClientCnxn: >> Disconnecting ClientCnxn for session: 0x1244f6194780008 >> 2009-10-15 09:10:00,739 INFO org.apache.zookeeper.ClientCnxn: >> EventThread shut down >> ------------------------------ >> 2009-10-13 12:30:58,207 WARN >> org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to read >> master address from ZooKeeper. Retrying. Error was: >> java.io.IOException: >> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = >> NoNode for /hbase/master >> at >> >> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:331) >> at >> >> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:240) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1338) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1370) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:427) >> at java.lang.Thread.run(Unknown Source) >> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: >> KeeperErrorCode = NoNode for /hbase/master >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) >> at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) >> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:892) >> at >> >> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:327) >> ------------------------------ >> 2009-10-14 15:12:42,664 WARN >> org.apache.hadoop.hbase.regionserver.HRegionServer: Attempt=1 >> org.apache.hadoop.hbase.Leases$LeaseStillHeldException >> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) >> at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) >> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown >> Source) >> at java.lang.reflect.Constructor.newInstance(Unknown Source) >> at >> >> org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94) >> at >> >> org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48) >> at >> >> org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:571) >> at java.lang.Thread.run(Unknown Source) >> ------------------------------ >> 2009-10-14 15:12:43,566 WARN org.apache.zookeeper.ClientCnxn: Exception >> closing session 0x1244f619478000d to sun.nio.ch.SelectionKeyImpl@15e32c4 >> <mailto:sun.nio.ch.SelectionKeyImpl@15e32c4> >> java.io.IOException: Session Expired >> at >> >> org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589) >> at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709) >> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) >> 2009-10-14 15:12:43,567 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, >> state: Expired, type: None, path: null >> 2009-10-14 15:12:43,568 ERROR >> org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session >> expired >> ------------------------------ >> 2009-10-14 15:12:51,563 ERROR >> org.apache.hadoop.hbase.regionserver.HRegionServer: >> java.io.IOException: Cannot append; log is closed >> at org.apache.hadoop.hbase.regionserver.HLog.append(HLog.java:584) >> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1445) >> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1244) >> at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1208) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1831) >> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) >> at java.lang.reflect.Method.invoke(Unknown Source) >> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) >> at >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) >> ------------------------------ >> 2009-10-14 15:13:51,852 INFO org.apache.hadoop.fs.FileSystem: Could not >> cancel cleanup thread, though no FileSystems are open >> 2009-10-14 15:13:51,864 FATAL >> org.apache.hadoop.hbase.regionserver.HRegionServer: Shutting down >> HRegionServer: file system not available >> java.io.IOException: File system is not available >> at >> >> org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:125) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.checkFileSystem(HRegionServer.java:901) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:849) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:832) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1839) >> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) >> at java.lang.reflect.Method.invoke(Unknown Source) >> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) >> at >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) >> Caused by: java.io.IOException: Call to hfs-029010/10.1.29.10:8020 >> failed on local exception: java.nio.channels.ClosedByInterruptException >> at org.apache.hadoop.ipc.Client.wrapException(Client.java:774) >> at org.apache.hadoop.ipc.Client.call(Client.java:742) >> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) >> at $Proxy1.getFileInfo(Unknown Source) >> at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) >> at java.lang.reflect.Method.invoke(Unknown Source) >> at >> >> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) >> at >> >> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) >> at $Proxy1.getFileInfo(Unknown Source) >> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:615) >> at >> >> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453) >> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643) >> at >> >> org.apache.hadoop.hbase.util.FSUtils.checkFileSystemAvailable(FSUtils.java:114) >> ... 9 more >> Caused by: java.nio.channels.ClosedByInterruptException >> at java.nio.channels.spi.AbstractInterruptibleChannel.end(Unknown >> Source) >> at sun.nio.ch.SocketChannelImpl.write(Unknown Source) >> at >> >> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55) >> at >> >> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) >> at >> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146) >> at >> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107) >> at java.io.BufferedOutputStream.flushBuffer(Unknown Source) >> at java.io.BufferedOutputStream.flush(Unknown Source) >> at java.io.DataOutputStream.flush(Unknown Source) >> at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:480) >> at org.apache.hadoop.ipc.Client.call(Client.java:720) >> ------------------------------ >> >> PREVIOUS EXCEPTIONS FROM OTHER TESTS INCLUDE >> >> 2009-10-13 00:00:00,508 ERROR >> org.apache.hadoop.hbase.regionserver.HRegionServer: >> org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row >> out of range for ... >> .... >> at >> org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1527) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1559) >> at >> org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1627) >> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2277) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1768) >> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) >> at java.lang.reflect.Method.invoke(Unknown Source) >> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) >> at >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) >> ------------------------------ >> 2009-10-12 10:06:57,809 INFO org.apache.hadoop.ipc.HBaseServer: IPC >> Server handler 7 on 60020, call put([B@1f5b92a, >> [Lorg.apache.hadoop.hbase.client.Put;@14f8e05) from 10.1.29.10:36017: >> error: java.io.IOException: Server not running, aborting >> java.io.IOException: Server not running, aborting >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1806) >> at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) >> at java.lang.reflect.Method.invoke(Unknown Source) >> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) >> at >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) >> 2009-10-12 10:06:57,812 INFO >> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: >> request=0.0, regions=12, stores=24, storefiles=15, storefileIndexSize=8, >> memstoreSize=378, usedHeap=441, maxHeap=992, blockCacheSize=4878224, >> blockCacheFree=1035636848, blockCacheCount=0, blockCacheHitRatio=0 >> 2009-10-12 10:06:57,813 INFO org.apache.hadoop.ipc.HBaseServer: IPC >> Server handler 3 on 60020, call put([B@1f52982, >> [Lorg.apache.hadoop.hbase.client.Put;@78283f) from 10.1.29.10:36017: >> error: java.io.IOException: Server not running, aborting >> java.io.IOException: Server not running, aborting >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2299) >> at >> >> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1806) >> at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) >> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) >> at java.lang.reflect.Method.invoke(Unknown Source) >> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:650) >> at >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) >> ------------------------------ >> >> >> >> >> >> . >> >> >> > > |
| Free embeddable forum powered by Nabble | Forum Help |