long pauses during hbase.PerformanceEvaluation tests

View: New views
9 Messages — Rating Filter:   Alert me  

long pauses during hbase.PerformanceEvaluation tests

by Sujee Maniyam-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi all

I have been running 'bin/hbase
org.apache.hadoop.hbase.PerformanceEvaluation ' script to get an idea
of the cluster.  I see long PAUSES during writes.

here is my setup:
- 5 nodes on EC2 (m1.large) :  1 hbase master + 4 regions
- hadoop & hbase get 2G heap

- I don't see any JVM heap getting full (not even close)
- during the pauses, the CPU usage is pretty much zero (on client,
running on master machine & region servers)
- no swap going on
- network connections between machines are pretty fast

I am curious to know what is causing the delay.  I don't see errors in logs.
Could region-splits be causing such long delays?  Because re-runs go
through much faster.

Any pointers/explanations much appreciated.

Here is a couple of examples:

A)  SequentialWrite : TestTable created
notice the large gaps in timestamps (*)
# bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
        ==> 201 writes / sec


    09/10/28 23:17:55 INFO hbase.PerformanceEvaluation: Table {NAME =>
    'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE', VERSIONS
    => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
    BLOCKCACHE => 'true'}]} created
    09/10/28 23:17:55 INFO hbase.PerformanceEvaluation: Start sequentialWrite
    at offset 0 for 1048576 rows
    09/10/28 23:18:10 INFO hbase.PerformanceEvaluation: 0/104857/1048576
    09/10/28 23:18:24 INFO hbase.PerformanceEvaluation: 0/209714/1048576

 *  09/10/28 23:52:16 INFO hbase.PerformanceEvaluation: 0/314571/1048576
    09/10/29 00:25:50 INFO hbase.PerformanceEvaluation: 0/419428/1048576
    09/10/29 00:26:06 INFO hbase.PerformanceEvaluation: 0/524285/1048576
    09/10/29 00:26:25 INFO hbase.PerformanceEvaluation: 0/629142/1048576

 *  09/10/29 00:41:47 INFO hbase.PerformanceEvaluation: 0/733999/1048576
    09/10/29 00:43:34 INFO hbase.PerformanceEvaluation: 0/838856/1048576
    09/10/29 00:43:49 INFO hbase.PerformanceEvaluation: 0/943713/1048576
    09/10/29 00:44:43 INFO hbase.PerformanceEvaluation: 0/1048570/1048576
    09/10/29 00:44:44 INFO hbase.PerformanceEvaluation: Finished
    sequentialWrite in 5207981ms at offset 0 for 1048576 rows


During one of the pauses, here is what I see on one region server.
Notice how the table split took 14 mins

@ hadoop4
    2009-10-29 00:26:33,925 INFO org.apache.hadoop.hbase.regionserver.HRegion:
    Starting split of region TestTable,0000088768,1256773950625
    2009-10-29 00:41:30,428 INFO org.apache.hadoop.hbase.regionserver.HRegion:
    Closed TestTable,0000088768,1256773950625
    2009-10-29 00:41:32,218 INFO org.apache.hadoop.hbase.regionserver.HRegion:
    region TestTable,0000088768,1256775993930/1603364931 available; sequence id
    is 58208257
    2009-10-29 00:41:32,218 INFO org.apache.hadoop.hbase.regionserver.HRegion:
    Closed TestTable,0000088768,1256775993930
    2009-10-29 00:41:32,407 INFO org.apache.hadoop.hbase.regionserver.HRegion:
    region TestTable,0000249152,1256775993930/1798535948 available; sequence id
    is 58208258
    2009-10-29 00:41:32,407 INFO org.apache.hadoop.hbase.regionserver.HRegion:
    Closed TestTable,0000249152,1256775993930
    2009-10-29 00:41:32,422 INFO org.apache.zookeeper.ZooKeeper: Initiating
    client connection,
    connectString=domU-12-31-39-06-B9-61.compute-1.internal:2181,domU-12-31-39-06-E0-62.compute-1.internal:2181,domU-12-31-39-07-8A-22.compute-1.internal:2181,domU-12-31-39-06-B0-32.compute-1.internal:2181,domU-12-31-39-06-3D-A1.compute-1.internal:2181
    sessionTimeout=60000
    watcher=org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher@1f5b63822009-10-29
    00:41:32,422 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to
    server domU-12-31-39-06-B0-32.compute-1.internal/10.208.183.192:2181
    2009-10-29 00:41:32,466 INFO org.apache.zookeeper.ClientCnxn: Priming
    connection to java.nio.channels.SocketChannel[connected
    local=/10.208.190.143:53364
    remote=domU-12-31-39-06-B0-32.compute-1.internal/10.208.183.192:2181]2009-10-29
    00:41:32,467 INFO org.apache.zookeeper.ClientCnxn: Server connection
    successful
    2009-10-29 00:41:32,528 INFO
    org.apache.hadoop.hbase.regionserver.CompactSplitThread: region split, META
    updated, and report to master all successful. Old region=REGION => {NAME =>
    'TestTable,0000088768,1256773950625', STARTKE
    Y => '0000088768', ENDKEY => '', ENCODED => 1389834479, OFFLINE => true,
    SPLIT => true, TABLE => {{NAME => 'TestTable', FAMILIES => [{NAME =>
    'info', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647',
    BLOCKSIZE => '6
    5536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}, new regions:
    TestTable,0000088768,1256775993930, TestTable,0000249152,1256775993930.
    Split took 14mins, 58sec
    2009-10-29 00:41:34,379 INFO
    org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
    TestTable,00002 49152,1256775993930
    2009-10-29 00:41:34,380 INFO
    org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
    MSG_REGION_OPEN: TestTable,0000249152,1256775993930



B) 'TestTable' exists.  When I re-ran the test,

  ==> 3898 writes / sec

    09/10/29 04:28:22 INFO hbase.PerformanceEvaluation: Finished
    sequentialWrite in 269966ms at offset 0 for 1048576 rows


C) For kickers, here is a RandomWrite
(Table created)

bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation  randomWrite 1

      ==> 304 writes /sec

    09/10/29 05:33:46 INFO hbase.PerformanceEvaluation: Table {NAME =>
    'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE', VERSIONS
    => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
    BLOCKCACHE => 'true'}]} created
    09/10/29 05:33:46 INFO hbase.PerformanceEvaluation: Start randomWrite at
    offset 0 for 1048576 rows
    09/10/29 05:34:01 INFO hbase.PerformanceEvaluation: 0/104857/1048576
    09/10/29 05:34:15 INFO hbase.PerformanceEvaluation: 0/209714/1048576
    09/10/29 05:35:16 INFO hbase.PerformanceEvaluation: 0/314571/1048576
    09/10/29 05:36:25 INFO hbase.PerformanceEvaluation: 0/419428/1048576
    09/10/29 05:36:38 INFO hbase.PerformanceEvaluation: 0/524285/1048576
    09/10/29 05:37:25 INFO hbase.PerformanceEvaluation: 0/629142/1048576
    09/10/29 05:37:37 INFO hbase.PerformanceEvaluation: 0/733999/1048576
    09/10/29 05:37:51 INFO hbase.PerformanceEvaluation: 0/838856/1048576
  * 09/10/29 05:56:47 INFO hbase.PerformanceEvaluation: 0/943713/1048576
  * 09/10/29 06:31:12 INFO hbase.PerformanceEvaluation: 0/1048570/1048576

09/10/29 06:31:13 INFO hbase.PerformanceEvaluation: Finished randomWrite in
3446240ms at offset 0 for 1048576 rows


D) randomWrite rerun, with existing TestTable
bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation  randomWrite 1

        ==> 3075 writes / sec
09/10/29 06:57:04 INFO hbase.PerformanceEvaluation: Finished
randomWrite in 341746ms at offset 0 for 1048576 rows


Thanks
Sujee

--
http://sujee.net

Re: long pauses during hbase.PerformanceEvaluation tests

by Sujee Maniyam-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

forgot to add that I am running
    hbase v0.20.1
    hadoop v0.20.1
pretty much at default settings...

On Thu, Oct 29, 2009 at 12:01 AM, Sujee Maniyam <sujee@...> wrote:

> Hi all
>
> I have been running 'bin/hbase
> org.apache.hadoop.hbase.PerformanceEvaluation ' script to get an idea
> of the cluster.  I see long PAUSES during writes.
>
> here is my setup:
> - 5 nodes on EC2 (m1.large) :  1 hbase master + 4 regions
> - hadoop & hbase get 2G heap
>
> - I don't see any JVM heap getting full (not even close)
> - during the pauses, the CPU usage is pretty much zero (on client,
> running on master machine & region servers)
> - no swap going on
> - network connections between machines are pretty fast
>
> I am curious to know what is causing the delay.  I don't see errors in logs.
> Could region-splits be causing such long delays?  Because re-runs go
> through much faster.
>
> Any pointers/explanations much appreciated.
>
> Here is a couple of examples:
>
> A)  SequentialWrite : TestTable created
> notice the large gaps in timestamps (*)
> # bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
>        ==> 201 writes / sec
>
>
>    09/10/28 23:17:55 INFO hbase.PerformanceEvaluation: Table {NAME =>
>    'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE', VERSIONS
>    => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
>    BLOCKCACHE => 'true'}]} created
>    09/10/28 23:17:55 INFO hbase.PerformanceEvaluation: Start sequentialWrite
>    at offset 0 for 1048576 rows
>    09/10/28 23:18:10 INFO hbase.PerformanceEvaluation: 0/104857/1048576
>    09/10/28 23:18:24 INFO hbase.PerformanceEvaluation: 0/209714/1048576
>
>  *  09/10/28 23:52:16 INFO hbase.PerformanceEvaluation: 0/314571/1048576
>    09/10/29 00:25:50 INFO hbase.PerformanceEvaluation: 0/419428/1048576
>    09/10/29 00:26:06 INFO hbase.PerformanceEvaluation: 0/524285/1048576
>    09/10/29 00:26:25 INFO hbase.PerformanceEvaluation: 0/629142/1048576
>
>  *  09/10/29 00:41:47 INFO hbase.PerformanceEvaluation: 0/733999/1048576
>    09/10/29 00:43:34 INFO hbase.PerformanceEvaluation: 0/838856/1048576
>    09/10/29 00:43:49 INFO hbase.PerformanceEvaluation: 0/943713/1048576
>    09/10/29 00:44:43 INFO hbase.PerformanceEvaluation: 0/1048570/1048576
>    09/10/29 00:44:44 INFO hbase.PerformanceEvaluation: Finished
>    sequentialWrite in 5207981ms at offset 0 for 1048576 rows
>
>
> During one of the pauses, here is what I see on one region server.
> Notice how the table split took 14 mins
>
> @ hadoop4
>    2009-10-29 00:26:33,925 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>    Starting split of region TestTable,0000088768,1256773950625
>    2009-10-29 00:41:30,428 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>    Closed TestTable,0000088768,1256773950625
>    2009-10-29 00:41:32,218 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>    region TestTable,0000088768,1256775993930/1603364931 available; sequence id
>    is 58208257
>    2009-10-29 00:41:32,218 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>    Closed TestTable,0000088768,1256775993930
>    2009-10-29 00:41:32,407 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>    region TestTable,0000249152,1256775993930/1798535948 available; sequence id
>    is 58208258
>    2009-10-29 00:41:32,407 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>    Closed TestTable,0000249152,1256775993930
>    2009-10-29 00:41:32,422 INFO org.apache.zookeeper.ZooKeeper: Initiating
>    client connection,
>    connectString=domU-12-31-39-06-B9-61.compute-1.internal:2181,domU-12-31-39-06-E0-62.compute-1.internal:2181,domU-12-31-39-07-8A-22.compute-1.internal:2181,domU-12-31-39-06-B0-32.compute-1.internal:2181,domU-12-31-39-06-3D-A1.compute-1.internal:2181
>    sessionTimeout=60000
>    watcher=org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher@1f5b63822009-10-29
>    00:41:32,422 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to
>    server domU-12-31-39-06-B0-32.compute-1.internal/10.208.183.192:2181
>    2009-10-29 00:41:32,466 INFO org.apache.zookeeper.ClientCnxn: Priming
>    connection to java.nio.channels.SocketChannel[connected
>    local=/10.208.190.143:53364
>    remote=domU-12-31-39-06-B0-32.compute-1.internal/10.208.183.192:2181]2009-10-29
>    00:41:32,467 INFO org.apache.zookeeper.ClientCnxn: Server connection
>    successful
>    2009-10-29 00:41:32,528 INFO
>    org.apache.hadoop.hbase.regionserver.CompactSplitThread: region split, META
>    updated, and report to master all successful. Old region=REGION => {NAME =>
>    'TestTable,0000088768,1256773950625', STARTKE
>    Y => '0000088768', ENDKEY => '', ENCODED => 1389834479, OFFLINE => true,
>    SPLIT => true, TABLE => {{NAME => 'TestTable', FAMILIES => [{NAME =>
>    'info', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647',
>    BLOCKSIZE => '6
>    5536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}, new regions:
>    TestTable,0000088768,1256775993930, TestTable,0000249152,1256775993930.
>    Split took 14mins, 58sec
>    2009-10-29 00:41:34,379 INFO
>    org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>    TestTable,00002 49152,1256775993930
>    2009-10-29 00:41:34,380 INFO
>    org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
>    MSG_REGION_OPEN: TestTable,0000249152,1256775993930
>
>
>
> B) 'TestTable' exists.  When I re-ran the test,
>
>  ==> 3898 writes / sec
>
>    09/10/29 04:28:22 INFO hbase.PerformanceEvaluation: Finished
>    sequentialWrite in 269966ms at offset 0 for 1048576 rows
>
>
> C) For kickers, here is a RandomWrite
> (Table created)
>
> bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation  randomWrite 1
>
>      ==> 304 writes /sec
>
>    09/10/29 05:33:46 INFO hbase.PerformanceEvaluation: Table {NAME =>
>    'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE', VERSIONS
>    => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
>    BLOCKCACHE => 'true'}]} created
>    09/10/29 05:33:46 INFO hbase.PerformanceEvaluation: Start randomWrite at
>    offset 0 for 1048576 rows
>    09/10/29 05:34:01 INFO hbase.PerformanceEvaluation: 0/104857/1048576
>    09/10/29 05:34:15 INFO hbase.PerformanceEvaluation: 0/209714/1048576
>    09/10/29 05:35:16 INFO hbase.PerformanceEvaluation: 0/314571/1048576
>    09/10/29 05:36:25 INFO hbase.PerformanceEvaluation: 0/419428/1048576
>    09/10/29 05:36:38 INFO hbase.PerformanceEvaluation: 0/524285/1048576
>    09/10/29 05:37:25 INFO hbase.PerformanceEvaluation: 0/629142/1048576
>    09/10/29 05:37:37 INFO hbase.PerformanceEvaluation: 0/733999/1048576
>    09/10/29 05:37:51 INFO hbase.PerformanceEvaluation: 0/838856/1048576
>  * 09/10/29 05:56:47 INFO hbase.PerformanceEvaluation: 0/943713/1048576
>  * 09/10/29 06:31:12 INFO hbase.PerformanceEvaluation: 0/1048570/1048576
>
> 09/10/29 06:31:13 INFO hbase.PerformanceEvaluation: Finished randomWrite in
> 3446240ms at offset 0 for 1048576 rows
>
>
> D) randomWrite rerun, with existing TestTable
> bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation  randomWrite 1
>
>        ==> 3075 writes / sec
> 09/10/29 06:57:04 INFO hbase.PerformanceEvaluation: Finished
> randomWrite in 341746ms at offset 0 for 1048576 rows
>
>
> Thanks
> Sujee
>
> --
> http://sujee.net
>



--
http://sujee.net

Re: long pauses during hbase.PerformanceEvaluation tests

by Jean-Daniel Cryans-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

14 minutes seems way too much, anything relevant in the region server
logs around the same time? (The one who holds the region at that
point)

J-D

On Thu, Oct 29, 2009 at 1:11 AM, Sujee Maniyam <sujee@...> wrote:

> forgot to add that I am running
>    hbase v0.20.1
>    hadoop v0.20.1
> pretty much at default settings...
>
> On Thu, Oct 29, 2009 at 12:01 AM, Sujee Maniyam <sujee@...> wrote:
>> Hi all
>>
>> I have been running 'bin/hbase
>> org.apache.hadoop.hbase.PerformanceEvaluation ' script to get an idea
>> of the cluster.  I see long PAUSES during writes.
>>
>> here is my setup:
>> - 5 nodes on EC2 (m1.large) :  1 hbase master + 4 regions
>> - hadoop & hbase get 2G heap
>>
>> - I don't see any JVM heap getting full (not even close)
>> - during the pauses, the CPU usage is pretty much zero (on client,
>> running on master machine & region servers)
>> - no swap going on
>> - network connections between machines are pretty fast
>>
>> I am curious to know what is causing the delay.  I don't see errors in logs.
>> Could region-splits be causing such long delays?  Because re-runs go
>> through much faster.
>>
>> Any pointers/explanations much appreciated.
>>
>> Here is a couple of examples:
>>
>> A)  SequentialWrite : TestTable created
>> notice the large gaps in timestamps (*)
>> # bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
>>        ==> 201 writes / sec
>>
>>
>>    09/10/28 23:17:55 INFO hbase.PerformanceEvaluation: Table {NAME =>
>>    'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE', VERSIONS
>>    => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
>>    BLOCKCACHE => 'true'}]} created
>>    09/10/28 23:17:55 INFO hbase.PerformanceEvaluation: Start sequentialWrite
>>    at offset 0 for 1048576 rows
>>    09/10/28 23:18:10 INFO hbase.PerformanceEvaluation: 0/104857/1048576
>>    09/10/28 23:18:24 INFO hbase.PerformanceEvaluation: 0/209714/1048576
>>
>>  *  09/10/28 23:52:16 INFO hbase.PerformanceEvaluation: 0/314571/1048576
>>    09/10/29 00:25:50 INFO hbase.PerformanceEvaluation: 0/419428/1048576
>>    09/10/29 00:26:06 INFO hbase.PerformanceEvaluation: 0/524285/1048576
>>    09/10/29 00:26:25 INFO hbase.PerformanceEvaluation: 0/629142/1048576
>>
>>  *  09/10/29 00:41:47 INFO hbase.PerformanceEvaluation: 0/733999/1048576
>>    09/10/29 00:43:34 INFO hbase.PerformanceEvaluation: 0/838856/1048576
>>    09/10/29 00:43:49 INFO hbase.PerformanceEvaluation: 0/943713/1048576
>>    09/10/29 00:44:43 INFO hbase.PerformanceEvaluation: 0/1048570/1048576
>>    09/10/29 00:44:44 INFO hbase.PerformanceEvaluation: Finished
>>    sequentialWrite in 5207981ms at offset 0 for 1048576 rows
>>
>>
>> During one of the pauses, here is what I see on one region server.
>> Notice how the table split took 14 mins
>>
>> @ hadoop4
>>    2009-10-29 00:26:33,925 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>    Starting split of region TestTable,0000088768,1256773950625
>>    2009-10-29 00:41:30,428 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>    Closed TestTable,0000088768,1256773950625
>>    2009-10-29 00:41:32,218 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>    region TestTable,0000088768,1256775993930/1603364931 available; sequence id
>>    is 58208257
>>    2009-10-29 00:41:32,218 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>    Closed TestTable,0000088768,1256775993930
>>    2009-10-29 00:41:32,407 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>    region TestTable,0000249152,1256775993930/1798535948 available; sequence id
>>    is 58208258
>>    2009-10-29 00:41:32,407 INFO org.apache.hadoop.hbase.regionserver.HRegion:
>>    Closed TestTable,0000249152,1256775993930
>>    2009-10-29 00:41:32,422 INFO org.apache.zookeeper.ZooKeeper: Initiating
>>    client connection,
>>    connectString=domU-12-31-39-06-B9-61.compute-1.internal:2181,domU-12-31-39-06-E0-62.compute-1.internal:2181,domU-12-31-39-07-8A-22.compute-1.internal:2181,domU-12-31-39-06-B0-32.compute-1.internal:2181,domU-12-31-39-06-3D-A1.compute-1.internal:2181
>>    sessionTimeout=60000
>>    watcher=org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher@1f5b63822009-10-29
>>    00:41:32,422 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to
>>    server domU-12-31-39-06-B0-32.compute-1.internal/10.208.183.192:2181
>>    2009-10-29 00:41:32,466 INFO org.apache.zookeeper.ClientCnxn: Priming
>>    connection to java.nio.channels.SocketChannel[connected
>>    local=/10.208.190.143:53364
>>    remote=domU-12-31-39-06-B0-32.compute-1.internal/10.208.183.192:2181]2009-10-29
>>    00:41:32,467 INFO org.apache.zookeeper.ClientCnxn: Server connection
>>    successful
>>    2009-10-29 00:41:32,528 INFO
>>    org.apache.hadoop.hbase.regionserver.CompactSplitThread: region split, META
>>    updated, and report to master all successful. Old region=REGION => {NAME =>
>>    'TestTable,0000088768,1256773950625', STARTKE
>>    Y => '0000088768', ENDKEY => '', ENCODED => 1389834479, OFFLINE => true,
>>    SPLIT => true, TABLE => {{NAME => 'TestTable', FAMILIES => [{NAME =>
>>    'info', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647',
>>    BLOCKSIZE => '6
>>    5536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}, new regions:
>>    TestTable,0000088768,1256775993930, TestTable,0000249152,1256775993930.
>>    Split took 14mins, 58sec
>>    2009-10-29 00:41:34,379 INFO
>>    org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGION_OPEN:
>>    TestTable,00002 49152,1256775993930
>>    2009-10-29 00:41:34,380 INFO
>>    org.apache.hadoop.hbase.regionserver.HRegionServer: Worker:
>>    MSG_REGION_OPEN: TestTable,0000249152,1256775993930
>>
>>
>>
>> B) 'TestTable' exists.  When I re-ran the test,
>>
>>  ==> 3898 writes / sec
>>
>>    09/10/29 04:28:22 INFO hbase.PerformanceEvaluation: Finished
>>    sequentialWrite in 269966ms at offset 0 for 1048576 rows
>>
>>
>> C) For kickers, here is a RandomWrite
>> (Table created)
>>
>> bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation  randomWrite 1
>>
>>      ==> 304 writes /sec
>>
>>    09/10/29 05:33:46 INFO hbase.PerformanceEvaluation: Table {NAME =>
>>    'TestTable', FAMILIES => [{NAME => 'info', COMPRESSION => 'NONE', VERSIONS
>>    => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
>>    BLOCKCACHE => 'true'}]} created
>>    09/10/29 05:33:46 INFO hbase.PerformanceEvaluation: Start randomWrite at
>>    offset 0 for 1048576 rows
>>    09/10/29 05:34:01 INFO hbase.PerformanceEvaluation: 0/104857/1048576
>>    09/10/29 05:34:15 INFO hbase.PerformanceEvaluation: 0/209714/1048576
>>    09/10/29 05:35:16 INFO hbase.PerformanceEvaluation: 0/314571/1048576
>>    09/10/29 05:36:25 INFO hbase.PerformanceEvaluation: 0/419428/1048576
>>    09/10/29 05:36:38 INFO hbase.PerformanceEvaluation: 0/524285/1048576
>>    09/10/29 05:37:25 INFO hbase.PerformanceEvaluation: 0/629142/1048576
>>    09/10/29 05:37:37 INFO hbase.PerformanceEvaluation: 0/733999/1048576
>>    09/10/29 05:37:51 INFO hbase.PerformanceEvaluation: 0/838856/1048576
>>  * 09/10/29 05:56:47 INFO hbase.PerformanceEvaluation: 0/943713/1048576
>>  * 09/10/29 06:31:12 INFO hbase.PerformanceEvaluation: 0/1048570/1048576
>>
>> 09/10/29 06:31:13 INFO hbase.PerformanceEvaluation: Finished randomWrite in
>> 3446240ms at offset 0 for 1048576 rows
>>
>>
>> D) randomWrite rerun, with existing TestTable
>> bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation  randomWrite 1
>>
>>        ==> 3075 writes / sec
>> 09/10/29 06:57:04 INFO hbase.PerformanceEvaluation: Finished
>> randomWrite in 341746ms at offset 0 for 1048576 rows
>>
>>
>> Thanks
>> Sujee
>>
>> --
>> http://sujee.net
>>
>
>
>
> --
> http://sujee.net
>

Re: long pauses during hbase.PerformanceEvaluation tests

by Sujee Maniyam-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Jean,
that would be the logs  under '@hadoop4' section - I believe that was
the region server holding the server at the time
thanks
Sujee

On Thu, Oct 29, 2009 at 9:32 AM, Jean-Daniel Cryans <jdcryans@...> wrote:
> 14 minutes seems way too much, anything relevant in the region server
> logs around the same time? (The one who holds the region at that
> point)
>

Re: long pauses during hbase.PerformanceEvaluation tests

by Jean-Daniel Cryans-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Yes, anything there? Care to paste some lines in a pastebin?

Thx,

J-D

On Thu, Oct 29, 2009 at 10:19 AM, Sujee Maniyam <sujee@...> wrote:

> Jean,
> that would be the logs  under '@hadoop4' section - I believe that was
> the region server holding the server at the time
> thanks
> Sujee
>
> On Thu, Oct 29, 2009 at 9:32 AM, Jean-Daniel Cryans <jdcryans@...> wrote:
>> 14 minutes seems way too much, anything relevant in the region server
>> logs around the same time? (The one who holds the region at that
>> point)
>>
>

Re: long pauses during hbase.PerformanceEvaluation tests

by Sujee Maniyam-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

http://pastebin.com/f37d75e1d
This is what I previously sent out in the email as well.

I don't have the logs anymore (shoulda saved them!).

I also realized that my  'ulimit -n = 1024'.   Going to up this to 32k
and re-run the test.  Will post the results.

Any thing else, I should keep an eye out for?

thanks
Sujee

On Thu, Oct 29, 2009 at 11:01 AM, Jean-Daniel Cryans
<jdcryans@...> wrote:

> Yes, anything there? Care to paste some lines in a pastebin?
>
> Thx,
>
> J-D
>
> On Thu, Oct 29, 2009 at 10:19 AM, Sujee Maniyam <sujee@...> wrote:
>> Jean,
>> that would be the logs  under '@hadoop4' section - I believe that was
>> the region server holding the server at the time
>> thanks
>> Sujee
>>
>> On Thu, Oct 29, 2009 at 9:32 AM, Jean-Daniel Cryans <jdcryans@...> wrote:
>>> 14 minutes seems way too much, anything relevant in the region server
>>> logs around the same time? (The one who holds the region at that
>>> point)
>>>
>>
>



--
http://sujee.net

Re: long pauses during hbase.PerformanceEvaluation tests

by Jean-Daniel Cryans-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Sujee,

Sorry the mailing list often doesn't ship the attached files so I
couldn't see it.

So your log tells me that your split really took 14 mins... !!! Could
you re-run that with DEBUG? See
http://wiki.apache.org/hadoop/Hbase/FAQ#A5

Thx!

J-D

On Thu, Oct 29, 2009 at 3:23 PM, Sujee Maniyam <sujee@...> wrote:

> http://pastebin.com/f37d75e1d
> This is what I previously sent out in the email as well.
>
> I don't have the logs anymore (shoulda saved them!).
>
> I also realized that my  'ulimit -n = 1024'.   Going to up this to 32k
> and re-run the test.  Will post the results.
>
> Any thing else, I should keep an eye out for?
>
> thanks
> Sujee
>
> On Thu, Oct 29, 2009 at 11:01 AM, Jean-Daniel Cryans
> <jdcryans@...> wrote:
>> Yes, anything there? Care to paste some lines in a pastebin?
>>
>> Thx,
>>
>> J-D
>>
>> On Thu, Oct 29, 2009 at 10:19 AM, Sujee Maniyam <sujee@...> wrote:
>>> Jean,
>>> that would be the logs  under '@hadoop4' section - I believe that was
>>> the region server holding the server at the time
>>> thanks
>>> Sujee
>>>
>>> On Thu, Oct 29, 2009 at 9:32 AM, Jean-Daniel Cryans <jdcryans@...> wrote:
>>>> 14 minutes seems way too much, anything relevant in the region server
>>>> logs around the same time? (The one who holds the region at that
>>>> point)
>>>>
>>>
>>
>
>
>
> --
> http://sujee.net
>

Re: long pauses during hbase.PerformanceEvaluation tests

by Sujee Maniyam-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

following up on this; hopefully this can help out some one else.

turned out the culprit was 'ulimit -n'.
it was the default 1024.   I upped it to 32k.    And that was like
flicking a switch and making every thing work great! :-)

now the benchmark-tests look much, much better.

regards
Sujee


On Thu, Oct 29, 2009 at 3:57 PM, Jean-Daniel Cryans <jdcryans@...> wrote:

> Sujee,
>
> Sorry the mailing list often doesn't ship the attached files so I
> couldn't see it.
>
> So your log tells me that your split really took 14 mins... !!! Could
> you re-run that with DEBUG? See
> http://wiki.apache.org/hadoop/Hbase/FAQ#A5
>
> Thx!
>
> J-D
>
> On Thu, Oct 29, 2009 at 3:23 PM, Sujee Maniyam <sujee@...> wrote:
>> http://pastebin.com/f37d75e1d
>> This is what I previously sent out in the email as well.
>>
>> I don't have the logs anymore (shoulda saved them!).
>>
>> I also realized that my  'ulimit -n = 1024'.   Going to up this to 32k
>> and re-run the test.  Will post the results.
>>
>> Any thing else, I should keep an eye out for?
>>
>> thanks
>> Sujee
>>
>> On Thu, Oct 29, 2009 at 11:01 AM, Jean-Daniel Cryans
>> <jdcryans@...> wrote:
>>> Yes, anything there? Care to paste some lines in a pastebin?
>>>
>>> Thx,
>>>
>>> J-D
>>>
>>> On Thu, Oct 29, 2009 at 10:19 AM, Sujee Maniyam <sujee@...> wrote:
>>>> Jean,
>>>> that would be the logs  under '@hadoop4' section - I believe that was
>>>> the region server holding the server at the time
>>>> thanks
>>>> Sujee
>>>>
>>>> On Thu, Oct 29, 2009 at 9:32 AM, Jean-Daniel Cryans <jdcryans@...> wrote:
>>>>> 14 minutes seems way too much, anything relevant in the region server
>>>>> logs around the same time? (The one who holds the region at that
>>>>> point)
>>>>>
>>>>
>>>
>>
>>
>>
>> --
>> http://sujee.net
>>
>



--
http://sujee.net

Re: long pauses during hbase.PerformanceEvaluation tests

by Jean-Daniel Cryans-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

As a remainder, on the Getting Started page in the Requirements it says

"HBase currently is a file handle hog. The usual default of 1024 on
*nix systems is insufficient if you are loading any significant amount
of data into regionservers. See the FAQ: Why do I see
"java.io.IOException...(Too many open files)" in my logs?  for how to
up the limit."

http://hadoop.apache.org/hbase/docs/r0.20.1/api/overview-summary.html#overview_description

J-D

On Fri, Oct 30, 2009 at 4:50 PM, Sujee Maniyam <sujee@...> wrote:

> following up on this; hopefully this can help out some one else.
>
> turned out the culprit was 'ulimit -n'.
> it was the default 1024.   I upped it to 32k.    And that was like
> flicking a switch and making every thing work great! :-)
>
> now the benchmark-tests look much, much better.
>
> regards
> Sujee
>
>
> On Thu, Oct 29, 2009 at 3:57 PM, Jean-Daniel Cryans <jdcryans@...> wrote:
>> Sujee,
>>
>> Sorry the mailing list often doesn't ship the attached files so I
>> couldn't see it.
>>
>> So your log tells me that your split really took 14 mins... !!! Could
>> you re-run that with DEBUG? See
>> http://wiki.apache.org/hadoop/Hbase/FAQ#A5
>>
>> Thx!
>>
>> J-D
>>
>> On Thu, Oct 29, 2009 at 3:23 PM, Sujee Maniyam <sujee@...> wrote:
>>> http://pastebin.com/f37d75e1d
>>> This is what I previously sent out in the email as well.
>>>
>>> I don't have the logs anymore (shoulda saved them!).
>>>
>>> I also realized that my  'ulimit -n = 1024'.   Going to up this to 32k
>>> and re-run the test.  Will post the results.
>>>
>>> Any thing else, I should keep an eye out for?
>>>
>>> thanks
>>> Sujee
>>>
>>> On Thu, Oct 29, 2009 at 11:01 AM, Jean-Daniel Cryans
>>> <jdcryans@...> wrote:
>>>> Yes, anything there? Care to paste some lines in a pastebin?
>>>>
>>>> Thx,
>>>>
>>>> J-D
>>>>
>>>> On Thu, Oct 29, 2009 at 10:19 AM, Sujee Maniyam <sujee@...> wrote:
>>>>> Jean,
>>>>> that would be the logs  under '@hadoop4' section - I believe that was
>>>>> the region server holding the server at the time
>>>>> thanks
>>>>> Sujee
>>>>>
>>>>> On Thu, Oct 29, 2009 at 9:32 AM, Jean-Daniel Cryans <jdcryans@...> wrote:
>>>>>> 14 minutes seems way too much, anything relevant in the region server
>>>>>> logs around the same time? (The one who holds the region at that
>>>>>> point)
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> http://sujee.net
>>>
>>
>
>
>
> --
> http://sujee.net
>