random write slower than random read in HBase 0.20.0

by Jun Li-5

I have a cluster of 13 Red Hat Linux machines (each with 4 CPU
cores and 8GB RAM), configured as an HBase cluster of 1 master and
12 region servers running the HBase 0.20.0 code base. For the cluster
configuration, I followed the article “HBase-0.20.0 Performance
Evaluation” by Anty Rao and Schubert Zhang, published on August 21,
2009 (available at
http://www.slideshare.net/schubertzhang/hbase-0200-performance-evaluation,
and also discussed on the HBase mailing list in the last several
months). Accordingly, I allocated a 4GB heap to each HBase-related
process and set “hbase.regionserver.handler.count” to “20”. One small
difference is that I used the HBase-managed ZooKeeper to manage the
HBase masters.

I then set up an HBase table with a 48-byte row key and one column
holding about 20 bytes of data. With a single client, I measured an
average of 0.6 milliseconds per row for random writes and 0.4
milliseconds per row for random reads.

For throughput testing, I then had each machine in the cluster launch
1, 2, or 3 client test applications, each reading or writing 100,000
rows per test run. Random writes performed best with 2 clients per
machine (2*13 = 26 clients in the cluster), at 8500 rows/second; random
reads reached about 35000 rows/second, with almost the same throughput
for 2 or 3 clients per machine.

Since the HBase table is designed to be accessed through a web service
for random reads and writes, the HTable instance is created with
default settings; that is, I did not customize
“setAutoFlush(false)”, “setWriteBufferSize(**)”, etc.

I also used HTablePool so that each test client application reuses
the same HTable instance, but I saw little change in the random write
numbers above with or without table pooling.

So my question is: following Google's original BigTable paper, should
random writes always be much faster than random reads? If so, which
HBase configuration parameters can I tune to improve random write
speed?

I also downloaded the PerformanceEvaluation.java patched by Schubert
Zhang (the link to the code is in the article mentioned above) and
used it to test my cluster as well. On my cluster, reading or writing
4,194,280 rows takes 274 seconds for random writes (15307 rows/second)
and 305 seconds for random reads (13751 rows/second). Note that
although random writes are still faster than random reads, the two
are almost comparable. For comparison, Schubert Zhang reports (in the
article mentioned above) 11366 rows/second for random writes and 4424
rows/second for random reads on his smaller test environment (1 master
and 4 slaves, 4 CPU cores and 8GB RAM per machine). That is, random
reads improved significantly with more machines in my cluster, but
random writes did not.
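
The comparison above can be recomputed with a few lines of Java. This is only an illustrative sketch (the class and method names are made up); the inputs are the row count and timings quoted in this message and the rates Schubert Zhang reports in the article.

```java
/** Recomputes the PerformanceEvaluation throughput figures quoted above. */
final class ScalingComparison {
    /** Rows per second for a run that processed {@code rows} in {@code seconds}. */
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    /** How many times higher {@code mine} is than {@code reference}. */
    static double speedup(double mine, double reference) {
        return mine / reference;
    }

    public static void main(String[] args) {
        long rows = 4_194_280L;
        double writeRate = rowsPerSecond(rows, 274); // 12 region servers, random write
        double readRate = rowsPerSecond(rows, 305);  // 12 region servers, random read

        // Figures reported for the 4-slave cluster in the article.
        double refWrite = 11_366, refRead = 4_424;

        System.out.printf("write: %.0f rows/s, %.2fx the 4-node figure%n",
                writeRate, speedup(writeRate, refWrite));
        System.out.printf("read:  %.0f rows/s, %.2fx the 4-node figure%n",
                readRate, speedup(readRate, refRead));
        System.out.printf("region servers: %.1fx%n", 12.0 / 4.0);
    }
}
```

With these inputs, writes come out at about 1.35 times the 4-node figure and reads at about 3.1 times, against 3 times as many region servers. The two clusters differ in more than the number of servers, so this is only a coarse comparison.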

Any comments or suggestions for improving random write performance
would be appreciated.

Regards,


Jun Li

Re: random write slower than random read in HBase 0.20.0

by Tatsuya Kawano

Hi Jun,

On Sat, Oct 24, 2009 at 10:56 AM, Jun Li wrote:
> I have a cluster of 13 Red Hat Linux machines (each with 4 CPU
> cores and 8GB RAM), configured as an HBase cluster of 1 master and
> 12 region servers running the HBase 0.20.0 code base.


> For comparison, Schubert Zhang reports (in the article mentioned
> above) 11366 rows/second for random writes and 4424 rows/second for
> random reads on his smaller test environment (1 master and 4 slaves,
> 4 CPU cores and 8GB RAM per machine). That is, random reads improved
> significantly with more machines in my cluster, but random writes
> did not.
>
> Any comments or suggestions for improving random write performance
> would be appreciated.


Did you also run the same test with 4 region servers, or some other
number smaller than 12?

Your reasoning assumes that random reads and random writes scale to
the same degree as you add region servers, but they probably do not.
HBase uses a block cache for reads, which could make random read
performance improve faster than random write performance.

With more region servers, you have more RAM for the block cache. Each
region server then reads blocks from disk less often, which could
dramatically improve random read performance. On the other hand,
every write involves a disk access to append to the WAL (write-ahead
log) on disk, so random write performance will improve more slowly
than random read performance.
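
The effect described above can be illustrated with a toy model: an LRU cache serving uniformly random reads over a fixed set of keys. This is a sketch under simplifying assumptions (a LinkedHashMap in access order, one entry per key, uniform access), not HBase's block cache; it only shows that the hit ratio tracks the fraction of the data that fits in memory, and that fraction grows as region servers, and therefore cache RAM, are added.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Toy LRU cache used to estimate hit ratios for uniformly random reads. */
final class LruHitRatio {
    static double simulate(int keySpace, int capacity, int reads, long seed) {
        // accessOrder=true keeps the least-recently-used entry first, so it is evicted first.
        Map<Integer, Boolean> cache = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                return size() > capacity;
            }
        };
        Random rnd = new Random(seed);
        int hits = 0;
        for (int i = 0; i < reads; i++) {
            int key = rnd.nextInt(keySpace);
            if (cache.get(key) != null) {
                hits++;               // served from memory
            } else {
                cache.put(key, true); // stands in for a disk read, then caches the block
            }
        }
        return (double) hits / reads;
    }

    public static void main(String[] args) {
        int keySpace = 10_000;
        for (int capacity : new int[] {1_000, 2_500, 5_000, 10_000}) {
            System.out.printf("cache holds %3d%% of data -> hit ratio %.2f%n",
                    100 * capacity / keySpace,
                    simulate(keySpace, capacity, 200_000, 42L));
        }
    }
}
```

Each miss in this model stands for a disk read. Writes, by contrast, append to the WAL whatever the cache size, which is why extra RAM is expected to help reads more than writes.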

So if you run the same test with fewer region servers, it could yield
performance similar to the "random reads" plot in the BigTable paper.
As you add more region servers, the block cache becomes more
effective, and the performance could move closer to the "random reads
(mem)" plot.

Well, I'm not totally sure I'm right; I've never had a chance to run
a large HBase cluster. So if you run your test on smaller and larger
clusters and share your results with us, that would be great.


As for improving write performance, I would suggest
“setAutoFlush(false)” and “setWriteBufferSize(**)” so that you can
send multiple records in a single flushCommits(). This should give you
an immediate performance boost. You could do this by updating your web
service to accept multiple puts in a single RPC.
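
Why batching helps can be shown without an HBase client at all. The sketch below is a hypothetical model (the class and its methods are made up for illustration and are not the HTable API): it counts round trips when every put is sent immediately (auto-flush on) versus when puts accumulate in a client-side buffer and are sent together, which is roughly the behaviour described above for setAutoFlush(false) with setWriteBufferSize(). The 2 MB buffer size in main is an assumed value.

```java
/**
 * Hypothetical model of a client-side write buffer that counts round trips (RPCs)
 * for a stream of puts. Not the HBase HTable API.
 */
final class WriteBufferModel {
    private final boolean autoFlush;
    private final long bufferBytes;
    private long pendingBytes = 0;
    private int pendingPuts = 0;
    private int rpcs = 0;

    WriteBufferModel(boolean autoFlush, long bufferBytes) {
        this.autoFlush = autoFlush;
        this.bufferBytes = bufferBytes;
    }

    void put(int sizeBytes) {
        pendingBytes += sizeBytes;
        pendingPuts++;
        // With auto-flush every put is sent at once; otherwise wait until the buffer fills.
        if (autoFlush || pendingBytes >= bufferBytes) {
            flushCommits();
        }
    }

    void flushCommits() {
        if (pendingPuts > 0) {
            rpcs++; // one round trip carries all pending puts
            pendingBytes = 0;
            pendingPuts = 0;
        }
    }

    int rpcs() { return rpcs; }

    public static void main(String[] args) {
        int rowBytes = 48 + 20; // 48-byte key + ~20-byte value, ignoring per-cell overhead
        for (boolean autoFlush : new boolean[] {true, false}) {
            WriteBufferModel table = new WriteBufferModel(autoFlush, 2 * 1024 * 1024);
            for (int i = 0; i < 100_000; i++) {
                table.put(rowBytes);
            }
            table.flushCommits(); // send whatever is left in the buffer
            System.out.println("autoFlush=" + autoFlush + " -> " + table.rpcs() + " RPCs");
        }
    }
}
```

With the assumed 2 MB buffer and 68-byte rows, the buffered run needs 4 round trips instead of 100,000. The model ignores per-cell overhead and the fact that a real flush may fan out to several region servers, so it only shows the direction of the effect.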

There should be other HBase parameters to optimize, but I'm not the
expert who can tell you how to tune them for your environment. I also
think it would be good to check how random read performance changes
across different cluster sizes **before** going crazy tweaking those
parameters.

Thanks,

--
Tatsuya Kawano (Mr.)
Tokyo, Japan

Re: random write slower than random read in HBase 0.20.0

by stack-3

What Tatsuya said, and also: how many regions are in your table? Were
the regions distributed evenly over the cluster, so that all machines
were participating in the test? Did you start over from scratch each
time? I'm wondering whether you had enough data for 12 servers.
St.Ack

Re: random write slower than random read in HBase 0.20.0

by Jun Li-5

Hi Michael and Tatsuya,

Thank you very much for your quick replies!

I realize that a newly created table starts with a single region on
one of the region servers; as I load more and more data, the regions
split and the new regions are assigned to other region servers. I did
the measurement only after the regions covered all the available
region servers in the cluster. Currently the table holds at least 14
million rows (as I write this email, the HBase shell is still counting
them).

I deliberately generate the row keys randomly (in my implementation, a
row key is the concatenation of 3 sections, each 16 bytes long: the hex
representation of a random 8-byte number). I just checked how the
regions are distributed across the region servers; the current
distribution is as follows (33 regions in total):

Region Server 1:  3
Region Server 2:  2
Region Server 3:  4
Region Server 4:  1
Region Server 5:  2
Region Server 6:  2
Region Server 7:  4
Region Server 8:  2
Region Server 9:  3
Region Server 10:  4
Region Server 11:  2
Region Server 12:  4

So the regions do not seem to be distributed very evenly. In fact, my
Ganglia monitor often shows uneven CPU usage across the machines.
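
To put a number on "not very evenly", the sketch below summarizes the region counts listed above. It assumes, as a rough approximation, that a server's load is proportional to the number of regions it hosts; the class name is illustrative.

```java
/** Summarizes how evenly the 33 regions listed above are spread over 12 region servers. */
final class RegionBalance {
    static double mean(int[] counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return (double) sum / counts.length;
    }

    static int max(int[] counts) {
        int m = Integer.MIN_VALUE;
        for (int c : counts) {
            m = Math.max(m, c);
        }
        return m;
    }

    public static void main(String[] args) {
        int[] regionsPerServer = {3, 2, 4, 1, 2, 2, 4, 2, 3, 4, 2, 4};
        double avg = mean(regionsPerServer); // 33 regions / 12 servers
        int busiest = max(regionsPerServer);
        // If load follows region count, this is how much more the busiest server carries.
        System.out.printf("mean %.2f, max %d, max/mean %.2f%n", avg, busiest, busiest / avg);
    }
}
```

Here the busiest servers host 4 regions against an average of 2.75, about 45% above average. A one-region difference is about 36% of the average at this scale, but would be a much smaller fraction if each server carried dozens of regions.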

Although the way I generate row keys is tied to the application I am
building, I can modify my measurement code for performance evaluation
purposes. So what would be a good way to randomly generate row keys so
that, at any given time, the rows are statistically evenly distributed
across the available region servers?
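
A minimal sketch of the key scheme described above (three sections, each the 16-hex-digit form of a random 8-byte number), together with a check of how evenly such keys spread over the sorted key space, bucketed by their first hex digit. The class and method names are hypothetical, and java.util.Random stands in for whatever generator the application actually uses.

```java
import java.util.Random;

/** Sketch of a 48-character random row key: three 16-hex-digit sections. */
final class RandomRowKey {
    static String next(Random rnd) {
        StringBuilder key = new StringBuilder(48);
        for (int section = 0; section < 3; section++) {
            // %016x prints a long as unsigned hex, zero-padded to exactly 16 digits.
            key.append(String.format("%016x", rnd.nextLong()));
        }
        return key.toString();
    }

    /** Counts generated keys by leading hex digit, i.e. by 1/16 slice of the sorted key space. */
    static int[] leadingDigitHistogram(int keys, long seed) {
        Random rnd = new Random(seed);
        int[] counts = new int[16];
        for (int i = 0; i < keys; i++) {
            counts[Character.digit(next(rnd).charAt(0), 16)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = leadingDigitHistogram(160_000, 7L);
        for (int digit = 0; digit < 16; digit++) {
            System.out.printf("%x: %d%n", digit, counts[digit]);
        }
    }
}
```

Each slice receives close to 1/16 of the keys, so random keys of this form already spread evenly over the sorted key space; how evenly the regions themselves are spread over servers is a separate question.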

What about HTablePool? In my measurements, I did not see any write
performance improvement from using a single HTable instance versus a
new HTable instance for every client request in each client test
application. Does HTablePool only help in a Java process (such as an
Axis web service) that has multiple concurrent threads accessing
HBase?

Regarding Tatsuya’s suggestion of packing multiple writes into a single
RPC: I am using HTable as the database tier of a 3-tier web application
that serves clients' online data access, not batch processing. I am not
sure how I can pack multiple records into a single flushCommits(), since
reads and writes are interleaved within one client session.

In your reply, do you mean “multiple puts” to the SAME table, or can
they span multiple tables? From my SQL-based programming experience, I
would guess it works only within a single table.

I can scale my cluster down to 4 region servers as Tatsuya suggested,
and I just acquired a 16-machine cluster (8 cores and 32 GB RAM per
machine), so I can also increase the number of region servers to 15
and see how performance changes. I will let you know my measurement
results in the next week or so.

I noticed that HBase 0.20.1 is out. Does it include any new features
that could help improve write performance?

Regards,

Jun


On Sat, Oct 24, 2009 at 9:38 AM, stack <stack@...> wrote:

> What Tatsuya said, and also: how many regions are in your table? Were
> the regions distributed evenly over the cluster, so that all machines
> were participating in the test? Did you start over from scratch each
> time? I'm wondering whether you had enough data for 12 servers.
> St.Ack

Re: random write slower than random read in HBase 0.20.0

by Jean-Daniel Cryans-2

Jun,

Some answers to your remarks inline.

J-D

On Sat, Oct 24, 2009 at 10:43 PM, Jun Li <jltz922181@...> wrote:

> Hi Michael and Tatsuya,
>
> Thank you very much for your quick replies!
>
> I realize that a newly created table starts with a single region on
> one of the region servers; as I load more and more data, the regions
> split and the new regions are assigned to other region servers. I did
> the measurement only after the regions covered all the available
> region servers in the cluster. Currently the table holds at least 14
> million rows (as I write this email, the HBase shell is still counting
> them).
>
> I deliberately generate the row keys randomly (in my implementation, a
> row key is the concatenation of 3 sections, each 16 bytes long: the hex
> representation of a random 8-byte number). I just checked how the
> regions are distributed across the region servers; the current
> distribution is as follows (33 regions in total):
>
> Region Server 1:  3
> Region Server 2:  2
> Region Server 3:  4
> Region Server 4:  1
> Region Server 5:  2
> Region Server 6:  2
> Region Server 7:  4
> Region Server 8:  2
> Region Server 9:  3
> Region Server 10:  4
> Region Server 11:  2
> Region Server 12:  4
>
> So the regions do not seem to be distributed very evenly. In fact, my
> Ganglia monitor often shows uneven CPU usage across the machines.

There is a sloppiness factor in the way regions are distributed, since
we don't want a major reassignment of regions every time there's a
split (a region that becomes too big is split into two new regions).
Also, you said you were doing a count while writing this? The shell
count is basically a scan, and since rows are partitioned into regions
by contiguous, sorted key ranges, the shell count only hits one region
at a time, and thus only one region server at a time. Do consider
using the provided RowCounter MapReduce job (see the mapreduce package
in the API javadoc).
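
The contrast between the shell count and RowCounter can be modeled in plain Java: the shell walks the regions one after another, whereas RowCounter, as a MapReduce job, processes regions in parallel tasks (one per region in this model). The sketch below is a hypothetical in-memory model (regions are just lists of row keys, and an ExecutorService stands in for MapReduce tasks); it does not use the HBase or Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical model contrasting a sequential scan count with a per-region parallel count. */
final class RegionCount {
    /** Shell-style count: regions are visited in key order, one at a time. */
    static long sequentialCount(List<List<String>> regions) {
        long total = 0;
        for (List<String> region : regions) {
            total += region.size(); // only the server hosting this region is busy
        }
        return total;
    }

    /** RowCounter-style count: one task per region, partial counts summed at the end. */
    static long parallelCount(List<List<String>> regions, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> partials = new ArrayList<>();
            for (List<String> region : regions) {
                partials.add(pool.submit(() -> region.size()));
            }
            long total = 0;
            for (Future<Integer> partial : partials) {
                total += partial.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> regions = new ArrayList<>();
        for (int r = 0; r < 33; r++) {
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < 1_000 + r; i++) {
                rows.add(String.format("%02d-%06d", r, i));
            }
            regions.add(rows);
        }
        System.out.println(sequentialCount(regions) + " / " + parallelCount(regions, 12));
    }
}
```

Both versions return the same total; the practical difference is that the parallel one keeps many region servers busy at once, which an in-memory model cannot show.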

>
> Although the way I generate row keys is tied to the application I am
> building, I can modify my measurement code for performance evaluation
> purposes. So what would be a good way to randomly generate row keys so
> that, at any given time, the rows are statistically evenly distributed
> across the available region servers?

Well, that's pretty much already the case, isn't it? But keep in mind
that you cannot get exactly the same number of rows on each region
server. In your case, with 14M rows on 12 region servers, a slightly
uneven distribution seems to hit you hard, but on the same setup with
10x as many rows the difference would be smaller.

>
> What about HTablePool? In my measurements, I did not see any write
> performance improvement from using a single HTable instance versus a
> new HTable instance for every client request in each client test
> application. Does HTablePool only help in a Java process (such as an
> Axis web service) that has multiple concurrent threads accessing
> HBase?

Yes.
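
A minimal sketch of why a pool only pays off with concurrent callers, under the assumption that a table handle should not be shared between threads. The generic SimplePool below is hypothetical and is not HBase's HTablePool: each thread borrows a handle, uses it, and returns it, so a single-threaded client keeps reusing one handle, which is equivalent to holding a single HTable instance.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical bounded pool of non-thread-safe handles (not the HBase HTablePool API). */
final class SimplePool<T> {
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;
    private final int maxIdle;
    private final AtomicInteger created = new AtomicInteger();

    SimplePool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    /** Reuse an idle handle if there is one, otherwise create a new one. */
    T borrow() {
        T handle = idle.poll();
        if (handle == null) {
            created.incrementAndGet();
            handle = factory.get();
        }
        return handle;
    }

    /** Return a handle; keep at most about maxIdle (size() is only approximate under contention). */
    void release(T handle) {
        if (idle.size() < maxIdle) {
            idle.offer(handle);
        }
    }

    int created() { return created.get(); }

    public static void main(String[] args) throws InterruptedException {
        SimplePool<StringBuilder> pool = new SimplePool<>(StringBuilder::new, 8);
        Runnable request = () -> {
            for (int i = 0; i < 1_000; i++) {
                StringBuilder handle = pool.borrow(); // stands in for a table handle
                handle.setLength(0);
                pool.release(handle);
            }
        };
        Thread t1 = new Thread(request), t2 = new Thread(request);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("handles created: " + pool.created());
    }
}
```

With one thread, borrow and release cycle through a single handle; only when several request threads overlap does the pool create and keep more than one.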

>
> Regarding Tatsuya’s suggestion of packing multiple writes into a single
> RPC: I am using HTable as the database tier of a 3-tier web application
> that serves clients' online data access, not batch processing. I am not
> sure how I can pack multiple records into a single flushCommits(), since
> reads and writes are interleaved within one client session.

You are right, using the write buffer really only makes sense in a batch job.

>
> In your reply, do you mean “multiple puts” to the SAME table, or can
> they span multiple tables? From my SQL-based programming experience, I
> would guess it works only within a single table.

Yes.

>
> I can scale my cluster down to 4 region servers as Tatsuya suggested,
> and I just acquired a 16-machine cluster (8 cores and 32 GB RAM per
> machine), so I can also increase the number of region servers to 15
> and see how performance changes. I will let you know my measurement
> results in the next week or so.

I think that before anything else, you should get more data. In most
of my comments and in the others' previous replies, we are all saying
that you probably have more region servers than you need. It may seem
weird, but HBase performs better when each region server holds an
average of at least 40-50 regions.
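
As a back-of-the-envelope estimate (not something J-D stated), assume new regions end up about the same size as the current ones, i.e. roughly 14 million rows over 33 regions. The row count needed for 40-50 regions on each of 12 region servers then follows by simple scaling; the class name is illustrative.

```java
/** Back-of-the-envelope: rows needed to reach a target number of regions per server. */
final class RegionTarget {
    static long rowsNeeded(long currentRows, int currentRegions, int servers, int regionsPerServer) {
        // Assumes future regions hold about as many rows as the current average region.
        double rowsPerRegion = (double) currentRows / currentRegions;
        return Math.round(rowsPerRegion * servers * regionsPerServer);
    }

    public static void main(String[] args) {
        for (int target : new int[] {40, 50}) {
            System.out.printf("%d regions/server -> about %,d rows%n",
                    target, rowsNeeded(14_000_000L, 33, 12, target));
        }
    }
}
```

Under that assumption the answer is roughly 200-250 million rows of this size. The 14 million figure is a lower bound and region size depends on row size, so treat this as an order-of-magnitude figure only.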

>
> I noticed that HBase 0.20.1 is out. Does it include any new features
> that could help improve write performance?

Mostly bug fixes. And BTW, you can set hfile.block.cache.size to 0 if
you want to disable caching; in your case I'm sure HBase is able to
cache everything, thus screwing your numbers ;)

>
> Regards,
>
> Jun

Re: random write slower than random read in HBase 0.20.0

by stack-3

On Fri, Oct 23, 2009 at 6:56 PM, Jun Li <jltz922181@...> wrote:

> ...
> I then set up an HBase table with a 48-byte row key and one column
> holding about 20 bytes of data. With a single client, I measured an
> average of 0.6 milliseconds per row for random writes and 0.4
> milliseconds per row for random reads.
>
> For throughput testing, I then had each machine in the cluster launch
> 1, 2, or 3 client test applications, each reading or writing 100,000
> rows per test run. Random writes performed best with 2 clients per
> machine (2*13 = 26 clients in the cluster), at 8500 rows/second; random
> reads reached about 35000 rows/second, with almost the same throughput
> for 2 or 3 clients per machine.
> ...
>

A single client gives you 0.6ms per random write and 0.4ms per random
read? That's not bad. Random writes are slower because they append to
the WAL. The random reads must be coming from cache; otherwise I'd
expect them to take milliseconds (disk seek).

Is 8500 rows/second across the whole cluster? If each random write
took 1ms, you should be doing roughly 1.5 times this rate over the
cluster (if your writes are not batched): 1000 writes/second per
client * 13 = 13,000 writes/second.

What kind of numbers are you looking for, Jun?



> So my question is: following Google's original BigTable paper, should
> random writes always be much faster than random reads?


Random writes should be faster than random reads unless a good portion
of your dataset fits in cache (a random read involves a disk seek on a
cache miss; a random write appends to a file, which usually does not
involve a disk seek).



> If so, which HBase configuration parameters can I tune to improve
> random write speed?
>
>
It looks like batching won't help in your case because there is no locality
in your keying.

St.Ack

Re: random write slower than random read in HBase 0.20.0

by Jun Li-5

Hi J-D and Michael,

Following your insightful suggestions, I will try setting
hfile.block.cache.size to 0 and also getting the region servers to
hold many more regions.  To get to 40-50 regions per server, I would
need a huge data set to fill my cluster.  From your experience, what
kind of data do people have when they run 40-50 regions per region
server?  In particular, what is the row size?  Also, has anyone tried
reducing the region split threshold from the default (64MB, I think)
to a smaller size, so that regions split sooner and thus give better
concurrency?
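The data volume needed for a target region count can be roughed out from the split threshold (the `hbase.hregion.max.filesize` setting; check the value in your own configuration rather than relying on the 64MB guess). This sketch assumes each region holds at most about one threshold's worth of data, since freshly split regions hold roughly half, so the byte figure is an upper estimate. The row count uses the raw 48-byte key plus ~20-byte value and ignores per-cell storage overhead, so it is also an upper bound. Names are hypothetical.

```java
// Hypothetical sizing sketch: data needed before a table splits into a target
// number of regions per server, assuming each region holds at most about
// splitThresholdBytes (freshly split regions hold roughly half of that).
public class RegionSizing {

    /** Upper estimate of total bytes needed for the target region count. */
    static long bytesNeeded(int regionsPerServer, int regionServers, long splitThresholdBytes) {
        return (long) regionsPerServer * regionServers * splitThresholdBytes;
    }

    /** Upper bound on row count: ignores per-cell overhead, so real rows take more space. */
    static long rowsNeeded(long totalBytes, int rawBytesPerRow) {
        return totalBytes / rawBytesPerRow;
    }

    public static void main(String[] args) {
        long splitThreshold = 64L * 1024 * 1024;           // the 64MB figure from the message
        long bytes = bytesNeeded(45, 12, splitThreshold);  // 45 regions x 12 region servers
        long rows = rowsNeeded(bytes, 48 + 20);            // 48-byte key + ~20-byte value
        System.out.printf("~%.2f GiB of data, up to ~%d rows%n",
                bytes / (1024.0 * 1024 * 1024), rows);
    }
}
```

With these inputs the sketch lands in the tens of gigabytes and hundreds of millions of small rows, which is why reaching 40-50 regions per server takes a large data set unless the split threshold is lowered.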

Just to answer Michael's question about the performance measurement
that I reported: 0.6ms is the latency number.  I measured it with only
a single client running in the entire cluster doing reads/writes.

For the throughput measurement, though, I ran 2 client test applications
on every machine.  Thus, I had 2*13=26 client application instances
running in the cluster, concurrently reading/writing to the HBase
cluster.  To finish the same read/write task, each client's average
latency climbs to about 3 ms (because it has to compete with the other
clients).  That is, roughly, (1/0.003)*2*13 ≈ 8670 calls/sec for the
entire cluster.  To be more accurate, I also collected the elapsed time
for all the clients to finish their work; a particular round took
5 minutes and 3 seconds, which translates to:

      2*100000*13/(5*60+3)=8580 calls/sec.

The two numbers agree well because all the clients were launched
almost simultaneously and finished their jobs at almost the same
time as well.
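The two estimates above can be reproduced side by side. This is a minimal sketch assuming, as the message does, that all 26 clients run concurrently for the whole round; the class and method names are hypothetical. The latency-based figure works out to about 8,667 calls/sec and the elapsed-time figure to about 8,581 calls/sec.

```java
// Reproduces the latency-based and elapsed-time-based throughput estimates,
// assuming every client runs concurrently for the entire round.
public class ClusterThroughput {

    /** Each client completes 1/latencySec calls per second; multiply by the client count. */
    static double fromLatency(double latencySec, int clientsPerMachine, int machines) {
        return (1.0 / latencySec) * clientsPerMachine * machines;
    }

    /** Total calls in the round divided by the wall-clock seconds the round took. */
    static double fromElapsed(long rowsPerClient, int clientsPerMachine, int machines, long elapsedSec) {
        long totalCalls = rowsPerClient * clientsPerMachine * machines;
        return (double) totalCalls / elapsedSec;
    }

    public static void main(String[] args) {
        double byLatency = fromLatency(0.003, 2, 13);               // ~3 ms per call under load
        double byElapsed = fromElapsed(100_000, 2, 13, 5 * 60 + 3); // 5 min 3 s for the round
        System.out.printf("latency-based: %.0f calls/s%n", byLatency);
        System.out.printf("elapsed-based: %.0f calls/s%n", byElapsed);
    }
}
```

If some clients started late or finished early, the elapsed-time figure would drop below the latency-based one, so close agreement is a useful sanity check that the clients really overlapped.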

For the next round of performance testing, I could scale up to a
16-machine cluster with 8 cores and 32GB of RAM per machine.  I am
curious what other people have done or observed regarding the linear
scalability of the current implementation, HBase 0.20.0.


Regards,

Jun


On Sun, Oct 25, 2009 at 10:05 AM, stack <stack@...> wrote:

> ...