Hbase can we insert such (inside) data faster?

by Dmitriy Lyfar

Hello,

We are currently testing Hadoop + HBase (0.20.1). The test machines have the
following configuration:
VMware
4-core Intel Xeon, 2.27GHz
Two HBase nodes (one master and one regionserver), 6GB RAM each.

The table has the following definition:

12-byte string as the row key
Column family C1 with 3 qualifiers: q1, q2, q3 (about 200 bytes per record)
Column family C2 with 2 qualifiers: q1, q2 (about 2-4KB per record)

I've implemented a simple Java utility which parses our data source and
inserts the results into HBase (write buffer is 12MB, autoflush off).
We got the following results:
~450K records ~= 4GB of data.
Total insertion time is about 600-650 seconds, i.e. ~7 MB/second, or 675 rows
per second, or 2ms per row.

So the question is: is this time OK for such hardware, or did I miss
something important?
Thank you.

Regards, Dmitriy.
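
The throughput figures quoted in this message can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the class and method names are made up for this example, and it assumes 1 GB = 1024 MB and that "~450K records" means 450,000 rows.

```java
public class InsertThroughput {
    /** Rows written per second of wall-clock time. */
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    /** Megabytes written per second, assuming 1 GB = 1024 MB. */
    static double megabytesPerSecond(double gigabytes, double seconds) {
        return gigabytes * 1024.0 / seconds;
    }

    /** Average wall-clock milliseconds spent per row. */
    static double millisPerRow(long rows, double seconds) {
        return seconds * 1000.0 / rows;
    }

    public static void main(String[] args) {
        long rows = 450_000;   // "~450K records" from the message (assumed exact)
        double gigabytes = 4.0; // "~4GB of data" from the message

        // Evaluate both ends of the reported 600-650 second range.
        for (double seconds : new double[] {600.0, 650.0}) {
            System.out.printf("%.0f s: %.0f rows/s, %.2f MB/s, %.2f ms/row%n",
                    seconds,
                    rowsPerSecond(rows, seconds),
                    megabytesPerSecond(gigabytes, seconds),
                    millisPerRow(rows, seconds));
        }
    }
}
```

Under these assumptions the reported range works out to roughly 690-750 rows per second, 6.3-6.8 MB/s and 1.3-1.4 ms per row, so the "2ms per row" in the message looks like a round-up rather than a measured value.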

Re: Hbase can we insert such (inside) data faster?

by Amandeep Khurana

This is slow. We get about 4k inserts per second per region server with a row
size of about 30kB. Running on VMware could be causing the slowdown.

Amandeep

On Mon, Oct 26, 2009 at 2:04 AM, Dmitriy Lyfar <dlyfar@...> wrote:

> Hello,
>
> We are currently testing Hadoop + HBase (0.20.1). The test machines have the
> following configuration:
> VMware
> 4-core Intel Xeon, 2.27GHz
> Two HBase nodes (one master and one regionserver), 6GB RAM each.
>
> The table has the following definition:
>
> 12-byte string as the row key
> Column family C1 with 3 qualifiers: q1, q2, q3 (about 200 bytes per record)
> Column family C2 with 2 qualifiers: q1, q2 (about 2-4KB per record)
>
> I've implemented a simple Java utility which parses our data source and
> inserts the results into HBase (write buffer is 12MB, autoflush off).
> We got the following results:
> ~450K records ~= 4GB of data.
> Total insertion time is about 600-650 seconds, i.e. ~7 MB/second, or 675 rows
> per second, or 2ms per row.
>
> So the question is: is this time OK for such hardware, or did I miss
> something important?
> Thank you.
>
> Regards, Dmitriy.
>

Re: Hbase can we insert such (inside) data faster?

by Dmitriy Lyfar

Hi Amandeep,

Thank you. I also forgot to mention that ZooKeeper is managed by HBase on
both nodes, and the quorum consists of two ZooKeepers per node.
Could you tell me how many ZooKeepers I should have for this configuration,
and what the usual setup is?
BTW, which hard disks did you use?

2009/10/26 Amandeep Khurana <amansk@...>

> This is slow. We get about 4k inserts per second per region server with a row
> size of about 30kB. Running on VMware could be causing the slowdown.
>
> Amandeep

--
Regards, Lyfar Dmitriy
mailto: dlyfar@...
jabber: dlyfar@...

Re: Hbase can we insert such (inside) data faster?

by Amandeep Khurana

1. You need an odd number of servers for the ZK quorum; 3-5 should be good
enough. In your case, even 1 is fine since the load is light.
2. We used 7200rpm SATA drives.
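
The odd-number advice in point 1 follows from ZooKeeper needing a strict majority of the ensemble to be up. A minimal sketch of that arithmetic (the class and method names are illustrative only and are not part of any ZooKeeper or HBase API):

```java
public class QuorumMath {
    /** Smallest number of servers that forms a strict majority. */
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** How many servers can fail while a majority is still available. */
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }

    public static void main(String[] args) {
        // Compare odd and even ensemble sizes side by side.
        for (int n = 1; n <= 5; n++) {
            System.out.printf("ensemble=%d majority=%d tolerated failures=%d%n",
                    n, majority(n), toleratedFailures(n));
        }
    }
}
```

An ensemble of 2 tolerates no failures, the same as an ensemble of 1, and an ensemble of 4 tolerates only one failure, the same as 3. An even size therefore adds a server without adding fault tolerance.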

On Mon, Oct 26, 2009 at 2:57 AM, Dmitriy Lyfar <dlyfar@...> wrote:

> Hi Amandeep,
>
> Thank you. I also forgot to mention that ZooKeeper is managed by HBase on
> both nodes, and the quorum consists of two ZooKeepers per node.
> Could you tell me how many ZooKeepers I should have for this configuration,
> and what the usual setup is?
> BTW, which hard disks did you use?

Re: Hbase can we insert such (inside) data faster?

by Jonathan Gray-2

Dmitriy,

Are you using any system/resource monitoring software? You should be
able to see whether you are IO, CPU, Memory/GC, or Network bound by doing
some investigating during the import. This should tell you whether you can
get better performance (and if something is maxed out, you can identify
the bottleneck and try to optimize).

Also, if you are doing an import into a new table, you could use
HFileOutputFormat. In my benchmarking, I saw about a 10X improvement in
performance compared to a heavily optimized normal import. Check out
HBASE-48 for more information.

JG
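
As one way to act on the Memory/GC part of the first suggestion, the import utility itself can report how much of its run time went to garbage collection, using the standard `java.lang.management` beans. This is a rough sketch under assumptions: the class name, the helper methods and the placeholder workload are hypothetical, and it only observes the JVM it runs in (the client doing the import), not the region server.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcShare {
    /** Total milliseconds this JVM has spent in garbage collection so far. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of a wall-clock interval spent in GC, clamped to [0, 1]. */
    static double gcFraction(long gcMillis, long wallMillis) {
        if (wallMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, Math.max(0.0, (double) gcMillis / wallMillis));
    }

    public static void main(String[] args) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();

        // Placeholder workload standing in for the real import loop (hypothetical).
        long sink = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] row = new byte[4096];
            sink += row.length;
        }

        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = totalGcMillis() - gcBefore;
        System.out.printf("allocated %d bytes, wall=%d ms, gc=%d ms, gc share=%.1f%%%n",
                sink, wallMillis, gcMillis, 100.0 * gcFraction(gcMillis, wallMillis));
    }
}
```

A consistently high GC share on the client would point at the loader rather than the cluster. IO, CPU and network saturation still need OS-level tools on the region server host.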

Amandeep Khurana wrote:

> 1. You need an odd number of servers for the ZK quorum; 3-5 should be good
> enough. In your case, even 1 is fine since the load is light.
> 2. We used 7200rpm SATA drives.