Rest, Stargate or Thrift ?

13 Messages

Rest, Stargate or Thrift ?

by Joost Ouwerkerk

Language-independent RPC services for HBase: is there consensus on
preference?  Architecturally, I imagine they are equivalent.
Performance-wise, I imagine Thrift is the most efficient.  As for support and
maintenance by core HBase contributors, is Thrift also favored?

jo.

Re: Rest, Stargate or Thrift ?

by Kevin Peterson

On Wed, Nov 4, 2009 at 5:10 PM, Joost Ouwerkerk <joost@...> wrote:

> Language-independent RPC services for HBase: is there consensus on
> preference?  Architecturally, I imagine they are equivalent.
> Performance-wise, I imagine Thrift is most optimal.  As for support and
> maintenance by core HBase contributors, is Thrift also privileged?
>
>
Yes, use Thrift.

I was not able to get Stargate or the legacy REST API working acceptably at
the time 0.20 was released. I don't remember exactly what my problems were; I
think Stargate doesn't yet support scanners, and I couldn't find a working
Ruby client for REST.

The only downside we've had is that our developers on Windows can't seem to
get the Thrift Ruby gem installed.

I don't see what Stargate could offer that would get sites like StumbleUpon
to migrate away from Thrift, so even if Stargate improves, Thrift support
will likely continue.

Re: Rest, Stargate or Thrift ?

by Jean-Daniel Cryans

The new REST API called Stargate is currently considered not yet
production-ready by its author, committer Andrew Purtell. I'll let
him expand further. If there's a bug or a missing feature, the best
thing to do is to write to the list and/or open a JIRA issue.

The Thrift API is lagging in features compared to the current Java
API. There's currently no official support from any core dev. At SU
(StumbleUpon) we use it, but we are still considering other options; the
main problem is that the implementations in some languages are buggy and
hard to work with.

Both APIs have little overhead since they act as thin clients over the
fat Java client. I would consider running the API servers directly on
the client machines and having the application code connect to them on
localhost rather than going over the network.

J-D

On Wed, Nov 4, 2009 at 6:06 PM, Kevin Peterson <kpeterson@...> wrote:

Impromptu HBase survey

by Greg Cottman

Hi everyone,

I was mulling over the question from Jason Strutz at Cumulus Code earlier, and wondering what sort of data he was storing in HBase.  Then it occurred to me that this question may have broad appeal to many HBase users who are interested in "what" other developers are doing, as opposed to the usual "how" questions.

To this end, I would invite people who feel like sharing to give me a paragraph or two on what they are doing with HBase.  Of course, I don't want anyone to give away their eleven secret herbs and spices or tell me what Ingredient X is.  :-)  I am more interested in metadata and semantics.

To give you an idea of questions that I wonder about:

*        Are you using a natural or synthetic key?

*        Are you using HBase index tables or maintaining your own?

*        Do you have multiple data tables in your HBase server?

*        How many rows of data are in each HBase table?

*        What type of data are you storing in each record?

*        Are you using column families to localize data or store name/value pairs?

*        Are there columns like name, address, etc., that are present in each row?

*        Are you running HBase on your own servers or on Amazon EC2?

*        Are you using Hadoop to run map/reduce functions against HBase?

*        How does your client interact with HBase?  Java API, REST, Stargate, Thrift, other (please specify), etc.

Anyone who is interested in responding can do so to the list or directly to me.  I will keep your responses but not your name or company.  Feel free to answer some or all of the questions, or add your own information that you feel is pertinent to how you are using HBase.  I will give it a week and then collate the responses into an integrated summary that I will publish back to this list.

I should declare that I have no official HBase standing.  I'm just very curious about NoSQL databases as an emerging technology, and HBase in particular.  The 'net shows a general consensus that HBase is an early NoSQL leader, but no one discusses specifics.  Some empirical data would be very interesting.

Thanks in advance,

Greg.

Greg Cottman
Technical Architect
Quest Software, Australia
Tel: +61 3 9811 8057

Re: Impromptu HBase survey

by TimRobertson100

We don't run HBase in production yet, but we are researching it with the
goal of getting there...

> To give you an idea of questions that I wonder about:
>
> *        Are you using a natural or synthetic key?

- synthetic.  UUID, but considering an encoded UUID to shorten it.
Would like to see some KeyUtils classes in the HBase library, or some
recommendations.  I'd ideally like an 8-character synthetic key, but
haven't found a good way to do this yet (lack of time).
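A minimal Python sketch of the "encoded UUID" idea, assuming URL-safe base64 over the UUID's 16 raw bytes. The helper names are hypothetical and are not an existing HBase class such as the KeyUtils wished for above. Base64 shortens the 36-character hex form to 22 characters. It also shows why an 8-character key is hard: 8 base64 characters carry only 48 bits, and by the birthday bound about n(n-1)/2 / 2^48 colliding pairs are expected among n random keys, roughly 71 for 200 million rows.

```python
import base64
import uuid


def short_uuid_key(u: uuid.UUID) -> str:
    """Encode a UUID's 16 raw bytes as a 22-character URL-safe base64 key."""
    # 16 bytes encode to 24 base64 characters, the last two being '=' padding.
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def uuid_from_key(key: str) -> uuid.UUID:
    """Invert short_uuid_key by restoring the stripped '=' padding."""
    padded = key + "=" * (-len(key) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


def expected_collisions(n_keys: int, key_bits: int) -> float:
    """Birthday-bound estimate of colliding pairs among n random keys."""
    return n_keys * (n_keys - 1) / 2 / 2 ** key_bits


if __name__ == "__main__":
    u = uuid.uuid4()
    key = short_uuid_key(u)
    print(len(str(u)), "->", len(key), key)
    assert uuid_from_key(key) == u
    # An 8-character base64 key holds 8 * 6 = 48 bits.
    print(expected_collisions(200_000_000, 8 * 6))
```

Getting down to 8 characters would therefore mean giving up most of the UUID's 128 bits and accepting, or explicitly checking for, collisions.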

> *        Are you using HBase index tables or maintaining your own?

- Lucene, but will use HBase index tables

> *        Do you have multiple data tables in your HBase server?

- yes, but only for the convenience of keeping the 2 small tables alongside the big one.

> *        How many rows of data are in each HBase table?

- 200 million.  Once operational, we expect it to grow at 5-10%/month, and
expect columns to grow at 10% or so per month as well.
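To make these growth figures concrete, a small sketch assuming the 5-10% monthly growth compounds (the message does not say whether it is compound or linear), with a hypothetical helper name:

```python
def project_rows(initial_rows: float, monthly_rate: float, months: int) -> float:
    """Project a row count assuming growth compounds monthly at a fixed rate."""
    return initial_rows * (1.0 + monthly_rate) ** months


if __name__ == "__main__":
    for rate in (0.05, 0.10):
        rows = project_rows(200_000_000, rate, 12)
        print(f"{rate:.0%}/month -> {rows / 1e6:,.0f} million rows after 12 months")
```

Under that assumption, 200 million rows become roughly 359 million after a year at 5% per month and roughly 628 million at 10% per month; the same formula applies to the projected column growth.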

> *        What type of data are you storing in each record?

- 30-60 fields of INT and String
- might be storing PNGs in a new table to represent Google Maps tiles


> *        Are you using column families to localize data or store name/value pairs?

- no

> *        Are there columns like name, address, etc., that are present in each row?

- no (http://rs.tdwg.org/dwc/terms/index.htm is our term vocabulary)

> *        Are you running HBase on your own servers or on Amazon EC2?

- in-house 10-node cluster with 3 masters
(http://code.google.com/p/biodiversity/wiki/ClusterSetup)


> *        Are you using Hadoop to run map/reduce functions against HBase?

- progressing towards this.  Still using text-file exports from a
MySQL DB in Hadoop, as HBase is not in production yet


> *        How does your client interact with HBase?  Java API, REST, Stargate, Thrift, other (please specify), etc.

- Java API

Re: Rest, Stargate or Thrift ?

by Andrew Purtell

Stargate (REST HTTP) is not going to replace Thrift.

Stargate does support scanners.

   - Andy




________________________________
From: Kevin Peterson <kpeterson@...>
To: hbase-user@...
Sent: Thu, November 5, 2009 10:06:36 AM
Subject: Re: Rest, Stargate or Thrift ?

Re: Rest, Stargate or Thrift ?

by Andrew Purtell

Stargate in 0.20 is experimental. This is because there is little experience with it. If you try it out, whether or not you succeed, a note on your experiences and observations would be most helpful. Use the version in HBase 0.20.1+.

For HBase 0.21, Stargate will be "production ready". See http://su.pr/3g5P7B for open issues in that regard.

    - Andy




________________________________
From: Jean-Daniel Cryans <jdcryans@...>
To: hbase-user@...
Sent: Thu, November 5, 2009 12:55:49 PM
Subject: Re: Rest, Stargate or Thrift ?

Re: Rest, Stargate or Thrift ?

by Sylvain Hellegouarch


> Language-independent RPC services for HBase: is there consensus on
> preference?  Architecturally, I imagine they are equivalent.
> Performance-wise, I imagine Thrift is most optimal.  As for support and
> maintenance by core HBase contributors, is Thrift also privileged?
>
> jo.
>

Thrift is probably the most used in the field, but then again that's a
complete guess at this point, as there are no official statements on the
Hadoop web site (or perhaps they are lost within the arcana of the beast) about
what is officially supported or not, and by whom. Could you also
consider Avro for that task?

It seems newcomers to Hadoop/HBase have to go through the same steps of
"try and see for your own business case". Learning the hard way is
sometimes the best bet.

- Sylvain
--
Sylvain Hellegouarch
http://www.defuze.org

RE: Impromptu HBase survey

by Wim Van Leuven (highestpoint)

Equally interesting would be to know what type of data you are storing. I
mean, if I store web-crawled data in my HBase, it doesn't matter if I miss or
lose some pages, does it? I'll crawl them again next time.
Is the data, or every element of it, business critical? Is it derived
from some other source? Aggregated data? Or do we store traditional online
transactions?



-----Original Message-----
From: Tim Robertson [mailto:timrobertson100@...]
Sent: Thursday, 5 November 2009 11:14
To: hbase-user@...
Subject: Re: Impromptu HBase survey


Re: Rest, Stargate or Thrift ?

by Andrew Purtell

On a related note, HBASE-1015 is about a native C/C++ client library to improve the client-side integration story. What falls out of this is the replacement of HRPC with some platform-independent RPC for direct access to the master and regionserver interfaces. The options there, I think, are to use Thrift or to wait for Avro. The discussion of one always brings up the other.

   - Andy




________________________________
From: Sylvain Hellegouarch <sh@...>
To: hbase-user@...
Cc: hbase-user@...
Sent: Thu, November 5, 2009 10:08:49 PM
Subject: Re: Rest, Stargate or Thrift ?

Re: Impromptu HBase survey

by stack

On Wed, Nov 4, 2009 at 10:39 PM, Greg Cottman <greg.cottman@...> wrote:

> Hi everyone,
>
> To this end, I would invite people who feel like sharing to give me a
> paragraph or two on what they are doing with HBase.  Of course, I don't want
> anyone to give away their eleven secret herbs and spices or tell me what
> Ingredient X is.  :-)  I am more interested in metadata and semantics.
>
>
> To give you an idea of questions that I wonder about:
>
> *        Are you using a natural or synthetic key?
>
Keys are URLs that have been MD5'd, so there is a good spread across the
namespace, and then base64'd (I don't know why the latter is done).
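A minimal Python sketch of the key scheme just described, assuming standard base64 over the raw 16-byte MD5 digest (the function name is hypothetical). One plausible reason for the base64 step is that raw digest bytes are often unprintable, whereas the encoded form is plain ASCII that is easy to log and to pass through text-oriented interfaces such as REST; the cost is 24 bytes per key instead of 16.

```python
import base64
import hashlib


def url_row_key(url: str) -> str:
    """MD5 the URL (for an even spread of keys), then base64 the digest."""
    digest = hashlib.md5(url.encode("utf-8")).digest()  # 16 raw bytes
    return base64.b64encode(digest).decode("ascii")  # 24 ASCII characters


if __name__ == "__main__":
    # Similar URLs hash to unrelated keys, so rows for pages crawled in
    # sequence tend to spread across the key space instead of clustering.
    for url in ("http://en.wikipedia.org/wiki/Apache",
                "http://en.wikipedia.org/wiki/Apache_HBase"):
        print(url_row_key(url), url)
```

Standard base64 output can contain '/' and '+'; if such keys ever end up in a REST URL path, '/' would need escaping, which base64.urlsafe_b64encode avoids.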




> *        Are you using HBase index tables or maintaining your own?
>
No





> *        Do you have multiple data tables in your HBase server?
>
>
Yes.  About 50 tables.



> *        How many rows of data are in each HBase table?
>
>
Between 3 and 20 million rows in each.



> *        What type of data are you storing in each record?
>
>
Some of the tables have Wikipedia content and then derivatives: mimetypes,
inlinks, alternate URLs, etc.  Other tables hold other indexing
pipeline input and intermediate product.



> *        Are you using column families to localize data or store name/value
> pairs?
>
>
Both.
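The "Both." above refers to two common uses of column families. What follows is a hypothetical in-memory sketch in plain Python dicts, not the HBase client API, with timestamps and versions omitted. It shows the two patterns side by side. One is a family holding a fixed set of columns that are usually read together, which localizes them because HBase keeps each column family's data in separate store files. The other is a family whose qualifiers are themselves data, used as open-ended name/value pairs. The row, family and column names are illustrative only, loosely based on the Wikipedia derivatives mentioned above.

```python
from typing import Dict

# Hypothetical stand-in for HBase's logical layout:
# row key -> "family:qualifier" -> value (timestamps/versions omitted).
Row = Dict[str, bytes]


def put(table: Dict[str, Row], row: str, family: str, qualifier: str, value: bytes) -> None:
    """Store one cell under the column name 'family:qualifier'."""
    table.setdefault(row, {})[f"{family}:{qualifier}"] = value


def family_cells(row: Row, family: str) -> Dict[str, bytes]:
    """Return one family's cells keyed by qualifier."""
    prefix = family + ":"
    return {col[len(prefix):]: val for col, val in row.items() if col.startswith(prefix)}


pages: Dict[str, Row] = {}
# Localizing: a fixed set of columns that are usually read together.
put(pages, "page1", "content", "html", b"<html>...</html>")
put(pages, "page1", "content", "mimetype", b"text/html")
# Name/value pairs: the qualifier is data (one column per inbound link).
put(pages, "page1", "inlinks", "http://a.example/", b"anchor text A")
put(pages, "page1", "inlinks", "http://b.example/", b"anchor text B")

if __name__ == "__main__":
    print(sorted(family_cells(pages["page1"], "inlinks")))
```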



> *        Are there columns like name, address, etc., that are present in
> each row?
>
>
Sort of.



> *        Are you running HBase on your own servers or on Amazon EC2?
>
Own.  Between 100 and 110 servers.



> *        Are you using Hadoop to run map/reduce functions against HBase?
>
Yes.



> *        How does your client interact with HBase?  Java API, REST,
> Stargate, Thrift, other (please specify), etc.
>
>
Java, REST and Thrift.

The above responses describe our cluster as it was six months ago.

St.Ack


RE: Impromptu HBase survey

by Greg Cottman


OK.  I'll just move this e-mail to the "Torpid Disinterest" folder.  :'-(

Thanks to the three people who replied, but I don't think we have enough for a statistically sound sample.

-----Original Message-----
From: Greg Cottman [mailto:greg.cottman@...]
Sent: Thursday, 5 November 2009 5:39 PM
To: hbase-user@...
Subject: Impromptu HBase survey

Re: Impromptu HBase survey

by Steven Noels

On Wed, Nov 18, 2009 at 5:43 AM, Greg Cottman <greg.cottman@...> wrote:

> OK.  I'll just move this e-mail to the "Torpid Disinterest" folder.  :'-(
>
> Thanks to the three people who replied but I don't think we have enough for
> a statistically sound sample.
>

I wouldn't call it disinterest; it's just (still) early days, I guess.
On Monday, I gave a small intro presentation on NoSQL and HBase at Devoxx.
It was a very late afternoon 30-minute session, and there were about 300
people in the room. I did a show of hands and almost all had heard about
NoSQL stores, while almost none had started doing actual work with them. So
there's a lot of interest, but people are still evaluating, and they have
more choice than every NoSQL project would perhaps like them to have. ;)

During the subsequent BOF session later in the evening, there were 50+
people, and the same situation: lots of (genuine!) interest, just a few
early adopters. But we actually had to force people out of the room (as
another BOF session was planned), or else we would have sat there for a
nice couple of hours.

The questionnaire still stands IMO: some good questions, and hopefully some
new responses in a couple of months. I know we are trying, at the very
least.

Cheers,

Steven.
--
Steven Noels                            http://outerthought.org/
Outerthought                            Open Source Java & XML
stevenn at outerthought.org             Makers of the Daisy CMS