Rest, Stargate or Thrift ?

Language-independent RPC services for HBase: is there consensus on preference? Architecturally, I imagine they are equivalent. Performance-wise, I imagine Thrift performs best. As for support and maintenance by core HBase contributors, is Thrift also privileged?

jo.
|
|
Re: Rest, Stargate or Thrift ?

On Wed, Nov 4, 2009 at 5:10 PM, Joost Ouwerkerk <joost@...> wrote:
> Language-independent RPC services for HBase: is there consensus on
> preference? Architecturally, I imagine they are equivalent.
> Performance-wise, I imagine Thrift is most optimal. As for support and
> maintenance by core HBase contributors, is Thrift also privileged?

Yes, use Thrift.

I was not able to get Stargate or the legacy REST API working acceptably at the time 0.20 was released. I don't remember what my problems were -- I think Stargate doesn't yet support scanners, and I couldn't find a working Ruby client for REST.

The only downside we've had is that our developers on Windows can't seem to get the Thrift Ruby gem installed.

I don't see what Stargate could offer that would get sites like Stumbleupon to migrate away from Thrift, so even if Stargate improves, Thrift support will likely continue.
|
|
Re: Rest, Stargate or Thrift ?

The new REST API called Stargate is currently considered not yet production-ready by its author, committer Andrew Purtell. I'll let him expand further. If there's a bug or a feature is missing, the best thing to do is to write to the list and/or open a jira.

The Thrift API is lagging in features compared to the current Java API. There's currently no official support from any core dev. At SU we use it but we are still considering other options; the main problem is that the implementations in some languages are buggy and hard to work with.

Both APIs have little overhead since they act as thin clients over the fat Java client. I would consider running the API servers directly on the client machines and having the application code connect to localhost rather than going over the network.

J-D
|
|
Impromptu HBase survey

Hi everyone,

I was mulling over the question from Jason Strutz at Cumulus Code before, and wondering what sort of data he was storing in HBase. Then it occurred to me that this question may have broad appeal to many HBase users who are interested in "what" other developers are doing, as opposed to the usual "how" questions.

To this end, I would invite people who feel like sharing to give me a paragraph or two on what they are doing with HBase. Of course, I don't want anyone to give away their eleven secret herbs and spices or tell me what Ingredient X is. :-) I am more interested in metadata and semantics.

To give you an idea of questions that I wonder about:

* Are you using a natural or synthetic key?
* Are you using HBase index tables or maintaining your own?
* Do you have multiple data tables in your HBase server?
* How many rows of data are in each HBase table?
* What type of data are you storing in each record?
* Are you using column families to localize data or store name/value pairs?
* Are there columns like name, address, etc., that are present in each row?
* Are you running HBase on your own servers or on Amazon EC2?
* Are you using Hadoop to run map/reduce functions against HBase?
* How does your client interact with HBase? Java API, REST, Stargate, Thrift, other (please specify), etc.

Anyone who is interested in responding can do so to the list or directly to me. I will keep your responses but not your name or company. Feel free to answer some or all of the questions, or add your own information that you feel is pertinent to how you are using HBase. I will give it a week and then collate the responses into an integrated summary that I will publish back to this list.

I should declare that I have no official HBase standing. I'm just very curious about NoSQL databases as an emerging technology, and HBase in particular. The 'net shows a general consensus that HBase is an early NoSQL leader but no-one discusses specifics. Some empirical data would be very interesting.

Thanks in advance,

Greg.

Greg Cottman
Technical Architect
Quest Software, Australia
Tel: +61 3 9811 8057
|
|
Re: Impromptu HBase survey

We don't run HBase in operational mode yet, but we are researching it with the goal of moving towards that...

> To give you an idea of questions that I wonder about:
>
> * Are you using a natural or synthetic key?
- synthetic. UUID, but considering an encoded UUID to shorten it. Would like to see some KeyUtils classes in the HBase library, or some recommendations. I'd like an 8 char synthetic key ideally, but haven't found a good way to do this yet (lack of time).

> * Are you using HBase index tables or maintaining your own?
- Lucene, but will use HBase index tables

> * Do you have multiple data tables in your HBase server?
- yes, but only for convenience of keeping the 2 small tables with the big one.

> * How many rows of data are in each HBase table?
- 200 million. When operational, we expect rows to grow at 5-10%/month and columns to grow at 10% or so per month also

> * What type of data are you storing in each record?
- 30-60 fields of INT and String
- might be putting PNGs in a new table to represent Google map tiles

> * Are you using column families to localize data or store name/value pairs?
- no

> * Are there columns like name, address, etc., that are present in each row?
- no (http://rs.tdwg.org/dwc/terms/index.htm is our term vocabulary)

> * Are you running HBase on your own servers or on Amazon EC2?
- in-house 10 node cluster with 3 masters (http://code.google.com/p/biodiversity/wiki/ClusterSetup)

> * Are you using Hadoop to run map/reduce functions against HBase?
- progressing towards this. Still using text file exports from a MySQL DB in Hadoop, as HBase is not in production mode yet

> * How does your client interact with HBase? Java API, REST, Stargate, Thrift, other (please specify), etc.
- Java API
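For readers weighing the shorter-key idea in the reply above, here is a minimal sketch in Python (not from the thread) of one way to shorten a UUID row key: base64-encoding the 16 raw bytes gives 22 characters instead of the 36-character hyphenated hex form. The function names `encode_uuid` and `decode_uuid` are hypothetical, and nothing here is an HBase API. An 8-character key in this alphabet carries only 48 bits, so it cannot hold a full 128-bit UUID.

```python
import base64
import uuid


def encode_uuid(u: uuid.UUID) -> str:
    """Encode the 16 raw UUID bytes as URL-safe base64 without padding."""
    # 16 bytes always encode to 24 characters ending in "==", so the
    # padding can be stripped and restored later without ambiguity.
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def decode_uuid(key: str) -> uuid.UUID:
    """Invert encode_uuid by restoring the two stripped padding characters."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(key + "=="))


if __name__ == "__main__":
    original = uuid.uuid4()
    short = encode_uuid(original)
    print(original, "->", short, f"({len(short)} chars)")
```

The base64 alphabet is not in ASCII order, so encoded keys do not sort the same way as the raw bytes; for random UUIDs that ordering is unlikely to matter.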
|
|
Re: Rest, Stargate or Thrift ?

Stargate (REST HTTP) is not going to replace Thrift.

Stargate does support scanners.

- Andy

________________________________
From: Kevin Peterson <kpeterson@...>
To: hbase-user@...
Sent: Thu, November 5, 2009 10:06:36 AM
Subject: Re: Rest, Stargate or Thrift ?
|
|
Re: Rest, Stargate or Thrift ?

Stargate in 0.20 is experimental. This is because there is little experience with it. If you try it out, whether or not you have success, a note on your experiences and observations would be most helpful. Use the version in HBase 0.20.1+.

For HBase 0.21, Stargate will be "production ready". See http://su.pr/3g5P7B for open issues in that regard.

- Andy

________________________________
From: Jean-Daniel Cryans <jdcryans@...>
To: hbase-user@...
Sent: Thu, November 5, 2009 12:55:49 PM
Subject: Re: Rest, Stargate or Thrift ?
|
|
Re: Rest, Stargate or Thrift ?

> Language-independent RPC services for HBase: is there consensus on
> preference? Architecturally, I imagine they are equivalent.
> Performance-wise, I imagine Thrift is most optimal. As for support and
> maintenance by core HBase contributors, is Thrift also privileged?
>
> jo.

Thrift is probably the most used in the field, but then again that's a complete guess at this point, as there are no official statements on the Hadoop web site (or perhaps they are lost within the arcana of the beast) as to what is officially supported or not, and by whom. I mean, could you also consider Avro for that task?

It seems newcomers to Hadoop/HBase have to go through the same steps of "try and see for your own business case". Learning the hard way is sometimes the best bet.

- Sylvain

--
Sylvain Hellegouarch
http://www.defuze.org
|
|
RE: Impromptu HBase survey

Equally interesting would be to know what type of data you are storing. I mean, if I store web-crawled data in my HBase it doesn't matter if I miss or lose some pages, does it? I'll crawl them next time.

Is the data, or every element of it, business critical? Is it derived data from some other source? Aggregated data? Or do we store traditional online transactions?

-----Original Message-----
From: Tim Robertson [mailto:timrobertson100@...]
Sent: Thursday, 5 November 2009 11:14
To: hbase-user@...
Subject: Re: Impromptu HBase survey
|
|
Re: Rest, Stargate or Thrift ?

On a related note, HBASE-1015 is about a native C/C++ client library to improve the client-side integration story. What falls out of this is replacement of HRPC with some platform-independent RPC for direct access to master and regionserver interfaces. The options there, I think, are to use Thrift or wait for Avro. The discussion of one always brings up the other.

- Andy

________________________________
From: Sylvain Hellegouarch <sh@...>
To: hbase-user@...
Cc: hbase-user@...
Sent: Thu, November 5, 2009 10:08:49 PM
Subject: Re: Rest, Stargate or Thrift ?
|
|
Re: Impromptu HBase survey

On Wed, Nov 4, 2009 at 10:39 PM, Greg Cottman <greg.cottman@...> wrote:
> Hi everyone,
>
> To this end, I would invite people who feel like sharing to give me a
> paragraph or two on what they are doing with HBase. Of course, I don't want
> anyone to give away their eleven secret herbs and spices or tell me what
> Ingredient X is. :-) I am more interested in metadata and semantics.
>
> To give you an idea of questions that I wonder about:
>
> * Are you using a natural or synthetic key?

Keys are urls that have been md5'd so there is a good spread across the

> * Are you using HBase index tables or maintaining your own?

No

> * Do you have multiple data tables in your HBase server?

Yes. About 50 tables.

> * How many rows of data are in each HBase table?

Between 3-20 million rows in each.

> * What type of data are you storing in each record?

Some of the tables have wikipedia content and then derivatives; mimetypes, inlinks, alternate urls, etc., etc. Other tables hold other indexing pipeline input and intermediate product.

> * Are you using column families to localize data or store name/value
> pairs?

Both.

> * Are there columns like name, address, etc., that are present in
> each row?

Sort-of.

> * Are you running HBase on your own servers or on Amazon EC2?

Own. Between 100 and 110.

> * Are you using Hadoop to run map/reduce functions against HBase?

Yes.

> * How does your client interact with HBase? Java API, REST,
> Stargate, Thrift, other (please specify), etc.

Java and REST and thrift.

Above responses describe a cluster from 6 months ago.

St.Ack
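The md5'd URL keys mentioned in the reply above can be sketched as follows. This is an illustrative Python example, not code from the thread: the function name `row_key` is hypothetical, and the choice of the raw 16-byte digest (rather than a hex string) is an assumption.

```python
import hashlib


def row_key(url: str) -> bytes:
    """Return the 16-byte MD5 digest of the URL for use as a row key."""
    # Hashing makes lexically adjacent URLs land far apart in key space,
    # which is what spreads rows evenly instead of clustering by host.
    return hashlib.md5(url.encode("utf-8")).digest()


if __name__ == "__main__":
    for url in ("http://example.org/a", "http://example.org/b"):
        print(url, "->", row_key(url).hex())
```

The trade-off of this design is that rows can no longer be scanned in URL order, since the hash discards the lexical ordering of the original URLs.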
|
|
RE: Impromptu HBase survey

OK. I'll just move this e-mail to the "Torpid Disinterest" folder. :'-(

Thanks to the three people who replied, but I don't think we have enough for a statistically sound sample.

-----Original Message-----
From: Greg Cottman [mailto:greg.cottman@...]
Sent: Thursday, 5 November 2009 5:39 PM
To: hbase-user@...
Subject: Impromptu HBase survey
|
|
Re: Impromptu HBase survey

On Wed, Nov 18, 2009 at 5:43 AM, Greg Cottman <greg.cottman@...> wrote:
> OK. I'll just move this e-mail to the "Torpid Disinterest" folder. :'-(
>
> Thanks to the three people who replied but I don't think we have enough for
> a statistically sound sample.

I wouldn't call it disinterest - it's just (still) early times, I guess.

On Monday, I gave a small intro presentation on NoSQL and HBase at Devoxx. It was a very late afternoon 30-minute session and there were about 300 people in the room. I did a show of hands and almost all had heard about NoSQL stores, while almost none had started doing actual work with them. So there's a lot of interest, but people are still evaluating - and they have more choice than each NoSQL project would perhaps like them to have. ;)

During the subsequent BOF session later in the evening, there were 50+ people, and the same situation: lots of (genuine!) interest, just a few early adopters. But we actually had to force people out of the room (as another BOF session was planned) - or else we would have sat there for a nice couple of hours.

The questionnaire still stands IMO: some good questions, and hopefully some new responses in a couple of months. I know we are trying, at the very least.

Cheers,

Steven.

--
Steven Noels
http://outerthought.org/
Outerthought
Open Source Java & XML
stevenn at outerthought.org
Makers of the Daisy CMS