Re: Use cases of HBase

by jaxzin

Thanks Gary, this is great!

I'm designing a central store/service for all user data for the fantasy section of ESPN.com (profile/preferences/record of activity, you name it).  The record of activity wouldn't be at page-view granularity; it would hold events more like "created a league" or "won a trophy".  I expect the workload to be much more read-heavy than write-heavy, at least for the core column families.  And since it's user data, I expect it to be randomly accessed, keyed on our internal user IDs.
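One way to picture that layout, as a minimal sketch: one row per user keyed on the internal user ID, with hypothetical column families "profile", "prefs", and "activity", modeled with plain Python dicts rather than a real HBase client. The zero-padded row key and the reversed-timestamp activity qualifiers are a common HBase key-design idiom assumed here, not something from the thread; they keep byte order consistent with numeric order and put the newest events first.

```python
from dataclasses import dataclass, field

# Hypothetical layout: one row per user, keyed on the internal user ID.
# Column family names ("profile", "prefs", "activity") are illustrative only.
MAX_TS_MS = 2**63 - 1  # largest signed 64-bit value, used to reverse timestamps


def row_key(user_id: int) -> bytes:
    # Zero-pad to a fixed width so byte-wise order matches numeric order.
    return f"{user_id:020d}".encode()


def activity_qualifier(ts_ms: int, event: str) -> bytes:
    # Reversed timestamp: newer events get smaller values and sort first.
    return f"{MAX_TS_MS - ts_ms:019d}:{event}".encode()


@dataclass
class UserRow:
    key: bytes
    families: dict = field(
        default_factory=lambda: {"profile": {}, "prefs": {}, "activity": {}}
    )

    def put(self, family: str, qualifier: bytes, value: bytes) -> None:
        self.families[family][qualifier] = value

    def recent_activity(self, limit: int) -> list:
        # HBase keeps columns in a family sorted by qualifier bytes;
        # sorting here mimics that, so the newest events come first.
        cols = sorted(self.families["activity"].items())
        return [value for _, value in cols[:limit]]
```

For example, after writing "created_league" at t=1000 ms and "won_trophy" at t=2000 ms, recent_activity(1) returns [b"won_trophy"]. Because the whole user lives in one row, a single random read by user ID fetches everything for a page.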

I expect it could be fronted by a public RESTful service that browsers might access directly via Ajax, but our initial usage pattern will most likely be server-side inclusion of the data on the hosts responsible for rendering pages.  

But even if it's only exposed internally, I don't want each client of the data to be aware it's backed by HBase, so the store will be fronted by a web or TCP-based service that provides that abstraction layer.  Ideally it would be a RESTful service, but if I can't get that to perform I'd be willing to use a higher-performance protocol like Thrift, Google protobuf, etc.
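A minimal sketch of that abstraction layer, assuming entirely hypothetical names (UserDataStore, InMemoryUserDataStore, handle_get, and a /users/{id}/{family}/{name} path scheme): clients program against a small interface, and only the backend class knows what storage sits behind it. An HBase-backed class, or a Thrift/protobuf client, would implement the same two methods.

```python
from typing import Dict, Optional, Protocol, Tuple


class UserDataStore(Protocol):
    """What clients see: no HBase types leak through this interface."""

    def get_field(self, user_id: int, family: str, name: str) -> Optional[str]: ...

    def put_field(self, user_id: int, family: str, name: str, value: str) -> None: ...


class InMemoryUserDataStore:
    """Stand-in backend; an HBase-backed class would expose the same methods."""

    def __init__(self) -> None:
        self._rows: Dict[int, Dict[str, Dict[str, str]]] = {}

    def get_field(self, user_id: int, family: str, name: str) -> Optional[str]:
        return self._rows.get(user_id, {}).get(family, {}).get(name)

    def put_field(self, user_id: int, family: str, name: str, value: str) -> None:
        self._rows.setdefault(user_id, {}).setdefault(family, {})[name] = value


def handle_get(store: UserDataStore, path: str) -> Tuple[int, str]:
    # Hypothetical REST-style resource: /users/{user_id}/{family}/{name}
    parts = path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "users" or not parts[1].isdigit():
        return 404, "not found"
    value = store.get_field(int(parts[1]), parts[2], parts[3])
    return (200, value) if value is not None else (404, "not found")
```

Swapping the storage engine then means writing a new backend class; page-rendering hosts and Ajax callers only ever see the HTTP resource.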

If that's not enough info to guide me, I'll gladly volunteer more. Thanks again.

Also, to give you some background on what I know already: the reason I'm asking this publicly is that I spoke with an engineer who did a proof of concept with HBase, and he found the cluster would tip over if more than 4 clients connected to a region server for reads, or more than 1 client per node for writes.  He also said that if a region server failed, it would corrupt the table in an unrecoverable way.  These issues sounded like blockers to me for using HBase in an online, mission-critical way, so I figure I'm missing something big.
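To put the "4 clients per region server" report next to the requirements in the original post (5-10k req/sec, 1-2 ms), Little's law (mean requests in flight = throughput times mean latency) gives a rough sense of the concurrency involved. A back-of-the-envelope sketch, assuming steady-state averages and an even spread across a hypothetical number of region servers:

```python
import math


def in_flight(requests_per_sec: int, latency_ms: int) -> float:
    """Little's law, L = lambda * W: mean number of requests in the system."""
    return requests_per_sec * latency_ms / 1000


def per_server(requests_per_sec: int, latency_ms: int, servers: int) -> int:
    # Assumes load spreads evenly across region servers; skewed keys or a
    # hot region would concentrate it on fewer machines.
    return math.ceil(in_flight(requests_per_sec, latency_ms) / servers)


# Worst case from the original post: 10k req/sec at 2 ms.
print(in_flight(10_000, 2))      # 20.0
print(per_server(10_000, 2, 4))  # 5 (hypothetical 4-server cluster)
```

This is only a mean: bursts like Sunday afternoons, or pauses during region splits and compactions, would push instantaneous concurrency higher, and a count of connected clients is not the same thing as concurrent requests.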

Gary Helmling wrote:
Hey Brian,

We use HBase to complement MySQL in serving activity-stream type data here
at Meetup.  It's handling real-time requests involved in 20-25% of our page
views, but our latency requirements aren't as strict as yours.  For what
it's worth, I did a presentation on our setup which will hopefully fill in
some details: http://www.slideshare.net/ghelmling/hbase-at-meetup

There are also some great presentations by Ryan Rawson and Jonathan Gray on
how they've used HBase for realtime serving on their sites.  See the
presentations wiki page:
http://wiki.apache.org/hadoop/HBase/HBasePresentations

Like Barney, I suspect where you'll hit some issues will be in your latency
requirements.  Depending on how you lay out your data and configure your
column families, your average latency may be good, but you will hit some
pauses as I believe reads block at times during region splits or compactions
and memstore flushes (unless you have a fairly static data set).  Others
here should be able to fill in more details.

With a relatively small dataset, you may want to look at the "in memory"
configuration option for your column families.

What's your expected workload -- writes vs. reads?  Types of reads you'll be
doing: random access vs. sequential?  There are a lot of knowledgeable folks
here to offer advice if you can give us some more insight into what you're
trying to build.

--gh


On Tue, Mar 9, 2010 at 11:21 AM, jaxzin <Brian.R.Jackson@espn3.com> wrote:

>
> This is exactly the kind of feedback I'm looking for. Thanks, Barney.
>
> So it sounds like you cache the data you get from HBase in session-based
> memory?  Are you using a Java EE HttpSession?  (I'm less familiar with the
> Django/Rails equivalents, but I'm assuming they exist.)  Or are you using a
> memory cache provider like ehcache or memcache(d)?
>
> Can you tell me more about your experience with latency and why you say
> that?
>
>
> Barney Frank wrote:
> >
> > I am using HBase to store visitor-level clickstream-like data.  At the
> > beginning of the visitor session I retrieve all the previous session data
> > from HBase, use it within my app server, massage it a little, and serve
> > it to the consumer via web services.  Where I think you will run into the
> > most problems is your latency requirement.
> >
> > Just my 2 cents from a user.
> >
> > On Tue, Mar 9, 2010 at 9:45 AM, jaxzin <Brian.R.Jackson@espn3.com> wrote:
> >
> >>
> >> Hi all, I've got a question about how everyone is using HBase.  Is anyone
> >> using it as an online data store to directly back a web service?
> >>
> >> The textbook example of a weblink HBase table suggests there would be an
> >> associated web front-end to display the information in that HBase table
> >> (ex. a search results page), but I'm having trouble finding evidence that
> >> anyone is serving web traffic backed directly by an HBase instance in
> >> practice.
> >>
> >> I'm evaluating whether HBase would be the right tool to provide a few
> >> things for a large-scale web service we want to develop at ESPN, and I'd
> >> really like to get opinions and experience from people who have already
> >> been down this path.  No need to reinvent the wheel, right?
> >>
> >> I can tell you a little about the project goals if it helps give you an
> >> idea of what I'm trying to design for:
> >>
> >> 1) Highly available (it would be a central service and an outage would
> >> take down everything)
> >> 2) Low latency (1-2 ms, less is better, more isn't acceptable)
> >> 3) High throughput (5-10k req/sec at worst-case peak)
> >> 4) Unstable traffic (ex. Sunday afternoons during football season)
> >> 5) Small data...for now (< 10 GB of total data currently, but HBase could
> >> allow us to design differently and store more online)
> >>
> >> The reason I'm looking at HBase is that we've solved many of our scaling
> >> issues with the same basic concepts as HBase (sharding, flattening data to
> >> fit in one row, throwing away ACID, etc.) but with home-grown software.
> >> I'd like to adopt an active open-source project if it makes sense.
> >>
> >> Alternatives I'm also looking at: RDBMS fronted with WebSphere eXtreme
> >> Scale, RDBMS fronted with Hibernate/ehcache, or (the option I understand
> >> the least right now) memcached.
> >>
> >> Thanks,
> >> Brian
> >> --
> >> View this message in context:
> >> http://old.nabble.com/Use-cases-of-HBase-tp27837470p27837470.html
> >> Sent from the HBase User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
>
>
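Barney's setup in the quoted thread (load a visitor's history from HBase once at session start, then serve it from app-server memory) and Brian's question about ehcache or memcached both come down to the cache-aside pattern. A minimal in-process sketch, assuming a single app server and hypothetical names (CacheAside, loader, ttl_sec); a shared cache such as memcached would replace the dict but keep the same get / load-on-miss / invalidate flow.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class CacheAside(Generic[V]):
    """Cache-aside with a per-entry TTL.

    `loader` stands in for the HBase read; it is only called on a miss
    or after an entry has expired.
    """

    def __init__(self, loader: Callable[[Hashable], V], ttl_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_sec
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, V]] = {}  # key -> (expires_at, value)

    def get(self, key: Hashable) -> V:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh entry: no backend round trip
        value = self._loader(key)  # miss or expired: go to the store
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        # Call after a write so the next read goes back to the store.
        self._entries.pop(key, None)
```

The TTL bounds how stale a cached session view can get, and only misses pay the HBase latency, which is why this pattern can smooth over the occasional pauses during splits, compactions, or flushes discussed in the thread.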
