HBase table design question

by Something Something-2

Hello,

I'm trying to figure out the recommended way to design tables in HBase.  Let's say I need a table to gather statistics about users' visits to different web pages.

In the relational database world, we could have a table with the following columns:

Primary Key (system generated)
UserId (foreign key)
WebPageId (foreign key)
VisitedDateTime
& so on....

Basically, this table would allow us to answer (amongst many others) the following questions...

1)  How many times did a user visit a certain page?
2)  Which web pages did a particular user visit?
3)  Which users visited a particular web page?  etc.

What's the best way to model this in HTable?  

Since every HTable is really a distributed hashmap, does that mean I need to create 3 different HTables (HashMaps) to answer these 3 questions?

1) One table with (UserId + WebPageId) as the compound key? (To answer #1)
2) One table with UserId as the key? (To answer #2)
3) One table with WebPageId as the key? (To answer #3)

Along with HTable, should I use Hive to run queries such as #1 above?

Any help in this regard will be greatly appreciated.  Thanks.


     

Re: HBase table design question

by Jonathan Gray-2

You're generally on the right track.  In many cases, where you would use
secondary indexes in the relational world, you would instead have
multiple tables in HBase with different keys.

You may not need a table for each query, but that depends on your
performance requirements and the specific details of the data
patterns (how sparse or dense certain things will be).

I would start with a User table and a WebPage table, keyed by their ids.

The User table could have a Visited family.  The WebPage table could
have a VisitedBy family.

Your queries could be run like this:

1) Get(table=User, row=userid, family=Visited, qualifier=WebPageID)
    There are a couple of different ways you could model the data here.
    You could either put in a new version of the same qualifier for each
    visit, or you could make the qualifier a composite key like
    WebPageID+VisitStamp, so the visits to one page would be grouped
    together.

2) Get(table=User, row=userid, family=Visited)
    All qualifiers would represent all pages visited.

3) Get(table=WebPage, row=pageid, family=VisitedBy)
    All qualifiers would represent all users who visited.  You could
    store multiple visits by the same user in different ways, as above.
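
A rough sketch of the three queries above, using plain Python dicts as a stand-in for the two tables. This is not the HBase client API: the helpers `record_visit`, `visit_count`, `pages_visited`, and `visitors` and the `+` separator are made up for illustration. It assumes the composite-qualifier option (WebPageID+VisitStamp) and ids that never contain the separator.

```python
from collections import defaultdict

# Toy in-memory stand-in for two HBase tables: row -> family -> {qualifier: value}.
# This is NOT the HBase client API; it only mirrors the layout described above.
user_table = defaultdict(lambda: defaultdict(dict))
webpage_table = defaultdict(lambda: defaultdict(dict))

SEP = "+"  # hypothetical separator for the composite qualifier


def record_visit(user_id, page_id, visit_stamp):
    """Write one visit into both tables using a composite qualifier."""
    user_table[user_id]["Visited"][f"{page_id}{SEP}{visit_stamp}"] = b""
    webpage_table[page_id]["VisitedBy"][f"{user_id}{SEP}{visit_stamp}"] = b""


def visit_count(user_id, page_id):
    """Query 1: count qualifiers in Visited that start with the page id."""
    prefix = f"{page_id}{SEP}"
    return sum(1 for q in user_table[user_id]["Visited"] if q.startswith(prefix))


def pages_visited(user_id):
    """Query 2: distinct page ids found in the user's Visited family."""
    return sorted({q.split(SEP)[0] for q in user_table[user_id]["Visited"]})


def visitors(page_id):
    """Query 3: distinct user ids found in the page's VisitedBy family."""
    return sorted({q.split(SEP)[0] for q in webpage_table[page_id]["VisitedBy"]})


# Example data (made up): u1 visits p100 twice and p200 once, u2 visits p100 once.
record_visit("u1", "p100", 1256144000)
record_visit("u1", "p100", 1256147600)
record_visit("u1", "p200", 1256151200)
record_visit("u2", "p100", 1256154800)

print(visit_count("u1", "p100"))
print(pages_visited("u1"))
print(visitors("p100"))
```

Every visit is written twice, once per table, which is the price of answering queries 2 and 3 with a single-row read each.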


As for using Hive to run these queries, that is not something I would
recommend.  For one, Hive integration with HBase is not complete (as far
as I know).  Second, Hive's emphasis is on batch/offline MapReduce jobs.
The above 3 queries can be run directly and efficiently with the HBase
API.  There's no need for SQL or anything like it.

Hope that helps.

JG


Re: HBase table design question

by Barney Frank

I am no expert, but I am doing something very similar, i.e. tracking user
sessions within HBase.

2 tables:

Table 1:  'Users'
Column Family 1: WebPages
Columns: Page Names
RowId=UserId

For a given userid, you could retrieve the pages visited, the number of
times (watch out for versions), and the first/last date visited.  I use the
PageId as the column and use the cells for the count and date info.

Table 2: 'Pages'
Column Family 1: Visits
Columns: UserIds
RowId=PageId

For a given PageId, retrieve the userIds.  Depending upon the volume of the
web site and exactly what types of queries you have, you might want to store
the userIds as columns within the Visits family, then iterate over them.

How much data is kept is governed by the number of VERSIONS and TTL
settings.
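
To make the 'Users' layout concrete, here is a small sketch with plain Python dicts standing in for the table. It is not the HBase client API. The `record_visit` helper and the `count`/`first`/`last` field names are assumptions for illustration, since the post only says the cells hold the count and date info.

```python
# Toy model of the 'Users' table: row (UserId) -> family -> column (PageId) -> cell.
# Plain dicts stand in for HBase here; none of this is the HBase client API.
users = {}


def record_visit(user_id, page_id, visited_at):
    """Read-modify-write the cell that holds the count and first/last visit."""
    family = users.setdefault(user_id, {}).setdefault("WebPages", {})
    cell = family.get(page_id)
    if cell is None:
        family[page_id] = {"count": 1, "first": visited_at, "last": visited_at}
    else:
        cell["count"] += 1
        # ISO-style timestamps compare correctly as strings.
        cell["first"] = min(cell["first"], visited_at)
        cell["last"] = max(cell["last"], visited_at)


# Example data (made up).
record_visit("u1", "home", "2009-10-21T10:03")
record_visit("u1", "home", "2009-10-27T11:48")
record_visit("u1", "about", "2009-10-22T09:00")

home = users["u1"]["WebPages"]["home"]
print(home["count"], home["first"], home["last"])
print(sorted(users["u1"]["WebPages"]))
```

The read-modify-write above is only safe in a single process. With several writers on a real cluster, the count would more likely go through one of HBase's atomic increment operations.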

My two cents.

Good luck.

On Wed, Oct 21, 2009 at 10:03 AM, Something Something <
luckyguy2050@...> wrote:

Re: HBase table design question

by Something Something-2

Thanks, Jonathan, for the reply.  One quick question...

So in the User table when I perform the put operation:

.put("visited", "pageId", 100);

.put("visited", "pageId", 200);

The 100 gets overwritten with 200.  Correct?  So should I use... something like this...

.put("visited", "pageId100", 100);
.put("visited", "pageId200", 200);

I guess, I am still missing something... sorry.. Please explain.  Thanks.




________________________________
From: Jonathan Gray <jlist@...>
To: hbase-user@...
Sent: Wed, October 21, 2009 10:25:52 AM
Subject: Re: HBase table design question


Re: HBase table design question

by Something Something-2

No responses to this question :(  Is my question that stupid, I wonder!




________________________________
From: Something Something <luckyguy2050@...>
To: hbase-user@...
Sent: Wed, October 21, 2009 12:16:19 PM
Subject: Re: HBase table design question


Re: HBase table design question

by Jean-Daniel Cryans-2

I think your question was just forgotten.

So your value will not be overwritten; it will simply be stored under 2
different timestamps, and only the latest one will be retrieved if you
do not specify a timestamp on your Get. By default, 3 versions of that
cell will be kept, but you can change this with the family attributes.
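
The versioning behaviour can be pictured with a small toy class. This is a simplified model in plain Python, not the HBase client API. `VersionedCell` and its methods are hypothetical names, and the limit of 3 simply mirrors the default mentioned above, so check what your own HBase version defaults to.

```python
class VersionedCell:
    """Toy model of one HBase cell that keeps several timestamped versions."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self._versions = {}  # timestamp -> value

    def put(self, value, timestamp):
        """Store a value under a timestamp without touching older versions."""
        self._versions[timestamp] = value
        # Drop the oldest versions beyond the configured limit.
        for ts in sorted(self._versions, reverse=True)[self.max_versions:]:
            del self._versions[ts]

    def get(self, timestamp=None):
        """Return the newest value, or the value stored at an exact timestamp."""
        if not self._versions:
            return None
        if timestamp is None:
            timestamp = max(self._versions)
        return self._versions.get(timestamp)

    def get_versions(self):
        """Return (timestamp, value) pairs, newest first."""
        return sorted(self._versions.items(), reverse=True)


# The two puts from the question: same row, family and qualifier, different values.
cell = VersionedCell()
cell.put(100, timestamp=1)
cell.put(200, timestamp=2)

print(cell.get())             # newest version
print(cell.get(timestamp=1))  # the older version is still there
```

One simplification to keep in mind: this toy drops surplus versions immediately on write, which is only an approximation of how a real store trims old versions.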

J-D

On Tue, Oct 27, 2009 at 10:17 AM, Something Something
<luckyguy2050@...> wrote:

> No responses to this question :(  Is my question that stupid, I wonder!

Re: HBase table design question

by Something Something-2

Thanks, Jean-Daniel, for the reply.  Greatly appreciate it.

So is this the recommended way of implementing a Parent-Child relationship in HBase?  Like... a User visits zero to many WebPages, or say... a Customer buys 1 to many Items.  In such cases, would we create a "Customer" HTable with a "buys" family and keep adding "ItemsIds" for every "CustomerId"?  Sounds a bit awkward for some reason... but if that's the recommended way then that's how I will implement it.  Please let me know what the best way to implement Parent-Child relationships in HBase is.

Thanks.




________________________________
From: Jean-Daniel Cryans <jdcryans@...>
To: hbase-user@...
Sent: Tue, October 27, 2009 11:06:04 AM
Subject: Re: HBase table design question


Re: HBase table design question

by Jean-Daniel Cryans-2

Yeah, it may be awkward to repeat the same information as in your second
solution, but that's usually how it's done. You could even drop the
"pageid" part of the qualifier and name the family that instead, so
that "pageid:100" returns you... 100. But then you could denormalize
some more: let's say there's only one value you really need when doing
that join, such as the page title. Then "pageid:100" could return
"Some webpage title" and maybe even save you from doing another Get.
You would probably duplicate a lot of data, but if that family is
compressed then it doesn't have a big impact.
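
A minimal sketch of that denormalization, again with plain Python dicts rather than the HBase client API. The `pages` table, its `info` family and `title` qualifier, and the helper names are hypothetical. Only the "pageid" family holding a title per page id comes from the description above.

```python
# Toy stand-ins for two HBase tables (plain dicts, not the HBase client API).
# Layout: row key -> family -> {qualifier: value}
pages = {
    "100": {"info": {"title": "Some webpage title"}},
    "200": {"info": {"title": "Another page"}},
}
users = {"u1": {"pageid": {}}}


def record_visit(user_id, page_id):
    """Copy the page title into the user's row at write time (denormalize)."""
    title = pages[page_id]["info"]["title"]
    users[user_id]["pageid"][page_id] = title


def visited_titles(user_id):
    """One lookup on the user row returns ids and titles, with no second read."""
    return dict(users[user_id]["pageid"])


# Example data (made up).
record_visit("u1", "100")
record_visit("u1", "200")

print(visited_titles("u1"))
```

The trade-off is the usual one for denormalized data: reads get cheaper, but a title change has to be pushed to every user row that copied it.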

J-D

On Tue, Oct 27, 2009 at 11:48 AM, Something Something
<luckyguy2050@...> wrote: