Records missing

by Eason Lee

I have dumped all my data into HBase using MapReduce.
The job output shows I have processed 4,413,160 records,
but there are only 4,217,742 rows in the table (counted by rowcounter).

dump:
Map input records 4,413,160 0 4,413,160
count:
Map input records 4,217,742 0 4,217,742
There is no error, and the row key is unique:
Math.random()+"_"+fileName+"_"+currentPosition
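
For scale, the gap between the two counters is 195,418 rows, roughly 4.4% of the input. A minimal Java sketch of that arithmetic follows; the class and method names are illustrative and not taken from the original job.

```java
import java.util.Locale;

/** Illustrative helper (not part of the original job): quantifies the row gap. */
class RowGap {

    /** Rows that were processed by the map tasks but not found by the row count. */
    static long missing(long processed, long counted) {
        return processed - counted;
    }

    /** The same gap as a percentage of the processed records. */
    static double missingPercent(long processed, long counted) {
        return 100.0 * missing(processed, counted) / processed;
    }

    public static void main(String[] args) {
        long processed = 4_413_160L; // "Map input records" reported for the dump job
        long counted = 4_217_742L;   // "Map input records" reported for the count job
        System.out.println(String.format(Locale.ROOT, "missing=%d (%.2f%%)",
                missing(processed, counted), missingPercent(processed, counted)));
    }
}
```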

Re: Records missing

by Jean-Daniel Cryans-2

Which version of HBase? Any region server crash during the upload?

J-D

On Wed, Nov 4, 2009 at 10:09 PM, Eason.Lee <leongfans@...> wrote:

> I have dumped all my data into Hbase using mapreduce
> the result shows i have processed 4,413,160 records
> But there are only 4217742 rows in the table(count by rowcounter)
>
> dump:
> Map input records 4,413,160 0 4,413,160
> count:
> Map input records 4,217,742 0 4,217,742
> There is no error,and row key is unique(
> Math.Random()+"_"+fileName+"_"+currentPosition )
>

Re: Records missing

by Eason Lee

HBase 0.20.1.

I didn't see any errors during the upload, and there are no errors in the logs.

2009/11/5 Jean-Daniel Cryans <jdcryans@...>

> Which version of HBase? Any region server crash during the upload?
>
> J-D

Re: Records missing

by stack-3

For sure all your keys are unique?  Write them as output from your job and
count that output too (does the reduce count match the map count)?
St.Ack
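
One way to run that check, as a minimal sketch: collect the keys the job emits and count how many repeat an earlier key. This matters because a put to a row key that already exists does not create a new row in HBase, so collisions would lower the row count without raising an error. The class name, method name, and sample keys below are hypothetical; in practice the keys would come from the job's output files.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: counts how many emitted row keys collide with an earlier one. */
class KeyUniquenessCheck {

    /** Returns the number of keys that repeat an earlier key (total minus distinct). */
    static long countCollisions(Iterable<String> keys) {
        Set<String> seen = new HashSet<>();
        long collisions = 0;
        for (String key : keys) {
            // Set.add returns false when the key was already present.
            if (!seen.add(key)) {
                collisions++;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // Made-up sample keys in the shape random_fileName_position.
        List<String> keys = Arrays.asList(
                "0.12_fileA_0", "0.57_fileA_128", "0.12_fileA_0");
        System.out.println("total=" + keys.size()
                + " collisions=" + countCollisions(keys));
    }
}
```

If the collision count is zero, duplicate keys are ruled out and the loss has to come from somewhere else.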

On Wed, Nov 4, 2009 at 10:30 PM, Eason.Lee <leongfans@...> wrote:

> 0.20.1
>
> I didn't see any error during upload~~
>
> and there is no error in logs

Re: Records missing

by Eason Lee

Sorry for the late reply.

2009/11/6 stack <stack@...>

> For sure all your keys are unique?  Write them as output from your job and
> count that output too (does the reduce count match the map count)?
> St.Ack
>
Yes, they are unique; I have just checked.
I don't have a reduce phase. I just save the records into HBase in the map.
But I ran a test that collects all the row keys, and found that the reduce
count matches the map count.

Re: Records missing

by stack-3

So, can you dump the keys from HBase, compare them to those you entered, and
see if you can figure out what the difference is?  That might give you a clue as
to what's happening: e.g. take one of the missing keys and grep for it in your
logs; maybe there was an error around it.

I can insert hundreds of millions of entries into an HBase instance without
losing any.  This is why I'm of the opinion that it's something to do with your
environment.

If you can turn up more than the below, that'd help.

Thanks Eason.
St.Ack
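
The comparison suggested above amounts to a set difference. Here is a minimal sketch, assuming the keys written by the job and the keys read back from the table are both available as plain strings; the class name, method name, and sample keys are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: finds keys that were written but are absent from the table. */
class MissingKeyFinder {

    /** Returns the written keys that do not appear among the stored keys, sorted. */
    static Set<String> missingKeys(Iterable<String> written, Iterable<String> stored) {
        Set<String> missing = new TreeSet<>();
        for (String key : written) {
            missing.add(key);
        }
        for (String key : stored) {
            missing.remove(key); // anything still left afterwards never reached the table
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up sample keys; real ones would come from the job output and a table scan.
        Set<String> missing = missingKeys(
                Arrays.asList("k1", "k2", "k3"),
                Arrays.asList("k1", "k3"));
        System.out.println("missing keys: " + missing);
    }
}
```

Any key in the result is a candidate to grep for in the region server and task logs.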

On Fri, Nov 6, 2009 at 12:33 AM, Eason.Lee <leongfans@...> wrote:

> Sorry for reply to late~
>
> 2009/11/6 stack <stack@...>
>
> > For sure all your keys are unique?  Write them as output from your job and
> > count that output too (does the reduce count match the map count)?
> > St.Ack
> >
> Yes, they are unique, I have just checked it~~
> I don't have reduce. Just save records into hbase in the map.
> But i just did a test , collect all the row keys, and found that the reduce
> count matches the map count

Re: Records missing

by Eason Lee

Thanks for the reply.
I will check that.

2009/11/7 stack <stack@...>

> So, can you dump the keys from hbase and compare to those you entered and
> see if you can figure what the difference is?  Might give you a clue as to
> whats happening: e.g. take one of the missing keys and grep it in your logs,
> maybe there was an error around it?
>
> I can insert into an hbase instance hundreds of millions without losing
> entries.  This is why I'm of the opinion that its something to do with your
> environment.
>
> If you can turn up more than the below, that'd help.
>
> Thanks Eason.
> St.Ack