Record limit in scan api?

Record limit in scan api?

by Adam Silberstein

Hi,
Is there a way to specify a limit on number of returned records for scan? I
don't see any way to do this when building the scan. If there is, that
would be great. If not, what about when iterating over the result? If I
exit the loop when I reach my limit, will that approximate this clause? I
guess my real question is about how scan is implemented in the client, i.e.,
how many records are returned from HBase at a time as I iterate through the
scan result? If I want 1,000 records and 100 get returned at a time, then
I'm in good shape. On the other hand, if I want 10 records and get 100 at a
time, it's a bit wasteful, though the waste is bounded.

Thanks,
Adam

Re: Record limit in scan api?

by Jean-Daniel Cryans-2

Adam,

You have to exit the loop yourself when you reach your limit, but you
can specify start and stop rows, which is usually very useful with
well-designed row keys.

By default the scanner client fetches rows one by one. You can set
scanner caching with Scan.setCaching which improves the performance of
the scan by lowering the number of RPCs.
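
A minimal sketch of the trade-off discussed here, assuming a scanner that pulls a fixed number of rows per round trip and a caller that exits the loop at its own limit. BatchedScanner and scanWithLimit are hypothetical names and an in-memory stand-in, not the HBase client API; in the real client the batch size would be whatever is set through Scan.setCaching or hbase.client.scanner.caching, and the exact RPC count may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory simulation; this is NOT the HBase client API.
class ScanLimitSketch {

    /** Simulated scanner that fetches `caching` rows per round trip. */
    static class BatchedScanner {
        private final List<String> table;               // rows in key order
        private final int caching;                      // rows per simulated RPC
        private final List<String> buffer = new ArrayList<>();
        private int nextRow = 0;                        // next table index to fetch
        private int bufferPos = 0;                      // next buffered row to hand out
        int rpcCount = 0;                               // simulated round trips so far
        int rowsFetched = 0;                            // rows shipped to the client so far

        BatchedScanner(List<String> table, int caching) {
            if (caching < 1) {
                throw new IllegalArgumentException("caching must be >= 1");
            }
            this.table = table;
            this.caching = caching;
        }

        /** Returns the next row, or null once the table is exhausted. */
        String next() {
            if (bufferPos == buffer.size()) {
                if (nextRow >= table.size()) {
                    return null;
                }
                // Buffer is empty: simulate one round trip for the next batch.
                buffer.clear();
                bufferPos = 0;
                int end = Math.min(nextRow + caching, table.size());
                buffer.addAll(table.subList(nextRow, end));
                rowsFetched += end - nextRow;
                nextRow = end;
                rpcCount++;
            }
            return buffer.get(bufferPos++);
        }
    }

    /** Reads at most `limit` rows, then stops. Returns {rpcs, rowsFetched, rowsReturned}. */
    static int[] scanWithLimit(int tableSize, int caching, int limit) {
        List<String> table = new ArrayList<>();
        for (int i = 0; i < tableSize; i++) {
            table.add(String.format("row-%05d", i));
        }
        BatchedScanner scanner = new BatchedScanner(table, caching);
        int returned = 0;
        // Client-side limit: leave the loop as soon as enough rows were seen.
        while (returned < limit && scanner.next() != null) {
            returned++;
        }
        return new int[] {scanner.rpcCount, scanner.rowsFetched, returned};
    }

    public static void main(String[] args) {
        int[][] cases = {{100, 1000}, {100, 10}, {1, 10}};
        for (int[] c : cases) {
            int[] r = scanWithLimit(5000, c[0], c[1]);
            System.out.println("caching=" + c[0] + " limit=" + c[1]
                    + " rpcs=" + r[0] + " fetched=" + r[1] + " returned=" + r[2]);
        }
    }
}
```

In this simulation, a limit of 10 with 100 rows per round trip costs one round trip and ships 90 rows that are never read, while one row per round trip ships nothing extra but needs 10 round trips. The waste is bounded by the batch size, as the original question suggests.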

J-D

On Fri, Nov 20, 2009 at 11:43 AM, Adam Silberstein
<silberst@...> wrote:

> Hi,
> Is there a way to specify a limit on number of returned records for scan? I
> don't see any way to do this when building the scan. If there is, that
> would be great. If not, what about when iterating over the result? If I
> exit the loop when I reach my limit, will that approximate this clause? I
> guess my real question is about how scan is implemented in the client, i.e.,
> how many records are returned from HBase at a time as I iterate through the
> scan result? If I want 1,000 records and 100 get returned at a time, then
> I'm in good shape. On the other hand, if I want 10 records and get 100 at a
> time, it's a bit wasteful, though the waste is bounded.
>
> Thanks,
> Adam
>

Re: Record limit in scan api?

by stack-3

There is this in the configuration:

  <property>
    <name>hbase.client.scanner.caching</name>
    <value>1</value>
    <description>Number of rows that will be fetched when calling next
    on a scanner if it is not served from memory. Higher caching values
    will enable faster scanners but will eat up more memory and some
    calls of next may take longer and longer times when the cache is empty.
    </description>
  </property>


Being able to do it per Scan sounds like something we should add.

St.Ack



Re: Record limit in scan api?

by Gary Helmling

To set this per scan you should be able to do:

Scan s = new Scan();
s.setCaching(...);

(I think this works anyway)


The other thing that I've found useful is using a PageFilter on scans:
http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/filter/PageFilter.html

I believe this is applied independently on each region server (?) so you
still need to do your own counting when iterating over the results, but it
can be used to exit early on the server side, separately from the scanner
caching value.
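
A rough sketch of why the client-side count is still needed, assuming (as guessed above) that the page size is enforced separately per region. The regions here are plain in-memory lists and scanRegions and firstN are hypothetical helpers, not the PageFilter implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of a per-region page limit; this is NOT PageFilter itself.
class PerRegionPageSketch {

    /** Assumed behaviour: every region independently stops after pageSize rows. */
    static List<String> scanRegions(List<List<String>> regions, int pageSize) {
        List<String> received = new ArrayList<>();
        for (List<String> region : regions) {
            int take = Math.min(pageSize, region.size());
            received.addAll(region.subList(0, take));
        }
        return received;
    }

    /** Client-side counting: keep only the first `limit` rows that arrived. */
    static List<String> firstN(List<String> rows, int limit) {
        return new ArrayList<>(rows.subList(0, Math.min(limit, rows.size())));
    }

    /** Builds `regionCount` regions with `rowsPerRegion` rows each, in key order. */
    static List<List<String>> buildRegions(int regionCount, int rowsPerRegion) {
        List<List<String>> regions = new ArrayList<>();
        for (int r = 0; r < regionCount; r++) {
            List<String> region = new ArrayList<>();
            for (int i = 0; i < rowsPerRegion; i++) {
                region.add("region" + r + "-row" + i);
            }
            regions.add(region);
        }
        return regions;
    }

    public static void main(String[] args) {
        List<List<String>> regions = buildRegions(3, 50);
        List<String> received = scanRegions(regions, 10);
        List<String> page = firstN(received, 10);
        System.out.println("received=" + received.size() + " kept=" + page.size());
    }
}
```

With three regions and a page size of 10, the simulated client receives up to 30 rows, far fewer than the 150 stored, but still has to cut the result down to 10 itself.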

--gh

On Fri, Nov 20, 2009 at 3:04 PM, stack <stack@...> wrote:

> There is this in the configuration:
>
>  <property>
>    <name>hbase.client.scanner.caching</name>
>    <value>1</value>
>    <description>Number of rows that will be fetched when calling next
>    on a scanner if it is not served from memory. Higher caching values
>    will enable faster scanners but will eat up more memory and some
>    calls of next may take longer and longer times when the cache is empty.
>    </description>
>  </property>
>
>
> Being able to do it per Scan sounds like something we should add.
>
> St.Ack