Record limit in scan api?

Hi,

Is there a way to specify a limit on the number of returned records for a scan? I don't see any way to do this when building the scan. If there is, that would be great. If not, what about when iterating over the result? If I exit the loop when I reach my limit, will that approximate this clause?

I guess my real question is about how scan is implemented in the client, i.e. how many records are returned from HBase at a time as I iterate through the scan result? If I want 1,000 records and 100 get returned at a time, then I'm in good shape. On the other hand, if I want 10 records and get 100 at a time, it's a bit wasteful, though the waste is bounded.

Thanks,
Adam
Re: Record limit in scan api?

Adam,

You have to exit when you reach your limit, but you can specify start and stop rows, which is usually very useful with well-designed row keys. By default the scanner client fetches rows one by one. You can set scanner caching with Scan.setCaching, which improves the performance of the scan by lowering the number of RPCs.

J-D

On Fri, Nov 20, 2009 at 11:43 AM, Adam Silberstein <silberst@...> wrote:
> Hi,
> Is there a way to specify a limit on number of returned records for scan? I
> don't see any way to do this when building the scan. If there is, that
> would be great. If not, what about when iterating over the result? If I
> exit the loop when I reach my limit, will that approximate this clause? I
> guess my real question is about how scan is implemented in the client. I.e.
> How many records are returned from Hbase at a time as I iterate through the
> scan result? If I want 1,000 records and 100 get returned at a time, then
> I'm in good shape. On the other hand, if I want 10 records and get 100 at a
> time, it's a bit wasteful, though the waste is bounded.
>
> Thanks,
> Adam
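
To make the trade-off between the caching value and a client-side limit concrete, here is a minimal sketch in plain Java. It is a toy model, not the HBase client: the class BatchedScanSketch, its scan method and the Outcome type are all hypothetical names, rows are just integers, and the model ignores any extra round trip a real scanner may need to open, close or detect the end of a scan. It only counts how many simulated round trips happen and how many fetched rows go unused when the caller breaks out of the loop at its limit.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a scanner that buffers `caching` rows per round trip.
// This is an illustration only; it does not use or mirror HBase classes.
class BatchedScanSketch {

    /** Result of one simulated scan. */
    static final class Outcome {
        final List<Integer> rows; // rows actually handed to the caller
        final int rpcs;           // simulated round trips to the server
        final int fetched;        // rows shipped from server to client

        Outcome(List<Integer> rows, int rpcs, int fetched) {
            this.rows = rows;
            this.rpcs = rpcs;
            this.fetched = fetched;
        }

        /** Rows that were shipped but never consumed. */
        int wasted() {
            return fetched - rows.size();
        }
    }

    /**
     * Scans a table of `tableRows` rows, fetching up to `caching` rows per
     * round trip, and stops as soon as `limit` rows have been consumed.
     */
    static Outcome scan(int tableRows, int caching, int limit) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        List<Integer> out = new ArrayList<>();
        int rpcs = 0;
        int fetched = 0;
        int bufPos = 0; // next buffered row to hand out
        int bufEnd = 0; // one past the last row fetched so far

        while (out.size() < limit) {
            if (bufPos == bufEnd) {          // local buffer is empty
                if (bufEnd == tableRows) {   // table exhausted
                    break;
                }
                int n = Math.min(caching, tableRows - bufEnd);
                bufEnd += n;                 // one simulated round trip
                fetched += n;
                rpcs++;
            }
            out.add(bufPos++);               // caller consumes one row
        }
        return new Outcome(out, rpcs, fetched);
    }

    public static void main(String[] args) {
        int[][] cases = { {1000, 1, 10}, {1000, 100, 10}, {5000, 100, 1000} };
        for (int[] c : cases) {
            Outcome o = scan(c[0], c[1], c[2]);
            System.out.println("caching=" + c[1] + " limit=" + c[2]
                    + " rpcs=" + o.rpcs + " wasted=" + o.wasted());
        }
    }
}
```

In this model, a limit of 10 with caching of 1 costs ten round trips and wastes nothing, while caching of 100 costs one round trip and leaves 90 fetched rows unused. The waste is always less than the caching value, which matches the "bounded" intuition in the question.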
Re: Record limit in scan api?

There is this in the configuration:

  <property>
    <name>hbase.client.scanner.caching</name>
    <value>1</value>
    <description>Number of rows that will be fetched when calling next
    on a scanner if it is not served from memory. Higher caching values
    will enable faster scanners but will eat up more memory and some
    calls of next may take longer and longer times when the cache is empty.
    </description>
  </property>

Being able to do it per Scan sounds like something we should add.

St.Ack
Re: Record limit in scan api?

To set this per scan you should be able to do:

  Scan s = new Scan();
  s.setCaching(...);

(I think this works anyway)

The other thing that I've found useful is using a PageFilter on scans:

http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/filter/PageFilter.html

I believe this is applied independently on each region server (?), so you still need to do your own counting when iterating over the results, but it can be used to early out on the server side, separately from the scanner caching value.

--gh

On Fri, Nov 20, 2009 at 3:04 PM, stack <stack@...> wrote:
> There is this in the configuration:
>
> <property>
>   <name>hbase.client.scanner.caching</name>
>   <value>1</value>
>   <description>Number of rows that will be fetched when calling next
>   on a scanner if it is not served from memory. Higher caching values
>   will enable faster scanners but will eat up more memory and some
>   calls of next may take longer and longer times when the cache is empty.
>   </description>
> </property>
>
> Being able to do it per Scan sounds like something we should add.
>
> St.Ack
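
The point about a page limit being applied independently per region, so that the client still has to count, can be sketched as follows. This is a plain-Java toy model and not HBase code: PerRegionPageSketch, serverSide and clientSide are hypothetical names, regions are just lists of row keys, and the model assumes the client ends up reading from every region, which is the worst case for how many rows a per-region limit alone lets through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a page limit enforced separately inside each region.
// Illustration only; no HBase classes are used or imitated here.
class PerRegionPageSketch {

    /** Each region stops on its own after pageSize rows. */
    static List<String> serverSide(List<List<String>> regions, int pageSize) {
        List<String> shipped = new ArrayList<>();
        for (List<String> region : regions) {
            int n = Math.min(pageSize, region.size());
            shipped.addAll(region.subList(0, n));
        }
        return shipped;
    }

    /** The client still counts and cuts off at its own overall limit. */
    static List<String> clientSide(List<String> shipped, int limit) {
        List<String> kept = new ArrayList<>();
        for (String row : shipped) {
            if (kept.size() >= limit) {
                break;
            }
            kept.add(row);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("a1", "a2", "a3", "a4", "a5"),
                Arrays.asList("m1", "m2", "m3", "m4", "m5"),
                Arrays.asList("x1", "x2"));
        List<String> shipped = serverSide(regions, 3);
        List<String> kept = clientSide(shipped, 3);
        System.out.println("rows passing the per-region limit: " + shipped.size());
        System.out.println("rows kept by the client: " + kept);
    }
}
```

With three regions and a page size of 3, the per-region limit alone lets through up to 3 rows from each region, so the overall cap has to come from the client's own counter.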