Use Longs for Line Numbers

Use Longs for Line Numbers

by Kurt Guenther

I'm currently parsing an 8 GB file with 241,000,000 lines, which is about
11% of the 2^31 - 1 max value of int. I'm supposed to support files
up to 200 GB, so is there a way to generate the token manager with longs
instead of ints?
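
For scale, here is a rough back-of-the-envelope check of where a 200 GB file would land. It assumes the larger file has the same average line length as the 8 GB sample, which is only a guess; the class and method names are made up for illustration.

```java
// Rough estimate only: assumes the 200 GB file has the same average line
// length as the 8 GB sample (an assumption, not a measurement).
class LineCountEstimate {
    // Scale a sampled line count linearly to a larger file size.
    static long estimateLines(long sampleLines, long sampleGb, long targetGb) {
        return sampleLines * targetGb / sampleGb;
    }

    // True when the count no longer fits in a Java int line counter.
    static boolean overflowsInt(long lines) {
        return lines > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long lines = estimateLines(241_000_000L, 8, 200);
        System.out.println(lines + " lines, overflows int: " + overflowsInt(lines));
    }
}
```

Under that assumption the estimate comes to roughly 6 billion lines, which is well past what an int can hold.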




---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...


Re: Use Longs for Line Numbers

by Tom Copeland

On Oct 6, 2009, at 6:57 PM, Kurt Guenther wrote:

>
> I'm currently parsing an 8 GB file with 241,000,000 lines, which is
> about 11% of the 2^31 - 1 max value of int. I'm supposed to
> support files up to 200 GB, so is there a way to generate the token
> manager with longs instead of ints?

I keep thinking about this one.  We can't do just a search and replace  
on the generated code since other stuff uses ints as well - e.g., the  
lexical states are contained in an array of integers.  I think it'd  
also affect the parser generating code, since we'd need to make jj_gen  
and jj_ntk and others into longs as well.

At the end of the day, it might be easier to preprocess your input  
data to chunk it into smaller files.
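
A minimal sketch of that chunking idea, in case it helps. It assumes the input can be split safely on line boundaries (no grammar construct spans a chunk boundary) and that the file is UTF-8 text; LineChunker and its methods are hypothetical helpers, not part of JavaCC.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JavaCC: splits a text file into chunks
// of at most linesPerChunk lines each.
class LineChunker {
    // Returns the chunk files in order. Chunk i starts at line
    // 1 + (long) i * linesPerChunk of the original input.
    // Line terminators are rewritten to the platform default.
    static List<Path> split(Path input, Path outDir, int linesPerChunk) throws IOException {
        if (linesPerChunk <= 0) {
            throw new IllegalArgumentException("linesPerChunk must be positive");
        }
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            BufferedWriter out = null;
            try {
                int linesInChunk = 0;
                String line;
                while ((line = in.readLine()) != null) {
                    // Start a new chunk file when the current one is full.
                    if (out == null || linesInChunk == linesPerChunk) {
                        if (out != null) {
                            out.close();
                        }
                        Path chunk = outDir.resolve("chunk-" + chunks.size() + ".txt");
                        chunks.add(chunk);
                        out = Files.newBufferedWriter(chunk, StandardCharsets.UTF_8);
                        linesInChunk = 0;
                    }
                    out.write(line);
                    out.newLine();
                    linesInChunk++;
                }
            } finally {
                if (out != null) {
                    out.close();
                }
            }
        }
        return chunks;
    }

    // Maps a 1-based line number reported within one chunk back to a
    // line number in the original file, using long arithmetic.
    static long globalLine(int chunkIndex, int linesPerChunk, int localLine) {
        return (long) chunkIndex * linesPerChunk + localLine;
    }
}
```

Each chunk can then go through the parser unchanged, and globalLine turns a chunk-local line number from an error message back into a position in the original file.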

Yours,

Tom



Re: Use Longs for Line Numbers

by Sreenivasa Viswanadha

Yes, we should fix it. But for now, you can simply edit the generated
files - Token.java and SimpleCharStream.java to make the line numbers
long. There might be some issues with error reporting, but it should work.

Tom, I'm not sure I understand what you are saying. The line number stuff
is rather straightforward. It's used purely for reporting. It doesn't
affect the generated code at all.
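
To illustrate the kind of change meant here, below is a simplified stand-in for line bookkeeping with a long counter. It is not the actual generated SimpleCharStream or Token code; LongLineTracker and its members are hypothetical names, and the real generated files would need the same int-to-long change applied to their own line fields and accessors.

```java
// Simplified stand-in for the line bookkeeping in a generated char stream.
// NOT the actual JavaCC output; all names here are hypothetical.
class LongLineTracker {
    private long line;              // long instead of int
    private int column = 0;         // columns reset on every line, int is enough
    private boolean prevCharIsCR = false;

    LongLineTracker(long startLine) {
        this.line = startLine;
    }

    // Advance the position by one character, counting "\r\n" as one line break.
    void update(char c) {
        if (c == '\n') {
            // A preceding '\r' has already been counted as the line break.
            if (!prevCharIsCR) {
                line++;
            }
            prevCharIsCR = false;
            column = 0;
        } else if (c == '\r') {
            line++;
            prevCharIsCR = true;
            column = 0;
        } else {
            prevCharIsCR = false;
            column++;
        }
    }

    long getLine() {
        return line;
    }

    int getColumn() {
        return column;
    }

    // What an int counter does at the limit: it silently wraps to a negative value.
    static int nextIntLine(int intLine) {
        return intLine + 1;
    }
}
```

With a long field the counter keeps going past 2^31 - 1, whereas nextIntLine(Integer.MAX_VALUE) wraps around to Integer.MIN_VALUE, which is the failure mode an int line number would hit on a big enough file.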




Re: Use Longs for Line Numbers

by Tom Copeland

On Nov 9, 2009, at 3:46 PM, sreeni@... wrote:

>
> Yes, we should fix it. But for now, you can simply edit the generated
> files - Token.java and SimpleCharStream.java to make the line numbers
> long. There might be some issues with error reporting, but it should work.
>
> Tom, I'm not sure I understand what you are saying. The line number stuff
> is rather straightforward. It's used purely for reporting. It doesn't
> affect the generated code at all.

Hm, maybe I misunderstood... I was thinking that Kurt was saying that
there were more than Integer.MAX_VALUE tokens, so that all those int[]
arrays of tokens and such would need to be long[] instead. But maybe
I'm misrepresenting his question...

Yours,

Tom
