Use Longs for Line Numbers

I'm currently parsing an 8 GB file with 241,000,000 lines which is about 12.5% of the 2^31 - 1 max value of int. I'm supposed to support files up to 200 GB, so is there a way to generate the token manager with longs instead of ints?

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...
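The arithmetic behind the question can be checked with a short sketch. It assumes the line density of the 8 GB sample holds for the 200 GB case, which is only an estimate; `LineBudget` and its method names are made up for illustration. Worked through, 241,000,000 lines is closer to 11.2% of Integer.MAX_VALUE than to 12.5%, and scaling to 200 GB gives roughly 6.0 billion lines, about 2.8 times what an int can hold.

```java
final class LineBudget {
    /** Scales a measured line count to a larger file, assuming uniform line density. */
    static long estimateLines(long sampleLines, long sampleBytes, long targetBytes) {
        // Work in double so the intermediate product cannot overflow long.
        return Math.round((double) sampleLines * targetBytes / sampleBytes);
    }

    /** True if the line count can still be held in an int counter. */
    static boolean fitsInInt(long lines) {
        return lines <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        long sampleLines = 241_000_000L;
        long projected = estimateLines(sampleLines, 8 * gb, 200 * gb);
        System.out.printf("8 GB sample uses %.1f%% of the int range%n",
                100.0 * sampleLines / Integer.MAX_VALUE);
        System.out.println("Projected lines at 200 GB: " + projected);
        System.out.println("Fits in int: " + fitsInInt(projected));
    }
}
```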
Re: Use Longs for Line Numbers

On Oct 6, 2009, at 6:57 PM, Kurt Guenther wrote:

> I'm currently parsing an 8 GB file with 241,000,000 lines which is
> about 12.5% of the 2^31 -1 max value of int. I'm supposed to
> support files up to 200 GB, so is there a way to generate the token
> manager with longs instead of ints?

I keep thinking about this one. We can't do just a search and replace on the generated code since other stuff uses ints as well - e.g., the lexical states are contained in an array of integers. I think it'd also affect the parser generating code, since we'd need to make jj_gen and jj_ntk and others into longs as well.

At the end of the day, it might be easier to preprocess your input data to chunk it into smaller files.

Yours,
Tom
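The chunking idea could look something like the sketch below. It is only an illustration under stated assumptions: `LineChunker`, `Chunk`, and the `chunk-N.txt` naming are hypothetical, the input is assumed to be UTF-8, and splitting on line boundaries is only safe if no token or construct in the grammar spans a line break at a chunk boundary. Each chunk remembers the long line number of its first line, so an int line number reported inside a chunk can be mapped back to a position in the original file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class LineChunker {
    /** A chunk file plus the 1-based line number, in the original input, of its first line. */
    static final class Chunk {
        final Path file;
        final long firstLine;
        Chunk(Path file, long firstLine) { this.file = file; this.firstLine = firstLine; }
    }

    /** Splits input into files of at most linesPerChunk lines each. */
    static List<Chunk> split(Path input, Path outDir, int linesPerChunk) throws IOException {
        if (linesPerChunk <= 0) throw new IllegalArgumentException("linesPerChunk must be positive");
        Files.createDirectories(outDir);
        List<Chunk> chunks = new ArrayList<>();
        long globalLine = 0; // long, so the running total cannot wrap
        BufferedWriter out = null;
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (globalLine % linesPerChunk == 0) {
                    // Start a new chunk file every linesPerChunk lines.
                    if (out != null) out.close();
                    Path file = outDir.resolve("chunk-" + chunks.size() + ".txt");
                    out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
                    chunks.add(new Chunk(file, globalLine + 1));
                }
                out.write(line);
                out.newLine(); // note: line endings are normalized to the platform default
                globalLine++;
            }
        } finally {
            if (out != null) out.close();
        }
        return chunks;
    }

    /** Maps a 1-based line number within a chunk back to the original file. */
    static long toGlobalLine(Chunk chunk, int localLine) {
        return chunk.firstLine + localLine - 1;
    }
}
```

With a chunk size well below Integer.MAX_VALUE, the generated parser's int line numbers stay valid inside each chunk, and `toGlobalLine` recovers the position for error messages.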
Re: Use Longs for Line Numbers

Yes, we should fix it. But for now, you can simply edit the generated files - Token.java and SimpleCharStream.java to make the line numbers long. There might be some issues with error reporting, but it should work.

Tom, I'm not sure I understand what you are saying. The line number stuff is rather straightforward. It's used purely for reporting. It doesn't affect the generated code at all.

> On Oct 6, 2009, at 6:57 PM, Kurt Guenther wrote:
>
>> I'm currently parsing an 8 GB file with 241,000,000 lines which is
>> about 12.5% of the 2^31 -1 max value of int. I'm supposed to
>> support files up to 200 GB, so is there a way to generate the token
>> manager with longs instead of ints?
>
> I keep thinking about this one. We can't do just a search and replace
> on the generated code since other stuff uses ints as well - e.g., the
> lexical states are contained in an array of integers. I think it'd
> also affect the parser generating code, since we'd need to make jj_gen
> and jj_ntk and others into longs as well.
>
> At the end of the day, it might be easier to preprocess your input
> data to chunk it into smaller files.
>
> Yours,
>
> Tom
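To illustrate why widening the line counter matters, here is a minimal stand-in, not the generated code itself. `LongLinePosition` is a hypothetical class that tracks a position the way a character stream might, with the line declared as long; it only treats '\n' as a line break, which is a simplification. In the real generated sources, the fields to look for are the line-related ints in Token.java and SimpleCharStream.java (names along the lines of beginLine and endLine, but check your own generated files), along with any getters and error-reporting code that return or format them.

```java
final class LongLinePosition {
    // Declared long, mirroring the suggested manual edit of the int line fields.
    long line = 1;
    // Columns reset on every line, so an int is assumed to be enough here.
    int column = 0;

    /** Advances the position by one character; only '\n' counts as a line break. */
    void advance(char c) {
        if (c == '\n') {
            line++;
            column = 0;
        } else {
            column++;
        }
    }

    /** What an int counter would hold at the same point: the value wraps past Integer.MAX_VALUE. */
    static int asIntCounter(long line) {
        return (int) line;
    }

    public static void main(String[] args) {
        LongLinePosition pos = new LongLinePosition();
        pos.line = Integer.MAX_VALUE; // pretend 2^31 - 1 lines have already been read
        pos.advance('\n');
        System.out.println("long counter: " + pos.line);
        System.out.println("int counter:  " + asIntCounter(pos.line));
    }
}
```

The int version goes negative one line past Integer.MAX_VALUE, which is the kind of error-reporting issue to expect anywhere a line number is still narrowed to int after the edit.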
Re: Use Longs for Line Numbers

On Nov 9, 2009, at 3:46 PM, sreeni@... wrote:

> Yes, we should fix it. But for now, you can simply edit the generated
> files - Token.java and SimpleCharStream.java to make the line numbers
> long. There might be some issues with error reporting, but it should work.
>
> Tom, I'm not sure I understand what you are saying. The line number stuff
> is rather straightforward. It's used purely for reporting. It doesn't
> affect the generated code at all.

Hm, maybe I misunderstood... I was thinking that Kurt was saying that there were more than Integer.MAX_VALUE tokens... so that all those int[] arrays of tokens and such would need to be long[] instead. But maybe I'm misrepresenting his question...

Yours,
Tom