Regular expression use

I noticed that if one is not careful in one’s use of regular expressions, compiling a regular expression can take minutes. I’m not talking about applying the pattern, just compiling it!

Should regular expressions be avoided altogether in favor of hand-crafted state machines for parsing and tokenizing, or can regular expressions be used as long as one is careful?

Best Regards,
Jonathan S. Levinson
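One way of "being careful" can be sketched as follows: keep the pattern small and free of nested quantifiers, and compile it once into a static field so the compilation cost is paid a single time. This is only an illustration; the class name `RegexTokenizer`, the token pattern, and the sample value are assumptions made up for this sketch and are not taken from FOP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: tokenizing a property-like value with one small,
// precompiled regular expression.
final class RegexTokenizer {

    // Compiled once when the class is initialized and reused on every call.
    // Group 1: a number with an optional unit; group 2: an identifier.
    private static final Pattern TOKEN = Pattern.compile(
            "\\s*(?:(\\d+(?:\\.\\d+)?[a-z%]*)|([A-Za-z][A-Za-z0-9-]*))");

    /** Splits a value such as "1.5pt solid black" into its tokens. */
    static List<String> tokenize(String input) {
        String s = input.trim();
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        int pos = 0;
        while (pos < s.length()) {
            // Only accept a token that starts exactly at the current position.
            m.region(pos, s.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                        "Unexpected character at index " + pos + " in: " + s);
            }
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1.5pt solid black"));
    }
}
```

Anchoring each match at the current position with `lookingAt()` means unexpected input fails fast instead of being silently skipped.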
Re: Regular expression use

Hi Jonathan,

Jonathan Levinson wrote:
> I noticed that if one is not careful in one's regular expression use,
> the compilation for a regular expression can take minutes. I'm not
> talking about applying the pattern just compiling it!
>
> Should regular expressions be avoided altogether and should one use
> hand-crafted state machines for parsing, and tokenizing, or can regular
> expressions be used as long as one is careful?

I’d say, use regular expressions as long as they are not too complex. But I guess you’re mentioning that in the context of property parsing, in which case I don’t think regular expressions are the ultimate answer. A proper lexer is likely to be needed, either generated or written by hand. As the latter solution quickly becomes a maintenance nightmare, some lexer generator will probably be needed. The question remains which one, and I’m not even sure there is one whose license is ASLv2-compatible.

Plus there are some issues specific to property parsing, like shorthands (which should ideally re-use the parsers of the individual properties), sub-properties, etc.

Vincent
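To make the two ideas above concrete, here is a minimal sketch of a hand-written lexer, plus a shorthand expansion that re-uses per-property parsers. Everything in it is an assumption for illustration: the class `PropertyLexerSketch`, the token kinds, the simplified "border"-like shorthand, and the accepted style keywords are hypothetical and do not reflect FOP's actual property handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not FOP code.
final class PropertyLexerSketch {

    enum Kind { LENGTH, IDENT }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + ":" + text; }
    }

    /** Hand-written lexer: a single pass over the characters, no regex. */
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        int n = input.length();
        while (i < n) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (Character.isDigit(c)) {
                // Digits and dots (not validated further), then a letter suffix as the unit.
                while (i < n && (Character.isDigit(input.charAt(i)) || input.charAt(i) == '.')) i++;
                while (i < n && Character.isLetter(input.charAt(i))) i++;
                tokens.add(new Token(Kind.LENGTH, input.substring(start, i)));
            } else if (Character.isLetter(c)) {
                while (i < n && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '-')) i++;
                tokens.add(new Token(Kind.IDENT, input.substring(start, i)));
            } else {
                throw new IllegalArgumentException("Unexpected '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    /** Parser for one individual property: says whether it accepts a token. */
    interface PropertyParser { boolean accepts(Token t); }

    static final List<String> STYLES = Arrays.asList("solid", "dashed", "dotted");

    // The individual parsers; the shorthand below only delegates to these.
    static final Map<String, PropertyParser> INDIVIDUAL = new LinkedHashMap<>();
    static {
        INDIVIDUAL.put("border-width", t -> t.kind == Kind.LENGTH);
        INDIVIDUAL.put("border-style", t -> t.kind == Kind.IDENT && STYLES.contains(t.text));
        INDIVIDUAL.put("border-color", t -> t.kind == Kind.IDENT && !STYLES.contains(t.text));
    }

    /** Expands a shorthand value by offering each token to the individual parsers. */
    static Map<String, String> expandBorder(String value) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Token t : lex(value)) {
            boolean matched = false;
            for (Map.Entry<String, PropertyParser> e : INDIVIDUAL.entrySet()) {
                if (!result.containsKey(e.getKey()) && e.getValue().accepts(t)) {
                    result.put(e.getKey(), t.text);
                    matched = true;
                    break;
                }
            }
            if (!matched) throw new IllegalArgumentException("No property accepts " + t);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(expandBorder("1pt solid black"));
    }
}
```

Even at this size, the character-level branching hints at why a hand-written lexer becomes harder to maintain as more token types are added.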
RE: Regular expression use

I'm sure someone has mentioned it already, but what about the lexer support in ANTLR?

http://www.antlr.org/wiki/display/ANTLR3/FAQ+-+Lexical+analysis

ANTLR is available under the BSD license, which seems to be one with no strings attached:

http://www.antlr.org/license.html

Best Regards,
Jonathan S. Levinson

-----Original Message-----
From: Vincent Hennebert [mailto:vhennebert@...]
Sent: Wednesday, October 07, 2009 6:51 AM
To: fop-dev@...
Subject: Re: Regular expression use
Re: Regular expression use

Hi Jonathan,

Jonathan Levinson wrote:
> I'm sure someone has mentioned it already but what about the lexer support in ANTLR?
>
> http://www.antlr.org/wiki/display/ANTLR3/FAQ+-+Lexical+analysis
>
> ANTLR is available under the BSD license, which seems to be one with no strings attached:
>
> http://www.antlr.org/license.html

Basically we’re back to the same discussion as about the parser generator, this time at the lexer level:

http://markmail.org/thread/64rmyl7x4nyoxhh3

Among the tools mentioned in the above thread, it would be good to know which ones allow the lexer to be used independently of the parser. Unless we decide to use both the lexer and parser anyway...

Vincent
RE: Regular expression use

From the following link, it looks like we can call the lexer to get tokens independently of the parser.

http://www.antlr.org/wiki/display/ANTLR3/1.+Lexer

Here is the example from the above page that gives me such hope:

    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;

    public class MainLexer {
        public static void main(String[] args) {
            try {
                // Read the input file named on the command line.
                CharStream input = new ANTLRFileStream(args[0]);
                // XMLLexer is the lexer class generated by ANTLR from the grammar.
                XMLLexer lexer = new XMLLexer(input);
                Token token;
                // Pull tokens straight from the lexer; no parser is involved.
                while ((token = lexer.nextToken()) != Token.EOF_TOKEN) {
                    System.out.println("Token: " + token.getText());
                }
            } catch (Throwable t) {
                System.out.println("Exception: " + t);
                t.printStackTrace();
            }
        }
    }

I don't know whether CharStream or XMLLexer offers a String constructor or a String factory, which is what we'd probably use within FOP.

Best Regards,
Jonathan S. Levinson

-----Original Message-----
From: Vincent Hennebert [mailto:vhennebert@...]
Sent: Thursday, October 08, 2009 5:15 AM
To: fop-dev@...
Subject: Re: Regular expression use