On Wed, Dec 14, 2011 at 2:00 AM, Ian Hickson <ian@...> wrote:
> I can remove the text "one at a time", if you like. Would that be
> satisfactory? Or I guess I could change the spec to say that the parser
> should process the characters, rather than the tokenizer, since really
> it's the whole shebang that needs to be involved (stream preprocessor and
> everything). Any opinions on what the right text is here?
I'd like the CRLF preprocessing to be defined as an eager stateful
operation so that there's one bit of state: "last was CR". Then, input
is handled as follows:
If the input character is CR, set "last was CR" to true and emit LF.
If the input character is LF and "last was CR" is true, don't emit
anything and set "last was CR" to false.
If the input character is LF and "last was CR" is false, emit LF.
Else set "last was CR" to false and emit the input character.
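
As a rough illustration only, the four rules above could be sketched in
Python like this. The names NewlineNormalizer, feed and normalize are made
up for this sketch, and the emit callback merely stands in for the
tokenizer; none of it is proposed spec text.

```python
class NewlineNormalizer:
    """Eager CR/LF normalizer holding one bit of state: "last was CR"."""

    def __init__(self, emit):
        # emit is a hypothetical callback standing in for the tokenizer.
        self.emit = emit
        self.last_was_cr = False

    def feed(self, ch):
        """Process one input character, emitting at most one character."""
        if ch == "\r":
            # CR: emit LF right away, without waiting for a following LF.
            self.last_was_cr = True
            self.emit("\n")
        elif ch == "\n":
            if self.last_was_cr:
                # LF directly after CR: already emitted, so swallow it.
                self.last_was_cr = False
            else:
                self.emit("\n")
        else:
            self.last_was_cr = False
            self.emit(ch)


def normalize(text):
    """Convenience wrapper: run a whole string through the normalizer."""
    out = []
    normalizer = NewlineNormalizer(out.append)
    for ch in text:
        normalizer.feed(ch)
    return "".join(out)
```

With this sketch, normalize("a\r\nb"), normalize("a\rb") and
normalize("a\nb") all give "a\nb", and feeding a lone CR emits an LF
immediately rather than holding it back.
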
Here, "emit" means feeding the character into the tokenizer. By "eager", I mean that the
operation described above doesn't buffer. I.e. the first case emits an
LF upon seeing a CR without waiting for an LF also to appear in the