[bug #27799] fix for nested-nested-nested variations

View: New views
15 Messages — Rating Filter:   Alert me  

[bug #27799] fix for nested-nested-nested variations

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


URL:
  <http://savannah.gnu.org/bugs/?27799>

                 Summary: fix for nested-nested-nested variations
                 Project: XBoard
            Submitted by: None
            Submitted on: Sat Oct 24 16:51:30 2009
                Category: None
                Severity: 3 - Normal
              Item Group: None
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any

    _______________________________________________________

Details:

In file parser.l theres a problem with nested-nested variations in PGN files
at around line 920. I fixed it with:
\([^()]*(\([^()]*(\([^()]*(\([^()]*\)[^()]*)*\)[^()]*)*\)[^()]*)+[^()]*\)  {
/* nested () */

Attached is a PGN by Fabiano Caruana to test if it works.



    _______________________________________________________

File Attachments:


-------------------------------------------------------
Date: Sat Oct 24 16:51:30 2009  Name: f.pgn  Size: 14kB   By: None
Attached is a PGN by Fabiano Caruana
<http://savannah.gnu.org/bugs/download.php?file_id=18937>

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27799>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-XBoard mailing list
Bug-XBoard@...
http://lists.gnu.org/mailman/listinfo/bug-xboard

[bug #27799] fix for nested-nested-nested variations

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Follow-up Comment #1, bug #27799 (project xboard):

Not sure I understand the lex syntax completely. The "+", is that a meta
character for one or more, like "*" means zero or more times, and "?" means
one or zero? And unescaped parentheses group these operators?

Btw, now that I look at parser.l, what do you think of:

(([Ww](hite)?)|([Bb](lack)?))" "([Mm]ate(s|ed)?)|([Ww][io]n(s)?.*)  {

Doesn't that match "Mateed", in stead of "Mated"?

H.G. Muller

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27799>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-XBoard mailing list
Bug-XBoard@...
http://lists.gnu.org/mailman/listinfo/bug-xboard

[bug #27799] fix for nested-nested-nested variations

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Follow-up Comment #2, bug #27799 (project xboard):

I never used flex before, but I think "White mated" isn't even desired;
wouldn't that return a win for white?
It still seems to match by ignoring the final "d".
Also "wins" alone seems to match, because of missing brackets in the right
half.

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27799>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-XBoard mailing list
Bug-XBoard@...
http://lists.gnu.org/mailman/listinfo/bug-xboard

[bug #27799] fix for nested-nested-nested variations

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Follow-up Comment #3, bug #27799 (project xboard):

All good points, and you understand the syntax correctly.

Looking at the next pattern, besides the fact that the patterns are wrongly
looking for "Mateed" we actually are trying to recognize "White mated" as both
a win and a loss by white!  I guess it could mean "White mated his opponent",
but "White is mated here" seems more likely.

Also, "White mate" seems meaningless to me, so maybe we should ignore that
one.  (If we don't match it here, the "skip everything else" pattern will skip
"White " and the "([Cc]heck)?[Mm]ate" pattern will match at that point and
consider it a loss for the player who is on move.)

Actually, much of this stuff is probably unnecessary now that PGN is a fairly
accepted standard -- it dates from when xboard was often used to parse out
games people had typed into email messages by hand, before PGN even had been
proposed.  But we could keep it and fix it this way:

(([Ww](hite)?)|([Bb](lack)?))" "[Mm]ates {
    return (int) (ToUpper(yytext[0]) == 'W' ? WhiteWins : BlackWins);
}

(([Ww](hite)?)|([Bb](lack)?))" "([Mm]ated)|([Ll]os[tes]+.*) {
    return (int) (ToUpper(yytext[0]) == 'W' ? BlackWins : WhiteWins);
}



    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27799>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-XBoard mailing list
Bug-XBoard@...
http://lists.gnu.org/mailman/listinfo/bug-xboard

[bug #27799] fix for nested-nested-nested variations

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Follow-up Comment #4, bug #27799 (project xboard):

Without guarantees this looks better:

(([Ww](hite)?)|([Bb](lack)?))" "(([Mm]ates)|([Ww][io]n(s)?)).*  {
    return (int) (ToUpper(yytext[0]) == 'W' ? WhiteWins : BlackWins);
}

(([Ww](hite)?)|([Bb](lack)?))" "(([Mm]ated)|([Ll]os[tes]+)).*  {
    return (int) (ToUpper(yytext[0]) == 'W' ? BlackWins : WhiteWins);
}

It assumes, that "White mated" means "White got mated" and not "White mated
his opponent". e.g.: at FICS "White checkmated" means white lost.

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27799>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-XBoard mailing list
Bug-XBoard@...
http://lists.gnu.org/mailman/listinfo/bug-xboard

[bug #27799] fix for nested-nested-nested variations

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Follow-up Comment #5, bug #27799 (project xboard):

I posted #4 before seeing #3.
Wouldn't your 2nd rule match any string containing "lost", returning
WhiteWins?
There should be brackets around (Mated | Lost),imho.


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27799>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-XBoard mailing list
Bug-XBoard@...
http://lists.gnu.org/mailman/listinfo/bug-xboard

[bug #27799] fix for nested-nested-nested variations

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Follow-up Comment #6, bug #27799 (project xboard):

Right, I missed fixing the missing parentheses there.  Also, I don't know why
I deleted the "wins" part.  I think I wrote that comment too early in the
morning.

Your version looks good, but one thing:  The trailing .* in those patterns
will unconditionally eat all following characters on the line.  Do we really
want that?


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27799>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-XBoard mailing list
Bug-XBoard@...
http://lists.gnu.org/mailman/listinfo/bug-xboard

[bug #27799] fix for nested-nested-nested variations

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Follow-up Comment #7, bug #27799 (project xboard):

I wasn't sure about the ".*" . I just saw in the original version, that a
"wins" eats the rest and a "white mates" doesn't.
So I unified that and added "Without guarantees".
Feel free to change that :)

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27799>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-XBoard mailing list
Bug-XBoard@...
http://lists.gnu.org/mailman/listinfo/bug-xboard

[bug #27799] fix for nested-nested-nested variations

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Follow-up Comment #8, bug #27799 (project xboard):

I see Arun checked in the change (on HG's behalf) from comment #4, with the
trailing .* removed.  That looks perfect to me.  I tested it with the
following file, and it sees 5 games and parses out the results correctly.

1. e4 White wins

1. e4 B mated  1. e4 B mates

1. d4 b Won  1. c4 e5 w lost


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27799>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-XBoard mailing list
Bug-XBoard@...
http://lists.gnu.org/mailman/listinfo/bug-xboard

[bug #27799] fix for nested-nested-nested variations

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Update of bug #27799 (project xboard):

                  Status:                    None => Fixed                  
             Open/Closed:                    Open => Closed                


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27799>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-XBoard mailing list
Bug-XBoard@...
http://lists.gnu.org/mailman/listinfo/bug-xboard

[bug #27799] fix for nested-nested-nested variations

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Follow-up Comment #9, bug #27799 (project xboard):

I almost regret that I tested this:
Load the file from the original submitter into xboard.
Save an unchanged copy to a different file.
Loading the copy gives: "Illegal move: 17 .... d4"

The first difference seems to be in the comment on "15. exd5", where the copy
has no "}" right after "opponent.".

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27799>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-XBoard mailing list
Bug-XBoard@...
http://lists.gnu.org/mailman/listinfo/bug-xboard

[bug #27799] fix for nested-nested-nested variations

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Follow-up Comment #10, bug #27799 (project xboard):

OK, problem here is that the parser recognizes several types of "comments",
between {} or (). XBoard then strips off the delimiters, appends all comments
belonging to the same move to each other, and saves between {}. This loses the
(), and explains why there is no closing brace after "opponent.". This in
itself is not responsible for the error, as there also is no opening "("
before "15. e4", and the matching closing ")" has been replaced by "}". This
of course f*s the syntax of sub-variation nesting, but hey, it is only a
comment.

But the effect of this is that the comment within the subvariation "{has been
the main choice..." is now no longer 'protected' by the subvariation's () that
were around it in the original file. So the saved file has a _nested_ comment
"{{}}". And it turns out the parser choked on that, as its rule for parsing
comments is:

{[^}]*} {         /* anything in {} */

i.e. it triggers on the first closing brace, no matter how amny opening
braces it encounters. So we drop out of comment mode too quickly, and start
parsing moves in the sub-variation like they were game moves.

The problem disappears when I add the rule:

{[^{}]*({[^{}]*}[^{}]*)+[^{}]*}  { /* nested {} */

What does the PGN stndard say here? Do {} have to be balanced?. For safety I
left the old rule after the new one, so that when no balancing }} is found,
parsing resumes after the first } that is found.

Of course what we are trying to do here is doomed from the outset. A state
machine like lex is fundamentally incapable to parse a recursive language as
required for describing balanced  parenthesising. Nesting is bad enough, but
the use of both () and {} turns it into a disaster. We can add all the
patterns for mixed nesting of {}, () (and []?) we want, but it will always be
possible to design a PGN that has more levels of nesting than our deepest
pattern has. And then I am still only addressing parsing of correctly balanced
PGN.

H.G. Muller

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27799>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-XBoard mailing list
Bug-XBoard@...
http://lists.gnu.org/mailman/listinfo/bug-xboard

[bug #27799] fix for nested-nested-nested variations

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Follow-up Comment #11, bug #27799 (project xboard):

Looks like the saved file violates the standard with its nested comment.
Quote from the standard:

Brace comments do not nest; a left brace character appearing in a brace
comment
loses its special meaning and is ignored.  A semicolon appearing inside of a
brace comment loses its special meaning and is ignored.  Braces appearing
inside of a semicolon comments lose their special meaning and are ignored.

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27799>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-XBoard mailing list
Bug-XBoard@...
http://lists.gnu.org/mailman/listinfo/bug-xboard

[bug #27799] fix for nested-nested-nested variations

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Follow-up Comment #12, bug #27799 (project xboard):

OK, this is what I was afraid of. So the method of stripping outer
parentheses, and then replacing them with braces, is not safe.

I guess the only solution is to not strip off the {} or () hen storing the
comment ith the move. Then you also don't have to add them when you save them
on file. Just append the various comment tokens with a space between them.

A little complication is that I currently also weave in the engine
score/depth and time on saving and displaying, when they are available, and
extract and strip them from the comment when they are present in it on
reading. (To prevent they will apper twice when you save again.) This would
then have to be done in a slightly smarter way, depending on if the comment
starts with { or (.

I guess I am not going to fix that now. Re-saving heavily commented games is
not really an application XBoard is designed for. We can put it on the to-do
list.

H.G. Muller

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27799>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-XBoard mailing list
Bug-XBoard@...
http://lists.gnu.org/mailman/listinfo/bug-xboard

[bug #27799] fix for nested-nested-nested variations

by Sylvain Beucler-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


Follow-up Comment #13, bug #27799 (project xboard):

Of course you are right about regular expressions not being able to handle
nesting -- what is in parser.l now is a kludge.  I had vague thoughts at one
time about recognizing {, }, (, and ) as individual tokens and handling
nesting (including a subvariation tree!) in the back end.  That never went
anywhere.


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?27799>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



_______________________________________________
Bug-XBoard mailing list
Bug-XBoard@...
http://lists.gnu.org/mailman/listinfo/bug-xboard