a bug in tokenize_and_parse ?


Re: a bug in tokenize_and_parse ?

by Hartmut Kaiser

> I am continuing my experiments with a standalone lexer.
>
> When I call
>
>    bool ok = lex::tokenize_and_parse( first, last, lexer, grammar);
>
> the function returns true even if parsing fails.
>
> It is easy enough to verify that the parsing failed since upon return
>
>  (first == last) is false
>
> and 'first' indeed points to the token where the error occurred.
>
> If the error occurs in the lexer, tokenize_and_parse returns false
> and the 'first' pointer points to the correct token. In other words,
> it seems like an error in the parser ( grammar) does not result in
> lex::tokenize_and_parse(...) returning false when a separate lexer
> is used.
>
> I am not sure if this qualifies as a "bug".
> If the behavior is correct, then the example in Lex  'Quickstart 3 -
> Counting Words Using a Parser' is misleading
> ...
>
>     bool r = lex::tokenize_and_parse(first, last, word_count, g);
>
>     if (r) {
>         std::cout << "lines: " << g.l << ", words: " << g.w
>                   << ", characters: " << g.c << "\n";
>     }
>     else {
>         std::string rest(first, last);
>         std::cerr << "Parsing failed\n" << "stopped at: \""
>                   << rest << "\"\n";
>     }
>     return 0;
> ....
>
> since the function may return success even if the parsing fails along
> the way. Of course, in this example the parser may be too simple for
> this condition to occur, but one is left with the impression that
> testing (r) is sufficient to establish success ... which it is not.
>
> Comments ?

From looking at the implementation of tokenize_and_parse I can't spot any
problems. Could you provide us with a small example reproducing this
behavior? I would consider it a bug if it behaved the way you're describing.

Regards Hartmut

-------------------
Meet me at BoostCon
http://boostcon.com




------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
_______________________________________________
Spirit-general mailing list
Spirit-general@...
https://lists.sourceforge.net/lists/listinfo/spirit-general

a bug in tokenize_and_parse ?

by Jean-Francois Ostiguy-2

Hello again.  

I am continuing my experiments with a standalone lexer.

When I call

   bool ok = lex::tokenize_and_parse( first, last, lexer, grammar);

the function returns true even if parsing fails.

It is easy enough to verify that the parsing failed since upon return
   
 (first == last) is false

and 'first' indeed points to the token where the error occurred.

If the error occurs in the lexer, tokenize_and_parse returns false
and the 'first' pointer points to the correct token. In other words,
it seems like an error in the parser ( grammar) does not result in
lex::tokenize_and_parse(...) returning false when a separate lexer
is used.  

I am not sure if this qualifies as a "bug".
If the behavior is correct, then the example in Lex  'Quickstart 3 -
Counting Words Using a Parser' is misleading
...

    bool r = lex::tokenize_and_parse(first, last, word_count, g);

    if (r) {
        std::cout << "lines: " << g.l << ", words: " << g.w
                  << ", characters: " << g.c << "\n";
    }
    else {
        std::string rest(first, last);
        std::cerr << "Parsing failed\n" << "stopped at: \""
                  << rest << "\"\n";
    }
    return 0;
....
 
since the function may return success even if the parsing fails along the
way. Of course, in this example the parser may be too simple for this
condition to occur, but one is left with the impression that testing
(r) is sufficient to establish success ... which it is not.

Comments ?






Re: a bug in tokenize_and_parse ?

by Jean-Francois Ostiguy-2


> From looking at the implementation of tokenize_and_parse I can't spot any
> problems. Could you provide us with a small example reproducing this
> behavior? I would consider it a bug if it behaved the way you're
> describing.
>

Ok. I attach a simple test.
The parser parses a trivial toy language.

First I use the "correct" syntax. In that case, all is
well. All 7 lines are parsed. Here is the output:
-----------------------------------------------------
Parsing :                                                                              
BEGIN section                                                                          
gaussian pmin = 0.0 pmax=2.17 qmin = 0.0 qmax = 1.0;                                    
gaussian pmin = 0.0 pmax=2.17 qmin = 0.0 qmax = 1.0;                                    
END section                                                                            
BEGIN section                                                                          
Identity;                                                                              
END section                                                                            

rule_dbg: delimiter
rule_dbg: declaration
rule_dbg: declaration
rule_dbg: delimiter  
rule_dbg: delimiter  
rule_dbg: declaration
rule_dbg: delimiter  
lex::tokenize_and_parse succeeds.
(first == last) = 1
----------------------------------------------

When I introduce a syntax error on line 6
that is,
 
  Identity;                                                                              

is replaced with

  Identity;;

Even though a parse error occurs on line 6, tokenize_and_parse
returns success. The output is
------------------------------------------------

Parsing :
BEGIN section
gaussian pmin = 0.0 pmax=2.17 qmin = 0.0 qmax = 1.0;
gaussian pmin = 0.0 pmax=2.17 qmin = 0.0 qmax = 1.0;
END section
BEGIN section
Identity;;
END section

rule_dbg: delimiter
rule_dbg: declaration
rule_dbg: declaration
rule_dbg: delimiter
rule_dbg: delimiter
rule_dbg: declaration
lex::tokenize_and_parse succeeds.
(first == last) = 0
Parser failed at:
END section


[tokenize_and_parse_test.cc]

//-----------------------------------------------------------------
// tokenize_and_parse_test.cc
// Demonstrates conventional lexing/parsing using boost.spirit 2.1
// ostiguy@...
//-----------------------------------------------------------------

#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_statement.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_bind.hpp>
#include <iostream>   // std::cout is used in rule_dbg below
#include <string>

namespace lex = boost::spirit::lex;

template <typename Lexer>
struct my_lexer : boost::spirit::lex::lexer<Lexer>
{
  my_lexer() {
     delimiter   = "BEGIN|END";
     identifier  = "[a-zA-Z][_\\.a-zA-Z0-9]*";              
     ws          = "[ \\t\\n]+";
     real        = "([0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)|([-+]?[1-9]+\\.?([eE][-+]?[0-9]+))";
     integer     = "[0-9]+";  

     boost::spirit::lex::lexer<Lexer>::self += ws[ lex::_pass = lex::pass_flags::pass_ignore];
     boost::spirit::lex::lexer<Lexer>::self += delimiter;
     boost::spirit::lex::lexer<Lexer>::self += identifier;
     boost::spirit::lex::lexer<Lexer>::self += real;
     boost::spirit::lex::lexer<Lexer>::self += integer;
     boost::spirit::lex::lexer<Lexer>::self += '=';
     boost::spirit::lex::lexer<Lexer>::self += ';';
   
    }

    lex::token_def<>            ws;
    lex::token_def<std::string> identifier;
    lex::token_def<int>         integer;
    lex::token_def<double>      real;
    lex::token_def<>            delimiter;   // BEGIN/END keywords carry no value

};

//||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
//||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

void rule_dbg(std::string const& str )
{
  std::cout << "rule_dbg: " << str << std::endl;
}

//||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
//||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

template <typename Iterator>
struct my_grammar : boost::spirit::qi::grammar<Iterator> {
   
  template <typename TokenDef>
  my_grammar( TokenDef const& tok )
      :  my_grammar::base_type(statement) {
       
       using namespace boost::spirit::qi;
       using boost::spirit::_1;
       using boost::phoenix::bind;
   

       statement      =   eoi             [bind(rule_dbg,"eoi")        ]
                         |
                           *(  delimiter     [bind(rule_dbg,"delimiter")  ]
                             | declaration   [bind(rule_dbg,"declaration")]
                            )
                         ;

       delimiter      =   tok.delimiter  >> tok.identifier;  

       declaration    =   tok.identifier >> option  >> ';';

       option         =   *(tok.identifier >> '=' >> (tok.real|tok.integer) );    

   }

  boost::spirit::qi::rule<Iterator> statement, delimiter, declaration, option;
};

//||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
//||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

typedef lex::lexertl::token<char const*>        token_type;
typedef lex::lexertl::actor_lexer<token_type>   lexer_type;
typedef my_lexer<lexer_type>::iterator_type  iterator_type;

#include <iostream>
#include <sstream>

using namespace std;

int main( int argc, char* argv[] )
{
 
    string test_string="BEGIN section\n";
    test_string += "gaussian pmin = 0.0 pmax=2.17 qmin = 0.0 qmax = 1.0;\n";
    test_string += "gaussian pmin = 0.0 pmax=2.17 qmin = 0.0 qmax = 1.0;\n";
    test_string += "END section\n";
    test_string += "BEGIN section\n";
 
    // WE INTRODUCE A SYNTAX ERROR: ";;" instead of ";" as a terminator.  
 
    test_string += "Identity;;\n";      // THIS WILL MAKE THE PARSER FAIL
    //test_string += "Identity;\n";     // CORRECT SYNTAX  

 
    test_string += "END section\n" ;

 
  cout << "Parsing : \n" << test_string << endl;  

  char const* first = &test_string[0];
  char const* last  = &first[test_string.size()];  
     
  my_lexer<lexer_type>          lexer;            
  my_grammar<iterator_type>     grammar(lexer);      

  bool ok = lex::tokenize_and_parse( first, last, lexer, grammar );

  if (ok ) {
    cout << "lex::tokenize_and_parse succeeds." << endl;
  }
  else {
    cout << "lex::tokenize_and_parse fails." << endl;
  }

  cout << "(first == last) = " << (first == last) << endl;

  if( first != last) {
    string rest( first,last );
    cout << "Parser failed at: " << rest << endl;
  }


}





Re: a bug in tokenize_and_parse ?

by Hartmut Kaiser

> > From looking at the implementation of tokenize_and_parse I can't spot
> > any problems. Could you provide us with a small example reproducing
> > this behavior? I would consider it a bug if it behaved the way you're
> > describing.
> >
>
> Ok. I attach a simple test.
> The parser parses a trivial toy language.

Thanks! That example made it clear. The tokenize_and_parse functions now
check whether the lexer has reached its end of input. This makes the return
value semantically equivalent to the return value of tokenize().

I hope you don't mind me adding a new regression test based on your example.
I'm not sure if we will be able to include this fix into the upcoming
release, though. But I'll ask Beman after the beta has been finished.

Some unrelated comment:

If you make your token_def's carry an explicit token value (e.g.
token_def<double>), I suggest adding the full list of used token value types
to the token definition as well:

    typedef lex::lexertl::token<char const*
      , mpl::vector<std::string, double, int> > token_type;

which enables late value conversion in the token type, making the whole
lexing process more efficient.
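
To illustrate the general idea behind late value conversion, here is a minimal stand-alone sketch that does not use Spirit at all. The `lazy_token` type and its members are hypothetical names made up for this illustration, and the real lexertl token is implemented differently. The token keeps only the matched character range and converts it to a double the first time the value is requested, then reuses the cached result.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for a token; NOT the Spirit.Lex token class.
struct lazy_token {
    const char* first;   // start of the matched text
    const char* last;    // one past the end of the matched text
    bool converted;      // has the text been converted yet?
    double value;        // cached result of the conversion
    int conversions;     // counts how often the text was actually converted

    lazy_token(const char* f, const char* l)
        : first(f), last(l), converted(false), value(0.0), conversions(0) {}

    // Convert on first request only; later requests reuse the cached value.
    double as_double() {
        if (!converted) {
            value = std::strtod(std::string(first, last).c_str(), nullptr);
            converted = true;
            ++conversions;
        }
        return value;
    }
};

// Usage sketch: reading the value twice triggers a single conversion.
inline void lazy_token_example() {
    const char* text = "2.17";
    lazy_token tok(text, text + 4);
    assert(tok.conversions == 0);   // nothing converted while lexing
    double a = tok.as_double();
    double b = tok.as_double();
    assert(a == b);
    assert(tok.conversions == 1);   // converted exactly once
}
```

A token that is lexed but never inspected by the parser pays no conversion cost in this model.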

Regards Hartmut


[lex] Is there a lex equivalent of distinct directive in Qi

by Andy Stevenson-2

Hi,

By the way, I am extremely impressed with the whole Spirit 2.1
framework and docs. An awesome piece of work.

I want to write a lexer that uses the equivalent of the distinct  
directive that's now in the repository.
What would you recommend as the best technique in Lex for doing this?

Andy


Re: a bug in tokenize_and_parse ?

by Jean-Francois Ostiguy-2

Hartmut Kaiser wrote:

> I hope you don't mind me adding a new regression test based on your
> example. I'm not sure if we will be able to include this fix into the
> upcoming release, though. But I'll ask Beman after the beta has been
> finished.

I am more than happy to make (a very very minor) contribution to such a
fine piece of work. Use my test case as you see fit.

I sure hope that 1.41 will include the fix. Perhaps you should
consider maintaining a list of "interesting" bugfixes
on the spirit website (the list would provide a short explanation and
refer to relevant svn commits so that patches can be retrieved).  

Another issue: standalone functions such as lex::tokenize_and_parse()
and  lex::tokenize_and_phrase_parse() do not seem to be formally documented,
although lex::tokenize_and_parse is mentioned in one of the examples. There
is also no discussion of why they should be used,
instead of, say, the parse() member functions.

> Some unrelated comment:
>
> If you make your token_def's carry an explicit token value (e.g.
> token_def<double>), I suggest adding the full list of used token value
> types to the token definition as well:
>
>     typedef lex::lexertl::token<char const*
>       , mpl::vector<std::string, double, int> > token_type;
>
> which enables late value conversion in the token type, making the whole
> lexing process more efficient.

Thanks for this comment. I am barely beginning to be able to appreciate this
kind of detail. I really admire the quality of the work that went into
spirit and the dedication of its developers.

Regards

-Francois




Re: a bug in tokenize_and_parse ?

by Hartmut Kaiser

> > I hope you don't mind me adding a new regression test based on your
> > example. I'm not sure if we will be able to include this fix into the
> > upcoming release, though. But I'll ask Beman after the beta has been
> > finished.
>
> I am more than happy to make (a very very minor) contribution to such a
> fine piece of work. Use my test case as you see fit.

Ok, it's added to SVN now.

> I sure hope that 1.41 will include the fix. Perhaps you should
> consider maintaining a list of "interesting" bugfixes
> on the spirit website (the list would provide a short explanation and
> refer to relevant svn commits so that patches can be retrieved).

Good point, we might want to do that during the time between releases.

> Another issue: standalone functions such as lex::tokenize_and_parse()
> and lex::tokenize_and_phrase_parse() do not seem to be formally
> documented, although lex::tokenize_and_parse is mentioned in one of the
> examples. There is also no discussion of why they should be used
> instead of, say, the parse() member functions.

Yes, the lexer docs are incomplete, sorry. I'm working on that, still. We
concentrated on Qi/Karma docs for this release.

> > Some unrelated comment:
> >
> > If you make your token_def's carry an explicit token value (e.g.
> > token_def<double>), I suggest adding the full list of used token
> > value types to the token definition as well:
> >
> >     typedef lex::lexertl::token<char const*
> >       , mpl::vector<std::string, double, int> > token_type;
> >
> > which enables late value conversion in the token type, making the
> > whole lexing process more efficient.
>
> Thanks for this comment. I am barely beginning to be able to appreciate
> this kind of detail. I really admire the quality of the work that went
> into spirit and the dedication of its developers.

Thanks!
Regards Hartmut


Re: a bug in tokenize_and_parse ?

by Hartmut Kaiser

> Another issue: standalone functions such as lex::tokenize_and_parse()
> and lex::tokenize_and_phrase_parse() do not seem to be formally
> documented, although lex::tokenize_and_parse is mentioned in one of the
> examples. There is also no discussion of why they should be used
> instead of, say, the parse() member functions.

Added now here: http://tinyurl.com/yfw8oqq.

Regards Hartmut


Re: a bug in tokenize_and_parse ?

by Hartmut Kaiser

Francois,

> > I hope you don't mind me adding a new regression test based on your
> > example. I'm not sure if we will be able to include this fix into the
> > upcoming release, though. But I'll ask Beman after the beta has been
> > finished.
>
> I am more than happy to make (a very very minor) contribution to such a
> fine piece of work. Use my test case as you see fit.
>
> I sure hope that 1.41 will include the fix. Perhaps you should
> consider maintaining a list of "interesting" bugfixes
> on the spirit website (the list would provide a short explanation and
> refer to relevant svn commits so that patches can be retrieved).

I was thinking about this 'fix' ever since and came to the conclusion that I
would like to revert that change.

Here is my rationale: all Qi parse API functions return whether the parsing
succeeded without checking whether the end of input (eoi) has been reached.
That allows parsing of partial input while still getting the proper return
value.

The change I made to the lexer API functions (tokenize_and_parse,
tokenize_and_phrase_parse) introduces different semantics because these
functions now check for the eoi criterion as well. But I would like to keep
the semantics as close as possible.

Reverting this change would require a minor change to your code, as you now
need to check for the eoi criterion yourself by comparing the iterators after
the tokenize_and_... functions have returned:

  // old code
  bool ok = lex::tokenize_and_parse( first, last, lexer, grammar );

  // new code
  bool ok = lex::tokenize_and_parse( first, last, lexer, grammar ) &&
      first == last;
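
The rationale above can be illustrated without Spirit. In the following sketch, `parse_uint` is a hypothetical stand-in for a Qi-style parse function, not part of any library: it returns true when its own rule matched and advances `first` past what it consumed, without caring whether input remains. The caller decides whether leftover input is an error.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical Qi-style parse function: matches one or more digits.
// Returns true on a match and advances 'first' past the digits;
// it does NOT require that the whole input was consumed.
inline bool parse_uint(const char*& first, const char* last, unsigned& out) {
    const char* it = first;
    unsigned value = 0;
    while (it != last && std::isdigit(static_cast<unsigned char>(*it))) {
        value = value * 10 + static_cast<unsigned>(*it - '0');
        ++it;
    }
    if (it == first) return false;   // no digit at all: the rule failed
    first = it;
    out = value;
    return true;
}

// Full-match check layered on top, mirroring '&& first == last'.
inline bool parse_uint_fully(const char*& first, const char* last, unsigned& out) {
    return parse_uint(first, last, out) && first == last;
}

// Usage sketch: a partial match is a success for parse_uint.
inline void partial_parse_example() {
    const char* text = "42;rest";
    const char* first = text;
    const char* last = text + 7;
    unsigned value = 0;
    assert(parse_uint(first, last, value));   // the rule matched...
    assert(value == 42 && *first == ';');     // ...but input remains
}
```

With the input "42;rest", `parse_uint` leaves `first` at the ';' so the caller can continue from there, while `parse_uint_fully` reports false for the same input.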

How does this sound to you?
Regards Hartmut


Re: a bug in tokenize_and_parse ?

by Jean-Francois Ostiguy

Hartmut Kaiser wrote:


> I was thinking about this 'fix' ever since and came to the conclusion that
> I would like to revert that change.
>
> Here is my rationale: all Qi parse API functions return whether the
> parsing succeeded without checking whether the end of input (eoi) has been
> reached. That allows parsing of partial input while still getting the
> proper return value.
>
> The change I made to the lexer API functions (tokenize_and_parse,
> tokenize_and_phrase_parse) introduces different semantics because these
> functions now check for the eoi criterion as well. But I would like to
> keep the semantics as close as possible.
>
> Reverting this change would require a minor change to your code, as you
> now need to check for the eoi criterion yourself by comparing the
> iterators after the tokenize_and_... functions have returned:
>
>   // old code
>   bool ok = lex::tokenize_and_parse( first, last, lexer, grammar );
>
>   // new code
>   bool ok = lex::tokenize_and_parse( first, last, lexer, grammar ) &&
>       first == last;
>

I think this is fine as long as it is documented. The issue is that the
status code true does not imply success, and this needs to be clear. Some of
the examples also need to be modified because they imply that checking the
status code is sufficient, which it is not.

-Francois
 



Re: a bug in tokenize_and_parse ?

by Hartmut Kaiser

> > I was thinking about this 'fix' ever since and came to the conclusion
> > that I would like to revert that change.
> >
> > Here is my rationale: all Qi parse API functions return whether the
> > parsing succeeded without checking whether the end of input (eoi) has
> > been reached. That allows parsing of partial input while still getting
> > the proper return value.
> >
> > The change I made to the lexer API functions (tokenize_and_parse,
> > tokenize_and_phrase_parse) introduces different semantics because
> > these functions now check for the eoi criterion as well. But I would
> > like to keep the semantics as close as possible.
> >
> > Reverting this change would require a minor change to your code, as
> > you now need to check for the eoi criterion yourself by comparing the
> > iterators after the tokenize_and_... functions have returned:
> >
> >   // old code
> >   bool ok = lex::tokenize_and_parse( first, last, lexer, grammar );
> >
> >   // new code
> >   bool ok = lex::tokenize_and_parse( first, last, lexer, grammar ) &&
> >       first == last;
> >
>
> I think this is fine as long as it is documented. The issue is that the
> status code true does not imply success, and this needs to be clear.
> Some of the examples also need to be modified because they imply that
> checking the status code is sufficient, which it is not.

Makes sense.

Regards Hartmut


Re: a bug in tokenize_and_parse ?

by OvermindDL1

On Tue, Nov 3, 2009 at 11:28 AM, Jean-Francois Ostiguy <ostiguy@...> wrote:
> I think this is fine as long as it is documented. The issue is that the
> status code true does not imply success, and this needs to be clear. Some of
> the examples also need to be modified because they imply that checking the
> status code is sufficient, which it is not.

But it does indicate success.  It does not mean that all of your input
was parsed, but the parse did complete successfully.  Thus I do not
understand why you say it does not.


Re: a bug in tokenize_and_parse ?

by Jean-Francois Ostiguy

OvermindDL1 wrote:


> But it does indicate success.  It does not mean that all of your input
> was parsed, but the parse did complete successfully.  Thus I do not
> understand why you say it does not.
>

Maybe this is not clear.

Say you call

bool success = tokenize_and_parse(start, end, lexer, parser);

Assume you are parsing a file and there is a syntax error somewhere in the
input; the current semantics is to return a status of "true" (success) even
if parsing has stopped due to incorrect syntax.
You say: if not all input was parsed, this is "success".
So how do you define "failure", i.e. what does success = false mean?
As I understand it, it means that lexing has failed.

My definition of success is: both lexing and parsing are completely
successful.
This is what I want to test for in my code before proceeding to
other tasks.

For this, I need to do something like

  bool success =  
    tokenize_and_parse(start, end, lexer, parser) && (start == end);  
         
   
or more likely

   bool success   =  tokenize_and_parse(start, end, lexer, parser);
        success   =  (success && (start == end));

since I am not sure that, with the first form, one can assume that
tokenize_and_parse( ... ) would be evaluated first.
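
On the evaluation-order concern: for the built-in `&&` operator, C++ guarantees that the left operand is evaluated first and that the right operand is evaluated only if the left one yields true, so the two forms behave the same. A minimal sketch, where `consume_until_semicolon` is a hypothetical stand-in for a call such as tokenize_and_parse that advances `start` and may stop early:

```cpp
#include <cassert>

// Hypothetical stand-in: consumes input up to the first ';' (or the end)
// and always reports success, like a parser that may stop early.
inline bool consume_until_semicolon(const char*& start, const char* end) {
    while (start != end && *start != ';') ++start;
    return true;
}

// The left operand of && runs first, so 'start' has already been advanced
// by the time 'start == end' is evaluated.
inline bool matched_everything(const char* start, const char* end) {
    return consume_until_semicolon(start, end) && (start == end);
}

// Usage sketch.
inline void evaluation_order_example() {
    const char* whole = "abc";
    assert(matched_everything(whole, whole + 3));      // consumed to the end
    const char* partial = "ab;c";
    assert(!matched_everything(partial, partial + 4)); // stopped at ';'
}
```

If the comparison were evaluated before the call, the first assertion would fail, because `start` initially differs from `end`.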

The bottom line is that it is not worth making a federal case of the
specific semantics. I do think, however, that it needs to be clearly
documented.

-Francois
 



 
   



Re: a bug in tokenize_and_parse ?

by Hartmut Kaiser

> > But it does indicate success.  It does not mean that all of your input
> > was parsed, but the parse did complete successfully.  Thus I do not
> > understand why you say it is not?
> >
>
> Maybe this is not clear.
>
> Say you call
>
> bool success = tokenize_and_parse(start, end, lexer, parser);
>
> Assume you are parsing a file and there is a syntax error somewhere in
> the input; the current semantics is to return a status of "true"
> (success) even if parsing has stopped due to incorrect syntax.

That's not true. Your parser returned true even in case of an error because
your top-level rule was a Kleene expression (unary operator*()), which by
design always succeeds, even if it matches nothing. Its semantics are 'match
zero or more items of something', so any number of successfully matched
items is a success. You didn't write the grammar to require matching the
whole input.
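
To make the 'zero or more' point concrete, here is a minimal hand-rolled
sketch in plain C++. It is not Spirit code: match_digit and kleene_digits
are hypothetical stand-ins that only mimic the semantics described above,
assuming a parser interface that takes the iterator by reference.

```cpp
#include <cassert>
#include <cctype>
#include <string>

using Iter = std::string::const_iterator;

// Hypothetical stand-in for one grammar element: matches a single digit.
bool match_digit(Iter& first, Iter last) {
    if (first != last && std::isdigit(static_cast<unsigned char>(*first))) {
        ++first;
        return true;
    }
    return false;
}

// Hand-rolled "zero or more" loop with Kleene-star semantics: it consumes
// as many items as it can and then reports success, even if it consumed none.
bool kleene_digits(Iter& first, Iter last) {
    while (match_digit(first, last)) {
    }
    return true;
}

// Self-check: the unexpected 'x' stops the loop but does not cause a failure.
void kleene_demo() {
    const std::string input = "12x4";
    Iter first = input.begin();
    const bool ok = kleene_digits(first, input.end());
    assert(ok);                          // reports success
    assert(first == input.begin() + 2);  // yet stopped in front of 'x'
}
```

The only trace of the bad input is the iterator position, which matches
what was observed with tokenize_and_parse in this thread.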

> You say: if not all input was parsed, this is still "success".
> So how do you define "failure", i.e. what does success = false mean?
> As I understand it, it means that lexing has failed.

Success means the parser returned success. That might happen even if the
input has only been partially matched. If you want to ensure your parser ate
all the input, you need to either append '>> qi::eoi' to your grammar or
check whether the iterators are equal after parsing.
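
The effect of an explicit end-of-input requirement can be sketched without
Spirit. Everything below is a hypothetical stand-in (zero_or_more_a,
at_end, all_a), assuming parsers that take the iterator by reference. It
only illustrates how leftover input becomes a false return value once the
end of input is part of what must match.

```cpp
#include <cassert>
#include <string>

using Iter = std::string::const_iterator;

// Hypothetical stand-in for "*item": consumes leading 'a' characters and
// always reports success, like a zero-or-more loop.
bool zero_or_more_a(Iter& first, Iter last) {
    while (first != last && *first == 'a') {
        ++first;
    }
    return true;
}

// Hand-rolled end-of-input check: succeeds only when nothing is left,
// and consumes nothing.
bool at_end(Iter first, Iter last) {
    return first == last;
}

// "Zero or more 'a', then end of input": leftover input now yields false
// instead of a silent partial match.
bool all_a(Iter& first, Iter last) {
    return zero_or_more_a(first, last) && at_end(first, last);
}

// Self-check with one fully matching and one partially matching input.
void all_a_demo() {
    const std::string good = "aaa";
    Iter g = good.begin();
    assert(all_a(g, good.end()));

    const std::string bad = "aab";
    Iter b = bad.begin();
    assert(!all_a(b, bad.end()));
    assert(b == bad.begin() + 2);  // still points at the offending 'b'
}
```

In this sketch the iterator is left at the offending position on failure.
Whether a real grammar does the same depends on the library, so treat that
detail as an assumption of the sketch.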

> My definition of success is: both lexing and parsing are completely
> successful.

Sure, I agree. But if your parser thinks everything is ok, then
tokenize_and_parse can't tell differently.

> This is what I want to test for in my code before proceeding to
> other tasks.
>
> For this, I need to do something like
>
>   bool success =
>     tokenize_and_parse(start, end, lexer, parser) && (start == end);
>
>
> or more likely
>
>    bool success   =  tokenize_and_parse(start, end, lexer, parser);
>         success   =  (success && (start == end));
>
>  since I think that with the first form one cannot assume that
> tokenize_and_parse( ... ) would be evaluated first.

The Standard guarantees the first form to be correct, always.
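
The reason is that the built-in && on bool operands evaluates its left
operand, side effects included, before it touches the right operand, and
skips the right operand if the left one is false. A small sketch with
hypothetical stand-ins (consume plays the role of tokenize_and_parse, and
an int position plays the role of the iterator):

```cpp
#include <cassert>

// Hypothetical stand-in for tokenize_and_parse: advances pos (passed by
// reference, like the 'start' iterator) and reports success.
bool consume(int& pos, int stop_at) {
    pos = stop_at;
    return true;
}

// Single-expression form. The left operand, including its side effect on
// pos, is evaluated before the right operand reads pos.
bool consumed_everything(int stop_at, int end) {
    int pos = 0;
    return consume(pos, stop_at) && (pos == end);
}

// Self-check for a full match and a partial match.
void order_demo() {
    assert(consumed_everything(5, 5));   // stand-in parser reached the end
    assert(!consumed_everything(3, 5));  // stand-in parser stopped early
}
```

This guarantee applies to the built-in operator on bool values, which is
what you get here since tokenize_and_parse returns bool.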

> The bottom line is that it is not worth making a federal case of the
> specific semantics. I do think, however, that it needs to be clearly
> documented.

Sure, agreed as well.

Regards Hartmut


Re: a bug in tokenize_and_parse ?

by Jean-Francois Ostiguy-2

Hartmut Kaiser wrote:


> That's not true. Your parser returned true even in case of an error
> because your top-level rule was a Kleene expression (unary operator*()),
> which by design always succeeds, even if it matches nothing. Its semantics
> are 'match zero or more items of something', so any number of successfully
> matched items is a success. You didn't write the grammar to require
> matching the whole input.
>

Interesting; I certainly missed that nuance. I suppose one would need
something like  
                   
       rule =  eoi | +( statement );
           

>>    bool success   =  tokenize_and_parse(start, end, lexer, parser);
>>         success   =  (success && (start == end));
>>
>>  since I think that with the first form one cannot assume that
>> tokenize_and_parse( ... ) would be evaluated first.
>
> The Standard guarantees the first form to be correct, always.

I was not sure about that ... but it is reassuring that the Standard
guarantees the order of evaluation in a logical expression ;-)

>> The bottom line is that it is not worth making a federal case of the
>> specific semantics. I do think, however, that it needs to be clearly
>> documented.
>
> Sure, agreed as well.
>

Thank you for the explanation.  At this point, I think your decision to
leave things the way they were in the first place is definitely the correct
one.  
 

-Francois
 

