LALR parser from DEFPARSER

Hello all,

My parser theory isn't quite up to snuff, I suppose, but I think this is a
quick question to answer.

Say I've got a simple grammar like:

program -> vars statements
vars -> var vars
vars -> 

var -> :id :id :semicolon

statements -> statement statements
statements -> 

statement -> :id :gets :id :semicolon
statement -> :lbrace statement :rbrace

The interesting thing with regard to DEFPARSER is that when the parser has
seen only the token :id, it cannot tell whether that token starts a var
or a statement.

Alright, so my parser looks like:


(parsergen:defparser small-parser
  ((x        program)          $1)
  ((program  vars statements) `((vars ,$1) (statements ,$2)))

  ((vars var vars)             (cons $1 $2))
  ((vars) ())

  ((var :id :id :semicolon) 'some-var)

  ((statements statement statements) (cons $1 $2))
  ((statements) ())

  ((statement :lbrace statement :rbrace) `(,$2))
  ((statement :id :gets :id :semicolon) 'some-assigment-statement))

and I'll make a cheap tokenizer as follows: 

(defparameter *token-list*
  '(:id  :id       :semicolon
    :id :id       :semicolon
    :id  :gets :id :semicolon
    :lbrace :id :gets :id :semicolon :rbrace))
    
(defun get-token ()
  (if *token-list* (pop *token-list*) :eoi))

(small-parser #'get-token)
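
Because get-token pops a global list, the tokens are used up after one run and *token-list* has to be re-evaluated before trying again. A small variation, sketched here with a hypothetical helper name (make-token-source is not part of parsergen), wraps the list in a closure so each run gets a fresh token stream:

```lisp
;; Hypothetical helper, not part of parsergen.
(defun make-token-source (tokens)
  "Return a function of no arguments that yields TOKENS in order, then :eoi."
  (let ((remaining (copy-list tokens)))   ; private copy, caller's list untouched
    (lambda ()
      (if remaining
          (pop remaining)
          :eoi))))

;; Intended use, assuming the parser accepts any lexer function:
;; (small-parser (make-token-source '(:id :id :semicolon
;;                                    :id :gets :id :semicolon)))
```

Each call to make-token-source starts again from the first token, so the parser can be re-run without redefining anything.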


Creating the parser gives me the warnings:

.... pre processing grammar... creating state table... defining actions ...
Warning: Conflict in state 0 for symbol :Id
   Action 3 (Vars -> . )
   Action :Shift (X -> . Program )
  Using action :Shift

Warning: Conflict in state 3 for symbol :Id
   Action 3 (Vars -> . )
   Action :Shift (Vars -> Var . Vars )
  Using action :Shift
.... done.
Terminal symbols:  :Gets, :Id, :Lbrace, :Rbrace, :Semicolon

To the best of my understanding this is because seeing an :id
doesn't clarify whether it's a var or a statement being examined.

Running (small-parser #'get-token) yields:
((Vars (Some-Var Some-Var Some-Var)) (Statements ((Some-Assigment-Statement))))

which shows that the third line of tokens (which should be a statement) is
being picked up as a var. 

There is also this in the output:

Warning: Expecting (:Id), found Nil
Discarding input symbol Nil

The Nil corresponds to the :gets token. (The warning prints the second of
the multiple values returned by the lexer, and get-token returns only one,
hence Nil.)

So, am I looking at a fundamental limitation of LALR parsers, a limitation of
an LALR parser that can only look ahead one token (perhaps that's a property
of LALR anyway), or a limitation of my understanding of parser generators?
Assume for the moment that I can change the grammar only to another
equivalent form.

Thanks in advance!

-- 
=====================
Joshua Taylor
tayloj@rpi.edu

Re: LALR parser from DEFPARSER

"Joshua Taylor" <joshuaaaron@gmail.com> wrote:

> My parser theory isn't quite up to snuff, I suppose, but I think this is a
> quick question to answer.
>
> Say I've got a simple grammar like:
>
> program -> vars statements
> vars -> var vars
> vars ->
>
> var -> :id :id :semicolon

You can probably make your tokens more specific here. I don't know what you
want to do, but for example, if the first :id is a type, then use something
like :type :id :semicolon.

(parsergen:defparser small-parser
  ((x        program)          $1)
  ((program  vars statements) `((vars ,$1) (statements ,$2)))

  ((vars var vars)             (cons $1 $2))
  ((vars) ())

  ((var :type :id :semicolon) 'some-var)

  ((statements statement statements) (cons $1 $2))
  ((statements) ())

  ((statement :lbrace statement :rbrace) `(,$2))
  ((statement :id :gets :id :semicolon) 'some-assigment-statement))
.... pre processing grammar... creating state table... defining actions
....... done.
Terminal symbols:  :gets, :id, :lbrace, :rbrace, :semicolon, :type
|small-parser-ACTION8|
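
A rough sketch of how a lexer might produce the :type token, assuming it has some table of known type names to consult and that it reports a token class plus a value. The table, its contents, and the function name are hypothetical and not part of parsergen:

```lisp
;; Hypothetical table of names the lexer should treat as types.
(defparameter *known-types* '("int" "string" "widget"))

(defun classify-identifier (name)
  "Return the token class for NAME (:type or :id) and NAME itself as the value."
  (values (if (member name *known-types* :test #'string=) :type :id)
          name))
```

With this, "widget x ;" would reach the parser as :type :id :semicolon, while "x = y ;" would still begin with :id.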

Marc

Re: LALR parser from DEFPARSER

Some clarification:
the grammar I provided was actually a toy example that demonstrates the
problem I'm having with a somewhat more complex grammar. The real problem
does come down to variable declarations first, then statements. However, these
can both start with :id (an identifier, e.g. a class name for class objects,
or a variable name), so in fact it's actually something along the lines of

vars -> var vars
var -> type id

type -> :int
type -> :string
type -> :id 

... with statements much as before.
The problem seems to be that when an :id is read, it isn't yet clear
whether it begins a statement or a variable declaration. The parser
carries on assuming it's a variable declaration, and when it reaches the
assignment operator = (the :gets token) it discards it rather than
recognizing the line as a statement.


-- 
=====================
Joshua Taylor
tayloj@rpi.edu

Re: LALR parser from DEFPARSER

Well, yes, changing the grammar would work, but as I mentioned before,
that's not an option. The real issue I'm having is that the unambiguous
series of tokens that I provided in the first message is being parsed
incorrectly. The first line that corresponds to a statement is being
trashed and reinterpreted as a variable declaration because (and this is
just my understanding) var declarations and statements can both start with
the same token, _even though_ reading the next symbol (after the one that
can be 'shared') would clearly disambiguate between the var declaration
and the statement.

On 06/09/05, Marc Battyani <marc.battyani@fractalconcept.com> wrote:
> Joshua Taylor wrote:
> >Some clarification,
> >the grammar I provided was actually a toy example that demonstrates the
> >problem I'm having with a somewhat more complex grammar. The real problem
> >does come down to variable-declarations first, then statements. However,
> >these can both start with :id (identifier, e.g. a class name for class
> >objects, and a variable name) so in fact, it's actually something on the
> >lines of
> >
> >vars -> var vars
> >var -> type id
> >
> >type -> :int
> >type -> :string
> >type -> :id
> 
> OK I see. In that case your problem is not the :id but the fact that both
> vars and statements can be empty.
> For instance you can modify the grammar so that you have at least a
> declaration or at least one statement.
> (anyway a program with no statement is not very useful ;-)
> Or you can put a separator between the vars and the statements.
> Or you can turn the var decl into a statement
> etc...
> 
> ;;; At least one var:
> CL-USER 18 >
> (parsergen:defparser small-parser
>   ((x        program)          $1)
>   ((program  vars statements) `((vars ,$1) (statements ,$2)))
> 
>   ((vars vars var)             (cons $2 $1))
>   ((vars var) (list $1))
> 
>   ((var :id :id :semicolon) 'some-var)
> 
>   ((statements statement statements) (cons $1 $2))
>   ((statements) ())
> 
>   ((statement :lbrace statement :rbrace) `(,$2))
>   ((statement :id :gets :id :semicolon) 'some-assigment-statement))
> ... pre processing grammar... creating state table... defining actions
> ...... done.
> Terminal symbols:  :gets, :id, :lbrace, :rbrace, :semicolon
> |small-parser-ACTION8|
> 
> CL-USER 19 > (defparameter *token-list*
>   '(:id  :id       :semicolon
>     :id :id       :semicolon
>     :id  :gets :id :semicolon
>     :lbrace :id :gets :id :semicolon :rbrace))
> *token-list*
> 
> CL-USER 20 >
> (defun get-token ()
>   (if *token-list* (pop *token-list*) :eoi))
> get-token
> 
> CL-USER 21 >
> (small-parser #'get-token)
> ((vars (some-var some-var)) (statements (some-assigment-statement
> (some-assigment-statement))))
> nil
> 
> 
> ;;; At least one statement
>  (parsergen:defparser small-parser
>   ((x        program)          $1)
>   ((program  vars statements) `((vars ,$1) (statements ,$2)))
> 
>   ((vars vars var)             (cons $2 $1))
>   ((vars) ())
> 
>   ((var :id :id :semicolon) 'some-var)
> 
>   ((statements statement statements) (cons $1 $2))
>   ((statements statement) (list $1))
> 
>   ((statement :lbrace statement :rbrace) `(,$2))
>   ((statement :id :gets :id :semicolon) 'some-assigment-statement))
> ... pre processing grammar... creating state table... defining actions
> ...... done.
> Terminal symbols:  :gets, :id, :lbrace, :rbrace, :semicolon
> |small-parser-ACTION8|
> 
> CL-USER 44 > (defparameter *token-list*
>   '(:id  :id       :semicolon
>     :id :id       :semicolon
>     :id  :gets :id :semicolon
>     :lbrace :id :gets :id :semicolon :rbrace))
> *token-list*
> 
> CL-USER 45 >
> (defun get-token ()
>   (if *token-list* (pop *token-list*) :eoi))
> get-token
> 
> CL-USER 46 >
> (small-parser #'get-token)
> ((vars (some-var some-var)) (statements (some-assigment-statement
> (some-assigment-statement))))
> 
> Marc
> 
> 
> 


-- 
=====================
Joshua Taylor
tayloj@rpi.edu

Re: LALR parser from DEFPARSER

Aha!
Thank you greatly. It was not clear to me that defparser only generates
LALR(1) parsers. I'm not sure this is clear from the manual, but it probably
would have been if I were more familiar with the parsing canon. I'd suspected
that some factoring might be able to solve the problem (still not quite sure),
but I wanted to make sure I understood the problem before attacking that (the
full grammar I'm working with might be a little hairier to do than this
simplified one).
Again, many thanks!

-joshua

On 07/09/05, tarvydas <tarvydas@allstream.net> wrote:
> On September 6, 2005 08:45 pm, Joshua Taylor wrote:
> > that's not an option. The real issue I'm having is that
> > the non ambiguous series of tokens that I provided in the
> ...
> > _even though_ reading the next symbol (after the one that can
> > be 'shared') would clearly disambiguate between the var declaration
> > and the statement.
> 
> What you're saying, I think, is equivalent to saying that you don't have a
> LALR(1) grammar (a Look-Ahead LR grammar with just one token of look-ahead).
> You wish you had a LALR(2) parser generator, but because the dragon book says
> that the theory for LALR(1) is supremely beautiful, most parser generators
> generate parsers only for LALR(1) grammars.
> 
> The parser generator messages are telling you the same thing - you've got the
> classic shift/reduce problem.  For example:
> 
> Warning: Conflict in state 3 for symbol :Id
>  Action 3 (Vars -> . )
>  Action :Shift (Vars -> Var . Vars )
>  Using action :Shift
> 
> says that the parser can't decide whether to reduce (here, by the empty
> production Vars -> ) or to shift (i.e. keep moving the dot to the right).  It
> says that it chose the shift action (consistent with the longest matching
> string ideology and the order in which you've specified the productions).
> 
> BTW: in the warning, the "." shows the position of the parse ("." is used in
> the dragon book and other even more theoretical stuff by Aho and Ullman).
> The parser is a state machine. The stuff to the left of the "." has been
> parsed (if it reaches that state) and the stuff to the right is still
> unrecognized.  In LALR(1) parsers, the current state together with the single
> next input token (the lookahead) has to unambiguously tell the parser what to
> do next.  The warning above says that you've got something equivalent to a
> "race condition" - the state machine could take two different actions, but
> can't figure out which.
> 
> The parsers for C "solve" this classic problem by using semi-colons and by
> embedding a symbol table into the scanner, so that the scanner can detect
> symbols which have been typedef'ed and return a token different from :id in
> that case.
> 
> I think your problem is similar to the C typedef problem (which is even worse
> than your problem, because the number of lookaheads in a C typedef is
> unbounded).
> 
> If you, instead, had
> 
> var -> :type-id :id :semicolon
> 
> some (if not all) of your problem would go away (you can test this by doing
> this to a temporary copy of the grammar - even if you can't use that
> solution, you'll at least understand the problem).
> 
> The dragon book (I've got it cracked open to see if I can still remember this
> stuff from 30 years ago :-) says that you *might* be able to back out of this
> problem by:
> 
> a) left factoring your grammar
> 
> b) cheating, like C does, and build some extra smarts into the scanner (i.e.
> returning :type-id instead of :id).
> 
> Left factoring involves re-organizing the grammar so that the longest common
> prefix is moved off to another production (if you have the dragon book, look
> for the Left Factoring A Grammar algorithm - 4.2 in my book).
> 
> Here's what I've come up with... but it's getting late and this might be
> flawed...
> 
> (parsergen:defparser small-parser
>   ((x        program)          `((x ,$1)))
> 
>   ((program :id v-or-s) `((program ,$2)))
>   ((program :lbrace statement :rbrace statements)
>    `((program (statements ,$2 ,$4))))
> 
>   ((v-or-s :gets :id :semicolon statements) `((assign ,$2 ,$4)))
>   ((v-or-s :lbrace statement :rbrace statements) `((statements ,$2 ,$4)))
>   ((v-or-s :id :semicolon :id v-or-s) `((some-var ,$1 ,$3 ,$4)))
>   ((v-or-s) ())
> 
>   ((statements statement statements) (cons $1 $2))
>   ((statements) ())
> 
>   ((statement :lbrace statement :rbrace) `((braces ,$2)))
>   ((statement :id :gets :id :semicolon) 'some-assigment-statement)
> )
> 
> "v-or-s" means "vars or statements" and the productions for v-or-s (try to)
> carry along all of the vars and statements stuff in a schmozzled state until
> we definitely see the first sign of a statement at which point we can punt to
> the "statements" production.  The epsilon (empty) transition for vars isn't
> explicitly given, but is handled implicitly when the grammar jumps to
> "statements" without parsing any vars.  The "v-or-s" epsilon transition
> handles the case where there are no vars nor statements.  I think :-).  Time
> for bed :-)...
> 
> pt
> 


-- 
=====================
Joshua Taylor
tayloj@rpi.edu

Re: LALR parser from DEFPARSER

Actually, that did the trick. I wouldn't mind knowing why though...
Thanks! (Glad I saw this before I started refactoring...)
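
A plausible account of why the left-recursive form works, offered as a sketch rather than a description of what parsergen does internally: with vars -> var vars, the parser has to reduce the empty vars at the end of the declaration list, and it must make that choice while looking at an :id that could equally begin another var, so one token is not enough. With vars -> vars var, the empty vars is reduced once, before any token is consumed, and the :id is then shifted into a state shared by var and statement, where the following token (:id or :gets) settles it. The portable Common Lisp below (all names are illustrative, none come from parsergen) computes only the LR(0) closure of the initial state for both grammars and lists which terminals can be shifted there and which productions are already complete:

```lisp
;; Each production is (lhs . rhs); terminals are keywords, as in the thread.
(defparameter *right-recursive*
  '((x program)
    (program vars statements)
    (vars var vars)
    (vars)
    (var :id :id :semicolon)
    (statements statement statements)
    (statements)
    (statement :lbrace statement :rbrace)
    (statement :id :gets :id :semicolon)))

;; Same grammar with vars -> var vars replaced by vars -> vars var.
(defparameter *left-recursive*
  (substitute '(vars vars var) '(vars var vars) *right-recursive*
              :test #'equal))

(defun initial-state (grammar)
  "LR(0) closure of the start item; an item is (production . dot-position)."
  (let ((items (list (cons (first grammar) 0)))
        (pending (list (cons (first grammar) 0))))
    (loop while pending
          do (let* ((item (pop pending))
                    (next (nth (cdr item) (rest (car item)))))
               ;; A nonterminal after the dot pulls in all its productions.
               (when (and next (not (keywordp next)))
                 (dolist (p grammar)
                   (when (eq (first p) next)
                     (let ((new (cons p 0)))
                       (unless (member new items :test #'equal)
                         (push new items)
                         (push new pending))))))))
    (nreverse items)))

(defun shift-terminals (state)
  "Terminals that appear immediately after the dot in STATE."
  (remove-duplicates
   (loop for (production . dot) in state
         for next = (nth dot (rest production))
         when (keywordp next) collect next)))

(defun complete-items (state)
  "Productions in STATE whose dot is at the end, i.e. candidates for reduce."
  (loop for (production . dot) in state
        when (= dot (length (rest production)))
          collect production))
```

For the right-recursive grammar the initial state can shift :id and also holds the complete item vars -> . ; since :id can follow vars, that matches the conflict reported for state 0. For the left-recursive grammar the same state has nothing to shift, so the empty reduction is the only choice there.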

On 07/09/05, Martin Simmons <martin@lispworks.com> wrote:
> >>>>> On Wed, 7 Sep 2005 07:38:17 -0400, Joshua Taylor <joshuaaaron@gmail.com> said:
> 
>   Joshua> Thank you greatly. It was not clear to me that defparser only generates
>   Joshua> LALR(1) parsers. I'm not sure this is clear from the manual, but it probably
>   Joshua> would have been if I were more familiar with the parsing canon. I'd suspected
>   Joshua> that some factoring might be able to solve the problem (still not quite sure),
>   Joshua> but I wanted to make sure I understood the problem before attacking that (the
>   Joshua> full grammar I'm working with might be a little hairier to do than this
>   Joshua> simplified one).
> 
> FWIW, I found that the following does work:
> 
> (parsergen:defparser small-parser
>   ((x        program)          $1)
>   ((program  vars statements) `((vars ,$1) (statements ,$2)))
> 
>   ((vars vars var)             (cons $2 $1))
>   ((vars) ())
> 
>   ((var :id :id :semicolon) 'some-var)
> 
>   ((statements statement statements) (cons $1 $2))
>   ((statements) ())
> 
>   ((statement :lbrace statement :rbrace) `(,$2))
>   ((statement :id :gets :id :semicolon) 'some-assigment-statement))
> 
> I'm not yet sure why, but somehow this makes a parser with an extra state
> after matching :ID that can then choose between :ID and :GETS.  This might not
> help if your real grammar is more complex.
> 
> --
> Martin Simmons                              Email: martin@lispworks.com
> LispWorks Ltd, St John's Innovation Centre    TEL:   +44 1223 421860
> Cowley Road, Cambridge CB4 0WS, England.      FAX:   +44 870 2206189
> 


-- 
=====================
Joshua Taylor
tayloj@rpi.edu
