Re: LALR parser from DEFPARSER
Aha!
Thank you greatly. It was not clear to me that the defparser only generated
LALR(1). I'm not sure this is clear from the manual, but it probably would have
been if I was more familiar with the parsing canon. I'd suspected that some
factoring might be able to solve the problem (still not quite sure),
but I wanted
to make sure I was understanding the problem before attacking that (the full
grammar I'm working with might be a little hairer to do than this
simplified one).
Again, many thanks!
-joshua
On 07/09/05, tarvydas <tarvydas@allstream.net> wrote:
> On September 6, 2005 08:45 pm, Joshua Taylor wrote:
> > that's not an option. The real issue I'm having is that
> > the non ambiguous series of tokens that I provided in the
> ...
> > _even though_ reading the next symbol (after the one that can
> > be 'shared') would clearly disambiguate between the var declaration
> > and the statement.
>
> What you're saying, I think, is equivalent to saying that you don't have a
> LALR(1) grammar (an LA LR grammar with just one look-ahead). You wish you
> had a LALR(2) parser generator, but because the dragon book says that the
> theory for LALR(1) is supremely beautiful, most parser generators generate
> parsers only for LALR(1) grammars.
>
> The parser generator messages are telling you the same thing - you've got the
> classic shift/reduce problem. For example:
>
> Warning: Conflict in state 3 for symbol :Id
> Action 3 (Vars -> . )
> Action :Shift (Vars -> Var . Vars )
> Using action :Shift
>
> says that the parser can't decide whether to accept (reduce) or to shift (i.e.
> keep moving the dot to the right). It says that it chose the shift action
> (consistent with the longest matching string ideology and the order in which
> you've specified the productions).
>
> BTW: in the warning, the "." shows the position of the parse ("." is used in
> the dragon book and other even more theoretical stuff by Aho and Ullman).
> The parser is a state machine. The stuff to the left of the "." has been
> parsed (if it reaches that state) and the stuff to the right is still
> unrecognized. In LALR(1) parsers, the item immediately to the right of the
> dot has to unambiguously tell the parser what to do next. The warning above
> says that you've got something equivalent to a "race condition" - the state
> machine might enter two different states, but can't figure out which.
>
> The parsers for C "solve" this classic problem by using semi-colons and by
> embedding a symbol table into the scanner, so that the scanner can detect
> symbols which have been typedef'ed and return a token different from :id in
> that case.
>
> I think your problem is similar to the C typedef problem (which is even worse
> than your problem, because the number of lookaheads in a C typedef is
> unbounded).
>
> If you, instead, had
>
> var -> :type-id :id :semicolon
>
> some (if not all) of your problem would go away (you can test this by doing
> this to a temporary copy of the grammar - even if you can't use that
> solution, you'll at least understand the problem).
>
> The dragon book (I've got it cracked open to see if I can still remember this
> stuff from 30 years ago :-) says that you *might* be able to back out of this
> problem by:
>
> a) left factoring your grammar
>
> b) cheating, like C does, and build some extra smarts into the scanner (i.e.
> returning :type-id instead of :id).
>
> Left factoring involves re-organizing the grammar so that the longest common
> prefix is moved off to another production (if you have the dragon book, look
> for the Left Factoring A Grammar algorithm - 4.2 in my book).
>
> Here's what I've come up with... but it's getting late and this might be
> flawed...
>
> (parsergen:defparser small-parser
> ((x program) `((x ,$1)))
>
> ((program :id v-or-s) `((program ,$2)))
> ((program :lbrace statement :rbrace statements) `((program (statements ,$2 ,
> $4))))
>
> ((v-or-s :gets :id :semicolon statements) `((assign ,$2 ,$4)))
> ((v-or-s :lbrace statement :rbrace statements) `((statements ,$2 ,$4)))
> ((v-or-s :id :semicolon :id v-or-s) `((some-var ,$1 ,$3 ,$4)))
> ((v-or-s) ())
>
> ((statements statement statements) (cons $1 $2))
> ((statements) ())
>
> ((statement :lbrace statement :rbrace) `((braces ,$2)))
> ((statement :id :gets :id :semicolon) 'some-assigment-statement)
> )
>
> "v-or-s" means "vars or statements" and the productions for v-or-s (try to)
> carry along all of the vars and statements stuff in a schmozzled state until
> we definitely see the first sign of a statement at which point we can punt to
> the "statements" production. The epsilon (empty) transition for vars isn't
> explicitly given, but is handled implicitly when the grammar jumps to
> "statements" without parsing any vars. The "v-or-s" epsilon transition
> handles the case where there are no vars nor statements. I think :-). Time
> for bed :-)...
>
> pt
>
--
=====================
Joshua Taylor
tayloj@rpi.edu