In linguistics, an example of bottom-up parsing would be analyzing a sentence by identifying words first, and then using properties of the words to infer grammatical relations and phrase structures to build a parse tree of the complete sentence. This means that rather than beginning with the starting symbol and generating an input string, we shall examine the string and attempt to work our way back to the starting symbol. We can gain some power by starting at the bottom and working our way up.
In programming language compilers, bottom-up parsing is a parsing method that works by identifying terminal symbols first, and combines them successively to produce nonterminals. The productions of the parser can be used to build a parse tree of a program written in human-readable source code that can be compiled to assembly language or pseudocode.
Different computer languages require different parsing techniques, although it is not uncommon to use a parsing technique that is more powerful than that actually required.
It is common for bottom-up parsers to take the form of general parsing engines, that can either parse or generate a parser for a specific programming language given a specification of its grammar. Perhaps the most well known generalized parser generators are YACC and GNU bison.
S → Ax A → a A → b
For the input sentence ax, the leftmost derivation is
S → Ax → ax
which also happens to be the rightmost derivation as there is only one nonterminal ever to replace in a sentential form.
An LL(1) parser starts with S and asks "which production should I attempt?" Naturally, it predicts the only alternative of S. From there it tries to match A by calling method A (in a recursive-descent parser). Lookahead a predicts production
A → a
The parser matches a, returns to S and matches x. Done. The derivation tree is:
A bottom up parser is trying to go backwards, performing the following reverse derivation sequence:
ax → Ax → S
Intuitively, a top-down parser tries to expand nonterminals into right-hand-sides and a bottom-up parser tries to replace (reduce) right-hand-sides with nonterminals.
The first action of the bottom-up parser would be to replace a with A yielding Ax. Then it would replace Ax with S. Once it arrives at a sentential form with exactly S, it has reached the goal and stops, indicating success.
Just as with top-down parsing, a brute-force approach will work. Try every replacement until you run out of right-hand-sides to replace or you reach a sentential form consisting of exactly S. While not obvious here, not every replacement is valid and this approach may try all the invalid ones before attempting the correct reduction. Backtracking is extremely inefficient, but as you would expect lookahead proves useful in reducing the number of "wrong turns."
The parser is a stack automaton which is in one of several discrete states. In reality, the parse stack contains states, rather than grammar symbols. However, since each state corresponds to a unique grammar symbol, the state stack can be mapped onto the grammar symbol stack mentioned earlier.
In step 2.1 above we are "shifting" the input symbols to one side as we move through them; hence a parser which operates by repeatedly applying steps 2.1 and 2.2 above is known as a shift-reduce parser.
A shift-reduce parser is most commonly implemented using a stack, where we proceed as follows:
Take the language:Sentence --> NounPhrase VerbPhraseNounPhrase --> Art NounVerbPhrase --> Verb | Adverb VerbArt --> the | a | ...Verb --> jumps | sings | ...Noun --> dog | cat | ...And the input:the dog jumpsThen the bottom up parsing is:Stack Input Sequence () (the dog jumps) (the) (dog jumps) SHIFT word onto stack (Art) (dog jumps) REDUCE using grammar rule (Art dog) (jumps) SHIFT.. (Art Noun) (jumps) REDUCE.. (NounPhrase) (jumps) REDUCE (NounPhrase jumps) SHIFT (NounPhrase Verb) REDUCE (NounPhrase VerbPhrase)() REDUCE (Sentence) SUCCESS
Given the language:
--> | + --> | * --> [ ] | 0...9
() (2 * [1 + 3 ]) SHIFT (2) (* [1 + 3 ]) REDUCE (
) (* [1 + 3]) SHIFT ( *) ([1 + 3]) SHIFT ( * [) (1 + 3]) SHIFT ( * [1) (+ 3]) REDUCE (twice) ( * [ ) (+ 3 ]) SHIFT (twice) ( * [ + 3) (]) REDUCE (thrice) ( * [ + ) (]) REDUCE ( * [ ) (]) SHIFT ( * [ ]) REDUCE ( * ) REDUCE ( * ) REDUCE ( ) REDUCE ( ) SUCCESS