Introduction
Lecture slide: click here
History, standards, syntax, semantics, grammars, parsing. Readings: Scott, ch 1 - 2 See resources folder for language standards documents: C11, C++17, ECMA-262, ECMA-334, and JLS15.
Programming paradigms
Imperative (von Neumann)
Such as: Fortran, Pascal, C, Ada
programs have mutable storage (state) modified by assignments
the most common and familiar paradigm
Functional (applicative)
Such as: Scheme, Lisp, ML, Haskell
functions are first-class values
side effects (e.g., assignments) discouraged
Logical (declarative)
Such as: Prolog, Mercury
programs are sets of assertions and rules
Object-Oriented
Such as: Simula 67, Smalltalk, C++, Ada95, Java, C#
data structures and their operations are bundled together
inheritance
Quantum
Such as: QCL, Q, Q#, qGCL
performs operations on data using quantum bits (“qubits”)
utilizes quantum properties such as superposition and entanglement
BNF (Backus-Naur Form)
Does not add expressiveness to the language—for convenience only.
alernation:
<Symb> ::= <Letter> | <Digit>
sequencing:
<Id> ::= <Letter> <Symb>
BNF Building blocks
Numbers of a and b are equal (in any order):
a is more:
EBNF (Extended Backus-Naur Form)
Encompasses everything BNF has, plus:
repetition:
zero or more:
{<Symb>}
or<Symb>*
one or more:
<Digit>+
option:
[<Digit>]
grouping:
('+'|'-')
The Chomsky hierarchy
Regular grammars (Type 3)
all productions can be written in the form: N ::= TN
one non-terminal on left side; at most one on right
generally used for scanners
Context-free grammars (Type 2)
all productions can be written in the form:N ::= XYZ
one non-terminal on the left-hand side; mixture on right
most major programming languages
Context-sensitive grammars (Type 1):
number of symbols on the left is no greater than on the right
no production shrinks the size of the sentential form
used for parts of C++, but otherwise rarely used
Type-0 grammars
no restrictions
Regular expressions
Regular grammars can be used to generate regular languages. Regular expressions can be used to accept regular languages.
ε
denotes∅
a character
x
, wherex ∈ Σ
, denotes{x}
sequencing: a sequence of two regular expressions
RS
denotes{αβ | α ∈ [R], β ∈ [S]}
alternation:
R|S
denotes[R] ∪ [S]
Kleene star:
R*
denotes the set of strings which are concatenations of zero or more strings from[R]
grouping: parentheses
(A|B)
Shorthands
Conventions
Parse tree
A parse tree describes the grammatical structure of a sentence
leaf nodes are terminal symbols
internal nodes are non-terminal symbols
construction of tree from sentence is called parsing
Example:
Input string: 52.316
Grammar:
Parse tree:
Grammar ambiguity
If the parse tree for a sentence is not unique, the grammar is ambiguous.
From recitation: A CFG is ambiguous if it has more than one parse tree for some strings. i.e. there is more than 1 derivation for a string.
"Dangling if" rewrite solution
Precedence
If we say operator *
has precedence over operator +
. This means expression 5 + 2 * 3
should be evaluated as: 5 + (2 * 3)
, not (5 + 2) * 3
.
Precedence can be specified in two ways:
Write precedence directly into the rules: higher precedence appear in deeper rules.
Write an ambiguous grammar first, then specify operator precedence separately.
Associativity
Associativity tells the parser what to do with operators at the same level of precedence.
For example, 5 - (2 - 3)
verses (5 - 2) - 3
. Two -
operators have the same precedence. However, how you associate them will yield different mathematical results.
Usually, you can specify using left associativity or right associativity.
Scanners and parsers
Scanners (or tokenizers) read in text and extract tokens. Parsers read in tokens and construct a parse tree.
LL parsers
LL (Left-to-right, Leftmost derivation) parsers are also called top-down, recursive descent or predictive parsers. It begins at the root symbol.
LL(k)
: means k
look ahead.
Problems with LL parsing:
Left recursion: a grammar is left-recursive if there exists non-terminal
A
such that<A> ::= <A> α
for someα
.Common prefixes: if there exists a non-terminal
A
and terminalb
such that there exists rule R1<A> ::= b ...
and R2<A> ::= b ...
.
How to eliminate left-recursive problem:
Original:
Convert to:
Last updated