Term
What does lexical analysis consist of? |
|
Definition
- Splits the source into words or tokents
- Computes semantic values
- Convert numeric strings into numebrs
- Convert names into identifiers
- Discard white space (spaces & comments)
|
|
|
Term
|
Definition
A token is a sequence of characters that is treated as a unit in the grammar of the programming language.
A specific semantic value can have different lexemes:
Sematic value: REAL(42.0)
Lexemes: 4.2e1, 42.0, 42. |
|
|
Term
Why a separate lexical analysis makes syntax analysis simpler and easier? |
|
Definition
- if spaces and comments are allowed everywhere, we can rewrite "if n==/*base*/ 0 then return 0" to "IF ID(n) ROP(==) NUM(0) THEN RETURN NUM(0)"
- Tokens group characters and encapsulate variability
- Syntax rules become simpler
- Scanner uses more efficient mechanism than parser
- Finite-state machine vs. push-down automaton
|
|
|
Term
What would be a theory based approach to lexical analysis? |
|
Definition
- Regular expressions to describe tokens
- Non-deterministic finite automaton
- Deterministic finite automaton
- Linear-time algorithm in C, Java to recognize tokns
|
|
|
Term
Define alphabet, string and language |
|
Definition
Alphabet is a finite set of symbols - string is a finite sequence of symbols in the alphabet - language is a possibly infinite set of string |
|
|
Term
What is a regular expression M? |
|
Definition
- concisely describes a regular language L(M)
- Finite representation of infinite sets
- Built form operators: a ε | • * ( ) +
|
|
|
Term
Define a finite automaton |
|
Definition
Consists of:
- A set of input symbols (alphabet) ∑
- a finite set of states S
- a finite set of labeled edges E proper subset of (∑ U ε) x S x S
- A set of final states F proper subset of S
|
|
|
Term
What is the difference between a deterministic and non-deterministic finite automata? |
|
Definition
Difference is in the edges:
- A non deterministic finite automaton (NFA)
- May have edges with label ε (ε-edges)
- May have multiple edges with the same label (starting at the same state)
- A deterministic finte automaton (DFA)
- Has no ε-edges
- Has at most one edge with label a for each state s and symbol a
|
|
|
Term
How can regular expressions be converted into finite automata? |
|
Definition
McNaughton/Yamada/Tompson-algorithm:
- Recurses over structure fo regular expression
- Defines one pattern for each opearator
- X | Y would be state s1 and s2 where x and y would both have an edge from s1 to s2
- XY would be states s1, s2, s3, where s1 -x> s2 and s2 -y> s3
- X* would be s1 -x> s1
- uses ε-edges to glue together sub-automata
- each automaton
- has a single accepting state
- has a start state with no incoming edge
|
|
|
Term
What is the formal definition of acceptance of a string by a DFA? |
|
Definition
- A string a1, a2, ... an is accepted by a DFA if there is a sequence s1, s2, ... sn+1 of staets with
- s1 is the initial state
- sn+1 is a final state
- s1 -a1> s2 ... sn -an> Sn+1
- DFA accepts ε iff s1 is in F
|
|
|
Term
What is the formal definition of acceptance of a string by a DFA?
|
|
Definition
- A string a1, a2, ... an is accepted by a DFA if there is a sequence s1, s2, ... sn+1 and t1 ... tn states with
- s1 is the initial state
- sn+1 is a final state
- t1 -a1> s2 ... tn -an> Sn+1
- ti is a ε-closure(si) iff there exists a sequence of s2, ..., s-1 states with
- DFA accepts ε iff s1 is in F
|
|
|
Term
How is a DFA runned over a string to check acceptace? |
|
Definition
Algorithm
- curr = s1;
- for i = 1 to n
- return curr in F;
w = string to be checked
E = edges represented as state array indexed by state x symbol
curr = current state |
|
|
Term
What do we need to define to convert an NFA into DFA? |
|
Definition
DFA state == set(NFA states)
DFA edge == NFA edge + ε-edges
- We let:
- s, t be NFA states
- T be a set of NFA states (= single DFA state)
- 'a' be an input symbol
- We define
- move(T, a) = {t | Ê s in T and S -a> t}
- ε-closure(T) = union of all s in T of ε-closure(s)
|
|
|
Term
What is the algorithm for a subset construction? |
|
Definition
Let D = set of DFA states
Let DTran[T, a] be the DFA transition table
- D = ε-closure(s1);
- while (There exists unmarked state T in D) {
- mark T;
- for each (a in Σ) {
- V = ε-closure(Move(T,a));
- if (V not in D) D = D U {V};
- DTran[T, a] = V;
- }
- }
|
|
|
Term
What is ε-closure of a state s? |
|
Definition
It is all the states you can reach from state s by following all the ε paths |
|
|
Term
Intuitively explain the conversion from NFA to DFA |
|
Definition
http://www.youtube.com/watch?v=taClnxU-nao
- We write a table where the columns are the transition inputs followed by ε*, and the rows are the states - initially we just have the starting state
- We take the starting state and see all the possible transitions with the inputs followed by ε* and we write the results in the respective cell all the states we can reach with that input ε*
- We then add all the states we obtained as a state (if they contain a final state, they are a final state, so we circle them
- We then draw all the states we obtained, and we join them with their corresponding inputs
|
|
|
Term
|
Definition
- Add "dead" state so that transitions are defined for all symbols in all states
- Partition states into final and non-final
- Partition each group so that two states s, t are in the same (new) subgroup iff for all a in ∑, s -a> S' and t -a> t' then s' and t' are in the same (old) group
- Repeat 3 until no further change
- Merge equivalent states
- Remove dead states
|
|
|
Term
Explain intuitively how to minimize a DFA |
|
Definition
http://www.youtube.com/watch?v=TvMEX2htBYw
- From the DFA we make two sets, the first set will hav all the final sets, and the second set will have the rest of the sets
- We take one set and for each element of that set we try all the inputs and observe which are the states it reaches.
- If the states it reaches are the same states that we have in the set we don't have to do anything
- Otherwise, if it reaches states outside its own set we classify it as a new set.
- If elements reach the same sets, they would belong to the same set
- We repeat S3 until we see no changes
|
|
|