resources: tracing a lex program jflex jflap.org JFLAP jar JFLAP tutorial week04 files ascii chart flex & bison text Foundations of Computing |
This lab takes a closer look at the theory
underlying the lexical analyzer generated by lex. This lab
parallels the material in the second half of chapter 3 of the Dragon book.
We will be using an open source tool called JFLAP
to visually see the relationship between REs, NFAs, and DFAs.
You can download the latest version of JFLAP jar file from
JFLAP
or use this jarfile.
The latest version requires Java 1.7.
You will essentially be duplicating
the running example of RE to NFA to DFA in the Dragon book
only with a different regex. The Dragon book
and my lecture notes use (a|b)*abb
.
The regex you will model in this lab is:
a(a|b)*bb
Aside. You may be interested in installing
a graphical version of flex
called jflex. The tables
that you dump are in DOT format which you can then view in
Graphviz on your desktop or
online.
Before starting you should review the sample lex dump at the bottom of this lab
for a*. This will help you understand the lex
notation for the transition tables.
Note that Lex reduces on the size of the transaction diagram
through the use of equivalence classes.
For example, if you match [aeiou] lex will assign this a single
transition. See vowels.out
for an example.
When you are comfortable with understanding the lex table dump
create a Lex program for pattern a(a|b)*bb
. Then do the
following:
$ lex -T -7 -d lab04.l 2> outReview the table dump generated by lex for this regex. Based on the NFA table dump, draw the Transaction Diagram/Table. Next, perform Subset Construction on your NFA to convert it to a DFA. Draw the Transaction Diagram for the DFA. (You can check your expected outcome with the Lex dump of the DFA but I cannot for the life of me see the correlation).
Execute the RE-NFA-DFA Conversion using JFLAP. Download JFLAP to your desktop. From a shell, execute the jarfile in java:
$ java -jar JFLAP_Thin.jarNote: JFLAP jar requires Java 1.6. The machines in Rm 315 have Java installed. You will need to download Java to your personal desktop before beginning if you do not already have it. By default JFLAP uses λ (lambda) to denote the empty string. The text uses ε. JFLAP also uses '+' to denote '|' the OR operator.
In JFLAP select Regular Expression off the opening menu (it is about 8th from the top). Enter the regex you are modeling in this lab. IMPORTANT: JFLAP uses '+' for '|'; for example, regext a(a|b)* is entered as:
a(a+b)*From the menu select "Convert RE to NFA" to create an NFA from the RE. JFLAP converts an RE to an NFA following the M-Y-T Algorithm. As a learning tool you should try to do the conversion manually. If you "Do Step" or "Do All" JFLAP will perform all the steps to convert it for you. Deconstruct each operator in the RE (concatentation, union, kleene-closure, and parentheses) one at a time. Export the NFA to a new window. Next, convert the NFA to a DFA. JFLAP uses the subset construction algorithm covered in the dragon text to do this. Export your DFA to a new window. Check the DFA against the lex DFA. While the state numbers will differ the number of states and the transactions should be the same.
As a sanity check, select Input from the main menu to run your DFA against valid and invalid input. You can also convert the DFA back into an RE. However, JFLAP will give you an equivalent RE but not the original. For example, this RE
(aa*b|bb*aa*b)(aa*b)*b((aa*b|bb*aa*b)(aa*b)*b)*is equivalent to (a|b)*abb but is quite a bit more complex.
WHAT TO SUBMIT FOR THIS LAB
Submit the NFA diagraph and the DFA digraph for regex a(a|b)*bb. You can generate the digraphs from JFLAP or you can draw the digraphs out on paper.
,---------------------, | EXAMPLE | | RE -> NFA -> DFA | '---------------------' RE = a* We can easily construct the NFA transition diagram for a*: λ a ------------- q0 | q1 | q1 F = q1 ---+----+---- q1 | Ø | q1 ------------- Next, let's look at what lex does with this pattern. ----------------------------------------------------------------------------- Example of NFA and DFA dump from Lex for a*: %option %% 1 a* 2 End Marker ********** beginning dump of nfa with start state 10 state # 1 257: 0, 0 # 257 denotes epsilon state # 2 257: 0, 0 # 0 denotes null transition state # 3 97: 4, 0 # ASCII 97 = a state # 4 257: 3, 5 state # 5 257: 0, 0 [1] # first accepting state state # 6 257: 3, 5 # e transitions always go to 0 or 2 states state # 7 257: 1, 6 state # 8 -1: 9, 0 # -1 is EOF state # 9 257: 0, 0 [2] # second accepting state state # 10 257: 7, 8 ********** end of dump ,-----------------------, | NFA Transition Table | '-----------------------' If you do not encounter an 'a' or EOF you do not transition to a new state. Start state: 10. 1 a* 2 End Marker -------------------------, State | lambda| a | EOF| -------+-------+----+----+ #10 | {7,8} | | | -------+-------+----+----' #9 | Accepting State [2] for End Marker -------+-------+----+----+ #8 | 0 | 0 | #9 | -------+-------+----+----| #7 | {1,6} | | | -------+-------+----+----| #6 | {3,5} | | | -------+-------+----+----+ #5 | Accepting State [1] for a* -------,-------,----,----, #4 | {3,5} | | | -------+-------+----+----| #3 | |{4} | | -------+-------+----+----' #2 | no incoming edges so omit from graph -------+-------+----+----' #1 | DEAD STATE | -------------------------' NFA Transition Diagram (e is epsilon): e e |>- 10 ----> 7 ----> 1 ??? (i do not know what this state for) \ \ \ e \ e e \ `--> 6 -------------------> [5] Accepting state #1 \ \ ^ \ \ e a / e \ `--> 3 -----> 4 ---' \ ^ / \ \ e / \ `---' `--> 8 ---> [9] Accepting State #2 EOF CHECK!! ------------------------------------------------------------- 1 a* 2 End Marker DFA Dump: state # 1: 1 4 # from state 1 on 'a' [1] move to state 4 (D1) 2 5 # from state 1 on EOF [2] move to state 5 (D2) state # 2: 1 4 # combine state #1 and state #2 since they have the same transitions 2 5 state # 3: state # 4: state # 5: 2 6 # from state 5 on EOF move to state 6 state # 6: 2 6 # combine with state 5 state # 1 accepts: [1] state # 2 accepts: [1] state # 3 accepts: [3] (error state - there is no transition) state # 4 accepts: [2] state # 5 accepts: [1] state # 6 accepts: [1] Minimize by eliminating the states with the same transitions above and we have 3 accepting states: DFA Transition Table State a EOF ------------------ D0 D2 D1 ------------------ D1 Accepting State for regex #2 (EOF) ------------------ D2 D2 D0 Accepting State for regex #1 (a*) ------------------ EOF |>- [D0]-------> [D1] \ \ a `----> [D2] / ^ \ \ `-' a Equivalence Classes: (all characters are in class 1 or class 2) 'a' is equivalence class 2, everything else is class 1. \000 = -1 \020 = 1 ' ' = 1 0 = 1 @ = 1 P = 1 ` = 1 p = 1 \001 = 1 \021 = 1 ! = 1 1 = 1 A = 1 Q = 1 a = 2 q = 1 \002 = 1 \022 = 1 " = 1 2 = 1 B = 1 R = 1 b = 1 r = 1 \003 = 1 \023 = 1 # = 1 3 = 1 C = 1 S = 1 c = 1 s = 1 blah blah blah . . . Meta-Equivalence Classes: 1 = 1 2 = 2
(λ|aa*)
is equivalent to a*.
The null (ø) symbol denotes a transition from one state to another (or
the same state) that is not defined.