450: LAB 04

CMPS 450 Lab 04 - Finite State Automata and Lex Generated Scanners

Due: next tuesday before lab

resources:
tracing a lex program
jflex
jflap.org
JFLAP jar
JFLAP tutorial
week04 files
ascii chart
flex & bison text
Foundations of Computing

This lab takes a closer look at the theory underlying the lexical analyzer generated by lex. This lab parallels the material in the second half of chapter 3 of the Dragon book. We will be using an open source tool called JFLAP to visually see the relationship between REs, NFAs, and DFAs. You can download the latest version of JFLAP jar file from JFLAP or use this jarfile. The latest version requires Java 1.7. You will essentially be duplicating the running example of RE to NFA to DFA in the Dragon book only with a different regex. The Dragon book and my lecture notes use (a|b)*abb. The regex you will model in this lab is:

      a(a|b)*bb

Aside. You may be interested in installing a graphical version of flex called jflex. The tables that you dump are in DOT format which you can then view in Graphviz on your desktop or online.

Before starting you should review the sample lex dump at the bottom of this lab for a*. This will help you understand the lex notation for the transition tables. Note that Lex reduces on the size of the transaction diagram through the use of equivalence classes. For example, if you match [aeiou] lex will assign this a single transition. See vowels.out for an example. When you are comfortable with understanding the lex table dump create a Lex program for pattern a(a|b)*bb. Then do the following:

      $ lex -T -7 -d lab04.l  2> out

Review the table dump generated by lex for this regex. Based on the NFA table dump, draw the Transaction Diagram/Table. Next, perform Subset Construction on your NFA to convert it to a DFA. Draw the Transaction Diagram for the DFA. (You can check your expected outcome with the Lex dump of the DFA but I cannot for the life of me see the correlation).

Execute the RE-NFA-DFA Conversion using JFLAP. Download JFLAP to your desktop. From a shell, execute the jarfile in java:

     $ java -jar JFLAP_Thin.jar

Note: JFLAP jar requires Java 1.6. The machines in Rm 315 have Java installed. You will need to download Java to your personal desktop before beginning if you do not already have it. By default JFLAP uses λ (lambda) to denote the empty string. The text uses ε. JFLAP also uses '+' to denote '|' the OR operator.

In JFLAP select Regular Expression off the opening menu (it is about 8th from the top). Enter the regex you are modeling in this lab. IMPORTANT: JFLAP uses '+' for '|'; for example, regext a(a|b)* is entered as:

        a(a+b)*

From the menu select "Convert RE to NFA" to create an NFA from the RE. JFLAP converts an RE to an NFA following the M-Y-T Algorithm. As a learning tool you should try to do the conversion manually. If you "Do Step" or "Do All" JFLAP will perform all the steps to convert it for you. Deconstruct each operator in the RE (concatentation, union, kleene-closure, and parentheses) one at a time. Export the NFA to a new window. Next, convert the NFA to a DFA. JFLAP uses the subset construction algorithm covered in the dragon text to do this. Export your DFA to a new window. Check the DFA against the lex DFA. While the state numbers will differ the number of states and the transactions should be the same.

As a sanity check, select Input from the main menu to run your DFA against valid and invalid input. You can also convert the DFA back into an RE. However, JFLAP will give you an equivalent RE but not the original. For example, this RE

     (aa*b|bb*aa*b)(aa*b)*b((aa*b|bb*aa*b)(aa*b)*b)*

is equivalent to (a|b)*abb but is quite a bit more complex.

WHAT TO SUBMIT FOR THIS LAB

Submit the NFA diagraph and the DFA digraph for regex a(a|b)*bb. You can generate the digraphs from JFLAP or you can draw the digraphs out on paper.

This example demonstrates the concepts covered in this lab for the regex a*. In particular, information on reading the table dump from Lex.


,---------------------,
|      EXAMPLE        |
|  RE -> NFA -> DFA   |
'---------------------'
 
  RE = a*

We can easily construct the NFA transition diagram for a*:

    λ    a
-------------
q0 | q1 | q1    F = q1
---+----+---- 
q1 | Ø  | q1  
-------------    

Next, let's look at what lex does with this pattern.

-----------------------------------------------------------------------------
Example of NFA and DFA dump from Lex for a*:

%option %%
1	a*
2	End Marker

********** beginning dump of nfa with start state 10
state #    1	257:     0,    0       # 257 denotes epsilon 
state #    2	257:     0,    0       # 0 denotes null transition
state #    3	 97:     4,    0       # ASCII 97 = a
state #    4	257:     3,    5
state #    5	257:     0,    0  [1]  # first accepting state 
state #    6	257:     3,    5       # e transitions always go to 0 or 2 states 
state #    7	257:     1,    6
state #    8	 -1:     9,    0       # -1 is EOF
state #    9	257:     0,    0  [2]  # second accepting state 
state #   10	257:     7,    8
********** end of dump

 ,-----------------------,
 | NFA Transition Table  |
 '-----------------------'
If you do not encounter an 'a' or EOF you do not transition to a new state.
Start state: 10.

1	a*
2	End Marker

-------------------------,
 State | lambda|  a | EOF|
-------+-------+----+----+
 #10   | {7,8} |    |    | 
-------+-------+----+----'
 #9    |         Accepting State [2] for End Marker 
-------+-------+----+----+
 #8    |   0   | 0  | #9 |   
-------+-------+----+----|
 #7    | {1,6} |    |    |
-------+-------+----+----|
 #6    | {3,5} |    |    |
-------+-------+----+----+
 #5    |        Accepting State [1] for a* 
-------,-------,----,----,
 #4    | {3,5} |    |    |
-------+-------+----+----|
 #3    |       |{4} |    |
-------+-------+----+----'
 #2    |  no incoming edges so omit from graph
-------+-------+----+----'
 #1    |  DEAD STATE     |
-------------------------'

NFA Transition Diagram (e is epsilon):
            e      e
   |>- 10 ----> 7 ----> 1   ??? (i do not know what this state for) 
         \       \
          \ e     \ e              e
           \       `--> 6 -------------------> [5]  Accepting state #1 
            \            \                      ^ 
             \            \ e       a          / e  
              \            `--> 3 -----> 4 ---'
               \                 ^       /
                \                 \  e  /
                 \                 `---'
                  `--> 8 ---> [9] Accepting State #2 
                          EOF     
CHECK!!
-------------------------------------------------------------
1	a*
2	End Marker

DFA Dump:
state # 1:  
	1	4      # from state 1 on 'a' [1] move to state 4  (D1)
	2	5      # from state 1 on EOF [2] move to state 5  (D2)
state # 2:
	1	4      # combine state #1 and state #2 since they have the same transitions 
	2	5     
state # 3: 
state # 4:
state # 5: 
	2	6     # from state 5 on EOF move to state 6 
state # 6:
	2	6     # combine with state 5 
 
state # 1 accepts: [1]   
state # 2 accepts: [1]    
state # 3 accepts: [3] (error state - there is no transition)   
state # 4 accepts: [2]      
state # 5 accepts: [1]    
state # 6 accepts: [1]   


Minimize by eliminating the states with the same transitions above
and we have 3 accepting states:

DFA Transition Table 

State   a   EOF  
------------------
D0      D2  D1    
------------------
D1                   Accepting State for regex #2 (EOF)
------------------
D2      D2  D0       Accepting State for regex #1 (a*)   
------------------

           EOF
|>- [D0]-------> [D1]
       \          
        \ a        
         `----> [D2]   
                / ^ 
                \  \   
                 `-'
                  a

Equivalence Classes:  (all characters are in class 1 or class 2) 
'a' is equivalence class 2, everything else is class 1.

\000 = -1 \020 = 1   ' ' = 1     0 = 1     @ = 1     P = 1     ` = 1     p = 1  
\001 = 1  \021 = 1     ! = 1     1 = 1     A = 1     Q = 1     a = 2     q = 1  
\002 = 1  \022 = 1     " = 1     2 = 1     B = 1     R = 1     b = 1     r = 1  
\003 = 1  \023 = 1     # = 1     3 = 1     C = 1     S = 1     c = 1     s = 1  

blah blah blah . . . 

Meta-Equivalence Classes:
1 = 1
2 = 2

USING JFLAP

Step 1. Use JFLAP to convert a* to NFA following Rule 1 and Rule 4 of M-Y-T Algorithm:

Step 2. Convert NFA to DFA using Subset Construction:

Step 3. Just for kicks convert DFA back to RE Transition Graph:

The RE we end up with (λ|aa*) is equivalent to a*. The null (ø) symbol denotes a transition from one state to another (or the same state) that is not defined.