CS 43009 Compiler Construction

(Autumn Semester 2007)


Niloy Ganguly niloy@cse.iitkgp.ernet.in

Teaching Assisstant
Hema Swetha K. hema.swetha@gmail.com
Joy Deep Nath jd.nath.compiler@gmail.com
Nitin Bansal n.bansal.iit@gmail.com
Udit Sajjanhar sajjanharudit+compiler@gmail.com


Notices

Theory

       Lectures
       Evaluation
      Students List       

 Lab-Assignments (Groups and Marks)
        Assignment 1
        Tiny C
        Assignment 2



Theory

  Lectures   : 	Mon - 4, Tue - 1,2, Thu -3 
  Room #     :  CSE 119
  Units      :	4-0-2
  Credits    :  4 (Theory) 
  Instructor :	Niloy Ganguly
  Contact    : 	Room #313  (CSE), Phone 3460

Text Books:

[1]  Aho, A. V., Sethi, R. and Ullman, J. D.
     Compilers - Principles, Techniques and Tools
     Addison-Wesley, 1988 (Indian reprint)
     - aka Dragon book

[2] Santanu Chattopadhyay
    Compiler Design
    PHI, 2005.

Lectures

Evaluation

Teacher's Assessment : 20
        Attendance: 10
        Class Test 1 : 5
        Class Test 2 : 5
       
Mid-sem : 30

End-sem : 50

Lab Assignments

General Guidelines

Assignment 1: Assignment on Thompson's Construction

You have to implement the following two algorithms

The input alphabet will be-

The regular expression will be input with no spaces. The first brackets should be used to show precedence at all times and the scanner must work accordingly. eg-

Marking Guidelines

(A)NFA construction 15
(B)NFA ouput in proper format 5
(C)Subset construction 15
(D)DFA ouput in proper format 5
(D)Code Quality 10
Total marks 50

Output formats

NFA Output: The Ouptut of NFA will be in a tab separated tabular form. The output format of for NFA corresponding to the regular expression - ((a)|(b)) would be-
state a b epsilon
0 = = 1,3
1 2 = =
2 = = 5-
3 = 4 =
4 = = 5-
5- = = =
Note. The columns are tab separated.
DFA Output: The Ouptut of DFA will be in a tab separated tabular form. The output format of for DFA corresponding to the regular expression - ((a)|(b)) using the NFA given above would be-
A0,1,3
B-2,5
C-4,5
stateab
AB-C-
B-==
C-==
Note. The columns are tab separated.

* Evaluation of items (B) and (D) will be done in the lab. The output format for the NFA would be-

Please be present on 06/08/2007 to show your code.

* The overall working of the code will be evaluated based on test cases.

 

Tiny-C Specification

1. Data types: int, char, float
2. Array Support: only single dimensional as in C
Ex: int a[10];
Multiple variable declarations should be supported. The format is as in C.
Ex: int a, b;
      int a[10],b[5];
3. Global variables
4. Composite statement is enclosed within braces { . . .}.
5. Assignment operator: = .
Ex: a = 10;
Continued assingment should also be supported
Ex: a = b = 10;
6. Binary Arithmetic operators. +, -, *
7. Unary Arithmetic operator. ++, ** (only postfix)
8. Boolean operators: ==, >, <
9. Binary Logical operator : &&, ||
10. Unary logical operator : !
11. Unary operator: -
12. Input / Output Constructs:
1. read(x) - Read into variable x
2. print(x) - Write variable x to output
13. Conditional statement:
if (condition) composite-statement [else composite-statement]
14. Loops:
a. For loop as in C
b. While loop as in C
15. Program entry point defined by keyword "main"
Programes written in Tiny-C can use identifiers consisting of any alpha-numeric characters, starting with only _ or [a..z] or [A..Z]
 

Example

 


int count;
main()
{
        int i=0;
        int sum=0;
              
        read(count);
        for(i=0;i<10;i++)
        {
               
               read(x);
               sum = sum + x;

        }
        
        print(sum);
}

Assignment 2: Understanding and working with Lex

(2) You have to implement a Lexical Analyzer for Tiny-C programming language using LEX tools. The specifications of Tiny-C is given above.

Specifications of Lexical Analyzer:
(a) The name of the scanner function should be "int yylex()", it returns the integer as token.
(b) Each token may have some attributes. For example : If the token corresponds to "integer constant" than yylex() returns an integer corresponding to "integer constant" but the value of "integer constant" is its attribute. Such attributes should be made available to the future stages of compilation through a global variable of C-Union type which holds atleast the following attributes :
- Value of the integer constant in case of integer token
- Value of the floating point constant in case of floating point token
- character string for other cases. Like it will have the name of the identifier in case of identifier token
(c) Output Format : Print only the token name in a new line, ' <token name>newline '. In case of integer constant and floating point constant print ' <token name>tabspace<value>newline '. In case of identifier or literal constant print ' <token name>tabspace<character string>newline '. Nothing else should be outputed.

Specification of Token Names is as follows:
 

Token Name Description
MAIN main
INT int
FLOAT float
CHAR char
IF if
ELSE else
FOR for
WHILE while
READ read
PRINT print
ID identifier
INTNUM integer constant
FLOATNUM floating point constant
LTE less than equal to
GTE greater than equal to
LT lesser than
GT greater than
ET equal to
INCR increment operator, ++
DECR decrement operator, --
MINUS minus
PLUS plus
DIV divide
MULT multiply
ASSIGN assignment operator
SCOLON semicolon
OPAREN open parenthesis
CPAREN close parenthesis
OBRACE open braces
CBRACE close braces
OSQUAR open square bracket
CSQUAR close square bracket
LITC literal constant