CS 43001 Compiler Construction
(Autumn Semester 2006)
Theory
Niloy Ganguly niloy@cse.iitkgp.ernet.in
Laboratory
Chitta Ranjan Mandal chitta@cse.iitkgp.ernet.in
Niloy Ganguly niloy@cse.iitkgp.ernet.in
Teaching Assisstant
Subhankar Mitra subhankar.mitra@gmail.com
Ashish Tiwari mail.ashishtiwari@gmail.com
Monu Kedia monu.kedia@gmail.com
Ankur Saxena ankur.ankursaxena@gmail.com
Notices
Notices
30.11.2006 The Marks are out. You can come to see your answer script. However please avoid it
if you have got EX or the chance of getting your grade change is minimum. The marks are abolute and the grades follow accordingly. (89 - EX).
Also all grievances regarding lab has been taken into consideration.
12.09.2006 - In scribing you have to submit the doc, pdf and ppt file
25.09.2006 - Assignment 4 last date is extended to 16th October, 2006. Before DP, please submit the
parsing table in doc format
25.09.2006 - All scribes before midsem has to be submitted by 16th OCtober, 2006.
10.10.2006 - Midsem and class test marks are out, If you want to see the answer scripts visit my office by
Oct 13.
Theory
       Lectures
       Evaluation
       Blog
       Assignments
       Students List
          Marks
          Assignment 1
          Assignment 2
          Tiny-C Specification
          Assignment 3
          Assignment 4
          Assignment 5
          Assignment 6
Theory
Lectures : Mon - 4, Tue - 1,2, Thu -3
Room # : CSE 119
Units : 4-0-2
Credits : 4 (Theory)
Instructor : Niloy Ganguly
Contact : Room #313 (CSE), Phone 3460
Text Books:
[1] Aho, A. V., Sethi, R. and Ullman, J. D.
Compilers - Principles, Techniques and Tools
Addison-Wesley, 1988 (Indian reprint)
- aka Dragon book
[2] Santanu Chattopadhyay
Compiler Design
PHI, 2005.
[3] Advanced Compiler Design Implementation
Steven S. Muchnick
Elsevier, 2003
Lectures
The lecture notes are unedited version of student submission.
01. 24.07.06 - Introduction,. ppt 04CS1024
02. 25.07.06 - Introduction,. ppt 04CS3019
03. 25.07.06 - Introduction,. ppt 04CS1005
04. 07.08.06 - Lexical Analyzer. ppt 04CS1006
05. 08.08.06 - Thompson Construction, Subset Construction,. ppt 04CS1017
06. 08.08.06 - Thompson Construction, Subset Construction,. ppt04CS1007
07. 10.08.06 - Thompson Construction, Subset Construction,. ppt04CS1002
08. 16.08.06 - Lex, ppt04CS1008
09. 17.08.06 - Syntax Analysis, Error Recovery, ppt04CS1004
10. 24.08.06 - Ambiguity in Grammar,ppt 04CS1010
11. 28.08.06 - Left Recursion, ppt , 04CS1011
12. 29.08.06 - Top Down Parsing ppt, 04CS1012
13. 29.08.06 - LL(1) Parser, ppt, 04CS1020
14. 31.08.06 - LL(1) Parser, ppt, 04CS1013
15. 07.09.06 - Bottom-up Parser, ppt1, ppt204CS1014
16. 11.09.06 - Bottom-up Parser, Operator Precedence Parser, ppt, 04CS1015
17. 12.09.06 - SLR Parser,ppt, 04CS1016
18. 12.09.06 - SLR Parser, ppt, 04CS1019
19. 14.09.06 - SLR Parser, ppt, 04CS1021
20. 25.09.06 - LR(1) Parser, ppt,04CS1023
21. 26.09.06 - LR(1) Parser, LALR Parser, ppt, 04CS1026
22. 26.09.06 - LR(1) Parser, LALR Parser, ppt, 04CS1027
23. 28.09.06 - LALR Parser, ppt, 04CS1029
24. 09.10.06 - Ambiguity in Grammar ppt, 04CS1030
25. 10.10.06 - Ambiguity in Grammar, Error Recovery ppt, 04CS1028
26. 10.10.06 - Ambiguity in Grammar ppt, 04CS1032
27. 16.10.06 - Intermediate Code Generation ppt, 04CS3002
28. 17.10.06 - Three Address Code Generation,, ppt, 04CS1033
29. 17.10.06 - Three Address Code Generation, ppt, 04CS1034
30. 18.10.06 - Three Address Code Generation ppt, 04CS1035
31. 23.10.06 - Three Address Code Generation - Array, ppt, 04CS1036
32. 24.10.06 - Three Address Code Generation - Array, ppt, 04CS1037
33. 24.10.06 - Three Address Code Generation -Boolean Functions, ppt, 04CS3001
34. 26.10.06 - Three Address Code Generation - Control Statements, ppt, 04CS3004
35. 30.10.06 - Three Address Code Generation - Control Statements, ppt, 04CS3005
36. 31.10.06 - Three Address Code Generation - Control Statements, ppt, 04CS3008
37. 31.10.06 - Three Address Code Generation - Backpatching, ppt, 04CS3010
38. 02.11.06 - Three Address Code Generation - Backpatching, ppt, 04CS3013
39. 09.11.06 - Three Address Code Generation - Backpatching, ppt, 04CS3014
40. 13.11.06 - Target Code Generation, ppt, 04CS3015
41. 14.11.06 - Target Code Generation - Basic Blocks, ppt, 04CS3020
42. 14.11.06 - Target Code Generation - Next Usage, ppt, 04CS3022
43. 16.11.06 - Target Code Generation - Register Allocation/DAG, ppt, 04CS3023
Evaluation
Teacher's Assessment : 20
        Assignment/Scribing : 6
        Class Test I : 5 - 04CS1022
        Class Test II : 5 - 04CS3024
        Attendance : 4
Mid-sem : 30 - Solution 04CS3017
End-sem : 50 - Solution 04CS3025
Blog
Class Diary before midsem
04CS3011, 04CS1018, 04CS3026, 04CS1031
Class Diary after midsem
04CS3018, 04CS3016, 04CS1025, 04CS3012
Assignments
Let G be the following grammar:
(04CS3006)
S-> S op1 S | S op2 S | A
A-> a | b
a. Prove that G is ambiguous.
b. Rewrite G to impose left associativity on both op1 and op2 and to
impose higher
precedence on op1 than op2. The resulting grammar should be unambiguous,
and able to specify an LR(1) parser.
Consider the following grammar:
(04CS3007)
R => R | R
R => R R
R => R *
R => ( R )
R => a | b
where the terminals are { |, *, (, ), a, b }. This grammar generates regular
expressions over a,b involving alternation, concatenation, and repetition.
| alternation,
* repetetion 1 or more time
a) Compute FIRST and FOLLOW for the only non-terminal in this grammar.
b) Show the grammar is ambiguous.
c) construct an equivalent unambiguous grammar that gives the operators
star (repetition), concatenation, and alternation the usual precedences :
repetition > concatenation > alternation.
d) Build the FIRST and FOLLOW set for this new grammar. Is it LL (1)?
We want to construct a simple pocket-calculator program using yacc and lex which can parse strings
such as 1+(10-5-3)*5+2 and print the result, 13 in this case.
Outline the overall structure of your program components. Give full details of the input to yacc and lex.
(Precise syntactic details are not important, but your answer should reflect basic principles involved).
Make following assumption :
- Only three binary operator are allowed i.e. plus, minus and multiply
- Only integer data type is allowed
- Parentheses are used to override precedence.
04CS3009
Lab Assignments
General Guidelines
- 2 people per group.
- Each member of the group should specify what he/she has done.
Assignment 1: Assignment on Thompson's Construction
- understand the working of #include, #define, #ifdef, #ifndef and
commenting in C preprocessor.
Write a mini pre-processor for a C-language. The preprocessor should do the
following:
- inclusion of header files
-- Implement #include preprocessor directive. The content of the argument
file should be copied to the output file.
- textual replacement
-- Implement #define preprocessor (macro) directive that is replace the macro
calls inside the program with the macro contents.
Parametrized and recursive #define need not be considered.
- conditional inclusion/exclusion
-- Implement #ifdef and #ifndef preprocessor directives. According to the
value copy the #ifdef ot #ifndef part to the output file
- stripping out the comments.
-- Implement both single line (//) and multiple line comments (/*
*/)stripping.
Submission Date - 7th August
On 31st Jule come to the lab between 2 to 3 and meet your Teaching Assistants.
Any doubt about the assignments will be cleared by them
Assignment 2: Assignment on Thompson's Construction
You have to implement the following two algorithms
* Thompson Construction - Constructing NFA from regular expression
* Constructing DFA from NFA.
For constructing a lexical analyzer/scanner from a given set of regular
expression.
Assume following specifications for implementation :
1. Input alphabets
* All alphabets from a to z character
- [a..z] denotes all the alphabets from a to z
* White space, \t, \n
2. Character corresponding to operators
* Union (|) Ex: A|B
* Intersection (.) Ex: A.B
* Asterate (*) Ex: A* (0 or more occurences of A)
* Plus (+) Ex: A+ (1 or more occurences)
3. Input format to the tool
Assume that the input regular expressions will given in a file
(reg.exp). Each line will corresponds to one regular expression of the form
.
(a) Assume simple regular expressions only which can be parsed easily without need for
complicated parsing.
(b) Return UNMATCHED for lexeme which donot match any of the regular expression in reg.exp.
(c) Your scanner should go for longest possible match i.e. a string 'iff' should be identified
as 'iff', not as 'if' and 'f'.
(d) Regular expression defined earlier in the reg.exp file should get priority over the one
defined later i.e. 'for' can match as an ID also but if 'for' is defined earlier in the reg.exp
as FOR then it should return FOR not ID
Sample Input-Output (just for illusatration, your tool should work for other
test cases also)
-------------------------------------------------
reg.exp file :
< i.n.t,INT>
< f.l.o.a.t,FLOAT>
<[a..z]+,ID>
Input to be scanned :
int a
float a
ag-fgfg
Output to be generated :
INT
ID
FLOAT
ID
UNMATCHED
----------------------------------------------------
Submission Date - 21st August
On 14th August come to the lab between 2 to 3 and meet your Teaching
Assistants. Any doubt about the assignments will be cleared by them
Tiny-C Specification
1. Data types: int, char, float
2. Array Support: As in C
Ex: int a[10], b[10][5];
Multiple array declarations are also allowed. The format is
as in C.
Ex: int a[10],b[5];
3. Structures: As in C language
4. Composite statement is enclosed within braces { . . .}.
5. Assignment.
Ex: a = 10;
Continued assingment should also be supported
Ex: a = b = 10;
6. Binary Arithmetic operators. +, -, *
7. Unary Arithmetic operator. ++, ** (only postfix)
8. Boolean operators: ==, >, <
9. Binary Logical operator : &&, ||
10. Unary logical operator : !
11. Unary operator: -
12. Conditional statement:
if (condition) composite-statement [else composite-statement]
13. Loops:
a. For loop as in C
b. Do while loop (a bottom tested loop) as in C
b. While loop as in C
14. Program entry point defined by keyword "main"
15. Subroutine identified by keyword "SUB", followed by subroutine name.
It accepts parameters, they are within paranthesis following the subroutine
name. A subroutine is declared wherein the type of the parameters are defined.
There is no return value for a subroutine.
Example :
// Declaration of subroutine (here the type of parameters are defined)
SUB routine(float, int);
// Definition of subroutine (Note that the type of parameters are NOT mentioned
here. This is different from C)
SUB routine(inti_f, iterate)
{
// Here the code for subroutine
....
...
..
}
// Call
routine(1.9, 5);
Programes written in Tiny-C can use identifiers consisting of any alpha-numeric characters,
starting with only _ or [a..z] or [A..Z]
Note : A sample programe written in Tiny-C will be circulated soon.
Assignment 3: Understanding and working with Lex
(2) You have to implement a Lexical Analyzer for Tiny-C programming language
using LEX tools. The specifications of Tiny-C is given above.
Specifications of Lexical Analyzer:
(a) The name of the scanner function should be "int yylex()", it returns the
integer as token.
(b) Each token may have some attributes. For example : If the token corresponds to
"integer constant" than yylex() returns an integer corresponding to "integer constant"
but the value of "integer constant" is its attribute. Such attributes should be made
available to the future stages of compilation through a global variable of C-Union type
which holds atleast the following attributes :
- Value of the integer constant in case of integer token
- Value of the floating point constant in case of floating point token
- character string for other cases. Like it will have the name of the
identifier in case of identifier token
(c) Print some meaningful names for the tokens in the output file, rather than
the integer value returned by the lexical analyzer to aid the understanding of
the output.
The output should be a sequence of duplets :
Example :
Note : This analyzer will be used in the subsequent assingments. So,it should be
implemented in a clean and readable way.
Submission Date - 4th September
Assignment 4: CFG for Tiny-C and table-driven LL(1) parser
You have to write a CFG for Tiny-C programming language, and implement a
table-driven LL(1) parser.
Given:
______
1. The Tiny-C langauge is a simplified and modified subset of C programming language.
The language specifications are given below.
2. You can get the CFG for C-programming language
at Appendix A13
in Kernighan, B. W. and Ritchie, D. M., The C Programming Language, PHI.
Reuse:
______
- Use the Lexical Analizer developed by you in Expt3 (feel free to modify).
TO BE DONE - with paper and pencil
__________________________________
- Write a CFG for the Tiny-C language specifications.
- Transform your CFG (without changing the language) suitably for
a table-driven predictive parser (LL(1)).
- Compute FIRST, FOLLOW and create the Parsing table
(manually).
- Make a proper document of the above work, and submit a HARD-copy.
(Also mail the soft-copy please).
TO BE DONE - programming assignment
___________________________________
- Write a Non-recursive table-driven LL(1) parser.
Input - Outputs
_______________
- The grammar of the language must not be hard coded in the program, rather should be taken
as input using a file. The file name can be fixed in the program or taken as one of the
input of the program.
- The program should also take the test program file as input.
- The output of the program should be "Accepted" in the case that the input program
satisfies the given grammar.
- In the event of errors, try to do as clear error reporting as possible. Both lexical and
syntactic errors should be clearly reported. The error reporting should contain the possible
line number of the error and the type of error as well. All errors must be reported in
separate lines and the format should be similar to the one used by gcc.
- The program should try to find multiple errors (if present). The program should be
designed in such a way that it does not terminate when an error is detected, but continues
to detect any other occurance of error.(This feature is optional, but, will attract bonus
marks if implemented.)
Programming Assignemnet Submission:
__________________________________
1. Compile and test your program on one of the dept. machines. Once tested send
the report in the following way. Use a 'Makefile' for the whole
process(optional). The program must take inputs from the command line and must not be
interactive. which then will be tested on 'test-data' (say).
$make
.
.
$a.out < test-data
or
$a.out test-data
2. For submission prepare a tar-archive consisting of the 'Makefile',
README, and all .c as well as all .h files. A .h file should
contain Macros, prototype declarations, and type definitions. But
it should not contain any C function body or variable declaration.
Never '#include' one .c file in another. Also include the Grammar
in a text file.
You may include a few sample test-files. One sample file should
be named as 'in'. If you wish to include more sample files, name
them as 'in1', 'in2' ...
3. NAME the tar-archive as "expt4_??.tar" where "??" is a TWO DIGIT
Group_Id number. Do not include the directory while creating the
archive. For example, use "tar -cvf .tar ." in a
directory that contains all the files to be included. Do not
include binaries/image files.
4. Submit the tar file by mailing it to compiler.06@gmail.com.
5. This experiment is a part of generating a working compiler for Tiny-C
language which will continue until the semester-end. So keep a copy
of the archive with you, you need to reuse the source-code later.
Submission Date - 28th September - The submission is to be done through mail
Assignment 5: Assignment on YACC
You have to implement a LALR(1) parser for Tiny-C programming language
using 'yacc' tools.
- Specifications of Tiny-C is the same as circulated earlier
- Reuse the Lexical Analyzer which you implemented in Assingment 3
- Reuse the grammar which you wrote in Assingment 4. Feel free to modify
it as per the requirements of 'yacc'
- Try to report some error message in case of syntax error in the input
Submission Date - 16th October
Assignment 6: Assignment of Intermediate Code Generation
The goal of this assingment is to generate machine independent intermediate
representation (IR) for a program written in Tiny-C. There can be several
different types of IR. We are concerned with generating a simple but widely
followed IR i.e. 3-address (TA) code.
Programming Assingment:
You have to incorporate semantic actions in the parser implemented in
Assingment 5 for generating 3-address code.
- Reuse the Lexical analyzer for Tiny-C developed in Assingment 3
- Reuse the parser for Tiny-C developed in Assingment 5
" Note : Generate 3-address code in such a way so that it can be
compiled using native C compiler. For this use the equivalent C
statements for 3-address code and include variable declarations
of the user defined variables and temporaries. A sample front-end
following this philosophy for C language can be looked at
http://www.lancecompiler.com/ , we want to have similar front-end
for Tiny-C "
Submission Date - 30th October