Sonoma State University
Department of Computer Science
CS-460: Programming Languages
Programming Assignment 2: Tokenization

Objective:

  • Write a program in C or C++ that uses a deterministic finite-state automaton (DFA) to identify and remove comments from an input test file, then uses a DFA to convert the comment-free input into a series of tokens. Finally, your program should display the tokens as output if no syntax errors occurred, or an error message otherwise.
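The comment-removal step can be sketched as a procedurally-driven DFA, where each state is a case in a switch statement rather than a row in a transition table. This is only a minimal illustration under stated assumptions: it assumes C-style `//` and `/* */` comments (the handout only says the language is C-like), the names `State` and `stripComments` are hypothetical, and an unterminated block comment is not reported as an error. Comment characters are replaced with spaces and newlines are kept, so the line numbering of the remaining statements is unchanged.

```cpp
#include <string>

// States of a hand-coded (procedurally-driven) DFA. Names are hypothetical.
enum class State { Code, Slash, LineComment, BlockComment, BlockStar, Quoted, QuotedEscape };

// Returns a copy of src in which every comment character is replaced by a
// space. Newlines are always copied, so line numbers stay the same.
std::string stripComments(const std::string& src) {
    std::string out;
    out.reserve(src.size());
    State state = State::Code;
    char quote = '"';  // which quote character opened the current literal
    for (char c : src) {
        switch (state) {
        case State::Code:
            if (c == '/') {
                state = State::Slash;  // hold the '/' until the next character is seen
            } else {
                if (c == '"' || c == '\'') { quote = c; state = State::Quoted; }
                out += c;
            }
            break;
        case State::Slash:
            if (c == '/') { out += "  "; state = State::LineComment; }
            else if (c == '*') { out += "  "; state = State::BlockComment; }
            else {
                out += '/';  // it was an ordinary '/', e.g. division
                out += c;
                if (c == '"' || c == '\'') { quote = c; state = State::Quoted; }
                else { state = State::Code; }
            }
            break;
        case State::LineComment:
            if (c == '\n') { out += '\n'; state = State::Code; }
            else { out += ' '; }
            break;
        case State::BlockComment:
            out += (c == '\n') ? '\n' : ' ';
            if (c == '*') state = State::BlockStar;
            break;
        case State::BlockStar:
            out += (c == '\n') ? '\n' : ' ';
            if (c == '/') state = State::Code;
            else if (c != '*') state = State::BlockComment;
            break;
        case State::Quoted:
            out += c;  // comment markers inside literals are ordinary text
            if (c == '\\') state = State::QuotedEscape;
            else if (c == quote) state = State::Code;
            break;
        case State::QuotedEscape:
            out += c;
            state = State::Quoted;
            break;
        }
    }
    if (state == State::Slash) out += '/';  // input ended right after a lone '/'
    return out;
}
```

Calling `stripComments` on the full contents of an input file yields text of the same length, which can then be handed to the tokenizing DFA.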

Please note

Input test files to determine the effectiveness of your programming assignment two solution:

The output from your programming assignment two solution should look like the following text files:

Reflection

  • Since tokenization is the first step in parsing, it can be difficult to detect syntax errors at this point, except in the most rudimentary tokens. The difficulty is that the tokenizer does not yet know which grammatical rules, as defined by the BNF for the C-like programming language, apply to each token. Regardless of the difficulty, tokenizing the entire input file without losing a single byte (i.e., every byte is accounted for from input file to tokenization) is vitally important, as you will be relying on tokens exclusively from here on in parsing and lexical analysis.
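As an illustration of how a tokenizing DFA can still catch a rudimentary error such as an invalid integer, here is a minimal procedurally-driven sketch. Several things in it are assumptions rather than part of the assignment: the token type names (`IDENTIFIER`, `INTEGER`, `PUNCTUATION`), the `Token` and `LexResult` structs, the `tokenize` function, and the rule that an integer is invalid when its digits are immediately followed by a letter or underscore. Whitespace only separates tokens here, and the expected token format from the sample output files is not reproduced. The error message follows the wording used in the rubric.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record: a type name, the matched text, and its line.
struct Token { std::string type; std::string text; int line; };

// Either a token list or an error message, never both.
struct LexResult {
    std::vector<Token> tokens;
    std::string error;  // empty when tokenization succeeded
};

LexResult tokenize(const std::string& src) {
    enum class State { Start, Identifier, Integer };
    LexResult result;
    State state = State::Start;
    std::string text;
    int line = 1;
    std::size_t i = 0;
    while (i <= src.size()) {
        // One extra iteration with a '\0' sentinel flushes the final token.
        char c = (i < src.size()) ? src[i] : '\0';
        unsigned char uc = static_cast<unsigned char>(c);
        switch (state) {
        case State::Start:
            if (std::isalpha(uc) || c == '_') { text = c; state = State::Identifier; }
            else if (std::isdigit(uc)) { text = c; state = State::Integer; }
            else if (c == '\n') { ++line; }
            else if (!std::isspace(uc) && c != '\0') {
                result.tokens.push_back({"PUNCTUATION", std::string(1, c), line});
            }
            ++i;
            break;
        case State::Identifier:
            if (std::isalnum(uc) || c == '_') { text += c; ++i; }
            else {
                // Do not consume c; it is re-examined in the Start state.
                result.tokens.push_back({"IDENTIFIER", text, line});
                state = State::Start;
            }
            break;
        case State::Integer:
            if (std::isdigit(uc)) { text += c; ++i; }
            else if (std::isalpha(uc) || c == '_') {
                // Assumed rule: digits followed directly by a letter are invalid.
                result.tokens.clear();  // no token list is shown after an error
                result.error = "Syntax error on line " + std::to_string(line) +
                               ": invalid integer";
                return result;
            } else {
                result.tokens.push_back({"INTEGER", text, line});
                state = State::Start;
            }
            break;
        }
    }
    return result;
}
```

A caller would print `result.error` when it is non-empty and otherwise print each token in `result.tokens`, so that a syntax error never appears alongside a token list.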

Uploading your programming project files

  • Please upload your source code and a Makefile as a zip or gzipped-tar file.

Programming Assignment 2 Rubric

CRITERIA RATINGS POINTS
Compilation:
Will the program compile with GNU compiler?
Proficient
2 points

Makefile provided. Student's assignment two program is written in C or C++ and compiles with gcc (GNU C compiler) or g++ (GNU C++ compiler) without syntax errors on GNU/Linux. No external libraries (besides standard built-in C/C++ libraries) are required to build the project.
Satisfactory
1 point

Student's assignment two program is written in C or C++ and compiles on GNU/Linux with gcc or g++. A Makefile is not included. Extra external library dependencies, beyond the standard built-in C/C++ libraries, may be required to compile and run student's assignment two program.
Below Expectation
0 points

Makefile not included. Student's assignment two program fails to compile with gcc or g++ on GNU/Linux.
2 points
Parsing Implementation:
How was the parsing technique implemented?
Proficient
2 points

Program features a procedurally-driven deterministic finite-state automaton (DFA) to identify and parse comments.
Below Expectation
0 points

Student's assignment two program implements table-driven DFAs; OR student's assignment two program implements a combination of procedurally-driven and table-driven DFAs; OR student's assignment two program fails to use DFAs (table or procedural).
2 points
Test program 1:
The first benchmark test.
Proficient
1 point

Student's assignment two program removes all comments for test program one without impacting the line numbering of statements. Student's assignment two program identifies then displays the list of tokens as output. The token list output from student's assignment two program matches the expected token list output as noted above in the assignment guidelines.
Satisfactory
0.75 points

Between one and three tokens do not match the expected output token list as noted in the assignment guidelines above.
Below Expectation
0 points

Four or more tokens do not match the expected output token list as noted in the assignment guidelines above; OR the output token list is missing.
1 point
Test program 2:
The second benchmark test.
Proficient
1 point

Student's assignment two program removes all comments for test program two without impacting the line numbering of statements. Student's assignment two program identifies then displays the list of tokens as output. The token list output from student's assignment two program matches the expected token list output as noted above in the assignment guidelines.
Satisfactory
0.75 points

Between one and three tokens do not match the expected output token list as noted in the assignment guidelines above.
Below Expectation
0 points

Four or more tokens do not match the expected output token list as noted in the assignment guidelines above; OR the output token list is missing.
1 point
Test program 3:
The third benchmark test.
Proficient
1 point

Student's assignment two program removes all comments for test program three without impacting the line numbering of statements. Student's assignment two program identifies then displays the list of tokens as output. The token list output from student's assignment two program matches the expected token list output as noted above in the assignment guidelines.
Satisfactory
0.75 points

Between one and three tokens do not match the expected output token list as noted in the assignment guidelines above.
Below Expectation
0 points

Four or more tokens do not match the expected output token list as noted in the assignment guidelines above; OR the output token list is missing.
1 point
Test program 4:
The fourth benchmark test.
Proficient
1 point

Student's assignment two program removes all comments for test program four without impacting the line numbering of statements. Student's assignment two program identifies then displays the list of tokens as output. The token list output from student's assignment two program matches the expected token list output as noted above in the assignment guidelines.
Satisfactory
0.75 points

Between one and three tokens do not match the expected output token list as noted in the assignment guidelines above.
Below Expectation
0 points

Four or more tokens do not match the expected output token list as noted in the assignment guidelines above; OR the output token list is missing.
1 point
Test program 5:
The fifth benchmark test.
Proficient
1 point

Student's assignment two program detects an invalid integer on line eight of test program five then outputs the error message "Syntax error on line 8: invalid integer". No token list is displayed since a syntax error occurred.
Satisfactory
0.75 points

An error message is displayed but the line number where the error occurred is incorrect or missing; OR a token list is displayed with the error message.
Below Expectation
0 points

No error message is displayed. A token list may be displayed instead.
1 point
Test program 6:
The sixth benchmark test.
Proficient
1 point

Student's assignment two program detects an invalid integer on line eight of test program six then outputs the error message "Syntax error on line 8: invalid integer". No token list is displayed since a syntax error occurred.
Satisfactory
0.75 points

An error message is displayed but the line number where the error occurred is incorrect or missing; OR a token list is displayed with the error message.
Below Expectation
0 points

No error message is displayed. A token list may be displayed instead.
1 point
Total points: 10