The first phase of the compiler is the parser, which transforms an
input program in text form into an abstract syntax tree. The parser is
followed by the weeder, which performs some simple syntactical checks
and transformations on the AST that are not (easily) expressible in
the parser itself.
You must hand in two files parser.mly and
weeding.ml (and optionally ast.ml if you extend some
of the datatypes) that extend the skeleton so
that it parses the input syntax and constructs the correct ASTs. You
must also post an entry in your group blog that
explains how your code works and which problems you have encountered
and solved in the process.
The AST specified in ast.ml defines the interface between the
parser and the weeder. You may add additional nodes to this AST, but
after the weeding phase, the AST must contain only nodes specified
in weedingast.ml. Other nodes may be added temporarily by the
parser, but these must be rewritten or rejected by the
weeder.
For any node with a Lexing.position field, the position field may be
changed temporarily to use Lexing.dummy_pos, such that the node can be
created without having a proper position available. However, in all
such cases, the position must be filled out appropriately by the weeder.
Missing syntactical constructs
The parser.mly file in the skeleton covers most of the Joos
1 language, but a few, essential language constructs are missing. It
is your job to fill in these missing parts of the grammar. In
particular, the grammar is missing
- fields,
instanceof expressions,
- cast expressions,
- array access lvalues,
- simple
for statements (since for statements
are not available in the AST, they must be rewritten (desugared)
using available constructs, either by the parser or by the
weeder), and
- negative integer literals (these are currently parsed as negated
positive integer literals, which does not always work correctly -
figure out yourself why and how to fix it).
final, static and abstract
modifiers for method declarations. Note that access should always come first in the joos-languages.
For a Joos 2 compiler, the grammar must furthermore be extended with
throw statements,
- interfaces,
super and this constructor calls,
- increment and decrement operations,
- multidimensional array types, and
- multidimensional array creation expressions.
Syntactical checks and transformations
The AST delivered by the weeding phase must adhere to a number of
additional requirements that are not expressed in the AST definition.
Each of these requirements can be enforced syntactically in the
grammar or checked explicitly in the weeder, or some combination (note
that some of these are already enforced by the given grammar). When
handling checks in the weeder, use the methods mentioned in parentheses
to report the corresponding errors:
- An abstract class cannot be final. (error_abstract_final_class)
- An abstract method must not have a body. (error_abstract_method_body)
- A non-abstract method must have a body. (error_non_abstract_method_body)
- An abstract method cannot be static or final. (error_abstract_method_final_or_static)
- A static method cannot be final.(error_static_final_method)
- A typeexp node which is not the return type of a method
must not be a Void node. (Use one of error_void_type_array, error_void_type_field, error_void_type_instanceof, error_void_type_not_return_type, error_void_type_cast, and error_void_type_variable)
- A constructor must have the same name as its enclosing
class. (error_constructor_name)
- A class declaration must reside in a .java source file
with the same base name as the class. (error_invalid_source_filename)
- Either (Joos 1):
No interfaces allowed, (check_joos1_interface)
or (Joos 2):
- An interface declaration must reside in a .java source file
with the same base name as the interface. (error_invalid_source_filename)
- An interface must declare no fields or constructors. (error_interface_field / error_interface_constructor)
- An interface method is implicitly abstract.
- An interface method is implicitly public.
- An interface method cannot be static or final. (error_static_or_final_interface_method)
- An interface method must not have a body. (error_interface_method_with_body)
- Either (Joos 1):
A class must contain at least one explicit constructor, (check_joos1_omitted_constructor)
or (Joos 2):
If a class contains no explicit constructors, it implicitly
contains a public no-arg constructor which simply contains
a no-arg super statement.
- Either (Joos 1):
- A constructor implicitly contains a no-arg super statement.
- No explicit super or this statements allowed. (check_joos1_explicit_supercall / check_joos1_thiscall)
or (Joos 2):
- A super or this statement must be the first statement
in a constructor body. (error_supercall_not_first_statement / error_thiscall_not_first_statement)
- If a constructor body contains no explicit super or this statement,
it implicitly contains a no-arg super statement.
- Either (Joos 1):
No final field declarations allowed, (check_joos1_final_field_declaration)
or (Joos 2): A final field must have an initializer. (error_missing_final_field_initializer)
- (Joos 1): No throw statements allowed. (check_joos1_throw)
- (Joos 1): No static field declarations allowed. (check_joos1_static_field_declaration)
- (Joos 1): No increment or decrement expressions allowed. (check_joos1_incdec)
- (Joos 1): No multidimensional array types allowed. (check_joos1_multiarray)
- (Joos 1): No multidimensional array creation expressions allowed. (check_joos1_multiarray)
Use error_syntax_error for reporting syntactical errors handled in the weeder.
Values of literals
The weeder is responsible for calculating the values of all
literals. In particular:
- For each integer literal in the program, bind its integer value
with the
Weedingast.IntConst constructor. Check that the
number is within the legal range for 32-bit signed integers. (error_invalid_integer)
- For each character literal in the program, process any character
escape sequence in the literal and bind the resulting character
value with the
Weedingast.CharConst constructor.
- For each string literal in the program, process any character
escape sequences in the literal and bind the resulting string
value with the
Weedingast.StringConst constructor.
Things to be done later
After the weeding phase, the AST nodes must correspond to the actual
language constructs, except for a number of cases which cannot be
resolved at this time.
The nodes that might be represented temporarily by different nodes
are StaticInvoke, NonstaticInvoke, and NonstaticField. In particular,
- A method invocation qualified by a name is represented as an
AmbiguousInvoke node.
- An unqualified method invocation is represented as an
SimpleInvoke node.
- A simple or compound name used as an lvalue is represented as an
AmbiguousName node.