7.2 Generating the Grammar-Based Information

After having constructed the grammar, several grammar analysis tools needs to be invoked in order to analyse the grammar and generate the necessary information for the various tools. In the following discussion, we will assume the grammar is named mylang and the grammar is residing on the file mylang-meta.gram.

The naming conventions used here are mandatory:
A grammar must reside on a file with a name that is the name of the grammar (as specified in the Grammar clause of the grammar) followed by -meta.gram.

7.2.1 Generating the Metaprogramming Interface

In order to generate the predefined patterns that constitute the tree- and context-free level interface to the AST's generated by the grammar, we have to invoke the generator tool:

generator mylang

The generator checks whether the grammar is a valid structured context-free grammar, and generates the following files:

7.2.2 Generating Parser and Parser Tables

The next step is to analyse the grammar (to check that the grammar is LALR(1) and otherwise well-formed). This is done by the bobsit tool:

bobsit mylang

Besides analysing the grammar, the bobsit tool generates the file:

Bobsit may find errors in the grammar (such as the grammar not being LALR(1), nonterminals that cannot be reached from the startsymbol, etc). If a parser is not to be used at all for the mylang grammar (i.e. all AST's of the grammar is generated by grammar-based tools) these errors may be ignored.

Bobsit is a revised version of the BOBS compiler generator.

7.2.3 Generating Pretty-printer Specification

Having analysed and checked the grammar, generated the AST interface to the grammar, generated the BETA interface to the AST's of the language, and generated the parser and parser tables, the next step is to generate a pretty-printer specification for the language. This is done by the makepretty tool:

makepretty mylang

Makepretty generates the default pretty-printer specification for the grammar mylang on the file:

This default pretty-printer specification is often not the best possible pretty-printer specification for the given grammar. The default pretty-printer specification is therefore often modified in order give a better reflection of the semantical structure of the language. These modifications are done manually and discussed later.

7.2.4 Generating Pretty-printer Specification Tables

As the final step in generating the grammar-based information to be used by the various grammar-based tools, the pretty-printer specification tables need to be generated. This in done by the morepretty tool:

morepretty mylang

Morepretty analyses and checks the pretty-printer specification on the file mylang-pretty.pgram and generates the pretty-printer specification tables on the file:

7.2.5 Generating the Grammar-based Information Easily

In order to make it easier to run these four tools in the right sequence, a utility tool is available:

dogram mylang

which runs generator, bobsit, makepretty and morepretty in that sequence. Please note, that the dogram tool will overwrite any existing manually edited pretty-printer specifications. If these should be retained, either run the three other tools (e.g. except makepretty) manually, or copy the manually edited pretty-printer specification file to a safe place before invoking dogram.

7.2.6 Registering the new grammar

The grammar is now ready for being used by the different grammar-based tools in the Mjølner BETA System. The grammar-based tools uses a particular searching strategy, when trying to locate the grammar to be used for interpreting a given file (textual source file, of a group file, containing the ASTs). This searching strategy is implemented in the findGrammar fragment, described later. This strategy is the following:

  1. First try to locate the grammar in the current directory.
  2. Then try to find the grammar among the grammars specified in one of the grammar specification files:
    1. First try among the grammars defined in the MBSgrammars.text file located in the current directory.
    2. Then try among the grammars defined in the MBSgrammars.text file located in the HOME directory of the user.
    3. Then try among the grammars defined in the MBSgrammars_DEMO.text file located in the ~beta directory.
    4. Finally try among the grammars defined in the MBSgrammars_STD.text file located in the ~beta directory.

The first grammar found in this sequence will be used. The format of these grammar specification files are:

[[
-- INCLUDE 'filename of grammar'
-- INCLUDE 'filename of grammar'
... etc. ...
-- INCLUDE 'filename of grammar'
]]

The filename of the grammar should not include the -meta suffix. If we assume that your new grammar mylang is located in the ~you/mylang directory, you can specify the grammar by inserting the following line in one of the grammar specification files:

-- INCLUDE '~you/mylang/mylang'

In order to have your new grammar being usable by the grammar-based tools, you therefore either have to have the grammar files located in the current directory, or have the grammar specified in one of the above mentioned grammar specification files. Since the grammarsDEMO.text and grammarsSTD.text files are located in the ~beta directory, these files will not usually be modifyable by the normal users. You will therefore most often be specifying your grammar in one of the .MBSgrammars.text files mentioned above.

Important note: In this release, it is necessary to specify the grammar also in the file

MBSgrammarsExt.text

to make the grammar usable for the grammar-based tools (Sif, Freja, Frigg, Valhalla).

7.2.7 Using the Pretty-printer and the Hyper-structure Editor

After having specified your grammar, you will be able to use the new grammar with the pretty-printer and the hyper-structure editor.

In order to make a pretty-print of the mylang program on the file tst.mylang, the pp tool must be used:

pp tst.mylang > tst.pp

This will pretty-print the file tst.mylang (or tst.ast) and deliver the pretty-print on the file tst.pp. pp accepts two options:

The grammar is now also ready for use by the hyper-structure editor Sif in the Mjolner tool. Usage of mylang in Sif is described in detail in the Mjolner manual [MIA 99-34]. However, in order to be able to utilize the automatic contraction facilities of Sif, the grammar specification needs to be augmented with one additional section, namely definition of the contractionCategories property. This property is specified at the very beginning of the grammar specification (in the form of a fragment property). As an example, we can define the contractionCategories property for the mylang grammar:

contractioncategories
        module
        import
        if
        while;
-- mylang: aGrammar: metagrammar --
grammar mylang:
rule
<module>        ::= 'module' <module:id> ';' <importOpt>
                    'begin' <statement> 'end';
<id>            ::= <nameDecl>;
<importOpt>     ::? <import>;

etc. as previously

In this example, we have specified the rules module, import, if and while as the contractions categories. This implies, that Sif automatically will contract these parts when displaying a program derived from mylang. Please refer to the Sif.

Please note, that we in this example shows the entire file, including the fragment syntax needed:

-- mylang: aGrammar: metagrammar --

Note, that aGrammar and metagrammar are mandatory names, whereas mylang can be freely chosen.

7.2.8 Modifying the Pretty-print Specification

If you want to modify the specification, then modify the file: mylang-pretty.pgram either by means of a text editor or the pretty-print specification language-editor. After modifying the pretty-print specification morepretty is used again to create new tables for the editor. Note that if the grammar is modified, then the file mylang-pretty.pgram must be updated accordingly, e.g. a new production in the grammar might require a new production in the pretty-print specification grammar. In order to be able to modify the pretty-print specification the user must know the pretty-print algorithm.

7.3 The Pretty-print Algorithm

The pretty-print algorithm is an adaptive pretty-printer, i.e. the pretty-printer always tries to print as much as possible on each line. If the pretty-printed text cannot fit on one line, the pretty-print specification tells where to break the line. For each production in the grammar, there is a specification of how to pretty-print that production. Furthermore it is indicated where to associate a possible comment.

The pretty-printer algorithm takes as input a stream of tokens. A token is a text string, a break or a block.

A stream is defined by

The algorithm gives a text string as output. The text string has a fixed maximal width. The block concept is used in the following way: the algorithm tries to break onto different lines as few blocks as possible according to the maximal width. If a block cannot fit on one line the block has to be broken. The breaks are used to specify where to break the block. A break has a length and an indention. The length specifies the number of single space characters to be written if the block is not broken. The indention specifies the number single space characters to be written (relative to the surrounding block) at the beginning of a new line if the block is broken.

There exist two types of blocks: consistent and inconsistent blocks. If a consistent block cannot be written on one line the substreams of the block will be written on separate lines. I.e. all breaks in the block will imply a line break. If an inconsistent block cannot be written on one line the substreams are only written on a separate line if they cannot be written on the rest of the current line.

The reason for the distinction between consistent and inconsistent blocks is that one might prefer:

if <<condition:exp>> then
       <<thenPart:statement>>
else
       <<elsePart:statement>>
endif
to
if <<condition:exp>> then <<thenPart:statement>> else
<<elsePart:statement>> endif
and
import a,b,
       c
to [4]
import a,
       b,
       c

7.4 The Pretty-print Specification

The pretty-printer is based on the algorithm just described, but because the internal representation of a document is an abstract syntax tree, the input stream to the pretty-printer algorithm must be generated from the AST. This step is called unparsing. The structure of an AST is described by means of a grammar. The pretty-print specification defines for each production in the grammar how it shall be unparsed.

Only those productions, that result in nodes in the AST, has a corresponding specification. The productions are constructor and list productions as described previously.

The grammar for pretty-print specifications is described in appendix 2.

The pretty-print specification of a constructor production has the following form:

<Constructor> ::= <ProductionName:nameAppl> '=' <Stream:ItemList>;
<ItemList>    ::* <Item>;
<Item>        ::| <Terminal> | <NontTerm> | <Break> 
                | <Block> | <CommentPlace>

The ProductionName is the name of the syntactic category on the left side of the corresponding production in the language grammar. The items in the stream can be terminals, nonterminals, breaks, blocks or comment places:

The pretty-print specification of a list production has the following form:

<ListProd> ::= <ProductionName:nameAppl> '=' '(' <ListSpec> ')';
<ListSpec> ::= <Beginning:ItemList> 
               '{' <BlockType> <Separator:ItemList> '}' 
               <Ending:ItemList>

The list production specifies what is going to be pretty-printed before the list, between the list elements and after the list. A list is always surrounded by a block. Note the the block delimitors for lists are { and }.

The pretty-print specification must always start with a fragment form soecification (similar to the grammar specification). For a pretty-print specification the syntax is:

-- name: prettyprint: prettyprint --

where name must be the name chosen for this grammar.

7.5 An Example of Modifying the Pretty-print Specification

As mentioned the makepretty script generates a default pretty-print specification. This section illustrates the default pretty-print specification of the sample grammar and how the specification can be improved to obtain a more "pretty" pretty-printing.

The default pretty-print specification for mylang (mylang-pretty.pgram) might looks like [5]: d

-- mylang: prettyprint: prettyprint --
PrettyPrintScheme mylangSpec
for mylang:
module  =  T:1 $$ * $$ N:1 $$ T:2 $$ N:2 $$ T:3 $$ N:3 $$ T:4;
id      =  N:1;
importOpt       =  N:1 ;
import  =  T:1 $$ * $$ N:1 $$ T:2;
nameList        =  ( {c T:1 $$ } );
statement       =  N:1 ;
if      =  T:1 $$ * $$ N:1 $$ T:2 $$ N:2 $$ T:3 $$ N:3 $$ T:4;
while   =  T:1 $$ * $$ N:1 $$ T:2 $$ N:2 $$ T:3;
statementList   =  ( {c T:1 $$ } );
exp     =  N:1 ;
text    =  N:1;
number  =  N:1;
expProcCall     =  N:1;
procCall        =  N:1 $$ T:1 $$ * $$ T:2

As can be seen the default specification uses default breaks; there are no blocks except in lists; the block type of a list is consistent and only the list separator is specified; comments are only associated with constructor productions and always after the first terminal symbol.

Let us look at the following source file:

-- test: module: mylang --
module pip;
(* Copyright 1992*)
(* Mj&oslash;lner Informatics *)
import dyt, baat, olsen; (* some imported operations *)
begin
  while olsen() do
     if (* the test condition will be filled out later *) <<condition:exp>>
     then
        dyt()
     else if <<condition:exp>> then baat() else <<elsePart:statement>> endif
     endif;
     <<statement>>
  end
end

Using the default pretty-printer specification as generated by the makepretty tool (see above) results in the following pretty-print [6]:

-- test: module: mylang --
module (* Copyright 1992 *)
(* Mj&oslash;lner Informatics *) pip ; import
(* some imported operations *) dyt, baat, olsen ; begin while olsen ( ) do
if
(* the test condition will be filled out later *)
<<condition: exp>>
then
dyt
(
)
else
if
<<condition: exp>>
then
baat
(
)
else
<<elsePart: statement>>
endif
endif;
<<statement>> end end

Because there are no blocks in the specification, the pretty-printer tries to write as much as possible on each line as can be seen on the first three lines of the program. Note that no blocks in the specification has the same effect as one surrounding inconsistent block. After the third line each item is written on a separate line. This is caused by the statementList, that introduces a consistent block of statements.

This pretty-printing is certainly not very pretty, but only a few modifications of the specifiation improves the layout drastically. Consider the following modified specification:

-- mylang: prettyprint: prettyprint --
PrettyPrintScheme mylangSpec
for mylang:
module  =  [c [i T:1 $$ N:1 T:2 ] $$ * $$ N:2 $$ [c T:3 $1,2 N:3 $$ T:4] ];
id      =  N:1;
importOpt       =  N:1 ;
import  =  [i T:1 $$ N:1 T:2 $$ *];
nameList        =  ( {i T:1 $$ } );
statement       =  N:1 ;
if      =  [c [i T:1 $$ * $1,3 N:1 $$ T:2 ] $1,3 N:2 $$ [i T:3 $1,3 N:3] $$ T:4 ];
while   =  [c [i T:1 $$ * $$ N:1 $$ T:2 ] $1,3 N:2 $$ T:3];
statementList   =  ( {c T:1 $$ } );
exp     =  N:1 ;
text    =  N:1;
number  =  N:1;
expProcCall     =  N:1;
procCall        =  N:1 T:1 T:2 $$ *

The test program now looks much nicer:

-- test: module: mylang --
module pip;
(* Copyright 1992 *)
(* Mj&oslash;lner Informatics *)
import dyt, baat, olsen; (* some imported operations *)
begin
  while olsen() do
     if (* the test condition will be filled out later *)
        <<condition: exp>> then
        dyt()
     else
        if <<condition: exp>> then
           baat()
        else <<elsePart: statement>>
        endif
     endif;
     <<statement>>
  end
end

The following changes have been made: In the module production some blocks have been introduced: one that surrounds the whole production, one that surrounds the header of the module and one that surrounds the body of the module. It is important that the block types of the first and the last block are consistent. The break between the module name and the ":" have been removed. The comment character have been moved. The break between the begin keyword and the <statement> has been changed to an indention of two characters.

In the nameList production, the block type has been changed to inconsistent. The import production is surrounded by an inconsistent block in order to "overload" the consistent block of the start production. In the if and while productions, blocks has been introduced and the indention has been changed. In the procCall production, the breaks has been removed.

The reader is recommended to try to understand the effect of these modifications in order to gain insight into the workings of the pretty-print specification.


[4] Naturally, we are assuming that the line width is not sufficient to have the entire import a, b, c on one single line
[5] This default pretty-print speficication is subject to changes. The default pretty-print specification on your system might differ from the one shown here. Especially will all construction nonterminals have consistent blocks surrounding their right-hand pretty-print specifications
[6] Since the default pretty-print specification might differ on your system, the same applies for this pretty-print example


The Metaprogramming System - Reference Manual
© 1991-2002 Mjølner Informatics
[Modified: Friday January 4th 2002 at 13:10]