A Compiler Generator for Microcomputers - Mössenböck H., Rechenberg P.

Authors: Mössenböck H., Rechenberg P.
Tags: programming computer science microcomputers software tools
ISBN: 0-13-155060-8
Year: 1989

Digitized by the Internet Archive
in 2022 with funding from
Kahle/Austin Foundation

https://archive.org/details/compilergenerato0000rech

A COMPILER GENERATOR
FOR MICROCOMPUTERS

Limits of Liability and Disclaimer of Warranty

The authors and publishers of this book have used their best efforts in
preparing this book and the programs contained within it. These efforts
include the development, research and testing of the theories and programs
to determine their effectiveness. The authors and publishers make no
warranty of any kind, expressed or implied, with regard to these programs
or the documentation contained in this book. The authors and publishers
shall not be liable in any event for incidental and consequential damages
in connection with, or arising from, the furnishing, performance or use of
these programs.

A COMPILER GENERATOR
FOR MICROCOMPUTERS
Peter Rechenberg
University of Linz

Hanspeter Mössenböck
University of Zürich

Translated by John O’Meara
and the authors

First published in English 1989 by
Prentice Hall International (UK) Ltd,
66 Wood Lane End, Hemel Hempstead,
Hertfordshire, HP2 4RG
A division of
Simon & Schuster International Group
This book was originally published in German under the title
Ein Compiler-Generator für Mikrocomputer by Peter
Rechenberg and Hanspeter Mössenböck
© 1985 Carl Hanser Verlag, Munich and Vienna.

© 1989 Carl Hanser Verlag and
Prentice Hall International (UK) Ltd
All rights reserved. No part of this publication may be
reproduced, stored in a retrieval system, or transmitted, in
any form, or by any means, electronic, mechanical,
photocopying, recording or otherwise, without the prior
permission, in writing, from the publisher.

For permission within the United States of America contact
Prentice Hall Inc., Englewood Cliffs, NJ 07632.

Printed and bound in Great Britain by
A. Wheaton & Co. Ltd, Exeter.

Library of Congress Cataloguing-in-Publication Data
Rechenberg, Peter
[Compiler-Generator für Mikrocomputer. English]
A compiler generator for microcomputers / Peter Rechenberg, Hanspeter Mössenböck.
Translation of: Ein Compiler-Generator für Mikrocomputer.
Bibliography: p.
Includes index.
ISBN 0-13-155060-8 : $40.00
1. Compilers (Computer programs) 2. Microcomputers--Programming.
I. Mössenböck, Hanspeter, 1959- . II. Title.
QA76.76.C65R4313 1988
005.26--dc19
88-28926

British Library Cataloguing in Publication Data
Rechenberg, Peter
A compiler generator for microcomputers.
1. Computer systems. Programming languages. Compilers. Design & construction
I. Title II. Mössenböck, Hanspeter
III. Ein Compiler-Generator für Mikrocomputer. English
005.4'53
ISBN 0-13-155060-8
ISBN 0-13-155136-1 Pbk

1 2 3 4 5 92 91 90 89 88

Contents

Preface
Numbered definitions, algorithms, examples
Symbols

1 Introduction and survey
1.1 Compilers and compiler compilers
1.2 Static compiler structure
1.3 Dynamic compiler structure
1.4 The structure of the book

2 Syntax
2.1 Basic concepts from formal language theory
2.2 LL(1) grammars and syntax analysis
2.3 The top-down graph
2.4 The G-code
2.5 Parsing with the G-code
2.6 Error handling

3 Semantics
3.1 Semantic actions
3.2 Attributes
3.3 Context conditions
3.4 Attributed grammars
3.5 L-attributed grammars
3.6 Implementation of the semantic interface

4 Various compiler compilers
4.1 YACC - yet another compiler compiler
4.2 HLP84 - Helsinki language processor
4.3 GAG - generator based on attribute grammars
4.4 MUG - modular compiler generator
4.5 Coco - compiler compiler
4.6 Summary

5 The compiler description language Cocol
5.1 Lexical structure
5.2 Cocol as a syntax description language
5.2.1 Productions
5.2.2 Declarations
5.3 Cocol as a semantic description language
5.3.1 Semantic actions
5.3.2 Attributes
5.3.3 Context conditions
5.3.4 Semantic declarations
5.3.5 Scope of semantic objects

6 The compiler compiler Coco
6.1 Characteristics
6.2 Components of the generated compiler
6.3 Operation of the generated compiler
6.4 Interfaces of the generated compiler
6.4.1 Caller interface
6.4.2 Input interface
6.4.3 Output interface
6.4.4 Syntax error interface
6.5 Generation of multi-pass compilers

7 The implementation
7.1 Survey
7.2 Structure of the symbol list
7.2.1 Symbol list representation
7.2.2 Symbol list construction
7.3 Structure of the top-down graph
7.3.1 Top-down graph representation
7.3.2 Top-down graph construction
7.3.3 Insertion of eps-nodes
7.3.4 Removal of redundant eps-nodes
7.4 Collecting the symbol sets
7.4.1 Deletable nonterminals
7.4.2 Terminal start symbols of nonterminals
7.4.3 Terminal successors of nonterminals
7.4.4 eps-sets
7.4.5 any-sets
7.5 Grammar tests
7.5.1 Completeness
7.5.2 Reachability
7.5.3 Noncircularity
7.5.4 Termination
7.5.5 LL(1) condition
7.6 Generation of the parser tables
7.6.1 Table format
7.6.2 Generation of the G-code
7.6.3 Generation of the remaining tables
7.7 Generation of the syntax analyzer
7.8 Generation of the semantic evaluator
7.8.1 The invariant parts of the semantic evaluator
7.8.2 Processing of the semantic declarations
7.8.3 Processing of the semantic actions
7.8.4 Attribute processing

8 Applications
8.1 Applications in compiler construction
8.1.1 Specification of a lexical analyzer
8.1.2 Description of a lexical analyzer for Modula-2
8.1.3 Semantic procedures for lexical analysis
8.2 Applications in software engineering
8.2.1 Attributed grammars as a software design method
8.2.2 The telegram problem as an example
8.2.3 Attributed grammars as documentation
8.2.4 The Jackson method as a special case
8.3 Results of a Coco run
8.3.1 The generated syntax analyzer
8.3.2 The generated semantic evaluator
8.3.3 The generated parser tables

9 Experiences with Coco
9.1 A basis for measurements
9.2 Measurements on Coco
9.3 Measurements on some generated compilers
9.4 General experiences

Appendices
A Definition of Adele
B Modula-2 and Pascal
C Syntax of Cocol
D G-code
E Intermodular cross-reference list
F Program listings

Bibliography
Index

Preface

This book describes the structure of the compiler compiler Coco, which was
developed for microcomputers by the authors. It also deals with the techniques
used by Coco and those by which Coco was developed. Special attention is
given to table-driven top-down syntax analysis with automatic error
recovery and to the description of semantics using L-attributed grammars. Coco is
written in Modula-2 and generates compilers in Modula-2. It is hoped that this
will show how well Modula-2 is suited to the implementation and
documentation of large modular programs.
Compiler compilers, as we understand them, are not the field of a few
specialists in compiler construction, but rather are tools for managing various
tasks in software engineering, a fact which is not generally known. The
methodology of attributed grammars which lies at the foundation of compiler
compilers includes, for example, the Jackson method as a simple special case,
and can be applied where the program flow is primarily controlled by one
structured input data stream.
Thus this book has something to offer for a wide circle of readers:
1. It is a presentation of the principles of compiler construction as far as they concern the analysis part of compilers, especially LL(1) syntax analysis with attributed grammars. (Lexical analysis is covered only marginally.)
2. It is a detailed description of a compiler compiler.
3. It illustrates the application of a compiler compiler by numerous examples.
4. It illustrates the application of software documentation methods to a large program system, especially the method of stepwise refinement and the use of an algorithm description language.
5. It can be used to evaluate the suitability of Modula-2 for software engineering because it presents a large program in Modula-2 which exploits the special properties of modular programming.

We consider the primary circle of readers to be advanced computer science
students, theoretically and practically active computer scientists and software
engineers. We therefore presuppose the usual terminology, assume that the
reader is acquainted with the development of software and that he can read
Pascal, or even better Modula-2, or some similar language. Accordingly, we
have kept the discussion brief, but have also taken pains not to rely on specialized
knowledge from other sources, so that the book can be understood on its own.
The focal point around which the entire book evolved is the complete
Modula-2 code of Coco in Appendix F. We consider the publishing of such a
large program system a gamble because we are not sure whether the reader
will be interested in the numerous details in it, and because we expose
ourselves to all sorts of criticism of our programming style and choice of
algorithms. But at the same time we hope that it is just this completeness
which makes the book valuable and distinct from others.
For information concerning the structure of the book the reader is referred
to Section 1.4.
The Austrian Foundation for the Advancement of Scientific Research
financially supported the development of the compiler compiler and thereby
rendered it possible, for which we wish to express our appreciation.
For the careful review of the manuscript and for helpful suggestions we
wish to thank our colleagues and friends Prof. G. Pomberger, Dr G. Blaschek
and F. Ritzinger; for proof reading the English translation we wish to thank D.
Raye; for the review of the examples in Chapter 4 we wish to thank Prof. H.
Ganzinger, Prof. U. Kastens, Dr K. Koskimies and Prof. R. Marty. The text
was produced by ourselves with the text processor WriteNow on a Macintosh
computer.

Linz
August, 1988

P. Rechenberg
H. Mössenböck

Numbered definitions, algorithms, examples

1.1 Definition Compiler
1.2 Definition Compiler compiler
1.3 Versions of Coco
1.4 Example Lexical analysis
1.5 Example Syntax tree
2.1 Definition Abbreviations for strings and sets of strings
2.2 Definition Grammar
2.3 Definition Derivation, sentential form, sentence, language
2.4 Example Derivation of all sentential forms of a language
2.5 Definition Left-canonical derivation
2.6 Definition Phrase
2.7 Definition Simple phrase, handle
2.8 Example Phrase, simple phrase, handle
2.9 Definition Recursive grammar
2.10 Example Arithmetic expressions
2.11 Definition Terminating symbol, derivable symbol
2.12 Definition Useless symbol
2.13 Definition Reduced grammar
2.14 Definition LL(k) grammar
2.15 Definition Terminal start symbols of a nonterminal
2.16 Definition Terminal start symbols of a string
2.17 LL(1) conditions for ε-free grammars
2.18 Example LL(1) conditions
2.19 Definition Terminal successors
2.20 LL(1) conditions for arbitrary grammars
2.21 Example LL(1) conditions
2.22 Example Dangling else
2.23 Definition Deletability
2.24 Algorithm Marking deletable symbols
2.25 Algorithm Calculation of the sets of terminal start symbols
2.26 Algorithm Calculation of successor sets
2.27 Algorithm LL(1) analysis (recursive)
2.28 Example Recursive LL(1) parsing
2.29 Algorithm LL(1) parsing (nonrecursive)
2.30 Example Nonrecursive LL(1) parsing
2.31 Definition Terminal start symbols of length k
2.32 Definition LL(k) grammar
2.33 LL(k) condition
2.34 Example LL(2) and LL(3) test
2.35 Example Basic structure of the top-down graph
2.36 Definition Complement symbol any
2.37 Example Equivalent top-down graphs
2.38 Definition Alternative chain
2.39 Example Alternative chains
2.40 Definition Match
2.41 Definition LL(1) conditions for top-down graphs
2.42 Definition G-code (incomplete)
2.43 Algorithm Parse (simplified)
2.44 Algorithm Parse (complete)
2.45 Example Error situation
2.46 Principle of error handling
2.47 Algorithm Error (basic structure)
2.48 Algorithm Triple
2.49 Algorithm Fill
2.50 Algorithm FillSucc
2.51 Algorithm Error (with heuristic enhancements)
3.1 Example Semantic actions
3.2 Example Semantic actions
3.3 Example Interpretation of arithmetic expressions
3.4 Example Interpretation of arithmetic expressions in EBNF
3.5 Example Inherited attributes
3.6 Example A context-sensitive language
3.7 Example Context condition
3.8 Example Context condition
3.9 Definition Attributed grammar
3.10 Example Variable declaration
3.11 Definition L-attributed grammar
3.12 Parser with semantic interface
3.13 Example Attribute passing
3.14 Definition G-code (remainder)
3.15 Principle of attribute saving for recursive symbols
4.1 Example Attributed grammar as input for YACC
4.2 Example Attributed grammar as input for HLP84
4.3 Example Attributed grammar as input for GAG
4.4 Example Attributed grammar as input for MUG
4.5 Example Attributed grammar as input for Coco
5.1 Example Cocol grammar for real constants
5.2 Example The use of eps
5.3 Example The use of any
5.4 Example How the compiler treats LL(1) conflicts
5.5 Example Terminal declarations
5.6 Example Pragma declarations
5.7 Example Nonterminal declarations
5.8 Example Semantic actions
5.9 Example Indication of data flow at parameters
5.10 Example Semantic macros
5.11 Example Semantic actions for pragmas
5.12 Example Attributes
5.13 Example Context conditions
5.14 Example Declarations of semantic objects
5.15 Example Stacking of semantic objects
6.1 Example Application of any
8.1 Example LL(1) conflicts in lexical structures

Symbols

a^n           The string of n identical symbols a
a^+           The set {a^n : n ≥ 1}
a^*           The set {a^n : n ≥ 0}
G             Grammar
O             Order (asymptotic time complexity)
S             Sentence symbol
V             Alphabet
V_T           Alphabet of terminals
V_N           Alphabet of nonterminals
V^+           The set of all non-empty strings built from symbols of V
V^*           The set of all strings built from symbols of V, including the empty string
ε             The empty string
∅             The empty set
∈             Element of
∩             Intersection of two sets
∪             Union of two sets
→             Replacement symbol: 'is defined as'
|             Separates alternatives
[ ]           Option notation (encloses optional symbols and strings)
{ }           Set notation
{ }           Repetition notation
⇒             Direct derivation: 'produces directly'
⇒+            Derivation: 'produces'
⇒*            Derivation: 'produces or is equal to'
⇒L            Left-canonical derivation
⇏*            'Does not produce and is not equal to'
↓ ↑ ↕         Input, output, transient parameters
α, β, γ, δ    Strings
ω             String to be analyzed
String to be analyzed

1 Introduction and survey

The older of the two authors distinctly remembers that he first heard the word
'compiler compiler' at the IFIP Congress in Munich in 1962 in connection
with Atlas, the supercomputer of its time built by the English company Ferranti. It
was a dark, secretive term. Since compiler writing was still an art mastered by
only a few initiates, one could only touch one's cap humbly to people who
were involved in writing compilers which generated compilers. There was just
no way to understand them.
The two works which focused attention on compiler generating programs
and which eliminated much of the mystery from the concept were the anthology by Rosen [1967] and the survey article 'Translator Writing Systems' by
Feldman and Gries [1968]. But it was the clear formulation of the two most
important classes of deterministic grammars, the LR(k) grammars by Knuth [1965] and the
LL(k) grammars by Lewis and Stearns [1968], that helped compiler generators achieve the actual breakthrough.
Today, the terms 'compiler generator', 'compiler generating program'
and 'compiler compiler' are used synonymously and refer to a system which
in some way supports and partially automates the production of compilers.
In the first chapter we introduce the concepts of 'compiler' and 'compiler
compiler', survey the subtasks which a compiler must handle and discuss the
organization of the book. Readers who are acquainted with the terminology
of compiler construction, even only partially, can start immediately with Section 1.4.

1.1 Compilers and compiler compilers

With the exception of special cases, a program can be seen as the description
of a process (algorithm) which transforms input data into output data (Fig. 1.1):

Fig. 1.1 Program

If the input data themselves form a program, and the program P transforms
them into another language, P is called a compiler, the input data are called the
source program and the output data are called the target program (Fig. 1.2).

Fig. 1.2 Compiler

Here, the source language is almost always the higher-level, less machine-oriented language, and the target language the lower-level, more machine-oriented language, often
the machine language itself. Thus a compiler can be defined as in Waite and
Goos [1984]:

1.1 Definition Compiler
A compiler is a program which transforms an algorithm from a language
acceptable to humans into a language acceptable to machines.

Because a compiler is a complex program which itself must be written in a
programming language, the question arose quite early as to whether, given an
abstract description of the source language and its transformation into a target
language, a compiler could be generated either completely or partially. A program CC which is to solve such a task reads the description of the source language S together with its transformation into a target language T as input data.
It transforms this description into a program C which, when it is later executed, transforms source programs written in S into the target language T. Thus
CC generates a compiler C, and is known as a compiler generator or compiler compiler (Fig. 1.3).

Fig. 1.3 Compiler compiler (input: compiler description in the compiler description language CDL; compiler compiler CC; output: compiler in the compiler implementation language CIL)

This leads to the following definition.
1.2 Definition Compiler compiler

A compiler compiler is a program which generates a compiler, or major
parts thereof, from the complete or partial description of the compiler.
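To make Definition 1.2 concrete, here is a deliberately tiny Python sketch; it is our own illustration, not how Coco works (Coco reads compiler descriptions written in Cocol). The "compiler description" is assumed to be a list of (symbol name, regular expression) pairs, and the hypothetical function `generate_scanner` turns it into a scanner, that is, one "major part" of a compiler. The symbol names are borrowed from Example 1.4 later in this chapter.

```python
import re
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (symbol name, matched source text)


def generate_scanner(description: List[Tuple[str, str]]) -> Callable[[str], List[Token]]:
    """Toy 'compiler compiler': builds a scanner from (symbol name, regex) pairs.

    Rules are tried in the order given; the rule named 'skip' describes
    meaningless characters, which are dropped from the output.
    """
    # One alternation of named groups; Python's re tries alternatives left to right.
    pattern = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in description))

    def scanner(source: str) -> List[Token]:
        tokens: List[Token] = []
        pos = 0
        while pos < len(source):
            m = pattern.match(source, pos)
            if m is None or m.end() == pos:  # no rule applies (or only an empty match)
                raise SyntaxError(f"illegal character {source[pos]!r} at position {pos}")
            if m.lastgroup != "skip":
                tokens.append((m.lastgroup, m.group()))
            pos = m.end()
        return tokens

    return scanner


# The hypothetical 'compiler description' (input of the generator).
DESCRIPTION = [
    ("skip", r"\s+"),
    ("number", r"\d+"),
    ("ident", r"[A-Za-z][A-Za-z0-9]*"),
    ("becomessy", r":="),
    ("plussy", r"\+"),
    ("timessy", r"\*"),
    ("semicolonsy", r";"),
]

scan = generate_scanner(DESCRIPTION)  # the 'generated' part of a compiler
print([kind for kind, _ in scan("x := 3 + base * factor;")])
```

Because Python's `re` tries alternatives from left to right, the order of the description matters here: an earlier rule wins even if a later one would match a longer string. Scanner generators such as lex instead prefer the longest match.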

A compiler compiler and the compiler it generates can be represented as in Fig.
1.4.
Fig. 1.4 Compiler compiler and the generated compiler (the compiler compiler CC turns the compiler description in CDL into the compiler C, which translates the source program S into the target program T)

A compiler compiler and its compiler description language are very closely
related. For the user of a compiler compiler the compiler description language
is actually the only interesting feature because it determines whether the description of the compiler to be generated can be formulated and how conveniently this may be accomplished.
Compiler description languages have two primary tasks: (1) the description of the syntax of the source language of the compiler to be generated and
(2) the description of the transformation of the source program into the target
program. Because the meaning of the source program is visible in this transformation, the description of the transformation is also known as a semantic
description.

There are basically two notations for syntax description: Backus-Naur
form (BNF) and Extended Backus-Naur form (EBNF). Both describe the
syntax as a grammar in the form of so-called productions. They constitute
well-understood formal systems and are based on the theory of formal
languages.
The technique of describing semantics is less consolidated. Aside from
ad hoc methods, attributed grammars in a wide variety of forms are usually
applied here.
The compiler compiler described in this book is named Coco (a not very
imaginative abbreviation of 'compiler compiler') and its compiler description
language is called Cocol (compiler compiler language). Cocol uses the EBNF
of Wirth [1982] for syntax description and a special form of attributed grammars, the so-called L-attributed grammars, for semantic descriptions.
Coco was originally implemented in PLM80 and generated compilers in
Pascal-86. The version described here is written in Modula-2 and generates
compilers in Modula-2. Table 1.3 shows the versions of Coco that were available for several popular compilers at the time of writing. They differ in
the language of the generated compilers (Modula-2 or Pascal) and in the
machines on which they run.
1.3 Versions of Coco

Computer            Modula-2                       Pascal
Macintosh           Mac-METH                       Turbo-Pascal
MS-DOS computers    Logitech V. 3.0, M2-SDS,       Turbo-Pascal V. 4.0
                    Taylor-Modula
ATARI-ST            TDI-Modula
IBM/370             Modula/370

1.2 Static compiler structure

Like the translation of a sentence in a natural language Q into another natural
language Z, the transformation of a source program into a target program can
be roughly divided into two phases. First the sentence in Q must be 'understood' through grammatical analysis. With knowledge of its grammatical
structure and the aid of a dictionary it is then possible to construct the sentence
in Z with the same meaning. In a similar way, the translation of a program
consists of analysis and synthesis.
In the analysis phase the source program is decomposed into its constituent parts. Here one distinguishes:
1. lexical analysis, which transforms the input character stream into 'symbols' such as names, numbers and operators;
2. syntax analysis, which analyzes the grammatical structure of the program;
3. semantic analysis, which analyzes all the properties of the program which are not of a syntactical nature.

Analysis yields:
1. the determination of the correctness of the program;
2. the internal representation of the source program in a form which is particularly well adapted for synthesis (the so-called intermediate language);
3. memory tables which are used for further processing of the intermediate language.
Fig. 1.5 Static compiler structure (compiler front end: source program, lexical analysis, symbols, syntax tree, intermediate language; compiler back end: synthesis, target program)

In the synthesis phase the target program is generated from the program in the
intermediate language. Here one distinguishes:
1. optimization, which transforms the program in the intermediate language to improve the target program with respect to certain criteria;
2. code generation, which generates the target program from the optimized intermediate language.

This static, or logical, compiler structure is shown in Fig. 1.5.


The analysis sections are determined by the source language and the intermediate language; the synthesis sections are determined by the intermediate language and the target language. The analysis sections are known as the compiler front end; the synthesis sections are known as the compiler back end.
The compiler front end is independent of the target language; the compiler
back end is independent of the source language.
Compiler compilers primarily support the analysis phase, and therefore
this book only deals with the analysis phase.
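How an intermediate language decouples the front end from the back end can be sketched as follows, under our own assumptions: the intermediate language is postfix notation, the front end handles only `+` and `*` on an already scanned token list (using the shunting-yard algorithm, which is not a method from this book), and both back ends and their instruction formats are hypothetical. The point is that either back end can be attached to the same front end without changing it.

```python
from typing import List

PRECEDENCE = {"+": 1, "*": 2}  # '*' binds tighter than '+'; both are left-associative


def front_end(tokens: List[str]) -> List[str]:
    """Source-language dependent: infix tokens -> postfix intermediate language."""
    output: List[str] = []
    operators: List[str] = []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Emit pending operators of equal or higher precedence first.
            while operators and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]:
                output.append(operators.pop())
            operators.append(tok)
        else:
            output.append(tok)  # operand: name or number
    while operators:
        output.append(operators.pop())
    return output


def stack_machine_back_end(il: List[str]) -> List[str]:
    """Target-dependent back end 1: code for a hypothetical stack machine."""
    mnemonics = {"+": "ADD", "*": "MUL"}
    return [mnemonics[item] if item in mnemonics else f"LOAD {item}" for item in il]


def three_address_back_end(il: List[str]) -> List[str]:
    """Target-dependent back end 2: three-address code with temporaries t1, t2, ..."""
    code: List[str] = []
    stack: List[str] = []
    for item in il:
        if item in PRECEDENCE:
            right, left = stack.pop(), stack.pop()
            temp = f"t{len(code) + 1}"
            code.append(f"{temp} := {left} {item} {right}")
            stack.append(temp)
        else:
            stack.append(item)
    return code


il = front_end(["3", "+", "base", "*", "factor"])
print(il)                          # ['3', 'base', 'factor', '*', '+']
print(stack_machine_back_end(il))  # ['LOAD 3', 'LOAD base', 'LOAD factor', 'MUL', 'ADD']
print(three_address_back_end(il))  # ['t1 := base * factor', 't2 := 3 + t1']
```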
Lexical analysis
Lexical analysis preprocesses the source program text in order to simplify the
tasks of the later phases. This preprocessing includes the following points:
1. Elimination of meaningless characters. Comments, empty lines and unnecessary spaces are eliminated.
2. Recognition of symbols. Sequences of one or more characters which together constitute a symbol are recognized. Symbols are:
(a) keywords such as IF, WHILE, END, etc.;
(b) names for constants, types, variables, procedures, etc.;
(c) literals (numerical constants) such as 3.14;
(d) strings, usually enclosed in inverted commas, such as 'This is a string';
(e) compound characters such as ':=', '<=', '..', etc.;
(f) individual characters such as '(', '+', etc.
3. Arithmetization of symbols. Because numbers can be processed more easily than strings, keywords, names and strings are replaced by numbers, and literals are converted to the internal numerical representation of the machine. This process is known as arithmetization. Names are stored in a name list, strings in a string list, and literals, possibly, in a constant list.
1.4 Example Lexical analysis
The source statement
x := 3 + base * factor;
contains the names x, base and factor, the numerical value 3, the character combination ':=' and the individual characters '+', '*' and ';'. If
ident, becomessy, number, plussy, timessy and semicolonsy
are names for the arithmetized symbols, lexical analysis yields the sequence
of 8 symbols:
ident becomessy number plussy ident timessy ident semicolonsy
Some of these symbols are uniquely determined (e.g. plussy); others
such as ident and number refer to a class of symbols and must be made
unique by a semantic value (e.g. an index in the name list for names, the
converted numerical value for literals). If x, base and factor are stored
respectively in places 1, 2 and 3 in the name list, lexical analysis yields
the following symbols with their semantic values:
ident/1 becomessy number/3 plussy ident/2 timessy ident/3 semicolonsy
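The scanning step of Example 1.4 can be sketched in Python as follows. This is a minimal sketch under our own assumptions: the token pattern, the helper names `scan` and `show`, and the use of a dictionary as the name list are illustrative, and comments, keywords and strings are not handled. It shows the arithmetization described above: names become places in the name list, literals become machine integers, and the remaining symbols carry no semantic value.

```python
import re
from typing import Dict, List, Optional, Tuple

Symbol = Tuple[str, Optional[int]]  # (symbol name, semantic value or None)

# Hypothetical token pattern for the statement of Example 1.4 (leading blanks skipped).
TOKEN = re.compile(r"\s*(?:(?P<ident>[A-Za-z][A-Za-z0-9]*)|(?P<number>\d+)|(?P<op>:=|[+*;]))")
OPERATORS = {":=": "becomessy", "+": "plussy", "*": "timessy", ";": "semicolonsy"}


def scan(source: str) -> Tuple[List[Symbol], List[str]]:
    """Return the arithmetized symbols and the name list built along the way."""
    name_index: Dict[str, int] = {}  # name list: name -> place (1-based, as in the text)
    symbols: List[Symbol] = []
    source = source.rstrip()         # trailing blanks are meaningless characters
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise SyntaxError(f"illegal character at position {pos}")
        if m.group("ident"):
            name = m.group("ident")
            if name not in name_index:
                name_index[name] = len(name_index) + 1  # enter a new name
            symbols.append(("ident", name_index[name]))
        elif m.group("number"):
            symbols.append(("number", int(m.group("number"))))  # literal -> internal integer
        else:
            symbols.append((OPERATORS[m.group("op")], None))   # uniquely determined symbol
        pos = m.end()
    return symbols, list(name_index)  # dicts keep insertion order (Python 3.7+)


def show(symbols: List[Symbol]) -> str:
    """Render symbols in the notation of Example 1.4, e.g. 'ident/1'."""
    return " ".join(name if value is None else f"{name}/{value}" for name, value in symbols)


symbols, names = scan("x := 3 + base * factor;")
print(show(symbols))  # ident/1 becomessy number/3 plussy ident/2 timessy ident/3 semicolonsy
print(names)          # ['x', 'base', 'factor']
```

A name that occurs twice receives the same place in the name list, so later phases can compare names by comparing integers.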
Lexical analysis is the simplest part of the compiler. However, it does take up
a large portion of the compilation time (typically 20 to 40%), which means that
efficiency is especially important.
A lexical analyzer written in Cocol is described in Section 8.1. But lexical
analyzers are not discussed anywhere else in the book and the reader is referred to the literature, for example Gries [1971] or Bauer [1976].

Syntax analysis
Syntax analysis decomposes the source program, which now consists of symbols, into its grammatical parts and represents its structure as a tree (called a
syntax tree) or as something equivalent to a tree.

Fig. 1.6 Syntax tree (Assignment v1 := 3 + v2 * v3, with subtrees Variable and Expression)

1.5 Example Syntax tree
The source statement in Example 1.4 is an assignment. An assignment
consists of a variable, the assignment symbol, an expression and a closing semicolon. An expression consists of terms connected by addition
operators, and terms consist of factors connected by multiplication operators. This yields the syntax tree in Fig. 1.6.


Syntax analysis is much more difficult than lexical analysis. There are, however, methods for syntax analysis which are based on the grammar of the
source language. Knowledge of these methods makes syntax analysis a routine task.

Semantic analysis
Semantic analysis examines the properties of the source program which cannot
be represented grammatically, in particular:
1. the scope of names;
2. the correspondence between declarations and uses of names;
3. the type compatibility of operands in expressions and statements.

Semantic analysis and syntax analysis can be performed together, in which
case the two phases merge; or they can be performed separately, in which case
the syntax tree, the result of the syntax analysis, is augmented with semantic
information.
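The three tasks listed above can be illustrated with a scoped symbol table. The following Python sketch is ours, not the book's: the class names, the type names written as strings such as 'INTEGER', and the rule that both operands of an operator must have identical types are simplifying assumptions.

```python
from typing import Dict, List


class SemanticError(Exception):
    """Raised when a property of the source program that grammar cannot express is violated."""


class SymbolTable:
    """Hypothetical scoped symbol table: one dictionary per open scope, innermost last."""

    def __init__(self) -> None:
        self.scopes: List[Dict[str, str]] = [{}]  # start with the outermost scope

    def open_scope(self) -> None:
        self.scopes.append({})

    def close_scope(self) -> None:
        self.scopes.pop()

    def declare(self, name: str, type_name: str) -> None:
        if name in self.scopes[-1]:
            raise SemanticError(f"{name} declared twice in the same scope")
        self.scopes[-1][name] = type_name

    def lookup(self, name: str) -> str:
        # Scope rule: the innermost visible declaration of a name wins.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise SemanticError(f"{name} used but not declared")


def check_operands(table: SymbolTable, op: str, left: str, right: str) -> str:
    """Simplified type compatibility rule: both operands must have the same type."""
    left_type, right_type = table.lookup(left), table.lookup(right)
    if left_type != right_type:
        raise SemanticError(f"incompatible operands for {op}: {left_type} and {right_type}")
    return left_type


table = SymbolTable()
table.declare("base", "INTEGER")
table.declare("factor", "INTEGER")
print(check_operands(table, "*", "base", "factor"))  # INTEGER
```

If an inner scope redeclares factor as REAL, the same check fails until that scope is closed again; a use of an undeclared name fails in `lookup`. None of these errors can be detected by the grammar of Example 1.5 alone.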

1.3 Dynamic compiler structure

Dynamic, or time-dependent, compiler structure must be distinguished from
static, or logical, compiler structure. The individual logical divisions (lexical
analysis, syntax analysis, semantic analysis, optimization and code generation)
can be executed either sequentially or simultaneously, that is, interwoven in time. Each part of the compiler which reads the source program or
an intermediate program in its entirety is called a pass, and thus compilers are
classified as single-pass or multi-pass compilers.
Figure 1.7 shows both cases. For a single-pass compiler the syntax analyzer is the central, controlling program. It calls the lexical analyzer when it requires the next source symbol, and it calls the semantic analyzer when it wishes to pass on a syntactically correct construction. The semantic analyzer generates a section of intermediate code or the corresponding machine code (with or without optimization). For a multi-pass compiler each section is executed sequentially. The result of each section is an intermediate program which is written onto an external storage device and is read again by the next pass.
Single-pass compilers are generally much faster than multi-pass compilers because they avoid access to external storage devices for reading and writing intermediate programs. Multi-pass compilers, on the other hand, require less storage space because only one part of the compiler need ever reside in main storage at once, and they are logically simpler because the various parts are not intertwined. Some source languages cannot even be compiled by single-pass compilers because they contain grammatical constructs whose translation requires information which becomes available only from parts of the source program that are processed later. This is the case, for example, when a variable can be used before it is declared.
The advantages and disadvantages of single-pass and multi-pass compilers can be summarized as in Fig. 1.8.
Fig. 1.7 Single-pass and multi-pass compilers (components: lexical analyzer, syntax analyzer, semantic analyzer, optimization and code generation; in the multi-pass compiler the intermediate programs pass through external memory)

                          Single-pass   Multi-pass
                          compiler      compiler
Speed                         +             -
Memory                        -             +
Logical complexity            -             +
Universal applicability       -             +

Fig. 1.8 Properties of single-pass and multi-pass compilers
(+ = favorable, - = unfavorable)
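The difference in control structure shown in Fig. 1.7 can be sketched with a toy language whose programs are sums of integers such as `1 + 2 + 3`, an assumption chosen only to keep the code short. In `single_pass`, the syntax analyzer pulls symbols from a generator on demand and emits code at once; in `multi_pass`, each pass produces a complete list that the next pass reads, with Python lists standing in for the intermediate files on external storage. The function names and the stack-machine instructions `PUSH` and `ADD` are hypothetical.

```python
from typing import Iterator, List, Optional


def lexer(source: str) -> Iterator[str]:
    """Delivers one symbol per request (toy language: integers separated by '+')."""
    yield from source.split()


def expect_number(sym: Optional[str]) -> int:
    if sym is None or not sym.isdigit():
        raise SyntaxError(f"number expected, found {sym!r}")
    return int(sym)


def single_pass(source: str) -> List[str]:
    """The syntax analyzer is in control: it pulls symbols and emits code immediately."""
    symbols = lexer(source)
    code = [f"PUSH {expect_number(next(symbols, None))}"]
    for op in symbols:
        if op != "+":
            raise SyntaxError(f"'+' expected, found {op!r}")
        code.append(f"PUSH {expect_number(next(symbols, None))}")
        code.append("ADD")
    return code


def multi_pass(source: str) -> List[str]:
    """Each pass reads the complete output of the previous pass."""
    symbols = list(lexer(source))                          # pass 1: lexical analysis
    postfix = [str(expect_number(symbols[0] if symbols else None))]
    for i in range(1, len(symbols), 2):                    # pass 2: syntax analysis -> postfix
        if symbols[i] != "+":
            raise SyntaxError(f"'+' expected, found {symbols[i]!r}")
        postfix.append(str(expect_number(symbols[i + 1] if i + 1 < len(symbols) else None)))
        postfix.append("+")
    return ["ADD" if item == "+" else f"PUSH {item}" for item in postfix]  # pass 3: code generation


print(single_pass("1 + 2 + 3"))  # ['PUSH 1', 'PUSH 2', 'ADD', 'PUSH 3', 'ADD']
print(multi_pass("1 + 2 + 3"))   # ['PUSH 1', 'PUSH 2', 'ADD', 'PUSH 3', 'ADD']
```

Both variants produce the same code; they differ only in when each phase runs and how much intermediate data must be held at once, which is the trade-off summarized in Fig. 1.8.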

1.4 The structure of the book

This book consists of nine chapters and six appendices. The first three chapters cover the principles of compiler construction as far as they are required for
an understanding of Coco; occasionally rather more than the minimum is presented in order to provide a well-rounded picture. The fourth chapter provides
a glimpse into other compiler compilers, and the remaining chapters present
Coco itself, its compiler description language, its implementation and its applications. The outline is therefore as follows:
Principles of compiler construction
1. Introduction and survey
2. Syntax
3. Semantics
Various compiler compilers
4. Various compiler compilers
The compiler compiler Coco
5. The compiler description language Cocol
6. The compiler compiler Coco
7. The implementation
8. Applications
9. Experiences with Coco

The second chapter starts with those concepts from formal language theory
which are necessary for the remainder of the book. Then table-driven LL(1)
syntax analysis is covered; this determines the fundamental structure of this
compiler compiler, and at the same time is a simple and efficient method for
developing the syntactic section of compilers. Most importantly, this chapter
contains a method for automatic error recovery which is independent of the
language to be analyzed.
In the third chapter, the method this compiler compiler uses to describe the actual translation process, attributed grammars, is presented. The special case of L-attributed grammars is used here, and the translation process is described by attributes, context conditions and semantic actions.
The fourth chapter gives a survey of a few compiler generators described
in the literature, and thus also surveys the state of the art.
The fifth chapter is a definition of the compiler description language
Cocol.


The sixth chapter describes Coco from the viewpoint of the user: its characteristics, how to use it, and what the compilers it generates look like. Along the way it is shown that Coco is also suitable for implementing multi-pass compilers. This chapter, together with the language description of Chapter 5, forms the 'external' description of Coco.
The seventh chapter, the longest, contains the details of the implementation of Coco. This chapter is also intended as a study in program documentation.

The eighth chapter presents three major examples of the use of Coco. The first is a complete description of a lexical analyzer in Cocol. The second illustrates Coco as a software engineering tool, and the method of attributed grammars as a software engineering method which encompasses the Jackson method as a special case. The third presents the compiler sections generated for a concrete input grammar.
The ninth chapter concludes the book with the authors' experiences with Coco.
The appendices define the algorithm description language Adele used here, describe Modula-2 insofar as it differs from or goes beyond Pascal, and present a complete listing of Coco in Modula-2 as well as a description of Coco in Cocol, that is, a self-description of Coco.
Systematic readers should read the book chapter by chapter. Readers who wish to begin with lexical analysis should consult Section 8.1 while reading Chapter 2. Readers who wish to know about Coco only (or first) from the user's point of view can start immediately with Chapter 5, followed by Chapters 6 and 8, and perhaps Chapter 4.
Finally, readers who are already familiar with LL(1) grammars and are primarily interested in the implementation of Coco can first acquaint themselves with Cocol in Chapter 5 and then concentrate on Chapters 6 and 7, although they will occasionally have to refer back to Chapters 2 and 3.
The following chapter sequences are therefore recommended (chapters which extend the material are in italics):
Novices and all-embracing readers: 2-9
Primarily interested in applying Coco: 5, 6, 8, 4
Primarily interested in comparing Coco: A088
Primarily interested in the implementation of Coco: 5, 6, 7, 8.3

Some remarks have been repeated so that the chapters do not become too
interdependent. We hope the all-embracing reader will forgive us for this.
In general the presentation is organized according to the principle of stepwise refinement. This is true of the individual chapters as well as of the book as a whole. Thus Chapters 2 and 3 are basically refinements of Section 1.2, Chapters 5 and 7 are refinements of Chapters 2 and 3, and Appendix F, containing the text of Coco in Modula-2, is a refinement of Chapter 7.
Algorithms are presented in our algorithm description language Adele. It is defined in Appendix A, but should be understandable without a definition, as it relies strongly on Modula-2 and Pascal. The authors use Adele constantly in their daily work and regard it as a method of algorithm description which is adequate in most cases.
Actual Modula-2 programs occur only in the appendices, although there are also Modula-2 fragments in Chapters 5 and 7. The book is therefore accessible to readers who are not familiar with Modula-2. Nevertheless, Modula-2 is of major importance in this book because of its support for modular programming, and especially for data encapsulation. One of the book's important aims is to document a large Modula-2 program and to demonstrate in the process how well Modula-2 is suited to software engineering projects.
Definitions, algorithms and examples are numbered and indented. A list of all of them, ordered by number, follows the table of contents to facilitate fast searching.

2 Syntax

In this chapter we deal with all syntax-related questions as far as they concern compilers that use LL(1) syntax analysis. First, we summarize the terminology and some important results of formal language theory. Next, we look at LL(1) grammars and their syntax analysis. Since the flexibility and efficiency of syntax analysis depend to a large degree on how the grammar is represented in memory, we describe the tree-like data structure used in Coco, which is called a top-down graph. We also describe an optimized version of the top-down graph, called G-code, which is especially suited for interpretation. At the end of the chapter we describe the G-code syntax analyzer and a method for automatic error handling.
Except for the G-code and its interpretation, this chapter is not Coco-specific. Thus, it can be read as a general treatment of syntax issues in compiler design. Bottom-up analysis and LR(k) grammars have been left out, since they constitute a large, self-contained topic that does not apply to Coco. Interested readers are referred to Knuth [1965], Aho and Johnson [1974], Waite and Goos [1984], and Fischer and LeBlanc [1988].

2.1 Basic concepts from formal language theory

We assume that the reader is familiar with the elements of formal languages,
and we summarize here only the terms and definitions that we will use later
on. We primarily use the terminology from the books of Gries [1971] and
Aho and Ullman [1972].

Symbols and strings

Programs consist syntactically of sequences, or strings, of symbols which belong to an alphabet or vocabulary. If a, b, c are the symbols that constitute the alphabet V, then we can write:
$V = \{a, b, c\}$

Symbols can be concatenated to form strings. For some strings and sets of strings there are commonly used abbreviations:
2.1 Definition Abbreviations for strings and sets of strings
$a^n$ denotes the string consisting of n identical symbols a, e.g. $a^3 = aaa$.
$\varepsilon$ denotes the empty string, i.e. a string of zero symbols.
$a^+$ denotes the set $\{a^n : n \ge 1\}$, e.g. $a^+ = \{a, aa, aaa, aaaa, \dots\}$.
$a^*$ denotes the set $\{a^n : n \ge 0\}$, e.g. $a^* = \{\varepsilon, a, aa, aaa, \dots\}$. It is obvious that $a^* = a^+ \cup \{\varepsilon\}$.
$V^+$ denotes the set of all non-empty strings which can be formed from the symbols contained in V. For example, if $V = \{a, b, c\}$ then
$V^+ = \{a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, \dots\}$
$V^+$ is called the free semi-group over the alphabet V.
$V^*$ denotes the set of all strings, including the empty string, that can be formed from the symbols of V. For example, if $V = \{a, b, c\}$ then
$V^* = \{\varepsilon, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, \dots\}$
$V^*$ is called the free monoid over the alphabet V. It is obvious that $V^* = V^+ \cup \{\varepsilon\}$.

The set V is always finite, whereas the sets $a^+$, $a^*$, $V^+$, $V^*$ are always infinite.
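Since $V^+$ and $V^*$ are infinite, a program can only list them up to some maximum length. The following Python sketch (the function names `strings_of_length`, `v_star` and `v_plus` are illustrative, not taken from the book, and symbols are assumed to be single characters) enumerates them shortest-first, in the same order as the examples above:

```python
from itertools import product

def strings_of_length(V, n):
    """All strings of exactly n symbols over the alphabet V."""
    return ["".join(p) for p in product(sorted(V), repeat=n)]

def v_star(V, max_len):
    """Finite prefix of V*: all strings of length 0..max_len, shortest first."""
    result = []
    for n in range(max_len + 1):
        result.extend(strings_of_length(V, n))
    return result

def v_plus(V, max_len):
    """Finite prefix of V+: the same strings without the empty string."""
    return [s for s in v_star(V, max_len) if s != ""]

V = {"a", "b", "c"}
print(v_plus(V, 2))       # ['a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
print(len(v_star(V, 3)))  # 40 = 1 + 3 + 9 + 27
```

The count illustrates why the sets are infinite: each additional length multiplies the number of new strings by the size of V.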

Grammar and language

In Section 1.2, we showed that the grammatical structure of a statement, a program, or generally of a 'sentence' is a tree, the syntax tree. In the syntax tree, there are two types of symbols:
1. Terminals are the symbols of the sentence itself. They are the leaves of the tree and cannot be decomposed further.
2. Nonterminals are all other symbols.


In addition, each tree contains a distinguished nonterminal, the sentence symbol or root, from which the entire tree originates. The valid structures of syntax trees, and hence the sentences of a formal language, are described by a grammar.
A context-free grammar, or simply grammar (since we only use context-free grammars), is a system of rules for producing strings over an alphabet V.
2.2 Definition Grammar
A grammar G is a quadruple $G = (V_N, V_T, R, S)$ with the following components:
$V_N$: alphabet of nonterminals,
$V_T$: alphabet of terminals,
$R$: set of productions, also called syntax rules or derivation rules,
$S$: sentence symbol, a special symbol from $V_N$, the root of the syntax tree.

By $V = V_N \cup V_T$ we denote the union of the terminal and nonterminal alphabets.
A production is written as
$X \to \alpha$ where $X \in V_N$ and $\alpha \in V^*$
(read: 'X is defined as $\alpha$' or 'X can be replaced by $\alpha$'), which means that the nonterminal X can be replaced by the string $\alpha$ in each string that contains X.
Several productions may have the same left-hand side, such as:
$X \to \alpha_1$
$X \to \alpha_2$
$X \to \alpha_3$
They denote alternatives and can be grouped by use of the symbol '|':
$X \to \alpha_1 \mid \alpha_2 \mid \alpha_3$
(read: 'X is defined as $\alpha_1$, or $\alpha_2$, or $\alpha_3$').
The productions describe the replacement of nonterminals by strings. We
start from the sentence symbol S, and replace it by a string according to the
productions of the grammar. Then we repeatedly replace nonterminals in the
string by other strings until we reach a string that contains only terminals. S
itself and all strings that result from S by the application of the productions
are called sentential forms. The sentential forms that consist of terminals only
are called sentences.


We denote replacement by the replacement or derivation symbol $\Rightarrow$. If $\alpha$ and $\beta$ are two sentential forms and $\beta$ may be derived from $\alpha$ by the application of a production, we write:
$\alpha \Rightarrow \beta$
(read: '$\alpha$ produces $\beta$' or '$\beta$ is derived from $\alpha$').
These terms are formalized by Definition 2.3 and illustrated by Example 2.4.
2.3 Definition Derivation, sentential form, sentence, language
We say that a string $\alpha$ directly produces a string $\beta$, written $\alpha \Rightarrow \beta$, if there exist strings $\omega_1$ and $\omega_2$ such that we can write $\alpha = \omega_1 A \omega_2$, $\beta = \omega_1 \gamma \omega_2$, and the production $A \to \gamma$ belongs to the grammar. We then call $\beta$ a direct derivation of $\alpha$. We describe a sequence of several derivations by the symbols $\Rightarrow^+$ and $\Rightarrow^*$. A string $\alpha$ produces a string $\beta$, written as
$\alpha \Rightarrow^+ \beta$
if there exists a sequence of direct derivations
$\alpha = \omega_0 \Rightarrow \omega_1 \Rightarrow \omega_2 \Rightarrow \dots \Rightarrow \omega_n = \beta$ where $n \ge 1$.
Such a sequence is called a derivation of length n. For the case of $\alpha \Rightarrow^+ \beta$ or $\alpha = \beta$, we write
$\alpha \Rightarrow^* \beta$
(read: '$\alpha$ produces or is equal to $\beta$'). If G is a grammar with sentence symbol S, then a string $\alpha$ is called a sentential form if
$S \Rightarrow^* \alpha$
A sentence is a sentential form that consists only of terminals, and a language L(G) is the set of all sentences that can be derived from the sentence symbol:
$L(G) = \{\alpha : S \Rightarrow^+ \alpha \text{ and } \alpha \in V_T^*\}$

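2.4 Example Derivation of all sentential forms of a language
Consider the grammar $G = (\{S, A, B\}, \{a, b, ;\}, R, S)$ with the nonterminals S, A, B, the terminals a, b, ;, and the set R of productions:
$S \to A;$
$A \to aB \mid BBb$
$B \to b \mid ab$
From this, the following derivations of sentential forms can be produced:
$S \Rightarrow A; \Rightarrow aB; \Rightarrow ab;$
$S \Rightarrow A; \Rightarrow aB; \Rightarrow aab;$
$S \Rightarrow A; \Rightarrow BBb; \Rightarrow bBb; \Rightarrow bbb;$
$S \Rightarrow A; \Rightarrow BBb; \Rightarrow bBb; \Rightarrow babb;$
$S \Rightarrow A; \Rightarrow BBb; \Rightarrow abBb; \Rightarrow abbb;$
$S \Rightarrow A; \Rightarrow BBb; \Rightarrow abBb; \Rightarrow ababb;$
The result is $L(G) = \{ab;,\ aab;,\ bbb;,\ babb;,\ abbb;,\ ababb;\}$. Hence, the language L(G) consists of 6 sentences.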
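The derivations of Example 2.4 can be reproduced mechanically. The sketch below is a hypothetical Python illustration, not part of Coco: each alternative is a tuple of symbols, every step replaces the leftmost nonterminal (the left-canonical derivations of Definition 2.5), and every sentential form without a nonterminal is collected as a sentence. It only terminates for non-recursive grammars such as this one; `max_steps` is an assumed safety bound.

```python
# Grammar of Example 2.4: the dictionary keys are the nonterminals,
# every other symbol is a terminal; each alternative is a tuple of symbols.
GRAMMAR = {
    "S": [("A", ";")],
    "A": [("a", "B"), ("B", "B", "b")],
    "B": [("b",), ("a", "b")],
}

def leftmost_step(form, grammar):
    """All sentential forms reachable from `form` by replacing its
    leftmost nonterminal once; [] if `form` is already a sentence."""
    for i, sym in enumerate(form):
        if sym in grammar:
            return [form[:i] + alt + form[i + 1:] for alt in grammar[sym]]
    return []

def language(grammar, start, max_steps=20):
    """Collect L(G) by applying leftmost derivations exhaustively
    (breadth-first); only meaningful for finite languages."""
    sentences = set()
    frontier = [(start,)]
    for _ in range(max_steps):
        next_frontier = []
        for form in frontier:
            successors = leftmost_step(form, grammar)
            if successors:
                next_frontier.extend(successors)
            else:
                sentences.add("".join(form))
        if not next_frontier:
            break
        frontier = next_frontier
    return sentences

print(sorted(language(GRAMMAR, "S")))
```

The six strings printed are exactly the sentences listed above.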

A syntax tree can be assigned to each sentence. Figure 2.1 shows the syntax
tree of abbb; in two forms.
Fig. 2.1 Two forms of syntax tree for abbb;

In the top-down syntax analysis discussed later on, we will always use derivations in which the leftmost nonterminal is replaced. This kind of derivation
is called left-canonical:
2.5 Definition Left-canonical derivation
A direct derivation $\omega_1 A \omega_2 \Rightarrow \omega_1 \alpha \omega_2$ is called left-canonical, and written as
$\omega_1 A \omega_2 \overset{L}{\Rightarrow} \omega_1 \alpha \omega_2$
if $\omega_1 \in V_T^*$, that is, if A is the leftmost nonterminal. A derivation is called left-canonical if all its direct derivations are left-canonical.

Sometimes it is useful to have a name for the string that is substituted for
a nonterminal during a derivation. This string is called a ‘phrase’.


2.6 Definition Phrase
When $\omega_1 \alpha \omega_2$ is a sentential form such that
$S \Rightarrow^* \omega_1 A \omega_2 \Rightarrow^* \omega_1 \alpha \omega_2$
then $\alpha$ is called a phrase, more specifically an A-phrase.

According to this definition, each sentential form is an S-phrase.
Because of their importance in bottom-up syntax analysis, which is not
covered in this book, we shall also define the terms simple phrase and
handle.
2.7 Definition Simple phrase, handle
If $\alpha$ is an A-phrase and a direct derivation of A, that is, if
$S \Rightarrow^* \omega_1 A \omega_2 \Rightarrow \omega_1 \alpha \omega_2$
holds, then $\alpha$ is also called a simple A-phrase. The leftmost simple phrase in a sentential form is called the handle of the sentential form.
2.8 Example Phrase, simple phrase, handle
Consider Example 2.4 and the derivation sequence
$S \Rightarrow A; \Rightarrow B_1 B_2 b_3; \Rightarrow B_1 b_2 b_3; \Rightarrow a b_1 b_2 b_3;$
where the different Bs and bs have been distinguished by an index. In the sentential form $a b_1 b_2 b_3;$
$a b_1$ is a simple B-phrase and the handle,
$b_2$ is a simple B-phrase,
$a b_1 b_2 b_3$ is an A-phrase,
$a b_1 b_2 b_3;$ is an S-phrase.
In the sentential form $B_1 b_2 b_3;$
$b_2$ is a simple B-phrase and the handle,
$B_1 b_2 b_3$ is an A-phrase,
$B_1 b_2 b_3;$ is an S-phrase.
In the sentential form $B_1 B_2 b_3;$
$B_1 B_2 b_3$ is a simple A-phrase and the handle,
$B_1 B_2 b_3;$ is an S-phrase.
In the sentential form $A;$
$A;$ is a simple S-phrase and the handle.


Recursive productions produce languages with an infinite number of sentences. The production $A \to a \mid Ab$ produces the set of sentences $ab^*$, the production $A \to a \mid bA$ produces the set of sentences $b^*a$, and the production $A \to a \mid (A)$ produces the set of sentences $\{(^n a)^n : n \ge 0\}$.
Recursion can also appear indirectly, which means it can span several productions, as in the production pair
$A \to x \mid By$
$B \to z \mid Au$
The following definition is a consequence of this:

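2.9 Definition Recursive grammar
A grammar is called recursive if it permits derivations of the form $A \Rightarrow^+ \omega_1 A \omega_2$ with $A \in V_N$, $\omega_1 \in V^*$, $\omega_2 \in V^*$. More specifically, it is called
left-recursive if $A \Rightarrow^+ A \omega$,
right-recursive if $A \Rightarrow^+ \omega A$,
central-recursive or self-embedding if $A \Rightarrow^+ \omega_1 A \omega_2$ with $\omega_1, \omega_2 \in V^+$.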

2.10 Example Arithmetic expressions
The grammar of arithmetic expressions with the sentence symbol E and the terminals v for variables and c for constants:
$E \to T \mid E + T \mid E - T$
$T \to F \mid T * F \mid T / F$
$F \to v \mid c \mid (E)$
is left-recursive in E and T, and central-recursive in E, T, and F.
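Left recursion, direct or indirect, can be detected by following the leftmost symbols of the alternatives. The following Python sketch uses hypothetical function names and assumes an $\varepsilon$-free grammar (with empty alternatives, a nullable leading nonterminal would expose further leftmost symbols, which this sketch does not handle); it checks the grammar of Example 2.10:

```python
EXPR = {
    "E": [["T"], ["E", "+", "T"], ["E", "-", "T"]],
    "T": [["F"], ["T", "*", "F"], ["T", "/", "F"]],
    "F": [["v"], ["c"], ["(", "E", ")"]],
}

def left_corners(grammar):
    """For each nonterminal X, the nonterminals Y with X =>+ Y ...
    (Y in leftmost position). Assumes no empty alternatives."""
    direct = {x: {alt[0] for alt in alts if alt and alt[0] in grammar}
              for x, alts in grammar.items()}
    closure = {x: set(ys) for x, ys in direct.items()}
    changed = True
    while changed:  # transitive closure by fixed-point iteration
        changed = False
        for x in grammar:
            for y in list(closure[x]):
                new = direct[y] - closure[x]
                if new:
                    closure[x] |= new
                    changed = True
    return closure

def left_recursive(grammar):
    """Nonterminals X with X =>+ X ... (direct or indirect left recursion)."""
    corners = left_corners(grammar)
    return {x for x in grammar if x in corners[x]}

print(sorted(left_recursive(EXPR)))  # ['E', 'T']
```

F is not reported because its only recursive alternative starts with the terminal '('.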

The extended Backus-Naur form (EBNF)
Computer science uses various notations for grammar productions. The notation used so far has the following characteristics:
1. terminals are lower case,
2. nonterminals are upper case,
3. the replacement symbol is $\to$,
4. alternatives are separated by |.

Indefinite repetition, which is a frequently occurring language element, must be described by recursive productions, especially left-recursive ones. This often appears unnatural, and left recursion is also unsuited for top-down syntax analysis. Several grammatical notations have therefore evolved that remove these and other deficiencies. Among these, the notation introduced by Wirth [1982] for the description of Modula-2 is especially notable because of its simplicity and clarity. Its characteristics are:
1. Terminals that represent themselves (literals) are in quotes.
2. Other terminals and nonterminals have names that imply their meaning (this is customary but not mandatory).
3. The replacement symbol is =.
4. Alternatives are separated by |.
5. Productions are ended explicitly by a period.
6. Option symbol: [A] means $A \mid \varepsilon$.
7. Repetition symbol: {A} means $\varepsilon \mid A \mid AA \mid AAA \mid \dots$
8. Parentheses are used for grouping.

The grammar of the arithmetic expressions is as follows:
expression = term {("+" | "-") term}.
term = factor {("*" | "/") factor}.
factor = v | c | "(" expression ")".
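Each EBNF construct maps directly onto a control structure: alternatives become conditionals and a repetition {...} becomes a loop. The following hand-written recursive-descent recognizer for the expression grammar is only an illustration in Python; it is not the table-driven analysis that Coco uses, and the token names and the `eof` sentinel are assumptions.

```python
class Parser:
    """Recursive-descent recognizer: one method per EBNF production,
    each {...} repetition becomes a while loop."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["eof"]  # sentinel marking end of input
        self.pos = 0

    def sym(self):
        return self.tokens[self.pos]

    def expect(self, t):
        if self.sym() != t:
            raise SyntaxError(f"expected {t!r}, found {self.sym()!r} at {self.pos}")
        self.pos += 1

    def expression(self):  # expression = term {("+" | "-") term}.
        self.term()
        while self.sym() in ("+", "-"):
            self.pos += 1
            self.term()

    def term(self):  # term = factor {("*" | "/") factor}.
        self.factor()
        while self.sym() in ("*", "/"):
            self.pos += 1
            self.factor()

    def factor(self):  # factor = v | c | "(" expression ")".
        if self.sym() in ("v", "c"):
            self.pos += 1
        elif self.sym() == "(":
            self.pos += 1
            self.expression()
            self.expect(")")
        else:
            raise SyntaxError(f"unexpected {self.sym()!r} at {self.pos}")

def recognize(tokens):
    """True if the token list is a sentence of the expression grammar."""
    p = Parser(tokens)
    try:
        p.expression()
        p.expect("eof")
        return True
    except SyntaxError:
        return False

print(recognize(["v", "+", "v", "*", "c"]))  # True
print(recognize(["(", "v", "+"]))            # False
```

Because the repetitions replace left recursion, no procedure calls itself before consuming a symbol, which is what makes this style of top-down analysis possible.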

The form of the EBNF grammar itself can also be described by an EBNF grammar:
EBNFGrammar = production {production}.
production = symbol "=" expr ".".
expr = term {"|" term}.
term = factor {factor}.
factor = ident | string | "(" expr ")" | "[" expr "]" | "{" expr "}".
ident is the terminal for names, string is the terminal for a character string
enclosed in quotes.
In this book, we will primarily use Wirth's EBNF notation. However, where structural simplicity of the grammar is important, we will still use the older notation of formal language theory.

Reduced grammars

In the grammar of a programming language, each nonterminal and each alternative should contribute to the generation of sentences. If this is the case, the grammar is called reduced. In the development of a grammar, unnecessary nonterminals and alternatives may creep in. Therefore, each newly developed grammar should be checked to determine whether it is reduced. If it is not, the unnecessary symbols and productions should be removed.
In order to contribute to the generation of sentences, each nonterminal must meet the following two conditions: it must be 'terminating', that is, it must produce a terminal string, and it must be 'derivable', that is, it must appear in some sentential form.

2.11 Definition Terminating symbol, derivable symbol
A nonterminal A is called terminating if it produces a terminal string, that is, if
$A \Rightarrow^* \alpha$ with $\alpha \in V_T^*$.
A nonterminal A is called derivable if it appears in a sentential form, that is, if
$S \Rightarrow^* \omega_1 A \omega_2$.

A nonterminal that is not derivable or not terminating contributes nothing to the generation of sentences and is therefore useless.
2.12 Definition Useless symbol
A nonterminal A is called useless if there is no derivation
$S \Rightarrow^* \omega_1 A \omega_2 \Rightarrow^* \omega_1 \alpha \omega_2$
where $\omega_1, \omega_2, \alpha \in V_T^*$.

2.13 Definition Reduced grammar
A grammar that contains no useless nonterminals is called reduced.
Algorithms for the detection of all useless symbols are simple (see Sections
7.5.2 and 7.5.4, or Hopcroft and Ullman [1979]). If one wants to delete
them, the order is important. First, the nonterminating symbols must be
found and all alternatives in which they appear must be deleted from the
grammar. Then, the nonderivable symbols and the alternatives in which they
appear must be found in the new grammar and deleted. Automatic deletion is
possible but not recommended since useless symbols often indicate errors in
the grammar.
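The two-step procedure can be sketched in Python as follows. This is an illustrative fixed-point implementation, not the algorithm of Sections 7.5.2 and 7.5.4; the grammar `G` is a made-up example with one nonterminating symbol (B) and one nonderivable symbol (C). Following the advice above, a real tool would rather report such symbols than delete them silently.

```python
def terminating(grammar):
    """Nonterminals A with A =>* x, x a terminal string."""
    term = set()
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            if a not in term and any(
                all(s not in grammar or s in term for s in alt) for alt in alts
            ):
                term.add(a)
                changed = True
    return term

def derivable(grammar, start):
    """Nonterminals that appear in some sentential form."""
    seen, stack = {start}, [start]
    while stack:
        for alt in grammar[stack.pop()]:
            for s in alt:
                if s in grammar and s not in seen:
                    seen.add(s)
                    stack.append(s)
    return seen

def reduce_grammar(grammar, start):
    """First delete nonterminating symbols and the alternatives that use
    them, then delete the nonderivable symbols of the remaining grammar."""
    term = terminating(grammar)
    if start not in term:
        return {}  # the language is empty
    g1 = {a: [alt for alt in alts if all(s not in grammar or s in term for s in alt)]
          for a, alts in grammar.items() if a in term}
    reach = derivable(g1, start)
    return {a: alts for a, alts in g1.items() if a in reach}

G = {
    "S": [["a", "A"], ["b", "B"]],
    "A": [["x"]],
    "B": [["y", "B"]],  # never produces a terminal string
    "C": [["z"]],       # cannot be reached from S
}
print(reduce_grammar(G, "S"))  # {'S': [['a', 'A']], 'A': [['x']]}
```

The order matters here as well: B is derivable in the original grammar, and only becomes unreachable once the alternative bB has been deleted.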
Even after removing useless symbols, the grammar may still contain useless alternatives, which permit derivations of the form $A \Rightarrow^+ A$. Such a derivation is called a circular derivation, and the grammar is called circular or cyclic. Section 7.5.3 contains an algorithm for checking a grammar for circularity. The book by Hopcroft and Ullman [1979] contains an algorithm for deleting productions whose right-hand side consists of a single nonterminal, and thus for removing cycles.

In the following, we will cover only non-circular reduced grammars.


Grammatical levels
Programming languages contain constructs at various hierarchical levels. At the very
top are programs, which are composed of modules, procedures, declarations
and statements. Declarations and statements in turn are composed of expressions, keywords, names and numbers. Names and numbers are composed
of characters. It is somewhat arbitrary which of these constructs are defined
as terminals. If one only wants to show the nesting of procedures, then declarations and expressions can be regarded as terminals. If one wants to describe
the structure of expressions, then keywords, names, numbers, and operators
can be regarded as terminals. Only if one wants to descend further must
individual characters be seen as terminals.
In this way, the syntax of a programming language need not be completely described by one grammar, but may be partitioned into several grammatical
levels. The terminals of the higher level are the nonterminals of the lower
level.

In compiler design, usually two levels are used: the syntactical and the lexical level. The syntactical level is the higher of the two; its sentence symbol is the program, and its terminals are keywords, names, numbers, operators, etc. At the lexical level below it, keywords, names, numbers and special symbols are the nonterminals, and the terminals are the individual characters of the source text, insofar as they are meaningful for the grammar (comments, line ends and meaningless blanks are not part of the grammar). Figure 2.2 shows this relationship.

Fig. 2.2 Syntactic and lexical grammatical levels (syntactic level: program, procedure, statement, expression, down to name, number, keyword; lexical level: name, number, keyword, down to individual characters)

In this book, we consider mainly the syntactical level. This results in a difficulty with the notation of terminals. From the viewpoint of the syntactical level, an expression built from two names v, a number c, and the operators '+' and '*' has the form
$v + v * c$

While the terminals '+' and '*' are simultaneously members of the syntactical
and the lexical level, the terminal v denotes all names, and the terminal c
denotes all numbers. In order to emphasize this fact, we call terminals of the
syntactical level that represent an entire class of symbols of the lexical level, a
terminal class. Thus, in the grammar of arithmetic expressions, v and c are
terminal classes, and +, -, *, /, (, ) are individual terminals.

It is to some extent arbitrary which terminals of the lexical level are also
considered as terminals of the syntactical level, and which are combined to
terminal classes. For instance, the operators *, /, and MOD from the lexical
level can be considered at the syntactical level as individual terminals or can be
combined at the lexical level into the terminal class mulop by the production
mulop = "*" | "/" | "MOD".

2.2 LL(1) grammars and syntax analysis

A grammar for a language can be used in two different ways: as a generative
grammar for the generation of sentences of the language, and as an analyzing
grammar for the decision whether a given string is a sentence of the language.
The generation of sentences is a trivial, straightforward, combinatorial
problem, and of no interest in the practical areas of computer science. However, the aspect of the generative grammar is important in theoretical computer
science and mathematics. In these fields grammars are classified according to
the expressive power and the structural characteristics of the languages they
generate.
The analysis, more precisely the recognition of sentences is, from a
mathematical point of view, also a trivial problem. All sentences of the grammar may simply be generated in ascending order by their length, and it is then
easily determined whether the specified string is among the sentences (search by exhaustion). In reality, this is not feasible, since the number of sentences generally grows exponentially with their length. Analysis methods are needed that make use of all the information in the grammar and analyze the given string with a minimum of time and memory. These
methods can be separated into two main categories: top-down methods start at
the top with the sentence symbol and move downward by repeated derivations
trying to find a sentence which matches the given terminal string; bottom-up methods start at the given terminal string and move upward by repeated reductions of phrases until the sentence symbol is reached. In addition to these
two main approaches, there are analysis methods that mix the top-down and
bottom-up approach.
In this book, we will cover only top-down analysis.
In top-down analysis, we start from the sentence symbol and repeatedly
generate new sentential forms by left-canonical derivations, with the goal of
deriving a sentence matching the given string. If this is successful, the string
has been parsed. If it is not successful, and we have exhausted all possibilities
for the derivation of sentences that match the string, then it is clear that the
symbol string is not a sentence of the grammar.
The only difficulty with this approach is the selection of the correct alternative. Generally, there is not enough information available at the time when
the selection between several alternatives must be made to be reasonably sure
of choosing the correct one. Therefore, usually the alternatives must be tried
one after the other until the correct one is found. The alternatives that have
been tried unsuccessfully are dead ends from which one has to return by
backtracking. Fortunately, programming languages are structured in such a
way that the proper alternative can be determined with certainty by considering
only a part of the input string. These grammars are called deterministic. In
compiler construction, only deterministic grammars are used, and so we shall
cover only the top-down analysis of deterministic grammars.
Deterministic top-down parsing

The concept of deterministic top-down parsing consists in selecting the proper
alternative by looking at the start symbols of the string to be analyzed. In this
way parsing proceeds from left to right. Consider, for instance, the grammar
$S \to aA \mid bB$
$A \to x \mid aB$
$B \to y \mid bA$

and the input string $\sigma = bbay$. The grammar has the property that all of its alternatives start with terminals, and that the alternatives of each production start with different terminals. This property allows the correct alternative to be determined without running into dead ends by consulting the string $\sigma$. Assuming that the string is read from left to right, parsing proceeds as follows:
1. In the beginning, when a choice has to be made between $S \to aA$ and $S \to bB$, the first symbol of $\sigma$ is read and b is found; therefore $S \to bB$ must be the correct alternative, since $S \to aA$ can never lead to a sentence starting with b.
2. If bB is derived further, one has the choice of replacing B with y or with bA. The next symbol read is b, and so bA must be the correct alternative.
3. Continuing this procedure, the following derivations are generated:
$S \Rightarrow bB \Rightarrow bbA \Rightarrow bbaB \Rightarrow bbay$
resulting in the recognition of $\sigma$ as a sentence.

From the above derivation, the syntax tree of Fig. 2.3 follows.

Fig. 2.3 Syntax tree

This is the essence of deterministic top-down parsing: Starting with the sentence symbol, a sequence of left-canonical derivations is built, selecting the
correct alternatives by the inspection of the string to be parsed. The string is
read from left to right.
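The procedure can be written as a short program. This Python sketch (hypothetical names; it is not Coco's G-code interpreter) keeps the current sentential form as a string of one-character symbols, always expands the leftmost nonterminal with the alternative whose first terminal equals the next input symbol, and returns the left-canonical derivation, or None on a syntax error. It relies on the property noted above that every alternative starts with a terminal.

```python
GRAMMAR = {
    "S": ["aA", "bB"],
    "A": ["x", "aB"],
    "B": ["y", "bA"],
}

def parse(sentence, grammar, start="S"):
    """Deterministic top-down parse without backtracking."""
    form = start   # current sentential form
    steps = [form]
    pos = 0        # number of input symbols matched so far
    while pos < len(form):
        sym = form[pos]
        if sym in grammar:                      # leftmost nonterminal
            if pos >= len(sentence):
                return None
            alt = next((a for a in grammar[sym] if a[0] == sentence[pos]), None)
            if alt is None:                     # no alternative fits
                return None
            form = form[:pos] + alt + form[pos + 1:]
            steps.append(form)
        elif pos < len(sentence) and sym == sentence[pos]:
            pos += 1                            # terminal matches input
        else:
            return None
    return steps if pos == len(sentence) else None

print(" => ".join(parse("bbay", GRAMMAR)))  # S => bB => bbA => bbaB => bbay
```

Each input symbol is inspected once before it is consumed, so no dead ends and no backtracking can occur for this grammar.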
More than one symbol of the input string must be considered when several alternatives of a production start with the same symbol. This lookahead is
a characteristic of the LL(k) grammar:

2.14 Definition LL(k) grammar
A grammar is called LL(k) (deterministically recognizable from left to right with left-canonical derivations and a lookahead of k symbols) if its
sentences can be parsed by a top-down analysis from left to right in such
a way that in each situation where a choice must be made between several
alternatives, the correct alternative can always be found by considering
the next k symbols of the input string.


Since it is desirable to restrict the lookahead to one symbol, and since it turns
out that this restriction allows us to handle most practical cases, we will examine more closely only the LL(1) grammars. The main question is how to
determine if a given grammar is LL(1). We will answer this question first for
$\varepsilon$-free grammars (i.e. grammars without empty alternatives), and then for
grammars that do contain empty alternatives.
LL(1) grammars without empty alternatives

Even a grammar whose alternatives begin with nonterminals may be parsable
without running into dead ends. Consider the grammar
$S \to Aa \mid Bb$
$A \to xza \mid yB$
$B \to uz \mid vA$
and the string $\sigma = uzb$. Even though none of the alternatives of the production for S starts with u, it is obvious that only B can be derived into a string starting with u, while all derivations of A start with x or y. The symbols x and y are the 'terminal start symbols' of A, and u and v are the terminal start symbols of B. The concept of a set of terminal start symbols is central to the description of the LL(1) property.
2.15 Definition Terminal start symbols of a nonterminal
The set first(A) of terminal start symbols of the nonterminal A is defined to be the set of all terminals with which a string derived from A can start:
$\mathit{first}(A) = \{x : A \Rightarrow^* x\omega,\ x \in V_T,\ \omega \in V^*\}$
If A has only the production $A \to \varepsilon$, then $\mathit{first}(A) = \emptyset$ (the empty set).

This definition can be extended in a natural way to a string as argument:
2.16 Definition Terminal start symbols of a string
The set first($\alpha$) of the terminal start symbols of a string $\alpha$ is defined to be the set of all terminals with which $\alpha$ or a string derived from $\alpha$ can start:
$\mathit{first}(\alpha) = \{x : \alpha \Rightarrow^* x\omega,\ x \in V_T,\ \omega \in V^*\}$
As a special case we define $\mathit{first}(\varepsilon) = \emptyset$.
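For an $\varepsilon$-free grammar only the leading symbol of each alternative matters, so the first sets can be computed by iterating until nothing changes. A minimal Python sketch for the grammar above (function names are illustrative; grammars with empty alternatives need the extension sketched after Example 2.21):

```python
GRAMMAR = {
    "S": [["A", "a"], ["B", "b"]],
    "A": [["x", "z", "a"], ["y", "B"]],
    "B": [["u", "z"], ["v", "A"]],
}

def first_sets(grammar):
    """first(A) for every nonterminal of an e-free grammar: the union,
    over A's alternatives, of first(leading symbol), iterated to a fixed point."""
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                head = alt[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first

def first_of_string(symbols, grammar, first):
    """Definition 2.16 for e-free grammars; first(e) is the empty set."""
    if not symbols:
        return set()
    head = symbols[0]
    return set(first[head]) if head in grammar else {head}

F = first_sets(GRAMMAR)
print(sorted(F["A"]), sorted(F["B"]), sorted(F["S"]))
# ['x', 'y'] ['u', 'v'] ['u', 'v', 'x', 'y']
```

Since u is in first(B) but not in first(A), the analyzer can choose $S \to Bb$ for the string uzb after looking at one symbol.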

With the concept of terminal start symbols, it is easy to define the conditions under which an $\varepsilon$-free grammar is LL(1):

2.17 LL(1) condition for $\varepsilon$-free grammars
An $\varepsilon$-free grammar is LL(1) if, for each of its productions, the sets of terminal start symbols of its alternatives are pairwise disjoint. That is, for each of its productions
$A \to \alpha_1 \mid \alpha_2 \mid \dots \mid \alpha_n$
the following holds:
$\mathit{first}(\alpha_i) \cap \mathit{first}(\alpha_j) = \emptyset$ for $1 \le i, j \le n$, $i \ne j$

2.18 Example LL(1) conditions
The grammar
$S \to A;$
$A \to aB \mid BBb$
$B \to b \mid ab$
is not LL(1), since the following is true for the production $A \to aB \mid BBb$:
$\mathit{first}(aB) = \{a\}$, $\mathit{first}(BBb) = \{a, b\}$,
and hence
$\mathit{first}(aB) \cap \mathit{first}(BBb) = \{a\}$
The sets of terminal start symbols of the two alternatives are not disjoint: both alternatives can start with an a. As a result, if a choice must be made between these alternatives and a is the leftmost symbol of the input string, the correct alternative cannot be found without a lookahead of more than one symbol.
No left-recursive grammar is LL(1), since for a production of the form $A \to \alpha \mid A\beta$ the following is true: $\mathit{first}(\alpha) \subseteq \mathit{first}(A\beta)$, so the two sets cannot be disjoint.
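Condition 2.17 can be checked mechanically by comparing the first sets of every pair of alternatives. The following Python sketch (illustrative names, $\varepsilon$-free grammars only) reports the conflict of Example 2.18:

```python
from itertools import combinations

GRAMMAR = {
    "S": [["A", ";"]],
    "A": [["a", "B"], ["B", "B", "b"]],
    "B": [["b"], ["a", "b"]],
}

def first_sets(grammar):
    """first(A) for an e-free grammar: union of first(leading symbol)."""
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                head = alt[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first

def ll1_conflicts(grammar):
    """Pairs of alternatives of one production whose first sets overlap,
    i.e. violations of condition 2.17."""
    first = first_sets(grammar)

    def first_alt(alt):
        return first[alt[0]] if alt[0] in grammar else {alt[0]}

    conflicts = []
    for a, alts in grammar.items():
        for x, y in combinations(alts, 2):
            common = first_alt(x) & first_alt(y)
            if common:
                conflicts.append((a, x, y, common))
    return conflicts

for nt, x, y, common in ll1_conflicts(GRAMMAR):
    print(f"{nt} -> {''.join(x)} | {''.join(y)}: common {sorted(common)}")
# A -> aB | BBb: common ['a']
```

An empty result means that the grammar satisfies condition 2.17; the deterministic grammar $S \to aA \mid bB$, $A \to x \mid aB$, $B \to y \mid bA$ discussed earlier is an example.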

LL(1) grammars with empty alternatives
For a grammar with empty alternatives, the LL(1) condition of the preceding section is no longer sufficient. Consider, for instance, the grammar
$S \to aA; \mid bAc;$
$A \to c \mid \varepsilon$
and the input string $\sigma = bc;$. It is obvious that the production for S meets LL(1) condition 2.17, which is also true for the production for A because
$\mathit{first}(c) = \{c\}$, $\mathit{first}(\varepsilon) = \emptyset$, and hence $\mathit{first}(c) \cap \mathit{first}(\varepsilon) = \emptyset$.


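However, the grammar is not LL(1), since after the derivation
$S \Rightarrow bAc;$
it is impossible to determine with a lookahead of only one symbol whether $A \to c$ or $A \to \varepsilon$ must be used for the next derivation: the next input symbol is c, which can both start A and follow A. The choice of $A \to c$:
$S \Rightarrow bAc; \Rightarrow bcc;$
does not lead to $\sigma$. The choice of $A \to \varepsilon$ is the correct one, but it cannot be recognized by looking only at the next symbol.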

If we must choose one of the alternatives of a production
$A \to \alpha_1 \mid \alpha_2 \mid \dots \mid \alpha_n \mid \varepsilon$
and only the next symbol of $\sigma$ can be used, then the terminal start symbols of $\alpha_1$ to $\alpha_n$ and the terminal successors of A must be pairwise disjoint, since in the case of the production $A \to \varepsilon$, the terminal following A is the next one in $\sigma$. The set of terminal successors is defined as follows:
2.19 Definition Terminal successors
The set follow(A) of the terminal successors of a nonterminal A is the set of all terminals that can follow A in any sentential form:
$\mathit{follow}(A) = \{x : S \Rightarrow^* \omega_1 A x \omega_2,\ A \in V_N,\ x \in V_T,\ \omega_1, \omega_2 \in V^*\}$

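This definition makes it possible to determine the conditions under which an arbitrary grammar is LL(1):
2.20 LL(1) conditions for arbitrary grammars
A grammar is LL(1) if (1) for each of its productions, the sets of terminal start symbols of all alternatives are pairwise disjoint, and (2) for each nonterminal that can be derived into the empty string, the terminal successors of the nonterminal are disjoint from the terminal start symbols of each of its alternatives. Formally: for each production
$A \to \alpha_1 \mid \dots \mid \alpha_n$
the following must hold:
$\mathit{first}(\alpha_i\, \mathit{follow}(A)) \cap \mathit{first}(\alpha_j\, \mathit{follow}(A)) = \emptyset$ for $1 \le i, j \le n$, $i \ne j$
Note that the formal representation combines the cases $\alpha_i \Rightarrow^* \varepsilon$ and $\alpha_i \not\Rightarrow^* \varepsilon$. For $\alpha_i \not\Rightarrow^* \varepsilon$ we have $\mathit{first}(\alpha_i\, \mathit{follow}(A)) = \mathit{first}(\alpha_i)$; for $\alpha_i \Rightarrow^* \varepsilon$ we have $\mathit{first}(\alpha_i\, \mathit{follow}(A)) = \mathit{first}(\alpha_i) \cup \mathit{follow}(A)$, which reduces to $\mathit{follow}(A)$ when $\alpha_i = \varepsilon$.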


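2.21 Example LL(1) conditions
Consider the grammar of Knuth [1965]:

S → E ;
E → a A b E | b B a E | ε
A → a A b A | ε
B → b B a B | ε

Is it LL(1)? Since ε appears in the productions for E, A, and B, the terminal successors of E, A, and B are needed. From the grammar it can easily be seen that

$$\mathrm{follow}(E) = \{;\}, \qquad \mathrm{follow}(A) = \{b\}, \qquad \mathrm{follow}(B) = \{a\}$$

The lookahead sets are:

for the alternatives of the E production

$$
\begin{aligned}
\mathrm{first}(aAbE\ \mathrm{follow}(E)) &= \{a\}\\
\mathrm{first}(bBaE\ \mathrm{follow}(E)) &= \{b\}\\
\mathrm{first}(\varepsilon\ \mathrm{follow}(E)) &= \{;\}
\end{aligned}
$$

for the alternatives of the A production

$$
\begin{aligned}
\mathrm{first}(aAbA\ \mathrm{follow}(A)) &= \{a\}\\
\mathrm{first}(\varepsilon\ \mathrm{follow}(A)) &= \{b\}
\end{aligned}
$$

for the alternatives of the B production

$$
\begin{aligned}
\mathrm{first}(bBaB\ \mathrm{follow}(B)) &= \{b\}\\
\mathrm{first}(\varepsilon\ \mathrm{follow}(B)) &= \{a\}
\end{aligned}
$$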
Since the lookahead sets are pairwise disjoint for the alternatives of each
production, the grammar is LL(1).
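The LL(1) test of Example 2.21 is mechanical enough to be sketched in a few lines of Python. The sketch below is an illustration, not code from the book or from Coco: it assumes that the first sets of the nonterminals, the deletable nonterminals and the follow sets are already known (they are copied from the example), writes ε as an empty list, and uses the hypothetical helper names lookahead and is_ll1.

```python
# Sketch of the LL(1) test 2.20 for Knuth's grammar of Example 2.21.
# Assumed inputs, copied from the example: first sets of the nonterminals,
# the deletable nonterminals and the follow sets. [] stands for ε.
GRAMMAR = {
    "E": [["a", "A", "b", "E"], ["b", "B", "a", "E"], []],
    "A": [["a", "A", "b", "A"], []],
    "B": [["b", "B", "a", "B"], []],
}
FIRST = {"E": {"a", "b"}, "A": {"a"}, "B": {"b"}}
DELETABLE = {"E", "A", "B"}
FOLLOW = {"E": {";"}, "A": {"b"}, "B": {"a"}}


def lookahead(alt, lhs):
    """Return first(alt follow(lhs)) for one alternative of lhs."""
    result = set()
    for sym in alt:
        if sym not in GRAMMAR:            # a terminal ends the scan
            result.add(sym)
            return result
        result |= FIRST[sym]
        if sym not in DELETABLE:          # a nondeletable nonterminal ends it too
            return result
    return result | FOLLOW[lhs]           # the whole alternative is deletable


def is_ll1():
    """Check that the lookahead sets of each production are pairwise disjoint."""
    for lhs, alts in GRAMMAR.items():
        sets = [lookahead(alt, lhs) for alt in alts]
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if sets[i] & sets[j]:
                    return False
    return True


for lhs, alts in GRAMMAR.items():
    print(lhs, [sorted(lookahead(alt, lhs)) for alt in alts])
print(is_ll1())   # True
```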
The calculation of the successor sets is not always easy as we can see in the
following example of an if statement having a dangling else clause.

2.22 Example Dangling else
Consider the grammar

1  program     → statement programrest
2  programrest → program | end
3  statement   → assignment | ifstatement
4  assignment  → v := expr
5  ifstatement → if thenpart elsepart
6  thenpart    → expr then statement
7  elsepart    → else statement | ε

with the sentence symbol program and the terminals end, v, :=, expr, if, then, else.


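Is it LL(1)? There are three productions with alternatives: programrest, statement, and elsepart. The first two are LL(1) since

$$\mathrm{first}(\text{program}) \cap \mathrm{first}(\text{end}) = \{v, \text{if}\} \cap \{\text{end}\} = \emptyset$$

$$\mathrm{first}(\text{assignment}) \cap \mathrm{first}(\text{ifstatement}) = \{v\} \cap \{\text{if}\} = \emptyset$$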

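The calculation of follow(elsepart) is longer:

$$
\begin{aligned}
\mathrm{follow}(\text{elsepart}) &= \mathrm{follow}(\text{ifstatement}) && \text{by production 5}\\
\mathrm{follow}(\text{ifstatement}) &= \mathrm{follow}(\text{statement}) && \text{by production 3}\\
\mathrm{follow}(\text{statement}) &= \mathrm{first}(\text{programrest}) && \text{by production 1}\\
&\quad \cup\ \mathrm{follow}(\text{thenpart}) && \text{by production 6}\\
&\quad \cup\ \mathrm{follow}(\text{elsepart}) && \text{by production 7}
\end{aligned}
$$

with the result:

$$\mathrm{follow}(\text{elsepart}) = \mathrm{first}(\text{programrest}) \cup \mathrm{follow}(\text{thenpart}) \cup \mathrm{follow}(\text{elsepart})$$

Since the last term on the right-hand side agrees with the left-hand side, it adds nothing to the set. In addition, since

$$\mathrm{first}(\text{programrest}) = \mathrm{first}(\text{program}) \cup \mathrm{first}(\text{end}) = \{v, \text{if}, \text{end}\}$$

we have

$$\mathrm{follow}(\text{elsepart}) = \{v, \text{if}, \text{end}\} \cup \mathrm{follow}(\text{thenpart})$$

Additionally,

$$
\begin{aligned}
\mathrm{follow}(\text{thenpart}) &= \mathrm{first}(\text{elsepart}) \cup \mathrm{follow}(\text{ifstatement}) && \text{by production 5}\\
\mathrm{first}(\text{elsepart}) &= \{\text{else}\} && \text{by production 7}\\
\mathrm{follow}(\text{ifstatement}) &= \mathrm{follow}(\text{statement}) && \text{by production 3}
\end{aligned}
$$

hence, since follow(statement) = follow(elsepart) again adds nothing new,

$$\mathrm{follow}(\text{elsepart}) = \{v, \text{if}, \text{end}\} \cup \{\text{else}\} = \{v, \text{if}, \text{end}, \text{else}\}$$

Checking the LL(1) condition for production 7 results in:

$$\mathrm{first}(\text{else statement}) \cap \mathrm{follow}(\text{elsepart}) = \{\text{else}\} \neq \emptyset$$

The grammar is therefore not LL(1).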

The fact that the grammar in this example is not LL(1) does not preclude it from being parsed deterministically with a lookahead of one symbol. The syntax analyzer can always choose the first alternative when it processes the production elsepart and else is the next input symbol. In spite of the ambiguity of the statement

if a then if b then c else d

the else then always belongs to the innermost then (as in PL/I and Pascal).
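The deterministic choice described above can be illustrated by a small recursive-descent sketch in Python. It is hypothetical in its details: expressions and simple statements are reduced to single identifiers, and the name parse_statement and the tuple layout of the result are assumptions made for the illustration.

```python
def parse_statement(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos).

    An if statement becomes ("if", condition, then_part, else_part), where
    else_part is None if the elsepart was empty (the ε alternative)."""
    if pos >= len(tokens):
        raise SyntaxError("statement expected")
    if tokens[pos] != "if":
        return tokens[pos], pos + 1                 # simple statement: one identifier
    if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
        raise SyntaxError("'then' expected")
    condition = tokens[pos + 1]                     # expr reduced to one identifier
    then_part, pos = parse_statement(tokens, pos + 3)
    else_part = None
    # elsepart = "else" statement | ε. Whenever "else" is the lookahead symbol,
    # the first alternative is chosen, so the else binds to the innermost if.
    if pos < len(tokens) and tokens[pos] == "else":
        else_part, pos = parse_statement(tokens, pos + 1)
    return ("if", condition, then_part, else_part), pos


tree, end = parse_statement("if a then if b then c else d".split())
print(tree)   # ('if', 'a', ('if', 'b', 'c', 'd'), None)
```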

LL(1) grammars and grammars of programming languages
The LL(1) conditions severely restrict the class of grammars that can be analyzed deterministically. Almost all programming language grammars violate the LL(1) conditions. Two facts are especially disturbing:
1. Left-recursive productions are not LL(1).
2. Alternatives that start with the same string are not LL(1).

However, it is almost always possible to transform non-LL(1) constructs into LL(1) constructs. This is greatly aided by the use of EBNF notation. With it, left-recursive productions can be described by use of the repetition symbol {}, and common beginnings of alternatives can be extracted by factorization. We have defined the LL(1) conditions only for grammars with simple BNF productions, so the question arises what they look like when an EBNF grammar is used. We defer the answer until the end of Section 2.3.

Computation of start and successor sets

For small grammars, the calculation of start and successor sets to check for the LL(1) property can be done by careful visual inspection. However, an algorithm is needed for larger grammars. Since derivations of the form A ⇒+ ε play an important role, we will first introduce the concept of 'deletability'.
2.23 Definition Deletability

A nonterminal A is called deletable if it produces the empty string:

$$A \Rightarrow^* \varepsilon$$

In this section we will write deletable symbols in square brackets: [A].
An algorithm for marking deletable symbols in a grammar is trivial. It is based on the following assertions:
1. If A → ε is a production, then A is deletable.
2. If A → X1 ... Xn is a production and all Xi are deletable, then A is also deletable.


2.24 Algorithm Marking deletable symbols
MarkDeletableSymbols:
  Mark all nonterminals A for which A → ε exists;
  repeat
    -- Assert: all marked symbols are deletable
    Mark all nonterminals A for which A → X1...Xn exists
      and X1...Xn are all marked nonterminals
  until no new symbol was marked
end MarkDeletableSymbols
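Algorithm 2.24 translates almost directly into a fixed-point loop. In this Python sketch (an illustration, not Coco's implementation) a grammar is assumed to be a dict that maps each nonterminal to a list of alternatives, each alternative being a list of symbol names; an empty list stands for ε, and every symbol that is not a key of the dict counts as a terminal.

```python
def mark_deletable(grammar):
    """Return the set of nonterminals A with A =>* ε (after Algorithm 2.24)."""
    # Mark all nonterminals A for which a production A -> ε exists.
    deletable = {a for a, alts in grammar.items() if [] in alts}
    changed = True
    while changed:                        # repeat ... until no new symbol was marked
        changed = False
        for a, alts in grammar.items():
            if a in deletable:
                continue
            # A -> X1...Xn where all Xi are already marked (terminals never are)
            if any(all(x in deletable for x in alt) for alt in alts):
                deletable.add(a)
                changed = True
    return deletable


example = {
    "S": [["A", "B"]],
    "A": [["a"], []],
    "B": [["A", "A"]],
    "C": [["c", "C"]],
}
print(sorted(mark_deletable(example)))   # ['A', 'B', 'S']
```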

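Sets of terminal start symbols. Three cases must be distinguished for the calculation of the terminal start symbols of a string α:
1. the string is deletable;
2. its first nondeletable symbol is a terminal y;
3. its first nondeletable symbol is a nonterminal Y.

From this, computation rules (1) through (3) follow:

$$
\begin{aligned}
&(1)\quad \alpha = [X_1]\dots[X_t]: && \mathrm{first}(\alpha) = \mathrm{first}(X_1) \cup \dots \cup \mathrm{first}(X_t)\\
&(2)\quad \alpha = [X_1]\dots[X_t]\,y\,\omega: && \mathrm{first}(\alpha) = \mathrm{first}(X_1) \cup \dots \cup \mathrm{first}(X_t) \cup \{y\}\\
&(3)\quad \alpha = [X_1]\dots[X_t]\,Y\,\omega: && \mathrm{first}(\alpha) = \mathrm{first}(X_1) \cup \dots \cup \mathrm{first}(X_t) \cup \mathrm{first}(Y)
\end{aligned}
$$

The set of start symbols of a nonterminal is the union of the sets of start symbols of its alternatives:

$$(4)\quad A \rightarrow \alpha_1 \mid \dots \mid \alpha_m: \qquad \mathrm{first}(A) = \mathrm{first}(\alpha_1) \cup \dots \cup \mathrm{first}(\alpha_m)$$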

From these computation rules, the following algorithm is derived.
2.25 Algorithm Calculation of the sets of terminal start symbols
FindFirstSets(↓G ↑first):
  param G: a grammar with marked deletable symbols and n nonterminals;
        first: array(1:n) of set of terminal;
begin
  first(1:n) := ∅;                          -- start with empty sets
  repeat
    for all productions A → α1 | ... | αm do
      for all alternatives αi = [X1]...[Xt] Y ω with t >= 0, Yω ∈ V* do
        first(A) := first(A) + first(X1) + ... + first(Xt);
        case of
          Y is terminal:     first(A) := first(A) + {Y}
        | Y is nonterminal:  first(A) := first(A) + first(Y)
        | Yω is absent:      -- nothing
        end
      end
    end
  until no change in first
end FindFirstSets
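Under the same assumptions as the sketch for Algorithm 2.24 (dict-based grammar, [] for ε, non-keys are terminals), Algorithm 2.25 can be sketched as follows. The set of deletable nonterminals is passed in, as if computed beforehand by Algorithm 2.24, and the function name find_first_sets is illustrative. The inner loop adds first sets until it reaches the first nondeletable symbol of the alternative, which covers rules (1) to (3) at once.

```python
def find_first_sets(grammar, deletable):
    """Return a dict mapping each nonterminal to its set of terminal start symbols."""
    first = {a: set() for a in grammar}          # start with empty sets
    changed = True
    while changed:                               # repeat ... until no change in first
        changed = False
        for a, alts in grammar.items():
            for alt in alts:                     # alt = [X1]...[Xt] Y ω
                new = set()
                for y in alt:
                    if y not in grammar:         # Y is a terminal
                        new.add(y)
                        break
                    new |= first[y]              # deletable Xi or nonterminal Y
                    if y not in deletable:       # first nondeletable symbol reached
                        break
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


knuth = {
    "S": [["E", ";"]],
    "E": [["a", "A", "b", "E"], ["b", "B", "a", "E"], []],
    "A": [["a", "A", "b", "A"], []],
    "B": [["b", "B", "a", "B"], []],
}
first = find_first_sets(knuth, {"E", "A", "B"})
print(sorted(first["S"]))   # [';', 'a', 'b']
```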

Terminal successor sets. When calculating the terminal successors of a nonterminal A, there are also three cases to be distinguished: in the right-hand side of a production in which A appears, either a terminal y, a nondeletable nonterminal Y, or nothing follows after any deletable symbols. From this, computation rules (5) through (7) follow:

$$
\begin{aligned}
&(5)\quad B \rightarrow \omega_1 A [X_1]\dots[X_t]: && \mathrm{first}(X_1) \cup \dots \cup \mathrm{first}(X_t) \cup \mathrm{follow}(B) \subseteq \mathrm{follow}(A)\\
&(6)\quad B \rightarrow \omega_1 A [X_1]\dots[X_t]\,y\,\omega_2: && \mathrm{first}(X_1) \cup \dots \cup \mathrm{first}(X_t) \cup \{y\} \subseteq \mathrm{follow}(A)\\
&(7)\quad B \rightarrow \omega_1 A [X_1]\dots[X_t]\,Y\,\omega_2: && \mathrm{first}(X_1) \cup \dots \cup \mathrm{first}(X_t) \cup \mathrm{first}(Y) \subseteq \mathrm{follow}(A)
\end{aligned}
$$

If all occurrences of A on the right-hand sides of the productions are considered, the total set follow(A) is the union of all partial sets of follow(A) that result from (5) through (7). Therefore we have the following algorithm.

2.26 Algorithm Calculation of successor sets

FindFollowSets(↓G ↓first ↑follow):
  param G: a grammar with marked deletable symbols and n nonterminals;
        first: array(1:n) of set of terminal;
        follow: array(1:n) of set of terminal;
begin
  follow(1:n) := ∅;                         -- start with empty follow sets
  repeat
    for all nonterminals A do
      for all productions B → ω1 A [X1]...[Xt] Y ω2 with t >= 0 and Yω2 ∈ V* do
        follow(A) := follow(A) + first(X1) + ... + first(Xt);
        case of
          Y is terminal:     follow(A) := follow(A) + {Y}
        | Y is nonterminal:  follow(A) := follow(A) + first(Y)
        | Yω2 is absent:     follow(A) := follow(A) + follow(B)
        end
      end
    end
  until no change in follow
end FindFollowSets
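Algorithm 2.26 can be sketched in the same style. The deletable set and the first sets are inputs (the values for Knuth's grammar from Example 2.21 are written out by hand), and the function name is illustrative. Instead of looping over all nonterminals A and searching for their occurrences, the sketch visits every nonterminal occurrence in every alternative once per pass, which reaches the same fixed point.

```python
def find_follow_sets(grammar, deletable, first):
    """Return a dict mapping each nonterminal to its set of terminal successors."""
    follow = {a: set() for a in grammar}         # start with empty follow sets
    changed = True
    while changed:                               # repeat ... until no change in follow
        changed = False
        for b, alts in grammar.items():
            for alt in alts:
                for i, a in enumerate(alt):      # occurrence B -> ω1 A [X1]...[Xt] Y ω2
                    if a not in grammar:
                        continue                 # only nonterminals have follow sets
                    new, rest_deletable = set(), True
                    for y in alt[i + 1:]:
                        if y not in grammar:     # rule (6): a terminal y follows
                            new.add(y)
                            rest_deletable = False
                            break
                        new |= first[y]          # first(Xi) or first(Y)
                        if y not in deletable:   # rule (7): nondeletable Y follows
                            rest_deletable = False
                            break
                    if rest_deletable:           # rule (5): end of alternative reached
                        new |= follow[b]
                    if not new <= follow[a]:
                        follow[a] |= new
                        changed = True
    return follow


knuth = {
    "S": [["E", ";"]],
    "E": [["a", "A", "b", "E"], ["b", "B", "a", "E"], []],
    "A": [["a", "A", "b", "A"], []],
    "B": [["b", "B", "a", "B"], []],
}
first = {"S": {";", "a", "b"}, "E": {"a", "b"}, "A": {"a"}, "B": {"b"}}
follow = find_follow_sets(knuth, {"E", "A", "B"}, first)
print(follow["E"], follow["A"], follow["B"])   # {';'} {'b'} {'a'}
```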

The implementation of the algorithms depends strongly on the data structure of
the grammar. The execution time depends on the order in which the productions are visited. Many optimizations are possible.

Principles of syntax analysis of LL(1) grammars
The principle of deterministic syntax analysis of LL(1) grammars can be described abstractly under the following assumptions.
1. The grammar is given in 'matrix form': it has imax productions of the form

$$A_i \rightarrow \alpha_{i1} \mid \dots \mid \alpha_{i,\mathit{jmax}(i)} \qquad \text{where } 1 \le i \le \mathit{imax}$$

The sentence symbol is A1. An alternative αij is given by kmax(i,j) components of the form

$$\alpha_{ij} = X_{ij1} \dots X_{ij,\mathit{kmax}(i,j)}$$

αij = ε means kmax(i,j) = 1 and Xij1 = ε.

2. The representation is matrix-like: index i describes the production, index j describes the alternative, and index k describes the component. As programmers, we understand this representation as an abstract data structure with the access functions:

X(↓i ↓j ↓k): symbol
  returns the value of symbol Xijk.
Kind(↓i ↓j ↓k): Symkind
  returns the kind of the symbol Xijk, where Symkind = (terminal, nonterminal, epsilon).
Rule(↓i ↓j ↓k): integer
  If Xijk is the nonterminal Ai', then this function returns index i':
  Rule(↓i ↓j ↓k) = i' ⟺ Xijk = Ai'
Kmax(↓i ↓j): integer
  returns the number of components of alternative j in production i.
Match(↓x ↓i): boolean
  returns true if a phrase of the nonterminal Ai can start with terminal x or, if Ai ⇒* ε, if x can follow the phrase of Ai:
  Match(↓x ↓i) ⟺ x ∈ first(Ai follow(Ai))
Alternative(↓x ↓i): integer
  returns the index j of the alternative of production i which can begin with the terminal x:
  Alternative(↓x ↓i) = j ⟺ x ∈ first(αij follow(Ai))

3. The string to be parsed consists of pmax symbols sp:

$$\sigma = s_1 \dots s_{\mathit{pmax}} \qquad \text{with } \mathit{pmax} \ge 1$$

The description is basic and abstract since we ignore (1) the concrete data
structures of the stored grammar, (2) the implementation of the access functions, and (3) the fact that the input string is actually supplied by a lexical analyzer.
We will now give a recursive and a nonrecursive parsing algorithm.
The recursive algorithm uses an internal recursive procedure Parse. Its operation should be clear from the following specifications and from the text of the algorithm without additional explanations.
Initial state: The input string up to, but not including, sp is recognized as a legal beginning of a sentence. The Ai-phrase starts with sp.
Function: Parse(↓i ↑correct) tries to parse the Ai-phrase.
Final state: If correct = true, an Ai-phrase is parsed and p is advanced such that sp is the first input symbol that is no longer part of the Ai-phrase. If correct = false, an Ai-phrase was not parsed.
2.27 Algorithm LL(1) analysis (recursive)
ParseRecursive(↑correct):
  param  correct: boolean;               -- the string is successfully parsed
  global grammar in matrix form;
         pmax: integer;
         s: array(1:pmax) of symbol;      -- input string
  local  p: integer;                      -- index of current input symbol

  Parse(↓i ↑correct):                    -- an Ai-phrase is parsed
    param i: integer;
          correct: boolean;
    local j, k: integer;
  begin                                   -- try to parse an Ai-phrase
    -- position 1 --
    if Match(↓sp ↓i) then                 -- parsing of Ai possible
      j := Alternative(↓sp ↓i); k := 1;   -- parse αij
      loop
        -- position 2 --
        case Kind(↓i ↓j ↓k) of            -- parse Xijk
          terminal:
            if p > pmax or sp <> X(↓i ↓j ↓k) then
              correct := false; exit
            end;
            p := p+1                      -- read next input symbol
        | nonterminal:
            Parse(↓Rule(↓i ↓j ↓k) ↑correct);
            if not correct then exit end
        | epsilon:
            -- do nothing
        end;
        if k < Kmax(↓i ↓j) then
          k := k+1
        else
          correct := true; exit
        end
      end
    else
      correct := false                    -- parsing of Ai impossible
    end
    -- position 3 --
  end Parse;

begin
  p := 1;                                 -- pmax and s are assumed to have values
  Parse(↓1 ↑correct); correct := correct & p = pmax+1
end ParseRecursive

We will show the behavior of the above algorithm in Example 2.28 below, where we take snapshots of the states of the algorithm at 'position 1', 'position 2', and 'position 3'.
2.28 Example Recursive LL(1) parsing

Consider Knuth's ε-containing grammar from Example 2.21. Let us give its components the indices i, j, and k, and extend it by the component eof so that it will not produce empty sentences:

S₁ → E₁₁₁ eof₁₁₂
E₂ → a₂₁₁ A₂₁₂ b₂₁₃ E₂₁₄ | b₂₂₁ B₂₂₂ a₂₂₃ E₂₂₄ | ε₂₃₁
A₃ → a₃₁₁ A₃₁₂ b₃₁₃ A₃₁₄ | ε₃₂₁
B₄ → b₄₁₁ B₄₁₂ a₄₁₃ B₄₁₄ | ε₄₂₁

The input string shall be

a₁ b₂ b₃ a₄ eof₅

All steps performed by the algorithm can be traced in full detail by the snapshots of Fig. 2.4.

[Figure: snapshot table with the columns recursion depth (0 to 3), position, p, sp, ijk and Xijk; the run ends with correct = true]

Fig. 2.4 Snapshot of the LL(1) parsing of Algorithm 2.27 applied to the
grammar of Example 2.28
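The following Python sketch runs Algorithm 2.27 on the grammar and input of Example 2.28. It rests on several assumptions: the matrix form is a dict of component lists with 0-based alternative and component indices, the string "eps" marks an ε component, the rule S₁ → E eof is taken from the example, and Match and Alternative are merged into one lookup in hand-computed sets first(αij follow(Ai)), with follow(E) = {eof}.

```python
# Matrix form of the grammar of Example 2.28 (indices are 0-based here).
EPS = "eps"
RULES = {
    1: [["E", "eof"]],
    2: [["a", "A", "b", "E"], ["b", "B", "a", "E"], [EPS]],
    3: [["a", "A", "b", "A"], [EPS]],
    4: [["b", "B", "a", "B"], [EPS]],
}
RULE_OF = {"S": 1, "E": 2, "A": 3, "B": 4}      # Rule(i,j,k) for nonterminal components
# Hand-computed sets first(alpha_ij follow(A_i)); follow(E) = {eof}.
LOOKAHEAD = {
    1: [{"a", "b", "eof"}],
    2: [{"a"}, {"b"}, {"eof"}],
    3: [{"a"}, {"b"}],
    4: [{"b"}, {"a"}],
}


def alternative(x, i):
    """Match and Alternative in one: index of the alternative of rule i for x, or None."""
    for j, symbols in enumerate(LOOKAHEAD[i]):
        if x in symbols:
            return j
    return None


def parse_recursive(s):
    """Return True if the symbol list s is a sentence (after Algorithm 2.27)."""
    p = 0                                        # index of the current input symbol

    def parse(i):                                # try to parse an A_i phrase
        nonlocal p
        j = alternative(s[p], i) if p < len(s) else None
        if j is None:
            return False                         # parsing of A_i impossible
        for x in RULES[i][j]:                    # parse the components X_ijk
            if x == EPS:
                continue                         # do nothing
            if x in RULE_OF:                     # nonterminal: recursive call
                if not parse(RULE_OF[x]):
                    return False
            elif p < len(s) and s[p] == x:       # terminal: read next input symbol
                p += 1
            else:
                return False
        return True

    return parse(1) and p == len(s)


print(parse_recursive(["a", "b", "b", "a", "eof"]))   # True
print(parse_recursive(["a", "a", "eof"]))             # False
```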

The nonrecursive algorithm uses a stack for the intermediate storage of the indices of all nonterminals that are currently being processed. The access functions of the stack are InitStack, Push(↓i ↓j ↓k) and Pop(↑i ↑j ↑k).
The algorithm can be in three states: findalternative, try, forward. These are characterized by the assertions which hold in each one respectively:
findalternative: The input string up to, but not including, sp is already recognized as a legal beginning of a sentence. sp has been read, and it is expected that an Ai-phrase starting with sp will follow. The index j of the matching alternative of the Ai-production will be found.
try: The grammar symbol Xijk will be parsed.
forward: Xijk has been successfully parsed, so move to its successor.

For the stack, the following assertion holds in all three states. If i = 1, the stack is empty. If i > 1, then Ai is at the top of the stack.


2.29 Algorithm LL(1) parsing (nonrecursive)
ParseNonRecursive(↑correct):
  param  correct: boolean;               -- the string is successfully parsed
  global grammar in matrix form;
         s: array(1:pmax) of symbol;      -- input string
         pmax: integer;
  type   State = (findalternative, try, forward);
  local  i, j, k, p: integer;
         st: State;
begin
  i := 1; p := 1; stack := empty;         -- pmax and s are assumed to have values
  st := findalternative;                  -- start with the first rule
  loop
    case st of
      findalternative:
        -- position 1 --
        if Match(↓sp ↓i) then
          j := Alternative(↓sp ↓i);
          k := 1; st := try               -- Xij1 is the first component
        else
          correct := false; exit          -- sp does not match
        end
    | try:
        -- position 2 --                  -- parse Xijk
        case Kind(↓i ↓j ↓k) of
          terminal:
            if p > pmax or X(↓i ↓j ↓k) <> sp then
              correct := false; exit
            end;
            p := p+1; st := forward       -- Xijk is parsed
        | nonterminal:
            Push(↓i ↓j ↓k);
            i := Rule(↓i ↓j ↓k);
            st := findalternative
        | epsilon:
            st := forward
        end                               -- case Kind
    | forward:
        -- position 3 --                  -- advance to next component
        if k < Kmax(↓i ↓j) then
          k := k+1; st := try             -- no end of alternative
        else                              -- end of alternative
          if i > 1 then
            Pop(↑i ↑j ↑k)                 -- nonterminal Xijk is parsed
          else
            correct := p = pmax+1; exit   -- rule 1 is parsed
          end
        end
    end                                   -- case st
  end                                     -- loop
  -- position 4 --
end ParseNonRecursive

The behavior of the nonrecursive algorithm is shown in Example 2.30.

2.30 Example Nonrecursive LL(1) parsing

We consider the same grammar and input stream as in Example 2.28 with
snapshots at positions 1 to 4. The algorithm executes as in Fig. 2.5.
[Figure: snapshot table with the columns position, p, sp and stack contents (index triples ijk, End-Of-Stack at the left); the run ends with correct = true]

Fig. 2.5 Snapshots of the nonrecursive LL(1) algorithm 2.29,
applied to the grammar of Example 2.28
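A corresponding Python sketch of the nonrecursive Algorithm 2.29 keeps the three states and the explicit stack of index triples. It uses the same assumptions as a direct transcription of Example 2.28 would need (0-based indices, "eps" for ε, hand-computed lookahead sets with follow(E) = {eof}); the table and function names are illustrative. One deliberate deviation: it tests whether the stack is empty rather than whether i > 1, which stays correct even if the sentence symbol occurred on a right-hand side.

```python
EPS = "eps"
RULES = {
    1: [["E", "eof"]],
    2: [["a", "A", "b", "E"], ["b", "B", "a", "E"], [EPS]],
    3: [["a", "A", "b", "A"], [EPS]],
    4: [["b", "B", "a", "B"], [EPS]],
}
RULE_OF = {"S": 1, "E": 2, "A": 3, "B": 4}
LOOKAHEAD = {1: [{"a", "b", "eof"}], 2: [{"a"}, {"b"}, {"eof"}],
             3: [{"a"}, {"b"}], 4: [{"b"}, {"a"}]}


def alternative(x, i):
    """Index of the alternative of rule i that matches x, or None."""
    for j, symbols in enumerate(LOOKAHEAD[i]):
        if x in symbols:
            return j
    return None


def parse_nonrecursive(s):
    """State machine after Algorithm 2.29 with an explicit stack of (i, j, k)."""
    i, j, k, p = 1, 0, 0, 0
    stack = []
    state = "findalternative"                    # start with the first rule
    while True:
        if state == "findalternative":
            j = alternative(s[p], i) if p < len(s) else None
            if j is None:
                return False                     # s_p does not match
            k, state = 0, "try"                  # X_ij1 is the first component
        elif state == "try":                     # parse X_ijk
            x = RULES[i][j][k]
            if x == EPS:
                state = "forward"
            elif x in RULE_OF:                   # nonterminal: remember position
                stack.append((i, j, k))
                i, state = RULE_OF[x], "findalternative"
            elif p < len(s) and s[p] == x:       # terminal X_ijk is parsed
                p, state = p + 1, "forward"
            else:
                return False
        else:                                    # state == "forward"
            if k < len(RULES[i][j]) - 1:
                k, state = k + 1, "try"          # no end of alternative
            elif stack:
                i, j, k = stack.pop()            # nonterminal X_ijk is parsed
            else:
                return p == len(s)               # rule 1 is parsed


print(parse_nonrecursive(["a", "b", "b", "a", "eof"]))   # True
```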

The recursive algorithm is statically shorter and more elegant. The nonrecursive algorithm is better suited for the inclusion of error handling since the explicitly stacked symbols are accessible (see Section 2.6).
Both scan the input string strictly from left to right (p is never decremented). In addition, there exists a grammar-specific upper limit c such that a new input symbol is read after at most c loop iterations. Hence, the algorithm has a linear execution time with respect to the length of the input string: its time complexity is O(pmax).
LL(k) grammars for k > 1

A lookahead of more than one symbol is rarely used in compilers. We shall
therefore treat LL(k) grammars for k > 1 only briefly, for the sake of
completeness.
First, we define the set of terminal start symbols of length k of a string α:
2.31 Definition Terminal start symbols of length k

$$\mathrm{first}_k(\alpha) = \{\beta : \alpha \Rightarrow^* \beta\omega \text{ with } \beta \in V_T^*,\ |\beta| = k,\ \omega \in V^*\} \ \cup\ \{\beta : \alpha \Rightarrow^* \beta \text{ with } \beta \in V_T^*,\ |\beta| < k\}$$

If the terminal string which can be derived from α is shorter than k, then the elements of firstₖ(α) are also shorter than k.

We will now give a formal definition of the LL(k) grammars according to
Rosenkrantz and Stearns [1970]:

2.32 Definition LL(k) grammar
A grammar is called LL(k) if for all left-canonical derivations of the form

$$S \Rightarrow^* \alpha A \omega \Rightarrow \alpha\beta\omega \Rightarrow^* \alpha x \qquad\text{and}\qquad S \Rightarrow^* \alpha A \omega \Rightarrow \alpha\gamma\omega \Rightarrow^* \alpha y$$

with α, x, y ∈ VT* and firstₖ(x) = firstₖ(y), it follows that β = γ.
This means that in an LL(k) grammar no two sentential forms with the leftmost nonterminal A and the alternatives A → β and A → γ can exist in which the left-canonical derivations of the remaining strings beginning with β and γ agree in the first k symbols. From this, we get the following condition:
2.33 LL(k) condition

A grammar is LL(k) if for each pair of rules

$$A \rightarrow \beta \quad\text{and}\quad A \rightarrow \gamma \qquad (\beta \ne \gamma)$$

and each left-canonically derived sentential form that contains A,

$$S \Rightarrow^* \alpha A \omega$$

the following condition holds:

$$\mathrm{first}_k(\beta\omega) \cap \mathrm{first}_k(\gamma\omega) = \emptyset$$
2.34 Example LL(2) and LL(3) test

Again, consider the grammar

S → A ;
A → aB | BBb
B → b | ab

in order to see if it is LL(k) and to determine the value of k.
The only pair of rules that creates a problem is

A → aB
A → BBb

and the only sentential form in which its left-hand side A appears is A;.

k = 1: the LL(1) test produces

$$\mathrm{first}_1(aB;) = \{a\}, \qquad \mathrm{first}_1(BBb;) = \{a, b\}$$

Since a belongs to both sets, the grammar is not LL(1).

k = 2: the LL(2) test produces

$$\mathrm{first}_2(aB;) = \{aa,\ ab\}, \qquad \mathrm{first}_2(BBb;) = \{bb,\ ba,\ ab\}$$

Since the element ab belongs to both sets, the grammar is not LL(2).

k = 3: the LL(3) test produces

$$\mathrm{first}_3(aB;) = \{ab;,\ aab\}, \qquad \mathrm{first}_3(BBb;) = \{bbb,\ bab,\ abb,\ aba\}$$

Since both sets are disjoint, the grammar is LL(3).
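The sets in this example can be checked mechanically. The Python sketch below is an illustration, not one of the algorithms referred to next: it computes firstₖ of every nonterminal by a fixed-point iteration over k-truncated concatenation, with terminal strings represented as tuples, and then searches for the smallest k for which the two alternatives of A, each followed by ;, have disjoint sets. The function names are hypothetical.

```python
def concat_k(xs, ys, k):
    """k-truncated concatenation of two sets of terminal strings (tuples)."""
    return {(x + y)[:k] for x in xs for y in ys}


def first_k_of(string, grammar, fk, k):
    """first_k of a string of grammar symbols, given first_k of the nonterminals."""
    result = {()}
    for sym in string:
        result = concat_k(result, fk[sym] if sym in grammar else {(sym,)}, k)
    return result


def first_k_sets(grammar, k):
    """Fixed-point computation of first_k(A) for every nonterminal A.
    Symbols that are not keys of grammar are terminals."""
    fk = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                new = first_k_of(alt, grammar, fk, k)
                if not new <= fk[a]:
                    fk[a] |= new
                    changed = True
    return fk


def smallest_k(grammar, beta, gamma, omega, kmax=5):
    """Smallest k <= kmax for which first_k(beta omega) and first_k(gamma omega) are disjoint."""
    for k in range(1, kmax + 1):
        fk = first_k_sets(grammar, k)
        left = first_k_of(beta + omega, grammar, fk, k)
        right = first_k_of(gamma + omega, grammar, fk, k)
        if not left & right:
            return k
    return None


G = {"S": [["A", ";"]], "A": [["a", "B"], ["B", "B", "b"]], "B": [["b"], ["a", "b"]]}
print(smallest_k(G, ["a", "B"], ["B", "B", "b"], [";"]))   # 3
```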

Algorithms for the computation of the sets first,(a) and for checking the
LL(k) conditions for k > 1 can be found in Aho and Ullman [1972].

No left-recursive grammar is LL(k) for any k. Another simple grammar that is not LL(k) for any k is:

S → A ;
A → a | aAa

It has the language {a²ⁿ⁺¹; | n ≥ 0}. If there were a value of k such that

$$\mathrm{first}_k(a\,a^n;) \cap \mathrm{first}_k(aAa\,a^n;) = \emptyset$$

for the sentential form aⁿAaⁿ;, then k > n+1 would be true. However, since n can become arbitrarily large, there is no such k.


Rosenkrantz and Stearns [1970] proved the following interesting statements about LL(k) grammars:

(1) It is undecidable whether a given arbitrary grammar is LL(k) for some unknown value of k.
(2) It is decidable whether a given arbitrary grammar is LL(k) for a given fixed value of k.
(3) If a grammar G is not LL(k) for a given k, it cannot be determined whether there is an equivalent LL(k) grammar for G.
(4) For each LL(k) grammar G that contains ε-productions, there is an ε-free LL(k+1) grammar that produces the same language as G, but without the empty string.

2.3 The top-down graph

In a table-driven syntax analysis, the grammar of the source language must be
stored in main memory so that the analysis algorithm can access it freely. The
three-dimensional abstract data structure consisting of rules, alternatives, and
components, used in Section 2.2 for the representation of the principal algorithms, is not suited as a concrete data structure. It does not make efficient use of memory, and the grammar cannot be represented in EBNF form. A representation that is much better suited for a practical top-down analysis is a special kind of graph. We call it the top-down graph. It is similar to the syntax diagrams, introduced by Wirth, that were used to describe the syntax of Pascal. In Coco, the top-down graph is used as a preliminary step to the even better suited G-code. Since the G-code is understandable only by means of the top-down graph, we will describe that first.

Basic structure
The basic structure of the top-down graph is a collection of ordered binary
trees. Its nodes are the grammar symbols of the right-hand sides of syntax
rules. Right pointers link the components of an alternative while left pointers
link the start symbols of different alternatives.
In the picture of a top-down graph, a right pointer leaves a node at the right, and a left pointer leaves the node at the bottom:

Sec. 2.3

The top-down graph

43

node ──→   right pointer (to next component)
 │
 ↓
left pointer (to next alternative)

2.35 Example Basic structure of the top-down graph

Figure 2.6 shows the top-down graph of the grammar

S → A ;
A → aB | BBb
B → b | ab

Notice that the top-down graph comprises only the right-hand sides of the rules.

S:  A ──→ ;
A:  a ──→ B
    │
    B ──→ B ──→ b
B:  b
    │
    a ──→ b

Fig. 2.6 Top-down graph of the grammar of Example 2.35

Factorization
An advantage of the top-down graph over a linear representation is the ability to show alternatives in a factorized form, as can be done in EBNF. From the rule

A → abc | acd

with the top-down graph

A:  a ──→ b ──→ c
    │
    a ──→ c ──→ d

we get by left-factorization the EBNF rule

A → a(bc | cd)

with the top-down graph

A:  a ──→ b ──→ c
          │
          c ──→ d

From the rule

A → abc | dec

with the top-down graph

A:  a ──→ b ──→ c
    │
    d ──→ e ──→ c

we obtain by right-factorization the EBNF rule

A → (ab | de)c

with the top-down graph

A:  a ──→ b ──┐
    │         ├──→ c
    d ──→ e ──┘

Notice that the last top-down graph is no longer a tree.
A special case occurs when an alternative is the beginning of another alternative. Then, an ε is created by factorization. For the rule

A → ab | a

with the top-down graph

A:  a ──→ b
    │
    a

we get by left-factorization the EBNF rule

A → a[b]

with the top-down graph

A:  a ──→ b
          │
          ε

Removal of left-recursive rules
The symbol strings defined by left-recursive rules can be represented in EBNF by the repetition symbol. Repetition corresponds to a loop in the top-down graph. From the rule

A → a | Ab

with the top-down graph

A:  a
    │
    A ──→ b

we get the EBNF rule

A → a{b}

with the top-down graph

A:  a ──→ b ──→ (back to b)
          │
          ε

This top-down graph is also not a tree. It can easily be verified that it represents all possible right-hand sides such as a, ab, abb, abbb, etc.
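The transformation just described, for immediate left recursion, can be sketched as a small Python function that produces the EBNF rule as text. It is an illustration under simplifying assumptions: alternatives are lists of symbol names, an empty list is ε, the function name is hypothetical, and indirect left recursion is not handled.

```python
def remove_left_recursion(a, alternatives):
    """Turn A -> beta_1 | ... | A alpha_1 | ... into the EBNF rule
    A = (beta_1 | ...) {alpha_1 | ...}. Handles immediate left recursion only."""
    def show(alts):
        return " | ".join(" ".join(alt) if alt else "ε" for alt in alts)

    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == a]
    others = [alt for alt in alternatives if not alt or alt[0] != a]
    head = show(others)
    if len(others) > 1 and recursive:
        head = "(" + head + ")"
    if not recursive:
        return f"{a} = {head}."
    return f"{a} = {head} {{{show(recursive)}}}."


print(remove_left_recursion("A", [["a"], ["A", "b"]]))
# A = a {b}.
print(remove_left_recursion("E", [["T"], ["+", "T"], ["-", "T"], ["E", "+", "T"], ["E", "-", "T"]]))
# E = (T | + T | - T) {+ T | - T}.
```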
The complement symbol any
Sometimes it is desirable to represent a set of terminals by its complementary set, for example
1. in the description of a string enclosed in quotes: the set of all symbols in the alphabet except the quote;
2. any symbol in a comment of the form (* ... *): the set of all symbols except the symbol *);
3. any symbol except begin (to skip declarations).

Complementary sets cannot be represented in the production notation of formal languages. Therefore, the only thing left to do is to enumerate all members of the complementary set, which is very inconvenient. For use in Coco, we introduce the special symbol any to denote complementary sets.
2.36 Definition Complement symbol any

The complement symbol any represents every terminal that is not a terminal start symbol of one of the other alternatives of any.

Figure 2.7 shows the three examples above with the symbol any as an EBNF rule and as a top-down graph.

string  = '"' {any} '"'.
comment = "(*" {any} "*)".
skip    = {any} "begin".

[Figure: the corresponding top-down graphs, in which each {any} is a loop over an any-node whose ε alternative leads on to the closing symbol]

Fig. 2.7 The meaning of the complement symbol any

Equivalent top-down graphs
If one uses only the basic structure, then a unique top-down graph results from a grammar rule. This uniqueness is lost with factorization and removal of left recursion. In these cases there are sometimes several equivalent top-down graphs.

2.37 Example Equivalent top-down graphs
Consider the rule for expressions

E → T | +T | -T | E+T | E-T

By factorization and elimination of left recursion, the graph shown in the upper part of Fig. 2.8 results. It has 10 nodes and corresponds to the EBNF rule

E = (T | "+" T | "-" T) {"+" T | "-" T}.

Another top-down graph which is equivalent to both but consists of only 7 nodes appears in the middle part of Fig. 2.8. This graph corresponds to the EBNF rule

E = ["+" | "-"] T {("+" | "-") T}.

Figure 2.8 shows another equivalent and even more condensed top-down graph with only 6 nodes. This top-down graph no longer corresponds to an EBNF rule.

[Figure: three equivalent top-down graphs with 10 nodes, 7 nodes and 6 nodes]

Fig. 2.8 Three equivalent top-down graphs for expressions


The graph with the fewest nodes occupies the least memory. However, there
may be reasons (due to the treatment of semantics, see Section 3.6) not to
compress the top-down graph too much.
These examples show that for each EBNF rule there is a corresponding
top-down graph. But a top-down graph does not always correspond to an
EBNF rule.

Representation
The top-down graph can be represented in memory by a data structure of nodes and pointers that may be dynamically generated or statically declared and initialized. Since the number of nodes is known in advance and does not change, the static declaration is more efficient. In Coco, the top-down graph basically consists of an array of nodes, and each node consists of four components:

type
  Graphnode = record
    kind: (terminal, nonterminal, any, eps);
    val, lp, rp: integer
  end;
var
  graph: array(1:n) of Graphnode;

The names have the following meaning:
kind: the various node types.
val:  the node symbol in some encoding, meaningless for ε-nodes.
lp:   the left pointer that points to the first node of the next alternative. If lp > 0, then graph(lp) starts the next alternative. If lp = 0, the current alternative is the last one of the production.
rp:   the right pointer that points to the next component. If rp > 0, then graph(rp) is the next component. If rp = 0, the current component is the last component of an alternative.
n:    the number of nodes in the grammar.
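As an illustration of this representation, the following Python sketch encodes the top-down graph of Example 2.35 in an array of records. The field names follow the text; the dataclass, the string encoding of kind and val, and the node numbering are assumptions. Index 0 holds a dummy node so that 0 can mean "no pointer". The helper alternatives follows left pointers from the start of a production and right pointers along each alternative; it only works for loop-free graphs, since a repetition makes a right pointer lead back to an earlier node.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GraphNode:
    kind: str     # "terminal", "nonterminal", "any" or "eps"
    val: str      # node symbol; meaningless for eps nodes
    lp: int = 0   # left pointer: first node of the next alternative (0 = none)
    rp: int = 0   # right pointer: next component (0 = last component)


# Hypothetical encoding of S -> A ";", A -> a B | B B b, B -> b | a b.
graph: List[GraphNode] = [
    GraphNode("eps", ""),                      # 0: dummy, 0 means "no pointer"
    GraphNode("nonterminal", "A", rp=2),       # 1  S: A
    GraphNode("terminal", ";"),                # 2     ;
    GraphNode("terminal", "a", lp=5, rp=4),    # 3  A: a
    GraphNode("nonterminal", "B"),             # 4     B
    GraphNode("nonterminal", "B", rp=6),       # 5     B  (second alternative)
    GraphNode("nonterminal", "B", rp=7),       # 6     B
    GraphNode("terminal", "b"),                # 7     b
    GraphNode("terminal", "b", lp=9),          # 8  B: b
    GraphNode("terminal", "a", rp=10),         # 9     a  (second alternative)
    GraphNode("terminal", "b"),                # 10    b
]
START = {"S": 1, "A": 3, "B": 8}


def alternatives(start: int) -> List[List[str]]:
    """List the alternatives of the production whose graph begins at start."""
    result = []
    alt = start
    while alt:                                 # follow left pointers
        symbols, node = [], alt
        while node:                            # follow right pointers
            symbols.append(graph[node].val)
            node = graph[node].rp
        result.append(symbols)
        alt = graph[alt].lp
    return result


for name, start in START.items():
    print(name, alternatives(start))
```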

LL(1) Conditions for top-down graphs
The LL(1) condition of Section 2.2 refers to the simple grammar representation with rules and alternatives. If a grammar meets this condition, the correct alternative can be selected by a lookahead of one symbol without backtracking. A similar condition for the top-down graph ensures the correct selection of the alternatives without backtracking by use of a lookahead of one symbol.
To simplify the discussion, we introduce two auxiliary concepts. Since they are of central importance for the syntax analysis of top-down graphs, we will use them often. We call these concepts 'alternative chain' and 'match'.


2.38 Definition Alternative chain

An alternative chain is the ordered set of all nodes of a top-down graph
that are linked together by left-pointers. A node pointed to by a right
pointer is the start of an alternative chain. A node without a left pointer is
the end of an alternative chain. We can define nodes that are not linked by
left pointers as also belonging to an alternative chain. In this case the
alternative chain consists of the node alone.
2.39 Example Alternative chains
In the top-down graph of Fig. 2.9, symbols are distinguished by subscripts. The graph contains the alternative chains

(+₁, -₂, T₃)   (T₃)   (+₄, -₅, ε₆)

Note that node T₃ belongs to two alternative chains.

[Figure: top-down graph for expressions with subscripted nodes]

Fig. 2.9 Top-down graph for expressions with indexed symbols

2.40 Definition Match

An input symbol x and a node of the top-down graph with symbol sy match (i.e., fit together) if one of the following conditions is met:

1. sy is a terminal and x = sy;
2. sy is a nondeletable nonterminal that may start with x;
3. sy is a deletable nonterminal, and sy can start with x or x can follow the node sy;
4. sy is an ε-node and x can follow the node sy;
5. sy is an any-node and x matches no other node in the alternative chain to which the any-node belongs.

In order to select a node uniquely from an alternative chain using a lookahead symbol x, x must match only one alternative:


2.41 LL(1) conditions for top-down graphs
An alternative chain is LL(1) if an arbitrary input symbol matches at most one of its nodes. A top-down graph is LL(1) if all of its alternative chains are LL(1).

The top-down graph of Fig. 2.9 is therefore LL(1) if T cannot start with + or - and if E cannot be followed by + or - (these symbols would match the ε-node).
Since each EBNF production corresponds to a top-down graph, the LL(1) conditions for top-down graphs are also the LL(1) conditions for EBNF grammars. In order to check if an EBNF grammar is LL(1), it is easiest to generate its top-down graph and check if it meets the LL(1) conditions. The LL(1) conditions for EBNF grammars can also be derived from the definition of the EBNF grammar alone without constructing a top-down graph. However, this is cumbersome and results in no new insights. We therefore omit the description and leave the task to the interested reader.
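Condition 2.41 can be checked per alternative chain once the matching symbols of each node are known. In this Python sketch (an illustration with hypothetical names) the caller supplies, for every node that is not an any-node, the set of input symbols that match it according to cases 1 to 4 of Definition 2.40; an any-node is given the complement of the other nodes' symbols, as in case 5. The alphabet and the follow set used for the ε-node in the usage example are assumptions.

```python
def chain_is_ll1(chain, alphabet):
    """Check condition 2.41 for one alternative chain.

    chain is a list of (kind, matches) pairs, one per node. For terminal,
    nonterminal and eps nodes, matches is the set of input symbols matching
    the node. For an "any" node it is ignored: the node matches every symbol
    of alphabet that matches no other node of the chain."""
    others = set().union(*(m for kind, m in chain if kind != "any"))
    seen = set()
    for kind, matches in chain:
        node_set = alphabet - others if kind == "any" else matches
        if node_set & seen:
            return False              # some input symbol matches two nodes
        seen |= node_set
    return True


ALPHABET = {"+", "-", "x", "(", ")", "*)", "eof"}
# A chain of the form (+, -, ε); the ε-node matches follow(E), assumed {")", "eof"}.
end_chain = [("terminal", {"+"}), ("terminal", {"-"}), ("eps", {")", "eof"})]
print(chain_is_ll1(end_chain, ALPHABET))       # True
# The same chain if E could be followed by "+".
clash = [("terminal", {"+"}), ("terminal", {"-"}), ("eps", {"+", ")"})]
print(chain_is_ll1(clash, ALPHABET))           # False
# comment = "(*" {any} "*)": the chain (any, ε) where ε matches "*)".
comment_chain = [("any", set()), ("eps", {"*)"})]
print(chain_is_ll1(comment_chain, ALPHABET))   # True
```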
LL(1) top-down graphs and grammars of programming languages
If top-down graphs are to have practical value, one must be able to represent the grammars of programming languages as LL(1) top-down graphs, and therefore as LL(1) EBNF grammars. We may ask, therefore, whether this is possible without exception, or whether there are constructs that resist an LL(1) representation, and if so, what can be done about it.

First of all, LL(1) violations by left-recursive productions and by the start of several alternatives with the same symbol can easily be avoided in top-down graphs and in EBNF notation. Remaining LL(1) violations can usually be removed with various tricks that are determined with insight into the particular situation. As an aid for the 'grammar designer', we will treat several typical cases and distinguish between the following five methods:

1. substitution and factorization;
2. alphabet extension;
3. syntactic extension;
4. acceptance of non-LL(1) constructs;
5. miscellaneous transformations.

Substitution and factorization. Consider a production with two alternatives that start with different nonterminals X and Y, where X and Y can start with the same symbol (terminal or nonterminal). Then it is often possible to replace the symbols X and Y by the right-hand sides of their productions, and to extract their common starting string by left-factorization.
This can be simple and obvious, as in the various DO instructions of PL/M-80 (and similarly in PL/I):

statement      = ... | dostatement | whilestatement | forstatement | casestatement | ... .
dostatement    = "DO" ";" block.
whilestatement = "DO" "WHILE" expr ";" {statement} ending.
forstatement   = "DO" ident "=" expr "TO" expr ["BY" expr] {statement} ending.
casestatement  = "DO" "CASE" expr ";" {statement} ending.

By substitution and factorization this results in

statement = ... | "DO" ( ";" block
                       | ( "CASE" expr ";"
                         | "WHILE" expr ";"
                         | ident "=" expr "TO" expr ["BY" expr]
                         ) {statement} ending
                       )
          | ... .

However, it can also be difficult. In grammars such as that of Modula-2, a factor can be a set or a designator, and both can begin with an identifier:

factor     = ... | designator [actpars] | set | ... .
designator = qualident {"." ident | "[" exprlist "]" | "^"}.
set        = [qualident] "{" [elementlist] "}".
qualident  = ident {"." ident}.

Note that even the production for designator taken alone is not LL(1). For instance, ident.ident may be simply a qualident, or the qualident ident followed by "." ident.
The removal of the LL(1) conflict consists of combining designator and set into a new symbol deset, and then splitting designator into ident and a remainder desigrest. After several substitutions and factorizations, the following LL(1) constructs result:

factor    = ... | deset | ... .
deset     = ident {"." ident}
              ( "{" [elementlist] "}"
              | ("[" exprlist "]" | "^") desigrest [actpars]
              | [actpars]
              )
          | "{" [elementlist] "}".
desigrest = {"." ident | "[" exprlist "]" | "^"}.

The equivalence of the old and new constructs can no longer be easily seen.
Alphabet extension. In selecting an alternative, it is fairly common for two
lookahead symbols to be necessary to find the right one. The main example of
this is when labels appear in front of statements:
    statement = [ident ":"] ( ident ":=" expr
                            | ifstatement
                            | ... ).

An ident at the beginning of a statement may be a label or the left-hand side of an assignment. Only the symbol following ident can decide which. This conflict can often be resolved by extending the terminal alphabet. In the preceding case, the symbol label can be added to the alphabet, and the lexical analyzer can be required to supply a label instead of an ident if ident is followed by a ':'. In this case, the lexical analyzer is used to resolve the LL(1) conflict.
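The alphabet extension can be sketched as a small filter between the scanner and the parser. The Python function below is an illustration under assumptions. Tokens are hypothetical (kind, text) pairs. ':=' is assumed to arrive as a single token from the scanner, so it is never mistaken for ':'. The filter turns every ident that is immediately followed by ':' into one label token.

```python
def mark_labels(tokens):
    """Alphabet extension sketch: replace  ident ":"  by a single label token,
    so that the parser can distinguish a label from the left-hand side of an
    assignment with one lookahead symbol.  Tokens are (kind, text) pairs."""
    out = []
    i = 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind == "ident" and i + 1 < len(tokens) and tokens[i + 1][0] == ":":
            out.append(("label", text))      # ident ":"  ->  label
            i += 2
        else:
            out.append((kind, text))
            i += 1
    return out
```

With this filter, the production can start with [label] instead of [ident ":"]. Because the filter only looks at the next token, it would also turn the n in an array bound pair such as [n : m] into a label.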
This method leads to complications if the lexical analyzer has to inspect a wider context to determine whether or not to replace two terminals by a single one. For example, in Algol 60, 'ident :' does not always denote the label of a statement. An identifier followed by a colon may also appear in a declaration, as in the bound pair of array A[n : m]. In such cases, the lexical analyzer is no longer independent of the syntax analyzer, since it must consider the context.
Syntactic extension. In Algol 60 there exist multiple assignments (such as a := b := c + 1):

    assignment = designator ":=" {designator ":="} expr.

where expr can start with designator. This LL(1) violation is very nasty. It can be removed by 'substitution and factorization', but this is very cumbersome (the reader should try it). It is easier to 'expand' the designator inside the curly brackets to expr. This requires the introduction of an additional production for assignrest:

    assignment = designator ":=" assignrest.
    assignrest = expr [":=" assignrest].

The syntactic extension must be compensated by a semantic restriction. If the right-recursive part is present in the production for assignrest, expr must be restricted to be a designator. This can be achieved by introducing a boolean attribute isdesignator. Anticipating knowledge from Chapter 3, this may be written as an attributed grammar as follows:

    assignrest = expr↑isdesignator [":=" assignrest].

where (isdesignator) must hold whenever the part in brackets is present.

This means: by syntactic extensions, portions of the language definition are
moved from syntax to static semantics.
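The following recursive-descent sketch in Python shows how the restriction moves into static semantics. It rests on hypothetical simplifications: a designator is a bare identifier, and expr = operand {"+" operand}. The method expr returns its text together with the attribute isdesignator. The method assignrest rejects a further ':=' whenever that attribute is false. The class and method names are illustrative only.

```python
class Parser:
    """Sketch of  assignment = designator ":=" assignrest.
                  assignrest = expr<isdesignator> [":=" assignrest].
    with the semantic check that ':=' may only follow a designator."""

    def __init__(self, tokens):
        self.toks = tokens + ["<eof>"]
        self.pos = 0

    def peek(self):
        return self.toks[self.pos]

    def take(self):
        tok = self.toks[self.pos]
        self.pos += 1
        return tok

    def operand(self):
        tok = self.take()
        if not (tok.isidentifier() or tok.isdigit()):
            raise SyntaxError(f"operand expected, found {tok!r}")
        return tok

    def expr(self):
        """Parse expr and return (text, isdesignator)."""
        parts = [self.operand()]
        while self.peek() == "+":
            self.take()
            parts.append(self.operand())
        isdesignator = len(parts) == 1 and parts[0].isidentifier()
        return "+".join(parts), isdesignator

    def assignrest(self):
        """Return the parsed exprs; all but the last are assignment targets."""
        text, isdesignator = self.expr()
        if self.peek() != ":=":
            return [text]
        if not isdesignator:                 # the semantic restriction
            raise SyntaxError(f"{text!r} cannot be assigned to")
        self.take()
        return [text] + self.assignrest()

    def assignment(self):
        target = self.operand()
        if not target.isidentifier():
            raise SyntaxError("designator expected")
        if self.take() != ":=":
            raise SyntaxError("':=' expected")
        return [target] + self.assignrest()
```

For a := b := c + 1, `Parser(["a", ":=", "b", ":=", "c", "+", "1"]).assignment()` returns `["a", "b", "c+1"]`. For a := b + 1 := c, the syntax is accepted up to the second ':=', and then the attribute check reports the error.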
Acceptance of non-LL(1) constructs. If it is known that the parser tries to
match the alternatives in the order they are written, some LL(1) violations can
be left alone. The best known case is the dangling else:
    ifstatement = "IF" expr "THEN" statement ["ELSE" statement].

Although this construct is not LL(1), and is even ambiguous (see Example
2.22), it can be left alone if one can be sure that the parser, having recognized
the statement following THEN, first tries to detect the optional ELSE, and
only regards the entire if statement as complete if there is no ELSE.
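A recursive-descent parser behaves as required here if it tests for the optional ELSE as soon as the THEN branch has been parsed. The Python sketch below illustrates this. The tokens "e" (an expr) and "s" (any other statement) are hypothetical placeholders. The parser returns a tree in which an ELSE is attached to the innermost IF.

```python
def parse_statement(toks, pos=0):
    """Parse  statement = "IF" expr "THEN" statement ["ELSE" statement] | "s".
    The optional ELSE is tried right after the THEN branch, so every ELSE
    binds to the innermost open IF.  Returns (tree, next position)."""
    if toks[pos] == "s":
        return "s", pos + 1
    if toks[pos:pos + 3] != ["IF", "e", "THEN"]:
        raise SyntaxError(f"unexpected {toks[pos]!r} at position {pos}")
    then_part, pos = parse_statement(toks, pos + 3)
    else_part = None
    if pos < len(toks) and toks[pos] == "ELSE":   # first try the optional ELSE
        else_part, pos = parse_statement(toks, pos + 1)
    return ("if", then_part, else_part), pos
```

For `IF e THEN IF e THEN s ELSE s`, the result is `("if", ("if", "s", "s"), None)`: the ELSE belongs to the inner IF.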
Other transformations. Sometimes, a grammar that is not LL(1) can be transformed into an equivalent LL(1) grammar by simple transformations that do not fall into any of the four categories above. For example, in Algol 60, a block is defined as

    block = head ";" body.
    head  = "begin" declaration {";" declaration}.

This construct is not LL(1), since the semicolon plays a dual role: it separates adjacent declarations, and it separates body from head. The solution is simple. The grammar can be transformed so that the semicolon becomes a terminator instead of a separator:

    block = head body.
    head  = "begin" declaration ";" {declaration ";"}.

The necessity of such transformations, their difficulty, and the uncertainty of executing them correctly are a weakness of the LL-method and often a cause for criticism. In bottom-up analyzable LR(1) grammars, no transformations, or only a few, are needed, so research has focused on the LR-method. However, syntax is but one aspect. What is gained with the LR-method must be paid back when semantics is connected to syntax. This connection is much less flexible in the LR-method than in the LL-method, often leads to violations of the LR-property, and then also requires transformations. In addition, the LL(1)-method is much easier to understand than the LR-method. This results in easier transformations and more understandable error messages.

2.4

The G-code

A top-down graph that resides in memory is a useful way of representing a grammar. It already requires little space, but it can be compressed significantly further. Let us consider the grammar of arithmetic expressions:

    S      = expr.
    expr   = term {"+" term}.
    term   = factor {"*" factor}.
    factor = v | "(" expr ")".

Now, let us add the production S' = S eofsy, where S' is the new sentence symbol and eofsy (= end of file) is a new terminal. This trick ensures that each sentence terminates with the same symbol eofsy, and that there is no empty sentence even if S can be derived into the empty string.

(Figure: top-down graphs of the rules for S', S, expr, term and factor.)
Fig. 2.10 Top-down graph for an expression: graphic representation

Fig. 2.10 shows the top-down graph of this grammar, which has 15 nodes. Fig. 2.11 shows its internal memory representation as described in Section 2.3. If we assume one byte each for the components typ and val, and two bytes each for lp and rp, then the table requires 15 * 6 = 90 bytes.
Compaction can be achieved by partitioning the nodes according to their types, and by coding the individual types so that they do not contain any unnecessary information. The G-code (grammar code) that we use is such a code. For syntax analysis, the elements of the G-code behave as instructions, and therefore they are written as instructions. Sequential G-code instructions are executed sequentially. They correspond to nodes in the top-down graph that are connected by right pointers. Definition 2.42 defines the G-code as far as it is relevant for the representation of a top-down graph.

(Figure: node table with one row per node, grouped into the rules for S', S, expr, term and factor.)
Fig. 2.11 Top-down graph for an expression: representation in memory

The G-code is augmented by tables containing the lookahead symbols. Each nonterminal symbol sy (not each nonterminal node) is associated with a set first(sy) containing its terminal start symbols. The operand nr of an ε-instruction (EPS and EPSA) refers to an array epsset. Thus epsset(nr) contains all terminals that match the corresponding ε-node (see Definition 2.40). The operand nr of an ANYA instruction refers to an array anyset. Thus anyset(nr) contains all terminals that match the corresponding any-node. In summary, these G-code lookahead sets have the following data structures:

    first:  array(1:maxnt)  of Symbolset
    epsset: array(1:maxeps) of Symbolset
    anyset: array(1:maxany) of Symbolset

If the lookahead sets are stored bitwise, they do not require much memory. Each node of the top-down graph corresponds to a G-code instruction. In addition, the G-code instructions RET and JMP are added at the end of productions and loops, where the linear execution sequence is interrupted.
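When stored bitwise, a Symbolset is simply a bit vector indexed by terminal number. The minimal Python sketch below represents each set as an integer. It assumes a hypothetical numbering of the six terminals of the expression grammar (eofsy, v, +, *, ( and )).

```python
# Hypothetical numbering of the terminals of the expression grammar.
TERMINALS = ["eofsy", "v", "+", "*", "(", ")"]
CODE = {name: i for i, name in enumerate(TERMINALS)}


def symset(*names):
    """Build a Symbolset as an integer bit mask (bit i set <=> terminal i in set)."""
    mask = 0
    for name in names:
        mask |= 1 << CODE[name]
    return mask


def contains(mask, name):
    """The test 'typ in set' becomes a single AND with a one-bit mask."""
    return (mask & (1 << CODE[name])) != 0


first = {nt: symset("v", "(") for nt in ("S", "expr", "term", "factor")}
epsset = {1: symset("eofsy", ")"), 2: symset("eofsy", ")", "+")}
```

With six terminals, every set fits into a single byte. In general, a grammar with maxt + 1 terminals needs (maxt + 1) / 8 bytes per set, rounded up.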

2.42 Definition  G-code (incomplete)

    Instruction    Bytes  Description
    T sy           2      terminal
                          If the next input symbol is sy then recognize it,
                          else report an error.
    TA sy adr      4      terminal with alternative
                          If the next input symbol is sy then recognize it,
                          else go to adr.
    NT sy          2      nonterminal
                          If the next input symbol is a terminal start of sy
                          then step through its production, else report an error.
    NTA sy adr     4      nonterminal with alternative
                          If the next input symbol is a terminal start of sy
                          then step through its production, else go to adr.
    ANY            1      any
                          Recognize the next input symbol.
    ANYA nr adr    4      any with alternative
                          If the next input symbol is contained in the symbol set
                          indicated by nr then recognize it, else go to adr.
    EPS nr         2      epsilon
                          If the next input symbol is contained in the successor
                          set indicated by nr then recognize the empty string,
                          else report an error.
    EPSA nr adr    4      epsilon with alternative
                          If the next input symbol is contained in the successor
                          set indicated by nr then recognize the empty string,
                          else go to adr.
    JMP adr        3      jump
                          Go to adr.
    RET            1      return
                          Return from the production of a nonterminal.

The operation code and the operands sy and nr are 1 byte each; adr is 2 bytes.
The following G-code results for the grammar shown in the top-down graph of Fig. 2.10:

     1  NT   S              The production S' = S eofsy.
     3  T    eofsy
     5  RET
     6  NT   expr           The production S = expr.
     8  RET
     9  NT   term           The production expr = term {"+" term}.
    11  TA   "+"  20
    15  NT   term
    17  JMP  11
    20  EPS  1
    22  RET
    23  NT   factor         The production term = factor {"*" factor}.
    25  TA   "*"  34
    29  NT   factor
    31  JMP  25
    34  EPS  2
    36  RET
    37  TA   v    42        The production factor = v | "(" expr ")".
    41  RET
    42  T    "("
    44  NT   expr
    46  T    ")"
    48  RET

The lookahead sets are:

    first(S) = first(expr) = first(term) = first(factor) = {v, "("}
    epsset(1) = {eofsy, ")"}
    epsset(2) = {eofsy, ")", "+"}

anyset is empty, since the grammar contains no any-symbol.
The total amount of G-code is 48 bytes, slightly more than half of the 90 bytes required by the top-down graph.
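The addresses in the listing follow directly from the instruction sizes of Definition 2.42. The Python sketch below is only an illustration and not Coco's code generator. It assembles the G-code of the expression grammar in two passes, using symbolic labels for the jump targets. The first pass assigns byte addresses starting at 1; the second replaces each label by its address. The tuple format and the label names are hypothetical.

```python
# Byte sizes from Definition 2.42 (opcode, sy and nr: 1 byte each; adr: 2 bytes).
SIZE = {"T": 2, "TA": 4, "NT": 2, "NTA": 4, "ANY": 1, "ANYA": 4,
        "EPS": 2, "EPSA": 4, "JMP": 3, "RET": 1}

# G-code for S' = S eofsy, S = expr, expr = term {"+" term},
# term = factor {"*" factor}, factor = v | "(" expr ")".
PROGRAM = [
    ("NT", "S"), ("T", "eofsy"), ("RET",),
    ("NT", "expr"), ("RET",),
    ("NT", "term"), ("label", "L1"), ("TA", "+", "L2"), ("NT", "term"),
    ("JMP", "L1"), ("label", "L2"), ("EPS", 1), ("RET",),
    ("NT", "factor"), ("label", "L3"), ("TA", "*", "L4"), ("NT", "factor"),
    ("JMP", "L3"), ("label", "L4"), ("EPS", 2), ("RET",),
    ("TA", "v", "L5"), ("RET",),
    ("label", "L5"), ("T", "("), ("NT", "expr"), ("T", ")"), ("RET",),
]


def assemble(program, start=1):
    """Pass 1 assigns byte addresses to labels; pass 2 replaces labels by
    addresses.  Returns (listing of (address, opcode, operands...), size)."""
    labels, addr = {}, start
    for instr in program:
        if instr[0] == "label":
            labels[instr[1]] = addr
        else:
            addr += SIZE[instr[0]]
    listing, addr = [], start
    for instr in program:
        if instr[0] == "label":
            continue
        operands = tuple(labels.get(op, op) for op in instr[1:])
        listing.append((addr, instr[0]) + operands)
        addr += SIZE[instr[0]]
    return listing, addr - start
```

Running `assemble(PROGRAM)` reproduces the addresses of the listing above and a total of 48 bytes, compared with 15 * 6 = 90 bytes for the node table.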
In Coco, first of all a top-down graph is generated. It is then used to
check several properties of the grammar, and to calculate the start and successor sets. Finally the graph is transformed into G-code, and this is the ultimate
structure in which the grammar is stored.

2.5

Parsing with the G-code

Parsing becomes quite simple with the G-code since the G-code itself is already a parsing program. To make a parser, it is only necessary to code an
interpreter for a G-code program.
In this section we will develop such a parser without error handling. In
the next section we will add the error handling.
Assumptions
We will summarize here the assumptions on which parsing with the G-code is
based.

1. The G-code is derived from a top-down graph that meets the LL(1) conditions.
2. If S is the sentence symbol, then the top-down graph and the G-code are expanded by the production

       S' = S eofsy.

   where eofsy is the terminal end-of-file that does not appear in the original grammar. The first G-code instruction of this production has the address 1.
3. The symbol string to be parsed is supplied by a lexical analyzer, which provides the next input symbol in the variable typ for each call. After reaching the last source symbol, the lexical analyzer supplies the symbol eofsy.
4. The parsing algorithm uses a stack of actual length lacts (= actual length of stack) to store the addresses that follow the nonterminal instructions currently being processed (these are the 'return addresses' of the currently parsed nonterminals).

Overview
The parsing algorithm executes the G-code program under the control of the input string to be recognized. It starts at address 1 and ends at the instruction for the symbol eofsy. Depending on the current input symbol typ and the current G-code instruction, several courses of action are possible. When the algorithm tries to recognize a terminal, there are two possibilities: if it succeeds, it moves to the next symbol; if it fails, it goes on to the next alternative (if there is any). When the algorithm tries to recognize a nonterminal, there are also two possibilities. If the input string and the nonterminal match, the algorithm pushes the address of the next instruction on the stack and jumps to the first G-code instruction of the nonterminal. If they do not match, it goes on to the next alternative (if there is any). At the end of a production, RET pops the 'return address' from the stack, and the algorithm continues from there. When an error occurs, error handling and synchronization take place, after which parsing continues as if no error had occurred. The analysis ends when typ = eofsy and the corresponding G-code instruction is T eofsy.
The parsing algorithm is called Parse. It returns a boolean variable correct, which will be true if the analyzed input text is syntactically correct. Parse is an interpreter that has the following structure:

    Parse(↑correct):
      pc:=1;                                  --program counter
      loop
        opcode:=G-code(↓pc);                  --G-code operation
        case opcode of
          t:   execute instruction "T sy" and change pc
        | ta:  execute instruction "TA sy adr" and change pc
          ...
        | jmp: execute instruction "JMP adr"
        end
      end
    end Parse

Inside the loop, a value is assigned to the result parameter and the loop is
terminated if typ = eofsy.

The simplified G-code parsing algorithm
First we will present a simplified version of Parse that does not contain the instructions ANY, ANYA, EPS and EPSA, and does not have any error actions. We further assume that nonterminals are not deletable. For the description of Parse in Adele, we will use the following routines:

    Decode(↓pc ↑opcode ↑sy ↑nr ↑adr ↑nextpc)
returns the parameters of the G-code instruction starting at address pc. (An operand that does not appear in the actual instruction returns an undefined value of the corresponding parameter.) nextpc is the address of the next instruction.

    NewSym(↑typ)
returns the next input symbol.

    Root(↓sy): integer
returns the address of the first G-code instruction of the production for the nonterminal sy.

Using these routines, the simplified algorithm is as follows:

2.43 Algorithm Parse (simplified)

    Parse(↑correct):
      param
        correct: boolean;                   --correctness indicator
      const
        eofsy = ...;                        --end of file symbol
      type
        Instruction = (t, ta, nt, nta, jmp, ret);
      local
        adr: integer;                       --instruction part adr
        first: array of Symbolset;          --lookahead symbol sets
        lacts: integer;                     --actual stack length
        nextpc: integer;                    --addr. of next G-code instr.
        nr: integer;                        --instruction part nr
        opcode: Instruction;                --instruction part opcode
        pc: integer;                        --program counter
        stack: array of integer;            --nonterminals worked on
        sy: integer;                        --sy part of G-code instr.
        typ: integer;                       --current source symbol
      begin
        pc:=1; lacts:=0;                    --initialize and
        NewSym(↑typ);                       --read first symbol
        loop
          Decode(↓pc ↑opcode ↑sy ↑nr ↑adr ↑nextpc);   --get instruction at pc
          case opcode of
            t:                              --terminal without alternative
              if typ=sy                     --must match
              then
                if typ=eofsy then
                  correct:=true; exit       --terminate successfully
                end;
                pc:=nextpc; NewSym(↑typ)    --advance and read
              else
                correct:=false; exit        --terminate unsuccessfully
              end
          | ta:                             --terminal with alternative
              if typ=sy                     --may match
              then pc:=nextpc; NewSym(↑typ) --advance and read
              else pc:=adr                  --goto alternative
              end
          | nt:                             --nonterm. without alternative
              if typ in first(sy)           --must match
              then
                lacts:=lacts+1; stack(lacts):=nextpc;
                pc:=Root(↓sy)
              else
                correct:=false; exit        --terminate if error
              end
          | nta:                            --nonterminal with alternative
              if typ in first(sy)           --may match
              then
                lacts:=lacts+1; stack(lacts):=nextpc;
                pc:=Root(↓sy)
              else pc:=adr                  --goto alternative
              end
          | jmp:                            --jump to next alternative
              pc:=adr
          | ret:                            --return
              pc:=stack(lacts);             --find follower instruction in stack
              lacts:=lacts-1
          end --case
        end --loop
      end Parse.
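The interpreter loop of Algorithm 2.43 is short enough to try out directly. The following Python sketch runs the G-code of the expression grammar given earlier, with the instructions stored in a dictionary keyed by their byte addresses. The tuple layout, the ROOT table and the use of a Python iterator as the lexical analyzer are assumptions of the sketch. Because this G-code contains EPS instructions, the sketch also interprets EPS as the complete algorithm does; the simplified Parse leaves EPS out.

```python
EOF = "eofsy"

# G-code of the expression grammar, keyed by byte address.  Tuple layout
# (an assumption of this sketch): ("t"|"nt", sy, nextpc),
# ("ta"|"nta", sy, nextpc, adr), ("eps", nr, nextpc), ("jmp", adr), ("ret",)
GCODE = {
    1: ("nt", "S", 3),         3: ("t", EOF, 5),          5: ("ret",),
    6: ("nt", "expr", 8),      8: ("ret",),
    9: ("nt", "term", 11),    11: ("ta", "+", 15, 20),   15: ("nt", "term", 17),
    17: ("jmp", 11),          20: ("eps", 1, 22),        22: ("ret",),
    23: ("nt", "factor", 25), 25: ("ta", "*", 29, 34),   29: ("nt", "factor", 31),
    31: ("jmp", 25),          34: ("eps", 2, 36),        36: ("ret",),
    37: ("ta", "v", 41, 42),  41: ("ret",),
    42: ("t", "(", 44),       44: ("nt", "expr", 46),    46: ("t", ")", 48),
    48: ("ret",),
}
ROOT = {"S": 6, "expr": 9, "term": 23, "factor": 37}   # Root(sy)
FIRST = {nt: {"v", "("} for nt in ROOT}                # first(sy)
EPSSET = {1: {EOF, ")"}, 2: {EOF, ")", "+"}}           # epsset(nr)


def parse(symbols):
    """Return True if the symbol list is a sentence of the grammar."""
    source = iter(list(symbols) + [EOF])    # lexical analyzer: eofsy at the end
    typ = next(source)
    pc, stack = 1, []                       # stack holds return addresses
    while True:
        instr = GCODE[pc]
        op = instr[0]
        if op == "t":                       # terminal without alternative
            if typ != instr[1]:
                return False                # terminate unsuccessfully
            if typ == EOF:
                return True                 # terminate successfully
            pc, typ = instr[2], next(source)
        elif op == "ta":                    # terminal with alternative
            if typ == instr[1]:
                pc, typ = instr[2], next(source)
            else:
                pc = instr[3]               # goto alternative
        elif op in ("nt", "nta"):           # nonterminal
            if typ in FIRST[instr[1]]:
                stack.append(instr[2])      # push the return address
                pc = ROOT[instr[1]]
            elif op == "nta":
                pc = instr[3]               # goto alternative
            else:
                return False
        elif op == "eps":                   # empty string if typ may follow
            if typ not in EPSSET[instr[1]]:
                return False
            pc = instr[2]
        elif op == "jmp":
            pc = instr[1]
        else:                               # ret: continue at return address
            pc = stack.pop()
```

For example, `parse(["v", "+", "v", "*", "(", "v", ")"])` returns True, while `parse(["v", "+"])` returns False.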

The complete G-code parsing algorithm
We will now add the interpretation of the instructions and properties that were
left out in the previous section, and provide the following explanations.
The instruction ANY recognizes any source symbol, and ANYA recognizes any source symbol that is a member of the lookahead set belonging to
this instruction.
The instructions EPS and EPSA recognize the empty string if the source
symbol matches their lookahead set.

In the case of an error, the analysis is not terminated. Instead, the error handler

    Error(↕pc ↕altroot)

is executed. Error requires two parameters: the address pc of the non-matching G-code instruction, and the address altroot (root of alternative chain) of the first G-code instruction of the alternative chain in which the error occurred. Error synchronizes by skipping input symbols, changes pc and altroot, and sets correct to false. Error is thus local to Parse.
Every time an input symbol has been successfully parsed, the next symbol can be read, and altroot can be set to a new alternative chain. For semantic reasons, however, these actions are delayed until the input symbol is actually required by the parser. Instead of reading a symbol, the variable mustread is set to true.
Furthermore, in the complete version we will consider the possibility that
a nonterminal X can be derived into the empty string. This can be tested with
the function

    Deletable(↓X): boolean

Such a nonterminal is always recognized, even if the current input symbol
does not belong to its terminal start symbols (explanation in Section 7.3.3).
This requires the interpretation of the instructions NT and NTA to be
extended.
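Deletable(X) can be precomputed once for all nonterminals. One standard way to do this is a fixpoint iteration over the productions, shown here as a Python sketch; it is not necessarily the way Coco does it. The grammar representation is an assumption: a dict from each nonterminal to a list of alternatives, with EBNF options and repetitions already expanded into alternatives. An empty list is an empty alternative.

```python
def deletable_nonterminals(grammar):
    """Return the set of nonterminals that can derive the empty string.
    grammar maps each nonterminal to a list of alternatives; an alternative
    is a list of symbols, and symbols that are not keys are terminals."""
    deletable = set()
    changed = True
    while changed:                              # repeat until nothing changes
        changed = False
        for nt, alternatives in grammar.items():
            if nt in deletable:
                continue
            # nt is deletable if some alternative consists only of
            # deletable nonterminals (an empty alternative qualifies).
            if any(all(sym in deletable for sym in alt) for alt in alternatives):
                deletable.add(nt)
                changed = True
    return deletable
```

Within Parse, Deletable(X) then becomes a simple membership test `X in deletable`. Terminals can never enter the set, so any alternative that contains a terminal is never deletable.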
Expanded in this way, the algorithm Parse has the following complete
form:
2.44 Algorithm Parse (complete)

    Parse(↑correct):
      param
        correct: boolean;                   --correctness indicator
      const
        eofsy = ...;                        --end of file symbol
      type
        Instruction = (t, ta, nt, nta, any, anya, eps, epsa, jmp, ret);
      local
        adr: integer;                       --instruction part adr
        altroot: integer;                   --root of alternative chain
        anyset: array of Symbolset;         --lookahead symbol set
        epsset: array of Symbolset;         --lookahead symbol set
        first: array of Symbolset;          --lookahead symbol set
        lacts: integer;                     --actual stack length
        mustread: boolean;                  --typ is consumed
        nextpc: integer;                    --address of next G-instruction
        nr: integer;                        --instruction part nr
        opcode: Instruction;                --instruction part opcode
        pc: integer;                        --program counter
        stack: array of integer;            --nonterminals worked on
        sy: integer;                        --instruction part sy
        typ: integer;                       --current source symbol
        Error ...;                          --local error procedure
      begin
        pc:=1; altroot:=1;                  --initialize
        mustread:=true; lacts:=0;
        correct:=true;                      --only Error resets it to false
        loop
          Decode(↓pc ↑opcode ↑sy ↑nr ↑adr ↑nextpc);   --get instruction at pc
          if mustread then                  --read next source symbol
            NewSym(↑typ); altroot:=pc; mustread:=false
          end;
          case opcode of
            t:                              --terminal without alternative
              if typ=sy                     --must match
              then
                if typ=eofsy then exit end; --terminate
                pc:=nextpc; mustread:=true  --advance
              else Error(↕pc ↕altroot)      --sets correct:=false
              end
          | ta:                             --terminal with alternative
              if typ=sy                     --may match
              then pc:=nextpc; mustread:=true   --advance
              else pc:=adr                  --goto alternative
              end
          | nt:                             --nonterm. without alternative
              if typ in first(sy) or Deletable(↓sy)   --must match
              then
                lacts:=lacts+1; stack(lacts):=nextpc;   --push follower
                pc:=Root(↓sy); altroot:=pc  --parse rule for nonterminal
              else Error(↕pc ↕altroot)      --sets correct:=false
              end
          | nta:                            --nonterminal with alternative
              if typ in first(sy) or Deletable(↓sy)   --may match
              then
                lacts:=lacts+1; stack(lacts):=nextpc;   --push follower
                pc:=Root(↓sy); altroot:=pc  --parse rule for nonterminal
              else pc:=adr                  --goto alternative
              end
          | any:                            --any without alternative
              pc:=nextpc; mustread:=true    --advance
          | anya:                           --any with alternative
              if typ in anyset(nr)          --may match
              then pc:=nextpc; mustread:=true   --advance
              else pc:=adr                  --goto alternative
              end
          | eps:                            --epsilon without alternative
              if typ in epsset(nr)          --must match
              then pc:=nextpc               --advance
              else Error(↕pc ↕altroot)      --sets correct:=false
              end
          | epsa:                           --epsilon with alternative
              if typ in epsset(nr)          --may match
              then pc:=nextpc               --advance
              else pc:=adr                  --goto alternative
              end
          | jmp:                            --jump to next alternative
              pc:=adr
          | ret:                            --return
              pc:=stack(lacts); lacts:=lacts-1;   --find follower instruction in stack
              altroot:=pc
          end --case
        end --loop
      end Parse.

2.6

Error handling

Principle
A syntax error arises in one of three situations: (1) the input symbol typ does
not match the symbol sy in the G-code instruction T; (2) typ is not a terminal start symbol of the instruction NT; (3) typ is not a terminal successor of
the instruction EPS. In any of these situations, the variable altroot contains
the address of the alternative chain in which the error occurred and the stack
contains the return addresses of all nonterminals that are currently being
processed.
This is sufficient information to collect all terminals that can be used to continue the analysis. The following example illustrates the situation.
2.45 Example  Error situation

Consider the grammar fragment:

    program      = declarations body "end".
    declarations = ... .
    body         = statement {statement}.
    statement    = ... | "if" relation "then" body ... | ... .
    relation     = expr relop expr.
    relop        = "=" | "<>" | "<" | "<=" | ">" | ">=".
    expr         = ... .

and the input text

    ... if a:=b then c:=d end ...

When the syntax analyzer detects the error caused by the ':=', it has reached the situation shown in Fig. 2.12. The boxes in this figure enclose the grammar symbols of the G-code instructions whose addresses are in the stack.

(Figure: partial syntax tree from program via body, statement, relation and expr down to the recognized input a; the remaining input is := b then c:=d end.)
Fig. 2.12 Partial syntax tree of an erroneous translation of the instruction if a:=b then c:=d end

The last input symbol that was correctly recognized is a. It was recognized as expr. Then relop must follow. Since relop cannot start with ':=', the procedure Error(↕pc ↕altroot) is called. The stack contains the addresses of the G-code instructions for the recognition of

    eofsy, end, statement, then
    (bottom of stack)      (top of stack)

We will now collect the so-called 'anchors', i.e. all terminals that are suitable for resuming the syntax analysis. They can be grouped into four classes:

1. All terminal start symbols of the alternative chain starting at altroot, because the programmer may have inserted the erroneous symbol inadvertently in front of a symbol of the unrecognized alternative chain. In the example, these are the symbols >, >=, =, <>, <=, <.
2. All terminal successors of the alternative chain at altroot, because the erroneous symbol may appear in place of a symbol of the unrecognized alternative chain. In the example, this set consists of the beginnings of expr: v, c, +, -, (.
3. The terminal start symbols of all symbols in the stack, and of the alternative chains beginning with them. With these, syntax analysis can be resumed after a non-recognized nonterminal. In the example, these are the symbols then, end, eofsy and the set first(statement).
4. All terminal successors of the alternative chains whose addresses are in the stack. In the example, these are all terminal start symbols of body, since body follows then, and all terminal start symbols of statement, since statement follows statement.

While the inclusion of items 1 to 3 in the set of anchors is plausible, the inclusion of item 4 seems rather arbitrary. We could justify it by the fact that items 3 and 4 are symmetric to items 1 and 2, but there is a heuristic reason as well. Consider a grammar where ';' is a statement separator rather than a statement terminator. Without rule 4, the set of anchors would contain ';' but not the start symbols of statements. Then, if a ';' is missing between statements (a common error), the next statement would be skipped. Rule 4 prevents this by adding the start symbols of statements to the set of anchors. Similar errors, e.g. the omission of a comma between expressions, are also quite likely to occur.
Now, input symbols are skipped until one of them appears in the set of anchors. In the worst case this happens at the end of the input text, since eofsy is always among the anchors. Next, the stack must be corrected. If the anchor is a terminal start symbol of the alternative chain whose address is in stack(t), analysis will be resumed at this address and the stack length will be reduced to t - 1. In Example 2.45, only ':=' is skipped, since b is a start symbol of expr, and the stack is not reduced.
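Skipping and stack correction take only a few lines of Python. The sketch below assumes the anchors have already been collected into a mapping from each anchor terminal to the address at which to resume and the stack length to keep. It also assumes that the input ends with eofsy. The stack is a Python list whose element k-1 corresponds to stack(k). The addresses in the example data are hypothetical.

```python
def synchronize(symbols, pos, anchors, stack):
    """Skip input symbols from position pos until one of them is an anchor,
    then cut the stack of return addresses back to the length recorded for
    that anchor.  anchors maps a terminal to (resume address, stack length).
    Returns (position of the anchor, resume address, corrected stack)."""
    while symbols[pos] not in anchors:
        pos += 1                          # skip a symbol that is no anchor
    resume_pc, new_length = anchors[symbols[pos]]
    del stack[new_length:]                # reduce the stack length if needed
    return pos, resume_pc, stack


# Hypothetical addresses for Example 2.45: the stack holds the return
# addresses for eofsy, end, statement and then (bottom to top).
STACK = [3, 20, 31, 57]
ANCHORS = {"v": (70, 4), "then": (57, 3), "end": (20, 1), "eofsy": (3, 0)}
```

`synchronize(["if", "v", ":=", "v", "then"], 2, ANCHORS, list(STACK))` skips only ':=' and resumes at the second v with the stack unchanged, as in Example 2.45. If the next anchor were then, the stack would be cut back to three entries.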
In summary, we can describe the principle of error handling as follows:
2.46 Principle of error handling

An error is detected if an alternative chain is unsuccessfully traversed up to its end. The error is then flagged and the analysis must be synchronized. The synchronization consists of collecting a set of anchors and skipping the input text up to the first input symbol that is contained in the set of anchors. With it, the analysis can be resumed at the address pc of the anchor. During this process the stack is reduced if necessary, so that at the end of the error handling the following assertion holds:

    Starting with the G-code instruction at pc, the analysis can be continued with the current input symbol typ (typ matches the alternative chain at pc). The stack contains the return addresses of all nonterminals currently under process when continuing the analysis with pc.

This error handling has two remarkable features:
1. It is completely independent of the syntax of the input language.
2. Anchors are collected only if an error is detected. The collection is therefore completely dynamic and starts anew for each error. Hence, the presence of error handling does not reduce the parsing speed for a correct input string. The synchronization itself is expensive, but, since errors are infrequent, this is only a slight disadvantage.

The algorithm Error
From the preceding section, the basic structure of the algorithm Error is now obvious:

2.47 Algorithm Error (basic structure)

    Error(↕pc ↕altroot):
      global
        correct: boolean;
        lacts: integer;                       --actual stack length
      begin
        correct:=false;
        Print error message;
        Collect anchors;
        Skip input symbols up to the first anchor;
        Correct pc, altroot and lacts
        -- It is synchronized. The analysis can continue
      end Error

Error messages
The error messages are also independent of the input language. At the error location, we simply extract all expected symbols from the G-code and list them. In Example 2.45, the following error message will occur:

    ... if a:=b then c:=d end ...
            ^ relop expected

This message is sufficient for most purposes. In Coco, we also give users the option of outputting their own error messages (see Section 5.2.2).
The collection of anchors
After synchronization, parsing is resumed with a new G-code instruction newpc and with a new stack length newlacts. Anchors are therefore collected as triples:

    (newtyp, newpc, newlacts)

A procedure Triple produces a triple list that includes the following triple categories:

1. the terminal start symbols of the alternative chain beginning with altroot;
2. the terminal successors of the alternative chain beginning with altroot;
3. the terminal start symbols of all alternative chains whose addresses are in the stack;
4. the terminal successors of all alternative chains whose addresses are in the stack.

If a terminal belongs to more than one of the four categories, category 1 has
priority (because no symbol needs to be read). Category 2 has priority over
categories 3 and 4 (because synchronization can take place in the same production where the error occurred). Of the anchors derived from the stack, the
ones closest to the error location have priority, and the terminal start symbols
of the stacked alternative chains have priority over their successors.
In order to fill the triple list with terminal start symbols and successors
corresponding to the priority rules, we use the algorithms Fill and FillSucc.
Hence, the algorithm Triple has the following form:

2.48 Algorithm Triple

    Triple(↓altroot):
      global
        triple list;
        stack: array of integer;
        lacts: integer;                       --actual stack size
      begin
        triple list:=empty;
        for i:=1 to lacts do
          FillSucc(↓stack(i) ↓i-1);           --class 4
          Fill(↓stack(i) ↓i-1)                --class 3
        end;
        FillSucc(↓altroot ↓lacts);            --class 2
        Fill(↓altroot ↓lacts)                 --class 1
      end Triple

As the concrete data structure for the triple list, we use two arrays newpc and newlacts, which are indexed with the maxt + 1 terminals of the grammar:

    newpc, newlacts: array(0:maxt) of integer

The algorithms Fill and FillSucc use the following procedure to obtain G-code instructions:

    GetSymInstr(↓pc ↑opcode ↑sy ↑nextpc ↑altpc)

which supplies the G-code instruction at pc. The last two parameters have the following meaning:

    nextpc: Address of the first 'symbol-recognizing' instruction (T, TA, NT, NTA, ANY, ANYA) that follows the instruction at pc in the same production, or 0 if no such instruction exists.
    altpc:  Address of the first 'symbol-recognizing' instruction that is an alternative of the instruction at pc, or 0 if no such instruction exists.

Fill and FillSucc can now be described as follows:

2.49 Algorithm Fill

    Fill(↓firstpc ↓lacts):
      global
        newpc, newlacts: array(0:maxt) of integer;
      begin
        pc:=firstpc;
        while pc#0 do
          GetSymInstr(↓pc ↑opcode ↑sy ↑nextpc ↑altpc);
          case opcode of
            t, ta:
              newpc(sy):=pc; newlacts(sy):=lacts
          | nt, nta, nts, ntas:
              for all x ∈ first(sy) do
                newpc(x):=pc; newlacts(x):=lacts
              end
          | any, anya:                        --nothing (eps and ret do not occur)
          end;
          pc:=altpc
        end
      end Fill

2.50 Algorithm FillSucc

    FillSucc(↓startpc ↓lacts):
      global
        newpc, newlacts: array(0:maxt) of integer;
      begin
        pc:=startpc;
        while pc#0 do
          GetSymInstr(↓pc ↑opcode ↑sy ↑nextpc ↑altpc);
          if nextpc#0 then
            Fill(↓nextpc ↓lacts)
          end;
          pc:=altpc
        end
      end FillSucc
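The priority rules come for free from the order of the calls in Triple: a later call overwrites the entries of earlier ones. The Python sketch below transcribes Triple, Fill and FillSucc and shows this effect. It makes several assumptions. A dictionary of symbol-recognizing instructions (pc mapped to opcode, sy, nextpc and altpc) plays the role of GetSymInstr. Dictionaries replace the arrays newpc and newlacts. The small instruction table is hypothetical, loosely modeled on Example 2.45.

```python
def fill(pc, lacts, code, first, newpc, newlacts):
    """Algorithm Fill: enter the start symbols of the alternative chain at pc."""
    while pc != 0:
        opcode, sy, _nextpc, altpc = code[pc]        # plays the role of GetSymInstr
        if opcode in ("t", "ta"):
            starts = [sy]
        elif opcode in ("nt", "nta"):
            starts = first[sy]
        else:                                        # any, anya: nothing to enter
            starts = []
        for x in starts:
            newpc[x], newlacts[x] = pc, lacts
        pc = altpc


def fill_succ(pc, lacts, code, first, newpc, newlacts):
    """Algorithm FillSucc: enter the successors of the alternative chain at pc."""
    while pc != 0:
        _opcode, _sy, nextpc, altpc = code[pc]
        if nextpc != 0:
            fill(nextpc, lacts, code, first, newpc, newlacts)
        pc = altpc


def triple(altroot, stack, code, first):
    """Algorithm Triple.  Later calls overwrite earlier entries, so class 1
    beats class 2, class 2 beats the stack classes, entries near the top of
    the stack beat those near the bottom, and class 3 beats class 4."""
    newpc, newlacts = {}, {}
    for i, ret_addr in enumerate(stack, start=1):    # bottom of stack first
        fill_succ(ret_addr, i - 1, code, first, newpc, newlacts)   # class 4
        fill(ret_addr, i - 1, code, first, newpc, newlacts)        # class 3
    fill_succ(altroot, len(stack), code, first, newpc, newlacts)   # class 2
    fill(altroot, len(stack), code, first, newpc, newlacts)        # class 1
    return newpc, newlacts


# Hypothetical table: 10/12 are "relop expr" in relation = expr relop expr
# (the error occurs at relop); 20/22 are "then body" in the if statement,
# and the return address 20 is on the stack.
CODE = {10: ("nt", "relop", 12, 0), 12: ("nt", "expr", 0, 0),
        20: ("t", "then", 22, 0), 22: ("nt", "body", 0, 0)}
FIRST = {"relop": ["=", "<"], "expr": ["v", "("], "body": ["s", "v"]}
```

In `triple(10, [20], CODE, FIRST)`, the terminal v is first entered as a successor of then (class 4, stack length 0). It is then overwritten as a start of the expr after relop (class 2, stack length 1). Synchronizing at v therefore keeps the stack, as in Example 2.45.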

Heuristic improvements
This synchronization procedure works well in most cases and synchronizes rapidly. However, it is not uncommon for the synchronization to be incorrect, causing spurious error messages or the skipping of longer portions of text. The quality of the synchronization also depends on the grammar. It can be improved by partitioning long grammar productions into several shorter ones, which increases the number of anchors.
We have improved the procedure with two heuristics, which are also independent of the grammar:
1. If several errors occur close together, we print only the first one, assuming that the remaining errors are spurious consequences of the first one. We introduce an error distance, errdist, which is set to 0 after the handling of any error and is increased by one for each input symbol read. If errdist is less than a predetermined limit errdistmin when an error occurs, no error message is given. We use errdistmin = 2, i.e. at least two symbols must have been recognized since the last error; otherwise a spurious error is assumed.
2. When a spurious error occurs, the stack may already have changed from its value when the original error occurred. Therefore, we save the stack at each original error and restore it at a spurious error.
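Heuristic 1 amounts to a small counter. The Python class below is a hypothetical sketch of it that does not model the stack saving of heuristic 2. It assumes that errdist starts at errdistmin, so that the very first error is always reported; the text does not specify the initial value.

```python
class ErrorReporter:
    """Suppress error messages that follow the previous error too closely.
    errdist counts the input symbols read since the last error; an error
    is reported only if errdist >= errdistmin."""

    def __init__(self, errdistmin=2):
        self.errdistmin = errdistmin
        self.errdist = errdistmin      # assumption: the first error is reported
        self.messages = []

    def symbol_read(self):
        """Called by the parser for each input symbol read."""
        self.errdist += 1

    def error(self, position, text):
        """Handle an error; return True if a message was issued."""
        reported = self.errdist >= self.errdistmin
        if reported:
            self.messages.append(f"{position}: {text}")
        self.errdist = 0               # reset after handling any error
        return reported
```

With errdistmin = 2, an error that occurs after only one further symbol is treated as spurious. An error after two or more symbols is reported again.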

The heuristics only apply to the program Error and not to its subprograms.
Error now has the final form:
2.51 Algorithm Error (with heuristic enhancements)
Error(↑pc, ↑altroot):
  global
    correct: boolean;
    lacts: integer;       -- stack length
    errdist: integer;     -- error distance
    errdistmin: integer;  -- minimal error distance
  begin
    correct := false;
    if errdist >= errdistmin then
      Print error message;
      Collect the anchors;
      Save the stack
    else Restore the saved stack
    end;
    Skip input symbols up to the first anchor;
    Correct pc, altroot, and lacts;
    -- It is synchronized. The analysis can continue
    errdist := 0
  end Error
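The reporting part of the heuristic can be tried out with a small Python sketch. It is not Coco's code: the class ErrorReporter, the driver reported_positions and its list of 'ok'/'err' events are hypothetical. The book does not say which value errdist starts with, so the sketch assumes it starts at errdistmin, which makes the very first error always reported.

```python
ERRDIST_MIN = 2  # errdistmin from the text


class ErrorReporter:
    """Sketch of the reporting logic of Algorithm 2.51 (hypothetical class)."""

    def __init__(self):
        # Assumption: start at ERRDIST_MIN so that the first error is reported.
        self.errdist = ERRDIST_MIN
        self.messages = []
        self.saved_stack = []

    def symbol_read(self):
        # errdist grows by one for every input symbol read.
        self.errdist += 1

    def error(self, pos, msg, stack):
        """Return the stack with which the parser should continue."""
        if self.errdist >= ERRDIST_MIN:
            # Genuine error: report it and save the stack.
            self.messages.append((pos, msg))
            self.saved_stack = list(stack)
            result = list(stack)
        else:
            # Spurious error: no message, continue with the saved stack.
            result = list(self.saved_stack)
        self.errdist = 0  # reset after the handling of any error
        return result


def reported_positions(events):
    """Feed one 'ok' or 'err' event per input symbol; return reported positions."""
    reporter = ErrorReporter()
    for pos, event in enumerate(events):
        if event == "err":
            reporter.error(pos, "syntax error", stack=[pos])
        reporter.symbol_read()
    return [pos for pos, _ in reporter.messages]


print(reported_positions(["err", "err", "ok", "ok", "err"]))  # [0, 4]
```

The second error follows the first after only one symbol, so it is treated as spurious. The third error comes after enough symbols to be reported again.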

Coco includes the above error-handling method in the generated parser.
A similar error handling was published by Spenke et al. [1984]. They
assign weights to the anchors and make the use of an anchor for synchronization dependent upon its ‘insertion overhead’ and its ‘reliability’.

3 Semantics

Syntax analysis checks a source program only for formal correctness. That is,
it only determines whether the input string is a sentence of the given grammar.
This function is shown in Fig. 3.1.
Fig. 3.1 Parser: source program (character sequence) → parser → recognized or not recognized

Translation into a target language presents the additional requirement that the
source program must be transformed into the target program. The 'meaning'
of the target program should be the same as that of the source program, i.e.
the semantics should be retained. A program that does this is a compiler (Fig. 3.2).

Fig. 3.2 Compiler: source program → compiler → target program

A compiler emerges from a parser if the parser is able to execute so-called 'semantic actions' each time it has parsed some syntactic construct. The semantic actions in turn generate output symbols which constitute in their entirety the target program.

This chapter covers attributed grammars, which are presently the most
common technique for the formal description of translation processes. To describe the translation the context-free grammar for the source program is enhanced by three items:
1. semantic actions, which describe the actions that must be performed during the translation;
2. attributes, which describe properties of the grammar symbols and their environment;
3. context conditions, which describe relationships between attributes.

We will introduce these three items one by one, then cover the formalism of
the attributed grammar as a whole, and finally cover a subset of the attributed
grammars, the so-called L-attributed grammars, used by Coco.

3.1 Semantic actions

The description of semantic actions can be inserted directly at the desired locations in the grammar productions, e.g. by means of the special delimiters sem ... endsem.
For a left-to-right parsing of a production A → ω₁ω₂, the execution of the semantic action statseq after parsing ω₁ and before parsing ω₂ can be described by inserting the semantic action between ω₁ and ω₂:

A → ω₁ sem statseq endsem ω₂

This production is to be interpreted in such a way that, for the parsing of A, where syntax analysis proceeds from left to right, first ω₁ is parsed, then the semantic action statseq is performed, and afterwards ω₂ is parsed.

For the description of the semantic actions themselves there are no generally accepted conventions. We will use the language constructs of Adele or
Modula-2.
3.1 Example Semantic actions
Given a grammar of an arbitrary sequence of zeros and ones:
S → 0S | 1S | ε

The task consists of reversing a sentence ω of L(G(S)) to produce an output where the first input symbol is output as the last, the second input symbol is output as the next to last, and so on. This translation is simply written as

S → 0S sem Write('0') endsem
  | 1S sem Write('1') endsem
  | ε

For a given input sentence, e.g. ω = 001, the semantic actions can be traced according to the syntax tree of Fig. 3.3.
If parsing is performed top-down from left to right, the output string
100 results.

Fig. 3.3 Syntax tree with semantic actions
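Each semantic action follows the nested S, so it runs only after the rest of the sentence has been parsed. The following minimal recursive-descent sketch in Python makes this visible. The function name reverse_translate is hypothetical, and the output is collected in a string instead of being written.

```python
def reverse_translate(s: str) -> str:
    """S -> 0S sem Write('0') endsem | 1S sem Write('1') endsem | eps"""
    out = []
    pos = 0

    def parse_S():
        nonlocal pos
        if pos < len(s) and s[pos] in "01":
            symbol = s[pos]
            pos += 1            # recognize the terminal 0 or 1
            parse_S()           # parse the nested S
            out.append(symbol)  # semantic action Write(symbol), after S
        # otherwise S -> eps: nothing to parse, nothing to write

    parse_S()
    if pos != len(s):
        raise SyntaxError(f"unexpected symbol {s[pos]!r} at position {pos}")
    return "".join(out)


print(reverse_translate("001"))  # 100
```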

The next example will show that this method can also describe more difficult
transformations.

3.2 Example Semantic actions
Given the grammar of the previous example, the task is to transform an
input sequence of n zeros and m ones into an output sequence of the
same length which contains all n zeros followed by all m ones, i.e. the

sequence 0ⁿ1ᵐ. This translation is described by
S → 0 sem Write('0') endsem S
  | 1S sem Write('1') endsem
  | ε
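The only difference from the previous example is the position of the semantic action. Write('0') comes before the nested S and Write('1') comes after it. The following Python sketch shows this under the same assumptions as before. The function name group_translate is hypothetical, and the output is returned as a string.

```python
def group_translate(s: str) -> str:
    """S -> 0 sem Write('0') endsem S | 1S sem Write('1') endsem | eps"""
    out = []
    pos = 0

    def parse_S():
        nonlocal pos
        if pos < len(s) and s[pos] == "0":
            pos += 1
            out.append("0")  # Write('0') before the nested S
            parse_S()
        elif pos < len(s) and s[pos] == "1":
            pos += 1
            parse_S()
            out.append("1")  # Write('1') after the nested S
        # otherwise S -> eps

    parse_S()
    if pos != len(s):
        raise SyntaxError(f"unexpected symbol {s[pos]!r} at position {pos}")
    return "".join(out)


print(group_translate("10110"))  # 00111
```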

3.2 Attributes

Even for such a simple task as the transformation of the input sequence
'79 + 83' into the output sequence '162', the grammar with semantic actions
fails. In general, it fails for any input sentence consisting of two numbers connected by '+' that is to be translated into an output sequence showing the sum of the two numbers.
Why?
When recognizing a constant, the lexical analyzer supplies only the terminal class c (as explained up to now). Thus, the parser 'sees' only the sequence c + c as input. A semantic action that produces the sum of the two numbers, however, is not satisfied with the terminal classes of the two numbers, but requires the values of the constants. These values are the semantic properties of the individual members of the terminal class c. Thus, a lexical analyzer will have to supply two items for input symbols that are terminal classes: the type and the value of the input symbol. The symbol type (not to be confused with the data type) is the terminal symbol in the context of the grammar (variable, constant), and therefore a syntactic property; the symbol value is a semantic property.
By assigning an attribute to each terminal symbol that represents a terminal class, the semantic properties of terminal classes can be introduced into the formal language description. We write attributes as indices preceded by an arrow, whereby a constant now assumes the form c↑x, where x is of the type integer. The up-arrow shows that x is the result of the parsing of c, i.e. has the character of an output parameter.

By the use of attributes, we can describe the task of reading and adding two constants connected by a plus sign as follows:
S → c↑x + c↑y sem Write(x+y) endsem
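The following Python sketch shows the lexical analyzer supplying both items for the terminal class c: the symbol type, which the parser checks, and the value, which the semantic action uses. The names scan and translate_sum are hypothetical. The sum is returned rather than written.

```python
import re


def scan(text):
    """Return (symbol type, value) pairs; only the class c carries a value."""
    tokens = []
    for lexeme in re.findall(r"[0-9]+|\S", text):
        if lexeme[0].isdigit():
            tokens.append(("c", int(lexeme)))  # terminal class c with its value
        else:
            tokens.append((lexeme, None))      # other symbols: type only
    return tokens


def translate_sum(text):
    """S -> c↑x + c↑y  sem Write(x+y) endsem"""
    tokens = scan(text)
    if [typ for typ, _ in tokens] != ["c", "+", "c"]:
        raise SyntaxError("expected: c + c")  # the parser sees only types
    x = tokens[0][1]  # attribute x of the first c
    y = tokens[2][1]  # attribute y of the second c
    return x + y      # semantic action Write(x+y), returned instead of printed


print(translate_sum("79 + 83"))  # 162
```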

In general, attributes describe properties that are associated with a grammar symbol. Therefore, nonterminals can also have one or more attributes. For example, let the following three properties apply to the symbol expr: (1) 'type of expression', (2) the expression has no operators, and (3) the expression is translatable at compile-time. Then we can assign these three attributes to expr by writing
expr↑exprtype ↑simple ↑valueknown

exprtype may assume various values dependent on its data type; simple and
valueknown can assume boolean values. In general, one can assign to each
nonterminal and to each terminal class X of the context-free grammar a number of attributes that describe those properties of X that cannot be described
by the context-free grammar alone. Each attribute can assume a predetermined
number of values. These form the attribute type. The attributes of terminal
classes receive their value through the recognition of the terminal symbols by the lexical analyzer. The values of the attributes of all nonterminals are calculated by the semantic actions.


3.3 Example Interpretation of arithmetic expressions

Consider the grammar of arithmetic expressions consisting of numbers,
operators, and parentheses, and terminated by a semicolon:
S → E ;
E → T | E + T
T → F | T * F
F → c | ( E )

We want to define formally the meaning of such an expression by a description of its interpretation. ‘Interpretation’ means that an expression
will be read, its value computed, and the result printed. In the formal description it must be stated that each symbol of the grammar, except for
operators, parentheses, and semicolons, has a value. This value is denoted by an attribute. For example, the production F → c is verbally interpreted by the sentence 'the value of the factor F is the value of the constant c' and formally by the production:
F↑a → c↑b sem a:=b endsem

Similarly, multiplication is described by the attributed production:
T↑a → T↑b * F↑c sem a:=b*c endsem

This means: "When recognizing the right-hand side, the attributes b and
c are assigned a value, and subsequently the product of these values is
computed, and assigned to the attribute a of the symbol T". Correspondingly, the remaining productions of the grammar can be assigned attributes and semantic actions, so the complete description is as follows:
S   → E↑a ;          sem Write(a) endsem
E↑a → T↑b            sem a:=b endsem
    | E↑b + T↑c      sem a:=b+c endsem
T↑a → F↑b            sem a:=b endsem
    | T↑b * F↑c      sem a:=b*c endsem
F↑a → c↑b            sem a:=b endsem
    | ( E↑b )        sem a:=b endsem

Such a description is called an attributed grammar.

A simplified notation
The reader may notice that in Example 3.3 most semantic actions consist of
only an assignment. It is therefore a useful shortcut to abbreviate
F↑a → c↑b sem a:=b endsem
by
F↑a → c↑a
This notation expresses the fact that the attribute of c is assigned to the output attribute of F without change.

Attributes and semantic actions in EBNF
The extended Backus-Naur form can also be used for the description of attributed grammars. Example 3.4 is the same as Example 3.3 but uses the simplified notation in EBNF form.
3.4 Example Interpretation of arithmetic expressions in EBNF
S   = E↑a ";"                  sem Write(a) endsem .
E↑a = T↑a { "+" T↑b            sem a:=a+b endsem } .
T↑a = F↑a { "*" F↑b            sem a:=a*b endsem } .
F↑a = c↑a | "(" E↑a ")" .

With this notation, one can see how the visual separation of syntax and
semantics significantly improves readability.
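The EBNF form maps directly onto recursive descent. Each nonterminal becomes a procedure whose return value is its synthesized attribute, and each repetition { ... } becomes a loop. The following Python sketch follows Example 3.4 production by production. The class name Interpreter is hypothetical, the method names mirror the nonterminals, and the result is returned instead of written.

```python
import re


class Interpreter:
    """Recursive-descent interpreter for Example 3.4 (hypothetical class).

    Each parsing method returns the synthesized attribute a of its
    nonterminal; the EBNF repetitions become while loops."""

    def __init__(self, text):
        self.tokens = re.findall(r"[0-9]+|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, symbol):
        if self.peek() != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {self.peek()!r}")
        self.pos += 1

    def S(self):  # S = E↑a ";"  sem Write(a) endsem
        a = self.E()
        self.expect(";")
        return a  # returned instead of written

    def E(self):  # E↑a = T↑a { "+" T↑b  sem a:=a+b endsem }
        a = self.T()
        while self.peek() == "+":
            self.pos += 1
            a = a + self.T()
        return a

    def T(self):  # T↑a = F↑a { "*" F↑b  sem a:=a*b endsem }
        a = self.F()
        while self.peek() == "*":
            self.pos += 1
            a = a * self.F()
        return a

    def F(self):  # F↑a = c↑a | "(" E↑a ")"
        token = self.peek()
        if token is not None and token.isdigit():
            self.pos += 1
            return int(token)
        self.expect("(")
        a = self.E()
        self.expect(")")
        return a


print(Interpreter("2 * (3 + 4);").S())  # 14
```

The left recursion E → E + T of Example 3.3 would make a naive recursive-descent procedure call itself forever. The EBNF repetition avoids this.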

Input and output attributes
All of the previously used attributes behave like output parameters: they are
generated by the parsing of a terminal or a nonterminal, and are used afterwards. We therefore call them derived or synthesized attributes and denote
them by an up-arrow. But nonterminals can also have attributes that behave as
input parameters, i.e. attributes that already have values when the parsing of
the nonterminal starts. Then, semantic actions which are executed during the
parsing of the nonterminal can use these values. We call such attributes inherited attributes, and denote them by a down-arrow. The next example
shows the application of inherited attributes.


3.5 Example Inherited attributes
Given the following grammar, which describes the declaration of variables:
S → dcl typ idlist
typ → int | real | bool
idlist → id | idlist , id
id is the terminal class of all identifiers. The declaration consists of a keyword dcl, a type, and one or more variables of this type, for example: dcl int x, y, z. The semantic action, which should be performed during parsing of the declaration, consists of entering the name (name) and the type (t) of each variable into the name list. Let this be done by a call of the procedure NewId(↓name, ↓t). It is appropriate to call NewId immediately after the parsing of an identifier id in the production for idlist. But how can one know the type at this point, since it was already parsed in the production for typ?
The solution is to attach the type t as an inherited attribute to the nonterminal idlist:
S → dcl typ↑t idlist↓t
idlist↓t → id↑name               sem NewId(↓name, ↓t) endsem
         | idlist↓t , id↑name    sem NewId(↓name, ↓t) endsem

Output attributes of a grammar symbol A are computed during the parsing of the right-hand side of the A-production, and can thus be used during the parsing of other grammar productions that contain A as a part of their right-hand side. Thus the information flows from the bottom to the top, from the leaves to the sentence symbol. Input attributes of a nonterminal A are computed prior to parsing of the A-production, and are used during its parsing.
Thus the information flows from top to bottom in the syntax tree, from the
sentence symbol to the leaves. Output attributes of A describe properties of
the A-phrase, and its constituent phrases. Input attributes of A denote properties of the environment of the A-phrase.
Figure 3.4 shows a syntax tree ‘decorated’ with attributes for the
sentence:
dcl int x, y, z

The flow of attribute values along the dashed lines can easily be seen.

Fig. 3.4 Analysis of the sentence dcl int x,y,z. The attributes flow along the dashed lines
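In a recursive-descent implementation, an output attribute becomes a return value and an input attribute becomes a parameter. The following Python sketch applies this to Example 3.5. The names parse_dcl and TYPES are hypothetical. A Python list stands in for the name list, and the local function new_id plays the role of NewId. The left-recursive idlist is written as a loop.

```python
import re

TYPES = ("int", "real", "bool")


def parse_dcl(text):
    """S -> dcl typ↑t idlist↓t: t is returned by typ and passed to idlist."""
    tokens = re.findall(r"\w+|\S", text)
    pos = 0
    name_list = []  # hypothetical stand-in for the compiler's name list

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def new_id(name, t):  # plays the role of NewId(↓name, ↓t)
        name_list.append((name, t))

    def typ():  # typ↑t: output attribute as return value
        t = next_token()
        if t not in TYPES:
            raise SyntaxError(f"type expected, found {t!r}")
        return t

    def idlist(t):  # idlist↓t: input attribute as parameter
        while True:
            name = next_token()  # id↑name
            if not name.isidentifier():
                raise SyntaxError(f"identifier expected, found {name!r}")
            new_id(name, t)      # sem NewId(↓name, ↓t) endsem
            if pos < len(tokens) and tokens[pos] == ",":
                next_token()
            else:
                break

    if next_token() != "dcl":
        raise SyntaxError("'dcl' expected")
    idlist(typ())
    if pos != len(tokens):
        raise SyntaxError("unexpected input after declaration")
    return name_list


print(parse_dcl("dcl int x, y, z"))  # [('x', 'int'), ('y', 'int'), ('z', 'int')]
```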

3.3 Context conditions

The formal syntax description of a programming language is not sufficient to
distinguish between correct and incorrect programs. For example, in a programming language where all variables must be explicitly declared, the following code may be syntactically correct, even though it does not represent a
valid program since the variables x and y are not declared.

PROCEDURE P;
  VAR a, b: INTEGER;
BEGIN
  x := y
END P
If a programming language definition states 'each variable in an assignment statement must be declared', this defines a relationship between textually separated language elements, which cannot be represented by a context-free grammar. Such constraints are thus called context conditions and are usually considered as part of the semantics since they cannot be described syntactically.
The total set of context conditions is called the static semantics of the programming language. The word static signifies that they refer to the source
code and not to the execution of the program.
Programming languages are full of context conditions. It would be desirable if the language definition contained explicit definitions for them, separating them from the other parts of the language definition and stressing their
importance. Unfortunately, this is rarely the case since they are often buried
implicitly in other definitions. Sometimes they are missing altogether since the
author wants a small defining document, or because it is assumed that the
reader understands them.
Attributed grammars also permit the formal description of context conditions. The context condition is expressed as a relation between attributes. For
example, the context condition 'the left side and the right side of an assignment must be of the same type’ imposes a relation between the type attributes
of both sides. If
assign = id↑typ1 ":=" expr↑typ2 .
is the production for the assignment, where typ1 and typ2 are the types of id and expr, then the context condition is typ1 = typ2. The context condition can be written separately from the production in the form
assign = id↑typ1 ":=" expr↑typ2 .
CC: typ1 = typ2
or it can be integrated into the production, e.g. in a manner proposed by Watt and Lehrmann Madsen [1983]:
assign = id↑typ1 ":=" expr↑typ2 where(typ1 = typ2) .

The first form separates the context condition from the production in a firmer
manner and is especially suited for several long context conditions. The
second form emphasizes the coherence between production and context
condition.
According to van Wijngaarden's two-level grammar, the part where(...)
can be regarded as a nonterminal that is derived into an empty string if the
relationship inside the parentheses is true. It cannot be derived into a terminal
string if it is false. If typ1 = typ2, the syntax analysis of where(typ1 = typ2) then results in the empty string, so that an assignment is parsed with the remaining part of the production. However, if typ1 ≠ typ2, the terminal string representing the assignment statement is rejected since the where-part is not terminating.


We use the style with where and define the point of execution of the test
of the context condition by its position in the production in the following way.
The production
A = ω₁ where(CC) ω₂ .
means that in order to parse A, we must execute a syntax analysis from left to right that will parse ω₁ first. Thereafter the context condition CC is tested. If it is not met, an error will be reported. Then ω₂ will be parsed.
The following examples show the application of context conditions.
3.6 Example A context-sensitive language

The language {aⁿbⁿcⁿ: n ≥ 1} is not context-free. It is shown in textbooks on formal languages that no context-free grammar exists for this language. However, the following attributed grammar with a context condition is easily constructed:
S   = A↑p B↑q C↑r where(p=q=r) .
A↑p = a sem p:=1 endsem {a sem p:=p+1 endsem} .
B↑q = b sem q:=1 endsem {b sem q:=q+1 endsem} .
C↑r = c sem r:=1 endsem {c sem r:=r+1 endsem} .

Here, p,q, and r represent the counts of the characters a, b, and c.
The context condition requires that they are equal.
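The following Python sketch separates the two kinds of check. The context-free part (a run of a's, then b's, then c's) is handled by parsing. The counts are computed by the semantic actions of A, B and C, and where(p=q=r) is then tested on them. The function names count_run and check are hypothetical. A syntax error raises an exception, while a violated context condition simply yields False.

```python
def count_run(s, pos, ch):
    """X↑p = ch sem p:=1 endsem {ch sem p:=p+1 endsem}; returns (p, new pos)."""
    if pos >= len(s) or s[pos] != ch:
        raise SyntaxError(f"{ch!r} expected at position {pos}")
    p = 1                      # sem p:=1 endsem
    pos += 1
    while pos < len(s) and s[pos] == ch:
        p += 1                 # sem p:=p+1 endsem
        pos += 1
    return p, pos


def check(s):
    """S = A↑p B↑q C↑r where(p=q=r) .  True if the context condition holds."""
    p, pos = count_run(s, 0, "a")
    q, pos = count_run(s, pos, "b")
    r, pos = count_run(s, pos, "c")
    if pos != len(s):
        raise SyntaxError(f"unexpected symbol at position {pos}")
    return p == q == r         # where(p=q=r)


print(check("aabbcc"), check("aabbc"))  # True False
```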
3.7 Example Context condition
The context condition 'in the declaration of an array, both index bounds must be of type integer, and the lower bound must not be greater than the upper bound' can be described as follows:
arraydeclaration = id↑name "(" constant↑c1↑typ1 ":" constant↑c2↑typ2 ")"
                   where((typ1 = typ2 = integer) & (c1 ≤ c2)) .
where c1 and c2 represent the numerical values of the bounds.
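If the attributes are computed during parsing, testing the where-part is a simple function of them. The following Python sketch assumes that types are represented as strings. The function name array_declaration_errors is hypothetical. One design choice is that the bounds are compared only once both are known to be integers, so a type error and a bound error are not reported together.

```python
def array_declaration_errors(name, c1, typ1, c2, typ2):
    """Check where((typ1 = typ2 = integer) & (c1 <= c2)) for one array
    declaration; returns a list of error messages (empty if it holds)."""
    if not (typ1 == typ2 == "integer"):
        return [f"{name}: both index bounds must be of type integer"]
    if c1 > c2:  # compared only once both bounds are known to be integers
        return [f"{name}: lower bound {c1} is greater than upper bound {c2}"]
    return []


print(array_declaration_errors("a", 1, "integer", 10, "integer"))  # []
print(array_declaration_errors("b", 5, "integer", 1, "integer"))
```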

3.8 Example Context condition
The context condition 'each variable appearing in a statement must have been previously declared' can be described as follows. One must distinguish syntactically the applied occurrence of a name (in a statement) from the defining occurrence (in a declaration), with the additional syntax rule:
var = id .
The nonterminal var denotes the applied occurrence of the name id. Therefore, var must be written in all statements in place of id. If a semantic procedure IsDeclared(↓name) is used to check the symbol list to see if the name of the variable is declared, the context condition can be simply formulated as follows:
var = id↑name where(IsDeclared(↓name)) .
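The following Python sketch applies this context condition to procedure P from the beginning of this section. The class SymbolList and the function check_var are hypothetical. Defining occurrences add names to the symbol list, and every applied occurrence is checked with the equivalent of IsDeclared.

```python
class SymbolList:
    """Hypothetical symbol list: defining occurrences add names."""

    def __init__(self):
        self._names = set()

    def declare(self, name):
        self._names.add(name)

    def is_declared(self, name):  # the semantic procedure IsDeclared(↓name)
        return name in self._names


def check_var(name, symbols, errors):
    """var = id↑name where(IsDeclared(↓name)) ."""
    if not symbols.is_declared(name):
        errors.append(f"{name} is not declared")
    return name


# Procedure P: a and b are declared, but x and y are used in x := y.
symbols = SymbolList()
for declared in ("a", "b"):
    symbols.declare(declared)
errors = []
for applied in ("x", "y"):
    check_var(applied, symbols, errors)
print(errors)  # ['x is not declared', 'y is not declared']
```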

If a context condition is not met, this usually affects the execution of the
subsequent semantic actions, but this cannot be expressed well in the
attributed grammar. In Coco, we therefore avoid explicit context conditions, replacing their checking with semantic actions (see Section 3.6).
However, for the description of the static semantics of programming languages context conditions are very suitable.

3.4 Attributed grammars

In the previous sections we have introduced the elements of attributed grammars. We now consider them in their entirety. In the literature the concept of
an attributed grammar is defined in many different ways (see for example,
Räihä [1977], Tienari [1980], Watt and Lehrmann Madsen [1983]). We will follow Waite and Goos [1984].
3.9 Definition Attributed grammar
An attributed grammar is a quadruple AG = (G, A, R, K):
G = (V_N, V_T, P, S) is a reduced context-free grammar; A is a finite set of attributes; R is a finite set of semantic actions; and K is a finite set of context conditions. With each symbol X ∈ V_T ∪ V_N, zero or more attributes from A are associated. With each production zero or more semantic actions from R and zero or more context conditions from K are associated. For each occurrence of a nonterminal X in the syntax tree of a sentence of L(G) the attributes of X can be computed in at most one way by semantic actions.

The attribute computation process
In the concept of attributed grammars, it is essential that the definition says
nothing about the order in which the semantic actions are executed. In the previous examples, we assumed that syntax analysis was performed top-down
from left to right, and that the semantic actions were executed in the same
order. However, according to Definition 3.9, this is not required. The order of
the semantic actions is not predetermined by some syntax-analysis method:
rather, it is free. This eliminates the necessity of putting the semantic actions
and context conditions in particular places of the right-hand side of the grammar productions. All semantic actions and context conditions that belong to a
syntax production can be summarized and written at the end of the production.
In the general case, the translation runs in two phases:

1. syntax analysis, which constructs a syntax tree;
2. execution of semantic actions, which mainly compute the attribute values attached to the nodes of the syntax tree in an arbitrary order.

Step 2 implies that an ‘attribute computation process’ will traverse the syntax
tree in an arbitrary manner and compute the values of the unknown attributes
at each node. A semantic action can be executed at a specific time if and only if
all attribute values which contribute to the computation are known at that time.
The attribute computation process continues until all attribute values are calculated. It is therefore possible that the attribute computation process must traverse the syntax tree several times, up and down, criss-crossing from left to
right. In order to avoid ambiguous computations of attributes, the definition of
attributed grammar contains the sentence: 'For each appearance of a nonterminal X, the attributes of X can be calculated in at most one way'.
3.10 Example Variable declaration
In Pascal, variables are declared by their enumeration after the keyword

var, and the type follows the list of variables. For example,
var x,y,z: integer
The semantic actions implied by the declaration may consist of a call to a procedure NewId(↓name, ↓t) which appends the name and type of the variable to the name list. In a strict translation from left to right, this construct leads to difficulties, since the type is known only after all names have been parsed, and therefore NewId cannot be called immediately after recognizing a name. In an attributed grammar, these difficulties do not arise if it is formulated as follows:
1  declaration = "var" idlist↓t0 ":" typ↑t1    sem t0:=t1 endsem .
2  idlist↓t1   = id↑name                      sem NewId(↓name, ↓t1) endsem
               | id↑name "," idlist↓t2        sem NewId(↓name, ↓t1); t2:=t1 endsem .


For the source text var x,y,z: integer a syntax tree is first generated, in which all attributes except those of terminal classes have no values (see Fig. 3.5).

Fig. 3.5 Analysis of the sentence var x,y,z: integer with the flow of attributes along the dashed lines

The attribute computation process now starts at an arbitrary node in order to compute the missing attributes, and to call procedure NewId. Wherever it starts, the first semantic action that can be executed is t0 := t1 in production 1. Then, t2 := t1 and NewId(↓name, ↓t1) in production 2 can be executed. This process continues along the dashed lines until all of the semantic actions are executed.
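One simple strategy for the attribute computation process is to sweep over all pending semantic actions repeatedly and execute each one as soon as its inputs are known. This is one possible strategy among many. The following Python sketch uses it. The names compute_attributes and var_declaration_example are hypothetical. An attribute name such as "y.t1" is an assumed naming scheme meaning attribute t1 of the idlist node whose identifier is y. The actions are deliberately listed bottom-up to show that the order of the list does not matter. If a sweep makes no progress, the remaining actions can never run, which is what happens with a cyclic semantic dependency.

```python
def compute_attributes(actions, known):
    """actions: list of (inputs, fn); fn gets the known attribute values and
    returns a dict of newly computed ones. Sweep until all actions have run."""
    values = dict(known)
    pending = list(actions)
    while pending:
        waiting = []
        for inputs, fn in pending:
            if all(name in values for name in inputs):
                values.update(fn(values))   # execute the semantic action
            else:
                waiting.append((inputs, fn))
        if len(waiting) == len(pending):
            raise ValueError("no executable semantic action left (cycle?)")
        pending = waiting
    return values


def var_declaration_example():
    """The tree for: var x,y,z: integer (Example 3.10)."""
    name_list = []

    def idlist_action(name, t1, t2=None):
        # sem NewId(↓name, ↓t1); t2:=t1 endsem  (no t2 for the last identifier)
        def fn(values):
            name_list.append((name, values[t1]))
            return {t2: values[t1]} if t2 else {}
        return ([t1], fn)

    actions = [
        idlist_action("z", "z.t1"),
        idlist_action("y", "y.t1", "z.t1"),
        idlist_action("x", "x.t1", "y.t1"),
        (["typ.t1"], lambda values: {"x.t1": values["typ.t1"]}),  # t0 := t1
    ]
    compute_attributes(actions, {"typ.t1": "integer"})
    return name_list


print(var_declaration_example())
# [('x', 'integer'), ('y', 'integer'), ('z', 'integer')]
```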


In Example 3.10, the order in which the three calls to NewId are executed is
not determined by the attributed grammar, but rather depends on the strategy
of the attribute computation process. In most cases, the order is unimportant,
and therefore this kind of attributed grammar is adequate. If desired, a particular order can be imposed by introducing additional attributes.
Cyclic semantic dependencies
Attributed grammars can be constructed in which the attribute computation
process does not terminate since some attributes depend on themselves. This
is called a cyclic semantic dependency. In Definition 3.9, this possibility is
covered with the sentence: 'For each appearance of a nonterminal X, the attributes of X can be calculated in at most one way'. There are algorithms that
can check the grammar for this property (Knuth [1968], Waite and Goos
[1984]). If an attributed grammar of the general form described above has
been defined, it must first be checked for cyclic semantic dependencies, and
possibly transformed into a well-defined form.

3.5 L-attributed grammars

Great effort is required to translate an attributed grammar as described in the
previous section. First, the syntax tree of the program to be translated must be
generated, and each of its nodes must be ‘decorated’ with the attributes. Then
the syntax tree must be traversed more than once to compute the attributes until
all attributes are determined. Nowadays storage and run-time requirements confine this method to mainframes, if it is regarded as practical at all.
Hence, special forms of attributed grammars are needed for compilers, permitting the computation of the attributes in a single pass from left to right through the syntax tree. Then the semantic actions can be executed in parallel with the syntax analysis and no syntax tree is needed. Such attributed grammars are called L-attributed (i.e. left attributed) according to Lewis et al. [1976]. All examples in Sections 3.1 through 3.3 are of this kind. The limitations imposed on attributed grammars to make them L-attributed are related only to the order of the attribute occurrences in a production. Each inherited attribute a of a grammar symbol X on the right-hand side of a production must be computable before X can be recognized. Therefore, for its computation only those attributes can be used that are known prior to the parsing of X. From this, the following definition follows:


3.11 Definition L-attributed grammar
An attributed grammar is called L-attributed if for each of its productions Y → X₁ … Xₙ the following is true: An input attribute of Xⱼ depends only on the input attributes of Y and on the output attributes of X₁, …, Xⱼ₋₁.

It can easily be checked by inspection whether a given grammar is L-attributed according to this definition.
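The check in Definition 3.11 is mechanical. The following Python sketch checks one production at a time. The production representation is hypothetical: one (inputs, outputs) pair per right-hand-side symbol, where inputs maps each input attribute to the set of attributes it depends on. The sketch applies this check to Example 3.5, which passes, and to production 1 of Example 3.10, where t0 depends on t1, an output of a symbol further to the right.

```python
def is_l_attributed(lhs_inputs, rhs):
    """Check Definition 3.11 for one production Y -> X1 ... Xn.

    lhs_inputs: names of the input attributes of Y.
    rhs: one (inputs, outputs) pair per Xj; inputs maps each input attribute
    of Xj to the attributes it depends on, outputs lists the outputs of Xj."""
    available = set(lhs_inputs)  # input attributes of Y
    for inputs, outputs in rhs:
        for deps in inputs.values():
            if not set(deps) <= available:
                return False     # depends on something not yet known
        available |= set(outputs)  # usable by the symbols to the right
    return True


# Example 3.5: S -> dcl typ↑t idlist↓t
example_3_5 = [({}, []), ({}, ["t"]), ({"t_idlist": {"t"}}, [])]
# Example 3.10, production 1: declaration = "var" idlist↓t0 ":" typ↑t1
example_3_10 = [({}, []), ({"t0": {"t1"}}, []), ({}, []), ({}, ["t1"])]
print(is_l_attributed([], example_3_5), is_l_attributed([], example_3_10))
# True False
```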
The question is, how far can one get with an L-attributed grammar, and
what do the limitations mean? The general attributed grammars are indisputably the more powerful tool. The user does not need to be concerned about the
processing order of attributes (and possibly storage of intermediate results)
since this is all done automatically by the attribute computation process. The
description is essentially static and thus 'in principle' simple. In reality, such
descriptions can be cumbersome and difficult to understand, particularly in the
presence of many attributes.
L-attributed grammars can be used to describe the translation of nearly all
important language constructions. However, in many cases more context must
be used for the translation. This is expressed by the necessity of saving intermediate results in lists, stacks, etc. In Section 3.6 it is shown how the non-L-attributed grammar of Example 3.10 can be easily replaced by an L-attributed
grammar with semantic actions for temporarily saving variable names. The
worst that can happen is that the order of the semantic actions which is imposed by the use of the L-attributed grammar will require the partitioning of the
translation into several passes in which each pass can be defined by an L-attributed grammar. In view of these disadvantages, Waite and Goos [1984]
say: 'L-attributed grammars are inadequate, even in comparatively simple
cases.' We do not agree with this categorical statement. In most cases, the
simplicity and the ease of implementation of L-attributed grammars more than
compensate for their disadvantages. Therefore we feel that they are a very
suitable tool for compiler implementations, at least as long as our computers
are limited in memory and speed.
Coco processes only L-attributed grammars, and all attributed grammars
in the following chapters of this book are L-attributed.
Algorithmic interpretation of L-attributed grammars

While general attributed grammars are a declarative and therefore non-algorithmic formalism, L-attributed grammars can also be regarded as algorithmic descriptions, imposing an order in which semantic actions have to be executed.

Programmers who are used to thinking algorithmically will find it easier to follow
this approach. Therefore, we understand an L-attributed grammar as a very
high-level algorithmic language in the following sense.
The context-free portion of a production
A = α₁ | α₂ | … | αₙ .
denotes the algorithm: 'Parse the nonterminal A by choosing the matching alternative αᵢ, and recognizing its components sequentially from left to right.'
Each alternative with a semantic action of the form
αᵢ = X₁ … Xⱼ sem SA endsem Xⱼ₊₁ … Xₙ
denotes the algorithm: 'Parse X₁ through Xⱼ, then execute the semantic action SA, and then parse Xⱼ₊₁ through Xₙ.'
Each alternative with a context condition of the form
αᵢ = X₁ … Xⱼ where(CC) Xⱼ₊₁ … Xₙ
denotes the algorithm: 'Parse X₁ through Xⱼ, then test the context condition CC (and report any errors), and then parse Xⱼ₊₁ through Xₙ.'
An attributed production of the form
A↓a0↑b0 = X↓a1↑b1 Y↓a2↑b2 .
denotes the following algorithm:
1. compute a1 (using semantic actions that are not stated here, which must precede X and may depend on a0);
2. parse X (thereby b1 gets a value);
3. compute a2 (using semantic actions that are not stated here, which must precede Y and may depend on a0, a1, b1);
4. parse Y (thereby b2 gets a value);
5. compute b0 (using semantic actions that are not stated here, which may depend on a0, a1, b1, a2, b2).

This algorithmic interpretation adds as a further clause to the definition of L-attributed grammars (Definition 3.11) the sentence: 'Attributes that are used as arguments in a semantic action or context condition between the grammar symbols Xⱼ and Xⱼ₊₁ can only be input attributes of the left-hand side of the production and output attributes of X₁ to Xⱼ.'
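The five steps correspond to a procedure whose parameters are the input attributes and whose result is the output attribute. The following Python sketch shows this correspondence. The arithmetic in parse_A, parse_X and parse_Y is arbitrary and only serves to produce values. The trace list records the order in which the steps run.

```python
def parse_A(a0, trace):
    """A↓a0↑b0 = X↓a1↑b1 Y↓a2↑b2 .  (the computations are placeholders)"""
    a1 = a0 * 2               # 1. compute a1 (may use a0)
    trace.append("compute a1")
    b1 = parse_X(a1, trace)   # 2. parse X, b1 gets a value
    a2 = a0 + a1 + b1         # 3. compute a2 (may use a0, a1, b1)
    trace.append("compute a2")
    b2 = parse_Y(a2, trace)   # 4. parse Y, b2 gets a value
    trace.append("compute b0")
    return b1 + b2            # 5. compute b0 (may use a0, a1, b1, a2, b2)


def parse_X(a1, trace):       # stand-in for X↓a1↑b1
    trace.append("parse X")
    return a1 + 1


def parse_Y(a2, trace):       # stand-in for Y↓a2↑b2
    trace.append("parse Y")
    return a2 * 10


steps = []
print(parse_A(1, steps))  # 63
print(steps)
```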

3.6 Implementation of the semantic interface

The implementation of the semantic interface in a compiler compiler and in the
generated compiler consists of three tasks:

1. translation and storage of semantic actions during compiler generation time and execution of semantic actions at run-time of the generated compiler;
2. translation and storage of context conditions during compiler generation time and test of context conditions at run-time of the generated compiler;
3. reserving memory for attributes at compiler generation time and attribute passing at run-time of the generated compiler.

These tasks are most simply and directly implemented if the generated compiler performs its syntax analysis with the popular method of recursive descent,
which is not covered in this book (Gries [1971], Hartmann [1977], Wirth
[1986]). In this, semantic actions and context conditions are directly embedded as code in the syntactic procedures, and attributes become parameters of
the syntactic procedures. The simplicity of this kind of semantic interface
makes the method of recursive descent still attractive today for hand-coded
compilers. If the generated compiler performs a table-driven syntax analysis,
then somewhat more effort is required for the semantic interface. In this
section, we cover the method used by Coco.

Semantic actions
The semantic actions are numbered. The order is arbitrary, but it is easiest to
order them as they appear in the attributed grammar. We start the numbering at
12 for reasons that follow. All semantic actions are placed in the single procedure Semant as follows:
Semant(↓nr):
  case nr of
    12: Semantic Action 12
  | 13: Semantic Action 13
    ...
  | n: Semantic Action n
  end
end Semant

The G-code is extended to provide as many instructions as there are semantic actions. The G-code instructions treated in Section 2.4 (and two more, see Definition 3.14) have operation codes 0 through 11. Operation codes 12 through 255 correspond to semantic actions 12 through 255. Thus, Coco has a limit of 244 semantic actions, which is unlikely to be reached in practice: we need only 68 semantic actions to describe the attributed grammar of Coco itself, and 126 semantic actions for the largest pass of a Modula-2 compiler.
For the processing of semantic actions, the parser of Algorithm 2.44 needs to be extended only by an if statement:

3.12 Parser with semantic interface

    Parse (↑correct):
      loop
        case opcode of
          ...
        else
          if correct then Semant (↓opcode) end   -- perform semantic action
        end -- case
      end -- loop
    end Parse
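The dispatch in Algorithm 3.12 can be sketched in Python as follows. This is a minimal sketch, assuming a toy G-code with only two hypothetical syntactic instructions (T sy to match a terminal, RET to finish) standing in for the real opcodes 0 to 11 of Section 2.4; their numeric values are our own choice. Every opcode from 12 to 255 is handed to semant, but only while no syntax error has been detected. Real error recovery is omitted.

```python
# Hypothetical opcode numbers: the real G-code of Section 2.4 uses 0..11;
# only two stand-ins are modelled here.
T, RET = 0, 1                  # T sy: match terminal sy;  RET: end of parse
FIRST_SEM, LAST_SEM = 12, 255  # opcodes reserved for semantic actions

def parse(code, tokens, semant):
    """Interpret a flat G-code list and return the value of 'correct'."""
    pc, pos, correct = 0, 0, True
    while True:
        op = code[pc]
        if op == T:
            if pos < len(tokens) and tokens[pos] == code[pc + 1]:
                pos += 1
            else:
                correct = False    # real error recovery omitted
            pc += 2
        elif op == RET:
            return correct
        elif FIRST_SEM <= op <= LAST_SEM:
            if correct:            # if correct then Semant(↓opcode) end
                semant(op)
            pc += 1
        else:
            raise ValueError(f"opcode {op} not modelled in this sketch")
```

With code `[T, "a", 12, T, "b", 13, RET]` and input `a c`, action 12 runs but action 13 is suppressed, because `correct` has become false by then. The range 12 to 255 also makes the limit of 244 actions explicit.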

We will now study this method in more detail by means of an example in which an L-attributed grammar translates the following declaration:

    var x, y, z: integer;

(In Example 3.10 we have already given a general attributed grammar for this task.) Before the identifiers can be added to the name list together with their type, they must be stored temporarily, because the type is known only after the whole identifier list has been read. For this purpose we use a queue as an abstract data structure with the access procedures InitQueue, Enqueue, Dequeue, and EmptyQueue, whose meaning is obvious. The attributed grammar is as follows:
    declaration =
      "var" ident↑name    sem InitQueue; Enqueue(↓name) endsem
      { "," ident↑name    sem Enqueue(↓name) endsem }
      ":" type↑t          sem while not EmptyQueue do
                                Dequeue(↑x); NewId(↓x↓t)
                              end
                          endsem
      ";" .

The numbering of the semantic actions and their integration into the procedure Semant results in the following:

    Semant (↓nr):
      local name, x: Nametype; ...
      begin
        case nr of
          12: InitQueue; Enqueue(↓name)
        | 13: Enqueue(↓name)
        | 14: while not EmptyQueue do
                Dequeue(↑x); NewId(↓x↓t)
              end
        end
      end Semant

The attributes are local variables of Semant. In general, this means that all names contained in a semantic action (enclosed between sem and endsem) are global to this semantic action and are therefore shared with all other semantic actions.
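The Semant procedure for the declaration example can be mimicked in Python. In this sketch the attributes name, t and x are fields of one object, shared by all actions just as the locals of Semant are. The action numbers follow the Semant listing above. NewId and the dictionary used as a name list are hypothetical stand-ins for the compiler's real symbol table.

```python
from collections import deque

class DeclarationSemantics:
    """Semant for  var x, y, z: integer;  Attributes are fields shared by all
    actions, mirroring Coco's 'attributes are local variables of Semant'."""

    def __init__(self):
        self.name = None        # ident↑name, assigned by the scanner
        self.t = None           # type↑t
        self.x = None
        self.queue = deque()
        self.name_list = {}     # hypothetical stand-in for the real name list

    def new_id(self, name, typ):
        """Hypothetical NewId: enter name with its type into the name list."""
        self.name_list[name] = typ

    def semant(self, nr):
        if nr == 12:
            self.queue.clear()                  # InitQueue
            self.queue.append(self.name)        # Enqueue(↓name)
        elif nr == 13:
            self.queue.append(self.name)        # Enqueue(↓name)
        elif nr == 14:
            while self.queue:                   # while not EmptyQueue do
                self.x = self.queue.popleft()   # Dequeue(↑x)
                self.new_id(self.x, self.t)     # NewId(↓x↓t)
```

To simulate the parser on `var x, y, z: integer;`, set `name` before calling actions 12, 13, 13, then set `t` and call action 14. All three identifiers end up in the name list with type integer.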

Context conditions
Context conditions are not treated as an independent language element in Coco. Rather, they are represented as semantic actions. Instead of

    where (CC)

we write, for example,

    sem if not CC then SemErr end endsem

where SemErr is a semantic error processing procedure.
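As an illustration, here is a minimal Python sketch of a context condition written as an ordinary semantic action. We assume the hypothetical condition "the identifier has not been declared yet", and a SemErr that merely records an error number. Neither detail comes from Coco itself.

```python
def check_declarations(names):
    """Simulates one semantic action per declared identifier containing the
    context condition 'name not yet declared'; returns SemErr's error numbers."""
    declared, errors = set(), []

    def sem_err(nr):                 # hypothetical SemErr: just record nr
        errors.append(nr)

    for name in names:
        cc = name not in declared    # the context condition CC
        if not cc:                   # sem if not CC then SemErr end endsem
            sem_err(1)               # 1 = hypothetical "declared twice"
        declared.add(name)
    return errors
```

`check_declarations(["x", "y", "x"])` reports one error, for the second declaration of x. No special machinery is needed beyond what semantic actions already offer.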

Attribute passing
Coco treats all attributes as local variables of Semant. They receive their values through attribute passing, which differs for terminals and nonterminals. The attributes of terminals (i.e. terminal classes) are always synthesized attributes; they receive their values from the lexical analyzer during parsing. The inherited attributes of a nonterminal are passed by an implicit semantic action before the nonterminal is parsed, whereas its synthesized attributes are passed after it has been parsed.
3.13 Example Attribute passing
For the productions

    A = ... B↓x↑y ... .
    B↓u↑v = ... .

the attribute passing

    u := x

is done in the A-production before the parsing of B, and the attribute passing

    y := v

is done in the A-production after the parsing of B.
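The two assignments of Example 3.13 can be simulated in Python. Since Coco keeps all attributes in one place (the locals of Semant), a single dictionary plays that role here. The body of B, which doubles its input, and the value 21 are hypothetical. In generated code, the first assignment would be the sem operand of an NTS or NTAS instruction (Definition 3.14 below), and the second an ordinary semantic action.

```python
def parse_B(attrs):
    """Body of B↓u↑v: a hypothetical action computing v from u."""
    attrs["v"] = 2 * attrs["u"]

def parse_A(attrs):
    attrs["x"] = 21              # hypothetical value computed earlier in A
    attrs["u"] = attrs["x"]      # u := x, before parsing B (implicit action)
    parse_B(attrs)
    attrs["y"] = attrs["v"]      # y := v, after parsing B (ordinary action)
    return attrs["y"]
```

`parse_A({})` returns 42. The order of the three steps is what matters: input passing, parsing of B, output passing.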


The attribute passing after the parsing of a nonterminal can be executed by a
'normal' G-code instruction, i. e. by an instruction activating a semantic
action. However, for the passing of inherited attributes, two additional G-code
instructions are necessary:
3.14 Definition G-code (remainder)

    Instruction        Bytes  Description
    NTS sy sem         3      nonterminal with input attribute semantics.
                              If the next input symbol is a terminal start symbol
                              of sy, then execute the semantic action sem (for
                              input attribute passing) and start the parsing of
                              the production for sy, else report an error.
    NTAS sy adr sem    5      nonterminal with alternative and input attribute
                              semantics.
                              If the next input symbol is a terminal start symbol
                              of sy, then execute the semantic action sem (for
                              input attribute passing) and start the parsing of
                              the production for sy, else go to adr.

A complete example for the translation of an attributed grammar into G-code,
including attribute passing semantics, can be found in Section 8.3.
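Definition 3.14 can be read as the following Python sketch of the two instructions. Assumptions: symbols are written as names rather than numbers for readability, the two-byte address of NTAS is stored high byte first, and the callbacks `semant` and `parse_nt` are hypothetical hooks for executing an action and parsing a production.

```python
def exec_nts(code, pc, lookahead, first, semant, parse_nt):
    """NTS sy sem, encoded in 3 bytes: opcode, sy, sem."""
    sy, sem = code[pc + 1], code[pc + 2]
    if lookahead in first[sy]:
        semant(sem)              # pass the input attributes of sy
        parse_nt(sy)             # parse the production for sy
        return pc + 3            # continue after the instruction
    raise SyntaxError(f"start of {sy} expected")

def exec_ntas(code, pc, lookahead, first, semant, parse_nt):
    """NTAS sy adr sem, encoded in 5 bytes: opcode, sy, adr (2 bytes), sem."""
    sy = code[pc + 1]
    adr = code[pc + 2] * 256 + code[pc + 3]   # assumed high byte first
    sem = code[pc + 4]
    if lookahead in first[sy]:
        semant(sem)
        parse_nt(sy)
        return pc + 5
    return adr                   # try the alternative at adr instead
```

The only difference between the two instructions is the failure case: NTS reports an error, while NTAS continues at another alternative. The byte counts 3 and 5 follow from one byte each for opcode, symbol and action plus two bytes for the address.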

Problems with semantic interfaces
The simplicity of this semantic interface gives rise to two problems:

1. Semantic actions may only be executed when it is clear that no other alternative will match. In the production

       A = sem action1 endsem C
         | sem action2 endsem D.

   it must be determined whether C or D is the proper alternative before executing action1 or action2. Coco takes this into account by automatically inserting an ε-node before the corresponding semantic actions, which leads to the following result:

       [Top-down graph: A branches into ε → action1 → C and ε → action2 → D]

           EPSA 1 M1
           SEM  12
           NT   C
           ...

   where the proper selection of alternatives is done with the following lookahead sets:

       epsset(1) = first(C)
       epsset(2) = first(D)

   This also works in the following production:

       A = B sem action1 endsem { sem action2 endsem C sem action3 endsem }.

   For the above, the following top-down graph and corresponding G-code are generated:

       [Top-down graph: A → B → action1, then a loop ε → action2 → C → action3, left via a second ε-node]

           NT   B
           SEM  12
       M1: EPSA 1 M2
           SEM  13
           NT   C
           SEM  14
           JMP  M1
       M2: EPS  2
           RET

   with the lookahead sets

       epsset(1) = first(C)
       epsset(2) = follow(A)

   If the ε-nodes have disjoint lookahead sets, these constructs are LL(1).

2. Attributes in Coco are implemented as local variables of Semant. This has the undesirable consequence that their values are not retained during the recursive parsing of nonterminals. For example, the following production arises in the interpretation of expressions:

       E↑x = T↑x { "+" T↑y sem x := x + y endsem }.

   Here, the output attribute x of the left T must still be available after the parsing of the right T, since its value is used afterwards. However, since T is recursive via F and E, the attribute x of the left T may be destroyed by the parsing of the right T. Coco does not take care of this problem; it is up to the programmer to save and restore x explicitly. This can be done by using a stack and replacing the above production by the following:

       E↑x = T↑x { "+" sem Push(↓x) endsem T↑y sem Pop(↑x); x := x + y endsem }.
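The ε-node mechanism of problem 1 amounts to choosing an alternative by its lookahead set before any of its semantic actions run. A minimal Python sketch, assuming the lookahead sets are given as plain Python sets (computing first and follow sets is not shown); the trace list only records what would be executed.

```python
def select_alternative(lookahead, epssets):
    """Return the 1-based number of the first ε-node whose lookahead set
    contains the current symbol, or None if no alternative applies."""
    for i, epsset in enumerate(epssets, start=1):
        if lookahead in epsset:
            return i
    return None

def is_ll1(epssets):
    """The ε-nodes of one branch point are LL(1) if their sets are disjoint."""
    seen = set()
    for s in epssets:
        if seen & s:
            return False
        seen |= s
    return True

def parse_A(lookahead, first_C, first_D, trace):
    # A = sem action1 endsem C | sem action2 endsem D.
    alt = select_alternative(lookahead, [first_C, first_D])
    if alt == 1:
        trace.extend(["action1", "C"])   # action runs only after the choice
    elif alt == 2:
        trace.extend(["action2", "D"])
    else:
        raise SyntaxError("C or D expected")
```

With first(C) = {c} and first(D) = {d}, the input symbol d selects the second alternative, so action1 is never executed by mistake.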

The second problem leads to the

3.15 Principle of attribute saving for recursive symbols

Attribute values that must be preserved beyond the parsing of a recursive
nonterminal X must be saved before the parsing of X and restored after
the parsing of X.
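The effect described in problem 2, and the remedy of Principle 3.15, can be reproduced with a small Python evaluator for E = T {"+" T}. T = F. F = number | "(" E ")". Assumption for this sketch: the attribute variables x and y of the E-production are single shared fields, imitating Coco's attributes as locals of Semant, while T's result is passed back by assignment (T↑y). With `save=True` the Push/Pop actions are applied.

```python
class Evaluator:
    """Shared attribute variables self.x and self.y imitate the locals of
    Semant; save=True applies Push(↓x) before and Pop(↑x) after the right T."""

    def __init__(self, tokens, save):
        self.tokens, self.pos, self.save = tokens, 0, save
        self.x = self.y = 0          # shared attribute variables
        self.stack = []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def E(self):
        self.x = self.T()
        while self.peek() == "+":
            self.advance()
            if self.save:
                self.stack.append(self.x)   # sem Push(↓x) endsem
            self.y = self.T()               # the right T may recurse into E
            if self.save:
                self.x = self.stack.pop()   # sem Pop(↑x) ...
            self.x = self.x + self.y        # ... x := x + y endsem
        return self.x

    def T(self):
        return self.F()

    def F(self):
        tok = self.advance()
        if tok == "(":
            value = self.E()
            if self.advance() != ")":
                raise SyntaxError("')' expected")
            return value
        return tok
```

For `1 + (2 + 3)` the unprotected version yields 10, because the inner E overwrites x with 5 before the outer addition. With saving it yields the correct 6. Without recursion (`1 + 2`) both versions agree.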

4 Various compiler compilers

In the previous chapter we covered the theoretical background of compilers. In
the following chapters we will show the practical application of these principles in the design of the compiler compiler Coco.
However, before we go into the details of Coco, it will be interesting to
look at some other compiler compilers. This will enable the reader to compare
Coco with these systems.
There is an extensive literature on compiler-generating systems. Bibliographies can be found in Räihä [1980] and Meijer and Nijholt [1982]. The scope of this book allows us to cover only a few of them, and even these only to a limited degree. Some of the best-known compiler compilers are YACC (Johnson [1975]), HLP84 (Koskimies [1984]), GAG (Kastens et al. [1982]), and MUG (Ganzinger and Giegerich [1984]). In the following sections, we compare these systems with each other.
The basic operation of today's compiler compilers is always the same.
The compiler to be generated is described by a metalanguage based on
attributed grammars. From this compiler description, a parser and a semantic
evaluator are generated which constitute the essential parts of the resulting
compiler. The generated compiler reads the source text to be translated,
performs a syntax analysis to check the correctness of the input, and builds a
syntax tree in memory. It then assigns attribute values to the tree nodes
according to the attributed grammar. This process normally requires several
passes which traverse the tree from left to right or from right to left. In each
pass as many attributes as possible are evaluated. Finally the total semantics of
the source program is represented by the attributes in the tree. The last pass
generates the target code from the attribute values.


The various compiler compilers differ mainly in their compiler description languages and in their algorithms for traversing the syntax tree. Although much effort is spent on reducing execution time and attribute space, large memory requirements and long processing times are the main reasons why automatically generated compilers are still less efficient than hand-coded compilers. Therefore some compiler compilers, like YACC and Coco, bypass the construction of a syntax tree and accept that they are less powerful and less generally applicable than HLP84, GAG, or MUG.
The above-mentioned compiler compilers will be compared without going into too much detail. For each of them we give a short example of its input language, showing the translation of a signed integer constant into its value.
Normally, such tasks are handled by the lexical analyzer. However, they can
also be solved with an attributed grammar, which is short and easy to
understand and is therefore well suited as an example of attributed grammars.
Of course compiler compilers can achieve more than what is demonstrated
in this short example. Most of them will only show their advantages on a large
and complex task. However, these small examples will allow some interesting
conclusions about the user-friendliness and the effort required to learn the
description language of the various systems.

4.1 YACC - yet another compiler compiler

Origin and scope
YACC was produced by Stephen C. Johnson at Bell Laboratories in 1975. It runs under Unix and is therefore widely available. YACC accepts L-attributed grammars with the limitation that each grammar symbol has only one synthesized attribute and no inherited attributes. From the compiler description, YACC generates an LALR(1) parser (Lookahead LR(1)) and a semantic analyzer, which is simply a collection of all the semantic actions of the compiler description. The user must supply a main program, a lexical analyzer, and a syntax-error handler.
Description language
The syntax parts of the YACC source language are written as BNF productions. All terminals (with the exception of literals) must be declared. For the production X0 : X1 X2 ... Xn, the symbol $$ denotes the attribute of X0, $1 the attribute of X1, and $n the attribute of Xn. Semantic actions can be specified at any position between the symbols of the context-free grammar. They must be written in C and may contain an arbitrary sequence of valid C statements. Context conditions are written as if statements in semantic actions. At the end of the grammar, one can write C procedures which are called in the semantic actions. At this point a scanner procedure named yylex must also be provided.

Attribute processing
The attribute processing is done in a single pass during syntax analysis. An
explicit syntax tree of the source language is not produced.

Implementation
YACC is written in C and produces compilers that are also written in C. It has
been used for the translation of many languages, including C, APL,
RATFOR, and Pascal.
4.1 Example Attributed grammar as input for YACC

    %{
    #include <stdio.h>
    #include <ctype.h>
    int yylex(void);
    void yyerror(const char *s);
    %}
    %start Number          /* start symbol of the grammar */
    %token digit           /* declaration of terminals; literals */
                           /* don't have to be declared */
    %%                     /* separator */
    Number    : '-' Digitlist    {printf("%d\n", -$2);}
              | Digitlist        {printf("%d\n", $1);}
              ;
    Digitlist : digit            {$$ = $1;}
              | Digitlist digit  {if (($1 > 3276) || (($1 == 3276) && ($2 > 7)))
                                    {printf("Constant too big\n"); $$ = 0;}
                                  else
                                    $$ = $1*10 + $2;}
              ;
    %%
    int yylex(void)              /* lexical analyzer */
    {
        int ch;
        while ((ch = getchar()) == ' ')
            ;                    /* skip blanks */
        if (ch == EOF || ch == '\n')
            return 0;            /* 0 tells yyparse that the input has ended */
        if (isdigit(ch)) {
            yylval = ch - '0';
            return digit;
        }
        return ch;               /* literals such as '-' are their own token */
    }

    void yyerror(const char *s)  /* error procedure */
    {
        printf("%s\n", s);
    }

    int main(void)               /* main procedure */
    {
        return yyparse();
    }

4.2 HLP84 - Helsinki language processor

Origin and scope
The first version of HLP was produced in 1978 under the name HLP78 at the
University of Helsinki by Räihä et al. [1983]. Since then a new version,
HLP84 (Koskimies [1984]), has been created which has little in common with
the previous one. HLP84 accepts attributed grammars for a one-pass translation of programs. It generates a scanner, an LALR(1) parser with error
handling, and a semantic evaluator to which user procedures can be attached.
Symbol table handling can be partially described in the compiler definition
language; in certain cases it is even done automatically. This reduces the
number of semantic procedures required.
Description language
The description language Lisa is nonterminal oriented. This is in sharp
contrast to other compiler description languages, where the emphasis is on
productions. Each nonterminal is described by a block which forms the scope
of its local objects. This is similar to the use of procedures in higher-level
languages. A block contains all productions of a nonterminal in extended
BNF, as well as the description of all terminals used in it. Within a block,
attributes and local variables are declared in a Pascal-like form.
A set of semantic rules consisting of assignments and function calls is
attached to each production. These rules assign values to the synthesized
attributes on the left-hand side and to the inherited attributes on the right-hand
side of the production. An attribute a of a grammar symbol S is denoted by
S.a. Terminals can have a single synthesized attribute. There is a specific
language element for context conditions. Lisa provides some standard facilities
for frequently needed operations such as definition of scopes and searching of
names in them. These mechanisms free the user from some clerical work. For
example, an identifier will be automatically searched in all open scopes and its
node in the syntax tree will be automatically attributed according to the information in its symbol table entry.

Attribute processing
Attributes are processed in a single pass from left to right by means of an
attribute-stack and without an explicit syntax tree. This limits the application of
HLP84 to languages that can be translated in one pass although it is not
required that semantic analysis is done during syntax analysis.


Implementation
HLP84 was implemented on a Burroughs B7800 computer in Pascal. It
generates compilers in Pascal. The system has been used for its own implementation and for the generation of a Pascal compiler.
4.2 Example Attributed grammar as input for HLP84

    external
      -- declaration of external Pascal-objects
      type Outfile = Extfile;
      function WriteInt(f: Outfile; i: Integer): (f: Outfile) =
        procedure ExtOut(var f: Extfile; i: Integer);
        -- Connects the Pascal-procedure ExtOut with the Lisa-function
        -- WriteInt. Extfile and ExtOut are given in a special system file.

    nont Number;
      -- description of the nonterminal Number (start sym.).
      -- Number has no attributes.

      attrset Intval = (val: Integer);
      -- val is declared to be an integer attribute. The attribute
      -- declaration is given the name Intval.

      var out: Outfile;          -- global variable

      const max = 65535;

      nont SignedNumber: Intval;
        -- description of the nt "SignedNumber".
        -- SignedNumber has an attr. set "Intval"

        nont DigitList: Intval;
          check val < max;       -- context condition
          token DigitToken: Integer = Digit;
          -- the terminal "DigitToken" with an attr. of type Integer is
          -- declared to consist of a single digit (Digit is predefined)

          DigitList = DigitToken;        -- syntactic production
          rules                          -- semantic rules
            val := DigitToken
            -- the attr. of a token is denoted by the name of the token
          end;

          DigitList = DigitList DigitToken;
          rules
            val := 10*DigitList.val + DigitToken
          end
        end DigitList;

        SignedNumber = '-' DigitList;
        rules
          val := -DigitList.val
        end;

        SignedNumber = DigitList;
        rules
          val := DigitList.val
        end
      end SignedNumber;

      Number = SignedNumber;
      rules
        post out := WriteInt(out, SignedNumber.val);
        -- after SignedNumber is processed, its attribute val is written.
      end
    end Number

4.3 GAG - generator based on attribute grammars
Origin and scope
GAG was developed by Kastens, Hutt, and Zimmermann [1982] at the
University of Karlsruhe. It accepts ordered attributed grammars where the
attribute evaluation order of each nonterminal is fixed and independent of the
context of the nonterminal. From the compiler description, an attribute
evaluator and an LALR(1) parser are produced (by separate tools). The user
must supply a lexical analyzer and a few other procedures such as a code
generator. These modules together with some fixed parts constitute a complete
compiler.
Description language
The grammar is written in extended BNF with special constructs for options
and repetitions. All nonterminals and terminals (except literals) must be
declared. Every production is associated with a set of semantic rules. In these
rules the strongly typed, functional language Aladin is used, allowing attribute
assignments and function calls. The right-hand side of an assignment can be a
complex expression of attribute values, function calls, if expressions, syntax
symbols, and many others (see Example 4.3). As a functional language Aladin
has neither variables nor control statements. The attribute notation S.a means
the attribute a of the symbol S. If S occurs in a production several times, the
first occurrence is denoted by S[1], the second by S[2], and so on. There is
a special language element for context conditions.
Attribute processing
A decorated syntax tree is built during attribute evaluation, but it is not
traversed in alternating passes from left to right and from right to left, as is
done in some other compiler compilers. A node is visited if there are no more
nodes to the left of it, and a parent node is visited when no more of the
children can be visited. The syntax tree is therefore not processed in a straight
direction. In fact, evaluation may sometimes step back some nodes to evaluate
attributes that could not be computed earlier. In this manner, the number of
passes over the tree can be reduced. The memory requirements for attributes in
the syntax tree are optimized by various algorithms. After the attribute evaluation, the decorated syntax tree is passed to a user program which generates the
target code.

Implementation
GAG is implemented in Standard Pascal under Unix BSD 4.2 on a Siemens
computer 7.760. It also generates compilers in Standard Pascal. Compilers for
Pearl, LIS, Pascal, and Ada have already been produced by GAG.
4.3 Example Attributed grammar as input for GAG

    %---------- symbol and attribute declarations ----------
    TERM     digit:     value: INT SYNT;   % value is a synthesized integer attribute
    NONTERM  Number:    value: INT SYNT;
    NONTERM  DigitList: value: INT SYNT;

    RULE r1: Number ::= [ '-' ] DigitList
    STATIC
      Number.value :=
        IF ... THEN -DigitList.value
        ELSE DigitList.value
      % No output of the attribute Number.value.
      % The attributed tree is passed to a user-written program,
      % which prints the results.
    END;

    RULE r2: DigitList ::= digit
    STATIC
      DigitList.value := digit.value
    END;

    RULE r3: DigitList ::= DigitList digit
    STATIC
      DigitList[1].value := 10 * DigitList[2].value + digit.value
    CONDITION
      (DigitList[2].value < 3276) OR
      ((DigitList[2].value = 3276) AND (digit.value < 8))
    MESSAGE "Constant value too big"
    END;

4.4 MUG - modular compiler generator

Origin and scope
MUG (Modularer Übersetzer-Generator) was developed in 1985 at the University of Dortmund (Germany) by Ganzinger and Vach. It processes so-called one-sweep grammars (Engelfriet and Filè [1981]). MUG supports all phases of semantic analysis (attribute processing, optimization, and code generation). However, it does not produce a scanner or a parser; these can be generated with YACC and then attached to the MUG system. Semantic modules are written in Modula-2.
The underlying principles of MUG are substantially different from
traditional attributed grammars. Terminals are viewed as the types of some
semantic objects (so-called semantic sorts), nonterminals are viewed as the
types of syntax trees (so-called syntactic sorts). Productions are therefore
viewed as functions, mapping objects of syntactic and semantic sorts into
syntax trees which are themselves elements of syntactic sorts.
The translation of trees of an input grammar into trees of an output grammar is called an attribute coupling of the two grammars. Attributes can be classified as semantic attributes, which contain semantic values (and therefore, like the values of terminal symbols, are objects of semantic sorts), and syntactic attributes, which represent subtrees of the output grammar (and thus are objects of syntactic sorts). Semantic attributes are computed in semantic rules, whereas syntactic attributes are built by applying productions of the output grammar. Semantic attributes can also be viewed as 'terminal symbols' of the output grammar.
As a result of this view, several attribute coupling processes can be concatenated so that the output grammar of the first coupling becomes the input grammar of the second one. As an option, MUG can automatically combine the two attribute couplings into a single one. The user can therefore describe complex translation processes as a sequence of simple translations (e.g. L-attributed grammars), which the system, hidden from the user, combines into a single attributed grammar that need not be L-attributed. In this manner, readability is balanced with efficiency.
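The functional view behind MUG (productions as constructors, an attribute coupling as a function from input trees to output trees) can be imitated in Python. This sketch only composes two couplings as functions. It does not model MUG's automatic merging into one attributed grammar. The tree classes follow the signature of Example 4.4 below, except that the minus argument of NegNumber is omitted. The second coupling, to_text, is hypothetical.

```python
from dataclasses import dataclass
from typing import Union

# Input grammar: constructors build trees of the syntactic sorts.
@dataclass
class SingleDigit:
    d: int

@dataclass
class MoreDigits:
    dl: "Digitlist"
    d: int

Digitlist = Union[SingleDigit, MoreDigits]

@dataclass
class PosNumber:
    dl: Digitlist

@dataclass
class NegNumber:             # MUG's 'minus' argument is omitted here
    dl: Digitlist

# Output grammar of the first coupling.
@dataclass
class Result:
    val: int

def dval(t):
    """Semantic attribute dval of a Digitlist."""
    if isinstance(t, SingleDigit):
        return t.d
    return 10 * dval(t.dl) + t.d

def evaluate(n):
    """First coupling: Number trees -> Value trees (syntactic attribute nval)."""
    sign = 1 if isinstance(n, PosNumber) else -1
    return Result(sign * dval(n.dl))

def to_text(r):
    """Hypothetical second coupling: Value trees -> text."""
    return str(r.val)

def concatenate(first, second):
    """The output of the first coupling becomes the input of the second."""
    return lambda tree: second(first(tree))
```

For the tree of -123, `evaluate` builds `Result(-123)`, and the concatenated coupling yields the text "-123".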
Description language
MUG uses one description language for all translation phases. It is based on Modula-2. The production

    Prod1: A → B c

is written in a function-like manner as

    CONSTRUCTOR Prod1 (btree: B; cval: c): A

An attribute a of a nonterminal S is written as S^a. All nonterminals must
be declared together with their attributes and attribute types. For semantic
sorts, the user must write Modula-2 modules that export them as types unless
they are standard types of Modula-2. There must be separate modules for the
input grammar, the output grammar, and their attribute coupling. Semantic
rules can contain assignments with arbitrary Modula-2 expressions, function
calls, and if expressions. Syntactic attributes are calculated through constructors of the output grammar. Context conditions have no construct of their
own. They must be specified within semantic functions.
Attribute processing
The attribute processor generated by MUG uses the 'one-sweep' method, which is an L-attributed processing of the syntax tree in which the children of each node may first have been brought into a suitable order.
Implementation
MUG was implemented in Modula-2 on a CADMUS computer. It generates
compilers in Modula-2 and has been used for its own implementation.

4.4 Example Attributed grammar as input for MUG

    SIGNATURE DEFINITION MODULE Numbers;
      (*definition of the context-free input grammar*)
      FROM Values IMPORT Value;        (*syntactic sort from the output grammar*)
      FROM User IMPORT digit, minus;   (*semantic sorts (terminals)*)
      SORT Number, Digitlist;          (*syntactic sorts (nonterminals)*)

      (*rules of the context-free grammar*)
      CONSTRUCTOR PosNumber (dl: Digitlist): Number;
      CONSTRUCTOR NegNumber (m: minus; dl: Digitlist): Number;
      CONSTRUCTOR SingleDigit (d: digit): Digitlist;
      CONSTRUCTOR MoreDigits (dl: Digitlist; d: digit): Digitlist;

      (*attribution function for the context-free grammar*)
      OPERATOR Evaluate (n: Number): Value;
    END Numbers.

    SIGNATURE DEFINITION MODULE Values;
      (*definition of the context-free output grammar*)
      SORT Value;
      CONSTRUCTOR Result (val: INTEGER): Value;
    END Values.

    ATTRIBUTATION MODULE Numbers;
      (*attribute coupling of the above grammars*)
      FROM Values IMPORT Value;

      OPERATOR Evaluate (n: Number): Value;
        (*declaration of the attributes*)
        ATTR Number    SATTR nval: Value;
        ATTR Digitlist SATTR dval: INTEGER;

        (*attributations of the productions*)
        CONSTRUCTOR PosNumber (dl: Digitlist): Number;
        BEGIN
          PosNumber^nval = Result(dl^dval);
          (*the constructor "Result" builds a
            syntactic attribute of type "Value"*)
        END PosNumber;

        CONSTRUCTOR NegNumber (m: minus; dl: Digitlist): Number;
        BEGIN
          NegNumber^nval = Result(-dl^dval);
        END NegNumber;

        CONSTRUCTOR SingleDigit (d: digit): Digitlist;
        BEGIN
          SingleDigit^dval = d;
        END SingleDigit;

        CONSTRUCTOR MoreDigits (dl: Digitlist; d: digit): Digitlist;
        BEGIN
          MoreDigits^dval = 10 * dl^dval + d;
        END MoreDigits;
      END Evaluate;
    END Numbers.

4.5 Coco - compiler compiler
Origin and scope
Coco arose in 1983 at the University of Linz as the successor of a parser generator. It processes L-attributed grammars, which are viewed as procedural descriptions of a translation process. The compiler description is translated into an LL(1) parser with automatic error recovery and a semantic evaluator to which user modules can be attached. The user must also supply a main program and a scanner (for which there is a scanner generator). It is possible to generate multi-pass compilers with Coco.


Description language
The compiler description language Cocol is based on context-free grammars
in Wirth's EBNF notation. All terminals and nonterminals must be declared.
Each syntax symbol can have one or more attributes. A symbol S with an
output attribute a is written as S<out:a> wherever it occurs within a production. Semantic actions are written directly in Modula-2. They may appear
at arbitrary points on the right-hand side of the productions. Attributes can be
accessed like normal variables. Context conditions are written as if statements
in semantic actions.
Attribute processing

Semantic evaluation takes place during the syntax analysis. A syntax tree of
the input is not built. Productions are processed strictly from left to right.
When a semantic action is encountered, it is executed immediately. Attribute
values of terminals are returned by the scanner, those of nonterminals are
passed using assignments generated by Coco.

Implementation
Coco is implemented in Modula-2 on various microcomputers including
Macintosh, IBM-PC, Atari, and Lilith. It is also available on IBM mainframes. Coco generates compilers in Modula-2. It has been used for the
construction of a multi-pass Modula-2 compiler and for the generation of
several tools for static program analysis.
4.5 Example Attributed grammar as input for Coco

    GRAMMAR Number
    SEMANTIC DECLARATIONS
      FROM InOut IMPORT WriteString, WriteInt;
      VAR value, value1: INTEGER;
    TERMINALS
      "-"
      digit <out:value>
    NONTERMINALS
      Number
      Digitlist <out:value>
    RULES
      Number =
          Digitlist<out:value>       sem WriteInt(value, 5); endsem
        | "-" Digitlist<out:value>   sem WriteInt(-value, 5); endsem.

      Digitlist<out:value> =
        digit<out:value>
        { digit<out:value1>
          sem IF (value < 3276) OR ((value = 3276) AND (value1 < 8))
              THEN value := 10*value + value1;
              ELSE value := 0; WriteString("Constant too big");
              END;
          endsem
        }.
    ENDGRAM
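The overflow test shared by the YACC, GAG and Coco examples deserves a word of justification. Assuming the target INTEGER range ends at 32767 (that is, 2^15 - 1), the test `value < 3276 OR (value = 3276 AND digit < 8)` is equivalent to `10*value + digit <= 32767` for non-negative values. Unlike the latter, it never computes a product that could itself overflow. The following Python check verifies the equivalence exhaustively; `fits` and `append_digit` are names of our own.

```python
MAX_INT = 32767      # 2**15 - 1, assumed target: 16-bit INTEGER

def fits(value, digit):
    """Overflow test of the YACC, GAG and Coco examples, for value >= 0."""
    return value < 3276 or (value == 3276 and digit < 8)

def append_digit(value, digit):
    """Accumulate one more digit, or report the constant as too big."""
    if fits(value, digit):
        return 10 * value + digit, None
    return 0, "Constant too big"
```

The YACC version states the negated condition, `($1 > 3276) || (($1 == 3276) && ($2 > 7))`, and therefore selects the error branch in exactly the same cases.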

4.6 Summary
This short overview of some of the better-known compiler compilers has shown that there are many powerful systems whose complex input languages can describe even exotic special cases. Why, then, are these generators so seldom used for practical applications? There are many reasons. The most significant is that automatically generated compilers are simply less efficient than hand-coded ones. According to Koskimies et al. [1982], a Pascal compiler produced with HLP78 ran seven times slower and used three times as much memory (for its code alone!) as a hand-written compiler.
However, efficiency is not the main goal of a compiler compiler. Often it
is more important that the compiler description be short, formal, and complete.
Then it can be used as a prototype of a compiler implementation for a new
language or to study the techniques of compiler construction as such.
Compiler description languages are sometimes not easy to read. In most
cases ordinary BNF is used for the syntax definition. Although concise and
elegant, this notation often looks unnatural because of the recursion needed to
express repetitions. Attributes usually appear only in semantic rules and not
with the grammar symbols. This makes the productions short, but the reader
must extract from the semantic rules those attributes which belong to a given
syntax symbol. In many cases, the semantic rules may only be attribute
assignments. Therefore, important parts of the actual translation must be
hidden in procedures. Having these difficulties to contend with may even
make the compiler compiler a burden rather than a help.
Finally, most compiler compilers themselves require a lot of memory. For example, GAG required 4 megabytes of main memory to generate an Ada compiler, and this amount of memory is not available on many microcomputers.

We believe that a compiler compiler should be a tool which is easy to understand and easy to use. Above all, its input language should be clear and natural, but its availability (e.g. on microcomputers) and efficiency are equally important. These were the considerations behind the development of Coco and its input language Cocol.
Table 4.1 summarizes the main features of the described compiler
compilers.

[Table 4.1: Properties of various compiler compilers]
5 The compiler description language Cocol

This chapter describes Cocol, the input language of the compiler generator
Coco. A Cocol text essentially consists of an attributed grammar and
declarations. From this description, Coco generates a parser and a semantic
evaluator. The user has to provide a main program, a scanner, an error
message module and semantic modules to get a complete compiler. Some of
these modules can be generated by tools or are standard modules that do not
depend on the language to be processed.
The attributed grammar consists of a context-free grammar as a description of the compiler input and of semantic information as a description of how
this input is to be translated. When designing an attributed grammar one
usually starts with the context-free grammar and completes it step by step with
attributes, semantic actions and context conditions. Therefore this chapter is
arranged in two parts: the specification of Cocol as a syntax description
language and its specification as a semantic description language.

5.1 Lexical structure

A grammar description in Cocol consists of keywords, identifiers, strings,
numbers, comments and special characters.

Keywords
  ALIAS  ANY  DECLARATIONS  ENDGRAM  ENDSEM  EPS  GRAMMAR  IN
  MACROS  NONTERMINALS  OUT  PRAGMAS  RULES  SEM  SEMANTIC  TERMINALS

Keywords must be written with upper-case letters, except for the following keywords, which may also be written with lower-case letters because they often appear in a context where they are not to be emphasized:
  alias  any  endsem  eps  in  out  sem

Identifiers
  identifier = letter {letter | digit}.

Identifiers may be of arbitrary length. Case is significant.
Strings
  string = quote {anybutquote} quote
         | apostrophe {anybutapostrophe} apostrophe.

quote means the character ", apostrophe means the character '. anybutquote
is any character except quote, anybutapostrophe is any character except
apostrophe. Strings must not extend beyond line boundaries.
Numbers
  number = digit {digit}.

Special characters
for the syntax description: [not legible in the scan]
for the semantic description: [not legible in the scan]

Comments start with the string '--' and extend to the end of the line.

5.2 Cocol as a syntax description language

The kernel of a Cocol text is the syntactic description of the language that the
generated compiler is to process.
  Grammar = "GRAMMAR" identifier SyntaxDeclarations Productions "ENDGRAM".
The syntax description consists of declarations for terminals and nonterminals and of the context-free grammar. The identifier following the keyword GRAMMAR is the grammar name. It is the root symbol (start symbol) of the grammar and must be declared as a nonterminal. We start with the productions and continue with the declarations later.

5.2.1 Productions

The productions of the context-free grammar are written in an EBNF notation suggested by Wirth [1982] (square brackets enclose optional expressions, curly brackets denote repetition zero or more times).
  Productions = "RULES" {Production}.
  Production  = identifier "=" Expression ".".
  Expression  = Term {"|" Term}.
  Term        = Factor {Factor}.
  Factor      = Symbol | "(" Expression ")" | "[" Expression "]" | "{" Expression "}"
              | "eps" | "any".
  Symbol      = identifier | string.

5.1 Example Cocol grammar for real constants
RULES
  Real     = Integer "." [Integer] [Exponent].
  Integer  = digit {digit}.
  Exponent = "E" ["+"|"-"] Integer.

The symbols Real, Integer and Exponent are nonterminals. The symbols digit, "E", ".", "+" and "-" are terminals (they have no productions).
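The EBNF operators of Example 5.1 map directly onto control flow: an option [..] becomes an if, a repetition {..} becomes a loop. The following Python sketch is a hand-written recursive-descent recognizer for the real-constant grammar and only illustrates that mapping. Coco itself generates a table-driven parser that interprets G-code, and treating digit as a single character 0-9 is our assumption.

```python
def recognize_real(text: str) -> bool:
    """Hand-written recognizer for Example 5.1 (a sketch, not Coco output):
         Real     = Integer "." [Integer] [Exponent].
         Integer  = digit {digit}.
         Exponent = "E" ["+"|"-"] Integer.
    """
    pos = 0

    def peek() -> str:
        # current character, or "" at the end of the input
        return text[pos] if pos < len(text) else ""

    def is_digit(c: str) -> bool:
        return len(c) == 1 and "0" <= c <= "9"

    def integer() -> bool:
        # digit {digit}: one mandatory digit, then a while loop
        nonlocal pos
        if not is_digit(peek()):
            return False
        while is_digit(peek()):
            pos += 1
        return True

    def exponent() -> bool:
        # the caller has already seen "E"; ["+"|"-"] becomes an if
        nonlocal pos
        pos += 1
        if peek() in ("+", "-"):
            pos += 1
        return integer()

    if not integer() or peek() != ".":
        return False
    pos += 1
    if is_digit(peek()):          # [Integer]
        integer()
    if peek() == "E":             # [Exponent]
        if not exponent():
            return False
    return pos == len(text)       # the whole text must be consumed
```

For instance, "1.5E-3" and "3." are accepted, while ".5" is rejected because an Integer must come first.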

eps
The symbol eps denotes the empty string (see Section 2.1) and is used to describe empty alternatives.
5.2 Example The use of eps
  Sign = "+" | "-" | eps.
is equivalent to
  Sign = ["+" | "-"].

eps is not necessarily needed for the syntax description, but it is required if
one has to attach semantic actions to empty alternatives.

any
The symbol any denotes any terminal that does not start another alternative of the alternative chain to which the any symbol belongs. Therefore any represents a whole set of terminals, namely all terminals that cannot be recognized instead of it at that point in the grammar.
5.3 Example The use of any
  Option = "$" any.
Here, any means any terminal.
  Token = keyword | identifier | number | any.
Here, any means any terminal except keyword, identifier or number (which may be recognized instead of it).
  String = '"' {any} '"'.
Here, any means any terminal except '"' (which may be recognized instead of it).
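The set that any stands for can be viewed as a set difference: all terminals minus those that could be recognized instead of any at that point. The Python sketch below assumes that the terminals starting each competing alternative (for {any}, also the terminals that may follow the repetition) have already been computed; the function name and data layout are hypothetical.

```python
def any_set(terminals: set[str], competitors: list[set[str]]) -> set[str]:
    """Terminals denoted by 'any' at one point of the grammar.

    competitors holds, for every other alternative of the same chain
    (including what may follow a repetition), the set of terminals that
    can start it; those terminals would be recognized instead of 'any'.
    """
    excluded = set().union(*competitors)   # empty set if there are none
    return terminals - excluded
```

For Example 5.3, Token passes {keyword}, {identifier} and {number} as competitors, String passes {'"'}, and Option passes none, so there any means every terminal.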
Properties of a correct grammar

Coco generates a compiler only if the grammar is:
1. complete: there must exist a rule for every nonterminal;
2. free of redundancy: every nonterminal must occur in at least one derivation of the root symbol;
3. free of cycles: there must not be a nonterminal which can be derived from itself in one or more steps;
4. terminating: every nonterminal must be able to produce a string of terminals;
5. unambiguous: the grammar must be LL(1).
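Three of these properties can be checked with simple reachability and fixpoint computations. The Python sketch below tests completeness, freedom from redundancy and termination on a grammar given as a dictionary of alternatives. It is not Coco's implementation (Coco runs its grammar tests on the top-down graph, see Section 7.5); the data representation is an assumption, and the cycle and LL(1) tests are omitted.

```python
def check_grammar(root: str, nonterminals: set[str],
                  rules: dict[str, list[list[str]]]) -> list[str]:
    """Tests 1, 2 and 4 from the list above (a sketch).

    rules maps each nonterminal to its alternatives; an alternative is a
    list of symbols and [] stands for eps. Any symbol that is not a
    declared nonterminal counts as a terminal. EBNF brackets and braces
    are assumed to have been expanded into plain alternatives already.
    """
    problems = []

    # 1. complete: every nonterminal has a rule
    for nt in sorted(nonterminals - set(rules)):
        problems.append(f"no rule for {nt}")

    # 2. free of redundancy: every nonterminal is reachable from the root
    reached, work = {root}, [root]
    while work:
        for alt in rules.get(work.pop(), []):
            for sym in alt:
                if sym in nonterminals and sym not in reached:
                    reached.add(sym)
                    work.append(sym)
    for nt in sorted(nonterminals - reached):
        problems.append(f"{nt} not reachable from {root}")

    # 4. terminating: grow the set of nonterminals that derive a terminal
    #    string until it no longer changes (fixpoint iteration)
    terminating: set[str] = set()
    changed = True
    while changed:
        changed = False
        for nt, alts in rules.items():
            if nt in terminating:
                continue
            if any(all(s not in nonterminals or s in terminating for s in alt)
                   for alt in alts):
                terminating.add(nt)
                changed = True
    for nt in sorted(nonterminals - terminating):
        problems.append(f"{nt} does not terminate")
    return problems
```

A grammar passes these three tests when the returned list is empty.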

LL(1) conflicts do not necessarily mean serious errors. They can be viewed as
warnings in situations where the generated compiler will take the first
matching alternative and ignore the others. Sometimes this is what the user
wants, as in the well-known case of the dangling else.
5.4 Example How the compiler treats LL(1) conflicts
This is the grammar of the dangling else:
  Statement   = ... | IfStatement | ... .
  IfStatement = "IF" Expr "THEN" Statement ["ELSE" Statement].
When analyzing the string
  IF a THEN IF b THEN c ELSE d
it is not clear whether the else clause belongs to the inner or to the outer if. During parsing the first matching alternative is the else of the inner if. The generated compiler takes this alternative.
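The behaviour of the generated compiler can be reproduced with a recursive-descent sketch in Python that enters the optional ELSE part whenever the next token is ELSE. The grammar is reduced to what the example needs (conditions and simple statements are single identifiers); this reduction and the tuple-shaped tree are our simplifications.

```python
def parse_statement(tokens: list[str], pos: int = 0):
    """Sketch for a reduced dangling-else grammar:
         Statement   = IfStatement | identifier.
         IfStatement = "IF" identifier "THEN" Statement ["ELSE" Statement].
    Returns (tree, next position)."""
    if tokens[pos] != "IF":
        return tokens[pos], pos + 1          # a simple statement such as c or d
    cond = tokens[pos + 1]
    if tokens[pos + 2] != "THEN":
        raise SyntaxError("THEN expected")
    then_part, pos = parse_statement(tokens, pos + 3)
    else_part = None
    # LL(1) conflict: this ELSE could also belong to an enclosing IF.
    # Like the generated compiler, take the first matching alternative,
    # i.e. enter the option, which attaches ELSE to the innermost IF.
    if pos < len(tokens) and tokens[pos] == "ELSE":
        else_part, pos = parse_statement(tokens, pos + 1)
    return ("if", cond, then_part, else_part), pos
```

Parsing "IF a THEN IF b THEN c ELSE d" yields ("if", "a", ("if", "b", "c", "d"), None): the ELSE ends up in the inner if.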

5.2.2 Declarations

All terminals and nonterminals must be declared before they can be used in productions. Declarations have the following order:
  SyntaxDeclarations = TerminalDeclarations [PragmaDeclarations] NonterminalDeclarations.

Terminal declarations
  TerminalDeclarations = "TERMINALS" {Symbol [AliasName]}.
  AliasName            = "alias" Symbol.
  Symbol               = identifier | string.

Terminals are declared by listing them after the keyword TERMINALS. Consecutive token numbers are assigned to them in the order of their declaration: the first symbol gets the number 1, the next one the number 2, and so on. If a symbol name contains a special character, it must be enclosed in quotes (e.g. "+", "plus-symbol").
The end-of-file symbol must not be declared. It is always assumed to have the token number 0. The lexical analyzer has to supply it as the last symbol of the input text. When it arrives, the syntax analyzer automatically interprets it as an indication that the input is exhausted. The end-of-file symbol must not (and cannot) be specified in a production.
A symbol may be given an alias name, which is used in error messages by the generated compiler. If the alias name is omitted, the symbol name is used instead. Alias names allow the use of short names in the grammar and of expressive names in error messages.

5.5 Example Terminal declarations
TERMINALS
  id   alias identifier
  ":=" alias "becomes symbol"
  ";"  alias semicolon

Pragma declarations
Pragmas are a special feature of Cocol. They are neither terminals nor nonterminals and must not be used in productions. They may occur at any position in the input text and are read by the parser as if they were terminals, but they do not belong to the syntax of the language (examples of pragmas are options, the end-of-line symbol, and comments). Parsing is not influenced by pragmas but they may carry semantic information (such as line numbers, option values, etc.). Pragmas can be used to propagate information between the passes of a multi-pass compiler.
  PragmaDeclarations = "PRAGMAS" {Symbol}.
  Symbol             = identifier | string.

Pragmas are declared by listing them after the keyword PRAGMAS. They are assigned consecutive token numbers, starting with the highest terminal number plus one.
5.6 Example Pragma declarations
PRAGMAS
  "end of line"
  option

The purpose of pragmas will become clear when we attach semantic actions to
them (see Example 5.11).
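The numbering rules for terminals and pragmas given above fit in a few lines of Python. This is only an illustration under assumed data structures: the input lists, the function name and the display name "end of file" for token 0 are hypothetical; alias names replace symbol names in error messages as described for terminal declarations.

```python
EOF_NUMBER = 0   # the undeclared end-of-file symbol always has number 0


def assign_token_numbers(terminals, pragmas):
    """terminals: list of (symbol, alias or None) in declaration order.
    pragmas:   list of pragma symbols in declaration order.
    Returns (number, display_name): symbol -> token number, and
    token number -> name to be used in error messages."""
    number: dict[str, int] = {}
    # "end of file" is a hypothetical display name for token 0
    display_name: dict[int, str] = {EOF_NUMBER: "end of file"}
    for n, (sym, alias) in enumerate(terminals, start=1):
        number[sym] = n
        display_name[n] = alias if alias is not None else sym   # alias wins
    for n, sym in enumerate(pragmas, start=len(terminals) + 1):
        number[sym] = n          # pragmas continue after the last terminal
        display_name[n] = sym
    return number, display_name
```

With the declarations of Examples 5.5 and 5.6 plus a terminal "+", the pragmas "end of line" and option receive the numbers 5 and 6.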
Nonterminal declarations
  NonterminalDeclarations = "NONTERMINALS" {identifier [AliasName]}.
  AliasName               = "alias" Symbol.
  Symbol                  = identifier | string.

Nonterminals are declared by listing them after the keyword NONTERMINALS. Their declaration order is insignificant. Nonterminals can be given an alias name too. The root symbol (grammar name) must also be declared as a nonterminal.
5.7 Example Nonterminal declarations
NONTERMINALS
  Stat alias Statement
  Expr alias Expression

5.3 Cocol as a semantic description language

The semantics of a translation are specified by attaching semantic actions, attributes and semantic declarations to the syntax description. The following grammar of Cocol shows that there are only a few locations where semantic parts have to be added to a syntax description in order to get an attributed grammar.

  CocolText               = "GRAMMAR" identifier
                            SyntaxDeclarations
                            Productions
                            "ENDGRAM".
  SyntaxDeclarations      = TerminalDeclarations [PragmaDeclarations] NonterminalDeclarations.
  TerminalDeclarations    = "TERMINALS" {Symbol [Attributes] [AliasName]}.
  PragmaDeclarations      = "PRAGMAS" {Symbol [Attributes] [SemAction]}.
  NonterminalDeclarations = "NONTERMINALS" {identifier [Attributes] [AliasName]}.
  AliasName               = "ALIAS" Symbol.
  Productions             = "RULES" {Production}.
  Production              = identifier [Attributes] "=" Expression ".".
  Expression              = Term {"|" Term}.
  Term                    = Factor {Factor}.
  Factor                  = Symbol [Attributes]
                          | "(" Expression ")" | "[" Expression "]" | "{" Expression "}"
                          | SemAction
                          | "eps" | "any".
  Symbol                  = identifier | string.

5.3.1 Semantic actions

A semantic action is a statement sequence on the right-hand side of a production. It is executed after the symbol to its left has been recognized and before the symbol to its right is recognized. Semantic actions may be written in any algorithmic programming language (in our Coco implementation this language is Modula-2). There are two kinds of semantic actions.
  SemAction = SimpleAction | SemMacroCall.
Simple semantic actions
  SimpleAction = "sem" {any} "endsem".

A semantic action is enclosed by the keywords sem and endsem. Between
them, any statements such as assignments, procedure calls, conditional
statements and loops are allowed. The syntactical correctness of the statements
is not checked by Coco.

5.8 Example Semantic actions
We want to have a compiler which counts the words in a text. The context-free grammar is
  Text = {Word}.
Now we add semantic actions.
  Text = sem count:=0 endsem {Word sem count:=count+1 endsem}
         sem IF count>0 THEN WriteCard(count,3); WriteString(" words") END endsem.
Since syntactic and semantic parts are intermixed and hard to read, we separate them into two 'columns':
  Text =      sem count:=0 endsem
    {Word     sem count:=count+1 endsem
    }         sem IF count>0 THEN
                    WriteCard(count,3); WriteString(" words")
                  END
              endsem.
Syntactic and semantic parts are separated clearly now. The production must be read line by line from left to right.
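Example 5.8 can be transcribed by hand to show when each action runs. In this Python sketch the input is assumed to be a list of token kinds in which words appear as the string "Word", and WriteCard(count,3) is modelled as a right-justified field of width 3; the function is hypothetical and returns the text instead of printing it.

```python
def count_words(tokens: list[str]) -> str:
    """Hand-written counterpart of the attributed production
         Text = sem count:=0 endsem
                {Word sem count:=count+1 endsem}
                sem IF count>0 THEN WriteCard(count,3);
                                    WriteString(" words") END endsem.
    Each semantic action runs exactly where it stands in the production."""
    output = ""
    count = 0                                  # sem count:=0 endsem
    pos = 0
    while pos < len(tokens) and tokens[pos] == "Word":
        pos += 1                               # Word recognized ...
        count += 1                             # ... then count:=count+1
    if count > 0:                              # sem IF count>0 THEN ...
        output += f"{count:3d}" + " words"     # WriteCard(count,3); WriteString(" words")
    return output
```

For five words the result is "  5 words"; for an empty text nothing is written.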

The parameters of procedure calls in semantic actions may be marked as input, output or transient parameters by writing the characters ↓, ↑ or ↓↑ in front of them (on an ASCII keyboard, substitute characters such as '!' are used). This is a simple way to make procedure calls more readable. In the resulting compiler these marks are removed.
5.9 Example Indication of data flow at parameters
  ComputeValues(↓argument1, ↓argument2, ↑result);

Semantic macros
Sometimes a semantic action is needed at more than one location in a grammar. To avoid rewriting the action, the user can define a macro for it and call it wherever it is needed.
  SemMacroDefinition = "sem" ":" MacroName ":" {any} "endsem".
  SemMacroCall       = "sem" "(" MacroName ")" "endsem".
  MacroName          = identifier.

A macro definition is a semantic action headed by a macro name which is enclosed in colons. It must be given in a special section of the semantic declarations (see Section 5.3.4). Note: The use of semantic macros also reduces the code size of the resulting compiler.

5.10 Example Semantic macros
Suppose the last semantic action of Example 5.8 is needed more than once. The action is defined as a macro in the semantic declarations as follows (see Section 5.3.4):
MACROS
  sem :WriteCounter:
    IF count>0 THEN WriteCard(count,3); WriteString(" words") END
  endsem
It may then be called by writing
  sem (WriteCounter) endsem

Semantic actions for pragmas
A semantic action may be associated with the declaration of a pragma. This
means that the action is executed every time the parser reads the pragma. In
this way a pragma can cause the execution of a semantic action although it
does not occur in any production.
5.11 Example Semantic actions for pragmas
PRAGMAS
  eolsy sem PrintLineInfo;   -- call a semantic procedure
            Emit(↓eol)       -- write pragma to next interpass file
        endsem

5.3.2 Attributes

Attributes describe semantic properties of symbols and their context.
  Attributes    = "<" InAttributes [";" OutAttributes] ">"
                | "<" OutAttributes ">".
  InAttributes  = "in" ":" InAttr {"," InAttr}.
  OutAttributes = "out" ":" OutAttr {"," OutAttr}.
  InAttr        = identifier | number.
  OutAttr       = identifier.

In Cocol, attributes play the role of parameters of the grammar symbols. They are classified into input attributes, which are passed to a nonterminal for its recognition, and output attributes, which arise during the recognition of a symbol.
We also distinguish between formal and actual attributes. Formal attributes occur in the declaration of a symbol or are attached to nonterminals on the left-hand side of a production. Actual attributes are attached to symbols on the right-hand side of a production.
5.12 Example Attributes
NONTERMINALS
  Variable <in:type; out:object>      -- type:   formal input attribute
                                      -- object: formal output attribute
RULES
  Variable <in:type; out:object> =    -- type:   formal input attribute
    ... .                             -- object: formal output attribute
  Declaration = ...
    Variable <in:tp; out:obj> ...     -- tp:  actual input attribute
                                      -- obj: actual output attribute

Attribute names may be used like variables in semantic actions.

Attributes of nonterminals
Nonterminals may have input and output attributes of arbitrary types. The type
of an attribute is declared like the type of any other variable (see Section
5.3.4). Formal and actual attributes must be assignment compatible in the
sense of Modula-2, although this is not checked by Coco.
Whenever a nonterminal occurs, all its attributes must follow it. Formal
and actual attributes must correspond in number, sequence, and kind (in or
out). A numeric constant may only be specified as an actual input attribute.
Attribute evaluation is similar to parameter passing in procedures: before
the recognition of a nonterminal is started, the values of the actual input
attributes of the nonterminal are assigned to its formal input attributes; when
the nonterminal has been recognized, the formal output attribute values are
assigned to its actual output attributes.
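The parameter-passing analogy can be made concrete. In the Python sketch below a hypothetical production Number<in:base; out:value> = digit {digit}. is written as a function: the input attribute becomes a parameter and the output attribute becomes a return value. The production, the check of each digit against base and the explicit position argument are assumptions of the sketch, not part of Cocol.

```python
def number(tokens: list[str], pos: int, base: int) -> tuple[int, int]:
    """Hypothetical attributed production
         Number<in:base; out:value> = digit {digit}.
    Input attribute 'base' -> parameter; output attribute 'value' ->
    result. 'pos' only threads the input position through the sketch."""
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError("digit expected")
    value = 0
    while pos < len(tokens) and tokens[pos].isdigit():
        d = int(tokens[pos])
        if d >= base:                    # semantic check using the input attribute
            raise ValueError(f"digit {d} is not allowed in base {base}")
        value = value * base + d
        pos += 1
    return value, pos
```

A right-hand-side occurrence such as Number<in:2; out:v> then corresponds to the call v, pos = number(tokens, pos, 2), with the numeric constant 2 as actual input attribute.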
Attributes of terminals and pragmas
Terminals and pragmas may have only output attributes. For implementation reasons their size is restricted to word size. This restriction can be circumvented by using abstract data types for longer attributes.
Whenever a terminal or a pragma occurs, all its attributes must follow it. For terminals, the names of the formal attributes are insignificant, but for pragmas they are significant as they may be used in a semantic action. Pragmas don't have actual attributes since they cannot appear on the right-hand side of a production. The attribute values of terminals and pragmas are supplied by the scanner (see Section 6.4.2).

5.3.3 Context conditions

There is no special language construct for context conditions in Cocol. They
are written as conditional statements in semantic actions. This has the
drawback of hiding them somewhat but has the advantage that arbitrary error
actions can be associated with them.
5.13 Example Context conditions
  sem IF type1=type2      -- context condition
      THEN ...            -- semantic action
      ELSE ...            -- error action
      END
  endsem

5.3.4 Semantic declarations

All variables, procedures and named constants that are used as attributes or in
semantic actions must be declared. The compiler description can be viewed as
a module to which these objects are local. The user may also import objects
from other modules.
  SemanticDeclarations = [ObjectDeclarations] [SemMacroDeclarations].
Declarations of semantic objects
  ObjectDeclarations = "SEMANTIC" "DECLARATIONS" modulatext.

modulatext is an arbitrary text of import statements, constant, type, variable,
or procedure declarations in Modula-2. The syntax of this text is not checked
by Coco.
5.14 Example Declarations of semantic objects
SEMANTIC DECLARATIONS
  FROM InOut IMPORT WriteCard, WriteString;
  FROM UserModule IMPORT UserProcedure;
  CONST maxint = 32767;
  VAR field: ARRAY [1..100] OF CHAR;
  PROCEDURE Equal(x, y: ARRAY OF CHAR): BOOLEAN;
  BEGIN
    ...
  END Equal;

Declaration of semantic macros
Here the user may declare a set of semantic macros, which can then be used in the productions.
  SemMacroDeclarations = "MACROS" {SemMacroDefinition}.
  SemMacroDefinition   = "sem" ":" MacroName ":" {any} "endsem".
  MacroName            = identifier.

An example of the definition and the use of a semantic macro can be found in
Section 5.3.1 (Example 5.10).

5.3.5 Scope of semantic objects

For implementation reasons, the scope of a semantic object cannot be restricted to a single production: all declared and imported objects are global to the whole compiler description. This means that the value of a semantic object may be destroyed by a nonterminal that is processed between the assignment and the use of that object. One has to resort to the following remedies:
1. Naming conventions. Every production should use its own names for those attributes and semantic objects which may be destroyed by another production. This reduces the problem to semantic objects of recursive nonterminals.
2. Stacking. All values which may be destroyed by a nonterminal should be stacked before this nonterminal is entered and unstacked afterwards.

5.15 Example Stacking of semantic objects
  Expression<out:exprval> =
      Term<out:exprval>
      { "+"               sem Push(↓exprval) endsem
        Term<out:x>       sem Pop(↑exprval); exprval:=exprval+x endsem
      }.
  Term<out:termval> =
      Factor<out:termval>
      { "*"               sem Push(↓termval) endsem
        Factor<out:x>     sem Pop(↑termval); termval:=termval*x endsem
      }.
  Factor<out:factval> =
      integer<out:factval>
    | "(" Expression<out:factval> ")".

The original values of exprval and termval are destroyed by the recursive calls to Term and Factor, so they must be saved on a stack.
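The effect of Example 5.15 can be reproduced by keeping all attributes in one shared dictionary, which mimics Coco's global semantic objects. The Python class below is a hand-written transcription, not generated code; the tokenizer and the use_stack switch are our additions. With stacking, the expression 2+3*(4+1) evaluates to 17; without it, the nested Expression and Term overwrite exprval and termval and the result is 10.

```python
import re


class SharedAttributeParser:
    """Example 5.15 transcribed by hand (a sketch). All attributes live in
    one shared namespace, so a recursive call can overwrite a value the
    caller still needs. use_stack=False omits the Push/Pop actions."""

    def __init__(self, text: str, use_stack: bool = True):
        self.tokens = re.findall(r"\d+|[+*()]", text)
        self.pos = 0
        self.use_stack = use_stack
        self.g = {"exprval": 0, "termval": 0, "factval": 0, "x": 0}
        self.stack: list[int] = []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def push(self, name):          # sem Push(↓name) endsem
        if self.use_stack:
            self.stack.append(self.g[name])

    def pop(self, name):           # sem Pop(↑name) endsem
        if self.use_stack:
            self.g[name] = self.stack.pop()

    def expression(self):          # Expression<out:exprval>
        self.term()
        self.g["exprval"] = self.g["termval"]        # Term<out:exprval>
        while self.peek() == "+":
            self.pos += 1
            self.push("exprval")
            self.term()
            self.g["x"] = self.g["termval"]          # Term<out:x>
            self.pop("exprval")
            self.g["exprval"] += self.g["x"]

    def term(self):                # Term<out:termval>
        self.factor()
        self.g["termval"] = self.g["factval"]        # Factor<out:termval>
        while self.peek() == "*":
            self.pos += 1
            self.push("termval")
            self.factor()
            self.g["x"] = self.g["factval"]          # Factor<out:x>
            self.pop("termval")
            self.g["termval"] *= self.g["x"]

    def factor(self):              # Factor<out:factval>
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            self.expression()
            self.g["factval"] = self.g["exprval"]    # Expression<out:factval>
            if self.peek() != ")":
                raise SyntaxError("')' expected")
            self.pos += 1
        elif tok is not None and tok.isdigit():
            self.pos += 1
            self.g["factval"] = int(tok)             # integer<out:factval>
        else:
            raise SyntaxError("integer or '(' expected")

    def parse(self) -> int:
        self.expression()
        if self.peek() is not None:
            raise SyntaxError("end of input expected")
        return self.g["exprval"]
```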

6
The compiler compiler Coco

This chapter describes the compiler compiler Coco from the user’s point of
view. It contains everything the user needs to know in order to produce a
compiler with Coco. Section 6.1 presents a survey of the main characteristics
of Coco, Section 6.2 describes the components of the generated compilers,
and Section 6.3 shows how these compilers work. Since Coco produces only
the basic parts of a compiler, the user must supply additional modules to get a
complete compiler. Section 6.4 describes the interfaces for these modules and
Section 6.5 shows how a multi-pass compiler can be produced with Coco.

6.1 Characteristics

Coco is a program which generates the basic parts of a compiler from a compiler description that is supplied as its input. The characteristics of Coco are:

1. The compiler description language Cocol is easy to read and easy to learn. It is based on L-attributed grammars whose syntax rules are written in Wirth's EBNF notation, and whose semantic actions are coded directly in Modula-2.
2. Coco and the compilers produced by it are small and efficient, since they use simple analysis techniques (table-driven top-down parsing and L-attributed grammars), and since the parser tables are encoded in a very compact form (G-code). Therefore, they can be used efficiently on microcomputers with a small memory and limited processor performance.
3. The generated compilers contain a syntax error-recovery algorithm that is automatically derived from the attributed grammar. This frees the user from developing individual error handlers for each target compiler.
4. The user can attach modules of his own to the generated compiler parts, thus adapting the compiler to his particular needs.
5. The input grammar is checked for completeness, consistency, and unambiguity.
6. Coco supports the production of multi-pass compilers for languages that cannot be translated in a single pass, or that are so large that a single-pass compiler will not fit into memory.
7. Coco offers the possibility of excluding selected source text portions from syntax analysis. Thus, it is possible to describe complements of regular languages, or to forward parts of the input from one pass to the next without modification.
8. Besides terminals and nonterminals, Coco provides a third class of symbols called pragmas. Pragmas are special terminals that can appear at arbitrary positions in the input stream, but are not part of the syntax of the language itself (e.g. end-of-line symbols or compiler options).

How to invoke Coco

The invocation of Coco and the naming of the files involved depend on the
computer on which Coco is running. We describe the version for the Apple
Macintosh. On the Macintosh, Coco is invoked by clicking its icon and by
selecting an input file from the open dialog box which shows all available text
files. Fig. 6.1 is a block diagram of a Coco run.
[Fig. 6.1 Input and output files of Coco: compiler description in Cocol → Coco → syntax analyzer, semantic evaluator, source list]

Coco reads a compiler description and produces the following:
1. a syntax analyzer as described in Section 2.5 together with parser tables (G-code and symbol information);
2. a semantic evaluator as described in Section 3.6;
3. a source list of the Cocol input with any syntax and semantic error messages, with the results of the grammar tests and with statistical data about the grammar.

The syntax analyzer and the semantic evaluator are generated from program frames on files. On the Macintosh, the generated parts are written to the following files:
  Syntax analyzer:     grammarnamesyn.DEF, grammarnamesyn.MOD
  Semantic evaluator:  grammarnamesem.DEF, grammarnamesem.MOD
  Source list:         inputname.LST
grammarname is the grammar name specified in Cocol, inputname is the name of the input file. Section 8.3 shows an example of these files.

6.2 Components of the generated compiler

In order to get a complete compiler, the user must attach his own modules to the compiler parts produced by Coco. The following table shows which parts are generated by Coco, which must be supplied by the user, and which are available as standard modules.

  Generated by Coco     User-supplied        Standard module
  Syntax analyzer       Main program         Error message module
  Semantic evaluator    Lexical analyzer
                        Semantic modules

Hence, Coco generates only the basic parts of a compiler (those which are described by the attributed grammar). For flexibility, the remaining parts may be written individually, although they are very similar in all compilers (see program listings in Appendix F).
The lexical analyzer can be generated with the scanner generator Alex (Mössenböck [1986]), which is a separate tool not described in this book. It produces a scanner module in Modula-2 that fits exactly with the modules generated by Coco.
The semantic modules are written in Modula-2. Only a few conventions have to be obeyed (see Section 6.4).

6.3 Operation of the generated compiler

Figure 6.2 shows the overall structure of a generated single-pass compiler.
The main program calls the syntax analyzer. The syntax analyzer parses the
source program by interpreting the G-code and executes semantic actions
contained in the semantic evaluator, which in turn call semantic procedures to
emit the target code. A filter procedure between the actual syntax analyzer and
the lexical analyzer filters any pragmas out of the input stream and processes
them semantically.
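Such a filter can be sketched as a generator that wraps the symbol stream. The Python version below is an illustration under assumptions: the Sym record, the mapping from pragma numbers to actions and the function name are hypothetical and do not reflect Coco's actual filter procedure.

```python
from typing import Callable, Dict, Iterable, Iterator, NamedTuple


class Sym(NamedTuple):
    typ: int      # symbol number (0 = end of file)
    at: tuple     # attribute values from the scanner
    line: int
    col: int


def filter_pragmas(symbols: Iterable[Sym],
                   pragma_actions: Dict[int, Callable[[Sym], None]]) -> Iterator[Sym]:
    """Sits between lexical and syntax analyzer: a pragma triggers its
    semantic action and is removed; everything else is passed on."""
    for sym in symbols:
        action = pragma_actions.get(sym.typ)
        if action is not None:
            action(sym)          # semantic processing of the pragma
        else:
            yield sym            # terminal or end of file: goes to the parser
```

The parser then sees only terminals, which is why pragmas cannot influence parsing.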
To create a multi-pass compiler, one must write a compiler description for
each pass separately and translate it with Coco. This results in a syntax
analyzer and a semantic evaluator for each pass. Figure 6.3 shows the
interaction of the generated parts in a two-pass compiler. The first pass reads
the source program, processes it and generates an intermediate language (IL).
The second pass reads the intermediate language, processes it again and
generates the target code.
[Fig. 6.2 Overall structure of a generated single-pass compiler: main program, syntax analyzer, lexical analyzer, semantic evaluator, semantic procedures, error message module]

[Fig. 6.3 Overall structure of a generated two-pass compiler: main program; syntax analyzers 1 and 2; lexical analyzer; semantic evaluators 1 and 2; semantic procedures 1 and 2]

6.4 Interfaces of the generated compiler

A compiler nucleus produced by Coco has four interfaces (shown in Fig.
6.4). It is called by the main module, reads the input stream, translates it into
an output stream, and produces error messages. This nucleus is the same for
all generated compilers. The user must attach some of his own modules to
these interfaces to adapt the compiler to his particular needs.
[Fig. 6.4 Interfaces of a generated compiler: the syntax analyzer and semantic evaluator form the nucleus, with interfaces to the operating system (caller), the input, the output and error reporting]

6.4.1 Caller interface

The main program must call the syntax analyzer of the generated compiler to
perform the syntax analysis and semantic processing of the input text. The
following definition module shows the interface between the syntax analyzer
and the main program.
DEFINITION MODULE grammarnamesyn;
  VAR printinput: BOOLEAN;   (*trace the input?*)
      printnodes: BOOLEAN;   (*trace the parser?*)
  PROCEDURE Parse(VAR correct: BOOLEAN);
END grammarnamesyn.

grammarnamesyn is the name of the generated syntax analyzer (the grammar
name from Cocol with the suffix syn). The procedure Parse is the actual
syntax analyzer. It must be called from the main program of the compiler.
Prior to this, the lexical analyzer (see Section 6.4.2) must be initialized and
ready to supply the first symbol. The parameter correct shows if syntax
errors have been found. The variables printinput and printnodes can be set to
TRUE in order to produce a trace of the syntax analysis for debugging.

6.4.2 Input interface

The syntax analyzer expects the input from a procedure GetSy which must be
supplied by the user in a module grammarnamelex (grammar name from
Cocol with the suffix lex). The corresponding definition module must look
like this:
DEFINITION MODULE grammarnamelex;
  VAR typ:  CARDINAL;                (*current symbol number*)
      at:   ARRAY [1..10] OF CHAR;   (*attributes of the current symbol*)
      line: CARDINAL;                (*current symbol line number*)
      col:  CARDINAL;                (*current symbol column number*)
  PROCEDURE GetSy;
END grammarnamelex.

Every time the syntax analyzer needs a new terminal, it calls the procedure GetSy, which returns the symbol number, line number and column number of the next source symbol in the global variables typ, line and col. It also fills the array at: if a symbol has i attributes, then at[1..i] holds their values. at is implicitly imported in any attributed grammar. It can contain a maximum of 10 attributes, which experience has shown to be sufficient. If imported, typ, line and col can be used in the attributed grammar to get the type and the attributes of symbols that are recognized by the special symbol any.
The symbol numbers returned by GetSy must correspond to the declaration sequence of the terminals and pragmas in the compiler description. The
first declared symbol must have the number 1, the next symbol must have 2
and so on. At the end of the input stream GetSy must return an end-of-file
symbol which by convention has the symbol number 0.
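The following Python class is a hypothetical analogue of such a module for a toy language with identifiers, numbers, ":=" and ";". get_sy() plays the role of GetSy; a Python list stands in for the array at, and the mapping from symbols to numbers would come from the declaration order in the compiler description. None of these names are part of Coco.

```python
import re


class Scanner:
    """Hypothetical analogue of a grammarnamelex module: get_sy() leaves
    the next symbol in typ, at, line and col, and returns the end-of-file
    symbol 0 when the input is exhausted."""

    TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*|[0-9]+|:=|;")

    def __init__(self, text: str, numbers: dict[str, int]):
        self.text = text
        self.numbers = numbers        # symbol -> number (declaration order)
        self.pos, self.cur_line, self.line_start = 0, 1, 0
        self.typ, self.at, self.line, self.col = 0, [], 0, 0

    def get_sy(self) -> None:
        # skip blanks and line ends, keeping track of the line number
        while self.pos < len(self.text) and self.text[self.pos] in " \t\n":
            if self.text[self.pos] == "\n":
                self.cur_line += 1
                self.line_start = self.pos + 1
            self.pos += 1
        self.line = self.cur_line
        self.col = self.pos - self.line_start + 1
        if self.pos >= len(self.text):
            self.typ, self.at = 0, []          # end-of-file symbol has number 0
            return
        m = self.TOKEN.match(self.text, self.pos)
        if m is None:
            raise ValueError(f"illegal character in line {self.line}, column {self.col}")
        lexeme = m.group()
        self.pos = m.end()
        if lexeme[0].isalpha():
            self.typ, self.at = self.numbers["id"], [lexeme]           # attribute: name
        elif lexeme[0].isdigit():
            self.typ, self.at = self.numbers["number"], [int(lexeme)]  # attribute: value
        else:
            self.typ, self.at = self.numbers[lexeme], []
```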

6.4.3 Output interface

For the generation of object code and other compiler outputs the user is not bound by any restrictions. One can arbitrarily attach one's own modules to the compiler nucleus and call their procedures from the semantic actions of the attributed grammar.
Thus, the output interface is the interface to all user-supplied semantic modules. It is described by the import clauses in the semantic declarations of the compiler description and by the imported definition modules.

6.4.4 Syntax error interface

The syntax analyzer of the generated compiler automatically recovers from a
syntax error and gathers information about the cause of error. However, the
user must provide for the output of the error message by supplying a
procedure SyntaxError exported from a module Errors (see standard module
in Appendix F). This procedure is called by the syntax analyzer each time a
syntax error occurs. It can print the error message immediately or store it in
order to display all error messages together at the end of the compilation. The
definition module Errors must have the following form:
DEFINITION MODULE Errors;
  TYPE
    Symbolname = ARRAY [1..25] OF CHAR;
    Errorptr   = POINTER TO Errornode;
    Errornode  = RECORD
                   txt:  Symbolname;  (*symbol name*)
                   l:    CARDINAL;    (*length of symbol name*)
                   next: Errorptr;    (*to next symbol of the same message*)
                 END;
  PROCEDURE SyntaxError(symbols: Errorptr; line, col: CARDINAL);
END Errors.

SyntaxError has three parameters: symbols is a pointer to a linked list containing the symbol that caused the error followed by the symbols that are expected at the error location (if available, alias names are used in place of symbol names). The parameters line and col indicate the line number and column number of the error location.
Figure 6.5 shows a sample list of expected symbols pointed to by the
parameter symbols.

[Fig. 6.5 List of expected symbols. colon is the symbol causing the error; semicolon or END have been expected instead.]

The first node of the list contains the symbol that caused the error (in this case the colon), the subsequent nodes contain the symbols that were expected instead of the erroneous symbol (in this case semicolon and END).

SyntaxError can now produce the following message:
  Syntax error in line ... column ... near colon: semicolon or END expected
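A SyntaxError procedure only has to walk this list. The Python sketch below mirrors the Errornode record with a dataclass (fields txt and next; the length field is unnecessary for Python strings) and builds a message in the form shown above; the names are our own, not part of Coco.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorNode:
    txt: str                              # symbol (or alias) name
    next: Optional["ErrorNode"] = None    # next symbol of the same message


def syntax_error_message(symbols: ErrorNode, line: int, col: int) -> str:
    """First node: the offending symbol; remaining nodes: expected symbols."""
    expected = []
    node = symbols.next
    while node is not None:
        expected.append(node.txt)
        node = node.next
    return (f"Syntax error in line {line} column {col} near {symbols.txt}: "
            f"{' or '.join(expected)} expected")
```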

6.5 Generation of multi-pass compilers

With L-attributed grammars, some languages can only be translated in multiple
passes. Some other languages are so complex that a single-pass compiler
would not fit into the memory of a microcomputer. For these reasons, a
compiler must often be split into several passes.
Each pass is a compiler of its own. It reads the source program, or an
intermediate language from which it produces a new intermediate language, or
the target program. If somebody wants to write a multi-pass compiler, he must
write a compiler description for each pass, and then put the produced compiler
passes in sequence (see Fig. 6.3). Cocol has features that are specially
designed for the generation of multi-pass compilers:
Input from an intermediate language. It is possible to read an intermediate language file instead of a source text by simply supplying an appropriate input procedure GetSy (see Section 6.4.2).
Pragmas serve mainly to pass control information from one pass to the
next in the intermediate language. Before they get to the syntax analyzer of the
next pass they are extracted from the input stream and processed semantically.
The symbol any. The grammar symbol any can be used to exclude parts of the source text from the syntax analysis and to forward them unchanged to the next pass.
6.1 Example Application of any

A typical application of the complement symbol any is to process
declarations in the first pass of a compiler and statements in the second
pass. The following example skips statements and forwards them to the
next pass:
   Block =
      Declarations
      BEGINSY
      { any
        sem
           Copy(↓typ, ↓line, ↓col, ↓at);   -- copy symbol to next intermediate language
        endsem
      }
      ENDBLOCKSY.

Here, any denotes all terminal symbols except ENDBLOCKSY. It can
be semantically processed using the variables typ and at exported by the
lexical analyzer (see Section 6.4.2).

7
The implementation

In this chapter we will show how Coco is structured and how it works. First we provide an overview of its design (7.1). Then we describe the internal data structures such as the symbol list (7.2) and the top-down graph (7.3), as well as the collection of some sets of terminal symbols (7.4). Section 7.5 covers various grammar tests which the top-down graph is subjected to before the target compiler is generated. The last three sections cover the generation of the compiler parts, namely the parser tables (7.6), the syntax analyzer (7.7), and the semantic evaluator (7.8). Section 8.3 shows an example of the generated compiler parts for a specific input grammar.
At the beginning of each section, a diagram is used to illustrate how this section relates to the structure of Chapter 7.
[Figure: The implementation, divided into: Structure of the symbol list; Structure of the top-down graph; Collecting the symbol sets; Grammar tests; Generation of the parser tables; Generation of the syntax analyzer; Generation of the semantic evaluator]
Fig. 7.1 Structure of Chapter 7

We describe algorithms in an abstract manner, using Adele or Cocol. Appendix F contains the concrete implementation of Coco. Details that are not necessary for understanding the algorithms are omitted, as they can be found in the program listings.
Coco is written in Modula-2 and has been implemented on various microcomputers including Macintosh, IBM-PC, Atari and Lilith. It produces compilers in Modula-2 and was used for its own implementation, too. We describe the implementation on the Macintosh.

7.1 Survey

Like any compiler, Coco is composed of an analysis part (front end) and a
synthesis part (back end). The analysis part consists of a lexical analyzer and
a syntax analyzer. The synthesis part consists of a semantic evaluator with
several semantic modules attached to it (Fig. 7.2).
[Figure: Main program; Syntax analyzer; Lexical analyzer; Semantic evaluator with the semantic modules Symbol list handler, Top-down graph handler, Grammar tests, Generation of the syntax analyzer, and Generation of the semantic evaluator]
Fig. 7.2 Structure of Coco with its main tasks shown as semantic modules

From Fig. 7.2, the main tasks of Coco are:

1. handling a symbol list: symbol information is stored (name, symbol number, attributes, scope, etc.);
2. handling a top-down graph: graph nodes are generated and linked to form subgraphs;
3. testing the grammar: the grammar is checked to see whether it is complete, non-circular, and LL(1). It is also checked whether all nonterminals can be reached and derived into terminal strings;
4. generating the syntax analyzer: the source code of the generated syntax analyzer is built from fixed frame parts and variable parts derived from the compiler description. It includes LL(1) parser tables generated from the attributed grammar;
5. generating the semantic evaluator: the source code of the semantic evaluator is built from fixed frame parts and from semantic actions and declarations copied from the compiler description.

The main algorithm of Coco is as follows:

   Coco:
   begin
      Initialize lexical analyzer;
      Parse(↑ok);
      if ok then
         Find deletable symbols;                   -- Section 7.4.1
         Insert eps-nodes before deletable nt's;   -- Section 7.3.3
         Delete redundant eps-nodes;               -- Section 7.3.4
         Get symbol sets;                          -- Section 7.4
         Test grammar(↑ok);                        -- Section 7.5
      end;
      if ok
         then Generate compiler;                   -- Sections 7.6 and 7.7
         else Print error message;
      end;
   end Coco;

The procedure Parse parses the input text and calls the semantic actions for
the construction of the top-down graph and the symbol list as well as for the
generation of the semantic evaluator. After some tests and transformations of
the data structures the target compiler is produced.

7.2 Structure of the symbol list

Coco handles a symbol list with information about terminals, nonterminals, and pragmas. This section describes its representation and shows how it is filled.

[Figure: the parts of Chapter 7 (see Fig. 7.1), with Structure of the symbol list subdivided into Symbol list representation and Symbol list construction]
Fig. 7.3 Structure of Section 7.2

7.2.1 Symbol list representation

The symbol list is a linear list of symbol nodes, each of them describing a syntax symbol. The list is indexed by symbol numbers.

   TYPE
      Symboltype = (eps,t,pr,nt,any,err);  (*eps, terminal, pragma, nonterminal, any, error-symbol*)
      Symbolnode = RECORD
         spix:      CARDINAL;              (*spelling index of symbol name*)
         aliasspix: CARDINAL;              (*spelling index of alias name*)
         nra:       CARDINAL;              (*number of attributes*)
         CASE typ: Symboltype OF           (*symbol kind*)
            t,eps,any:                     (*nothing*)
          | pr:
               sem1,sem2: CARDINAL;        (*pragma semantics*)
          | nt,err:
               start:   CARDINAL;          (*start of top-down graph*)
               del:     BOOLEAN;           (*TRUE if deletable*)
               firstat: Attributeptr;      (*to first formal attribute*)
         END;
      END;
      Symbollist = ARRAY [0..maxsymbol] OF Symbolnode;

The fields spix, aliasspix, nra, and typ are filled when the symbol is
declared. For terminals, this is the only information stored in the symbol list.
The node of a pragma has two additional fields denoting the semantic
actions which the generated compiler has to execute when it reads this pragma.
The first action is for the output attribute assignments (Section 7.8.4), the
second is the semantic action associated with this pragma in Cocol. If no
actions are to be executed, both fields are zero. The fields are filled when the
pragma is declared.
Nonterminal nodes contain additional information: the field start points to the root of the top-down graph of this specific nonterminal. It is set when the corresponding rule has been processed. At the same time, the field del is set, which indicates whether the nonterminal is directly deletable, i.e. whether it can be immediately derived into the empty string. The indirect deletability of a nonterminal can only be determined when the top-down graphs of all nonterminals have been built (see Section 7.4.1). Finally, nonterminal nodes have a field firstat pointing to a list of formal attributes. This list contains the name and direction (input/output) of each attribute of the nonterminal. The attribute list is built when the nonterminal is declared. It is implemented as follows:
   TYPE
      Direction    = (up,down);              (*attribute direction*)
      Attributeptr = POINTER TO Attribute;
      Attribute    = RECORD
         spix: CARDINAL;                     (*attribute name*)
         dir:  Direction;                    (*up, down*)
         next: Attributeptr;                 (*to next attribute of same nt*)
      END;

Names of symbols and attributes are not stored in the symbol list directly.
Rather, they are stored in a name list which is an array of characters. Instead
of the actual names the symbol list contains only their address in the name list,
called spix (spelling index). The lexical analyzer handles a hashed list of
'spixes' for fast searching of names.
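To make the idea of a spelling index concrete, here is a small Python sketch of a name list. The class NameList, the NUL terminator after each name and the character-sum hash are illustrative assumptions; the text only states that names live in one character array, that a spix is an address in it, and that the lexical analyzer keeps a hashed list of spixes.

```python
class NameList:
    """Hypothetical name list: all names stored in one character array."""

    def __init__(self, buckets: int = 127) -> None:
        # Position 0 holds an empty name, so spix 0 can mean "no name".
        self.chars: list[str] = ["\0"]
        # Hash table of spixes for fast searching of names.
        self.table: list[list[int]] = [[] for _ in range(buckets)]

    def _hash(self, name: str) -> int:
        return sum(ord(ch) for ch in name) % len(self.table)

    def spelling(self, spix: int) -> str:
        """Return the name stored at spelling index spix."""
        end = self.chars.index("\0", spix)
        return "".join(self.chars[spix:end])

    def enter(self, name: str) -> int:
        """Return the spix of name, appending the name if it is new."""
        bucket = self.table[self._hash(name)]
        for spix in bucket:
            if self.spelling(spix) == name:
                return spix
        spix = len(self.chars)
        self.chars.extend(name)       # one character per array element
        self.chars.append("\0")       # terminator (an assumption of this sketch)
        bucket.append(spix)
        return spix
```

Entering "Expression" and then "Term" into an empty list yields the spixes 1 and 12, because each name occupies its characters plus one terminator.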

7.2.2 Symbol list construction

For each symbol in the syntax declarations of Cocol, a symbol node with a
successive number is allocated. Therefore, symbol numbers correspond to the
declaration sequence of the symbols. The following procedures are used to
generate, access, and modify symbol nodes:

   PROCEDURE NewSy(spix: CARDINAL; typ: Symboltype): CARDINAL;
   PROCEDURE SyNr(spix: CARDINAL): CARDINAL;
   PROCEDURE GetSy(sy: CARDINAL; VAR sn: Symbolnode);
   PROCEDURE RepSy(sy: CARDINAL; sn: Symbolnode);

NewSy generates a new symbol node with the fields spix and typ and returns its node number. SyNr searches for the symbol with the name spix. If spix is found, SyNr returns the corresponding symbol number; otherwise it returns 65535 (the value of the null symbol). GetSy gets the symbol node sn corresponding to symbol number sy. RepSy replaces the symbol sy by the node sn.
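The four procedures could be mirrored in Python roughly as follows. This is a sketch under stated assumptions: the class and method names are hypothetical, the Modula-2 variant record is flattened into one dataclass, SyNr uses a linear search instead of Coco's hashing, and the field del is renamed deletable because del is a Python keyword. GetSy hands out a copy, so a changed node must be written back with RepSy, as the Adele algorithms of this chapter do.

```python
from dataclasses import dataclass, replace

NULL_SYMBOL = 65535  # returned by sy_nr for an unknown name


@dataclass
class Symbolnode:
    spix: int                # spelling index of the symbol name
    typ: str                 # "eps", "t", "pr", "nt", "any" or "err"
    aliasspix: int = 0
    nra: int = 0             # number of attributes
    start: int = 0           # nt only: root of its top-down graph
    deletable: bool = False  # nt only: the field del ("del" is a Python keyword)


class Symbollist:
    def __init__(self) -> None:
        self.nodes: list[Symbolnode] = []

    def new_sy(self, spix: int, typ: str) -> int:
        """Append a node; symbol numbers follow the declaration order."""
        self.nodes.append(Symbolnode(spix, typ))
        return len(self.nodes) - 1

    def sy_nr(self, spix: int) -> int:
        """Symbol number of the symbol named spix, or NULL_SYMBOL."""
        for sy, sn in enumerate(self.nodes):
            if sn.spix == spix:
                return sy
        return NULL_SYMBOL

    def get_sy(self, sy: int) -> Symbolnode:
        """Return a copy, like the VAR parameter of GetSy."""
        return replace(self.nodes[sy])

    def rep_sy(self, sy: int, sn: Symbolnode) -> None:
        self.nodes[sy] = sn
```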
Attributes are processed with the following procedures:
   PROCEDURE NewAt(sy, spix: CARDINAL; dir: Direction);
   PROCEDURE GetAt(sy, n: CARDINAL; VAR spix: CARDINAL; VAR dir: Direction);
   PROCEDURE CompleteAt(sy, n: CARDINAL): BOOLEAN;

NewAt defines a new attribute for the symbol sy. For nonterminals, it also appends the name (spix) and the direction (dir) of the attribute to the attribute list. GetAt gets the fields spix and dir of the nth attribute of the nonterminal sy. If sy has fewer than n attributes, 0 is returned as the value of spix.
CompleteAt returns TRUE if the symbol sy has exactly n attributes. The implementation of these procedures is trivial, as can be seen in Appendix F.
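A Python sketch of the attribute procedures, under these assumptions: n is counted from 1, the attribute count nra is incremented for every symbol while only nonterminals get a formal attribute list, CompleteAt compares against nra, and the symbol list is reduced to a dictionary of hypothetical Sym records. Coco's own implementation is the one in Appendix F.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attribute:
    spix: int                           # attribute name
    direction: str                      # "up" or "down" (the field dir)
    next: Optional["Attribute"] = None  # next attribute of the same nt


@dataclass
class Sym:
    typ: str                            # "t", "pr", "nt", ...
    nra: int = 0                        # number of attributes
    firstat: Optional[Attribute] = None


def new_at(syms: dict[int, Sym], sy: int, spix: int, direction: str) -> None:
    sn = syms[sy]
    sn.nra += 1
    if sn.typ == "nt":                  # only nonterminals keep a formal list
        at = Attribute(spix, direction)
        if sn.firstat is None:
            sn.firstat = at
        else:
            p = sn.firstat
            while p.next is not None:   # append at the end of the list
                p = p.next
            p.next = at


def get_at(syms: dict[int, Sym], sy: int, n: int) -> tuple[int, Optional[str]]:
    """(spix, direction) of the n-th attribute, counting from 1; spix 0 if absent."""
    p = syms[sy].firstat
    for _ in range(n - 1):
        if p is None:
            break
        p = p.next
    return (0, None) if p is None else (p.spix, p.direction)


def complete_at(syms: dict[int, Sym], sy: int, n: int) -> bool:
    return syms[sy].nra == n
```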

7.3 Structure of the top-down graph

The top-down graph has already been described in Section 2.3 as an internal
grammar representation. In Coco, it is implemented in a somewhat extended
form. First, we will describe the extended top-down graphs, and then show
how they are generated. In Section 7.6.2, we will describe the translation of
top-down graphs into G-code.
[Figure: the parts of Chapter 7 (see Fig. 7.1), with Structure of the top-down graph subdivided into Top-down graph representation, Top-down graph construction, Insertion of eps-nodes, and Removal of redundant eps-nodes]
Fig. 7.4 Structure of Section 7.3

7.3.1 Top-down graph representation

The top-down graph is a linear list of graph nodes. Each symbol on the right-hand side of a Cocol rule is represented by a node. The pointers linking the nodes are indices of this list.
   TYPE
      Topdowngraph = ARRAY [1..maxnode] OF Graphnode;
      Graphnode = RECORD
         typ:  (eps,t,nt,any);  (*symbol kind*)
         sp:   CARDINAL;        (*t,nt: pointer to node in symbol list*)
                                (*eps:  pointer to eps-set*)
                                (*any:  pointer to any-set*)
         lp:   CARDINAL;        (*left pointer*)
         rp:   CARDINAL;        (*right pointer*)
         sem1: CARDINAL;        (*in-attribute action*)
         sem2: CARDINAL;        (*out-attribute action*)
         sem3: CARDINAL;        (*explicit semantic action*)
         line: CARDINAL;        (*line number in the source text*)
         link: CARDINAL;        (*pointer to the next right end*)
      END;

Compared to Section 2.3, the graph node is extended by three semantic action numbers, a line number, and a pointer (link). These fields have the following meaning:

   sem1: action number of the input attribute assignments, or zero (Sect. 7.8.4);
   sem2: action number of the output attribute assignments, or zero (Sect. 7.8.4);
   sem3: number of the user-written semantic action which follows this symbol in the Cocol text, or zero;
   line: line number of this symbol in the Cocol text (for error messages);
   link: pointer for linking the right ends of a graph (the right ends are the nodes whose right pointer is zero).

7.3.2 Top-down graph construction

It is useful to think of a top-down graph as a 'black box' linked to its environment by two pointers, head and tail. The interior of the black box may contain a single node or an arbitrarily complex graph with several nodes (Fig. 7.5).

Fig. 7.5 Top-down graph as a 'black box'

head points to the root of the graph and tail to its right end. Since the right end of the graph usually consists of several nodes, these nodes are linked (see the dashed lines in Fig. 7.5). The following procedures are used to generate and process the graph nodes:
   PROCEDURE NewNode(typ: Symboltype; sy, line: CARDINAL): CARDINAL;
   PROCEDURE GetNode(n: CARDINAL; VAR gn: Graphnode);
   PROCEDURE RepNode(n: CARDINAL; gn: Graphnode);

NewNode creates a graph node containing the specified symbol sy, having the symbol type typ and the line number line, and returns its node number. GetNode returns the nth graph node in gn. RepNode replaces the nth graph node by gn.
Two top-down graphs can be combined into a new graph by arranging them either side by side as successive components or below one another as alternatives. In either case, a new top-down graph with head and tail is produced.

Linking of successive components
Coco uses the procedure ConcatRight to link successive components.

   ConcatRight(↕head1, ↕tail1, ↓head2, ↓tail2):
      param head1,head2,tail1,tail2: Cardinal;
      local p: Cardinal;
   begin
      p:=tail1;
      while p<>0 do            -- for all right ends of the first graph
         gn(p).rp:=head2;
         p:=gn(p).link;
      end;
      tail1:=tail2;
   end ConcatRight;

ConcatRight links the graphs (head1, tail1) and (head2, tail2) via right pointers, giving the new graph (head1, tail1). The right ends of the first graph are linked with the root of the second graph (see Fig. 7.6).
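As a cross-check of ConcatRight, here is a Python transcription over a global node array gn. The Node class is a simplified, hypothetical stand-in for Graphnode (only lp, rp, link and a label), and because Python has no transient parameters the function returns the new (head, tail) pair instead of updating head1 and tail1 in place.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str      # label as drawn in the figures
    lp: int = 0    # left pointer: next alternative
    rp: int = 0    # right pointer: successor
    link: int = 0  # next node in the chain of right ends


gn = [Node("null")]  # index 0 plays the role of the null pointer


def new_node(name: str) -> int:
    gn.append(Node(name))
    return len(gn) - 1


def concat_right(head1: int, tail1: int, head2: int, tail2: int) -> tuple[int, int]:
    """Link every right end of graph 1 to the root of graph 2."""
    p = tail1
    while p != 0:
        gn[p].rp = head2
        p = gn[p].link
    return head1, tail2  # the new graph keeps head1 and takes over tail2


# (a|b) c: a and b are alternatives and both are right ends of the first graph
a, b, c = new_node("a"), new_node("b"), new_node("c")
gn[a].lp = b
gn[a].link = b
head, tail = concat_right(a, a, c, c)
```

After the call, both a and b point to c, and c is the only right end of the combined graph.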

Fig. 7.6 Linking of successive components


Linking of alternatives
Coco uses the procedure ConcatLeft to link alternatives.

   ConcatLeft(↕head1, ↕tail1, ↓head2, ↓tail2):
      param head1,head2,tail1,tail2: Cardinal;
      local p: Cardinal;
   begin
      p:=head1;
      while gn(p).lp<>0 do p:=gn(p).lp; end;
      gn(p).lp:=head2;
      p:=tail1;
      while gn(p).link<>0 do p:=gn(p).link; end;
      gn(p).link:=tail2;
   end ConcatLeft;

ConcatLeft links the graphs (head1, tail1) and (head2, tail2) via left pointers, giving the new graph (head1, tail1). The end of the first alternative chain of the first graph is linked with the root of the second graph. The right ends of both graphs are connected in a similar way (see Fig. 7.7).
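The same kind of Python transcription for ConcatLeft, again with a simplified hypothetical Node record and returning the (head, tail) pair. The helper right_ends, which walks the link chain, is added only to make the result visible.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    lp: int = 0
    rp: int = 0
    link: int = 0


gn = [Node("null")]  # index 0 plays the role of the null pointer


def new_node(name: str) -> int:
    gn.append(Node(name))
    return len(gn) - 1


def concat_left(head1: int, tail1: int, head2: int, tail2: int) -> tuple[int, int]:
    """Append graph 2 as a further alternative of graph 1."""
    p = head1
    while gn[p].lp != 0:       # walk to the end of the alternative chain
        p = gn[p].lp
    gn[p].lp = head2
    p = tail1
    while gn[p].link != 0:     # walk to the last right end of graph 1
        p = gn[p].link
    gn[p].link = tail2         # right ends of graph 2 join the chain
    return head1, tail1


def right_ends(tail: int) -> list[str]:
    names = []
    while tail != 0:
        names.append(gn[tail].name)
        tail = gn[tail].link
    return names


# a | b | c
a, b, c = new_node("a"), new_node("b"), new_node("c")
head, tail = concat_left(a, a, b, b)
head, tail = concat_left(head, tail, c, c)
```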

Fig. 7.7 Linking of alternatives

An attributed grammar for the construction of top-down graphs
In order to show that attributed grammars can be used for documentation as
well, we will describe the generation of the top-down graph for one syntax
rule by means of an attributed grammar. The complete top-down graph is
composed of the graphs for all syntax rules.


The grammar of EBNF rules

   Rule       = identifier "=" Expression ".".
   Expression = Term {"|" Term}.
   Term       = Factor {Factor}.
   Factor     = symbol | "eps" | "any"
              | "(" Expression ")"
              | "[" Expression "]"
              | "{" Expression "}".

contains the nonterminals Expression, Term, and Factor. Each of these nonterminals supplies as an output attribute a top-down graph with the ends head and tail. These graphs can be linked in two different ways: factor graphs are linked via right pointers, term graphs via left pointers (ConcatRight and ConcatLeft). A new top-down graph is formed in either case, which is again represented by head and tail.
Expression, Term, and Factor also supply an output attribute del, which indicates whether the expression, term, or factor is directly deletable, i.e. whether it can be derived into the empty string. del is entered into the symbol list.
The attributed grammar uses the procedures described above to handle the
symbol list (GetSy, RepSy, SyNr) and the top-down graph (NewNode,
ConcatLeft, ConcatRight).
   GRAMMAR Rule   -- graph generation for a single rule

   SEMANTIC DECLARATIONS
      FROM cocogra IMPORT NewNode, ConcatLeft, ConcatRight, Push, Pop;
      FROM cocosym IMPORT GetSy, RepSy, SyNr, Symbolnode, anysy, epssy;
      VAR
         h1,h2,h3: CARDINAL;          -- head pointers
         t1,t2,t3: CARDINAL;          -- tail pointers
         del1,del2,del3: BOOLEAN;     -- TRUE, if element is deletable
         sn: Symbolnode;
         spix,syspix: CARDINAL;       -- spelling indices
         sy: CARDINAL;                -- symbol number

   MACROS
      sem :PushValues:
         Push(↓h1); Push(↓t1); Push(↓del1);
         Push(↓h2); Push(↓t2); Push(↓del2);
      endsem
      sem :PopValues:
         Pop(↑del2); Pop(↑t2); Pop(↑h2);
         Pop(↑del1); Pop(↑t1); Pop(↑h1);
      endsem

   TERMINALS
      "(" ")" "[" "]" "{" "}" "=" "|" "." "eps" "any" symbol<out:spix>

   NONTERMINALS
      Rule
      Expression<out:h1,t1,del1>
      Term<out:h2,t2,del2>
      Factor<out:h3,t3,del3>

   RULES
      Rule =
         symbol<out:syspix>
         "="
         Expression<out:h1,t1,del1>
         sem
            sy:=SyNr(↓syspix);
            GetSy(↓sy,↑sn);
            sn.del:=del1; sn.start:=h1;
            RepSy(↓sy,↓sn);
         endsem
         ".".

      Expression<out:h1,t1,del1> =
         Term<out:h1,t1,del1>
         { "|" Term<out:h2,t2,del2>
           sem
              ConcatLeft(↕h1,↕t1,↓h2,↓t2);
              del1:=del1 OR del2;
           endsem
         }.

      Term<out:h2,t2,del2> =
         Factor<out:h2,t2,del2>
         { Factor<out:h3,t3,del3>
           sem
              ConcatRight(↕h2,↕t2,↓h3,↓t3);
              del2:=del2 AND del3;
           endsem
         }.

      Factor<out:h3,t3,del3> =
           symbol<out:spix>
           sem
              sy:=SyNr(↓spix);
              h3:=NewNode(↓sy); t3:=h3;
              del3:=FALSE;
           endsem
         | "eps"
           sem
              h3:=NewNode(↓epssy); t3:=h3;
              del3:=TRUE;
           endsem
         | "any"
           sem
              h3:=NewNode(↓anysy); t3:=h3;
              del3:=FALSE;
           endsem
         | "("
           sem (PushValues) endsem
           Expression<out:h3,t3,del3>
           ")"
           sem (PopValues) endsem
         | "["
           sem (PushValues) endsem
           Expression<out:h3,t3,del3>
           "]"
           sem
              h1:=NewNode(↓epssy); t1:=h1;
              ConcatLeft(↕h3,↕t3,↓h1,↓t1);
              del3:=TRUE;
           endsem
           sem (PopValues) endsem
         | "{"
           sem (PushValues) endsem
           Expression<out:h3,t3,del3>
           "}"
           sem
              h1:=NewNode(↓epssy); t1:=h1;
              ConcatRight(↕h3,↕t3,↓h3,↓t3);
              ConcatLeft(↕h3,↕t3,↓h1,↓t1);
              t3:=t1; del3:=TRUE;
           endsem
           sem (PopValues) endsem.

   ENDGRAM

Figure 7.8 shows which graphs are produced by the translation of an EBNF expression in brackets. As an example, we select the expression ab|c.

[Figure: top-down graphs produced for (ab|c), [ab|c], and {ab|c}]
Fig. 7.8 Translation of an EBNF expression into a top-down graph
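The effect of the bracket rules can be replayed in Python. The sketch below uses transcriptions of ConcatRight and ConcatLeft, builds ab|c as the grammar would (factors linked by ConcatRight, terms by ConcatLeft), and then applies the semantic actions of the "(", "[" and "{" alternatives of Factor. Node labels and the helpers ab_or_c, bracket and right_ends are hypothetical; only the pointer manipulations follow the text. Like the attributed grammar of this section, the sketch does not insert the additional eps-node in front of curly-bracket expressions that Section 7.3.3 describes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    lp: int = 0
    rp: int = 0
    link: int = 0


gn = [Node("null")]  # index 0 plays the role of the null pointer


def new_node(name):
    gn.append(Node(name))
    return len(gn) - 1


def concat_right(h1, t1, h2, t2):
    p = t1
    while p != 0:
        gn[p].rp = h2
        p = gn[p].link
    return h1, t2


def concat_left(h1, t1, h2, t2):
    p = h1
    while gn[p].lp != 0:
        p = gn[p].lp
    gn[p].lp = h2
    p = t1
    while gn[p].link != 0:
        p = gn[p].link
    gn[p].link = t2
    return h1, t1


def ab_or_c():
    """Expression ab|c: Term a b (right pointers) | Term c (left pointer)."""
    a, b, c = new_node("a"), new_node("b"), new_node("c")
    h, t = concat_right(a, a, b, b)
    return concat_left(h, t, c, c)


def bracket(kind, h, t):
    """Semantic actions of Factor for "(", "[" and "{" applied to (h, t)."""
    if kind == "(":
        return h, t
    e = new_node("eps")
    if kind == "[":
        return concat_left(h, t, e, e)   # eps-node as an extra alternative
    h, t = concat_right(h, t, h, t)      # "{": loop the body back to its root
    h, t = concat_left(h, t, e, e)
    return h, e                          # only the eps-node remains a right end


def right_ends(t):
    names = []
    while t != 0:
        names.append(gn[t].name)
        t = gn[t].link
    return names
```

For example, right_ends of the (ab|c) graph gives b and c, the [ab|c] graph adds the eps-node, and the {ab|c} graph has the eps-node as its only right end.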

7.3.3 Insertion of eps-nodes

Normally each symbol of the input grammar corresponds to one node in the
top-down graph. However, from Fig. 7.8, we see that the translation of
expressions in square or curly brackets leads to the generation of additional
eps-nodes which have no counterpart in the input grammar. They are inserted
by Coco to indicate that an expression is deletable.
There are also some other cases where eps-nodes must be inserted into
graphs: The algorithm of Section 7.3.2 will fail if a term that begins with an
expression in curly brackets has an alternative. The production
   S = ({a} b | c)

would lead to the top-down graph shown in Fig. 7.9.


Fig. 7.9 Erroneous top-down graph for S = ({a} b | c)

This is obviously wrong because, once an a has been identified, only a or b should follow, not c, as is possible in the above graph. This problem is solved by including an eps-node in front of the first alternative (Fig. 7.10).

Fig. 7.10 Correct top-down graph for S = ({a} b | c) with inserted eps-node

This graph is now correct, since after identifying an a, only a or b can follow, not c. For each eps-node, the set of terminal successors (eps-set) is computed (Section 7.4.4). The eps-set of the node e1 (namely {a, b}) allows us to distinguish between the two alternatives in the above example. Eps-nodes are inserted in front of all expressions in curly brackets during the construction of the top-down graph (see the attributed grammar in Appendix F).
Deletable nonterminals present a similar problem. If a nonterminal is
deletable, it is always processed by the syntax analyzer, because if the current
input symbol is not a start symbol of the nonterminal itself it may still be a
valid successor. Now, if there is a node which is an alternative of a deletable
nonterminal, this node will never be visited, since the nonterminal will always
be recognized beforehand. Coco solves this problem by inserting an eps-node
in front of a deletable nonterminal. The eps-set of this node is then used to
distinguish between the alternatives. From the graphs shown in Fig. 7.11,
where the deletable nonterminal Y has an alternative, the graphs in Fig. 7.12
are produced.
[Figure: graph of X, in which the deletable nonterminal Y, followed by a, has the alternative b; graph of Y]
Fig. 7.11 Top-down graph with deletable nonterminal Y
[Figure: X: eps-node e1, followed by Y and a, with the alternative b; Y: c, with the alternative eps-node e2]
Fig. 7.12 Top-down graph with inserted eps-node in front of deletable nonterminal Y


The eps-set of the node e1 (namely {a, c}; c is a terminal start symbol of Y, and a is a successor of the deletable nonterminal Y) enables the selection between the two alternatives starting with e1 and b. The node with the deletable nonterminal Y no longer has any alternatives. It can therefore be safely visited by the syntax analyzer.
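The transformation from Fig. 7.11 to Fig. 7.12 can be sketched in Python as follows. It is a simplified transcription of the algorithm shown next, under these assumptions: nodes are kept in a Python list whose element 0 stands for the null pointer, symbols are referred to by name, the del flags come in a dictionary, and the semantic-action, line and link fields of Graphnode are omitted.

```python
from dataclasses import dataclass, replace


@dataclass
class Node:
    typ: str      # "t", "nt" or "eps"
    sp: str = ""  # symbol name (a stand-in for the symbol-list index)
    lp: int = 0   # left pointer (alternative)
    rp: int = 0   # right pointer (successor)


def insert_eps_nodes(gn: list[Node], deletable: dict[str, bool]) -> None:
    """Put an eps-node in front of every deletable nt that has an alternative."""
    for i in range(1, len(gn)):             # gn[0] stands for the null pointer
        node = gn[i]
        if node.typ == "nt" and node.lp != 0 and deletable.get(node.sp, False):
            gn.append(replace(node, lp=0))  # moved copy of the nt, without alternative
            j = len(gn) - 1
            # node i becomes the eps-node: it keeps the alternative and leads to j
            gn[i] = Node("eps", "", lp=node.lp, rp=j)


# X = Y a | b   with deletable Y (Fig. 7.11)
gn = [Node("eps"), Node("nt", "Y", lp=3, rp=2), Node("t", "a"), Node("t", "b")]
insert_eps_nodes(gn, {"Y": True})
```

Node 1 is now the eps-node e1 with the alternative b, and the new node 4 holds Y, still followed by a.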
The algorithm for the insertion of eps-nodes in front of deletable
nonterminals is shown below.
   Insert eps-nodes before deletable nt's:
      local gn,gn1: Graphnode;
            sn: Symbolnode;
   begin
      for all nodes i do
         GetNode(↓i,↑gn);
         if (gn.typ=nt) and (gn.lp<>0) then
            GetSy(↓gn.sp,↑sn);
            if sn.del then                       -- deletable nt with alternative
               gn1:=gn; gn1.lp:=0;               -- gn1 now holds the deletable nt
               j:=NewNode(↓nt,↓0,↓0);            -- create empty node
               RepNode(↓j,↓gn1);
               gn.typ:=eps; gn.sp:=0;            -- gn holds the new eps-node
               gn.rp:=j; gn.sem1:=0; gn.sem2:=0; gn.sem3:=0;
               RepNode(↓i,↓gn);
            end;
         end;
      end;
   end Insert eps-nodes before deletable nt's;

7.3.4 Removal of redundant eps-nodes

When expressions in square or curly brackets are translated, eps-nodes arise
that can be removed again if it turns out that the expressions have successors
(see Fig. 7.13). The algorithm for the removal of redundant eps-nodes is
shown below:
   Delete redundant eps-nodes:
      global visited: set of nodenumbers;   -- mark list for visited nodes
      local  sn: Symbolnode;
   begin
      visited:={};
      for all nonterminals i do
         GetSy(↓i,↑sn);
         DelEps(↓sn.start);
      end;
   end Delete redundant eps-nodes;

[Figure: for each of the EBNF expressions [a] b and {a} b, the graph with redundant eps-nodes and the equivalent graph without redundant eps-nodes]
Fig. 7.13 Creation and removal of redundant eps-nodes

The procedure DelEps(↓loc) deletes all redundant eps-nodes in the top-down graph with the root loc. Redundant eps-nodes can be recognized by the following characteristics: they have no associated semantic actions, their left pointer is null, and their right pointer is not null. They are always reached via the left pointer of some other node.

   DelEps(↓loc):
      param  loc: Cardinal;
      global visited: set of nodenumbers;   -- mark list for visited nodes
      local  gn,gn1: Graphnode;
   begin
      if loc=0 or loc in visited then return; end;   -- end or cycle
      visited:=visited+{loc};
      GetNode(↓loc,↑gn);
      if gn.lp<>0 then                    -- test if alt. node is a redundant eps-node
         GetNode(↓gn.lp,↑gn1);
         if (gn1.typ=eps) and (gn1.sem3=0)
            and (gn1.lp=0) and (gn1.rp<>0) then
            gn.lp:=gn1.rp;
            RepNode(↓loc,↓gn);
         end;
      end;
      DelEps(↓gn.lp);
      DelEps(↓gn.rp);
   end DelEps;
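A Python transcription of DelEps, with a simplified hypothetical Node record (only the fields the example needs) and the mark list passed in as a set. The two sample graphs follow the construction rules of Section 7.3.2 for [a] b and {a} b, so the results can be compared with the graphs without redundant eps-nodes in Fig. 7.13.

```python
from dataclasses import dataclass


@dataclass
class Node:
    typ: str       # "t", "nt" or "eps"
    name: str = ""
    lp: int = 0
    rp: int = 0
    sem3: int = 0  # explicit semantic action (0 = none)


def del_eps(gn: list[Node], loc: int, visited: set[int]) -> None:
    """Bypass redundant eps-nodes that are reached via left pointers."""
    if loc == 0 or loc in visited:   # end of graph or cycle
        return
    visited.add(loc)
    node = gn[loc]
    if node.lp != 0:
        alt = gn[node.lp]
        if alt.typ == "eps" and alt.sem3 == 0 and alt.lp == 0 and alt.rp != 0:
            node.lp = alt.rp           # the alternative now starts at the successor
    del_eps(gn, node.lp, visited)
    del_eps(gn, node.rp, visited)


# [a] b: a has an eps alternative, and both lead to b
opt = [Node("eps"), Node("t", "a", lp=3, rp=2), Node("t", "b"), Node("eps", rp=2)]
del_eps(opt, 1, set())
# {a} b: a loops back to itself, its eps alternative leads to b
rep = [Node("eps"), Node("t", "a", lp=3, rp=1), Node("t", "b"), Node("eps", rp=2)]
del_eps(rep, 1, set())
```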


7.4 Collecting the symbol sets

So far, the input grammar has been read and the symbol list as well as the top-down graph have been built. From these two data structures, Coco calculates the symbol sets needed for the grammar tests and for the generated compiler.
[Figure: the parts of Chapter 7 (see Fig. 7.1), with Collecting the symbol sets subdivided into Deletable nonterminals, Terminal start symbols of nonterminals, Terminal successors of nonterminals, eps-sets, and any-sets]
Fig. 7.14 Structure of Section 7.4

Coco collects four sets of terminals:

1. start symbols of nonterminals;
2. successors of nonterminals;
3. successors of eps-nodes (eps-sets);
4. sets represented by any-symbols (any-sets).

The following procedures are used to access the top-down graph and the
symbol list:
   PROCEDURE GetNode(loc: CARDINAL; VAR gn: Graphnode);
   PROCEDURE RepNode(loc: CARDINAL; gn: Graphnode);
   PROCEDURE GetSy(sy: CARDINAL; VAR sn: Symbolnode);
   PROCEDURE RepSy(sy: CARDINAL; sn: Symbolnode);

GetNode gets the graph node gn with the number loc. RepNode replaces
the graph node with the number loc by the node gn. GetSy gets the symbol
node sn with the number sy. RepSy replaces the symbol node with the
number sy by the node sn.
Before the symbol sets are collected, it is necessary to find out which
nonterminals are deletable.

7.4.1 Deletable nonterminals

All deletable nonterminals are tagged in the symbol list. In the first step, those symbols which can be directly derived into the empty string are tagged. In the second step, all those nonterminals whose top-down graph can be traversed along a path of already tagged symbols are tagged. The second step is repeated until no more deletable symbols are found.
The directly deletable nonterminals are found when the top-down graph is created (see Section 7.3.2). The following algorithm finds the indirectly deletable nonterminals.
   Find deletable symbols:
      local sn: Symbolnode;
            changed: Boolean;
   begin
      repeat
         changed:=false;
         for all nonterminals i do
            GetSy(↓i,↑sn);
            if not sn.del and Deletable(↓sn.start) then
               changed:=true;
               sn.del:=true; RepSy(↓i,↓sn);
            end;
         end;
      until not changed;
   end Find deletable symbols;

The procedure Deletable(↓loc) checks whether the top-down graph rooted at loc is deletable (i.e. whether it can be traversed along a path of deletable symbols).
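The fixpoint iteration of Find deletable symbols, together with the procedures Deletable, DelGraph and DelNode described in the rest of this section, could look like this in Python. The sketch is a transcription under simplifying assumptions: graph nodes live in a list, symbols are identified by name, the del flags are kept in a dictionary, and Deletable is folded into a call of del_graph with an empty mark list. B and C are marked as directly deletable, as the graph construction would do; E is listed first so that a second pass is needed.

```python
from dataclasses import dataclass


@dataclass
class Node:
    typ: str      # "t", "nt" or "eps"
    sp: str = ""  # symbol name for t and nt nodes
    lp: int = 0
    rp: int = 0


def del_node(node: Node, deletable: dict[str, bool]) -> bool:
    if node.typ == "nt":
        return deletable[node.sp]
    return node.typ == "eps"


def del_graph(gn, loc, marked, deletable) -> bool:
    if loc == 0:
        return True                    # end of graph found
    if loc in marked:
        return False                   # already visited: cycle
    marked.add(loc)
    node = gn[loc]
    return ((node.lp != 0 and del_graph(gn, node.lp, marked, deletable))
            or (del_node(node, deletable) and del_graph(gn, node.rp, marked, deletable)))


def find_deletable(gn, start, deletable) -> None:
    """Repeat until no more nonterminals become deletable."""
    changed = True
    while changed:
        changed = False
        for nt, root in start.items():
            if not deletable[nt] and del_graph(gn, root, set(), deletable):
                deletable[nt] = True
                changed = True


# E = A.  A = B C.  B = b | eps.  C = c | eps.  D = A d.
gn = [Node("eps"),
      Node("nt", "B", rp=2), Node("nt", "C"),   # A
      Node("t", "b", lp=4), Node("eps"),         # B
      Node("t", "c", lp=6), Node("eps"),         # C
      Node("nt", "A", rp=8), Node("t", "d"),     # D
      Node("nt", "A")]                           # E
start = {"E": 9, "A": 1, "B": 3, "C": 5, "D": 7}
deletable = {"E": False, "A": False, "B": True, "C": True, "D": False}
find_deletable(gn, start, deletable)
```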
   Deletable(↓loc) Boolean:
      param  loc: Cardinal;
      global marked: set of nodenumbers;   -- mark list for visited nodes
   begin
      marked:={};
      return DelGraph(↓loc);
   end Deletable;

The actual work is performed by the procedure DelGraph.

   DelGraph(↓loc) Boolean:
      param  loc: Cardinal;
      global marked: set of nodenumbers;
      local  gn: Graphnode;
   begin
      if loc=0 then return true; end;            -- end of graph found
      if loc in marked then return false; end;   -- already visited: cycle
      marked:=marked+{loc};
      GetNode(↓loc,↑gn);
      return ((gn.lp<>0) and DelGraph(↓gn.lp))   -- deletable alternat.
          or (DelNode(↓gn) and DelGraph(↓gn.rp)); -- or deletable right part of graph
   end DelGraph;


Finally, DelNode checks whether a node (i.e. its corresponding symbol) is deletable.

   DelNode(↓gn) Boolean:
      param gn: Graphnode;
      local sn: Symbolnode;
   begin
      if gn.typ=nt
         then GetSy(↓gn.sp,↑sn); return sn.del;
         else return gn.typ=eps;
      end;
   end DelNode;

7.4.2 Terminal start symbols of nonterminals

The terminal start symbols of a nonterminal are the terminal start symbols of
its top-down graph, i.e. the start symbols of its first alternative chain. Those
nodes of the chain which contain nonterminals will have their terminal start
symbols calculated recursively. If the chain contains a deletable symbol, its
successors have also to be considered. The terminal start symbols of all
nonterminals are stored in a list.
   Get terminal start symbols:
      global first: array(nonterminals) of record
                       ts: set of terminals;   -- terminal start symbols
                       ready: Boolean;         -- true, if ts is computed
                    end;
      local  sn: Symbolnode;
   begin
      for all nonterminals i do first(i).ready:=false; end;
      for all nonterminals i do
         GetSy(↓i,↑sn);
         GetFirstSet(↓sn.start,↑first(i).ts);
         first(i).ready:=true;
      end;
   end Get terminal start symbols;

The procedure GetFirstSet(↓loc,↑s) supplies the terminal start symbols of the top-down graph with the root loc.
   GetFirstSet(↓loc,↑s):
      param  loc: Cardinal;
             s: set of terminals;
      global visited: set of nodenumbers;   -- mark list for visited nodes
   begin
      visited:={};
      CollectFirst(↓loc,↑s);
   end GetFirstSet;


GetFirstSet initializes a mark list for the prevention of cycles and calls the
procedure CollectFirst which does the actual work.
   CollectFirst(↓loc,↑s):
      param  loc: Cardinal;
             s: set of terminals;
      global visited: set of nodenumbers;   -- mark list for visited nodes
             first: like in 'Get terminal start symbols';
      local  sn: Symbolnode;
             gn: Graphnode;
             s1: set of terminals;
   begin
      s:={};
      while loc<>0 do                              -- for all alternatives
         if loc in visited then return; end;       -- cycle
         visited:=visited+{loc};
         GetNode(↓loc,↑gn);
         if DelNode(↓gn) then
            CollectFirst(↓gn.rp,↑s1); s:=s+s1;
         end;
         case gn.typ of
            t:   s:=s+{gn.sp};
          | nt:  if first(gn.sp).ready
                    then s:=s+first(gn.sp).ts;
                    else GetSy(↓gn.sp,↑sn);
                         CollectFirst(↓sn.start,↑s1); s:=s+s1;
                 end;
          | any: s:=s+allterminals;
          | eps: -- nothing
         end;
         loc:=gn.lp;
      end;
   end CollectFirst;

The procedure DelNode(↓gn) from Section 7.4.1 checks whether the graph node gn is deletable.
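A compact Python rendering of Get terminal start symbols, GetFirstSet and CollectFirst, under the same simplifying assumptions as a plain transcription: nodes in a list, symbols by name, the del flags from Section 7.4.1 given in a dictionary, and the ready flag replaced by the presence of an entry in the dictionary first. An any-node contributes all terminals, as in CollectFirst above. The function name first_sets and the sample grammar are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    typ: str      # "t", "nt", "eps" or "any"
    sp: str = ""  # terminal or nonterminal name
    lp: int = 0
    rp: int = 0


def first_sets(gn, start, deletable, terminals):
    """Terminal start symbols of every nonterminal."""
    first = {}  # an entry exists once its set is "ready"

    def del_node(node):
        return deletable[node.sp] if node.typ == "nt" else node.typ == "eps"

    def collect_first(loc, visited):
        s = set()
        while loc != 0:                    # for all alternatives
            if loc in visited:             # cycle
                return s
            visited.add(loc)
            node = gn[loc]
            if del_node(node):             # successors are start symbols, too
                s |= collect_first(node.rp, visited)
            if node.typ == "t":
                s.add(node.sp)
            elif node.typ == "nt":
                if node.sp in first:
                    s |= first[node.sp]
                else:
                    s |= collect_first(start[node.sp], visited)
            elif node.typ == "any":
                s |= set(terminals)
            loc = node.lp
        return s

    for nt, root in start.items():
        first[nt] = collect_first(root, set())  # GetFirstSet: fresh mark list
    return first


# A = B C.  B = b | eps.  C = c | eps.  D = A d.
gn = [Node("eps"),
      Node("nt", "B", rp=2), Node("nt", "C"),
      Node("t", "b", lp=4), Node("eps"),
      Node("t", "c", lp=6), Node("eps"),
      Node("nt", "A", rp=8), Node("t", "d")]
start = {"A": 1, "B": 3, "C": 5, "D": 7}
deletable = {"A": True, "B": True, "C": True, "D": False}
first = first_sets(gn, start, deletable, {"b", "c", "d"})
```

Because A is deletable, the start symbols of D include the successor d as well as the start symbols of A.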

7.4.3 Terminal successors of nonterminals

The terminal successors of all nonterminals are stored in another list. They are collected in two steps: first, a search is made for the direct successors of all nonterminals (those terminals immediately following this nonterminal at all its occurrences in the graph); then the indirect successors are calculated (if a nonterminal is at the end of a rule, its indirect successors are the successors of the nonterminal on the left-hand side of this rule).
In the first step, the data structure follow is filled; this contains for each
nonterminal i its direct successors (ts) and those nonterminals (nts), whose
successors are indirect successors of i. In the second step, the indirect
successors are added to ts.


   Get terminal successors:
      global follow: array(nonterminals) of record
                        ts:  set of terminals;     -- terminal successors
                        nts: set of nonterminals;  -- nt's whose successors
                                                   -- must be added to ts
                     end;
             visitednod: set of nodenumbers;       -- mark list (visited nodes)
             visitedsym: set of nonterminals;      -- mark list (visited nt's)
      local  sn: Symbolnode;
             i: Cardinal;
   begin
      for all nonterminals i do follow(i).ts:={}; follow(i).nts:={}; end;
      visitednod:={};
      for all nonterminals i do                    -- fill follow.ts and follow.nts
         GetSy(↓i,↑sn);
         CollectFollow(↓sn.start,↓i);
      end;
      for all nonterminals i do                    -- complete follow.ts
         visitedsym:={};
         Complete(↓i);
      end;
   end Get terminal successors;

The procedure CollectFollow(↓loc,↓sy) traverses the top-down graph of the nonterminal sy starting at the node loc. Every time it encounters a nonterminal i, it adds its direct successors to the set follow(i).ts. For each nonterminal i at the right end of the graph, it adds sy to the set follow(i).nts.
   CollectFollow(↓loc,↓sy):
      param  loc,sy: Cardinal;
      global follow: as in 'Get terminal successors';
             visitednod: set of nodenumbers;
      local  gn: Graphnode;
             s: set of terminals;
   begin
      while loc<>0 do                                -- step through alternatives chain
         if loc in visitednod then return; end;      -- cycle
         visitednod:=visitednod+{loc};
         GetNode(↓loc,↑gn);
         if gn.typ=nt then
            GetFirstSet(↓gn.rp,↑s);
            follow(gn.sp).ts := follow(gn.sp).ts + s;
            if Deletable(↓gn.rp) then                -- nt at end of rule
               follow(gn.sp).nts := follow(gn.sp).nts + {sy};
            end;
         end;
         CollectFollow(↓gn.rp,↓sy);
         loc:=gn.lp;
      end;
   end CollectFollow;

The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 computes the set of terminal start symbols s of the graph with the root loc. The procedure Deletable(↓loc) from Section 7.4.1 checks whether the graph rooted at loc is deletable.

The procedure Complete(↓i) used in Get terminal successors completes the direct successors of the nonterminal i (follow(i).ts) by adding its indirect successors, which are the successors of the nonterminals contained in follow(i).nts.
   Complete(↓i):
      param  i: Cardinal;
      global visitedsym: set of nonterminals;
             follow: like in 'Get terminal successors';
      local  j: Cardinal;
   begin
      if i in visitedsym then return; end;      -- cycle
      visitedsym:=visitedsym+{i};
      for all j in follow(i).nts do
         Complete(↓j);
         follow(i).ts:=follow(i).ts+follow(j).ts;
      end;
   end Complete;
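The completion step can be sketched in Python on its own, starting from the direct successors (ts) and the nts sets that CollectFollow would have produced. The dictionaries, the function name complete_all and the small grammar are hypothetical; the recursion and the fresh mark list per nonterminal follow Complete and Get terminal successors.

```python
def complete_all(ts: dict[str, set[str]], nts: dict[str, set[str]]) -> None:
    """Add indirect successors: ts[i] receives ts[j] for every j in nts[i]."""

    def complete(i: str, visitedsym: set[str]) -> None:
        if i in visitedsym:          # cycle
            return
        visitedsym.add(i)
        for j in nts[i]:
            complete(j, visitedsym)
            ts[i] |= ts[j]

    for i in ts:
        complete(i, set())           # fresh mark list for each nonterminal


# S = A "x".  A = B.  B = "b" | C.  C = "c".
ts = {"S": set(), "A": {"x"}, "B": set(), "C": set()}
nts = {"S": set(), "A": set(), "B": {"A"}, "C": {"B"}}
complete_all(ts, nts)
```

B ends the rule for A and C ends an alternative of B, so both inherit the successor x of A.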

7.4.4 eps-sets

eps-nodes having an alternative must not be recognized by the generated syntax analyzer unless the next input symbol is a valid successor of this eps-node. In order to find out whether a symbol is a valid successor, the syntax analyzer must know the set of all possible successors of each eps-node with alternatives.
The terminal successors of an eps-node are the terminal start symbols of the subgraph rooted at the right pointer of the eps-node. If the right pointer is null, the terminal successors are the successors of the nonterminal on the left-hand side of the graph containing the eps-node.
First, the top-down graph of each nonterminal is searched for eps-nodes.
Get eps-sets:
global epsset:  array of set of terminals;  -- successors of eps-nodes
       maxeps:  Cardinal;                   -- number of eps-sets
       visited: set of nodenumbers;         -- mark list for visited nodes
local  sn:      Symbolnode;
begin
  visited:={}; maxeps:=0;
  for all nonterminals i do
    GetSy(↓i,↑sn);
    FindEps(↓sn.start,↓i,↓false);
  end;
end Get eps-sets;

The procedure FindEps(↓loc,↓leftsy,↓vialp) searches the top-down graph with the root loc for eps-nodes. It computes their successors and stores them in the global array epsset. The field sp of each eps-node is set to point to its entry in epsset. The flag vialp indicates whether loc has been reached via a left pointer.
FindEps(↓loc,↓leftsy,↓vialp):
param  loc:     Cardinal;            -- root of graph
       leftsy:  Cardinal;            -- left-hand side nonterminal
       vialp:   Boolean;             -- true, if loc is reached via lp
global visited: set of nodenumbers;  -- mark list for visited nodes
local  gn:      Graphnode;
begin
  if loc=0 or loc in visited then return; end;   -- end or cycle
  visited:=visited+{loc};
  GetNode(↓loc,↑gn);
  if (gn.typ=eps) and (vialp or (gn.lp<>0)) then  -- eps with alt.
    FindEpsFollowers(↓gn.rp,↓leftsy,↑gn.sp);      -- gn.sp points to eps-set
    RepNode(↓loc,↓gn);
  end;
  FindEps(↓gn.lp,↓leftsy,↓true);
  FindEps(↓gn.rp,↓leftsy,↓false);
end FindEps;

The procedure FindEpsFollowers(↓loc,↓leftsy,↑nr) collects the terminal start symbols of the subgraph with the root loc. If the subgraph is deletable, the successors of the nonterminal leftsy are also added. The collected set is stored as a new entry in the global array epsset, and nr is the index of this entry, so the set is found in epsset(nr).
FindEpsFollowers(↓loc,↓leftsy,↑nr):
param  loc,leftsy,nr: Cardinal;
global epsset: array of set of terminals;  -- successors of eps-nodes
       follow: like in 'Get terminal successors';
       maxeps: Cardinal;
local  s:      set of terminals;
begin
  GetFirstSet(↓loc,↑s);
  if Deletable(↓loc) then s:=s+follow(leftsy).ts; end;
  maxeps:=maxeps+1;
  epsset(maxeps):=s;
  nr:=maxeps;
end FindEpsFollowers;

The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 collects the terminal start symbols of the graph with the root loc. The procedure Deletable(↓loc) from Section 7.4.1 determines whether the graph with the root loc is deletable.
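The rule implemented by FindEpsFollowers can be sketched in Python. The sketch assumes that the first sets, the deletable nonterminals and the completed successor sets follow(i).ts have already been computed and are passed in as plain sets and dictionaries. The right subgraph of the eps-node is flattened into a list of symbols, which simplifies the top-down graph. The function name is hypothetical.

```python
def find_eps_followers(rest, leftsy, first, follow, dele, epsset):
    """Append the eps-set of one eps-node to epsset and return its 1-based
    index nr.  rest: symbols behind the eps-node ([] if its right pointer is
    null); first: start symbols per nonterminal; follow: follow(i).ts per
    nonterminal; dele: set of deletable nonterminals."""
    s = set()
    for sym in rest:                 # GetFirstSet on the right subgraph
        s |= first.get(sym, {sym})   # a terminal starts with itself
        if sym not in dele:
            break
    else:                            # no break: rest is deletable (or empty)
        s |= follow[leftsy]
    epsset.append(s)                 # maxeps:=maxeps+1; epsset(maxeps):=s
    return len(epsset)               # nr:=maxeps
```

The index starts at 1 and grows with every call, so epsset(nr) in the pseudocode corresponds to epsset[nr - 1] in the Python list.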

7.4.5 any-sets

In order to recognize an any-symbol, the generated syntax analyzer needs the
set of all terminals represented by the any-symbol. An any-symbol represents
all terminals which are not in the alternative chain to which it belongs. For
any-symbols without alternatives, no any-sets are computed. The syntax
analyzer recognizes them regardless of the next input symbol.
Get any-sets:
global anyset: array of set of terminals;  -- any-sets
       maxany: Cardinal;                   -- number of any-sets
       eofsy:  Cardinal;                   -- symbol number of eof-symbol
local  gn:     Graphnode;
       s:      set of terminals;
begin
  for all nodes i do
    GetNode(↓i,↑gn);
    if (gn.typ=any) and (gn.lp<>0) then
      GetFirstSet(↓gn.lp,↑s);
      Make complement of s;
      s:=s-{eofsy};                -- eofsy must not be recognized by any
      maxany:=maxany+1;
      anyset(maxany):=s;
      gn.sp:=maxany;               -- sp of any-node points to any-set
      RepNode(↓i,↓gn);
    end;
  end;
end Get any-sets;

The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 supplies the terminal start symbols of the graph with the root loc.
For the calculation of an any-set, only those symbols are considered
which can be reached via the left pointer of the any-node. The symbols which
lie before the any-node in the alternative chain are not considered, since the
syntax analyzer has already checked them before it gets to the any-node.
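An any-set is a set complement over the terminal vocabulary. The Python sketch below assumes that the start symbols of every alternative in the chain are already known. alt_starts is a hypothetical list in which the entry for the any-alternative itself is ignored. The sketch shows why symbols of alternatives before the any-node stay in the set: only the alternatives behind it, which are reached via its left pointer, are removed.

```python
def get_any_set(alt_starts, any_index, terminals, eofsy):
    """Terminals represented by the any-symbol at position any_index of an
    alternative chain; alt_starts[k] holds the terminal start symbols of the
    k-th alternative."""
    following = set()
    for starts in alt_starts[any_index + 1:]:   # only alternatives reached via lp
        following |= starts
    return set(terminals) - following - {eofsy}  # eof is never matched by any
```

Consider the chain "a" | any | "b" over the terminals eof, a, b and c. Its any-set is {a, c}: b is excluded because it is tested as a later alternative, and eof is never matched by any.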

7.5 Grammar tests

Before Coco generates the target compiler, it carefully checks whether the grammar satisfies certain requirements that are necessary for a correct compiler. Here the compiler compiler proves to be very valuable: even in large grammars, which are hard for human readers to understand, it rapidly finds hidden ambiguities or circularities. The well-known problem of the 'dangling else' clearly shows how easily bugs in the grammar design can remain undetected without the support of an automatic tool (in fact, this ambiguity was overlooked in the language definition of Algol).
Coco verifies the following properties:
1. completeness;
2. reachability;
3. noncircularity;
4. termination;
5. LL(1) property.

Fig. 7.15 Structure of Section 7.5 (diagram of the chapter outline: structure of the symbol list, structure of the top-down graph, collecting the symbol sets, grammar tests, generation of the parser tables, generation of the syntax analyzer, generation of the semantic evaluator; the grammar tests comprise Completeness, Reachability, Noncircularity, Terminalization and LL(1)-condition)

The test algorithms are executed in the following order:
Test grammar(↑ok):
  Test completeness(↑ok1);
  Test if all nt's can be reached(↑ok2);
  Find circular rules(↑ok3);
  Test if nt's can be derived to t's(↑ok4);
  LL1 test(↑ok5);
  ok:=ok1 and ok2 and ok3 and ok4 and ok5;
end Test grammar;

These algorithms access the top-down graph and the symbol list with the following procedures, already described in Sections 7.2.2 and 7.3.2:
PROCEDURE GetNode(loc:CARDINAL; VAR gn:Graphnode);
PROCEDURE GetSy(sy:CARDINAL; VAR sn:Symbolnode);

7.5.1 Completeness

A check is carried out as to whether there is a rule for every nonterminal.
Basic idea: The field start in the symbol node of each nonterminal must point to a top-down graph.
Test completeness(↑ok):
param  ok: Boolean;
local  sn: Symbolnode;
begin
  ok:=true;
  for all nonterminals i do
    GetSy(↓i,↑sn);
    if sn.start=0 then ok:=false; end;
  end;
end Test completeness;

7.5.2 Reachability

A check is made as to whether all declared nonterminals appear in some sentential form derived from the start symbol of the grammar.
Basic idea: First, all nonterminals that can be derived directly from the start symbol are tagged, then all nonterminals that can be derived from symbols already tagged. This is repeated until no more nonterminals can be tagged. The untagged nonterminals are not reachable.
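This tagging process is a plain graph search. The Python sketch below uses a worklist instead of the recursive MarkReachedNts. It also works on a simplified dictionary grammar (nonterminal to list of alternatives) rather than on top-down graphs. Both the representation and the function name are assumptions made for illustration.

```python
def unreachable_nts(grammar, rootsy):
    """Nonterminals that occur in no sentential form derived from rootsy."""
    reached = {rootsy}
    work = [rootsy]                      # tagged but not yet scanned
    while work:
        nt = work.pop()
        for alt in grammar[nt]:
            for sym in alt:
                if sym in grammar and sym not in reached:
                    reached.add(sym)     # tag a newly reached nonterminal
                    work.append(sym)
    return set(grammar) - reached        # untagged: not reachable
```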
Test if all nt's can be reached(↑ok):
param  ok:      Boolean;
global visited: set of nodenumbers;   -- already visited nodes
       reached: set of nonterminals;  -- reachable nonterminals
       rootsy:  Cardinal;             -- start symbol of grammar
local  sn:      Symbolnode;
begin
  visited:={};
  reached:={rootsy};
  GetSy(↓rootsy,↑sn);
  MarkReachedNts(↓sn.start);
  ok:=true;
  for all nonterminals i do
    if not (i in reached) then ok:=false; end;
  end;
end Test if all nt's can be reached;

The procedure MarkReachedNts(↓loc) marks all nonterminals which can be reached from the node loc.

MarkReachedNts(↓loc):
param  loc:     Cardinal;
global reached: set of nonterminals;  -- reachable nonterminals
       visited: set of nodenumbers;   -- already visited nodes
local  gn:      Graphnode;
       sn:      Symbolnode;
begin
  if loc=0 or loc in visited then return; end;    -- end or cycle
  visited:=visited+{loc};                          -- visit loc
  GetNode(↓loc,↑gn);
  if (gn.typ=nt) and not (gn.sp in reached) then   -- new nt reached
    reached:=reached+{gn.sp};
    GetSy(↓gn.sp,↑sn);
    MarkReachedNts(↓sn.start);
  end;
  MarkReachedNts(↓gn.lp);
  MarkReachedNts(↓gn.rp);
end MarkReachedNts;

7.5.3 Noncircularity

A check is made as to whether there are nonterminals which can be derived into themselves, i.e. whether there are derivations X ⇒+ X for some nonterminal X. (This definition of circularity differs from the usual definition for attributed grammars, which refers to circular dependencies between attributes.)
Basic idea: Only productions whose right-hand side can be derived into a single nonterminal are considered. These single-nonterminal productions make up a graph that must be noncircular.
Algorithm: The graph is stored as pairs (left, right) of nonterminals for which there is a production left → right.
Find circular rules(↑ok):
param  ok:          Boolean;
global visited:     set of nodenumbers;   -- mark list for visited nodes
local  graph:       array of record       -- derivation graph
                      left, right: Cardinal;
                      deleted:     Boolean;
                    end;
       graphlength: Cardinal;
       singles:     set of nonterminals;
       sn:          Symbolnode;
       changed:     Boolean;
       i:           Cardinal;
begin
  graphlength:=0;
  for all nonterminals i do               -- build the graph
    visited:={}; singles:={};
    GetSy(↓i,↑sn);
    GetSingles(↓sn.start,↑singles);      -- single descendants of a nt
    for all nonterminals j in singles do
      graphlength:=graphlength+1;
      with graph(graphlength) do
        left:=i; right:=j; deleted:=false;
      end;
    end;
  end;
  repeat                                  -- remove edges which are not on a cycle
    changed:=false;
    for i:=1 to graphlength do
      if not graph(i).deleted and
         (graph(i).left not on any right-hand side or
          graph(i).right not on any left-hand side) then
        graph(i).deleted:=true; changed:=true;
      end;
    end;
  until not changed;
  ok:=graph is empty;
end Find circular rules;

The elements that have not been deleted from the graph represent the circular part of the grammar.
The procedure GetSingles(↓loc,↑singles) collects a set (singles) of nonterminals in the top-down graph with the root loc. If the graph can be derived into a single nonterminal X, then X is added to singles. The following assertion always holds: loc is on a path which contains only deletable symbols between its beginning and loc.
GetSingles(↓loc,↑singles):
param  loc:     Cardinal;
       singles: set of nonterminals;
global visited: set of nodenumbers;
local  gn:      Graphnode;
begin  -- assert: all nodes left to loc are deletable
  if loc=0 or loc in visited then return; end;  -- end or cycle
  visited:=visited+{loc};
  GetNode(↓loc,↑gn);
  if (gn.typ=nt) and Deletable(↓gn.rp) then     -- right subgraph deletable
    singles:=singles+{gn.sp};
  end;
  if DelNode(↓gn) then GetSingles(↓gn.rp,↑singles); end;
  GetSingles(↓gn.lp,↑singles);
end GetSingles;

A nonterminal X is added to singles if it is on a path from loc to the end of
the top-down graph and if this path has only deletable nodes to the left and
right of X. The deletability of subgraphs and nodes is determined by the
procedures Deletable and DelNode from Section 7.4.1.
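The edge-removal loop of Find circular rules can be sketched in Python on the list of (left, right) pairs. The sketch assumes the pairs have already been collected; in Coco, GetSingles plays that role. It removes all removable edges of a pass at once instead of one by one. Removing an edge can only make the condition harder for the other edges to satisfy, so both orders end with the same remaining set.

```python
def circular_part(edges):
    """Remove (left, right) edges whose left side occurs on no remaining right
    side or whose right side occurs on no remaining left side; return the rest."""
    remaining = list(edges)
    changed = True
    while changed:
        lefts = {l for l, _ in remaining}
        rights = {r for _, r in remaining}
        kept = [(l, r) for l, r in remaining if l in rights and r in lefts]
        changed = len(kept) != len(remaining)
        remaining = kept
    return remaining                 # empty: the grammar is noncircular
```

A non-empty result means the grammar contains a circular derivation such as A ⇒+ A. Edges on a path between two cycles can also survive, so the result may be slightly larger than the cycles themselves.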

7.5.4 Termination

A check is made as to whether all nonterminals can be derived into (possibly
empty) strings of terminals.
Basic idea: Those nonterminals are tagged which are deletable or can be
derived into a string consisting only of terminals or already tagged nonterminals. This is repeated until no more nonterminals can be tagged. The untagged nonterminals are those which cannot be derived into terminals.
Test if nt's can be derived to t's(↑ok):
param  ok:       Boolean;
global visited:  set of nodenumbers;   -- mark list for visited nodes
       termlist: set of nonterminals;  -- nonterminals which can be derived to terminals
local  changed:  Boolean;
       sn:       Symbolnode;
begin
  termlist:={};
  repeat
    changed:=false;
    for all nonterminals i which are not in termlist do
      GetSy(↓i,↑sn);
      visited:={};
      if IsTerm(↓sn.start) then
        termlist:=termlist+{i}; changed:=true;
      end;
    end;
  until not changed;
  ok:=all nonterminals are in termlist;
end Test if nt's can be derived to t's;

The procedure IsTerm(↓loc) checks whether the top-down graph with the root loc has a (possibly empty) path which consists only of terminals or already tagged nonterminals.
IsTerm(↓loc): Boolean;
param  loc:      Cardinal;
global visited:  set of nodenumbers;
       termlist: set of nonterminals;
local  gn:       Graphnode;
begin
  if loc=0 or loc in visited then return false; end;  -- end or cycle
  visited:=visited+{loc};
  GetNode(↓loc,↑gn);
  if (gn.typ=nt) and not (gn.sp in termlist) then
    return IsTerm(↓gn.lp);
  else
    return (gn.rp=0) or IsTerm(↓gn.rp) or IsTerm(↓gn.lp);
  end;
end IsTerm;
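The tagging loop for termination is a fixed-point iteration. The Python sketch below uses the simplified dictionary grammar, which is an assumption; Coco itself works on top-down graphs via IsTerm. A nonterminal is tagged as soon as one of its alternatives consists only of terminals and already tagged nonterminals. An empty alternative therefore counts immediately.

```python
def nonterminating(grammar):
    """Nonterminals that cannot be derived into a (possibly empty) terminal string."""
    termlist, changed = set(), True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            if nt in termlist:
                continue
            # one alternative of terminals and already tagged nonterminals suffices
            if any(all(s not in grammar or s in termlist for s in alt) for alt in alts):
                termlist.add(nt)
                changed = True
    return set(grammar) - termlist     # untagged nonterminals
```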

7.5.5 LL(1) condition

A check is made as to whether it is always possible to decide which path of the
top-down graph should be followed during syntax analysis depending on the
next input symbol.
Basic idea: The LL(1) test consists of the following two subtests:
1. The terminal start symbols of all alternatives in an alternative chain must be disjoint.
2. The terminal start symbols of deletable subgraphs must be disjoint from the terminal successors of the left-hand side nonterminal.
LL1 test(↑ok):
param  ok:      Boolean;
global visited: set of nodenumbers;  -- mark list for visited nodes
local  sn:      Symbolnode;
begin
  ok:=true;
  for all nonterminals i do
    visited:={};
    GetSy(↓i,↑sn);
    CheckAlternatives(↓sn.start,↓i,↑ok);
  end;
end LL1 test;

The procedure CheckAlternatives(↓loc,↓sy,↑ok) checks whether the alternative chain with the root loc contains only alternatives with distinct start symbols (subtest 1). If the subgraph rooted at loc is deletable (i.e. if it can produce the empty string), it also checks whether the start symbols of the subgraph are different from the successors of the left-hand side nonterminal sy (subtest 2).
CheckAlternatives uses GetF(↓sy,↑first) and GetFo(↓sy,↑follow) to access the already calculated sets of terminal start symbols and successors of nonterminals.
CheckAlternatives(↓loc,↓sy,↑ok):
param  loc,sy:  Cardinal;
       ok:      Boolean;
global visited: set of nodenumbers;  -- mark list for visited nodes
local  first:   set of terminals;
       follow:  set of terminals;
       locset:  set of terminals;    -- start symbols of current node
       s:       set of terminals;    -- start symbols of prev. alt.
       gn:      Graphnode;
begin
  if loc=0 or loc in visited then return; end;  -- end or cycle
  if Deletable(↓loc) then                        -- subtest 2
    GetFirstSet(↓loc,↑s);
    GetFo(↓sy,↑follow);
    if s*follow<>{} then ok:=false; end;
  end;
  s:={};
  while loc<>0 do                                -- for all alternatives: subtest 1
    if loc in visited then return; end;
    visited:=visited+{loc};
    GetNode(↓loc,↑gn);
    if DelNode(↓gn) then GetFirstSet(↓gn.rp,↑locset);
    else locset:={};
    end;
    case gn.typ of
      t:       locset:=locset+{gn.sp};
    | nt:      GetF(↓gn.sp,↑first);
               locset:=locset+first;
    | eps,any: -- nothing
    end;
    if s*locset<>{} then ok:=false; end;
    s:=s+locset;
    CheckAlternatives(↓gn.rp,↓sy,↑ok);
    loc:=gn.lp;
  end;
end CheckAlternatives;

The procedures Deletable(↓loc) and DelNode(↓gn) from Section 7.4.1 check whether the top-down graph with the root loc or the graph node gn, respectively, is deletable. The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 supplies the terminal start symbols s of the top-down graph with the root loc.
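Both subtests can be sketched in Python on the simplified dictionary grammar. The sketch assumes that the first sets, the deletable nonterminals and the successor sets are already available. CheckAlternatives examines every alternative chain and every deletable subgraph inside a top-down graph. This sketch only checks the alternatives of whole right-hand sides, so options and iterations must be expressed as separate rules. The dangling-else grammar mentioned at the beginning of Section 7.5 is the classic case that fails subtest 2.

```python
def seq_first(seq, first, dele):
    """Terminal start symbols of a symbol sequence."""
    s = set()
    for sym in seq:
        s |= first.get(sym, {sym})
        if sym not in dele:
            break
    return s


def ll1_conflicts(grammar, first, follow, dele):
    """Return (nonterminal, subtest, offending terminals) for every violation."""
    conflicts = []
    for nt, alts in grammar.items():
        seen = set()                          # start symbols of previous alternatives
        for alt in alts:                      # subtest 1: disjoint alternatives
            s = seq_first(alt, first, dele)
            if seen & s:
                conflicts.append((nt, "alternatives", seen & s))
            seen |= s
        if nt in dele and seen & follow[nt]:  # subtest 2: deletable rule
            conflicts.append((nt, "deletable", seen & follow[nt]))
    return conflicts
```

For Stat = "if" E "then" Stat Else | "x" and Else = "else" Stat | ε, the sketch reports ("Else", "deletable", {"else"}). After a deletable Else the parser cannot tell whether an "else" belongs to the inner or the outer if.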

7.6 Generation of the parser tables

When the grammar tests are completed, Coco can generate the target compiler.
From the symbol list and the top-down graph, the parser tables which drive
the generated compiler are constructed. The tables contain information for the
recognition of symbols and for error handling, including the G-code which
controls the syntax analysis. This section is structured as shown in Fig. 7.16.

7.6.1 Table format

The parser tables are inserted into the generated syntax analyzer as initialization code. Table 7.1 shows their contents:

Fig. 7.16 Structure of Section 7.6 (diagram of the chapter outline: structure of the symbol list, structure of the top-down graph, collecting the symbol sets, grammar tests, generation of the parser tables, generation of the syntax analyzer, generation of the semantic evaluator; the generation of the parser tables comprises Table format, Generation of the G-code and Generation of the remaining tables)
Table 7.1 Contents of the parser tables

header              table dimensions (for decoding)
code                G-code
ntsymbols           information about nonterminals
epssets             sets of valid successors, one for each eps-instruction in the G-code
anysets             sets of terminals represented by each any-symbol
attribute numbers   number of attributes for each terminal and each pragma
pragma semantics    for each pragma, the semantic actions to be executed when
                    the pragma is recognized
namelist            symbol names for error messages
name pointers       pointers to the symbol names

The structure of the above data is shown by the following Modula-2 type declarations:
TYPE
  Header = RECORD
    maxcodevar, maxtvar, maxpvar, maxsvar,
    maxepsvar, maxanyvar, maxnamevar, maxnamepvar: CARDINAL;
  END;
  Code = ARRAY[1..maxcode] OF [0..255];
  Symbolset = ARRAY[0..maxt DIV 16] OF BITSET;
  Ntsymbols = ARRAY[maxp+1..maxsym] OF RECORD
    startpc: CARDINAL;   (*start of rule in G-code*)
    del:     BOOLEAN;    (*true, if deletable*)
    ...:     Symbolset;  (*terminal start symbols*)
  END;
  Epsset = ARRAY[1..maxeps] OF Symbolset;
  Anyset = ARRAY[1..maxany] OF Symbolset;
  Attributenumbers = ARRAY[0..maxp] OF [0..255];
  Pragmasemantics = ARRAY[maxt..maxp] OF RECORD  (*element maxt is a dummy*)
    sem1,sem2: CARDINAL;
  END;
  Namelist = ARRAY[1..maxname] OF CHAR;
  Namepointers = ARRAY[0..maxnamep] OF CARDINAL;
  Checksum = CARDINAL;

The constants maxcode, maxt, maxp, etc. are the table dimensions derived from the input grammar. They are inserted into the generated syntax analyzer as constant declarations. The header of the parser tables repeats the same values as variables. These are not used by the syntax analyzer but are reserved for a decoding program.
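A Symbolset packs one bit per terminal into an array of BITSET words. The Python sketch below mirrors that layout. It assumes 16 elements per BITSET, as the index range maxt DIV 16 suggests, and stores element i of a BITSET as bit i; the actual bit order is implementation dependent in Modula-2. The function names are hypothetical.

```python
BITS = 16  # assumption: one BITSET holds 16 elements, as "maxt DIV 16" suggests


def make_symbolset(terminals, maxt):
    """Pack terminal numbers 0..maxt like ARRAY[0..maxt DIV 16] OF BITSET."""
    words = [0] * (maxt // BITS + 1)
    for t in terminals:
        words[t // BITS] |= 1 << (t % BITS)   # element t MOD 16 of word t DIV 16
    return words


def in_symbolset(words, t):
    """Membership test, the counterpart of (t MOD 16) IN s[t DIV 16]."""
    return ((words[t // BITS] >> (t % BITS)) & 1) == 1
```

Take the terminals 0, 3 and 17 with maxt = 20. The table then holds two words: 9 (bits 0 and 3) and 2 (bit 1 of the second word).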

7.6.2 Generation of the G-code

The G-code is derived from the top-down graph. This process is very simple: a recursive algorithm visits all nodes of the top-down graph and translates them into G-code instructions. The simplified algorithm is shown below:
GenCode(↓node):
  Generate code for node;
  if (node.rp<>0) and (node.rp not yet visited) then
    GenCode(↓node.rp);
  end;
  if (node.lp<>0) and (node.lp not yet visited) then
    GenCode(↓node.lp);
  end;
end GenCode;

Each node is processed as follows (for the definition of the G-code, see Section 2.4 or Appendix D):
1. Depending on the node type, a G-code instruction for the recognition of this node is generated (T, NT, NTS, ANY and EPS instructions). For nodes with a nonzero left pointer value, the generated instruction also contains the address of the corresponding alternative (TA, NTA, NTAS, ANYA and EPSA instructions).
2. If semantic actions are specified in the node, SEM instructions are generated.
3. If the right pointer of the node is zero, a RET instruction is generated.
4. If the right pointer points to an already visited node, a JMP instruction to the address of this node is generated.

In order to resolve jumps and addresses of alternatives, an address list of all
G-code sequences generated from graph nodes is needed. It is handled by the
following procedures:


PROCEDURE NewAdr (loc:CARDINAL; adr:CARDINAL) ;
PROCEDURE GetAdr (loc, fixup:CARDINAL; VAR adr:CARDINAL);
PROCEDURE Visited(loc:CARDINAL) : BOOLEAN;
NewAdr records that the G-code sequence generated from node loc has the address adr. GetAdr returns the address adr of the G-code sequence corresponding to node loc. If the address is not yet in the address list, adr is zero. In this case, fixup is remembered as a G-code location where the node's address is to be entered as soon as it becomes known. An address becomes known when it is defined by NewAdr; it is then automatically entered into all fixup locations waiting for this address. Visited returns TRUE if the address of the node with number loc is already known.
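The address list behaves like a classic back-patching table. Below is a minimal Python sketch. It assumes the G-code is a list of cells and that nodes are identified by their numbers. new_adr, get_adr and visited are hypothetical Python counterparts of NewAdr, GetAdr and Visited, not Coco's actual code.

```python
class AddressList:
    """Back-patching address list (hypothetical counterpart of NewAdr,
    GetAdr and Visited)."""

    def __init__(self, code):
        self.code = code       # G-code cells; unknown addresses are emitted as 0
        self.adr = {}          # node number -> address of its G-code sequence
        self.fixups = {}       # node number -> code locations waiting for it

    def new_adr(self, loc, adr):
        """Define the address of node loc and patch all waiting locations."""
        self.adr[loc] = adr
        for f in self.fixups.pop(loc, []):
            self.code[f] = adr

    def get_adr(self, loc, fixup):
        """Return the address of node loc, or 0 after remembering fixup."""
        if loc in self.adr:
            return self.adr[loc]
        self.fixups.setdefault(loc, []).append(fixup)
        return 0

    def visited(self, loc):
        return loc in self.adr
```

A forward jump is therefore emitted with a zero placeholder. It is corrected in place as soon as the target node is generated, which is why the whole G-code has to stay in memory.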
Two additional procedures are needed: one to emit G-code instructions
and one to access nodes of top-down graphs:
PROCEDURE Emit(VAR pc:CARDINAL; code:Instruction);
PROCEDURE GetNode(loc:CARDINAL; VAR node:Graphnode);

Emit writes the specified instruction code into the code segment at the location pc and increases the code segment length accordingly. Here, Instruction is a symbolic type that is represented by the text of the instruction; this is a simplification, and the actual implementation differs. GetNode gets the graph node with the node number loc. The type Graphnode is described in Section 7.3.1.
The actual algorithm for the generation of the G-code follows:
Generate G-code:
local  pc: Cardinal;
begin
  pc:=1;
  for all nonterminals i do
    GenCode(↓root of top-down graph of nonterminal i,↑pc);
  end;
end Generate G-code;

GenCode(↓loc,↑pc) is a recursive procedure which will now be refined. It translates the top-down graph with the root loc into a corresponding G-code sequence and inserts it into the code segment at the location pc.
When GenCode arrives at a node loc that has already been visited, the
G-code for the subgraph at loc has already been generated, so this node does
not have to be revisited.
GenCode(↓loc,↑pc):
param  loc,pc:  Cardinal;
local  node:    Graphnode;
       adr,nr:  Cardinal;
begin
  if Visited(↓loc) then return; end;
  NewAdr(↓loc,↓pc);                          -- now visit loc
  GetNode(↓loc,↑node);
  case node.typ of
    t:
      if node.lp=0 then Emit(↑pc,↓"T node.sp");
      else
        GetAdr(↓node.lp,↓pc+2,↑adr);
        Emit(↑pc,↓"TA node.sp,adr");
      end;
  | nt:
      if node.lp=0 then
        if node.sem1=0 then Emit(↑pc,↓"NT node.sp");
        else Emit(↑pc,↓"NTS node.sp,node.sem1");
        end;
      else
        GetAdr(↓node.lp,↓pc+2,↑adr);
        if node.sem1=0 then Emit(↑pc,↓"NTA node.sp,adr");
        else Emit(↑pc,↓"NTAS node.sp,adr,node.sem1");
        end;
      end;
  | any:
      if node.sp=0 then Emit(↑pc,↓"ANY");
      else
        GetAdr(↓node.lp,↓pc+2,↑adr);
        Emit(↑pc,↓"ANYA node.sp,adr");
      end;
  | eps:
      if node.sp<>0 then                     -- node with eps-set
        if node.lp=0 then Emit(↑pc,↓"EPS node.sp");
        else
          GetAdr(↓node.lp,↓pc+2,↑adr);
          Emit(↑pc,↓"EPSA node.sp,adr");
        end;
      end;
  end;  -- case
  if node.sem2<>0 then Emit(↑pc,↓"SEM(node.sem2)"); end;
  if node.sem3<>0 then Emit(↑pc,↓"SEM(node.sem3)"); end;
  if node.rp=0 then Emit(↑pc,↓"RET");
  else
    if Visited(↓node.rp) then
      GetAdr(↓node.rp,↓pc+1,↑adr); Emit(↑pc,↓"JMP adr");
    end;
  end;
  if node.rp<>0 then GenCode(↓node.rp,↑pc); end;
  if node.lp<>0 then GenCode(↓node.lp,↑pc); end;
end GenCode;

The G-code is completely stored in memory so that the missing addresses can
be inserted when they become known.

7.6.3 Generation of the remaining tables

Besides the G-code, the contents of the generated tables are almost entirely
extracted from the symbol list. Only the name list is handled by the lexical
analyzer of Coco. Coco gets the necessary data from the symbol list and from
the lexical analyzer with the help of access procedures, and writes them unchanged into the syntax analyzer as initialization values.

7.7 Generation of the syntax analyzer

Coco generates a table-driven LL(1) syntax analyzer with error handling in the
form of a Modula-2 source module which the user must compile and include
in his compiler. The syntax analyzer is the implementation of the analysis
algorithm described in Section 2.5. It is the same for all generated compilers.
Only the parser tables differ from compiler to compiler, so they have to be inserted into the otherwise invariant parser module.
Fig. 7.17 Structure of Section 7.7 (diagram of the chapter outline: structure of the symbol list, structure of the top-down graph, collecting the symbol sets, grammar tests, generation of the parser tables, generation of the syntax analyzer, generation of the semantic evaluator)

The definition module and the implementation module of the syntax analyzer
are generated from a frame text which Coco reads from the file cocosynframe. At certain locations grammar-dependent parts have to be inserted into
this frame. The locations are marked by the string '-->' and a descriptive name
of the text to be inserted. The following table shows what has to be inserted at
these locations.

-->modulename     grammar name + syn
-->semantic       grammar name + sem
-->input          grammar name + lex
-->declarations   table dimensions declared as constants
                  (see example in Section 8.3)
-->tables         table values

The syntax analyzer contains references to other modules (e.g. the lexical
analyzer or the semantic evaluator) whose names are constructed from the
grammar name (the name of the root symbol in the attributed grammar) and
from a suffix. The resulting syntax analyzer is written to the files grammarnamesyn.DEF and grammarnamesyn.MOD.

Coco uses a procedure CopyFramePart to copy pieces of text from the
frame to the syntax analyzer module.
PROCEDURE CopyFramePart(VAR source,target:File; str:ARRAY OF CHAR);

CopyFramePart copies text from the file source to the file target until it encounters the string str (str is not copied). When it is called again, it continues copying with the text immediately after str.
This procedure is called with the name of the next piece of text to be
inserted (e.g. '-->tables'). It copies the frame up to this name and then Coco
inserts the specified text in place of the name. This process is repeated until the
entire syntax analyzer has been generated. A source listing of cocosynframe
is shown in Appendix F. The module cocosyn, also shown in Appendix F, is
an example of a syntax analyzer generated by this process.
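CopyFramePart is essentially a search-and-copy loop over a character stream. The Python sketch below assumes that the frame and the output are file-like text objects (io.StringIO in the demo) rather than Modula-2 Files. It holds back only those characters that could still be the beginning of str, so a marker preceded by a partial match (such as '--->') is still found. copy_frame_part and demo are hypothetical names.

```python
import io  # used for the in-memory demo below


def copy_frame_part(source, target, marker):
    """Copy source to target up to (not including) marker and skip the marker.
    Returns True if the marker was found, False at the end of the source."""
    pending = ""                       # longest suffix that may start the marker
    while True:
        ch = source.read(1)
        if not ch:                     # end of frame: flush and report
            target.write(pending)
            return False
        pending += ch
        while pending and not marker.startswith(pending):
            target.write(pending[0])   # this character cannot start a match
            pending = pending[1:]
        if pending == marker:          # marker found; the next call resumes after it
            return True


def demo():
    frame = io.StringIO("MODULE -->modulename; BEGIN -->tables END")
    out = io.StringIO()
    copy_frame_part(frame, out, "-->modulename")
    out.write("XSyn")
    copy_frame_part(frame, out, "-->tables")
    out.write("(*tables*)")
    copy_frame_part(frame, out, "-->end")   # copies the rest of the frame
    return out.getvalue()                   # "MODULE XSyn; BEGIN (*tables*) END"
```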

7.8 Generation of the semantic evaluator

In addition to the syntax analyzer and the parser tables, Coco also generates a
semantic evaluator. This is a Modula-2 source module which the user must
compile and include in his compiler. The semantic evaluator consists of some
invariant parts and of the semantic actions and declarations which are copied
from the attributed grammar. Its generation can be divided into three tasks:

1. copy the semantic declarations from the attributed grammar to the semantic evaluator;
2. translate the semantic actions into components of a case statement;
3. generate new semantic actions (assignments) for attribute passing.

Before covering these three tasks in detail, we will describe the invariant parts of the semantic evaluator.

Fig. 7.18 Structure of Section 7.8 (diagram of the chapter outline: structure of the symbol list, structure of the top-down graph, collecting the symbol sets, grammar tests, generation of the parser tables, generation of the syntax analyzer, generation of the semantic evaluator; the generation of the semantic evaluator comprises Constant parts of the semantic evaluator, Translation of semantic declarations, Translation of semantic actions and Attribute processing)

7.8.1 The invariant parts of the semantic evaluator

Like the syntax analyzer, the semantic analyzer is derived from a frame module which Coco reads from the file cocosemframe. Again Coco copies the frame using the procedure CopyFramePart (see Section 7.7) and inserts grammar-dependent parts at some specified places in the frame. These places are:
-->modulename     grammar name + sem
-->scannername    grammar name + lex
-->declarations   semantic declarations of the grammar
-->actions        semantic actions of the grammar

The frame module is as follows:
DEFINITION MODULE -->modulename;
VAR printactions: BOOLEAN;
PROCEDURE Semant(sem:CARDINAL);
END -->modulename.

IMPLEMENTATION MODULE -->modulename;
FROM SYSTEM IMPORT WORD;
FROM -->scannername IMPORT at;
-->declarations

PROCEDURE ASSIGN(VAR x:WORD; y:WORD);
BEGIN
  x:=y
END ASSIGN;

PROCEDURE Semant(sem:CARDINAL);
BEGIN
  CASE sem OF
    11: ;  (*action numbers start at 12*)
-->actions
  END;
END Semant;
END -->modulename.

The resulting semantic analyzer is written to the files grammarnamesem.DEF
and grammarnamesem.MOD. The user may set the exported variable printactions to TRUE if he wants a trace of the executed semantic actions.

7.8.2 Processing of the semantic declarations

The semantic declarations, which are written in Modula-2, are copied directly and without change from the attributed grammar to the frame program, and are inserted at the location marked by '-->declarations'. This happens in the following manner: the lexical analyzer of Coco returns the symbols of the Modula-2 text to the syntax analyzer as Cocol symbols, and from there they go to the semantic evaluator. The procedure Copy(↓typ,↓col) is called for each symbol to translate its symbol code back into its source text, which is then inserted into the frame module.
Problems can arise because the Modula-2 text may contain symbols that are not Cocol symbols (e.g. +, *, &). Such symbols are copied by means of a trick: the lexical analyzer assigns them a special symbol code (nococosy) and an attribute (spix). They are treated like names and entered into the name list. spix is their address in the name list, which allows their source text to be accessed.
In order to keep the name list small, the Modula-2 names are entered only
temporarily. Permanent storage is prevented with the procedure StopHash.
This causes a name to be entered, but overwritten by the next name, so the
names can be accessed via their addresses just like the permanently stored
names, but only until the next name has been recognized. The procedure
RestartHash re-establishes permanent storage.
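The effect of StopHash and RestartHash on the name list can be sketched as follows. This Python sketch models only the storage aspect. Names are stored one after another in a character array. While permanent storage is stopped, every new name is written at the same position and thus overwrites its predecessor. The hash lookup of the real name list is omitted, and the class and its separator character are hypothetical.

```python
class NameList:
    """Hypothetical model of permanent and temporary storage in the name list."""

    END = "\0"                     # assumed separator between stored names

    def __init__(self):
        self.chars = []            # all names, one after another
        self.top = 0               # first position not used by permanent names
        self.permanent = True

    def enter(self, name):
        """Store name and return its address (spix)."""
        adr = self.top
        del self.chars[adr:]       # a temporary name at top is overwritten
        self.chars.extend(name + self.END)
        if self.permanent:
            self.top = len(self.chars)
        return adr

    def text(self, adr):
        """Source text of the name stored at adr."""
        end = self.chars.index(self.END, adr)
        return "".join(self.chars[adr:end])

    def stop_hash(self):
        self.permanent = False

    def restart_hash(self):
        self.permanent = True
```

Between stop_hash and restart_hash, each temporary name can be read back through its address only until the next name is entered.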
Coco copies the declarations without checking the syntax. If there are
syntax errors, they will be detected by the Modula-2 compiler when the
generated semantic evaluator is compiled. We now describe the translation of
the semantic declarations by an attributed grammar in Cocol.
GRAMMAR Declarations
SEMANTIC DECLARATIONS
  FROM cocogen IMPORT Copy;
  FROM cocolex IMPORT col, typ, StopHash, RestartHash;
  --PROCEDURE Copy(typ,col:CARDINAL);
  --  writes the source text of the symbol 'typ' to the generated
  --  semantic analyzer. col is the column of the symbol in the grammar.
TERMINALS
  SEMANTICSY DECLARATIONSY
NONTERMINALS
  Declarations
RULES
  Declarations =
    SEMANTICSY DECLARATIONSY  sem StopHash endsem
    { any                     sem Copy(↓typ,↓col) endsem
    }                         sem RestartHash endsem.
ENDGRAM

7.8.3 Processing of the semantic actions

Coco translates the semantic actions of the attributed grammar into consecutively numbered variants of a case statement, and inserts them into the semantic frame program at the location marked by the string '-->actions'.
Like the declarations, the semantic actions are copied unchanged and without a
syntax check. Again, each symbol is copied by translating its symbol code
back into its source text. We describe this process in Cocol.
GRAMMAR SemAction
SEMANTIC DECLARATIONS
  FROM cocogen IMPORT Copy, OpenSem;
  FROM cocosym IMPORT NewMacro, GetMacroNr;
  FROM cocolex IMPORT col, typ, StopHash, RestartHash;
  FROM Errors  IMPORT SemErr;
  --PROCEDURE OpenSem(VAR sem:CARDINAL);
  --  generates a new case label and returns its number sem.
  --PROCEDURE GetMacroNr(spix:CARDINAL; VAR sem:CARDINAL);
  --  gets the action number sem of the macro 'spix'. If the macro
  --  does not exist, sem=0.
  VAR spix,sem: CARDINAL;
TERMINALS
  SEMSY ENDSEMSY IDENT<out:spix>
NONTERMINALS
  SemAction<out:sem>
RULES
  SemAction<out:sem> =
    SEMSY
    ( "(" IDENT<out:spix>   sem GetMacroNr(↓spix,↑sem);
                                IF sem=0 THEN SemErr(1) END endsem  -- macro
      ")"
    |                       sem OpenSem(↑sem); StopHash endsem
      { any                 sem Copy(↓typ,↓col) endsem
      }                     sem RestartHash endsem
    )
    ENDSEMSY.
ENDGRAM

The above grammar also shows how semantic macros are processed. The
module cocosym handles a list of macro names and their corresponding
semantic action numbers. The action number of a macro is supplied by the
procedure GetMacroNr.
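The numbering scheme can be illustrated with a small Python model of what OpenSem, GetMacroNr, and the splicing at '-->actions' amount to. This is only a sketch under stated assumptions: the class ActionTable, its method names, and the exact layout of the generated CASE text are hypothetical and do not reproduce Coco's implementation. It shows that each action receives the next number, that a macro name maps to the number of its action (0 if the macro does not exist), and that the numbered variants replace the marker in the frame program.

```python
class ActionTable:
    """Hypothetical model of how semantic actions become numbered CASE variants."""

    def __init__(self):
        self.actions = []   # action texts; the action number is index + 1
        self.macros = {}    # macro name -> action number

    def open_sem(self, text):
        """Like OpenSem: allocate the next case label for an action text."""
        self.actions.append(text)
        return len(self.actions)

    def new_macro(self, name, text):
        """Register a macro; its body becomes an ordinary numbered action."""
        self.macros[name] = self.open_sem(text)

    def get_macro_nr(self, name):
        """Like GetMacroNr: the action number of the macro, or 0 if it does not exist."""
        return self.macros.get(name, 0)

    def splice(self, frame, marker="-->actions"):
        """Replace the marker in the frame program by the numbered CASE variants."""
        variants = "\n| ".join(
            f"{nr}: {text}" for nr, text in enumerate(self.actions, start=1)
        )
        return frame.replace(marker, variants)
```

Because a macro reference only yields an existing action number, every use of the macro shares one case variant instead of duplicating its text.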

7.8.4 Attribute processing

While declarations and semantic actions need only be copied from the
attributed grammar into the semantic evaluator, attributes need further processing. For each symbol, its attributes must be stored in the symbol list, and
must be checked for consistency every time this symbol occurs. In addition to
this, Coco must generate semantic actions by which values are assigned to the
attributes at run-time.
The processing of attributes depends on the context in which they appear.
In Cocol there are three different places where attributes may occur:
1. at the declaration of a syntax symbol;
2. at a nonterminal on the left-hand side of a rule;
3. at a symbol on the right-hand side of a rule.

We will now describe the processing of attributes in each of these three cases,
and then summarize it by an attributed grammar.
Declaration of attributes
Attributes are declared together with syntax symbols and are entered into the
symbol list. The context of attribute declarations is:
  SyntaxDeclarations =
    TERMINALS     {Symbol [Attributes] [AliasName]}
    [ PRAGMAS     {Symbol [Attributes] [SemAction]} ]
    NONTERMINALS  {identifier [Attributes] [AliasName]}.

Coco uses the procedure NewAt to enter an attribute into the symbol list.
  TYPE Direction = (up,down);
  PROCEDURE NewAt (sy, spix:CARDINAL; dir:Direction);
NewAt enters an attribute spix with the direction dir for the symbol sy.
Depending on the kind of sy, the following information is stored:
  for terminal symbols:  number of attributes;
  for pragmas:           number of attributes;
  for nonterminals:      number, name, and direction of attributes.


Attributes on the left-hand side of productions
Attributes on the left-hand side of productions are called formal attributes.
Their context is:
  Rule = identifier [Attributes] "=" Expression ".".

Formal attributes are checked for consistency with their declaration. For every
left-hand side nonterminal the number of attributes, their names, order, and
direction must agree with the attributes declared for this nonterminal. The
procedure GetAt is used to access the attribute information in the symbol list.
It gets the name (spix) and the direction (dir) of the nth attribute of the
nonterminal sy. If sy has fewer than n attributes, then spix is zero.
  PROCEDURE GetAt (sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
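The attribute part of the symbol list can be modelled with a short Python sketch of NewAt and GetAt, plus CompleteAt, which the attribute grammar below also uses. It is a simplification under stated assumptions: the class and method names are hypothetical, symbols and spelling indices are plain integers, and, unlike the text, it stores name and direction for every kind of symbol instead of only the count for terminals and pragmas.

```python
from enum import Enum

class Direction(Enum):
    UP = "up"       # output attribute
    DOWN = "down"   # input attribute

class SymbolList:
    """Hypothetical model of the attribute information kept per syntax symbol."""

    def __init__(self):
        self.attrs = {}   # symbol number -> list of (spix, direction), in declaration order

    def new_at(self, sy, spix, direction):
        """Like NewAt: append attribute spix with the given direction to symbol sy."""
        self.attrs.setdefault(sy, []).append((spix, direction))

    def get_at(self, sy, n):
        """Like GetAt: (spix, dir) of the n-th attribute (1-based); spix is 0 if sy has fewer."""
        lst = self.attrs.get(sy, [])
        if 1 <= n <= len(lst):
            return lst[n - 1]
        return 0, None

    def complete_at(self, sy, n):
        """Like CompleteAt: True if sy has exactly n attributes."""
        return len(self.attrs.get(sy, [])) == n
```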

Attributes on the right-hand side of productions
Attributes on the right-hand side of productions appear as actual attributes of
syntax symbols in EBNF expressions.
  Expression = Term {"|" Term}.
  Term       = Factor {Factor}.
  Factor     = Symbol [Attributes]
             | ... .

In this context, attributes denote semantic values which result from the recognition of a syntax symbol, or which are required for its recognition. Coco
generates assignments between the attribute values and the attribute names,
and includes them as semantic actions in the evaluator program. It also checks
whether the number of attributes, their order and their direction agree with the
corresponding attribute declaration.
Attribute assignments for terminals and pragmas
The lexical analyzer of the generated compiler exports the attribute values of
terminals and pragmas in the variable at. The array at is filled for each symbol by the lexical analyzer. A terminal (or pragma) t<out:a,b> is handled by
the generated compiler as follows:
  recognize t and fill at;
  a:=at[1]; b:=at[2];
When t has been recognized, a semantic action must be executed in which
the attribute values at[1] and at[2] are assigned to the attributes a and b.
Since the attributed grammar contains no such action, Coco must generate it.

Attribute assignments for nonterminals
For nonterminals, attribute assignments occur between formal and actual attributes. A nonterminal nt<in:a,b; out:c,d> is handled by the generated
compiler as follows:
  formal attribute corresponding to a := a;
  formal attribute corresponding to b := b;
  parse nt;
  c := formal attribute corresponding to c;
  d := formal attribute corresponding to d;

Again Coco must generate semantic actions for the attribute assignments.
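At run time these generated assignments amount to copy-in/copy-out parameter passing. The Python sketch below models it for t<out:a,b> and nt<in:a,b; out:c,d>. The dictionary env stands for the variables of the semantic evaluator, the formal names x, y, u, v are hypothetical, and the body of parse_nt (u=x+y, v=x*y) is a made-up example, not taken from Coco.

```python
def recognize_t(at, env):
    """t<out:a,b>: after recognizing t, copy the attribute values from at.
    The text's at[1], at[2] correspond to at[0], at[1] in a 0-based Python list."""
    env["a"] = at[0]
    env["b"] = at[1]

def parse_nt(formal):
    """Hypothetical body of nt<in:x,y; out:u,v>: it works only on formal attributes."""
    formal["u"] = formal["x"] + formal["y"]
    formal["v"] = formal["x"] * formal["y"]

def call_nt(env):
    """nt<in:a,b; out:c,d>: copy in, parse, copy out."""
    formal = {"x": env["a"], "y": env["b"]}   # formal := actual (input attributes)
    parse_nt(formal)                           # parse nt
    env["c"] = formal["u"]                     # actual := formal (output attributes)
    env["d"] = formal["v"]
```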
Generation of attribute assignments
For each attribute on the right-hand side of a production, Coco calls the
procedure GenAssign, which generates an assignment of the corresponding
attribute value to the attribute variable.
  TYPE Attrkind = (term,     (*attribute of a terminal*)
                   nonterm,  (*attribute of a nonterminal*)
                   const);   (*const. value as an attribute of an nt*)
  PROCEDURE GenAssign (typ:Attrkind; left, right:CARDINAL);

Table 7.2 shows the meaning of the parameters left and right depending on
the value of the parameter typ. It also shows which code is generated:
Table 7.2 Parameters of GenAssign and the generated code
  Value of typ  Meaning of left    Meaning of right    Generated code
  term          Spix of left side  Index in at         name(left):=at[right]
  nonterm       Spix of left side  Spix of right side  name(left):=name(right)
  const         Spix of left side  Constant            name(left):=right

name(spix) denotes the name at the address spix in the name list. The array
at is exported by the lexical analyzer and contains the attribute values of the
most recently recognized terminal.
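A minimal Python sketch of the code generation summarized in Table 7.2 follows. The names AttrKind and gen_assign, and the dictionary names that plays the role of the name list (spix -> identifier), are assumptions made for illustration. The function only returns the assignment text, whereas Coco's GenAssign collects it for the next call of EmitAction.

```python
from enum import Enum

class AttrKind(Enum):
    TERM = "term"        # attribute of a terminal
    NONTERM = "nonterm"  # attribute of a nonterminal
    CONST = "const"      # constant value as an attribute of a nonterminal

def gen_assign(kind, left, right, names):
    """Return the assignment text of Table 7.2; names maps spix -> identifier."""
    target = names[left]
    if kind is AttrKind.TERM:
        return f"{target}:=at[{right}]"      # right is the attribute index
    if kind is AttrKind.NONTERM:
        return f"{target}:={names[right]}"   # right is a spix
    if kind is AttrKind.CONST:
        return f"{target}:={right}"          # right is the constant itself
    raise ValueError(f"unknown attribute kind: {kind}")
```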

The procedure EmitAction builds a semantic action from the attribute
assignments generated since its last call. It inserts the action as a variant of a
CASE statement into the semantic evaluator. Thus, the semantic evaluator contains not only the semantic actions of the attributed grammar, but also the
actions generated from attributes by Coco. EmitAction returns the action
number of the generated semantic action.
  PROCEDURE EmitAction (VAR sem:CARDINAL);


Optimization of attribute passing
Coco performs two optimizations to reduce the number of attribute assignments:

1. If the formal and the actual attribute of a nonterminal have identical
   names, no assignment is generated.
2. Identical semantic actions (with the same assignments) are generated only
   once.
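Both optimizations can be sketched in a few lines of Python. The class below is hypothetical: it joins the pending assignments into one action text, skips an assignment whose two sides name the same attribute, and reuses the number of an already generated identical action. Numbering from 1 and returning 0 when there is nothing to emit are assumptions of this sketch; in Coco the action numbers are shared with the actions copied from the grammar.

```python
class ActionEmitter:
    """Hypothetical model of GenAssign/EmitAction with the two optimizations."""

    def __init__(self):
        self.pending = []    # assignments generated since the last emit_action
        self.numbers = {}    # action text -> action number
        self.next_nr = 1

    def gen_assign(self, left, right):
        # Optimization 1: formal and actual attribute have the same name.
        if left != right:
            self.pending.append(f"{left}:={right}")

    def emit_action(self):
        """Return the action number for the pending assignments (0 if there are none)."""
        text = "; ".join(self.pending)
        self.pending = []
        if not text:
            return 0
        # Optimization 2: identical actions are generated only once.
        if text not in self.numbers:
            self.numbers[text] = self.next_nr
            self.next_nr += 1
        return self.numbers[text]
```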

Description of the attribute processing in Cocol
We will now summarize the attribute processing, describing it by an attributed
grammar in Cocol. The start symbol of the grammar is the nonterminal
Attributes. The grammar is a segment of a larger grammar in which attributes
can appear in various contexts. Therefore, Attributes has three input attributes
which control its processing.
  Attributes<in:sy,styp,kind; out:sem1,sem2>
sy denotes the symbol to which the attributes belong; styp specifies the type
of this symbol; kind is the context in which the attributes are being used,
indicating how they are to be processed:
  kind=def:    treat them as an attribute declaration;
  kind=check:  perform a consistency check
               (when used on the left-hand side of a production);
  kind=use:    generate semantic actions for attribute passing
               (when used on the right-hand side of a production).
sem1 and sem2 are the numbers of the generated semantic actions for input
and output attribute passing (or zero).
GRAMMAR Attributes
SEMANTIC DECLARATIONS
FROM cocosym IMPORT NewAt, GetAt, CompleteAt, Direction, Usage,
                    Symboltype;
FROM cocogen IMPORT Attrtype, EmitAction, GenAssign;
FROM Errors  IMPORT SemErr;
--TYPE
--  Attrtype   = (term,nonterm,const);
--  Direction  = (up,down);          (*up=out, down=in*)
--  Usage      = (def,check,use);    (*attribute context*)
--  Symboltype = (eps,t,pr,nt,any);
--PROCEDURE NewAt (sy, spix:CARDINAL; dir:Direction);
--  declares an attribute for the symbol sy with the name spix and
--  the direction dir.
--PROCEDURE GetAt (sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
--  gets the name spix and the direction dir of attribute number n
--  of symbol sy. If sy has less than n attributes, then spix=0.
--PROCEDURE CompleteAt (sy,n:CARDINAL): BOOLEAN;
--  returns true if symbol sy has exactly n attributes.
VAR
  sy, spix, spix1, sem1, sem2, n, val: CARDINAL;
  styp: Symboltype;
  kind: Usage;
  dir, dir1: Direction;
MACROS
sem :AssignInAt:
  n:=n+1;
  CASE kind OF
    use:   IF styp=nt THEN
             GetAt(↓sy,↓n,↑spix1,↑dir1);
             IF spix1<>0 THEN
               IF dir=dir1 THEN GenAssign(↓nonterm,↓spix1,↓spix)
               ELSE SemErr(2)
               END
             END
           END;
  | check: IF styp=nt THEN
             GetAt(↓sy,↓n,↑spix1,↑dir1);
             IF spix1<>0 THEN
               IF spix<>spix1 THEN SemErr(3) END;
               IF dir<>dir1 THEN SemErr(2) END
             END
           END;
  | def:   NewAt(↓sy,↓spix,↓dir)
  END -- CASE
endsem
sem :AssignNumber:
  n:=n+1;
  IF kind=use THEN
    IF styp=nt THEN
      GetAt(↓sy,↓n,↑spix1,↑dir1);
      IF spix1<>0 THEN
        IF dir=dir1 THEN GenAssign(↓const,↓spix1,↓val)
        ELSE SemErr(2)
        END
      END
    END
  ELSE SemErr(4)
  END
endsem
sem :AssignOutAt:
  n:=n+1;
  CASE kind OF
    use:   IF styp=t THEN GenAssign(↓term,↓spix,↓n)
           ELSIF styp=nt THEN
             GetAt(↓sy,↓n,↑spix1,↑dir1);
             IF spix1<>0 THEN
               IF dir=dir1 THEN GenAssign(↓nonterm,↓spix,↓spix1)
               ELSE SemErr(2)
               END
             END
           END;
  | check: IF styp=nt THEN
             GetAt(↓sy,↓n,↑spix1,↑dir1);
             IF spix1<>0 THEN
               IF spix<>spix1 THEN SemErr(3) END;
               IF dir<>dir1 THEN SemErr(2) END
             END
           END;
  | def:   NewAt(↓sy,↓spix,↓dir);
           IF styp=pr THEN GenAssign(↓term,↓spix,↓n) END
  END -- CASE
endsem
TERMINALS
  "<"  ">"  ","  ";"  IDENT<out:spix>  INSY  OUTSY  NUMBER<out:val>
NONTERMINALS
  Attributes<in:sy,styp,kind; out:sem1,sem2>
  InAttr<in:sy,styp,kind; out:sem1,n>        -- n: attribute counter
  OutAttr<in:sy,styp,kind,n; out:sem2,n>
RULES
Attributes<in:sy,styp,kind; out:sem1,sem2> =
  "<"                       sem sem1:=0; sem2:=0 endsem
  ( InAttr<in:sy,styp,kind; out:sem1,n>
    [ ";" OutAttr<in:sy,styp,kind,n; out:sem2,n> ]
  | OutAttr<in:sy,styp,kind,0; out:sem2,n>
  )
  ">"                       sem IF NOT CompleteAt(↓sy,↓n) THEN SemErr(5) END endsem.
InAttr<in:sy,styp,kind; out:sem1,n> =
  INSY                      sem IF styp<>nt THEN SemErr(1) END;
                                dir:=down; n:=0 endsem
  ( IDENT<out:spix>         sem (AssignInAt) endsem
  | NUMBER<out:val>         sem (AssignNumber) endsem
  )
  { ","
    ( IDENT<out:spix>       sem (AssignInAt) endsem
    | NUMBER<out:val>       sem (AssignNumber) endsem
    )
  }                         sem IF kind=use THEN EmitAction(↑sem1) END endsem.
OutAttr<in:sy,styp,kind,n; out:sem2,n> =
  OUTSY                     sem dir:=up endsem
  IDENT<out:spix>           sem (AssignOutAt) endsem
  { "," IDENT<out:spix>     sem (AssignOutAt) endsem
  }                         sem IF (kind=use) OR (styp=pr) THEN EmitAction(↑sem2) END endsem.
ENDGRAM

If one of the context conditions is violated, the procedure SemErr(↓n) is
called, which emits an error message depending on n:
  n   error message
  1   In-attributes for a pragma or terminal
  2   Wrong attribute direction
  3   Wrong attribute name
  4   Formal attribute is a constant
  5   Wrong number of attributes
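The check performed when kind=check can be illustrated with a Python sketch that compares the attributes written at a left-hand side nonterminal with its declaration and collects the error numbers listed above. It is a simplified, hypothetical model: attributes are (name, direction) pairs, and it mirrors the order of tests in the macros (name before direction, then the count via CompleteAt) without reproducing Coco's data structures.

```python
def check_formal(declared, formal):
    """declared, formal: lists of (name, direction), direction 'in' or 'out'.
    Returns the SemErr numbers that the check would report, in order."""
    errors = []
    for n, (name, direction) in enumerate(formal, start=1):
        if n > len(declared):
            continue  # GetAt yields spix=0: nothing to compare with
        decl_name, decl_dir = declared[n - 1]
        if name != decl_name:
            errors.append(3)   # wrong attribute name
        if direction != decl_dir:
            errors.append(2)   # wrong attribute direction
    if len(formal) != len(declared):
        errors.append(5)       # wrong number of attributes (CompleteAt fails)
    return errors
```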

8 Applications

8.1 Applications in compiler construction

Attributed grammars are mainly used in compiler construction, or, more precisely, for the description of compilers. However, the description of an actual
compiler is far too complex to be used as an introductory example. Therefore,
in this section we will use Cocol to develop a lexical analyzer, which is part of
a compiler. This example is general enough to demonstrate all language
constructs of Cocol, and yet simple enough for a reader inexperienced with
attributed grammars to follow it. The application of Coco to an actual compiler
(the compiler description for Coco itself) can be found in Appendix F.
It is unusual to describe and to generate lexical analyzers with attributed
grammars. Normally, they are coded by hand since they must be very efficient
(lexical analysis takes the largest share of the compilation time). There are
special scanner generators which are designed to produce fast lexical analyzers. Although Coco is not such a generator, run-time measurements show
that implementing lexical analyzers with Coco is feasible in practice as well as in theory.
As an example, we will develop a lexical analyzer for Modula-2. First we
will give a general specification for lexical analyzers. Then we will prepare a
special specification of a lexical analyzer for Modula-2. Next we will describe
and build this lexical analyzer using Cocol. Finally we will explain some of
the problems that can arise. At the end of this section, we will specify the
semantic procedures used in the description of the lexical analyzer.


8.1.1 Specification of a lexical analyzer

General tasks
A lexical analyzer must at least perform the following tasks:
1. read and optionally print the source program;
2. skip meaningless character sequences such as blanks, comments, etc.;
3. recognize and tokenize terminals such as keywords, names, numbers,
   and operators;
4. report lexical errors.

Usually, a lexical analyzer will recognize only one terminal per call and pass it
to the syntax analyzer. However, there are also analyzers that process the
entire source text at once, and write the symbol codes of the recognized
terminals to an intermediate file so that the syntax analyzer can read them later
on. The lexical analyzer described here is of the latter type.
Tasks of a lexical analyzer for Modula-2
A lexical analyzer for Modula-2 must recognize the following terminals:
Keywords
  AND          ELSIF            LOOP         REPEAT
  ARRAY        END              MOD          RETURN
  BEGIN        EXIT             MODULE       SET
  BY           EXPORT           NOT          THEN
  CASE         FOR              OF           TO
  CONST        FROM             OR           TYPE
  DEFINITION   IF               POINTER      UNTIL
  DIV          IMPLEMENTATION   PROCEDURE    VAR
  DO           IMPORT           QUALIFIED    WHILE
  ELSE         IN               RECORD       WITH
Names
  Identifier = Letter {Letter | Digit}.
  Letter     = "A" | "B" | ... | "Z" | "a" | "b" | ... | "z".
  Digit      = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9".

Decimal constants
  DecNumber   = Digit {Digit}.
Hexadecimal constants
  HexNumber   = Digit {HexDigit} "H".
  HexDigit    = Digit | "A" | "B" | "C" | "D" | "E" | "F".
  Digit       = OctalDigit | "8" | "9".
Octal constants
  OctalNumber = OctalDigit {OctalDigit} "B".
  OctalDigit  = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7".
Real constants
  RealNumber  = Digit {Digit} "." {Digit} ["E" ["+" | "-"] Digit [Digit]].
Character constants
  CharConst   = "'" any "'" | '"' any '"' | OctalDigit {OctalDigit} "C".
Character strings
  String      = "'" {any} "'" | '"' {any} '"'.

Comments
  Comment = "(*" {Comment | any} "*)".

Operators and separators
  +    addition                 >=   greater than or equal
  -    subtraction              ( )  round parentheses
  *    multiplication           [ ]  index parentheses
  /    real division            { }  set brackets
  :=   assignment               ^    pointer
  &    logical and              ,    comma
  =    equal                    .    period
  #    not equal                ;    semicolon
  <>   not equal                :    colon
  <    less than                ..   range operator
  >    greater than             |    variant operator
  <=   less than or equal
Context conditions
1. Decimal, hexadecimal, or octal constants must be in the range 0 to 65535.
2. The numerical value of character constants must be in the range 0 to 255.
3. Real constants must be in the range 1.4694E-39 to 1.7014E+38.
4. Character strings must not extend over line boundaries.
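The context conditions are simple range checks on the converted values. Here is a Python sketch with hypothetical function names and kind labels. It assumes that the value 0.0 is acceptable for a real constant, which the stated range does not mention; real literals are unsigned, so no sign handling is needed.

```python
def constant_ok(kind, value):
    """Check a converted constant against the context conditions for Modula-2."""
    if kind in ("decimal", "hex", "octal"):
        return 0 <= value <= 65535
    if kind == "char":
        return 0 <= value <= 255
    if kind == "real":
        # Assumption: 0.0 is accepted in addition to the stated range.
        return value == 0.0 or 1.4694e-39 <= value <= 1.7014e38
    raise ValueError(f"unknown constant kind: {kind}")

def string_ok(text):
    """A character string must not extend over a line boundary."""
    return "\n" not in text and "\r" not in text
```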

8.1.2 Description of a lexical analyzer for Modula-2

In the previous section, we described the lexical structure of Modula-2 by a
context-free grammar. Now we will have to attribute it. The following points
need special attention.
The lexical analyzer supplies the terminals for syntax analysis. These are
the nonterminals of the lexical analyzer, whereas the terminals of the lexical
analyzer are the characters of the source text. These characters must be
supplied by a mini-scanner with the following tasks:
1. read and print the source program;
2. supply the characters of the source text as terminals;
3. treat the character sequences '..', '(*', and '*)' as special terminals (to
   simplify the attributed grammar).

This still leaves enough work for the lexical analyzer proper. In accordance
with Section 6.4.2, we will implement the mini-scanner in the procedure
GetSy of the module Scannerlex. The mini-scanner is so simple that we
refrain from describing it further.
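Although the mini-scanner is not described further, its behaviour, as given with Scannerlex in Section 8.1.3, can be modelled in a few lines of Python. The generator below is a hypothetical sketch: it yields the token number typ together with line and col, using ASCII values for ordinary characters, 1, 2, and 3 for '..', '(*', and '*)', and 0 at the end of the text. Counting lines and columns from 1 is an assumption, and printing the source is omitted.

```python
EOFCH = 0
SPECIAL = {"..": 1, "(*": 2, "*)": 3}   # sequences returned as single terminals

def mini_scan(text):
    """Yield (typ, line, col) for every terminal of the lexical grammar."""
    line, col, i = 1, 1, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in SPECIAL:
            yield SPECIAL[pair], line, col
            i += 2
            col += 2
            continue
        ch = text[i]
        yield ord(ch), line, col
        i += 1
        if ch == "\n":
            line, col = line + 1, 1
        else:
            col += 1
    yield EOFCH, line, col   # GetSy keeps returning eofch; one occurrence suffices here
```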
Now we will specify the lexical analyzer of Modula-2 with Cocol.
GRAMMAR Scanner
SEMANTIC DECLARATIONS
FROM Conversions IMPORT Convert, ConvertReal;
FROM Errors      IMPORT SemErr;
FROM ListMod     IMPORT EnterString, Hash;
FROM Scannerlex  IMPORT typ, line, col;
FROM OutMod      IMPORT Symboltype, Emit, EmitConstant,
                        EmitIdent, EmitString;
--TYPE
--  Symboltype =    (*token codes*)
--    (eofsy, andsy, divsy, timessy, slashsy, modsy, notsy, plussy,
--     minussy, orsy, eqlsy, neqsy, grtsy, geqsy, lsssy, leqsy, insy,
--     lparsy, rparsy, lbracksy, rbracksy, lconbrsy, rconbrsy, commasy,
--     semicolonsy, periodsy, colonsy, rangesy, constsy, typesy, varsy,
--     arraysy, recordsy, variantsy, setsy, pointersy, tosy, arrowsy,
--     importsy, exportsy, fromsy, qualifiedsy, beginsy, casesy, ofsy,
--     ifsy, thensy, elsifsy, elsesy, loopsy, exitsy, repeatsy,
--     untilsy, whilesy, dosy, withsy, forsy, bysy, returnsy,
--     becomessy, endsy, callsy, definitionsy, implementationsy,
--     proceduresy, modulesy, ident, cardcon, intcardcon, realcon,
--     charcon, stringcon, eolsy);
CONST
  blmax = 80;   -- buffer length (every token must fit on an 80 character line)
VAR
  addr:    CARDINAL;                  -- string address in string list
  b:       ARRAY [1..blmax] OF CHAR;  -- buffer
  bl:      CARDINAL;                  -- buffer length
  ch:      CHAR;                      -- auxiliary
  firstch: CHAR;                      -- first character in a string
  i:       CARDINAL;                  -- auxiliary
  length:  CARDINAL;                  -- string length
  rval:    REAL;                      -- value of real-constant
  spix:    CARDINAL;                  -- spelling index of identifier
  sy:      Symboltype;                -- token code
  symcol:  CARDINAL;                  -- symbol column
  val:     CARDINAL;                  -- constant value

MACROS
sem :AddCh: bl:=bl+1; b[bl]:=ch endsem
  -- it is supposed that lines are not longer than 80 characters

TERMINALS
  ".."  "(*"  "*)"  chr4  chr5  ...  chr31
  " "  "!"  '"'  "#"  ...  "|"  "}"
  chr126  chr127

NONTERMINALS
  Scanner
  Symbol
  Identifier <out:sy,spix,symcol>
  Number     <out:sy,val,symcol>
  String     <out:sy,addr,length,firstch,symcol>
  Comment
  Letter     <out:ch>
  Digit      <out:ch>
  HexDigit   <out:ch>

RULES
Scanner =
  {Symbol}                    sem Emit(↓eofsy,↓col) endsem.
Symbol =
  {" "}                       -- skip blanks
  ( Identifier<out:sy,spix,symcol>
                              sem IF sy=ident
                                    THEN EmitIdent(↓spix,↓symcol)  -- identifier
                                    ELSE Emit(↓sy,↓symcol)         -- keyword
                                  END endsem
  | Number<out:sy,val,symcol>
                              sem EmitConstant(↓sy,↓val,↓symcol) endsem
                              -- cardcon, intcardcon, realcon, charcon
  | String<out:sy,addr,length,firstch,symcol>
                              sem IF sy=stringcon
                                    THEN EmitString(↓addr,↓length,↓symcol)
                                    ELSE EmitConstant(↓charcon,↓ORD(firstch),↓symcol)
                                  END endsem
  | Comment
  | ";"                       sem Emit(↓semicolonsy,↓col) endsem
  | "="                       sem Emit(↓eqlsy,↓col) endsem
  | "("                       sem Emit(↓lparsy,↓col) endsem
  | ")"                       sem Emit(↓rparsy,↓col) endsem
  | "["                       sem Emit(↓lbracksy,↓col) endsem
  | "]"                       sem Emit(↓rbracksy,↓col) endsem
  | "{"                       sem Emit(↓lconbrsy,↓col) endsem
  | "}"                       sem Emit(↓rconbrsy,↓col) endsem
  | "*"                       sem Emit(↓timessy,↓col) endsem
  | ","                       sem Emit(↓commasy,↓col) endsem
  | "/"                       sem Emit(↓slashsy,↓col) endsem
  | "+"                       sem Emit(↓plussy,↓col) endsem
  | "-"                       sem Emit(↓minussy,↓col) endsem
  | "^"                       sem Emit(↓arrowsy,↓col) endsem
  | "|"                       sem Emit(↓variantsy,↓col) endsem
  | "#"                       sem Emit(↓neqsy,↓col) endsem
  | "&"                       sem Emit(↓andsy,↓col) endsem
  | "."                       sem Emit(↓periodsy,↓col) endsem
  | ".."                      sem Emit(↓rangesy,↓col) endsem
  | chr13                     sem Emit(↓eolsy,↓col) endsem
  | ":" ( "="                 sem Emit(↓becomessy,↓col) endsem
        | eps                 sem Emit(↓colonsy,↓col) endsem
        )
  | "<" ( ">"                 sem Emit(↓neqsy,↓col) endsem
        | "="                 sem Emit(↓leqsy,↓col) endsem
        | eps                 sem Emit(↓lsssy,↓col) endsem
        )
  | ">" ( "="                 sem Emit(↓geqsy,↓col) endsem
        | eps                 sem Emit(↓grtsy,↓col) endsem
        )
  ).
Identifier<out:sy,spix,symcol> =
                              sem symcol:=col; bl:=0 endsem
  Letter<out:ch>              sem (AddCh) endsem
  { Letter<out:ch>            sem (AddCh) endsem
  | Digit<out:ch>             sem (AddCh) endsem
  }                           sem Hash(↓b,↓bl,↑sy,↑spix)
                                  -- sy is identifier or keyword
                              endsem.
Number<out:sy,val,symcol> =
                              sem symcol:=col; bl:=0 endsem
  Digit<out:ch>               sem (AddCh) endsem
  { HexDigit<out:ch>          sem (AddCh) endsem
  }
  ( "H"                       sem bl:=bl+1; b[bl]:=CHR(typ) endsem
                              sem Convert(↓b,↓bl,↑sy,↑val) endsem
  | "."                       sem bl:=bl+1; b[bl]:=CHR(typ) endsem
    { Digit<out:ch>           sem (AddCh) endsem
    }
    [ "E"                     sem bl:=bl+1; b[bl]:=CHR(typ) endsem
      [ ( "+" | "-" )         sem bl:=bl+1; b[bl]:=CHR(typ) endsem
      ]
      Digit<out:ch>           sem (AddCh) endsem
      [ Digit<out:ch>         sem (AddCh) endsem
      ]
    ]                         sem ConvertReal(↓b,↓bl,↑rval); sy:=realcon;
                                  val:=CARDINAL(rval) endsem
  | eps                       sem Convert(↓b,↓bl,↑sy,↑val) endsem
  ).
String<out:sy,addr,length,firstch,symcol> =
                              sem symcol:=col; bl:=0 endsem
  ( "'"
    { ".."                    sem b[bl+1]:="."; b[bl+2]:="."; bl:=bl+2 endsem
    | "(*"                    sem b[bl+1]:="("; b[bl+2]:="*"; bl:=bl+2 endsem
    | "*)"                    sem b[bl+1]:="*"; b[bl+2]:=")"; bl:=bl+2 endsem
    | chr13                   sem SemErr(↓1,↓line,↓col); bl:=0 endsem
    | any                     sem bl:=bl+1; b[bl]:=CHR(typ) endsem
    }
    "'"
  | '"'
    { ".."                    sem b[bl+1]:="."; b[bl+2]:="."; bl:=bl+2 endsem
    | "(*"                    sem b[bl+1]:="("; b[bl+2]:="*"; bl:=bl+2 endsem
    | "*)"                    sem b[bl+1]:="*"; b[bl+2]:=")"; bl:=bl+2 endsem
    | chr13                   sem SemErr(↓1,↓line,↓col); bl:=0 endsem
    | any                     sem bl:=bl+1; b[bl]:=CHR(typ) endsem
    }
    '"'
  )                           sem length:=bl;
                                  IF length=1
                                    THEN sy:=charcon; firstch:=b[1]
                                    ELSE sy:=stringcon; EnterString(↓b,↓bl,↑addr)
                                  END endsem.
Comment =
  "(*" { Comment | any } "*)".
Letter<out:ch> =
  ( "A" | "B" | "C" | ... | "Z"
  | "a" | "b" | "c" | ... | "z" )   sem ch:=CHR(typ) endsem.
Digit<out:ch> =
  ( "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" )
                                    sem ch:=CHR(typ) endsem.
HexDigit<out:ch> =
  Digit<out:ch>
  | ( "A" | "B" | "C" | "D" | "E" | "F" )   sem ch:=CHR(typ) endsem.
ENDGRAM

The rules for Number and String need some explanation:
Numerical constants cannot be converted while they are being recognized
because decimal, hexadecimal, octal, and real constants can be distinguished
only by their last character or by a decimal point. Their text must therefore be
stored and converted later.
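Deferring the conversion works because the kind of an integer or character constant is fixed by the last buffered character (real constants, recognized by their decimal point, go to ConvertReal instead). A Python sketch of what Convert has to do with the buffer follows. The function name convert, the string token codes, and the use of ValueError for out-of-range or malformed constants are assumptions of the sketch, not the interface of the Conversions module.

```python
def convert(buffer):
    """Convert the buffered text of an integer or character constant to (sy, val)."""
    last = buffer[-1]
    if last == "H":
        sy, val = "cardcon", int(buffer[:-1], 16)   # hexadecimal
    elif last == "B":
        sy, val = "cardcon", int(buffer[:-1], 8)    # octal
    elif last == "C":
        sy, val = "charcon", int(buffer[:-1], 8)    # character code in octal
    else:
        sy, val = "cardcon", int(buffer, 10)        # decimal
    limit = 255 if sy == "charcon" else 65535
    if val > limit:
        raise ValueError(f"constant out of range: {buffer}")
    return sy, val
```

Malformed digit strings such as "18B" (an 8 in an octal constant) are rejected by int() itself, which also raises ValueError.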
Strings also have their peculiarities. Our mini-scanner returns the
character sequences '..', '(*', and '*)' as single terminals. If one of these
sequences appears within a string, it has to be expanded again, since strings
must be stored in their original form. Therefore, the rule for strings gets more
complicated than expected.
On the other hand, the description of strings and comments with the
symbol any looks very simple and elegant. In accordance with Section 5.2.1,
any represents all those terminals which are not expected by another alternative at
this point in the grammar (in String: all terminals except '..', '(*', '*)', the
end-of-line character, and ''' (or '"'); in Comment: all terminals except '(*' and '*)'). The
example also shows the semantic processing of any. In a string, the symbol
recognized by any is processed using the global variable typ (see Section
6.4.2).
The reason for introducing the terminals '..', '(*', and '*)' is not
obvious and requires an explanation: the symbol '..' is necessary because
otherwise a lookahead of 2 characters would be needed (the first period in the
sequence '1..2' may be a decimal point or the start of a range operator).
Although comments can be processed with a single lookahead character, it
simplifies the processing of comments considerably if we treat the sequences
'(*' and '*)' as single terminals.
LL(1) Conflicts
As shown by Example 8.1, it is often difficult to avoid LL(1) conflicts when
lexical structures are described by an attributed grammar:
8.1 Example  LL(1) conflicts in lexical structures
  Scanner = {Symbol}.
  Symbol  = ... | ">" ["="] | "=" | ... .
This situation represents an LL(1) conflict because if '>' is read and the
next character is '=', the syntax analyzer cannot decide whether this
character belongs to the symbol '>=' or whether it constitutes a separate
symbol '='. Such conflicts also appear in the symbols ':=', '<>', '<=',
Identifier, and Number. However, they are not critical since the syntax
analyzer always chooses the first alternative it encounters during analysis. In the example above, this means that '=' is correctly considered part
of the symbol '>=' rather than being recognized as a separate symbol.
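For the symbols of Example 8.1, always choosing the first alternative amounts to the familiar longest-match rule. A Python sketch follows; the function name is hypothetical, the token names are borrowed from Symboltype only for illustration, and characters other than '>' and '=' are simply skipped.

```python
def scan_relations(text):
    """Tokenize '>', '>=' and '=' the way the generated parser resolves the conflict."""
    tokens, i = [], 0
    while i < len(text):
        if text[i] == ">":
            if text[i + 1:i + 2] == "=":   # first alternative: '=' belongs to '>='
                tokens.append("geqsy")
                i += 2
            else:
                tokens.append("grtsy")
                i += 1
        elif text[i] == "=":
            tokens.append("eqlsy")
            i += 1
        else:
            i += 1                         # everything else is ignored in this sketch
    return tokens
```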

Speed
A lexical analyzer implemented with Coco runs at approximately one-half the
speed of a hand-coded analyzer. A 35% speed gain can be achieved if the
nonterminals Letter and Digit with their many alternatives are already
recognized as terminal classes by the mini-scanner.

Assessment
The example has shown how easily a translation process can be described with
Cocol. At first glance, the grammar may seem a bit confusing. Yet, as
soon as one becomes familiar with this notation, the following advantages can
be observed:
1. The grammar is short and precise. For the recognition of a symbol, it is
   sufficient to write its name without any additional actions.
2. The syntax is clearly separated from the semantics. Thus the syntax is
   more explicit than it is in a hand-coded compiler.
3. From the syntax declarations, one can see immediately which terminals
   and nonterminals are in the language.
4. Error-handling actions need not be described explicitly.
5. Many constructs, like nested comments, can be described with any in a
   straightforward and elegant way which is hard to surpass.

Of course, there are some parts of the grammar which are not very simple to
read, e.g. the production for Number. It has a rather complex structure, but
this only shows that Cocol can also handle difficult constructs. After all, the
production for Number describes four different kinds of numerical constants.
This would be difficult to read in a hand-coded lexical analyzer, too, and could
hardly be written in this short and concise form using a conventional programming language.

8.1.3 Semantic procedures for lexical analysis

We decompose the semantic procedures of the attributed grammar into four
modules Scannerlex, OutMod, ListMod, and Conversions and specify
their definition modules, but omit their implementation modules due to space
limits.
DEFINITION MODULE Scannerlex;
VAR typ,col,line: CARDINAL;     (*information about the current token*)
    at: ARRAY[1..10] OF CHAR;   (*not needed here*)
PROCEDURE GetSy;
END Scannerlex.

Scannerlex reads and prints a source text and returns every single character as
a separate token. The token number as well as its column and its line number
are returned by GetSy in the global variables typ, col, and line. The token
numbers are the ASCII values of the source characters (exceptions: eofch=0,
'..'=1, '(*'=2, and '*)'=3). After the last character in the source text has been read,
GetSy always returns eofch.
DEFINITION MODULE OutMod;
TYPE Symboltype =    (*token codes*)
  (eofsy, andsy, divsy, timessy, slashsy, modsy, notsy, plussy,
   minussy, orsy, eqlsy, neqsy, grtsy, geqsy, lsssy, leqsy, insy,
   lparsy, rparsy, lbracksy, rbracksy, lconbrsy, rconbrsy, commasy,
   semicolonsy, periodsy, colonsy, rangesy, constsy, typesy, varsy,
   arraysy, recordsy, variantsy, setsy, pointersy, tosy, arrowsy,
   importsy, exportsy, fromsy, qualifiedsy, beginsy, casesy, ofsy,
   ifsy, thensy, elsifsy, elsesy, loopsy, exitsy, repeatsy,
   untilsy, whilesy, dosy, withsy, forsy, bysy, returnsy,
   becomessy, endsy, callsy, definitionsy, implementationsy,
   proceduresy, modulesy, ident, cardcon, intcardcon, realcon,
   charcon, stringcon, eolsy);
PROCEDURE Emit (sy:Symboltype; col:CARDINAL);
PROCEDURE EmitConstant (sy:Symboltype; val,col:CARDINAL);
PROCEDURE EmitIdent (spix,col:CARDINAL);
PROCEDURE EmitString (addr,len,col:CARDINAL);
END OutMod.

The module OutMod contains procedures to write symbols to an intermediate
language file.
Emit writes a symbol without attributes (e.g. a keyword, an operator or a
single character) to the intermediate language. It emits a word which contains
the symbol type sy and the column col of that symbol.
EmitConstant writes a numeric constant to the intermediate language. It
emits two words, the first of which contains the type sy and the column col
of the symbol and the second the constant value val.
EmitIdent writes a name to the intermediate language. It emits two
words, the first of which contains the symbol type 'ident' and the column col
and the second the spelling index (spix) of the name.
EmitString writes a string to the intermediate language. It emits three
words, the first of which contains the symbol type 'stringcon' and the column
col, the second the string address addr and the third the string length len.
DEFINITION MODULE ListMod;
FROM OutMod IMPORT Symboltype;
PROCEDURE EnterString (buffer:ARRAY OF CHAR; len:CARDINAL;
                       VAR addr:CARDINAL);
PROCEDURE Hash (buffer:ARRAY OF CHAR; len:CARDINAL;
                VAR sy:Symboltype; VAR spix:CARDINAL);
END ListMod.

ListMod handles the name list and the string list of the scanner. EnterString
enters a string (stored in buffer[1..len]) into the string list and returns its
address addr. Hash searches for a name (stored in buffer[1..len]) in the name
list; if the name is not found, it is entered. For keywords, Hash returns the token code of
the keyword and spix is 0. Otherwise Hash returns the token code 'ident'
and spix is the address (spelling index) of the name in the name list.
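The contract of Hash can be sketched in Python. This is a model, not ListMod's implementation: the class name NameList is hypothetical, a dictionary stands in for the hash table, only a few keywords are listed, and spelling indices are 1-based addresses in a character array in which each name is followed by a '\0' terminator.

```python
class NameList:
    """Hypothetical model of the name list behind ListMod.Hash."""

    KEYWORDS = {"BEGIN": "beginsy", "END": "endsy", "IF": "ifsy", "WHILE": "whilesy"}

    def __init__(self):
        self.chars = []    # all names, each followed by a terminator
        self.spix = {}     # name -> spelling index (1-based address in chars)

    def hash(self, buffer, length):
        """Return (sy, spix): (keyword code, 0) or ('ident', address of the name)."""
        name = "".join(buffer[:length])
        if name in self.KEYWORDS:
            return self.KEYWORDS[name], 0
        if name not in self.spix:              # not found: enter it
            self.spix[name] = len(self.chars) + 1
            self.chars.extend(name)
            self.chars.append("\0")
        return "ident", self.spix[name]
```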
DEFINITION MODULE Conversions;
FROM OutMod IMPORT Symboltype;
PROCEDURE Convert (buffer:ARRAY OF CHAR; len:CARDINAL;
                   VAR sy:Symboltype; VAR val:CARDINAL);
PROCEDURE ConvertReal (buffer:ARRAY OF CHAR; len:CARDINAL;
                       VAR rval:REAL);
END Conversions.
The module Conversions converts digit strings to cardinal or real numbers.
The procedure Convert converts a digit string (stored in buffer[1..len])
to a numeric constant or a character constant. The digit string may have the
following syntax:
  digitstring = digit {digit}                  -- decimal constant
              | digit {hexdigit} 'H'           -- hex constant
              | octaldigit {octaldigit} 'B'    -- octal constant
              | octaldigit {octaldigit} 'C'.   -- character constant
For numeric constants the output parameter sy is cardcon and val is in the
range 0..65535; for character constants sy is charcon and val is in the range
0..255.
ConvertReal converts a digit string (stored in buffer[1..len]) to its real
value rval. The syntax of the digit string is
  digitstring = digit {digit} '.' {digit} ['E' ['+'|'-'] digit [digit]].

8.2 Applications in software engineering

An attributed grammar as a description method and a compiler compiler as an
implementation tool are not limited to compiler construction. They can also be
useful in other fields of software engineering.
The reason why compiler construction techniques can be generally used
in software engineering is that most large programs have the following characteristics:

1. Input streams are sufficiently complex to be described in terms of syntax
   and semantics.
2. The structure of the input text often determines the logical structure of the
   entire program or of a large portion of it.
This wide field of applications is remarkable. We will now show that the well-known
Jackson method of program design can be regarded as a special case of
program design with attributed grammars. With this in mind, this section
emphasizes the compiler description language, while the compiler compiler
stays in the background.

8.2.1 Attributed grammars as a software design method

The use of attributed grammars automatically leads to a two-step design process: in the first step (coarse design) the problem is decomposed into its syntactical and semantical parts. Here, the attributed grammar serves as a design
method. In the second step (refined design) the semantic procedures are designed from their specifications in the coarse design.
The creation of the coarse design consists of the following steps, which
may be executed sequentially or iteratively:


1. Write the grammar. The syntactic structure of the input text is described
   by a context-free grammar.
2. Find attributes. Starting from the meaning of each syntax symbol, one
   tries to find out which (semantic) attributes should be attached to it. Then
   one defines these attributes and their occurrences in the grammar rules.
   With some experience and a proper understanding of the problem the
   right choice is almost automatic. This step is therefore also a good check
   on correct understanding of the problem.
3. Prepare context conditions. Further attributes may be necessary for this
   step.
4. Define semantic procedures. In this step, all procedures which are used
   in semantic actions are defined. The refinement of semantic actions into
   code and procedure calls may again be done in a coarse or fine manner.
   With the former approach, one may associate a special semantic procedure
   with each semantic action; with the latter approach, one may describe
   each semantic action in terms of elementary operations of a programming
   language without calling semantic procedures. Since many of the semantic
   procedures are usually access procedures to data structures, they support a
   modular design in the form of data capsules. The collection of all
   procedures shows which operations can be performed with the various
   data structures and which relations exist between the data structures.
5. Set up the attributed grammar. One combines the context-free grammar,
   the attributes, the semantic actions, and any context conditions into a
   proper attributed grammar.

After these five steps, the coarse design is completed and the following has
been accomplished:
1. The problem has been decomposed into three parts: syntax, context conditions, and semantic actions.
2. The attributes and the data structures derived from them are the terms in
   which the problem solution can be appropriately described.
3. The access routines to the data structures and all other algorithms required
   for the solution are defined by the semantic procedures.

This completes the design method with attributed grammars. The result is sufficiently abstract to fix only the essential semantic design decisions but to leave
enough freedom to the implementor. On the other hand it is sufficiently concrete to specify explicitly those details that should not be left to the decision of
the implementor.


The result of the coarse design, consisting of a system of attributes, semantic procedures, and an attributed grammar, can be viewed as the specification for the refined design, since it describes what is to be done but not how it should be done.
The next step is the refined design which may now exclusively concentrate on the semantic procedures without having to consider any syntactic
problems.
However, coarse design and refined design may influence each other.
After the definition of the attributes, one may find that the semantic procedures
are either too abstract or too concrete, too complex or too simple. For example, too many access procedures to the data structures of a module may indicate that it would have been better to add a lower level of abstraction, and to
divide the large module into several smaller ones. The concise and formal notation of attributed grammars encourages one to try several approaches and to
check their consequences without much effort, even when the task is large.
The refined design is followed by the implementation. Only a lexical analyzer has to be written here; the rest is done by the compiler compiler.

8.2.2 The telegram problem as an example

Henderson and Snowdon [1972] presented the following problem, which is
known as the 'telegram problem':
A stream of telegrams is to be processed. Each telegram is terminated by
the string 'ZZZZ'. The telegram stream is terminated when an empty telegram followed by 'ZZZZ' arrives. The words in a telegram are to be
counted. Long words with more than 12 characters are to be counted separately. After each telegram, the counter values are to be printed. The telegrams are read and subsequently printed in lines of 100-120 characters.
Superfluous blanks are to be eliminated. The maximum word length is 20
characters. Longer words are to cause the program to stop.

Since the input consists of structured data, and its structure will significantly
determine the algorithm, this task is well suited for the application of attributed
grammars, and a subsequent implementation with a compiler compiler.
The design steps for the solution of the telegram problem are:

1. Set up the grammar of the input data.
   Terminals:
   textword   a word in a telegram
   endword    end word (= ZZZZ)


   Nonterminals:
   TelegramStream   the total telegram stream
   TextTelegram     a text telegram (including its end word)
   EmptyTelegram    an empty telegram containing only the end word

   Context-free grammar:
   TelegramStream = {TextTelegram} EmptyTelegram.
   TextTelegram   = textword {textword} endword.
   EmptyTelegram  = endword.

2. Define attributes. From the specification of the task, three attributes
   result:
   w   array of char   the text of a word
   n   integer         the number of words in a telegram
   l   integer         the number of long words in a telegram

3. Assign attributes to the grammar symbols. In this step, we list the grammar symbols and attach attributes to them.
   textword↑w          recognizes a word and provides its text w.
   TextTelegram↑n↑l    recognizes and prints a telegram with n
                       words, of which l words are long.
   The remaining grammar symbols have no attributes.
   Note that the attributed symbols are described from an algorithmic
   point of view (i.e. we do not say 'TextTelegram represents a telegram', but
   rather 'TextTelegram recognizes a telegram'). The verbal description of
   the attributed symbols should specify all attributes of the symbol. It
   should be accurate enough to be used as a specification of the translation
   process. This is usually possible and easy to accomplish since the few
   items involved have already been defined.
4. Define semantic procedures. The actions the program must execute can
   be seen from the problem description:
   (a) read the source text, recognize and count the words;
   (b) print the source text with a different line length;
   (c) print the counter values.
   Reading the source text is the task of the lexical analyzer and does not
   concern us here. The words are counted with the attributes n and l.
   Therefore, the only candidates for semantic procedures are those which
   print the text and the counter values. A variable will probably be needed
   to ensure that the line length does not exceed 120 characters. It will be
   initialized at the beginning of each telegram, and will be checked and
   increased when a new word is added to the line. A line buffer may also be
   needed. Following the principle of stepwise refinement, we are not yet
   interested in the implementation details here. Rather, we define the
   following three procedures which will do the whole printing job:
   OutInit             initialize the output of a telegram;
   OutWord(↓w)         print the word w according to the problem definition;
   OutAccount(↓n↓l)    print the counter values n and l with an appropriate text.

5. Write down the attributed grammar. Having completed steps 1
   through 4, the attributed grammar is almost self-evident now:

   TelegramStream =
      { sem OutInit endsem
        TextTelegram↑n↑l
        sem OutAccount(↓n↓l) endsem
      } EmptyTelegram.

   TextTelegram↑n↑l =
      textword↑w where (|w| <= 20)
         sem n := 1;
             if |w| > 12 then l := 1 else l := 0 end;
             OutWord(↓w)
         endsem
      { textword↑w where (|w| <= 20)
         sem n := n+1;
             if |w| > 12 then l := l+1 end;
             OutWord(↓w)
         endsem
      } endword.

   EmptyTelegram = endword.
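The attributed grammar translates almost mechanically into a recursive-descent program. The following Python sketch is an illustration, not code from the book: the list of words stands in for the lexical analyzer, and the classes Printer and TelegramParser and the methods out_init, out_word and out_account are hypothetical counterparts of OutInit, OutWord and OutAccount. The printing module only enforces the 120-character upper limit; it makes no attempt to keep lines at least 100 characters long. An over-long word stops the program by raising ValueError.

```python
END_WORD = "ZZZZ"  # end word of every telegram
MAX_LINE = 120     # maximum length of a printed line
LONG_WORD = 12     # words with more than 12 characters count as long
MAX_WORD = 20      # context condition: |w| <= 20


class Printer:
    """Hypothetical printing module; printed lines are collected in self.lines."""

    def __init__(self):
        self.lines = []
        self.line = ""

    def out_init(self):                  # OutInit
        self.line = ""

    def out_word(self, w):               # OutWord(↓w): fill lines up to MAX_LINE
        if self.line and len(self.line) + 1 + len(w) > MAX_LINE:
            self.lines.append(self.line)
            self.line = ""
        self.line = f"{self.line} {w}" if self.line else w

    def out_account(self, n, n_long):    # OutAccount(↓n↓l)
        if self.line:
            self.lines.append(self.line)
            self.line = ""
        self.lines.append(f"words: {n}, long words: {n_long}")


class TelegramParser:
    """Recursive-descent evaluation of the attributed grammar above."""

    def __init__(self, words, printer):
        self.words = words               # output of a (not shown) lexical analyzer
        self.pos = 0
        self.out = printer

    def peek(self):
        return self.words[self.pos] if self.pos < len(self.words) else None

    def next_word(self):
        w = self.peek()
        if w is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return w

    def telegram_stream(self):
        # TelegramStream = {TextTelegram↑n↑l} EmptyTelegram.
        while self.peek() not in (END_WORD, None):
            self.out.out_init()
            n, n_long = self.text_telegram()
            self.out.out_account(n, n_long)
        self.expect_end()                # EmptyTelegram = endword.

    def text_word(self):
        w = self.next_word()             # textword↑w where (|w| <= 20)
        if len(w) > MAX_WORD:
            raise ValueError(f"word longer than {MAX_WORD} characters: {w!r}")
        return w

    def text_telegram(self):
        # TextTelegram↑n↑l = textword {textword} endword.
        w = self.text_word()
        n, n_long = 1, int(len(w) > LONG_WORD)
        self.out.out_word(w)
        while self.peek() not in (END_WORD, None):
            w = self.text_word()
            n += 1
            n_long += int(len(w) > LONG_WORD)
            self.out.out_word(w)
        self.expect_end()
        return n, n_long

    def expect_end(self):
        if self.next_word() != END_WORD:
            raise SyntaxError(f"{END_WORD} expected")
```

Because a text telegram starts with a text word and the empty telegram with the end word, one word of look-ahead is enough to choose between them, which is what makes the grammar LL(1).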

This completes the coarse design of the telegram problem. Syntax and semantics are clearly separated. Together they provide a clear decomposition of the
program, making its structure apparent. The separation shows that the semantic processing, i.e. the essential part, is very simple if there is a printing
module with the access procedures OutInit, OutWord, and OutAccount.
A comparison with Henderson and Snowdon's solution shows that in their
program lexical analysis and syntax analysis attract the major part of attention
in design, program text, and possible design errors. Output and counting are
of minor importance and are nearly lost. Their solution avoids the terms syntax and semantics, thus letting the problem appear much more complex
than it is. In contrast, we focus most of our attention on printing and counting.
We consider lexical analysis and syntax analysis as routine matters that do not
require special attention.

8.2.3 Attributed grammars as documentation

From the above, it should be obvious that attributed grammars are also well
suited for documentation. The system of syntax, attributes, semantic procedures, and the attributed grammar of a software product is its documentation
(on the abstraction level of the attributed grammar). The following advantages
of this documentation method are evident:

1. The form of the documentation (its structure) is easy to find. It is almost
   independent of the product to be described, and consists of the parts: terminals, nonterminals, context-free grammar, attributes, attributed
   grammar symbols, semantic procedures, and attributed grammar (in
   this order). This arrangement aids standardization.
2. The documentation is formal and therefore precise, complete, and short.
3. The documentation is abstract enough to hide implementation details, but
   concrete enough to express important conceptual details.
4. Because attributed grammars are machine-readable, it is unnecessary to separate implementation and documentation, which ensures that the documentation is always up-to-date.

8.2.4 The Jackson method as a special case

At a quick glance, the often discussed Jackson method of program design
seems to have nothing in common with attributed grammars. Jackson [1975]
uses a totally different terminology and describes his method only by examples in an indirect and unsystematic manner. To find out the essence of Jackson's method, the reader is forced to study other literature.
The Jackson method is based on the following three concepts:

1. The structure of an algorithm is derivable from its input and output data.
2. The structure of the input and output data is described by tree diagrams
   which allow the description of sequences, alternatives, and (unlimited)
   repetitions.
3. If the structures of the input and output data 'match' in a certain way, the
   total algorithm for the transformation of the input into the output data can
   be viewed as an assembly of the transformation algorithms for the individual substructures.

If the structures of the input and output data do not match, the Jackson method
fails. However, in the examples in his book, Jackson shows that his method
can still be used with the aid of tricks such as 'backtracking', 'program inversion', and some other techniques.
Hughes [1979] looked at the Jackson method from the standpoint of formal languages and summarized the following points:
1. Jackson's tree diagrams describe only regular languages since they are
   only based on sequences, alternatives, and unlimited iterations.
2. In addition, it is required that the input data can be deterministically analyzed with a single-character look-ahead.
3. Jackson's requirement of a structural matching between input and output
   data means, in the terminology of formal languages, that there must be a finite automaton that transforms the input into the output.

Jackson's design method can be viewed as a special case of the design method
with attributed grammars, in which:
1. the input data is regular and its grammar is LL(1);
2. the output data form a regular language;
3. a certain correspondence exists between input and output language that
   manifests itself in the fact that a finite automaton can be found that transforms the input into the output.

It is therefore only applicable to a narrow range of tasks that meet these
conditions.
It is surprising that this relationship between Jackson's method and the
design method with attributed grammars has hardly been recognized. The reason may be that Jackson does not distinguish between syntax and semantics (in fact, they are inseparably coupled in his examples), and does
not use attributes.
If we describe the examples in Jackson's book with attributed grammars,
they become simpler, shorter, and easier to understand. The grammars are
simple throughout. We will show this with example 14 of Jackson's book,
whose discussion covers 17 pages and is the longest in the
entire book.

Problem description. An operating system collects data about its use. These
data are: a record for the start of each session (LOGON), the end of a session
(LOGOFF), the start of a program (PROGSTART), and the end of a program
(PROGEND). At logon time, the user is assigned a unique session number.
The system makes sure that a user can start a session only when his terminal is
free, and cannot terminate a session that he has not initiated. Furthermore, a
user can have only one active program at any given time. He must terminate
an active program before starting a new one.


The collected data is written to a file. The records have the following
contents:

   Logon record:      LOGON       session number   start time
   Logoff record:     LOGOFF      session number   stop time
   Progstart record:  PROGSTART   session number   program name   start time
   Progend record:    PROGEND     session number   program name   stop time

The records are stored in strict chronological order. However, it is possible
that records are missing due to erroneous processing. In this case, the data file
contains incomplete information for some sessions and programs: a logon
record without corresponding logoff record, and vice versa; a progstart record
without corresponding progend record, and vice versa.
As a result, the program should produce the following list:
   Number of complete sessions   = nnnn
   Average session length        = ssss
   Number of known sessions      = mmmm
   Number of complete programs   = pppp
   Average program length        = uuuu
   Number of known programs      = qqqq

Grammar. The input consists of four kinds of records. We regard them as
terminals: logon, logoff, progstart, and progend, and arrive at the following grammar:

   input = {logon | logoff | progstart | progend}.

It consists of a single rule (for regular languages, there is always a grammar
that consists of a single rule). In accordance with the problem description we
attach attributes to the terminals:

   session:  integer   session number
   prog:     name      program name
   time:     integer   time of logon, logoff, progstart and progend

and get the attributed grammar symbols

   logon↑session↑time
   logoff↑session↑time
   progstart↑session↑prog↑time
   progend↑session↑prog↑time

Semantics. In the semantic actions, we need variables that hold the results.
We call them

   completesessions:  integer   number of complete sessions
   knownsessions:     integer   number of known sessions
   completeprogs:     integer   number of complete programs
   knownprogs:        integer   number of known programs
   sessiontime:       integer   length of all complete sessions
   progtime:          integer   length of all complete programs

It is clear from the above that, when a logon record appears, the session number
and the start time must be stored until a logoff record with the same session number is encountered. The same is true for programs. For the time being, we will
leave the definition of the concrete data structures in the background and consider only the fact that we need the following access procedures:
   NewSession(↓session↓time)
      Define the start of a session at the specified time.
   DisposeSession(↓session)
      Define the end of a session.
   SessionStarted(↓session): boolean
      Return true if the specified session has been started.
   SessionStartTime(↓session): integer
      Return the start time of the specified session.
   NewProg(↓session↓prog↓time)
      Define the start of the program prog in the specified session at the specified time.
   DisposeProg(↓session↓prog)
      Define the end of the program prog in the specified session.
   ProgStarted(↓session↓prog): boolean
      Return true if prog in session has been started.
   ProgStartTime(↓session↓prog): integer
      Return the start time of prog in session.
   InitStorage
      Initialize the abstract data structure.
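The book deliberately leaves the concrete data structure to the refined design. One possible choice, sketched here in Python as an assumption rather than the authors' implementation, is a pair of dictionaries: started sessions keyed by session number, and started programs keyed by (session number, program name). The class SessionStorage and its snake_case method names are hypothetical counterparts of the access procedures listed above.

```python
class SessionStorage:
    """Sketch of the abstract data structure behind the access procedures.

    Started sessions map session number -> start time; started programs map
    (session number, program name) -> start time.
    """

    def __init__(self):
        self.init_storage()

    def init_storage(self):                      # InitStorage
        self.sessions = {}
        self.progs = {}

    def new_session(self, session, time):        # NewSession(↓session↓time)
        self.sessions[session] = time

    def dispose_session(self, session):          # DisposeSession(↓session)
        del self.sessions[session]

    def session_started(self, session):          # SessionStarted(↓session)
        return session in self.sessions

    def session_start_time(self, session):       # SessionStartTime(↓session)
        return self.sessions[session]

    def new_prog(self, session, prog, time):     # NewProg(↓session↓prog↓time)
        self.progs[(session, prog)] = time

    def dispose_prog(self, session, prog):       # DisposeProg(↓session↓prog)
        del self.progs[(session, prog)]

    def prog_started(self, session, prog):       # ProgStarted(↓session↓prog)
        return (session, prog) in self.progs

    def prog_start_time(self, session, prog):    # ProgStartTime(↓session↓prog)
        return self.progs[(session, prog)]
```

Keying by session number relies on the problem statement that session numbers are unique; because every operation is a single dictionary access, the data structure stays hidden behind the procedures exactly as the coarse design intends.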

Attributed grammar. With only these few facts, which are easily derived with
modest thought, the attributed grammar of the problem can be formulated:

   input =
      sem InitStorage;
          completesessions := 0; knownsessions := 0;
          completeprogs := 0; knownprogs := 0;
          sessiontime := 0; progtime := 0
      endsem
      { logon↑session↑time
           sem knownsessions := knownsessions+1;
               NewSession(↓session↓time)
           endsem
      | progstart↑session↑prog↑time
           sem knownprogs := knownprogs+1;
               NewProg(↓session↓prog↓time)
           endsem
      | progend↑session↑prog↑time
           sem if ProgStarted(↓session↓prog) then
                  completeprogs := completeprogs+1;
                  progtime := progtime + (time - ProgStartTime(↓session↓prog));
                  DisposeProg(↓session↓prog)
               else knownprogs := knownprogs+1
               end
           endsem
      | logoff↑session↑time
           sem if SessionStarted(↓session) then
                  completesessions := completesessions+1;
                  sessiontime := sessiontime + (time - SessionStartTime(↓session));
                  DisposeSession(↓session)
               else knownsessions := knownsessions+1
               end
           endsem
      }
      sem Write(↓completesessions);
          Write(↓sessiontime/completesessions);
          Write(↓knownsessions);
          Write(↓completeprogs);
          Write(↓progtime/completeprogs);
          Write(↓knownprogs)
      endsem.
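How little remains once the coarse design is done can be seen by transcribing the attributed grammar directly. The Python sketch below is an illustration under assumptions that are not in the book: records arrive as tuples such as ("LOGON", session, time) or ("PROGEND", session, prog, time), the storage is two dictionaries, and the averages use integer division and are reported as 0 when nothing is complete (the grammar above divides without such a guard). The single loop over the records is the one rule input = {logon | logoff | progstart | progend}; as the text remarks, no recursion is needed.

```python
def summarize(records):
    """Evaluate the attributed grammar  input = {logon | logoff | progstart | progend}.

    Each record is a tuple (kind, session, time) for LOGON/LOGOFF or
    (kind, session, prog, time) for PROGSTART/PROGEND.
    """
    sessions = {}  # started sessions: session -> start time
    progs = {}     # started programs: (session, prog) -> start time
    complete_sessions = known_sessions = 0
    complete_progs = known_progs = 0
    session_time = prog_time = 0

    for rec in records:
        kind = rec[0]
        if kind == "LOGON":
            _, session, time = rec
            known_sessions += 1
            sessions[session] = time
        elif kind == "PROGSTART":
            _, session, prog, time = rec
            known_progs += 1
            progs[(session, prog)] = time
        elif kind == "PROGEND":
            _, session, prog, time = rec
            if (session, prog) in progs:             # ProgStarted
                complete_progs += 1
                prog_time += time - progs.pop((session, prog))
            else:                                    # progend without progstart
                known_progs += 1
        elif kind == "LOGOFF":
            _, session, time = rec
            if session in sessions:                  # SessionStarted
                complete_sessions += 1
                session_time += time - sessions.pop(session)
            else:                                    # logoff without logon
                known_sessions += 1
        else:
            raise ValueError(f"unknown record kind: {kind!r}")

    def average(total, count):
        return total // count if count else 0        # guard is our addition

    return {
        "Number of complete sessions": complete_sessions,
        "Average session length": average(session_time, complete_sessions),
        "Number of known sessions": known_sessions,
        "Number of complete programs": complete_progs,
        "Average program length": average(prog_time, complete_progs),
        "Number of known programs": known_progs,
    }
```

A logoff without a matching logon (for instance because the logon record was lost) is counted as a known session, which mirrors the else branches of the grammar.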

At this point, the coarse design is already completed. The refined design will
decide on the concrete implementation of the abstract data structure. In
principle, the program can be implemented with a compiler compiler; to
read the input data, only a (trivial) lexical analyzer needs to be written. But
since the grammar of this problem is so simple (as it is for the telegram
problem), using a compiler compiler would be like taking a sledgehammer to crack a nut. It is therefore natural to code the syntax analyzer for this problem by recursive descent (in this
case it is not even recursive).

Jackson instead undertakes voluminous considerations about intermediate
data files and program inversions which make the task appear much more
complicated than it really is.


8.3 Results of a Coco run
For readers interested in the way Coco works, we present an example
showing the contents of the compiler parts generated from a specific input
grammar. It can be viewed as a supplement to the implementation description
in Chapter 7, and should help to understand the principles explained there.
The example will be the description of an index generator, which is a
program that generates an index from a list of keywords entered according to
some syntactic rules. This problem provides another example of the use of
attributed grammars in software engineering.
The input to the index generator is to be as follows: for each page of a
document, the page number and all keywords on this page are entered in the
following manner:

   1 = Introduction; User's Guide;
   2 = Start up;
       Parts of the tool;
   3 = General characteristics; User's Guide

On the left-hand side of the '=' sign, page numbers as well as words are
allowed. Words, however, must start with a '*':

   *Appendix = Maintenance; Troubleshooting;

From this input, the compiler generates a file of pairs <keyword, page
number>, sorts this file, and prints an index in which page numbers of
identical keywords are collected (the index at the end of this book was produced with such a program).
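The sorting and collecting phase can be sketched in a few lines. The Python sketch below is illustrative and assumes that the pairs are already in memory, with page numbers as integers and word references such as Appendix as strings; the book instead writes a file and sorts it, right-justifying the numbers (with AdjustNumber) so that a plain string sort orders them numerically. The names sort_key and build_index are hypothetical.

```python
def sort_key(pair):
    """Order by keyword; page numbers (ints) sort numerically and before words."""
    keyword, ref = pair
    return (keyword, 0, ref, "") if isinstance(ref, int) else (keyword, 1, 0, ref)


def build_index(pairs):
    """Sort <keyword, reference> pairs and collect the references of identical keywords."""
    collected = {}                                   # keyword -> list of references
    for keyword, ref in sorted(pairs, key=sort_key):
        refs = collected.setdefault(keyword, [])
        if ref not in refs:                          # list each page only once
            refs.append(ref)
    return [f"{kw}: {', '.join(str(r) for r in refs)}" for kw, refs in collected.items()]
```

For the example input above, the keyword User's Guide collects the entry "User's Guide: 1, 3".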
In our example, we will describe the first phase of this compiler, i.e. the
generation of the <keyword, pagenumber> file.
 1  GRAMMAR Index
 2
 3  SEMANTIC DECLARATIONS
 4    FROM FileIO IMPORT File, Open, Close, Write, WriteString, WriteLn;
 5    FROM Indexlex IMPORT GetKeyword, AdjustNumber;
 6
 7    VAR f: File;
 8        keystring, refstring, string: ARRAY [1..50] OF CHAR;
 9        value: CARDINAL;
10
11  TERMINALS
12    "=" alias equal
13    ";" alias semicolon
14    "*" alias asterisk
15    keyword
16    number<out:value>
17
18  PRAGMAS
19    eolsy
20
21  NONTERMINALS
22    Index
23    Relation
24    Reference<out:string>
25
26  RULES
27
28  Index =
29    sem Open(f, "INDEX.OUT") endsem
30    {Relation} sem Close(f) endsem.
31  -------------------------------------------
32
33  Relation = Reference<out:refstring>
34    "="
35    { keyword sem GetKeyword(↑keystring);
36                  WriteString(↓f, ↓keystring);
37                  Write(↓f, ↓CHR(0));
38                  WriteString(↓f, ↓refstring);
39                  WriteLn(f) endsem
40      ";"
41    }.
42  -------------------------------------------
43  Reference<out:string> =
44    number<out:value> sem AdjustNumber(↓value, ↑string) endsem
45    | "*" keyword sem GetKeyword(↑string) endsem.
46
47  ENDGRAM

This is the description of the translation process. The only thing the user has
to provide is the module Indexlex, which supplies the terminals and exports the
two procedures GetKeyword and AdjustNumber. GetKeyword should
return the keyword string that the lexical analyzer has obtained after recognizing the terminal keyword. AdjustNumber should right-justify a number
in a character field for sorting. The pragma eolsy is specified only to show
how pragmas are encoded in the generated tables.
From this input, Coco generates a table-driven syntax analyzer and a
semantic evaluator. These modules will be discussed in the next sections.
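To make the effect of the grammar concrete, the following Python sketch performs the same first phase by hand with recursive descent. It is a hypothetical stand-in, not what Coco generates: tokenize plays the role of the user module Indexlex (a keyword is assumed to be any run of characters up to '=', ';', '*' or a line end, and a run of decimal digits is a number), str.rjust replaces AdjustNumber with an assumed field width of 5, and each keyword is assumed to be terminated by ';' as in the grammar listing. Each record is the keyword, a CHR(0) separator, and the reference, as written by the semantic actions.

```python
FIELD = 5          # assumed width of the right-justified page-number field
DELIMS = "=;*"


def tokenize(text):
    """Hypothetical stand-in for Indexlex: returns a list of (kind, text) tokens."""
    tokens, i = [], 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c in DELIMS:
            tokens.append((c, c))
            i += 1
        else:                                   # a number or a keyword (may contain blanks)
            j = i
            while j < len(text) and text[j] not in DELIMS + "\n":
                j += 1
            word = text[i:j].strip()
            kind = "number" if all(ch in "0123456789" for ch in word) else "keyword"
            tokens.append((kind, word))
            i = j
    tokens.append(("EOF", ""))
    return tokens


def generate_pairs(text):
    """First phase of the index generator: one 'keyword CHR(0) reference' record per keyword."""
    tokens = tokenize(text)
    pos = 0
    records = []

    def kind():
        return tokens[pos][0]

    def expect(k):
        nonlocal pos
        if kind() != k:
            raise SyntaxError(f"{k} expected, found {kind()}")
        pos += 1
        return tokens[pos - 1][1]

    def reference():
        # Reference<out:string> = number (AdjustNumber) | "*" keyword (GetKeyword).
        if kind() == "number":
            return expect("number").rjust(FIELD)
        expect("*")
        return expect("keyword")

    while kind() != "EOF":                      # Index = {Relation}.
        refstring = reference()                 # Relation = Reference "=" {keyword ";"}.
        expect("=")
        while kind() == "keyword":
            keystring = expect("keyword")
            records.append(keystring + "\0" + refstring)
            expect(";")
    return records
```

With right-justified page numbers, a plain string sort of these records keeps page 2 ahead of page 10, which is why AdjustNumber is needed before the sort.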

8.3.1 The generated syntax analyzer

The syntax analyzer is generated from a frame program (cocosynframe,
shown in Appendix F) into which Coco inserts the following constant
declarations.

   CONST
      maxname  = ...;   (*length of name list*)
      maxnamep = ...;   (*number of names*)
      maxcode  = 48;    (*length of G-code*)
      maxany   = ...;   (*number of any-sets. At least one dummy*)
      maxeps   = ...;   (*number of eps-follower sets*)
      maxt     = 5;     (*last terminal number*)
      maxp     = 6;     (*last pragma number*)
      maxs     = 9;     (*last nonterminal number*)
      startpc  = 44;    (*start address of the grammar*)

These values are the table dimensions derived from the above grammar.

8.3.2 The generated semantic evaluator

The semantic evaluator also consists of fixed frame parts and parts that are
copied from the attributed grammar. For the index generator, the semantic
evaluator is as follows (generated parts are shown in italics and frame parts are
shown in roman type):
   IMPLEMENTATION MODULE Indexsem;
   FROM SYSTEM IMPORT WORD;
   FROM Indexlex IMPORT at;
   FROM FileIO IMPORT File, Open, Close, Write, WriteString, WriteLn;
   FROM Indexlex IMPORT GetKeyword, AdjustNumber;

   VAR f: File;
       keystring, refstring, string: ARRAY [1..50] OF CHAR;
       value: CARDINAL;

   PROCEDURE ASSIGN(VAR x: WORD; y: WORD);
   BEGIN
     x := y
   END ASSIGN;

   PROCEDURE Semant(sem: CARDINAL);
   BEGIN
     CASE sem OF
     | 12: (*line 29*) Open(f, "INDEX.OUT")
     | 13: (*line 30*) Close(f)
     | 14: (*line 33*) refstring := string;
     | 15: (*line 35*) GetKeyword(keystring);
           WriteString(f, keystring);
           Write(f, CHR(0));
           WriteString(f, refstring);
           WriteLn(f)
     | 16: ASSIGN(value, at[1]);
     | 17: (*line 44*) AdjustNumber(value, string)
     | 18: (*line 45*) GetKeyword(string)
     END;
   END Semant;

   END Indexsem.

8.3.3 The generated parser tables

Coco generates the following tables:

1. G-code;
2. information about nonterminals (G-code start address, deletability, set of
   start symbols);
3. terminal successors of eps-symbols;
4. symbol sets represented by any-symbols;
5. number of attributes for terminals and pragmas;
6. number of semantic actions for pragmas;
7. symbol names for error messages.

The table values are inserted as initialization code into the generated syntax
analyzer. We will now show these values in a decoded form.

G-code (addresses take 2 bytes)

   Address   Instruction
   == Index ==
    1        SEM 12
    2        NTA Relation, 9
    6        JMP 2
    9        EPS 1
   11        SEM 13
   12        RET
   == Relation ==
   13        NT Reference
   15        SEM 14
   16        T "="
   18        TA keyword, 28
   22        SEM 15
   23        T ";"
   25        JMP 18
   28        EPS 2
   30        RET
   == Reference ==
   31        TA number, 38
   35        SEM 16
   36        SEM 17
   37        RET
   38        T "*"
   40        T keyword
   42        SEM 18
   43        RET
   == dummy rule ==
   44        NT Index
   46        T EOF
   48        RET
The entire grammar occupies only 48 bytes of G-code!
Nonterminal description

   symbol (no.)    start address   deletability   start symbols
   Index (7)        1              deletable      {"*", number}
   Relation (8)    13              nondeletable   {"*", number}
   Reference (9)   31              nondeletable   {"*", number}

Terminal eps-successors

   1   {EOF}
   2   {EOF, "*", number}

Number of attributes for terminals and pragmas

   EOF: 0   "=": 0   ";": 0   "*": 0   keyword: 0   number: 1   eolsy: 0

Pragma semantics

   eolsy:   attribute passing action 0   user action 0

Symbol names

   names:          EOF/equal/semicolon/asterisk/keyword/number/eolsy/Index/Relation/Reference
   namepointers:   1, 5, 11, 21, 30, 38, 45, 51, 57, 66
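The G-code is executed by the fixed syntax analyzer of the frame program, which is not reproduced here. To make the idea of a table-driven LL(1) parser concrete, the following Python sketch interprets the index grammar's G-code. Everything in it is an assumption for illustration, not Coco's actual driver or encoding: the instruction meanings (T: match a terminal; TA t, a: match t or else jump to alternative a; NT: call a nonterminal; NTA X, a: call X if the current symbol is one of its start symbols, else jump to a; JMP; EPS s: take the empty alternative if the current symbol is in eps-successor set s; SEM n: call the semantic evaluator with action n; RET), and the addresses and operands, which are one reading of the tables. The callback semant stands in for the module's Semant procedure.

```python
# Hypothetical decoding of the index grammar's G-code: address -> instruction.
GCODE = {
    1: ("SEM", 12), 2: ("NTA", "Relation", 9), 6: ("JMP", 2), 9: ("EPS", 1),
    11: ("SEM", 13), 12: ("RET",),                                     # Index
    13: ("NT", "Reference"), 15: ("SEM", 14), 16: ("T", "="),
    18: ("TA", "keyword", 28), 22: ("SEM", 15), 23: ("T", ";"),
    25: ("JMP", 18), 28: ("EPS", 2), 30: ("RET",),                     # Relation
    31: ("TA", "number", 38), 35: ("SEM", 16), 36: ("SEM", 17), 37: ("RET",),
    38: ("T", "*"), 40: ("T", "keyword"), 42: ("SEM", 18), 43: ("RET",),  # Reference
    44: ("NT", "Index"), 46: ("T", "EOF"), 48: ("RET",),               # dummy rule
}
START = {"Index": 1, "Relation": 13, "Reference": 31}   # start addresses
FIRST = {nt: {"*", "number"} for nt in START}           # start symbols
EPS_SETS = {1: {"EOF"}, 2: {"EOF", "*", "number"}}      # eps-successors
ADDRS = sorted(GCODE)
NEXT = dict(zip(ADDRS, ADDRS[1:]))  # address of the following instruction


def run(tokens, semant, startpc=44):
    """Interpret the G-code on a token list that ends with "EOF"."""
    pos, pc, stack = 0, startpc, []
    while True:
        op, *args = GCODE[pc]
        sym = tokens[pos] if pos < len(tokens) else "EOF"
        nxt = NEXT.get(pc)
        if op == "T":                     # terminal must match
            if sym != args[0]:
                raise SyntaxError(f"{args[0]} expected, found {sym}")
            pos, pc = pos + 1, nxt
        elif op == "TA":                  # terminal, or else try the alternative
            if sym == args[0]:
                pos, pc = pos + 1, nxt
            else:
                pc = args[1]
        elif op == "NT":                  # call nonterminal
            stack.append(nxt)
            pc = START[args[0]]
        elif op == "NTA":                 # call nonterminal only if it can start here
            if sym in FIRST[args[0]]:
                stack.append(nxt)
                pc = START[args[0]]
            else:
                pc = args[1]
        elif op == "JMP":
            pc = args[0]
        elif op == "EPS":                 # empty alternative: check the successors
            if sym not in EPS_SETS[args[0]]:
                raise SyntaxError(f"unexpected {sym}")
            pc = nxt
        elif op == "SEM":                 # semantic action number args[0]
            semant(args[0])
            pc = nxt
        elif op == "RET":
            if not stack:
                return
            pc = stack.pop()
```

Running it on the token sequence number "=" keyword ";" EOF calls the semantic actions in the order 12, 16, 17, 14, 15, 13: open the file, pass and adjust the page number, copy it to refstring, write one record, and close the file.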

9 Experiences with Coco

In 1981 workers at the University of Linz built a parser generator that generates parser tables for an LL(1) syntax analyzer from an input grammar in
Wirth's EBNF notation. The generator proved useful, so it was enhanced in
1983, and it eventually evolved into the compiler compiler Coco.
The first version of Coco ran on an Intel 8080 development system, and
was written in PL/M-80, a language similar to PL/I for microcomputers. Since
then, many more versions of Coco have been implemented in Modula-2 on
various microcomputers including the Macintosh, the IBM-PC, the Atari 1040
and the Lilith. There is also a version for IBM mainframes. Coco has been in
use for several years now and has proved to be useful both in research
projects (e.g. construction of a Modula-2 compiler, tools for static program
analysis) and in student courses.

9.1 A basis for measurements

In the following sections, we will describe the results of memory and run-time
measurements performed on Coco, and on three compilers generated by Coco.
First, we will measure the generation of a Modula-2 compiler. This
compiler consists of 6 passes (lexical analysis, syntax analysis, name
analysis, declaration analysis, semantic analysis, and code generation). Each
of passes 2 through 6 reads the entire source program in an intermediate
language generated by the previous pass. This intermediate program is
analyzed and forwarded to the next pass as a new, usually shorter, intermediate program (with the exception of pass 6, which generates the object
code). Each pass is therefore a compiler in itself, described with an attributed
grammar and translated by Coco into a syntax analyzer and a semantic
evaluator. For the measurements, we will not look at the entire Modula-2
compiler, but rather at two specific passes, since we are interested in the
individual Coco runs. We select pass 2 (syntax analysis) and pass 4
(declaration analysis). These two passes have rather different characteristics,
which make them well suited for a comparison. Pass 2 has a large and deeply
nested recursive grammar with only a few semantic actions, while pass 4 has a
simple grammar with a lot of semantic actions. In the following paragraphs,
we will talk about each of the passes as if they were independent compilers.
Secondly, we will measure the generation of Coco by itself. Compared to
the Modula-2 compiler Coco is much smaller and consists of a single pass.
Thus, we have a comparison between two large applications and a small
application. Table 9.1 shows the sizes of the compilers in terms of their
attributed grammar.
Table 9.1 Size of the attributed grammars of the example compilers

                             Modula-2   Modula-2
                             (pass 2)   (pass 4)
   Number of lines
   Terminal symbols
   Pragmas
   Nonterminal symbols
   Alternatives
   Symbols in productions
   Semantic actions
   G-code

The measurements shown in the following sections were taken from the Lilith,
since the Modula-2 compiler was only available there. For the Macintosh the
results would have been very similar.
The Lilith is a 16-bit computer built on an Am2901 bit-slice processor
with a cycle time of 150 nanoseconds. It has a very compact object code
format (the so-called M-code) which has been especially tailored to Modula-2.

9.2 Measurements on Coco

First, we will look at Coco and measure the memory and the run-time
that Coco requires to generate a compiler.
Memory requirements
Obviously the memory requirements for the code and the static data of Coco
are the same in all three measurements (65 347 bytes). The size of the dynamic
data depends on the input grammar but requires typically less than 1000 bytes
(see Table 9.2).

Table 9.2 Memory requirements of Coco for the generation of various compilers

   Modula-2 (pass 2)   Modula-2 (pass 4)   Coco
   65537 bytes         66219 bytes         65911 bytes

The memory requirement for the code is shared between ten Coco-specific
modules and two standard modules. In addition, Coco uses one module that
belongs to the resident part of the operating system, and thus does not increase
Coco's memory requirements.

Run-time
The run-time of Coco depends on the size of the input grammar. Most of the
time is used by the lexical analyzer that reads and lists the grammar. Writing
out the syntax analyzer and the semantic evaluator of the target compiler also
requires considerable time, while the rest of the work is done fairly rapidly. In
large grammars with a deeply nested hierarchy of nonterminals (as in pass 2
of the Modula-2 compiler), the grammar tests also take a noticeable amount of
time (see Table 9.3).


Table 9.3 Run-time of Coco for the generation of various compilers

                                           Modula-2   Modula-2
                                           (pass 2)   (pass 4)
   Lexical analysis
   Syntax analysis, semantic processing
   Grammar tests
   Output of the generated compiler

9.3 Measurements on some generated compilers

We will now consider the memory requirements and the run-time of the
compilers generated by Coco.
Memory requirements
Here, we are only interested in parts which are actually generated by Coco,
namely the syntax analyzer, the semantic evaluator, and the parser tables. We
are not going to consider the size of the semantic modules since they are
independent of Coco.
Table 9.4 Memory requirements of some generated compilers

                         Modula-2 (pass 2)
   Syntax analyzer       9532 bytes
   Semantic processor    8389 bytes
   Analysis tables       6344 bytes

All three compilers use the same syntax analyzer driven by different tables. Its
size is constant. The size of the semantic evaluator depends on the number and
the length of the semantic actions of the attributed grammar. As expected, its
size is larger in pass 4 of the Modula-2 compiler than in pass 2 and in Coco.
Note that the memory requirements of the generated compilers do not
depend on the length of the input text, since no syntax tree of the input
is built.


Run-time

The run-time of the generated compilers on input texts of various lengths is
shown in Table 9.5.

Table 9.5 Run-time of some generated compilers

   100 input symbols
   1000 input symbols
   5000 input symbols

Even though Coco is the smallest of the three compilers, it runs much slower
than the others since it does a lot of input and output (it writes long parts of
source programs to disk), while pass 2 and pass 4 of the Modula-2 compiler
work almost entirely in main memory (with input and output used only for
intermediate languages).
9.4 General experiences

Our experience with Coco has been exceptionally good. Coco allows a compact and
very readable specification of the translation process. The attributed grammars have become an essential part of the documentation of each compiler.
Because syntax analysis, error handling, and semantic processing are automated,
attention can be focused on the actual translation process in the semantic
procedures, and more time is available for design. Working with attributed grammars almost automatically leads to a modular program structure
with abstract data structures and access procedures, which are usually small
and easy to understand.
In multi-pass compilers, like the Modula-2 compiler, the symbol any is
especially useful since it lets one easily skip over portions of the input that are
not of interest in this pass. The concept of pragmas has also proved useful
since they make it easy to pass control information between successive passes
(e.g. trace commands, options, etc.).
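As a rough illustration of what any buys in such a pass, the following Python sketch models the effect of an {any} loop: tokens are consumed until one of the symbols that may legally follow appears. The function name skip_any, the list-of-strings token representation, and the stop set are assumptions of this sketch, not Coco's implementation.

```python
def skip_any(tokens, pos, stop):
    """Model of {any}: consume tokens from position pos until a token in
    `stop` (the symbols that may follow the loop) or the end of input.
    Returns the position of the first token that was not skipped."""
    while pos < len(tokens) and tokens[pos] not in stop:
        pos += 1
    return pos


# A hypothetical pass that only needs procedure names skips the body.
tokens = ["PROCEDURE", "P", ";", "x", ":=", "1", "END", "P"]
body_end = skip_any(tokens, 3, {"END"})
print(tokens[body_end])
```

The loop never looks at the skipped tokens, which is exactly why a pass can ignore parts of an intermediate language it does not care about.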
The limitations of LL(1) grammars are not a serious problem. Because of
Wirth's EBNF notation, it is not necessary to perform the complex grammar
transformations that are usually required in the standard BNF notation to
remove LL(1) conflicts. The only case in which we could not remove LL(1)
conflicts by transforming the grammar was the translation of the language PL/M-80;
there the conflicts were resolved by delegating some parts of the processing to the lexical analyzer.
Processing the input with L-attributed grammars and without building a
syntax tree is not a serious restriction. If during processing some attributes are
needed which only become available later, intermediate results are stored until
the required attributes have been calculated and the final translation is possible.
The omission of a syntax tree leads to efficient compilers with regard to speed
and memory requirements. Most of the generated compilers run on microcomputers.
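A typical case of storing intermediate results is a forward jump whose target address is not yet known when the jump is translated. The Python sketch below illustrates this technique (usually called backpatching); it is our own example, not code from Coco, and the statement encoding as ("op" | "goto" | "label", name) pairs is an assumption.

```python
def translate(stmts):
    """Single-pass translation with backpatching of forward jumps.

    stmts: sequence of ("op", text), ("goto", label) or ("label", label).
    Returns a list of instructions ("OP", text) or ("JMP", address).
    """
    code = []
    address = {}   # label -> address, known once the label has been seen
    pending = {}   # label -> addresses of JMPs still waiting for it
    for kind, arg in stmts:
        if kind == "op":
            code.append(("OP", arg))
        elif kind == "goto":
            if arg in address:                   # backward jump: target known
                code.append(("JMP", address[arg]))
            else:                                # forward jump: store, patch later
                pending.setdefault(arg, []).append(len(code))
                code.append(("JMP", None))
        elif kind == "label":
            address[arg] = len(code)
            for at in pending.pop(arg, []):      # missing value now available
                code[at] = ("JMP", address[arg])
        else:
            raise ValueError("unknown statement kind: " + kind)
    if pending:
        raise ValueError("undefined labels: " + ", ".join(sorted(pending)))
    return code


print(translate([("goto", "L"), ("op", "a"), ("label", "L"), ("op", "b")]))
```

Only the small table of pending jumps is kept, not a syntax tree, which is in line with the memory argument above.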
The negative experiences in the use of Coco are limited to the global
nature of semantic objects in Cocol, which requires explicit stacking of
variables, and to the fact that whenever an error has been detected in the
attributed grammar the program development cycle is enlarged by an additional
run of the compiler compiler.
However, the positive experiences outweigh the negative ones. Even
though we have no hand-coded compiler that we can compare directly to a
Coco-generated compiler, we are not afraid to claim that the efficiency of
compilers generated by Coco is close to that of hand-coded compilers, and it is
certainly easier to implement and to maintain a compiler with Coco than by
hand.

A
Definition of Adele

An algorithm description language, like a programming language, should offer all concepts
for the description of algorithms, but should be free of syntactic peculiarities. In this way,
the algorithms will stand out clearly and the reader will not be distracted by all sorts of
baroque constructs. For the same reason, it should use only a few constructs and give the
user freedom of expression. It should lean on popular programming languages so that it is
easy to read, but should not be firmly bound to a particular programming language. Our
algorithm description language Adele contains elements of PL/I, Modula-2, and Ada. We
will describe its structure by a few examples.
Overall structure
Each algorithm has a name, parameters, and instructions:

Search (↓list ↓length ↓x ↑i):
begin
Instructions
end Search

The parameter list of functions is followed by the type of the function:

Search (↓list ↓length ↓x) integer:
begin
Instructions
return i
end Search

Input parameters are marked by ↓, output parameters by ↑, and transition parameters by ↕.

Statements
We distinguish between assignments, procedure calls, control statements, input-output
statements, and text statements. To improve readability, instructions may optionally be
separated by a semicolon.


Assignment. The assignment has the form

variable := expression

Procedure call. The call of a procedure consists of the procedure name and the actual parameters in parentheses:

ReadCard (↑card)

It is a useful convention to write procedure names partly in capital letters (as in ReadCard) and variable
names entirely in lower-case letters.

Control statements. Here we use the modern forms of Modula-2, which are explicitly
terminated by an end, with the exception of the repeat statement:

if expression then statement sequence end

if expression then statement sequence else statement sequence end

case expression of
  label: statement sequence
| label: statement sequence
end

or

case expression of
  label: statement sequence
| label: statement sequence
else statement sequence
end

while expression do statement sequence end

repeat statement sequence until expression

loop statement sequence end

for variable := expression to expression [by expression] do statement sequence end

The control variable will be undefined after completion of the for loop.
exit                exits from the immediately enclosing loop statement.
return              exits from a procedure.
return expression   exits from the function procedure with expression as the function value.
halt                stops the algorithm without return to a surrounding algorithm.
Input-output statements. Here we use only three statements:

read(↑x ↑eof)   read x or signal end of input file
write(↓x)       write x to the output medium
writeln         emit line feed

We do not concern ourselves with the format of the input and output text. The boolean
parameter eof indicates the end of the input file. When x has been read, eof will
be false on return. If x could not be read due to end of file, eof will be true and x will
be undefined on return.
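The read(↑x ↑eof) convention can be modelled in Python by a function that returns the pair (x, eof). The closure-based reader, the helper names make_reader and total, and the use of None for an undefined x are assumptions of this sketch.

```python
def make_reader(items):
    """Return a read() function following Adele's read(↑x ↑eof) convention."""
    it = iter(items)

    def read():
        try:
            return next(it), False      # x has been read: eof is false
        except StopIteration:
            return None, True           # end of file: x is undefined (None)

    return read


def total(items):
    """Typical use: loop read(↑x ↑eof); if eof then exit end; ... end"""
    read = make_reader(items)
    s = 0
    while True:
        x, eof = read()
        if eof:
            break
        s += x
    return s


print(total([1, 2, 3]))
```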
Text statements. Text statements are free texts that describe actions. For example:

calculate mean values and variances;

The only rule is that they be terminated or separated by a semicolon so that their end can
be seen.
Expressions
For expressions we stipulate the common combinations of operators and operands without
giving specific rules. We state only that boolean expressions can be viewed as conditional
expressions with short-circuit evaluation:

a & b   is equivalent to   if a then b else false end
a | b   is equivalent to   if a then true else b end

This means that if the left operand alone is sufficient to determine the value of the
expression, then the right operand is not evaluated.
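The two equivalences can be written down directly in Python. In this sketch the right operand is passed as a function (a thunk) so that it is called only when needed; and_then and or_else are hypothetical helper names.

```python
def and_then(a, b):
    """a & b  ==  if a then b else false end  (b is evaluated only if a is true)"""
    return b() if a else False


def or_else(a, b):
    """a | b  ==  if a then true else b end  (b is evaluated only if a is false)"""
    return True if a else b()


calls = []

def right():
    calls.append("evaluated")
    return True

and_then(False, right)   # right() is not called
or_else(True, right)     # right() is not called
print(calls)             # []
```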

Declarations
Usually declarations are not needed for the description of short and simple algorithms,
especially if the variables used are obvious from the preceding explanations. However, in
longer algorithms with local variables, global variables, parameters, and perhaps also named
constants, it is advantageous if the algorithm description language also contains declarations.
In Adele, the declaration of constants and variables can be written between the head of the
algorithm and begin. We partition the declared items into the following classes: parameters, global variables, constants, local dynamic variables, and local static variables. The
classes are identified by the keywords param, global, const, local, and static. After each keyword,
one or more declarations of names of the corresponding class can be placed.
A constant declaration has the form

name = value

a variable declaration has the form

name: type

As types we use the elementary types of Pascal and Modula-2 with the following keywords
or structures:

integer
real
boolean
char
(red, green, blue)
array (index:index) of type

Array types allow a certain amount of freedom. If the range limits are not needed, we write

array of type

If the type is not needed, we write

array (index:index)

If neither is needed, we simply write

array

As an example of the use of declarations, we describe a linear search algorithm with
declarations of all names:

Search (↓list ↓length ↓x ↑i):
param list: array of integer
      length, x, i: integer
local j: integer
begin
  j := length
  while j > 0 & list(j) <> x do
    j := j - 1
  end
  i := j
end Search
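For comparison, here is a Python transcription of Search. Adele's list(1) .. list(length) is mapped onto Python's 0-based list, and the output parameter i becomes the return value; both are choices made for this sketch. Python's and is also evaluated short-circuit, which is what keeps list(0) from being accessed when j reaches 0 (without it, lst[-1] would even silently read the last element in Python).

```python
def search(lst, length, x):
    """Return i in 1..length with lst(i) = x (Adele numbering), or 0 if x is absent."""
    j = length
    # Short-circuit 'and': lst[j - 1] is not evaluated once j == 0.
    while j > 0 and lst[j - 1] != x:
        j -= 1
    return j


print(search([4, 7, 9], 3, 7))   # 2
```

Like the Adele version, the search runs from the end of the list, so for duplicates the last occurrence is found.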

For static variables, we allow optional initialization. This is done by adding the phrase
init(value) after the type:

static finished: boolean init(false)

Comments
Comments, like those in Ada, start with two minus signs and extend over the rest of the
line.

-- This is a comment
-- which extends over two lines.

Undefined issues
Adele has no rules for the remaining items such as records, pointers, modules, etc. We write
them, more or less, in the style of Modula-2.

B
Modula-2 and Pascal

Since Modula-2 evolved from Pascal, its appearance is very similar to Pascal, and so Pascal
programmers have no difficulty in reading Modula-2 programs. Here we will briefly present
the most important differences for the reader of the Modula-2 programs in this book. The
complete language definition and examples can be found in the books of Wirth [1982] and
Pomberger [1986]. A didactically emphasized introduction to Modula-2 is the book of
Blaschek, Pomberger, and Ritzinger [1985].

General characteristics
Modula-2 is a system implementation language that extends Pascal with the following key
features:

1. Modular program structure. Modula-2 programs are composed of separately compiled
   modules. The compiler checks the consistency of the interface between modules. The
   language is therefore especially suited for the implementation of data capsules and
   abstract data types.
2. Coroutines and parallel processes. Modula-2 provides the coroutine facility as the
   basic element for the implementation of parallel processes.
3. Low-level features. Modula-2 provides facilities to bypass strong type checking so
   that memory words can be accessed directly and addresses can be handled. This makes it
   possible to produce machine-specific code.

We will not describe parallel processing or low-level features in this appendix since Coco does
not use them.
Lexical elements
Modula-2 differs from most Pascal implementations by its sensitivity to the case of letters.
The names TRUE, True, and true denote three different objects.
Single character constants can be denoted by an octal number followed by the letter 'C',
e.g. CONST ff = 14C.
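The octal digits give the character code, so 14C is character number 12, which in ASCII is the form feed. The Python function char_constant below is a hypothetical helper that decodes such constants, assuming an ASCII-compatible character set.

```python
def char_constant(literal):
    """Decode a Modula-2 character constant written as octal digits followed by 'C'."""
    digits = literal[:-1]
    if not literal.endswith("C") or not digits or any(d not in "01234567" for d in digits):
        raise ValueError("not an octal character constant: " + literal)
    return chr(int(digits, 8))


print(repr(char_constant("14C")))   # '\x0c' (form feed)
```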


Declarations
In contrast to Pascal, constant, type, variable, and procedure declarations can appear in any
order. There are no labels or label declarations.
Standard types. In addition to the standard types of Pascal (INTEGER, REAL,
BOOLEAN, CHAR), there is the standard type CARDINAL for unsigned natural
numbers. For 16-bit implementations, the range of integer values is -32 768 to +32 767,
and the range of cardinal values is 0 to 65 535.
Enumeration, subrange, array, record, and pointer types are the same as in Pascal with
the exception that arrays cannot be packed, and variant record types have an improved syntax.
If the word length of the computer is w bits, then the cardinality of set types is
confined to w, or a small multiple thereof (according to the language definition). There is a
standard type BITSET that consists of the elements 0 through w - 1:

TYPE BITSET = SET OF [0..w-1]

Set constants are enclosed in '{' and '}'.

The machine-dependent type WORD denotes arbitrary data whose length is a
machine word. It is compatible with all types whose length is a machine word.
Expressions
Expressions in Modula-2 are constructed in the same way as in Pascal, and the operators have
essentially the same meaning. One important difference is that expressions that
contain the operators AND or OR are interpreted as conditional expressions whose
evaluation is terminated as soon as the result of the expression is known (short-circuit
evaluation):

a AND b   is equivalent to   if a then b else false
a OR b    is equivalent to   if a then true else b

Statements
Assignment, procedure call, and repeat statement are taken from Pascal without change.
If, case, while, and for statements have been syntactically improved and expanded. The
if statement can have one or more elsif parts, and the case statement can have an else part. All
of these constructs are explicitly terminated by END, which eliminates the need to
distinguish between single and multiple statements in a block:

ifstatement    = IF expr THEN statementsequence
                 {ELSIF expr THEN statementsequence}
                 [ELSE statementsequence]
                 END.
casestatement  = CASE expr OF case {"|" case}
                 [ELSE statementsequence] END.
case           = caselabellist ":" statementsequence.
whilestatement = WHILE expr DO statementsequence END.
forstatement   = FOR ident ":=" expr TO expr [BY constexpr]
                 DO statementsequence END.


New features are the loop statement (infinite loop), the exit statement to leave the loop
statement, and the return statement to leave a procedure or function (in the case of a
function, passing the function value):

loopstatement   = LOOP statementsequence END.
exitstatement   = EXIT.
returnstatement = RETURN [expr].

There is no goto statement and no input-output statement in Modula-2. Input and output are
done by procedure calls.
Procedures
There are procedures and function procedures as in Pascal that permit value and VAR parameters. Procedures and functions both begin with the keyword PROCEDURE. Modula-2
permits procedure variables (not used by Coco), and arrays of unspecified length (so-called
open arrays), e.g. in the form:

PROCEDURE Sort (VAR list: ARRAY OF INTEGER);
  VAR n: INTEGER;
BEGIN
  n := HIGH(list);  (* standard proc. to find upper limit of index *)
  (* assume list: ARRAY [0..n] OF INTEGER *)
END Sort
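The body of Sort is not shown here. As an illustration only, the Python sketch below sorts an open array in place, with high = len(lst) - 1 playing the role of HIGH(list); the choice of straight insertion as the sorting method is ours, not the authors'.

```python
def sort(lst):
    """Sort an 'open array' lst[0..high] in place by straight insertion."""
    high = len(lst) - 1              # corresponds to HIGH(list)
    for i in range(1, high + 1):
        x = lst[i]
        j = i
        while j > 0 and lst[j - 1] > x:   # shift larger elements one place up
            lst[j] = lst[j - 1]
            j -= 1
        lst[j] = x


a = [3, 1, 2]
sort(a)
print(a)   # [1, 2, 3]
```

As with an open array parameter, the procedure works for any length because it asks the array for its upper bound instead of receiving it as a type.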

Standard procedures. The standard procedures that differ from Pascal are:

CAP(ch): CHAR       converts from lower to upper case
HIGH(a): CARDINAL   returns the upper bound of array a
DEC(x)              decrease: x := x-1
DEC(x,n)            x := x-n
EXCL(s,i)           exclude element i from set s: s := s-{i}
HALT                terminate entire program
INC(x)              increase: x := x+1
INC(x,n)            x := x+n
INCL(s,i)           include element i in set s: s := s+{i}
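A BITSET is a machine word whose bit i is 1 exactly when i is an element of the set. The following Python sketch models this for an assumed word length of w = 16, with INCL, EXCL, and the membership test i IN s as bit operations on an integer; the function names are ours.

```python
W = 16  # assumed word length in bits


def check(i):
    if not 0 <= i < W:
        raise ValueError("element out of range 0..W-1")


def incl(s, i):
    """INCL(s, i): s := s + {i}"""
    check(i)
    return s | (1 << i)


def excl(s, i):
    """EXCL(s, i): s := s - {i}"""
    check(i)
    return s & ~(1 << i)


def member(s, i):
    """i IN s"""
    check(i)
    return (s >> i) & 1 == 1


def elements(s):
    """Elements of s in ascending order, as written in a set constant {...}."""
    return [i for i in range(W) if (s >> i) & 1]


s = incl(incl(0, 0), 5)
print(elements(s), s)   # [0, 5] 33
```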

Type transfer functions. Modula-2 offers the possibility of explicit type conversions by
so-called type transfer functions. Each type name can be used as a function with one
argument. For example, the type transfer function

CARDINAL(b)

denotes the bit pattern of b (without any conversion) but with type CARDINAL. The
context condition is that b must have the same number of bits as a CARDINAL.
Type transfer functions should be used with care since they make programs machine
dependent.
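The difference between reinterpreting and converting can be shown in Python with the standard struct module. This sketch assumes 16-bit two's-complement INTEGERs and CARDINALs; the byte order is irrelevant as long as packing and unpacking use the same one. The helper name cardinal_of_integer is hypothetical.

```python
import struct


def cardinal_of_integer(b):
    """Model of CARDINAL(b) for a 16-bit INTEGER b: reinterpret the two bytes
    of b as an unsigned number; no arithmetic conversion takes place."""
    raw = struct.pack("<h", b)          # 16-bit two's-complement bit pattern
    return struct.unpack("<H", raw)[0]  # same bits, read as unsigned


print(cardinal_of_integer(-1))   # 65535
```

The INTEGER -1 thus becomes the CARDINAL 65535, a result that depends on the machine's number representation.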
Modules
An executable Modula-2 program consists of one or more separately compiled modules. A
module is a collection of declarations and statements giving a higher-level unit. Module
boundaries are like a fence for names, which means that names declared inside a module are
unknown outside, and names declared outside a module are unknown inside. The programmer
can open the fence for selected names by an import list that contains all names that are
declared outside and are to be known inside the module and an export list that contains all
names that are declared inside the module and are to be known outside. Thus the access is
explicitly specified by the programmer and visible in the program text.
There are four kinds of modules: main modules, definition modules, implementation
modules, and inner modules.
Main modules are almost like Pascal programs. They consist of an import list,
declarations (of constants, types, variables, procedures, and inner modules), and statements:
programmodule = MODULE ident ";"
                {import}
                {declaration}
                BEGIN
                statementsequence
                END ident "."

Only the line {import} is different from Pascal. It references other separately compiled
modules, and causes these modules to be loaded. In the most common form

import = FROM ident IMPORT identlist ";"

ident is the name of the module to be loaded and identlist contains the names of the objects
exported by the loaded module for use in the declarations and statements of the importing
module. In the less common form

import = IMPORT identlist ";"

the identlist contains only the names of the modules that are to be loaded together with the
importing module.
Separately compilable modules that are not main programs consist of two separately
compiled parts, the definition module and the implementation module. The definition
module describes the interface of the module to its clients. All declared names are
automatically exported.

definitionmodule = DEFINITION MODULE ident ";"
                   {import}
                   {definition}
                   END ident "."

definition contains the declarations of the exported objects. Procedures are only specified by
their procedure heading (procedure name and parameters):

definition = CONST {constantdeclaration ";"}
           | TYPE {ident ["=" type] ";"}
           | VAR {variabledeclaration ";"}
           | PROCEDURE ident [formalparameters] ";".

The implementation module contains the declaration of the non-exported objects, the code
for all procedures, and the statements of the module:

implementationmodule = IMPLEMENTATION MODULE ident ";"
                       {import}
                       {declaration}
                       BEGIN
                       statementsequence
                       END ident "."

Definition and implementation modules exist in pairs and have the same name. The
definition module must be compiled before the implementation module. A module can be
compiled only if the definition modules of all of the imported modules have been compiled
before.
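This rule fixes a compilation order. One way to find such an order is a depth-first topological sort of the import relation; the Python sketch below illustrates that technique under our own assumptions (the function compile_order and the module names in the example are hypothetical). A cycle among definition modules leaves no valid order, and the sketch reports it.

```python
def compile_order(imports):
    """Order in which definition modules can be compiled.

    imports maps each module name to the names of the modules whose
    definition modules it imports. Every module appears after all modules
    it imports.
    """
    order, state = [], {}

    def visit(m):
        if state.get(m) == "done":
            return
        if state.get(m) == "active":
            raise ValueError("cyclic import among definition modules: " + m)
        state[m] = "active"
        for imported in sorted(imports.get(m, ())):
            visit(imported)
        state[m] = "done"
        order.append(m)

    for m in sorted(imports):
        visit(m)
    return order


# Hypothetical import relation between three definition modules.
print(compile_order({"Parser": {"Scanner", "Errors"}, "Scanner": {"Errors"}, "Errors": set()}))
# ['Errors', 'Scanner', 'Parser']
```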
Storage for local objects of separately compiled modules is allocated when the object
program is loaded, and remains allocated until the program terminates (static memory
allocation). The statement sequence of the implementation module is executed immediately
after loading the module, and therefore can be used for the initialization of data.
Inner modules are modules that are not separately compiled. They are like procedures
nested inside other modules or procedures. They can import and export.
moduledeclaration = MODULE ident ";"
                    {import}
                    [EXPORT [QUALIFIED] identlist ";"]
                    {declaration}
                    BEGIN
                    statementsequence
                    END ident.

Storage for local objects of inner modules is allocated when the procedure that contains the
inner module is activated, and released when the procedure returns to its caller. By calling the
surrounding procedure, the statements of the inner module are also executed.
There is a (fictitious) separately compiled module SYSTEM, provided by the
compiler, that gives access to low-level features. It exports types and related procedures
(including the type WORD). Each module that imports SYSTEM is therefore machine
dependent.

C
Syntax of Cocol
Keywords:                 Upper-case letters
Other terminal symbols:   Literals or lower-case letters
Nonterminal symbols:      Upper- and lower-case letters

Cocol         = GRAMMAR identifier
                [SEMANTIC DECLARATIONS {any}]
                [MACROS {SemMacroDef}]
                TERMINALS {Symbol [Attributes] [AliasName]}
                [PRAGMAS {Symbol [Attributes] [SemAction]}]
                NONTERMINALS {identifier [Attributes] [AliasName]}
                RULES {identifier [Attributes] "=" Expression "."}
                ENDGRAM.
Expression    = Term {"|" Term}.
Term          = Factor {Factor}.
Factor        = Symbol [Attributes]
              | EPS
              | ANY
              | SemAction
              | "(" Expression ")"
              | "[" Expression "]"
              | "{" Expression "}".
Attributes    = "<" ( OutAttributes
                    | InAttributes [";" OutAttributes] ) ">".
InAttributes  = IN ":" (identifier | number)
                {"," (identifier | number)}.
OutAttributes = OUT ":" identifier {"," identifier}.
SemAction     = SEM ( … | {any} ) ENDSEM.
SemMacroDef   = SEM identifier ":" {any} ENDSEM.
Symbol        = identifier | string.
AliasName     = ALIAS ":" Symbol.

D
G-code

T
sy
terminal
If the next input, symbol is sy, then recognize it, else report an error.
TA
sy adr
terminal with alternative
If the next input symbol is sy, then recognize it, else go to adr.
NT
sy
nonterminal
If the next input symbol is a valid start of the nonterminal sy, then enter
the
production of sy, else report an error.
NTA
sy adr
nonterminal with alternative
If the next input symbol is a valid start of the nonterminal sy, then enter the
production of sy, else go to adr.
NTS
sy sem
nonterminal with input attribute semantics
If the next input symbol is a valid start of the nonterminal sy, then execute the
semantic action sem (for input attribute assignment) and enter the production of sy,
else report an error.
NTAS
sy adr sem nont. with alternative and input attribute semantics
If the next input symbol is a valid start of the nonterminal sy, then execute the
semantic action sem (for input attribute assignment) and enter the production of sy,
else report an error.
ANY
any
Recognize the next input symbol.
ANYA
nradr
any with alternative
If the next input symbol is in the symbol set (any-set) denoted by nr, then recognize
it, else go to adr.
EPS

nr

epsilon (empty string)

If the next input symbol is in the successor set (eps-set) denoted by nr, then
recognize the empty string, else report an error.
EPSA
nradr
epsilon with alternative
If the next input symbol is in the successor set (eps-set) denoted by nr, then
recognize the empty string, else go to adr.

11

JMP
adr
Go to adr.

Jump

RET

return

Return from the production of a nonterminal.

129%

SEM
semantic action
Execute the semantic action with the number of the G-code instruction.
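To make the instruction set concrete, here is a small Python interpreter for most of these instructions (the semantic variants NTS, NTAS, and SEM and the set-driven ANYA are omitted). It is our own sketch, not Coco's parser: the tuple encoding of instructions, the dictionaries for start addresses, start sets and eps-sets, and the "eof" end marker are assumptions. For a deletable nonterminal, the start set passed in must also contain its successors.

```python
class GCodeError(Exception):
    """Raised where a G-code instruction says 'report an error'."""


def run_gcode(code, start, first, eps_sets, root, tokens):
    """Interpret T, TA, NT, NTA, EPS, EPSA, ANY, JMP and RET.

    code:     list of tuples such as ("TA", "a", 4) or ("RET",)
    start:    nonterminal -> address of its production in code
    first:    nonterminal -> set of symbols that validly start it
    eps_sets: number nr -> successor set used by EPS/EPSA
    Returns True if `root` was recognized and the whole input consumed.
    """
    pos = 0
    stack = []                        # return addresses of active productions

    def sym():                        # next input symbol; "eof" at end of input
        return tokens[pos] if pos < len(tokens) else "eof"

    if sym() not in first[root]:
        raise GCodeError(f"{root} cannot start with {sym()!r}")
    pc = start[root]
    while True:
        op, *args = code[pc]
        if op in ("T", "TA"):
            if sym() == args[0]:
                pos, pc = pos + 1, pc + 1      # recognize the terminal
            elif op == "TA":
                pc = args[1]                   # try the alternative
            else:
                raise GCodeError(f"{args[0]!r} expected at position {pos}")
        elif op in ("NT", "NTA"):
            if sym() in first[args[0]]:
                stack.append(pc + 1)           # enter the production of sy
                pc = start[args[0]]
            elif op == "NTA":
                pc = args[1]
            else:
                raise GCodeError(f"{args[0]} expected at position {pos}")
        elif op in ("EPS", "EPSA"):
            if sym() in eps_sets[args[0]]:
                pc += 1                        # recognize the empty string
            elif op == "EPSA":
                pc = args[1]
            else:
                raise GCodeError(f"unexpected {sym()!r} at position {pos}")
        elif op == "ANY":
            if sym() == "eof":
                raise GCodeError("unexpected end of input")
            pos, pc = pos + 1, pc + 1
        elif op == "JMP":
            pc = args[0]
        elif op == "RET":
            if not stack:                      # root production finished
                return pos == len(tokens)
            pc = stack.pop()
        else:
            raise GCodeError(f"unsupported instruction {op}")


# S = "a" S "b" | "c".
CODE = [
    ("TA", "a", 4),   # 0
    ("NT", "S"),      # 1
    ("T", "b"),       # 2
    ("RET",),         # 3
    ("T", "c"),       # 4
    ("RET",),         # 5
]
print(run_gcode(CODE, {"S": 0}, {"S": {"a", "c"}}, {}, "S", list("aacbb")))   # True
```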


E
Intermodular cross-reference list

The following list contains all names that are exported or imported by a module of the Coco
system as well as their data types. For every name, the first reference denotes the exporting
module and the other references the importing modules.
Allocate

PROC

(VAR

System,
alts

cocogen2,

ARRAY [1..10]

cocolex,
Attrtype

PROC

CloseFile

PROC

cocosyn

cocotst

s:Symbolset;

n:CARDINAL)

cocotst

cocogen,

cocogen2,

cocolst

coco,

cocosem

CARDINAL

PROC

PROC

cocogen,

cocogen2,

cocosem,

cocosym,

(nr:CARDINAL)

cocogen,

cocogen2,

cocosem,

(sy,nr:CARDINAL) : BOOLEAN

cocosym,


cocosem,

PROC

Errors,
CompleteAt

cocosem

m:Marklist)

PROC (f:File)
FileIO, coco,

cocolex,
CompErr

Errors

CARDINAL

cocosym,

(VAR

cocogen,
col

cocosyn,

cocosem

(VAR

cocosym,

Close

cocosym,

(term, nonterm, const)

cocogra,

ClearSet

OF

cocogen,

cocogen,
ClearMarkList

size:LONGINT)

cocogen2,

CARDINAL

cocogra,
at

ptr:ADDRESS;

cocogen,

cocosem

cocosym

cocosyn

ConcatLeft

File
FileIO, coco, cocogen, cocogen2, cocogra,
cocosym, cocosyn, cocotst, Errors
PROC

(VAR

PROC (VAR gp,gl,gp1,gl1:CARDINAL)
cocogra, cocosem

Copy

PROC (typ,col:CARDINAL)
cocogen, cocosem

CopyFramePart

PROC (VAR f1,f2:File;
cocogen, cocogen2

ddt

ARRAY ["A".."Z"]

cocolex,

OF

coco,

s:ARRAY

OF CHAR)

BOOLEAN

cocogra,

cocosem,

PROC (VAR ptr:ADDRESS)
System, cocogen, cocogen2,
PROC

cocosem,

cocosem

ConcatRight

Deletable

cocolex,

gp,gl,gp1,gl1:CARDINAL)

cocogra,

Deallocate


cocosym,

cocotst

Errors

(loc:CARDINAL) : BOOLEAN

cocogra,

cocosym,

cocotst

DeleteRedundantEps
PROC
cocogra, coco
DelNode

Direction

PROC (gn:Graphnode) : BOOLEAN
cocogra, cocosym, cocotst

(up, down)
cocosym,

cocosem

Done

BOOLEAN

EF

CONST

EmitAction

PROC (line:CARDINAL;
cocogen, cocosem

EOL

CONST

FileIO,
FileIO,

cocogen,

cocogen2

cocolex,

cocolst
VAR

cocolex,

cocolst

RECORD

Errors,

cocosyn

Errorptr

POINTER
Errors,

TO Errornode
cocolst, cocosyn

File

RECORD
FileIO,

coco,

BOOLEAN
cocogen,

coco

FindCircularRules
PROC
cocotst,

(VAR
coco

filesopen

FindDelSymbols

sem:CARDINAL)

CHAR

FileIO,
Errornode

coco,

CHAR

PROC
cocosym,

coco

cocogen,

cocogen2,

ok:BOOLEAN)

cocolex,

cocolst,

Errors

GenAssign
GenSynFiles

GetA

cocosem

PROC
cocogen2,

coco

PROC

(n:CARDINAL;

GetF

PROC

GetFirstSet

PROC

set:Symbolset)

VAR

(sy:CARDINAL;

set:Symbolset)

VAR

set:Symbolset)

cocotst

(spix:CARDINAL;

cocosym,

VAR

sem:CARDINAL)

cocosem

(spix:CARDINAL;VAR

cocolex,

cocogen,

name:ARRAY

cocogen2,

PROC (VAR nr, line,col:CARDINAL)

GetNextSynErr

PROC

(VAR

GetNode

PROC

PROC

VAR

CHAR;VAR

cocosym,

cocogen2,
(VAR

VAR

gn:Graphnode)

cocosem,

synerrors,

cocosym,

cocotst

semerrors:CARDINAL)

GetSy

PROC

cocolex,

cocosyn

GetSy

PROC (sy:CARDINAL;
cocosym, cocogen2,

GetSymbolSets

PROC

cocosym,

coco

gramspix

CARDINAL
cocosym,

cocogen2,

PROC
cocogra,

cocosem

RECORD
cocogra,

cocogen2,

GraphList
Graphnode

InsertFramePart

PROC

cocogen,

cocosem

VAR sn:Symbolnode)
cocogra, cocosem, cocotst

cocosem

cocosem,

len:CARDINAL)

cocotst

line,col:CARDINAL)

cocolst

(p:CARDINAL;

cocogra,
GetNumberOfErrors

symbols:Errorptr;

OF

cocogra,

GetNextSemErr

Errors,

dir:Direction)

cocotst

cocosym,

PROC

VAR

VAR first:Symbolset)
cocotst

(loc:CARDINAL;

cocosym,

GetName

VAR

(sy:CARDINAL;
cocosym, cocogen2,

PROC

spix:CARDINAL;

cocosem

PROC (n:CARDINAL;
cocosym, cocogen2

PROC

set:Symbolset)
VAR

(sy,n:CARDINAL;

PROC

GetE

GetMacroNr

VAR

cocogen2

cocosym,

GetFo

left, right :CARDINAL)

(typ:Attrtype;

PROC

cocogen,

cocosym,

GetAt


cocosym,

cocotst


IsInSet

line

LL1Test


PROC

(n:CARDINAL;

cocosym,

cocotst

CARDINAL
cocolex,

cocogen,

PROC

(VAR

cocotst,

lst

PROC

PROC

cocogen2,

(loc:CARDINAL;

cocosym,

cocosym,

cocogen2
cocogen2

maxn

CARDINAL

maxp

CARDINAL

cocogra,

cocotst

m:Marklist):

16]

BOOLEAN

OF BITSET

cocogen2,

cocosym

cocogen2,

cocogra,

cocotst

cocosym,

cocogen2,

cocogra,

cocotst

CARDINAL
cocogen,

cocogen2

cocosym,
CARDINAL

CARDINAL

cocosym,
PROC

cocogen2,

NewEpsBeforeDelNts

dir:Direction)

cocosem

PROC

cocogra,
PROC

cocotst

(sy,spix:CARDINAL;

cocosym,

NewMacro

VAR

cocotst

cocosym,

cocosym,

cocotst
DIV

cocosym,

CARDINAL
cocosym,

NewAt

cocosyn

m:Marklist)

cocogra,

maxeps

maxt

VAR

ARRAY[0..maxnodes

CARDINAL

maxsem

cocosym,

cocotst

(loc:CARDINAL;

maxany

maxs

cocosem,

111:BOOLEAN)

coco,

cocogra,
Marklist

cocogen2,

BOOLEAN

File

cocogra,
Marked

s:Symbolset):

coco

cocolst,
Mark

VAR


coco

(spix,sem:CARDINAL;

cocosym,

VAR

ok:BOOLEAN)

cocosem

NewNode

PROC (typ:Symboltype;
sp, line:CARDINAL) :CARDINAL
cocogra, cocosem

NewSy

PROC (spix:CARDINAL;
cocosym, cocosem

normal

enumeration constant
System, coco, Errors

Open

PROC (VAR f:File;
output
: BOOLEAN)

FileIO,

coco,

typ:Symboltype) : CARDINAL

volRef: INTEGER;

cocogen,

cocogen2,

fn:ARRAY

cocolst

OF

CHAR;

OpenFile

PROC

(spix:CARDINAL)

cocogen,
OpenSem

PROC

cocosem

(line:CARDINAL;

cocogen,

VAR

sem:CARDINAL)

cocosem

Parse

PROC (VAR correct
:BOOLEAN)
cocosyn, coco

printinput

BOOLEAN
cocosyn,

PrintListing

coco,

coco

BOOLEAN

cocosyn,
PrintSynError

cocolex

PROC

cocolst,

printnodes


PROC

coco,

(VAR

cocolex

f:File;

VAR

synerrors:CARDINAL)

VAR

ch:CHAR)


PutStatistics

PROC

cocogen2,

Read

PROC

FileIO,

RepNode

PROC

coco

(VAR

f:File;

coco,

(p:CARDINAL;

cocogra,
RepSy

RestartHash

Restriction

PROC

cocogen,

PROC
cocolex,

cocosem

PROC

sn:Symbolnode)

cocogra,

rootloc

CARDINAL

rules

CARDINAL

cocogra,

cocogra,

cocogra,

cocolex,
cocosem,

cocogen2,

cocosem

PROC (sem:CARDINAL)
cocosem, cocosyn

SemErr

PROC

cocotst

cocosym,

cocosym
cocotst

(nr, line, col:CARDINAL)

Errors,

cocogen,

(VAR

cocosym,

src

cocosem,

cocogen2,

Semant

PROC

cocosem,

(nr:CARDINAL)

Errors,

SetBit

Errors

cocosym

(sy:CARDINAL;

cocogen2,

cocolst,

gn:Graphnode)

cocosem,

cocosym,

cocolex,

cocogen2,

cocolex,

s:Symbolset)

cocotst

File

cocolex,

coco,

cocogen,

StartCopy

PROC (col:CARDINAL)
cocogen, cocosem

StopHash

PROC

Symbolnode

RECORD

cocolex,

cocosem

cocolst

cocosem,

cocosym

cocosym,

Symbolset

cocogen2,

SyNr

PROC

cocogen2,

cocotst

OF BITSET

SyntaxError

PROC

PROC

coco,

(VAR

cocotst,

PROC

Errors

ok:BOOLEAN)

coco

PROC

cocotst,

(VAR

ok:BOOLEAN)

coco

(VAR

ok:BOOLEAN)

cocotst,

coco

CARDINAL
cocolex,

cocosem,

PROC

(VAR

cocosym,
PROC

cocosyn

sl,s2:Symbolset;

n:CARDINAL)

cocotst

(VAR

f:File;

FileIO, cocogen,
cocosym, Errors
PROC

line,col:CARDINAL)

(st:Status)

PROC

TestIfAllNtReached

cocotst

cocosyn

System,

TestCompleteness

cocosem,

cocosem

(symbols:Errorptr;

Errors,
Terminate

cocogra,

(spix:CARDINAL) : CARDINAL

cocosym,

WriteCard

cocosem,
16]

(eps,t,pr,nt,any,err)

cocosym,

typ

DIV

cocotst

cocogen2,

cocosym,
Symboltype

TestIfNtToTerm

cocogra,

ARRAY [0..maxterminals


(VAR

f:File;

ch:CHAR)

cocogen2,

cocolex,

nr:CARDINAL;

cocolst,

w: INTEGER)

FileIO, cocogen, cocogen2, cocogra, cocolex,
cocosem,
cocosym, cocosyn, cocotst, Errors
WriteInt

PROC (VAR f:File;
FileIO, coco

WriteLn

PROC (VAR f:File)
FileIO, coco, cocogen,
cocosyn,

WriteString

PROC

(VAR

FileIO,

cocosem,
WriteText

cocotst,

f:File;

coco,

nr:INTEGER;

w: INTEGER)

cocogen2,

cocogra,

cocolst,

cocosym,

cocogra,

cocolex,

cocolst,

Errors

s:ARRAY

cocogen,

cocosym,

cocolst,

OF CHAR)

cocogen2,

cocosyn,

cocotst,

Errors

PROC (VAR f:File; t:ARRAY OF CHAR; l: INTEGER)
FileIO, cocogen,
cocogen2, cocogra,
cocolex, cocosym,
cocotst, Errors
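A list of this kind can be produced mechanically from the import lists of the modules. The Python sketch below is a hypothetical illustration, not the tool used for this appendix: it assumes each module's FROM ... IMPORT lists are available as a dictionary, keys entries by (name, exporting module) because the same name may be exported by more than one module, and uses made-up sample data.

```python
def cross_reference(exports, imports):
    """Build an intermodular cross-reference list.

    exports: module -> set of names it exports
    imports: importing module -> {exporting module -> names imported from it}
    Returns a list of (name, exporting module, sorted importing modules).
    """
    refs = {}
    for exporter, names in exports.items():
        for name in names:
            refs.setdefault((name, exporter), set())
    for importer, sources in imports.items():
        for exporter, names in sources.items():
            for name in names:
                if (name, exporter) not in refs:
                    raise ValueError(f"{importer} imports {name}, which {exporter} does not export")
                refs[(name, exporter)].add(importer)
    ordered = sorted(refs.items(), key=lambda item: (item[0][0].lower(), item[0][1]))
    return [(name, exporter, sorted(importers)) for (name, exporter), importers in ordered]


# Made-up sample data, for illustration only.
EXPORTS = {"FileIO": {"File", "WriteLn"}, "cocolex": {"GetSy"}, "cocosym": {"GetSy"}}
IMPORTS = {"coco": {"FileIO": {"File"}},
           "cocosyn": {"cocolex": {"GetSy"}, "FileIO": {"File", "WriteLn"}}}
for name, exporter, importers in cross_reference(EXPORTS, IMPORTS):
    print(name, exporter, ", ".join(importers))
```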

F
Program listings

This appendix contains the program listings of Coco, more than 3500 lines of Modula-2
source code. It is not our intention to describe the program step by step. Rather, we
want to give the reader an overview of the function of the individual modules, and to
indicate where to start reading and which procedures to study further in order to
understand the program. Modula-2 programs are largely self-documenting, and the language
makes it possible to partition a large program into small modules that are easy to
understand, and furthermore to split these modules into even smaller procedures that are
again easy to understand. After reviewing the algorithms in Chapters 2, 3 and 7, the
reader should have no difficulty understanding all the details of Coco.

F.1 Overview

Figure F.1 shows the phases of Coco with their modules and the data flow between them.
The lexical analyzer (cocolex) reads the compiler description and separates it into
tokens. The syntax analyzer (cocosyn) checks the syntax of the input stream and drives the
semantic processing program (cocosem) by activating semantic actions via action numbers.
In this phase, the symbol list (in cocosym) and the top-down graph (in cocogra) are
generated. The module cocogen generates the new semantic evaluator from the semantic
actions of the compiler description. Finally, the symbol list and the top-down graph are
analyzed in the grammar tests (cocotst), and if these tests have been successfully completed,
the new syntax analyzer with its parser tables is generated.
Since Coco was constructed by itself, the syntax analyzer (cocosyn) and its semantic
evaluator (cocosem) are examples of compiler parts produced by Coco.


[Figure: data flow from the compiler description through lexical analysis (cocolex), syntax and semantic analysis (cocosyn, cocosem, cocosym, cocogra, cocogen), and compiler generation to the generated syntax analyzer; intermediate data: symbols, attributes, symbol list]

Fig. F.1 Phases and modules of Coco

F.2 Module hierarchy

Coco consists of

1. 10 Coco-related modules
   coco       main module
   cocolex    lexical analyzer
   cocosyn    syntax analyzer
   cocosem    semantic evaluator
   cocogra    top-down graph handler
   cocosym    symbol list handler
   cocotst    grammar tests
   cocogen    generator of the new semantic evaluator
   cocogen2   generator of the new syntax analyzer and the parser tables
   cocolst    source list generator

2. 2 general-purpose standard modules
   Errors     general error module for compilers generated by Coco
   FileIO     input/output procedures

3. 1 operating system module (not part of Coco)
   System     dynamic memory management (heap)

Figure F.2 shows the module hierarchy. An arrow from module A to module B means that A calls B.
For simplicity, arrows leading to the operating system module and to the standard modules are not shown. These modules are used by almost all other modules and are not a direct part of Coco.

[Figure: module hierarchy of Coco, showing cocogen, cocogra, cocosym, cocolex, System, FileIO and Errors among others]

Fig. F.2 Module hierarchy with relation 'uses procedures from'

F.3 Module descriptions

We now give a short description of each module of the Coco system. For each module, a diagram shows which procedures it calls from other modules.
coco

coco is the main module. It opens the source file and the list file and calls the syntax analyzer (Parse). When syntax analysis is complete, the source file has been read and the symbol list and the top-down graph have been built. The top-down graph is then processed further by inserting and deleting eps-nodes at certain positions (NewEpsBeforeDelNts, DelRedundantEps), and the terminal start symbols are collected (FindDelSymbols, GetSymbolSets). After that, coco calls the grammar tests (FindCircularRules, TestIfNtToTerm, TestCompleteness, TestIfAllNtReached, LL1Test) and, if no errors are found, generates the target compiler (GenSynFiles). At the end, statistics about the compilation are written to the list file (PutStatistics), and all files are closed.

[Figure: procedures called by coco: FindDelSymbols, GetSymbolSets, NewEpsBeforeDelNts, DelRedundantEps, GenSynFiles, PutStatistics, CloseFile, FindCircularRules, TestIfNtToTerm, TestCompleteness, TestIfAllNtReached, LL1Test]

Fig. F.3 coco and the modules imported by it
cocolex
cocolex is the lexical analyzer of Coco. It reads the Cocol input, separates it into tokens, and passes them, together with their attributes, to the syntax analyzer. Names and strings are stored in a name list, and numbers are converted into their numeric values. The main procedure of cocolex is GetSy.
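A name list of this kind can be pictured as a table that maps each spelling to a small index. The attributed grammar calls this index spix. The Python class below is hypothetical. In particular, the assumption that `enter` returns 0 while hashing is stopped only illustrates the purpose of StopHash and RestartHash, which is to keep copied semantic text out of the name list; this behavior is not taken from cocolex.

```python
from typing import Dict, List


class NameList:
    """Hypothetical name list: every distinct spelling gets an index (spix)."""

    def __init__(self) -> None:
        self._spix_of: Dict[str, int] = {}
        self._names: List[str] = [""]  # spix 0 is reserved for "no name"
        self.hashing = True

    def enter(self, spelling: str) -> int:
        """Return the spix of spelling, entering it on its first occurrence."""
        if not self.hashing:
            return 0  # assumption: nothing is entered while hashing is stopped
        spix = self._spix_of.get(spelling)
        if spix is None:
            spix = len(self._names)
            self._names.append(spelling)
            self._spix_of[spelling] = spix
        return spix

    def get_name(self, spix: int) -> str:
        return self._names[spix]

    def stop_hash(self) -> None:
        self.hashing = False

    def restart_hash(self) -> None:
        self.hashing = True


names = NameList()
a = names.enter("expr")
b = names.enter("term")
names.stop_hash()
skipped = names.enter("copied text")  # not entered while hashing is stopped
names.restart_hash()
print(a, b, names.enter("expr"), skipped)  # 1 2 1 0
```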

cocosyn
cocosyn is the syntax analyzer of Coco and was generated by Coco itself. It works according to the table-driven LL(1) parsing algorithm described in Section 2.5 and uses the error-handling mechanism described in Section 2.6. cocosyn gets the source tokens from the lexical analyzer (GetSy), analyzes them, and calls the procedure Semant to execute the semantic actions.

Fig. F.4 cocosyn and the modules imported by it
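The table-driven LL(1) technique can be illustrated with a few lines of Python. The sketch below is a generic textbook formulation with an explicit stack and a hand-written table for the toy grammar E = T E'. E' = "+" T E' | eps. T = "x". It is not Coco's parser, which works with tables that cocogen2 derives from the top-down graph, and it omits the error recovery of Section 2.6.

```python
from typing import Dict, List, Sequence, Tuple

# Parse table for the toy grammar  E = T E'.  E' = "+" T E' | eps.  T = "x".
# Key: (nonterminal on top of the stack, lookahead token) -> right-hand side.
TABLE: Dict[Tuple[str, str], List[str]] = {
    ("E", "x"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", "$"): [],  # the eps alternative is chosen at the end of the input
    ("T", "x"): ["x"],
}
NONTERMINALS = {"E", "E'", "T"}


def parse(tokens: Sequence[str]) -> bool:
    """Return True if tokens form a sentence of the toy grammar."""
    tokens = list(tokens) + ["$"]  # "$" marks the end of the input
    stack = ["$", "E"]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                return False  # no table entry: syntax error
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == look:
            pos += 1  # terminal (or "$") matches the lookahead
        else:
            return False
    return pos == len(tokens)


print(parse(["x", "+", "x"]))  # True
print(parse(["x", "x"]))       # False
```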

cocosem
cocosem is the semantic evaluator of Coco. It was generated by Coco itself and contains the semantic actions of Coco's attributed grammar. cocosem calls procedures of the other modules for the following tasks:
1. symbol handling: NewSy, GetSy, RepSy, SyNr;
2. attribute handling: NewAt, GetAt, CompleteAt;
3. top-down graph handling: NewNode, GetNode, RepNode, ConcatLeft, ConcatRight, GraphList;
4. generation of the semantic evaluator: OpenFile, CloseFile, OpenSem, StartCopy, Copy, InsertFramePart, GenAssign, EmitAssign, EmitAction;
5. handling of the semantic macros: NewMacro, GetMacroNr;
6. control over the entries into the name list: StopHash, RestartHash.

The listing of cocosem is an example of a large semantic evaluator generated by Coco. Studying cocosem itself is not very useful, however; one should study the attributed grammar instead.

[Figure: procedures imported by cocosem, including OpenFile, CloseFile, Copy, InsertFramePart, StartCopy, ConcatRight, GraphList, OpenSem, CompleteAt, GenAssign, NewMacro, GetMacroNr and EmitAction]

Fig. F.5 cocosem and the modules imported by it

cocosym
The module cocosym handles the symbol list of Coco. It contains procedures to generate,
read, and modify symbol nodes, to search names in the symbol list, to enter, read, and check
attributes, and to generate and retrieve information about semantic macros. It also contains
procedures to determine the deletability of nonterminals, and to collect their terminal start
symbols. cocosym uses a few procedures from cocolex and cocogra.
[Figure: procedures imported by cocosym, including ClearMarkList, Mark and Marked]

Fig. F.6 cocosym and the modules imported by it
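Determining deletability and terminal start symbols are classic fixed-point computations. The following Python sketch works on a hypothetical grammar representation: a dict from each nonterminal to its list of alternatives, where each alternative is a list of symbols. It does not use Coco's symbol list and top-down graph, so it only illustrates the idea behind these cocosym procedures.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]  # nonterminal -> alternatives

# Hypothetical example:  A = B "x" | "y".  B = C | eps.  C = "z".
GRAMMAR: Grammar = {
    "A": [["B", "x"], ["y"]],
    "B": [["C"], []],
    "C": [["z"]],
}


def deletable_set(g: Grammar) -> Set[str]:
    """Nonterminals that can derive the empty string."""
    dele: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for nt, alts in g.items():
            # Terminals are never in dele, so only all-deletable alternatives count.
            if nt not in dele and any(all(s in dele for s in alt) for alt in alts):
                dele.add(nt)
                changed = True
    return dele


def start_symbols(g: Grammar, dele: Set[str]) -> Dict[str, Set[str]]:
    """Terminal start symbols of every nonterminal."""
    first: Dict[str, Set[str]] = {nt: set() for nt in g}
    changed = True
    while changed:
        changed = False
        for nt, alts in g.items():
            for alt in alts:
                for s in alt:
                    new = first[s] if s in g else {s}
                    if not new <= first[nt]:
                        first[nt] |= new
                        changed = True
                    if s not in dele:
                        break  # later symbols cannot start this alternative
    return first


dele = deletable_set(GRAMMAR)
print(sorted(dele))                               # ['B']
print(sorted(start_symbols(GRAMMAR, dele)["A"]))  # ['x', 'y', 'z']
```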

cocogra
The module cocogra handles the top-down graph. It contains procedures to generate, read, and modify graph nodes, to link subgraphs, and to print the entire top-down graph for tracing. cocogra also contains procedures to insert eps-nodes in front of deletable nonterminals and to remove redundant eps-nodes. To print the top-down graph, cocogra needs the syntax symbols and their names, which it gets from the modules cocosym and cocolex.
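Linking subgraphs is easiest to see in a small model. The Python sketch below is hypothetical. It represents a graph by its start node and a list of open ends, which are nodes whose successor is still missing. Sequences are linked in the way the attributed grammar uses ConcatRight, and alternatives in the way it uses ConcatLeft. The node layout and the list representation of open ends are assumptions, not cocogra's data structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity; graphs may contain cycles
class Node:
    sym: str
    next: Optional["Node"] = None  # successor in a sequence
    alt: Optional["Node"] = None   # next alternative


@dataclass
class Graph:
    start: Node
    open_ends: List[Node] = field(default_factory=list)  # nodes without successor yet


def leaf(sym: str) -> Graph:
    node = Node(sym)
    return Graph(node, [node])


def concat_right(g1: Graph, g2: Graph) -> Graph:
    """Sequence g1 g2: every open end of g1 continues with the start of g2."""
    for node in g1.open_ends:
        node.next = g2.start
    return Graph(g1.start, list(g2.open_ends))


def concat_left(g1: Graph, g2: Graph) -> Graph:
    """Alternatives g1 | g2: append g2 to the alternative chain of g1."""
    node = g1.start
    while node.alt is not None:
        node = node.alt
    node.alt = g2.start
    return Graph(g1.start, g1.open_ends + g2.open_ends)


# ("a" | "b") "c"
g = concat_right(concat_left(leaf("a"), leaf("b")), leaf("c"))
print(g.start.sym, g.start.alt.sym, g.start.next.sym, g.start.alt.next.sym)  # a b c c
```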

[Figure: procedures imported by cocogra, including RepSy]

Fig. F.7 cocogra and the modules imported by it

cocogen
The module cocogen generates the semantic evaluator of the target compiler from the semantic declarations and semantic actions of the input grammar. It contains procedures to read the frame module, to copy the semantic parts from the attributed grammar, and to translate attributes into semantic actions. cocogen uses no other module of Coco except the lexical analyzer, from which it gets the symbol names.

Fig. F.8 cocogen and the modules imported by it
cocotst
The module cocotst is a collection of procedures that perform the grammar tests described in Section 7.5. It uses the symbol list (from cocosym) and the top-down graph (from cocogra). For error messages, cocotst needs the symbol names, which it obtains with the procedure GetName. To determine the deletability of graph nodes and subgraphs, it uses the procedures Deletable and DelNode from cocogra.

[Figure: procedures imported by cocotst, including Deletable, DelNode, ClearMarkList, Mark and Marked]

Fig. F.9 cocotst and the modules imported by it
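The core of an LL(1) test can be sketched as a check that the alternatives of each nonterminal have disjoint start sets. A deletable alternative also contributes the symbols that may follow the nonterminal. The Python sketch is hypothetical: the start sets, the set of deletable nonterminals and the follower sets are assumed to be computed beforehand. It reproduces neither the messages nor the graph-based procedures of cocotst.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def seq_start(seq: List[str], first: Dict[str, Set[str]], dele: Set[str]) -> Tuple[Set[str], bool]:
    """Start symbols of a symbol sequence, and whether the sequence is deletable."""
    result: Set[str] = set()
    for s in seq:
        result |= first[s] if s in first else {s}  # a terminal starts with itself
        if s not in dele:
            return result, False
    return result, True


def ll1_conflicts(grammar: Dict[str, List[List[str]]],
                  first: Dict[str, Set[str]],
                  dele: Set[str],
                  follow: Dict[str, Set[str]]) -> List[Tuple[str, int, int, Set[str]]]:
    """Return (nonterminal, alternative i, alternative j, common symbols) per conflict."""
    conflicts = []
    for nt, alts in grammar.items():
        sets = []
        for alt in alts:
            start, nullable = seq_start(alt, first, dele)
            sets.append(start | follow[nt] if nullable else start)
        for i, j in combinations(range(len(sets)), 2):
            common = sets[i] & sets[j]
            if common:
                conflicts.append((nt, i, j, common))
    return conflicts


# S = "a" "b" | "a" "c".  Both alternatives start with "a".
print(ll1_conflicts({"S": [["a", "b"], ["a", "c"]]}, {"S": {"a"}}, set(), {"S": {"$"}}))
# [('S', 0, 1, {'a'})]
```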
cocogen2

The module cocogen2 generates the syntax analyzer and the parser tables of the target compiler. The table values are obtained from the symbol list (with GetSy, RepSy, GetF, GetE, and GetA) and from the top-down graph (GetNode). Before the tables can be inserted into the syntax analyzer, cocogen2 transforms the top-down graph into G-code instructions. The syntax analyzer of the target compiler is assembled mainly from the frame parts (in the file cocosynframe), into which cocogen2 inserts the parser tables, some declarations, and grammar-specific names. For the output of statistics, cocogen2 uses the procedure GetName from the lexical analyzer.

[Figure: modules and procedures imported by cocogen2, including cocogen and CopyFramePart]

Fig. F.10 cocogen2 and the modules imported by it
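Assembling a program from frame parts amounts to copying a frame file and replacing marked positions with generated text. The Python function below is a hypothetical sketch. The marker syntax `-->name`, the function name and the example frame are assumptions made for illustration and do not reproduce the format of cocosynframe.

```python
from typing import Dict


def fill_frame(frame: str, parts: Dict[str, str]) -> str:
    """Copy frame line by line, replacing each marker line '-->name' by parts[name]."""
    out = []
    for line in frame.splitlines():
        stripped = line.strip()
        if stripped.startswith("-->"):
            out.append(parts[stripped[3:].strip()])  # KeyError if a part is missing
        else:
            out.append(line)
    return "\n".join(out)


FRAME = """MODULE Parser;
-->tables
BEGIN
-->init
END Parser."""

result = fill_frame(FRAME, {"tables": "(* generated parser tables *)",
                            "init": "  (* generated initialization *)"})
print(result)
```

Keeping the fixed text in a frame file means the generator only has to produce the grammar-specific parts.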
cocolst
cocolst is called by the main program if errors have been detected during parsing. It reads
the input again and prints a source list with error messages.
Errors
Errors is a general-purpose error message module that can be used by all compilers
generated by Coco. It contains procedures for storing semantic and syntax errors, for
retrieving stored error messages, and for printing all of the stored error messages at the end of
the program. In addition, it contains procedures for handling implementation restrictions and
compiler errors.
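Storing errors during parsing and printing them afterwards, interleaved with the source text as cocolst does, can be sketched as follows. The class, its method names and the listing layout are hypothetical. Only the idea of recording an error number with its line and column comes from the text, where the attributed grammar calls SemErr(nr, line, col).

```python
from typing import Dict, List, Tuple


class ErrorList:
    """Hypothetical error store: errors are recorded now and listed later."""

    def __init__(self) -> None:
        self.errors: List[Tuple[int, int, int]] = []  # (line, col, nr)

    def sem_err(self, nr: int, line: int, col: int) -> None:
        self.errors.append((line, col, nr))

    def listing(self, source: str, messages: Dict[int, str]) -> List[str]:
        """Source lines, each followed by a marker under every erroneous column (1-based)."""
        out = []
        errors = sorted(self.errors)
        for lineno, text in enumerate(source.splitlines(), start=1):
            out.append(f"{lineno:4} {text}")
            for line, col, nr in errors:
                if line == lineno:
                    # The line-number prefix is 5 characters wide, so column c is at index 4 + c.
                    out.append(" " * (4 + col) + f"^ {nr}: {messages.get(nr, 'error')}")
        return out


errs = ErrorList()
errs.sem_err(3, 2, 3)  # error 3 at line 2, column 3
rows = errs.listing("RULES\n  xy = z .", {3: "example message"})
for row in rows:
    print(row)
```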

FileIO
FileIO is a general-purpose module that contains screen and disk I/O procedures for
characters, strings, and numbers. It is based on five system modules which are not described
in this book. These are Terminal, MemTypes, OS, Toolbox and QuickDraw (see Inside
Macintosh [1985] and Wirth et al. [1986]).
System
System is an operating system module that, among other things, manages the heap.

F.4 Instructions on how to study the source code

The listings consist of the attributed grammar of Coco followed by all other modules in alphabetical order. The reader should first study the source code of the main module coco to see how the program is started and initialized. The lexical analyzer and the syntax analyzer are not essential for an understanding of the other modules, so they may be skipped at first.
The central document that describes the actual translation is the attributed grammar. The reader should study it in detail, together with the procedures called from its semantic actions. Procedures that belong to the same task should be studied together. These tasks are:

- handling the symbol list: NewSy, GetSy, RepSy, IsSy
- handling the attributes: NewAt, GetAt, CompleteAt
- handling the top-down graph: NewNode, GetNode, RepNode, ConcatLeft, ConcatRight, GraphList
- generating the semantic evaluator: CloseFile, CopyFramePart, InsertFramePart
- copying semantic parts: OpenSem, StartCopy, Copy
- generating attribute assignments: GenAssign, EmitAction
- handling semantic macros: NewMacro, GetMacroNr
- controlling the name list entries: StopHash, RestartHash
The procedures for the collection of the symbol sets and the execution of the grammar tests
may be studied in any order. The only procedures used almost everywhere are the procedures
for marking paths that have been previously visited in traversing the top-down graph
(ClearMarkList, Mark, and Marked in cocogra) and the procedures which check the
deletability of graphs and graph nodes (Deletable and DelNode in cocogra). These
procedures should be read first.
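The marking procedures ensure that no part of the graph is visited more than once. The following Python sketch is a hypothetical analogue, used here for a reachability check in the spirit of TestIfAllNtReached. The grammar representation and all names are assumptions, and cocotst works on the top-down graph rather than on lists of alternatives.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]


class MarkList:
    """Hypothetical counterpart of ClearMarkList, Mark and Marked."""

    def __init__(self) -> None:
        self._marked: Set[str] = set()

    def clear(self) -> None:
        self._marked.clear()

    def mark(self, item: str) -> None:
        self._marked.add(item)

    def marked(self, item: str) -> bool:
        return item in self._marked


def unreached(g: Grammar, root: str, marks: MarkList) -> Set[str]:
    """Nonterminals that cannot be reached from the start symbol root."""
    marks.clear()

    def visit(nt: str) -> None:
        if marks.marked(nt):
            return  # already visited: prevents endless recursion on cyclic rules
        marks.mark(nt)
        for alt in g[nt]:
            for s in alt:
                if s in g:
                    visit(s)

    visit(root)
    return {nt for nt in g if not marks.marked(nt)}


G: Grammar = {"S": [["A", "x"]], "A": [["y", "S"], []], "B": [["z"]]}
print(unreached(G, "S", MarkList()))  # {'B'}
```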
The reader should study cocogen2 last. It generates the parser tables and the syntax analyzer from the data structures built by the other modules, so those modules should be studied first to understand how the data structures are filled.
Before an implementation module is studied, the corresponding definition module should be inspected. It describes the interface of the module and contains the declarations and descriptions of all exported objects. The procedures of an implementation module appear in alphabetical order. Most of them are at the outermost level of the module; only auxiliary procedures that clearly belong to another procedure are nested within it.
Each implementation module is followed by a cross-reference list. As an additional aid, Appendix E contains an intermodular cross-reference list with the names and types of all objects transferred between modules. This list also shows which modules export an object and which import it.

Program listings in alphabetical order

coco.ATG                    attributed grammar                  228
coco.MOD                    main program                        241
cocogen.DEF, cocogen.MOD    generator of semantics processor    245
cocogen2.DEF, cocogen2.MOD  generator of the syntax analyzer    254
cocogra.DEF, cocogra.MOD    top-down graph manager              266
cocolex.DEF, cocolex.MOD    lexical analyzer                    274
cocolst.DEF, cocolst.MOD    source list generator               283
cocosem.DEF, cocosem.MOD    semantic evaluator of Coco          287
cocosemframe                semantics evaluator frame           297
cocosym.DEF, cocosym.MOD    symbol list manager                 299
cocosyn.DEF, cocosyn.MOD    syntax analyzer                     316
cocosynframe                syntax analyzer frame               328
cocotst.DEF, cocotst.MOD    grammar tests                       338
Errors.DEF, Errors.MOD      standard error module               348
FileIO.DEF, FileIO.MOD      input/output module                 356
System.DEF                  dynamic memory management           369

-- =====================================================================
--                     Attributed grammar of Coco
-- =====================================================================
-- This grammar is a documentation of the compiler compiler Coco,
-- but it is also an example how to use the Coco input language Cocol.
-- The grammar describes the construction of the parser tables and of
-- the semantic evaluator.                                      13.3.83

GRAMMAR coco

--   coco      = GRAMMARSY IDENT [SEMANTICSY DECLARATIONSY {any}]
--               [MACROSY {macrodef}]
--               TERMINALSY {symbol [attr] [aliasname]}
--               [PRAGMASY {symbol [attr] [semaction]}]
--               NONTERMINALSY {IDENT [attr] [aliasname]}
--               RULESSY {IDENT [attr] '=' expr '.'}
--               ENDGRAMSY .
--   expr      = term {'|' term} .
--   term      = fact {fact} .
--   fact      = ( symbol [attr]
--               | EPSSY
--               | ANYSY
--               | semaction
--               | '(' expr ')'
--               | '[' expr ']'
--               | '{' expr '}' ) .
--   attr      = ( inattr [';' outattr] | outattr ) .
--   inattr    = INSY ':' (IDENT | NUMBER) {',' (IDENT | NUMBER)} .
--   outattr   = OUTSY ':' IDENT {',' IDENT} .
--   semaction = SEMSY ( '(' IDENT ')' | {any} ) ENDSEMSY .
--   macrodef  = SEMSY ':' IDENT ':' {any} ENDSEMSY .
--   symbol    = IDENT | STRING .
--   aliasname = ALIASSY symbol .

-- =====================================================================
SEMANTICS DECLARATIONS
-- =====================================================================

FROM cocogen IMPORT Attrtype, CloseFile, Copy, EmitAction, GenAssign,
                    InsertFramePart, OpenFile, OpenSem, StartCopy;
FROM cocogra IMPORT alts, rules, rootloc, ConcatLeft, ConcatRight,
                    GetNode, GraphList, Graphnode, NewNode, RepNode;
FROM cocolex IMPORT typ, line, col, ddt, RestartHash, StopHash;
FROM cocosym IMPORT gramspix, CompleteAt, Direction, GetAt, GetSy,
                    GetMacroNr, NewAt, NewMacro, NewSy, RepSy,
                    Symbolnode, Symboltype, SyNr;
FROM Errors  IMPORT CompErr, Restriction, SemErr;
FROM SYSTEM  IMPORT VAL;

CONST null = 65535;                       -- null symbol

TYPE  Usage = (def, check, use);

VAR
  -- symbol nodes
  sn:                 Symbolnode;         -- symbol node
  sy, sy1:            CARDINAL;           -- symbol numbers
  rootsy:             CARDINAL;           -- start symbol of grammar
  eofsy:              CARDINAL;           -- endfile symbol (always Nr. 0)
  -- graph nodes
  gn:                 Graphnode;          -- graph node
  gp, gp1, gp2, gp3:  CARDINAL;           -- ptr to start of graphs
  gl, gl1, gl2, gl3:  CARDINAL;           -- ptr to right open ends of graphs
  dd, dd1, dd2:       BOOLEAN;            -- is graph deletable ?
  gpo:                CARDINAL;           -- auxiliary ptr
  firstfact:          BOOLEAN;            -- TRUE if first factor in term
  -- attribute processing
  kind:               Usage;              -- usage of attribute
  styp:               Symboltype;         -- (eps,t,pr,nt,any,err)
  dir, dir1:          Direction;          -- input/output attribute
  count:              CARDINAL;           -- attribute counter
  n:                  CARDINAL;           -- value of an attribute constant
  -- generation of semantic evaluator
  sem1, sem2, sem3:   CARDINAL;           -- semantic actions
  firstsymbol:        BOOLEAN;            -- current symbol the first in action ?
  -- various
  ok:                 BOOLEAN;            -- error indicator
  spix, spix1, dummy: CARDINAL;           -- auxiliaries

-- =====================================================================
-- SEMANTICSTACK              Stack to save semantic values
-- =====================================================================
MODULE SEMANTICSTACK;
  IMPORT CompErr, Restriction;
  EXPORT Pop, Push;
  CONST maxstacksize = 70;
  VAR
    stack: ARRAY [1..maxstacksize] OF CARDINAL;
    sp:    CARDINAL;

  PROCEDURE Pop(): CARDINAL;
    VAR x: CARDINAL;
  BEGIN
    IF sp=0 THEN CompErr(6);
    ELSE x:=stack[sp]; DEC(sp);
    END;
    RETURN x;
  END Pop;

  PROCEDURE Push(x: CARDINAL);
  BEGIN
    IF sp<maxstacksize
    THEN INC(sp); stack[sp]:=x;
    ELSE Restriction(14);
    END;
  END Push;

BEGIN
  sp:=0;
END SEMANTICSTACK;

-- =====================================================================
-- Report semantic error
-- =====================================================================
PROCEDURE Error(nr: CARDINAL);
BEGIN SemErr(nr,line,col); END Error;

sem :AssignId1:
  INC(count);
  CASE kind OF
    use:
      IF styp=nt THEN
        GetAt(!sy, !count, ^spix1, ^dir1);
        IF spix1<>0 THEN
          IF dir=dir1
          THEN GenAssign(!nonterm, !spix1, !spix);
          ELSE Error(8); END;
        END;
      END;
  | check:
      IF styp=nt THEN
        GetAt(!sy, !count, ^spix1, ^dir1);
        IF spix1<>0 THEN
          IF spix<>spix1 THEN Error(9); END;
          IF dir<>dir1 THEN Error(8); END;
        END;
      END;
  | def:
      NewAt(!sy, !spix, !dir);
  END; -- CASE
endsem

sem :AssignId2:
  INC(count);
  CASE kind OF
    use:
      IF styp=t THEN
        GenAssign(!term, !spix, !count);
      ELSIF styp=nt THEN
        GetAt(!sy, !count, ^spix1, ^dir1);
        IF spix1<>0 THEN
          IF dir=dir1
          THEN GenAssign(!nonterm, !spix, !spix1)
          ELSE Error(8);
          END;
        END;
      END;
  | check:
      IF styp=nt THEN
        GetAt(!sy, !count, ^spix1, ^dir1);
        IF spix1<>0 THEN
          IF spix<>spix1 THEN Error(9); END;
          IF dir<>dir1 THEN Error(8); END;
        END;
      END;
  | def:
      NewAt(!sy, !spix, !dir);
      IF styp=pr THEN
        GenAssign(!term, !spix, !count);
      END;
  END; -- CASE
endsem

sem :AssignNumber:
  INC(count);
  IF kind=use
  THEN
    IF styp=nt
    THEN
      GetAt(!sy, !count, ^spix1, ^dir1);
      IF spix1<>0 THEN
        IF dir=dir1
        THEN GenAssign(!const, !spix1, !n);
        ELSE Error(8);
        END;
      END;
    END;
  ELSE Error(10);
  END;
endsem

sem :CheckAttr:
  IF NOT CompleteAt(!sy, !count)
  THEN Error(6);
  END;
endsem

sem :Copy:
  Copy(typ, col)
endsem

sem :InitCopy:
  StartCopy(1)
endsem

sem :PopPointers:
  firstfact:=VAL(BOOLEAN, Pop());
  dd1:=VAL(BOOLEAN, Pop()); gl1:=Pop(); gp1:=Pop();
  dd:=VAL(BOOLEAN, Pop());  gl:=Pop();  gp:=Pop();
  gpo:=0
endsem

sem :PushPointers:
  Push(!gp);  Push(!gl);  Push(!VAL(CARDINAL, dd));
  Push(!gp1); Push(!gl1); Push(!VAL(CARDINAL, dd1));
  Push(!VAL(CARDINAL, firstfact));
endsem

sem :StoreSymbol:
  sy:=SyNr(!spix);
  IF sy=null
  THEN sy:=NewSy(spix, styp)
  ELSE Error(1);
  END;
endsem

-- =====================================================================
TERMINALS
-- =====================================================================

  -- key words
  ALIASSY        alias "ALIAS"          --  1: ALIAS
  ANYSY          alias "any"            --  2: ANY, any
  DECLARATIONSY  alias "DECLARATIONS"   --  3: DECLARATIONS
  ENDGRAMSY      alias "ENDGRAM"        --  4: ENDGRAM
  ENDSEMSY       alias "endsem"         --  5: ENDSEM, endsem
  EPSSY          alias "eps"            --  6: EPS, eps
  GRAMMARSY      alias "GRAMMAR"        --  7: GRAMMAR
  INSY           alias "in"             --  8: IN, in
  MACROSY        alias "MACROS"         --  9: MACROS
  NONTERMINALSY  alias "NONTERMINALS"   -- 10: NONTERMINALS
  OUTSY          alias "out"            -- 11: OUT, out
  PRAGMASY       alias "PRAGMAS"        -- 12: PRAGMAS
  RULESSY        alias "RULES"          -- 13: RULES
  SEMSY          alias "sem"            -- 14: SEM, sem
  SEMANTICSY     alias "SEMANTICS"      -- 15: SEMANTICS
  TERMINALSY     alias "TERMINALS"      -- 16: TERMINALS
  -- terminal classes
  IDENT   <out:spix>  alias identifier  -- 17: name
  STRING  <out:spix>                    -- 18
  NUMBER  <out:n>                       -- 19

-- =====================================================================
NONTERMINALS
-- =====================================================================
  coco       alias "correct grammar"
             -- recognizes the whole compiler description
  expr       <out:gp,gl,dd>   alias expression
             -- recognizes an expression and builds its TDG.
             -- gp points to the root of the TDG
             -- gl points to right open ends of the TDG
             -- dd indicates if the TDG is deletable
  term       <out:gp1,gl1,dd1>   alias "single alternative"
             -- recognizes an alternative and builds its TDG.
             -- gp1 points to the root of the TDG
             -- gl1 points to right open ends of the TDG
             -- dd1 indicates if the TDG is deletable
  fact       <in:gpo,firstfact; out:gp2,gl2,dd2,gpo>   alias symbol
             -- recognizes a component and builds its TDG.
             -- gp2 points to the root of the TDG
             -- gl2 points to right open ends of the TDG
             -- dd2 indicates if the TDG is deletable
             -- gpo points to the predecessor of fact or is 0
             -- firstfact is TRUE, if fact is the first one in the term
  attr       <in:sy,styp,kind; out:sem1,sem2,count>   alias attribute
             -- recognizes input/output attributes for the symbol sy
             -- with type styp.
             -- kind=def:   used in declaration context
             --             sem1=0, sem2=0 (except of pragmas)
             -- kind=check: used on the left-hand side of rules
             --             sem1=0, sem2=0
             -- kind=use:   used on the right-hand side of rules
             --             sem1: sem.no. of input attribute evaluation
             --             sem2: sem.no. of output attribute evaluation
             -- count is the nr. of attributes in attr
  inattr     <in:sy,styp,kind,count; out:sem1,count>   alias "in-attribute"
             -- recognizes input/output attributes for the symbol sy
             -- with type styp (sy must be a nonterminal).
             -- kind=def:   used in declaration context
             --             sem1=0.
             -- kind=check: used on the left-hand side of rules
             --             sem1=0.
             -- kind=use:   used on the right-hand side of rules
             --             sem1: sem.no. of input attribute evaluation
             -- count is the no. of attributes in inattr
  outattr    <in:sy,styp,kind,count; out:sem2,count>   alias "out-attribute"
             -- recognizes input/output attributes for the symbol sy
             -- with type styp.
             -- kind=def:   used in declaration context
             --             sem2=0.
             -- kind=check: used on the left-hand side of rules
             --             sem2=0.
             -- kind=use:   used on the right-hand side of rules
             --             sem2: sem.no. of output attribute evaluation
             -- count is the no. of attributes in outattr
  semaction  <out:sem3>   alias "semantic action"
             -- recognizes a semantic action and generates a CASE block
             -- in Semant. sem3 is the action number.
  macrodef   alias "semantic macro"
  symbol     <out:spix>
             -- recognizes a name or a string
  aliasname  <in:sy>   alias "alias name"
             -- recognizes a name which is used for the symbol sy in
             -- syntax error messages in the generated compiler.

-- ========================== grammar rules ============================

RULES

coco =
  GRAMMARSY
  IDENT <out:gramspix>         sem rules:=0; alts:=0;
                                   OpenFile(gramspix);
                               endsem
  [ SEMANTICSY                 sem StopHash; endsem
    DECLARATIONSY              sem (InitCopy) endsem
    { any                      sem (Copy) endsem
    }
  ]                            sem RestartHash;
                                   InsertFramePart;
                               endsem
  [ MACROSY
    { macrodef }
  ]
  TERMINALSY                   sem eofsy:=NewSy(!0,!t) endsem
  { symbol <out:spix>          sem styp:=t endsem
                               sem (StoreSymbol) endsem
    [ attr <in:sy,t,def; out:sem1,sem2,count> ]
    [ aliasname <in:sy> ]
  }
  [ PRAGMASY
    { symbol <out:spix>        sem styp:=pr endsem
                               sem (StoreSymbol) endsem
      [ attr <in:sy,pr,def; out:sem1,sem2,count>
                               sem GetSy(!sy,^sn); sn.sem1:=sem2;
                                   RepSy(!sy,!sn);
                               endsem
      ]
      [ semaction <out:sem3>   sem GetSy(!sy,^sn); sn.sem2:=sem3;
                                   RepSy(!sy,!sn);
                               endsem
      ]
    }
  ]
  NONTERMINALSY
  { IDENT <out:spix>           sem styp:=nt endsem
                               sem (StoreSymbol) endsem
    [ attr <in:sy,nt,def; out:sem1,sem2,count> ]
    [ aliasname <in:sy> ]
  }                            sem rootsy:=SyNr(!gramspix);
                                   IF rootsy=null THEN Error(2); END;
                               endsem
  RULESSY
  { IDENT <out:spix>           sem sy:=SyNr(!spix);
                                   IF sy=null THEN
                                     Error(3); sy:=NewSy(!spix,!err)
                                   END;
                                   GetSy(!sy,^sn);
                                   IF (sn.typ<>nt) AND (sn.typ<>err) THEN
                                     Error(4);
                                   END;
                                   IF sn.start<>0 THEN Error(5); END;
                                   sy1:=sy; count:=0; styp:=sn.typ
                               endsem
    [ attr <in:sy,styp,check; out:sem1,sem2,count> ]
                               sem (CheckAttr) endsem
    '='
    expr <out:gp,gl,dd>        sem GetSy(!sy1,^sn);
                                   sn.start:=gp; sn.del:=dd;
                                   RepSy(!sy1,!sn);
                                   INC(rules);
                               endsem
    '.'
  }                            sem rootloc:=NewNode(!nt,!rootsy,!0);
                                   gp1:=NewNode(!t,!eofsy,!0);
                                   gl:=rootloc; gl1:=gp1;
                                   ConcatRight(rootloc,gl,!gp1,!gl1)
                               endsem
  ENDGRAMSY                    sem IF ddt["L"] THEN GraphList; END;
                                   CloseFile;
                               endsem.

-- ---------------------------------------------------------------------
expr <out:gp,gl,dd> =
  term <out:gp,gl,dd>          sem INC(alts); endsem
  { '|'
    term <out:gp1,gl1,dd1>     sem INC(alts);
                                   ConcatLeft(gp,gl,!gp1,!gl1);
                                   dd:=dd OR dd1
                               endsem
  }.
-- ---------------------------------------------------------------------
term <out:gp1,gl1,dd1> =
                               sem gpo:=0 endsem
  fact <in:gpo,TRUE; out:gp1,gl1,dd1,gpo>
  { fact <in:gpo,FALSE; out:gp2,gl2,dd2,gpo>
                               sem IF gp2<>0 THEN
                                     ConcatRight(gp1,gl1,!gp2,!gl2);
                                     dd1:=dd1 AND dd2;
                                   END;
                               endsem
  }.

-- ---------------------------------------------------------------------
fact <in:gpo,firstfact; out:gp2,gl2,dd2,gpo> =
  ( symbol <out:spix>          sem sy:=SyNr(!spix);
                                   IF sy=null THEN
                                     Error(3); sy:=NewSy(!spix,!err)
                                   END;
                                   GetSy(!sy,^sn);
                                   IF sn.typ=pr THEN Error(16); END;
                                   gp2:=NewNode(!sn.typ,!sy,!line);
                                   gl2:=gp2; dd2:=FALSE; gpo:=gp2;
                                   count:=0; styp:=sn.typ
                               endsem
    [ attr <in:sy,styp,use; out:sem1,sem2,count>
                               sem GetNode(!gp2,^gn);
                                   gn.sem1:=sem1; gn.sem2:=sem2;
                                   RepNode(!gp2,!gn)
                               endsem
    ]                          sem (CheckAttr) endsem
  | EPSSY                      sem gp2:=NewNode(!eps,!0,!line);
                                   gl2:=gp2; dd2:=TRUE; gpo:=gp2
                               endsem
  | ANYSY                      sem gp2:=NewNode(!any,!0,!line);
                                   gl2:=gp2; dd2:=FALSE; gpo:=gp2
                               endsem
  | semaction <out:sem3>       sem IF gpo=0 THEN
                                     gp2:=NewNode(!eps,!0,!line);
                                     gl2:=gp2; dd2:=TRUE;
                                     GetNode(!gp2,^gn); gn.sem3:=sem3;
                                     RepNode(!gp2,!gn);
                                   ELSE
                                     GetNode(!gpo,^gn); gn.sem3:=sem3;
                                     RepNode(gpo,gn);
                                     gp2:=0; gl2:=0; gpo:=0
                                   END;
                               endsem
  | '('                        sem (PushPointers) endsem
    expr <out:gp2,gl2,dd2>
    ')'                        sem (PopPointers) endsem
  | '['                        sem (PushPointers) endsem
    expr <out:gp,gl,dd>        sem gp2:=NewNode(!eps,!0,!line); gl2:=gp2;
                                   ConcatLeft(gp,gl,!gp2,!gl2);
                                   gp2:=gp; gl2:=gl; dd2:=TRUE;
                               endsem
    ']'                        sem (PopPointers) endsem
  | '{'                        sem (PushPointers) endsem
    expr <out:gp,gl,dd>        sem gp2:=NewNode(!eps,!0,!line); gl2:=gp2;
                                   ConcatRight(gp,gl,!gp,!gl);
                                   ConcatLeft(gp,gl,!gp2,!gl2);
                                   gp2:=gp; dd2:=TRUE;
                                   -- gl2 is link of eps
                               endsem
    '}'                        sem (PopPointers) endsem
  )                            sem IF firstfact THEN
                                     gp3:=gp2; gl3:=gl2;
                                     gp2:=NewNode(!eps,!0,!line); gl2:=gp2;
                                     ConcatRight(gp2,gl2,!gp3,!gl3);
                                   END;
                               endsem.
-- ---------------------------------------------------------------------
attr <in:sy,styp,kind; out:sem1,sem2,count> =
                               sem sem1:=0; sem2:=0 endsem
  ( inattr <in:sy,styp,kind,0; out:sem1,count>
    [ ';' outattr <in:sy,styp,kind,count; out:sem2,count> ]
  | outattr <in:sy,styp,kind,0; out:sem2,count>
  ).
-- ---------------------------------------------------------------------
inattr <in:sy,styp,kind,count; out:sem1,count> =
  INSY                         sem IF styp<>nt THEN Error(7); END;
                                   dir:=down;
                               endsem
  ':'
  ( IDENT <out:spix>           sem (AssignId1) endsem
  | NUMBER <out:n>             sem (AssignNumber) endsem
  )
  { ','
    ( IDENT <out:spix>         sem (AssignId1) endsem
    | NUMBER <out:n>           sem (AssignNumber) endsem
    )
  }                            sem IF kind=use THEN
                                     EmitAction(!line,^sem1)
                                   END;
                               endsem.
-- ---------------------------------------------------------------------
outattr <in:sy,styp,kind,count; out:sem2,count> =
  OUTSY                        sem dir:=up endsem
  ':'
  IDENT <out:spix>             sem (AssignId2) endsem
  { ','
    IDENT <out:spix>           sem (AssignId2) endsem
  }                            sem IF (kind=use) OR (styp=pr) THEN
                                     EmitAction(!line,^sem2);
                                   END;
                               endsem.
-- ---------------------------------------------------------------------
semaction <out:sem3> =
  SEMSY                        sem StopHash; firstsymbol:=TRUE endsem
  ( '('                        sem RestartHash endsem
    IDENT <out:spix>           sem GetMacroNr(!spix,^sem3);
                                   IF sem3=0 THEN Error(12); END;
                               endsem
    ')'
  | { any                      sem IF firstsymbol THEN
                                     firstsymbol:=FALSE;
                                     OpenSem(!line,^sem3);
                                     StartCopy(!col)
                                   END;
                                   Copy(!typ,!col)
                               endsem
    }                          sem RestartHash; endsem
  )
  ENDSEMSY.

-- ---------------------------------------------------------------------
macrodef =
  SEMSY
  ':'
  IDENT <out:spix>             sem OpenSem(!line,^sem3);
                                   NewMacro(!spix,!sem3,^ok);
                                   IF NOT ok THEN Error(11); END;
                                   StopHash; firstsymbol:=TRUE;
                               endsem
  ':'
  { any                        sem IF firstsymbol THEN
                                     firstsymbol:=FALSE; StartCopy(col)
                                   END;
                                   Copy(!typ,!col)
                               endsem
  }
  ENDSEMSY                     sem RestartHash endsem.
-- ---------------------------------------------------------------------
symbol <out:spix> =
  ( IDENT <out:spix> | STRING <out:spix> ).
-- ---------------------------------------------------------------------
aliasname <in:sy> =
  ALIASSY
  symbol <out:spix>            sem GetSy(!sy,^sn); sn.aliasspix:=spix;
                                   RepSy(!sy,!sn);
                               endsem.

ENDGRAM

alias

allasname
aliasspix
ALIASSY
alts
any
ANYSY
AssignIdl

238
239
240
BO
2S
25
BELIEBEN
N)
13
Tey
Ssh
Si)
3302385574
41
349
420
11
Seal
22
239
459
125,
Pola
ASS

241
28

242
PAR

243
230

244
ZEA

SHO

BX

SRA

yesh

422
7320

739783555.4597

245
Zh)

246
sh

54352562

247
eid

248
sricsh

249
ey

528
515
14
501
shill

530
519
15

16

20

27

314

322

325

333

402
416
10

455
280

347

119
89
199
423
413

205
99

545

547

563

565

480
433

488
487

496

204
126
Sn
448
25

205
130
312
450
220

359
139
312
501
282

547
151
322
503
286

565
155
323
504
404

214
292

221
297

287
431

291
434

415
240
145

393
173

304

132
130
74

142
132

146
139

116
229

521
241
242
363
243
243
392
118
387

532
415
951
411
456
456
395
ae
392

480
67
67

18
19
213
537
133
130
539
451
371
215
481
214
292

AssignId2
AssignNumber
attr

attributes
Attrtype
CheckAttr
CloseFile
coco
cocogen
cocogra
cocolex
cocosym
col
CompErr

CompleteAt
ConcatLeft

ConcatRight
const

App. F

Program listings

238

3m

365

370

384

401

157
323
504
405

167
333
505
419

176
365
509
420

182
370
509
424

186
384
525
424

199
399
525
478

422
439

424
447

428
457

430
460

434
465

434
475

481

316

327

365

370

384

159
142

170
157

174
159

188
167

511
170

526
186

188

464

478

485

490

495

442
134
396

141
398

142
442

161
445

169
510

170
540

190
558

194

24
20
222
543
155
139

25
292
292
544
160
157

26
298
299
559
176
167

282
299
439
562
189
186

404
430
493
563

419
431

475
439

478

485

466
376
220
485
221
296

469
394
282
487
287
431

404
285
487
290
433

444
404
488
412
439

575
412

413

419

420

423

478

413
447

422
457

423
460

428
465

430
471

433
475

300

189

Copy
count

300
401
dd

485

ddl
dd2
489
ddt
DECLARATIONSY
def
del
dir
dirl
Direction
down

405

Sit

dummy
EmitAction
ENDGRAMSY

ENDSEMSY
eofsy
eps
EPSSY
err
Error

568

Errors

expr
fact
firstfact
firstsymbol

GenAssign
GetAt
GetMacroNr
GetNode

GetSy

gl
gll
gl2

200

coco.ATG

gp2

gp3

gpo
GRAMMARSY

gramspix
GraphList
Graphnode
Cross-reference (continued):
IDENT, inattr, InitCopy, InsertFramePart, INSY, kind, line, link, macrodef,
MACROSY, maxstacksize, n, name, NewAt, NewMacro, NewNode, NewSy, nococosy,
nodes, nonterm, NONTERMINALSY, nr, nt, null, NUMBER, ok, OpenFile, OpenSem,
outattr, OUTSY, Pop, PopPointers, pr, PRAGMASY, Push, PushPointers, RepNode,
RepSy, RestartHash, Restriction, root, rootloc, rootsy, rules, RULESSY, sem1,
sem2, sem3, semaction, Semant, SEMANTICSTACK, SEMANTICSY, SemErr, SEMSY, sn,
sp, spix, spixl, stack, StartCopy, StopHash, StoreSymbol, string, STRING,
styp, sy, symbol, Symbolnode, Symboltype, SyNr, SYSTEM, t, term, TERMINALSY,
typ, type, up, Usage, use, VAL, x
App. F

Program listings

240



coco.MOD

(* Coco      Compiler compiler                                 Moe   27.12.83
   ====
   This is the main module of Coco. It controls the execution of the
   compiler compiler. It
   a) opens and closes the files
   b) initializes the scanner
   c) calls the parser
   d) calls the procedures which collect the symbol sets
   e) calls the grammar test procedures
   f) calls the procedure which generates the compiler
   g) calls the lister to print a listing with error messages

   Implementation restrictions:
     1: cocolex   Hash               Hash table full
     2: cocolex   Hash               Name list full
     3: cocolex   PushInc            Include stack overflow
     4: cocolex   EnQueue            Attribute queue overflow
     5: cocogra   NewNode            Too many nodes in TDG (>600)
     6: cocosym   NewSy              Symbol list overflow (>199)
     7: cocosym   NewSy              Too many terminals (>127)

   Compiler errors:
     1: cocolex   PopInc             Include stack underflow
     2: cocolex   DeQueue            Attribute queue underflow
     3: cocosym   GetAt              Try to get attribute inf. for a terminal
     4: cocogen   OpenFile           Semantic frame not found
     5: cocogen2  GenSynFiles        Parser frame not found
     6: cocogen2  NewAdr             Fixups already resolved

   Trace switches: can be set by "$D letter {letter}" (without spaces)
     A: cocosyn                      Print parser input (remove comments!)
     B: cocosyn                      Trace parser run (remove comments!)
     C: cocogra   DelGraph           Print visited nodes
     D: cocotst   FindCircularRules  Print derivations between single nt's
     E: cocotst   TestIfNtToTerm     Trace flow of algorithm
     F: cocotst   CheckAlternatives  Print visited nodes
     G: cocosym   CollectFirstSet    Print visited nodes
     H: cocosym   GetFirstSet        Print resulting set
     I: cocosym   GetFollowSets      Print resulting sets
     J: cocosym   CollectFollowSets  Print visited nodes
     K: cocosym                      Print sets of term.starts and succ.s
     L: cocosem                      Print generated TDG
--------------------------------------------------------------------------*)
MODULE Coco;

FROM cocogen  IMPORT filesopen, CloseFile;
FROM cocogen2 IMPORT GenSynFiles, PutStatistics;
FROM cocogra  IMPORT DeleteRedundantEps, NewEpsBeforeDelNts;
FROM cocolex  IMPORT ddt, src;
FROM cocolst  IMPORT lst, PrintListing;
FROM cocosym  IMPORT FindDelSymbols, GetSymbolSets;
FROM cocosyn  IMPORT Parse, printinput, printnodes;
FROM cocotst  IMPORT FindCircularRules, LL1Test, TestCompleteness,
                     TestIfAllNtReached, TestIfNtToTerm;
FROM Errors   IMPORT GetNumberOfErrors;
FROM FileIO   IMPORT con, File, Done, Open, Close, Read,
                     WriteInt, WriteLn, WriteString;
FROM System   IMPORT Terminate, normal;


VAR
  ch:        CHAR;
  correct:   BOOLEAN;
  ll1:       BOOLEAN;               (*TRUE if grammar is LL(1)*)
  lstn:      ARRAY[0..63] OF CHAR;  (*list file name*)
  ok:        BOOLEAN;
  semerrors: CARDINAL;
  synerrors: CARDINAL;

(* ChangeExtension   Change extension of file name
--------------------------------------------------------------------------*)
PROCEDURE ChangeExtension(VAR old,new:ARRAY OF CHAR; ext:ARRAY OF CHAR);
VAR i,j: INTEGER;
BEGIN
  i:=0;
  WHILE (i<=HIGH(old)) AND (old[i]<>0C) DO i:=i+1; END;
  WHILE (i>=0) AND (old[i]<=" ") DO DEC(i) END;
  j:=i;
  WHILE (j>=0) AND (old[j]<>".") DO DEC(j) END;
  IF j>=0 THEN i:=j-1; END;
  FOR j:=0 TO i DO new[j]:=old[j]; END;
  new[i+1]:="."; new[i+2]:=ext[0]; new[i+3]:=ext[1]; new[i+4]:=ext[2];
  new[i+5]:=0C;
END ChangeExtension;
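ChangeExtension makes three backward scans over the file name. The first skips trailing blanks, the second looks for the last ".", and the third step cuts the name just before that dot, after which "." and the first three characters of the new extension are appended. The following Python sketch mirrors this logic so it can be tried on ordinary strings. The function name `change_extension` and the use of Python strings instead of fixed character arrays are assumptions of the sketch, not part of Coco.

```python
def change_extension(old: str, ext: str) -> str:
    """Hypothetical Python counterpart of ChangeExtension (a sketch, not Coco code)."""
    name = old.split("\0", 1)[0]      # the name ends at the first 0C, if any
    i = len(name) - 1
    while i >= 0 and name[i] <= " ":  # skip trailing blanks
        i -= 1
    j = i
    while j >= 0 and name[j] != ".":  # look for the last dot
        j -= 1
    if j >= 0:                        # found: keep only the part before it
        i = j - 1
    return name[: i + 1] + "." + ext[:3]  # three extension characters, as in the listing
```

For example, `change_extension("GRAMMAR.TXT", "LST")` gives "GRAMMAR.LST". The main program derives the name of the listing file from the source file name in this way.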

BEGIN
  WriteString(con,"Coco - Compiler Compiler Vs 4.1$");
  Open(src,0,"",FALSE);
  IF NOT Done THEN Terminate(normal) END;                 (*cancel*)
  ChangeExtension(src^.name,lstn,"LST");
  Open(lst,src^.volRef,lstn,TRUE);
  WriteString(lst,"Coco - Compiler Compiler Vs 4.1");
  WriteString(lst," (Source file: "); WriteString(lst,src^.name);
  WriteString(lst,")$$");
  WriteString(con,"parsing");
  Parse(correct);                                         (*parse input grammar*)
  GetNumberOfErrors(synerrors,semerrors);                 (*check for errors*)
  IF synerrors+semerrors<>0 THEN
    IF filesopen THEN CloseFile END;
    WriteString(con,"$listing");
    PrintListing;
    WriteString(con,"$Compilation terminated. ");
    WriteInt(con,synerrors+semerrors,0);
    WriteString(con," errors detected. Press any key.$");
    Close(src); Close(lst);
    Read(con,ch);
    Terminate(normal);
  END;
  WriteString(con,"$evaluating$");
  FindDelSymbols;
  NewEpsBeforeDelNts;
  DeleteRedundantEps;
  GetSymbolSets;
  TestCompleteness(ok);
  IF ok THEN TestIfAllNtReached(ok); END;
  IF ok THEN FindCircularRules(ok); END;
  IF ok THEN TestIfNtToTerm(ok); END;
  IF ok THEN LL1Test(ll1); END;
  IF NOT ok OR NOT ll1 THEN
    WriteString(con,"listing$");
    WriteLn(lst); WriteLn(lst); PrintListing;
  END;
  IF ok THEN
    WriteString(con,"writing$");
    GenSynFiles;
    PutStatistics;
  END;
  IF NOT ok THEN
    WriteString(con,"Compilation ended with errors in grammar tests.");
  ELSIF NOT ll1 THEN
    WriteString(con,"Compilation ended with LL(1) errors.");
  ELSE
    WriteString(con,"Compilation completed. No errors detected.");
  END;
  Close(src); Close(lst);
  WriteString(con," Press any key.$"); Read(con,ch);
END Coco.

Cross-reference:
ch, ChangeExtension, Close, CloseFile, Coco, cocogen, cocogen2, cocogra,
cocolex, cocolst, cocosym, cocosyn, cocotst, con, correct, ddt,
DeleteRedundantEps, Done, Errors, ext, File, FileIO, filesopen,
FindCircularRules, FindDelSymbols, GenSynFiles, GetNumberOfErrors,
GetSymbolSets, HIGH, LL1Test, lst, lstn, name, new, NewEpsBeforeDelNts,
normal, ok, old, Open, Parse, printinput, PrintListing, printnodes,
PutStatistics, Read, semerrors, src, synerrors, System, Terminate,
TestCompleteness, TestIfAllNtReached, TestIfNtToTerm, volRef, WriteInt,
WriteLn, WriteString
135


cocogen.DEF

(* cocogen    Generator of compiler files                      Moe   28.12.83
   =======
   This module generates the semantic evaluator. It
   a) copies symbols from the input grammar to the evaluator
   b) copies text from the semantic frame to the evaluator
   c) stores attribute assignments (and emits them as semantic actions)
--------------------------------------------------------------------------*)

DEFINITION MODULE cocogen;

FROM FileIO IMPORT File;

TYPE
  Attrtype = (term,nonterm,const);

VAR
  maxsem:    CARDINAL;  (*number of last semantic action*)
  filesopen: BOOLEAN;   (*files may remain open after a syntax error*)

PROCEDURE CloseFile;
(* Closes the file where the semantic evaluator is written to*)

PROCEDURE Copy(typ,col:CARDINAL);
(* Copies the source symbol typ at column col to the generated
   semantic file*)

PROCEDURE CopyFramePart(VAR f1,f2:File; s:ARRAY OF CHAR);
(* Copies file f1 to file f2 until string s occurs. s is not copied*)

PROCEDURE EmitAction(line:CARDINAL; VAR sem:CARDINAL);
(* Emits the stored attribute assignments as a semantic action. line
   is used to print a comment. sem is the number of the new action*)

PROCEDURE GenAssign(typ:Attrtype; left,right:CARDINAL);
(* Generates an assignment arg(left)<--arg(right). typ indicates if
   arg(right) is a terminal attribute, a nonterminal attribute or
   a constant*)

PROCEDURE InsertFramePart;
(* Inserts the middle part in the generated semantics file*)

PROCEDURE OpenFile(spix:CARDINAL);
(* Opens the file where the semantic evaluator is written to. spix is
   the grammar name in Cocol. The name of the generated file is the
   grammar name with the suffix "sem"*)

PROCEDURE OpenSem(line:CARDINAL; VAR sem:CARDINAL);
(* Prints the start of a new semantic action (case-number of a new
   case-block). line is used to print a comment. sem is the number of
   the new action*)

PROCEDURE StartCopy(col:CARDINAL);
(* Saves col as the leftmost column in the following semantic action*)

END cocogen.


(* cocogen    Generation of semantic evaluator                 Moe   30.12.83
   =======
   This module generates the semantic evaluator. It
   a) copies symbols from the input grammar to the evaluator
   b) copies text from the semantic frame to the evaluator
   c) stores attribute assignments (and emits them as semantic actions)
--------------------------------------------------------------------------*)

IMPLEMENTATION MODULE cocogen;

FROM cocolex IMPORT at, line, col, src, GetName;
FROM Errors  IMPORT CompErr, SemErr;
FROM FileIO  IMPORT con, File, Done, Open, Close, Read, Write,
                    WriteCard, WriteLn, WriteString, WriteText;
FROM System  IMPORT Allocate, Deallocate;

CONST
  blanks  = "                                        ";
  ident   = 17;                       (*symbol numbers*)
  string  = 18;
  number  = 19;
  lparsy  = 23;
  commasy = 33;
  eolsy   = 99;

TYPE
  Actionptr     = POINTER TO Action;
  Assignmentptr = POINTER TO Assignment;
  Action = RECORD                     (*information about attr.eval. action*)
    sem:      CARDINAL;               (*action number*)
    firstass: Assignmentptr;          (*to first assignment*)
    next:     Actionptr;              (*to next action*)
  END;
  Assignment = RECORD                 (*information about an attr. assignment*)
    typ:   Attrtype;                  (*term, nonterm, const*)
    left:  CARDINAL;                  (*spix of left-hand side*)
    right: CARDINAL;                  (*spix or val of right-hand side*)
    next:  Assignmentptr;             (*to next assignment*)
  END;
  Name = ARRAY[1..80] OF CHAR;

VAR
  firstact: Actionptr;                (*first generated action*)
  firstass: Assignmentptr;            (*first stored assignment*)
  fram:     File;                     (*file with frame of sem.Analyzer*)
  gram:     Name;                     (*grammar name*)
  graml:    CARDINAL;                 (*length of grammar name*)
  lastact:  Actionptr;                (*last generated action*)
  lastass:  Assignmentptr;            (*last stored assignment*)
  lastcol:  CARDINAL;                 (*column of last symbol*)
  lasttyp:  CARDINAL;                 (*type of last symbol*)
  leftcol:  CARDINAL;                 (*leftmost column in semantic action*)
  margin:   CARDINAL;                 (*indent from left margin*)
  op:       ARRAY[0..commasy] OF CHAR; (*operator table*)
  sem:      File;                     (*file containing sem.evaluator*)
  semname:  Name;                     (*file name of sem.evaluator*)

PROCEDURE EmitAssign(p:Assignmentptr); FORWARD;


cocogen.MOD

(* CloseFile   Close file containing the semantic evaluator
--------------------------------------------------------------------------*)
PROCEDURE CloseFile;
BEGIN
  CopyFramePart(fram,sem,"-->modulename");
  WriteText(sem,gram,graml); WriteString(sem,"sem");
  CopyFramePart(fram,sem,"$$$");
  Close(fram); Close(sem);
  filesopen:=FALSE;
END CloseFile;

(* Copy   Copy source symbol to semantic evaluator
--------------------------------------------------------------------------*)
PROCEDURE Copy(typ,col:CARDINAL);
VAR
  ch:   CHAR;
  l,i:  CARDINAL;
  name: Name;
BEGIN
  IF col<=lastcol THEN                                     (*new line*)
    WriteLn(sem); WriteText(sem,blanks,margin);
    IF col>leftcol THEN WriteText(sem,blanks,col-leftcol); END;
    lasttyp:=eolsy;
  END;
  IF (typ<=number) AND (lasttyp<=number) THEN Write(sem," "); END;
  CASE typ OF
      1: WriteString(sem,"alias");
    | 2: WriteString(sem,"any");
    | 3: WriteString(sem,"DECLARATIONS");
    | 4: WriteString(sem,"ENDGRAM");
    | 5: WriteString(sem,"endsem");
    | 6: WriteString(sem,"eps");
    | 7: WriteString(sem,"GRAMMAR");
    | 8: WriteString(sem,"IN");
    | 9: WriteString(sem,"MACROS");
    | 10: WriteString(sem,"NONTERMINALS");
    | 11: WriteString(sem,"out");
    | 12: WriteString(sem,"PRAGMAS");
    | 13: WriteString(sem,"RULES");
    | 14: WriteString(sem,"sem");
    | 15: WriteString(sem,"SEMANTICS");
    | 16: WriteString(sem,"TERMINALS");
    | 17,18: (*ident, string*)
        GetName(at[1],name,l); WriteText(sem,name,l);
    | 19: WriteCard(sem,at[1],0);
    | 20..33: (*operators*)
        Write(sem,op[typ]);
    | 34: ch:=CHR(at[1]);
        IF (ch="!") OR ((ch="*") AND (lasttyp<>ident)) THEN;
        ELSE Write(sem,ch);
        END;
  END; (*CASE*)
  lasttyp:=typ; lastcol:=col;
END Copy;
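Copy reconstructs the layout of the semantic text from token columns alone. A token whose column is not to the right of the previous token starts a new output line, indented by margin plus its offset from leftcol. Two adjacent word-like tokens (symbol numbers up to number) are separated by one blank. The Python class below is a simplified sketch of just these rules. The class name, the `text` argument carrying the already-spelled token, and the reuse of the constants 19 and 99 from cocogen's CONST section are assumptions.

```python
NUMBER = 19   # symbol number of "number" in cocogen's CONST section
EOLSY = 99    # pseudo symbol type meaning "at the start of a line"


class LayoutCopier:
    """Hypothetical sketch of the layout rules in Copy (not the original code)."""

    def __init__(self, leftcol: int, margin: int = 0) -> None:
        self.parts: list[str] = []
        self.leftcol = leftcol
        self.margin = margin
        self.lasttyp = EOLSY
        self.lastcol = 99  # as in StartCopy, so the first token opens a new line

    def copy(self, typ: int, col: int, text: str) -> None:
        if col <= self.lastcol:  # column did not advance: new output line
            self.parts.append("\n" + " " * self.margin)
            if col > self.leftcol:
                self.parts.append(" " * (col - self.leftcol))
            self.lasttyp = EOLSY
        if typ <= NUMBER and self.lasttyp <= NUMBER:
            self.parts.append(" ")  # keep identifiers, strings and numbers apart
        self.parts.append(text)
        self.lasttyp, self.lastcol = typ, col

    def result(self) -> str:
        return "".join(self.parts)
```

Operators (symbol numbers above 19) are never preceded by an extra blank, so `x:=1` is reproduced without spaces, whatever the original spacing was.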

(* CopyFramePart   Copies file f1 to file f2 until string s
--------------------------------------------------------------------------*)
PROCEDURE CopyFramePart(VAR f1,f2:File; s:ARRAY OF CHAR);
VAR
  ch,startch: CHAR;
  i:          INTEGER;
  t:          ARRAY[0..50] OF CHAR;
BEGIN
  startch:=s[0];
  Read(f1,ch);
  WHILE NOT f1^.eof DO
    IF ch=startch THEN                          (*check if s occurs*)
      i:=0;
      WHILE (i<HIGH(s)) AND (ch=s[i]) AND NOT f1^.eof DO
        t[i]:=ch; INC(i); Read(f1,ch);
      END;
      IF ch=s[i] THEN RETURN; END;              (*found - exit*)
      WriteText(f2,t,i);                        (*not found - continue*)
      Write(f2,ch);
    ELSE Write(f2,ch);                          (*normal character - write it*)
    END;
    Read(f1,ch);
  END; (*WHILE*)
END CopyFramePart;
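CopyFramePart copies a frame file up to a marker such as "-->modulename" and leaves the marker itself out. The Python sketch below shows the same job on text streams. It deliberately differs from the listing in one detail: after a partial match fails, the listing writes the matched characters out and continues with the next character, whereas this sketch re-examines the remaining characters, so an overlapping prefix such as "--->m" still finds "-->m". The function name and the use of `io` text streams are assumptions.

```python
import io


def copy_frame_part(src: io.TextIOBase, dst: io.TextIOBase, marker: str) -> bool:
    """Hypothetical sketch: copy src to dst until marker; True if it was found."""
    pending = ""  # characters that currently match a prefix of marker
    while True:
        ch = src.read(1)
        if not ch:                   # end of input: flush any partial match
            dst.write(pending)
            return False
        pending += ch
        if marker.startswith(pending):
            if pending == marker:    # whole marker seen: stop without copying it
                return True
            continue
        # Mismatch: write characters until the rest is again a marker prefix.
        while pending and not marker.startswith(pending):
            dst.write(pending[0])
            pending = pending[1:]
```

As in the listing, reading stops immediately after the marker, so the next call continues with the text that follows it.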

(* EmitAction   Emit stored semantic action
--------------------------------------------------------------------------*)
PROCEDURE EmitAction(line:CARDINAL; VAR sem:CARDINAL);
VAR
  act,p: Actionptr;
  q:     Assignmentptr;

  PROCEDURE EqualAct(p1,p2:Assignmentptr): BOOLEAN;
  BEGIN
    WHILE (p1<>NIL) AND (p2<>NIL) AND (p1^.typ=p2^.typ) AND
          (p1^.left=p2^.left) AND (p1^.right=p2^.right) DO
      p1:=p1^.next; p2:=p2^.next;
    END;
    RETURN (p1=NIL) AND (p2=NIL);
  END EqualAct;

BEGIN
  IF firstass=NIL THEN sem:=0;
  ELSE
    p:=firstact;
    WHILE (p<>NIL) AND NOT EqualAct(p^.firstass,firstass) DO
      p:=p^.next;
    END;
    IF p=NIL THEN                                   (*new action*)
      OpenSem(line,sem);
      EmitAssign(firstass);
      Allocate(act,SIZE(Action));
      act^.sem:=sem; act^.firstass:=firstass; act^.next:=NIL;
      IF firstact=NIL THEN firstact:=act
      ELSE lastact^.next:=act
      END;
      lastact:=act;
    ELSE             (*same action found; delete recently stored assignments*)
      sem:=p^.sem;
      WHILE firstass<>NIL DO
        q:=firstass; firstass:=firstass^.next; Deallocate(q);
      END;
    END;
  END;
  firstass:=NIL;
END EmitAction;
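EmitAction never emits the same semantic action twice. Before it opens a new case label, it compares the freshly stored assignment list with every action generated so far (EqualAct). On a match it reuses that action's number and discards the new list. Here is a minimal Python sketch of this reuse, assuming (as the sketch's own choice) that assignments are tuples and the action list is a Python list.

```python
from dataclasses import dataclass, field

# One stored assignment as (type, left, right); this encoding is an assumption.
Assignment = tuple[str, int, int]


@dataclass
class ActionTable:
    """Hypothetical sketch of EmitAction: identical assignment lists share one number."""

    maxsem: int = 0
    actions: list[tuple[int, tuple[Assignment, ...]]] = field(default_factory=list)

    def emit_action(self, pending: list[Assignment]) -> int:
        if not pending:                     # nothing stored: action number 0
            return 0
        key = tuple(pending)
        for sem, assigns in self.actions:   # linear search, like the Actionptr list
            if assigns == key:              # same assignments in the same order
                pending.clear()             # drop the duplicate
                return sem
        self.maxsem += 1                    # new action, numbered as in OpenSem
        self.actions.append((self.maxsem, key))
        pending.clear()
        return self.maxsem
```

With `ActionTable(maxsem=11)`, which mirrors the initial value set in the module body, the first distinct action receives number 12. Any later identical list maps to 12 again.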

(* EmitAssign   Write attribute assignment
--------------------------------------------------------------------------*)
PROCEDURE EmitAssign(p:Assignmentptr);
VAR
  l:    CARDINAL;
  name: Name;
BEGIN
  WHILE p<>NIL DO
    WriteLn(sem); WriteText(sem,blanks,margin);
    GetName(p^.left,name,l);
    CASE p^.typ OF
      term:
        WriteString(sem,"ASSIGN("); WriteText(sem,name,l);
        WriteString(sem,",at["); WriteCard(sem,p^.right,0);
        WriteString(sem,"]);");
    | nonterm:
        WriteText(sem,name,l); WriteString(sem,":=");
        GetName(p^.right,name,l);
        WriteText(sem,name,l); Write(sem,";");
    | const:
        WriteText(sem,name,l); WriteString(sem,":=");
        WriteCard(sem,p^.right,0); Write(sem,";");
    END; (*CASE*)
    p:=p^.next;
  END; (*WHILE*)
END EmitAssign;
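The three branches of EmitAssign produce three shapes of Modula-2 statement in the generated evaluator: an ASSIGN call for a terminal attribute, and a plain assignment for a nonterminal attribute or a constant. The Python sketch below returns the text for one assignment. It assumes, as its own simplification, that names have already been looked up (the job GetName does in the listing); the function name is hypothetical.

```python
def emit_assign(typ: str, left: str, right, margin: int = 0) -> str:
    """Hypothetical sketch of the text EmitAssign writes for one assignment."""
    prefix = "\n" + " " * margin      # WriteLn, then indent by margin
    if typ == "term":                 # right is an index into the array at
        return f"{prefix}ASSIGN({left},at[{right}]);"
    if typ in ("nonterm", "const"):   # right is a name or a number
        return f"{prefix}{left}:={right};"
    raise ValueError(f"unknown attribute type: {typ!r}")
```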

(* GenAssign   Store attribute assignment
--------------------------------------------------------------------------*)
PROCEDURE GenAssign(t:Attrtype; l,r:CARDINAL);
VAR ass: Assignmentptr;
BEGIN
  IF (t=nonterm) AND (l=r) THEN RETURN; END;
  Allocate(ass,SIZE(Assignment));
  WITH ass^ DO typ:=t; left:=l; right:=r; next:=NIL; END;
  IF firstass=NIL THEN firstass:=ass; ELSE lastass^.next:=ass; END;
  lastass:=ass;
END GenAssign;


(* InsertFramePart   Insert middle part of semantic evaluator
--------------------------------------------------------------------------*)
PROCEDURE InsertFramePart;
BEGIN
  CopyFramePart(fram,sem,"-->actions");
  margin:=9;
END InsertFramePart;

(* OpenFile   Open file for semantic evaluator
--------------------------------------------------------------------------*)
PROCEDURE OpenFile(spix:CARDINAL);
VAR i,l: CARDINAL;
BEGIN
  GetName(spix,gram,l); graml:=l;
  FOR i:=1 TO graml DO semname[i]:=gram[i]; END;
  semname[l+1]:="s"; semname[l+2]:="e"; semname[l+3]:="m";
  semname[l+4]:="."; semname[l+5]:="D"; semname[l+6]:="E";
  semname[l+7]:="F"; semname[l+8]:=0C;
  Open(sem,src^.volRef,semname,TRUE);                   (*definition module*)
  Open(fram,src^.volRef,"cocosemframe",FALSE);
  IF NOT Done THEN
    SemErr(25,line,col);
    WriteString(con,"The file 'cocosemframe' must be in the same ");
    WriteString(con,"subdirectory as the input grammar.$Aborted.$");
    CompErr(4)
  END;
  CopyFramePart(fram,sem,"-->modulename");
  WriteText(sem,gram,graml); WriteString(sem,"sem");
  CopyFramePart(fram,sem,"-->modulename");
  WriteText(sem,gram,graml); WriteString(sem,"sem");
  CopyFramePart(fram,sem,"-->implementation");
  Close(sem);
  semname[l+5]:="M"; semname[l+6]:="O"; semname[l+7]:="D";
  Open(sem,src^.volRef,semname,TRUE);                   (*implementation module*)
  CopyFramePart(fram,sem,"-->modulename");
  WriteText(sem,gram,graml); WriteString(sem,"sem");
  CopyFramePart(fram,sem,"-->scannername");
  WriteText(sem,gram,graml); WriteString(sem,"lex");
  CopyFramePart(fram,sem,"-->declarations");
  filesopen:=TRUE;
END OpenFile;
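OpenFile and CloseFile treat the frame file as a template. Each call of CopyFramePart copies text up to the next marker, and the following WriteText/WriteString supplies the text that takes the marker's place; for a grammar called Pascal, that text is the module name Pascalsem. The Python sketch below shows this pattern on an in-memory string. The frame text and the grammar name are hypothetical, because the real cocosemframe file is not reproduced here.

```python
def fill_frame(frame: str, parts: list[tuple[str, str]]) -> str:
    """Hypothetical sketch: copy frame text marker by marker, replacing each marker."""
    out = []
    pos = 0
    for marker, replacement in parts:
        k = frame.find(marker, pos)
        if k < 0:
            raise ValueError(f"marker {marker!r} not found")
        out.append(frame[pos:k])   # text before the marker is copied verbatim
        out.append(replacement)    # the marker itself is replaced
        pos = k + len(marker)
    out.append(frame[pos:])        # rest of the frame
    return "".join(out)
```

The listing copies the tail of the frame by searching for a final marker ("$$$"), while this sketch simply appends whatever follows the last replaced marker.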

(* OpenSem   Write start of new semantic action
--------------------------------------------------------------------------*)
PROCEDURE OpenSem(line:CARDINAL; VAR nr:CARDINAL);
BEGIN
  INC(maxsem); nr:=maxsem;
  WriteString(sem,"$    | "); WriteCard(sem,maxsem,3);
  WriteString(sem,": (*line "); WriteCard(sem,line,0);
  WriteString(sem,"*)");
END OpenSem;

(* StartCopy   Set leftmost column in semantic action
--------------------------------------------------------------------------*)
PROCEDURE StartCopy(col:CARDINAL);
BEGIN
  leftcol:=col; lasttyp:=eolsy; lastcol:=99;
END StartCopy;

BEGIN (*cocogen*)
     (*012345678901234567890*)
  op:="                    =.|()[]{}<>;:,";  (*"=" must start at pos. 20*)
  maxsem:=11; margin:=0;
  firstact:=NIL; firstass:=NIL; filesopen:=FALSE;
END cocogen.

Cross-reference:
act, Action, Actionptr, Allocate, ass, Assignment, Assignmentptr, at,
Attrtype, blanks, ch, Close, CloseFile, cocogen, cocolex, col, commasy,
CompErr, con, const, Copy, CopyFramePart, Deallocate, Done, EmitAction,
EmitAssign, eof, eolsy, EqualAct, Errors, f1, f2, File, FileIO, filesopen,
firstact, firstass, FORWARD, fram, GenAssign, GetName, gram, graml, HIGH,
ident, InsertFramePart, lastact, lastass, lastcol, lasttyp, left, leftcol,
line, lparsy, margin, maxsem, name, Name, next, nonterm, nr, number, op,
Open, OpenFile, OpenSem, SemErr, semname, spix, src, startch, StartCopy,
string, System, term, typ, volRef, Write, WriteCard, WriteLn, WriteString,
WriteText


(* cocogen2:  Generator for syntax files                        Moe   1.2.84
   ========
   This module generates the parser. It
   a) translates the top-down graph into G-code
   b) copies text from the parser frame, inserting the declarations of
      the table sizes
   c) writes the parser tables
   d) prints statistical information about the compilation
--------------------------------------------------------------------------*)

DEFINITION MODULE cocogen2;

PROCEDURE GenSynFiles;
(* Generates the parser and the parser tables*)

PROCEDURE PutStatistics;
(* Writes statistics about the compilation to the list file*)

END cocogen2.


cocogen2.MOD

(* cocogen2:  Generator for syntax files                        Moe   1.2.84
   ========
   This module generates the parser. It
   a) translates the top-down graph into G-code
   b) copies text from the parser frame, inserting the declarations of
      the table sizes
   c) writes the parser tables
   d) prints statistical information about the compilation
--------------------------------------------------------------------------*)

IMPLEMENTATION MODULE cocogen2;

FROM cocogen  IMPORT maxsem, CopyFramePart;
FROM cocogra  IMPORT alts, maxn, rootloc, rules, GetNode, Graphnode;
FROM cocolex  IMPORT line, col, GetName;
FROM cocolst  IMPORT lst;
FROM cocosym  IMPORT gramspix, maxany, maxeps, maxt, maxp, maxs, GetA,
                     GetE, GetF, GetSy, RepSy, Symbolnode, Symbolset,
                     Symboltype;
FROM Errors   IMPORT CompErr, SemErr;
FROM FileIO   IMPORT con, File, Done, Open, Close, Write, WriteCard,
                     WriteString, WriteText, WriteLn;
FROM System   IMPORT Allocate, Deallocate;
FROM SYSTEM   IMPORT VAL;

CONST                                    (*for G-code*)
  lmaxc = 3000;                          (*G-code length*)

TYPE
  Filename    = ARRAY[1..30] OF CHAR;
  Instruction = (tc,tac,ntc,ntac,ntsc,ntasc,anyc,anyac,epsc,epsac,
                 jmpc,retc);

VAR
  code:    ARRAY[1..lmaxc] OF [0..255];  (*G-code area*)
  pc:      CARDINAL;                     (*index in code*)
  maxname: CARDINAL;                     (*length of name list*)
  first:   BOOLEAN;                      (*used for printing of tables*)
  ic:      CARDINAL;                     (*initialization counter*)
  c:       RECORD
             CASE :BOOLEAN OF
               TRUE:  ch:   ARRAY[1..2] OF CHAR;
             | FALSE: card: CARDINAL;
             END;
           END;

PROCEDURE OutByte(VAR f:File; ch:CHAR); FORWARD;
PROCEDURE OutWord(VAR f:File; n:CARDINAL); FORWARD;
PROCEDURE PrintTables(VAR f:File); FORWARD;
PROCEDURE WriteConstDecl(VAR f:File; t:ARRAY OF CHAR; n:CARDINAL); FORWARD;


(* G-code labels
==========================================================================*)
MODULE LABMOD;

  IMPORT code, CompErr, Allocate, Deallocate;
  EXPORT GetAdr, labact, NewAdr, Visited;

  TYPE
    Fixupptr = POINTER TO Fixup;
    Fixup    = RECORD
                 adr:  CARDINAL;    (*G-code address*)
                 next: Fixupptr;    (*to next fixup*)
               END;
    Labeladr = RECORD
                 loc,adr: CARDINAL; (*node address and corresponding G-code address*)
                 fix:     Fixupptr; (*to first fixup*)
               END;

  VAR
    lab:    ARRAY[1..70] OF Labeladr;
    labact: CARDINAL;

  PROCEDURE GetAdr(loc,fixup:CARDINAL; VAR adr:CARDINAL);
  VAR
    i:  CARDINAL;
    fp: Fixupptr;
  BEGIN
    i:=1;
    WHILE (i<=labact) AND (lab[i].loc<>loc) DO INC(i); END;
    IF i>labact THEN                                  (*new label*)
      INC(labact); lab[i].loc:=loc; lab[i].adr:=0;
      Allocate(fp,SIZE(Fixup));
      fp^.adr:=fixup; fp^.next:=NIL; lab[i].fix:=fp;
    ELSE                                              (*old label*)
      IF lab[i].adr=0 THEN                            (*not yet resolved*)
        Allocate(fp,SIZE(Fixup));
        fp^.adr:=fixup; fp^.next:=lab[i].fix;
        lab[i].fix:=fp;
      END;
    END;
    adr:=lab[i].adr;
  END GetAdr;

  PROCEDURE NewAdr(loc,adr:CARDINAL);
  VAR
    i:   CARDINAL;
    p,q: Fixupptr;
  BEGIN
    i:=1;
    WHILE (i<=labact) AND (lab[i].loc<>loc) DO INC(i); END;
    IF i>labact THEN                                  (*new label*)
      INC(labact); lab[i].loc:=loc; lab[i].adr:=adr; lab[i].fix:=NIL;
    ELSE                                              (*old label*)
      IF lab[i].adr=0 THEN                            (*resolve fixups*)
        p:=lab[i].fix;
        WHILE p<>NIL DO
          code[p^.adr]:=adr DIV 256;
          code[p^.adr+1]:=adr MOD 256;
          q:=p; p:=p^.next; Deallocate(q);
        END;
        lab[i].adr:=adr; lab[i].fix:=NIL;
      ELSE                                            (*fixups already resolved*)
        CompErr(6);
      END;
    END;
  END NewAdr;

  PROCEDURE Visited(loc:CARDINAL): BOOLEAN;
  VAR i: CARDINAL;
  BEGIN
    i:=1;
    WHILE (i<=labact) AND (lab[i].loc<>loc) DO INC(i); END;
    RETURN (i<=labact) AND (lab[i].adr>0);
  END Visited;

BEGIN (*LABMOD*)
  labact:=0;
END LABMOD;
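LABMOD implements backpatching. When code refers to a graph node whose G-code address is not yet known, GetAdr records the position of the two-byte hole as a fixup and returns 0. When the node is later generated, NewAdr writes its address into every recorded hole. The Python sketch below keeps the same behaviour but uses dictionaries instead of the fixed table of 70 labels. The class and method names are hypothetical and are not part of cocogen2.

```python
class Labels:
    """Hypothetical sketch of LABMOD: node locations -> G-code addresses, with fixups.

    Address 0 means "not yet known"; this works because G-code starts at
    index 1. Addresses are patched high byte first, as in NewAdr.
    """

    def __init__(self, code: bytearray) -> None:
        self.code = code
        self.adr: dict[int, int] = {}           # loc -> G-code address (0 = unresolved)
        self.fixups: dict[int, list[int]] = {}  # loc -> code positions to patch

    def get_adr(self, loc: int, fixup: int) -> int:
        adr = self.adr.setdefault(loc, 0)
        if adr == 0:                        # forward reference: remember the hole
            self.fixups.setdefault(loc, []).append(fixup)
        return adr

    def new_adr(self, loc: int, adr: int) -> None:
        if self.adr.get(loc, 0) != 0:
            raise RuntimeError("fixups already resolved")  # CompErr(6) in the listing
        for pos in self.fixups.pop(loc, []):
            self.code[pos] = adr // 256
            self.code[pos + 1] = adr % 256
        self.adr[loc] = adr

    def visited(self, loc: int) -> bool:
        return self.adr.get(loc, 0) > 0
```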

(* Emit   Emit G-code byte
--------------------------------------------------------------------------*)
PROCEDURE Emit(byte:CARDINAL);
BEGIN code[pc]:=byte; INC(pc); END Emit;

(* Emit2   Emit G-code word
--------------------------------------------------------------------------*)
PROCEDURE Emit2(word:CARDINAL);
BEGIN
  code[pc]:=word DIV 256; code[pc+1]:=word MOD 256;
  INC(pc,2);
END Emit2;
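Emit2 stores a word in two consecutive code cells with the high byte first, so the encoding does not depend on the host's byte order. NewAdr patches addresses in the same way. Here is a small Python sketch of the encoding and its inverse; `read2` is a hypothetical helper that does not appear in cocogen2.

```python
def emit2(code: list[int], word: int) -> None:
    """Append a 16-bit word high byte first, as Emit2 does (sketch)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("word does not fit into two bytes")
    code.append(word // 256)  # word DIV 256
    code.append(word % 256)   # word MOD 256


def read2(code: list[int], i: int) -> int:
    """Hypothetical inverse: reassemble the word stored at code[i], code[i+1]."""
    return code[i] * 256 + code[i + 1]
```

For example, 300 is stored as the two cells 1 and 44, which is the same as Python's big-endian `(300).to_bytes(2, "big")`.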

(* GenCode   Generate G-code for TDG in loc
--------------------------------------------------------------------------*)
PROCEDURE GenCode(loc:CARDINAL);
VAR
  adr: CARDINAL;
  gn:  Graphnode;
BEGIN
  IF Visited(loc) THEN RETURN; END;
  NewAdr(loc,pc);                                   (*now coming to address loc*)
  GetNode(loc,gn);
  WITH gn DO
    CASE typ OF
      t:   IF lp=0
           THEN Emit(ORD(tc)); Emit(sp);
           ELSE
             GetAdr(lp,pc+2,adr);
             Emit(ORD(tac)); Emit(sp); Emit2(adr);
           END;
    | nt:  IF lp=0
           THEN
             IF sem1=0
             THEN Emit(ORD(ntc)); Emit(sp);
             ELSE Emit(ORD(ntsc)); Emit(sp); Emit(sem1);
             END;
           ELSE
             GetAdr(lp,pc+2,adr);
             IF sem1=0
             THEN Emit(ORD(ntac)); Emit(sp); Emit2(adr);
             ELSE Emit(ORD(ntasc)); Emit(sp); Emit2(adr); Emit(sem1);
             END;
           END;
    | any: IF lp=0
           THEN Emit(ORD(anyc));
           ELSE
             GetAdr(lp,pc+2,adr);
             Emit(ORD(anyac)); Emit(sp); Emit2(adr);
           END;
    | eps: IF sp<>0 THEN
             IF lp=0
             THEN Emit(ORD(epsc)); Emit(sp);
             ELSE
               GetAdr(lp,pc+2,adr);
               Emit(ORD(epsac)); Emit(sp); Emit2(adr);
             END;
           END;
    END; (*CASE*)
    IF sem2<>0 THEN Emit(sem2); END;
    IF sem3<>0 THEN Emit(sem3); END;
    IF rp=0 THEN Emit(ORD(retc));
    ELSIF Visited(rp) THEN
      GetAdr(rp,pc+1,adr); Emit(ORD(jmpc)); Emit2(adr);
    END;
    IF rp>0 THEN GenCode(rp); END;
    IF lp>0 THEN GenCode(lp); END;
  END; (*WITH*)
END GenCode;
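GenCode walks the top-down graph depth first. It generates a node, then its successor (rp), then its alternative (lp). A successor that already has code is reached through a jmpc instead of being generated again. A reference to an alternative that has not been generated yet goes through LABMOD's fixups. The Python sketch below reproduces this scheme for terminal and nonterminal nodes only. It leaves out semantic-action bytes, any and eps nodes, and the ntsc/ntasc forms. Node, gen_code and the dictionaries are assumptions of the sketch; only the opcode numbering follows the Instruction enumeration.

```python
from dataclasses import dataclass

# Opcodes numbered like the Instruction enumeration (ORD values):
# tc=0, tac=1, ntc=2, ntac=3, ..., jmpc=10, retc=11.
TC, TAC, NTC, NTAC, JMPC, RETC = 0, 1, 2, 3, 10, 11


@dataclass
class Node:
    typ: str  # "t" or "nt" (any, eps and semantic actions are left out)
    sp: int   # symbol number
    lp: int   # location of the alternative node, 0 = none
    rp: int   # location of the successor node, 0 = end of the rule


def gen_code(graph: dict[int, Node], root: int) -> list[int]:
    """Hypothetical, simplified sketch of GenCode with its own label table."""
    code = [0]                         # index 0 unused: address 0 means "unknown"
    adr: dict[int, int] = {}           # node location -> G-code address
    fixups: dict[int, list[int]] = {}  # node location -> holes to patch

    def get_adr(loc: int, pos: int) -> int:
        if loc in adr:
            return adr[loc]
        fixups.setdefault(loc, []).append(pos)
        return 0

    def emit2(word: int) -> None:
        code.extend((word // 256, word % 256))

    def gen(loc: int) -> None:
        if loc in adr:                 # Visited: code already exists
            return
        pc = len(code)
        adr[loc] = pc                  # NewAdr: resolve pending fixups
        for pos in fixups.pop(loc, []):
            code[pos], code[pos + 1] = pc // 256, pc % 256
        n = graph[loc]
        plain, with_alt = (TC, TAC) if n.typ == "t" else (NTC, NTAC)
        if n.lp == 0:
            code.extend((plain, n.sp))
        else:                          # address operand follows opcode and symbol
            target = get_adr(n.lp, pc + 2)
            code.extend((with_alt, n.sp))
            emit2(target)
        if n.rp == 0:
            code.append(RETC)
        elif n.rp in adr:              # successor already generated: jump back
            target = get_adr(n.rp, len(code) + 1)
            code.append(JMPC)
            emit2(target)
        if n.rp > 0:
            gen(n.rp)                  # otherwise the successor follows directly
        if n.lp > 0:
            gen(n.lp)

    gen(root)
    return code
```

For a rule `a (b | c)` with nodes 1 (a), 2 (b, alternative 3) and 3 (c), the sketch yields `[0, 0, 5, 1, 6, 0, 8, 11, 0, 7, 11]` when a, b and c have symbol numbers 5, 6 and 7. In that output, the hole after `tac b` has been patched with address 8, where the code for c begins.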
208
209
210 (* GenSynFiles
Generates files for syntax analysis
a
- -- =
- == --=2------_________ * )
211 wn nnn na a a
212 PROCEDURE GenSynFiles;
218 VAR
214
fn:
Filename;
215
fram:
File;
(*file with parser frame*)
216
graml:
CARDINAL;
(*length of grammar name*)
217
gramname:
Filename;
(*grammar name*)
218
Na
CARDINAL;
219
name:
ARRAY[1..50) OF CHAR;
220
startpc:
CARDINAL;
221
sn:
Symbolnode;
222
syn:
File;
(*file for generated parser*)
223 BEGIN
224
pe:=1;
225
FOR i1:=maxp+1l TO maxs DO
226
labact:=0; startpc:=pc;
227
GetSy(1,sn);
228
GenCode (sn.start);
229
sn.start:=startpc;
230
RepSy (1,sn);
231
END;
232
startpc:=pc; GenCode (rootloc) ;
233
234
maxname:=4;
(*"EOF"+0C*)
235
FOR i:=1 TO maxs DO
236
GetSy(1,sn); GetName (sn.aliasspix,name,]l);

App. F

237
238
239
240
241
242

243
244

cocogen2.

MOD

259

    sn.spix:=maxname+1; RepSy(i,sn); INC(maxname,l+1);
    (*sn.spix becomes a pointer in the generated name list*)
  END;
  GetName(gramspix,gramname,graml);

  (*------------------------------------------ generate parser*)
  FOR i:=1 TO graml DO fn[i]:=gramname[i]; END;
  fn[graml+1]:="s"; fn[graml+2]:="y"; fn[graml+3]:="n"; fn[graml+4]:=".";
  fn[graml+5]:="D"; fn[graml+6]:="E"; fn[graml+7]:="F"; fn[graml+8]:=0C;
  Open(syn,lst^.volRef,fn,TRUE);
  Open(fram,lst^.volRef,"cocosynframe",FALSE);
  IF NOT Done THEN
    WriteString(con,"The file 'cocosynframe' must be in the same$");
    WriteString(con,"subdirectory as the input grammar.$");
    SemErr(21,line,col); CompErr(5);
  END;
  CopyFramePart(fram,syn,"-->modulename");          (*definition module*)
  WriteText(syn,gramname,graml); WriteString(syn,"syn");
  CopyFramePart(fram,syn,"-->modulename");
  WriteText(syn,gramname,graml); WriteString(syn,"syn");
  CopyFramePart(fram,syn,"-->implementation");
  Close(syn);
  fn[graml+5]:="M"; fn[graml+6]:="O"; fn[graml+7]:="D";
  Open(syn,lst^.volRef,fn,TRUE);
  CopyFramePart(fram,syn,"-->modulename");          (*module name*)
  WriteText(syn,gramname,graml); WriteString(syn,"syn");
  CopyFramePart(fram,syn,"-->semantic analyzer");   (*various imports*)
  WriteText(syn,gramname,graml); WriteString(syn,"sem");
  CopyFramePart(fram,syn,"-->input module");
  WriteText(syn,gramname,graml); WriteString(syn,"lex");
  CopyFramePart(fram,syn,"-->declarations");        (*semantic declarations*)
  WriteString(syn,"CONST$");
  WriteConstDecl(syn,"  maxname  =",maxname);
  WriteConstDecl(syn,"  maxnamep =",maxs);
  WriteConstDecl(syn,"  maxcode  =",pc-1);
  IF maxany=0 THEN WriteConstDecl(syn,"  maxany   =",1);
  ELSE WriteConstDecl(syn,"  maxany   =",maxany);
  END;
  IF maxeps=0 THEN WriteConstDecl(syn,"  maxeps   =",1);
  ELSE WriteConstDecl(syn,"  maxeps   =",maxeps);
  END;
  WriteConstDecl(syn,"  maxt     =",maxt);
  WriteConstDecl(syn,"  maxp     =",maxp);
  WriteConstDecl(syn,"  maxs     =",maxs);
  WriteConstDecl(syn,"  startpc  =",startpc);
  WriteString(syn,"$ ");
  CopyFramePart(fram,syn,"-->tables");
  PrintTables(syn);
  CopyFramePart(fram,syn,"-->modulename");          (*module name*)
  WriteText(syn,gramname,graml); WriteString(syn,"syn");
  CopyFramePart(fram,syn,"$$$");
  Close(syn);
  Close(fram);
END GenSynFiles;
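GenSynFiles builds both output file names from the grammar name, then writes the table sizes into the generated module as constant declarations. An empty any-set or eps table is declared with size 1, presumably so that no generated array gets an empty index range. PrintTables correspondingly emits a dummy all-zero set in that case. The Python sketch below models this output. Its function names are hypothetical. The column spacing inside the constant strings (lost in the scan) and the reading of "$" in WriteString as a line break are assumptions.

```python
# Hypothetical model of the CONST section that GenSynFiles writes into
# the generated parser module (names and spacing are assumptions).

def generated_file_names(gramname):
    """Output names assembled in fn: grammar name + "syn.DEF" / "syn.MOD"."""
    return gramname + "syn.DEF", gramname + "syn.MOD"


def write_const_decl(text, n):
    """Like WriteConstDecl: text, n right-aligned in 4 columns, ';', line break."""
    return f"{text}{n:>4};\n"


def const_section(maxname, maxs, pc, maxany, maxeps, maxt, maxp, startpc):
    out = "CONST\n"
    out += write_const_decl("  maxname  =", maxname)
    out += write_const_decl("  maxnamep =", maxs)
    out += write_const_decl("  maxcode  =", pc - 1)   # pc is one past the last G-code byte
    # Empty tables are declared with size 1 (PrintTables then writes a dummy set).
    out += write_const_decl("  maxany   =", maxany if maxany != 0 else 1)
    out += write_const_decl("  maxeps   =", maxeps if maxeps != 0 else 1)
    out += write_const_decl("  maxt     =", maxt)
    out += write_const_decl("  maxp     =", maxp)
    out += write_const_decl("  maxs     =", maxs)
    out += write_const_decl("  startpc  =", startpc)
    return out


print(const_section(maxname=120, maxs=30, pc=201, maxany=0,
                    maxeps=3, maxt=12, maxp=14, startpc=5), end="")
```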

(* OutByte         Write a byte value to tables file
   ------------------------------------------------------------------*)
PROCEDURE OutByte(VAR f:File; ch:CHAR);
BEGIN
  IF first THEN c.ch[1]:=ch;
  ELSE c.ch[2]:=ch; OutWord(f,c.card);
  END;
  first:=NOT first;
END OutByte;

(* OutWord         Write a word to tables file
   ------------------------------------------------------------------*)
PROCEDURE OutWord(VAR f:File; n:CARDINAL);
BEGIN
  IF ic=10 THEN WriteString(f,"$      "); ic:=0; END;
  WriteCard(f,n,5); Write(f,",");
  INC(ic);
END OutWord;

(* PrintTables     Write out an initialization of the grammar tables
   ------------------------------------------------------------------*)
PROCEDURE PrintTables(VAR f:File);
VAR
  i,j,l: CARDINAL;
  name:  ARRAY[1..50] OF CHAR;
  s:     Symbolset;
  sn:    Symbolnode;
BEGIN
  first:=TRUE;
  WriteString(f,"  INLINE($      "); ic:=0;                (*header (table lengths)*)
  OutWord(f,pc-1);   OutWord(f,maxt);   OutWord(f,maxp); OutWord(f,maxs);
  OutWord(f,maxeps); OutWord(f,maxany); OutWord(f,maxs); OutWord(f,maxname);
  WriteString(f,"$(*---G-code---*)$      "); ic:=0;        (*G-code*)
  FOR i:=1 TO pc-1 DO OutByte(f,CHR(code[i])); END;
  IF ODD(pc-1) THEN OutByte(f,0C); END;
  WriteString(f,"$(*---nt-symbols---*)$      "); ic:=0;    (*nt-symbols*)
  FOR i:=maxp+1 TO maxs DO
    GetSy(i,sn);
    OutWord(f,sn.start);
    OutWord(f,ORD(sn.del)*256);
    GetF(i,s);
    FOR j:=0 TO maxt DIV 16 DO OutWord(f,VAL(CARDINAL,s[j])); END;
  END;
  WriteString(f,"$(*---eps followers---*)$      "); ic:=0; (*followers of eps nodes*)
  FOR i:=1 TO maxeps DO
    GetE(i,s);
    FOR j:=0 TO maxt DIV 16 DO OutWord(f,VAL(CARDINAL,s[j])); END;
  END;
  IF maxeps=0 THEN
    FOR j:=0 TO maxt DIV 16 DO OutWord(f,0); END;
    maxeps:=1;                                            (*dummy*)
  END;
  WriteString(f,"$(*---any sets---*)$      "); ic:=0;      (*any-sets*)
  FOR i:=1 TO maxany DO
    GetA(i,s);
    FOR j:=0 TO maxt DIV 16 DO OutWord(f,VAL(CARDINAL,s[j])); END;
  END;
  IF maxany=0 THEN
    FOR j:=0 TO maxt DIV 16 DO OutWord(f,0); END;
    maxany:=1;                                            (*dummy*)
  END;
  WriteString(f,"$(*---attribute numbers---*)$      "); ic:=0; (*attribute numbers*)
  FOR i:=0 TO maxp DO GetSy(i,sn); OutWord(f,sn.nra); END;
  WriteString(f,"$(*---pragma semantic---*)$      "); ic:=0; (*pragma semantic*)
  OutWord(f,0); OutWord(f,0);                             (*dummy psem*)
  FOR i:=maxt+1 TO maxp DO
    GetSy(i,sn); OutWord(f,sn.sem1); OutWord(f,sn.sem2);
  END;
  WriteString(f,"$(*---name pointers---*)$      "); ic:=0; (*name pointers*)
  OutWord(f,1);                                           (*for eofsy*)
  FOR i:=1 TO maxs DO
    GetSy(i,sn);
    OutWord(f,sn.spix);     (*sn.spix is now a pointer in the generated name list*)
  END;
  WriteString(f,"$(*---names list---*)$      "); ic:=0;    (*name list*)
  OutByte(f,"E"); OutByte(f,"O"); OutByte(f,"F"); OutByte(f,0C);
  FOR i:=1 TO maxs DO
    GetSy(i,sn);
    GetName(sn.aliasspix,name,l);
    FOR j:=1 TO l DO OutByte(f,name[j]); END;
    OutByte(f,0C);
  END;
  IF ODD(maxname) THEN OutByte(f,0C); END;
  WriteString(f,"0);$");
END PrintTables;

(* PutStatistics   Writes statistics about compilation to list file
   ------------------------------------------------------------------*)
PROCEDURE PutStatistics;
VAR
  ptrsize: CARDINAL;
  setsize: CARDINAL;
  storage: CARDINAL;
BEGIN
  ptrsize:=2; setsize:=2*((maxt DIV 16)+1);
  storage:=pc-1 +                                (*G-code*)
           (ptrsize+2+setsize)*(maxs-maxp) +     (*ntsymbols*)
           setsize*maxeps +                      (*eps-followers*)
           setsize*maxany +                      (*any-sets*)
           2*(maxp+1) +                          (*nra*)
           4*(maxp-maxt+1) +                     (*psem*)
           2*(maxs+1) +                          (*namep*)
           maxname +                             (*name*)
           16;                                   (*header*)
  WriteLn(lst); WriteString(lst,"Statistics:"); WriteLn(lst);
  WriteCard(lst,rules,5); WriteString(lst," rules"); WriteLn(lst);
  WriteCard(lst,alts,5); WriteString(lst," alternatives"); WriteLn(lst);
  WriteCard(lst,maxn,5); WriteString(lst," nodes"); WriteLn(lst);
  WriteCard(lst,maxsem-10,5); WriteString(lst," semantic actions");
  WriteLn(lst);
  WriteCard(lst,maxeps,5); WriteString(lst," eps with look ahead");
  WriteLn(lst);
  WriteCard(lst,maxany,5); WriteString(lst," any-sets"); WriteLn(lst);
  WriteCard(lst,pc-1,5); WriteString(lst," bytes for G-code");
  WriteLn(lst);
  WriteCard(lst,storage,5);
  WriteString(lst," bytes for grammar tables (total)"); WriteLn(lst);
END PutStatistics;

(* WriteConstDecl  Write constant declaration text
   ------------------------------------------------------------------*)
PROCEDURE WriteConstDecl(VAR f:File; t:ARRAY OF CHAR; n:CARDINAL);
BEGIN
  WriteString(f,t); WriteCard(f,n,4); WriteString(f,";$");
END WriteConstDecl;

END cocogen2.

[Cross-reference listing of cocogen2.MOD: identifiers and the numbers of the lines in which they occur]
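OutByte collects two bytes in the variant record c and writes them as one word. PrintTables pads an odd byte count (G-code, name list) with 0C. PutStatistics adds up the sizes of all emitted tables. The Python sketch below reproduces the packing and the storage formula so the arithmetic can be checked. The function names are hypothetical. It assumes a big-endian word layout, i.e. that c.ch[1] is the high byte of c.card; the listing itself does not say which byte order the target machine uses.

```python
def pack_bytes(data):
    """Pair bytes into 16-bit words as OutByte does; an odd count is padded
    with 0 (the IF ODD(...) THEN OutByte(f,0C) steps in PrintTables).
    Assumes big-endian words: the first byte of a pair is the high byte."""
    if len(data) % 2:
        data = data + b"\x00"
    return [data[i] * 256 + data[i + 1] for i in range(0, len(data), 2)]


def table_storage(pc, maxt, maxp, maxs, maxeps, maxany, maxname):
    """Total size in bytes of the generated tables, following PutStatistics."""
    ptrsize = 2                        # one 16-bit word per graph pointer
    setsize = 2 * (maxt // 16 + 1)     # one 16-bit BITSET per 16 terminals
    return ((pc - 1)                                   # G-code
            + (ptrsize + 2 + setsize) * (maxs - maxp)  # nt-symbols: start, del, follower set
            + setsize * maxeps                         # eps followers
            + setsize * maxany                         # any-sets
            + 2 * (maxp + 1)                           # nra of symbols 0..maxp
            + 4 * (maxp - maxt + 1)                    # psem: 2 words per pragma + dummy
            + 2 * (maxs + 1)                           # name pointers incl. eofsy
            + maxname                                  # name list
            + 16)                                      # header: 8 words of table lengths


print(pack_bytes(b"EOF\x00"))   # [17743, 17920]
```

For example, with 100 bytes of G-code (pc=101), maxt=20, maxp=22, maxs=30, maxeps=2, maxany=1 and maxname=150, the formula gives 462 bytes.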

(* cocogra         Graph node list                        Moe  28.12.83
   ------------------------------------------------------------------
   This module builds and handles the top-down graph. It
   a) generates and updates single graph nodes
   b) concatenates graphs via left or right pointers
   c) prints the whole graph for tracing
   d) inserts eps nodes before deletable nonterminals with alternatives
   e) deletes redundant eps-nodes resulting from EBNF-constructs such as
      {x}y or [x]y
   ------------------------------------------------------------------*)
DEFINITION MODULE cocogra;

FROM cocosym IMPORT Symboltype;

CONST
  maxnodes = 600;

TYPE
  Graphnode = RECORD
    typ:  Symboltype;  (*eps,t,pr,nt,any,err*)
    sp:   CARDINAL;    (*node symbol*)
    lp:   CARDINAL;    (*left pointer*)
    rp:   CARDINAL;    (*right pointer*)
    sem1: [0..255];    (*evaluation of in-attributes*)
    sem2: [0..255];    (*evaluation of out-attributes*)
    sem3: [0..255];    (*semantic action*)
    line: CARDINAL;    (*line number*)
    link: CARDINAL;    (*ptr to node with same right successor*)
  END;

  Marklist = ARRAY[0..maxnodes DIV 16] OF BITSET;

VAR
  maxn:    CARDINAL;   (*number of graph nodes*)
  alts:    CARDINAL;   (*number of alternatives, filled by AG*)
  rules:   CARDINAL;   (*number of grammar rules, filled by AG*)
  rootloc: CARDINAL;   (*root node of grammar, filled by AG*)

PROCEDURE ClearMarkList(VAR m:Marklist);
(* Clears the mark list m*)

PROCEDURE ConcatLeft(VAR gp,gl,gp1,gl1:CARDINAL);
(* Links the graph (gp,gl) with the graph (gp1,gl1) via left pointers.
   The resulting graph is identified by (gp,gl)*)

PROCEDURE ConcatRight(VAR gp,gl,gp1,gl1:CARDINAL);
(* Links the graph (gp,gl) with the graph (gp1,gl1) via right pointers.
   The resulting graph is identified by (gp,gl)*)

PROCEDURE Deletable(loc:CARDINAL): BOOLEAN;
(* TRUE if the graph with the root loc is deletable*)

PROCEDURE DeleteRedundantEps;
(* Deletes eps nodes in constructions {x}y and [x]y*)

PROCEDURE DelNode(gn:Graphnode): BOOLEAN;
(* TRUE if the node gn contains a deletable symbol*)

PROCEDURE GetNode(p:CARDINAL; VAR gn:Graphnode);
(* Gets the graph node with the index p*)

PROCEDURE GraphList;
(* Prints a test list of the top-down graphs of all rules*)

PROCEDURE Mark(loc:CARDINAL; VAR m:Marklist);
(* Marks loc in list m as visited*)

PROCEDURE Marked(loc:CARDINAL; VAR m:Marklist): BOOLEAN;
(* TRUE if loc is marked in m*)

PROCEDURE NewEpsBeforeDelNts;
(* Inserts eps nodes in front of deletable nt's*)

PROCEDURE NewNode(typ:Symboltype; sp,line:CARDINAL): CARDINAL;
(* Generates a new graph node with the specified values and returns
   its index*)

PROCEDURE RepNode(p:CARDINAL; gn:Graphnode);
(* Replaces the graph node with index p by gn*)

END cocogra.
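A Marklist holds one BITSET for each group of 16 node indices. Mark and Marked therefore split a node index into a word index (loc DIV 16) and a bit number (loc MOD 16). The following Python sketch models the same layout with integers as 16-bit sets. The function names mirror the Modula-2 procedures, but the code is only an illustration.

```python
MAXNODES = 600   # same bound as cocogra.maxnodes


def clear_mark_list():
    """Like ARRAY[0..maxnodes DIV 16] OF BITSET, with every set empty."""
    return [0] * (MAXNODES // 16 + 1)


def mark(loc, m):
    """INCL(m[loc DIV 16], loc MOD 16): set the bit for node loc."""
    m[loc // 16] |= 1 << (loc % 16)


def marked(loc, m):
    """(loc MOD 16) IN m[loc DIV 16]: test the bit for node loc."""
    return bool(m[loc // 16] & (1 << (loc % 16)))


m = clear_mark_list()
mark(17, m)
print(marked(17, m), marked(16, m))   # True False
```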

(* cocogra         Graph node list for coco               Moe  29.12.83
   ------------------------------------------------------------------
   This module builds and handles the top-down graph. It
   a) generates and updates single graph nodes
   b) concatenates graphs via left or right pointers
   c) prints the whole graph for tracing
   d) inserts eps nodes before deletable nonterminals with alternatives
   e) deletes redundant eps-nodes resulting from EBNF-constructs such as
      {x}y or [x]y
   ------------------------------------------------------------------*)
IMPLEMENTATION MODULE cocogra;

FROM cocolex IMPORT ddt, GetName;
FROM cocosym IMPORT maxp, maxs, GetSy, RepSy, Symbolnode,
                    Symboltype;
FROM Errors  IMPORT Restriction;
FROM FileIO  IMPORT con, WriteCard, WriteLn, WriteString,
                    WriteText;

TYPE
  Graphnodelist = ARRAY[1..maxnodes] OF Graphnode;
VAR
  gn: Graphnodelist;   (*syntax graph*)

(* ClearMarkList   Clear mark list m
   ------------------------------------------------------------------*)
PROCEDURE ClearMarkList(VAR m:Marklist);
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO maxnodes DIV 16 DO m[i]:={}; END;
END ClearMarkList;

(* ConcatLeft      Concatenate graph gp1 left to graph gp
   ------------------------------------------------------------------*)
PROCEDURE ConcatLeft(VAR gp,gl,gp1,gl1:CARDINAL);
VAR p: CARDINAL;
BEGIN
  p:=gp;
  WHILE gn[p].lp<>0 DO p:=gn[p].lp; END;
  gn[p].lp:=gp1;
  p:=gl;
  WHILE gn[p].link<>0 DO p:=gn[p].link; END;
  gn[p].link:=gl1;
END ConcatLeft;

(* ConcatRight     Concatenate graph gp1 right to graph gp
   ------------------------------------------------------------------*)
PROCEDURE ConcatRight(VAR gp,gl,gp1,gl1:CARDINAL);
VAR p: CARDINAL;
BEGIN
  p:=gl;
  WHILE p<>0 DO gn[p].rp:=gp1; p:=gn[p].link; END;
  gl:=gl1;
END ConcatRight;

(* Deletable       Check if graph in loc is deletable
   ------------------------------------------------------------------*)
PROCEDURE Deletable(loc:CARDINAL): BOOLEAN;
VAR m: Marklist;

  PROCEDURE DelGraph(loc:CARDINAL): BOOLEAN;
  VAR gn: Graphnode;
  BEGIN
    IF loc=0 THEN RETURN TRUE; END;              (*end of graph found*)
    IF Marked(loc,m) THEN RETURN FALSE; END;
    Mark(loc,m);
    GetNode(loc,gn);
    IF ddt["C"] THEN
      WriteString(con,"DelGraph:");
      WriteCard(con,loc,6); WriteCard(con,ORD(gn.typ),8);
      WriteCard(con,gn.sp,6); WriteLn(con);
    END;
    RETURN ((gn.lp<>0) AND DelGraph(gn.lp)) OR
           (DelNode(gn) AND DelGraph(gn.rp));
  END DelGraph;

BEGIN (*Deletable*)
  ClearMarkList(m);
  RETURN DelGraph(loc);
END Deletable;

(* DelNode         Test if node gn is deletable
   ------------------------------------------------------------------*)
PROCEDURE DelNode(gn:Graphnode): BOOLEAN;
VAR sn: Symbolnode;
BEGIN
  IF gn.typ=nt THEN GetSy(gn.sp,sn); RETURN sn.del;
  ELSE RETURN gn.typ=eps;
  END;
END DelNode;

(* DeleteRedundantEps  Delete eps nodes in constructions {x}y and [x]y
   ------------------------------------------------------------------*)
PROCEDURE DeleteRedundantEps;
VAR
  m:  Marklist;
  i:  CARDINAL;
  sn: Symbolnode;

  PROCEDURE DelEps(loc:CARDINAL);
  VAR gn,gn1: Graphnode;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    Mark(loc,m);
    GetNode(loc,gn);
    WITH gn DO
      IF lp<>0 THEN
        GetNode(lp,gn1);
        IF (gn1.typ=eps) AND (gn1.sem3=0)
           AND (gn1.lp=0) AND (gn1.rp<>0) THEN
          lp:=gn1.rp;
        END;
      END;
      DelEps(lp);
      DelEps(rp);
      RepNode(loc,gn);
    END;
  END DelEps;

BEGIN
  ClearMarkList(m);
  FOR i:=maxp+1 TO maxs DO
    GetSy(i,sn);
    DelEps(sn.start);
  END;
END DeleteRedundantEps;

(* GetNode         Get node gp
   ------------------------------------------------------------------*)
PROCEDURE GetNode(gp:CARDINAL; VAR gn1:Graphnode);
BEGIN gn1:=gn[gp]; END GetNode;

(* GraphList       Trace output of graph node list
   ------------------------------------------------------------------*)
PROCEDURE GraphList;
VAR
  i,j,l: CARDINAL;
  name:  ARRAY[1..80] OF CHAR;
  sn:    Symbolnode;
BEGIN
  WriteString(con,"$$Topdown-graph:$$");
  WriteString(con,"loc symbol       typ     lp     rp");
  WriteString(con,"   sem1   sem2   sem3");
  WriteString(con,"   link   line$$");
  FOR i:=1 TO maxn DO
    WriteCard(con,i,3);
    WITH gn[i] DO
      CASE typ OF
        eps,any: WriteString(con,"             ");
      | t,pr,nt: GetSy(sp,sn);
                 GetName(sn.spix,name,l);
                 FOR j:=l+1 TO 12 DO name[j]:=" "; END;
                 WriteText(con,name,12);
      | err:     WriteString(con,"error        ");
      END; (*CASE*)
      CASE typ OF
        eps: WriteString(con,"  eps ");
      | t:   WriteString(con,"  t   ");
      | pr:  WriteString(con,"  pr  ");
      | nt:  WriteString(con,"  nt  ");
      | any: WriteString(con,"  any ");
      ELSE;
      END; (*CASE*)
      WriteCard(con,lp,7);   WriteCard(con,rp,7);
      WriteCard(con,sem1,7); WriteCard(con,sem2,7);
      WriteCard(con,sem3,7); WriteCard(con,link,7);
      WriteCard(con,line,7); WriteLn(con);
    END; (*WITH*)
  END; (*FOR*)
END GraphList;

(* Mark            Marks node loc in m as visited
   ------------------------------------------------------------------*)
PROCEDURE Mark(loc:CARDINAL; VAR m:Marklist);
BEGIN INCL(m[loc DIV 16],loc MOD 16); END Mark;

(* Marked          Tests if node loc is marked in m
   ------------------------------------------------------------------*)
PROCEDURE Marked(loc:CARDINAL; VAR m:Marklist): BOOLEAN;
BEGIN RETURN (loc MOD 16) IN m[loc DIV 16]; END Marked;

(* NewEpsBeforeDelNts  Insert eps before del. nt's with alternatives
   ------------------------------------------------------------------*)
PROCEDURE NewEpsBeforeDelNts;
VAR
  gn,gn1: Graphnode;
  loc,loc1,maxloc: CARDINAL;
  sn: Symbolnode;
BEGIN
  maxloc:=maxn;
  FOR loc:=1 TO maxloc DO
    GetNode(loc,gn);
    IF (gn.typ=nt) AND (gn.lp<>0) AND DelNode(gn) THEN
      loc1:=NewNode(gn.typ,gn.sp,gn.line);
      gn1:=gn; gn1.lp:=0;
      RepNode(loc1,gn1);
      WITH gn DO
        typ:=eps; sp:=0; rp:=loc1; sem1:=0; sem2:=0; sem3:=0;
      END;
      RepNode(loc,gn);
    END;
  END; (*FOR*)
END NewEpsBeforeDelNts;

(* NewNode         Generate a new graph node and return the index
   ------------------------------------------------------------------*)
PROCEDURE NewNode(t:Symboltype; s:CARDINAL; l:CARDINAL): CARDINAL;
BEGIN
  INC(maxn);
  IF maxn>maxnodes THEN Restriction(5); END;
  WITH gn[maxn] DO
    typ:=t; sp:=s; lp:=0; rp:=0; sem1:=0; sem2:=0; sem3:=0;
    line:=l; link:=0;
  END;
  RETURN maxn;
END NewNode;

(* RepNode         Replace node gp
   ------------------------------------------------------------------*)
PROCEDURE RepNode(gp:CARDINAL; gn1:Graphnode);
BEGIN gn[gp]:=gn1; END RepNode;

BEGIN (*cocogra*)
  maxn:=0;
END cocogra.
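DelGraph, local to Deletable, decides whether some path through a rule's graph consists only of deletable nodes. It first tries the alternative reached via lp. Otherwise it requires the current node to be deletable and recurses along rp. Reaching pointer 0 means the end of an alternative has been reached without meeting a non-deletable symbol. The mark list stops the search from revisiting nodes. The Python sketch below models this traversal. The Node class, the dictionary graph, and the nt_deletable table (standing in for GetSy(...).del) are hypothetical, and a set replaces the bit-set Marklist.

```python
from dataclasses import dataclass


@dataclass
class Node:
    typ: str      # "t", "pr", "nt", "eps", "any", ...
    sp: int = 0   # symbol index (used for nt nodes)
    lp: int = 0   # next alternative, 0 = none
    rp: int = 0   # successor in the sequence, 0 = end of alternative


def deletable(root, graph, nt_deletable):
    """True if some path from root to the end of the graph has only deletable nodes."""
    visited = set()                       # plays the role of the Marklist m

    def del_node(n):
        if n.typ == "nt":
            return nt_deletable.get(n.sp, False)
        return n.typ == "eps"

    def del_graph(loc):
        if loc == 0:
            return True                   # end of graph found
        if loc in visited:
            return False
        visited.add(loc)
        n = graph[loc]
        return (n.lp != 0 and del_graph(n.lp)) or (del_node(n) and del_graph(n.rp))

    return del_graph(root)


# A = "a" | eps : node 1 is the terminal, its alternative (lp) is the eps node 2
print(deletable(1, {1: Node("t", lp=2), 2: Node("eps")}, {}))   # True
```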

[Cross-reference listing of cocogra.MOD: identifiers and the numbers of the lines in which they occur]

(* cocolex         Lexical analyzer for coco              Moe  83.03.27
   ------------------------------------------------------------------
   This is the Coco-scanner. It
   a) reads the input grammar
   b) returns symbol numbers and terminal attributes to the parser
   c) hashes names and strings into a name list (permanently or
      temporarily)
   d) converts number-strings to values

   All symbols which are not terminals of Cocol get the symbol type
   'nococosy' and are hashed into the name list.
   ------------------------------------------------------------------*)
DEFINITION MODULE cocolex;

FROM FileIO IMPORT File;

VAR
  typ:  CARDINAL;                    (*next token code*)
  at:   ARRAY[1..10] OF CARDINAL;    (*attr. values of current token*)
  line: CARDINAL;                    (*current line number*)
  col:  CARDINAL;                    (*current column number*)
  ddt:  ARRAY["A".."Z"] OF BOOLEAN;  (*debug and test switches*)
  src:  File;                        (*source file*)

PROCEDURE GetName(spix:CARDINAL; VAR name:ARRAY OF CHAR; VAR len:CARDINAL);
(* Get the text of a name or a string with the spelling index spix.
   len denotes its length*)

PROCEDURE GetSy;
(* Gets the next input token and fills at, line and col*)

PROCEDURE RestartHash;
(* Causes identifiers and strings to be stored permanently*)

PROCEDURE StopHash;
(* Causes identifiers and strings to be stored temporarily*)

END cocolex.
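Item d) is handled by ReadNumber in the implementation module. CARDINAL holds at most 65535, so before each new digit ReadNumber checks whether 10*val+digit would exceed that limit. It rejects the digit when val > 6553, or when val = 6553 and the digit is above 5. On overflow it reports SemErr(22) and skips the remaining digits. The following Python sketch mirrors that guard. The function name and the (value, position, overflow) return tuple are illustrative assumptions, not part of the original interface.

```python
def read_number(text, pos=0):
    """Scan decimal digits from text[pos:] with ReadNumber's 16-bit overflow guard.
    Returns (value, new_pos, overflow)."""
    digits = "0123456789"
    val, overflow = 0, False
    while pos < len(text) and text[pos] in digits:
        ch = text[pos]
        if val > 6553 or (val == 6553 and ch > "5"):
            overflow = True                      # ReadNumber calls SemErr(22,line,col) here
            while pos < len(text) and text[pos] in digits:
                pos += 1                         # skip the rest of the number
        else:
            val = 10 * val + (ord(ch) - ord("0"))
            pos += 1
    return val, pos, overflow


print(read_number("65535"))   # (65535, 5, False)
print(read_number("65536"))   # (6553, 5, True)
```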

(* cocolex:  lexical analyzer for coco                    Moe  83.03.27
   ===================================                         83.12.23
   This is the Coco-scanner. It
   a) reads the input grammar
   b) returns symbol numbers and terminal attributes to the parser
   c) hashes names and strings into a name list (permanently or
      temporarily)
   d) converts number-strings to values
   All symbols which are not terminals of Cocol get the symbol type
   'nococosy' and are hashed into the name list.
   ------------------------------------------------------------------*)
IMPLEMENTATION MODULE cocolex;

FROM cocosyn IMPORT printinput, printnodes;
FROM Errors  IMPORT SemErr, Restriction;
FROM FileIO  IMPORT con, EF, EOL, File, Read, Write, WriteCard,
                    WriteString, WriteText;
FROM SYSTEM  IMPORT VAL;

CONST
  eofsy       =  0;    (*lexical types*)
  ident       = 17;    (*numbers 1..16 reserved for keywords*)
  string      = 18;
  number      = 19;
  eqlsy       = 20;
  periodsy    = 21;
  variantsy   = 22;
  lparsy      = 23;
  rparsy      = 24;
  lbracksy    = 25;
  rbracksy    = 26;
  lconbrsy    = 27;
  rconbrsy    = 28;
  latparsy    = 29;
  ratparsy    = 30;
  semicolonsy = 31;
  colonsy     = 32;
  commasy     = 33;
  nococosy    = 34;
  notyp       = 255;
  buflen      = 1024*16;

TYPE
  Charclass = (none,letter,digit,quote,eql,period,variant,lpar,rpar,lbrack,
               rbrack,lconbr,rconbr,latpar,ratpar,semicolon,colon,comma,endfile,
               endline,dollar,minus);

VAR
  c:        CHAR;
  class:    ARRAY[0C..377C] OF Charclass;   (*class of input character*)
  buf:      ARRAY[0..buflen-1] OF CHAR;     (*input buffer*)
  bp,bpmax: CARDINAL;                       (*buffer pointers*)

CONST
  idmax = 4980;   (*max. length of identifier list*)
  htmax = 359;    (*max. length of hash table*)

VAR
  ch:      CHAR;                        (*current input character*)
  column:  CARDINAL;                    (*current column*)
  i:       CARDINAL;
  id:      ARRAY[0..idmax+20] OF CHAR;  (*identifiers*)
  idact:   CARDINAL;                    (*last element in id*)
  keys:    CARDINAL;                    (*pos. of last keyword in id*)
  ht:      ARRAY[0..htmax] OF CARDINAL; (*hash table*)
  storeid: BOOLEAN;                     (*store id. permanently?*)

(* NextCh          Get next input character (ch,column global)
   ------------------------------------------------------------------*)
PROCEDURE NextCh;
BEGIN
  Read(src,ch); INC(column);
END NextCh;

(* Hash            Hash an identifier and return its spix
   ------------------------------------------------------------------*)
PROCEDURE Hash(idp:CARDINAL; VAR spix:CARDINAL);
VAR h,l,d: INTEGER;

  PROCEDURE EqualId(x,y,l:CARDINAL): BOOLEAN;
  VAR i: CARDINAL;
  BEGIN
    i:=0;
    WHILE (i<l) AND (id[x+i]=id[y+i]) DO INC(i); END;
    RETURN i=l;
  END EqualId;

BEGIN
  l:=idp-idact; spix:=idact+1;
  h:=(ORD(id[spix])*7 + ORD(id[spix+1])*17) MOD htmax;
  d:=-htmax;
  LOOP
    IF ht[h]=0 THEN                                   (*new identifier*)
      IF storeid THEN ht[h]:=spix; idact:=idp; END;
      EXIT;
    ELSIF EqualId(ht[h],spix,l) THEN                  (*old identifier*)
      spix:=ht[h]; EXIT;
    ELSE                                              (*collision*)
      INC(d,2);
      IF d=htmax THEN Restriction(1); END;            (*hash table full*)
      h:=(h+ABS(d)) MOD htmax;
    END;
  END; (*LOOP*)
  IF idp>idmax THEN Restriction(2); END;              (*identifier list full*)
END Hash;

(* EnterKey        Enter a keyword to the identifier list
   ------------------------------------------------------------------*)
PROCEDURE EnterKey(sy:CARDINAL; key:ARRAY OF CHAR);
VAR idp,i: INTEGER;
BEGIN
  INC(idact); id[idact]:=CHR(sy); idp:=idact;         (*store symbol number*)
  FOR i:=0 TO HIGH(key) DO                            (*store keyword*)
    INC(idp); id[idp]:=key[i];
  END;
  INC(idp); id[idp]:=0C;
  Hash(idp,keys);                          (*keys contains the last keyword*)
END EnterKey;

(* GetName         Get the name of identifier spix from the name list
                   at any time
   ------------------------------------------------------------------*)
PROCEDURE GetName(spix:CARDINAL; VAR name:ARRAY OF CHAR; VAR l:CARDINAL);
VAR i,h: CARDINAL;
BEGIN
  i:=spix; l:=0; h:=HIGH(name);
  WHILE (id[i]<>0C) AND (l<=h) DO
    name[l]:=id[i]; INC(i); INC(l);
  END;
END GetName;

(* ReadName        Read identifier or keyword
   ------------------------------------------------------------------*)
PROCEDURE ReadName(VAR typ,val:CARDINAL);
VAR spix,idp: CARDINAL;
BEGIN
  idp:=idact;
  WHILE (class[ch]=letter) OR (class[ch]=digit) DO
    INC(idp); id[idp]:=ch; NextCh;
  END;
  INC(idp); id[idp]:=0C;
  Hash(idp,spix);
  IF spix<=keys THEN typ:=ORD(id[spix-1]); val:=0;    (*keyword*)
  ELSE typ:=ident; val:=spix;                         (*identifier*)
  END;
END ReadName;

(* ReadString      Read and hash a string
   ------------------------------------------------------------------*)
PROCEDURE ReadString(VAR spix:CARDINAL);
VAR
  och: CHAR;
  idp: CARDINAL;
BEGIN
  idp:=idact; och:=ch;
  INC(idp); id[idp]:=och; NextCh;                     (*store quote*)
  LOOP
    IF ch=och THEN NextCh; EXIT;
    ELSIF ch=EF THEN SemErr(24,line,col); EXIT;
    ELSIF ch=EOL THEN SemErr(23,line,col); EXIT;
    ELSE INC(idp); id[idp]:=ch; NextCh;
    END;
  END;
  INC(idp); id[idp]:=och;                             (*store quote*)
  INC(idp); id[idp]:=0C;
  Hash(idp,spix);
END ReadString;

(* RestartHash     Causes identifiers to be stored permanently
   ------------------------------------------------------------------*)
PROCEDURE RestartHash;
BEGIN storeid:=TRUE; END RestartHash;

(* StopHash        Causes identifiers to be stored temporarily
   ------------------------------------------------------------------*)
PROCEDURE StopHash;
BEGIN storeid:=FALSE; END StopHash;

(* ReadNumber      Read and convert a cardinal constant
   ------------------------------------------------------------------*)
PROCEDURE ReadNumber(VAR val:CARDINAL);
BEGIN
  val:=0;
  WHILE class[ch]=digit DO
    IF (val>6553) OR ((val=6553) AND (ch>'5')) THEN
      SemErr(22,line,col);
      WHILE class[ch]=digit DO NextCh; END;
    ELSE
      val:=10*val+VAL(CARDINAL,ORD(ch)-ORD('0'));
      NextCh;
    END;
  END;
END ReadNumber;

(* GetSy           Get next lexical symbol
   ------------------------------------------------------------------*)
PROCEDURE GetSy;
VAR val: CARDINAL;
BEGIN
  REPEAT
    WHILE ch=' ' DO NextCh; END;
    col:=column;
    CASE class[ch] OF
      none:      typ:=nococosy; at[1]:=ORD(ch); NextCh;
    | letter:    ReadName(typ,val);
                 IF typ=ident THEN at[1]:=val; END;
    | digit:     ReadNumber(at[1]); typ:=number;
    | quote:     ReadString(at[1]); typ:=string;
    | eql:       typ:=eqlsy; NextCh;
    | period:    typ:=periodsy; NextCh;
    | variant:   typ:=variantsy; NextCh;
    | lpar:      typ:=lparsy; NextCh;
    | rpar:      typ:=rparsy; NextCh;
    | lbrack:    typ:=lbracksy; NextCh;
    | rbrack:    typ:=rbracksy; NextCh;
    | lconbr:    typ:=lconbrsy; NextCh;
    | rconbr:    typ:=rconbrsy; NextCh;
    | latpar:    typ:=latparsy; NextCh;
    | ratpar:    typ:=ratparsy; NextCh;
    | semicolon: typ:=semicolonsy; NextCh;
    | colon:     typ:=colonsy; NextCh;
    | comma:     typ:=commasy; NextCh;
    | endfile:   typ:=eofsy;
    | endline:   typ:=notyp;
                 column:=0; INC(line); NextCh;
                 IF (line MOD 16)=0 THEN          (*update counter on screen*)
                   IF line>16 THEN
                     FOR i:=1 TO 5 DO Write(con,10C) END;
                   END;
                   WriteCard(con,line,5);
                 END;
    | dollar:    NextCh;
                 IF CAP(ch)="D" THEN              (*debug option*)
                   NextCh;
                   WHILE (CAP(ch)>="A") AND (CAP(ch)<="Z") DO
                     ddt[CAP(ch)]:=TRUE; NextCh;
                   END;
                   IF ddt["A"] THEN printinput:=TRUE END;
                   IF ddt["B"] THEN printnodes:=TRUE END;
                   WHILE ch<>EOL DO NextCh; END;
                   typ:=notyp;
                 ELSE typ:=nococosy; at[1]:=ORD('$');
                 END;
    | minus:     NextCh;
                 IF ch='-' THEN
                   WHILE ch<>EOL DO NextCh; END;
                   typ:=notyp;
                 ELSE typ:=nococosy; at[1]:=ORD('-');
                 END;
    END; (*CASE*)
  UNTIL typ<>notyp;
END GetSy;

BEGIN (*cocolex*)
  FOR c:="A" TO "Z" DO ddt[c]:=FALSE END;
  FOR c:=0C TO 377C DO class[c]:=none; END;
  FOR c:='a' TO 'z' DO class[c]:=letter; END;
  FOR c:='A' TO 'Z' DO class[c]:=letter; END;
  FOR c:='0' TO '9' DO class[c]:=digit; END;
  class[EF]:=endfile;  class[EOL]:=endline; class['$']:=dollar;
  class["'"]:=quote;   class['"']:=quote;
  class['(']:=lpar;    class[')']:=rpar;    class[',']:=comma;
  class['-']:=minus;   class['.']:=period;  class[':']:=colon;
  class[';']:=semicolon; class['<']:=latpar; class['=']:=eql;
  class['>']:=ratpar;  class['[']:=lbrack;  class[']']:=rbrack;
  class['{']:=lconbr;  class['|']:=variant; class['}']:=rconbr;
  FOR i:=0 TO htmax-1 DO ht[i]:=0; END;
  storeid:=TRUE;
  id[0]:="E"; id[1]:="O"; id[2]:="F"; id[3]:=0C;
  idact:=3;
  EnterKey( 1,'ALIAS');         EnterKey( 1,'alias');
  EnterKey( 2,'ANY');           EnterKey( 2,'any');
  EnterKey( 3,'DECLARATIONS');
  EnterKey( 4,'ENDGRAM');
  EnterKey( 5,'ENDSEM');        EnterKey( 5,'endsem');
  EnterKey( 6,'EPS');           EnterKey( 6,'eps');
  EnterKey( 7,'GRAMMAR');
  EnterKey( 8,'IN');            EnterKey( 8,'in');
  EnterKey( 9,'MACROS');
  EnterKey(10,'NONTERMINALS');
  EnterKey(11,'OUT');           EnterKey(11,'out');
  EnterKey(12,'PRAGMAS');
  EnterKey(13,'RULES');
  EnterKey(14,'SEM');           EnterKey(14,'sem');
  EnterKey(15,'SEMANTIC');
  EnterKey(16,'TERMINALS');
  column:=0; col:=0; line:=1;
  ch:=" ";
END cocolex.

[Cross-reference listing of cocolex.MOD: identifiers and the numbers of the lines in which they occur]

105
2165021:855219522205225555262
53
53
52
Al
52
a
GH
aa
ales
a
We
BAY
Bahl
SU N
a
ZN
Pk
RIP
Pl
le
245
248
248
249
61
75
144
144
145
163
166
167
UCR
ANY
als) Ailey IG
24552409248
283
4
51
51 144
144
194
198
215
271
272
US
UGS
CNG
Zi
21)
PAR
PIG
Bie
2800 9280582805
9281952990528
12 305
13
167
168
197
214
283
46 233
278
S58)
C2
Smee217375783
46234
277,
32234
15 240
242
82
95 103
104
105
24
9525
1252270
45 144
194
198
219
274
47 244
276
1022116738275
20282355275
47 236
275
114
123
289
289
290
290
291
292
295
296
296
297
298
299
299
300
304
2000235
15
168
253
260
275
AS 2212719
AN BPI
84
90 100
14
15
15
172855135
209
266
82:
94°
97.
98 -100..10%
105
105
re
118
131

271
273

287
274

274

168
249

169
253

194
258

195
260

273
278

274
279

215
279

215
279

293
301

293
302

294
302

294
303

129

ia

132

cocolex.MOD

id
idact
ident
idmax

idp

key
keys
l
latpar
latparsy
lbrack
lbracksy
lconbr
lconbrsy
letter
line
lpar
lparsy
minus
name
NextCh

nococosy
none
notyp

number
och
period
periodsy
printinput
printnodes
quote
ratpar
ratparsy

rbrack
rbracksy
rconbr
rconbrsy
Read
ReadName
ReadNumber
ReadString
RestartHash
Restriction
rpar
rparsy
SemErr
semicolon
semicolonsy
spix

97
67
85
132
88
164
93
151
64
93
145
171
118
122

84

230
230
226
226
228
228
144
168
224
224
Zoi
131
76
224
244
216
216
236
219
163
222
222
251
252
220
231
231
227
227
229
229
19
153
204
174
180
104
22
225
167
232
232
93
151

98
94
87
133
88
169
93
218
108
98
145
172
119
149
88
279

281

100
95
88
133
94
171
98

101
104
88
240
94
172
117

285
105
88
285
117
287
117

285
88
285
119
287
117

89

1215

118

119

129

121
287
143

132
287
163

133

145

147

288

108
147
172

115
tay
173

117
148

119
161

119
163

121
164

121
164

122
169

141
169

89

93

94

100

128

WH!

132

133

133

272
287

273
238

239

242

283

164
226
249
262

166
227
253

169
228
257

198
229
260

201
230

218
231

216
232

22
233

222
234

261

265

164
278

166

171

276
280

276

100

101

128

131

141

148

149

280

281
2107
197
277
278
133
145
225
247
255
271
254

280
281

217
219
220
108
277

168
279

197

94
158

94
173

98

src
StopHash
storeid
string

sy
SYSTEM

typ

val
VAL
variant

variantsy
Write
WriteCard

WriteString
WriteText
x

y

App. F

Program listings

282

75
185
68
22
114
17
140
225
254
140
17
45
26
16
16
16
16
84
84

186
98
220
1977

150
226
255
150
200
223
223
240
242

88
88

180

186

286

151
227
261
io

216
228
262
191

217
229
265
193

281

218
230

DUG)
Dil

PAY) BE
2G.
23200233231

223224
235
236

195

195)

2008

217

200R

e210

218
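The initialization part of cocolex fills two tables: class, which maps every character to a character class, and the keyword table built by EnterKey, in which ALIAS, ANY, ENDSEM, EPS, IN, OUT and SEM are entered in both upper and lower case and the remaining keywords in upper case only. The Python sketch below models both with dictionaries. The names CHAR_CLASS, KEYWORDS, classify and keyword_code are hypothetical, the classes for EF and EOL are omitted because their character codes come from FileIO, and returning 0 for an ordinary identifier is an assumption of the sketch, not taken from the listing.

```python
import string

# Character classes, as set up in the initialization part of cocolex.
# EF and EOL are left out: their codes come from FileIO.
CHAR_CLASS = {chr(c): "none" for c in range(256)}        # FOR c:=0C TO 377C
CHAR_CLASS.update({ch: "letter" for ch in string.ascii_letters})
CHAR_CLASS.update({ch: "digit" for ch in string.digits})
CHAR_CLASS.update({
    "$": "dollar", "'": "quote", '"': "quote", "(": "lpar", ")": "rpar",
    ",": "comma", "-": "minus", ".": "period", ":": "colon",
    ";": "semicolon", "<": "latpar", "=": "eql", ">": "ratpar",
    "[": "lbrack", "]": "rbrack", "{": "lconbr", "|": "variant",
    "}": "rconbr",
})

KEYWORDS = {}                                            # spelling -> code


def enter_key(code, name):
    """Stand-in for EnterKey: register one spelling of a keyword."""
    KEYWORDS[name] = code


# Keywords entered in upper and lower case.
for code, name in [(1, "ALIAS"), (2, "ANY"), (5, "ENDSEM"), (6, "EPS"),
                   (8, "IN"), (11, "OUT"), (14, "SEM")]:
    enter_key(code, name)
    enter_key(code, name.lower())
# Keywords entered in upper case only.
for code, name in [(3, "DECLARATIONS"), (4, "ENDGRAM"), (7, "GRAMMAR"),
                   (9, "MACROS"), (10, "NONTERMINALS"), (12, "PRAGMAS"),
                   (13, "RULES"), (15, "SEMANTIC"), (16, "TERMINALS")]:
    enter_key(code, name)


def classify(ch):
    return CHAR_CLASS.get(ch, "none")


def keyword_code(ident):
    """Keyword code of ident, or 0 if it is an ordinary identifier."""
    return KEYWORDS.get(ident, 0)
```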

cocolst.DEF

(* cocolst        Prints listing of Cocol text               Moe 16.8.87
   ========================================================================
   This module closes the source file and reopens it for reading. It prints
   a listing of the source file with line numbers and error messages.
   ------------------------------------------------------------------------*)

DEFINITION MODULE cocolst;
FROM FileIO IMPORT File;

VAR lst: File; (*list file*)

PROCEDURE PrintListing;

END cocolst.

cocolst.MOD

(* cocolst        Prints listing of Cocol text               Moe 16.8.87
   ========================================================================
   This module closes the source file and reopens it for reading. It prints
   a listing of the source file with line numbers and error messages.
   ------------------------------------------------------------------------*)

IMPLEMENTATION MODULE cocolst;

FROM cocolex IMPORT src;
FROM Errors  IMPORT Errorptr, GetNextSynErr, GetNextSemErr, PrintSynError;
FROM FileIO  IMPORT File, EF, EOL, Open, Close, Read, Write,
                    WriteString, WriteCard, WriteLn;

(* GetLine        Read a source line. Return empty line if eof.
   ------------------------------------------------------------------------*)
PROCEDURE GetLine(f:File; VAR line:ARRAY OF CHAR);
  VAR ch:CHAR; i:CARDINAL;
BEGIN
  Read(f,ch); i:=0;
  WHILE (ch<>EOL) AND (ch<>EF) DO line[i]:=ch; INC(i); Read(f,ch) END;
  IF (i=0) AND (ch=EF) THEN line[0]:=EF ELSE line[i]:=0C END;
END GetLine;

(* PrintSemError  Print semantic error message
   ------------------------------------------------------------------------*)
PROCEDURE PrintSemError(f:File; nr,col:CARDINAL);
  VAR i:CARDINAL;
BEGIN
  WriteString(f,"*****  ");
  FOR i:=1 TO col-1 DO Write(f," ") END;
  WriteString(f,"^ ");
  CASE nr OF
     1: WriteString(f,"Symbol declared twice");
  |  2: WriteString(f,"Grammar name is no nonterminal");
  |  3: WriteString(f,"Undeclared symbol");
  |  4: WriteString(f,"Terminal on left-hand side of rule");
  |  5: WriteString(f,"Two rules for the same nonterminal");
  |  6: WriteString(f,"Wrong number of attributes");
  |  7: WriteString(f,"In-attribute for a terminal");
  |  8: WriteString(f,"Wrong attribute direction");
  |  9: WriteString(f,"Wrong attribute name");
  | 10: WriteString(f,"Attribute constant on left-hand side of rule");
  | 11: WriteString(f,"Semantic macro declared twice");
  | 12: WriteString(f,"Undeclared semantic macro");
  | 16: WriteString(f,"Pragma used in rules");
  | 21: WriteString(f,"File 'cocosynframe' not found");
  | 22: WriteString(f,"Number too big");
  | 23: WriteString(f,"End of line in string");
  | 24: WriteString(f,"End of file in string");
  | 25: WriteString(f,"File 'cocosemframe' not found");
  ELSE WriteString(f,"Error");
  END;
  WriteLn(f);
END PrintSemError;

(* PrintListing   Print a source list with error messages
   ------------------------------------------------------------------------*)
PROCEDURE PrintListing;
  VAR volRef:  INTEGER;               (*volume or directory of source file*)
      srcn:    ARRAY[0..63] OF CHAR;  (*source name*)
      line:    ARRAY[0..255] OF CHAR; (*source line*)
      symbols: Errorptr;              (*pointer to error symbols*)
      synline,syncol: CARDINAL;       (*line and column of syntax error*)
      semnr:   CARDINAL;              (*semantic error number*)
      semline,semcol: CARDINAL;       (*line and column of semantic error*)
      lnr:     CARDINAL;              (*line number*)
      sync,semc: CARDINAL;            (*error counters*)
      i:       CARDINAL;
BEGIN
  volRef:=src^.volRef;
  i:=0; REPEAT srcn[i]:=src^.name[i]; INC(i) UNTIL srcn[i-1]=0C;
  Close(src); Open(src,volRef,srcn,FALSE);
  GetNextSemErr(semnr,semline,semcol);
  GetNextSynErr(symbols,synline,syncol);
  sync:=0; semc:=0;
  GetLine(src,line); lnr:=1;
  WHILE line[0]<>EF DO
    WriteCard(lst,lnr,5); WriteString(lst,"  ");
    WriteString(lst,line); WriteLn(lst);
    WHILE synline=lnr DO
      PrintSynError(lst,symbols,syncol); INC(sync);
      GetNextSynErr(symbols,synline,syncol);
    END;
    WHILE semline=lnr DO
      PrintSemError(lst,semnr,semcol); INC(semc);
      GetNextSemErr(semnr,semline,semcol);
    END;
    GetLine(src,line); INC(lnr);
  END;
  WHILE symbols<>NIL DO
    PrintSynError(lst,symbols,syncol); INC(sync);
    GetNextSynErr(symbols,synline,syncol);
  END;
  WHILE semnr<>0 DO
    PrintSemError(lst,semnr,semcol); INC(semc);
    GetNextSemErr(semnr,semline,semcol);
  END;
  WriteLn(lst);
  WriteCard(lst,sync,5); WriteString(lst," syntax error(s)$");
  WriteCard(lst,semc,5); WriteString(lst," semantic error(s)$$");
END PrintListing;

END cocolst.

Cross-reference:
ch, Close, cocolex, cocolst, col, EF, EOL, Errorptr, Errors, f, File, FileIO,
GetLine, GetNextSemErr, GetNextSynErr, i, line, lnr, lst, name, nr, Open,
PrintListing, PrintSemError, PrintSynError, Read, semc, semcol, semline,
semnr, src, srcn, symbols, sync, syncol, synline, volRef, Write, WriteCard,
WriteLn, WriteString

cocosem.DEF

(* Generated semantic analyzer
   ========================================================================
   This module is produced by Coco from the semantic actions of an
   attributed grammar.
   ------------------------------------------------------------------------*)

DEFINITION MODULE cocosem;
VAR printactions: BOOLEAN; (*trace the executed semantic actions*)
PROCEDURE Semant(sem:CARDINAL);
END cocosem.

cocosem.MOD

(* Generated semantic analyzer
   ========================================================================
   This module is produced by Coco from the semantic actions of an
   attributed grammar.
   ------------------------------------------------------------------------*)

IMPLEMENTATION MODULE cocosem;

FROM FileIO  IMPORT con, WriteCard, WriteString;
FROM SYSTEM  IMPORT WORD;
FROM cocolex IMPORT at;
FROM cocogen IMPORT Attrtype, CloseFile, Copy, EmitAction, GenAssign,
                    InsertFramePart, OpenFile, OpenSem, StartCopy;
FROM cocogra IMPORT alts, rules, rootloc, ConcatLeft, ConcatRight,
                    GetNode, GraphList, Graphnode, NewNode, RepNode;
FROM cocolex IMPORT typ, line, col, ddt, RestartHash, StopHash;
FROM cocosym IMPORT gramspix, CompleteAt, Direction, GetAt, GetMacroNr, GetSy,
                    NewAt, NewMacro, NewSy, RepSy, Symbolnode, Symboltype, SyNr;
FROM Errors  IMPORT CompErr, Restriction, SemErr;
FROM SYSTEM  IMPORT VAL;

CONST null=65535;
TYPE Usage=(def, check, use);
VAR sn: Symbolnode;
    sy, sy1: CARDINAL;
    rootsy: CARDINAL;
    eofsy: CARDINAL;
    gn: Graphnode;
    gp, gp1, gp2, gp3: CARDINAL;
    gl, gl1, gl2, gl3: CARDINAL;
    dd, dd1, dd2: BOOLEAN;
    gpo: CARDINAL;
    firstfact: BOOLEAN;
    kind: Usage;
    styp: Symboltype;
    dir, dir1: Direction;
    count: CARDINAL;
    n: CARDINAL;
    sem1, sem2, sem3: CARDINAL;
    firstsymbol: BOOLEAN;
    ok: BOOLEAN;
    spix, spix1: CARDINAL;
    dummy: CARDINAL;

MODULE SEMANTICSTACK;
  IMPORT CompErr, Restriction;
  EXPORT Pop, Push;
  CONST maxstacksize=70;
  VAR stack: ARRAY[1..maxstacksize] OF CARDINAL;
      sp: CARDINAL;

  PROCEDURE Pop(): CARDINAL;
    VAR x: CARDINAL;
  BEGIN
    IF sp=0 THEN CompErr(6); ELSE x:=stack[sp]; DEC(sp); END;
    RETURN x;
  END Pop;

  PROCEDURE Push(x: CARDINAL);
  BEGIN
    IF sp<maxstacksize THEN INC(sp); stack[sp]:=x; ELSE Restriction(14); END;
  END Push;

BEGIN sp:=0;
END SEMANTICSTACK;
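SEMANTICSTACK is used by semantic actions 19 and 18 to save and restore the graph-building variables gp, gl, dd, gp1, gl1, dd1 and firstfact around a nested construct: action 19 pushes them and action 18 pops them in reverse order (converting the BOOLEANs with VAL). A Python sketch of the stack and of that save/restore pairing follows. The class and function names are hypothetical, and raising exceptions stands in for the calls Restriction(14) and CompErr(6).

```python
class SemanticStack:
    """Bounded stack of small integers, modelled on SEMANTICSTACK.

    Where the Modula-2 module calls Restriction(14) on overflow and
    CompErr(6) on underflow, this sketch raises exceptions instead.
    """

    def __init__(self, maxstacksize=70):
        self.maxstacksize = maxstacksize
        self.items = []

    def push(self, x):
        if len(self.items) >= self.maxstacksize:
            raise OverflowError("semantic stack overflow")
        self.items.append(x)

    def pop(self):
        if not self.items:
            raise IndexError("semantic stack underflow")
        return self.items.pop()


def save_state(st, gp, gl, dd, gp1, gl1, dd1, firstfact):
    """Push the graph-building state in the order used by action 19."""
    for value in (gp, gl, int(dd), gp1, gl1, int(dd1), int(firstfact)):
        st.push(value)


def restore_state(st):
    """Pop the state back in reverse order, as action 18 does."""
    firstfact = bool(st.pop())
    dd1, gl1, gp1 = bool(st.pop()), st.pop(), st.pop()
    dd, gl, gp = bool(st.pop()), st.pop(), st.pop()
    return gp, gl, dd, gp1, gl1, dd1, firstfact
```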
PROCEDURE Error(nr:CARDINAL);
BEGIN SemErr(nr,line,col); END Error;

PROCEDURE ASSIGN(VAR x:WORD; y:WORD);
BEGIN x:=y; END ASSIGN;

PROCEDURE Semant(sem:CARDINAL);
BEGIN
  (*IF printactions THEN
    WriteString(con,"$ ["); WriteCard(con,sem,3); WriteString(con,"] ");
  END;*)
  CASE sem OF
  | 12: (*line 125*)
        INC(count);
        CASE kind OF
          use:   IF styp=nt THEN
                   GetAt(sy,count,spix1,dir1);
                   IF spix1<>0 THEN
                     IF dir=dir1 THEN GenAssign(nonterm,spix1,spix);
                     ELSE Error(8); END;
                   END;
                 END;
        | check: IF styp=nt THEN
                   GetAt(sy,count,spix1,dir1);
                   IF spix1<>0 THEN
                     IF spix<>spix1 THEN Error(9); END;
                     IF dir<>dir1 THEN Error(8); END;
                   END;
                 END;
        | def:   NewAt(sy,spix,dir);
        END;
  | 13: (*line 150*)
        INC(count);
        CASE kind OF
          use:   IF styp=t THEN GenAssign(term,spix,count);
                 ELSIF styp=nt THEN
                   GetAt(sy,count,spix1,dir1);
                   IF spix1<>0 THEN
                     IF dir=dir1 THEN GenAssign(nonterm,spix,spix1)
                     ELSE Error(8);
                     END;
                   END;
                 END;
        | check: IF styp=nt THEN
                   GetAt(sy,count,spix1,dir1);
                   IF spix1<>0 THEN
                     IF spix<>spix1 THEN Error(9); END;
                     IF dir<>dir1 THEN Error(8); END;
                   END;
                 END;
        | def:   NewAt(sy,spix,dir);
                 IF styp=pr THEN GenAssign(term,spix,count); END;
        END;

  | 14: (*line 181*)
        INC(count);
        IF kind=use THEN
          IF styp=nt THEN
            GetAt(sy,count,spix1,dir1);
            IF spix1<>0 THEN
              IF dir=dir1 THEN GenAssign(const,spix1,n);
              ELSE Error(8);
              END;
            END;
          END;
        ELSE Error(10);
        END;
  | 15: (*line 198*)
        IF NOT CompleteAt(sy,count) THEN Error(6); END;
  | 16: (*line 204*)
        Copy(typ,col)
  | 17: (*line 208*)
        StartCopy(1)
  | 18: (*line 212*)
        firstfact:=VAL(BOOLEAN,Pop());
        dd1:=VAL(BOOLEAN,Pop()); gl1:=Pop(); gp1:=Pop();
        dd:=VAL(BOOLEAN,Pop()); gl:=Pop(); gp:=Pop();
        gpo:=0
  | 19: (*line 219*)
        Push(gp); Push(gl); Push(VAL(CARDINAL,dd));
        Push(gp1); Push(gl1); Push(VAL(CARDINAL,dd1));
        Push(VAL(CARDINAL,firstfact));
  | 20: (*line 225*)
        sy:=SyNr(spix);
        IF sy=null THEN sy:=NewSy(spix,styp)
        ELSE Error(1);
        END;
  | 21: (*line 349*)
        ASSIGN(gramspix,at[1]);
  | 22: (*line 349*)
        rules:=0; alts:=0;
        OpenFile(gramspix); StopHash;
  | 23: (*line 357*)
        RestartHash;
        InsertFramePart; styp:=t;
  | 24: (*line 363*)
        eofsy:=NewSy(0,t)
  | 25: (*line 365*)
        styp:=t; kind:=def;
  | 26: (*line 368*)
        styp:=pr
  | 27: (*line 370*)
        styp:=pr; kind:=def;
  | 28: (*line 371*)
        GetSy(sy,sn); sn.sem1:=sem2; RepSy(sy,sn);
  | 29: (*line 376*)
        GetSy(sy,sn); sn.sem2:=sem3; RepSy(sy,sn);
  | 30: (*line 382*)
        styp:=nt
  | 31: (*line 383*)
        ASSIGN(spix,at[1]);
  | 32: (*line 384*)
        styp:=nt; kind:=def;
  | 33: (*line 386*)
        rootsy:=SyNr(gramspix);
        IF rootsy=null THEN Error(2); END;
  | 34: (*line 390*)
        sy:=SyNr(spix);
        IF sy=null THEN Error(3); sy:=NewSy(spix,err) END;
        GetSy(sy,sn);
        IF (sn.typ<>nt) AND (sn.typ<>err) THEN Error(4); END;
        IF sn.start<>0 THEN Error(5); END;
        sy1:=sy; count:=0; styp:=sn.typ
  | 35: (*line 401*)
        kind:=check;
  | 36: (*line 404*)
        GetSy(sy1,sn);
        sn.start:=gp; sn.del:=dd;
        RepSy(sy1,sn);
        INC(rules);
  | 37: (*line 410*)
        rootloc:=NewNode(nt,rootsy,0);
        gp1:=NewNode(t,eofsy,0);
        gl:=rootloc; gl1:=gp1;
        ConcatRight(rootloc,gl,gp1,gl1)
  | 38: (*line 415*)
        IF ddt["L"] THEN GraphList; END;
        CloseFile;
  | 39: (*line 420*)
        gp:=gp1; gl:=gl1; dd:=dd1;
  | 40: (*line 420*)
        INC(alts);

  | 41: (*line 422*)
        INC(alts);
        ConcatLeft(gp,gl,gp1,gl1);
        dd:=dd OR dd1
  | 42: (*line 429*)
        gpo:=0
  | 43: (*line 430*)
        firstfact:=TRUE;
  | 44: (*line 430*)
        gp1:=gp2; gl1:=gl2; dd1:=dd2;
  | 45: (*line 431*)
        firstfact:=FALSE;
  | 46: (*line 432*)
        IF gp2<>0 THEN
          ConcatRight(gp1,gl1,gp2,gl2);
          dd1:=dd1 AND dd2;
        END;
  | 47: (*line 440*)
        sy:=SyNr(spix);
        IF sy=null THEN Error(3); sy:=NewSy(spix,err) END;
        GetSy(sy,sn);
        IF sn.typ=pr THEN Error(16); END;
        gp2:=NewNode(sn.typ,sy,line);
        gl2:=gp2; dd2:=FALSE; gpo:=gp2;
        count:=0; styp:=sn.typ
  | 48: (*line 450*)
        kind:=use;
  | 49: (*line 451*)
        GetNode(gp2,gn);
        gn.sem1:=sem1; gn.sem2:=sem2;
        RepNode(gp2,gn)
  | 50: (*line 456*)
        gp2:=NewNode(eps,0,line);
        gl2:=gp2; dd2:=TRUE; gpo:=gp2
  | 51: (*line 459*)
        gp2:=NewNode(any,0,line);
        gl2:=gp2; dd2:=FALSE; gpo:=gp2
  | 52: (*line 462*)
        IF gpo=0 THEN
          gp2:=NewNode(eps,0,line);
          gl2:=gp2; dd2:=TRUE;
          GetNode(gp2,gn); gn.sem3:=sem3; RepNode(gp2,gn);
        ELSE
          GetNode(gpo,gn); gn.sem3:=sem3; RepNode(gpo,gn);
          gp2:=0; gl2:=0; gpo:=0
        END;
  | 53: (*line 475*)
        gp2:=gp; gl2:=gl; dd2:=dd;
  | 54: (*line 478*)
        gp2:=NewNode(eps,0,line);
        gl2:=gp2;
        ConcatLeft(gp,gl,gp2,gl2);
        gp2:=gp; gl2:=gl; dd2:=TRUE;
  | 55: (*line 485*)
        gp2:=NewNode(eps,0,line);
        gl2:=gp2;
        ConcatRight(gp,gl,gp,gl);
        ConcatLeft(gp,gl,gp2,gl2);
        gp2:=gp; dd2:=TRUE;
  | 56: (*line 493*)
        IF firstfact THEN
          gp3:=gp2; gl3:=gl2;
          gp2:=NewNode(eps,0,line); gl2:=gp2;
          ConcatRight(gp2,gl2,gp3,gl3);
        END;
  | 57: (*line 502*)
        sem1:=0; sem2:=0
  | 58: (*line 503*)
        count:=0;
  | 59: (*line 510*)
        IF styp<>nt THEN Error(7); END;
        dir:=down;
  | 60: (*line 515*)
        ASSIGN(n,at[1]);
  | 61: (*line 520*)
        IF kind=use THEN EmitAction(line,sem1); END;
  | 62: (*line 526*)
        dir:=up
  | 63: (*line 531*)
        IF (kind=use) OR (styp=pr) THEN EmitAction(line,sem2); END;
  | 64: (*line 537*)
        StopHash; firstsymbol:=TRUE
  | 65: (*line 538*)
        RestartHash
  | 66: (*line 539*)
        GetMacroNr(spix,sem3);
        IF sem3=0 THEN Error(12); END;
  | 67: (*line 543*)
        IF firstsymbol THEN
          firstsymbol:=FALSE;
          OpenSem(line,sem3); StartCopy(col)
        END;
        Copy(typ,col)
  | 68: (*line 549*)
        RestartHash;
  | 69: (*line 556*)
        OpenSem(line,sem3);
        NewMacro(spix,sem3,ok);
        IF NOT ok THEN Error(11); END;
        StopHash; firstsymbol:=TRUE;
  | 70: (*line 562*)
        IF firstsymbol THEN
          firstsymbol:=FALSE; StartCopy(col)
        END;
        Copy(typ,col)
  | 71: (*line 568*)
        RestartHash
  | 72: (*line 575*)
        GetSy(sy,sn); sn.aliasspix:=spix1;
        RepSy(sy,sn);
  END;
END Semant;

BEGIN
  printactions:=FALSE;
END cocosem.

Cross-reference:
aliasspix, alts, any, ASSIGN, at, Attrtype, check, CloseFile, cocogen,
cocogra, cocolex, cocosem, cocosym, col, CompErr, CompleteAt, con,
ConcatLeft, ConcatRight, const, Copy, count, dd, dd1, dd2, ddt, def, del,
dir, dir1, Direction, down, dummy, EmitAction, eofsy, eps, err, Error,
Errors, FileIO, firstfact, firstsymbol, GenAssign, GetAt, GetMacroNr,
GetNode, GetSy, gl, gp3, gpo, gramspix, GraphList, Graphnode,
InsertFramePart, kind, line, maxstacksize, n, NewAt, NewMacro, NewNode,
NewSy, nonterm, nr, nt, null, ok, OpenFile, OpenSem, Pop, pr, printactions,
Push, RepNode, RepSy, RestartHash, Restriction, rootloc, rootsy, rules, sem,
sem1, sem2, sem3, Semant, SEMANTICSTACK, SemErr, sn, sp, spix, spix1, stack,
start, StartCopy, StopHash, styp, sy, sy1, Symbolnode, Symboltype, SyNr,
SYSTEM, t, term, typ, up, Usage, use, VAL, WORD, WriteCard, WriteString, x, y

cocosemframe

(* Generated semantic analyzer
   ========================================================================
   This module is produced by Coco from the semantic actions of an
   attributed grammar.
   ------------------------------------------------------------------------*)

DEFINITION MODULE -->modulename;
VAR printactions: BOOLEAN; (*trace the executed semantic actions*)
PROCEDURE Semant(sem:CARDINAL);
END -->modulename.
-->implementation
(* Generated semantic analyzer
   ========================================================================
   This module is produced by Coco from the semantic actions of an
   attributed grammar.
   ------------------------------------------------------------------------*)

IMPLEMENTATION MODULE -->modulename;
FROM FileIO IMPORT con, WriteCard, WriteString;
FROM SYSTEM IMPORT WORD;
FROM -->scannername IMPORT at;

-->declarations

PROCEDURE ASSIGN(VAR x:WORD; y:WORD);
BEGIN x:=y; END ASSIGN;

PROCEDURE Semant(sem:CARDINAL);
BEGIN
  (*IF printactions THEN
    WriteString(con,"$ ["); WriteCard(con,sem,3); WriteString(con,"] ");
  END; *)
  CASE sem OF
  112;
-->actions
  END;
END Semant;

BEGIN
  printactions:=FALSE;
END -->modulename.

Cross-reference:
actions, ASSIGN, at, con, declarations, FileIO, implementation, modulename,
printactions, scannername, sem, Semant, SYSTEM, WORD, WriteCard,
WriteString, x, y
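cocosemframe is a template for the generated semantic analyzer: fixed text plus markers of the form -->name (modulename, scannername, declarations, actions), with a -->implementation line separating the definition module from the implementation module. Comparing it with cocosem.DEF and cocosem.MOD shows, for example, that -->modulename became cocosem and -->scannername became cocolex. The cocogen routines that perform the substitution (InsertFramePart and related procedures) are not listed in this section, so the regular-expression sketch below only illustrates the idea; split_frame and expand are hypothetical names.

```python
import re

MARKER = re.compile(r"-->(\w+)")


def split_frame(frame):
    """Split a frame at its '-->implementation' line into the definition
    part and the implementation part."""
    definition, _, implementation = frame.partition("-->implementation\n")
    return definition, implementation


def expand(part, values):
    """Replace each '-->name' marker in part by values[name].
    A marker without a value raises KeyError."""
    return MARKER.sub(lambda m: values[m.group(1)], part)
```

The frame is split first so that the -->implementation line itself is never looked up as a marker; in a real run the values for declarations and actions would be the text generated from the attributed grammar.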

cocosym.DEF

(* cocosym        Symbol list for coco                        Moe 28.12.83
   ========================================================================
   This module
   a) generates and updates symbol nodes for terminals, pragmas and
      nonterminals
   b) searches names in the symbol list
   c) stores and retrieves attribute information
   d) stores and retrieves semantic macros
   e) marks deletable symbols in symbol list
   f) collects first-sets, follow-sets, eps-sets and any-sets
   ------------------------------------------------------------------------*)

DEFINITION MODULE cocosym;

CONST maxterminals = 128;

TYPE
  Direction    = (up,down);              (*attribute direction*)
  Attributeptr = POINTER TO Attribute;
  Attribute    = RECORD
                   spix: CARDINAL;       (*name of attribute*)
                   dir:  Direction;      (*up,down*)
                   next: Attributeptr;   (*to next attribute of same nt*)
                 END;
  Symboltype   = (eps,t,pr,nt,any,err);
  Symbolnode   = RECORD
                   spix:      CARDINAL;  (*spelling index of symbol*)
                   aliasspix: CARDINAL;  (*spelling index of alias name*)
                   nra:       CARDINAL;  (*no.of attributes*)
                   CASE typ: Symboltype OF           (*type of symbol*)
                     pr:     sem1,sem2: CARDINAL;    (*pragma semantics*)
                   | nt,err: start:   CARDINAL;      (*start of top-down graph*)
                             del:     BOOLEAN;       (*TRUE if deletable*)
                             firstat: Attributeptr;  (*to first attribute node*)
                   END;
                 END;
  Symbolset    = ARRAY[0..maxterminals DIV 16] OF BITSET;

VAR
  maxany:   CARDINAL;  (*no.of any-sets*)
  maxeps:   CARDINAL;  (*no.of eps-follower-sets*)
  maxt:     CARDINAL;  (*no.of last terminal*)
  maxp:     CARDINAL;  (*no.of last pragma*)
  maxs:     CARDINAL;  (*no.of last nonterminal*)
  gramspix: CARDINAL;  (*grammar name, filled by .AG*)

PROCEDURE ClearSet(VAR s:Symbolset; n:CARDINAL);
(* Clears set s*)

PROCEDURE CompleteAt(sy,nr:CARDINAL): BOOLEAN;
(* Checks if symbol sy has nr attributes*)

PROCEDURE FindDelSymbols;
(* Marks deletable nonterminals and prints them*)

PROCEDURE GetA(n:CARDINAL; VAR set:Symbolset);
(* Gets the any-set with the number n*)

PROCEDURE GetAt(sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
(* Gets the spelling index spix and the direction dir of the n-th
   attribute of the symbol sy*)

PROCEDURE GetE(n:CARDINAL; VAR set:Symbolset);
(* Gets the eps-follower-set with the number n*)

PROCEDURE GetF(sy:CARDINAL; VAR first:Symbolset);
(* Gets the set of terminal start symbols for the nonterminal sy*)

PROCEDURE GetFirstSet(loc:CARDINAL; VAR set:Symbolset);
(* Gets the terminal start symbols of the graph with the root loc*)

PROCEDURE GetFo(sy:CARDINAL; VAR set:Symbolset);
(* Gets followers of the nonterminal sy*)

PROCEDURE GetMacroNr(spix:CARDINAL; VAR sem:CARDINAL);
(* Gets the number sem of the semantic action corresponding to the
   macro with the name spix*)

PROCEDURE GetSy(sy:CARDINAL; VAR sn:Symbolnode);
(* Gets the symbol node with the index sy*)

PROCEDURE GetSymbolSets;
(* Collects first-sets, follower-sets, eps-sets and any-sets*)

PROCEDURE IsInSet(n:CARDINAL; VAR s:Symbolset): BOOLEAN;
(* TRUE if n is in set s*)

PROCEDURE NewAt(sy,spix:CARDINAL; dir:Direction);
(* Enters a new attribute for the symbol sy with the spelling index
   spix and the direction dir*)

PROCEDURE NewMacro(spix,sem:CARDINAL; VAR ok:BOOLEAN);
(* Enters a new semantic macro with the name spix and the action index
   sem*)

PROCEDURE NewSy(spix:CARDINAL; typ:Symboltype): CARDINAL;
(* Generates a new symbol with the name spix and the type typ and
   returns its index*)

PROCEDURE RepSy(sy:CARDINAL; sn:Symbolnode);
(* Replaces the symbol sy by the node sn*)

PROCEDURE SetBit(VAR s:Symbolset; n:CARDINAL);
(* Sets bit n in set s*)

PROCEDURE Unit(VAR s1,s2:Symbolset; n:CARDINAL);
(* Adds the set s2 to the set s1*)

PROCEDURE SyNr(spix:CARDINAL): CARDINAL;
(* Gets the symbol number for the identifier with the name spix*)

END cocosym.
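cocosym represents a set of terminals as ARRAY[0..maxterminals DIV 16] OF BITSET, that is, nine 16-bit words for maxterminals = 128. The bodies of SetBit, IsInSet, Unit and ClearSet are not part of this section, so the Python sketch below is an assumption about how such operations work on that layout: it stores terminal n in bit n MOD 16 of word n DIV 16, and it treats the extra parameter n of Unit and ClearSet as the highest element number (the listing passes maxt), processing words 0..n DIV 16. All names are hypothetical.

```python
MAXTERMINALS = 128
WORDS = MAXTERMINALS // 16 + 1        # ARRAY[0..maxterminals DIV 16] OF BITSET


def new_set():
    return [0] * WORDS                # each entry plays the role of a 16-bit BITSET


def set_bit(s, n):
    s[n // 16] |= 1 << (n % 16)


def is_in_set(n, s):
    return (s[n // 16] & (1 << (n % 16))) != 0


def unit(s1, s2, n):
    """Add s2 to s1, word by word up to the word holding element n."""
    for i in range(n // 16 + 1):
        s1[i] |= s2[i]


def clear_set(s, n):
    """Clear s, word by word up to the word holding element n."""
    for i in range(n // 16 + 1):
        s[i] = 0
```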

cocosym.MOD

(* cocosym        Symbol list for coco                        Moe 29.12.83
   ========================================================================
   This module
   a) generates and updates symbol nodes for terminals, pragmas and
      nonterminals
   b) searches names in the symbol list
   c) stores and retrieves attribute information
   d) stores and retrieves semantic macros
   e) marks deletable symbols in symbol list
   f) collects first-sets, follow-sets, eps-sets and any-sets
   ------------------------------------------------------------------------*)

IMPLEMENTATION MODULE cocosym;

FROM cocogra IMPORT maxn, rootloc, ClearMarkList, Deletable, DelNode,
                    GetNode, Graphnode, Mark, Marked, Marklist, RepNode;
FROM cocolex IMPORT line, col, ddt, GetName;
FROM cocolst IMPORT lst;
FROM Errors  IMPORT CompErr, Restriction, SemErr;
FROM FileIO  IMPORT con, Write, WriteCard, WriteString, WriteText, WriteLn;
FROM System  IMPORT Allocate;
FROM SYSTEM  IMPORT VAL;

CONST
  anysetsize = 20;    (*max.no.of compl.-sets for any-symbols*)
  epssetsize = 70;    (*max.no.of eps-follower-sets*)
  maxsymbols = 200;   (*max.no.of symbols*)
  maxnt      = 80;    (*max.number of nonterminals*)
  null       = 65535;
  eofsy      = 0;

TYPE
  Anyset     = ARRAY[1..anysetsize] OF Symbolset;
  Epsset     = ARRAY[1..epssetsize] OF Symbolset;
  Followset  = ARRAY[0..maxnt-1] OF RECORD
                 ts:  Symbolset;   (*terminal symbols*)
                 nts: Symbolset;   (*nts whose start set is to be added*)
               END;
  Firstset   = ARRAY[0..maxnt-1] OF RECORD
                 ts:    Symbolset; (*terminal symbols*)
                 ready: BOOLEAN;   (*TRUE if ts is complete*)
               END;
  Macroptr   = POINTER TO Macronode;
  Macronode  = RECORD
                 spix: CARDINAL;   (*name of semantic macro*)
                 sem:  CARDINAL;   (*associated semantic action*)
                 next: Macroptr;   (*to next sem macro*)
               END;
  Symbollist = ARRAY[0..maxsymbols] OF Symbolnode;

VAR
  anyset:     Anyset;      (*actual no.of any-sets*)
  column:     CARDINAL;    (*printing column for terminal sets*)
  epsset:     Epsset;      (*actual no.of eps-sets*)
  first:      Firstset;    (*terminal start symbols*)
  firstmacro: Macroptr;    (*first sem macro*)
  fnt:        CARDINAL;    (*no.of first nonterminal*)
  follow:     Followset;   (*terminal successors*)
  lastmacro:  Macroptr;    (*last sem macro*)
  sn:         Symbollist;  (*symbol list*)

PROCEDURE AllBit(VAR s:Symbolset); FORWARD;
PROCEDURE DelBit(VAR s:Symbolset; n:CARDINAL); FORWARD;
PROCEDURE PrintSet(VAR s:Symbolset; n:CARDINAL); FORWARD;
PROCEDURE PutNt(sy:CARDINAL); FORWARD;
PROCEDURE PutTermSet(VAR s:Symbolset); FORWARD;

(* CompleteAt     Test if nr is the correct no.of attributes
   ------------------------------------------------------------------------*)
PROCEDURE CompleteAt(sy,nr:CARDINAL): BOOLEAN;
BEGIN
  RETURN (sn[sy].nra=nr) OR (sn[sy].typ=err);
END CompleteAt;

(* FindDelSymbols Find all deletable symbols and print them
   ------------------------------------------------------------------------*)
PROCEDURE FindDelSymbols;
  VAR change: BOOLEAN;
      dummy:  CARDINAL;
      first:  BOOLEAN;
      i:      CARDINAL;
      name:   ARRAY[1..50] OF CHAR;
      sn:     Symbolnode;
BEGIN
  fnt:=maxp+1;
  REPEAT (*while new deletable symbols*)
    change:=FALSE;
    FOR i:=maxp+1 TO maxs DO
      GetSy(i,sn);
      IF (NOT sn.del) AND (sn.start<>0) AND Deletable(sn.start) THEN
        sn.del:=TRUE; RepSy(i,sn); change:=TRUE;
      END;
    END;
  UNTIL NOT change;
  first:=TRUE;
  FOR i:=maxp+1 TO maxs DO (*print deletable symbols*)
    GetSy(i,sn);
    IF sn.del THEN
      IF first THEN
        WriteLn(lst); WriteLn(lst);
        WriteString(lst,"Deletable symbols:"); WriteLn(lst);
        first:=FALSE;
      END;
      GetName(sn.spix,name,1);
      WriteString(lst,"     "); WriteText(lst,name,1);
    END;
  END;
  IF first THEN
    WriteLn(lst); WriteLn(lst);
    WriteString(lst,"Grammar contains no deletable symbols."); WriteLn(lst);
    WriteLn(lst);
  END;
END FindDelSymbols;

(* GetA           Returns the any-set with the number nr
   ------------------------------------------------------------------------*)
PROCEDURE GetA(nr:CARDINAL; VAR s:Symbolset);
BEGIN s:=anyset[nr]; END GetA;
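FindDelSymbols computes the deletable nonterminals by fixed-point iteration: it sweeps over all nonterminals, marks one as deletable when Deletable reports that its rule graph can derive the empty string, and repeats until a sweep marks nothing new. The Python sketch below does the same on a simplified grammar representation, a dictionary mapping each nonterminal to a list of alternatives, each a list of symbols, with [] standing for eps. This representation and the name find_deletable are assumptions; the listing itself works on the graph built by cocogra.

```python
def find_deletable(grammar):
    """Return the set of deletable (nullable) nonterminals.

    grammar maps each nonterminal to a list of alternatives, each a list
    of symbols; symbols that are not keys are terminals, and [] is eps.
    Like FindDelSymbols, the loop repeats until a whole pass marks no new
    symbol.
    """
    deletable = set()
    change = True
    while change:
        change = False
        for nt, alternatives in grammar.items():
            if nt in deletable:
                continue
            # deletable if some alternative consists only of deletable symbols
            if any(all(sym in deletable for sym in alt) for alt in alternatives):
                deletable.add(nt)
                change = True
    return deletable
```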

(* GetAnySets     Find the complement sets for any-nodes
   ------------------------------------------------------------------------*)
PROCEDURE GetAnySets;
  VAR gn:    Graphnode;
      loc,i: CARDINAL;
      s:     Symbolset;
BEGIN (*GetAnySets*)
  FOR loc:=1 TO maxn DO
    GetNode(loc,gn);
    IF (gn.typ=any) AND (gn.lp<>0) THEN (*any with alternatives*)
      GetFirstSet(gn.lp,s);
      FOR i:=0 TO maxt DIV 16 DO (*make complement*)
        s[i]:=VAL(BITSET,65535)-s[i];
      END;
      DelBit(s,eofsy); (*any must not recognize eofsy*)
      INC(maxany); anyset[maxany]:=s;
      gn.sp:=maxany; RepNode(loc,gn);
    END;
  END;
END GetAnySets;
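GetAnySets gives each ANY node that has alternatives the complement of the start symbols of those alternatives, minus eofsy, so that ANY never matches a symbol that could start another alternative or the end of file. The Python sketch shows the set-level idea and the word-wise complement used in the listing (s[i]:=VAL(BITSET,65535)-s[i]). The function names are hypothetical, and terminals are represented as small integers with eofsy = 0, as in the CONST part of cocosym.

```python
def any_set(terminals, first_of_alternatives, eofsy=0):
    """Terminals an ANY node may stand for: every terminal except the
    start symbols of the node's other alternatives and except eofsy."""
    return set(terminals) - set(first_of_alternatives) - {eofsy}


def complement_words(words):
    """Word-wise complement of 16-bit words, as in VAL(BITSET,65535)-s[i]."""
    return [0xFFFF & ~w for w in words]
```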

(* GetAt          Get name and direction of an attribute
   ------------------------------------------------------------------------*)
PROCEDURE GetAt(sy,nr:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
  VAR i: CARDINAL;
      p: Attributeptr;
BEGIN
  IF (sn[sy].typ<>nt) AND (sn[sy].typ<>err) THEN CompErr(3); END;
  IF (nr>sn[sy].nra) OR (sn[sy].typ=err) THEN
    spix:=0; dir:=down; (*semantic error*)
  ELSE
    p:=sn[sy].firstat;
    FOR i:=1 TO nr-1 DO p:=p^.next; END;
    spix:=p^.spix; dir:=p^.dir;
  END;
END GetAt;

(* GetE                Returns the eps-set with the number nr
----------------------------------------------------------------------*)
PROCEDURE GetE(nr:CARDINAL; VAR s:Symbolset);
BEGIN s:=epsset[nr]; END GetE;

(* GetEpsSets          Find the follower symbols for eps-nodes
----------------------------------------------------------------------*)
PROCEDURE GetEpsSets;
VAR
  curnt: CARDINAL;
  m: Marklist;
  sn: Symbolnode;

  PROCEDURE FindEpsFollowers(loc,leftsy:CARDINAL; VAR nr:CARDINAL);
  VAR s: Symbolset;
  BEGIN
    GetFirstSet(loc,s);
    IF Deletable(loc) THEN Unit(s,follow[leftsy-fnt].ts,maxt); END;
    INC(maxeps); epsset[maxeps]:=s;
    nr:=maxeps;
  END FindEpsFollowers;

  PROCEDURE FindEps(loc,leftsy:CARDINAL; vialp:BOOLEAN);
  VAR
    gn: Graphnode;
    nr: CARDINAL;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    Mark(loc,m);
    GetNode(loc,gn);
    WITH gn DO
      IF (typ=eps) AND (vialp OR (lp<>0)) THEN
        FindEpsFollowers(rp,leftsy,nr);
        sp:=nr; RepNode(loc,gn);
      END;
      IF lp<>0 THEN FindEps(lp,leftsy,TRUE); END;
      IF rp<>0 THEN FindEps(rp,leftsy,FALSE); END;
    END;
  END FindEps;

BEGIN (*GetEpsSets*)
  ClearMarkList(m);
  FOR curnt:=maxp+1 TO maxs DO
    GetSy(curnt,sn);
    FindEps(sn.start,curnt,FALSE);
  END;
END GetEpsSets;
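FindEpsFollowers computes the terminals that may come next after an eps node. It starts from the first-set of whatever follows the node in its rule. If that rest is deletable, it adds the follow-set of the rule's left-hand side. Each result is stored in epsset under a new, 1-based number.

The Python sketch below mirrors that logic with ordinary sets instead of 16-bit word arrays. The names eps_followers and EpsTable are hypothetical. The first-set, deletability and follow-set of the rest are passed in as inputs rather than computed.

```python
def eps_followers(first_of_rest, rest_deletable, follow_of_lhs):
    """Terminals that may follow an eps alternative: FIRST of the rest of the
    rule, plus FOLLOW of the left-hand side when that rest can vanish."""
    s = set(first_of_rest)
    if rest_deletable:
        s |= follow_of_lhs
    return s


class EpsTable:
    """Numbered like epsset[1..maxeps]: add() returns the 1-based index."""

    def __init__(self):
        self.sets = []

    def add(self, s):
        self.sets.append(frozenset(s))
        return len(self.sets)          # corresponds to nr:=maxeps


table = EpsTable()
nr = table.add(eps_followers({"b"}, True, {"x"}))
print(nr, sorted(table.sets[nr - 1]))
```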

(* GetF                Returns the terminal start symbols of sy
----------------------------------------------------------------------*)
PROCEDURE GetF(sy:CARDINAL; VAR s:Symbolset);
BEGIN s:=first[sy-fnt].ts; END GetF;

(* GetFirstSet         Gets the terminal start symbols in loc
----------------------------------------------------------------------*)
PROCEDURE GetFirstSet(loc:CARDINAL; VAR set:Symbolset);
VAR m: Marklist; (*mark list of the visited graph nodes*)

  PROCEDURE CollectFirstSet(loc:CARDINAL; VAR set:Symbolset);
  VAR
    gn: Graphnode;
    sn: Symbolnode;
    s1: Symbolset;
  BEGIN
    ClearSet(set,maxt);
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    WHILE loc<>0 DO (*for all alternatives*)
      Mark(loc,m);
      GetNode(loc,gn);
      IF ddt["G"] THEN
        WriteString(con,"CollectFirstSet:");
        WriteCard(con,loc,6); WriteCard(con,ORD(gn.typ),6);
        WriteCard(con,gn.sp,6); WriteLn(con);
      END;
      IF DelNode(gn) THEN
        CollectFirstSet(gn.rp,s1);
        Unit(set,s1,maxt);
      END;
      CASE gn.typ OF
        eps: ;
      | t:   SetBit(set,gn.sp);
      | nt:  IF first[gn.sp-fnt].ready
               THEN Unit(set,first[gn.sp-fnt].ts,maxt);
               ELSE
                 GetSy(gn.sp,sn);
                 CollectFirstSet(sn.start,s1);
                 Unit(set,s1,maxt);
             END;
      | any: AllBit(set);
      END; (*CASE*)
      loc:=gn.lp;
    END; (*WHILE*)
  END CollectFirstSet;

BEGIN (*GetFirstSet*)
  ClearMarkList(m);
  CollectFirstSet(loc,set);
  IF ddt["H"] THEN
    WriteString(con,"GetFirstSet:"); PrintSet(set,maxt);
  END;
END GetFirstSet;
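CollectFirstSet walks the syntax graph and gathers terminal start symbols:

- It follows the lp links through all alternatives.
- Terminals are added directly.
- For a nonterminal, it descends into that nonterminal's rule.
- An ANY node adds every terminal.
- When a node can vanish (DelNode), it also collects the start symbols of the node's successor rp.

The mark list stops the walk at nodes already visited.

The following Python sketch reproduces that traversal under simplifying assumptions. Nodes are held in a dictionary keyed by location. The deletable nonterminals are given as input. Sets are Python sets, and the ready/first cache used by the module is left out. Node, first_set and the example grammar are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    typ: str            # 't', 'nt', 'eps' or 'any'
    sp: object = None   # terminal or nonterminal referenced by the node
    lp: int = 0         # next alternative, 0 = none
    rp: int = 0         # successor in the sequence, 0 = end of rule


def first_set(loc, graph, start, deletable_nts, all_terminals):
    """Terminal start symbols of the graph part beginning at loc."""
    marked = set()                        # plays the role of the Marklist m

    def del_node(n):                      # DelNode: node can derive the empty string
        return n.typ == "eps" or (n.typ == "nt" and n.sp in deletable_nts)

    def collect(loc):
        result = set()
        if loc == 0 or loc in marked:
            return result
        while loc != 0:                   # for all alternatives
            marked.add(loc)
            n = graph[loc]
            if del_node(n):               # a vanishing node lets its successor start
                result |= collect(n.rp)
            if n.typ == "t":
                result.add(n.sp)
            elif n.typ == "nt":
                result |= collect(start[n.sp])
            elif n.typ == "any":
                result |= all_terminals
            loc = n.lp
        return result

    return collect(loc)


# S = A "b".   A = "a" | .
graph = {1: Node("nt", "A", rp=2), 2: Node("t", "b"),
         3: Node("t", "a", lp=4), 4: Node("eps")}
start = {"S": 1, "A": 3}
print(sorted(first_set(start["S"], graph, start, {"A"}, {"a", "b"})))  # ['a', 'b']
```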
(* GetFollowSets       Get terminal successors of nonterminals
----------------------------------------------------------------------*)
PROCEDURE GetFollowSets;
VAR
  change: BOOLEAN;
  i,n,n1: CARDINAL;
  m: Marklist;
  sn: Symbolnode;

  PROCEDURE CollectFollowSets(loc,sym:CARDINAL);
  VAR
    gn: Graphnode;
    set: Symbolset;
  BEGIN
    WHILE loc<>0 DO (*step through alternative chain*)
      IF Marked(loc,m) THEN RETURN; END; (*cycle*)
      Mark(loc,m);
      GetNode(loc,gn);
      WITH gn DO
        IF ddt["J"] THEN
          WriteString(con,"CollectFollowSets ");
          WriteCard(con,loc,6); WriteCard(con,sp,6);
          WriteCard(con,sym,6); WriteLn(con);
        END;
        IF typ=nt THEN
          GetFirstSet(rp,set);
          Unit(follow[sp-fnt].ts,set,maxt);
          IF Deletable(rp) THEN
            SetBit(follow[sp-fnt].nts,sym-fnt);
          END;
          IF ddt["I"] THEN
            WriteString(con,"CollectFollowSets:");
            WriteCard(con,loc,6);
            WriteString(con,"$ "); PrintSet(follow[sp-fnt].ts,maxt);
            WriteString(con,"$ "); PrintSet(follow[sp-fnt].nts,maxs-maxp);
            WriteLn(con);
          END;
        END; (*IF typ=nt*)
        CollectFollowSets(rp,sym);
        loc:=lp;
      END; (*WITH*)
    END; (*WHILE*)
  END CollectFollowSets;

  PROCEDURE Complete(i:CARDINAL);      (*add indirect successors of*)
  VAR j: CARDINAL;                     (*i+fnt to follow[i].ts*)
  BEGIN
    IF Marked(i,m) THEN RETURN; END;   (*already visited*)
    Mark(i,m);
    FOR j:=0 TO maxs-fnt DO
      IF IsInSet(j,follow[i].nts) THEN
        Complete(j);
        Unit(follow[i].ts,follow[j].ts,maxt);
      END;
    END;
  END Complete;

BEGIN (*GetFollowSets*)
  FOR i:=fnt TO maxs DO
    ClearSet(follow[i-fnt].ts,maxt);
    ClearSet(follow[i-fnt].nts,maxs-fnt);
  END;

  ClearMarkList(m);
  FOR i:=fnt TO maxs DO (*get direct successors of nonterminals*)
    GetSy(i,sn);
    IF ddt["I"] THEN
      WriteString(con,"GetFollowSets (0):"); WriteCard(con,sn.start,6);
      WriteCard(con,i,6); WriteLn(con);
    END;
    CollectFollowSets(sn.start,i);
  END;
  CollectFollowSets(rootloc,maxs+1); (*successors of grammar symbol*)

  FOR i:=0 TO maxs-fnt DO (*add indirect successors to follow.ts*)
    ClearMarkList(m);
    Complete(i);
    ClearSet(follow[i].nts,maxt);
  END;

  IF ddt["I"] THEN
    WriteString(con,"GetFollowSets (3):$");
    FOR i:=0 TO maxs-fnt DO
      WriteCard(con,fnt+i,6); PrintSet(follow[i].ts,maxt);
      WriteLn(con);
    END;
  END;
END GetFollowSets;
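GetFollowSets works in two passes:

1. CollectFollowSets handles each nonterminal occurrence. It records the first-set of the rest of the rule in follow[..].ts. If that rest is deletable, it also records the left-hand side in follow[..].nts, because that symbol's own followers then apply as well.
2. Complete adds these indirect successors by a depth-first walk, using a fresh mark list for each nonterminal.

Below is a Python sketch of the same two passes. The occurrences are supplied as precomputed tuples, which is a hypothetical input format and not the module's graph representation.

```python
def follow_sets(occurrences, nonterminals):
    """occurrences: (nt, first_of_rest, rest_deletable, lhs) for every place
    where nt appears on the right-hand side of the rule for lhs."""
    ts = {n: set() for n in nonterminals}    # follow[..].ts: terminal successors
    nts = {n: set() for n in nonterminals}   # follow[..].nts: nonterminals whose
                                             # followers are also followers of n
    for nt, first_rest, rest_deletable, lhs in occurrences:   # CollectFollowSets
        ts[nt] |= first_rest
        if rest_deletable:
            nts[nt].add(lhs)

    def complete(i, marked):                 # Complete: add indirect successors
        if i in marked:
            return
        marked.add(i)
        for j in nts[i]:
            complete(j, marked)
            ts[i] |= ts[j]

    for i in nonterminals:
        complete(i, set())                   # fresh mark list, like ClearMarkList(m)
    return ts
```

Restarting the walk with an empty mark list for every nonterminal keeps the code simple. The cost is that shared parts of the dependency graph are visited again; cycles such as P depending on Q and Q on P still end with the union of both sets.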

(* GetFo               Get follow-set of nonterminal sy
----------------------------------------------------------------------*)
PROCEDURE GetFo(sy:CARDINAL; VAR set:Symbolset);
BEGIN set:=follow[sy-fnt].ts; END GetFo;

(* GetMacroNr          Get semantic macro
----------------------------------------------------------------------*)
PROCEDURE GetMacroNr(spix:CARDINAL; VAR sem:CARDINAL);
VAR p: Macroptr;
BEGIN
  p:=firstmacro;
  WHILE (p<>NIL) AND (p^.spix<>spix) DO p:=p^.next; END;
  IF p=NIL THEN sem:=0; ELSE sem:=p^.sem; END;
END GetMacroNr;

(* GetSy               Gets the symbol sy
----------------------------------------------------------------------*)
PROCEDURE GetSy(sy:CARDINAL; VAR sn1:Symbolnode);
BEGIN sn1:=sn[sy]; END GetSy;

(* GetSymbolSets       Get first-sets, follower-sets, eps-sets and any-sets
----------------------------------------------------------------------*)
PROCEDURE GetSymbolSets;
VAR
  i: CARDINAL;
  sn: Symbolnode;
BEGIN
  fnt:=maxp+1;
  FOR i:=0 TO maxs-fnt DO first[i].ready:=FALSE; END;
  FOR i:=fnt TO maxs DO
    GetSy(i,sn);
    GetFirstSet(sn.start,first[i-fnt].ts);
    first[i-fnt].ready:=TRUE;
  END;
  GetFollowSets;
  GetEpsSets;
  GetAnySets;
  IF ddt["K"] THEN (*print first-sets and follow-sets*)
    WriteLn(lst);
    WriteString(lst,"List of terminal start symbols:"); WriteLn(lst);
    FOR i:=fnt TO maxs DO
      PutNt(i); PutTermSet(first[i-fnt].ts);
    END;
    WriteLn(lst); WriteLn(lst);
    WriteString(lst,"List of terminal successors:"); WriteLn(lst);
    FOR i:=fnt TO maxs DO
      PutNt(i); PutTermSet(follow[i-fnt].ts);
    END;
  END;
END GetSymbolSets;

(* NewAt               Enter new attribute for a symbol
----------------------------------------------------------------------*)
PROCEDURE NewAt(sy,spx:CARDINAL; dir:Direction);
VAR
  i: CARDINAL;
  p,at: Attributeptr;
BEGIN
  WITH sn[sy] DO
    INC(nra);
    IF typ=nt THEN (*store name and direction*)
      Allocate(at,SIZE(Attribute));
      at^.spix:=spx; at^.dir:=dir; at^.next:=NIL;
      IF firstat=NIL
        THEN firstat:=at;
        ELSE
          p:=firstat;
          WHILE p^.next<>NIL DO p:=p^.next END;
          p^.next:=at;
      END;
    END;
  END;
END NewAt;

(* NewMacro            Enter new semantic macro
----------------------------------------------------------------------*)
PROCEDURE NewMacro(spix,sem:CARDINAL; VAR ok:BOOLEAN);
VAR p,s: Macroptr;
BEGIN
  p:=firstmacro;
  WHILE (p<>NIL) AND (p^.spix<>spix) DO p:=p^.next; END;
  IF p=NIL
    THEN
      ok:=TRUE;
      Allocate(s,SIZE(Macronode));
      s^.spix:=spix; s^.sem:=sem; s^.next:=NIL;
      IF firstmacro=NIL
        THEN firstmacro:=s; lastmacro:=s;
        ELSE lastmacro^.next:=s; lastmacro:=s;
      END;
    ELSE ok:=FALSE;
  END;
END NewMacro;

(* NewSy               Generate a new symbol and return index
----------------------------------------------------------------------*)
PROCEDURE NewSy(spx:CARDINAL; tp:Symboltype): CARDINAL;
VAR i: CARDINAL;
BEGIN
  IF maxs=null THEN maxs:=0; ELSE INC(maxs); END;
  IF maxs>=maxsymbols THEN Restriction(6); END;
  WITH sn[maxs] DO
    typ:=tp; spix:=spx; aliasspix:=spix; nra:=0;
    CASE typ OF
      t:  IF maxt=null THEN maxt:=0; ELSE INC(maxt); END;
          IF maxp=null THEN maxp:=0; ELSE INC(maxp); END;
          IF maxt>=maxterminals THEN Restriction(7); END;
    | pr: IF maxp=null
            THEN SemErr(25,line,col); maxp:=0; maxt:=0;
            ELSE INC(maxp);
          END;
          sem1:=0; sem2:=0;
    | nt,err:
          start:=0; del:=FALSE; firstat:=NIL;
    END; (*CASE*)
  END; (*WITH*)
  RETURN maxs;
END NewSy;

(* RepSy               Replace symbol sy
----------------------------------------------------------------------*)
PROCEDURE RepSy(sy:CARDINAL; sn1:Symbolnode);
BEGIN sn[sy]:=sn1; END RepSy;
(* SyNr                Gets index of name spix
----------------------------------------------------------------------*)
PROCEDURE SyNr(spix:CARDINAL): CARDINAL;
VAR i: CARDINAL;
BEGIN
  IF maxs=null THEN RETURN null; END;
  i:=0;
  WHILE (i<=maxs) AND (sn[i].spix<>spix) DO INC(i); END;
  IF i>maxs THEN i:=null; END;
  RETURN i;
END SyNr;

(* AllBit              Set all bits in set s
----------------------------------------------------------------------*)
PROCEDURE AllBit(VAR s:Symbolset);
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO maxterminals DIV 16 DO s[i]:=VAL(BITSET,65535); END;
END AllBit;

(* ClearSet            Clears set s
----------------------------------------------------------------------*)
PROCEDURE ClearSet(VAR s:Symbolset; n:CARDINAL);
VAR i: CARDINAL;
BEGIN FOR i:=0 TO n DIV 16 DO s[i]:={}; END; END ClearSet;

(* DelBit              Deletes bit n in set s
----------------------------------------------------------------------*)
PROCEDURE DelBit(VAR s:Symbolset; n:CARDINAL);
BEGIN EXCL(s[n DIV 16],n MOD 16); END DelBit;

(* Empty               TRUE if set s is empty
----------------------------------------------------------------------*)
PROCEDURE Empty(VAR s:Symbolset; n:CARDINAL): BOOLEAN;
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO n DIV 16 DO
    IF s[i]<>{} THEN RETURN FALSE; END;
  END;
  RETURN TRUE;
END Empty;

(* InSet               TRUE if s1 <= s2
----------------------------------------------------------------------*)
PROCEDURE InSet(VAR s1,s2:Symbolset; n:CARDINAL): BOOLEAN;
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO n DIV 16 DO
    IF NOT(s1[i]<=s2[i]) THEN RETURN FALSE; END;
  END;
  RETURN TRUE;
END InSet;

(* IsInSet             TRUE if n is in set s
----------------------------------------------------------------------*)
PROCEDURE IsInSet(n:CARDINAL; VAR s:Symbolset): BOOLEAN;
BEGIN RETURN (n MOD 16) IN s[n DIV 16]; END IsInSet;
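These set procedures, like SetBit and Unit at the end of the module, treat a Symbolset as an array of 16-bit BITSET words. Element n lives in word n DIV 16 at bit position n MOD 16. For a set bounded by n, only words 0..n DIV 16 are touched.

The Python sketch below models each word as an int, assuming that bit position k has the value 1 << k. The function names are hypothetical lowercase counterparts of the Modula-2 procedures.

```python
WORD = 16
MASK = 0xFFFF


def clear_set(n):
    """ClearSet: a set able to hold elements 0..n, all bits cleared."""
    return [0] * (n // WORD + 1)


def set_bit(s, n):                 # SetBit: INCL(s[n DIV 16], n MOD 16)
    s[n // WORD] |= 1 << (n % WORD)


def del_bit(s, n):                 # DelBit: EXCL(s[n DIV 16], n MOD 16)
    s[n // WORD] &= MASK ^ (1 << (n % WORD))


def is_in_set(n, s):               # IsInSet: (n MOD 16) IN s[n DIV 16]
    return (s[n // WORD] >> (n % WORD)) & 1 == 1


def unit(s1, s2, n):               # Unit: s1 := s1 + s2
    for i in range(n // WORD + 1):
        s1[i] |= s2[i]


def in_set(s1, s2, n):             # InSet: TRUE if s1 <= s2
    return all(s1[i] & ~s2[i] == 0 for i in range(n // WORD + 1))


def empty(s, n):                   # Empty: TRUE if no bit is set
    return all(s[i] == 0 for i in range(n // WORD + 1))
```

With maxt = 34, a set needs three words, and terminal 34 is bit 2 of word 2.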
(* PrintSet            ddt output of set s
----------------------------------------------------------------------*)
PROCEDURE PrintSet(VAR s:Symbolset; n:CARDINAL);
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO n DIV 16 DO
    WriteCard(con,VAL(CARDINAL,s[i]) DIV 256,4);
    WriteCard(con,VAL(CARDINAL,s[i]) MOD 256,4);
  END;
END PrintSet;

(* PutNt               Print name of nonterminal sy
----------------------------------------------------------------------*)
PROCEDURE PutNt(sy:CARDINAL);
VAR
  l: CARDINAL;
  name: ARRAY[1..50] OF CHAR;
  sn: Symbolnode;
BEGIN
  GetSy(sy,sn); GetName(sn.spix,name,l);
  WHILE l<12 DO INC(l); name[l]:=" "; END;
  WriteLn(lst); WriteString(lst,"  "); WriteText(lst,name,l);
  Write(lst," ");
  column:=15;
END PutNt;

(* PutTermSet          Print names of terminals in set s
----------------------------------------------------------------------*)
PROCEDURE PutTermSet(VAR s:Symbolset);
CONST maxlinelen = 72;
VAR
  i,l: CARDINAL;
  name: ARRAY[1..50] OF CHAR;
  sn: Symbolnode;
BEGIN
  FOR i:=0 TO maxt DO
    IF IsInSet(i,s) THEN
      GetSy(i,sn); GetName(sn.spix,name,l);
      IF column+l>maxlinelen THEN
        WriteLn(lst); WriteString(lst,"               "); column:=15;
      END;
      WriteText(lst,name,l); WriteString(lst,"  ");
      INC(column,l+2);
    END; (*IF IsInSet*)
  END; (*FOR*)
  WriteLn(lst);
END PutTermSet;
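PutNt prints the nonterminal padded to 12 characters and leaves the output column at 15. PutTermSet then appends terminal names, each followed by two blanks. It starts a new line whenever the next name would pass column maxlinelen = 72.

Below is a Python sketch of the resulting layout. It assumes that continuation lines start with 15 blanks, matching column:=15, and uses a hypothetical function name, listing_entry.

```python
MAX_LINE_LEN = 72    # maxlinelen in PutTermSet
INDENT = 15          # column after PutNt, and the continuation indent


def listing_entry(nt_name, terminal_names):
    """Lines produced by PutNt followed by PutTermSet for one nonterminal."""
    line = "  " + nt_name.ljust(12) + " "     # PutNt pads the name to 12 characters
    column = INDENT                            # PutNt always sets column:=15
    lines = []
    for name in terminal_names:
        if column + len(name) > MAX_LINE_LEN:  # name would overflow: start a new line
            lines.append(line)
            line, column = " " * INDENT, INDENT
        line += name + "  "                    # two blanks after every name
        column += len(name) + 2
    lines.append(line)
    return lines


for text in listing_entry("Expr", ['"+"', '"-"', "ident", "number"]):
    print(text)
```

Because column is only reset, never measured, a nonterminal name longer than 12 characters would make the real line longer than column suggests, both here and in PutNt.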

(* SetBit              Sets bit n in set s
----------------------------------------------------------------------*)
PROCEDURE SetBit(VAR s:Symbolset; n:CARDINAL);
BEGIN INCL(s[n DIV 16],n MOD 16); END SetBit;

(* Unit                s1 := s1 + s2
----------------------------------------------------------------------*)
PROCEDURE Unit(VAR s1,s2:Symbolset; n:CARDINAL);
VAR i: CARDINAL;
BEGIN FOR i:=0 TO n DIV 16 DO s1[i]:=s1[i]+s2[i]; END; END Unit;

BEGIN (*cocosym*)
  maxt:=null; maxp:=null; maxs:=null;
  maxany:=0; maxeps:=0;
  firstmacro:=NIL;
END cocosym.


(* General table-driven syntax analyzer

   This is a parser module generated by Coco from an attributed grammar.
   Before calling the procedure Parse from the main program, initialize
   the scanner (<grammarname>lex.MOD).
----------------------------------------------------------------------*)
DEFINITION MODULE cocosyn;

VAR
  printinput: BOOLEAN; (*trace the input tokens read*)
  printnodes: BOOLEAN; (*trace the G-code interpretation*)

PROCEDURE Parse(VAR correct:BOOLEAN);
END cocosyn.

(* General table-driven syntax analyzer
   ======================================
   Moe 21.12.83
   01 (21.12.83) First version (rewritten from PL/M)
   02 (28.02.84) New interface for input and errors
   03 (02.04.84) Error in EOL-processing corrected
   04 (08.05.84) New EOL-processing
   05 (23.07.84) For G-code
   06 (30.08.84) Error recovery simplified
   07 (05.04.85) New G-code instruction EPSA (ANYA modified)
   08 (12.04.87) Grammar tables initialized INLINE
   09 (12.04.87) typ,col,line and at exported by cocolex
   10 (07.06.87) Name of error module and scanner procedure constant
----------------------------------------------------------------------*)
IMPLEMENTATION MODULE cocosyn;

FROM Errors  IMPORT SyntaxError, Errorptr, Errornode;
FROM FileIo  IMPORT con, WriteCard, WriteLn, WriteString;
FROM System  IMPORT Allocate;
FROM SYSTEM  IMPORT ADDRESS, ADR, INLINE;
FROM cocosem IMPORT Semant;
FROM cocolex IMPORT GetSy, typ, at, line, col;

CONST
  maxname  = 385;
  maxnamep = 45;
  maxcode  = 401;
  maxany   = 3;
  maxeps   = 10;
  maxt     = 34;
  maxp     = 34;
  maxs     = 45;
  startpc  = 397;

CONST (*G-code instructions*)
  t   = 0;  ta   = 1;  nt  = 2;   nta  = 3;
  nts = 4;  ntas = 5;  any = 6;   anya = 7;
  eps = 8;  epsa = 9;  jmp = 10;  ret  = 11;

  errdistmin = 2;  (*min.distance between two errors*)
  lmaxs      = 50; (*max.stack length*)
  eofsy      = 0;  (*token number of endfile symbol*)

TYPE
  Attributenumbers = ARRAY[0..maxp] OF CARDINAL;
  Namepointers     = ARRAY[0..maxnamep] OF CARDINAL;
  Namelist         = ARRAY[1..maxname] OF CHAR;
  Pragma           = RECORD (*semantics for a pragma*)
                       sem2,sem3: CARDINAL;
                     END;
  Pragmalist       = ARRAY[maxt..maxp] OF Pragma;
  Symbolset        = ARRAY[0..maxt DIV 16] OF BITSET; (*set of terminals*)
  Symbolnode       = RECORD (*symbol information (only for nt)*)
                       startpc: CARDINAL;  (*start node of rule for nt*)
                       del:     BOOLEAN;   (*TRUE, if nt is deletable*)
                       first:   Symbolset; (*terminals causing this nt to be analyzed*)
                     END;
  Symbollist       = ARRAY[maxp+1..maxs] OF Symbolnode;
  Stack            = ARRAY[1..lmaxs] OF CARDINAL;

VAR
  tab: POINTER TO RECORD (*grammar tables*)
    header:    ARRAY[1..8] OF CARDINAL;   (*not used*)
    code:      ARRAY[1..maxcode] OF CHAR; (*G-code area*)
    ntsymbols: Symbollist;                (*nonterminals information*)
    epsset:    ARRAY[1..maxeps] OF Symbolset;
    anyset:    ARRAY[1..maxany] OF Symbolset;
    nra:       Attributenumbers;          (*no.of attributes*)
    ps:        Pragmalist;                (*semantics for pragmas*)
    namep:     Namepointers;              (*pointers to symbol names*)
    name:      Namelist;                  (*symbol names*)
  END;
  correct:  BOOLEAN;                    (*error indicator*)
  pc:       CARDINAL;                   (*program counter*)
  errdist:  CARDINAL;                   (*current error distance*)
  newlacts: ARRAY[0..maxt] OF CARDINAL; (*new stack length*)
  newpc:    ARRAY[0..maxt] OF CARDINAL; (*pc after recovery*)
  s,olds:   Stack;
  lacts:    CARDINAL;                   (*stack pointer*)

PROCEDURE GetSymInstr(pc:CARDINAL; VAR opcode,sy,nextpc,altpc:CARDINAL); FORWARD;
PROCEDURE RestoreStack; FORWARD;
PROCEDURE SaveStack; FORWARD;
PROCEDURE StackElem(i:CARDINAL): CARDINAL; FORWARD;
PROCEDURE Triple(altroot:CARDINAL); FORWARD;

(* Match               Check if sy is member of the specified set
----------------------------------------------------------------------*)
PROCEDURE Match(sy:CARDINAL; set:Symbolset): BOOLEAN;
BEGIN RETURN (sy MOD 16) IN set[sy DIV 16]; END Match;

(* NextSym             Get next symbol
----------------------------------------------------------------------*)
PROCEDURE NextSym;
BEGIN
  LOOP
    GetSy;
    (*IF printinput THEN
      WriteString(con,"$(in:"); WriteCard(con,typ,3); WriteString(con,") ");
      IF printnodes THEN WriteCard(con,lacts,3); WriteString(con,"| "); END;
    END;*)
    IF typ<=maxt THEN RETURN END;
    WITH tab^ DO (*pragma: execute its semantics*)
      IF correct AND (ps[typ].sem2<>0) THEN Semant(ps[typ].sem2); END;
      IF correct AND (ps[typ].sem3<>0) THEN Semant(ps[typ].sem3); END;
    END;
    IF typ=eofsy THEN RETURN END;
  END;
END NextSym;

(*============================== ERRORS ==============================*)

(* AdjustPc            Adjust pc to next symbol instruction
----------------------------------------------------------------------*)
PROCEDURE AdjustPc(VAR pc:CARDINAL);
BEGIN
  WITH tab^ DO
    LOOP
      IF pc=0 THEN RETURN; END;
      CASE ORD(code[pc]) OF
        t,ta,nt,nta,nts,ntas,any,anya,eps,epsa: EXIT;
      | jmp: pc:=256*ORD(code[pc+1])+ORD(code[pc+2]);
      | ret: pc:=0; EXIT;
      ELSE INC(pc); (*sem*)
      END;
    END;
  END;
END AdjustPc;

(* Error               Report syntax error
----------------------------------------------------------------------*)
PROCEDURE Error(VAR pc,altroot:CARDINAL);
VAR
  e,e1,h: Errorptr;
  i,j: CARDINAL;
  opcode,sy,nextpc,altpc,pc1: CARDINAL;

  PROCEDURE GiveName(q:Errorptr; sy:CARDINAL);
  VAR p,j: CARDINAL;
  BEGIN
    WITH tab^ DO
      p:=namep[sy]; j:=0;
      WHILE (j<25) AND (name[p+j]<>0C) DO
        INC(j); q^.txt[j]:=name[p+j-1];
      END;
      q^.l:=j;
    END;
  END GiveName;

BEGIN (*Error*)
  correct:=FALSE;
  IF errdist >= errdistmin THEN
    Allocate(h,SIZE(Errornode));
    GiveName(h,typ); (*pass near-symbol*)
    h^.next:=NIL; e1:=h;
    pc1:=altroot;
    AdjustPc(pc1);
    WHILE pc1>0 DO
      GetSymInstr(pc1,opcode,sy,nextpc,altpc);
      IF opcode<any THEN (*t,nt,nts,ta,nta,ntas*)
        Allocate(e,SIZE(Errornode));
        GiveName(e,sy); (*pass expected symbol*)
        e1^.next:=e; e1:=e; e^.next:=NIL;
      END;
      pc1:=altpc;
    END; (*WHILE*)
    SyntaxError(h,line,col);
    Triple(altroot); SaveStack;
    IF printnodes THEN
      WriteString(con,"$  typ     newpc  newlacts$");
      FOR i:=0 TO maxt DO
        IF newpc[i]<>0 THEN
          WriteCard(con,i,5); WriteCard(con,newpc[i],10);
          WriteCard(con,newlacts[i],10); WriteLn(con);
        END; (*IF*)
      END; (*FOR*)
    END;
  ELSE RestoreStack;
  END;
  WHILE newpc[typ]=0 DO
    IF printnodes THEN
      WriteString(con,"$(skip:"); WriteCard(con,typ,0); WriteString(con,") ");
    END;
    NextSym;
  END;
  pc:=newpc[typ]; altroot:=pc; lacts:=newlacts[typ]; errdist:=0;
END Error;
(* Fill                Fill triple list with alt-chain starting at pc
----------------------------------------------------------------------*)
PROCEDURE Fill(pc,lacts:CARDINAL);
VAR
  i,opcode,sy,nextpc,altpc: CARDINAL;
  s: Symbolset;
BEGIN
  AdjustPc(pc);
  WHILE pc<>0 DO
    GetSymInstr(pc,opcode,sy,nextpc,altpc);
    CASE opcode OF
      t,ta:
        newpc[sy]:=pc; newlacts[sy]:=lacts;
    | nt,nta,nts,ntas:
        s:=tab^.ntsymbols[sy].first;
        FOR i:=0 TO maxt DO
          IF Match(i,s) THEN newpc[i]:=pc; newlacts[i]:=lacts; END;
        END;
        IF tab^.ntsymbols[sy].del THEN Fill(nextpc,lacts); END;
    | eps,epsa:
        Fill(nextpc,lacts);
    ELSE (*any,anya: nothing*)
    END; (*CASE*)
    pc:=altpc;
  END; (*WHILE*)
END Fill;
(* FillSucc            Fill triple list with succ. of alt-chain at pc
----------------------------------------------------------------------*)
PROCEDURE FillSucc(pc,lacts:CARDINAL);
VAR
  opcode,sy,nextpc,altpc: CARDINAL;
BEGIN
  AdjustPc(pc);
  WHILE pc>0 DO (*fill with successors of alternative-starts*)
    GetSymInstr(pc,opcode,sy,nextpc,altpc);
    IF nextpc>0 THEN Fill(nextpc,lacts); END;
    pc:=altpc;
  END; (*WHILE*)
END FillSucc;

(* GetSymInstr         Get G-code instruction at address pc
----------------------------------------------------------------------*)
PROCEDURE GetSymInstr(pc:CARDINAL; VAR opcode,sy,nextpc,altpc:CARDINAL);
BEGIN (*assert: pc points to a symbol instruction (not RET,JMP,SEM,ANY)*)
  WITH tab^ DO
    opcode:=ORD(code[pc]);
    IF (opcode<=epsa) AND (opcode<>any)
      THEN sy:=ORD(code[pc+1]);
      ELSE sy:=0;
    END;
    CASE opcode OF
      t,nt,eps:          nextpc:=pc+2; altpc:=0;
    | ta,nta,anya,epsa:  nextpc:=pc+4; altpc:=256*ORD(code[pc+2])+ORD(code[pc+3]);
    | nts:               nextpc:=pc+3; altpc:=0;
    | ntas:              nextpc:=pc+5; altpc:=256*ORD(code[pc+2])+ORD(code[pc+3]);
    | any:               nextpc:=pc+1; altpc:=0;
    END; (*CASE*)
    AdjustPc(nextpc); AdjustPc(altpc);
  END;
  (*assert: nextpc,altpc point to symbol instructions or are zero*)
END GetSymInstr;
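GetSymInstr relies on a fixed G-code layout:

- For every symbol instruction except ANY, the opcode byte is followed by the symbol number.
- Instructions with an alternative (ta, nta, anya, epsa, ntas) carry the alternative address in the next two bytes, high byte first.
- AdjustPc then moves the resulting addresses past jumps and semantic instructions.

The Python sketch below decodes a byte list the same way, using the opcode values from the CONST declaration (t = 0 to ret = 11). Index 0 of the list is unused so that addresses stay 1-based as in ARRAY[1..maxcode] OF CHAR. The example program is made up for illustration.

```python
T, TA, NT, NTA, NTS, NTAS, ANY, ANYA, EPS, EPSA, JMP, RET = range(12)


def adjust_pc(code, pc):
    """AdjustPc: advance to the next symbol instruction, following JMP;
    RET (or pc = 0) yields 0."""
    while pc != 0:
        op = code[pc]
        if op <= EPSA:
            return pc
        if op == JMP:
            pc = 256 * code[pc + 1] + code[pc + 2]
        elif op == RET:
            return 0
        else:
            pc += 1            # any other opcode is skipped one byte at a time
    return 0


def get_sym_instr(code, pc):
    """GetSymInstr: decode the symbol instruction at pc into
    (opcode, sy, nextpc, altpc)."""
    op = code[pc]
    sy = code[pc + 1] if op <= EPSA and op != ANY else 0
    if op in (T, NT, EPS):
        nextpc, altpc = pc + 2, 0
    elif op in (TA, NTA, ANYA, EPSA):
        nextpc, altpc = pc + 4, 256 * code[pc + 2] + code[pc + 3]
    elif op == NTS:
        nextpc, altpc = pc + 3, 0
    elif op == NTAS:
        nextpc, altpc = pc + 5, 256 * code[pc + 2] + code[pc + 3]
    elif op == ANY:
        nextpc, altpc = pc + 1, 0
    else:
        raise ValueError("no symbol instruction at pc %d" % pc)
    return op, sy, adjust_pc(code, nextpc), adjust_pc(code, altpc)


code = [0, TA, 5, 0, 6, RET, T, 7, RET]
print(get_sym_instr(code, 1))   # (1, 5, 0, 6)
```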

(* Triple              Fill triple list
----------------------------------------------------------------------*)
PROCEDURE Triple(altroot:CARDINAL);
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO maxt DO (*clear triple list*)
    newpc[i]:=0; newlacts[i]:=0;
  END;
  FOR i:=1 TO lacts DO (*s[1] contains successor at level 0*)
    FillSucc(StackElem(i),i-1); (*fill with succ.of stacked nt's*)
    Fill(StackElem(i),i-1);
  END;
  FillSucc(altroot,lacts); (*fill with succ.of current alt-chain*)
  Fill(altroot,lacts);     (*fill with alt-chain*)
END Triple;
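Error recovery uses the triple list built by Triple. For every terminal, newpc holds a G-code address where parsing can resume and newlacts holds the stack length to use there.

- Fill records the terminals that can start an alternative chain.
- FillSucc records those that can start the successors of the alternatives.
- Both run first for every stacked return address (level i-1) and finally for the current alternative chain.

Because later calls overwrite earlier entries, the innermost resumption point wins. Error then skips input symbols until newpc[typ] is non-zero.

Below is a Python sketch of the same procedure. It assumes the instructions are already decoded into (kind, sy, nextpc, altpc) tuples and that the first-sets and deletability of nonterminals are given. The data layout is hypothetical.

```python
def fill(pc, lacts, instr, nt_info, newpc, newlacts):
    """Fill: record, for every terminal that can start the alternative chain
    at pc, where parsing may resume (pc) and with which stack length."""
    while pc:
        kind, sy, nextpc, altpc = instr[pc]
        if kind == "t":
            newpc[sy], newlacts[sy] = pc, lacts
        elif kind == "nt":
            first, deletable = nt_info[sy]
            for i in first:
                newpc[i], newlacts[i] = pc, lacts
            if deletable:
                fill(nextpc, lacts, instr, nt_info, newpc, newlacts)
        elif kind == "eps":
            fill(nextpc, lacts, instr, nt_info, newpc, newlacts)
        # "any": nothing, as in the ELSE branch of Fill
        pc = altpc


def fill_succ(pc, lacts, instr, nt_info, newpc, newlacts):
    """FillSucc: fill with the successors of every alternative in the chain."""
    while pc:
        _, _, nextpc, altpc = instr[pc]
        if nextpc:
            fill(nextpc, lacts, instr, nt_info, newpc, newlacts)
        pc = altpc


def triple(altroot, stack, instr, nt_info, maxt):
    """Triple: build the recovery table; later (deeper) entries overwrite earlier ones."""
    newpc, newlacts = [0] * (maxt + 1), [0] * (maxt + 1)
    for i, ret_pc in enumerate(stack, start=1):   # s[i] holds a return address
        fill_succ(ret_pc, i - 1, instr, nt_info, newpc, newlacts)
        fill(ret_pc, i - 1, instr, nt_info, newpc, newlacts)
    fill_succ(altroot, len(stack), instr, nt_info, newpc, newlacts)
    fill(altroot, len(stack), instr, nt_info, newpc, newlacts)
    return newpc, newlacts
```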

(*============================ END ERRORS ============================*)

(*============================ SYNTAXSTACK ===========================*)
PROCEDURE Pop(VAR loc:CARDINAL);
BEGIN
  IF lacts>0
    THEN loc:=s[lacts]; DEC(lacts);
    ELSE WriteString(con,"--- Parser stack underflow.$"); HALT;
  END;
  (*IF printnodes THEN WriteString(con," pop"); END;*)
END Pop;

PROCEDURE Push(loc:CARDINAL);
BEGIN
  IF lacts<lmaxs
    THEN INC(lacts); s[lacts]:=loc;
    ELSE WriteString(con,"--- Parser stack overflow.$"); HALT;
  END;
  (*IF printnodes THEN WriteString(con," push"); END;*)
END Push;

PROCEDURE RestoreStack;
BEGIN s:=olds; END RestoreStack;

PROCEDURE SaveStack;
BEGIN olds:=s; END SaveStack;

PROCEDURE StackElem(i:CARDINAL): CARDINAL;
BEGIN RETURN s[i]; END StackElem;

(* TableContents       A dirty trick to initialize the grammar tables
----------------------------------------------------------------------*)
PROCEDURE TableContents;
BEGIN (*%% dont remove or change this comment*)
  INLINE(
  401, 34, 34, 45, 10, 3, 45, 385,
  (*---G-code---*)
Ue
lp BIOs
AL,
Die
3, 4359,
256, 5648, 2560,
SE,
22,
BOS,
36,
811,
3679296070 14247 4120
82),
56, 5125, 9984,12569,
813,
39, 2560, 9985, 3072,20506,
812,
80, 5125, 9984,18459, 7171,10752,15645, 2560,15616,
2590,
273,
101, 7956, 1319,
$4, 8195,11520,21258,
83,
2050, 8448, 3329, 4352,33311, 8709, 9984,29987, 2052, 3840,
5122, 9252,
21, 2560,27144,
805,
4, 9739,
549,10024,
278,
151,
549,10506,
141, 2053, 2858, 1062,11052, 1318,
168,11566, 2560,40712, 1547,
812,
186,12037, 9984,46640,
12552, 1807, 2817, 1536,49202, 2817,
512,50739, 281.9,.1.01527
52276, 2817, 5888,55315,
548,13568, 6162, 2817, 6400,58387,
348,13824, 6674, 2816, 6931,
548,14080, 7186,14347,
2
14597,10241,
SB
2p
Ass),
SS),
30, 2820,10554, 2560,
64768, 2107,
32,
273,
297, 7948,
289,
293,
273,
286,
7948, 2561, 4352, 4924, 3594,
273, 2056,15627,
19,15374,
2561, 4352, 2878,
327
1721949
2228972324,
17, 1949,
2561,14600, 2367, 2816, 3648,
279,
345,16640, 4383,16896,
6144, 1291, 1794,
353,17162,
345, 2058,17418,
342,
14,
32,
17, 8005,
V5 WEB,
Sip LIS,
SiS6),
5,18187,
UE
Mle TO)
18, 7947,
17.556, 18443,
5477,
0,
2816,
(*---nt-symbols---*)

17

0,

128,

0,

0,

137,

0,16452,

2694,

0,

154,
0,16452, 2694,
Oy, ste
0,16452, 2694,
0,
0,
07.256,
OBER 2EZE
0,
0, 8192,
0,
239,
0,
0,
0,16384,
Ya SS
304,
0,
0, 2048,
0,
0,
359,
6,
0,
OFS Sill,
0,16384,
0,
391,
0,
2,
0,
0,
(*---eps followers---*)
0,
17
0,
512,
0,
0, 8192,
0,
0,
16,
0,
16,
0, 5408,
0,
0,16452, 8166,
Op AT27
0,
0,
0,
0,16384,
0,49152,
0,
0,
32,
(*---any sets---*)
65022, 65534, 65535, 65502, 65535, 65535, 65502, 65535, 65535,
(*---attribute numbers---*)
0,
0,
0,
0,
0
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
17
(*---pragma semantic---*)
0,
0,
(*---name pointers---*)
17
197
57
74,
69,
53,
19,
59,
44,
34,
Cp
83,
I,
ele
Ana,
IR
Ar
ait
A
or
za
WS,
le,
a
NE).
er
AO
ee
PADI
ZU
PAY
aii,
AES,
Bly
PA,
IN
PD.
XS
Oi
366,
2313,
1,2349,5
30073157533
(*---name list---*)
17743,17920, 8769,19529,16723, 8704, 8801,28281, 8704, 8772,
34,17742,17479,21057,
17931719521, 21057,.21577,20302, 212827
19746,
34,25966,25715,25965, 8704, 8805,28787, 8704, 8775,
21057,19789,16722, 8704, 8809,28194,
34,19777,17234,20307,
8704, 8782,20302,21573,21069,18766,16716,21282,
34,28533,
29730,
34,20562,16711,19777,21282,
34,21077,19525,21282,
34, 21317,19777, 20052, 18755, 21282,
34,29541,27938,
34,
21573,21069,18766,16716,21282,
105,25701,28276,
26982, 26981,
78,21837,16965,20992,10045, 9984,
29184, 21332,21065, 20039,
10030, 9984,10108, 9984,10024, 9984,10025, 9984,10075, 9984,
10077, 9984,10107, 9984,10109, 9984,10044, 9984,10046, 9984,
25455, 29561,
10043, 9984,10042, 9984,10028, 9984,28271,25455,
101,
34, 25455, 29298, 25955, 29728, 26482, 24941, 28001,29218,
30832, 29285, 29555, 26991, 28160, 24940, 29797, 29294, 24948, 26998,
34,
97,29812, 29289, 25205, 29797,
25856,29561, 28002, 28524,
26990
29812,
29289, 25205,29797,
8704, 8815,30068,11617,
,1161
7,
29812,29289,25205,29797, 8704, 8819,25965,24942,29801,
25376,
28001,
25376, 01,
24931,29801,28526, 8704, 8819,25965,24942,298
115,31085,25199,27648, 8801,27753,24947, 8302,
25458,28450,
0,0);
24941,25890,
END TableContents;
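The name list seems to store the symbol names two characters per 16-bit word. Each name ends with a 0C byte, which is where GiveName stops copying. The following Python sketch decodes such word values. It assumes that the high byte of each word holds the first character. Under that assumption, the first legible words of the list decode to EOF, "ALIAS" and "any". The function name decode_name_list is hypothetical.

```python
def decode_name_list(words):
    """Decode 16-bit words into NUL-terminated names.

    Assumption: each word packs two characters, high byte first,
    and every name ends with a 0 byte (0C in Modula-2).
    """
    raw = bytearray()
    for w in words:
        raw.append((w >> 8) & 0xFF)  # first character of the pair
        raw.append(w & 0xFF)         # second character of the pair
    # Split at the terminating 0 bytes and drop empty pieces (padding).
    return [part.decode("ascii") for part in raw.split(b"\x00") if part]


print(decode_name_list([17743, 17920, 8769, 19529, 16723, 8704, 8801, 28281, 8704]))
# ['EOF', '"ALIAS"', '"any"']
```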

PROCEDURE Parse(VAR corr:BOOLEAN);
  VAR
    altroot:  CARDINAL;  (*root of current alternative chain*)
    mustread: BOOLEAN;   (*TRUE if next symbol must be read*)
    opcode:   CARDINAL;  (*instruction code*)
    running:  BOOLEAN;   (*interpreter state*)
    sy:       CARDINAL;
BEGIN
  tab:=ADR(TableContents)+10D;  (*initialize the tables*)
  pc:=startpc; altroot:=pc;
  line:=1; col:=1;
  correct:=TRUE; mustread:=TRUE; running:=TRUE;
  WITH tab^ DO
    WHILE running DO
      opcode:=ORD(code[pc]);
      IF mustread AND (opcode<=epsa) THEN
        NextSym; mustread:=FALSE; INC(errdist); altroot:=pc;
      END;
      (*IF printnodes THEN WriteCard(con,pc,5); END;*)
      INC(pc);
      CASE opcode OF
        t:    IF ORD(typ)=ORD(code[pc])
              THEN IF typ=eofsy                               (*t recognized*)
                   THEN running:=FALSE;
                   ELSE INC(pc); mustread:=TRUE;
                   END;
              ELSE Error(pc,altroot);
              END;
      | ta:   IF ORD(typ)=ORD(code[pc])
              THEN INC(pc,3); mustread:=TRUE;                 (*t recognized*)
              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);   (*try alt.*)
              END;
      | nt,nts:
              sy:=ORD(code[pc]);
              IF Match(typ,ntsymbols[sy].first) OR ntsymbols[sy].del
              THEN                                            (*right nt, parse it*)
                IF opcode=nts THEN INC(pc); Semant(ORD(code[pc])); END;
                Push(pc+1); pc:=ntsymbols[sy].startpc;
                altroot:=pc;
              ELSE Error(pc,altroot);
              END;
      | nta,ntas:
              sy:=ORD(code[pc]);
              IF Match(typ,ntsymbols[sy].first)
              THEN                                            (*right nt, parse it*)
                INC(pc,3);
                IF opcode=ntas THEN Semant(ORD(code[pc])); INC(pc) END;
                Push(pc); pc:=ntsymbols[sy].startpc;
                altroot:=pc;
              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);   (*try alt.*)
              END;
      | any:  mustread:=TRUE;                                 (*any recognized*)
      | anya: IF Match(typ,anyset[ORD(code[pc])])
              THEN INC(pc,3); mustread:=TRUE;                 (*any recognized*)
              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
              END;
      | eps:  IF Match(typ,epsset[ORD(code[pc])])
              THEN INC(pc);
              ELSE Error(pc,altroot);
              END;
      | epsa: IF Match(typ,epsset[ORD(code[pc])])
              THEN INC(pc,3);                                 (*eps recognized*)
              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
              END;
      | jmp:  pc:=ORD(code[pc])*256+ORD(code[pc+1]);          (*goto successor*)
      | ret:  Pop(pc); altroot:=pc;                           (*end of nt*)
      ELSE (*sem*)
        IF correct THEN Semant(ORD(opcode)); END;
      END; (*CASE*)
    END; (*WHILE running*)
  END; (*WITH tab^*)
  corr:=correct;
END Parse;

BEGIN
  printinput:=FALSE;
  printnodes:=FALSE;
  errdist:=100;
  lacts:=0;
END cocosyn.
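Parse is a small interpreter for the G-code. Every symbol instruction is an opcode byte followed by its operands, and the alternative variants (ta, nta, anya, epsa) also carry a two-byte address of the next alternative. The following Python sketch imitates only the t, ta, nt, jmp and ret cases. It is an illustration under stated assumptions, not the Coco implementation:
- addresses are 0-based;
- the input is a pre-scanned token list instead of calls to NextSym;
- there is no error recovery;
- the grammar S = A EOF, A = "b" A | "a" and its byte values are invented for the example.

```python
# Opcode numbers as in the CONST section of the listing (t=0, ta=1, nt=2, jmp=10, ret=11).
T, TA, NT, JMP, RET = 0, 1, 2, 10, 11
EOF = 0  # assumed token number of the end-of-file symbol


def parse(code, start_pc, tokens, first, startpc, deletable):
    """Interpret a G-code fragment; return True if tokens are accepted.

    code:      list of byte values (0-based addresses, unlike the listing)
    tokens:    pre-scanned terminal numbers, ending with EOF
    first:     dict, nonterminal -> set of terminals that start it
    startpc:   dict, nonterminal -> address of its first instruction
    deletable: dict, nonterminal -> True if it can derive the empty string
    """
    stack = []  # return addresses, like Push/Pop in the listing
    pos = 0     # index of the current input token
    pc = start_pc
    while True:
        opcode = code[pc]
        pc += 1  # pc now points at the operand, as after INC(pc)
        tok = tokens[pos]
        if opcode == T:  # terminal without alternative
            if tok != code[pc]:
                return False  # the listing calls Error here
            if tok == EOF:
                return True
            pc += 1
            pos += 1
        elif opcode == TA:  # terminal with alternative address
            if tok == code[pc]:
                pc += 3
                pos += 1
            else:
                pc = code[pc + 1] * 256 + code[pc + 2]  # try alt.
        elif opcode == NT:  # nonterminal without alternative
            sy = code[pc]
            if tok in first[sy] or deletable.get(sy, False):
                stack.append(pc + 1)  # return to the following instruction
                pc = startpc[sy]
            else:
                return False
        elif opcode == JMP:
            pc = code[pc] * 256 + code[pc + 1]
        elif opcode == RET:
            pc = stack.pop()
        else:
            raise ValueError(f"opcode {opcode} not handled in this sketch")


# Hypothetical grammar: S = A EOF.  A = "b" A | "a".  (a=1, b=2, A=3)
CODE = [NT, 3,            # 0: S: A
        T, EOF,           # 2:    EOF
        TA, 2, 0, 11,     # 4: A: "b", else try address 11
        NT, 3,            # 8:    A
        RET,              # 10
        T, 1,             # 11:   "a"
        RET]              # 13
FIRST, STARTPC = {3: {1, 2}}, {3: 4}

print(parse(CODE, 0, [2, 2, 1, EOF], FIRST, STARTPC, {}))  # True
print(parse(CODE, 0, [2, EOF], FIRST, STARTPC, {}))        # False
```

Because the ta instruction carries the address of the next alternative, a failing alternative costs only one jump. Only the last alternative of a chain, coded with a plain t or nt, leads to an error.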

Cross-reference list of cocosyn.MOD (identifiers; line numbers illegible): ADDRESS, AdjustPc, ADR, Allocate, altpc, altroot, any, anya, anyset, at, Attributenumbers, cocolex, cocosem, cocosyn, code, col, con, corr, correct, del, e, el, eofsy, eps, epsa, epsset, errdist, errdistmin, Error, Errornode, Errorptr, Errors, FileIO, Fill, FillSucc, first, FORWARD, GetSy, GetSymInstr, GiveName, h, HALT, header, i, INLINE, j, jmp, l, lacts, line, lmaxs, loc, Match, maxany, maxcode, maxeps, maxname, maxnamep, maxp, maxs, maxt, mustread, name, Namelist, namep, Namepointers, newlacts, newpc, next, nextpc, NextSym, nra, nt, nta, ntas, nts, ntsymbols, olds, opcode, p, Parse, pc, pc1, Pop, Pragma, Pragmalist, printinput, printnodes, ps, Push, q, RestoreStack, ret, running, s, SaveStack, sem2, sem3, Semant, set, Stack, StackElem, startpc, sy, Symbollist, Symbolnode, Symbolset, SyntaxError, System, SYSTEM, t, ta, tab, TableContents, Triple, txt, typ, WriteCard, WriteLn, WriteString.

cocosynframe

(* General table-driven syntax analyzer
   ======================================================================
   This is a parser module generated by Coco from an attributed grammar.
   Before calling the procedure Parse from the main program, initialize
   the scanner (<grammarname>lex.MOD).
   ----------------------------------------------------------------------*)
DEFINITION MODULE -->modulename;

VAR
  printinput: BOOLEAN;  (*trace the input tokens read*)
  printnodes: BOOLEAN;  (*trace the G-code interpretation*)

PROCEDURE Parse(VAR correct:BOOLEAN);

END -->modulename.
-->implementation
(* General table-driven syntax analyzer                          Moe 21.12.83
   ======================================================================
   01 (21.12.83) First version (rewritten from PL/M)
   02 (28.02.84) New interface for input and errors
   03 (02.04.84) Error in EOL-processing corrected
   04 (08.05.84) New EOL-processing
   05 (23.07.84) For G-code
   06 (30.08.84) Error recovery simplified
   07 (05.04.85) New G-code instruction EPSA (ANYA modified)
   08 (12.04.87) Grammar tables initialized INLINE
   09 (12.04.87) typ,col,line and at exported by cocolex
   10 (07.06.87) Name of error module and scanner procedure constant
   ----------------------------------------------------------------------*)
IMPLEMENTATION MODULE -->modulename;

FROM Errors IMPORT SyntaxError, Errorptr, Errornode;
FROM FileIO IMPORT con, WriteCard, WriteLn, WriteString;
FROM System IMPORT Allocate;
FROM SYSTEM IMPORT ADDRESS, ADR, INLINE;

FROM -->semantic analyzer IMPORT Semant;
FROM -->input module IMPORT GetSy, typ, at, line, col;

-->declarations

CONST (*G-code instructions*)
  t   = 0;   ta   = 1;   nt  = 2;   nta  = 3;
  nts = 4;   ntas = 5;   any = 6;   anya = 7;
  eps = 8;   epsa = 9;   jmp = 10;  ret  = 11;

  errdistmin = 2;   (*min. distance between two errors*)
  lmaxs      = 50;  (*max. stack length*)
  eofsy      = 0;   (*token number of endfile symbol*)

TYPE
  Attributenumbers = ARRAY[0..maxp] OF CARDINAL;
  Namepointers     = ARRAY[0..maxnamep] OF CARDINAL;
  Namelist         = ARRAY[1..maxname] OF CHAR;
  Pragma           = RECORD  (*semantics for a pragma*)
                       sem2,sem3: CARDINAL;
                     END;
  Pragmalist       = ARRAY[maxt..maxp] OF Pragma;
  Symbolset        = ARRAY[0..maxt DIV 16] OF BITSET;  (*set of terminals*)
  Symbolnode       = RECORD  (*symbol information (only for nt)*)
                       startpc: CARDINAL;   (*start node of rule for nt*)
                       del:     BOOLEAN;    (*TRUE, if nt is deletable*)
                       first:   Symbolset;  (*terminals causing this nt to be analyzed*)
                     END;
  Symbollist       = ARRAY[maxp+1..maxs] OF Symbolnode;
  Stack            = ARRAY[1..lmaxs] OF CARDINAL;

VAR
  tab: POINTER TO RECORD  (*grammar tables*)
         header:    ARRAY[1..8] OF CARDINAL;        (*not used*)
         code:      ARRAY[1..maxcode] OF CHAR;      (*G-code area*)
         ntsymbols: Symbollist;                     (*nonterminals information*)
         epsset:    ARRAY[1..maxeps] OF Symbolset;
         anyset:    ARRAY[1..maxany] OF Symbolset;
         nra:       Attributenumbers;               (*no. of attributes*)
         ps:        Pragmalist;                     (*semantics for pragmas*)
         namep:     Namepointers;                   (*pointers to symbol names*)
         name:      Namelist;                       (*symbol names*)
       END;
  correct:  BOOLEAN;                    (*error indicator*)
  pc:       CARDINAL;                   (*program counter*)
  errdist:  CARDINAL;                   (*current error distance*)
  newlacts: ARRAY[0..maxt] OF CARDINAL; (*new stack length*)
  newpc:    ARRAY[0..maxt] OF CARDINAL; (*pc after recovery*)
  s,olds:   Stack;
  lacts:    CARDINAL;                   (*stack pointer*)

PROCEDURE GetSymInstr(pc:CARDINAL; VAR opcode,sy,nextpc,altpc:CARDINAL); FORWARD;
PROCEDURE RestoreStack; FORWARD;
PROCEDURE SaveStack; FORWARD;
PROCEDURE StackElem(i:CARDINAL):CARDINAL; FORWARD;
PROCEDURE Triple(altroot:CARDINAL); FORWARD;

(* Match   Check if sy is member of the specified set
   --------------------------------------------------------------------*)
PROCEDURE Match(sy:CARDINAL; set:Symbolset):BOOLEAN;
BEGIN
  RETURN (sy MOD 16) IN set[sy DIV 16];
END Match;

(* NextSym   Get next symbol
   --------------------------------------------------------------------*)
PROCEDURE NextSym;
BEGIN
  LOOP
    GetSy;
    (*IF printinput THEN
        WriteString(con,"$(in:");
        IF printnodes THEN WriteCard(con,lacts,3); WriteString(con,"| "); END;
        WriteCard(con,typ,3); WriteString(con,") ");
      END;*)
    IF typ<=maxt THEN RETURN END;
    WITH tab^ DO
      IF correct AND (ps[typ].sem2<>0) THEN Semant(ps[typ].sem2); END;
      IF correct AND (ps[typ].sem3<>0) THEN Semant(ps[typ].sem3); END;
    END;
    IF typ=eofsy THEN RETURN END;
  END;
END NextSym;
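Match tests whether a terminal belongs to a Symbolset. A Symbolset is an array of BITSETs, each word holding 16 terminals. The following Python sketch mirrors that layout. It assumes that bit i of a word stands for set element i; the actual bit order of BITSET depends on the implementation. The helper names make_symbolset and match are hypothetical.

```python
def make_symbolset(terminals, maxt):
    """Build a Symbolset-like list of 16-bit words (bit i of word k = terminal 16*k+i)."""
    words = [0] * (maxt // 16 + 1)   # ARRAY [0..maxt DIV 16] OF BITSET
    for sy in terminals:
        words[sy // 16] |= 1 << (sy % 16)
    return words


def match(sy, symbolset):
    """Counterpart of Match: (sy MOD 16) IN set[sy DIV 16]."""
    return bool((symbolset[sy // 16] >> (sy % 16)) & 1)


s = make_symbolset([0, 3, 17], maxt=31)
print(s, match(17, s), match(16, s))  # [9, 2] True False
```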

(*=============================  ERRORS  ===============================*)

(* AdjustPc   Adjust pc to next symbol instruction
   --------------------------------------------------------------------*)
PROCEDURE AdjustPc(VAR pc:CARDINAL);
BEGIN
  WITH tab^ DO
    IF pc=0 THEN RETURN; END;
    LOOP
      CASE ORD(code[pc]) OF
        t,ta,nt,nta,nts,ntas,any,anya,eps,epsa: EXIT;
      | jmp: pc:=256*ORD(code[pc+1])+ORD(code[pc+2]);
      | ret: pc:=0; EXIT;
      ELSE INC(pc);  (*sem*)
      END;
    END;
  END;
END AdjustPc;

(* Error   Report syntax error
   --------------------------------------------------------------------*)
PROCEDURE Error(VAR pc,altroot:CARDINAL);
  VAR
    e,el,h: Errorptr;
    i,j: CARDINAL;
    opcode,sy,nextpc,altpc,pc1: CARDINAL;

  PROCEDURE GiveName(q:Errorptr; sy:CARDINAL);
    VAR p,j: CARDINAL;
  BEGIN
    WITH tab^ DO
      p:=namep[sy]; j:=0;
      WHILE (j<25) AND (name[p+j]<>0C) DO
        INC(j); q^.txt[j]:=name[p+j-1];
      END;
      q^.l:=j;
    END;
  END GiveName;

BEGIN (*Error*)
  correct:=FALSE;
  IF errdist>=errdistmin THEN
    Allocate(h,SIZE(Errornode));
    h^.next:=NIL;
    GiveName(h,typ);                          (*pass near-symbol*)
    el:=h;
    pc1:=altroot; AdjustPc(pc1);
    WHILE pc1>0 DO
      GetSymInstr(pc1,opcode,sy,nextpc,altpc);
      IF opcode<any THEN                      (*t,nt,nts,ta,nta,ntas*)
        Allocate(e,SIZE(Errornode));
        GiveName(e,sy);                       (*pass expected symbol*)
        e^.next:=NIL;
        el^.next:=e; el:=e;
      END;
      pc1:=altpc;
    END; (*WHILE*)
    SyntaxError(h,line,col);
    Triple(altroot); SaveStack;
    IF printnodes THEN
      WriteString(con,"$  typ     newpc  newlacts$");
      FOR i:=0 TO maxt DO
        IF newpc[i]<>0 THEN
          WriteCard(con,i,5); WriteCard(con,newpc[i],10);
          WriteCard(con,newlacts[i],10); WriteLn(con);
        END; (*IF*)
      END; (*FOR*)
    END; (*IF*)
  ELSE RestoreStack;
  END;
  WHILE newpc[typ]=0 DO
    IF printnodes THEN
      WriteString(con,"$(skip:"); WriteCard(con,typ,0); WriteString(con,") ");
    END;
    NextSym;
  END;
  pc:=newpc[typ]; altroot:=pc; lacts:=newlacts[typ]; errdist:=0;
END Error;
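Error reports a message only if at least errdistmin symbols have been read since the previous error. It then skips input symbols until one has a nonzero entry in newpc, and resumes at that address with the recorded stack length. The following Python sketch covers only this last phase. It assumes the triple list is already given as two dictionaries, so it does not rebuild the list with Triple, SaveStack and RestoreStack. It also stops at the final EOF token instead of reading on. The names recover and report are hypothetical.

```python
ERRDISTMIN = 2  # plays the role of the errdistmin constant


def recover(tokens, pos, errdist, newpc, newlacts, report):
    """Sketch of the resynchronisation at the end of Error.

    tokens:   terminal numbers, the last one being EOF
    newpc:    dict, terminal -> pc at which parsing can resume (0 = none)
    newlacts: dict, terminal -> stack length to use when resuming
    report:   callback invoked with the token position of a reported error
    Returns (pos, pc, lacts, errdist) after resynchronisation.
    """
    if errdist >= ERRDISTMIN:
        report(pos)  # errors too close to the previous one are not reported
    # Skip tokens without a resume point (the WHILE newpc[typ]=0 loop),
    # but never run past the final EOF token.
    while pos < len(tokens) - 1 and newpc.get(tokens[pos], 0) == 0:
        pos += 1
    tok = tokens[pos]
    return pos, newpc.get(tok, 0), newlacts.get(tok, 0), 0


reported = []
print(recover([5, 6, 7, 0], 0, 100, {7: 42, 0: 9}, {7: 1, 0: 0}, reported.append))
print(reported)
```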

(* Fill   Fill triple list with alt-chain starting at pc
   --------------------------------------------------------------------*)
PROCEDURE Fill(pc,lacts:CARDINAL);
  VAR
    i,opcode,sy,nextpc,altpc: CARDINAL;
    s: Symbolset;
BEGIN
  AdjustPc(pc);
  WHILE pc<>0 DO
    GetSymInstr(pc,opcode,sy,nextpc,altpc);
    CASE opcode OF
      t,ta:
        newpc[sy]:=pc; newlacts[sy]:=lacts;
    | nt,nta,nts,ntas:
        s:=tab^.ntsymbols[sy].first;
        FOR i:=0 TO maxt DO
          IF Match(i,s) THEN newpc[i]:=pc; newlacts[i]:=lacts; END;
        END;
        IF tab^.ntsymbols[sy].del THEN Fill(nextpc,lacts); END;
    | eps,epsa:
        Fill(nextpc,lacts);
    ELSE (*any,anya: nothing*)
    END; (*CASE*)
    pc:=altpc;
  END; (*WHILE*)
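Fill records a resume point for every terminal that can start one of the alternatives in a chain. For a nonterminal alternative, this covers every terminal in its first set. If the nonterminal is deletable, or the alternative is an eps alternative, Fill continues with the instructions that follow. The following Python sketch shows the same idea on a hypothetical representation. Each alternative is a tuple (kind, sym, pc, successor), where successor is the chain that follows it, standing in for the G-code addresses.

```python
def fill(chain, lacts, first, deletable, newpc, newlacts):
    """Record resume points (pc, stack length) for terminals that can start chain.

    chain: list of alternatives (kind, sym, pc, successor), kind being
    "t", "nt", "eps" or "any"; successor is the chain that follows the
    alternative (a hypothetical stand-in for the G-code alternative chain).
    """
    for kind, sym, pc, successor in chain:
        if kind == "t":
            newpc[sym], newlacts[sym] = pc, lacts
        elif kind == "nt":
            for term in first[sym]:
                newpc[term], newlacts[term] = pc, lacts
            if deletable.get(sym, False):  # nt may vanish: look further
                fill(successor, lacts, first, deletable, newpc, newlacts)
        elif kind == "eps":
            fill(successor, lacts, first, deletable, newpc, newlacts)
        # "any" alternatives contribute nothing, as in the listing


newpc, newlacts = {}, {}
chain = [("t", "x", 10, []),
         ("nt", "A", 20, [("t", "z", 30, [])]),
         ("eps", None, 40, [("t", "w", 50, [])])]
fill(chain, 2, {"A": {"y"}}, {"A": True}, newpc, newlacts)
print(newpc)  # {'x': 10, 'y': 20, 'z': 30, 'w': 50}
```

As in the listing, a later entry for the same terminal overwrites an earlier one. This matters in Triple, which fills the stacked levels first and the current alternative chain last.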

END Fill;

(* FillSucc   Fill triple list with succ. of alt-chain at pc
   --------------------------------------------------------------------*)
PROCEDURE FillSucc(pc,lacts:CARDINAL);
  VAR opcode,sy,nextpc,altpc: CARDINAL;
BEGIN
  AdjustPc(pc);
  WHILE pc>0 DO  (*fill with successors of alternative-starts*)
    GetSymInstr(pc,opcode,sy,nextpc,altpc);
    IF nextpc>0 THEN Fill(nextpc,lacts); END;
    pc:=altpc;
  END; (*WHILE*)
END FillSucc;
(* GetSymInstr   Get G-code instruction at address pc
   --------------------------------------------------------------------*)
PROCEDURE GetSymInstr(pc:CARDINAL; VAR opcode,sy,nextpc,altpc:CARDINAL);
BEGIN (*assert: pc points to a symbol instruction (not RET, JMP, SEM, ANY)*)
  WITH tab^ DO
    opcode:=ORD(code[pc]);
    IF (opcode<=epsa) AND (opcode<>any)
    THEN sy:=ORD(code[pc+1]);
    ELSE sy:=0;
    END;
    CASE opcode OF
      t,nt,eps:
        nextpc:=pc+2; altpc:=0;
    | ta,nta,anya,epsa:
        nextpc:=pc+4; altpc:=256*ORD(code[pc+2])+ORD(code[pc+3]);
    | nts:
        nextpc:=pc+3; altpc:=0;
    | ntas:
        nextpc:=pc+5; altpc:=256*ORD(code[pc+2])+ORD(code[pc+3]);
    | any:
        nextpc:=pc+1; altpc:=0;
    END; (*CASE*)
    AdjustPc(nextpc); AdjustPc(altpc);
  END;
  (*assert: nextpc,altpc point to symbol instructions or are zero*)
END GetSymInstr;
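Together, GetSymInstr and AdjustPc define the instruction layout. The table below gives the size of each symbol instruction and whether it carries an alternative address. Semantic actions occupy one byte, a jmp is followed by a two-byte target, and a ret ends the chain.

| Instructions | Size (bytes) | Alternative address |
| --- | --- | --- |
| t, nt, eps | 2 | none |
| nts | 3 | none |
| ta, nta, anya, epsa | 4 | 2 bytes, at offset 2 |
| ntas | 5 | 2 bytes, at offset 2 |
| any | 1 | none |

The following Python sketch restates both procedures. It keeps index 0 unused so that address 0 can mean "none", as in the listing. The function names and the sample bytes are hypothetical.

```python
# Opcode numbers from the CONST section; values above RET are semantic actions.
T, TA, NT, NTA, NTS, NTAS, ANY, ANYA, EPS, EPSA, JMP, RET = range(12)


def adjust_pc(code, pc):
    """Like AdjustPc: skip semantic actions, follow jumps, map RET to 0."""
    while pc != 0:
        op = code[pc]
        if op <= EPSA:
            return pc  # a symbol instruction
        if op == JMP:
            pc = 256 * code[pc + 1] + code[pc + 2]
        elif op == RET:
            return 0
        else:
            pc += 1  # one-byte semantic action
    return 0


def get_sym_instr(code, pc):
    """Like GetSymInstr: return (opcode, sy, nextpc, altpc)."""
    opcode = code[pc]
    sy = code[pc + 1] if opcode <= EPSA and opcode != ANY else 0
    if opcode in (T, NT, EPS):
        nextpc, altpc = pc + 2, 0
    elif opcode in (TA, NTA, ANYA, EPSA):
        nextpc, altpc = pc + 4, 256 * code[pc + 2] + code[pc + 3]
    elif opcode == NTS:
        nextpc, altpc = pc + 3, 0
    elif opcode == NTAS:
        nextpc, altpc = pc + 5, 256 * code[pc + 2] + code[pc + 3]
    elif opcode == ANY:
        nextpc, altpc = pc + 1, 0
    else:
        raise ValueError("pc does not point to a symbol instruction")
    return opcode, sy, adjust_pc(code, nextpc), adjust_pc(code, altpc)


CODE = [None, TA, 5, 0, 8,   # 1: ta 5, alternative at 8
        12,                  # 5: semantic action
        T, 3,                # 6: t 3
        JMP, 0, 6]           # 8: jmp 6
print(get_sym_instr(CODE, 1))  # (1, 5, 6, 6)
```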
(* Triple   Fill triple list
   --------------------------------------------------------------------*)
PROCEDURE Triple(altroot:CARDINAL);
  VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO maxt DO                 (*clear triple list*)
    newpc[i]:=0; newlacts[i]:=0;
  END;
  FOR i:=1 TO lacts DO                (*fill with succ. of stacked nt's*)
    (*s[1] contains successor at level 0*)
    FillSucc(StackElem(i),i-1);
    Fill(StackElem(i),i-1);
  END;
  FillSucc(altroot,lacts);            (*fill with succ. of alt-chain*)
  Fill(altroot,lacts);                (*fill with current alt-chain*)
END Triple;

(*===========================  END ERRORS  =============================*)

(*===========================  SYNTAXSTACK  ============================*)

PROCEDURE Pop(VAR loc:CARDINAL);
BEGIN
  IF lacts>0
  THEN loc:=s[lacts]; DEC(lacts);
  ELSE WriteString(con,"--- Parser stack underflow.$"); HALT;
  END;
  (*IF printnodes THEN WriteString(con," pop"); END;*)
END Pop;

PROCEDURE Push(loc:CARDINAL);
BEGIN
  IF lacts<lmaxs
  THEN INC(lacts); s[lacts]:=loc;
  ELSE WriteString(con,"--- Parser stack overflow.$"); HALT;
  END;
  (*IF printnodes THEN WriteString(con," push"); END;*)
END Push;

PROCEDURE RestoreStack;
BEGIN s:=olds; END RestoreStack;

PROCEDURE SaveStack;
BEGIN olds:=s; END SaveStack;

PROCEDURE StackElem(i:CARDINAL):CARDINAL;
BEGIN RETURN s[i]; END StackElem;

(* TableContents   A dirty trick to initialize the grammar tables
   --------------------------------------------------------------------*)
PROCEDURE TableContents;
BEGIN (*%% dont remove or change this comment*)
-->tables
END TableContents;

PROCEDURE Parse(VAR corr:BOOLEAN);
  VAR
    altroot:  CARDINAL;  (*root of current alternative chain*)
    mustread: BOOLEAN;   (*TRUE if next symbol must be read*)
    opcode:   CARDINAL;  (*instruction code*)
    running:  BOOLEAN;   (*interpreter state*)
    sy:       CARDINAL;
BEGIN
  tab:=ADR(TableContents)+10D;  (*initialize the tables*)
  pc:=startpc; altroot:=pc;
  line:=1; col:=0;
  correct:=TRUE; mustread:=TRUE; running:=TRUE;
  WITH tab^ DO
    WHILE running DO
      opcode:=ORD(code[pc]);
      IF mustread AND (opcode<=epsa) THEN
        NextSym; mustread:=FALSE; INC(errdist); altroot:=pc;
      END;
      (*IF printnodes THEN WriteCard(con,pc,5); END;*)
      INC(pc);
      CASE opcode OF
        t:    IF ORD(typ)=ORD(code[pc])
              THEN IF typ=eofsy                               (*t recognized*)
                   THEN running:=FALSE;
                   ELSE INC(pc); mustread:=TRUE;
                   END;
              ELSE Error(pc,altroot);
              END;
      | ta:   IF ORD(typ)=ORD(code[pc])
              THEN INC(pc,3); mustread:=TRUE;                 (*t recognized*)
              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);   (*try alt.*)
              END;
      | nt,nts:
              sy:=ORD(code[pc]);
              IF Match(typ,ntsymbols[sy].first) OR ntsymbols[sy].del
              THEN                                            (*right nt, parse it*)
                IF opcode=nts THEN INC(pc); Semant(ORD(code[pc])); END;
                Push(pc+1); pc:=ntsymbols[sy].startpc;
                altroot:=pc;
              ELSE Error(pc,altroot);
              END;
      | nta,ntas:
              sy:=ORD(code[pc]);
              IF Match(typ,ntsymbols[sy].first)
              THEN                                            (*right nt, parse it*)
                INC(pc,3);
                IF opcode=ntas THEN Semant(ORD(code[pc])); INC(pc) END;
                Push(pc); pc:=ntsymbols[sy].startpc;
                altroot:=pc;
              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);   (*try alt.*)
              END;
      | any:  mustread:=TRUE;                                 (*any recognized*)
      | anya: IF Match(typ,anyset[ORD(code[pc])])
              THEN INC(pc,3); mustread:=TRUE;                 (*any recognized*)
              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
              END;
      | eps:  IF Match(typ,epsset[ORD(code[pc])])
              THEN INC(pc);
              ELSE Error(pc,altroot);
              END;
      | epsa: IF Match(typ,epsset[ORD(code[pc])])
              THEN INC(pc,3);                                 (*eps recognized*)
              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
              END;
      | jmp:  pc:=ORD(code[pc])*256+ORD(code[pc+1]);          (*goto successor*)
      | ret:  Pop(pc); altroot:=pc;                           (*end of nt*)
      ELSE (*sem*)
        IF correct THEN Semant(ORD(opcode)); END;
      END; (*CASE*)
    END; (*WHILE running*)
  END; (*WITH tab^*)
  corr:=correct;
END Parse;

BEGIN
  printinput:=FALSE;
  printnodes:=FALSE;
  errdist:=100;
  lacts:=0;
END -->modulename.

Cross-reference list of cocosynframe (identifiers; line numbers illegible): ADDRESS, AdjustPc, ADR, Allocate, altpc, altroot, analyzer, any, anya, anyset, at, Attributenumbers, code, col, con, corr, correct, declarations, del, e, el, eofsy, eps, epsa, epsset, errdist, errdistmin, Error, Errornode, Errorptr, Errors, FileIO, Fill, FillSucc, first, FORWARD, GetSy, GetSymInstr, GiveName, h, HALT, header, i, implementation, INLINE, input, j, jmp, l, lacts, line, lmaxs, loc, Match, maxany, maxcode, maxeps, maxname, maxnamep, maxp, maxs, maxt, module, modulename, mustread, name, Namelist, namep, Namepointers, newlacts, newpc, next, nextpc, NextSym, nra, nt, nta, ntas, nts, ntsymbols, olds, opcode, p, Parse, pc, pc1, Pop, Pragma, Pragmalist, printinput, printnodes, ps, Push, q, RestoreStack, ret, running, s, SaveStack, sem2, sem3, Semant, semantic, set, Stack, StackElem, startpc, sy, Symbollist, Symbolnode, Symbolset, SyntaxError, SYSTEM, System, t, ta, tab, TableContents, tables, Triple, typ, WriteCard, WriteLn, WriteString.

(* cocotst   Perform various tests with the top-down graph        Moe 12.1.83
   ======================================================================
   This module tests
   a) if all nonterminals can be reached from the start symbol
   b) if there exist productions for all nonterminals
   c) if all nonterminals can be derived to terminals
   d) if the grammar is free of circular derivations
   e) if the grammar satisfies the LL(1)-conditions
   ----------------------------------------------------------------------*)
DEFINITION MODULE cocotst;

PROCEDURE FindCircularRules(VAR ok:BOOLEAN);
(* Finds and prints the circular part of the grammar.
   ok means: no circular part*)

PROCEDURE LL1Test(VAR ll1:BOOLEAN);
(* Checks if the grammar satisfies the LL(1) conditions*)

PROCEDURE TestCompleteness(VAR ok:BOOLEAN);
(* ok=TRUE if all nonterminals have rules*)

PROCEDURE TestIfAllNtReached(VAR ok:BOOLEAN);
(* ok=TRUE if all nonterminals can be reached from the start symbol*)

PROCEDURE TestIfNtToTerm(VAR ok:BOOLEAN);
(* ok=TRUE if all nonterminals can be reduced to terminals*)

END cocotst.

cocotst.MOD

(* cocotst   Perform various tests with the top-down graph        Moe 11.1.84
   ======================================================================
   This module tests
   a) if all nonterminals can be reached from the start symbol
   b) if there exist productions for all nonterminals
   c) if all nonterminals can be derived to terminals
   d) if the grammar is free of circular derivations
   e) if the grammar satisfies the LL(1)-conditions
   ----------------------------------------------------------------------*)
IMPLEMENTATION MODULE cocotst;

FROM cocogra IMPORT rootloc, ClearMarkList, Deletable, DelNode,
                    Graphnode, GetNode, Mark, Marked, Marklist;
FROM cocolex IMPORT ddt, GetName;
FROM cocolst IMPORT lst;
FROM cocosym IMPORT maxp, maxs, maxt, ClearSet, GetF,
                    GetFirstSet, GetFo, GetSy, IsInSet, RepSy,
                    SetBit, Unit, Symbolnode, Symbolset, Symboltype;
FROM FileIO  IMPORT con, WriteCard, WriteString, WriteText, WriteLn;

VAR
  headline: BOOLEAN;  (*TRUE if header shall be printed*)
  ll1:      BOOLEAN;  (*TRUE if LL(1) conditions hold*)
(* FindCircularRules   Test grammar for circular derivations
   --------------------------------------------------------------------*)
PROCEDURE FindCircularRules(VAR ok:BOOLEAN);
  CONST
    circmax = 150;
  TYPE
    Circrule = RECORD
                 left, right: CARDINAL;
                 del: BOOLEAN;
               END;
    Circrulelist = ARRAY[1..circmax] OF Circrule;
  VAR
    c: Circrulelist;
    changed: BOOLEAN;
    headline: BOOLEAN;
    i,j,k,dummy: CARDINAL;
    lcirc: CARDINAL;
    m: Marklist;
    singleset: Marklist;  (*set of single nonterminals in a production*)
    sn: Symbolnode;
    rside,lside: BOOLEAN;

  PROCEDURE GetSingles(loc:CARDINAL; VAR singles:Marklist);
    VAR gn: Graphnode;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    Mark(loc,m);
    GetNode(loc,gn);
    CASE gn.typ OF
      eps:   GetSingles(gn.rp,singles);
    | t,any: ;
    | nt:    IF Deletable(gn.rp) THEN Mark(gn.sp,singles); END;
             IF DelNode(gn) THEN GetSingles(gn.rp,singles); END;
    END; (*CASE*)
    GetSingles(gn.lp,singles);
  END GetSingles;

  PROCEDURE PutCirc(i:CARDINAL);
    VAR
      l: CARDINAL;
      name: ARRAY[1..50] OF CHAR;
      sn: Symbolnode;
  BEGIN
    IF headline THEN
      WriteLn(lst);
      WriteString(lst,"Circular part for this grammar:");
      WriteLn(lst);
      headline:=FALSE;
    END;
    WriteString(lst,"   ");
    GetSy(c[i].left,sn); GetName(sn.spix,name,l);
    WriteText(lst,name,l); WriteString(lst," --> ");
    GetSy(c[i].right,sn); GetName(sn.spix,name,l);
    WriteText(lst,name,l); WriteLn(lst);
  END PutCirc;

BEGIN (*FindCircularRules*)
  lcirc:=0;
  (*------------------------------------ fill list of circular derivations c*)
  FOR i:=maxp+1 TO maxs DO
    ClearMarkList(singleset); ClearMarkList(m);
    GetSy(i,sn);
    GetSingles(sn.start,singleset);  (*get nt's j such that i->j*)
    FOR j:=maxp+1 TO maxs DO
      IF Marked(j,singleset) THEN
        INC(lcirc);
        WITH c[lcirc] DO left:=i; right:=j; del:=FALSE; END;
        IF ddt["D"] THEN
          WriteCard(con,lcirc,6); WriteCard(con,i,6);
          WriteCard(con,j,6); WriteLn(con);
        END;
      END; (*IF Marked*)
    END; (*FOR j*)
  END; (*FOR i*)
  (*------------------------------------ remove non circular derivations from c*)
  REPEAT
    changed:=FALSE;
    FOR i:=1 TO lcirc DO
      IF NOT c[i].del THEN
        rside:=FALSE; lside:=FALSE;
        FOR j:=1 TO lcirc DO
          IF NOT c[j].del THEN
            IF c[i].left=c[j].right THEN rside:=TRUE; END;
            IF c[j].left=c[i].right THEN lside:=TRUE; END;
          END;
        END; (*FOR j*)
        IF NOT rside OR NOT lside THEN
          c[i].del:=TRUE; changed:=TRUE;
          IF ddt["D"] THEN
            WriteCard(con,i,6); WriteString(con," deleted$");
          END;
        END;
      END; (*IF NOT c[i].del*)
    END; (*FOR*)
  UNTIL NOT changed;
  (*------------------------------------ c contains the circular part of the grammar. Print it*)
  ok:=TRUE; headline:=TRUE;
  FOR i:=1 TO lcirc DO
    IF NOT c[i].del THEN PutCirc(i); ok:=FALSE; END;
  END;
  IF ok THEN
    WriteLn(lst);
    WriteString(lst,"Grammar contains no circular derivations.");
    WriteLn(lst);
  END;
END FindCircularRules;
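FindCircularRules first collects every pair i -> j in which nonterminal i can derive the single nonterminal j. It then repeatedly deletes any pair whose left side has no incoming pair or whose right side has no outgoing pair. The pairs that survive lie on cycles. The following Python sketch performs the same fixpoint computation, using strings for nonterminals. The name circular_part is hypothetical.

```python
def circular_part(pairs):
    """Keep only derivations left -> right that lie on a cycle.

    pairs: list of (left, right) meaning "left can derive the single
    nonterminal right" (the list c built by FindCircularRules).
    """
    alive = [True] * len(pairs)
    changed = True
    while changed:  # REPEAT ... UNTIL NOT changed
        changed = False
        for i, (left, right) in enumerate(pairs):
            if not alive[i]:
                continue
            rside = any(alive[j] and pairs[j][1] == left for j in range(len(pairs)))
            lside = any(alive[j] and pairs[j][0] == right for j in range(len(pairs)))
            if not (rside and lside):  # no way into left or no way out of right
                alive[i] = False
                changed = True
    return [p for p, keep in zip(pairs, alive) if keep]


print(circular_part([("A", "B"), ("B", "A"), ("B", "C"), ("D", "A")]))
# [('A', 'B'), ('B', 'A')]
```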
(* LL1Error   Print LL(1) error message
   --------------------------------------------------------------------*)
PROCEDURE LL1Error(code,line,sy:CARDINAL);
  VAR
    l: CARDINAL;
    name: ARRAY[1..50] OF CHAR;
    sn: Symbolnode;
BEGIN
  IF headline THEN
    headline:=FALSE;
    WriteLn(lst); WriteString(lst,"LL(1)-error(s):"); WriteLn(lst);
  END;
  WriteString(lst,"   line"); WriteCard(lst,line,4);
  GetSy(sy,sn); GetName(sn.spix,name,l);
  WriteString(lst,"   ");
  CASE code OF
    1: WriteText(lst,name,l);
       WriteString(lst," is start of more than one alternative.");
  | 2: WriteText(lst,name,l);
       WriteString(lst," is start and successor of deletable ");
       WriteString(lst,"rest of rule.");
  END;
  WriteLn(lst);
  ll1:=FALSE;
END LL1Error;

(* LL1Test   Collects terminal sets and checks LL(1) conditions
   --------------------------------------------------------------------*)
PROCEDURE LL1Test(VAR ll1:BOOLEAN);
  VAR
    dummy: CARDINAL;
    gn: Graphnode;
    i,loc: CARDINAL;
    m: Marklist;
    sn: Symbolnode;

  PROCEDURE Test(VAR s1,s2:Symbolset; code,line:CARDINAL);
    VAR i: CARDINAL;
  BEGIN
    FOR i:=0 TO maxt DO
      IF IsInSet(i,s1) AND IsInSet(i,s2) THEN
        LL1Error(code,line,i);
      END;
    END;
  END Test;

  PROCEDURE CheckAlternatives(loc,sym:CARDINAL);
    VAR
      gn: Graphnode;
      locset,s,first,follow: Symbolset;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    GetNode(loc,gn);
    IF ddt["F"] THEN
      WriteCard(con,loc,6); WriteCard(con,ORD(gn.typ),6);
      WriteCard(con,gn.sp,6); WriteLn(con);
    END;
    IF Deletable(loc) THEN
      GetFirstSet(loc,s); GetFo(sym,follow);
      Test(s,follow,2,gn.line);
    END;
    ClearSet(s,maxt);
    WHILE loc<>0 DO
      Mark(loc,m);
      GetNode(loc,gn);
      IF DelNode(gn)
      THEN GetFirstSet(gn.rp,locset);
      ELSE ClearSet(locset,maxt);
      END;
      CASE gn.typ OF
        t:       SetBit(locset,gn.sp);
      | nt:      GetF(gn.sp,first); Unit(locset,first,maxt);
      | eps,any: ;
      END;
      Test(s,locset,1,gn.line);
      Unit(s,locset,maxt);
      CheckAlternatives(gn.rp,sym);
      loc:=gn.lp;
    END;
  END CheckAlternatives;

BEGIN (*LL1Test*)
  ll1:=TRUE; headline:=TRUE;
  FOR i:=maxp+1 TO maxs DO
    ClearMarkList(m);
    GetSy(i,sn);
    CheckAlternatives(sn.start,i);
  END;
  IF ll1 THEN
    WriteLn(lst);
    WriteString(lst,"Grammar satisfies LL(1)-conditions.");
    WriteLn(lst);
  END;
END LL1Test;
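CheckAlternatives applies two checks to each alternative chain:
- Code 1: if a terminal starts more than one alternative (the union of the earlier start sets meets the current one), that is an LL(1) conflict.
- Code 2: if the whole chain is deletable and a terminal is in both its start set and the follow set of the owning nonterminal, that is also a conflict.

The following Python sketch reduces both checks to set operations. It assumes the start set of each alternative has already been computed, so the graph traversal is left out. The function name and input format are hypothetical.

```python
def ll1_conflicts(alt_firsts, chain_deletable, follow):
    """Return (code, terminal) pairs in the sense of LL1Error.

    alt_firsts:      one set of start terminals per alternative of a chain
    chain_deletable: True if the whole chain can derive the empty string
    follow:          follow set of the nonterminal that owns the chain
    code 1: terminal starts more than one alternative
    code 2: terminal is start and successor of a deletable part
    """
    conflicts = []
    if chain_deletable:
        chain_first = set().union(*alt_firsts)
        conflicts += [(2, t) for t in sorted(chain_first & follow)]
    seen = set()  # union of the start sets of earlier alternatives (s in the listing)
    for first in alt_firsts:
        conflicts += [(1, t) for t in sorted(seen & first)]
        seen |= first
    return conflicts


print(ll1_conflicts([{"a", "b"}, {"b"}, {"c"}], True, {"c", "d"}))
# [(2, 'c'), (1, 'b')]
```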

(* TestCompleteness   Test if all nonterminals have rules
   --------------------------------------------------------------------*)
PROCEDURE TestCompleteness(VAR ok:BOOLEAN);
  VAR
    sn: Symbolnode;
    i,l,dummy: CARDINAL;
    name: ARRAY[1..50] OF CHAR;
BEGIN
  ok:=TRUE;
  FOR i:=maxp+1 TO maxs DO
    GetSy(i,sn);
    IF sn.start=0 THEN
      IF ok THEN
        WriteLn(lst); WriteString(lst,"Nonterminals without rules:"); WriteLn(lst);
      END;
      GetName(sn.spix,name,l);
      WriteString(lst,"   "); WriteText(lst,name,l); WriteLn(lst);
      ok:=FALSE;
    END;
  END; (*FOR*)
  IF ok THEN
    WriteLn(lst);
    WriteString(lst,"All nonterminals have rules.");
    WriteLn(lst);
  END;
END TestCompleteness;

262

(* TestIfAllNtReached      Tests if all nts can be reached
---------------------------------------------------------------------- *)
PROCEDURE TestIfAllNtReached(VAR ok:BOOLEAN);
VAR gn:        Graphnode;
    i,l,dummy: CARDINAL;
    m:         Marklist;
    name:      ARRAY[1..50] OF CHAR;
    sn:        Symbolnode;
    reached:   Marklist;

  PROCEDURE MarkReachedNts(loc:CARDINAL);
  VAR gn: Graphnode;
      sn: Symbolnode;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    Mark(loc,m);
    GetNode(loc,gn);
    WITH gn DO
      IF (typ=nt) AND NOT Marked(sp,reached) THEN
        Mark(sp,reached); GetSy(sp,sn); MarkReachedNts(sn.start);
      END;
      MarkReachedNts(lp);
      MarkReachedNts(rp);
    END;
  END MarkReachedNts;

BEGIN
  ClearMarkList(m); ClearMarkList(reached);
  GetNode(rootloc,gn); GetSy(gn.sp,sn);
  Mark(gn.sp,reached);
  MarkReachedNts(sn.start);
  ok:=TRUE;
  FOR i:=maxp+1 TO maxs DO (*report nts not marked*)
    IF NOT Marked(i,reached) THEN
      GetSy(i,sn); GetName(sn.spix,name,l);
      WriteString(lst,"Nonterminal "); WriteText(lst,name,l);
      WriteString(lst," cannot be reached."); WriteLn(lst);
      ok:=FALSE;
    END;
  END;
  IF ok THEN
    WriteLn(lst);
    WriteString(lst,"All nonterminals can be reached."); WriteLn(lst);
  END;
END TestIfAllNtReached;
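MarkReachedNts is a depth-first traversal that starts at the root nonterminal. A minimal Python sketch of the same idea, assuming the grammar is given as a hypothetical mapping from each nonterminal to the set of nonterminals that occur on its right-hand sides (instead of the graph nodes with lp/rp links used in the listing), and using an explicit stack instead of recursion:

```python
def unreachable_nonterminals(rules, start):
    """Return the nonterminals that cannot be reached from start, sorted.

    rules maps each nonterminal to the set of nonterminals occurring on its
    right-hand sides (a simplified stand-in for the syntax graph).
    """
    reached = {start}          # plays the role of the Marklist 'reached'
    stack = [start]
    while stack:
        nt = stack.pop()
        for succ in rules.get(nt, ()):
            if succ not in reached:
                reached.add(succ)
                stack.append(succ)
    return sorted(set(rules) - reached)
```

With rules = {"S": {"A"}, "A": {"A", "B"}, "B": set(), "C": {"S"}} and start symbol "S", only "C" is reported; the self-reference of "A" does not cause a loop because every nonterminal is marked before it is expanded.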

(* TestIfNtToTerm          Test if all nt can be derived to terminals
---------------------------------------------------------------------- *)
PROCEDURE TestIfNtToTerm(VAR ok:BOOLEAN);
VAR i,l,dummy: CARDINAL;
    sn:        Symbolnode;
    name:      ARRAY[1..50] OF CHAR;
    changed:   BOOLEAN;
    termlist:  Marklist;  (*list of nts which can be derived to terminal symbols*)
    m:         Marklist;
    term:      BOOLEAN;

  PROCEDURE IsTerm(loc:CARDINAL): BOOLEAN;
  VAR gn: Graphnode;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN FALSE; END;
    Mark(loc,m);
    GetNode(loc,gn);
    WITH gn DO
      IF (typ=nt) AND NOT Marked(sp,termlist)
        THEN RETURN IsTerm(lp);
        ELSE RETURN (rp=0) OR IsTerm(rp) OR IsTerm(lp);
      END;
    END;
  END IsTerm;

BEGIN (*TestIfNtToTerm*)
  ClearMarkList(termlist);
  REPEAT
    changed:=FALSE;
    FOR i:=maxp+1 TO maxs DO
      IF NOT Marked(i,termlist) THEN
        GetSy(i,sn);
        ClearMarkList(m);
        term:=IsTerm(sn.start);
        IF term THEN Mark(i,termlist); changed:=TRUE; END;
        IF ddt["E"] THEN
          WriteCard(con,i,6);
          IF term
            THEN WriteString(con," reducable to term.$");
            ELSE WriteString(con," not reducable to term.$");
          END;
        END;
      END; (*IF NOT Marked*)
    END; (*FOR*)
  UNTIL NOT changed;
  ok:=TRUE;
  WriteLn(lst);
  FOR i:=maxp+1 TO maxs DO
    IF NOT Marked(i,termlist) THEN
      GetSy(i,sn); GetName(sn.spix,name,l);
      WriteText(lst,name,l);
      WriteString(lst," cannot be derived to terminals."); WriteLn(lst);
      ok:=FALSE;
    END;
  END; (*FOR*)
  IF ok THEN
    WriteString(lst,"All nonterminals can be derived to terminals.");
    WriteLn(lst);
  END;
END TestIfNtToTerm;

END cocotst.
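TestIfNtToTerm computes its result as a fixed point: the REPEAT loop keeps marking nonterminals until a complete pass over all of them adds nothing new. The Python sketch below shows the same iteration on a hypothetical representation in which each nonterminal maps to a list of alternatives and each alternative is a list of symbols. It replaces the graph walk of IsTerm by a direct test that every symbol of some alternative is either a terminal or an already marked nonterminal.

```python
def derivable_to_terminals(grammar):
    """Return the set of nonterminals that derive a string of terminals.

    grammar maps each nonterminal to a list of alternatives; an alternative is
    a list of symbols. Keys of grammar are nonterminals, all other symbols are
    treated as terminals. An empty alternative derives the empty string.
    """
    term = set()               # plays the role of the Marklist 'termlist'
    changed = True
    while changed:             # REPEAT ... UNTIL NOT changed
        changed = False
        for nt, alternatives in grammar.items():
            if nt in term:
                continue
            if any(all(sym not in grammar or sym in term for sym in alt)
                   for alt in alternatives):
                term.add(nt)
                changed = True
    return term
```

For {"S": [["a", "A"], ["B"]], "A": [["b"]], "B": [["B", "c"]], "C": [["C"]]} the first pass marks only A, the second pass marks S, and the third pass changes nothing, so B and C are reported as not derivable to terminals.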

Cross-reference list of cocotst: identifiers with the numbers of the lines in which they occur. The line-number columns are not legible in the scanned original; the identifiers listed are any, changed, CheckAlternatives, circmax, Circrule, Circrulelist, ClearMarkList, ClearSet, cocogra, cocolex, cocolst, cocosym, cocotst, code, con, ddt, del, Deletable, DelNode, dummy, eps, FileIO, FindCircularRules, first, follow, GetF, GetFirstSet, GetFo, GetName, GetNode, GetSingles, GetSy, gn, Graphnode, headline, i, IsInSet, IsTerm, j, k, l, left, line, LL1Error, LL1Test, loc, locset, m, Mark, Marked, Marklist, MarkReachedNts, maxp, maxs, maxt, name, nt, ok, PutCirc, reached, RepSy, right, rootloc, rp, rside, s, s1, s2, SetBit, singles, singleset, sn, sp, spix, start, sy, sym, Symbolnode, Symbolset, Symboltype, t, term, termlist, Test, TestCompleteness, TestIfAllNtReached, TestIfNtToTerm, typ, Unit, WriteCard, WriteLn, WriteString, WriteText.

Errors.DEF

(* Errors          General module to store error messages        Moe 21.03.84

   This module stores information about syntax errors and semantic errors.
   The information can either be retrieved afterwards or be printed
   automatically as simple error messages.
   Furthermore the module contains procedures to report compiler errors
   and implementation restrictions. These procedures cause a program stop.
---------------------------------------------------------------------- *)
DEFINITION MODULE Errors;

FROM FileIO IMPORT File;

TYPE
  Symbolname = ARRAY[1..25] OF CHAR;
  Errorptr   = POINTER TO Errornode;
  Errornode  = RECORD
                 txt:  Symbolname;  (*expected symbol in syntax error message*)
                 l:    CARDINAL;
                 next: Errorptr;
               END;

PROCEDURE CompErr(nr:CARDINAL);
(* Reports compiler error nr and stops the program*)

PROCEDURE GetNextSemErr(VAR nr,line,col:CARDINAL);
(* Gets the error number, the line number and the column number of the
   next semantic error. nr=0 if no next error exists*)

PROCEDURE GetNextSynErr(VAR symbols:Errorptr; VAR line,col:CARDINAL);
(* Gets the expected symbols, the line number and the column number of
   the next syntax error. symbols=NIL if no next error exists*)

PROCEDURE GetNumberOfErrors(VAR synerrors,semerrors:CARDINAL);
(* Gets the total number of syntax errors and semantic errors which
   occurred during compilation*)

PROCEDURE PrintSemErrors(f:File; VAR semerrors:CARDINAL);
(* Prints error messages for all stored semantic errors (line,col,
   error number). semerrors holds the total number of stored semantic
   errors*)

PROCEDURE PrintSynErrors(f:File; VAR synerrors:CARDINAL);
(* Prints error messages for all stored syntax errors (line,col,
   "near symbol",expected symbols). synerrors holds the total number
   of stored syntax errors*)

PROCEDURE PrintSynError(f:File; symbols:Errorptr; col:CARDINAL);
(* Prints one error message line (^ expected symbols).*)

PROCEDURE Restriction(nr:CARDINAL);
(* Reports implementation restriction nr and stops the program*)

PROCEDURE SemErr(nr,line,col:CARDINAL);
(* Stores the error number, line number and column number of a semantic
   error*)

PROCEDURE SyntaxError(symbols:Errorptr; line,col:CARDINAL);
(* Stores the "near-symbol", the expected symbols, the line number and
   the column number of a syntax error*)

END Errors.

Errors.MOD

(* Errors          General module to store error messages        Moe 21.03.84

   This module stores information about syntax errors and semantic errors.
   The information can either be retrieved afterwards or be printed
   automatically as simple error messages.
   Furthermore the module contains procedures to report compiler errors
   and implementation restrictions. These procedures cause a program stop.
---------------------------------------------------------------------- *)
IMPLEMENTATION MODULE Errors;

FROM FileIO IMPORT File;              (*imports of definition module*)

FROM FileIO IMPORT con, Write, WriteCard, WriteLn, WriteString,
                   WriteText, Read;   (*imports of implementation module*)
FROM System IMPORT Allocate, Deallocate, Terminate, normal;

TYPE
  Semerrptr = POINTER TO Semerror;
  Semerror  = RECORD
                nr,line,col: CARDINAL;
                next:        Semerrptr;
              END;
  Synerrptr = POINTER TO Synerror;
  Synerror  = RECORD
                symbols:  Errorptr;
                line,col: CARDINAL;
                next:     Synerrptr;
              END;
VAR
  semerr: Semerrptr;
  synerr: Synerrptr;

(* CompErr         Reports compiler error nr and stops the program
---------------------------------------------------------------------- *)
PROCEDURE CompErr(nr:CARDINAL);
VAR dummy:CARDINAL; ch:CHAR;
BEGIN
  PrintSynErrors(con,dummy); PrintSemErrors(con,dummy);
  WriteString(con,"Compiler error "); WriteCard(con,nr,0);
  WriteString(con,". Program terminated.$");
  WriteString(con,"Press a key to continue.$"); Read(con,ch);
  Terminate(normal);
END CompErr;

(* GetNextSemErr   Gets next semantic error information
---------------------------------------------------------------------- *)
PROCEDURE GetNextSemErr(VAR nr,line,col:CARDINAL);
VAR p: Semerrptr;
BEGIN
  IF semerr=NIL
    THEN nr:=0; line:=0; col:=0;
    ELSE
      p:=semerr;
      nr:=p^.nr; line:=p^.line; col:=p^.col;
      semerr:=p^.next; Deallocate(p);
  END;
END GetNextSemErr;

(* GetNextSynErr   Gets next syntax error information
---------------------------------------------------------------------- *)
PROCEDURE GetNextSynErr(VAR symbols:Errorptr; VAR line,col:CARDINAL);
VAR p: Synerrptr;
BEGIN
  IF synerr=NIL
    THEN symbols:=NIL; line:=0; col:=0;
    ELSE
      p:=synerr;
      symbols:=p^.symbols; line:=p^.line; col:=p^.col;
      synerr:=p^.next; Deallocate(p);
  END;
END GetNextSynErr;

(* GetNumberOfErrors   Gets the total number of errors that occurred
---------------------------------------------------------------------- *)
PROCEDURE GetNumberOfErrors(VAR synerrors,semerrors:CARDINAL);
VAR syn: Synerrptr;
    sem: Semerrptr;
BEGIN
  synerrors:=0; syn:=synerr;
  WHILE syn<>NIL DO INC(synerrors); syn:=syn^.next; END;
  semerrors:=0; sem:=semerr;
  WHILE sem<>NIL DO INC(semerrors); sem:=sem^.next; END;
END GetNumberOfErrors;

(* PrintSemErrors  Prints simple error messages for semantic errors
---------------------------------------------------------------------- *)
PROCEDURE PrintSemErrors(f:File; VAR semerrors:CARDINAL);
VAR p:         Semerrptr;
    synerrors: CARDINAL;
BEGIN
  GetNumberOfErrors(synerrors,semerrors);
  IF semerrors>0 THEN
    WriteString(f,"Semantic errors:$$");
    p:=semerr;
    WHILE p<>NIL DO
      WriteString(f,"line"); WriteCard(f,p^.line,5);
      WriteString(f," col"); WriteCard(f,p^.col,3);
      WriteString(f,": error "); WriteCard(f,p^.nr,0);
      WriteLn(f);
      p:=p^.next;
    END;
  END;
END PrintSemErrors;

(* PrintSym        Print a symbol in error message
---------------------------------------------------------------------- *)
PROCEDURE PrintSym(f:File; txt:ARRAY OF CHAR; len:CARDINAL);
BEGIN
  IF len=1
    THEN Write(f,'"'); Write(f,txt[0]); Write(f,'"');
    ELSE WriteText(f,txt,len);
  END;
END PrintSym;

(* PrintExpected   Print expected symbols
---------------------------------------------------------------------- *)
PROCEDURE PrintExpected(f:File; VAR p:Errorptr);
VAR first:BOOLEAN; q:Errorptr;
BEGIN
  first:=TRUE;
  WHILE p<>NIL DO
    IF first THEN first:=FALSE
    ELSIF p^.next=NIL THEN WriteString(f,' or ')
    ELSE WriteString(f,', ')
    END;
    PrintSym(f,p^.txt,p^.l);
    q:=p; p:=p^.next; Deallocate(q);
  END;
  WriteString(f,' expected'); WriteLn(f);
END PrintExpected;

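The separator logic of PrintExpected (nothing before the first symbol, " or " before the last one, ", " elsewhere) together with the quoting rule of PrintSym can be sketched in Python as follows. The function name format_expected is hypothetical, and the sketch builds a string instead of writing to a File and freeing list nodes.

```python
def format_expected(symbols):
    """Build the text printed by PrintExpected for a list of symbol names.

    Names are separated by ', ', the last one is preceded by ' or ', and,
    as in PrintSym, a one-character symbol is enclosed in double quotes.
    """
    parts = []
    for k, sym in enumerate(symbols):
        if k > 0:
            parts.append(" or " if k == len(symbols) - 1 else ", ")
        parts.append('"' + sym + '"' if len(sym) == 1 else sym)
    return "".join(parts) + " expected"
```

For instance, format_expected(["ident", ";", "END"]) yields ident, ";" or END expected.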
(* PrintSynErrors  Prints simple error messages for syntax errors
---------------------------------------------------------------------- *)
PROCEDURE PrintSynErrors(f:File; VAR synerrors:CARDINAL);
VAR err,err1:  Synerrptr;
    p:         Errorptr;
    semerrors: CARDINAL;
BEGIN
  GetNumberOfErrors(synerrors,semerrors);
  IF synerrors>0 THEN
    WriteString(f,"Syntax errors:$$");
    err:=synerr;
    WHILE err<>NIL DO
      WriteString(f,'line'); WriteCard(f,err^.line,5);
      p:=err^.symbols;
      WriteString(f,' near '); PrintSym(f,p^.txt,p^.l);
      (* one line is illegible in the scanned listing here *)
      PrintExpected(f,p^.next); Deallocate(p);
      err1:=err; err:=err^.next; Deallocate(err1);
    END;
    synerr:=NIL; (*all nodes have been deallocated*)
  END;
END PrintSynErrors;

(* PrintSynError   Prints one error message line
---------------------------------------------------------------------- *)
PROCEDURE PrintSynError(f:File; symbols:Errorptr; col:CARDINAL);
VAR i:CARDINAL;
BEGIN
  WriteString(f,"***** ");
  FOR i:=1 TO col-1 DO Write(f," ") END;
  WriteString(f,"^ ");
  PrintExpected(f,symbols^.next); Deallocate(symbols);
END PrintSynError;

(* Restriction     Reports impl. restriction nr and stops the program
---------------------------------------------------------------------- *)
PROCEDURE Restriction(nr:CARDINAL);
VAR dummy:CARDINAL; ch:CHAR;
BEGIN
  PrintSynErrors(con,dummy); PrintSemErrors(con,dummy);
  WriteString(con,"Implementation restriction "); WriteCard(con,nr,0);
  WriteString(con,". Program terminated.$");
  WriteString(con,"Press a key to continue.$"); Read(con,ch);
  Terminate(normal);
END Restriction;

(* SemErr          Stores information about semantic error
---------------------------------------------------------------------- *)
PROCEDURE SemErr(nr,line,col:CARDINAL);
VAR e,p,q: Semerrptr;
BEGIN
  Allocate(e,SIZE(Semerror)); e^.nr:=nr; e^.line:=line; e^.col:=col;
  p:=semerr; q:=NIL;
  WHILE (p<>NIL) AND (p^.line<line) DO q:=p; p:=p^.next; END;
  WHILE (p<>NIL) AND (p^.line=line) AND (p^.col<col) DO
    q:=p; p:=p^.next;
  END;
  IF q=NIL THEN semerr:=e; ELSE q^.next:=e; END;
  e^.next:=p;
END SemErr;

(* SyntaxError     Stores information about syntax error
---------------------------------------------------------------------- *)
PROCEDURE SyntaxError(symbols:Errorptr; line,col:CARDINAL);
VAR e,p,q: Synerrptr;
BEGIN
  Allocate(e,SIZE(Synerror));
  e^.symbols:=symbols; e^.line:=line; e^.col:=col;
  p:=synerr; q:=NIL;
  WHILE (p<>NIL) AND (p^.line<line) DO q:=p; p:=p^.next; END;
  WHILE (p<>NIL) AND (p^.line=line) AND (p^.col<col) DO
    q:=p; p:=p^.next;
  END;
  IF q=NIL THEN synerr:=e; ELSE q^.next:=e; END;
  e^.next:=p;
END SyntaxError;
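SemErr and SyntaxError keep their lists sorted by line and then by column, using a trailing pointer q so that the new node can be linked in front of p. A Python sketch of the same insertion for semantic errors; the SemError class and the helper names sem_err and as_tuples are hypothetical, and the returned head stands in for the module variable semerr.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SemError:
    nr: int
    line: int
    col: int
    next: Optional["SemError"] = None


def sem_err(head, nr, line, col):
    """Insert a new error into a singly linked list kept sorted by (line, col).

    Returns the (possibly new) head of the list, mirroring how SemErr updates
    the module variable semerr.
    """
    e = SemError(nr, line, col)
    q, p = None, head  # q trails one node behind p
    while p is not None and p.line < line:
        q, p = p, p.next
    while p is not None and p.line == line and p.col < col:
        q, p = p, p.next
    e.next = p
    if q is None:        # new error goes in front
        return e
    q.next = e
    return head


def as_tuples(head):
    """Collect (line, col, nr) triples for inspection."""
    out = []
    while head is not None:
        out.append((head.line, head.col, head.nr))
        head = head.next
    return out
```

Because the column comparison is strict, a new error reported at an already occupied position is placed in front of the older one, exactly as in the Modula-2 code.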

BEGIN (*Errors*)
  synerr:=NIL; semerr:=NIL;
END Errors.

Cross-reference list of Errors: identifiers with the numbers of the lines in which they occur. The line-number columns are not legible in the scanned original; the identifiers listed are Allocate, ch, col, CompErr, con, Deallocate, dummy, e, err, err1, Errorptr, Errors, f, File, FileIO, first, GetNextSemErr, GetNextSynErr, GetNumberOfErrors, i, l, len, line, next, normal, nr, p, PrintExpected, PrintSemErrors, PrintSym, PrintSynError, PrintSynErrors, q, Read, Restriction, sem, semerr, SemErr, Semerror, semerrors, Semerrptr, symbols, syn, synerr, Synerror, synerrors, Synerrptr, SyntaxError, System, Terminate, txt, Write, WriteCard, WriteLn, WriteString, WriteText.

FileIO.DEF

(* FileIO          Simple IO with more than one file              Moe 16.8.87

   This module provides procedures which are similar to those of InOut,
   except that they can be used with more than one file (even with the
   console).
---------------------------------------------------------------------- *)
DEFINITION MODULE FileIO;

FROM SYSTEM  IMPORT WORD;
FROM Toolbox IMPORT DialogPtr;
FROM OS      IMPORT ParmBlkPtr;

CONST
  DEL = 177C;
  EF  = 4C;
  EOL = 15C;
  (* one constant declaration is illegible in the scanned listing *)
  buffersize = 16*1024;

TYPE
  File = POINTER TO FileRecord;
  FileRecord = RECORD
    ref:    INTEGER;                       (*file reference number*)
    volRef: INTEGER;                       (*volume (subdirectory) reference number*)
    name:   ARRAY[0..63] OF CHAR;          (*Modula string terminated by 0C*)
    buffer: ARRAY[0..buffersize-1] OF CHAR;
    bp:     CARDINAL;                      (*index of next byte in buffer*)
    bb:     CARDINAL;                      (*number of bytes in buffer*)
    output: BOOLEAN;                       (*true, if opened for output*)
    eof:    BOOLEAN;                       (*true, if no more unread bytes*)
  END;

VAR
  con:    File;     (*console file (screen and keyboard)*)
  Done:   BOOLEAN;  (*TRUE if an operation was successful*)
  termCH: CHAR;     (*first character after input text*)

(* --- for Mac open dialog box (see "Inside Macintosh") --- *)
TYPE
  FilterHook = PROCEDURE (ParmBlkPtr): BOOLEAN;
  DialogHook = PROCEDURE (INTEGER, DialogPtr): INTEGER;
  Filetype   = ARRAY[0..3] OF CHAR;

VAR
  errCode:    INTEGER;                    (*file manager status code*)
  filterHook: FilterHook;                 (*file filter procedure (init none)*)
  dlgHook:    DialogHook;                 (*dialog handling procedure (init none)*)
  ftype:      ARRAY[0..3] OF Filetype;    (*file types to be handled by open dialog*)
                                          (*init: ftype[0]:="TEXT", ftype[1..3]:=""*)

PROCEDURE Open(VAR f:File; volRef:INTEGER; fn:ARRAY OF CHAR; output:BOOLEAN);
(* Opens file f with name fn on volume (subdirectory) volRef.
   volRef - 0: default volume; 1: internal drive; 2: external drive;
            negative: volume or subdirectory reference number.
   fn     - If not empty, fn is the name of the file to be opened on
            volume (subdirectory) volRef. The drive number may be placed
            in front of the file name separated by a colon (e.g. 1:name).
            It overwrites volRef.
          - If empty, an open dialog box is displayed which allows
            choosing the volume, subdirectory and filename. The chosen
            values are returned in f^. The value of volRef is irrelevant
            in this case.
            (Advanced programmers: Only those files are displayed whose
            file type is contained in ftype. Own procedures may be
            supplied in the variables "filterHook" and "dlgHook" to
            suppress file names in the open box or to handle additional
            dialog items.)
   output - TRUE: the specified file is opened for output. Any existing
            file with the same name is deleted.
            FALSE: the specified file is opened for input.
   Done indicates if the file f has been opened successfully.*)

PROCEDURE Close(VAR f:File);
(* Closes file f. f becomes NIL. Done indicates if the operation has been
   successful*)

PROCEDURE Read(f:File; VAR ch:CHAR);
(* Reads a character ch from the file f (no echo on the console).*)

PROCEDURE ReadCard(f:File; VAR val:CARDINAL);
(* Reads a CARDINAL from file f (leading blanks are skipped).
   termCH and Done get values*)

PROCEDURE ReadInt(f:File; VAR val:INTEGER);
(* Reads an INTEGER from file f (leading blanks are skipped).
   termCH and Done get values*)

PROCEDURE ReadString(f:File; VAR s:ARRAY OF CHAR);
(* Reads a string of characters (terminated by " " or CR) from
   file f. termCH and Done get values*)

PROCEDURE ReadWord(f:File; VAR w:CARDINAL);
(* Reads a 16 bit word w from the file f without conversion*)

PROCEDURE Write(f:File; ch:CHAR);
(* Writes a character ch to the file f*)

PROCEDURE WriteCard(f:File; nr:CARDINAL; w:INTEGER);
(* Writes a CARDINAL nr with width w to the file f. If the actual
   width of nr is bigger than w, w is expanded*)

PROCEDURE WriteHex(f:File; a:ARRAY OF WORD; length:INTEGER);
(* Writes length hexadecimal bytes from a to the file f*)

PROCEDURE WriteInt(f:File; i:INTEGER; w:INTEGER);
(* Writes an INTEGER i with w characters to file f. If the actual
   width of i is bigger than w, w is expanded*)

PROCEDURE WriteLn(f:File);
(* Skips to the start of the next line on the file f*)

PROCEDURE WriteString(f:File; s:ARRAY OF CHAR);
(* Writes a string s to the file f. Any occurrence of the character
   "$" in s causes a WriteLn*)

PROCEDURE WriteText(f:File; t:ARRAY OF CHAR; l:INTEGER);
(* Writes a text t with length l to the file f*)

PROCEDURE WriteWord(f:File; w:CARDINAL);
(* Writes a 16 bit word w without conversion to the file f*)

END FileIO.

FileIO.MOD

(* FileIO          Simple IO with more than one file              Moe 16.8.87

   This module provides procedures which are similar to those of InOut,
   except that they can be used with more than one file (even with the
   console).
---------------------------------------------------------------------- *)
IMPLEMENTATION MODULE FileIO;

FROM SYSTEM    IMPORT WORD, ADR, SETREG, REG, SHORT, VAL;
FROM MemTypes  IMPORT Str255, ProcPtr;
FROM OS        IMPORT DupFNErr, EOFErr, OSType, ParamBlockRec,
                      FS, PBHOpen, PBHCreate, PBClose, PBHDelete, PBRead,
                      PBWrite, HFS, GetCatInfo, SetCatInfo,
                      SFGetFile, SFPutFile,
                      SFget, SFput, SFReply,
                      SFTypeList;
FROM QuickDraw IMPORT Point;
FROM Toolbox   IMPORT ModStr, PasStr;
FROM System    IMPORT Allocate, Deallocate;
IMPORT Terminal;

PROCEDURE Open(VAR f:File; volRef:INTEGER; fn:ARRAY OF CHAR; output:BOOLEAN);
VAR par:   ParamBlockRec;
    s:     Str255;
    pt:    Point;
    reply: SFReply;
    tlist: SFTypeList;
    i,j,l: INTEGER;

  PROCEDURE Create(drive:INTEGER; name:ARRAY OF CHAR;
                   type,creator:OSType; VAR status:INTEGER);
  VAR status1: INTEGER; par: ParamBlockRec;
  BEGIN
    WITH par DO
      ioNamePtr:=ADR(name); ioVRefNum:=drive; ioVersNum:=0C; ioDirID:=0;
      status:=FS(PBHCreate,par); status1:=0;
      IF status=DupFNErr THEN
        status1:=FS(PBHDelete,par);
        status:=FS(PBHCreate,par);
      END;
      IF (status=0) AND (status1=0) THEN (*set finder info*)
        ioFDirIndex:=0; status:=HFS(GetCatInfo,par);
        IF status=0 THEN
          ioFlFndrInfo.fdType:=type; ioFlFndrInfo.fdCreator:=creator;
          ioDirID:=0;
          status:=HFS(SetCatInfo,par);
        END;
      END;
    END;
  END Create;

BEGIN
  Done:=TRUE; errCode:=0;
  IF fn[0]=0C THEN (*get file name from dialog box*)
    pt.v:=60; pt.h:=100; PasStr(fn,s);
    IF output
      THEN SFPutFile(pt,s,s,VAL(ProcPtr,dlgHook),reply,SFput)
      ELSE
        i:=0;
        WHILE (i<4) AND (ftype[i,0]<>0C) DO
          FOR j:=0 TO 3 DO tlist[i,j+1]:=ftype[i,j] END;
          INC(i)
        END;
        SFGetFile(pt,s,VAL(ProcPtr,filterHook),i,tlist,
                  VAL(ProcPtr,dlgHook),reply,SFget)
    END;
    IF reply.good THEN
      l:=ORD(reply.fName[0]);
      FOR i:=0 TO l DO s[i]:=reply.fName[i] END;
      volRef:=reply.vRefNum
    ELSE errCode:=2 (*cancel*)
    END;
  ELSIF (fn[1]=":") AND (fn[0]>="0") AND (fn[0]<="9") THEN
    volRef:=ORD(fn[0])-ORD("0");
    i:=2;
    WHILE (i<=HIGH(fn)) AND (fn[i]<>0C) DO s[i-1]:=fn[i]; INC(i) END;
    s[0]:=CHR(i-2); (*length byte of the Pascal string*)
  ELSE PasStr(fn,s);
  END;
  IF output & (errCode=0) THEN
    Create(volRef,s,"TEXT","????",errCode);
  END;
  IF errCode=0 THEN
    WITH par DO
      ioNamePtr:=ADR(s); ioVRefNum:=volRef; ioVersNum:=0C; ioDirID:=0;
      ioPermssn:=0C; ioMisc:=NIL;
      errCode:=FS(PBHOpen,par);
      IF errCode=0 THEN
        Allocate(f,SIZE(FileRecord));
        IF f<>NIL THEN
          f^.ref:=ioRefNum; f^.volRef:=volRef; ModStr(s,f^.name);
          f^.bp:=0; f^.bb:=0; f^.output:=output; f^.eof:=FALSE;
        END;
      END;
    END;
  END;
  IF errCode#0 THEN Done:=FALSE; f:=NIL END;
END Open;

(* Close           Close file f
---------------------------------------------------------------------- *)
PROCEDURE Close(VAR f:File);
VAR par:ParamBlockRec;
BEGIN
  IF f=NIL THEN RETURN END; (*con cannot be closed*)
  par.ioRefNum:=f^.ref;
  IF f^.output THEN
    par.ioBuffer:=ADR(f^.buffer);
    par.ioReqCount:=f^.bp; par.ioPosMode:=0; par.ioPosOffset:=0;
    errCode:=FS(PBWrite,par)
  END;
  errCode:=FS(PBClose,par); Done:=errCode=0;
  Deallocate(f); f:=NIL;
END Close;

(* Read            Read a character from file f
---------------------------------------------------------------------- *)
PROCEDURE Read(f:File; VAR ch:CHAR);
VAR par:ParamBlockRec;
BEGIN
  IF f=NIL THEN (*con*) Terminal.Read(ch);
  ELSE
    WITH f^ DO
      IF bp>=bb THEN
        par.ioRefNum:=ref;
        par.ioBuffer:=ADR(buffer);
        par.ioReqCount:=buffersize; par.ioPosMode:=0;
        par.ioPosOffset:=0;
        errCode:=FS(PBRead,par);
        IF errCode=EOFErr THEN errCode:=0 END;
        bb:=SHORT(par.ioActCount);
        IF bb=0 THEN
          buffer[0]:=EF; eof:=TRUE; Done:=FALSE; errCode:=EOFErr
        END;
        bp:=0;
      END;
      ch:=buffer[bp];
      IF NOT eof THEN INC(bp) END
    END
  END;
END Read;
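Read refills the whole buffer only when bp has caught up with bb, and once no bytes remain it keeps returning the end-of-file character EF (4C) without advancing. The Python sketch below imitates that behaviour; it assumes an ordinary Python text stream in place of the Macintosh File Manager call PBRead, and the class name BufferedReader is hypothetical.

```python
import io

EF = "\x04"  # end-of-file character, EF = 4C in FileIO.DEF


class BufferedReader:
    """Sketch of FileIO.Read: refill a fixed-size buffer when it is exhausted
    and return EF (without advancing) once no bytes remain."""

    def __init__(self, stream, buffersize=16 * 1024):
        self.stream = stream
        self.buffersize = buffersize
        self.buffer = ""
        self.bp = 0          # index of next character in buffer
        self.eof = False

    def read(self):
        if self.bp >= len(self.buffer):          # buffer exhausted: refill
            self.buffer = self.stream.read(self.buffersize)
            self.bp = 0
            if not self.buffer:                  # nothing left to read
                self.buffer = EF
                self.eof = True
        ch = self.buffer[self.bp]
        if not self.eof:
            self.bp += 1
        return ch
```

With a buffer size of 2, reading "abc" five times yields a, b, c and then EF twice; the stream is consulted only when the buffer runs empty.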

(* ReadCard        Read a CARDINAL-constant from file f
---------------------------------------------------------------------- *)
PROCEDURE ReadCard(f:File; VAR val:CARDINAL);
VAR ch:CHAR; l:INTEGER;
BEGIN
  IF f=NIL THEN (*input from terminal*)
    l:=0; val:=0;
    REPEAT Terminal.Read(ch) UNTIL ch<>" ";
    WHILE ch>" " DO
      IF ch=DEL THEN
        IF l>0 THEN
          Terminal.Write(ch); DEC(l); val:=val DIV 10;
        END;
      ELSIF (ch>="0") AND (ch<="9") AND
            ((val<6553) OR ((val=6553) AND (ch<="5"))) THEN
        Terminal.Write(ch); INC(l);
        val:=10*val+VAL(CARDINAL,ORD(ch)-ORD("0"));
      END;
      Terminal.Read(ch);
    END;
    Done:=l>0;
  ELSE (*input from file*)
    val:=0; Done:=TRUE;
    REPEAT Read(f,ch) UNTIL ch<>" ";
    WHILE ch>" " DO
      IF Done AND (ch>="0") AND (ch<="9") AND
         ((val<6553) OR ((val=6553) AND (ch<="5"))) THEN
        val:=10*val+VAL(CARDINAL,ORD(ch)-ORD("0"));
      ELSE Done:=FALSE; val:=0;
      END;
      Read(f,ch);
    END;
  END;
  termCH:=ch;
END ReadCard;
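The test (val<6553) OR ((val=6553) AND (ch<="5")) guarantees that 10*val+digit never exceeds 65535, the largest 16-bit CARDINAL, so overflow is detected before it happens. A Python sketch of the file branch of ReadCard under that 16-bit assumption; read_card is a hypothetical name, it parses a string instead of a File, and it returns an empty string as termination character at the end of the text (where the listing would see EF).

```python
def read_card(text):
    """Parse an unsigned 16-bit number as ReadCard does for files:
    skip leading blanks, read up to the next character <= ' ', and reject
    non-digits or values above 65535. Returns (val, done, termch)."""
    i = 0
    while i < len(text) and text[i] == " ":
        i += 1
    val, done = 0, True
    while i < len(text) and text[i] > " ":
        ch = text[i]
        # 10*val+digit stays <= 65535 exactly when this condition holds
        if done and "0" <= ch <= "9" and (val < 6553 or (val == 6553 and ch <= "5")):
            val = 10 * val + (ord(ch) - ord("0"))
        else:
            done, val = False, 0
        i += 1
    termch = text[i] if i < len(text) else ""
    return val, done, termch
```

read_card("  65535 ") returns (65535, True, " "), whereas "65536" is rejected when its last digit is examined.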

(* ReadInt         Read an INTEGER-constant from file f
---------------------------------------------------------------------- *)
PROCEDURE ReadInt(f:File; VAR val:INTEGER);
VAR ch:   CHAR;
    sign: INTEGER;
    x:    CARDINAL;
    s:    ARRAY[1..80] OF CHAR;
    i:    INTEGER;
BEGIN
  ReadString(f,s);
  x:=0; val:=0; i:=1;
  IF s[i]="-" THEN sign:=-1; INC(i); ELSE sign:=1; END;
  ch:=s[i];
  LOOP
    IF ch=0C THEN Done:=TRUE; EXIT; END;
    IF (ch<"0") OR (ch>"9") THEN Done:=FALSE; EXIT; END;
    IF (x>3276) OR ((x=3276) AND (ch>"8")) THEN Done:=FALSE; EXIT; END;
    x:=10*x+VAL(CARDINAL,ORD(ch)-ORD("0"));
    INC(i); ch:=s[i];
  END;
  IF Done THEN
    IF x<=32767 THEN val:=sign*VAL(INTEGER,x);
    ELSIF sign=-1 THEN val:=-32767; DEC(val);
    ELSE Done:=FALSE;
    END;
  END;
END ReadInt;
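ReadInt accumulates the magnitude in a CARDINAL that may reach 32768, one more than the largest 16-bit INTEGER, so that -32768 can be read; the value is then produced as -32767 followed by DEC, because the literal 32768 cannot be negated within INTEGER range. A Python sketch of the conversion step under that 16-bit assumption (read_int is a hypothetical name and takes the token that ReadString would deliver):

```python
def read_int(token):
    """Convert a token to a 16-bit INTEGER the way ReadInt does.

    Accepts an optional leading '-', then decimal digits. The accumulator is
    never allowed to exceed 32768, and 32768 itself is only accepted for a
    negative number. Returns (value, done).
    """
    i, sign = 0, 1
    if token[:1] == "-":
        sign, i = -1, 1
    x = 0
    for ch in token[i:]:
        if not "0" <= ch <= "9":
            return 0, False
        if x > 3276 or (x == 3276 and ch > "8"):
            return 0, False          # would exceed 32768
        x = 10 * x + (ord(ch) - ord("0"))
    if x <= 32767:
        return sign * x, True
    if sign == -1:
        return -32767 - 1, True      # the listing uses val:=-32767; DEC(val)
    return 0, False
```

Thus "-32768" is accepted while "32768" sets Done to FALSE.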

(* ReadString      Read a string of characters from file f
---------------------------------------------------------------------- *)
PROCEDURE ReadString(f:File; VAR s:ARRAY OF CHAR);
VAR i:INTEGER; ch:CHAR;
BEGIN
  IF f=NIL THEN (*con*)
    REPEAT Terminal.Read(ch); UNTIL ch<>" ";
    i:=-1;
    WHILE ch>" " DO
      IF ch=DEL THEN
        IF i>=0 THEN Terminal.Write(10C); DEC(i); END;
      ELSIF i<HIGH(s) THEN
        Terminal.Write(ch); INC(i); s[i]:=ch;
      END;
      Terminal.Read(ch);
    END;
  ELSE
    REPEAT Read(f,ch); UNTIL ch<>" ";
    i:=-1;
    WHILE ch>" " DO
      IF i<HIGH(s) THEN INC(i); s[i]:=ch; END;
      Read(f,ch);
    END;
  END;
  termCH:=ch;
  INC(i);
  IF i<=HIGH(s) THEN s[i]:=0C; END;
END ReadString;

(* ReadWord        Read a word from file f without conversion
---------------------------------------------------------------------- *)
PROCEDURE ReadWord(f:File; VAR w:CARDINAL);
VAR i,j: CHAR;
BEGIN
  Read(f,i); Read(f,j);
  w:=256*ORD(i)+ORD(j);
END ReadWord;

(* Write           Write a character to list file
---------------------------------------------------------------------- *)
PROCEDURE Write(f:File; ch:CHAR);
VAR par:ParamBlockRec; status:INTEGER;
BEGIN
  IF f=NIL THEN (*con*) Terminal.Write(ch);
  ELSE
    WITH f^ DO
      IF bp>=buffersize THEN
        par.ioRefNum:=ref; par.ioBuffer:=ADR(buffer);
        par.ioReqCount:=buffersize; par.ioPosMode:=0;
        par.ioPosOffset:=0;
        status:=FS(PBWrite,par);
        bp:=0
      END;
      buffer[bp]:=ch; INC(bp)
    END
  END;
END Write;

(* WriteCard       Write a cardinal to list file
---------------------------------------------------------------------- *)
PROCEDURE WriteCard(f:File; nr:CARDINAL; w:INTEGER);
VAR l,d: INTEGER;
    t:   ARRAY[1..5] OF CHAR;
BEGIN
  l:=0;
  REPEAT
    d:=nr MOD 10; nr:=nr DIV 10;
    INC(l); t[l]:=CHR(ORD("0")+d);
  UNTIL nr=0;
  WHILE w>l DO Write(f," "); DEC(w); END;
  WHILE l>0 DO Write(f,t[l]); DEC(l); END;
END WriteCard;

(* WriteHex     Write length bytes from a word array in hex
---------------------------------------------------------------------- *)
PROCEDURE WriteHex(f:File; s:ARRAY OF WORD; length:INTEGER);
  VAR i,j: INTEGER; w:CARDINAL;

  PROCEDURE WriteHexDigit(b: INTEGER);
  BEGIN
    IF b<10
    THEN Write(f,CHR(b+ORD("0")));
    ELSE Write(f,CHR(b-10+ORD("A")));
    END;
  END WriteHexDigit;

BEGIN (*WriteHex*)
  j:=0;
  FOR i:=1 TO length DO
    IF ODD(i)
    THEN w:=VAL(CARDINAL,s[j]) DIV 256;
    ELSE w:=VAL(CARDINAL,s[j]) MOD 256; INC(j);
    END;
    WriteHexDigit(w DIV 16);
    WriteHexDigit(w MOD 16);
  END;
END WriteHex;

(* WriteInt     Write an INTEGER-value to file f
---------------------------------------------------------------------- *)
PROCEDURE WriteInt(f:File; i:INTEGER; w:INTEGER);
  VAR l,d: INTEGER;
      x: CARDINAL;
      t: ARRAY[1..5] OF CHAR;
      sign: CHAR;
BEGIN
  IF i<0
  THEN sign:="-"; x:=VAL(CARDINAL,ABS(i+1)); INC(x);
  ELSE sign:=" "; x:=VAL(CARDINAL,ABS(i));
  END;
  l:=0;
  REPEAT
    d:=x MOD 10; x:=x DIV 10;
    INC(l); t[l]:=CHR(ORD("0")+d);
  UNTIL x=0;
  WHILE w>l+1 DO Write(f," "); DEC(w); END;
  IF (sign="-") OR (w>l) THEN Write(f,sign); END;
  WHILE l>0 DO Write(f,t[l]); DEC(l); END;
END WriteInt;

(* WriteLn      Skip to new line on list file
---------------------------------------------------------------------- *)
PROCEDURE WriteLn(f:File);
BEGIN
  IF f=NIL (*con*)
  THEN Terminal.WriteLn;
  ELSE Write(f,EOL);
  END;
END WriteLn;

(* WriteString  Write a string to list file
---------------------------------------------------------------------- *)
PROCEDURE WriteString(f:File; s:ARRAY OF CHAR);
  VAR i: INTEGER;
BEGIN
  i:=0;
  LOOP
    IF i>HIGH(s) THEN EXIT;
    ELSIF s[i]="$" THEN WriteLn(f);
    ELSIF s[i]=0C THEN EXIT;
    ELSE Write(f,s[i]);
    END;
    INC(i);
  END;
END WriteString;

(* WriteText    Write text to list file
---------------------------------------------------------------------- *)
PROCEDURE WriteText(f:File; t:ARRAY OF CHAR; l:INTEGER);
  VAR i: INTEGER;
BEGIN
  FOR i:=0 TO l-1 DO Write(f,t[i]); END;
END WriteText;

(* WriteWord    Write a word to file f without conversion
---------------------------------------------------------------------- *)
PROCEDURE WriteWord(f:File; w:CARDINAL);
BEGIN
  Write(f,CHR(w DIV 256));
  Write(f,CHR(w MOD 256));
END WriteWord;

BEGIN
  con:=NIL;
  dlgHook:=VAL(DialogHook,NIL);
  ftype[0]:="TEXT"; ftype[1]:="";
  filterHook:=VAL(FilterHook,NIL);
  errCode:=0;
END FileIO.

[Cross-reference list of FileIO.MOD: each identifier with the numbers of the listing lines in which it occurs. Identifiers: ABS, ADR, Allocate, b, bb, bp, buffer, buffersize, ch, Close, con, Create, creator, d, Deallocate, DEL, DialogHook, dlgHook, Done, drive, DupFNErr, EF, eof, EOFErr, EOL, errCode, f, fdCreator, fdType, File, FileIO, FileRecord, FilterHook, filterHook, fn, fName, FS, ftype, GetCatInfo, good, h, HFS, HIGH, i, ioActCount, ioBuffer, ioDirID, ioFDirIndex, ioFlFndrInfo, ioMisc, ioNamePtr, ioPermssn, ioPosMode, ioPosOffset, ioRefNum, ioReqCount, ioVersNum, ioVRefNum, j, l, length, MemTypes, ModStr, name, nr, ODD, Open, OS, OSType, output, par, ParamBlockRec, PasStr, PBClose, PBHCreate, PBHDelete, PBHOpen, PBRead, PBWrite, Point, ProcPtr, QuickDraw, Read, ReadCard, ReadInt, ReadString, ReadWord, ref, REG, reply, s, SetCatInfo, SETREG, SFget, SFGetFile, SFput, SFPutFile, SFReply, SFTypeList, SHORT, sign, status, status1, Str255, SYSTEM, System, termCH, Terminal, tlist, Toolbox, type, VAL, volRef, vRefNum, w, WORD, Write, WriteCard, WriteHex, WriteHexDigit, WriteInt, WriteLn, WriteString, WriteText, WriteWord, x]
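The two-byte format that WriteWord writes and ReadWord reads back (high byte first) can be illustrated with a small stand-alone sketch. It is not part of FileIO. The module WordDemo, the procedures PutWord and GetWord, and the buffer buf are hypothetical. A two-element array stands in for a file, 8-bit CHAR values (0..255) are assumed, and output goes through the standard InOut module described in Wirth [1982].

```modula2
MODULE WordDemo;
(* Sketch only: mirrors the arithmetic of WriteWord/ReadWord.
   buf is a hypothetical in-memory stand-in for a File.
   Assumes CHAR holds values 0..255 and the standard InOut module. *)
FROM InOut IMPORT WriteCard, WriteLn;

VAR
  buf: ARRAY [0..1] OF CHAR;  (* the two bytes as they would appear in the file *)
  r: CARDINAL;

PROCEDURE PutWord(w: CARDINAL);  (* same split as WriteWord *)
BEGIN
  buf[0] := CHR(w DIV 256);      (* high byte first *)
  buf[1] := CHR(w MOD 256)       (* then low byte *)
END PutWord;

PROCEDURE GetWord(VAR w: CARDINAL);  (* same join as ReadWord *)
BEGIN
  w := 256*ORD(buf[0]) + ORD(buf[1])
END GetWord;

BEGIN
  PutWord(1000);
  GetWord(r);
  WriteCard(ORD(buf[0]), 4);     (* 1000 DIV 256 = 3 *)
  WriteCard(ORD(buf[1]), 4);     (* 1000 MOD 256 = 232 *)
  WriteCard(r, 6);               (* 3*256 + 232 = 1000 *)
  WriteLn
END WordDemo.
```

The program prints 3, 232 and 1000. The value is split arithmetically with DIV and MOD rather than copied byte by byte from memory. As a result, the byte order in the file does not depend on the byte order of the processor that wrote it.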

System.DEF

(* System       System dependent module (from MacMETH [86])

   The module System is the heart of the Modula-2 system on the Macintosh.
   It contains the loader and procedures to supply missing instructions
   of the processor (REAL and LONGINT arithmetic). There are also
   procedures for calling and terminating programs and handling the heap.
*)

DEFINITION MODULE System;   (*H.Seiler, C.Vetterli, 22-Dec-85/26-Feb-86*)

FROM SYSTEM IMPORT ADDRESS;

TYPE
  Status = (normal, moduleNotFound, fileNotFound, illegalKey,
            readError, badSyntax, noMemory, alreadyLoaded,
            killed, tooManyPrograms, continue, noApplication);

PROCEDURE Allocate(VAR ptr:ADDRESS; size:LONGINT);
(* Tries to allocate a memory area of the given size on the heap. If the
   space is not available, ptr returns NIL; otherwise ptr returns the
   address of the reserved area *)

PROCEDURE Deallocate(VAR ptr:ADDRESS);
(* Releases the memory area given by address ptr. ptr returns NIL *)

PROCEDURE Terminate(status:Status);
(* Terminates the currently running process. status signals the
   cause of termination *)

END System.

Bibliography

Aho A.V., Johnson S.C. [1974] LR parsing, Computing Surveys 6, 2, 99-124

Aho A.V., Ullman J.D. [1972] The Theory of Parsing, Translation, and Compiling, Prentice Hall

Aho A.V., Ullman J.D. [1977] Principles of Compiler Design, Addison-Wesley

Bauer F.L., Eickel J. (eds) [1976] Compiler Construction. An Advanced Course, Springer-Verlag

Blaschek G., Pomberger G., Ritzinger F. [1985] Einführung in die Programmierung mit Modula-2, Springer-Verlag, to appear in English 1989

Engelfriet J., Filè G. [1981] Passes, Sweeps, and Visits, in: Lecture Notes in Computer Science 115, Springer-Verlag, 193-207

Feldman J.A., Gries D. [1968] Translator writing systems, CACM 11, 2, 77-113

Fischer C.N., LeBlanc R.J. [1988] Crafting a Compiler, The Benjamin/Cummings Publishing Company

Ganzinger H., Giegerich R. [1984] Attribute coupled grammars, SIGPLAN Notices 19, 6, 157-170

Gries D. [1971] Compiler Construction for Digital Computers, Wiley

Hartmann A.C. [1977] A Concurrent Pascal Compiler for Minicomputers, Springer-Verlag

Henderson P., Snowdon R. [1972] An experiment in structured programming, BIT 12, 38-53

Hopcroft J.E., Ullman J.D. [1979] Introduction to Automata Theory, Languages, and Computation, Addison-Wesley

Hughes J.W. [1979] A formalization and explication of the Michael Jackson method of program design, Software - Practice and Experience 9, 191-202

Inside Macintosh [1985] Volumes I-III, Addison-Wesley

Jackson M.A. [1975] Principles of Program Design, Academic Press

Johnson S.C. [1975] YACC - Yet Another Compiler-Compiler, Tech. Rep. No. 32, Bell Laboratories, July 1975

Kastens U., Hutt B., Zimmermann E. [1982] GAG: A Practical Compiler Generator, in: Lecture Notes in Computer Science 141, Springer-Verlag

Knuth D.E. [1965] On the translation of languages from left to right, Information and Control 8, 6, 607-639

Knuth D.E. [1968] Semantics of context-free languages, Mathematical Systems Theory 2, 127-145

Koskimies K. [1984] A specification language for one-pass semantic analysis, SIGPLAN Notices 19, 6, 179-189

Koskimies K., Räihä K.-J., Sarjakoski M. [1982] Compiler construction using attribute grammars, Proc. SIGPLAN 82 Symposium on Compiler Construction, June 1982, 153-159

Lewis P.M., Rosenkrantz D.J., Stearns R.E. [1976] Compiler Design Theory, Addison-Wesley

Lewis P.M., Stearns R.E. [1968] Syntax-directed transduction, Journal of the ACM 15, 3, 464-488

Meijer H., Nijholt A. [1982] YABBER - yet another bibliography: translator writing tools, SIGPLAN Notices 17, 10

Mössenböck H. [1986] Alex - a simple and efficient scanner generator, SIGPLAN Notices 21

Pomberger G. [1986] Software Engineering and Modula-2, Prentice Hall

Räihä K.-J. [1977] On Attribute Grammars and their Use in a Compiler Writing System, Report A-1977-4, Department of Computer Science, University of Helsinki

Räihä K.-J. [1980] Bibliography on attribute grammars, SIGPLAN Notices 15, 3

Räihä K.-J., et al. [1983] Revised Report on the Compiler Writing System HLP78, Report A-1983-1, Department of Computer Science, University of Helsinki

Rosen S. (ed.) [1967] Programming Systems and Languages, McGraw-Hill, New York

Rosenkrantz D.J., Stearns R.E. [1970] Properties of deterministic top-down grammars, Information and Control 17, 3, 226-256

Spenke M., Mühlenbein H., Mevenkamp M., et al. [1984] A language independent error recovery method for LL(1) parsers, Software - Practice and Experience 14, 11

Tienari M. [1980] On the Definition of an Attribute Grammar, in: Lecture Notes in Computer Science 94 (eds Goos G., Hartmanis J.), Springer-Verlag

Waite W.M., Goos G. [1984] Compiler Construction, Springer-Verlag

Watt D.A., Lehrmann Madsen O. [1983] Extended attribute grammars, The Computer Journal 26, 2, 142-153

Wirth N. [1982] Programming in Modula-2, Springer-Verlag

Wirth N. [1986] Compilerbau, B.G. Teubner, Stuttgart

Wirth N., Gutknecht J., Heiz W., et al. [1986] MacMETH - A Fast Modula-2 Language System for the Apple Macintosh, User Manual, ETH Zürich


A COMPILER GENERATOR
FOR MICROCOMPUTERS

Presents a practical approach to compiler construction, illustrating how to convert the theoretical principles of compiler writing into a working program. The book describes the compiler generator Coco, developed by the authors in Modula-2 to run on microcomputers. Features include:

- A detailed description of a compiler generator, including its source code.
- The application of the compiler generator to non-trivial problems.
- Emphasis on table-driven syntax analysis with automatic error recovery and semantic specification of compilers by means of attributed grammars.
- Illustration of the application of documentation methods to a large program.

P. Rechenberg is Professor of Computer Science at the University of Linz, Austria, and H. Mössenböck is Assistant Professor of Computer Science at the Federal Institute of Technology (ETH), Zurich, Switzerland.

Prentice Hall
ISBN 0-13-155060-8