Author: Mössenböck H., Rechenberg P.
Tags: programming computer science microcomputers software tools
ISBN: 0-13-155060-8
Year: 1989
Text
Digitized by the Internet Archive
in 2022 with funding from
Kahle/Austin Foundation
https://archive.org/details/compilergenerato0000rech
A COMPILER GENERATOR
FOR MICROCOMPUTERS
Limits of Liability and Disclaimer of Warranty
The authors and publishers of this book have used their best efforts in
preparing this book and the programs contained within it. These efforts
include the development, research and testing of the theories and programs
to determine their effectiveness. The authors and publishers make no
warranty of any kind, expressed or implied, with regard to these programs
or the documentation contained in this book. The authors and publishers
shall not be liable in any event for incidental and consequential damages
in connection with, or arising from, the furnishing, performance or use of
these programs.
A COMPILER GENERATOR
FOR MICROCOMPUTERS
Peter Rechenberg
University of Linz
Hanspeter Mössenböck
University of Zürich
Translated by John O’Meara
and the authors
First published in English 1989 by
Prentice Hall International (UK) Ltd,
66 Wood Lane End, Hemel Hempstead,
Hertfordshire, HP2 4RG
A division of
Simon & Schuster International Group
This book was originally published in German under the title
Ein Compiler-Generator für Mikrocomputer by Peter
Rechenberg and Hanspeter Mössenböck
© 1985 Carl Hanser Verlag, Munich and Vienna.
© 1989 Carl Hanser Verlag and
Prentice Hall International (UK) Ltd
All rights reserved. No part of this publication may be
reproduced, stored in a retrieval system, or transmitted, in
any form, or by any means, electronic, mechanical,
photocopying, recording or otherwise, without the prior
permission, in writing, from the publisher.
For permission within the United States of America contact
Prentice Hall Inc., Englewood Cliffs, NJ 07632.
Printed and bound in Great Britain by A. Wheaton & Co. Ltd, Exeter.
Library of Congress Cataloguing-in-Publication Data
Rechenberg, Peter.
  [Compiler-Generator für Mikrocomputer. English]
  A compiler generator for microcomputers / Peter Rechenberg, Hanspeter Mössenböck.
  Translation of: Ein Compiler-Generator für Mikrocomputer.
  Bibliography: p.
  Includes index.
  ISBN 0-13-155060-8 : $40.00
  1. Compilers (Computer programs) 2. Microcomputers--Programming.
  I. Mössenböck, Hanspeter, 1959- . II. Title.
QA76.76.C65R4313 1988
005.26--dc19                                    88-28926

British Library Cataloguing in Publication Data
Rechenberg, Peter
A compiler generator for microcomputers.
1. Computer systems. Programming languages. Compilers. Design & construction
I. Title II. Mössenböck, Hanspeter
III. Ein Compiler-Generator für Mikrocomputer. English
005.4'53
ISBN 0-13-155060-8
ISBN 0-13-155136-1 Pbk

1 2 3 4 5   92 91 90 89 88
Contents
Preface
Numbered definitions, algorithms, examples
Symbols
1 Introduction and survey
  1.1 Compilers and compiler compilers
  1.2 Static compiler structure
  1.3 Dynamic compiler structure
  1.4 The structure of the book
2 Syntax
  2.1 Basic concepts from formal language theory
  2.2 LL(1) grammars and syntax analysis
  2.3 The top-down graph
  2.4 The G-code
  2.5 Parsing with the G-code
  2.6 Error handling
3 Semantics
  3.1 Semantic actions
  3.2 Attributes
  3.3 Context conditions
  3.4 Attributed grammars
  3.5 L-Attributed grammars
  3.6 Implementation of the semantic interface
4 Various compiler compilers
  4.1 YACC - yet another compiler compiler
  4.2 HLP84 - Helsinki language processor
  4.3 GAG - generator based on attribute grammars
  4.4 MUG - modular compiler generator
  4.5 Coco - compiler compiler
  4.6 Summary
5 The compiler description language Cocol
  5.1 Lexical structure
  5.2 Cocol as a syntax description language
    5.2.1 Productions
    5.2.2 Declarations
  5.3 Cocol as a semantic description language
    5.3.1 Semantic actions
    5.3.2 Attributes
    5.3.3 Context conditions
    5.3.4 Semantic declarations
    5.3.5 Scope of semantic objects
6 The compiler compiler Coco
  6.1 Characteristics
  6.2 Components of the generated compiler
  6.3 Operation of the generated compiler
  6.4 Interfaces of the generated compiler
    6.4.1 Caller interface
    6.4.2 Input interface
    6.4.3 Output interface
    6.4.4 Syntax error interface
  6.5 Generation of multi-pass compilers
7 The implementation
  7.1 Survey
  7.2 Structure of the symbol list
    7.2.1 Symbol list representation
    7.2.2 Symbol list construction
  7.3 Structure of the top-down graph
    7.3.1 Top-down graph representation
    7.3.2 Top-down graph construction
    7.3.3 Insertion of eps-nodes
    7.3.4 Removal of redundant eps-nodes
  7.4 Collecting the symbol sets
    7.4.1 Deletable nonterminals
    7.4.2 Terminal start symbols of nonterminals
    7.4.3 Terminal successors of nonterminals
    7.4.4 eps-sets
    7.4.5 any-sets
  7.5 Grammar tests
    7.5.1 Completeness
    7.5.2 Reachability
    7.5.3 Noncircularity
    7.5.4 Termination
    7.5.5 LL(1) condition
  7.6 Generation of the parser tables
    7.6.1 Table format
    7.6.2 Generation of the G-code
    7.6.3 Generation of the remaining tables
  7.7 Generation of the syntax analyzer
  7.8 Generation of the semantic evaluator
    7.8.1 The invariant parts of the semantic evaluator
    7.8.2 Processing of the semantic declarations
    7.8.3 Processing of the semantic actions
    7.8.4 Attribute processing
8 Applications
  8.1 Applications in compiler construction
    8.1.1 Specification of a lexical analyzer
    8.1.2 Description of a lexical analyzer for Modula-2
    8.1.3 Semantic procedures for lexical analysis
  8.2 Applications in software engineering
    8.2.1 Attributed grammars as a software design method
    8.2.2 The telegram problem as an example
    8.2.3 Attributed grammars as documentation
    8.2.4 The Jackson method as a special case
  8.3 Results of a Coco run
    8.3.1 The generated syntax analyzer
    8.3.2 The generated semantic evaluator
    8.3.3 The generated parser tables
9 Experiences with Coco
  9.1 A basis for measurements
  9.2 Measurements on Coco
  9.3 Measurements on some generated compilers
  9.4 General experiences
Appendices
  A Definition of Adele
  B Modula-2 and Pascal
  C Syntax of Cocol
  D G-code
  E Intermodular cross-reference list
  F Program listings
Bibliography
Index
Preface
This book describes the structure of the compiler compiler Coco, which was
developed for microcomputers by the authors. It also deals with the techniques
used by Coco and those by which Coco was developed. Special attention is
given to the table driven top-down syntax analysis with automatic error
recovery and description of semantics using L-attributed grammars. Coco is
written in Modula-2 and generates compilers in Modula-2. It is hoped that this
will show how well Modula-2 is suited to the implementation and
documentation of large modular programs.
Compiler compilers, as we understand them, are not the field of a few
specialists in compiler construction, but rather are tools for managing various
tasks in software engineering, a fact which is not generally known. The
methodology of attributed grammars which lies at the foundation of compiler
compilers includes, for example, the Jackson method as a simple special case,
and can be applied where the program flow is primarily controlled by one
structured input data stream.
Thus this book has something to offer for a wide circle of readers:
1. It is a presentation of the principles of compiler construction as far as they concern the analysis part of compilers, especially LL(1) syntax analysis with attributed grammars. (Lexical analysis is covered only marginally.)
2. It is a detailed description of a compiler compiler.
3. It illustrates the application of a compiler compiler by numerous examples.
4. It illustrates the application of software documentation methods to a large program system, especially the method of stepwise refinement and the use of an algorithm description language.
5. It can be used to evaluate the suitability of Modula-2 for software engineering because it presents a large program in Modula-2 which exploits the special properties of modular programming.
We consider the primary circle of readers to be advanced computer science
students, theoretically and practically active computer scientists and software
engineers. We therefore presuppose the usual terminology, assume that the
reader is acquainted with the development of software and that he can read
Pascal, or even better Modula-2, or some similar language. Accordingly, we
have kept the discussion brief, but have also taken pains not to refer to special
knowledge cited elsewhere to make the book understandable in itself.
The focal point around which the entire book evolved is the complete
Modula-2 code of Coco in Appendix F. We consider the publishing of such a
large program system a gamble because we are not sure whether the reader
will be interested in the numerous details in it, and because we expose
ourselves to all sorts of criticism of our programming style and choice of
algorithms. But at the same time we hope that it is just this completeness
which makes the book valuable and distinct from others.
For information concerning the structure of the book the reader is referred
to Section 1.4.
The Austrian Foundation for the Advancement of Scientific Research
financially supported the development of the compiler compiler and thereby
rendered it possible, for which we wish to express our appreciation.
For the careful review of the manuscript and for helpful suggestions we
wish to thank our colleagues and friends Prof. G. Pomberger, Dr G. Blaschek
and F. Ritzinger; for proof reading the English translation we wish to thank D.
Raye; for the review of the examples in Chapter 4 we wish to thank Prof. H.
Ganzinger, Prof. U. Kastens, Dr K. Koskimies and Prof. R. Marty. The text
was produced by ourselves with the text processor WriteNow on a Macintosh
computer.
Linz
August, 1988
P. Rechenberg
H. Mössenböck
Numbered definitions, algorithms, examples
1.1 Definition Compiler
1.2 Definition Compiler compiler
1.3 Versions of Coco
1.4 Example Lexical analysis
1.5 Example Syntax tree
2.1 Definition Abbreviations for strings and sets of strings
2.2 Definition Grammar
2.3 Definition Derivation, sentential form, sentence, language
2.4 Example Derivation of all sentential forms of a language
2.5 Definition Left-canonical derivation
2.6 Definition Phrase
2.7 Definition Simple phrase, handle
2.8 Example Phrase, simple phrase, handle
2.9 Definition Recursive grammar
2.10 Example Arithmetic expressions
2.11 Definition Terminating symbol, derivable symbol
2.12 Definition Useless symbol
2.13 Definition Reduced grammar
2.14 Definition LL(k) grammar
2.15 Definition Terminal start symbols of a nonterminal
2.16 Definition Terminal start symbols of a string
2.17 LL(1) conditions for ε-free grammars
2.18 Example LL(1) conditions
2.19 Definition Terminal successors
2.20 LL(1) conditions for arbitrary grammars
2.21 Example LL(1) conditions
2.22 Example Dangling else
2.23 Definition Deletability
2.24 Algorithm Marking deletable symbols
2.25 Algorithm Calculation of the sets of terminal start symbols
2.26 Algorithm Calculation of successor sets
2.27 Algorithm LL(1) analysis (recursive)
2.28 Example Recursive LL(1) parsing
2.29 Algorithm LL(1) parsing (nonrecursive)
2.30 Example Nonrecursive LL(1) parsing
2.31 Definition Terminal start symbols of length k
2.32 Definition LL(k) grammar
2.33 LL(k) condition
2.34 Example LL(2) and LL(3) test
2.35 Example Basic structure of the top-down graph
2.36 Definition Complement symbol any
2.37 Example Equivalent top-down graphs
2.38 Definition Alternative chain
2.39 Example Alternative chains
2.40 Definition Match
2.41 Definition LL(1) conditions for top-down graphs
2.42 Definition G-code (incomplete)
2.43 Algorithm Parse (simplified)
2.44 Algorithm Parse (complete)
2.45 Example Error situation
2.46 Principle of error handling
2.47 Algorithm Error (basic structure)
2.48 Algorithm Triple
2.49 Algorithm Fill
2.50 Algorithm FillSucc
2.51 Algorithm Error (with heuristic enhancements)
3.1 Example Semantic actions
3.2 Example Semantic actions
3.3 Example Interpretation of arithmetic expressions
3.4 Example Interpretation of arithmetic expressions in EBNF
3.5 Example Inherited attributes
3.6 Example A context-sensitive language
3.7 Example Context condition
3.8 Example Context condition
3.9 Definition Attributed grammar
3.10 Example Variable declaration
3.11 Definition L-attributed grammar
3.12 Parser with semantic interface
3.13 Example Attribute passing
3.14 Definition G-code (remainder)
3.15 Principle of attribute saving for recursive symbols
4.1 Example Attributed grammar as input for YACC
4.2 Example Attributed grammar as input for HLP84
4.3 Example Attributed grammar as input for GAG
4.4 Example Attributed grammar as input for MUG
4.5 Example Attributed grammar as input for Coco
5.1 Example Cocol grammar for real constants
5.2 Example The use of eps
5.3 Example The use of any
5.4 Example How the compiler treats LL(1) conflicts
5.5 Example Terminal declarations
5.6 Example Pragma declarations
5.7 Example Nonterminal declarations
5.8 Example Semantic actions
5.9 Example Indication of data flow at parameters
5.10 Example Semantic macros
5.11 Example Semantic actions for pragmas
5.12 Example Attributes
5.13 Example Context conditions
5.14 Example Declarations of semantic objects
5.15 Example Stacking of semantic objects
6.1 Example Application of any
8.1 Example LL(1) conflicts in lexical structures
Symbols
aⁿ            The string of n identical symbols a
a⁺            The set {aⁿ : n ≥ 1}
a*            The set {aⁿ : n ≥ 0}
G             Grammar
O             Order (asymptotic time complexity)
S             Sentence symbol
V             Alphabet
V_T           Alphabet of terminals
V_N           Alphabet of nonterminals
V⁺            The set of all non-empty strings built from symbols of V
V*            The set of all strings built from symbols of V including the empty string
ε             The empty string
∅             The empty set
∈             Element of
∩             Intersection of two sets
∪             Union of two sets
→             Replacement symbol: 'is defined as'
|             Separates alternatives
[ ]           Option notation (encloses optional symbols and strings)
{ }           Set notation
{ }           Repetition notation
⇒             Direct derivation: 'produces directly'
⇒⁺            Derivation: 'produces'
⇒*            Derivation: 'produces or is equal to'
⇒ₗ            Left-canonical derivation
⇏*            'Does not produce and is not equal to'
↓, ↑, ↕       Input, output, transient parameters
α, β, γ, φ    Strings
σ             String to be analyzed
1 Introduction and survey
The older of the two authors distinctly remembers that he first heard the word
‘compiler compiler’ at the IFIP-Congress in Munich in 1962 in connection
with Atlas, the supercomputer of its time built by the English company Ferranti. It
was a dark, secretive term. Since compiler writing was still an art mastered by
only a few initiates, one could only touch one's cap humbly to people who
were involved in writing compilers which generated compilers. There was just
no way to understand them.
The two works which focused attention on compiler generating programs
and which eliminated much of the mystery from the concept were the anthology by Rosen [1967] and the survey article Translator Writing Systems by
Feldman and Gries [1968]. But it was the clear formulation of the two most
important classes of deterministic grammars, LR(k)-grammars by Knuth [1965] and
LL(k)-grammars by Lewis and Stearns [1968] that helped compiler generators achieve the actual breakthrough.
Today, the terms ‘compiler generator’, ‘compiler generating program’
and ‘compiler compiler’ are used synonymously and refer to a system which
in some way supports and partially automates the production of compilers.
In the first chapter we introduce the concepts of ‘compiler’ and ‘compiler
compiler’, survey the subtasks which a compiler must handle and discuss the
organization of the book. The reader who is acquainted with the terminology
of compiler construction, even only partially, can start immediately with Section 1.4.
1.1 Compilers and compiler compilers
With the exception of special cases, a program can be seen as the description
of a process (algorithm) which transforms input data into output data (Fig. 1.1):
Fig. 1.1 Program
If the input data themselves form a program, and the program P transforms
them into another language, P is called a compiler, the input data are called the
source program and the output data are called the target program (Fig. 1.2).
Fig. 1.2 Compiler
Here, the source language is almost always the higher-level, less machine-oriented language, and the target language is the lower-level, more machine-oriented language, often the machine language itself. Thus a compiler can be defined as in Waite and Goos [1984]:
1.1 Definition Compiler
A compiler is a program which transforms an algorithm from a language
acceptable to humans into a language acceptable to machines.
Because a compiler is a complex program which itself must be written in a
programming language, the question arose quite early as to whether, given an
abstract description of the source language and its transformation into a target
language, a compiler could be generated either completely or partially. A program CC which is to solve such a task reads the description of the source language S together with its transformation into a target language T as input data.
It transforms this description into a program C which, when it is later executed, transforms source programs written in S into the target language T. Thus
CC generates a compiler C, and is known as a compiler generator or compiler compiler (Fig. 1.3).
Fig. 1.3 Compiler compiler (the compiler compiler CC reads a compiler description written in a compiler description language, CDL, and produces a compiler written in a compiler implementation language, CIL)
This leads to the following definition.
1.2 Definition Compiler compiler
A compiler compiler is a program which generates a compiler, or major
parts thereof, from the complete or partial description of the compiler.
A compiler compiler and the compiler it generates can be represented as in Fig. 1.4.
Fig. 1.4 Compiler compiler and the generated compiler (a compiler description in CDL is input to the compiler compiler, which produces the compiler C; C then translates a source program in S into a target program)
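To make Definition 1.2 concrete, here is a deliberately tiny sketch in Python. It is not the book's method and is far simpler than Coco. The "compiler description" is a hypothetical table of binary-operator precedence levels. The generator `generate_compiler` emits Python source text for a recursive-descent translator from infix to postfix code, and `load_compiler` plays the role of compiling that generated text. All names are invented for illustration.

```python
def generate_compiler(levels):
    """Toy compiler compiler (hypothetical): from a description of binary
    operator levels, lowest precedence first, emit Python source text for a
    compiler that translates a list of infix tokens into postfix code."""
    src = []
    for i, ops in enumerate(levels):
        # Level_i = Level_{i+1} { op Level_{i+1} }   (left-associative)
        src += [
            f"def level{i}(tokens, pos, out):",
            f"    pos = level{i + 1}(tokens, pos, out)",
            f"    while pos < len(tokens) and tokens[pos] in {tuple(ops)!r}:",
            "        op = tokens[pos]",
            f"        pos = level{i + 1}(tokens, pos + 1, out)",
            "        out.append(op)",
            "    return pos",
        ]
    n = len(levels)
    src += [
        # Innermost level: an operand is a single name or number token.
        f"def level{n}(tokens, pos, out):",
        "    if pos >= len(tokens) or not tokens[pos].isalnum():",
        "        raise SyntaxError('operand expected')",
        "    out.append(tokens[pos])",
        "    return pos + 1",
        "def compile_expr(tokens):",
        "    out = []",
        "    if level0(tokens, 0, out) != len(tokens):",
        "        raise SyntaxError('unexpected symbol')",
        "    return out",
    ]
    return "\n".join(src)


def load_compiler(source):
    """Turn the generated source text into an executable compiler function."""
    namespace = {}
    exec(source, namespace)
    return namespace["compile_expr"]
```

For example, `load_compiler(generate_compiler([["+", "-"], ["*", "/"]]))` yields a function that maps `["3", "+", "base", "*", "factor"]` to `["3", "base", "factor", "*", "+"]`. Given a different description, the same generator produces a different compiler without any change to its own code.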
A compiler compiler and its compiler description language are very closely
related. For the user of a compiler compiler the compiler description language
is actually the only interesting feature because it determines whether the description of the compiler to be generated can be formulated and how conveniently this may be accomplished.
Compiler description languages have two primary tasks: (1) the description of the syntax of the source language of the compiler to be generated and
(2) the description of the transformation of the source program into the target
program. Because the meaning of the source program becomes visible in this transformation, the description of the transformation is also known as a semantic description.
There are basically two notations for syntax description: Backus-Naur
form (BNF) and Extended Backus-Naur form (EBNF). Both describe the
syntax as a grammar in the form of so-called productions. They constitute
well-understood formal systems and are based on the theory of formal
languages.
The technique of describing semantics is less consolidated. Aside from
ad hoc methods, attributed grammars in a wide variety of forms are usually
applied here.
The compiler compiler described in this book is named Coco (a not very
imaginative abbreviation of ‘compiler compiler’) and its compiler description
language is called Cocol (compiler compiler language). Cocol uses the EBNF
of Wirth [1982] for syntax description and a special form of attributed grammars, the so-called L-attributed grammars, for semantic description.
Coco was originally implemented in PLM80 and generated a compiler in
Pascal-86. The version described here is written in Modula-2 and generates
compilers in Modula-2. Table 1.3 shows the versions of Coco that are available for several popular compilers at the time of writing. They differ in the language of the generated compilers (Modula-2 or Pascal) and in the machines on which they run.
1.3 Versions of Coco
Computer            Modula-2                                   Pascal
Macintosh           Mac-METH                                   Turbo-Pascal
MS-DOS computers    Logitech V. 3.0, M2-SDS, Taylor-Modula     Turbo-Pascal V. 4.0
ATARI-ST            TDI-Modula
IBM/370             Modula/370
1.2 Static compiler structure
Like the translation of a sentence in a natural language Q into another natural
language Z, the transformation of a source program into a target program can
be roughly divided into two phases. First the sentence in Q must be ‘understood’ through grammatical analysis. With knowledge of its grammatical
structure and the aid of a dictionary it is then possible to construct the sentence
in Z with the same meaning. In a similar way, the translation of a program
consists of analysis and synthesis.
In the analysis phase the source program is decomposed into its constituent parts. Here one distinguishes:
1. lexical analysis, which transforms the input character stream into ‘symbols’ such as names, numbers and operators;
2. syntax analysis, which analyzes the grammatical structure of the program;
3. semantic analysis, which analyzes all the properties of the program which are not of a syntactic nature.
Analysis yields:
1. the determination of the correctness of the program;
2. the internal representation of the source program in a form which is particularly well adapted for synthesis (so-called intermediate language);
3. memory tables which are used for further processing of the intermediate language.
Fig. 1.5 Static compiler structure (compiler front end: source program → characters → lexical analysis → symbols → syntax analysis → syntax tree → semantic analysis → intermediate language; compiler back end: intermediate language → synthesis → target program)
In the synthesis phase the target program is generated from the program in the
intermediate language. Here one distinguishes:
1. optimization, which transforms the program in the intermediate language to improve the target program with respect to certain criteria;
2. code generation, which generates the target program from the optimized intermediate language.
This static, or logical, compiler structure is shown in Fig. 1.5.
The analysis sections are determined by the source language and the intermediate language; the synthesis sections are determined by the intermediate language and the target language. The analysis sections are known as the compiler front end; the synthesis sections are known as the compiler back end.
The compiler front end is independent of the target language; the compiler
back end is independent of the source language.
Compiler compilers primarily support the analysis phase, and therefore
this book deals only with the analysis phase.
Lexical analysis
Lexical analysis preprocesses the source program text in order to simplify the
tasks of the later phases. This preprocessing includes the following points:
1. Elimination of meaningless characters. Comments, empty lines and unnecessary spaces are eliminated.
2. Recognition of symbols. One or more characters in sequence which together constitute a symbol are recognized. Symbols are:
   (a) keywords such as IF, WHILE, END, etc.;
   (b) names for constants, types, variables, procedures, etc.;
   (c) literals (numerical constants) such as 3.14;
   (d) strings, usually enclosed in inverted commas, such as ‘This is a string’;
   (e) compound characters such as ':=', '<=', '..', etc.;
   (f) individual characters such as '(', '+', etc.
3. Arithmetization of symbols. Because numbers can be processed more easily than strings, keywords, names and strings are replaced by numbers, and literals are converted to the internal numerical representation of the machine. This process is known as arithmetization. Names are stored in a name list, strings in a string list, and literals, possibly, in a constant list.
1.4 Example Lexical analysis
The source statement
x := 3 + base * factor;
contains the names x, base and factor, the numerical value 3, the character combination ':=' and the individual characters '+', '*' and ';'. If
ident, becomessy, number, plussy, timessy and semicolonsy
are names for the arithmetized symbols, lexical analysis yields the sequence
of 8 symbols:
ident becomessy number plussy ident timessy ident semicolonsy
Some of these symbols are uniquely determined (e.g. plussy); others
such as ident and number refer to a class of symbols and must be made
unique by a semantic value (e.g. an index in the name list for names, the
converted numerical value for literals). If x, base and factor are stored
respectively in places 1, 2 and 3 in the name list, lexical analysis yields
the following symbols with their semantic values:
ident/1 becomessy number/3 plussy ident/2 timessy ident/3
semicolonsy
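The scanning steps of Example 1.4 can be sketched in Python. This is a minimal illustration, not the book's implementation. The regular-expression token table and the function name `scan` are assumptions made for the example. As in the example, names are arithmetized into 1-based indices of a name list and literals into integers. The symbols `becomessy`, `plussy`, `timessy` and `semicolonsy` are uniquely determined and carry no semantic value.

```python
import re

# Hypothetical token table for the symbols of Example 1.4. Alternatives are
# tried left to right; the compound symbol ':=' is recognized as one symbol.
TOKEN_SPEC = [
    ("skip", r"\s+"),                 # meaningless characters are dropped
    ("number", r"\d+"),
    ("ident", r"[A-Za-z][A-Za-z0-9]*"),
    ("becomessy", r":="),
    ("plussy", r"\+"),
    ("timessy", r"\*"),
    ("semicolonsy", r";"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_SPEC))


def scan(text):
    """Return the list of (symbol, semantic value) pairs and the name list.
    Names become 1-based name-list indices, literals become ints."""
    name_list, symbols, pos = [], [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at position {pos}")
        kind, lexeme, pos = m.lastgroup, m.group(), m.end()
        if kind == "skip":
            continue
        if kind == "ident":
            if lexeme not in name_list:
                name_list.append(lexeme)
            symbols.append((kind, name_list.index(lexeme) + 1))
        elif kind == "number":
            symbols.append((kind, int(lexeme)))
        else:
            symbols.append((kind, None))  # uniquely determined by the symbol itself
    return symbols, name_list
```

`scan("x := 3 + base * factor;")` returns the eight symbols ident/1 becomessy number/3 plussy ident/2 timessy ident/3 semicolonsy, together with the name list `["x", "base", "factor"]`.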
Lexical analysis is the simplest part of the compiler. However, it does take up
a large portion of the compilation time (typically 20 to 40%), which means that
efficiency is especially important.
A lexical analyzer written in Cocol is described in Section 8.1. But lexical
analyzers are not discussed anywhere else in the book and the reader is referred to the literature, for example Gries [1971] or Bauer [1976].
Syntax analysis
Syntax analysis decomposes the source program, which now consists of symbols, into its grammatical parts and represents its structure as a tree (called a
syntax tree) or as something equivalent to a tree.
Fig. 1.6 Syntax tree (Assignment: Variable v1, ':=', Expression 3 + v2 * v3)
1.5 Example Syntax tree
The source statement in Example 1.4 is an assignment. An assignment
consists of a variable, the assignment symbol, an expression and a closing semicolon. An expression consists of terms connected by addition
operators, and terms consist of factors connected by multiplication operators. This yields the syntax tree in Fig. 1.6.
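The structure described in Example 1.5 can be sketched as a hand-written recursive-descent parser in Python, one of the LL(1) techniques treated in Chapter 2. Coco itself generates table-driven parsers, so this is only an illustration. The EBNF productions in the comments, the class name `Parser` and the tuple encoding of tree nodes are all assumptions. Operators are used as inner nodes, so the tree is more compact than the one in Fig. 1.6. The input is the symbol/value sequence of Example 1.4.

```python
# Hypothetical EBNF for the statement of Example 1.5 (not the book's grammar):
#   Assignment = Variable ":=" Expression ";" .
#   Expression = Term { "+" Term } .
#   Term       = Factor { "*" Factor } .
#   Factor     = ident | number .
#   Variable   = ident .

class Parser:
    def __init__(self, symbols):
        self.symbols = list(symbols) + [("eof", None)]  # sentinel end marker
        self.pos = 0

    def peek(self):
        return self.symbols[self.pos][0]

    def expect(self, kind):
        """Consume a symbol of the given kind and return its semantic value."""
        sym, val = self.symbols[self.pos]
        if sym != kind:
            raise SyntaxError(f"{kind} expected, found {sym}")
        self.pos += 1
        return val

    def assignment(self):
        var = ("Variable", self.expect("ident"))
        self.expect("becomessy")
        expr = self.expression()
        self.expect("semicolonsy")
        return ("Assignment", var, expr)

    def expression(self):
        node = self.term()
        while self.peek() == "plussy":      # { "+" Term }
            self.expect("plussy")
            node = ("+", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "timessy":     # { "*" Factor }
            self.expect("timessy")
            node = ("*", node, self.factor())
        return node

    def factor(self):
        if self.peek() == "ident":
            return ("ident", self.expect("ident"))
        return ("number", self.expect("number"))  # raises if neither fits
```

Parsing the symbols of Example 1.4 yields `("Assignment", ("Variable", 1), ("+", ("number", 3), ("*", ("ident", 2), ("ident", 3))))`. Here the multiplication binds more tightly than the addition, as the grammar prescribes.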
Syntax analysis is much more difficult than lexical analysis. There are, however, methods for syntax analysis which are based on the grammar of the
source language. Knowledge of these methods makes syntax analysis a routine task.
Semantic analysis
Semantic analysis examines the properties of the source program which cannot
be represented grammatically, in particular:
1. the scope of names;
2. the correspondence between declarations and uses of names;
3. the type compatibility of operands in expressions and statements.
Semantic analysis and syntax analysis can be performed together, in which
case the two phases merge; or they can be performed separately, in which case
the syntax tree, the result of the syntax analysis, is augmented with semantic
information.
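The three checks listed above can be sketched with a nested symbol table in Python. This is an illustration under simplifying assumptions, not the book's technique. The class `Scope`, the function `check_assignment` and the type names `INTEGER` and `BOOLEAN` are hypothetical. Operators are ignored, and type compatibility is reduced to strict equality, whereas real languages have more elaborate rules.

```python
class Scope:
    """Hypothetical symbol table: each scope maps names to types and
    refers to its enclosing (outer) scope."""

    def __init__(self, outer=None):
        self.outer = outer
        self.names = {}

    def declare(self, name, typ):
        if name in self.names:
            raise NameError(f"{name} declared twice in the same scope")
        self.names[name] = typ

    def lookup(self, name):
        # Search from the innermost scope outwards (scope of names).
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.outer
        raise NameError(f"{name} not declared")


def check_assignment(scope, target, operands):
    """Check target := op1 + op2 + ...: every name must be declared
    (declaration/use correspondence), and each operand type must equal the
    target type (a strict, assumed notion of type compatibility)."""
    target_type = scope.lookup(target)
    for op in operands:
        op_type = "INTEGER" if isinstance(op, int) else scope.lookup(op)
        if op_type != target_type:
            raise TypeError(f"{op} has type {op_type}, expected {target_type}")
    return target_type
```

With `x` and `base` declared as `INTEGER` in an outer scope and `factor` declared in an inner scope, the statement of Example 1.4 passes the check in the inner scope. In the outer scope it fails because `factor` is not visible there.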
1.3 Dynamic compiler structure
Dynamic, or time-dependent, compiler structure must be distinguished from
static, or logical, compiler structure. The individual logical divisions (lexical
analysis, syntax analysis, semantic analysis, optimization and code generation)
can be executed either sequentially or simultaneously, that is, interleaved in time. Each part of the compiler which reads the source program or
an intermediate program in its entirety is called a pass, and thus compilers are
classified as single-pass or multi-pass compilers.
Figure 1.7 shows both cases. For a single-pass compiler the syntax analyzer is the central, controlling program. It calls the lexical analyzer when it requires the next source symbol, and it calls the semantic analyzer when it wishes to pass on a syntactically correct construction. The semantic analyzer generates a section of intermediate code or the corresponding machine code (with or without optimization). For a multi-pass compiler each section is executed sequentially. The result of each section is an intermediate program which is written onto an external storage device and is read again by the next pass.
Single-pass compilers are generally much faster than multi-pass compilers
because they avoid accessing external storage devices to read and write
intermediate programs. Multi-pass compilers, on the other hand, require less
storage space because only one part of the compiler need ever reside in main
storage at once, and they are logically simpler because the various parts are not
intertwined. Some source languages cannot even be compiled by single-pass
compilers because they contain grammatical constructs whose translation
requires information which becomes available only from parts of the source
program that are processed later. This is the case, for example, when a variable can be used before it is declared.
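The difference in control flow between the two organizations can be sketched in Python. This is a toy, not Coco's design. The one-rule grammar `Sum = number { "+" number }`, the instruction names `LOAD` and `ADD` and all function names are hypothetical. In `single_pass` the syntax analyzer pulls each symbol from a generator only when it needs it and emits code immediately, so the phases are interleaved in time. In `multi_pass` a first pass writes the complete symbol sequence to a temporary file, standing in for external storage, and a second pass reads it back.

```python
import json
import os
import tempfile


def lexical_analyzer(text):
    """Yield symbols one at a time, only when the caller asks for them."""
    for word in text.split():
        if word.isdigit():
            yield ("number", int(word))
        elif word == "+":
            yield ("plussy", None)
        else:
            raise SyntaxError(f"illegal symbol {word!r}")
    yield ("eof", None)


def syntax_analyzer(next_symbol, emit):
    """Sum = number { "+" number } .  Calls emit(op, arg) for each construct."""
    kind, value = next_symbol()
    if kind != "number":
        raise SyntaxError("number expected")
    emit("LOAD", value)
    kind, value = next_symbol()
    while kind == "plussy":
        kind, value = next_symbol()
        if kind != "number":
            raise SyntaxError("number expected")
        emit("ADD", value)
        kind, value = next_symbol()
    if kind != "eof":
        raise SyntaxError("'+' or end of input expected")


def single_pass(text):
    # The syntax analyzer is in control and pulls symbols on demand.
    code = []
    symbols = lexical_analyzer(text)
    syntax_analyzer(lambda: next(symbols), lambda op, arg: code.append(f"{op} {arg}"))
    return code


def multi_pass(text):
    # Pass 1: the whole symbol sequence is written to "external storage".
    with tempfile.NamedTemporaryFile("w", suffix=".sym", delete=False) as f:
        json.dump(list(lexical_analyzer(text)), f)
        path = f.name
    try:
        # Pass 2: the intermediate program is read back and analyzed.
        with open(path) as f:
            stored = iter(json.load(f))
        code = []
        syntax_analyzer(lambda: tuple(next(stored)), lambda op, arg: code.append(f"{op} {arg}"))
        return code
    finally:
        os.remove(path)
```

Both functions translate `"1 + 2 + 3"` into `["LOAD 1", "ADD 2", "ADD 3"]`. In the single-pass version, however, `LOAD 1` is emitted before the lexical analyzer has even read the first `+`.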
The advantages and disadvantages of single-pass and multi-pass compilers can be summarized as in Fig. 1.8.
Fig. 1.7 Single-pass and multi-pass compilers (components: lexical analyzer, syntax analyzer, semantic analyzer, optimization and code generation; the diagram distinguishes control flow from data flow, and in the multi-pass compiler the intermediate programs are passed through external memory)
                            Single-pass    Multi-pass
                            compiler       compiler
Speed                           +              -
Memory                          -              +
Logical complexity              -              +
Universal applicability         -              +
+ = favorable   - = unfavorable
Fig. 1.8 Properties of single-pass and multi-pass compilers
1.4 The structure of the book
This book consists of nine chapters and six appendices. The first three chapters cover the principles of compiler construction as far as they are required for
an understanding of Coco; occasionally rather more than the minimum is presented in order to provide a well-rounded picture. The fourth chapter provides
a glimpse into other compiler compilers, and the rest of the chapters present
Coco itself, its compiler description language, its implementation and its applications. The resulting outline is as follows:
Principles of compiler construction
  1. Introduction and survey
  2. Syntax
  3. Semantics
Various compiler compilers
  4. Various compiler compilers
The compiler compiler Coco
  5. The compiler description language Cocol
  6. The compiler compiler Coco
  7. The implementation
  8. Applications
  9. Experiences with Coco
The second chapter starts with those concepts from formal language theory
which are necessary for the remainder of the book. Then table-driven LL(1)
syntax analysis is covered; this determines the fundamental structure of this
compiler compiler, and at the same time is a simple and efficient method for
developing the syntactic section of compilers. Most importantly this chapter
contains a method for automatic error recovery which is independent of the
language to be analyzed.
In the third chapter the method this compiler compiler uses to describe the actual translation process, namely attributed grammars, is presented. The special case of L-attributed grammars is used here, and the translation process is described by attributes, context conditions and semantic actions.
The fourth chapter gives a survey of a few compiler generators described
in the literature, and thus also surveys the state of the art.
The fifth chapter is a definition of the compiler description language
Cocol.
The sixth chapter describes Coco from the viewpoint of the user: its
characteristics, how to use it and what the compilers it generates look like.
Along the way it is shown that Coco is also suitable for implementing multi-pass compilers. This chapter, together with the language description of Chapter 5, forms the ‘external’ description of Coco.
The seventh chapter, the longest, contains the details of the implementation of Coco. This chapter is also intended as a study in program documentation.
The eighth chapter presents three major examples of the use of Coco. The
first is a complete description of a lexical analyzer in Cocol. The second
illustrates Coco as a software engineering tool and the method of attributed
grammars as a software engineering method which encompasses the Jackson
method as a special case. The third presents the compiler sections generated
for a concrete input grammar.
Finally, the ninth chapter presents the authors' experiences with Coco.
The Appendices contain the algorithm description language Adele used
here, describe Modula-2 insofar as it differs from or supersedes Pascal,
present a complete listing of Coco in Modula-2 and a description of Coco in
Cocol, that is a self-description of Coco.
Systematic readers should read the book chapter by chapter. Readers who
wish to begin with lexical analysis should consult Section 8.1 as early as
Chapter 2. Readers who wish to know about Coco only (or firstly) from the
user's point of view can start immediately with Chapter 5 followed by Chapters 6 and 8, and perhaps Chapter 4.
Finally, readers who are already familiar with LL(1)-grammars and are
primarily interested in the implementation of Coco can acquaint themselves
first with Cocol in Chapter 5 and then concentrate on Chapters 6 and 7, although they will occasionally have to refer back to Chapters 2 and 3.
The following chapter sequences are therefore recommended (Chapters
which extend the material are in italics):
Novices and all-embracing readers: 2-9
Primarily interested in applying Coco: 5, 6, 8, 4
Primarily interested in comparing Coco: A088
Primarily interested in the implementation of Coco: 5, 6, 7, 8.3
Some remarks have been repeated so that the chapters do not become too
interdependent. We hope the all-embracing reader will forgive us for this.
In general the presentation is organized according to the principle of
stepwise refinement. This is true of the individual chapters as well as of the
book as a whole. Thus Chapters 2 and 3 are basically refinements of Section
1.2, Chapters 5 and 7 refinements of Chapters 2 and 3 and Appendix F, containing the text of Coco in Modula-2, is a refinement of Chapter 7.
For representing algorithms our algorithm design language Adele is
used. It is defined in Appendix A, but should be understandable without a
definition as it relies strongly on Modula-2 and Pascal. The authors use Adele
constantly in their daily work and view Adele as a method for algorithm description which is adequate in most cases.
Actual Modula-2 programs occur only in the appendices, but there are
also Modula-2 fragments in Chapters 5 and 7. The book is therefore accessible to readers who are not familiar with Modula-2. Nevertheless, Modula-2 plays a major role in this book because of its support for modular programming, and especially for data encapsulation.
One of the book's important aims is to document a large Modula-2 program
and to demonstrate in the process how well Modula-2 is suited to software engineering projects.
Definitions, algorithms and examples are numbered and indented. A list of all numbered items follows the table of contents to facilitate fast searching.
Syntax
In this chapter we deal with all syntax-related questions as far as they concern
compilers that use LL(1) syntax analysis. First, we will summarize the
terminology and some important results of formal language theory. Next, we
look at LL(1) grammars and their syntactical analysis. Since the flexibility
and efficiency of syntax analysis depend to a large degree on the representation of
the grammar in memory, we will describe the tree-like data structure
used in Coco, which is called a top-down graph. We will also describe an
optimized version of the top-down graph, called the G-code, which is especially suited for interpretation. At the end of the chapter we describe the G-code syntax analyzer and a method for automatic error handling.
Except for the G-code and its interpretation this chapter is not Coco-specific. Thus, it can be read as a general treatment of syntax issues in compiler
design. Bottom-up analysis and LR(k) grammars have been left out, since
they constitute a large and self-contained topic that does not apply to Coco.
Interested readers are referred to Knuth [1965], Aho and Johnson [1974],
Waite and Goos [1984], and Fischer and LeBlanc [1988].
2.1 Basic concepts from formal language theory
We assume that the reader is familiar with the elements of formal languages,
and we summarize here only the terms and definitions that we will use later
on. We primarily use the terminology from the books of Gries [1971] and
Aho and Ullman [1972].
Symbols and strings
Programs consist syntactically of sequences or strings of symbols which
belong to an alphabet or vocabulary. If a, b, c are the symbols that constitute the alphabet V, then we can write:
$V = \{a, b, c\}$
Symbols can be concatenated to form strings. For some strings and sets of
strings there are commonly used abbreviations:
2.1 Definition Abbreviations for strings and sets of strings
$a^n$ denotes the string consisting of n identical symbols a, e.g. $a^3 = aaa$.
$\varepsilon$ denotes the empty string, i.e. a string of zero symbols.
$a^+$ denotes the set $\{a^n : n \ge 1\}$, e.g. $a^+ = \{a, aa, aaa, aaaa, \dots\}$.
$a^*$ denotes the set $\{a^n : n \ge 0\}$, e.g. $a^* = \{\varepsilon, a, aa, aaa, \dots\}$. It is obvious that $a^* = a^+ \cup \{\varepsilon\}$.
$V^+$ denotes the set of all non-empty strings which can be formed from the symbols contained in V. For example, if $V = \{a, b, c\}$ then
$V^+ = \{a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, \dots\}$
$V^+$ is called the free semigroup over the alphabet V.
$V^*$ denotes the set of all strings, including the empty string, that can be formed from the symbols of V. For example, if $V = \{a, b, c\}$ then
$V^* = \{\varepsilon, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, \dots\}$
$V^*$ is called the free monoid over the alphabet V. It is obvious that $V^* = V^+ \cup \{\varepsilon\}$.
The set V is always finite, whereas the sets $a^+$, $a^*$, $V^+$, $V^*$ are always infinite.
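Although $V^+$ and $V^*$ are infinite, their elements up to any given length can be listed mechanically, in order of increasing length. The following Python sketch illustrates this; the function name `strings_up_to` is our own, and the empty string $\varepsilon$ is represented by `""`.

```python
from itertools import product

def strings_up_to(alphabet, max_len, include_empty):
    """List the strings over `alphabet` of length <= max_len.

    include_empty=False gives the finite part of V+,
    include_empty=True gives the finite part of V* (V+ plus the empty string).
    """
    shortest = 0 if include_empty else 1
    result = []
    for n in range(shortest, max_len + 1):
        # product(..., repeat=n) yields every sequence of n symbols; n = 0 yields ()
        for symbols in product(alphabet, repeat=n):
            result.append("".join(symbols))
    return result

V = ["a", "b", "c"]
print(strings_up_to(V, 2, include_empty=False))
# ['a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
print(strings_up_to(V, 1, include_empty=True))
# ['', 'a', 'b', 'c']
```

Since there are $|V|^n$ strings of length n, the listing grows exponentially with `max_len`.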
Grammar and language
In Section 1.2, we showed that the grammatical structure of an instruction, a program, or generally of a 'sentence' is a tree, the syntax tree. In the syntax tree, there are two types of symbols:
1. Terminals are the symbols of the sentence itself. They are the leaves of the tree and cannot be decomposed further.
2. Nonterminals are all other symbols.
In addition to the above, each tree contains a distinguished nonterminal, the
sentence symbol, or the root, from which the entire tree originates. The valid
structures of syntax trees and hence the sentences of a formal language are described by a grammar.
A context-free grammar or, simply, grammar — since we only use
context-free grammars — is a system of rules for producing strings over an
alphabet V.
2.2 Definition Grammar
A grammar G is a quadruple $G = (V_N, V_T, R, S)$ with the following components:
$V_N$: alphabet of nonterminals,
$V_T$: alphabet of terminals,
R: set of productions, also called syntax rules or derivation rules,
S: sentence symbol, a special symbol from $V_N$, the root of the syntax tree.
By $V = V_N \cup V_T$ we denote the union of the terminal and nonterminal alphabets.
A production is written as
$X \to \alpha$ where $X \in V_N$ and $\alpha \in V^*$
(read: 'X is defined as $\alpha$' or 'X can be replaced by $\alpha$'), which means that the nonterminal X can be replaced by the string $\alpha$ in each string that contains X.
Several productions may have the same left-hand side, such as:
$X \to \alpha_1$
$X \to \alpha_2$
$X \to \alpha_3$
They denote alternatives and can be grouped by use of the symbol '|':
$X \to \alpha_1 \mid \alpha_2 \mid \alpha_3$
(read: 'X is defined as $\alpha_1$ or $\alpha_2$ or $\alpha_3$').
The productions describe the replacement of nonterminals by strings. We
start from the sentence symbol S, and replace it by a string according to the
productions of the grammar. Then we repeatedly replace nonterminals in the
string by other strings until we reach a string that contains only terminals. S
itself and all strings that result from S by the application of the productions
are called sentential forms. The sentential forms that consist of terminals only
are called sentences.
We denote replacement by the replacement or derivation symbol $\Rightarrow$. If $\alpha$ and $\beta$ are two sentential forms and $\beta$ may be derived from $\alpha$ by the application of a production, we write:
$\alpha \Rightarrow \beta$
(read: '$\alpha$ produces $\beta$' or '$\beta$ is derived from $\alpha$').
These terms are formalized by Definition 2.3 and are illustrated by Example 2.4.
2.3 Definition Derivation, sentential form, sentence, language
We say that a string $\alpha$ directly produces a string $\beta$, written $\alpha \Rightarrow \beta$, if there exist strings $\omega_1$ and $\omega_2$ such that we can write $\alpha = \omega_1 A \omega_2$, $\beta = \omega_1 \varphi \omega_2$, and the production $A \to \varphi$ belongs to the grammar. We then call $\beta$ a direct derivation of $\alpha$. We describe a sequence of several derivations by the symbols $\Rightarrow^+$ and $\Rightarrow^*$. A string $\alpha$ produces a string $\beta$, written as
$\alpha \Rightarrow^+ \beta$
if there exists a sequence of direct derivations
$\alpha = \omega_0 \Rightarrow \omega_1 \Rightarrow \omega_2 \Rightarrow \dots \Rightarrow \omega_n = \beta$ where $n \ge 1$.
Such a sequence is called a derivation of length n. For the case of $\alpha \Rightarrow^+ \beta$ or $\alpha = \beta$, we write
$\alpha \Rightarrow^* \beta$
(read: '$\alpha$ produces or is equal to $\beta$'). If G is a grammar with sentence symbol S, then a string $\alpha$ is called a sentential form if
$S \Rightarrow^* \alpha$
A sentence is a sentential form that consists only of terminals, and a language L(G) is the set of all sentences that can be derived from the sentence symbol:
$L(G) = \{\alpha : S \Rightarrow^+ \alpha \ \wedge\ \alpha \in V_T^*\}$
2.4 Example Derivation of all sentential forms of a language
Consider the grammar
$G = (\{S, A, B\}, \{a, b, ;\}, R, S)$
with the nonterminals S, A, B, the terminals a, b, ;, and the set R of productions:
$S \to A;$
$A \to aB \mid BBb$
$B \to b \mid ab$
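From this, the following derivations of sentential forms can be produced:
$$
\begin{array}{llll}
S \Rightarrow A; & \Rightarrow aB; & \Rightarrow ab; & \\
 & & \Rightarrow aab; & \\
 & \Rightarrow BBb; & \Rightarrow bBb; & \Rightarrow bbb; \\
 & & & \Rightarrow babb; \\
 & & \Rightarrow abBb; & \Rightarrow abbb; \\
 & & & \Rightarrow ababb;
\end{array}
$$
The result is $L(G) = \{ab;\ aab;\ bbb;\ babb;\ abbb;\ ababb;\}$. Hence, the language L(G) consists of 6 sentences.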
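Because this grammar is not recursive, its language is finite and can be enumerated exhaustively. The following Python sketch does so under our own simplifying assumptions: every symbol is a single character, and the dictionary keys are the nonterminals. It repeatedly replaces the leftmost nonterminal of each sentential form by each of its alternatives, breadth-first, until only terminals remain.

```python
from collections import deque

# Grammar of Example 2.4; every key is a nonterminal, all other characters are terminals.
GRAMMAR = {
    "S": ["A;"],
    "A": ["aB", "BBb"],
    "B": ["b", "ab"],
}

def leftmost_nonterminal(grammar, form):
    """Index of the leftmost nonterminal in a sentential form, or -1 if there is none."""
    for i, sym in enumerate(form):
        if sym in grammar:
            return i
    return -1

def sentences(grammar, start, limit=10_000):
    """Collect the sentences of L(G) by breadth-first, left-canonical expansion.

    `limit` bounds the number of sentential forms examined, because a recursive
    grammar would otherwise produce new forms forever.
    """
    found = []
    queue = deque([start])
    examined = 0
    while queue and examined < limit:
        form = queue.popleft()
        examined += 1
        i = leftmost_nonterminal(grammar, form)
        if i < 0:                      # only terminals left: a sentence
            found.append(form)
            continue
        for alternative in grammar[form[i]]:
            queue.append(form[:i] + alternative + form[i + 1:])
    return found

print(sentences(GRAMMAR, "S"))
# ['ab;', 'aab;', 'bbb;', 'babb;', 'abbb;', 'ababb;']
```

This is the "search by exhaustion" idea in its simplest form; it is practical only for tiny or non-recursive grammars.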
A syntax tree can be assigned to each sentence. Figure 2.1 shows the syntax
tree of abbb; in two forms.
Fig. 2.1 Two forms of syntax tree for abbb;
In the top-down syntax analysis discussed later on, we will always use derivations in which the leftmost nonterminal is replaced. This kind of derivation
is called left-canonical:
2.5 Definition Left-canonical derivation
A direct derivation $\omega_1 A \omega_2 \Rightarrow \omega_1 \alpha \omega_2$ is called left-canonical, and written as
$\omega_1 A \omega_2 \overset{L}{\Rightarrow} \omega_1 \alpha \omega_2$
if $\omega_1 \in V_T^*$, that is if A is the leftmost nonterminal. A derivation is called left-canonical if all its direct derivations are left-canonical.
Sometimes it is useful to have a name for the string that is substituted for
a nonterminal during a derivation. This string is called a ‘phrase’.
2.6 Definition Phrase
When $\omega_1 \alpha \omega_2$ is a sentential form such that
$S \Rightarrow^* \omega_1 A \omega_2 \Rightarrow^* \omega_1 \alpha \omega_2$
then $\alpha$ is called a phrase, more specifically an A-phrase.
According to this definition, each sentential form is an S-phrase.
Because of their importance in bottom-up syntax analysis, which is not
covered in this book, we shall also define the terms simple phrase and
handle.
2.7 Definition Simple phrase, handle
If $\alpha$ is an A-phrase and a direct derivation of A, then
$S \Rightarrow^* \omega_1 A \omega_2 \Rightarrow \omega_1 \alpha \omega_2$
holds and $\alpha$ is also called a simple A-phrase. The leftmost simple phrase in a sentential form is called the handle of the sentential form.
2.8 Example Phrase, simple phrase, handle
Consider Example 2.4 and the derivation sequence
$S \Rightarrow A; \Rightarrow B_1 B_2 b_3; \Rightarrow B_1 b_2 b_3; \Rightarrow a b_1 b_2 b_3;$
where the different Bs and bs have been distinguished by an index. In the sentential form $a b_1 b_2 b_3;$
$a b_1$ is a simple B-phrase and the handle,
$b_2$ is a simple B-phrase,
$a b_1 b_2 b_3$ is an A-phrase,
$a b_1 b_2 b_3;$ is an S-phrase.
In the sentential form $B_1 b_2 b_3;$
$b_2$ is a simple B-phrase and the handle,
$B_1 b_2 b_3$ is an A-phrase,
$B_1 b_2 b_3;$ is an S-phrase.
In the sentential form $B_1 B_2 b_3;$
$B_1 B_2 b_3$ is a simple A-phrase and the handle,
$B_1 B_2 b_3;$ is an S-phrase.
In the sentential form $A;$
$A;$ is a simple S-phrase and the handle.
Recursive productions produce languages with an infinite number of sentences. The production $A \to a \mid Ab$ produces the set of sentences $ab^*$, the production $A \to a \mid bA$ produces the set of sentences $b^*a$, and the production $A \to a \mid (A)$ produces the set of sentences $\{(^n a)^n : n \ge 0\}$.
Recursion can also appear indirectly, which means it can span several
productions, as in the production pair
IN Sp oe
By.
B>
z | Au
The following definition is a consequence of this:
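2.9 Definition Recursive grammar
A grammar is called recursive if it permits derivations of the form $A \Rightarrow^+ \omega_1 A \omega_2$ with $A \in V_N$, $\omega_1 \in V^*$, $\omega_2 \in V^*$. More specifically, it is called
left-recursive if $A \Rightarrow^+ A \omega$,
right-recursive if $A \Rightarrow^+ \omega A$,
central-recursive or self-embedding if $A \Rightarrow^+ \omega_1 A \omega_2$ with non-empty $\omega_1$ and $\omega_2$.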
2.10 Example Arithmetic expressions
The grammar of arithmetic expressions with the sentence symbol E and the terminals v for variables and c for constants:
$E \to T \mid E + T \mid E - T$
$T \to F \mid T * F \mid T / F$
$F \to v \mid c \mid (E)$
is left-recursive in E and T, and central-recursive in E, T, and F.
The extended Backus-Naur form (EBNF)
Computer science uses various notations for grammar productions. The previously used notation has the following characteristics:
1. terminals are lower case
2. nonterminals are upper case
3. replacement symbol is $\to$
4. separation of alternatives is denoted by |
Indefinite repetition, which is a frequently occurring language element, must
be described by recursive productions, especially left-recursive productions.
This is unnatural in many cases, and left recursion is also unsuited for top-down
syntax analysis. Several grammatical notations have therefore evolved that
remove these and other deficiencies. Among these, the notation introduced by
Wirth [1982] for the description of Modula-2 is especially notable because of
its simplicity and clarity. Its characteristics are:
1. Terminals that represent themselves (literals) are in quotes
2. Other terminals and nonterminals have names that imply their meaning (this is customary but not mandatory)
3. Replacement symbol is =
4. Separation of alternatives is denoted by |
5. Productions are ended explicitly by a period
6. Option symbol: [A] means $A \mid \varepsilon$
7. Repetition symbol: {A} means $\varepsilon \mid A \mid AA \mid AAA \mid \dots$
8. Parentheses for enclosing (grouping) subexpressions
The grammar of the arithmetic expressions is as follows:
expression = term {("+" | "-") term}.
term       = factor {("*" | "/") factor}.
factor     = v | c | "(" expression ")".
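The braces make this form directly usable for top-down analysis: each nonterminal can become a procedure and each repetition {...} a loop, so no left recursion remains. A minimal Python recognizer for the grammar above might look as follows. The class name and the convention that the input is a string of one-character tokens ('v' for a name, 'c' for a number) are our assumptions for illustration.

```python
class Recognizer:
    """Top-down recognizer for
        expression = term {("+" | "-") term}.
        term       = factor {("*" | "/") factor}.
        factor     = v | c | "(" expression ")".
    Each token is one character; 'v' stands for any name, 'c' for any number.
    """

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"{token!r} expected at position {self.pos}")
        self.pos += 1

    def expression(self):
        self.term()
        while self.peek() in ("+", "-"):    # {("+" | "-") term} becomes a loop
            self.pos += 1
            self.term()

    def term(self):
        self.factor()
        while self.peek() in ("*", "/"):    # {("*" | "/") factor}
            self.pos += 1
            self.factor()

    def factor(self):
        if self.peek() in ("v", "c"):
            self.pos += 1
        elif self.peek() == "(":
            self.pos += 1
            self.expression()
            self.expect(")")
        else:
            raise SyntaxError(f"v, c or '(' expected at position {self.pos}")


def accepts(text):
    r = Recognizer(text)
    try:
        r.expression()
    except SyntaxError:
        return False
    return r.peek() is None                 # the whole input must have been consumed


print(accepts("v+v*c"), accepts("(v-c)/v"), accepts("v+*c"))
# True True False
```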
The form of the EBNF grammar itself can also be described by an EBNF
grammar:
EBNFGrammar = production {production}.
production  = symbol "=" expr ".".
expr        = term {"|" term}.
term        = factor {factor}.
factor      = ident | string | "(" expr ")" | "[" expr "]" | "{" expr "}".
ident is the terminal for names, string is the terminal for a character string
enclosed in quotes.
In this book, we will primarily use Wirth's EBNF notation. However,
where structural simplicity of the grammar is important, we will still use the
older notation of the formal languages.
Reduced grammars
In the grammar of a programming language, each nonterminal and each alternative should contribute to the generation of sentences. If this is the case, the
grammar is called reduced. In the development of a grammar, unnecessary
nonterminals and alternatives may creep in. Therefore, each newly developed
grammar should be checked to determine if it is reduced. If it is not, the unnecessary symbols and productions should be removed.
In order to contribute to the generation of sentences, each nonterminal
must meet the following two conditions: It must be 'terminating', that is, it
must produce a terminal string, and it must be ‘derivable’, that is, it must
appear in some sentential form.
2.11 Definition Terminating symbol, derivable symbol
A nonterminal A is called terminating if it produces a terminal string, that is
$A \Rightarrow^* \alpha$ with $\alpha \in V_T^*$.
A nonterminal A is called derivable if it appears in a sentential form, that is, if
$S \Rightarrow^* \omega_1 A \omega_2$.
A nonterminal that is not derivable or not terminating contributes nothing to
the generation of sentences and is therefore useless.
2.12 Definition Useless symbol
A nonterminal A is called useless if there is no derivation
$S \Rightarrow^* \omega_1 A \omega_2 \Rightarrow^* \omega_1 \alpha \omega_2$
where $\omega_1, \omega_2, \alpha \in V_T^*$.
2.13 Definition Reduced grammar
A grammar that contains no useless nonterminals is called reduced.
Algorithms for the detection of all useless symbols are simple (see Sections
7.5.2 and 7.5.4, or Hopcroft and Ullman [1979]). If one wants to delete
them, the order is important. First, the nonterminating symbols must be
found and all alternatives in which they appear must be deleted from the
grammar. Then, the nonderivable symbols and the alternatives in which they
appear must be found in the new grammar and deleted. Automatic deletion is
possible but not recommended since useless symbols often indicate errors in
the grammar.
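The order of the two deletion steps can be sketched in Python as follows. This is an independent illustration, not the algorithm of Sections 7.5.2 and 7.5.4. We assume that a grammar is a dictionary mapping each nonterminal to a list of alternatives, each alternative a string of one-character symbols, and that any symbol that is not a key is a terminal.

```python
def terminating(grammar):
    """Nonterminals that derive a terminal string (Definition 2.11), by fixpoint iteration."""
    found = set()
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            if nt in found:
                continue
            # nt terminates as soon as one alternative contains only terminals
            # and nonterminals already known to terminate
            if any(all(s not in grammar or s in found for s in alt) for alt in alternatives):
                found.add(nt)
                changed = True
    return found


def derivable(grammar, start):
    """Nonterminals that appear in some sentential form (Definition 2.11)."""
    seen = {start}
    stack = [start]
    while stack:
        for alt in grammar.get(stack.pop(), []):
            for s in alt:
                if s in grammar and s not in seen:
                    seen.add(s)
                    stack.append(s)
    return seen


def reduce_grammar(grammar, start):
    term = terminating(grammar)
    # Step 1: delete nonterminating symbols and every alternative in which they appear.
    step1 = {
        nt: [alt for alt in alts if all(s not in grammar or s in term for s in alt)]
        for nt, alts in grammar.items()
        if nt in term
    }
    # Step 2: in the new grammar, delete the symbols that are no longer derivable.
    reach = derivable(step1, start)
    return {nt: alts for nt, alts in step1.items() if nt in reach}


G = {"S": ["AB", "a"], "A": ["a"], "B": ["bB"]}   # B never terminates
print(reduce_grammar(G, "S"))
# {'S': ['a']}
```

Performing the steps in the opposite order on this grammar would keep A: while B and the alternative AB are still present, A is derivable, and once they are deleted, derivability is not checked again.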
Even after removing useless symbols, the grammar may still contain useless alternatives, which permit derivations of the form $A \Rightarrow^+ A$. Such a derivation is called a circular derivation, and the grammar is called circular or cyclic. Section 7.5.3 contains an algorithm for a circularity check of a grammar. The book by Hopcroft and Ullman [1979] contains an algorithm for the deletion of productions whose right-hand side consists of only a nonterminal, and thus for the removal of cycles.
In the following, we will cover only non-circular reduced grammars.
Grammatical levels
Programming languages contain constructs at various hierarchical levels. At the very
top are programs, which are composed of modules, procedures, declarations
and statements. Declarations and statements in turn are composed of expressions, keywords, names and numbers. Names and numbers are composed
of characters. It is somewhat arbitrary which of these constructs are defined
as terminals. If one only wants to show the nesting of procedures, then declarations and expressions can be regarded as terminals. If one wants to describe
the structure of expressions, then keywords, names, numbers, and operators
can be regarded as terminals. Only if one wants to descend further must
individual characters be seen as terminals.
In this way, the syntax of a programming language need not be completely described by one grammar, but may be partitioned into several grammatical
levels. The terminals of the higher level are the nonterminals of the lower
level.
In compiler design, usually two levels are used: the syntactical and the
lexical level. The syntactical level is the higher of the two; its sentence symbol
is the program. Its terminals are keywords, names, numbers, operators, etc.
Below this, nonterminals of the lexical level are keywords, names, numbers,
and special symbols. Its terminals are the individual characters of the source
text, insofar as they are meaningful for the grammar (comments, end-of-lines,
and meaningless blanks are not part of the grammar). Figure 2.2 shows
this relationship.
Fig. 2.2 Syntactic and lexical grammatical levels (syntactic level: program, procedure, statement, expression, name, number, keyword; lexical level: name, number, keyword, individual character)
In this book, we consider mainly the syntactical level. This results in a difficulty with the notation of terminals. From the syntactical level, the expression
art pe
consists of two names v, a number c, and the operators '+' and '*':
$v + v * c$
While the terminals '+' and '*' are simultaneously members of the syntactical
and the lexical level, the terminal v denotes all names, and the terminal c
denotes all numbers. In order to emphasize this fact, we call terminals of the
syntactical level that represent an entire class of symbols of the lexical level, a
terminal class. Thus, in the grammar of arithmetic expressions, v and c are
terminal classes, and +, -, *, /, (, ) are individual terminals.
It is to some extent arbitrary which terminals of the lexical level are also
considered as terminals of the syntactical level, and which are combined into
terminal classes. For instance, the operators *, /, and MOD from the lexical
level can be considered at the syntactical level as individual terminals, or they can be
combined at the lexical level into the terminal class mulop by the production
mulop = "*" | "/" | "MOD".
2.2 LL(1) grammars and syntax analysis
A grammar for a language can be used in two different ways: as a generative
grammar for the generation of sentences of the language, and as an analyzing
grammar for the decision whether a given string is a sentence of the language.
The generation of sentences is a trivial, straightforward, combinatorial
problem, and of no interest in the practical areas of computer science. However, the aspect of the generative grammar is important in theoretical computer
science and mathematics. In these fields grammars are classified according to
the expressive power and the structural characteristics of the languages they
generate.
The analysis, more precisely the recognition of sentences is, from a
mathematical point of view, also a trivial problem. All sentences of the grammar may simply be generated in ascending order by their length, and it is then
easily determined if the specified string is among the sentences (search by exhaustion). In reality, this is not feasible since the number of sentences generally grows exponentially with their length. Analysis methods are needed that
make use of all information in the grammar, and that perform the analysis of
the given string in a minimum of time and memory requirement. These
methods can be separated into two main categories: top-down methods start at
the top with the sentence symbol and move downward by repeated derivations
trying to find a sentence which matches the given terminal string; bottom-up
methods start at the given terminal string and move upward by repeated reductions of phrases until the sentence symbol is reached. In addition to these
two main approaches, there are analysis methods that mix the top-down and
bottom-up approach.
In this book, we will cover only top-down analysis.
In top-down analysis, we start from the sentence symbol and repeatedly
generate new sentential forms by left-canonical derivations, with the goal of
deriving a sentence matching the given string. If this is successful, the string
has been parsed. If it is not successful, and we have exhausted all possibilities
for the derivation of sentences that match the string, then it is clear that the
symbol string is not a sentence of the grammar.
The only difficulty with this approach is the selection of the correct alternative. Generally, there is not enough information available at the time when
the selection between several alternatives must be made to be reasonably sure
of choosing the correct one. Therefore, usually the alternatives must be tried
one after the other until the correct one is found. The alternatives that have
been tried unsuccessfully are dead ends from which one has to return by
backtracking. Fortunately, programming languages are structured in such a
way that the proper alternative can be determined with certainty by considering
only a part of the input string. These grammars are called deterministic. In
compiler construction, only deterministic grammars are used, and so we shall
cover only the top-down analysis of deterministic grammars.
Deterministic top-down parsing
The concept of deterministic top-down parsing consists in selecting the proper
alternative by looking at the start symbols of the string to be analyzed. In this
way parsing proceeds from left to right. Consider, for instance, the grammar
$S \to aA \mid bB$
$A \to x \mid aB$
$B \to y \mid bA$
and the input string $\sigma = bbay$. The grammar has the property that all of its
alternatives start with terminals, and also that the heads of the alternatives are
different in each rule. This property permits the dead-end-free determination of
the correct alternative by consulting the string $\sigma$. Assuming that the string is
read from left to right, the parsing proceeds as follows:
1. In the beginning, when a choice has to be made between $S \to aA$ and $S \to bB$, the first symbol of $\sigma$ is read, b is found, and therefore it is known that $S \to bB$ must be the correct alternative, since $S \to aA$ can never lead to a sentence starting with b.
2. If bB is further derived, one has the choice of replacing B with y or with bA. If the next symbol is read, a b is found, and so bA must be the correct alternative.
3. Continuing this procedure, the following derivations are generated:
$S \Rightarrow bB \Rightarrow bbA \Rightarrow bbaB \Rightarrow bbay$
resulting in the recognition of $\sigma$ as a sentence.
From the above derivation, the syntax tree of Fig. 2.3 follows.
Fig. 2.3 Syntax tree
This is the essence of deterministic top-down parsing: Starting with the sentence symbol, a sequence of left-canonical derivations is built, selecting the
correct alternatives by the inspection of the string to be parsed. The string is
read from left to right.
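For the example grammar above, this procedure is short enough to write out. The following Python sketch (function and variable names are ours) relies on exactly the property the grammar was chosen to have: every alternative begins with a terminal, and these first terminals differ within each production. Symbols are single characters, and the dictionary keys are the nonterminals. The sketch returns the left-canonical derivation as a list of sentential forms.

```python
# Grammar from the text: every alternative begins with a terminal, and within
# each production these first terminals are different.
GRAMMAR = {
    "S": ["aA", "bB"],
    "A": ["x", "aB"],
    "B": ["y", "bA"],
}

def parse(grammar, start, text):
    """Deterministic top-down parse with one symbol of lookahead.

    Returns the left-canonical derivation as a list of sentential forms;
    raises SyntaxError if `text` is not a sentence.
    """
    form = start
    derivation = [form]
    while True:
        # position of the leftmost nonterminal, or None if the form is all terminals
        i = next((k for k, sym in enumerate(form) if sym in grammar), None)
        if i is None:
            if form != text:
                raise SyntaxError(f"derived {form!r}, but the input is {text!r}")
            return derivation
        # the terminal prefix derived so far must agree with the input read so far
        if i >= len(text) or form[:i] != text[:i]:
            raise SyntaxError(f"input does not match the derivation at position {i}")
        lookahead = text[i]
        matching = [alt for alt in grammar[form[i]] if alt[0] == lookahead]
        if not matching:
            raise SyntaxError(f"no alternative of {form[i]} starts with {lookahead!r}")
        form = form[:i] + matching[0] + form[i + 1:]   # exactly one match by assumption
        derivation.append(form)

print(parse(GRAMMAR, "S", "bbay"))
# ['S', 'bB', 'bbA', 'bbaB', 'bbay']
```

No alternative is ever retried: each lookahead symbol selects at most one alternative, so a mismatch immediately means that the input is not a sentence.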
More than one symbol of the input string must be considered when several alternatives of a production start with the same symbol. This lookahead is
a characteristic of the LL(k) grammar:
2.14 Definition LL(k) grammar
A grammar is called LL(k) (deterministically recognizable from left to
right with left canonical derivations and a lookahead of k symbols) if its
sentences can be parsed by a top-down analysis from left to right in such
a way that in each situation where a choice must be made between several
alternatives, the correct alternative can always be found by considering
the next k symbols of the input string.
Since it is desirable to restrict the lookahead to one symbol, and since it turns
out that this restriction allows us to handle most practical cases, we will examine more closely only the LL(1) grammars. The main question is how to
determine if a given grammar is LL(1). We will answer this question first for
$\varepsilon$-free grammars (i.e. grammars without empty alternatives), and then for
grammars that do contain empty alternatives.
LL(1) Grammars without empty alternatives
Even a grammar whose alternatives begin with nonterminals may be parsable
without running into dead ends. Consider the grammar
$S \to Aa \mid Bb$
$A \to xz \mid yB$
$B \to uz \mid vA$
and the string $\sigma = uzb$. Even though none of the alternatives of the production
for S starts with u, it is obvious that only B can be derived into a string
starting with u, while all derivations of A start with x or y. The symbols x
and y are the 'terminal start symbols’ of A, and u and v are the terminal
start symbols of B. The concept of a set of terminal start symbols is central
for the description of the LL(1) property.
2.15 Definition Terminal start symbols of a nonterminal
The set first(A) of terminal start symbols of the nonterminal A is defined to be the set of all terminals with which a string derived from A can start:
$\mathit{first}(A) = \{x : A \Rightarrow^* x\omega,\ x \in V_T,\ \omega \in V^*\}$
For a nonterminal whose only production is $A \to \varepsilon$, $\mathit{first}(A) = \emptyset$ (the empty set).
This definition can be extended in a natural way to a string as argument:
2.16 Definition Terminal start symbols of a string
The set first($\alpha$) of the terminal start symbols of a string $\alpha$ is defined to be the set of all terminals with which $\alpha$ or a string derived from $\alpha$ can start:
$\mathit{first}(\alpha) = \{x : \alpha \Rightarrow^* x\omega,\ x \in V_T,\ \omega \in V^*\}$
As a special case we define $\mathit{first}(\varepsilon) = \emptyset$.
With the concept of terminal start symbols, it is easy to define the conditions
under which an $\varepsilon$-free grammar is LL(1):
2.17 LL(1) condition for $\varepsilon$-free grammars
An $\varepsilon$-free grammar is LL(1) if, for each of its productions, the sets of terminal start symbols of its alternatives are pairwise disjoint. That is, for each of its productions
$A \to \alpha_1 \mid \dots \mid \alpha_n$
the following holds:
$\mathit{first}(\alpha_i) \cap \mathit{first}(\alpha_j) = \emptyset \quad \text{for } 1 \le i \ne j \le n$
2.18 Example LL(1) conditions
The grammar
$S \to A;$
$A \to aB \mid BBb$
$B \to b \mid ab$
is not LL(1) since the following is true for the production $A \to aB \mid BBb$:
$\mathit{first}(aB) = \{a\}, \quad \mathit{first}(BBb) = \{a, b\}$
and hence
$\mathit{first}(aB) \cap \mathit{first}(BBb) = \{a\}$
The sets of terminal start symbols of both alternatives are not disjoint.
Both alternatives can start with an a. As a result, if a choice must be
made between alternatives, and a is the leftmost symbol of the input
string, the correct alternative cannot be found without a lookahead of
more than one symbol.
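The check made in this example can be mechanized. The following Python sketch computes first(A) for every nonterminal of an $\varepsilon$-free grammar by fixpoint iteration and reports every pair of alternatives whose start sets overlap. The representation is our own assumption: a dictionary from nonterminals to lists of alternatives, each alternative a tuple of symbols, with every symbol that is not a key treated as a terminal.

```python
def first_sets(grammar):
    """first(A) for every nonterminal A of an epsilon-free grammar (Definition 2.15)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                      # iterate until no set grows any more
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                head = alt[0]           # epsilon-free: every alternative has a first symbol
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def ll1_conflicts(grammar):
    """Pairs of alternatives violating condition 2.17, with their common start symbols."""
    first = first_sets(grammar)

    def first_of(alt):                  # without empty alternatives only the first symbol matters
        return first[alt[0]] if alt[0] in grammar else {alt[0]}

    conflicts = []
    for nt, alternatives in grammar.items():
        for i in range(len(alternatives)):
            for j in range(i + 1, len(alternatives)):
                common = first_of(alternatives[i]) & first_of(alternatives[j])
                if common:
                    conflicts.append((nt, alternatives[i], alternatives[j], common))
    return conflicts

# Grammar of Example 2.18
G = {
    "S": [("A", ";")],
    "A": [("a", "B"), ("B", "B", "b")],
    "B": [("b",), ("a", "b")],
}
print(ll1_conflicts(G))
# [('A', ('a', 'B'), ('B', 'B', 'b'), {'a'})]
```

Applied to the left-recursive production `{"A": [("a",), ("A", "b")]}`, the same function reports the overlap `{'a'}`.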
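No left-recursive grammar is LL(1), since for a production of the form
$A \to \alpha \mid A\beta$ we have $\mathit{first}(\alpha) \subseteq \mathit{first}(A\beta)$, so the start sets of the two alternatives overlap (provided $\mathit{first}(\alpha)$ is not empty).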
LL(1) Grammars with empty alternatives
For a grammar with empty alternatives, the LL(1) condition of the preceding
section no longer suffices. Consider, for instance, the grammar
$S \to aA; \mid bAc;$
$A \to c \mid \varepsilon$
and the input string $\sigma = bc;$. It is obvious that the production for S meets the
LL(1) condition 2.17, which is also true for the production for A because
$\mathit{first}(c) = \{c\}, \quad \mathit{first}(\varepsilon) = \emptyset$
and hence
$\mathit{first}(c) \cap \mathit{first}(\varepsilon) = \emptyset$
However, the grammar is not LL(1) since after the derivation
$S \Rightarrow bAc;$
it is impossible to determine with a lookahead of only one symbol whether
$A \to c$ or $A \to \varepsilon$ must be used for the next derivation. The choice of $A \to c$:
$S \Rightarrow bAc; \Rightarrow bcc;$
does not lead to $\sigma$. The choice of $A \to \varepsilon$ is the correct one. Therefore, the
grammar is not LL(1).
If we must choose one of the alternatives of a production
$A \to \alpha_1 \mid \alpha_2 \mid \dots \mid \alpha_n \mid \varepsilon$
and only the next symbol of $\sigma$ can be used, then the terminal start symbols of
$\alpha_1$ to $\alpha_n$ and the terminal successors of A must be pairwise disjoint, since
in the case of the production $A \to \varepsilon$, the terminal following A is the next one
in $\sigma$. The set of terminal successors is defined as follows:
2.19 Definition Terminal successors
The set follow(A) of the terminal successors of a nonterminal A is the
set of all terminals that can follow A in any sentential form:
$\mathit{follow}(A) = \{x : S \Rightarrow^* \omega_1 A x \omega_2,\ A \in V_N,\ x \in V_T,\ \omega_1, \omega_2 \in V^*\}$
This definition makes it possible to determine the conditions under which an
arbitrary grammar is LL(1):
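2.20 LL(1) conditions for arbitrary grammars
A grammar is LL(1) if (1) for each of its productions, the sets of all terminal start symbols of all alternatives are pairwise disjoint, and (2) for the nonterminals which can be derived into the empty string, all terminal successors of the nonterminal are disjoint from the terminal start symbols of each alternative. Formally: for each production
$A \to \alpha_1 \mid \dots \mid \alpha_n$
the following must hold:
$\mathit{first}(\alpha_i\ \mathit{follow}(A)) \cap \mathit{first}(\alpha_j\ \mathit{follow}(A)) = \emptyset \quad \text{for } 1 \le i \ne j \le n$
Note that in the formal representation the cases $\alpha_i \not\Rightarrow^* \varepsilon$ and $\alpha_i \Rightarrow^* \varepsilon$ are combined. For $\alpha_i \not\Rightarrow^* \varepsilon$ we have $\mathit{first}(\alpha_i\ \mathit{follow}(A)) = \mathit{first}(\alpha_i)$; for $\alpha_i \Rightarrow^* \varepsilon$ we have $\mathit{first}(\alpha_i\ \mathit{follow}(A)) = \mathit{first}(\alpha_i) \cup \mathit{follow}(A)$, which is simply follow(A) when $\alpha_i = \varepsilon$.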
2.21 Example LL(1) conditions
Consider the grammar of Knuth [1965]:
$S \to E;$
$E \to aAbE \mid bBaE \mid \varepsilon$
$A \to aAbA \mid \varepsilon$
$B \to bBaB \mid \varepsilon$
Is it LL(1)? Since $\varepsilon$ appears in the productions for E, A, and B, the terminal successors of E, A, and B are needed. From the grammar it can be easily seen that
$\mathit{follow}(E) = \{;\}, \quad \mathit{follow}(A) = \{b\}, \quad \mathit{follow}(B) = \{a\}$
The lookahead sets are:
for the alternatives of the E production
$\mathit{first}(aAbE\ \mathit{follow}(E)) = \{a\}$
$\mathit{first}(bBaE\ \mathit{follow}(E)) = \{b\}$
$\mathit{first}(\varepsilon\ \mathit{follow}(E)) = \{;\}$
for the alternatives of the A production
$\mathit{first}(aAbA\ \mathit{follow}(A)) = \{a\}$
$\mathit{first}(\varepsilon\ \mathit{follow}(A)) = \{b\}$
for the alternatives of the B production
$\mathit{first}(bBaB\ \mathit{follow}(B)) = \{b\}$
$\mathit{first}(\varepsilon\ \mathit{follow}(B)) = \{a\}$
Since the lookahead sets are pairwise disjoint for the alternatives of each
production, the grammar is LL(1).
The calculation of the successor sets is not always easy, as the following
example of an if statement with a dangling else clause shows.
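2.22 Example Dangling else
Consider the grammar
(1) $\mathit{program} \to \mathit{statement}\ \mathit{programrest}$
(2) $\mathit{programrest} \to \mathit{program} \mid \mathit{end}$
(3) $\mathit{statement} \to \mathit{assignment} \mid \mathit{ifstatement}$
(4) $\mathit{assignment} \to v := \mathit{expr}$
(5) $\mathit{ifstatement} \to \mathit{if}\ \mathit{expr}\ \mathit{thenpart}\ \mathit{elsepart}$
(6) $\mathit{thenpart} \to \mathit{then}\ \mathit{statement}$
(7) $\mathit{elsepart} \to \mathit{else}\ \mathit{statement} \mid \varepsilon$
with the sentence symbol program and the terminals end, v, :=, expr, if, then, else.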
Chap. 2
Syntax
30
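Is it LL(1)? There are three productions with alternatives: programrest, statement, and elsepart. The first two are LL(1) since
$$\mathrm{first}(\mathit{program}) \cap \mathrm{first}(\mathrm{end}) = \{v, \mathrm{if}\} \cap \{\mathrm{end}\} = \emptyset$$
$$\mathrm{first}(\mathit{assignment}) \cap \mathrm{first}(\mathit{ifstatement}) = \{v\} \cap \{\mathrm{if}\} = \emptyset$$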
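The calculation of follow(elsepart) is longer:
$$\begin{aligned}
\mathrm{follow}(\mathit{elsepart}) &= \mathrm{follow}(\mathit{ifstatement}) && \text{by production 5}\\
\mathrm{follow}(\mathit{ifstatement}) &= \mathrm{follow}(\mathit{statement}) && \text{by production 3}\\
\mathrm{follow}(\mathit{statement}) &= \mathrm{first}(\mathit{programrest}) && \text{by production 1}\\
&\phantom{=}\ \cup\ \mathrm{follow}(\mathit{thenpart}) && \text{by production 6}\\
&\phantom{=}\ \cup\ \mathrm{follow}(\mathit{elsepart}) && \text{by production 7}
\end{aligned}$$
with the result:
$$\mathrm{follow}(\mathit{elsepart}) = \mathrm{first}(\mathit{programrest}) \cup \mathrm{follow}(\mathit{thenpart}) \cup \mathrm{follow}(\mathit{elsepart}).$$
Since the last term on the right-hand side agrees with the left-hand side, it adds nothing to the set. In addition, since
$$\mathrm{first}(\mathit{programrest}) = \mathrm{first}(\mathit{program}) \cup \mathrm{first}(\mathrm{end}) = \{v, \mathrm{if}, \mathrm{end}\}$$
we have
$$\mathrm{follow}(\mathit{elsepart}) = \{v, \mathrm{if}, \mathrm{end}\} \cup \mathrm{follow}(\mathit{thenpart}).$$
Additionally,
$$\begin{aligned}
\mathrm{follow}(\mathit{thenpart}) &= \mathrm{first}(\mathit{elsepart}) \cup \mathrm{follow}(\mathit{ifstatement}) && \text{by production 5}\\
&= \{\mathrm{else}\} \cup \mathrm{follow}(\mathit{statement}) && \text{by productions 7 and 3}
\end{aligned}$$
Since follow(statement) = follow(ifstatement) = follow(elsepart), the recursive term again adds nothing, and hence
$$\mathrm{follow}(\mathit{elsepart}) = \{v, \mathrm{if}, \mathrm{end}\} \cup \{\mathrm{else}\} = \{v, \mathrm{if}, \mathrm{end}, \mathrm{else}\}.$$
Checking the LL(1) condition for production 7 results in:
$$\mathrm{first}(\mathrm{else}\ \mathit{statement}) \cap \mathrm{follow}(\mathit{elsepart}) = \{\mathrm{else}\} \ne \emptyset.$$
The grammar is therefore not LL(1).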
The fact that the grammar in this example is not LL(1) does not preclude it from being parsed deterministically with a lookahead of one symbol. When the syntax analyzer reaches the production elsepart and else is the next input symbol, it can always choose the first alternative. In spite of the ambiguity of the statement
if a then if b then c else d
the else then always belongs to the innermost then (as in PL/I and Pascal).
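The following Python sketch shows how a recursive-descent parser resolves the conflict in the way just described. It is a hypothetical, simplified version of the grammar of Example 2.22: expr and every statement other than an if statement are single tokens, and the tree shapes ("if", cond, then, else) and ("stmt", token) are invented for the illustration. The decisive point is that the parser takes the else alternative whenever else is the next symbol.

```python
def parse_statement(tokens, pos):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos).

    Simplified grammar (hypothetical):
        statement = "if" expr "then" statement ["else" statement] | simple
    where expr and simple are single tokens.
    """
    if tokens[pos] == "if":
        cond = tokens[pos + 1]
        if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
            raise SyntaxError(f"'then' expected at token {pos + 2}")
        then_part, pos = parse_statement(tokens, pos + 3)
        else_part = None
        # The LL(1) conflict is resolved greedily: whenever 'else' is the next
        # symbol, the first alternative of elsepart is chosen, so the else
        # attaches to the innermost if that is still open.
        if pos < len(tokens) and tokens[pos] == "else":
            else_part, pos = parse_statement(tokens, pos + 1)
        return ("if", cond, then_part, else_part), pos
    return ("stmt", tokens[pos]), pos + 1


tree, end = parse_statement("if a then if b then c else d".split(), 0)
print(tree)  # the else belongs to 'if b'
```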
LL(1) grammars and grammars of programming languages
The LL(1) conditions severely restrict the class of grammars that can be analyzed deterministically. Almost all programming language grammars violate the LL(1) conditions. Two facts are especially disturbing:
1. Left-recursive productions are not LL(1).
2. Alternatives that start with the same string are not LL(1).
However, it is almost always possible to transform non-LL(1) constructs into LL(1) constructs. This is greatly aided by the use of EBNF notation. With it, left-recursive productions can be described by the repetition symbol {}, and common beginnings of alternatives can be extracted by factorization. We have defined the LL(1) conditions only for grammars with simple BNF productions, so the question arises what they look like when an EBNF grammar is used. We defer the answer until the end of Section 2.3.
Computation of start and successor sets
For small grammars, the start and successor sets needed to check for the LL(1) property can be calculated by careful visual inspection. For larger grammars, however, an algorithm is needed. Since derivations of the form $A \Rightarrow^{+} \varepsilon$ play an important role, we first introduce the concept of 'deletability'.
2.23 Definition Deletability
A nonterminal A is called deletable if it produces the empty string:
$$A \Rightarrow^{*} \varepsilon$$
In this section we will write deletable symbols in square brackets: [A].
An algorithm for marking the deletable symbols of a grammar is trivial. It is based on the following assertions:
1. If A → ε is a production, then A is deletable.
2. If A → X1 ... Xn is a production and all Xi are deletable, then A is also deletable.
2.24 Algorithm Marking deletable symbols
MarkDeletableSymbols:
  Mark all nonterminals A for which A → ε exists;
  repeat
    -- Assert: all marked symbols are deletable
    Mark all nonterminals A for which A → X1...Xn exists
      and X1...Xn are all marked nonterminals
  until no new symbol was marked
end MarkDeletableSymbols
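Algorithm 2.24 translates almost line by line into a fixpoint loop. In this Python sketch (an illustration, not Coco's implementation), a grammar is assumed to be a dictionary mapping each nonterminal to a list of alternatives, each alternative being a list of symbols; [] is the empty alternative, and every symbol that is not a key is a terminal. The grammar of Example 2.22 is written in the same hypothetical representation.

```python
def mark_deletable(grammar):
    """Algorithm 2.24: return the set of deletable nonterminals.

    grammar maps each nonterminal to a list of alternatives; an alternative
    is a list of symbols and [] stands for epsilon. Symbols that are not
    keys of grammar are terminals and are never marked.
    """
    # Mark all nonterminals A for which A -> epsilon exists.
    deletable = {a for a, alts in grammar.items() if [] in alts}
    changed = True
    while changed:                       # repeat ... until no new symbol was marked
        changed = False
        for a, alts in grammar.items():
            if a in deletable:
                continue
            # Mark A if some alternative X1...Xn consists of marked symbols only.
            if any(all(x in deletable for x in alt) for alt in alts):
                deletable.add(a)
                changed = True
    return deletable


# Grammar of Example 2.22 in this representation.
DANGLING_ELSE = {
    "program": [["statement", "programrest"]],
    "programrest": [["program"], ["end"]],
    "statement": [["assignment"], ["ifstatement"]],
    "assignment": [["v", ":=", "expr"]],
    "ifstatement": [["if", "thenpart", "elsepart"]],
    "thenpart": [["expr", "then", "statement"]],
    "elsepart": [["else", "statement"], []],
}
print(mark_deletable(DANGLING_ELSE))
```

The second rule of the text is what makes the loop necessary: a nonterminal can become deletable only after all symbols of one of its alternatives have been marked in an earlier pass.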
Sets of terminal start symbols. Three cases must be distinguished for the calculation of the terminal start symbols of a string α:
1. the string is deletable;
2. its first nondeletable symbol is a terminal y;
3. its first nondeletable symbol is a nonterminal Y.
From this, computation rules (1) through (3) follow:
1. for $\alpha = [X_1] \ldots [X_k]$: $\mathrm{first}(\alpha) = \mathrm{first}(X_1) \cup \ldots \cup \mathrm{first}(X_k)$
2. for $\alpha = [X_1] \ldots [X_k]\, y\, \omega$: $\mathrm{first}(\alpha) = \mathrm{first}(X_1) \cup \ldots \cup \mathrm{first}(X_k) \cup \{y\}$
3. for $\alpha = [X_1] \ldots [X_k]\, Y \omega$: $\mathrm{first}(\alpha) = \mathrm{first}(X_1) \cup \ldots \cup \mathrm{first}(X_k) \cup \mathrm{first}(Y)$
The set of start symbols of a nonterminal is the union of the sets of start symbols of its alternatives:
4. for $A \rightarrow \alpha_1 \mid \ldots \mid \alpha_n$: $\mathrm{first}(A) = \mathrm{first}(\alpha_1) \cup \ldots \cup \mathrm{first}(\alpha_n)$
From these computation rules, the following algorithm is derived.
2.25 Algorithm Calculation of the sets of terminal start symbols
FindFirstSets(↓G ↑first):
  param G: a grammar with marked deletable symbols and n nonterminals;
        first: array(1:n) of set of terminal;
begin
  first(1:n) := ∅;                          -- start with empty sets
  repeat
    for all productions A → α1 | ... | αm do
      for all alternatives αi = [X1]...[Xt] Y ω with t >= 0, Y ω ∈ V* do
        first(A) := first(A) + first(X1) + ... + first(Xt);
        case of
          Y is terminal:      first(A) := first(A) + {Y}
        | Y is nonterminal:   first(A) := first(A) + first(Y)
        | Y ω is absent:      -- nothing
        end
      end
    end
  until no change in first
end FindFirstSets
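A Python sketch of Algorithm 2.25 under the same assumed grammar representation (nonterminal mapped to a list of alternatives, [] for ε). The set of deletable nonterminals is passed in, as it would be produced by Algorithm 2.24; the function name is hypothetical. The inner loop walks over [X1]...[Xt] and stops at the first nondeletable symbol Y, which covers rules (1) to (3).

```python
def find_first_sets(grammar, deletable):
    """Algorithm 2.25: terminal start symbols of every nonterminal.

    grammar: nonterminal -> list of alternatives (lists of symbols, [] = epsilon);
    deletable: set of deletable nonterminals, e.g. from Algorithm 2.24.
    """
    first = {a: set() for a in grammar}          # start with empty sets
    changed = True
    while changed:                               # repeat ... until no change in first
        changed = False
        for a, alts in grammar.items():
            for alt in alts:                     # alt = [X1]...[Xt] Y omega
                new = set()
                for y in alt:
                    if y not in grammar:         # Y is terminal: add {Y} and stop
                        new.add(y)
                        break
                    new |= first[y]              # nonterminal: add its first set
                    if y not in deletable:       # first nondeletable symbol reached
                        break
                # If the loop ran to the end, Y omega is absent: nothing else to add.
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


# Knuth's grammar of Example 2.21; E, A and B are deletable.
KNUTH = {
    "S": [["E", ";"]],
    "E": [["a", "A", "b", "E"], ["b", "B", "a", "E"], []],
    "A": [["a", "A", "b", "A"], []],
    "B": [["b", "B", "a", "B"], []],
}
print(find_first_sets(KNUTH, {"E", "A", "B"}))
```

Because E is deletable, first(S) also receives the terminal ; that follows E.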
Terminal successor sets. When calculating the terminal successors of a nonterminal A, there are again three cases to distinguish: in the right-hand side of a production in which A appears, either a terminal y, a nondeletable nonterminal Y, or nothing follows after any deletable symbols.
From this, computation rules (5) through (7) follow:
5. for $B \rightarrow \omega_1 A [X_1] \ldots [X_k]$: $\mathrm{first}(X_1) \cup \ldots \cup \mathrm{first}(X_k) \cup \mathrm{follow}(B) \subseteq \mathrm{follow}(A)$
6. for $B \rightarrow \omega_1 A [X_1] \ldots [X_k]\, y\, \omega_2$: $\mathrm{first}(X_1) \cup \ldots \cup \mathrm{first}(X_k) \cup \{y\} \subseteq \mathrm{follow}(A)$
7. for $B \rightarrow \omega_1 A [X_1] \ldots [X_k]\, Y \omega_2$: $\mathrm{first}(X_1) \cup \ldots \cup \mathrm{first}(X_k) \cup \mathrm{first}(Y) \subseteq \mathrm{follow}(A)$
If all occurrences of A on the right-hand sides of the productions are considered, the total set follow(A) is the union of all partial sets of follow(A) that result from (5) through (7). Therefore we have the following algorithm.
2.26 Algorithm Calculation of successor sets
FindFollowSets(↓G ↓first ↑follow):
  param G: a grammar with marked deletable symbols and n nonterminals;
        first: array(1:n) of set of terminal;
        follow: array(1:n) of set of terminal;
begin
  follow(1:n) := ∅;                         -- start with empty follow sets
  repeat
    for all nonterminals A do
      for all productions B → ω1 A [X1]...[Xt] Y ω2 with t >= 0 and Y ω2 ∈ V* do
        follow(A) := follow(A) + first(X1) + ... + first(Xt);
        case of
          Y is terminal:      follow(A) := follow(A) + {Y}
        | Y is nonterminal:   follow(A) := follow(A) + first(Y)
        | Y ω2 is absent:     follow(A) := follow(A) + follow(B)
        end
      end
    end
  until no change in follow
end FindFollowSets
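Algorithm 2.26 in the same style, as a hedged Python sketch: for every occurrence of a nonterminal A in an alternative of B, it adds the start sets of the following deletable symbols and then either {y}, first(Y), or follow(B), as in rules (5) to (7). To keep the block self-contained, the first sets and the deletable set for the grammar of Example 2.22 are worked out by hand rather than computed; the names are hypothetical.

```python
def find_follow_sets(grammar, first, deletable):
    """Algorithm 2.26: terminal successors of every nonterminal.

    grammar: nonterminal -> list of alternatives ([] = epsilon);
    first, deletable: results of Algorithms 2.25 and 2.24.
    """
    follow = {a: set() for a in grammar}         # start with empty follow sets
    changed = True
    while changed:                               # repeat ... until no change in follow
        changed = False
        for b, alts in grammar.items():
            for alt in alts:
                for pos, a in enumerate(alt):    # every occurrence of a nonterminal A
                    if a not in grammar:
                        continue
                    new = set()
                    for y in alt[pos + 1:]:      # B -> w1 A [X1]...[Xt] Y w2
                        if y not in grammar:     # Y is terminal (rule 6)
                            new.add(y)
                            break
                        new |= first[y]          # [Xi], or Y is a nonterminal (rule 7)
                        if y not in deletable:
                            break
                    else:                        # Y w2 is absent (rule 5)
                        new |= follow[b]
                    if not new <= follow[a]:
                        follow[a] |= new
                        changed = True
    return follow


# Example 2.22: grammar, first sets and deletable symbols (worked out by hand).
G = {
    "program": [["statement", "programrest"]],
    "programrest": [["program"], ["end"]],
    "statement": [["assignment"], ["ifstatement"]],
    "assignment": [["v", ":=", "expr"]],
    "ifstatement": [["if", "thenpart", "elsepart"]],
    "thenpart": [["expr", "then", "statement"]],
    "elsepart": [["else", "statement"], []],
}
FIRST = {
    "program": {"v", "if"}, "programrest": {"v", "if", "end"},
    "statement": {"v", "if"}, "assignment": {"v"}, "ifstatement": {"if"},
    "thenpart": {"expr"}, "elsepart": {"else"},
}
follow = find_follow_sets(G, FIRST, {"elsepart"})
print(sorted(follow["elsepart"]))  # ['else', 'end', 'if', 'v']
```

follow(program) and follow(programrest) stay empty because this grammar has no end-of-file symbol after the sentence symbol.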
The implementation of the algorithms depends strongly on the data structure of
the grammar. The execution time depends on the order in which the productions are visited. Many optimizations are possible.
Principles of syntax analysis of LL(1) grammars
The principle of deterministic syntax analysis of LL(1) grammars can be described abstractly under the following assumptions.
1. The grammar is given in 'matrix form': it has imax productions of the form
$$A_i \rightarrow \alpha_{i1} \mid \ldots \mid \alpha_{i,jmax(i)}, \qquad 1 \le i \le imax$$
The sentence symbol is $A_1$. An alternative $\alpha_{ij}$ is given by kmax(i,j) components of the form
$$\alpha_{ij} = X_{ij1} \ldots X_{ij,kmax(i,j)}$$
$\alpha_{ij} = \varepsilon$ means $kmax(i,j) = 1$ and $X_{ij1} = \varepsilon$.
The representation is matrix-like: index i describes the production, index j the alternative, and index k the component.
2. As programmers, we understand this representation as an abstract data structure with the following access functions:
X(↓i ↓j ↓k): symbol
  returns the value of the symbol $X_{ijk}$.
Kind(↓i ↓j ↓k): Symkind
  returns the kind of the symbol $X_{ijk}$, where Symkind = (terminal, nonterminal, epsilon).
Rule(↓i ↓j ↓k): integer
  if $X_{ijk}$ is the nonterminal $A_{i'}$, this function returns the index i': $\mathrm{Rule}(i,j,k) = i' \Leftrightarrow X_{ijk} = A_{i'}$.
Kmax(↓i ↓j): integer
  returns the number of components of alternative j of production i.
Match(↓x ↓i): boolean
  returns true if a phrase of the nonterminal $A_i$ can start with the terminal x or, if $A_i \Rightarrow^{*} \varepsilon$, if x can follow the phrase of $A_i$: $\mathrm{Match}(x,i) \Leftrightarrow x \in \mathrm{first}(A_i\,\mathrm{follow}(A_i))$.
Alternative(↓x ↓i): integer
  returns the index j of the alternative of production i that can begin with the terminal x: $\mathrm{Alternative}(x,i) = j \Leftrightarrow x \in \mathrm{first}(\alpha_{ij}\,\mathrm{follow}(A_i))$.
3. The string to be parsed consists of pmax symbols $s_p$:
$$\sigma = s_1 \ldots s_{pmax} \qquad \text{with } pmax \ge 1$$
The description is basic and abstract, since we ignore (1) the concrete data structures of the stored grammar, (2) the implementation of the access functions, and (3) the fact that the input string is actually supplied by a lexical analyzer.
We will now give a recursive and a nonrecursive parsing algorithm.
The recursive algorithm uses an internal recursive procedure Parse. Its operation should be clear from the following specification and from the text of the algorithm without additional explanation.
Initial state: The input string, up to the symbol $s_{p-1}$, is recognized as a legal beginning of a sentence. The $A_i$-phrase starts with $s_p$.
Function: Parse(↓i ↑correct) tries to parse the $A_i$-phrase.
Final state: If correct = true, an $A_i$-phrase has been parsed and p has been advanced such that $s_p$ is the first input symbol that is no longer part of the $A_i$-phrase. If correct = false, no $A_i$-phrase could be parsed.
2.27 Algorithm LL(1) analysis (recursive)
ParseRecursive(↑correct):
  param  correct: boolean;               -- the string is successfully parsed
  global grammar in matrix form;
  local  pmax: integer;
         p: integer;                     -- index of current input symbol
         s: array(1:pmax) of symbol;     -- input string

  Parse(↓i ↑correct):                    -- an Ai phrase is parsed
    param i: integer;
          correct: boolean;
    local j,k: integer;
  begin                                  -- try to parse an Ai phrase
    -- position 1 --
    if Match(↓s(p) ↓i) then              -- parsing of Ai possible
      j := Alternative(↓s(p) ↓i); k := 1;
      loop                               -- parse αij
        -- position 2 --
        case Kind(↓i ↓j ↓k) of           -- parse Xijk
          terminal:
            if p > pmax or s(p) <> X(↓i ↓j ↓k) then
              correct := false; exit
            end;
            p := p+1                     -- read next input symbol
        | nonterminal:
            Parse(↓Rule(↓i ↓j ↓k) ↑correct);
            if not correct then exit end
        | epsilon:
            -- do nothing
        end;
        if k < Kmax(↓i ↓j) then
          k := k+1
        else
          correct := true; exit
        end
      end
    else
      correct := false                   -- parsing of Ai impossible
    end
    -- position 3 --
  end Parse;

begin
  p := 1;                                -- pmax and s are assumed to have values
  Parse(↓1 ↑correct); correct := correct & p = pmax+1
end ParseRecursive
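A compact Python sketch of Algorithm 2.27 for the grammar and input of Example 2.28 below. Assumptions: the matrix form is represented by a hypothetical dictionary RULES (production name mapped to a list of alternatives), indices are 0-based, "" denotes an ε component, and the lookahead sets first(αij follow(Ai)) are entered by hand instead of being computed; Match and Alternative collapse into a single lookup.

```python
# Grammar of Example 2.28 in a (hypothetical) matrix-like form:
# RULES[i][j] is alternative j of production i; "" is an epsilon component.
RULES = {
    "S": [["E", "eof"]],
    "E": [["a", "A", "b", "E"], ["b", "B", "a", "E"], [""]],
    "A": [["a", "A", "b", "A"], [""]],
    "B": [["b", "B", "a", "B"], [""]],
}
# LOOKAHEAD[i][j] = first(alpha_ij follow(A_i)), worked out by hand.
LOOKAHEAD = {
    "S": [{"a", "b", "eof"}],
    "E": [{"a"}, {"b"}, {"eof"}],
    "A": [{"a"}, {"b"}],
    "B": [{"b"}, {"a"}],
}


def parse_recursive(s):
    """Algorithm 2.27: True iff the symbol list s is a sentence of the grammar."""
    p = 0                                    # index of the current input symbol (0-based)

    def alternative(x, i):
        """Alternative(x, i); None plays the role of Match(x, i) = false."""
        for j, la in enumerate(LOOKAHEAD[i]):
            if x in la:
                return j
        return None

    def parse(i):
        """Parse(i, correct): try to parse an A_i phrase starting at s[p]."""
        nonlocal p
        j = alternative(s[p] if p < len(s) else None, i)
        if j is None:                        # parsing of A_i impossible
            return False
        for x in RULES[i][j]:                # parse alpha_ij component by component
            if x == "":                      # epsilon: do nothing
                continue
            if x in RULES:                   # nonterminal
                if not parse(x):
                    return False
            elif p < len(s) and s[p] == x:   # terminal matches
                p += 1                       # read next input symbol
            else:
                return False
        return True

    return parse("S") and p == len(s)        # correct & p = pmax + 1


print(parse_recursive("a b b a eof".split()))   # the input of Example 2.28
```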
We show the behavior of the above algorithm in Example 2.28 below, where we take snapshots of the states of the algorithm at 'position 1', 'position 2', and 'position 3'.
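2.28 Example Recursive LL(1) parsing
Consider Knuth's ε-containing grammar from Example 2.21. Let us give its components the indices i, j, and k, and extend it by the component eof so that it will not produce empty sentences:
$$\begin{aligned}
S_1 &\rightarrow E_{111}\ \mathrm{eof}_{112}\\
E_2 &\rightarrow a_{211}\,A_{212}\,b_{213}\,E_{214} \mid b_{221}\,B_{222}\,a_{223}\,E_{224} \mid \varepsilon_{231}\\
A_3 &\rightarrow a_{311}\,A_{312}\,b_{313}\,A_{314} \mid \varepsilon_{321}\\
B_4 &\rightarrow b_{411}\,B_{412}\,a_{413}\,B_{414} \mid \varepsilon_{421}
\end{aligned}$$
The input string shall be
$$a_1\ b_2\ b_3\ a_4\ \mathrm{eof}_5$$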
All steps performed by the algorithm can be traced in full detail by the
snapshots of Fig. 2.4.
[Figure: trace table with columns position, p and s_p, followed by one ijk / X_ijk column pair for each recursion depth 0 to 3; the run ends with correct = true]
Fig. 2.4 Snapshots of the LL(1) parsing of Algorithm 2.27 applied to the grammar of Example 2.28
The nonrecursive algorithm uses a stack for the intermediate storage of the indices of all nonterminals that are currently being processed. The access functions of the stack are InitStack, Push(↓i ↓j ↓k), and Pop(↑i ↑j ↑k).
The algorithm can be in three states: findalternative, try, and forward. These are characterized by the assertions that hold in each of them:
findalternative: The input string is already recognized up to the symbol $s_{p-1}$ as a legal beginning of a sentence. $s_p$ is recognized, and it is expected that an $A_i$-phrase starting with $s_p$ will follow. The index j of the matching alternative of the $A_i$-production will be found.
try: The grammar symbol $X_{ijk}$ will be parsed.
forward: $X_{ijk}$ has been successfully parsed, so move to its successor.
For the stack, the following assertion holds in all three states: if i = 1, the stack is empty; if i > 1, the top of the stack holds the indices of the occurrence of $A_i$ that is currently being parsed.
2.29 Algorithm LL(1) parsing (nonrecursive)
ParseNonRecursive(↑correct):
  param  correct: boolean;               -- the string is successfully parsed
  global grammar in matrix form;
         s: array(1:pmax) of symbol;     -- input string
         pmax: integer;
  type   State = (findalternative, try, forward);
  local  i,j,k,p: integer;
         st: State;
begin
  i := 1; p := 1; InitStack;             -- pmax and s are assumed to have values
  st := findalternative;                 -- start with the first rule
  loop
    case st of
      findalternative:
        -- position 1 --
        if Match(↓s(p) ↓i) then
          j := Alternative(↓s(p) ↓i);
          k := 1; st := try              -- Xij1 is the first component
        else
          correct := false; exit         -- s(p) does not match
        end
    | try:                               -- parse Xijk
        -- position 2 --
        case Kind(↓i ↓j ↓k) of
          terminal:
            if p > pmax or X(↓i ↓j ↓k) <> s(p) then
              correct := false; exit
            end;
            p := p+1; st := forward      -- Xijk is parsed
        | nonterminal:
            Push(↓i ↓j ↓k);
            i := Rule(↓i ↓j ↓k);
            st := findalternative
        | epsilon:
            st := forward
        end                              -- case Kind
    | forward:                           -- advance to next component
        -- position 3 --
        if k < Kmax(↓i ↓j) then
          k := k+1; st := try            -- no end of alternative
        else                             -- end of alternative
          if i > 1 then
            Pop(↑i ↑j ↑k)                -- nonterminal Xijk is parsed
          else
            correct := p = pmax+1; exit  -- rule 1 is parsed
          end
        end
    end                                  -- case st
  end                                    -- loop
  -- position 4 --
end ParseNonRecursive
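The same parse can be sketched in Python with an explicit stack and the three states of Algorithm 2.29. As before, RULES and LOOKAHEAD are a hypothetical hand-made encoding of the grammar of Example 2.28, indices are 0-based, and the test "i > 1" of the algorithm becomes "the stack is not empty". After a Pop, the state stays forward, so the caller's component index is advanced on the next pass, exactly as in the pseudocode.

```python
# Hypothetical encoding of the grammar of Example 2.28 ("" = epsilon).
RULES = {
    "S": [["E", "eof"]],
    "E": [["a", "A", "b", "E"], ["b", "B", "a", "E"], [""]],
    "A": [["a", "A", "b", "A"], [""]],
    "B": [["b", "B", "a", "B"], [""]],
}
# first(alpha_ij follow(A_i)) for every alternative, worked out by hand.
LOOKAHEAD = {
    "S": [{"a", "b", "eof"}],
    "E": [{"a"}, {"b"}, {"eof"}],
    "A": [{"a"}, {"b"}],
    "B": [{"b"}, {"a"}],
}


def parse_nonrecursive(s):
    """Algorithm 2.29: explicit stack and three states instead of recursion."""
    i, j, k, p = "S", 0, 0, 0               # k and p are 0-based here
    stack = []                              # (i, j, k) of the nonterminals being parsed
    st = "findalternative"
    while True:
        if st == "findalternative":
            x = s[p] if p < len(s) else None
            j = next((jj for jj, la in enumerate(LOOKAHEAD[i]) if x in la), None)
            if j is None:
                return False                # s_p does not match
            k, st = 0, "try"                # X_ij1 is the first component
        elif st == "try":                   # parse X_ijk
            x = RULES[i][j][k]
            if x == "":                     # epsilon
                st = "forward"
            elif x in RULES:                # nonterminal: remember position, descend
                stack.append((i, j, k))
                i, st = x, "findalternative"
            elif p < len(s) and s[p] == x:  # terminal matches
                p, st = p + 1, "forward"
            else:
                return False
        else:                               # st == "forward": advance to next component
            if k < len(RULES[i][j]) - 1:
                k, st = k + 1, "try"        # no end of alternative
            elif stack:
                i, j, k = stack.pop()       # nonterminal X_ijk is parsed; stay in forward
            else:
                return p == len(s)          # rule 1 is parsed


print(parse_nonrecursive("a b b a eof".split()))
```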
The behavior of the nonrecursive algorithm is shown in Example 2.30.
2.30 Example Nonrecursive LL(1) parsing
We consider the same grammar and input string as in Example 2.28, with snapshots at positions 1 to 4. The algorithm executes as shown in Fig. 2.5.
[Figure: trace table with columns position, p, s_p, stack (end of stack at the left), ijk and X_ijk; the run ends with correct = true]
Fig. 2.5 Snapshots of the nonrecursive LL(1) algorithm 2.29 applied to the grammar of Example 2.28
The recursive algorithm is statically shorter and more elegant. The nonrecursive algorithm is better suited for the inclusion of error handling, since the explicitly stacked symbols are accessible (see Section 2.6).
Both scan the input string strictly from left to right (p is never decremented). In addition, there exists a grammar-specific upper limit c such that a new input symbol is read after at most c loop iterations. Hence both algorithms have an execution time that is linear in the length of the input string: their time complexity is O(pmax).
LL(k) grammars for k > 1
A lookahead of more than one symbol is rarely used in compilers. We shall therefore treat LL(k) grammars for k > 1 only briefly, for the sake of completeness.
First, we define the set of terminal start symbols of length k of a string α:
2.31 Definition Terminal start symbols of length k
$$\mathrm{first}_k(\alpha) = \{\beta : \alpha \Rightarrow^{*} \beta\omega \text{ with } \beta \in V_T^{*},\ |\beta| = k,\ \omega \in V^{*}\} \ \cup\ \{\beta : \alpha \Rightarrow^{*} \beta \text{ with } \beta \in V_T^{*},\ |\beta| < k\}$$
If the terminal string that can be derived from α is shorter than k, then the elements of $\mathrm{first}_k(\alpha)$ are also shorter than k.
We will now give a formal definition of LL(k) grammars according to Rosenkrantz and Stearns [1970]:
2.32 Definition LL(k) grammar
A grammar is called LL(k) if for all left-canonical derivations of the form
$$S \Rightarrow^{*} \alpha A \omega \Rightarrow \alpha\beta\omega \qquad\text{and}\qquad S \Rightarrow^{*} \alpha A \omega \Rightarrow \alpha\gamma\omega$$
where $\mathrm{first}_k(\beta\omega) = \mathrm{first}_k(\gamma\omega)$, it follows that $\beta = \gamma$.
This means that in an LL(k) grammar no two sentential forms with the leftmost nonterminal A and the alternatives A → β and A → γ can exist in which the left-canonical derivations of the remaining strings beginning with β and γ agree in the first k symbols. From this, we get the following condition:
2.33 LL(k) condition
A grammar is LL(k) if for each pair of rules A → β and A → γ and each left-canonically derived sentential form that contains A,
$$S \Rightarrow^{*} \alpha A \omega$$
the following condition holds:
$$\mathrm{first}_k(\beta\omega) \cap \mathrm{first}_k(\gamma\omega) = \emptyset$$
2.34 Example LL(2) and LL(3) test
Again, consider the grammar
S → A ;
A → a B | B B b
B → b | a b
in order to see if it is LL(k) and to determine the value of k.
The only pair of rules that creates a problem is
A → a B
A → B B b
and the only sentential form in which its left-hand side A appears is A ;.
k = 1: the LL(1) test produces
$$\mathrm{first}_1(aB;) = \{a\}, \qquad \mathrm{first}_1(BBb;) = \{a, b\}$$
Since a belongs to both sets, the grammar is not LL(1).
k = 2: the LL(2) test produces
$$\mathrm{first}_2(aB;) = \{aa, ab\}, \qquad \mathrm{first}_2(BBb;) = \{bb, ba, ab\}$$
Since the element ab belongs to both sets, the grammar is not LL(2).
k = 3: the LL(3) test produces
$$\mathrm{first}_3(aB;) = \{ab;,\ aab\}, \qquad \mathrm{first}_3(BBb;) = \{bbb, bab, abb, aba\}$$
Since both sets are disjoint, the grammar is LL(3).
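The sets first_k of Example 2.34 can be checked mechanically. This Python sketch expands the leftmost nonterminal of a sentential form until its terminal prefix is at least k symbols long or no nonterminal is left, which follows Definition 2.31 directly. It is only an illustration for this small, non-left-recursive grammar (it enumerates derivations and may not terminate for other grammars) and is not the general algorithm of Aho and Ullman; the names are hypothetical.

```python
GRAMMAR = {                                 # Example 2.34 without the rule for S
    "A": [["a", "B"], ["B", "B", "b"]],
    "B": [["b"], ["a", "b"]],
}


def first_k(form, k):
    """Definition 2.31 by leftmost expansion of the sentential form `form`.

    Terminates for this small non-left-recursive grammar; not a general
    algorithm (it enumerates derivations).
    """
    result = set()
    work = [list(form)]
    while work:
        f = work.pop()
        n = 0                               # length of the terminal prefix of f
        while n < len(f) and f[n] not in GRAMMAR:
            n += 1
        if n >= k or n == len(f):           # prefix long enough, or f is terminal
            result.add("".join(f[:k]))
            continue
        for alt in GRAMMAR[f[n]]:           # expand the leftmost nonterminal
            work.append(f[:n] + alt + f[n + 1:])
    return result


for k in (1, 2, 3):
    left = first_k(["a", "B", ";"], k)
    right = first_k(["B", "B", "b", ";"], k)
    print(k, sorted(left), sorted(right), "disjoint" if left.isdisjoint(right) else "overlap")
```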
Algorithms for the computation of the sets $\mathrm{first}_k(\alpha)$ and for checking the LL(k) conditions for k > 1 can be found in Aho and Ullman [1972].
No left-recursive grammar is LL(k) for any k. Another simple grammar that is not LL(k) for any k is
S → A ;
A → a | a A a
It generates the language $\{a^{2n+1};\ :\ n \ge 0\}$. If there were a value of k such that
$$\mathrm{first}_k(a\,a^{n};) \cap \mathrm{first}_k(aAa\,a^{n};) = \emptyset$$
then $k > n+1$ would have to hold. However, since n can become arbitrarily large, there is no such k.
Rosenkrantz and Stearns [1970] proved the following interesting statements about LL(k) grammars:
(1) It is undecidable whether a given arbitrary grammar is LL(k) for an unknown value of k.
(2) It is decidable whether a given arbitrary grammar is LL(k) for a given fixed value of k.
(3) If a grammar G is not LL(k) for a given k, it cannot be determined whether there is an equivalent LL(k) grammar for G.
(4) For each LL(k) grammar G that contains ε, there is an ε-free LL(k+1) grammar that produces the same language as G, but without the empty string.
2.3 The top-down graph
In a table-driven syntax analysis, the grammar of the source language must be stored in main memory so that the analysis algorithm can access it freely. The three-dimensional abstract data structure of rules, alternatives, and components, used in Section 2.2 to present the principal algorithms, is not suited as a concrete data structure: it does not make efficient use of memory, and the grammar cannot be represented in EBNF form. A representation that is much better suited for practical top-down analysis is a special kind of graph, which we call a top-down graph. It is similar to the syntax diagrams introduced by Wirth to describe the syntax of Pascal. In Coco, the top-down graph is used as a preliminary step to the even better suited G-code. Since the G-code can only be understood by means of the top-down graph, we describe the top-down graph first.
Basic structure
The basic structure of the top-down graph is a collection of ordered binary trees. Its nodes are the grammar symbols of the right-hand sides of syntax rules. Right pointers link the components of an alternative, while left pointers link the start symbols of different alternatives.
In the picture of a top-down graph, a right pointer leaves a node at the right, and a left pointer leaves the node at the bottom:
node --> right pointer (to next component)
  |
left pointer (to next alternative)
2.35 Example Basic structure of the top-down graph
Figure 2.6 shows the top-down graph of the grammar
S → A ;
A → a B | B B b
B → b | a b
Notice that the top-down graph comprises only the right-hand sides of the rules.
S:  A --> ;
A:  a --> B
    |
    B --> B --> b
B:  b
    |
    a --> b
Fig. 2.6 Top-down graph of the grammar of Example 2.35
Factorization
An advantage of the top-down graph over a linear representation is its ability to show alternatives in factorized form, as can be done in EBNF. From the rule
A → a b c | a c d
with the top-down graph
A:  a --> b --> c
    |
    a --> c --> d
we get by left-factorization the EBNF rule
A → a (b c | c d)
with the top-down graph
A:  a --> b --> c
          |
          c --> d
From the rule
A → a b c | d e c
with the top-down graph
A:  a --> b --> c
    |
    d --> e --> c
we obtain by right-factorization the EBNF rule
A → (a b | d e) c
with the top-down graph
A:  a --> b --+
    |         +--> c
    d --> e --+
Notice that the last top-down graph is no longer a tree.
A special case occurs when an alternative is the beginning of another alternative. Then factorization creates an ε. For the rule
A → a b | a
with the top-down graph
A:  a --> b
    |
    a
we get by left-factorization the EBNF rule
A → a [b]
with the top-down graph
A:  a --> b
          |
          ε
Removal of left-recursive rules
The symbol strings defined by left-recursive rules can be represented in EBNF by the repetition symbol. Repetition corresponds to a loop in the top-down graph. From the rule
A → a | A b
with the top-down graph
A:  a
    |
    A --> b
we get the EBNF rule
A → a {b}
with the top-down graph
A:  a --> b --> (back to b)
          |
          ε
This top-down graph is also not a tree. It can easily be verified that it represents all possible right-hand sides, such as a, ab, abb, abbb, etc.
The complement symbol any
Sometimes it is desirable to represent a set of terminals by its complementary set, for example
1. in the description of a string enclosed in quotes: the set of all symbols of the alphabet except the quote;
2. for any symbol in a comment of the form (* ... *): the set of all symbols except the symbol *);
3. for any symbol except begin (to skip declarations).
Complementary sets cannot be represented in the production notation of formal languages. The only remaining option is to enumerate all members of the complementary set, which is very inconvenient. For use in Coco, we therefore introduce the special symbol any to denote complementary sets.
2.36 Definition Complement symbol any
The complement symbol any represents every terminal that is not a terminal start symbol of one of the alternatives to any.
Figure 2.7 shows the three examples above with the symbol any as EBNF rules and as top-down graphs.
string  = '"' {any} '"'.
comment = "(*" {any} "*)".
skip    = {any} "begin".
[Figure: the corresponding top-down graphs, in which each any-node lies on a repetition loop]
Fig. 2.7 The meaning of the complement symbol any
Equivalent top-down graphs
If one uses only the basic structure, a grammar rule yields a unique top-down graph. This uniqueness is lost with factorization and removal of left recursion; in these cases there are sometimes several equivalent top-down graphs.
2.37 Example Equivalent top-down graphs
Consider the expression rule
[rule illegible in the source scan]
By factorization and elimination of left recursion, the graph shown in the upper part of Fig. 2.8 results. It has 10 nodes and corresponds to the EBNF rule
[EBNF rule illegible in the source scan]
Another top-down graph, which is equivalent to both but consists of only 7 nodes, appears in the middle part of Fig. 2.8. This graph corresponds to the EBNF rule
[EBNF rule illegible in the source scan]
Figure 2.8 shows a third equivalent and even more condensed top-down graph with only 6 nodes. This top-down graph no longer corresponds to an EBNF rule.
[Figure: three equivalent top-down graphs for expressions, labeled 10 nodes, 7 nodes, and 6 nodes]
Fig. 2.8 Three equivalent top-down graphs for expressions
The graph with the fewest nodes occupies the least memory. However, there
may be reasons (due to the treatment of semantics, see Section 3.6) not to
compress the top-down graph too much.
These examples show that for each EBNF rule there is a corresponding
top-down graph. But a top-down graph does not always correspond to an
EBNF rule.
Representation
The top-down graph can be represented in memory by a data structure of nodes and pointers that may be dynamically generated or statically declared and initialized. Since the number of nodes is known in advance and does not change, the static declaration is more efficient. In Coco, the top-down graph basically consists of an array of nodes, each of which has four components:
type
  Graphnode = record
    kind: (terminal, nonterminal, any, eps);
    val, lp, rp: integer;
  end;
var
  graph: array(1:n) of Graphnode;
The names have the following meaning:
kind: the various node types.
val:  the node symbol in some encoding; meaningless for ε-nodes.
lp:   the left pointer, which points to the first node of the next alternative. If lp > 0, graph(lp) starts the next alternative. If lp = 0, the current alternative is the last one of the production.
rp:   the right pointer, which points to the next component. If rp > 0, graph(rp) is the next component. If rp = 0, the current component is the last component of an alternative.
n:    the number of nodes in the grammar.
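A possible Python rendering of the record Graphnode, filled in with the top-down graph of Example 2.35 (Fig. 2.6). Assumptions: val holds the symbol name as a string instead of an integer code, graph[0] is unused so that 0 can mean 'no pointer', and the table root giving the first node of each rule is hypothetical (this section does not say how Coco locates the start of a rule). The walk in alternatives() follows left pointers to successive alternatives and right pointers to successive components.

```python
from dataclasses import dataclass


@dataclass
class GraphNode:
    kind: str      # "terminal", "nonterminal", "any" or "eps"
    val: str       # node symbol (a string here instead of an integer code)
    lp: int = 0    # first node of the next alternative, 0 = none
    rp: int = 0    # next component, 0 = none


# graph[0] is unused so that index 0 can mean "no pointer", as in the text.
graph = [None,
    GraphNode("nonterminal", "A", rp=2),       # 1  S = A ;
    GraphNode("terminal", ";"),                # 2
    GraphNode("terminal", "a", lp=5, rp=4),    # 3  A = a B
    GraphNode("nonterminal", "B"),             # 4
    GraphNode("nonterminal", "B", rp=6),       # 5    | B B b
    GraphNode("nonterminal", "B", rp=7),       # 6
    GraphNode("terminal", "b"),                # 7
    GraphNode("terminal", "b", lp=9),          # 8  B = b
    GraphNode("terminal", "a", rp=10),         # 9    | a b
    GraphNode("terminal", "b"),                # 10
]
root = {"S": 1, "A": 3, "B": 8}   # hypothetical table: first node of each rule


def alternatives(start):
    """Follow left pointers from start and, for each alternative, right
    pointers; works only for tree-shaped graphs (no repetition loops)."""
    alts, n = [], start
    while n:
        comp, m = [], n
        while m:
            comp.append(graph[m].val)
            m = graph[m].rp
        alts.append(" ".join(comp))
        n = graph[n].lp
    return alts


for nt, r in root.items():
    print(nt, "=", " | ".join(alternatives(r)))
```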
LL(1) conditions for top-down graphs
The LL(1) condition of Section 2.2 refers to the simple grammar representation with rules and alternatives. If a grammar meets this condition, the correct alternative can be selected with a lookahead of one symbol without backtracking. A similar condition for the top-down graph ensures the correct selection of alternatives without backtracking, using a lookahead of one symbol.
To simplify the discussion, we introduce two auxiliary concepts. Since they are of central importance for the syntax analysis of top-down graphs, we will use them often. We call these concepts 'alternative chain' and 'match'.
2.38 Definition Alternative chain
An alternative chain is the ordered set of all nodes of a top-down graph that are linked together by left pointers. A node pointed to by a right pointer is the start of an alternative chain. A node without a left pointer is the end of an alternative chain. Nodes that are not linked by left pointers can also be regarded as belonging to an alternative chain; in this case the alternative chain consists of the node alone.
2.39 Example Alternative chains
In the top-down graph of Fig. 2.9, symbols are distinguished by subscripts. The graph contains the alternative chains
[list of alternative chains illegible in the source scan]
Note that one of the T nodes belongs to two alternative chains.
[Figure: top-down graph for expressions whose nodes carry the indices 1 to 5]
Fig. 2.9 Top-down graph for expressions with indexed symbols
2.40 Definition Match
An input symbol x and a node of the top-down graph with symbol sy match (i.e., fit together) if one of the following conditions is met:
1. sy is a terminal and x = sy;
2. sy is a nondeletable nonterminal that can start with x;
3. sy is a deletable nonterminal, and sy can start with x or x can follow the node sy;
4. sy is an ε-node and x can follow the node sy;
5. sy is an any-node and x matches no other node in the alternative chain to which the any-node belongs.
In order to select a node uniquely from an alternative chain using a lookahead symbol x, x must match only one alternative:
2.41 LL(1) conditions for top-down graphs
An alternative chain is LL(1) if an arbitrary input symbol matches at most one of its nodes. A top-down graph is LL(1) if all of its alternative chains are LL(1).
The top-down graph of Fig. 2.9 is therefore LL(1) if T cannot start with + or − and if E cannot be followed by + or − (these symbols would match the ε-node).
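Condition 2.41 reduces to a pairwise disjointness test once the set of matching input symbols is known for every node of a chain. This Python sketch assumes those sets have already been computed according to cases 1 to 4 of Definition 2.40; an any-node is passed as None because, by case 5, it matches only symbols that match no other node and so cannot cause a conflict (at most one any-node per chain is assumed). The function name and the sample follow sets are hypothetical.

```python
from itertools import combinations


def chain_is_ll1(match_sets):
    """Condition 2.41 for one alternative chain.

    match_sets holds one entry per node of the chain: the set of input symbols
    that match the node (Definition 2.40, cases 1-4), or None for an any-node.
    By case 5 an any-node matches only symbols that match no other node, so it
    cannot cause a conflict; at most one any-node per chain is assumed.
    """
    explicit = [m for m in match_sets if m is not None]
    return all(m1.isdisjoint(m2) for m1, m2 in combinations(explicit, 2))


# Chain "+ | - | eps" of an expression graph; the eps-node matches follow(E).
# The follow sets used here are invented for illustration.
print(chain_is_ll1([{"+"}, {"-"}, {")", "eof"}]))   # E not followed by + or -
print(chain_is_ll1([{"+"}, {"-"}, {"+", ")"}]))     # E followed by +: conflict
print(chain_is_ll1([None, {'"'}]))                  # {any} '"' of the string rule
```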
Since each EBNF production corresponds to a top-down graph, the LL(1) conditions for top-down graphs are also the LL(1) conditions for EBNF grammars. To check whether an EBNF grammar is LL(1), it is easiest to generate its top-down graph and check whether the graph meets the LL(1) conditions. The LL(1) conditions for EBNF grammars can also be derived from the definition of the EBNF grammar alone, without constructing a top-down graph. However, this is cumbersome and yields no new insights, so we omit the description and leave the task to the interested reader.
LL(1) top-down graphs and grammars of programming languages
If top-down graphs are to have practical value, one must be able to represent the grammars of programming languages as LL(1) top-down graphs, and therefore as LL(1) EBNF grammars. We may ask, therefore, whether this is possible without exception, or whether there are constructs that resist an LL(1) representation, and if so, what can be done about them.
First of all, LL(1) violations caused by left-recursive productions and by the start of several alternatives with the same symbol can easily be avoided in top-down graphs and in EBNF notation. Remaining LL(1) violations can usually be removed with various tricks that are found with insight into the particular situation. As an aid for the 'grammar designer', we will treat several typical cases and distinguish between the following five methods:
1. substitution and factorization;
2. alphabet extension;
3. syntactic extension;
4. acceptance of non-LL(1) constructs;
5. miscellaneous transformations.
Substitution and factorization. Consider a production with two alternatives that start with different nonterminals X and Y, where X and Y can start with the same symbol (terminal or nonterminal). Then it is often possible to replace the symbols X and Y by the right-hand sides of their productions, and to extract their common starting string by left-factorization. This can be simple and obvious, as in the various DO instructions of PL/M-80 (and similarly in PL/I):
statement      = ...
               | dostatement
               | whilestatement
               | forstatement
               | casestatement
               | ... .
dostatement    = "DO" ";" block.
whilestatement = "DO" "WHILE" expr ";" {statement} ending.
forstatement   = "DO" ident "=" expr "TO" expr ["BY" expr] ";" {statement} ending.
casestatement  = "DO" "CASE" expr ";" {statement} ending.
By substitution and factorization this results in
statement = ...
          | "DO" ( ";" block
                 | ( "CASE" expr
                   | "WHILE" expr
                   | ident "=" expr "TO" expr ["BY" expr]
                   ) ";" {statement} ending
                 )
          | ... .
However, it can also be difficult. In grammars such as that of Modula-2, a factor can be a set or a designator, and both can begin with an identifier:
factor     = ... | designator [actpars] | set | ... .
designator = qualident {"." ident | "[" exprlist "]" | "^"}.
set        = [qualident] "{" [elementlist] "}".
qualident  = ident {"." ident}.
Note that even the production for designator taken alone is not LL(1). For instance, ident.ident may be simply a qualident or a qualident followed by .ident.
The removal of the LL(1) conflict consists of combining designator and set into a new symbol deset, and then splitting designator into ident and a remainder desigrest. After several substitutions and factorizations, the following LL(1) constructs result:
factor    = ... | deset | "{" [elementlist] "}" | ... .
deset     = [rule only partly legible in the source scan: it starts with ident and combines "{" [elementlist] "}", desigrest, and [actpars]]
desigrest = {"." ident | "[" exprlist "]" | "^"}.
The equivalence of the old and new constructs can no longer be easily seen.
Alphabet extension. In selecting an alternative, it is fairly common for two lookahead symbols to be necessary to find the right one. The main example of this is labels in front of statements:
statement = [ident ":"] (ident ":=" expr | ifstatement | ...).
An ident at the beginning of a statement may be a label or the left part of an assignment. This can only be determined by the symbol following ident. This conflict can often be resolved by extending the terminal alphabet. In the preceding case, the symbol label can be added to the alphabet, and the lexical analyzer can be required to supply a label instead of an ident if the ident is followed by a ':'. In this case, the lexical analyzer is used to resolve the LL(1) conflict.
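A sketch of alphabet extension in a Python scanner: after reading an identifier, the scanner looks one token ahead and delivers a single label token if a ':' follows. The token syntax (identifiers, ':=', and single characters) and the names TOKEN and scan are assumptions for illustration; keywords are not distinguished from identifiers.

```python
import re

# Hypothetical token syntax: identifiers, ':=', or any other single non-blank character.
TOKEN = re.compile(r"[A-Za-z]\w*|:=|\S")


def scan(text):
    """Return (kind, value) tokens; an ident followed by ':' becomes one 'label' token."""
    raw = TOKEN.findall(text)
    out, i = [], 0
    while i < len(raw):
        tok = raw[i]
        if tok[0].isalpha() and i + 1 < len(raw) and raw[i + 1] == ":":
            out.append(("label", tok))      # the scanner looks one token ahead
            i += 2                          # the ':' is consumed together with the label
        elif tok[0].isalpha():
            out.append(("ident", tok))
            i += 1
        else:
            out.append((tok, tok))          # other symbols stand for themselves
            i += 1
    return out


print(scan("L1: x := y"))
```

Because ':=' is recognized as one token before the label check, an assignment target is never mistaken for a label; the parser then sees label and ident as different terminals and needs only one symbol of lookahead.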
This method leads to complications if the lexical analyzer has to carry out a wider inspection of the context to determine whether or not to replace two terminals by another one. For example, in Algol 60, 'ident :' does not always denote the label of a statement; an identifier followed by ':' may also appear in a declaration, as in ARRAY A[n : m]. In such cases, the lexical analyzer is no longer independent of the syntax analyzer, since it must consider the context.
Syntactic extension. Algol 60 has multiple assignments, defined by
assignment = designator ":=" {designator ":="} expr.
where expr can start with designator. This LL(1) violation is very nasty. It
can be removed by ‘substitution and factorization’, but this is very cumbersome (the reader should try it). It is easier to 'expand' the designator inside
the curly brackets to expr. This requires the introduction of an additional production for assignrest:
assignment = designator ":=" assignrest.
assignrest = expr [":=" assignrest].
The syntactic extension must be compensated by a semantic restriction. If in
the production for assignrest the right-recursive part is present, expr must be
restricted to be a designator. This can be achieved by the introduction of a
boolean attribute isdesignator. Anticipating knowledge from Chapter 3, this
may be written as an attributed grammar as follows:
assignrest = expr↑isdesignator
             [":=" assignrest   where (isdesignator)].
This means: by syntactic extensions, portions of the language definition are
moved from syntax to static semantics.
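The syntactic extension and its compensating semantic restriction can be sketched as a small recursive-descent parser in Python. The sketch rests on some assumptions. Tokens are already split into strings. The designator is reduced to a plain identifier. expr is a toy grammar of identifiers, numbers, '+' and parentheses. The class `Parser` and its methods are hypothetical. Each expression method returns the attribute isdesignator, and assignrest checks it only when ':=' follows.

```python
class Parser:
    """Toy parser for  assignment = designator ":=" assignrest.
                       assignrest = expr [":=" assignrest].
    with expr = term {"+" term}; term = ident | number | "(" expr ")".
    """

    def __init__(self, tokens):
        self.toks = list(tokens) + ["<eof>"]
        self.pos = 0

    def peek(self):
        return self.toks[self.pos]

    def next(self):
        tok = self.toks[self.pos]
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.next() != tok:
            raise SyntaxError(f"{tok!r} expected")

    def term(self):
        """Return the attribute isdesignator of the term."""
        tok = self.next()
        if tok == "(":
            self.expr()
            self.expect(")")
            return False
        if tok.isidentifier():
            return True              # a bare identifier is a designator
        if tok.isdigit():
            return False
        raise SyntaxError(f"unexpected {tok!r}")

    def expr(self):
        """Return isdesignator: true only for a single identifier."""
        isdesignator = self.term()
        while self.peek() == "+":
            self.next()
            self.term()
            isdesignator = False
        return isdesignator

    def assignrest(self):
        isdesignator = self.expr()
        if self.peek() == ":=":
            # Semantic restriction compensating for the syntactic extension.
            if not isdesignator:
                raise SyntaxError("left side of ':=' must be a designator")
            self.next()
            self.assignrest()

    def assignment(self):
        if not self.next().isidentifier():
            raise SyntaxError("designator expected")
        self.expect(":=")
        self.assignrest()
        self.expect("<eof>")
```

`Parser(["a", ":=", "b", ":=", "c", "+", "1"]).assignment()` is accepted. `a := b + 1 := c` is rejected by the attribute check, not by the grammar.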
Acceptance of non-LL(1) constructs. If it is known that the parser tries to
match the alternatives in the order they are written, some LL(1) violations can
be left alone. The best known case is the dangling else:
ifstatement = "IF" expr "THEN" statement ["ELSE" statement].
Although this construct is not LL(1), and is even ambiguous (see Example
2.22), it can be left alone if one can be sure that the parser, having recognized
the statement following THEN, first tries to detect the optional ELSE, and
only regards the entire if statement as complete if there is no ELSE.
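A hypothetical Python sketch shows how such a parser behaves. Conditions and simple statements are single tokens, and the function name is illustrative only. The optional ELSE is tried first, so each ELSE is attached to the nearest unfinished IF.

```python
def parse_statement(toks, i=0):
    """statement = ifstatement | simple.
    ifstatement = "IF" expr "THEN" statement ["ELSE" statement].
    Returns (tree, index of the next unparsed token)."""
    if toks[i] != "IF":
        return toks[i], i + 1                  # simple statement
    cond = toks[i + 1]                         # expr: a single token here
    if toks[i + 2] != "THEN":
        raise SyntaxError("THEN expected")
    then_part, i = parse_statement(toks, i + 3)
    # Try the optional ELSE first; the if statement is complete
    # only when no ELSE follows.
    if i < len(toks) and toks[i] == "ELSE":
        else_part, i = parse_statement(toks, i + 1)
        return ("if", cond, then_part, else_part), i
    return ("if", cond, then_part, None), i
```

For `IF a THEN IF b THEN s1 ELSE s2`, the ELSE ends up in the inner if statement.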
Other transformations. Sometimes, a grammar that is not LL(1) can be trans-
formed into an equivalent LL(1) grammar by simple transformations that do
not fall into any of the four categories above. For example, in Algol 60, a
block is defined as
block = head ";" body.
head  = "begin" declaration {";" declaration}.
This construct is not LL(1) since the semicolon is used in a dual role: it separates adjacent declarations, and it separates the body from the head. The solution is
simple: the grammar can be transformed so that the semicolon becomes a
terminator instead of a separator:
block = head body.
head  = "begin" declaration ";" {declaration ";"}.
The necessity of such transformations, their difficulty, and the uncertainty of
whether they have been executed correctly are a weakness of the LL-method, and
often a cause for criticism. In bottom-up analyzable LR(1) grammars, no
transformations, or only a few, are needed, so research has focused on
the LR-method. However, syntax is but one aspect. What is gained with the
LR-method must be paid back when semantics is connected to syntax: this connection is
much less flexible in the LR-method than in the LL-method, often leads to
violations of the LR-property, and then also requires transformations. In addition, the LL(1)-method is much easier to understand than the LR-method. This
results in easier transformations and more understandable error messages.
2.4
The G-code
A top-down graph that resides in memory is a useful way of representing
a grammar. It already requires little space, but it can be compressed significantly
further. Let us consider the grammar of arithmetic expressions:
S      = expr.
expr   = term {"+" term}.
term   = factor {"*" factor}.
factor = v | "(" expr ")".
Now, let us add the production S' = S eofsy, where S' is the new sentence
symbol and eofsy (= end of file) is a new terminal. This trick ensures that each
sentence terminates with the same symbol eofsy and that there is no empty
sentence if S can be derived into the empty string.
Fig. 2.10 Top-down graph for an expression: graphic representation
In Fig. 2.10 we see a top-down graph of a grammar with 15 nodes. Fig.
2.11 shows the internal memory representation described in Section 2.3. If
we assume one byte each for the components typ and val, and two bytes
each for lp and rp, then the table requires 15 * 6 = 90 bytes.
Compaction can be achieved by partitioning the nodes according to their
types, and by coding the individual types so that they do not contain any unnecessary information. The G-code (grammar code) that we use is such a
code. For syntax analysis the elements of the G-code behave as instructions,
and they are therefore written as instructions. Consecutive G-code instructions
are executed sequentially. They correspond to nodes in the top-down graph
that are connected by right pointers. Definition 2.42 defines the G-code as far
as it is relevant for the representation of a top-down graph.

Fig. 2.11 Top-down graph for an expression: representation in memory (rules for S', S, expr, term and factor)
The G-code is augmented by tables containing the lookahead symbols. With
each nonterminal symbol sy (not with each nonterminal node) there is associated a set first(sy) containing its terminal start symbols.
The operand nr of an ε-instruction (EPS and EPSA) refers to an array epsset; epsset(nr) contains all terminals that match the corresponding ε-node (see
Definition 2.40). The operand nr of an ANYA instruction refers to an array
anyset; anyset(nr) contains all terminals that match the corresponding
any-node. In summary, these G-code lookahead sets have the following data
structures:
first:  array(1:maxnt)  of Symbolset
epsset: array(1:maxeps) of Symbolset
anyset: array(1:maxany) of Symbolset
If the lookahead sets are stored bitwise, they do not require much memory.
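Bitwise storage can be sketched in Python as follows. The sketch assumes a hypothetical numbering of the terminals of the expression grammar (eofsy = 0 up to ")" = 5). Each set of maxt + 1 terminals is one integer used as a bit vector, so it needs only (maxt + 1) / 8 bytes, rounded up.

```python
class SymbolSet:
    """A set of terminal numbers 0..maxt held as the bits of one integer."""

    def __init__(self, *symbols):
        self.bits = 0
        for s in symbols:
            self.bits |= 1 << s                 # set bit s

    def __contains__(self, s):
        return ((self.bits >> s) & 1) == 1      # test bit s


def bytes_per_set(maxt):
    """Memory for one set of maxt + 1 terminals, one bit per terminal."""
    return (maxt + 1 + 7) // 8


# Hypothetical terminal numbering for the expression grammar.
EOFSY, V, PLUS, TIMES, LPAR, RPAR = range(6)
epsset2 = SymbolSet(EOFSY, RPAR, PLUS)          # successors of term
```

With six terminals, every lookahead set of this grammar fits into a single byte.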
It can be seen that each node of the top-down graph corresponds to a G-code instruction. The G-code instructions RET and JMP are added at the end
of productions and loops, where the linear execution sequence is interrupted.
2.42 Definition  G-code (incomplete)

Instruction   Bytes  Description
T    sy       2      terminal
                     If the next input symbol is sy then recognize it, else report an error.
TA   sy adr   4      terminal with alternative
                     If the next input symbol is sy then recognize it, else go to adr.
NT   sy       2      nonterminal
                     If the next input symbol is a terminal start symbol of sy then step
                     through its production, else report an error.
NTA  sy adr   4      nonterminal with alternative
                     If the next input symbol is a terminal start symbol of sy then step
                     through its production, else go to adr.
ANY           1      any
                     Recognize the next input symbol.
ANYA nr adr   4      any with alternative
                     If the next input symbol is contained in the symbol set indicated
                     by nr then recognize it, else go to adr.
EPS  nr       2      epsilon
                     If the next input symbol is contained in the successor set indicated
                     by nr then recognize the empty string, else report an error.
EPSA nr adr   4      epsilon with alternative
                     If the next input symbol is contained in the successor set indicated
                     by nr then recognize the empty string, else go to adr.
JMP  adr      3      jump
                     Go to adr.
RET           1      return
                     Return from the production of a nonterminal.
The operation code and the operands sy and nr are 1 byte each; adr is 2
bytes.
The following G-code results for the grammar shown in the top-down
graph of Fig. 2.10:
 1  NT   S            The production S' = S eofsy.
 3  T    eofsy
 5  RET
 6  NT   expr         The production S = expr.
 8  RET
 9  NT   term         The production expr = term {"+" term}.
11  TA   "+"  20
15  NT   term
17  JMP  11
20  EPS  1
22  RET
23  NT   factor       The production term = factor {"*" factor}.
25  TA   "*"  34
29  NT   factor
31  JMP  25
34  EPS  2
36  RET
37  TA   v    42      The production factor = v | "(" expr ")".
41  RET
42  T    "("
44  NT   expr
46  T    ")"
48  RET
The lookahead sets are:
first(S)      = {v, "("}
first(expr)   = {v, "("}
first(term)   = {v, "("}
first(factor) = {v, "("}
epsset(1)     = {eofsy, ")"}
epsset(2)     = {eofsy, ")", "+"}
anyset is empty since the grammar contains no any-symbol.
The total amount of G-code is 48 bytes, which is slightly more than one half of the 90 bytes required by the top-down graph.
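The byte count can be checked with a small two-pass assembler sketch in Python. It assumes the instruction sizes of Definition 2.42 and uses hypothetical label names (e_loop, e_exit, t_loop, t_exit, f_paren) for the jump targets. The first pass lays out the G-code of the expression grammar from address 1, and the second pass replaces label operands by addresses.

```python
# Instruction sizes in bytes, from Definition 2.42.
SIZE = {"T": 2, "TA": 4, "NT": 2, "NTA": 4, "ANY": 1, "ANYA": 4,
        "EPS": 2, "EPSA": 4, "JMP": 3, "RET": 1}

# (label, opcode, operands...) for the expression grammar.
EXPR_GCODE = [
    (None, "NT", "S"), (None, "T", "eofsy"), (None, "RET"),       # S' = S eofsy.
    (None, "NT", "expr"), (None, "RET"),                         # S = expr.
    (None, "NT", "term"), ("e_loop", "TA", "+", "e_exit"),       # expr = term {"+" term}.
    (None, "NT", "term"), (None, "JMP", "e_loop"),
    ("e_exit", "EPS", 1), (None, "RET"),
    (None, "NT", "factor"), ("t_loop", "TA", "*", "t_exit"),     # term = factor {"*" factor}.
    (None, "NT", "factor"), (None, "JMP", "t_loop"),
    ("t_exit", "EPS", 2), (None, "RET"),
    (None, "TA", "v", "f_paren"), (None, "RET"),                 # factor = v | "(" expr ")".
    ("f_paren", "T", "("), (None, "NT", "expr"), (None, "T", ")"), (None, "RET"),
]


def assemble(program):
    """Return (list of (address, opcode, operands), total size in bytes)."""
    labels, placed, addr = {}, [], 1
    for label, op, *args in program:              # pass 1: assign addresses
        if label is not None:
            labels[label] = addr
        placed.append((addr, op, args))
        addr += SIZE[op]
    code = [(a, op, [labels.get(x, x) for x in args])   # pass 2: resolve labels
            for a, op, args in placed]
    return code, addr - 1
```

`assemble(EXPR_GCODE)` reports 48 bytes, with the last RET at address 48.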
In Coco, first of all a top-down graph is generated. It is then used to
check several properties of the grammar, and to calculate the start and successor sets. Finally the graph is transformed into G-code, and this is the ultimate
structure in which the grammar is stored.
2.5
Parsing with the G-code
Parsing becomes quite simple with the G-code since the G-code itself is already a parsing program. To make a parser, it is only necessary to code an
interpreter for a G-code program.
In this section we will develop such a parser without error handling. In
the next section we will add the error handling.
Assumptions
We will summarize here the assumptions on which parsing with the G-code is
based.
1. The G-code is derived from a top-down graph that meets the LL(1)
   conditions.
2. If S is the sentence symbol, then the top-down graph and the G-code are
   expanded by the production
     S' = S eofsy,
   where eofsy is the terminal end-of-file that does not appear in the original grammar. The first G-code instruction of this production has the
   address 1.
3. The symbol string to be parsed is supplied by a lexical analyzer, which
   provides the next input symbol in the variable typ for each call. After
   reaching the last source symbol, the lexical analyzer supplies the symbol
   eofsy.
4. The parsing algorithm uses a stack of actual length lacts (= actual length
   of stack) to store the addresses that follow the nonterminal instructions
   currently being processed (these are the "return addresses" of the currently parsed nonterminals).
Overview
The parsing algorithm executes the G-code program which is controlled by the
input string to be recognized. It starts at address 1 and ends at the instruction
for the symbol eofsy. Depending on the current input symbol typ and the
current G-code instruction several courses of action are possible. When the
algorithm tries to recognize a terminal there are two possibilities: if it succeeds
then it moves to the next symbol; if it fails then it goes on to the next alternative (if there is any). When the algorithm tries to recognize a nonterminal,
there are also two possibilities: if the input string and the nonterminal match
then the algorithm pushes the address of the next instruction on the stack and
jumps to the first G-code instruction of the nonterminal; if they do not match
then it goes on to the next alternative (if there is any). At the end of productions, the return address is popped from the stack by RET, and the
algorithm continues from there. When an error occurs, error handling and
synchronization take place, after which parsing continues as if no error had
occurred. The analysis ends when typ = eofsy and the corresponding G-code instruction is T eofsy.
The parsing algorithm is called Parse. It returns a boolean variable
correct which will be true if the analyzed input text is syntactically correct.
Parse is an interpreter that has the following structure:
Parse(↑correct):
  pc:=1;                                  --program counter
  loop
    opcode:=G-code(↓pc);                  --G-code operation
    case opcode of
      t:    execute instruction "T sy" and change pc
    | ta:   execute instruction "TA sy adr" and change pc
      ...
    | jmp:  execute instruction "JMP adr" and change pc
    end
  end
end Parse
Inside the loop, a value is assigned to the result parameter and the loop is
terminated if typ = eofsy.
The simplified G-code parsing algorithm
First we will present a simplified version of Parse that does not contain the
instructions ANY, ANYA, EPS and EPSA, and does not have any error
actions. We further assume that nonterminals are not deletable. For the description of Parse in Adele, we will use the following routines:
Decode(↓pc ↑opcode ↑sy ↑nr ↑adr ↑nextpc)
  returns the parameters of the G-code instruction starting at address pc.
  (An operand that does not appear in the actual instruction returns an undefined value of the corresponding parameter.) nextpc is the address of
  the next instruction.
NewSym(↑typ)
  returns the next input symbol.
Root(↓sy): integer
  returns the address of the first G-code instruction of the production for
  the nonterminal sy.
By using these actions, the simplified algorithm is as follows:
2.43 Algorithm Parse (simplified)
Parse(↑correct):
param
  correct: boolean;                 --correctness indicator
const
  eofsy = ...;                      --end of file symbol
type
  Instruction = (t, ta, nt, nta, jmp, ret);
local
  adr: integer;                     --instruction part adr
  first: array of Symbolset;        --lookahead symbol sets
  lacts: integer;                   --actual stack length
  nextpc: integer;                  --addr. of next G-code instr.
  nr: integer;                      --instruction part nr
  opcode: Instruction;              --instruction part opcode
  pc: integer;                      --program counter
  stack: array of integer;          --nonterminals worked on
  sy: integer;                      --sy part of G-code instr.
  typ: integer;                     --current source symbol
begin
  pc:=1; lacts:=0;                  --init. and read first symbol
  NewSym(↑typ);
  loop
    Decode(↓pc ↑opcode ↑sy ↑nr ↑adr ↑nextpc);   --get instruction at pc
    case opcode of
      t:                            --terminal without alternative
        if typ=sy then              --must match
          if typ=eofsy then
            correct:=true; exit     --terminate successfully
          end;
          pc:=nextpc;               --advance and read
          NewSym(↑typ)
        else
          correct:=false; exit      --terminate unsuccessfully
        end
    | ta:                           --terminal with alternative
        if typ=sy then              --may match
          pc:=nextpc;               --advance and read
          NewSym(↑typ)
        else pc:=adr                --goto alternative
        end
    | nt:                           --nonterminal without alternative
        if typ in first(sy) then    --must match
          lacts:=lacts+1;
          stack(lacts):=nextpc;
          pc:=Root(↓sy)
        else
          correct:=false; exit      --terminate if error
        end
    | nta:                          --nonterminal with alternative
        if typ in first(sy) then    --may match
          lacts:=lacts+1;
          stack(lacts):=nextpc;
          pc:=Root(↓sy)
        else pc:=adr                --goto alternative
        end
    | jmp:                          --jump to next instruction
        pc:=adr
    | ret:                          --return
        pc:=stack(lacts);           --find follower instruction in stack
        lacts:=lacts-1
    end --case
  end --loop
end Parse.
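The interpreter loop of Algorithm 2.43 can be sketched in Python and run on the G-code of the expression grammar, using the addresses, first sets and epsset of Section 2.4. Unlike Algorithm 2.43, this sketch also interprets EPS, taken from the complete algorithm but without error handling, because that G-code contains EPS instructions. The names GCODE, ROOT, FIRST, EPSSET and parse are hypothetical.

```python
# Instruction sizes in bytes (Definition 2.42).
SIZE = {"T": 2, "TA": 4, "NT": 2, "EPS": 2, "JMP": 3, "RET": 1}

# G-code of the expression grammar, keyed by address.
GCODE = {
    1: ("NT", "S"), 3: ("T", "eofsy"), 5: ("RET",),
    6: ("NT", "expr"), 8: ("RET",),
    9: ("NT", "term"), 11: ("TA", "+", 20), 15: ("NT", "term"),
    17: ("JMP", 11), 20: ("EPS", 1), 22: ("RET",),
    23: ("NT", "factor"), 25: ("TA", "*", 34), 29: ("NT", "factor"),
    31: ("JMP", 25), 34: ("EPS", 2), 36: ("RET",),
    37: ("TA", "v", 42), 41: ("RET",),
    42: ("T", "("), 44: ("NT", "expr"), 46: ("T", ")"), 48: ("RET",),
}
ROOT = {"S": 6, "expr": 9, "term": 23, "factor": 37}   # Root(sy)
FIRST = {nt: {"v", "("} for nt in ROOT}                  # first(sy)
EPSSET = {1: {"eofsy", ")"}, 2: {"eofsy", ")", "+"}}    # epsset(nr)


def parse(symbols):
    """Return True iff symbols (without the final eofsy) form a sentence."""
    source = iter(list(symbols) + ["eofsy"])
    typ = next(source)                       # NewSym
    pc, stack = 1, []
    while True:
        op, *args = GCODE[pc]                # Decode
        nextpc = pc + SIZE[op]
        if op == "T":                        # terminal, must match
            if typ != args[0]:
                return False
            if typ == "eofsy":
                return True
            pc, typ = nextpc, next(source)
        elif op == "TA":                     # terminal, may match
            if typ == args[0]:
                pc, typ = nextpc, next(source)
            else:
                pc = args[1]                 # goto alternative
        elif op == "NT":                     # nonterminal, must match
            if typ not in FIRST[args[0]]:
                return False
            stack.append(nextpc)             # push follower
            pc = ROOT[args[0]]
        elif op == "EPS":                    # epsilon, must match
            if typ not in EPSSET[args[0]]:
                return False
            pc = nextpc
        elif op == "JMP":
            pc = args[0]
        elif op == "RET":
            pc = stack.pop()                 # return to follower
```

`parse(["v", "+", "v"])` and `parse(["(", "v", "*", "v", ")"])` succeed. `parse(["v", "v"])` fails at EPS 2, because v is not a successor of term.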
The complete G-code parsing algorithm
We will now add the interpretation of the instructions and properties that were
left out in the previous section, and provide the following explanations.
The instruction ANY recognizes any source symbol, and ANYA recognizes any source symbol that is a member of the lookahead set belonging to
this instruction.
The instructions EPS and EPSA recognize the empty string if the source
symbol matches their lookahead set.
In the case of an error, the analysis is not terminated. Rather, the
error handler
  Error(↕pc ↕altroot)
is executed. Error requires as parameters the address pc of the non-matching G-code instruction and the address altroot (root of alternative chain) of
the first G-code instruction of the alternative chain in which the error occurred.
Error synchronizes by skipping input symbols, changes pc and altroot,
and sets correct to false. Error is thus local to Parse.
Every time an input symbol has been successfully parsed, the next symbol can be read, and altroot can be set to a new alternative chain. For semantic
reasons, however, these actions are delayed until the input symbol is actually
required by the parser. Instead of reading a symbol, the variable mustread is
set to true.
Furthermore, in the complete version we will consider the possibility that
a nonterminal X can be derived into the empty string. This can be tested with
the function
  Deletable(↓X): boolean
Such a nonterminal is always recognized, even if the current input symbol
does not belong to its terminal start symbols (explanation in Section 7.3.3).
This requires the interpretation of the instructions NT and NTA to be
extended.
Expanded in this way, the algorithm Parse has the following complete
form:
2.44 Algorithm Parse (complete)
Parse(↑correct):
param
  correct: boolean;                 --correctness indicator
const
  eofsy = ...;                      --end of file symbol
type
  Instruction = (t, ta, nt, nta, any, anya, eps, epsa, jmp, ret);
local
  adr: integer;                     --instruction part adr
  altroot: integer;                 --root of alternative chain
  anyset: array of Symbolset;       --lookahead symbol sets
  epsset: array of Symbolset;       --lookahead symbol sets
  first: array of Symbolset;        --lookahead symbol sets
  lacts: integer;                   --actual stack length
  mustread: boolean;                --typ is consumed
  nextpc: integer;                  --address of next G-instruction
  nr: integer;                      --instruction part nr
  opcode: Instruction;              --instruction part opcode
  pc: integer;                      --program counter
  stack: array of integer;          --nonterminals worked on
  sy: integer;                      --instruction part sy
  typ: integer;                     --current source symbol
  Error(↕pc ↕altroot)               --local error procedure
begin
  pc:=1; altroot:=1;                --initialize
  mustread:=true; correct:=true;
  lacts:=0;
  loop
    Decode(↓pc ↑opcode ↑sy ↑nr ↑adr ↑nextpc);   --get instruction at pc
    if mustread then                --read next source symbol
      NewSym(↑typ); altroot:=pc;
      mustread:=false
    end;
    case opcode of
      t:                            --terminal without alternative
        if typ=sy then              --must match
          if typ=eofsy then
            exit                    --terminate
          end;
          pc:=nextpc;               --advance
          mustread:=true
        else Error(↕pc ↕altroot)    --sets correct:=false
        end
    | ta:                           --terminal with alternative
        if typ=sy then              --may match
          pc:=nextpc;               --advance
          mustread:=true
        else pc:=adr                --goto alternative
        end
    | nt:                           --nonterminal without alternative
        if (typ in first(sy)) or Deletable(↓sy) then   --must match
          lacts:=lacts+1;
          stack(lacts):=nextpc;     --push follower
          pc:=Root(↓sy); altroot:=pc   --parse rule for nonterminal
        else Error(↕pc ↕altroot)    --sets correct:=false
        end
    | nta:                          --nonterminal with alternative
        if (typ in first(sy)) or Deletable(↓sy) then   --may match
          lacts:=lacts+1;
          stack(lacts):=nextpc;     --push follower
          pc:=Root(↓sy); altroot:=pc   --parse rule for nonterminal
        else pc:=adr                --goto alternative
        end
    | any:                          --any without alternative
        pc:=nextpc; mustread:=true  --advance
    | anya:                         --any with alternative
        if typ in anyset(nr)
        then pc:=nextpc; mustread:=true   --advance
        else pc:=adr                --goto alternative
        end
    | eps:                          --epsilon
        if typ in epsset(nr)        --must match
        then pc:=nextpc             --advance
        else Error(↕pc ↕altroot)    --sets correct:=false
        end
    | epsa:                         --epsilon with alternative
        if typ in epsset(nr)        --may match
        then pc:=nextpc             --advance
        else pc:=adr                --goto alternative
        end
    | jmp:                          --jump to next instruction
        pc:=adr
    | ret:                          --return
        pc:=stack(lacts);           --find follower instruction in stack
        lacts:=lacts-1;
        altroot:=pc
    end --case
  end --loop
end Parse.

2.6
Error handling
Principle
A syntax error arises in one of three situations: (1) the input symbol typ does
not match the symbol sy in the G-code instruction T; (2) typ is not a terminal start symbol of the instruction NT; (3) typ is not a terminal successor of
the instruction EPS. In any of these situations, the variable altroot contains
the address of the alternative chain in which the error occurred and the stack
contains the return addresses of all nonterminals that are currently being
processed.
This is sufficient information to collect all terminals that can be used to
continue the analysis. The following example illustrates the situation.
2.45 Example
Error situation
Consider the grammar fragment:
program      = declarations body "end".
declarations = ... .
body         = statement {statement}.
statement    = ... | "if" relation "then" body ... .
relation     = expr relop expr.
relop        = "=" | "<>" | "<" | "<=" | ">" | ">=".
expr         = ... .
and the input text
  ... if a:=b then c:=d end ...
When the syntax analyzer detects the error caused by ':=', the situation shown in Fig. 2.12 has been reached. The boxes in this figure enclose the grammar symbols of the G-code instructions whose addresses
are in the stack.
Fig. 2.12 Partial syntax tree of an erroneous translation of the instruction if a:=b then c:=d end
The last input symbol that was correctly recognized is a. It was recognized as expr. Then relop must follow. Since relop cannot start with
':=', the procedure Error(↕pc ↕altroot) is called. The stack contains the
addresses of the G-code instructions for the recognition of
  eof, end, statement, then
listed from the bottom of the stack to the top of the stack.
We will now collect the so-called 'anchors', i.e. all terminals that are suitable
for the resumption of the syntax analysis. They can be grouped into four
classes:
1. All terminal start symbols of the alternative chain starting at altroot, because the erroneous symbol may have been added inadvertently by the
   programmer in front of a symbol of the unrecognized alternative chain. In the
   example, these are the symbols >, >=, =, <>, <=, <.
2. All terminal successors of the alternative chain at altroot, because the erroneous symbol may appear in place of a symbol of the unrecognized
   alternative chain. In the example, this set consists of the beginnings of
   expr: v, c, +, -, (.
3. The terminal start symbols of all symbols in the stack, and of the alternative chains beginning with them. With these, syntax analysis can be
   resumed after a non-recognized nonterminal. In the example these are the
   symbols then, end, eofsy and the set first(statement).
4. All terminal successors of the alternative chains whose addresses are in
   the stack. In the example, these are all terminal start symbols of body,
   since body follows then, and all terminal start symbols of statement,
   since statement follows statement.
While the inclusion of items 1 to 3 in the set of anchors is plausible, the inclusion of item 4 seems rather arbitrary. We could justify it by the fact that
items 3 and 4 are symmetric to items 1 and 2, but there is a heuristic reason as
well. In a grammar where the ';' is a statement separator rather than a statement terminator, the set of anchors would, without rule 4, contain the ';' but not
the start symbols of statements. Then, in the case of a missing ';' between
statements, which is a common error, the next statement would be skipped.
Rule 4 prevents this by adding the start symbols of statements to the set of
anchors. Similar errors, e.g. the omission of a comma between expressions, are also quite likely to occur.
Now, input symbols are skipped until one of them appears in the set of
the anchors. In the worst case this appears at the end of the input text, since
eofsy is always among the anchors. Next, the stack must be corrected. If the
anchor is a terminal start symbol of the alternative chain whose address is in
stack(t), analysis will be resumed at this address and the stack length will be
reduced to t - 1. In Example 2.45, only ':=' is skipped, since b is a start symbol of expr, which follows relop in the alternative chain at altroot (class 2); the stack is not reduced.
In summary, we can describe the principle of error handling as follows:
2.46 Principle of error handling
An error is detected if an alternative chain is unsuccessfully traversed up
to its end. Then the error is flagged and the analysis must be synchronized. The synchronization consists of collecting a set of anchors and of
skipping the input text up to the first input symbol that is contained in the
set of anchors. With it, the analysis can be resumed at the address pc of
the anchor. During this process the stack is reduced if necessary so that at
the end of the error handling the following assertion holds:
Starting with the G-code instruction at pc the analysis can be continued with the current input symbol typ (typ matches the alternative
chain at pc). The stack contains the return addresses of all nontermi-
nals currently under process when continuing the analysis with pc.
This error handling has two remarkable features:
1. It is completely independent of the syntax of the input language.
2. Anchors are collected only if an error is detected. The method is therefore completely dynamic and starts anew for each error. Hence, the presence of
   error handling does not reduce the parsing speed for a correct input
   string. The synchronization itself is expensive, but, since errors are infrequent, this is only a slight disadvantage.
The algorithm Error
From the preceding section, the basic structure of the algorithm Error is
now obvious:
2.47 Algorithm Error (basic structure)
Error(↕pc ↕altroot):
global
  correct: boolean;
  lacts: integer;                   --actual stack length
begin
  correct:=false;
  Print error message;
  Collect anchors;
  Skip input symbols up to the first anchor;
  Correct pc, altroot and lacts
  -- It is synchronized. The analysis can continue
end Error
Error messages
The error messages are also independent of the input language. At the error
location, we simply extract all expected symbols from the G-code and list
them. In Example 2.45, the following error message will occur:
  ... if a:=b then c:=d end ...
          |
          relop expected
This message is sufficient for most purposes. In Coco we also provide the
option for the user to output his own error messages (see Section 5.2.2).
The collection of anchors
Since, after synchronization, parsing is resumed with a new G-code instruction newpc and with a new stack length newlacts, anchors are collected as
triples:
(newtyp, newpc, newlacts)
A procedure Triple produces a triple list in which the following triple categories are included:
1. the terminal start symbols of the alternative chain beginning with altroot;
2. the terminal successors of the alternative chain beginning with altroot;
3. the terminal start symbols of all alternative chains whose addresses are in
   the stack;
4. the terminal successors of all alternative chains whose addresses are in the
   stack.
If a terminal belongs to more than one of the four categories, category 1 has
priority (because no symbol needs to be read). Category 2 has priority over
categories 3 and 4 (because synchronization can take place in the same production where the error occurred). Of the anchors derived from the stack, the
ones closest to the error location have priority, and the terminal start symbols
of the stacked alternative chains have priority over their successors.
In order to fill the triple list with terminal start symbols and successors
corresponding to the priority rules, we use the algorithms Fill and FillSucc.
Hence, the algorithm Triple has the following form:
2.48 Algorithm Triple
Triple(↓altroot):
global
  triple list;
  stack: array of integer;
  lacts: integer;                   --actual stack size
begin
  triple list := empty;
  for i:=1 to lacts do
    FillSucc(↓stack(i) ↓i-1);       --class 4
    Fill(↓stack(i) ↓i-1)            --class 3
  end;
  FillSucc(↓altroot ↓lacts);        --class 2
  Fill(↓altroot ↓lacts)             --class 1
end Triple
As a concrete data structure for the triple list, we use two arrays newpc and
newlacts, which are indexed with the maxt + 1 terminals of the grammar (the index plays the role of newtyp):
  newpc, newlacts: array(0:maxt) of integer
The algorithms Fill and FillSucc use the following procedure to obtain G-code instructions:
  GetSymInstr(↓pc ↑opcode ↑sy ↑nextpc ↑altpc)
which supplies the G-code instruction at pc. The last two parameters have the
following meaning:
  nextpc: Address of the first 'symbol-recognizing' instruction
          (T, TA, NT, NTA, ANY, ANYA) which follows the instruction at pc in the same
          production, or 0 if no such instruction exists.
  altpc:  Address of the first 'symbol-recognizing' instruction which is an
          alternative of the instruction at pc, or 0 if no such instruction exists.
Fill and FillSucc can now be described as follows:
2.49 Algorithm Fill
Fill(↓firstpc ↓lacts):
global
  newpc, newlacts: array(0:maxt) of integer;
begin
  pc:=firstpc;
  while pc≠0 do
    GetSymInstr(↓pc ↑opcode ↑sy ↑nextpc ↑altpc);
    case opcode of
      t, ta:
        newpc(sy):=pc; newlacts(sy):=lacts
    | nt, nta, nts, ntas:
        for all x ∈ first(sy) do
          newpc(x):=pc; newlacts(x):=lacts
        end
    | any, anya:                    --nothing (eps and ret do not exist)
    end;
    pc:=altpc
  end
end Fill
2.50 Algorithm FillSucc
FillSucc(↓startpc ↓lacts):
global
  newpc, newlacts: array(0:maxt) of integer;
begin
  pc:=startpc;
  while pc≠0 do
    GetSymInstr(↓pc ↑opcode ↑sy ↑nextpc ↑altpc);
    if nextpc≠0 then
      Fill(↓nextpc ↓lacts)
    end;
    pc:=altpc
  end
end FillSucc
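The anchor collection of Algorithms 2.48 to 2.50 can be sketched in Python. The sketch makes these assumptions:

- The GetSymInstr results for the expression grammar of Section 2.4 are written out by hand in the hypothetical table SYM_INSTR.
- The triple list is a dictionary from a terminal to (newpc, newlacts).
- Later calls overwrite earlier entries, so the call order in Triple realizes the priority rules.

The situation, worked out by hand, is the input `( v v )`. There the second v makes EPS 2 at address 34 fail with altroot = 25 and the stack [3, 8, 11, 25, 46, 11].

```python
# Hypothetical GetSymInstr table: pc -> (opcode, sy, nextpc, altpc).
# Only symbol-recognizing instructions appear; 0 and the addresses of
# RET, EPS and JMP instructions are absent.
SYM_INSTR = {
    1: ("nt", "S", 3, 0), 3: ("t", "eofsy", 0, 0), 6: ("nt", "expr", 0, 0),
    9: ("nt", "term", 11, 0), 11: ("ta", "+", 15, 0), 15: ("nt", "term", 11, 0),
    23: ("nt", "factor", 25, 0), 25: ("ta", "*", 29, 0), 29: ("nt", "factor", 25, 0),
    37: ("ta", "v", 0, 42), 42: ("t", "(", 44, 0), 44: ("nt", "expr", 46, 0),
    46: ("t", ")", 0, 0),
}
FIRST = {nt: {"v", "("} for nt in ("S", "expr", "term", "factor")}


def fill(firstpc, lacts, anchors):
    """Enter the start symbols of the alternative chain at firstpc."""
    pc = firstpc
    while pc in SYM_INSTR:
        opcode, sy, _nextpc, altpc = SYM_INSTR[pc]
        if opcode in ("t", "ta"):
            anchors[sy] = (pc, lacts)
        elif opcode in ("nt", "nta"):
            for x in FIRST[sy]:
                anchors[x] = (pc, lacts)
        pc = altpc


def fill_succ(startpc, lacts, anchors):
    """Enter the successors of the alternative chain at startpc."""
    pc = startpc
    while pc in SYM_INSTR:
        _opcode, _sy, nextpc, altpc = SYM_INSTR[pc]
        if nextpc != 0:
            fill(nextpc, lacts, anchors)
        pc = altpc


def triple(altroot, stack):
    """Map each anchor terminal to (newpc, newlacts); later entries win."""
    anchors = {}
    for i, addr in enumerate(stack, start=1):
        fill_succ(addr, i - 1, anchors)       # class 4
        fill(addr, i - 1, anchors)            # class 3
    fill_succ(altroot, len(stack), anchors)   # class 2
    fill(altroot, len(stack), anchors)        # class 1
    return anchors


def synchronize(anchors, symbols, i):
    """Skip symbols[i:] up to the first anchor; eofsy is always one."""
    while symbols[i] not in anchors:
        i += 1
    newpc, newlacts = anchors[symbols[i]]
    return i, newpc, newlacts
```

Here `triple(25, [3, 8, 11, 25, 46, 11])` maps v to (29, 6), so the second v is not skipped. Parsing resumes at NT factor (address 29) as if the missing '*' were present. A ')' would instead resume at address 46 with the stack reduced to length 4.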
Heuristic improvements
This synchronization procedure works well in most cases and synchronizes
rapidly. However, it is not uncommon for the synchronization to be incorrect,
causing spurious error messages or the skipping of longer text portions. The
quality of the synchronization also depends on the grammar. It can be
improved by partitioning long grammar productions into several shorter
ones, which increases the number of anchors.
We have improved the procedure with two heuristics, which are also independent of the grammar:
1. If several errors occur close together, we print only the first one, under
   the assumption that the remaining errors are spurious, resulting from the
   first one. We introduce an error distance, errdist, which is set to 0 after
   the handling of any error, and is increased by one for each input symbol
   read. If errdist is less than a predetermined limit errdistmin when an
   error occurs, no error message is given. We use errdistmin = 2, i.e. at
   least two symbols must have been recognized since the last error; otherwise a spurious error is assumed.
2. When a spurious error occurs, the stack may have already changed from
   the value it had when the original error occurred. Therefore, we save the stack
   at each original error, and restore it at a spurious error.
The heuristics only apply to the program Error and not to its subprograms.
Error now has the final form:
2.51 Algorithm Error (with heuristic enhancements)
Error(↕pc ↕altroot):
global
  correct: boolean;
  lacts: integer;                   --stack length
  errdist: integer;                 --error distance
  errdistmin: integer;              --minimal error distance
begin
  correct:=false;
  if errdist>=errdistmin then
    Print error message;
    Collect the anchors;
    Save the stack
  else Restore the stack
  end;
  Skip input symbols up to the first anchor;
  Correct pc, altroot, and lacts;
  -- It is synchronized again. The analysis can continue
  errdist:=0
end Error
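The two heuristics can be sketched in Python, limited to the error message and stack handling; anchor collection and skipping are left out. The class name ErrorReporter and its methods are assumptions. The initial value of errdist is also an assumption: it is chosen so that the first error is always reported, because the text does not say how errdist is initialized.

```python
class ErrorReporter:
    """Error-distance heuristic of Algorithm 2.51 (message and stack part only)."""

    def __init__(self, errdistmin=2):
        self.errdistmin = errdistmin
        self.errdist = errdistmin   # assumption: the first error is always reported
        self.saved_stack = None
        self.messages = []

    def symbol_read(self):
        """Called for each input symbol read."""
        self.errdist += 1

    def error(self, message, stack):
        """Handle an error; return the stack to synchronize with."""
        if self.errdist >= self.errdistmin:     # original error
            self.messages.append(message)
            self.saved_stack = list(stack)      # save the stack
            result = list(stack)
        else:                                   # spurious error: no message
            result = list(self.saved_stack)     # restore the saved stack
        self.errdist = 0
        return result
```

Suppose a second error arrives after only one symbol. It produces no message and gets back the stack saved at the first error.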
Coco includes the above error-handling method in the generated parser.
A similar error handling was published by Spenke et al. [1984]. They
assign weights to the anchors and make the use of an anchor for synchronization dependent upon its 'insertion overhead' and its 'reliability'.
3
Semantics
Syntax analysis checks a source program only for formal correctness. That is,
it only determines whether the input string is a sentence of the given grammar.
This function is shown in Fig. 3.1.
Fig. 3.1 Parser (input: source program as a character sequence; output: recognized or not recognized)
Translation into a target language presents the additional requirement that the
source program must be transformed into the target program. The 'meaning'
of the target program should be the same as that of the source program, i.e.
the semantics should be retained. A program that does this is a compiler (Fig.
3.2).
Fig. 3.2 Compiler (input: source program; output: target program)
A compiler emerges from a parser if the parser is able to emit so-called 'semantic actions’ each time it has parsed some syntactic construct. The semantic
actions in turn generate output symbols which constitute in their entirety the
target program.
This chapter covers attributed grammars, which are presently the most
common technique for the formal description of translation processes. To describe the translation, the context-free grammar for the source program is enhanced by three items:
1. semantic actions, which describe the actions that must be performed during the translation;
2. attributes, which describe properties of the grammar symbols and their
   environment;
3. context conditions, which describe relationships between attributes.
We will introduce these three items one-by-one, then cover the formalism of
the attributed grammar as a whole, and finally cover a subset of the attributed
grammars, the so-called L-attributed grammars, used by Coco.
3.1
Semantic actions
The description of semantic actions can be inserted directly at the desired locations in the grammar productions, e. g. by means of the special delimiters
sem ... endsem.
For a left-to-right parsing of a production $A \to \omega_1 \omega_2$, the execution of
the semantic action statseq after parsing $\omega_1$ and before parsing $\omega_2$ can be described by inserting the semantic action between $\omega_1$ and $\omega_2$:
$$A \to \omega_1\ \textbf{sem}\ \mathit{statseq}\ \textbf{endsem}\ \omega_2$$
This production is to be interpreted in such a way that, for the parsing of A,
where syntax analysis proceeds from left to right, first @; is parsed, then the
semantic action statseq is performed, and afterwards w2 is parsed.
For the description of the semantic actions themselves there are no generally accepted conventions. We will use the language constructs of Adele or
Modula-2.
3.1 Example Semantic actions
Given a grammar of an arbitrary sequence of zeros and ones:
    S → 0S | 1S | ε
The task consists of reversing a sentence ω of L(G(S)) to produce an output where the first input symbol is output as the last, the second input symbol is output as the next to last, and so on. This translation is simply written as
Sec. 3.2
Attributes
    S → 0 S sem Write('0') endsem
      | 1 S sem Write('1') endsem
      | ε.
For a given input sentence, e.g. ω = 001, the semantic actions can be traced according to the syntax tree of Fig. 3.3.
If parsing is performed top-down from left to right, the output string
100 results.
Fig. 3.3 Syntax tree with semantic actions (input 001)
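Example 3.1 maps naturally onto recursive descent: each production becomes a procedure, and a semantic action becomes a statement at the same position in the procedure body. The following Python sketch is only an illustration under that assumption; the function name translate_reverse is ours, and Write is modelled by appending to a list.

```python
def translate_reverse(text: str) -> str:
    """Recursive-descent sketch of Example 3.1:
    S -> 0 S sem Write('0') endsem | 1 S sem Write('1') endsem | eps
    """
    out = []   # stands in for the output written by Write
    pos = 0    # index of the next input symbol

    def parse_s() -> None:
        nonlocal pos
        if pos < len(text) and text[pos] in "01":
            sym = text[pos]
            pos += 1         # recognize the terminal 0 or 1
            parse_s()        # parse the trailing S first ...
            out.append(sym)  # ... then execute the semantic action Write(sym)
        # otherwise the epsilon alternative applies: nothing to parse or write

    parse_s()
    if pos != len(text):
        raise SyntaxError(f"unexpected symbol {text[pos]!r} at position {pos}")
    return "".join(out)


print(translate_reverse("001"))  # prints 100
```

Because Write comes after the recursive call, the output for each symbol is produced only when its subtree has been completely parsed, which reverses the sequence.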
The next example will show that this method can also describe more difficult
transformations.
3.2 Example Semantic actions
Given the grammar of the previous example, the task is to transform an input sequence of n zeros and m ones into an output sequence of the same length which contains all n zeros followed by all m ones, i.e. the sequence $0^n 1^m$. This translation is described by
    S → 0 sem Write('0') endsem S
      | 1 S sem Write('1') endsem
      | ε.
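The only change from Example 3.1 is the position of the first semantic action. In a recursive-descent sketch (again hypothetical, with Write modelled by a list), moving Write('0') in front of the recursive call is enough to produce all zeros before all ones:

```python
def translate_sort(text: str) -> str:
    """Recursive-descent sketch of Example 3.2:
    S -> 0 sem Write('0') endsem S | 1 S sem Write('1') endsem | eps
    """
    out = []
    pos = 0

    def parse_s() -> None:
        nonlocal pos
        if pos < len(text) and text[pos] == "0":
            pos += 1
            out.append("0")   # Write('0') runs before the trailing S is parsed
            parse_s()
        elif pos < len(text) and text[pos] == "1":
            pos += 1
            parse_s()
            out.append("1")   # Write('1') runs after the trailing S is parsed
        # otherwise: epsilon alternative

    parse_s()
    if pos != len(text):
        raise SyntaxError(f"unexpected symbol {text[pos]!r} at position {pos}")
    return "".join(out)


print(translate_sort("1010"))  # prints 0011
```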
3.2 Attributes
Even for such a simple task as the transformation of the input sequence '79 + 83' into the output sequence '162', a grammar with semantic actions alone fails. More generally, it fails whenever an input of two numbers connected by '+' is to be translated into an output that shows the sum of the two numbers. Why?
When recognizing a constant, the lexical analyzer supplies only the terminal class c (as explained up to now). Thus, the parser 'sees' only the sequence c + c as input. A semantic action that produces the sum of the two numbers, however, is not satisfied with the terminal classes of the two numbers, but requires the values of the constants. These values are the semantic properties of the individual members of the terminal class c. Thus, a lexical analyzer will have to supply two items for input symbols that are terminal classes: the type and the value of the input symbol. The symbol type (not to be confused with the data type) is the terminal symbol in the context of the grammar (variable, constant) and is therefore a syntactic property; the symbol value is a semantic property.
By assigning an attribute to each terminal symbol that represents a terminal class, the semantic properties of terminal classes can be introduced into the formal language description. We write attributes as indices preceded by an arrow, so that a constant now takes the form c↑x, where x is of type integer. The up-arrow shows that x is the result of the parsing of c, i.e. has the character of an output parameter.
By the use of attributes, we can describe the task of reading and adding
two constants connected by a plus sign as follows:
    S → c↑x + c↑y sem Write(x+y) endsem
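The point of this paragraph is that the scanner must deliver a (type, value) pair for every terminal class. The following Python sketch assumes a hypothetical scanner that does exactly that and a parser for the single production above; Token, scan and translate_sum are illustrative names, not part of Coco.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    kind: str              # symbol type: the terminal symbol of the grammar ("c" or "+")
    value: Optional[int]   # symbol value: the attribute, present only for the class c


def scan(text: str) -> list:
    """Hypothetical lexical analyzer delivering (kind, value) pairs."""
    digits = "0123456789"
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in digits:
            j = i
            while j < len(text) and text[j] in digits:
                j += 1
            tokens.append(Token("c", int(text[i:j])))  # constant together with its value
            i = j
        elif ch == "+":
            tokens.append(Token("+", None))
            i += 1
        else:
            raise SyntaxError(f"illegal character {ch!r}")
    return tokens


def translate_sum(text: str) -> str:
    """S -> c↑x + c↑y sem Write(x+y) endsem"""
    tokens = scan(text)
    if [t.kind for t in tokens] != ["c", "+", "c"]:   # the syntactic check sees only kinds
        raise SyntaxError("expected: constant + constant")
    x = tokens[0].value    # c↑x
    y = tokens[2].value    # c↑y
    return str(x + y)      # semantic action Write(x+y)


print(translate_sum("79 + 83"))  # prints 162
```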
In general, attributes describe properties that are associated with a grammar symbol. Therefore, nonterminals can also have one or more attributes. For example, let the following three properties apply to the symbol expr: (1) the type of the expression, (2) whether the expression has no operators, and (3) whether the value of the expression is known at compile time. Then we can assign these three attributes to expr by writing
    expr↑exprtype ↑simple ↑valueknown
exprtype may assume various values depending on the data type of the expression; simple and valueknown can assume boolean values. In general, one can assign to each nonterminal and to each terminal class X of the context-free grammar a number of attributes that describe those properties of X that cannot be described by the context-free grammar alone. Each attribute can assume values from a predetermined set; this set forms the attribute type. The attributes of terminal classes receive their values when the lexical analyzer recognizes the terminal symbols. The values of the attributes of all nonterminals are calculated by the semantic actions.
3.3 Example Interpretation of arithmetic expressions
Consider the grammar of arithmetic expressions consisting of numbers,
operators, and parentheses, and terminated by a semicolon:
    S → E ;
    E → T | E + T
    T → F | T * F
    F → c | ( E )
We want to define formally the meaning of such an expression by a description of its interpretation. 'Interpretation' means that an expression will be read, its value computed, and the result printed. In the formal description it must be stated that each symbol of the grammar, except for operators, parentheses, and semicolons, has a value. This value is denoted by an attribute. For example, the production F → c is verbally interpreted by the sentence 'the value of the factor F is the value of the constant c' and formally by the production:
    F↑a → c↑b sem a:=b endsem
Similarly, multiplication is described by the attributed production:
    T↑a → T↑b * F↑c sem a:=b*c endsem
This means: "When recognizing the right-hand side, the attributes b and c are assigned a value; subsequently the product of these values is computed and assigned to the attribute a of the symbol T." Correspondingly, the remaining productions of the grammar can be assigned attributes and semantic actions, so the complete description is as follows:
    S   → E↑a ;          sem Write(a) endsem
    E↑a → T↑b            sem a:=b endsem
        | E↑b + T↑c      sem a:=b+c endsem
    T↑a → F↑b            sem a:=b endsem
        | T↑b * F↑c      sem a:=b*c endsem
    F↑a → c↑b            sem a:=b endsem
        | ( E↑b )        sem a:=b endsem
Such a description is called an attributed grammar.
A simplified notation
The reader may notice that in Example 3.3 most semantic actions consist of
only an assignment. It is therefore a useful shortcut to abbreviate
    F↑a → c↑b sem a:=b endsem
by
    F↑a → c↑a
This notation expresses the fact that the attribute of c is assigned to the output attribute of F without change.
Attributes and semantic actions in EBNF
The extended Backus-Naur form (EBNF) can also be used for the description of attributed grammars. Example 3.4 is the same as Example 3.3 but uses the simplified notation in EBNF form.
3.4 Example Interpretation of arithmetic expressions in EBNF
    S   → E↑a ; sem Write(a) endsem.
    E↑a → T↑a { + T↑b sem a:=a+b endsem }.
    T↑a → F↑a { * F↑b sem a:=a*b endsem }.
    F↑a → c↑a | ( E↑a ).
With this notation, one can see how the visual separation of syntax and
semantics significantly improves readability.
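The EBNF form is also the one that translates most directly into a recursive-descent program: each repetition { ... } becomes a while loop, and each output attribute becomes the return value of the corresponding procedure. The Python sketch below illustrates this under our own assumptions (a tiny hypothetical scanner, and S returning the value instead of writing it, so that it can be tested).

```python
def tokenize(text: str) -> list:
    """Hypothetical scanner: (kind, value) pairs terminated by ("eof", None)."""
    digits = "0123456789"
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in digits:
            j = i
            while j < len(text) and text[j] in digits:
                j += 1
            tokens.append(("c", int(text[i:j])))   # c↑a carries the constant's value
            i = j
        elif ch in "+*();":
            tokens.append((ch, None))
            i += 1
        else:
            raise SyntaxError(f"illegal character {ch!r}")
    tokens.append(("eof", None))
    return tokens


class Interpreter:
    """One method per production of Example 3.4; output attributes are return values."""

    def __init__(self, text: str):
        self.tokens = tokenize(text)
        self.pos = 0

    def kind(self) -> str:
        return self.tokens[self.pos][0]

    def expect(self, kind: str):
        if self.kind() != kind:
            raise SyntaxError(f"expected {kind!r}, found {self.kind()!r}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def S(self) -> int:      # S -> E↑a ; sem Write(a) endsem
        a = self.E()
        self.expect(";")
        self.expect("eof")
        return a             # returned instead of written, for easier testing

    def E(self) -> int:      # E↑a -> T↑a { + T↑b sem a:=a+b endsem }
        a = self.T()
        while self.kind() == "+":
            self.expect("+")
            b = self.T()
            a = a + b
        return a

    def T(self) -> int:      # T↑a -> F↑a { * F↑b sem a:=a*b endsem }
        a = self.F()
        while self.kind() == "*":
            self.expect("*")
            b = self.F()
            a = a * b
        return a

    def F(self) -> int:      # F↑a -> c↑a | ( E↑a )
        if self.kind() == "c":
            return self.expect("c")
        self.expect("(")
        a = self.E()
        self.expect(")")
        return a


print(Interpreter("2 + 3 * (4 + 1);").S())  # prints 17
```

Note that the left-recursive productions of Example 3.3 could not be coded this way directly; the EBNF repetitions sidestep the problem.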
Input and output attributes
All of the previously used attributes behave like output parameters: they are
generated by the parsing of a terminal or a nonterminal, and are used afterwards. We therefore call them derived or synthesized attributes and denote
them by an up-arrow. But nonterminals can also have attributes that behave as
input parameters, i.e. attributes that already have values, when the parsing of
the nonterminal starts. Then, semantic actions which are executed during the
parsing of the nonterminal can use these values. We call such attributes inherited attributes, and denote them by a down-arrow. The next example
shows the application of inherited attributes.
3.5 Example Inherited attributes
Given the following grammar, which describes the declaration of variables:
    S → dcl typ idlist
    typ → int | real | bool
    idlist → id | idlist , id
id is the terminal class of all identifiers. The declaration consists of a keyword dcl, a type, and one or more variables of this type, for example: dcl int x, y, z. The semantic action that should be performed during parsing of the declaration consists of entering each variable's name (name) and type (t) into the name list. Let this be done by a call of the procedure NewId(↓name, ↓t). It is appropriate to call NewId immediately after the parsing of an identifier id in the production for idlist. But how can the type be known at this point, since it was already parsed in the production for typ?
The solution is to attach the type t as an inherited attribute to the nonterminal idlist:
    S → dcl typ↑t idlist↓t
    idlist↓t → id↑name sem NewId(↓name, ↓t) endsem
             | idlist↓t , id↑name sem NewId(↓name, ↓t) endsem
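In a recursive-descent implementation an inherited attribute is simply an input parameter. The Python sketch below is a hypothetical illustration: new_id stands in for NewId, the types are assumed to be int, real and bool, and the left-recursive idlist production is rewritten as a loop, as a recursive-descent parser requires.

```python
def translate_dcl(text: str) -> list:
    """Sketch of Example 3.5: S -> dcl typ↑t idlist↓t.
    Returns the name list built by the hypothetical new_id (NewId)."""
    name_list = []                       # stand-in for the compiler's name list

    def new_id(name: str, t: str) -> None:
        name_list.append((name, t))      # NewId(↓name, ↓t)

    words = text.replace(",", " , ").split()
    pos = 0

    def ident() -> str:                  # id↑name
        nonlocal pos
        if pos < len(words) and words[pos].isidentifier():
            pos += 1
            return words[pos - 1]
        raise SyntaxError("identifier expected")

    def typ() -> str:                    # typ↑t
        nonlocal pos
        if pos < len(words) and words[pos] in ("int", "real", "bool"):
            pos += 1
            return words[pos - 1]
        raise SyntaxError("type expected")

    def idlist(t: str) -> None:          # idlist↓t: t arrives as an input parameter
        nonlocal pos
        new_id(ident(), t)
        while pos < len(words) and words[pos] == ",":   # left recursion as a loop
            pos += 1
            new_id(ident(), t)

    if not words or words[0] != "dcl":
        raise SyntaxError("'dcl' expected")
    pos = 1
    t = typ()       # output attribute of typ ...
    idlist(t)       # ... becomes the input attribute of idlist
    if pos != len(words):
        raise SyntaxError(f"unexpected {words[pos]!r}")
    return name_list


print(translate_dcl("dcl int x, y, z"))
```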
Output attributes of a nonterminal A are computed during the parsing of the right-hand side of the A-production and can thus be used during the parsing of other grammar productions that contain A in their right-hand side. Thus the information flows from the bottom to the top, from the leaves to the sentence symbol. Input attributes of a nonterminal A are computed prior to parsing of the A-production and are used during its parsing. Thus the information flows from top to bottom in the syntax tree, from the sentence symbol to the leaves. Output attributes of A describe properties of the A-phrase and its constituent phrases. Input attributes of A denote properties of the environment of the A-phrase.
Figure 3.4 shows a syntax tree 'decorated' with attributes for the sentence
    dcl int x, y, z
The flow of attribute values along the dashed lines can easily be seen.
Fig. 3.4 Analysis of the sentence dcl int x,y,z. The attributes flow along the dashed lines
3.3 Context conditions
The formal syntax description of a programming language is not sufficient to distinguish between correct and incorrect programs. For example, in a programming language where all variables must be explicitly declared, the following code may be syntactically correct, even though it does not represent a valid program, since the variables x and y are not declared.
    PROCEDURE P;
      VAR a, b: INTEGER;
    BEGIN
      x := y
    END P;
If a programming language definition states 'each variable in an assignment statement must be declared', this defines a relationship between textually separated language elements, which cannot be represented by a context-free grammar. Such constraints are therefore called context conditions and are usually considered part of the semantics, since they cannot be described syntactically.
The total set of context conditions is called the static semantics of the programming language. The word static signifies that they refer to the source
code and not to the execution of the program.
Programming languages are full of context conditions. It would be desirable if the language definition contained explicit definitions for them, separating them from the other parts of the language definition and stressing their
importance. Unfortunately, this is rarely the case since they are often buried
implicitly in other definitions. Sometimes they are missing altogether since the
author wants a small defining document, or because it is assumed that the
reader understands them.
Attributed grammars also permit the formal description of context conditions. The context condition is expressed as a relation between attributes. For example, the context condition 'the left side and the right side of an assignment must be of the same type' imposes a relation between the type attributes of both sides. If
    assign = id↑typ1 ":=" expr↑typ2 ";".
is the production for the assignment, where typ1 and typ2 are the types of id and expr, then the context condition is typ1 = typ2. The context condition can be written separately from the production in the form
    assign = id↑typ1 ":=" expr↑typ2 ";".
    CC: typ1 = typ2
or it can be integrated into the production, e.g. in a manner proposed by Watt and Lehrmann Madsen [1983]:
    assign = id↑typ1 ":=" expr↑typ2 ";" where(typ1=typ2).
The first form separates the context condition from the production in a firmer
manner and is especially suited for several long context conditions. The
second form emphasizes the coherence between production and context
condition.
According to van Wijngaarden's two-level grammar, the part where(...)
can be regarded as a nonterminal that is derived into an empty string if the
relationship inside the parentheses is true. It cannot be derived into a terminal
string if it is false. If typ1 = typ2, the syntax analysis of where(typ1=typ2) results in the empty string, so that an assignment is parsed with the remaining part of the production. However, if typ1 ≠ typ2, the terminal string representing the assignment statement is rejected, since the where-part cannot be derived into a terminal string.
We use the style with where and define the point of execution of the test
of the context condition by its position in the production in the following way.
The production
    A = ω1 where(CC) ω2.
means that in order to parse A, we must execute a syntax analysis from left to right that will parse $\omega_1$ first. Thereafter the context condition CC is tested. If it is not met, an error is reported. Then $\omega_2$ is parsed.
The following examples show the application of context conditions.
3.6 Example A context-sensitive language
The language $\{a^n b^n c^n : n \ge 1\}$ is not context-free. Textbooks on formal languages show that no context-free grammar exists for this language. However, the following attributed grammar with a context condition is easily constructed:
    S   = A↑p B↑q C↑r where(p=q=r).
    A↑p = a sem p:=1 endsem {a sem p:=p+1 endsem}.
    B↑q = b sem q:=1 endsem {b sem q:=q+1 endsem}.
    C↑r = c sem r:=1 endsem {c sem r:=r+1 endsem}.
Here, p, q, and r represent the counts of the characters a, b, and c.
The context condition requires that they are equal.
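A recursive-descent reading of Example 3.6 makes the division of labour visible: the productions for A, B and C only count, and the where-part becomes an explicit test after C has been parsed. The Python sketch below is our own illustration; SemanticError is a hypothetical exception used to report a violated context condition.

```python
class SemanticError(Exception):
    """Raised when a context condition is not met."""


def check_abc(text: str) -> int:
    """Sketch of Example 3.6; returns n for an input a^n b^n c^n."""
    pos = 0

    def count(ch: str) -> int:    # e.g. A↑p = a sem p:=1 endsem {a sem p:=p+1 endsem}
        nonlocal pos
        if pos >= len(text) or text[pos] != ch:
            raise SyntaxError(f"{ch!r} expected at position {pos}")
        pos += 1
        n = 1                     # sem p:=1 endsem
        while pos < len(text) and text[pos] == ch:
            pos += 1
            n += 1                # sem p:=p+1 endsem
        return n

    p = count("a")
    q = count("b")
    r = count("c")
    if pos != len(text):
        raise SyntaxError(f"unexpected symbol at position {pos}")
    if not (p == q == r):         # context condition where(p=q=r)
        raise SemanticError(f"counts differ: a={p}, b={q}, c={r}")
    return p
```

An input such as aabbc is syntactically well formed for the context-free part of the grammar; it is rejected only by the context condition.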
3.7 Example Context condition
The context condition 'in the declaration of an array, both index bounds
must be of type integer, and the lower bound must not be greater than the
upper bound', can be described as follows:
    arraydeclaration =
        id↑name "(" constant↑c1↑typ1 ":" constant↑c2↑typ2 ")"
        where((typ1=typ2=integer) & (c1≤c2)).
where c1 and c2 represent the numerical values of the bounds.
3.8 Example Context condition
The context condition ‘each variable appearing in a statement must have
been previously declared’, can be described as follows. One must distinguish syntactically the applied occurrence of a name (in a statement) from
the defining occurrence (in a declaration), with the additional syntax rule:
    var = id.
The nonterminal var denotes the applied occurrence of the name id.
Therefore, var must be written in all statements in place of id. If a semantic procedure IsDeclared(↓name) is used to check the symbol list to see whether the name of the variable is declared, the context condition can be simply formulated as follows:
    var = id↑name where(IsDeclared(↓name)).
If a context condition is not met, this usually affects the execution of the subsequent semantic actions, but this cannot be expressed well in the attributed grammar. In Coco, we therefore avoid explicit context conditions and replace their checking with semantic actions (see Section 3.6). For the description of the static semantics of programming languages, however, context conditions are very suitable.
3.4 Attributed grammars
In the previous sections we have introduced the elements of attributed grammars. We now consider them in their entirety. In the literature the concept of
an attributed grammar is defined in many different ways (see for example,
Räihä [1977], Tienari [1980], Watt and Lehrmann Madsen [1983]). We will
follow Waite and Goos [1984].
3.9 Definition Attributed grammar
An attributed grammar is a quadruple $AG = (G, A, R, K)$: $G = (V_N, V_T, P, S)$ is a reduced context-free grammar; A is a finite set of attributes; R is a finite set of semantic actions; and K is a finite set of context conditions. With each symbol $X \in V_T \cup V_N$, zero or more attributes from A are associated. With each production, zero or more semantic actions from R and zero or more context conditions from K are associated. For each occurrence of a nonterminal X in the syntax tree of a sentence of L(G), the attributes of X can be computed in at most one way by semantic actions.
The attribute computation process
In the concept of attributed grammars, it is essential that the definition says
nothing about the order in which the semantic actions are executed. In the previous examples, we assumed that syntax analysis was performed top-down
from left to right, and that the semantic actions were executed in the same
order. However, according to Definition 3.9, this is not required. The order of
the semantic actions is not predetermined by some syntax-analysis method:
rather, it is free. This eliminates the necessity of putting the semantic actions
and context conditions in particular places of the right-hand side of the grammar productions. All semantic actions and context conditions that belong to a
syntax production can be summarized and written at the end of the production.
In the general case, the translation runs in two phases:
1. syntax analysis, which constructs a syntax tree;
2. execution of semantic actions, which mainly compute the attribute values attached to the nodes of the syntax tree, in an arbitrary order.
Step 2 implies that an ‘attribute computation process’ will traverse the syntax
tree in an arbitrary manner and compute the values of the unknown attributes
at each node. A semantic action can be executed at a specific time if and only if
all attribute values which contribute to the computation are known at that time.
The attribute computation process continues until all attribute values are calculated. It is therefore possible that the attribute computation process must traverse the syntax tree several times, up and down, criss-crossing from left to right and back. In order to avoid ambiguous computations of attributes, the definition of attributed grammar contains the sentence: 'For each occurrence of a nonterminal X, the attributes of X can be computed in at most one way'.
3.10 Example Variable declaration
In Pascal, variables are declared by their enumeration after the keyword
var, and the type follows the list of variables. For example,
    var x,y,z: integer
The semantic actions implied by the declaration may consist of a call to a procedure NewId(↓name, ↓t), which appends the name and type of the variable to the name list. In a strict translation from left to right, this construct leads to difficulties, since the type is known only after all names have been parsed, and therefore NewId cannot be called immediately after recognizing a name. In an attributed grammar, these difficulties do not arise if it is formulated as follows:
    (1) declaration = "var" idlist↓t0 ":" typ↑t1
                        sem t0:=t1 endsem.
    (2) idlist↓t1 = id↑name
                        sem NewId(↓name, ↓t1) endsem
                  | id↑name "," idlist↓t2
                        sem NewId(↓name, ↓t1); t2:=t1 endsem.
For the source text var x,y,z: integer, a syntax tree is first generated in which all attributes except those of terminal classes have no values (see Fig. 3.5).
Fig. 3.5 Analysis of the sentence var x,y,z: integer with the flow of attributes along the dashed lines
The attribute computation process now starts at an arbitrary node in order to compute the missing attributes and to call procedure NewId. Wherever it starts, the first semantic action that can be executed is t0 := t1 in production 1. Then t2 := t1 and NewId(↓name, ↓t1) in production 2 can be executed. This process continues along the dashed lines until all of the semantic actions are executed.
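The attribute computation process can be sketched as a loop that repeatedly executes every action whose input attributes are already known. The Python code below is a hypothetical model of the decorated tree of Fig. 3.5: attribute slots are dictionary keys such as "idlist1.t" (our naming), the value of typ↑t1 is assumed to be known from the start, and the actions are deliberately handed to the evaluator in reverse order to show that the textual order does not matter.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple

Store = Dict[str, str]


class Action(NamedTuple):
    name: str
    inputs: Tuple[str, ...]          # attribute slots the action reads
    run: Callable[[Store], None]     # performs the action on the attribute store


def evaluate(attrs: Store, actions: List[Action]) -> List[str]:
    """Execute each action as soon as all of its input attributes are known."""
    pending = list(actions)
    order = []
    while pending:
        ready = [a for a in pending if all(i in attrs for i in a.inputs)]
        if not ready:
            raise RuntimeError("no action can run: cyclic or missing attributes")
        for a in ready:
            a.run(attrs)
            pending.remove(a)
            order.append(a.name)
    return order


def copy(dst: str, src: str) -> Callable[[Store], None]:
    def run(attrs: Store) -> None:
        attrs[dst] = attrs[src]      # e.g. t0 := t1 or t2 := t1
    return run


def declare(names: List[str], type_name: str):
    """Decorated tree for 'var x,y,z: integer' (Example 3.10), evaluated out of order."""
    name_list = []
    attrs: Store = {"typ.t1": type_name}      # assumed known when evaluation starts
    for k, n in enumerate(names):
        attrs[f"id{k}.name"] = n              # terminal-class attributes from the scanner

    def new_id(k: int) -> Callable[[Store], None]:
        def run(a: Store) -> None:
            name_list.append((a[f"id{k}.name"], a[f"idlist{k}.t"]))  # NewId(↓name, ↓t1)
        return run

    actions = [Action("t0:=t1", ("typ.t1",), copy("idlist0.t", "typ.t1"))]
    for k in range(len(names)):
        actions.append(Action(f"NewId[{k}]", (f"id{k}.name", f"idlist{k}.t"), new_id(k)))
        if k + 1 < len(names):
            actions.append(Action(f"t2:=t1[{k}]", (f"idlist{k}.t",),
                                  copy(f"idlist{k + 1}.t", f"idlist{k}.t")))
    order = evaluate(attrs, list(reversed(actions)))   # deliberately scrambled order
    return name_list, order


names, order = declare(["x", "y", "z"], "integer")
print(order)
print(names)
```

Whatever order the actions are listed in, t0 := t1 is necessarily the first one that can run, exactly as described above.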
In Example 3.10, the order in which the three calls to NewId are executed is
not determined by the attributed grammar, but rather depends on the strategy
of the attribute computation process. In most cases, the order is unimportant,
and therefore this kind of attributed grammar is adequate. If desired, a particular order can be imposed by introducing additional attributes.
Cyclic semantic dependencies
Attributed grammars can be constructed in which the attribute computation
process does not terminate since some attributes depend on themselves. This
is called a cyclic semantic dependency. In Definition 3.9, this possibility is
covered with the sentence: 'For each appearance of a nonterminal X, the attributes of X can be calculated in at most one way’. There are algorithms that
can check the grammar for this property (Knuth [1968], Waite and Goos
[1984]). If an attributed grammar of the general form described above has
been defined, it must first be checked for cyclic semantic dependencies, and
possibly transformed into a well-defined form.
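The algorithms cited above check the grammar itself, i.e. all syntax trees it can produce. A much simpler related check, shown here only as an illustration, is whether one concrete attribute dependency graph (for instance that of a single decorated syntax tree) contains a cycle. The Python sketch uses the standard-library module graphlib (Python 3.9 or later); the function name evaluation_order is ours.

```python
from graphlib import CycleError, TopologicalSorter


def evaluation_order(deps: dict) -> list:
    """deps maps each attribute to the set of attributes its computation reads.
    Returns an order in which all attributes can be computed, or raises
    ValueError if the dependency graph contains a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        cycle = err.args[1]          # graphlib reports the nodes forming the cycle
        raise ValueError(f"cyclic semantic dependency: {' -> '.join(cycle)}") from None


print(evaluation_order({"b0": {"b1"}, "b1": {"a1"}}))  # prints ['a1', 'b1', 'b0']
```

A dependency such as {"a": {"b"}, "b": {"a"}} makes evaluation_order raise ValueError, because no attribute on the cycle can ever be computed first.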
3.5 L-attributed grammars
Great effort is required to translate an attributed grammar as described in the
previous section. First, the syntax tree of the program to be translated must be
generated, and each of its nodes must be ‘decorated’ with the attributes. Then
the syntax tree must be traversed more than once to compute the attributes until
all attributes are determined. Nowadays, storage and run-time requirements confine this method to mainframes, if it is regarded as practical at all.
Hence, special forms of attributed grammars are needed for compilers,
permitting the computation of the attributes in a single pass from left to right
through the syntax tree. Then the semantic actions can be executed in parallel
with the syntax analysis and no syntax tree is needed. Such attributed grammars are called L-attributed (i.e. left attributed) according to Lewis et al.
[1976]. All examples in Sections 3.1 through 3.3 are of this kind. The restrictions that make an attributed grammar L-attributed relate only to the order of the attribute occurrences in a production. Each
inherited attribute a of a grammar symbol X on the right-hand side of a
production must be computable before X can be recognized. Therefore, for
its computation only those attributes can be used that are known prior to the
parsing of X. From this, the following definition follows:
3.11 Definition L-attributed grammar
An attributed grammar is called L-attributed if for each of its productions
$Y \to X_1 \ldots X_n$, the following is true: an input attribute of $X_i$ depends only on the input attributes of Y and on the output attributes of $X_1, \ldots, X_{i-1}$.
It can easily be checked by inspection whether a given grammar based on this
definition is L-attributed.
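The inspection can be mechanized. The Python sketch below is a hypothetical encoding of one production: every right-hand-side symbol lists, for each of its input attributes, the attributes it is computed from, plus its own output attributes (attributes are identified by the names used in the production). It then walks the symbols from left to right and checks the condition of Definition 3.11.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    name: str
    inherited: dict = field(default_factory=dict)   # input attribute -> attributes it uses
    synthesized: set = field(default_factory=set)   # output attributes


def is_l_attributed(lhs_inputs: set, rhs: list) -> bool:
    """Check Definition 3.11 for one production Y -> X1 ... Xn."""
    known = set(lhs_inputs)             # input attributes of Y are known from the start
    for sym in rhs:
        for sources in sym.inherited.values():
            if not sources <= known:    # uses an attribute that is not yet computed
                return False
        known |= sym.synthesized        # outputs of Xi are known once Xi is parsed
    return True


# Example 3.5:  S -> dcl typ↑t idlist↓t
ex35 = [Symbol("dcl"), Symbol("typ", synthesized={"t"}),
        Symbol("idlist", inherited={"t": {"t"}})]
# Example 3.10: declaration = "var" idlist↓t0 ":" typ↑t1, where t0 is computed from t1
ex310 = [Symbol("var"), Symbol("idlist", inherited={"t0": {"t1"}}), Symbol(":"),
         Symbol("typ", synthesized={"t1"})]
print(is_l_attributed(set(), ex35), is_l_attributed(set(), ex310))  # prints True False
```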
The question is, how far can one get with an L-attributed grammar, and
what do the limitations mean? The general attributed grammars are indisputably the more powerful tool. The user does not need to be concerned about the
processing order of attributes (and possibly storage of intermediate results)
since this is all done automatically by the attribute computation process. The
description is essentially static and thus 'in principle' simple. In reality, such descriptions can be cumbersome and difficult to understand, particularly in the presence of many attributes.
L-attributed grammars can be used to describe the translation of nearly all
important language constructs. However, in many cases more context must be used for the translation. This shows up as the need to save intermediate results in lists, stacks, etc. In Section 3.6 it is shown how the non-L-attributed grammar of Example 3.10 can easily be replaced by an L-attributed grammar with semantic actions for temporarily saving variable names. The
worst that can happen is that the order of the semantic actions which is imposed by the use of the L-attributed grammar will require the partition of the
translation into several passes in which each pass can be defined by an L-attributed grammar. In view of these disadvantages, Waite and Goos [1984]
say: 'L-attributed grammars are inadequate, even in comparatively simple
cases.' We do not agree with this categorical statement. In most cases, the
simplicity and the ease of implementation of L-attributed grammars more than
compensate for their disadvantages. Therefore we feel that they are a very
suitable tool for compiler implementations, at least as long as our computers
are limited in memory and speed.
Coco processes only L-attributed grammars, and all attributed grammars
in the following chapters of this book are L-attributed.
Algorithmic interpretation of L-attributed grammars
While general attributed grammars are a declarative and therefore non-algorithmic formalism, L-attributed grammars can also be regarded as algorithmic descriptions, imposing an order in which semantic actions have to be executed.
Programmers who are used to thinking algorithmically will find it easier to follow
this approach. Therefore, we understand an L-attributed grammar as a very
high-level algorithmic language in the following sense.
The context-free portion of a production
    A = α1 | α2 | ... | αn.
denotes the algorithm: 'Parse the nonterminal A by choosing the matching alternative $\alpha_i$ and recognizing its components sequentially from left to right.'
Each alternative with a semantic action of the form
    αi = X1 ... Xj sem SA endsem Xj+1 ... Xn
denotes the algorithm: 'Parse $X_1$ through $X_j$, then execute the semantic action SA, and then parse $X_{j+1}$ through $X_n$.'
Each alternative with a context condition of the form
    αi = X1 ... Xj where(CC) Xj+1 ... Xn
denotes the algorithm: 'Parse $X_1$ through $X_j$, then test the context condition CC (and report any errors), and then parse $X_{j+1}$ through $X_n$.'
An attributed production of the form
    A↓a0↑b0 = X↓a1↑b1 Y↓a2↑b2.
denotes the following algorithm:
1. compute a1 (using semantic actions that are not stated here, which must precede X and may depend on a0);
2. parse X (thereby b1 gets a value);
3. compute a2 (using semantic actions that are not stated here, which must precede Y and may depend on a0, a1, b1);
4. parse Y (thereby b2 gets a value);
5. compute b0 (using semantic actions that are not stated here, which may depend on a0, a1, b1, a2, b2).
This algorithmic interpretation adds as a further clause to the definition of L-attributed grammars (Definition 3.11) the sentence: 'Attributes that are used as arguments in a semantic action or context condition between the grammar symbols $X_j$ and $X_{j+1}$ can only be input attributes of the left-hand side of the production and output attributes of $X_1$ to $X_j$.'
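The five steps can be written down almost literally as a procedure whose input attributes are parameters and whose output attribute is the result. In the Python sketch below, the bodies of X and Y and the three unstated semantic actions (a1 := a0, a2 := b1, b0 := b2) are hypothetical choices of ours, made only so that the example runs.

```python
class Parser:
    """Sketch of the algorithmic reading of A↓a0↑b0 = X↓a1↑b1 Y↓a2↑b2.
    The hypothetical X and Y each read one constant c↑v and output a + v."""

    def __init__(self, constants):
        self.constants = list(constants)   # values delivered by a scanner
        self.pos = 0

    def constant(self) -> int:             # c↑v
        if self.pos >= len(self.constants):
            raise SyntaxError("constant expected")
        self.pos += 1
        return self.constants[self.pos - 1]

    def X(self, a: int) -> int:            # X↓a↑b = c↑v sem b := a + v endsem
        return a + self.constant()

    def Y(self, a: int) -> int:            # Y↓a↑b = c↑v sem b := a + v endsem
        return a + self.constant()

    def A(self, a0: int) -> int:           # input attribute: parameter; output: result
        a1 = a0            # 1. compute a1 (may use a0)
        b1 = self.X(a1)    # 2. parse X; b1 gets a value
        a2 = b1            # 3. compute a2 (may use a0, a1, b1)
        b2 = self.Y(a2)    # 4. parse Y; b2 gets a value
        b0 = b2            # 5. compute b0 (may use a0, a1, b1, a2, b2)
        return b0


print(Parser([3, 4]).A(10))  # prints 17
```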
3.6 Implementation of the semantic interface
The implementation of the semantic interface in a compiler compiler and in the
generated compiler consists of three tasks:
1. translation and storage of semantic actions at compiler generation time, and execution of semantic actions at run time of the generated compiler;
2. translation and storage of context conditions at compiler generation time, and testing of context conditions at run time of the generated compiler;
3. reservation of memory for attributes at compiler generation time, and attribute passing at run time of the generated compiler.
These tasks are most simply and directly implemented if the generated compiler performs its syntax analysis with the popular method of recursive descent,
which is not covered in this book (Gries [1971], Hartmann [1977], Wirth
[1986]). In this method, semantic actions and context conditions are directly embedded as code in the syntactic procedures, and attributes become parameters of
the syntactic procedures. The simplicity of this kind of semantic interface
makes the method of recursive descent still attractive today for hand-coded
compilers. If the generated compiler performs a table-driven syntax analysis,
then somewhat more effort is required for the semantic interface. In this
section, we cover the method used by Coco.
Semantic actions
The semantic actions are numbered. The order is arbitrary, but it is easiest to
order them as they appear in the attributed grammar. We start the numbering at
12 for reasons that follow. All semantic actions are placed in the single procedure Semant as follows:
    Semant(↓nr):
      case nr of
        12: semantic action 12
      | 13: semantic action 13
        ...
      | n:  semantic action n
      end
    end Semant
The G-code is expanded to provide as many instructions as there are semantic
actions. The G-code instructions treated in Section 2.4 (and two more, see
Definition 3.14) have operation codes 0 through 11. Operation codes 12
through 255 correspond to semantic actions 12 through 255. Thus, Coco has
a limit of 244 semantic actions, which will rarely be reached. We only need 68 semantic actions to describe the attributed grammar of Coco itself, and
126 semantic actions for the largest pass of a Modula-2 compiler.
For the processing of semantic actions the parser of Algorithm 2.44 needs
to be expanded only by an if statement:
3.12 Parser with semantic interface
    Parse(↑correct):
      loop
        case opcode of
          ...
        else
          if correct then Semant(↓opcode) end   --perform semantic action
        end --case
      end --loop
    end Parse
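The essential idea of Algorithm 3.12 is that every opcode outside the range of syntax instructions is handed to Semant, and only while no syntax error has occurred. The Python sketch below illustrates this under heavy simplification: it invents a two-instruction toy G-code (OP_T to match a terminal, OP_END to stop) purely for illustration; Coco's real instruction set from Section 2.4 is different and richer.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical opcodes for this sketch only; the real G-code (Section 2.4) differs.
OP_T = 0        # match the terminal given as operand
OP_END = 1      # end of the G-code program
FIRST_SEM = 12  # opcodes 12..255 denote semantic actions 12..255

Instr = Tuple[int, Optional[str]]


def make_semant(actions: Dict[int, Callable[[], None]]) -> Callable[[int], None]:
    def semant(nr: int) -> None:       # Semant(↓nr): case nr of 12: ... | 13: ... end
        actions[nr]()
    return semant


def parse(code: List[Instr], tokens: List[str], semant: Callable[[int], None]) -> bool:
    pc, pos, correct = 0, 0, True
    while True:
        opcode, operand = code[pc]
        if opcode == OP_T:
            if pos < len(tokens) and tokens[pos] == operand:
                pos += 1
            else:
                correct = False        # after an error no semantic actions are run
            pc += 1
        elif opcode == OP_END:
            return correct and pos == len(tokens)
        elif opcode >= FIRST_SEM:
            if correct:
                semant(opcode)         # perform semantic action
            pc += 1
        else:
            raise ValueError(f"opcode {opcode} not covered by this sketch")


log = []
semant = make_semant({12: lambda: log.append("after var"),
                      13: lambda: log.append("after id")})
code = [(OP_T, "var"), (12, None), (OP_T, "id"), (13, None), (OP_END, None)]
print(parse(code, ["var", "id"], semant), log)  # prints True ['after var', 'after id']
```

With the input var x the terminal test for id fails, correct becomes false, and action 13 is suppressed.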
We will now study this method in more detail by an example that uses an L-attributed grammar to translate the following declaration:
    var x,y,z: integer;
(In Example 3.10 we have already given a general attributed grammar for this task.) Before we can add the identifiers and their type to the name list, the identifiers must be stored temporarily. For this purpose we will use a queue as an abstract data structure with the access procedures InitQueue, Enqueue, Dequeue, and EmptyQueue, whose meaning is obvious. The attributed grammar is as follows:
    declaration =
        "var"              sem InitQueue endsem
        id↑name            sem Enqueue(↓name) endsem
        { "," id↑name      sem Enqueue(↓name) endsem }
        ":" typ↑t
        sem while not EmptyQueue do Dequeue(↑x); NewId(↓x, ↓t) end endsem
        ";".
The numbering of the semantic actions and their integration into the procedure
Semant results in the following:
    Semant(↓nr):
      local name, x: Nametype; t: ...;
    begin
      case nr of
        12: InitQueue
      | 13: Enqueue(↓name)
      | 14: Enqueue(↓name)
      | 15: while not EmptyQueue do
              Dequeue(↑x); NewId(↓x, ↓t)
            end
      end
    end Semant
The attributes are local variables of Semant. This means that in general all the
names contained in a semantic action (enclosed between sem and endsem)
are global to this semantic action, and therefore common to all of the other
semantic actions.
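The following Python sketch mirrors this scheme under our own assumptions: the attribute variables name, x and t are fields shared by all semantic actions (playing the role of the variables of Semant), collections.deque serves as the queue, NewId is modelled by appending to a list, and a small hand-written parser calls semant with the action numbers 12 to 15 at the points given by the grammar.

```python
from collections import deque


class DeclTranslator:
    """Sketch of the L-attributed declaration grammar with a queue.
    The attribute variables name, x and t are shared by all semantic actions."""

    def __init__(self):
        self.queue = deque()
        self.name = self.x = self.t = None
        self.name_list = []                   # filled by the hypothetical NewId

    def semant(self, nr: int) -> None:
        if nr == 12:
            self.queue.clear()                # InitQueue
        elif nr in (13, 14):
            self.queue.append(self.name)      # Enqueue(↓name)
        elif nr == 15:
            while self.queue:                 # while not EmptyQueue do
                self.x = self.queue.popleft() #   Dequeue(↑x)
                self.name_list.append((self.x, self.t))  # NewId(↓x, ↓t)
        else:
            raise ValueError(f"no semantic action {nr}")

    def translate(self, text: str) -> list:
        words = text.replace(",", " , ").replace(":", " : ").replace(";", " ; ").split()
        pos = 0

        def take(pred, what):
            nonlocal pos
            if pos < len(words) and pred(words[pos]):
                pos += 1
                return words[pos - 1]
            raise SyntaxError(f"{what} expected")

        take(lambda w: w == "var", "'var'")
        self.semant(12)
        self.name = take(str.isidentifier, "identifier")
        self.semant(13)
        while pos < len(words) and words[pos] == ",":
            pos += 1
            self.name = take(str.isidentifier, "identifier")
            self.semant(14)
        take(lambda w: w == ":", "':'")
        self.t = take(str.isidentifier, "type")    # typ↑t, simplified to one word
        self.semant(15)
        take(lambda w: w == ";", "';'")
        return self.name_list


print(DeclTranslator().translate("var x,y,z: integer;"))
```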
Context conditions
Context conditions are not treated as an independent language element in
Coco. Rather, they are represented as semantic actions. Instead of
    where(CC)
we write, for example,
    sem if not CC then SemErr end endsem
where SemErr is a semantic error processing procedure.
Attribute passing
Coco treats all attributes as local variables of Semant. They receive their
value through attribute passing. This is different for terminals and nonterminals. The attributes of terminals (i.e. terminal classes) are always synthesized attributes. They receive their values from the lexical analyzer during
parsing. The inherited attributes of nonterminals are passed before parsing by
an implicit semantic action, whereas the synthesized attributes are passed after
parsing.
3.13 Example Attribute passing
For the productions
    A = ... B↓x↑y ... .
    B↓v↑w = ... .
the attribute passing
    v := x
is done in the A-production before the parsing of B, and the attribute passing
    y := w
is done in the A-production after the parsing of B.
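Because all attributes live in one common store, attribute passing reduces to two copy operations around the call of B. The Python sketch below is only an illustration; the dictionary attrs stands for the variables of Semant, and the body of B (w := 2 * v) is a hypothetical example.

```python
attrs = {}   # hypothetical shared attribute store (the variables of Semant)


def parse_B() -> None:
    # B↓v↑w = ... sem w := 2 * v endsem   (hypothetical body)
    attrs["w"] = 2 * attrs["v"]


def parse_A(x: int) -> int:
    attrs["x"] = x            # assume x was computed earlier in the A-production
    attrs["v"] = attrs["x"]   # implicit action before B: v := x (inherited attribute)
    parse_B()
    attrs["y"] = attrs["w"]   # action after B: y := w (synthesized attribute)
    return attrs["y"]


print(parse_A(21))  # prints 42
```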
The attribute passing after the parsing of a nonterminal can be executed by a
'normal' G-code instruction, i. e. by an instruction activating a semantic
action. However, for the passing of inherited attributes, two additional G-code
instructions are necessary:
3.14 Definition G-code (remainder)

Instruction       Bytes  Description
NTS sy sem        3      nonterminal with input attribute semantics.
                         If the next input symbol is a terminal start symbol of sy,
                         then execute the semantic action sem (for input attribute
                         passing) and start the parsing of the production for sy;
                         otherwise report an error.
NTAS sy adr sem   5      nonterminal with alternative and input attribute semantics.
                         If the next input symbol is a terminal start symbol of sy,
                         then execute the semantic action sem (for input attribute
                         passing) and start the parsing of the production for sy;
                         otherwise go to adr.
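The behaviour of the two instructions can be sketched as a fragment of a G-code interpreter. This Python sketch is a simplification under stated assumptions: instructions are tuples rather than byte sequences, first[sy] holds the terminal start symbols of sy, prod[sy] is the address of the production for sy, and run_sem and parse_at are hypothetical callbacks standing in for the rest of the interpreter.

```python
def step(instr, pc, lookahead, first, prod, run_sem, parse_at):
    """Execute one NTS or NTAS instruction and return the next pc.

    instr     -- ("NTS", sy, sem) or ("NTAS", sy, adr, sem)
    lookahead -- the next input symbol
    first     -- dict: nonterminal -> set of terminal start symbols
    prod      -- dict: nonterminal -> address of its production
    run_sem   -- callback executing semantic action number sem
    parse_at  -- callback parsing the production at an address
    """
    op = instr[0]
    if op == "NTS":
        _, sy, sem = instr
        if lookahead not in first[sy]:
            raise SyntaxError(f"{sy} expected")   # report an error
        run_sem(sem)              # input attribute passing
        parse_at(prod[sy])        # parse the production for sy
        return pc + 1
    if op == "NTAS":
        _, sy, adr, sem = instr
        if lookahead not in first[sy]:
            return adr            # try the alternative at adr
        run_sem(sem)
        parse_at(prod[sy])
        return pc + 1
    raise ValueError(f"unsupported instruction {op}")


log = []


def run_sem(n):
    log.append(("sem", n))


def parse_at(adr):
    log.append(("parse", adr))


first, prod = {"B": {"b"}}, {"B": 100}
print(step(("NTAS", "B", 7, 12), 0, "c", first, prod, run_sem, parse_at))  # 7
print(step(("NTS", "B", 12), 0, "b", first, prod, run_sem, parse_at))      # 1
print(log)  # [('sem', 12), ('parse', 100)]
```

NTAS differs from NTS only in what happens when the lookahead does not fit: instead of reporting an error, it hands control to the next alternative at adr.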
A complete example for the translation of an attributed grammar into G-code,
including attribute passing semantics, can be found in Section 8.3.
Problems with semantic interfaces
The simplicity of this semantic interface gives rise to two problems:
1. Semantic actions may only be executed when it is clear that no other alternative will match. In the production
    A = sem action1 endsem C
      | sem action2 endsem D.
it must be determined whether C or D is the proper alternative before executing action1 or action2. Coco takes this into account by automatically inserting an ε-node before the corresponding semantic actions, which leads to the following result:
[Top-down graph: from A, one branch ε → action1 → C and one branch ε → action2 → D. Recoverable part of the G-code: EPSA 1 M, SEM 12, NT C, ...]
where the proper selection of alternatives is done with the following lookahead sets:
    epsset(1) = first(C)
    epsset(2) = first(D)
This also works in the following production:
    A = B sem action1 endsem {sem action2 endsem C sem action3 endsem}.
For this production, the following top-down graph and corresponding G-code are generated:
[Top-down graph: A = B → action1, followed by a loop ε → action2 → C → action3]

            NT    B
            SEM   12
      M1:   EPSA  1 M2
            SEM   13
            NT    C
            SEM   14
            JMP   M1
      M2:   EPS   2
            RET
with the lookahead sets
    epsset(1) = first(C)
    epsset(2) = follow(A)
If the ε-nodes have disjoint lookahead sets, these constructs are LL(1).
2. Attributes in Coco are implemented as local variables of Semant. This
has the undesirable consequence that their values are not retained during
recursive parsing of nonterminals. For example, the interpretation of
expressions involves the following production:
    E↑x = T↑x {"+" T↑y sem x := x + y endsem}.
Here, the output attribute x of the left T must still be available after the parsing of the right T, since its value is used afterwards. However, since T is recursive over F and E, the attribute x of the left T may be destroyed by the parsing of the right T. Coco does not take care of this problem; it is up to the programmer to save and restore x explicitly. This can be done with a stack, replacing the above production by the following:
    E↑x = T↑x {"+" sem Push(↓x) endsem T↑y sem Pop(↑x); x := x + y endsem}.
From this follows the
3.15 Principle of attribute saving for recursive symbols
Attribute values that must be preserved beyond the parsing of a recursive
nonterminal X must be saved before the parsing of X and restored after
the parsing of X.
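The effect described above, and the remedy of Principle 3.15, can be reproduced with a toy evaluator. The Python sketch below assumes a simplified grammar E↑x = T↑x {"+" T↑y ...}, T↑v = digit | "(" E↑x ")", one character per token, and keeps x, y and v in a single shared object to mimic Coco's Semant variables; the class and function names are hypothetical. With save=False the nested E overwrites the outer x; with save=True the Push/Pop pair preserves it.

```python
class Evaluator:
    """Toy evaluator for  E↑x = T↑x {"+" T↑y sem x := x + y endsem}.
                          T↑v = digit | "(" E↑x ")" sem v := x endsem.
    All attributes (x, y, v) are fields of one object, mirroring Coco's
    Semant variables, so a nested E overwrites the outer x."""

    def __init__(self, text, save):
        self.toks = text.replace(" ", "")
        self.pos = 0
        self.save = save          # apply Principle 3.15 (Push/Pop of x)?
        self.x = self.y = self.v = 0
        self.stack = []

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else ""

    def expect(self, ch):
        if self.peek() != ch:
            raise SyntaxError(f"{ch!r} expected at position {self.pos}")
        self.pos += 1

    def T(self):
        if self.peek() == "(":
            self.expect("(")
            self.E()              # recursion: overwrites the shared x
            self.expect(")")
            self.v = self.x
        else:
            if not self.peek().isdigit():
                raise SyntaxError(f"digit expected at position {self.pos}")
            self.v = int(self.peek())
            self.pos += 1

    def E(self):
        self.T()
        self.x = self.v                       # left T↑x
        while self.peek() == "+":
            self.expect("+")
            if self.save:
                self.stack.append(self.x)     # Push(↓x)
            self.T()
            self.y = self.v                   # right T↑y
            if self.save:
                self.x = self.stack.pop()     # Pop(↑x)
            self.x = self.x + self.y          # sem x := x + y endsem


def evaluate(text, save=True):
    ev = Evaluator(text, save)
    ev.E()
    return ev.x


print(evaluate("1+(2+3)", save=False))  # 10: the outer x was destroyed
print(evaluate("1+(2+3)", save=True))   # 6
```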
4
Various compiler compilers
In the previous chapter we covered the theoretical background of compilers. In
the following chapters we will show the practical application of these principles in the design of the compiler compiler Coco.
However, before we go into the details of Coco, it will be interesting to
look at some other compiler compilers. This will enable the reader to compare
Coco with these systems.
There is extensive literature about compiler-generating systems. Bibliographies can be found in Räihä [1980] and Meijer and Nijholt [1982]. The scope of this book allows us to cover only a few of them, and only to a limited degree. Some of the best-known compiler compilers are YACC (Johnson [1975]), HLP84 (Koskimies [1984]), GAG (Kastens et al. [1982]), and MUG (Ganzinger and Giegerich [1984]). In the following sections, we will compare these systems with each other.
The basic operation of today's compiler compilers is essentially the same in all systems. The compiler to be generated is described in a metalanguage based on attributed grammars. From this compiler description, a parser and a semantic evaluator are generated, which constitute the essential parts of the resulting compiler. The generated compiler reads the source text to be translated, performs a syntax analysis to check the correctness of the input, and builds a syntax tree in memory. It then assigns attribute values to the tree nodes according to the attributed grammar. This process normally requires several passes which traverse the tree from left to right or from right to left. In each pass, as many attributes as possible are evaluated. Finally, the complete semantics of the source program is represented by the attributes in the tree. The last pass generates the target code from the attribute values.
The various compiler compilers differ mainly in their compiler description languages and in their algorithms for traversing the syntax tree. Although much effort is spent on reducing execution time and attribute storage, large memory requirements and long processing times are the main reasons why automatically generated compilers are still less efficient than hand-coded compilers. Therefore some compiler compilers, like YACC and Coco, bypass the construction of a syntax tree and accept that they are less powerful and less generally applicable than HLP84, GAG, or MUG.
The above-mentioned compiler compilers will be compared without going into too much detail. For each of them we give a short example of its input language, showing the translation of a signed integer constant into its value. Normally, such tasks are handled by the lexical analyzer. However, they can also be solved with an attributed grammar, which is short and easy to understand and is therefore well suited as an example of attributed grammars.
Of course, compiler compilers can achieve more than what is demonstrated in this short example. Most of them show their advantages only on a large and complex task. However, these small examples allow some interesting conclusions about the user-friendliness of the various systems and the effort required to learn their description languages.
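For reference, the translation that the following five examples specify can be stated directly as a small Python function. This is only a sketch of the intended semantics, under assumptions taken from the YACC, GAG and Coco versions: values are limited to 16-bit integers (the test value < 3276, or value = 3276 and digit < 8, keeps the result at most 32767), and on overflow a message is printed and accumulation continues from 0. The function name signed_number is hypothetical.

```python
def signed_number(text):
    """Number = ["-"] Digitlist.   Digitlist = digit {digit}.
    Returns the value of the constant; on overflow prints a message and
    continues with 0, as the YACC and Coco versions do."""
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise SyntaxError("digit expected")
    value = int(digits[0])                    # Digitlist = digit ...
    for ch in digits[1:]:                     # ... {digit}
        d = int(ch)
        if value < 3276 or (value == 3276 and d < 8):   # context condition
            value = 10 * value + d
        else:
            print("Constant too big")
            value = 0
    return -value if negative else value


print(signed_number("-123"))   # -123
print(signed_number("32767"))  # 32767
```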
4.1 YACC - yet another compiler compiler
Origin and scope
YACC was produced by Stephen C. Johnson at Bell Laboratories in 1975. It
runs under Unix and is therefore widely available. YACC accepts L-attributed
grammars with the limitation that each grammar symbol has only one
synthesized attribute and no inherited attributes. From the compiler description, YACC generates an LALR(1) parser (Lookahead LR(1)) and a semantic
analyzer which is simply a collection of all of the semantic actions of the
compiler description. The user must supply a main program, a lexical
analyzer, and a syntax-error handler.
Description language
The syntax parts of the YACC source language are written as BNF productions. All terminals (with the exception of literals) must be declared. For the production X0 : X1 X2 ... Xn, the symbol $$ denotes the attribute of X0, $1 the attribute of X1, and $n the attribute of Xn. Semantic actions can be specified at any position between the symbols of the context-free grammar. They must be written in C and may contain an arbitrary sequence of valid C statements. Context conditions are written as if statements in semantic actions. At the end of the grammar, one can write C procedures which are called in the semantic actions. A scanner procedure named yylex must also be provided at this point.
Attribute processing
The attribute processing is done in a single pass during syntax analysis. An
explicit syntax tree of the source language is not produced.
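Processing attributes in a single pass is possible because an LR parser keeps a value stack alongside its state stack: on a reduction by X0 : X1 ... Xn, the top n values are $1 ... $n; they are popped, and the value of $$ computed by the action is pushed. The Python sketch below does not build LALR tables; it simply replays the shift and reduce steps of a parse of -42 for the grammar of Example 4.1 (without the overflow check) to show the value stack at work. The helper reduce_by is hypothetical.

```python
def reduce_by(stack, n, action):
    """Pop $1..$n from the value stack, call the action on them, push $$."""
    args = stack[len(stack) - n:]          # args[0] is $1, args[-1] is $n
    del stack[len(stack) - n:]
    stack.append(action(*args))


# Replay the parse of "-42" for the grammar of Example 4.1:
#   Number    : '-' Digitlist | Digitlist ;
#   Digitlist : digit | Digitlist digit ;
stack = []
stack.append("-")                                 # shift '-'
stack.append(4)                                   # shift digit, yylval = 4
reduce_by(stack, 1, lambda d: d)                  # Digitlist : digit  {$$ = $1;}
stack.append(2)                                   # shift digit, yylval = 2
reduce_by(stack, 2, lambda dl, d: dl * 10 + d)    # Digitlist : Digitlist digit
reduce_by(stack, 2, lambda minus, dl: -dl)        # Number : '-' Digitlist
print(stack)  # [-42]
```

In the YACC version the action for Number prints -$2 instead of setting $$; the sketch keeps the value on the stack so that it can be inspected.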
Implementation
YACC is written in C and produces compilers that are also written in C. It has
been used for the translation of many languages, including C, APL,
RATFOR, and Pascal.
4.1 Example Attributed grammar as input for YACC
    %{
    #include <stdio.h>
    #include <ctype.h>
    int yylex(void);
    void yyerror(const char *s);
    %}
    %start Number            /* start symbol of the grammar */
    %token digit             /* declaration of terminals. Literals */
                             /* don't have to be declared          */
    %%                       /* separator                          */
    Number    : '-' Digitlist    {printf("%d\n", -$2);}
              | Digitlist        {printf("%d\n", $1);}
              ;
    Digitlist : digit            {$$ = $1;}
              | Digitlist digit  {if (($1 > 3276) || (($1 == 3276) && ($2 > 7)))
                                    {printf("Constant too big\n");
                                     $$ = 0;}
                                  else
                                    $$ = $1*10 + $2;}
              ;
    %%
    int yylex(void)              /* lexical analyzer */
    {
      int ch;
      while ((ch = getchar()) == ' ')
        ;
      if (isdigit(ch))
        {yylval = ch - '0'; return digit;}
      if (ch == '\n' || ch == EOF)
        return 0;                /* end of input */
      return ch;
    }
    void yyerror(const char *s)  /* error procedure */
    {printf("%s\n", s);}
    int main(void)               /* main procedure */
    {return yyparse();}
4.2 HLP84 - Helsinki language processor
Origin and scope
The first version of HLP was produced in 1978 under the name HLP78 at the
University of Helsinki by Räihä et al. [1983]. Since then a new version,
HLP84 (Koskimies [1984]), has been created which has little in common with
the previous one. HLP84 accepts attributed grammars for a one-pass translation of programs. It generates a scanner, an LALR(1) parser with error
handling, and a semantic evaluator to which user procedures can be attached.
Symbol table handling can be partially described in the compiler definition
language; in certain cases it is even done automatically. This reduces the
number of semantic procedures required.
Description language
The description language Lisa is nonterminal oriented. This is in sharp
contrast to other compiler description languages, where the emphasis is on
productions. Each nonterminal is described by a block which forms the scope
of its local objects. This is similar to the use of procedures in higher-level
languages. A block contains all productions of a nonterminal in extended
BNF, as well as the description of all terminals used in it. Within a block,
attributes and local variables are declared in a Pascal-like form.
A set of semantic rules consisting of assignments and function calls is
attached to each production. These rules assign values to the synthesized
attributes on the left-hand side and to the inherited attributes on the right-hand
side of the production. An attribute a of a grammar symbol S is denoted by
S.a. Terminals can have a single synthesized attribute. There is a specific
language element for context conditions. Lisa provides some standard facilities
for frequently needed operations such as definition of scopes and searching of
names in them. These mechanisms free the user from some clerical work. For
example, an identifier will be automatically searched in all open scopes and its
node in the syntax tree will be automatically attributed according to the information in its symbol table entry.
Attribute processing
Attributes are processed in a single pass from left to right by means of an
attribute-stack and without an explicit syntax tree. This limits the application of
HLP84 to languages that can be translated in one pass although it is not
required that semantic analysis is done during syntax analysis.
Implementation
HLP84 was implemented on a Burroughs B7800 computer in Pascal. It
generates compilers in Pascal. The system has been used for its own implementation and for the generation of a Pascal compiler.
4.2 Example Attributed grammar as input for HLP84
    external                     -- declaration of external Pascal-objects
      type Outfile = Extfile;
      function WriteInt(f: Outfile; i: Integer): (f: Outfile) =
        procedure ExtOut(var f: Extfile; i: Integer);
        -- Connects the Pascal-procedure ExtOut with the Lisa-function
        -- WriteInt. Extfile and ExtOut are given in a special system file.

    nont Number;                 -- description of the nonterminal Number
                                 -- (start sym.). Number has no attributes.
      attrset Intval = (val: Integer);
                                 -- val is declared to be an integer attribute.
                                 -- The attribute declaration is given the name Intval.
      var out: Outfile;          -- global variable
      const max = 65535;

      nont SignedNumber: Intval;     -- description of the nt "SignedNumber".
                                     -- SignedNumber has an attr. set "Intval".
        nont DigitList: Intval;
          check val < max;           -- context condition
          token DigitToken: Integer = Digit;
            -- the terminal "DigitToken" with an attr. of type Integer is
            -- declared to consist of a single digit (Digit is predefined)

          DigitList = DigitToken;    -- syntactic production
          rules                      -- semantic rules
            val := DigitToken        -- the attr. of a token is denoted by
          end;                       -- the name of the token
          DigitList = DigitList DigitToken;
          rules
            val := 10*DigitList.val + DigitToken
          end
        end DigitList;

        SignedNumber = '-' DigitList;
        rules
          val := -DigitList.val
        end;
        SignedNumber = DigitList;
        rules
          val := DigitList.val
        end
      end SignedNumber;

      Number = SignedNumber;
      rules
        post out := WriteInt(out, SignedNumber.val);
          -- after SignedNumber is processed, its attribute val is written.
      end
    end Number
4.3 GAG - generator based on attribute grammars
Origin and scope
GAG was developed by Kastens, Hutt, and Zimmermann [1982] at the
University of Karlsruhe. It accepts ordered attributed grammars where the
attribute evaluation order of each nonterminal is fixed and independent of the
context of the nonterminal. From the compiler description, an attribute
evaluator and an LALR(1) parser are produced (by separate tools). The user
must supply a lexical analyzer and a few other procedures such as a code
generator. These modules together with some fixed parts constitute a complete
compiler.
Description language
The grammar is written in extended BNF with special constructs for options
and repetitions. All nonterminals and terminals (except literals) must be
declared. Every production is associated with a set of semantic rules. In these
rules the strongly typed, functional language Aladin is used, allowing attribute
assignments and function calls. The right-hand side of an assignment can be a
complex expression of attribute values, function calls, if expressions, syntax
symbols, and many others (see Example 4.3). As a functional language Aladin
has neither variables nor control statements. The attribute notation S.a means
the attribute a of the symbol S. If S occurs in a production several times, the
first occurrence is denoted by S[1], the second by S[2], and so on. There is
a special language element for context conditions.
Attribute processing
A decorated syntax tree is built during attribute evaluation, but it is not
traversed in alternating passes from left to right and from right to left, as is
done in some other compiler compilers. A node is visited if there are no more
nodes to the left of it, and a parent node is visited when no more of the
children can be visited. The syntax tree is therefore not processed in a straight
direction. In fact, evaluation may sometimes step back some nodes to evaluate
attributes that could not be computed earlier. In this manner, the number of
passes over the tree can be reduced. The memory requirements for attributes in
the syntax tree are optimized by various algorithms. After the attribute evaluation, the decorated syntax tree is passed to a user program which generates the
target code.
Implementation
GAG is implemented in Standard Pascal under Unix BSD 4.2 on a Siemens
computer 7.760. It also generates compilers in Standard Pascal. Compilers for
Pearl, LIS, Pascal, and Ada have already been produced by GAG.
4.3 Example Attributed grammar as input for GAG
    % ------------------ symbol and attribute declarations ------------------
    TERM    digit:     value: INT SYNT;   % value is a synthesized integer attribute
    NONTERM Number:    value: INT SYNT;
    NONTERM Digitlist: value: INT SYNT;

    RULE r1: Number ::= ['-'] Digitlist
    STATIC
      Number.value := IF ... THEN -Digitlist.value ELSE Digitlist.value
      % No output of the attribute Number.value.
      % The attributed tree is passed to a user-written program,
      % which prints the results.
    END;

    RULE r2: Digitlist ::= digit
    STATIC
      Digitlist.value := digit.value
    END;

    RULE r3: Digitlist ::= Digitlist digit
    STATIC
      Digitlist[1].value := 10*Digitlist[2].value + digit.value
    CONDITION
      (Digitlist[2].value < 3276) OR
      ((Digitlist[2].value = 3276) AND (digit.value < 8))
    MESSAGE "Constant value too big"
    END;
4.4 MUG - modular compiler generator
Origin and scope
MUG (Modularer Übersetzer-Generator) was developed in 1985 at the
University of Dortmund (Germany) by Ganzinger and Vach. It processes so-called one-sweep grammars (Engelfriet and Filè [1981]). MUG supports all
phases of semantic analysis (attribute processing, optimization, and code
generation). However, it does not produce a scanner or a parser. Those can be
generated with YACC and then attached to the MUG system. Semantic
modules are written in Modula-2.
The underlying principles of MUG are substantially different from
traditional attributed grammars. Terminals are viewed as the types of some
semantic objects (so-called semantic sorts), nonterminals are viewed as the
types of syntax trees (so-called syntactic sorts). Productions are therefore
viewed as functions, mapping objects of syntactic and semantic sorts into
syntax trees which are themselves elements of syntactic sorts.
The translation of trees of an input grammar into trees of an output
grammar is called an attribute coupling of the two grammars. Attributes can
be classified as semantic
attributes, which contain semantic values (and
therefore, like the values of terminal symbols, are objects of semantic sorts)
and syntactic attributes, which represent subtrees of the output grammar (and
thus are objects of syntactic sorts). Semantic attributes are computed in
semantic rules, whereas syntactic attributes are built by applying productions
of the output grammar. Semantic attributes can also be viewed as ‘terminal
symbols' of the output grammar.
As a result of this view, several attribute coupling processes can be
concatenated so that the output grammar of the first coupling becomes the
input grammar of the second one. As an option, MUG can automatically
combine the two attribute couplings into a single one. The user can therefore
describe complex translation processes as a sequence of simple translations
(e.g. L-attributed grammars), which the system, hidden from the user, combines into a single attributed grammar that does not need to be L-attributed. In
this manner, readability is balanced with efficiency.
Description language
MUG uses one description language for all translation phases. It is based on
Modula-2. The production
    Prod1: A -> B c
is written in a function-like manner as
    CONSTRUCTOR Prod1(btree: B; cval: c): A
An attribute a of a nonterminal S is written as S^a. All nonterminals must
be declared together with their attributes and attribute types. For semantic
sorts, the user must write Modula-2 modules that export them as types unless
they are standard types of Modula-2. There must be separate modules for the
input grammar, the output grammar, and their attribute coupling. Semantic
rules can contain assignments with arbitrary Modula-2 expressions, function
calls, and if expressions. Syntactic attributes are calculated through constructors of the output grammar. Context conditions have no construct of their
own. They must be specified within semantic functions.
Attribute processing
The attribute processor generated by MUG uses the 'one-sweep' method, which is an L-attributed processing of the syntax tree in which the children of each node may first be rearranged into a suitable order.
Implementation
MUG was implemented in Modula-2 on a CADMUS computer. It generates
compilers in Modula-2 and has been used for its own implementation.
4.4 Example Attributed grammar as input for MUG

    SIGNATURE DEFINITION MODULE Numbers;  (*definition of the context-free input grammar*)
      FROM Values IMPORT Value;           (*syntactic sort from the output grammar*)
      FROM User IMPORT digit, minus;      (*semantic sorts (terminals)*)
      SORT Number, Digitlist;             (*syntactic sorts (nonterminals)*)
      (*rules of the context-free grammar*)
      CONSTRUCTOR PosNumber(dl: Digitlist): Number;
      CONSTRUCTOR NegNumber(m: minus; dl: Digitlist): Number;
      CONSTRUCTOR SingleDigit(d: digit): Digitlist;
      CONSTRUCTOR MoreDigits(dl: Digitlist; d: digit): Digitlist;
      (*attribution function for the context-free grammar*)
      OPERATOR Evaluate(n: Number): Value;
    END Numbers.

    SIGNATURE DEFINITION MODULE Values;   (*definition of the context-free output grammar*)
      SORT Value;
      CONSTRUCTOR Result(val: INTEGER): Value;
    END Values.

    ATTRIBUTATION MODULE Numbers;         (*attribute coupling of the above grammars*)
      FROM Values IMPORT Value;
      OPERATOR Evaluate(n: Number): Value;
        (*declaration of the attributes*)
        ATTR Number    SATTR nval: Value;
        ATTR Digitlist SATTR dval: INTEGER;
        (*attributations of the productions*)
        CONSTRUCTOR PosNumber(dl: Digitlist): Number;
        BEGIN
          PosNumber^nval = Result(dl^dval);
          (*the constructor "Result" builds a
            syntactical attribute of type "Value"*)
        END PosNumber;
        CONSTRUCTOR NegNumber(m: minus; dl: Digitlist): Number;
        BEGIN
          NegNumber^nval = Result(-dl^dval);
        END NegNumber;
        CONSTRUCTOR SingleDigit(d: digit): Digitlist;
        BEGIN
          SingleDigit^dval = d;
        END SingleDigit;
        CONSTRUCTOR MoreDigits(dl: Digitlist; d: digit): Digitlist;
        BEGIN
          MoreDigits^dval = 10 * dl^dval + d;
        END MoreDigits;
      END Evaluate;
    END Numbers.
4.5 Coco - compiler compiler
Origin and scope
Coco arose in 1983 at the University of Linz as a successor of a parser generator. It processes L-attributed grammars, which are viewed as procedural descriptions of a translation process. The compiler description is translated into an LL(1) parser with automatic error recovery and a semantic evaluator to which user modules can be attached. The user must further supply a main
program and a scanner (for which there is a scanner generator). It is possible
to generate multi-pass compilers with Coco.
Description language
The compiler description language Cocol is based on context-free grammars
in Wirth's EBNF notation. All terminals and nonterminals must be declared.
Each syntax symbol can have one or more attributes. A symbol S with an output attribute a is written as S<out:a> wherever it occurs within a production. Semantic actions are written directly in Modula-2. They may appear
at arbitrary points on the right-hand side of the productions. Attributes can be
accessed like normal variables. Context conditions are written as if statements
in semantic actions.
Attribute processing
Semantic evaluation takes place during the syntax analysis. A syntax tree of
the input is not built. Productions are processed strictly from left to right.
When a semantic action is encountered, it is executed immediately. Attribute
values of terminals are returned by the scanner, those of nonterminals are
passed using assignments generated by Coco.
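What "processed strictly from left to right" means can be seen in a hand-written counterpart of Example 4.5. The following Python sketch is not the code Coco generates; it assumes one character per token, keeps value and value1 as shared fields like Coco's attribute variables, and collects the WriteInt/WriteString output in a list instead of printing it. Each semantic action runs at the moment the parser reaches it, and the context condition is an ordinary if statement.

```python
class NumberParser:
    """Hand-written counterpart of the Coco grammar of Example 4.5."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.value = self.value1 = 0   # shared attribute variables
        self.output = []

    def at_digit(self):
        return self.pos < len(self.text) and self.text[self.pos] in "0123456789"

    def digit(self):
        if not self.at_digit():
            raise SyntaxError(f"digit expected at position {self.pos}")
        self.pos += 1
        return int(self.text[self.pos - 1])   # attribute delivered by the scanner

    def digitlist(self):                      # Digitlist<out:value>
        self.value = self.digit()             # digit<out:value>
        while self.at_digit():                # { digit<out:value1> sem ... endsem }
            self.value1 = self.digit()
            # the context condition is an ordinary IF inside the action
            if self.value < 3276 or (self.value == 3276 and self.value1 < 8):
                self.value = 10 * self.value + self.value1
            else:
                self.value = 0
                self.output.append("Constant too big")

    def number(self):
        if self.text.startswith("-", self.pos):      # "-" Digitlist<out:value>
            self.pos += 1
            self.digitlist()
            self.output.append(f"{-self.value:5d}")  # WriteInt(-value, 5)
        else:                                        # Digitlist<out:value>
            self.digitlist()
            self.output.append(f"{self.value:5d}")   # WriteInt(value, 5)
        return self.output


print(NumberParser("-42").number())  # ['  -42']
```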
Implementation
Coco is implemented in Modula-2 on various microcomputers including
Macintosh, IBM-PC, Atari, and Lilith. It is also available on IBM mainframes. Coco generates compilers in Modula-2. It has been used for the
construction of a multi-pass Modula-2 compiler and for the generation of
several tools for static program analysis.
4.5 Example Attributed grammar as input for Coco
    GRAMMAR Number
    SEMANTIC DECLARATIONS
      FROM InOut IMPORT WriteString, WriteInt;
      VAR value, value1: INTEGER;
    TERMINALS
      "-"
      digit <out:value>
    NONTERMINALS
      Number
      Digitlist <out:value>
    RULES
      Number =
          Digitlist<out:value>      sem WriteInt(value, 5);  endsem
        | "-" Digitlist<out:value>  sem WriteInt(-value, 5); endsem.
      Digitlist<out:value> =
          digit<out:value>
          { digit<out:value1>
            sem
              IF (value < 3276) OR ((value = 3276) AND (value1 < 8))
              THEN value := 10*value + value1;
              ELSE value := 0; WriteString("Constant too big");
              END;
            endsem
          }.
    ENDGRAM
4.6 Summary
This short overview of some of the better known compiler compilers has
shown that many powerful systems with complex input languages exist for the
definition of many exotic special cases. Why then are these generators so
seldom used for practical applications? There are many reasons. The most
significant is the fact that automatically generated compilers are simply less
efficient than manually coded ones. According to Koskimies et al. [1982], a Pascal compiler produced with HLP78 ran seven times slower and used three times as much memory (for its code alone!) as a hand-written compiler.
However, efficiency is not the main goal of a compiler compiler. Often it
is more important that the compiler description be short, formal, and complete.
Then it can be used as a prototype of a compiler implementation for a new
language or to study the techniques of compiler construction as such.
Compiler description languages are sometimes not easy to read. In most
cases ordinary BNF is used for the syntax definition. Although concise and
elegant, this notation often looks unnatural because of the recursion needed to
express repetitions. Attributes usually appear only in semantic rules and not
with the grammar symbols. This makes the productions short, but the reader
must extract from the semantic rules those attributes which belong to a given
syntax symbol. In many cases, the semantic rules may only be attribute
assignments. Therefore, important parts of the actual translation must be
hidden in procedures. Having these difficulties to contend with may even
make the compiler compiler a burden rather than a help.
Finally, most compiler compilers require a lot of memory themselves. For
example, GAG required 4 megabytes of main memory for the generation of an Ada compiler, and this amount of memory is not available on many microcomputers.
We believe that a compiler compiler should be a tool which is easy to
understand and easy to use. Above all, its input language should be clear and
natural, but its availability (e.g. on microcomputers) and efficiency are equally
important. These were the considerations behind the development of Coco and its input language Cocol.
Table 4.1 summarizes the main features of the described compiler
compilers.
[Table 4.1 Properties of various compiler compilers: compares YACC, HLP84, GAG, MUG, and Coco with respect to the class of attributed grammars, the notation of the context-free grammar, semantic rules (actions), attribute notation (with syntax examples), evaluation order, context conditions, and the languages to which each system has been applied.]
5
The compiler description language Cocol
This chapter describes Cocol, the input language of the compiler generator
Coco. A Cocol text essentially consists of an attributed grammar and
declarations. From this description, Coco generates a parser and a semantic
evaluator. The user has to provide a main program, a scanner, an error
message module and semantic modules to get a complete compiler. Some of
these modules can be generated by tools or are standard modules that do not
depend on the language to be processed.
The attributed grammar consists of a context-free grammar as a description of the compiler input and of semantic information as a description of how
this input is to be translated. When designing an attributed grammar one
usually starts with the context-free grammar and completes it step by step with
attributes, semantic actions and context conditions. Therefore this chapter is
arranged in two parts: the specification of Cocol as a syntax description
language and its specification as a semantic description language.
5.1 Lexical structure
A grammar description in Cocol consists of keywords, identifiers, strings,
numbers, comments and special characters.
Keywords
    ALIAS          ANY            DECLARATIONS   ENDGRAM
    ENDSEM         EPS            GRAMMAR        IN
    MACROS         NONTERMINALS   OUT            PRAGMAS
    RULES          SEM            SEMANTIC       TERMINALS
Keywords must be written with upper-case letters, except for the following
keywords that may also be written with lower-case letters, as they often
appear in a context where they are not to be emphasized.
alias
any
endsem
eps
in
out
sem
Identifiers
    identifier = letter {letter | digit}.
Identifiers may be of arbitrary length. Case is significant.
Strings
    string = quote {anybutquote} quote
           | apostrophe {anybutapostrophe} apostrophe.
quote means the character ", apostrophe means the character '. anybutquote
is any character except quote, anybutapostrophe is any character except
apostrophe. Strings must not extend beyond line boundaries.
Numbers
    number = digit {digit}.
Special characters
for the syntax description:
for the semantic description:
PR:
SUITE
Comments start with the string '--' and extend to the end of the line.
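These lexical rules translate almost directly into a scanner. The Python sketch below is one possible reading of them, not Coco's own scanner: it recognizes comments, identifiers, numbers and strings with a regular expression, classifies keywords using the two keyword lists above, and, as a simplifying assumption, returns every other non-blank character as a one-character special symbol without checking it against the permitted set. An unterminated string is not reported as an error.

```python
import re

KEYWORDS = {"ALIAS", "ANY", "DECLARATIONS", "ENDGRAM", "ENDSEM", "EPS",
            "GRAMMAR", "IN", "MACROS", "NONTERMINALS", "OUT", "PRAGMAS",
            "RULES", "SEM", "SEMANTIC", "TERMINALS"}
LOWER_CASE_OK = {"alias", "any", "endsem", "eps", "in", "out", "sem"}

TOKEN_RE = re.compile(r"""
    (?P<comment>--[^\n]*)                 # comment up to end of line
  | (?P<ident>[A-Za-z][A-Za-z0-9]*)       # identifier = letter {letter|digit}
  | (?P<number>[0-9]+)                    # number = digit {digit}
  | (?P<string>"[^"\n]*"|'[^'\n]*')       # strings must not cross lines
  | (?P<space>\s+)
  | (?P<special>.)                        # anything else: special character
""", re.VERBOSE)


def scan(text):
    """Return a list of (kind, lexeme) pairs for a Cocol text."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind in ("comment", "space"):
            continue
        if kind == "ident" and (lexeme in KEYWORDS or lexeme in LOWER_CASE_OK):
            kind = "keyword"
        tokens.append((kind, lexeme))
    return tokens


print(scan('GRAMMAR Real  -- start symbol\nRULES Sign = "+" | eps.'))
```

Because case is significant, "Rules" or "rules" would come out as identifiers, while "eps" is accepted as a keyword in lower case.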
5.2 Cocol as a syntax description language
The kernel of a Cocol text is the syntactic description of the language that the
generated compiler is to process.
    Grammar = "GRAMMAR" identifier SyntaxDeclarations Productions "ENDGRAM".
The syntax description consists of declarations for terminals and nonterminals and of the context-free grammar. The identifier following the keyword GRAMMAR is the grammar name. It is the root symbol (start symbol) of the grammar and must be declared as a nonterminal. We start with the productions and continue with the declarations later.
5.2.1 Productions
The productions of the context-free grammar are written in an EBNF
suggested by Wirth [1982] (square brackets enclose optional expressions,
curly brackets denote repetition zero or more times).
    Productions = "RULES" {Production}.
    Production  = identifier "=" Expression ".".
    Expression  = Term {"|" Term}.
    Term        = Factor {Factor}.
    Factor      = "(" Expression ")"
                | "[" Expression "]"
                | "{" Expression "}"
                | "eps"
                | "any"
                | Symbol.
    Symbol      = identifier | string.
5.1 Example Cocol grammar for real constants
    RULES
      Real     = Integer "." [Integer] [Exponent].
      Integer  = digit {digit}.
      Exponent = "E" ["+"|"-"] Integer.
The symbols Real, Integer and Exponent are nonterminals. The
symbols digit, "E", ".", "+" and "-" are terminals (they have no
productions).
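One common way to turn such productions into a parser is recursive descent: one function per nonterminal, an option [x] becomes an if statement, and a repetition {x} becomes a while loop. The Python sketch below is a hand-written recognizer for Example 5.1, not Coco output; it assumes that each character of the input is one terminal (so digit is any of 0 to 9), and the class RealParser and the function is_real are hypothetical names.

```python
class RealParser:
    """Recognizer for
         Real     = Integer "." [Integer] [Exponent].
         Integer  = digit {digit}.
         Exponent = "E" ["+" | "-"] Integer."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def sym(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def is_digit(self):
        return self.sym() != "" and self.sym() in "0123456789"

    def expect(self, ch):
        if self.sym() != ch:
            raise SyntaxError(f"{ch!r} expected at position {self.pos}")
        self.pos += 1

    def real(self):
        self.integer()
        self.expect(".")
        if self.is_digit():           # [Integer]
            self.integer()
        if self.sym() == "E":         # [Exponent]
            self.exponent()

    def integer(self):
        if not self.is_digit():
            raise SyntaxError(f"digit expected at position {self.pos}")
        while self.is_digit():        # digit {digit}
            self.pos += 1

    def exponent(self):
        self.expect("E")
        if self.sym() in ("+", "-"):  # ["+" | "-"]
            self.pos += 1
        self.integer()


def is_real(text):
    p = RealParser(text)
    try:
        p.real()
    except SyntaxError:
        return False
    return p.pos == len(text)         # the whole input must be consumed


print(is_real("3.14"), is_real("1.5E-3"), is_real(".5"))  # True True False
```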
eps
The symbol eps denotes the empty string (see Section 2.1) and is used to
describe empty alternatives.
5.2 Example The use of eps
    Sign = "+" | "-" | eps.
is equivalent to
    Sign = ["+" | "-"].
eps is not necessarily needed for the syntax description, but it is required if
one has to attach semantic actions to empty alternatives.
any
The symbol any denotes any terminal that does not start one of the other alternatives in the alternative chain to which the any symbol belongs. Thus any represents a whole set of terminals, namely all terminals that cannot be recognized instead of it at that point in the grammar.
5.3 Example The use of any
Option = "$" any.
Here, any means any terminal.
Token = keyword
      | identifier
      | number
      | any.
Here, any means any terminal except keyword, identifier or number (which may be recognized instead of it).
String = '"' {any} '"'.
Here, any means any terminal except '"' (which may be recognized instead of it).
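The rule can be stated as a set difference: any stands for all terminals minus the start symbols of the alternatives competing with it. The Python sketch below applies this to the three productions of Example 5.3. It assumes the competing start sets are already known (Section 7.4 describes how Coco collects such sets), and the terminal vocabulary is invented for the illustration.

```python
from __future__ import annotations


def any_set(all_terminals: set[str], competing_starts: list[set[str]]) -> set[str]:
    """Return the terminals that the symbol any stands for.

    competing_starts holds the start sets of the other alternatives at that
    point; for {any}, leaving the repetition counts as an alternative too.
    """
    excluded: set[str] = set()
    for starts in competing_starts:
        excluded |= starts          # these may be recognized instead of any
    return all_terminals - excluded


# Hypothetical terminal vocabulary for the productions of Example 5.3.
TERMINALS = {"keyword", "identifier", "number", "$", '"', ";"}

option_any = any_set(TERMINALS, [])                                         # Option = "$" any.
token_any = any_set(TERMINALS, [{"keyword"}, {"identifier"}, {"number"}])  # Token = ... | any.
string_any = any_set(TERMINALS, [{'"'}])                                    # String = '"' {any} '"'.
```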
Properties of a correct grammar
Coco generates a compiler only if the grammar is:
1. complete: there must exist a rule for every nonterminal;
2. free of redundancy: every nonterminal must occur in at least one derivation of the root symbol;
3. free of cycles: there must not be a nonterminal which can be derived from itself in one or more steps;
4. terminating: every nonterminal must be able to produce a string of terminals;
5. unambiguous: the grammar must be LL(1).
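Three of these conditions (completeness, freedom from redundancy and termination) reduce to simple reachability and fixpoint computations. The Python sketch below assumes a plain-BNF dictionary representation in which EBNF options and repetitions have already been expanded into alternatives and eps is an empty list. Coco works on its top-down graph instead (Section 7.5), and the function names are hypothetical. The cycle and LL(1) tests are not shown.

```python
from __future__ import annotations


def undefined_nonterminals(rules: dict[str, list[list[str]]], terminals: set[str]) -> list[str]:
    """Completeness: symbols used on a right-hand side that are neither terminals nor defined."""
    used = {s for alts in rules.values() for alt in alts for s in alt}
    return sorted(s for s in used if s not in terminals and s not in rules)


def unreachable_nonterminals(rules: dict[str, list[list[str]]], root: str) -> list[str]:
    """Redundancy: nonterminals that never occur in a derivation of the root symbol."""
    seen = {root}
    todo = [root]
    while todo:
        for alt in rules.get(todo.pop(), []):
            for s in alt:
                if s in rules and s not in seen:
                    seen.add(s)
                    todo.append(s)
    return sorted(set(rules) - seen)


def nonterminating_nonterminals(rules: dict[str, list[list[str]]], terminals: set[str]) -> list[str]:
    """Termination: repeat until no change, marking nonterminals that derive a terminal string."""
    ok: set[str] = set()
    changed = True
    while changed:
        changed = False
        for nt, alts in rules.items():
            # An alternative terminates if all its symbols are terminals or already marked.
            if nt not in ok and any(all(s in terminals or s in ok for s in alt) for alt in alts):
                ok.add(nt)
                changed = True
    return sorted(set(rules) - ok)


# Example grammar with one fault of each kind.
rules = {
    "S": [["A", "b"], ["D"]],   # D is used but has no rule
    "A": [["a"], ["a", "A"]],
    "B": [["b"]],               # never reachable from S
    "C": [["c", "C"]],          # can never produce a terminal string
}
terminals = {"a", "b", "c"}
```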
LL(1) conflicts do not necessarily mean serious errors. They can be viewed as
warnings in situations where the generated compiler will take the first
matching alternative and ignore the others. Sometimes this is what the user
wants, as in the well-known case of the dangling else.
5.4 Example How the compiler treats LL(1) conflicts
This is the grammar of the dangling else:
Statement   = ... | IfStatement | ... .
IfStatement = "IF" Expr "THEN" Statement ["ELSE" Statement].
When analyzing the string
IF a THEN IF b THEN c ELSE d
it is not clear whether the else clause belongs to the inner or to the outer if. During parsing, the first matching alternative is the optional else part of the inner if. The generated compiler takes this alternative, so the else clause is attached to the inner if.
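A top-down parser resolves the conflict in this way automatically: the inner IfStatement is still being parsed when ELSE arrives, and its optional else part is the first alternative that matches. The Python sketch below is a hand-written recursive-descent parser for a simplified version of the grammar. Reducing the condition to a single identifier is an assumption made for brevity. The sketch shows the resulting tree.

```python
from __future__ import annotations


def parse_statement(tokens: list[str], pos: int = 0):
    """Statement = "IF" ident "THEN" Statement ["ELSE" Statement] | ident.

    Simplified sketch: the condition is a single identifier. Returns
    (tree, next_position); an if-statement becomes ("if", cond, then, else).
    """
    if tokens[pos] == "IF":
        cond = tokens[pos + 1]
        if tokens[pos + 2] != "THEN":
            raise ValueError(f"THEN expected at token {pos + 2}")
        then_part, pos = parse_statement(tokens, pos + 3)
        else_part = None
        # ["ELSE" Statement]: the option is entered as soon as ELSE is the next
        # token, so the innermost open IF takes the first matching alternative.
        if pos < len(tokens) and tokens[pos] == "ELSE":
            else_part, pos = parse_statement(tokens, pos + 1)
        return ("if", cond, then_part, else_part), pos
    return tokens[pos], pos + 1
```

For the string above, the ELSE ends up in the inner if-node, and the outer if has no else part.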
5.2.2 Declarations
All terminals and nonterminals must be declared before they can be used in
productions. Declarations have the following order:
SyntaxDeclarations = TerminalDeclarations [PragmaDeclarations] NonterminalDeclarations.
Terminal declarations
TerminalDeclarations = "TERMINALS" {Symbol [AliasName]}.
AliasName            = "alias" Symbol.
Symbol               = identifier | string.
Terminals are declared by listing them after the keyword TERMINALS. Consecutive token numbers are assigned to them in the order of their declaration: the first symbol gets the number 1, the next one the number 2, and so on. If a symbol name contains a special character, it must be enclosed in quotes (e.g. "+", "plus-symbol").
The end-of-file symbol must not be declared. It is always assumed to have the token number 0. The lexical analyzer has to supply it as the last symbol of the input text. When it arrives, the syntax analyzer automatically interprets it as an indication that the input is exhausted. The end-of-file symbol must not (and cannot) be specified in a production.
A symbol may be given an alias name, which is used in error messages
by the generated compiler. If the alias name is omitted, the symbol name is
used instead of it. Alias names allow the use of short names in the grammar
and of expressive names in error messages.
5.5 Example Terminal declarations
TERMINALS
  id    alias identifier
  ":="  alias "becomes symbol"
  ";"   alias semicolon
Pragma declarations
Pragmas are a special feature of Cocol. They are neither terminals nor nonterminals and must not be used in productions. They may occur at any position in the input text and are read by the parser as if they were terminals, but they do not belong to the syntax of the language (examples of pragmas are options, the end-of-line symbol, and comments). Parsing is not influenced by pragmas, but they may carry semantic information (such as line numbers, option values, etc.). Pragmas can be used to propagate information between the passes of a multi-pass compiler.
PragmaDeclarations = "PRAGMAS" {Symbol}.
Symbol             = identifier | string.
Pragmas are declared by listing them after the keyword PRAGMAS. They are assigned consecutive token numbers, starting with the highest terminal number plus one.
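Taken together, the numbering rules (end-of-file 0, terminals from 1 in declaration order, pragmas continuing after the last terminal) and the alias rule for error messages fit in a few lines. The Python sketch below is an illustration under those rules only. The function names and the "<eof>" key are hypothetical, and the symbol names are chosen in the spirit of Examples 5.5 and 5.6.

```python
from __future__ import annotations

EOF_KEY = "<eof>"   # hypothetical key; the end-of-file symbol is never declared


def assign_token_numbers(terminals: list[str], pragmas: list[str]) -> dict[str, int]:
    """Number symbols in declaration order: eof = 0, terminals 1..n, pragmas n+1.."""
    numbers = {EOF_KEY: 0}
    for number, name in enumerate(terminals, start=1):
        numbers[name] = number
    for number, name in enumerate(pragmas, start=len(terminals) + 1):
        numbers[name] = number
    return numbers


def error_name(symbol: str, aliases: dict[str, str]) -> str:
    """Name used in error messages: the alias if one was declared, else the symbol name."""
    return aliases.get(symbol, symbol)


numbers = assign_token_numbers(["id", ":=", ";"], ["end of line", "option"])
aliases = {"id": "identifier", ":=": "becomes symbol", ";": "semicolon"}
```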
5.6 Example Pragma declarations
PRAGMAS
  "end of line"
  option
The purpose of pragmas will become clear when we attach semantic actions to
them (see Example 5.11).
Nonterminal declarations
NonterminalDeclarations = "NONTERMINALS" {identifier [AliasName]}.
AliasName               = "alias" Symbol.
Symbol                  = identifier | string.
Nonterminals are declared by listing them after the keyword NONTERMINALS. Their declaration order is insignificant. Nonterminals can be given an alias name too. The root symbol (grammar name) must also be declared as a nonterminal.
5.7 Example Nonterminal declarations
NONTERMINALS
  Stat  alias Statement
  Expr  alias Expression
5.3 Cocol as a semantic description language
The semantics of a translation are specified by attaching semantic actions, attributes and semantic declarations to the syntax description. The following grammar of Cocol shows that there are only a few locations (the optional Attributes and SemAction parts) where semantic parts have to be added to a syntax description in order to get an attributed grammar.
CocolText               = "GRAMMAR" identifier SyntaxDeclarations Productions "ENDGRAM" .
SyntaxDeclarations      = TerminalDeclarations [PragmaDeclarations] NonterminalDeclarations.
TerminalDeclarations    = "TERMINALS" {Symbol [Attributes] [AliasName]}.
PragmaDeclarations      = "PRAGMAS" {Symbol [Attributes] [SemAction]}.
NonterminalDeclarations = "NONTERMINALS" {identifier [Attributes] [AliasName]}.
AliasName               = "alias" Symbol.
Productions             = "RULES" {Production}.
Production              = identifier [Attributes] "=" Expression ".".
Expression              = Term {"|" Term}.
Term                    = Factor {Factor}.
Factor                  = Symbol [Attributes]
                        | "(" Expression ")"
                        | "[" Expression "]"
                        | "{" Expression "}"
                        | SemAction
                        | "eps"
                        | "any".
Symbol                  = identifier | string.

5.3.1 Semantic actions
A semantic action is a statement sequence on the right-hand side of a production, which is executed after the symbol to the left of it has been recognized and before the symbol to the right of it is recognized. Semantic actions may be written in any algorithmic programming language (in our Coco implementation this language is Modula-2). There are two kinds of semantic actions.
SemAction = SimpleAction | SemMacroCall.
Simple semantic actions
SimpleAction = "sem" {any} "endsem".
A semantic action is enclosed by the keywords sem and endsem. Between
them, any statements such as assignments, procedure calls, conditional
statements and loops are allowed. The syntactical correctness of the statements
is not checked by Coco.
5.8 Example Semantic actions
We want to have a compiler which counts the words in a text. The context-free grammar is
Text = {Word}.
Now we add semantic actions.
Text = sem count:=0 endsem
       {Word sem count:=count+1 endsem}
       sem IF count>0 THEN WriteCard(count,3); WriteString(" words") END endsem.
Since syntactic and semantic parts are intermixed and hard to read, we separate them into two 'columns':
Text =                   sem count:=0 endsem
   {Word                 sem count:=count+1 endsem
   }
                         sem IF count>0 THEN
                               WriteCard(count,3); WriteString(" words")
                             END
                         endsem.
Syntactic and semantic parts are separated clearly now. The production
must be read line by line from the left to the right.
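Read line by line, the two-column form corresponds directly to straight-line code in which each action runs at its position in the production. The Python sketch below is an analogy, not the code Coco generates. It assumes the scanner has already delivered the words as a list and returns the output as a string instead of writing it. WriteCard(count,3) is imitated by a field width of 3.

```python
from __future__ import annotations


def count_words(words: list[str]) -> str:
    """Text = {Word}. with the semantic actions of Example 5.8 (sketch)."""
    count = 0                        # sem count:=0 endsem
    for _word in words:              # {Word ...}
        count += 1                   #    sem count:=count+1 endsem
    if count > 0:                    # sem IF count>0 THEN
        return f"{count:3d} words"   #    WriteCard(count,3); WriteString(" words")
    return ""                        # END endsem
```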
The parameters of procedure calls in semantic actions may be marked as input, output or transient parameters by writing the characters '↓', '↑' or '↕' in front of them (substitute characters such as '!' are used on an ASCII keyboard). This is a simple way to make procedure calls more readable. In the resulting compiler these marks are removed.
5.9 Example Indication of data flow at parameters
ComputeValues(↓argument1, ↓argument2, ↑result);
Semantic macros
Sometimes a semantic action is needed at more than one location in a grammar. To avoid rewriting the action, the user can define a macro for it and call it whenever he needs it.
SemMacroDefinition = "sem" ":" MacroName ":" {any} "endsem".
SemMacroCall       = "sem" "(" MacroName ")" "endsem".
MacroName          = identifier.
A macro definition is a semantic action headed by a macro name which is enclosed in colons. It must be given in a special section of the semantic declarations (see Section 5.3.4). Note: The use of semantic macros also reduces the code size of the resulting compiler.
5.10 Example Semantic macros
Suppose the last semantic action of Example 5.8 is needed more than once. The action is defined as a macro in the semantic declarations as follows (see Section 5.3.4):
MACROS
sem :WriteCounter:
  IF count>0 THEN
    WriteCard(count,3); WriteString(" words")
  END
endsem
It may then be called by writing
sem (WriteCounter) endsem
Semantic actions for pragmas
A semantic action may be associated with the declaration of a pragma. This
means that the action is executed every time the parser reads the pragma. In
this way a pragma can cause the execution of a semantic action although it
does not occur in any production.
5.11 Example Semantic actions for pragmas
PRAGMAS
  eolsy  sem PrintLineInfo;   -- call a semantic procedure
             Emit(↓eol)       -- write pragma to next interpass file
         endsem

5.3.2 Attributes
Attributes describe semantic properties of symbols and their context.
Attributes    = "<" InAttributes ">"
              | "<" OutAttributes ">"
              | "<" InAttributes ";" OutAttributes ">".
InAttributes  = "in" ":" InAttr {"," InAttr}.
OutAttributes = "out" ":" OutAttr {"," OutAttr}.
InAttr        = identifier | number.
OutAttr       = identifier.
In Cocol, attributes play the role of parameters of the grammar symbols. They are classified into input attributes, which are passed to a nonterminal for its recognition, and output attributes, which arise during the recognition of a symbol.
We also distinguish between formal and actual attributes. Formal attributes occur in the declaration of a symbol or are attached to nonterminals on the left-hand side of a production. Actual attributes are attached to symbols on the right-hand side of a production.
5.12 Example Attributes
NONTERMINALS
  Variable <in:type; out:object>          -- type:   formal input attribute
                                          -- object: formal output attribute
RULES
  Variable <in:type; out:object> = ...    -- type:   formal input attribute
                                          -- object: formal output attribute
  Declaration = Variable <in:tp; out:obj> ...
                                          -- tp:  actual input attribute
                                          -- obj: actual output attribute
Attribute names may be used like variables in semantic actions.
Attributes of nonterminals
Nonterminals may have input and output attributes of arbitrary types. The type
of an attribute is declared like the type of any other variable (see Section
5.3.4). Formal and actual attributes must be assignment compatible in the
sense of Modula-2, although this is not checked by Coco.
Whenever a nonterminal occurs, all its attributes must follow it. Formal
and actual attributes must correspond in number, sequence, and kind (in or
out). A numeric constant may only be specified as an actual input attribute.
Attribute evaluation is similar to parameter passing in procedures: before
the recognition of a nonterminal is started, the values of the actual input
attributes of the nonterminal are assigned to its formal input attributes; when
the nonterminal has been recognized, the formal output attribute values are
assigned to its actual output attributes.
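The parameter-passing analogy can be taken literally. In the Python sketch below, each nonterminal becomes a function whose parameters are its input attributes and whose return value carries its output attributes. The productions are hypothetical. They loosely follow the names in Example 5.12 but use a C-like declaration syntax, so the type is known before the variables, as an L-attributed grammar requires. Coco's generated code does not work this way internally: as Section 5.3.5 explains, its semantic objects are global.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Obj:
    """Hypothetical output attribute: a declared variable with its type."""
    name: str
    type: str


def variable(tokens: list[str], pos: int, type_: str) -> tuple[Obj, int]:
    """Variable <in:type; out:object> = identifier.

    type_ is the formal input attribute; the returned Obj is the formal output attribute.
    """
    return Obj(tokens[pos], type_), pos + 1


def declaration(tokens: list[str], pos: int = 0) -> tuple[list[Obj], int]:
    """Declaration = typename Variable<in:tp; out:obj> {"," Variable<in:tp; out:obj>} ";"."""
    tp = tokens[pos]                          # the type name precedes the variables
    objs = []
    obj, pos = variable(tokens, pos + 1, tp)  # actual input tp -> formal input type_
    objs.append(obj)                          # formal output -> actual output obj
    while pos < len(tokens) and tokens[pos] == ",":
        obj, pos = variable(tokens, pos + 1, tp)
        objs.append(obj)
    if pos >= len(tokens) or tokens[pos] != ";":
        raise ValueError(f"';' expected at token {pos}")
    return objs, pos + 1
```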
Attributes of terminals and pragmas
Terminals and pragmas may have only output attributes. For implementation
reasons their size is restricted to word size. This restriction can be circumvented by using abstract data types for longer attributes.
Whenever a terminal or a pragma occurs, all its attributes must follow it.
For terminals, the names of the formal attributes are insignificant, but for
pragmas they are significant as they may be used in a semantic action.
Pragmas don't have actual attributes since they cannot appear on the right-hand side of a production. The attribute values of terminals and pragmas are supplied by the scanner (see Section 6.4.2).

5.3.3 Context conditions
There is no special language construct for context conditions in Cocol. They
are written as conditional statements in semantic actions. This has the
drawback of hiding them somewhat but has the advantage that arbitrary error
actions can be associated with them.
5.13 Example Context conditions
sem IF type1=type2       -- context condition
    THEN ...             -- semantic action
    ELSE ...             -- error action
    END
endsem
5.3.4 Semantic declarations
All variables, procedures and named constants that are used as attributes or in
semantic actions must be declared. The compiler description can be viewed as
a module to which these objects are local. The user may also import objects
from other modules.
SemanticDeclarations = [ObjectDeclarations] [SemMacroDeclarations].
Declarations of semantic objects
ObjectDeclarations   = "SEMANTIC" "DECLARATIONS" modulatext.
modulatext is an arbitrary text of import statements, constant, type, variable,
or procedure declarations in Modula-2. The syntax of this text is not checked
by Coco.
5.14 Example Declarations of semantic objects
SEMANTIC DECLARATIONS
FROM InOut IMPORT WriteCard, WriteString;
FROM UserModule IMPORT UserProcedure;
CONST maxint = 32767;
VAR field: ARRAY[1..100] OF CHAR;
PROCEDURE Equal(x,y: ARRAY OF CHAR): BOOLEAN;
BEGIN
  ...
END Equal;
Declaration of semantic macros
At this point the user may declare a set of semantic macros which can be used in the productions.
SemMacroDeclarations = "MACROS" {SemMacroDefinition}.
SemMacroDefinition   = "sem" ":" MacroName ":" {any} "endsem".
MacroName            = identifier.
An example of the definition and the use of a semantic macro can be found in
Section 5.3.1 (Example 5.10).
5.3.5 Scope of semantic objects
For implementation reasons, the scope of a semantic object cannot be restricted to a single production: all declared and imported objects are global to the
whole compiler description. This means that the value of a semantic object
may be destroyed by a nonterminal that is processed between the assignment
and the use of that object. One has to resort to the following remedies:
1. Naming conventions. Every production should use its own names for those attributes and semantic objects which may be destroyed by another production. This reduces the problem to semantic objects of recursive nonterminals.
2. Stacking. All values which may be destroyed by a nonterminal should be stacked before this nonterminal is entered and unstacked afterwards.
5.15 Example Stacking of semantic objects
Expression<out:exprval> =
    Term<out:exprval>
    { "+"                 sem Push(↓exprval) endsem
      Term<out:x>         sem Pop(↑exprval); exprval:=exprval+x endsem
    }.
Term<out:termval> =
    Factor<out:termval>
    { "*"                 sem Push(↓termval) endsem
      Factor<out:x>       sem Pop(↑termval); termval:=termval*x endsem
    }.
Factor<out:factval> =
    integer<out:factval>
  | "(" Expression<out:factval> ")".
The original values of exprval and termval are destroyed by the recursive calls to Term and Factor, so they must be saved on a stack.
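The effect of stacking can be reproduced outside Coco. The Python sketch below mimics Example 5.15 by keeping every attribute in a shared field, as Coco keeps its semantic objects global. It assumes integer tokens and the operators "+" and "*". The class name and the use_stack switch are inventions for the illustration. With the switch off, the nested expression inside parentheses overwrites exprval and termval, and the result is wrong.

```python
from __future__ import annotations


class StackingEvaluator:
    """Mimics Example 5.15: all attributes are shared (global-like) fields."""

    def __init__(self, tokens: list[str], use_stack: bool = True) -> None:
        self.tokens = tokens
        self.pos = 0
        self.use_stack = use_stack
        # Shared attribute variables, visible to every "production".
        self.exprval = self.termval = self.factval = self.x = 0
        self.stack: list[int] = []

    def peek(self) -> str | None:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str | None:
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self) -> None:
        # Expression<out:exprval> = Term<out:exprval> { "+" Term<out:x> }.
        self.term()
        self.exprval = self.termval               # Term<out:exprval>
        while self.peek() == "+":
            self.advance()
            if self.use_stack:
                self.stack.append(self.exprval)   # sem Push(↓exprval) endsem
            self.term()
            self.x = self.termval                 # Term<out:x>
            if self.use_stack:
                self.exprval = self.stack.pop()   # sem Pop(↑exprval);
            self.exprval = self.exprval + self.x  #     exprval:=exprval+x endsem

    def term(self) -> None:
        # Term<out:termval> = Factor<out:termval> { "*" Factor<out:x> }.
        self.factor()
        self.termval = self.factval
        while self.peek() == "*":
            self.advance()
            if self.use_stack:
                self.stack.append(self.termval)   # sem Push(↓termval) endsem
            self.factor()
            self.x = self.factval                 # Factor<out:x>
            if self.use_stack:
                self.termval = self.stack.pop()   # sem Pop(↑termval);
            self.termval = self.termval * self.x  #     termval:=termval*x endsem

    def factor(self) -> None:
        # Factor<out:factval> = integer<out:factval> | "(" Expression<out:factval> ")".
        tok = self.advance()
        if tok == "(":
            self.expression()
            self.factval = self.exprval
            if self.advance() != ")":
                raise ValueError("')' expected")
        elif tok is not None and tok.isdigit():
            self.factval = int(tok)
        else:
            raise ValueError(f"unexpected token {tok!r}")


def evaluate(tokens: list[str], use_stack: bool = True) -> int:
    ev = StackingEvaluator(tokens, use_stack)
    ev.expression()
    if ev.pos != len(tokens):
        raise ValueError("unexpected input after expression")
    return ev.exprval
```

For "2 + 3 * ( 4 + 5 )" the stacked version yields 29. Without stacking it yields 54, because the inner expression leaves exprval = 9 and termval = 5 behind.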
6 The compiler compiler Coco
This chapter describes the compiler compiler Coco from the user’s point of
view. It contains everything the user needs to know in order to produce a
compiler with Coco. Section 6.1 presents a survey of the main characteristics
of Coco, Section 6.2 describes the components of the generated compilers,
and Section 6.3 shows how these compilers work. Since Coco produces only
the basic parts of a compiler, the user must supply additional modules to get a
complete compiler. Section 6.4 describes the interfaces for these modules and
Section 6.5 shows how a multi-pass compiler can be produced with Coco.
6.1 Characteristics
Coco is a program which generates the basic parts of a compiler from a
compiler description that is supplied as its input. The characteristics of Coco
are:
1. The compiler definition language Cocol is easy to read and easy to learn. It is based on L-attributed grammars whose syntax rules are written in Wirth's EBNF notation, and whose semantic actions are coded directly in Modula-2.
2. Coco and the compilers produced by it are small and efficient, since they use simple analysis techniques (table-driven top-down parsing and L-attributed grammars), and since the parser tables are encoded in a very compact form (G-code). Therefore, they can be used efficiently on microcomputers with a small memory and limited processor performance.
3. The generated compilers contain a syntax error-recovery algorithm that is automatically derived from the attributed grammar. This frees the user from developing individual error handlers for each target compiler.
4. The user can attach modules of his own to the generated compiler parts, thus adapting the compiler to his particular needs.
5. The input grammar is checked for completeness, consistency, and unambiguity.
6. Coco supports the production of multi-pass compilers for languages that cannot be translated in a single pass, or that are so large that a single-pass compiler will not fit into memory.
7. Coco offers the possibility of excluding selected source text portions from syntax analysis. Thus, it is possible to describe complements of regular languages, or to forward parts of the input from one pass to the next without modification.
8. Besides terminals and nonterminals, Coco provides a third class of symbols called pragmas. Pragmas are special terminals that can appear at arbitrary positions in the input stream, but are not part of the syntax of the language itself (e.g. end-of-line symbols or compiler options).
How to invoke Coco
The invocation of Coco and the naming of the files involved depend on the
computer on which Coco is running. We describe the version for the Apple
Macintosh. On the Macintosh, Coco is invoked by clicking its icon and by
selecting an input file from the open dialog box which shows all available text
files. Fig. 6.1 is a block diagram of a Coco run.
Fig. 6.1 Input and output files of Coco
Coco reads a compiler description and produces the following:
1. a syntax analyzer as described in Section 2.5 together with parser tables (G-code and symbol information);
2. a semantic evaluator as described in Section 3.6;
3. a source list of the Cocol input with any syntax and semantic error messages, with the results of the grammar tests and with statistical data about the grammar.
The syntax analyzer and the semantic evaluator are generated from program
frames on files. On the Macintosh, the generated parts are written to the
following files:
Syntax analyzer:     grammarnamesyn.DEF, grammarnamesyn.MOD
Semantic evaluator:  grammarnamesem.DEF, grammarnamesem.MOD
Source list:         inputname.LST
grammarname is the grammar name specified in Cocol, inputname is the
name of the input file. Section 8.3 shows an example of these files.
6.2 Components of the generated compiler
In order to get a complete compiler, the user must attach his own modules to
the compiler parts produced by Coco. The following table shows which parts
are generated by Coco, which must be supplied by the user, and which are
available as standard modules.
Generated by Coco    User-supplied       Standard module
Syntax analyzer      Main program        Error message module
Semantic evaluator   Lexical analyzer
                     Semantic modules
Hence, Coco generates only the basic parts of a compiler (those which are
described by the attributed grammar). For flexibility, the remaining parts may
be written individually, although they are very similar in all compilers (see
program listings in Appendix F).
The lexical analyzer can be generated with the scanner generator Alex
(Mössenböck [1986]), which is a separate tool not described in this book. It
produces a scanner module in Modula-2 that fits exactly with the modules generated by Coco.
The semantic modules are written in Modula-2. Only a few conventions have to be obeyed (see Section 6.4).
6.3 Operation of the generated compiler
Figure 6.2 shows the overall structure of a generated single-pass compiler.
The main program calls the syntax analyzer. The syntax analyzer parses the
source program by interpreting the G-code and executes semantic actions
contained in the semantic evaluator, which in turn call semantic procedures to
emit the target code. A filter procedure between the actual syntax analyzer and
the lexical analyzer filters any pragmas out of the input stream and processes
them semantically.
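In outline, such a filter is a loop that executes the action of each pragma it encounters and passes every other symbol on to the parser. The Python generator below sketches only that idea and does not reproduce the generated Modula-2 filter. The function name, the symbol numbers and the dictionary of actions are assumptions.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator


def filter_pragmas(symbols: Iterable[int],
                   pragma_actions: dict[int, Callable[[], None]]) -> Iterator[int]:
    """Pass terminals (and end-of-file, 0) to the parser; run pragma actions instead."""
    for sy in symbols:
        action = pragma_actions.get(sy)
        if action is not None:
            action()        # semantic action declared with the pragma
        else:
            yield sy        # only these symbols reach the syntax analyzer


log = []
EOLSY = 4   # hypothetical: terminals are numbered 1..3, the pragma eolsy gets 4
parsed = list(filter_pragmas([1, EOLSY, 2, 3, EOLSY, 0],
                             {EOLSY: lambda: log.append("line info")}))
```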
To create a multi-pass compiler, one must write a compiler description for
each pass separately and translate it with Coco. This results in a syntax
analyzer and a semantic evaluator for each pass. Figure 6.3 shows the
interaction of the generated parts in a two-pass compiler. The first pass reads
the source program, processes it and generates an intermediate language (IL).
The second pass reads the intermediate language, processes it again and
generates the target code.
Fig. 6.2 Overall structure of a generated single-pass compiler
Fig. 6.3 Overall structure of a generated two-pass compiler

6.4 Interfaces of the generated compiler
A compiler nucleus produced by Coco has four interfaces (shown in Fig.
6.4). It is called by the main module, reads the input stream, translates it into
an output stream, and produces error messages. This nucleus is the same for
all generated compilers. The user must attach some of his own modules to
these interfaces to adapt the compiler to his particular needs.
Fig. 6.4 Interfaces of a generated compiler

6.4.1 Caller interface
The main program must call the syntax analyzer of the generated compiler to
perform the syntax analysis and semantic processing of the input text. The
following definition module shows the interface between the syntax analyzer
and the main program.
DEFINITION MODULE grammarnamesyn;
  VAR printinput: BOOLEAN;   (*trace the input?*)
      printnodes: BOOLEAN;   (*trace the parser?*)
  PROCEDURE Parse(VAR correct: BOOLEAN);
END grammarnamesyn.
grammarnamesyn is the name of the generated syntax analyzer (the grammar
name from Cocol with the suffix syn). The procedure Parse is the actual
syntax analyzer. It must be called from the main program of the compiler.
Prior to this, the lexical analyzer (see Section 6.4.2) must be initialized and
ready to supply the first symbol. The parameter correct shows if syntax
errors have been found. The variables printinput and printnodes can be set to
TRUE in order to produce a trace of the syntax analysis for debugging.
6.4.2 Input interface
The syntax analyzer expects the input from a procedure GetSy which must be
supplied by the user in a module grammarnamelex (grammar name from
Cocol with the suffix lex). The corresponding definition module must look
like this:
DEFINITION MODULE grammarnamelex;
  VAR typ:  CARDINAL;               (*current symbol number*)
      at:   ARRAY[1..10] OF CHAR;   (*attributes of the current symbol*)
      line: CARDINAL;               (*current symbol line number*)
      col:  CARDINAL;               (*current symbol column number*)
  PROCEDURE GetSy;
END grammarnamelex.
Every time the syntax analyzer needs a new terminal, it calls the procedure
GetSy which returns the symbol number, line number and column number of
the next source symbol in the global variables typ, line and col. It also fills
the array at. If a symbol has i attributes, then at[1..i] holds their values. at is implicitly imported in any attributed grammar. It can hold at most 10 attributes, which experience has shown to be sufficient. If they are imported, typ, line, and col can be used in the attributed grammar (together with at) to get the symbol number, the position and the attributes of symbols that are recognized by the special symbol any.
The symbol numbers returned by GetSy must correspond to the declaration sequence of the terminals and pragmas in the compiler description. The
first declared symbol must have the number 1, the next symbol must have 2
and so on. At the end of the input stream GetSy must return an end-of-file
symbol which by convention has the symbol number 0.
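A user-written input module only has to honour this protocol. The Python class below is a hypothetical analogue of grammarnamelex for three terminals declared in the order id, ":=", ";", so they are numbered 1, 2 and 3. Giving an identifier its spelling-table index as its single attribute is an assumption made for the illustration. at is modelled as an 11-element list so that indices 1 to 10 can be used.

```python
from __future__ import annotations


class Lexer:
    """Hypothetical Python analogue of a grammarnamelex module.

    Each get_sy() call sets typ, at, line and col for the next symbol.
    Symbol numbers assume the declaration TERMINALS id ":=" ";".
    """

    EOF, ID, BECOMES, SEMICOLON = 0, 1, 2, 3

    def __init__(self, text: str) -> None:
        self.text = text
        self.i = 0
        self.cur_line, self.cur_col = 1, 1
        self.names: list[str] = []   # spelling table (assumption)
        self.typ = self.EOF
        self.at = [0] * 11           # at[1..10]; index 0 unused
        self.line = self.col = 0

    def _advance(self) -> None:
        # Move past one character, keeping track of line and column.
        if self.text[self.i] == "\n":
            self.cur_line += 1
            self.cur_col = 1
        else:
            self.cur_col += 1
        self.i += 1

    def get_sy(self) -> None:
        while self.i < len(self.text) and self.text[self.i].isspace():
            self._advance()                      # skip blanks and line ends
        self.line, self.col = self.cur_line, self.cur_col
        if self.i >= len(self.text):
            self.typ = self.EOF                  # end of input: symbol number 0
            return
        ch = self.text[self.i]
        if ch.isalpha():
            start = self.i
            while self.i < len(self.text) and self.text[self.i].isalnum():
                self._advance()
            name = self.text[start:self.i]
            if name not in self.names:
                self.names.append(name)
            self.typ = self.ID
            self.at[1] = self.names.index(name) + 1   # one attribute: spelling index
        elif self.text.startswith(":=", self.i):
            self._advance()
            self._advance()
            self.typ = self.BECOMES
        elif ch == ";":
            self._advance()
            self.typ = self.SEMICOLON
        else:
            raise ValueError(f"illegal character {ch!r} at {self.cur_line}:{self.cur_col}")
```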
6.4.3 Output interface
For the generation of object code and other compiler outputs the user is not bound by any restrictions. One can arbitrarily attach one's own modules to the compiler nucleus and call one's procedures from the semantic actions of the attributed grammar.
Thus, the output interface is the interface to all user-supplied semantic modules. It is described by the import clauses in the semantic declarations of the compiler description and by the imported definition modules.
6.4.4 Syntax error interface
The syntax analyzer of the generated compiler automatically recovers from a
syntax error and gathers information about the cause of error. However, the
user must provide for the output of the error message by supplying a
procedure SyntaxError exported from a module Errors (see standard module
in Appendix F). This procedure is called by the syntax analyzer each time a
syntax error occurs. It can print the error message immediately or store it in
order to display all error messages together at the end of the compilation. The
definition module Errors must have the following form:
DEFINITION MODULE Errors;
  TYPE
    Symbolname = ARRAY[1..25] OF CHAR;
    Errorptr   = POINTER TO Errornode;
    Errornode  = RECORD
                   txt:  Symbolname;  (*symbol name*)
                   ils:  CARDINAL;    (*length of symbol name*)
                   next: Errorptr;    (*to next symbol of the same message*)
                 END;
  PROCEDURE SyntaxError(symbols: Errorptr; line, col: CARDINAL);
END Errors.
SyntaxError has three parameters: symbols is a pointer to a linked list containing the symbol at the error location followed by the symbols that were expected there (if available, alias names are used in place of symbol names). The parameters line and col indicate the line number and column number of the error location.
Figure 6.5 shows a sample list of expected symbols pointed to by the
parameter symbols.
Fig. 6.5 List of expected symbols. colon is the symbol causing the error; semicolon or END were expected instead
The first node of the list contains the symbol that caused the error (in this case the colon); the subsequent nodes contain the symbols that were expected instead of the erroneous symbol (in this case semicolon and END).
SyntaxError can now produce the following message:
Syntax error in line ... column ... near colon: semicolon or END expected
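A SyntaxError procedure essentially walks this list. The Python function below sketches that logic for a node type modelled on Errornode. Only the name and link fields are kept, because Python strings carry their own length. The exact wording of the message is an assumption based on the sample above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorNode:
    """One node of the list passed to SyntaxError (name plus link)."""
    txt: str
    next: Optional[ErrorNode] = None


def syntax_error_message(symbols: ErrorNode, line: int, col: int) -> str:
    """Format a message: the first node is the error symbol, the rest were expected."""
    expected = []
    node = symbols.next
    while node is not None:
        expected.append(node.txt)
        node = node.next
    text = f"Syntax error in line {line} column {col} near {symbols.txt}"
    if not expected:
        return text
    if len(expected) == 1:
        alternatives = expected[0]
    else:
        alternatives = ", ".join(expected[:-1]) + " or " + expected[-1]
    return f"{text}: {alternatives} expected"
```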
6.5 Generation of multi-pass compilers
With L-attributed grammars, some languages can only be translated in multiple
passes. Some other languages are so complex that a single-pass compiler
would not fit into the memory of a microcomputer. For these reasons, a
compiler must often be split into several passes.
Each pass is a compiler of its own. It reads the source program or an intermediate language, and produces a new intermediate language or the target program. If somebody wants to write a multi-pass compiler, he must write a compiler description for each pass, and then put the produced compiler passes in sequence (see Fig. 6.3). Cocol has features that are specially designed for the generation of multi-pass compilers:
Input from an intermediate language. It is possible to read an intermediate language file instead of a source text by simply supplying an appropriate input procedure GetSy (see Section 6.4.2).
Pragmas serve mainly to pass control information from one pass to the
next in the intermediate language. Before they get to the syntax analyzer of the
next pass they are extracted from the input stream and processed semantically.
The symbol any. The grammar symbol any can be used to exclude parts
of the source text from the syntax analysis, and forward it unchanged to the
next pass.
6.1 Example Application of any
A typical application of the complement symbol any is to process
declarations in the first pass of a compiler and statements in the second
pass. The following example skips statements and forwards them to the
next pass:
Block = Declarations
        BEGINSY
        { any   sem Copy(↓typ, ↓line, ↓col, ↓at)   -- copy symbol to next
                endsem                              -- intermediate language
        }
        ENDBLOCKSY.
Here, any denotes all terminal symbols except ENDBLOCKSY. It can
be semantically processed using the variables typ and at exported by the
lexical analyzer (see Section 6.4.2).
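In the generated parser, the repetition { any ... } keeps looping as long as the next symbol differs from ENDBLOCKSY. The Python sketch below models just that loop. The Symbol record, the symbol numbers and the list standing in for the intermediate file are assumptions, and treating end-of-file as an error is a choice made for the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Symbol:
    """Hypothetical scanner output: symbol number, position and attributes."""
    typ: int
    line: int
    col: int
    at: tuple[int, ...] = ()


def forward_statements(symbols: list[Symbol], pos: int, endblocksy: int,
                       out: list[Symbol]) -> int:
    """{ any sem Copy(...) endsem } ENDBLOCKSY: returns the position after ENDBLOCKSY."""
    while symbols[pos].typ != endblocksy:
        if symbols[pos].typ == 0:          # assumption: any never matches end-of-file
            raise ValueError("ENDBLOCKSY expected")
        out.append(symbols[pos])           # Copy: forward the symbol unchanged
        pos += 1
    return pos + 1                         # skip ENDBLOCKSY itself
```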
7 The implementation
In this chapter we will show how Coco is structured and how it works. First
we provide an overview of its design (7.1). Then we describe the internal data
structures such as the symbol list (7.2) and the top-down graph (7.3), as well
as the collection of some sets of terminal symbols (7.4). Section 7.5 covers
various grammar tests which the top-down graph is subjected to before the
target compiler is generated. The last three sections cover the generation of the
compiler parts, namely the parser tables (7.6), the syntax analyzer (7.7), and
the semantic evaluator (7.8). Section 8.3 shows an example of the generated
compiler parts for a specific input grammar.
At the beginning of each section, a diagram is used to illustrate how this section relates to the structure of Chapter 7.
Fig. 7.1 Structure of Chapter 7
We describe algorithms in an abstract manner, using Adele or Cocol. Appendix F contains the concrete implementation of Coco. Details that are not necessary for understanding the algorithms are omitted, as they can be found in the program listings.
Coco is written in Modula-2 and has been implemented on various microcomputers including Macintosh, IBM-PC, Atari and Lilith. It produces compilers in Modula-2 and was used for its own implementation, too. We describe the implementation on the Macintosh.

7.1 Survey
Like any compiler, Coco is composed of an analysis part (front end) and a
synthesis part (back end). The analysis part consists of a lexical analyzer and
a syntax analyzer. The synthesis part consists of a semantic evaluator with
several semantic modules attached to it (Fig. 7.2).
Fig. 7.2 Structure of Coco with its main tasks shown as semantic modules
From the above, the main tasks of Coco are:
1. handling a symbol list: symbol information (name, symbol number, attributes, scope, etc.) is stored;
2. handling a top-down graph: graph nodes are generated and linked to form subgraphs;
3. testing the grammar: the grammar is checked to see if it is complete, non-circular, and LL(1). It is also checked to see whether all nonterminals can be reached and derived into terminal strings;
4. generating the syntax analyzer: the source code of the generated syntax analyzer is built from fixed frame parts and variable parts derived from the compiler description. It includes LL(1) parser tables generated from the attributed grammar;
5. generating the semantic evaluator: the source code of the semantic evaluator is built from fixed frame parts and from semantic actions and declarations copied from the compiler description.
The main algorithm of Coco is as follows:
Coco:
  Initialize lexical analyzer;
  Parse(↑ok);
  if ok then
    Find deletable symbols;
    Insert eps-nodes before deletable nt's;   -- Section 7.3.3
    Delete redundant eps-nodes;
    Get symbol sets;                          -- Section 7.4
    Test grammar(↑ok);                        -- Section 7.5
  end;
  if ok then Generate compiler;               -- Sections 7.6 and 7.7
  else Print error message;
  end;
end Coco;
The procedure Parse parses the input text and calls the semantic actions for
the construction of the top-down graph and the symbol list as well as for the
generation of the semantic evaluator. After some tests and transformations of
the data structures the target compiler is produced.
7.2 Structure of the symbol list
Coco handles a symbol list with information about terminals, nonterminals,
and pragmas. This section describes its representation and shows how it is
filled.
7.2.1 Symbol list representation
The symbol list is a linear list of symbol nodes each of them describing a
syntax symbol. The list is indexed by symbol numbers.
TYPE
  Symboltype = (eps, t, pr, nt, any, err);
                 (*eps, terminal, pragma, nonterminal, any, error symbol*)
  Symbolnode = RECORD
    spix:      CARDINAL;          (*spelling index of symbol name*)
    aliasspix: CARDINAL;          (*spelling index of alias name*)
    nra:       CARDINAL;          (*number of attributes*)
    CASE typ: Symboltype OF       (*symbol kind*)
      t, eps, any:                (*nothing*)
    | pr:
        sem1, sem2: CARDINAL;     (*pragma semantics*)
    | nt, err:
        start:   CARDINAL;        (*start of top-down graph*)
        del:     BOOLEAN;         (*TRUE if deletable*)
        firstat: Attributeptr;    (*to first formal attribute*)
    END;
  END;
  Symbollist = ARRAY [0..maxsymbol] OF Symbolnode;

Fig. 7.3 Structure of Section 7.2 (symbol list representation, symbol list construction)
The fields spix, aliasspix, nra, and typ are filled when the symbol is
declared. For terminals, this is the only information stored in the symbol list.
The node of a pragma has two additional fields denoting the semantic
actions which the generated compiler has to execute when it reads this pragma.
The first action is for the output attribute assignments (Section 7.8.4), the
second is the semantic action associated with this pragma in Cocol. If no
actions are to be executed, both fields are zero. The fields are filled when the
pragma is declared.
Nonterminal nodes contain additional information: The field start points
to the root of the top-down graph of this specific nonterminal. It is set when
the corresponding rule has been processed. At the same time, the field del is
set, which indicates whether the nonterminal is directly deletable, i.e. if it can
be immediately derived into the empty string. The indirect deletability of a
nonterminal can only be determined when the top-down graphs of all
nonterminals have been built (see Section 7.4.1). Finally, nonterminal nodes have a field firstat pointing to a list of formal attributes. This list contains the name and direction (input or output) of each attribute of the nonterminal. The attribute list is built when the nonterminal is declared. It is implemented as follows:
TYPE
  Direction    = (up, down);          (*attribute direction*)
  Attributeptr = POINTER TO Attribute;
  Attribute    = RECORD
    spix: CARDINAL;                   (*attribute name*)
    dir:  Direction;                  (*up, down*)
    next: Attributeptr;               (*to next attribute of same nt*)
  END;
Names of symbols and attributes are not stored in the symbol list directly.
Rather, they are stored in a name list which is an array of characters. Instead
of the actual names the symbol list contains only their address in the name list,
called spix (spelling index). The lexical analyzer handles a hashed list of
'spixes' for fast searching of names.
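The idea of a name list addressed by spelling indices can be illustrated with a short Python sketch. It is an illustration under assumptions, not Coco's code: a Python list of characters stands in for the character array, a dict stands in for the lexical analyzer's hashed list of spixes, position 0 is assumed to be reserved so that spix 0 can mean "no name", and the names NameList, enter and spelling are hypothetical.

```python
class NameList:
    """Sketch of a name list: all names live in one character array and a
    name is identified by its spix, the address of its first character."""

    def __init__(self):
        # Assumption: position 0 is reserved so that spix 0 can mean "no name".
        self.chars = ['\0']
        self.index = {}              # hypothetical stand-in for the hashed spix list

    def enter(self, name):
        """Return the spix of name, appending the name if it is not yet stored."""
        spix = self.index.get(name)
        if spix is None:
            spix = len(self.chars)   # address of the first character
            self.chars.extend(name)  # store the characters ...
            self.chars.append('\0')  # ... followed by a terminator
            self.index[name] = spix
        return spix

    def spelling(self, spix):
        """Recover the name that starts at address spix."""
        end = self.chars.index('\0', spix)
        return ''.join(self.chars[spix:end])
```

Because equal names always receive the same spix, symbol and attribute nodes can be compared by a CARDINAL instead of by string.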
7.2.2 Symbol list construction
For each symbol in the syntax declarations of Cocol, a symbol node with a
successive number is allocated. Therefore, symbol numbers correspond to the
declaration sequence of the symbols. The following procedures are used to
generate, access, and modify symbol nodes:
PROCEDURE NewSy(spix: CARDINAL; typ: Symboltype): CARDINAL;
PROCEDURE SyNr(spix: CARDINAL): CARDINAL;
PROCEDURE GetSy(sy: CARDINAL; VAR sn: Symbolnode);
PROCEDURE RepSy(sy: CARDINAL; sn: Symbolnode);
NewSy generates a new symbol node with the fields spix and typ and returns its node number. SyNr searches for the symbol with the name spix. If spix is found, SyNr returns the corresponding symbol number; otherwise it returns 65535 (the value of the null symbol). GetSy gets the symbol node sn corresponding to symbol number sy. RepSy replaces the symbol sy by the node sn.
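The four procedures can be mirrored in Python to show how the symbol list is used: nodes are fetched as copies, modified, and written back. This is a sketch under assumptions: the variant record is flattened into a dataclass with default values, the field del is renamed deletable because del is a Python keyword, SyNr is implemented as a linear search (Coco's actual search may differ), and the class and method names are hypothetical.

```python
from dataclasses import dataclass, replace

NULL_SYMBOL = 65535              # value SyNr returns for an unknown name (as in the text)

@dataclass
class Symbolnode:
    spix: int                    # spelling index of the symbol name
    typ: str                     # 'eps', 't', 'pr', 'nt', 'any' or 'err'
    aliasspix: int = 0
    nra: int = 0                 # number of attributes
    start: int = 0               # nt: root of its top-down graph, 0 = no rule yet
    deletable: bool = False      # nt: field del ('del' is a Python keyword)

class SymbolList:
    def __init__(self):
        self.nodes = []          # indexed by symbol number

    def new_sy(self, spix, typ):
        """NewSy: symbols are numbered in declaration order."""
        self.nodes.append(Symbolnode(spix, typ))
        return len(self.nodes) - 1

    def sy_nr(self, spix):
        """SyNr: linear search by spelling index (an assumption)."""
        for nr, sn in enumerate(self.nodes):
            if sn.spix == spix:
                return nr
        return NULL_SYMBOL

    def get_sy(self, sy):
        """GetSy: hand out a copy; changes take effect only through rep_sy."""
        return replace(self.nodes[sy])

    def rep_sy(self, sy, sn):
        """RepSy: replace symbol sy by the node sn."""
        self.nodes[sy] = sn
```

The copy-and-replace style corresponds to GetSy's VAR parameter: a caller such as the Rule semantics in Section 7.3.2 reads a node, sets del and start, and calls RepSy.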
Attributes are processed with the following procedures:
PROCEDURE NewAt(sy, spix: CARDINAL; dir: Direction);
PROCEDURE GetAt(sy, n: CARDINAL; VAR spix: CARDINAL; VAR dir: Direction);
PROCEDURE CompleteAt(sy, n: CARDINAL): BOOLEAN;
NewAt defines a new attribute for the symbol sy. For nonterminals, it also appends the name (spix) and the direction (dir) of the attribute to the attribute list. GetAt gets the fields spix and dir of the nth attribute of the nonterminal sy. If sy has fewer than n attributes, 0 is returned as the value of spix. CompleteAt returns TRUE if the symbol sy has exactly n attributes. The implementation of these procedures is trivial, as can be seen in Appendix F.
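A minimal Python sketch of the attribute procedures, assuming attributes are counted from 1, that the attribute count nra is kept for every symbol while only nonterminals keep a formal-attribute list, and that None stands for the NIL pointer. The names Attributes, new_at, get_at and complete_at are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Attribute:
    spix: int                              # attribute name
    direction: str                         # 'up' or 'down' (field dir)
    next: Optional["Attribute"] = None     # next attribute of the same nt

class Attributes:
    def __init__(self, typ):
        self.typ = list(typ)               # typ[sy]: symbol kind, e.g. 'nt' or 't'
        self.nra = [0] * len(self.typ)     # number of attributes per symbol
        self.firstat = [None] * len(self.typ)

    def new_at(self, sy, spix, direction):
        """NewAt: count the attribute; nonterminals also record name and direction."""
        self.nra[sy] += 1
        if self.typ[sy] == 'nt':
            at = Attribute(spix, direction)
            if self.firstat[sy] is None:
                self.firstat[sy] = at
            else:
                p = self.firstat[sy]
                while p.next is not None:  # append, keeping declaration order
                    p = p.next
                p.next = at

    def get_at(self, sy, n):
        """GetAt: (spix, direction) of the nth attribute; spix 0 if sy has fewer."""
        p = self.firstat[sy]
        for _ in range(n - 1):
            if p is None:
                break
            p = p.next
        return (0, None) if p is None else (p.spix, p.direction)

    def complete_at(self, sy, n):
        """CompleteAt: True if sy has exactly n attributes."""
        return self.nra[sy] == n
```

CompleteAt only compares counts, which is what is needed to check that every occurrence of a symbol supplies exactly the declared number of actual attributes.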
7.3 Structure of the top-down graph
The top-down graph has already been described in Section 2.3 as an internal
grammar representation. In Coco, it is implemented in a somewhat extended
form. First, we will describe the extended top-down graphs, and then show
how they are generated. In Section 7.6.2, we will describe the translation of
top-down graphs into G-code.
Fig. 7.4 Structure of Section 7.3 (top-down graph representation, top-down graph construction, insertion of eps-nodes, removal of redundant eps-nodes)
7.3.1 Top-down graph representation
The top-down graph is a linear list of graph nodes. Each symbol on the right-hand side of a Cocol rule is represented by a node. The pointers linking the nodes are indices into this list.
TYPE
  Topdowngraph = ARRAY [1..maxnode] OF Graphnode;
  Graphnode = RECORD
    typ:  (eps, t, nt, any);   (*symbol kind*)
    sp:   CARDINAL;            (*t, nt: pointer to node in symbol list;
                                 eps:   pointer to eps-set;
                                 any:   pointer to any-set*)
    lp:   CARDINAL;            (*left pointer*)
    rp:   CARDINAL;            (*right pointer*)
    sem1: CARDINAL;            (*in-attribute action*)
    sem2: CARDINAL;            (*out-attribute action*)
    sem3: CARDINAL;            (*explicit semantic action*)
    line: CARDINAL;            (*line number in the source text*)
    link: CARDINAL;            (*pointer to the next right end*)
  END;
Compared to Section 2.3, the graph node is extended by three semantic numbers, a line number, and a pointer (link). These fields have the following meaning:
sem1: action number of the input attribute assignments, or zero (Sect. 7.8.4);
sem2: action number of the output attribute assignments, or zero (Sect. 7.8.4);
sem3: number of the user-written semantic action which follows this symbol in the Cocol text, or zero;
line: line number of this symbol in the Cocol text (for error messages);
link: pointer for linking the right ends of a graph (the right ends are the nodes whose right pointer is zero).
7.3.2 Top-down graph construction
It is useful to think of a top-down graph as a 'black box' linked to its environment by two pointers head and tail. The interior of the black box may contain a single node or an arbitrarily complex graph with several nodes (Fig. 7.5).
Fig. 7.5 Top-down graph as a 'black box'
head points to the root of the graph and tail to its right end. Since the right end of the graph usually consists of several nodes, these nodes are linked via their link fields (see the dashed lines in Fig. 7.5). The following procedures are used to generate and process the graph nodes:
PROCEDURE NewNode(typ: Symboltype; sy, line: CARDINAL): CARDINAL;
PROCEDURE GetNode(n: CARDINAL; VAR gn: Graphnode);
PROCEDURE RepNode(n: CARDINAL; gn: Graphnode);
NewNode creates a graph node containing the specified symbol sy, the symbol type typ, and the line number line, and returns its node number. GetNode returns the nth graph node in gn. RepNode replaces the nth graph node by gn.
Two top-down graphs can be combined into a new graph by arranging
them either side by side as successive components or below one another as
alternatives. In either case, a new top-down graph with head and tail is
produced.
Linking of successive components
Coco uses the procedure ConcatRight to link successive components.
ConcatRight(↕head1, ↕tail1, ↓head2, ↓tail2):
  param head1, head2, tail1, tail2: Cardinal;
  local p: Cardinal;
begin
  p:=tail1;
  while p<>0 do            -- for all right ends of the first graph
    gn(p).rp:=head2;
    p:=gn(p).link;
  end;
  tail1:=tail2;
end ConcatRight;
ConcatRight links the graphs (head1, tail1) and (head2, tail2) via right pointers, giving the new graph (head1, tail1). The right ends of the first graph are linked with the root of the second graph (see Fig. 7.6).
Fig. 7.6 Linking of successive components
Linking of alternatives
Coco uses the procedure ConcatLeft to link alternatives.
ConcatLeft(↕head1, ↕tail1, ↓head2, ↓tail2):
  param head1, head2, tail1, tail2: Cardinal;
  local p: Cardinal;
begin
  p:=head1;
  while gn(p).lp<>0 do p:=gn(p).lp; end;        -- end of alternative chain
  gn(p).lp:=head2;
  p:=tail1;
  while gn(p).link<>0 do p:=gn(p).link; end;    -- last right end
  gn(p).link:=tail2;
end ConcatLeft;

ConcatLeft links the graphs (head1, tail1) and (head2, tail2) via left pointers, giving the new graph (head1, tail1). The end of the first alternative chain of the first graph is linked with the root of the second graph. The right ends of both graphs are connected in a similar way, via their link fields (see Fig. 7.7).
Fig. 7.7 Linking of alternatives
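The two linking procedures can be tried out in a small Python sketch. It makes some assumptions for illustration: graph nodes are kept in a Python list gn whose index 0 stands for the null pointer, only the fields needed for linking are modelled, and because Python has no VAR parameters, each function returns the new (head, tail) pair. The names new_node, concat_right and concat_left are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    typ: str             # 'eps', 't', 'nt' or 'any'
    sp: int = 0          # symbol number
    lp: int = 0          # left pointer: next alternative
    rp: int = 0          # right pointer: successor
    link: int = 0        # next right end

gn = [None]              # gn[0] unused: pointer value 0 means "null"

def new_node(typ, sp=0):
    gn.append(Node(typ, sp))
    return len(gn) - 1

def concat_right(head1, tail1, head2, tail2):
    """Successive components: every right end of graph 1 continues with graph 2."""
    p = tail1
    while p != 0:
        gn[p].rp = head2
        p = gn[p].link
    return head1, tail2          # new (head1, tail1)

def concat_left(head1, tail1, head2, tail2):
    """Alternatives: graph 2 becomes the last alternative of graph 1."""
    p = head1
    while gn[p].lp != 0:         # walk to the end of the alternative chain
        p = gn[p].lp
    gn[p].lp = head2
    p = tail1
    while gn[p].link != 0:       # append graph 2's right ends to the list
        p = gn[p].link
    gn[p].link = tail2
    return head1, tail1
```

Building a b | c with these functions gives a node a whose right pointer leads to b and whose left pointer leads to c; the right ends b and c are chained through link.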
An attributed grammar for the construction of top-down graphs
In order to show that attributed grammars can be used for documentation as
well, we will describe the generation of the top-down graph for one syntax
rule by means of an attributed grammar. The complete top-down graph is
composed of the graphs for all syntax rules.
The grammar of EBNF rules

  Rule       = identifier "=" Expression ".".
  Expression = Term {"|" Term}.
  Term       = Factor {Factor}.
  Factor     = symbol | "eps" | "any"
             | "(" Expression ")"
             | "[" Expression "]"
             | "{" Expression "}".

contains the nonterminals Expression, Term, and Factor. Each of these
nonterminals supplies as an output attribute a top-down graph with the ends
head and tail. These graphs can be linked in two different ways: factor
graphs are linked via right pointers, term graphs via left pointers
(ConcatRight and ConcatLeft). A new top-down graph is formed in either
case, which is again represented by head and tail.
Expression, Term, and Factor also supply an output attribute del,
which indicates if the term or factor is directly deletable, i.e. if it can be
derived into the empty string. del is entered into the symbol list.
The attributed grammar uses the procedures described above to handle the
symbol list (GetSy, RepSy, SyNr) and the top-down graph (NewNode,
ConcatLeft, ConcatRight).
GRAMMAR Rule   -- graph generation for a single rule

SEMANTIC DECLARATIONS
  FROM cocogra IMPORT NewNode, ConcatLeft, ConcatRight, Push, Pop;
  FROM cocosym IMPORT GetSy, RepSy, SyNr, Symbolnode, anysy, epssy;
  VAR
    h1, h2, h3: CARDINAL;          -- head pointers
    t1, t2, t3: CARDINAL;          -- tail pointers
    del1, del2, del3: BOOLEAN;     -- TRUE, if element is deletable
    sn: Symbolnode;
    spix, syspix: CARDINAL;        -- spelling indices
    sy: CARDINAL;                  -- symbol number

MACROS
  sem :PushValues:
    Push(↓h1); Push(↓t1); Push(↓del1);
    Push(↓h2); Push(↓t2); Push(↓del2);
  endsem
  sem :PopValues:
    Pop(↑del2); Pop(↑t2); Pop(↑h2);
    Pop(↑del1); Pop(↑t1); Pop(↑h1);
  endsem

TERMINALS
  "="  "."  "|"  "("  ")"  "["  "]"  "{"  "}"
  symbol<out: spix>
  "eps"  "any"

NONTERMINALS
  Rule
  Expression <out: h1, t1, del1>
  Term       <out: h2, t2, del2>
  Factor     <out: h3, t3, del3>

RULES
  Rule =
    symbol<out: syspix>
    "="
    Expression<out: h1, t1, del1>
    sem
      sy := SyNr(↓syspix);
      GetSy(↓sy, ↑sn);
      sn.del := del1;
      sn.start := h1;
      RepSy(↓sy, ↓sn);
    endsem
    ".".

  Expression<out: h1, t1, del1> =
    Term<out: h1, t1, del1>
    { "|" Term<out: h2, t2, del2>
      sem
        ConcatLeft(↕h1, ↕t1, ↓h2, ↓t2);
        del1 := del1 OR del2;
      endsem
    }.

  Term<out: h2, t2, del2> =
    Factor<out: h2, t2, del2>
    { Factor<out: h3, t3, del3>
      sem
        ConcatRight(↕h2, ↕t2, ↓h3, ↓t3);
        del2 := del2 AND del3;
      endsem
    }.

  Factor<out: h3, t3, del3> =
      symbol<out: spix>
      sem
        sy := SyNr(↓spix);
        h3 := NewNode(↓sy);
        t3 := h3;
        del3 := FALSE;
      endsem
    | "eps"
      sem
        h3 := NewNode(↓epssy);
        t3 := h3;
        del3 := TRUE;
      endsem
    | "any"
      sem
        h3 := NewNode(↓anysy);
        t3 := h3;
        del3 := FALSE;
      endsem
    | "("
      sem (PushValues) endsem
      Expression<out: h3, t3, del3>
      ")"
      sem (PopValues) endsem
    | "["
      sem (PushValues) endsem
      Expression<out: h3, t3, del3>
      sem
        h1 := NewNode(↓epssy); t1 := h1;
        ConcatLeft(↕h3, ↕t3, ↓h1, ↓t1);
        del3 := TRUE;
      endsem
      "]"
      sem (PopValues) endsem
    | "{"
      sem (PushValues) endsem
      Expression<out: h3, t3, del3>
      sem
        h1 := NewNode(↓epssy); t1 := h1;
        ConcatRight(↕h3, ↕t3, ↓h3, ↓t3);
        ConcatLeft(↕h3, ↕t3, ↓h1, ↓t1);
        t3 := t1; del3 := TRUE;
      endsem
      "}"
      sem (PopValues) endsem.
ENDGRAM
Figure 7.8 shows which graphs are produced by the translation of an EBNF expression in brackets. As an example, we select the expression a b | c.
Fig. 7.8 Translation of an EBNF expression into a top-down graph (panels: (a b | c), [a b | c], {a b | c})
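The bracket cases of the Factor rule can be replayed in Python to see the shapes of Fig. 7.8 emerge. This is a sketch, not Coco's code: graph nodes live in a Python list whose index 0 stands for the null pointer, the linking functions return the new (head, tail) pair instead of using VAR parameters, and the helper names option and iteration are hypothetical. Their bodies follow the semantic actions for "[" and "{" above.

```python
from dataclasses import dataclass

@dataclass
class Node:
    typ: str          # 'eps', 't', 'nt' or 'any'
    sp: int = 0       # symbol number
    lp: int = 0       # left pointer (next alternative)
    rp: int = 0       # right pointer (successor)
    link: int = 0     # next right end

gn = [None]           # index 0 is the null pointer

def new_node(typ, sp=0):
    gn.append(Node(typ, sp))
    return len(gn) - 1

def concat_right(h1, t1, h2, t2):
    p = t1
    while p:                      # every right end of graph 1 now leads to graph 2
        gn[p].rp = h2
        p = gn[p].link
    return h1, t2

def concat_left(h1, t1, h2, t2):
    p = h1
    while gn[p].lp:               # graph 2 becomes the last alternative ...
        p = gn[p].lp
    gn[p].lp = h2
    p = t1
    while gn[p].link:             # ... and contributes its right ends
        p = gn[p].link
    gn[p].link = t2
    return h1, t1

def option(h, t):
    """[E]: E or an eps-node; the construct is deletable."""
    e = new_node('eps')
    return concat_left(h, t, e, e)

def iteration(h, t):
    """{E}: E loops back to itself; an eps-node is the exit and the only right end."""
    e = new_node('eps')
    h, t = concat_right(h, t, h, t)
    h, t = concat_left(h, t, e, e)
    return h, e
```

For {a b | c}, the right ends b and c both point back to a, and the eps-node hangs at the end of the alternative chain a, c as the way out of the loop.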
7.3.3 Insertion of eps-nodes
Normally each symbol of the input grammar corresponds to one node in the
top-down graph. However, from Fig. 7.8, we see that the translation of
expressions in square or curly brackets leads to the generation of additional
eps-nodes which have no counterpart in the input grammar. They are inserted
by Coco to indicate that an expression is deletable.
There are also some other cases where eps-nodes must be inserted into
graphs: The algorithm of Section 7.3.2 will fail if a term that begins with an
expression in curly brackets has an alternative. The production

  S = ({a} b | c).

would lead to the top-down graph shown in Fig. 7.9.
Fig. 7.9 Erroneous top-down graph for S = ({a} b | c)
This is obviously wrong because once an a has been identified, only a or b
should follow, not c, as is possible in the above graph. This problem is
solved by including an eps-node in front of the first alternative (Fig. 7.10).
Fig. 7.10 Correct top-down graph for S = ({a} b | c) with inserted eps-node
This graph is now correct since, after identifying an a, only a or b can follow, not c. For each eps-node, the set of terminal successors (eps-set) is computed (Section 7.4.4). The eps-set of the node e1 (namely {a, b}) allows us to distinguish between the two alternatives in the above example. Eps-nodes are inserted in front of all expressions in curly brackets during the construction of the top-down graph (see the attributed grammar in Appendix F).
Deletable nonterminals present a similar problem. If a nonterminal is
deletable, it is always processed by the syntax analyzer, because if the current
input symbol is not a start symbol of the nonterminal itself it may still be a
valid successor. Now, if there is a node which is an alternative of a deletable
nonterminal, this node will never be visited, since the nonterminal will always
be recognized beforehand. Coco solves this problem by inserting an eps-node
in front of a deletable nonterminal. The eps-set of this node is then used to
distinguish between the alternatives. From the graphs shown in Fig. 7.11,
where the deletable nonterminal Y has an alternative, the graphs in Fig. 7.12
are produced.
Fig. 7.11 Top-down graph with deletable nonterminal Y

Fig. 7.12 Top-down graph with inserted eps-node in front of deletable nonterminal Y
The eps-set of the node e1 (namely {a, c}; c is a terminal start symbol of Y and a is a successor of the deletable nonterminal Y) enables the selection between the two alternatives starting with e1 and b. There are no more alternatives to the node with the deletable nonterminal Y. It can therefore be safely visited by the syntax analyzer.
The algorithm for the insertion of eps-nodes in front of deletable
nonterminals is shown below.
Insert eps-nodes before deletable nt's:
  local gn, gn1: Graphnode;
        sn: Symbolnode;
        j: Cardinal;
begin
  for all nodes i do
    GetNode(↓i, ↑gn);
    if (gn.typ=nt) and (gn.lp<>0) then        -- nt with alternative
      GetSy(↓gn.sp, ↑sn);
      if sn.del then                          -- deletable nt
        gn1:=gn; gn1.lp:=0;                   -- gn1 now holds the deletable nt
        j:=NewNode(↓nt, ↓0, ↓0);              -- create empty node
        RepNode(↓j, ↓gn1);
        gn.typ:=eps; gn.sp:=0;                -- gn holds the new eps-node
        gn.rp:=j; gn.sem1:=0; gn.sem2:=0; gn.sem3:=0;
        RepNode(↓i, ↓gn);
      end;
    end;
  end;
end Insert eps-nodes before deletable nt's;
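The transformation can be tried out on a small graph in Python. This sketch is an illustration, not Coco's code: nodes are Python objects in a list whose index 0 stands for the null pointer, deletability comes from a plain dict, the semantic and line fields are left out, and the function name is hypothetical. The example graph, a rule of the form (Y | b) a with Y deletable, is chosen for illustration.

```python
from dataclasses import dataclass, replace

@dataclass
class Node:
    typ: str        # 'eps', 't', 'nt' or 'any'
    sp: int = 0     # symbol number (eps-nodes: 0)
    lp: int = 0     # left pointer (next alternative)
    rp: int = 0     # right pointer (successor)

def insert_eps_before_deletable(gn, deletable):
    """gn[1:] are the graph nodes; deletable[sy] is True for deletable nonterminals."""
    for i in range(1, len(gn)):              # range fixed here: appended nodes are not visited
        n = gn[i]
        if n.typ == 'nt' and n.lp != 0 and deletable.get(n.sp, False):
            gn.append(replace(n, lp=0))      # the nt moves, without its alternative, to node j
            j = len(gn) - 1
            gn[i] = Node('eps', 0, n.lp, j)  # node i becomes an eps-node leading to j
```

Node i keeps its index, so every pointer that used to reach the deletable nonterminal now reaches the eps-node first, and the alternative chain hangs off the eps-node instead of off the nonterminal.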
7.3.4 Removal of redundant eps-nodes
When expressions in square or curly brackets are translated, eps-nodes arise
that can be removed again if it turns out that the expressions have successors
(see Fig. 7.13). The algorithm for the removal of redundant eps-nodes is
shown below:
Delete redundant eps-nodes:
  global visited: set of nodenumbers;   -- mark list for visited nodes
  local  sn: Symbolnode;
begin
  visited:={};
  for all nonterminals i do
    GetSy(↓i, ↑sn);
    DelEps(↓sn.start);
  end;
end Delete redundant eps-nodes;
Fig. 7.13 Creation and removal of redundant eps-nodes (columns: EBNF expression, graph with redundant eps-nodes, equivalent graph without redundant eps-nodes; examples: [a] b and {a} b)
The procedure DelEps(↓loc) deletes all redundant eps-nodes in the top-down graph with the root loc. Redundant eps-nodes can be recognized by the following characteristics: they have no associated semantic actions, their left pointer is null, and their right pointer is not null. They are always reached via the left pointer of some other node.

DelEps(↓loc):
  param  loc: Cardinal;
  global visited: set of nodenumbers;   -- mark list for visited nodes
  local  gn, gn1: Graphnode;
begin
  if (loc=0) or (loc in visited) then return; end;    -- end or cycle
  visited:=visited+{loc};
  GetNode(↓loc, ↑gn);
  if gn.lp<>0 then             -- test if alt. node is a redundant eps-node
    GetNode(↓gn.lp, ↑gn1);
    if (gn1.typ=eps) and (gn1.sem3=0)
       and (gn1.lp=0) and (gn1.rp<>0) then
      gn.lp:=gn1.rp;
      RepNode(↓loc, ↓gn);
    end;
  end;
  DelEps(↓gn.lp);
  DelEps(↓gn.rp);
end DelEps;
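The removal can be checked against Fig. 7.13 with a Python sketch of DelEps. It assumes graph nodes in a Python list with index 0 as the null pointer, keeps only the semantic field sem3 that the test inspects, and passes the mark list explicitly instead of using a global; the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    typ: str        # 'eps', 't', 'nt' or 'any'
    sp: int = 0     # symbol number
    lp: int = 0     # left pointer (next alternative)
    rp: int = 0     # right pointer (successor)
    sem3: int = 0   # explicit semantic action, 0 = none

def del_eps(gn, loc, visited):
    if loc == 0 or loc in visited:          # end of graph, or a cycle
        return
    visited.add(loc)
    n = gn[loc]
    if n.lp != 0:                           # is the alternative a redundant eps-node?
        alt = gn[n.lp]
        if alt.typ == 'eps' and alt.sem3 == 0 and alt.lp == 0 and alt.rp != 0:
            n.lp = alt.rp                   # bypass it: its successor becomes the alternative
    del_eps(gn, n.lp, visited)
    del_eps(gn, n.rp, visited)

def delete_redundant_eps(gn, starts):
    visited = set()                         # one mark list shared by all nonterminals
    for start in starts:
        del_eps(gn, start, visited)
```

For [a] b the eps alternative of a is replaced by b itself, so the graph becomes "a followed by b, or b". An eps-node at the very end of a rule (right pointer null) is kept, because nothing could take its place.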
7.4 Collecting the symbol sets

So far, the input grammar has been read and the symbol list as well as the top-down graph have been built. From these two data structures, Coco calculates the symbol sets needed for the grammar tests and for the generated compiler.
Fig. 7.14 Structure of Section 7.4 (deletable nonterminals, terminal start symbols of nonterminals, terminal successors of nonterminals, eps-sets, any-sets)
Coco collects four sets of terminals:
1. start symbols of nonterminals;
2. successors of nonterminals;
3. successors of eps-nodes (eps-sets);
4. sets represented by any-symbols (any-sets).
The following procedures are used to access the top-down graph and the
symbol list:
PROCEDURE GetNode(loc: CARDINAL; VAR gn: Graphnode);
PROCEDURE RepNode(loc: CARDINAL; gn: Graphnode);
PROCEDURE GetSy(sy: CARDINAL; VAR sn: Symbolnode);
PROCEDURE RepSy(sy: CARDINAL; sn: Symbolnode);
GetNode gets the graph node gn with the number loc. RepNode replaces
the graph node with the number loc by the node gn. GetSy gets the symbol
node sn with the number sy. RepSy replaces the symbol node with the
number sy by the node sn.
Before the symbol sets are collected, it is necessary to find out which
nonterminals are deletable.
7.4.1 Deletable nonterminals
All deletable nonterminals are tagged in the symbol list. In the first step, those symbols which can be directly derived into the empty string are tagged. In the second step, all those nonterminals are tagged whose top-down graph can be traversed along a path of already tagged symbols. The second step is repeated until no more deletable symbols are found.
The directly deletable nonterminals are found when the top-down graph is created (see Section 7.3.2). The following algorithm finds the indirectly deletable nonterminals.
Find deletable symbols:
  local sn: Symbolnode;
        changed: Boolean;
begin
  repeat
    changed:=false;
    for all nonterminals i do
      GetSy(↓i, ↑sn);
      if not sn.del and Deletable(↓sn.start) then
        sn.del:=true; RepSy(↓i, ↓sn);
        changed:=true;
      end;
    end;
  until not changed;
end Find deletable symbols;
The procedure Deletable(↓loc) checks if the top-down graph rooted at loc is deletable (i.e. if it can be traversed along a path of deletable symbols).
Deletable(↓loc): Boolean:
  param  loc: Cardinal;
  global marked: set of nodenumbers;   -- mark list for visited nodes
begin
  marked:={};
  return DelGraph(↓loc);
end Deletable;
The actual work is performed by the procedure DelGraph.
DelGraph(↓loc): Boolean:
  param  loc: Cardinal;
  global marked: set of nodenumbers;
  local  gn: Graphnode;
begin
  if loc=0 then return true; end;            -- end of graph found
  if loc in marked then return false; end;   -- already visited: cycle
  marked:=marked+{loc};
  GetNode(↓loc, ↑gn);
  return ((gn.lp<>0) and DelGraph(↓gn.lp))   -- deletable alternative
      or (DelNode(↓gn) and DelGraph(↓gn.rp)); -- or deletable right part of graph
end DelGraph;
Finally, DelNode checks if a node (i.e. its corresponding symbol) is deletable.
DelNode(↓gn): Boolean:
  param gn: Graphnode;
  local sn: Symbolnode;
begin
  if gn.typ=nt then
    GetSy(↓gn.sp, ↑sn);
    return sn.del;
  else return gn.typ=eps;
  end;
end DelNode;
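The fixed-point iteration of Find deletable symbols, together with Deletable, DelGraph and DelNode, fits into one Python sketch. Assumptions for illustration: graph nodes are in a Python list with index 0 as the null pointer, start maps each nonterminal to the root of its graph, deletable is a dict pre-filled with the directly deletable nonterminals and updated in place, and the function name is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    typ: str      # 'eps', 't', 'nt' or 'any'
    sp: int = 0   # symbol number for t and nt nodes
    lp: int = 0   # left pointer (next alternative)
    rp: int = 0   # right pointer (successor)

def find_deletable(gn, start, deletable):
    """Tag indirectly deletable nonterminals until nothing changes."""

    def del_node(n):                              # DelNode
        return deletable[n.sp] if n.typ == 'nt' else n.typ == 'eps'

    def del_graph(loc, marked):                   # DelGraph
        if loc == 0:
            return True                           # end of graph found
        if loc in marked:
            return False                          # already visited: cycle
        marked.add(loc)
        n = gn[loc]
        return ((n.lp != 0 and del_graph(n.lp, marked))       # deletable alternative
                or (del_node(n) and del_graph(n.rp, marked)))  # or deletable path via n

    changed = True
    while changed:                                # repeat ... until not changed
        changed = False
        for nt in start:
            if not deletable[nt] and del_graph(start[nt], set()):  # Deletable
                deletable[nt] = True
                changed = True
```

In the usage below (A = B C, B = eps, C = "x" | B, D = "y"), C is found in the first pass and A only in the second, which is why the loop must repeat until no more nonterminals are tagged.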
7.4.2 Terminal start symbols of nonterminals
The terminal start symbols of a nonterminal are the terminal start symbols of
its top-down graph, i.e. the start symbols of its first alternative chain. Those
nodes of the chain which contain nonterminals will have their terminal start
symbols calculated recursively. If the chain contains a deletable symbol, its successors must also be considered. The terminal start symbols of all
nonterminals are stored in a list.
Get terminal start symbols:
  global first: array(nonterminals) of record
           ts:    set of terminals;   -- terminal start symbols
           ready: Boolean;            -- TRUE if ts is computed
         end;
  local  sn: Symbolnode;
begin
  for all nonterminals i do first(i).ready:=false; end;
  for all nonterminals i do
    GetSy(↓i, ↑sn);
    GetFirstSet(↓sn.start, ↑first(i).ts);
    first(i).ready:=true;
  end;
end Get terminal start symbols;
The procedure GetFirstSet(↓loc, ↑s) supplies the terminal start symbols of the top-down graph with the root loc.
GetFirstSet(↓loc, ↑s):
  param  loc: Cardinal;
         s: set of terminals;
  global visited: set of nodenumbers;   -- mark list for visited nodes
begin
  visited:={};
  CollectFirst(↓loc, ↑s);
end GetFirstSet;
GetFirstSet initializes a mark list for the prevention of cycles and calls the
procedure CollectFirst which does the actual work.
CollectFirst(↓loc, ↑s):
  param  loc: Cardinal;
         s: set of terminals;
  global visited: set of nodenumbers;   -- mark list for visited nodes
         first: like in 'Get terminal start symbols';
  local  sn: Symbolnode;
         gn: Graphnode;
         s1: set of terminals;
begin
  s:={};
  while loc<>0 do                         -- for all alternatives
    if loc in visited then return; end;   -- cycle
    visited:=visited+{loc};
    GetNode(↓loc, ↑gn);
    if DelNode(↓gn) then
      CollectFirst(↓gn.rp, ↑s1); s:=s+s1;
    end;
    case gn.typ of
      t:   s:=s+{gn.sp};
    | nt:  if first(gn.sp).ready then
             s:=s+first(gn.sp).ts;
           else
             GetSy(↓gn.sp, ↑sn);
             CollectFirst(↓sn.start, ↑s1); s:=s+s1;
           end;
    | any: s:=s+{all terminals};
    | eps: -- nothing
    end;
    loc:=gn.lp;
  end;
end CollectFirst;
The procedure DelNode(↓gn) from Section 7.4.1 checks if the graph node gn is deletable.
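A Python sketch of Get terminal start symbols with GetFirstSet and CollectFirst. Assumptions for illustration: graph nodes are in a Python list with index 0 as the null pointer, deletability is given by a dict (as computed in Section 7.4.1), an any-node contributes a caller-supplied set of all terminals, a finished entry in the dict first plays the role of ready, and the function name is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    typ: str      # 'eps', 't', 'nt' or 'any'
    sp: int = 0   # terminal or nonterminal number
    lp: int = 0   # left pointer (next alternative)
    rp: int = 0   # right pointer (successor)

def terminal_start_symbols(gn, start, deletable, all_terminals):
    first = {}                                 # nt -> set of terminals, present once ready

    def del_node(n):
        return deletable[n.sp] if n.typ == 'nt' else n.typ == 'eps'

    def collect_first(loc, visited):
        s = set()
        while loc != 0:                        # for all alternatives
            if loc in visited:                 # cycle
                return s
            visited.add(loc)
            n = gn[loc]
            if del_node(n):                    # n may vanish: its successors count too
                s |= collect_first(n.rp, visited)
            if n.typ == 't':
                s.add(n.sp)
            elif n.typ == 'nt':
                s |= first[n.sp] if n.sp in first else collect_first(start[n.sp], visited)
            elif n.typ == 'any':
                s |= all_terminals
            loc = n.lp
        return s

    def get_first_set(loc):                    # GetFirstSet: fresh mark list per call
        return collect_first(loc, set())

    for nt in start:
        first[nt] = get_first_set(start[nt])
    return first
```

For S = A "c" and A = "a" | eps (terminal numbers a = 1, c = 2), S starts with a or, because A is deletable, with c.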
7.4.3 Terminal successors of nonterminals
The terminal successors of all nonterminals are stored in another list. They are
collected in two steps: first, a search is made for the direct successors of all
nonterminals (those terminals immediately following this nonterminal at all its
occurrences in the graph); then the indirect successors are calculated (if a
nonterminal is at the end of a rule, its indirect successors are the successors of
the nonterminal on the left-hand side of this rule).
In the first step, the data structure follow is filled; this contains for each
nonterminal i its direct successors (ts) and those nonterminals (nts), whose
successors are indirect successors of i. In the second step, the indirect
successors are added to ts.
Get terminal successors:
  global follow: array(nonterminals) of record
           ts:  set of terminals;      -- terminal successors
           nts: set of nonterminals;   -- nt's whose successors must be added to ts
         end;
         visitednod: set of nodenumbers;   -- mark list (visited nodes)
         visitedsym: set of nonterminals;  -- mark list (visited nt's)
  local  sn: Symbolnode;
         i: Cardinal;
begin
  for all nonterminals i do
    follow(i).ts:={}; follow(i).nts:={};
  end;
  visitednod:={};
  for all nonterminals i do            -- fill follow.ts and follow.nts
    GetSy(↓i, ↑sn);
    CollectFollow(↓sn.start, ↓i);
  end;
  for all nonterminals i do            -- complete follow.ts
    visitedsym:={};
    Complete(↓i);
    follow(i).nts:={};
  end;
end Get terminal successors;
The procedure CollectFollow(↓loc, ↓sy) traverses the top-down graph of the nonterminal sy starting at the node loc. Every time it encounters a nonterminal i, it adds its direct successors to the set follow(i).ts. For each nonterminal i that can stand at the right end of the graph (i.e. whose remaining right part is deletable), it adds sy to the set follow(i).nts.
CollectFollow(↓loc, ↓sy):
  param  loc, sy: Cardinal;
  global follow: as in 'Get terminal successors';
         visitednod: set of nodenumbers;
  local  gn: Graphnode;
         s: set of terminals;
begin
  while loc<>0 do                            -- step through alternatives chain
    if loc in visitednod then return; end;   -- cycle
    visitednod:=visitednod+{loc};
    GetNode(↓loc, ↑gn);
    if gn.typ=nt then
      GetFirstSet(↓gn.rp, ↑s);
      follow(gn.sp).ts := follow(gn.sp).ts + s;
      if Deletable(↓gn.rp) then              -- nt at end of rule
        follow(gn.sp).nts := follow(gn.sp).nts + {sy};
      end;
    end;
    CollectFollow(↓gn.rp, ↓sy);
    loc:=gn.lp;
  end;
end CollectFollow;
The procedure GetFirstSet(↓loc, ↑s) from Section 7.4.2 computes the set of terminal start symbols s of the graph with the root loc. The procedure Deletable(↓loc) from Section 7.4.1 checks whether the graph rooted at loc is deletable.
The procedure Complete(↓i) used in Get terminal successors completes the direct successors of the nonterminal i (follow(i).ts) by adding its indirect successors, which are the successors of the nonterminals contained in follow(i).nts.
Complete(↓i):
  param  i: Cardinal;
  global visitedsym: set of nonterminals;
         follow: like in 'Get terminal successors';
  local  j: Cardinal;
begin
  if i in visitedsym then return; end;   -- cycle
  visitedsym:=visitedsym+{i};
  for all j in follow(i).nts do
    Complete(↓j);
    follow(i).ts:=follow(i).ts+follow(j).ts;
  end;
end Complete;
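The two phases (collecting direct successors, then completing them) can be traced in a Python sketch. To keep it short, first_set(loc) and graph_deletable(loc) are passed in as functions standing for GetFirstSet and Deletable; in the usage below they are lookup tables for the toy grammar S = A "x", A = "a" B, B = "b" (terminals a = 1, b = 2, x = 3). Graph nodes are in a Python list with index 0 as the null pointer, and the function name is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    typ: str      # 'eps', 't', 'nt' or 'any'
    sp: int = 0   # terminal or nonterminal number
    lp: int = 0   # left pointer (next alternative)
    rp: int = 0   # right pointer (successor)

def terminal_successors(gn, start, first_set, graph_deletable):
    ts = {nt: set() for nt in start}       # direct terminal successors
    nts = {nt: set() for nt in start}      # nts whose successors nt inherits
    visitednod = set()

    def collect_follow(loc, sy):
        while loc != 0:                    # step through the alternative chain
            if loc in visitednod:
                return                     # cycle
            visitednod.add(loc)
            n = gn[loc]
            if n.typ == 'nt':
                ts[n.sp] |= first_set(n.rp)
                if graph_deletable(n.rp):  # n can end the rule of sy
                    nts[n.sp].add(sy)
            collect_follow(n.rp, sy)
            loc = n.lp

    def complete(i, visitedsym):
        if i in visitedsym:
            return                         # cycle
        visitedsym.add(i)
        for j in nts[i]:
            complete(j, visitedsym)
            ts[i] |= ts[j]

    for nt in start:                       # fill ts and nts
        collect_follow(start[nt], nt)
    for nt in start:                       # complete ts
        complete(nt, set())
        nts[nt] = set()                    # ts[nt] is final now
    return ts
```

B stands at the end of the rule for A, so it inherits A's successor x although x never directly follows B in the grammar.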
7.4.4 eps-sets
Eps-nodes having an alternative must not be recognized by the generated syntax analyzer unless the next input symbol is a valid successor of this eps-node. In order to find out whether a symbol is a valid successor, the syntax analyzer must know the set of all possible successors of each eps-node with alternatives.
The terminal successors of an eps-node are the terminal start symbols of the subgraph rooted at the right pointer of the eps-node. If this subgraph is deletable (in particular, if the right pointer is null), the successors of the nonterminal on the left-hand side of the rule containing the eps-node are added as well.
First, the top-down graph of each nonterminal is searched for eps-nodes.
Get eps-sets:
  global epsset: array of set of terminals;   -- eps successors
         maxeps: Cardinal;                    -- number of eps-sets
         visited: set of nodenumbers;         -- mark list for visited nodes
  local  sn: Symbolnode;
begin
  visited:={}; maxeps:=0;
  for all nonterminals i do
    GetSy(↓i, ↑sn);
    FindEps(↓sn.start, ↓i, ↓false);
  end;
end Get eps-sets;
The procedure FindEps(↓loc, ↓leftsy, ↓vialp) searches the top-down graph with the root loc for eps-nodes. It computes their successors and stores them in the global array epsset. The field sp of the eps-node is set to point to this entry in epsset. The flag vialp indicates whether loc has been reached via a left pointer. Only eps-nodes that belong to an alternative chain (vialp is true or their own left pointer is not null) receive an eps-set.
FindEps(↓loc, ↓leftsy, ↓vialp):
  param  loc: Cardinal;      -- root of graph
         leftsy: Cardinal;   -- left-hand side nonterminal
         vialp: Boolean;     -- true, if loc is reached via lp
  global visited: set of nodenumbers;   -- mark list for visited nodes
  local  gn: Graphnode;
begin
  if (loc=0) or (loc in visited) then return; end;   -- end or cycle
  visited:=visited+{loc};
  GetNode(↓loc, ↑gn);
  if (gn.typ=eps) and (vialp or (gn.lp<>0)) then     -- eps with alt.
    FindEpsFollowers(↓gn.rp, ↓leftsy, ↑gn.sp);       -- gn.sp points to eps-set
    RepNode(↓loc, ↓gn);
  end;
  FindEps(↓gn.lp, ↓leftsy, ↓true);
  FindEps(↓gn.rp, ↓leftsy, ↓false);
end FindEps;
The procedure FindEpsFollowers(↓loc, ↓leftsy, ↑nr) collects the terminal start symbols of the subgraph with the root loc. If the subgraph is deletable, the successors of the nonterminal leftsy are also added. The collected set is stored in the global array epsset, and nr returns its index, so that the set is found in epsset(nr).
FindEpsFollowers(↓loc, ↓leftsy, ↑nr):
  param  loc, leftsy, nr: Cardinal;
  global epsset: array of set of terminals;   -- successors of eps-nodes
         follow: like in 'Get terminal successors';
         maxeps: Cardinal;
  local  s: set of terminals;
begin
  GetFirstSet(↓loc, ↑s);
  if Deletable(↓loc) then s:=s+follow(leftsy).ts; end;
  maxeps:=maxeps+1;
  epsset(maxeps):=s;
  nr:=maxeps;
end FindEpsFollowers;
The procedure GetFirstSet(↓loc, ↑s) from Section 7.4.2 collects the terminal start symbols of the graph with the root loc. The procedure Deletable(↓loc) from Section 7.4.1 determines whether the graph with the root loc is deletable.
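A Python sketch of Get eps-sets, FindEps and FindEpsFollowers. As in the earlier sketches, graph nodes are in a Python list with index 0 as the null pointer; first_set(loc) and graph_deletable(loc) are passed in as functions standing for GetFirstSet and Deletable, follow maps each nonterminal to its completed successor set, and the function name is hypothetical. Entry 0 of epsset is unused so that the returned numbers start at 1, as with maxeps.

```python
from dataclasses import dataclass

@dataclass
class Node:
    typ: str      # 'eps', 't', 'nt' or 'any'
    sp: int = 0   # symbol number; for eps-nodes: index into epsset
    lp: int = 0   # left pointer (next alternative)
    rp: int = 0   # right pointer (successor)

def eps_sets(gn, start, first_set, graph_deletable, follow):
    epsset = [None]                       # epsset[0] unused
    visited = set()

    def find_eps_followers(loc, leftsy):
        s = set(first_set(loc))
        if graph_deletable(loc):          # the rule may end behind the eps-node
            s |= follow[leftsy]
        epsset.append(s)
        return len(epsset) - 1            # nr

    def find_eps(loc, leftsy, vialp):
        if loc == 0 or loc in visited:    # end or cycle
            return
        visited.add(loc)
        n = gn[loc]
        if n.typ == 'eps' and (vialp or n.lp != 0):   # eps-node with alternative
            n.sp = find_eps_followers(n.rp, leftsy)
        find_eps(n.lp, leftsy, True)
        find_eps(n.rp, leftsy, False)

    for nt in start:
        find_eps(start[nt], nt, False)
    return epsset
```

In the usage below, A = ["a"] ends in an eps-node reached via a left pointer; its eps-set is the follow set of A. An eps-node without alternatives gets no eps-set and keeps sp = 0.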
7.4.5 any-sets
In order to recognize an any-symbol, the generated syntax analyzer needs the
set of all terminals represented by the any-symbol. An any-symbol represents
all terminals which are not in the alternative chain to which it belongs. For
any-symbols without alternatives, no any-sets are computed. The syntax
analyzer recognizes them regardless of the next input symbol.
Get any-sets:
  global anyset: array of set of terminals;   -- any-sets
         maxany: Cardinal;                    -- number of any-sets
         eofsy: Cardinal;                     -- symbol number of eof-symbol
  local  gn: Graphnode;
         s: set of terminals;
begin
  maxany:=0;
  for all nodes i do
    GetNode(↓i, ↑gn);
    if (gn.typ=any) and (gn.lp<>0) then
      GetFirstSet(↓gn.lp, ↑s);
      Make complement of s;
      s:=s-{eofsy};            -- eofsy must not be recognized by any
      maxany:=maxany+1;
      anyset(maxany):=s;
      gn.sp:=maxany;           -- sp of any-node points to any-set
      RepNode(↓i, ↓gn);
    end;
  end;
end Get any-sets;
The procedure GetFirstSet(↓loc, ↑s) from Section 7.4.2 supplies the terminal start symbols of the graph with the root loc.
For the calculation of an any-set, only those symbols are considered
which can be reached via the left pointer of the any-node. The symbols which
lie before the any-node in the alternative chain are not considered, since the
syntax analyzer has already checked them before it gets to the any-node.
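A Python sketch of Get any-sets under stated assumptions: graph nodes are in a Python list with index 0 as the null pointer, first_set(loc) is passed in as a function standing for GetFirstSet, "Make complement of s" is realized as a set difference against a caller-supplied set of all terminals, and the function name is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Node:
    typ: str      # 'eps', 't', 'nt' or 'any'
    sp: int = 0   # symbol number; for any-nodes: index into anyset
    lp: int = 0   # left pointer (next alternative)
    rp: int = 0   # right pointer (successor)

def any_sets(gn, first_set, all_terminals, eofsy):
    anyset = [None]                              # anyset[0] unused, as with maxany
    for i in range(1, len(gn)):
        n = gn[i]
        if n.typ == 'any' and n.lp != 0:         # any-node with alternatives
            s = all_terminals - first_set(n.lp)  # complement of the later alternatives
            s.discard(eofsy)                     # eof must never be matched by any
            anyset.append(s)
            n.sp = len(anyset) - 1               # sp of any-node points to its any-set
    return anyset
```

With terminals {0, 1, 2, 3}, eof = 0, and an any-node whose only later alternative starts with 2, the any-set is {1, 3}; an any-node without alternatives gets no set.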
7.5 Grammar tests
Before Coco generates the target compiler, it carefully checks if the grammar
satisfies certain requirements which are necessary for a correct compiler. Here
the compiler compiler proves to be very valuable: even in large grammars,
which are hard to understand for human readers, it rapidly finds hidden ambiguities or circularities. The well-known problem of the 'dangling else' clearly shows how easily bugs in the grammar design can remain undetected without the support of an automatic tool (actually, this ambiguity was overlooked in the language definition of Algol).
Coco verifies the following properties:
1. completeness;
2. reachability;
3. noncircularity;
4. termination;
5. LL(1) property.
Fig. 7.15 Structure of Section 7.5 (grammar tests: completeness, reachability, noncircularity, terminalization, LL(1) condition)
The test algorithms are executed in the following order:
Test grammar (↑ok):
begin
  Test completeness (↑ok1);
  Test if all nt's can be reached (↑ok2);
  Find circular rules (↑ok3);
  Test if nt's can be derived to t's (↑ok4);
  LL1 test (↑ok5);
  ok:=ok1 and ok2 and ok3 and ok4 and ok5;
end Test grammar;
These algorithms access the top-down graph and the symbol list with the
following procedures, already described in Sections 7.2.2 and 7.3.2:
PROCEDURE GetNode(loc: CARDINAL; VAR gn: Graphnode);
PROCEDURE GetSy(sy: CARDINAL; VAR sn: Symbolnode);
7.5.1 Completeness
A check is carried out as to whether there is a rule for every nonterminal.
Basic idea: The field start in the symbol node of each nonterminal must
point to a top-down graph.
Test completeness (↑ok):
  param ok: Boolean;
  local sn: Symbolnode;
begin
  ok:=true;
  for all nonterminals i do
    GetSy(↓i, ↑sn);
    if sn.start=0 then ok:=false; end;
  end;
end Test completeness;
7.5.2 Reachability
A check is made as to whether all declared nonterminals appear in some sentential form derived from the start symbol of the grammar.
Basic idea: First, all nonterminals that occur in the rule of the start symbol
are tagged, then all nonterminals that occur in the rules of nonterminals
already tagged. This is repeated until no more nonterminals can be tagged.
The untagged nonterminals are not reachable.
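The tagging can be sketched in Python as a worklist algorithm. This is only an illustration under simplified assumptions: the grammar is a dictionary from nonterminal names to lists of alternatives, each alternative a list of symbol names, and every symbol that is not a key counts as a terminal. Coco itself walks the top-down graph recursively, as MarkReachedNts below shows.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]   # nonterminal -> alternatives (symbol lists)


def unreachable(grammar: Grammar, root: str) -> Set[str]:
    """Tag nonterminals reachable from root; return the untagged ones."""
    reached = {root}
    work = [root]                       # tagged nonterminals whose rules are not yet scanned
    while work:
        nt = work.pop()
        for alt in grammar[nt]:
            for sym in alt:
                if sym in grammar and sym not in reached:   # keys are the nonterminals
                    reached.add(sym)
                    work.append(sym)
    return set(grammar) - reached


g = {"S": [["A", "x"]], "A": [["y"], []], "B": [["z"]]}
print(unreachable(g, "S"))   # {'B'}
```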
Test if all nt's can be reached (↑ok):
  param  ok:      Boolean;
  global visited: set of nodenumbers;    -- already visited nodes
         reached: set of nonterminals;   -- reachable nonterminals
         rootsy:  Cardinal;              -- start symbol of grammar
  local  sn:      Symbolnode;
begin
  visited:={};
  reached:={rootsy};
  GetSy(↓rootsy, ↑sn);
  MarkReachedNts(↓sn.start);
  ok:=true;
  for all nonterminals i do
    if not (i in reached) then ok:=false; end;
  end;
end Test if all nt's can be reached;
The procedure MarkReachedNts(↓loc) marks all nonterminals which can be
reached from the node loc.
MarkReachedNts(↓loc):
  param  loc:     Cardinal;
  global reached: set of nonterminals;   -- reachable nonterminals
         visited: set of nodenumbers;    -- already visited nodes
  local  gn:      Graphnode;
         sn:      Symbolnode;
begin
  if (loc=0) or (loc in visited) then return; end;   -- end or cycle
  visited:=visited+{loc};                            -- visit loc
  GetNode(↓loc, ↑gn);
  if (gn.typ=nt) and not (gn.sp in reached) then     -- new nt reached
    reached:=reached+{gn.sp};
    GetSy(↓gn.sp, ↑sn);
    MarkReachedNts(↓sn.start);
  end;
  MarkReachedNts(↓gn.lp);
  MarkReachedNts(↓gn.rp);
end MarkReachedNts;
7.5.3 Noncircularity
A check is made as to whether there are nonterminals which can be derived
into themselves, i.e. whether there are derivations $X \Rightarrow^{+} X$ for some nonterminal X.
(This circularity definition differs from the usual definition in attributed
grammars, which defines circular dependencies of attributes.)
Basic idea: Only productions whose right-hand side consists of a single
nonterminal (possibly surrounded by deletable symbols) are considered. These
single-nonterminal productions make up a graph that must be acyclic.
Algorithm: The graph is stored as pairs (left, right) of nonterminals for
which there is a production left → right.
Find circular rules (↑ok):
  param  ok:          Boolean;
  global visited:     set of nodenumbers;     -- mark list for visited nodes
  local  graph:       array of record         -- derivation graph
                        left, right: Cardinal;
                        deleted:     Boolean;
                      end;
         graphlength: Cardinal;
         singles:     set of nonterminals;
         sn:          Symbolnode;
         changed:     Boolean;
         i, j:        Cardinal;
begin
  graphlength:=0;
  for all nonterminals i do                   -- build the graph
    visited:={}; singles:={};
    GetSy(↓i, ↑sn);
    GetSingles(↓sn.start, ↑singles);          -- single descendants of a nt
    for all nonterminals j in singles do
      graphlength:=graphlength+1;
      with graph[graphlength] do
        left:=i; right:=j; deleted:=false;
      end;
    end;
  end;
  repeat                                      -- remove edges which are not on a cycle
    changed:=false;
    for i:=1 to graphlength do
      if not graph[i].deleted and
         (graph[i].left not on any right-hand side or
          graph[i].right not on any left-hand side) then
        graph[i].deleted:=true; changed:=true;
      end;
    end;
  until not changed;
  ok:=graph is empty;                         -- i.e. all edges deleted
end Find circular rules;
The elements that have not been deleted in the graph represent the circular part
of the grammar.
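The edge-deletion loop can be tried out with a short Python sketch (hypothetical names; edges are (left, right) pairs of nonterminal names). It recomputes the sets of left and right sides once per pass instead of after each single deletion; since deleting an edge can only make further deletions possible, both variants end with the same remaining edges. The grammar is noncircular exactly when the result is empty.

```python
from typing import List, Tuple

Edge = Tuple[str, str]   # (left, right) for a production left -> right


def circular_part(edges: List[Edge]) -> List[Edge]:
    """Delete edges that cannot lie on a cycle until nothing changes."""
    live = list(edges)
    changed = True
    while changed:
        lefts = {left for left, _ in live}
        rights = {right for _, right in live}
        # keep an edge only if its left side is some right side and vice versa
        kept = [(l, r) for l, r in live if l in rights and r in lefts]
        changed = len(kept) != len(live)
        live = kept
    return live


print(circular_part([("A", "B"), ("B", "A"), ("B", "C")]))   # [('A', 'B'), ('B', 'A')]
```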
The procedure GetSingles(↓loc,↑singles) collects a set (singles) of
nonterminals in the top-down graph with the root loc. If the graph can be
derived into a single nonterminal X, then X is added to singles. The
following assertion always holds: loc is on a path which contains only
deletable symbols between the beginning of the path and loc.
GetSingles(↓loc, ↑singles):
  param  loc:     Cardinal;
         singles: set of nonterminals;
  global visited: set of nodenumbers;
  local  gn:      Graphnode;
begin
  -- assert: all nodes to the left of loc are deletable
  if (loc=0) or (loc in visited) then return; end;   -- end or cycle
  visited:=visited+{loc};
  GetNode(↓loc, ↑gn);
  if (gn.typ=nt) and Deletable(↓gn.rp) then          -- right subgraph deletable
    singles:=singles+{gn.sp};
  end;
  if DelNode(↓gn) then GetSingles(↓gn.rp, ↑singles); end;
  GetSingles(↓gn.lp, ↑singles);
end GetSingles;
A nonterminal X is added to singles if it is on a path from loc to the end of
the top-down graph and if this path has only deletable nodes to the left and
right of X. The deletability of subgraphs and nodes is determined by the
procedures Deletable and DelNode from Section 7.4.1.
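The rule for collecting singles can be illustrated in Python under a simplifying assumption: each alternative of a right-hand side is a flat sequence of symbol names (no nested options or iterations, which the top-down graph handles via Deletable and DelNode), and the set of deletable symbols is already known. The function name singles is hypothetical. Each pair (X, Y) with Y in the singles of X's rule becomes an edge for the circularity test.

```python
from typing import List, Set


def singles(alternatives: List[List[str]], nonterminals: Set[str],
            deletable: Set[str]) -> Set[str]:
    """Nonterminals X such that some alternative has the form  d1..dk X e1..em
    with all di and ej deletable."""
    result: Set[str] = set()
    for alt in alternatives:
        for k, sym in enumerate(alt):
            if (sym in nonterminals
                    and all(s in deletable for s in alt[:k])
                    and all(s in deletable for s in alt[k + 1:])):
                result.add(sym)
    return result


# X = A B | "x" C | D.   with B deletable
print(sorted(singles([["A", "B"], ["x", "C"], ["D"]], {"A", "B", "C", "D"}, {"B"})))
# ['A', 'D']
```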
7.5.4 Termination
A check is made as to whether all nonterminals can be derived into (possibly
empty) strings of terminals.
Basic idea: Those nonterminals are tagged which are deletable or can be
derived into a string consisting only of terminals or already tagged nonterminals. This is repeated until no more nonterminals can be tagged. The untagged nonterminals are those which cannot be derived into terminals.
Test if nt's can be derived to t's (↑ok):
  param  ok:       Boolean;
  global visited:  set of nodenumbers;    -- mark list for visited nodes
         termlist: set of nonterminals;   -- nonterminals which can be derived to terminals
  local  changed:  Boolean;
         sn:       Symbolnode;
begin
  termlist:={};
  repeat
    changed:=false;
    for all nonterminals i which are not in termlist do
      GetSy(↓i, ↑sn);
      visited:={};
      if IsTerm(↓sn.start) then
        termlist:=termlist+{i}; changed:=true;
      end;
    end;
  until not changed;
  ok:=all nonterminals are in termlist;
end Test if nt's can be derived to t's;
The procedure IsTerm(↓loc) checks if the top-down graph with the root loc
has a (possibly empty) path which consists only of terminals or already tagged
nonterminals.
IsTerm(↓loc): Boolean:
  param  loc:      Cardinal;
  global visited:  set of nodenumbers;
         termlist: set of nonterminals;
  local  gn:       Graphnode;
begin
  if (loc=0) or (loc in visited) then return false; end;   -- end or cycle
  visited:=visited+{loc};
  GetNode(↓loc, ↑gn);
  if (gn.typ=nt) and not (gn.sp in termlist) then
    return IsTerm(↓gn.lp);
  else
    return (gn.rp=0) or IsTerm(↓gn.rp) or IsTerm(↓gn.lp);
  end;
end IsTerm;
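The fixed-point iteration for termination can be sketched in Python. The grammar representation is an assumption of the sketch: a dictionary from nonterminal names to lists of alternatives, where symbols that are not keys are terminals. It illustrates the tagging idea and is not a transcription of IsTerm.

```python
from typing import Dict, List, Set


def nonterminating(grammar: Dict[str, List[List[str]]]) -> Set[str]:
    """Return the nonterminals that cannot be derived into a string of terminals."""
    termlist: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            if nt in termlist:
                continue
            # tag nt if one alternative has only terminals and tagged nonterminals
            # (an empty alternative qualifies: nt is deletable)
            if any(all(s not in grammar or s in termlist for s in alt) for alt in alts):
                termlist.add(nt)
                changed = True
    return set(grammar) - termlist


print(sorted(nonterminating({"S": [["a", "S"], ["A"]], "A": [["b", "A"]]})))   # ['A', 'S']
```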
7.5.5 LL(1) condition
A check is made as to whether it is always possible to decide which path of the
top-down graph should be followed during syntax analysis depending on the
next input symbol.
Basic idea: The LL(1) test consists of the following two subtests:
1. The terminal start symbols of all alternatives in an alternative chain must
be disjoint.
2. The terminal start symbols of deletable subgraphs must be different from
the terminal successors of the left-hand side nonterminal.
LL1 test (↑ok):
  param  ok:      Boolean;
  global visited: set of nodenumbers;
  local  sn:      Symbolnode;
begin
  ok:=true;
  for all nonterminals i do
    visited:={};                    -- mark list for visited nodes
    GetSy(↓i, ↑sn);
    CheckAlternatives(↓sn.start, ↓i, ↑ok);
  end;
end LL1 test;
The procedure CheckAlternatives(↓loc,↓sy,↑ok) checks if the alternative
chain with the root loc contains only alternatives with distinct start symbols
(subtest 1). If the subgraph rooted at loc is deletable (i.e. if it can produce the
empty string), it is also checked whether the start symbols of the subgraph are
different from the successors of the left-hand side nonterminal sy (subtest 2).
CheckAlternatives uses GetF(↓sy,↑first) and GetFo(↓sy,↑follow)
to access the already calculated sets of terminal start symbols and successors
of nonterminals.
CheckAlternatives(↓loc, ↓sy, ↑ok):
  param  loc, sy:  Cardinal;
         ok:       Boolean;
  global visited:  set of nodenumbers;   -- mark list for visited nodes
  local  first:    set of terminals;
         follow:   set of terminals;
         locset:   set of terminals;     -- start symbols of current node
         s:        set of terminals;     -- start symbols of prev. alt.
         gn:       Graphnode;
begin
  if (loc=0) or (loc in visited) then return; end;   -- end or cycle
  if Deletable(↓loc) then                            -- subtest 2
    GetFirstSet(↓loc, ↑s);
    GetFo(↓sy, ↑follow);
    if s * follow <> {} then ok:=false; end;
  end;
  s:={};
  while loc<>0 do                                    -- for all alternatives
    if loc in visited then return; end;
    visited:=visited+{loc};
    GetNode(↓loc, ↑gn);                              -- subtest 1
    if DelNode(↓gn) then
      GetFirstSet(↓gn.rp, ↑locset);
    else
      locset:={};
    end;
    case gn.typ of
      t:        locset:=locset+{gn.sp};
    | nt:       GetF(↓gn.sp, ↑first); locset:=locset+first;
    | eps, any: -- nothing
    end;
    if s * locset <> {} then ok:=false; end;
    s:=s+locset;
    CheckAlternatives(↓gn.rp, ↓sy, ↑ok);
    loc:=gn.lp;
  end;
end CheckAlternatives;
The procedures Deletable(↓loc) and DelNode(↓gn) from Section 7.4.1
check whether the top-down graph with the root loc or the graph node gn is
deletable. The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 supplies
the terminal start symbols s of the top-down graph with the root loc.
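The two subtests for a single alternative chain can be sketched in Python. The sketch assumes that the start-symbol sets of the alternatives have already been computed, each one including the successors of a deletable first node (as GetFirstSet(↓gn.rp,↑locset) provides above), and that terminals are integers; the function name and the message strings are hypothetical.

```python
from typing import List, Set, Tuple


def ll1_conflicts(alt_firsts: List[Set[int]], deletable: bool,
                  follow: Set[int]) -> List[Tuple[str, Set[int]]]:
    """Apply the two LL(1) subtests to one alternative chain."""
    problems: List[Tuple[str, Set[int]]] = []
    seen: Set[int] = set()               # start symbols of the previous alternatives
    for first in alt_firsts:             # subtest 1
        if seen & first:
            problems.append(("alternatives overlap", seen & first))
        seen |= first
    if deletable and seen & follow:      # subtest 2
        problems.append(("start symbols overlap successors", seen & follow))
    return problems


print(ll1_conflicts([{1, 2}, {2, 3}], False, set()))   # [('alternatives overlap', {2})]
```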
7.6 Generation of the parser tables
When the grammar tests are completed, Coco can generate the target compiler.
From the symbol list and the top-down graph, the parser tables which drive
the generated compiler are constructed. The tables contain information for the
recognition of symbols and for error handling, including the G-code which
controls the syntax analysis. This section is structured as shown in Fig. 7.16.
7.6.1 Table format
The parser tables are inserted into the generated syntax analyzer as initialization code. Table 7.1 shows their contents:
Fig. 7.16 Structure of Section 7.6 (generation of the parser tables: table format, generation of the G-code, generation of the remaining tables)
Table 7.1 Contents of the parser tables
header             table dimensions (for decoding)
code               G-code
ntsymbols          information about nonterminals
epssets            sets of valid successors, one for each eps-instruction in the G-code
anysets            sets of terminals represented by each any-symbol
attribute numbers  number of attributes for each terminal and each pragma
pragma semantics   for each pragma, the semantic actions to be executed when
                   the pragma is recognized
namelist           symbol names for error messages
name pointers      pointers to the symbol names
The structure of the above data is shown by the following Modula-2 type
declarations:
TYPE
  Header = RECORD
             maxcodevar, maxtvar, maxpvar, maxsvar,
             maxepsvar, maxanyvar, maxnamevar, maxnamepvar: CARDINAL;
           END;
  Code = ARRAY [1..maxcode] OF [0..255];
  Symbolset = ARRAY [0..maxt DIV 16] OF BITSET;
  Ntsymbols = ARRAY [maxp+1..maxsym] OF RECORD
                startpc: CARDINAL;    (*start of rule in G-code*)
                del:     BOOLEAN;     (*true, if deletable*)
                ...:     Symbolset;   (*terminal start symbols*)
              END;
  Epsset = ARRAY [1..maxeps] OF Symbolset;
  Anyset = ARRAY [1..maxany] OF Symbolset;
  Attributenumbers = ARRAY [0..maxp] OF [0..255];
  Pragmasemantics = ARRAY [maxt..maxp] OF RECORD   (*element maxt is a dummy*)
                      sem1, sem2: CARDINAL;
                    END;
  Namelist = ARRAY [1..maxname] OF CHAR;
  Namepointers = ARRAY [0..maxnamep] OF CARDINAL;
  Checksum = CARDINAL;
The constants maxcode, maxt, maxp, etc. are the table dimensions derived
from the input grammar. They are inserted into the generated syntax analyzer
as constant declarations. The header of the parser tables contains the same
values again, this time as variables. They are not used by the syntax analyzer
but are reserved for a decoding program.
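The type Symbolset stores a set of terminals as an array of BITSETs, one word for every 16 terminals. A Python sketch of this packing, assuming 16-bit BITSETs in which bit i of a word represents element i (as the DIV 16 in the declaration suggests; the actual word size depends on the Modula-2 implementation); the function names are hypothetical:

```python
from typing import Iterable, List


def pack_symbolset(terminals: Iterable[int], maxt: int) -> List[int]:
    """Pack terminal numbers 0..maxt into 16-bit words, like ARRAY[0..maxt DIV 16] OF BITSET."""
    words = [0] * (maxt // 16 + 1)
    for t in terminals:
        if not 0 <= t <= maxt:
            raise ValueError(f"terminal {t} outside 0..{maxt}")
        words[t // 16] |= 1 << (t % 16)    # word t DIV 16, bit t MOD 16
    return words


def contains(words: List[int], t: int) -> bool:
    return (words[t // 16] >> (t % 16)) & 1 == 1


w = pack_symbolset({0, 3, 17}, maxt=20)
print(w, contains(w, 17), contains(w, 16))   # [9, 2] True False
```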
7.6.2 Generation of the G-code
The G-code is derived from the top-down graph. This process is very simple:
a recursive algorithm visits all nodes of the top-down graph and translates
them into G-code instructions. The simplified algorithm is shown below:
GenCode(↓node):
  Generate code for node;
  if (node.rp<>0) and (node.rp not yet visited) then
    GenCode(↓node.rp);
  end;
  if (node.lp<>0) and (node.lp not yet visited) then
    GenCode(↓node.lp);
  end;
end GenCode;
Each node is processed as follows (for the definition of the G-code, see
Section 2.4 or Appendix D):
1. Depending on the node type, a G-code instruction for the recognition of
this node is generated (T, NT, NTS, ANY and EPS instructions). For
nodes with a nonzero left pointer value, the generated instruction also
contains the address of the corresponding alternative (TA, NTA, NTAS,
ANYA and EPSA instructions).
2. If semantic actions are specified in the node, SEM instructions are generated.
3. If the right pointer of the node is zero, a RET instruction is generated.
4. If the right pointer points to an already visited node, a JMP instruction to
the address of this node is generated.
In order to resolve jumps and addresses of alternatives, an address list of all
G-code sequences generated from graph nodes is needed. It is handled by the
following procedures:
PROCEDURE NewAdr(loc: CARDINAL; adr: CARDINAL);
PROCEDURE GetAdr(loc, fixup: CARDINAL; VAR adr: CARDINAL);
PROCEDURE Visited(loc: CARDINAL): BOOLEAN;
NewAdr records that the G-code sequence generated from node loc has the
address adr. GetAdr returns the address adr of the G-code sequence corresponding to node loc. If the address is not yet in the address list, then adr is
zero. In this case, fixup is remembered as a G-code location where the node's
address is to be entered as soon as it becomes known. An address becomes
known when it is defined by NewAdr; it is then automatically entered into
all fixup locations waiting for this address. Visited returns TRUE if the
address of the node with number loc is already known.
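The address list with its fixup mechanism is a form of back-patching. The following Python class is a hypothetical analogue of NewAdr, GetAdr and Visited, assuming for simplicity that an address occupies a single slot of the code array:

```python
from typing import Dict, List


class AddressList:
    """Node addresses of G-code sequences, with back-patching of forward references."""

    def __init__(self, code: List[int]):
        self.code = code                         # G-code being generated (patched in place)
        self.adr: Dict[int, int] = {}            # node number -> address
        self.fixups: Dict[int, List[int]] = {}   # node number -> positions waiting for it

    def new_adr(self, loc: int, adr: int) -> None:
        self.adr[loc] = adr
        for pos in self.fixups.pop(loc, []):     # fill in all pending references
            self.code[pos] = adr

    def get_adr(self, loc: int, fixup: int) -> int:
        if loc in self.adr:
            return self.adr[loc]
        self.fixups.setdefault(loc, []).append(fixup)
        return 0                                 # placeholder until new_adr(loc, ...)

    def visited(self, loc: int) -> bool:
        return loc in self.adr


code = [0] * 10
al = AddressList(code)
print(al.get_adr(7, 3), al.visited(7))   # 0 False
al.new_adr(7, 42)
print(code[3], al.get_adr(7, 5))         # 42 42
```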
Two additional procedures are needed: one to emit G-code instructions
and one to access nodes of top-down graphs:
PROCEDURE Emit(VAR pc: CARDINAL; code: Instruction);
PROCEDURE GetNode(loc: CARDINAL; VAR node: Graphnode);
Emit writes the specified instruction code into the code segment at the location
pc and advances pc, and thus the code segment length, accordingly. For
readability, Instruction is treated here as a symbolic type represented by the
text of the instruction; the actual implementation differs in this respect.
GetNode gets the graph node with the node number loc. The type Graphnode
is described in Section 7.3.1.
The actual algorithm for the generation of the G-code follows:
Generate G-code:
  local pc: Cardinal;
begin
  pc:=1;
  for all nonterminals i do
    GenCode(↓root of top-down graph of nonterminal i, ↕pc);
  end;
end Generate G-code;
GenCode(↓loc,↕pc) is a recursive procedure which will now be refined. It
translates the top-down graph with the root loc into a corresponding G-code
sequence and inserts it into the code segment at the location pc.
When GenCode arrives at a node loc that has already been visited, the
G-code for the subgraph at loc has already been generated, so this node does
not have to be revisited.
GenCode(↓loc, ↕pc):
  param loc, pc: Cardinal;
  var   node:    Graphnode;
        adr, nr: Cardinal;
begin
  if Visited(↓loc) then return; end;          -- already visited
  NewAdr(↓loc, ↓pc);                          -- now visit loc
  GetNode(↓loc, ↑node);
  case node.typ of
    t:
      if node.lp=0 then
        Emit(↕pc, ↓"T node.sp");
      else
        GetAdr(↓node.lp, ↓pc+2, ↑adr);
        Emit(↕pc, ↓"TA node.sp,adr");
      end;
  | nt:
      if node.lp=0 then
        if node.sem1=0
        then Emit(↕pc, ↓"NT node.sp");
        else Emit(↕pc, ↓"NTS node.sp,node.sem1");
        end;
      else
        GetAdr(↓node.lp, ↓pc+2, ↑adr);
        if node.sem1=0
        then Emit(↕pc, ↓"NTA node.sp,adr");
        else Emit(↕pc, ↓"NTAS node.sp,adr,node.sem1");
        end;
      end;
  | any:
      if node.sp=0 then
        Emit(↕pc, ↓"ANY");
      else
        GetAdr(↓node.lp, ↓pc+2, ↑adr);
        Emit(↕pc, ↓"ANYA node.sp,adr");
      end;
  | eps:
      if node.sp<>0 then                      -- node with eps-set
        if node.lp=0 then
          Emit(↕pc, ↓"EPS node.sp");
        else
          GetAdr(↓node.lp, ↓pc+2, ↑adr);
          Emit(↕pc, ↓"EPSA node.sp,adr");
        end;
      end;
  end; -- case
  if node.sem2<>0 then Emit(↕pc, ↓"SEM (node.sem2)"); end;
  if node.sem3<>0 then Emit(↕pc, ↓"SEM (node.sem3)"); end;
  if node.rp=0 then
    Emit(↕pc, ↓"RET");
  else
    if Visited(↓node.rp) then
      GetAdr(↓node.rp, ↓pc+1, ↑adr); Emit(↕pc, ↓"JMP adr");
    end;
  end;
  if node.rp<>0 then GenCode(↓node.rp, ↕pc); end;
  if node.lp<>0 then GenCode(↓node.lp, ↕pc); end;
end GenCode;
The G-code is completely stored in memory so that the missing addresses can
be inserted when they become known.
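To see how GenCode, the address list and the fixups interact, here is a reduced Python sketch. It is not Coco's generator: it handles only t and nt nodes without semantic actions, represents node numbers, symbols and instructions as Python values, and lays out each instruction as consecutive code slots (opcode, symbol, optional address), which is an assumption rather than the real G-code encoding of Appendix D. As in GenCode above, the address of an alternative is placed at pc+2 and the target of a JMP at pc+1.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, str, int, int]   # (typ, sp, lp, rp); typ is "t" or "nt", 0 = nil


def gen_gcode(roots: Dict[str, int], nodes: Dict[int, Node]) -> Tuple[list, Dict[str, int]]:
    code: list = [None]            # slot 0 unused so that addresses start at 1
    adr: Dict[int, int] = {}       # node -> address of its G-code
    fixups: Dict[int, List[int]] = {}

    def new_adr(loc: int, a: int) -> None:
        adr[loc] = a
        for pos in fixups.pop(loc, []):
            code[pos] = a

    def get_adr(loc: int, fixup: int) -> int:
        if loc in adr:
            return adr[loc]
        fixups.setdefault(loc, []).append(fixup)
        return 0

    def gen(loc: int) -> None:
        if loc in adr:             # already visited: its code exists
            return
        pc = len(code)
        new_adr(loc, pc)
        typ, sp, lp, rp = nodes[loc]
        op = "T" if typ == "t" else "NT"
        if lp == 0:
            code.extend([op, sp])
        else:                      # address of the alternative goes to slot pc+2
            code.extend([op + "A", sp, get_adr(lp, pc + 2)])
        if rp == 0:
            code.append("RET")
        elif rp in adr:            # successor already generated: jump to it
            jmp = len(code)
            code.extend(["JMP", get_adr(rp, jmp + 1)])
        if rp != 0:
            gen(rp)
        if lp != 0:
            gen(lp)

    starts: Dict[str, int] = {}
    for name, root in roots.items():
        starts[name] = len(code)
        gen(root)
    return code, starts


# S = "a" ("b" | "c").   Node 1: a, node 2: b with alternative node 3: c.
code, starts = gen_gcode({"S": 1}, {1: ("t", "a", 0, 2), 2: ("t", "b", 3, 0),
                                   3: ("t", "c", 0, 0)})
print(code[1:], starts)   # ['T', 'a', 'TA', 'b', 7, 'RET', 'T', 'c', 'RET'] {'S': 1}
```

For the loop {"a"} "b" (node 1 with its right pointer back to itself and the alternative node 2), the same function yields TA a 6, JMP 1, T b, RET: the jump target is known at once, while the alternative address 6 is patched in later.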
7.6.3 Generation of the remaining tables
Apart from the G-code, the contents of the generated tables are taken almost
entirely from the symbol list; only the name list is managed by the lexical
analyzer of Coco. Coco gets the necessary data from the symbol list and from
the lexical analyzer with the help of access procedures, and writes them unchanged into the syntax analyzer as initialization values.
7.7 Generation of the syntax analyzer
Coco generates a table-driven LL(1) syntax analyzer with error handling in the
form of a Modula-2 source module which the user must compile and include
in his compiler. The syntax analyzer is the implementation of the analysis
algorithm described in Section 2.5. It is the same for all generated compilers.
Only the parser tables differ from compiler to compiler, so they have to be
inserted into the otherwise invariant parser module.
Fig. 7.17 Structure of Section 7.7 (generation of the syntax analyzer)
The definition module and the implementation module of the syntax analyzer
are generated from a frame text which Coco reads from the file cocosynframe. At certain locations grammar-dependent parts have to be inserted into
this frame. The locations are marked by the string '-->' and a descriptive name
of the text to be inserted. The following table shows what has to be inserted at
these locations.
-->modulename     grammar name + syn
-->semantic       grammar name + sem (semantic analyzer module)
-->input          grammar name + lex (lexical analyzer module)
-->declarations   table dimensions declared as constants
                  (see example in Section 8.3)
-->tables         table values
The syntax analyzer contains references to other modules (e.g. the lexical
analyzer or the semantic evaluator) whose names are constructed from the
grammar name (the name of the root symbol in the attributed grammar) and
from a suffix. The resulting syntax analyzer is written to the files grammarnamesyn.DEF and grammarnamesyn.MOD.
Coco uses a procedure CopyFramePart to copy pieces of text from the
frame to the syntax analyzer module.
PROCEDURE CopyFramePart(VAR source, target: File; str: ARRAY OF CHAR);
CopyFramePart copies text from the file source to the file target until it encounters the string str (str is not copied). When it is called next, it continues
copying with the text immediately after str.
This procedure is called with the name of the next piece of text to be
inserted (e.g. '-->tables'). It copies the frame up to this name and then Coco
inserts the specified text in place of the name. This process is repeated until the
entire syntax analyzer has been generated. A source listing of cocosynframe
is shown in Appendix F. The module cocosyn, also shown in Appendix F, is
an example of a syntax analyzer generated by this process.
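A Python sketch of the frame-copying scheme, working on strings instead of files. The class and function names are hypothetical, and raising an error for a missing marker is a choice of this sketch, not documented behaviour of CopyFramePart:

```python
class Frame:
    """Sequential reader over a frame text, in the spirit of CopyFramePart."""

    def __init__(self, text: str):
        self.text = text
        self.pos = 0                      # where the next copy starts

    def copy_part(self, out: list, marker: str) -> None:
        """Copy up to (not including) marker and skip the marker itself."""
        i = self.text.find(marker, self.pos)
        if i < 0:
            raise ValueError(f"marker {marker!r} not found in frame")
        out.append(self.text[self.pos:i])
        self.pos = i + len(marker)

    def copy_rest(self, out: list) -> None:
        out.append(self.text[self.pos:])
        self.pos = len(self.text)


def fill_frame(frame_text: str, insertions: list) -> str:
    frame, out = Frame(frame_text), []
    for marker, text in insertions:       # markers in the order they occur
        frame.copy_part(out, marker)
        out.append(text)
    frame.copy_rest(out)
    return "".join(out)


frame = "MODULE -->modulename;\nCONST -->declarations\nEND -->modulename.\n"
print(fill_frame(frame, [("-->modulename", "Exprsyn"),
                         ("-->declarations", "maxt = 5;"),
                         ("-->modulename", "Exprsyn")]))
```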
7.8 Generation of the semantic evaluator
In addition to the syntax analyzer and the parser tables, Coco also generates a
semantic evaluator. This is a Modula-2 source module which the user must
compile and include in his compiler. The semantic evaluator consists of some
invariant parts and of the semantic actions and declarations which are copied
from the attributed grammar. Its generation can be divided into three tasks:
1. copy the semantic declarations from the attributed grammar to the semantic evaluator;
2. translate the semantic actions into components of a case statement;
3. generate new semantic actions (assignments) for attribute passing.
Before covering these three tasks in detail, we will describe the invariant parts
of the semantic evaluator.
Fig. 7.18 Structure of Section 7.8 (generation of the semantic evaluator: constant parts of the semantic evaluator, translation of semantic declarations, translation of semantic actions, attribute processing)
7.8.1 The invariant parts of the semantic evaluator
Like the syntax analyzer, the semantic analyzer is derived from a frame
module which Coco reads from the file cocosemframe. Again Coco copies
the frame using the procedure CopyFramePart (see Section 7.7) and inserts
grammar-dependent parts at some specified places in the frame. These places
are:
-->modulename     grammar name + sem
-->scannername    grammar name + lex
-->declarations   semantic declarations of the grammar
-->actions        semantic actions of the grammar
The frame module is as follows:
DEFINITION MODULE -->modulename;
  VAR printactions: BOOLEAN;
  PROCEDURE Semant(sem: CARDINAL);
END -->modulename.
IMPLEMENTATION MODULE -->modulename;
  FROM SYSTEM IMPORT WORD;
  FROM -->scannername IMPORT at;
  -->declarations
  PROCEDURE ASSIGN(VAR x: WORD; y: WORD);
  BEGIN
    x:=y
  END ASSIGN;
  PROCEDURE Semant(sem: CARDINAL);
  BEGIN
    CASE sem OF
      11: ;   (*action numbers start at 12*)
      -->actions
    END;
  END Semant;
END -->modulename.
The resulting semantic analyzer is written to the files grammarnamesem.DEF
and grammarnamesem.MOD. The user may set the exported variable printactions to TRUE if he wants a trace of the executed semantic actions.
7.8.2 Processing of the semantic declarations
The semantic declarations, which are written in Modula-2, are copied immediately and without change from the attributed grammar to the frame program,
and are inserted at the location marked by '-->declarations'. This happens in
the following manner: the lexical analyzer of Coco returns the symbols of the
Modula-2 text to the syntax analyzer as Cocol symbols, and from there they
go to the semantic evaluator. The procedure Copy(↓typ,↓col) is called for
each symbol to translate its symbol code back into its source text, which is
then inserted into the frame module.
Problems can arise since the Modula-2 text may contain symbols that are
not Cocol symbols (e.g. +, *, &). Such symbols are copied by means of a
trick: the lexical analyzer assigns them a special symbol code (nococosy) and
an attribute (spix). They are treated like names and entered into the name list.
spix is their address in the name list, which allows their source text to be
accessed.
In order to keep the name list small, the Modula-2 names are entered only
temporarily. Permanent storage is prevented with the procedure StopHash.
This causes a name to be entered, but overwritten by the next name, so the
names can be accessed via their addresses just like the permanently stored
names, but only until the next name has been recognized. The procedure
RestartHash re-establishes permanent storage.
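The effect of StopHash and RestartHash can be modelled with a toy name list in Python. The model is hypothetical in several respects: it ignores hashing altogether, stores each name followed by a NUL character, and assumes that the last temporary name is discarded when RestartHash is called (the text does not say what happens to it).

```python
from typing import List, Optional


class NameList:
    """Toy name list with temporary entries, modelled on StopHash/RestartHash."""

    def __init__(self) -> None:
        self.chars: List[str] = []          # names stored one after another, each ended by NUL
        self.permanent = True
        self.temp: Optional[int] = None     # start of the current temporary name

    def enter(self, name: str) -> int:
        if not self.permanent and self.temp is not None:
            del self.chars[self.temp:]      # overwrite the previous temporary name
        spix = len(self.chars)
        self.chars.extend(name)
        self.chars.append("\0")
        if not self.permanent:
            self.temp = spix
        return spix

    def name(self, spix: int) -> str:
        end = self.chars.index("\0", spix)
        return "".join(self.chars[spix:end])

    def stop_hash(self) -> None:
        self.permanent, self.temp = False, None

    def restart_hash(self) -> None:
        if self.temp is not None:           # assumption: drop the last temporary name
            del self.chars[self.temp:]
        self.permanent, self.temp = True, None


nl = NameList()
a = nl.enter("Expr")
nl.stop_hash()
x = nl.enter("WriteString")
y = nl.enter("+")                           # reuses the slot of "WriteString"
print(x == y, nl.name(y))                   # True +
```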
Coco copies the declarations without checking the syntax. If there are
syntax errors, they will be detected by the Modula-2 compiler when the
generated semantic evaluator is compiled. The translation of the semantic
declarations is described by the following attributed grammar in Cocol.
GRAMMAR Declarations
SEMANTIC DECLARATIONS
  FROM cocogen IMPORT Copy;
  FROM cocolex IMPORT col, typ, StopHash, RestartHash;
  -- PROCEDURE Copy(typ, col: CARDINAL);
  --   writes the source text of the symbol 'typ' to the generated
  --   semantic analyzer. col is the column of the symbol in the grammar.
TERMINALS
  SEMANTICSY DECLARATIONSY
NONTERMINALS
  Declarations
RULES
  Declarations =
    SEMANTICSY DECLARATIONSY   sem StopHash endsem
    { any                      sem Copy(↓typ, ↓col) endsem
    }                          sem RestartHash endsem.
ENDGRAM
7.8.3 Processing of the semantic actions
Coco translates the semantic actions of the attributed grammar into consecutively numbered variants of a case statement, and inserts them into the
semantic frame program at the location marked by the string '-->actions'.
Like the declarations, the semantic actions are copied unchanged and without a
syntax check. Again, each symbol is copied by translating its symbol code
back into its source text. We describe this process in Cocol.
GRAMMAR SemAction
SEMANTIC DECLARATIONS
  FROM cocogen IMPORT Copy, OpenSem;
  FROM cocosym IMPORT NewMacro, GetMacroNr;
  FROM cocolex IMPORT col, typ, StopHash, RestartHash;
  FROM Errors  IMPORT SemErr;
  -- PROCEDURE OpenSem(VAR sem: CARDINAL);
  --   generates a new case label and returns its number sem.
  -- PROCEDURE GetMacroNr(spix: CARDINAL; VAR sem: CARDINAL);
  --   gets the action number sem of the macro 'spix'. If the macro
  --   does not exist, sem=0.
  VAR spix, sem: CARDINAL;
TERMINALS
  SEMSY ENDSEMSY IDENT<out:spix>
NONTERMINALS
  SemAction<out:sem>
RULES
  SemAction<out:sem> =
    SEMSY
    ( "(" IDENT<out:spix>        sem GetMacroNr(↓spix, ↑sem);
                                     IF sem=0 THEN SemErr(1) END endsem   -- macro
      ")"
    |                            sem OpenSem(↑sem); StopHash endsem
      { any                      sem Copy(↓typ, ↓col) endsem
      }                          sem RestartHash endsem
    )
    ENDSEMSY.
ENDGRAM
The above grammar also shows how semantic macros are processed. The
module cocosym handles a list of macro names and their corresponding
semantic action numbers. The action number of a macro is supplied by the
procedure GetMacroNr.
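A small Python model of consecutively numbered actions and the macro table follows. The numbering starts at 12 because the frame's CASE statement already contains the dummy variant 11; the class, its method names and the "| number: text" layout of the variants are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


class ActionTable:
    FIRST = 12   # the frame's CASE already contains the dummy variant 11

    def __init__(self) -> None:
        self.variants: List[Tuple[int, str]] = []   # (action number, source text)
        self.macros: Dict[str, int] = {}            # macro name -> action number

    def open_sem(self, text: str) -> int:
        sem = self.FIRST + len(self.variants)
        self.variants.append((sem, text))
        return sem

    def define_macro(self, name: str, sem: int) -> None:
        self.macros[name] = sem

    def get_macro_nr(self, name: str) -> int:
        return self.macros.get(name, 0)             # 0: macro does not exist

    def case_text(self) -> str:
        return "\n".join(f"| {sem}: {text}" for sem, text in self.variants)


tab = ActionTable()
s1 = tab.open_sem("WriteString('x')")
s2 = tab.open_sem("count:=0")
tab.define_macro("Init", s2)
print(tab.case_text())
```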
7.8.4 Attribute processing
While declarations and semantic actions need only be copied from the
attributed grammar into the semantic evaluator, attributes need further processing. For each symbol, its attributes must be stored in the symbol list, and
must be checked for consistency every time this symbol occurs. In addition to
this, Coco must generate semantic actions by which values are assigned to the
attributes at run-time.
The processing of attributes depends on the context in which they appear.
In Cocol there are three different places where attributes may occur:
1. at the declaration of a syntax symbol;
2. at a nonterminal on the left-hand side of a rule;
3. at a symbol on the right-hand side of a rule.
We will now describe the processing of attributes in each of these three cases,
and then summarize it by an attributed grammar.
Declaration of attributes
Attributes are declared together with syntax symbols and are entered into the
symbol list. The context of attribute declarations is:
SyntaxDeclarations =
  TERMINALS    {Symbol [Attributes] [AliasName]}
  [ PRAGMAS    {Symbol [Attributes] [SemAction]} ]
  NONTERMINALS {identifier [Attributes] [AliasName]}.
Coco uses the procedure NewAt to enter an attribute into the symbol list.
TYPE Direction = (up, down);
PROCEDURE NewAt(sy, spix: CARDINAL; dir: Direction);
NewAt enters an attribute spix with the direction dir for the symbol sy.
Depending on the kind of sy, the following information is stored:
for terminal symbols:  number of attributes;
for pragmas:           number of attributes;
for nonterminals:      number, name, and direction of attributes.
Attributes on the left-hand side of productions
Attributes on the left-hand side of productions are called formal attributes.
Their context is:
Rule = identifier [Attributes] "=" Expression "." .
Formal attributes are checked for consistency with their declaration. For every
left-hand side nonterminal the number of attributes, their names, order, and
direction must agree with the attributes declared for this nonterminal. The
procedure GetAt is used to access the attribute information in the symbol list.
It gets the name (spix) and the direction (dir) of the nth attribute of the
nonterminal sy. If sy has fewer than n attributes, then spix is zero.
PROCEDURE GetAt(sy, n: CARDINAL; VAR spix: CARDINAL; VAR dir: Direction);
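The consistency check for formal attributes can be sketched in Python. The sketch is hypothetical: attribute lists are given as (name, direction) pairs using the directions up and down of the type Direction, and errors are reported as strings rather than through SemErr. Since attributes are compared by position, a wrong order shows up as name mismatches.

```python
from typing import List, Tuple

Attr = Tuple[str, str]   # (name, direction), direction "up" or "down"


def check_formal(declared: List[Attr], formal: List[Attr]) -> List[str]:
    """Compare the formal attributes of a left-hand side with its declaration."""
    errors = []
    if len(formal) != len(declared):
        errors.append(f"{len(formal)} attributes given, {len(declared)} declared")
    for n, ((dname, ddir), (fname, fdir)) in enumerate(zip(declared, formal), start=1):
        if fname != dname:
            errors.append(f"attribute {n}: name {fname} instead of {dname}")
        if fdir != ddir:
            errors.append(f"attribute {n}: direction {fdir} instead of {ddir}")
    return errors


decl = [("x", "down"), ("y", "up")]
print(check_formal(decl, [("x", "up")]))
# ['1 attributes given, 2 declared', 'attribute 1: direction up instead of down']
```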
Attributes on the right-hand side of productions
Attributes on the right-hand side of productions appear as actual attributes of
syntax symbols in EBNF expressions.
Expression = Term {"|" Term}.
Term       = Factor {Factor}.
Factor     = Symbol [Attributes]
           | ...
In this context, attributes denote semantic values which result from the recognition of a syntax symbol, or which are required for its recognition. Coco
generates assignments between the attribute values and the attribute names,
and includes them as semantic actions in the evaluator program. It also checks
whether the number of attributes, their order and their direction agree with the
corresponding attribute declaration.
Attribute assignments for terminals and pragmas
The lexical analyzer of the generated compiler exports the attribute values of
terminals and pragmas in the variable at. The array at is filled for each symbol by the lexical analyzer. A terminal (or pragma) t<out:a,b> is handled by
the generated compiler as follows:
recognize t and fill at;
a:=at[1];
b:=at[2];
When t has been recognized, a semantic action must be executed in which
the attribute values at[1] and at[2] are assigned to the attributes a and b.
Since such an action does not exist, Coco must generate it.
Attribute assignments for nonterminals
For nonterminals, attribute assignments occur between formal and actual attributes. A nonterminal nt<in:a,b; out:c,d> is handled by the generated
compiler as follows:
formal attribute corresponding to a := a;
formal attribute corresponding to b := b;
parse nt;
c := formal attribute corresponding to c;
d := formal attribute corresponding to d;
Again Coco must generate semantic actions for the attribute assignments.
Generation of attribute assignments
For each attribute on the right-hand side of a production, Coco calls the
procedure GenAssign, which generates an assignment of the corresponding
attribute value to the attribute variable.
TYPE Attrkind = (term,      (*attribute of a terminal*)
                 nonterm,   (*attribute of a nonterminal*)
                 const);    (*const. value as an attribute of an nt*)
PROCEDURE GenAssign(typ: Attrkind; left, right: CARDINAL);
Table 7.2 shows the meaning of the parameters left and right depending on
the value of the parameter typ. It also shows which code is generated:
Table 7.2 Parameters of GenAssign and the generated code
typ       meaning of left      meaning of right      generated code
term      spix of left side    index into at         name(left):=at[right]
nonterm   spix of left side    spix of right side    name(left):=name(right)
const     spix of left side    constant              name(left):=right
name(spix) denotes the name at the address spix in the name list. The array
at is exported by the lexical analyzer and contains the attribute values of the
most recently recognized terminal.
The procedure EmitAction builds a semantic action from the attribute
assignments generated since its last call. It inserts the action as a variant of a
case statement into the semantic evaluator. Thus, the semantic evaluator contains not only the semantic actions of the attributed grammar, but also the
actions generated from attributes by Coco. EmitAction returns the action
number of the generated semantic action.
PROCEDURE EmitAction(VAR sem: CARDINAL);
Optimization of attribute passing
Coco performs two optimizations to reduce the number of attribute assignments:
1. If the formal and the actual attribute of a nonterminal have identical
   names, no assignment is generated.
2. Identical semantic actions (with the same assignments) are generated only
   once.
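The interplay of EmitAction and the two optimizations can be modelled in a few lines. The sketch below is a hypothetical Python model, not Coco's implementation:

- Assignments are collected as strings.
- An assignment between identical formal and actual names is skipped.
- Each distinct action body receives a number the first time it is emitted.
- The collected actions are printed as the CASE statement of the semantic evaluator.

Two details are assumptions: numbering starts at 1, and 0 is returned when no assignment is pending.

```python
class ActionTable:
    """Hypothetical model of EmitAction with Coco's two optimizations."""

    def __init__(self) -> None:
        self.pending: list = []    # assignments generated since the last EmitAction
        self.actions: dict = {}    # action body (tuple of assignments) -> number

    def assign(self, left: str, right: str) -> None:
        if left != right:          # optimization 1: identical names, no assignment
            self.pending.append(f"{left}:={right}")

    def emit_action(self) -> int:
        body = tuple(self.pending)
        self.pending = []
        if not body:
            return 0               # assumption: nothing to do, action number 0
        if body not in self.actions:   # optimization 2: each action only once
            self.actions[body] = len(self.actions) + 1
        return self.actions[body]

    def case_statement(self) -> str:
        arms = [f"{num}: {'; '.join(body)}" for body, num in self.actions.items()]
        return "CASE sem OF\n  " + "\n| ".join(arms) + "\nEND"


table = ActionTable()
table.assign("a", "x")
table.assign("b", "b")             # skipped: formal and actual names agree
first = table.emit_action()        # 1
table.assign("a", "x")
second = table.emit_action()       # 1 again: the identical action is reused
print(table.case_statement())
```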
Description of the attribute processing in Cocol
We will now summarize the attribute processing, describing it by an attributed
grammar in Cocol. The start symbol of the grammar is the nonterminal
Attributes. The grammar is a segment of a larger grammar in which attributes
can appear in various contexts. Therefore, Attributes has three input attributes
which control its processing.
  Attributes<in:sy,styp,kind; out:sem1,sem2>
sy denotes the symbol to which the attributes belong; styp specifies the type
of this symbol; kind is the context in which the attributes are being used
indicating how they are to be processed:
kind=def:    treat them as an attribute declaration;
kind=check:  perform a consistency check
             (when used on the left-hand side of a production);
kind=use:    generate semantic actions for attribute passing
             (when used on the right-hand side of a production).
sem1 and sem2 are the numbers of the generated semantic actions for input
and output attribute passing (or zero).
GRAMMAR Attributes

SEMANTIC DECLARATIONS
  FROM cocosym IMPORT NewAt, GetAt, CompleteAt, Direction, Usage,
                      Symboltype;
  FROM cocogen IMPORT Attrtype, EmitAction, GenAssign;
  FROM Errors  IMPORT SemErr;

  -- Attrtype   = (term,nonterm,const);
  -- Direction  = (up,down);          (*attribute direction*)
  -- Usage      = (def,check,use);    (*attribute context*)
  -- Symboltype = (eps,t,pr,nt,any);

  -- PROCEDURE NewAt(sy,spix:CARDINAL; dir:Direction);
  --   declares an attribute for the symbol sy with the name spix and
  --   the direction dir.
  -- PROCEDURE GetAt(sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
  --   gets the name spix and the direction dir of attribute number n
  --   of symbol sy. If sy has less than n attributes, then spix=0.
  -- PROCEDURE CompleteAt(sy,n:CARDINAL):BOOLEAN;
  --   returns true if symbol sy has exactly n attributes.

  VAR
    sy, spix, spix1, sem1, sem2, n, val: CARDINAL;
    styp: Symboltype;
    kind: Usage;
    dir, dir1: Direction;
MACROS
  sem :AssignInAt:
    n:=n+1;
    CASE kind OF
      use:   IF styp=nt THEN
               GetAt(↓sy,↓n,↑spix1,↑dir1);
               IF spix1<>0 THEN
                 IF dir=dir1 THEN GenAssign(↓nonterm,↓spix1,↓spix)
                 ELSE SemErr(2)
                 END
               END
             END;
    | check: IF styp=nt THEN
               GetAt(↓sy,↓n,↑spix1,↑dir1);
               IF spix1<>0 THEN
                 IF spix<>spix1 THEN SemErr(3) END;
                 IF dir<>dir1 THEN SemErr(2) END
               END
             END;
    | def:   NewAt(↓sy,↓spix,↓dir)
    END (*CASE*)
  endsem

  sem :AssignNumber:
    n:=n+1;
    IF kind=use THEN
      IF styp=nt THEN
        GetAt(↓sy,↓n,↑spix1,↑dir1);
        IF spix1<>0 THEN
          IF dir=dir1 THEN GenAssign(↓const,↓spix1,↓val)
          ELSE SemErr(2)
          END
        END
      END
    ELSE SemErr(4)
    END
  endsem

  sem :AssignOutAt:
    n:=n+1;
    CASE kind OF
      use:   IF styp=t THEN GenAssign(↓term,↓spix,↓n)
             ELSIF styp=nt THEN
               GetAt(↓sy,↓n,↑spix1,↑dir1);
               IF spix1<>0 THEN
                 IF dir=dir1 THEN GenAssign(↓nonterm,↓spix,↓spix1)
                 ELSE SemErr(2)
                 END
               END
             END;
    | check: IF styp=nt THEN
               GetAt(↓sy,↓n,↑spix1,↑dir1);
               IF spix1<>0 THEN
                 IF spix<>spix1 THEN SemErr(3) END;
                 IF dir<>dir1 THEN SemErr(2) END
               END
             END;
    | def:   NewAt(↓sy,↓spix,↓dir);
             IF styp=pr THEN GenAssign(↓term,↓spix,↓n) END
    END (*CASE*)
  endsem
TERMINALS
  "<"  ">"  ","  ";"
  IDENT<out:spix>
  INSY
  OUTSY
  NUMBER<out:val>
NONTERMINALS
  Attributes<in:sy,styp,kind; out:sem1,sem2>
  InAttr<in:sy,styp,kind; out:sem1,n>          -- n: attribute counter
  OutAttr<in:sy,styp,kind,n; out:sem2,n>
RULES
Attributes<in:sy,styp,kind; out:sem1,sem2> =
  "<"                      sem sem1:=0; sem2:=0 endsem
  ( InAttr<in:sy,styp,kind; out:sem1,n>
    [ ";" OutAttr<in:sy,styp,kind,n; out:sem2,n> ]
  | OutAttr<in:sy,styp,kind,0; out:sem2,n>
  )
  ">"                      sem IF NOT CompleteAt(↓sy,↓n) THEN SemErr(5) END
                           endsem.

InAttr<in:sy,styp,kind; out:sem1,n> =
  INSY                     sem IF styp<>nt THEN SemErr(1) END;
                               dir:=down; n:=0
                           endsem
  ( IDENT<out:spix>        sem (AssignInAt) endsem
  | NUMBER<out:val>        sem (AssignNumber) endsem
  )
  { ","
    ( IDENT<out:spix>      sem (AssignInAt) endsem
    | NUMBER<out:val>      sem (AssignNumber) endsem
    )
  }                        sem IF kind=use THEN EmitAction(↑sem1) END
                           endsem.

OutAttr<in:sy,styp,kind,n; out:sem2,n> =
  OUTSY                    sem dir:=up endsem
  IDENT<out:spix>          sem (AssignOutAt) endsem
  { "," IDENT<out:spix>    sem (AssignOutAt) endsem
  }                        sem IF (kind=use) OR (styp=pr) THEN EmitAction(↑sem2) END
                           endsem.
ENDGRAM
If one of the context conditions is violated, the procedure SemErr(↓n) is
called, which emits an error message depending on n:
n   error message
1   In-attributes for a pragma or terminal
2   Wrong attribute direction
3   Wrong attribute name
4   Formal attribute is a constant
5   Wrong number of attributes
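The following Python sketch models how the macros AssignInAt, AssignNumber and AssignOutAt, together with the final CompleteAt test, treat the attributes of a nonterminal in the contexts check and use. It is a simplification under stated assumptions:

- Terminals, pragmas and the def context are left out.
- A NUMBER is represented by a Python int.
- The direction of each actual attribute is given explicitly, instead of being derived from INSY or OUTSY.
- The function and class names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

DOWN, UP = "down", "up"            # in- and out-attributes


@dataclass
class Result:
    errors: list[int] = field(default_factory=list)   # SemErr codes
    before: list[str] = field(default_factory=list)   # assignments before "parse nt"
    after: list[str] = field(default_factory=list)    # assignments after "parse nt"


def process(kind: str, formal: list[tuple[str, str]],
            actual: list[tuple[str | int, str]]) -> Result:
    """Simplified model of the attribute macros for a nonterminal.

    kind:   'check' (left-hand side) or 'use' (right-hand side)
    formal: declared attributes as (name, direction)
    actual: attributes written in the production as (name or constant, direction)
    """
    res = Result()
    for n, (item, direction) in enumerate(actual, start=1):
        if n > len(formal):        # GetAt would return spix=0
            continue
        fname, fdir = formal[n - 1]
        if isinstance(item, int):  # a NUMBER, cf. AssignNumber
            if kind != "use":
                res.errors.append(4)           # formal attribute is a constant
            elif direction != fdir:
                res.errors.append(2)
            else:
                res.before.append(f"{fname}:={item}")
        elif kind == "check":
            if item != fname:
                res.errors.append(3)           # wrong attribute name
            if direction != fdir:
                res.errors.append(2)           # wrong attribute direction
        elif direction != fdir:
            res.errors.append(2)
        elif item != fname:                    # identical names: no assignment
            if direction == DOWN:
                res.before.append(f"{fname}:={item}")   # formal := actual
            else:
                res.after.append(f"{item}:={fname}")    # actual := formal
    if len(actual) != len(formal):             # CompleteAt fails
        res.errors.append(5)                   # wrong number of attributes
    return res


formal = [("a", DOWN), ("b", DOWN), ("c", UP), ("d", UP)]
r = process("use", formal, [("x", DOWN), (3, DOWN), ("c", UP), ("y", UP)])
print(r.before, r.after, r.errors)   # ['a:=x', 'b:=3'] ['y:=d'] []
```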
8 Applications
8.1 Applications in compiler construction
Attributed grammars are mainly used in compiler construction — more precisely for the description of compilers. However, the description of an actual
compiler is far too complex to be used as an introductory example. Therefore,
in this Section we will use Cocol to develop a lexical analyzer, which is part of
a compiler. This example is general enough to demonstrate all language
constructs of Cocol, and yet simple enough for a reader inexperienced with
attributed grammars to follow it. The application of Coco to an actual compiler
(the compiler description for Coco itself) can be found in Appendix F.
It is unusual to describe and to generate lexical analyzers with attributed
grammars. Normally, they are coded by hand because they must be very efficient
(lexical analysis accounts for the largest share of the compilation time). There are
special scanner generators which are designed to produce fast lexical analyzers. Although Coco is not such a generator, run-time measurements show
that it is possible in both theory and practice to implement lexical analyzers
with Coco.
As an example, we will develop a lexical analyzer for Modula-2. First we
will give a general specification for lexical analyzers. Then we will prepare a
special specification of a lexical analyzer for Modula-2. Next we will describe
and build this lexical analyzer using Cocol. Finally we will explain some of
the problems that can arise. At the end of this section, we will specify the
semantic procedures used in the description of the lexical analyzer.
Chap. 8
Applications
172
8.1.1 Specification of a lexical analyzer
General tasks
A lexical analyzer must at least perform the following tasks:
1. read and optionally print the source program;
2. skip meaningless character sequences such as blanks, comments, etc.;
3. recognize and tokenize terminals such as keywords, names, numbers,
   and operators;
4. report lexical errors.
Usually, a lexical analyzer will recognize only one terminal per call and pass it
to the syntax analyzer. However, there are also analyzers that process the
entire source text at once, and write the symbol codes of the recognized
terminals to an intermediate file so that the syntax analyzer can read them later
on. The lexical analyzer described here is of the latter type.
Tasks of a lexical analyzer for Modula-2
A lexical analyzer for Modula-2 must recognize the following terminals:
Keywords
  AND          ELSIF           LOOP         REPEAT
  ARRAY        END             MOD          RETURN
  BEGIN        EXIT            MODULE       SET
  BY           EXPORT          NOT          THEN
  CASE         FOR             OF           TO
  CONST        FROM            OR           TYPE
  DEFINITION   IF              POINTER      UNTIL
  DIV          IMPLEMENTATION  PROCEDURE    VAR
  DO           IMPORT          QUALIFIED    WHILE
  ELSE         IN              RECORD       WITH
Names
  Identifier  = Letter {Letter | Digit}.
  Letter      = "A" | ... | "Z" | "a" | ... | "z".
  Digit       = "0" | ... | "9".
Decimal constants
  DecNumber   = Digit {Digit}.
Hexadecimal constants
  HexNumber   = Digit {HexDigit} "H".
  HexDigit    = Digit | "A" | "B" | "C" | "D" | "E" | "F".
Octal constants
  OctalNumber = OctalDigit {OctalDigit} "B".
Sec. 8.1
Applications in compiler construction
  OctalDigit  = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7".
Real constants
  RealNumber  = Digit {Digit} "." {Digit} ["E" ["+" | "-"] Digit [Digit]].
Character constants
  CharConst   = '"' any '"' | "'" any "'" | OctalDigit {OctalDigit} "C".
Character strings
  String      = '"' {any} '"' | "'" {any} "'".
Comments
  Comment     = "(*" {Comment | any} "*)".
Operators and separators
  +    addition                  >=    greater than or equal
  -    subtraction               ( )   round parentheses
  *    multiplication            [ ]   index parentheses
  /    real division             { }   set brackets
  :=   assignment                ^     pointer
  &    logical and               ,     comma
  =    equal                     .     period
  #    not equal                 ;     semicolon
  <>   not equal                 :     colon
  <    less than                 ..    range operator
  >    greater than              |     variant operator
  <=   less than or equal

Context conditions
1. Decimal, hexadecimal, or octal constants must be in the range 0 to 65535.
2. The numerical value of character constants must be in the range 0 to 255.
3. Real constants must be in the range 1.4694E-39 to 1.7014E+38.
4. Character strings must not extend over line boundaries.
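The four context conditions translate directly into predicates. The Python sketch below states them as functions that a hypothetical semantic procedure could call after a constant has been converted. The function names are illustrative. Accepting the real literal 0.0 is an assumption that the text does not state.

```python
MAX_CARDINAL = 65535               # decimal, hexadecimal, and octal constants
MAX_CHAR = 255                     # character constants such as 101C
REAL_MIN, REAL_MAX = 1.4694e-39, 1.7014e38


def integer_ok(value: int) -> bool:
    return 0 <= value <= MAX_CARDINAL


def char_ok(value: int) -> bool:
    return 0 <= value <= MAX_CHAR


def real_ok(value: float) -> bool:
    # Assumption: the literal 0.0 is accepted although it lies below REAL_MIN.
    return value == 0.0 or REAL_MIN <= value <= REAL_MAX


def string_ok(text: str) -> bool:
    return "\n" not in text        # strings must not extend over line boundaries


print(integer_ok(int("177777", 8)), char_ok(0o101), real_ok(1e-40))
```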
8.1.2 Description of a lexical analyzer for Modula-2
In the previous section, we described the lexical structure of Modula-2 by a
context-free grammar. Now we will have to attribute it. The following points
need special attention.
The lexical analyzer supplies the terminals for syntax analysis. These are
the nonterminals of the lexical analyzer, whereas the terminals of the lexical
analyzer are the characters of the source text. These characters must be
supplied by a mini-scanner with the following tasks:
1. read and print the source program;
2. supply the characters of the source text as terminals;
3. treat the character sequences '..', '(*', and '*)' as special terminals (to
   simplify the attributed grammar).
This still leaves enough work for the lexical analyzer proper. In accordance
with Section 6.4.2, we will implement the mini-scanner in the procedure
GetSy of the module Scannerlex. The mini-scanner is so simple that we
refrain from describing it further.
Now we will specify the lexical analyzer of Modula-2 with Cocol.
GRAMMAR Scanner

SEMANTIC DECLARATIONS
  FROM Conversions IMPORT Convert, ConvertReal;
  FROM Errors      IMPORT SemErr;
  FROM ListMod     IMPORT EnterString, Hash;
  FROM Scannerlex  IMPORT typ, line, col;
  FROM OutMod      IMPORT Symboltype, Emit, EmitConstant,
                          EmitIdent, EmitString;

  -- TYPE Symboltype =                                  (*token codes*)
  --   (eofsy, andsy, divsy, timessy, slashsy, modsy, notsy, plussy,
  --    minussy, orsy, eqlsy, neqsy, grtsy, geqsy, lsssy, leqsy, insy,
  --    lparsy, rparsy, lbracksy, rbracksy, lconbrsy, rconbrsy, commasy,
  --    semicolonsy, periodsy, colonsy, rangesy, constsy, typesy, varsy,
  --    arraysy, recordsy, variantsy, setsy, pointersy, tosy, arrowsy,
  --    importsy, exportsy, fromsy, qualifiedsy, beginsy, casesy, ofsy,
  --    ifsy, thensy, elsifsy, elsesy, loopsy, exitsy, repeatsy,
  --    untilsy, whilesy, dosy, withsy, forsy, bysy, returnsy,
  --    becomessy, endsy, callsy, definitionsy, implementationsy,
  --    proceduresy, modulesy, ident, cardcon, intcardcon, realcon,
  --    charcon, stringcon, eolsy);

  CONST
    blmax = 80;    -- buffer length (every token must fit on an 80 character line)

  VAR
    addr:    CARDINAL;                  -- string address in string list
    b:       ARRAY [1..blmax] OF CHAR;  -- buffer
    bl:      CARDINAL;                  -- buffer length
    ch:      CHAR;                      -- auxiliary
    firstch: CHAR;                      -- first character in a string
    i:       CARDINAL;                  -- auxiliary
    length:  CARDINAL;                  -- string length
    rval:    REAL;                      -- value of real-constant
    spix:    CARDINAL;                  -- spelling index of identifier
    sy:      Symboltype;                -- token code
    symcol:  CARDINAL;                  -- symbol column
    val:     CARDINAL;                  -- constant value

MACROS
  sem :AddCh:
    bl:=bl+1; b[bl]:=ch   -- it is supposed that lines are not longer than 80 characters
  endsem
TERMINALS
  -- [list not recoverable from the scan: one terminal per character code
  --  delivered by GetSy, written either as a name (letters such as A, B, x;
  --  control codes such as chr9, chr10, chr127) or as a quoted character,
  --  plus the special terminals "..", "(*", and "*)"]
NONTERMINALS
  Scanner
  Symbol
  Identifier <out:sy,spix,symcol>
  Number     <out:sy,val,symcol>
  String     <out:sy,addr,length,firstch,symcol>
  Comment
  Letter     <out:ch>
  Digit      <out:ch>
  HexDigit   <out:ch>
RULES
Scanner =
  {Symbol}                               sem Emit(↓eofsy,↓col) endsem.

Symbol =
  {" "}                                  -- skip blanks
  ( Identifier<out:sy,spix,symcol>       sem IF sy=ident
                                             THEN EmitIdent(↓spix,↓symcol)  -- ident.
                                             ELSE Emit(↓sy,↓symcol)         -- keyword
                                             END
                                         endsem
  | Number<out:sy,val,symcol>            sem EmitConstant(↓sy,↓val,↓symcol)
                                         endsem  -- cardcon, intcardcon, realcon, charcon
  | String<out:sy,addr,length,firstch,symcol>
                                         sem IF sy=stringcon
                                             THEN EmitString(↓addr,↓length,↓symcol)
                                             ELSE EmitConstant(↓charcon,↓ORD(firstch),↓symcol)
                                             END
                                         endsem
  | Comment
  | ";"    sem Emit(↓semicolonsy,↓col) endsem
  | "="    sem Emit(↓eqlsy,↓col) endsem
  | "("    sem Emit(↓lparsy,↓col) endsem
  | ")"    sem Emit(↓rparsy,↓col) endsem
  | "["    sem Emit(↓lbracksy,↓col) endsem
  | "]"    sem Emit(↓rbracksy,↓col) endsem
  | "{"    sem Emit(↓lconbrsy,↓col) endsem
  | "}"    sem Emit(↓rconbrsy,↓col) endsem
  | "*"    sem Emit(↓timessy,↓col) endsem
  | ","    sem Emit(↓commasy,↓col) endsem
  | "/"    sem Emit(↓slashsy,↓col) endsem
  | "+"    sem Emit(↓plussy,↓col) endsem
  | "-"    sem Emit(↓minussy,↓col) endsem
  | "^"    sem Emit(↓arrowsy,↓col) endsem
  | "|"    sem Emit(↓variantsy,↓col) endsem
  | "#"    sem Emit(↓neqsy,↓col) endsem
  | "&"    sem Emit(↓andsy,↓col) endsem
  | "."    sem Emit(↓periodsy,↓col) endsem
  | ".."   sem Emit(↓rangesy,↓col) endsem
  | EOL    sem Emit(↓eolsy,↓col) endsem    -- end-of-line character (terminal name assumed)
  | ":" ( "="  sem Emit(↓becomessy,↓col) endsem
        | eps  sem Emit(↓colonsy,↓col) endsem
        )
  | "<" ( ">"  sem Emit(↓neqsy,↓col) endsem
        | "="  sem Emit(↓leqsy,↓col) endsem
        | eps  sem Emit(↓lsssy,↓col) endsem
        )
  | ">" ( "="  sem Emit(↓geqsy,↓col) endsem
        | eps  sem Emit(↓grtsy,↓col) endsem
        )
  ).
Identifier<out:sy,spix,symcol> =
                           sem symcol:=col; bl:=0 endsem
  Letter<out:ch>           sem (AddCh) endsem
  { Letter<out:ch>         sem (AddCh) endsem
  | Digit<out:ch>          sem (AddCh) endsem
  }                        sem Hash(↓b,↓bl,↑sy,↑spix)  -- sy is identifier or keyword
                           endsem.
Number<out:sy,val,symcol> =
                           sem symcol:=col; bl:=0 endsem
  Digit<out:ch>            sem (AddCh) endsem
  { HexDigit<out:ch>       sem (AddCh) endsem
  }
  ( H                      sem bl:=bl+1; b[bl]:=CHR(typ);
                               Convert(↓b,↓bl,↑sy,↑val)
                           endsem
  | "."                    sem bl:=bl+1; b[bl]:=CHR(typ) endsem
    { Digit<out:ch>        sem (AddCh) endsem
    }
    [ E                    sem bl:=bl+1; b[bl]:=CHR(typ) endsem
      [ ( "+" | "-" )      sem bl:=bl+1; b[bl]:=CHR(typ) endsem
      ]
      Digit<out:ch>        sem (AddCh) endsem
      [ Digit<out:ch>      sem (AddCh) endsem
      ]
    ]                      sem ConvertReal(↓b,↓bl,↑rval);
                               sy:=realcon; val:=CARDINAL(rval)
                           endsem
  | eps                    sem Convert(↓b,↓bl,↑sy,↑val) endsem
  ).
String<out:sy,addr,length,firstch,symcol> =
                           sem symcol:=col; bl:=0 endsem
  ( '"'
    { ".."                 sem b[bl+1]:="."; b[bl+2]:="."; bl:=bl+2 endsem
    | "(*"                 sem b[bl+1]:="("; b[bl+2]:="*"; bl:=bl+2 endsem
    | "*)"                 sem b[bl+1]:="*"; b[bl+2]:=")"; bl:=bl+2 endsem
    | EOL                  sem SemError(↓1,↓line,↓col); bl:=0 endsem
    | any                  sem bl:=bl+1; b[bl]:=CHR(typ) endsem
    }
    '"'
  | "'"
    { ".."                 sem b[bl+1]:="."; b[bl+2]:="."; bl:=bl+2 endsem
    | "(*"                 sem b[bl+1]:="("; b[bl+2]:="*"; bl:=bl+2 endsem
    | "*)"                 sem b[bl+1]:="*"; b[bl+2]:=")"; bl:=bl+2 endsem
    | EOL                  sem SemError(↓1,↓line,↓col); bl:=0 endsem
    | any                  sem bl:=bl+1; b[bl]:=CHR(typ) endsem
    }
    "'"
  )                        sem length:=bl;
                               IF length=1
                               THEN sy:=charcon; firstch:=b[1]
                               ELSE sy:=stringcon;
                                    EnterString(↓b,↓bl,↑addr)
                               END
                           endsem.
  -- EOL: end-of-line character (terminal name assumed)
Comment = "(*" { Comment | any } "*)".

Letter<out:ch> =
  ( A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z|
    a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z )
                           sem ch:=CHR(typ) endsem.

Digit<out:ch> =
  ( "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" )
                           sem ch:=CHR(typ) endsem.

HexDigit<out:ch> =
    Digit<out:ch>
  | ( A|B|C|D|E|F )        sem ch:=CHR(typ) endsem.
ENDGRAM
The rules for Number and String need some explanation:
Numerical constants cannot be converted while they are being recognized
because decimal, hexadecimal, octal, and real constants can be distinguished
only by their last character or by a decimal point. Their text must therefore be
stored and converted later.
Strings also have their peculiarities. Our mini-scanner returns the
character sequences '..', '(*', and '*)' as single terminals. If one of these
sequences appears within a string, it has to be expanded again, since strings
must be stored in their original form. Therefore, the rule for strings gets more
complicated than expected.
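What the String rule does with its buffer can be traced with a small Python model. The sketch makes three assumptions:

- It uses the token numbering of Scannerlex given in Section 8.1.3: 1 = '..', 2 = '(*', 3 = '*)', and otherwise the ASCII value.
- It leaves out the end-of-line check.
- The function name is hypothetical.

```python
# Token numbers of the mini-scanner for the three special terminals
SPECIAL = {1: "..", 2: "(*", 3: "*)"}


def collect_string(typs: list, quote: str) -> tuple:
    """Rebuild a string from the token numbers that follow its opening quote.

    Returns (token code, text): a one-character string is a character
    constant (charcon), anything else a string constant (stringcon).
    """
    buffer = []
    for typ in typs:
        if typ == ord(quote):                       # closing quote ends the string
            break
        buffer.append(SPECIAL.get(typ, chr(typ)))   # expand '..', '(*', '*)'
    else:
        raise ValueError("string is not terminated")
    text = "".join(buffer)
    return ("charcon" if len(text) == 1 else "stringcon"), text


print(collect_string([ord("a"), 1, ord("b"), ord('"')], '"'))  # ('stringcon', 'a..b')
```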
On the other hand, the description of strings and comments with the
symbol any looks very simple and elegant. In accordance with Section 5.2.1,
any represents all those terminals that cannot be recognized in its place at
this point in the grammar (in String: all terminals except '..', '(*', '*)',
and '"' (or "'"); in Comment: all terminals except '(*' and '*)'). The
example also shows the semantic processing of any. In a string, the symbol
recognized by any is processed using the global variable typ (see Section
6.4.2).
The reason for the introduction of the terminals '..', '(*', and '*)' is not
obvious and requires an explanation: the symbol '..' is necessary because
otherwise a lookahead of 2 characters would be needed (the first period in the
sequence '1..2' may be a decimal point or the start of a range operator).
Although comments can be processed with a single lookahead character, it
simplifies the processing of comments considerably if we treat the sequences
'(*' and '*)' as single terminals.
LL(1) Conflicts
As shown by Example 8.1, it is often difficult to avoid LL(1) conflicts when
lexical structures are described by an attributed grammar:
8.1 Example  LL(1) conflicts in lexical structures
  Scanner = {Symbol}.
  Symbol  = ... | ">" ["="] | "=" | ... .
This situation represents an LL(1) conflict because if '>' is read and the
next character is '=', the syntax analyzer cannot decide whether this
character belongs to the symbol '>=' or whether it constitutes a separate
symbol '='. Such conflicts also appear in the symbols ':=', '<>', '<=',
Identifier, and Number. However, they are not critical since the syntax
analyzer always chooses the first alternative it encounters during analysis. In the example above, this means that '=' is correctly considered part
of the symbol '>=' rather than being recognized as a separate symbol.
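The effect of this rule can be imitated by a greedy scan. The Python sketch below is not generated by Coco. For the characters '<', '>' and '=', it reproduces the choice the parser makes: when a character can extend the current symbol, it takes the alternative that extends it. The function name is hypothetical.

```python
def relational_symbols(text: str) -> list:
    """Split '<', '>', and '=' characters the way the Symbol rule resolves the
    LL(1) conflict: a character that can extend the current symbol is
    always taken as part of it (the first alternative wins)."""
    symbols = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in (">=", "<=", "<>"):
            symbols.append(pair)
            i += 2
        else:
            symbols.append(text[i])
            i += 1
    return symbols


print(relational_symbols(">=="))   # ['>=', '=']
```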
Speed
A lexical analyzer implemented with Coco runs at approximately one-half the
speed of a hand-coded analyzer. A 35% speed gain can be achieved if the
nonterminals Letter and Digit with their many alternatives are already
recognized as terminal classes by the mini-scanner.
Assessment
The example has shown how easily a translation process can be described with
Cocol. At first glance, the grammar may seem a bit confusing. Yet, as
soon as one becomes familiar with this notation, the following advantages can
be observed:
1. The grammar is short and precise. For the recognition of a symbol, it is
   sufficient to write its name without any additional actions.
2. The syntax is clearly separated from the semantics. Thus the syntax is
   more explicit than it is in a hand-coded compiler.
3. From the syntax declarations, one can see immediately which terminals
   and nonterminals are in the language.
4. Error-handling actions need not be described explicitly.
5. Many constructs, like nested comments, can be described with any in a
   straightforward and elegant way which is hard to surpass.
Of course, there are some parts of the grammar which are not very simple to
read, e.g. the production for Number. It has a rather complex structure, but
this only shows that Cocol can also handle difficult constructs. After all, the
production for Number describes four different kinds of numerical constants.
This would be difficult to read in a hand-coded lexical analyzer, too, and could
hardly be written in this short and concise form using a conventional programming language.
8.1.3 Semantic procedures for lexical analysis
We decompose the semantic procedures of the attributed grammar into four
modules Scannerlex, OutMod, ListMod, and Conversions and specify
their definition modules, but omit their implementation modules due to space
limits.
DEFINITION MODULE Scannerlex;
  VAR typ, col, line: CARDINAL;      (*information about the current token*)
      at: ARRAY [1..10] OF CHAR;     (*not needed here*)
  PROCEDURE GetSy;
END Scannerlex.
Scannerlex reads and prints a source text and returns every single character as
a separate token. The token number as well as its column and its line number
are returned by GetSy in the global variables typ, col, and line. The token
numbers are the ASCII values of the source characters (exceptions: eofch=0,
'..'=1, '(*'=2, and '*)'=3). After the last character in the source text is read,
GetSy always returns eofch.
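The behaviour of GetSy can be pictured with a Python model that returns all tokens of a source text at once. It is a sketch under the following assumptions:

- Columns are counted from 1.
- Line ends are not delivered as tokens. The real mini-scanner has to pass them on, since the grammar emits eolsy.
- Printing the source is omitted.
- The position attached to eofch is arbitrary.
- The function name is hypothetical.

```python
EOFCH, RANGE, OPEN_COMMENT, CLOSE_COMMENT = 0, 1, 2, 3
SPECIAL = {"..": RANGE, "(*": OPEN_COMMENT, "*)": CLOSE_COMMENT}


def get_symbols(source: str) -> list:
    """Return the (typ, col, line) triples that successive GetSy calls would
    deliver, ending with a single eofch entry (hypothetical model)."""
    tokens = []
    lines = source.split("\n")
    for line_no, text in enumerate(lines, start=1):
        col = 1
        while col <= len(text):
            pair = text[col - 1:col + 1]
            if pair in SPECIAL:                 # '..', '(*', '*)' are single tokens
                tokens.append((SPECIAL[pair], col, line_no))
                col += 2
            else:                               # any other character: its ASCII value
                tokens.append((ord(text[col - 1]), col, line_no))
                col += 1
    tokens.append((EOFCH, 1, len(lines) + 1))
    return tokens


print([typ for typ, _, _ in get_symbols("1..2")])   # [49, 1, 50, 0]
```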
DEFINITION MODULE OutMod;
  TYPE Symboltype =                                  (*token codes*)
    (eofsy, andsy, divsy, timessy, slashsy, modsy, notsy, plussy,
     minussy, orsy, eqlsy, neqsy, grtsy, geqsy, lsssy, leqsy, insy,
     lparsy, rparsy, lbracksy, rbracksy, lconbrsy, rconbrsy, commasy,
     semicolonsy, periodsy, colonsy, rangesy, constsy, typesy, varsy,
     arraysy, recordsy, variantsy, setsy, pointersy, tosy, arrowsy,
     importsy, exportsy, fromsy, qualifiedsy, beginsy, casesy, ofsy,
     ifsy, thensy, elsifsy, elsesy, loopsy, exitsy, repeatsy,
     untilsy, whilesy, dosy, withsy, forsy, bysy, returnsy,
     becomessy, endsy, callsy, definitionsy, implementationsy,
     proceduresy, modulesy, ident, cardcon, intcardcon, realcon,
     charcon, stringcon, eolsy);
  PROCEDURE Emit(sy:Symboltype; col:CARDINAL);
  PROCEDURE EmitConstant(sy:Symboltype; val,col:CARDINAL);
  PROCEDURE EmitIdent(spix,col:CARDINAL);
  PROCEDURE EmitString(addr,len,col:CARDINAL);
END OutMod.
The module OutMod contains procedures to write symbols to an intermediate
language file.
Emit writes a symbol without attributes (e.g. a keyword, an operator or a
single character) to the intermediate language. It emits a word which contains
the symbol type sy and the column col of that symbol.
EmitConstant writes a numeric constant to the intermediate language. It
emits two words, the first of which contains the type sy and the column col
of the symbol and the second the constant value val.
EmitIdent writes a name to the intermediate language. It emits two
words, the first of which contains the symbol type 'ident' and the column col
and the second the spelling index (spix) of the name.
EmitString writes a string to the intermediate language. It emits three
words, the first of which contains the symbol type 'stringcon' and the column
col, the second the string address addr and the third the string length len.
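A possible encoding of the intermediate language is sketched below in Python. The text fixes only how many words each procedure emits. Three things are assumptions made for illustration:

- Symbol type and column are packed into one 16-bit word, with the type in the high byte and the column in the low byte.
- The output file is represented as a list.
- The class and method names are hypothetical.

```python
from enum import IntEnum

# Symboltype in declaration order, so that eofsy = 0
Symboltype = IntEnum("Symboltype", """eofsy andsy divsy timessy slashsy modsy
    notsy plussy minussy orsy eqlsy neqsy grtsy geqsy lsssy leqsy insy lparsy
    rparsy lbracksy rbracksy lconbrsy rconbrsy commasy semicolonsy periodsy
    colonsy rangesy constsy typesy varsy arraysy recordsy variantsy setsy
    pointersy tosy arrowsy importsy exportsy fromsy qualifiedsy beginsy casesy
    ofsy ifsy thensy elsifsy elsesy loopsy exitsy repeatsy untilsy whilesy
    dosy withsy forsy bysy returnsy becomessy endsy callsy definitionsy
    implementationsy proceduresy modulesy ident cardcon intcardcon realcon
    charcon stringcon eolsy""", start=0)


class OutFile:
    """Collects 16-bit words; assumed layout: symbol high byte, column low byte."""

    def __init__(self) -> None:
        self.words: list = []

    def _head(self, sy: Symboltype, col: int) -> None:
        if not 0 <= col <= 255:
            raise ValueError("column does not fit into one byte")
        self.words.append((int(sy) << 8) | col)

    def emit(self, sy: Symboltype, col: int) -> None:          # one word
        self._head(sy, col)

    def emit_constant(self, sy: Symboltype, val: int, col: int) -> None:
        self._head(sy, col)                                      # two words
        self.words.append(val & 0xFFFF)

    def emit_ident(self, spix: int, col: int) -> None:           # two words
        self._head(Symboltype.ident, col)
        self.words.append(spix)

    def emit_string(self, addr: int, length: int, col: int) -> None:
        self._head(Symboltype.stringcon, col)                    # three words
        self.words.extend([addr, length])


out = OutFile()
out.emit(Symboltype.whilesy, 1)
out.emit_ident(17, 7)
print(len(out.words))   # 3
```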
DEFINITION MODULE ListMod;
  FROM OutMod IMPORT Symboltype;
  PROCEDURE EnterString(buffer:ARRAY OF CHAR; len:CARDINAL;
                        VAR addr:CARDINAL);
  PROCEDURE Hash(buffer:ARRAY OF CHAR; len:CARDINAL;
                 VAR sy:Symboltype; VAR spix:CARDINAL);
END ListMod.
ListMod handles the name list and the string list of the scanner. EnterString
enters a string (stored in buffer[1..len]) into the string list and returns its
address addr. Hash searches a name (stored in buffer[1..len]) in the name
list. If it is not found, it is entered. For keywords, Hash returns the token code of
the keyword and spix is 0. Otherwise Hash returns the token code 'ident'
and spix is the address (spelling index) of the name in the name list.
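The contract of Hash can be illustrated with a Python model. It is only a sketch:

- A Python dictionary stands in for whatever hash table ListMod really uses.
- Names are stored one after another in a character list, so the spelling index is the 1-based position of a name's first character.
- Keyword token codes are formed by the naming pattern of Symboltype, so WHILE gives whilesy.
- The class name NameList is hypothetical.

```python
KEYWORDS = frozenset("""AND ARRAY BEGIN BY CASE CONST DEFINITION DIV DO ELSE
    ELSIF END EXIT EXPORT FOR FROM IF IMPLEMENTATION IMPORT IN LOOP MOD MODULE
    NOT OF OR POINTER PROCEDURE QUALIFIED RECORD REPEAT RETURN SET THEN TO TYPE
    UNTIL VAR WHILE WITH""".split())


class NameList:
    """Hypothetical model of the name list behind ListMod.Hash."""

    def __init__(self) -> None:
        self.chars: list = []      # all names stored one after another
        self.spix_of: dict = {}    # name -> spelling index (a Python hash table)

    def hash(self, buffer: str, length: int) -> tuple:
        name = buffer[:length]     # buffer[1..len] in Modula-2 terms
        if name in KEYWORDS:
            return name.lower() + "sy", 0      # e.g. WHILE -> whilesy, spix = 0
        if name not in self.spix_of:           # not found: enter it
            self.spix_of[name] = len(self.chars) + 1
            self.chars.extend(name)
        return "ident", self.spix_of[name]


names = NameList()
print(names.hash("WHILE", 5), names.hash("count", 5))   # ('whilesy', 0) ('ident', 1)
```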
DEFINITION MODULE Conversions;
  FROM OutMod IMPORT Symboltype;
  PROCEDURE Convert(buffer:ARRAY OF CHAR; len:CARDINAL;
                    VAR sy:Symboltype; VAR val:CARDINAL);
  PROCEDURE ConvertReal(buffer:ARRAY OF CHAR; len:CARDINAL;
                        VAR rval:REAL);
END Conversions.
The module Conversions converts digit strings to cardinal or real numbers.
The procedure Convert converts a digit string (stored in buffer[1..len])
to a numeric constant or a character constant. The digit string may have the
following syntax:
  digitstring = digit {digit}                   -- decimal constant
              | digit {hexdigit} 'H'            -- hex constant
              | octaldigit {octaldigit} 'B'     -- octal constant
              | octaldigit {octaldigit} 'C'.    -- character constant
For numeric constants the output parameter sy is cardcon and val is in the
range 0..65535; for character constants sy is charcon and val is in the range
0..255.
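The case distinction that Convert has to make, based only on the last character, can be sketched in Python. This is a hypothetical model, not the book's module. Three details are assumptions:

- The buffer is a Python string.
- Token codes are returned as strings.
- An invalid digit or a value out of range raises ValueError. Python's int() already rejects a digit such as 8 in an octal constant.

```python
def convert(buffer: str, length: int) -> tuple:
    """Hypothetical model of Conversions.Convert for buffer[1..len]."""
    text = buffer[:length]
    suffix, digits = text[-1], text[:-1]
    if suffix == "H":
        sy, val = "cardcon", int(digits, 16)   # digit {hexdigit} 'H'
    elif suffix == "B":
        sy, val = "cardcon", int(digits, 8)    # octaldigit {octaldigit} 'B'
    elif suffix == "C":
        sy, val = "charcon", int(digits, 8)    # octaldigit {octaldigit} 'C'
    else:
        sy, val = "cardcon", int(text, 10)     # digit {digit}
    limit = 255 if sy == "charcon" else 65535
    if val > limit:
        raise ValueError(f"{text}: value out of range 0..{limit}")
    return sy, val


print(convert("177777B", 7), convert("0FFH", 4), convert("101C", 4))
# ('cardcon', 65535) ('cardcon', 255) ('charcon', 65)
```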
ConvertReal converts a digit string (stored in buffer[1..len]) to its real
value rval. The syntax of the digit string is
  digitstring = digit {digit} '.' {digit} ['E' ['+'|'-'] digit [digit]].
8.2 Applications in software engineering
An attributed grammar as a description method and a compiler compiler as an
implementation tool are not limited to compiler construction. They can also be
useful in other fields of software engineering.
The reason why compiler construction techniques can be generally used
in software engineering is that most large programs have the following characteristics:
1. Input streams are sufficiently complex to be described in terms of syntax
   and semantics.
2. The structure of the input text often determines the logical structure of the
   entire program or of a large portion of it.
This wide field of applications is remarkable. We will now show that the
well-known Jackson method of program design can be regarded as a special case of
program design with attributed grammars. With this in mind, in this section
the compiler description language is emphasized while the compiler compiler
stays in the background.
8.2.1 Attributed grammars as a software design method
The use of attributed grammars automatically leads to a two-step design process: In the first step (coarse design) the problem is decomposed into its syntactical and semantical parts. Here, the attributed grammar serves as a design
method. In the second step (refined design) the semantic procedures are designed from their specifications in the coarse design.
The creation of the coarse design consists of the following steps, which
may be executed sequentially or iteratively:
Sec. 8.2
Applications in software engineering
183
1. Write the grammar. The syntactic structure of the input text is described
   by a context-free grammar.
2. Find attributes. Starting from the meaning of each syntax symbol, one
   tries to find out which (semantic) attributes should be attached to it. Then
   one defines these attributes and their occurrences in the grammar rules.
   With some experience and a proper understanding of the problem the
   right choice is almost automatic. This step is therefore also a good check
   on correct understanding of the problem.
3. Prepare context conditions. Further attributes may be necessary
   for this step.
4. Define semantic procedures. In this step, all procedures which are used
   in semantic actions are defined. The refinement of semantic actions into
   code and procedure calls may again be done in a coarse or fine manner.
   Using the former approach, one may associate a special semantic procedure
   with each semantic action; using the latter approach, one may describe
   each semantic action in terms of elementary operations of a programming
   language without calling semantic procedures. Since many of the semantic
   procedures are usually access procedures to data structures, they support a
   modular design in the form of data capsules. The collection of all
   procedures shows which operations can be performed with the various
   data structures and which relations exist between the data structures.
5. Set up the attributed grammar. One combines the context-free grammar,
   the attributes, the semantic actions, and any context conditions into a
   proper attributed grammar.
After these five steps, the coarse design is completed and the following has
been accomplished:
1. The problem has been decomposed into three parts: syntax, context
   conditions, and semantic actions.
2. The attributes and the data structures derived from them are the terms in
   which the problem solution can be appropriately described.
3. The access routines to the data structures and all other algorithms required
   for the solution are defined by the semantic procedures.
This completes the design method with attributed grammars. The result is sufficiently abstract to fix only the essential semantic design decisions but to leave
enough freedom to the implementor. On the other hand it is sufficiently concrete to specify explicitly those details that should not be left to the decision of
the implementor.
The result of the coarse design, consisting of a system of attributes, semantic procedures, and an attributed grammar, can be viewed as the specification for the refined design, since it describes what is to be done but not how it
should be done.
The next step is the refined design which may now exclusively concentrate on the semantic procedures without having to consider any syntactic
problems.
However, coarse design and refined design may influence each other.
After the definition of the attributes, one may find that the semantic procedures
are either too abstract or too concrete, too complex or too simple. For example, too many access procedures to the data structures of a module may indicate that it would have been better to add a lower level of abstraction, and to
divide the large module into several smaller ones. The concise and formal notation of attributed grammars encourages one to try several approaches and to
check their consequences without much effort, even when the task is large.
The refined design is followed by the implementation. Only a lexical analyzer has to be written here; the rest is done by the compiler compiler.
8.2.2 The telegram problem as an example
Henderson and Snowdon [1972] presented the following problem, which is
known as the 'telegram problem':
A stream of telegrams is to be processed. Each telegram is terminated by
the string 'ZZZZ'. The telegram stream is terminated when an empty telegram followed by 'ZZZZ' arrives. The words in a telegram are to be
counted. Long words with more than 12 characters are to be counted separately. After each telegram, the counter values are to be printed. The telegrams are read and subsequently printed in lines of 100-120 characters.
Superfluous blanks are to be eliminated. The maximum word length is 20
characters. Longer words are to cause the program to stop.
Since the input consists of structured data, and its structure will significantly
determine the algorithm, this task is well suited for the application of attributed
grammars, and a subsequent implementation with a compiler compiler.
The design steps for the solution of the telegram problem are:
1.
Setup the grammar of the input data
Terminals:
textword
endword
a word in a telegram
end word (= ZZZZ)
Sec. 8.2
Applications in software engineering
Nonterminals:
TelegramSt ream
the total telegram stream
Text Telegram
EmptyTelegram
Context-free grammar:
TelegramStream
TextTelegram
EmptyTelegram
Define attributes.
result:
WwW
array
n
1
integer
integer
185
a text telegram (including its end word)
an empty telegram containing only the end word
=
=
{TextTelegram} EmptyTelegram.
textword {textword} endword.
endword.
From the specification of the task, three attributes
of char
the text of a word
the number of words in a telegram
the number of long words in a telegram
Assign attributes to the grammar symbols. In this step, we list the grammar symbols and attach attributes to them.
textwordt,
recognizes a word and provides its text w.
Text Telegram, ft)
recognizes and prints a telegram with n
words, of which / words are long.
The remaining gramrnar symbols have no attributes.
Note that the attributed symbols are viewed from an algorithmic
point (i.e. we do not say 'TextTelegram represents a telegram’, but
rather 'TextTelegram recognizes a telegram’). The verbal description of
the attributed symbols should specify all attributes of the symbol. It
should be accurate enough to be used as a specification of the translation
process. This is usually possible and easy to accomplish since the few
items involved have already been previously defined.
4. Define semantic procedures. The actions the program must execute can be seen from the problem description:
(a) read the source text, recognize and count the words;
(b) print the source text with a different line length;
(c) print the counter values.
Reading the source text is the task of the lexical analyzer and does not concern us here. The words are counted with the attributes n and l. Therefore, the only candidates for semantic procedures are those which print the text and the counter values. A variable will probably be needed to ensure that the line length does not exceed 120 characters. It will be initialized at the beginning of each telegram, and will be checked and increased when a new word is added to the line. A line buffer may also be needed.
Following the principle of stepwise refinement, we are not yet interested
in the implementation details here. Rather, we define the following three
procedures which will do the whole printing job.
  OutInit             initialize the output of a telegram;
  OutWord(↓w)         print the word w according to the problem definition;
  OutAccount(↓n↓l)    print the counter values n and l with an appropriate text.
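A minimal sketch of such a printing module follows, assuming a line limit of 120 characters and a single blank between words. The class name `TelegramPrinter`, the `lines` list used as the output medium, and the wording of the account line are hypothetical, since the text leaves these implementation details open.

```python
from typing import List

class TelegramPrinter:
    """Hypothetical printing module behind OutInit, OutWord and OutAccount."""

    MAX_LINE = 120  # maximum line length required by the problem

    def __init__(self) -> None:
        self.lines: List[str] = []  # printed output, one string per line
        self.buffer = ""            # the line currently being filled

    def out_init(self) -> None:
        """OutInit: start a new telegram with an empty line buffer."""
        self.buffer = ""

    def out_word(self, w: str) -> None:
        """OutWord(w): append w, starting a new line if 120 characters would be exceeded."""
        if not self.buffer:
            self.buffer = w
        elif len(self.buffer) + 1 + len(w) <= self.MAX_LINE:
            self.buffer += " " + w
        else:
            self.lines.append(self.buffer)
            self.buffer = w

    def out_account(self, n: int, l: int) -> None:
        """OutAccount(n, l): flush the last line and print the counter values."""
        if self.buffer:
            self.lines.append(self.buffer)
            self.buffer = ""
        self.lines.append(f"words: {n}  long words: {l}")
```

The line-length variable mentioned in the text is implicit here in `len(self.buffer)`; a real implementation might keep an explicit counter instead.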
Chap. 8
Applications
5. Write down the attributed grammar. Having completed steps 1 through 4, the attributed grammar is almost self-evident now:
  TelegramStream =
    { sem OutInit endsem
      TextTelegram↑n↑l
      sem OutAccount(↓n↓l) endsem
    } EmptyTelegram.
  TextTelegram↑n↑l =
    textword↑w
      where (|w| <= 20)
      sem n := 1;
          if |w| > 12 then l := 1 else l := 0 end;
          OutWord(↓w)
      endsem
    { textword↑w
        where (|w| <= 20)
        sem n := n+1;
            if |w| > 12 then l := l+1 end;
            OutWord(↓w)
        endsem
    } endword.
  EmptyTelegram = endword.
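The attributed grammar translates almost mechanically into a recursive-descent program. The Python sketch below is one possible hand translation, assuming the input is already a list of (kind, word) tokens from a lexical analyzer. The output procedures are replaced by a hypothetical stub that records the calls in a list, and a violated `where` condition is reported with an exception.

```python
from typing import List, Tuple

Token = Tuple[str, str]          # ("textword" or "endword", text of the word)

def telegram_stream(tokens: List[Token]) -> List[tuple]:
    """TelegramStream = {TextTelegram} EmptyTelegram; returns the output calls made."""
    calls: List[tuple] = []      # stub for OutInit, OutWord and OutAccount
    pos = 0

    def kind() -> str:
        return tokens[pos][0] if pos < len(tokens) else "eof"

    def text_telegram() -> Tuple[int, int]:
        """TextTelegram↑n↑l = textword {textword} endword."""
        nonlocal pos
        n = l = 0                # starting at 0 lets the first word share the loop
        while kind() == "textword":
            w = tokens[pos][1]
            if len(w) > 20:      # where (|w| <= 20)
                raise ValueError(f"word longer than 20 characters: {w}")
            n += 1
            if len(w) > 12:      # a long word
                l += 1
            calls.append(("OutWord", w))
            pos += 1
        if kind() != "endword":
            raise ValueError("endword expected")
        pos += 1
        return n, l

    while kind() == "textword":  # another TextTelegram follows
        calls.append(("OutInit",))
        n, l = text_telegram()
        calls.append(("OutAccount", n, l))
    if kind() != "endword":      # EmptyTelegram = endword.
        raise ValueError("empty telegram expected")
    return calls
```

Merging the first textword with the repeated ones is a simplification of the sketch; the grammar itself keeps them apart so that n and l can be initialized in the first semantic action.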
This completes the coarse design of the telegram problem. Syntax and semantics are clearly separated. Together they provide a clear decomposition of the program, making its structure apparent. The separation shows that the semantic processing, i.e. the essential part, is very simple if there is a printing module with the access procedures OutInit, OutWord, and OutAccount.
A comparison with Henderson and Snowdon's solution shows that in their program, lexical analysis and syntax analysis attract most of the attention in design, program text, and possible design errors. Output and counting are of minor importance and are nearly lost. Their solution avoids the terms syntax and semantics, thus making the problem appear much more complex than it is. In contrast, we focus most of our attention on printing and counting. We consider lexical analysis and syntax analysis as routine matters that do not require special attention.
8.2.3 Attributed grammars as documentation
From the above, it should be obvious that attributed grammars are also well suited for documentation. The system of syntax, attributes, semantic procedures, and the attributed grammar of a software product is its documentation (on the abstraction level of the attributed grammar). The following advantages of this documentation method are evident:
1. The form of the documentation (its structure) is easy to find. It is almost independent of the product to be described, and consists of the parts: terminals, nonterminals, context-free grammar, attributes, attributed grammar symbols, semantic procedures, and attributed grammar (in this order). This arrangement aids standardization.
2. The documentation is formal and therefore precise, complete, and short.
3. The documentation is abstract enough to hide implementation details, but concrete enough to express important conceptual details.
4. Since attributed grammars are a machine-readable documentation, it is unnecessary to separate implementation and documentation, which ensures that the documentation is always up to date.
8.2.4 The Jackson method as a special case
At first glance, the often discussed Jackson method of program design seems to have nothing in common with attributed grammars. Jackson [1975] uses a totally different terminology and describes his method only by examples, in an indirect and unsystematic manner. To find out the essence of Jackson's method, the reader has to consult other literature.
The Jackson method is based on the following three concepts:
1. The structure of an algorithm is derivable from its input and output data.
2. The structure of the input and output data is described by tree diagrams which allow the description of sequences, alternatives, and (unlimited) repetitions.
3. If the structures of the input and output data 'match' in a certain way, the total algorithm for the transformation of the input into the output data can be viewed as an assembly of the transformation algorithms for the individual substructures.
If the structures of the input and output data do not match, the Jackson method fails. However, in the examples in his book, Jackson shows that his method can still be used with the aid of tricks such as 'backtracking', 'program inversion', and some other techniques.
Hughes [1979] looked at the Jackson method from the standpoint of formal languages and summarized the following points:
1. Jackson's tree diagrams describe only regular languages, since they are based only on sequences, alternatives, and unlimited iterations.
2. In addition, it is required that the input data can be deterministically analyzed with a single-character look-ahead.
3. Jackson's requirement of a structural match between input and output data means, in the terminology of formal languages, that there must be a finite automaton that transforms the input into the output.
Jackson's design method can be viewed as a special case of the design method with attributed grammars, in which:
1. the input data is regular and its grammar is LL(1);
2. the output data form a regular language;
3. a certain correspondence exists between input and output language that manifests itself in the fact that a finite automaton can be found that transforms the input into the output.
It is therefore only applicable to a narrow range of tasks that meet these conditions.
It is surprising that this relationship between Jackson's method and the design method with attributed grammars has hardly been recognized. The reason for this may be that Jackson does not distinguish between syntax and semantics (in fact, they are inseparably coupled in his examples), and does not use attributes.
If we describe the examples in Jackson's book with attributed grammars, they become simpler, shorter, and easier to understand. The grammars are simple throughout. We will show this with example 14 of Jackson's book, whose discussion covers 17 pages and is the longest in the entire book.
Problem description. An operating system collects data about its use. These
data are: A record for the start of each session (LOGON), the end of a session
(LOGOFF), the start of a program (PROGSTART), and the end of a program
(PROGEND). At logon time, the user is assigned a unique session number.
The system makes sure that a user can start a session only when his terminal is
free, and cannot terminate a session that he has not initiated. Furthermore, a
user can have only one active program at any given time. He must terminate
an active program before starting a new one.
The collected data is written to a file. The records have the following contents:
  Logon record:       LOGON       session number   start time
  Logoff record:      LOGOFF      session number   stop time
  Progstart record:   PROGSTART   session number   program name   start time
  Progend record:     PROGEND     session number   program name   stop time
The records are stored in strict chronological order. However, it is possible
that records are missing due to erroneous processing. In this case, the data file
contains incomplete information for some sessions and programs: a logon
record without corresponding logoff record, and vice versa; a progstart record
without corresponding progend record, and vice versa.
As a result, the program should produce the following list:
  Number of complete sessions   = nnnn
  Average session length        = ssss
  Number of known sessions      = mmmm
  Number of complete programs   = pppp
  Average program length        = uuuu
  Number of known programs      = qqqq
Grammar. The input consists of four kinds of records. We regard them as terminals: logon, logoff, progstart, and progend, and arrive at the following grammar:
  input = {logon | logoff | progstart | progend}.
It consists of a single rule (for regular languages, there is always a grammar that consists of a single rule). In accordance with the problem description we attach attributes to the terminals:
  session: integer   session number
  prog: name         program name
  time: integer      time of logon, logoff, progstart and progend
and get the attributed grammar symbols
  logon↑session↑time
  logoff↑session↑time
  progstart↑session↑prog↑time
  progend↑session↑prog↑time
Semantics. In the semantic actions, we need variables that hold the results. We call them
  completesessions: integer   number of complete sessions
  knownsessions: integer      number of known sessions
  completeprogs: integer      number of complete programs
  knownprogs: integer         number of known programs
  sessiontime: integer        length of all complete sessions
  progtime: integer           length of all complete programs
It is clear from the above that, when a logon record appears, the session number and the start time must be stored until a logoff record with the same session number is encountered. The same is true for programs. For the time being, we will put the definition of the concrete data structures in the background, and consider only the fact that we need the following access procedures:
  NewSession(↓session↓time)
      Define the start of a session at the specified time.
  DisposeSession(↓session)
      Define the end of a session.
  SessionStarted(↓session): boolean
      Return true if the specified session has been started.
  SessionStartTime(↓session): integer
      Return the start time of the specified session.
  NewProg(↓session↓prog↓time)
      Define the start of the program prog in the specified session at the specified time.
  DisposeProg(↓session↓prog)
      Define the end of the program prog in the specified session.
  ProgStarted(↓session↓prog): boolean
      Return true if prog in session has been started.
  ProgStartTime(↓session↓prog): integer
      Return the start time of prog in session.
  InitStorage
      Initialize the abstract data structure.
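One possible concrete data structure is sketched below in Python, assuming that dictionaries keyed by session number (and by the pair of session number and program name) are acceptable. The class name `SessionStore` and the snake_case method names are hypothetical renderings of the access procedures listed above; the refined design in the text is not committed to this choice.

```python
from typing import Dict, Tuple

class SessionStore:
    """Hypothetical implementation of the abstract data structure."""

    def __init__(self) -> None:                           # InitStorage
        self.sessions: Dict[int, int] = {}                # session -> start time
        self.progs: Dict[Tuple[int, str], int] = {}       # (session, prog) -> start time

    def new_session(self, session: int, time: int) -> None:
        self.sessions[session] = time

    def dispose_session(self, session: int) -> None:
        del self.sessions[session]

    def session_started(self, session: int) -> bool:
        return session in self.sessions

    def session_start_time(self, session: int) -> int:
        return self.sessions[session]

    def new_prog(self, session: int, prog: str, time: int) -> None:
        self.progs[(session, prog)] = time

    def dispose_prog(self, session: int, prog: str) -> None:
        del self.progs[(session, prog)]

    def prog_started(self, session: int, prog: str) -> bool:
        return (session, prog) in self.progs

    def prog_start_time(self, session: int, prog: str) -> int:
        return self.progs[(session, prog)]
```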
Attributed grammar. With only these few facts, which are easily derived with a little thought, the attributed grammar of the problem can be formulated:
  input =
    sem InitStorage;
        completesessions:=0; knownsessions:=0;
        completeprogs:=0; knownprogs:=0;
        sessiontime:=0; progtime:=0
    endsem
    { logon↑session↑time
        sem knownsessions:=knownsessions+1;
            NewSession(↓session↓time)
        endsem
    | progstart↑session↑prog↑time
        sem knownprogs:=knownprogs+1;
            NewProg(↓session↓prog↓time)
        endsem
    | progend↑session↑prog↑time
        sem if ProgStarted(↓session↓prog) then
              completeprogs:=completeprogs+1;
              progtime:=progtime+(time-ProgStartTime(↓session↓prog));
              DisposeProg(↓session↓prog)
            else knownprogs:=knownprogs+1
            end
        endsem
    | logoff↑session↑time
        sem if SessionStarted(↓session) then
              completesessions:=completesessions+1;
              sessiontime:=sessiontime+(time-SessionStartTime(↓session));
              DisposeSession(↓session)
            else knownsessions:=knownsessions+1
            end
        endsem
    }
    sem Write(↓completesessions);
        Write(↓sessiontime/completesessions);
        Write(↓knownsessions);
        Write(↓completeprogs);
        Write(↓progtime/completeprogs);
        Write(↓knownprogs)
    endsem.
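Because the grammar is a single iteration over four alternatives, the whole attributed grammar reduces to one loop with a case distinction. The Python sketch below assumes the records have already been turned into tuples such as `("logon", session, time)` or `("progend", session, prog, time)`. Plain dictionaries stand in for the access procedures, and the guard against dividing by zero when no session or program is complete is an addition that the grammar above does not contain.

```python
from typing import Dict, Iterable, Tuple

def session_statistics(records: Iterable[tuple]) -> Dict[str, float]:
    sessions: Dict[int, int] = {}              # NewSession / SessionStartTime
    progs: Dict[Tuple[int, str], int] = {}     # NewProg / ProgStartTime
    complete_sessions = known_sessions = 0
    complete_progs = known_progs = 0
    session_time = prog_time = 0

    for rec in records:
        kind = rec[0]
        if kind == "logon":
            _, session, time = rec
            known_sessions += 1
            sessions[session] = time
        elif kind == "progstart":
            _, session, prog, time = rec
            known_progs += 1
            progs[(session, prog)] = time
        elif kind == "progend":
            _, session, prog, time = rec
            if (session, prog) in progs:                        # ProgStarted
                complete_progs += 1
                prog_time += time - progs.pop((session, prog))  # start time + DisposeProg
            else:
                known_progs += 1                                # progend without progstart
        elif kind == "logoff":
            _, session, time = rec
            if session in sessions:                             # SessionStarted
                complete_sessions += 1
                session_time += time - sessions.pop(session)
            else:
                known_sessions += 1                             # logoff without logon
        else:
            raise ValueError(f"unknown record {rec!r}")

    return {
        "complete sessions": complete_sessions,
        "average session length": session_time / complete_sessions if complete_sessions else 0,
        "known sessions": known_sessions,
        "complete programs": complete_progs,
        "average program length": prog_time / complete_progs if complete_progs else 0,
        "known programs": known_progs,
    }
```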
At this point, the coarse design is already complete. The refined design will decide on the concrete implementation of the abstract data structure. In principle, the program can be implemented with a compiler compiler. In order to read the input data, only a (trivial) lexical analyzer needs to be written. But since the grammar of this problem is so simple (as it is for the telegram problem), using a compiler compiler would be like taking a sledgehammer to crack a nut. It is therefore natural to code the syntax analyzer for this problem by hand using the method of recursive descent (in this case it is not even recursive).
Jackson, in contrast, undertakes lengthy considerations about intermediate data files and program inversions, which make the task appear much more complicated than it really is.
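The text notes that only a trivial lexical analyzer is needed. A Python sketch follows, assuming each record is one line of blank-separated fields in the order of the record table (record type, session number, optionally program name, time). This file format is an assumption, since the problem description does not fix it.

```python
from typing import Iterable, Iterator

def scan_records(lines: Iterable[str]) -> Iterator[tuple]:
    """Turn lines such as 'PROGSTART 12 EDIT 930' into terminal tuples."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue                      # ignore blank lines
        kind = fields[0].lower()
        if kind in ("logon", "logoff"):
            _, session, time = fields
            yield kind, int(session), int(time)
        elif kind in ("progstart", "progend"):
            _, session, prog, time = fields
            yield kind, int(session), prog, int(time)
        else:
            raise ValueError(f"unknown record: {line!r}")
```

A record with the wrong number of fields makes the tuple unpacking fail with a `ValueError`, which is a crude but adequate error check for a sketch.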
8.3 Results of a Coco run
For readers interested in the way Coco works, we present an example showing the contents of the compiler parts generated from a specific input grammar. It can be viewed as a supplement to the implementation description in Chapter 7, and should help the reader to understand the principles explained there.
The example will be the description of an index generator, which is a program that generates an index from a list of keywords entered according to some syntactic rules. This problem provides another example of the use of attributed grammars in software engineering.
The input to the index generator is to be as follows: for each page of a document, the page number and all keywords on this page are entered in the following manner:
  1 = Introduction; User's Guide;
  2 = Start up; Parts of the tool;
  3 = General characteristics; User's Guide;
On the left-hand side of the '=' sign, page numbers as well as words are allowed. Words, however, must start with a '*':
  *Appendix = Maintenance; Troubleshooting;
From this input, the compiler generates a file of pairs <keyword, page
number>, sorts this file, and prints an index in which page numbers of
identical keywords are collected (the index at the end of this book was produced with such a program).
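The second phase, sorting the pairs and collecting the page numbers of identical keywords, is not described further in the text. A Python sketch of it follows, assuming the pairs are held in memory rather than in a file and that page references are plain integers (references such as '*Appendix' would need a separate ordering rule). The function name `build_index` is hypothetical.

```python
from itertools import groupby
from typing import Dict, Iterable, List, Tuple

def build_index(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Sort <keyword, page number> pairs and collect the pages of each keyword."""
    index: Dict[str, List[int]] = {}
    for keyword, group in groupby(sorted(pairs), key=lambda p: p[0]):
        index[keyword] = sorted({page for _, page in group})  # drop duplicate pages
    return index
```

`groupby` only merges adjacent items, which is why the pairs are sorted first; this mirrors the sort step the text describes.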
In our example, we will describe the first phase of this compiler, i.e. the
generation of the <keyword, pagenumber> file.
 1  GRAMMAR Index
 2
 3  SEMANTIC DECLARATIONS
 4    FROM FileIO IMPORT File, Open, Close, Write, WriteString, WriteLn;
 5    FROM Indexlex IMPORT GetKeyword, AdjustNumber;
 6
 7    VAR f: File;
 8        keystring, refstring, string: ARRAY [1..50] OF CHAR;
 9        value: CARDINAL;
10
11  TERMINALS
12    "="  alias equal
13    ";"  alias semicolon
14    "*"  alias asterisk
15    keyword
16    number<out:value>
17
18  PRAGMAS
19    eolsy
20
21  NONTERMINALS
22    Index
23    Relation
24    Reference<out:string>
25
26  RULES
27
28  Index =
29      sem Open(f, "INDEX.OUT") endsem
30      {Relation} sem Close(f) endsem.
31  ------------------------------------------------------------
32
33  Relation = Reference<out:refstring>
34      "="
35      { keyword  sem GetKeyword(↑keystring);
36                     WriteString(↓f,↓keystring);
37                     Write(↓f,↓CHR(0));
38                     WriteString(↓f,↓refstring);
39                     WriteLn(↓f)
40                 endsem
41        ";"
42      }.
43  ------------------------------------------------------------
44  Reference<out:string> = number<out:value> sem AdjustNumber(↓value,↑string) endsem
45      | "*" keyword sem GetKeyword(↑string) endsem.
46
47  ENDGRAM
This is the description of the translation process. The only thing the user has to provide is the module Indexlex, which supplies the terminals and exports the two procedures GetKeyword and AdjustNumber. GetKeyword should return the keyword string that the lexical analyzer has obtained after recognition of the terminal keyword. AdjustNumber should right-justify a number in a character field for sorting. The pragma eolsy is specified only to show how pragmas are encoded in the generated tables.
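To see what the generated compiler does, the Python sketch below performs the same translation by hand with recursive descent: one function per nonterminal, producing one line per <keyword, reference> pair, with keyword and reference separated by CHR(0) as in the semantic actions. The tokenizer's lexical rules, the field width of 5 used by the hypothetical `adjust_number`, and returning the lines instead of writing INDEX.OUT are all assumptions.

```python
import re
from typing import List, Tuple

def adjust_number(value: int, width: int = 5) -> str:
    """Hypothetical AdjustNumber: right-justify a page number so it sorts correctly as text."""
    return str(value).rjust(width)

def tokenize(text: str) -> List[Tuple[str, str]]:
    """Assumed lexis: numbers, the symbols = ; *, and keywords (other text up to = ; or *)."""
    tokens: List[Tuple[str, str]] = []
    for m in re.finditer(r"(\d+)|([=;*])|([^=;*\s][^=;*]*)", text):
        number, symbol, keyword = m.groups()
        if number is not None:
            tokens.append(("number", number))
        elif symbol is not None:
            tokens.append((symbol, symbol))
        else:
            tokens.append(("keyword", keyword.strip()))
    tokens.append(("eof", ""))
    return tokens

def translate(text: str) -> List[str]:
    """Index = {Relation}: return one 'keyword<NUL>reference' line per pair."""
    tokens = tokenize(text)
    pos = 0
    out: List[str] = []

    def kind() -> str:
        return tokens[pos][0]

    def expect(k: str) -> str:
        nonlocal pos
        if kind() != k:
            raise SyntaxError(f"{k!r} expected, found {tokens[pos][1]!r}")
        value = tokens[pos][1]
        pos += 1
        return value

    def reference() -> str:
        # Reference<out:string> = number<out:value> | "*" keyword.
        if kind() == "number":
            return adjust_number(int(expect("number")))
        expect("*")
        return expect("keyword")

    def relation() -> None:
        # Relation = Reference<out:refstring> "=" {keyword ";"}.
        refstring = reference()
        expect("=")
        while kind() == "keyword":
            keystring = expect("keyword")
            out.append(keystring + "\0" + refstring)   # keyword, CHR(0), reference
            expect(";")

    while kind() in ("number", "*"):
        relation()
    expect("eof")
    return out
```

A keyword that begins with digits would be mis-scanned as a number by this tokenizer; the real Indexlex module may well handle that differently.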
From this input, Coco generates a table-driven syntax analyzer and a
semantic evaluator. These modules will be discussed in the next sections.
8.3.1 The generated syntax analyzer
The syntax analyzer is generated from a frame program (cocosynframe, shown in Appendix F) into which Coco inserts the following constant declarations.
CONST
  maxname  = 74;   (*length of name list*)
  maxnamep = 10;   (*number of names*)
  maxcode  = 48;   (*length of G-code*)
  maxany   = 1;    (*number of any-sets. At least one dummy*)
  maxeps   = 2;    (*number of eps-follower sets*)
  maxt     = 5;    (*last terminal number*)
  maxp     = 6;    (*last pragma number*)
  maxs     = 9;    (*last nonterminal number*)
  startpc  = 44;   (*start address of the grammar*)
These values are the table dimensions derived from the above grammar.
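How the symbol-related dimensions follow from the grammar can be reproduced with a small sketch. It assumes that Coco numbers the symbols consecutively in the order terminals (EOF being 0), pragmas, nonterminals, which agrees with the symbol-name table of the generated parser; the function name `symbol_dimensions` is hypothetical.

```python
from typing import Dict, List

def symbol_dimensions(terminals: List[str], pragmas: List[str],
                      nonterminals: List[str]) -> Dict[str, int]:
    """Number the symbols consecutively and return the last number of each class."""
    maxt = len(terminals) - 1          # EOF is terminal 0
    maxp = maxt + len(pragmas)
    maxs = maxp + len(nonterminals)
    return {"maxt": maxt, "maxp": maxp, "maxs": maxs}
```

For the index grammar, the terminals are EOF, equal, semicolon, asterisk, keyword and number, the only pragma is eolsy, and the nonterminals are Index, Relation and Reference.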
8.3.2 The generated semantic evaluator
The semantic evaluator also consists of fixed frame parts and parts that are copied from the attributed grammar. For the index generator, the semantic evaluator is as follows (generated parts are shown in italics and frame parts are shown in roman type):
IMPLEMENTATION MODULE Indexsem;

  FROM SYSTEM IMPORT WORD;
  FROM Indexlex IMPORT at;

  FROM FileIO IMPORT File, Open, Close, Write, WriteString, WriteLn;
  FROM Indexlex IMPORT GetKeyword, AdjustNumber;

  VAR f: File;
      keystring, refstring, string: ARRAY [1..50] OF CHAR;
      value: CARDINAL;

  PROCEDURE ASSIGN(VAR x: WORD; y: WORD);
  BEGIN
    x := y
  END ASSIGN;

  PROCEDURE Semant(sem: CARDINAL);
  BEGIN
    CASE sem OF
      ...
    | 12: (*line 29*) Open(f, "INDEX.OUT")
    | 13: (*line 30*) Close(f)
    | 14: (*line 33*) refstring := string;
    | 15: (*line 35*) GetKeyword(keystring);
                      WriteString(f, keystring);
                      Write(f, CHR(0));
                      WriteString(f, refstring);
                      WriteLn(f)
    | 16: (*line 44*) ASSIGN(value, at[1]);
    | 17: (*line 44*) AdjustNumber(value, string)
    | 18: (*line 45*) GetKeyword(string)
    END;
  END Semant;

END Indexsem.
8.3.3 The generated parser tables
Coco generates the following tables:
1. G-code;
2. information about nonterminals (G-code start address, deletability, set of start symbols);
3. terminal successors of eps-symbols;
4. symbol sets represented by any-symbols;
5. number of attributes for terminals and pragmas;
6. number of semantic actions for pragmas;
7. symbol names for error messages.
The table values are inserted as initialization code into the generated syntax analyzer. We will now show these values in a decoded form.
G-code (addresses take 2 bytes)
  Address   Instruction
  Index:
      1     SEM12
      2     NTA Relation, 9
      6     JMP 2
      9     EPS
     11     SEM13
     12     RET
  Relation:
     13     NT Reference
     15     SEM14
     16     T equal
     18     TA keyword, 28
     22     SEM15
     23     T semicolon
     25     JMP 18
     28     EPS
     30     RET
  Reference:
     31     TA number, 38
     35     SEM16
     36     SEM17
     37     RET
     38     T asterisk
     40     T keyword
     42     SEM18
     43     RET
  dummy rule:
     44     NT Index
     46     T EOF
     48     RET
The entire grammar occupies only 48 bytes of G-code!
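The Python sketch below shows how a table-driven parser might interpret this G-code. The meaning of the instructions is inferred from the listing and is an assumption: T checks and consumes a terminal; TA does the same but jumps to the given address if the terminal is absent; NT calls a nonterminal; NTA calls it only if the current token is in its start set, otherwise it jumps; JMP jumps; EPS checks that the current token is a legal successor; SEM runs a semantic action; RET returns. Instructions are stored as tuples rather than bytes (a hypothetical encoding), but the addresses, start sets and eps-successor sets are those of the tables in this section.

```python
from typing import Dict, List, Tuple

# G-code of the index grammar as (opcode, operands...) keyed by address.
GCODE: Dict[int, Tuple] = {
    1: ("SEM", 12), 2: ("NTA", "Relation", 9), 6: ("JMP", 2),
    9: ("EPS",), 11: ("SEM", 13), 12: ("RET",),
    13: ("NT", "Reference"), 15: ("SEM", 14), 16: ("T", "equal"),
    18: ("TA", "keyword", 28), 22: ("SEM", 15), 23: ("T", "semicolon"),
    25: ("JMP", 18), 28: ("EPS",), 30: ("RET",),
    31: ("TA", "number", 38), 35: ("SEM", 16), 36: ("SEM", 17), 37: ("RET",),
    38: ("T", "asterisk"), 40: ("T", "keyword"), 42: ("SEM", 18), 43: ("RET",),
    44: ("NT", "Index"), 46: ("T", "EOF"), 48: ("RET",),
}
SIZE = {"SEM": 1, "RET": 1, "T": 2, "NT": 2, "EPS": 2, "JMP": 3, "TA": 4, "NTA": 4}
START_ADDR = {"Index": 1, "Relation": 13, "Reference": 31}
START_SET = {"Relation": {"asterisk", "number"}}
EPS_SUCC = {9: {"EOF"}, 28: {"EOF", "asterisk", "number"}}
STARTPC = 44

def parse(tokens: List[str]) -> List[int]:
    """Interpret the G-code on a list of terminal names; return the executed SEM numbers."""
    actions: List[int] = []
    stack: List[int] = []          # return addresses of active nonterminals
    pos, pc = 0, STARTPC
    while True:
        op, *args = GCODE[pc]
        sym = tokens[pos] if pos < len(tokens) else "EOF"
        nxt = pc + SIZE[op]        # address of the following instruction
        if op == "SEM":
            actions.append(args[0])
            pc = nxt
        elif op == "T":            # terminal that must be present
            if sym != args[0]:
                raise SyntaxError(f"{args[0]} expected, found {sym}")
            pos, pc = pos + 1, nxt
        elif op == "TA":           # terminal in an alternative: jump if absent
            if sym == args[0]:
                pos, pc = pos + 1, nxt
            else:
                pc = args[1]
        elif op == "NT":
            stack.append(nxt)
            pc = START_ADDR[args[0]]
        elif op == "NTA":          # nonterminal in an alternative: jump if it cannot start
            if sym in START_SET[args[0]]:
                stack.append(nxt)
                pc = START_ADDR[args[0]]
            else:
                pc = args[1]
        elif op == "JMP":
            pc = args[0]
        elif op == "EPS":          # empty alternative: check the successor symbol
            if sym not in EPS_SUCC[pc]:
                raise SyntaxError(f"unexpected {sym}")
            pc = nxt
        elif op == "RET":
            if not stack:          # return from the dummy rule ends the parse
                return actions
            pc = stack.pop()
```

With the instruction sizes assumed in `SIZE`, every fall-through address matches the next address in the listing, which is some evidence that the inferred encoding is close to the real one.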
Nonterminal description
  symbol (no.)     start address   deletability    start symbols
  Index (7)          1             deletable       {"*", number}
  Relation (8)      13             nondeletable    {"*", number}
  Reference (9)     31             nondeletable    {"*", number}

Terminal eps-successors
  EPS at address  9:   {EOF}
  EPS at address 28:   {EOF, "*", number}
Number of attributes for terminals and pragmas
  EOF: 0   equal: 0   semicolon: 0   asterisk: 0
  keyword: 0   number: 1   eolsy: 0
Pragma semantics
  eolsy:   attribute passing 0   user action 0
Symbol names
  names:          EOF/equal/semicolon/asterisk/keyword/number/eolsy/Index/Relation/Reference
  namepointers:   1, 5, 11, 21, 30, 38, 45, 51, 57, 66
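The name table stores all symbol names in one string, and the name pointers give the position where each name begins. Assuming 1-based positions and '/' as separator, as the figures above suggest, a name for an error message can be retrieved as follows (the function name `symbol_name` is hypothetical):

```python
NAMES = "EOF/equal/semicolon/asterisk/keyword/number/eolsy/Index/Relation/Reference"
NAMEPOINTERS = [1, 5, 11, 21, 30, 38, 45, 51, 57, 66]

def symbol_name(sym: int) -> str:
    """Return the name of symbol number sym (0 = EOF ... 9 = Reference)."""
    start = NAMEPOINTERS[sym] - 1          # convert the 1-based position to a Python index
    end = NAMES.find("/", start)
    if end == -1:                          # the last name has no terminating '/'
        end = len(NAMES)
    return NAMES[start:end]
```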
9 Experiences with Coco
In 1981, workers at the University of Linz built a parser generator that generates parser tables for an LL(1) syntax analyzer from an input grammar in Wirth's EBNF notation. The generator proved useful, which is why it was enhanced in 1983; it eventually evolved into the compiler compiler Coco.
The first version of Coco ran on an Intel 8080 development system and was written in PL/M-80, a language similar to PL/I for microcomputers. Since then, many more versions of Coco have been implemented in Modula-2 on various microcomputers, including the Macintosh, the IBM PC, the Atari 1040 and the Lilith. There is also a version for IBM mainframes. Coco has been in use for several years now and has proved to be useful both in research projects (e.g. construction of a Modula-2 compiler, tools for static program analysis) and in student courses.
9.1 A basis for measurements
In the following sections, we will describe the results of memory and run-time
measurements performed on Coco, and on three compilers generated by Coco.
First, we will measure the generation of a Modula-2 compiler. This
compiler consists of 6 passes (lexical analysis, syntax analysis, name
analysis, declaration analysis, semantic analysis, and code generation). Each
of passes 2 through 6 reads the entire source program in an intermediate
language generated by the previous pass. This intermediate program is
analyzed and forwarded to the next pass as a new, usually shorter, intermediate program (with the exception of pass 6, which generates the object
code). Each pass is therefore a compiler in itself, described with an attributed
grammar and translated by Coco into a syntax analyzer and a semantic
evaluator. For the measurements, we will not look at the entire Modula-2
compiler, but rather at two specific passes, since we are interested in the
individual Coco runs. We select pass 2 (syntax analysis) and pass 4
(declaration analysis). These two passes have rather different characteristics,
which make them well suited for a comparison. Pass 2 has a large and deeply
nested recursive grammar with only a few semantic actions, while pass 4 has a
simple grammar with a lot of semantic actions. In the following paragraphs,
we will talk about each of the passes as if they were independent compilers.
Secondly, we will measure the generation of Coco by itself. Compared to the Modula-2 compiler, Coco is much smaller and consists of a single pass. Thus, we have a comparison between two large applications and a small application. Table 9.1 shows the sizes of the compilers in terms of their attributed grammars.
Table 9.1 Size of the attributed grammars of the example compilers
                            Modula-2 (pass 2)   Modula-2 (pass 4)   Coco
  Number of lines
  Terminal symbols
  Pragmas
  Nonterminal symbols
  Alternatives
  Symbols in productions
  Semantic actions
  G-code
The measurements shown in the following sections were taken from the Lilith,
since the Modula-2 compiler was only available there. For the Macintosh the
results would have been very similar.
The Lilith is a 16-bit computer built on an Am2901 bit-slice processor
with a cycle time of 150 nanoseconds. It has a very compact object code
format (the so-called M-code) which has been especially tailored to Modula-2.
9.2 Measurements on Coco
First, we will look at Coco and measure the memory requirements and the run-time required by Coco to generate a compiler.
Memory requirements
Obviously the memory requirements for the code and the static data of Coco
are the same in all three measurements (65 347 bytes). The size of the dynamic
data depends on the input grammar but requires typically less than 1000 bytes
(see Table 9.2).
Table 9.2 Memory requirements of Coco for the generation of various compilers
  Modula-2 (pass 2)   Modula-2 (pass 4)   Coco
  65537 bytes         66219 bytes         65911 bytes
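Since code and static data are fixed at 65 347 bytes, the dynamic data of each run is the difference between the total and this figure. A quick check, assuming the columns of Table 9.2 are in the order pass 2, pass 4, Coco, confirms that it stays below 1000 bytes:

```python
STATIC_BYTES = 65_347  # code and static data of Coco, as stated in the text

totals = {"Modula-2 (pass 2)": 65_537, "Modula-2 (pass 4)": 66_219, "Coco": 65_911}
dynamic = {name: total - STATIC_BYTES for name, total in totals.items()}
# dynamic data per run: 190, 872 and 564 bytes
```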
The memory requirement for the code is shared between ten Coco-specific
modules and two standard modules. In addition, Coco uses one module that
belongs to the resident part of the operating system, and thus does not increase
Coco's memory requirements.
Run-time
The run-time of Coco depends on the size of the input grammar. Most of the time is used by the lexical analyzer that reads and lists the grammar. Writing out the syntax analyzer and the semantic evaluator of the target compiler also requires considerable time, while the rest of the work is done fairly rapidly. In large grammars with a deeply nested hierarchy of nonterminals (as in pass 2 of the Modula-2 compiler), the grammar tests also take a noticeable amount of time (see Table 9.3).
Table 9.3 Run-time of Coco for the generation of various compilers
                                          Modula-2 (pass 2)   Modula-2 (pass 4)   Coco
  Lexical analysis
  Syntax analysis, semantic processing
  Grammar tests
  Output of the generated compiler
9.3 Measurements on some generated compilers
We will now consider the memory requirements and the run-time of the
compilers generated by Coco.
Memory requirements
Here, we are only interested in parts which are actually generated by Coco,
namely the syntax analyzer, the semantic evaluator, and the parser tables. We
are not going to consider the size of the semantic modules since they are
independent of Coco.
Table 9.4 Memory requirements of some generated compilers
Modula-2
(pass 2)
Syntax analyzer
Semantic processor
Analysis tables
9532
bytes
8389
bytes
6344
bytes
All three compilers use the same syntax analyzer driven by different tables. Its
size is constant. The size of the semantic evaluator depends on the number and
the length of the semantic actions of the attributed grammar. As expected, its
size is larger in pass 4 of the Modula-2 compiler than in pass 2 and in Coco.
Note that the memory requirements of the generated compilers do not depend on the length of the input text, since no syntax tree of the input is built.
Run-time
The run-time of the generated compilers on input texts of various lengths is shown in Table 9.5.
Table 9.5 Run-time of some generated compilers
                        Modula-2 (pass 2)   Modula-2 (pass 4)   Coco
  100 input symbols
  1000 input symbols
  5000 input symbols
Even though Coco is the smallest of the three compilers, it runs much slower than the others since it does a lot of input and output (it writes long parts of source programs to disk), while pass 2 and pass 4 of the Modula-2 compiler work almost entirely in main memory (with input and output used only for intermediate languages).
9.4 General experiences
The experiences with Coco are exceptionally good. Coco allows a concise and very readable specification of the translation process. The attributed grammars become essential parts of the documentation of each compiler.
By automating syntax analysis, error handling, and semantic processing,
attention can be focused on the actual translation process in the semantic
procedures. More time is available for the design now. Working with attributed grammars almost automatically leads to a modular program structure
with abstract data structures and access procedures, which are usually small
and easy to understand.
In multi-pass compilers, like the Modula-2 compiler, the symbol any is
especially useful since it lets one easily skip over portions of the input that are
not of interest in this pass. The concept of pragmas has also proved useful
since they make it easy to pass control information between successive passes
(e.g. trace commands, options, etc.).
The limitations of LL(1) grammars are not a serious problem. Because of Wirth's EBNF notation, it is not necessary to perform the complex grammar transformations that are usually required in the standard BNF notation in order to remove LL(1) conflicts. The only case in which we could not resolve LL(1) conflicts in the grammar was the translation of the language PL/M-80; there, the conflicts were resolved by delegating some parts of the processing to the lexical analyzer.
Processing the input with L-attributed grammars and without building a
syntax tree is not a serious restriction. If during processing some attributes are
needed which only become available later, intermediate results are stored until
the required attributes have been calculated and the final translation is possible.
The omission of a syntax tree leads to efficient compilers with regard to speed
and memory requirements. Most of the generated compilers run on microcomputers.
The negative experiences in the use of Coco are limited to the global
nature of semantic objects in Cocol, which requires explicit stacking of
variables, and to the fact that whenever an error has been detected in the
attributed grammar the program development cycle is enlarged by an additional
run of the compiler compiler.
However, the positive experiences outweigh the negative ones. Even
though we have no hand-coded compiler that we can compare directly to a
Coco-generated compiler, we are not afraid to claim that the efficiency of
compilers generated by Coco is close to that of hand-coded compilers, and it is
certainly easier to implement and to maintain a compiler with Coco than by
hand.
A Definition of Adele
An algorithm description language, like a programming language, should offer all concepts
for the description of algorithms, but should be free of syntactic peculiarities. In this way,
the algorithms will stand out clearly and the reader will not be distracted by all sorts of
baroque constructs. For the same reason, it should use only a few constructs and give the
user freedom of expression. It should lean on popular programming languages so that it is
easy to read, but should not be firmly bound to a particular programming language. Our
algorithm description language Adele contains elements of PL/I, Modula-2, and Ada. We
will describe its structure by a few examples.
Overall structure
Each algorithm has a name, parameters, and instructions:
  Search(↓list↓length↓x↑i):
  begin
    Instructions
  end Search
The parameter list of functions is followed by the type of the function:
  Search(↓list↓length↓x) integer:
  begin
    Instructions
    return i
  end Search
Input parameters are marked by ↓, output parameters by ↑, and transition parameters by ↕.
Statements
We distinguish between assignments, procedure calls, control statements, input-output
statements, and text statements. To improve readability, instructions may optionally be
separated by a semicolon.
Assignment. The assignment has the form
  variable := expression
Procedure call. The call of a procedure consists of the procedure name and the actual parameters in parentheses:
  ReadCard(↑card)
It is a useful convention to write procedure names in mixed case, starting each word with a capital letter (as in ReadCard), and variable names entirely in lower case.
Control statements. Here we use the modern forms of Modula-2, which are explicitly terminated by an end, with the exception of the repeat statement:
  if expression then
    statement sequence
  end

  if expression then
    statement sequence
  else
    statement sequence
  end

  case expression of
    label: statement sequence
  | label: statement sequence
  end

  case expression of
    label: statement sequence
  | label: statement sequence
  else
    statement sequence
  end

  while expression do
    statement sequence
  end

  repeat
    statement sequence
  until expression

  loop
    statement sequence
  end

  for variable := expression to expression [by expression] do
    statement sequence
  end
The control variable will be undefined after completion of the for loop.
  exit                exits from the immediately enclosing loop statement.
  return              exits from a procedure.
  return expression   exits from the function procedure with expression as the function value.
  halt                stops the algorithm without return to a surrounding algorithm.
Input-output statements. Here we only use three statements:
  read(↑x↑eof)   read x or signal end of input file
  write(↓x)      write x to the output medium
  writeln        emit line feed
We do not concern ourselves with the format of the input and output text. The boolean parameter eof indicates the end of the input file. When x has been read, eof will be false on return. If x could not be read due to end of file, eof will be true and x will be undefined on return.
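In a language without output parameters, the read statement can be mimicked by returning both values as a pair. A Python sketch follows, assuming the input medium is an iterator over items; `adele_read` is a hypothetical name, and `None` stands for the undefined x.

```python
from typing import Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")

def adele_read(source: Iterator[T]) -> Tuple[Optional[T], bool]:
    """read(↑x↑eof): return (x, False), or (None, True) at end of input (x undefined)."""
    try:
        return next(source), False
    except StopIteration:
        return None, True
```

A typical loop then reads `x, eof = adele_read(src)` before each iteration and stops when `eof` is true.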
Text statements. Text statements are free texts that describe actions. For example:
  calculate mean values and variances;
The only rule is that they be terminated or separated by a semicolon so that their end can be seen.
Expressions
For expressions we stipulate the common combinations of operators and operands without giving specific rules. We state only that boolean expressions can be viewed as conditional expressions with short-circuit evaluation:
  a & b   is equivalent to   if a then b else false end
  a | b   is equivalent to   if a then true else b end
This means that if the left operand alone is sufficient to determine the value of the expression, then the right operand is not evaluated.
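The two equivalences can be checked directly by writing them as conditional expressions and recording which operands are evaluated. In the Python sketch below the operands are passed as functions so that evaluation becomes observable; the helper names `cand` and `cor` are illustrative only.

```python
from typing import Callable

def cand(a: Callable[[], bool], b: Callable[[], bool]) -> bool:
    """a & b  ==  if a then b else false end"""
    return b() if a() else False

def cor(a: Callable[[], bool], b: Callable[[], bool]) -> bool:
    """a | b  ==  if a then true else b end"""
    return True if a() else b()
```

Python's own `and` and `or` follow the same short-circuit rule.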
Declarations
Usually declarations are not needed for the description of short and simple algorithms, especially if the variables used are obvious from the preceding explanations. However, in longer algorithms with local variables, global variables, parameters, and perhaps also named constants, it is advantageous if the algorithm description language also contains declarations. In Adele, the declaration of constants and variables can be written between the head of the algorithm and begin. We partition the declared items into the following classes: parameters, global variables, constants, local dynamic variables, and local static variables. The classes are identified by the keywords param, global, const, local, and static. After each keyword, one or more declarations of names of the corresponding class can be placed.
A constant declaration has the form
name = value
and a variable declaration has the form
name: type
As types we use the elementary types of Pascal and Modula-2 with the following keywords or structures:
integer
real
boolean
char
(red, green, blue)
array (index:index) of type
Array types allow a certain amount of freedom. If the range limits are not needed, we write
array of type
If the type is not needed, we write
array (index:index)
If neither is needed, we simply write
array
As an example of the use of declarations, we describe a linear search algorithm with
declarations of all names:
Search (↓list ↓length ↓x ↑i):
param list: array of integer
      length, x, i: integer
local j: integer
begin
  j := length
  while j > 0 & list(j) <> x do
    j := j - 1
  end
  i := j
end Search
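A direct transcription of Search into Python shows why the short-circuit rule matters in the loop condition. It assumes, as the test j>0 suggests, that list is indexed from 1 to length; because Python lists are indexed from 0, the sketch reads lst[j - 1]. The function name search is only illustrative.

```python
def search(lst, length, x):
    """Scan positions length, length-1, ..., 1 of lst and return the
    position i with lst(i) = x, or 0 if x does not occur."""
    j = length
    # "and" short-circuits: lst[j - 1] is never read once j has reached 0.
    while j > 0 and lst[j - 1] != x:
        j -= 1
    return j


print(search([4, 8, 15, 16], 4, 15))  # 3
print(search([4, 8, 15, 16], 4, 42))  # 0
```

Because the scan runs from the end, the position of the last occurrence of x is returned when x occurs more than once.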
For static variables, we allow optional initialization. This is done by adding the phrase init(value) after the type:
static finished: boolean init(false)
Comments
Comments, like those in Ada, start with two minus signs and extend over the rest of the line.
-- This is a comment
-- which extends over
-- two lines.
Undefined issues
Adele has no rules for the remaining items such as records, pointers, modules, etc. We write them more or less in the style of Modula-2.
B
Modula-2 and Pascal
Since Modula-2 evolved from Pascal, its appearance is very similar to Pascal, and so Pascal
programmers have no difficulty in reading Modula-2 programs. Here we will briefly present
the most important differences for the reader of the Modula-2 programs in this book. The
complete language definition and examples can be found in the books by Wirth [1982] and Pomberger [1986]. A didactically oriented introduction to Modula-2 is the book by Blaschek, Pomberger, and Ritzinger [1985].
General characteristics
Modula-2 is a system implementation language that extends Pascal with the following key features:
1. Modular program structure. Modula-2 programs are composed of separately compiled modules. The compiler checks the consistency of the interfaces between modules. The language is therefore especially suited for the implementation of data capsules and abstract data types.
2. Coroutines and parallel processes. Modula-2 provides the coroutine facility as the basic element for the implementation of parallel processes.
3. Low-level features. Modula-2 provides facilities to bypass strong type checking so that memory words can be accessed directly and addresses can be handled. This makes it possible to produce machine-specific code.
We will not describe parallel processing or low-level features in this appendix since Coco does not use them.
Lexical elements
Modula-2 differs from most Pascal implementations by its sensitivity to the case of letters.
The names TRUE, True, and true denote three different objects.
Single character constants can be denoted by use of an octal number that is terminated
with a 'C', e.g. CONST ff = 14C.
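As a quick check of the notation: the octal number 14 is decimal 12, which in ASCII is the form feed character. The following Python sketch (the function name char_constant is hypothetical) computes the character denoted by such a constant.

```python
def char_constant(literal):
    """Character denoted by a Modula-2 constant written as octal digits plus 'C'."""
    digits, suffix = literal[:-1], literal[-1:]
    if suffix != "C" or not digits or any(d not in "01234567" for d in digits):
        raise ValueError(f"not an octal character constant: {literal!r}")
    return chr(int(digits, 8))


ff = char_constant("14C")
print(ord(ff))  # 12, the form feed character in ASCII
```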
Declarations
In contrast to Pascal, constant, type, variable, and procedure declarations can appear in any order. There are no labels or label declarations.
Standard types. In addition to the standard types of Pascal (INTEGER, REAL, BOOLEAN, CHAR), Modula-2 has the standard type CARDINAL for unsigned natural numbers. For 16-bit implementations, the range of integer values is -32 768 to +32 767, and the range of cardinal values is 0 to 65 535.
Enumeration, subrange, array, record, and pointer types are the same as in Pascal, except that arrays cannot be packed and variant record types have an improved syntax.
If the word length of the computer is w bits, then the cardinality of set types is limited to w, or a small multiple thereof (according to the language definition). There is a standard type BITSET that consists of the elements 0 through w - 1:
TYPE BITSET = SET OF [0..w-1]
Set constants are enclosed in '{' and '}'.
The machine-dependent type WORD denotes arbitrary data whose length is a
machine word. It is compatible with all types whose length is a machine word.
Expressions
Expressions in Modula-2 are constructed in the same way as in Pascal. The operators have
essentially the same meaning. One important difference in Modula-2 is that expressions that
contain the operators 'AND' or 'OR' are interpreted as conditional expressions whose
evaluation is terminated as soon as the result of the expression is known (short-circuit
evaluation):
a AND b   is equivalent to   if a then b else false
a OR b    is equivalent to   if a then true else b
Statements
Assignment, procedure call, and repeat-statement are taken from Pascal without change.
If, case, while, and for statements have been syntactically improved and expanded. The
if statement can have one or more elsif parts, the case statement can have an else part. All
of these constructs are explicitly terminated by END, which eliminates the need to
distinguish between single and multiple statements in a block:
ifstatement     = IF expr THEN statementsequence
                  {ELSIF expr THEN statementsequence}
                  [ELSE statementsequence]
                  END.
casestatement   = CASE expr OF case {"|" case}
                  [ELSE statementsequence]
                  END.
case            = caselabellist ":" statementsequence.
whilestatement  = WHILE expr DO statementsequence END.
forstatement    = FOR ident ":=" expr TO expr [BY constexpr] DO
                  statementsequence
                  END.
New features are the loop statement (infinite loop), the exit statement to leave the loop statement, and the return statement to leave a procedure or function (in the case of a function, passing the function value):
loopstatement   = LOOP statementsequence END.
exitstatement   = EXIT.
returnstatement = RETURN [expr].
There is no goto statement and no input-output statement in Modula-2. Input and output are done by procedure calls.
Procedures
There are procedures and function procedures as in Pascal, which permit value and VAR parameters. Procedures and function procedures both begin with the keyword PROCEDURE. Modula-2 permits procedure variables (not used by Coco) and arrays of unspecified length (so-called open arrays), e.g. in the form:
PROCEDURE Sort(VAR list: ARRAY OF INTEGER);
  VAR n: INTEGER;
BEGIN
  (* assume list: ARRAY [0..n] OF INTEGER *)
  n := HIGH(list);  (* standard proc. to find upper limit of index *)
END Sort
Standard procedures. The standard procedures that differ from Pascal are:
CAP(ch): CHAR       converts from lower to upper case
HIGH(a): CARDINAL   returns the upper bound of array a
DEC(x)              decrease: x:=x-1
DEC(x,n)            x:=x-n
EXCL(s,i)           exclude element i from set s: s:=s-{i}
HALT                terminate entire program
INC(x)              increase: x:=x+1
INC(x,n)            x:=x+n
INCL(s,i)           include element i in set s: s:=s+{i}
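The effect of these procedures can be modelled in Python. This is a sketch under stated assumptions, not Modula-2 itself: a BITSET is represented as a w-bit integer mask with w = 16 (a 16-bit implementation), the procedures return their result because Python integers are immutable, and CAP is assumed to leave characters other than lower-case letters unchanged. All function names are illustrative.

```python
W = 16  # assumed word length in bits (16-bit implementation)


def inc(x, n=1):
    """INC(x) / INC(x, n): x := x + n."""
    return x + n


def dec(x, n=1):
    """DEC(x) / DEC(x, n): x := x - n."""
    return x - n


def incl(s, i):
    """INCL(s, i): s := s + {i}, with the set s held as a W-bit mask."""
    if not 0 <= i < W:
        raise ValueError("set element out of range")
    return s | (1 << i)


def excl(s, i):
    """EXCL(s, i): s := s - {i}."""
    if not 0 <= i < W:
        raise ValueError("set element out of range")
    return s & ~(1 << i)


def members(s):
    """List the elements 0..W-1 contained in the set s."""
    return [i for i in range(W) if s & (1 << i)]


def cap(ch):
    """CAP(ch): upper-case letter for a lower-case ch (others unchanged here)."""
    return ch.upper() if "a" <= ch <= "z" else ch


def high(a):
    """HIGH(a): upper bound of an open array, whose indices start at 0."""
    return len(a) - 1


s = incl(incl(0, 3), 5)
print(members(s), members(excl(s, 3)))  # [3, 5] [5]
```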
Type transfer functions. Modula-2 offers the possibility of explicit type conversion by so-called type transfer functions. Each type name can be used as a function with one argument. For example, the type transfer function
CARDINAL(b)
denotes the bit pattern of b (without any conversion) but with type CARDINAL. The context condition is that the type of b must have the same number of bits as CARDINAL. Type transfer functions should be used with care since they make programs machine dependent.
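The difference between reinterpreting a bit pattern and converting a value can be shown with Python's struct module. The sketch assumes 16-bit INTEGER and CARDINAL types and a two's-complement representation of INTEGER, which is typical but exactly the kind of machine dependence the text warns about; the function name is hypothetical.

```python
import struct


def cardinal_of_integer(b):
    """Model CARDINAL(b) for a 16-bit INTEGER b: keep the 16-bit pattern of b
    and read it as an unsigned number; no numeric conversion takes place."""
    bits = struct.pack("<h", b)          # two's-complement bit pattern of b
    return struct.unpack("<H", bits)[0]  # same bits, read as unsigned


print(cardinal_of_integer(100))  # 100
print(cardinal_of_integer(-1))   # 65535
```

A value conversion of -1 to CARDINAL would be an error, whereas the type transfer silently yields 65535.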
Modules
An executable Modula-2 program consists of one or more separately compiled modules. A
module is a collection of declarations and statements giving a higher-level unit. Module
boundaries are like a fence for names, which means that names declared inside a module are
unknown outside, and names declared outside a module are unknown inside. The programmer can open the fence for selected names by an import list, which contains all names that are declared outside and are to be known inside the module, and an export list, which contains all names that are declared inside the module and are to be known outside. Thus access is explicitly specified by the programmer and visible in the program text.
There are four kinds of modules: main modules, definition modules, implementation
modules, and inner modules.
Main modules are almost like Pascal programs. They consist of an import list,
declarations (of constants, types, variables, procedures, and inner modules), and statements:
programmodule = MODULE ident ";"
                {import}
                {declaration}
                BEGIN
                  statementsequence
                END ident "."
Only the line {import} is different from Pascal. It references other separately compiled modules and causes these modules to be loaded. In the most common form
import = FROM ident IMPORT identlist ";"
ident is the name of the module to be loaded and identlist contains the names of the objects
exported by the loaded module for use in the declarations and statements of the importing
module. In the less common form
import = IMPORT identlist ";"
the identlist contains only the names of the modules that are to be loaded together with the
importing module.
Separately compilable modules that are not main programs consist of two separately
compiled parts, the definition module and the implementation module. The definition
module describes the interface of the module to its clients. All declared names are automatically exported.
definitionmodule = DEFINITION MODULE ident ";"
                   {import}
                   {definition}
                   END ident "."
definition contains the declarations of the exported objects. Procedures are only specified by
their procedure heading (procedure name and parameters):
definition = CONST {constantdeclaration ";"}
           | TYPE {ident ["=" type] ";"}
           | VAR {variabledeclaration ";"}
           | PROCEDURE ident [formalparameters] ";".
The implementation module contains the declaration of the non-exported objects, the code
for all procedures, and the statements of the module:
implementationmodule = IMPLEMENTATION MODULE ident ";"
                       {import}
                       {declaration}
                       BEGIN
                         statementsequence
                       END ident "."
Definition and implementation modules exist in pairs and have the same name. The
definition module must be compiled before the implementation module. A module can be
compiled only if the definition modules of all of the imported modules have been compiled
before.
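The ordering rule above amounts to a topological sort of compilation units. The following Python sketch computes one admissible order with graphlib (Python 3.9 or later). It is illustrative only: the unit names and the sample program are hypothetical, and it simplifies by assuming that the definition and implementation parts of a module share one import list. Circular imports between definition modules make static_order() raise graphlib.CycleError, matching the fact that such modules cannot be compiled.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def compile_order(imports, main_modules):
    """Return one admissible order of compilation units.

    imports      -- module name -> set of names of the modules it imports
    main_modules -- names of the main (program) modules
    """
    ts = TopologicalSorter()
    for mod, used in imports.items():
        # Every imported definition module must be compiled first.
        needed = [f"DEFINITION {m}" for m in sorted(used)]
        if mod in main_modules:
            ts.add(f"MODULE {mod}", *needed)
        else:
            ts.add(f"DEFINITION {mod}", *needed)
            # The implementation module also needs its own definition module.
            ts.add(f"IMPLEMENTATION {mod}", f"DEFINITION {mod}", *needed)
    return list(ts.static_order())


# Hypothetical program: Main imports Lists, and Lists imports Storage.
order = compile_order(
    {"Main": {"Lists"}, "Lists": {"Storage"}, "Storage": set()}, {"Main"}
)
print(order)
```

Note that the main module only waits for definition modules; implementation modules can be compiled (or recompiled) at any time after their own definition module.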
Storage for local objects of separately compiled modules is allocated when the object
program is loaded, and remains allocated until the program terminates (static memory
allocation). The statement sequence of the implementation module is executed immediately
after loading the module, and therefore can be used for the initialization of data.
Inner modules are modules that are not separately compiled. They are like procedures
nested inside other modules or procedures. They can import and export.
moduledeclaration = MODULE ident ";"
                    {import}
                    [EXPORT [QUALIFIED] identlist ";"]
                    {declaration}
                    BEGIN
                      statementsequence
                    END ident.
Storage for local objects of inner modules is allocated when the procedure that contains the
inner module is activated, and released when the procedure returns to its caller. When the surrounding procedure is called, the statements of the inner module are executed as well.
There is a (fictitious) separately compiled module SYSTEM, provided by the compiler, that gives access to low-level features. It exports types and related procedures (including the type WORD). Each module that imports SYSTEM is therefore machine dependent.
C
Syntax of Cocol
Keywords:                Upper-case letters
Other terminal symbols:  Literals or lower-case letters
Nonterminal symbols:     Upper- and lower-case letters
Cocol         = GRAMMAR identifier
                [SEMANTIC DECLARATIONS {any}]
                [MACROS {SemMacroDef}]
                TERMINALS {Symbol [Attributes] [AliasName]}
                [PRAGMAS {Symbol [Attributes] [AliasName] [SemAction]}]
                NONTERMINALS {identifier [Attributes]}
                RULES {identifier [Attributes] "=" Expression "."}
                ENDGRAM.
Expression    = Term {"|" Term}.
Term          = Factor {Factor}.
Factor        = Symbol [Attributes]
              | EPS
              | ANY
              | SemAction
              | "(" Expression ")"
              | "[" Expression "]"
              | "{" Expression "}".
Attributes    = "<" (OutAttributes | InAttributes [";" OutAttributes]) ">".
InAttributes  = IN ":" (identifier | number) {"," (identifier | number)}.
OutAttributes = OUT ":" identifier {"," identifier}.
SemAction     = SEM ("(" identifier ")" | {any}) ENDSEM.
SemMacroDef   = SEM identifier ":" {any} ENDSEM.
Symbol        = identifier | string.
AliasName     = ALIAS ":" Symbol.
D
G-code
T      sy           terminal
       If the next input symbol is sy, then recognize it, else report an error.
TA     sy adr       terminal with alternative
       If the next input symbol is sy, then recognize it, else go to adr.
NT     sy           nonterminal
       If the next input symbol is a valid start of the nonterminal sy, then enter the production of sy, else report an error.
NTA    sy adr       nonterminal with alternative
       If the next input symbol is a valid start of the nonterminal sy, then enter the production of sy, else go to adr.
NTS    sy sem       nonterminal with input attribute semantics
       If the next input symbol is a valid start of the nonterminal sy, then execute the semantic action sem (for input attribute assignment) and enter the production of sy, else report an error.
NTAS   sy adr sem   nonterminal with alternative and input attribute semantics
       If the next input symbol is a valid start of the nonterminal sy, then execute the semantic action sem (for input attribute assignment) and enter the production of sy, else go to adr.
ANY                 any
       Recognize the next input symbol.
ANYA   nr adr       any with alternative
       If the next input symbol is in the symbol set (any-set) denoted by nr, then recognize it, else go to adr.
EPS    nr           epsilon (empty string)
       If the next input symbol is in the successor set (eps-set) denoted by nr, then recognize the empty string, else report an error.
EPSA   nr adr       epsilon with alternative
       If the next input symbol is in the successor set (eps-set) denoted by nr, then recognize the empty string, else go to adr.
JMP    adr          jump
       Go to adr.
RET                 return
       Return from the production of a nonterminal.
SEM                 semantic action
       Execute the semantic action with the number of the G-code instruction.
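To make the table concrete, here is a Python sketch of an interpreter for six of the instructions (T, TA, NT, NTA, JMP, RET). It is not Coco's parser: symbols are strings rather than numbers, the "valid start" test uses hand-written first sets, and EPS, ANY, SEM, attributes, and error recovery are omitted. The grammar E = T {"+" T}. T = "x" | "(" E ")". and its G-code are hypothetical examples chosen for the sketch.

```python
class GSyntaxError(Exception):
    """Raised where the G-code table says 'report an error'."""


def run(code, start, first, root, tokens):
    """Interpret T, TA, NT, NTA, JMP and RET instructions.

    code  -- list of tuples (opcode, operand, ...)
    start -- nonterminal -> address of the first instruction of its production
    first -- nonterminal -> set of terminals that are a valid start of it
    Returns True if tokens form a sentence of root, else raises GSyntaxError.
    """
    toks = list(tokens) + ["eof"]  # sentinel marking the end of the input
    pos, stack = 0, []  # input position; return addresses of active productions
    if toks[0] not in first[root]:
        raise GSyntaxError(f"{root} cannot start with {toks[0]!r}")
    pc = start[root]
    while True:
        op, *args = code[pc]
        sym = toks[pos]
        if op in ("T", "TA"):
            if sym == args[0]:         # recognize the terminal
                pos, pc = pos + 1, pc + 1
            elif op == "TA":           # terminal with alternative: go to adr
                pc = args[1]
            else:
                raise GSyntaxError(f"expected {args[0]!r}, found {sym!r}")
        elif op in ("NT", "NTA"):
            if sym in first[args[0]]:  # enter the production of the nonterminal
                stack.append(pc + 1)
                pc = start[args[0]]
            elif op == "NTA":          # nonterminal with alternative: go to adr
                pc = args[1]
            else:
                raise GSyntaxError(f"{args[0]} cannot start with {sym!r}")
        elif op == "JMP":
            pc = args[0]
        elif op == "RET":
            if not stack:              # the root production is finished
                break
            pc = stack.pop()
        else:
            raise ValueError(f"instruction {op} is not covered by this sketch")
    if toks[pos] != "eof":
        raise GSyntaxError(f"unexpected {toks[pos]!r} after the end of {root}")
    return True


# Hypothetical grammar:  E = T {"+" T}.   T = "x" | "(" E ")".
CODE = [
    ("NT", "T"),      # 0  E: T
    ("TA", "+", 4),   # 1     "+", else leave the loop
    ("NT", "T"),      # 2     T
    ("JMP", 1),       # 3     repeat
    ("RET",),         # 4
    ("TA", "x", 7),   # 5  T: "x", else try the next alternative
    ("RET",),         # 6
    ("T", "("),       # 7     "("
    ("NT", "E"),      # 8     E
    ("T", ")"),       # 9     ")"
    ("RET",),         # 10
]
START = {"E": 0, "T": 5}
FIRST = {"E": {"x", "("}, "T": {"x", "("}}

print(run(CODE, START, FIRST, "E", ["x", "+", "(", "x", ")"]))  # True
```

In this sketch the loop in E is left with a plain RET when "+" is absent; in real G-code that is the place where an EPS or EPSA instruction would check the successor set.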
E
Intermodular cross-reference list
The following list contains all names that are exported or imported by a module of the Coco
system as well as their data types. For every name, the first reference denotes the exporting
module and the other references the importing modules.
Allocate
PROC
(VAR
System,
alts
cocogen2,
ARRAY [1..10]
cocolex,
Attrtype
PROC
CloseFile
PROC
cocosyn
cocotst
s:Symbolset;
n:CARDINAL)
cocotst
cocogen,
cocogen2,
cocolst
coco,
cocosem
CARDINAL
PROC
PROC
cocogen,
cocogen2,
cocosem,
cocosym,
(nr:CARDINAL)
cocogen,
cocogen2,
cocosem,
(sy,nr:CARDINAL) : BOOLEAN
cocosym,
cocosem,
PROC
Errors,
CompleteAt
cocosem
m:Marklist)
PROC (f:File)
FileIO, coco,
cocolex,
CompErr
Errors
CARDINAL
cocosym,
(VAR
cocogen,
col
cocosyn,
cocosem
(VAR
cocosym,
Close
cocosym,
(term, nonterm, const)
cocogra,
ClearSet
OF
cocogen,
cocogen,
ClearMarkList
size:LONGINT)
cocogen2,
CARDINAL
cocogra,
at
ptr:ADDRESS;
cocogen,
cocosem
cocosym
cocosyn
con
ConcatLeft
File
FileIO, coco, cocogen, cocogen2, cocogra,
cocosym, cocosyn, cocotst, Errors
PROC
(VAR
PROC (VAR gp,gl,gp1,gl1:CARDINAL)
cocogra, cocosem
Copy
PROC (typ,col:CARDINAL)
cocogen, cocosem
CopyFramePart
PROC (VAR f1,f2:File;
cocogen, cocogen2
ddt
ARRAY ["A".."Z"]
cocolex,
OF
coco,
s:ARRAY
OF CHAR)
BOOLEAN
cocogra,
cocosem,
PROC (VAR ptr:ADDRESS)
System, cocogen, cocogen2,
PROC
cocosem,
cocosem
ConcatRight
Deletable
cocolex,
gp,gl,gp1,gl1:CARDINAL)
cocogra,
Deallocate
cocosym,
cocotst
Errors
(loc:CARDINAL) : BOOLEAN
cocogra,
cocosym,
cocotst
DeleteRedundantEps
PROC
cocogra, coco
DelNode
Direction
PROC (gn:Graphnode) : BOOLEAN
cocogra, cocosym, cocotst
(up, down)
cocosym,
cocosem
Done
BOOLEAN
EF
CONST
EmitAction
PROC (line:CARDINAL;
cocogen, cocosem
EOL
CONST
FileIO,
FileIO,
cocogen,
cocogen2
cocolex,
cocolst
VAR
cocolex,
cocolst
RECORD
Errors,
cocosyn
Errorptr
POINTER
Errors,
TO Errornode
cocolst, cocosyn
File
RECORD
FileIO,
coco,
BOOLEAN
cocogen,
coco
FindCircularRules
PROC
cocotst,
(VAR
coco
filesopen
FindDelSymbols
sem:CARDINAL)
CHAR
FileIO,
Errornode
coco,
CHAR
PROC
cocosym,
coco
cocogen,
cocogen2,
ok:BOOLEAN)
cocolex,
cocolst,
Errors
GenAssign
GenSynFiles
GetA
cocosem
PROC
cocogen2,
coco
PROC
(n:CARDINAL;
GetF
PROC
GetFirstSet
PROC
set:Symbolset)
VAR
(sy:CARDINAL;
set:Symbolset)
VAR
set:Symbolset)
cocotst
(spix:CARDINAL;
cocosym,
VAR
sem:CARDINAL)
cocosem
(spix:CARDINAL;VAR
cocolex,
cocogen,
name:ARRAY
cocogen2,
PROC (VAR nr, line,col:CARDINAL)
BrFOLSTREOCOLSE
GetNextSynErr
PROC
(VAR
GetNode
PROC
PROC
VAR
CHAR;VAR
cocosym,
cocogen2,
(VAR
VAR
gn:Graphnode)
cocosem,
synerrors,
cocosym,
cocotst
semerrors:CARDINAL)
EEEORSTELCHEO
GetSy
PROC
cocolex,
cocosyn
GetSy
PROC (sy:CARDINAL;
cocosym, cocogen2,
GetSymbolSets
PROC
cocosym,
coco
gramspix
CARDINAL
cocosym,
cocogen2,
PROC
cocogra,
cocosem
RECORD
cocogra,
cocogen2,
GraphList
Graphnode
InsertFramePart
PROC
cocogen,
cocosem
VAR sn:Symbolnode)
cocogra, cocosem, cocotst
cocosem
cocosem,
len:CARDINAL)
cocotst
line,col:CARDINAL)
cocolst
(p:CARDINAL;
cocogra,
GetNumberOfErrors
symbols:Errorptr;
OF
cocogra,
GetNextSemErr
Errors,
dir:Direction)
cocotst
cocosym,
PROC
VAR
VAR first:Symbolset)
cocotst
(loc:CARDINAL;
cocosym,
GetName
VAR
(sy:CARDINAL;
cocosym, cocogen2,
PROC
spix:CARDINAL;
cocosem
PROC (n:CARDINAL;
cocosym, cocogen2
PROC
set:Symbolset)
VAR
(sy,n:CARDINAL;
PROC
GetE
GetMacroNr
VAR
cocogen2
cocosym,
GetFo
left, right :CARDINAL)
(typ:Attrtype;
PROC
cocogen,
cocosym,
GetAt
cocosym,
cocotst
IsInSet
line
LL1Test
PROC
(n:CARDINAL;
cocosym,
cocotst
CARDINAL
cocolex,
cocogen,
PROC
(VAR
cocotst,
lst
PROC
PROC
cocogen2,
(loc:CARDINAL;
cocosym,
cocosym,
cocogen2
cocogen2
maxn
CARDINAL
maxp
CARDINAL
cocogra,
cocotst
m:Marklist):
16]
BOOLEAN
OF BITSET
cocogen2,
cocosym
cocogen2,
cocogra,
cocotst
cocosym,
cocogen2,
cocogra,
cocotst
CARDINAL
cocogen,
cocogen2
cocosym,
CARDINAL
CARDINAL
cocosym,
PROC
cocogen2,
NewEpsBeforeDelNts
dir:Direction)
cocosem
PROC
cocogra,
PROC
cocotst
(sy,spix:CARDINAL;
cocosym,
NewMacro
VAR
cocotst
cocosym,
cocosym,
cocotst
DIV
cocosym,
CARDINAL
cocosym,
NewAt
cocosyn
m:Marklist)
cocogra,
maxeps
maxt
VAR
ARRAY [0..maxnodes
CARDINAL
maxsem
cocosym,
cocotst
(loc:CARDINAL;
maxany
maxs
cocosem,
111:BOOLEAN)
coco,
cocogra,
Marklist
cocogen2,
BOOLEAN
File
cocogra,
Marked
s:Symbolset):
coco
cocolst,
Mark
VAR
coco
(spix,sem:CARDINAL;
cocosym,
VAR
ok:BOOLEAN)
cocosem
NewNode
PROC (typ:Symboltype;
sp, line:CARDINAL) :CARDINAL
cocogra, cocosem
NewSy
PROC (spix:CARDINAL;
cocosym, cocosem
normal
enumeration constant
System, coco, Errors
Open
PROC (VAR f:File;
output
: BOOLEAN)
FileIO,
coco,
typ:Symboltype) : CARDINAL
volRef: INTEGER;
cocogen,
cocogen2,
fn:ARRAY
cocolst
OF
CHAR;
OpenFile
PROC
(spix:CARDINAL)
cocogen,
OpenSem
PROC
cocosem
(line:CARDINAL;
cocogen,
VAR
sem:CARDINAL)
cocosem
Parse
PROC (VAR correct
:BOOLEAN)
cocosyn, coco
printinput
BOOLEAN
cocosyn,
PrintListing
coco,
coco
BOOLEAN
cocosyn,
PrintSynError
cocolex
PROC
cocolst,
printnodes
PROC
coco,
(VAR
cocolex
f:File;
VAR
synerrors:CARDINAL)
VAR
ch:CHAR)
ERROESTRECOCONSE
PutStatistics
PROC
cocogen2,
Read
PROC
FileIO,
RepNode
PROC
coco
(VAR
f:File;
coco,
(p:CARDINAL;
cocogra,
RepSy
RestartHash
Restriction
PROC
cocogen,
PROC
cocolex,
cocosem
PROC
sn:Symbolnode)
cocogra,
rootloc
CARDINAL
rules
CARDINAL
cocogra,
cocogra,
cocogra,
cocolex,
cocosem,
cocogen2,
cocosem
PROC (sem:CARDINAL)
cocosem, cocosyn
SemErr
PROC
cocotst
cocosym,
cocosym
cocotst
(nr, line, col:CARDINAL)
Errors,
cocogen,
(VAR
cocosym,
src
cocosem,
cocogen2,
Semant
PROC
cocosem,
(nr:CARDINAL)
Errors,
SetBit
Errors
cocosym
(sy:CARDINAL;
cocogen2,
cocolst,
gn:Graphnode)
cocosem,
cocosym,
cocolex,
cocogen2,
cocolex,
s:Symbolset)
cocotst
File
cocolex,
coco,
cocogen,
StartCopy
PROC (col:CARDINAL)
cocogen, cocosem
StopHash
PROC
Symbolnode
RECORD
cocolex,
cocosem
cocolst
cocosem,
cocosym
cocosym,
Symbolset
cocogen2,
SyNr
PROC
cocogen2,
cocotst
OF BITSET
SyntaxError
PROC
PROC
coco,
(VAR
cocotst,
PROC
Errors
ok:BOOLEAN)
coco
PROC
cocotst,
(VAR
ok:BOOLEAN)
coco
(VAR
ok:BOOLEAN)
cocotst,
coco
CARDINAL
cocolex,
cocosem,
PROC
(VAR
cocosym,
PROC
cocosyn
sl,s2:Symbolset;
n:CARDINAL)
cocotst
(VAR
f:File;
FileIO, cocogen,
cocosym, Errors
PROC
line,col:CARDINAL)
(st:Status)
PROC
TestIfAllNtReached
cocotst
cocosyn
System,
TestCompleteness
cocosem,
cocosem
(symbols:Errorptr;
Errors,
Terminate
cocogra,
(spix:CARDINAL) : CARDINAL
cocosym,
WriteCard
cocosem,
16]
(eps,t,pr,nt,any,err)
cocosym,
typ
DIV
cocotst
cocogen2,
cocosym,
Symboltype
TestIfNtToTerm
cocogra,
ARRAY [0..maxterminals
(VAR
f:File;
ch:CHAR)
cocogen2,
cocolex,
nr:CARDINAL;
cocolst,
w: INTEGER)
FileIO, cocogen, cocogen2, cocogra, cocolex,
cocosem,
cocosym, cocosyn, cocotst, Errors
WriteInt
PROC (VAR f:File;
FileIO, coco
WriteLn
PROC (VAR f:File)
FileIO, coco, cocogen,
cocosyn,
WriteString
PROC
(VAR
FileIO,
cocosem,
WriteText
cocotst,
f:File;
coco,
nr:INTEGER;
w: INTEGER)
cocogen2,
cocogra,
cocolst,
cocosym,
cocogra,
cocolex,
cocolst,
Errors
s:ARRAY
cocogen,
cocosym,
cocolst,
OF CHAR)
cocogen2,
cocosyn,
cocotst,
Errors
PROC (VAR f:File; t:ARRAY OF CHAR; l: INTEGER)
FileIO, cocogen,
cocogen2, cocogra,
cocolex, cocosym,
cocotst, Errors
F
Program listings
This appendix contains the program listings of Coco, more than 3500 lines of Modula-2
source code. It is not our intention to describe the program step by step. At this point we
want to provide the reader with an overview of the function of the individual modules, and to
tell him where he should start reading, and which procedures he should further review in
order to understand the program. Modula-2 has a high degree of self-documentation, which
makes it possible to partition a large program into small modules that are easy to
understand, and furthermore to separate these modules into even smaller procedures that are
once more easy to understand. By reviewing the algorithms in Chapters 2, 3 and 7, it should
not be difficult for the reader to understand all the details of Coco.
F.1 Overview
Figure F.1 shows the phases of Coco with their modules and the data flow between them.
The lexical analyzer (cocolex) reads the compiler description and separates it into
tokens. The syntax analyzer (cocosyn) checks the syntax of the input stream and drives the
semantic processing program (cocosem) by activating semantic actions via action numbers.
In this phase, the symbol list (in cocosym) and the top-down graph (in cocogra) are
generated. The module cocogen generates the new semantics evaluator from the semantic
actions of the compiler description. Finally, the symbol list and the top-down graph are
analyzed in the grammar tests (cocotst), and if these tests have been successfully completed,
the new syntax analyzer with its parser tables is generated.
Since Coco was constructed by itself, the syntax analyzer (cocosyn) and its semantic
evaluator (cocosem) are examples of compiler parts produced by Coco.
[Figure: phases Lexical analysis, Syntax analysis, Semantic analysis, and Compiler generation; modules cocolex, cocosyn, cocosem, cocosym, cocogra, and cocogen; data: compiler description, symbols and attributes, symbol list, syntax analyzer]
Fig. F.1 Phases and modules of Coco
F.2 Module hierarchy
Coco consists of
1. 10 Coco-related modules
   coco       main module
   cocolex    lexical analyzer
   cocosyn    syntax analyzer
   cocosem    semantic evaluator
   cocogra    top-down graph handler
   cocosym    symbol list handler
   cocotst    grammar tests
   cocogen    generator of the new semantic evaluator
   cocogen2   generator of the new syntax analyzer and the parser tables
   cocolst    source list generator
2. 2 general-purpose standard modules
   Errors     general error module for compilers generated by Coco
   FileIO     input/output procedures
3. 1 operating system module (not part of Coco)
   System     dynamic memory management (heap)
Figure F.2 shows the module hierarchy. An arrow from module A to module B means that
A calls B.
Arrows leading to the operating system module and the standard modules are not shown
for simplicity. Those modules are used by almost all of the other modules, and are not a
direct part of Coco.
[Figure: module hierarchy, showing among others cocogen, cocogra, cocosym, cocolex, System, FileIO, and Errors]
Fig. F.2 Module hierarchy with relation 'uses procedures from'
F.3 Module descriptions
We will now give a short description of all modules of the Coco system. A diagram for each
module will show which procedures are called from other modules.
coco
coco is the main module. It opens the source file and the list file and calls the syntax
analyzer (Parse). When the syntax analysis is completed, the source file has been read, and
the symbol list and a top-down graph have been stored. The top-down graph is further
processed by inserting and deleting eps-nodes at certain positions (NewEpsBeforeDelNts,
DelRedundantEps) and the terminal start symbols are collected (FindDelSymbols,
GetSymbolSets). After that, coco calls the grammar tests (FindCircularRules, TestIfNtToTerm, TestCompleteness, TestIfAllNtReached, LL1Test) and generates the target compiler (GenSynFiles) if no errors are found. At the end, statistics about the compilation are written to the list file (PutStatistics), and all files are closed.
[Figure: coco and the imported procedures FindDelSymbols, GetSymbolSets, NewEpsBeforeDelNts, DelRedundantEps, GenSynFiles, PutStatistics, CloseFile, FindCircularRules, TestIfNtToTerm, TestCompleteness, TestIfAllNtReached, and LL1Test]
Fig. F.3 coco and the modules imported by it
cocolex
cocolex is the lexical analyzer of Coco. It reads the Cocol input, separates it into tokens,
and passes them together with their attributes to the syntax analyzer. Names and strings are
stored in a name list. Numbers are translated into their numeric value. The main procedure of
cocolex is GetSy.
cocosyn
cocosyn is the syntax analyzer of Coco and has been generated by Coco itself. It operates
according to the table-driven LL(1) parsing algorithm described in Section 2.5, and uses the
error-handling mechanism described in Section 2.6. cocosyn gets the source tokens from the
lexical analyzer (GetSy), analyzes them, and calls the procedure Semant to execute the
semantic actions.
Fig. F.4 cocosyn and the modules imported by it
cocosem
cocosem is the semantics evaluator of Coco. It has been generated by Coco itself and
contains the semantic actions of the attributed grammar of Coco. cocosem calls the
procedures for the generation and management of the symbol list and the top-down graph:
1. symbol handling: NewSy, GetSy, RepSy, SyNr;
2. attribute handling: NewAt, GetAt, CompleteAt;
3. top-down graph handling: NewNode, GetNode, RepNode, ConcatLeft, ConcatRight, GraphList;
4. generation of the semantic evaluator: OpenFile, CloseFile, OpenSem, StartCopy, Copy, InsertFramePart, GenAssign, EmitAssign, EmitAction;
5. handling of the semantic macros: NewMacro, GetMacroNr;
6. control over the entries into the name list: StopHash, RestartHash.
The listing of cocosem is an example of a large semantic evaluator generated by Coco. It is not useful to study cocosem itself, however; one should study the attributed grammar instead.
[Figure: cocosem and the imported procedures OpenFile, ConcatRight, CloseFile, Copy, InsertFramePart, StartCopy, GraphList, OpenSem, CompleteAt, GenAssign, NewMacro, GetMacroNr, and EmitAction]
Fig. F.5 cocosem and the modules imported by it
cocosym
The module cocosym handles the symbol list of Coco. It contains procedures to generate,
read, and modify symbol nodes, to search names in the symbol list, to enter, read, and check
attributes, and to generate and retrieve information about semantic macros. It also contains
procedures to determine the deletability of nonterminals, and to collect their terminal start
symbols. cocosym uses a few procedures from cocolex and cocogra.
[Figure: cocosym and the imported procedures ClearMarkList, Mark, and Marked]
Fig. F.6 cocosym and the modules imported by it
cocogra
The module cocogra handles the top-down graph. It contains procedures to generate, read, and modify graph nodes, to link subgraphs, and to print the entire top-down graph for tracing. cocogra also contains procedures to insert eps-nodes in front of deletable nonterminals and to remove redundant eps-nodes. To output the top-down graph, cocogra needs the syntax symbols and their names, which it gets from the modules cocosym and cocolex.
[Figure: cocogra and the imported procedures, among them RepSy]
Fig. F.7 cocogra and the modules imported by it
cocogen
The module cocogen generates the semantic evaluator of the target compiler from the semantic declarations and semantic actions of the input grammar. It contains procedures to read the frame module, to copy the semantic parts from the attributed grammar, and to translate attributes into semantic actions. cocogen uses no other modules of Coco except for the lexical analyzer, from which it gets the symbol names.
Fig. F.8 cocogen and the modules imported by it
cocotst
The module cocotst is a collection of procedures for the execution of the grammar tests described in Section 7.5. It uses the symbol list (from cocosym) and the top-down graph (from cocogra). For the output of error messages, cocotst needs the symbol names, which it obtains with the procedure GetName. To recognize the deletability of graph nodes and subgraphs, it uses the procedures Deletable and DelNode from cocogra.
[Figure: cocotst and the imported procedures Deletable, DelNode, ClearMarkList, Mark, and Marked]
Fig. F.9 cocotst and the modules imported by it
cocogen2
The module cocogen2 generates the syntax analyzer and the parser tables of the target compiler. The table values are obtained from the symbol list (with GetSy, RepSy, GetF, GetE, and GetA) and from the top-down graph (GetNode). Before the tables can be inserted into the syntax analyzer, cocogen2 transforms the top-down graph into G-code instructions. The syntax analyzer of the target compiler is assembled mainly from the frame parts (in the file cocosynframe), into which cocogen2 inserts the parser tables, some declarations, and grammar-specific names. For the output of statistics, cocogen2 uses the procedure GetName from the lexical analyzer.
[Figure: cocogen2 and the imported procedures, among them CopyFramePart from cocogen]
Fig. F.10 cocogen2 and the modules imported by it
cocolst
cocolst is called by the main program if errors have been detected during parsing. It reads
the input again and prints a source list with error messages.
Errors
Errors is a general-purpose error message module that can be used by all compilers
generated by Coco. It contains procedures for storing semantic and syntax errors, for
retrieving stored error messages, and for printing all of the stored error messages at the end of
the program. In addition, it contains procedures for handling implementation restrictions and
compiler errors.
FileIO
FileIO is a general-purpose module that contains screen and disk I/O procedures for
characters, strings, and numbers. It is based on five system modules which are not described
in this book. These are Terminal, MemTypes, OS, Toolbox and QuickDraw (see Inside
Macintosh [1985] and Wirth et al. [1986]).
System
System is an operating system module that among other things manages the heap.
F.4 Instructions on how to study the source code
The listings consist of the attributed grammar of Coco and all other modules in alphabetical
order. The reader should first study the source code of the main module coco to see how the
program is started and initialized. The lexical analyzer and the syntax analyzer are not
essential for an understanding of the other modules, so they may be skipped in the
beginning.
The central document that describes the actual translation is the attributed grammar.
The reader should study the attributed grammar and the procedures that are called from the
semantic actions in detail. It is recommended that the procedures belonging to a particular task be studied together. These tasks are:
handling the symbol list: NewSy, GetSy, RepSy, IsSy
handling the attributes: NewAt, GetAt, CompleteAt
handling the top-down graph: NewNode, GetNode, RepNode, ConcatLeft, ConcatRight, GraphList
generating the semantic evaluator: CloseFile, CopyFramePart, InsertFramePart
copying semantic parts: OpenSem, StartCopy, Copy
generating attribute assignments: GenAssign, EmitAction
handling semantic macros: NewMacro, GetMacroNr
controlling the name list entries: StopHash, RestartHash
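In coco.ATG the symbol-list procedures follow a get-modify-replace pattern. GetSy copies a symbol node into a local record, the semantic action changes fields, and RepSy writes the record back. SyNr returns null for an undeclared name. The following Python sketch imitates that pattern under simplifying assumptions: a linear search stands in for Coco's actual lookup, and only a few of the node fields are kept. The names are illustrative and are not Coco's interface.

```python
from dataclasses import dataclass, replace

NULL = 65535  # value of the constant null in coco.ATG ("no such symbol")


@dataclass
class SymbolNode:
    spix: int        # index of the symbol's name in the name list
    typ: str         # symbol type, e.g. "t", "pr", "nt", "err"
    start: int = 0   # root of the rule's top-down graph (0 = no rule yet)


class SymbolList:
    def __init__(self):
        self.nodes = []

    def sy_nr(self, spix):
        """Symbol number of the name spix, or NULL if it is not declared."""
        for nr, node in enumerate(self.nodes):
            if node.spix == spix:
                return nr
        return NULL

    def new_sy(self, spix, typ):
        self.nodes.append(SymbolNode(spix, typ))
        return len(self.nodes) - 1

    def get_sy(self, sy):
        return replace(self.nodes[sy])  # a copy, like the VAR record of GetSy

    def rep_sy(self, sy, node):
        self.nodes[sy] = replace(node)  # store the modified copy back


def store_symbol(symbols, spix, typ, errors):
    """Like the StoreSymbol macro: declare spix, or report error 1."""
    sy = symbols.sy_nr(spix)
    if sy == NULL:
        return symbols.new_sy(spix, typ)
    errors.append(1)
    return sy
```

Handing out copies means that a change has no effect on the list until it is written back explicitly. This mirrors the VAR-record protocol of the Modula-2 procedures.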
The procedures for the collection of the symbol sets and the execution of the grammar tests
may be studied in any order. The only procedures used almost everywhere are the procedures
for marking paths that have been previously visited in traversing the top-down graph
(ClearMarkList, Mark, and Marked in cocogra) and the procedures which check the
deletability of graphs and graph nodes (Deletable and DelNode in cocogra). These
procedures should be read first.
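The mark list can be pictured as a set of visited nodes that is cleared before each traversal. The Python code below simplifies matters by working on rule lists instead of Coco's top-down graph. It uses the mark list to find the nonterminals reachable from the start symbol, presumably the kind of check TestIfAllNtReached performs. For deletability it substitutes the usual fixed-point iteration. That is a standard technique and not necessarily how Deletable and DelNode in cocogra proceed.

```python
class MarkList:
    """Nodes already visited in the current traversal."""

    def __init__(self):
        self._marked = set()

    def clear(self):            # cf. ClearMarkList
        self._marked.clear()

    def mark(self, node):       # cf. Mark
        self._marked.add(node)

    def marked(self, node):     # cf. Marked
        return node in self._marked


def reached_nonterminals(rules, start, marks):
    """Nonterminals reachable from start. rules maps a nonterminal to a list of
    alternatives (lists of symbols); symbols that are not keys are terminals."""
    marks.clear()
    stack = [start]
    while stack:
        nt = stack.pop()
        if marks.marked(nt):
            continue                      # visited before: a cycle ends here
        marks.mark(nt)
        for alternative in rules[nt]:
            for sym in alternative:
                if sym in rules and not marks.marked(sym):
                    stack.append(sym)
    return {nt for nt in rules if marks.marked(nt)}


def deletable_nonterminals(rules):
    """Nonterminals that can derive the empty string (fixed-point iteration)."""
    dels, changed = set(), True
    while changed:
        changed = False
        for nt, alternatives in rules.items():
            if nt not in dels and any(all(s in dels for s in alt) for alt in alternatives):
                dels.add(nt)          # an empty alternative satisfies all()
                changed = True
    return dels
```

Without the mark list, the traversal would loop forever on a recursive rule such as S = a S. The marks also guarantee that each node is processed only once.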
As the last module, the reader should study cocogen2. It generates the parser tables and
the syntax analyzer, and uses the data structures generated by the other modules. The reader
should study these modules first to understand how the data structures are filled.
Before an implementation module is studied, the corresponding definition module
should be inspected. It describes the interface of the module, and contains the declarations and
descriptions of all exported objects. The procedures of an implementation module appear in
alphabetical order. Most of them are at the outermost level of the module. Only auxiliary
procedures that are clearly part of another procedure are nested within this procedure.
Each implementation module is followed by a cross-reference list. As an additional aid,
Appendix E contains an intermodular cross-reference list with the names and types of all
objects transferred between modules. This list also shows which modules export an object
and which import it.
Program listings in alphabetical order

coco.ATG                       attributed grammar                  228
coco.MOD                       main program                        241
cocogen.DEF, cocogen.MOD       generator of semantics processor    245
cocogen2.DEF, cocogen2.MOD     generator of the syntax analyzer    254
cocogra.DEF, cocogra.MOD       top-down graph manager              266
cocolex.DEF, cocolex.MOD       lexical analyzer                    274
cocolst.DEF, cocolst.MOD       source list generator               283
cocosem.DEF, cocosem.MOD       semantic evaluator of Coco          287
cocosemframe                   semantics evaluator frame           297
cocosym.DEF, cocosym.MOD       symbol list manager                 299
cocosyn.DEF, cocosyn.MOD       syntax analyzer                     316
cocosynframe                   syntax analyzer frame               328
cocotst.DEF, cocotst.MOD       grammar tests                       338
Errors.DEF, Errors.MOD         standard error module               348
FileIO.DEF, FileIO.MOD         input/output module                 356
System.DEF                     dynamic memory management           369
--======================= Attributed grammar of Coco ========================
--===========================================================================
-- This grammar is a documentation of the compiler compiler Coco,
-- but it is also an example how to use the Coco input language Cocol.
-- The grammar describes the construction of the parser tables and of
-- the semantic evaluator.
------------------------------------------------------- Moe 13.3.83 --------
GRAMMAR coco

-- coco      = GRAMMARSY IDENT
--             [SEMANTICSY DECLARATIONSY {any}]
--             [MACROSY {macrodef}]
--             TERMINALSY {symbol [attr] [aliasname]}
--             [PRAGMASY {symbol [attr] [semaction]}]
--             NONTERMINALSY {IDENT [attr] [aliasname]}
--             RULESSY {IDENT [attr] '=' expr '.'}
--             ENDGRAMSY .
-- expr      = term {'|' term} .
-- term      = fact {fact} .
-- fact      = ( symbol [attr]
--             | EPSSY
--             | ANYSY
--             | semaction
--             | '(' expr ')'
--             | '[' expr ']'
--             | '{' expr '}' ) .
-- attr      = (inattr [';' outattr] | outattr) .
-- inattr    = INSY ':' (IDENT | NUMBER) {',' (IDENT | NUMBER)} .
-- outattr   = OUTSY ':' IDENT {',' IDENT} .
-- semaction = SEMSY ( '(' IDENT ')' | {any} ) ENDSEMSY .
-- macrodef  = SEMSY ':' IDENT ':' {any} ENDSEMSY .
-- symbol    = IDENT | STRING .
-- aliasname = ALIASSY symbol .
SEMANTICS DECLARATIONS
--===========================================================================

FROM cocogen IMPORT Attrtype, CloseFile, Copy, EmitAction, GenAssign,
                    InsertFramePart, OpenFile, OpenSem, StartCopy;
FROM cocogra IMPORT alts, rules, rootloc, ConcatLeft, ConcatRight,
                    GetNode, GraphList, Graphnode, NewNode, RepNode;
FROM cocolex IMPORT typ, line, col, ddt, RestartHash, StopHash;
FROM cocosym IMPORT gramspix, CompleteAt, Direction, GetAt, GetMacroNr,
                    GetSy, NewAt, NewMacro, NewSy, RepSy, Symbolnode,
                    Symboltype, SyNr;
FROM Errors  IMPORT CompErr, Restriction, SemErr;
FROM SYSTEM  IMPORT VAL;

CONST
  null = 65535;                    -- null symbol

TYPE
  Usage = (def, check, use);       -- symbol usage

VAR
  -- symbol nodes
  sn:              Symbolnode;     -- symbol node
  sy, sy1:         CARDINAL;       -- symbol numbers
  rootsy:          CARDINAL;       -- start symbol of grammar
  eofsy:           CARDINAL;       -- endfile symbol (always Nr. 0)
  -- graph nodes
  gn:              Graphnode;      -- graph node
  gp,gp1,gp2,gp3:  CARDINAL;       -- ptr to start of graphs
  gl,gl1,gl2,gl3:  CARDINAL;       -- ptr to right open ends of graphs
  dd,dd1,dd2:      BOOLEAN;        -- is graph deletable ?
  gpo:             CARDINAL;       -- auxiliary ptr
  firstfact:       BOOLEAN;        -- TRUE if first factor in term
  -- attribute processing
  kind:            Usage;          -- usage of attribute
  styp:            Symboltype;     -- (eps,t,pr,nt,any,err)
  dir, dir1:       Direction;      -- input/output attribute
  count:           CARDINAL;       -- attribute counter
  n:               CARDINAL;       -- value of an attribute constant
  -- generation of semantic evaluator
  sem1,sem2,sem3:  CARDINAL;       -- semantic actions
  firstsymbol:     BOOLEAN;        -- current symbol the first in action ?
  -- various
  ok:              BOOLEAN;        -- error indicator
  spix, spix1:     CARDINAL;       -- auxiliaries
  dummy:           BOOLEAN;
-- SEMANTICSTACK                                Stack to save semantic values
--===========================================================================
MODULE SEMANTICSTACK;
IMPORT CompErr, Restriction;
EXPORT Pop, Push;
CONST maxstacksize = 70;
VAR
  stack: ARRAY[1..maxstacksize] OF CARDINAL;
  sp:    CARDINAL;

PROCEDURE Pop(): CARDINAL;
VAR x: CARDINAL;
BEGIN
  IF sp=0 THEN CompErr(6);
  ELSE x:=stack[sp]; DEC(sp);
  END;
  RETURN x;
END Pop;

PROCEDURE Push(x:CARDINAL);
BEGIN
  IF sp<maxstacksize THEN
    INC(sp); stack[sp]:=x;
  ELSE Restriction(14);
  END;
END Push;

BEGIN
  sp:=0;
END SEMANTICSTACK;
-- Report semantic error
--===========================================================================
PROCEDURE Error(nr:CARDINAL);
BEGIN SemErr(nr,line,col); END Error;

MACROS

sem :AssignId1:
  INC(count);
  CASE kind OF
    use:
      IF styp=nt THEN
        GetAt(!sy, !count, ^spix1, ^dir1);
        IF spix1<>0 THEN
          IF dir=dir1
            THEN GenAssign(!nonterm, !spix1, !spix);
            ELSE Error(8); END;
        END;
      END;
  | check:
      IF styp=nt THEN
        GetAt(!sy, !count, ^spix1, ^dir1);
        IF spix1<>0 THEN
          IF spix<>spix1 THEN Error(9); END;
          IF dir<>dir1 THEN Error(8); END;
        END;
      END;
  | def:
      NewAt(!sy, !spix, !dir);
  END; -- CASE
endsem

sem :AssignId2:
  INC(count);
  CASE kind OF
    use:
      IF styp=t THEN
        GenAssign(!term, !spix, !count);
      ELSIF styp=nt THEN
        GetAt(!sy, !count, ^spix1, ^dir1);
        IF spix1<>0 THEN
          IF dir=dir1
            THEN GenAssign(!nonterm, !spix, !spix1)
            ELSE Error(8);
          END;
        END;
      END;
  | check:
      IF styp=nt THEN
        GetAt(!sy, !count, ^spix1, ^dir1);
        IF spix1<>0 THEN
          IF spix<>spix1 THEN Error(9); END;
          IF dir<>dir1 THEN Error(8); END;
        END;
      END;
  | def:
      NewAt(!sy, !spix, !dir);
      IF styp=pr THEN
        GenAssign(!term, !spix, !count);
      END;
  END; -- CASE
endsem

sem :AssignNumber:
  INC(count);
  IF kind=use THEN
    IF styp=nt THEN
      GetAt(!sy, !count, ^spix1, ^dir1);
      IF spix1<>0 THEN
        IF dir=dir1
          THEN GenAssign(!const, !spix1, !n);
          ELSE Error(8);
        END;
      END;
    END;
  ELSE Error(10);
  END;
endsem

sem :CheckAttr:
  IF NOT CompleteAt(!sy, !count) THEN
    Error(6);
  END;
endsem

sem :Copy:
  Copy(typ, col)
endsem

sem :InitCopy:
  StartCopy(1)
endsem
sem :PopPointers:
  firstfact:=VAL(BOOLEAN, Pop());
  dd1:=VAL(BOOLEAN, Pop()); gl1:=Pop(); gp1:=Pop();
  dd:=VAL(BOOLEAN, Pop());  gl:=Pop();  gp:=Pop();
  gpo:=0
endsem

sem :PushPointers:
  Push(!gp);  Push(!gl);  Push(!VAL(CARDINAL, dd));
  Push(!gp1); Push(!gl1); Push(!VAL(CARDINAL, dd1));
  Push(!VAL(CARDINAL, firstfact));
endsem

sem :StoreSymbol:
  sy:=SyNr(!spix);
  IF sy=null
    THEN sy:=NewSy(spix, styp)
    ELSE Error(1);
  END;
endsem

TERMINALS
--===========================================================================
-- key words
ALIASSY        alias "ALIAS"           --  1: ALIAS
ANYSY          alias "any"             --  2: ANY, any
DECLARATIONSY  alias "DECLARATIONS"    --  3: DECLARATIONS
ENDGRAMSY      alias "ENDGRAM"         --  4: ENDGRAM
ENDSEMSY       alias "endsem"          --  5: ENDSEM, endsem
EPSSY          alias "eps"             --  6: EPS, eps
GRAMMARSY      alias "GRAMMAR"         --  7: GRAMMAR
INSY           alias "in"              --  8: IN, in
MACROSY        alias "MACROS"          --  9: MACROS
NONTERMINALSY  alias "NONTERMINALS"    -- 10: NONTERMINALS
OUTSY          alias "out"             -- 11: OUT, out
PRAGMASY       alias "PRAGMAS"         -- 12: PRAGMAS
RULESSY        alias "RULES"           -- 13: RULES
SEMSY          alias "sem"             -- 14: SEM, sem
SEMANTICSY     alias "SEMANTICS"       -- 15: SEMANTICS
TERMINALSY     alias "TERMINALS"       -- 16: TERMINALS
-- terminal classes
IDENT   <out:spix>   alias identifier  -- 17
STRING  <out:spix>                     -- 18
NUMBER  <out:n>                        -- 19
-- characters                          -- 20 ... 31
nococosy
NONTERMINALS
--===========================================================================
coco       alias "correct grammar"
           -- recognizes the whole compiler description
expr       <out:gp,gl,dd>  alias expression
           -- recognizes an expression and builds its TDG.
           -- gp points to the root of the TDG
           -- gl points to right open ends of the TDG
           -- dd indicates if the TDG is deletable
term       <out:gp1,gl1,dd1>  alias alternative
           -- recognizes an alternative and builds its TDG.
           -- gp1 points to the root of the TDG
           -- gl1 points to right open ends of the TDG
           -- dd1 indicates if the TDG is deletable
fact       <in:gpo,firstfact; out:gp2,gl2,dd2,gpo>  alias symbol
           -- recognizes a component and builds its TDG.
           -- gp2 points to the root of the TDG
           -- gl2 points to right open ends of the TDG
           -- dd2 indicates if the TDG is deletable
           -- gpo points to the predecessor of fact or is 0
           -- firstfact is TRUE, if fact is the first one in the term
attr       <in:sy,styp,kind; out:sem1,sem2,count>  alias attribute
           -- recognizes input/output attributes for the symbol sy
           -- with type styp.
           -- kind=def:   used in declaration context
           --             sem1=0. sem2=0 (except of pragmas)
           -- kind=check: used on the left-hand side of rules
           --             sem1=0, sem2=0
           -- kind=use:   used on the right-hand side of rules
           --             sem1: sem.no. of input attribute evaluation
           --             sem2: sem.no. of output attribute evaluation
           -- count is the nr. of attributes in attr
inattr     <in:sy,styp,kind,count; out:sem1,count>  alias "in-attribute"
           -- recognizes input/output attributes for the symbol sy
           -- with type styp (sy must be a nonterminal).
           -- kind=def:   used in declaration context
           --             sem1=0.
           -- kind=check: used on the left-hand side of rules
           --             sem1=0.
           -- kind=use:   used on the right-hand side of rules
           --             sem1: sem.no. of input attribute evaluation
           -- count is the no. of attributes in inattr
outattr    <in:sy,styp,kind,count; out:sem2,count>  alias "out-attribute"
           -- recognizes input/output attributes for the symbol sy
           -- with type styp.
           -- kind=def:   used in declaration context
           --             sem2=0.
           -- kind=check: used on the left-hand side of rules
           --             sem2=0.
           -- kind=use:   used on the right-hand side of rules
           --             sem2: sem.no. of output attribute evaluation
           -- count is the no. of attributes in outattr
semaction  <out:sem3>  alias "semantic action"
           -- recognizes a semantic action and generates a CASE block
           -- in Semant. sem2 is the action number.
macrodef   alias "semantic macro"
symbol     <out:spix>
           -- recognizes a name or a string
aliasname  <in:sy>  alias "alias name"
           -- recognizes a name which is used for the symbol sy in
           -- syntax error messages in the generated compiler.
--=============================== grammar rules =============================

RULES
coco =
  GRAMMARSY
  IDENT <out:gramspix>    sem rules:=0; alts:=0;
                              OpenFile(gramspix);
                          endsem
  [ SEMANTICSY            sem StopHash; endsem
    DECLARATIONSY         sem (InitCopy) endsem
    { any                 sem (Copy) endsem
    }
  ]                       sem RestartHash;
                              InsertFramePart;
                          endsem
  [ MACROSY
    { macrodef }
  ]
  TERMINALSY              sem eofsy:=NewSy(!0,!t) endsem
  { symbol <out:spix>     sem styp:=t endsem
                          sem (StoreSymbol) endsem
    [ attr <in:sy,t,def; out:sem1,sem2,count> ]
    [ aliasname <in:sy> ]
  }
  [ PRAGMASY
    { symbol <out:spix>   sem styp:=pr endsem
                          sem (StoreSymbol) endsem
      [ attr <in:sy,pr,def; out:sem1,sem2,count>
                          sem GetSy(!sy,^sn); sn.sem1:=sem2;
                              RepSy(!sy,!sn);
                          endsem
      ]
      [ semaction <out:sem3>
                          sem GetSy(!sy,^sn); sn.sem2:=sem3;
                              RepSy(!sy,!sn);
                          endsem
      ]
    }
  ]
  NONTERMINALSY
  { IDENT <out:spix>      sem styp:=nt endsem
                          sem (StoreSymbol) endsem
    [ attr <in:sy,nt,def; out:sem1,sem2,count> ]
    [ aliasname <in:sy> ]
  }                       sem rootsy:=SyNr(!gramspix);
                              IF rootsy=null THEN Error(2); END;
                          endsem
  RULESSY
  { IDENT <out:spix>      sem sy:=SyNr(!spix);
                              IF sy=null THEN
                                Error(3); sy:=NewSy(!spix,!err)
                              END;
                              GetSy(!sy,^sn);
                              IF (sn.typ<>nt) AND (sn.typ<>err) THEN
                                Error(4);
                              END;
                              IF sn.start<>0 THEN Error(5); END;
                              sy1:=sy; count:=0; styp:=sn.typ
                          endsem
    [ attr <in:sy,styp,check; out:sem1,sem2,count> ]
                          sem (CheckAttr) endsem
    '='
    expr <out:gp,gl,dd>   sem GetSy(!sy1,^sn);
                              sn.start:=gp; sn.del:=dd;
                              RepSy(!sy1,!sn);
                              INC(rules);
                          endsem
    '.'
  }                       sem rootloc:=NewNode(!nt,!rootsy,!0);
                              gp1:=NewNode(!t,!eofsy,!0);
                              gl:=rootloc; gl1:=gp1;
                              ConcatRight(rootloc,gl,!gp1,!gl1)
                          endsem
  ENDGRAMSY               sem IF ddt["L"] THEN GraphList; END;
                              CloseFile;
                          endsem.
expr <out:gp,gl,dd> =
  term <out:gp,gl,dd>       sem INC(alts); endsem
  { '|'
    term <out:gp1,gl1,dd1>  sem INC(alts);
                                ConcatLeft(gp,gl,!gp1,!gl1);
                                dd:=dd OR dd1
                            endsem
  }.

term <out:gp1,gl1,dd1> =
                            sem gpo:=0 endsem
  fact <in:gpo,TRUE; out:gp1,gl1,dd1,gpo>
  { fact <in:gpo,FALSE; out:gp2,gl2,dd2,gpo>
                            sem IF gp2<>0 THEN
                                  ConcatRight(gp1,gl1,!gp2,!gl2);
                                  dd1:=dd1 AND dd2;
                                END;
                            endsem
  }.
fact <in:gpo,firstfact; out:gp2,gl2,dd2,gpo> =
  ( symbol <out:spix>         sem sy:=SyNr(!spix);
                                  IF sy=null THEN
                                    Error(3); sy:=NewSy(!spix,!err)
                                  END;
                                  GetSy(!sy,^sn);
                                  IF sn.typ=pr THEN Error(16); END;
                                  gp2:=NewNode(!sn.typ,!sy,!line);
                                  gl2:=gp2; dd2:=FALSE; gpo:=gp2;
                                  count:=0; styp:=sn.typ
                              endsem
    [ attr <in:sy,styp,use; out:sem1,sem2,count>
                              sem GetNode(!gp2,^gn);
                                  gn.sem1:=sem1; gn.sem2:=sem2;
                                  RepNode(!gp2,!gn)
                              endsem
    ]                         sem (CheckAttr) endsem
  | EPSSY                     sem gp2:=NewNode(!eps,!0,!line);
                                  gl2:=gp2; dd2:=TRUE; gpo:=gp2
                              endsem
  | ANYSY                     sem gp2:=NewNode(!any,!0,!line);
                                  gl2:=gp2; dd2:=FALSE; gpo:=gp2
                              endsem
  | semaction <out:sem3>      sem IF gpo=0 THEN
                                    gp2:=NewNode(!eps,!0,!line);
                                    gl2:=gp2; dd2:=TRUE;
                                    GetNode(!gp2,^gn); gn.sem3:=sem3;
                                    RepNode(!gp2,!gn);
                                  ELSE
                                    GetNode(!gpo,^gn); gn.sem3:=sem3;
                                    RepNode(gpo,gn);
                                    gp2:=0; gl2:=0; gpo:=0
                                  END;
                              endsem
  | '('                       sem (PushPointers) endsem
    expr <out:gp2,gl2,dd2>
    ')'                       sem (PopPointers) endsem
  | '['                       sem (PushPointers) endsem
    expr <out:gp,gl,dd>       sem gp2:=NewNode(!eps,!0,!line);
                                  gl2:=gp2;
                                  ConcatLeft(gp,gl,!gp2,!gl2);
                                  gp2:=gp; gl2:=gl; dd2:=TRUE;
                              endsem
    ']'                       sem (PopPointers) endsem
  | '{'                       sem (PushPointers) endsem
    expr <out:gp,gl,dd>       sem gp2:=NewNode(!eps,!0,!line);
                                  gl2:=gp2;
                                  ConcatRight(gp,gl,!gp,!gl);
                                  ConcatLeft(gp,gl,!gp2,!gl2);
                                  gp2:=gp; dd2:=TRUE;
                                  -- gl2 is link of eps
                              endsem
    '}'                       sem (PopPointers) endsem
                              sem IF firstfact THEN
                                    gp3:=gp2; gl3:=gl2;
                                    gp2:=NewNode(!eps,!0,!line); gl2:=gp2;
                                    ConcatRight(gp2,gl2,!gp3,!gl3);
                                  END;
                              endsem
  ).
attr <in:sy,styp,kind; out:sem1,sem2,count> =
                              sem sem1:=0; sem2:=0 endsem
  ( inattr <in:sy,styp,kind,0; out:sem1,count>
    [ ';' outattr <in:sy,styp,kind,count; out:sem2,count> ]
  | outattr <in:sy,styp,kind,0; out:sem2,count>
  ).

inattr <in:sy,styp,kind,count; out:sem1,count> =
  INSY                        sem IF styp<>nt THEN Error(7); END;
                                  dir:=down;
                              endsem
  ':'
  ( IDENT <out:spix>          sem (AssignId1) endsem
  | NUMBER <out:n>            sem (AssignNumber) endsem
  )
  { ','
    ( IDENT <out:spix>        sem (AssignId1) endsem
    | NUMBER <out:n>          sem (AssignNumber) endsem
    )
  }                           sem IF kind=use THEN
                                    EmitAction(!line,^sem1)
                                  END;
                              endsem.
outattr <in:sy,styp,kind,count; out:sem2,count> =
  OUTSY                       sem dir:=up endsem
  ':'
  IDENT <out:spix>            sem (AssignId2) endsem
  { ','
    IDENT <out:spix>          sem (AssignId2) endsem
  }                           sem IF (kind=use) OR (styp=pr) THEN
                                    EmitAction(!line,^sem2);
                                  END;
                              endsem.

semaction <out:sem3> =
  SEMSY                       sem StopHash; firstsymbol:=TRUE endsem
  ( '('                       sem RestartHash endsem
    IDENT <out:spix>          sem GetMacroNr(!spix,^sem3);
                                  IF sem3=0 THEN Error(12); END;
                              endsem
    ')'
  | { any                     sem IF firstsymbol THEN
                                    firstsymbol:=FALSE;
                                    OpenSem(!line,^sem3);
                                    StartCopy(!col)
                                  END;
                                  Copy(!typ,!col)
                              endsem
    }                         sem RestartHash; endsem
  )
  ENDSEMSY.

macrodef =
  SEMSY
  ':'
  IDENT <out:spix>            sem OpenSem(!line,^sem3);
                                  NewMacro(!spix,!sem3,^ok);
                                  IF NOT ok THEN Error(11); END;
                                  StopHash; firstsymbol:=TRUE;
                              endsem
  ':'
  { any                       sem IF firstsymbol THEN
                                    firstsymbol:=FALSE; StartCopy(col)
                                  END;
                                  Copy(!typ,!col)
                              endsem
  }
  ENDSEMSY                    sem RestartHash endsem.

symbol <out:spix> =
  ( IDENT <out:spix> | STRING <out:spix> ).

aliasname <in:sy> =
  ALIASSY
  symbol <out:spix>           sem GetSy(!sy,^sn);
                                  sn.aliasspix:=spix;
                                  RepSy(!sy,!sn);
                              endsem.

ENDGRAM
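The dd attributes in the rules expr, term and fact combine local deletability flags. Alternatives are joined with OR and the factors of a term with AND. Options, iterations and EPS yield TRUE, while every symbol occurrence yields FALSE, even a nonterminal. The Python sketch below evaluates the same combination over a small syntax tree. The tree classes are invented for illustration. Whether a nonterminal can actually derive the empty string is presumably settled later, because the main program calls FindDelSymbols after parsing.

```python
from dataclasses import dataclass


@dataclass
class Sym:          # terminal or nonterminal occurrence (ANY behaves alike)
    name: str

@dataclass
class Eps:          # EPS
    pass

@dataclass
class Group:        # ( expr )
    body: object

@dataclass
class Opt:          # [ expr ]
    body: object

@dataclass
class Rep:          # { expr }
    body: object

@dataclass
class Term:         # fact {fact}
    facts: list

@dataclass
class Expr:         # term {'|' term}
    terms: list


def dd(node) -> bool:
    """Local deletability flag, combined as in the expr/term/fact rules."""
    if isinstance(node, Expr):
        return any(dd(t) for t in node.terms)    # dd := dd OR dd1
    if isinstance(node, Term):
        return all(dd(f) for f in node.facts)    # dd1 := dd1 AND dd2
    if isinstance(node, Group):
        return dd(node.body)                     # inner expr delivers dd2 directly
    if isinstance(node, (Eps, Opt, Rep)):
        return True                              # dd2 := TRUE
    if isinstance(node, Sym):
        return False                             # dd2 := FALSE, even for a nonterminal
    raise TypeError(f"unexpected node {node!r}")
```

For example, `a [b] | {c}` is locally deletable because of its second alternative, while `a [b]` on its own is not.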
[Cross-reference list for coco.ATG]
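The macros PushPointers and PopPointers are needed because the nested rule instances for (...), [...] and {...} reuse the same global variables gp, gl, dd, gp1, gl1, dd1 and firstfact. Those values are saved on SEMANTICSTACK before the inner expr is parsed and restored in reverse order afterwards. The following Python sketch mirrors that protocol. The two exception classes stand in for Restriction(14) and CompErr(6), and keeping the variables in a dictionary is an assumption made for illustration. Python stores the BOOLEAN values directly, where the listing converts them with VAL.

```python
class StackOverflow(Exception):      # plays the role of Restriction(14)
    pass


class StackUnderflow(Exception):     # plays the role of CompErr(6)
    pass


class SemanticStack:
    def __init__(self, maxsize=70):  # maxstacksize = 70 in coco.ATG
        self.items, self.maxsize = [], maxsize

    def push(self, x):
        if len(self.items) >= self.maxsize:
            raise StackOverflow
        self.items.append(x)

    def pop(self):
        if not self.items:
            raise StackUnderflow
        return self.items.pop()


SAVED = ("gp", "gl", "dd", "gp1", "gl1", "dd1", "firstfact")


def push_pointers(stack, st):
    for name in SAVED:               # same order as the PushPointers macro
        stack.push(st[name])


def pop_pointers(stack, st):
    for name in reversed(SAVED):     # reverse order, as in PopPointers
        st[name] = stack.pop()
    st["gpo"] = 0                    # PopPointers also resets gpo
```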
(* Coco                     Compiler compiler                  Moe 27.12.83
   ====
   This is the main module of Coco. It controls the execution of the
   compiler compiler. It
     a) opens and closes the files
     b) initializes the scanner
     c) calls the parser
     d) calls the procedures which collect the symbol sets
     e) calls the grammar test procedures
     f) calls the procedure which generates the compiler
     g) calls the lister to print a listing with error messages

   Implementation restrictions:
     1: cocolex   Hash       Hash table full
     2: cocolex   Hash       Name list full
     3: cocolex   PushInc    Include stack overflow
     4: cocolex   EnQueue    Attribute queue overflow
     5: cocogra   NewNode    Too many nodes in TDG (>600)
     6: cocosym   NewSy      Symbol list overflow (>199)
     7: cocosym   NewSy      Too many terminals (>127)

   Compiler errors:
     1: cocolex   PopInc       Include stack underflow
     2: cocolex   DeQueue      Attribute queue underflow
     3: cocosym   GetAt        Try to get attribute inf. for a terminal
     4: cocogen   OpenFile     Semantic frame not found
     5: cocogen2  GenSynFiles  Parser frame not found
     6: cocogen2  NewAdr       Fixups already resolved

   Trace switches: can be set by "$D letter {letter}" (without spaces)
     A: cocosyn                      Print parser input (remove comments!)
     B: cocosyn                      Trace parser run (remove comments!)
     C: cocogra  DelGraph            Print visited nodes
     D: cocotst  FindCircularRules   Print derivations between single nt's
     E: cocotst  TestIfNtToTerm      Trace flow of algorithm
     F: cocotst  CheckAlternatives   Print visited nodes
     G: cocosym  CollectFirstSet     Print visited nodes
     H: cocosym  GetFirstSet         Print resulting set
     I: cocosym  GetFollowSets       Print resulting sets
     J: cocosym  CollectFollowSets   Print visited nodes
     K: cocosym                      Print sets of term.starts and succ.s
     L: cocosem                      Print generated TDG
------------------------------------------------------------------------*)

MODULE Coco;

FROM cocogen  IMPORT filesopen, CloseFile;
FROM cocogen2 IMPORT GenSynFiles, PutStatistics;
FROM cocogra  IMPORT DeleteRedundantEps, NewEpsBeforeDelNts;
FROM cocolex  IMPORT ddt, src, lst;
FROM cocolst  IMPORT PrintListing;
FROM cocosym  IMPORT FindDelSymbols, GetSymbolSets;
FROM cocosyn  IMPORT Parse, printinput, printnodes;
FROM cocotst  IMPORT FindCircularRules, LL1Test, TestCompleteness,
                     TestIfAllNtReached, TestIfNtToTerm;
FROM Errors   IMPORT GetNumberOfErrors;
FROM FileIO   IMPORT con, File, Done, Open, Close, Read, WriteInt,
                     WriteLn, WriteString;
FROM System   IMPORT Terminate, normal;

VAR
  ch:        CHAR;
  correct:   BOOLEAN;
  ll1:       BOOLEAN;                (*TRUE if grammar is LL(1)*)
  lstn:      ARRAY[0..63] OF CHAR;   (*list file name*)
  ok:        BOOLEAN;
  semerrors: CARDINAL;
  synerrors: CARDINAL;

(* ChangeExtension                      Change extension of file name
-------------------------------------------------------------------------*)
PROCEDURE ChangeExtension(VAR old,new:ARRAY OF CHAR; ext:ARRAY OF CHAR);
VAR i,j: INTEGER;
BEGIN
  i:=0;
  WHILE (i<=HIGH(old)) AND (old[i]<>0C) DO i:=i+1; END;
  WHILE (i>=0) AND (old[i]<=" ") DO DEC(i) END;
  j:=i;
  WHILE (j>=0) AND (old[j]<>".") DO DEC(j) END;
  IF j>=0 THEN i:=j-1; END;
  FOR j:=0 TO i DO new[j]:=old[j]; END;
  new[i+1]:=".";
  new[i+2]:=ext[0];
  new[i+3]:=ext[1];
  new[i+4]:=ext[2];
  new[i+5]:=0C;
END ChangeExtension;

BEGIN
  WriteString(con,"Coco - Compiler Compiler Vs 4.1$");
  Open(src,0,"",FALSE);
  IF NOT Done THEN Terminate(normal) END; (*cancel*)
  ChangeExtension(src^.name,lstn,"LST");
  Open(lst,src^.volRef,lstn,TRUE);
  WriteString(lst,"Coco - Compiler Compiler Vs 4.1$");
  WriteString(lst," (Source file: "); WriteString(lst,src^.name);
  WriteString(lst,")$$");
  WriteString(con,"parsing");
  Parse(correct);                            (*parse input grammar*)
  GetNumberOfErrors(synerrors,semerrors);    (*check for errors*)
  IF synerrors+semerrors<>0 THEN
    IF filesopen THEN CloseFile END;
    WriteString(con,"$listing");
    PrintListing;
    WriteString(con,"$Compilation terminated. ");
    WriteInt(con,synerrors+semerrors,0);
    WriteString(con," errors detected. Press any key.$");
    Close(src); Close(lst);
    Read(con,ch);
    Terminate(normal);
  END;
  WriteString(con,"$evaluating$");
  FindDelSymbols;
  NewEpsBeforeDelNts;
  DeleteRedundantEps;
  GetSymbolSets;
  TestCompleteness(ok);
  IF ok THEN TestIfAllNtReached(ok); END;
  IF ok THEN FindCircularRules(ok); END;
  IF ok THEN TestIfNtToTerm(ok); END;
  IF ok THEN LL1Test(ll1); END;
  IF NOT ok OR NOT ll1 THEN
    WriteString(con,"listing$");
    WriteLn(lst); WriteLn(lst);
    PrintListing;
  END;

  IF ok THEN
    WriteString(con,"writing$");
    GenSynFiles;
    PutStatistics;
  END;

  IF NOT ok THEN
    WriteString(con,"Compilation ended with errors in grammar tests.");
  ELSIF NOT ll1 THEN
    WriteString(con,"Compilation ended with LL(1) errors.");
  ELSE
    WriteString(con,"Compilation completed. No errors detected.");
  END;
  Close(src); Close(lst);
  WriteString(con,"  Press any key.$"); Read(con,ch);
END Coco.
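The index arithmetic in ChangeExtension is easy to misread in Modula-2. Below are the same steps in Python, as a sketch that assumes an ordinary string instead of a 0C-terminated character array. The procedure skips trailing blanks, cuts the name at its last period, and appends a period plus the first three characters of the new extension. The function name `change_extension` is illustrative.

```python
def change_extension(old: str, ext: str) -> str:
    """Return old with its extension replaced by (at most) three characters of ext."""
    name = old.split("\0", 1)[0]          # a 0C character ends a Modula-2 string
    i = len(name)
    while i > 0 and name[i - 1] <= " ":   # drop trailing blanks, as the listing does
        i -= 1
    name = name[:i]
    dot = name.rfind(".")                 # search backwards for the last '.'
    if dot >= 0:
        name = name[:dot]
    return name + "." + ext[:3]


print(change_extension("coco.ATG", "LST"))   # coco.LST
```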
[Cross-reference list for coco.MOD]
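The trace switches listed in the header of coco.MOD end up in the flag array ddt, which the attributed grammar queries as ddt["L"]. The Python sketch below shows one way a "$D letter {letter}" pragma could be decoded. The function name and the decision to stop at the first character that is not an upper-case letter are assumptions, because the scanner code that actually sets ddt is not part of this section.

```python
import string


def parse_trace_switches(pragma: str) -> dict:
    """Map each letter A..Z to True if it appears in a "$D letter {letter}" pragma."""
    ddt = {ch: False for ch in string.ascii_uppercase}
    if not pragma.startswith("$D"):
        return ddt
    for ch in pragma[2:]:
        if ch not in ddt:
            break          # the pragma is written without spaces; stop at anything else
        ddt[ch] = True
    return ddt


flags = parse_trace_switches("$DAL")
print(flags["A"], flags["L"], flags["B"])   # True True False
```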
App. F
cocogen.DEF

(* cocogen     Generator of compiler files                       Moe 28.12.83

   This module generates the semantic evaluator. It
   a) copies symbols from the input grammar to the evaluator
   b) copies text from the semantic frame to the evaluator
   c) stores attribute assignments (and emits them as semantic actions)
--------------------------------------------------------------------------*)

DEFINITION MODULE cocogen;

FROM FileIO IMPORT File;

TYPE
  Attrtype = (term, nonterm, const);

VAR
  maxsem:    CARDINAL;    (*number of last semantic action*)
  filesopen: BOOLEAN;     (*files may remain open after a syntax error*)

PROCEDURE CloseFile;
(* Closes the file where the semantic evaluator is written to*)

PROCEDURE Copy(typ,col:CARDINAL);
(* Copies the source symbol typ at column col to the generated
   semantic file*)

PROCEDURE CopyFramePart(VAR f1,f2:File; s:ARRAY OF CHAR);
(* Copies file f1 to file f2 until string s occurs. s is not
   copied*)

PROCEDURE EmitAction(line:CARDINAL; VAR sem:CARDINAL);
(* Emits the stored attribute assignments as a semantic action. line
   is used to print a comment. sem is the number of the new action*)

PROCEDURE GenAssign(typ:Attrtype; left,right:CARDINAL);
(* Generates an assignment arg(left)<--arg(right). typ indicates if
   arg(right) is a terminal attribute, a nonterminal attribute or
   a constant*)

PROCEDURE InsertFramePart;
(* Inserts the middle part in the generated semantics file*)

PROCEDURE OpenFile(spix:CARDINAL);
(* Opens the file where the semantic evaluator is written to. spix is
   the grammar name in Cocol. The name of the generated file is the
   grammar name with the suffix "sem"*)

PROCEDURE OpenSem(line:CARDINAL; VAR sem:CARDINAL);
(* Prints the start of a new semantic action (case-number of a new
   case-block). line is used to print a comment. sem is the number
   of the new action*)

PROCEDURE StartCopy(col:CARDINAL);
(* Saves col as the leftmost column in the following semantic action*)

END cocogen.
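CopyFramePart is specified above as copying file f1 to file f2 until the string s occurs, without copying s. The following is a minimal C sketch of the same scanning loop. It makes two assumptions. First, it works on in-memory strings instead of FileIO files. Second, the name copy_frame_part and its signature are hypothetical.

Like the Modula-2 implementation listed below, the sketch writes a partially matched prefix back out unchanged and never backs up. A marker that overlaps a false start is therefore not found; for example, "-->x" is missed inside "--->x".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string-based analogue of CopyFramePart.
   Appends the text of src that precedes the first occurrence of marker to
   the NUL-terminated buffer dst (which must be large enough). Returns a
   pointer just past the marker, or NULL if the marker does not occur, in
   which case all of src has been appended. */
const char *copy_frame_part(const char *src, const char *marker, char *dst)
{
    size_t m = strlen(marker);
    size_t out = strlen(dst);
    const char *p = src;

    assert(m > 0);
    while (*p != '\0') {
        if (*p == marker[0]) {                  /* check if marker occurs */
            size_t i = 0;
            while (i + 1 < m && p[i] == marker[i])
                i++;                            /* consume matching prefix */
            if (p[i] == marker[i]) {            /* found: stop, skip marker */
                dst[out] = '\0';
                return p + m;
            }
            memcpy(dst + out, p, i);            /* not found: write prefix */
            out += i;
            p += i;
            if (*p == '\0')
                break;
            dst[out++] = *p++;                  /* ...and the mismatching char */
        } else {
            dst[out++] = *p++;                  /* normal character */
        }
    }
    dst[out] = '\0';
    return NULL;
}
```

Returning the position just after the marker plays the role of the frame file being left positioned for the next CopyFramePart call.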
cocogen.MOD

(* cocogen     Generation of semantic evaluator                  Moe 30.12.83
   =======
   This module generates the semantic evaluator. It
   a) copies symbols from the input grammar to the evaluator
   b) copies text from the semantic frame to the evaluator
   c) stores attribute assignments (and emits them as semantic actions)
--------------------------------------------------------------------------*)

IMPLEMENTATION MODULE cocogen;

FROM cocolex IMPORT at, line, col, src, GetName;
FROM Errors  IMPORT CompErr, SemErr;
FROM FileIO  IMPORT con, File, Done, Open, Close, Read, Write,
                    WriteCard, WriteLn, WriteString, WriteText;
FROM System  IMPORT Allocate, Deallocate;

CONST
  blanks  = "                                        ";
                                       (*symbol numbers*)
  ident   = 17;
  string  = 18;
  number  = 19;
  lparsy  = 23;
  commasy = 33;
  eolsy   = 99;

TYPE
  Actionptr     = POINTER TO Action;
  Assignmentptr = POINTER TO Assignment;
  Action = RECORD                      (*information about attr.eval. action*)
    sem:      CARDINAL;                (*action number*)
    firstass: Assignmentptr;           (*to first assignment*)
    next:     Actionptr;               (*to next action*)
  END;
  Assignment = RECORD                  (*information about an attr. assignment*)
    typ:   Attrtype;                   (*term, nonterm, const*)
    left:  CARDINAL;                   (*spix of left-hand side*)
    right: CARDINAL;                   (*spix or val of right-hand side*)
    next:  Assignmentptr;              (*to next assignment*)
  END;
  Name = ARRAY[1..80] OF CHAR;

VAR
  firstact: Actionptr;                 (*first generated action*)
  firstass: Assignmentptr;             (*first stored assignment*)
  fram:     File;                      (*file with frame of sem.Analyzer*)
  gram:     Name;                      (*grammar name*)
  graml:    CARDINAL;                  (*length of grammar name*)
  lastact:  Actionptr;                 (*last generated action*)
  lastass:  Assignmentptr;             (*last stored assignment*)
  lastcol:  CARDINAL;                  (*column of last symbol*)
  lasttyp:  CARDINAL;                  (*type of last symbol*)
  leftcol:  CARDINAL;                  (*leftmost column in semantic action*)
  margin:   CARDINAL;                  (*indent from left margin*)
  op:       ARRAY[0..commasy] OF CHAR; (*operator table*)
  sem:      File;                      (*file containing sem.evaluator*)
  semname:  Name;                      (*file name of sem.evaluator*)

PROCEDURE EmitAssign(p:Assignmentptr); FORWARD;

(* CloseFile          Close file containing the semantic evaluator
--------------------------------------------------------------------------*)
PROCEDURE CloseFile;
BEGIN
  CopyFramePart(fram,sem,"-->modulename");
  WriteText(sem,gram,graml); WriteString(sem,"sem");
  CopyFramePart(fram,sem,"$$$");
  Close(fram); Close(sem);
  filesopen:=FALSE;
END CloseFile;

(* Copy               Copy source symbol to semantic evaluator
--------------------------------------------------------------------------*)
PROCEDURE Copy(typ,col:CARDINAL);
VAR
  ch:   CHAR;
  l,i:  CARDINAL;
  name: Name;
BEGIN
  IF col<=lastcol THEN                                   (*new line*)
    WriteLn(sem); WriteText(sem,blanks,margin);
    IF col>leftcol THEN WriteText(sem,blanks,col-leftcol); END;
    lasttyp:=eolsy;
  END;
  IF (typ<=number) AND (lasttyp<=number) THEN Write(sem," "); END;
  CASE typ OF
      1: WriteString(sem,"alias");
  |   2: WriteString(sem,"any");
  |   3: WriteString(sem,"DECLARATIONS");
  |   4: WriteString(sem,"ENDGRAM");
  |   5: WriteString(sem,"endsem");
  |   6: WriteString(sem,"eps");
  |   7: WriteString(sem,"GRAMMAR");
  |   8: WriteString(sem,"IN");
  |   9: WriteString(sem,"MACROS");
  |  10: WriteString(sem,"NONTERMINALS");
  |  11: WriteString(sem,"out");
  |  12: WriteString(sem,"PRAGMAS");
  |  13: WriteString(sem,"RULES");
  |  14: WriteString(sem,"sem");
  |  15: WriteString(sem,"SEMANTICS");
  |  16: WriteString(sem,"TERMINALS");
  |  17,18: (*ident, string*)
         GetName(at[1],name,l); WriteText(sem,name,l);
  |  19: WriteCard(sem,at[1],0);
  |  20..33: (*operators*)
         Write(sem,op[typ]);
  |  34: ch:=CHR(at[1]);
         IF (ch="!") OR ((ch="*") AND (lasttyp<>ident)) THEN
         ELSE Write(sem,ch);
         END;
  END; (*CASE*)
  lasttyp:=typ; lastcol:=col;
END Copy;

(* CopyFramePart      Copies file f1 to file f2 until string s
--------------------------------------------------------------------------*)
PROCEDURE CopyFramePart(VAR f1,f2:File; s:ARRAY OF CHAR);
VAR
  ch,startch: CHAR;
  i:          INTEGER;
  t:          ARRAY[0..50] OF CHAR;
BEGIN
  startch:=s[0];
  Read(f1,ch);
  WHILE NOT f1^.eof DO
    IF ch=startch THEN                           (*check if s occurs*)
      i:=0;
      WHILE (i<HIGH(s)) AND (ch=s[i]) AND NOT f1^.eof DO
        t[i]:=ch; INC(i); Read(f1,ch);
      END;
      IF ch=s[i] THEN RETURN; END;               (*found - exit*)
      WriteText(f2,t,i);                         (*not found - continue*)
      Write(f2,ch);
    ELSE Write(f2,ch);                           (*normal character - write it*)
    END;
    Read(f1,ch);
  END; (*WHILE*)
END CopyFramePart;

(* EmitAction         Emit stored semantic action
--------------------------------------------------------------------------*)
PROCEDURE EmitAction(line:CARDINAL; VAR sem:CARDINAL);
VAR
  act,p: Actionptr;
  q:     Assignmentptr;

  PROCEDURE EqualAct(p1,p2:Assignmentptr): BOOLEAN;
  BEGIN
    WHILE (p1<>NIL) AND (p2<>NIL) AND (p1^.typ=p2^.typ) AND
          (p1^.left=p2^.left) AND (p1^.right=p2^.right) DO
      p1:=p1^.next; p2:=p2^.next;
    END;
    RETURN (p1=NIL) AND (p2=NIL);
  END EqualAct;

BEGIN
  IF firstass=NIL THEN sem:=0;
  ELSE
    p:=firstact;
    WHILE (p<>NIL) AND NOT EqualAct(p^.firstass,firstass) DO
      p:=p^.next;
    END;
    IF p=NIL THEN                                        (*new action*)
      OpenSem(line,sem);
      EmitAssign(firstass);
      Allocate(act,SIZE(Action));
      act^.sem:=sem; act^.firstass:=firstass; act^.next:=NIL;
      IF firstact=NIL THEN firstact:=act
      ELSE lastact^.next:=act
      END;
      lastact:=act;
    ELSE       (*same action found; delete recently stored assignments*)
      sem:=p^.sem;
      WHILE firstass<>NIL DO
        q:=firstass; firstass:=firstass^.next;
        Deallocate(q);
      END;
    END;
    firstass:=NIL;
  END;
END EmitAction;

(* EmitAssign         Write attribute assignment
--------------------------------------------------------------------------*)
PROCEDURE EmitAssign(p:Assignmentptr);
VAR
  l:    CARDINAL;
  name: Name;
BEGIN
  WHILE p<>NIL DO
    WriteLn(sem); WriteText(sem,blanks,margin);
    GetName(p^.left,name,l);
    CASE p^.typ OF
      term:
        WriteString(sem,"ASSIGN("); WriteText(sem,name,l);
        WriteString(sem,",at["); WriteCard(sem,p^.right,0);
        WriteString(sem,"]);");
    | nonterm:
        WriteText(sem,name,l); WriteString(sem,":=");
        GetName(p^.right,name,l);
        WriteText(sem,name,l); Write(sem,";");
    | const:
        WriteText(sem,name,l); WriteString(sem,":=");
        WriteCard(sem,p^.right,0); Write(sem,";");
    END; (*CASE*)
    p:=p^.next;
  END; (*WHILE*)
END EmitAssign;

(* GenAssign          Store attribute assignment
--------------------------------------------------------------------------*)
PROCEDURE GenAssign(t:Attrtype; l,r:CARDINAL);
VAR ass: Assignmentptr;
BEGIN
  IF (t=nonterm) AND (l=r) THEN RETURN; END;
  Allocate(ass,SIZE(Assignment));
  WITH ass^ DO typ:=t; left:=l; right:=r; next:=NIL; END;
  IF firstass=NIL THEN firstass:=ass; ELSE lastass^.next:=ass; END;
  lastass:=ass;
END GenAssign;

(* InsertFramePart    Insert middle part of semantic evaluator
--------------------------------------------------------------------------*)
PROCEDURE InsertFramePart;
BEGIN
  CopyFramePart(fram,sem,"-->actions");
  margin:=9;
END InsertFramePart;

(* OpenFile           Open file for semantic evaluator
--------------------------------------------------------------------------*)
PROCEDURE OpenFile(spix:CARDINAL);
VAR i,l: CARDINAL;
BEGIN
  GetName(spix,gram,l); graml:=l;
  FOR i:=1 TO graml DO semname[i]:=gram[i]; END;
  semname[l+1]:="s"; semname[l+2]:="e"; semname[l+3]:="m";
  semname[l+4]:="."; semname[l+5]:="D"; semname[l+6]:="E";
  semname[l+7]:="F"; semname[l+8]:=0C;
  Open(sem,src^.volRef,semname,TRUE);                    (*definition module*)
  Open(fram,src^.volRef,"cocosemframe",FALSE);
  IF NOT Done THEN
    SemErr(25,line,col);
    WriteString(con,"The file 'cocosemframe' must be in the same ");
    WriteString(con,"subdirectory as the input grammar.$Aborted.$");
    CompErr(4)
  END;
  CopyFramePart(fram,sem,"-->modulename");
  WriteText(sem,gram,graml); WriteString(sem,"sem");
  CopyFramePart(fram,sem,"-->modulename");
  WriteText(sem,gram,graml); WriteString(sem,"sem");
  CopyFramePart(fram,sem,"-->implementation");
  Close(sem);
  semname[l+5]:="M"; semname[l+6]:="O"; semname[l+7]:="D";
  Open(sem,src^.volRef,semname,TRUE);                    (*implementation module*)
  CopyFramePart(fram,sem,"-->modulename");
  WriteText(sem,gram,graml); WriteString(sem,"sem");
  CopyFramePart(fram,sem,"-->scannername");
  WriteText(sem,gram,graml); WriteString(sem,"lex");
  CopyFramePart(fram,sem,"-->declarations");
  filesopen:=TRUE;
END OpenFile;

(* OpenSem            Write start of new semantic action
--------------------------------------------------------------------------*)
PROCEDURE OpenSem(line:CARDINAL; VAR nr:CARDINAL);
BEGIN
  INC(maxsem); nr:=maxsem;
  WriteString(sem,"$  | "); WriteCard(sem,maxsem,3);
  WriteString(sem,": (*line "); WriteCard(sem,line,0);
  WriteString(sem,"*)");
END OpenSem;

(* StartCopy          Set leftmost column in semantic action
--------------------------------------------------------------------------*)
PROCEDURE StartCopy(col:CARDINAL);
BEGIN
  leftcol:=col;
  lasttyp:=eolsy;
  lastcol:=99;
END StartCopy;

BEGIN (*cocogen*)
  op:="                    =.|()[]{}<>;:,";   (*"=" must start at pos. 20*)
  maxsem:=11; margin:=0;
  firstact:=NIL; firstass:=NIL;
  filesopen:=FALSE;
END cocogen.
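GenAssign queues attribute assignments. EmitAction turns the queued list into a numbered semantic action, but first looks for an earlier action with an identical assignment list (EqualAct). If it finds one, it reuses that number and frees the duplicate list.

The C sketch below shows that bookkeeping only. It leaves out the text output done by OpenSem and EmitAssign. The names gen_assign and emit_action are hypothetical, and so is the counter start value of 0; the Modula-2 module initializes maxsem to 11.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { ATTR_TERM, ATTR_NONTERM, ATTR_CONST } AttrType;

typedef struct Assignment {          /* one stored attribute assignment */
    AttrType typ;
    unsigned left, right;            /* spix of left side; spix or value */
    struct Assignment *next;
} Assignment;

typedef struct Action {              /* one generated semantic action */
    unsigned sem;                    /* action number */
    Assignment *firstass;            /* its assignment list */
    struct Action *next;
} Action;

static Assignment *firstass, *lastass;   /* assignments not yet emitted */
static Action *firstact, *lastact;       /* actions generated so far */
static unsigned maxsem = 0;              /* illustrative start value */

/* Queue left := right; a nonterminal attribute copied to itself is dropped. */
void gen_assign(AttrType t, unsigned l, unsigned r)
{
    Assignment *ass;
    if (t == ATTR_NONTERM && l == r)
        return;
    ass = malloc(sizeof *ass);
    assert(ass != NULL);
    ass->typ = t; ass->left = l; ass->right = r; ass->next = NULL;
    if (firstass == NULL) firstass = ass; else lastass->next = ass;
    lastass = ass;
}

static int equal_act(const Assignment *p1, const Assignment *p2)
{
    while (p1 != NULL && p2 != NULL && p1->typ == p2->typ &&
           p1->left == p2->left && p1->right == p2->right) {
        p1 = p1->next; p2 = p2->next;
    }
    return p1 == NULL && p2 == NULL;
}

/* Returns the action number for the queued assignments (0 if none). */
unsigned emit_action(void)
{
    Action *p;
    unsigned sem;

    if (firstass == NULL)
        return 0;
    for (p = firstact; p != NULL && !equal_act(p->firstass, firstass); p = p->next)
        ;
    if (p == NULL) {                     /* new action: keep the list */
        Action *act = malloc(sizeof *act);
        assert(act != NULL);
        sem = ++maxsem;
        act->sem = sem; act->firstass = firstass; act->next = NULL;
        if (firstact == NULL) firstact = act; else lastact->next = act;
        lastact = act;
    } else {                             /* same action: free the copy */
        sem = p->sem;
        while (firstass != NULL) {
            Assignment *q = firstass;
            firstass = firstass->next;
            free(q);
        }
    }
    firstass = NULL;
    return sem;
}
```

Sharing numbers this way means that an attribute evaluation occurring several times in the grammar produces only one case-block in the generated evaluator.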

[Cross-reference list of identifiers in cocogen.MOD]
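Copy in cocogen.MOD reproduces the layout of semantic text from the grammar using two rules:

- Line breaks: a symbol whose column is not greater than that of the previous symbol starts a new line. That line is indented by margin plus the symbol's offset from leftcol.
- Blanks: a blank is written only between two consecutive symbols whose numbers do not exceed number, i.e. keywords, identifiers, strings and numbers.

The C sketch below applies those rules to an in-memory buffer. Copier, start_copy and copy_token are hypothetical names. NUMBER = 19 is taken from the CONST section. EOLSY only needs to exceed NUMBER, and 99 is used here.

```c
#include <assert.h>
#include <string.h>

enum { NUMBER = 19, EOLSY = 99 };   /* symbols <= NUMBER are word-like */

typedef struct {
    char out[256];                  /* generated text */
    size_t len;
    unsigned margin;                /* indent from left margin */
    unsigned leftcol;               /* leftmost column of the action */
    unsigned lastcol, lasttyp;      /* column and type of last symbol */
} Copier;

static void put(Copier *c, const char *s, size_t n)
{
    assert(c->len + n < sizeof c->out);
    memcpy(c->out + c->len, s, n);
    c->len += n;
    c->out[c->len] = '\0';
}

static void put_blanks(Copier *c, unsigned n)
{
    while (n-- > 0)
        put(c, " ", 1);
}

/* As in StartCopy: lastcol = 99 makes the next symbol (column <= 99)
   start a new line, and lasttyp = EOLSY suppresses a leading blank. */
void start_copy(Copier *c, unsigned col)
{
    c->leftcol = col; c->lasttyp = EOLSY; c->lastcol = 99;
}

void copy_token(Copier *c, unsigned typ, unsigned col, const char *text)
{
    if (col <= c->lastcol) {                    /* new line */
        put(c, "\n", 1);
        put_blanks(c, c->margin);
        if (col > c->leftcol)
            put_blanks(c, col - c->leftcol);
        c->lasttyp = EOLSY;
    }
    if (typ <= NUMBER && c->lasttyp <= NUMBER)  /* keep words apart */
        put(c, " ", 1);
    put(c, text, strlen(text));
    c->lasttyp = typ; c->lastcol = col;
}
```

For example, with margin 2 and leftcol 5, the symbols x = 1 on one source line followed by y z on the next (starting two columns further right) come out as "x=1" and "    y z".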

(* cocogen2:   Generator for syntax files                        Moe 1.2.84

   This module generates the parser. It
   a) translates the top-down graph into G-code
   b) copies text from the parser frame, inserting the declarations of
      the table sizes
   c) writes the parser tables
   d) prints statistical information about the compilation
--------------------------------------------------------------------------*)

DEFINITION MODULE cocogen2;

PROCEDURE GenSynFiles;
(* Generates the parser and the parser tables*)

PROCEDURE PutStatistics;
(* Writes statistics about the compilation to the list file*)

END cocogen2.
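GenCode in the implementation module below emits each instruction as follows:

- one opcode byte from the Instruction enumeration;
- for most instructions, a one-byte symbol number sp;
- when the node has an alternative (lp>0), a two-byte address, high byte first via Emit2;
- for ntsc and ntasc, a final sem1 action byte.

A jump carries only an address. Nonzero sem2 and sem3 follow the instruction as extra single bytes.

The C sketch below tabulates these lengths, read off the listing as printed, together with the offset of the address field. That offset is why GenCode passes pc+2 to GetAdr, or pc+1 for jmpc. The function names are hypothetical.

```c
#include <assert.h>

/* Opcodes in the order of the Instruction enumeration in cocogen2. */
typedef enum { TC, TAC, NTC, NTAC, NTSC, NTASC,
               ANYC, ANYAC, EPSC, EPSAC, JMPC, RETC } Instruction;

/* Bytes GenCode emits for an instruction, opcode included. */
unsigned instr_length(Instruction op)
{
    switch (op) {
    case ANYC: case RETC:             return 1;  /* op                     */
    case TC: case NTC: case EPSC:     return 2;  /* op sp                  */
    case NTSC: case JMPC:             return 3;  /* op sp sem1 | op adr    */
    case TAC: case NTAC:
    case ANYAC: case EPSAC:           return 4;  /* op sp adrhi adrlo      */
    case NTASC:                       return 5;  /* op sp adrhi adrlo sem1 */
    }
    return 0;
}

/* Offset of the 2-byte alternative/jump address from the opcode,
   or 0 if the instruction has no address field. */
unsigned addr_offset(Instruction op)
{
    switch (op) {
    case TAC: case NTAC: case NTASC: case ANYAC: case EPSAC: return 2;
    case JMPC:                                               return 1;
    default:                                                 return 0;
    }
}
```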

cocogen2.MOD

(* cocogen2:   Generator for syntax files                        Moe 1.2.84
   ========
   This module generates the parser. It
   a) translates the top-down graph into G-code
   b) copies text from the parser frame, inserting the declarations of
      the table sizes
   c) writes the parser tables
   d) prints statistical information about the compilation
--------------------------------------------------------------------------*)

IMPLEMENTATION MODULE cocogen2;

FROM cocogen IMPORT maxsem, CopyFramePart;
FROM cocogra IMPORT alts, maxn, rootloc, rules, GetNode, Graphnode;
FROM cocolex IMPORT line, col, GetName;
FROM cocolst IMPORT lst;
FROM cocosym IMPORT gramspix, maxany, maxeps, maxt, maxp, maxs, GetA,
                    GetE, GetF, GetSy, RepSy, Symbolnode, Symbolset,
                    Symboltype;
FROM Errors  IMPORT CompErr, SemErr;
FROM FileIO  IMPORT con, File, Done, Open, Close, Write, WriteCard,
                    WriteString, WriteText, WriteLn;
FROM System  IMPORT Allocate, Deallocate;
FROM SYSTEM  IMPORT VAL;

CONST                                  (*for G-code*)
  lmaxc = 3000;                        (*G-code length*)

TYPE
  Filename    = ARRAY[1..30] OF CHAR;
  Instruction = (tc,tac,ntc,ntac,ntsc,ntasc,anyc,anyac,epsc,epsac,
                 jmpc,retc);

VAR
  code:    ARRAY[1..lmaxc] OF [0..255];  (*G-code area*)
  pc:      CARDINAL;                     (*index in code*)
  maxname: CARDINAL;                     (*length of name list*)
  first:   BOOLEAN;                      (*used for printing of tables*)
  ic:      CARDINAL;                     (*initialization counter*)
  c:       RECORD
             CASE :BOOLEAN OF
               TRUE:  ch:   ARRAY[1..2] OF CHAR;
             | FALSE: card: CARDINAL;
             END;
           END;

PROCEDURE OutByte(VAR f:File; ch:CHAR); FORWARD;
PROCEDURE OutWord(VAR f:File; n:CARDINAL); FORWARD;
PROCEDURE PrintTables(VAR f:File); FORWARD;
PROCEDURE WriteConstDecl(VAR f:File; t:ARRAY OF CHAR; n:CARDINAL); FORWARD;

(* G-code labels
==========================================================================*)
MODULE LABMOD;

IMPORT code, CompErr, Allocate, Deallocate;

EXPORT GetAdr, labact, NewAdr, Visited;

TYPE
  Fixupptr = POINTER TO Fixup;
  Fixup = RECORD
    adr:  CARDINAL;        (*G-code address*)
    next: Fixupptr;        (*to next fixup*)
  END;
  Labeladr = RECORD
    loc,adr: CARDINAL;     (*node address and corresponding G-code address*)
    fix:     Fixupptr;     (*to first fixup*)
  END;

VAR
  lab:    ARRAY[1..70] OF Labeladr;
  labact: CARDINAL;

PROCEDURE GetAdr(loc,fixup:CARDINAL; VAR adr:CARDINAL);
VAR
  i:  CARDINAL;
  fp: Fixupptr;
BEGIN
  i:=1;
  WHILE (i<=labact) AND (lab[i].loc<>loc) DO INC(i); END;
  IF i>labact THEN                                       (*new label*)
    INC(labact); lab[i].loc:=loc; lab[i].adr:=0;
    Allocate(fp,SIZE(Fixup));
    fp^.adr:=fixup; fp^.next:=NIL; lab[i].fix:=fp;
  ELSE                                                   (*old label*)
    IF lab[i].adr=0 THEN                                 (*not yet resolved*)
      Allocate(fp,SIZE(Fixup));
      fp^.adr:=fixup; fp^.next:=lab[i].fix;
      lab[i].fix:=fp;
    END;
  END;
  adr:=lab[i].adr;
END GetAdr;

PROCEDURE NewAdr(loc,adr:CARDINAL);
VAR
  i:   CARDINAL;
  p,q: Fixupptr;
BEGIN
  i:=1;
  WHILE (i<=labact) AND (lab[i].loc<>loc) DO INC(i); END;
  IF i>labact THEN                                       (*new label*)
    INC(labact); lab[i].loc:=loc;
    lab[i].adr:=adr; lab[i].fix:=NIL;
  ELSE                                                   (*old label*)
    IF lab[i].adr=0 THEN                                 (*resolve fixups*)
      p:=lab[i].fix;
      WHILE p<>NIL DO
        code[p^.adr]:=adr DIV 256;
        code[p^.adr+1]:=adr MOD 256;
        q:=p; p:=p^.next; Deallocate(q);
      END;
      lab[i].adr:=adr; lab[i].fix:=NIL;
    ELSE                                                 (*fixups already resolved*)
      CompErr(6);
    END;
  END;
END NewAdr;

PROCEDURE Visited(loc:CARDINAL): BOOLEAN;
VAR i: CARDINAL;
BEGIN
  i:=1;
  WHILE (i<=labact) AND (lab[i].loc<>loc) DO INC(i); END;
  RETURN (i<=labact) AND (lab[i].adr>0);
END Visited;

BEGIN (*LABMOD*)
  labact:=0;
END LABMOD;

(* Emit               Emit G-code byte
--------------------------------------------------------------------------*)
PROCEDURE Emit(byte:CARDINAL);
BEGIN code[pc]:=byte; INC(pc); END Emit;

(* Emit2              Emit G-code word
--------------------------------------------------------------------------*)
PROCEDURE Emit2(word:CARDINAL);
BEGIN
  code[pc]:=word DIV 256; code[pc+1]:=word MOD 256;
  INC(pc,2);
END Emit2;

(* GenCode            Generate G-code for TDG in loc
--------------------------------------------------------------------------*)
PROCEDURE GenCode(loc:CARDINAL);
VAR
  adr: CARDINAL;
  gn:  Graphnode;
BEGIN
  IF Visited(loc) THEN RETURN; END;
  NewAdr(loc,pc);                                (*now coming to address loc*)
  GetNode(loc,gn);
  WITH gn DO
    CASE typ OF
      t:   IF lp=0 THEN Emit(ORD(tc)); Emit(sp);
           ELSE
             GetAdr(lp,pc+2,adr);
             Emit(ORD(tac)); Emit(sp); Emit2(adr);
           END;
    | nt:  IF lp=0 THEN
             IF sem1=0 THEN Emit(ORD(ntc)); Emit(sp);
             ELSE Emit(ORD(ntsc)); Emit(sp); Emit(sem1);
             END;
           ELSE
             GetAdr(lp,pc+2,adr);
             IF sem1=0 THEN Emit(ORD(ntac)); Emit(sp); Emit2(adr);
             ELSE Emit(ORD(ntasc)); Emit(sp); Emit2(adr); Emit(sem1);
             END;
           END;
    | any: IF lp=0 THEN Emit(ORD(anyc));
           ELSE
             GetAdr(lp,pc+2,adr);
             Emit(ORD(anyac)); Emit(sp); Emit2(adr);
           END;
    | eps: IF sp<>0 THEN
             IF lp=0 THEN Emit(ORD(epsc)); Emit(sp);
             ELSE
               GetAdr(lp,pc+2,adr);
               Emit(ORD(epsac)); Emit(sp); Emit2(adr);
             END;
           END;
    END; (*CASE*)
    IF sem2<>0 THEN Emit(sem2); END;
    IF sem3<>0 THEN Emit(sem3); END;
    IF rp=0 THEN Emit(ORD(retc));
    ELSIF Visited(rp) THEN
      GetAdr(rp,pc+1,adr); Emit(ORD(jmpc)); Emit2(adr);
    END;
    IF rp>0 THEN GenCode(rp); END;
    IF lp>0 THEN GenCode(lp); END;
  END; (*WITH*)
END GenCode;

(* GenSynFiles        Generates files for syntax analysis
--------------------------------------------------------------------------*)
PROCEDURE GenSynFiles;
VAR
  fn:       Filename;
  fram:     File;                 (*file with parser frame*)
  graml:    CARDINAL;             (*length of grammar name*)
  gramname: Filename;             (*grammar name*)
  i,l:      CARDINAL;
  name:     ARRAY[1..50] OF CHAR;
  startpc:  CARDINAL;
  sn:       Symbolnode;
  syn:      File;                 (*file for generated parser*)
BEGIN
  pc:=1;
  FOR i:=maxp+1 TO maxs DO
    labact:=0; startpc:=pc;
    GetSy(i,sn);
    GenCode(sn.start);
    sn.start:=startpc;
    RepSy(i,sn);
  END;
  startpc:=pc; GenCode(rootloc);

  maxname:=4;                                            (*"EOF"+0C*)
  FOR i:=1 TO maxs DO
    GetSy(i,sn); GetName(sn.aliasspix,name,l);
    sn.spix:=maxname+1; RepSy(i,sn); INC(maxname,l+1);
    (*sn.spix becomes a pointer in the generated name list*)
  END;
  GetName(gramspix,gramname,graml);

  (*------------------------------------------ generate parser*)
  FOR i:=1 TO graml DO fn[i]:=gramname[i]; END;
  fn[graml+1]:="s"; fn[graml+2]:="y"; fn[graml+3]:="n"; fn[graml+4]:=".";
  fn[graml+5]:="D"; fn[graml+6]:="E"; fn[graml+7]:="F"; fn[graml+8]:=0C;
  Open(syn,lst^.volRef,fn,TRUE);                         (*definition module*)
  Open(fram,lst^.volRef,"cocosynframe",FALSE);
  IF NOT Done THEN
    WriteString(con,"The file 'cocosynframe' must be in the same ");
    WriteString(con,"subdirectory as the input grammar.$");
    SemErr(21,line,col); CompErr(5);
  END;
  CopyFramePart(fram,syn,"-->modulename");
  WriteText(syn,gramname,graml); WriteString(syn,"syn");
  CopyFramePart(fram,syn,"-->modulename");
  WriteText(syn,gramname,graml); WriteString(syn,"syn");
  CopyFramePart(fram,syn,"-->implementation");
  Close(syn);
  fn[graml+5]:="M"; fn[graml+6]:="O"; fn[graml+7]:="D";
  Open(syn,lst^.volRef,fn,TRUE);                         (*implementation module*)
  CopyFramePart(fram,syn,"-->modulename");               (*module name*)
  WriteText(syn,gramname,graml); WriteString(syn,"syn");
  CopyFramePart(fram,syn,"-->semantic analyzer");        (*various imports*)
  WriteText(syn,gramname,graml); WriteString(syn,"sem");
  CopyFramePart(fram,syn,"-->input module");
  WriteText(syn,gramname,graml); WriteString(syn,"lex");
  CopyFramePart(fram,syn,"-->declarations");             (*semantic declarations*)
  WriteString(syn,"CONST$");
  WriteConstDecl(syn,"  maxname  =",maxname);
  WriteConstDecl(syn,"  maxnamep =",maxs);
  WriteConstDecl(syn,"  maxcode  =",pc-1);
  IF maxany=0
  THEN WriteConstDecl(syn,"  maxany   =",1);
  ELSE WriteConstDecl(syn,"  maxany   =",maxany);
  END;
  IF maxeps=0
  THEN WriteConstDecl(syn,"  maxeps   =",1);
  ELSE WriteConstDecl(syn,"  maxeps   =",maxeps);
  END;
  WriteConstDecl(syn,"  maxt     =",maxt);
  WriteConstDecl(syn,"  maxp     =",maxp);
  WriteConstDecl(syn,"  maxs     =",maxs);
  WriteConstDecl(syn,"  startpc  =",startpc);
  WriteString(syn,"$ ");
  CopyFramePart(fram,syn,"-->tables");
  PrintTables(syn);
  CopyFramePart(fram,syn,"-->modulename");               (*module name*)
  WriteText(syn,gramname,graml); WriteString(syn,"syn");
  CopyFramePart(fram,syn,"$$$");
  Close(fram); Close(syn);
END GenSynFiles;

(* OutByte            Write a byte value to tables file
--------------------------------------------------------------------------*)
PROCEDURE OutByte(VAR f:File; ch:CHAR);
BEGIN
  IF first
  THEN c.ch[1]:=ch;
  ELSE c.ch[2]:=ch; OutWord(f,c.card);
  END;
  first:=NOT first;
END OutByte;

(* OutWord            Write a word to tables file
--------------------------------------------------------------------------*)
PROCEDURE OutWord(VAR f:File; n:CARDINAL);
BEGIN
  IF ic=10 THEN
    WriteString(f,"$      "); ic:=0
  END;
  WriteCard(f,n,5); Write(f,",");
  INC(ic);
END OutWord;

(* PrintTables        Write out an initialization of the grammar tables
--------------------------------------------------------------------------*)
PROCEDURE PrintTables(VAR f:File);
VAR
  i,j,l: CARDINAL;
  name:  ARRAY[1..50] OF CHAR;
  s:     Symbolset;
  sn:    Symbolnode;
BEGIN
  first:=TRUE;
  WriteString(f,"  INLINE($      "); ic:=0;               (*header (table lengths)*)
  OutWord(f,pc-1); OutWord(f,maxt); OutWord(f,maxp); OutWord(f,maxs);
  OutWord(f,maxeps); OutWord(f,maxany); OutWord(f,maxs); OutWord(f,maxname);

  WriteString(f,"$(*---G-code---*)$      "); ic:=0;       (*G-code*)
  FOR i:=1 TO pc-1 DO OutByte(f,CHR(code[i])); END;
  IF ODD(pc-1) THEN OutByte(f,0C); END;

  WriteString(f,"$(*---nt-symbols---*)$      "); ic:=0;   (*nt-symbols*)
  FOR i:=maxp+1 TO maxs DO
    GetSy(i,sn);
    OutWord(f,sn.start);
    OutWord(f,ORD(sn.del)*256);
    GetF(i,s);
    FOR j:=0 TO maxt DIV 16 DO OutWord(f,VAL(CARDINAL,s[j])); END;
  END;

  WriteString(f,"$(*---eps followers---*)$      "); ic:=0;  (*followers of eps*)
  FOR i:=1 TO maxeps DO
    GetE(i,s);
    FOR j:=0 TO maxt DIV 16 DO OutWord(f,VAL(CARDINAL,s[j])); END;
  END;
  IF maxeps=0 THEN
    FOR j:=0 TO maxt DIV 16 DO OutWord(f,0); END;
    maxeps:=1;                                             (*dummy*)
  END;

  WriteString(f,"$(*---any sets---*)$      "); ic:=0;     (*any-sets*)
  FOR i:=1 TO maxany DO
    GetA(i,s);
    FOR j:=0 TO maxt DIV 16 DO OutWord(f,VAL(CARDINAL,s[j])); END;
  END;
  IF maxany=0 THEN
    FOR j:=0 TO maxt DIV 16 DO OutWord(f,0); END;
    maxany:=1;                                             (*dummy*)
  END;

  WriteString(f,"$(*---attribute numbers---*)$      "); ic:=0;  (*attribute numbers*)
  FOR i:=0 TO maxp DO
    GetSy(i,sn);
    OutWord(f,sn.nra);
  END;

  WriteString(f,"$(*---pragma semantic---*)$      "); ic:=0;    (*pragma semantic*)
  OutWord(f,0); OutWord(f,0);                              (*dummy*)
  FOR i:=maxt+1 TO maxp DO
    GetSy(i,sn);
    OutWord(f,sn.sem1); OutWord(f,sn.sem2);
  END;

  WriteString(f,"$(*---name pointers---*)$      "); ic:=0;  (*name pointers*)
  OutWord(f,1);                                            (*for eofsy*)
  FOR i:=1 TO maxs DO
    GetSy(i,sn);
    OutWord(f,sn.spix); (*sn.spix is now a pointer in the generated name list*)
  END;

  WriteString(f,"$(*---names list---*)$      "); ic:=0;   (*name list*)
  OutByte(f,"E"); OutByte(f,"O"); OutByte(f,"F"); OutByte(f,0C);
  FOR i:=1 TO maxs DO
    GetSy(i,sn);
    GetName(sn.aliasspix,name,l);
    FOR j:=1 TO l DO OutByte(f,name[j]); END;
    OutByte(f,0C);
  END;
  IF ODD(maxname) THEN OutByte(f,0C); END;
  WriteString(f,"0);$");
END PrintTables;

(* PutStatistics      Writes statistics about compilation to list file
--------------------------------------------------------------------------*)
PROCEDURE PutStatistics;
VAR
  ptrsize: CARDINAL;
  setsize: CARDINAL;
  storage: CARDINAL;
BEGIN
  ptrsize:=2; setsize:=2*((maxt DIV 16)+1);
  storage:=pc-1 +                                        (*G-code*)
           (ptrsize+2+setsize)*(maxs-maxp) +             (*ntsymbols*)
           setsize*maxeps +                              (*eps-followers*)
           setsize*maxany +                              (*any-sets*)
           2*(maxp+1) +                                  (*nra*)
           4*(maxp-maxt+1) +                             (*psem*)
           2*(maxs+1) +                                  (*namep*)
           maxname +                                     (*name*)
           16;                                           (*header*)
  WriteLn(lst); WriteString(lst,"Statistics:"); WriteLn(lst);
  WriteCard(lst,rules,5); WriteString(lst," rules"); WriteLn(lst);
  WriteCard(lst,alts,5); WriteString(lst," alternatives"); WriteLn(lst);
  WriteCard(lst,maxn,5); WriteString(lst," nodes"); WriteLn(lst);
  WriteCard(lst,maxsem-10,5); WriteString(lst," semantic actions");
  WriteLn(lst);
  WriteCard(lst,maxeps,5); WriteString(lst," eps with look ahead");
  WriteLn(lst);
  WriteCard(lst,maxany,5); WriteString(lst," any-sets"); WriteLn(lst);
  WriteCard(lst,pc-1,5); WriteString(lst," bytes for G-code");
  WriteLn(lst);
  WriteCard(lst,storage,5);
  WriteString(lst," bytes for grammar tables (total)"); WriteLn(lst);
END PutStatistics;

(* WriteConstDecl     Write constant declaration text
--------------------------------------------------------------------------*)
PROCEDURE WriteConstDecl(VAR f:File; t:ARRAY OF CHAR; n:CARDINAL);
BEGIN
  WriteString(f,t); WriteCard(f,n,4); WriteString(f,";$");
END WriteConstDecl;

END cocogen2.
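The local module LABMOD resolves forward references in the G-code:

- GetAdr returns the address of a graph node if code for it has already been generated. Otherwise it records the code position of the pending address field in a fixup chain.
- NewAdr later fixes the node's address and patches every recorded position, high byte first.

The C sketch below follows that scheme with 1-based arrays, as in the listing. The names, the bounds assertions, and new_adr's 0/1 return value (standing in for CompErr(6)) are assumptions of the sketch.

```c
#include <assert.h>
#include <stdlib.h>

#define MAXLAB  70
#define LMAXC   3000

typedef struct Fixup { unsigned adr; struct Fixup *next; } Fixup;
typedef struct { unsigned loc, adr; Fixup *fix; } Label;  /* adr 0 = unknown */

static unsigned char code[LMAXC + 2];   /* 1-based, as in cocogen2 */
static Label lab[MAXLAB + 1];           /* 1-based */
static unsigned labact = 0;             /* labels in use */

static unsigned find(unsigned loc)
{
    unsigned i = 1;
    while (i <= labact && lab[i].loc != loc)
        i++;
    return i;                           /* labact+1 if not present */
}

/* Address of node loc, or 0 if it is not generated yet; in that case the
   code position `fixup` is remembered for later patching. */
unsigned get_adr(unsigned loc, unsigned fixup)
{
    unsigned i = find(loc);
    if (i > labact) {                   /* new label */
        assert(labact < MAXLAB);
        labact++;
        lab[i].loc = loc; lab[i].adr = 0; lab[i].fix = NULL;
    }
    if (lab[i].adr == 0) {              /* not yet resolved: chain fixup */
        Fixup *fp = malloc(sizeof *fp);
        assert(fp != NULL);
        fp->adr = fixup; fp->next = lab[i].fix;
        lab[i].fix = fp;
    }
    return lab[i].adr;
}

/* Node loc starts at code address adr: patch all fixups. Returns 0 if
   loc was already defined (the listing reports CompErr(6)). */
int new_adr(unsigned loc, unsigned adr)
{
    unsigned i = find(loc);
    Fixup *p;
    if (i > labact) {                   /* new label, nothing to patch */
        assert(labact < MAXLAB);
        labact++;
        lab[i].loc = loc; lab[i].adr = adr; lab[i].fix = NULL;
        return 1;
    }
    if (lab[i].adr != 0)
        return 0;
    p = lab[i].fix;
    while (p != NULL) {                 /* resolve fixups, high byte first */
        Fixup *q = p;
        assert(p->adr + 1 <= LMAXC);
        code[p->adr] = (unsigned char)(adr / 256);
        code[p->adr + 1] = (unsigned char)(adr % 256);
        p = p->next;
        free(q);
    }
    lab[i].adr = adr; lab[i].fix = NULL;
    return 1;
}

int visited(unsigned loc)
{
    unsigned i = find(loc);
    return i <= labact && lab[i].adr > 0;
}
```

Using 0 as "address unknown" works because G-code starts at pc = 1, so no generated node can have address 0.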

[Cross-reference list of identifiers in cocogen2.MOD]
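PutStatistics in cocogen2.MOD estimates the size of the generated tables from their dimensions. It counts words and pointers as 2 bytes and each terminal set as maxt DIV 16 + 1 words. The same formula as a C function, with hypothetical parameter names (codelen stands for pc-1), can be used to check the figure printed in the list file:

```c
#include <assert.h>

/* Bytes occupied by the generated grammar tables, following the formula in
   PutStatistics: 2-byte words and pointers, terminal sets of maxt/16+1 words. */
unsigned table_storage(unsigned codelen, unsigned maxt, unsigned maxp,
                       unsigned maxs, unsigned maxeps, unsigned maxany,
                       unsigned maxname)
{
    const unsigned ptrsize = 2;
    const unsigned setsize = 2 * (maxt / 16 + 1);   /* bytes per set */

    assert(maxt <= maxp && maxp <= maxs);
    return codelen                                  /* G-code */
         + (ptrsize + 2 + setsize) * (maxs - maxp)  /* nt-symbols */
         + setsize * maxeps                         /* eps-followers */
         + setsize * maxany                         /* any-sets */
         + 2 * (maxp + 1)                           /* attribute numbers */
         + 4 * (maxp - maxt + 1)                    /* pragma semantic */
         + 2 * (maxs + 1)                           /* name pointers */
         + maxname                                  /* names */
         + 16;                                      /* header */
}
```

For example, 100 bytes of G-code with maxt = 20, maxp = 22, maxs = 30, one eps set, one any-set and a 50-byte name list give 358 bytes.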
Graph node
(* cocogra
App. F
Moe
list
28.12.83
This module builds and handles the top-down graph. It
a) generates and updates single graph nodes
b) concatenates graphs via left or right pointers
the whole graph for tracing
oO = prints
d) inserts eps nodes before deletable nonterminals with alternatives
e) deletes redundant eps-nodes resulting from EBNF-constructs such as
sc
N ORION. ee
ee er BR
rotvonawuwbwßMV+H
ER
%
11 DEFINITION MODULE cocogra;
12
13 FROM cocosym IMPORT Symboltype;
14
15 CONST
16
iy)
18
19
20
za
22
23
24
25
26
27
28
29
maxnodes
= 600;
TYPE
Graphnode = RECORD
typ:
Symboltype;
sp:
CARDINAL;
lp:
CARDINAL;
rp:
CARDINAL;
seml: [0..255];
sem2: [0..255];
sem3: [0..255];
line: CARDINAL;
link: CARDINAL;
(*eps,t,pr,nt,any,err*)
(*node symbol*)
(*left pointer*)
(*right pointer*)
(*evaluation of in-attributes*)
(*evaluation of out-attributes*)
(*semantic action*)
(*line number*)
(*ptr to node with same right successor*)
END;
30
Marklist = ARRAY[0..maxnodes DIV 16] OF
31
32 VAR
33
maxn:
CARDINAL;
(*number of
34
alts:
CARDINAL;
(*number of
35
rules:
CARDINAL;
(*number of
36
rootloc: CARDINAL;
(*root node
37
38 PROCEDURE ClearMarkList
(VAR m:Marklist) ;
39 (* Clears the mark list m*)
40
41 PROCEDURE
42
43
44
BITSET;
graph nodes*)
alternatives, filled by AG*)
grammar rules, filled by AG*)
of grammar, filled by AG*)
ConcatLeft (VAR gp,gl,gpl,gl1:CARDINAL);
(* Links the graph (gp,gl) with the graph (gpl,gll)
The resulting graph is identified by (gp,gl)*)
45 PROCEDURE
via
left
via
right
ConcatRight
(VAR gp,gl,gpl,gll:CARDINAL);
(* Links the graph (gp,gl) with the graph (gpl,gll)
The resulting graph is identified by (gp,gl)*)
iS
tes
Css
Oo
DID
PROCEDURE
Deletable(loc:CARDINAL):
(* TRUE
if the graph
with
the
root
BOOLEAN;
loc is deletable*)
nn
©
=
onie)
PROCEDURE DeleteRedundantEps;
(* Deletes eps nodes in constructions
PROCEDURE
(* TRUE
aAAaAnnnn
PROCEDURE
w
ou»
onio}
(* Gets
pointers.
{x}y and
[x]y*)
DelNode (gn:Graphnode) : BOOLEAN;
if the
node
gn contains
a deletable
symbol*)
GetNode (p:CARDINAL; VAR gn:Graphnode);
the graph node with the index p*)
pointers.
App. F
cocogra.DEF
267
60
61 PROCEDURE
GraphList;
62
63
a test
(* Prints
list
of the
top-down
graphs
of all
rules*)
64 PROCEDURE Mark(loc:CARDINAL; VAR m:Marklist);
65 (* Marks loc in list m as visited*)
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
PROCEDURE Marked(loc:CARDINAL; VAR m:Marklist):
(* TRUE if loc is marked in m*)
PROCEDURE NewEpsBeforeDelNts;
(* Inserts eps nodes in front
of deletable
BOOLEAN;
nt's*)
PROCEDURE NewNode (typsSymboltype; sp,line:CARDINAL): CARDINAL;
(* Generates a new graph node with the specified values and returns
its index*)
PROCEDURE RepNode (p:CARDINAL; gn:Graphnode);
(* Replaces the graph node with index. p by gn*)
END
cocogra.
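The mark lists declared above pack one visited flag per graph node into an array of BITSETs, so node loc lives in word loc DIV 16 at bit loc MOD 16. A minimal Python sketch of ClearMarkList, Mark and Marked under that layout, assuming a 16-bit BITSET as on the original target machine and using plain integers as bit sets (the snake_case names are hypothetical renderings, not part of Coco):

```python
MAXNODES = 600  # mirrors maxnodes in cocogra.DEF
BITS = 16       # assumed BITSET width (16 bits on the original target)

def clear_mark_list(maxnodes: int = MAXNODES) -> list[int]:
    """Like ClearMarkList: one empty 16-bit set per group of 16 nodes."""
    return [0] * (maxnodes // BITS + 1)

def mark(loc: int, m: list[int]) -> None:
    """Like Mark: INCL(m[loc DIV 16], loc MOD 16)."""
    m[loc // BITS] |= 1 << (loc % BITS)

def marked(loc: int, m: list[int]) -> bool:
    """Like Marked: (loc MOD 16) IN m[loc DIV 16]."""
    return bool((m[loc // BITS] >> (loc % BITS)) & 1)

if __name__ == "__main__":
    m = clear_mark_list()
    mark(17, m)
    print(len(m), marked(17, m), marked(16, m))  # 38 True False
```

Because a Marklist has maxnodes DIV 16 + 1 words, every node index up to maxnodes has a bit, and clearing the whole list costs only 38 word assignments.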
(* cocogra     Graph node list for coco                        Moe  29.12.83
   ======================================================================
   This module builds and handles the top-down graph. It
   a) generates and updates single graph nodes
   b) concatenates graphs via left or right pointers
   c) prints the whole graph for tracing
   d) inserts eps nodes before deletable nonterminals with alternatives
   e) deletes redundant eps-nodes resulting from EBNF-constructs such as
      {x}y or [x]y
----------------------------------------------------------------------*)
IMPLEMENTATION MODULE cocogra;

FROM cocolex IMPORT ddt, GetName;
FROM cocosym IMPORT maxp, maxs, GetSy, RepSy, Symbolnode, Symboltype;
FROM Errors  IMPORT Restriction;
FROM FileIO  IMPORT con, WriteCard, WriteLn, WriteString, WriteText;

TYPE
  Graphnodelist = ARRAY[1..maxnodes] OF Graphnode;

VAR
  gn: Graphnodelist;   (*syntax graph*)


(* ClearMarkList     Clear mark list m
----------------------------------------------------------------------*)
PROCEDURE ClearMarkList(VAR m:Marklist);
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO maxnodes DIV 16 DO m[i]:={}; END;
END ClearMarkList;


(* ConcatLeft     Concatenate graph gp1 left to graph gp
----------------------------------------------------------------------*)
PROCEDURE ConcatLeft(VAR gp,gl,gp1,gl1:CARDINAL);
VAR p: CARDINAL;
BEGIN
  p:=gp;
  WHILE gn[p].lp<>0 DO p:=gn[p].lp; END;
  gn[p].lp:=gp1;
  p:=gl;
  WHILE gn[p].link<>0 DO p:=gn[p].link; END;
  gn[p].link:=gl1;
END ConcatLeft;


(* ConcatRight     Concatenate graph gp1 right to graph gp
----------------------------------------------------------------------*)
PROCEDURE ConcatRight(VAR gp,gl,gp1,gl1:CARDINAL);
VAR p: CARDINAL;
BEGIN
  p:=gl;
  WHILE p<>0 DO gn[p].rp:=gp1; p:=gn[p].link; END;
  gl:=gl1;
END ConcatRight;


(* Deletable     Check if graph in loc is deletable
----------------------------------------------------------------------*)
PROCEDURE Deletable(loc:CARDINAL): BOOLEAN;
VAR m: Marklist;

  PROCEDURE DelGraph(loc:CARDINAL): BOOLEAN;
  VAR gn: Graphnode;
  BEGIN
    IF loc=0 THEN RETURN TRUE; END;          (*end of graph found*)
    IF Marked(loc,m) THEN RETURN FALSE; END;
    Mark(loc,m);
    GetNode(loc,gn);
    IF ddt["C"] THEN
      WriteString(con,"DelGraph:");
      WriteCard(con,loc,6); WriteCard(con,ORD(gn.typ),8);
      WriteCard(con,gn.sp,6); WriteLn(con);
    END;
    RETURN (DelNode(gn) AND DelGraph(gn.rp))
           OR ((gn.lp<>0) AND DelGraph(gn.lp));
  END DelGraph;

BEGIN (*Deletable*)
  ClearMarkList(m);
  RETURN DelGraph(loc);
END Deletable;


(* DelNode     Test if node gn is deletable
----------------------------------------------------------------------*)
PROCEDURE DelNode(gn:Graphnode): BOOLEAN;
VAR sn: Symbolnode;
BEGIN
  IF gn.typ=nt THEN GetSy(gn.sp,sn); RETURN sn.del;
  ELSE RETURN gn.typ=eps;
  END;
END DelNode;


(* DeleteRedundantEps     Delete eps nodes in constructions {x}y and [x]y
----------------------------------------------------------------------*)
PROCEDURE DeleteRedundantEps;
VAR
  m:  Marklist;
  i:  CARDINAL;
  sn: Symbolnode;

  PROCEDURE DelEps(loc:CARDINAL);
  VAR gn,gn1: Graphnode;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    Mark(loc,m);
    GetNode(loc,gn);
    WITH gn DO
      IF lp<>0 THEN
        GetNode(lp,gn1);
        IF (gn1.typ=eps) AND (gn1.sem3=0)
           AND (gn1.lp=0) AND (gn1.rp<>0) THEN
          lp:=gn1.rp;
        END;
      END;
      DelEps(lp);
      DelEps(rp);
    END;
    RepNode(loc,gn);
  END DelEps;

BEGIN
  ClearMarkList(m);
  FOR i:=maxp+1 TO maxs DO
    GetSy(i,sn);
    DelEps(sn.start);
  END;
END DeleteRedundantEps;


(* GetNode     Get node gp
----------------------------------------------------------------------*)
PROCEDURE GetNode(gp:CARDINAL; VAR gn1:Graphnode);
BEGIN gn1:=gn[gp]; END GetNode;


(* GraphList     Trace output of graph node list
----------------------------------------------------------------------*)
PROCEDURE GraphList;
VAR
  i,j,l: CARDINAL;
  name:  ARRAY[1..80] OF CHAR;
  sn:    Symbolnode;
BEGIN
  WriteString(con,"$$Topdown-graph:$$");
  WriteString(con,"loc symbol        typ       lp     rp");
  WriteString(con,"   sem1   sem2   sem3   link   line$$");
  FOR i:=1 TO maxn DO
    WriteCard(con,i,3);
    WITH gn[i] DO
      CASE typ OF
        eps,any:  WriteString(con,"            ");
      | t,pr,nt:  GetSy(sp,sn);
                  GetName(sn.spix,name,l);
                  FOR j:=l+1 TO 12 DO name[j]:=" "; END;
                  WriteText(con,name,12);
      | err:      WriteString(con,"error       ");
      END; (*CASE*)
      CASE typ OF
        eps: WriteString(con,"  eps ");
      | t:   WriteString(con,"  t   ");
      | pr:  WriteString(con,"  pr  ");
      | nt:  WriteString(con,"  nt  ");
      | any: WriteString(con,"  any ");
      ELSE;
      END; (*CASE*)
      WriteCard(con,lp,7);   WriteCard(con,rp,7);
      WriteCard(con,sem1,7); WriteCard(con,sem2,7);
      WriteCard(con,sem3,7); WriteCard(con,link,7);
      WriteCard(con,line,7); WriteLn(con);
    END; (*WITH*)
  END; (*FOR*)
END GraphList;


(* Mark     Marks node loc in m as visited
----------------------------------------------------------------------*)
PROCEDURE Mark(loc:CARDINAL; VAR m:Marklist);
BEGIN INCL(m[loc DIV 16],loc MOD 16); END Mark;


(* Marked     Tests if node loc is marked in m
----------------------------------------------------------------------*)
PROCEDURE Marked(loc:CARDINAL; VAR m:Marklist): BOOLEAN;
BEGIN RETURN (loc MOD 16) IN m[loc DIV 16]; END Marked;


(* NewEpsBeforeDelNts     Insert eps before del. nt's with alternatives
----------------------------------------------------------------------*)
PROCEDURE NewEpsBeforeDelNts;
VAR
  gn,gn1:          Graphnode;
  loc,loc1,maxloc: CARDINAL;
  sn:              Symbolnode;
BEGIN
  maxloc:=maxn;
  FOR loc:=1 TO maxloc DO
    GetNode(loc,gn);
    IF (gn.typ=nt) AND (gn.lp<>0) AND DelNode(gn) THEN
      loc1:=NewNode(gn.typ,gn.sp,gn.line);
      gn1:=gn; gn1.lp:=0;
      RepNode(loc1,gn1);
      WITH gn DO
        typ:=eps; sp:=0; rp:=loc1;
        sem1:=0; sem2:=0; sem3:=0;
      END;
      RepNode(loc,gn);
    END;
  END; (*FOR*)
END NewEpsBeforeDelNts;


(* NewNode     Generate a new graph node and return the index
----------------------------------------------------------------------*)
PROCEDURE NewNode(t:Symboltype; s,l:CARDINAL): CARDINAL;
BEGIN
  INC(maxn);
  IF maxn>maxnodes THEN Restriction(5); END;
  WITH gn[maxn] DO
    typ:=t; sp:=s; lp:=0; rp:=0; sem1:=0; sem2:=0; sem3:=0;
    line:=l; link:=0;
  END;
  RETURN maxn;
END NewNode;


(* RepNode     Replace node gp
----------------------------------------------------------------------*)
PROCEDURE RepNode(gp:CARDINAL; gn1:Graphnode);
BEGIN gn[gp]:=gn1; END RepNode;


BEGIN (*cocogra*)
  maxn:=0;
END cocogra.

Cross-reference list of cocogra.MOD: any, ClearMarkList, cocogra, cocolex, cocosym, con, ConcatLeft, ConcatRight, ddt, del, DelEps, Deletable, DeleteRedundantEps, DelGraph, DelNode, eps, err, Errors, FileIO, GetName, GetNode, GetSy, gl, gl1, gn, gn1, gp, gp1, GraphList, Graphnode, Graphnodelist, INCL, j, line, link, loc1, lp, m, Mark, Marked, Marklist, maxloc, maxn, maxnodes, maxp, maxs, name, NewEpsBeforeDelNts, NewNode, nt, p, pr, RepNode, RepSy, Restriction, rp, s, sem1, sem2, sem3, sn, sp, spix, start, Symbolnode, Symboltype, t, typ, WriteCard, WriteLn, WriteString, WriteText
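Deletable decides whether a top-down graph can derive the empty string: a node yields an empty path if it is deletable itself and its right successor graph is, or if one of its alternatives (reached through lp) is. The Python sketch below mirrors DelGraph under stated assumptions: nodes are held in a dict keyed by node index, the deletability of nonterminal symbols (sn.del in the listing) is passed in as a hypothetical dict del_nt, and one visited set shared by the whole recursion stands in for the mark list m.

```python
from dataclasses import dataclass

@dataclass
class Node:
    typ: str     # 'eps', 't', 'pr', 'nt', 'any' or 'err'
    sp: int = 0  # symbol index (consulted only for 'nt' nodes here)
    lp: int = 0  # left pointer: next alternative, 0 = none
    rp: int = 0  # right pointer: successor, 0 = end of graph

def deletable(loc: int, nodes: dict[int, Node], del_nt: dict[int, bool]) -> bool:
    """True if the graph rooted at loc can derive the empty string."""
    visited: set[int] = set()   # stands in for the Marklist m

    def del_node(n: Node) -> bool:
        # a nonterminal is deletable if its symbol is; otherwise only eps is
        return del_nt.get(n.sp, False) if n.typ == 'nt' else n.typ == 'eps'

    def del_graph(p: int) -> bool:
        if p == 0:              # end of graph found
            return True
        if p in visited:        # already examined on another path
            return False
        visited.add(p)
        n = nodes[p]
        return (del_node(n) and del_graph(n.rp)) or (n.lp != 0 and del_graph(n.lp))

    return del_graph(loc)

if __name__ == "__main__":
    # hypothetical rule body  "a" | eps : node 1 = terminal with alternative node 2
    print(deletable(1, {1: Node('t', lp=2), 2: Node('eps')}, {}))  # True
```

Because every node is entered at most once, the search terminates even on the cyclic graphs built for iterations; the price, as in the listing, is that a node already visited on one path is not re-examined on another.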
(* cocolex     Lexical analyzer for coco                       Moe  83.03.27
   ======================================================================
   This is the Coco-scanner. It
   a) reads the input grammar
   b) returns symbol numbers and terminal attributes to the parser
   c) hashes names and strings into a name list (permanently or
      temporarily)
   d) converts number-strings to values
   All symbols which are not terminals of Cocol get the symbol type
   'nococosy' and are hashed into the name list.
----------------------------------------------------------------------*)
DEFINITION MODULE cocolex;

FROM FileIO IMPORT File;

VAR
  typ:  CARDINAL;                     (*next token code*)
  at:   ARRAY[1..10] OF CARDINAL;     (*attr. values of current token*)
  line: CARDINAL;                     (*current line number*)
  col:  CARDINAL;                     (*current column number*)
  ddt:  ARRAY ["A".."Z"] OF BOOLEAN;  (*debug and test switches*)
  src:  File;                         (*source file*)

PROCEDURE GetName(spix:CARDINAL; VAR name:ARRAY OF CHAR; VAR len:CARDINAL);
(* Get the text of a name or a string with the spelling index spix.
   len denotes its length*)

PROCEDURE GetSy;
(* Gets the next input token and fills at, line and col*)

PROCEDURE RestartHash;
(* Causes identifiers and strings to be stored permanently*)

PROCEDURE StopHash;
(* Causes identifiers and strings to be stored temporarily*)

END cocolex.
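Point d) of the scanner, converting digit strings to values, has to stay within a 16-bit CARDINAL (at most 65535). The implementation's ReadNumber tests before each multiply-and-add whether the next step would overflow (val > 6553, or val = 6553 with a next digit above '5'); if so it reports semantic error 22 and skips the remaining digits. The Python sketch below reproduces that guard; the function name and its (value, ok, next position) result are assumptions made for illustration.

```python
MAX_CARD = 65535  # largest 16-bit CARDINAL

def read_number(text: str, pos: int = 0) -> tuple[int, bool, int]:
    """Convert the digit run starting at text[pos].

    Returns (value, ok, next_pos). On overflow ok is False, the remaining
    digits are skipped and value keeps the last value that still fitted.
    """
    def is_digit(i: int) -> bool:
        return i < len(text) and '0' <= text[i] <= '9'

    val, ok = 0, True
    while is_digit(pos):
        ch = text[pos]
        # 6553 = MAX_CARD DIV 10 and '5' = MAX_CARD MOD 10: the next step would overflow
        if val > MAX_CARD // 10 or (val == MAX_CARD // 10 and ch > '5'):
            ok = False                      # the scanner calls SemErr(22,...) here
            while is_digit(pos):
                pos += 1
        else:
            val = 10 * val + (ord(ch) - ord('0'))
            pos += 1
    return val, ok, pos
```

For example, read_number("65535") accepts the largest value, while read_number("65536") stops with ok = False instead of wrapping around.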
(* cocolex     Lexical analyzer for coco                       Moe  83.03.27
   ======================================================================
   This is the Coco-scanner. It
   a) reads the input grammar
   b) returns symbol numbers and terminal attributes to the parser
   c) hashes names and strings into a name list (permanently or
      temporarily)
   d) converts number-strings to values
   All symbols which are not terminals of Cocol get the symbol type
   'nococosy' and are hashed into the name list.
----------------------------------------------------------------------*)
IMPLEMENTATION MODULE cocolex;

FROM cocosyn IMPORT printinput, printnodes;
FROM Errors  IMPORT SemErr, Restriction;
FROM FileIO  IMPORT con, EF, EOL, File, Read, Write, WriteCard,
                    WriteString, WriteText;
FROM SYSTEM  IMPORT VAL;

CONST
  eofsy       = 0;      (*lexical types*)
  ident       = 17;     (*numbers 1..16 reserved for keywords*)
  string      = 18;
  number      = 19;
  eqlsy       = 20;
  periodsy    = 21;
  variantsy   = 22;
  lparsy      = 23;
  rparsy      = 24;
  lbracksy    = 25;
  rbracksy    = 26;
  lconbrsy    = 27;
  rconbrsy    = 28;
  latparsy    = 29;
  ratparsy    = 30;
  semicolonsy = 31;
  colonsy     = 32;
  commasy     = 33;
  nococosy    = 34;

  notyp       = 255;
  buflen      = 1024*16;

TYPE
  Charclass = (none,letter,digit,quote,eql,period,variant,lpar,rpar,lbrack,
               rbrack,lconbr,rconbr,latpar,ratpar,semicolon,colon,comma,endfile,
               endline,dollar,minus);

VAR
  c:        CHAR;
  class:    ARRAY [0C..377C] OF Charclass;   (*class of input character*)
  buf:      ARRAY [0..buflen-1] OF CHAR;     (*input buffer*)
  bp,bpmax: CARDINAL;                        (*buffer pointers*)

CONST
  idmax = 4980;   (*max. length of identifier list*)
  htmax = 359;    (*max. length of hash table*)

VAR
  ch:      CHAR;                         (*current input character*)
  column:  CARDINAL;                     (*column of current input character*)
  i:       CARDINAL;
  id:      ARRAY[0..idmax+20] OF CHAR;   (*identifiers*)
  idact:   CARDINAL;                     (*last element in id*)
  keys:    CARDINAL;                     (*pos. of last keyword in id*)
  ht:      ARRAY[0..htmax] OF CARDINAL;  (*hash table*)
  storeid: BOOLEAN;                      (*store id. permanently?*)


(* NextCh     Get next input character (ch, column global)
----------------------------------------------------------------------*)
PROCEDURE NextCh;
BEGIN
  Read(src,ch);
  INC(column);
END NextCh;


(* Hash     Hash an identifier and return its spix
----------------------------------------------------------------------*)
PROCEDURE Hash(idp:CARDINAL; VAR spix:CARDINAL);
VAR h,l,d: INTEGER;

  PROCEDURE EqualId(x,y,l:CARDINAL): BOOLEAN;
  VAR i: CARDINAL;
  BEGIN
    i:=0;
    WHILE (i<l) AND (id[x+i]=id[y+i]) DO INC(i); END;
    RETURN i=l;
  END EqualId;

BEGIN
  l:=idp-idact; spix:=idact+1;
  h:=(ORD(id[spix])*7 + ORD(id[spix+1])) * 17 MOD htmax;
  d:= -htmax;
  LOOP
    IF ht[h]=0 THEN                           (*new identifier*)
      IF storeid THEN ht[h]:=spix; idact:=idp; END;
      EXIT;
    ELSIF EqualId(ht[h],spix,l) THEN          (*old identifier*)
      spix:=ht[h]; EXIT;
    ELSE                                      (*collision*)
      INC(d,2);
      IF d=htmax THEN Restriction(1); END;    (*hash table full*)
      h:=(h+ABS(d)) MOD htmax;
    END;
  END; (*LOOP*)
  IF idp>idmax THEN Restriction(2); END;      (*identifier list full*)
END Hash;


(* EnterKey     Enter a keyword to the identifier list
----------------------------------------------------------------------*)
PROCEDURE EnterKey(sy:CARDINAL; key:ARRAY OF CHAR);
VAR idp,i: INTEGER;
BEGIN
  INC(idact); id[idact]:=CHR(sy);             (*store symbol number*)
  idp:=idact;
  FOR i:=0 TO HIGH(key) DO                    (*store keyword*)
    INC(idp); id[idp]:=key[i];
  END;
  INC(idp); id[idp]:=0C;
  Hash(idp,keys);                             (*keys contains the last keyword*)
END EnterKey;


(* GetName     Get the name of an identifier spix from the name list
----------------------------------------------------------------------*)
PROCEDURE GetName(spix:CARDINAL; VAR name:ARRAY OF CHAR; VAR l:CARDINAL);
VAR i,h: CARDINAL;
BEGIN
  i:=spix; l:=0; h:=HIGH(name);
  WHILE (id[i]<>0C) AND (l<=h) DO
    name[l]:=id[i]; INC(i); INC(l);
  END;
END GetName;


(* ReadName     Read identifier or keyword
----------------------------------------------------------------------*)
PROCEDURE ReadName(VAR typ,val:CARDINAL);
VAR spix,idp: CARDINAL;
BEGIN
  idp:=idact;
  WHILE (class[ch]=letter) OR (class[ch]=digit) DO
    INC(idp); id[idp]:=ch; NextCh;
  END;
  INC(idp); id[idp]:=0C;
  Hash(idp,spix);
  IF spix<=keys THEN typ:=ORD(id[spix-1]); val:=0;   (*keyword*)
  ELSE typ:=ident; val:=spix;                        (*identifier*)
  END;
END ReadName;


(* ReadString     Read and hash a string
----------------------------------------------------------------------*)
PROCEDURE ReadString(VAR spix:CARDINAL);
VAR
  och: CHAR;
  idp: CARDINAL;
BEGIN
  idp:=idact; och:=ch;
  INC(idp); id[idp]:=och; NextCh;             (*store quote*)
  LOOP
    IF ch=och THEN NextCh; EXIT;
    ELSIF ch=EF THEN SemErr(24,line,col); EXIT;
    ELSIF ch=EOL THEN SemErr(23,line,col); EXIT;
    ELSE INC(idp); id[idp]:=ch; NextCh;
    END;
  END;
  INC(idp); id[idp]:=och;                     (*store quote*)
  INC(idp); id[idp]:=0C;
  Hash(idp,spix);
END ReadString;


(* RestartHash     Causes identifiers to be stored permanently
----------------------------------------------------------------------*)
PROCEDURE RestartHash;
BEGIN storeid:=TRUE; END RestartHash;


(* StopHash     Causes identifiers to be stored temporarily
----------------------------------------------------------------------*)
PROCEDURE StopHash;
BEGIN storeid:=FALSE; END StopHash;


(* ReadNumber     Read and convert a cardinal constant
----------------------------------------------------------------------*)
PROCEDURE ReadNumber(VAR val:CARDINAL);
BEGIN
  val:=0;
  WHILE class[ch]=digit DO
    IF (val>6553) OR ((val=6553) AND (ch>'5')) THEN
      SemErr(22,line,col);
      WHILE class[ch]=digit DO NextCh; END;
    ELSE
      val:=10*val+VAL(CARDINAL,ORD(ch)-ORD('0'));
      NextCh;
    END;
  END;
END ReadNumber;


(* GetSy     Get next lexical symbol
----------------------------------------------------------------------*)
PROCEDURE GetSy;
VAR val: CARDINAL;
BEGIN
  REPEAT
    WHILE ch=' ' DO NextCh; END;
    col:=column;
    CASE class[ch] OF
      none:      typ:=nococosy; at[1]:=ORD(ch); NextCh;
    | letter:    ReadName(typ,val);
                 IF typ=ident THEN at[1]:=val; END;
    | digit:     ReadNumber(at[1]); typ:=number;
    | quote:     ReadString(at[1]); typ:=string;
    | eql:       typ:=eqlsy; NextCh;
    | period:    typ:=periodsy; NextCh;
    | variant:   typ:=variantsy; NextCh;
    | lpar:      typ:=lparsy; NextCh;
    | rpar:      typ:=rparsy; NextCh;
    | lbrack:    typ:=lbracksy; NextCh;
    | rbrack:    typ:=rbracksy; NextCh;
    | lconbr:    typ:=lconbrsy; NextCh;
    | rconbr:    typ:=rconbrsy; NextCh;
    | latpar:    typ:=latparsy; NextCh;
    | ratpar:    typ:=ratparsy; NextCh;
    | semicolon: typ:=semicolonsy; NextCh;
    | colon:     typ:=colonsy; NextCh;
    | comma:     typ:=commasy; NextCh;
    | endfile:   typ:=eofsy;
    | endline:   typ:=notyp;
                 column:=0; INC(line); NextCh;
                 IF (line MOD 16)=0 THEN          (*update counter on screen*)
                   IF line>16 THEN
                     FOR i:=1 TO 5 DO Write(con,10C) END;
                   END;
                   WriteCard(con,line,5);
                 END;
    | dollar:    NextCh;
                 IF CAP(ch)="D" THEN              (*debug option*)
                   NextCh;
                   WHILE (CAP(ch)>="A") AND (CAP(ch)<="Z") DO
                     ddt[CAP(ch)]:=TRUE; NextCh;
                   END;
                   IF ddt["A"] THEN printinput:=TRUE END;
                   IF ddt["B"] THEN printnodes:=TRUE END;
                   WHILE ch<>EOL DO NextCh; END;
                   typ:=notyp;
                 ELSE typ:=nococosy; at[1]:=ORD('$');
                 END;
    | minus:     NextCh;
                 IF ch='-' THEN
                   WHILE ch<>EOL DO NextCh; END;
                   typ:=notyp;
                 ELSE typ:=nococosy; at[1]:=ORD('-');
                 END;
    END; (*CASE*)
  UNTIL typ<>notyp;
END GetSy;


BEGIN (*cocolex*)
  FOR c:="A" TO "Z" DO ddt[c]:=FALSE END;
  FOR c:=0C TO 377C DO class[c]:=none; END;
  FOR c:='a' TO 'z' DO class[c]:=letter; END;
  FOR c:='A' TO 'Z' DO class[c]:=letter; END;
  FOR c:='0' TO '9' DO class[c]:=digit; END;
  class[EF]:=endfile;   class[EOL]:=endline;   class['$']:=dollar;
  class["'"]:=quote;    class['"']:=quote;     class['(']:=lpar;
  class[')']:=rpar;     class[',']:=comma;     class['-']:=minus;
  class['.']:=period;   class[':']:=colon;     class[';']:=semicolon;
  class['<']:=latpar;   class['=']:=eql;       class['>']:=ratpar;
  class['{']:=lconbr;   class['[']:=lbrack;    class['|']:=variant;
  class[']']:=rbrack;   class['}']:=rconbr;
  FOR i:=0 TO htmax-1 DO ht[i]:=0; END;
  storeid:=TRUE;
  id[0]:="E"; id[1]:="O"; id[2]:="F"; id[3]:=0C;
  idact:=3;
  column:=0; col:=0; line:=1;
  EnterKey( 1,'ALIAS');         EnterKey( 1,'alias');
  EnterKey( 2,'ANY');           EnterKey( 2,'any');
  EnterKey( 3,'DECLARATIONS');
  EnterKey( 4,'ENDGRAM');
  EnterKey( 5,'ENDSEM');        EnterKey( 5,'endsem');
  EnterKey( 6,'EPS');           EnterKey( 6,'eps');
  EnterKey( 7,'GRAMMAR');
  EnterKey( 8,'IN');            EnterKey( 8,'in');
  EnterKey( 9,'MACROS');
  EnterKey(10,'NONTERMINALS');
  EnterKey(11,'OUT');           EnterKey(11,'out');
  EnterKey(12,'PRAGMAS');
  EnterKey(13,'RULES');
  EnterKey(14,'SEM');           EnterKey(14,'sem');
  EnterKey(15,'SEMANTIC');
  EnterKey(16,'TERMINALS');
  ch:=" ";
END cocolex.

Cross-reference list of cocolex.MOD: ABS, at, bp, bpmax, buf, buflen, c, CAP, ch, Charclass, class, cocolex, cocosyn, col, colon, colonsy, column, comma, commasy, con, d, ddt, digit, dollar, EF, endfile, endline, EnterKey, eofsy, EOL, eql, eqlsy, EqualId, Errors, File, FileIO, GetName, GetSy, h, Hash, HIGH, id, idact, ident, idmax, idp, key, keys, l, latpar, latparsy, lbrack, lbracksy, lconbr, lconbrsy, letter, line, lpar, lparsy, minus, name, NextCh, nococosy, none, notyp, number, och, period, periodsy, printinput, printnodes, quote, ratpar, ratparsy, rbrack, rbracksy, rconbr, rconbrsy, Read, ReadName, ReadNumber, ReadString, RestartHash, Restriction, rpar, rparsy, SemErr, semicolon, semicolonsy, spix, src, StopHash, storeid, string, sy, SYSTEM, typ, val, VAL, variant, variantsy, Write, WriteCard, WriteString, WriteText, x, y
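Hash keeps names in an open-addressing table of htmax = 359 slots. The start slot is derived from the first two characters of the name; on a collision the procedure moves on by |d|, where d starts at -htmax and grows by 2 per collision, so the step sizes shrink from 357 down to 1 and then grow again, and the table is reported full (Restriction(1)) when d reaches htmax. Because the keywords are entered first, every spelling index up to keys denotes a keyword. The Python sketch below reproduces this probing under simplifying assumptions: names go into a Python list instead of the packed id array, indices are 1-based so that 0 marks an empty slot, names are always stored permanently (storeid = TRUE), and the helper names are hypothetical.

```python
HTMAX = 359  # hash table size used by cocolex

def new_table() -> tuple[list[int], list[str]]:
    """Empty hash table (0 = free slot) plus the list of stored names."""
    return [0] * (HTMAX + 1), []

def hash_name(name: str, ht: list[int], names: list[str]) -> int:
    """Return the 1-based spelling index of name, entering it if it is new."""
    c0 = ord(name[0])
    c1 = ord(name[1]) if len(name) > 1 else 0   # the 0C terminator for 1-char names
    h = ((c0 * 7 + c1) * 17) % HTMAX            # start slot from the first two characters
    d = -HTMAX
    while True:
        if ht[h] == 0:                          # free slot: new identifier
            names.append(name)
            ht[h] = len(names)
            return ht[h]
        if names[ht[h] - 1] == name:            # same spelling: old identifier
            return ht[h]
        d += 2                                  # collision: probe the next slot
        if d == HTMAX:
            raise OverflowError("hash table full")
        h = (h + abs(d)) % HTMAX

if __name__ == "__main__":
    ht, names = new_table()
    keys = 0
    for kw in ("ALIAS", "ANY", "EPS"):          # keywords first, as EnterKey does
        keys = hash_name(kw, ht, names)
    spix = hash_name("ANYTHING", ht, names)     # same start slot as "ANY"
    print(spix, spix <= keys)                   # 4 False
```

The keyword test spix <= keys works only because spelling indices grow monotonically, which is why cocolex enters all keywords in its module body before the first call of GetSy.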
(* cocolst     Prints listing of Cocol text                    Moe  16.8.87
   ======================================================================
   This module closes the source file and reopens it for reading. It prints
   a listing of the source file with line numbers and error messages.
----------------------------------------------------------------------*)
DEFINITION MODULE cocolst;

FROM FileIO IMPORT File;

VAR lst: File;   (*list file*)

PROCEDURE PrintListing;

END cocolst.
(* cocolst     Prints listing of Cocol text                    Moe  16.8.87
   ======================================================================
   This module closes the source file and reopens it for reading. It prints
   a listing of the source file with line numbers and error messages.
----------------------------------------------------------------------*)
IMPLEMENTATION MODULE cocolst;

FROM cocolex IMPORT src;
FROM Errors  IMPORT Errorptr, GetNextSynErr, GetNextSemErr, PrintSynError;
FROM FileIO  IMPORT File, EF, EOL, Open, Close, Read, Write,
                    WriteString, WriteCard, WriteLn;


(* GetLine     Read a source line. Return empty line if eof.
----------------------------------------------------------------------*)
PROCEDURE GetLine(f:File; VAR line:ARRAY OF CHAR);
VAR ch: CHAR; i: CARDINAL;
BEGIN
  Read(f,ch); i:=0;
  WHILE (ch<>EOL) AND (ch<>EF) DO line[i]:=ch; INC(i); Read(f,ch) END;
  IF (i=0) AND (ch=EF) THEN line[0]:=EF ELSE line[i]:=0C END;
END GetLine;


(* PrintSemError     Print semantic error message
----------------------------------------------------------------------*)
PROCEDURE PrintSemError(f:File; nr,col:CARDINAL);
VAR i: CARDINAL;
BEGIN
  WriteString(f,"***** ");
  FOR i:=1 TO col-1 DO Write(f," ") END;
  CASE nr OF
     1: WriteString(f,"Symbol declared twice");
  |  2: WriteString(f,"Grammar name is no nonterminal");
  |  3: WriteString(f,"Undeclared symbol");
  |  4: WriteString(f,"Terminal on left-hand side of rule");
  |  5: WriteString(f,"Two rules for the same nonterminal");
  |  6: WriteString(f,"Wrong number of attributes");
  |  7: WriteString(f,"In-attribute for a terminal");
  |  8: WriteString(f,"Wrong attribute direction");
  |  9: WriteString(f,"Wrong attribute name");
  | 10: WriteString(f,"Attribute constant on left-hand side of rule");
  | 11: WriteString(f,"Semantic macro declared twice");
  | 12: WriteString(f,"Undeclared semantic macro");
  | 16: WriteString(f,"Pragma used in rules");
  | 21: WriteString(f,"File 'cocosynframe' not found");
  | 22: WriteString(f,"Number too big");
  | 23: WriteString(f,"End of line in string");
  | 24: WriteString(f,"End of file in string");
  | 25: WriteString(f,"File 'cocosemframe' not found");
  ELSE WriteString(f,"Error");
  END;
  WriteLn(f);
END PrintSemError;


(* PrintListing     Print a source list with error messages
----------------------------------------------------------------------*)
PROCEDURE PrintListing;
VAR
  volRef:         INTEGER;                 (*volume or directory of source file*)
  srcn:           ARRAY[0..63] OF CHAR;    (*source name*)
  line:           ARRAY[0..255] OF CHAR;   (*source line*)
  symbols:        Errorptr;                (*pointer to error symbols*)
  synline,syncol: CARDINAL;                (*line and column of syntax error*)
  semnr:          CARDINAL;                (*semantic error number*)
  semline,semcol: CARDINAL;                (*line and column of semantic error*)
  lnr:            CARDINAL;                (*line number*)
  sync,semc:      CARDINAL;                (*error counters*)
  i:              CARDINAL;
BEGIN
  volRef:=src^.volRef;
  i:=0; REPEAT srcn[i]:=src^.name[i]; INC(i) UNTIL srcn[i-1]=0C;
  Close(src); Open(src,volRef,srcn,FALSE);
  GetNextSemErr(semnr,semline,semcol);
  GetNextSynErr(symbols,synline,syncol);
  sync:=0; semc:=0; lnr:=1;
  GetLine(src,line);
  WHILE line[0]<>EF DO
    WriteCard(lst,lnr,5); WriteString(lst,"  ");
    WriteString(lst,line); WriteLn(lst);
    WHILE synline=lnr DO
      PrintSynError(lst,symbols,syncol); INC(sync);
      GetNextSynErr(symbols,synline,syncol);
    END;
    WHILE semline=lnr DO
      PrintSemError(lst,semnr,semcol); INC(semc);
      GetNextSemErr(semnr,semline,semcol);
    END;
    GetLine(src,line); INC(lnr);
  END;
  WHILE symbols<>NIL DO
    PrintSynError(lst,symbols,syncol); INC(sync);
    GetNextSynErr(symbols,synline,syncol);
  END;
  WHILE semnr<>0 DO
    PrintSemError(lst,semnr,semcol); INC(semc);
    GetNextSemErr(semnr,semline,semcol);
  END;
  WriteLn(lst);
  WriteCard(lst,sync,5); WriteString(lst," syntax error(s)$");
  WriteCard(lst,semc,5); WriteString(lst," semantic error(s)$$");
END PrintListing;

END cocolst.

Cross-reference list of cocolst.MOD: ch, Close, cocolex, cocolst, col, EF, EOL, Errorptr, Errors, f, File, FileIO, GetLine, GetNextSemErr, GetNextSynErr, i, line, lnr, lst, name, nr, Open, PrintListing, PrintSemError, PrintSynError, Read, semc, semcol, semline, semnr, src, srcn, symbols, sync, syncol, synline, volRef, Write, WriteCard, WriteLn, WriteString
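PrintListing walks through the reopened source file line by line; after each numbered line it prints every syntax error and then every semantic error recorded for that line, and errors whose line is never reached are printed after the last source line, followed by the two error counts. The Python sketch below shows the same merge, assuming both error lists are already sorted by line number (the order in which GetNextSynErr and GetNextSemErr deliver them) and using a '*****' marker followed by col-1 blanks; that marker layout is an assumption, not the book's exact format.

```python
def print_listing(lines, syn_errors, sem_errors):
    """Merge numbered source lines with error messages.

    syn_errors, sem_errors: lists of (line, col, message) sorted by line.
    Returns the listing as a list of strings.
    """
    def err(col, msg):
        return "*****  " + " " * (col - 1) + msg   # assumed marker layout

    out, i, j = [], 0, 0
    for lnr, text in enumerate(lines, start=1):
        out.append(f"{lnr:5d}  {text}")
        while i < len(syn_errors) and syn_errors[i][0] == lnr:
            out.append(err(*syn_errors[i][1:]))
            i += 1
        while j < len(sem_errors) and sem_errors[j][0] == lnr:
            out.append(err(*sem_errors[j][1:]))
            j += 1
    # errors not attached to any printed line, syntax errors first
    out += [err(c, m) for _, c, m in syn_errors[i:]]
    out += [err(c, m) for _, c, m in sem_errors[j:]]
    out.append("")
    out.append(f"{len(syn_errors):5d} syntax error(s)")
    out.append(f"{len(sem_errors):5d} semantic error(s)")
    return out
```

Merging two sorted streams this way needs only one pass over the source, which is why the module can simply close and reopen the file instead of buffering it.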
(* Generated semantic analyzer
   ======================================================================
   This module is produced by Coco from the semantic actions of an
   attributed grammar.
----------------------------------------------------------------------*)
DEFINITION MODULE cocosem;

VAR printactions: BOOLEAN;   (*trace the executed semantic actions*)

PROCEDURE Semant(sem:CARDINAL);

END cocosem.
(* Generated
semantic
analyzer
This module is produced
attributed grammar.
by Coco
from
the semantic
actions
of an
w
ome
hdr
IMPLEMENTATION
MODULE
cocosem;
FROM
FROM
FROM
FileIO IMPORT con, WriteCard,
SYSTEM IMPORT WORD;
cocolex IMPORT at;
FROM
cocogen
IMPORT
FROM
cocogra
IMPORT
FROM
FROM
cocolex
cocosym
IMPORT
IMPORT
WriteString;
Attrtype,CloseFile,Copy,EmitAction,
GenAssign,
InsertFramePart,OpenFile,OpenSem,
StartCopy;
alts,rules, rootloc,ConcatLeft,ConcatRight,
GetNode, GraphList, Graphnode, NewNode, RepNode;
typ, line,col,ddt,RestartHash,
StopHash;
gramspix,CompleteAt,Direction,
GetAt,
GetMacroNr, GetSy,NewAt,
NewMacro,
NewSy, RepSy, Symbolnode, Symboltype, SyNr;
FROM
Errors
IMPORT
FROM SYSTEM IMPORT
CONST null=65535;
CompErr,Restriction,
SemErr;
VAL;
TYPE Usage=(def,
check, use) ;
VAR sn:Symbolnode;
sy, Sy1:CARDINAL;
rootsy:CARDINAL;
eofsy:CARDINAL;
gn:Graphnode;
gp, 9p1,9p2,
gp3: CARDINAL;
gl,gl1,912,913:CARDINAL;
dd, ddl, dd2:BOOLEAN;
gpo : CARDINAL;
firstfact
: BOOLEAN;
kind:Usage;
styp:Symboltype;
dir,dirl:Direction;
count: CARDINAL;
n:CARDINAL;
seml, sem2,sem3:CARDINAL;
firstsymbol
: BOOLEAN;
ok: BOOLEAN;
spix,spix1:CARDINAL;
dummy
: CARDINAL;
MODULE SEMANTICSTACK;
IMPORT CompErr,Restriction;
EXPORT Pop, Push;
CONST maxstacksize=70;
VAR stack:ARRAY[1..maxstacksize]OF
CARDINAL;
sp:CARDINAL;
PROCEDURE
VAR
Pop() :CARDINAL;
x:CARDINAL;
BEGIN IF sp=0 THEN CompErr
(6) ;ELSE
RETURN x;
END Pop;
PROCEDURE Push (x:CARDINAL);
BEGIN IF sp<maxstacksize
58
59
THEN
INC (sp) ;stack
[sp] :=x;
ELSE
Restriction (14);
x:=stack [sp] ;DEC (sp) ;END;
App. F
60
61
62
63
64
65
66
67
68
69
70
71
72
Ue)
74
75
76
vi
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
cocosem.MOD
END;
END Push;
BEGIN sp:=0;
END SEMANTICSTACK;
PROCEDURE Error (nr:CARDINAL);
BEGIN SemErr(nr,line,col);BND
PROCEDURE
BEGIN
ASSIGN(VAR
x:WORD;
Error;
y:WORD) ;
xy;
END
ASSIGN;
PROCEDURE
BEGIN
Semant
(sem:CARDINAL)
;
(*IF printactions THEN
WriteString(con,"$
I)
WriteCard(con,
sem, 3);
WriteString(con,"]
");
END;*)
CASE sem
ne
|
12:
OF
(*line 125*)
INC (count);
CASE
kind
OF
use:
IF styp=nt THEN
GetAt (sy, count, spixl,dirl);
IF spixl<>0 THEN
IF dir=dirl
THEN GenAssign (nonterm, spix1,spix);
ELSE Error
(8) ;END;
P
END;
END;
|check:
IF styp=nt THEN
GetAt (sy, count, spix1,dirl);
IF spixl<>0 THEN
IF spix<>spixl
IF dir<>dirl
END;
THEN Error
(9) ;END;
THEN
Error(8) ;END;
END;
|def:
NewAt (sy, spix,dir);
END;
|
13:
(*line
150*)
INC (count);
CASE
kind
OF
use:
IF
styp=t
THEN
GenAssign
(term, spix, count) ;
ELSIF styp=nt THEN
GetAt (sy, count, spixl,dirl);
IF spixl<>0
THEN
IF dir=dirl
THEN
GenAssign (nonterm, spix, spix1)
ELSE
END;
Error (8);
END;
289
Program listings
290
119
120
ial
122
123
124
125
126
17277
128
129
130
sh
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
199
160
161
162
163
164
165
166
167
168
169
170
171
1072
WS)
174
175
176
177
          END;
      | check:
          IF styp=nt THEN
            GetAt(sy,count,spix1,dir1);
            IF spix1<>0 THEN
              IF spix<>spix1 THEN Error(9); END;
              IF dir<>dir1 THEN Error(8); END;
            END;
          END;
      | def:
          NewAt(sy,spix,dir);
          IF styp=pr THEN
            GenAssign(term,spix,count);
          END;
      END;
  | 14: (*line 181*)
      INC(count);
      IF kind=use THEN
        IF styp=nt THEN
          GetAt(sy,count,spix1,dir1);
          IF spix1<>0 THEN
            IF dir=dir1 THEN GenAssign(const,spix1,n);
            ELSE Error(8);
            END;
          END;
        END;
      ELSE
        Error(10);
      END;
  | 15: (*line 198*)
      IF NOT CompleteAt(sy,count) THEN
        Error(6);
      END;
  | 16: (*line 204*)
      Copy(typ,col)
  | 17: (*line 208*)
      StartCopy(1)
  | 18: (*line 212*)
      firstfact:=VAL(BOOLEAN,Pop());
      dd1:=VAL(BOOLEAN,Pop()); gl1:=Pop(); gp1:=Pop();
      dd:=VAL(BOOLEAN,Pop()); gl:=Pop(); gp:=Pop();
      gpo:=0
  | 19: (*line 219*)
      Push(gp); Push(gl); Push(VAL(CARDINAL,dd));
      Push(gp1); Push(gl1); Push(VAL(CARDINAL,dd1));
      Push(VAL(CARDINAL,firstfact));
  | 20: (*line 225*)
      sy:=SyNr(spix);
      IF sy=null
        THEN sy:=NewSy(spix,styp)
        ELSE Error(1);
      END;
  | 21: (*line 349*)
      ASSIGN(gramspix,at[1]);
  | 22: (*line 349*)
      rules:=0; alts:=0;
      OpenFile(gramspix); StopHash;
  | 23: (*line 357*)
      RestartHash;
App. F
      InsertFramePart; styp:=t;
  | 24: (*line 363*)
      eofsy:=NewSy(0,t)
  | 25: (*line 365*)
      styp:=t;
      kind:=def;
  | 26: (*line 368*)
      styp:=pr
  | 27: (*line 370*)
      styp:=pr;
      kind:=def;
  | 28: (*line 371*)
      GetSy(sy,sn); sn.sem1:=sem2;
      RepSy(sy,sn);
  | 29: (*line 376*)
      GetSy(sy,sn); sn.sem2:=sem3;
      RepSy(sy,sn);
  | 30: (*line 382*)
      styp:=nt
  | 31: (*line 383*)
      ASSIGN(spix,at[1]);
  | 32: (*line 384*)
      styp:=nt;
      kind:=def;
  | 33: (*line 386*)
      rootsy:=SyNr(gramspix);
      IF rootsy=null THEN Error(2); END;
  | 34: (*line 390*)
      sy:=SyNr(spix);
      IF sy=null THEN
        Error(3); sy:=NewSy(spix,err)
      END;
      GetSy(sy,sn);
      IF (sn.typ<>nt) AND (sn.typ<>err) THEN
        Error(4);
      END;
      IF sn.start<>0 THEN Error(5); END;
      sy1:=sy; count:=0; styp:=sn.typ
  | 35: (*line 401*)
      kind:=check;
  | 36: (*line 404*)
      GetSy(sy1,sn);
      sn.start:=gp; sn.del:=dd;
      RepSy(sy1,sn);
      INC(rules);
  | 37: (*line 410*)
      rootloc:=NewNode(nt,rootsy,0);
      gp1:=NewNode(t,eofsy,0);
      gl:=rootloc; gl1:=gp1;
      ConcatRight(rootloc,gl,gp1,gl1)
  | 38: (*line 415*)
      IF ddt["L"] THEN GraphList; END;
      CloseFile;
  | 39: (*line 420*)
      gp:=gp1;
      gl:=gl1;
      dd:=dd1;
  | 40: (*line 420*)
      INC(alts);
  | 41: (*line 422*)
      INC(alts);
      ConcatLeft(gp,gl,gp1,gl1);
      dd:=dd OR dd1
  | 42: (*line 429*)
      gpo:=0
  | 43: (*line 430*)
      firstfact:=TRUE;
  | 44: (*line 430*)
      gp1:=gp2;
      gl1:=gl2;
      dd1:=dd2;
  | 45: (*line 431*)
      firstfact:=FALSE;
  | 46: (*line 432*)
      IF gp2<>0 THEN
        ConcatRight(gp1,gl1,gp2,gl2);
        dd1:=dd1 AND dd2;
      END;
  | 47: (*line 440*)
      sy:=SyNr(spix);
      IF sy=null THEN
        Error(3); sy:=NewSy(spix,err)
      END;
      GetSy(sy,sn);
      IF sn.typ=pr THEN Error(16); END;
      gp2:=NewNode(sn.typ,sy,line);
      gl2:=gp2; dd2:=FALSE; gpo:=gp2;
      count:=0; styp:=sn.typ
  | 48: (*line 450*)
      kind:=use;
  | 49: (*line 451*)
      GetNode(gp2,gn);
      gn.sem1:=sem1; gn.sem2:=sem2;
      RepNode(gp2,gn)
  | 50: (*line 456*)
      gp2:=NewNode(eps,0,line);
      gl2:=gp2; dd2:=TRUE; gpo:=gp2
  | 51: (*line 459*)
      gp2:=NewNode(any,0,line);
      gl2:=gp2; dd2:=FALSE; gpo:=gp2
  | 52: (*line 462*)
      IF gpo=0 THEN
        gp2:=NewNode(eps,0,line);
        gl2:=gp2; dd2:=TRUE;
        GetNode(gp2,gn); gn.sem3:=sem3;
        RepNode(gp2,gn);
      ELSE
        GetNode(gpo,gn); gn.sem3:=sem3;
        RepNode(gpo,gn);
        gp2:=0; gl2:=0; gpo:=0
      END;
  | 53: (*line 475*)
      gp2:=gp;
      gl2:=gl;
      dd2:=dd;
  | 54: (*line 478*)
      gp2:=NewNode(eps,0,line);
      gl2:=gp2;
      ConcatLeft(gp,gl,gp2,gl2);
      gp2:=gp; gl2:=gl; dd2:=TRUE;
  | 55: (*line 485*)
      gp2:=NewNode(eps,0,line);
      gl2:=gp2;
      ConcatRight(gp,gl,gp,gl);
      ConcatLeft(gp,gl,gp2,gl2);
      gp2:=gp; dd2:=TRUE;
  | 56: (*line 493*)
      IF firstfact THEN
        gp3:=gp2; gl3:=gl2;
        gp2:=NewNode(eps,0,line); gl2:=gp2;
        ConcatRight(gp2,gl2,gp3,gl3);
      END;
  | 57: (*line 502*)
      sem1:=0; sem2:=0
  | 58: (*line 503*)
      count:=0;
  | 59: (*line 510*)
      IF styp<>nt THEN Error(7); END;
      dir:=down;
  | 60: (*line 515*)
      ASSIGN(n,at[1]);
  | 61: (*line 520*)
      IF kind=use THEN
        EmitAction(line,sem1);
      END;
  | 62: (*line 526*)
      dir:=up
  | 63: (*line 531*)
      IF (kind=use) OR (styp=pr) THEN
        EmitAction(line,sem2);
      END;
  | 64: (*line 537*)
      StopHash; firstsymbol:=TRUE
  | 65: (*line 538*)
      RestartHash
  | 66: (*line 539*)
      GetMacroNr(spix,sem3);
      IF sem3=0 THEN Error(12); END;
  | 67: (*line 543*)
      IF firstsymbol THEN
        firstsymbol:=FALSE;
        OpenSem(line,sem3); StartCopy(col)
      END;
      Copy(typ,col)
  | 68: (*line 549*)
      RestartHash;
  | 69: (*line 556*)
      OpenSem(line,sem3);
      NewMacro(spix,sem3,ok);
      IF NOT ok THEN Error(11); END;
      StopHash; firstsymbol:=TRUE;
  | 70: (*line 562*)
      IF firstsymbol THEN
        firstsymbol:=FALSE; StartCopy(col)
      END;
      Copy(typ,col)
  | 71: (*line 568*)
      RestartHash
  | 72: (*line 575*)
GetSy(sy,sn);sn.aliasspix:=sp1ix;
RepSy (sy,sn);
  END;
END Semant;

BEGIN
  printactions:=FALSE;
END cocosem.
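Action 19 above pushes the graph fragments gp, gl, dd and gp1, gl1, dd1, plus the flag firstfact, onto the semantic stack. Each BOOLEAN is converted to a CARDINAL with VAL. Action 18 pops them again in reverse order. The module below is a minimal sketch of that save/restore pattern. It is not the book's SEMANTICSTACK module, most of which precedes this excerpt. The following are assumptions:

- the value of maxstacksize (the name appears in the cross-reference);
- the helper procedures SaveState and RestoreState;
- the HALT on overflow or underflow;
- the sample values.

```modula2
MODULE SemStackDemo;
(* Sketch of the save/restore pattern of cocosem's actions 18 and 19.
   The value of maxstacksize, SaveState, RestoreState and the test values
   are assumptions, not the book's code. *)

CONST maxstacksize = 100;

VAR
  stack: ARRAY [1..maxstacksize] OF CARDINAL;
  sp: CARDINAL;                         (* number of occupied entries *)
  gp, gl, gp1, gl1: CARDINAL;           (* graph fragment pointers *)
  dd, dd1, firstfact: BOOLEAN;

PROCEDURE Push(x: CARDINAL);
BEGIN
  IF sp = maxstacksize THEN HALT END;   (* stack overflow *)
  INC(sp); stack[sp] := x
END Push;

PROCEDURE Pop(): CARDINAL;
BEGIN
  IF sp = 0 THEN HALT END;              (* stack underflow *)
  DEC(sp);
  RETURN stack[sp + 1]
END Pop;

PROCEDURE SaveState;                    (* as in action 19 *)
BEGIN
  Push(gp); Push(gl); Push(VAL(CARDINAL, dd));
  Push(gp1); Push(gl1); Push(VAL(CARDINAL, dd1));
  Push(VAL(CARDINAL, firstfact))
END SaveState;

PROCEDURE RestoreState;                 (* as in action 18: reverse order *)
BEGIN
  firstfact := VAL(BOOLEAN, Pop());
  dd1 := VAL(BOOLEAN, Pop()); gl1 := Pop(); gp1 := Pop();
  dd := VAL(BOOLEAN, Pop()); gl := Pop(); gp := Pop()
END RestoreState;

BEGIN
  sp := 0;
  gp := 10; gl := 12; dd := FALSE;
  gp1 := 20; gl1 := 25; dd1 := TRUE;
  firstfact := TRUE;
  SaveState;
  (* a nested construct reuses the same variables ... *)
  gp := 0; gl := 0; dd := TRUE; gp1 := 0; gl1 := 0; dd1 := FALSE;
  firstfact := FALSE;
  RestoreState;
  (* ... and afterwards the outer values are back *)
  IF (gp # 10) OR (gl # 12) OR dd THEN HALT END;
  IF (gp1 # 20) OR (gl1 # 25) OR NOT dd1 OR NOT firstfact THEN HALT END;
  IF sp # 0 THEN HALT END
END SemStackDemo.
```

Because the values are popped in exactly the reverse of the push order, a single stack of CARDINALs is enough to nest these fragments to any depth up to maxstacksize.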
Cross-reference list of identifiers in cocosem.MOD: aliasspix, alts, any, ASSIGN, at, Attrtype, check, CloseFile, cocogen, cocogra, cocolex, cocosem, cocosym, col, CompErr, CompleteAt, con, ConcatLeft, ConcatRight, const, Copy, count, dd, dd1, dd2, ddt, def, del, dir, dir1, Direction, down, dummy, EmitAction, eofsy, eps, err, Error, Errors, FileIO, firstfact, firstsymbol, GenAssign, GetAt, GetMacroNr, GetNode, GetSy, gl, gp3, gpo, gramspix, GraphList, Graphnode, InsertFramePart, kind, line, maxstacksize, n, NewAt, NewMacro, NewNode, NewSy, nonterm, nr, nt, null, ok, OpenFile, OpenSem, Pop, pr, printactions, Push, RepNode, RepSy, RestartHash, Restriction, rootloc, rootsy, rules, sem, sem1, sem2, sem3, Semant, SEMANTICSTACK, SemErr, sn, sp, spix, spix1, stack, start, StartCopy, StopHash, styp, sy, sy1, Symbolnode, Symboltype, SyNr, SYSTEM, t, term, typ, up, Usage, use, VAL, WORD, WriteCard, WriteString, x, y.
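The flags dd, dd1 and dd2 in cocosem record whether a graph fragment can derive the empty string. Read together, the actions combine them as follows:

- a symbol node (action 47) or an ANY node (action 51) starts out not deletable;
- an eps node (action 50), an option (action 54) or an iteration (action 55) is deletable;
- a sequence is deletable only if every factor is (dd1:=dd1 AND dd2 in action 46);
- an alternative list is deletable if any alternative is (dd:=dd OR dd1 in action 41).

The sketch below applies these rules to a small array-based expression tree. The Kind, Node, Make and Deletable names are hypothetical and do not model Coco's graph representation. Nonterminals are simply treated as not deletable here, as action 47 does at parse time. cocosym's FindDelSymbols later revises that by iterating until no further nonterminal becomes deletable.

```modula2
MODULE DelDemo;
(* Hypothetical expression tree illustrating how Coco's dd flags combine;
   it is not Coco's graph representation. *)

CONST maxnodes = 20;

TYPE
  Kind = (sym, eps, opt, iter, seq, alt);
  Node = RECORD
           kind: Kind;
           left, right: CARDINAL    (* operand indices, 0 = unused *)
         END;

VAR
  tree: ARRAY [1..maxnodes] OF Node;
  n: CARDINAL;                      (* number of nodes in use *)
  a, b, e1, e2: CARDINAL;

PROCEDURE Make(k: Kind; lft, rgt: CARDINAL): CARDINAL;
BEGIN
  IF n = maxnodes THEN HALT END;
  INC(n);
  tree[n].kind := k; tree[n].left := lft; tree[n].right := rgt;
  RETURN n
END Make;

PROCEDURE Deletable(p: CARDINAL): BOOLEAN;
VAR d: BOOLEAN;
BEGIN
  CASE tree[p].kind OF
    sym:            d := FALSE   (* symbol node, cf. action 47 *)
  | eps, opt, iter: d := TRUE    (* cf. actions 50, 54, 55 *)
  | seq:            d := Deletable(tree[p].left) AND Deletable(tree[p].right)
  | alt:            d := Deletable(tree[p].left) OR Deletable(tree[p].right)
  END;
  RETURN d
END Deletable;

BEGIN
  n := 0;
  a := Make(sym, 0, 0);
  b := Make(sym, 0, 0);
  e1 := Make(seq, a, b);                  (* a b   *)
  e2 := Make(seq, a, Make(opt, b, 0));    (* a [b] *)
  IF Deletable(e1) OR Deletable(e2) THEN HALT END;
  (* a | {b}: deletable because one alternative is *)
  IF NOT Deletable(Make(alt, a, Make(iter, b, 0))) THEN HALT END;
  (* [a] {b}: deletable because every factor is *)
  IF NOT Deletable(Make(seq, Make(opt, a, 0), Make(iter, b, 0))) THEN HALT END
END DelDemo.
```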
cocosemframe

(* Generated semantic analyzer
   ===========================
   This module is produced by Coco from the semantic actions of an
   attributed grammar. *)
DEFINITION MODULE -->modulename;

VAR printactions: BOOLEAN;  (*trace of the executed semantic actions*)

PROCEDURE Semant (sem:CARDINAL);

END -->modulename.
-->implementation
(* Generated semantic analyzer
   ===========================
   This module is produced by Coco from the semantic actions of an
   attributed grammar. *)

IMPLEMENTATION MODULE -->modulename;
FROM FileIO IMPORT con, WriteCard, WriteString;
FROM SYSTEM IMPORT WORD;
FROM -->scannername IMPORT at;

-->declarations

PROCEDURE ASSIGN(VAR x:WORD; y:WORD);
BEGIN x:=y; END ASSIGN;

PROCEDURE Semant (sem:CARDINAL);
BEGIN
  (*IF printactions THEN
    WriteString(con,"$ [");
    WriteCard(con,sem,3);
    WriteString(con,"] ");
  END; *)
  CASE sem OF
  112;
-->actions
  END;
END Semant;

BEGIN
  printactions:=FALSE;
END -->modulename.
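To make the frame concrete: Coco copies it and replaces each -->name marker with generated text. In particular, -->actions becomes one CASE arm per semantic action number, as in cocosem above. The following module is a hypothetical, self-contained instance for a grammar with two semantic actions. The following are invented or simplified:

- the action bodies and the variables count and total;
- the call sequence in the module body;
- the FileIO trace code and the scanner import are omitted, so that the module compiles without Coco's library modules.

```modula2
MODULE SemantDemo;
(* Hypothetical instance of the semantic frame for a grammar with two
   semantic actions; the action bodies are invented for illustration. *)

VAR
  printactions: BOOLEAN;    (* trace switch, as in the frame *)
  count, total: CARDINAL;   (* invented "-->declarations" *)

PROCEDURE Semant(sem: CARDINAL);
BEGIN
  CASE sem OF               (* "-->actions": one arm per action number *)
    1: count := 0                         (* e.g. at the start of a list *)
  | 2: INC(count); INC(total, count)      (* e.g. after each list element *)
  END
END Semant;

BEGIN
  printactions := FALSE;
  total := 0;
  (* a parser would call Semant with the action numbers in parse order *)
  Semant(1); Semant(2); Semant(2); Semant(2);
  IF (count # 3) OR (total # 6) THEN HALT END
END SemantDemo.
```

Dispatching on a small integer keeps the parser tables independent of the action code: the parser only needs to store action numbers.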
Cross-reference list of identifiers in the frame: actions, ASSIGN, at, con, declarations, FileIO, implementation, modulename, printactions, scannername, sem, Semant, SYSTEM, WORD, WriteCard, WriteString, x, y.
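GetAnySets in cocosym.MOD, listed further on, handles every ANY node that has alternatives. It assigns the node the complement of those alternatives' start symbols, minus eofsy, so that ANY never matches a token another alternative could accept. A Symbolset stores terminal n as bit n MOD 16 of word n DIV 16, so the complement can be taken word by word.

The sketch below reproduces this for a toy range of terminals. The values maxt = 20 and the start symbols 2 and 5 are made up. The set constructor {0..15} is used instead of the book's VAL(BITSET,65535), so that the code does not depend on BITSET being exactly 16 bits wide.

```modula2
MODULE AnyDemo;
(* Sketch of the any-set computation of cocosym.GetAnySets for a toy
   terminal range; maxt and the sample start symbols are made up. *)

CONST
  maxt  = 20;    (* hypothetical number of the last terminal *)
  eofsy = 0;     (* as in cocosym *)

TYPE
  (* terminal n is bit n MOD 16 of word n DIV 16 *)
  Symbolset = ARRAY [0..maxt DIV 16] OF BITSET;

VAR
  first, anyset: Symbolset;
  i: CARDINAL;

PROCEDURE IsInSet(n: CARDINAL; VAR s: Symbolset): BOOLEAN;
BEGIN RETURN (n MOD 16) IN s[n DIV 16] END IsInSet;

BEGIN
  FOR i := 0 TO maxt DIV 16 DO first[i] := {} END;
  INCL(first[0], 2); INCL(first[0], 5);      (* other alternatives start with 2 or 5 *)
  FOR i := 0 TO maxt DIV 16 DO
    anyset[i] := {0..15} - first[i]          (* complement, 16 bits per word *)
  END;
  EXCL(anyset[eofsy DIV 16], eofsy MOD 16);  (* ANY must not match eofsy *)
  IF IsInSet(eofsy, anyset) OR IsInSet(2, anyset) OR IsInSet(5, anyset) THEN HALT END;
  IF NOT (IsInSet(1, anyset) AND IsInSet(20, anyset)) THEN HALT END
END AnyDemo.
```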
cocosym.DEF
(* cocosym   Symbol list for coco                              Moe 28.12.83

   This module
   a) generates and updates symbol nodes for terminals, pragmas and
      nonterminals
   b) searches names in the symbol list
   c) stores and retrieves attribute information
   d) stores and retrieves semantic macros
   e) marks deletable symbols in symbol list
   f) collects first-sets, follow-sets, eps-sets and any-sets
*)
DEFINITION MODULE cocosym;

CONST
  maxterminals = 128;

TYPE
  Direction    = (up,down);                  (*attribute direction*)
  Attributeptr = POINTER TO Attribute;
  Attribute    = RECORD
                   spix: CARDINAL;           (*name of attribute*)
                   dir:  Direction;          (*up,down*)
                   next: Attributeptr;       (*to next attribute of same nt*)
                 END;
  Symboltype   = (eps,t,pr,nt,any,err);
  Symbolnode   = RECORD
                   spix:      CARDINAL;      (*spelling index of symbol*)
                   aliasspix: CARDINAL;      (*spelling index of alias name*)
                   nra:       CARDINAL;      (*no.of attributes*)
                   CASE typ: Symboltype OF   (*type of symbol*)
                     pr:     sem1,sem2: CARDINAL;    (*pragma semantics*)
                   | nt,err: start:   CARDINAL;      (*start of top-down graph*)
                             del:     BOOLEAN;       (*TRUE if deletable*)
                             firstat: Attributeptr;  (*to first attribute node*)
                   END;
                 END;
  Symbolset    = ARRAY[0..maxterminals DIV 16] OF BITSET;

VAR
  maxany:   CARDINAL;   (*no.of any-sets*)
  maxeps:   CARDINAL;   (*no.of eps-follower-sets*)
  maxt:     CARDINAL;   (*no.of last terminal*)
  maxp:     CARDINAL;   (*no.of last pragma*)
  maxs:     CARDINAL;   (*no.of last nonterminal*)
  gramspix: CARDINAL;   (*grammar name, filled by .AG*)

PROCEDURE ClearSet (VAR s:Symbolset; n:CARDINAL);
  (* Clears set s*)

PROCEDURE CompleteAt (sy,nr:CARDINAL): BOOLEAN;
  (* Checks if symbol sy has nr attributes*)

PROCEDURE FindDelSymbols;
  (* Marks deletable nonterminals and prints them*)

PROCEDURE GetA(n:CARDINAL; VAR set:Symbolset);
  (* Gets the any-set with the number n*)

PROCEDURE GetAt (sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
  (* Gets the spelling index spix and the direction dir of the n-th
     attribute of the symbol sy*)

PROCEDURE GetE(n:CARDINAL; VAR set:Symbolset);
  (* Gets the eps-follower-set with the number n*)

PROCEDURE GetF (sy:CARDINAL; VAR first:Symbolset);
  (* Gets the set of terminal start symbols for the nonterminal sy*)

PROCEDURE GetFirstSet (loc:CARDINAL; VAR set:Symbolset);
  (* Gets the terminal start symbols of the graph with the root loc*)

PROCEDURE GetFo(sy:CARDINAL; VAR set:Symbolset);
  (* Gets followers of the nonterminal sy*)

PROCEDURE GetMacroNr(spix:CARDINAL; VAR sem:CARDINAL);
  (* Gets the number sem of the semantic action corresponding to the
     macro with the name spix*)

PROCEDURE GetSy(sy:CARDINAL; VAR sn:Symbolnode);
  (* Gets the symbol node with the index sy*)

PROCEDURE GetSymbolSets;
  (* Collects first-sets, follower-sets, eps-sets and any-sets*)

PROCEDURE IsInSet (n:CARDINAL; VAR s:Symbolset): BOOLEAN;
  (* TRUE if n is in set s*)

PROCEDURE NewAt (sy,spix:CARDINAL; dir:Direction);
  (* Enters a new attribute for the symbol sy with the spelling index
     spix and the direction dir*)

PROCEDURE NewMacro(spix,sem:CARDINAL; VAR ok: BOOLEAN);
  (* Enters a new semantic macro with the name spix and the action
     number sem*)

PROCEDURE NewSy (spix:CARDINAL; typ:Symboltype): CARDINAL;
  (* Generates a new symbol with the name spix and the type typ and
     returns its index*)

PROCEDURE RepSy (sy:CARDINAL; sn:Symbolnode);
  (* Replaces the symbol sy by the node sn*)

PROCEDURE SetBit (VAR s:Symbolset; n:CARDINAL);
  (* Sets bit n in set s*)

PROCEDURE Unit (VAR s1,s2:Symbolset; n:CARDINAL);
  (* Adds the set s2 to the set s1*)

PROCEDURE SyNr(spix:CARDINAL): CARDINAL;
  (* Gets the symbol number for the identifier with the name spix*)

END cocosym.
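NewMacro and GetMacroNr, declared above, keep semantic macros in a singly linked list with pointers to its first and last node (firstmacro and lastmacro in cocosym.MOD). A definition is appended only if its name spix is not yet in the list, and a lookup of an unknown name yields action number 0. The module below is a standalone sketch of that behaviour. It uses NEW with Storage.ALLOCATE where the implementation calls Allocate from its System module. The spelling indices and action numbers are made-up test values.

```modula2
MODULE MacroDemo;
(* Sketch of cocosym's macro list: new macros are appended at lastmacro,
   lookups search from firstmacro.  NEW/Storage.ALLOCATE stand in for the
   book's System.Allocate; the test values are made up. *)

FROM Storage IMPORT ALLOCATE;   (* required by NEW *)

TYPE
  Macroptr  = POINTER TO Macronode;
  Macronode = RECORD
                spix: CARDINAL;   (* name of semantic macro *)
                sem:  CARDINAL;   (* associated semantic action *)
                next: Macroptr    (* to next sem macro *)
              END;

VAR
  firstmacro, lastmacro: Macroptr;
  done: BOOLEAN;
  nr:   CARDINAL;

PROCEDURE GetMacroNr(spix: CARDINAL; VAR sem: CARDINAL);
VAR p: Macroptr;
BEGIN
  p := firstmacro;
  WHILE (p # NIL) AND (p^.spix # spix) DO p := p^.next END;
  IF p = NIL THEN sem := 0 ELSE sem := p^.sem END   (* 0 = unknown macro *)
END GetMacroNr;

PROCEDURE NewMacro(spix, sem: CARDINAL; VAR ok: BOOLEAN);
VAR p, s: Macroptr;
BEGIN
  p := firstmacro;
  WHILE (p # NIL) AND (p^.spix # spix) DO p := p^.next END;
  ok := (p = NIL);                    (* a name may be defined only once *)
  IF ok THEN
    NEW(s);
    s^.spix := spix; s^.sem := sem; s^.next := NIL;
    IF firstmacro = NIL THEN firstmacro := s ELSE lastmacro^.next := s END;
    lastmacro := s
  END
END NewMacro;

BEGIN
  firstmacro := NIL; lastmacro := NIL;
  NewMacro(7, 30, done); IF NOT done THEN HALT END;
  NewMacro(9, 31, done); IF NOT done THEN HALT END;
  NewMacro(7, 32, done); IF done THEN HALT END;     (* duplicate name rejected *)
  GetMacroNr(7, nr); IF nr # 30 THEN HALT END;      (* first definition kept *)
  GetMacroNr(9, nr); IF nr # 31 THEN HALT END;
  GetMacroNr(5, nr); IF nr # 0 THEN HALT END        (* unknown macro *)
END MacroDemo.
```

Keeping lastmacro makes each definition an O(1) append. The lookups stay linear, which is adequate for the handful of macros a grammar usually defines.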
cocosym.MOD
(* cocosym   Symbol list for coco                              Moe 29.12.83

   This module
   a) generates and updates symbol nodes for terminals, pragmas and
      nonterminals
   b) searches names in the symbol list
   c) stores and retrieves attribute information
   d) stores and retrieves semantic macros
   e) marks deletable symbols in symbol list
   f) collects first-sets, follow-sets, eps-sets and any-sets
*)
IMPLEMENTATION MODULE cocosym;

FROM cocogra IMPORT maxn, rootloc, ClearMarkList, Deletable, DelNode,
                    GetNode, Graphnode, Mark, Marked, Marklist, RepNode;
FROM cocolex IMPORT line, col, ddt, GetName;
FROM cocolst IMPORT lst;
FROM Errors  IMPORT CompErr, Restriction, SemErr;
FROM FileIO  IMPORT con, Write, WriteCard, WriteString, WriteText, WriteLn;
FROM System  IMPORT Allocate;
FROM SYSTEM  IMPORT VAL;

CONST
  anysetsize = 20;      (*max.no.of compl.-sets for any-symbols*)
  epssetsize = 70;      (*max.no.of eps-follower-sets*)
  maxsymbols = 200;     (*max.no.of symbols*)
  maxnt      = 80;      (*max.number of nonterminals*)
  null       = 65535;
  eofsy      = 0;

TYPE
  Anyset     = ARRAY[1..anysetsize] OF Symbolset;
  Epsset     = ARRAY[1..epssetsize] OF Symbolset;
  Followset  = ARRAY[0..maxnt-1] OF RECORD
                 ts:  Symbolset;     (*terminal symbols*)
                 nts: Symbolset;     (*nts whose start set is to be added*)
               END;
  Firstset   = ARRAY[0..maxnt-1] OF RECORD
                 ts:    Symbolset;   (*terminal symbols*)
                 ready: BOOLEAN;     (*TRUE if ts is complete*)
               END;
  Macroptr   = POINTER TO Macronode;
  Macronode  = RECORD
                 spix: CARDINAL;     (*name of semantic macro*)
                 sem:  CARDINAL;     (*associated semantic action*)
                 next: Macroptr;     (*to next sem macro*)
               END;
  Symbollist = ARRAY[0..maxsymbols] OF Symbolnode;

VAR
  anyset:     Anyset;       (*actual no.of any-sets*)
  column:     CARDINAL;     (*printing column for terminal sets*)
  epsset:     Epsset;       (*actual no.of eps-sets*)
  first:      Firstset;     (*terminal start symbols*)
  firstmacro: Macroptr;     (*first sem macro*)
  fnt:        CARDINAL;     (*no.of first nonterminal*)
  follow:     Followset;    (*terminal successors*)
  lastmacro:  Macroptr;     (*last sem macro*)
  sn:         Symbollist;   (*symbol list*)

PROCEDURE AllBit (VAR s:Symbolset); FORWARD;
PROCEDURE DelBit (VAR s:Symbolset; n:CARDINAL); FORWARD;
PROCEDURE PrintSet (VAR s:Symbolset; n:CARDINAL); FORWARD;
PROCEDURE PutNt (sy:CARDINAL); FORWARD;
PROCEDURE PutTermSet (VAR s:Symbolset); FORWARD;


(* CompleteAt       Test if nr is the correct no.of attributes
   --------------------------------------------------------------------*)
PROCEDURE CompleteAt (sy,nr:CARDINAL): BOOLEAN;
BEGIN
  RETURN (sn[sy].nra=nr) OR (sn[sy].typ=err);
END CompleteAt;


(* FindDelSymbols   Find all deletable symbols and print them
   --------------------------------------------------------------------*)
PROCEDURE FindDelSymbols;
VAR
  change: BOOLEAN;
  dummy:  CARDINAL;
  first:  BOOLEAN;
  i,l:    CARDINAL;
  name:   ARRAY [1..50] OF CHAR;
  sn:     Symbolnode;
BEGIN
  fnt:=maxp+1;
  REPEAT                                 (*while new deletable symbols*)
    change:=FALSE;
    FOR i:=maxp+1 TO maxs DO
      GetSy(i,sn);
      IF (NOT sn.del) AND (sn.start<>0) AND Deletable(sn.start) THEN
        sn.del:=TRUE; RepSy(i,sn); change:=TRUE;
      END;
    END;
  UNTIL NOT change;
  first:=TRUE;
  FOR i:=maxp+1 TO maxs DO               (*print deletable symbols*)
    GetSy(i,sn);
    IF sn.del THEN
      IF first THEN
        WriteLn(lst); WriteLn(lst);
        WriteString(lst,"Deletable symbols:"); WriteLn(lst);
        first:=FALSE;
      END;
      GetName(sn.spix,name,l);
      WriteString(lst,"     "); WriteText(lst,name,l); WriteLn(lst);
    END;
  END;
  IF first THEN
    WriteLn(lst); WriteLn(lst);
    WriteString(lst,"Grammar contains no deletable symbols."); WriteLn(lst);
  END;
END FindDelSymbols;


(* GetA             Returns the any-set with the number nr
   --------------------------------------------------------------------*)
PROCEDURE GetA(nr:CARDINAL; VAR s:Symbolset);
BEGIN s:=anyset[nr]; END GetA;


(* GetAnySets       Find the complement sets for any-nodes
   --------------------------------------------------------------------*)
PROCEDURE GetAnySets;
VAR
  gn:    Graphnode;
  loc,i: CARDINAL;
  s:     Symbolset;
BEGIN (*GetAnySets*)
  FOR loc:=1 TO maxn DO
    GetNode(loc,gn);
    IF (gn.typ=any) AND (gn.lp<>0) THEN     (*any with alternatives*)
      GetFirstSet(gn.lp,s);
      FOR i:=0 TO maxt DIV 16 DO            (*make complement*)
        s[i]:=VAL(BITSET,65535)-s[i];
      END;
      DelBit(s,eofsy);                      (*any must not recognize eofsy*)
      INC(maxany); anyset[maxany]:=s;
      gn.sp:=maxany; RepNode(loc,gn);
    END;
  END;
END GetAnySets;


(* GetAt            Get name and direction of an attribute
   --------------------------------------------------------------------*)
PROCEDURE GetAt(sy,nr:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
VAR
  i: CARDINAL;
  p: Attributeptr;
BEGIN
  IF (sn[sy].typ<>nt) AND (sn[sy].typ<>err) THEN CompErr(3); END;
  IF (nr>sn[sy].nra) OR (sn[sy].typ=err)
    THEN spix:=0; dir:=down;                (*semantic error*)
    ELSE
      p:=sn[sy].firstat;
      FOR i:=1 TO nr-1 DO p:=p^.next; END;
      spix:=p^.spix; dir:=p^.dir;
  END;
END GetAt;


(* GetE             Returns the eps-set with the number nr
   --------------------------------------------------------------------*)
PROCEDURE GetE(nr:CARDINAL; VAR s:Symbolset);
BEGIN s:=epsset[nr]; END GetE;


(* GetEpsSets       Find the follower symbols for eps-nodes
   --------------------------------------------------------------------*)
PROCEDURE GetEpsSets;
VAR
  curnt: CARDINAL;
  m:     Marklist;
  sn:    Symbolnode;

  PROCEDURE FindEpsFollowers (loc,leftsy:CARDINAL; VAR nr:CARDINAL);
  VAR s: Symbolset;
  BEGIN
    GetFirstSet(loc,s);
    IF Deletable(loc) THEN Unit(s,follow[leftsy-fnt].ts,maxt); END;
    INC(maxeps); epsset[maxeps]:=s;
    nr:=maxeps;
  END FindEpsFollowers;

  PROCEDURE FindEps (loc,leftsy:CARDINAL; vialp:BOOLEAN);
  VAR
    gn: Graphnode;
    nr: CARDINAL;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    Mark(loc,m);
    GetNode(loc,gn);
    WITH gn DO
      IF (typ=eps) AND (vialp OR (lp<>0)) THEN
        FindEpsFollowers(rp,leftsy,nr);
        sp:=nr; RepNode(loc,gn);
      END;
      IF lp<>0 THEN FindEps(lp,leftsy,TRUE); END;
      IF rp<>0 THEN FindEps(rp,leftsy,FALSE); END;
    END;
  END FindEps;

BEGIN (*GetEpsSets*)
  ClearMarkList(m);
  FOR curnt:=maxp+1 TO maxs DO
    GetSy(curnt,sn);
    FindEps(sn.start,curnt,FALSE);
  END;
END GetEpsSets;


(* GetF             Returns the terminal start symbols of sy
   --------------------------------------------------------------------*)
PROCEDURE GetF (sy:CARDINAL; VAR s:Symbolset);
BEGIN s:=first[sy-fnt].ts; END GetF;


(* GetFirstSet      Gets the terminal start symbols in loc
   --------------------------------------------------------------------*)
PROCEDURE GetFirstSet (loc:CARDINAL; VAR set:Symbolset);
VAR m: Marklist;                      (*mark list of the visited graph nodes*)

  PROCEDURE CollectFirstSet (loc:CARDINAL; VAR set:Symbolset);
  VAR
    gn: Graphnode;
    sn: Symbolnode;
    s1: Symbolset;
  BEGIN
    ClearSet(set,maxt);
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    WHILE loc<>0 DO                   (*for all alternatives*)
      Mark(loc,m);
      GetNode(loc,gn);
      IF ddt["G"] THEN
        WriteString(con,"CollectFirstSet:");
        WriteCard(con,loc,6); WriteCard(con,ORD(gn.typ),6);
        WriteCard(con,gn.sp,6); WriteLn(con);
      END;
      IF DelNode(gn) THEN
        CollectFirstSet(gn.rp,s1);
        Unit(set,s1,maxt);
      END;
      CASE gn.typ OF
        eps: ;
      | t:   SetBit(set,gn.sp);
      | nt:  IF first[gn.sp-fnt].ready THEN
               Unit(set,first[gn.sp-fnt].ts,maxt);
             ELSE
               GetSy(gn.sp,sn);
               CollectFirstSet(sn.start,s1);
               Unit(set,s1,maxt);
             END;
      | any: AllBit(set);
      END; (*CASE*)
      loc:=gn.lp;
    END; (*WHILE*)
  END CollectFirstSet;

BEGIN (*GetFirstSet*)
  ClearMarkList(m);
  CollectFirstSet(loc,set);
  IF ddt["H"] THEN
    WriteString(con,"GetFirstSet:"); PrintSet(set,maxt);
  END;
END GetFirstSet;


(* GetFollowSets    Get terminal successors of nonterminals
   --------------------------------------------------------------------*)
PROCEDURE GetFollowSets;
VAR
  change: BOOLEAN;
  i,n,n1: CARDINAL;
  m:      Marklist;
  sn:     Symbolnode;

  PROCEDURE CollectFollowSets (loc,sym:CARDINAL);
  VAR
    gn:  Graphnode;
    set: Symbolset;
  BEGIN
    WHILE loc<>0 DO                   (*step through alternative chain*)
      IF Marked(loc,m) THEN RETURN; END;   (*cycle*)
      Mark(loc,m);
      GetNode(loc,gn);
      WITH gn DO
        IF ddt["J"] THEN
          WriteString(con,"CollectFollowSets ");
          WriteCard(con,loc,6); WriteCard(con,sp,6);
          WriteCard(con,sym,6); WriteLn(con);
        END;
        IF typ=nt THEN
          GetFirstSet(rp,set);
          Unit(follow[sp-fnt].ts,set,maxt);
          IF Deletable(rp) THEN
            SetBit(follow[sp-fnt].nts,sym-fnt);
          END;
          IF ddt["I"] THEN
            WriteString(con,"CollectFollowSets:");
            WriteCard(con,loc,6);
            WriteString(con,"$ "); PrintSet(follow[sp-fnt].ts,maxt);
            WriteString(con,"$ "); PrintSet(follow[sp-fnt].nts,maxs-maxp);
            WriteLn(con);
          END;
        END; (*IF typ=nt*)
        CollectFollowSets(rp,sym);
        loc:=lp;
      END; (*WITH*)
    END; (*WHILE*)
  END CollectFollowSets;

  PROCEDURE Complete (i:CARDINAL);    (*add indirect successors of*)
  VAR j: CARDINAL;                    (*i+fnt to follow[i].ts*)
  BEGIN
    IF Marked(i,m) THEN RETURN; END;  (*already visited*)
    Mark(i,m);
    FOR j:=0 TO maxs-fnt DO
      IF IsInSet(j,follow[i].nts) THEN
        Complete(j);
        Unit(follow[i].ts,follow[j].ts,maxt);
      END;
    END;
  END Complete;

BEGIN (*GetFollowSets*)
  FOR i:=fnt TO maxs DO
    ClearSet(follow[i-fnt].ts,maxt);
    ClearSet(follow[i-fnt].nts,maxs-fnt);
  END;

  ClearMarkList(m);
  FOR i:=fnt TO maxs DO               (*get direct successors of nonterminals*)
    GetSy(i,sn);
    IF ddt["I"] THEN
      WriteString(con,"GetFollowSets (0) :"); WriteCard(con,sn.start,6);
      WriteCard(con,i,6); WriteLn(con);
    END;
    CollectFollowSets(sn.start,i);
  END;
  CollectFollowSets(rootloc,maxs+1);  (*successors of grammar symbol*)

  FOR i:=0 TO maxs-fnt DO             (*add indirect successors to follow.ts*)
    ClearMarkList(m);
    Complete(i);
    ClearSet(follow[i].nts,maxt);
  END;

  IF ddt["I"] THEN
    WriteString(con,"GetFollowSets (3) :$");
    FOR i:=0 TO maxs-fnt DO
      WriteCard(con,fnt+i,6); PrintSet(follow[i].ts,maxt);
      WriteLn(con);
    END;
  END;
END GetFollowSets;


(* GetFo            Get follow-set of nonterminal sy
   --------------------------------------------------------------------*)
PROCEDURE GetFo(sy:CARDINAL; VAR set:Symbolset);
BEGIN set:=follow[sy-fnt].ts; END GetFo;


(* GetMacroNr       Get semantic macro
   --------------------------------------------------------------------*)
PROCEDURE GetMacroNr (spix:CARDINAL; VAR sem:CARDINAL);
VAR p: Macroptr;
BEGIN
  p:=firstmacro;
  WHILE (p<>NIL) AND (p^.spix<>spix) DO p:=p^.next; END;
  IF p=NIL THEN sem:=0; ELSE sem:=p^.sem; END;
END GetMacroNr;


(* GetSy            Gets the symbol sy
   --------------------------------------------------------------------*)
PROCEDURE GetSy(sy:CARDINAL; VAR sn1:Symbolnode);
BEGIN sn1:=sn[sy]; END GetSy;


(* GetSymbolSets    Get first-sets, follower-sets, eps-sets and any-sets
   --------------------------------------------------------------------*)
PROCEDURE GetSymbolSets;
VAR
  i:  CARDINAL;
  sn: Symbolnode;
BEGIN
  fnt:=maxp+1;
  FOR i:=0 TO maxs-fnt DO first[i].ready:=FALSE; END;
  FOR i:=fnt TO maxs DO
    GetSy(i,sn);
    GetFirstSet(sn.start,first[i-fnt].ts);
    first[i-fnt].ready:=TRUE;
  END;
  GetFollowSets;
  GetEpsSets;
  GetAnySets;
  IF ddt["K"] THEN                    (*print first-sets and follow-sets*)
    WriteLn(lst);
    WriteString(lst,"List of terminal start symbols:"); WriteLn(lst);
    FOR i:=fnt TO maxs DO
      PutNt(i); PutTermSet(first[i-fnt].ts);
    END;
    WriteLn(lst); WriteLn(lst);
    WriteString(lst,"List of terminal successors:"); WriteLn(lst);
    FOR i:=fnt TO maxs DO
      PutNt(i); PutTermSet(follow[i-fnt].ts);
    END;
  END;
END GetSymbolSets;


(* NewAt            Enter new attribute for a symbol
   --------------------------------------------------------------------*)
PROCEDURE NewAt (sy,spx:CARDINAL; dir:Direction);
VAR
  i:    CARDINAL;
  p,at: Attributeptr;
BEGIN
  WITH sn[sy] DO
    INC(nra);
    IF typ=nt THEN                    (*store name and direction*)
      Allocate(at,SIZE(Attribute));
      at^.spix:=spx; at^.dir:=dir; at^.next:=NIL;
      IF firstat=NIL
        THEN firstat:=at;
        ELSE
          p:=firstat;
          WHILE p^.next<>NIL DO p:=p^.next END;
          p^.next:=at;
      END;
    END;
  END;
END NewAt;


(* NewMacro         Enter new semantic macro
   --------------------------------------------------------------------*)
PROCEDURE NewMacro(spix,sem:CARDINAL; VAR ok:BOOLEAN);
VAR p,s: Macroptr;
BEGIN
  p:=firstmacro;
  WHILE (p<>NIL) AND (p^.spix<>spix) DO p:=p^.next; END;
  IF p=NIL THEN
    ok:=TRUE;
    Allocate(s,SIZE(Macronode));
    s^.spix:=spix; s^.sem:=sem; s^.next:=NIL;
    IF firstmacro=NIL
      THEN firstmacro:=s; lastmacro:=s;
      ELSE lastmacro^.next:=s; lastmacro:=s;
    END;
  ELSE ok:=FALSE;
  END;
END NewMacro;


(* NewSy            Generate a new symbol and return index
   --------------------------------------------------------------------*)
PROCEDURE NewSy (spx:CARDINAL; tp:Symboltype): CARDINAL;
VAR i: CARDINAL;
BEGIN
  IF maxs=null THEN maxs:=0; ELSE INC(maxs); END;
  IF maxs>=maxsymbols THEN Restriction(6); END;
  WITH sn[maxs] DO
    typ:=tp; spix:=spx; aliasspix:=spix; nra:=0;
    CASE typ OF
      t:      IF maxt=null THEN maxt:=0; ELSE INC(maxt); END;
              IF maxp=null THEN maxp:=0; ELSE INC(maxp); END;
              IF maxt>=maxterminals THEN Restriction(7); END;
    | pr:     IF maxp=null
                THEN SemErr(25,line,col); maxp:=0; maxt:=0;
                ELSE INC(maxp);
              END;
              sem1:=0; sem2:=0;
    | nt,err: start:=0; del:=FALSE; firstat:=NIL;
    END; (*CASE*)
  END; (*WITH*)
  RETURN maxs;
END NewSy;


(* RepSy            Replace symbol sy
   --------------------------------------------------------------------*)
PROCEDURE RepSy(sy:CARDINAL; sn1:Symbolnode);
BEGIN sn[sy]:=sn1; END RepSy;


(* SyNr             Gets index of name spix
   --------------------------------------------------------------------*)
PROCEDURE SyNr(spix:CARDINAL): CARDINAL;
VAR i: CARDINAL;
BEGIN
  IF maxs=null THEN RETURN null; END;
  i:=0;
  WHILE (i<=maxs) AND (sn[i].spix<>spix) DO INC(i); END;
  IF i>maxs THEN i:=null; END;
  RETURN i;
END SyNr;


(* AllBit           Set all bits in set s
   --------------------------------------------------------------------*)
PROCEDURE AllBit (VAR s:Symbolset);
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO maxterminals DIV 16 DO s[i]:=VAL(BITSET,65535); END;
END AllBit;


(* ClearSet         Clears set s
   --------------------------------------------------------------------*)
PROCEDURE ClearSet (VAR s:Symbolset; n:CARDINAL);
VAR i: CARDINAL;
BEGIN FOR i:=0 TO n DIV 16 DO s[i]:={}; END; END ClearSet;


(* DelBit           Deletes bit n in set s
   --------------------------------------------------------------------*)
PROCEDURE DelBit (VAR s:Symbolset; n:CARDINAL);
BEGIN EXCL(s[n DIV 16],n MOD 16); END DelBit;


(* Empty            TRUE if set s is empty
   --------------------------------------------------------------------*)
PROCEDURE Empty(VAR s:Symbolset; n:CARDINAL): BOOLEAN;
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO n DIV 16 DO
    IF s[i]<>{} THEN RETURN FALSE; END;
  END;
  RETURN TRUE;
END Empty;


(* InSet            TRUE if s1 <= s2
   --------------------------------------------------------------------*)
PROCEDURE InSet (VAR s1,s2:Symbolset; n:CARDINAL): BOOLEAN;
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO n DIV 16 DO
    IF NOT (s1[i]<=s2[i]) THEN RETURN FALSE; END;
  END;
  RETURN TRUE;
END InSet;


(* IsInSet          TRUE if n is in set s
   --------------------------------------------------------------------*)
PROCEDURE IsInSet (n:CARDINAL; VAR s:Symbolset): BOOLEAN;
BEGIN RETURN (n MOD 16) IN s[n DIV 16]; END IsInSet;


(* PrintSet         ddt output of set s
   --------------------------------------------------------------------*)
PROCEDURE PrintSet (VAR s:Symbolset; n:CARDINAL);
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO n DIV 16 DO
    WriteCard(con,VAL(CARDINAL,s[i]) DIV 256,4);
    WriteCard(con,VAL(CARDINAL,s[i]) MOD 256,4);
  END;
END PrintSet;


(* PutNt            Print name of nonterminal sy
   --------------------------------------------------------------------*)
PROCEDURE PutNt (sy:CARDINAL);
VAR
  l:    CARDINAL;
  name: ARRAY[1..50] OF CHAR;
  sn:   Symbolnode;
BEGIN
  GetSy(sy,sn);
  GetName(sn.spix,name,l);
  WHILE l<12 DO INC(l); name[l]:=" "; END;
  WriteLn(lst); WriteString(lst,"  "); WriteText(lst,name,l);
  Write(lst," ");
  column:=15;
END PutNt;


(* PutTermSet       Print names of terminals in set s
   --------------------------------------------------------------------*)
PROCEDURE PutTermSet (VAR s:Symbolset);
CONST maxlinelen = 72;
VAR
  i,l:  CARDINAL;
  name: ARRAY[1..50] OF CHAR;
  sn:   Symbolnode;
BEGIN
  FOR i:=0 TO maxt DO
    IF IsInSet(i,s) THEN
      GetSy(i,sn); GetName(sn.spix,name,l);
      IF column+l>maxlinelen THEN
        WriteLn(lst); WriteString(lst,"               ");
        column:=15;
      END;
      WriteText(lst,name,l); WriteString(lst,"  ");
      INC(column,l+2);
    END; (*IF IsInSet*)
  END; (*FOR*)
  WriteLn(lst);
END PutTermSet;


(* SetBit           Sets bit n in set s
   --------------------------------------------------------------------*)
PROCEDURE SetBit (VAR s:Symbolset; n:CARDINAL);
BEGIN INCL(s[n DIV 16],n MOD 16); END SetBit;


(* Unit             s1 = s1 + s2
   --------------------------------------------------------------------*)
PROCEDURE Unit (VAR s1,s2:Symbolset; n:CARDINAL);
VAR i: CARDINAL;
BEGIN FOR i:=0 TO n DIV 16 DO s1[i]:=s1[i]+s2[i]; END; END Unit;


BEGIN (*cocosym*)
  maxt:=null; maxp:=null; maxs:=null;
  firstmacro:=NIL;
  maxany:=0; maxeps:=0;
END cocosym.
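GetFollowSets works in two phases. First, CollectFollowSets records two things for every nonterminal:

- its direct terminal successors, in follow[i].ts;
- the nonterminals whose follower sets must be added because the rest of their production is deletable, in follow[i].nts.

Complete then closes these sets transitively with a depth-first walk. The walk is guarded by a mark list that is cleared before each start node.

The module below sketches the second phase for three nonterminals whose dependencies form a cycle. The following are assumptions: the sizes, the sample data, and the BOOLEAN array standing in for cocogra's Marklist. As in the original code, BITSET is assumed to hold at least 16 bits.

```modula2
MODULE FollowDemo;
(* Toy version of cocosym's set layout and of the Complete step of
   GetFollowSets; the sizes and the sample data are made up. *)

CONST
  maxterminals = 40;   (* hypothetical terminal range for the sketch *)
  nnt = 3;             (* number of nonterminals *)

TYPE
  Symbolset = ARRAY [0..maxterminals DIV 16] OF BITSET;

VAR
  ts, nts: ARRAY [0..nnt-1] OF Symbolset;  (* like follow[i].ts and .nts *)
  marked: ARRAY [0..nnt-1] OF BOOLEAN;     (* stands in for a Marklist *)
  i, j: CARDINAL;

PROCEDURE ClearSet(VAR s: Symbolset);
VAR k: CARDINAL;
BEGIN
  FOR k := 0 TO maxterminals DIV 16 DO s[k] := {} END
END ClearSet;

PROCEDURE SetBit(VAR s: Symbolset; n: CARDINAL);
BEGIN INCL(s[n DIV 16], n MOD 16) END SetBit;

PROCEDURE IsInSet(n: CARDINAL; VAR s: Symbolset): BOOLEAN;
BEGIN RETURN (n MOD 16) IN s[n DIV 16] END IsInSet;

PROCEDURE Unit(VAR s1, s2: Symbolset);     (* s1 := s1 + s2 *)
VAR k: CARDINAL;
BEGIN
  FOR k := 0 TO maxterminals DIV 16 DO s1[k] := s1[k] + s2[k] END
END Unit;

(* Add to ts[i] the follower terminals of every nonterminal recorded in
   nts[i], completing those first; marked[] stops the recursion on cycles. *)
PROCEDURE Complete(i: CARDINAL);
VAR j: CARDINAL;
BEGIN
  IF marked[i] THEN RETURN END;
  marked[i] := TRUE;
  FOR j := 0 TO nnt-1 DO
    IF IsInSet(j, nts[i]) THEN
      Complete(j);
      Unit(ts[i], ts[j])
    END
  END
END Complete;

BEGIN
  FOR i := 0 TO nnt-1 DO ClearSet(ts[i]); ClearSet(nts[i]) END;
  SetBit(ts[0], 3); SetBit(ts[1], 17); SetBit(ts[2], 33);  (* direct successors *)
  SetBit(nts[0], 1);   (* follow(0) must include follow(1) *)
  SetBit(nts[1], 2);   (* follow(1) must include follow(2) *)
  SetBit(nts[2], 0);   (* follow(2) must include follow(0): a cycle *)
  FOR i := 0 TO nnt-1 DO
    FOR j := 0 TO nnt-1 DO marked[j] := FALSE END;   (* like ClearMarkList *)
    Complete(i);
    ClearSet(nts[i])
  END;
  FOR i := 0 TO nnt-1 DO
    IF NOT (IsInSet(3, ts[i]) AND IsInSet(17, ts[i]) AND IsInSet(33, ts[i])) THEN
      HALT
    END
  END
END FollowDemo.
```

Each start node gets a fresh mark list, and each nts set is cleared once its node has been completed. Together these let the cycle 0, 1, 2, 0 still end with all three follower sets equal to {3, 17, 33}.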
50
anyset
anysetsize
at
Attribute
505512082139
230231
419
424
425
424
Attributeptr
change
151
419
188779129437
ClearMarkList
ClearSet
cocogra
cocolex
cocolst
13
231
is
15
16
206
328
260
329
425
332
346
425
427
272
344
516
518
cocosym
col
1272625
15 474
CollectFirstSet
225
242
251
257
261
CollectFollowSets
am
RNP
ae)
column
S158 lee 597225997.602
CompErr
17153
Complete
3135 320532452345
CompleteAt
u
7p
SV
430
END
Unit;
con
curnt
ddt
del
DelBit
Deletable
DelNode
dir
Direction
down
dummy
Empty
eofsy
eps
Epsset
epsset
epssetsize
err
Errors
EXCL
FileIo
FindDelSymbols
FindEps
FindEpsFollowers
first
firstat
firstmacro
Firstset
£nt
follow
Followset
FORWARD
GetA
GetAnySets
GetAt
GetE
GetEpsSets
GetF
GetFirstSet
GetFo
GetFollowSets
GetMacroNr
GetName
GetNode
GetSy
GetSymbolSets
gn
Graphnode
1
App. F
Program listings
(* General table-driven syntax analyzer

   This is a parser module generated by Coco from an attributed grammar.
   Before calling the procedure Parse from the main program, initialize
   the scanner (<grammarname>lex.MOD).
*)
DEFINITION MODULE cocosyn;
VAR
  printinput: BOOLEAN;  (*trace the input tokens read*)
  printnodes: BOOLEAN;  (*trace the G-code interpretation*)

PROCEDURE Parse(VAR correct: BOOLEAN);
END cocosyn.

cocosyn.MOD
(* General table-driven syntax analyzer
   ============================================================
   Moe 21.12.83
   01 (21.12.83) First version (rewritten from PL/M)
   02 (28.02.84) New interface for input and errors
   03 (02.04.84) Error in EOL-processing corrected
   04 (08.05.84) New EOL-processing
   05 (23.07.84) For G-code
   06 (30.08.84) Error recovery simplified
   07 (05.04.85) New G-code instruction EPSA (ANYA modified)
   08 (12.04.87) Grammar tables initialized INLINE
   09 (12.04.87) typ,col,line and at exported by cocolex
   10 (07.06.87) Name of error module and scanner procedure constant
   ------------------------------------------------------------*)
IMPLEMENTATION MODULE cocosyn;

FROM Errors  IMPORT SyntaxError, Errorptr, Errornode;
FROM FileIo  IMPORT con, WriteCard, WriteLn, WriteString;
FROM System  IMPORT Allocate;
FROM SYSTEM  IMPORT ADDRESS, ADR, INLINE;
FROM cocosem IMPORT Semant;
FROM cocolex IMPORT GetSy, typ, at, line, col;

CONST
  maxname  = 385;   maxt    = 34;
  maxnamep = 45;    maxp    = 34;
  maxcode  = 401;   maxs    = 45;
  maxany   = 3;     startpc = 397;
  maxeps   = 10;

CONST (*G-code instructions*)
  t   = 0;   ta   = 1;   nt  = 2;   nta  = 3;
  nts = 4;   ntas = 5;   any = 6;   anya = 7;
  eps = 8;   epsa = 9;   jmp = 10;  ret  = 11;

  errdistmin = 2;   (*min. distance between two errors*)
  lmaxs      = 50;  (*max. stack length*)
  eofsy      = 0;   (*token number of endfile symbol*)

TYPE
  Attributenumbers = ARRAY[0..maxp] OF CARDINAL;
  Namepointers     = ARRAY[0..maxnamep] OF CARDINAL;
  Namelist         = ARRAY[1..maxname] OF CHAR;
  Pragma           = RECORD                 (*semantics for a pragma*)
                       sem2, sem3: CARDINAL;
                     END;
  Pragmalist       = ARRAY[maxt..maxp] OF Pragma;
  Symbolset        = ARRAY[0..maxt DIV 16] OF BITSET;   (*set of terminals*)
  Symbolnode       = RECORD                 (*symbol information (only for nt)*)
                       startpc: CARDINAL;   (*start node of rule for nt*)
                       del:     BOOLEAN;    (*TRUE, if nt is deletable*)
                       first:   Symbolset;  (*terminals causing this nt to be analyzed*)
                     END;
  Symbollist       = ARRAY[maxp+1..maxs] OF Symbolnode;
  Stack            = ARRAY[1..lmaxs] OF CARDINAL;

VAR
  tab: POINTER TO RECORD                        (*grammar tables*)
         header:    ARRAY[1..8] OF CARDINAL;     (*not used*)
         code:      ARRAY[1..maxcode] OF CHAR;   (*G-code area*)
         ntsymbols: Symbollist;                  (*nonterminals information*)
         epsset:    ARRAY[1..maxeps] OF Symbolset;
         anyset:    ARRAY[1..maxany] OF Symbolset;
         nra:       Attributenumbers;            (*no. of attributes*)
         ps:        Pragmalist;                  (*semantics for pragmas*)
         namep:     Namepointers;                (*pointers to symbol names*)
         name:      Namelist;                    (*symbol names*)
       END;
  correct:  BOOLEAN;                             (*error indicator*)
  pc:       CARDINAL;                            (*program counter*)
  errdist:  CARDINAL;                            (*current error distance*)
  newlacts: ARRAY[0..maxt] OF CARDINAL;          (*new stack length*)
  newpc:    ARRAY[0..maxt] OF CARDINAL;          (*pc after recovery*)
  s, olds:  Stack;
  lacts:    CARDINAL;                            (*stack pointer*)


PROCEDURE GetSymInstr(pc: CARDINAL;
                      VAR opcode, sy, nextpc, altpc: CARDINAL); FORWARD;
PROCEDURE RestoreStack; FORWARD;
PROCEDURE SaveStack; FORWARD;
PROCEDURE StackElem(i: CARDINAL): CARDINAL; FORWARD;
PROCEDURE Triple(altroot: CARDINAL); FORWARD;


(* Match         Check if sy is member of the specified set
------------------------------------------------------------*)
PROCEDURE Match(sy: CARDINAL; set: Symbolset): BOOLEAN;
BEGIN RETURN (sy MOD 16) IN set[sy DIV 16]; END Match;
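
Match splits a terminal number into a word index (sy DIV 16) and a bit index (sy MOD 16). The Python sketch below mirrors that arithmetic on a list of 16-bit integers. It assumes that bit i of word w stands for terminal 16*w + i counted from the least significant bit; Modula-2 leaves the bit order of BITSET to the implementation, so this is an assumption, and so is reading the sample words (the first set listed under "any sets" in TableContents) that way.

```python
def match(sy, symset):
    """Return True if terminal number sy is an element of symset.

    symset is a list of 16-bit words, like ARRAY[0..maxt DIV 16] OF BITSET.
    Assumption: bit 0 is the least significant bit of each word.
    """
    word = symset[sy // 16]              # sy DIV 16 selects the word
    return bool((word >> (sy % 16)) & 1)  # sy MOD 16 selects the bit


any_set = [65022, 65534, 65535]          # 0xFDFE, 0xFFFE, 0xFFFF
print(match(0, any_set), match(1, any_set))   # False True
```

Under this reading the set excludes terminal 0 (eofsy), which is what one would expect of an ANY set.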


(* NextSym       Get next symbol
------------------------------------------------------------*)
PROCEDURE NextSym;
BEGIN
  LOOP
    GetSy;
    (*IF printinput THEN
        WriteString(con,"$(in:"); WriteCard(con,typ,3); WriteString(con,") ");
        IF printnodes THEN
          WriteCard(con,lacts,3); WriteString(con,"| ");
        END;
      END; *)
    IF typ <= maxt THEN RETURN END;
    WITH tab^ DO
      IF correct AND (ps[typ].sem2 <> 0) THEN Semant(ps[typ].sem2); END;
      IF correct AND (ps[typ].sem3 <> 0) THEN Semant(ps[typ].sem3); END;
    END;
    IF typ = eofsy THEN RETURN END;
  END;
END NextSym;
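
NextSym keeps calling the scanner until it obtains a token number no greater than maxt; larger numbers denote pragmas, whose semantic actions sem2 and sem3 are executed only while the parse is still correct. A minimal Python sketch of that loop follows; the token iterator, the pragma table and the callback are hypothetical stand-ins for GetSy, ps and Semant.

```python
def next_sym(tokens, maxt, pragma_sems, correct, run_sem):
    """Return the next terminal token, executing pragmas on the way.

    tokens      -- an iterator of token numbers (stands in for GetSy/typ)
    pragma_sems -- dict: pragma number -> (sem2, sem3); 0 means 'none'
    run_sem     -- callback standing in for Semant
    """
    for typ in tokens:
        if typ <= maxt:
            return typ                    # ordinary terminal
        sem2, sem3 = pragma_sems[typ]     # typ > maxt: a pragma
        if correct and sem2 != 0:
            run_sem(sem2)
        if correct and sem3 != 0:
            run_sem(sem3)
    raise EOFError("token stream ended without an end-of-file token")


calls = []
toks = iter([40, 5, 41, 0])
print(next_sym(toks, 34, {40: (7, 0), 41: (8, 9)}, True, calls.append))  # 5
```

Because the pragma is consumed inside the loop, the parser proper never sees token numbers above maxt.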


(*================================ ERRORS ================================*)

(* AdjustPc      Adjust pc to next symbol instruction
------------------------------------------------------------*)
PROCEDURE AdjustPc(VAR pc: CARDINAL);
BEGIN
  WITH tab^ DO
    IF pc = 0 THEN RETURN; END;
    LOOP
      CASE ORD(code[pc]) OF
        t, ta, nt, nta, nts, ntas, any, anya, eps, epsa: EXIT;
      | jmp: pc := 256*ORD(code[pc+1]) + ORD(code[pc+2]);
      | ret: pc := 0; EXIT;
      ELSE INC(pc);  (*sem*)
      END;
    END;
  END;
END AdjustPc;
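
AdjustPc skips one-byte semantic instructions and follows jumps until it reaches a symbol instruction; a RET means the alternative has no further symbol, which it signals with pc = 0. The Python sketch below models the G-code as a list of small integers (index 0 unused, as in ARRAY[1..maxcode]); the opcode numbers are those of the CONST section, and treating every opcode above ret as a one-byte semantic action is taken from the ELSE branch.

```python
# Opcode numbers from the CONST section of cocosyn.MOD
T, TA, NT, NTA, NTS, NTAS, ANY, ANYA, EPS, EPSA, JMP, RET = range(12)


def adjust_pc(code, pc):
    """Advance pc to the next symbol instruction; return 0 at a RET."""
    if pc == 0:
        return 0
    while True:
        op = code[pc]
        if op <= EPSA:                   # t ... epsa: a symbol instruction
            return pc
        if op == JMP:                    # two-byte target, high byte first
            pc = 256 * code[pc + 1] + code[pc + 2]
        elif op == RET:                  # end of rule: no successor
            return 0
        else:                            # semantic action: one byte
            pc += 1


code = [0, 12, 13, JMP, 0, 7, 99, T, 3]  # sem, sem, jmp 7, (unused), t 3
print(adjust_pc(code, 1))                # 7
```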

(* Error         Report syntax error
------------------------------------------------------------*)
PROCEDURE Error(VAR pc, altroot: CARDINAL);
VAR
  e, el, h: Errorptr;
  i, j: CARDINAL;
  opcode, sy, nextpc, altpc, pc1: CARDINAL;

  PROCEDURE GiveName(q: Errorptr; sy: CARDINAL);
  VAR p, j: CARDINAL;
  BEGIN
    WITH tab^ DO
      p := namep[sy]; j := 0;
      WHILE (j < 25) AND (name[p+j] <> 0C) DO
        INC(j); q^.txt[j] := name[p+j-1];
      END;
      q^.l := j;
    END;
  END GiveName;

BEGIN (*Error*)
  correct := FALSE;
  IF errdist >= errdistmin THEN
    Allocate(h, SIZE(Errornode)); h^.next := NIL; el := h;
    GiveName(h, typ);                          (*pass near-symbol*)
    pc1 := altroot; AdjustPc(pc1);
    WHILE pc1 > 0 DO
      GetSymInstr(pc1, opcode, sy, nextpc, altpc);
      IF opcode < any THEN                     (*t,nt,nts,ta,nta,ntas*)
        Allocate(e, SIZE(Errornode));
        GiveName(e, sy);                       (*pass expected symbol*)
        el^.next := e; el := e; e^.next := NIL;
      END;
      pc1 := altpc;
    END; (*WHILE*)
    SyntaxError(h, line, col);
    Triple(altroot); SaveStack;
    IF printnodes THEN
      WriteString(con,"$  typ     newpc  newlacts$");
      FOR i := 0 TO maxt DO
        IF newpc[i] <> 0 THEN
          WriteCard(con,i,5); WriteCard(con,newpc[i],10);
          WriteCard(con,newlacts[i],10); WriteLn(con);
        END; (*IF*)
      END; (*FOR*)
    END; (*IF*)
  ELSE RestoreStack;
  END;
  WHILE newpc[typ] = 0 DO
    IF printnodes THEN
      WriteString(con,"$(skip:"); WriteCard(con,typ,0); WriteString(con,") ");
    END;
    NextSym;
  END;
  pc := newpc[typ]; altroot := pc; lacts := newlacts[typ]; errdist := 0;
END Error;
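
Whether or not the error is reported, Error then discards input until it reaches a token for which the triple list holds a restart address (newpc[typ] <> 0), and resets pc, the stack length and errdist from that entry. The Python sketch below shows only this resynchronization step with plain lists; the function name and the returned tuple are illustrative.

```python
def resynchronize(tokens, pos, newpc, newlacts):
    """Skip tokens until one has a recovery entry in the triple list.

    tokens   -- list of token numbers; tokens[pos] is the current token
    newpc    -- newpc[t] is the restart pc for terminal t, 0 if none
    newlacts -- newlacts[t] is the stack length to restore for t
    Returns (pos, pc, lacts) as left behind by the tail of Error.
    Assumes some later token (e.g. end of file) has an entry.
    """
    while newpc[tokens[pos]] == 0:       # WHILE newpc[typ]=0 DO NextSym
        pos += 1
    t = tokens[pos]
    return pos, newpc[t], newlacts[t]


print(resynchronize([1, 2, 3, 0], 0, [50, 0, 0, 20], [0, 0, 0, 2]))  # (2, 20, 2)
```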


(* Fill          Fill triple list with alt-chain starting at pc
------------------------------------------------------------*)
PROCEDURE Fill(pc, lacts: CARDINAL);
VAR
  i, opcode, sy, nextpc, altpc: CARDINAL;
  s: Symbolset;
BEGIN
  AdjustPc(pc);
  WHILE pc <> 0 DO
    GetSymInstr(pc, opcode, sy, nextpc, altpc);
    CASE opcode OF
      t, ta:
        newpc[sy] := pc; newlacts[sy] := lacts;
    | nt, nta, nts, ntas:
        s := tab^.ntsymbols[sy].first;
        FOR i := 0 TO maxt DO
          IF Match(i, s) THEN newpc[i] := pc; newlacts[i] := lacts; END;
        END;
        IF tab^.ntsymbols[sy].del THEN Fill(nextpc, lacts); END;
    | eps, epsa:
        Fill(nextpc, lacts);
    ELSE (*any,anya: nothing*)
    END; (*CASE*)
    pc := altpc;
  END; (*WHILE*)
END Fill;
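
Fill walks one alternative chain and records, for every terminal that could start one of the alternatives, where parsing may resume and at which stack length. A Python sketch of the same walk follows; it assumes the chain is given as a dictionary from pc to an already decoded and adjusted instruction (opcode name, sy, nextpc, altpc), which is a simplification of calling GetSymInstr on the G-code.

```python
def fill(instrs, pc, lacts, first, deletable, newpc, newlacts):
    """Record recovery entries for every alternative of the chain at pc.

    instrs    -- dict pc -> (opcode_name, sy, nextpc, altpc); pc 0 ends a chain
    first     -- first[nt] is the set of terminals that can start nt
    deletable -- deletable[nt] is the del flag of nt
    """
    while pc != 0:
        op, sy, nextpc, altpc = instrs[pc]
        if op in ("t", "ta"):
            newpc[sy] = pc
            newlacts[sy] = lacts
        elif op in ("nt", "nta", "nts", "ntas"):
            for term in first[sy]:
                newpc[term] = pc
                newlacts[term] = lacts
            if deletable[sy]:            # nt may be empty: its successor can start too
                fill(instrs, nextpc, lacts, first, deletable, newpc, newlacts)
        elif op in ("eps", "epsa"):
            fill(instrs, nextpc, lacts, first, deletable, newpc, newlacts)
        # "any", "anya": no entries, as in the ELSE branch of Fill
        pc = altpc


newpc, newlacts = {}, {}
instrs = {1: ("ta", 5, 0, 3), 3: ("nta", 40, 7, 0), 7: ("t", 6, 0, 0)}
fill(instrs, 1, 2, {40: {2, 3}}, {40: True}, newpc, newlacts)
print(newpc)   # {5: 1, 2: 3, 3: 3, 6: 7}
```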


(* FillSucc      Fill triple list with succ. of alt-chain at pc
------------------------------------------------------------*)
PROCEDURE FillSucc(pc, lacts: CARDINAL);
VAR
  opcode, sy, nextpc, altpc: CARDINAL;
BEGIN
  AdjustPc(pc);
  WHILE pc > 0 DO   (*fill with successors of alternative-starts*)
    GetSymInstr(pc, opcode, sy, nextpc, altpc);
    IF nextpc > 0 THEN Fill(nextpc, lacts); END;
    pc := altpc;
  END; (*WHILE*)
END FillSucc;


(* GetSymInstr   Get G-code instruction at address pc
------------------------------------------------------------*)
PROCEDURE GetSymInstr(pc: CARDINAL; VAR opcode, sy, nextpc, altpc: CARDINAL);
BEGIN (*assert: pc points to a symbol instruction (not RET,JMP,SEM,ANY)*)
  WITH tab^ DO
    opcode := ORD(code[pc]);
    IF (opcode <= epsa) AND (opcode <> any)
      THEN sy := ORD(code[pc+1]);
      ELSE sy := 0;
    END;
    CASE opcode OF
      t, nt, eps:
        nextpc := pc+2; altpc := 0;
    | ta, nta, anya, epsa:
        nextpc := pc+4; altpc := 256*ORD(code[pc+2]) + ORD(code[pc+3]);
    | nts:
        nextpc := pc+3; altpc := 0;
    | ntas:
        nextpc := pc+5; altpc := 256*ORD(code[pc+2]) + ORD(code[pc+3]);
    | any:
        nextpc := pc+1; altpc := 0;
    END; (*CASE*)
    AdjustPc(nextpc); AdjustPc(altpc);
  END;
  (*assert: nextpc,altpc point to symbol instructions or are zero*)
END GetSymInstr;
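
The CASE statement of GetSymInstr encodes the length of each symbol instruction: t, nt and eps take 2 bytes, nts 3, ta, nta, anya and epsa 4 (the last two being the address of the next alternative, high byte first), ntas 5 and any 1. The Python sketch below decodes one instruction the same way but, to keep it short, omits the two final AdjustPc calls, so the returned addresses are raw.

```python
T, TA, NT, NTA, NTS, NTAS, ANY, ANYA, EPS, EPSA = range(10)


def decode(code, pc):
    """Return (opcode, sy, nextpc, altpc) for the symbol instruction at pc.

    altpc is 0 when the instruction has no alternative address.
    Unlike GetSymInstr, nextpc and altpc are not passed through AdjustPc.
    """
    op = code[pc]
    sy = code[pc + 1] if op <= EPSA and op != ANY else 0
    if op in (T, NT, EPS):
        return op, sy, pc + 2, 0
    if op in (TA, NTA, ANYA, EPSA):
        return op, sy, pc + 4, 256 * code[pc + 2] + code[pc + 3]
    if op == NTS:
        return op, sy, pc + 3, 0
    if op == NTAS:
        return op, sy, pc + 5, 256 * code[pc + 2] + code[pc + 3]
    if op == ANY:
        return op, sy, pc + 1, 0
    raise ValueError("not a symbol instruction: %d" % op)


print(decode([0, TA, 5, 1, 2], 1))   # (1, 5, 5, 258)
```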


(* Triple        Fill triple list
------------------------------------------------------------*)
PROCEDURE Triple(altroot: CARDINAL);
VAR i: CARDINAL;
BEGIN
  FOR i := 0 TO maxt DO                 (*clear triple list*)
    newpc[i] := 0; newlacts[i] := 0;
  END;
  FOR i := 1 TO lacts DO                (*fill with succ.of stacked nt's*)
    (*s[1] contains successor at level 0*)
    FillSucc(StackElem(i), i-1);
    Fill(StackElem(i), i-1);
  END;
  FillSucc(altroot, lacts);             (*fill with succ.of current alt-chain*)
  Fill(altroot, lacts);                 (*fill with alt-chain*)
END Triple;
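
Triple clears the list and then fills it level by level, from the bottom of the parser stack (i = 1, stack length 0) up to the current alternative chain (stack length lacts). Because each later Fill overwrites earlier entries, a terminal is resynchronized in the innermost context that can accept it. The Python sketch below shows only this layering; as a simplification, each level is handed in as a ready-made mapping from terminal to restart pc instead of being computed by Fill and FillSucc.

```python
def triple(levels, maxt):
    """Combine per-level recovery entries into (newpc, newlacts).

    levels[k] is a dict {terminal: restart_pc} for stack length k;
    the last entry stands for the current alternative chain.
    """
    newpc = [0] * (maxt + 1)             # clear triple list
    newlacts = [0] * (maxt + 1)
    for lacts, entries in enumerate(levels):
        for term, pc in entries.items():  # later levels overwrite earlier ones
            newpc[term] = pc
            newlacts[term] = lacts
    return newpc, newlacts


print(triple([{0: 5, 3: 9}, {3: 14}], 4))   # ([5, 0, 0, 14, 0], [0, 0, 0, 1, 0])
```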


(*============================== END ERRORS ==============================*)

(*============================== SYNTAXSTACK =============================*)

PROCEDURE Pop(VAR loc: CARDINAL);
BEGIN
  IF lacts > 0
    THEN loc := s[lacts]; DEC(lacts);
    ELSE WriteString(con,"--- Parser stack underflow.$"); HALT;
  END;
  (*IF printnodes THEN WriteString(con," pop"); END;*)
END Pop;

PROCEDURE Push(loc: CARDINAL);
BEGIN
  IF lacts < lmaxs
    THEN INC(lacts); s[lacts] := loc;
    ELSE WriteString(con,"--- Parser stack overflow.$"); HALT;
  END;
  (*IF printnodes THEN WriteString(con," push"); END;*)
END Push;

PROCEDURE RestoreStack;
BEGIN s := olds; END RestoreStack;

PROCEDURE SaveStack;
BEGIN olds := s; END SaveStack;

PROCEDURE StackElem(i: CARDINAL): CARDINAL;
BEGIN RETURN s[i]; END StackElem;
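
Note that SaveStack and RestoreStack copy the whole array s, not the stack pointer lacts; after recovery, Error sets lacts separately from newlacts. The Python class below mirrors that split (the class and method names are hypothetical, and overflow and underflow raise exceptions instead of calling HALT).

```python
class ParserStack:
    """Bounded return-address stack after Push/Pop/SaveStack/RestoreStack."""

    def __init__(self, lmaxs=50):
        self.lmaxs = lmaxs
        self.s = [0] * (lmaxs + 1)       # index 0 unused, as in ARRAY[1..lmaxs]
        self.olds = [0] * (lmaxs + 1)
        self.lacts = 0                   # stack pointer

    def push(self, loc):
        if self.lacts >= self.lmaxs:
            raise OverflowError("Parser stack overflow.")
        self.lacts += 1
        self.s[self.lacts] = loc

    def pop(self):
        if self.lacts == 0:
            raise IndexError("Parser stack underflow.")
        loc = self.s[self.lacts]
        self.lacts -= 1
        return loc

    def save(self):                      # SaveStack: copy contents only
        self.olds = self.s[:]

    def restore(self):                   # RestoreStack: lacts is left alone
        self.s = self.olds[:]
```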


(* TableContents   A dirty trick to initialize the grammar tables
------------------------------------------------------------*)
PROCEDURE TableContents;
BEGIN (*%% dont remove or change this comment*)
  INLINE(
    401,   34,   34,   45,   10,    3,   45,  385,
    (*---G-code---*)
Ue
lp BIOs
AL,
Die
3, 4359,
256, 5648, 2560,
SE,
22,
BOS,
36,
811,
3679296070 14247 4120
82),
56, 5125, 9984,12569,
813,
39, 2560, 9985, 3072,20506,
812,
80, 5125, 9984,18459, 7171,10752,15645, 2560,15616,
2590,
273,
101, 7956, 1319,
$4, 8195,11520,21258,
83,
2050, 8448, 3329, 4352,33311, 8709, 9984,29987, 2052, 3840,
5122, 9252,
21, 2560,27144,
805,
4, 9739,
549,10024,
278,
151,
549,10506,
141, 2053, 2858, 1062,11052, 1318,
168,11566, 2560,40712, 1547,
812,
186,12037, 9984,46640,
12552, 1807, 2817, 1536,49202, 2817,
512,50739, 281.9,.1.01527
52276, 2817, 5888,55315,
548,13568, 6162, 2817, 6400,58387,
348,13824, 6674, 2816, 6931,
548,14080, 7186,14347,
2
14597,10241,
SB
2p
Ass),
SS),
30, 2820,10554, 2560,
64768, 2107,
32,
273,
297, 7948,
289,
293,
273,
286,
7948, 2561, 4352, 4924, 3594,
273, 2056,15627,
19,15374,
2561, 4352, 2878,
327
1721949
2228972324,
17, 1949,
2561,14600, 2367, 2816, 3648,
279,
345,16640, 4383,16896,
6144, 1291, 1794,
353,17162,
345, 2058,17418,
342,
14,
32,
17, 8005,
V5 WEB,
Sip LIS,
SiS6),
5,18187,
UE
Mle TO)
18, 7947,
17.556, 18443,
5477,
0,
2816,
(*---nt-symbols---*)
17
0,
128,
0,
0,
137,
0,16452,
2694,
0,
App. F
cocosyn MOD
323
356
154,
0,16452, 2694,
Oy, ste
0,16452, 2694,
0,
357
0,
07.256,
OBER 2EZE
0,
0, 8192,
0,
239,
358
0,
0,
0,16384,
Ya SS
304,
0,
0, 2048,
359
0,
0,
359,
6,
0,
OFS Sill,
0,16384,
0,
360
391,
0,
2,
0,
0,
(*---eps followers---*)
362
0,
17
0,
512,
0,
0, 8192,
0,
0,
16,
363
0,
16,
0, 5408,
0,
0,16452, 8166,
Op AT27
364
0,
0,
0,
0,16384,
0,49152,
0,
0,
32,
(*---any sets---*)
65022, 65534, 65535, 65502, 65535, 65535, 65502, 65535, 65535,
(*---attribute numbers---*)
368
0,
0,
0,
0,
0
0,
0,
0,
0,
0,
369
370
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
371
0,
0,
0,
0,
17
(*---pragma semantic---*)
0, 0,
(*---name pointers---*)
375
17
197
57
74,
69,
53,
19,
59,
44,
34,
376
Cp
83,
I,
ele
Ana,
IR
Ar
ait
A
or
377
za
WS,
le,
a
NE).
er
AO
ee
PADI
378
ZU
PAY
aii,
AES,
Bly
PA,
IN
PD.
XS
Oi
379
366,
2313,
1,2349,5
30073157533
(*---name list---*)
381
17743,17920, 8769,19529,16723, 8704, 8801,28281, 8704, 8772,
382
34,17742,17479,21057,
17931719521, 21057,.21577,20302, 212827
383
19746,
34,25966,25715,25965, 8704, 8805,28787, 8704, 8775,
384
21057,19789,16722, 8704, 8809,28194,
34,19777,17234,20307,
385 » ‚ 8704, 8782,20302,21573,21069,18766,16716,21282,
34,28533,
386 " 29730,
34,20562,16711,19777,21282,
34,21077,19525,21282,
387
34, 21317,19777, 20052, 18755, 21282,
34,29541,27938,
34,
388
21573,21069,18766,16716,21282,
105,25701,28276,
26982, 26981,
389
78,21837,16965,20992,10045, 9984,
29184, 21332,21065, 20039,
390
10030, 9984,10108, 9984,10024, 9984,10025, 9984,10075, 9984,
391
10077, 9984,10107, 9984,10109, 9984,10044, 9984,10046, 9984,
392
25455, 29561,
10043, 9984,10042, 9984,10028, 9984,28271,25455,
101,
34, 25455, 29298, 25955, 29728, 26482, 24941, 28001,29218,
393
394
30832, 29285, 29555, 26991, 28160, 24940, 29797, 29294, 24948, 26998,
34,
97,29812, 29289, 25205, 29797,
25856,29561, 28002, 28524,
395
26990
29812,
29289, 25205,29797,
8704, 8815,30068,11617,
396
,1161
7,
29812,29289,25205,29797, 8704, 8819,25965,24942,29801,
25376,
397
28001,
25376, 01,
24931,29801,28526, 8704, 8819,25965,24942,298
398
115,31085,25199,27648, 8801,27753,24947, 8302,
25458,28450,
399
0,0);
24941,25890,
END TableContents;
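
The name list is a stream of characters packed two per CARDINAL word. Reading each word high byte first and splitting at 0C turns the first words of the table into readable symbol names ("EOF", "\"ALIAS\"", "\"any\""), which suggests this packing; it is an inference from the numbers, not something the listing states. A Python sketch of that decoding:

```python
def decode_names(words):
    """Unpack 16-bit words into 0C-terminated names, high byte first.

    Assumption: this is how the name list in TableContents is packed.
    Empty pieces (e.g. after a trailing 0C) are dropped.
    """
    raw = bytearray()
    for w in words:
        raw.append(w >> 8)        # high byte first
        raw.append(w & 0xFF)      # then low byte
    return [n.decode("ascii") for n in raw.split(b"\0") if n]


print(decode_names([17743, 17920, 8769, 19529, 16723, 8704, 8801, 28281, 8704]))
# ['EOF', '"ALIAS"', '"any"']
```

Names are packed without word alignment, which is why single entries such as 34 (0C followed by a quote) appear in the table.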


PROCEDURE Parse(VAR corr: BOOLEAN);
VAR
  altroot:  CARDINAL;  (*root of current alternative chain*)
  mustread: BOOLEAN;   (*TRUE if next symbol must be read*)
  opcode:   CARDINAL;  (*instruction code*)
  running:  BOOLEAN;   (*interpreter state*)
  sy:       CARDINAL;
BEGIN
  tab := ADR(TableContents) + 10D;              (*initialize the tables*)
  pc := startpc; altroot := pc;
  line := 1; col := 1;
  correct := TRUE; mustread := TRUE; running := TRUE;
  WITH tab^ DO
    WHILE running DO
      opcode := ORD(code[pc]);
      IF mustread AND (opcode <= epsa) THEN
        NextSym; mustread := FALSE; INC(errdist); altroot := pc;
      END;
      (*IF printnodes THEN WriteCard(con,pc,5); END;*)
      INC(pc);
      CASE opcode OF
        t:
          IF ORD(typ) = ORD(code[pc]) THEN             (*t recognized*)
            IF typ = eofsy
              THEN running := FALSE;
              ELSE INC(pc); mustread := TRUE;
            END;
          ELSE Error(pc, altroot);
          END;
      | ta:
          IF ORD(typ) = ORD(code[pc])
            THEN INC(pc, 3); mustread := TRUE;           (*t recognized*)
            ELSE pc := ORD(code[pc+1])*256 + ORD(code[pc+2]);   (*try alt.*)
          END;
      | nt, nts:
          sy := ORD(code[pc]);
          IF Match(typ, ntsymbols[sy].first) OR ntsymbols[sy].del
            THEN                                          (*right nt, parse it*)
              IF opcode = nts THEN INC(pc); Semant(ORD(code[pc])); END;
              Push(pc+1); pc := ntsymbols[sy].startpc;
              altroot := pc;
            ELSE Error(pc, altroot);
          END;
      | nta, ntas:
          sy := ORD(code[pc]);
          IF Match(typ, ntsymbols[sy].first)
            THEN                                          (*right nt, parse it*)
              INC(pc, 3);
              IF opcode = ntas THEN Semant(ORD(code[pc])); INC(pc) END;
              Push(pc); pc := ntsymbols[sy].startpc;
              altroot := pc;
            ELSE pc := ORD(code[pc+1])*256 + ORD(code[pc+2]);   (*try alt.*)
          END;
      | any:
          mustread := TRUE;                               (*any recognized*)
      | anya:
          IF Match(typ, anyset[ORD(code[pc])])
            THEN INC(pc, 3); mustread := TRUE;            (*any recognized*)
            ELSE pc := ORD(code[pc+1])*256 + ORD(code[pc+2]);
          END;
      | eps:
          IF Match(typ, epsset[ORD(code[pc])])
            THEN INC(pc);
            ELSE Error(pc, altroot);
          END;
      | epsa:
          IF Match(typ, epsset[ORD(code[pc])])
            THEN INC(pc, 3);                              (*eps recognized*)
            ELSE pc := ORD(code[pc+1])*256 + ORD(code[pc+2]);
          END;
      | jmp: pc := ORD(code[pc])*256 + ORD(code[pc+1]);   (*goto successor*)
      | ret: Pop(pc); altroot := pc;                      (*end of nt*)
      ELSE (*sem*)
        IF correct THEN Semant(ORD(opcode)); END;
      END; (*CASE*)
    END; (*WHILE running*)
  END; (*WITH tab^*)
  corr := correct;
END Parse;
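
The main loop of Parse is a small interpreter: it fetches an opcode, advances pc to the operand and dispatches on the instruction. The Python sketch below runs the same loop for the subset t, ta, nt, jmp and ret on a toy G-code program. It is a simplification under stated assumptions: tokens come from a list ending with the end-of-file token, deletable nonterminals, semantic actions and error recovery are left out, and ParseError, parse and the toy grammar are hypothetical names chosen for the example.

```python
T, TA, NT, NTA, NTS, NTAS, ANY, ANYA, EPS, EPSA, JMP, RET = range(12)
EOFSY = 0  # token number of the end-of-file symbol, as in cocosyn.MOD


class ParseError(Exception):
    """Raised where cocosyn.MOD would call Error (no recovery here)."""


def parse(code, startpc, nt_start, nt_first, tokens):
    """Interpret t, ta, nt, jmp and ret; return True when EOF is matched.

    nt_start[sy] and nt_first[sy] play the roles of
    ntsymbols[sy].startpc and ntsymbols[sy].first.
    """
    stack = []                # return addresses, like s[1..lacts]
    pos = 0                   # index of the current token (typ)
    pc = startpc
    while True:
        op = code[pc]
        typ = tokens[pos]
        pc += 1               # pc now addresses the first operand
        if op == T:
            if typ != code[pc]:
                raise ParseError("unexpected token %d" % typ)
            if typ == EOFSY:
                return True   # running := FALSE
            pc += 1
            pos += 1          # token consumed (mustread := TRUE)
        elif op == TA:
            if typ == code[pc]:
                pc += 3
                pos += 1
            else:             # try the alternative
                pc = 256 * code[pc + 1] + code[pc + 2]
        elif op == NT:
            sy = code[pc]
            if typ not in nt_first[sy]:
                raise ParseError("unexpected token %d" % typ)
            stack.append(pc + 1)          # Push(pc+1)
            pc = nt_start[sy]
        elif op == JMP:
            pc = 256 * code[pc] + code[pc + 1]
        elif op == RET:
            pc = stack.pop()              # Pop(pc)
        else:
            raise ValueError("opcode %d is outside this sketch" % op)


# Toy grammar  S = A EOF.  A = "x" A | "y".  Terminals EOF=0, x=1, y=2; A is 40.
code = [0,
        NT, 40, T, EOFSY,               # pc 1: rule S
        TA, 1, 0, 12, NT, 40, RET,      # pc 5: A = "x" A ...
        T, 2, RET]                      # pc 12: ... | "y"
print(parse(code, 1, {40: 5}, {40: {1, 2}}, [1, 1, 2, 0]))   # True
```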

BEGIN
  printinput := FALSE;
  printnodes := FALSE;
  errdist := 100;
  lacts := 0;
END cocosyn.

cocosynframe

(* General table-driven syntax analyzer

   This is a parser module generated by Coco from an attributed grammar.
   Before calling the procedure Parse from the main program, initialize
   the scanner (<grammarname>lex.MOD).
*)
DEFINITION MODULE -->modulename;
VAR
  printinput: BOOLEAN;  (*trace the input tokens read*)
  printnodes: BOOLEAN;  (*trace the G-code interpretation*)

PROCEDURE Parse(VAR correct: BOOLEAN);
END -->modulename.
-->implementation
(* General table-driven syntax analyzer
   ============================================================
   Moe 21.12.83
   01 (21.12.83) First version (rewritten from PL/M)
   02 (28.02.84) New interface for input and errors
   03 (02.04.84) Error in EOL-processing corrected
   04 (08.05.84) New EOL-processing
   05 (23.07.84) For G-code
   06 (30.08.84) Error recovery simplified
   07 (05.04.85) New G-code instruction EPSA (ANYA modified)
   08 (12.04.87) Grammar tables initialized INLINE
   09 (12.04.87) typ,col,line and at exported by cocolex
   10 (07.06.87) Name of error module and scanner procedure constant
   ------------------------------------------------------------*)
IMPLEMENTATION MODULE -->modulename;

FROM Errors IMPORT SyntaxError, Errorptr, Errornode;
FROM FileIO IMPORT con, WriteCard, WriteLn, WriteString;
FROM System IMPORT Allocate;
FROM SYSTEM IMPORT ADDRESS, ADR, INLINE;
FROM -->semantic analyzer IMPORT Semant;
FROM -->input module IMPORT GetSy, typ, at, line, col;
-->declarations

CONST (*G-code instructions*)
  t   = 0;   ta   = 1;   nt  = 2;   nta  = 3;
  nts = 4;   ntas = 5;   any = 6;   anya = 7;
  eps = 8;   epsa = 9;   jmp = 10;  ret  = 11;

  errdistmin = 2;   (*min. distance between two errors*)
  lmaxs      = 50;  (*max. stack length*)
  eofsy      = 0;   (*token number of endfile symbol*)

TYPE
  Attributenumbers = ARRAY[0..maxp] OF CARDINAL;
  Namepointers     = ARRAY[0..maxnamep] OF CARDINAL;
  Namelist         = ARRAY[1..maxname] OF CHAR;
  Pragma           = RECORD                 (*semantics for a pragma*)
                       sem2, sem3: CARDINAL;
                     END;
  Pragmalist       = ARRAY[maxt..maxp] OF Pragma;
  Symbolset        = ARRAY[0..maxt DIV 16] OF BITSET;   (*set of terminals*)
  Symbolnode       = RECORD                 (*symbol information (only for nt)*)
                       startpc: CARDINAL;   (*start node of rule for nt*)
                       del:     BOOLEAN;    (*TRUE, if nt is deletable*)
                       first:   Symbolset;  (*terminals causing this nt to be analyzed*)
                     END;
  Symbollist       = ARRAY[maxp+1..maxs] OF Symbolnode;
  Stack            = ARRAY[1..lmaxs] OF CARDINAL;

VAR
  tab: POINTER TO RECORD                        (*grammar tables*)
         header:    ARRAY[1..8] OF CARDINAL;     (*not used*)
         code:      ARRAY[1..maxcode] OF CHAR;   (*G-code area*)
         ntsymbols: Symbollist;                  (*nonterminals information*)
         epsset:    ARRAY[1..maxeps] OF Symbolset;
         anyset:    ARRAY[1..maxany] OF Symbolset;
         nra:       Attributenumbers;            (*no. of attributes*)
         ps:        Pragmalist;                  (*semantics for pragmas*)
         namep:     Namepointers;                (*pointers to symbol names*)
         name:      Namelist;                    (*symbol names*)
       END;
  correct:  BOOLEAN;                             (*error indicator*)
  pc:       CARDINAL;                            (*program counter*)
  errdist:  CARDINAL;                            (*current error distance*)
  newlacts: ARRAY[0..maxt] OF CARDINAL;          (*new stack length*)
  newpc:    ARRAY[0..maxt] OF CARDINAL;          (*pc after recovery*)
  s, olds:  Stack;
  lacts:    CARDINAL;                            (*stack pointer*)


PROCEDURE GetSymInstr(pc: CARDINAL;
                      VAR opcode, sy, nextpc, altpc: CARDINAL); FORWARD;
PROCEDURE RestoreStack; FORWARD;
PROCEDURE SaveStack; FORWARD;
PROCEDURE StackElem(i: CARDINAL): CARDINAL; FORWARD;
PROCEDURE Triple(altroot: CARDINAL); FORWARD;


(* Match         Check if sy is member of the specified set
------------------------------------------------------------*)
PROCEDURE Match(sy: CARDINAL; set: Symbolset): BOOLEAN;
BEGIN RETURN (sy MOD 16) IN set[sy DIV 16]; END Match;


(* NextSym       Get next symbol
------------------------------------------------------------*)
PROCEDURE NextSym;
BEGIN
  LOOP
    GetSy;
    (*IF printinput THEN
        WriteString(con,"$(in:"); WriteCard(con,typ,3); WriteString(con,") ");
        IF printnodes THEN
          WriteCard(con,lacts,3); WriteString(con,"| ");
        END;
      END; *)
    IF typ <= maxt THEN RETURN END;
    WITH tab^ DO
      IF correct AND (ps[typ].sem2 <> 0) THEN Semant(ps[typ].sem2); END;
      IF correct AND (ps[typ].sem3 <> 0) THEN Semant(ps[typ].sem3); END;
    END;
    IF typ = eofsy THEN RETURN END;
  END;
END NextSym;


(*================================ ERRORS ================================*)

(* AdjustPc      Adjust pc to next symbol instruction
------------------------------------------------------------*)
PROCEDURE AdjustPc(VAR pc: CARDINAL);
BEGIN
  WITH tab^ DO
    IF pc = 0 THEN RETURN; END;
    LOOP
      CASE ORD(code[pc]) OF
        t, ta, nt, nta, nts, ntas, any, anya, eps, epsa: EXIT;
      | jmp: pc := 256*ORD(code[pc+1]) + ORD(code[pc+2]);
      | ret: pc := 0; EXIT;
      ELSE INC(pc);  (*sem*)
      END;
    END;
  END;
END AdjustPc;

(* Error         Report syntax error
------------------------------------------------------------*)
PROCEDURE Error(VAR pc, altroot: CARDINAL);
VAR
  e, el, h: Errorptr;
  i, j: CARDINAL;
  opcode, sy, nextpc, altpc, pc1: CARDINAL;

  PROCEDURE GiveName(q: Errorptr; sy: CARDINAL);
  VAR p, j: CARDINAL;
  BEGIN
    WITH tab^ DO
      p := namep[sy]; j := 0;
      WHILE (j < 25) AND (name[p+j] <> 0C) DO
        INC(j); q^.txt[j] := name[p+j-1];
      END;
      q^.l := j;
    END;
  END GiveName;

BEGIN (*Error*)
  correct := FALSE;
  IF errdist >= errdistmin THEN
    Allocate(h, SIZE(Errornode)); h^.next := NIL; el := h;
    GiveName(h, typ);                          (*pass near-symbol*)
    pc1 := altroot; AdjustPc(pc1);
    WHILE pc1 > 0 DO
      GetSymInstr(pc1, opcode, sy, nextpc, altpc);
      IF opcode < any THEN                     (*t,nt,nts,ta,nta,ntas*)
        Allocate(e, SIZE(Errornode));
        GiveName(e, sy);                       (*pass expected symbol*)
        el^.next := e; el := e; e^.next := NIL;
      END;
      pc1 := altpc;
    END; (*WHILE*)
    SyntaxError(h, line, col);
    Triple(altroot); SaveStack;
    IF printnodes THEN
      WriteString(con,"$  typ     newpc  newlacts$");
      FOR i := 0 TO maxt DO
        IF newpc[i] <> 0 THEN
          WriteCard(con,i,5); WriteCard(con,newpc[i],10);
          WriteCard(con,newlacts[i],10); WriteLn(con);
        END; (*IF*)
      END; (*FOR*)
    END; (*IF*)
  ELSE RestoreStack;
  END;
  WHILE newpc[typ] = 0 DO
    IF printnodes THEN
      WriteString(con,"$(skip:"); WriteCard(con,typ,0); WriteString(con,") ");
    END;
    NextSym;
  END;
  pc := newpc[typ]; altroot := pc; lacts := newlacts[typ]; errdist := 0;
END Error;


(* Fill          Fill triple list with alt-chain starting at pc
------------------------------------------------------------*)
PROCEDURE Fill(pc, lacts: CARDINAL);
VAR
  i, opcode, sy, nextpc, altpc: CARDINAL;
  s: Symbolset;
BEGIN
  AdjustPc(pc);
  WHILE pc <> 0 DO
    GetSymInstr(pc, opcode, sy, nextpc, altpc);
    CASE opcode OF
      t, ta:
        newpc[sy] := pc; newlacts[sy] := lacts;
    | nt, nta, nts, ntas:
        s := tab^.ntsymbols[sy].first;
        FOR i := 0 TO maxt DO
          IF Match(i, s) THEN newpc[i] := pc; newlacts[i] := lacts; END;
        END;
        IF tab^.ntsymbols[sy].del THEN Fill(nextpc, lacts); END;
    | eps, epsa:
        Fill(nextpc, lacts);
    ELSE (*any,anya: nothing*)
    END; (*CASE*)
    pc := altpc;
  END; (*WHILE*)
END Fill;


(* FillSucc      Fill triple list with succ. of alt-chain at pc
------------------------------------------------------------*)
PROCEDURE FillSucc(pc, lacts: CARDINAL);
VAR
  opcode, sy, nextpc, altpc: CARDINAL;
BEGIN
  AdjustPc(pc);
  WHILE pc > 0 DO   (*fill with successors of alternative-starts*)
    GetSymInstr(pc, opcode, sy, nextpc, altpc);
    IF nextpc > 0 THEN Fill(nextpc, lacts); END;
    pc := altpc;
  END; (*WHILE*)
END FillSucc;


(* GetSymInstr   Get G-code instruction at address pc
------------------------------------------------------------*)
PROCEDURE GetSymInstr(pc: CARDINAL; VAR opcode, sy, nextpc, altpc: CARDINAL);
BEGIN (*assert: pc points to a symbol instruction (not RET, JMP, SEM, ANY)*)
  WITH tab^ DO
    opcode := ORD(code[pc]);
    IF (opcode <= epsa) AND (opcode <> any)
      THEN sy := ORD(code[pc+1]);
      ELSE sy := 0;
    END;
    CASE opcode OF
      t, nt, eps:
        nextpc := pc+2; altpc := 0;
    | ta, nta, anya, epsa:
        nextpc := pc+4; altpc := 256*ORD(code[pc+2]) + ORD(code[pc+3]);
    | nts:
        nextpc := pc+3; altpc := 0;
    | ntas:
        nextpc := pc+5; altpc := 256*ORD(code[pc+2]) + ORD(code[pc+3]);
    | any:
        nextpc := pc+1; altpc := 0;
    END; (*CASE*)
    AdjustPc(nextpc); AdjustPc(altpc);
  END;
  (*assert: nextpc,altpc point to symbol instructions or are zero*)
END GetSymInstr;
273
274
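GetSymInstr fixes the layout of every symbol instruction: a one-byte opcode, a symbol byte, and, for the variants that carry an alternative (ta, nta, anya, epsa, ntas), a two-byte alternative address stored high byte first; nts and ntas carry one more byte, which Parse passes to Semant. The Python sketch below decodes that layout under stated assumptions: opcodes are represented by their names as strings instead of the byte values used by the real tables (those values do not appear in the listing), and the AdjustPc step is not modeled.

```python
def decode_sym_instr(code, pc):
    """Return (opcode, sy, nextpc, altpc) for the symbol instruction at pc.

    Assumption: code[pc] holds an opcode *name* (string); the following
    elements hold byte values 0..255, laid out as in GetSymInstr.
    """
    opcode = code[pc]
    sy = 0 if opcode == "any" else code[pc + 1]
    if opcode in ("t", "nt", "eps"):               # opcode, sy
        return opcode, sy, pc + 2, 0
    if opcode in ("ta", "nta", "anya", "epsa"):    # opcode, sy, alt hi, alt lo
        return opcode, sy, pc + 4, 256 * code[pc + 2] + code[pc + 3]
    if opcode == "nts":                            # opcode, sy, sem
        return opcode, sy, pc + 3, 0
    if opcode == "ntas":                           # opcode, sy, alt hi, alt lo, sem
        return opcode, sy, pc + 5, 256 * code[pc + 2] + code[pc + 3]
    if opcode == "any":                            # opcode only
        return opcode, 0, pc + 1, 0
    raise ValueError(f"not a symbol instruction: {opcode!r}")
```

For example, `decode_sym_instr(["ta", 7, 1, 2], 0)` yields symbol 7, successor at 4 and alternative at 258 (256*1+2).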
(* Triple         Fill triple list
---------------------------------------------------------------------*)
PROCEDURE Triple(altroot:CARDINAL);
VAR
  i: CARDINAL;
BEGIN
  FOR i:=0 TO maxt DO (*clear triple list*)
    newpc[i]:=0; newlacts[i]:=0;
  END;
  FOR i:=1 TO lacts DO (*fill with succ. of stacked nt's*)
    (*s[1] contains successor at level 0*)
    FillSucc(StackElem(i),i-1);
    Fill(StackElem(i),i-1);
  END;
  FillSucc(altroot,lacts); (*fill with succ.of alt-chain*)
  Fill(altroot,lacts);     (*fill with current alt-chain*)
END Triple;

(*============================== END ERRORS ==============================*)


(*============================== SYNTAXSTACK =============================*)

PROCEDURE Pop(VAR loc:CARDINAL);
BEGIN
  IF lacts>0
    THEN loc:=s[lacts]; DEC(lacts);
    ELSE WriteString(con,"--- Parser stack underflow.$"); HALT;
  END;
  (*IF printnodes THEN WriteString(con," pop"); END;*)
END Pop;

PROCEDURE Push(loc:CARDINAL);
BEGIN
  IF lacts<lmaxs
    THEN INC(lacts); s[lacts]:=loc;
    ELSE WriteString(con,"--- Parser stack overflow.$"); HALT;
  END;
  (*IF printnodes THEN WriteString(con," push"); END;*)
END Push;

PROCEDURE RestoreStack;
BEGIN s:=olds; END RestoreStack;

PROCEDURE SaveStack;
BEGIN olds:=s; END SaveStack;

PROCEDURE StackElem(i:CARDINAL): CARDINAL;
BEGIN RETURN s[i]; END StackElem;
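The syntax stack is a bounded array s[1..lmaxs] whose top index is lacts; Push and Pop stop the program on overflow and underflow, and SaveStack/RestoreStack copy the whole array. The Python class below is a minimal sketch of the same data structure; raising exceptions instead of calling HALT is an assumption made so the behaviour can be tested, and, like `olds:=s`, `save` copies only the array and not lacts.

```python
class ParserStack:
    """Bounded, 1-based stack modeled on Push/Pop/StackElem/SaveStack."""

    def __init__(self, lmaxs):
        self.lmaxs = lmaxs
        self.s = [0] * (lmaxs + 1)   # s[0] unused, mirrors ARRAY[1..lmaxs]
        self.lacts = 0               # index of the top element
        self.olds = list(self.s)

    def push(self, loc):
        if self.lacts >= self.lmaxs:
            raise OverflowError("--- Parser stack overflow.")
        self.lacts += 1
        self.s[self.lacts] = loc

    def pop(self):
        if self.lacts == 0:
            raise IndexError("--- Parser stack underflow.")
        loc = self.s[self.lacts]
        self.lacts -= 1
        return loc

    def elem(self, i):
        return self.s[i]

    def save(self):                  # olds:=s (array copy only)
        self.olds = list(self.s)

    def restore(self):               # s:=olds
        self.s = list(self.olds)
```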
(* TableContents  A dirty trick to initialize the grammar tables
---------------------------------------------------------------------*)
PROCEDURE TableContents;
BEGIN (*%% dont remove or change this comment*)
-->tables
END TableContents;

PROCEDURE Parse(VAR corr:BOOLEAN);
VAR
  altroot:  CARDINAL; (*root of current alternative chain*)
  mustread: BOOLEAN;  (*TRUE if next symbol must be read*)
  opcode:   CARDINAL; (*instruction code*)
  running:  BOOLEAN;  (*interpreter state*)
  sy:       CARDINAL;
BEGIN
  tab:=ADR(TableContents)+10D; (*initialize the tables*)
  pc:=startpc; altroot:=pc;
  line:=1; col:=0;
  correct:=TRUE; mustread:=TRUE;
  running:=TRUE;
  WITH tab^ DO
    WHILE running DO
      opcode:=ORD(code[pc]);
      IF mustread AND (opcode<=epsa) THEN
        mustread:=FALSE; NextSym;
        INC(errdist); altroot:=pc;
      END;
      (*IF printnodes THEN WriteCard(con,pc,5); END;*)
      INC(pc);
      CASE opcode OF
        t:    IF ORD(typ)=ORD(code[pc]) THEN (*t recognized*)
                IF typ=eofsy THEN running:=FALSE;
                ELSE INC(pc); mustread:=TRUE;
                END;
              ELSE Error(pc,altroot);
              END;
      | ta:   IF ORD(typ)=ORD(code[pc])
                THEN INC(pc,3); mustread:=TRUE;               (*t recognized*)
                ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]); (*try alt.*)
              END;
      | nt,nts:
              sy:=ORD(code[pc]);
              IF Match(typ,ntsymbols[sy].first) OR ntsymbols[sy].del THEN
                (*right nt, parse it*)
                IF opcode=nts THEN INC(pc); Semant(ORD(code[pc])); END;
                Push(pc+1); pc:=ntsymbols[sy].startpc;
                altroot:=pc;
              ELSE Error(pc,altroot);
              END;
      | nta,ntas:
              sy:=ORD(code[pc]);
              IF Match(typ,ntsymbols[sy].first) THEN (*right nt, parse it*)
                INC(pc,3);
                IF opcode=ntas THEN Semant(ORD(code[pc])); INC(pc) END;
                Push(pc); pc:=ntsymbols[sy].startpc;
                altroot:=pc;
              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]); (*try alt.*)
              END;
      | any:  mustread:=TRUE; (*any recognized*)
      | anya: IF Match(typ,anyset[ORD(code[pc])])
                THEN INC(pc,3); mustread:=TRUE; (*any recognized*)
                ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
              END;
      | eps:  IF Match(typ,epsset[ORD(code[pc])])
                THEN INC(pc);
                ELSE Error(pc,altroot);
              END;
      | epsa: IF Match(typ,epsset[ORD(code[pc])])
                THEN INC(pc,3); (*eps recognized*)
                ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
              END;
      | jmp:  pc:=ORD(code[pc])*256+ORD(code[pc+1]); (*goto successor*)
      | ret:  Pop(pc); altroot:=pc; (*end of nt*)
      ELSE    (*sem*)
              IF correct THEN Semant(ORD(opcode) ... nt); END;
      END; (*CASE*)
    END; (*WHILE running*)
  END; (*WITH tab^*)
  corr:=correct;
END Parse;

BEGIN
  printinput:=FALSE;
  printnodes:=FALSE;
  errdist:=100;
  lacts:=0;
END -->modulename.
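Parse is an interpreter for the G-code: symbol instructions consume the lookahead or jump to an alternative, nonterminal instructions push a return address and enter the rule, and ret pops it again. The Python sketch below shows that control flow for a subset only (t, ta, nt, nta, jmp, ret). It is a simplification under stated assumptions: instructions are tuples instead of bytes, the nonterminal table is a hypothetical dict of (first set, deletable, start pc), the input ends with a hypothetical "eof" token, and errors raise an exception instead of invoking Error and the triple-list recovery.

```python
SYMBOL_OPS = {"t", "ta", "nt", "nta"}   # instructions that need a lookahead


def interpret(code, startpc, nts, tokens):
    """Simplified G-code interpreter in the style of Parse.

    code   : list of tuples indexed by pc, e.g. ("ta", "a", 5)
    nts    : dict name -> (first_set, deletable, startpc)
    tokens : input symbols, ending with "eof"
    Returns True on success; raises SyntaxError at the first error.
    """
    symbols = iter(tokens)
    stack = []                              # return addresses (Push/Pop)
    pc, mustread, typ = startpc, True, None
    while True:
        instr = code[pc]
        op = instr[0]
        if mustread and op in SYMBOL_OPS:   # NextSym
            typ = next(symbols)
            mustread = False
        if op == "t":
            if typ != instr[1]:
                raise SyntaxError(f"{instr[1]} expected at pc {pc}")
            if typ == "eof":
                return True
            pc, mustread = pc + 1, True
        elif op == "ta":
            if typ == instr[1]:
                pc, mustread = pc + 1, True
            else:
                pc = instr[2]               # try alternative
        elif op in ("nt", "nta"):
            first, deletable, nt_start = nts[instr[1]]
            if typ in first or (op == "nt" and deletable):
                stack.append(pc + 1)        # return address
                pc = nt_start
            elif op == "nta":
                pc = instr[2]               # try alternative
            else:
                raise SyntaxError(f"{instr[1]} expected at pc {pc}")
        elif op == "jmp":
            pc = instr[1]
        elif op == "ret":
            pc = stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")
```

With the hypothetical grammar S = A "eof", A = "a" A | "b", the code list `[("nt","A"), ("t","eof"), ("ta","a",5), ("nt","A"), ("ret",), ("t","b"), ("ret",)]` accepts `a a b eof` and rejects `a eof`.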
(* cocotst        Perform various tests with the top-down graph      Moe 12.1.83

   This module tests
   a) if all nonterminals can be reached from the start symbol
   b) if there exist productions for all nonterminals
   c) if all nonterminals can be derived to terminals
   d) if the grammar is free of circular derivations
   e) if the grammar satisfies the LL(1)-conditions
*)
DEFINITION MODULE cocotst;

PROCEDURE FindCircularRules(VAR ok:BOOLEAN);
(* Finds and prints the circular part of the grammar.
   ok means: no circular part*)

PROCEDURE LL1Test(VAR ll1:BOOLEAN);
(* Checks if the grammar satisfies the LL(1) conditions*)

PROCEDURE TestCompleteness(VAR ok:BOOLEAN);
(* ok=TRUE if all nonterminals have rules*)

PROCEDURE TestIfAllNtReached(VAR ok:BOOLEAN);
(* ok=TRUE if all nonterminals can be reached from the start symbol*)

PROCEDURE TestIfNtToTerm(VAR ok:BOOLEAN);
(* ok=TRUE if all nonterminals can be reduced to terminals*)

END cocotst.
(* cocotst        Perform various tests with the top-down graph      Moe 11.1.84

   This module tests
   a) if all nonterminals can be reached from the start symbol
   b) if there exist productions for all nonterminals
   c) if all nonterminals can be derived to terminals
   d) if the grammar is free of circular derivations
   e) if the grammar satisfies the LL(1)-conditions
*)
IMPLEMENTATION MODULE cocotst;

FROM cocogra IMPORT rootloc, ClearMarkList, Deletable, DelNode,
                    Graphnode, GetNode, Mark, Marked, Marklist;
FROM cocolex IMPORT ddt, GetName;
FROM cocolst IMPORT lst;
FROM cocosym IMPORT maxp, maxs, maxt, ClearSet, GetF,
                    GetFirstSet, GetFo, GetSy, IsInSet, RepSy,
                    SetBit, Unit, Symbolnode, Symbolset, Symboltype;
FROM FileIo  IMPORT con, WriteCard, WriteString, WriteText, WriteLn;

VAR
  headline: BOOLEAN; (*TRUE if header shall be printed*)
  ll:       BOOLEAN; (*TRUE if LL(1) conditions hold*)
(* FindCircularRules   Test grammar for circular derivations
---------------------------------------------------------------------*)
PROCEDURE FindCircularRules(VAR ok:BOOLEAN);
CONST
  circmax = 150;
TYPE
  Circrule = RECORD
    left, right: CARDINAL;
    del: BOOLEAN;
  END;
  Circrulelist = ARRAY[1..circmax] OF Circrule;
VAR
  c:           Circrulelist;
  changed:     BOOLEAN;
  headline:    BOOLEAN;
  i,j,k,dummy: CARDINAL;
  lcirc:       CARDINAL;
  m:           Marklist;
  singleset:   Marklist; (*set of single nonterminals in a production*)
  sn:          Symbolnode;
  rside,lside: BOOLEAN;

  PROCEDURE GetSingles(loc:CARDINAL; VAR singles:Marklist);
  VAR gn: Graphnode;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    Mark(loc,m);
    GetNode(loc,gn);
    CASE gn.typ OF
      eps:   GetSingles(gn.rp,singles);
    | t,any: ;
    | nt:    IF Deletable(gn.rp) THEN Mark(gn.sp,singles); END;
             IF DelNode(gn) THEN GetSingles(gn.rp,singles); END;
    END; (*CASE*)
    GetSingles(gn.lp,singles);
  END GetSingles;

  PROCEDURE PutCirc(i:CARDINAL);
  VAR
    l: CARDINAL;
    name: ARRAY[1..50] OF CHAR;
    sn: Symbolnode;
  BEGIN
    IF headline THEN
      WriteLn(lst);
      WriteString(lst,"Circular part for this grammar:");
      WriteLn(lst);
      headline:=FALSE;
    END;
    WriteString(lst,"  ");
    GetSy(c[i].left,sn); GetName(sn.spix,name,l);
    WriteText(lst,name,l); WriteString(lst," --> ");
    GetSy(c[i].right,sn); GetName(sn.spix,name,l);
    WriteText(lst,name,l); WriteLn(lst);
  END PutCirc;

BEGIN (*FindCircularRules*)
  lcirc:=0;
  (*---------------------------- fill list of circular derivations c*)
  FOR i:=maxp+1 TO maxs DO
    ClearMarkList(singleset); ClearMarkList(m);
    GetSy(i,sn);
    GetSingles(sn.start,singleset); (*get nt's j such that i->j*)
    FOR j:=maxp+1 TO maxs DO
      IF Marked(j,singleset) THEN
        INC(lcirc);
        WITH c[lcirc] DO left:=i; right:=j; del:=FALSE; END;
        IF ddt["D"] THEN
          WriteCard(con,lcirc,6); WriteCard(con,i,6);
          WriteCard(con,j,6); WriteLn(con);
        END;
      END; (*IF Marked*)
    END; (*FOR j*)
  END; (*FOR i*)
  (*---------------------------- remove non circular derivations from c*)
  REPEAT
    changed:=FALSE;
    FOR i:=1 TO lcirc DO
      IF NOT c[i].del THEN
        rside:=FALSE; lside:=FALSE;
        FOR j:=1 TO lcirc DO
          IF NOT c[j].del THEN
            IF c[i].left=c[j].right THEN rside:=TRUE; END;
            IF c[j].left=c[i].right THEN lside:=TRUE; END;
          END;
        END; (*FOR j*)
        IF NOT rside OR NOT lside THEN
          c[i].del:=TRUE; changed:=TRUE;
          IF ddt["D"] THEN
            WriteCard(con,i,6); WriteString(con," deleted$");
          END;
        END;
      END; (*IF NOT c[i].del*)
    END; (*FOR*)
  UNTIL NOT changed;
  (*---------------------------- c contains the circular part of the grammar. Print it*)
  ok:=TRUE; headline:=TRUE;
  FOR i:=1 TO lcirc DO
    IF NOT c[i].del THEN PutCirc(i); ok:=FALSE; END;
  END;
  IF ok THEN
    WriteLn(lst);
    WriteString(lst,"Grammar contains no circular derivations.");
    WriteLn(lst);
  END;
END FindCircularRules;
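FindCircularRules works in two phases: GetSingles collects the pairs (i, j) for which a production of i consists of the single nonterminal j surrounded only by deletable parts, and the REPEAT loop then discards every pair that has no surviving predecessor (rside) or no surviving successor (lside), so only pairs lying on a cycle remain. The Python sketch below mirrors both phases on a hypothetical grammar representation (a dict from nonterminal to lists of alternatives, with the nullable nonterminals supplied by the caller); it is an illustration, not a translation of the graph-based code.

```python
def single_pairs(grammar, nullable):
    """Pairs (A, B) such that some alternative of A is  alpha B beta
    with alpha and beta made only of nullable symbols (cf. GetSingles)."""
    pairs = []
    for a, alts in grammar.items():
        for alt in alts:
            for k, sym in enumerate(alt):
                rest = alt[:k] + alt[k + 1:]
                if sym in grammar and all(s in nullable for s in rest):
                    if (a, sym) not in pairs:
                        pairs.append((a, sym))
    return pairs


def circular_part(pairs):
    """Drop pairs with no predecessor or no successor until nothing changes."""
    alive = [True] * len(pairs)
    changed = True
    while changed:
        changed = False
        for i, (left, right) in enumerate(pairs):
            if not alive[i]:
                continue
            rside = any(alive[j] and pairs[j][1] == left for j in range(len(pairs)))
            lside = any(alive[j] and pairs[j][0] == right for j in range(len(pairs)))
            if not (rside and lside):
                alive[i] = False
                changed = True
    return [p for p, keep in zip(pairs, alive) if keep]
```

For A = B | "x", B = A | C, C = "z" this finds the pairs (A,B), (B,A), (B,C) and reports the cycle (A,B), (B,A).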
(* LL1Error       Print LL(1) error message
---------------------------------------------------------------------*)
PROCEDURE LL1Error(code,line,sy:CARDINAL);
VAR
  l: CARDINAL;
  name: ARRAY[1..50] OF CHAR;
  sn: Symbolnode;
BEGIN
  IF headline THEN
    headline:=FALSE;
    WriteLn(lst);
    WriteString(lst,"LL(1)-error(s):");
    WriteLn(lst);
  END;
  WriteString(lst,"  line"); WriteCard(lst,line,4);
  GetSy(sy,sn); GetName(sn.spix,name,l);
  WriteString(lst,"  ");
  CASE code OF
    1: WriteText(lst,name,l);
       WriteString(lst," is start of more than one alternative.");
  | 2: WriteText(lst,name,l);
       WriteString(lst," is start and successor of deletable ");
       WriteString(lst,"rest of rule.");
  END;
  WriteLn(lst);
  ll:=FALSE;
END LL1Error;
(* LL1Test        Collects terminal sets and checks LL(1) conditions
---------------------------------------------------------------------*)
PROCEDURE LL1Test(VAR ll1:BOOLEAN);
VAR
  dummy: CARDINAL;
  gn: Graphnode;
  i, loc: CARDINAL;
  m: Marklist;
  sn: Symbolnode;

  PROCEDURE Test(VAR s1,s2:Symbolset; code,line:CARDINAL);
  VAR i:CARDINAL;
  BEGIN
    FOR i:=0 TO maxt DO
      IF IsInSet(i,s1) AND IsInSet(i,s2) THEN
        LL1Error(code,line,i);
      END;
    END;
  END Test;

  PROCEDURE CheckAlternatives(loc,sym:CARDINAL);
  VAR
    gn: Graphnode;
    locset,s,first,follow: Symbolset;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    GetNode(loc,gn);
    IF ddt["F"] THEN
      WriteCard(con,loc,6);
      WriteCard(con,gn.sp,6);
      WriteCard(con,ORD(gn.typ),6);
      WriteLn(con);
    END;
    IF Deletable(loc) THEN
      GetFirstSet(loc,s); GetFo(sym,follow);
      Test(s,follow,2,gn.line);
    END;
    ClearSet(s,maxt);
    WHILE loc<>0 DO
      Mark(loc,m);
      GetNode(loc,gn);
      IF DelNode(gn)
        THEN GetFirstSet(gn.rp,locset);
        ELSE ClearSet(locset,maxt);
      END;
      CASE gn.typ OF
        t:       SetBit(locset,gn.sp);
      | nt:      GetF(gn.sp,first); Unit(locset,first,maxt);
      | eps,any: ;
      END;
      Test(s,locset,1,gn.line);
      Unit(s,locset,maxt);
      CheckAlternatives(gn.rp,sym);
      loc:=gn.lp;
    END;
  END CheckAlternatives;

BEGIN (*LL1Test*)
  ll:=TRUE; headline:=TRUE;
  FOR i:=maxp+1 TO maxs DO
    ClearMarkList(m);
    GetSy(i,sn);
    CheckAlternatives(sn.start,i);
  END;
  IF ll THEN
    WriteLn(lst);
    WriteString(lst,"Grammar satisfies LL(1)-conditions.");
    WriteLn(lst);
  END;
  ll1:=ll;
END LL1Test;

(* TestCompleteness   Test if all nonterminals have rules
---------------------------------------------------------------------*)
PROCEDURE TestCompleteness(VAR ok:BOOLEAN);
VAR
  sn: Symbolnode;
  i,l,dummy: CARDINAL;
  name: ARRAY[1..50] OF CHAR;
BEGIN
  ok:=TRUE;
  FOR i:=maxp+1 TO maxs DO
    GetSy(i,sn);
    IF sn.start=0 THEN
      IF ok THEN
        WriteLn(lst);
        WriteString(lst,"Nonterminals without rules:");
      END;
      GetName(sn.spix,name,l);
      WriteString(lst,"  "); WriteText(lst,name,l);
      ok:=FALSE;
    END;
  END; (*FOR*)
  WriteLn(lst);
  IF ok THEN
    WriteLn(lst);
    WriteString(lst,"All nonterminals have rules.");
    WriteLn(lst);
  END;
END TestCompleteness;
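LL1Test reports two kinds of conflict: code 1 when a terminal starts more than one alternative (CheckAlternatives accumulates the start sets of an alternative chain and intersects each new one with the union so far), and code 2 when a deletable part can start with a terminal that may also follow the nonterminal. The Python sketch below performs the same two checks on a plain BNF grammar, computing nullable, FIRST and FOLLOW sets itself. It is a simplified stand-in under stated assumptions: the dict-of-alternatives representation and the "eof" end marker are hypothetical, and only whole alternatives are checked, whereas the listing also checks deletable sub-graphs inside a rule.

```python
def seq_first(seq, grammar, nullable, first):
    """FIRST set of a symbol sequence and whether the sequence is deletable."""
    result = set()
    for sym in seq:
        if sym not in grammar:              # terminal
            result.add(sym)
            return result, False
        result |= first[sym]
        if sym not in nullable:
            return result, False
    return result, True


def nullable_and_first(grammar):
    nullable, first = set(), {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                f, n = seq_first(alt, grammar, nullable, first)
                if not f <= first[a]:
                    first[a] |= f
                    changed = True
                if n and a not in nullable:
                    nullable.add(a)
                    changed = True
    return nullable, first


def follow_sets(grammar, start, nullable, first):
    follow = {a: set() for a in grammar}
    follow[start].add("eof")                # hypothetical end marker
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                for k, sym in enumerate(alt):
                    if sym in grammar:
                        f, n = seq_first(alt[k + 1:], grammar, nullable, first)
                        new = (f | follow[a]) if n else f
                        if not new <= follow[sym]:
                            follow[sym] |= new
                            changed = True
    return follow


def ll1_conflicts(grammar, start):
    """Return (code, nonterminal, terminal) triples; codes as in LL1Error."""
    nullable, first = nullable_and_first(grammar)
    follow = follow_sets(grammar, start, nullable, first)
    errors = []
    for a, alts in grammar.items():
        seen = set()                        # union of earlier start sets
        for alt in alts:
            f, _ = seq_first(alt, grammar, nullable, first)
            errors += [(1, a, t) for t in sorted(seen & f)]
            seen |= f
        if a in nullable:
            errors += [(2, a, t) for t in sorted(first[a] & follow[a])]
    return errors
```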
(* TestIfAllNtReached   Tests if all nts can be reached
---------------------------------------------------------------------*)
PROCEDURE TestIfAllNtReached(VAR ok:BOOLEAN);
VAR
  gn: Graphnode;
  i,l,dummy: CARDINAL;
  m: Marklist;
  name: ARRAY[1..50] OF CHAR;
  sn: Symbolnode;
  reached: Marklist;

  PROCEDURE MarkReachedNts(loc:CARDINAL);
  VAR gn: Graphnode;
      sn: Symbolnode;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    Mark(loc,m);
    GetNode(loc,gn);
    WITH gn DO
      IF (typ=nt) AND NOT Marked(sp,reached) THEN
        Mark(sp,reached); GetSy(sp,sn); MarkReachedNts(sn.start);
      END;
      MarkReachedNts(lp);
      MarkReachedNts(rp);
    END;
  END MarkReachedNts;

BEGIN
  ClearMarkList(m);
  ClearMarkList(reached);
  GetNode(rootloc,gn); GetSy(gn.sp,sn);
  Mark(gn.sp,reached);
  MarkReachedNts(sn.start);
  ok:=TRUE;
  FOR i:=maxp+1 TO maxs DO (*report not marked symbols*)
    IF NOT Marked(i,reached) THEN
      GetSy(i,sn); GetName(sn.spix,name,l);
      WriteString(lst,"Nonterminal "); WriteText(lst,name,l);
      WriteString(lst," cannot be reached."); WriteLn(lst);
      ok:=FALSE;
    END;
  END;
  IF ok THEN
    WriteLn(lst);
    WriteString(lst,"All nonterminals can be reached.");
    WriteLn(lst);
  END;
END TestIfAllNtReached;

(* TestIfNtToTerm     Test if all nt can be derived to terminals
---------------------------------------------------------------------*)
PROCEDURE TestIfNtToTerm(VAR ok:BOOLEAN);
VAR
  i,l,dummy: CARDINAL;
  sn: Symbolnode;
  name: ARRAY[1..50] OF CHAR;
  changed: BOOLEAN;
  termlist: Marklist; (*list of nts which can be derived to terminal symbols*)
  m: Marklist;
  term: BOOLEAN;

  PROCEDURE IsTerm(loc:CARDINAL): BOOLEAN;
  VAR gn: Graphnode;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN FALSE; END;
    Mark(loc,m);
    GetNode(loc,gn);
    WITH gn DO
      IF (typ=nt) AND NOT Marked(sp,termlist)
        THEN RETURN IsTerm(lp);
        ELSE RETURN (rp=0) OR IsTerm(rp) OR IsTerm(lp);
      END;
    END;
  END IsTerm;

BEGIN (*TestIfNtToTerm*)
  ClearMarkList(termlist);
  REPEAT
    changed:=FALSE;
    FOR i:=maxp+1 TO maxs DO
      IF NOT Marked(i,termlist) THEN
        GetSy(i,sn);
        ClearMarkList(m);
        term:=IsTerm(sn.start);
        IF term THEN Mark(i,termlist); changed:=TRUE; END;
        IF ddt["E"] THEN
          WriteCard(con,i,6);
          IF term
            THEN WriteString(con," reducable to term.$");
            ELSE WriteString(con," not reducable to term.$");
          END;
        END;
      END; (*IF NOT Marked*)
    END; (*FOR*)
  UNTIL NOT changed;
  ok:=TRUE;
  WriteLn(lst);
  FOR i:=maxp+1 TO maxs DO
    IF NOT Marked(i,termlist) THEN
      GetSy(i,sn); GetName(sn.spix,name,l);
      WriteText(lst,name,l);
      WriteString(lst," cannot be derived to terminals."); WriteLn(lst);
      ok:=FALSE;
    END;
  END; (*FOR*)
  IF ok THEN
    WriteString(lst,"All nonterminals can be derived to terminals.");
    WriteLn(lst);
  END;
END TestIfNtToTerm;

END cocotst.
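TestCompleteness, TestIfAllNtReached and TestIfNtToTerm are three classic grammar-hygiene checks: nonterminals without rules, nonterminals not reachable from the start symbol, and nonterminals that can never derive a string of terminals (the listing finds the last ones by a fixpoint iteration over termlist). The Python sketch below expresses the same three checks on a hypothetical dict-of-alternatives grammar, where any symbol that is not a key counts as a terminal; it is an illustration of the technique, not a port of the graph traversal.

```python
def missing_rules(grammar):
    """Nonterminals that have no alternatives at all (cf. TestCompleteness)."""
    return [a for a, alts in grammar.items() if not alts]


def unreachable(grammar, start):
    """Nonterminals not reachable from the start symbol."""
    reached, work = {start}, [start]
    while work:
        for alt in grammar[work.pop()]:
            for sym in alt:
                if sym in grammar and sym not in reached:
                    reached.add(sym)
                    work.append(sym)
    return [a for a in grammar if a not in reached]


def not_derivable(grammar):
    """Nonterminals that cannot derive a terminal string (fixpoint)."""
    term = set()
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            if a not in term and any(
                    all(s not in grammar or s in term for s in alt)
                    for alt in alts):
                term.add(a)
                changed = True
    return [a for a in grammar if a not in term]
```

For S = "a" B, B = "b" | B "c", C = C "d" and a rule-less D, the checks report D as missing, C and D as unreachable, and C and D as not derivable to terminals.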
(* Errors         General module to store error messages        Moe 21.03.84

   This module stores information about syntax errors and semantic errors.
   The information can either be retrieved afterwards or be printed
   automatically as simple error messages.
   Furthermore the module contains procedures to report compiler errors
   and implementation restrictions. These procedures cause a program stop.
*)
DEFINITION MODULE Errors;

FROM FileIO IMPORT File;

TYPE
  Symbolname = ARRAY[1..25] OF CHAR;
  Errorptr   = POINTER TO Errornode;
  Errornode  = RECORD
                 txt:  Symbolname; (*expected symbol in syntax error message*)
                 l:    CARDINAL;
                 next: Errorptr;
               END;

PROCEDURE CompErr(nr:CARDINAL);
(* Reports compiler error nr and stops the program*)

PROCEDURE GetNextSemErr(VAR nr,line,col:CARDINAL);
(* Gets the error number, the line number and the column number of the
   next semantic error. nr=0 if no next error exists*)

PROCEDURE GetNextSynErr(VAR symbols:Errorptr; VAR line,col:CARDINAL);
(* Gets the expected symbols, the line number and the column number of
   the next syntax error. symbols=NIL if no next error exists*)

PROCEDURE GetNumberOfErrors(VAR synerrors,semerrors:CARDINAL);
(* Gets the total number of syntax errors and semantic errors which
   occurred during compilation*)

PROCEDURE PrintSemErrors(f:File; VAR semerrors:CARDINAL);
(* Prints error messages for all stored semantic errors (line,col,
   error number). semerrors holds the total number of stored semantic
   errors*)

PROCEDURE PrintSynErrors(f:File; VAR synerrors:CARDINAL);
(* Prints error messages for all stored syntax errors (line,col,
   "near symbol",expected symbols). synerrors holds the total number
   of stored syntax errors*)

PROCEDURE PrintSynError(f:File; symbols:Errorptr; col:CARDINAL);
(* Prints one error message line (^ expected symbols).*)

PROCEDURE Restriction(nr:CARDINAL);
(* Reports implementation restriction nr and stops the program*)

PROCEDURE SemErr(nr,line,col:CARDINAL);
(* Stores the error number, line number and column number of a semantic
   error*)

PROCEDURE SyntaxError(symbols:Errorptr; line,col:CARDINAL);
(* Stores the "near-symbol", the expected symbols, the line number and
   the column number of a syntax error*)

END Errors.
(* Errors         General module to store error messages        Moe 21.03.84

   This module stores information about syntax errors and semantic errors.
   The information can either be retrieved afterwards or be printed
   automatically as simple error messages.
   Furthermore the module contains procedures to report compiler errors
   and implementation restrictions. These procedures cause a program stop.
*)
IMPLEMENTATION MODULE Errors;

(*imports of definition module*)
FROM FileIO IMPORT File;
(*imports of implementation module*)
FROM FileIO IMPORT con, Write, WriteCard, WriteLn, WriteString,
                   WriteText, Read;
FROM System IMPORT Allocate, Deallocate, Terminate, normal;

TYPE
  Semerrptr = POINTER TO Semerror;
  Semerror  = RECORD
                nr,line,col: CARDINAL;
                next: Semerrptr;
              END;
  Synerrptr = POINTER TO Synerror;
  Synerror  = RECORD
                symbols: Errorptr;
                line,col: CARDINAL;
                next: Synerrptr;
              END;
VAR
  semerr: Semerrptr;
  synerr: Synerrptr;
(* CompErr        Reports compiler error nr and stops the program
---------------------------------------------------------------------*)
PROCEDURE CompErr(nr:CARDINAL);
VAR dummy:CARDINAL; ch:CHAR;
BEGIN
  PrintSynErrors(con,dummy); PrintSemErrors(con,dummy);
  WriteString(con,"Compiler error "); WriteCard(con,nr,0);
  WriteString(con,". Program terminated.$");
  WriteString(con,"Press a key to continue.$"); Read(con,ch);
  Terminate(normal);
END CompErr;
(* GetNextSemErr  Gets next semantic error information
---------------------------------------------------------------------*)
PROCEDURE GetNextSemErr(VAR nr,line,col:CARDINAL);
VAR p: Semerrptr;
BEGIN
  IF semerr=NIL
    THEN nr:=0; line:=0; col:=0;
    ELSE
      p:=semerr;
      nr:=p^.nr; line:=p^.line; col:=p^.col;
      semerr:=p^.next; Deallocate(p);
  END;
END GetNextSemErr;
(* GetNextSynErr  Gets next syntax error information
---------------------------------------------------------------------*)
PROCEDURE GetNextSynErr(VAR symbols:Errorptr; VAR line,col:CARDINAL);
VAR p: Synerrptr;
BEGIN
  IF synerr=NIL
    THEN symbols:=NIL; line:=0; col:=0;
    ELSE
      p:=synerr;
      symbols:=p^.symbols; line:=p^.line; col:=p^.col;
      synerr:=p^.next; Deallocate(p);
  END;
END GetNextSynErr;
(* GetNumberOfErrors   Gets the total number of errors that occurred
---------------------------------------------------------------------*)
PROCEDURE GetNumberOfErrors(VAR synerrors,semerrors:CARDINAL);
VAR
  syn: Synerrptr;
  sem: Semerrptr;
BEGIN
  synerrors:=0; syn:=synerr;
  WHILE syn<>NIL DO INC(synerrors); syn:=syn^.next; END;
  semerrors:=0; sem:=semerr;
  WHILE sem<>NIL DO INC(semerrors); sem:=sem^.next; END;
END GetNumberOfErrors;
(* PrintSemErrors  Prints simple error messages for semantic errors
---------------------------------------------------------------------*)
PROCEDURE PrintSemErrors(f:File; VAR semerrors:CARDINAL);
VAR
  p: Semerrptr;
  synerrors: CARDINAL;
BEGIN
  GetNumberOfErrors(synerrors,semerrors);
  IF semerrors>0 THEN
    WriteString(f,"Semantic errors:$$");
    p:=semerr;
    WHILE p<>NIL DO
      WriteString(f,"line"); WriteCard(f,p^.line,5);
      WriteString(f," col"); WriteCard(f,p^.col,3);
      WriteString(f,": error "); WriteCard(f,p^.nr,0);
      WriteLn(f);
      p:=p^.next;
    END;
  END;
END PrintSemErrors;
(* PrintSym       Print a symbol in error message
---------------------------------------------------------------------*)
PROCEDURE PrintSym(f:File; txt:ARRAY OF CHAR; len:CARDINAL);
BEGIN
  IF len=1
    THEN Write(f,'"'); Write(f,txt[0]); Write(f,'"');
    ELSE WriteText(f,txt,len);
  END;
END PrintSym;
(* PrintExpected  Print expected symbols
---------------------------------------------------------------------*)
PROCEDURE PrintExpected(f:File; VAR p:Errorptr);
VAR first:BOOLEAN; q:Errorptr;
BEGIN
  first:=TRUE;
  WHILE p<>NIL DO
    IF first THEN first:=FALSE
    ELSIF p^.next=NIL THEN WriteString(f,' or ')
    ELSE WriteString(f,', ')
    END;
    PrintSym(f,p^.txt,p^.l);
    q:=p; p:=p^.next; Deallocate(q);
  END;
  WriteString(f,' expected'); WriteLn(f);
END PrintExpected;
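PrintSym and PrintExpected together turn the list of expected symbols into readable English: single-character symbols are put in double quotes, the names are separated by commas, and the last one is joined with "or". A minimal Python sketch of that formatting rule, assuming the symbols arrive as a plain list of strings instead of an Errorptr chain (the list is not freed here, unlike the Deallocate calls in the listing):

```python
def print_sym(txt):
    """Quote single-character symbols, leave longer names as they are."""
    return f'"{txt}"' if len(txt) == 1 else txt


def expected_message(symbols):
    """Build the text written by PrintExpected for a list of symbols."""
    parts = [print_sym(s) for s in symbols]
    if len(parts) > 1:
        body = ", ".join(parts[:-1]) + " or " + parts[-1]
    else:
        body = "".join(parts)
    return body + " expected"
```

For instance, `expected_message(["(", "ident", "number"])` gives `"(", ident or number expected`.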
(* PrintSynErrors  Prints simple error messages for syntax errors
---------------------------------------------------------------------*)
PROCEDURE PrintSynErrors(f:File; VAR synerrors:CARDINAL);
VAR
  err,errl: Synerrptr;
  p: Errorptr;
  semerrors: CARDINAL;
BEGIN
  GetNumberOfErrors(synerrors,semerrors);
  IF synerrors>0 THEN
    WriteString(f,"Syntax errors:$$");
    err:=synerr;
    WHILE err<>NIL DO
      WriteString(f,'line'); WriteCard(f,err^.line,5);
      p:=err^.symbols;
      WriteString(f,' near '); PrintSym(f,p^.txt,p^.l);
      WriteString(f,'; ');
      PrintExpected(f,p^.next); Deallocate(p);
      errl:=err; err:=err^.next; Deallocate(errl);
    END;
  END;
END PrintSynErrors;
(* PrintSynError  Prints one error message line
---------------------------------------------------------------------*)
PROCEDURE PrintSynError(f:File; symbols:Errorptr; col:CARDINAL);
VAR i:CARDINAL;
BEGIN
  WriteString(f,"*****  ");
  FOR i:=1 TO col-1 DO Write(f," ") END;
  WriteString(f,"^ ");
  PrintExpected(f,symbols^.next); Deallocate(symbols);
END PrintSynError;
(* Restriction    Reports impl. restriction nr and stops the program
---------------------------------------------------------------------*)
PROCEDURE Restriction(nr:CARDINAL);
VAR dummy:CARDINAL; ch:CHAR;
BEGIN
  PrintSynErrors(con,dummy); PrintSemErrors(con,dummy);
  WriteString(con,"Implementation restriction "); WriteCard(con,nr,0);
  WriteString(con,". Program terminated.$");
  WriteString(con,"Press a key to continue.$"); Read(con,ch);
  Terminate(normal);
END Restriction;
(* SemErr         Stores information about semantic error
---------------------------------------------------------------------*)
PROCEDURE SemErr(nr,line,col:CARDINAL);
VAR e,p,q: Semerrptr;
BEGIN
  Allocate(e,SIZE(Semerror)); e^.nr:=nr; e^.line:=line; e^.col:=col;
  p:=semerr; q:=NIL;
  WHILE (p<>NIL) AND (p^.line<line) DO q:=p; p:=p^.next; END;
  WHILE (p<>NIL) AND (p^.line=line) AND (p^.col<col) DO
    q:=p; p:=p^.next;
  END;
  IF q=NIL THEN semerr:=e; ELSE q^.next:=e; END;
  e^.next:=p;
END SemErr;

(* SyntaxError    Stores information about syntax error
---------------------------------------------------------------------*)
PROCEDURE SyntaxError(symbols:Errorptr; line,col:CARDINAL);
VAR e,p,q: Synerrptr;
BEGIN
  Allocate(e,SIZE(Synerror));
  e^.symbols:=symbols; e^.line:=line; e^.col:=col;
  p:=synerr; q:=NIL;
  WHILE (p<>NIL) AND (p^.line<line) DO q:=p; p:=p^.next; END;
  WHILE (p<>NIL) AND (p^.line=line) AND (p^.col<col) DO
    q:=p; p:=p^.next;
  END;
  IF q=NIL THEN synerr:=e; ELSE q^.next:=e; END;
  e^.next:=p;
END SyntaxError;

BEGIN (*Errors*)
  synerr:=NIL; semerr:=NIL;
END Errors.
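SemErr and SyntaxError keep their error lists sorted by source position: the first loop skips all entries on earlier lines, the second skips entries on the same line with a smaller column, and the new node is linked in between q and p. GetNextSemErr then simply removes the head. The Python sketch below reproduces that sorted insertion on a singly linked list; the class and method names are hypothetical. As in the listing, a new error with the same line and column as an existing one is placed in front of it.

```python
class ErrorList:
    """Singly linked list kept sorted by (line, col), as in SemErr."""

    class Node:
        __slots__ = ("nr", "line", "col", "next")

        def __init__(self, nr, line, col):
            self.nr, self.line, self.col, self.next = nr, line, col, None

    def __init__(self):
        self.head = None                    # semerr

    def insert(self, nr, line, col):
        e = self.Node(nr, line, col)
        p, q = self.head, None
        while p is not None and p.line < line:
            q, p = p, p.next
        while p is not None and p.line == line and p.col < col:
            q, p = p, p.next
        if q is None:
            self.head = e
        else:
            q.next = e
        e.next = p

    def pop_first(self):
        """Like GetNextSemErr: (0, 0, 0) when no error is left."""
        if self.head is None:
            return (0, 0, 0)
        p = self.head
        self.head = p.next
        return (p.nr, p.line, p.col)
```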
FileIo.DEF

(* FileIo            Simple IO with more than one file            Moe 16.8.87

   This module provides procedures which are similar to those of InOut,
   except that they can be used with more than one file (even with the
   console).
*)
DEFINITION MODULE FileIo;

FROM SYSTEM  IMPORT WORD;
FROM Toolbox IMPORT DialogPtr;
FROM OS      IMPORT ParmBlkPtr;

CONST
  DEL = 177C;
  EF  = 4C;
  EOL = 15C;
  ESC = 33C;
  buffersize = 16*1024;

TYPE
  File = POINTER TO FileRecord;
  FileRecord = RECORD
    ref:    INTEGER;                   (*file reference number*)
    volRef: INTEGER;                   (*volume (subdirectory) reference number*)
    name:   ARRAY[0..63] OF CHAR;      (*Modula string terminated by 0C*)
    buffer: ARRAY[0..buffersize-1] OF CHAR;
    bp:     CARDINAL;                  (*index of next byte in buffer*)
    bb:     CARDINAL;                  (*number of bytes in buffer*)
    output: BOOLEAN;                   (*true, if opened for output*)
    eof:    BOOLEAN;                   (*true, if no more unread bytes*)
  END;

VAR
  con:    File;      (*console file (screen and keyboard)*)
  Done:   BOOLEAN;   (*TRUE if an operation was successful*)
  termCH: CHAR;      (*first character after input text*)

(* ---------- for Mac open dialog box (see "Inside Macintosh") ---------- *)
TYPE
  FilterHook = PROCEDURE (ParmBlkPtr): BOOLEAN;
  DialogHook = PROCEDURE (INTEGER, DialogPtr): INTEGER;
  Filetype   = ARRAY[0..3] OF CHAR;

VAR
  errCode:    INTEGER;                  (*file manager status code*)
  filterHook: FilterHook;               (*file filter procedure (init none)*)
  dlgHook:    DialogHook;               (*dialog handling procedure (init none)*)
  ftype:      ARRAY[0..3] OF Filetype;  (*file types to be handled by open dialog*)
                                        (*init: ftype[0]:="TEXT", ftype[1..3]:=""*)
(* ---------------------------------------------------------------------- *)

PROCEDURE Open(VAR f:File; volRef:INTEGER; fn:ARRAY OF CHAR; output:BOOLEAN);
(* Opens file f with name fn on volume (subdirectory) volRef.
   volRef  0: default volume; 1: internal drive; 2: external drive;
           negative: volume or subdirectory reference number.
   fn      - If not empty, fn is the name of the file to be opened on
             volume (subdirectory) volRef. The drive number may be placed
             in front of the file name, separated by a colon (e.g. 1:name).
             It overrides volRef.
           - If empty, an open dialog box is displayed which allows
             choosing the volume, subdirectory and file name. The chosen
             values are returned in f^. The value of volRef is irrelevant
             in this case.
             (Advanced programmers: Only those files are displayed whose
             file type is contained in ftype. Custom procedures may be
             supplied in the variables "filterHook" and "dlgHook" to
             suppress file names in the open box or to handle additional
             dialog items.)
   output  TRUE:  the specified file is opened for output. Any existing
                  file with the same name is deleted.
           FALSE: the specified file is opened for input.
   Done indicates if the file has been opened successfully.*)

PROCEDURE Close(VAR f:File);
(* Closes file f. f becomes NIL*)

PROCEDURE Read(f:File; VAR ch:CHAR);
(* Reads a character ch from the file f (no echo on the console).
   Done indicates if the operation has been successful*)

PROCEDURE ReadCard(f:File; VAR val:CARDINAL);
(* Reads a CARDINAL from file f (leading blanks are skipped).
   termCH and Done get values*)

PROCEDURE ReadInt(f:File; VAR val:INTEGER);
(* Reads an INTEGER from file f (leading blanks are skipped).
   termCH and Done get values*)

PROCEDURE ReadString(f:File; VAR s:ARRAY OF CHAR);
(* Reads a string of characters (terminated by " " or CR) from
   file f. termCH and Done get values*)

PROCEDURE ReadWord(f:File; VAR w:CARDINAL);
(* Reads a 16 bit word w from the file f without conversion*)

PROCEDURE Write(f:File; ch:CHAR);
(* Writes a character ch to the file f*)

PROCEDURE WriteCard(f:File; nr:CARDINAL; w:INTEGER);
(* Writes a CARDINAL nr with width w to the file f. If the actual
   width of nr is bigger than w, w is expanded*)

PROCEDURE WriteHex(f:File; a:ARRAY OF WORD; length:INTEGER);
(* Writes length hexadecimal bytes from a to the file f*)

PROCEDURE WriteInt(f:File; i:INTEGER; w:INTEGER);
(* Writes an INTEGER i with w characters to file f. If the actual
   width of i is bigger than w, w is expanded*)

PROCEDURE WriteLn(f:File);
(* Skips to the start of the next line on the file f*)

PROCEDURE WriteString(f:File; s:ARRAY OF CHAR);
(* Writes a string s to the file f. Any occurrence of the character
   "$" in s causes a WriteLn*)

PROCEDURE WriteText(f:File; t:ARRAY OF CHAR; l:INTEGER);
(* Writes a text t with length l to the file f*)

PROCEDURE WriteWord(f:File; w:CARDINAL);
(* Writes a 16 bit word w without conversion to the file f*)

END FileIo.
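The definition module only says that ReadWord and WriteWord transfer a 16-bit word "without conversion". The implementation module FileIo.MOD fixes the byte order. WriteWord writes CHR(w DIV 256) before CHR(w MOD 256). ReadWord recombines the two characters as 256*ORD(i)+ORD(j). The high byte therefore comes first on the file. The worked example below assumes a 16-bit CARDINAL. The value 4660 is only an illustration.

```latex
\begin{aligned}
w &= 256\cdot \mathit{hi} + \mathit{lo}, \qquad \mathit{hi} = w \;\mathrm{DIV}\; 256,\quad \mathit{lo} = w \;\mathrm{MOD}\; 256\\
w = 4660 = 1234_{16} &\;\Rightarrow\; \mathit{hi} = 18 = 12_{16},\quad \mathit{lo} = 52 = 34_{16}\\
\text{bytes on the file: }&\; 12_{16},\ 34_{16}
\end{aligned}
```

WriteHex uses the same order. For odd i it prints the high byte of the current word, and for even i it prints the low byte. A one-word array holding 4660 is therefore printed as 1234.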
FileIo.MOD

(* FileIo            Simple IO with more than one file            Moe 16.8.87

   This module provides procedures which are similar to those of InOut,
   except that they can be used with more than one file (even with the
   console).
*)
IMPLEMENTATION MODULE FileIo;

FROM SYSTEM    IMPORT WORD, ADR, SETREG, REG, SHORT, VAL;
FROM MemTypes  IMPORT Str255, ProcPtr;
FROM OS        IMPORT DupFNErr, EOFErr, OSType, ParamBlockRec,
                      FS, PBHOpen, PBHCreate, PBClose, PBHDelete, PBRead, PBWrite,
                      HFS, GetCatInfo, SetCatInfo,
                      SFGetFile, SFPutFile, SFget, SFput, SFReply,
                      SFTypeList;
FROM QuickDraw IMPORT Point;
FROM Toolbox   IMPORT ModStr, PasStr;
FROM System    IMPORT Allocate, Deallocate;
IMPORT Terminal;

PROCEDURE Open(VAR f:File; volRef:INTEGER; fn:ARRAY OF CHAR;
               output:BOOLEAN);
VAR
  par:   ParamBlockRec;
  s:     Str255;
  pt:    Point;
  reply: SFReply;
  tlist: SFTypeList;
  i,j,l: INTEGER;

  PROCEDURE Create(drive:INTEGER; name:ARRAY OF CHAR;
                   type,creator:OSType; VAR status:INTEGER);
  VAR status1:INTEGER; par:ParamBlockRec;
  BEGIN
    WITH par DO
      ioNamePtr:=ADR(name); ioVRefNum:=drive; ioVersNum:=0C; ioDirID:=0;
      status:=FS(PBHCreate,par); status1:=0;
      IF status=DupFNErr THEN
        status1:=FS(PBHDelete,par);
        status:=FS(PBHCreate,par);
      END;
      IF (status=0) AND (status1=0) THEN (*set finder info*)
        ioFDirIndex:=0; status:=HFS(GetCatInfo,par);
        IF status=0 THEN
          ioFlFndrInfo.fdType:=type; ioFlFndrInfo.fdCreator:=creator;
          ioDirID:=0;
          status:=HFS(SetCatInfo,par);
        END;
      END;
    END;
  END Create;

BEGIN
  Done:=TRUE; errCode:=0;
  IF fn[0]=0C THEN (*get file name from dialog box*)
    pt.v:=60; pt.h:=100; PasStr(fn,s);
    IF output THEN
      SFPutFile(pt,s,s,VAL(ProcPtr,dlgHook),reply,SFput)
    ELSE
      i:=0;
      WHILE (i<4) AND (ftype[i,0]<>0C) DO
        FOR j:=0 TO 3 DO tlist[i,j+1]:=ftype[i,j] END;
        INC(i)
      END;
      SFGetFile(pt,s,VAL(ProcPtr,filterHook),i,tlist,
                VAL(ProcPtr,dlgHook),reply,SFget)
    END;
    IF reply.good THEN
      l:=ORD(reply.fName[0]);
      FOR i:=0 TO l DO s[i]:=reply.fName[i] END;
      volRef:=reply.vRefNum
    ELSE errCode:=2 (*cancel*)
    END;
  ELSIF (fn[1]=":") AND (fn[0]>="0") AND (fn[0]<="9") THEN
    volRef:=ORD(fn[0])-ORD("0");
    i:=2;
    WHILE (i<=HIGH(fn)) AND (fn[i]<>0C) DO s[i-1]:=fn[i]; INC(i) END;
    s[0]:=CHR(i-2);  (*length byte: characters fn[2..i-1] were copied*)
  ELSE PasStr(fn,s);
  END;
  IF output & (errCode=0) THEN
    Create(volRef,s,"TEXT","????",errCode);
  END;
  IF errCode=0 THEN
    WITH par DO
      ioNamePtr:=ADR(s); ioVRefNum:=volRef; ioVersNum:=0C; ioDirID:=0;
      ioPermssn:=0C; ioMisc:=NIL;
      errCode:=FS(PBHOpen,par);
      IF errCode=0 THEN
        Allocate(f,SIZE(FileRecord));
        IF f<>NIL THEN
          f^.ref:=ioRefNum; f^.bp:=0; f^.bb:=0; f^.volRef:=volRef;
          f^.eof:=FALSE; ModStr(s,f^.name); f^.output:=output;
        END;
      END;
    END;
  END;
  IF errCode#0 THEN Done:=FALSE; f:=NIL END;
END Open;

(* Close             Close file f
------------------------------------------------------------------------ *)
PROCEDURE Close(VAR f:File);
VAR par:ParamBlockRec;
BEGIN
  IF f=NIL THEN RETURN END; (*con cannot be closed*)
  par.ioRefNum:=f^.ref;
  IF f^.output THEN
    par.ioBuffer:=ADR(f^.buffer);
    par.ioReqCount:=f^.bp; par.ioPosMode:=0; par.ioPosOffset:=0;
    errCode:=FS(PBWrite,par)
  END;
  errCode:=FS(PBClose,par); Done:=errCode=0;
  Deallocate(f); f:=NIL;
END Close;

(* Read              Read a character from file f
------------------------------------------------------------------------ *)
PROCEDURE Read(f:File; VAR ch:CHAR);
VAR par:ParamBlockRec;
BEGIN
  IF f=NIL (*con*) THEN
    Terminal.Read(ch);
  ELSE
    WITH f^ DO
      IF bp>=bb THEN (*buffer exhausted: refill it*)
        par.ioRefNum:=ref;
        par.ioBuffer:=ADR(buffer);
        par.ioReqCount:=buffersize; par.ioPosMode:=0;
        par.ioPosOffset:=0;
        errCode:=FS(PBRead,par);
        IF errCode=EOFErr THEN errCode:=0 END;
        bb:=SHORT(par.ioActCount);
        IF bb=0 THEN (*nothing left: deliver EF*)
          buffer[0]:=EF; eof:=TRUE; Done:=FALSE; errCode:=EOFErr
        END;
        bp:=0
      END;
      ch:=buffer[bp];
      INC(bp)
    END
  END;
END Read;

(* ReadCard          Read a CARDINAL-constant from file f
------------------------------------------------------------------------ *)
PROCEDURE ReadCard(f:File; VAR val:CARDINAL);
VAR ch:CHAR; l:INTEGER;
BEGIN
  IF f=NIL THEN (*input from terminal*)
    l:=0; val:=0;
    REPEAT Terminal.Read(ch) UNTIL ch<>" ";
    WHILE ch>" " DO
      IF ch=DEL THEN
        IF l>0 THEN val:=val DIV 10; Terminal.Write(ch); DEC(l) END;
      ELSIF (ch>="0") AND (ch<="9") AND
            ((val<6553) OR ((val=6553) AND (ch<="5"))) THEN
        val:=10*val+VAL(CARDINAL,ORD(ch)-ORD("0"));
        Terminal.Write(ch); INC(l);
      END;
      Terminal.Read(ch);
    END;
    Done:=l>0;
  ELSE (*input from file*)
    val:=0; Done:=TRUE;
    REPEAT Read(f,ch) UNTIL ch<>" ";
    WHILE ch>" " DO
      IF (ch>="0") AND (ch<="9") AND Done AND
         ((val<6553) OR ((val=6553) AND (ch<="5"))) THEN
        val:=10*val+VAL(CARDINAL,ORD(ch)-ORD("0"));
      ELSE
        Done:=FALSE; val:=0;
      END;
      Read(f,ch);
    END;
  END;
  termCH:=ch;
END ReadCard;

(* ReadInt           Read an INTEGER-constant from file f
------------------------------------------------------------------------ *)
PROCEDURE ReadInt(f:File; VAR val:INTEGER);
VAR
  ch:   CHAR;
  sign: INTEGER;
  x:    CARDINAL;
  s:    ARRAY[1..80] OF CHAR;
  i:    INTEGER;
BEGIN
  ReadString(f,s);
  x:=0; val:=0; i:=1;
  IF s[i]="-" THEN sign:=-1; INC(i); ELSE sign:=1; END;
  ch:=s[i];
  LOOP
    IF ch=0C THEN Done:=TRUE; EXIT; END;
    IF (ch<"0") OR (ch>"9") THEN Done:=FALSE; EXIT; END;
    IF (x>3276) OR ((x=3276) AND (ch>"8")) THEN Done:=FALSE; EXIT; END;
    x:=10*x+VAL(CARDINAL,ORD(ch)-ORD("0"));
    INC(i); ch:=s[i];
  END;
  IF Done THEN
    IF x<=32767 THEN val:=sign*VAL(INTEGER,x);
    ELSIF sign=-1 THEN val:=-32767; DEC(val);
    ELSE Done:=FALSE;
    END;
  END;
END ReadInt;

(* ReadString        Read a string of characters from file f
------------------------------------------------------------------------ *)
PROCEDURE ReadString(f:File; VAR s:ARRAY OF CHAR);
VAR i:INTEGER; ch:CHAR;
BEGIN
  IF f=NIL (*con*) THEN
    REPEAT Terminal.Read(ch) UNTIL ch<>" ";
    i:=-1;
    WHILE ch>" " DO
      IF ch=DEL THEN
        IF i>=0 THEN Terminal.Write(10C); DEC(i) END;
      ELSIF i<HIGH(s) THEN
        Terminal.Write(ch); INC(i); s[i]:=ch;
      END;
      Terminal.Read(ch);
    END;
  ELSE
    REPEAT Read(f,ch) UNTIL ch<>" ";
    i:=-1;
    WHILE ch>" " DO
      IF i<HIGH(s) THEN INC(i); s[i]:=ch; END;
      Read(f,ch);
    END;
  END;
  termCH:=ch;
  INC(i);
  IF i<=HIGH(s) THEN s[i]:=0C; END;
END ReadString;

(* ReadWord          Read a word from file f without conversion
------------------------------------------------------------------------ *)
PROCEDURE ReadWord(f:File; VAR w:CARDINAL);
VAR i,j:CHAR;
BEGIN
  Read(f,i); Read(f,j);
  w:=256*ORD(i) + ORD(j);
END ReadWord;

(* Write             Write a character to list file
------------------------------------------------------------------------ *)
PROCEDURE Write(f:File; ch:CHAR);
VAR par:ParamBlockRec; status:INTEGER;
BEGIN
  IF f=NIL (*con*) THEN
    Terminal.Write(ch);
  ELSE
    WITH f^ DO
      IF bp>=buffersize THEN
        par.ioRefNum:=ref; par.ioBuffer:=ADR(buffer);
        par.ioReqCount:=buffersize; par.ioPosMode:=0;
        par.ioPosOffset:=0;
        status:=FS(PBWrite,par);
        bp:=0
      END;
      buffer[bp]:=ch; INC(bp)
    END
  END;
END Write;

(* WriteCard         Write a cardinal to list file
------------------------------------------------------------------------ *)
PROCEDURE WriteCard(f:File; nr:CARDINAL; w:INTEGER);
VAR
  l,d: INTEGER;
  t:   ARRAY[1..5] OF CHAR;
BEGIN
  l:=0;
  REPEAT
    d:=nr MOD 10; nr:=nr DIV 10;
    INC(l); t[l]:=CHR(ORD("0")+d);
  UNTIL nr=0;
  WHILE w>l DO Write(f," "); DEC(w); END;
  WHILE l>0 DO Write(f,t[l]); DEC(l); END;
END WriteCard;

(* WriteHex          Write length bytes from a
------------------------------------------------------------------------ *)
PROCEDURE WriteHex(f:File; s:ARRAY OF WORD; length:INTEGER);
VAR i,j:INTEGER; w:CARDINAL;

  PROCEDURE WriteHexDigit(b:INTEGER);
  BEGIN
    IF b<10 THEN Write(f,CHR(b+ORD("0")));
    ELSE Write(f,CHR(b-10+ORD("A")));
    END
  END WriteHexDigit;

BEGIN (*WriteHex*)
  j:=0;
  FOR i:=1 TO length DO
    IF ODD(i) THEN w:=VAL(CARDINAL,s[j]) DIV 256;
    ELSE w:=VAL(CARDINAL,s[j]) MOD 256; INC(j);
    END;
    WriteHexDigit(w DIV 16);
    WriteHexDigit(w MOD 16);
  END;
END WriteHex;

(* WriteInt          Write an INTEGER-value to file f
------------------------------------------------------------------------ *)
PROCEDURE WriteInt(f:File; i:INTEGER; w:INTEGER);
VAR
  l,d:  INTEGER;
  x:    CARDINAL;
  t:    ARRAY[1..5] OF CHAR;
  sign: CHAR;
BEGIN
  IF i<0 THEN
    sign:="-"; x:=VAL(CARDINAL,ABS(i+1)); INC(x);
  ELSE
    sign:=" "; x:=VAL(CARDINAL,ABS(i));
  END;
  l:=0;
  REPEAT
    d:=x MOD 10; x:=x DIV 10;
    INC(l); t[l]:=CHR(ORD("0")+d);
  UNTIL x=0;
  WHILE w>l+1 DO Write(f," "); DEC(w); END;
  IF (sign="-") OR (w>l) THEN Write(f,sign); END;
  WHILE l>0 DO Write(f,t[l]); DEC(l); END;
END WriteInt;

(* WriteLn           Skip to new line on list file
------------------------------------------------------------------------ *)
PROCEDURE WriteLn(f:File);
BEGIN
  IF f=NIL (*con*) THEN Terminal.WriteLn;
  ELSE Write(f,EOL);
  END;
END WriteLn;

(* WriteString       Write a string to list file
------------------------------------------------------------------------ *)
PROCEDURE WriteString(f:File; s:ARRAY OF CHAR);
VAR i:INTEGER;
BEGIN
  i:=0;
  LOOP
    IF i>HIGH(s) THEN EXIT;
    ELSIF s[i]=0C THEN EXIT;
    ELSIF s[i]="$" THEN WriteLn(f);
    ELSE Write(f,s[i]);
    END;
    INC(i);
  END
END WriteString;

(* WriteText         Write text to list file
------------------------------------------------------------------------ *)
PROCEDURE WriteText(f:File; t:ARRAY OF CHAR; l:INTEGER);
VAR i:INTEGER;
BEGIN
  FOR i:=0 TO l-1 DO Write(f,t[i]); END;
END WriteText;

(* WriteWord         Write a word to file f without conversion
------------------------------------------------------------------------ *)
PROCEDURE WriteWord(f:File; w:CARDINAL);
BEGIN
  Write(f,CHR(w DIV 256));
  Write(f,CHR(w MOD 256));
END WriteWord;

BEGIN
  con:=NIL;
  dlgHook:=VAL(DialogHook,NIL); filterHook:=VAL(FilterHook,NIL);
  ftype[0]:="TEXT"; ftype[1]:="";
  errCode:=0;
END FileIo.
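ReadCard, ReadInt and WriteInt guard against overflow with constants that only make sense for 16-bit arithmetic. The block below spells out those bounds. It assumes, as the constants 6553 and 3276 imply, a 16-bit CARDINAL with maximum 2^16-1 and a 16-bit two's-complement INTEGER. Here v and x stand for the value accumulated so far, and d stands for the next digit.

```latex
\begin{aligned}
&\textbf{ReadCard:}\ \ v < 6553 \;\Rightarrow\; 10v + d \le 10\cdot 6552 + 9 = 65529,\\
&\phantom{\textbf{ReadCard:}\ \ } v = 6553,\ d \le 5 \;\Rightarrow\; 10v + d \le 65535 = 2^{16}-1\\[4pt]
&\textbf{ReadInt:}\ \ x < 3276 \;\Rightarrow\; 10x + d \le 10\cdot 3275 + 9 = 32759,\\
&\phantom{\textbf{ReadInt:}\ \ } x = 3276,\ d \le 8 \;\Rightarrow\; 10x + d \le 32768 = 2^{15},\\
&\phantom{\textbf{ReadInt:}\ \ } x = 2^{15} \text{ only with sign } {-1}:\ \ \mathit{val} := -32767;\ \mathrm{DEC}(\mathit{val}) \;\Rightarrow\; \mathit{val} = -2^{15}\\[4pt]
&\textbf{WriteInt}\ (i < 0):\ \ x = |i+1| + 1 = |i|, \qquad i = -2^{15} \;\Rightarrow\; |i+1| = 32767 = 2^{15}-1
\end{aligned}
```

ReadInt takes the detour through -32767 and DEC(val), and WriteInt takes the detour through ABS(i+1). Both are needed because +32768 cannot be represented as a 16-bit INTEGER, so neither sign*32768 nor ABS(-32768) can be computed directly.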
System.DEF

(* System            System dependent module           (from MacMETH [86])

   The module System is the heart of the Modula-2 system on the Macintosh.
   It contains the loader and procedures to supply missing instructions
   of the processor (REAL and LONGINT arithmetic). There are also
   procedures for calling and terminating programs and handling the heap.
*)
DEFINITION MODULE System;   (*H.Seiler, C.Vetterli, 22-Dec-85/26-Feb-86*)

FROM SYSTEM IMPORT ADDRESS;

TYPE
  Status = (normal, moduleNotFound, fileNotFound, illegalKey,
            readError, badSyntax, noMemory, alreadyLoaded,
            killed, tooManyPrograms, continue, noApplication);

PROCEDURE Allocate(VAR ptr:ADDRESS; size:LONGINT);
(* Tries to allocate a memory area of the given size on the heap. If the
   space is not available, ptr returns NIL; otherwise ptr returns the
   address of the reserved area*)

PROCEDURE Deallocate(VAR ptr:ADDRESS);
(* Releases the memory area given by address ptr. ptr returns NIL*)

PROCEDURE Terminate(status:Status);
(* Terminates the currently running process. status signals the
   cause of termination*)

END System.
Bibliography
Aho A.V., Johnson S.C. [1974] LR parsing, Computing Surveys 6, 2, 99-124
Aho A.V., Ullman J.D. [1972] The Theory of Parsing, Translation, and Compiling, Prentice Hall
Aho A.V., Ullman J.D. [1977] Principles of Compiler Design, Addison-Wesley
Bauer F.L., Eickel J. (eds) [1976] Compiler Construction. An Advanced Course, Springer-Verlag
Blaschek G., Pomberger G., Ritzinger F. [1985] Einführung in die Programmierung mit Modula-2, Springer-Verlag, to appear in English 1989
Engelfriet J., Filè G. [1981] Passes, Sweeps, and Visits, in: Lecture Notes in Computer Science 115, Springer-Verlag, 193-207
Feldman J.A., Gries D. [1968] Translator writing systems, CACM 11, 2, 77-113
Fischer C.N., LeBlanc R.J. [1988] Crafting a Compiler, The Benjamin/Cummings Publishing Company
Ganzinger H., Giegerich R. [1984] Attribute coupled grammars, SIGPLAN Notices 19, 6, 157-170
Gries D. [1971] Compiler Construction for Digital Computers, Wiley
Hartmann A.C. [1977] A Concurrent Pascal Compiler for Minicomputers, Springer-Verlag
Henderson P., Snowdon R. [1972] An experiment in structured programming, BIT 12, 38-53
Hopcroft J.E., Ullman J.D. [1979] Introduction to Automata Theory, Languages, and Computation, Addison-Wesley
Hughes J.W. [1979] A formalization and explication of the Michael Jackson method of program design, SOFTWARE - Practice and Experience 9, 191-202
Inside Macintosh [1985] volumes I-III, Addison-Wesley
Jackson M.A. [1975] Principles of Program Design, Academic Press
Johnson S.C. [1975] YACC - Yet Another Compiler-Compiler, Tech.Rep.Nr.32, Bell Laboratories, July 1975
Kastens U., Hutt B., Zimmermann E. [1982] GAG: A Practical Compiler Generator, in: Lecture Notes in Computer Science 141, Springer-Verlag
Knuth D.E. [1965] On the translation of languages from left to right, Information and Control 8, 6, 607-639
Knuth D.E. [1968] Semantics of context-free languages, Mathematical Systems Theory 2, 127-145
Koskimies K. [1984] A specification language for one-pass semantic analysis, SIGPLAN Notices 19, 6, 179-189
Koskimies K., Räihä K.-J., Sarjakoski M. [1982] Compiler construction using attribute grammars, Proc. SIGPLAN 82 Symposium on Compiler Construction, June 1982, 153-159
Lewis P.M., Rosenkrantz D.J., Stearns R.E. [1976] Compiler Design Theory, Addison-Wesley
Lewis P.M., Stearns R.E. [1968] Syntax directed transduction, Journal ACM 15, 3, 464-488
Meijer H., Nijholt A. [1982] YABBER - yet another bibliography: translator writing tools, SIGPLAN Notices 17, 10
Mössenböck H. [1986] Alex - a simple and efficient scanner generator, SIGPLAN Notices 21
Pomberger G. [1986] Software Engineering and Modula-2, Prentice Hall
Räihä K.-J. [1977] On Attribute Grammars and their Use in a Compiler Writing System, Report A-1977-4, Department of Computer Science, University of Helsinki
Räihä K.-J. [1980] Bibliography on attribute grammars, SIGPLAN Notices 15, 3
Räihä K.-J., et al. [1983] Revised Report on the Compiler Writing System HLP78, Report A-1983-1, Department of Computer Science, University of Helsinki
Rosen S. (ed.) [1967] Programming Systems and Languages, McGraw-Hill, New York
Rosenkrantz D.J., Stearns R.E. [1970] Properties of deterministic top-down grammars, Information and Control 17, 3, 226-256
Spenke M., Mühlenbein H., Mevenkamp M., et al. [1984] A language independent error recovery method for LL(1) parsers, SOFTWARE - Practice and Experience 14, 11
Tienari M. [1980] On the Definition of an Attribute Grammar, in: Lecture Notes in Computer Science 94 (eds Goos G., Hartmanis J.), Springer-Verlag
Waite W.M., Goos G. [1984] Compiler Construction, Springer-Verlag
Watt D.A., Lehrmann Madsen O. [1983] Extended attribute grammars, The Computer Journal 26, 2, 142-153
Wirth N. [1982] Programming in Modula-2, Springer-Verlag
Wirth N. [1986] Compilerbau, B.G. Teubner Stuttgart
Wirth N., Gutknecht J., Heiz W., et al. [1986] MacMETH - A Fast Modula-2 Language System for the Apple Macintosh, User Manual, ETH Zürich
Index
actual attributes, 113, 165
address list for G-code generation, 157
Adele, 11, 125, 203
Aho, 13, 41
Alex, 119
Algol60, 52
algorithmic interpretation of grammars, 83
alias name, 109, 123
aliasspix, 128
alphabet, 14
  extension, 51
alternative chain, 48, 108
alternatives, 15
  of deletable nonterminals, 137
  of eps-nodes, 137
ambiguity, 108
analysis phase, 4
analyzing grammar, 23
AND, 208
any, 45, 107, 122, 124, 178
any-set, 140, 147, 155
anyset, 54
applications of attributed grammars, 171
arithmetic expressions, 19
arithmetization of symbols, 6
arrows, 112
assessment of some compiler generators, 102
at, 122, 165
Atari, 101, 126
attribute, 71, 72, 113
  assignment, 131, 165
  context, 167
  coupling, 98
  direction, 164
  evaluation, 79
  list, 129, 164, 226
  numbers, 155
  passing, 87
  processing, 164
  saving, 90
attributed grammar, 73, 79, 105
  applications, 171
  of Coco, 228
attributes
  consistency check, 165
  of terminals, 122
Attrkind, 166
back end, 6
Bauer, 7
BITSET, 208
Blaschek, 207
BNF, 102
bottom-up syntax analysis, 24
brackets, 136
caller interface, 121
CAP, 209
CARDINAL, 208
central-recursive grammar, 19
characteristics of Coco, 117
CheckAlternatives, 153
circular, 108
  derivation, 21
  grammar, 21
circularity, 150
CloseFile, 223
Coco, 4, 104, 222, 241
  characteristics, 117
  history, 197
  short description, 100
coco.ATG, 228
cocogen, 224, 245
cocogen2, 225, 254
cocogra, 224, 266
Cocol, 4, 105
  example, 101, 134, 163, 167, 174, 186, 190, 192
  syntax, 212
cocolex, 223, 275
cocolst, 226, 283
cocosem, 223, 287
cocosemframe, 161, 297
cocosym, 224, 299
cocosyn, 223, 316
cocosynframe, 159, 328
cocotst, 225, 338
col, 122
CollectFirst, 143
CollectFollow, 144
comments, 106, 110
compiler, 2
compiler compiler, 3, 91
compiler description language, 3, 105
compiler error numbers, 241
compiler structure
  dynamic, 8
  static, 4
complement symbol any, 45, 107
Complete, 145
CompleteAt, 129, 223
completeness, 108, 149
components of a generated compiler, 119
compound characters, 6
ConcatLeft, 133, 223
ConcatRight, 132, 223
context condition, 76, 87, 115
context-free grammar, 15, 106
Copy, 162, 163, 223
CopyFramePart, 160, 161
correct grammar, properties, 108
cross-reference list, 214
cyclic semantic dependencies, 82
dangling else, 29, 108, 147
debug switches, 241
DEC, 209
declaration of
  semantic objects, 115
  symbols, 109
definition module, 210
DelEps, 139
deletability, 31
  direct, 128, 134
  indirect, 141
Deletable, 60, 141
deletable nonterminal, 31, 141
Delete redundant eps-nodes, 127, 138
DelGraph, 141
derivable symbol, 21
derivation, 16
  rules, 15
derived attributes, 74
deterministic grammar, 24
direct deletability, 128, 134
documentation, 187
dynamic
  compiler structure, 8
EBNF, 19, 20, 107, 117
Emit, 157
EmitAction, 166, 167, 223
empty string, 14, 107
end-of-file symbol, 109
end-of-line symbol, 110
endsem, 70
Engelfriet, 98
eps, 107
  followers, 54
eps-nodes
  insertion, 136
  removal, 138
  terminal successors of, 140
eps-set, 140, 145, 155
  example, 196
epsset, 54
equivalent top-down graphs, 45
errdist, 68
Error, 60, 65, 68
error distance, 68
error handling, 62, 64
error message module, 119, 226, 348
error messages, 65, 123
Errorptr, 123
Errors, 123, 226, 348
example of
  Cocol, 101, 163, 167, 174, 186, 190, 192
  generated compiler parts, 192
EXCL, 209
exit statement, 209
experiences, 197, 201
export list, 209
extended Backus-Naur form, 19
factorization of
  nonterminals, 49
  top-down graphs, 43
Filè, 98
FileIo, 226, 356
Fill, 67
FillSucc, 67
filter procedure, 120
Find circular rules, 148, 150
Find deletable symbols, 127, 141
FindEps, 146
FindEpsFollowers, 146
first(X), 26, 54
Fischer, 13
follow(X), 28, 143
formal attributes, 113, 165
frame module, 118, 159, 161, 297, 328
free monoid, 14
free semi-group, 14
front end, 6
G-code, 53, 55, 88, 117, 155, 213
  example, 195
  generation, 156
  parser, 58
GAG, 91, 96, 102, 104
Ganzinger, 91, 98
GenAssign, 166, 167, 223
GenCode, 156, 157
Generate G-code, 157
generated compiler parts, 118
  example, 192
generated compiler, operation, 120
generated semantic actions, 165
generation of the
  semantic evaluator, 245
  syntax analyzer, 254
generative grammar, 23
Get eps-sets, 145
Get symbol sets, 127
Get terminal start symbols, 142
Get terminal successors, 144
GetAdr, 157
GetAt, 129, 165, 167, 223
GetFirstSet, 142
GetMacroNr, 163, 223
GetNode, 131, 140, 148, 157, 223
GetSingles, 151
GetSy, 122, 124, 129, 140, 148, 223
Giegerich, 91
Goos, 13, 82, 83
GRAMMAR, 106
grammar, 15
grammar of Cocol, 212
grammar name, 106, 110, 121
grammar rules, 107
grammar tests, 126, 147, 225, 338
grammars in matrix form, 34
grammatical language levels, 22
GraphList, 223
Graphnode, 47, 130
Gries, 7, 13, 85
HALT, 209
handle, 18
Hartmann, 85
Henderson, 184
HIGH, 209
hints for reading the source lists, 226
HLP84, 91, 94, 104
Hopcroft, 21
Hughes, 188
Hutt, 96
IBM-PC, 101, 126
identifiers, 106
implementation description, 125
implementation module, 210
implementation restrictions, 241
import, 115, 122
  list, 209
INC, 209
INCL, 209
indirect deletability, 141
individual characters, 6
inherited attributes, 74, 75
inner module, 211
input attribute, 113
input of Coco, 118
input interface, 122
Insert eps-nodes before deletable nt's, 127, 138
interfaces of the generated compiler, 121
intermediate language, 120, 124
intermodular cross-reference list, 214
invocation of Coco, 118
IsTerm, 152
Jackson, 187
Johnson, 13, 91, 92
Kastens, 91, 96
375
376
Index
keywords, 6, 105
Knuth, 13, 29, 82
Koskimies, 91, 94, 102
Coco, 199
the generated compilers, 200
MenTypes, 226
mini-scanner, 174
L-attributed grammar, 4, 82, 83, 92, 117
LALR(I) parser, 92,94, 96
language, 16
levels, 22
LeBlanc, 13
left-canonical derivation, 17
left-recursive grammar, 19
Lewis, 82
lexical
analysis, 5, 6
analyzer, 119, 122, 129, 165, 275
analyzer described by Cocol, 171
analyzer, specification, 172
language level, 22
Lilith, 101, 126, 198
line, 122
line numbers, 122, 131
linking
alternative graphs, 133
component graphs, 132
listings, 220
literals, 6
LL(1) test, 148, 153
LL(1) analysis
nonrecursive, 38
recursive, 35
LL(1) conditions, 27, 28
for top-down-graphs, 47, 49
LL(1) conflicts, 108
in lexical structures, 179
LL(1) grammar, 23, 26 ; 201
LL(k) condition, 40
LL(k) grammar, 25, 40
LL(k) test, 41
lookahead, 25
Macintosh, 101, 119, 126
macro, 112, 116, 163
main algorithm of Coco, 127
main program, 119, 121, 210, 222, 241
MarkReachedNts,
150
matching of symbols, 48
matrix form of grammars, 34
measurements, 197
Meijer, 91
memory requirements of
Modula-2, 111, 115, 119, 126, 207
modules, 209
description, 222
hierarchy, 221
overview, 220
Mössenböck, 119
MUG, 91, 98, 104
multi-pass compiler, 8, 9, 120, 124
name list, 129, 155
names, 6
NewAdr, 157
NewAt, 129, 164, 167, 223
NewMacro, 223
NewNode, 131, 223
NewSy, 129, 223
Nijholt, 91
nococosy, 162
nodes of the top-down graph, 130
non-circular grammar, 21
nonterminal, 14, 15, 110, 128
deletable, 141
nonterminals
factorization of, 49
replacement of, 15
substitution of, 49
terminal successors of, 140, 143
termination of, 108, 152
numbering of terminals, 109, 122
numbers, 106
OpenFile, 223
OpenSem, 163, 223
optimization of attribute processing, 167
option symbol, 20
OR, 208
ordered attributed grammar, 96
OS, 226
output attribute, 113
output of Coco, 118
output interface, 122
parameter arrows, 112
Parse, 58, 60, 86, 121, 127
ParseNonRecursive, 38
parser, 223, 316
generation, 159
interface, 121
tables, 118, 155
tables, example, 195
tables, generation, 154
ParseRecursive, 35
parts of the generated compiler, 119
Pascal, 207
pass, 8
phrase, 17, 18
PL/1, 50
PLM/80, 50
Pomberger, 207
pragma, 109, 124
semantics, 113, 128, 155
printinput, 121
printnodes, 121
procedures, 115
productions, 15, 107
program frames, 118
program listings, 220
QuickDraw, 226
Räihä, 91, 94
reachability, 149
recursive
grammar, 19
productions, 19
reduced grammar, 20, 21
redundancy, 108
redundant
eps-node, 138
symbol, 21
repetition symbol, 20
replacement of nonterminals, 15
RepNode, 131, 140, 223
RepSy, 129, 140, 223
RestartHash, 162, 223
restrictions, 241
results of a Coco run, 192
right end of graphs, 131
right-recursive grammar, 19
Ritzinger, 207
root, 15
Rosenkrantz, 40, 42
RULES, 107
run-time of
Coco, 199
the generated compilers, 201
scanner, 129, 165, 223, 275
scanner generator, 119, 171
scanner interface, 122
scanner procedure, 122
scanner specification, 172
scope of semantic objects, 116
sem, 70, 111
Semant, 85, 86
semantic
action numbers, 131
actions, 70, 111
actions, generated, 165
actions, processing, 163
analysis, 5, 8
declarations, copying, 162
description, 110
error action, 115
evaluator, 118, 119, 223
evaluator of Coco, 287
evaluator, example, 194
evaluator, generation, 160
frame module, 297
interface, 85
macro, 111, 112, 116, 163
modules, 119, 122
procedures for lexical analysis, 180
semantics, 69
sentence, 16
symbol, 15
sentential form, 16
simple phrase, 18
single-pass compiler, 8, 9
Snowdon, 184
software engineering, 182
source code, 220
hints, 226
source list, 118
generator, 283
source program, 2
spelling index, 129
spix, 128, 129, 162, 166
stacking of semantic objects, 116
start symbol, 110, 149
StartCopy, 223
static compiler structure, 4
Stearns, 40, 42
stepwise refinement, 11
StopHash, 162, 223
strings, 6, 14, 106
substitution of nonterminals, 49
symbol, 106, 110, 149
symbol list, 126, 127, 224, 226, 299
symbol names, 129
symbol sets, collection, 140
Symbolnode, 127
symbols, 6, 14
Symboltype, 127
SyNr, 129, 223
syntactic extension, 51
syntactical language level, 22
syntax
analysis, 5, 34
analyzer, 118, 119, 223, 316
analyzer, generation, 159
of Cocol, 212
description, 106
error indicator, 121
error interface, 123
error message, 109
error-recovery, 118
notation, 107
rules, 15, 107
tree, 7, 14, 17, 91
SyntaxError, 123
synthesis phase, 5
synthesized attributes, 74
SYSTEM, 211
System, 226
system specific procedures, 369
target program, 2
tasks of Coco, 126
telegram problem, 184
terminal, 14, 15, 109, 122, 128
class, 23
start symbols, 26, 31, 32, 140, 142
start symbols of length k, 40
successors, 28, 31, 33
successors of eps-nodes, 140, 145
successors of nonterminals, 140, 143
terminating symbol, 21
termination, 21
of nonterminals, 108, 152
Test completeness, 148, 149
Test grammar, 127, 148
Test if all nt's can be derived to t's, 148, 152
Test if all nt's can be reached, 148, 149
token code, 109, 122
Toolbox, 226
top-down
graph, 42, 126, 130, 226, 266
graphs, equivalent, 45
graphs, factorization of, 43
syntax analysis, 23, 24
top-down-graphs, LL(1) conditions for, 47, 49
trace switches, 241
tracing the parser, 121
Triple, 66
two level-grammar, 77
typ, 122
type transfer functions, 209
Ullman, 13, 21, 41
understanding the source code, hints, 226
useless symbol, 21
user modules, 122
using Coco, 117
Vach, 98
van Wijngaarden, 77
variables, 115
versions of Coco, 4
Visited, 157
vocabulary, 14
Waite, 13, 82, 83
Watt, 77
where, 77
Wirth, 20, 85, 107, 198, 207
WORD, 208
YACC, 91, 92, 98, 104
Zimmermann, 96
A COMPILER GENERATOR
FOR MICROCOMPUTERS
Presents a practical approach to compiler construction,
illustrating how to convert the theoretical principles of
compiler writing into a working program. The book
describes the compiler generator Coco, developed by
the authors in Modula-2 to run on microcomputers.
Features include:
• A detailed description of a compiler generator, including
its source code.
• The application of the compiler generator to non-trivial
problems.
• Emphasis on table-driven syntax analysis with automatic
error recovery, and on the semantic specification of compilers
by means of attributed grammars.
• Illustration of the application of documentation methods
to a large program.
P. Rechenberg is Professor of Computer Science at the
University of Linz, Austria, and H. Mössenböck is
Assistant Professor of Computer Science at the Federal
Institute of Technology (ETH), Zurich, Switzerland.
Prentice Hall
ISBN 0-13-155060-8