A COMPILER GENERATOR FOR MICROCOMPUTERS
Limits of Liability and Disclaimer of Warranty The authors and publishers of this book have used their best efforts in preparing this book and the programs contained within it. These efforts include the development, research and testing of the theories and programs to determine their effectiveness. The authors and publishers make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The authors and publishers shall not be liable in any event for incidental and consequential damages in connection with, or arising from, the furnishing, performance or use of these programs.
Peter Rechenberg, University of Linz
Hanspeter Mössenböck, University of Zürich
Translated by John O’Meara and the authors
First published in English 1989 by Prentice Hall International (UK) Ltd, 66 Wood Lane End, Hemel Hempstead, Hertfordshire, HP2 4RG
A division of Simon & Schuster International Group

This book was originally published in German under the title Ein Compiler Generator für Mikrocomputer by Peter Rechenberg and Hanspeter Mössenböck © 1985 Carl Hanser Verlag, Munich and Vienna.

© 1989 Carl Hanser Verlag and Prentice Hall International (UK) Ltd

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission, in writing, from the publisher. For permission within the United States of America contact Prentice Hall Inc., Englewood Cliffs, NJ 07632.

Printed and bound in Great Britain by A. Wheaton & Co. Ltd, Exeter.

Library of Congress Cataloguing-in-Publication Data
Rechenberg, Peter
[Compiler-Generator für Mikrocomputer. English]
A compiler generator for microcomputers / Peter Rechenberg, Hanspeter Mössenböck.
p. cm.
Translation of: Ein Compiler-Generator für Mikrocomputer.
Bibliography: p.
Includes index.
ISBN 0-13-155060-8 : $40.00
1. Compilers (Computer programs) 2. Microcomputers--Programming. I. Mössenböck, Hanspeter, 1959- . II. Title
QA76.76.C65R4313 1988 005.26--dc19 88-28926

British Library Cataloguing in Publication Data
Rechenberg, Peter
A compiler generator for microcomputers.
1. Computer systems. Programming languages. Compilers. Design & construction
I. Title II. Mössenböck, Hanspeter III. Ein Compiler-Generator für Mikrocomputer. English
005.4'53
ISBN 0-13-155060-8
ISBN 0-13-155136-1 Pbk
Contents

Preface
Numbered definitions, algorithms, examples
Symbols

1 Introduction and survey
  1.1 Compilers and compiler compilers
  1.2 Static compiler structure
  1.3 Dynamic compiler structure
  1.4 The structure of the book

2 Syntax
  2.1 Basic concepts from formal language theory
  2.2 LL(1) grammars and syntax analysis
  2.3 The top-down graph
  2.4 The G-code
  2.5 Parsing with the G-code
  2.6 Error handling

3 Semantics
  3.1 Semantic actions
  3.2 Attributes
  3.3 Context conditions
  3.4 Attributed grammars
  3.5 L-attributed grammars
  3.6 Implementation of the semantic interface

4 Various compiler compilers
  4.1 YACC - yet another compiler compiler
  4.2 HLP84 - Helsinki language processor
  4.3 GAG - generator based on attribute grammars
  4.4 MUG - modular compiler generator
  4.5 Coco - compiler compiler
  4.6 Summary

5 The compiler description language Cocol
  5.1 Lexical structure
  5.2 Cocol as a syntax description language
    5.2.1 Productions
    5.2.2 Declarations
  5.3 Cocol as a semantic description language
    5.3.1 Semantic actions
    5.3.2 Attributes
    5.3.3 Context conditions
    5.3.4 Semantic declarations
    5.3.5 Scope of semantic objects

6 The compiler compiler Coco
  6.1 Characteristics
  6.2 Components of the generated compiler
  6.3 Operation of the generated compiler
  6.4 Interfaces of the generated compiler
    6.4.1 Caller interface
    6.4.2 Input interface
    6.4.3 Output interface
    6.4.4 Syntax error interface
  6.5 Generation of multi-pass compilers

7 The implementation
  7.1 Survey
  7.2 Structure of the symbol list
    7.2.1 Symbol list representation
    7.2.2 Symbol list construction
  7.3 Structure of the top-down graph
    7.3.1 Top-down graph representation
    7.3.2 Top-down graph construction
    7.3.3 Insertion of eps-nodes
    7.3.4 Removal of redundant eps-nodes
  7.4 Collecting the symbol sets
    7.4.1 Deletable nonterminals
    7.4.2 Terminal start symbols of nonterminals
    7.4.3 Terminal successors of nonterminals
    7.4.4 eps-sets
    7.4.5 any-sets
  7.5 Grammar tests
    7.5.1 Completeness
    7.5.2 Reachability
    7.5.3 Noncircularity
    7.5.4 Termination
    7.5.5 LL(1) condition
  7.6 Generation of the parser tables
    7.6.1 Table format
    7.6.2 Generation of the G-code
    7.6.3 Generation of the remaining tables
  7.7 Generation of the syntax analyzer
  7.8 Generation of the semantic evaluator
    7.8.1 The invariant parts of the semantic evaluator
    7.8.2 Processing of the semantic declarations
    7.8.3 Processing of the semantic actions
    7.8.4 Attribute processing

8 Applications
  8.1 Applications in compiler construction
    8.1.1 Specification of a lexical analyzer
    8.1.2 Description of a lexical analyzer for Modula-2
    8.1.3 Semantic procedures for lexical analysis
  8.2 Applications in software engineering
    8.2.1 Attributed grammars as a software design method
    8.2.2 The telegram problem as an example
    8.2.3 Attributed grammars as documentation
    8.2.4 The Jackson method as a special case
  8.3 Results of a Coco run
    8.3.1 The generated syntax analyzer
    8.3.2 The generated semantic evaluator
    8.3.3 The generated parser tables

9 Experiences with Coco
  9.1 A basis for measurements
  9.2 Measurements on Coco
  9.3 Measurements on some generated compilers
  9.4 General experiences

Appendices
  A Definition of Adele
  B Modula-2 and Pascal
  C Syntax of Cocol
  D G-code
  E Intermodular cross-reference list
  F Program listings

Bibliography
Index
Preface

This book describes the structure of the compiler compiler Coco, which was developed for microcomputers by the authors. It also deals with the techniques used by Coco and those by which Coco was developed. Special attention is given to the table driven top-down syntax analysis with automatic error recovery and description of semantics using L-attributed grammars. Coco is written in Modula-2 and generates compilers in Modula-2. It is hoped that this will show how well Modula-2 is suited to the implementation and documentation of large modular programs.

Compiler compilers, as we understand them, are not the field of a few specialists in compiler construction, but rather are tools for managing various tasks in software engineering, a fact which is not generally known. The methodology of attributed grammars which lies at the foundation of compiler compilers includes, for example, the Jackson method as a simple special case, and can be applied where the program flow is primarily controlled by one structured input data stream.

Thus this book has something to offer for a wide circle of readers:

1. It is a representation of the principles of compiler construction, as far as they concern the analysis part of compilers, especially LL(1)-syntax analysis with attributed grammars. (Lexical analysis is covered only marginally.)
2. It is a detailed description of a compiler compiler.
3. It illustrates the application of a compiler compiler by numerous examples.
4. It illustrates the application of software documentation methods on a large program system, especially the method of stepwise refinement and the use of an algorithm description language.
5. It can be used to evaluate the suitability of Modula-2 for software engineering because it presents a large program in Modula-2 which exploits the special properties of modular programming.

We consider the primary circle of readers to be advanced computer science students, theoretically and practically active computer scientists and software engineers. We therefore presuppose the usual terminology, assume that the reader is acquainted with the development of software and that he can read Pascal, or even better Modula-2, or some similar language. Accordingly, we have kept the discussion brief, but have also taken pains not to refer to special knowledge cited elsewhere to make the book understandable in itself.

The focal point around which the entire book evolved is the complete Modula-2 code of Coco in Appendix F. We consider the publishing of such a large program system a gamble because we are not sure whether the reader will be interested in the numerous details in it, and because we expose ourselves to all sorts of criticism of our programming style and choice of algorithms. But at the same time we hope that it is just this completeness which makes the book valuable and distinct from others.

For information concerning the structure of the book the reader is referred to Section 1.4.

The Austrian Foundation for the Advancement of Scientific Research financially supported the development of the compiler compiler and thereby rendered it possible, for which we wish to express our appreciation. For the careful review of the manuscript and for helpful suggestions we wish to thank our colleagues and friends Prof. G. Pomberger, Dr G. Blaschek and F. Ritzinger; for proof reading the English translation we wish to thank D. Raye; for the review of the examples in Chapter 4 we wish to thank Prof. H. Ganzinger, Prof. U. Kastens, Dr K. Koskimies and Prof. R. Marty.

The text was produced by ourselves with the text processor WriteNow on a Macintosh computer.

Linz, August 1988
P. Rechenberg
H. Mössenböck
Numbered definitions, algorithms, examples

1.1 Definition Compiler
1.2 Definition Compiler compiler
1.3 Versions of Coco
1.4 Example Lexical analysis
1.5 Example Syntax tree

2.1 Definition Abbreviations for strings and sets of strings
2.2 Definition Grammar
2.3 Definition Derivation, sentential form, sentence, language
2.4 Example Derivation of all sentential forms of a language
2.5 Definition Left-canonical derivation
2.6 Definition Phrase
2.7 Definition Simple phrase, handle
2.8 Example Phrase, simple phrase, handle
2.9 Definition Recursive grammar
2.10 Example Arithmetic expressions
2.11 Definition Terminating symbol, derivable symbol
2.12 Definition Useless symbol
2.13 Definition Reduced grammar
2.14 Definition LL(k) grammar
2.15 Definition Terminal start symbols of a nonterminal
2.16 Definition Terminal start symbols of a string
2.17 LL(1) conditions for ε-free grammars
2.18 Example LL(1) conditions
2.19 Definition Terminal successors
2.20 LL(1) conditions for arbitrary grammars
2.21 Example LL(1) conditions
2.22 Example Dangling else
2.23 Definition Deletability
2.24 Algorithm Marking deletable symbols
2.25 Algorithm Calculation of the sets of terminal start symbols
2.26 Algorithm Calculation of successor sets
2.27 Algorithm LL(1) analysis (recursive)
2.28 Example Recursive LL(1) parsing
2.29 Algorithm LL(1) parsing (nonrecursive)
2.30 Example Nonrecursive LL(1) parsing
2.31 Definition Terminal start symbols of length k
2.32 Definition LL(k) grammar
2.33 LL(k) condition
2.34 Example LL(2) and LL(3) test
2.35 Example Basic structure of the top-down graph
2.36 Definition Complement symbol any
2.37 Example Equivalent top-down graphs
2.38 Definition Alternative chain
2.39 Example Alternative chains
2.40 Definition Match
2.41 Definition LL(1) conditions for top-down graphs
2.42 Definition G-code (incomplete)
2.43 Algorithm Parse (simplified)
2.44 Algorithm Parse (complete)
2.45 Example Error situation
2.46 Principle of error handling
2.47 Algorithm Error (basic structure)
2.48 Algorithm Triple
2.49 Algorithm Fill
2.50 Algorithm FillSucc
2.51 Algorithm Error (with heuristic enhancements)

3.1 Example Semantic actions
3.2 Example Semantic actions
3.3 Example Interpretation of arithmetic expressions
3.4 Example Interpretation of arithmetic expressions in EBNF
3.5 Example Inherited attributes
3.6 Example A context-sensitive language
3.7 Example Context condition
3.8 Example Context condition
3.9 Definition Attributed grammar
3.10 Example Variable declaration
3.11 Definition L-attributed grammar
3.12 Parser with semantic interface
3.13 Example Attribute passing
3.14 Definition G-code (remainder)
3.15 Principle of attribute saving for recursive symbols

4.1 Example Attributed grammar as input for YACC
4.2 Example Attributed grammar as input for HLP84
4.3 Example Attributed grammar as input for GAG
4.4 Example Attributed grammar as input for MUG
4.5 Example Attributed grammar as input for Coco

5.1 Example Cocol grammar for real constants
5.2 Example The use of eps
5.3 Example The use of any
5.4 Example How the compiler treats LL(1) conflicts
5.5 Example Terminal declarations
5.6 Example Pragma declarations
5.7 Example Nonterminal declarations
5.8 Example Semantic actions
5.9 Example Indication of data flow at parameters
5.10 Example Semantic macros
5.11 Example Semantic actions for pragmas
5.12 Example Attributes
5.13 Example Context conditions
5.14 Example Declarations of semantic objects
5.15 Example Stacking of semantic objects

6.1 Example Application of any

8.1 Example LL(1) conflicts in lexical structures
Symbols

a^n            The string of n identical symbols a
a^+            The set {a^n : n ≥ 1}
a^*            The set {a^n : n ≥ 0}
G              Grammar
O              Order (asymptotic time complexity)
[illegible]    Sentence symbol
V              Alphabet
V_T            Alphabet of terminals
V_N            Alphabet of nonterminals
V^+            The set of all non-empty strings built from symbols of V
V^*            The set of all strings built from symbols of V including the empty string
ε              The empty string
∅              The empty set
∈              Element of
∩              Intersection of two sets
∪              Union of two sets
→              Replacement symbol: 'is defined as'
|              Separates alternatives
[ ]            Option notation (encloses optional symbols and strings)
{ }            Set notation
{ }            Repetition notation
⇒              Direct derivation: 'produces directly'
⇒+             Derivation: 'produces'
⇒*             Derivation: 'produces or is equal to'
[illegible]    Left-canonical derivation
[illegible]    'Does not produce and is not equal to'
[illegible]    Input, output, transient parameters
[illegible]    Strings
[illegible]    String to be analyzed
1 Introduction and survey

The older of the two authors distinctly remembers that he first heard the word 'compiler compiler' at the IFIP-Congress in Munich in 1962 in connection with Atlas, the super computer of its time by the English company Ferranti. It was a dark, secretive term. Since compiler writing was still an art mastered by only a few initiates, one could only touch one's cap humbly to people who were involved in writing compilers which generated compilers. There was just no way to understand them.

The two works which focused attention on compiler generating programs and which eliminated much of the mystery from the concept were the anthology by Rosen [1967] and the survey article Translator Writing Systems by Feldman and Gries [1968]. But it was the clear formulation of the two most important deterministic grammars, LR(k)-grammars by Knuth [1965] and LL(k)-grammars by Lewis and Stearns [1968], that helped compiler generators achieve the actual breakthrough.

Today, the terms 'compiler generator', 'compiler generating program' and 'compiler compiler' are used synonymously and refer to a system which in some way supports and partially automates the production of compilers.

In the first chapter we introduce the concepts of 'compiler' and 'compiler compiler', survey the subtasks which a compiler must handle and discuss the organization of the book. The reader who is acquainted with the terminology of compiler construction, even only partially, can start immediately with Section 1.4.
1.1 Compilers and compiler compilers

With the exception of special cases, a program can be seen as the description of a process (algorithm) which transforms input data into output data (Fig. 1.1).

Fig. 1.1 Program

If the input data themselves form a program, and the program P transforms them into another language, P is called a compiler, the input data are called the source program and the output data are called the target program (Fig. 1.2).

Fig. 1.2 Compiler (source program S, compiler C, target program T)

Here, the source language is almost inevitably the higher, less machine-oriented language, and the target language the lower, more machine-oriented language, often the machine language itself. Thus a compiler can be defined as in Waite and Goos [1984].

1.1 Definition Compiler
    A compiler is a program which transforms an algorithm from a language acceptable to humans into a language acceptable to machines.

Because a compiler is a complex program which itself must be written in a programming language, the question arose quite early as to whether, given an abstract description of the source language and its transformation into a target language, a compiler could be generated either completely or partially.

A program CC which is to solve such a task reads the description of the source language S together with its transformation into a target language T as input data. It transforms this description into a program C which, when it is later executed, transforms source programs written in S into the target language T. Thus CC generates a compiler C, and is known as a compiler generator or compiler compiler (Fig. 1.3).
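The two levels involved here (CC reads a description and produces C; C reads a source program and produces a target program) can be illustrated by a deliberately tiny sketch in Python. It is not how Coco works: the 'description' is merely a table of binary operators with assumed precedences and invented target instructions (PUSH, ADD, MUL), and the generated 'compiler' translates infix expressions without parentheses into code for a hypothetical stack machine.

```python
import re

def generate_compiler(description):
    """Toy 'compiler compiler': builds a compiler from a description.

    `description` maps each binary operator of the source language to a
    pair (precedence, target instruction). Both the format and the target
    instructions are assumptions made for this illustration.
    """
    def compile_expression(source):
        # Split the source text into names, numbers and single characters.
        tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
        output, operators = [], []
        for token in tokens:
            if token in description:
                precedence = description[token][0]
                # Emit pending operators that bind at least as tightly
                # (all operators are treated as left-associative).
                while operators and description[operators[-1]][0] >= precedence:
                    output.append(description[operators.pop()][1])
                operators.append(token)
            else:
                output.append(f"PUSH {token}")   # operand
        while operators:
            output.append(description[operators.pop()][1])
        return output

    return compile_expression


# CC is run once on the description and yields the compiler C ...
compiler = generate_compiler({"+": (1, "ADD"), "*": (2, "MUL")})
# ... and C is then run on source programs.
target = compiler("3 + base * factor")
```

Changing the description (for example adding '-' with a SUB instruction) changes the generated compiler without touching `generate_compiler` itself, which is the point of the construction.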
Fig. 1.3 Compiler compiler (the compiler compiler CC reads a compiler description in the compiler description language CDL and produces a compiler in the compiler implementation language CIL)

This leads to the following definition.

1.2 Definition Compiler compiler
    A compiler compiler is a program which generates a compiler, or major parts thereof, from the complete or partial description of the compiler.

A compiler compiler and the compiler it generates can be represented as in Fig. 1.4.

Fig. 1.4 Compiler compiler and the generated compiler (compiler description in CDL; source program S, compiler C, target program T)

A compiler compiler and its compiler description language are very closely related. For the user of a compiler compiler the compiler description language is actually the only interesting feature because it determines whether the description of the compiler to be generated can be formulated and how conveniently this may be accomplished.

Compiler description languages have two primary tasks: (1) the description of the syntax of the source language of the compiler to be generated and (2) the description of the transformation of the source program into the target program. Because the meaning of the source program is visible in this transformation, the description of the transformation is also known as a semantical description.

There are basically two notations for syntax description: Backus-Naur form (BNF) and Extended Backus-Naur form (EBNF). Both describe the syntax as a grammar in the form of so-called productions. They constitute well-understood formal systems and are based on the theory of formal languages.

The technique of describing semantics is less consolidated. Aside from ad hoc methods, attributed grammars in a wide variety of forms are usually applied here.

The compiler compiler described in this book is named Coco (a not very imaginative abbreviation of 'compiler compiler') and its compiler description language is called Cocol (compiler compiler language). Cocol uses the EBNF of Wirth [1982] for syntax description and a special form of attributed grammars, the so-called L-attributed grammars, for semantical descriptions.

Coco was originally implemented in PLM80 and generated a compiler in Pascal-86. The version described here is written in Modula-2 and generates compilers in Modula-2. Table 1.3 shows the versions of Coco that are available for several popular compilers at the time of writing. They are different in the languages of the generated compilers (Modula-2 or Pascal) and in the machines on which they run.

1.3 Versions of Coco

    Computer            Modula-2                                  Pascal
    Macintosh           Mac-METH                                  Turbo-Pascal
    MS-DOS computers    Logitech V. 3.0, M2-SDS, Taylor-Modula    Turbo-Pascal V. 4.0
    ATARI-ST            TDI-Modula
    IBM/370             Modula/370

1.2 Static compiler structure

Like the translation of a sentence in a natural language Q into another natural language Z, the transformation of a source program into a target program can be roughly divided into two phases. First the sentence in Q must be 'understood', through grammatical analysis. With knowledge of its grammatical structure and the aid of a dictionary it is then possible to construct the sentence in Z with the same meaning. In a similar way, the translation of a program consists of analysis and synthesis.

In the analysis phase the source program is decomposed into its constituent parts. Here one distinguishes:
1. lexical analysis, which transforms the input character stream into 'symbols' such as names, numbers and operators;
2. syntax analysis, which analyzes the grammatical structure of the program;
3. semantic analysis, which analyzes all the properties of the program which are not of a syntactical nature.

Analysis yields:

1. the determination of the correctness of the program;
2. the internal representation of the source program in a form which is particularly well adapted for synthesis (so-called intermediate language);
3. memory tables which are used for further processing of the intermediate language.

Fig. 1.5 Static compiler structure

In the synthesis phase the target program is generated from the program in the intermediate language. Here one distinguishes:

1. optimization, which transforms the program in the intermediate language to improve the target program with respect to certain criteria;
2. code generation, which generates the target program from the optimized intermediate language.

This static, or logical, compiler structure is shown in Fig. 1.5.
The analysis sections are determined by the source language and the intermediate language; the synthesis sections are determined by the intermediate language and the target language. The analysis sections are known as the compiler front end; the synthesis sections are known as the compiler back end. The compiler front end is independent of the target language; the compiler back end is independent of the source language.

Compiler compilers primarily support the analysis phase, and therefore this book only deals with the analysis phase.

Lexical analysis

Lexical analysis preprocesses the source program text in order to simplify the tasks of the later phases. This preprocessing includes the following points:

1. Elimination of meaningless characters. Comments, empty lines and unnecessary spaces are eliminated.
2. Recognition of symbols. One or more characters in sequence which together constitute a symbol are recognized. Symbols are:
   (a) keywords such as IF, WHILE, END, etc.;
   (b) names for constants, types, variables, procedures, etc.;
   (c) literals (numerical constants) such as 3.14;
   (d) strings, usually enclosed in inverted commas, such as 'This is a string';
   (e) compound characters such as ':=', '<=', '..', etc.;
   (f) individual characters such as '(', '+', etc.
3. Arithmetization of symbols. Because numbers can be processed more easily than strings, keywords, names and strings are replaced by numbers, and literals are converted to the internal numerical representation of the machine. This process is known as arithmetization. Names are stored in a name list, strings in a string list, and literals, possibly, in a constant list.
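The three preprocessing steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lexical analyzer described in Section 8.1: the symbol names (ident, number, becomessy and so on), the comment syntax `(* ... *)`, the small operator set and the choice of a 1-based position in the name list as semantic value are all chosen here for the example; strings and a string list are omitted.

```python
import re

# Hypothetical symbol tables for a small Pascal-like language.
KEYWORDS = {"IF", "WHILE", "END"}
COMPOUND = {":=": "becomessy", "<=": "leqsy", "..": "rangesy"}
SINGLE = {"+": "plussy", "*": "timessy", ";": "semicolonsy",
          "(": "lparsy", ")": "rparsy"}

TOKEN = re.compile(r"""
    (?P<space>\s+|\(\*.*?\*\))        # step 1: spaces, empty lines, comments
  | (?P<number>\d+(?:\.\d+)?)         # literals such as 3 or 3.14
  | (?P<name>[A-Za-z][A-Za-z0-9]*)    # keywords and names
  | (?P<compound>:=|<=|\.\.)          # compound characters
  | (?P<single>[+*;()])               # individual characters
""", re.VERBOSE | re.DOTALL)


def lex(source):
    """Return (symbols, names): a list of (kind, semantic value) pairs
    and the name list built while scanning."""
    names, symbols, pos = [], [], 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"illegal character {source[pos]!r} at position {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind == "space":
            continue                                  # step 1: elimination
        if kind == "number":                          # step 3: convert literal
            symbols.append(("number", float(text) if "." in text else int(text)))
        elif kind == "name":
            if text in KEYWORDS:
                symbols.append((text.lower() + "sy", None))
            else:                                     # step 3: name -> index
                if text not in names:
                    names.append(text)
                symbols.append(("ident", names.index(text) + 1))
        elif kind == "compound":
            symbols.append((COMPOUND[text], None))
        else:
            symbols.append((SINGLE[text], None))
    return symbols, names


symbols, names = lex("x := 3 + base * factor;  (* a comment *)")
```

Putting the comment pattern before the single-character pattern makes `(*` start a comment rather than being read as an opening parenthesis; a production scanner would also report unterminated comments, which this sketch does not.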
1.4 Example Lexical analysis
    The source statement

        x := 3 + base * factor;

    contains the names x, base and factor; the numerical value 3, the character combination ':=' and the individual characters '+', '*' and ';'. If ident, becomessy, number, plussy, timessy and semicolonsy are names for the arithmetized symbols, lexical analysis yields the sequence of 8 symbols:

        ident becomessy number plussy ident timessy ident semicolonsy
    Some of these symbols are uniquely determined (e.g. plussy); others such as ident and number refer to a class of symbols and must be made unique by a semantic value (e.g. an index in the name list for names, the converted numerical value for literals). If x, base and factor are stored respectively in places 1, 2 and 3 in the name list, lexical analysis yields the following symbols with their semantic values:

        ident/1 becomessy number/3 plussy ident/2 timessy ident/3 semicolonsy

Lexical analysis is the simplest part of the compiler. However, it does take up a large portion of the compilation time (typically 20 to 40%), which means that efficiency is especially important. A lexical analyzer written in Cocol is described in Section 8.1. But lexical analyzers are not discussed anywhere else in the book and the reader is referred to the literature, for example Gries [1971] or Bauer [1976].

Syntax analysis

Syntax analysis decomposes the source program, which now consists of symbols, into its grammatical parts and represents its structure as a tree (called a syntax tree) or as something equivalent to a tree.

Fig. 1.6 Syntax tree

1.5 Example Syntax tree
    The source statement in Example 1.4 is an assignment. An assignment consists of a variable, the assignment symbol, an expression and a closing semicolon. An expression consists of terms connected by addition operators, and terms consist of factors connected by multiplication operators. This yields the syntax tree in Fig. 1.6.
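How the symbol sequence of Example 1.4 turns into the tree of Example 1.5 can be sketched with a small hand-written recursive-descent parser in Python. This is an illustration only and not the table-driven method developed in Chapter 2; the tuple representation of tree nodes and the function names are assumptions, and only '+' and '*' are handled.

```python
def parse_assignment(symbols):
    """Build a syntax tree (nested tuples) for

        Assignment = ident ':=' Expression ';'
        Expression = Term   { '+' Term }
        Term       = Factor { '*' Factor }
        Factor     = ident | number

    `symbols` is a list of (kind, semantic value) pairs as produced by
    lexical analysis.
    """
    pos = 0

    def peek():
        return symbols[pos][0] if pos < len(symbols) else "eof"

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"{kind} expected, {peek()} found at symbol {pos}")
        value = symbols[pos][1]
        pos += 1
        return value

    def factor():
        if peek() == "ident":
            return ("Variable", expect("ident"))
        return ("Number", expect("number"))

    def term():
        node = factor()
        while peek() == "timessy":
            expect("timessy")
            node = ("*", node, factor())
        return node

    def expression():
        node = term()
        while peek() == "plussy":
            expect("plussy")
            node = ("+", node, term())
        return node

    target = ("Variable", expect("ident"))
    expect("becomessy")
    value = expression()
    expect("semicolonsy")
    if pos != len(symbols):
        raise SyntaxError(f"unexpected {peek()} after the assignment")
    return ("Assignment", target, value)


# The symbols of Example 1.4 with their semantic values.
example = [("ident", 1), ("becomessy", None), ("number", 3), ("plussy", None),
           ("ident", 2), ("timessy", None), ("ident", 3), ("semicolonsy", None)]
tree = parse_assignment(example)
```

Because `term` is called from inside `expression`, the multiplication ends up below the addition in the tree, which mirrors the grammatical structure described in Example 1.5.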
Syntax analysis is much more difficult than lexical analysis. There are, however, methods for syntax analysis which are based on the grammar of the source language. Knowledge of these methods makes syntax analysis a routine task.

Semantic analysis

Semantic analysis examines the properties of the source program which cannot be represented grammatically, in particular:

1. the scope of names;
2. the correspondence between declarations and uses of names;
3. the type compatibility of operands in expressions and statements.

Semantic analysis and syntax analysis can be performed together, in which case the two phases merge; or they can be performed separately, in which case the syntax tree, the result of the syntax analysis, is augmented with semantic information.

1.3 Dynamic compiler structure

Dynamic, or time-dependent, compiler structure must be distinguished from static, or logical, compiler structure. The individual logical divisions (lexical analysis, syntax analysis, semantic analysis, optimization and code generation) can be executed either sequentially or simultaneously, which means interwoven in time. Each part of the compiler which reads the source program or an intermediate program in its entirety is called a pass, and thus compilers are classified as single-pass or multi-pass compilers. Figure 1.7 shows both cases.

For a single-pass compiler the syntax analyzer is the central, controlling program. It calls the lexical analyzer when it requires the next source symbol, and it calls the semantic analyzer when it wishes to pass on a syntactically correct construction. The semantic analyzer generates a section of intermediate code or the corresponding machine code (with or without optimization).

For a multi-pass compiler each section is executed sequentially. The result of each section is an intermediate program which is written onto an external storage device and is read again by the next pass.
Single-pass compilers are generally much faster than multi-pass compilers because they avoid access to external storage devices for reading and writing intermediate programs. Multi-pass compilers, on the other hand, require less storage space because only one part of the compiler need ever reside in main storage at once, and they are logically simpler because the various parts are not intertwined. Some source languages cannot even be compiled by single-pass compilers because they contain grammatical constructs whose translation requires information which becomes available only from parts of the source program that are processed later. This is the case, for example, when a variable can be used before it is declared. The advantages and disadvantages of single-pass and multi-pass compilers can be summarized as in Fig. 1.8.

Fig. 1.7 Single-pass and multi-pass compilers

                               Single-pass    Multi-pass
                               compiler       compiler
    Speed                          +              -
    Memory                         -              +
    Logical complexity             -              +
    Universal applicability        -              +

    + = favorable    - = unfavorable

Fig. 1.8 Properties of single-pass and multi-pass compilers
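The difference in control flow between the two organizations can be made visible with a short Python sketch. It rests on simplifying assumptions: the source language consists only of numbers joined by '+', the target instructions PUSH and ADD are invented, an in-memory list stands in for the external storage device, and a trace list records the order in which the parts of the compiler become active.

```python
def lexical_analyzer(text, trace):
    """Generator: delivers the next symbol only when it is asked for one."""
    for word in text.split():
        trace.append(f"lex {word}")
        yield word


def syntax_analyzer(symbols, trace):
    """Parses  number { '+' number }  and hands each recognized
    construction to a semantic action that emits stack code."""
    code = []
    stream = iter(symbols)

    def emit(instruction):                 # stands in for the semantic analyzer
        trace.append(f"emit {instruction}")
        code.append(instruction)

    def operand():
        word = next(stream, None)
        if word is None or not word.isdigit():
            raise SyntaxError(f"number expected, {word!r} found")
        return f"PUSH {int(word)}"

    emit(operand())
    for operator in stream:
        if operator != "+":
            raise SyntaxError(f"'+' expected, {operator!r} found")
        emit(operand())
        emit("ADD")
    return code


def compile_single_pass(text):
    trace = []
    # The parser pulls symbols one at a time; lexing and emitting interleave.
    code = syntax_analyzer(lexical_analyzer(text, trace), trace)
    return code, trace


def compile_multi_pass(text):
    trace = []
    intermediate = list(lexical_analyzer(text, trace))   # pass 1 runs to completion
    code = syntax_analyzer(intermediate, trace)          # pass 2 reads its result
    return code, trace


single_code, single_trace = compile_single_pass("1 + 2")
multi_code, multi_trace = compile_multi_pass("1 + 2")
```

Both variants produce the same target code; only the traces differ. In the single-pass trace the 'lex' and 'emit' entries alternate, whereas in the multi-pass trace all 'lex' entries precede the first 'emit', at the price of holding the complete intermediate program.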
1.4 The structure of the book

This book consists of nine chapters and six appendices. The first three chapters cover the principles of compiler construction as far as they are required for an understanding of Coco; occasionally rather more than the minimum is presented in order to provide a well-rounded picture. The fourth chapter provides a glimpse into other compiler compilers, and the rest of the chapters present Coco itself, its compiler description language, its implementation and applications. In view of this an outline looks as follows:

    Principles of compiler construction
        1. Introduction and survey
        2. Syntax
        3. Semantics
    Various compiler compilers
        4. Various compiler compilers
    The compiler compiler Coco
        5. The compiler description language Cocol
        6. The compiler compiler Coco
        7. The implementation
        8. Applications
        9. Experiences with Coco

The second chapter starts with those concepts from formal language theory which are necessary for the remainder of the book. Then table-driven LL(1) syntax analysis is covered; this determines the fundamental structure of this compiler compiler, and at the same time is a simple and efficient method for developing the syntactic section of compilers. Most importantly this chapter contains a method for automatic error recovery which is independent of the language to be analyzed.

In the third chapter, the method applied in this compiler compiler for describing the actual translation process, using attributed grammars, is presented. The special case of L-attributed grammars is used here and the translation process is described by attributes, context conditions and semantic actions.

The fourth chapter gives a survey of a few compiler generators described in the literature, and thus also surveys the state of the art.

The fifth chapter is a definition of the compiler description language Cocol.
The sixth chapter describes Coco from the viewpoint of the user: its characteristics, how to use it and what the compilers it generates look like. Along the way it is shown that Coco is also suitable for implementing multi-pass compilers. This chapter, together with the language description of Chapter 5, forms the 'external' description of Coco.

The seventh chapter, the longest, contains the details of the implementation of Coco. This chapter is also intended as a study in program documentation.

The eighth chapter presents three major examples of the use of Coco. The first is a complete description of a lexical analyzer in Cocol. The second illustrates Coco as a software engineering tool and the method of attributed grammars as a software engineering method which encompasses the Jackson method as a special case. The third presents the compiler sections generated for a concrete input grammar.

In conclusion the ninth chapter presents experiences of the authors with Coco.

The Appendices contain the algorithm description language Adele used here, describe Modula-2 inasmuch as it differs from or supersedes Pascal, present a complete listing of Coco in Modula-2 and a description of Coco in Cocol, that is, a self-description of Coco.

Systematic readers should read the book chapter by chapter. Readers who wish to begin with lexical analysis should consult Section 8.1 as early as Chapter 2. Readers who wish to know about Coco only (or firstly) from the user's point of view can start immediately with Chapter 5 followed by Chapters 6 and 8, and perhaps Chapter 4. Finally, readers who are already familiar with LL(1)-grammars and are primarily interested in the implementation of Coco can acquaint themselves first with Cocol in Chapter 5 and then concentrate on Chapters 6 and 7, although they will occasionally have to refer back to Chapters 2 and 3.
The following chapter sequences are therefore recommended (Chapters which extend the material are in italics):

  Novices and all-embracing readers:                     2-9
  Primarily interested in applying Coco:                 5, 6, 8, 4
  Primarily interested in comparing Coco:                A088
  Primarily interested in the implementation of Coco:    5, 6, 7, 8.3

Some remarks have been repeated so that the chapters do not become too interdependent. We hope the all-embracing reader will forgive us for this.

In general the presentation is organized according to the principle of stepwise refinement. This is true of the individual chapters as well as for the
book as a whole. Thus Chapters 2 and 3 are basically refinements of Section 1.2, Chapters 5 and 7 are refinements of Chapters 2 and 3, and Appendix F, containing the text of Coco in Modula-2, is a refinement of Chapter 7.

For representing algorithms our algorithm design language Adele is used. It is defined in Appendix A, but should be understandable without a definition as it relies strongly on Modula-2 and Pascal. The authors use Adele constantly in their daily work and view Adele as a method for algorithm description which is adequate in most cases.

Actual Modula-2 programs occur only in the appendices, but there are also Modula-2 fragments in Chapters 5 and 7. The book is therefore understandable for readers who are not familiar with Modula-2. In spite of this, Modula-2 is viewed as of major importance in this book because of the technique of modular programming, and especially because of data encapsulation. One of the book's important aims is to document a large Modula-2 program and to demonstrate in the process how well Modula-2 is suited to software engineering projects.

Definitions, algorithms and examples are numbered and indented. A collection of all numbers is to be found after the table of contents to facilitate fast searching.
2 Syntax

In this chapter we deal with all syntax-related questions as far as they concern compilers that use LL(1) syntax analysis. First, we will summarize the terminology and some important results of formal language theory. Next, we look at LL(1) grammars and their syntactical analysis. Since the flexibility and efficiency of syntax analysis depend to a large degree on the representation of the grammar in memory, we will describe the tree-like data structure used in Coco, which is called a top-down graph. We will also describe an optimized version of the top-down graph, called the G-code, which is especially suited for interpretation. At the end of the chapter we describe the G-code syntax analyzer and a method for automatic error handling.

Except for the G-code and its interpretation this chapter is not Coco specific. Thus, it can be read as a general treatment of syntax issues in compiler design. Bottom-up analysis and LR(k) grammars have been left out, since they constitute a large and self-contained topic that does not apply to Coco. Interested readers are referred to Knuth [1965], Aho and Johnson [1974], Waite and Goos [1984], and Fischer and LeBlanc [1988].

2.1 Basic concepts from formal language theory

We assume that the reader is familiar with the elements of formal languages, and we summarize here only the terms and definitions that we will use later on. We primarily use the terminology from the books of Gries [1971] and Aho and Ullman [1972].
Symbols and strings

Programs consist syntactically of sequences or strings of symbols which belong to an alphabet or vocabulary. If a, b, c are the symbols that constitute the alphabet V, then we can write:

  V = {a, b, c}

Symbols can be concatenated to form strings. For some strings and sets of strings there are commonly used abbreviations:

2.1 Definition  Abbreviations for strings and sets of strings

  a^n  denotes the string consisting of n identical symbols a, e.g. a^3 = aaa.
  ε    denotes the empty string, i.e. a string of zero symbols.
  a+   denotes the set {a^n : n ≥ 1}, e.g. a+ = {a, aa, aaa, aaaa, ...}.
  a*   denotes the set {a^n : n ≥ 0}, e.g. a* = {ε, a, aa, aaa, ...}. It is obvious that a* = a+ ∪ {ε}.
  V+   denotes the set of all non-empty strings which can be formed from the symbols contained in V. For example, if V = {a, b, c} then
         V+ = {a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, ...}
       V+ is called the free semi-group over the alphabet V.
  V*   denotes the set of all strings, including the empty string, that can be formed from the symbols of V. For example, if V = {a, b, c} then
         V* = {ε, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, ...}
       V* is called the free monoid over the alphabet V. It is obvious that V* = V+ ∪ {ε}.

The set V is always finite whereas the sets a+, a*, V+, V* are always infinite.

Grammar and language

In Section 1.2, we showed that the grammatical structure of an instruction, a program, or generally of a 'sentence' is a tree, the syntax tree. In the syntax tree, there are two types of symbols:

1. Terminals are the symbols of the sentence itself. They are the leaves of the tree and cannot be decomposed further.
2. Nonterminals are all other symbols.
In addition to the above, each tree contains a distinguished nonterminal, the sentence symbol, or the root, from which the entire tree originates.

The valid structures of syntax trees and hence the sentences of a formal language are described by a grammar. A context-free grammar (or simply grammar, since we only use context-free grammars) is a system of rules for producing strings over an alphabet V.

2.2 Definition  Grammar

A grammar G is a quadruple G = (V_N, V_T, R, S) with the following components:

  V_N: alphabet of nonterminals,
  V_T: alphabet of terminals,
  R:   set of productions, also called syntax rules or derivation rules,
  S:   sentence symbol, a special symbol from V_N, the root of the syntax tree.

By V = V_N ∪ V_T we denote the union of the terminal and nonterminal alphabets. A production is written as

  X → α    where X ∈ V_N and α ∈ V*

(read: 'X is defined as α' or 'X can be replaced by α'), which means that the nonterminal X can be replaced by the string α in each string that contains X. Several productions may have the same left-hand side, such as:

  X → α1
  X → α2
  X → α3

They denote alternatives and can be grouped by use of the symbol '|':

  X → α1 | α2 | α3

(read: 'X is defined as α1 or α2 or α3').

The productions describe the replacement of nonterminals by strings. We start from the sentence symbol S, and replace it by a string according to the productions of the grammar. Then we repeatedly replace nonterminals in the string by other strings until we reach a string that contains only terminals. S itself and all strings that result from S by the application of the productions are called sentential forms. The sentential forms that consist of terminals only are called sentences.
We denote replacement by the replacement or derivation symbol ⇒. If α and β are two sentential forms and β may be derived from α by the application of a production, we write:

  α ⇒ β

(read: 'α produces β' or 'β is derived from α'). These terms are formalized by Definition 2.3 and are illustrated by Example 2.4.

2.3 Definition  Derivation, sentential form, sentence, language

We say that a string α directly produces a string β, written α ⇒ β, if there exist strings ω1 and ω2 such that we can write

  α = ω1 A ω2,  β = ω1 γ ω2

and the production A → γ belongs to the grammar. We then call β a direct derivation of α.

We describe a sequence of several derivations by the symbols ⇒+ and ⇒*. A string α produces a string β, written as α ⇒+ β, if there exists a sequence of direct derivations

  α = ω0 ⇒ ω1 ⇒ ω2 ⇒ ... ⇒ ωn = β    where n ≥ 1

Such a sequence is called a derivation of length n. For the case of α ⇒+ β or α = β, we write

  α ⇒* β

(read: 'α produces or is equal to β').

If G is a grammar with sentence symbol S, then a string α is called a sentential form if

  S ⇒* α

A sentence is a sentential form that consists only of terminals, and a language L(G) is the set of all sentences that can be derived from the sentence symbol:

  L(G) = {α : S ⇒+ α and α ∈ V_T*}

2.4 Example  Derivation of all sentential forms of a language

Consider the grammar G = ({S, A, B}, {a, b, ;}, R, S) with the nonterminals S, A, B, the terminals a, b, ;, and the set R of productions:
  S → A;
  A → aB | BBb
  B → b | ab

From this, the following derivations of sentential forms can be produced:

  S ⇒ A; ⇒ aB;  ⇒ ab;
                ⇒ aab;
         ⇒ BBb; ⇒ bBb;  ⇒ bbb;
                        ⇒ babb;
                ⇒ abBb; ⇒ abbb;
                        ⇒ ababb;

The result is L(G) = {ab;, aab;, bbb;, babb;, abbb;, ababb;}. Hence, the language L(G) consists of 6 sentences.

A syntax tree can be assigned to each sentence. Figure 2.1 shows the syntax tree of abbb; in two forms.

[Fig. 2.1 Two forms of syntax tree for abbb;]

In the top-down syntax analysis discussed later on, we will always use derivations in which the leftmost nonterminal is replaced. This kind of derivation is called left-canonical:

2.5 Definition  Left-canonical derivation

A direct derivation ω1 A ω2 ⇒ ω1 α ω2 is called left-canonical, and written as

  ω1 A ω2 ⇒L ω1 α ω2

if ω1 ∈ V_T*, that is, if A is the leftmost nonterminal. A derivation is called left-canonical if all its direct derivations are left-canonical.

Sometimes it is useful to have a name for the string that is substituted for a nonterminal during a derivation. This string is called a 'phrase'.
2.6 Definition  Phrase

When ω1 α ω2 is a sentential form such that

  S ⇒* ω1 A ω2 ⇒+ ω1 α ω2,

then α is called a phrase, more specifically an A-phrase.

According to this definition, each sentential form is an S-phrase. Because of their importance in bottom-up syntax analysis, which is not covered in this book, we shall also define the terms simple phrase and handle.

2.7 Definition  Simple phrase, handle

If α is an A-phrase and a direct derivation of A, then

  S ⇒* ω1 A ω2 ⇒ ω1 α ω2

holds and α is also called a simple A-phrase. The leftmost simple phrase in a sentential form is called the handle of the sentential form.

2.8 Example  Phrase, simple phrase, handle

Consider Example 2.4 and the derivation sequence

  S ⇒ A; ⇒ B1B2b3; ⇒ B1b2b3; ⇒ ab1b2b3;

where the different Bs and bs have been distinguished by an index.

In the sentential form ab1b2b3;
  ab1       is a simple B-phrase and the handle,
  b2        is a simple B-phrase,
  ab1b2b3   is an A-phrase,
  ab1b2b3;  is an S-phrase.

In the sentential form B1b2b3;
  b2        is a simple B-phrase and the handle,
  B1b2b3    is an A-phrase,
  B1b2b3;   is an S-phrase.

In the sentential form B1B2b3;
  B1B2b3    is a simple A-phrase and the handle,
  B1B2b3;   is an S-phrase.

In the sentential form A;
  A;        is a simple S-phrase and the handle.
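The derivation of all sentences in Example 2.4 can be reproduced mechanically. The following Python sketch is an illustration only and not part of Coco: it assumes the grammar is stored as a dictionary from nonterminal to alternatives, with one character per symbol (the names PRODUCTIONS and sentences are hypothetical). It always replaces the leftmost nonterminal, so every derivation it follows is left-canonical. The loop terminates only because this particular grammar generates a finite language.

```python
from collections import deque

# Grammar of Example 2.4. Keys are the nonterminals; every other
# character (a, b, ;) is a terminal. This encoding is an assumption.
PRODUCTIONS = {
    "S": ["A;"],
    "A": ["aB", "BBb"],
    "B": ["b", "ab"],
}


def sentences(productions, start="S"):
    """Collect all sentences by left-canonical derivation.

    Only terminates for grammars with finitely many sentences.
    """
    found = set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Index of the leftmost nonterminal in the sentential form, if any.
        pos = next((i for i, sym in enumerate(form) if sym in productions), None)
        if pos is None:
            found.add(form)  # terminals only: the form is a sentence
            continue
        for alternative in productions[form[pos]]:
            queue.append(form[:pos] + alternative + form[pos + 1:])
    return found


if __name__ == "__main__":
    print(sorted(sentences(PRODUCTIONS)))
```

The result contains exactly the six sentences listed in Example 2.4.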
Recursive productions produce languages with an infinite number of sentences. The production A → a | Ab produces the set of sentences ab*. The production A → a | bA produces the set of sentences b*a, and the production A → a | (A) produces the set of sentences {(^n a )^n : n ≥ 0}.

Recursion can also appear indirectly, which means it can span several productions, as in the production pair

  A → x | By
  B → z | Au

The following definition is a consequence of this:

2.9 Definition  Recursive grammar

A grammar is called recursive if it permits derivations of the form

  A ⇒+ ω1 A ω2    with A ∈ V_N, ω1 ∈ V*, ω2 ∈ V*.

More specifically, it is called

  left-recursive                     if A ⇒+ A ω,
  right-recursive                    if A ⇒+ ω A,
  central-recursive or self-embedding if A ⇒+ ω1 A ω2.

2.10 Example  Arithmetic expressions

The grammar of arithmetic expressions with the sentence symbol E and the terminals v for variables, and c for constants:

  E → T | +T | -T | E+T | E-T
  T → F | T*F | T/F
  F → v | c | (E)

is left-recursive in E and T, and central-recursive in E, T, and F.

The extended Backus-Naur form (EBNF)

Computer science uses various notations for grammar productions. The previously used notation has the following characteristics:

1. terminals are lower case
2. nonterminals are upper case
3. replacement symbol is →
4. separation of alternatives is denoted by |

Indefinite repetition, which is a frequently occurring language element, must be described by recursive productions, especially left-recursive productions. In many cases this appears unnatural, and it is also unsuited to top-down syntax analysis. Several grammatical notations have therefore evolved that
remove these and other deficiencies. Among these, the notation introduced by Wirth [1982] for the description of Modula-2 is especially notable because of its simplicity and clarity. Its characteristics are:

1. Terminals that represent themselves (literals) are in quotes
2. Other terminals and nonterminals have names that imply their meaning (this is customary but not mandatory)
3. Replacement symbol is =
4. Separation of alternatives is denoted by |
5. Productions are ended explicitly by a period
6. Option symbol: [A] means A | ε
7. Repetition symbol: {A} means ε | A | AA | AAA | ...
8. Parentheses for grouping

The grammar of the arithmetic expressions is as follows:

  expression = ["+" | "-"] term {("+" | "-") term}.
  term       = factor {("*" | "/") factor}.
  factor     = v | c | "(" expression ")".

The form of the EBNF grammar itself can also be described by an EBNF grammar:

  EBNFGrammar = {production}.
  production  = symbol "=" expr ".".
  expr        = term {"|" term}.
  term        = factor {factor}.
  factor      = ident | string | "(" expr ")" | "[" expr "]" | "{" expr "}".

ident is the terminal for names, string is the terminal for a character string enclosed in quotes.

In this book, we will primarily use Wirth's EBNF notation. However, where structural simplicity of the grammar is important, we will still use the older notation of the formal languages.

Reduced grammars

In the grammar of a programming language, each nonterminal and each alternative should contribute to the generation of sentences. If this is the case, the grammar is called reduced. In the development of a grammar, unnecessary nonterminals and alternatives may creep in. Therefore, each newly developed grammar should be checked to determine if it is reduced. If it is not, the unnecessary symbols and productions should be removed.

In order to contribute to the generation of sentences, each nonterminal must meet the following two conditions: It must be 'terminating', that is, it
must produce a terminal string, and it must be 'derivable', that is, it must appear in some sentential form.

2.11 Definition  Terminating symbol, derivable symbol

A nonterminal A is called terminating if it produces a terminal string, that is,

  A ⇒* α    with α ∈ V_T*.

A nonterminal A is called derivable if it appears in a sentential form, that is, if

  S ⇒* ω1 A ω2.

A nonterminal that is not derivable or not terminating contributes nothing to the generation of sentences, and is therefore useless.

2.12 Definition  Useless symbol

A nonterminal A is called useless if there is no derivation

  S ⇒* ω1 A ω2 ⇒* ω1 α ω2    where ω1, ω2, α ∈ V_T*

2.13 Definition  Reduced grammar

A grammar that contains no useless nonterminals is called reduced.

Algorithms for the detection of all useless symbols are simple (see Sections 7.5.2 and 7.5.4, or Hopcroft and Ullman [1979]). If one wants to delete them, the order is important. First, the nonterminating symbols must be found and all alternatives in which they appear must be deleted from the grammar. Then, the nonderivable symbols and the alternatives in which they appear must be found in the new grammar and deleted. Automatic deletion is possible but not recommended since useless symbols often indicate errors in the grammar.

Even after removing useless symbols, the grammar may still contain useless alternatives, which permit derivations of the form A ⇒+ A. Such a derivation is called a circular derivation, and the grammar is called circular or cyclic. Section 7.5.3 contains an algorithm for a circularity check of a grammar. The book by Hopcroft and Ullman [1979] contains an algorithm for the deletion of productions where the right-hand side consists of only a nonterminal, and thus for the removal of cycles.

In the following, we will cover only non-circular reduced grammars.
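The two-step deletion order described above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the algorithm of Sections 7.5.2 and 7.5.4: the grammar is a dictionary from nonterminal to a list of alternatives, each alternative is a list of symbol names, every nonterminal has an entry in the dictionary, and the function names are hypothetical.

```python
def terminating(productions):
    """Return the nonterminals that derive some terminal string."""
    done = set()
    changed = True
    while changed:  # fixed-point iteration
        changed = False
        for lhs, alternatives in productions.items():
            if lhs in done:
                continue
            for alt in alternatives:
                # An alternative terminates if each symbol is a terminal
                # or a nonterminal already known to terminate.
                if all(s in done or s not in productions for s in alt):
                    done.add(lhs)
                    changed = True
                    break
    return done


def derivable(productions, start):
    """Return the nonterminals that appear in some sentential form."""
    reached = {start}
    stack = [start]
    while stack:
        for alt in productions.get(stack.pop(), []):
            for sym in alt:
                if sym in productions and sym not in reached:
                    reached.add(sym)
                    stack.append(sym)
    return reached


def reduce_grammar(productions, start):
    """Step 1: drop nonterminating symbols; step 2: drop nonderivable ones."""
    term = terminating(productions)
    kept = {
        lhs: [alt for alt in alts
              if all(s in term or s not in productions for s in alt)]
        for lhs, alts in productions.items() if lhs in term
    }
    reach = derivable(kept, start) if start in kept else set()
    return {lhs: alts for lhs, alts in kept.items() if lhs in reach}


# B never terminates; once the alternative "A B" is gone, A is no longer
# derivable. Checking derivability first would wrongly keep A.
GRAMMAR = {"S": [["A", "B"], ["a"]], "A": [["a"]], "B": [["B", "b"]]}

if __name__ == "__main__":
    print(reduce_grammar(GRAMMAR, "S"))
```

The small grammar at the end shows why the order matters: A only becomes nonderivable after the alternative containing the nonterminating symbol B has been removed.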
Grammatical levels

Programming languages contain constructs at various levels of a hierarchy. At the very top are programs, which are composed of modules, procedures, declarations and statements. Declarations and statements in turn are composed of expressions, keywords, names and numbers. Names and numbers are composed of characters.

It is somewhat arbitrary which of these constructs are defined as terminals. If one only wants to show the nesting of procedures, then declarations and expressions can be regarded as terminals. If one wants to describe the structure of expressions, then keywords, names, numbers, and operators can be regarded as terminals. Only if one wants to descend further must individual characters be seen as terminals. In this way, the syntax of a programming language need not be completely described by one grammar, but may be partitioned into several grammatical levels. The terminals of the higher level are the nonterminals of the lower level.

In compiler design, usually two levels are used: the syntactical and the lexical level. The syntactical level is the higher of the two; its sentence symbol is the program. Its terminals are keywords, names, numbers, operators, etc. Below this, the nonterminals of the lexical level are keywords, names, numbers, and special symbols. Its terminals are the individual characters of the source text, insofar as they are meaningful for the grammar (comments, end-of-lines, and meaningless empty symbols are not part of the grammar). Figure 2.2 shows this relationship.

[Fig. 2.2 Syntactic and lexical grammatical levels: the syntactic level ranges from program, procedure, statement and expression down to name, number and keyword; the lexical level ranges from name, number and keyword down to the individual character]

In this book, we consider mainly the syntactical level. This results in a difficulty with the notation of terminals. From the syntactical level, the expression art pe
consists of two names v, a number c, and the operators '+' and '*':

  v + v * c

While the terminals '+' and '*' are simultaneously members of the syntactical and the lexical level, the terminal v denotes all names, and the terminal c denotes all numbers. In order to emphasize this fact, we call terminals of the syntactical level that represent an entire class of symbols of the lexical level a terminal class. Thus, in the grammar of arithmetic expressions, v and c are terminal classes, and +, -, *, /, (, ) are individual terminals.

It is to some extent arbitrary which terminals of the lexical level are also considered as terminals of the syntactical level, and which are combined into terminal classes. For instance, the operators *, /, and MOD from the lexical level can be considered at the syntactical level as individual terminals or can be combined at the lexical level into the terminal class mulop by the production

  mulop = "*" | "/" | "MOD".

2.2 LL(1) grammars and syntax analysis

A grammar for a language can be used in two different ways: as a generative grammar for the generation of sentences of the language, and as an analyzing grammar for the decision whether a given string is a sentence of the language.

The generation of sentences is a trivial, straightforward, combinatorial problem, and of no interest in the practical areas of computer science. However, the aspect of the generative grammar is important in theoretical computer science and mathematics. In these fields grammars are classified according to the expressive power and the structural characteristics of the languages they generate.

The analysis, more precisely the recognition of sentences, is, from a mathematical point of view, also a trivial problem. All sentences of the grammar may simply be generated in ascending order by their length, and it is then easily determined if the specified string is among the sentences (search by exhaustion).
In reality, this is not feasible since the number of sentences generally grows exponentially with their length. Analysis methods are needed that make use of all information in the grammar, and that analyze the given string with minimal time and memory requirements. These methods can be separated into two main categories: top-down methods start at the top with the sentence symbol and move downward by repeated derivations, trying to find a sentence which matches the given terminal string; bottom-up
methods start at the given terminal string and move upward by repeated reductions of phrases until the sentence symbol is reached. In addition to these two main approaches, there are analysis methods that mix the top-down and bottom-up approach. In this book, we will cover only top-down analysis.

In top-down analysis, we start from the sentence symbol and repeatedly generate new sentential forms by left-canonical derivations, with the goal of deriving a sentence matching the given string. If this is successful, the string has been parsed. If it is not successful, and we have exhausted all possibilities for the derivation of sentences that match the string, then it is clear that the symbol string is not a sentence of the grammar.

The only difficulty with this approach is the selection of the correct alternative. Generally, there is not enough information available at the time when the selection between several alternatives must be made to be reasonably sure of choosing the correct one. Therefore, usually the alternatives must be tried one after the other until the correct one is found. The alternatives that have been tried unsuccessfully are dead ends from which one has to return by backtracking.

Fortunately, programming languages are structured in such a way that the proper alternative can be determined with certainty by considering only a part of the input string. These grammars are called deterministic. In compiler construction, only deterministic grammars are used, and so we shall cover only the top-down analysis of deterministic grammars.

Deterministic top-down parsing

The concept of deterministic top-down parsing consists in selecting the proper alternative by looking at the start symbols of the string to be analyzed. In this way parsing proceeds from left to right. Consider, for instance, the grammar

  S → aA | bB
  A → x | aB
  B → y | bA

and the input string σ = bbay.
The grammar has the property that all of its alternatives start with terminals, and also that the heads of the alternatives are different in each rule. This property permits the dead-end-free determination of the correct alternative by consulting the string σ. Assuming that the string is read from left to right, the parsing proceeds as follows:

1. In the beginning, when a choice has to be made between S → aA and S → bB, the first symbol of σ is read, b is found, and therefore it is
   known that S → bB must be the correct alternative since S → aA can never lead to a sentence starting with b.

2. If bB is further derived, one has the choice of replacing B with y or with bA. If the next symbol is read, a b is found, and so bA must be the correct alternative.

3. Continuing this procedure, the following derivations are generated:

     S ⇒ bB ⇒ bbA ⇒ bbaB ⇒ bbay

   resulting in the recognition of σ as a sentence.

From the above derivation, the syntax tree of Fig. 2.3 follows.

[Fig. 2.3 Syntax tree]

This is the essence of deterministic top-down parsing: Starting with the sentence symbol, a sequence of left-canonical derivations is built, selecting the correct alternatives by the inspection of the string to be parsed. The string is read from left to right.

More than one symbol of the input string must be considered when several alternatives of a production start with the same symbol. This lookahead is a characteristic of the LL(k) grammar:

2.14 Definition  LL(k) grammar

A grammar is called LL(k) (deterministically recognizable from left to right with left-canonical derivations and a lookahead of k symbols) if its sentences can be parsed by a top-down analysis from left to right in such a way that in each situation where a choice must be made between several alternatives, the correct alternative can always be found by considering the next k symbols of the input string.
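The three parsing steps for σ = bbay can be traced with a small program. The following Python sketch is an illustration only; it is not the G-code analyzer described later in this chapter. It assumes the grammar S → aA | bB, A → x | aB, B → y | bA has been written by hand as a selection table indexed by nonterminal and next input symbol (the names TABLE and parse are hypothetical), and it records the left-canonical derivation as it goes.

```python
# Selection table: for each nonterminal, the alternative chosen for a
# given next input symbol. Every alternative of this grammar starts with
# a different terminal, so one symbol of lookahead is enough.
TABLE = {
    "S": {"a": "aA", "b": "bB"},
    "A": {"x": "x", "a": "aB"},
    "B": {"y": "y", "b": "bA"},
}


def parse(text, start="S"):
    """Return the left-canonical derivation of text, or None on error."""
    stack = [start]      # unprocessed symbols, leftmost symbol on top
    pos = 0              # number of input symbols already matched
    derivation = [start]
    while stack:
        top = stack.pop()
        look = text[pos] if pos < len(text) else None
        if top in TABLE:
            # Nonterminal: select the alternative by the lookahead symbol.
            alternative = TABLE[top].get(look)
            if alternative is None:
                return None
            stack.extend(reversed(alternative))
            # New sentential form: matched input plus remaining stack.
            derivation.append(text[:pos] + "".join(reversed(stack)))
        elif top == look:
            pos += 1     # terminal matches the input symbol
        else:
            return None
    return derivation if pos == len(text) else None


if __name__ == "__main__":
    print(parse("bbay"))
```

No alternative is ever tried and abandoned: each choice is fixed by a single table lookup, which is what makes the analysis free of dead ends.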
Since it is desirable to restrict the lookahead to one symbol, and since it turns out that this restriction allows us to handle most practical cases, we will examine more closely only the LL(1) grammars. The main question is how to determine if a given grammar is LL(1). We will answer this question first for ε-free grammars (i.e. grammars without empty alternatives), and then for grammars that do contain empty alternatives.

LL(1) grammars without empty alternatives

Even a grammar whose alternatives begin with nonterminals may be parsable without running into dead ends. Consider the grammar

  S → Aa | Bb
  A → xz | yB
  B → uz | vA

and the string σ = uzb. Even though none of the alternatives of the production for S starts with u, it is obvious that only B can be derived into a string starting with u, while all derivations of A start with x or y. The symbols x and y are the 'terminal start symbols' of A, and u and v are the terminal start symbols of B.

The concept of a set of terminal start symbols is central for the description of the LL(1) property.

2.15 Definition  Terminal start symbols of a nonterminal

The set first(A) of terminal start symbols of the nonterminal A is defined to be the set of all terminals with which a string derived from A can start:

  first(A) = {x : A ⇒* xω, for x ∈ V_T and ω ∈ V*}

For the production A → ε, first(A) = ∅ (the empty set).

This definition can be expanded in a natural way for a string as argument:

2.16 Definition  Terminal start symbols of a string

The set first(α) of the terminal start symbols of a string α is defined to be the set of all terminals with which α or a string derived from α can start:

  first(α) = {x : α ⇒* xω, for x ∈ V_T and ω ∈ V*}

As a special case we define first(ε) = ∅.

With the concept of terminal start symbols, it is easy to define the conditions under which an ε-free grammar is LL(1):
2.17 LL(1) condition for ε-free grammars

An ε-free grammar is LL(1) if, for each of its productions, the sets of terminal start symbols of its alternatives are pairwise disjoint. That is, for each of its productions

  A → α1 | ... | αn

the following holds:

  first(αi) ∩ first(αj) = ∅    for 1 ≤ i ≠ j ≤ n

2.18 Example  LL(1) conditions

The grammar

  S → A;
  A → aB | BBb
  B → b | ab

is not LL(1) since the following is true for the production A → aB | BBb:

  first(aB) = {a},  first(BBb) = {a, b},

and hence

  first(aB) ∩ first(BBb) = {a}

The sets of terminal start symbols of both alternatives are not disjoint. Both alternatives can start with an a. As a result, if a choice must be made between alternatives, and a is the leftmost symbol of the input string, the correct alternative cannot be found without a lookahead of more than one symbol.

No left-recursive grammar is LL(1) since for a production of the form A → α | Aβ the following is true: first(α) = first(Aβ).

LL(1) grammars with empty alternatives

For a grammar with empty alternatives, the LL(1) condition of the preceding section no longer holds. Consider, for instance, the grammar

  S → aA; | bAc;
  A → c | ε

and the input string σ = bc;. It is obvious that the production for S meets the LL(1) condition 2.17, which is also true for the production for A because

  first(c) = {c},  first(ε) = ∅  and hence  first(c) ∩ first(ε) = ∅
However, the grammar is not LL(1) since after the derivation

  S ⇒ bAc;

it is impossible to determine with a lookahead of only one symbol whether A → c or A → ε must be used for the next derivation. The choice of A → c:

  S ⇒ bAc; ⇒ bcc;

does not lead to σ. The choice of A → ε is the correct one. Therefore, the grammar is not LL(1).

If we must choose one of the alternatives of a production

  A → α1 | α2 | ... | αn

and only the next symbol of σ can be used, then the terminal start symbols of α1 to αn and the terminal successors of A must be pairwise disjoint, since in the case of the production A → ε, the terminal following A is the next one in σ. The set of terminal successors is defined as follows:

2.19 Definition  Terminal successors

The set follow(A) of the terminal successors of a nonterminal A is the set of all terminals that can follow A in any sentential form:

  follow(A) = {x : S ⇒* ω1 A x ω2, for A ∈ V_N, x ∈ V_T, ω1, ω2 ∈ V*}

This definition makes it possible to determine the conditions under which an arbitrary grammar is LL(1):

2.20 LL(1) conditions for arbitrary grammars

A grammar is LL(1) if (1) for each of its productions, the sets of all terminal start symbols of all alternatives are pairwise disjoint, and (2) for the nonterminals which can be derived into the empty string, all terminal successors of the nonterminal are disjoint from the terminal start symbols of each alternative. Formally: for each production

  A → α1 | ... | αn

the following must hold:

  first(αi follow(A)) ∩ first(αj follow(A)) = ∅    for 1 ≤ i ≠ j ≤ n

Note that in the formal representation the cases αi ⇒* ε and αi ⇏* ε are combined. For αi ⇏* ε we have first(αi follow(A)) = first(αi); for αi ⇒* ε we have first(αi follow(A)) = first(αi) ∪ follow(A), which is simply follow(A) when αi = ε.
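Condition 2.20 can be checked mechanically once the start and successor sets are known. The Python sketch below is a plain fixed-point computation offered as an illustration; it is not the algorithm used in Coco. It assumes single-character symbols, a grammar stored as a dictionary whose keys are the nonterminals, and the empty string "" as the empty alternative; all function names are hypothetical. No end-of-input marker is added, in keeping with Definition 2.19.

```python
def nullable_set(g):
    """Nonterminals that can be derived into the empty string."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for a, alts in g.items():
            if a not in nullable and any(
                    all(s in nullable for s in alt) for alt in alts):
                nullable.add(a)
                changed = True
    return nullable


def first_of(alpha, first, nullable):
    """first(alpha) for a string alpha, given first sets of nonterminals."""
    out = set()
    for s in alpha:
        out |= first.get(s, {s})      # a terminal starts with itself
        if s not in nullable:
            break
    return out


def first_sets(g, nullable):
    first, changed = {a: set() for a in g}, True
    while changed:
        changed = False
        for a, alts in g.items():
            for alt in alts:
                new = first_of(alt, first, nullable)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def follow_sets(g, first, nullable):
    follow, changed = {a: set() for a in g}, True
    while changed:
        changed = False
        for a, alts in g.items():
            for alt in alts:
                for i, s in enumerate(alt):
                    if s not in g:
                        continue
                    rest = alt[i + 1:]
                    new = first_of(rest, first, nullable)
                    if all(x in nullable for x in rest):
                        new |= follow[a]   # s can end an A-phrase
                    if not new <= follow[s]:
                        follow[s] |= new
                        changed = True
    return follow


def is_ll1(g):
    """Check condition 2.20 for every production of g."""
    nullable = nullable_set(g)
    first = first_sets(g, nullable)
    follow = follow_sets(g, first, nullable)
    for a, alts in g.items():
        seen = set()
        for alt in alts:
            look = first_of(alt, first, nullable)
            if all(s in nullable for s in alt):
                look |= follow[a]
            if look & seen:
                return False
            seen |= look
    return True


if __name__ == "__main__":
    print(is_ll1({"S": ["aA;", "bAc;"], "A": ["c", ""]}))
```

For the grammar S → aA; | bAc;, A → c | ε discussed above, the terminal c lies both in first(c) and in follow(A), so the check fails, in agreement with the argument in the text.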
2.21 Example  LL(1) conditions

Consider the grammar of Knuth [1965]:

  S → E;
  E → aAbE | bBaE | ε
  A → aAbA | ε
  B → bBaB | ε

Is it LL(1)? Since ε appears in the productions for E, A, and B, the terminal successors of E, A, and B are needed. From the grammar it can be easily seen that

  follow(E) = {;}
  follow(A) = {b}
  follow(B) = {a}

The lookahead sets are:

for the alternatives of the E production
  first(aAbE follow(E)) = {a}
  first(bBaE follow(E)) = {b}
  first(ε follow(E))    = {;}

for the alternatives of the A production
  first(aAbA follow(A)) = {a}
  first(ε follow(A))    = {b}

for the alternatives of the B production
  first(bBaB follow(B)) = {b}
  first(ε follow(B))    = {a}

Since the lookahead sets are pairwise disjoint for the alternatives of each production, the grammar is LL(1).

The calculation of the successor sets is not always easy, as we can see in the following example of an if statement having a dangling else clause.

2.22 Example  Dangling else

Consider the grammar

  1. program     → statement programrest
  2. programrest → program | end
  3. statement   → assignment | ifstatement
  4. assignment  → v := expr
  5. ifstatement → if thenpart elsepart
  6. thenpart    → expr then statement
  7. elsepart    → else statement | ε

with the sentence symbol program and the terminals end, v, :=, expr, if, then, else.
Is it LL(1)? There are three productions with alternatives: programrest, statement, elsepart. The first two are LL(1) since

  first(program)    = {v, if},  first(end)         = {end}
  first(assignment) = {v},      first(ifstatement) = {if}

The calculation of follow(elsepart) is longer:

  follow(elsepart)    = follow(ifstatement)     by production 5
  follow(ifstatement) = follow(statement)       by production 3
  follow(statement)   = first(programrest)      by production 1
                        ∪ follow(thenpart)      by production 6
                        ∪ follow(elsepart)      by production 7

with the result:

  follow(elsepart) = first(programrest) ∪ follow(thenpart) ∪ follow(elsepart).

Since the last term on the right-hand side agrees with the left-hand side, it adds nothing to the set. In addition, since

  first(programrest) = first(program) ∪ first(end)
                     = {v, if} ∪ {end}
                     = {v, if, end}

we have

  follow(elsepart) = {v, if, end} ∪ follow(thenpart).

Additionally,

  follow(thenpart)    = first(elsepart) ∪ follow(ifstatement)    by production 5
  first(elsepart)     = {else}                                   by production 7
  follow(ifstatement) = follow(statement)                        by production 3

hence

  follow(elsepart) = {v, if, end} ∪ {else}
                   = {v, if, end, else}

Checking the LL(1) condition for production 7 results in:

  first(else statement) ∩ follow(elsepart) = {else} ≠ ∅.

The grammar is therefore not LL(1).

The fact that the grammar in this example is not LL(1) does not preclude it from being deterministically parsable with a lookahead of one symbol. The syntax analyzer can always choose the first alternative when it sees the production elsepart and else is the next input symbol. In spite of the ambiguity of the statement
    if a then if b then c else d

the first else always belongs to the innermost then (as in PL/I and Pascal).

LL(1) grammars and grammars of programming languages

The LL(1) conditions severely restrict the class of grammars that can be analyzed deterministically. Almost all programming language grammars violate the LL(1) conditions. Two facts are especially disturbing:

1. Left-recursive productions are not LL(1).
2. Alternatives that start with the same string are not LL(1).

However, it is almost always possible to transform non-LL(1) constructs into LL(1) constructs. This is greatly aided by the use of EBNF notation. With it, left-recursive productions can be described by use of the repetition symbol {}, and common beginnings of alternatives can be extracted by factorization.

We have defined the LL(1) conditions only for grammars with simple BNF productions. So the question arises how they look when an EBNF grammar is used. We will defer the answer until the end of Section 2.3.

Computation of start and successor sets

For small grammars, the calculation of start and successor sets to check for the LL(1) property can be done by careful visual inspection. However, an algorithm is needed for larger grammars. Since derivations of the form A ⇒+ ε play an important role, we will first introduce the concept of 'deletability'.

2.23 Definition  Deletability

A nonterminal A is called deletable if it produces the empty string: A ⇒* ε.

In this section we will write deletable symbols in square brackets: [A].

An algorithm for marking deletable symbols in a grammar is trivial. It is based on the following assertions:

1. If A → ε is a production, then A is deletable.
2. If A → X1 ... Xn is a production and all Xi are deletable, then A is also deletable.
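The two assertions above can be tried out with a short Python sketch. It is only an illustration, not the book's notation: the function name `mark_deletable` and the encoding of a grammar as a dictionary from nonterminals to lists of alternatives (an empty list standing for ε) are assumptions made for this example.

```python
def mark_deletable(productions):
    """Return the set of deletable nonterminals.

    productions maps each nonterminal to a list of alternatives; each
    alternative is a list of symbols, and the empty list stands for ε.
    (This encoding is an assumption of the sketch.)
    """
    # Assertion 1: A → ε is a production.
    deletable = {a for a, alts in productions.items()
                 if any(len(alt) == 0 for alt in alts)}
    # Assertion 2: repeat until no new symbol is marked.
    changed = True
    while changed:
        changed = False
        for a, alts in productions.items():
            if a in deletable:
                continue
            # Terminals are never in `deletable`, so an alternative that
            # contains a terminal can never satisfy this test.
            if any(all(x in deletable for x in alt) for alt in alts):
                deletable.add(a)
                changed = True
    return deletable


# The grammar of Example 2.21.
KNUTH = {
    "S": [["E", ";"]],
    "E": [["a", "A", "b", "E"], ["b", "B", "a", "E"], []],
    "A": [["a", "A", "b", "A"], []],
    "B": [["b", "B", "a", "B"], []],
}
print(sorted(mark_deletable(KNUTH)))
```

For the grammar of Example 2.21 only E, A and B are marked; S is not, because its single alternative contains the terminal ";".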
2.24 Algorithm  Marking deletable symbols

    MarkDeletableSymbols:
      Mark all nonterminals A for which A → ε exists;
      repeat  -- Assert: All marked symbols are deletable
        Mark all nonterminals A for which A → X1...Xn
          and X1...Xn are all marked nonterminals
      until No new symbol was marked
    end MarkDeletableSymbols

Sets of terminal start symbols. Three cases must be distinguished for the calculation of the terminal start symbols of a string α:

1. the string is deletable;
2. its first nondeletable symbol is a terminal y;
3. its first nondeletable symbol is a nonterminal Y.

From this, computation rules (1) through (3) follow:

1. for α = [X1] ... [Xt],       first(α) = first(X1) ∪ ... ∪ first(Xt)
2. for α = [X1] ... [Xt] y ω,   first(α) = first(X1) ∪ ... ∪ first(Xt) ∪ {y}
3. for α = [X1] ... [Xt] Y ω,   first(α) = first(X1) ∪ ... ∪ first(Xt) ∪ first(Y)

The set of start symbols of a nonterminal is the union of the sets of start symbols of its alternatives:

4. for A → α1 | ... | αm,       first(A) = first(α1) ∪ ... ∪ first(αm)

From these computation rules, the following algorithm is derived.

2.25 Algorithm  Calculation of the sets of terminal start symbols

    FindFirstSets(↓G ↑first):
      param G: A grammar with marked deletable symbols and n nonterminals;
            first: array(1:n) of set of terminal;
    begin
      first(1:n) := ∅;  -- start with empty sets
      repeat
        for all productions A → α1 | ... | αm do
          for all alternatives αi = [X1]...[Xt] Y ω with t>=0, Y ω ∈ V* do
            first(A) := first(A) + first(X1) + ... + first(Xt);
            case of
              Y is terminal:     first(A) := first(A) + {Y}
            | Y is nonterminal:  first(A) := first(A) + first(Y)
            | Y ω is absent:     -- nothing
            end
          end
        end
      until No change in first
    end FindFirstSets

Terminal successor sets. When calculating the terminal successors of a nonterminal A there are also three cases which must be distinguished: in the right-hand side of a production in which A appears, either a terminal y, a nondeletable nonterminal Y, or nothing follows after any deletable symbols. From this, the computation rules (5) through (7) follow:

5. for B → ω1 A [X1] ... [Xt],        (first(X1) ∪ ... ∪ first(Xt) ∪ follow(B)) is in follow(A)
6. for B → ω1 A [X1] ... [Xt] y ω2,   (first(X1) ∪ ... ∪ first(Xt) ∪ {y}) is in follow(A)
7. for B → ω1 A [X1] ... [Xt] Y ω2,   (first(X1) ∪ ... ∪ first(Xt) ∪ first(Y)) is in follow(A)

If all occurrences of A on the right-hand sides of the productions are considered, the total set follow(A) is the union of all partial sets of follow(A) that result from (5) through (7). Therefore we have the following algorithm.

2.26 Algorithm  Calculation of successor sets

    FindFollowSets(↓G ↓first ↑follow):
      param G: A grammar with marked deletable symbols and n nonterminals;
            first: array(1:n) of set of terminal;
            follow: array(1:n) of set of terminal;
    begin
      follow(1:n) := ∅;  -- start with empty follow sets
      repeat
        for all nonterminals A do
          for all productions B → ω1 A [X1]...[Xt] Y ω2 with t>=0 and Y ω2 ∈ V* do
            follow(A) := follow(A) + first(X1) + ... + first(Xt);
            case of
              Y is terminal:     follow(A) := follow(A) + {Y}
            | Y is nonterminal:  follow(A) := follow(A) + first(Y)
            | Y ω2 is absent:    follow(A) := follow(A) + follow(B)
            end
          end
        end
      until No change in follow
    end FindFollowSets

The implementation of the algorithms depends strongly on the data structure of the grammar. The execution time depends on the order in which the productions are visited. Many optimizations are possible.

Principles of syntax analysis of LL(1) grammars

The principle of deterministic syntax analysis of LL(1) grammars can be described abstractly under the following assumptions.

1. The grammar is given in 'matrix form': It has imax productions of the form

       A_i → α_i1 | ... | α_ijmax(i)    where 1 ≤ i ≤ imax

   The sentence symbol is A_1. An alternative α_ij is given by kmax components of the form

       α_ij = X_ij1 ... X_ijkmax(i,j)

   α_ij = ε means kmax(i,j) = 1 and X_ij1 = ε. The representation is matrix-like: index i describes the production, index j describes the alternative, and index k describes the component.

As programmers, we understand this representation as an abstract data structure with the access functions:

    X(↓i↓j↓k): symbol
        returns the value of symbol X_ijk.
    Kind(↓i↓j↓k): Symkind
        returns the kind of the symbol X_ijk, where Symkind = (terminal, nonterminal, epsilon).
    Rule(↓i↓j↓k): integer
        If X_ijk is the nonterminal A_i', then this function returns the index i':
        Rule(↓i↓j↓k) = i' ⇔ X_ijk = A_i'
    Kmax(↓i↓j): integer
        returns the number of components of alternative j in the production i.
    Match(↓x↓i): boolean
        returns true if a phrase of the nonterminal A_i can start with terminal x, or, if A_i ⇒* ε, x can follow the phrase of A_i:
        Match(↓x↓i) ⇔ x ∈ first(A_i follow(A_i))
    Alternative(↓x↓i): integer
        returns the index j of the alternative of the production i which can begin with the terminal x:
        Alternative(↓x↓i) = j ⇒ x ∈ first(α_ij follow(A_i))

3. The string to be parsed consists of pmax symbols s_p:

       ω = s_1 ... s_pmax    with pmax ≥ 1

The description is basic and abstract since we ignore (1) the concrete data structures of the stored grammar, (2) the implementation of the access functions, and (3) the fact that the input string is actually supplied by a lexical analyzer.

We will now give a recursive and a nonrecursive parsing algorithm. The recursive algorithm uses an internal recursive procedure Parse. Its operation should be clear from the following specifications and from the text of the algorithm without additional explanations.

Initial state: The input string, up to the symbol s_p-1, is recognized as a legal beginning of a sentence. The A_i-phrase starts with s_p.

Function: Parse(↓i↑correct) tries to parse the A_i-phrase.

Final state: If correct = true, an A_i-phrase is parsed and p is advanced such that s_p is the first input symbol that is no longer part of the A_i-phrase. If correct = false, an A_i-phrase was not parsed.

2.27 Algorithm  LL(1) analysis (recursive)

    ParseRecursive(↑correct):
      param  correct: boolean;            -- the string is successfully parsed
      global grammar in matrix form;
             pmax: integer;
             s: array(1:pmax) of symbol;  -- input string
      local  p: integer;                  -- index of current input symbol

      Parse(↓i↑correct):
        param i: integer;
              correct: boolean;           -- an A_i phrase is parsed
        local j,k: integer;
      begin  -- try to parse an A_i phrase
        -- position 1 --
        if Match(↓s_p↓i) then                -- parsing of A_i possible
          j := Alternative(↓s_p↓i); k := 1;  -- parse α_ij
          loop
            -- position 2 --
            case Kind(↓i↓j↓k) of             -- parse X_ijk
              terminal:
                if p>pmax or s_p<>X(↓i↓j↓k) then correct:=false; exit end;
                p:=p+1                       -- read next input symbol
            | nonterminal:
                Parse(↓Rule(↓i↓j↓k) ↑correct);
                if not correct then exit end
            | epsilon:                       -- do nothing
            end;
            if k<Kmax(↓i↓j) then k:=k+1 else correct:=true; exit end
          end
        else correct:=false                  -- parsing of A_i impossible
        end
        -- position 3 --
      end Parse;

    begin  -- pmax and s are assumed to have values
      p:=1;
      Parse(↓1↑correct);
      correct := correct & p=pmax+1
    end ParseRecursive

We will show the behavior of the above algorithm in Example 2.28 below, where we take a snapshot of the states of the algorithm at 'position 1', 'position 2', and 'position 3'.

2.28 Example  Recursive LL(1) parsing

Consider Knuth's ε-containing grammar from Example 2.21. Let us give its components the indices i, j, and k, and extend it by the component eof so that it will not produce empty sentences:

    S1 → E111 eof112
    E2 → a211 A212 b213 E214 | b221 B222 a223 E224 | ε231
    A3 → a311 A312 b313 A314 | ε321
    B4 → b411 B412 a413 B414 | ε421

The input string shall be

    a1 b2 b3 a4 eof5

All steps performed by the algorithm can be traced in full detail by the snapshots of Fig. 2.4.
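Algorithm 2.27 can be tried on this grammar and input with a small Python sketch. It is an illustration under stated assumptions, not the book's implementation: the dictionaries `RULES`, `INDEX` and `LOOKAHEAD` are hypothetical stand-ins for the access functions, the lookahead table is written out by hand from the sets of Example 2.21 (with eof in place of the final terminal), and indices into the input are 0-based.

```python
EPS = "eps"

# RULES[i][j] lists the components of alternative j of production i.
RULES = {
    1: {1: ["E", "eof"]},
    2: {1: ["a", "A", "b", "E"], 2: ["b", "B", "a", "E"], 3: [EPS]},
    3: {1: ["a", "A", "b", "A"], 2: [EPS]},
    4: {1: ["b", "B", "a", "B"], 2: [EPS]},
}
INDEX = {"S": 1, "E": 2, "A": 3, "B": 4}   # plays the role of Rule

# LOOKAHEAD[i][x] = j combines Match (x is a key) and Alternative (the value).
LOOKAHEAD = {
    1: {"a": 1, "b": 1, "eof": 1},
    2: {"a": 1, "b": 2, "eof": 3},
    3: {"a": 1, "b": 2},
    4: {"b": 1, "a": 2},
}


def parse_recursive(s):
    """Return True if the list of terminals s is a sentence."""
    p = 0  # index of the current input symbol

    def parse(i):
        nonlocal p
        # Position 1: select the alternative with one symbol of lookahead.
        # The bounds check is an addition of this sketch.
        if p >= len(s) or s[p] not in LOOKAHEAD[i]:
            return False
        j = LOOKAHEAD[i][s[p]]
        for x in RULES[i][j]:          # position 2: parse each component
            if x == EPS:
                continue
            if x in INDEX:             # nonterminal: recursive call
                if not parse(INDEX[x]):
                    return False
            else:                      # terminal: compare and read on
                if p >= len(s) or s[p] != x:
                    return False
                p += 1
        return True                    # position 3

    return parse(1) and p == len(s)


print(parse_recursive("a b b a eof".split()))
```

The final test `p == len(s)` corresponds to `p = pmax+1` in the algorithm; it rejects inputs that continue after a complete sentence.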
[Fig. 2.4: table of snapshots, one column group per recursion depth, with the columns Position, p, s_p, ijk and X_ijk; the run ends with correct = true]

Fig. 2.4 Snapshot of the LL(1) parsing of Algorithm 2.27 applied to the grammar of Example 2.28

The nonrecursive algorithm uses a stack for the intermediate storage of the indices of all nonterminals that are currently being processed. The access functions of the stack are InitStack, Push(↓i↓j↓k) and Pop(↑i↑j↑k). The algorithm can be in three states: findalternative, try, forward. These are characterized by the assertions which hold in each one respectively:

findalternative: The input string is already recognized up to the symbol s_p-1 as a legal beginning of a sentence. s_p is recognized and it is expected that an A_i-phrase, starting with s_p, will follow. The index j of the matching alternative of the A_i-production will be found.

try: The grammar symbol X_ijk will be parsed.

forward: X_ijk has been successfully parsed, so move to its successor.

For the stack, the following assertion holds in all three states. If i = 1, the stack is empty. If i > 1, then A_i is at the top of the stack.
2.29 Algorithm  LL(1) parsing (nonrecursive)

    ParseNonRecursive(↑correct):
      param  correct: boolean;            -- the string is successfully parsed
      global grammar in matrix form;
             s: array(1:pmax) of symbol;  -- input string
             pmax: integer;
      type   State = (findalternative, try, forward);
      local  i,j,k,p: integer;
             st: State;
    begin  -- pmax and s are assumed to have values
      i:=1; p:=1; stack:=empty;           -- start with first rule
      st:=findalternative;
      loop
        case st of
          findalternative:
            -- position 1 --
            if Match(↓s_p↓i)
            then j:=Alternative(↓s_p↓i); k:=1; st:=try   -- X_ijk is first component
            else correct:=false; exit                     -- s_p does not match
            end
        | try:                            -- parse X_ijk
            -- position 2 --
            case Kind(↓i↓j↓k) of
              terminal:
                if p>pmax or X(↓i↓j↓k)<>s_p then correct:=false; exit end;
                p:=p+1; st:=forward       -- X_ijk is parsed
            | nonterminal:
                Push(↓i↓j↓k); i:=Rule(↓i↓j↓k); st:=findalternative
            | epsilon:
                st:=forward
            end  -- case Kind
        | forward:                        -- advance to next component
            -- position 3 --
            if k<Kmax(↓i↓j)
            then k:=k+1; st:=try          -- no end of alternative
            else                          -- end of alternative
              if i>1
              then Pop(↑i↑j↑k)            -- nonterminal X_ijk is parsed
              else correct:=p=pmax+1; exit  -- rule 1 is parsed
              end
            end
        end  -- case st
      end  -- loop
      -- position 4 --
    end ParseNonRecursive

The behavior of the nonrecursive algorithm is shown in Example 2.30.
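The three states of Algorithm 2.29 can be sketched in Python as follows, again for the indexed grammar of Example 2.28. The tables `RULES`, `INDEX` and `LOOKAHEAD` are hypothetical stand-ins for the access functions, the lookahead table is filled in by hand, and indices are 0-based. As a deliberate deviation from the algorithm, the end of the parse is detected by an empty stack rather than by the test i > 1.

```python
EPS = "eps"
RULES = {
    1: {1: ["E", "eof"]},
    2: {1: ["a", "A", "b", "E"], 2: ["b", "B", "a", "E"], 3: [EPS]},
    3: {1: ["a", "A", "b", "A"], 2: [EPS]},
    4: {1: ["b", "B", "a", "B"], 2: [EPS]},
}
INDEX = {"S": 1, "E": 2, "A": 3, "B": 4}
LOOKAHEAD = {
    1: {"a": 1, "b": 1, "eof": 1},
    2: {"a": 1, "b": 2, "eof": 3},
    3: {"a": 1, "b": 2},
    4: {"b": 1, "a": 2},
}


def parse_nonrecursive(s):
    """Return True if the list of terminals s is a sentence."""
    i, j, k, p = 1, 0, 0, 0
    stack = []                       # saved (i, j, k) of unfinished rules
    state = "findalternative"
    while True:
        if state == "findalternative":
            if p < len(s) and s[p] in LOOKAHEAD[i]:
                j, k, state = LOOKAHEAD[i][s[p]], 0, "try"
            else:
                return False         # s[p] does not match
        elif state == "try":
            x = RULES[i][j][k]
            if x == EPS:
                state = "forward"
            elif x in INDEX:         # nonterminal: remember where we are
                stack.append((i, j, k))
                i, state = INDEX[x], "findalternative"
            else:                    # terminal
                if p >= len(s) or s[p] != x:
                    return False
                p, state = p + 1, "forward"
        else:                        # forward
            if k + 1 < len(RULES[i][j]):
                k, state = k + 1, "try"
            elif stack:              # end of alternative: resume the caller
                i, j, k = stack.pop()
            else:                    # rule 1 is parsed
                return p == len(s)


print(parse_nonrecursive("a b b a eof".split()))
```

After a pop the state stays forward, so the loop immediately advances past the nonterminal that has just been completed.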
2.30 Example  Nonrecursive LL(1) parsing

We consider the same grammar and input stream as in Example 2.28 with snapshots at positions 1 to 4. The algorithm executes as in Fig. 2.5.

[Fig. 2.5: table of snapshots with the columns Position, p, s_p and Stack (end of stack at the left); the stack grows to 224, 214, 111 and the run ends with correct = true]

Fig. 2.5 Snapshots of the nonrecursive LL(1) algorithm 2.29, applied to the grammar of Example 2.28

The recursive algorithm is statically shorter and more elegant. The nonrecursive algorithm is more suited for the inclusion of error handling since the explicitly stacked symbols are accessible (see Section 2.6). Both scan the input string strictly from left to right (p is never decremented). In addition, there exists a grammar-specific upper limit c such that after
a maximum of c loops, a new input symbol is read. Hence, the algorithm has a linear execution time with respect to the length of the input string. It has the time complexity O(pmax).

LL(k) grammars for k > 1

A lookahead of more than one symbol is rarely used in compilers. We shall therefore treat LL(k) grammars for k > 1 only briefly, for the sake of completeness. First, we define the set of terminal start symbols of length k of a string α:

2.31 Definition  Terminal start symbols of length k

    first_k(α) = {β : α ⇒* βω with β ∈ V_T*, |β| = k, ω ∈ V*}
               ∪ {β : α ⇒* β with β ∈ V_T*, |β| < k}

If the terminal string which can be derived from α is shorter than k, then the elements of first_k(α) are also shorter than k.

We will now give a formal definition of the LL(k) grammars according to Rosenkrantz and Stearns [1970]:

2.32 Definition  LL(k) grammar

A grammar is called LL(k) if for all left-canonical derivations of the form

    S ⇒* αAω ⇒ αβω
    S ⇒* αAω ⇒ αγω

where first_k(βω) = first_k(γω), it is implied that β = γ.

This means that in an LL(k) grammar no two sentential forms with the leftmost nonterminal A and the alternatives A → β and A → γ can exist in which the left-canonical derivations of the remaining strings beginning with β and γ agree in the first k symbols. From this, we get the following condition:

2.33 LL(k) condition

A grammar is LL(k) if for each pair of rules A → β and A → γ and each left-canonically derived sentential form that contains A:
    S ⇒* αAω

the following condition holds:

    first_k(βω) ∩ first_k(γω) = ∅

2.34 Example  LL(2) and LL(3) test

Again, consider the grammar

    S → A ;
    A → aB | BBb
    B → b | ab

in order to see if it is LL(k) and determine the value of k. The only pair of rules that creates a problem is:

    A → aB
    A → BBb

and the only sentential form in which its left-hand side A appears is A;.

k=1: the LL(1) test produces:

    L1(aB;)  = {a}
    L1(BBb;) = {a, b}

Since a belongs to both sets, the grammar is not LL(1).

k=2: the LL(2) test produces:

    L2(aB;)  = {aa, ab}
    L2(BBb;) = {bb, ba, ab}

Since the element ab belongs to both sets, the grammar is not LL(2).

k=3: the LL(3) test produces:

    L3(aB;)  = {ab;, aab}
    L3(BBb;) = {bbb, bab, abb, aba}

Since both sets are disjoint, the grammar is LL(3).

Algorithms for the computation of the sets first_k(α) and for checking the LL(k) conditions for k > 1 can be found in Aho and Ullman [1972].

No left-recursive grammar is LL(k) for any k. Another simple grammar that is not LL(k) for any k is:

    S → A ;
    A → a | aAa

It has the language {a^(2n+1) ;}. If there were a value of k such that

    first_k(a a^n ;) ∩ first_k(aAa a^n ;) = ∅

then k > n+1 would be true. However, since n can become arbitrarily large, there is no such k.
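The sets of Example 2.34 can be checked mechanically. The following Python sketch is not one of the algorithms of Aho and Ullman; it simply expands leftmost nonterminals until k terminals have been produced. The names `first_k` and `smallest_k` and the string encoding (upper-case letters are nonterminals) are assumptions of this example, and the expansion terminates only for grammars that are not left-recursive and whose nonterminals all derive terminal strings.

```python
GRAMMAR = {"A": ["aB", "BBb"], "B": ["b", "ab"]}  # grammar of Example 2.34


def first_k(form, k, grammar=GRAMMAR):
    """Return the terminal start strings of length k (or shorter) of form."""
    result = set()
    work = [("", form)]  # (terminal prefix so far, rest of sentential form)
    while work:
        prefix, rest = work.pop()
        if len(prefix) == k or rest == "":
            result.add(prefix)
        elif rest[0] in grammar:       # expand the leftmost nonterminal
            for alt in grammar[rest[0]]:
                work.append((prefix, alt + rest[1:]))
        else:                          # move a terminal into the prefix
            work.append((prefix + rest[0], rest[1:]))
    return result


def smallest_k(beta, gamma, omega, limit=5):
    """Smallest k <= limit with disjoint first_k sets, or None."""
    for k in range(1, limit + 1):
        if not (first_k(beta + omega, k) & first_k(gamma + omega, k)):
            return k
    return None


for k in (1, 2, 3):
    print(k, sorted(first_k("aB;", k)), sorted(first_k("BBb;", k)))
print(smallest_k("aB", "BBb", ";"))
```

Under these assumptions the loop reproduces the sets L1, L2 and L3 of the example, and `smallest_k` reports 3 for the pair A → aB, A → BBb in the sentential form A;.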
Rosenkrantz and Stearns [1970] proved the following interesting statements about LL(k) grammars:

(1) It is undecidable whether a given arbitrary grammar is LL(k) for an unknown value of k.
(2) It is decidable whether a given arbitrary grammar is LL(k) for a given fixed value of k.
(3) If a grammar G is not LL(k) for a given k, it cannot be determined if there is an equivalent LL(k) grammar for G.
(4) For each LL(k) grammar G that contains ε, there is an ε-free LL(k+1) grammar that produces the same language as G, but without the empty string.

2.3 The top-down graph

In a table-driven syntax analysis, the grammar of the source language must be stored in main memory so that the analysis algorithm can access it freely. The three-dimensional abstract data structure consisting of rules, alternatives, and components, used in Section 2.2 for the representation of the principal algorithms, is not suited as a concrete data structure. It does not make efficient use of memory, and the grammar cannot be represented in EBNF form.

A representation that is much better suited for a practical top-down analysis is a special kind of graph. We call it the top-down graph. It is similar to the syntax diagrams, introduced by Wirth, that were used to describe the syntax of Pascal.

In Coco, the top-down graph is used as a preliminary step to the even better suited G-code. Since the G-code is understandable only by means of the top-down graph, we will describe that first.

Basic structure

The basic structure of the top-down graph is a collection of ordered binary trees. Its nodes are the grammar symbols of the right-hand sides of syntax rules. Right pointers link the components of an alternative, while left pointers link the start symbols of different alternatives. In the picture of a top-down graph, a right pointer leaves a node at the right, a left pointer leaves the node at the bottom:
    node → right pointer (to next component)
     ↓
    left pointer (to next alternative)

2.35 Example  Basic structure of the top-down graph

Figure 2.6 shows the top-down graph of the grammar

    S → A ;
    A → aB | BBb
    B → b | ab

Notice that the top-down graph comprises only the right-hand sides of the rules.

    S ⇒ A → ;

    A ⇒ a → B
        ↓
        B → B → b

    B ⇒ b
        ↓
        a → b

Fig. 2.6 Top-down graph of the grammar of Example 2.35

Factorization

An advantage of the top-down graph over a linear representation is the ability to show alternatives in a factorized form, as can be done in EBNF. From the rule

    A → abc | acd    with the top-down graph

    A ⇒ a → b → c
        ↓
        a → c → d

we get by left-factorization the EBNF rule
    A → a(bc | cd)    with the top-down graph

    A ⇒ a → b → c
            ↓
            c → d

From the rule

    A → abc | dec    with the top-down graph

    A ⇒ a → b → c
        ↓
        d → e → c

we obtain by right-factorization the EBNF rule

    A → (ab | de)c    with the top-down graph

    A ⇒ a → b → c
        ↓       ↑
        d → e ──┘

Notice that the last top-down graph is no longer a tree. A special case occurs when an alternative is the beginning of another alternative. Then, an ε is created by factorization. For the rule

    A → ab | a    with the top-down graph

    A ⇒ a → b
        ↓
        a

we get by left-factorization the EBNF rule

    A → a[b]    with the top-down graph

    A ⇒ a → b
            ↓
            ε

Removal of left-recursive rules

The symbol strings defined by left-recursive rules can be represented in EBNF by the repetition symbol. Repetition corresponds to a loop in the top-down graph. From the rule

    A → a | Ab    with the top-down graph

    A ⇒ a
        ↓
        A → b

we get the EBNF rule

    A → a{b}    with the top-down graph

    A ⇒ a → b → (back to b)
            ↓
            ε
This top-down graph is also not a tree. It can easily be verified that it represents all possible right-hand sides such as a, ab, abb, abbb, etc.

The complement symbol any

Sometimes it is desirable to represent a set of terminals by its complementary set, for example

1. in the description of a string enclosed in quotes: the set of all symbols in the alphabet except the quote;
2. any symbol in a comment of the form (* ... *) by the set of all symbols except the symbol *);
3. any symbols except begin (to skip declarations).

Complementary sets cannot be represented in the production notation of formal languages. Therefore, the only thing left to do is to enumerate all members of the complementary set, which is very inconvenient. For use in Coco, we introduce the special symbol any to denote complementary sets.

2.36 Definition  Complement symbol any

The complement symbol any represents every arbitrary terminal that is not a terminal start symbol of an alternative of any.

Figure 2.7 shows the three examples above with the symbol any as an EBNF rule and as a top-down graph.

    string  = '"' {any} '"'.
    comment = "(*" {any} "*)".
    skip    = {any} "begin".

    string  ⇒ '"' → any → (back to any)
                    ↓
                    '"'

    comment ⇒ "(*" → any → (back to any)
                     ↓
                     "*)"

    skip    ⇒ any → (back to any)
              ↓
              "begin"

Fig. 2.7 The meaning of the complement symbol any

Equivalent top-down graphs

If one uses only the basic structure, then a unique top-down graph results from a grammar rule. This uniqueness is lost with factorization and removal of left recursion. In these cases there are sometimes several equivalent top-down graphs.
2.37 Example  Equivalent top-down graphs

Consider the expression

    E → T | "+" T | "-" T | E "+" T | E "-" T

By factorization and elimination of left-recursion the graph shown in the upper part of Fig. 2.8 will result. It has 10 nodes and corresponds to the EBNF rule

    E = (T | "+" T | "-" T) {"+" T | "-" T}.

Another equivalent top-down graph, which consists of only 7 nodes, appears in the middle part of Fig. 2.8. This graph corresponds to the EBNF rule

    E = ["+" | "-"] T {("+" | "-") T}.

Figure 2.8 shows another equivalent and even more condensed top-down graph with only 6 nodes. This top-down graph no longer corresponds to an EBNF rule.

[Fig. 2.8: three top-down graphs with 10 nodes, 7 nodes and 6 nodes]

Fig. 2.8 Three equivalent top-down graphs for expressions
The graph with the fewest nodes occupies the least memory. However, there may be reasons (due to the treatment of semantics, see Section 3.6) not to compress the top-down graph too much.

These examples show that for each EBNF rule there is a corresponding top-down graph. But a top-down graph does not always correspond to an EBNF rule.

Representation

The top-down graph can be represented in memory by a data structure of nodes and pointers that may be dynamically generated or statically declared and initialized. Since the number of nodes is known in advance and does not change, the static declaration is more efficient. In Coco, the top-down graph basically consists of an array of nodes, and each node consists of four components:

    type Graphnode = record
           kind: (terminal,nonterminal,any,eps);
           val,lp,rp: integer;
         end;
    var graph: array(1:n) of Graphnode;

The names have the following meaning:

    kind:  the various node types.
    val:   the node symbol in some encoding, meaningless for ε-nodes.
    lp:    the left pointer that points to the first node of the next alternative. If lp > 0 then graph(lp) starts the next alternative. If lp = 0, the current alternative is the last one of the production.
    rp:    the right pointer that points to the next component. If rp > 0 then graph(rp) is the next component. If rp = 0, the current component is the last component of an alternative.
    n:     the number of nodes in the grammar.

LL(1) conditions for top-down graphs

The LL(1) condition of Section 2.2 refers to the simple grammar representation with rules and alternatives. If a grammar meets this condition, the correct alternative can be selected by a lookahead of one symbol without backtracking. A similar condition for the top-down graph ensures the correct selection of the alternatives without backtracking by use of a lookahead of one symbol.

To simplify the discussion, we introduce two auxiliary concepts.
Since they are of central importance for the syntax analysis of top-down graphs, we will use them often. We call these concepts ‘alternative chain’ and 'match'.
2.38 Definition  Alternative chain

An alternative chain is the ordered set of all nodes of a top-down graph that are linked together by left pointers. A node pointed to by a right pointer is the start of an alternative chain. A node without a left pointer is the end of an alternative chain.

We can define nodes that are not linked by left pointers as also belonging to an alternative chain. In this case the alternative chain consists of the node alone.

2.39 Example  Alternative chains

In the top-down graph of Fig. 2.9 symbols are distinguished by subscripts. The graph contains the alternative chains

    (+1, -2, T3)    (T3)    (+4, -5, ε6)

Note that node T3 belongs to two alternative chains.

[Fig. 2.9: top-down graph for E with six indexed nodes]

Fig. 2.9 Top-down graph for expressions with indexed symbols

2.40 Definition  Match

An input symbol x and a node of the top-down graph with symbol sy match (i.e. fit together) if one of the following conditions is met:

1. sy is a terminal and x = sy;
2. sy is a nondeletable nonterminal that may start with x;
3. sy is a deletable nonterminal, and sy can start with x or x can follow the node sy;
4. sy is an ε-node and x can follow the node sy;
5. sy is an any-node and x matches no other node in the alternative chain to which the any-node belongs.

In order to select a node uniquely from an alternative chain using a lookahead symbol x, x must match only one alternative:
2.41 LL(1) conditions for top-down graphs

An alternative chain is LL(1) if an arbitrary input symbol matches at most one of its nodes. A top-down graph is LL(1) if all of its alternative chains are LL(1).

The top-down graph of Fig. 2.9 is therefore LL(1) if T cannot start with + or - and if E cannot be followed by + or - (these symbols would match the ε-node).

Since each EBNF production corresponds to a top-down graph, the LL(1) conditions for top-down graphs are also the LL(1) conditions for EBNF grammars. In order to check if an EBNF grammar is LL(1), it is easiest to generate its top-down graph and check if it meets the LL(1) conditions. The LL(1) conditions for EBNF grammars can also be derived from the definition of the EBNF grammar alone without constructing a top-down graph. However, this is cumbersome and results in no new insights. We therefore omit the description and leave the task to the interested reader.

LL(1) top-down graphs and grammars of programming languages

If top-down graphs are to have practical value, one must be able to represent the grammars of programming languages as LL(1) top-down graphs, and therefore as LL(1) EBNF grammars. We may ask, therefore, whether this is possible without exception, or whether there are constructs that resist an LL(1) representation, and if so, what can be done about it.

First of all, LL(1) violations by left-recursive productions and by the start of several alternatives with the same symbol can easily be avoided in top-down graphs and in EBNF notation. Remaining LL(1) violations can usually be removed with various tricks that are determined with insight into the particular situation. As an aid for the 'grammar designer', we will treat several typical cases and distinguish between the following five methods:

1. substitution and factorization;
2. alphabet extension;
3. syntactic extension;
4. acceptance of non-LL(1) constructs;
5. miscellaneous transformations.

Substitution and factorization. 
Consider a production with two alternatives that start with different nonterminals X and Y, where X and Y can start with the same symbol (terminal or nonterminal). Then it is often possible to
replace the symbols X and Y by the right-hand sides of their productions, and to extract their common starting string by left-factorization.

This can be simple and obvious as in the various DO instructions of PLM/80 (and similarly in PL/1):

    statement      = ...
                   | dostatement
                   | whilestatement
                   | forstatement
                   | casestatement
                   | ... .
    dostatement    = "DO" ";" block.
    whilestatement = "DO" "WHILE" expr ";" {statement} ending.
    forstatement   = "DO" ident "=" expr "TO" expr ["BY" expr] ";" {statement} ending.
    casestatement  = "DO" "CASE" expr ";" {statement} ending.

By substitution and factorization this results in

    statement = ...
              | "DO" ( ";" block
                     | ( "CASE" expr
                       | "WHILE" expr
                       | ident "=" expr "TO" expr ["BY" expr]
                       ) ";" {statement} ending
                     )
              | ... .

However, it can also be difficult. In grammars such as Modula-2 a factor can be a set or a designator, and both can begin with an identifier:

    factor     = ... | designator [actpars] | set | ... .
    designator = qualident {"." ident | "[" exprlist "]" | "^"}.
    set        = [qualident] "{" [elementlist] "}".
    qualident  = ident {"." ident}.

Note that even the production for designator taken alone is not LL(1). For instance, ident.ident may be simply a qualident or a qualident followed by .ident.

The removal of the LL(1) conflict consists of combining designator and set into a new symbol deset, and then splitting designator into ident and a remainder desigrest. After several substitutions and factorizations, the following LL(1) constructs result:

    factor = ... | deset | ... .
    deset     = ident {"." ident}
                ( "[" exprlist "]" desigrest [actpars]
                | "^" desigrest [actpars]
                | "{" [elementlist] "}"
                | [actpars]
                )
              | "{" [elementlist] "}".
    desigrest = {"." ident | "[" exprlist "]" | "^"}.

The equivalence of the old and new constructs can no longer be easily seen.

Alphabet extension. In selecting an alternative, it is fairly common for two lookahead symbols to be necessary to find the right one. The main example of this is when labels appear in front of statements:

    statement = [ident ":"] (ident ":=" expr | ifstatement | ... ).

An ident at the beginning of a statement may be a label or the left part of an assignment. This can only be determined by the symbol following ident.

This conflict can often be resolved by extending the terminal alphabet. In the preceding case, the word label can be added to the alphabet, and the lexical analyzer can be required to supply a label instead of an ident if ident is followed by a ':'. In this case, the lexical analyzer is used to resolve the LL(1) conflict.

This method leads to complications if the lexical analyzer is required to carry out a wider inspection of context to determine whether or not to substitute two terminals by another. For example, in Algol 60, 'ident :' does not always mean the label of an instruction. An identifier may also appear in a declaration, as in ARRAY(n : m). In such cases, the lexical analyzer is no longer independent of the syntax analyzer since it must consider the context.

Syntactic extension. In Algol 60 there exist multiple assignments, such as

    assignment = designator ":=" {designator ":="} expr.

where expr can start with designator. This LL(1) violation is very nasty. It can be removed by 'substitution and factorization', but this is very cumbersome (the reader should try it). It is easier to 'expand' the designator inside the curly brackets to expr. This requires the introduction of an additional production for assignrest:

    assignment = designator ":=" assignrest.
    assignrest = expr [":=" assignrest].

The syntactic extension must be compensated by a semantic restriction. If in the production for assignrest the right-recursive part is present, expr must be restricted to be a designator. This can be achieved by the introduction of a boolean attribute isdesignator. Anticipating knowledge from Chapter 3, this
may be written as an attributed grammar as follows:

    assignrest = expr↑isdesignator
                 [":=" where (isdesignator)
                  assignrest].

This means: by syntactic extensions, portions of the language definition are moved from syntax to static semantics.

Acceptance of non-LL(1) constructs. If it is known that the parser tries to match the alternatives in the order they are written, some LL(1) violations can be left alone. The best known case is the dangling else:

    ifstatement = "IF" expr "THEN" statement ["ELSE" statement].

Although this construct is not LL(1), and is even ambiguous (see Example 2.22), it can be left alone if one can be sure that the parser, having recognized the statement following THEN, first tries to detect the optional ELSE, and only regards the entire if statement as complete if there is no ELSE.

Other transformations. Sometimes, a grammar that is not LL(1) can be transformed into an equivalent LL(1) grammar by simple transformations that do not fall into any of the four categories above. For example, in Algol 60, a block is defined as

    block = head ";" body.
    head  = "begin" declaration {";" declaration}.

This construct is not LL(1) since the semicolon is used in a dual role. It separates adjacent declarations and it separates body from head. The solution is simple: The grammar can be transformed so that the semicolon becomes a terminator instead of a separator:

    block = head body.
    head  = "begin" declaration ";" {declaration ";"}.

The necessity of such transformations, their difficulty, and the uncertainty of executing them correctly are a weakness of the LL-method, and often a cause for criticism. In bottom-up analyzable LR(1) grammars, no transformations, or only a few, are needed, so research has been focused on the LR-method. However, syntax is but one aspect. 
What is gained with the LR-method must be paid back by the connection of semantics to syntax: it is much more inflexible in the LR-method than in the LL-method, often leads to violations of the LR-property, and then also requires transformations. In addition, the LL(1)-method is much easier to understand than the LR-method. This results in easier transformations and more understandable error messages.
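The remark on the dangling else can be made concrete with a small sketch. It is not taken from Coco; it is a hand-written recursive-descent routine in Python, under the simplifying assumptions that an expression is a single token and that any token other than IF is a simple statement. The function name `parse_statement` and the tuple representation of the tree are hypothetical. The point is only that the parser looks for the optional ELSE immediately after the statement following THEN, so an ELSE is attached to the nearest IF.

```python
def parse_statement(tokens, pos=0):
    """Parse one statement; return (tree, position after the statement)."""
    if tokens[pos] == "IF":
        cond = tokens[pos + 1]                 # assumption: expr is one token
        if tokens[pos + 2] != "THEN":
            raise SyntaxError("THEN expected")
        then_part, pos = parse_statement(tokens, pos + 3)
        else_part = None
        # Try the optional ELSE first; only if it is absent is the
        # if statement regarded as complete.
        if pos < len(tokens) and tokens[pos] == "ELSE":
            else_part, pos = parse_statement(tokens, pos + 1)
        return ("if", cond, then_part, else_part), pos
    # assumption: any other token stands for a simple statement
    return tokens[pos], pos + 1


tree, end = parse_statement("IF a THEN IF b THEN s1 ELSE s2".split())
print(tree)
```

With this order of attempts the ELSE belongs to the inner IF, which is the usual resolution of the ambiguity.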
2.4 The G-code

A top-down graph that resides in memory is a useful way of representing a grammar. It already requires little space, but it can be significantly compressed further. Let us consider the grammar of arithmetic expressions:

    S      = expr.
    expr   = term {"+" term}.
    term   = factor {"*" factor}.
    factor = v | "(" expr ")".

Now, let us add the production

    S' = S eofsy

where S' is the new sentence symbol and eofsy (= end of file) a new terminal. This trick ensures that each sentence terminates with the same symbol eofsy and that there is no empty sentence if S can be derived into the empty string.

[Fig. 2.10 Top-down graph for an expression: graphic representation]

In Fig. 2.10 we see a top-down graph of a grammar with 15 nodes. In Fig. 2.11 we see the internal memory representation described in Section 2.3. If we assume one byte each for the components typ and val, and two bytes each for lp and rp, then the table requires 15*6 = 90 bytes. Compacting can be achieved by partitioning the nodes according to their types, and by coding the individual types so that they do not contain any unnecessary information. The G-code (grammar code) that we use is such a code. For syntax analysis the elements of the G-code behave as instructions and therefore they are written as instructions. Sequential G-code instructions are sequentially executed. They correspond to nodes in the top-down graph
that are connected by right pointers. Definition 2.42 defines the G-code as far as it is relevant for the representation of a top-down graph.

[Fig. 2.11 Top-down graph for an expression: representation in memory (table rows grouped into the rules for S', S, expr, term and factor)]

The G-code is augmented by tables containing the lookahead symbols. With each nonterminal symbol sy (not with each nonterminal node) there is associated a set first(sy), containing its terminal start symbols. The operand nr of an ε-instruction (= EPS and EPSA) refers to an array epsset. Thus epsset(nr) contains all terminals that match the corresponding ε-node (see Definition 2.40). The operand nr of an ANYA-instruction refers to an array anyset. Thus anyset(nr) contains all terminals that match the corresponding any-node. In summary, these G-code lookahead sets have the following data structures:

    first:  array(1:maxnt)  of Symbolset
    epsset: array(1:maxeps) of Symbolset
    anyset: array(1:maxany) of Symbolset

If the lookahead sets are stored bitwise, they do not require much memory. It can be seen that each node of the top-down graph corresponds to a G-code instruction. The G-code instructions RET and JMP are added at the end of productions and loops where the linear execution sequence is interrupted.
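The remark that bitwise storage keeps the lookahead sets small can be illustrated as follows. This is only a sketch in Python, not the representation used by Coco: the terminal numbering in `SYMBOLS` and the helper names `make_set` and `contains` are assumptions made for the illustration. Each set is one integer with one bit per terminal, so a grammar with up to eight terminals needs a single byte per set.

```python
# Hypothetical numbering of the terminals of the expression grammar.
SYMBOLS = ["eofsy", "v", "+", "*", "(", ")"]
CODE = {name: number for number, name in enumerate(SYMBOLS)}


def make_set(*names):
    """Pack terminal names into one integer, one bit per terminal."""
    bits = 0
    for name in names:
        bits |= 1 << CODE[name]
    return bits


def contains(symbolset, name):
    """Membership test: a shift and a mask, no searching."""
    return (symbolset >> CODE[name]) & 1 == 1


first_expr = make_set("v", "(")          # terminal start symbols of expr
after_expr = make_set("eofsy", ")")      # terminals that may follow expr

print(contains(first_expr, "("), contains(first_expr, "+"))
```

The membership test needed by the instructions NT, EPS and ANYA then costs only a shift and a mask.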
2.42 Definition G-code (incomplete)

    Instruction   Bytes  Description
    T sy          2      terminal
                         If the next input symbol is sy then recognize it, else report an error.
    TA sy adr     4      terminal with alternative
                         If the next input symbol is sy then recognize it, else go to adr.
    NT sy         2      nonterminal
                         If the next input symbol is a terminal start of sy then step through its production, else report an error.
    NTA sy adr    4      nonterminal with alternative
                         If the next input symbol is a terminal start of sy then step through its production, else go to adr.
    ANY           1      any
                         Recognize the next input symbol.
    ANYA nr adr   4      any with alternative
                         If the next input symbol is contained in the symbol set indicated by nr then recognize it, else go to adr.
    EPS nr        2      epsilon
                         If the next input symbol is contained in the successor set indicated by nr then recognize the empty string, else report an error.
    EPSA nr adr   4      epsilon with alternative
                         If the next input symbol is contained in the successor set indicated by nr then recognize the empty string, else go to adr.
    JMP adr       3      jump
                         Go to adr.
    RET           1      return
                         Return from the production of a nonterminal.

The operation code and the operands sy and nr are 1 byte each; adr is 2 bytes.

The following G-code results for the grammar shown in the top-down graph of Fig. 2.10:

    The production S' = S eofsy.
         1  NT   S
         3  T    eofsy
         5  RET
    The production S = expr.
         6  NT   expr
         8  RET
    The production expr = term {"+" term}.
         9  NT   term
        11  TA   "+" 20
        15  NT   term
        17  JMP  11
        20  EPS  1
        22  RET
    The production term = factor {"*" factor}.
        23  NT   factor
        25  TA   "*" 34
        29  NT   factor
        31  JMP  25
        34  EPS  2
        36  RET
    The production factor = v | "(" expr ")".
        37  TA   v 42
        41  RET
        42  T    "("
        44  NT   expr
        46  T    ")"
        48  RET

The lookahead sets are:

    first(S) = first(expr) = first(term) = first(factor) = {v, "("}
    epsset(1) = {eofsy, ")"}
    epsset(2) = {eofsy, ")", "+"}

anyset is empty since the grammar contains no any-symbol.

The total amount of G-code is 48 bytes, which is slightly more than one-half of the top-down graph. In Coco, first of all a top-down graph is generated. It is then used to check several properties of the grammar, and to calculate the start and successor sets. Finally the graph is transformed into G-code, and this is the ultimate structure in which the grammar is stored.

2.5 Parsing with the G-code

Parsing becomes quite simple with the G-code since the G-code itself is already a parsing program. To make a parser, it is only necessary to code an interpreter for a G-code program. In this section we will develop such a parser without error handling. In the next section we will add the error handling.

Assumptions

We will summarize here the assumptions on which parsing with the G-code is based.
1. The G-code is derived from a top-down graph that meets the LL(1) conditions.
2. If S is the sentence symbol, then the top-down graph and the G-code are expanded by the production

       S' = S eofsy.

   where eofsy is the terminal end-of-file that does not appear in the original grammar. The first G-code instruction of this production has the address 1.
3. The symbol string to be parsed is supplied by a lexical analyzer, which provides the next input symbol in the variable typ for each call. After reaching the last source symbol, the lexical analyzer supplies the symbol eofsy.
4. The parsing algorithm uses a stack of actual length lacts (= actual length of stack) to store the addresses that follow the nonterminal instructions currently being processed (these are the "return addresses" of the currently parsed nonterminals).

Overview

The parsing algorithm executes the G-code program which is controlled by the input string to be recognized. It starts at address 1 and ends at the instruction for the symbol eofsy. Depending on the current input symbol typ and the current G-code instruction several courses of action are possible. When the algorithm tries to recognize a terminal there are two possibilities: if it succeeds then it moves to the next symbol; if it fails then it goes on to the next alternative (if there is any). When the algorithm tries to recognize a nonterminal, there are also two possibilities: if the input string and the nonterminal match then the algorithm pushes the address of the next instruction on the stack and jumps to the first G-code instruction of the nonterminal; if they do not match then it goes on to the next alternative (if there is any). At the end of productions, the 'return address' is popped from the stack with RET, and the algorithm continues from there on. When an error occurs, error handling and synchronization take place, after which parsing continues as if no error had occurred.
The analysis ends when typ = eofsy and the corresponding G-code instruction is T eofsy. The parsing algorithm is called Parse. It returns a boolean variable correct which will be true if the analyzed input text is syntactically correct. Parse is an interpreter that has the following structure:
    Parse(↑correct):
      pc:=1;                                      --program counter
      loop
        opcode:=G-code(↓pc);                      --G-code operation
        case opcode of
          t:   execute instruction "T sy" and change pc
        | ta:  execute instruction "TA sy adr" and change pc
          ...
        | jmp: execute instruction "JMP adr"
        end
      end
    end Parse

Inside the loop, a value is assigned to the result parameter and the loop is terminated if typ = eofsy.

The simplified G-code parsing algorithm

First we will present a simplified version of Parse that does not contain the instructions ANY, ANYA, EPS, EPSA, and does not have any error actions. We further assume that nonterminals are not deletable. For the description of Parse in Adele, we will use the following routines:

    Decode(↓pc ↑opcode ↑sy ↑nr ↑adr ↑nextpc)
        returns the parameters of the G-code instruction starting at address pc. (An operand that does not appear in the actual instruction returns an undefined value of the corresponding parameter.) nextpc is the address of the next instruction.
    NewSym(↑typ)
        returns the next input symbol.
    Root(↓sy): integer
        returns the address of the first G-code instruction for the production for the nonterminal sy.

By using these actions, the simplified algorithm is as follows:

2.43 Algorithm Parse (simplified)

    Parse(↑correct):
      param correct: boolean;                     --correctness indicator
      const eofsy = ...;                          --end of file symbol
      type  Instruction = (t,ta,nt,nta,jmp,ret);
      local adr:    integer;                      --instruction part adr
            first:  array of Symbolset;           --lookahead symbol sets
            lacts:  integer;                      --actual stack length
            nextpc: integer;                      --addr. of next G-code instr.
            nr:     integer;                      --instruction part nr
            opcode: Instruction;                  --instruction part opcode
            pc:     integer;                      --program counter
            stack:  array of integer;             --nonterminals worked on
            sy:     integer;                      --sy part of G-code instr.
            typ:    integer;                      --current source symbol
    begin
      pc:=1; lacts:=0; NewSym(↑typ);              --init. and read first symbol
      loop
        Decode(↓pc ↑opcode ↑sy ↑nr ↑adr ↑nextpc); --get instruction at pc
        case opcode of
          t:                                      --term. without alternative
            if typ=sy                             --must match
            then
              if typ=eofsy then correct:=true; exit end;  --terminate successfully
              pc:=nextpc; NewSym(↑typ)            --advance and read
            else correct:=false; exit             --terminate unsuccessfully
            end
        | ta:                                     --terminal with alternative
            if typ=sy                             --may match
            then pc:=nextpc; NewSym(↑typ)         --advance and read
            else pc:=adr                          --goto alternative
            end
        | nt:                                     --nonterm. without alternative
            if typ in first(sy)                   --must match
            then lacts:=lacts+1; stack(lacts):=nextpc; pc:=Root(↓sy)
            else correct:=false; exit             --terminate loop if error
            end
        | nta:                                    --nonterminal with alternative
            if typ in first(sy)                   --may match
            then lacts:=lacts+1; stack(lacts):=nextpc; pc:=Root(↓sy)
            else pc:=adr                          --goto alternative
            end
        | jmp: pc:=adr                            --jump to next instruction
        | ret: pc:=stack(lacts); lacts:=lacts-1   --return: find follower in stack
        end --case
      end --loop
    end Parse.

The complete G-code parsing algorithm

We will now add the interpretation of the instructions and properties that were left out in the previous section, and provide the following explanations. The instruction ANY recognizes any source symbol, and ANYA recognizes any source symbol that is a member of the lookahead set belonging to this instruction. The instructions EPS and EPSA recognize the empty string if the source symbol matches their lookahead set.
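To see the simplified interpreter at work, the following Python sketch runs it on the expression grammar of Section 2.4. It is an illustration under stated assumptions, not Coco's implementation: instructions are numbered 1, 2, 3, ... instead of being addressed by byte, so nextpc is simply pc + 1; the EPS instructions at the loop exits are replaced by a direct jump to RET, because the simplified algorithm has no ε-instructions; and the names `GCODE`, `ROOT`, `FIRST` and `parse` are hypothetical.

```python
# (opcode, sy, adr) per instruction; index 0 is unused so that pc starts at 1.
GCODE = [None,
    ("nt", "S", None), ("t", "eofsy", None), ("ret", None, None),   # 1-3   S'
    ("nt", "expr", None), ("ret", None, None),                      # 4-5   S
    ("nt", "term", None), ("ta", "+", 10), ("nt", "term", None),    # 6-8   expr
    ("jmp", None, 7), ("ret", None, None),                          # 9-10
    ("nt", "factor", None), ("ta", "*", 15), ("nt", "factor", None),  # 11-13 term
    ("jmp", None, 12), ("ret", None, None),                         # 14-15
    ("ta", "v", 18), ("ret", None, None),                           # 16-17 factor
    ("t", "(", None), ("nt", "expr", None), ("t", ")", None),       # 18-20
    ("ret", None, None),                                            # 21
]
ROOT = {"S": 4, "expr": 6, "term": 11, "factor": 16}
FIRST = {name: {"v", "("} for name in ROOT}


def parse(tokens):
    """Return True if tokens (a list of terminals) form a sentence."""
    symbols = iter(list(tokens) + ["eofsy"])   # the scanner ends with eofsy
    typ = next(symbols)
    pc, stack = 1, []
    while True:
        opcode, sy, adr = GCODE[pc]
        nextpc = pc + 1
        if opcode == "t":                      # must match
            if typ != sy:
                return False
            if typ == "eofsy":
                return True
            pc, typ = nextpc, next(symbols)
        elif opcode == "ta":                   # may match
            if typ == sy:
                pc, typ = nextpc, next(symbols)
            else:
                pc = adr
        elif opcode in ("nt", "nta"):
            if typ in FIRST[sy]:
                stack.append(nextpc)           # push the return address
                pc = ROOT[sy]
            elif opcode == "nta":
                pc = adr
            else:
                return False
        elif opcode == "jmp":
            pc = adr
        elif opcode == "ret":
            pc = stack.pop()


print(parse(["v", "+", "v", "*", "(", "v", ")"]), parse(["v", "+"]))
```

The interpreter contains no knowledge of the grammar; replacing the three tables is enough to parse a different language.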
In the case of an error, the analysis shall not be terminated. Rather, the error handler

    Error(↕pc ↕altroot)

will be executed. Error requires as parameters the address pc of the nonmatching G-instruction and the address altroot (root of alternative chain) of the first G-instruction of the alternative chain in which the error occurred. Error synchronizes by skipping input symbols, changes pc and altroot, and sets correct to the value false. Error is thus local to Parse.

Every time an input symbol has been successfully parsed, the next symbol can be read, and altroot can be set to a new alternative chain. For semantic reasons, however, these actions are delayed until the input symbol is actually required by the parser. Instead of reading a symbol, the variable mustread is set to true.

Furthermore, in the complete version we will consider the possibility that a nonterminal X can be derived into the empty string. This can be tested with the function

    Deletable(↓X): boolean

Such a nonterminal is always recognized, even if the current input symbol does not belong to its terminal start symbols (explanation in Section 7.3.3). This requires the interpretation of the instructions NT and NTA to be extended.
Expanded in this way, the algorithm Parse has the following complete form:

2.44 Algorithm Parse (complete)

    Parse(↑correct):
      param correct: boolean;                     --correctness indicator
      const eofsy = ...;                          --end of file symbol
      type  Instruction = (t,ta,nt,nta,any,anya,eps,epsa,jmp,ret);
      local adr:      integer;                    --instruction part adr
            altroot:  integer;                    --root of alternative chain
            anyset:   array of Symbolset;         --lookahead symbol set
            epsset:   array of Symbolset;         --lookahead symbol set
            first:    array of Symbolset;         --lookahead symbol set
            lacts:    integer;                    --actual stack length
            mustread: boolean;                    --typ is consumed
            nextpc:   integer;                    --address of G-instruction
            nr:       integer;                    --instruction part nr
            opcode:   Instruction;                --instruction part opcode
            pc:       integer;                    --program counter
            stack:    array of integer;           --nonterminals worked on
            sy:       integer;                    --instruction part sy
            typ:      integer;                    --current source symbol
            Error(↕pc ↕altroot);                  --local error procedure
    begin
      pc:=1; altroot:=1; mustread:=true; lacts:=0;  --initialize
      correct:=true;
      loop
        Decode(↓pc ↑opcode ↑sy ↑nr ↑adr ↑nextpc); --get instruction at pc
        if mustread then                          --read next source symbol
          NewSym(↑typ); altroot:=pc; mustread:=false
        end;
        case opcode of
          t:                                      --terminal without alternative
            if typ=sy                             --must match
            then
              if typ=eofsy then exit loop end;    --terminate
              pc:=nextpc; mustread:=true          --advance
            else Error(↕pc ↕altroot)              --sets correct:=false
            end
        | ta:                                     --terminal with alternative
            if typ=sy                             --may match
            then pc:=nextpc; mustread:=true       --advance
            else pc:=adr                          --goto alternative
            end
        | nt:                                     --nonterm. without alternative
            if typ in first(sy) or Deletable(↓sy) --must match
            then
              lacts:=lacts+1; stack(lacts):=nextpc;  --push follower
              pc:=Root(↓sy); altroot:=pc          --parse rule for nonterminal
            else Error(↕pc ↕altroot)              --sets correct:=false
            end
        | nta:                                    --nonterminal with alternative
            if typ in first(sy) or Deletable(↓sy) --may match
            then
              lacts:=lacts+1; stack(lacts):=nextpc;  --push follower
              pc:=Root(↓sy); altroot:=pc          --parse rule for nonterminal
            else pc:=adr                          --goto alternative
            end
        | any:                                    --any without alternative
            pc:=nextpc; mustread:=true            --advance
        | anya:                                   --any with alternative
            if typ in anyset(nr)                  --may match
            then pc:=nextpc; mustread:=true       --advance
            else pc:=adr                          --goto alternative
            end
        | eps:                                    --epsilon
            if typ in epsset(nr)                  --must match
            then pc:=nextpc                       --advance
            else Error(↕pc ↕altroot)              --sets correct:=false
            end
        | epsa:                                   --epsilon with alternative
            if typ in epsset(nr)                  --may match
            then pc:=nextpc                       --advance
            else pc:=adr                          --goto alternative
            end
        | jmp: pc:=adr                            --jump to next instruction
        | ret: pc:=stack(lacts); lacts:=lacts-1;  --return: find follower in stack
               altroot:=pc
        end --case
      end --loop
    end Parse.

2.6 Error handling

Principle

A syntax error arises in one of three situations: (1) the input symbol typ does not match the symbol sy in the G-code instruction T; (2) typ is not a terminal start symbol of the instruction NT; (3) typ is not a terminal successor of the instruction EPS. In any of these situations, the variable altroot contains the address of the alternative chain in which the error occurred and the stack contains the return addresses of all nonterminals that are currently being processed. This is sufficient information to collect all terminals that can be used to continue the analysis. The following example illustrates the situation.

2.45 Example Error situation

Consider the grammar fragment:

    program      = declarations body end.
    declarations = ... .
    body         = statement {statement}.
    statement    = ... | "if" relation "then" body ... .
    relation     = expr relop expr.
    relop        = ">" | ">=" | "=" | "<>" | "<=" | "<".
    expr         = ... .

and the input text

    ... if a:=b then c:=d end ...

When the syntax analyzer detects the error caused by the ':=', the situation shown in Fig. 2.12 has been reached. The boxes in this figure enclose the grammar symbols of the G-code instructions whose addresses are in the stack.
[Fig. 2.12 Partial syntax tree of an erroneous translation of the instruction if a:=b then c:=d end]

The last input symbol which was correctly recognized is a. It was recognized as expr. Then relop must follow. Since relop cannot start with ':=' the procedure Error(↕pc ↕altroot) is called. The stack contains the addresses of the G-code instructions for the recognition of

    eof, end, statement, then

listed from the bottom of the stack to the top of the stack.

We will now collect the so-called 'anchors', i.e. all terminals that are suitable for the resumption of the syntax analysis. They can be grouped into four classes:

1. All terminal start symbols of the alternative chain starting at altroot, because the erroneous symbol may have been added inadvertently by the coder, in front of a symbol of the unrecognized alternative chain. In the example, these are the symbols >, >=, =, <>, <=, <.
2. All terminal successors of the alternative chain at altroot, because the erroneous symbol may appear in place of a symbol of the unrecognized alternative chain. In the example, this set consists of the beginnings of expr: v, c, +, -, (.
3. The terminal start symbols of all symbols in the stack, and of the alternative chains beginning with them. With these, syntax analysis can be resumed after a non-recognized nonterminal. In the example these are the symbols then, end, eofsy and the set first(statement).
4. All terminal successors of the alternative chains whose addresses are in the stack. In the example, these are all terminal start symbols of body since body follows then, and all terminal start symbols of statement since statement follows statement.

While the inclusion of items 1 to 3 in the set of anchors is plausible, the inclusion of item 4 seems rather arbitrary. We could justify this by the fact that items 3 and 4 are symmetric to items 1 and 2, but there is a heuristic reason as well. In a grammar where the ';' is a statement separator rather than a statement terminator, without rule 4 the set of anchors would contain the ';' but not the start symbols of statements. Then, in the case of a missing ';' between statements, which is a common error, the next statement would be skipped. Rule 4 prevents this by adding the start symbols of statements to the set of anchors. Similar errors, e.g. the suppression of a comma between expressions, are also quite likely to occur.

Now, input symbols are skipped until one of them appears in the set of the anchors. In the worst case this appears at the end of the input text, since eofsy is always among the anchors. Next, the stack must be corrected. If the anchor is a terminal start symbol of the alternative chain whose address is in stack(t), analysis will be resumed at this address and the stack length will be reduced to t - 1. In Example 2.45, only ':=' is skipped since b is a start symbol of expr and the stack is not reduced.

In summary, we can describe the principle of error handling as follows:

2.46 Principle of error handling

An error is detected if an alternative chain is unsuccessfully traversed up to its end.
Then the error is flagged and the analysis must be synchronized. The synchronization consists of collecting a set of anchors and of skipping the input text up to the first input symbol that is contained in the set of anchors. With it, the analysis can be resumed at the address pc of the anchor. During this process the stack is reduced if necessary so that at the end of the error handling the following assertion holds: Starting with the G-code instruction at pc the analysis can be continued with the current input symbol typ (typ matches the alternative chain at pc). The stack contains the return addresses of all nonterminals currently under process when continuing the analysis with pc.
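The skipping step of this principle can be sketched in a few lines of Python. The sketch assumes that the anchors have already been collected into a table that maps each anchor terminal to the address at which to resume and to the stack length to keep; the function name `synchronize`, the table layout and all addresses in the usage example are hypothetical and serve only as an illustration.

```python
def synchronize(symbols, typ, anchors, stack):
    """Skip input up to the first anchor, trim the stack, return (typ, pc).

    symbols: iterator over the remaining input symbols (ends with "eofsy")
    anchors: dict terminal -> (newpc, newlacts); must contain "eofsy"
    stack:   list of return addresses, trimmed in place
    """
    while typ not in anchors:          # eofsy is an anchor, so this terminates
        typ = next(symbols)
    newpc, newlacts = anchors[typ]
    del stack[newlacts:]               # reduce the stack if necessary
    return typ, newpc


# Situation of Example 2.45 with made-up addresses: the stack holds the
# return addresses for eof, end, statement, then (bottom to top).
stack = [3, 12, 20, 33]
anchors = {"v": (40, 4), "then": (33, 3), "end": (12, 1), "eofsy": (3, 0)}
rest = iter(["v", "then", "v", ":=", "v", "end", "eofsy"])
print(synchronize(rest, ":=", anchors, stack), stack)
```

In the usage example only ':=' is skipped and the stack keeps its length, as described for Example 2.45; had the first anchor found been end, the stack would have been cut back to one entry.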
This error handling has two remarkable features:

1. It is completely independent of the syntax of the input language.
2. Anchors are collected only if an error is detected. It is therefore completely dynamic and starts anew for each error.

Hence, the presence of error handling does not reduce the parsing speed in case of a correct input string. The synchronization itself is expensive, but, since errors are infrequent, this is only a slight disadvantage.

The algorithm Error

From the preceding section, the basic structure of the algorithm Error is obvious now:

2.47 Algorithm Error (basic structure)

    Error(↕pc ↕altroot):
      global correct: boolean;
             lacts: integer;                      --actual stack length
    begin
      correct:=false;
      Print error message;
      Collect anchors;
      Skip input symbols up to the first anchor;
      Correct pc, altroot and lacts
      -- It is synchronized. The analysis can continue
    end Error

Error messages

The error messages are also independent of the input language. At the error location, we simply extract all expected symbols from the G-code and list them. In Example 2.45, the following error message will occur:

    ... if a:=b then c:=d end ...
             ↑ relop expected

This message is sufficient for most purposes. In Coco we also provide the option for the user to output his own error messages (see Section 5.2.2).

The collection of anchors

Since, after synchronization, parsing is resumed with a new G-code instruction newpc and with a new stack length newlacts, anchors are collected as triples:

    (newtyp, newpc, newlacts)

A procedure Triple produces a triple list in which the following triple categories are included:
1. the terminal start symbols of the alternative chain beginning with altroot;
2. the terminal successors of the alternative chain beginning with altroot;
3. the terminal start symbols of all alternative chains whose addresses are in the stack;
4. the terminal successors of all alternative chains whose addresses are in the stack.

If a terminal belongs to more than one of the four categories, category 1 has priority (because no symbol needs to be read). Category 2 has priority over categories 3 and 4 (because synchronization can take place in the same production where the error occurred). Of the anchors derived from the stack, the ones closest to the error location have priority, and the terminal start symbols of the stacked alternative chains have priority over their successors.

In order to fill the triple list with terminal start symbols and successors corresponding to the priority rules, we use the algorithms Fill and FillSucc. Hence, the algorithm Triple has the following form:

2.48 Algorithm Triple

    Triple(↓altroot):
      global triple list;
             stack: array of integer;
             lacts: integer;                      --actual stack size
    begin
      triple list := empty;
      for i:=1 to lacts do
        FillSucc(↓stack(i) ↓i-1);                 --class 4
        Fill(↓stack(i) ↓i-1)                      --class 3
      end;
      FillSucc(↓altroot ↓lacts);                  --class 2
      Fill(↓altroot ↓lacts)                       --class 1
    end Triple

As a concrete data structure of the triple list, we use two arrays newpc and newlacts, which are indexed with the maxt + 1 terminals of the grammar:

    newpc, newlacts: array(0:maxt) of integer

The algorithms Fill and FillSucc use the following procedure to obtain G-code instructions:

    GetSymInstr(↓pc ↑opcode ↑sy ↑nextpc ↑altpc)

which supplies the G-code instruction at pc. The two last parameters have the meaning:
    nextpc: Address of the first 'symbol-recognizing' instruction (T, TA, NT, NTA, ANY, ANYA) which follows the instruction at pc in the same production, or 0 if no such instruction exists.
    altpc:  Address of the first 'symbol-recognizing' instruction which is an alternative of the instruction at pc, or 0 if no such instruction exists.

Fill and FillSucc can now be described as follows:

2.49 Algorithm Fill

    Fill(↓firstpc ↓lacts):
      global newpc, newlacts: array(0:maxt) of integer;
    begin
      pc:=firstpc;
      while pc#0 do
        GetSymInstr(↓pc ↑opcode ↑sy ↑nextpc ↑altpc);
        case opcode of
          t,ta: newpc(sy):=pc; newlacts(sy):=lacts
        | nt,nta,nts,ntas:
            for all x ∈ first(sy) do
              newpc(x):=pc; newlacts(x):=lacts
            end
        | any,anya:                               --nothing (eps and ret do not exist)
        end;
        pc:=altpc
      end
    end Fill

2.50 Algorithm FillSucc

    FillSucc(↓startpc ↓lacts):
      global newpc, newlacts: array(0:maxt) of integer;
    begin
      pc:=startpc;
      while pc#0 do
        GetSymInstr(↓pc ↑opcode ↑sy ↑nextpc ↑altpc);
        if nextpc#0 then Fill(↓nextpc ↓lacts) end;
        pc:=altpc
      end
    end FillSucc

Heuristic improvements

This synchronization procedure works well in most cases and synchronizes rapidly. However, it is not uncommon for the synchronization to be incorrect, causing spurious error messages or the skipping of longer text portions. The quality of the synchronization also depends on the grammar. It can be
improved by partitioning long grammar productions into several shorter ones. This increases the number of anchors.

We have improved the procedure with two heuristics, which are also independent of the grammar:

1. If several errors occur close together, we print only the first one, under the assumption that the remaining errors are spurious, resulting from the first one. We introduce an error distance, errdist, which is set to 0 after the handling of any error, and is increased by one for each input symbol read. If errdist is less than a predetermined limit errdistmin when an error occurs, no error message is given. We use errdistmin = 2, i.e. at least two symbols must have been recognized since the last error, otherwise a spurious error is assumed.
2. When a spurious error occurs, the stack may have already changed from the value when the original error occurred. Therefore, we save the stack at each original error, and restore it at a spurious error.

The heuristics only apply to the program Error and not to its subprograms. Error now has the final form:

2.51 Algorithm Error (with heuristic enhancements)

    Error(↕pc ↕altroot):
      global correct: boolean;
             lacts: integer;                      --stack length
             errdist: integer;                    --error distance
             errdistmin: integer;                 --minimal error distance
    begin
      correct:=false;
      if errdist>=errdistmin then
        Print error message;
        Collect the anchors;
        Save the stack
      else
        Replace the stack again
      end;
      Skip input symbols up to the first anchor;
      Correct pc, altroot, and lacts;
      -- It is synchronized. The analysis can continue
      errdist:=0
    end Error

Coco includes the above error-handling method in the generated parser. A similar error handling was published by Spenke et al. [1984]. They assign weights to the anchors and make the use of an anchor for synchronization dependent upon its 'insertion overhead' and its 'reliability'.
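The first heuristic, the error distance, can be sketched in isolation as follows. This Python fragment is only an illustration: the class name `ErrorFilter` and its methods are hypothetical, and starting errdist at errdistmin (so that the very first error is always reported) is an assumption, since the text does not state the initial value.

```python
class ErrorFilter:
    """Suppress error messages that follow too closely after the last one."""

    def __init__(self, errdistmin=2):
        self.errdistmin = errdistmin
        self.errdist = errdistmin      # assumption: first error is reported
        self.messages = []

    def symbol_read(self):
        """Call once for each input symbol read."""
        self.errdist += 1

    def error(self, message):
        """Record the error; return True if it was reported, False if spurious."""
        reported = self.errdist >= self.errdistmin
        if reported:
            self.messages.append(message)
        self.errdist = 0               # reset after the handling of any error
        return reported


f = ErrorFilter()
f.error("relop expected")              # reported
f.symbol_read()
f.error("expr expected")               # only one symbol since last error
f.symbol_read()
f.symbol_read()
f.error("then expected")               # two symbols read: reported again
print(f.messages)
```

Saving and restoring the stack, the second heuristic, is left out here because it depends on the parser state.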
3 Semantics

Syntax analysis checks a source program only for formal correctness. That is, it only determines whether the input string is a sentence of the given grammar. This function is shown in Fig. 3.1.

[Fig. 3.1 Parser: source program (character sequence) → Parser → recognized or not recognized]

Translation into a target language presents the additional requirement that the source program must be transformed into the target program. The 'meaning' of the target program should be the same as that of the source program, i.e. the semantics should be retained. A program that does this is a compiler (Fig. 3.2).

[Fig. 3.2 Compiler: source program → target program]

A compiler emerges from a parser if the parser is able to emit so-called 'semantic actions' each time it has parsed some syntactic construct. The semantic actions in turn generate output symbols which constitute in their entirety the target program.
This chapter covers attributed grammars, which are presently the most common technique for the formal description of translation processes. To describe the translation the context-free grammar for the source program is enhanced by three items:

1. semantic actions, which describe the actions that must be performed during the translation;
2. attributes, which describe properties of the grammar symbols and their environment;
3. context conditions, which describe relationships between attributes.

We will introduce these three items one-by-one, then cover the formalism of the attributed grammar as a whole, and finally cover a subset of the attributed grammars, the so-called L-attributed grammars, used by Coco.

3.1 Semantic actions

The description of semantic actions can be inserted directly at the desired locations in the grammar productions, e.g. by means of the special delimiters sem ... endsem. For a left-to-right parsing of a production A → ω1 ω2, the execution of the semantic action statseq after parsing ω1 and before parsing ω2 can be described by inserting the semantic action between ω1 and ω2:

    A → ω1 sem statseq endsem ω2

This production is to be interpreted in such a way that, for the parsing of A, where syntax analysis proceeds from left to right, first ω1 is parsed, then the semantic action statseq is performed, and afterwards ω2 is parsed.

For the description of the semantic actions themselves there are no generally accepted conventions. We will use the language constructs of Adele or Modula-2.

3.1 Example Semantic actions

Given a grammar of an arbitrary sequence of zeros and ones:

    S → 0S | 1S | ε

The task consists of reversing a sentence σ of L(G(S)) to produce an output where the first input symbol is output as the last, the second input symbol is output as the next to last, and so on. This translation is simply written as
    S → 0 S   sem Write('0') endsem
      | 1 S   sem Write('1') endsem
      | ε

For a given input sentence, e.g. σ = 001, the semantic actions can be traced according to the syntax tree of Fig. 3.3. If parsing is performed top-down from left to right, the output string 100 results.

[Fig. 3.3 Syntax tree with semantic actions]

The next example will show that this method can also describe more difficult transformations.

3.2 Example Semantic actions

Given the grammar of the previous example, the task is to transform an input sequence of n zeros and m ones into an output sequence of the same length which contains all n zeros followed by all m ones, i.e. the sequence 0^n 1^m. This translation is described by

    S → 0   sem Write('0') endsem   S
      | 1 S   sem Write('1') endsem
      | ε

3.2 Attributes

Even for such a simple task as the transformation of the input sequence '79 + 83' into the output sequence '162', the grammar with semantic actions fails. In general, translating an input sentence of any two numbers connected by '+' into an output sequence that shows the sum of the two numbers will fail. Why? When recognizing a constant, the lexical analyzer supplies only the terminal class c (as explained up to now). Thus, the parser 'sees' only the sequence c + c as input. A semantic action that produces the sum of the two numbers, however, is not satisfied with the terminal classes of the two numbers, but requires the values of the constants. These values are the semantic properties of the individual members of the terminal class c.

Thus, a lexical analyzer will have to supply two items for input symbols that are terminal classes: the type and the value of the input symbol. The symbol type (not to be confused with the data type) is the terminal symbol in the context of the grammar (variable, constant), and therefore a syntactic property; the symbol value is a semantic property.

By assigning an attribute to each terminal symbol that represents a terminal class, the semantic properties of terminal classes can be introduced into the formal language description. We write attributes as indices preceded by an arrow, whereby a constant now assumes the form c↑x, where x is of the type integer. The up-arrow shows that x is the result of the parsing of c, i.e. has the character of an output parameter.

By the use of attributes, we can describe the task of reading and adding two constants connected by a plus sign as follows:

    S → c↑x + c↑y   sem Write(x+y) endsem

In general, attributes describe properties that are associated with a grammar symbol. Therefore, nonterminals can also have one or more attributes. For example, let the following three properties apply to the symbol expr: (1) 'type of expression', (2) the expression has no operators, and (3) the expression is translatable at compile-time.
Then we can assign these three attributes to expr by writing

    expr↑exprtype↑simple↑valueknown

exprtype may assume various values dependent on its data type; simple and valueknown can assume boolean values.

In general, one can assign to each nonterminal and to each terminal class X of the context-free grammar a number of attributes that describe those properties of X that cannot be described by the context-free grammar alone. Each attribute can assume a predetermined number of values. These form the attribute type. The attributes of terminal classes receive their value through the recognition of the terminal symbols by the lexical analyzer. The values of the attributes of all nonterminals are calculated by the semantic actions.
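The division of labour described above, where the lexical analyzer supplies the attribute of the terminal class c and a semantic action computes with it, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names tokenize and parse_sum are hypothetical, and Write is replaced by a returned string.

```python
import re

def tokenize(text):
    """Return (symbol_type, value) pairs; value is None for plain terminals."""
    tokens = []
    for lexeme in re.findall(r"\d+|\S", text):
        if lexeme.isdigit():
            tokens.append(("c", int(lexeme)))  # terminal class c with its attribute
        else:
            tokens.append((lexeme, None))      # plain terminal such as '+'
    return tokens

def parse_sum(text):
    """Recognize S -> c^x + c^y and perform the action Write(x+y)."""
    tokens = tokenize(text)
    types = [symbol_type for symbol_type, _ in tokens]
    if types != ["c", "+", "c"]:               # the parser only looks at symbol types
        raise SyntaxError(f"expected c + c, found {types}")
    x = tokens[0][1]                           # attribute of the first constant
    y = tokens[2][1]                           # attribute of the second constant
    return str(x + y)                          # stands in for Write(x+y)
```

The syntax check uses only the symbol types, while the semantic action uses only the values, which is the separation the text argues for.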
3.3 Example Interpretation of arithmetic expressions

Consider the grammar of arithmetic expressions consisting of numbers, operators, and parentheses, and terminated by a semicolon:

    S → E ;
    E → T | E + T
    T → F | T * F
    F → c | ( E )

We want to define formally the meaning of such an expression by a description of its interpretation. 'Interpretation' means that an expression will be read, its value computed, and the result printed. In the formal description it must be stated that each symbol of the grammar, except for operators, parentheses, and semicolons, has a value. This value is denoted by an attribute. For example, the production F → c is verbally interpreted by the sentence 'the value of the factor F is the value of the constant c' and formally by the production:

    F↑a → c↑b sem a:=b endsem

Similarly, multiplication is described by the attributed production:

    T↑a → T↑b * F↑c sem a:=b*c endsem

This means: 'When recognizing the right-hand side, the attributes b and c are assigned a value, and subsequently the product of these values is computed, and assigned to the attribute a of the symbol T'. Correspondingly, the remaining productions of the grammar can be assigned attributes and semantic actions, so the complete description is as follows:

    S   → E↑a ;        sem Write(a) endsem
    E↑a → T↑b          sem a:=b endsem
    E↑a → E↑b + T↑c    sem a:=b+c endsem
    T↑a → F↑b          sem a:=b endsem
    T↑a → T↑b * F↑c    sem a:=b*c endsem
    F↑a → c↑b          sem a:=b endsem
    F↑a → ( E↑b )      sem a:=b endsem

Such a description is called an attributed grammar.

A simplified notation

The reader may notice that in Example 3.3 most semantic actions consist of only an assignment. It is therefore a useful shortcut to abbreviate
    F↑a → c↑b sem a:=b endsem

by

    F↑a → c↑a

This notation expresses the fact that the attribute of c is assigned to the output attribute of F without change.

Attributes and semantic actions in EBNF

The extended Backus-Naur form can also be used for the description of attributed grammars. Example 3.4 is the same as Example 3.3 but uses the simplified notation in EBNF form.

3.4 Example Interpretation of arithmetic expressions in EBNF

    S   → E↑a ;                       sem Write(a) endsem
    E↑a → T↑a { + T↑b                 sem a:=a+b endsem
              }.
    T↑a → F↑a { * F↑b                 sem a:=a*b endsem
              }.
    F↑a → c↑a | ( E↑a ).

With this notation, one can see how the visual separation of syntax and semantics significantly improves readability.

Input and output attributes

All of the previously used attributes behave like output parameters: they are generated by the parsing of a terminal or a nonterminal, and are used afterwards. We therefore call them derived or synthesized attributes and denote them by an up-arrow. But nonterminals can also have attributes that behave as input parameters, i.e. attributes that already have values when the parsing of the nonterminal starts. Then, semantic actions which are executed during the parsing of the nonterminal can use these values. We call such attributes inherited attributes, and denote them by a down-arrow. The next example shows the application of inherited attributes.
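Before turning to inherited attributes, the synthesized attributes of Example 3.4 can be illustrated by a small recursive-descent evaluator in Python. This is a sketch, not the method used by Coco: it assumes that each nonterminal becomes a method whose return value plays the role of the output attribute, and the class name ExprParser is hypothetical.

```python
import re

class ExprParser:
    """Evaluates S -> E ';' with E, T, F as in Example 3.4."""

    def __init__(self, text):
        self.tokens = re.findall(r"\d+|[+*();]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, symbol):
        if self.peek() != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {self.peek()!r}")
        self.pos += 1

    def S(self):
        a = self.E()
        self.expect(";")
        return a                      # stands in for Write(a)

    def E(self):
        a = self.T()
        while self.peek() == "+":
            self.expect("+")
            b = self.T()
            a = a + b                 # sem a:=a+b endsem
        return a

    def T(self):
        a = self.F()
        while self.peek() == "*":
            self.expect("*")
            b = self.F()
            a = a * b                 # sem a:=a*b endsem
        return a

    def F(self):
        token = self.peek()
        if token is not None and token.isdigit():
            self.pos += 1
            return int(token)         # attribute supplied by the lexical analyzer
        self.expect("(")
        a = self.E()
        self.expect(")")
        return a
```

For instance, ExprParser("2+3*(4+1);").S() evaluates the expression while parsing it, without ever building a syntax tree.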
3.5 Example Inherited attributes

Given the following grammar which describes the declaration of variables:

    s      → dcl typ idlist
    typ    → int | real | bool
    idlist → id | idlist , id

id is the terminal class of all identifiers. The declaration consists of a keyword dcl, a type, and one or more variables of this type, for example: dcl int x, y, z.

The semantic action, which should be performed during parsing of the declaration, consists of entering each variable's name name and type t into the name list. Let this be done by a call of the procedure NewId(↓name↓t). It is appropriate to call NewId immediately after the parsing of an identifier id in the production for idlist. But how can one recognize the type at this point since it was already parsed in the production for typ? The solution is to attach the type t as an inherited attribute to the nonterminal idlist:

    s        → dcl typ↑t idlist↓t
    idlist↓t → id↑name               sem NewId(↓name↓t) endsem
             | idlist↓t , id↑name    sem NewId(↓name↓t) endsem

Output attributes of a symbol A are computed during the parsing of the right-hand side of the A-production, and can thus be used during the parsing of other grammar productions that contain A as a part of their right-hand side. Thus the information flows from the bottom to the top, from the leaves to the sentence symbol.

Input attributes of a nonterminal A are computed prior to parsing of the A-production, and are used during its parsing. Thus the information flows from top to bottom in the syntax tree, from the sentence symbol to the leaves.

Output attributes of A describe properties of the A-phrase, and its constituent phrases. Input attributes of A denote properties of the environment of the A-phrase. Figure 3.4 shows a syntax tree 'decorated' with attributes for the sentence dcl int x,y,z. The flow of attribute values along the dashed lines can easily be seen.
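The role of the inherited attribute in Example 3.5 can also be illustrated in Python. The sketch below makes several assumptions: the type names int, real and bool, the function names, and the use of a plain list of (name, type) pairs in place of the name list and NewId are all hypothetical. The inherited attribute t appears as an ordinary input parameter of the idlist procedure, and the left-recursive production is written as a loop.

```python
def parse_typ(token):
    """typ: returns the type as its output attribute."""
    if token not in ("int", "real", "bool"):
        raise SyntaxError(f"type expected, found {token!r}")
    return token

def parse_idlist(tokens, t, name_list):
    """idlist with inherited attribute t: id { ',' id } in iterative form."""
    expect_id = True
    for token in tokens:
        if expect_id:
            if not token.isidentifier():
                raise SyntaxError(f"identifier expected, found {token!r}")
            name_list.append((token, t))      # stands in for NewId(name, t)
        elif token != ",":
            raise SyntaxError(f"',' expected, found {token!r}")
        expect_id = not expect_id
    if expect_id:
        raise SyntaxError("identifier expected at end of declaration")

def parse_declaration(text, name_list):
    """s: dcl typ idlist; the type flows from typ into idlist."""
    tokens = text.replace(",", " , ").split()
    if len(tokens) < 3 or tokens[0] != "dcl":
        raise SyntaxError("declaration must start with 'dcl typ id'")
    t = parse_typ(tokens[1])                  # output attribute of typ
    parse_idlist(tokens[2:], t, name_list)    # passed down as input attribute
```

Because t is known before parse_idlist is entered, every identifier can be entered into the name list at the moment it is recognized.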
Fig. 3.4 Analysis of the sentence dcl int x,y,z. The attributes flow along the dashed lines

3.3 Context conditions

The formal syntax description of a programming language is not sufficient to distinguish between correct and incorrect programs. For example, in a programming language where all variables must be explicitly declared, the following code may be syntactically correct, even though it does not represent a valid program since the variables x and y are not declared.

    PROCEDURE P
      VAR a,b: INTEGER
      x:=y
    END P

If a programming language definition states 'each variable in an assignment statement must be declared', this defines a relationship between textually separated language elements, which cannot be represented by a context-free grammar. Such constraints are thus called context conditions and are usually considered as part of the semantics since they cannot be described syntactically. The total set of context conditions is called the static semantics of the programming language. The word static signifies that they refer to the source code and not to the execution of the program.

Programming languages are full of context conditions. It would be desirable if the language definition contained explicit definitions for them, separating them from the other parts of the language definition and stressing their importance. Unfortunately, this is rarely the case since they are often buried implicitly in other definitions. Sometimes they are missing altogether since the author wants a small defining document, or because it is assumed that the reader understands them.

Attributed grammars also permit the formal description of context conditions. The context condition is expressed as a relation between attributes. For example, the context condition 'the left side and the right side of an assignment must be of the same type' imposes a relation between the type attributes of both sides. If

    assign = id↑v1↑typ1 ":=" expr↑v2↑typ2 ";".

is the production for the assignment, where typ1 and typ2 are the types of id and expr, then the context condition is typ1 = typ2. The context condition can be written separately from the production in the form

    assign = id↑v1↑typ1 ":=" expr↑v2↑typ2 ";".
    CC: typ1=typ2

or it can be integrated into the production, e.g. in a manner proposed by Watt and Lehrmann Madsen [1983]:

    assign = id↑v1↑typ1 ":=" expr↑v2↑typ2 ";" where(typ1=typ2).

The first form separates the context condition from the production in a firmer manner and is especially suited for several long context conditions. The second form emphasizes the coherence between production and context condition. According to van Wijngaarden's two-level grammar, the part where(...)
can be regarded as a nonterminal that is derived into an empty string if the relationship inside the parentheses is true. It cannot be derived into a terminal string if it is false. If typ1 = typ2, the syntax analysis of where(typ1 = typ2) then results in the empty string, so that an assignment is parsed with the remaining part of the production. However, if typ1 ≠ typ2, the terminal string representing the assignment statement is rejected since the where-part is not terminating.
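Read operationally, a where-part behaves like a test that either lets parsing continue or rejects the sentence. The following Python sketch illustrates this for the assignment production. It assumes a symbol table given as a dictionary from names to types; the names ContextError, where and check_assign are hypothetical.

```python
class ContextError(Exception):
    """Raised when a context condition does not hold."""

def where(condition, message):
    """Accept silently (the 'empty string') or reject the sentence."""
    if not condition:
        raise ContextError(message)

def check_assign(symbol_types, target, expr_type):
    """Tests where(typ1 = typ2) for an assignment 'target := expr'."""
    typ1 = symbol_types[target]       # type attribute of id
    typ2 = expr_type                  # type attribute of expr
    where(typ1 == typ2, f"cannot assign {typ2} to {target} of type {typ1}")
    return typ1
```

A call such as check_assign({"x": "int"}, "x", "bool") raises ContextError, which corresponds to the rejection of the assignment described above.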
We use the style with where and define the point of execution of the test of the context condition by its position in the production in the following way. The production

    A = ω1 where(CC) ω2.

means that in order to parse A, we must execute a syntax analysis from left to right that will parse ω1 first. Thereafter the context condition CC is tested. If it is not met, an error will be reported. Then ω2 will be parsed. The following examples show the application of context conditions.

3.6 Example A context-sensitive language

The language $\{a^n b^n c^n : n \ge 1\}$ is not context-free. It is shown in all textbooks about formal languages that a context-free grammar does not exist for this language. However, the following attributed grammar with a context condition is easily constructed:

    S   = A↑p B↑q C↑r where(p=q=r).
    A↑p = a sem p:=1 endsem {a sem p:=p+1 endsem}.
    B↑q = b sem q:=1 endsem {b sem q:=q+1 endsem}.
    C↑r = c sem r:=1 endsem {c sem r:=r+1 endsem}.

Here, p, q, and r represent the counts of the characters a, b, and c. The context condition requires that they are equal.

3.7 Example Context condition

The context condition 'in the declaration of an array, both index bounds must be of type integer, and the lower bound must not be greater than the upper bound' can be described as follows:

    arraydeclaration = id↑name "(" constant↑c1↑typ1 ":" constant↑c2↑typ2 ")"
                       where((typ1=typ2=integer) & (c1≤c2)).

where c1 and c2 represent the numerical values of the bounds.

3.8 Example Context condition

The context condition 'each variable appearing in a statement must have been previously declared' can be described as follows. One must distinguish syntactically the applied occurrence of a name (in a statement) from the defining occurrence (in a declaration), with the additional syntax rule:

    var = id.
The nonterminal var denotes the applied occurrence of the name id. Therefore, var must be written in all statements in place of id. If a semantic procedure IsDeclared(↓name) is used to check the symbol list to see if the name of the variable is declared, the context condition can be simply formulated as follows:

    var = id↑name where(IsDeclared(↓name)).

If a context condition is not met, this usually affects the execution of the subsequent semantic actions, but this cannot be expressed well in the attributed grammar. In Coco, we therefore avoid explicit context conditions, replacing their checking with semantic actions (see Section 3.6). However, for the description of the static semantics of programming languages context conditions are very suitable.

3.4 Attributed grammars

In the previous sections we have introduced the elements of attributed grammars. We now consider them in their entirety. In the literature the concept of an attributed grammar is defined in many different ways (see for example, Räihä [1977], Tienari [1980], Watt and Lehrmann Madsen [1983]). We will follow Waite and Goos [1984].

3.9 Definition Attributed grammar

An attributed grammar is a quadruple AG = (G, A, R, K):

    $G = (V_N, V_T, P, S)$ is a reduced context-free grammar;
    A is a finite set of attributes;
    R is a finite set of semantic actions; and
    K is a finite set of context conditions.

With each symbol $X \in V_T \cup V_N$, zero or more attributes from A are associated. With each production zero or more semantic actions from R and zero or more context conditions from K are associated. For each occurrence of a nonterminal X in the syntax tree of a sentence of L(G) the attributes of X can be computed in at most one way by semantic actions.

The attribute computation process

In the concept of attributed grammars, it is essential that the definition says nothing about the order in which the semantic actions are executed.
In the previous examples, we assumed that syntax analysis was performed top-down from left to right, and that the semantic actions were executed in the same
order. However, according to Definition 3.9, this is not required. The order of the semantic actions is not predetermined by some syntax-analysis method: rather, it is free. This eliminates the necessity of putting the semantic actions and context conditions in particular places of the right-hand side of the grammar productions. All semantic actions and context conditions that belong to a syntax production can be summarized and written at the end of the production. In the general case, the translation runs in two phases:

    1. syntax analysis, which constructs a syntax tree;
    2. execution of semantic actions, which mainly compute the attribute values attached to the nodes of the syntax tree in an arbitrary order.

Step 2 implies that an 'attribute computation process' will traverse the syntax tree in an arbitrary manner and compute the values of the unknown attributes at each node. A semantic action can be executed at a specific time if and only if all attribute values which contribute to the computation are known at that time. The attribute computation process continues until all attribute values are calculated. It is therefore possible that the attribute computation process must traverse the syntax tree several times, up and down, criss-crossing from left to right. In order to avoid ambiguous computations of attributes, the definition of attributed grammar contains the sentence: 'For each appearance of a nonterminal X, the attributes of X can be calculated in at most one way'.

3.10 Example Variable declaration

In Pascal, variables are declared by their enumeration after the keyword var, and the type follows the list of variables. For example,

    var x,y,z: integer

The semantic actions implied by the declaration may consist of a call to a procedure NewId(↓name↓t) which appends the name and type of the variable to the name list.
In a strict translation from left to right, this construct leads to difficulties, since the type is known only after all names have been parsed, and therefore NewId cannot be called immediately after recognizing a name. In an attributed grammar, these difficulties do not arise if it is formulated as follows:

    1  declaration = "var" idlist↓t0 ":" typ↑t1    sem t0:=t1 endsem.
    2  idlist↓t1   = id↑name                       sem NewId(↓name↓t1) endsem
                   | idlist↓t2 "," id↑name         sem NewId(↓name↓t1); t2:=t1 endsem.
For the source text

    var x,y,z: integer

first a syntax tree is generated, where all attributes except those of terminal classes have no values (see Fig. 3.5).

Fig. 3.5 Analysis of the sentence var x,y,z: integer with the flow of attributes along the dashed lines

The attribute computation process now starts at an arbitrary node in order to compute the missing attributes, and to call procedure NewId. Wherever it starts, the first semantic action that can be executed is t0 := t1 in production 1. Then, t2 := t1 and NewId(↓name↓t1) in production 2 can be executed. This process continues along the dashed lines until all of the semantic actions are executed.
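The attribute computation process for this example can be mimicked by a simple loop that repeatedly executes every semantic action whose input attributes are already known. The Python sketch below is an illustration under several assumptions: attribute instances are named by hypothetical strings such as "L0.t" (the type attribute of the topmost idlist node), the attribute of typ is taken as already known, and the names copy, new_id and run_attribute_computation are invented for the sketch.

```python
name_list = []

def copy(dst, src):
    """Semantic action dst := src between two attribute instances."""
    return {"needs": [src], "run": lambda known: known.__setitem__(dst, known[src])}

def new_id(name, type_attr):
    """Semantic action NewId(name, t) for one identifier."""
    return {"needs": [type_attr],
            "run": lambda known: name_list.append((name, known[type_attr]))}

def run_attribute_computation(known, actions):
    """Execute each action as soon as all attributes it needs are known."""
    pending = list(actions)
    while pending:
        ready = [a for a in pending if all(n in known for n in a["needs"])]
        if not ready:
            raise RuntimeError("cyclic or missing attribute dependency")
        for action in ready:
            action["run"](known)
            pending.remove(action)
    return known

# Actions of the tree for 'var x,y,z: integer', listed in a deliberately
# unhelpful order: L0 is the topmost idlist node, L2 the innermost one.
actions = [
    new_id("x", "L2.t"), copy("L2.t", "L1.t"),
    new_id("y", "L1.t"), copy("L1.t", "L0.t"),
    new_id("z", "L0.t"), copy("L0.t", "typ.t"),
]
known = run_attribute_computation({"typ.t": "integer"}, actions)
```

Whatever order the actions are listed in, the copy from typ to the topmost idlist is the only one that can run first, which matches the observation made in the text.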
In Example 3.10, the order in which the three calls to NewId are executed is not determined by the attributed grammar, but rather depends on the strategy of the attribute computation process. In most cases, the order is unimportant, and therefore this kind of attributed grammar is adequate. If desired, a particular order can be imposed by introducing additional attributes.

Cyclic semantic dependencies

Attributed grammars can be constructed in which the attribute computation process does not terminate since some attributes depend on themselves. This is called a cyclic semantic dependency. In Definition 3.9, this possibility is covered with the sentence: 'For each appearance of a nonterminal X, the attributes of X can be calculated in at most one way'. There are algorithms that can check the grammar for this property (Knuth [1968], Waite and Goos [1984]). If an attributed grammar of the general form described above has been defined, it must first be checked for cyclic semantic dependencies, and possibly transformed into a well defined form.

3.5 L-attributed grammars

Great effort is required to translate an attributed grammar as described in the previous section. First, the syntax tree of the program to be translated must be generated, and each of its nodes must be 'decorated' with the attributes. Then the syntax tree must be traversed more than once to compute the attributes until all attributes are determined. Nowadays storage and run-time requirements confine this method to mainframes, if it is regarded as practical at all.

Hence, special forms of attributed grammars are needed for compilers, permitting the computation of the attributes in a single pass from left to right through the syntax tree. Then the semantic actions can be executed in parallel with the syntax analysis and no syntax tree is needed. Such attributed grammars are called L-attributed (i.e. left attributed) according to Lewis et al. [1976].
All examples in Sections 3.1 through 3.3 are of this kind. The limitations imposed on attributed grammars to make them L-attributed are related only to the order of the attribute occurrences in a production. Each inherited attribute a of a grammar symbol X on the right-hand side of a production must be computable before X can be recognized. Therefore, for its computation only those attributes can be used that are known prior to the parsing of X. From this, the following definition follows:
3.11 Definition L-attributed grammar

An attributed grammar is called L-attributed if for each of its productions $Y \to X_1 \dots X_n$ the following is true: an input attribute of $X_i$ depends only on the input attributes of Y and on the output attributes of $X_1, \dots, X_{i-1}$.

It can easily be checked by inspection whether a given grammar is L-attributed according to this definition. The question is, how far can one get with an L-attributed grammar, and what do the limitations mean?

The general attributed grammars are indisputably the more powerful tool. The user does not need to be concerned about the processing order of attributes (and possibly storage of intermediate results) since this is all done automatically by the attribute computation process. The description is essentially static and thus 'in principle' simple. In reality, such descriptions can be cumbersome and difficult to understand, particularly in the presence of many attributes.

L-attributed grammars can be used to describe the translation of nearly all important language constructions. However, in many cases more context must be used for the translation. This is expressed by the necessity of saving intermediate results in lists, stacks, etc. In Section 3.6 it is shown how the non-L-attributed grammar of Example 3.10 can be easily replaced by an L-attributed grammar with semantic actions for temporarily saving variable names. The worst that can happen is that the order of the semantic actions which is imposed by the use of the L-attributed grammar will require the partition of the translation into several passes in which each pass can be defined by an L-attributed grammar.

In view of these disadvantages, Waite and Goos [1984] say: 'L-attributed grammars are inadequate, even in comparatively simple cases.' We do not agree with this categorical statement. In most cases, the simplicity and the ease of implementation of L-attributed grammars more than compensate for their disadvantages.
Therefore we feel that they are a very suitable tool for compiler implementations, at least as long as our computers are limited in memory and speed. Coco processes only L-attributed grammars, and all attributed grammars in the following chapters of this book are L-attributed. Algorithmic interpretation of L-attributed grammars While general attributed grammars are a declarative and therefore non-algorithmic formalism, L-attributed grammars can also be regarded as algorithmic descriptions, imposing an order in which semantic actions have to be executed.
Programmers who are used to thinking algorithmically will find it easier to follow this approach. Therefore, we understand an L-attributed grammar as a very high-level algorithmic language in the following sense.

The context-free portion of a production

    $A = \alpha_1 \mid \dots \mid \alpha_n.$

denotes the algorithm: 'Parse the nonterminal A by choosing the matching alternative $\alpha_i$, and recognizing its components sequentially from left to right.'

Each alternative with a semantic action of the form

    $\alpha_i = X_1 \dots X_j$ sem SA endsem $X_{j+1} \dots X_n$

denotes the algorithm: 'Parse $X_1$ through $X_j$, then execute the semantic action SA, and then parse $X_{j+1}$ through $X_n$.'

Each alternative with a context condition of the form

    $\alpha_i = X_1 \dots X_j$ where(CC) $X_{j+1} \dots X_n$

denotes the algorithm: 'Parse $X_1$ through $X_j$, then test the context condition CC (and report any errors), and then parse $X_{j+1}$ through $X_n$.'

An attributed production of the form

    A↓a0↑b0 = X↓a1↑b1 Y↓a2↑b2.

denotes the following algorithm:

    1. compute a1 (using semantic actions that are not stated here, which must precede X and may depend on a0);
    2. parse X (thereby b1 gets a value);
    3. compute a2 (using semantic actions that are not stated here, which must precede Y and may depend on a0, a1, b1);
    4. parse Y (thereby b2 gets a value);
    5. compute b0 (using semantic actions that are not stated here, which may depend on a0, a1, b1, a2, b2).

This algorithmic interpretation adds as a further clause to the definition of L-attributed grammars (Definition 3.11) the sentence: 'Attributes that are used as arguments in a semantic action or context condition between the grammar symbols $X_i$ and $X_{i+1}$ can only be input attributes of the left-hand side of the production and output attributes of $X_1$ to $X_i$.'
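The condition of Definition 3.11 is simple enough to be checked mechanically. The Python sketch below assumes a hypothetical encoding that is not taken from the book: every dependency of an input attribute is a triple (i, j, kind), meaning that an input attribute of the symbol at position i of the right-hand side uses an attribute of the given kind ("in" or "out") at position j, where position 0 denotes the left-hand side.

```python
def is_l_attributed(dependencies):
    """Check the dependencies of one production against Definition 3.11.

    dependencies: iterable of (i, j, kind) triples as described above.
    """
    for i, j, kind in dependencies:
        if j == 0:
            if kind != "in":              # only input attributes of the left-hand side
                return False
        elif not (kind == "out" and j < i):
            return False                  # only output attributes of symbols left of X_i
    return True

# s -> dcl typ^t idlist_t : idlist (position 3) uses the output of typ (position 2).
example_3_5 = [(3, 2, "out")]

# declaration = "var" idlist_t0 ":" typ^t1 : idlist (position 2) uses the
# output of typ (position 4), which is parsed later.
example_3_10 = [(2, 4, "out")]
```

With this encoding is_l_attributed(example_3_5) holds, whereas is_l_attributed(example_3_10) does not, which agrees with the classification of the two examples in the text.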
3.6 Implementation of the semantic interface

The implementation of the semantic interface in a compiler compiler and in the generated compiler consists of three tasks:

    1. translation and storage of semantic actions during compiler generation time and execution of semantic actions at run-time of the generated compiler;
    2. translation and storage of context conditions during compiler generation time and test of context conditions at run-time of the generated compiler;
    3. reserving memory for attributes at compiler generation time and attribute passing at run-time of the generated compiler.

These tasks are most simply and directly implemented if the generated compiler performs its syntax analysis with the popular method of recursive descent, which is not covered in this book (Gries [1971], Hartmann [1977], Wirth [1986]). In this, semantic actions and context conditions are directly embedded as code in the syntactic procedures, and attributes become parameters of the syntactic procedures. The simplicity of this kind of semantic interface makes the method of recursive descent still attractive today for hand-coded compilers.

If the generated compiler performs a table-driven syntax analysis, then somewhat more effort is required for the semantic interface. In this section, we cover the method used by Coco.

Semantic actions

The semantic actions are numbered. The order is arbitrary, but it is easiest to order them as they appear in the attributed grammar. We start the numbering at 12 for reasons that follow. All semantic actions are placed in the single procedure Semant as follows:

    Semant(↓nr):
      case nr of
        12: Semantic Action 12
      | 13: Semantic Action 13
        ...
      | n:  Semantic Action n
      end
    end Semant

The G-code is expanded to provide as many instructions as there are semantic actions. The G-code instructions treated in Section 2.4 (and two more, see Definition 3.14) have operation codes 0 through 11.
Operation codes 12 through 255 correspond to semantic actions 12 through 255. Thus, Coco has a limit of 244 semantic actions which will probably be rarely reached. We only
need 68 semantic actions to describe the attributed grammar of Coco itself, and 126 semantic actions for the largest pass of a Modula-2 compiler.

For the processing of semantic actions the parser of Algorithm 2.44 needs to be expanded only by an if statement:

3.12 Parser with semantic interface

    Parse(↑correct):
      loop
        case opcode of
          ...
        else
          if correct then Semant(↓opcode) end   -- perform semantic action
        end -- case
      end -- loop
    end Parse

We will now study this method in more detail by an example that uses an L-attributed grammar to translate the following declaration:

    var x,y,z: integer;

(In Example 3.10 we have already given a general attributed grammar for this task.) Before we can add the identifier list and type to the name list, it must be temporarily stored. To this purpose we will use a queue as abstract data structure with the access procedures InitQueue, Enqueue, Dequeue, and EmptyQueue whose meaning is obvious. The attributed grammar is as follows:

    declaration = "var"             sem InitQueue endsem
                  id↑name           sem Enqueue(↓name) endsem
                  { "," id↑name     sem Enqueue(↓name) endsem
                  }
                  ":" typ↑t         sem while not EmptyQueue do
                                          Dequeue(↑x); NewId(↓x↓t)
                                        end
                                    endsem
                  ";".

The numbering of the semantic actions and their integration into the procedure Semant results in the following:

    Semant(↓nr):
      local name, x: Nametype;
            t: (int, real, bool);
      begin
        case nr of
          12: InitQueue;
        | 13: Enqueue(↓name)
        | 14: Enqueue(↓name)
        | 15: while not EmptyQueue do
                Dequeue(↑x); NewId(↓x↓t)
              end
        end
      end Semant

The attributes are local variables of Semant. This means that in general all the names contained in a semantic action (enclosed between sem and endsem) are global to this semantic action, and therefore common to all of the other semantic actions.

Context conditions

Context conditions are not treated as an independent language element in Coco. Rather, they are represented as semantic actions. Instead of

    where(CC)

we write, for example,

    sem if not CC then SemErr end endsem

where SemErr is a semantic error processing procedure.

Attribute passing

Coco treats all attributes as local variables of Semant. They receive their value through attribute passing. This is different for terminals and nonterminals. The attributes of terminals (i.e. terminal classes) are always synthesized attributes. They receive their value by the lexical analyzer during parsing. The inherited attributes of nonterminals are passed before parsing by an implicit semantic action, whereas the synthesized attributes are passed after parsing.

3.13 Example Attribute passing

For the productions

    A = ... B↓x↑y ...
    B↓u↑v = ...

the attribute passing u:=x is done in the A-production before the parsing of B, and the attribute passing y:=v is done in the A-production after the parsing of B.
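The numbered semantic actions of the declaration example can be imitated in Python to show how a single dispatch procedure with shared attribute variables behaves. This is a sketch, not Coco's generated code: the class Semant below and the driver function translate are hypothetical, the action numbers 12 to 15 are assumed to follow the order of the actions in the grammar, and the driver simply issues the action numbers in the order in which a left-to-right parse would trigger them.

```python
from collections import deque

class Semant:
    """All attributes are shared fields, like the local variables of Semant."""

    def __init__(self):
        self.name = None          # attribute of id, set by the lexical analyzer
        self.t = None             # attribute of typ
        self.queue = deque()
        self.name_list = []       # stands in for the name list filled by NewId

    def __call__(self, nr):
        if nr == 12:                              # InitQueue
            self.queue.clear()
        elif nr in (13, 14):                      # Enqueue(name)
            self.queue.append(self.name)
        elif nr == 15:                            # flush the queue into the name list
            while self.queue:                     # while not EmptyQueue
                x = self.queue.popleft()          # Dequeue(x)
                self.name_list.append((x, self.t))  # NewId(x, t)
        else:
            raise ValueError(f"unknown semantic action {nr}")

def translate(names, type_name):
    """Issue the actions for 'var n1, n2, ...: type;' in parsing order."""
    semant = Semant()
    semant(12)
    for index, name in enumerate(names):
        semant.name = name                        # attribute passing from the lexer
        semant(13 if index == 0 else 14)
    semant.t = type_name
    semant(15)
    return semant.name_list
```

Because the names are queued until the type is known, the L-attributed formulation enters them in their textual order, e.g. translate(["x", "y", "z"], "integer").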
The attribute passing after the parsing of a nonterminal can be executed by a 'normal' G-code instruction, i.e. by an instruction activating a semantic action. However, for the passing of inherited attributes, two additional G-code instructions are necessary:

3.14 Definition G-code (remainder)

    Instruction        Bytes   Description
    NTS sy sem         3       nonterminal with input attribute semantics. If the next input symbol is a terminal start symbol of sy, then execute the semantic action sem (for input attribute passing) and start the parsing of the production for sy, else report an error.
    NTAS sy adr sem    5       nonterminal with alternative and input attribute semantics. If the next input symbol is a terminal start symbol of sy, then execute the semantic action sem (for input attribute passing) and start the parsing of the production for sy, else go to adr.

A complete example for the translation of an attributed grammar into G-code, including attribute passing semantics, can be found in Section 8.3.

Problems with semantic interfaces

The simplicity of this semantic interface gives rise to two problems:

1. Semantic actions may only be executed when it is clear that no other alternative will match. In the production

    A = sem action1 endsem C
      | sem action2 endsem D.

it must be determined whether C or D is the proper alternative before executing action1 or action2. Coco takes this into account by automatic insertion of an ε-node before the corresponding semantic actions, which leads to the following result:

    A → ε - action1 - C
        ε - action2 - D

    EPSA 1 M
    SEM  12
    NT   C
where the proper selection of alternatives is done with the following lookahead sets:

    epsset(1) = first(C)
    epsset(2) = first(D)

This also works in the following production:

    A = B sem action1 endsem
        { sem action2 endsem C sem action3 endsem }.

For the above the following top-down graph and corresponding G-code is generated:

    [Top-down graph: A → B, action1, followed by a loop of ε, action2, C, action3 with an ε exit]

        NT   B
        SEM  12
    M1: EPSA 1 M2
        SEM  13
        NT   C
        SEM  14
        JMP  M1
    M2: EPS  2
        RET

with the lookahead sets

    epsset(1) = first(C)
    epsset(2) = follow(A)

If the ε-nodes have disjoint lookahead sets, these constructs are LL(1).

2. Attributes in Coco are implemented as local variables of Semant. This results in the undesirable feature that their values are not retained during recursive parsing of nonterminals. For example, in the interpretation of expressions, the following production arises:

    E↑x = T↑x {"+" T↑y sem x:=x+y endsem}.

Here, the output attribute x of the left T must be still available after parsing of the right T since its value is used afterwards. However, since T is recursive over F and E, the attribute x of the left T may be destroyed by the parsing of the right T. Coco does not take care of this problem. It is up to the programmer to save and restore x explicitly. This can be done by use of a stack and replacing the above production by the following:
    E↑x = T↑x {"+"     sem Push(↓x) endsem
               T↑y     sem Pop(↑x); x:=x+y endsem
              }.

From this follows the

3.15 Principle of attribute saving for recursive symbols

Attribute values that must be preserved beyond the parsing of a recursive nonterminal X must be saved before the parsing of X and restored after the parsing of X.
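The effect of Principle 3.15 can be reproduced with a small Python experiment. The sketch below is an illustration under simplifying assumptions: the grammar is reduced to E = F {"+" F} and F = c | "(" E ")", every attribute is a single shared field (imitating local variables of Semant), and the class and function names are hypothetical. The flag save_attributes switches the Push and Pop actions on or off.

```python
import re

class SharedAttributeParser:
    """Parser whose attributes e_x, e_y, f_a exist only once, as in Semant."""

    def __init__(self, text, save_attributes):
        self.tokens = re.findall(r"\d+|[+()]", text)
        self.pos = 0
        self.save = save_attributes
        self.stack = []
        self.e_x = self.e_y = self.f_a = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"unexpected symbol {token!r}")
        self.pos += 1
        return token

    def E(self):
        self.F()
        self.e_x = self.f_a                   # attribute passing x := a
        while self.peek() == "+":
            self.take("+")
            if self.save:
                self.stack.append(self.e_x)   # sem Push(x) endsem
            self.F()                          # may recurse into E and overwrite e_x
            self.e_y = self.f_a               # attribute passing y := a
            if self.save:
                self.e_x = self.stack.pop()   # Pop(x)
            self.e_x = self.e_x + self.e_y    # x := x + y

    def F(self):
        if self.peek() == "(":
            self.take("(")
            self.E()
            self.take(")")
            self.f_a = self.e_x
        else:
            token = self.take()
            if not token.isdigit():
                raise SyntaxError(f"constant expected, found {token!r}")
            self.f_a = int(token)

def evaluate_sum(text, save_attributes):
    parser = SharedAttributeParser(text, save_attributes)
    parser.E()
    return parser.e_x
```

Without saving, the nested expression in 1+(2+3) overwrites the attribute of the outer sum, so the result is wrong; with Push and Pop the outer value survives the recursive parse. Sums without parentheses are unaffected either way, since no recursion into E takes place.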
4 Various compiler compilers

In the previous chapter we covered the theoretical background of compilers. In the following chapters we will show the practical application of these principles in the design of the compiler compiler Coco. However, before we go into the details of Coco, it will be interesting to look at some other compiler compilers. This will enable the reader to compare Coco with these systems.

There is extensive literature about compiler-generating systems. Bibliographies can be found in Räihä [1980] and Meijer and Nijholt [1982]. The scope of this book allows us to cover but a few of them, and even then only to a limited degree. Some of the best-known compiler compilers are YACC (Johnson [1975]), HLP84 (Koskimies [1984]), GAG (Kastens et al. [1982]), and MUG (Ganzinger and Giegerich [1984]). In the following paragraphs, we will compare these systems to each other.

The basic operation of today's compiler compilers is always the same. The compiler to be generated is described by a metalanguage based on attributed grammars. From this compiler description, a parser and a semantic evaluator are generated which constitute the essential parts of the resulting compiler. The generated compiler reads the source text to be translated, performs a syntax analysis to check the correctness of the input, and builds a syntax tree in memory. It then assigns attribute values to the tree nodes according to the attributed grammar. This process normally requires several passes which traverse the tree from left to right or from right to left. In each pass as many attributes as possible are evaluated. Finally the total semantics of the source program is represented by the attributes in the tree. The last pass generates the target code from the attribute values.
The various compiler compilers differ mainly in their compiler description languages and in their algorithms for traversing the syntax tree. Although much effort is spent on reducing execution time and attribute space, large memory requirements and long processing times are the main reasons why automatically generated compilers are still less efficient than hand-coded compilers. Therefore some compiler compilers like YACC and Coco bypass the construction of a syntax tree and accept that they are less powerful and less generally applicable than HLP84, GAG, or MUG.

The above-mentioned compiler compilers will be compared without going into too much detail. We will give a short example of their input language which will show the translation of a signed integer constant into its value. Normally, such tasks are handled by the lexical analyzer. However, they can also be solved with an attributed grammar, which is short and easy to understand and is therefore well suited as an example of attributed grammars.

Of course compiler compilers can achieve more than what is demonstrated in this short example. Most of them will only show their advantages on a large and complex task. However, these small examples will allow some interesting conclusions about the user-friendliness and the effort required to learn the description language of the various systems.

4.1 YACC - yet another compiler compiler

Origin and scope

YACC was produced by Stephen C. Johnson at Bell Laboratories in 1975. It runs under Unix and is therefore widely available. YACC accepts L-attributed grammars with the limitation that each grammar symbol has only one synthesized attribute and no inherited attributes. From the compiler description, YACC generates an LALR(1) parser (Lookahead LR(1)) and a semantic analyzer which is simply a collection of all of the semantic actions of the compiler description.
The user must supply a main program, a lexical analyzer, and a syntax-error handler.

Description language

The syntax parts of the YACC source language are written as BNF productions. All terminals (with the exception of literals) must be declared. For the production X0: X1 X2 ... Xn, the symbol $$ denotes the attribute of X0, $1 the attribute of X1, and $n the attribute of Xn. Semantic actions can be specified at any position between the symbols of the context-free grammar. They must be written in C and may contain an arbitrary sequence of valid C statements. Context conditions are written as if statements in semantic actions. At the end of the grammar, one can write C procedures which are called in the semantic actions. At this point a scanner procedure named yylex must also be provided.

Attribute processing

The attribute processing is done in a single pass during syntax analysis. An explicit syntax tree of the source language is not produced.

Implementation

YACC is written in C and produces compilers that are also written in C. It has been used for the translation of many languages, including C, APL, RATFOR, and Pascal.

4.1 Example Attributed grammar as input for YACC

    %{
    #include <stdio.h>
    #include <ctype.h>
    int yylex(void);
    int yyerror(char *s);
    %}
    %start Number              /* start symbol of the grammar */
    %token digit               /* declaration of terminals; */
                               /* literals don't have to be declared */
    %%                         /* separator */
    Number:    '-' Digitlist   {printf("%d\n", -$2);}
             | Digitlist       {printf("%d\n", $1);}
             ;
    Digitlist: digit           {$$ = $1;}
             | Digitlist digit
                 {if (($1>3276) || (($1==3276) && ($2>7)))
                    {printf("Constant too big"); $$ = 0;}
                  else
                    {$$ = $1*10 + $2;}
                 }
             ;
    %%
    int yylex(void) {          /* lexical analyzer */
      int ch;
      while ((ch=getchar())==' ');      /* skip blanks */
      if (isdigit(ch)) {yylval=ch-'0'; return (digit);}
      else return (ch);
    }
    int yyerror(char *s)       /* error procedure */
    {printf("%s\n",s); return 0;}
    int main(void)             /* main procedure */
    {return (yyparse());}
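For comparison with the description languages in this chapter, the translation that all of the examples describe can be written directly as a short Python sketch. It is not part of any of the systems discussed; the function name number_value and the returned (value, messages) pair are assumptions made for illustration. The range check is the one used in the examples (no constant above 32767), and, as in the YACC and Coco examples, an overflow is reported and the value is reset to 0.

```python
def number_value(text):
    """Translate Number = ["-"] digit {digit} into its value.

    Returns (value, messages). On overflow the message of the examples is
    recorded and the value is reset to 0, as the semantic actions do.
    """
    messages = []
    pos = 0
    negative = text.startswith("-")
    if negative:
        pos = 1
    if pos >= len(text) or text[pos] not in "0123456789":
        raise SyntaxError("digit expected")
    value = int(text[pos])
    pos += 1
    while pos < len(text):
        if text[pos] not in "0123456789":
            raise SyntaxError("digit expected")
        d = int(text[pos])
        # Context condition: 10*value + d must not exceed 32767.
        if value < 3276 or (value == 3276 and d < 8):
            value = 10 * value + d
        else:
            value = 0
            messages.append("Constant too big")
        pos += 1
    return (-value if negative else value), messages
```

The check is made before the multiplication, so the intermediate result never exceeds the permitted range; this is the reason for the somewhat unusual comparison with 3276 in the grammars.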
4.2 HLP84 - Helsinki language processor

Origin and scope

The first version of HLP was produced in 1978 under the name HLP78 at the University of Helsinki by Räihä et al. [1983]. Since then a new version, HLP84 (Koskimies [1984]), has been created which has little in common with the previous one. HLP84 accepts attributed grammars for a one-pass translation of programs. It generates a scanner, an LALR(1) parser with error handling, and a semantic evaluator to which user procedures can be attached. Symbol table handling can be partially described in the compiler definition language; in certain cases it is even done automatically. This reduces the number of semantic procedures required.

Description language

The description language Lisa is nonterminal oriented. This is in sharp contrast to other compiler description languages, where the emphasis is on productions. Each nonterminal is described by a block which forms the scope of its local objects. This is similar to the use of procedures in higher-level languages. A block contains all productions of a nonterminal in extended BNF, as well as the description of all terminals used in it. Within a block, attributes and local variables are declared in a Pascal-like form. A set of semantic rules consisting of assignments and function calls is attached to each production. These rules assign values to the synthesized attributes on the left-hand side and to the inherited attributes on the right-hand side of the production. An attribute a of a grammar symbol S is denoted by S.a. Terminals can have a single synthesized attribute. There is a specific language element for context conditions.

Lisa provides some standard facilities for frequently needed operations such as the definition of scopes and the searching of names in them. These mechanisms free the user from some clerical work.
For example, an identifier will be automatically searched in all open scopes and its node in the syntax tree will be automatically attributed according to the information in its symbol table entry.

Attribute processing

Attributes are processed in a single pass from left to right by means of an attribute stack and without an explicit syntax tree. This limits the application of HLP84 to languages that can be translated in one pass, although it is not required that semantic analysis is done during syntax analysis.
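The idea of an attribute stack can be sketched in Python for the signed-number grammar used as the running example of this chapter. This is a simplified illustration and not HLP84's actual evaluator: the parsing decisions are hard-coded instead of being taken from an LALR(1) table, and the function name evaluate is an assumption. What the sketch shows is that each reduction pops the attributes of the right-hand side and pushes the attribute of the left-hand side, so that no syntax tree is needed.

```python
def evaluate(text):
    """Evaluate SignedNumber = ["-"] DigitList with an attribute stack."""
    attrs = []            # the attribute stack
    negative = False
    seen_digit = False
    for ch in text:
        if ch == "-" and not negative and not seen_digit:
            negative = True                   # shift '-': it carries no attribute
        elif ch in "0123456789":
            attrs.append(int(ch))             # shift DigitToken with its attribute
            if not seen_digit:
                # reduce DigitList = DigitToken:  val := DigitToken
                attrs.append(attrs.pop())
                seen_digit = True
            else:
                # reduce DigitList = DigitList DigitToken:
                #   val := 10*DigitList.val + DigitToken
                d = attrs.pop()
                v = attrs.pop()
                attrs.append(10 * v + d)
        else:
            raise SyntaxError("unexpected character " + repr(ch))
    if not seen_digit:
        raise SyntaxError("digit expected")
    if negative:
        # reduce SignedNumber = '-' DigitList:  val := -DigitList.val
        attrs.append(-attrs.pop())
    return attrs.pop()
```

At any moment the stack holds only the attributes of the symbols that have been recognized but not yet reduced, which is why memory requirements stay small compared with a decorated syntax tree.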
Implementation

HLP84 was implemented on a Burroughs B7800 computer in Pascal. It generates compilers in Pascal. The system has been used for its own implementation and for the generation of a Pascal compiler.

4.2 Example Attributed grammar as input for HLP84

    external  -- declaration of external Pascal-objects
      type Outfile = Extfile;
      function WriteInt(f:Outfile; i:Integer): (f:Outfile) =
        procedure ExtOut(var f:Extfile; i:Integer);
      -- Connects the Pascal-procedure ExtOut with the Lisa-function
      -- WriteInt. Extfile and ExtOut are given in a special system file.

    nont Number;   -- description of the nonterminal Number (start sym.).
                   -- Number has no attributes.
      attrset Intval = (val: Integer);
      -- val is declared to be an integer attribute. The attribute
      -- declaration is given the name Intval.
      var out: Outfile;   -- global variable
      const max = 65535;

      nont SignedNumber: Intval;   -- description of the nt "SignedNumber".
                                   -- SignedNumber has an attr. set "Intval"
        nont DigitList: Intval;
          check val < max;         -- context condition
          token DigitToken: Integer = Digit;
          -- the terminal "DigitToken" with an attr. of type Integer is
          -- declared to consist of a single digit (Digit is predefined)
          DigitList = DigitToken;          -- syntactic production
            rules val:=DigitToken end;     -- semantic rules
            -- the attr. of a token is denoted by the name of the token
          DigitList = DigitList DigitToken;
            rules val:=10*DigitList.val+DigitToken end
        end DigitList;

        SignedNumber = '-' DigitList;
          rules val:=-DigitList.val end;
        SignedNumber = DigitList;
          rules val:=DigitList.val end
      end SignedNumber;

      Number = SignedNumber;
        rules post out:=WriteInt(out, SignedNumber.val);
        -- after SignedNumber is processed, its attribute val is written.
        end
    end Number

4.3 GAG - generator based on attribute grammars

Origin and scope

GAG was developed by Kastens, Hutt, and Zimmermann [1982] at the University of Karlsruhe. It accepts ordered attributed grammars where the attribute evaluation order of each nonterminal is fixed and independent of the context of the nonterminal. From the compiler description, an attribute evaluator and an LALR(1) parser are produced (by separate tools). The user must supply a lexical analyzer and a few other procedures such as a code generator. These modules together with some fixed parts constitute a complete compiler.

Description language

The grammar is written in extended BNF with special constructs for options and repetitions. All nonterminals and terminals (except literals) must be declared. Every production is associated with a set of semantic rules. In these rules the strongly typed, functional language Aladin is used, allowing attribute assignments and function calls. The right-hand side of an assignment can be a complex expression of attribute values, function calls, if expressions, syntax symbols, and many others (see Example 4.3). As a functional language Aladin has neither variables nor control statements. The attribute notation S.a means the attribute a of the symbol S. If S occurs in a production several times, the first occurrence is denoted by S[1], the second by S[2], and so on. There is a special language element for context conditions.

Attribute processing

A decorated syntax tree is built during attribute evaluation, but it is not traversed in alternating passes from left to right and from right to left, as is done in some other compiler compilers. A node is visited if there are no more
nodes to the left of it, and a parent node is visited when no more of the children can be visited. The syntax tree is therefore not processed in a straight direction. In fact, evaluation may sometimes step back some nodes to evaluate attributes that could not be computed earlier. In this manner, the number of passes over the tree can be reduced. The memory requirements for attributes in the syntax tree are optimized by various algorithms. After the attribute evaluation, the decorated syntax tree is passed to a user program which generates the target code.

Implementation

GAG is implemented in Standard Pascal under Unix BSD 4.2 on a Siemens 7.760 computer. It also generates compilers in Standard Pascal. Compilers for Pearl, LIS, Pascal, and Ada have already been produced by GAG.

4.3 Example Attributed grammar as input for GAG

    % ---------- symbol and attribute declarations ----------
    TERM    digit:     value: INT SYNT;   % value is a synthesized integer attribute
    NONTERM Number:    value: INT SYNT;
    NONTERM Digitlist: value: INT SYNT;

    RULE r1: Number ::= [right-hand side illegible in the source]
    STATIC
      Number.value :=
        IF [condition illegible in the source]
        THEN -Digitlist.value
        ELSE Digitlist.value
        [closing symbol illegible in the source]
      % No output of the attribute Number.value.
      % The attributed tree is passed to a user written program,
      % which prints the results.
    END;

    RULE r2: Digitlist ::= digit
    STATIC
      Digitlist.value := digit.value
    END;

    RULE r3: Digitlist ::= Digitlist digit
    STATIC
      Digitlist[1].value := 10*Digitlist[2].value+digit.value
      CONDITION (Digitlist[2].value<3276) OR
                ((Digitlist[2].value=3276) AND (digit.value<8))
      MESSAGE "Constant value too big"
    END;
4.4 MUG - modular compiler generator

Origin and scope

MUG (Modularer Übersetzer-Generator) was developed in 1985 at the University of Dortmund (Germany) by Ganzinger and Vach. It processes so-called one-sweep grammars (Engelfriet and File [1981]). MUG supports all phases of semantic analysis (attribute processing, optimization, and code generation). However, it does not produce a scanner or a parser. Those can be generated with YACC and then attached to the MUG system. Semantic modules are written in Modula-2.

The underlying principles of MUG are substantially different from traditional attributed grammars. Terminals are viewed as the types of some semantic objects (so-called semantic sorts), nonterminals are viewed as the types of syntax trees (so-called syntactic sorts). Productions are therefore viewed as functions, mapping objects of syntactic and semantic sorts into syntax trees which are themselves elements of syntactic sorts. The translation of trees of an input grammar into trees of an output grammar is called an attribute coupling of the two grammars.

Attributes can be classified as semantic attributes, which contain semantic values (and therefore, like the values of terminal symbols, are objects of semantic sorts) and syntactic attributes, which represent subtrees of the output grammar (and thus are objects of syntactic sorts). Semantic attributes are computed in semantic rules, whereas syntactic attributes are built by applying productions of the output grammar. Semantic attributes can also be viewed as 'terminal symbols' of the output grammar.

As a result of this view, several attribute coupling processes can be concatenated so that the output grammar of the first coupling becomes the input grammar of the second one. As an option, MUG can automatically combine the two attribute couplings into a single one. The user can therefore describe complex translation processes as a sequence of simple translations (e.g.
L-attributed grammars), which the system, hidden from the user, combines into a single attributed grammar that does not need to be L-attributed. In this manner, readability is balanced with efficiency.

Description language

MUG uses one description language for all translation phases. It is based on Modula-2. The production

    Prod1: A -> B c

is written in a function-like manner as

    CONSTRUCTOR Prod1(btree:B; cval:c): A
An attribute a of a nonterminal S is written as S^a. All nonterminals must be declared together with their attributes and attribute types. For semantic sorts, the user must write Modula-2 modules that export them as types unless they are standard types of Modula-2. There must be separate modules for the input grammar, the output grammar, and their attribute coupling. Semantic rules can contain assignments with arbitrary Modula-2 expressions, function calls, and if expressions. Syntactic attributes are calculated through constructors of the output grammar. Context conditions have no construct of their own. They must be specified within semantic functions.

Attribute processing

The attribute processor generated by MUG uses the 'one-sweep' method, which is an L-attributed processing of the syntax tree in which the children of each node may first have been brought into a suitable order.

Implementation

MUG was implemented in Modula-2 on a CADMUS computer. It generates compilers in Modula-2 and has been used for its own implementation.

4.4 Example Attributed grammar as input for MUG

    SIGNATURE DEFINITION MODULE Numbers;
    (*definition of the context-free input grammar*)
    FROM Values IMPORT Value;        (*syntactic sort from the output grammar*)
    FROM User IMPORT digit, minus;   (*semantic sorts (terminals)*)
    SORT Number, Digitlist;          (*syntactic sorts (nonterminals)*)
    (*rules of the context-free grammar*)
    CONSTRUCTOR PosNumber(dl:Digitlist): Number;
    CONSTRUCTOR NegNumber(m:minus; dl:Digitlist): Number;
    CONSTRUCTOR SingleDigit(d:digit): Digitlist;
    CONSTRUCTOR MoreDigits(dl:Digitlist; d:digit): Digitlist;
    (*attribution function for the context-free grammar*)
    OPERATOR Evaluate(n:Number): Value;
    END Numbers.

    SIGNATURE DEFINITION MODULE Values;
    (*definition of the context-free output grammar*)
    SORT Value;
    CONSTRUCTOR Result(val: INTEGER): Value;
    END Values.

    ATTRIBUTATION MODULE Numbers;
    (*attribute coupling of above grammars*)
    FROM Values IMPORT Value;
    OPERATOR Evaluate(n:Number): Value;
    (*declaration of the attributes*)
    ATTR Number    SATTR nval: Value;
    ATTR Digitlist SATTR dval: INTEGER;
    (*attributations of the productions*)
    CONSTRUCTOR PosNumber(dl:Digitlist): Number;
    BEGIN
      PosNumber^nval = Result(dl^dval);
      (*the constructor "Result" builds a syntactical
        attribute of type "Value"*)
    END PosNumber;
    CONSTRUCTOR NegNumber(m:minus; dl:Digitlist): Number;
    BEGIN
      NegNumber^nval = Result(-dl^dval);
    END NegNumber;
    CONSTRUCTOR SingleDigit(d:digit): Digitlist;
    BEGIN
      SingleDigit^dval = d;
    END SingleDigit;
    CONSTRUCTOR MoreDigits(dl:Digitlist; d:digit): Digitlist;
    BEGIN
      MoreDigits^dval = 10 * dl^dval + d;
    END MoreDigits;
    END Evaluate;
    END Numbers.

4.5 Coco - compiler compiler

Origin and scope

Coco arose in 1983 at the University of Linz as a successor of a parser generator. It processes L-attributed grammars, which are viewed as procedural descriptions of a translation process. The compiler description is translated into an LL(1) parser with automatic error recovery and a semantic evaluator to which user modules can be attached. The user must further supply a main program and a scanner (for which there is a scanner generator). It is possible to generate multi-pass compilers with Coco.
Description language

The compiler description language Cocol is based on context-free grammars in Wirth's EBNF notation. All terminals and nonterminals must be declared. Each syntax symbol can have one or more attributes. A symbol S with an output attribute a is written as S<out:a> wherever it occurs within a production. Semantic actions are written directly in Modula-2. They may appear at arbitrary points on the right-hand side of the productions. Attributes can be accessed like normal variables. Context conditions are written as if statements in semantic actions.

Attribute processing

Semantic evaluation takes place during the syntax analysis. A syntax tree of the input is not built. Productions are processed strictly from left to right. When a semantic action is encountered, it is executed immediately. Attribute values of terminals are returned by the scanner, those of nonterminals are passed using assignments generated by Coco.

Implementation

Coco is implemented in Modula-2 on various microcomputers including Macintosh, IBM-PC, Atari, and Lilith. It is also available on IBM mainframes. Coco generates compilers in Modula-2. It has been used for the construction of a multi-pass Modula-2 compiler and for the generation of several tools for static program analysis.

4.5 Example Attributed grammar as input for Coco

    GRAMMAR Number
    SEMANTIC DECLARATIONS
      FROM InOut IMPORT WriteString, WriteInt;
      VAR value,value1: INTEGER;
    TERMINALS
      "-"
      digit <out:value>
    NONTERMINALS
      Number
      Digitlist <out:value>
    RULES
      Number =
          Digitlist<out:value>        sem WriteInt(value,5); endsem
        | "-" Digitlist<out:value>    sem WriteInt(-value,5); endsem.
      Digitlist<out:value> =
        digit<out:value>
        { digit<out:value1>           sem
                                        IF (value<3276) OR
                                           ((value=3276) AND (value1<8))
                                        THEN value:=10*value+value1;
                                        ELSE value:=0;
                                          WriteString("Constant too big");
                                        END;
                                      endsem
        }.
    ENDGRAM

4.6 Summary

This short overview of some of the better-known compiler compilers has shown that many powerful systems with complex input languages exist for the definition of many exotic special cases. Why then are these generators so seldom used for practical applications? There are many reasons. The most significant is the fact that automatically generated compilers are simply less efficient than manually coded ones. According to Koskimies et al. [1982], a Pascal compiler produced with HLP78 ran seven times slower and used three times as much memory (only for its code!) as a manually coded compiler.

However, efficiency is not the main goal of a compiler compiler. Often it is more important that the compiler description be short, formal, and complete. Then it can be used as a prototype of a compiler implementation for a new language or to study the techniques of compiler construction as such.

Compiler description languages are sometimes not easy to read. In most cases ordinary BNF is used for the syntax definition. Although concise and elegant, this notation often looks unnatural because of the recursion needed to express repetitions. Attributes usually appear only in semantic rules and not with the grammar symbols. This makes the productions short, but the reader must extract from the semantic rules those attributes which belong to a given syntax symbol. In many cases, the semantic rules may only be attribute assignments. Therefore, important parts of the actual translation must be hidden in procedures. Having these difficulties to contend with may even make the compiler compiler a burden rather than a help.

Finally, most compiler compilers require a lot of memory themselves.
For example, GAG required 4 megabytes of main memory for the generation of an Ada compiler, and this amount of memory is not available on many microcomputers.
We believe that a compiler compiler should be a tool which is easy to understand and easy to use. Above all, its input language should be clear and natural, but its availability (e.g. on microcomputers) and efficiency are equally important. These were the considerations behind the development of Coco and its input language Cocol. Table 4.1 summarizes the main features of the described compiler compilers.
Table 4.1 Properties of various compiler compilers [the body of the table is not legible in the source]
5 The compiler description language Cocol

This chapter describes Cocol, the input language of the compiler generator Coco. A Cocol text essentially consists of an attributed grammar and declarations. From this description, Coco generates a parser and a semantic evaluator. The user has to provide a main program, a scanner, an error message module and semantic modules to get a complete compiler. Some of these modules can be generated by tools or are standard modules that do not depend on the language to be processed.

The attributed grammar consists of a context-free grammar as a description of the compiler input and of semantic information as a description of how this input is to be translated. When designing an attributed grammar one usually starts with the context-free grammar and completes it step by step with attributes, semantic actions and context conditions. Therefore this chapter is arranged in two parts: the specification of Cocol as a syntax description language and its specification as a semantic description language.

5.1 Lexical structure

A grammar description in Cocol consists of keywords, identifiers, strings, numbers, comments and special characters.

Keywords

    ALIAS         ENDSEM   MACROS        RULES
    ANY           EPS      NONTERMINALS  SEM
    DECLARATIONS  GRAMMAR  OUT           SEMANTIC
    ENDGRAM       IN       PRAGMAS       TERMINALS

Keywords must be written with upper-case letters, except for the following keywords that may also be written with lower-case letters, as they often appear in a context where they are not to be emphasized.

    alias  any  endsem  eps  in  out  sem

Identifiers

    identifier = letter {letter | digit}.

Identifiers may be of arbitrary length. Case is significant.

Strings

    string = quote {anybutquote} quote
           | apostrophe {anybutapostrophe} apostrophe.

quote means the character ", apostrophe means the character '. anybutquote is any character except quote, anybutapostrophe is any character except apostrophe. Strings must not extend beyond line boundaries.

Numbers

    number = digit {digit}.

Special characters

    for the syntax description:    [illegible in the source]
    for the semantic description:  [illegible in the source]

Comments

Comments start with the string '--' and extend to the end of the line.

5.2 Cocol as a syntax description language

The kernel of a Cocol text is the syntactic description of the language that the generated compiler is to process.

    Grammar = "GRAMMAR" identifier
                SyntaxDeclarations
                Productions
              "ENDGRAM".

The syntax description consists of declarations for terminals and nonterminals and of the context-free grammar. The identifier following the keyword
GRAMMAR is the grammar name. It is the root symbol (start symbol) of the grammar and must be declared as a nonterminal. We start with the productions and continue with the declarations later.

5.2.1 Productions

The productions of the context-free grammar are written in an EBNF suggested by Wirth [1982] (square brackets enclose optional expressions, curly brackets denote repetition zero or more times).

    Productions = "RULES" {Production}.
    Production  = identifier "=" Expression ".".
    Expression  = Term {"|" Term}.
    Term        = Factor {Factor}.
    Factor      = Symbol
                | "(" Expression ")"
                | "[" Expression "]"
                | "{" Expression "}"
                | "eps"
                | "any".
    Symbol      = identifier | string.

5.1 Example Cocol grammar for real constants

    RULES
      Real     = Integer "." [Integer] [Exponent].
      Integer  = digit {digit}.
      Exponent = "E" ["+"|"-"] Integer.

The symbols Real, Integer and Exponent are nonterminals. The symbols digit, "E", ".", "+" and "-" are terminals (they have no productions).

eps

The symbol eps denotes the empty string (see Section 2.1) and is used to describe empty alternatives.

5.2 Example The use of eps

    Sign = "+" | "-" | eps.

is equivalent to

    Sign = ["+" | "-"].

eps is not strictly needed for the syntax description, but it is required if one has to attach semantic actions to empty alternatives.

any

The symbol any denotes any terminal, which is not the start of the alternative
chain to which the any symbol belongs. Therefore any is a representative of a whole set of terminals, i.e. all terminals which cannot be recognized instead of it at that point in the grammar.

5.3 Example The use of any

    Option = "$" any.

Here, any means any terminal.

    Token = keyword | identifier | number | any.

Here, any means any terminal except keyword, identifier or number (which may be recognized instead of it).

    String = '"' {any} '"'.

Here, any means any terminal except '"' (which may be recognized instead of it).

Properties of a correct grammar

Coco generates a compiler only if the grammar is:

1. complete: there must exist a rule for every nonterminal;
2. free of redundancy: every nonterminal must occur in at least one derivation of the root symbol;
3. free of cycles: there must not be a nonterminal which can be derived from itself in one or more steps;
4. terminating: every nonterminal must be able to produce a string of terminals;
5. unambiguous: the grammar must be LL(1).

LL(1) conflicts do not necessarily mean serious errors. They can be viewed as warnings in situations where the generated compiler will take the first matching alternative and ignore the others. Sometimes this is what the user wants, as in the well-known case of the dangling else.

5.4 Example How the compiler treats LL(1) conflicts

This is the grammar of the dangling else:

    Statement   = ... | IfStatement | ... .
    IfStatement = "IF" Expr "THEN" Statement ["ELSE" Statement].

When analyzing the string

    IF a THEN IF b THEN c ELSE d

it is not clear whether the else clause belongs to the inner or to the outer if. During parsing the first matching alternative is the else of the inner
if. The generated compiler takes this alternative.

5.2.2 Declarations

All terminals and nonterminals must be declared before they can be used in productions. Declarations have the following order:

    SyntaxDeclarations = TerminalDeclarations
                         [PragmaDeclarations]
                         NonterminalDeclarations.

Terminal declarations

    TerminalDeclarations = "TERMINALS" {Symbol [AliasName]}.
    AliasName            = "alias" Symbol.
    Symbol               = identifier | string.

Terminals are declared by enumerating them after the keyword TERMINALS. Consecutive token numbers are assigned to them in the order of their declaration. The first symbol gets the number 1, the next one the number 2, and so on. If a symbol name contains a special character, it must be enclosed in quotes (e.g. "+", "plus-symbol").

The end-of-file symbol must not be declared. It is always assumed to have the token number 0. The lexical analyzer has to supply it as the last symbol of the input text. On its arrival, the syntax analyzer automatically interprets it as an indication that the input is exhausted. The end-of-file symbol must not (and cannot) be specified in a production.

A symbol may be given an alias name, which is used in error messages by the generated compiler. If the alias name is omitted, the symbol name is used instead. Alias names allow the use of short names in the grammar and of expressive names in error messages.

5.5 Example Terminal declarations

    TERMINALS
      id    alias identifier
      ":="  alias "becomes symbol"
      ";"   alias semicolon

Pragma declarations

Pragmas are a special feature of Cocol. They are neither terminals nor nonterminals and must not be used in productions. They may occur at any position in the input text and are read by the parser as if they were terminals, but they do not belong to the syntax of the language (examples of pragmas are
options, the end-of-line symbol, and comments). Parsing is not influenced by pragmas but they may carry semantic information (such as line numbers, option values, etc.). Pragmas can be used to propagate information between the passes of a multi-pass compiler.

    PragmaDeclarations = "PRAGMAS" {Symbol}.
    Symbol             = identifier | string.

Pragmas are declared by enumerating them after the keyword PRAGMAS. They are assigned consecutive token numbers, starting with the highest terminal number plus one.

5.6 Example Pragma declarations

    PRAGMAS
      "end of line"
      option

The purpose of pragmas will become clear when we attach semantic actions to them (see Example 5.11).

Nonterminal declarations

    NonterminalDeclarations = "NONTERMINALS" {identifier [AliasName]}.
    AliasName               = "alias" Symbol.
    Symbol                  = identifier | string.

Nonterminals are declared by enumerating them after the keyword NONTERMINALS. Their declaration order is insignificant. Nonterminals can be given an alias name too. The root symbol (grammar name) must also be declared as a nonterminal.

5.7 Example Nonterminal declarations

    NONTERMINALS
      Stat alias Statement
      Expr alias Expression

5.3 Cocol as a semantic description language

The semantics of a translation are specified by attaching semantic actions, attributes and semantic declarations to the syntax description. The following grammar of Cocol shows that there are only a few locations (the Attributes and SemAction parts, underlined in the original) where semantic parts have to be added to a syntax description in order to get an attributed grammar.
    CocolText = "GRAMMAR" identifier
                  SyntaxDeclarations
                  Productions
                "ENDGRAM".
    SyntaxDeclarations = TerminalDeclarations
                         [PragmaDeclarations]
                         NonterminalDeclarations.
    TerminalDeclarations = "TERMINALS"
                           {Symbol [Attributes] [AliasName]}.
    PragmaDeclarations = "PRAGMAS"
                         {Symbol [Attributes] [SemAction]}.
    NonterminalDeclarations = "NONTERMINALS"
                              {identifier [Attributes] [AliasName]}.
    AliasName   = "ALIAS" Symbol.
    Productions = "RULES" {Production}.
    Production  = identifier [Attributes] "=" Expression ".".
    Expression  = Term {"|" Term}.
    Term        = Factor {Factor}.
    Factor      = Symbol [Attributes]
                | "(" Expression ")"
                | "[" Expression "]"
                | "{" Expression "}"
                | SemAction
                | "eps"
                | "any".
    Symbol      = identifier | string.

5.3.1 Semantic actions

A semantic action is a statement sequence on the right-hand side of a production, which is executed after the symbol to the left of it has been recognized and before the symbol to the right of it is recognized. Semantic actions may be written in any algorithmic programming language (in our Coco implementation this language is Modula-2). There are two kinds of semantic actions.

    SemAction = SimpleAction | SemMacroCall.

Simple semantic actions

    SimpleAction = "sem" {any} "endsem".

A semantic action is enclosed by the keywords sem and endsem. Between them, any statements such as assignments, procedure calls, conditional statements and loops are allowed. The syntactical correctness of the statements is not checked by Coco.
5.8 Example Semantic actions

We want to have a compiler which counts the words in a text. The context-free grammar is

    Text = {Word}.

Now we add semantic actions.

    Text = sem count:=0 endsem {Word sem count:=count+1 endsem} sem IF count>0 THEN WriteCard(count,3); WriteString(" words") END endsem.

Since syntactic and semantic parts are intermixed and hard to read, we separate them into two 'columns':

    Text =             sem count:=0 endsem
      { Word           sem count:=count+1 endsem
      }                sem IF count>0 THEN
                             WriteCard(count,3); WriteString(" words")
                           END
                       endsem.

Syntactic and semantic parts are now clearly separated. The production must be read line by line from left to right.

The parameters of procedure calls in semantic actions may be specified as input, output or transient parameters by writing the characters '↓', '↑' or '↕' in front of them ('!', '', and '!4' on an ASCII keyboard). This is a simple way to make procedure calls more readable. In the resulting compiler these marks are removed.

5.9 Example Indication of data flow at parameters

    ComputeValues(↓argument1, ↓argument2, ↑result);

Semantic macros

Sometimes a semantic action is needed at more than one location in a grammar. To avoid rewriting the action, the user can define a macro for it and call it wherever it is needed.

    SemMacroDefinition = "sem" ":" MacroName ":" {any} "endsem".
    SemMacroCall       = "sem" "(" MacroName ")" "endsem".
    MacroName          = identifier.

A macro definition is a semantic action headed by a macro name which is enclosed in colons. It must be given in a special section of the semantic declarations (see Section 5.3.4).

Note: The use of semantic macros also reduces the code size of the resulting compiler.
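The order in which the semantic actions of Example 5.8 run can be illustrated with a small Python sketch. This is not the Modula-2 code that Coco generates; the function name `count_words` and the assumption that a Word is any whitespace-delimited token are introduced here only for illustration.

```python
def count_words(text):
    """Mimic the attributed production Text = {Word} of Example 5.8.

    Assumption: a Word is any whitespace-delimited token.
    Returns the final count and the lines the last action would print.
    """
    output = []
    words = text.split()

    count = 0                      # sem count:=0 endsem (runs before the loop)
    for _word in words:            # { Word ...
        count = count + 1          #   sem count:=count+1 endsem (once per Word)
                                   # }
    if count > 0:                  # last action, runs once after the loop
        # WriteCard(count,3) prints the number right-aligned in 3 columns
        output.append("%3d words" % count)
    return count, output
```

Each action runs exactly at the point where it stands in the production: the first before any Word is read, the second once per Word, the third after the iteration has ended.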
5.10 Example Semantic macros

Suppose the last semantic action of Example 5.8 is needed more than once. The action is defined as a macro in the semantic declarations as follows (see Section 5.3.4):

    MACROS
      sem :WriteCounter:
        IF count>0 THEN
          WriteCard(count,3); WriteString(" words")
        END
      endsem

It may then be called by writing

    sem (WriteCounter) endsem

Semantic actions for pragmas

A semantic action may be associated with the declaration of a pragma. This means that the action is executed every time the parser reads the pragma. In this way a pragma can cause the execution of a semantic action although it does not occur in any production.

5.11 Example Semantic actions for pragmas

    PRAGMAS
      eolsy  sem PrintLineInfo;   -- call a semantic procedure
                 Emit(↓eol)       -- write pragma to next interpass file
             endsem

5.3.2 Attributes

Attributes describe semantic properties of symbols and their context.

    Attributes    = "<" InAttributes ">"
                  | "<" OutAttributes ">"
                  | "<" InAttributes ";" OutAttributes ">".
    InAttributes  = "in" ":" InAttr {"," InAttr}.
    OutAttributes = "out" ":" OutAttr {"," OutAttr}.
    InAttr        = identifier | number.
    OutAttr       = identifier.

In Cocol, attributes play the role of parameters of the grammar symbols. They are classified into input attributes, which are passed to a nonterminal for its recognition, and output attributes, which arise during the recognition of a symbol. We also distinguish between formal and actual attributes. Formal attributes occur in the declaration of a symbol or are attached to nonterminals on
the left-hand side of a production. Actual attributes are attached to symbols on the right-hand side of a production.

5.12 Example Attributes

    NONTERMINALS
      Variable <in:type; out:object>       -- type:   formal input attribute
                                           -- object: formal output attribute
    RULES
      Variable <in:type; out:object> = ... -- type:   formal input attribute
                                           -- object: formal output attribute
      Declaration =
        Variable <in:tp; out:obj> ...      -- tp:  actual input attribute
                                           -- obj: actual output attribute

Attribute names may be used like variables in semantic actions.

Attributes of nonterminals

Nonterminals may have input and output attributes of arbitrary types. The type of an attribute is declared like the type of any other variable (see Section 5.3.4). Formal and actual attributes must be assignment compatible in the sense of Modula-2, although this is not checked by Coco. Whenever a nonterminal occurs, all its attributes must follow it. Formal and actual attributes must correspond in number, sequence, and kind (in or out). A numeric constant may only be specified as an actual input attribute.

Attribute evaluation is similar to parameter passing in procedures: before the recognition of a nonterminal is started, the values of the actual input attributes of the nonterminal are assigned to its formal input attributes; when the nonterminal has been recognized, the formal output attribute values are assigned to its actual output attributes.

Attributes of terminals and pragmas

Terminals and pragmas may have only output attributes. For implementation reasons their size is restricted to word size. This restriction can be circumvented by using abstract data types for longer attributes. Whenever a terminal or a pragma occurs, all its attributes must follow it. For terminals, the names of the formal attributes are insignificant, but for pragmas they are significant as they may be used in a semantic action.
Pragmas do not have actual attributes since they cannot appear on the right-hand side of a production. The attribute values of terminals and pragmas are supplied by the scanner (see Section 6.4.2).
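The analogy between attribute evaluation and parameter passing can be made concrete with a short Python sketch, loosely following Example 5.12. It is only an illustration under stated assumptions: the token layout (a type name followed by one identifier), the dictionary used as the "object", and the function names `variable` and `declaration` are hypothetical and do not come from Coco.

```python
def variable(tokens, pos, type_):
    """Variable<in:type; out:object>.

    'type_' plays the role of the formal input attribute; the returned
    dictionary plays the role of the formal output attribute 'object'.
    """
    name = tokens[pos]                       # recognize one identifier
    obj = {"name": name, "type": type_}      # value of the output attribute
    return pos + 1, obj


def declaration(tokens):
    """Hypothetical production: Declaration = typename Variable<in:tp; out:obj>."""
    tp = tokens[0]                           # actual input attribute
    # The call copies tp into the formal input attribute; on return the
    # formal output attribute is copied into the actual attribute obj.
    _pos, obj = variable(tokens, 1, tp)
    return obj
```

A call such as `declaration(["INTEGER", "x"])` passes `"INTEGER"` down into `variable` and receives the constructed object back, which mirrors the two assignment steps described above.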
5.3.3 Context conditions

There is no special language construct for context conditions in Cocol. They are written as conditional statements in semantic actions. This has the drawback of hiding them somewhat but has the advantage that arbitrary error actions can be associated with them.

5.13 Example Context conditions

    sem IF type1=type2 THEN    -- context condition
          ...                  -- semantic action
        ELSE
          ...                  -- error action
        END
    endsem

5.3.4 Semantic declarations

All variables, procedures and named constants that are used as attributes or in semantic actions must be declared. The compiler description can be viewed as a module to which these objects are local. The user may also import objects from other modules.

    SemanticDeclarations = [ObjectDeclarations] [SemMacroDeclarations].

Declarations of semantic objects

    ObjectDeclarations = "SEMANTIC" "DECLARATIONS" modulatext.

modulatext is an arbitrary text of import statements, constant, type, variable, or procedure declarations in Modula-2. The syntax of this text is not checked by Coco.

5.14 Example Declarations of semantic objects

    SEMANTIC DECLARATIONS
      FROM InOut IMPORT WriteCard, WriteString;
      FROM UserModule IMPORT UserProcedure;
      CONST maxint = 32767;
      VAR field: ARRAY[1..100] OF CHAR;
      PROCEDURE Equal(x,y:ARRAY OF CHAR): BOOLEAN;
      BEGIN ... END Equal;
Declaration of semantic macros

At this point the user may declare a set of semantic macros which can be used in the productions.

    SemMacroDeclarations = "MACROS" {SemMacroDefinition}.
    SemMacroDefinition   = "sem" ":" MacroName ":" {any} "endsem".
    MacroName            = identifier.

An example of the definition and the use of a semantic macro can be found in Section 5.3.1 (Example 5.10).

5.3.5 Scope of semantic objects

For implementation reasons, the scope of a semantic object cannot be restricted to a single production: all declared and imported objects are global to the whole compiler description. This means that the value of a semantic object may be destroyed by a nonterminal that is processed between the assignment and the use of that object. One has to resort to the following remedies:

1. Naming conventions. Every production should use its own names for those attributes and semantic objects which may be destroyed by another production. This reduces the problem to semantic objects of recursive nonterminals.

2. Stacking. All values which may be destroyed by a nonterminal should be stacked before this nonterminal is entered and unstacked afterwards.

5.15 Example Stacking of semantic objects

    Expression<out:exprval> =
      Term<out:exprval>
      { "+"                 sem Push(↓exprval) endsem
        Term<out:x>         sem Pop(↑exprval); exprval:=exprval+x endsem
      }.
    Term<out:termval> =
      Factor<out:termval>
      { "*"                 sem Push(↓termval) endsem
        Factor<out:x>       sem Pop(↑termval); termval:=termval*x endsem
      }.
    Factor<out:factval> =
      integer<out:factval>
      | "(" Expression<out:factval> ")".

The original values of exprval and termval are destroyed by the recursive calls to Term and Factor, so they must be saved on a stack.
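Why the stacking in Example 5.15 is needed can be demonstrated with a minimal Python sketch in which all attributes are shared fields, as they are global in a Cocol description. The class `Evaluator`, the `use_stack` switch, and the whitespace-separated token format are assumptions made for this illustration; the sketch is a hand-written recursive-descent evaluator, not Coco's table-driven parser.

```python
class Evaluator:
    """Evaluate Expression/Term/Factor with global (shared) attributes."""

    def __init__(self, tokens, use_stack=True):
        self.tokens = tokens
        self.pos = 0
        self.stack = []
        self.use_stack = use_stack
        # All semantic objects are global to the whole description.
        self.exprval = self.termval = self.factval = self.x = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self):
        self.term()
        self.exprval = self.termval              # Term<out:exprval>
        while self.peek() == "+":
            self.advance()
            if self.use_stack:
                self.stack.append(self.exprval)  # Push(exprval)
            self.term()
            self.x = self.termval                # Term<out:x>
            if self.use_stack:
                self.exprval = self.stack.pop()  # Pop(exprval)
            self.exprval = self.exprval + self.x

    def term(self):
        self.factor()
        self.termval = self.factval              # Factor<out:termval>
        while self.peek() == "*":
            self.advance()
            if self.use_stack:
                self.stack.append(self.termval)  # Push(termval)
            self.factor()
            self.x = self.factval                # Factor<out:x>
            if self.use_stack:
                self.termval = self.stack.pop()  # Pop(termval)
            self.termval = self.termval * self.x

    def factor(self):
        tok = self.advance()
        if tok == "(":
            self.expression()
            self.factval = self.exprval          # Expression<out:factval>
            self.advance()                       # skip ")"
        else:
            self.factval = int(tok)              # integer<out:factval>


def evaluate(text, use_stack=True):
    ev = Evaluator(text.split(), use_stack)
    ev.expression()
    return ev.exprval
```

With stacking, `evaluate("2 * ( 3 + 4 )")` yields 14. Without it, the inner Term overwrites `termval` before the multiplication is carried out, and the same input yields 28.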
6 The compiler compiler Coco

This chapter describes the compiler compiler Coco from the user's point of view. It contains everything the user needs to know in order to produce a compiler with Coco. Section 6.1 presents a survey of the main characteristics of Coco, Section 6.2 describes the components of the generated compilers, and Section 6.3 shows how these compilers work. Since Coco produces only the basic parts of a compiler, the user must supply additional modules to get a complete compiler. Section 6.4 describes the interfaces for these modules and Section 6.5 shows how a multi-pass compiler can be produced with Coco.

6.1 Characteristics

Coco is a program which generates the basic parts of a compiler from a compiler description that is supplied as its input. The characteristics of Coco are:

1. The compiler definition language Cocol is easy to read and easy to learn. It is based on L-attributed grammars whose syntax rules are written in Wirth's EBNF notation, and whose semantic actions are coded directly in Modula-2.

2. Coco and the compilers produced by it are small and efficient, since they use simple analysis techniques (table-driven top-down parsing and L-attributed grammars), and since the parser tables are encoded in a very compact form (G-code). Therefore, they can be efficiently used on microcomputers with a small memory and limited processor performance.
3. The generated compilers contain a syntax error-recovery algorithm that is automatically derived from the attributed grammar. This frees the user from developing individual error handlers for each target compiler.

4. The user can attach modules of his own to the generated compiler parts, thus adapting the compiler to his particular needs.

5. The input grammar is checked for completeness, consistency, and unambiguity.

6. Coco supports the production of multi-pass compilers for languages that cannot be translated in a single pass, or that are so large that a single-pass compiler will not fit into memory.

7. Coco offers the possibility of excluding selected source text portions from syntax analysis. Thus, it is possible to describe complements of regular languages, or to forward parts of the input from one pass to the next without modification.

8. Besides terminals and nonterminals, Coco provides a third class of symbols called pragmas. Pragmas are special terminals that can appear at arbitrary positions in the input stream, but are not part of the syntax of the language itself (e.g. end-of-line symbols or compiler options).

How to invoke Coco

The invocation of Coco and the naming of the files involved depend on the computer on which Coco is running. We describe the version for the Apple Macintosh.

On the Macintosh, Coco is invoked by clicking its icon and by selecting an input file from the open dialog box which shows all available text files. Fig. 6.1 is a block diagram of a Coco run.

Fig. 6.1 Input and output files of Coco

Coco reads a compiler description and produces the following:

1. a syntax analyzer as described in Section 2.5 together with parser tables (G-code and symbol information);
2. a semantic evaluator as described in Section 3.6;

3. a source list of the Cocol input with any syntax and semantic error messages, with the results of the grammar tests and with statistical data about the grammar.

The syntax analyzer and the semantic evaluator are generated from program frames on files. On the Macintosh, the generated parts are written to the following files:

    Syntax analyzer:     grammarnamesyn.DEF, grammarnamesyn.MOD
    Semantic evaluator:  grammarnamesem.DEF, grammarnamesem.MOD
    Source list:         inputname.LST

grammarname is the grammar name specified in Cocol, inputname is the name of the input file. Section 8.3 shows an example of these files.

6.2 Components of the generated compiler

In order to get a complete compiler, the user must attach his own modules to the compiler parts produced by Coco. The following table shows which parts are generated by Coco, which must be supplied by the user, and which are available as standard modules.

    Generated by Coco     User-supplied       Standard module
    Syntax analyzer       Main program        Error message module
    Semantic evaluator    Lexical analyzer
                          Semantic modules

Hence, Coco generates only the basic parts of a compiler (those which are described by the attributed grammar). For flexibility, the remaining parts may be written individually, although they are very similar in all compilers (see program listings in Appendix F).

The lexical analyzer can be generated with the scanner generator Alex (Mössenböck [1986]), which is a separate tool not described in this book. It produces a scanner module in Modula-2 that fits the modules generated by Coco exactly.

The semantic modules are written in Modula-2. Only a few conventions have to be obeyed (see Section 6.4).
6.3 Operation of the generated compiler

Figure 6.2 shows the overall structure of a generated single-pass compiler. The main program calls the syntax analyzer. The syntax analyzer parses the source program by interpreting the G-code and executes semantic actions contained in the semantic evaluator, which in turn call semantic procedures to emit the target code. A filter procedure between the actual syntax analyzer and the lexical analyzer filters any pragmas out of the input stream and processes them semantically.

To create a multi-pass compiler, one must write a compiler description for each pass separately and translate it with Coco. This results in a syntax analyzer and a semantic evaluator for each pass. Figure 6.3 shows the interaction of the generated parts in a two-pass compiler. The first pass reads the source program, processes it and generates an intermediate language (IL). The second pass reads the intermediate language, processes it again and generates the target code.

Fig. 6.2 Overall structure of a generated single-pass compiler (main program, syntax analyzer, lexical analyzer, error message module, semantic evaluator, semantic procedures)

Fig. 6.3 Overall structure of a generated two-pass compiler (main program, lexical analyzer, syntax analyzers 1 and 2, semantic evaluators 1 and 2, semantic procedures 1 and 2)
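The filter procedure between the lexical analyzer and the syntax analyzer can be sketched as follows in Python. The sketch relies on the numbering rule from Chapter 5 (pragmas receive the token numbers above the highest terminal number); the function name `filter_pragmas`, the `(typ, at)` pair format, and the action table are assumptions made for illustration and are not part of Coco's generated code.

```python
def filter_pragmas(symbols, max_terminal, pragma_actions):
    """Pass terminals through and process pragmas on the way.

    symbols        -- iterable of (typ, at) pairs as a scanner would deliver them
    max_terminal   -- highest terminal number; larger numbers denote pragmas
    pragma_actions -- dict mapping a pragma number to its semantic action
    """
    for typ, at in symbols:
        if typ > max_terminal:
            # A pragma: execute its semantic action (if any) and hide the
            # symbol from the syntax analyzer.
            action = pragma_actions.get(typ)
            if action is not None:
                action(at)
            continue
        yield typ, at          # a terminal (or end-of-file, number 0)
```

Because the pragma never reaches the parser, it can occur anywhere in the input without appearing in any production.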
6.4 Interfaces of the generated compiler

A compiler nucleus produced by Coco has four interfaces (shown in Fig. 6.4). It is called by the main module, reads the input stream, translates it into an output stream, and produces error messages. This nucleus is the same for all generated compilers. The user must attach some of his own modules to these interfaces to adapt the compiler to his particular needs.

Fig. 6.4 Interfaces of a generated compiler

6.4.1 Caller interface

The main program must call the syntax analyzer of the generated compiler to perform the syntax analysis and semantic processing of the input text. The following definition module shows the interface between the syntax analyzer and the main program.

    DEFINITION MODULE grammarnamesyn;
      VAR printinput: BOOLEAN;     (*trace the input?*)
          printnodes: BOOLEAN;     (*trace the parser?*)
      PROCEDURE Parse(VAR correct:BOOLEAN);
    END grammarnamesyn.

grammarnamesyn is the name of the generated syntax analyzer (the grammar name from Cocol with the suffix syn).

The procedure Parse is the actual syntax analyzer. It must be called from the main program of the compiler. Prior to this, the lexical analyzer (see Section 6.4.2) must be initialized and ready to supply the first symbol. The parameter correct shows whether syntax errors have been found.

The variables printinput and printnodes can be set to TRUE in order to produce a trace of the syntax analysis for debugging.
6.4.2 Input interface

The syntax analyzer expects the input from a procedure GetSy which must be supplied by the user in a module grammarnamelex (grammar name from Cocol with the suffix lex). The corresponding definition module must look like this:

    DEFINITION MODULE grammarnamelex;
      VAR typ:  CARDINAL;               (*current symbol number*)
          at:   ARRAY[1..10] OF CHAR;   (*attributes of the current symbol*)
          line: CARDINAL;               (*current symbol line number*)
          col:  CARDINAL;               (*current symbol column number*)
      PROCEDURE GetSy;
    END grammarnamelex.

Every time the syntax analyzer needs a new terminal, it calls the procedure GetSy which returns the symbol number, line number and column number of the next source symbol in the global variables typ, line and col. It also fills the array at. If a symbol has i attributes, then at[1..i] holds their values. at is implicitly imported in any attributed grammar. It can contain a maximum of 10 attributes, which experience has shown to be sufficient. If imported, typ, line, and col can be used in the attributed grammar to get the type and the attributes of symbols that are recognized by the special symbol any.

The symbol numbers returned by GetSy must correspond to the declaration sequence of the terminals and pragmas in the compiler description. The first declared symbol must have the number 1, the next symbol must have 2 and so on. At the end of the input stream GetSy must return an end-of-file symbol which by convention has the symbol number 0.

6.4.3 Output interface

For the generation of object code and other compiler outputs the user is not bound by any restrictions. One can arbitrarily attach one's own modules to the compiler nucleus and call one's procedures from the semantic actions of the attributed grammar. Thus, the output interface is the interface to all user-supplied semantic modules.
It is described by the import clauses in the semantic declarations of the compiler description and by the imported definition modules.
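The conventions of the input interface in Section 6.4.2 (symbol numbers in declaration order starting at 1, number 0 at end of input, position and attributes delivered in shared variables) can be sketched in Python as follows. The terminal list, the whitespace tokenization, and the use of the token spelling as first attribute are simplifying assumptions for illustration; a real scanner written for Coco is a Modula-2 module and keeps attributes within word size.

```python
import re

# Hypothetical terminal declaration order: ident = 1, number = 2, plus = 3.
TERMINALS = ["ident", "number", "plus"]
EOF = 0                                   # end-of-file symbol by convention


class Lexer:
    """Delivers symbols through the fields typ, line, col and at."""

    def __init__(self, text):
        self.tokens = []
        for line_no, source_line in enumerate(text.splitlines(), start=1):
            for match in re.finditer(r"\S+", source_line):
                # Columns are counted from 1, like the line numbers.
                self.tokens.append((match.group(), line_no, match.start() + 1))
        self.index = 0
        self.typ = EOF
        self.line = 0
        self.col = 0
        self.at = [None] * 10             # at most 10 attributes per symbol

    def get_sy(self):
        """Advance to the next symbol; sets typ to EOF at the end of input."""
        if self.index >= len(self.tokens):
            self.typ = EOF
            return
        word, self.line, self.col = self.tokens[self.index]
        self.index += 1
        if word == "+":
            kind = "plus"
        elif word.isdigit():
            kind = "number"
        else:
            kind = "ident"
        self.typ = TERMINALS.index(kind) + 1
        self.at[0] = word                 # corresponds to at[1] in Modula-2
```

The parser only ever looks at `typ`, `line`, `col` and `at` after calling `get_sy`, so any source of symbols that honours these conventions can be plugged in.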
6.4.4 Syntax error interface

The syntax analyzer of the generated compiler automatically recovers from a syntax error and gathers information about the cause of the error. However, the user must provide for the output of the error message by supplying a procedure SyntaxError exported from a module Errors (see standard module in Appendix F). This procedure is called by the syntax analyzer each time a syntax error occurs. It can print the error message immediately or store it in order to display all error messages together at the end of the compilation. The definition module Errors must have the following form:

    DEFINITION MODULE Errors;
      TYPE
        Symbolname = ARRAY[1..25] OF CHAR;
        Errorptr   = POINTER TO Errornode;
        Errornode  = RECORD
          txt:  Symbolname;    (*symbol name*)
          l:    CARDINAL;      (*length of symbol name*)
          next: Errorptr;      (*to next symbol of the same message*)
        END;
      PROCEDURE SyntaxError(symbols:Errorptr; line,col:CARDINAL);
    END Errors.

SyntaxError has three parameters: symbols is a pointer to a linked list of those symbols that are expected at the error location (if available, alias names are used in place of symbol names). The parameters line and col indicate the line number and column number of the error location. Figure 6.5 shows a sample list of expected symbols pointed to by the parameter symbols.

Fig. 6.5 List of expected symbols. colon is the symbol causing the error; semicolon or END have been expected instead

The first node of the list contains the symbol that caused the error (in this case the colon), the subsequent nodes contain the symbols that were expected instead of the erroneous symbol (in this case semicolon and END). SyntaxError can now produce the following message:

    Syntaxerror in line ... column ... near colon: semicolon or END expected
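How a user-supplied SyntaxError procedure might turn the linked list into the message shown above is sketched below in Python. The class `ErrorNode` mirrors only the txt and next fields of Errornode; the function name `syntax_error_message` and the exact punctuation of the message are assumptions for illustration, not the text produced by the standard module in Appendix F.

```python
class ErrorNode:
    """One list node: a symbol name and a link to the next node."""

    def __init__(self, txt, next=None):
        self.txt = txt
        self.next = next


def syntax_error_message(symbols, line, col):
    """Build a message from the list passed to SyntaxError.

    The first node holds the symbol that caused the error; all following
    nodes hold the symbols that were expected instead.
    """
    found = symbols.txt
    expected = []
    node = symbols.next
    while node is not None:
        expected.append(node.txt)
        node = node.next
    if len(expected) > 1:
        listing = ", ".join(expected[:-1]) + " or " + expected[-1]
    else:
        listing = "".join(expected)
    return "Syntaxerror in line %d column %d near %s: %s expected" % (
        line, col, found, listing)
```

For the list of Fig. 6.5 (colon, semicolon, END) and an error position of line 12, column 7, the function returns "Syntaxerror in line 12 column 7 near colon: semicolon or END expected".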
6.5 Generation of multi-pass compilers

With L-attributed grammars, some languages can only be translated in multiple passes. Some other languages are so complex that a single-pass compiler would not fit into the memory of a microcomputer. For these reasons, a compiler must often be split into several passes. Each pass is a compiler of its own. It reads the source program, or an intermediate language from which it produces a new intermediate language, or the target program.

Anyone who wants to write a multi-pass compiler must write a compiler description for each pass, and then put the produced compiler passes in sequence (see Fig. 6.3). Cocol has features that are specially designed for the generation of multi-pass compilers:

Input from an intermediate language. It is possible to read an intermediate language file instead of a source text by simply supplying an appropriate input procedure GetSy (see Section 6.4.2).

Pragmas serve mainly to pass control information from one pass to the next in the intermediate language. Before they get to the syntax analyzer of the next pass they are extracted from the input stream and processed semantically.

The symbol any. The grammar symbol any can be used to exclude parts of the source text from the syntax analysis, and forward them unchanged to the next pass.

6.1 Example Application of any

A typical application of the complement symbol any is to process declarations in the first pass of a compiler and statements in the second pass. The following example skips statements and forwards them to the next pass:

    Block =
      Declarations
      BEGINSY
      { any        sem Copy(↓typ, ↓line, ↓col, ↓at);  -- copy symbol to next
                                                      -- intermediate language
                   endsem
      }
      ENDBLOCKSY.

Here, any denotes all terminal symbols except ENDBLOCKSY. It can be semantically processed using the variables typ and at exported by the lexical analyzer (see Section 6.4.2).
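The effect of the iteration `{ any ... } ENDBLOCKSY` in Example 6.1 can be sketched in Python: every symbol other than ENDBLOCKSY is accepted without further analysis and handed to a copy routine. The function name `parse_block_body`, the list-of-tuples symbol format, and the symbol number used for ENDBLOCKSY in the usage note are hypothetical.

```python
def parse_block_body(symbols, start, endblock, copy):
    """Consume '{ any } ENDBLOCKSY' beginning at index 'start'.

    symbols  -- list of (typ, at) pairs
    endblock -- symbol number of ENDBLOCKSY, the only symbol 'any' excludes
    copy     -- semantic action called for every skipped symbol
    Returns the index of the first symbol after ENDBLOCKSY.
    """
    pos = start
    while pos < len(symbols) and symbols[pos][0] != endblock:
        copy(symbols[pos])     # sem Copy(typ, line, col, at) endsem
        pos += 1
    if pos == len(symbols):
        raise SyntaxError("ENDBLOCKSY expected")
    return pos + 1             # step over ENDBLOCKSY
```

With ENDBLOCKSY numbered 9, for instance, the symbols of a statement sequence are copied one by one until the first symbol numbered 9 is reached, so the first pass never has to know the syntax of statements.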
7 The implementation

In this chapter we will show how Coco is structured and how it works. First we provide an overview of its design (7.1). Then we describe the internal data structures such as the symbol list (7.2) and the top-down graph (7.3), as well as the collection of some sets of terminal symbols (7.4). Section 7.5 covers various grammar tests which the top-down graph is subjected to before the target compiler is generated. The last three sections cover the generation of the compiler parts, namely the parser tables (7.6), the syntax analyzer (7.7), and the semantic evaluator (7.8). Section 8.3 shows an example of the generated compiler parts for a specific input grammar.

At the beginning of each section, a diagram is used to illustrate how this section relates to the structure of Chapter 7.

Fig. 7.1 Structure of Chapter 7 (structure of the symbol list, structure of the top-down graph, collecting the symbol sets, grammar tests, generation of the parser tables, generation of the syntax analyzer, generation of the semantic evaluator)

We describe algorithms in an abstract manner, using Adele or Cocol. Appendix F contains the concrete implementation of Coco. Details that are not
necessary for understanding the algorithms are omitted as they can be found in the program listings.

Coco is written in Modula-2 and has been implemented on various microcomputers including Macintosh, IBM-PC, Atari and Lilith. It produces compilers in Modula-2 and was used for its own implementation, too. We describe the implementation on the Macintosh.

7.1 Survey

Like any compiler, Coco is composed of an analysis part (front end) and a synthesis part (back end). The analysis part consists of a lexical analyzer and a syntax analyzer. The synthesis part consists of a semantic evaluator with several semantic modules attached to it (Fig. 7.2).

Fig. 7.2 Structure of Coco with its main tasks shown as semantic modules (main program, syntax analyzer, lexical analyzer, semantic evaluator, symbol list handler, top-down graph handler, grammar tests, generation of the syntax analyzer, generation of the semantic evaluator)

From the above, the main tasks of Coco are:

1. handling a symbol list: Symbol information is stored (name, symbol number, attribute, scope, etc.);

2. handling a top-down graph: Graph nodes are generated and linked to form subgraphs;

3. testing the grammar: The grammar is checked to see if it is complete, non-circular, and LL(1). It is also checked to see whether all nonterminals can be reached and derived into terminal strings;

4. generating the syntax analyzer: The source code of the generated syntax analyzer is built from fixed frame parts, and variable parts derived from
the compiler description. It includes LL(1) parser tables generated from the attributed grammar;

5. generating the semantic evaluator: The source code of the semantic evaluator is built from fixed frame parts and from semantic actions and declarations copied from the compiler description.

The main algorithm of Coco is as follows:

    Coco:
      Initialize lexical analyzer;
      Parse(↑ok);                                -- Section ...
      if ok then
        Find deletable symbols;                  -- Section 7.4.1
        Insert eps-nodes before deletable nt's;  -- Section 7.3.3
        Delete redundant eps-nodes;              -- Section 7.3.4
        Get symbol sets;                         -- Section 7.4
        Test grammar(↑ok);                       -- Section 7.5
      end;
      if ok then Generate compiler;              -- Sections 7.6 and 7.7
      else Print error message;
      end;
    end Coco;

The procedure Parse parses the input text and calls the semantic actions for the construction of the top-down graph and the symbol list as well as for the generation of the semantic evaluator. After some tests and transformations of the data structures the target compiler is produced.

7.2 Structure of the symbol list

Coco handles a symbol list with information about terminals, nonterminals, and pragmas. This section describes its representation and shows how it is filled.

7.2.1 Symbol list representation

The symbol list is a linear list of symbol nodes, each of them describing a syntax symbol. The list is indexed by symbol numbers.

    TYPE
      Symboltype = (eps,t,pr,nt,any,err);
                   (*eps, terminal, pragma, nonterminal, any, error-symbol*)
      Symbolnode = RECORD
        spix: CARDINAL;             (*spelling index of symbol name*)
        aliasspix: CARDINAL;        (*spelling index of alias name*)
        nra: CARDINAL;              (*number of attributes*)
        CASE typ: Symboltype OF     (*symbol kind*)
          t,eps,any:                (*nothing*)
        | pr:
            sem1,sem2: CARDINAL;    (*pragma semantics*)
        | nt,err:
            start: CARDINAL;        (*start of top-down graph*)
            del: BOOLEAN;           (*TRUE if deletable*)
            firstat: Attributeptr;  (*to first formal attribute*)
        END;
      END;
      Symbollist = ARRAY[0..maxsymbol] OF Symbolnode;

Fig. 7.3 Structure of Section 7.2 (symbol list representation, symbol list construction)

The fields spix, aliasspix, nra, and typ are filled when the symbol is declared. For terminals, this is the only information stored in the symbol list.

The node of a pragma has two additional fields denoting the semantic actions which the generated compiler has to execute when it reads this pragma. The first action is for the output attribute assignments (Section 7.8.4), the second is the semantic action associated with this pragma in Cocol. If no actions are to be executed, both fields are zero. The fields are filled when the pragma is declared.

Nonterminal nodes contain additional information: The field start points to the root of the top-down graph of this specific nonterminal. It is set when the corresponding rule has been processed. At the same time, the field del is set, which indicates whether the nonterminal is directly deletable, i.e. whether it can be immediately derived into the empty string. The indirect deletability of a nonterminal can only be determined when the top-down graphs of all nonterminals have been built (see Section 7.4.1). Finally, nonterminal nodes have a field firstat pointing to a list of formal attributes. This list contains
the name and direction (input-output) of each attribute of the nonterminal. The attribute list is built when the nonterminal is declared. It is implemented as follows:

    TYPE
      Direction    = (up,down);             (*attribute direction*)
      Attributeptr = POINTER TO Attribute;
      Attribute    = RECORD
        spix: CARDINAL;                     (*attribute name*)
        dir:  Direction;                    (*up, down*)
        next: Attributeptr;                 (*to next attribute of same nt*)
      END;

Names of symbols and attributes are not stored in the symbol list directly. Rather, they are stored in a name list which is an array of characters. Instead of the actual names the symbol list contains only their address in the name list, called spix (spelling index). The lexical analyzer handles a hashed list of 'spixes' for fast searching of names.

7.2.2 Symbol list construction

For each symbol in the syntax declarations of Cocol, a symbol node with a successive number is allocated. Therefore, symbol numbers correspond to the declaration sequence of the symbols. The following procedures are used to generate, access, and modify symbol nodes:

    PROCEDURE NewSy(spix:CARDINAL; typ:Symboltype): CARDINAL;
    PROCEDURE SyNr(spix:CARDINAL): CARDINAL;
    PROCEDURE GetSy(sy:CARDINAL; VAR sn:Symbolnode);
    PROCEDURE RepSy(sy:CARDINAL; sn:Symbolnode);

NewSy generates a new symbol node with the fields spix and typ and returns its node number. SyNr searches for the symbol with the name spix. If spix is found, SyNr returns the corresponding symbol number, else it returns 65535 (the value of the null symbol). GetSy gets the symbol node sn corresponding to symbol number sy. RepSy replaces the symbol sy by the node sn.

Attributes are processed with the following procedures:

    PROCEDURE NewAt(sy,spix:CARDINAL; dir:Direction);
    PROCEDURE GetAt(sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
    PROCEDURE CompleteAt(sy,n:CARDINAL): BOOLEAN;

NewAt defines a new attribute for the symbol sy.
For nonterminals, it also appends the name (spix) and the direction (dir) of the attribute to the attribute list. GetAt gets the fields spix and dir of the nth attribute of the nonterminal sy. If sy has fewer than n attributes, then 0 is returned as the value of spix.
CompleteAt returns TRUE if the symbol sy has exactly n attributes. The implementation of these procedures is trivial, as can be seen in Appendix F.

7.3 Structure of the top-down graph

The top-down graph has already been described in Section 2.3 as an internal grammar representation. In Coco, it is implemented in a somewhat extended form. First, we will describe the extended top-down graphs, and then show how they are generated. In Section 7.6.2, we will describe the translation of top-down graphs into G-code.

Fig. 7.4 Structure of Section 7.3 (top-down graph representation, top-down graph construction, insertion of eps-nodes, removal of redundant eps-nodes)

7.3.1 Top-down graph representation

The top-down graph is a linear list of graph nodes. Each symbol on the right-hand side of a Cocol rule is represented by a node. The pointers linking the nodes are indices of this list.

    TYPE
      Topdowngraph = ARRAY[1..maxnode] OF Graphnode;
      Graphnode = RECORD
        typ: (eps,t,nt,any);    (*symbol kind*)
        sp:  CARDINAL;          (*t,nt: pointer to node in symbol list*)
                                (*eps: pointer to eps-set*)
                                (*any: pointer to any-set*)
        lp:  CARDINAL;          (*left pointer*)
        rp:  CARDINAL;          (*right pointer*)
                       sem1: CARDINAL;  (*in-attribute action*)
                       sem2: CARDINAL;  (*out-attribute action*)
                       sem3: CARDINAL;  (*explicit semantic action*)
                       line: CARDINAL;  (*line number in the source text*)
                       link: CARDINAL;  (*pointer to the next right end*)
                     END;

Compared to Section 2.3, the graph node is extended by three semantic numbers, a line number, and a pointer (link). These fields have the following meaning:

sem1: action number of the input attribute assignments or zero (Sect. 7.8.4);
sem2: action number of the output attribute assignments or zero (Sect. 7.8.4);
sem3: number of the user-written semantic action which follows this symbol in the Cocol text, or zero;
line: line number of this symbol in the Cocol text (for error messages);
link: pointer for linking the right ends of a graph (the right ends are the nodes whose right pointer is zero).

7.3.2 Top-down graph construction

It is useful to think of a top-down graph as a 'black box' linked to its environment by two pointers, head and tail. The interior of the black box may contain a single node, or an arbitrarily complex graph with several nodes (Fig. 7.5).

Fig. 7.5 Top-down graph as a 'black box'

head points to the root of the graph and tail to its right end. Since the right end of the graph usually consists of several nodes, these nodes are linked (see the dashed lines in Fig. 7.5). The following procedures are used to generate and process the graph nodes:

    PROCEDURE NewNode(typ: Symboltype; sy, line: CARDINAL): CARDINAL;
    PROCEDURE GetNode(n: CARDINAL; VAR gn: Graphnode);
    PROCEDURE RepNode(n: CARDINAL; gn: Graphnode);

NewNode creates a graph node containing the specified symbol sy, having
the symbol type typ, and the line number line and returns its node number. GetNode returns the nth graph node in gn. RepNode replaces the nth graph node by gn.

Two top-down graphs can be combined to a new graph by arranging them either side by side as successive components or below one another as alternatives. In either case, a new top-down graph with head and tail is produced.

Linking of successive components

Coco uses the procedure ConcatRight to link successive components.

    ConcatRight(↕head1, ↕tail1, ↓head2, ↓tail2):
    param head1,head2,tail1,tail2: Cardinal;
    local p: Cardinal;
    begin
      p:=tail1;
      while p<>0 do gn(p).rp:=head2; p:=gn(p).link; end;
      tail1:=tail2;
    end ConcatRight;

ConcatRight links the graphs (head1, tail1) and (head2, tail2) via right pointers, giving the new graph (head1, tail1). The right ends of the first graph are linked with the root of the second graph (see Fig. 7.6).

Fig. 7.6 Linking of successive components
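The linking step can be tried out in a few lines of Python. This is only a sketch under assumptions: the node list gn, the helper new_node and the field sym are illustrative names that do not appear in Coco, and the transient parameters head1 and tail1 are modelled as a returned pair.

```python
from dataclasses import dataclass

@dataclass
class Node:
    sym: str       # symbol carried by the node (illustrative only)
    rp: int = 0    # right pointer; 0 marks a right end
    link: int = 0  # next right end of the same graph; 0 ends the chain

gn = [None]  # slot 0 is unused so that 0 can act as the null pointer

def new_node(sym):
    gn.append(Node(sym))
    return len(gn) - 1

def concat_right(head1, tail1, head2, tail2):
    p = tail1
    while p != 0:            # every right end of the first graph ...
        gn[p].rp = head2     # ... now continues with the second graph
        p = gn[p].link
    return head1, tail2      # the combined graph ends where graph 2 ends

# Link the single-node graphs for a and b into the sequence a b.
a, b = new_node("a"), new_node("b")
head, tail = concat_right(a, a, b, b)
```

After the call, node a points to b via its right pointer, and the combined graph is described by the pair (a, b).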
Linking of alternatives

Coco uses the procedure ConcatLeft to link alternatives.

    ConcatLeft(↕head1, ↕tail1, ↓head2, ↓tail2):
    param head1,head2,tail1,tail2: Cardinal;
    local p: Cardinal;
    begin
      p:=head1;
      while gn(p).lp<>0 do p:=gn(p).lp; end;
      gn(p).lp:=head2;
      p:=tail1;
      while gn(p).link<>0 do p:=gn(p).link; end;
      gn(p).link:=tail2;
    end ConcatLeft;

ConcatLeft links the graphs (head1, tail1) and (head2, tail2) via left pointers, giving the new graph (head1, tail1). The end of the first alternative chain of the first graph is linked with the root of the second graph. The right ends of both graphs are connected in a similar way (see Fig. 7.7).

Fig. 7.7 Linking of alternatives

An attributed grammar for the construction of top-down graphs

In order to show that attributed grammars can be used for documentation as well, we will describe the generation of the top-down graph for one syntax rule by means of an attributed grammar. The complete top-down graph is composed of the graphs for all syntax rules.
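How the two linking procedures cooperate can be seen by building the graph for the expression a b | c by hand. The Python below is a sketch, not Coco's code: gn, new_node, right_ends and the field sym are assumed names, and the transient parameters are again returned as a pair.

```python
from dataclasses import dataclass

@dataclass
class Node:
    sym: str       # symbol carried by the node (illustrative only)
    lp: int = 0    # left pointer: next alternative, 0 = none
    rp: int = 0    # right pointer: successor, 0 = right end
    link: int = 0  # next right end of the same graph

gn = [None]  # slot 0 unused; 0 is the null pointer

def new_node(sym):
    gn.append(Node(sym))
    return len(gn) - 1

def concat_right(head1, tail1, head2, tail2):
    p = tail1
    while p != 0:
        gn[p].rp = head2
        p = gn[p].link
    return head1, tail2

def concat_left(head1, tail1, head2, tail2):
    p = head1
    while gn[p].lp != 0:      # find the end of the alternative chain
        p = gn[p].lp
    gn[p].lp = head2
    p = tail1
    while gn[p].link != 0:    # find the last right end of graph 1
        p = gn[p].link
    gn[p].link = tail2        # append the right ends of graph 2
    return head1, tail1

def right_ends(tail):
    ends = []
    while tail != 0:
        ends.append(gn[tail].sym)
        tail = gn[tail].link
    return ends

a, b, c = new_node("a"), new_node("b"), new_node("c")
h, t = concat_right(a, a, b, b)   # term        a b
h, t = concat_left(h, t, c, c)    # expression  a b | c
```

The resulting graph has a as its root, c as the alternative of a, and the two right ends b and c chained through link.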
The grammar of EBNF rules

    Rule       = identifier "=" Expression ".".
    Expression = Term {"|" Term}.
    Term       = Factor {Factor}.
    Factor     = symbol | "eps" | "any"
               | "(" Expression ")"
               | "[" Expression "]"
               | "{" Expression "}".

contains the nonterminals Expression, Term, and Factor. Each of these nonterminals supplies as an output attribute a top-down graph with the ends head and tail. These graphs can be linked in two different ways: factor graphs are linked via right pointers, term graphs via left pointers (ConcatRight and ConcatLeft). A new top-down graph is formed in either case, which is again represented by head and tail.

Expression, Term, and Factor also supply an output attribute del, which indicates whether the term or factor is directly deletable, i.e. whether it can be derived into the empty string. del is entered into the symbol list.

The attributed grammar uses the procedures described above to handle the symbol list (GetSy, RepSy, SyNr) and the top-down graph (NewNode, ConcatLeft, ConcatRight).

    GRAMMAR Rule          -- graph generation for a single rule

    SEMANTIC DECLARATIONS
      FROM cocogra IMPORT NewNode, ConcatLeft, ConcatRight, Push, Pop;
      FROM cocosym IMPORT GetSy, RepSy, SyNr, Symbolnode, anysy, epssy;
      VAR h1,h2,h3: CARDINAL;          -- head pointers
          t1,t2,t3: CARDINAL;          -- tail pointers
          del1,del2,del3: BOOLEAN;     -- TRUE, if element is deletable
          sn: Symbolnode;
          spix,syspix: CARDINAL;       -- spelling indices
          sy: CARDINAL;                -- symbol number

    MACROS
      sem :PushValues: Push(↓h1); Push(↓h2); Push(↓t1); Push(↓t2);
                       Push(↓del1); Push(↓del2); endsem
      sem :PopValues:  Pop(↑del2); Pop(↑del1); Pop(↑t2); Pop(↑t1);
                       Pop(↑h2); Pop(↑h1); endsem

    TERMINALS
      "=" "." "|" "(" ")" "[" "]" "{" "}" "eps" "any"
      symbol<out:spix>
    NONTERMINALS
      Rule
      Expression<out:h1,t1,del1>
      Term<out:h2,t2,del2>
      Factor<out:h3,t3,del3>

    RULES
      Rule =
        symbol<out:syspix> "=" Expression<out:h1,t1,del1>
          sem sy:=SyNr(↓syspix); GetSy(↓sy,↑sn);
              sn.del:=del1; sn.start:=h1; RepSy(↓sy,↓sn); endsem
        ".".

      Expression<out:h1,t1,del1> =
        Term<out:h1,t1,del1>
        { "|" Term<out:h2,t2,del2>
            sem ConcatLeft(↕h1,↕t1,↓h2,↓t2); del1:=del1 OR del2; endsem
        }.

      Term<out:h2,t2,del2> =
        Factor<out:h2,t2,del2>
        { Factor<out:h3,t3,del3>
            sem ConcatRight(↕h2,↕t2,↓h3,↓t3); del2:=del2 AND del3; endsem
        }.

      Factor<out:h3,t3,del3> =
          symbol<out:spix>
            sem sy:=SyNr(↓spix);
                h3:=NewNode(↓sy); t3:=h3; del3:=FALSE; endsem
        | "eps"
            sem h3:=NewNode(↓epssy); t3:=h3; del3:=TRUE; endsem
        | "any"
            sem h3:=NewNode(↓anysy); t3:=h3; del3:=FALSE; endsem
        | "("  sem (PushValues) endsem
          Expression<out:h3,t3,del3>
          ")"  sem (PopValues) endsem
        | "["  sem (PushValues) endsem
          Expression<out:h3,t3,del3>
            sem h1:=NewNode(↓epssy); t1:=h1;
                ConcatLeft(↕h3,↕t3,↓h1,↓t1); del3:=TRUE; endsem
          "]"  sem (PopValues) endsem
        | "{"  sem (PushValues) endsem
          Expression<out:h3,t3,del3>
            sem h1:=NewNode(↓epssy); t1:=h1;
                ConcatRight(↕h3,↕t3,↓h3,↓t3);
                ConcatLeft(↕h3,↕t3,↓h1,↓t1);
                t3:=t1; del3:=TRUE; endsem
          "}"  sem (PopValues) endsem.

    ENDGRAM

Figure 7.8 shows which graphs are produced by the translation of an EBNF expression in brackets. As an example, we select the expression a b | c.

Fig. 7.8 Translation of an EBNF expression into a top-down graph (panels: (a b | c), [a b | c], {a b | c})

7.3.3 Insertion of eps-nodes

Normally, each symbol of the input grammar corresponds to one node in the top-down graph. However, from Fig. 7.8, we see that the translation of expressions in square or curly brackets leads to the generation of additional eps-nodes which have no counterpart in the input grammar. They are inserted by Coco to indicate that an expression is deletable. There are also some other cases where eps-nodes must be inserted into graphs.

The algorithm of Section 7.3.2 will fail if a term that begins with an expression in curly brackets has an alternative. The production

    S = ({a} b | c)

would lead to the top-down graph shown in Fig. 7.9.
Fig. 7.9 Erroneous top-down graph for S = ({a} b | c)

This is obviously wrong because once an a has been identified, only a or b should follow, not c, as is possible in the above graph. This problem is solved by including an eps-node in front of the first alternative (Fig. 7.10).

Fig. 7.10 Correct top-down graph for S = ({a} b | c) with inserted eps-node

This graph is now correct since after identifying an a, only a or b can follow, not c. For each eps-node, the set of terminal successors (eps-set) is computed (Section 7.4.4). The eps-set of the node e1 (namely {a, b}) allows us to distinguish between the two alternatives in the above example. Eps-nodes are inserted in front of all expressions in curly brackets during the construction of the top-down graph (see attributed grammar in Appendix F).

Deletable nonterminals present a similar problem. If a nonterminal is deletable, it is always processed by the syntax analyzer, because even if the current input symbol is not a start symbol of the nonterminal itself, it may still be a valid successor. Now, if there is a node which is an alternative of a deletable nonterminal, this node will never be visited, since the nonterminal will always be recognized beforehand. Coco solves this problem by inserting an eps-node in front of a deletable nonterminal. The eps-set of this node is then used to distinguish between the alternatives. From the graphs shown in Fig. 7.11, where the deletable nonterminal Y has an alternative, the graphs in Fig. 7.12 are produced.

Fig. 7.11 Top-down graph with deletable nonterminal Y

Fig. 7.12 Top-down graph with inserted eps-node in front of deletable nonterminal Y
The eps-set of the node e1 (namely {a, c}; c is a terminal start of Y and a is a successor of the deletable nonterminal Y) enables the selection between the two alternatives starting with e1 and b. There are no more alternatives to the node with the deletable nonterminal Y. It can therefore be safely visited by the syntax analyzer.

The algorithm for the insertion of eps-nodes in front of deletable nonterminals is shown below.

    Insert eps-nodes before deletable nt's:
    local gn,gn1: Graphnode;
          sn: Symbolnode;
    begin
      for all nodes i do
        GetNode(↓i,↑gn);
        if (gn.typ=nt) and (gn.lp<>0) then     -- nt with alternative
          GetSy(↓gn.sp,↑sn);
          if sn.del then                         -- deletable
            gn1:=gn; gn1.lp:=0;                  -- gn1 holds the deletable nt
            j:=NewNode(↓nt,↓0,↓0);               -- now create empty node
            RepNode(↓j,↓gn1);
            gn.typ:=eps; gn.sp:=0;               -- gn holds the new eps-node
            gn.rp:=j; gn.sem1:=0; gn.sem2:=0; gn.sem3:=0;
            RepNode(↓i,↓gn);
          end;
        end;
      end;
    end Insert eps-nodes before deletable nt's;

7.3.4 Removal of redundant eps-nodes

When expressions in square or curly brackets are translated, eps-nodes arise that can be removed again if it turns out that the expressions have successors (see Fig. 7.13). The algorithm for the removal of redundant eps-nodes is shown below:

    Delete redundant eps-nodes:
    global visited: set of nodenumbers;    -- mark list for visited nodes
    local sn: Symbolnode;
    begin
      visited:={};
      for all nonterminals i do
        GetSy(↓i,↑sn); DelEps(↓sn.start);
      end;
    end Delete redundant eps-nodes;
Fig. 7.13 Creation and removal of redundant eps-nodes (columns: EBNF expression; graph with redundant eps-nodes; equivalent graph without redundant eps-nodes)

The procedure DelEps(↓loc) deletes all redundant eps-nodes in the top-down graph with the root loc. Redundant eps-nodes can be recognized by the following characteristics: they have no associated semantic actions, their left pointer is null, and their right pointer is not null. They always receive a link from the left pointer of some other node.

    DelEps(↓loc):
    param loc: Cardinal;
    global visited: set of nodenumbers;    -- mark list for visited nodes
    local gn,gn1: Graphnode;
    begin
      if loc=0 or loc in visited then return; end;   -- end or cycle
      visited:=visited+{loc};
      GetNode(↓loc,↑gn);
      if gn.lp<>0 then              -- test if alt. node is a redundant eps-node
        GetNode(↓gn.lp,↑gn1);
        if (gn1.typ=eps) and (gn1.sem3=0) and (gn1.lp=0) and (gn1.rp<>0) then
          gn.lp:=gn1.rp; RepNode(↓loc,↓gn);
        end;
      end;
      DelEps(↓gn.lp); DelEps(↓gn.rp);
    end DelEps;
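The effect of DelEps is easiest to see on the graph for [a] b, where the eps-node created for the square brackets acquires a successor and thus becomes redundant. The following Python sketch assumes a plain list gn as node store and a field sym for the symbol; both are illustrative and not part of Coco.

```python
from dataclasses import dataclass

@dataclass
class Node:
    typ: str        # "eps", "t", "nt" or "any"
    sym: str = ""   # symbol name (illustrative only)
    lp: int = 0     # left pointer (alternative)
    rp: int = 0     # right pointer (successor)
    sem3: int = 0   # explicit semantic action, 0 = none

def del_eps(gn, loc, visited):
    if loc == 0 or loc in visited:        # end or cycle
        return
    visited.add(loc)
    node = gn[loc]
    if node.lp != 0:
        alt = gn[node.lp]
        redundant = (alt.typ == "eps" and alt.sem3 == 0
                     and alt.lp == 0 and alt.rp != 0)
        if redundant:
            node.lp = alt.rp              # bypass the eps-node
    del_eps(gn, node.lp, visited)
    del_eps(gn, node.rp, visited)

# Graph for [a] b: node 1 = a, node 2 = eps (alternative of a), node 3 = b.
gn = [None,
      Node("t", "a", lp=2, rp=3),
      Node("eps", rp=3),
      Node("t", "b")]
del_eps(gn, 1, set())
```

After the call, the left pointer of a leads directly to b, so the eps-node is no longer reachable from the root.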
7.4 Collecting the symbol sets

So far, the input grammar has been read and the symbol list as well as the top-down graph have been built. From these two data structures, Coco calculates the symbol sets needed for the grammar tests and for the generated compiler.

Fig. 7.14 Structure of Section 7.4 (deletable nonterminals; terminal start symbols of nonterminals; terminal successors of nonterminals; eps-sets; any-sets)

Coco collects four sets of terminals:

1. start symbols of nonterminals;
2. successors of nonterminals;
3. successors of eps-nodes (eps-sets);
4. sets represented by any-symbols (any-sets).

The following procedures are used to access the top-down graph and the symbol list:

    PROCEDURE GetNode(loc: CARDINAL; VAR gn: Graphnode);
    PROCEDURE RepNode(loc: CARDINAL; gn: Graphnode);
    PROCEDURE GetSy(sy: CARDINAL; VAR sn: Symbolnode);
    PROCEDURE RepSy(sy: CARDINAL; sn: Symbolnode);

GetNode gets the graph node gn with the number loc. RepNode replaces the graph node with the number loc by the node gn. GetSy gets the symbol node sn with the number sy. RepSy replaces the symbol node with the number sy by the node sn.

Before the symbol sets are collected, it is necessary to find out which nonterminals are deletable.
7.4.1 Deletable nonterminals

All deletable nonterminals are tagged in the symbol list. In the first step, those symbols which can be derived directly into the empty string are tagged. In the second step, all those nonterminals are tagged whose top-down graph can be traversed along a path of already tagged symbols. The second step is repeated until no more deletable symbols are found.

The directly deletable nonterminals are found when the top-down graph is created (see Section 7.3.2). The following algorithm finds the indirectly deletable nonterminals.

    Find deletable symbols:
    local sn: Symbolnode;
          changed: Boolean;
    begin
      repeat
        changed:=false;
        for all nonterminals i do
          GetSy(↓i,↑sn);
          if not sn.del and Deletable(↓sn.start) then
            sn.del:=true; RepSy(↓i,↓sn); changed:=true;
          end;
        end;
      until not changed;
    end Find deletable symbols;

The procedure Deletable(↓loc) checks if the top-down graph rooted at loc is deletable (i.e. if it can be traversed along a path of deletable symbols).

    Deletable(↓loc) Boolean:
    param loc: Cardinal;
    global marked: set of nodenumbers;     -- mark list for visited nodes
    begin
      marked:={};
      return DelGraph(↓loc);
    end Deletable;

The actual work is performed by the procedure DelGraph.

    DelGraph(↓loc) Boolean:
    param loc: Cardinal;
    global marked: set of nodenumbers;
    local gn: Graphnode;
    begin
      if loc=0 then return true; end;             -- end of graph found
      if loc in marked then return false; end;    -- already visited: cycle
      marked:=marked+{loc};
      GetNode(↓loc,↑gn);
      return ((gn.lp<>0) and DelGraph(↓gn.lp))           -- deletable alternat.
          or (DelNode(↓gn) and DelGraph(↓gn.rp));        -- or deletable right
    end DelGraph;                                        -- part of graph
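The fixed-point iteration can be tried on a small grammar. The Python below is a sketch under assumptions: the grammar S = A B. A = a | eps. B = A. T = t. is invented for illustration, the symbol list is reduced to a dictionary start plus a set deletable_nts, and del_node mirrors the node test that the text calls DelNode.

```python
from dataclasses import dataclass

@dataclass
class Node:
    typ: str       # "t", "nt" or "eps"
    sym: str = ""  # symbol name (illustrative only)
    lp: int = 0    # left pointer (alternative)
    rp: int = 0    # right pointer (successor)

# Hypothetical grammar:  S = A B.  A = a | eps.  B = A.  T = t.
gn = [None,
      Node("nt", "A", rp=2), Node("nt", "B"),   # nodes 1, 2: S
      Node("t", "a", lp=4), Node("eps"),        # nodes 3, 4: A
      Node("nt", "A"),                          # node 5:     B
      Node("t", "t")]                           # node 6:     T
start = {"S": 1, "A": 3, "B": 5, "T": 6}
deletable_nts = {"A"}   # directly deletable, known from graph construction

def del_node(node):
    if node.typ == "nt":
        return node.sym in deletable_nts
    return node.typ == "eps"

def del_graph(loc, marked):
    if loc == 0:
        return True               # end of graph found
    if loc in marked:
        return False              # already visited: cycle
    marked.add(loc)
    node = gn[loc]
    return ((node.lp != 0 and del_graph(node.lp, marked))
            or (del_node(node) and del_graph(node.rp, marked)))

def find_deletable():
    changed = True
    while changed:                # repeat until nothing new is tagged
        changed = False
        for nt, loc in start.items():
            if nt not in deletable_nts and del_graph(loc, set()):
                deletable_nts.add(nt)
                changed = True

find_deletable()
```

In this run B is tagged in the first pass and S only in the second, because S depends on B; T is never tagged.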
Finally, DelNode checks if a node (i.e. its corresponding symbol) is deletable.

    DelNode(↓gn) Boolean:
    param gn: Graphnode;
    local sn: Symbolnode;
    begin
      if gn.typ=nt then GetSy(↓gn.sp,↑sn); return sn.del;
      else return gn.typ=eps;
      end;
    end DelNode;

7.4.2 Terminal start symbols of nonterminals

The terminal start symbols of a nonterminal are the terminal start symbols of its top-down graph, i.e. the start symbols of its first alternative chain. Those nodes of the chain which contain nonterminals will have their terminal start symbols calculated recursively. If the chain contains a deletable symbol, its successors also have to be considered. The terminal start symbols of all nonterminals are stored in a list.

    Get terminal start symbols:
    global first: array(nonterminals) of
                    record
                      ts: set of terminals;    -- terminal start symbols
                      ready: Boolean;          -- true, if ts is computed
                    end;
    local sn: Symbolnode;
    begin
      for all nonterminals i do first(i).ready:=false; end;
      for all nonterminals i do
        GetSy(↓i,↑sn);
        GetFirstSet(↓sn.start,↑first(i).ts);
        first(i).ready:=true;
      end;
    end Get terminal start symbols;

The procedure GetFirstSet(↓loc,↑s) supplies the terminal start symbols of the top-down graph with the root loc.

    GetFirstSet(↓loc,↑s):
    param loc: Cardinal;
          s: set of terminals;
    global visited: set of nodenumbers;    -- mark list for visited nodes
    begin
      visited:={};
      CollectFirst(↓loc,↑s);
    end GetFirstSet;
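The computation of terminal start symbols can be illustrated with a short Python sketch. Several things are assumptions made for brevity: the grammar S = A b. A = a | eps. is invented, any-nodes and the ready cache are left out, and the set of deletable nonterminals is taken as given.

```python
from dataclasses import dataclass

@dataclass
class Node:
    typ: str       # "t", "nt" or "eps" (any-nodes are omitted here)
    sym: str = ""  # symbol name (illustrative only)
    lp: int = 0    # left pointer (alternative)
    rp: int = 0    # right pointer (successor)

# Hypothetical grammar:  S = A b.  A = a | eps.
gn = [None,
      Node("nt", "A", rp=2), Node("t", "b"),   # nodes 1, 2: S
      Node("t", "a", lp=4), Node("eps")]       # nodes 3, 4: A
start = {"S": 1, "A": 3}
deletable_nts = {"A"}   # assumed to be known already (Section 7.4.1)

def del_node(node):
    if node.typ == "nt":
        return node.sym in deletable_nts
    return node.typ == "eps"

def collect_first(loc, visited):
    s = set()
    while loc != 0:                        # for all alternatives
        if loc in visited:                 # cycle
            return s
        visited.add(loc)
        node = gn[loc]
        if del_node(node):                 # look past a deletable node
            s |= collect_first(node.rp, visited)
        if node.typ == "t":
            s.add(node.sym)
        elif node.typ == "nt":             # descend into the nonterminal
            s |= collect_first(start[node.sym], visited)
        loc = node.lp
    return s

def get_first_set(loc):
    return collect_first(loc, set())       # fresh mark list per query

first = {nt: get_first_set(loc) for nt, loc in start.items()}
```

Because A is deletable, the start symbols of S contain b in addition to the start symbol a of A.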
GetFirstSet initializes a mark list for the prevention of cycles and calls the procedure CollectFirst, which does the actual work.

    CollectFirst(↓loc,↑s):
    param loc: Cardinal;
          s: set of terminals;
    global visited: set of nodenumbers;    -- mark list for visited nodes
           first: like in 'Get terminal start symbols';
    local sn: Symbolnode;
          gn: Graphnode;
          s1: set of terminals;
    begin
      s:={};
      while loc<>0 do                            -- for all alternatives
        if loc in visited then return; end;      -- cycle
        visited:=visited+{loc};
        GetNode(↓loc,↑gn);
        if DelNode(↓gn) then CollectFirst(↓gn.rp,↑s1); s:=s+s1; end;
        case gn.typ of
          t:   s:=s+{gn.sp};
        | nt:  if first(gn.sp).ready then s:=s+first(gn.sp).ts;
               else GetSy(↓gn.sp,↑sn);
                    CollectFirst(↓sn.start,↑s1); s:=s+s1;
               end;
        | any: s:=all terminals;
        | eps: -- nothing
        end;
        loc:=gn.lp;
      end;
    end CollectFirst;

The procedure DelNode(↓gn) from Section 7.4.1 checks if the graph node gn is deletable.

7.4.3 Terminal successors of nonterminals

The terminal successors of all nonterminals are stored in another list. They are collected in two steps: first, a search is made for the direct successors of all nonterminals (those terminals immediately following this nonterminal at all its occurrences in the graph); then the indirect successors are calculated (if a nonterminal is at the end of a rule, its indirect successors are the successors of the nonterminal on the left-hand side of this rule).

In the first step, the data structure follow is filled; this contains for each nonterminal i its direct successors (ts) and those nonterminals (nts) whose successors are indirect successors of i. In the second step, the indirect successors are added to ts.
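The second step can be sketched in Python once the result of the first step is available. The contents of follow below are invented for illustration (they would arise from a grammar such as S = A b | A. A = a B. B = c., with an assumed end-of-file symbol eof following S); only the completion step itself is shown.

```python
# Hypothetical result of the first step:
#   ts  = direct successors
#   nts = nonterminals whose successors still have to be added
follow = {
    "S": {"ts": {"eof"}, "nts": set()},
    "A": {"ts": {"b"},   "nts": {"S"}},   # A also ends a rule of S
    "B": {"ts": set(),   "nts": {"A"}},   # B ends the rule of A
}

def complete(i, visited):
    if i in visited:                      # cycle
        return
    visited.add(i)
    for j in sorted(follow[i]["nts"]):
        complete(j, visited)              # first complete j itself ...
        follow[i]["ts"] |= follow[j]["ts"]  # ... then inherit its successors

for nt in follow:
    complete(nt, set())                   # fresh mark list per nonterminal
```

B inherits the successors of A, which in turn inherits those of S, so both end up with the successors b and eof.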
    Get terminal successors:
    global follow: array(nonterminals) of
                     record
                       ts: set of terminals;       -- terminal successors
                       nts: set of nonterminals;   -- nt's whose successors
                     end;                          -- must be added to ts
           visitednod: set of nodenumbers;         -- mark list (visited nodes)
           visitedsym: set of nonterminals;        -- mark list (visited nt's)
    local  sn: Symbolnode;
           i: Cardinal;
    begin
      for all nonterminals i do follow(i).ts:={}; follow(i).nts:={}; end;
      visitednod:={};
      for all nonterminals i do           -- fill follow.ts and follow.nts
        GetSy(↓i,↑sn); CollectFollow(↓sn.start,↓i);
      end;
      for all nonterminals i do           -- complete follow.ts
        visitedsym:={}; Complete(↓i); follow(i).nts:={};
      end;
    end Get terminal successors;

The procedure CollectFollow(↓loc,↓sy) traverses the top-down graph of the nonterminal sy starting at the node loc. Every time it encounters a nonterminal i, it adds its direct successors to the set follow(i).ts. For each nonterminal i at the right end of the graph, it adds sy to the set follow(i).nts.

    CollectFollow(↓loc,↓sy):
    param loc,sy: Cardinal;
    global follow: as in 'Get terminal successors';
           visitednod: set of nodenumbers;
    local gn: Graphnode;
          s: set of terminals;
    begin
      while loc<>0 do                     -- step through alternatives chain
        if loc in visitednod then return; end;       -- cycle
        visitednod:=visitednod+{loc};
        GetNode(↓loc,↑gn);
        if gn.typ=nt then
          GetFirstSet(↓gn.rp,↑s);
          follow(gn.sp).ts:=follow(gn.sp).ts+s;
          if Deletable(↓gn.rp) then       -- nt at end of rule
            follow(gn.sp).nts:=follow(gn.sp).nts+{sy};
          end;
        end;
        CollectFollow(↓gn.rp,↓sy);
        loc:=gn.lp;
      end;
    end CollectFollow;

The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 computes the set of
terminal start symbols s of the graph with the root loc. The procedure Deletable(↓loc) from Section 7.4.1 checks whether the graph rooted at loc is deletable.

The procedure Complete(↓i) used in Get terminal successors completes the direct successors of the nonterminal i (follow(i).ts) by adding its indirect successors, which are the successors of the nonterminals contained in follow(i).nts.

    Complete(↓i):
    param i: Cardinal;
    global visitedsym: set of nonterminals;
           follow: like in 'Get terminal successors';
    local j: Cardinal;
    begin
      if i in visitedsym then return; end;       -- cycle
      visitedsym:=visitedsym+{i};
      for all j in follow(i).nts do
        Complete(↓j);
        follow(i).ts:=follow(i).ts+follow(j).ts;
      end;
    end Complete;

7.4.4 eps-sets

eps-nodes having an alternative must not be recognized by the generated syntax analyzer unless the next input symbol is a valid successor of this eps-node. In order to find out whether a symbol is a valid successor, the syntax analyzer must know the set of all possible successors of each eps-node with alternatives.

The terminal successors of an eps-node are the terminal start symbols of the subgraph rooted at the right pointer of the eps-node. If the right pointer is null, the terminal successors are the successors of the nonterminal on the left-hand side of the graph containing the eps-node.

First, the top-down graph of each nonterminal is searched for eps-nodes.

    Get eps-sets:
    global epsset: array of set of terminals;    -- successors of eps
           maxeps: Cardinal;                     -- number of eps-sets
           visited: set of nodenumbers;          -- mark list for visited nodes
    local sn: Symbolnode;
    begin
      visited:={}; maxeps:=0;
      for all nonterminals i do
        GetSy(↓i,↑sn); FindEps(↓sn.start,↓i,↓false);
      end;
    end Get eps-sets;
The procedure FindEps(↓loc,↓leftsy,↓vialp) searches the top-down graph with the root loc for eps-nodes. It computes their successors and stores them into the global array epsset. The field sp of the eps-node is set to point to this entry in epsset. The flag vialp indicates whether loc has been reached via a left pointer.

    FindEps(↓loc,↓leftsy,↓vialp):
    param loc: Cardinal;         -- root of top-down graph
          leftsy: Cardinal;      -- left side nonterminal
          vialp: Boolean;        -- true, if loc is reached via lp
    global visited: set of nodenumbers;    -- mark list for visited nodes
    local gn: Graphnode;
    begin
      if loc=0 or loc in visited then return; end;   -- end or cycle
      visited:=visited+{loc};
      GetNode(↓loc,↑gn);
      if (gn.typ=eps) and (vialp or (gn.lp<>0)) then     -- eps with alt.
        FindEpsFollowers(↓gn.rp,↓leftsy,↑gn.sp);         -- gn.sp points to
        RepNode(↓loc,↓gn);                               -- eps-set
      end;
      FindEps(↓gn.lp,↓leftsy,↓true);
      FindEps(↓gn.rp,↓leftsy,↓false);
    end FindEps;

The procedure FindEpsFollowers(↓loc,↓leftsy,↑nr) collects the terminal start symbols of the subgraph with the root loc. If the graph is deletable, the successors of the nonterminal leftsy are also added. nr is the index into the global array epsset; the collected set is stored in epsset(nr).

    FindEpsFollowers(↓loc,↓leftsy,↑nr):
    param loc,leftsy,nr: Cardinal;
    global epsset: array of set of terminals;    -- successors of eps-nodes
           follow: like in 'Get terminal successors';
           maxeps: Cardinal;
    local s: set of terminals;
    begin
      GetFirstSet(↓loc,↑s);
      if Deletable(↓loc) then s:=s+follow(leftsy).ts; end;
      maxeps:=maxeps+1; epsset(maxeps):=s; nr:=maxeps;
    end FindEpsFollowers;

The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 collects the terminal start symbols of the graph with the root loc. The procedure Deletable(↓loc) from Section 7.4.1 determines whether the graph with the root loc is deletable.
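The rule applied by FindEpsFollowers can be checked against the example of Fig. 7.12. The Python sketch below takes the start symbols, the deletability of the rest graph, and the successors of the left-hand side as ready-made inputs; the function name eps_followers and the concrete input sets are assumptions derived from the grammar X = Y a | b, Y = c | eps of that figure.

```python
def eps_followers(first_of_rest, rest_deletable, follow_of_left):
    """Successor set of an eps-node.

    first_of_rest  -- start symbols of the subgraph behind the eps-node
    rest_deletable -- True if that subgraph can produce the empty string
    follow_of_left -- successors of the left-hand side nonterminal
    """
    s = set(first_of_rest)
    if rest_deletable:
        s |= follow_of_left
    return s

# e1 in X is followed by "Y a": Y starts with c and is deletable, so a
# is a start symbol too. The rest is not deletable, so the successors
# of X (left empty here) play no role.
e1_set = eps_followers({"c", "a"}, False, set())

# e2 in Y has no right successor: the empty rest is deletable, and the
# successors of Y (namely a) are taken instead.
e2_set = eps_followers(set(), True, {"a"})
```

The first result agrees with the eps-set {a, c} given for node e1 in Section 7.3.3.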
7.4.5 any-sets

In order to recognize an any-symbol, the generated syntax analyzer needs the set of all terminals represented by the any-symbol. An any-symbol represents all terminals which are not in the alternative chain to which it belongs. For any-symbols without alternatives, no any-sets are computed. The syntax analyzer recognizes them regardless of the next input symbol.

    Get any-sets:
    global anyset: array of set of terminals;    -- any-sets
           maxany: Cardinal;                     -- number of any-sets
           eofsy: Cardinal;                      -- symbol number of eof-symbol
    local gn: Graphnode;
          s: set of terminals;
    begin
      for all nodes i do
        GetNode(↓i,↑gn);
        if (gn.typ=any) and (gn.lp<>0) then
          GetFirstSet(↓gn.lp,↑s);
          Make complement of s;
          s:=s-{eofsy};            -- eofsy must not be recognized by any
          maxany:=maxany+1; anyset(maxany):=s;
          gn.sp:=maxany;           -- sp of any-node points to any-set
          RepNode(↓i,↓gn);
        end;
      end;
    end Get any-sets;

The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 supplies the terminal start symbols of the graph with the root loc. For the calculation of an any-set, only those symbols are considered which can be reached via the left pointer of the any-node. The symbols which lie before the any-node in the alternative chain are not considered, since the syntax analyzer has already checked them before it gets to the any-node.

7.5 Grammar tests

Before Coco generates the target compiler, it carefully checks if the grammar satisfies certain requirements which are necessary for a correct compiler. Here the compiler compiler proves to be very valuable: even in large grammars, which are hard to understand for human readers, it rapidly finds hidden ambiguities or circularities. The well-known problem of the 'dangling else' clearly
shows how easily bugs in the grammar design can remain undetected without the support of an automatic tool (actually, this ambiguity was overlooked in the language definition of Algol).

Coco verifies the following properties:

1. completeness;
2. reachability;
3. noncircularity;
4. termination;
5. LL(1) property.

Fig. 7.15 Structure of Section 7.5 (completeness; reachability; noncircularity; termination; LL(1) condition)

The test algorithms are executed in the following order:

    Test grammar(↑ok):
      Test completeness(↑ok1);
      Test if all nt's can be reached(↑ok2);
      Find circular rules(↑ok3);
      Test if nt's can be derived to t's(↑ok4);
      LL1 test(↑ok5);
      ok:=ok1 and ok2 and ok3 and ok4 and ok5;
    end Test grammar;

These algorithms access the top-down graph and the symbol list with the following procedures, already described in Sections 7.2.2 and 7.3.2:

    PROCEDURE GetNode(loc: CARDINAL; VAR gn: Graphnode);
    PROCEDURE GetSy(sy: CARDINAL; VAR sn: Symbolnode);
7.5.1 Completeness

A check is made as to whether there is a rule for every nonterminal.

Basic idea: The field start in the symbol node of each nonterminal must point to a top-down graph.

    Test completeness(↑ok):
    param ok: Boolean;
    local sn: Symbolnode;
    begin
      ok:=true;
      for all nonterminals i do
        GetSy(↓i,↑sn);
        if sn.start=0 then ok:=false; end;
      end;
    end Test completeness;

7.5.2 Reachability

A check is made as to whether all declared nonterminals appear in some sentential form derived from the start symbol of the grammar.

Basic idea: First, all those nonterminals are tagged which can be derived directly from the start symbol, then those nonterminals which can be derived from symbols already tagged. This is repeated until no more nonterminals can be tagged. The untagged nonterminals are not reachable.

    Test if all nt's can be reached(↑ok):
    param ok: Boolean;
    global visited: set of nodenumbers;     -- already visited nodes
           reached: set of nonterminals;    -- reachable nonterminals
           rootsy: Cardinal;                -- start symbol of grammar
    local sn: Symbolnode;
    begin
      visited:={}; reached:={rootsy};
      GetSy(↓rootsy,↑sn); MarkReachedNts(↓sn.start);
      ok:=true;
      for all nonterminals i do
        if not (i in reached) then ok:=false; end;
      end;
    end Test if all nt's can be reached;

The procedure MarkReachedNts(↓loc) marks all nonterminals which can be reached from the node loc.
    MarkReachedNts(↓loc):
    param loc: Cardinal;
    global reached: set of nonterminals;    -- reachable nonterminals
           visited: set of nodenumbers;     -- already visited nodes
    local gn: Graphnode;
          sn: Symbolnode;
    begin
      if loc=0 or loc in visited then return; end;     -- end or cycle
      visited:=visited+{loc};                          -- visit loc
      GetNode(↓loc,↑gn);
      if (gn.typ=nt) and not (gn.sp in reached) then   -- new nt reached
        reached:=reached+{gn.sp};
        GetSy(↓gn.sp,↑sn); MarkReachedNts(↓sn.start);
      end;
      MarkReachedNts(↓gn.lp); MarkReachedNts(↓gn.rp);
    end MarkReachedNts;

7.5.3 Noncircularity

A check is made as to whether there are nonterminals which can be derived into themselves, i.e. if there are derivations X ⇒+ X for some nonterminals X. (This circularity definition differs from the usual definition in attributed grammars, which defines circular dependencies of attributes.)

Basic idea: All productions are considered which have a single nonterminal as their right-hand side. These single-nonterminal productions make up a graph that must be noncircular.

Algorithm: The graph is stored as pairs (left, right) of nonterminals for which there is a production left → right.

    Find circular rules(↑ok):
    param ok: Boolean;
    global visited: set of nodenumbers;     -- mark list for visited nodes
    local graph: array of record            -- derivation graph
                   left, right: Cardinal;
                   deleted: Boolean;
                 end;
          graphlength: Cardinal;
          singles: set of nonterminals;     -- single descendants of a nt
          sn: Symbolnode;
          changed: Boolean;
          i,j: Cardinal;
    begin
      graphlength:=0;
      for all nonterminals i do             -- build the graph
        visited:={}; singles:={};
        GetSy(↓i,↑sn); GetSingles(↓sn.start,↕singles);
        for all nonterminals j in singles do
          graphlength:=graphlength+1;
          with graph(graphlength) do
            left:=i; right:=j; deleted:=false;
          end;
        end;
      end;
      repeat                     -- remove edges, which are not on a cycle
        changed:=false;
        for i:=1 to graphlength do
          if not graph(i).deleted and
             (graph(i).left not on any right-hand side
              or graph(i).right not on any left-hand side) then
            graph(i).deleted:=true; changed:=true;
          end;
        end;
      until not changed;
      ok:=graph is empty;
    end Find circular rules;

The elements that have not been deleted in the graph represent the circular part of the grammar.

The procedure GetSingles(↓loc,↕singles) collects a set (singles) of nonterminals in the top-down graph with the root loc. If the graph can be derived into a single nonterminal X, then X is added to singles. The following assertion always holds: loc is on a path which contains only deletable symbols between its beginning and loc.

    GetSingles(↓loc,↕singles):
    param loc: Cardinal;
          singles: set of nonterminals;
    global visited: set of nodenumbers;
    local gn: Graphnode;
    begin         -- assert: all nodes left to loc are deletable
      if loc=0 or loc in visited then return; end;     -- end or cycle
      visited:=visited+{loc};
      GetNode(↓loc,↑gn);
      if (gn.typ=nt) and Deletable(↓gn.rp) then        -- right subgraph
        singles:=singles+{gn.sp};                      -- deletable
      end;
      if DelNode(↓gn) then GetSingles(↓gn.rp,↕singles); end;
      GetSingles(↓gn.lp,↕singles);
    end GetSingles;

A nonterminal X is added to singles if it is on a path from loc to the end of the top-down graph and if this path has only deletable nodes to the left and right of X. The deletability of subgraphs and nodes is determined by the procedures Deletable and DelNode from Section 7.4.1.
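The edge-pruning loop of Find circular rules can be sketched in Python on a plain list of (left, right) pairs. The function name circular_part and the sample edges are assumptions for illustration; the pairs stand for the single-nonterminal productions that GetSingles would deliver.

```python
def circular_part(edges):
    """Repeatedly drop edges that cannot lie on a cycle."""
    alive = set(edges)
    changed = True
    while changed:
        changed = False
        lefts = {left for left, _ in alive}
        rights = {right for _, right in alive}
        for left, right in sorted(alive):
            # An edge is on a cycle only if something leads into its
            # left end and something leads out of its right end.
            if left not in rights or right not in lefts:
                alive.discard((left, right))
                changed = True
    return alive

# A -> B and B -> A form a cycle; C -> A and B -> D only touch it.
cyclic = circular_part([("A", "B"), ("B", "A"), ("C", "A"), ("B", "D")])

# A simple chain contains no cycle at all.
acyclic = circular_part([("A", "B"), ("B", "C")])
```

The grammar passes the test exactly when the returned set is empty, as for the chain in the second call.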
7.5.4 Termination

A check is made as to whether all nonterminals can be derived into (possibly empty) strings of terminals.

Basic idea: Those nonterminals are tagged which are deletable or can be derived into a string consisting only of terminals or already tagged nonterminals. This is repeated until no more nonterminals can be tagged. The untagged nonterminals are those which cannot be derived into terminals.

    Test if nt's can be derived to t's(↑ok):
    param ok: Boolean;
    global visited: set of nodenumbers;     -- mark list for visited nodes
           termlist: set of nonterminals;   -- nonterminals which can be
                                            -- derived to terminals
    local changed: Boolean;
          sn: Symbolnode;
    begin
      termlist:={};
      repeat
        changed:=false;
        for all nonterminals i which are not in termlist do
          GetSy(↓i,↑sn); visited:={};
          if IsTerm(↓sn.start) then
            termlist:=termlist+{i}; changed:=true;
          end;
        end;
      until not changed;
      ok:=all nonterminals are in termlist;
    end Test if nt's can be derived to t's;

The procedure IsTerm(↓loc) checks if the top-down graph with the root loc has a (possibly empty) path which consists only of terminals or already tagged nonterminals.

    IsTerm(↓loc) Boolean:
    param loc: Cardinal;
    global visited: set of nodenumbers;
           termlist: set of nonterminals;
    local gn: Graphnode;
    begin
      if loc=0 or loc in visited then return false; end;   -- end or cycle
      visited:=visited+{loc};
      GetNode(↓loc,↑gn);
      if (gn.typ=nt) and not (gn.sp in termlist) then
        return IsTerm(↓gn.lp);
      else return (gn.rp=0) or IsTerm(↓gn.rp) or IsTerm(↓gn.lp);
      end;
    end IsTerm;
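The tagging idea can be tried without the graph machinery. The Python sketch below is a simplification, not Coco's algorithm: it represents each nonterminal by a list of alternatives, each alternative being a list of symbol names, whereas Coco walks the top-down graph with IsTerm. The grammar and the name terminating are invented for illustration.

```python
def terminating(grammar):
    """Return the nonterminals that can be derived into terminal strings.

    grammar maps each nonterminal to its alternatives; every symbol
    that is not a key of grammar counts as a terminal.
    """
    tagged = set()
    changed = True
    while changed:                         # repeat until nothing new
        changed = False
        for nt, alternatives in grammar.items():
            if nt in tagged:
                continue
            for alt in alternatives:
                # An alternative qualifies if it is empty or consists
                # only of terminals and already tagged nonterminals.
                if all(sym not in grammar or sym in tagged for sym in alt):
                    tagged.add(nt)
                    changed = True
                    break
    return tagged

# Hypothetical grammar:  S = A b | C.   A = a.   C = C c.
grammar = {
    "S": [["A", "b"], ["C"]],
    "A": [["a"]],
    "C": [["C", "c"]],
}
ok_set = terminating(grammar)
```

C can never get rid of itself, so it stays untagged, and the grammar would fail the termination test.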
7.5.5 LL(1) condition

A check is made as to whether it is always possible to decide which path of the top-down graph should be followed during syntax analysis depending on the next input symbol.

Basic idea: The LL(1) test consists of the following two subtests:

1. The terminal start symbols of all alternatives in an alternative chain must be disjoint.
2. The terminal start symbols of deletable subgraphs must be different from the terminal successors of the left-hand side nonterminal.

  LL1 test(↑ok):
    param   ok:      Boolean;
    global  visited: set of nodenumbers;  -- mark list for visited nodes
    local   sn:      Symbolnode;
  begin
    ok:=true; visited:={};
    for all nonterminals i do
      GetSy(↓i,↑sn);
      CheckAlternatives(↓sn.start,↓i,↑ok);
    end;
  end LL1 test;

The procedure CheckAlternatives(↓loc,↓sy,↑ok) checks if the alternative chain with the root loc contains only alternatives with distinct start symbols (subtest 1). If the subgraph rooted at loc is deletable (i.e. if it can produce the empty string), it is also checked whether the start symbols of the subgraph are different from the successors of the left-hand side nonterminal sy (subtest 2). CheckAlternatives uses GetF(↓sy,↑first) and GetFo(↓sy,↑follow) to access the already calculated sets of terminal start symbols and successors of nonterminals.

  CheckAlternatives(↓loc,↓sy,↑ok):
    param   loc,sy:  Cardinal;
            ok:      Boolean;
    global  visited: set of nodenumbers;  -- mark list for visited nodes
    local   first:   set of terminals;
            follow:  set of terminals;
            locset:  set of terminals;    -- start symbols of current node
            s:       set of terminals;    -- start symbols of prev. alt.
            gn:      Graphnode;
  begin
    if loc=0 or loc in visited then return; end;  -- end or cycle
    -- subtest 2
    if Deletable(↓loc) then
      GetFirstSet(↓loc,↑s); GetFo(↓sy,↑follow);
      if s * follow <> {} then ok:=false; end;
    end;
    s:={};
    -- subtest 1, for all alternatives
    while loc<>0 do
      if loc in visited then return; end;
      visited:=visited+{loc};
      GetNode(↓loc,↑gn);
      if DelNode(↓gn) then GetFirstSet(↓gn.rp,↑locset);
      else locset:={};
      end;
      case gn.typ of
        t:       locset:=locset+{gn.sp};
      | nt:      GetF(↓gn.sp,↑first); locset:=locset+first;
      | eps,any: -- nothing
      end;
      if s * locset <> {} then ok:=false; end;
      s:=s+locset;
      CheckAlternatives(↓gn.rp,↓sy,↑ok);
      loc:=gn.lp;
    end;
  end CheckAlternatives;

The procedures Deletable(↓loc) and DelNode(↓gn) from Section 7.4.1 check whether the top-down graph with the root loc or the graph node gn are deletable. The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 supplies the terminal start symbols s of the top-down graph with the root loc.

7.6 Generation of the parser tables

When the grammar tests are completed, Coco can generate the target compiler. From the symbol list and the top-down graph, the parser tables which drive the generated compiler are constructed. The tables contain information for the recognition of symbols and for error handling, including the G-code which controls the syntax analysis. This section is structured as shown in Fig. 7.16.

7.6.1 Table format

The parser tables are inserted into the generated syntax analyzer as initialization code. Table 7.1 shows their contents:
[Fig. 7.16 Structure of Section 7.6: Table format; Generation of the G-code; Generation of the remaining tables]

Table 7.1 Contents of the parser tables

  header             table dimensions (for decoding)
  code               G-code
  ntsymbols          information about nonterminals
  epssets            sets of valid successors, one for each eps-instruction in the G-code
  anysets            sets of terminals represented by each any-symbol
  attribute numbers  number of attributes for each terminal and each pragma
  pragma semantics   for each pragma, the semantic actions to be executed when the pragma is recognized
  namelist           symbol names for error messages
  name pointers      pointers to the symbol names

The structure of the above data is shown by the following Modula-2 type declarations:

  TYPE
    Header = RECORD
               maxcodevar, maxtvar, maxpvar, maxsvar, maxepsvar,
               maxanyvar, maxnamevar, maxnamepvar: CARDINAL;
             END;
    Code = ARRAY[1..maxcode] OF [0..255];
    Symbolset = ARRAY[0..maxt DIV 16] OF BITSET;
    Ntsymbols = ARRAY[maxp+1..maxsym] OF RECORD
                  startpc:  CARDINAL;   (*start of rule in G-code*)
                  del:      BOOLEAN;    (*true, if deletable*)
                  startset: Symbolset;  (*terminal start symbols*)
                END;
    Epsset = ARRAY[1..maxeps] OF Symbolset;
    Anyset = ARRAY[1..maxany] OF Symbolset;
    Attributenumbers = ARRAY[0..maxp] OF [0..255];
    Pragmasemantics = ARRAY[maxt..maxp] OF RECORD  (*element maxt is a dummy*)
                        sem1,sem2: CARDINAL;
                      END;
    Namelist = ARRAY[1..maxname] OF CHAR;
    Namepointers = ARRAY[0..maxnamep] OF CARDINAL;
    Checksum = CARDINAL;

The constants maxcode, maxt, maxp, etc. are the table dimensions derived from the input grammar. They are inserted into the generated syntax analyzer as constant declarations. The header of the parser tables contains the same values again as variables. However, they are not used by the syntax analyzer, but are reserved for a decoding program.

7.6.2 Generation of the G-code

The G-code is derived from the top-down graph. This process is very simple: A recursive algorithm visits all nodes of the top-down graph and translates them into G-code instructions. The simplified algorithm is shown below:

  GenCode(↓node):
    Generate code for node;
    if (node.rp<>0) and (node.rp not yet visited) then
      GenCode(↓node.rp);
    end;
    if (node.lp<>0) and (node.lp not yet visited) then
      GenCode(↓node.lp);
    end;
  end GenCode;

Each node is processed as follows (for the definition of the G-code, see Section 2.4 or Appendix D):

1. Depending on the node type, a G-code instruction for the recognition of this node is generated (T, NT, NTS, ANY and EPS instructions). For nodes with a nonzero left pointer value, the generated instruction also contains the address of the corresponding alternative (TA, NTA, NTAS, ANYA and EPSA instructions).
2. If semantic actions are specified in the node, SEM instructions are generated.
3. If the right pointer of the node is zero, a RET instruction is generated.
4. If the right pointer points to an already visited node, a JMP instruction to the address of this node is generated.

In order to resolve jumps and addresses of alternatives, an address list of all G-code sequences generated from graph nodes is needed. It is handled by the following procedures:
  PROCEDURE NewAdr(loc:CARDINAL; adr:CARDINAL);
  PROCEDURE GetAdr(loc,fixup:CARDINAL; VAR adr:CARDINAL);
  PROCEDURE Visited(loc:CARDINAL): BOOLEAN;

NewAdr defines that the G-code sequence generated from node loc has the address adr.

GetAdr returns the address adr of the G-code sequence corresponding to node loc. If the address is not yet in the address list, then adr is zero. In this case, fixup is remembered as a G-code location where the node's address is to be entered as soon as it becomes known. An address becomes known when it is defined by NewAdr. It is then automatically entered into all fixup locations waiting for this address.

Visited returns TRUE if the address of the node with number loc is already known.

Two additional procedures are needed: one to emit G-code instructions and one to access nodes of top-down graphs:

  PROCEDURE Emit(VAR pc:CARDINAL; code:Instruction);
  PROCEDURE GetNode(loc:CARDINAL; VAR node:Graphnode);

Emit writes the specified instruction code into the code segment at the location pc and increases the code segment length accordingly. Here, Instruction is a symbolic type that is represented by the text of the instruction. The actual implementation deviates from this.

GetNode gets the graph node with the node number loc. The type Graphnode is described in Section 7.3.1.

The actual algorithm for the generation of the G-code follows:

  Generate G-code:
    local pc: Cardinal;
  begin
    pc:=1;
    for all nonterminals i do
      GenCode(↓root of top-down graph of nonterminal i, ↕pc);
    end;
  end Generate G-code;

GenCode(↓loc,↕pc) is a recursive procedure which will now be refined. It translates the top-down graph with the root loc into a corresponding G-code sequence and inserts it into the code segment at the location pc. When GenCode arrives at a node loc that has already been visited, the G-code for the subgraph at loc has already been generated, so this node does not have to be revisited.
  GenCode(↓loc,↕pc):
    param  loc,pc: Cardinal;
    var    node:   Graphnode;
           adr,nr: Cardinal;
  begin
    if Visited(↓loc) then return; end;
    NewAdr(↓loc,↓pc);  -- now visit loc
    GetNode(↓loc,↑node);
    case node.typ of
      t:   if node.lp=0 then
             Emit(↕pc,↓"T node.sp");
           else
             GetAdr(↓node.lp,↓pc+2,↑adr);
             Emit(↕pc,↓"TA node.sp,adr");
           end;
    | nt:  if node.lp=0 then
             if node.sem1=0 then Emit(↕pc,↓"NT node.sp");
             else Emit(↕pc,↓"NTS node.sp,node.sem1");
             end;
           else
             GetAdr(↓node.lp,↓pc+2,↑adr);
             if node.sem1=0 then Emit(↕pc,↓"NTA node.sp,adr");
             else Emit(↕pc,↓"NTAS node.sp,adr,node.sem1");
             end;
           end;
    | any: if node.sp=0 then
             Emit(↕pc,↓"ANY");
           else
             GetAdr(↓node.lp,↓pc+2,↑adr);
             Emit(↕pc,↓"ANYA node.sp,adr");
           end;
    | eps: if node.sp<>0 then  -- node with eps-set
             if node.lp=0 then
               Emit(↕pc,↓"EPS node.sp");
             else
               GetAdr(↓node.lp,↓pc+2,↑adr);
               Emit(↕pc,↓"EPSA node.sp,adr");
             end;
           end;
    end;  -- case
    if node.sem2<>0 then Emit(↕pc,↓"SEM (node.sem2)"); end;
    if node.sem3<>0 then Emit(↕pc,↓"SEM (node.sem3)"); end;
    if node.rp=0 then
      Emit(↕pc,↓"RET");
    elsif Visited(↓node.rp) then
      GetAdr(↓node.rp,↓pc+1,↑adr);
      Emit(↕pc,↓"JMP adr");
    end;
    if node.rp<>0 then GenCode(↓node.rp,↕pc); end;
    if node.lp<>0 then GenCode(↓node.lp,↕pc); end;
  end GenCode;
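The interplay of NewAdr, GetAdr and Visited is a classic backpatching scheme. The following Python sketch shows one way such an address list could behave. It is only an illustration under stated assumptions: the class AddressList, its method names and the use of a Python list as code segment are invented for this example, and list indices stand in for G-code addresses.

```python
class AddressList:
    """Hypothetical model of the address list used during G-code generation."""

    def __init__(self, code):
        self.code = code    # code segment: a list whose cells can be patched
        self.addr = {}      # node number -> address of its G-code sequence
        self.fixups = {}    # node number -> code locations waiting for it

    def visited(self, loc):
        # A node counts as visited once its address is known.
        return loc in self.addr

    def new_adr(self, loc, adr):
        # Define the address and enter it into all waiting fixup locations.
        self.addr[loc] = adr
        for fixup in self.fixups.pop(loc, []):
            self.code[fixup] = adr

    def get_adr(self, loc, fixup):
        # Known address: return it. Unknown: remember fixup, return zero.
        if loc in self.addr:
            return self.addr[loc]
        self.fixups.setdefault(loc, []).append(fixup)
        return 0
```

A forward reference is emitted with the provisional address zero and is corrected as soon as new_adr is called for the target node. This only works if the whole code segment is still available for patching at that time.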
The G-code is completely stored in memory so that the missing addresses can be inserted when they become known.

7.6.3 Generation of the remaining tables

Besides the G-code, the contents of the generated tables are almost entirely extracted from the symbol list. Only the name list is handled by the lexical analyzer of Coco. Coco gets the necessary data from the symbol list and from the lexical analyzer with the help of access procedures, and writes them unchanged into the syntax analyzer as initialization values.

7.7 Generation of the syntax analyzer

Coco generates a table-driven LL(1) syntax analyzer with error handling in the form of a Modula-2 source module which the user must compile and include in his compiler. The syntax analyzer is the implementation of the analysis algorithm described in Section 2.5. It is the same for all generated compilers. Only the parser tables differ from compiler to compiler, so they have to be inserted into the otherwise invariant parser module.

[Fig. 7.17 Structure of Section 7.7]

The definition module and the implementation module of the syntax analyzer are generated from a frame text which Coco reads from the file cocosynframe. At certain locations grammar-dependent parts have to be inserted into this frame. The locations are marked by the string '-->' and a descriptive name of the text to be inserted. The following table shows what has to be inserted at these locations.
  -->modulename     grammar name + syn
  -->semantic       grammar name + sem (analyzer module)
  -->input          grammar name + lex
  -->declarations   table dimensions declared as constants (see example in Section 8.3)
  -->tables         table values

The syntax analyzer contains references to other modules (e.g. the lexical analyzer or the semantic evaluator) whose names are constructed from the grammar name (the name of the root symbol in the attributed grammar) and from a suffix. The resulting syntax analyzer is written to the files grammarnamesyn.DEF and grammarnamesyn.MOD.

Coco uses a procedure CopyFramePart to copy pieces of text from the frame to the syntax analyzer module.

  PROCEDURE CopyFramePart(VAR source,target:File; str:ARRAY OF CHAR);

CopyFramePart copies text from the file source to the file target until it encounters the string str (str is not copied). When it is next called, it continues copying the text immediately behind str. This procedure is called with the name of the next piece of text to be inserted (e.g. '-->tables'). It copies the frame up to this name and then Coco inserts the specified text in place of the name. This process is repeated until the entire syntax analyzer has been generated.

A source listing of cocosynframe is shown in Appendix F. The module cocosyn, also shown in Appendix F, is an example of a syntax analyzer generated by this process.

7.8 Generation of the semantic evaluator

In addition to the syntax analyzer and the parser tables, Coco also generates a semantic evaluator. This is a Modula-2 source module which the user must compile and include in his compiler. The semantic evaluator consists of some invariant parts and of the semantic actions and declarations which are copied from the attributed grammar. Its generation can be divided into three tasks:

1. copy the semantic declarations from the attributed grammar to the semantic evaluator;
2. translate the semantic actions into components of a case statement;
3. generate new semantic actions (assignments) for attribute passing.

Before covering these three tasks in detail, we will describe the invariant parts of the semantic evaluator.
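The frame mechanism of Section 7.7, which is used again for the semantic evaluator, can be sketched in a few lines of Python. This is an illustration only: copy_frame_part mirrors the described behaviour of CopyFramePart on in-memory text streams, and the helper fill_frame as well as the sample frame in the usage note are hypothetical.

```python
import io


def copy_frame_part(source, target, marker):
    """Copy text from source to target up to, but not including, marker.

    The read position of source is left immediately behind the marker, so
    the next call continues from there. Returns False if the marker does
    not occur; in that case the rest of source has been copied.
    """
    pending = ""
    while True:
        ch = source.read(1)
        if ch == "":                  # end of frame: flush what is left
            target.write(pending)
            return False
        pending += ch
        if pending.endswith(marker):
            target.write(pending[:-len(marker)])
            return True


def fill_frame(frame_text, inserts):
    """Replace each marker, in order of occurrence, by its insert text."""
    source, target = io.StringIO(frame_text), io.StringIO()
    for marker, text in inserts:
        copy_frame_part(source, target, marker)
        target.write(text)
    target.write(source.read())       # copy the remainder of the frame
    return target.getvalue()
```

For a frame such as "MODULE -->modulename; ... END -->modulename." the caller passes the marker '-->modulename' twice, once for each occurrence, just as Coco calls CopyFramePart once per location.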
[Fig. 7.18 Structure of Section 7.8: Constant parts of the semantic evaluator; Translation of semantic declarations; Translation of semantic actions; Attribute processing]

7.8.1 The invariant parts of the semantic evaluator

Like the syntax analyzer, the semantic analyzer is derived from a frame module which Coco reads from the file cocosemframe. Again Coco copies the frame using the procedure CopyFramePart (see Section 7.7) and inserts grammar-dependent parts at some specified places in the frame. These places are:

  -->modulename     grammar name + sem
  -->scannername    grammar name + lex
  -->declarations   semantic declarations of the grammar
  -->actions        semantic actions of the grammar

The frame module is as follows:

  DEFINITION MODULE -->modulename;
    VAR printactions: BOOLEAN;
    PROCEDURE Semant(sem:CARDINAL);
  END -->modulename.

  IMPLEMENTATION MODULE -->modulename;
    FROM SYSTEM IMPORT WORD;
    FROM -->scannername IMPORT at;
    -->declarations

    PROCEDURE ASSIGN(VAR x:WORD; y:WORD);
    BEGIN x:=y
    END ASSIGN;

    PROCEDURE Semant(sem:CARDINAL);
    BEGIN
      CASE sem OF
        11: ;  (*action numbers start at 12*)
        -->actions
      END;
    END Semant;
  END -->modulename.

The resulting semantic analyzer is written to the files grammarnamesem.DEF and grammarnamesem.MOD. The user may set the exported variable printactions to TRUE if he wants a trace of the executed semantic actions.

7.8.2 Processing of the semantic declarations

The semantic declarations, which are written in Modula-2, are copied immediately and without change from the attributed grammar to the frame program, and are inserted at the location marked by '-->declarations'. This happens in the following manner: the lexical analyzer of Coco returns the symbols of the Modula-2 text to the syntax analyzer as Cocol symbols, and from there they go to the semantic evaluator. The procedure Copy(↓typ,↓col) is called for each symbol to translate its symbol code back into its source text, which is then inserted into the frame module.

Problems can arise since the Modula-2 text may contain symbols that are not Cocol symbols (e.g. +, *, &). Such symbols are copied by means of a trick: the lexical analyzer assigns them a special symbol code (nococosy) and an attribute (spix). They are treated like names and entered into the name list. spix is their address in the name list, which allows their source text to be accessed.

In order to keep the name list small, the Modula-2 names are entered only temporarily. Permanent storage is prevented with the procedure StopHash. This causes a name to be entered, but overwritten by the next name, so the names can be accessed via their addresses just like the permanently stored names, but only until the next name has been recognized. The procedure RestartHash re-establishes permanent storage.

Coco copies the declarations without checking the syntax. If there are syntax errors, they will be detected by the Modula-2 compiler when the generated semantic evaluator is compiled.
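The temporary entry of names between StopHash and RestartHash can be pictured with the following Python sketch. It is a guess at the observable behaviour only, not a description of Coco's name list: the class NameList, the zero character used as terminator and the way lookups are handled are all assumptions made for this example.

```python
class NameList:
    """Hypothetical name list with permanent and temporary entries."""

    def __init__(self):
        self.chars = []        # name texts, stored back to back
        self.index = {}        # permanent names -> address (spix)
        self.top = 0           # first position behind the permanent names
        self.permanent = True

    def stop_hash(self):
        self.permanent = False

    def restart_hash(self):
        self.permanent = True
        del self.chars[self.top:]      # forget the last temporary name

    def enter(self, name):
        if self.permanent and name in self.index:
            return self.index[name]
        del self.chars[self.top:]      # overwrite a temporary name, if any
        spix = self.top
        self.chars.extend(name + "\0")
        if self.permanent:
            self.index[name] = spix
            self.top = len(self.chars)
        return spix

    def text(self, spix):
        end = self.chars.index("\0", spix)
        return "".join(self.chars[spix:end])
```

In this model every temporary name receives the same address, which stays valid until the next name is entered. That is sufficient for Copy, because each symbol is written to the output before the next one is read.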
We now describe the translation of the semantic declarations by an attributed grammar in Cocol.

  GRAMMAR Declarations
  SEMANTIC DECLARATIONS
    FROM cocogen IMPORT Copy;
    FROM cocolex IMPORT col, typ, StopHash, RestartHash;

    --PROCEDURE Copy(typ,col:CARDINAL);
    --  writes the source text of the symbol 'typ' to the generated
    --  semantic analyzer. col is the symbol column in the grammar.

  TERMINALS
    SEMANTICSY DECLARATIONSY
  NONTERMINALS
    Declarations
  RULES
    Declarations =
      SEMANTICSY DECLARATIONSY    sem StopHash endsem
      { any                       sem Copy(↓typ,↓col) endsem
      }                           sem RestartHash endsem.
  ENDGRAM

7.8.3 Processing of the semantic actions

Coco translates the semantic actions of the attributed grammar into continuously numbered variants of a case statement, and inserts them into the semantic frame program at the location marked by the string '-->actions'. Like the declarations, the semantic actions are copied unchanged and without a syntax check. Again, each symbol is copied by translating its symbol code back into its source text. We describe this process in Cocol.

  GRAMMAR SemAction
  SEMANTIC DECLARATIONS
    FROM cocogen IMPORT Copy, OpenSem;
    FROM cocosym IMPORT NewMacro, GetMacroNr;
    FROM cocolex IMPORT col, typ, StopHash, RestartHash;
    FROM Errors IMPORT SemErr;

    --PROCEDURE OpenSem(VAR sem:CARDINAL);
    --  generates a new case label and returns its number sem.
    --PROCEDURE GetMacroNr(spix:CARDINAL; VAR sem:CARDINAL);
    --  gets the action number sem of the macro 'spix'. If the macro
    --  does not exist, sem=0.

    VAR spix,sem: CARDINAL;

  TERMINALS
    SEMSY ENDSEMSY "(" ")"
    IDENT<out:spix>
  NONTERMINALS
    SemAction<out:sem>
  RULES
    SemAction<out:sem> =
      SEMSY
      ( "(" IDENT<out:spix>       sem GetMacroNr(↓spix,↑sem);
                                      IF sem=0 THEN SemErr(1) END endsem
        ")"
      |                           sem OpenSem(↑sem); StopHash endsem
        { any                     sem Copy(↓typ,↓col) endsem
        }
      )
      ENDSEMSY                    sem RestartHash endsem.
  ENDGRAM
The above grammar also shows how semantic macros are processed. The module cocosym handles a list of macro names and their corresponding semantic action numbers. The action number of a macro is supplied by the procedure GetMacroNr.

7.8.4 Attribute processing

While declarations and semantic actions need only to be copied from the attributed grammar into the semantic evaluator, attributes need further processing. For each symbol, its attributes must be stored in the symbol list, and must be checked for consistency every time this symbol occurs. In addition to this, Coco must generate semantic actions by which values are assigned to the attributes at run-time.

The processing of attributes depends on the context in which they appear. In Cocol there are three different places where attributes may occur:

1. at the declaration of a syntax symbol;
2. at a nonterminal on the left-hand side of a rule;
3. at a symbol on the right-hand side of a rule.

We will now describe the processing of attributes in each of these three cases, and then summarize it by an attributed grammar.

Declaration of attributes

Attributes are declared together with syntax symbols and are entered into the symbol list. The context of attribute declarations is:

  SyntaxDeclarations =
    TERMINALS {Symbol [Attributes] [AliasName]}
    [ PRAGMAS {Symbol [Attributes] [SemAction]} ]
    NONTERMINALS {identifier [Attributes] [AliasName]}.

Coco uses the procedure NewAt to enter an attribute into the symbol list.

  TYPE Direction = (up,down);
  PROCEDURE NewAt(sy,spix:CARDINAL; dir:Direction);

NewAt enters an attribute spix with the direction dir for the symbol sy. Depending on the kind of sy, the following information is stored:

  for terminal symbols:  number of attributes;
  for pragmas:           number of attributes;
  for nonterminals:      number, name, and direction of attributes.
Attributes on the left-hand side of productions

Attributes on the left-hand side of productions are called formal attributes. Their context is:

  Rule = identifier [Attributes] "=" Expression "." .

Formal attributes are checked for consistency with their declaration. For every left-hand side nonterminal the number of attributes, their names, order, and direction must agree with the attributes declared for this nonterminal.

The procedure GetAt is used to access the attribute information in the symbol list. It gets the name (spix) and the direction (dir) of the nth attribute of the nonterminal sy. If sy has fewer than n attributes, then spix is zero.

  PROCEDURE GetAt(sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);

Attributes on the right-hand side of productions

Attributes on the right-hand side of productions appear as actual attributes of syntax symbols in EBNF expressions.

  Expression = Term {"|" Term}.
  Term       = Factor {Factor}.
  Factor     = Symbol [Attributes] | ...

In this context, attributes denote semantic values which result from the recognition of a syntax symbol, or which are required for its recognition. Coco generates assignments between the attribute values and the attribute names, and includes them as semantic actions in the evaluator program. It also checks whether the number of attributes, their order and their direction agree with the corresponding attribute declaration.

Attribute assignments for terminals and pragmas

The lexical analyzer of the generated compiler exports the attribute values of terminals and pragmas in the variable at. The array at is filled for each symbol by the lexical analyzer. A terminal (or pragma) t<out:a,b> is handled by the generated compiler as follows:

  recognize t and fill at;
  a:=at(1);
  b:=at(2);

When t has been recognized, a semantic action must be executed in which the attribute values at(1) and at(2) are assigned to the attributes a and b.
Since such an action does not exist, Coco must generate it.

Attribute assignments for nonterminals

For nonterminals, attribute assignments occur between formal and actual attributes. A nonterminal nt<in:a,b; out:c,d> is handled by the generated compiler as follows:

  formal attribute corresponding to a := a;
  formal attribute corresponding to b := b;
  parse nt;
  c := formal attribute corresponding to c;
  d := formal attribute corresponding to d;

Again Coco must generate semantic actions for the attribute assignments.

Generation of attribute assignments

For each attribute on the right-hand side of a production, Coco calls the procedure GenAssign, which generates an assignment of the corresponding attribute value to the attribute variable.

  TYPE Attrkind = (term,     (*attribute of a terminal*)
                   nonterm,  (*attribute of a nonterminal*)
                   const);   (*const. value as an attribute of an nt*)
  PROCEDURE GenAssign(typ:Attrkind; left,right:CARDINAL);

Table 7.2 shows the meaning of the parameters left and right depending on the value of the parameter typ. It also shows which code is generated:

Table 7.2 Parameters of GenAssign and the generated code

  Value of typ   Meaning of left     Meaning of right     Generated code
  term           spix of left side   index in at          name(left):=at[right]
  nonterm        spix of left side   spix of right side   name(left):=name(right)
  const          spix of left side   constant             name(left):=right

name(spix) denotes the name at the address spix in the name list. The array at is exported by the lexical analyzer and contains the attribute values of the most recently recognized terminal.

The procedure EmitAction builds a semantic action from the attribute assignments generated since its last call. It inserts the action as a variant of a case statement into the semantic evaluator. Thus, the semantic evaluator contains not only the semantic actions of the attributed grammar, but also the actions generated from attributes by Coco. EmitAction returns the action number of the generated semantic action.

  PROCEDURE EmitAction(VAR sem:CARDINAL);
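How GenAssign and EmitAction cooperate can be illustrated by a small Python model that follows Table 7.2. Everything beyond the table is an assumption: the class ActionGenerator, the dictionary standing in for the name list, the textual form of the case variant, the first action number 12 (taken from the comment in the frame module), and the convention of returning zero when no assignment is pending.

```python
class ActionGenerator:
    """Hypothetical model of GenAssign and EmitAction."""

    def __init__(self, names, first_action=12):
        self.names = names            # spix -> name text (stand-in name list)
        self.pending = []             # assignments since the last emit_action
        self.next_action = first_action

    def gen_assign(self, kind, left, right):
        target = self.names[left]
        if kind == "term":            # value delivered by the lexical analyzer
            self.pending.append(f"{target}:=at[{right}]")
        elif kind == "nonterm":       # assignment between two attribute names
            self.pending.append(f"{target}:={self.names[right]}")
        elif kind == "const":         # constant used as an actual attribute
            self.pending.append(f"{target}:={right}")
        else:
            raise ValueError(f"unknown attribute kind: {kind}")

    def emit_action(self):
        """Return (action number, case variant text); (0, '') if nothing is pending."""
        if not self.pending:
            return 0, ""
        number = self.next_action
        self.next_action += 1
        variant = f"| {number}: " + "; ".join(self.pending)
        self.pending = []
        return number, variant
```

For a terminal t<out:a,b> the generator is called twice with kind "term" and the attribute positions 1 and 2. The subsequent call of emit_action then yields one case variant that contains both assignments.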
Optimization of attribute passing

Coco performs two optimizations to reduce the number of attribute assignments:

1. If the formal and the actual attribute of a nonterminal have identical names, no assignment is generated.
2. Identical semantic actions (with the same assignments) are generated only once.

Description of the attribute processing in Cocol

We will now summarize the attribute processing, describing it by an attributed grammar in Cocol. The start symbol of the grammar is the nonterminal Attributes. The grammar is a segment of a larger grammar in which attributes can appear in various contexts. Therefore, Attributes has three input attributes which control its processing.

  Attributes<in:sy,styp,kind; out:sem1,sem2>

sy denotes the symbol to which the attributes belong; styp specifies the type of this symbol; kind is the context in which the attributes are being used, indicating how they are to be processed:

  kind=def:    treat them as an attribute declaration;
  kind=check:  perform a consistency check (when used on the left-hand side of a production);
  kind=use:    generate semantic actions for attribute passing (when used on the right-hand side of a production).

sem1 and sem2 are the numbers of the generated semantic actions for input and output attribute passing (or zero).

  GRAMMAR Attributes
  SEMANTIC DECLARATIONS
    FROM cocosym IMPORT NewAt, GetAt, CompleteAt, Direction, Symboltype, Usage;
    FROM cocogen IMPORT Attrtype, EmitAction, GenAssign;
    FROM Errors IMPORT SemErr;

    --TYPE
    --  Attrtype = (term,nonterm,const);
    --  Direction = (up,down);          (*up: out-attribute, down: in-attribute*)
    --  Usage = (def,check,use);        (*attribute context*)
    --  Symboltype = (eps,t,pr,nt,any);
    --PROCEDURE NewAt(sy,spix:CARDINAL; dir:Direction);
    --  declares an attribute for the symbol sy with the name spix and
    --  the direction dir.
    --PROCEDURE GetAt(sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
    --  gets the name spix and the direction dir of attribute number n
    --  of symbol sy. If sy has less than n attributes, then spix=0.
    --PROCEDURE CompleteAt(sy,n:CARDINAL): BOOLEAN;
    --  returns true if symbol sy has exactly n attributes.

    VAR sy,spix,spix1,sem1,sem2,n,val: CARDINAL;
        styp:     Symboltype;
        kind:     Usage;
        dir,dir1: Direction;

  MACROS
    sem :AssignInAt:
      n:=n+1;
      CASE kind OF
        use:   IF styp=nt THEN
                 GetAt(↓sy,↓n,↑spix1,↑dir1);
                 IF spix1<>0 THEN
                   IF dir=dir1 THEN GenAssign(↓nonterm,↓spix1,↓spix)
                   ELSE SemErr(2)
                   END
                 END
               END;
      | check: IF styp=nt THEN
                 GetAt(↓sy,↓n,↑spix1,↑dir1);
                 IF spix1<>0 THEN
                   IF spix<>spix1 THEN SemErr(3) END;
                   IF dir<>dir1 THEN SemErr(2) END
                 END
               END;
      | def:   NewAt(↓sy,↓spix,↓dir)
      END  -- CASE
    endsem

    sem :AssignNumber:
      n:=n+1;
      IF kind=use THEN
        IF styp=nt THEN
          GetAt(↓sy,↓n,↑spix1,↑dir1);
          IF spix1<>0 THEN
            IF dir=dir1 THEN GenAssign(↓const,↓spix1,↓val)
            ELSE SemErr(2)
            END
          END
        END
      ELSE SemErr(4)
      END
    endsem

    sem :AssignOutAt:
      n:=n+1;
      CASE kind OF
        use:   IF styp=t THEN GenAssign(↓term,↓spix,↓n)
               ELSIF styp=nt THEN
                 GetAt(↓sy,↓n,↑spix1,↑dir1);
                 IF spix1<>0 THEN
                   IF dir=dir1 THEN GenAssign(↓nonterm,↓spix,↓spix1)
                   ELSE SemErr(2)
                   END
                 END
               END;
      | check: IF styp=nt THEN
                 GetAt(↓sy,↓n,↑spix1,↑dir1);
                 IF spix1<>0 THEN
                   IF spix<>spix1 THEN SemErr(3) END;
                   IF dir<>dir1 THEN SemErr(2) END
                 END
               END;
      | def:   NewAt(↓sy,↓spix,↓dir);
               IF styp=pr THEN GenAssign(↓term,↓spix,↓n) END
      END  -- CASE
    endsem

  TERMINALS
    "<" ">" "," ";"
    IDENT<out:spix>
    INSY OUTSY
    NUMBER<out:val>
  NONTERMINALS
    Attributes<in:sy,styp,kind; out:sem1,sem2>
    InAttr<in:sy,styp,kind; out:sem1,sem2,n>      -- n: attribute counter
    OutAttr<in:sy,styp,kind,n; out:sem1,sem2,n>
  RULES
    Attributes<in:sy,styp,kind; out:sem1,sem2> =
      "<"                         sem sem1:=0; sem2:=0 endsem
      ( InAttr<in:sy,styp,kind; out:sem1,n>
        [ ";" OutAttr<in:sy,styp,kind,n; out:sem2,n> ]
      | OutAttr<in:sy,styp,kind,0; out:sem2,n>
      )
      ">"                         sem IF NOT CompleteAt(↓sy,↓n) THEN SemErr(5) END endsem.

    InAttr<in:sy,styp,kind; out:sem1,n> =
      INSY                        sem IF styp<>nt THEN SemErr(1) END;
                                      dir:=down; n:=0 endsem
      ( IDENT<out:spix>           sem (AssignInAt) endsem
      | NUMBER<out:val>           sem (AssignNumber) endsem
      )
      { ","
        ( IDENT<out:spix>         sem (AssignInAt) endsem
        | NUMBER<out:val>         sem (AssignNumber) endsem
        )
      }                           sem IF kind=use THEN EmitAction(↑sem1) END endsem.

    OutAttr<in:sy,styp,kind,n; out:sem2,n> =
      OUTSY                       sem dir:=up endsem
      IDENT<out:spix>             sem (AssignOutAt) endsem
      { "," IDENT<out:spix>       sem (AssignOutAt) endsem
      }                           sem IF (kind=use) OR (styp=pr) THEN
                                        EmitAction(↑sem2)
                                      END endsem.
  ENDGRAM

If one of the context conditions is violated, the procedure SemErr(↓n) is called, which emits an error message depending on n:

  n   error message
  1:  In-attributes for a pragma or terminal
  2:  Wrong attribute direction
  3:  Wrong attribute name
  4:  Formal attribute is a constant
  5:  Wrong number of attributes
8 Applications

8.1 Applications in compiler construction

Attributed grammars are mainly used in compiler construction, more precisely for the description of compilers. However, the description of an actual compiler is far too complex to be used as an introductory example. Therefore, in this section we will use Cocol to develop a lexical analyzer, which is part of a compiler. This example is general enough to demonstrate all language constructs of Cocol, and yet simple enough for a reader inexperienced with attributed grammars to follow it. The application of Coco to an actual compiler (the compiler description for Coco itself) can be found in Appendix F.

It is unusual to describe and to generate lexical analyzers with attributed grammars. Normally, they are coded by hand since they must be very efficient (lexical analysis takes the biggest part of the compilation time). There are special scanner generators which are designed to produce fast lexical analyzers. Although Coco is not such a generator, run-time measurements show that it is possible in both theory and practice to implement lexical analyzers with Coco.

As an example, we will develop a lexical analyzer for Modula-2. First we will give a general specification for lexical analyzers. Then we will prepare a special specification of a lexical analyzer for Modula-2. Next we will describe and build this lexical analyzer using Cocol. Finally we will explain some of the problems that can arise. At the end of this section, we will specify the semantic procedures used in the description of the lexical analyzer.
Chap. 8 Applications 172 8.1.1 Specification of a lexical analyzer General tasks A lexical analyzer must at least perform the following tasks: 1. 2. read and optionally print the source program; skip meaningless character sequences such as blanks, comments, etc.; 3. recognize and tokenize terminals such as keywords, names, numbers, 4. and operators; report lexical errors. Usually, a lexical analyzer will recognize only one terminal per call and pass it to the syntax analyzer. However, there are also analyzers that process the entire source text at once, and write the symbol codes of the recognized terminals to an intermediate file so that the syntax analyzer can read them later on. The lexical analyzer described here is of the latter type. Tasks of a lexical analyzer for Modula-2 A lexical analyzer for Modula-2 must recognize the following terminals: Keywords AND ELSIF LOOP REPEAT ARRAY END MOD RETURN BEGIN BY EXIT EXPORT MODULE NOT SET THEN CASE FOR OF TO CONST DEFINITION DIV DO ELSE FROM ER IMPLEMENTATION IMPORT IN OR POINTER TYPE UNTIL PROCEDURE QUALIFIED RECORD WHILE VAR WITH Names Identifier = Letter Letter = INENDEEN Digit = u {Letter a KALI RE AUT | Digit}. DAT Va IS SU | Oe SU SE OL Decimal constants DecNumber = Digit {Digit}. Hexadecimal constants HexNumber = Digit HexDigit = Digit = OctalDigit {HexDigit} | "H". KAUNIBEITENTUHE TE Octal constants OctalNumber {OctalDigit} "B", tem ‘
    OctalDigit  = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7".

Real constants

    RealNumber  = Digit {Digit} "." {Digit} ["E" ["+" | "-"] Digit [Digit]].

Character constants

    CharConst   = "'" any "'" | '"' any '"' | OctalDigit {OctalDigit} "C".

Character strings

    String      = "'" {any} "'" | '"' {any} '"'.

Comments

    Comment     = "(*" {Comment | any} "*)".

Operators and separators

    +     addition                  >=    greater than or equal
    -     subtraction               ( )   round parenthesis
    *     multiplication            [ ]   index-parenthesis
    /     real division             { }   set-brackets
    :=    assignment                ^     pointer
    &     logical and               ,     comma
    =     equal                     .     period
    #     not equal                 ;     semicolon
    <>    not equal                 :     colon
    <     less than                 ..    range operator
    >     greater than              |     variant operator
    <=    less than or equal

Context conditions

1. Decimal, hexadecimal, or octal constants must be in the range 0 to 65535.
2. The numerical value of character constants must be in the range 0 to 255.
3. Real constants must be in the range 1.4694E-39 to 1.7014E+38.
4. Character strings must not extend over line boundaries.

8.1.2 Description of a lexical analyzer for Modula-2

In the previous section, we described the lexical structure of Modula-2 by a context-free grammar. Now we will have to attribute it. The following points need special attention.

The lexical analyzer supplies the terminals for syntax analysis. These are the nonterminals of the lexical analyzer, whereas the terminals of the lexical
analyzer are the characters of the source text. These characters must be supplied by a mini-scanner with the following tasks:

1. read and print the source program;
2. supply the characters of the source text as terminals;
3. treat the character sequences '..', '(*', and '*)' as special terminals (to simplify the attributed grammar).

This still leaves enough work for the lexical analyzer proper. In accordance with Section 6.4.2, we will implement the mini-scanner in the procedure GetSy of the module Scannerlex. The mini-scanner is so simple that we refrain from describing it further.

Now we will specify the lexical analyzer of Modula-2 with Cocol.

    GRAMMAR Scanner

    SEMANTIC DECLARATIONS

    FROM Conversions IMPORT Convert, ConvertReal;
    FROM Errors      IMPORT SemErr;
    FROM ListMod     IMPORT EnterString, Hash;
    FROM Scannerlex  IMPORT typ, line, col;
    FROM OutMod      IMPORT Symboltype, Emit, EmitConstant, EmitIdent, EmitString;

    (*token codes*)
    -- TYPE Symboltype =
    --   (eofsy, andsy, divsy, timessy, slashsy, modsy, notsy, plussy,
    --    minussy, orsy, eqlsy, neqsy, grtsy, geqsy, lsssy, leqsy, insy,
    --    lparsy, rparsy, lbracksy, rbracksy, lconbrsy, rconbrsy, commasy,
    --    semicolonsy, periodsy, colonsy, rangesy, constsy, typesy, varsy,
    --    arraysy, recordsy, variantsy, setsy, pointersy, tosy, arrowsy,
    --    importsy, exportsy, fromsy, qualifiedsy, beginsy, casesy, ofsy,
    --    ifsy, thensy, elsifsy, elsesy, loopsy, exitsy, repeatsy, untilsy,
    --    whilesy, dosy, withsy, forsy, bysy, returnsy, becomessy, endsy,
    --    callsy, definitionsy, implementationsy, proceduresy, modulesy,
    --    ident, cardcon, intcardcon, realcon, charcon, stringcon, eolsy);

    CONST
      blmax = 80;                    -- buffer length (every token must
                                     -- fit on a 80 character line)
    VAR
      addr:    CARDINAL;             -- string address in string list
      b:       ARRAY [1..blmax] OF CHAR;  -- buffer
      bl:      CARDINAL;             -- buffer length
      ch:      CHAR;                 -- auxiliary
      firstch: CHAR;                 -- first character in a string
      i:       CARDINAL;             -- auxiliary
      length:  CARDINAL;             -- string length
      rval:    REAL;                 -- value of real-constant
      spix:    CARDINAL;             -- spelling index of identifier
      sy:      Symboltype;           -- token code
      symcol:  CARDINAL;             -- symbol column
      val:     CARDINAL;             -- constant value
Applications in compiler construction Sec. 8.1 175 MACROS sem :AddCh: are -- -not bl:=bl+1; it is supposed, that lines longer than 80 characters b[bl] :=ch endsem TERMINALS RU chr9 chr17 chr25 NEN chr lO chr18 chr26 vow ui Lt aN ga wy" vg" Tom va 2 A 38 Q B J R x Ne an h a a b 3 C k d 1 e m if n g 0 p 8 q y ag 2 S ur ie ae u a Vv chr126 W chr127 fg H Schr che l6 chr24 chr4 chr l2 chr20 chr28 ChuS ey eR chr21 chr29 sichro = chris ehrlds chris chr22 chr23 chr3l chr30 uur ie Su won Wew wee "x" win En woe LW ypu wou zu nz“ "zu TAU Tu Wet Wig u vw Wu wa E G (6) W URAN chrl chr19 chr27 Z, [& K S u D L m I vn BE M U N Ne " wau V ou NONTERMINALS „Scanner Symbol Identifier <out:sy,spix,symcol> Number String Comment Letter <out:sy,val,symcol> <out:sy,addr,lengt firstch, symcol> h, <out:ch> Digit HexDigit <out:ch> <out:ch> RULES Scanner = sem Emit (Veofsy,Jcol) {Symbol} Symbol = {eos ( Identifier endsem. ep banks <out:sy,spix,symcol> sem IF sy=ident THEN EmitIdent (Jspix,Jsymcol) -- ident. ELSE Emit (Lsy,tsymcol) -- keyword END | Number endsem <out:sy,val,symcol> sem EmitConstant (Jsy,Jval,\symcol) -- cardcon, intcardcon, endsem realcon, charcon
Chap. 8 Applications 176 String <out:sy,addr, length, firstch, symcol> sem IF sy=stringcon THEN EmitString (laddr, length, Jsymcol) ELSE EmitConstant (Jcharcon, JORD (firstch) ,dsymcol) END endsem | Comment UT | "=" un mie vie nu Laney sem sem sem sem Emit Emit Emit Emit sem Emit sem Emit sem Emit Mu he | van jj) ow wt wen su wan nn we ng" nu ee CR UO (ME | eps (Jsemicolonsy,Ycol) endsem (Jeqlsy,Jcol) endsem (Jlparsy,Jcol) endsem (Jrparsy,Jcol) endsem (Jlbracksy,Jcol) endsem (lrbracksy,/col) endsem (Jlconbrsy,Jcol) endsem sem Emit ({rconbrsy,Jtcol) sem sem sem sem sem sem sem sem sem sem sem sem Emit Emit Emit Emit Emit Emit Emit Emit Emit Emit Emit Emit endsem sem Emit (Jbecomessy,Jcol) (Jtimessy,\col) endsem (\commasy,Ylcol) endsem ({slashsy,col) endsem (Lplussy,4col) endsem (\minussy,Jcol) endsem (Jarrowsy,\col) endsem (\variantsy,Jcol) endsem ({notsy,dcol) endsem (Jandsy,Jcol) endsem (\periodsy,Jcol) endsem (Jrangesy,Jcol) endsem (leolsy,Jcol) endsem sem Emit (Lcolonsy,Jcol) endsem endsem ) | mh (MDM MIeN | eps sem Emit (\notsy,Jcol) sem Emit (Llegsy,lcol) sem Emit (Llsssy,dcol) endsem endsem endsem sem Emit (\gegsy,Jcol) sem Emit (\gtrsy,Jcol) endsem endsem ) | ">" ( wan | eps Identifier <out:sy,spix,symcol> = Letter <out:ch> { Letter <out:ch> | Digit <out:ch> sem sem sem sem symcol:=col; bl:=0 (AddCh) endsem (AddCh) endsem (AddCh) endsem } sem Hash(lb,lb1,Tsy, Tspix) -- sy is identifier endsem. endsem or keyword
Sec. 8.1 Applications in compiler construction Number <out:sy,val,symcol> = Digit <out:ch> { HexDigit <out:ch> sem symcol:=col; bl:=0 (AddCh) endsem (AddCh) endsem (CH sem DIEBE ER DII ZUR: sem sem endsem bl:=bl+1; b[bl]:=CHR(typ) (AddCh) endsem sem sem 177 endsem Convert (lb, /b1,Tsy, Tval) Digit <out:ch> [ Digit <out:ch> endsem sem bl:=bl+1; b[bl]:=CHR(typ) endsem sem bl:=bl+1; b[bl]:=CHR(typ) endsem sem sem (AddCh) (AddCh) endsem endsem ] sem ConvertReal (lb, 4b1,Trval) ; sy:=realcon; val:=CARDINAL endsem (rval) Convert (lb, 4b1,Tsy,Tval) String <out:sy,addr, length, firstch,symcol> sem ( endsem = symcol:=col; bl:=0 endsem DIDI ER TDZISURNEHTE sus sem EZ HI? endsem | | Ku Man sem bibl endsem sem DIDI endsem sem sem sem s="(" 5 bi bl+2 1st 40s SS bls =i HD=a) eles: =billake SemError (J1,Jline,Jcol); bl:=0 endsem bl:=bl+1; b[bl]:=CHR(typ) endsem bb Beh (bl t2 = ts bils=bilAe2 b[bl+2]:="*"; bl:=bl+2 b[b1+2]:=")"; bl:=b1+2 endsem sem sem b[bl+1]:="("; endsem b[b1+1]:="*"; endsem sem sem sem SemError (J1,Jline,Jcol); bl:=0 endsem bl:=bl+1; b[bl]:=CHR(typ) endsem length:=bl; IF length=1 THEN ELSE sy:=charcon; firstch:=b[1]
                sy:=stringcon; EnterString(↓b,↓bl,↑addr)
              END
          endsem.

    Comment =
      "(*" { Comment | any } "*)".

    Letter <out:ch> =
      (A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z|
       a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)
                                sem ch:=CHR(typ) endsem.

    Digit <out:ch> =
      ("0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9")
                                sem ch:=CHR(typ) endsem.

    HexDigit <out:ch> =
        Digit <out:ch>
      | (A|B|C|D|E|F)           sem ch:=CHR(typ) endsem.

    ENDGRAM

The rules for Number and String need some explanation:

Numerical constants cannot be converted while they are being recognized because decimal, hexadecimal, octal, and real constants can be distinguished only by their last character or by a decimal point. Their text must therefore be stored and converted later.

Strings also have their peculiarities. Our mini-scanner returns the character sequences '..', '(*', and '*)' as single terminals. If one of these sequences appears within a string it has to be expanded again, since strings must be stored in their original form. Therefore, the rule for strings gets more complicated than expected.

On the other hand, the description of strings and comments with the symbol any looks very simple and elegant. In accordance with Section 5.2.1, any represents all those terminals which cannot be recognized instead of it at this point in the grammar (in String: all terminals except '..', '(*', '*)', and ''' (or '"'); in Comment: all terminals except '(*' and '*)'). The example also shows the semantic processing of any. In a string, the symbol recognized by any is processed using the global variable typ (see Section 6.4.2).

The reason for the introduction of the terminals '..', '(*', and '*)' is not obvious, and requires an explanation: the symbol '..' is necessary, because otherwise a lookahead of 2 characters would be needed (the first period in the
sequence '1..2' may be a decimal point or the start of a range operator). Although comments can be processed with a single lookahead character, it simplifies the processing of comments considerably if we treat the sequences '(*' and '*)' as single terminals.

LL(1) Conflicts

As shown by Example 8.1, it is often difficult to avoid LL(1) conflicts when lexical structures are described by an attributed grammar:

Example 8.1 LL(1) conflicts in lexical structures

    Scanner = {Symbol}.
    Symbol  = "=" | ">" ["="].

This situation represents an LL(1) conflict because if '>' is read and the next character is '=', the syntax analyzer cannot decide whether this character belongs to the symbol '>=' or whether it constitutes a separate symbol '='. Such conflicts also appear in the symbols ':=', '<>', '<=', Identifier, and Number. However, they are not critical since the syntax analyzer always chooses the first alternative it encounters during analysis. In the example above, this means that '=' is correctly considered part of the symbol '>=' rather than being recognized as a separate symbol.

Speed

A lexical analyzer implemented with Coco runs at approximately one-half the speed of a hand-coded analyzer. A 35% speed gain can be achieved if the nonterminals Letter and Digit with their many alternatives are already recognized as terminal classes by the mini-scanner.

Assessment

The example has shown how easily a translation process can be described with Cocol. At first glance, the grammar may seem a bit confusing. Yet, as soon as one becomes familiar with this notation, the following advantages can be observed:

1. The grammar is short and precise.
2. For the recognition of a symbol, it is sufficient to write its name without any additional actions.
3. The syntax is clearly separated from the semantics. Thus the syntax is more explicit than it is in a hand-coded compiler.
From the syntax declarations, one can see immediately which terminals and nonterminals are in the language.
4. Error-handling actions need not be described explicitly.
5. Many constructs, like nested comments, can be described with any in a straightforward and elegant way which is hard to surpass.

Of course, there are some parts of the grammar which are not very simple to read, e.g. the production for Number. It has a rather complex structure, but this only shows that Cocol can also handle difficult constructs. After all, the production for Number describes four different kinds of numerical constants. This would be difficult to read in a hand-coded lexical analyzer, too, and could hardly be written in this short and concise form using a conventional programming language.

8.1.3 Semantic procedures for lexical analysis

We decompose the semantic procedures of the attributed grammar into four modules Scannerlex, OutMod, ListMod, and Conversions and specify their definition modules, but omit their implementation modules due to space limits.

    DEFINITION MODULE Scannerlex;
      VAR typ,col,line: CARDINAL;      (*information about the current token*)
          at: ARRAY[1..10] OF CHAR;    (*not needed here*)
      PROCEDURE GetSy;
    END Scannerlex.

Scannerlex reads and prints a source text and returns every single character as a separate token. The token number as well as its column and its line number are returned by GetSy in the global variables typ, col, and line. The token numbers are the ASCII values of the source characters (exceptions: eofch=0, '..'=1, '(*'=2, and '*)'=3). After the last character in the source text is read, GetSy always returns eofch.
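The behaviour described for GetSy can be illustrated with a short sketch. The Python code below is not the authors' Modula-2 implementation, which is omitted from the book; the function name get_symbols, the generator interface, and the 1-based line and column counting are assumptions made for illustration. Only the token numbering (ASCII values, with eofch=0, '..'=1, '(*'=2, '*)'=3) is taken from the text. Printing of the source and the repeated delivery of eofch after the end of input are left out.

```python
# Token numbers taken from the description of Scannerlex.
EOFCH, RANGE, COMMENT_OPEN, COMMENT_CLOSE = 0, 1, 2, 3
SPECIAL = {"..": RANGE, "(*": COMMENT_OPEN, "*)": COMMENT_CLOSE}


def get_symbols(source):
    """Yield (typ, line, col) for every mini-scanner token in source.

    Hypothetical stand-in for repeated calls of GetSy: typ is the ASCII
    value of a character, or 1, 2, 3 for the two-character sequences.
    """
    line, col, i = 1, 1, 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair in SPECIAL:            # two characters form one terminal
            yield SPECIAL[pair], line, col
            i += 2
            col += 2
            continue
        ch = source[i]
        yield ord(ch), line, col
        i += 1
        if ch == "\n":                 # assumed: columns restart on a new line
            line, col = line + 1, 1
        else:
            col += 1
    yield EOFCH, line, col             # end of source text


tokens = list(get_symbols("1..2"))
```

For the input '1..2' the sketch delivers the digit, the range terminal, the second digit, and eofch, so the grammar never has to look two characters ahead to tell a range operator from a decimal point.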
    DEFINITION MODULE OutMod;
      TYPE Symboltype =    (*token codes*)
        (eofsy, andsy, divsy, timessy, slashsy, modsy, notsy, plussy,
         minussy, orsy, eqlsy, neqsy, grtsy, geqsy, lsssy, leqsy, insy,
         lparsy, rparsy, lbracksy, rbracksy, lconbrsy, rconbrsy, commasy,
         semicolonsy, periodsy, colonsy, rangesy, constsy, typesy, varsy,
         arraysy, recordsy, variantsy, setsy, pointersy, tosy, arrowsy,
         importsy, exportsy, fromsy, qualifiedsy, beginsy, casesy, ofsy,
         ifsy, thensy, elsifsy, elsesy, loopsy, exitsy, repeatsy, untilsy,
         whilesy, dosy, withsy, forsy, bysy, returnsy, becomessy, endsy,
         callsy, definitionsy, implementationsy, proceduresy, modulesy,
         ident, cardcon, intcardcon, realcon, charcon, stringcon, eolsy);
      PROCEDURE Emit (sy:Symboltype; col:CARDINAL);
      PROCEDURE EmitConstant (sy:Symboltype; val,col:CARDINAL);
      PROCEDURE EmitIdent (spix,col:CARDINAL);
      PROCEDURE EmitString (addr,len,col:CARDINAL);
    END OutMod.

The module OutMod contains procedures to write symbols to an intermediate language file.

Emit writes a symbol without attributes (e.g. a keyword, an operator or a single character) to the intermediate language. It emits a word which contains the symbol type sy and the column col of that symbol.

EmitConstant writes a numeric constant to the intermediate language. It emits two words, the first of which contains the type sy and the column col of the symbol and the second the constant value val.

EmitIdent writes a name to the intermediate language. It emits two words, the first of which contains the symbol type 'ident' and the column col and the second the spelling index (spix) of the name.

EmitString writes a string to the intermediate language. It emits three words, the first of which contains the symbol type 'string' and the column col, the second the string address addr and the third the string length len.

    DEFINITION MODULE ListMod;
      FROM OutMod IMPORT Symboltype;
      PROCEDURE EnterString(buffer:ARRAY OF CHAR; len:CARDINAL;
                            VAR addr:CARDINAL);
      PROCEDURE Hash(buffer:ARRAY OF CHAR; len:CARDINAL;
                     VAR sy:Symboltype; VAR spix:CARDINAL);
    END ListMod.

ListMod handles the name list and the string list of the scanner.

EnterString enters a string (stored in buffer[1..len]) into the string list and returns its address addr.

Hash searches for a name (stored in buffer[1..len]) in the name list. If it is not found, it is entered. For keywords Hash returns the token code of the keyword and spix is 0. Otherwise Hash returns the token code 'ident' and spix is the address (spelling index) of the name in the name list.
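The contract of Hash can be made concrete with a small sketch. The following Python code is only an illustration under stated assumptions: the class NameList, the method name hash_name, the use of strings for token codes, the three-entry keyword table, and the 1-based spelling indices are all hypothetical, and a Python dictionary stands in for whatever hashing scheme the omitted implementation module uses. What is taken from the text is the behaviour: keywords yield their token code with spix 0, other names yield 'ident' with a stable spelling index, and unknown names are entered.

```python
# Excerpt of a keyword table; the real table would hold all 40 keywords.
KEYWORDS = {"BEGIN": "beginsy", "END": "endsy", "IF": "ifsy"}


class NameList:
    """Hypothetical name list offering the behaviour described for Hash."""

    def __init__(self):
        self._names = []     # spelling of every entered name
        self._index = {}     # spelling -> spelling index (1-based)

    def hash_name(self, buffer, length):
        """Return (sy, spix) for the name stored in buffer[0:length]."""
        name = buffer[:length]           # Python slices are 0-based
        if name in KEYWORDS:
            return KEYWORDS[name], 0     # keywords carry no spelling index
        spix = self._index.get(name)
        if spix is None:                 # not found: enter the name
            self._names.append(name)
            spix = len(self._names)      # 0 stays reserved for keywords
            self._index[name] = spix
        return "ident", spix


names = NameList()
first = names.hash_name("count", 5)
again = names.hash_name("count", 5)
```

Looking up the same name twice returns the same spelling index, which is what allows EmitIdent to write only the index to the intermediate language.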
    DEFINITION MODULE Conversions;
      FROM OutMod IMPORT Symboltype;
      PROCEDURE Convert (buffer:ARRAY OF CHAR; len:CARDINAL;
                         VAR sy:Symboltype; VAR val:CARDINAL);
      PROCEDURE ConvertReal (buffer:ARRAY OF CHAR; len:CARDINAL;
                             VAR rval:REAL);
    END Conversions.

The module Conversions converts digit strings to cardinal or real numbers.

The procedure Convert converts a digit string (stored in buffer[1..len]) to a numeric constant or a character constant. The digit string may have the following syntax:

    digitstring = digit {digit}                  -- decimal constant
                | digit {hexdigit} 'H'           -- hex constant
                | octaldigit {octaldigit} 'B'    -- octal constant
                | octaldigit {octaldigit} 'C'.   -- character constant

For numeric constants the output parameter sy is cardcon and val is in the range 0..65535; for character constants sy is charcon and val is in the range 0..255.

ConvertReal converts a digit string (stored in buffer[1..len]) to its real value rval. The syntax of the digit string is

    digitstring = digit {digit} '.' {digit} ['E' ['+'|'-'] digit [digit]].

8.2 Applications in software engineering

An attributed grammar as a description method and a compiler compiler as an implementation tool are not limited to compiler construction. They can also be useful in other fields of software engineering. The reason why compiler construction techniques can be generally used in software engineering is that most large programs have the following characteristics:

1. Input streams are sufficiently complex to be described in terms of syntax and semantics.
2. The structure of the input text often determines the logical structure of the entire program or of a large portion of it.

This wide field of applications is remarkable. We will now show that the well-known Jackson method of program design can be regarded as a special case of program design with attributed grammars. With this in mind, in this section the compiler description language is emphasized while the compiler compiler stays in the background.

8.2.1 Attributed grammars as a software design method

The use of attributed grammars automatically leads to a two-step design process: In the first step (coarse design) the problem is decomposed into its syntactical and semantical parts. Here, the attributed grammar serves as a design method. In the second step (refined design) the semantic procedures are designed from their specifications in the coarse design.

The creation of the coarse design consists of the following steps, which may be executed sequentially or iteratively:
1. Write the grammar. The syntactic structure of the input text is described by a context-free grammar.

2. Find attributes. Starting from the meaning of each syntax symbol, one tries to find out which (semantic) attributes should be attached to it. Then one defines these attributes and their occurrences in the grammar rules. With some experience and a proper understanding of the problem the right choice is almost automatic. This step is therefore also a good check on correct understanding of the problem.

3. Prepare context conditions. Possibly further attributes may be necessary for this process.

4. Define semantic procedures. In this step, all procedures which are used in semantic actions are defined. The refinement of semantic actions into code and procedure calls may again be done in a coarse or fine manner. Using the former approach, one may associate a special semantic procedure with each semantic action; using the latter approach, one may describe each semantic action in terms of elementary operations of a programming language without calling semantic procedures. Since many of the semantic procedures are usually access procedures to data structures, they support a modular design in the form of data capsules. The collection of all procedures shows which operations can be performed with the various data structures and which relations exist between the data structures.

5. Set up the attributed grammar. One combines the context-free grammar, the attributes, the semantic actions, and any context conditions into a proper attributed grammar.

After these five steps, the coarse design is completed and the following has been accomplished:

1. The problem has been decomposed into three parts: syntax, context conditions, and semantic actions.
2. The attributes and the data structures derived from them are the terms in which the problem solution can be appropriately described.
3. The access routines to the data structures and all other algorithms required for the solution are defined by the semantic procedures.

This completes the design method with attributed grammars. The result is sufficiently abstract to fix only the essential semantic design decisions but to leave enough freedom to the implementor. On the other hand it is sufficiently concrete to specify explicitly those details that should not be left to the decision of the implementor.
The result of the coarse design, consisting of a system of attributes, semantic procedures, and an attributed grammar, can be viewed as the specification for the refined design, since it describes what is to be done but not how it should be done.

The next step is the refined design, which may now concentrate exclusively on the semantic procedures without having to consider any syntactic problems. However, coarse design and refined design may influence each other. After the definition of the attributes, one may find that the semantic procedures are either too abstract or too concrete, too complex or too simple. For example, too many access procedures to the data structures of a module may indicate that it would have been better to add a lower level of abstraction, and to divide the large module into several smaller ones. The concise and formal notation of attributed grammars encourages one to try several approaches and to check their consequences without much effort, even when the task is large.

The refined design is followed by the implementation. Only a lexical analyzer has to be written here; the rest is done by the compiler compiler.

8.2.2 The telegram problem as an example

Henderson and Snowdon [1972] presented the following problem, which is known as the 'telegram problem':

A stream of telegrams is to be processed. Each telegram is terminated by the string 'ZZZZ'. The telegram stream is terminated when an empty telegram followed by 'ZZZZ' arrives. The words in a telegram are to be counted. Long words with more than 12 characters are to be counted separately. After each telegram, the counter values are to be printed. The telegrams are read and subsequently printed in lines of 100-120 characters. Superfluous blanks are to be eliminated. The maximum word length is 20 characters. Longer words are to cause the program to stop.
Since the input consists of structured data, and its structure will significantly determine the algorithm, this task is well suited for the application of attributed grammars, and a subsequent implementation with a compiler compiler. The design steps for the solution of the telegram problem are:

1. Set up the grammar of the input data.

Terminals:

    textword          a word in a telegram
    endword           end word (= ZZZZ)

Nonterminals:

    TelegramStream    the total telegram stream
    TextTelegram      a text telegram (including its end word)
    EmptyTelegram     an empty telegram containing only the end word

Context-free grammar:

    TelegramStream = {TextTelegram} EmptyTelegram.
    TextTelegram   = textword {textword} endword.
    EmptyTelegram  = endword.

2. Define attributes. From the specification of the task, three attributes result:

    w    array of char    the text of a word
    n    integer          the number of words in a telegram
    l    integer          the number of long words in a telegram

3. Assign attributes to the grammar symbols. In this step, we list the grammar symbols and attach attributes to them.

    textword↑w          recognizes a word and provides its text w.
    TextTelegram↑n↑l    recognizes and prints a telegram with n words, of which l words are long.

The remaining grammar symbols have no attributes. Note that the attributed symbols are viewed from an algorithmic point of view (i.e. we do not say 'TextTelegram represents a telegram', but rather 'TextTelegram recognizes a telegram'). The verbal description of the attributed symbols should specify all attributes of the symbol. It should be accurate enough to be used as a specification of the translation process. This is usually possible and easy to accomplish since the few items involved have already been defined.

4. Define semantic procedures. The actions the program must execute can be seen from the problem description:

    (a) read the source text, recognize and count the words;
    (b) print the source text with a different line length;
    (c) print the counter values.

Reading the source text is the task of the lexical analyzer and does not concern us here. The words are counted with the attributes n and l. Therefore, the only candidates for semantic procedures are those which print the text and the counter values. A variable will probably be needed to ensure that the line size will not exceed 120 characters.
It will be initialized at the beginning of each telegram, and will be checked and increased when a new word is added to the line. A line buffer may also be needed. Following the principle of stepwise refinement, we are not yet interested in the implementation details here. Rather, we define the following three procedures which will do the whole printing job.
    OutInit              initialize the output of a telegram;
    OutWord(↓w)          print the word w according to the problem definition;
    OutAccount(↓n↓l)     print the counter values n and l with an appropriate text.

5. Write down the attributed grammar. Having completed steps 1 through 4, the attributed grammar is almost self-evident now:

    TelegramStream =
      {                         sem OutInit endsem
        TextTelegram↑n↑l        sem OutAccount(↓n↓l) endsem
      }
      EmptyTelegram.

    TextTelegram↑n↑l =
      textword↑w where (|w|<=20)
                                sem n:=1;
                                    if |w|>12 then l:=1 else l:=0 end;
                                    OutWord(↓w)
                                endsem
      { textword↑w where (|w|<=20)
                                sem n:=n+1;
                                    if |w|>12 then l:=l+1 end;
                                    OutWord(↓w)
                                endsem
      }
      endword.

    EmptyTelegram = endword.

This completes the coarse design of the telegram problem. Syntax and semantics are clearly separated. Together they provide a clear decomposition of the program, making its structure apparent. The separation shows that the semantic processing (i.e. the essential part) is very simple if there is a printing module with the access procedures OutInit, OutWord, and OutAccount.

A comparison with Henderson and Snowdon's solution shows that in their program lexical analysis and syntax analysis attract the major part of attention in design, program text, and possible design errors. Output and counting are of minor importance and are nearly lost. Their solution avoids the terms syntax and semantics, thus letting the problem appear to be much more complex than it is. In contrast, we focus most of our attention on printing and counting. We consider lexical analysis and syntax analysis as routine matters that do not require special attention.
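To show how little is left to do once the attributed grammar exists, the coarse design can be transcribed by hand into a small program. The Python sketch below is an illustration under assumptions, not output of Coco: the class Printer, the function telegram_stream, the wording of the account line, and the decision not to print the end word are hypothetical. The lexical analyzer is replaced by str.split, which also removes superfluous blanks, and only the upper line limit of 120 characters is enforced.

```python
END_WORD = "ZZZZ"


class Printer:
    """Printing module with the access procedures OutInit, OutWord, OutAccount."""

    def __init__(self, width=120):
        self.width = width
        self.lines = []      # finished output lines
        self.line = ""       # line buffer

    def out_init(self):
        self.line = ""

    def out_word(self, w):
        # Start a new line if the word (plus one blank) would not fit.
        if self.line and len(self.line) + 1 + len(w) > self.width:
            self.lines.append(self.line)
            self.line = ""
        self.line = w if not self.line else self.line + " " + w

    def out_account(self, n, l):
        if self.line:
            self.lines.append(self.line)
            self.line = ""
        self.lines.append(f"words: {n}, long words: {l}")


def telegram_stream(words, printer):
    """TelegramStream = {TextTelegram} EmptyTelegram, by one-symbol lookahead."""
    it = iter(words)
    sym = next(it)
    while sym != END_WORD:               # {TextTelegram}
        printer.out_init()
        n = l = 0
        while sym != END_WORD:           # textword {textword}
            if len(sym) > 20:            # context condition |w| <= 20
                raise ValueError(f"word too long: {sym}")
            n += 1
            if len(sym) > 12:
                l += 1
            printer.out_word(sym)
            sym = next(it)
        printer.out_account(n, l)        # endword closes the telegram
        sym = next(it)
    # sym is the endword of the EmptyTelegram: the stream is finished.


printer = Printer(width=20)
telegram_stream("HELLO   WORLD ZZZZ EXTRAORDINARILY LONG ZZZZ ZZZZ".split(), printer)
```

The syntax part is the pair of nested loops; everything else is counting and printing, which matches the observation that the semantic processing is the essential part.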
8.2.3 Attributed grammars as documentation

From the above, it should be obvious that attributed grammars are also well suited for documentation. The system of syntax, attributes, semantic procedures, and the attributed grammar of a software product is its documentation (on the abstraction level of the attributed grammar). The following advantages of this documentation method are evident:

1. The form of the documentation (its structure) is easy to find. It is almost independent of the product to be described, and consists of the parts: terminals, nonterminals, context-free grammar, attributes, attributed grammar symbols, semantic procedures, and attributed grammar (in this order). This arrangement aids standardization.
2. The documentation is formal and therefore precise, complete, and short.
3. The documentation is abstract enough to hide implementation details, but concrete enough to express important conceptual details.
4. The fact that attributed grammars represent a machine-readable documentation renders it unnecessary to separate implementation and documentation, thus ensuring that the documentation is always up-to-date.

8.2.4 The Jackson method as a special case

At a quick glance, the often discussed Jackson method of program design seems to have nothing in common with attributed grammars. Jackson [1975] uses a totally different terminology and describes his method only by examples in an indirect and unsystematic manner. To find out the essence of Jackson's method, the reader is forced to study other literature.

The Jackson method is based on the following three concepts:

1. The structure of an algorithm is derivable from its input and output data.
2. The structure of the input and output data is described by tree diagrams which allow the description of sequences, alternatives, and (unlimited) repetitions.
3. If the structures of the input and output data 'match' in a certain way, the total algorithm for the transformation of the input into the output data can be viewed as an assembly of the transformation algorithms for the individual substructures.

If the structures of the input and output data do not match, the Jackson method fails. However, in the examples in his book, Jackson shows that his method
can still be used with the aid of tricks such as 'backtracking', 'program inversion', and some other techniques.

Hughes [1979] looked at the Jackson method from the standpoint of formal languages and summarized the following points:

1. Jackson's tree diagrams describe only regular languages since they are only based on sequences, alternatives, and unlimited iterations.
2. In addition, it is required that the input data can be deterministically analyzed with a single-character look-ahead.
3. Jackson's requirement of a structural matching between input and output data means in the terminology of formal languages that there must be a finite automaton that transforms the input into the output.

Jackson's design method can be viewed as a special case of the design method with attributed grammars, in which:

1. the input data is regular and its grammar is LL(1);
2. the output data form a regular language;
3. a certain correspondence exists between input and output language that manifests itself in the fact that a finite automaton can be found that transforms the input into the output.

It is therefore only applicable to a narrow range of tasks that meet these conditions.

It is surprising that this relationship between Jackson's method and the design method with attributed grammars has hardly been recognized. The reason for this may be that Jackson does not distinguish between syntax and semantics (in fact, they are indistinguishably coupled in his examples), and does not use attributes. If we describe the examples in Jackson's book with attributed grammars, they will become simpler, shorter, and easier to understand. The grammars are simple throughout. We will show this by example 14 of Jackson's book, which in his discussion covers 17 pages, and is the most voluminous of the entire book.

Problem description. An operating system collects data about its use.
These data are: A record for the start of each session (LOGON), the end of a session (LOGOFF), the start of a program (PROGSTART), and the end of a program (PROGEND). At logon time, the user is assigned a unique session number. The system makes sure that a user can start a session only when his terminal is free, and cannot terminate a session that he has not initiated. Furthermore, a user can have only one active program at any given time. He must terminate an active program before starting a new one.
The collected data is written to a file. The records have the following contents:

    Logon record:       LOGON       session number   start time
    Logoff record:      LOGOFF      session number   stop time
    Progstart record:   PROGSTART   session number   program name   start time
    Progend record:     PROGEND     session number   program name   stop time

The records are stored in strict chronological order. However, it is possible that records are missing due to erroneous processing. In this case, the data file contains incomplete information for some sessions and programs: a logon record without corresponding logoff record, and vice versa; a progstart record without corresponding progend record, and vice versa.

As a result, the program should produce the following list:

    Number of complete sessions = nnnn
    Average session length      = ssss
    Number of known sessions    = mmmm
    Number of complete programs = pppp
    Average program length      = uuuu
    Number of known programs    = qqqq

Grammar. The input consists of four kinds of records. We regard them as terminals: logon, logoff, progstart, and progend, and arrive at the following grammar:

    input = {logon | logoff | progstart | progend}.

It consists of a single rule (for regular languages, there is always a grammar that consists of a single rule). In accordance with the problem description we attach attributes to the terminals:

    session: integer    session number
    prog:    name       program name
    time:    integer    time of logon, logoff, progstart and progend

and get the attributed grammar symbols

    logon↑session↑time
    logoff↑session↑time
    progstart↑session↑prog↑time
    progend↑session↑prog↑time

Semantics. In the semantic actions, we need variables that hold the results. We call them

    completesessions: integer    number of complete sessions
    knownsessions:    integer    number of known sessions
    completeprogs:    integer    number of complete programs
    knownprogs:       integer    number of known programs
    sessiontime: integer         length of all complete sessions
    progtime: integer            length of all complete programs

It is clear from the above that, when a logon record appears, the session number and the start time must be stored until a logoff record with the same session number is encountered. The same is true for programs. For the time being, we postpone the definition of the concrete data structures and consider only the fact that we need the following access procedures:

    NewSession(↓session↓time)
        Define the start of a session at the specified time.
    DisposeSession(↓session)
        Define the end of a session.
    SessionStarted(↓session): boolean
        Return true if the specified session has been started.
    SessionStartTime(↓session): integer
        Return the start time of the specified session.
    NewProg(↓session↓prog↓time)
        Define the start of the program prog in the specified session at the specified time.
    DisposeProg(↓session↓prog)
        Define the end of the program prog in the specified session.
    ProgStarted(↓session↓prog): boolean
        Return true if prog in session has been started.
    ProgStartTime(↓session↓prog): integer
        Return the start time of prog in session.
    InitStorage
        Initialize the abstract data structure.

Attributed grammar. With only these few facts, which are easily derived with a little thought, the attributed grammar of the problem can be formulated:

    input =
        sem InitStorage;
            completesessions:=0; knownsessions:=0;
            completeprogs:=0; knownprogs:=0;
            sessiontime:=0; progtime:=0;
        endsem
        { logon↑session↑time
            sem knownsessions:=knownsessions+1;
                NewSession(↓session↓time)
            endsem
        | progstart↑session↑prog↑time
            sem knownprogs:=knownprogs+1;
                NewProg(↓session↓prog↓time)
            endsem
        | progend↑session↑prog↑time
            sem if ProgStarted(↓session↓prog) then
                    completeprogs:=completeprogs+1;
                    progtime:=progtime+(time-ProgStartTime(↓session↓prog));
                    DisposeProg(↓session↓prog)
                else knownprogs:=knownprogs+1
                end
            endsem
        | logoff↑session↑time
            sem if SessionStarted(↓session) then
                    completesessions:=completesessions+1;
                    sessiontime:=sessiontime+(time-SessionStartTime(↓session));
                    DisposeSession(↓session)
                else knownsessions:=knownsessions+1
                end
            endsem
        }
        sem Write(↓completesessions)
            Write(↓sessiontime/completesessions)
            Write(↓knownsessions)
            Write(↓completeprogs)
            Write(↓progtime/completeprogs)
            Write(↓knownprogs)
        endsem

At this point, the coarse design is already complete. The refined design will decide on the concrete implementation of the abstract data structure.

In principle, the program can be implemented with a compiler compiler. In order to read the input data, only a (trivial) lexical analyzer needs to be written. But since the grammar of this problem is so simple (as it is also for the telegram problem), using a compiler compiler is like taking a sledgehammer to crack a nut. The obvious choice is therefore to code the syntax analyzer for this problem by the method of recursive descent (in this case it is even non-recursive). Jackson instead undertakes lengthy considerations about intermediate data files and program inversions, which make the task appear much more complicated than it really is.
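The attributed grammar above can be read almost directly as a single loop over the records. The following Python sketch is one possible hand-coded version, not the authors' program: the tuple layout of the records, the function name analyze, and the use of two dictionaries in place of the access procedures are assumptions made for illustration. Unlike the grammar, the sketch reports an average as None when there is no complete session or program, since the division would otherwise fail.

```python
def analyze(records):
    """Evaluate a chronological list of log records.

    Assumed record layout (hypothetical, chosen for this sketch):
      ("LOGON", session, time)            ("LOGOFF", session, time)
      ("PROGSTART", session, prog, time)  ("PROGEND", session, prog, time)
    """
    sessions = {}  # session -> start time (NewSession / DisposeSession)
    progs = {}     # (session, prog) -> start time (NewProg / DisposeProg)
    complete_sessions = known_sessions = 0
    complete_progs = known_progs = 0
    session_time = prog_time = 0

    for rec in records:
        kind = rec[0]
        if kind == "LOGON":
            _, session, time = rec
            known_sessions += 1
            sessions[session] = time
        elif kind == "PROGSTART":
            _, session, prog, time = rec
            known_progs += 1
            progs[(session, prog)] = time
        elif kind == "PROGEND":
            _, session, prog, time = rec
            if (session, prog) in progs:          # ProgStarted
                complete_progs += 1
                prog_time += time - progs.pop((session, prog))
            else:                                 # progstart record is missing
                known_progs += 1
        elif kind == "LOGOFF":
            _, session, time = rec
            if session in sessions:               # SessionStarted
                complete_sessions += 1
                session_time += time - sessions.pop(session)
            else:                                 # logon record is missing
                known_sessions += 1
        else:
            raise ValueError(f"unknown record kind: {kind!r}")

    def average(total, count):
        return total / count if count else None

    return {
        "complete_sessions": complete_sessions,
        "average_session_length": average(session_time, complete_sessions),
        "known_sessions": known_sessions,
        "complete_progs": complete_progs,
        "average_prog_length": average(prog_time, complete_progs),
        "known_progs": known_progs,
    }
```

As in the grammar, a logoff or progend record without a matching start record still counts as a known session or program, so the known counts include the incomplete ones.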
8.3 Results of a Coco run

For readers interested in the way Coco works, we present an example showing the contents of the compiler parts generated from a specific input grammar. It can be viewed as a supplement to the implementation description in Chapter 7, and should help in understanding the principles explained there.

The example is the description of an index generator, which is a program that generates an index from a list of keywords entered according to some syntactic rules. This problem provides another example of the use of attributed grammars in software engineering.

The input to the index generator is to be as follows: for each page of a document, the page number and all keywords on this page are entered in the following manner:

    1 = Introduction; User's Guide;
    2 = Start up; Parts of the tool;
    3 = General characteristics; User's Guide;

On the left-hand side of the '=' sign, page numbers as well as words are allowed. Words, however, must start with a '*':

    *Appendix = Maintenance; Troubleshooting;

From this input, the compiler generates a file of pairs <keyword, page number>, sorts this file, and prints an index in which page numbers of identical keywords are collected (the index at the end of this book was produced with such a program). In our example, we will describe the first phase of this compiler, i.e. the generation of the <keyword, page number> file.

     1  GRAMMAR Index
     2
     3  SEMANTIC DECLARATIONS
     4    FROM FileIO IMPORT File, Open, Close,
     5                       Write, WriteString, WriteLn;
     6    FROM Indexlex IMPORT GetKeyword, AdjustNumber;
     7
     8    VAR f: File;
     9        keystring, refstring, string: ARRAY[1..50] OF CHAR;
    10        value: CARDINAL;
    11  TERMINALS
    12    "=" alias equal             -- 1
    13    ";" alias semicolon         -- 2
    14    "*" alias asterisk          -- 3
    15    keyword                     -- 4
    16    number<out:value>           -- 5
    17
    18  PRAGMAS
    19    eolsy                       -- 6
    20
    21  NONTERMINALS
    22    Index                       -- 7
    23    Relation                    -- 8
    24    Reference<out:string>       -- 9
    25
    26  RULES
    27
    28  Index =
    29                        sem Open(f, "INDEX.OUT") endsem
    30    {Relation}          sem Close(f) endsem.
    31  ------------------------------------------------------------
    32
    33  Relation = Reference<out:refstring>
    34    "="
    35    { keyword           sem GetKeyword(↑keystring);
    36                            WriteString(↓f,↓keystring);
    37                            Write(↓f,↓CHR(0));
    38                            WriteString(↓f,↓refstring);
    39                            WriteLn(f) endsem
    40      ";"
    41    }.
    42  ------------------------------------------------------------
    43  Reference<out:string> =
    44      number<out:value> sem AdjustNumber(↓value,↑string) endsem
    45    | "*" keyword       sem GetKeyword(↑string) endsem.
    46
    47  ENDGRAM

This is the description of the translation process. The only thing the user has to provide is the module Indexlex, which supplies the terminals and exports the two procedures GetKeyword and AdjustNumber. GetKeyword should return the keyword string that the lexical analyzer has obtained after recognition of the terminal keyword. AdjustNumber should right-justify a number in a character field for sorting. The pragma eolsy is specified only to show how pragmas are encoded in the generated tables.

From this input, Coco generates a table-driven syntax analyzer and a semantic evaluator. These modules will be discussed in the next sections.

8.3.1 The generated syntax analyzer

The syntax analyzer is generated from a frame program (cocosynframe, shown in Appendix F) into which Coco inserts the following constant declarations.
    CONST
      maxname  = ...;   (*length of name list*)
      maxnamep = ...;   (*number of names*)
      maxcode  = 48;    (*length of G-code*)
      maxany   = 1;     (*number of any-sets. At least one dummy*)
      maxeps   = 2;     (*number of eps-follower sets*)
      maxt     = 5;     (*last terminal number*)
      maxp     = 6;     (*last pragma number*)
      maxs     = 9;     (*last nonterminal number*)
      startpc  = 44;    (*start address of the grammar*)

These values are the table dimensions derived from the above grammar.

8.3.2 The generated semantic evaluator

The semantic evaluator also consists of fixed frame parts and parts that are copied from the attributed grammar. For the index generator, the semantic evaluator is as follows (generated parts are shown in italics and frame parts are shown in roman type):

    IMPLEMENTATION MODULE Indexsem;
    FROM SYSTEM IMPORT WORD;
    FROM Indexlex IMPORT at;

    FROM FileIO IMPORT File, Open, Close, Write, WriteString, WriteLn;
    FROM Indexlex IMPORT GetKeyword, AdjustNumber;
    VAR f: File;
        keystring, refstring, string: ARRAY[1..50] OF CHAR;
        value: CARDINAL;

    PROCEDURE ASSIGN(VAR x:WORD; y:WORD);
    BEGIN x:=y END ASSIGN;

    PROCEDURE Semant(sem:CARDINAL);
    BEGIN
      CASE sem OF
        ...
      | 12: (*line 29*) Open(f, "INDEX.OUT")
      | 13: (*line 30*) Close(f)
      | 14: (*line 33*) refstring:=string;
      | 15: (*line 35*) GetKeyword(keystring);
                        WriteString(f,keystring);
                        Write(f,CHR(0));
                        WriteString(f,refstring);
                        WriteLn(f)
      | 16: (*line 44*) ASSIGN(value,at[1]);
      | 17: (*line 44*) AdjustNumber(value,string)
      | 18: (*line 45*) GetKeyword(string)
      END;
    END Semant;
    END Indexsem.

8.3.3 The generated parser tables

Coco generates the following tables:

1. G-code;
2. information about nonterminals (G-code start address, deletability, set of start symbols);
3. terminal successors of eps-symbols;
4. symbol sets represented by any-symbols;
5. number of attributes for terminals and pragmas;
6. number of semantic actions for pragmas;
7. symbol names for error messages.

The table values are inserted as initialization code into the generated syntax analyzer. We will now show these values in a decoded form.

G-code

    Address  Instruction          (addresses take 2 bytes)

    Index =
       1     SEM12
       2     NTA Relation, 9
       6     JMP 2
       9     EPS
      11     SEM13
      12     RET
    Relation =
      13     NT Reference
      15     SEM14
      16     T "="
      18     TA keyword, 28
      22     SEM15
      23     T ";"
      25     JMP 18
      28     EPS
      30     RET
    Reference =
      31     TA number, 38
      35     SEM16
      36     SEM17
      37     RET
      38     T "*"
      40     T keyword
      42     SEM18
      43     RET
    dummy rule =
      44     NT Index
      46     T EOF
      48     RET

The entire grammar occupies only 48 bytes of G-code!

Nonterminal description

    symbol (no.)     start address   deletability    start symbols
    Index (7)         1              deletable       {"*", number}
    Relation (8)     13              nondeletable    {"*", number}
    Reference (9)    31              nondeletable    {"*", number}

Terminal eps-successors

    1: {EOF}
    2: {EOF, "*", number}

Number of attributes for terminals and pragmas

    EOF:      0
    "=":      0
    ";":      0
    "*":      0
    keyword:  0
    number:   1
    eolsy:    0

Pragma semantics

              attribute passing action   user action
    eolsy:    0                          0

Symbol names

    names:         EOF/equal/semicolon/asterisk/keyword/number/eolsy/Index/Relation/Reference
    namespointers: 1, 5, 11, 21, 30, 38, 45, 51, 57, 66
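To make the decoded G-code listing above more tangible, the following Python sketch interprets it over a list of terminal names. It is only an illustration under stated assumptions: the instruction lengths (1 byte for SEM and RET, 2 for T, NT and EPS, 3 for JMP, 4 for TA and NTA) are inferred from the address gaps in the listing, the meaning of each instruction is an assumption based on its name (TA and NTA jump to the alternative address when the current symbol does not fit, EPS checks the current symbol against an eps-successor set), and the assignment of set 1 to the EPS in Index and set 2 to the EPS in Relation is likewise assumed. Semantic actions are not executed; the interpreter merely records their numbers.

```python
# Decoded G-code of the Index grammar, keyed by address: (opcode, operands...).
GCODE = {
    1: ("SEM", 12), 2: ("NTA", "Relation", 9), 6: ("JMP", 2),
    9: ("EPS", 1), 11: ("SEM", 13), 12: ("RET",),
    13: ("NT", "Reference"), 15: ("SEM", 14), 16: ("T", "="),
    18: ("TA", "keyword", 28), 22: ("SEM", 15), 23: ("T", ";"),
    25: ("JMP", 18), 28: ("EPS", 2), 30: ("RET",),
    31: ("TA", "number", 38), 35: ("SEM", 16), 36: ("SEM", 17), 37: ("RET",),
    38: ("T", "*"), 40: ("T", "keyword"), 42: ("SEM", 18), 43: ("RET",),
    44: ("NT", "Index"), 46: ("T", "EOF"), 48: ("RET",),
}
# Assumed instruction lengths in bytes (inferred from the address gaps).
SIZE = {"SEM": 1, "RET": 1, "T": 2, "NT": 2, "EPS": 2, "JMP": 3, "TA": 4, "NTA": 4}
START_ADDR = {"Index": 1, "Relation": 13, "Reference": 31}
START_SET = {name: {"*", "number"} for name in START_ADDR}
EPS_SET = {1: {"EOF"}, 2: {"EOF", "*", "number"}}


def parse(tokens, startpc=44):
    """Run the G-code over a list of terminal names.

    Returns the numbers of the semantic actions in the order in which
    they would be executed; raises SyntaxError on invalid input.
    """
    tokens = list(tokens) + ["EOF"]
    pos, pc, calls, actions = 0, startpc, [], []
    while True:
        op, *args = GCODE[pc]
        nxt = pc + SIZE[op]
        tok = tokens[min(pos, len(tokens) - 1)]
        if op == "SEM":
            actions.append(args[0])
            pc = nxt
        elif op == "JMP":
            pc = args[0]
        elif op == "RET":
            if not calls:            # return from the dummy rule: done
                return actions
            pc = calls.pop()
        elif op == "EPS":
            if tok not in EPS_SET[args[0]]:
                raise SyntaxError(f"unexpected {tok!r} at token {pos}")
            pc = nxt
        elif op in ("T", "TA"):
            if tok == args[0]:
                pos += 1
                pc = nxt
            elif op == "TA":         # try the alternative
                pc = args[1]
            else:
                raise SyntaxError(f"expected {args[0]!r}, found {tok!r}")
        else:                        # NT or NTA: call the nonterminal
            if op == "NT" or tok in START_SET[args[0]]:
                calls.append(nxt)
                pc = START_ADDR[args[0]]
            else:
                pc = args[1]
```

For the input `1 = a; b; *App = c;`, i.e. the terminals number, "=", keyword, ";", keyword, ";", "*", keyword, "=", keyword, ";", the sketch yields the action sequence 12, 16, 17, 14, 15, 15, 18, 14, 15, 13: the file is opened, each keyword is written with its reference, and the file is closed.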
9 Experiences with Coco

In 1981 workers at the University of Linz built a parser generator that generates parser tables for an LL(1) syntax analyzer from an input grammar in Wirth's EBNF notation. The generator proved useful, which is why it was enhanced in 1983 and eventually evolved into the compiler compiler Coco.

The first version of Coco ran on an Intel 8080 development system, and was written in PL/M-80, a language similar to PL/I for microcomputers. Since then, many more versions of Coco have been implemented in Modula-2 on various microcomputers including the Macintosh, the IBM-PC, the Atari 1040 and the Lilith. There is also a version for IBM mainframes. Coco has been in use for several years now and has proved to be useful both in research projects (e.g. construction of a Modula-2 compiler, tools for static program analysis) and in student courses.

9.1 A basis for measurements

In the following sections, we will describe the results of memory and run-time measurements performed on Coco, and on three compilers generated by Coco.

First, we will measure the generation of a Modula-2 compiler. This compiler consists of 6 passes (lexical analysis, syntax analysis, name analysis, declaration analysis, semantic analysis, and code generation). Each of passes 2 through 6 reads the entire source program in an intermediate language generated by the previous pass. This intermediate program is analyzed and forwarded to the next pass as a new, usually shorter, intermediate program (with the exception of pass 6, which generates the object code). Each pass is therefore a compiler in itself, described with an attributed grammar and translated by Coco into a syntax analyzer and a semantic evaluator.

For the measurements, we will not look at the entire Modula-2 compiler, but rather at two specific passes, since we are interested in the individual Coco runs. We select pass 2 (syntax analysis) and pass 4 (declaration analysis). These two passes have rather different characteristics, which make them well suited for a comparison. Pass 2 has a large and deeply nested recursive grammar with only a few semantic actions, while pass 4 has a simple grammar with a lot of semantic actions. In the following paragraphs, we will talk about each of the passes as if they were independent compilers.

Secondly, we will measure the generation of Coco by itself. Compared to the Modula-2 compiler, Coco is much smaller and consists of a single pass. Thus, we have a comparison between two large applications and a small application. Table 9.1 shows the sizes of the compilers in terms of their attributed grammar.

Table 9.1 Size of the attributed grammars of the example compilers
[Columns: Modula-2 (pass 2), Modula-2 (pass 4). Rows: number of lines, terminal symbols, pragmas, nonterminal symbols, alternatives, symbols in productions, semantic actions, G-code.]

The measurements shown in the following sections were taken on the Lilith, since the Modula-2 compiler was only available there. For the Macintosh the results would have been very similar. The Lilith is a 16-bit computer built on an Am2901 bit-slice processor with a cycle time of 150 nanoseconds. It has a very compact object code format (the so-called M-code) which has been especially tailored to Modula-2.
9.2 Measurements on Coco

First, we will look at Coco and measure the memory requirements and the run-time required by Coco to generate a compiler.

Memory requirements

Obviously the memory requirements for the code and the static data of Coco are the same in all three measurements (65 347 bytes). The size of the dynamic data depends on the input grammar, but is typically less than 1000 bytes (see Table 9.2).

Table 9.2 Memory requirements of Coco for the generation of various compilers
[Column heading: Modula-2 (pass 2). Values: 65537 bytes, 66219 bytes, 65911 bytes.]

The memory requirement for the code is shared between ten Coco-specific modules and two standard modules. In addition, Coco uses one module that belongs to the resident part of the operating system, and thus does not increase Coco's memory requirements.

Run-time

The run-time of Coco depends on the size of the input grammar. Most of the time is used by the lexical analyzer that reads and lists the grammar. Writing out the syntax analyzer and the semantic evaluator of the target compiler also requires considerable time, while the rest of the work is done fairly rapidly. In large grammars with a deeply nested hierarchy of nonterminals (as in pass 2 of the Modula-2 compiler), the grammar tests also take a certain amount of time (see Table 9.3).
Table 9.3 Run-time of Coco for the generation of various compilers
[Columns: Modula-2 (pass 2), Modula-2 (pass 4). Rows: lexical analysis; syntax analysis, semantic processing; grammar tests; output of the generated compiler.]

9.3 Measurements on some generated compilers

We will now consider the memory requirements and the run-time of the compilers generated by Coco.

Memory requirements

Here, we are only interested in parts which are actually generated by Coco, namely the syntax analyzer, the semantic evaluator, and the parser tables. We are not going to consider the size of the semantic modules since they are independent of Coco.

Table 9.4 Memory requirements of some generated compilers
[Column heading: Modula-2 (pass 2). Rows: syntax analyzer, semantic processor, analysis tables. Values: 9532 bytes, 8389 bytes, 6344 bytes.]

All three compilers use the same syntax analyzer driven by different tables. Its size is constant. The size of the semantic evaluator depends on the number and the length of the semantic actions of the attributed grammar. As expected, its size is larger in pass 4 of the Modula-2 compiler than in pass 2 and in Coco. Note that the memory requirements of the generated compilers do not depend on the length of the input text, since no syntax tree of the input is built.
Run-time

The run-time of the generated compilers on input texts of various lengths is shown in Table 9.5.

Table 9.5 Run-time of some generated compilers
[Rows: 100 input symbols, 1000 input symbols, 5000 input symbols.]

Even though Coco is the smallest of the three compilers, it runs much slower than the others since it does a lot of input and output (it writes long parts of source programs to disk), while pass 2 and pass 4 of the Modula-2 compiler work almost entirely in main memory (with input and output used only for intermediate languages).

9.4 General experiences

The experiences with Coco are exceptionally good. Coco allows a compact and very readable specification of the translation processes. The attributed grammars become essential parts of the documentation of each compiler.

By automating syntax analysis, error handling, and semantic processing, attention can be focused on the actual translation process in the semantic procedures. More time is available for the design. Working with attributed grammars almost automatically leads to a modular program structure with abstract data structures and access procedures, which are usually small and easy to understand.

In multi-pass compilers, like the Modula-2 compiler, the symbol any is especially useful since it lets one easily skip over portions of the input that are not of interest in this pass. The concept of pragmas has also proved useful since they make it easy to pass control information between successive passes (e.g. trace commands, options, etc.).

The limitations of LL(1) grammars are not a serious problem. Because of Wirth's EBNF notation, it is not necessary to perform the complex grammar transformations that are usually required in standard BNF notation in order to remove LL(1) conflicts. The only time we failed to resolve LL(1) conflicts in the grammar itself was in the translation of the language PL/M-80. There, the conflicts were resolved by delegating some parts of the processing to the lexical analyzer.

Processing the input with L-attributed grammars and without building a syntax tree is not a serious restriction. If during processing some attributes are needed which only become available later, intermediate results are stored until the required attributes have been calculated and the final translation is possible. The omission of a syntax tree leads to compilers that are efficient with regard to speed and memory requirements. Most of the generated compilers run on microcomputers.

The negative experiences in the use of Coco are limited to the global nature of semantic objects in Cocol, which requires explicit stacking of variables, and to the fact that whenever an error has been detected in the attributed grammar, the program development cycle is lengthened by an additional run of the compiler compiler.

However, the positive experiences outweigh the negative ones. Even though we have no hand-coded compiler that we can compare directly to a Coco-generated compiler, we are not afraid to claim that the efficiency of compilers generated by Coco is close to that of hand-coded compilers, and it is certainly easier to implement and to maintain a compiler with Coco than by hand.
A Definition of Adele

An algorithm description language, like a programming language, should offer all concepts for the description of algorithms, but should be free of syntactic peculiarities. In this way, the algorithms will stand out clearly and the reader will not be distracted by all sorts of baroque constructs. For the same reason, it should use only a few constructs and give the user freedom of expression. It should lean on popular programming languages so that it is easy to read, but should not be firmly bound to a particular programming language.

Our algorithm description language Adele contains elements of PL/I, Modula-2, and Ada. We will describe its structure by a few examples.

Overall structure

Each algorithm has a name, parameters, and instructions:

    Search(↓list↓length↓x↑i):
    begin
        instructions
    end Search

The parameter list of functions is followed by the type of the function:

    Search(↓list↓length↓x) integer:
    begin
        instructions
        return i
    end Search

Input parameters are marked by ↓, output parameters by ↑, and transition parameters by ↕.

Statements

We distinguish between assignments, procedure calls, control statements, input-output statements, and text statements. To improve readability, instructions may optionally be separated by a semicolon.
Assignment. The assignment has the form

    variable := expression

Procedure call. The call of a procedure consists of the procedure name and the actual parameters in parentheses:

    ReadCard(↑card)

It is a useful convention to write procedure names partially with capital letters, and variable names completely with lower-case letters.

Control statements. Here we use the modern forms of Modula-2, which are explicitly terminated by an end, with the exception of the repeat statement:

    if expression then statement sequence end

    if expression then statement sequence else statement sequence end

    case expression of
      label: statement sequence
    | label: statement sequence
    end

    case expression of
      label: statement sequence
    | label: statement sequence
    else statement sequence
    end

    while expression do statement sequence end

    repeat statement sequence until expression

    loop statement sequence with exit end

    for variable := expression to expression [by expression] do
        statement sequence
    end

The control variable will be undefined after completion of the for loop. exit exits from the immediately enclosing loop statement. return exits from a procedure. return expression exits from the function procedure with expression as the function value. halt stops the algorithm without return to a surrounding algorithm.

Input-output statements. Here we only use three statements:

    read(↑x↑eof)    read x or signal end of input file
    write(↓x)       write x to the output medium
    writeln         emit line feed

We do not concern ourselves with the format of the input and output text. The boolean parameter eof indicates the end of the input file. When x has been read, eof will be false on return. If x could not be read due to end of file, eof will be true and x will be undefined on return.

Text statements. Text statements are free texts that describe actions. For example:

    calculate mean values and variances;

The only rule is that they be terminated or separated by a semicolon so that their end can be seen.

Expressions

For expressions we stipulate the common combinations of operators and operands without giving specific rules. We state only that boolean expressions can be viewed as conditional expressions with short-circuit evaluation:

    a & b    is equivalent to    if a then b else false end
    a | b    is equivalent to    if a then true else b end

This means that if the left operand alone is sufficient to determine the value of the expression, then the right operand is not evaluated.

Declarations

Usually declarations are not needed for the description of short and simple algorithms, especially if the variables used are obvious from the preceding explanations. However, in longer algorithms with local variables, global variables, parameters, and perhaps also named constants, it is advantageous if the algorithm description language also contains declarations.

In Adele, the declaration of constants and variables can be written between the head of the algorithm and begin. We partition the declared items into the following classes: parameters, global variables, constants, local dynamic variables, and local static variables. The classes are identified by the keywords param, global, const, local, static. After each keyword, one or more declarations of names of the corresponding class can be placed.

A constant declaration has the form

    name = value

a variable declaration has the form

    name: type

As types we use the elementary types of Pascal and Modula-2 with the following keywords or structures:

    integer
    real
    boolean
    char
    (red, green, blue)
    array (index:index) of type

Array types allow a certain amount of freedom.
If the range limits are not needed, we write

    array of type

If the type is not needed, we write

    array (index:index)
If both are not needed, we simply write

    array

As an example of the use of declarations, we describe a linear search algorithm with declarations of all names:

    Search(↓list↓length↓x↑i):
        param list: array of integer
              length, x, i: integer
        local j: integer
    begin
        j:=length
        while j>0 & list(j)<>x do j:=j-1 end
        i:=j
    end Search

For static variables, we allow optional initialization. This is done by adding the phrase init(value) after the type:

    static finished: boolean init(false)

Comments

Comments, like those in Ada, start with two minus signs and extend over the rest of the line.

    -- This is a comment which extends over
    -- two lines.

Undefined issues

Adele has no rules for the remaining items such as records, pointers, modules, etc. We write them, more or less, in the style of Modula-2.
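The linear search algorithm given earlier in this appendix translates almost line by line into an executable language. The Python sketch below is one such translation, offered only as an illustration: the function and parameter names are chosen here, the output parameter i becomes the return value, and the 1-based index of Adele is kept, so that 0 means "not found". Python's `and` is short-circuit like Adele's `&`, so the list element is not inspected once j has reached 0.

```python
def search(lst, length, x):
    """Return the largest 1-based index j <= length with lst[j - 1] == x, or 0."""
    j = length
    # `and` short-circuits: lst[j - 1] is not evaluated when j == 0
    while j > 0 and lst[j - 1] != x:
        j -= 1
    return j
```

For example, `search([4, 7, 9, 7], 4, 7)` returns 4 because the search runs from the end of the list, and `search([4, 7, 9], 3, 5)` returns 0.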
B Modula-2 and Pascal

Since Modula-2 evolved from Pascal, its appearance is very similar to Pascal, and so Pascal programmers have no difficulty in reading Modula-2 programs. Here we will briefly present the most important differences for the reader of the Modula-2 programs in this book. The complete language definition and examples can be found in the books of Wirth [1982] and Pomberger [1986]. A more didactic introduction to Modula-2 is the book of Blaschek, Pomberger, and Ritzinger [1985].

General characteristics

Modula-2 is a system implementation language that enhances Pascal in the following key features:

1. Modular program structure. Modula-2 programs are composed of separately compiled modules. The compiler checks the consistency of the interface between modules. The language is therefore especially suited for the implementation of data capsules and abstract data types.
2. Coroutines and parallel processes. Modula-2 provides the coroutine facility as the basic element for the implementation of parallel processes.
3. Low-level features. Modula-2 provides facilities to bypass strong type checking so that memory words can be directly accessed and addresses can be handled. This makes it possible to produce machine-specific code.

We will not describe parallel processing or low-level features in this chapter since Coco does not use them.

Lexical elements

Modula-2 differs from most Pascal implementations by its sensitivity to the case of letters. The names TRUE, True, and true denote three different objects. Single character constants can be denoted by use of an octal number that is terminated with a 'C', e.g. CONST ff = 14C.
Declarations

In contrast to Pascal, constant, type, variable, and procedure declarations can appear in any order. There are no labels or label declarations.

Standard types. In addition to the standard types of Pascal (INTEGER, REAL, BOOLEAN, CHAR), we have the standard type CARDINAL for unsigned natural numbers. For 16-bit implementations, the range of integer values is -32 768 to +32 767. The range of cardinal values is 0 to 65 535.

Enumeration, subrange, array, record, and pointer types are the same as in Pascal with the exception that arrays cannot be packed, and variant record types have an improved syntax.

If the word length of the computer is w bits, then the cardinality of set types is confined to w, or a small multiple thereof (according to the language definition). There is a standard type BITSET that consists of the elements 0 through w-1:

    TYPE BITSET = SET OF [0..w-1]

Set constants are enclosed in '{' and '}'.

The machine-dependent type WORD denotes arbitrary data whose length is a machine word. It is compatible with all types whose length is a machine word.

Expressions

Expressions in Modula-2 are constructed in the same way as in Pascal. The operators have essentially the same meaning. One important difference in Modula-2 is that expressions that contain the operators 'AND' or 'OR' are interpreted as conditional expressions whose evaluation is terminated as soon as the result of the expression is known (short-circuit evaluation):

    a AND b    is equivalent to    if a then b else false
    a OR b     is equivalent to    if a then true else b

Statements

Assignment, procedure call, and repeat statement are taken from Pascal without change. The if, case, while, and for statements have been syntactically improved and expanded. The if statement can have one or more elsif parts, and the case statement can have an else part.
All of these constructs are explicitly terminated by END, which eliminates the need to distinguish between single and multiple statements in a block:

    ifstatement    = IF expr THEN statementsequence
                     {ELSIF expr THEN statementsequence}
                     [ELSE statementsequence] END.
    casestatement  = CASE expr OF case {"|" case}
                     [ELSE statementsequence] END.
    case           = caselabellist ":" statementsequence.
    whilestatement = WHILE expr DO statementsequence END.
    forstatement   = FOR ident ":=" expr TO expr [BY constexpr] DO
                     statementsequence END.
New features are the loop statement (infinite loop), the exit statement to leave the loop statement, and the return statement to leave a procedure or function (in a function, it also passes the function value):

    loopstatement   = LOOP statementsequence END.
    exitstatement   = EXIT.
    returnstatement = RETURN [expr].

There is no goto statement and no input-output statement in Modula-2. Input and output is done by procedure calls.

Procedures

As in Pascal, there are procedures and function procedures, with value and VAR parameters. Procedures and functions both begin with the keyword PROCEDURE. Modula-2 permits procedure variables (not used by Coco), and arrays of unspecified length (so-called open arrays), e.g. in the form:

    PROCEDURE Sort(VAR list: ARRAY OF INTEGER);
      VAR n: INTEGER;
    BEGIN               (* assume list: ARRAY[0..n] OF INTEGER *)
      n:=HIGH(list);    (* standard proc. to find upper limit of index *)
      ...
    END Sort

Standard procedures. The standard procedures that differ from Pascal are:

    CAP(ch): CHAR       converts from lower to upper case
    HIGH(a): CARDINAL   returns the upper bound of array a
    DEC(x)              decrease: x:=x-1
    DEC(x,n)            x:=x-n
    EXCL(s,i)           exclude element i from set s: s:=s-{i}
    HALT                terminate entire program
    INC(x)              increase: x:=x+1
    INC(x,n)            x:=x+n
    INCL(s,i)           include element i in set s: s:=s+{i}

Type transfer functions. Modula-2 offers the possibility of explicit type conversions by so-called type transfer functions. Each type name can be used as a function with one argument. For example, the type transfer function CARDINAL(b) denotes the bit pattern of b (without any conversion) but with type CARDINAL. As a context condition, the type of b must have the same number of bits as CARDINAL. Type transfer functions should be used with care since they make programs machine dependent.

Modules

An executable Modula-2 program consists of one or more separately compiled modules. A module is a collection of declarations and statements that forms a higher-level unit.
Module boundaries are like a fence for names, which means that names declared inside a module are unknown outside, and names declared outside a module are unknown inside. The programmer can open the fence for selected names by an import list that contains all names that are declared outside and are to be known inside the module, and an export list that contains all names that are declared inside the module and are to be known outside. Thus the access is explicitly specified by the programmer and visible in the program text.

There are four kinds of modules: main modules, definition modules, implementation modules, and inner modules.

Main modules are almost like Pascal programs. They consist of an import list, declarations (of constants, types, variables, procedures, and inner modules), and statements:

    programmodule = MODULE ident ";"
                    {import}
                    {declaration}
                    BEGIN statementsequence END ident ".".

Only the line {import} is different from Pascal. It references other separately compiled modules, and causes these modules to be loaded. In the most common form

    import = FROM ident IMPORT identlist ";"

ident is the name of the module to be loaded and identlist contains the names of the objects exported by the loaded module for use in the declarations and statements of the importing module. In the less common form

    import = IMPORT identlist ";"

the identlist contains only the names of the modules that are to be loaded together with the importing module.

Separately compilable modules that are not main programs consist of two separately compiled parts, the definition module and the implementation module.

The definition module describes the interface of the module to its clients. All declared names are automatically exported.

    definitionmodule = DEFINITION MODULE ident ";"
                       {import}
                       {definition}
                       END ident ".".

definition contains the declarations of the exported objects. Procedures are only specified by their procedure heading (procedure name and parameters):

    definition = CONST ... | TYPE ... | VAR ...
               | PROCEDURE ident [formalparameters] ";".
The implementation module contains the declaration of the non-exported objects, the code for all procedures, and the statements of the module:

    implementationmodule = IMPLEMENTATION MODULE ident ";"
                           {import}
                           {declaration}
                           BEGIN statement sequence
                           END ident "."
Definition and implementation modules exist in pairs and have the same name. The definition module must be compiled before the implementation module. A module can be compiled only if the definition modules of all of the imported modules have been compiled before.

Storage for local objects of separately compiled modules is allocated when the object program is loaded, and remains allocated until the program terminates (static memory allocation). The statement sequence of the implementation module is executed immediately after loading the module, and therefore can be used for the initialization of data.

Inner modules are modules that are not separately compiled. They are like procedures nested inside other modules or procedures. They can import and export.

    moduledeclaration = MODULE ident ";"
                        {import}
                        [EXPORT [QUALIFIED] identlist ";"]
                        {declaration}
                        BEGIN statement sequence
                        END ident.

Storage for local objects of inner modules is allocated when the procedure that contains the inner module is activated, and released when the procedure returns to its caller. When the surrounding procedure is called, the statements of the inner module are also executed.

There is a (fictitious) separately compiled module SYSTEM, provided by the compiler, that gives access to low-level features. It exports types and related procedures (including the type WORD). Each module that imports SYSTEM is therefore machine dependent.
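The compilation-order rule stated earlier in this appendix can be illustrated with a small sketch. It is written in Python (3.9 or later) rather than Modula-2, and everything in it is an assumption made for illustration: the function name compile_order, the ".DEF" and ".MOD" suffixes used as unit names, and the simplification that only implementation modules import other modules. It orders the units so that each implementation module comes after its own definition module and after the definition modules of everything it imports.

```python
from graphlib import TopologicalSorter


def compile_order(imports):
    """Return one valid compilation order for definition/implementation units.

    imports maps a module name to the names of the modules it imports.
    Hypothetical simplification: only implementation modules import.
    """
    sorter = TopologicalSorter()
    for module, imported in imports.items():
        sorter.add(module + ".DEF")  # a definition module needs nothing here
        # The implementation module needs its own definition module and the
        # definition modules of all imported modules.
        sorter.add(module + ".MOD", module + ".DEF",
                   *(name + ".DEF" for name in imported))
    return list(sorter.static_order())


# Hypothetical modules a and b import each other; b also imports c.
order = compile_order({"a": ["b"], "b": ["a", "c"], "c": []})
print(order)
```

Because an implementation module depends only on definition modules in this sketch, two modules may import each other without creating a cycle in the ordering, as the example call shows.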
C Syntax of Cocol

Keywords:                 Upper-case letters
Other terminal symbols:   Literals or lower-case letters
Nonterminal symbols:      Upper and lower-case letters

    Cocol         = GRAMMAR identifier
                    [SEMANTIC DECLARATIONS {any}]
                    [MACROS {SemMacroDef}]
                    TERMINALS {Symbol [Attributes] [AliasName]}
                    [PRAGMAS {Symbol [Attributes] [SemAction]}]
                    NONTERMINALS {identifier [Attributes] [AliasName]}
                    RULES {identifier [Attributes] "=" Expression "."}
                    ENDGRAM.
    Expression    = Term {"|" Term}.
    Term          = Factor {Factor}.
    Factor        = Symbol [Attributes]
                  | EPS
                  | ANY
                  | SemAction
                  | "(" Expression ")"
                  | "[" Expression "]"
                  | "{" Expression "}".
    Attributes    = "<" ( OutAttributes | InAttributes [";" OutAttributes]) ">".
    InAttributes  = IN ":" (identifier | number) {"," (identifier | number)}.
    OutAttributes = OUT ":" identifier {"," identifier}.
    SemAction     = SEM ( "(" identifier ")" | {any} ) ENDSEM.
    SemMacroDef   = SEM ":" identifier ":" {any} ENDSEM.
    Symbol        = identifier | string.
    AliasName     = ALIAS Symbol.
D G-code

T sy                terminal
    If the next input symbol is sy, then recognize it, else report an error.

TA sy adr           terminal with alternative
    If the next input symbol is sy, then recognize it, else go to adr.

NT sy               nonterminal
    If the next input symbol is a valid start of the nonterminal sy, then enter the production of sy, else report an error.

NTA sy adr          nonterminal with alternative
    If the next input symbol is a valid start of the nonterminal sy, then enter the production of sy, else go to adr.

NTS sy sem          nonterminal with input attribute semantics
    If the next input symbol is a valid start of the nonterminal sy, then execute the semantic action sem (for input attribute assignment) and enter the production of sy, else report an error.

NTAS sy adr sem     nonterminal with alternative and input attribute semantics
    If the next input symbol is a valid start of the nonterminal sy, then execute the semantic action sem (for input attribute assignment) and enter the production of sy, else go to adr.

ANY                 any
    Recognize the next input symbol.

ANYA nr adr         any with alternative
    If the next input symbol is in the symbol set (any-set) denoted by nr, then recognize it, else go to adr.

EPS nr              epsilon (empty string)
    If the next input symbol is in the successor set (eps-set) denoted by nr, then recognize the empty string, else report an error.

EPSA nr adr         epsilon with alternative
    If the next input symbol is in the successor set (eps-set) denoted by nr, then recognize the empty string, else go to adr.

JMP adr             jump
    Go to adr.

RET                 return
    Return from the production of a nonterminal.

SEM                 semantic action
    Execute the semantic action with the number of the G-code instruction.
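How these instructions cooperate can be seen in a small interpreter. The following is a minimal sketch in Python, not the parser generated by Coco. It covers only T, TA, NT, NTA, JMP and RET, and several details are assumptions made for illustration: instructions are encoded as tuples, entry maps each nonterminal to the address of its production, first holds the terminal start symbols of each nonterminal, the input ends with an assumed "eof" marker, and after a successful recognition control continues with the next instruction in sequence. The sample code is a hand-made encoding of the hypothetical grammar expr = term {"+" term}. term = "n" | "(" expr ")".

```python
def run(code, entry, first, start, tokens):
    """Return True if the token list is accepted, False on the first error."""
    toks = list(tokens) + ["eof"]    # "eof" is an assumed end marker
    pos = 0                          # index of the next input symbol
    pc = entry[start]                # address of the current instruction
    returns = []                     # return addresses of active productions
    while True:
        op, *args = code[pc]
        sym = toks[pos]
        if op in ("T", "TA"):
            if sym == args[0]:       # recognize the terminal
                pos += 1
                pc += 1
            elif op == "TA":
                pc = args[1]         # go to the alternative
            else:
                return False         # report an error
        elif op in ("NT", "NTA"):
            if sym in first[args[0]]:
                returns.append(pc + 1)
                pc = entry[args[0]]  # enter the production
            elif op == "NTA":
                pc = args[1]
            else:
                return False
        elif op == "JMP":
            pc = args[0]
        elif op == "RET":
            if not returns:          # leaving the start production
                return sym == "eof"
            pc = returns.pop()
        else:
            raise ValueError("unsupported instruction: " + op)


CODE = [
    ("NT", "term"),    # 0  expr: first term
    ("TA", "+", 4),    # 1  another "+" term, or leave the loop
    ("NT", "term"),    # 2
    ("JMP", 1),        # 3
    ("RET",),          # 4
    ("TA", "n", 7),    # 5  term: "n" or the parenthesized alternative
    ("RET",),          # 6
    ("T", "("),        # 7
    ("NT", "expr"),    # 8
    ("T", ")"),        # 9
    ("RET",),          # 10
]
ENTRY = {"expr": 0, "term": 5}
FIRST = {"expr": {"n", "("}, "term": {"n", "("}}

accepted = run(CODE, ENTRY, FIRST, "expr", ["n", "+", "(", "n", ")"])
rejected = run(CODE, ENTRY, FIRST, "expr", ["n", "+"])
```

The loop in expr is closed with TA here only to keep the sketch short; an instruction such as EPSA, which tests the successor set before leaving the loop, is not modeled.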
E Intermodular cross-reference list

The following list contains all names that are exported or imported by a module of the Coco system as well as their data types. For every name, the first reference denotes the exporting module and the other references the importing modules.

Allocate PROC (VAR System, alts cocogen2, ARRAY [1..10] cocolex, Attrtype PROC CloseFile PROC cocosyn cocotst s:Symbolset; n:CARDINAL) cocotst cocogen, cocogen2, cocolst coco, cocosem CARDINAL PROC PROC cocogen, cocogen2, cocosem, cocosym, (nr:CARDINAL) cocogen, cocogen2, cocosem, (sy,nr:CARDINAL) : BOOLEAN cocosym, cocosem, PROC Errors, CompleteAt cocosem m:Marklist) PROC (f:File) FileIO, coco, cocolex, CompErr Errors CARDINAL cocosym, (VAR cocogen, col cocosyn, cocosem (VAR cocosym, Close cocosym, (term, nonterm, const) cocogra, ClearSet OF cocogen, cocogen, ClearMarkList size:LONGINT) cocogen2, CARDINAL cocogra, at ptr:ADDRESS; cocogen, cocosem cocosym cocosyn
App. E con Concat Left Intermodular cross-reference list File FileIO, coco, cocogen, cocogen2, cocogra, cocosym, cocosyn, cocotst, Errors PROC (VAR PROC (VAR gp,gl,gp1,gl1:CARDINAL) cocogra, cocosem Copy PROC (typ,col:CARDINAL) cocogen, cocosem CopyFramePart PROC 4VAR f1,f2:File; cocogen, cocogen2 ddt ARRAY ["A".."Z"] cocolex, OF coco, s:ARRAY OF CHAR) BOOLEAN cocogra, cocosem, PROC (VAR ptr:ADDRESS) System, cocogen, cocogen2, PROC cocosem, cocosem ConcatRight Deletable cocolex, gp/9g1,gpl1,g11:CARDINAL) cocogra, Deallocate 215 cocosym, cocotst Errors (loc:CARDINAL) : BOOLEAN cocogra, cocosym, cocotst DeleteRedundantEps PROC cocogra, Coco DelNode Direction PROC (gn:Graphnode) : BOOLEAN cocogra, cocosym, cocotst (up, down) cocosym, cocosem Done BOOLEAN EF CONST EmitAction PROC (line:CARDINAL; cocogen, cocosem EOL CONST FileIO, FileIO, cocogen, cocogen2 cocolex, cocolst VAR cocolex, cocolst RECORD Errors, cocosyn Errorptr POINTER Errors, TO Errornode cocolst, cocosyn File RECORD FileIO, coco, BOOLEAN cocogen, coco FindCircularRules PROC cocotst, (VAR coco filesopen FindDelSymbols sem:CARDINAL) CHAR FileIO, Errornode coco, CHAR PROC cocosym, Coco cocogen, cocogen2, ok:BOOLEAN) cocolex, cocolst, Errors
GenAssign GenSynFiles GetA cocosem PROC cocogen2, coco PROC (n:CARDINAL; GetF PROC GetFirstSet PROC set:Symbolset) VAR (sy:CARDINAL; set:Symbolset) VAR set:Symbolset) cocotst (spix:CARDINAL; cocosym, VAR sem:CARDINAL) cocosem (spix:CARDINAL;VAR cocolex, cocogen, name:ARRAY cocogen2, PROC (VAR nr, line,col:CARDINAL) BrFOLSTREOCOLSE GetNextSynErr PROC (VAR GetNode PROC PROC VAR CHAR;VAR cocosym, cocogen2, (VAR VAR gn:Graphnode) cocosem, synerrors, cocosym, cocotst semerrors:CARDINAL) EEEORSTELCHEO GetSy PROC cocolex, cocosyn GetSy PROC (sy:CARDINAL; cocosym, cocogen2, GetSymbolSets PROC cocosym, coco gramspix CARDINAL cocosym, cocogen2, PROC cocogra, cocosem RECORD cocogra, cocogen2, GraphList Graphnode InsertFramePart PROC cocogen, cocosem VAR sn:Symbolnode) cocogra, cocosem, cocotst cocosem cocosem, len:CARDINAL) cocotst line,col:CARDINAL) cocolst (p:CARDINAL; cocogra, GetNumberOfErrors symbols:Errorptr; OF cocogra, GetNextSemErr Errors, dir:Direction) cocotst cocosym, PROC VAR VAR first:Symbolset) cocotst (loc:CARDINAL; cocosym, GetName VAR (sy:CARDINAL; cocosym, cocogen2, PROC spix:CARDINAL; cocosem PROC (n:CARDINAL; cocosym, cocogen2 PROC set:Symbolset) VAR (sy,n:CARDINAL; PROC GetE GetMacroNr VAR cocogen2 cocosym, GetFo left, right :CARDINAL) (typ:Attrtype; PROC cocogen, cocosym, Get At App. E Intermodular cross-reference list 216 cocosym, cocotst
App. E IsInSet line LL1Test Intermodular cross-reference list PROC (n:CARDINAL; cocosym, cocotst CARDINAL cocolex, cocogen, PROC (VAR cocotst, lst PROC PROC cocogen2, (loc:CARDINAL; cocosym, cocosym, cocogen2 cocogen2 maxn CARDINAL maxp CARDINAL cocogra, cocotst m:Marklist): 16] BOOLEAN OF BITSET Cocogen2, cocosym cocogen2, cocogra, cocotst cocosym, cocogen2, cocogra, cocotst CARDINAL cocogen, cocogen2 cocosym, CARDINAL CARDINAL cocosym, PROC cocogen2, NewEpsBeforeDelNts dir:Direction) cocosem PROC cocogra, PROC cocotst (sy,spix:CARDINAL; cocosym, NewMacro VAR cocotst cocosym, cocosym, cocotst DIV cocosym, CARDINAL cocosym, NewAt cocosyn m:Marklist) cocogra, maxeps maxt VAR ARRAY[O..maxnodes CARDINAL maxsem cocosym, cocotst (loc:CARDINAL; maxany maxs cocosem, 111:BOOLEAN) coco, cocogra, Marklist cocogen2, BOOLEAN File cocogra, Marked s:Symbolset): coco cocolst, Mark VAR 217 Coco (spix,sem:CARDINAL; cocosym, VAR ok:BOOLEAN) cocosem NewNode PROC (typ:Symboltype; sp, line:CARDINAL) :CARDINAL cocogra, cocosem NewSy PROC (spix:CARDINAL; cocosym, cocosem normal enumeration constant System, coco, Errors Open PROC (VAR f:File; output : BOOLEAN) FileIO, coco, typ:Symboltype) : CARDINAL volRef: INTEGER; cocogen, cocogen2, fn:ARRAY cocolst OF CHAR;
OpenFile PROC (spix:CARDINAL) cocogen, OpenSem PROC cocosem (line:CARDINAL; cocogen, VAR sem:CARDINAL) cocosem Parse PROC (VAR correct :BOOLEAN) cocosyn, COCO printinput BOOLEAN cocosyn, PrintListing COCO, coco BOOLEAN cocosyn, PrintSynError cocolex PROC cocolst, printnodes App. E Intermodular cross-reference list 218 PROC coco, (VAR cocolex f:File; VAR synerrors:CARDINAL) VAR ch:CHAR) ERROESTRECOCONSE PutStatistics PROC cocogen2, Read PROC FileIO, RepNode PROC coco (VAR £:File; coco, (p:CARDINAL; cocogra, RepSy RestartHash Restriction PROC cocogen, PROC cocolex, cocosem PROC sn:Symbolnode) cocogra, rootloc CARDINAL rules CARDINAL cocogra, cocogra, cocogra, cocolex, cocosem, cocogen2, cocosem PROC (sem:CARDINAL) cocosem, cocosyn SemErr PROC cocotst cocosym, cocosym cocotst (nr, line, col:CARDINAL) Errors, cocogen, (VAR cocosym, src cocosem, cocogen2, Semant PROC cocosem, (nr:CARDINAL) Errors, SetBit Errors cocosym (sy:CARDINAL; cocogen2, cocolst, gn:Graphnode) cocosem, cocosym, cocolex, cocogen2, cocolex, s:Symbolset) cocotst File cocolex, coco, cocogen, StartCopy PROC (col:CARDINAL) cocogen, cocosem StopHash PROC Symbolnode RECORD cocolex, cocosem cocolst cocosem, cocosym
App. E Intermodular cross-reference list cocosym, Symbolset cocogen2, SyNr PROC cocogen2, cocotst OF BITSET SyntaxError PROC PROC coco, (VAR cocotst, PROC Errors ok:BOOLEAN) Coco PROC cocotst, (VAR ok:BOOLEAN) coco (VAR ok:BOOLEAN) cocotst, Coco CARDINAL cocolex, cocosem, PROC (VAR cocosym, PROC cocosyn sl,s2:Symbolset; n:CARDINAL) cocotst (VAR f:File; FileIO, cocogen, cocosym, Errors PROC line,col:CARDINAL) (st:Status) PROC TestIfAllNtReached cocotst cocosyn System, TestCompleteness cocosem, cocosem (symbols:Errorptr; Errors, Terminate cocogra, (spix:CARDINAL) : CARDINAL cocosym, WriteCard cocosem, 16] (eps,t,pr,nt,any,err) cocosym, typ DIV cocotst cocogen2, cocosym, Symbolt ype TestI£NtToTerm cocogra, ARRAY [0..maxterminals 219 (VAR f:File; ch:CHAR) cocogen2, cocolex, nr:CARDINAL; cocolst, w: INTEGER) FileIO, cocogen, cocogen2, cocogra, cocolex, cocosem, cocosym, cocosyn, cocotst, Errors Writelnt PROC (VAR f:File; FilelIO, coco WriteLn PROC (VAR f:File) FileIO, coco, cocogen, cocosyn, WriteString PROC (VAR FileIO, cocosem, WriteText cocotst, f:File; coco, nr:INTEGER; w: INTEGER) cocogen2, cocogra, cocolst, cocosym, cocogra, cocolex, cocolst, Errors s:ARRAY cocogen, cocosym, cocolst, OF CHAR) cocogen2, cocosyn, cocotst, Errors PROC (VAR f:File; t:ARRAY OF CHAR; 1: INTEGER) FileIO, cocogen, cocogen2, cocogra, cocolex, cocosym, cocotst, Errors
F Program listings

This appendix contains the program listings of Coco, more than 3500 lines of Modula-2 source code. It is not our intention to describe the program step by step. Instead, we want to give the reader an overview of the function of the individual modules, and to point out where to start reading and which procedures to review further in order to understand the program. Modula-2 has a high degree of self-documentation: a large program can be partitioned into small modules that are easy to understand, and these modules can be separated into even smaller procedures that are again easy to understand. After reviewing the algorithms in Chapters 2, 3 and 7, the reader should have little difficulty understanding all the details of Coco.

F.1 Overview

Figure F.1 shows the phases of Coco with their modules and the data flow between them. The lexical analyzer (cocolex) reads the compiler description and separates it into tokens. The syntax analyzer (cocosyn) checks the syntax of the input stream and drives the semantic processing program (cocosem) by activating semantic actions via action numbers. In this phase, the symbol list (in cocosym) and the top-down graph (in cocogra) are generated. The module cocogen generates the new semantics evaluator from the semantic actions of the compiler description. Finally, the symbol list and the top-down graph are analyzed in the grammar tests (cocotst), and if these tests have been successfully completed, the new syntax analyzer with its parser tables is generated. Since Coco was constructed by itself, the syntax analyzer (cocosyn) and its semantic evaluator (cocosem) are examples of compiler parts produced by Coco.
Fig. F.1 Phases and modules of Coco (compiler description; lexical analysis: cocolex; symbols, attributes; syntax analysis: cocosyn; semantic analysis: cocosem, cocosym, cocogra, cocogen; symbol list; compiler generation; syntax analyzer)

F.2 Module hierarchy

Coco consists of

1. 10 Coco-related modules
      coco       main module
      cocolex    lexical analyzer
      cocosyn    syntax analyzer
      cocosem    semantic evaluator
      cocogra    top-down graph handler
      cocosym    symbol list handler
      cocotst    grammar tests
      cocogen    generator of the new semantic evaluator
      cocogen2   generator of the new syntax analyzer and the parser tables
      cocolst    source list generator
2. 2 general purpose standard modules
      Errors     general error module for compilers generated by Coco
      FileIO     input/output procedures
3. 1 operating system module (not part of Coco)
      System     dynamic memory management (heap)
Figure F.2 shows the module hierarchy. An arrow from module A to module B means that A calls B. Arrows leading to the operating system module and the standard modules are not shown for simplicity. Those modules are used by almost all of the other modules, and are not a direct part of Coco.

Fig. F.2 Module hierarchy with relation 'uses procedures from'

F.3 Module descriptions

We will now give a short description of all modules of the Coco system. A diagram for each module will show which procedures are called from other modules.

coco

coco is the main module. It opens the source file and the list file and calls the syntax analyzer (Parse). When the syntax analysis is completed, the source file has been read, and the symbol list and a top-down graph have been stored. The top-down graph is further processed by inserting and deleting eps-nodes at certain positions (NewEpsBeforeDelNts, DelRedundantEps) and the terminal start symbols are collected (FindDelSymbols, GetSymbolSets). After that, coco calls the grammar tests (FindCircularRules, TestIfNtToTerm, TestCompleteness, TestIfAllNtReached, LL1Test) and generates the target compiler (GenSynFiles) if no errors are found. At the end, statistics about the compilation are written to the list file (PutStatistics), and all files are closed.
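The order of the calls made by the main module can be summarized in a short sketch. It is written in Python rather than Modula-2 and is not taken from the listing of coco. All parameters are hypothetical stand-in callables, and two points are assumptions that the description above does not settle: that the graph processing and the grammar tests are skipped when parsing fails, and that all grammar tests are run even after one of them has failed.

```python
def run_coco(parse, prepare_steps, grammar_tests, generate, finish):
    """Call the phases in the order described for the main module.

    Every argument is a hypothetical stand-in, not a Coco procedure.
    Returns True if the target compiler was generated.
    """
    generated = False
    if parse():                      # stands for Parse
        for step in prepare_steps:   # eps-node handling, symbol sets
            step()
        # Assumption: run every test so that all problems are reported.
        results = [test() for test in grammar_tests]
        if all(results):
            generate()               # stands for GenSynFiles
            generated = True
    finish()                         # stands for PutStatistics and closing files
    return generated


# Small demonstration with stand-ins that only record their names.
log = []


def stub(name, result=None):
    def call():
        log.append(name)
        return result
    return call


ok = run_coco(stub("parse", True),
              [stub("insert eps"), stub("delete eps")],
              [stub("test 1", True), stub("test 2", False)],
              stub("generate"),
              stub("finish"))
```

In this run the second stand-in test fails, so the generation step is skipped while the final step is still carried out.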
Fig. F.3 coco and the modules imported by it (procedures shown: FindDelSymbols, GetSymbolSets, NewEpsBeforeDelNts, DelRedundantEps, GenSynFiles, PutStatistics, CloseFile, FindCircularRules, TestIfNtToTerm, TestCompleteness, TestIfAllNtReached, LL1Test)

cocolex

cocolex is the lexical analyzer of Coco. It reads the Cocol input, separates it into tokens, and passes them together with their attributes to the syntax analyzer. Names and strings are stored in a name list. Numbers are translated into their numeric value. The main procedure of cocolex is GetSy.

cocosyn

cocosyn is the syntax analyzer of Coco and has been generated by Coco itself. It operates according to the table-driven LL(1) parsing algorithm described in Section 2.5, and uses the error-handling mechanism described in Section 2.6. cocosyn gets the source tokens from the lexical analyzer (GetSy), analyzes them, and calls the procedure Semant to execute the semantic actions.

Fig. F.4 cocosyn and the modules imported by it

cocosem

cocosem is the semantics evaluator of Coco. It has been generated by Coco itself and contains the semantic actions of the attributed grammar of Coco. cocosem calls the procedures for the generation and management of the symbol list and the top-down graph:

1. symbol handling: NewSy, GetSy, RepSy, SyNr;
2. attribute handling: NewAt, GetAt, CompleteAt;
3. top-down graph handling: NewNode, GetNode, RepNode, ConcatLeft, ConcatRight, GraphList;
4. generation of the semantic evaluator: OpenFile, CloseFile, OpenSem, StartCopy, Copy, InsertFramePart, GenAssign, EmitAssign, EmitAction;
5. handling of the semantic macros: NewMacro, GetMacroNr;
6. control over the entries into the name list: StopHash, RestartHash.
The listing of cocosem is an example of a large semantic evaluator generated by Coco. It is not useful to study cocosem itself, however; one should study the attributed grammar instead.

Fig. F.5 cocosem and the modules imported by it (procedures shown: OpenFile, ConcatRight, CloseFile, Copy, InsertFramePart, StartCopy, GraphList, OpenSem, CompleteAt, GenAssign, NewMacro, GetMacroNr, EmitAction)

cocosym

The module cocosym handles the symbol list of Coco. It contains procedures to generate, read, and modify symbol nodes, to search names in the symbol list, to enter, read, and check attributes, and to generate and retrieve information about semantic macros. It also contains procedures to determine the deletability of nonterminals, and to collect their terminal start symbols. cocosym uses a few procedures from cocolex and cocogra.

Fig. F.6 cocosym and the modules imported by it (procedures shown: ClearMarkList, Mark, Marked)

cocogra

The module cocogra handles the top-down graph. It contains procedures to generate, read, and modify graph nodes, to link subgraphs, and to print the entire top-down graph for tracing. cocogra also contains procedures to insert eps-nodes in front of deletable nonterminals, and to remove redundant eps-nodes. To output the top-down graph, cocogra needs the syntax symbols and their names, which it gets from the modules cocosym and cocolex.

cocogen

The module cocogen generates the semantic evaluator of the target compiler from the semantic declarations and semantic actions of the input grammar. It contains procedures to
read the frame module, to copy the semantic parts from the attributed grammar, and to translate attributes into semantic actions. cocogen uses no other modules of Coco except for the lexical analyzer, from which it gets the symbol names.

Fig. F.7 cocogra and the modules imported by it (procedure shown: RepSy)

Fig. F.8 cocogen and the modules imported by it

cocotst

The module cocotst is a collection of procedures for the execution of the grammar tests as described in Section 7.5. It uses the symbol list (from cocosym) and the top-down graph (from cocogra). For the output of error messages, cocotst needs the symbol names, which are obtained with the procedure GetName. To recognize the deletability of graph nodes and subgraphs, it uses the procedures Deletable and DelNode from cocogra.

Fig. F.9 cocotst and the modules imported by it (procedures shown: Deletable, DelNode, ClearMarkList, Mark, Marked)

cocogen2

The module cocogen2 generates the syntax analyzer and the parser tables of the target compiler. The table values are obtained from the symbol list (with GetSy, RepSy, GetF, GetE, and GetA) and from the top-down graph (GetNode). Before the tables can be inserted into the syntax analyzer, cocogen2 transforms the top-down graph into G-code instructions. The syntax analyzer of the target compiler is assembled mainly from the frame parts (on the file cocosynframe), in which cocogen2 inserts the parser tables, some
declarations, and grammar-specific names. For the output of statistics, cocogen2 uses the procedure GetName from the lexical analyzer.

Fig. F.10 cocogen2 and the modules imported by it (procedure shown: CopyFramePart from cocogen)

cocolst

cocolst is called by the main program if errors have been detected during parsing. It reads the input again and prints a source list with error messages.

Errors

Errors is a general-purpose error message module that can be used by all compilers generated by Coco. It contains procedures for storing semantic and syntax errors, for retrieving stored error messages, and for printing all of the stored error messages at the end of the program. In addition, it contains procedures for handling implementation restrictions and compiler errors.

FileIO

FileIO is a general-purpose module that contains screen and disk I/O procedures for characters, strings, and numbers. It is based on five system modules which are not described in this book. These are Terminal, MemTypes, OS, Toolbox and QuickDraw (see Inside Macintosh [1985] and Wirth et al. [1986]).

System

System is an operating system module that among other things manages the heap.

F.4 Instructions on how to study the source code

The listings consist of the attributed grammar of Coco and all other modules in alphabetical order. The reader should first study the source code of the main module coco to see how the program is started and initialized. The lexical analyzer and the syntax analyzer are not essential for an understanding of the other modules, so they may be skipped in the beginning. The central document that describes the actual translation is the attributed grammar. The reader should study the attributed grammar and the procedures that are called from the semantic actions in detail. It is recommended that the procedures belonging to a particular task are studied together. These tasks are:
    handling the symbol list: NewSy, GetSy, RepSy, IsSy
    handling the attributes: NewAt, GetAt, CompleteAt
    handling the top-down graph: NewNode, GetNode, RepNode, ConcatLeft, ConcatRight, GraphList
    generating the semantic evaluator: CloseFile, CopyFramePart, InsertFramePart
    copying semantic parts: OpenSem, StartCopy, Copy
    generating attribute assignments: GenAssign, EmitAction
    handling semantic macros: NewMacro, GetMacroNr
    controlling the name list entries: StopHash, RestartHash

The procedures for the collection of the symbol sets and the execution of the grammar tests may be studied in any order. The only procedures used almost everywhere are the procedures for marking paths that have been previously visited in traversing the top-down graph (ClearMarkList, Mark, and Marked in cocogra) and the procedures which check the deletability of graphs and graph nodes (Deletable and DelNode in cocogra). These procedures should be read first.

As the last module, the reader should study cocogen2. It generates the parser tables and the syntax analyzer, and uses the data structures generated by the other modules. The reader should study those other modules first to understand how the data structures are filled.

Before an implementation module is studied, the corresponding definition module should be inspected. It describes the interface of the module, and contains the declarations and descriptions of all exported objects. The procedures of an implementation module appear in alphabetical order. Most of them are at the outermost level of the module. Only auxiliary procedures that are clearly part of another procedure are nested within this procedure. Each implementation module is followed by a cross-reference list. As an additional aid, Appendix E contains an intermodular cross-reference list with the names and types of all objects transferred between modules.
This list also shows which modules export an object and which import it.

Program listings in alphabetical order

    coco.ATG                     attributed grammar
    coco.MOD                     main program
    cocogen.DEF, cocogen.MOD     generator of semantics processor
    cocogen2.DEF, cocogen2.MOD   generator of the syntax analyzer
    cocogra.DEF, cocogra.MOD     top-down graph manager
    cocolex.DEF, cocolex.MOD     lexical analyzer
    cocolst.DEF, cocolst.MOD     source list generator
    cocosem.DEF, cocosem.MOD     semantic evaluator of Coco
    cocosemframe                 semantics evaluator frame
    cocosym.DEF, cocosym.MOD     symbol list manager
    cocosyn.DEF, cocosyn.MOD     syntax analyzer
    cocosynframe                 syntax analyzer frame
    cocotst.DEF, cocotst.MOD     grammar tests
    Errors.DEF, Errors.MOD       standard error module
    FileIO.DEF, FileIO.MOD       input/output module
    System.DEF                   dynamic memory management
-- Attributed grammar of Coco                                      13.3.83
-- ==========================
-- This grammar is a documentation of the compiler compiler Coco, but it is
-- also an example how to use the Coco input language Cocol.
-- The grammar describes the construction of the parser tables and of
-- the semantic evaluator.

GRAMMAR coco

-- coco      = GRAMMARSY IDENT
--             [SEMANTICSY DECLARATIONSY {any}]
--             [MACROSY {macrodef}]
--             TERMINALSY {symbol [attr] [aliasname]}
--             [PRAGMASY {symbol [attr] [semaction]}]
--             NONTERMINALSY {IDENT [attr] [aliasname]}
--             RULESSY {IDENT [attr] '=' expr '.'}
--             ENDGRAMSY .
-- expr      = term {'|' term} .
-- term      = fact {fact} .
-- fact      = ( symbol [attr]
--             | EPSSY
--             | ANYSY
--             | semaction
--             | '(' expr ')'
--             | '[' expr ']'
--             | '{' expr '}'
--             ) .
-- attr      = '<' (outattr | inattr [';' outattr]) '>' .
-- inattr    = INSY ':' (IDENT | NUMBER) {',' (IDENT | NUMBER)} .
-- outattr   = OUTSY ':' IDENT {',' IDENT} .
-- semaction = SEMSY ( '(' IDENT ')' | {any}) ENDSEMSY .
-- macrodef  = SEMSY ":" IDENT ":" {any} ENDSEM .
-- symbol    = IDENT | STRING .
-- aliasname = ALIASSY symbol .

SEMANTIC DECLARATIONS
--===================

FROM cocogen IMPORT Attrtype, CloseFile, Copy, EmitAction, GenAssign,
                    InsertFramePart, OpenFile, OpenSem, StartCopy;
FROM cocogra IMPORT alts, rules, rootloc, ConcatLeft, ConcatRight, GetNode,
                    GraphList, Graphnode, NewNode, RepNode;
FROM cocolex IMPORT typ, line, col, ddt, RestartHash, StopHash;
FROM cocosym IMPORT gramspix, CompleteAt, Direction, GetAt, GetMacroNr, GetSy,
                    NewAt, NewMacro, NewSy, RepSy, Symbolnode, Symboltype, SyNr;
FROM Errors  IMPORT CompErr, Restriction, SemErr;
FROM SYSTEM  IMPORT VAL;

CONST
  null = 65535;                      -- null symbol
TYPE
  Usage = (def, check, use);
VAR
  -- symbol nodes
  sn:              Symbolnode;       -- symbol node
  sy, sy1:         CARDINAL;         -- symbol numbers
  rootsy:          CARDINAL;         -- start symbol of grammar
  eofsy:           CARDINAL;         -- endfile symbol (always Nr. 0)
  -- graph nodes
  gn:              Graphnode;        -- graph node
  gp,gp1,gp2,gp3:  CARDINAL;         -- ptr to start of graphs
  gl,gl1,gl2,gl3:  CARDINAL;         -- ptr to right open ends of graphs
  dd,dd1,dd2:      BOOLEAN;          -- is graph deletable ?
  gpo:             CARDINAL;         -- auxiliary ptr
  firstfact:       BOOLEAN;          -- TRUE if first factor in term
  -- attribute processing
  kind:            Usage;            -- usage of attribute
  styp:            Symboltype;       -- (eps,t,pr,nt,any,err)
  dir, dir1:       Direction;        -- input/output attribute
  count:           CARDINAL;         -- attribute counter
  n:               CARDINAL;         -- value of an attribute constant
  -- generation of semantic evaluator
  sem1,sem2,sem3:  CARDINAL;         -- semantic actions
  firstsymbol:     BOOLEAN;          -- current symbol the first in action ?
  -- various auxiliaries
  ok:              BOOLEAN;          -- error indicator
  spix, spix1:     CARDINAL;
  dummy:           CARDINAL;

-- SEMANTICSTACK          Stack to save semantic values
--======================================================
MODULE SEMANTICSTACK;
  IMPORT CompErr, Restriction;
  EXPORT Pop, Push;
  CONST maxstacksize = 70;
  VAR
    stack: ARRAY[1..maxstacksize] OF CARDINAL;
    sp:    CARDINAL;

  PROCEDURE Pop(): CARDINAL;
    VAR x: CARDINAL;
  BEGIN
    IF sp=0 THEN CompErr(6); ELSE x:=stack[sp]; DEC(sp); END;
    RETURN x;
  END Pop;

  PROCEDURE Push(x:CARDINAL);
  BEGIN
    IF sp<maxstacksize
      THEN INC(sp); stack[sp]:=x;
      ELSE Restriction(14);
    END;
  END Push;

BEGIN
  sp:=0;
END SEMANTICSTACK;

-- Error                  Report semantic error
--======================================================
PROCEDURE Error(nr:CARDINAL);
BEGIN SemErr(nr,line,col); END Error;

sem :AssignId1:
  INC(count);
  CASE kind OF
    use:
      IF styp=nt THEN
        GetAt(!sy,!count,^spix1,^dir1);
        IF spix1<>0 THEN
          IF dir=dir1
            THEN GenAssign(!nonterm,!spix1,!spix);
            ELSE Error(8);
          END;
        END;
      END;
  | check:
      IF styp=nt THEN
        GetAt(!sy,!count,^spix1,^dir1);
        IF spix1<>0 THEN
          IF spix<>spix1 THEN Error(9); END;
          IF dir<>dir1 THEN Error(8); END;
        END;
      END;
  | def:
      NewAt(!sy,!spix,!dir);
  END; -- CASE
endsem

sem :AssignId2:
  INC(count);
  CASE kind OF
    use:
      IF styp=t THEN
        GenAssign(!term,!spix,!count);
      ELSIF styp=nt THEN
        GetAt(!sy,!count,^spix1,^dir1);
        IF spix1<>0 THEN
          IF dir=dir1
            THEN GenAssign(!nonterm,!spix,!spix1)
            ELSE Error(8);
          END;
        END;
      END;
  | check:
      IF styp=nt THEN
        GetAt(!sy,!count,^spix1,^dir1);
        IF spix1<>0 THEN
          IF spix<>spix1 THEN Error(9); END;
          IF dir<>dir1 THEN Error(8); END;
        END;
      END;
  | def:
      NewAt(!sy,!spix,!dir);
      IF styp=pr THEN
        GenAssign(!term,!spix,!count);
      END;
  END; -- CASE
endsem

sem :AssignNumber:
  INC(count);
  IF kind=use
    THEN
      IF styp=nt THEN
        GetAt(!sy,!count,^spix1,^dir1);
        IF spix1<>0 THEN
          IF dir=dir1
            THEN GenAssign(!const,!spix1,!n);
            ELSE Error(8);
          END;
        END;
      END;
    ELSE Error(10);
  END;
endsem

sem :CheckAttr:
  IF NOT CompleteAt(!sy,!count)
    THEN Error(6);
  END;
endsem

sem :Copy:
  Copy(typ,col)
endsem

sem :InitCopy:
  StartCopy(1)
endsem

sem :PopPointers:
  firstfact:=VAL(BOOLEAN,Pop());
  dd1:=VAL(BOOLEAN,Pop()); gl1:=Pop(); gp1:=Pop();
  dd:=VAL(BOOLEAN,Pop()); gl:=Pop(); gp:=Pop();
  gpo:=0
endsem

sem :PushPointers:
  Push(!gp); Push(!gl); Push(!VAL(CARDINAL,dd));
  Push(!gp1); Push(!gl1); Push(!VAL(CARDINAL,dd1));
  Push(!VAL(CARDINAL,firstfact));
endsem

sem :StoreSymbol:
  sy:=SyNr(!spix);
  IF sy=null
    THEN sy:=NewSy(spix,styp)
    ELSE Error(1);
  END;
endsem

TERMINALS
--=======
-- key words
ALIASSY        alias "ALIAS"           --  1: ALIAS
ANYSY          alias "any"             --  2: ANY, any
DECLARATIONSY  alias "DECLARATIONS"    --  3: DECLARATIONS
ENDGRAMSY      alias "ENDGRAM"         --  4: ENDGRAM
ENDSEMSY       alias "endsem"          --  5: ENDSEM, endsem
EPSSY          alias "eps"             --  6: EPS, eps
GRAMMARSY      alias "GRAMMAR"         --  7: GRAMMAR
INSY           alias "in"              --  8: IN, in
MACROSY        alias "MACROS"          --  9: MACROS
NONTERMINALSY  alias "NONTERMINALS"    -- 10: NONTERMINALS
OUTSY          alias "out"             -- 11: OUT, out
PRAGMASY       alias "PRAGMAS"         -- 12: PRAGMAS
RULESSY        alias "RULES"           -- 13: RULES
SEMSY          alias "sem"             -- 14: SEM, sem
SEMANTICSY     alias "SEMANTICS"       -- 15: SEMANTICS
TERMINALSY     alias "TERMINALS"       -- 16: TERMINALS

-- terminal classes
IDENT   <out:spix>  alias identifier   -- 17: name
STRING  <out:spix>  alias string       -- 18
NUMBER  <out:n>     alias constant     -- 19

-- single characters
...                                    -- 20 to 34

NONTERMINALS
--==========

coco alias "correct grammar"                                      -- 35
  -- recognizes the whole compiler description
expr <out:gp,gl,dd> alias expression                              -- 36
  -- recognizes an expression and builds its TDG.
  -- gp points to the root of the TDG
  -- gl points to right open ends of the TDG
  -- dd indicates if the TDG is deletable
term <out:gp1,gl1,dd1> alias alternative                          -- 37
  -- recognizes an alternative and builds its TDG.
  -- gp1 points to the root of the TDG
  -- gl1 points to right open ends of the TDG
  -- dd1 indicates if the TDG is deletable
fact <in:gpo,firstfact; out:gp2,gl2,dd2,gpo> alias symbol         -- 38
  -- recognizes a component and builds its TDG.
  -- gp2 points to the root of the TDG
App. F coco.ATG 296 297 298 299 300 ---- 301 302 303 304 305 306 307 308 309 310 Ss] 312 is TRUE, if fact is the out:seml,sem2,count> „alias attribute first is 0 one in the term == 49) -- recognizes input/output attributes for the symbol -- with type styp. -- kind=def: used in declaration context == seml=0. sem2=0 (except of pragmas) -- kind=check: used on the left-hand side of rules inattr seml=0, -- kind=use: == Saye -- count is the used on the right-hand side of rules seml: sem.no. of input attribute evaluation sem2: sem.no. of output attribute evaluation nr.of attributes in attr sem2=0 out:seml,count> == alias "in-attribute" input/output attributes -- recognizes -- with type styp 315 sy -- <in:sy,styp,kind,count; 313 314 316 317 318 319 320 321 322 323 324 325 gl2 points to right open ends of the TDG dd2 indicates if the TDG is deletable gpo points to the predecessor of fact or -- firstfact <in:sy,styp,kind; attr 233 for the symbol (sy must be a nonterminal). 40 sy -- kind=def: used in declaration context -seml=0. -- kind=check: used on the left-hand side of rules == seml=0. -- kind=use: used on the right-hand side of rules = seml: sem.no. of input attribute evaluation -- count is the no.of attributes in inattr <in:sy,styp,kind,count; out:sem2,count> alias "out-attribute” -- recognizes input/output attributes for the symbol sy outattr 326, -- 321: -- kind=def: with type styp. used in declaration 328 329 330 331 332 --- kind=check: _— -- kind=use: ca sem2=0. used on the left-hand side of rules sem2=0. used on the right-hand side of rules sem2: sem.no. of output attribute evaluation context 333 -- count is the no.of attributes in outattr 334 semaction <out:sem3> alias "semantic action" == 42 335 -- recognizes a semantic action and generates a CASE block 336 -- in Semant. sem2 is the action number. 
337 macrodef alias “semantic macro” as 4) 338 symbol <out:spix> -- 44 339 -- recognizes a name or a string 340 aliasname <in:sy> alias "alias name" =_=45 341 -- recognizes a name which is used for the symbol sy in 342 -- syntax error messages in the generated compiler. 343 344 345 --======================== grammar rules ================2=============== 346 347 RULES coco = 348 GRAMMARSY 349 350 351 IDENT <out:gramspix> sem rules:=0; alts:=0; OpenFile (gramspix); endsem 352 353 354 355 [ SEMANTICSY { any DECLARATIONSY sem sem (InitCopy) endsem (Copy) endsem StopHash;
  ]                               sem RestartHash; InsertFramePart; endsem
  [ MACROSY
    { macrodef
    }
  ]
  TERMINALSY                      sem eofsy:=NewSy(!0,!t); styp:=t; endsem
  { symbol <out:spix>             sem (StoreSymbol) endsem
    [ attr <in:sy,t,def; out:sem1,sem2,count> ]
    [ aliasname <in:sy> ]
  }
  [ PRAGMASY                      sem styp:=pr endsem
    { symbol <out:spix>           sem (StoreSymbol) endsem
      [ attr <in:sy,pr,def; out:sem1,sem2,count>
                                  sem GetSy(!sy,^sn); sn.sem1:=sem2;
                                      RepSy(!sy,!sn);
                                  endsem
      ]
      [ semaction <out:sem3>      sem GetSy(!sy,^sn); sn.sem2:=sem3;
                                      RepSy(!sy,!sn);
                                  endsem
      ]
    }
  ]
  NONTERMINALSY                   sem styp:=nt endsem
  { IDENT <out:spix>              sem (StoreSymbol) endsem
    [ attr <in:sy,nt,def; out:sem1,sem2,count> ]
    [ aliasname <in:sy> ]
  }                               sem rootsy:=SyNr(!gramspix);
                                      IF rootsy=null THEN Error(2); END;
                                  endsem
  RULESSY
  { IDENT <out:spix>              sem sy:=SyNr(!spix);
                                      IF sy=null THEN
                                        Error(3); sy:=NewSy(!spix,!err)
                                      END;
                                      GetSy(!sy,^sn);
                                      IF (sn.typ<>nt) AND (sn.typ<>err) THEN
                                        Error(4);
                                      END;
                                      IF sn.start<>0 THEN Error(5); END;
                                      sy1:=sy; count:=0; styp:=sn.typ
                                  endsem
    [ attr <in:sy,styp,check; out:sem1,sem2,count> ]
                                  sem (CheckAttr) endsem
    '='
    expr <out:gp,gl,dd>           sem GetSy(!sy1,^sn);
                                      sn.start:=gp; sn.del:=dd;
                                      RepSy(!sy1,!sn); INC(rules);
                                  endsem
    '.'
  }                               sem rootloc:=NewNode(!nt,!rootsy,!0);
                                      gp1:=NewNode(!t,!eofsy,!0);
                                      gl:=rootloc; gl1:=gp1;
                                      ConcatRight(rootloc,gl,!gp1,!gl1)
                                  endsem
  ENDGRAMSY                       sem IF ddt["L"] THEN GraphList; END;
                                      CloseFile;
                                  endsem.
------------------------------------------------------------------------
expr <out:gp,gl,dd> =
  term <out:gp,gl,dd>             sem INC(alts); endsem
  { '|'
    term <out:gp1,gl1,dd1>        sem INC(alts);
                                      ConcatLeft(gp,gl,!gp1,!gl1);
                                      dd:=dd OR dd1
                                  endsem
  }.
------------------------------------------------------------------------
term <out:gp1,gl1,dd1> =          sem gpo:=0 endsem
  fact <in:gpo,TRUE; out:gp1,gl1,dd1,gpo>
  { fact <in:gpo,FALSE; out:gp2,gl2,dd2,gpo>
                                  sem IF gp2<>0 THEN
                                        ConcatRight(gp1,gl1,!gp2,!gl2);
                                        dd1:=dd1 AND dd2;
                                      END;
                                  endsem
  }.
------------------------------------------------------------------------
fact <in:gpo,firstfact; out:gp2,gl2,dd2,gpo> =
  ( symbol <out:spix>             sem sy:=SyNr(!spix);
                                      IF sy=null THEN
                                        Error(3); sy:=NewSy(!spix,!err)
                                      END;
                                      GetSy(!sy,^sn);
                                      IF sn.typ=pr THEN Error(16); END;
                                      gp2:=NewNode(!sn.typ,!sy,!line);
                                      gl2:=gp2; dd2:=FALSE; gpo:=gp2;
                                      count:=0; styp:=sn.typ
                                  endsem
    [ attr <in:sy,styp,use; out:sem1,sem2,count>
                                  sem GetNode(!gp2,^gn);
                                      gn.sem1:=sem1; gn.sem2:=sem2;
                                      RepNode(!gp2,!gn)
                                  endsem
    ]                             sem (CheckAttr) endsem
  | EPSSY                         sem gp2:=NewNode(!eps,!0,!line); gl2:=gp2;
                                      dd2:=TRUE; gpo:=gp2
                                  endsem
  | ANYSY                         sem gp2:=NewNode(!any,!0,!line); gl2:=gp2;
                                      dd2:=FALSE; gpo:=gp2
                                  endsem
  | semaction <out:sem3>          sem IF gpo=0 THEN
                                        gp2:=NewNode(!eps,!0,!line);
                                        gl2:=gp2; dd2:=TRUE;
                                        GetNode(!gp2,^gn); gn.sem3:=sem3;
                                        RepNode(!gp2,!gn);
                                      ELSE
                                        GetNode(!gpo,^gn); gn.sem3:=sem3;
                                        RepNode(gpo,gn);
                                        gp2:=0; gl2:=0; gpo:=0
                                      END;
                                  endsem
  | '('                           sem (PushPointers) endsem
    expr <out:gp2,gl2,dd2>
    ')'                           sem (PopPointers) endsem
  | '['                           sem (PushPointers) endsem
    expr <out:gp,gl,dd>           sem gp2:=NewNode(!eps,!0,!line);
                                      gl2:=gp2;
                                      ConcatLeft(gp,gl,!gp2,!gl2);
                                      gp2:=gp;
                                      gl2:=gl; dd2:=TRUE;
                                  endsem
    ']'                           sem (PopPointers) endsem
  | '{'                           sem (PushPointers) endsem
    expr <out:gp,gl,dd>           sem gp2:=NewNode(!eps,!0,!line);
                                      gl2:=gp2;
                                      ConcatRight(gp,gl,!gp,!gl);
                                      ConcatLeft(gp,gl,!gp2,!gl2);
                                      gp2:=gp; dd2:=TRUE; -- gl2 is link of eps
                                  endsem
    '}'                           sem (PopPointers) endsem
                                  sem IF firstfact THEN
                                        gp3:=gp2; gl3:=gl2;
                                        gp2:=NewNode(!eps,!0,!line); gl2:=gp2;
                                        ConcatRight(gp2,gl2,!gp3,!gl3);
                                      END;
                                  endsem
  ).
------------------------------------------------------------------------
attr <in:sy,styp,kind; out:sem1,sem2,count> =
  '<'                             sem sem1:=0; sem2:=0 endsem
  ( inattr <in:sy,styp,kind,0; out:sem1,count>
    [ ';' outattr <in:sy,styp,kind,count; out:sem2,count> ]
  | outattr <in:sy,styp,kind,0; out:sem2,count>
  )
  '>'.
------------------------------------------------------------------------
inattr <in:sy,styp,kind,count; out:sem1,count> =
  INSY                            sem IF styp<>nt THEN Error(7); END;
                                      dir:=down;
                                  endsem
  ':'
  ( IDENT <out:spix>              sem (AssignId1) endsem
  | NUMBER <out:n>                sem (AssignNumber) endsem
  )
  { ','
    ( IDENT <out:spix>            sem (AssignId1) endsem
    | NUMBER <out:n>              sem (AssignNumber) endsem
    )
  }                               sem IF kind=use THEN
                                        EmitAction(!line,^sem1)
                                      END;
                                  endsem.
------------------------------------------------------------------------
outattr <in:sy,styp,kind,count; out:sem2,count> =
  OUTSY                           sem dir:=up endsem
  ':'
  IDENT <out:spix>                sem (AssignId2) endsem
  { ','
    IDENT <out:spix>              sem (AssignId2) endsem
  }                               sem IF (kind=use) OR (styp=pr) THEN
                                        EmitAction(!line,^sem2);
                                      END;
                                  endsem.
------------------------------------------------------------------------
semaction <out:sem3> =
  SEMSY                           sem StopHash; firstsymbol:=TRUE endsem
  ( '('                           sem RestartHash endsem
    IDENT <out:spix>              sem GetMacroNr(!spix,^sem3);
                                      IF sem3=0 THEN Error(12); END;
                                  endsem
    ')'
  | { any                         sem IF firstsymbol THEN
                                        firstsymbol:=FALSE;
                                        OpenSem(!line,^sem3);
                                        StartCopy(!col)
                                      END;
                                      Copy(!typ,!col)
                                  endsem
    }                             sem RestartHash; endsem
  )
  ENDSEMSY.
------------------------------------------------------------------------
macrodef =
  SEMSY
  ':'
  IDENT <out:spix>                sem OpenSem(!line,^sem3);
                                      NewMacro(!spix,!sem3,^ok);
                                      IF NOT ok THEN Error(11); END;
                                      StopHash; firstsymbol:=TRUE;
                                  endsem
  ':'
  { any                           sem IF firstsymbol THEN
                                        firstsymbol:=FALSE; StartCopy(col)
                                      END;
                                      Copy(!typ,!col)
                                  endsem
  }
  ENDSEMSY                        sem RestartHash endsem.
------------------------------------------------------------------------
symbol <out:spix> =
  ( IDENT <out:spix> | STRING <out:spix> ).
------------------------------------------------------------------------
aliasname <in:sy> =
  ALIASSY
  symbol <out:spix>               sem GetSy(!sy,^sn); sn.aliasspix:=spix;
                                      RepSy(!sy,!sn);
                                  endsem.

ENDGRAM

Cross-reference list for coco.ATG (line numbers illegible): alias, aliasname, aliasspix, ALIASSY, alts, any, ANYSY, AssignId1, AssignId2, AssignNumber, attr, attributes, Attrtype, CheckAttr, CloseFile, coco, cocogen, cocogra, cocolex, cocosym, col, CompErr, CompleteAt, ConcatLeft, ConcatRight, const, Copy, count, dd, dd1, dd2, ddt, DECLARATIONSY, def, del, dir, dir1, Direction, down, dummy, EmitAction, ENDGRAMSY, ENDSEMSY, eofsy, eps, EPSSY, err, Error, Errors, expr, fact, firstfact, firstsymbol, GenAssign, GetAt, GetMacroNr, GetNode, GetSy, gl, gl1, gl2, gp2, gp3, gpo, GRAMMARSY, gramspix, GraphList, Graphnode, IDENT, inattr, InitCopy, InsertFramePart, INSY, kind, line, link, macrodef, MACROSY, maxstacksize, n, name, NewAt, NewMacro, NewNode, NewSy, nococosy, nodes, nonterm, NONTERMINALSY, nr, nt, null, NUMBER, ok, OpenFile, OpenSem, outattr, OUTSY, Pop, PopPointers, pr, PRAGMASY, Push, PushPointers, RepNode, RepSy, RestartHash, Restriction, root, rootloc, rootsy, rules, RULESSY, sem1, sem2, sem3, semaction, Semant, SEMANTICSTACK, SEMANTICSY, SemErr, SEMSY, sn, sp, spix, spix1, stack, StartCopy, StopHash, StoreSymbol, string, STRING, styp, sy, sy1, symbol, Symbolnode, Symboltype, SyNr, SYSTEM, t, term, TERMINALSY, typ, type, up, Usage, use, VAL, x.
App. F coco.MOD

(* Coco        Compiler compiler Coco                        Moe 27.12.83
   ====
   This is the main module of Coco. It controls the execution of the
   compiler compiler. It
   a) opens and closes the files
   b) initializes the scanner
   c) calls the parser
   d) calls the procedures which collect the symbol sets
   e) calls the grammar test procedures
   f) calls the procedure which generates the compiler
   g) calls the lister to print a listing with error messages

   Implementation restrictions:
   1: cocolex  Hash      Hash table full
   2: cocolex  Hash      Name list full
   3: cocolex  PushInc   Include stack overflow
   4: cocolex  EnQueue   Attribute queue overflow
   5: cocogra  NewNode   Too many nodes in TDG (>600)
   6: cocosym  NewSy     Symbol list overflow (>199)
   7: cocosym  NewSy     Too many terminals (>127)

   Compiler errors:
   1: cocolex  PopInc       Include stack underflow
   2: cocolex  DeQueue      Attribute queue underflow
   3: cocosym  GetAt        Try to get attribute inf. for a terminal
   4: cocogen  OpenFile     Semantic frame not found
   5: cocogen2 GenSynFiles  Parser frame not found
   6: cocogen2 NewAdr       Fixups already resolved

   Trace switches: can be set by "$D letter {letter}" (without spaces)
   A: cocosyn                     Print parser input (remove comments!)
   B: cocosyn                     Trace parser run (remove comments!)
   C: cocogra  DelGraph           Print visited nodes
   D: cocotst  FindCircularRules  Print derivations between single nt's
   E: cocotst  TestIfNtToTerm     Trace flow of algorithm
   F: cocotst  CheckAlternatives  Print visited nodes
   G: cocosym  CollectFirstSet    Print visited nodes
   H: cocosym  GetFirstSet        Print resulting set
   I: cocosym  GetFollowSets      Print resulting sets
   J: cocosym  CollectFollowSets  Print visited nodes
   K: cocosym                     Print sets of term.starts and succ.s
   L: cocosem                     Print generated TDG
*)

MODULE Coco;

FROM cocogen  IMPORT filesopen, CloseFile;
FROM cocogen2 IMPORT GenSynFiles, PutStatistics;
FROM cocogra  IMPORT DeleteRedundantEps, NewEpsBeforeDelNts;
FROM cocolex  IMPORT ddt, src;
FROM cocolst  IMPORT lst, PrintListing;
FROM cocosym  IMPORT FindDelSymbols, GetSymbolSets;
FROM cocosyn  IMPORT Parse, printinput, printnodes;
FROM cocotst  IMPORT FindCircularRules, LL1Test, TestCompleteness,
                     TestIfAllNtReached, TestIfNtToTerm;
FROM Errors   IMPORT GetNumberOfErrors;
FROM FileIO   IMPORT con, File, Done, Open, Close, Read, WriteInt,
                     WriteLn, WriteString;
FROM System   IMPORT Terminate, normal;

VAR
  ch:        CHAR;
  correct:   BOOLEAN;
  ll1:       BOOLEAN;               (*TRUE if grammar is LL(1)*)
  lstn:      ARRAY[0..63] OF CHAR;  (*list file name*)
  ok:        BOOLEAN;
  semerrors: CARDINAL;
  synerrors: CARDINAL;

(* ChangeExtension           Change extension of file name
------------------------------------------------------------------------*)
PROCEDURE ChangeExtension(VAR old,new:ARRAY OF CHAR; ext:ARRAY OF CHAR);
VAR i,j: INTEGER;
BEGIN
  i:=0;
  WHILE (i<=HIGH(old)) AND (old[i]<>0C) DO i:=i+1; END;
  WHILE (i>=0) AND (old[i]<=" ") DO DEC(i) END;
  j:=i;
  WHILE (j>=0) AND (old[j]<>".") DO DEC(j) END;
  IF j>=0 THEN i:=j-1; END;
  FOR j:=0 TO i DO new[j]:=old[j]; END;
  new[i+1]:="."; new[i+2]:=ext[0]; new[i+3]:=ext[1];
  new[i+4]:=ext[2]; new[i+5]:=0C;
END ChangeExtension;

BEGIN
  WriteString(con,"Coco - Compiler Compiler Vs 4.1$");
  Open(src,0,"",FALSE);
  IF NOT Done THEN Terminate(normal) END; (*cancel*)
  ChangeExtension(src^.name,lstn,"LST");
  Open(lst,src^.volRef,lstn,TRUE);
  WriteString(lst,"Coco - Compiler Compiler Vs 4.1 ");
  WriteString(lst,"(Source file: "); WriteString(lst,src^.name);
  WriteString(lst,")$$");

  WriteString(con,"parsing");
  Parse(correct);                          (*parse input grammar*)
  GetNumberOfErrors(synerrors,semerrors);  (*check for errors*)
  IF synerrors+semerrors<>0 THEN
    IF filesopen THEN CloseFile END;
    WriteString(con,"$listing"); PrintListing;
    WriteString(con,"$Compilation terminated. ");
    WriteInt(con,synerrors+semerrors,0);
    WriteString(con," errors detected. Press any key.$");
    Close(src); Close(lst); Read(con,ch);
    Terminate(normal);
  END;

  WriteString(con,"$evaluating$");
  FindDelSymbols;
  NewEpsBeforeDelNts;
  DeleteRedundantEps;
  GetSymbolSets;
  TestCompleteness(ok);
  IF ok THEN TestIfAllNtReached(ok); END;
  IF ok THEN FindCircularRules(ok); END;
  IF ok THEN TestIfNtToTerm(ok); END;
  IF ok THEN LL1Test(ll1); END;
  IF NOT ok OR NOT ll1 THEN
    WriteString(con,"listing$");
    WriteLn(lst); WriteLn(lst); PrintListing;
  END;

  IF ok THEN
    WriteString(con,"writing$");
    GenSynFiles;
    PutStatistics;
  END;

  IF NOT ok THEN
    WriteString(con,"Compilation ended with errors in grammar tests.");
  ELSIF NOT ll1 THEN
    WriteString(con,"Compilation ended with LL(1) errors.");
  ELSE
    WriteString(con,"Compilation completed. No errors detected.");
  END;
  Close(src); Close(lst);
  WriteString(con," Press any key.$"); Read(con,ch);
END Coco.

Cross-reference list for coco.MOD (line numbers illegible): ch, ChangeExtension, Close, CloseFile, Coco, cocogen, cocogen2, cocogra, cocolex, cocolst, cocosym, cocosyn, cocotst, con, correct, ddt, DeleteRedundantEps, Done, Errors, ext, File, FileIO, filesopen, FindCircularRules, FindDelSymbols, GenSynFiles, GetNumberOfErrors, GetSymbolSets, HIGH, LL1Test, lst, lstn, name, new, NewEpsBeforeDelNts, normal, ok, old, Open, Parse, printinput, PrintListing, printnodes, PutStatistics, Read, semerrors, src, synerrors, System, Terminate, TestCompleteness, TestIfAllNtReached, TestIfNtToTerm, volRef, WriteInt, WriteLn, WriteString.
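The main module derives the name of the list file from the source file name by replacing its extension (ChangeExtension with "LST"). The following stand-alone program is a minimal sketch of that idea, not the authors' code: the names ExtDemo, FileName and SetExtension are hypothetical, and it assumes the conventional InOut library module instead of the FileIO module used in the listings. Unlike ChangeExtension, it also guards the array bounds of the result.

```modula2
MODULE ExtDemo;
(* Illustrative sketch only; not part of Coco. *)
FROM InOut IMPORT WriteString, WriteLn;

TYPE FileName = ARRAY [0..63] OF CHAR;
VAR  lstn: FileName;

(* Copies old to new, replacing the part behind the last "." by ext.
   If old contains no ".", "." and ext are appended. *)
PROCEDURE SetExtension(old, ext: ARRAY OF CHAR; VAR new: ARRAY OF CHAR);
  VAR len, dot, i, k: CARDINAL; hasDot: BOOLEAN;
BEGIN
  (* length of old up to the terminating 0C *)
  len := 0;
  WHILE (len <= HIGH(old)) AND (old[len] <> 0C) DO INC(len) END;
  (* ignore trailing blanks *)
  WHILE (len > 0) AND (old[len-1] <= " ") DO DEC(len) END;
  (* search the last "." from the right *)
  hasDot := FALSE; dot := 0; i := len;
  WHILE (i > 0) AND NOT hasDot DO
    DEC(i);
    IF old[i] = "." THEN hasDot := TRUE; dot := i END
  END;
  IF hasDot THEN len := dot END;
  (* copy the stem, then "." and the new extension *)
  k := 0;
  WHILE (k < len) AND (k < HIGH(new)) DO new[k] := old[k]; INC(k) END;
  IF k < HIGH(new) THEN new[k] := "."; INC(k) END;
  i := 0;
  WHILE (i <= HIGH(ext)) AND (ext[i] <> 0C) AND (k < HIGH(new)) DO
    new[k] := ext[i]; INC(k); INC(i)
  END;
  new[k] := 0C
END SetExtension;

BEGIN
  SetExtension("COCO.ATG", "LST", lstn);
  WriteString(lstn); WriteLn;        (* COCO.LST *)
  SetExtension("EXAMPLE", "LST", lstn);
  WriteString(lstn); WriteLn         (* EXAMPLE.LST *)
END ExtDemo.
```

The sketch takes the old name and the extension as value parameters so that string constants can be passed directly; ChangeExtension in coco.MOD is called with the file record field src^.name and the variable lstn instead.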
App. F cocogen.DEF

(* cocogen     Generator of compiler files                   Moe 28.12.83
   =======
   This module generates the semantic evaluator. It
   a) copies symbols from the input grammar to the evaluator
   b) copies text from the semantic frame to the evaluator
   c) stores attribute assignments (and emits them as semantic actions)
*)
DEFINITION MODULE cocogen;

FROM FileIO IMPORT File;

TYPE
  Attrtype = (term,nonterm,const);

VAR
  maxsem:    CARDINAL;  (*number of last semantic action*)
  filesopen: BOOLEAN;   (*files may remain open after a syntax error*)

PROCEDURE CloseFile;
(* Closes the file where the semantic evaluator is written to*)

PROCEDURE Copy(typ,col:CARDINAL);
(* Copies the source symbol typ at column col to the generated
   semantic file*)

PROCEDURE CopyFramePart(VAR f1,f2:File; s:ARRAY OF CHAR);
(* Copies file f1 to file f2 until string s occurs. s is not copied*)

PROCEDURE EmitAction(line:CARDINAL; VAR sem:CARDINAL);
(* Emits the stored attribute assignments as a semantic action. line
   is used to print a comment. sem is the number of the new action*)

PROCEDURE GenAssign(typ:Attrtype; left,right:CARDINAL);
(* Generates an assignment arg(left)<--arg(right). typ indicates if
   arg(right) is a terminal attribute, a nonterminal attribute or a
   constant*)

PROCEDURE InsertFramePart;
(* Inserts the middle part in the generated semantics file*)

PROCEDURE OpenFile(spix:CARDINAL);
(* Opens the file where the semantic evaluator is written to. spix is
   the grammar name in Cocol. The name of the generated file is the
   grammar name with the suffix "sem"*)

PROCEDURE OpenSem(line:CARDINAL; VAR sem:CARDINAL);
(* Prints the start of a new semantic action (case-number of a new
   case-block). line is used to print a comment. sem is the number of
   the new action*)

PROCEDURE StartCopy(col:CARDINAL);
(* Saves col as the leftmost column in the following semantic action*)

END cocogen.
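CopyFramePart is the building block for filling in the semantic frame: text is copied up to a marker such as "-->modulename", the marker itself is skipped, and the generator then writes its own text in its place. The following stand-alone program is a minimal sketch of that contract, under loud assumptions: it works on character arrays rather than FileIO files, it uses a simple look-ahead comparison instead of the character-by-character matching of the original procedure, it assumes the conventional InOut library module, and the names FrameCopyDemo, Text, Length, MatchAt and CopyPart are hypothetical.

```modula2
MODULE FrameCopyDemo;
(* Illustrative sketch only; not part of Coco. *)
FROM InOut IMPORT WriteString, WriteLn;

TYPE Text = ARRAY [0..79] OF CHAR;
VAR  frame, out: Text;
     pos: CARDINAL;

(* Number of characters in s in front of the terminating 0C *)
PROCEDURE Length(VAR s: ARRAY OF CHAR): CARDINAL;
  VAR n: CARDINAL;
BEGIN
  n := 0;
  WHILE (n <= HIGH(s)) AND (s[n] <> 0C) DO INC(n) END;
  RETURN n
END Length;

(* TRUE if marker m occurs in src starting at index p *)
PROCEDURE MatchAt(VAR src: ARRAY OF CHAR; p: CARDINAL;
                  m: ARRAY OF CHAR): BOOLEAN;
  VAR k, lm, ls: CARDINAL;
BEGIN
  lm := Length(m); ls := Length(src);
  IF p + lm > ls THEN RETURN FALSE END;
  k := 0;
  WHILE (k < lm) AND (src[p+k] = m[k]) DO INC(k) END;
  RETURN k = lm
END MatchAt;

(* Copies src[p..] to dst until marker m occurs. m is not copied.
   On return p is the index behind the marker (or the end of src). *)
PROCEDURE CopyPart(VAR src: ARRAY OF CHAR; VAR p: CARDINAL;
                   VAR dst: ARRAY OF CHAR; m: ARRAY OF CHAR);
  VAR d, ls: CARDINAL;
BEGIN
  d := 0; ls := Length(src);
  WHILE (p < ls) AND NOT MatchAt(src, p, m) DO
    IF d < HIGH(dst) THEN dst[d] := src[p]; INC(d) END;
    INC(p)
  END;
  IF p < ls THEN p := p + Length(m) END;  (* skip the marker *)
  dst[d] := 0C
END CopyPart;

BEGIN
  frame := "MODULE -->modulename; BEGIN END.";
  pos := 0;
  CopyPart(frame, pos, out, "-->modulename");
  WriteString(out);            (* text in front of the marker *)
  WriteString("Demosem");      (* substitution for the marker *)
  CopyPart(frame, pos, out, "$$$");  (* marker absent: copy the rest *)
  WriteString(out); WriteLn    (* line reads: MODULE Demosem; BEGIN END. *)
END FrameCopyDemo.
```

The second call uses a marker that does not occur, which copies the remainder of the frame; CloseFile in cocogen.MOD uses "$$$" in the same way to flush the rest of the frame file.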
App. F cocogen.MOD

(* cocogen     Generation of semantic evaluator              Moe 30.12.83
   =======
   This module generates the semantic evaluator. It
   a) copies symbols from the input grammar to the evaluator
   b) copies text from the semantic frame to the evaluator
   c) stores attribute assignments (and emits them as semantic actions)
------------------------------------------------------------------------*)
IMPLEMENTATION MODULE cocogen;

FROM cocolex IMPORT at, line, col, src, GetName;
FROM Errors  IMPORT CompErr, SemErr;
FROM FileIO  IMPORT con, File, Done, Open, Close, Read, Write, WriteCard,
                    WriteLn, WriteString, WriteText;
FROM System  IMPORT Allocate, Deallocate;

CONST
  blanks   = "                                        ";
  ident    = 17;  (*symbol numbers*)
  string   = 18;
  number   = 19;
  lparsy   = 23;
  commasy  = 33;
  eolsy    = 99;

TYPE
  Actionptr     = POINTER TO Action;
  Assignmentptr = POINTER TO Assignment;
  Action = RECORD                    (*information about attr.eval. action*)
    sem:      CARDINAL;              (*action number*)
    firstass: Assignmentptr;         (*to first assignment*)
    next:     Actionptr;             (*to next action*)
  END;
  Assignment = RECORD                (*information about an attr. assignment*)
    typ:   Attrtype;                 (*term, nonterm, const*)
    left:  CARDINAL;                 (*spix of left-hand side*)
    right: CARDINAL;                 (*spix or val of right-hand side*)
    next:  Assignmentptr;            (*to next assignment*)
  END;
  Name = ARRAY[1..80] OF CHAR;

VAR
  firstact: Actionptr;               (*first generated action*)
  firstass: Assignmentptr;           (*first stored assignment*)
  fram:     File;                    (*file with frame of sem.Analyzer*)
  gram:     Name;                    (*grammar name*)
  graml:    CARDINAL;                (*length of grammar name*)
  lastact:  Actionptr;               (*last generated action*)
  lastass:  Assignmentptr;           (*last stored assignment*)
  lastcol:  CARDINAL;                (*column of last symbol*)
  lasttyp:  CARDINAL;                (*type of last symbol*)
  leftcol:  CARDINAL;                (*leftmost column in semantic action*)
  margin:   CARDINAL;                (*indent from left margin*)
  op:       ARRAY[0..commasy] OF CHAR; (*operator table*)
  sem:      File;                    (*file containing sem.evaluator*)
  semname:  Name;                    (*file name of sem.evaluator*)

PROCEDURE EmitAssign(p:Assignmentptr); FORWARD;

(* CloseFile           Close file containing the semantic evaluator
------------------------------------------------------------------------*)
PROCEDURE CloseFile;
BEGIN
  CopyFramePart(fram,sem,"-->modulename");
  WriteText(sem,gram,graml); WriteString(sem,"sem");
  CopyFramePart(fram,sem,"$$$");
  Close(fram); Close(sem);
  filesopen:=FALSE;
END CloseFile;

(* Copy                Copy source symbol to semantic evaluator
------------------------------------------------------------------------*)
PROCEDURE Copy(typ,col:CARDINAL);
VAR
  ch:   CHAR;
  l,i:  CARDINAL;
  name: Name;
BEGIN
  IF col<=lastcol THEN  (*new line*)
    WriteLn(sem); WriteText(sem,blanks,margin);
    IF col>leftcol THEN
      WriteText(sem,blanks,col-leftcol);
    END;
    lasttyp:=eolsy;
  END;
  IF (typ<=number) AND (lasttyp<=number) THEN
    Write(sem," ");
  END;
  CASE typ OF
  |  1: WriteString(sem,"alias");
  |  2: WriteString(sem,"any");
  |  3: WriteString(sem,"DECLARATIONS");
  |  4: WriteString(sem,"ENDGRAM");
  |  5: WriteString(sem,"endsem");
  |  6: WriteString(sem,"eps");
  |  7: WriteString(sem,"GRAMMAR");
  |  8: WriteString(sem,"IN");
  |  9: WriteString(sem,"MACROS");
  | 10: WriteString(sem,"NONTERMINALS");
  | 11: WriteString(sem,"out");
  | 12: WriteString(sem,"PRAGMAS");
  | 13: WriteString(sem,"RULES");
  | 14: WriteString(sem,"sem");
  | 15: WriteString(sem,"SEMANTICS");
  | 16: WriteString(sem,"TERMINALS");
  | 17,18: (*ident, string*)
        GetName(at[1],name,l); WriteText(sem,name,l);
  | 19: WriteCard(sem,at[1],0);
  | 20..33: (*operators*)
        Write(sem,op[typ]);
  | 34: ch:=CHR(at[1]);
        IF (ch="!") OR ((ch="^") AND (lasttyp<>ident))
        THEN;
        ELSE Write(sem,ch);
        END;
  END; (*CASE*)
  lasttyp:=typ; lastcol:=col;
END Copy;

(* CopyFramePart       Copies file f1 to file f2 until string s
------------------------------------------------------------------------*)
PROCEDURE CopyFramePart(VAR f1,f2:File; s:ARRAY OF CHAR);
VAR
  ch,startch: CHAR;
  i: INTEGER;
  t: ARRAY[0..50] OF CHAR;
BEGIN
  startch:=s[0];
  Read(f1,ch);
  WHILE NOT f1^.eof DO
    IF ch=startch THEN  (*check if s occurs*)
      i:=0;
      WHILE (i<HIGH(s)) AND (ch=s[i]) AND NOT f1^.eof DO
        t[i]:=ch; INC(i); Read(f1,ch);
      END;
      IF ch=s[i] THEN RETURN; END;  (*found - exit*)
      WriteText(f2,t,i);            (*not found - continue*)
      Write(f2,ch);
    ELSE
      Write(f2,ch);                 (*normal character - write it*)
    END;
    Read(f1,ch);
  END; (*WHILE*)
END CopyFramePart;

(* EmitAction          Emit stored semantic action
------------------------------------------------------------------------*)
PROCEDURE EmitAction(line:CARDINAL; VAR sem:CARDINAL);
VAR
  act,p: Actionptr;
  q:     Assignmentptr;

  PROCEDURE EqualAct(p1,p2: Assignmentptr): BOOLEAN;
  BEGIN
    WHILE (p1<>NIL) AND (p2<>NIL) AND (p1^.typ=p2^.typ) AND
          (p1^.left=p2^.left) AND (p1^.right=p2^.right) DO
      p1:=p1^.next; p2:=p2^.next;
    END;
    RETURN (p1=NIL) AND (p2=NIL);
  END EqualAct;

BEGIN
  IF firstass=NIL THEN sem:=0;
  ELSE
    p:=firstact;
    WHILE (p<>NIL) AND NOT EqualAct(p^.firstass,firstass) DO
      p:=p^.next;
    END;
    IF p=NIL THEN  (*new action*)
      OpenSem(line,sem);
      EmitAssign(firstass);
      Allocate(act,SIZE(Action));
      act^.sem:=sem; act^.firstass:=firstass; act^.next:=NIL;
      IF firstact=NIL THEN firstact:=act
      ELSE lastact^.next:=act
      END;
      lastact:=act;
    ELSE  (*same action found; delete recently stored assignments*)
      sem:=p^.sem;
      WHILE firstass<>NIL DO
        q:=firstass; firstass:=firstass^.next; Deallocate(q);
      END;
    END;
    firstass:=NIL;
  END;
END EmitAction;

(* EmitAssign          Write attribute assignment
------------------------------------------------------------------------*)
PROCEDURE EmitAssign(p:Assignmentptr);
VAR
  l:    CARDINAL;
  name: Name;
BEGIN
  WHILE p<>NIL DO
    WriteLn(sem); WriteText(sem,blanks,margin);
    GetName(p^.left,name,l);
    CASE p^.typ OF
      term:
        WriteString(sem,"ASSIGN("); WriteText(sem,name,l);
        WriteString(sem,",at["); WriteCard(sem,p^.right,0);
        WriteString(sem,"]);");
    | nonterm:
        WriteText(sem,name,l); WriteString(sem,":=");
        GetName(p^.right,name,l); WriteText(sem,name,l);
        Write(sem,";");
    | const:
        WriteText(sem,name,l); WriteString(sem,":=");
        WriteCard(sem,p^.right,0); Write(sem,";");
    END; (*CASE*)
    p:=p^.next;
  END; (*WHILE*)
END EmitAssign;

(* GenAssign           Store attribute assignment
------------------------------------------------------------------------*)
PROCEDURE GenAssign(t:Attrtype; l,r:CARDINAL);
VAR ass: Assignmentptr;
BEGIN
  IF (t=nonterm) AND (l=r) THEN RETURN; END;
  Allocate(ass,SIZE(Assignment));
  WITH ass^ DO typ:=t; left:=l; right:=r; next:=NIL; END;
  IF firstass=NIL THEN firstass:=ass;
  ELSE lastass^.next:=ass;
  END;
  lastass:=ass;
END GenAssign;

(* InsertFramePart     Insert middle part of semantic evaluator
------------------------------------------------------------------------*)
PROCEDURE InsertFramePart;
BEGIN
  CopyFramePart(fram,sem,"-->actions");
  margin:=9;
END InsertFramePart;

(* OpenFile            Open file for semantic evaluator
------------------------------------------------------------------------*)
PROCEDURE OpenFile(spix:CARDINAL);
VAR l,i: CARDINAL;
BEGIN
  GetName(spix,gram,l); graml:=l;
  FOR i:=1 TO graml DO semname[i]:=gram[i]; END;
  semname[l+1]:="s"; semname[l+2]:="e"; semname[l+3]:="m";
  semname[l+4]:="."; semname[l+5]:="D"; semname[l+6]:="E";
  semname[l+7]:="F"; semname[l+8]:=0C;
  Open(sem,src^.volRef,semname,TRUE);  (*definition module*)
  Open(fram,src^.volRef,"cocosemframe",FALSE);
  IF NOT Done THEN
    SemErr(25,line,col);
    WriteString(con,"The file 'cocosemframe' must be in the same ");
    WriteString(con,"subdirectory as the input grammar.$Aborted.$");
    CompErr(4)
  END;
  CopyFramePart(fram,sem,"-->modulename");
  WriteText(sem,gram,graml); WriteString(sem,"sem");
  CopyFramePart(fram,sem,"-->modulename");
  WriteText(sem,gram,graml); WriteString(sem,"sem");
  CopyFramePart(fram,sem,"-->implementation");
  Close(sem);
  semname[l+5]:="M"; semname[l+6]:="O"; semname[l+7]:="D";
  Open(sem,src^.volRef,semname,TRUE);  (*implementation module*)
  CopyFramePart(fram,sem,"-->modulename");
  WriteText(sem,gram,graml); WriteString(sem,"sem");
  CopyFramePart(fram,sem,"-->scannername");
  WriteText(sem,gram,graml); WriteString(sem,"lex");
  CopyFramePart(fram,sem,"-->declarations");
  filesopen:=TRUE;
END OpenFile;

(* OpenSem             Write start of new semantic action
------------------------------------------------------------------------*)
PROCEDURE OpenSem(line:CARDINAL; VAR nr:CARDINAL);
BEGIN
  INC(maxsem); nr:=maxsem;
  WriteString(sem,"$ | "); WriteCard(sem,maxsem,3);
  WriteString(sem,": (*line "); WriteCard(sem,line,0);
  WriteString(sem,"*)");
END OpenSem;

(* StartCopy           Set leftmost column in semantic action
------------------------------------------------------------------------*)
PROCEDURE StartCopy(col:CARDINAL);
BEGIN
  leftcol:=col; lasttyp:=eolsy; lastcol:=99;
END StartCopy;

BEGIN (*cocogen*)
     (*012345678901234567890*)
  op:="                    =.|()[]{}<>;:,";  (*"=" must start at pos. 20*)
  maxsem:=11; margin:=0;
  firstact:=NIL; firstass:=NIL; filesopen:=FALSE;
END cocogen.

Cross-reference list for cocogen.MOD (line numbers illegible): act, Action, Actionptr, Allocate, ass, Assignment, Assignmentptr, at, Attrtype, blanks, ch, Close, CloseFile, cocogen, cocolex, col, commasy, CompErr, con, const, Copy, CopyFramePart, Deallocate, Done, EmitAction, EmitAssign, eof, eolsy, EqualAct, Errors, f1, f2, File, FileIO, filesopen, firstact, firstass, FORWARD, fram, GenAssign, GetName, gram, graml, HIGH, ident, InsertFramePart, l, lastact, lastass, lastcol, lasttyp, left, leftcol, line, lparsy, margin, maxsem, name, Name, next, nonterm, nr, number, op, Open, OpenFile, OpenSem, SemErr, semname, spix, src, startch, StartCopy, string, System, term, typ, volRef, Write, WriteCard, WriteLn, WriteString, WriteText.
(* cocogen2: Generator for syntax files                       Moe 1.2.84

   This module generates the parser. It
   a) translates the top-down graph into G-code
   b) copies text from the parser frame, inserting the declarations of
      the table sizes
   c) writes the parser tables
   d) prints statistical information about the compilation
*)
DEFINITION MODULE cocogen2;

PROCEDURE GenSynFiles;
(* Generates the parser and the parser tables*)

PROCEDURE PutStatistics;
(* Writes statistics about the compilation to the list file*)

END cocogen2.
(* cocogen2: Generator for syntax files                       Moe 1.2.84

   This module generates the parser. It
   a) translates the top-down graph into G-code
   b) copies text from the parser frame, inserting the declarations of
      the table sizes
   c) writes the parser tables
   d) prints statistical information about the compilation
*)
IMPLEMENTATION MODULE cocogen2;

FROM cocogen IMPORT maxsem, CopyFramePart;
FROM cocogra IMPORT alts, maxn, rootloc, rules, GetNode, Graphnode;
FROM cocolex IMPORT line, col, GetName;
FROM cocolst IMPORT lst;
FROM cocosym IMPORT gramspix, maxany, maxeps, maxt, maxp, maxs,
                    GetA, GetE, GetF, GetSy, RepSy, Symbolnode, Symbolset,
                    Symboltype;
FROM Errors  IMPORT CompErr, SemErr;
FROM FileIO  IMPORT con, File, Done, Open, Close, Write, WriteCard,
                    WriteString, WriteText, WriteLn;
FROM System  IMPORT Allocate, Deallocate;
FROM SYSTEM  IMPORT VAL;

CONST (*for G-code*)
  lmaxc = 3000;                        (*G-code length*)

TYPE
  Filename = ARRAY[1..30] OF CHAR;
  Instruction=(tc,tac,ntc,ntac,ntsc,ntasc,anyc,anyac,epsc,epsac,
               jmpc,retc);

VAR
  code:    ARRAY[1..lmaxc] OF [0..255]; (*G-code area*)
  pc:      CARDINAL;                    (*index in code*)
  maxname: CARDINAL;                    (*length of name list*)
  first:   BOOLEAN;                     (*used for printing of tables*)
  ic:      CARDINAL;                    (*initialization counter*)
  c:       RECORD
             CASE :BOOLEAN OF
               TRUE:  ch: ARRAY[1..2] OF CHAR;
             | FALSE: card: CARDINAL;
             END;
           END;

PROCEDURE OutByte(VAR f:File; ch:CHAR); FORWARD;
PROCEDURE OutWord(VAR f:File; n:CARDINAL); FORWARD;
PROCEDURE PrintTables(VAR f:File); FORWARD;
PROCEDURE WriteConstDecl(VAR f:File;t:ARRAY OF CHAR;n:CARDINAL); FORWARD;


(* G-code labels
   ===============================================================*)
MODULE LABMOD;

IMPORT code, CompErr, Allocate, Deallocate;

EXPORT GetAdr, labact, NewAdr, Visited;

TYPE
  Fixupptr = POINTER TO Fixup;
  Fixup    = RECORD
               adr:  CARDINAL;  (*G-code address*)
               next: Fixupptr;  (*to next fixup*)
             END;
  Labeladr = RECORD
               loc,adr: CARDINAL; (*node address and corresponding
                                    G-code address*)
               fix: Fixupptr;     (*to first fixup*)
             END;
VAR
  lab:    ARRAY[1..70] OF Labeladr;
  labact: CARDINAL;


PROCEDURE GetAdr(loc,fixup:CARDINAL; VAR adr:CARDINAL);
VAR
  i:  CARDINAL;
  fp: Fixupptr;
BEGIN
  i:=1;
  WHILE (i<=labact) AND (lab[i].loc<>loc) DO INC(i); END;
  IF i>labact THEN (*new label*)
    INC(labact); lab[i].loc:=loc; lab[i].adr:=0;
    Allocate(fp,SIZE(Fixup)); fp^.adr:=fixup; fp^.next:=NIL;
    lab[i].fix:=fp;
  ELSE (*old label*)
    IF lab[i].adr=0 THEN (*not yet resolved*)
      Allocate(fp,SIZE(Fixup)); fp^.adr:=fixup; fp^.next:=lab[i].fix;
      lab[i].fix:=fp;
    END;
  END;
  adr:=lab[i].adr;
END GetAdr;


PROCEDURE NewAdr(loc,adr:CARDINAL);
VAR
  i:   CARDINAL;
  p,q: Fixupptr;
BEGIN
  i:=1;
  WHILE (i<=labact) AND (lab[i].loc<>loc) DO INC(i); END;
  IF i>labact THEN (*new label*)
    INC(labact); lab[i].loc:=loc; lab[i].adr:=adr; lab[i].fix:=NIL;
  ELSE (*old label*)
    IF lab[i].adr=0 THEN (*resolve fixups*)
      p:=lab[i].fix;
      WHILE p<>NIL DO
        code[p^.adr]:=adr DIV 256; code[p^.adr+1]:=adr MOD 256;
        q:=p; p:=p^.next; Deallocate(q);
      END;
      lab[i].adr:=adr; lab[i].fix:=NIL;
    ELSE (*fixups already resolved*)
      CompErr(6);
    END;
  END;
END NewAdr;


PROCEDURE Visited(loc:CARDINAL): BOOLEAN;
VAR i: CARDINAL;
BEGIN
  i:=1;
  WHILE (i<=labact) AND (lab[i].loc<>loc) DO INC(i); END;
  RETURN (i<=labact) AND (lab[i].adr>0);
END Visited;

BEGIN (*LABMOD*)
  labact:=0;
END LABMOD;


(* Emit            Emit G-code byte
   ---------------------------------------------------------------*)
PROCEDURE Emit(byte:CARDINAL);
BEGIN code[pc]:=byte; INC(pc); END Emit;


(* Emit2           Emit G-code word
   ---------------------------------------------------------------*)
PROCEDURE Emit2(word:CARDINAL);
BEGIN
  code[pc]:=word DIV 256; code[pc+1]:=word MOD 256;
  INC(pc,2);
END Emit2;


(* GenCode         Generate G-code for TDG in loc
   ---------------------------------------------------------------*)
PROCEDURE GenCode(loc:CARDINAL);
VAR
  adr: CARDINAL;
  gn:  Graphnode;
BEGIN
  IF Visited(loc) THEN RETURN; END;
  NewAdr(loc,pc); (*now coming to address loc*)
  GetNode(loc,gn);
  WITH gn DO
    CASE typ OF
      t:   IF lp=0
             THEN Emit(ORD(tc)); Emit(sp);
             ELSE GetAdr(lp,pc+2,adr);
                  Emit(ORD(tac)); Emit(sp); Emit2(adr);
           END;
    | nt:  IF lp=0 THEN
             IF sem1=0
               THEN Emit(ORD(ntc)); Emit(sp);
               ELSE Emit(ORD(ntsc)); Emit(sp); Emit(sem1);
             END;
           ELSE
             GetAdr(lp,pc+2,adr);
             IF sem1=0
               THEN Emit(ORD(ntac)); Emit(sp); Emit2(adr);
               ELSE Emit(ORD(ntasc)); Emit(sp); Emit2(adr); Emit(sem1);
             END;
           END;
    | any: IF lp=0
             THEN Emit(ORD(anyc));
             ELSE
               GetAdr(lp,pc+2,adr);
               Emit(ORD(anyac)); Emit(sp); Emit2(adr);
           END;
    | eps: IF sp<>0 THEN
             IF lp=0
               THEN Emit(ORD(epsc)); Emit(sp);
               ELSE
                 GetAdr(lp,pc+2,adr);
                 Emit(ORD(epsac)); Emit(sp); Emit2(adr);
             END;
           END;
    END; (*CASE*)
    IF sem2<>0 THEN Emit(sem2); END;
    IF sem3<>0 THEN Emit(sem3); END;
    IF rp=0 THEN Emit(ORD(retc));
    ELSIF Visited(rp) THEN
      GetAdr(rp,pc+1,adr); Emit(ORD(jmpc)); Emit2(adr);
    END;
    IF rp>0 THEN GenCode(rp); END;
    IF lp>0 THEN GenCode(lp); END;
  END; (*WITH*)
END GenCode;


(* GenSynFiles     Generates files for syntax analysis
   ---------------------------------------------------------------*)
PROCEDURE GenSynFiles;
VAR
  fn:       Filename;
  fram:     File;       (*file with parser frame*)
  graml:    CARDINAL;   (*length of grammar name*)
  gramname: Filename;   (*grammar name*)
  i,l:      CARDINAL;
  name:     ARRAY[1..50] OF CHAR;
  startpc:  CARDINAL;
  sn:       Symbolnode;
  syn:      File;       (*file for generated parser*)
BEGIN
  pc:=1;
  FOR i:=maxp+1 TO maxs DO
    labact:=0; startpc:=pc;
    GetSy(i,sn);
    GenCode(sn.start);
    sn.start:=startpc;
    RepSy(i,sn);
  END;
  startpc:=pc; GenCode(rootloc);

  maxname:=4; (*"EOF"+0C*)
  FOR i:=1 TO maxs DO
    GetSy(i,sn); GetName(sn.aliasspix,name,l);
    sn.spix:=maxname+1; RepSy(i,sn);
    INC(maxname,l+1);
    (*sn.spix becomes a pointer in the generated name list*)
  END;
  GetName(gramspix,gramname,graml);
  (*------------------------------------------- generate parser*)
  FOR i:=1 TO graml DO fn[i]:=gramname[i]; END;
  fn[graml+1]:="s"; fn[graml+2]:="y"; fn[graml+3]:="n"; fn[graml+4]:=".";
  fn[graml+5]:="D"; fn[graml+6]:="E"; fn[graml+7]:="F"; fn[graml+8]:=0C;
  Open(syn,lst^.volRef,fn,TRUE);
  Open(fram,lst^.volRef,"cocosynframe",FALSE);
  IF NOT Done THEN
    WriteString(con,"The file 'cocosynframe' must be in the same ");
    WriteString(con,"subdirectory as the input grammar.$");
    SemErr(21,line,col); CompErr(5);
  END;
  CopyFramePart(fram,syn,"-->modulename");        (*definition module*)
  WriteText(syn,gramname,graml); WriteString(syn,"syn");
  CopyFramePart(fram,syn,"-->modulename");
  WriteText(syn,gramname,graml); WriteString(syn,"syn");
  CopyFramePart(fram,syn,"-->implementation");
  Close(syn);
  fn[graml+5]:="M"; fn[graml+6]:="O"; fn[graml+7]:="D";
  Open(syn,lst^.volRef,fn,TRUE);

  CopyFramePart(fram,syn,"-->modulename");        (*module name*)
  WriteText(syn,gramname,graml); WriteString(syn,"syn");

  CopyFramePart(fram,syn,"-->semantic analyzer"); (*various imports*)
  WriteText(syn,gramname,graml); WriteString(syn,"sem");
  CopyFramePart(fram,syn,"-->input module");
  WriteText(syn,gramname,graml); WriteString(syn,"lex");

  CopyFramePart(fram,syn,"-->declarations");      (*semantic declarations*)
  WriteString(syn,"CONST$");
  WriteConstDecl(syn,"  maxname  =",maxname);
  WriteConstDecl(syn,"  maxnamep =",maxs);
  WriteConstDecl(syn,"  maxcode  =",pc-1);
  IF maxany=0
    THEN WriteConstDecl(syn,"  maxany   =",1);
    ELSE WriteConstDecl(syn,"  maxany   =",maxany);
  END;
  IF maxeps=0
    THEN WriteConstDecl(syn,"  maxeps   =",1);
    ELSE WriteConstDecl(syn,"  maxeps   =",maxeps);
  END;
  WriteConstDecl(syn,"  maxt     =",maxt);
  WriteConstDecl(syn,"  maxp     =",maxp);
  WriteConstDecl(syn,"  maxs     =",maxs);
  WriteConstDecl(syn,"  startpc  =",startpc);
  WriteString(syn,"$ ");

  CopyFramePart(fram,syn,"-->tables");
  PrintTables(syn);
  CopyFramePart(fram,syn,"-->modulename");        (*module name*)
  WriteText(syn,gramname,graml); WriteString(syn,"syn");
  CopyFramePart(fram,syn,"$$$");
  Close(fram); Close(syn);
END GenSynFiles;


(* OutByte         Write a byte value to tables file
   ---------------------------------------------------------------*)
PROCEDURE OutByte(VAR f:File; ch:CHAR);
BEGIN
  IF first THEN c.ch[1]:=ch;
  ELSE c.ch[2]:=ch; OutWord(f,c.card);
  END;
  first:=NOT first;
END OutByte;


(* OutWord         Write a word to tables file
   ---------------------------------------------------------------*)
PROCEDURE OutWord(VAR f:File; n:CARDINAL);
BEGIN
  IF ic=10 THEN WriteString(f,"$ "); ic:=0 END;
  WriteCard(f,n,5); Write(f,","); INC(ic);
END OutWord;


(* PrintTables     Write out an initialization of the grammar tables
   ---------------------------------------------------------------*)
PROCEDURE PrintTables(VAR f:File);
VAR
  i,j,l: CARDINAL;
  name:  ARRAY[1..50] OF CHAR;
  s:     Symbolset;
  sn:    Symbolnode;
BEGIN
  first:=TRUE;
  WriteString(f,"  INLINE($ "); ic:=0;          (*header (table lengths)*)
  OutWord(f,pc-1);   OutWord(f,maxt);
  OutWord(f,maxp);   OutWord(f,maxs);
  OutWord(f,maxeps); OutWord(f,maxany);
  OutWord(f,maxs);   OutWord(f,maxname);
  WriteString(f,"$(*---G-code---*)$ "); ic:=0;  (*G-code*)
  FOR i:=1 TO pc-1 DO
    OutByte(f,CHR(code[i]));
  END;
  IF ODD(pc-1) THEN OutByte(f,0C); END;
  WriteString(f,"$(*---nt-symbols---*)$ "); ic:=0;  (*nt-symbols*)
  FOR i:=maxp+1 TO maxs DO
    GetSy(i,sn);
    OutWord(f,sn.start);
    OutWord(f,ORD(sn.del)*256);
    GetF(i,s);
    FOR j:=0 TO maxt DIV 16 DO
      OutWord(f,VAL(CARDINAL,s[j]));
    END;
  END;
  WriteString(f,"$(*---eps followers---*)$ "); ic:=0;
                                                (*followers of eps nodes*)
  FOR i:=1 TO maxeps DO
    GetE(i,s);
    FOR j:=0 TO maxt DIV 16 DO
      OutWord(f,VAL(CARDINAL,s[j]));
    END;
  END;
  IF maxeps=0 THEN (*dummy*)
    FOR j:=0 TO maxt DIV 16 DO
      OutWord(f,0);
    END;
    maxeps:=1;
  END;
  WriteString(f,"$(*---any sets---*)$ "); ic:=0;  (*any-sets*)
  FOR i:=1 TO maxany DO
    GetA(i,s);
    FOR j:=0 TO maxt DIV 16 DO
      OutWord(f,VAL(CARDINAL,s[j]));
    END;
  END;
  IF maxany=0 THEN (*dummy*)
    FOR j:=0 TO maxt DIV 16 DO
      OutWord(f,0);
    END;
    maxany:=1;
  END;
  WriteString(f,"$(*---attribute numbers---*)$ "); ic:=0;
                                                (*attribute numbers*)
  FOR i:=0 TO maxp DO
    GetSy(i,sn);
    OutWord(f,sn.nra);
  END;
  WriteString(f,"$(*---pragma semantic---*)$ "); ic:=0;
                                                (*pragma semantic*)
  OutWord(f,0); OutWord(f,0);                   (*dummy psem*)
  FOR i:=maxt+1 TO maxp DO
    GetSy(i,sn);
    OutWord(f,sn.sem1); OutWord(f,sn.sem2);
  END;
  WriteString(f,"$(*---name pointers---*)$ "); ic:=0;  (*name pointers*)
  OutWord(f,1);                                 (*for eofsy*)
  FOR i:=1 TO maxs DO
    GetSy(i,sn);
    (*sn.spix is now a pointer in the generated name list*)
    OutWord(f,sn.spix);
  END;
  WriteString(f,"$(*---name list---*)$ "); ic:=0;  (*name list*)
  OutByte(f,"E"); OutByte(f,"O"); OutByte(f,"F"); OutByte(f,0C);
  FOR i:=1 TO maxs DO
    GetSy(i,sn);
    GetName(sn.aliasspix,name,l);
    FOR j:=1 TO l DO OutByte(f,name[j]); END;
    OutByte(f,0C);
  END;
  IF ODD(maxname) THEN OutByte(f,0C); END;
  WriteString(f,"0);$");
END PrintTables;


(* PutStatistics   Writes statistics about compilation to list file
   ---------------------------------------------------------------*)
PROCEDURE PutStatistics;
VAR
  ptrsize: CARDINAL;
  setsize: CARDINAL;
  storage: CARDINAL;
BEGIN
  ptrsize:=2; setsize:=2*((maxt DIV 16)+1);
  storage:=pc-1 +                               (*G-code*)
           (ptrsize+2+setsize)*(maxs-maxp) +    (*ntsymbols*)
           setsize*maxeps +                     (*eps-followers*)
           setsize*maxany +                     (*any-sets*)
           2*(maxp+1) +                         (*nra*)
           4*(maxp-maxt+1) +                    (*psem*)
           2*(maxs+1) +                         (*namep*)
           maxname +                            (*name*)
           16;                                  (*header*)
  WriteLn(lst); WriteString(lst,"Statistics:"); WriteLn(lst);
  WriteCard(lst,rules,5); WriteString(lst," rules"); WriteLn(lst);
  WriteCard(lst,alts,5); WriteString(lst," alternatives"); WriteLn(lst);
  WriteCard(lst,maxn,5); WriteString(lst," nodes"); WriteLn(lst);
  WriteCard(lst,maxsem-10,5); WriteString(lst," semantic actions");
  WriteLn(lst);
  WriteCard(lst,maxeps,5); WriteString(lst," eps with look ahead");
  WriteLn(lst);
  WriteCard(lst,maxany,5); WriteString(lst," any-sets");
  WriteLn(lst);
  WriteCard(lst,pc-1,5); WriteString(lst," bytes for G-code");
  WriteLn(lst);
  WriteCard(lst,storage,5); WriteString(lst," bytes for tables (total)");
  WriteLn(lst);
END PutStatistics;


(* WriteConstDecl  Write grammar constant declaration text
   ---------------------------------------------------------------*)
PROCEDURE WriteConstDecl(VAR f:File; t:ARRAY OF CHAR; n:CARDINAL);
BEGIN
  WriteString(f,t); WriteCard(f,n,4); WriteString(f,";$");
END WriteConstDecl;

END cocogen2.

[Cross-reference list of identifiers for cocogen2.MOD: the line-number columns are not recoverable from the scan.]
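The local module LABMOD in cocogen2 resolves forward references by backpatching: GetAdr returns 0 for a node whose G-code address is not yet known and records where the address has to be filled in, and NewAdr later writes the address into every recorded position, high byte first as in Emit2. The following is a minimal sketch of that idea, written in Python rather than Modula-2 so that it can be run directly. It is not a transcription of the listing: the class `Labels`, the functions `get_adr`, `new_adr` and `emit2`, the dictionary bookkeeping and the opcode value 11 are all illustrative assumptions.

```python
class Labels:
    """Backpatching of forward references in a byte-code array (illustrative)."""

    def __init__(self, code):
        self.code = code      # bytearray holding the generated code
        self.adr = {}         # node location -> resolved code address
        self.fixups = {}      # node location -> code positions awaiting the address

    def get_adr(self, loc, fixup_pos):
        """Return the address of loc, or 0 and remember fixup_pos if unknown."""
        if loc in self.adr:
            return self.adr[loc]
        self.fixups.setdefault(loc, []).append(fixup_pos)
        return 0

    def new_adr(self, loc, adr):
        """Define the address of loc and patch every pending reference to it."""
        if loc in self.adr:
            raise ValueError("label defined twice")
        self.adr[loc] = adr
        for pos in self.fixups.pop(loc, []):
            self.code[pos] = adr // 256       # high byte first
            self.code[pos + 1] = adr % 256    # then low byte


def emit2(code, word):
    """Append a 16-bit word to the code, high byte first."""
    code.append(word // 256)
    code.append(word % 256)


code = bytearray()
labels = Labels(code)
code.append(11)                                   # hypothetical jump opcode
emit2(code, labels.get_adr(7, len(code)))         # target unknown: emits 0, 0
labels.new_adr(7, 300)                            # target becomes known later
print(list(code))                                 # [11, 1, 44]
```

Unlike the listing, the sketch keeps the pending positions in a Python list instead of a linked list of heap records, which avoids the explicit Allocate and Deallocate calls.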
(* cocogra         Graph node list                            Moe 28.12.83

   This module builds and handles the top-down graph. It
   a) generates and updates single graph nodes
   b) concatenates graphs via left or right pointers
   c) prints the whole graph for tracing
   d) inserts eps nodes before deletable nonterminals with alternatives
   e) deletes redundant eps-nodes resulting from EBNF-constructs such as
      {x}y or [x]y
*)
DEFINITION MODULE cocogra;

FROM cocosym IMPORT Symboltype;

CONST
  maxnodes = 600;

TYPE
  Graphnode = RECORD
    typ:  Symboltype;  (*eps,t,pr,nt,any,err*)
    sp:   CARDINAL;    (*node symbol*)
    lp:   CARDINAL;    (*left pointer*)
    rp:   CARDINAL;    (*right pointer*)
    sem1: [0..255];    (*evaluation of in-attributes*)
    sem2: [0..255];    (*evaluation of out-attributes*)
    sem3: [0..255];    (*semantic action*)
    line: CARDINAL;    (*line number*)
    link: CARDINAL;    (*ptr to node with same right successor*)
  END;
  Marklist = ARRAY[0..maxnodes DIV 16] OF BITSET;

VAR
  maxn:    CARDINAL;   (*number of graph nodes*)
  alts:    CARDINAL;   (*number of alternatives, filled by AG*)
  rules:   CARDINAL;   (*number of grammar rules, filled by AG*)
  rootloc: CARDINAL;   (*root node of grammar, filled by AG*)

PROCEDURE ClearMarkList(VAR m:Marklist);
(* Clears the mark list m*)

PROCEDURE ConcatLeft(VAR gp,gl,gp1,gl1:CARDINAL);
(* Links the graph (gp,gl) with the graph (gp1,gl1) via left pointers.
   The resulting graph is identified by (gp,gl)*)

PROCEDURE ConcatRight(VAR gp,gl,gp1,gl1:CARDINAL);
(* Links the graph (gp,gl) with the graph (gp1,gl1) via right pointers.
   The resulting graph is identified by (gp,gl)*)

PROCEDURE Deletable(loc:CARDINAL): BOOLEAN;
(* TRUE if the graph with the root loc is deletable*)

PROCEDURE DeleteRedundantEps;
(* Deletes eps nodes in constructions {x}y and [x]y*)

PROCEDURE DelNode(gn:Graphnode): BOOLEAN;
(* TRUE if the node gn contains a deletable symbol*)

PROCEDURE GetNode(p:CARDINAL; VAR gn:Graphnode);
(* Gets the graph node with the index p*)

PROCEDURE GraphList;
(* Prints a test list of the top-down graphs of all rules*)

PROCEDURE Mark(loc:CARDINAL; VAR m:Marklist);
(* Marks loc in list m as visited*)

PROCEDURE Marked(loc:CARDINAL; VAR m:Marklist): BOOLEAN;
(* TRUE if loc is marked in m*)

PROCEDURE NewEpsBeforeDelNts;
(* Inserts eps nodes in front of deletable nt's*)

PROCEDURE NewNode(typ:Symboltype; sp,line:CARDINAL): CARDINAL;
(* Generates a new graph node with the specified values and
   returns its index*)

PROCEDURE RepNode(p:CARDINAL; gn:Graphnode);
(* Replaces the graph node with index p by gn*)

END cocogra.
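The type Marklist packs one "visited" flag per graph node into 16-bit sets, so node loc lives in word loc DIV 16 at bit loc MOD 16. As a rough illustration of that layout, here is a small Python sketch that uses plain integers in place of Modula-2 BITSET values. The function names and the constant `BITS` are assumptions made for the example; only the bound of 600 nodes is taken from the definition module.

```python
MAXNODES = 600   # same bound as maxnodes in the definition module
BITS = 16        # one BITSET word holds 16 flags


def clear_mark_list():
    """Return a fresh mark list in which no node is marked."""
    return [0] * (MAXNODES // BITS + 1)


def mark(loc, m):
    """Mark node loc as visited."""
    m[loc // BITS] |= 1 << (loc % BITS)


def marked(loc, m):
    """Return True if node loc has been marked."""
    return (m[loc // BITS] >> (loc % BITS)) & 1 == 1


m = clear_mark_list()
mark(17, m)      # word 1, bit 1
mark(600, m)     # last word, bit 8
```

The array needs maxnodes DIV 16 + 1 words because node numbers run up to maxnodes itself.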
(* cocogra         Graph node list for coco                   Moe 29.12.83

   This module builds and handles the top-down graph. It
   a) generates and updates single graph nodes
   b) concatenates graphs via left or right pointers
   c) prints the whole graph for tracing
   d) inserts eps nodes before deletable nonterminals with alternatives
   e) deletes redundant eps-nodes resulting from EBNF-constructs such as
      {x}y or [x]y
*)
IMPLEMENTATION MODULE cocogra;

FROM cocolex IMPORT ddt, GetName;
FROM cocosym IMPORT maxp, maxs, GetSy, RepSy, Symbolnode, Symboltype;
FROM Errors  IMPORT Restriction;
FROM FileIO  IMPORT con, WriteCard, WriteLn, WriteString, WriteText;

TYPE Graphnodelist = ARRAY[1..maxnodes] OF Graphnode;
VAR  gn: Graphnodelist;  (*syntax graph*)


(* ClearMarkList   Clear mark list m
   ---------------------------------------------------------------*)
PROCEDURE ClearMarkList(VAR m:Marklist);
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO maxnodes DIV 16 DO m[i]:={}; END;
END ClearMarkList;


(* ConcatLeft      Concatenate graph gp1 left to graph gp
   ---------------------------------------------------------------*)
PROCEDURE ConcatLeft(VAR gp,gl,gp1,gl1:CARDINAL);
VAR p: CARDINAL;
BEGIN
  p:=gp;
  WHILE gn[p].lp<>0 DO p:=gn[p].lp; END;
  gn[p].lp:=gp1;
  p:=gl;
  WHILE gn[p].link<>0 DO p:=gn[p].link; END;
  gn[p].link:=gl1;
END ConcatLeft;


(* ConcatRight     Concatenate graph gp1 right to graph gp
   ---------------------------------------------------------------*)
PROCEDURE ConcatRight(VAR gp,gl,gp1,gl1:CARDINAL);
VAR p: CARDINAL;
BEGIN
  p:=gl;
  WHILE p<>0 DO gn[p].rp:=gp1; p:=gn[p].link; END;
  gl:=gl1;
END ConcatRight;


(* Deletable       Check if graph in loc is deletable
   ---------------------------------------------------------------*)
PROCEDURE Deletable(loc:CARDINAL):BOOLEAN;
VAR m: Marklist;

  PROCEDURE DelGraph(loc:CARDINAL):BOOLEAN;
  VAR gn:Graphnode;
  BEGIN
    IF loc=0 THEN RETURN TRUE; END;             (*end of graph found*)
    IF Marked(loc,m) THEN RETURN FALSE; END;
    Mark(loc,m);
    GetNode(loc,gn);
    IF ddt["C"] THEN
      WriteString(con,"DelGraph:"); WriteCard(con,loc,6);
      WriteCard(con,ORD(gn.typ),8); WriteCard(con,gn.sp,6);
      WriteLn(con);
    END;
    RETURN
      ((gn.lp<>0) AND DelGraph(gn.lp)) OR
      (DelNode(gn) AND DelGraph(gn.rp));
  END DelGraph;

BEGIN (*Deletable*)
  ClearMarkList(m);
  RETURN DelGraph(loc);
END Deletable;


(* DelNode         Test if node gn is deletable
   ---------------------------------------------------------------*)
PROCEDURE DelNode(gn:Graphnode): BOOLEAN;
VAR sn:Symbolnode;
BEGIN
  IF gn.typ=nt THEN GetSy(gn.sp,sn); RETURN sn.del;
  ELSE RETURN gn.typ=eps;
  END;
END DelNode;


(* DeleteRedundantEps  Delete eps nodes in constructions {x}y and [x]y
   ---------------------------------------------------------------*)
PROCEDURE DeleteRedundantEps;
VAR
  m:  Marklist;
  i:  CARDINAL;
  sn: Symbolnode;

  PROCEDURE DelEps(loc:CARDINAL);
  VAR gn,gn1: Graphnode;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    Mark(loc,m);
    GetNode(loc,gn);
    WITH gn DO
      IF lp<>0 THEN
        GetNode(lp,gn1);
        IF (gn1.typ=eps) AND (gn1.sem3=0) AND
           (gn1.lp=0) AND (gn1.rp<>0) THEN
          lp:=gn1.rp; RepNode(loc,gn);
        END;
      END;
      DelEps(lp);
      DelEps(rp);
    END;
  END DelEps;

BEGIN
  ClearMarkList(m);
  FOR i:=maxp+1 TO maxs DO
    GetSy(i,sn); DelEps(sn.start);
  END;
END DeleteRedundantEps;


(* GetNode         Get node gp
   ---------------------------------------------------------------*)
PROCEDURE GetNode(gp:CARDINAL; VAR gn1:Graphnode);
BEGIN gn1:=gn[gp]; END GetNode;


(* GraphList       trace output of graph node list
   ---------------------------------------------------------------*)
PROCEDURE GraphList;
VAR
  i,j,l: CARDINAL;
  name:  ARRAY[1..80] OF CHAR;
  sn:    Symbolnode;
BEGIN
  WriteString(con,"$$Topdown-graph:$$");
  WriteString(con,"loc symbol       typ       lp     rp");
  WriteString(con,"   sem1   sem2   sem3   link   line$$");
  FOR i:=1 TO maxn DO
    WriteCard(con,i,3); WriteString(con," ");
    WITH gn[i] DO
      CASE typ OF
        eps,any: WriteString(con,"            ");
      | t,pr,nt: GetSy(sp,sn); GetName(sn.spix,name,l);
                 FOR j:=l+1 TO 12 DO name[j]:=" "; END;
                 WriteText(con,name,12);
      | err:     WriteString(con,"error       ");
      END; (*CASE*)
      CASE typ OF
        eps: WriteString(con," eps ");
      | t:   WriteString(con," t   ");
      | pr:  WriteString(con," pr  ");
      | nt:  WriteString(con," nt  ");
      | any: WriteString(con," any ");
      ELSE;
      END; (*CASE*)
      WriteCard(con,lp,7); WriteCard(con,rp,7);
      WriteCard(con,sem1,7); WriteCard(con,sem2,7);
      WriteCard(con,sem3,7); WriteCard(con,link,7);
      WriteCard(con,line,7); WriteLn(con);
    END; (*WITH*)
  END; (*FOR*)
END GraphList;


(* Mark            Marks node loc in m as visited
   ---------------------------------------------------------------*)
PROCEDURE Mark(loc:CARDINAL; VAR m:Marklist);
BEGIN INCL(m[loc DIV 16],loc MOD 16); END Mark;


(* Marked          Tests if node loc is marked in m
   ---------------------------------------------------------------*)
PROCEDURE Marked(loc:CARDINAL; VAR m:Marklist): BOOLEAN;
BEGIN RETURN (loc MOD 16) IN m[loc DIV 16]; END Marked;


(* NewEpsBeforeDelNts  Insert eps before del. nt's with alternatives
   ---------------------------------------------------------------*)
PROCEDURE NewEpsBeforeDelNts;
VAR
  gn,gn1:          Graphnode;
  loc,loc1,maxloc: CARDINAL;
  sn:              Symbolnode;
BEGIN
  maxloc:=maxn;
  FOR loc:=1 TO maxloc DO
    GetNode(loc,gn);
    IF (gn.typ=nt) AND (gn.lp<>0) AND DelNode(gn) THEN
      loc1:=NewNode(gn.typ,gn.sp,gn.line);
      gn1:=gn; gn1.lp:=0;
      WITH gn DO
        typ:=eps; sp:=0; rp:=loc1; sem1:=0; sem2:=0; sem3:=0;
      END;
      RepNode(loc1,gn1); RepNode(loc,gn);
    END;
  END; (*FOR*)
END NewEpsBeforeDelNts;


(* NewNode         Generate a new graph node and return the index
   ---------------------------------------------------------------*)
PROCEDURE NewNode(t:Symboltype; s:CARDINAL; l:CARDINAL): CARDINAL;
BEGIN
  INC(maxn);
  IF maxn>maxnodes THEN Restriction(5); END;
  WITH gn[maxn] DO
    typ:=t; sp:=s; lp:=0; rp:=0; sem1:=0; sem2:=0; sem3:=0;
    line:=l; link:=0;
  END;
  RETURN maxn;
END NewNode;


(* RepNode         Replace node gp
   ---------------------------------------------------------------*)
PROCEDURE RepNode(gp:CARDINAL; gn1:Graphnode);
BEGIN gn[gp]:=gn1; END RepNode;


BEGIN (*cocogra*)
  maxn:=0;
END cocogra.

[Cross-reference list of identifiers for cocogra.MOD: the line-number columns are not recoverable from the scan.]
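The procedure Deletable in cocogra decides whether a graph can derive the empty string: a graph is deletable if one of its alternatives (reached via the left pointer) is deletable, or if its first node is deletable and the rest of the graph (reached via the right pointer) is deletable as well. A mark list stops the recursion when a node is reached a second time. The Python sketch below illustrates this recursion under simplifying assumptions: the `Node` class, the `deletable` flag that stands in for DelNode, and the dictionary used as graph storage are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    deletable: bool   # stands in for DelNode: eps node or deletable nonterminal
    lp: int = 0       # alternative (left pointer), 0 = none
    rp: int = 0       # successor (right pointer), 0 = end of graph


def deletable(graph, root):
    """Return True if the graph starting at root can derive the empty string."""
    visited = set()   # plays the role of the mark list

    def del_graph(loc):
        if loc == 0:              # end of graph found
            return True
        if loc in visited:        # already examined on this path
            return False
        visited.add(loc)
        node = graph[loc]
        return (node.lp != 0 and del_graph(node.lp)) or \
               (node.deletable and del_graph(node.rp))

    return del_graph(root)


# Graph for the rule body  a b | eps :
# node 1 = a (alternative: node 3, successor: node 2), node 2 = b, node 3 = eps
graph = {1: Node(False, lp=3, rp=2), 2: Node(False), 3: Node(True)}
print(deletable(graph, 1))   # True, because the eps alternative is deletable
print(deletable(graph, 2))   # False, b alone cannot be deleted
```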
(* cocolex         Lexical analyzer for coco                  Moe 83.03.27

   This is the Coco-scanner. It
   a) reads the input grammar
   b) returns symbol numbers and terminal attributes to the parser
   c) hashes names and strings into a name list (permanently or
      temporarily)
   d) converts number-strings to values
   All symbols which are not terminals of Cocol get the symbol type
   'nococosy' and are hashed into the name list.
*)
DEFINITION MODULE cocolex;

FROM FileIO IMPORT File;

VAR
  typ:  CARDINAL;                   (*next token code*)
  at:   ARRAY[1..10] OF CARDINAL;   (*attr. values of current token*)
  line: CARDINAL;                   (*current line number*)
  col:  CARDINAL;                   (*current column number*)
  ddt:  ARRAY["A".."Z"] OF BOOLEAN; (*debug and test switches*)
  src:  File;                       (*source file*)

PROCEDURE GetName(spix:CARDINAL;VAR name:ARRAY OF CHAR;VAR len:CARDINAL);
(* Get the text of a name or a string with the spelling index spix.
   len denotes its length*)

PROCEDURE GetSy;
(* Gets the next input token and fills at, line and col*)

PROCEDURE RestartHash;
(* Causes identifiers and strings to be stored permanently*)

PROCEDURE StopHash;
(* Causes identifiers and strings to be stored temporarily*)

END cocolex.
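Point c) above is implemented by the scanner's Hash procedure, which uses open addressing: a start slot is computed from the first two characters and the length of the name, and on a collision the slot is advanced by a step whose size changes with every attempt until a free slot or the same name is found. The following Python sketch shows that kind of probing. The table size 359 is taken from the listing; everything else (storing the names themselves in the table, the function names, and the exact start-slot formula, which is only partly legible in the scan) is an assumption made for illustration.

```python
HTMAX = 359   # table size used in the listing (an odd number)


def start_slot(name):
    """Initial slot from the first two characters and the length (assumed formula)."""
    second = ord(name[1]) if len(name) > 1 else 0
    return (ord(name[0]) * 7 + second + len(name)) * 17 % HTMAX


def lookup(table, name, store=True):
    """Return the slot of name; insert it when absent and store is True."""
    h = start_slot(name)
    d = -HTMAX
    while True:
        if table[h] is None:          # new identifier
            if store:
                table[h] = name
            return h
        if table[h] == name:          # old identifier
            return h
        d += 2                        # collision: change the probe step
        if d == HTMAX:
            raise OverflowError("hash table full")
        h = (h + abs(d)) % HTMAX


table = [None] * HTMAX
first = lookup(table, "ah")
second = lookup(table, "ba")          # same start slot as "ah", so it is probed onward
```

The `store` flag mirrors the distinction between permanent and temporary storage: with `store=False` the name is looked up but the table is left unchanged.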
App. F cocolex MOD (* cocolex: ======= 275 lexical analyzer for coco S======2=2=25==222=2===222222= moe 83.03.27 83512023 This is the Coco-scanner. It a) reads the input grammar b) returns symbol numbers and terminal attribut es to the parser c) hashes names and strings’into a name list (permanently or temporarily) d) converts number-strings to values All symbols which are not terminals of Cocol get "nococosy' and are hashed into the name list. IMPLEMENTATION FROM cocosyn MODULE the symbol type cocolex; IMPORT printinput, FROM Errors IMPORT SemErr, FROM FileIo IMPORT FROM SYSTEM IMPORT printnodes; Restriction; con, EF, EOL, File, Read, Write, WriteCard, VAL; WriteString, WriteText; RPP PRP Hr CMO BB UH vo Pur au DID wm r CONST 20 21 22 eofsy ident string number eqlsy periodsy varlantsy 23 24 25 26 27 1parsy 28 Zoe rparsy ibracksya = 0; = 178 = 18; 19, = 20; = 21; = 22; =) 23; Ae = = 24; 725; Ca) er) 026: 2271; (Er) SO Ree LACKS Vm Sie lconbrsy |=) S25 rconbrsy) = 5 29 Some 34 lat pansy ratparsy = = 2 OF 30; 35 36 37 38 semicolonsy= colonsy = commasy = snococosy) = 31%, 32, 33; 73%; 39 40 41 42 notyp buflen = = 43 TYPE 44 Charclass 45 46 47 (*lexical (*numbers types*) 1..16 reserved for keywords*) aes) GE) (ES) 255; 1024*16; = (none, letter,digit,quote,eql, period, variant, lpar, rpar, lbrack, rbrack, lconbr, rconbr, latpar, ratpar, semicolon, colon, comma, endfile, endline,dollar,minus); 48 49 VAR 50 51 SP 53 54 55 Ce class: CHAR; ARRAY [0C..377Cj OF Charclass; DUT: ARRAY [0..buflen-1] OF CHAR; bp,bpmax:CARDINAL; (*class OF input character*) (*input buffer*) (*buffer pointers*) 56 CONST 57 58 59 idmax htmax = 4980; = 359; (*max.length (*max.length of identifier list*) of hash table*)
VAR
  ch:      CHAR;                           (*current input character*)
  column:  CARDINAL;                       (*column of current input character*)
  i:       CARDINAL;
  id:      ARRAY[0..idmax+20] OF CHAR;     (*identifiers*)
  idact:   CARDINAL;                       (*last element in id*)
  keys:    CARDINAL;                       (*pos. of last keyword in id*)
  ht:      ARRAY[0..htmax] OF CARDINAL;    (*hash table*)
  storeid: BOOLEAN;                        (*store id. permanently?*)

(* NextCh   Get next input character (ch,column global) *)
PROCEDURE NextCh;
BEGIN
  Read(src,ch); INC(column);
END NextCh;

(* Hash   Hash an identifier and return its spix *)
PROCEDURE Hash(idp:CARDINAL; VAR spix:CARDINAL);
  VAR h,l,d: INTEGER;

  PROCEDURE EqualId(x,y,l:CARDINAL):BOOLEAN;
    VAR i: CARDINAL;
  BEGIN
    i:=0;
    WHILE (i<l) AND (id[x+i]=id[y+i]) DO INC(i); END;
    RETURN i=l;
  END EqualId;

BEGIN
  l:=idp-idact; spix:=idact+1;
  h:=(ORD(id[spix])*7 + ORD(id[spix+1]) + l) * 17 MOD htmax;
  d:= -htmax;
  LOOP
    IF ht[h]=0 THEN                                   (*new identifier*)
      IF storeid THEN ht[h]:=spix; idact:=idp; END;
      EXIT;
    ELSIF EqualId(ht[h],spix,l) THEN                  (*old identifier*)
      spix:=ht[h]; EXIT;
    ELSE                                              (*collision*)
      INC(d,2);
      IF d=htmax THEN Restriction(1); END;            (*hash table full*)
      h:=(h+ABS(d)) MOD htmax;
    END;
  END; (*LOOP*)
  IF idp>idmax THEN Restriction(2); END;              (*identifier list full*)
END Hash;

(* EnterKey   Enter a keyword to the identifier list *)
PROCEDURE EnterKey(sy:CARDINAL; key:ARRAY OF CHAR);
  VAR idp,i: INTEGER;
BEGIN
  INC(idact); id[idact]:=CHR(sy); idp:=idact;         (*store symbol number*)
  FOR i:=0 TO HIGH(key) DO                            (*store keyword*)
    INC(idp); id[idp]:=key[i];
  END;
  INC(idp); id[idp]:=0C;
  Hash(idp,keys);        (*keys contains the last keyword spix at any time*)
END EnterKey;

(* GetName   Get the name of an identifier from the name list *)
PROCEDURE GetName(spix:CARDINAL; VAR name:ARRAY OF CHAR; VAR l:CARDINAL);
  VAR i,h:CARDINAL;
BEGIN
  i:=spix; l:=0; h:=HIGH(name);
  WHILE (id[i]<>0C) AND (l<=h) DO
    name[l]:=id[i]; INC(i); INC(l);
  END;
END GetName;

(* ReadName   Read identifier or keyword *)
PROCEDURE ReadName(VAR typ,val:CARDINAL);
  VAR spix,idp: CARDINAL;
BEGIN
  idp:=idact;
  WHILE (class[ch]=letter) OR (class[ch]=digit) DO
    INC(idp); id[idp]:=ch; NextCh;
  END;
  INC(idp); id[idp]:=0C;
  Hash(idp,spix);
  IF spix<=keys
    THEN typ:=ORD(id[spix-1]); val:=0;                (*keyword*)
    ELSE typ:=ident; val:=spix;                       (*identifier*)
  END;
END ReadName;

(* ReadString   Read and hash a string *)
PROCEDURE ReadString(VAR spix:CARDINAL);
  VAR och: CHAR; idp: CARDINAL;
BEGIN
  idp:=idact; och:=ch;
  INC(idp); id[idp]:=och; NextCh;                     (*store quote*)
  LOOP
    IF ch=och THEN NextCh; EXIT;
    ELSIF ch=EF THEN SemErr(24,line,col); EXIT;
    ELSIF ch=EOL THEN SemErr(23,line,col); EXIT;
    ELSE INC(idp); id[idp]:=ch; NextCh;
    END;
  END;
  INC(idp); id[idp]:=och;                             (*store quote*)
  INC(idp); id[idp]:=0C;
  Hash(idp,spix)
END ReadString;

(* RestartHash   Causes identifiers to be stored permanently *)
PROCEDURE RestartHash;
BEGIN storeid:=TRUE; END RestartHash;

(* StopHash   Causes identifiers to be stored temporarily *)
PROCEDURE StopHash;
BEGIN storeid:=FALSE; END StopHash;

(* ReadNumber   Read and convert cardinal constant *)
PROCEDURE ReadNumber(VAR val:CARDINAL);
BEGIN
  val:=0;
  WHILE class[ch]=digit DO
    IF (val>6553) OR ((val=6553) AND (ch>'5')) THEN
      SemErr(22,line,col);
      WHILE class[ch]=digit DO NextCh; END;
    ELSE
      val:=10*val+VAL(CARDINAL,ORD(ch)-ORD('0')); NextCh;
    END;
  END;
END ReadNumber;

(* GetSy   get next lexical symbol *)
PROCEDURE GetSy;
  VAR val:CARDINAL;
BEGIN
  REPEAT
    WHILE ch=' ' DO NextCh; END;
    col:=column;
    CASE class[ch] OF
      none:      typ:=nococosy; at[1]:=ORD(ch); NextCh;
    | letter:    ReadName(typ,val);
                 IF typ=ident THEN at[1]:=val; END;
    | digit:     ReadNumber(at[1]); typ:=number;
    | quote:     ReadString(at[1]); typ:=string;
    | eql:       typ:=eqlsy; NextCh;
    | period:    typ:=periodsy; NextCh;
    | variant:   typ:=variantsy; NextCh;
    | lpar:      typ:=lparsy; NextCh;
    | rpar:      typ:=rparsy; NextCh;
    | lbrack:    typ:=lbracksy; NextCh;
    | rbrack:    typ:=rbracksy; NextCh;
    | lconbr:    typ:=lconbrsy; NextCh;
    | rconbr:    typ:=rconbrsy; NextCh;
    | latpar:    typ:=latparsy; NextCh;
    | ratpar:    typ:=ratparsy; NextCh;
    | semicolon: typ:=semicolonsy; NextCh;
    | colon:     typ:=colonsy; NextCh;
    | comma:     typ:=commasy; NextCh;
    | endfile:   typ:=eofsy;
    | endline:   typ:=notyp;
                 column:=0; INC(line); NextCh;
                 IF (line MOD 16)=0 THEN            (*update counter on screen*)
                   IF line>16 THEN
                     FOR i:=1 TO 5 DO Write(con,10C) END;
                   END;
                   WriteCard(con,line,5)
                 END;
    | dollar:    NextCh;
                 IF CAP(ch)="D" THEN                 (*debug option*)
                   NextCh;
                   WHILE (CAP(ch)>="A") AND (CAP(ch)<="Z") DO
                     ddt[CAP(ch)]:=TRUE; NextCh
                   END;
                   IF ddt["A"] THEN printinput:=TRUE END;
                   IF ddt["B"] THEN printnodes:=TRUE END;
                   WHILE ch<>EOL DO NextCh; END;
                   typ:=notyp;
                 ELSE typ:=nococosy; at[1]:=ORD('$');
                 END;
    | minus:     NextCh;
                 IF ch='-' THEN
                   WHILE ch<>EOL DO NextCh; END;
                   typ:=notyp;
                 ELSE typ:=nococosy; at[1]:=ORD('-');
                 END;
    END; (*CASE*)
  UNTIL typ<>notyp;
END GetSy;

BEGIN (*cocolex*)
  FOR c:="A" TO "Z" DO ddt[c]:=FALSE END;
  FOR c:=0C TO 377C DO class[c]:=none; END;
  FOR c:='a' TO 'z' DO class[c]:=letter; END;
  FOR c:='A' TO 'Z' DO class[c]:=letter; END;
  FOR c:='0' TO '9' DO class[c]:=digit; END;
  class[EF]:=endfile;    class[EOL]:=endline;  class['$']:=dollar;
  class["'"]:=quote;     class['"']:=quote;
  class['(']:=lpar;      class[')']:=rpar;     class[',']:=comma;
  class['-']:=minus;     class['.']:=period;   class[':']:=colon;
  class[';']:=semicolon; class['<']:=latpar;   class['=']:=eql;
  class['>']:=ratpar;    class['[']:=lbrack;   class[']']:=rbrack;
  class['{']:=lconbr;    class['|']:=variant;  class['}']:=rconbr;
  FOR i:=0 TO htmax-1 DO ht[i]:=0; END;
  storeid:=TRUE;
  id[0]:="E"; id[1]:="O"; id[2]:="F"; id[3]:=0C; idact:=3;
  column:=0; col:=0; line:=1; ch:=" ";
  EnterKey( 1,'ALIAS');         EnterKey( 1,'alias');
  EnterKey( 2,'ANY');           EnterKey( 2,'any');
  EnterKey( 3,'DECLARATIONS');
  EnterKey( 4,'ENDGRAM');
  EnterKey( 5,'ENDSEM');        EnterKey( 5,'endsem');
  EnterKey( 6,'EPS');           EnterKey( 6,'eps');
  EnterKey( 7,'GRAMMAR');
  EnterKey( 8,'IN');            EnterKey( 8,'in');
  EnterKey( 9,'MACROS');
  EnterKey(10,'NONTERMINALS');
  EnterKey(11,'OUT');           EnterKey(11,'out');
  EnterKey(12,'PRAGMAS');
  EnterKey(13,'RULES');
  EnterKey(14,'SEM');           EnterKey(14,'sem');
  EnterKey(15,'SEMANTIC');
  EnterKey(16,'TERMINALS');
END cocolex.
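The overflow test in ReadNumber can be checked with a short Python sketch. It assumes a 16-bit CARDINAL with a maximum of 65535, which is inferred from the constants 6553 and '5' in the listing rather than stated there. The function name `read_number` is hypothetical, and an overflow flag is returned instead of calling SemErr.

```python
MAX_CARDINAL = 65535  # assumed 16-bit CARDINAL range, inferred from 6553 and '5'

def read_number(digits):
    """Convert a digit string, stopping before any step would exceed MAX_CARDINAL."""
    val = 0
    for ch in digits:
        # Same test as in the listing: 10*val + digit would exceed 65535
        if val > 6553 or (val == 6553 and ch > "5"):
            return val, True      # the listing reports error 22 and skips the remaining digits
        val = 10 * val + (ord(ch) - ord("0"))
    return val, False
```

Testing before the multiplication means the intermediate value never leaves the representable range, which matters on a machine where CARDINAL arithmetic would otherwise wrap around or trap.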
cocolst.DEF

(* cocolst   Prints listing of Cocol text                      Moe 16.8.87
   This module closes the source file and reopens it for reading. It prints
   a listing of the source file with line numbers and error messages. *)

DEFINITION MODULE cocolst;
FROM FileIO IMPORT File;

VAR lst: File;  (*list file*)

PROCEDURE PrintListing;

END cocolst.
cocolst.MOD

(* cocolst   Prints listing of Cocol text                      Moe 16.8.87
   This module closes the source file and reopens it for reading. It prints
   a listing of the source file with line numbers and error messages. *)

IMPLEMENTATION MODULE cocolst;
FROM cocolex IMPORT src;
FROM Errors  IMPORT Errorptr, GetNextSynErr, GetNextSemErr, PrintSynError;
FROM FileIO  IMPORT File, EF, EOL, Open, Close, Read, Write, WriteString,
                    WriteCard, WriteLn;

(* GetLine   Read a source line. Return empty line if eof. *)
PROCEDURE GetLine(f:File; VAR line:ARRAY OF CHAR);
  VAR ch:CHAR; i:CARDINAL;
BEGIN
  Read(f,ch); i:=0;
  WHILE (ch<>EOL) AND (ch<>EF) DO line[i]:=ch; INC(i); Read(f,ch) END;
  IF (i=0) AND (ch=EF) THEN line[0]:=EF ELSE line[i]:=0C END;
END GetLine;

(* PrintSemError   Print semantic error message *)
PROCEDURE PrintSemError(f:File; nr,col:CARDINAL);
  VAR i:CARDINAL;
BEGIN
  WriteString(f,"***** "); FOR i:=1 TO col-1 DO Write(f," ") END;
  WriteString(f,"^ ");
  CASE nr OF
     1: WriteString(f,"Symbol declared twice");
  |  2: WriteString(f,"Grammar name is no nonterminal");
  |  3: WriteString(f,"Undeclared symbol");
  |  4: WriteString(f,"Terminal on left-hand side of rule");
  |  5: WriteString(f,"Two rules for the same nonterminal");
  |  6: WriteString(f,"Wrong number of attributes");
  |  7: WriteString(f,"In-attribute for a terminal");
  |  8: WriteString(f,"Wrong attribute direction");
  |  9: WriteString(f,"Wrong attribute name");
  | 10: WriteString(f,"Attribute constant on left-hand side of rule");
  | 11: WriteString(f,"Semantic macro declared twice");
  | 12: WriteString(f,"Undeclared semantic macro");
  | 16: WriteString(f,"Pragma used in rules");
  | 21: WriteString(f,"File 'cocosynframe' not found");
  | 22: WriteString(f,"Number too big");
  | 23: WriteString(f,"End of line in string");
  | 24: WriteString(f,"End of file in string");
  | 25: WriteString(f,"File 'cocosemframe' not found");
  ELSE  WriteString(f,"Error");
  END;
  WriteLn(f);
END PrintSemError;

(* PrintListing   Print a source list with error messages *)
PROCEDURE PrintListing;
  VAR
    volRef:         INTEGER;               (*volume or directory of source file*)
    srcn:           ARRAY[0..63] OF CHAR;  (*source name*)
    line:           ARRAY[0..255] OF CHAR; (*source line*)
    symbols:        Errorptr;              (*pointer to error symbols*)
    synline,syncol: CARDINAL;              (*line and column of syntax error*)
    semnr:          CARDINAL;              (*semantic error number*)
    semline,semcol: CARDINAL;              (*line and column of semantic error*)
    lnr:            CARDINAL;              (*line number*)
    sync,semc:      CARDINAL;              (*error counters*)
    i:              CARDINAL;
BEGIN
  volRef:=src^.volRef;
  i:=0; REPEAT srcn[i]:=src^.name[i]; INC(i) UNTIL srcn[i-1]=0C;
  Close(src); Open(src,volRef,srcn,FALSE);
  GetNextSemErr(semnr,semline,semcol);
  GetNextSynErr(symbols,synline,syncol);
  lnr:=1; sync:=0; semc:=0;
  GetLine(src,line);
  WHILE line[0]<>EF DO
    WriteCard(lst,lnr,5); WriteString(lst,"  "); WriteString(lst,line);
    WriteLn(lst);
    WHILE synline=lnr DO
      PrintSynError(lst,symbols,syncol); INC(sync);
      GetNextSynErr(symbols,synline,syncol);
    END;
    WHILE semline=lnr DO
      PrintSemError(lst,semnr,semcol); INC(semc);
      GetNextSemErr(semnr,semline,semcol);
    END;
    GetLine(src,line); INC(lnr);
  END;
  WHILE symbols<>NIL DO
    PrintSynError(lst,symbols,syncol); INC(sync);
    GetNextSynErr(symbols,synline,syncol);
  END;
  WHILE semnr<>0 DO
    PrintSemError(lst,semnr,semcol); INC(semc);
    GetNextSemErr(semnr,semline,semcol);
  END;
  WriteLn(lst);
  WriteCard(lst,sync,5); WriteString(lst," syntax error(s)$");
  WriteCard(lst,semc,5); WriteString(lst," semantic error(s)$$");
END PrintListing;

END cocolst.
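PrintListing interleaves the numbered source lines with two error streams that are ordered by line number. The following Python sketch shows the same merge with a single error stream. It is a simplified illustration, not the module's code: the function name `print_listing`, the tuple layout of an error and the wording of the summary line are assumptions made for the example.

```python
def print_listing(source_lines, errors):
    """Interleave numbered source lines with error messages.

    errors is a list of (line, col, message) tuples sorted by line number.
    Returns the listing as a list of output lines.
    """
    out = []
    pending = list(errors)
    for lnr, text in enumerate(source_lines, start=1):
        out.append(f"{lnr:5}  {text}")
        # Emit every error that belongs to the line just printed
        while pending and pending[0][0] == lnr:
            _, col, message = pending.pop(0)
            out.append("***** " + " " * (col - 1) + message)
    # Errors reported beyond the last source line are printed at the end
    for _, col, message in pending:
        out.append("***** " + " " * (col - 1) + message)
    out.append(f"{len(errors):5} error(s)")
    return out
```

Because both the source and the error list are consumed strictly front to back, the merge needs only one pass and no lookup structure, which is the same property the Modula-2 loops rely on.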
cocosem.DEF

(* Generated semantic analyzer
   This module is produced by Coco from the semantic actions of an
   attributed grammar. *)

DEFINITION MODULE cocosem;
VAR printactions: BOOLEAN;   (*trace the executed semantic actions*)
PROCEDURE Semant(sem:CARDINAL);
END cocosem.
cocosem.MOD

(* Generated semantic analyzer
   This module is produced by Coco from the semantic actions of an
   attributed grammar. *)

IMPLEMENTATION MODULE cocosem;
FROM FileIO IMPORT con, WriteCard, WriteString;
FROM SYSTEM IMPORT WORD;
FROM cocolex IMPORT at;

FROM cocogen IMPORT Attrtype,CloseFile,Copy,EmitAction,GenAssign,
  InsertFramePart,OpenFile,OpenSem,StartCopy;
FROM cocogra IMPORT alts,rules,rootloc,ConcatLeft,ConcatRight,GetNode,
  GraphList,Graphnode,NewNode,RepNode;
FROM cocolex IMPORT typ,line,col,ddt,RestartHash,StopHash;
FROM cocosym IMPORT gramspix,CompleteAt,Direction,GetAt,GetMacroNr,
  GetSy,NewAt,NewMacro,NewSy,RepSy,Symbolnode,Symboltype,SyNr;
FROM Errors IMPORT CompErr,Restriction,SemErr;
FROM SYSTEM IMPORT VAL;
CONST null=65535;
TYPE Usage=(def,check,use);
VAR sn:Symbolnode;
  sy,sy1:CARDINAL;
  rootsy:CARDINAL;
  eofsy:CARDINAL;
  gn:Graphnode;
  gp,gp1,gp2,gp3:CARDINAL;
  gl,gl1,gl2,gl3:CARDINAL;
  dd,dd1,dd2:BOOLEAN;
  gpo:CARDINAL;
  firstfact:BOOLEAN;
  kind:Usage;
  styp:Symboltype;
  dir,dir1:Direction;
  count:CARDINAL;
  n:CARDINAL;
  sem1,sem2,sem3:CARDINAL;
  firstsymbol:BOOLEAN;
  ok:BOOLEAN;
  spix,spix1:CARDINAL;
  dummy:CARDINAL;

MODULE SEMANTICSTACK;
  IMPORT CompErr,Restriction;
  EXPORT Pop,Push;
  CONST maxstacksize=70;
  VAR stack:ARRAY[1..maxstacksize]OF CARDINAL;
    sp:CARDINAL;

  PROCEDURE Pop():CARDINAL;
    VAR x:CARDINAL;
  BEGIN
    IF sp=0 THEN CompErr(6);ELSE x:=stack[sp];DEC(sp);END;
    RETURN x;
  END Pop;

  PROCEDURE Push(x:CARDINAL);
  BEGIN
    IF sp<maxstacksize
      THEN INC(sp);stack[sp]:=x;
      ELSE Restriction(14);
    END;
  END Push;

BEGIN
  sp:=0;
END SEMANTICSTACK;

PROCEDURE Error(nr:CARDINAL);
BEGIN SemErr(nr,line,col);END Error;

PROCEDURE ASSIGN(VAR x:WORD; y:WORD);
BEGIN
  x:=y;
END ASSIGN;

PROCEDURE Semant(sem:CARDINAL);
BEGIN
  (*IF printactions THEN WriteString(con,"$ [");
    WriteCard(con,sem,3); WriteString(con,"] ");
  END;*)
  CASE sem OF
  | 12: (*line 125*)
    INC(count);
    CASE kind OF
    use: IF styp=nt THEN
           GetAt(sy,count,spix1,dir1);
           IF spix1<>0 THEN
             IF dir=dir1
               THEN GenAssign(nonterm,spix1,spix);
               ELSE Error(8);END;
           END;
         END;
    |check: IF styp=nt THEN
           GetAt(sy,count,spix1,dir1);
           IF spix1<>0 THEN
             IF spix<>spix1 THEN Error(9);END;
             IF dir<>dir1 THEN Error(8);END;
           END;
         END;
    |def: NewAt(sy,spix,dir);
    END;
  | 13: (*line 150*)
    INC(count);
    CASE kind OF
    use: IF styp=t THEN GenAssign(term,spix,count);
         ELSIF styp=nt THEN
           GetAt(sy,count,spix1,dir1);
           IF spix1<>0 THEN
             IF dir=dir1
               THEN GenAssign(nonterm,spix,spix1)
               ELSE Error(8);
             END;
           END;
         END;
    |check: IF styp=nt THEN
           GetAt(sy,count,spix1,dir1);
           IF spix1<>0 THEN
             IF spix<>spix1 THEN Error(9);END;
             IF dir<>dir1 THEN Error(8);END;
           END;
         END;
    |def: NewAt(sy,spix,dir);
         IF styp=pr THEN GenAssign(term,spix,count); END;
    END;
  | 14: (*line 181*)
    INC(count);
    IF kind=use THEN
      IF styp=nt THEN
        GetAt(sy,count,spix1,dir1);
        IF spix1<>0 THEN
          IF dir=dir1
            THEN GenAssign(const,spix1,n);
            ELSE Error(8);
          END;
        END;
      END;
    ELSE Error(10);
    END;
  | 15: (*line 198*)
    IF NOT CompleteAt(sy,count) THEN Error(6); END;
  | 16: (*line 204*)
    Copy(typ,col)
  | 17: (*line 208*)
    StartCopy(1)
  | 18: (*line 212*)
    firstfact:=VAL(BOOLEAN,Pop());
    dd1:=VAL(BOOLEAN,Pop());gl1:=Pop();gp1:=Pop();
    dd:=VAL(BOOLEAN,Pop());gl:=Pop();gp:=Pop();
    gpo:=0
  | 19: (*line 219*)
    Push(gp);Push(gl);Push(VAL(CARDINAL,dd));
    Push(gp1);Push(gl1);Push(VAL(CARDINAL,dd1));
    Push(VAL(CARDINAL,firstfact));
  | 20: (*line 225*)
    sy:=SyNr(spix);
    IF sy=null
      THEN sy:=NewSy(spix,styp)
      ELSE Error(1);
    END;
  | 21: (*line 349*)
    ASSIGN(gramspix,at[1]);
  | 22: (*line 349*)
    rules:=0;alts:=0;
    OpenFile(gramspix);
    StopHash;
  | 23: (*line 357*)
    RestartHash;
    InsertFramePart;styp:=t;
  | 24: (*line 363*)
    eofsy:=NewSy(0,t)
  | 25: (*line 365*)
    styp:=t;
    kind:=def;
  | 26: (*line 368*)
    styp:=pr
  | 27: (*line 370*)
    styp:=pr;
    kind:=def;
  | 28: (*line 371*)
    GetSy(sy,sn);sn.sem1:=sem2;
    RepSy(sy,sn);
  | 29: (*line 376*)
    GetSy(sy,sn);sn.sem2:=sem3;
    RepSy(sy,sn);
  | 30: (*line 382*)
    styp:=nt
  | 31: (*line 383*)
    ASSIGN(spix,at[1]);
  | 32: (*line 384*)
    styp:=nt;
    kind:=def;
  | 33: (*line 386*)
    rootsy:=SyNr(gramspix);
    IF rootsy=null THEN Error(2);END;
  | 34: (*line 390*)
    sy:=SyNr(spix);
    IF sy=null THEN Error(3);sy:=NewSy(spix,err) END;
    GetSy(sy,sn);
    IF (sn.typ<>nt) AND (sn.typ<>err) THEN Error(4); END;
    IF sn.start<>0 THEN Error(5);END;
    sy1:=sy;count:=0;styp:=sn.typ
  | 35: (*line 401*)
    kind:=check;
  | 36: (*line 404*)
    GetSy(sy1,sn);
    sn.start:=gp;sn.del:=dd;
    RepSy(sy1,sn);
    INC(rules);
  | 37: (*line 410*)
    rootloc:=NewNode(nt,rootsy,0);
    gp1:=NewNode(t,eofsy,0);
    gl:=rootloc;gl1:=gp1;
    ConcatRight(rootloc,gl,gp1,gl1)
  | 38: (*line 415*)
    IF ddt["L"]THEN GraphList;END;
    CloseFile;
  | 39: (*line 420*)
    gp:=gp1; gl:=gl1; dd:=dd1;
  | 40: (*line 420*)
    INC(alts);
  | 41: (*line 422*)
    INC(alts);
    ConcatLeft(gp,gl,gp1,gl1);
    dd:=dd OR dd1
  | 42: (*line 429*)
    gpo:=0
  | 43: (*line 430*)
    firstfact:=TRUE;
  | 44: (*line 430*)
    gp1:=gp2; gl1:=gl2; dd1:=dd2;
  | 45: (*line 431*)
    firstfact:=FALSE;
  | 46: (*line 432*)
    IF gp2<>0 THEN
      ConcatRight(gp1,gl1,gp2,gl2);
      dd1:=dd1 AND dd2;
    END;
  | 47: (*line 440*)
    sy:=SyNr(spix);
    IF sy=null THEN Error(3);sy:=NewSy(spix,err) END;
    GetSy(sy,sn);
    IF sn.typ=pr THEN Error(16);END;
    gp2:=NewNode(sn.typ,sy,line);
    gl2:=gp2;dd2:=FALSE;gpo:=gp2;
    count:=0;styp:=sn.typ
  | 48: (*line 450*)
    kind:=use;
  | 49: (*line 451*)
    GetNode(gp2,gn);
    gn.sem1:=sem1;gn.sem2:=sem2;
    RepNode(gp2,gn)
  | 50: (*line 456*)
    gp2:=NewNode(eps,0,line);
    gl2:=gp2;dd2:=TRUE;gpo:=gp2
  | 51: (*line 459*)
    gp2:=NewNode(any,0,line);
    gl2:=gp2;dd2:=FALSE;gpo:=gp2
  | 52: (*line 462*)
    IF gpo=0 THEN
      gp2:=NewNode(eps,0,line);
      gl2:=gp2;dd2:=TRUE;
      GetNode(gp2,gn);gn.sem3:=sem3;
      RepNode(gp2,gn);
    ELSE
      GetNode(gpo,gn);gn.sem3:=sem3;
      RepNode(gpo,gn);
      gp2:=0;gl2:=0;gpo:=0
    END;
  | 53: (*line 475*)
    gp2:=gp; gl2:=gl; dd2:=dd;
  | 54: (*line 478*)
    gp2:=NewNode(eps,0,line);
    gl2:=gp2;
    ConcatLeft(gp,gl,gp2,gl2);
    gp2:=gp;gl2:=gl;dd2:=TRUE;
  | 55: (*line 485*)
    gp2:=NewNode(eps,0,line);
    gl2:=gp2;
    ConcatRight(gp,gl,gp,gl);
    ConcatLeft(gp,gl,gp2,gl2);
    gp2:=gp;dd2:=TRUE;
  | 56: (*line 493*)
    IF firstfact THEN
      gp3:=gp2;gl3:=gl2;
      gp2:=NewNode(eps,0,line);gl2:=gp2;
      ConcatRight(gp2,gl2,gp3,gl3);
    END;
  | 57: (*line 502*)
    sem1:=0;sem2:=0
  | 58: (*line 503*)
    count:=0;
  | 59: (*line 510*)
    IF styp<>nt THEN Error(7);END;
    dir:=down;
  | 60: (*line 515*)
    ASSIGN(n,at[1]);
  | 61: (*line 520*)
    IF kind=use THEN
      EmitAction(line,sem1);
    END;
  | 62: (*line 526*)
    dir:=up
  | 63: (*line 531*)
    IF (kind=use) OR (styp=pr) THEN
      EmitAction(line,sem2);
    END;
  | 64: (*line 537*)
    StopHash; firstsymbol:=TRUE
  | 65: (*line 538*)
    RestartHash
  | 66: (*line 539*)
    GetMacroNr(spix,sem3);
    IF sem3=0 THEN Error(12);END;
  | 67: (*line 543*)
    IF firstsymbol THEN
      firstsymbol:=FALSE;
      OpenSem(line,sem3);
      StartCopy(col)
    END;
    Copy(typ,col)
  | 68: (*line 549*)
    RestartHash;
  | 69: (*line 556*)
    OpenSem(line,sem3);
    NewMacro(spix,sem3,ok);
    IF NOT ok THEN Error(11);END;
    StopHash; firstsymbol:=TRUE;
  | 70: (*line 562*)
    IF firstsymbol THEN
      firstsymbol:=FALSE;
      StartCopy(col)
    END;
    Copy(typ,col)
  | 71: (*line 568*)
    RestartHash
  | 72: (*line 575*)
    GetSy(sy,sn);sn.aliasspix:=spix;
    RepSy(sy,sn);
  END;
END Semant;

BEGIN
  printactions:=FALSE;
END cocosem.
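The local module SEMANTICSTACK in cocosem.MOD is a bounded stack of CARDINAL values. Actions 19 and 18 use it to save and restore the graph pointers and flags around a nested expression, converting BOOLEAN values with VAL. A minimal Python sketch of that pattern follows. The class and function names are hypothetical, only three of the seven saved values are shown, and Python exceptions stand in for the calls to Restriction(14) and CompErr(6).

```python
MAX_STACK_SIZE = 70   # maxstacksize in the listing

class SemanticStack:
    def __init__(self):
        self.items = []

    def push(self, x):
        if len(self.items) >= MAX_STACK_SIZE:
            raise OverflowError("semantic stack full")    # listing: Restriction(14)
        self.items.append(int(x))                         # booleans are stored as 0 or 1

    def pop(self):
        if not self.items:
            raise IndexError("semantic stack empty")      # listing: CompErr(6)
        return self.items.pop()

def save_state(stack, gp, gl, dd):
    """Push order of action 19, reduced to three values."""
    stack.push(gp)
    stack.push(gl)
    stack.push(dd)

def restore_state(stack):
    """Pop in the reverse order, as action 18 does."""
    dd = bool(stack.pop())
    gl = stack.pop()
    gp = stack.pop()
    return gp, gl, dd
```

Storing everything as one integer type keeps the stack a plain array, at the price of an explicit conversion wherever a flag is saved or restored.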
cocosemframe

(* Generated semantic analyzer
   This module is produced by Coco from the semantic actions of an
   attributed grammar. *)

DEFINITION MODULE -->modulename;
VAR printactions: BOOLEAN;   (*trace the executed semantic actions*)
PROCEDURE Semant(sem:CARDINAL);
END -->modulename.
-->implementation
(* Generated semantic analyzer
   This module is produced by Coco from the semantic actions of an
   attributed grammar. *)

IMPLEMENTATION MODULE -->modulename;
FROM FileIO IMPORT con, WriteCard, WriteString;
FROM SYSTEM IMPORT WORD;
FROM -->scannername IMPORT at;

-->declarations

PROCEDURE ASSIGN(VAR x:WORD; y:WORD);
BEGIN
  x:=y;
END ASSIGN;

PROCEDURE Semant(sem:CARDINAL);
BEGIN
  (*IF printactions THEN WriteString(con,"$ [");
    WriteCard(con,sem,3); WriteString(con,"] ");
  END; *)
  CASE sem OF
  |12:
-->actions
  END;
END Semant;
BEGIN
  printactions:=FALSE;
END -->modulename.
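The frame file marks the places to be filled in with names that begin with "-->". How the generator module processes these markers is not shown in this part of the listings, so the following Python sketch is only an assumption about one plausible scheme: markers inside a line are replaced by identifiers, markers alone on a line are replaced by generated text, and "-->implementation" separates the definition part from the implementation part. The function name `expand_frame` and its parameters are hypothetical.

```python
def expand_frame(frame_text, names, blocks):
    """Expand '-->' markers in a frame (assumed scheme, see lead-in).

    names  maps markers that stand inside a line (e.g. 'modulename') to identifiers.
    blocks maps markers that stand alone on a line (e.g. 'actions') to generated text.
    Returns (definition_part, implementation_part).
    """
    parts = [[]]
    for line in frame_text.splitlines():
        marker = line.strip()
        if marker == "-->implementation":
            parts.append([])                      # start collecting the implementation module
        elif marker.startswith("-->") and marker[3:] in blocks:
            parts[-1].append(blocks[marker[3:]])  # whole line replaced by generated text
        else:
            for key, value in names.items():
                line = line.replace("-->" + key, value)
            parts[-1].append(line)
    definition = "\n".join(parts[0])
    implementation = "\n".join(parts[1]) if len(parts) > 1 else ""
    return definition, implementation
```

Keeping both module texts in one frame file means the fixed parts of the definition and implementation modules are edited in a single place.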
cocosym.DEF

(* cocosym   Symbol list for coco                              Moe 28.12.83
   This module
   a) generates and updates symbol nodes for terminals, pragmas and
      nonterminals
   b) searches names in the symbol list
   c) stores and retrieves attribute information
   d) stores and retrieves semantic macros
   e) marks deletable symbols in symbol list
   f) collects first-sets, follow-sets, eps-sets and any-sets *)

DEFINITION MODULE cocosym;

CONST
  maxterminals = 128;
TYPE
  Direction    = (up,down);                   (*attribute direction*)
  Attributeptr = POINTER TO Attribute;
  Attribute    = RECORD
                   spix: CARDINAL;            (*name of attribute*)
                   dir:  Direction;           (*up,down*)
                   next: Attributeptr;        (*to next attribute of same nt*)
                 END;
  Symboltype   = (eps,t,pr,nt,any,err);
  Symbolnode   = RECORD
                   spix:      CARDINAL;       (*spelling index of symbol*)
                   aliasspix: CARDINAL;       (*spelling index of alias name*)
                   nra:       CARDINAL;       (*no.of attributes*)
                   CASE typ: Symboltype OF    (*type of symbol*)
                     t,pr:   sem1,sem2: CARDINAL;   (*pragma semantics*)
                   | nt,err: start: CARDINAL;       (*start of top-down graph*)
                             del: BOOLEAN;          (*TRUE if deletable*)
                             firstat: Attributeptr; (*to first attribute node*)
                   END;
                 END;
  Symbolset    = ARRAY[0..maxterminals DIV 16] OF BITSET;

VAR
  maxany:   CARDINAL;    (*no.of any-sets*)
  maxeps:   CARDINAL;    (*no.of eps-follower-sets*)
  maxt:     CARDINAL;    (*no.of last terminal*)
  maxp:     CARDINAL;    (*no.of last pragma*)
  maxs:     CARDINAL;    (*no.of last nonterminal*)
  gramspix: CARDINAL;    (*grammar name, filled by .AG*)

PROCEDURE ClearSet(VAR s:Symbolset; n:CARDINAL);
(* Clears set s*)

PROCEDURE CompleteAt(sy,nr:CARDINAL): BOOLEAN;
(* Checks if symbol sy has nr attributes*)

PROCEDURE FindDelSymbols;
(* Marks deletable nonterminals and prints them*)

PROCEDURE GetA(n:CARDINAL; VAR set:Symbolset);
(* Gets the any-set with the number n*)
PROCEDURE GetAt(sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
(* Gets the spelling index spix and the direction dir of the n-th
   attribute of the symbol sy*)

PROCEDURE GetE(n:CARDINAL; VAR set:Symbolset);
(* Gets the eps-follower-set with the number n*)

PROCEDURE GetF(sy:CARDINAL; VAR first:Symbolset);
(* Gets the set of terminal start symbols for the nonterminal sy*)

PROCEDURE GetFirstSet(loc:CARDINAL; VAR set:Symbolset);
(* Gets the terminal start symbols of the graph with the root loc*)

PROCEDURE GetFo(sy:CARDINAL; VAR set:Symbolset);
(* Gets followers of the nonterminal sy*)

PROCEDURE GetMacroNr(spix:CARDINAL; VAR sem:CARDINAL);
(* Gets the number sem of the semantic action corresponding to the
   macro with the name spix*)

PROCEDURE GetSy(sy:CARDINAL; VAR sn:Symbolnode);
(* Gets the symbol node with the index sy*)

PROCEDURE GetSymbolSets;
(* Collects first-sets, follower-sets, eps-sets and any-sets*)

PROCEDURE IsInSet(n:CARDINAL; VAR s:Symbolset):BOOLEAN;
(* TRUE if n is in set s*)

PROCEDURE NewAt(sy,spix:CARDINAL; dir:Direction);
(* Enters a new attribute for the symbol sy with the spelling index
   spix and the direction dir*)

PROCEDURE NewMacro(spix,sem:CARDINAL; VAR ok:BOOLEAN);
(* Enters a new semantic macro with the name spix and the action
   number sem*)

PROCEDURE NewSy(spix:CARDINAL; typ:Symboltype): CARDINAL;
(* Generates a new symbol with the name spix and the type typ and
   returns its index*)

PROCEDURE RepSy(sy:CARDINAL; sn:Symbolnode);
(* Replaces the symbol sy by the node sn*)

PROCEDURE SetBit(VAR s:Symbolset; n:CARDINAL);
(* Sets bit n in set s*)

PROCEDURE Unit(VAR s1,s2:Symbolset; n:CARDINAL);
(* Adds the set s2 to the set s1*)

PROCEDURE SyNr(spix:CARDINAL): CARDINAL;
(* Gets the symbol number for the identifier with the name spix*)

END cocosym.
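Symbolset is declared as an array of BITSET words indexed 0..maxterminals DIV 16, so a terminal number n is addressed by a word index and a bit position. The bodies of SetBit, IsInSet and Unit are not part of this section, so the following Python sketch shows only the usual indexing scheme under the assumption of 16 bits per BITSET, which the "DIV 16" suggests. The function names are hypothetical, and Python integers stand in for the BITSET words.

```python
MAX_TERMINALS = 128
WORD_BITS = 16          # assumed size of a BITSET, as suggested by "DIV 16"

def new_set():
    """An empty set with indices 0..MAX_TERMINALS DIV 16."""
    return [0] * (MAX_TERMINALS // WORD_BITS + 1)

def set_bit(s, n):
    s[n // WORD_BITS] |= 1 << (n % WORD_BITS)

def is_in_set(n, s):
    return bool(s[n // WORD_BITS] & (1 << (n % WORD_BITS)))

def unite(s1, s2):
    """Add the set s2 to the set s1 (the role of Unit in the definition module)."""
    for i, word in enumerate(s2):
        s1[i] |= word

def complement(s, maxt):
    """Complement the words covering terminals 0..maxt, in the manner of
    the expression VAL(BITSET,65535)-s[i] used in GetAnySets."""
    for i in range(maxt // WORD_BITS + 1):
        s[i] = 0xFFFF & ~s[i]
```

With 128 terminals a whole set fits in nine words, so union and complement are a handful of word operations regardless of how many symbols the set contains.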
cocosym.MOD

(* cocosym   Symbol list for coco                              Moe 29.12.83
   This module
   a) generates and updates symbol nodes for terminals, pragmas and
      nonterminals
   b) searches names in the symbol list
   c) stores and retrieves attribute information
   d) stores and retrieves semantic macros
   e) marks deletable symbols in symbol list
   f) collects first-sets, follow-sets, eps-sets and any-sets *)

IMPLEMENTATION MODULE cocosym;
FROM cocogra IMPORT maxn, rootloc, ClearMarkList, Deletable, DelNode,
                    GetNode, Graphnode, Mark, Marked, Marklist, RepNode;
FROM cocolex IMPORT line, col, ddt, GetName;
FROM cocolst IMPORT lst;
FROM Errors  IMPORT CompErr, Restriction, SemErr;
FROM FileIO  IMPORT con, Write, WriteCard, WriteString, WriteText, WriteLn;
FROM System  IMPORT Allocate;
FROM SYSTEM  IMPORT VAL;

CONST
  anysetsize = 20;       (*max.no.of compl.-sets for any-symbols*)
  epssetsize = 70;       (*max.no.of eps-follower-sets*)
  maxsymbols = 200;      (*max.no.of symbols*)
  maxnt      = 80;       (*max.number of nonterminals*)
  null       = 65535;
  eofsy      = 0;

TYPE
  Anyset     = ARRAY[1..anysetsize] OF Symbolset;
  Epsset     = ARRAY[1..epssetsize] OF Symbolset;
  Followset  = ARRAY[0..maxnt-1] OF RECORD
                 ts:  Symbolset;     (*terminal symbols*)
                 nts: Symbolset;     (*nts whose start set is to be added*)
               END;
  Firstset   = ARRAY[0..maxnt-1] OF RECORD
                 ts:    Symbolset;   (*terminal symbols*)
                 ready: BOOLEAN;     (*TRUE if ts is complete*)
               END;
  Macroptr   = POINTER TO Macronode;
  Macronode  = RECORD
                 spix: CARDINAL;     (*name of semantic macro*)
                 sem:  CARDINAL;     (*associated semantic action*)
                 next: Macroptr;     (*to next sem macro*)
               END;
  Symbollist = ARRAY[0..maxsymbols] OF Symbolnode;

VAR
  anyset:     Anyset;       (*actual no.of any-sets*)
  column:     CARDINAL;     (*printing column for terminal sets*)
  epsset:     Epsset;       (*actual no.of eps-sets*)
  first:      Firstset;     (*terminal start symbols*)
  firstmacro: Macroptr;     (*first sem macro*)
  fnt:        CARDINAL;     (*no.of first nonterminal*)
  follow:     Followset;    (*terminal successors*)
  lastmacro:  Macroptr;     (*last sem macro*)
  sn:         Symbollist;   (*symbol list*)

PROCEDURE AllBit(VAR s:Symbolset); FORWARD;
PROCEDURE DelBit(VAR s:Symbolset; n:CARDINAL); FORWARD;
PROCEDURE PrintSet(VAR s:Symbolset; n:CARDINAL); FORWARD;
PROCEDURE PutNt(sy:CARDINAL); FORWARD;
PROCEDURE PutTermSet(VAR s:Symbolset); FORWARD;

(* CompleteAt   Test if nr is the correct no.of attributes
---------------------------------------------------------------------*)
PROCEDURE CompleteAt(sy,nr:CARDINAL): BOOLEAN;
BEGIN RETURN (sn[sy].nra=nr) OR (sn[sy].typ=err); END CompleteAt;

(* FindDelSymbols   Find all deletable symbols and print them
---------------------------------------------------------------------*)
PROCEDURE FindDelSymbols;
VAR
  change: BOOLEAN;
  dummy:  CARDINAL;
  first:  BOOLEAN;
  i,l:    CARDINAL;
  name:   ARRAY[1..50] OF CHAR;
  sn:     Symbolnode;
BEGIN
  fnt:=maxp+1;
  REPEAT (*while new deletable symbols*)
    change:=FALSE;
    FOR i:=maxp+1 TO maxs DO
      GetSy(i,sn);
      IF (NOT sn.del) AND (sn.start<>0) AND Deletable(sn.start) THEN
        sn.del:=TRUE; RepSy(i,sn); change:=TRUE;
      END;
    END;
  UNTIL NOT change;
  first:=TRUE;
  FOR i:=maxp+1 TO maxs DO (*print deletable symbols*)
    GetSy(i,sn);
    IF sn.del THEN
      IF first THEN
        WriteLn(lst); WriteLn(lst);
        WriteString(lst,"Deletable symbols:"); WriteLn(lst);
        first:=FALSE;
      END;
      GetName(sn.spix,name,l);
      WriteString(lst," "); WriteText(lst,name,l);
    END;
  END;
  IF first THEN
    WriteLn(lst); WriteLn(lst);
    WriteString(lst,"Grammar contains no deletable symbols.");
    WriteLn(lst); WriteLn(lst);
  END;
END FindDelSymbols;

(* GetA   Returns the any-set with the number nr
---------------------------------------------------------------------*)
PROCEDURE GetA(nr:CARDINAL; VAR s:Symbolset);
BEGIN s:=anyset[nr]; END GetA;

(* GetAnySets   Find the complement sets for any-nodes
---------------------------------------------------------------------*)
PROCEDURE GetAnySets;
VAR
  gn:    Graphnode;
  loc,i: CARDINAL;
  s:     Symbolset;
BEGIN (*GetAnySets*)
  FOR loc:=1 TO maxn DO
    GetNode(loc,gn);
    IF (gn.typ=any) AND (gn.lp<>0) THEN (*any with alternatives*)
      GetFirstSet(gn.lp,s);
      FOR i:=0 TO maxt DIV 16 DO (*make complement*)
        s[i]:=VAL(BITSET,65535)-s[i];
      END;
      DelBit(s,eofsy); (*any must not recognize eofsy*)
      INC(maxany); anyset[maxany]:=s;
      gn.sp:=maxany; RepNode(loc,gn);
    END;
  END;
END GetAnySets;

(* GetAt   Get name and direction of an attribute
---------------------------------------------------------------------*)
PROCEDURE GetAt(sy,nr:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
VAR
  i: CARDINAL;
  p: Attributeptr;
BEGIN
  IF (sn[sy].typ<>nt) AND (sn[sy].typ<>err) THEN CompErr(3); END;
  IF (nr>sn[sy].nra) OR (sn[sy].typ=err) THEN
    spix:=0; dir:=down; (*semantic error*)
  ELSE
    p:=sn[sy].firstat;
    FOR i:=1 TO nr-1 DO p:=p^.next; END;
    spix:=p^.spix; dir:=p^.dir;
  END;
END GetAt;

(* GetE   Returns the eps-set with the number nr
---------------------------------------------------------------------*)
PROCEDURE GetE(nr:CARDINAL; VAR s:Symbolset);
BEGIN s:=epsset[nr]; END GetE;

(* GetEpsSets   Find the follower symbols for eps-nodes
---------------------------------------------------------------------*)
PROCEDURE GetEpsSets;
VAR
  curnt: CARDINAL;
  m:     Marklist;
  sn:    Symbolnode;

  PROCEDURE FindEpsFollowers(loc,leftsy:CARDINAL; VAR nr:CARDINAL);
  VAR s: Symbolset;
  BEGIN
    GetFirstSet(loc,s);
    IF Deletable(loc) THEN Unit(s,follow[leftsy-fnt].ts,maxt); END;
    INC(maxeps); epsset[maxeps]:=s;
    nr:=maxeps;
  END FindEpsFollowers;

  PROCEDURE FindEps(loc,leftsy:CARDINAL; vialp:BOOLEAN);
  VAR
    gn: Graphnode;
    nr: CARDINAL;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    Mark(loc,m);
    GetNode(loc,gn);
    WITH gn DO
      IF (typ=eps) AND (vialp OR (lp<>0)) THEN
        FindEpsFollowers(rp,leftsy,nr);
        sp:=nr; RepNode(loc,gn);
      END;
      IF lp<>0 THEN FindEps(lp,leftsy,TRUE); END;
      IF rp<>0 THEN FindEps(rp,leftsy,FALSE); END;
    END;
  END FindEps;

BEGIN (*GetEpsSets*)
  ClearMarkList(m);
  FOR curnt:=maxp+1 TO maxs DO
    GetSy(curnt,sn);
    FindEps(sn.start,curnt,FALSE);
  END;
END GetEpsSets;

(* GetF   Returns the terminal start symbols of sy
---------------------------------------------------------------------*)
PROCEDURE GetF(sy:CARDINAL; VAR s:Symbolset);
BEGIN s:=first[sy-fnt].ts; END GetF;

(* GetFirstSet   Gets the terminal start symbols of the graph in loc
---------------------------------------------------------------------*)
PROCEDURE GetFirstSet(loc:CARDINAL; VAR set:Symbolset);
VAR
  m: Marklist; (*mark list for visited graph nodes*)

  PROCEDURE CollectFirstSet(loc:CARDINAL; VAR set:Symbolset);
  VAR
    gn: Graphnode;
    sn: Symbolnode;
    s1: Symbolset;
  BEGIN
    ClearSet(set,maxt);
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    WHILE loc<>0 DO (*for all alternatives*)
      Mark(loc,m); GetNode(loc,gn);
      IF ddt["G"] THEN
        WriteString(con,"CollectFirstSet:"); WriteCard(con,loc,6);
        WriteCard(con,ORD(gn.typ),6); WriteCard(con,gn.sp,6); WriteLn(con);
      END;
      IF DelNode(gn) THEN
        CollectFirstSet(gn.rp,s1); Unit(set,s1,maxt);
      END;
      CASE gn.typ OF
        eps: ;
      | t:   SetBit(set,gn.sp);
      | nt:  IF first[gn.sp-fnt].ready
             THEN Unit(set,first[gn.sp-fnt].ts,maxt);
             ELSE
               GetSy(gn.sp,sn);
               CollectFirstSet(sn.start,s1); Unit(set,s1,maxt);
             END;
      | any: AllBit(set);
      END; (*CASE*)
      loc:=gn.lp;
    END; (*WHILE*)
  END CollectFirstSet;

BEGIN (*GetFirstSet*)
  ClearMarkList(m);
  CollectFirstSet(loc,set);
  IF ddt["H"] THEN
    WriteString(con,"GetFirstSet:"); PrintSet(set,maxt);
  END;
END GetFirstSet;

(* GetFollowSets   Get terminal successors of nonterminals
---------------------------------------------------------------------*)
PROCEDURE GetFollowSets;
VAR
  change: BOOLEAN;
  i,n,n1: CARDINAL;
  m:      Marklist;
  sn:     Symbolnode;

  PROCEDURE CollectFollowSets(loc,sym:CARDINAL);
  VAR
    gn:  Graphnode;
    set: Symbolset;
  BEGIN
    WHILE loc<>0 DO (*step through alternative chain*)
      IF Marked(loc,m) THEN RETURN; END; (*cycle*)
      Mark(loc,m); GetNode(loc,gn);
      WITH gn DO
        IF ddt["J"] THEN
          WriteString(con,"CollectFollowSets "); WriteCard(con,loc,6);
          WriteCard(con,sp,6); WriteCard(con,sym,6); WriteLn(con);
        END;
        IF typ=nt THEN
          GetFirstSet(rp,set);
          Unit(follow[sp-fnt].ts,set,maxt);
          IF Deletable(rp) THEN
            SetBit(follow[sp-fnt].nts,sym-fnt);
          END;
          IF ddt["I"] THEN
            WriteString(con,"CollectFollowSets:");
            WriteCard(con,loc,6);
            WriteString(con,"$ "); PrintSet(follow[sp-fnt].ts,maxt);
            WriteString(con,"$ ");
            PrintSet(follow[sp-fnt].nts,maxs-maxp);
            WriteLn(con);
          END;
        END; (*IF typ=nt*)
        CollectFollowSets(rp,sym);
        loc:=lp;
      END; (*WITH*)
    END; (*WHILE*)
  END CollectFollowSets;

  PROCEDURE Complete(i:CARDINAL); (*add indirect successors of*)
  VAR j: CARDINAL;                (*i+fnt to follow[i].ts*)
  BEGIN
    IF Marked(i,m) THEN RETURN; END; (*already visited*)
    Mark(i,m);
    FOR j:=0 TO maxs-fnt DO
      IF IsInSet(j,follow[i].nts) THEN
        Complete(j);
        Unit(follow[i].ts,follow[j].ts,maxt);
      END;
    END;
  END Complete;

BEGIN (*GetFollowSets*)
  FOR i:=fnt TO maxs DO
    ClearSet(follow[i-fnt].ts,maxt);
    ClearSet(follow[i-fnt].nts,maxs-fnt);
  END;

  ClearMarkList(m);
  FOR i:=fnt TO maxs DO (*get direct successors of nonterminals*)
    GetSy(i,sn);
    IF ddt["I"] THEN
      WriteString(con,"GetFollowSets(0):"); WriteCard(con,sn.start,6);
      WriteCard(con,i,6); WriteLn(con);
    END;
    CollectFollowSets(sn.start,i);
  END;
  CollectFollowSets(rootloc,maxs+1); (*successors of grammar symbol*)

  FOR i:=0 TO maxs-fnt DO (*add indirect successors to follow.ts*)
    ClearMarkList(m);
    Complete(i);
    ClearSet(follow[i].nts,maxt);
  END;

  IF ddt["I"] THEN
    WriteString(con,"GetFollowSets(3):$");
    FOR i:=0 TO maxs-fnt DO
      WriteCard(con,fnt+i,6); PrintSet(follow[i].ts,maxt);
      WriteLn(con);
    END;
  END;
END GetFollowSets;

(* GetFo   Get follow-set of nonterminal sy
---------------------------------------------------------------------*)
PROCEDURE GetFo(sy:CARDINAL; VAR set:Symbolset);
BEGIN set:=follow[sy-fnt].ts; END GetFo;

(* GetMacroNr   Get semantic macro
---------------------------------------------------------------------*)
PROCEDURE GetMacroNr(spix:CARDINAL; VAR sem:CARDINAL);
VAR p: Macroptr;
BEGIN
  p:=firstmacro;
  WHILE (p<>NIL) AND (p^.spix<>spix) DO p:=p^.next; END;
  IF p=NIL THEN sem:=0; ELSE sem:=p^.sem; END;
END GetMacroNr;

(* GetSy   Gets the symbol sy
---------------------------------------------------------------------*)
PROCEDURE GetSy(sy:CARDINAL; VAR sn1:Symbolnode);
BEGIN sn1:=sn[sy]; END GetSy;

(* GetSymbolSets   Get first-sets, follower-sets, eps-sets and any-sets
---------------------------------------------------------------------*)
PROCEDURE GetSymbolSets;
VAR
  i:  CARDINAL;
  sn: Symbolnode;
BEGIN
  fnt:=maxp+1;
  FOR i:=0 TO maxs-fnt DO first[i].ready:=FALSE; END;
  FOR i:=fnt TO maxs DO
    GetSy(i,sn);
    GetFirstSet(sn.start,first[i-fnt].ts);
    first[i-fnt].ready:=TRUE;
  END;

  GetFollowSets;
  GetEpsSets;
  GetAnySets;

  IF ddt["K"] THEN (*print first-sets and follow-sets*)
    WriteLn(lst); WriteString(lst,"List of terminal start symbols:");
    WriteLn(lst);
    FOR i:=fnt TO maxs DO
      PutNt(i); PutTermSet(first[i-fnt].ts);
    END;
    WriteLn(lst); WriteLn(lst);
    WriteString(lst,"List of terminal successors:"); WriteLn(lst);
    FOR i:=fnt TO maxs DO
      PutNt(i); PutTermSet(follow[i-fnt].ts);
    END;
  END;
END GetSymbolSets;

(* NewAt   Enter new attribute for a symbol
---------------------------------------------------------------------*)
PROCEDURE NewAt(sy,spx:CARDINAL; dir:Direction);
VAR
  i:    CARDINAL;
  p,at: Attributeptr;
BEGIN
  WITH sn[sy] DO
    INC(nra);
    IF typ=nt THEN (*store name and direction*)
      Allocate(at,SIZE(Attribute));
      at^.spix:=spx; at^.dir:=dir; at^.next:=NIL;
      IF firstat=NIL
      THEN firstat:=at;
      ELSE
        p:=firstat;
        WHILE p^.next<>NIL DO p:=p^.next END;
        p^.next:=at;
      END;
    END;
  END;
END NewAt;

(* NewMacro   Enter new semantic macro
---------------------------------------------------------------------*)
PROCEDURE NewMacro(spix,sem:CARDINAL; VAR ok:BOOLEAN);
VAR p,s: Macroptr;
BEGIN
  p:=firstmacro;
  WHILE (p<>NIL) AND (p^.spix<>spix) DO p:=p^.next; END;
  IF p=NIL THEN
    ok:=TRUE;
    Allocate(s,SIZE(Macronode));
    s^.spix:=spix; s^.sem:=sem; s^.next:=NIL;
    IF firstmacro=NIL
    THEN firstmacro:=s; lastmacro:=s;
    ELSE lastmacro^.next:=s; lastmacro:=s;
    END;
  ELSE ok:=FALSE;
  END;
END NewMacro;

(* NewSy   Generate a new symbol and return index
---------------------------------------------------------------------*)
PROCEDURE NewSy(spx:CARDINAL; tp:Symboltype): CARDINAL;
VAR i: CARDINAL;
BEGIN
  IF maxs=null THEN maxs:=0; ELSE INC(maxs); END;
  IF maxs>=maxsymbols THEN Restriction(6); END;
  WITH sn[maxs] DO
    typ:=tp; spix:=spx; aliasspix:=spix; nra:=0;
    CASE typ OF
      t:  IF maxt=null THEN maxt:=0; ELSE INC(maxt); END;
          IF maxp=null THEN maxp:=0; ELSE INC(maxp); END;
          IF maxt>=maxterminals THEN Restriction(7); END;
    | pr: IF maxp=null
          THEN SemErr(25,line,col); maxp:=0; maxt:=0;
          ELSE INC(maxp);
          END;
          sem1:=0; sem2:=0;
    | nt,err:
          start:=0; del:=FALSE; firstat:=NIL;
    END; (*CASE*)
  END; (*WITH*)
  RETURN maxs;
END NewSy;

(* RepSy   Replace symbol sy
---------------------------------------------------------------------*)
PROCEDURE RepSy(sy:CARDINAL; sn1:Symbolnode);
BEGIN sn[sy]:=sn1; END RepSy;

(* SyNr   Gets index of name spix
---------------------------------------------------------------------*)
PROCEDURE SyNr(spix:CARDINAL): CARDINAL;
VAR i: CARDINAL;
BEGIN
  IF maxs=null THEN RETURN null; END;
  i:=0;
  WHILE (i<=maxs) AND (sn[i].spix<>spix) DO INC(i); END;
  IF i>maxs THEN i:=null; END;
  RETURN i;
END SyNr;

(* AllBit   Set all bits in set s
---------------------------------------------------------------------*)
PROCEDURE AllBit(VAR s:Symbolset);
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO maxterminals DIV 16 DO s[i]:=VAL(BITSET,65535); END;
END AllBit;

(* ClearSet   Clears set s
---------------------------------------------------------------------*)
PROCEDURE ClearSet(VAR s:Symbolset; n:CARDINAL);
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO n DIV 16 DO s[i]:={}; END;
END ClearSet;

(* DelBit   Deletes bit n in set s
---------------------------------------------------------------------*)
PROCEDURE DelBit(VAR s:Symbolset; n:CARDINAL);
BEGIN
  EXCL(s[n DIV 16], n MOD 16);
END DelBit;

(* Empty   TRUE if set s is empty
---------------------------------------------------------------------*)
PROCEDURE Empty(VAR s:Symbolset; n:CARDINAL):BOOLEAN;
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO n DIV 16 DO
    IF s[i]<>{} THEN RETURN FALSE; END;
  END;
  RETURN TRUE;
END Empty;

(* InSet   TRUE if s1 <= s2
---------------------------------------------------------------------*)
PROCEDURE InSet(VAR s1,s2:Symbolset; n:CARDINAL):BOOLEAN;
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO n DIV 16 DO
    IF NOT(s1[i]<=s2[i]) THEN RETURN FALSE; END;
  END;
  RETURN TRUE;
END InSet;

(* IsInSet   TRUE if n is in set s
---------------------------------------------------------------------*)
PROCEDURE IsInSet(n:CARDINAL; VAR s:Symbolset):BOOLEAN;
BEGIN RETURN (n MOD 16) IN s[n DIV 16]; END IsInSet;

(* PrintSet   ddt output of set s
---------------------------------------------------------------------*)
PROCEDURE PrintSet(VAR s:Symbolset; n:CARDINAL);
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO n DIV 16 DO
    WriteCard(con,VAL(CARDINAL,s[i]) DIV 256,4);
    WriteCard(con,VAL(CARDINAL,s[i]) MOD 256,4);
  END;
END PrintSet;

(* PutNt   Print name of nonterminal sy
---------------------------------------------------------------------*)
PROCEDURE PutNt(sy:CARDINAL);
VAR
  l:    CARDINAL;
  name: ARRAY[1..50] OF CHAR;
  sn:   Symbolnode;
BEGIN
  GetSy(sy,sn); GetName(sn.spix,name,l);
  WHILE l<12 DO INC(l); name[l]:=" "; END;
  WriteLn(lst); WriteString(lst," "); WriteText(lst,name,l); Write(lst," ");
  column:=15;
END PutNt;

(* PutTermSet   Print names of terminals in set s
---------------------------------------------------------------------*)
PROCEDURE PutTermSet(VAR s:Symbolset);
CONST maxlinelen = 72;
VAR
  i,l:  CARDINAL;
  name: ARRAY[1..50] OF CHAR;
  sn:   Symbolnode;
BEGIN
  FOR i:=0 TO maxt DO
    IF IsInSet(i,s) THEN
      GetSy(i,sn); GetName(sn.spix,name,l);
      IF column+l>maxlinelen THEN
        WriteLn(lst); WriteString(lst," ");
        column:=15;
      END;
      WriteText(lst,name,l); WriteString(lst,"  ");
      INC(column,l+2);
    END; (*IF IsInSet*)
  END; (*FOR*)
  WriteLn(lst);
END PutTermSet;

(* SetBit   Sets bit n in set s
---------------------------------------------------------------------*)
PROCEDURE SetBit(VAR s:Symbolset; n:CARDINAL);
BEGIN
  INCL(s[n DIV 16],n MOD 16);
END SetBit;

(* Unit   s1 = s1 + s2
---------------------------------------------------------------------*)
PROCEDURE Unit(VAR s1,s2:Symbolset; n:CARDINAL);
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO n DIV 16 DO s1[i]:=s1[i]+s2[i]; END;
END Unit;

BEGIN (*cocosym*)
  maxt:=null; maxp:=null; maxs:=null;
  maxany:=0; maxeps:=0; firstmacro:=NIL;
END cocosym.
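
The Complete step of GetFollowSets in cocosym.MOD adds to each nonterminal's follow set the follow sets of the nonterminals recorded in its nts field. The stand-alone module below is a minimal sketch of that closure on invented data, not a piece of Coco: the module name FollowDemo, the three toy nonterminals, the single-word sets and the library module InOut are assumptions chosen to keep the example short.

```modula-2
MODULE FollowDemo;
(* Sketch only: toy version of the Complete step in GetFollowSets.
   Nonterminals 0..2 and all set contents are invented;
   the library module InOut is an assumption. *)
FROM InOut IMPORT WriteCard, WriteLn;

CONST n = 2;                             (* nonterminals 0..n *)
VAR
  ts, nts: ARRAY [0..n] OF BITSET;       (* direct followers / pending nts *)
  marked:  BITSET;                       (* stands in for the mark list m *)
  i, t:    CARDINAL;

PROCEDURE Complete(i: CARDINAL);
  VAR j: CARDINAL;
BEGIN
  IF i IN marked THEN RETURN; END;       (* already visited *)
  INCL(marked, i);
  FOR j := 0 TO n DO
    IF j IN nts[i] THEN
      Complete(j);                       (* finish j first *)
      ts[i] := ts[i] + ts[j];            (* then add its followers to i *)
    END;
  END;
END Complete;

BEGIN
  ts[0] := {1};  nts[0] := {1};          (* follow(0) includes follow(1) *)
  ts[1] := {2};  nts[1] := {2};          (* follow(1) includes follow(2) *)
  ts[2] := {5};  nts[2] := {};
  FOR i := 0 TO n DO
    marked := {};                        (* fresh mark list per nonterminal *)
    Complete(i);
    nts[i] := {};                        (* i is complete, nothing pending *)
  END;
  FOR i := 0 TO n DO                     (* print each completed set *)
    WriteCard(i, 2);
    FOR t := 0 TO 15 DO
      IF t IN ts[i] THEN WriteCard(t, 3); END;
    END;
    WriteLn;
  END;
END FollowDemo.
```

With this data, nonterminal 0 ends up with the followers 1, 2 and 5, nonterminal 1 with 2 and 5, and nonterminal 2 keeps only 5. The mark set guards against endless recursion when two nonterminals refer to each other.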
(* General table-driven syntax analyzer
   This is a parser module generated by Coco from an attributed grammar.
   Before calling the procedure Parse from the main program, initialize
   the scanner (<grammarname>lex.MOD).
*)
DEFINITION MODULE cocosyn;
VAR
  printinput: BOOLEAN; (*trace the input tokens read*)
  printnodes: BOOLEAN; (*trace the G-code interpretation*)

PROCEDURE Parse(VAR correct:BOOLEAN);
END cocosyn.
cocosyn.MOD

(* General table-driven syntax analyzer                  Moe 21.12.83
   ====================================
   01 (21.12.83) First version (rewritten from PL/M)
   02 (28.02.84) New interface for input and errors
   03 (02.04.84) Error in EOL-processing corrected
   04 (08.05.84) New EOL-processing
   05 (23.07.84) For G-code
   06 (30.08.84) Error recovery simplified
   07 (05.04.85) New G-code instruction EPSA (ANYA modified)
   08 (12.04.87) Grammar tables initialized INLINE
   09 (12.04.87) typ,col,line and at exported by cocolex
   10 (07.06.87) Name of error module and scanner procedure constant
---------------------------------------------------------------------*)
IMPLEMENTATION MODULE cocosyn;
FROM Errors IMPORT SyntaxError, Errorptr, Errornode;
FROM FileIo IMPORT con, WriteCard, WriteLn, WriteString;
FROM System IMPORT Allocate;
FROM SYSTEM IMPORT ADDRESS, ADR, INLINE;
FROM cocosem IMPORT Semant;
FROM cocolex IMPORT GetSy, typ, at, line, col;

CONST
  maxname  = 385;
  maxnamep = 45;
  maxcode  = 401;
  maxany   = 3;
  maxeps   = 10;
  maxt     = 34;
  maxp     = 34;
  maxs     = 45;
  startpc  = 397;

CONST (*G-code instructions*)
  t   = 0;   ta   = 1;   nt  = 2;    nta  = 3;
  nts = 4;   ntas = 5;   any = 6;    anya = 7;
  eps = 8;   epsa = 9;   jmp = 10;   ret  = 11;

  errdistmin = 2;  (*min.distance between two errors*)
  lmaxs      = 50; (*max.stack length*)
  eofsy      = 0;  (*token number of endfile symbol*)

TYPE
  Attributenumbers = ARRAY[0..maxp] OF CARDINAL;
  Namepointers     = ARRAY[0..maxnamep] OF CARDINAL;
  Namelist         = ARRAY[1..maxname] OF CHAR;
  Pragma           = RECORD (*semantics for a pragma*)
                       sem2,sem3: CARDINAL;
                     END;
  Pragmalist       = ARRAY[maxt..maxp] OF Pragma;
  Symbolset        = ARRAY[0..maxt DIV 16] OF BITSET; (*set of terminals*)
  Symbolnode       = RECORD (*symbol information (only for nt)*)
                       startpc: CARDINAL;  (*start node of rule for nt*)
                       del:     BOOLEAN;   (*TRUE, if nt is deletable*)
                       first:   Symbolset; (*terminals causing to analyze
                                             this nt*)
                     END;
  Symbollist       = ARRAY[maxp+1..maxs] OF Symbolnode;
  Stack            = ARRAY[1..lmaxs] OF CARDINAL;

VAR
  tab: POINTER TO RECORD (*grammar tables*)
    header:    ARRAY[1..8] OF CARDINAL;   (*not used*)
    code:      ARRAY[1..maxcode] OF CHAR; (*G-code area*)
    ntsymbols: Symbollist;                (*nonterminals information*)
    epsset:    ARRAY[1..maxeps] OF Symbolset;
    anyset:    ARRAY[1..maxany] OF Symbolset;
    nra:       Attributenumbers;          (*no.of attributes*)
    ps:        Pragmalist;                (*semantics for pragmas*)
    namep:     Namepointers;              (*pointers to symbol names*)
    name:      Namelist;                  (*symbol names*)
  END;
  correct:  BOOLEAN;                      (*error indicator*)
  pc:       CARDINAL;                     (*program counter*)
  errdist:  CARDINAL;                     (*current error distance*)
  newlacts: ARRAY[0..maxt] OF CARDINAL;   (*new stack length*)
  newpc:    ARRAY[0..maxt] OF CARDINAL;   (*pc after recovery*)
  s,olds:   Stack;
  lacts:    CARDINAL;                     (*stack pointer*)

PROCEDURE GetSymInstr(pc:CARDINAL;
                      VAR opcode,sy,nextpc,altpc: CARDINAL); FORWARD;
PROCEDURE RestoreStack; FORWARD;
PROCEDURE SaveStack; FORWARD;
PROCEDURE StackElem(i:CARDINAL): CARDINAL; FORWARD;
PROCEDURE Triple(altroot:CARDINAL); FORWARD;

(* Match   Check if sy is member of the specified set
---------------------------------------------------------------------*)
PROCEDURE Match(sy:CARDINAL; set:Symbolset): BOOLEAN;
BEGIN RETURN (sy MOD 16) IN set[sy DIV 16]; END Match;

(* NextSym   Get next symbol
---------------------------------------------------------------------*)
PROCEDURE NextSym;
BEGIN
  LOOP
    GetSy;
    (*IF printinput THEN
        WriteString(con,"$(in:"); WriteCard(con,typ,3);
        WriteString(con,") ");
        IF printnodes THEN
          WriteCard(con,lacts,3); WriteString(con,"| ");
        END;
      END; *)
    IF typ<=maxt THEN RETURN END;
    WITH tab^ DO
      IF correct AND (ps[typ].sem2<>0) THEN Semant(ps[typ].sem2); END;
      IF correct AND (ps[typ].sem3<>0) THEN Semant(ps[typ].sem3); END;
    END;
    IF typ=eofsy THEN RETURN END;
  END;
END NextSym;

(*=========================== ERRORS ================================*)

(* AdjustPc   Adjust pc to next symbol instruction
---------------------------------------------------------------------*)
PROCEDURE AdjustPc(VAR pc:CARDINAL);
BEGIN
  WITH tab^ DO
    IF pc=0 THEN RETURN; END;
    LOOP
      CASE ORD(code[pc]) OF
        t,ta,nt,nta,nts,ntas,any,anya,eps,epsa: EXIT;
      | jmp: pc:=256*ORD(code[pc+1])+ORD(code[pc+2]);
      | ret: pc:=0; EXIT;
      ELSE INC(pc); (*sem*)
      END;
    END;
  END;
END AdjustPc;

(* Error   Report syntax error
---------------------------------------------------------------------*)
PROCEDURE Error(VAR pc,altroot:CARDINAL);
VAR
  e,e1,h: Errorptr;
  i,j:    CARDINAL;
  opcode,sy,nextpc,altpc,pc1: CARDINAL;

  PROCEDURE GiveName(q:Errorptr; sy:CARDINAL);
  VAR p,j: CARDINAL;
  BEGIN
    WITH tab^ DO
      p:=namep[sy]; j:=0;
      WHILE (j<25) AND (name[p+j]<>0C) DO
        INC(j); q^.txt[j]:=name[p+j-1];
      END;
      q^.l:=j;
    END;
  END GiveName;

BEGIN (*Error*)
  correct:=FALSE;
  IF errdist >= errdistmin THEN
    Allocate(h,SIZE(Errornode)); GiveName(h,typ); (*pass near-symbol*)
    h^.next:=NIL; e1:=h;
    pc1:=altroot; AdjustPc(pc1);
    WHILE pc1>0 DO
      GetSymInstr(pc1,opcode,sy,nextpc,altpc);
      IF opcode<any THEN (*t,nt,nts,ta,nta,ntas*)
        Allocate(e,SIZE(Errornode)); GiveName(e,sy); (*pass expected symbol*)
        e1^.next:=e; e1:=e; e^.next:=NIL;
      END;
      pc1:=altpc;
    END; (*WHILE*)
    SyntaxError(h,line,col);
    Triple(altroot); SaveStack;
    IF printnodes THEN
      WriteString(con,"$ typ newpc newlacts$");
      FOR i:=0 TO maxt DO
        IF newpc[i]<>0 THEN
          WriteCard(con,i,5); WriteCard(con,newpc[i],10);
          WriteCard(con,newlacts[i],10); WriteLn(con);
        END; (*IF*)
      END; (*FOR*)
    END; (*IF*)
  ELSE RestoreStack;
  END;
  WHILE newpc[typ]=0 DO
    IF printnodes THEN
      WriteString(con,"$(skip:"); WriteCard(con,typ,0);
      WriteString(con,") ");
    END;
    NextSym;
  END;
  pc:=newpc[typ]; altroot:=pc; lacts:=newlacts[typ]; errdist:=0;
END Error;

(* Fill   Fill triple list with alt-chain starting at pc
---------------------------------------------------------------------*)
PROCEDURE Fill(pc,lacts:CARDINAL);
VAR
  i,opcode,sy,nextpc,altpc: CARDINAL;
  s: Symbolset;
BEGIN
  AdjustPc(pc);
  WHILE pc<>0 DO
    GetSymInstr(pc,opcode,sy,nextpc,altpc);
    CASE opcode OF
      t,ta:
        newpc[sy]:=pc; newlacts[sy]:=lacts;
    | nt,nta,nts,ntas:
        s:=tab^.ntsymbols[sy].first;
        FOR i:=0 TO maxt DO
          IF Match(i,s) THEN newpc[i]:=pc; newlacts[i]:=lacts; END;
        END;
        IF tab^.ntsymbols[sy].del THEN Fill(nextpc,lacts); END;
    | eps,epsa:
        Fill(nextpc,lacts);
    ELSE (*any,anya: nothing*)
    END; (*CASE*)
    pc:=altpc;
  END; (*WHILE*)
END Fill;

(* FillSucc   Fill triple list with succ. of alt-chain at pc
---------------------------------------------------------------------*)
PROCEDURE FillSucc(pc,lacts:CARDINAL);
VAR
  opcode,sy,nextpc,altpc: CARDINAL;
BEGIN
  AdjustPc(pc);
  WHILE pc>0 DO (*fill with successors of alternative-starts*)
    GetSymInstr(pc,opcode,sy,nextpc,altpc);
    IF nextpc>0 THEN Fill(nextpc,lacts); END;
    pc:=altpc;
  END; (*WHILE*)
END FillSucc;

(* GetSymInstr   Get G-code instruction at address pc
---------------------------------------------------------------------*)
PROCEDURE GetSymInstr(pc:CARDINAL; VAR opcode,sy,nextpc,altpc: CARDINAL);
BEGIN (*assert: pc points to a symbol instruction (not RET,JMP,SEM,ANY)*)
  WITH tab^ DO
    opcode:=ORD(code[pc]);
    IF (opcode<=epsa) AND (opcode<>any)
    THEN sy:=ORD(code[pc+1]);
    ELSE sy:=0;
    END;
    CASE opcode OF
      t,nt,eps:         nextpc:=pc+2; altpc:=0;
    | ta,nta,anya,epsa: nextpc:=pc+4;
                        altpc:=256*ORD(code[pc+2])+ORD(code[pc+3]);
    | nts:              nextpc:=pc+3; altpc:=0;
    | ntas:             nextpc:=pc+5;
                        altpc:=256*ORD(code[pc+2])+ORD(code[pc+3]);
    | any:              nextpc:=pc+1; altpc:=0;
    END; (*CASE*)
    AdjustPc(nextpc); AdjustPc(altpc);
  END; (*assert: nextpc,altpc point to symbol instructions or are zero*)
END GetSymInstr;

(* Triple   Fill triple list
---------------------------------------------------------------------*)
PROCEDURE Triple(altroot:CARDINAL);
VAR i: CARDINAL;
BEGIN
  FOR i:=0 TO maxt DO (*clear triple list*)
    newpc[i]:=0; newlacts[i]:=0;
  END;
  FOR i:=1 TO lacts DO (*s[1] contains successor at level 0*)
    FillSucc(StackElem(i),i-1); (*fill with succ.of stacked nt's*)
    Fill(StackElem(i),i-1);
  END;
  FillSucc(altroot,lacts); (*fill with succ.of current alt-chain*)
  Fill(altroot,lacts);     (*fill with alt-chain*)
END Triple;

(*========================= END ERRORS ==============================*)

(*======================== SYNTAXSTACK ==============================*)

PROCEDURE Pop(VAR loc: CARDINAL);
BEGIN
  IF lacts>0
  THEN loc:=s[lacts]; DEC(lacts);
  ELSE WriteString(con,"--- Parser stack underflow.$"); HALT;
  END;
  (*IF printnodes THEN WriteString(con," pop"); END;*)
END Pop;

PROCEDURE Push(loc: CARDINAL);
BEGIN
  IF lacts<lmaxs
  THEN INC(lacts); s[lacts]:=loc;
  ELSE WriteString(con,"--- Parser stack overflow.$"); HALT;
  END;
  (*IF printnodes THEN WriteString(con," push"); END;*)
END Push;

PROCEDURE RestoreStack;
BEGIN s:=olds; END RestoreStack;

PROCEDURE SaveStack;
BEGIN olds:=s; END SaveStack;

PROCEDURE StackElem(i:CARDINAL): CARDINAL;
BEGIN RETURN s[i]; END StackElem;

(* TableContents   A dirty trick to initialize the grammar tables
---------------------------------------------------------------------*)
PROCEDURE TableContents;
BEGIN
  (*%% dont remove or change this comment*)
  INLINE(
401, 34, 34, 45, 10, 3, 45, 385,
(*---G-Code---*)
Ue lp BIOs AL, Die 3, 4359, 256, 5648, 2560, SE, 22, BOS, 36, 811,
3679296070 14247 4120 82), 56, 5125, 9984,12569, 813, 39, 2560, 9985,
3072,20506, 812, 80, 5125, 9984,18459, 7171,10752,15645, 2560,15616,
2590, 273, 101, 7956, 1319, $4, 8195,11520,21258, 83, 2050, 8448,
3329, 4352,33311, 8709, 9984,29987, 2052, 3840, 5122, 9252, 21,
2560,27144, 805, 4, 9739, 549,10024, 278, 151, 549,10506, 141, 2053,
2858, 1062,11052, 1318, 168,11566, 2560,40712, 1547, 812, 186,12037,
9984,46640, 12552, 1807, 2817, 1536,49202, 2817, 512,50739,
281.9,.1.01527 52276, 2817, 5888,55315, 548,13568, 6162, 2817,
6400,58387, 348,13824, 6674, 2816, 6931, 548,14080, 7186,14347, 2
14597,10241, SB 2p Ass), SS), 30, 2820,10554, 2560, 64768, 2107, 32,
273, 297, 7948, 289, 293, 273, 286, 7948, 2561, 4352, 4924, 3594,
273, 2056,15627, 19,15374, 2561, 4352, 2878, 327 1721949 2228972324,
17, 1949, 2561,14600, 2367, 2816, 3648, 279, 345,16640,
4383,16896, 6144, 1291, 1794, 353,17162, 345, 2058,17418, 342, 14, 32, 17, 8005, V5 WEB, Sip LIS, SiS6), 5,18187, UE Mle TO) 18, 7947, 17.556, 18443, 5477, 0, 2816, (*---nt-symbols---*) 17 0, 128, 0, 0, 137, 0,16452, 2694, 0,
App. F cocosyn MOD 323 356 154, 0,16452, 2694, Oy, ste 0,16452, 2694, 0, 357 0, 07.256, OBER 2EZE 0, 0, 8192, 0, 239, 358 0, 0, 0,16384, Ya SS 304, 0, 0, 2048, 359 0, 0, 359, 6, 0, OFS Sill, 0,16384, 0, 360 391, 0, 2, 0, 0, 361 (*---eps followers---*) 362 0, 17 0, 512, 0, 0, 8192, 0, 0, 16, 363 0, 16, 0, 5408, 0, 0,16452, 8166, Op AT27 364 0, 0, 0, 0,16384, 0,49152, 0, 0, 32, 365 (*---any sets---*) 366 65022, 65534, 65535, 65502, 65535, 65535, 65502, 65535, 65535, 367 (*---attribute numbers---*) 368 0, 0, 0, 0, 0 0, 0, 0, 0, 0, 369 370 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 371 0, 0, 0, 0, 17 372 (*---pragma semantic---*) 373 0, 0, 374 (*---name pointers---*) 375 17 197 57 74, 69, 53, 19, 59, 44, 34, 376 Cp 83, I, ele Ana, IR Ar ait A or 377 za WS, le, a NE). er AO ee PADI 378 ZU PAY aii, AES, Bly PA, IN PD. XS Oi 379 366, 2313, 1,2349,5 30073157533 380 (*---name list---*) 381 17743,17920, 8769,19529,16723, 8704, 8801,28281, 8704, 8772, 382 34,17742,17479,21057, 17931719521, 21057,.21577,20302, 212827 383 19746, 34,25966,25715,25965, 8704, 8805,28787, 8704, 8775, 384 21057,19789,16722, 8704, 8809,28194, 34,19777,17234,20307, 385 » ‚ 8704, 8782,20302,21573,21069,18766,16716,21282, 34,28533, 386 " 29730, 34,20562,16711,19777,21282, 34,21077,19525,21282, 387 34, 21317,19777, 20052, 18755, 21282, 34,29541,27938, 34, 388 21573,21069,18766,16716,21282, 105,25701,28276, 26982, 26981, 389 78,21837,16965,20992,10045, 9984, 29184, 21332,21065, 20039, 390 10030, 9984,10108, 9984,10024, 9984,10025, 9984,10075, 9984, 391 10077, 9984,10107, 9984,10109, 9984,10044, 9984,10046, 9984, 392 25455, 29561, 10043, 9984,10042, 9984,10028, 9984,28271,25455, 101, 34, 25455, 29298, 25955, 29728, 26482, 24941, 28001,29218, 393 394 30832, 29285, 29555, 26991, 28160, 24940, 29797, 29294, 24948, 26998, 34, 97,29812, 29289, 25205, 29797, 25856,29561, 28002, 28524, 395 26990 29812, 29289, 25205,29797, 8704, 8815,30068,11617, 396 ,1161 7, 29812,29289,25205,29797, 8704, 8819,25965,24942,29801, 25376, 397 
28001, 25376, 01, 24931,29801,28526, 8704, 8819,25965,24942,298 398 115,31085,25199,27648, 8801,27753,24947, 8302, 25458,28450, 399 0,0); 24941,25890, 400 401 END TableContents; 402 403 404 405 406 407 PROCEDURE Parse(VAR corr:BOOLEAN) ; 408 VAR altroot: CARDINAL; (*root of current alternative chain*) 409 mustread: BOOLEAN; (*TRUE if next symbol must be read*) 410 opcode: CARDINAL; (*instruction code*) 411 running: BOOLEAN; (*interpreter state*) 412 sy: CARDINAL; 413 414 ee ie
BEGIN
  tab:=ADR(TableContents)+10D;  (*initialize the tables*)
  pc:=startpc; altroot:=pc;
  line:=1; col:=1;
  correct:=TRUE; mustread:=TRUE; running:=TRUE;

  WITH tab^ DO
    WHILE running DO
      opcode:=ORD(code[pc]);
      IF mustread AND (opcode<=epsa) THEN
        NextSym; mustread:=FALSE; INC(errdist); altroot:=pc;
      END;
      (*IF printnodes THEN WriteCard(con,pc,5); END;*)
      INC(pc);
      CASE opcode OF
        t:
          IF ORD(typ)=ORD(code[pc])
            THEN
              IF typ=eofsy                       (*t recognized*)
                THEN running:=FALSE;
                ELSE INC(pc); mustread:=TRUE;
              END;
            ELSE Error(pc,altroot);
          END;
      | ta:
          IF ORD(typ)=ORD(code[pc])
            THEN INC(pc,3); mustread:=TRUE;      (*t recognized*)
            ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]); (*try alt.*)
          END;
      | nt,nts:
          sy:=ORD(code[pc]);
          IF Match(typ,ntsymbols[sy].first) OR ntsymbols[sy].del
            THEN (*right nt, parse it*)
              IF opcode=nts THEN INC(pc); Semant(ORD(code[pc])); END;
              Push(pc+1); pc:=ntsymbols[sy].startpc;
              altroot:=pc;
            ELSE Error(pc,altroot);
          END;
      | nta,ntas:
          sy:=ORD(code[pc]);
          IF Match(typ,ntsymbols[sy].first)
            THEN (*right nt, parse it*)
              INC(pc,3);
              IF opcode=ntas THEN Semant(ORD(code[pc])); INC(pc) END;
              Push(pc); pc:=ntsymbols[sy].startpc;
              altroot:=pc;
            ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]); (*try alt.*)
          END;
      | any: mustread:=TRUE;                     (*any recognized*)
      | anya:
          IF Match(typ,anyset[ORD(code[pc])])
            THEN INC(pc,3); mustread:=TRUE;      (*any recognized*)
            ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
          END;
      | eps:
          IF Match(typ,epsset[ORD(code[pc])])
            THEN INC(pc);
            ELSE Error(pc,altroot);
          END;
      | epsa:
          IF Match(typ,epsset[ORD(code[pc])])
            THEN INC(pc,3);                      (*eps recognized*)
            ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
          END;
      | jmp: pc:=ORD(code[pc])*256+ORD(code[pc+1]); (*goto successor*)
      | ret: Pop(pc); altroot:=pc;               (*end of nt*)
      ELSE (*sem*)
        IF correct THEN Semant(ORD(opcode)); END;
      END; (*CASE*)
    END; (*WHILE running*)
  END; (*WITH tab^*)
  corr:=correct;
END Parse;

BEGIN
  printinput:=FALSE;
  printnodes:=FALSE;
  errdist:=100;
  lacts:=0;
END cocosyn.

Cross-reference of identifiers in cocosyn.MOD (line-number columns omitted):

ADDRESS AdjustPc ADR Allocate altpc altroot any anya anyset at
Attributenumbers C cocolex cocosem cocosyn code col con corr correct D del
e e1 eofsy eps epsa epsset errdist errdistmin Error Errornode Errorptr
Errors FileIo Fill FillSucc first FORWARD GetSy GetSymInstr GiveName h HALT
header i INLINE j jmp l lacts line lmaxs loc Match maxany maxcode maxeps
maxname maxnamep maxp maxs maxt mustread name Namelist namep Namepointers
newlacts newpc next nextpc NextSym nra nt nta ntas nts ntsymbols olds
opcode p Parse pc pc1 Pop Pragma Pragmalist printinput printnodes ps Push
q RestoreStack ret running s SaveStack sem2 sem3 Semant set Stack StackElem
startpc sy Symbollist Symbolnode Symbolset SyntaxError System SYSTEM t ta
tab TableContents Triple txt typ WriteCard WriteLn WriteString
328 1 Program listings (* General table-driven syntax App. F analyzer 2 3 This is a parser module generated by Coco from an attributed grammar. 4 Before calling the procedure Parse from the main program, initialize 5 the scanner (<grammarname>lex.MOD) . 7 DEFINITION MODULE 8 VAR 9 printinput: 10 printnodes: -->modulename; BOOLEAN; BOOLEAN; (*trace (*trace the input tokens read*) the G-code interpretation*) 12 PROCEDURE Parse (VAR correct:BOOLEAN) ; 13 END -->modulename. 14 -->implementation 15 (* General table-driven syntax analyzer Re 16 Moe 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 S===2=2=2=2=============2=2===2=2=========> 21.12.83 01 (21.12.83) First version (rewritten from PL/M) 02 (28.02.84) New interface for input and errors 03 (02.04.84) Error in EOL-processing corrected 04 (08.05.84) New EOL-processing 05 (23.07.84) For G-code 06 (30.08.84) Error recovery simplified 07 (05.04.85) New G-code instruction EPSA (ANYA modified) 08 (12.04.87) Grammar tables initialized INLINE 09 (12.04.87) typ,col,line and at exported by cocolex 10 (07.06.87) Name of error module and scanner procedure constant -----------------222----------222-2... 2... 0.0... “) IMPLEMENTATION MODULE -->modulename; FROM FROM Errors FileIO IMPORT IMPORT FROM System IMPORT SyntaxError, Errorptr, Errornode; con, WriteCard, WriteLn, WriteString; Allocate; 33 FROM SYSTEM IMPORT ADDRESS, 34 35 FROM -->semantic analyzer 36 FROM -->input module 37 IMPORT IMPORT ADR, INLINE; Semant; GetSy, typ, at, line, col; 38 -->declarations 39 40 CONST (*G-code instructions*) Als) ta =]; none): 42 nts=4; ntas = 5; any = 6; 43 eps = 8; epsa = 9; jmp = 10; 45 46 i errdistmin Ilmaxs eofsy = 2; = 50; (*min. (*max. = (*token (5 nta anya ret = 3; = 7; = 11% distance between stack length*) number of endfile two errors*) symbol*) 49 TYPE 50 51 52 53. 
Attributenumbers Namepointers Namelist Pragma 54 sem2,sem3: 95 END; 56 57 58 59 Pragmalist Symbolset Symbolnode = = = = ARRAY [0..maxp] OF CARDINAL; ARRAY(0..maxnamep] OF CARDINAL; ARRAY[{1..maxname] OF CHAR; RECORD (*semantics for a pragma*) CARDINAL; N i il} ARRAY [maxt..maxp] OF Pragma; ARRAY[O..maxt DIV 16] OF BITSET; (*set of terminals*) RECORD (*symbol information (only for nt)*)
App. F 60 61 62 63 64 65 68 cocosynframe startpc: del: elite Sie CARDINAL; BOOLEAN; Symbolset; 329 (*start node of rule for nt*) (*TRUE, if nt is deletable*) (*terminals causing this nt to be analyzed*) END; Symbollist Stack = ARRAY (maxp+1..maxs] OF Symbolnode; = ARRAY[1..lmaxs] OF CARDINAL; VAR tab: header: POINTER TO RECORD (*grammar tables*) ARRAY [1..8] OF CARDINAL; (*not used*) ARRAY[1l..maxcode] OF CHAR; code: (*G-code area*) ntsymbols: Symbollist; (*nonterminals information*) epsset: ARRAY[1..maxeps] OF Symbolset; anyset: ARRAY [1..maxany] OF Symbolset; nra: Attributenumbers; (*no.of attributes*) Pragmalist; ps: (*semantics for pragmas*) Namepointers; namep: (*pointers to symbol names*) name: (*symbol names*) Namelist; END; correct: BOOLEAN; (*error indicator*) CARDINAL; pes (*program counter*) CARDINAL; errdist: newlacts: ARRAY [0..maxt] ARRAY [0..maxt] newpc: s,oldsz Stack; CARDINAL; lacts: (*stack PROCEDURE GetSymInstr (pc:CARDINAL; PROCEDURE RestoreStack; PROCEDURE SaveStack; “ # FORWARD; VAR StackElem(i:CARDINAL): Triple (altroot:CARDINAL); Check PROCEDURE (* NextSym --- PROCEDURE (sy MOD 16) Get next --------- ---- CARDINAL; CARDINAL); ---- FORWARD; FORWARD; if sy is member Match(sy:CARDINAL; RETURN opcode,sy,nextpc,altpc: FORWARD; PROCEDURE (* Match pointer*) FORWARD; PROCEDURE BEGIN (*current error distance*) (*new stack length*) (*pc after recovery*) OF CARDINAL; OF CARDINAL; of the specified set:Symbolset): IN set[sy DIV 16]; set BOOLEAN; END Match; symbol - --------- --- --- - -- ee + x) NextSym; BEGIN LOOP GetSy; (*IF printinput THEN WriteString(con,"$(in:"); WriteString(con,") "); IF printnodes THEN WriteCard(con,lacts,3); WriteCard(con, typ, 3); WriteString(con,"| "); END; END; *) IF typ<=maxt WITH tab“ THEN RETURN END; AND (ps[typ].sem2<>0) DO IF correct THEN Semant (ps[typ].sem2); END;
      IF correct AND (ps[typ].sem3<>0) THEN Semant(ps[typ].sem3); END;
    END;
    IF typ=eofsy THEN RETURN END;
  END;
END NextSym;

(*=========================== ERRORS ==================================*)

(* AdjustPc        Adjust pc to next symbol instruction
---------------------------------------------------------------------- *)
PROCEDURE AdjustPc(VAR pc:CARDINAL);
BEGIN
  WITH tab^ DO
    IF pc=0 THEN RETURN; END;
    LOOP
      CASE ORD(code[pc]) OF
        t,ta,nt,nta,nts,ntas,any,anya,eps,epsa: EXIT;
      | jmp: pc:=256*ORD(code[pc+1])+ORD(code[pc+2]);
      | ret: pc:=0; EXIT;
      ELSE INC(pc); (*sem*)
      END;
    END;
  END;
END AdjustPc;

(* Error           Report syntax error
---------------------------------------------------------------------- *)
PROCEDURE Error(VAR pc,altroot:CARDINAL);
  VAR
    e,e1,h: Errorptr;
    i,j: CARDINAL;
    opcode,sy,nextpc,altpc,pc1: CARDINAL;

  PROCEDURE GiveName(q:Errorptr; sy:CARDINAL);
    VAR p,j: CARDINAL;
  BEGIN
    WITH tab^ DO
      p:=namep[sy]; j:=0;
      WHILE (j<25) AND (name[p+j]<>0C) DO
        INC(j); q^.txt[j]:=name[p+j-1];
      END;
      q^.l:=j;
    END;
  END GiveName;

BEGIN (*Error*)
  correct:=FALSE;
  IF errdist >= errdistmin THEN
    Allocate(h,SIZE(Errornode)); h^.next:=NIL; GiveName(h,typ); (*pass near-symbol*)
    pc1:=altroot; e1:=h; AdjustPc(pc1);
    WHILE pc1>0 DO
      GetSymInstr(pc1,opcode,sy,nextpc,altpc);
      IF opcode<any THEN (*t,nt,nts,ta,nta,ntas*)
        Allocate(e,SIZE(Errornode));
        GiveName(e,sy); e^.next:=NIL;            (*pass expected symbol*)
        e1^.next:=e; e1:=e;
      END;
      pc1:=altpc;
    END; (*WHILE*)
    SyntaxError(h,line,col);
    Triple(altroot);
    SaveStack;
    IF printnodes THEN
      WriteString(con,"$ typ newpc newlacts$");
      FOR i:=0 TO maxt DO
        IF newpc[i]<>0 THEN
          WriteCard(con,i,5); WriteCard(con,newpc[i],10);
          WriteCard(con,newlacts[i],10); WriteLn(con);
        END; (*IF*)
      END; (*FOR*)
    END; (*IF*)
  ELSE
    RestoreStack;
  END;
  WHILE newpc[typ]=0 DO
    IF printnodes THEN
      WriteString(con,"$(skip:"); WriteCard(con,typ,0);
      WriteString(con,") ");
    END;
    NextSym;
  END;
  pc:=newpc[typ]; altroot:=pc;
  lacts:=newlacts[typ];
  errdist:=0;
END Error;

(* Fill            Fill triple list with alt-chain starting at pc
---------------------------------------------------------------------- *)
PROCEDURE Fill(pc,lacts:CARDINAL);
  VAR
    i,opcode,sy,nextpc,altpc: CARDINAL;
    s: Symbolset;
BEGIN
  AdjustPc(pc);
  WHILE pc<>0 DO
    GetSymInstr(pc,opcode,sy,nextpc,altpc);
    CASE opcode OF
      t,ta:
        newpc[sy]:=pc; newlacts[sy]:=lacts;
    | nt,nta,nts,ntas:
        s:=tab^.ntsymbols[sy].first;
        FOR i:=0 TO maxt DO
          IF Match(i,s) THEN newpc[i]:=pc; newlacts[i]:=lacts; END;
        END;
        IF tab^.ntsymbols[sy].del THEN Fill(nextpc,lacts); END;
    | eps,epsa:
        Fill(nextpc,lacts);
    ELSE (*any,anya: nothing*)
    END; (*CASE*)
    pc:=altpc;
  END; (*WHILE*)
END Fill;

(* FillSucc        Fill triple list with succ. of alt-chain at pc
---------------------------------------------------------------------- *)
PROCEDURE FillSucc(pc,lacts:CARDINAL);
  VAR
    opcode,sy,nextpc,altpc: CARDINAL;
BEGIN
  AdjustPc(pc);
  WHILE pc>0 DO (*fill with successors of alternative-starts*)
    GetSymInstr(pc,opcode,sy,nextpc,altpc);
    IF nextpc>0 THEN Fill(nextpc,lacts); END;
    pc:=altpc;
  END; (*WHILE*)
END FillSucc;


(* GetSymInstr     Get G-code instruction at address pc
---------------------------------------------------------------------- *)
PROCEDURE GetSymInstr(pc:CARDINAL; VAR opcode,sy,nextpc,altpc: CARDINAL);
BEGIN (*assert: pc points to a symbol instruction (not RET, JMP, SEM, ANY)*)
  WITH tab^ DO
    opcode:=ORD(code[pc]);
    IF (opcode<=epsa) AND (opcode<>any)
      THEN sy:=ORD(code[pc+1]);
      ELSE sy:=0;
    END;
    CASE opcode OF
      t,nt,eps:          nextpc:=pc+2; altpc:=0;
    | ta,nta,anya,epsa:  nextpc:=pc+4;
                         altpc:=256*ORD(code[pc+2])+ORD(code[pc+3]);
    | nts:               nextpc:=pc+3; altpc:=0;
    | ntas:              nextpc:=pc+5;
                         altpc:=256*ORD(code[pc+2])+ORD(code[pc+3]);
    | any:               nextpc:=pc+1; altpc:=0;
    END; (*CASE*)
    AdjustPc(nextpc); AdjustPc(altpc);
  END;
  (*assert: nextpc,altpc point to symbol instructions or are zero*)
END GetSymInstr;


(* Triple          Fill triple list
---------------------------------------------------------------------- *)
PROCEDURE Triple(altroot:CARDINAL);
  VAR
    i: CARDINAL;
BEGIN
  FOR i:=0 TO maxt DO (*clear triple list*)
    newpc[i]:=0; newlacts[i]:=0;
  END;
  FOR i:=1 TO lacts DO (*fill with succ.of stacked nt's*)
    (*s[1] contains successor at level 0*)
    FillSucc(StackElem(i),i-1);
    Fill(StackElem(i),i-1);
  END;
  FillSucc(altroot,lacts); (*fill with succ.of current alt-chain*)
  Fill(altroot,lacts);     (*fill with alt-chain*)
END Triple;

(*========================= END ERRORS ================================*)
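
GetSymInstr reads the address of an alternative as two consecutive code bytes, high byte first. The following stand-alone module is only a sketch of that decoding step, assuming a PIM-style compiler with the standard InOut library. The module name AltAddressDemo, the 8-byte code array and the sample byte values are hypothetical. The opcode value ta = 1 is taken on the assumption that the opcode constants of the frame are numbered consecutively from t = 0.

```modula-2
MODULE AltAddressDemo;
(* Sketch only: decodes one hypothetical "ta" instruction the way
   GetSymInstr does. Array size and byte values are made up. *)
FROM InOut IMPORT WriteCard, WriteLn;

CONST
  ta = 1; (*assumed opcode value of "ta"*)

VAR
  code: ARRAY [1..8] OF CHAR; (*stand-in for the G-code area*)
  pc,nextpc,altpc: CARDINAL;

BEGIN
  (* layout of a ta instruction: opcode, symbol, alt high byte, alt low byte *)
  code[1]:=CHR(ta);
  code[2]:=CHR(7);  (*hypothetical terminal number*)
  code[3]:=CHR(1);  (*high byte of the alternative address*)
  code[4]:=CHR(5);  (*low byte of the alternative address*)

  pc:=1;
  nextpc:=pc+4;                                (*instruction is 4 bytes long*)
  altpc:=256*ORD(code[pc+2])+ORD(code[pc+3]);  (*256*1+5 = 261*)

  WriteCard(nextpc,6); WriteCard(altpc,6); WriteLn;
END AltAddressDemo.
```

In the listing itself both values are additionally passed through AdjustPc, which skips semantic instructions and follows jumps. That step is left out here.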
(*======================== SYNTAXSTACK ===============================*)

PROCEDURE Pop(VAR loc: CARDINAL);
BEGIN
  IF lacts>0
    THEN loc:=s[lacts]; DEC(lacts);
    ELSE WriteString(con,"--- Parser stack underflow.$"); HALT;
  END;
  (*IF printnodes THEN WriteString(con," pop"); END;*)
END Pop;

PROCEDURE Push(loc: CARDINAL);
BEGIN
  IF lacts<lmaxs
    THEN INC(lacts); s[lacts]:=loc;
    ELSE WriteString(con,"--- Parser stack overflow.$"); HALT;
  END;
  (*IF printnodes THEN WriteString(con," push"); END;*)
END Push;

PROCEDURE RestoreStack;
BEGIN s:=olds; END RestoreStack;

PROCEDURE SaveStack;
BEGIN olds:=s; END SaveStack;

PROCEDURE StackElem(i:CARDINAL): CARDINAL;
BEGIN RETURN s[i]; END StackElem;

(* TableContents   A dirty trick to initialize the grammar tables
---------------------------------------------------------------------- *)
PROCEDURE TableContents;
BEGIN (*%% dont remove or change this comment*)
-->tables
END TableContents;

PROCEDURE Parse(VAR corr:BOOLEAN);
  VAR
    altroot:  CARDINAL; (*root of current alternative chain*)
    mustread: BOOLEAN;  (*TRUE if next symbol must be read*)
    opcode:   CARDINAL; (*instruction code*)
    running:  BOOLEAN;  (*interpreter state*)
    sy:       CARDINAL;
BEGIN
  tab:=ADR(TableContents)+10D;  (*initialize the tables*)
  pc:=startpc; altroot:=pc;
  line:=1; col:=0;
  correct:=TRUE; mustread:=TRUE; running:=TRUE;
  WITH tab^ DO
    WHILE running DO
      opcode:=ORD(code[pc]);
      IF mustread AND (opcode<=epsa) THEN
        NextSym; mustread:=FALSE; INC(errdist); altroot:=pc;
      END;
      (*IF printnodes THEN WriteCard(con,pc,5); END;*)
      INC(pc);
      CASE opcode OF
        t:
          IF ORD(typ)=ORD(code[pc])
            THEN
              IF typ=eofsy                       (*t recognized*)
                THEN running:=FALSE;
                ELSE INC(pc); mustread:=TRUE;
              END;
            ELSE Error(pc,altroot);
          END;
      | ta:
          IF ORD(typ)=ORD(code[pc])
            THEN INC(pc,3); mustread:=TRUE;      (*t recognized*)
            ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]); (*try alt.*)
          END;
      | nt,nts:
          sy:=ORD(code[pc]);
          IF Match(typ,ntsymbols[sy].first) OR ntsymbols[sy].del
            THEN (*right nt, parse it*)
              IF opcode=nts THEN INC(pc); Semant(ORD(code[pc])); END;
              Push(pc+1); pc:=ntsymbols[sy].startpc;
              altroot:=pc;
            ELSE Error(pc,altroot);
          END;
      | nta,ntas:
          sy:=ORD(code[pc]);
          IF Match(typ,ntsymbols[sy].first)
            THEN (*right nt, parse it*)
              INC(pc,3);
              IF opcode=ntas THEN Semant(ORD(code[pc])); INC(pc) END;
              Push(pc); pc:=ntsymbols[sy].startpc;
              altroot:=pc;
            ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]); (*try alt.*)
          END;
      | any: mustread:=TRUE;                     (*any recognized*)
      | anya:
          IF Match(typ,anyset[ORD(code[pc])])
            THEN INC(pc,3); mustread:=TRUE;      (*any recognized*)
            ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
          END;
      | eps:
          IF Match(typ,epsset[ORD(code[pc])])
            THEN INC(pc);
            ELSE Error(pc,altroot);
          END;
      | epsa:
          IF Match(typ,epsset[ORD(code[pc])])
            THEN INC(pc,3);                      (*eps recognized*)
            ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
          END;
      | jmp: pc:=ORD(code[pc])*256+ORD(code[pc+1]); (*goto successor*)
      | ret: Pop(pc); altroot:=pc;               (*end of nt*)
      ELSE (*sem*)
        IF correct THEN Semant(ORD(opcode)); END;
      END; (*CASE*)
    END; (*WHILE running*)
  END; (*WITH tab^*)
  corr:=correct;
END Parse;

BEGIN
  printinput:=FALSE;
  printnodes:=FALSE;
  errdist:=100;
  lacts:=0;
END -->modulename.

Cross-reference of identifiers in cocosynframe (line-number columns omitted):

ADDRESS AdjustPc ADR Allocate altpc altroot analyzer any anya anyset at
Attributenumbers C code col con corr correct D declarations del e e1 eofsy
eps epsa epsset errdist errdistmin Error Errornode Errorptr Errors FileIo
Fill FillSucc first FORWARD GetSy GetSymInstr GiveName h HALT header i
implementation INLINE input j jmp l lacts line lmaxs loc Match maxany
maxcode maxeps maxname maxnamep maxp maxs maxt module modulename mustread
name Namelist namep Namepointers newlacts newpc next nextpc NextSym nra nt
nta ntas nts ntsymbols olds opcode p Parse pc pc1 Pop Pragma Pragmalist
printinput printnodes ps Push q RestoreStack ret running s SaveStack sem2
sem3 Semant semantic set Stack StackElem startpc sy Symbollist Symbolnode
Symbolset SyntaxError SYSTEM System t ta tab TableContents tables Triple
txt typ WriteCard WriteLn WriteString
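
The Match procedure of the frame keeps a set of terminals as an array of BITSET words with 16 terminals per word, so symbol sy lives in word sy DIV 16 at bit sy MOD 16. The module below is a minimal sketch of that representation, assuming a PIM-style compiler with the standard InOut library. The module name MatchDemo, the value maxt = 40, the helper Include and the sample terminal numbers are hypothetical and do not occur in the listing.

```modula-2
MODULE MatchDemo;
(* Sketch only: set of terminals stored as ARRAY OF BITSET,
   16 terminals per word, as in the Symbolset type of the frame. *)
FROM InOut IMPORT WriteString, WriteLn;

CONST
  maxt = 40; (*hypothetical highest terminal number*)

TYPE
  Symbolset = ARRAY [0..maxt DIV 16] OF BITSET;

VAR
  s: Symbolset;
  i: CARDINAL;

(* hypothetical helper: add terminal sy to the set *)
PROCEDURE Include(sy:CARDINAL; VAR set:Symbolset);
BEGIN
  INCL(set[sy DIV 16], sy MOD 16);
END Include;

(* same membership test as Match in the listing *)
PROCEDURE Match(sy:CARDINAL; set:Symbolset): BOOLEAN;
BEGIN
  RETURN (sy MOD 16) IN set[sy DIV 16];
END Match;

BEGIN
  FOR i:=0 TO maxt DIV 16 DO s[i]:={}; END; (*start with the empty set*)
  Include(3,s); Include(17,s);
  IF Match(17,s) THEN WriteString("17 is in the set"); WriteLn; END;
  IF NOT Match(18,s) THEN WriteString("18 is not in the set"); WriteLn; END;
END MatchDemo.
```

Restricting each word to bits 0..15 keeps the sketch independent of whether the compiler's BITSET is 16 or 32 bits wide.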
(* cocotst         Perform various tests with the top-down graph    Moe 12.1.83

   This module tests
   a) if all nonterminals can be reached from the start symbol
   b) if there exist productions for all nonterminals
   c) if all nonterminals can be derived to terminals
   d) if the grammar is free of circular derivations
   e) if the grammar satisfies the LL(1)-conditions
---------------------------------------------------------------------- *)
DEFINITION MODULE cocotst;

PROCEDURE FindCircularRules(VAR ok:BOOLEAN);
(* Finds and prints the circular part of the grammar.
   ok means: no circular part*)

PROCEDURE LL1Test(VAR ll1:BOOLEAN);
(* Checks if the grammar satisfies the LL(1) conditions*)

PROCEDURE TestCompleteness(VAR ok:BOOLEAN);
(* ok=TRUE if all nonterminals have rules*)

PROCEDURE TestIfAllNtReached(VAR ok:BOOLEAN);
(* ok=TRUE if all nonterminals can be reached from the start symbol*)

PROCEDURE TestIfNtToTerm(VAR ok:BOOLEAN);
(* ok=TRUE if all nonterminals can be reduced to terminals*)

END cocotst.
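
Test d) works on a list of single derivations "left --> right" and repeatedly deletes every entry whose left side never occurs as a right side, or whose right side never occurs as a left side. Whatever remains is circular. The module below is only a sketch of that elimination step on a hand-made list, assuming a PIM-style compiler with the standard InOut library. The module name CircDemo, the helper Enter and the four sample derivations are hypothetical. In cocotst the list is built from the grammar graph instead.

```modula-2
MODULE CircDemo;
(* Sketch only: elimination of non-circular single derivations,
   following the REPEAT loop of FindCircularRules. *)
FROM InOut IMPORT WriteCard, WriteString, WriteLn;

CONST
  n = 4; (*number of sample derivations*)

TYPE
  Circrule = RECORD
    left,right: CARDINAL;
    del: BOOLEAN;
  END;

VAR
  c: ARRAY [1..n] OF Circrule;
  i,j: CARDINAL;
  changed,rside,lside: BOOLEAN;

(* hypothetical helper: store derivation l --> r in entry k *)
PROCEDURE Enter(k,l,r:CARDINAL);
BEGIN
  c[k].left:=l; c[k].right:=r; c[k].del:=FALSE;
END Enter;

BEGIN
  (* sample derivations: 1-->2, 2-->1, 2-->3, 4-->1 *)
  Enter(1,1,2); Enter(2,2,1); Enter(3,2,3); Enter(4,4,1);
  REPEAT
    changed:=FALSE;
    FOR i:=1 TO n DO
      IF NOT c[i].del THEN
        rside:=FALSE; lside:=FALSE;
        FOR j:=1 TO n DO
          IF NOT c[j].del THEN
            IF c[i].left=c[j].right THEN rside:=TRUE; END;
            IF c[j].left=c[i].right THEN lside:=TRUE; END;
          END;
        END;
        IF NOT rside OR NOT lside THEN
          c[i].del:=TRUE; changed:=TRUE;
        END;
      END;
    END;
  UNTIL NOT changed;
  FOR i:=1 TO n DO (*print what is left: the circular part*)
    IF NOT c[i].del THEN
      WriteCard(c[i].left,3); WriteString(" --> ");
      WriteCard(c[i].right,3); WriteLn;
    END;
  END;
END CircDemo.
```

With these sample derivations 2-->3 and 4-->1 are deleted in the first pass, and the cycle 1-->2, 2-->1 remains.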
(* cocotst         Perform various tests with the top-down graph    Moe 11.1.84

   This module tests
   a) if all nonterminals can be reached from the start symbol
   b) if there exist productions for all nonterminals
   c) if all nonterminals can be derived to terminals
   d) if the grammar is free of circular derivations
   e) if the grammar satisfies the LL(1)-conditions
---------------------------------------------------------------------- *)
IMPLEMENTATION MODULE cocotst;

FROM cocogra IMPORT rootloc, ClearMarkList, Deletable, DelNode, Graphnode,
                    GetNode, Mark, Marked, Marklist;
FROM cocolex IMPORT ddt, GetName;
FROM cocolst IMPORT lst;
FROM cocosym IMPORT maxp, maxs, maxt, ClearSet, GetF, GetFirstSet, GetFo,
                    GetSy, IsInSet, RepSy, SetBit, Unit, Symbolnode,
                    Symbolset, Symboltype;
FROM FileIo  IMPORT con, WriteCard, WriteString, WriteText, WriteLn;

VAR
  headline: BOOLEAN; (*TRUE if header shall be printed*)
  ll:       BOOLEAN; (*TRUE if LL(1) conditions hold*)


(* FindCircularRules    Test grammar for circular derivations
---------------------------------------------------------------------- *)
PROCEDURE FindCircularRules(VAR ok:BOOLEAN);
  CONST
    circmax = 150;
  TYPE
    Circrule = RECORD
      left,right: CARDINAL;
      del: BOOLEAN;
    END;
    Circrulelist = ARRAY[1..circmax] OF Circrule;
  VAR
    c:           Circrulelist;
    changed:     BOOLEAN;
    headline:    BOOLEAN;
    i,j,k,dummy: CARDINAL;
    lcirc:       CARDINAL;
    m:           Marklist;
    singleset:   Marklist; (*set of single nonterminals in a production*)
    sn:          Symbolnode;
    rside,lside: BOOLEAN;

  PROCEDURE GetSingles(loc:CARDINAL; VAR singles:Marklist);
    VAR gn: Graphnode;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    Mark(loc,m);
    GetNode(loc,gn);
    CASE gn.typ OF
      eps:   GetSingles(gn.rp,singles);
    | t,any: ;
    | nt:    IF Deletable(gn.rp) THEN Mark(gn.sp,singles); END;
             IF DelNode(gn) THEN GetSingles(gn.rp,singles); END;
    END; (*CASE*)
    GetSingles(gn.lp,singles);
  END GetSingles;

  PROCEDURE PutCirc(i:CARDINAL);
    VAR
      l: CARDINAL;
      name: ARRAY[1..50] OF CHAR;
      sn: Symbolnode;
  BEGIN
    IF headline THEN
      WriteLn(lst);
      WriteString(lst,"Circular part for this grammar:");
      WriteLn(lst);
      headline:=FALSE;
    END;
    WriteString(lst," ");
    GetSy(c[i].left,sn); GetName(sn.spix,name,l);
    WriteText(lst,name,l); WriteString(lst," --> ");
    GetSy(c[i].right,sn); GetName(sn.spix,name,l);
    WriteText(lst,name,l); WriteLn(lst);
  END PutCirc;

BEGIN (*FindCircularRules*)
  lcirc:=0;
  (*---------------------------- fill list of circular derivations c*)
  FOR i:=maxp+1 TO maxs DO
    ClearMarkList(singleset); ClearMarkList(m);
    GetSy(i,sn);
    GetSingles(sn.start,singleset); (*get nt's j such that i->j*)
    FOR j:=maxp+1 TO maxs DO
      IF Marked(j,singleset) THEN
        INC(lcirc);
        WITH c[lcirc] DO left:=i; right:=j; del:=FALSE; END;
        IF ddt["D"] THEN
          WriteCard(con,lcirc,6); WriteCard(con,i,6);
          WriteCard(con,j,6); WriteLn(con);
        END;
      END; (*IF Marked*)
    END; (*FOR j*)
  END; (*FOR i*)
  (*---------------------- remove non circular derivations from c*)
  REPEAT
    changed:=FALSE;
    FOR i:=1 TO lcirc DO
      IF NOT c[i].del THEN
        rside:=FALSE; lside:=FALSE;
        FOR j:=1 TO lcirc DO
          IF NOT c[j].del THEN
            IF c[i].left=c[j].right THEN rside:=TRUE; END;
            IF c[j].left=c[i].right THEN lside:=TRUE; END;
          END;
        END; (*FOR j*)
        IF NOT rside OR NOT lside THEN
          c[i].del:=TRUE; changed:=TRUE;
          IF ddt["D"] THEN
            WriteCard(con,i,6); WriteString(con," deleted$");
          END;
        END;
      END; (*IF NOT c[i].del*)
    END; (*FOR*)
  UNTIL NOT changed;
  (*---- c contains the circular part of the grammar. Print it*)
  ok:=TRUE; headline:=TRUE;
  FOR i:=1 TO lcirc DO
    IF NOT c[i].del THEN PutCirc(i); ok:=FALSE; END;
  END;
  IF ok THEN
    WriteLn(lst);
    WriteString(lst,"Grammar contains no circular derivations.");
    WriteLn(lst);
  END;
END FindCircularRules;


(* LL1Error        Print LL(1) error message
---------------------------------------------------------------------- *)
PROCEDURE LL1Error(code,line,sy:CARDINAL);
  VAR
    l: CARDINAL;
    name: ARRAY[1..50] OF CHAR;
    sn: Symbolnode;
BEGIN
  IF headline THEN
    headline:=FALSE;
    WriteLn(lst); WriteString(lst,"LL(1)-error(s):"); WriteLn(lst);
  END;
  WriteString(lst," line"); WriteCard(lst,line,4);
  GetSy(sy,sn); GetName(sn.spix,name,l);
  WriteString(lst," ");
  CASE code OF
    1: WriteText(lst,name,l);
       WriteString(lst," is start of more than one alternative.");
  | 2: WriteText(lst,name,l);
       WriteString(lst," is start and successor of deletable ");
       WriteString(lst,"rest of rule.");
  END;
  WriteLn(lst);
  ll:=FALSE;
END LL1Error;


(* LL1Test         Collects terminal sets and checks LL(1) conditions
---------------------------------------------------------------------- *)
PROCEDURE LL1Test(VAR ll1:BOOLEAN);
  VAR
    dummy: CARDINAL;
    gn:    Graphnode;
    i,loc: CARDINAL;
    m:     Marklist;
    sn:    Symbolnode;

  PROCEDURE Test(VAR s1,s2:Symbolset; code,line:CARDINAL);
    VAR i:CARDINAL;
  BEGIN
    FOR i:=0 TO maxt DO
      IF IsInSet(i,s1) AND IsInSet(i,s2) THEN
        LL1Error(code,line,i);
      END;
    END;
  END Test;

  PROCEDURE CheckAlternatives(loc,sym:CARDINAL);
    VAR
      gn: Graphnode;
      locset,s,first,follow: Symbolset;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    GetNode(loc,gn);
    IF ddt["F"] THEN
      WriteCard(con,loc,6); WriteCard(con,ORD(gn.typ),6);
      WriteCard(con,gn.sp,6); WriteLn(con);
    END;
    IF Deletable(loc) THEN
      GetFirstSet(loc,s); GetFo(sym,follow);
      Test(s,follow,2,gn.line);
    END;
    ClearSet(s,maxt);
    WHILE loc<>0 DO
      Mark(loc,m);
      GetNode(loc,gn);
      IF DelNode(gn)
        THEN GetFirstSet(gn.rp,locset);
        ELSE ClearSet(locset,maxt);
      END;
      CASE gn.typ OF
        t:       SetBit(locset,gn.sp);
      | nt:      GetF(gn.sp,first); Unit(locset,first,maxt);
      | eps,any: ;
      END;
      Test(s,locset,1,gn.line);
      Unit(s,locset,maxt);
      CheckAlternatives(gn.rp,sym);
      loc:=gn.lp;
    END;
  END CheckAlternatives;

BEGIN (*LL1Test*)
  ll:=TRUE; headline:=TRUE;
  FOR i:=maxp+1 TO maxs DO
    ClearMarkList(m);
    GetSy(i,sn);
    CheckAlternatives(sn.start,i);
  END;
  IF ll THEN
    WriteLn(lst);
    WriteString(lst,"Grammar satisfies LL(1)-conditions."); WriteLn(lst);
  END;
  ll1:=ll;
END LL1Test;


(* TestCompleteness     Test if all nonterminals have rules
---------------------------------------------------------------------- *)
PROCEDURE TestCompleteness(VAR ok:BOOLEAN);
VAR
  sn: Symbolnode;
  i,l,dummy: CARDINAL;
  name: ARRAY[1..50] OF CHAR;
BEGIN
  ok:=TRUE;
  FOR i:=maxp+1 TO maxs DO
    GetSy(i,sn);
    IF sn.start=0 THEN
      IF ok THEN
        WriteLn(lst);
        WriteString(lst,"Nonterminals without rules:"); WriteLn(lst);
      END;
      GetName(sn.spix,name,l);
      WriteString(lst," "); WriteText(lst,name,l); WriteLn(lst);
      ok:=FALSE;
    END;
  END; (*FOR*)
  IF ok THEN
    WriteLn(lst);
    WriteString(lst,"All nonterminals have rules."); WriteLn(lst);
  END;
END TestCompleteness;

(* TestIfAllNtReached   Tests if all nts can be reached
   ---------------------------------------------------------------- *)
PROCEDURE TestIfAllNtReached(VAR ok:BOOLEAN);
VAR
  gn: Graphnode;
  i,l,dummy: CARDINAL;
  m: Marklist;
  name: ARRAY[1..50] OF CHAR;
  sn: Symbolnode;
  reached: Marklist;

  PROCEDURE MarkReachedNts(loc:CARDINAL);
  VAR gn: Graphnode; sn: Symbolnode;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
    Mark(loc,m);
    GetNode(loc,gn);
    WITH gn DO
      IF (typ=nt) AND NOT Marked(sp,reached) THEN
        Mark(sp,reached);
        GetSy(sp,sn);
        MarkReachedNts(sn.start);
      END;
      MarkReachedNts(lp);
      MarkReachedNts(rp);
    END;
  END MarkReachedNts;

BEGIN
  ClearMarkList(m); ClearMarkList(reached);
  GetNode(rootloc,gn); GetSy(gn.sp,sn); Mark(gn.sp,reached);
  MarkReachedNts(sn.start);
  ok:=TRUE;
  FOR i:=maxp+1 TO maxs DO (*report not marked symbols*)
    IF NOT Marked(i,reached) THEN
      GetSy(i,sn); GetName(sn.spix,name,l);
      WriteString(lst,"Nonterminal "); WriteText(lst,name,l);
      WriteString(lst," cannot be reached."); WriteLn(lst);
      ok:=FALSE;
    END;
  END;
  IF ok THEN
    WriteLn(lst);
    WriteString(lst,"All nonterminals can be reached."); WriteLn(lst);
  END;
END TestIfAllNtReached;

(* TestIfNtToTerm   Test if all nt can be derived to t
   ---------------------------------------------------------------- *)
PROCEDURE TestIfNtToTerm(VAR ok:BOOLEAN);
VAR
  i,l,dummy: CARDINAL;
  sn: Symbolnode;
  name: ARRAY[1..50] OF CHAR;
  changed: BOOLEAN;
  termlist: Marklist; (*list of nts which can be derived to t*)
  m: Marklist;
  term: BOOLEAN;

  PROCEDURE IsTerm(loc:CARDINAL): BOOLEAN;
  VAR gn: Graphnode;
  BEGIN
    IF (loc=0) OR Marked(loc,m) THEN RETURN FALSE; END;
    Mark(loc,m);
    GetNode(loc,gn);
    WITH gn DO
      IF (typ=nt) AND NOT Marked(sp,termlist)
      THEN RETURN IsTerm(lp);
      ELSE RETURN (rp=0) OR IsTerm(rp) OR IsTerm(lp);
      END;
    END;
  END IsTerm;

BEGIN (*TestIfNtToTerm*)
  ClearMarkList(termlist);
  REPEAT
    changed:=FALSE;
    FOR i:=maxp+1 TO maxs DO
      IF NOT Marked(i,termlist) THEN
        GetSy(i,sn); ClearMarkList(m);
        term:=IsTerm(sn.start);
        IF term THEN Mark(i,termlist); changed:=TRUE; END;
        IF ddt["E"] THEN
          WriteCard(con,i,6);
          IF term
          THEN WriteString(con," reducable to term.$");
          ELSE WriteString(con," not reducable to term.$");
          END;
        END;
      END; (*IF NOT Marked*)
    END; (*FOR*)
  UNTIL NOT changed;
  ok:=TRUE;
  WriteLn(lst);
  FOR i:=maxp+1 TO maxs DO
    IF NOT Marked(i,termlist) THEN
      GetSy(i,sn); GetName(sn.spix,name,l);
      WriteText(lst,name,l);
      WriteString(lst," cannot be derived to terminals."); WriteLn(lst);
      ok:=FALSE;
    END;
  END; (*FOR*)
  IF ok THEN
    WriteString(lst,"All nonterminals can be derived to terminals.");
    WriteLn(lst);
  END;
END TestIfNtToTerm;

END cocotst.

Cross-reference list for cocotst.MOD (each identifier followed by the numbers of the listing lines on which it occurs): the table is not legible in this copy.
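TestIfNtToTerm repeats a pass over all nonterminals until no further nonterminal can be marked as derivable to terminals. The following is a minimal Python sketch of that fixed-point idea, not the book's code. It assumes a simplified grammar representation (a dictionary mapping each nonterminal to a list of alternatives) instead of Coco's syntax graph, and the names `derives_terminals`, `grammar` and `terminals` are hypothetical.

```python
def derives_terminals(grammar, terminals):
    """Return the set of nonterminals that can derive a terminal string.

    grammar:   dict mapping a nonterminal to a list of alternatives,
               each alternative being a list of symbols (assumed layout).
    terminals: set of terminal symbols.
    """
    productive = set()
    changed = True
    while changed:                      # corresponds to REPEAT ... UNTIL NOT changed
        changed = False
        for nt, alternatives in grammar.items():
            if nt in productive:
                continue
            for alt in alternatives:
                # An alternative is usable once every symbol in it is a
                # terminal or an already marked nonterminal; the empty
                # alternative is trivially usable.
                if all(s in terminals or s in productive for s in alt):
                    productive.add(nt)
                    changed = True
                    break
    return productive


grammar = {
    "S": [["A", "b"], ["C"]],
    "A": [["a"], []],
    "C": [["C", "c"]],      # C only refers to itself and never terminates
}
terminals = {"a", "b", "c"}
result = derives_terminals(grammar, terminals)
unproductive = set(grammar) - result
```

In this example `S` and `A` are marked, while `C` would be reported as a nonterminal that cannot be derived to terminals.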
(* Errors         General module to store error messages       Moe 21.03.84
   ----------------------------------------------------------------
   This module stores information about syntax errors and semantic
   errors. The information can either be retrieved afterwards or be
   printed automatically as simple error messages. Furthermore the
   module contains procedures to report compiler errors and
   implementation restrictions. These procedures cause a program stop.
   ---------------------------------------------------------------- *)
DEFINITION MODULE Errors;

FROM FileIO IMPORT File;

TYPE
  Symbolname = ARRAY[1..25] OF CHAR;
  Errorptr   = POINTER TO Errornode;
  Errornode  = RECORD
                 txt:  Symbolname; (*expected symbol in syntax error message*)
                 l:    CARDINAL;
                 next: Errorptr;
               END;

PROCEDURE CompErr(nr:CARDINAL);
(* Reports compiler error nr and stops the program*)

PROCEDURE GetNextSemErr(VAR nr,line,col:CARDINAL);
(* Gets the error number, the line number and the column number of the
   next semantic error. nr=0 if no next error exists*)

PROCEDURE GetNextSynErr(VAR symbols:Errorptr; VAR line,col:CARDINAL);
(* Gets the expected symbols, the line number and the column number of the
   next syntax error. symbols=NIL if no next error exists*)

PROCEDURE GetNumberOfErrors(VAR synerrors,semerrors:CARDINAL);
(* Gets the total number of syntax errors and semantic errors which
   occurred during compilation*)

PROCEDURE PrintSemErrors(f:File; VAR semerrors:CARDINAL);
(* Prints error messages for all stored semantic errors (line,col,
   error number). semerrors holds the total number of stored semantic
   errors*)

PROCEDURE PrintSynErrors(f:File; VAR synerrors:CARDINAL);
(* Prints error messages for all stored syntax errors (line,col,
   "near symbol",expected symbols).
   synerrors holds the total number of stored syntax errors*)

PROCEDURE PrintSynError(f:File; symbols:Errorptr; col:CARDINAL);
(* Prints one error message line ("* expected symbols").*)

PROCEDURE Restriction(nr:CARDINAL);
(* Reports implementation restriction nr and stops the program*)

PROCEDURE SemErr(nr,line,col:CARDINAL);
(* Stores the error number, line number and column number of a semantic
   error*)

PROCEDURE SyntaxError(symbols:Errorptr; line,col:CARDINAL);
(* Stores the "near-symbol", the expected symbols, the line number and
   the column number of a syntax error*)

END Errors.
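The module stores errors so that they can later be retrieved or printed in source order. The implementation's SemErr and SyntaxError do this by walking a linked list and inserting each new entry by line and then by column. The following is a small Python sketch of that ordering rule only; it uses a plain list of tuples instead of a pointer list, and the name `store_error` is hypothetical.

```python
def store_error(errors, nr, line, col):
    """Insert (nr, line, col) into errors, keeping it sorted by line, then column.

    Mirrors the two search loops of SemErr: first skip entries on earlier
    lines, then skip entries on the same line with a smaller column.
    """
    i = 0
    while i < len(errors) and errors[i][1] < line:
        i += 1
    while i < len(errors) and errors[i][1] == line and errors[i][2] < col:
        i += 1
    errors.insert(i, (nr, line, col))


errors = []
store_error(errors, 7, 10, 5)
store_error(errors, 3, 2, 9)
store_error(errors, 4, 10, 1)
# errors is now ordered by position in the source text,
# independent of the order in which the errors were reported.
```

A sorted store keeps the printing procedures trivial: they only have to traverse the list once from front to back.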
(* Errors         General module to store error messages       Moe 21.03.84
   ----------------------------------------------------------------
   This module stores information about syntax errors and semantic
   errors. The information can either be retrieved afterwards or be
   printed automatically as simple error messages. Furthermore the
   module contains procedures to report compiler errors and
   implementation restrictions. These procedures cause a program stop.
   ---------------------------------------------------------------- *)
IMPLEMENTATION MODULE Errors;

(*imports of definition module*)
FROM FileIO IMPORT File;

(*imports of implementation module*)
FROM FileIO IMPORT con, Write, WriteCard, WriteLn, WriteString, WriteText,
                   Read;
FROM System IMPORT Allocate, Deallocate, Terminate, normal;

TYPE
  Semerrptr = POINTER TO Semerror;
  Semerror  = RECORD
                nr,line,col: CARDINAL;
                next: Semerrptr;
              END;
  Synerrptr = POINTER TO Synerror;
  Synerror  = RECORD
                symbols: Errorptr;
                line,col: CARDINAL;
                next: Synerrptr;
              END;

VAR
  semerr: Semerrptr;
  synerr: Synerrptr;

(* CompErr        Reports compiler error nr and stops the program
   ---------------------------------------------------------------- *)
PROCEDURE CompErr(nr:CARDINAL);
VAR dummy:CARDINAL; ch:CHAR;
BEGIN
  PrintSynErrors(con,dummy); PrintSemErrors(con,dummy);
  WriteString(con,"Compiler error "); WriteCard(con,nr,0);
  WriteString(con,". Program terminated.$");
  WriteString(con,"Press a key to continue.$"); Read(con,ch);
  Terminate(normal);
END CompErr;

(* GetNextSemErr  Gets next semantic error information
   ---------------------------------------------------------------- *)
PROCEDURE GetNextSemErr(VAR nr,line,col:CARDINAL);
VAR p: Semerrptr;
BEGIN
  IF semerr=NIL
  THEN nr:=0; line:=0; col:=0;
  ELSE
    p:=semerr;
    nr:=p^.nr; line:=p^.line; col:=p^.col;
    semerr:=p^.next; Deallocate(p);
  END;
END GetNextSemErr;

(* GetNextSynErr  Gets next syntax error information
   ---------------------------------------------------------------- *)
PROCEDURE GetNextSynErr(VAR symbols:Errorptr; VAR line,col:CARDINAL);
VAR p: Synerrptr;
BEGIN
  IF synerr=NIL
  THEN symbols:=NIL; line:=0; col:=0;
  ELSE
    p:=synerr;
    symbols:=p^.symbols; line:=p^.line; col:=p^.col;
    synerr:=p^.next; Deallocate(p);
  END;
END GetNextSynErr;

(* GetNumberOfErrors  Gets the total number of errors that occurred
   ---------------------------------------------------------------- *)
PROCEDURE GetNumberOfErrors(VAR synerrors,semerrors:CARDINAL);
VAR
  syn: Synerrptr;
  sem: Semerrptr;
BEGIN
  synerrors:=0; syn:=synerr;
  WHILE syn<>NIL DO INC(synerrors); syn:=syn^.next; END;
  semerrors:=0; sem:=semerr;
  WHILE sem<>NIL DO INC(semerrors); sem:=sem^.next; END;
END GetNumberOfErrors;

(* PrintSemErrors Prints simple error messages for semantic errors
   ---------------------------------------------------------------- *)
PROCEDURE PrintSemErrors(f:File; VAR semerrors:CARDINAL);
VAR
  p: Semerrptr;
  synerrors: CARDINAL;
BEGIN
  GetNumberOfErrors(synerrors,semerrors);
  IF semerrors>0 THEN
    WriteString(f,"Semantic errors:$$");
    p:=semerr;
    WHILE p<>NIL DO
      WriteString(f,"line"); WriteCard(f,p^.line,5);
      WriteString(f," col"); WriteCard(f,p^.col,3);
      WriteString(f,": error "); WriteCard(f,p^.nr,0);
      WriteLn(f);
      p:=p^.next;
    END;
  END;
END PrintSemErrors;

(* PrintSym       Print a symbol in error message
   ---------------------------------------------------------------- *)
PROCEDURE PrintSym(f:File; txt:ARRAY OF CHAR; len:CARDINAL);
BEGIN
  IF len=1
  THEN Write(f,'"'); Write(f,txt[0]); Write(f,'"');
  ELSE WriteText(f,txt,len);
  END;
END PrintSym;

(* PrintExpected  Print expected symbols
   ---------------------------------------------------------------- *)
PROCEDURE PrintExpected(f:File; VAR p:Errorptr);
VAR first:BOOLEAN; q:Errorptr;
BEGIN
  first:=TRUE;
  WHILE p<>NIL DO
    IF first THEN first:=FALSE
    ELSIF p^.next=NIL THEN WriteString(f,' or ')
    ELSE WriteString(f,', ')
    END;
    PrintSym(f,p^.txt,p^.l);
    q:=p; p:=p^.next; Deallocate(q);
  END;
  WriteString(f,' expected'); WriteLn(f);
END PrintExpected;

(* PrintSynErrors Prints simple error messages for syntax errors
   ---------------------------------------------------------------- *)
PROCEDURE PrintSynErrors(f:File; VAR synerrors:CARDINAL);
VAR
  err,err1: Synerrptr;
  p: Errorptr;
  semerrors: CARDINAL;
BEGIN
  GetNumberOfErrors(synerrors,semerrors);
  IF synerrors>0 THEN
    WriteString(f,"Syntax errors:$$");
    err:=synerr;
    WHILE err<>NIL DO
      p:=err^.symbols;
      WriteString(f,'line'); WriteCard(f,err^.line,5);
      WriteString(f,' near '); PrintSym(f,p^.txt,p^.l);
      WriteString(f,': ');
      PrintExpected(f,p^.next); Deallocate(p);
      err1:=err; err:=err^.next; Deallocate(err1);
    END;
  END;
END PrintSynErrors;

(* PrintSynError  Prints one error message line
   ---------------------------------------------------------------- *)
PROCEDURE PrintSynError(f:File; symbols:Errorptr; col:CARDINAL);
VAR i:CARDINAL;
BEGIN
  WriteString(f,"***** "); FOR i:=1 TO col-1 DO Write(f," ") END;
  WriteString(f,"* "); PrintExpected(f,symbols^.next); Deallocate(symbols);
END PrintSynError;

(* Restriction    Reports impl. restriction nr and stops the program
   ---------------------------------------------------------------- *)
PROCEDURE Restriction(nr:CARDINAL);
VAR dummy:CARDINAL; ch:CHAR;
BEGIN
  PrintSynErrors(con,dummy); PrintSemErrors(con,dummy);
  WriteString(con,"Implementation restriction "); WriteCard(con,nr,0);
  WriteString(con,". Program terminated.$");
  WriteString(con,"Press a key to continue.$"); Read(con,ch);
  Terminate(normal);
END Restriction;

(* SemErr         Stores information about semantic error
   ---------------------------------------------------------------- *)
PROCEDURE SemErr(nr,line,col:CARDINAL);
VAR e,p,q: Semerrptr;
BEGIN
  Allocate(e,SIZE(Semerror));
  e^.nr:=nr; e^.line:=line; e^.col:=col;
  p:=semerr; q:=NIL;
  WHILE (p<>NIL) AND (p^.line<line) DO q:=p; p:=p^.next; END;
  WHILE (p<>NIL) AND (p^.line=line) AND (p^.col<col) DO
    q:=p; p:=p^.next;
  END;
  IF q=NIL THEN semerr:=e; ELSE q^.next:=e; END;
  e^.next:=p;
END SemErr;

(* SyntaxError    Stores information about syntax error
   ---------------------------------------------------------------- *)
PROCEDURE SyntaxError(symbols:Errorptr; line,col:CARDINAL);
VAR e,p,q: Synerrptr;
BEGIN
  Allocate(e,SIZE(Synerror));
  e^.symbols:=symbols; e^.line:=line; e^.col:=col;
  p:=synerr; q:=NIL;
  WHILE (p<>NIL) AND (p^.line<line) DO q:=p; p:=p^.next; END;
  WHILE (p<>NIL) AND (p^.line=line) AND (p^.col<col) DO
    q:=p; p:=p^.next;
  END;
  IF q=NIL THEN synerr:=e; ELSE q^.next:=e; END;
  e^.next:=p;
END SyntaxError;

BEGIN (*Errors*)
  synerr:=NIL; semerr:=NIL;
END Errors.

Cross-reference list for Errors.MOD (each identifier followed by the numbers of the listing lines on which it occurs): the table is not legible in this copy.
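PrintExpected and PrintSym together produce messages of the form `a, b or c expected`, with one-character symbols placed in double quotes. The following Python sketch reproduces only that formatting rule so that it can be tried out in isolation. It assumes the expected symbols arrive as a list of strings rather than as the linked Errornode list, and the name `format_expected` is hypothetical.

```python
def format_expected(symbols):
    """Build the 'expected' part of a syntax error message.

    The first symbol gets no separator, the last one is preceded by
    ' or ', all others by ', '. Symbols of length 1 are quoted.
    """
    parts = []
    last = len(symbols) - 1
    for i, sym in enumerate(symbols):
        if i == 0:
            sep = ""
        elif i == last:
            sep = " or "
        else:
            sep = ", "
        shown = '"' + sym + '"' if len(sym) == 1 else sym
        parts.append(sep + shown)
    return "".join(parts) + " expected"


message = format_expected(["ident", "number", ";"])
```

Here `message` is the text `ident, number or ";" expected`.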
(* FileIO         Simple IO with more than one file             Moe 16.8.87
   ----------------------------------------------------------------
   This module provides procedures which are similar to those of InOut,
   except that they can be used with more than one file (even with the
   console).
   ---------------------------------------------------------------- *)
DEFINITION MODULE FileIO;

FROM SYSTEM IMPORT WORD;
FROM Toolbox IMPORT DialogPtr;
FROM OS IMPORT ParmBlkPtr;

CONST
  DEL = 177C;
  EF  = 4C;
  EOL = 15C;
  ESC = 33C;
  buffersize = 16*1024;

TYPE
  File = POINTER TO FileRecord;
  FileRecord = RECORD
    ref:    INTEGER;  (*file reference number*)
    volRef: INTEGER;  (*volume (subdirectory) reference number*)
    name:   ARRAY[0..63] OF CHAR;  (*Modula string terminated by 0C*)
    buffer: ARRAY[0..buffersize-1] OF CHAR;
    bp:     CARDINAL; (*index of next byte in buffer*)
    bb:     CARDINAL; (*number of bytes in buffer*)
    output: BOOLEAN;  (*true, if opened for output*)
    eof:    BOOLEAN;  (*true, if no more unread bytes*)
  END;

VAR
  con:    File;     (*console file (screen and keyboard)*)
  Done:   BOOLEAN;  (*TRUE if an operation was successful*)
  termCH: CHAR;     (*first character after input text*)

(* --- for Mac open dialog box (see "Inside Macintosh") --- *)
TYPE
  FilterHook = PROCEDURE(ParmBlkPtr): BOOLEAN;
  DialogHook = PROCEDURE(INTEGER, DialogPtr): INTEGER;
  Filetype   = ARRAY[0..3] OF CHAR;

VAR
  errCode:    INTEGER;     (*file manager status code*)
  filterHook: FilterHook;  (*file filter procedure (init none)*)
  dlgHook:    DialogHook;  (*dialog handling procedure (init none)*)
  ftype:      ARRAY[0..3] OF Filetype;
              (*file types to be handled by open dialog*)
              (*init: ftype[0]:="TEXT", ftype[1..3]:=""*)

(* ---------------------------------------------------------------- *)

PROCEDURE Open(VAR f:File; volRef:INTEGER; fn:ARRAY OF CHAR;
               output:BOOLEAN);
(* Opens file f with name fn on volume (subdirectory) volRef.
   volRef  0:default volume; 1:internal drive; 2:external drive
           negative:volume or subdirectory reference number.
   fn      - If not empty, fn is the name of the file to be opened on
             volume (subdirectory) volRef. The drive number may be placed
             in front of the file name separated by a colon (e.g. 1:name).
             It overwrites volRef.
           - If empty, an open dialog box is displayed which allows
             choosing the volume, subdirectory and filename. The chosen
             values are returned in f^. The value of volRef is irrelevant
             in this case.
             (Advanced programmers: Only those files are displayed whose
             file type is contained in ftype. Own procedures may be
             supplied in the variables "filterHook" and "dlgHook" to
             suppress file names in the open box or to handle additional
             dialog items.)
   output  TRUE:  the specified file is opened for output. Any existing
                  file with the same name is deleted.
           FALSE: the specified file is opened for input.
   Done indicates if the file has been opened successfully.*)

PROCEDURE Close(VAR f:File);
(* Closes file f. f becomes NIL.
   Done indicates if the operation has been successful*)

PROCEDURE Read(f:File; VAR ch:CHAR);
(* Reads a character ch from the file f (no echo on the console).*)

PROCEDURE ReadCard(f:File; VAR val:CARDINAL);
(* Reads a CARDINAL from file f (leading blanks are skipped).
   termCH and Done get values*)

PROCEDURE ReadInt(f:File; VAR val:INTEGER);
(* Reads an INTEGER from file f (leading blanks are skipped).
   termCH and Done get values*)

PROCEDURE ReadString(f:File; VAR s:ARRAY OF CHAR);
(* Reads a string of characters (terminated by " " or CR) from
   file f. termCH and Done get values*)

PROCEDURE ReadWord(f:File; VAR w:CARDINAL);
(* Reads a 16 bit word w from the file f without conversion*)

PROCEDURE Write(f:File; ch:CHAR);
(* Writes a character ch to the file f*)

PROCEDURE WriteCard(f:File; nr:CARDINAL; w:INTEGER);
(* Writes a CARDINAL nr with width w to the file f. If the actual
   width of nr is bigger than w, w is expanded*)

PROCEDURE WriteHex(f:File; a:ARRAY OF WORD; length:INTEGER);
(* Writes length hexadecimal bytes from a to the file f*)

PROCEDURE WriteInt(f:File; i:INTEGER; w:INTEGER);
(* Writes an INTEGER i with w characters to file f. If the actual
   width of i is bigger than w, w is expanded*)

PROCEDURE WriteLn(f:File);
(* Skips to the start of the next line on the file f*)

PROCEDURE WriteString(f:File; s:ARRAY OF CHAR);
(* Writes a string s to the file f. Any occurrence of the character
   "$" in s causes a WriteLn*)

PROCEDURE WriteText(f:File; t:ARRAY OF CHAR; l:INTEGER);
(* Writes a text t with length l to the file f*)

PROCEDURE WriteWord(f:File; w:CARDINAL);
(* Writes a 16 bit word w without conversion to the file f*)

END FileIO.
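The definition module states that WriteString treats every "$" in its argument as a line break, which is why messages such as "Syntax errors:$$" appear in the other listings. The following Python sketch illustrates that convention under simplifying assumptions: it writes to any text stream with a `write` method, uses "\n" as the line terminator instead of the module's EOL character, and the name `write_string` is hypothetical.

```python
import io


def write_string(out, s):
    """Write s to out; '$' starts a new line, a 0C character ends the text."""
    for ch in s:
        if ch == "\0":          # Modula-2 strings may be terminated by 0C
            break
        if ch == "$":
            out.write("\n")     # corresponds to WriteLn(f)
        else:
            out.write(ch)       # corresponds to Write(f, ch)


buf = io.StringIO()
write_string(buf, "Syntax errors:$$")
write_string(buf, "done\0ignored")
text = buf.getvalue()
```

One consequence of this convention is that a literal dollar sign cannot be printed through WriteString; it has to be written as a single character with Write.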
(* FileIO         Simple IO with more than one file             Moe 16.8.87
   ----------------------------------------------------------------
   This module provides procedures which are similar to those of InOut,
   except that they can be used with more than one file (even with the
   console).
   ---------------------------------------------------------------- *)
IMPLEMENTATION MODULE FileIO;

FROM SYSTEM IMPORT WORD, ADR, SETREG, REG, SHORT, VAL;
FROM MemTypes IMPORT Str255, ProcPtr;
FROM OS IMPORT DupFNErr, EOFErr, OSType, ParamBlockRec, FS, PBHOpen,
               PBHCreate, PBClose, PBHDelete, PBWrite, PBRead,
               HFS, GetCatInfo, SetCatInfo,
               SFGetFile, SFPutFile, SFget, SFput, SFReply,
               SFTypeList;
FROM QuickDraw IMPORT Point;
FROM Toolbox IMPORT ModStr, PasStr;
FROM System IMPORT Allocate, Deallocate;
IMPORT Terminal;

PROCEDURE Open(VAR f:File; volRef:INTEGER; fn:ARRAY OF CHAR;
               output:BOOLEAN);
VAR
  par:   ParamBlockRec;
  s:     Str255;
  pt:    Point;
  reply: SFReply;
  tlist: SFTypeList;
  i,j,l: INTEGER;

  PROCEDURE Create(drive:INTEGER; name:ARRAY OF CHAR;
                   type,creator:OSType; VAR status:INTEGER);
  VAR status1:INTEGER; par:ParamBlockRec;
  BEGIN
    WITH par DO
      ioNamePtr:=ADR(name); ioVRefNum:=drive; ioVersNum:=0C; ioDirID:=0;
      status:=FS(PBHCreate,par); status1:=0;
      IF status=DupFNErr THEN
        status1:=FS(PBHDelete,par);
        status:=FS(PBHCreate,par);
      END;
      IF (status=0) AND (status1=0) THEN (*set finder info*)
        ioFDirIndex:=0;
        status:=HFS(GetCatInfo,par);
        IF status=0 THEN
          ioFlFndrInfo.fdType:=type; ioFlFndrInfo.fdCreator:=creator;
          ioDirID:=0; status:=HFS(SetCatInfo,par);
        END;
      END;
    END;
  END Create;

BEGIN
  Done:=TRUE; errCode:=0;
  IF fn[0]=0C THEN (*get file name from dialog box*)
    pt.v:=60; pt.h:=100; PasStr(fn,s);
    IF output
    THEN SFPutFile(pt,s,s,VAL(ProcPtr,dlgHook),reply,SFput)
    ELSE
      i:=0;
      WHILE (i<4) AND (ftype[i,0]<>0C) DO
        FOR j:=0 TO 3 DO tlist[i,j+1]:=ftype[i,j] END;
        INC(i)
      END;
      SFGetFile(pt,s,VAL(ProcPtr,filterHook),i,tlist,
                VAL(ProcPtr,dlgHook),reply,SFget)
    END;
    IF reply.good
    THEN
      l:=ORD(reply.fName[0]);
      FOR i:=0 TO l DO s[i]:=reply.fName[i]; END;
      volRef:=reply.vRefNum
    ELSE errCode:=2 (*cancel*)
    END;
  ELSIF (fn[1]=":") AND (fn[0]>="0") AND (fn[0]<="9") THEN
    volRef:=ORD(fn[0])-ORD("0"); i:=2;
    WHILE (i<=HIGH(fn)) AND (fn[i]<>0C) DO s[i-1]:=fn[i]; INC(i) END;
    s[0]:=CHR(i);
  ELSE PasStr(fn,s);
  END;
  IF output & (errCode=0) THEN
    Create(volRef,s,"TEXT","????",errCode);
  END;
  IF errCode=0 THEN
    WITH par DO
      ioNamePtr:=ADR(s); ioVRefNum:=volRef; ioVersNum:=0C; ioDirID:=0;
      ioPermssn:=0C; ioMisc:=NIL;
      errCode:=FS(PBHOpen,par);
      IF errCode=0 THEN
        Allocate(f,SIZE(FileRecord));
        IF f<>NIL THEN
          f^.ref:=ioRefNum; f^.volRef:=volRef; ModStr(s,f^.name);
          f^.bp:=0; f^.bb:=0; f^.eof:=FALSE; f^.output:=output;
        END;
      END;
    END;
  END;
  IF errCode#0 THEN Done:=FALSE; f:=NIL END;
END Open;

(* Close          Close file f
   ---------------------------------------------------------------- *)
PROCEDURE Close(VAR f:File);
VAR par:ParamBlockRec;
BEGIN
  IF f=NIL THEN RETURN END; (*con cannot be closed*)
  par.ioRefNum:=f^.ref;
  IF f^.output THEN
    par.ioBuffer:=ADR(f^.buffer); par.ioReqCount:=f^.bp;
    par.ioPosMode:=0; par.ioPosOffset:=0;
    errCode:=FS(PBWrite,par)
  END;
  errCode:=FS(PBClose,par); Done:=errCode=0;
  Deallocate(f); f:=NIL;
END Close;

(* Read           Read a character from file f
   ---------------------------------------------------------------- *)
PROCEDURE Read(f:File; VAR ch:CHAR);
VAR par:ParamBlockRec;
BEGIN
  IF f=NIL (*con*)
  THEN Terminal.Read(ch);
  ELSE
    WITH f^ DO
      IF bp>=bb THEN
        par.ioRefNum:=ref;
        par.ioBuffer:=ADR(buffer); par.ioReqCount:=buffersize;
        par.ioPosMode:=0; par.ioPosOffset:=0;
        errCode:=FS(PBRead,par);
        IF errCode=EOFErr THEN errCode:=0 END;
        bb:=SHORT(par.ioActCount); bp:=0;
        IF bb=0 THEN
          buffer[0]:=EF; eof:=TRUE; Done:=FALSE; errCode:=EOFErr
        END
      END;
      ch:=buffer[bp]; INC(bp)
    END
  END;
END Read;

(* ReadCard       Read a CARDINAL-constant from file f
   ---------------------------------------------------------------- *)
PROCEDURE ReadCard(f:File; VAR val:CARDINAL);
VAR ch:CHAR; l:INTEGER;
BEGIN
  IF f=NIL (*con*)
  THEN (*input from terminal*)
    l:=0; val:=0;
    REPEAT Terminal.Read(ch); UNTIL ch<>" ";
    WHILE ch>" " DO
      IF ch=DEL THEN
        IF l>0 THEN Terminal.Write(ch); DEC(l); val:=val DIV 10; END;
      ELSIF (ch>="0") AND (ch<="9") AND
            ((val<6553) OR ((val=6553) AND (ch<="5"))) THEN
        Terminal.Write(ch); INC(l);
        val:=10*val+VAL(CARDINAL,ORD(ch)-ORD("0"));
      END;
      Terminal.Read(ch);
    END;
    Done:=l>0;
  ELSE (*input from file*)
    val:=0; Done:=TRUE;
    REPEAT Read(f,ch) UNTIL ch<>" ";
    WHILE ch>" " DO
      IF (ch>="0") AND (ch<="9") AND Done AND
         ((val<6553) OR ((val=6553) AND (ch<="5"))) THEN
        val:=10*val+VAL(CARDINAL,ORD(ch)-ORD("0"));
      ELSE Done:=FALSE; val:=0;
      END;
      Read(f,ch);
    END;
  END;
  termCH:=ch;
END ReadCard;

(* ReadInt        Read an INTEGER-constant from file f
   ---------------------------------------------------------------- *)
PROCEDURE ReadInt(f:File; VAR val:INTEGER);
VAR
  ch:   CHAR;
  sign: INTEGER;
  x:    CARDINAL;
  s:    ARRAY[1..80] OF CHAR;
  i:    INTEGER;
BEGIN
  ReadString(f,s);
  x:=0; val:=0; i:=1;
  IF s[i]="-" THEN sign:=-1; INC(i); ELSE sign:=1; END;
  ch:=s[i];
  LOOP
    IF ch=0C THEN Done:=TRUE; EXIT; END;
    IF (ch<"0") OR (ch>"9") THEN Done:=FALSE; EXIT; END;
    IF (x>3276) OR ((x=3276) AND (ch>"8")) THEN Done:=FALSE; EXIT END;
    x:=10*x+VAL(CARDINAL,ORD(ch)-ORD("0"));
    INC(i); ch:=s[i];
  END;
  IF Done THEN
    IF x<=32767 THEN val:=sign*VAL(INTEGER,x);
    ELSIF sign=-1 THEN val:=-32767; DEC(val);
    ELSE Done:=FALSE;
    END;
  END;
END ReadInt;

(* ReadString     Read a string of characters from file f
   ---------------------------------------------------------------- *)
PROCEDURE ReadString(f:File; VAR s:ARRAY OF CHAR);
VAR i:INTEGER; ch:CHAR;
BEGIN
  IF f=NIL (*con*)
  THEN
    REPEAT Terminal.Read(ch); UNTIL ch<>" ";
    i:=-1;
    WHILE ch>" " DO
      IF ch=DEL THEN
        IF i>=0 THEN Terminal.Write(10C); DEC(i); END;
      ELSIF i<HIGH(s) THEN
        Terminal.Write(ch); INC(i); s[i]:=ch;
      END;
      Terminal.Read(ch);
    END;
  ELSE
    REPEAT Read(f,ch); UNTIL ch<>" ";
    i:=-1;
    WHILE ch>" " DO
      IF i<HIGH(s) THEN INC(i); s[i]:=ch; END;
      Read(f,ch);
    END;
  END;
  termCH:=ch;
  INC(i); IF i<=HIGH(s) THEN s[i]:=0C; END;
END ReadString;

(* ReadWord       Read a word from File f without conversion
   ---------------------------------------------------------------- *)
PROCEDURE ReadWord(f:File; VAR w:CARDINAL);
VAR i,j: CHAR;
BEGIN
  Read(f,i); Read(f,j);
  w:=256*ORD(i) + ORD(j);
END ReadWord;

(* Write          Write a character to list file
   ---------------------------------------------------------------- *)
PROCEDURE Write(f:File; ch:CHAR);
VAR par:ParamBlockRec; status:INTEGER;
BEGIN
  IF f=NIL (*con*)
  THEN Terminal.Write(ch);
  ELSE
    WITH f^ DO
      IF bp>=buffersize THEN
        par.ioRefNum:=ref;
        par.ioBuffer:=ADR(buffer); par.ioReqCount:=buffersize;
        par.ioPosMode:=0; par.ioPosOffset:=0;
        status:=FS(PBWrite,par);
        bp:=0
      END;
      buffer[bp]:=ch; INC(bp)
    END
  END;
END Write;

(* WriteCard      Write a cardinal to list file
   ---------------------------------------------------------------- *)
PROCEDURE WriteCard(f:File; nr:CARDINAL; w:INTEGER);
VAR
  l,d: INTEGER;
  t:   ARRAY[1..5] OF CHAR;
BEGIN
  l:=0;
  REPEAT
    d:=nr MOD 10; nr:=nr DIV 10;
    INC(l); t[l]:=CHR(ORD("0")+d);
  UNTIL nr=0;
  WHILE w>l DO Write(f," "); DEC(w); END;
  WHILE l>0 DO Write(f,t[l]); DEC(l); END;
END WriteCard;

(* WriteHex       Write length bytes from a
   ---------------------------------------------------------------- *)
PROCEDURE WriteHex(f:File; s:ARRAY OF WORD; length:INTEGER);
VAR i,j:INTEGER; w:CARDINAL;

  PROCEDURE WriteHexDigit(b:INTEGER);
  BEGIN
    IF b<10
    THEN Write(f,CHR(b+ORD("0")));
    ELSE Write(f,CHR(b-10+ORD("A")));
    END;
  END WriteHexDigit;

BEGIN (*WriteHex*)
  j:=0;
  FOR i:=1 TO length DO
    IF ODD(i)
    THEN w:=VAL(CARDINAL,s[j]) DIV 256;
    ELSE w:=VAL(CARDINAL,s[j]) MOD 256; INC(j);
    END;
    Write(f," ");
    WriteHexDigit(w DIV 16);
    WriteHexDigit(w MOD 16);
  END;
END WriteHex;

(* WriteInt       Write an INTEGER-value to file f
   ---------------------------------------------------------------- *)
PROCEDURE WriteInt(f:File; i:INTEGER; w:INTEGER);
VAR
  l,d:  INTEGER;
  x:    CARDINAL;
  t:    ARRAY[1..5] OF CHAR;
  sign: CHAR;
BEGIN
  IF i<0
  THEN sign:="-"; x:=VAL(CARDINAL,ABS(i+1)); INC(x);
  ELSE sign:=" "; x:=VAL(CARDINAL,ABS(i));
  END;
  l:=0;
  REPEAT
    d:=x MOD 10; x:=x DIV 10;
    INC(l); t[l]:=CHR(ORD("0")+d);
  UNTIL x=0;
  WHILE w>l+1 DO Write(f," "); DEC(w); END;
  IF (sign="-") OR (w>l) THEN Write(f,sign); END;
  WHILE l>0 DO Write(f,t[l]); DEC(l); END;
END WriteInt;

(* WriteLn        Skip to new line on list file
   ---------------------------------------------------------------- *)
PROCEDURE WriteLn(f:File);
BEGIN
  IF f=NIL (*con*)
  THEN Terminal.WriteLn;
  ELSE Write(f,EOL);
  END;
END WriteLn;

(* WriteString          Write a string to list file
   ----------------------------------------------------------------------- *)
PROCEDURE WriteString(f:File; s:ARRAY OF CHAR);
  VAR i: INTEGER;
BEGIN
  i:=0;
  LOOP
    IF i>HIGH(s) THEN EXIT;
    ELSIF s[i]=0C THEN EXIT;
    ELSIF s[i]="$" THEN WriteLn(f);
    ELSE Write(f,s[i]);
    END;
    INC(i);
  END;
END WriteString;

(* WriteText            Write text to list file
   ----------------------------------------------------------------------- *)
PROCEDURE WriteText(f:File; t:ARRAY OF CHAR; l:INTEGER);
  VAR i: INTEGER;
BEGIN
  FOR i:=0 TO l-1 DO Write(f,t[i]); END;
END WriteText;

(* WriteWord            Write a word to File f without conversion
   ----------------------------------------------------------------------- *)
PROCEDURE WriteWord(f:File; w:CARDINAL);
BEGIN
  Write(f,CHR(w DIV 256));
  Write(f,CHR(w MOD 256));
END WriteWord;

BEGIN
  con:=NIL;
  dlgHook:=VAL(DialogHook,NIL); filterHook:=VAL(FilterHook,NIL);
  ftype[0]:="TEXT"; ftype[1]:="";
  errCode:=0;
END FileIO.

[Cross-reference list of FileIO.MOD. Only the identifier column is legible in the scan; the line-number columns are not recoverable.]
ABS, ADR, Allocate, b, bb, bp, buffer, buffersize, ch
[Cross-reference list of FileIO.MOD, continued; identifiers only, line-number columns not recoverable.]
Close, con, Create, creator, d, Deallocate, DEL, DialogHook, dlgHook, Done, drive, DupFNErr, EF, eof, EOFErr, EOL, errCode, f, fdCreator, fdType, File, FileIO, FileRecord, FilterHook, filterHook, fn, fName, FS, ftype, GetCatInfo, good, h, HFS, HIGH, i, ioActCount, ioBuffer, ioDirID, ioFDirIndex, ioFlFndrInfo, ioMisc, ioNamePtr, ioPermssn, ioPosMode, ioPosOffset, ioRefNum
[Cross-reference list of FileIO.MOD, continued; identifiers only, line-number columns not recoverable.]
ioReqCount, ioVersNum, ioVRefNum, j, l, length, MemTypes, ModStr, name, nr, ODD, Open, OS, OSType, output, par, ParamBlockRec, PasStr, PBClose, PBHCreate, PBHDelete, PBHOpen, PBRead, PBWrite, Point, ProcPtr, QuickDraw, Read, ReadCard, ReadInt, ReadString, ReadWord, ref, REG, reply, s, SetCatInfo, SETREG, SFget, SFGetFile, SFput, SFPutFile, SFReply, SFTypeList, SHORT, sign, status, status1, Str255, SYSTEM, System
[Cross-reference list of FileIO.MOD, continued; identifiers only, line-number columns not recoverable.]
termCH, Terminal, tlist, Toolbox, type, VAL, volRef, vRefNum, w, WORD, Write, WriteCard, WriteHex, WriteHexDigit, WriteInt, WriteLn, WriteString, WriteText, WriteWord, x
App. F  System.DEF

(* System               System dependent module (from MacMETH [86])

   The module System is the heart of the Modula-2 system on the Macintosh.
   It contains the loader and procedures to supply missing instructions of
   the processor (REAL and LONGINT arithmetic). There are also procedures
   for calling and terminating programs and handling the heap. *)

DEFINITION MODULE System;  (*H.Seiler, C.Vetterli, 22-Dec-85/26-Feb-86*)

FROM SYSTEM IMPORT ADDRESS;

TYPE
  Status = (normal, moduleNotFound, fileNotFound, illegalKey, readError,
            badSyntax, noMemory, alreadyLoaded, killed, tooManyPrograms,
            continue, noApplication);

PROCEDURE Allocate (VAR ptr:ADDRESS; size:LONGINT);
(* Tries to allocate a memory area of the given size on the heap. If the
   space is not available, ptr returns NIL otherwise ptr returns the
   address of the reserved area*)

PROCEDURE Deallocate (VAR ptr:ADDRESS);
(* Releases the memory area given by address ptr. ptr returns NIL*)

PROCEDURE Terminate (status:Status);
(* terminates the currently running process. status signals the
   cause of termination*)

END System.
Bibliography

Aho A.V., Johnson S.C. [1974] LR-parsing, Computing Surveys 6, 2, 99-124
Aho A.V., Ullman J.D. [1972] The Theory of Parsing, Translation, and Compiling, Prentice Hall
Aho A.V., Ullman J.D. [1977] Principles of Compiler Design, Addison-Wesley
Bauer F.L., Eickel J. (eds) [1976] Compiler Construction. An Advanced Course, Springer-Verlag
Blaschek G., Pomberger G., Ritzinger F. [1985] Einführung in die Programmierung mit Modula-2, Springer-Verlag, to appear in English 1989
Engelfriet J., File G. [1981] Passes, Sweeps, and Visits, in: Lecture Notes in Computer Science 115, Springer-Verlag, 193-207
Feldman J.A., Gries D. [1968] Translator writing systems, CACM 9, 1, 77-113
Fischer C.N., LeBlanc R.J. [1988] Crafting a Compiler, The Benjamin/Cummings Publishing Company
Ganzinger H., Giegerich R. [1984] Attribute coupled grammars, SIGPLAN Notices 19, 6, 157-170
Gries D. [1971] Compiler Construction for Digital Computers, Wiley
Hartmann A.C. [1977] A Concurrent Pascal Compiler for Minicomputers, Springer-Verlag
Henderson P., Snowdon R. [1972] An experiment in structured programming, Bit 2, 38-53
Hopcroft J.E., Ullman J.D. [1979] Introduction to Automata Theory, Languages, and Computation, Addison-Wesley
Hughes J.W. [1979] A formalization and explication of the Michael Jackson method of program design, SOFTWARE - Practice and Experience 9, 191-202
Inside Macintosh [1985] volumes I-III, Addison-Wesley
Jackson M.A. [1975] Principles of Program Design, Academic Press
Johnson S.C. [1975] YACC - Yet Another Compiler-Compiler, Tech.Rep.Nr.32, Bell Laboratories, July 1975
Kastens U., Hutt B., Zimmermann E. [1982] GAG: A Practical Compiler-Generator, in: Lecture Notes in Computer Science 141, Springer-Verlag
Knuth D.E. [1965] On the translation of languages from left to right, Information and Control 8, 6, 607-639
Knuth D.E. [1968] Semantics of context-free languages, Mathematical Systems Theory 2, 127-145
Koskimies K. [1984] A specification language for one-pass semantic analysis, SIGPLAN Notices 19, 6, 179-189
Koskimies K., Räihä K.-J., Sarjakoski M. [1982] Compiler construction using attribute grammars, Proc. SIGPLAN 82 Symposium on Compiler Construction, June 1982, 153-159
Lewis P.M., Rosenkrantz D.J., Stearns R.E. [1976] Compiler Design Theory, Addison-Wesley
Lewis P.M., Stearns R.E. [1968] Syntax directed transduction, Journal ACM 15, 3, 464-488
Meijer H., Nijholt A. [1982] YABBER - yet another bibliography: translator writing tools, SIGPLAN Notices 17, 10
Mössenböck H. [1986] Alex - a simple and efficient scanner-generator, SIGPLAN Notices 21, 5
Pomberger G. [1986] Software Engineering and Modula-2, Prentice Hall
Räihä K.-J. [1977] On Attribute Grammars and their Use in a Compiler Writing System, Report A-1977-4, Department of Computer Science, University of Helsinki
Räihä K.-J. [1980] Bibliography on attribute grammars, SIGPLAN Notices 15, 3
Räihä K.-J., et al. [1983] Revised Report on the Compiler Writing System HLP78, Report A-1983-1, Department of Computer Science, University of Helsinki
Rosen S. (ed.) [1967] Programming Systems and Languages, McGraw-Hill, New York
Rosenkrantz D.J., Stearns R.E. [1970] Properties of deterministic top-down grammars, Information and Control 17, 3, 226-256
Spenke M., Mühlenbein H., Mevenkamp M., et al. [1984] A language independent error recovery method for LL(1) parsers, SOFTWARE - Practice and Experience 14, 11
Tienari M. [1980] On the Definition of an Attribute Grammar, in: Lecture Notes in Computer Science 94 (eds Goos, G. and Hartmanis, J.), Springer-Verlag
Waite W.M., Goos G. [1984] Compiler Construction, Springer-Verlag
Watt D.E., Lehrmann Madsen O. [1983] Extended attribute grammars, The Computer Journal 26, 2, 142-153
Wirth N. [1982] Programming in Modula-2, Springer-Verlag
Wirth N. [1986] Compilerbau, B.G. Teubner Stuttgart
Wirth N., Gutknecht J., Heiz W., et al. [1986] MacMETH - A Fast Modula-2 Language System For the Apple Macintosh, User Manual, ETH Zürich
Index

actual attributes, 113, 165
address list for G-code generation, 157
Adele, 11, 125, 203
Aho, 13, 41
Alex, 119
Algol60, 52
algorithmic interpretation of grammars, 83
alias name, 109, 123
aliasspix, 128
alphabet, 14
  extension, 51
alternative chain, 48, 108
alternatives, 15
  of deletable nonterminals, 137
  of eps-nodes, 137
ambiguity, 108
analysis phase, 4
analyzing grammar, 23
AND, 208
any, 45, 107, 122, 124, 178
any-set, 140, 147, 155
anyset, 54
applications of attributed grammars, 171
arithmetic expressions, 19
arithmetization of symbols, 6
arrows, 112
assessment of some compiler generators, 102
at, 122, 165
Atari, 101, 126
attribute, 71, 72, 113
  assignment, 131, 165
  context, 167
  coupling, 98
  direction, 164
  evaluation, 79
  list, 129, 164, 226
  numbers, 155
  passing, 87
  processing, 164
  saving, 90
attributed grammar, 73, 79, 105
  applications, 171
  of Coco, 228
attributes
  consistency check, 165
  of terminals, 122
Attrkind, 166
back end, 6
Bauer, 7
BITSET, 208
Blaschek, 207
BNF, 102
bottom-up syntax analysis, 24
brackets, 136
caller interface, 121
CAP, 209
CARDINAL, 208
central-recursive grammar, 19
characteristics of Coco, 117
CheckAlternatives, 153
circular, 108
  derivation, 21
  grammar, 21
circularity, 150
CloseFile, 223
Coco, 4, 104, 222, 241
  characteristics, 117
  history, 197
  short description, 100
coco.ATG, 228
cocogen, 224, 245
cocogen2, 225, 254
cocogra, 224, 266
Cocol, 4, 105
  example, 101, 134, 163, 167, 174, 186, 190, 192
  syntax, 212
cocolex, 223, 275
cocolst, 226, 283
cocosem, 223, 287
cocosemframe, 161, 297
cocosym, 224, 299
cocosyn, 223, 316
cocosynframe, 159, 328
cocotst, 225, 338
col, 122
CollectFirst, 143
CollectFollow, 144
comments, 106, 110
compiler, 2
compiler compiler, 3, 91
compiler description language, 3, 105
compiler error numbers, 241
compiler structure
  dynamic, 8
  static, 4
complement symbol any, 45, 107
Complete, 145
CompleteAt, 129, 223
completeness, 108, 149
components of a generated compiler, 119
compound characters, 6
ConcatLeft, 133, 223
ConcatRight, 132, 223
context condition, 76, 87, 115
context-free grammar, 15, 106
Copy, 162, 163, 223
CopyFramePart, 160, 161
correct grammar, properties, 108
cross-reference list, 214
cyclic semantic dependencies, 82
dangling else, 29, 108, 147
debug switches, 241
DEC, 209
declaration of
  semantic objects, 115
  symbols, 109
definition module, 210
DelEps, 139
deletability, 31
  direct, 128, 134
  indirect, 141
Deletable, 60, 141
deletable nonterminal, 31, 141
Delete redundant eps-nodes, 127, 138
DelGraph, 141
derivable symbol, 21
derivation, 16
  rules, 15
derived attributes, 74
deterministic grammar, 24
direct deletability, 128, 134
documentation, 187
dynamic compiler structure, 8
EBNF, 19, 20, 107, 117
Emit, 157
EmitAction, 166, 167, 223
empty string, 14, 107
end-of-file symbol, 109
end-of-line symbol, 110
endsem, 70
Engelfriet, 98
eps, 107
  followers, 54
eps-nodes
  insertion, 136
  removal, 138
  terminal successors of, 140
eps-set, 140, 145, 155
  example, 196
epsset, 54
equivalent top-down graphs, 45
errdist, 68
Error, 60, 65, 68
error distance, 68
error handling, 62, 64
error message module, 119, 226, 348
error messages, 65, 123
Errorptr, 123
Errors, 123, 226, 348
example of
  Cocol, 101, 163, 167, 174, 186, 190, 192
  generated compiler parts, 192
EXCL, 209
exit statement, 209
experiences, 197, 201
export list, 209
extended Backus-Naur form, 19
factorization of
  nonterminals, 49
  top-down graphs, 43
File, 98
FileIO, 226, 356
Fill, 67
FillSucc, 67
filter procedure, 120
Find circular rules, 148, 150
Find deletable symbols, 127, 141
FindEps, 146
FindEpsFollowers, 146
first(X), 26, 54
Fischer, 13
follow(X), 28, 143
formal attributes, 113, 165
frame module, 118, 159, 161, 297, 328
free monoid, 14
free semi-group, 14
front end, 6
G-code, 53, 55, 88, 117, 155, 213
  example, 195
  generation, 156
  parser, 58
GAG, 91, 96, 102, 104
Ganzinger, 91, 98
GenAssign, 166, 167, 223
GenCode, 156, 157
Generate G-code, 157
generated compiler parts, 118
  example, 192
generated compiler, operation, 120
generated semantic actions, 165
generation of the
  semantic evaluator, 245
  syntax analyzer, 254
generative grammar, 23
Get eps-sets, 145
Get symbol sets, 127
Get terminal start symbols, 142
Get terminal successors, 144
GetAdr, 157
GetAt, 129, 165, 167, 223
GetFirstSet, 142
GetMacroNr, 163, 223
GetNode, 131, 140, 148, 157, 223
GetSingles, 151
GetSy, 122, 124, 129, 140, 148, 223
Giegerich, 91
Goos, 13, 82, 83
GRAMMAR, 106
grammar, 15
grammar of Cocol, 212
grammar name, 106, 110, 121
grammar rules, 107
grammar tests, 126, 147, 225, 338
grammars in matrix form, 34
grammatical language levels, 22
GraphList, 223
Graphnode, 47, 130
Gries, 7, 13, 85
HALT, 209
handle, 18
Hartmann, 85
Henderson, 184
HIGH, 209
hints for reading the source lists, 226
HLP84, 91, 94, 104
Hopcroft, 21
Hughes, 188
Hutt, 96
IBM-PC, 101, 126
identifiers, 106
implementation description, 125
implementation module, 210
implementation restrictions, 241
import, 115, 122
  list, 209
INC, 209
INCL, 209
indirect deletability, 141
individual characters, 6
inherited attributes, 74, 75
inner module, 211
input attribute, 113
input of Coco, 118
input interface, 122
Insert eps-nodes before deletable nt's, 127, 138
interfaces of the generated compiler, 121
intermediate language, 120, 124
intermodular cross-reference list, 214
invocation of Coco, 118
IsTerm, 152
Jackson, 187
Johnson, 13, 91, 92
Kastens, 91, 96
keywords, 6, 105
Knuth, 13, 29, 82
Koskimies, 91, 94, 102
L-attributed grammar, 4, 82, 83, 92, 117
LALR(1) parser, 92, 94, 96
language, 16
  levels, 22
LeBlanc, 13
left-canonical derivation, 17
left-recursive grammar, 19
Lewis, 82
lexical
  analysis, 5, 6
  analyzer, 119, 122, 129, 165, 275
  analyzer described by Cocol, 171
  analyzer, specification, 172
  language level, 22
Lilith, 101, 126, 198
line, 122
line numbers, 122, 131
linking
  alternative graphs, 133
  component graphs, 132
listings, 220
literals, 6
LL(1) test, 148, 153
LL(1) analysis
  nonrecursive, 38
  recursive, 35
LL(1) conditions, 27, 28
  for top-down-graphs, 47, 49
LL(1) conflicts, 108
  in lexical structures, 179
LL(1) grammar, 23, 26, 201
LL(k) condition, 40
LL(k) grammar, 25, 40
LL(k) test, 41
lookahead, 25
Macintosh, 101, 119, 126
macro, 112, 116, 163
main algorithm of Coco, 127
main program, 119, 121, 210, 222, 241
MarkReachedNts, 150
matching of symbols, 48
matrix form of grammars, 34
measurements, 197
Meijer, 91
memory requirements of
  Coco, 199
  the generated compilers, 200
MemTypes, 226
mini-scanner, 174
Modula-2, 111, 115, 119, 126, 207
modules, 209
  description, 222
  hierarchy, 221
  overview, 220
Mössenböck, 119
MUG, 91, 98, 104
multi-pass compiler, 8, 9, 120, 124
name list, 129, 155
names, 6
NewAdr, 157
NewAt, 129, 164, 167, 223
NewMacro, 223
NewNode, 131, 223
NewSy, 129, 223
Nijholt, 91
nococosy, 162
nodes of the top-down graph, 130
non-circular grammar, 21
nonterminal, 14, 15, 110, 128
  deletable, 141
nonterminals
  factorization of, 49
  replacement of, 15
  substitution of, 49
  terminal successors of, 140, 143
  termination of, 108, 152
numbering of terminals, 109, 122
numbers, 106
OpenFile, 223
OpenSem, 163, 223
optimization of attribute processing, 167
option symbol, 20
OR, 208
ordered attributed grammar, 96
OS, 226
output attribute, 113
output of Coco, 118
output interface, 122
parameter arrows, 112
Parse, 58, 60, 86, 121, 127
ParseNonRecursive, 38
parser, 223, 316
  generation, 159
  interface, 121
  tables, 118, 155
  tables, example, 195
  tables, generation, 154
ParseRecursive, 35
parts of the generated compiler, 119
Pascal, 207
pass, 8
phrase, 17, 18
PL/1, 50
PLM/80, 50
Pomberger, 207
pragma, 109, 124
  semantics, 113, 128, 155
printinput, 121
printnodes, 121
procedures, 115
productions, 15, 107
program frames, 118
program listings, 220
QuickDraw, 226
Räihä, 91, 94
reachability, 149
recursive
  grammar, 19
  productions, 19
reduced grammar, 20, 21
redundancy, 108
redundant
  eps-node, 138
  symbol, 21
repetition symbol, 20
replacement of nonterminals, 15
RepNode, 131, 140, 223
RepSy, 129, 140, 223
RestartHash, 162, 223
restrictions, 241
results of a Coco run, 192
right end of graphs, 131
right-recursive grammar, 19
Ritzinger, 207
root, 15
  symbol, 106, 110, 149
Rosenkrantz, 40, 42
RULES, 107
run-time of
  Coco, 199
  the generated compilers, 201
scanner, 129, 165, 223, 275
scanner generator, 119, 171
scanner interface, 122
scanner procedure, 122
scanner specification, 172
scope of semantic objects, 116
sem, 70, 111
Semant, 85, 86
semantic
  action numbers, 131
  actions, 70, 111
  actions, generated, 165
  actions, processing, 163
  analysis, 5, 8
  declarations, copying, 162
  description, 110
  error action, 115
  evaluator, 118, 119, 223
  evaluator of Coco, 287
  evaluator, example, 194
  evaluator, generation, 160
  frame module, 297
  interface, 85
  macro, 111, 112, 116, 163
  modules, 119, 122
  procedures for lexical analysis, 180
semantics, 69
sentence, 16
  symbol, 15
sentential form, 16
simple phrase, 18
single-pass compiler, 8, 9
Snowdon, 184
software engineering, 182
source code, 220
  hints, 226
source list, 118
  generator, 283
source program, 2
spelling index, 129
spix, 128, 129, 162, 166
stacking of semantic objects, 116
start symbol, 110, 149
StartCopy, 223
static compiler structure, 4
Stearns, 40, 42
stepwise refinement, 11
StopHash, 162, 223
strings, 6, 14, 106
substitution of nonterminals, 49
symbol list, 126, 127, 224, 226, 299
symbol names, 129
symbol sets, collection, 140
Symbolnode, 127
symbols, 6, 14
Symboltype, 127
SyNr, 129, 223
syntactic extension, 51
syntactical language level, 22
syntax
  analysis, 5, 34
  analyzer, 118, 119, 223, 316
  analyzer, generation, 159
  of Cocol, 212
  description, 106
  error indicator, 121
  error interface, 123
  error message, 109
  error-recovery, 118
  notation, 107
  rules, 15, 107
  tree, 7, 14, 17, 91
SyntaxError, 123
synthesis phase, 5
synthesized attributes, 74
SYSTEM, 211
System, 226
system specific procedures, 369
target program, 2
tasks of Coco, 126
telegram problem, 184
terminal, 14, 15, 109, 122, 128
  class, 23
  start symbols, 26, 31, 32, 140, 142
  start symbols of length k, 40
  successors, 28, 31, 33
  successors of eps-nodes, 140, 145
  successors of nonterminals, 140, 143
terminating symbol, 21
termination, 21
  of nonterminals, 108, 152
Test completeness, 148, 149
Test grammar, 127, 148
Test if all nt's can be derived to t's, 148, 152
Test if all nt's can be reached, 148, 149
token code, 109, 122
Toolbox, 226
top-down
  graph, 42, 126, 130, 226, 266
  graphs, equivalent, 45
  graphs, factorization of, 43
  syntax analysis, 23, 24
top-down-graphs, LL(1) conditions for, 47, 49
trace switches, 241
tracing the parser, 121
Triple, 66
two level-grammar, 77
typ, 122
type transfer functions, 209
Ullman, 13, 21, 41
understanding the source code, hints, 226
useless symbol, 21
user modules, 122
using Coco, 117
Vach, 98
van Wijngaarden, 77
variables, 115
versions of Coco, 4
Visited, 157
vocabulary, 14
Waite, 13, 82, 83
Watt, 77
where, 77
Wirth, 20, 85, 107, 198, 207
WORD, 208
YACC, 91, 92, 98, 104
Zimmermann, 96

A COMPILER GENERATOR FOR MICROCOMPUTERS

Presents a practical approach to compiler construction, illustrating how to convert the theoretical principles of compiler writing into a working program. The book describes the compiler generator Coco, developed by the authors in Modula-2 to run on microcomputers.

Features include:
■ A detailed description of a compiler generator including its source code.
■ The application of the compiler generator to non-trivial problems.
■ Emphasis on table-driven syntax analysis with automatic error recovery and semantic specification of compilers by means of attributed grammars.
■ Illustration of the application of documentation methods to a large program.

P. Rechenberg is Professor of Computer Science at the University of Linz, Austria and H. Mössenböck is Assistant Professor of Computer Science at the Federal Institute of Technology (ETH), Zurich, Switzerland.

Prentice Hall
ISBN 0-13-155060-8