Document:	ULTRIX-32 Supplementary Documents Programmer
Order Number:	AA-MF07A-TE
Revision:	000
Pages:	769

ULTRIX-32 ™

Supplementary Documents
Volume 2 Programmer

Order Number: AA-MF07A-TE

ULTRIX-32 Supplementary Documents
Programmer
Order No. AA-MF07A-TE

ULTRIX-32 Operating System, Version 3.0

Digital Equipment Corporation

The information in this document is subject to change without notice and
should not be construed as a commitment by Digital Equipment Corporation.
Digital Equipment Corporation assumes no responsibility for any errors that
may appear in this document.
The software described in this document is furnished under a license and may
be used or copied only in accordance with the terms of such license.
No responsibility is assumed for the use or reliability of software on equipment that is not supplied by DIGITAL or its affiliated companies.

The following are trademarks of Digital Equipment Corporation:
DEC
DECUS
MASSBUS
PDP
ULTRIX
ULTRIX-11
ULTRIX-32
UNIBUS
VAX
VMS
VT
digital™

UNIX is a trademark of AT&T Bell Laboratories.
Information herein is derived from copyrighted material as permitted under a
license agreement with AT&T Bell Laboratories.

This software and documentation is based in part on the Fourth Berkeley
Software Distribution under license from the Regents of the University of
California. We acknowledge the Electrical Engineering and Computer Science
Departments at the Berkeley Campus of the University of California for their
role in its development.


This software and documentation is based in part on the Fourth Berkeley Software Distribution under
license from The Regents of the University of California. Digital Equipment Corporation acknowledges
the following individuals and institutions for their role in its development:
"The UNIX Time-Sharing System": Copyright © 1974, Association for Computing Machinery, Inc.,
reprinted by permission. This is a revised version of an article that appeared in Communications of the
ACM, 17, No. 7 (July 1974), pp. 365-375. That article was a revised version of a paper presented at the
Fourth ACM Symposium on Operating Systems Principles, IBM Thomas J. Watson Research Center,
Yorktown Heights, New York, October 15-17, 1973. Acknowledgements: for their help and support,
R.H. Canaday, R. Morris, M.D. McIlroy, and J.F. Ossanna.
"Advanced Editing on UNIX" acknowledgement: Ted Dolotta for his ideas and assistance.
"An Introduction to the UNIX Shell" acknowledgements: Dennis Ritchie, John Mashey and Joe Maranzano for their help and support.
"LEARN - Computer-Aided Instruction on UNIX" acknowledgements: for their help and support, M.E.
Bittrich, J.L. Blue, S.I. Feldman, P.A. Fox, M.J. McAlpin, E.Z. Rothkopf, Don Jackowski, and Tom
Plum.
"A System for Typesetting Mathematics" acknowledgements: J.F. Ossanna, A.V. Aho, and S.C. Johnson,
for their ideas and assistance.
"A TROFF Tutorial" acknowledgements: J.F. Ossanna, Jim Blinn, Ted Dolotta, Doug McIlroy, Mike
Lesk and Joel Sturman, for their help and support.
The document "The C Programming Language - Reference Manual" is reprinted, with minor changes,
from "The C Programming Language," by Brian W. Kernighan and Dennis M. Ritchie, Prentice-Hall,
Inc., 1978.
"Make - A Program for Maintaining Computer Programs" acknowledgements: S.C. Johnson and H.
Gajewska, for their ideas and assistance.
"YACC: Yet Another Compiler-Compiler" acknowledgements: B.W. Kernighan, P.J. Plauger, S.I. Feldman, C. Imagna, M.E. Lesk, A. Snyder, C.B. Haley, D.M. Ritchie, M.O. Harris and Al Aho, for their
ideas and assistance.
"Lex - A Lexical Analyzer Generator" acknowledgements: S.C. Johnson, A.V. Aho, and Eric Schmidt, for
their help as originators of much of Lex, as well as debuggers of it.
The document "RATFOR - A Preprocessor for a Rational Fortran" is a revised and expanded version of
the one published in Software - Practice and Experience, October 1975. The Ratfor described here is
the one in use on UNIX and GCOS at AT&T Bell Laboratories. Acknowledgements: Dennis Ritchie
and Stuart Feldman, for their ideas and assistance.
"The M4 Macro Processor" acknowledgements: Rick Becker, John Chambers, Doug McIlroy, and Jim
Weythman, for their help and support.
"BC - An Arbitrary Precision Desk-Calculator Language" acknowledgement: The compiler is written in
YACC; its original version was written by S.C. Johnson.
"A Dial-Up Network of UNIX™ Systems" acknowledgements: G.L. Chesson, A.S. Cohen, J. Lions,
and P.F. Long, for their suggestions and assistance.
Copyright © 1979, 1980 Regents of the University of California. Permission to copy these documents or
any portion thereof as necessary for licensed use of the software is granted to licensees of this software,
provided this copyright notice and statement of permission are included.
The document "Writing Tools - The STYLE and DICTION Programs" is copyrighted © 1979 by AT&T
Bell Laboratories. Holders of a UNIX™/32V software license are permitted to copy this document,
or any portion of it, as necessary for licensed use of the software, provided this copyright notice and
statement of permission are included.

The document "The Programming Language EFL" is copyrighted © 1979 by AT&T Bell Laboratories.
EFL has been approved for general release, so that one may copy it subject only to the restriction of giving proper acknowledgement to AT&T Bell Laboratories.
The documents "A Portable Fortran 77 Compiler" and "Fsck - The UNIX File System Check Program"
are modifications of earlier documents which are copyrighted © 1979 by AT&T Bell Laboratories.
Holders of a UNIX™/32V software license are permitted to copy these documents, or any portion of
them, as necessary for licensed use of the software, provided this copyright notice and statement of permission are included. This manual reflects system enhancements made at Berkeley and sponsored in
part by NSF Grants MCS-7807291, MCS-8005144, and MCS-74-07644-A04; DOE Contract DE-AT03-76SF00034 and Project Agreement DE-AS03-79ER10358; and by Defense Advanced Research Projects
Agency (DoD) ARPA Order No. 4031, monitored by Naval Electronics Systems Command under Contract No. N00039-80-K-0649.
"Ex Reference Manual" acknowledgements: Chuck Haley contributed greatly to the early development
of ex. Bruce Englar encouraged the redesign which led to ex version 1. Bill Joy wrote versions 1 and 2.0
through 2.7, and created the framework that users see in the present editor. Mark Horton added macros
and other features and made the editor work on a large number of terminals and UNIX systems.
"A Guide to the Dungeons of Doom" acknowledgements: Rogue was originally conceived by Glenn Wichman and Michael Toy. Ken Arnold and Michael Toy then smoothed out the user interface, and added
many new features. We would like to thank Bob Arnold, Michelle Busch, Andy Hatcher, Kipp Hickman,
Mark Horton, Daniel Jensen, Bill Joy, Joe Kalash, Steve Maurer, Marty McNary, Jan Miller, and Scott
Nelson for their ideas and assistance.
The document "The FRANZ LISP Manual" is copyrighted © 1980, 1981, 1983 by the Regents of the
University of California. (exceptions: Chapters 13, 14 (first half), 15 and 16 have separate copyrights, as
indicated. These are reproduced by permission of the copyright holders.) Permission to copy without
fee all or part of this material is granted provided that the copies are not made or distributed for direct
commercial advantage, and the copyright notice of the Regents, University of California, is given. All
rights reserved. Work reported herein was supported in part by the U.S. Department of Energy, Contract DE-AT03-76SF00034, Project Agreement DE-AS03-79ER10358, and the National Science Foundation under Grant No. MCS 7807291. MC68000 is a trademark of Motorola Semiconductor Products, Inc.
"The FRANZ LISP Manual" acknowledgements: Richard Fateman, Mike Curry, John Breedlove, Jeff
Levinsky, Bill Rowan, Tom London, Keith Sklower, Kipp Hickman, Charles Koester, Mitch Marcus,
Don Cohen, John Foderaro, and Kevin Layer.
The document "Berkeley Pascal User's Manual" is copyrighted © 1977, 1979, 1980, 1983 by W.N. Joy,
S.L. Graham, C.B. Haley, M.K. McKusick, P.B. Kessler. The financial support of the first and second
authors' work by the National Science Foundation under grants MCS74-07644-A04, MCS78-07291, and
MCS80-05144, and the first author's work by an IBM Graduate Fellowship are gratefully acknowledged.
"Introduction to the f77 I/O Library" acknowledgement: Peter J. Weinberger originally wrote the I/O
Library at AT&T Bell Laboratories.
"Writing Papers with NROFF Using -ME", and "-ME Reference Manual" acknowledgements: Bob
Epstein, Bill Joy, Larry Rowe, Ricki Blau, Pamela Humphrey, and Jim Joyce, for their ideas and assistance. UNIX, NROFF, and TROFF are trademarks of AT&T Bell Laboratories.
"Refer - A Bibliography System" acknowledgements: Mike Lesk of AT&T Bell Laboratories wrote the
original refer software, including the indexing programs. Al Stanberger of the Forestry Department
wrote the first version of addbib, then called bibin. Greg Shenaut of the Linguistics Department wrote
the original versions of sortbib and roffbib.
"Screen Updating and Cursor Movement Optimization: A Library Package" acknowledgements: For
their help and support, Bill Joy, Doug Merritt, Kurt Shoens, Ken Abrams, Alan Char, Mark Horton, and
Joe Kalash.
"Disc Quotas in a UNIX Environment" acknowledgements: Sam Leffler and Kirk McKusick, for their
work on the quota code. The current disc quota system is loosely based on a very early scheme implemented at the University of New South Wales and Sydney University.
The document, "Fsck - The UNIX File System Check Program", is a revision by Marshall Kirk
McKusick; T.J. Kowalski wrote the original paper. For their help and support, we thank Bill Joy, Sam
Leffler, Robert Elz, Dennis Ritchie, Robert Henry, Larry A. Wehr, and Rick B. Brandt. Our sponsors
were the National Science Foundation under grant MCS80-05144, and the Defense Advanced Research
Projects Agency (DoD) under ARPA Order No. 4031 monitored by Naval Electronic System Command
under Contract No. N00039-82-C-0235.
"A Fast File System for UNIX" acknowledgements: William N. Joy, Samuel J. Leffler, Robert S. Fabry,
Marshall Kirk McKusick, Robert Elz, Michael Powell, Peter Kessler, Robert Henry, and Dennis Ritchie.
This work was done under grants from the National Science Foundation under grant MCS80-05144, and
the Defense Advanced Research Projects Agency (DoD) under ARPA Order No. 4031 monitored by Naval Electronic System Command under Contract No. N00039-82-C-0235.
"4.2BSD Networking Implementation Notes" acknowledgements: The internal structure of the system is
patterned after the Xerox PUP architecture [Boggs79]. The use of software interrupts for process invocation is based on similar facilities found in the VMS operating system. Many of the ideas are based on
Rob Gurwitz's TCP/IP implementation for the 4.1BSD version of UNIX on the VAX [Gurwitz81]. Greg
Chesson explained his use of trailer encapsulations in Datakit, instigating their use in our system.
"SENDMAIL - An Internetwork Mail Router" acknowledgements: For their ideas and assistance, Kurt
Shoens, Bill Joy, Mark Horton, Eric Schmidt, Kirk McKusick, Marvin Solomon, Mike Stonebraker, and
Bob Epstein. A considerable part of this work was done while under the employ of the INGRES Project
at the University of California at Berkeley.


BEFORE YOU START

This is the second volume of ULTRIX-32 Supplementary Documents, a three-volume set that
contains articles describing the ULTRIX-32 system. The authors are computer scientists and
program developers at Bell Laboratories and the University of California at Berkeley. The
articles explain the software tools and utilities available on your ULTRIX-32 system. They
constitute most of the lore that enriches this operating system; topics range from getting
started to the details of screen updating and cursor movement facilities.
Each volume in this set contains several parts, and each part begins with an introduction.
The introduction to each part serves as a map that will help you find your way around in the
documentation, allowing you to select articles that relate to your interest. Each introduction
gives an overview of the material covered in the part and a description of the articles included.
Most readers will not need to read all articles, since many articles cover parallel topics.
These articles provide authoritative and accurate information that is unavailable elsewhere.
However, you should be aware that some of the information in some articles is dated. We
include those articles because many of the concepts they develop are still current and important.
At the end of each volume in this set, you will find a master index identifying topics in all
three volumes.

Topics in Volume II
The articles in this second volume deal with programming and support tools for programmers
on the ULTRIX-32 system. Most of the authors assume that readers are familiar with one or
more programming languages. For example, the articles on FORTRAN 77 are written for people who already know a standard version of FORTRAN.
"UNIX Programming - Second Edition," in Part 1 of this volume, tells how to write programs
that cooperate with the operating system. Many readers will find it useful to read this article
before going on to articles on the languages and utilities.
The articles in Part 2 deal with four languages and four preprocessors. The languages are:

• C
• FORTRAN 77
• Franz Lisp
• Pascal
The four preprocessors are:
• RATFOR
• EFL
• FP
• M4


Part 3, Supporting Tools, offers articles on three kinds of utilities:
• Program and library maintenance tools
• Program checking and debugging tools
• Compiler and preprocessor development tools
And the articles in Part 4, System Programming, cover topics such as:
• Inner workings of the ULTRIX-32 system
• System and kernel facilities available to user programs
• Assembly language (as)
• Screen manipulation functions
• The ULTRIX-32 line printer spooler
The features described in this volume provide the flexibility and programming power for
which UNIX is famous. A good understanding of many of the concepts and procedures
presented here is essential for efficient use of your ULTRIX-32 system.

Table of Contents

BEFORE YOU START
PART 1: PROGRAMMING CONSIDERATIONS
UNIX PROGRAMMING
INTRODUCTION . . . 1-3
BASICS . . . 1-3
    Program Arguments . . . 1-3
    The "Standard Input" and "Standard Output" . . . 1-4
THE STANDARD I/O LIBRARY . . . 1-5
    File Access . . . 1-5
    Error Handling - Stderr and Exit . . . 1-7
    Miscellaneous I/O Functions . . . 1-8
LOW-LEVEL I/O . . . 1-8
    File Descriptors . . . 1-8
    Read and Write . . . 1-9
    Open, Creat, Close, Unlink . . . 1-10
    Random Access - Seek and Lseek . . . 1-11
    Error Processing . . . 1-12
PROCESSES . . . 1-12
    The "System" Function . . . 1-12
    Low-Level Process Creation - Execl and Execv . . . 1-13
    Control of Processes - Fork and Wait . . . 1-14
    Pipes . . . 1-14
SIGNALS - INTERRUPTS AND ALL THAT . . . 1-17
APPENDIX: THE STANDARD I/O LIBRARY . . . 1-21
    General Usage . . . 1-21
    Calls . . . 1-21

PART 2: LANGUAGES
THE C PROGRAMMING LANGUAGE REFERENCE MANUAL
INTRODUCTION . . . 2-5
LEXICAL CONVENTIONS . . . 2-5
    Comments . . . 2-5
    Identifiers (Names) . . . 2-5
    Keywords . . . 2-5
    Constants . . . 2-6
    Integer Constants . . . 2-6
    Explicit Long Constants . . . 2-6
    Character Constants . . . 2-6
    Floating Constants . . . 2-6
    Strings . . . 2-6
    Hardware Characteristics . . . 2-6
SYNTAX NOTATION . . . 2-7
WHAT'S IN A NAME? . . . 2-7

OBJECTS AND LVALUES . . . 2-8
CONVERSIONS . . . 2-8
    Characters and Integers . . . 2-8
    Float and Double . . . 2-8
    Floating and Integral . . . 2-8
    Pointers and Integers . . . 2-8
    Unsigned . . . 2-8
    Arithmetic Conversions . . . 2-8
EXPRESSIONS . . . 2-9
    Primary Expressions . . . 2-9
    Unary Operators . . . 2-10
    Multiplicative Operators . . . 2-11
    Additive Operators . . . 2-11
    Shift Operators . . . 2-12
    Relational Operators . . . 2-12
    Equality Operators . . . 2-12
    Bitwise AND Operator . . . 2-12
    Bitwise Exclusive OR Operator . . . 2-12
    Bitwise Inclusive OR Operator . . . 2-13
    Logical AND Operator . . . 2-13
    Logical OR Operator . . . 2-13
    Conditional Operator . . . 2-13
    Assignment Operators . . . 2-13
    Comma Operator . . . 2-14
DECLARATIONS . . . 2-14
    Storage Class Specifiers . . . 2-14
    Type Specifiers . . . 2-15
    Declarators . . . 2-15
    Meaning of Declarators . . . 2-15
    Structure and Union Declarations . . . 2-16
    Initialization . . . 2-18
    Type Names . . . 2-19
    Typedef . . . 2-20
STATEMENTS . . . 2-20
    Expression Statement . . . 2-20
    Compound Statements, or Block . . . 2-20
    Conditional Statement . . . 2-21
    While Statement . . . 2-21
    Do Statement . . . 2-21
    For Statement . . . 2-21
    Switch Statement . . . 2-21
    Break Statement . . . 2-22
    Continue Statement . . . 2-22
    Return Statement . . . 2-22
    Goto Statement . . . 2-22
    Labeled Statement . . . 2-22
    Null Statement . . . 2-23

EXTERNAL DEFINITIONS . . . 2-23
    External Function Definitions . . . 2-23
    External Data Definitions . . . 2-24
SCOPE RULES . . . 2-24
    Lexical Scope . . . 2-24
    Scope of Externals . . . 2-24
COMPILER CONTROL LINES . . . 2-25
    Token Replacement . . . 2-25
    File Inclusion . . . 2-25
    Conditional Compilation . . . 2-25
    Line Control . . . 2-26
IMPLICIT DECLARATIONS . . . 2-26
TYPES REVISITED . . . 2-26
    Structures and Unions . . . 2-26
    Functions . . . 2-26
    Arrays, Pointers, and Subscripting . . . 2-27
    Explicit Pointer Conversions . . . 2-27
CONSTANT EXPRESSIONS . . . 2-28
PORTABILITY CONSIDERATIONS . . . 2-28
ANACHRONISMS . . . 2-29
SYNTAX SUMMARY . . . 2-30
    Expressions . . . 2-30
    Declarations . . . 2-31
    Statements . . . 2-32
    External Definitions . . . 2-33
    Preprocessor . . . 2-33
RECENT CHANGES TO C . . . 2-35
    Structure Assignment . . . 2-35
    Enumeration Type . . . 2-35

A TOUR THROUGH THE PORTABLE C COMPILER
INTRODUCTION . . . 2-37
OVERVIEW . . . 2-38
THE SOURCE FILES . . . 2-39
DATA STRUCTURE CONSIDERATIONS . . . 2-40
PASS ONE . . . 2-41
LEXICAL ANALYSIS . . . 2-41
PARSING . . . 2-41
STORAGE CLASSES . . . 2-42
SYMBOL TABLE MAINTENANCE . . . 2-43
TREE BUILDING . . . 2-44
INITIALIZATION . . . 2-45
STATEMENTS . . . 2-46
OPTIMIZATION . . . 2-47
MACHINE DEPENDENT STUFF . . . 2-47
FIRST PASS SUMMARY . . . 2-49
PASS TWO . . . 2-49
OVERVIEW . . . 2-49
THE MACHINE MODEL . . . 2-50

GENERAL ORGANIZATION . . . 2-50
THE TEMPLATES . . . 2-53
THE TEMPLATE MATCHING ALGORITHM . . . 2-54
REGISTER ALLOCATION . . . 2-55
THE MACHINE DEPENDENT INTERFACE . . . 2-56
THE REWRITING RULES . . . 2-56
THE SETHI-ULLMAN COMPUTATION . . . 2-58
REGISTER ALLOCATION . . . 2-59
COMPILER BUGS . . . 2-59
SUMMARY AND CONCLUSION . . . 2-60

A TOUR THROUGH THE UNIX C COMPILER
THE INTERMEDIATE LANGUAGE . . . 2-63
EXPRESSION OPTIMIZATION . . . 2-66
CODE GENERATION . . . 2-68
DELAYING AND REORDERING . . . 2-76

INTRODUCTION TO THE F77 I/O LIBRARY
FORTRAN I/O . . . 2-79
    Types of I/O . . . 2-79
    Direct Access . . . 2-79
    Sequential Access . . . 2-79
    List Directed I/O . . . 2-79
    Internal I/O . . . 2-80
    I/O Execution . . . 2-80
IMPLEMENTATION DETAILS . . . 2-80
    Number of Logical Units . . . 2-80
    Standard Logical Units . . . 2-80
    Vertical Format Control . . . 2-81
    The Open Statement . . . 2-81
    Format Interpretation . . . 2-81
    List Directed Output . . . 2-82
    I/O Errors . . . 2-82
NON-"ANSI STANDARD" EXTENSIONS . . . 2-82
    Format Specifiers . . . 2-82
    Print Files . . . 2-83
    Scratch Files . . . 2-83
    List Directed I/O . . . 2-83
RUNNING OLDER PROGRAMS . . . 2-83
    Traditional Unit Control Parameters . . . 2-83
    Preattachment of Logical Units . . . 2-84
MAGNETIC TAPE I/O . . . 2-84
CAVEAT PROGRAMMER . . . 2-84
APPENDIX A: I/O LIBRARY ERROR MESSAGES . . . 2-85
APPENDIX B: EXCEPTIONS TO THE ANSI STANDARD . . . 2-88


A PORTABLE FORTRAN 77 COMPILER
INTRODUCTION . . . 2-89
    Usage . . . 2-89
    Documentation Conventions . . . 2-90
    Implementation Strategy . . . 2-91
LANGUAGE EXTENSIONS . . . 2-91
    Double Complex Data Type . . . 2-91
    Internal Files . . . 2-91
    Implicit Undefined Statement . . . 2-91
    Recursion . . . 2-91
    Automatic Storage . . . 2-91
    Source Input Format . . . 2-92
    Include Statement . . . 2-92
    Binary Initialization Constants . . . 2-92
    Character Strings . . . 2-92
    Hollerith . . . 2-93
    Equivalence Statements . . . 2-93
    One-Trip DO Loops . . . 2-93
    Commas in Formatted Input . . . 2-93
    Short Integers . . . 2-93
    Additional Intrinsic Functions . . . 2-94
VIOLATIONS OF THE STANDARD . . . 2-94
    Double Precision Alignment . . . 2-94
    Dummy Procedure Arguments . . . 2-94
    T and TL Formats . . . 2-94
    Carriage Control . . . 2-94
    Assigned Goto . . . 2-95
INTER-PROCEDURE INTERFACE . . . 2-95
    Procedure Names . . . 2-95
    Data Representations . . . 2-95
    Return Values . . . 2-95
    Argument Lists . . . 2-96
FILE FORMATS . . . 2-96
    Structure of Fortran Files . . . 2-96
    Portability Considerations . . . 2-97
    Pre-Connected Files and File Positions . . . 2-97

APPENDIX A: DIFFERENCES BETWEEN FORTRAN 66 AND FORTRAN 77 . . . 2-98
    Features Deleted from Fortran 66 . . . 2-98
    Hollerith . . . 2-98
    Extended Range . . . 2-98
    Program Form . . . 2-98
    Blank Lines . . . 2-98
    Program and Block Data Statements . . . 2-98
    ENTRY Statement . . . 2-98
    DO Loops . . . 2-99
    Alternate Returns . . . 2-99
    Declarations . . . 2-99
    CHARACTER Data Type . . . 2-99
    IMPLICIT Statement . . . 2-99
    PARAMETER Statement . . . 2-100
    Array Declarations . . . 2-100
    SAVE Statement . . . 2-100
    INTRINSIC Statement . . . 2-100
    Expressions . . . 2-100
    Character Constants . . . 2-100
    Concatenation . . . 2-101
    Character String Assignment . . . 2-101
    Substrings . . . 2-101
    Exponentiation . . . 2-101
    Relaxation of Restrictions . . . 2-101
    Executable Statements . . . 2-102
    IF-THEN-ELSE . . . 2-102
    Alternate Returns . . . 2-102
    Input/Output . . . 2-102
    Format Variables . . . 2-102
    END=, ERR=, and IOSTAT= Clauses . . . 2-103
    Formatted I/O . . . 2-103
    Character Constants . . . 2-103
    Positional Editing Codes . . . 2-103
    Colon . . . 2-103
    Optional Plus Signs . . . 2-104
    Blanks on Input . . . 2-104
    Unrepresentable Values . . . 2-104
    Iw.m . . . 2-104
    Floating Point . . . 2-104
    "A" Format Code . . . 2-104
    Standard Units . . . 2-104
    List-Directed Formatting . . . 2-105
    Direct I/O . . . 2-105
    Internal Files . . . 2-105


    OPEN, CLOSE, and INQUIRE Statements . . . 2-106
    OPEN . . . 2-106
    CLOSE . . . 2-106
    INQUIRE . . . 2-106
APPENDIX B: REFERENCES AND BIBLIOGRAPHY . . . 2-109

RATFOR: A PREPROCESSOR FOR A RATIONAL FORTRAN
INTRODUCTION . . . 2-111
LANGUAGE DESCRIPTION . . . 2-111
    Design . . . 2-111
    Statement Grouping . . . 2-112
    The "Else" Clause . . . 2-112
    Nested Ifs . . . 2-113
    If-Else Ambiguity . . . 2-113
    The "Switch" Statement . . . 2-114
    The "Do" Statement . . . 2-114
    "Break" and "Next" . . . 2-115
    The "While" Statement . . . 2-115
    The "For" Statement . . . 2-116
    The "Repeat-Until" Statement . . . 2-117
    More on Break and Next . . . 2-117
    "Return" Statement . . . 2-117
    Cosmetics . . . 2-117
    Free-Form Input . . . 2-117
    Translation Services . . . 2-118
    "Define" Statement . . . 2-118
    "Include" Statement . . . 2-118
    Pitfalls, Botches, Blemishes and Other Failings . . . 2-119
IMPLEMENTATION . . . 2-119
EXPERIENCE . . . 2-120
    Good Things . . . 2-120
    Bad Things . . . 2-120
CONCLUSIONS . . . 2-121
APPENDIX: USAGE ON UNIX AND GCOS . . . 2-122

THE PROGRAMMING LANGUAGE EFL
INTRODUCTION . . . 2-123
    Purpose . . . 2-123
    History . . . 2-123
    Notation . . . 2-123
LEXICAL FORM . . . 2-124
    Character Set . . . 2-124
    Lines . . . 2-124
    White Space . . . 2-124
    Comments . . . 2-124
    Include Files . . . 2-124
    Continuation . . . 2-124
    Multiple Statements on a Line . . . 2-125

    Tokens . . . 2-125
    Identifiers . . . 2-125
    Strings . . . 2-125
    Integer Constants . . . 2-126
    Floating Point Constants . . . 2-126
    Punctuation . . . 2-126
    Operators . . . 2-126
    Macros . . . 2-126
PROGRAM FORM . . . 2-127
    Files . . . 2-127
    Procedures . . . 2-127
    Blocks . . . 2-127
    Statements . . . 2-127
    Labels . . . 2-128
DATA TYPES AND VARIABLES . . . 2-128
    Basic Types . . . 2-128
    Constants . . . 2-128
    Variables . . . 2-129
    Storage Class . . . 2-129
    Scope of Names . . . 2-129
    Precision . . . 2-129
    Arrays . . . 2-129
    Structures . . . 2-130
EXPRESSIONS . . . 2-130
    Primaries . . . 2-130
    Constants . . . 2-131
    Variables . . . 2-131
    Array Elements . . . 2-131
    Structure Members . . . 2-131
    Procedure Invocations . . . 2-131
    Input/Output Expressions . . . 2-132
    Coercions . . . 2-132
    Sizes . . . 2-132
    Parentheses . . . 2-132
    Unary Operators . . . 2-132
    Arithmetic . . . 2-133
    Logical . . . 2-133
    Binary Operators . . . 2-133
    Arithmetic . . . 2-133
    Logical . . . 2-133
    Relational Operators . . . 2-134
    Assignment Operators . . . 2-134
    Dynamic Structures . . . 2-134
    Repetition Operator . . . 2-135
    Constant Expressions . . . 2-135

DECLARATIONS . . . 2-135
    Syntax . . . 2-135
    Attributes . . . 2-135
    Basic Types . . . 2-135
    Arrays . . . 2-136
    Structures . . . 2-136
    Precision . . . 2-136
    Common . . . 2-136
    External . . . 2-137
    Variable List . . . 2-137
    The Initial Statement . . . 2-137
EXECUTABLE STATEMENTS . . . 2-137
    Expression Statements . . . 2-137
    Subroutine Call . . . 2-137
    Assignment Statements . . . 2-138
    Blocks . . . 2-138
    Test Statements . . . 2-138
    If Statement . . . 2-138
    If-Else . . . 2-138
    Select Statement . . . 2-139
    Loops . . . 2-139
    While Statement . . . 2-139
    For Statement . . . 2-139
    Repeat Statement . . . 2-140
    Repeat ... Until Statement . . . 2-140
    Do Loops . . . 2-140
    Branch Statements . . . 2-141
    Goto Statement . . . 2-141
    Break Statement . . . 2-141
    Next Statement . . . 2-142
    Return . . . 2-142
    Input/Output Statements . . . 2-142
    Input/Output Units . . . 2-142
    Binary Input/Output . . . 2-143
    Formatted Input/Output . . . 2-143
    Iolists . . . 2-143
    Formats . . . 2-143
    Manipulation Statements . . . 2-144
PROCEDURES . . . 2-144
    Procedure Statement . . . 2-144
    End Statement . . . 2-145
    Argument Association . . . 2-145
    Execution and Return Values . . . 2-145
    Known Functions . . . 2-145
    Minimum and Maximum Functions . . . 2-145
    Absolute Value . . . 2-145
    Elementary Functions . . . 2-145
    Other Generic Functions . . . 2-146

ATAVISMS . . . 2-146
    Escape Lines . . . 2-146
    Call Statement . . . 2-146
    Obsolete Keywords . . . 2-146
    Numeric Labels . . . 2-146
    Implicit Declarations . . . 2-147
    Computed Goto . . . 2-147
    Go To Statement . . . 2-147
    Dot Names . . . 2-147
    Complex Constants . . . 2-148
    Function Values . . . 2-148
    Equivalence . . . 2-148
    Minimum and Maximum Functions . . . 2-148
COMPILER OPTIONS . . . 2-148
    Default Options . . . 2-149
    Input Language Options . . . 2-149
    Input/Output Error Handling . . . 2-149
    Continuation Conventions . . . 2-149
    Default Formats . . . 2-149
    Alignments and Sizes . . . 2-149
    Default Input/Output Units . . . 2-150
    Miscellaneous Output Control Options . . . 2-150
EXAMPLES . . . 2-150
    File Copying . . . 2-150
    Matrix Multiplication . . . 2-150
    Searching a Linked List . . . 2-150
    Walking a Tree . . . 2-151
PORTABILITY . . . 2-153
    Primitives . . . 2-153
    Character String Copying . . . 2-153
    Character String Comparisons . . . 2-154
APPENDIX A: RELATION BETWEEN EFL AND RATFOR . . . 2-155
APPENDIX B: COMPILER . . . 2-155
    Current Version . . . 2-155
    Diagnostics . . . 2-155
    Quality of Fortran Produced . . . 2-155
APPENDIX C: CONSTRAINTS ON THE DESIGN OF THE EFL LANGUAGE . . . 2-156
    External Names . . . 2-157
    Procedure Interface . . . 2-157
    Pointers . . . 2-157
    Recursion . . . 2-157
    Storage Allocation . . . 2-157


BERKELEY PASCAL USER'S MANUAL
SOURCES OF INFORMATION . . . 2-160
    Where To Get Documentation . . . 2-160
    Documentation Describing UNIX . . . 2-160
    Text Editing Documents . . . 2-161
    Pascal Documents: The Language . . . 2-161
    Pascal Documents: The Berkeley Implementation . . . 2-162
    References . . . 2-162
BASIC UNIX PASCAL . . . 2-165
    A First Program . . . 2-165
    A Larger Program . . . 2-168
    Correcting the First Errors . . . 2-169
    Executing the Second Example . . . 2-171
    Formatting the Program Listing . . . 2-173
    Execution Profiling . . . 2-173
ERROR DIAGNOSTICS . . . 2-177
    Translator Syntax Errors . . . 2-177
    Translator Semantic Errors . . . 2-180
    Translator Panics, I/O Errors . . . 2-184
INPUT/OUTPUT . . . 2-186
    Introduction . . . 2-186
    Eof and Eoln . . . 2-187
    More about Eoln . . . 2-188
    Output Buffering . . . 2-189
    Files, Reset, and Rewrite . . . 2-190
    Argc and Argv . . . 2-190
DETAILS ON THE COMPONENTS OF THE SYSTEM . . . 2-193
    Options . . . 2-193
    Options Common to Pi, Pc, and Pix . . . 2-193
    Options Available in Pi . . . 2-195
    Options Available in Px . . . 2-195
    Options Available in Pc . . . 2-195
    Options Available in Pxp . . . 2-196
    Formatting Programs using Pxp . . . 2-197
    Pxref . . . 2-199
    Multi-File Programs . . . 2-199
    Separate Compilation with Pc . . . 2-199
APPENDIX TO WIRTH'S PASCAL REPORT . . . 2-202
    Extensions to the Language Pascal . . . 2-202
    Resolution of the Undefined Specifications . . . 2-203
    Restrictions and Limitations . . . 2-206
    Added Types, Operators, Procedures and Functions . . . 2-207
    Remarks on Standard and Portable Pascal . . . 2-208


THE FRANZ LISP MANUAL
INTRODUCTION . . . 2-211
    Data Types . . . 2-211
    Lispval . . . 2-212
    Symbol . . . 2-212
    List . . . 2-212
    Binary . . . 2-213
    Fixnum . . . 2-213
    Flonum . . . 2-213
    Bignum . . . 2-213
    String . . . 2-214
    Port . . . 2-214
    Vector . . . 2-214
    Array . . . 2-214
    Value . . . 2-215
    Hunk . . . 2-215
    Other . . . 2-215
    Documentation . . . 2-215
DATA STRUCTURE ACCESS . . . 2-217
    Lists . . . 2-217
    List Creation . . . 2-217
    List Predicates . . . 2-219
    List Accessing . . . 2-219
    List Manipulation . . . 2-221
    Predicates . . . 2-223
    Symbols and Strings . . . 2-226
    Symbol and String Creation . . . 2-226
    String and Symbol Predicates . . . 2-228
    Symbol and String Accessing . . . 2-228
    Symbol and String Manipulation . . . 2-229
    Vectors . . . 2-231
    Vector Creation . . . 2-231
    Vector Reference . . . 2-231
    Vector Modification . . . 2-232
    Arrays . . . 2-232
    Array Creation . . . 2-232
    Array Predicate . . . 2-233
    Array Accessors . . . 2-233
    Array Manipulation . . . 2-234
    Hunks . . . 2-235
    Hunk Creation . . . 2-235
    Hunk Accessor . . . 2-236
    Hunk Manipulators . . . 2-236
    Bcds . . . 2-236

Structures . . . .

. 2-237

Assoc List . .
Property List
Tconc Structure .
Fclosures . .

. 2-237
. 2-238
. 2-240
. 2-240

Random Functions . .

. 2-241

ARITHMETIC FUNCTIONS

. 2-244

Simple Arithmetic Functions
Predicates . . . . . . .
Trignometric Functions .
Bignum Functions
Bit Manipulation . .
Other Functions . .

. 2-244
. 2-245
. 2-247
. 2-247
. 2-248
. 2-248

SPECIAL FUNCTIONS .
INPUT/OUTPUT . . . .
SYSTEM FUNCTIONS .
THE LISP READER

. 2-251
. 2-266
. 2-275
. 2-287

Introduction . . .
Syntax Classes . .
Reader Operations
Character Classes.
Syntax Classes . .
Character Macros.

. 2-287
. 2-287
. 2-288
. 2-288
. 2-291
. 2-293

Types . . . .

. 2-293

Normal
Splicing
Infix.

. 2-293
. 2-294
. 2-294

Invocations

. 2-295

Functions . . .

. 2-296

FUNCTIONS, FCLOSURES, AND MACROS
Valid Function Objects .
Functions . . . .
Macros . . . . . .
Macro Forms
Defmacro . ·.
The Backquote Character Macro .
Sharp Sign Character Macro . . .
Conditional Inclusion . . . .
Fixnum Character Equivalents
Read Time Evaluation
Fclosures . . . . . . .
An Example . . .
Useful Functions.
Internal Structure .
Foreign Subroutines and Functions

. 2-297
. 2-297
. 2-297
. 2-297
. 2-299
. 2-299
. 2-299
. 2-300
. 2-300
. 2-301
. 2-301
. 2-302
. 2-302
. 2-303
. 2-304
. 2-304


THE FRANZ LISP MANUAL (continued)
ARRAYS AND VECTORS. . . .

. 2-309

General Arrays . . . . . . .
Subparts of an Array Object

. 2-309
. 2-310

Access Function .
Auxiliary
Data .
Length .
Delta . .

. 2-310
. 2-310
. 2-310
. 2-310
. 2-310

The Maclisp Compatible Array Package .
Vectors . . . . . . .
Anatomy of Vectors.

. 2-310
. 2-311
. 2-312

Size . . . . . .
Property . . .
Internal Order.

. 2-312
. 2-312
. 2-312

Immediate-Vectors .

. 2-312

EXCEPTION HANDLING

. 2-314

Errset and Error Handler Functions .
The Anatomy of an Error .
Error Handling Algorithm.
Default Aids . . . .
Autoloading . . . . . . .
Interrupt Processing . . .

. 2-314
. 2-314
. 2-314
. 2-315
. 2-315
. 2-316

THE JOSEPH LISTER TRACE PACKAGE
LISZT - THE LISP COMPILER. . .

. 2-317
. 2-321

General Strategy of the Compiler
Running the Compiler
Special Forms . . . .

. 2-321
. 2-321
. 2-321

Macro Expansion
Classification .
Using the Compiler.
Compiler Options.
Autorun . . . .
Pure Literals . . .
Transfer Tables. .
Fixnum Functions
THE CMU USER TOPLEVEL AND THE FILE PACKAGE
User Command Input Top Level.
The File Package . . . .

. 2-321
. 2-322
. 2-323
. 2-324
. 2-326
. 2-327
. 2-327
. 2-328
. 2-329
. 2-329
. 2-330

THE LISP STEPPER . . . .

. 2-334

Simple Use of Stepping .
Advanced Features . . .

. 2-334
. 2-335

Selectively Turning On Stepping .
Stepping with Breakpoints .

. 2-335
. 2-336

Overhead of Stepping . . .
Evalhook and Funcallhook

. 2-336
. 2-336

THE FIXIT DEBUGGER . . .

. 2-338

. 2-338
. 2-340
. 2-340
. 2-340

Introduction . . . . .
Interaction with Trace
Interaction with Step .
Multiple Error Levels.

. 2-341

THE LISP EDITOR. .
The Editors . . .
Scope of Attention
Pattern Matching Commands .

. 2-341
. 2-341
. 2-342

Commands That Search .

. 2-343

Location Specifications .

. 2-344

The Edit Chain . . . . . . .

. 2-345

Printing Commands . . . . . . .
Structure Modification Commands
Extraction and Embedding Commands
Move and Copy Commands . . .
Parentheses Moving Commands .

. 2-345
. 2-345
. 2-346
. 2-347
. 2-347

Using To and Thru .

. 2-348

Undoing Commands . . .
Commands That Evaluate
Commands That Test . .
Editor Macros . . . . . .
Miscellaneous Editor Commands
Editor Functions . . . . . . . .

. 2-348
. 2-349
. 2-349
. 2-350
. 2-351
. 2-351

APPENDIX A: SPECIAL SYMBOLS
APPENDIX B: SHORT SUBJECTS .
The Garbage Collector . . .
Debugging . . . . . . . . .
The Interpreter's Top Level .

. 2-354
. 2-357
. 2-357
. 2-357
. 2-358

BERKELEY FP USER'S MANUAL
BACKGROUND . . . . . .
SYSTEM DESCRIPTION .

. 2-359
. 2-361

Objects . .
Application . . .
Functions . . .

. 2-361
. 2-361
. 2-362

Structural .
Predicate (Test) Functions .
Arithmetic/Logical .
Library Routines. .
Functional Forms. . . .
User Defined Functions.
GETTING ON AND OFF THE SYSTEM
Comments . . . .
Breaks . . . . . .
Non-Termination.

. 2-363
. 2-364
. 2-364
. 2-365
. 2-365
. 2-367
. 2-368
. 2-368
. 2-368
. 2-368


BERKELEY FP USER'S MANUAL (continued)
SYSTEM COMMANDS .

. 2-368

Load . . . . . . .
Save . . . . . . .
Csave and Fsave .
Cload .
Pfn ..
Delete .
Fns . .
Stats ..

. 2-368
. 2-368
. 2-368
. 2-369
. 2-369
. 2-369
. 2-369
. 2-369

On
Off.
Print
Reset .

. 2-370
. 2-370
. 2-370
. 2-370

Trace
Timer
Script
Help.
Special System Functions .

. 2-371
. 2-371
. 2-371
. 2-371
. 2-372

Lisp . . . . . . . .
Debug. . . . . . . .

. 2-372
. 2-372

PROGRAMMING EXAMPLES

. 2-373

MergeSort . . . . . . . .
FP Session . . . . . . . .

. 2-373
. 2-375

IMPLEMENTATION NOTES.

. 2-381

The Top Level . . .
The Scanner . . . .
The Parser . . . . .
The Code Generator
Function Definition and Application .
Function Naming Conventions
Measurement Implementation . . .
Data Structures . . . . . . . .
Interpretation of Data Structures .

. 2-381
. 2-381
. 2-381
. 2-382
. 2-383
. 2-383
. 2-383
. 2-383
. 2-384

Times . .
Size . . .
Funargno.
Funargtyp

. 2-384
. 2-384
. 2-384
. 2-384

Trace Information .

. 2-384

APPENDIX A: LOCAL MODIFICATIONS.

. 2-386

Character Set Changes . . .
Syntactic Modifications . . .

. 2-386
. 2-386

While and Conditional.
Function Definitions . .
Sequence Construction .

. 2-386
. 2-386
. 2-386

User Interface . . . . . .
Additions and Omissions .

. 2-387
. 2-387

APPENDIX B: FP GRAMMAR . . . . . . . . . . . . . . . . .
APPENDIX C: COMMAND SYNTAX . . . . . . . . . . . .
APPENDIX D: TOKEN-NAME CORRESPONDENCES . . .
APPENDIX E: SYMBOLIC PRIMITIVE FUNCTION NAMES

. 2-388
. 2-389
. 2-390
. 2-391

THE M4 MACRO PROCESSOR
INTRODUCTION . . .
USAGE . . . . . . . .
DEFINING MACROS .
QUOTING . . . . . .
ARGUMENTS . . . .
ARITHMETIC BUILT-INS
FILE MANIPULATION ..
SYSTEM COMMAND . .
CONDITIONALS . . . . .
STRING MANIPULATION
PRINTING . . . . . . . .
SUMMARY OF BUILT-INS .

. 2-393
. 2-393
. 2-393
. 2-394
. 2-395
. 2-395
. 2-396
. 2-396
. 2-397
. 2-397
. 2-397
. 2-398

PART 3: SUPPORTING TOOLS
AWK: A PATTERN SCANNING AND PROCESSING LANGUAGE
INTRODUCTION . . .

. 3-5

Usage . . . . . .
Program Structure
Records and Fields .
Printing . . . . .

. 3-5
. 3-5
. 3-5
. 3-6

PATTERNS . . . . . . .

. 3-6

BEGIN and END . .
Regular Expressions
Relational Expressions
Combinations of Patterns .
Pattern Ranges . .

. 3-6
. 3-7
. 3-7
. 3-7
. 3-7

ACTIONS . . . . . . . . . . .

. 3-7

Built-In Functions . . . .
Variables, Expressions, and Assignments.
Field Variables . . . . . . .
String Concatenation . . . .
Arrays . . . . . . . . . . .
Flow-of-Control Statements .
DESIGN . . . . . . .
IMPLEMENTATION . . . . . .

. 3-8
. 3-8
. 3-8
. 3-9
. 3-9
. 3-9
. 3-9
3-10


MAKE: A PROGRAM FOR MAINTAINING COMPUTER PROGRAMS
3-13
3-13
3-15
3-16
3-17
3-18
3-20
3-21

INTRODUCTION . . . . . . . . . . . . . . .
BASIC FEATURES . . . . . . . . . . . . . .
DESCRIPTION FILES AND SUBSTITUTIONS
COMMAND USAGE . . . . . . .
IMPLICIT RULES . . . . . . . . . . . . . .
EXAMPLE . . . . . . . . . . . . . . . . . .
SUGGESTIONS AND WARNINGS . . . . . .
APPENDIX: SUFFIXES AND TRANSFORMATION RULES.

AN INTRODUCTION TO THE SOURCE CODE CONTROL SYSTEM

The What Command . . . . . .
Where To Put ID Keywords . . .

3-23
3-23
3-23
3-24
3-24
3-24
3-24
3-25
3-25
3-25
3-25
3-26
3-26
3-26
3-26
3-27

Keeping SID's Consistent Across Files .
Creating New Releases . .

3-27
3-27

INTRODUCTION . . . . .
LEARNING THE LINGO .
S-file . . . . . . . .
Deltas . . . . . . . .
SID's (or, Version Numbers)
Id keywords . . . . . . . .

CREATING FILES . . . . . . .
GETTING FILES FOR COMPILATION .
CHANGING FILES (OR, CREATING DELTAS).
Getting a Copy To Edit . . . . . . . . . .
Merging the Changes Back into the S-File .
When To Make Deltas . . . . . . . . .
What's Going On: The Info Command.
ID Keywords . . . . . . . . . .

RESTORING OLD VERSIONS . .
Reverting to Old Versions . . .
Selectively Deleting Old Deltas
AUDITING CHANGES . . . . . .
The Prt Command . . . . . .
Finding Why Lines Were Inserted .
Finding What Changes You Have Made .
SHORTHAND NOTATIONS
Delget . . .
Fix . . . .
Unedit . . .
The -d Flag
USING SCCS ON A PROJECT
SAVING YOURSELF . . . . .
Recovering a Munged Edit File
Restoring the S-File . . . . .
USING THE ADMIN COMMAND.

3-27
3-27
3-28
3-28
3-28
3-29
3-29
3-29
3-29
3-29
3-29
3-30
3-30
3-30
3-30
3-30
3-31


MAINTAINING DIFFERENT VERSIONS (BRANCHES).

3-31

Creating a Branch . . . . . . . . . . . . .
Merging a Branch Back into the Main Trunk
A More Detailed Example.
A Warning . . . . . . . . . .

3-31
3-31
3-32
3-32

USING SCCS WITH MAKE. . . .

3-32

To Maintain Single Programs .
To Maintain a Library . . . .
To Maintain a Large Program .
Further Information.

3-33
3-33
3-34
3-35

QUICK REFERENCE .

3-36

Commands . .
Id Keywords . . .

3-36
3-37

LINT, A C PROGRAM CHECKER
INTRODUCTION AND USAGE . . . . . .
A WORD ABOUT PHILOSOPHY . . . . .
UNUSED VARIABLES AND FUNCTIONS
SET/USED INFORMATION.
FLOW OF CONTROL.
FUNCTION VALUES . . . .
TYPE CHECKING . . . . .
TYPE CASTS . . . . . . . .
NONPORTABLE CHARACTER USE
ASSIGNMENTS OF LONGS TO INTS
STRANGE CONSTRUCTIONS . . . .
ANCIENT HISTORY . . . . . . . . .
POINTER ALIGNMENT . . . . . . .
MULTIPLE USES AND SIDE EFFECTS
IMPLEMENTATION . . . . . . .
PORTABILITY . . . . . . . . . .
SHUTTING LINT UP. . . . . . .
LIBRARY DECLARATION FILES.
BUGS, ETC . . . . . . . . . . . .
APPENDIX: CURRENT LINT OPTIONS

3-39
3-39
3-39
3-40
3-40
3-41
3-41
3-42
3-42
3-42
3-43
3-43
3-44
3-44
3-44
3-45
3-46
3-47
3-47
3-50

A TUTORIAL INTRODUCTION TO ADB
INTRODUCTION .
A QUICK SURVEY .

3-51
3-51

Invocation .
Current Address
Formats .
General Request Meanings

3-51
3-51
3-52
3-52

DEBUGGING C PROGRAMS .

3-53

Debugging a Core Image
Multiple Functions .
Setting Breakpoints.
Advanced Breakpoint Usage.
Other Breakpoint Facilities .

3-53
3-54
3-55
3-56
3-58


A TUTORIAL INTRODUCTION TO ADB (continued)
MAPS . . . . . . . .
ADVANCED USAGE .

3-58
3-59

Formatted Dump .
Directory Dump .
Ilist Dump . . . .
Converting Values

3-59
3-61
3-61
3-61
3-62
3-62
3-77

PATCHING . . . .
ANOMALIES . . .
ADB SUMMARY .

YACC: YET ANOTHER COMPILER-COMPILER
INTRODUCTION . . . . .
BASIC SPECIFICATIONS.
ACTIONS . . . . . . . . .
LEXICAL ANALYSIS . . .
HOW THE PARSER WORKS .
AMBIGUITY AND CONFLICTS.
PRECEDENCE . . . . . . . .
ERROR HANDLING . . . . . .
THE YACC ENVIRONMENT . .
HINTS FOR PREPARING SPECIFICATIONS.
Input Style. . .
Left Recursion .
Lexical Tie-Ins .
Reserved Words
ADVANCED TOPICS .
Simulating Error and Accept in Actions .
Accessing Values in Enclosing Rules.
Support for Arbitrary Value Types ..
APPENDIX A: A SIMPLE EXAMPLE. .
APPENDIX B: YACC INPUT SYNTAX .
APPENDIX C: AN ADVANCED EXAMPLE.
APPENDIX D: OLD FEATURES SUPPORTED BUT NOT ENCOURAGED

3-79
3-81
3-83
3-84
3-86
3-89
3-92
3-94
3-96
3-97
3-97
3-97
3-98
3-98
3-99
3-99
3-99
3-99
. 3-102
. 3-104
. 3-106
. 3-111

LEX: A LEXICAL ANALYZER GENERATOR
INTRODUCTION . . . . . . . .
LEX SOURCE . . . . . . . . .
LEX REGULAR EXPRESSIONS
Operators . . . . .
Character Classes. . .
Arbitrary Character. .
Optional Expression .
Repeated Expressions .
ALTERNATION AND GROUPING
Context Sensitivity . . . . .
Repetitions and Definitions . .

. 3-113
. 3-115
. 3-115
. 3-115
. 3-116
. 3-116
. 3-116
. 3-116
. 3-116
. 3-116
. 3-117


LEX ACTIONS . . . . . . . . .
AMBIGUOUS SOURCE RULES .
LEX SOURCE DEFINITIONS.
USAGE . . .
UNIX.
GCOS.
TSO ..
LEX AND YACC .
EXAMPLES . . .
LEFT CONTEXT SENSITIVITY
CHARACTER SET . . . . . . .
SUMMARY OF SOURCE FORMAT .
CAVEATS AND BUGS . . . . . . .

. 3-117
. 3-119
. 3-120
. 3-120
. 3-121
. 3-121
. 3-121
. 3-121
. 3-121
. 3-123
. 3-124
. 3-124
. 3-125

PART 4: System Programming
UNIX IMPLEMENTATION
INTRODUCTION . . . . . . . . .
PROCESS CONTROL. . . . . . .
Process Creation and Program Execution.
Swapping . . . . . . . . . . .
Synchronization and Scheduling .

I/O SYSTEM . . . . . . .
Block I/O System . . .
Character I/O System .
Disk Drivers. . .
Character Lists .
Other Character Devices .

. 4-5
. 4-5
. 4-6
. 4-7
. 4-7
. 4-8
. 4-9
. 4-9
. 4-9
4-10
4-10

THE FILE SYSTEM . . . . . .

4-10

File System Implementation
Mounted File Systems .
Other System Functions

4-11
4-13
4-13

4.2BSD System Manual
NOTATION AND TYPES . .
KERNEL PRIMITIVES . . .

4-15
4-16

Processes and protection

4-17

Host and Process Identifiers
Process Creation and Termination
User and Group Ids
Process Groups . . .
Memory Management. . .
Text, Data and Stack
Mapping Pages . . .
Page Protection Control
Giving and Getting Advice .

4-17
4-17
4-18
4-19
4-20
4-20
4-20
4-21
4-21


4.2BSD System Manual
4-22

Signals . . . . . .

4-22
4-22
4-23
4-23
4-24
4-24
4-25
4-25
4-25
4-27

Overview . .
Signal Types
Signal Handlers .
Sending Signals .
Protecting Critical Sections
Signal Stacks
Timers . . . . . . .
Real Time . . .
Interval Time .
Descriptors. . . . .

4-27
4-27
4-27
4-27
4-28
4-30
4-30
4-30
4-31
4-32
4-32
4-32
4-32
4-33
4-34

The Reference Table .
Descriptor Properties.
Managing Descriptor References
Multiplexing Requests .
Descriptor Wrapping.
Resource Controls. . . . .
Process Priorities . .
Resource Utilization .
Resource Limits . . .
System Operation Support
Bootstrap Operations
Shutdown Operations
Accounting . .
SYSTEM FACILITIES .
Generic Operations .

4-34
4-34
4-35

Read and Write .
Input/Output Control
Nonblocking and Asynchronous Operations

Directory Creation and Removal.
File Creation . . . . . . . . .
Creating References to Devices .
Portal Creation . . . . . . . . .
File, Device, and Portal Removal

4-36
4-36
4-36
4-36
4-36
4-37
4-37
4-37
4-38

Reading and Modifying File Attributes
Links and Renaming. . .
Extension and Truncation
Checking Accessibility .
Locking . . .
Disk Quotas . . . . . .

4-38
4-39
4-39
4-41
4-41
4-41

File System .
Overview
Naming.
Creation and Removal .

Interprocess Communications . . . . . . . .
Interprocess Communication Primitives .
Communication Domains . . . . .
Socket Types and Protocols . . . .
Socket Creation, Naming and Service Establishment .
Accepting Connections . . .
Making Connections . . . . . . . . . . . .
Sending and Receiving Data. . . . . . . . .
Scatter/Gather and Exchanging Access Rights
Using Read and Write with Sockets . . . . .
Shutting Down Halves of Full-Duplex Connections .
Socket and Protocol Options

UNIX Domain. . . . .
Types of Sockets .
Naming . . . . .
Access Rights Transmission .
INTERNET Domain. . . . . . .
Socket Types and Protocols .
Socket Naming . . . . . . .
Access Rights Transmission .
Raw Access.
Terminals and Devices
Terminals . . . .
Terminal Input .
Input Modes
Interrupt Characters .
Line Editing . . . . .

Terminal Output . . . . .
Terminal Control Operations
Terminal Hardware Support.

Structured Devices . . . .
Unstructured Devices . . .
Process and Kernel Descriptors
SUMMARY OF FACILITIES . . .

4-42
4-42
4-42
4-42
4-42
4-43
4-43
4-44
4-44
4-45
4-45
4-45
4-45
4-45
4-45
4-46
4-46
4-46
4-46
4-46
4-46
4-47
4-47
4-47
4-47
4-47
4-47
4-47
4-47
4-48
4-48
4-48
4-49
4-50

BERKELEY VAX/UNIX ASSEMBLER REFERENCE MANUAL

INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Assembler Revisions Since November 5, 1979 . . . . . . . . . . . . .
Features Supported, But No Longer Encouraged as of February 9, 1983
USAGE . . . . . . . . . . .
LEXICAL CONVENTIONS .
Identifiers . . . . . .
Constants . . . . . . .
Scalar Constants .
Floating Point Constants.
String Constants . . . . .

4-53
4-53
4-53
4-53
4-54
4-54
4-54
4-54
4-55
4-55


BERKELEY VAX/UNIX ASSEMBLER REFERENCE MANUAL (continued)
Operators . . . . . . . .
Blanks . . . . . . . . . .
Scratch Mark Comments .
"C" Style Comments . . .
SEGMENTS AND LOCATION COUNTERS.
STATEMENTS . . . . . .

4-55
4-55
4-55
4-56
4-56
4-56

Named Global Labels.
Numeric Local Labels.
Null Statements . . .
Keyword Statements .

4-56
4-56
4-57
4-57

EXPRESSIONS. . . . . .

4-57
4-57
4-57
4-58
4-59
4-59
4-60
4-60
4-61
4-61

Expression Operators .
Data Types. . . . . .
TYPE PROPAGATION IN EXPRESSIONS
PSEUDO-OPERATIONS (DIRECTIVES)
Interface to a Previous Pass .
Location Counter Control .
Filled Data . . . . .
Symbol Definitions . . .
Initialized Data. . . . .
MACHINE INSTRUCTIONS
Character Set . . . . .
Specifying Displacement Lengths
Casex Instructions . . . . . .
Extended Branch Instructions .
DIAGNOSTICS . . . . . . . . . .
LIMITS . . . . . . . . . . . . . .
ANNOYANCES AND FUTURE WORK

4-63
4-63
4-63
4-64
4-64
4-64
4-64
4-65

THE UNIX I/O SYSTEM
DEVICE CLASSES . . . . . . .
OVERVIEW OF I/O . . . . . . . . .
CHARACTER DEVICE DRIVERS . .
THE BLOCK-DEVICE INTERFACE.
BLOCK DEVICE DRIVERS .
RAW BLOCK-DEVICE I/O . . . . .

4-67
4-67
4-68
4-70
4-72
4-73

SCREEN UPDATING AND CURSOR MOVEMENT OPTIMIZATION

Naming Conventions .

4-75
4-75
4-75
4-76
4-76

VARIABLES . . . . . . .

4-77

OVERVIEW. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
TERMINOLOGY (OR, WORDS YOU CAN SAY TO SOUND BRILLIANT) .
COMPILING THINGS . .
SCREEN UPDATING. . .

USAGE. . . . . . . .

4-77

Starting Up . . .
The Nitty-Gritty .

4-77
4-78

Output . . .
Input . . . .
Miscellaneous .

4-78
4-78
4-78

Finishing up . . . .

4-78

CURSOR MOTION OPTIMIZATION: STANDING ALONE

4-78

Terminal Information. . . . . . . . . . . . . . .
Movement Optimizations, or, Getting Over Yonder.

4-79
4-80

THE FUNCTIONS . .

4-80

Output Functions.
Input Functions .
Miscellaneous Functions
Details . .

4-80
4-84
4-85
4-87

APPENDIX A. . . . . . . .

4-89

Capabilities from Termcap
Disclaimer . . . . .
Overview . . . . . .
Variables Set By Setterm().
Variables Set By Gettmode() .

4-89
4-89
4-89
4-89
4-90

APPENDIX B. . . . . . . .

4-91

The WINDOW structure

4-91

APPENDIX C. . . .

4-92

Examples . . .
Screen Updating
Twinkle . . . .
Life . . . . . .
Motion Optimization .

4-92
4-92
4-92
4-94
4-97

Twinkle . . . . .

4-97

4.2BSD LINE PRINTER SPOOLER MANUAL
OVERVIEW. . . . . . . . . . .
COMMANDS . . . . . . . . . .
LPD - Line Printer Daemon
LPQ - Show Line Printer Queue
LPRM - Remove Jobs from a Queue.
LPC - Line Printer Control Program.

4-99
4-99
4-99
. 4-100
. 4-100
. 4-100

ACCESS CONTROL . . . .
SETTING UP . . . . . . . .

. 4-100
. 4-101

Creating a Printcap File

. 4-101

Printers on Serial Lines
Remote Printers .

. 4-101
. 4-101

Output Filters . . . . . . .

. 4-102


4.2BSD LINE PRINTER SPOOLER MANUAL (continued)
OUTPUT FILTER SPECIFICATIONS.
LINE PRINTER ADMINISTRATION
TROUBLESHOOTING
LPR ..
LPQ ..
LPRM.
LPD.
LPC ..

. 4-102
. 4-103
. 4-103
. 4-103
. 4-104
. 4-105
. 4-105
. 4-105

Introduction 1-1

PART 1: PROGRAMMING CONSIDERATIONS

This part contains one article, "UNIX Programming - Second Edition," by Kernighan and
Ritchie. The article gives background information that will help you write programs that
make full use of the ULTRIX-32 system. Readers should be familiar with the fundamentals
of the ULTRIX-32 system (or the UNIX system). Although the techniques shown in the article apply to programming in any language available on the ULTRIX-32 system, the sample
programs are written in the C language.
The authors explain how to:
• Pass arguments to and from a program
• Send program output to a file, to a pipe, or to a terminal
• Use the standard I/O (input/output) library
• Handle I/O errors
• Use low level I/O
• Execute a program from within another
• Handle signals (interrupts)

UNIX Programming - Second Edition

Brian W. Kernighan

Dennis M. Ritchie
Bell Laboratories
Murray Hill, New Jersey 07974

1. INTRODUCTION
This paper describes how to write programs that interface with the UNIX operating
system in a non-trivial way. This includes programs that use files by name, that use
pipes, that invoke other commands as they run, or that attempt to catch interrupts and
other signals during execution.
The document collects material which is scattered throughout several sections of The
UNIX Programmer's Manual [1] for Version 7 UNIX. There is no attempt to be complete;
only generally useful material is dealt with. It is assumed that you will be programming
in C, so you must be able to read the language roughly up to the level of The C Programming Language [2]. Some of the material in sections 2 through 4 is based on topics
covered more carefully there. You should also be familiar with UNIX itself at least to the
level of UNIX for Beginners [3].
2. BASICS
2.1. Program Arguments
When a C program is run as a command, the arguments on the command line are
made available to the function main as an argument count argc and an array argv of
pointers to character strings that contain the arguments. By convention, argv[0] is the
command name itself, so argc is always greater than 0.
The following program illustrates the mechanism: it simply echoes its arguments back
to the terminal. (This is essentially the echo command.)
main(argc, argv)	/* echo arguments */
int argc;
char *argv[];
{
	int i;

	for (i = 1; i < argc; i++)
		printf("%s%c", argv[i], (i<argc-1) ? ' ' : '\n');
}

argv is a pointer to an array whose individual elements are pointers to arrays of characters; each is terminated by \0, so they can be treated as strings. The program starts by
printing argv[1] and loops until it has printed them all.
The argument count and the arguments are parameters to main. If you want to keep
them around so other routines can get at them, you must copy them to external variables.


UNIX is a Trademark of Bell Laboratories

2.2. The "Standard Input" and "Standard Output"
The simplest input mechanism is to read the "standard input," which is generally the
user's terminal. The function getchar returns the next input character each time it is
called. A file may be substituted for the terminal by using the < convention: if prog uses
getchar, then the command line

prog <file

causes prog to read file instead of the terminal. prog itself need know nothing about
where its input is coming from. This is also true if the input comes from another program
via the pipe mechanism:

otherprog | prog

provides the standard input for prog from the standard output of otherprog.
getchar returns the value EOF when it encounters the end of file (or an error) on
whatever you are reading. The value of EOF is normally defined to be -1, but it is unwise
to take any advantage of that knowledge. As will become clear shortly, this value is
automatically defined for you when you compile a program, and need not be of any concern.
Similarly, putchar(c) puts the character c on the "standard output," which is also
by default the terminal. The output can be captured on a file by using >: if prog uses
putchar,

prog >outfile

writes the standard output on outfile instead of the terminal. outfile is created if it
doesn't exist; if it already exists, its previous contents are overwritten. And a pipe can be
used:

prog | otherprog

puts the standard output of prog into the standard input of otherprog.
The function printf, which formats output in various ways, uses the same mechanism
as putchar does, so calls to printf and putchar may be intermixed in any order; the
output will appear in the order of the calls.
Similarly, the function scanf provides for formatted input conversion; it will read the
standard input and break it up into strings, numbers, etc., as desired. scanf uses the
same mechanism as getchar, so calls to them may also be intermixed.
Many programs read only one input and write one output; for such programs I/O with
getchar, putchar, scanf, and printf may be entirely adequate, and it is almost
always enough to get started. This is particularly true if the UNIX pipe facility is used to
connect the output of one program to the input of the next. For example, the following
program strips out all ASCII control characters from its input (except for newline and tab).

#include <stdio.h>

main()	/* ccstrip: strip non-graphic characters */
{
	int c;

	while ((c = getchar()) != EOF)
		if ((c >= ' ' && c < 0177) || c == '\t' || c == '\n')
			putchar(c);
	exit(0);
}

The line

#include <stdio.h>


should appear at the beginning of each source file. It causes the C compiler to read a file
(/usr/include/stdio.h) of standard routines and symbols that includes the definition of EOF.
If it is necessary to treat multiple files, you can use cat to collect the files for you:

cat file1 file2 ... | ccstrip >output

and thus avoid learning how to access files from a program. By the way, the call to exit
at the end is not necessary to make the program work properly, but it assures that any
caller of the program will see a normal termination status (conventionally 0) from the program when it completes. Section 6 discusses status returns in more detail.
3. THE STANDARD I/O LIBRARY
The "Standard I/O Library" is a collection of routines intended to provide efficient
and portable I/O services for most C programs. The standard I/O library is available on
each system that supports C, so programs that confine their system interactions to its
facilities can be transported from one system to another essentially without change.
In this section, we will discuss the basics of the standard I/O library. The appendix
contains a more complete description of its capabilities.
3.1. File Access
The programs written so far have all read the standard input and written the standard
output, which we have assumed are magically pre-defined. The next step is to write a program that accesses a file that is not already connected to the program. One simple example is wc, which counts the lines, words and characters in a set of files. For instance, the
command

wc x.c y.c

prints the number of lines, words and characters in x.c and y.c and the totals.
The question is how to arrange for the named files to be read - that is, how to connect the file system names to the I/O statements which actually read the data.
The rules are simple. Before it can be read or written a file has to be opened by the
standard library function fopen. fopen takes an external name (like x.c or y.c), does
some housekeeping and negotiation with the operating system, and returns an internal
name which must be used in subsequent reads or writes of the file.
This internal name is actually a pointer, called a file pointer, to a structure which contains information about the file, such as the location of a buffer, the current character
position in the buffer, whether the file is being read or written, and the like. Users don't
need to know the details, because part of the standard I/O definitions obtained by including stdio.h is a structure definition called FILE. The only declaration needed for a file
pointer is exemplified by

FILE *fp, *fopen();

This says that fp is a pointer to a FILE, and fopen returns a pointer to a FILE. (FILE is
a type name, like int, not a structure tag.)
The actual call to fopen in a program is

fp = fopen(name, mode);

The first argument of fopen is the name of the file, as a character string. The second
argument is the mode, also as a character string, which indicates how you intend to use
the file. The only allowable modes are read ("r"), write ("w"), or append ("a").
If a file that you open for writing or appending does not exist, it is created (if possible). Opening an existing file for writing causes the old contents to be discarded. Trying
to read a file that does not exist is an error, and there may be other causes of error as well
(like trying to read a file when you don't have permission). If there is any error, fopen
will return the null pointer value NULL (which is defined as zero in stdio.h).
The next thing needed is a way to read or write the file once it is open. There are
several possibilities, of which getc and putc are the simplest. getc returns the next
character from a file; it needs the file pointer to tell it what file. Thus
c = getc(fp)

places in c the next character from the file referred to by fp; it returns EOF when it
reaches end of file. putc is the inverse of getc:
putc(c, fp)

puts the character c on the file fp and returns c. getc and putc return EOF on error.
When a program is started, three files are opened automatically, and file pointers are
provided for them. These files are the standard input, the standard output, and the standard error output; the corresponding file pointers are called stdin, stdout, and stderr.
Normally these are all connected to the terminal, but may be redirected to files or pipes as
described in Section 2.2. stdin, stdout and stderr are pre-defined in the I/O library
as the standard input, output and error files; they may be used anywhere an object of type
FILE * can be. They are constants, however, not variables, so don't try to assign to them.
With some of the preliminaries out of the way, we can now write wc. The basic design
is one that has been found convenient for many programs: if there are command-line arguments, they are processed in order. If there are no arguments, the standard input is processed. This way the program can be used stand-alone or as part of a larger process.

#include <stdio.h>

main(argc, argv)	/* wc: count lines, words, chars */
int argc;
char *argv[];
{
	int c, i, inword;
	FILE *fp, *fopen();
	long linect, wordct, charct;
	long tlinect = 0, twordct = 0, tcharct = 0;

	i = 1;
	fp = stdin;	/* used when there are no arguments */
	do {
		if (argc > 1 && (fp=fopen(argv[i], "r")) == NULL) {
			fprintf(stderr, "wc: can't open %s\n", argv[i]);
			continue;
		}
		linect = wordct = charct = inword = 0;
		while ((c = getc(fp)) != EOF) {
			charct++;
			if (c == '\n')
				linect++;
			if (c == ' ' || c == '\t' || c == '\n')
				inword = 0;	/* white space ends a word */
			else if (inword == 0) {
				inword = 1;	/* first character of a new word */
				wordct++;
			}
		}
		printf("%7ld %7ld %7ld", linect, wordct, charct);
		printf(argc > 1 ? " %s\n" : "\n", argv[i]);
		fclose(fp);
		tlinect += linect;
		twordct += wordct;
		tcharct += charct;
	} while (++i < argc);
	if (argc > 2)
		printf("%7ld %7ld %7ld total\n", tlinect, twordct, tcharct);
	exit(0);
}

The function fprintf is identical to printf, save that the first argument is a file pointer
that specifies the file to be written.
The function fclose is the inverse of fopen; it breaks the connection between the
file pointer and the external name that was established by fopen, freeing the file pointer
for another file. Since there is a limit on the number of files that a program may have
open simultaneously, it's a good idea to free things when they are no longer needed.
There is also another reason to call fclose on an output file - it flushes the buffer in
which putc is collecting output. (fclose is called automatically for each open file when
a program terminates normally.)
3.2. Error Handling - Stderr and Exit
stderr is assigned to a program in the same way that stdin and stdout are. Output written on stderr appears on the user's terminal even if the standard output is
redirected. wc writes its diagnostics on stderr instead of stdout so that if one of the
files can't be accessed for some reason, the message finds its way to the user's terminal
instead of disappearing down a pipeline or into an output file.

The program actually signals errors in another way, using the function exit to terminate program execution. The argument of exit is available to whatever process called
it (see Section 6), so the success or failure of the program can be tested by another program that uses this one as a sub-process. By convention, a return value of 0 signals that
all is well; non-zero values signal abnormal situations.
exit itself calls fclose for each open output file, to flush out any buffered output,
then calls a routine named _exit. The function _exit causes immediate termination
without any buffer flushing; it may be called directly if desired.
3.3. Miscellaneous I/O Functions

The standard I/O library provides several other I/O functions besides those we have
illustrated above.
Normally output with putc, etc., is buffered (except to stderr); to force it out
immediately, use fflush(fp).
fscanf is identical to scanf, except that its first argument is a file pointer (as with
fprintf) that specifies the file from which the input comes; it returns EOF at end of file.
The functions sscanf and sprintf are identical to fscanf and fprintf, except
that the first argument names a character string instead of a file pointer. The conversion
is done from the string for sscanf and into it for sprintf.
fgets(buf, size, fp) copies the next line from fp, up to and including a newline,
into buf; at most size-1 characters are copied; it returns NULL at end of file.
fputs(buf, fp) writes the string in buf onto file fp.
The function ungetc(c, fp) "pushes back" the character c onto the input stream
fp; a subsequent call to getc, fscanf, etc., will encounter c. Only one character of
pushback per file is permitted.
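As a small sketch of the in-memory conversions described above, the following helper reads two integers out of a string with sscanf and formats a result back into a string with sprintf. The name sum_line is hypothetical (it is not a library routine), the caller is assumed to supply an output buffer of at least 64 bytes, and the code is written in ANSI C rather than the older style of the listings in this paper.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse two integers from line and store
   "a + b = sum" in out (assumed to hold at least 64 bytes).
   Returns 0 on success, -1 if line does not hold two integers. */
int sum_line(const char *line, char *out)
{
    int a, b;

    assert(line != NULL && out != NULL);
    if (sscanf(line, "%d %d", &a, &b) != 2)     /* conversion from the string */
        return -1;
    sprintf(out, "%d + %d = %d", a, b, a + b);  /* conversion into the string */
    return 0;
}
```

Checking the count returned by sscanf is what distinguishes a well-formed line from a malformed one.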
4. LOW-LEVEL I/O
This section describes the bottom level of I/O on the UNIX system. The lowest level
of I/O in UNIX provides no buffering or any other services; it is in fact a direct entry into
the operating system. You are entirely on your own, but on the other hand, you have the
most control over what happens. And since the calls and usage are quite simple, this isn't
as bad as it sounds.

4.1. File Descriptors
In the UNIX operating system, all input and output is done by reading or writing files,
because all peripheral devices, even the user's terminal, are files in the file system. This
means that a single, homogeneous interface handles all communication between a program
and peripheral devices.
In the most general case, before reading or writing a file, it is necessary to inform the
system of your intent to do so, a process called "opening" the file. If you are going to
write on a file, it may also be necessary to create it. The system checks your right to do
so (Does the file exist? Do you have permission to access it?), and if all is well, returns a
small positive integer called a file descriptor. Whenever I/O is to be done on the file, the
file descriptor is used instead of the name to identify the file. (This is roughly analogous
to the use of READ(5,... ) and WRITE(6,... ) in Fortran.) All information about an open file is
maintained by the system; the user program refers to the file only by the file descriptor.
The file pointers discussed in section 3 are similar in spirit to file descriptors, but file
descriptors are more fundamental. A file pointer is a pointer to a structure that contains,
among other things, the file descriptor for the file in question.
Since input and output involving the user's terminal are so common, special arrangements exist to make this convenient. When the command interpreter (the "shell") runs a
program, it opens three files, with file descriptors 0, 1, and 2, called the standard input,
the standard output, and the standard error output. All of these are normally connected
to the terminal, so if a program reads file descriptor 0 and writes file descriptors 1 and 2,
it can do terminal I/O without worrying about opening the files.
If I/O is redirected to and from files with < and >, as in
    prog <infile >outfile
the shell changes the default assignments for file descriptors 0 and 1 from the terminal to
the named files. Similar observations hold if the input or output is associated with a pipe.
Normally file descriptor 2 remains attached to the terminal, so error messages can go
there. In all cases, the file assignments are changed by the shell, not by the program. The
program does not need to know where its input comes from nor where its output goes, so
long as it uses file 0 for input and 1 and 2 for output.
4.2. Read and Write
All input and output is done by two functions called read and write. For both, the
first argument is a file descriptor. The second argument is a buffer in your program where
the data is to come from or go to. The third argument is the number of bytes to be
transferred. The calls are
    n_read = read(fd, buf, n);
    n_written = write(fd, buf, n);
Each call returns a byte count which is the number of bytes actually transferred. On
reading, the number of bytes returned may be less than the number asked for, because
fewer than n bytes remained to be read. (When the file is a terminal, read normally
reads only up to the next newline, which is generally less than what was requested.) A
return value of zero bytes implies end of file, and -1 indicates an error of some sort. For
writing, the returned value is the number of bytes actually written; it is generally an error
if this isn't equal to the number supposed to be written.
The number of bytes to be read or written is quite arbitrary. The two most common
values are 1, which means one character at a time ("unbuffered"), and 512, which
corresponds to a physical blocksize on many peripheral devices. This latter size will be
most efficient, but even character at a time I/O is not inordinately expensive.
Putting these facts together, we can write a simple program to copy its input to its
output. This program will copy anything to anything, since the input and output can be
redirected to any file or device.
    #define BUFSIZE 512    /* best size for PDP-11 UNIX */

    main()    /* copy input to output */
    {
        char buf[BUFSIZE];
        int n;

        while ((n = read(0, buf, BUFSIZE)) > 0)
            write(1, buf, n);
        exit(0);
    }
If the file size is not a multiple of BUFSIZE, some read will return a smaller number of
bytes to be written by write; the next call to read after that will return zero.
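The same loop can be packaged as a function that also checks the value returned by write. The sketch below is a variation, not the program above: copy_fd is a hypothetical name, the code is ANSI C for a POSIX-style system, and it retries a short write instead of treating it as an immediate error.

```c
#include <assert.h>
#include <unistd.h>

#define BUFSIZE 512

/* Hypothetical helper: copy everything from descriptor in to
   descriptor out.  Returns the number of bytes copied, or -1 on a
   read or write error. */
long copy_fd(int in, int out)
{
    char buf[BUFSIZE];
    long total = 0;
    ssize_t n, w, done;

    assert(in >= 0 && out >= 0);
    while ((n = read(in, buf, BUFSIZE)) > 0) {
        /* write may accept fewer bytes than asked; send the rest */
        for (done = 0; done < n; done += w) {
            w = write(out, buf + done, (size_t)(n - done));
            if (w <= 0)
                return -1;
        }
        total += n;
    }
    return n < 0 ? -1 : total;  /* n == 0 means end of file */
}
```

Calling copy_fd(0, 1) reproduces the behavior of the copy program: standard input is copied to standard output until read returns zero.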

It is instructive to see how read and write can be used to construct higher level routines like getchar, putchar, etc. For example, here is a version of getchar which does
unbuffered input.

    #define CMASK 0377    /* for making char's > 0 */

    getchar()    /* unbuffered single character input */
    {
        char c;

        return((read(0, &c, 1) > 0) ? c & CMASK : EOF);
    }

c must be declared char, because read accepts a character pointer. The character being
returned must be masked with 0377 to ensure that it is positive; otherwise sign extension
may make it negative. (The constant 0377 is appropriate for the PDP-11 but not necessarily for other machines.)
The second version of getchar does input in big chunks, and hands out the characters one at a time.
    #define CMASK 0377    /* for making char's > 0 */
    #define BUFSIZE 512

    getchar()    /* buffered version */
    {
        static char buf[BUFSIZE];
        static char *bufp = buf;
        static int n = 0;

        if (n == 0) {    /* buffer is empty */
            n = read(0, buf, BUFSIZE);
            bufp = buf;
        }
        return((--n >= 0) ? *bufp++ & CMASK : EOF);
    }

4.3. Open, Creat, Close, Unlink
Other than the default standard input, output and error files, you must explicitly open
files in order to read or write them. There are two system entry points for this, open and
creat [sic].
open is rather like the fopen discussed in the previous section, except that instead of
returning a file pointer, it returns a file descriptor, which is just an int.

    int fd;
    fd = open(name, rwmode);

As with fopen, the name argument is a character string corresponding to the external file
name. The access mode argument is different, however: rwmode is 0 for read, 1 for write,
and 2 for read and write access. open returns -1 if any error occurs; otherwise it returns
a valid file descriptor.
It is an error to try to open a file that does not exist. The entry point creat is provided to create new files, or to re-write old ones.

    fd = creat(name, pmode);

returns a file descriptor if it was able to create the file called name, and -1 if not. If the
file already exists, creat will truncate it to zero length; it is not an error to creat a file
that already exists.
If the file is brand new, creat creates it with the protection mode specified by the
pmode argument. In the UNIX file system, there are nine bits of protection information
associated with a file, controlling read, write and execute permission for the owner of the
file, for the owner's group, and for all others. Thus a three-digit octal number is most
convenient for specifying the permissions. For example, 0755 specifies read, write and
execute permission for the owner, and read and execute permission for the group and
everyone else.
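To make the nine protection bits concrete, the sketch below expands a mode into the familiar rwx notation, three characters each for owner, group, and others. The function name mode_string is hypothetical and is used here only for illustration; the code is ANSI C.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: expand the nine protection bits of mode into
   a string such as "rwxr-xr-x".  out must hold at least 10 bytes. */
void mode_string(unsigned mode, char out[10])
{
    static const char letters[] = "rwxrwxrwx";
    int i;

    assert(out != NULL);
    memcpy(out, "---------", 10);   /* nine dashes plus the final NUL */
    for (i = 0; i < 9; i++)
        if (mode & (0400u >> i))    /* 0400 is the owner's read bit */
            out[i] = letters[i];
}
```

With this helper, 0755 expands to rwxr-xr-x and 0644 to rw-r--r--.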
To illustrate, here is a simplified version of the UNIX utility cp, a program which
copies one file to another. (The main simplification is that our version copies only one
file, and does not permit the second argument to be a directory.)
    #define NULL 0
    #define BUFSIZE 512
    #define PMODE 0644    /* RW for owner, R for group, others */

    main(argc, argv)    /* cp: copy f1 to f2 */
    int argc;
    char *argv[];
    {
        int f1, f2, n;
        char buf[BUFSIZE];

        if (argc != 3)
            error("Usage: cp from to", NULL);
        if ((f1 = open(argv[1], 0)) == -1)
            error("cp: can't open %s", argv[1]);
        if ((f2 = creat(argv[2], PMODE)) == -1)
            error("cp: can't create %s", argv[2]);

        while ((n = read(f1, buf, BUFSIZE)) > 0)
            if (write(f2, buf, n) != n)
                error("cp: write error", NULL);
        exit(0);
    }

    error(s1, s2)    /* print error message and die */
    char *s1, *s2;
    {
        printf(s1, s2);
        printf("\n");
        exit(1);
    }
As we said earlier, there is a limit (typically 15-25) on the number of files which a program may have open simultaneously. Accordingly, any program which intends to process
many files must be prepared to re-use file descriptors. The routine close breaks the connection between a file descriptor and an open file, and frees the file descriptor for use with
some other file. Termination of a program via exit or return from the main program
closes all open files.
The function unlink(filename) removes the file filename from the file system.

4.4. Random Access - Seek and Lseek
File I/O is normally sequential: each read or write takes place at a position in the
file right after the previous one. When necessary, however, a file can be read or written in
any arbitrary order. The system call lseek provides a way to move around in a file
without actually reading or writing:
    lseek(fd, offset, origin);

forces the current position in the file whose descriptor is fd to move to position offset,
which is taken relative to the location specified by origin. Subsequent reading or writing will begin at that position. offset is a long; fd and origin are int's. origin can
be 0, 1, or 2 to specify that offset is to be measured from the beginning, from the
current position, or from the end of the file respectively. For example, to append to a file,
seek to the end before writing:
    lseek(fd, 0L, 2);
To get back to the beginning ("rewind"),
    lseek(fd, 0L, 0);
Notice the 0L argument; it could also be written as (long) 0.
With lseek, it is possible to treat files more or less like large arrays, at the price of
slower access. For example, the following simple function reads any number of bytes from
any arbitrary place in a file.
    get(fd, pos, buf, n)    /* read n bytes from position pos */
    int fd, n;
    long pos;
    char *buf;
    {
        lseek(fd, pos, 0);    /* get to pos */
        return(read(fd, buf, n));
    }
In pre-version 7 UNIX, the basic entry point to the I/O system is called seek. seek is
identical to lseek, except that its offset argument is an int rather than a long.
Accordingly, since PDP-11 integers have only 16 bits, the offset specified for seek is
limited to 65,535; for this reason, origin values of 3, 4, 5 cause seek to multiply the
given offset by 512 (the number of bytes in one physical block) and then interpret origin
as if it were 0, 1, or 2 respectively. Thus to get to an arbitrary place in a large file
requires two seeks, first one which selects the block, then one which has origin equal to
1 and moves to the desired byte within the block.
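The append idiom and the get routine can be combined in a short sketch. It assumes a later POSIX-style system, where the origin values 0, 1, and 2 are also spelled SEEK_SET, SEEK_CUR, and SEEK_END and open accepts a creation flag; the names append_record and read_at are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append the string rec to the file path,
   creating the file if needed.  Returns the offset at which rec was
   written, or -1 on error. */
long append_record(const char *path, const char *rec)
{
    int fd;
    long where;
    size_t len;

    assert(path != NULL && rec != NULL);
    len = strlen(rec);
    if ((fd = open(path, O_WRONLY | O_CREAT, 0644)) == -1)
        return -1;
    where = (long)lseek(fd, 0L, SEEK_END);  /* origin 2: end of file */
    if (where == -1 || write(fd, rec, len) != (ssize_t)len)
        where = -1;
    close(fd);
    return where;
}

/* Hypothetical helper: read up to n bytes starting at offset pos.
   Returns the count from read, or -1 on error. */
long read_at(const char *path, long pos, char *buf, int n)
{
    int fd;
    long got = -1;

    assert(path != NULL && buf != NULL && n >= 0);
    if ((fd = open(path, O_RDONLY)) == -1)
        return -1;
    if (lseek(fd, pos, SEEK_SET) != -1)     /* origin 0: beginning */
        got = (long)read(fd, buf, (size_t)n);
    close(fd);
    return got;
}
```

Because lseek returns the resulting position, seeking to the end also reports the current size of the file, which is how append_record learns where its record begins.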
4.5. Error Processing
The routines discussed in this section, and in fact all the routines which are direct
entries into the system can incur errors. Usually they indicate an error by returning a
value of -1. Sometimes it is nice to know what sort of error occurred; for this purpose all
these routines, when appropriate, leave an error number in the external cell errno. The
meanings of the various error numbers are listed in the introduction to Section II of the
UNIX Programmer's Manual, so your program can, for example, determine if an attempt to
open a file failed because it did not exist or because the user lacked permission to read it.
Perhaps more commonly, you may want to print out the reason for failure. The routine
perror will print a message associated with the value of errno; more generally,
sys_errlist is an array of character strings which can be indexed by errno and printed
by your program.
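A minimal sketch of this kind of error reporting follows. It assumes an ANSI C library, where strerror(errno) yields the same message text that perror prints; the function name why_open_fails is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to open path for reading.  Returns 0 on
   success; otherwise prints the reason on stderr and returns the
   errno value so the caller can act on it. */
int why_open_fails(const char *path)
{
    int fd, saved;

    assert(path != NULL);
    if ((fd = open(path, O_RDONLY)) == -1) {
        saved = errno;  /* copy it before another call changes it */
        fprintf(stderr, "%s: %s\n", path, strerror(saved));
        return saved;
    }
    close(fd);
    return 0;
}
```

A caller can compare the result with the names in errno.h, for instance ENOENT for a missing file or EACCES for a permission failure.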
5. PROCESSES
It is often easier to use a program written by someone else than to invent one's own.
This section describes how to execute a program from within another.
5.1. The "System" Function
The easiest way to execute a program from another is to use the standard library routine system. system takes one argument, a command string exactly as typed at the terminal (except for the newline at the end) and executes it. For instance, to time-stamp the
output of a program,

    main()
    {
        system("date");
        /* rest of processing */
    }
If the command string has to be built from pieces, the in-memory formatting capabilities
of sprintf may be useful.
Remember that getc and putc normally buffer their input; terminal I/O will not be
properly synchronized unless this buffering is defeated. For output, use fflush; for
input, see setbuf in the appendix.
5.2. Low-Level Process Creation - Execl and Execv
If you're not using the standard library, or if you need finer control over what happens, you will have to construct calls to other programs using the more primitive routines
that the standard library's system routine is based on.
The most basic operation is to execute another program without returning, by using
the routine execl. To print the date as the last action of a running program, use
    execl("/bin/date", "date", NULL);

The first argument to execl is the file name of the command; you have to know where it
is found in the file system. The second argument is conventionally the program name
(that is, the last component of the file name), but this is seldom used except as a placeholder. If the command takes arguments, they are strung out after this; the end of the list
is marked by a NULL argument.
The execl call overlays the existing program with the new one, runs that, then exits.
There is no return to the original program.
More realistically, a program might fall into two or more phases that communicate
only through temporary files. Here it is natural to make the second pass simply an execl
call from the first.
The one exception to the rule that the original program never gets control back occurs
when there is an error, for example if the file can't be found or is not executable. If you
don't know where date is located, say
    execl("/bin/date", "date", NULL);
    execl("/usr/bin/date", "date", NULL);
    fprintf(stderr, "Someone stole 'date'\n");

A variant of execl called execv is useful when you don't know in advance how many
arguments there are going to be. The call is
    execv(filename, argp);

where argp is an array of pointers to the arguments; the last pointer in the array must be
NULL so execv can tell where the list ends. As with execl, filename is the file in which
the program is found, and argp[0] is the name of the program. (This arrangement is
identical to the argv array for program arguments.)
Neither of these routines provides the niceties of normal command execution. There
is no automatic search of multiple directories - you have to know precisely where the
command is located. Nor do you get the expansion of metacharacters like <, >, *, ?, and
[] in the argument list. If you want these, use execl to invoke the shell sh, which then
does all the work. Construct a string commandline that contains the complete command
as it would have been typed at the terminal, then say
    execl("/bin/sh", "sh", "-c", commandline, NULL);

The shell is assumed to be at a fixed place, /bin/sh. Its argument -c says to treat the
next argument as a whole command line, so it does just what you want. The only problem
is in constructing the right information in commandline.
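One way to build such a command line is with the in-memory formatting routines. The sketch below is only an illustration under stated assumptions: build_command is a hypothetical name, it uses snprintf (a bounded relative of sprintf from later C standards), and it does no quoting, so a file name containing shell metacharacters would be interpreted by the shell.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: build "prog < file" in out, which holds size
   bytes.  Returns the length of the command, or -1 if it would not
   fit.  No shell quoting is attempted. */
int build_command(char *out, size_t size, const char *prog, const char *file)
{
    int n;

    assert(out != NULL && prog != NULL && file != NULL && size > 0);
    n = snprintf(out, size, "%s < %s", prog, file);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The resulting string can then be passed as the commandline argument of the execl call shown above.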
5.3. Control of Processes - Fork and Wait
So far what we've talked about isn't really all that useful by itself. Now we will show
how to regain control after running a program with execl or execv. Since these routines
simply overlay the new program on the old one, to save the old one requires that it first be
split into two copies; one of these can be overlaid, while the other waits for the new, overlaying program to finish. The splitting is done by a routine called fork:
    proc_id = fork();
splits the program into two copies, both of which continue to run. The only difference
between the two is the value of proc_id, the "process id." In one of these processes (the
"child"), proc_id is zero. In the other (the "parent"), proc_id is non-zero; it is the process number of the child. Thus the basic way to call, and return from, another program is
    if (fork() == 0)
        execl("/bin/sh", "sh", "-c", cmd, NULL);    /* in child */
And in fact, except for handling errors, this is sufficient. The fork makes two copies of
the program. In the child, the value returned by fork is zero, so it calls execl which
does the command and then dies. In the parent, fork returns non-zero so it skips the
execl. (If there is any error, fork returns -1).
More often, the parent wants to wait for the child to terminate before continuing
itself. This can be done with the function wait:
    int status;

    if (fork() == 0)
        execl(...);
    wait(&status);
This still doesn't handle any abnormal conditions, such as a failure of the execl or fork,
or the possibility that there might be more than one child running simultaneously. (The
wait returns the process id of the terminated child, if you want to check it against the
value returned by fork.) Finally, this fragment doesn't deal with any funny behavior on
the part of the child (which is reported in status). Still, these three lines are the heart
of the standard library's system routine, which we'll show in a moment.
The status returned by wait encodes in its low-order eight bits the system's idea of
the child's termination status; it is 0 for normal termination and non-zero to indicate various kinds of problems. The next higher eight bits are taken from the argument of the call
to exit which caused a normal termination of the child process. It is good coding practice for all programs to return meaningful status.
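The following sketch puts fork, execl, and wait together and decodes the status. It assumes a POSIX-style system, where the macros WIFEXITED and WEXITSTATUS from sys/wait.h pick apart the status word so the program need not depend on the bit layout described above; the name run_command is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run cmd through the shell and wait for it.
   Returns the argument the command passed to exit, or -1 if the
   fork failed or the child did not terminate normally. */
int run_command(const char *cmd)
{
    int status;
    pid_t pid, w;

    assert(cmd != NULL);
    if ((pid = fork()) == -1)
        return -1;
    if (pid == 0) {                         /* in child */
        execl("/bin/sh", "sh", "-c", cmd, (char *)0);
        _exit(127);                         /* reached only if execl failed */
    }
    while ((w = wait(&status)) != pid && w != -1)
        ;                                   /* skip any other children */
    if (w == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);             /* the child's exit argument */
}
```

For example, run_command("exit 3") returns 3, since the shell passes 3 to exit and terminates normally.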
When a program is called by the shell, the three file descriptors 0, 1, and 2 are set up
pointing at the right files, and all other possible file descriptors are available for use.
When this program calls another one, correct etiquette suggests making sure the same
conditions hold. Neither fork nor the exec calls affects open files in any way. If the
parent is buffering output that must come out before output from the child, the parent
must flush its buffers before the execl. Conversely, if a caller buffers an input stream,
the called program will lose any information that has been read by the caller.
5.4. Pipes
A pipe is an I/O channel intended for use between two cooperating processes: one process writes into the pipe, while the other reads. The system looks after buffering the data
and synchronizing the two processes. Most pipes are created by the shell, as in
    ls | pr
which connects the standard output of ls to the standard input of pr. Sometimes, however, it is most convenient for a process to set up its own plumbing; in this section, we will
illustrate how the pipe connection is established and used.
The system call pipe creates a pipe. Since a pipe is used for both reading and writing, two file descriptors are returned; the actual usage is like this:
    int fd[2];

    stat = pipe(fd);
    if (stat == -1)
        /* there was an error ... */
fd is an array of two file descriptors, where fd[0] is the read side of the pipe and fd[1]
is for writing. These may be used in read, write and close calls just like any other file
descriptors.
If a process reads a pipe which is empty, it will wait until data arrives; if a process
writes into a pipe which is too full, it will wait until the pipe empties somewhat. If the
write side of the pipe is closed, a subsequent read will encounter end of file.
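These rules can be seen in a small sketch in which a child process writes a message into a pipe and the parent reads until end of file. It assumes a POSIX-style system; the name pipe_from_child is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that writes len bytes of msg
   into a pipe; the parent reads them into buf (at most size bytes)
   until end of file.  Returns the number of bytes read, or -1. */
int pipe_from_child(const char *msg, int len, char *buf, int size)
{
    int fd[2], n = 0, total = 0;
    pid_t pid;

    assert(msg != NULL && buf != NULL && len >= 0 && size > 0);
    if (pipe(fd) == -1)
        return -1;
    if ((pid = fork()) == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                 /* child: the writer */
        close(fd[0]);
        if (write(fd[1], msg, (size_t)len) != len)
            _exit(1);
        _exit(0);                   /* exiting closes the write side */
    }
    close(fd[1]);   /* parent must close its copy of the write side,
                       or read would never report end of file */
    while (total < size &&
           (n = (int)read(fd[0], buf + total, (size_t)(size - total))) > 0)
        total += n;
    close(fd[0]);
    waitpid(pid, (int *)0, 0);      /* lay the child to rest */
    return n < 0 ? -1 : total;
}
```

The parent's read blocks until the child has written something, and returns zero only after every copy of the write side has been closed.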
To illustrate the use of pipes in a realistic setting, let us write a function called
popen(cmd, mode), which creates a process cmd (just as system does), and returns a
file descriptor that will either read or write that process, according to mode. That is, the
call
    fout = popen("pr", WRITE);
creates a process that executes the pr command; subsequent write calls using the file
descriptor fout will send their data to that process through the pipe.
popen first creates the pipe with a pipe system call; it then forks to create two
copies of itself. The child decides whether it is supposed to read or write, closes the other
side of the pipe, then calls the shell (via execl) to run the desired process. The parent
likewise closes the end of the pipe it does not use. These closes are necessary to make
end-of-file tests work properly. For example, if a child that intends to read fails to close
the write end of the pipe, it will never see the end of the pipe file, just because there is
one writer potentially active.

    #include <stdio.h>

    #define READ 0
    #define WRITE 1
    #define tst(a, b) (mode == READ ? (b) : (a))
    static int popen_pid;

    popen(cmd, mode)
    char *cmd;
    int mode;
    {
        int p[2];

        if (pipe(p) < 0)
            return(NULL);
        if ((popen_pid = fork()) == 0) {
            close(tst(p[WRITE], p[READ]));
            close(tst(0, 1));
            dup(tst(p[READ], p[WRITE]));
            close(tst(p[READ], p[WRITE]));
            execl("/bin/sh", "sh", "-c", cmd, 0);
            _exit(1);    /* disaster has occurred if we get here */
        }
        if (popen_pid == -1)
            return(NULL);
        close(tst(p[READ], p[WRITE]));
        return(tst(p[WRITE], p[READ]));
    }

The sequence of closes in the child is a bit tricky. Suppose that the task is to create a
child process that will read data from the parent. Then the first close closes the write
side of the pipe, leaving the read side open. The lines
    close(tst(0, 1));
    dup(tst(p[READ], p[WRITE]));

are the conventional way to associate the pipe descriptor with the standard input of the
child. The close closes file descriptor 0, that is, the standard input. dup is a system call
that returns a duplicate of an already open file descriptor. File descriptors are assigned in
increasing order and the first available one is returned, so the effect of the dup is to copy
the file descriptor for the pipe (read side) to file descriptor 0; thus the read side of the
pipe becomes the standard input. (Yes, this is a bit tricky, but it's a standard idiom.)
Finally, the old read side of the pipe is closed.
A similar sequence of operations takes place when the child process is supposed to
write to the parent instead of reading. You may find it a useful exercise to step
through that case.
The job is not quite done, for we still need a function pclose to close the pipe
created by popen. The main reason for using a separate function rather than close is
that it is desirable to wait for the termination of the child process. First, the return value
from pclose indicates whether the process succeeded. Equally important when a process
creates several children is that only a bounded number of unwaited-for children can exist,
even if some of them have terminated; performing the wait lays the child to rest. Thus:

    #include <signal.h>

    pclose(fd)    /* close pipe fd */
    int fd;
    {
        register r, (*hstat)(), (*istat)(), (*qstat)();
        int status;
        extern int popen_pid;

        close(fd);
        istat = signal(SIGINT, SIG_IGN);
        qstat = signal(SIGQUIT, SIG_IGN);
        hstat = signal(SIGHUP, SIG_IGN);
        while ((r = wait(&status)) != popen_pid && r != -1)
            ;
        if (r == -1)
            status = -1;
        signal(SIGINT, istat);
        signal(SIGQUIT, qstat);
        signal(SIGHUP, hstat);
        return(status);
    }
The calls to signal make sure that no interrupts, etc., interfere with the waiting process;
this is the topic of the next section.
The routine as written has the limitation that only one pipe may be open at once,
because of the single shared variable popen_pid; it really should be an array indexed by
file descriptor. A popen function, with slightly different arguments and return value is
available as part of the standard I/O library discussed below. As currently written, it
shares the same limitation.
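For comparison, here is a sketch that uses the library version of popen instead of the one developed above. The library routine takes a mode string ("r" or "w") and returns a file pointer rather than a file descriptor. The helper name first_line is hypothetical, and the _POSIX_C_SOURCE definition is an assumption about modern compilers, which may need it before they declare popen and pclose.

```c
#define _POSIX_C_SOURCE 200809L /* ask the headers to declare popen, pclose */
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: run cmd through the shell and copy the first
   line of its output into buf (size bytes).  Returns the status from
   pclose, or -1 if the pipe could not be set up. */
int first_line(const char *cmd, char *buf, int size)
{
    FILE *fp;

    assert(cmd != NULL && buf != NULL && size > 0);
    buf[0] = '\0';
    if ((fp = popen(cmd, "r")) == NULL)
        return -1;
    if (fgets(buf, size, fp) == NULL)   /* command produced no output */
        buf[0] = '\0';
    return pclose(fp);                  /* waits for the command */
}
```

As with the version in the text, the stream must be closed with pclose rather than fclose, so that the child process is waited for.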
6. SIGNALS - INTERRUPTS AND ALL THAT
This section is concerned with how to deal gracefully with signals from the outside
world (like interrupts), and with program faults. Since there's nothing very useful that
can be done from within C about program faults, which arise mainly from illegal memory
references or from execution of peculiar instructions, we'll discuss only the outside-world
signals: interrupt, which is sent when the DEL character is typed; quit, generated by the
FS character; hangup, caused by hanging up the phone; and terminate, generated by the
kill command. When one of these events occurs, the signal is sent to all processes which
were started from the corresponding terminal; unless other arrangements have been made,
the signal terminates the process. In the quit case, a core image file is written for debugging purposes.
The routine which alters the default action is called signal. It has two arguments:
the first specifies the signal, and the second specifies how to treat it. The first argument is
just a number code, but the second is the address of either a function, or a somewhat
strange code that requests that the signal either be ignored, or that it be given the default
action. The include file signal.h gives names for the various arguments, and should
always be included when signals are used. Thus
    #include <signal.h>
    signal(SIGINT, SIG_IGN);
causes interrupts to be ignored, while
    signal(SIGINT, SIG_DFL);
restores the default action of process termination. In all cases, signal returns the previous value of the signal. The second argument to signal may instead be the name of a
function (which has to be declared explicitly if the compiler hasn't seen it already). In
this case, the named routine will be called when the signal occurs. Most commonly this
facility is used to allow the program to clean up unfinished business before terminating,
for example to delete a temporary file:
    #include <signal.h>

    main()
    {
        int onintr();

        if (signal(SIGINT, SIG_IGN) != SIG_IGN)
            signal(SIGINT, onintr);

        /* Process ... */

        exit(0);
    }

    onintr()
    {
        unlink(tempfile);
        exit(1);
    }

Why the test and the double call to signal? Recall that signals like interrupt are
sent to all processes started from a particular terminal. Accordingly, when a program is to
be run non-interactively (started by &), the shell turns off interrupts for it so it won't be
stopped by interrupts intended for foreground processes. If this program began by
announcing that all interrupts were to be sent to the onintr routine regardless, that
would undo the shell's effort to protect it when run in the background.
The solution, shown above, is to test the state of interrupt handling, and to continue
to ignore interrupts if they are already being ignored. The code as written depends on the
fact that signal returns the previous state of a particular signal. If signals were already
being ignored, the process should continue to ignore them; otherwise, they should be
caught.
A more sophisticated program may wish to intercept an interrupt and interpret it as a
request to stop what it is doing and return to its own command-processing loop. Think of
a text editor: interrupting a long printout should not cause it to terminate and lose the
work already done. The outline of the code for this case is probably best written like this:
    #include <signal.h>
    #include <setjmp.h>
    jmp_buf sjbuf;

    main()
    {
        int (*istat)(), onintr();

        istat = signal(SIGINT, SIG_IGN);    /* save original status */
        setjmp(sjbuf);    /* save current stack position */
        if (istat != SIG_IGN)
            signal(SIGINT, onintr);

        /* main processing loop */
    }

    onintr()
    {
        printf("\nInterrupt\n");
        longjmp(sjbuf);    /* return to saved state */
    }

The include file setjmp.h declares the type jmp_buf, an object in which the state can be
saved. sjbuf is such an object; it is an array of some sort. The setjmp routine then
saves the state of things. When an interrupt occurs, a call is forced to the onintr routine, which can print a message, set flags, or whatever. longjmp takes as argument an
object stored into by setjmp, and restores control to the location after the call to
setjmp, so control (and the stack level) will pop back to the place in the main routine
where the signal is set up and the main loop entered. Notice, by the way, that the signal
gets set again after an interrupt occurs. This is necessary; most signals are automatically
reset to their default action when they occur.
Some programs that want to detect signals simply can't be stopped at an arbitrary
point, for example in the middle of updating a linked list. If the routine called on
occurrence of a signal sets a flag and then returns instead of calling exit or longjmp,
execution will continue at the exact point it was interrupted. The interrupt flag can then
be tested later.
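A minimal sketch of the flag technique follows, written in ANSI C. The type sig_atomic_t and the handler's int parameter come from later C standards, and the names catch_interrupts and take_interrupt are hypothetical. The handler is not re-installed here, on the assumption that a single interrupt is enough for the illustration.

```c
#include <assert.h>
#include <signal.h>

static volatile sig_atomic_t intflag = 0;

/* The handler only records that an interrupt happened. */
static void onintr(int sig)
{
    (void)sig;
    intflag = 1;
}

/* Hypothetical helper: catch interrupts unless they were already
   being ignored.  Returns 1 if the handler was installed, 0 if the
   signal was left ignored. */
int catch_interrupts(void)
{
    if (signal(SIGINT, SIG_IGN) == SIG_IGN)
        return 0;
    signal(SIGINT, onintr);
    return 1;
}

/* Hypothetical helper: report whether an interrupt has arrived since
   the last call, and clear the flag.  An interrupt arriving between
   the read and the clear would be missed; this is only a sketch. */
int take_interrupt(void)
{
    int was = intflag;

    assert(was == 0 || was == 1);
    intflag = 0;
    return was;
}
```

A program would call take_interrupt at points where it is safe to stop, for example between updates to a data structure.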
There is one difficulty associated with this approach. Suppose the program is reading
the terminal when the interrupt is sent. The specified routine is duly called; it sets its
flag and returns. If it were really true, as we said above, that "execution resumes at the
exact point it was interrupted," the program would continue reading the terminal until
the user typed another line. This behavior might well be confusing, since the user might
not know that the program is reading; he presumably would prefer to have the signal take
effect instantly. The method chosen to resolve this difficulty is to terminate the terminal
read when execution resumes after the signal, returning an error code which indicates
what happened.
Thus programs which catch and resume execution after signals should be prepared for
"errors" which are caused by interrupted system calls. (The ones to watch out for are
reads from a terminal, wait, and pause.) A program whose onintr program just sets
intflag, resets the interrupt signal, and returns, should usually include code like the following when it reads the standard input:
    if (getchar() == EOF)
        if (intflag)
            /* EOF caused by interrupt */
        else
            /* true end-of-file */

A final subtlety to keep in mind becomes important when signal-catching is combined
with execution of other programs. Suppose a program catches interrupts, and also
includes a method (like "!" in the editor) whereby other programs can be executed. Then
the code should look something like this:

    if (fork() == 0)
        execl(...);
    signal(SIGINT, SIG_IGN);    /* ignore interrupts */
    wait(&status);    /* until the child is done */
    signal(SIGINT, onintr);    /* restore interrupts */

Why is this? Again, it's not obvious but not really difficult. Suppose the program you call
catches its own interrupts. If you interrupt the subprogram, it will get the signal and
return to its main loop, and probably read your terminal. But the calling program will
also pop out of its wait for the subprogram and read your terminal. Having two processes
reading your terminal is very unfortunate, since the system figuratively flips a coin to
decide who should get each line of input. A simple way out is to have the parent program
ignore interrupts until the child is done. This reasoning is reflected in the standard I/O
library function system:
    #include <signal.h>

    system(s)    /* run command string s */
    char *s;
    {
        int status, pid, w;
        register int (*istat)(), (*qstat)();

        if ((pid = fork()) == 0) {
            execl("/bin/sh", "sh", "-c", s, 0);
            _exit(127);
        }
        istat = signal(SIGINT, SIG_IGN);
        qstat = signal(SIGQUIT, SIG_IGN);
        while ((w = wait(&status)) != pid && w != -1)
            ;
        if (w == -1)
            status = -1;
        signal(SIGINT, istat);
        signal(SIGQUIT, qstat);
        return(status);
    }

As an aside on declarations, the function signal obviously has a rather strange
second argument. It is in fact a pointer to a function delivering an integer, and this is
also the type of the signal routine itself. The two values SIG_IGN and SIG_DFL have the
right type, but are chosen so they coincide with no possible actual functions. For the
enthusiast, here is how they are defined for the PDP-11; the definitions should be
sufficiently ugly and nonportable to encourage use of the include file.
    #define SIG_DFL (int (*)())0
    #define SIG_IGN (int (*)())1

References
[1] K. L. Thompson and D. M. Ritchie, The UNIX Programmer's Manual, Bell Laboratories, 1978.
[2] B. W. Kernighan and D. M. Ritchie, The C Programming Language, Prentice-Hall, Inc.,
1978.
[3] B. W. Kernighan, "UNIX for Beginners - Second Edition." Bell Laboratories, 1978.


Appendix - The Standard I/O Library
D. M. Ritchie
The standard I/O library was designed with the following goals in mind.
1. It must be as efficient as possible, both in time and in space, so that there will be no
hesitation in using it no matter how critical the application.
2. It must be simple to use, and also free of the magic numbers and mysterious calls
whose use mars the understandability and portability of many programs using older
packages.
3. The interface provided should be applicable on all machines, whether or not the programs which implement it are directly portable to other systems, or to machines other
than the PDP-11 running a version of UNIX.
1. General Usage
Each program using the library must have the line

#include <stdio.h>

which defines certain macros and variables. The routines are in the normal C library, so
no special library argument is needed for loading. All names in the include file intended
only for internal use begin with an underscore to reduce the possibility of collision with
a user name. The names intended to be visible outside the package are
stdin   The name of the standard input file
stdout  The name of the standard output file
stderr  The name of the standard error file
EOF     is actually -1, and is the value returned by the read routines on end-of-file or
        error.
NULL    is a notation for the null pointer, returned by pointer-valued functions to
        indicate an error
FILE    expands to struct _iob and is a useful shorthand when declaring pointers to
        streams.
BUFSIZ  is a number (viz. 512) of the size suitable for an I/O buffer supplied by the user.
        See setbuf, below.
getc, getchar, putc, putchar, feof, ferror, fileno
        are defined as macros. Their actions are described below; they are mentioned
        here to point out that it is not possible to redeclare them and that they are not
        actually functions; thus, for example, they may not have breakpoints set on
        them.
The routines in this package offer the convenience of automatic buffer allocation and
output flushing where appropriate. The names stdin, stdout, and stderr are in effect
constants and may not be assigned to.
2. Calls

FILE *fopen(filename, type) char *filename, *type;
opens the file and, if needed, allocates a buffer for it. filename is a character string
specifying the name. type is a character string (not a single character). It may be
"r", "w", or "a" to indicate intent to read, write, or append. The value returned is a
file pointer. If it is NULL the attempt to open failed.
FILE *freopen(filename, type, ioptr) char *filename, *type; FILE *ioptr;
The stream named by ioptr is closed, if necessary, and then reopened as if by fopen.
If the attempt to open fails, NULL is returned, otherwise ioptr, which will now refer
to the new file. Often the reopened stream is stdin or stdout.
int getc(ioptr) FILE *ioptr;
returns the next character from the stream named by ioptr, which is a pointer to a
file such as returned by fopen, or the name stdin. The integer EOF is returned on
end-of-file or when an error occurs. The null character \0 is a legal character.
int fgetc(ioptr) FILE *ioptr;
acts like getc but is a genuine function, not a macro, so it can be pointed to, passed
as an argument, etc.
putc(c, ioptr) FILE *ioptr;
putc writes the character c on the output stream named by ioptr, which is a value
returned from fopen or perhaps stdout or stderr. The character is returned as
value, but EOF is returned on error.
fputc(c, ioptr) FILE *ioptr;
acts like putc but is a genuine function, not a macro.
fclose(ioptr) FILE *ioptr;
The file corresponding to ioptr is closed after any buffers are emptied. A buffer allocated by the I/O system is freed. fclose is automatic on normal termination of the
program.
fflush(ioptr) FILE *ioptr;
Any buffered information on the (output) stream named by ioptr is written out.
Output files are normally buffered if and only if they are not directed to the terminal;
however, stderr always starts off unbuffered and remains so unless setbuf is used,
or unless it is reopened.
exit(errcode);
terminates the process and returns its argument as status to the parent. This is a special version of the routine which calls fflush for each output file. To terminate
without flushing, use _exit.
feof(ioptr) FILE *ioptr;
returns non-zero when end-of-file has occurred on the specified input stream.
ferror(ioptr) FILE *ioptr;
returns non-zero when an error has occurred while reading or writing the named
stream. The error indication lasts until the file has been closed.
getchar();
is identical to getc(stdin).
putchar(c);
is identical to putc(c, stdout).
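
The character routines described so far combine into the usual copy loop. The following is a minimal sketch in present-day C rather than the dialect of this manual; the name copy_stream is hypothetical. It holds the result of getc in an int so that EOF stays distinguishable from every legal character, and it uses ferror to tell a read error from a true end-of-file.

```c
#include <stdio.h>

/* Hypothetical helper: copy everything from in to out, one character
 * at a time. Returns the number of characters copied, or -1 if a read
 * or write failed. */
long copy_stream(FILE *in, FILE *out)
{
    long n = 0;
    int c;                      /* int, not char, so EOF stays distinguishable */

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        n++;
    }
    return ferror(in) ? -1 : n;
}
```

Calling copy_stream(stdin, stdout) gives a simple filter that copies its input to its output.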
char *fgets(s, n, ioptr) char *s; FILE *ioptr;
reads up to n-1 characters from the stream ioptr into the character pointer s. The
read terminates with a newline character. The newline character is placed in the
buffer followed by a null character. fgets returns the first argument, or NULL if error
or end-of-file occurred.
fputs(s, ioptr) char *s; FILE *ioptr;
writes the null-terminated string (character array) s on the stream ioptr. No newline is appended. No value is returned.
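
As an illustration of the fgets conventions just described, here is a minimal sketch in present-day C; count_lines is a hypothetical name and the 64-byte buffer is an arbitrary choice. Because fgets stores the newline when it finds one, a piece that does not end in a newline is either part of a line longer than the buffer or a final line with no terminator.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the lines remaining on fp. A last line
 * that lacks a terminating newline still counts as a line. */
int count_lines(FILE *fp)
{
    char buf[64];
    int lines = 0;
    int partial = 0;            /* did the last piece read lack a newline? */

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);

        partial = (len == 0 || buf[len - 1] != '\n');
        if (!partial)
            lines++;            /* a whole line, or the tail of a long one */
    }
    return lines + partial;
}
```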
ungetc(c, ioptr) FILE *ioptr;
The argument character c is pushed back on the input stream named by ioptr. Only
one character may be pushed back.
printf(format, a1, ...) char *format;
fprintf(ioptr, format, a1, ...) FILE *ioptr; char *format;
sprintf(s, format, a1, ...) char *s, *format;
printf writes on the standard output. fprintf writes on the named output stream.
sprintf puts characters in the character array (string) named by s. The
specifications are as described in section printf(3) of the UNIX Programmer's
Manual.
scanf(format, a1, ...) char *format;
fscanf(ioptr, format, a1, ...) FILE *ioptr; char *format;
sscanf(s, format, a1, ...) char *s, *format;
scanf reads from the standard input. fscanf reads from the named input stream.
sscanf reads from the character string supplied as s. scanf reads characters, interprets them according to a format, and stores the results in its arguments. Each routine expects as arguments a control string format, and a set of arguments, each of
which must be a pointer, indicating where the converted input should be stored.
scanf returns as its value the number of successfully matched and assigned input
items. This can be used to decide how many input items were found. On end of file,
EOF is returned; note that this is different from 0, which means that the next input
character does not match what was called for in the control string.
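
The three possible outcomes of a scanf-family call (a count, 0, or EOF) can be seen with sscanf. A minimal sketch in present-day C, where scan_pair is a hypothetical wrapper that simply passes the return value through:

```c
#include <stdio.h>

/* Hypothetical helper: try to read two integers from text.
 * Returns the sscanf result unchanged: the number of items assigned,
 * 0 if the first item did not match, or EOF if the input ran out
 * before anything could be matched. */
int scan_pair(const char *text, int *a, int *b)
{
    return sscanf(text, "%d %d", a, b);
}
```

With this wrapper, "3 4" yields 2, "3 x" yields 1, "x" yields 0, and the empty string yields EOF.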
fread(ptr, sizeof(*ptr), nitems, ioptr) FILE *ioptr;
reads nitems of data beginning at ptr from file ioptr. No advance notification that
binary I/O is being done is required; when, for portability reasons, it becomes
required, it will be done by adding an additional character to the mode-string on the
fopen call.
fwrite(ptr, sizeof(*ptr), nitems, ioptr) FILE *ioptr;
Like fread, but in the other direction.
rewind(ioptr) FILE *ioptr;
rewinds the stream named by ioptr. It is not very useful except on input, since a
rewound output file is still open only for output.
system(string) char *string;
The string is executed by the shell as if typed at the terminal.
getw(ioptr) FILE *ioptr;
returns the next word from the input stream named by ioptr. EOF is returned on
end-of-file or error, but since this is a perfectly good integer, feof and ferror should be
used. A "word" is 16 bits on the PDP-11.
putw(w, ioptr) FILE *ioptr;
writes the integer w on the named output stream.
setbuf(ioptr, buf) FILE *ioptr; char *buf;
setbuf may be used after a stream has been opened but before I/O has started. If
buf is NULL, the stream will be unbuffered. Otherwise the buffer supplied will be
used. It must be a character array of sufficient size:

char buf[BUFSIZ];

fileno(ioptr) FILE *ioptr;
returns the integer file descriptor associated with the file.
fseek(ioptr, offset, ptrname) FILE *ioptr; long offset;
The location of the next byte in the stream named by ioptr is adjusted. offset is a
long integer. If ptrname is 0, the offset is measured from the beginning of the file; if
ptrname is 1, the offset is measured from the current read or write pointer; if
ptrname is 2, the offset is measured from the end of the file. The routine accounts
properly for any buffering. (When this routine is used on non-UNIX systems, the
offset must be a value returned from ftell and the ptrname must be 0).
long ftell(ioptr) FILE *ioptr;
The byte offset, measured from the beginning of the file, associated with the named
stream is returned. Any buffering is properly accounted for. (On non-UNIX systems
the value of this call is useful only for handing to fseek, so as to position the file to
the same place it was when ftell was called.)
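
A common use of fseek and ftell together is to find the size of a file and then return to where reading left off. The sketch below is in present-day C and makes two assumptions not found in the text: the name stream_size is hypothetical, and the symbolic constants SEEK_SET and SEEK_END from <stdio.h> stand in for the ptrname values 0 and 2.

```c
#include <stdio.h>

/* Hypothetical helper: return the size in bytes of the file behind fp,
 * or -1 on error, leaving the read/write position where it was. */
long stream_size(FILE *fp)
{
    long here, size;

    if ((here = ftell(fp)) == -1L)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)   /* ptrname 2: measure from the end */
        return -1L;
    size = ftell(fp);
    if (fseek(fp, here, SEEK_SET) != 0) /* ptrname 0: back to the saved offset */
        return -1L;
    return size;
}
```

Saving the ftell value and handing it back to fseek with ptrname 0 is the one combination the text describes as usable on non-UNIX systems as well.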
getpw(uid, buf) char *buf;
The password file is searched for the given integer user ID. If an appropriate line is
found, it is copied into the character array buf, and 0 is returned. If no line is found
corresponding to the user ID then 1 is returned.
char *malloc(num);
allocates num bytes. The pointer returned is sufficiently well aligned to be usable for
any purpose. NULL is returned if no space is available.
char *calloc(num, size);
allocates space for num items each of size size. The space is guaranteed to be set to 0
and the pointer is sufficiently well aligned to be usable for any purpose. NULL is
returned if no space is available.
cfree(ptr) char *ptr;
Space is returned to the pool used by calloc. Disorder can be expected if the
pointer was not obtained from calloc.
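
The allocation routines are typically used as in the following minimal sketch, written in present-day C. The names save_string and zeroed_ints are hypothetical. Note one deliberate difference from the text: current C libraries release both malloc and calloc storage with free, and cfree is no longer generally available, so the sketch assumes free.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly allocated copy of s,
 * or NULL if no space is available. Release it with free. */
char *save_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);    /* +1 for the terminating null */

    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Hypothetical helper: return n ints, all guaranteed to start at 0,
 * or NULL if no space is available. Release them with free. */
int *zeroed_ints(size_t n)
{
    return calloc(n, sizeof(int));
}
```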
The following are macros whose definitions may be obtained by including <ctype.h>.
isalpha(c) returns non-zero if the argument is alphabetic.
isupper(c) returns non-zero if the argument is upper-case alphabetic.
islower(c) returns non-zero if the argument is lower-case alphabetic.
isdigit(c) returns non-zero if the argument is a digit.
isspace(c) returns non-zero if the argument is a spacing character: tab, newline, carriage return, vertical tab, form feed, space.
ispunct(c) returns non-zero if the argument is any punctuation character, i.e., not a
space, letter, digit or control character.
isalnum(c) returns non-zero if the argument is a letter or a digit.
isprint(c) returns non-zero if the argument is printable: a letter, digit, or punctuation character.
iscntrl(c) returns non-zero if the argument is a control character.
isascii(c) returns non-zero if the argument is an ASCII character, i.e., less than octal
0200.
toupper(c) returns the upper-case character corresponding to the lower-case letter c.
tolower(c) returns the lower-case character corresponding to the upper-case letter c.
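
Since toupper is described only for lower-case letters, a careful caller tests with islower first. A minimal sketch in present-day C; upcase is a hypothetical name, and the casts to unsigned char are a modern precaution for character values above octal 0177, not something the text requires.

```c
#include <ctype.h>

/* Hypothetical helper: convert the lower-case letters of s to
 * upper case in place, leaving every other character alone.
 * Returns s. */
char *upcase(char *s)
{
    char *p;

    for (p = s; *p != '\0'; p++)
        if (islower((unsigned char)*p))
            *p = (char)toupper((unsigned char)*p);
    return s;
}
```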

Introduction 2-1

PART 2: LANGUAGES

This part includes articles on four of the languages and four of the language preprocessors
available on ULTRIX-32:

• C
• FORTRAN77
• RATFOR
• EFL
• Pascal
• Franz Lisp
• FP
• M4
These articles are authoritative reference materials appropriate for people familiar with programming in the languages described. Each article defines the implementation of a language
or preprocessor on the ULTRIX-32 system. With the exception of the articles on Pascal,
RATFOR, and M4, these articles are not tutorial, and they are not for beginners.

C Language
The first three articles deal with the C language. "The C Programming Language - Reference
Manual" lists in detail the rules, conventions, and concepts that define the implementation of
C on the VAX computer. This is reprinted from an appendix in The C Programming
Language [1], by Kernighan and Ritchie. Before you use this article, you should know how to
write programs in C and have read The C Programming Language.
The next two articles describe C language compilers. "A Tour Through the Portable C Compiler," by Johnson, explains the Berkeley C compiler available in the ULTRIX-32 system. It
tells what happens when you compile a C program on ULTRIX-32 and is meant for people
who may support the C compiler. This article gives an excellent overview of the organization,
operation, and background of the ULTRIX-32 C compiler. The Ritchie article, "A Tour
Through the UNIX C Compiler," describes the Bell UNIX C compiler, not implemented on
ULTRIX-32.
FORTRAN
The two articles that follow describe f77 FORTRAN. The "Introduction to the f77 I/O
Library," by Wasley, lists specifications and rules for using the f77 I/O library routines. These
routines make use of the standard C I/O library routines in ULTRIX-32. The article explains
the different methods available for accessing files, rules for use of logical units for I/O, and
error and status handling for I/O processing. It tells in detail how the standard FORTRAN
commands and concepts are implemented on the ULTRIX-32 system. In addition, the article
identifies non-ANSI standard extensions to the library and shows methods you can use to
make older FORTRAN programs compatible with this I/O library.

[1] Kernighan, Brian W. and Ritchie, Dennis M., The C Programming Language, Prentice Hall, Englewood
Cliffs, N.J., 1978.

"A Portable FORTRAN 77 Compiler," by Feldman and Weinberger, describes the rules and
conventions of FORTRAN 77 as implemented on the ULTRIX-32 system. Familiarity with
FORTRAN 66 or another standard FORTRAN is prerequisite to comprehending this article.
RATFOR and EFL
The next two articles deal with FORTRAN preprocessors. RATFOR and EFL translate input
files into FORTRAN source code. They overcome some of the cosmetic and control-flow
defects of FORTRAN while retaining desirable FORTRAN features such as universality and
efficiency. RATFOR and EFL programs are compatible with FORTRAN libraries, yet they
offer a significant improvement over standard FORTRAN.
The article "RATFOR - A Preprocessor for a Rational FORTRAN," by Kernighan, tells how
to write RATFOR code that is easier to read and write than FORTRAN code. The article also
explains how to:
• Eliminate goto statements
• Group statements within a conditional construction
• Include the else clause as a part of a conditional construction
• Improve do, while, for, and repeat until functions
Readers will find this article easy to read and full of useful examples.
EFL is a descendant of RATFOR. EFL is more flexible; it allows more general forms for
expressions and it provides a more uniform syntax. "The Programming Language EFL," by
Feldman, lists concepts and rules and provides some programming examples.

Berkeley Pascal
The "Berkeley Pascal User's Manual" tells what you need to know to write and execute Pascal programs on the ULTRIX-32 system if you are already familiar with Pascal programming.
The article is arranged in tutorial format; it lists reference materials, explains how to use an
editor to create a Pascal program, and gives various execution options. Berkeley Pascal
includes six utilities for translating, compiling, running, and analyzing programs:

pi      Translates the source program into object code and stores the object code
px      Interprets (executes) the object code created by pi
pix     Translates the source program and then executes it
pc      Processes the source program to compile an executable binary file
pxp     Creates an execution profile for a program when used together with pi or pix
pxref   Produces a program listing and a cross-reference identifier from a source program

"The Berkeley Pascal User's Manual" explains how to use these utilities, how to handle piping, input, and output, how to interpret error diagnostics, how to include source text from
several files for the translator, and how to compile separate segments of a Pascal program to
be linked for running later. An appendix gives a precise definition of Berkeley Pascal.

Franz Lisp
"The Franz Lisp Manual" gives a detailed and extensive description of the Berkeley dialect of
Lisp. Franz Lisp is a sophisticated language that provides a complete environment in which
you can develop and run programs. In addition, it offers:
• 14 data types
• Both a compiler and an interpreter
• Special functions (such as apply)
• System control functions (such as memory allocation)
• Macros and fclosures
• Compatibility with foreign subroutines
• Error handling capabilities
• Powerful debugging tools (trace, stepper, fixit)
• A CMU top-level package that serves as an alternative to the default Franz Lisp top-level package
• A file package that allows you to save functions for use in other sessions
• An editor specially designed for modifying Lisp programs
Because this long article is organized as a reference manual, you may find it useful to read the
introductory section in each chapter to gain an overview, before reading the chapters in depth.
FP
FP is a preprocessor that produces Franz Lisp source code. The "Berkeley FP User's
Manual" is appropriate reading for sophisticated programmers familiar with Lisp. The article
describes, in terse terms, the principles and rules of the language. This description includes
definitions of:
• Objects
• Operations
• Functions
• Input and output procedures
• Execution options
You may find the extensive programming examples helpful.

M4
M4 is a macro processor that provides string substitution. It accepts as input source code in
any computer language and substitutes a defined text for each occurrence of a macro name.
"The M4 Macro Processor," by Kernighan and Ritchie, offers readable explanations and good
examples. You can use M4 to:
• Set up your own macros
• Create and use macros that take several arguments
• Use a set of built-in macros
• Bring in new files with an include function
• Call shell functions with a system command

The C Programming Language 2-5

The C Programming Language - Reference Manual
Dennis M. Ritchie
Bell Laboratories, Murray Hill, New Jersey
This manual is reprinted, with minor changes, from The C Programming Language, by Brian W. Kernighan and Dennis M. Ritchie, Prentice-Hall, Inc., 1978.

1. Introduction
This manual describes the C language on the DEC PDP-11, the DEC VAX-11, the Honeywell 6000,
the IBM System/370, and the Interdata 8/32. Where differences exist, it concentrates on the PDP-11, but
tries to point out implementation-dependent details. With few exceptions, these dependencies follow
directly from the underlying properties of the hardware; the various compilers are generally quite compatible.
2. Lexical conventions
There are six classes of tokens: identifiers, keywords, constants, strings, operators, and other separators. Blanks, tabs, newlines, and comments (collectively, "white space") as described below are ignored
except as they serve to separate tokens. Some white space is required to separate otherwise adjacent
identifiers, keywords, and constants.
If the input stream has been parsed into tokens up to a given character, the next token is taken to
include the longest string of characters which could possibly constitute a token.
2.1 Comments
The characters /* introduce a comment, which terminates with the characters */. Comments do not
nest.
2.2 Identifiers (Names)
An identifier is a sequence of letters and digits; the first character must be a letter. The underscore _
counts as a letter. Upper and lower case letters are different. No more than the first eight characters are
significant, although more may be used. External identifiers, which are used by various assemblers and
loaders, are more restricted:

DEC PDP-11        7 characters, 2 cases
DEC VAX-11        8 characters, 2 cases
Honeywell 6000    6 characters, 1 case
IBM 360/370       7 characters, 1 case
Interdata 8/32    8 characters, 2 cases

2.3 Keywords
The following identifiers are reserved for use as keywords. and may not be used otherwise:

int         extern      else
char        register    for
float       typedef     do
double      static      while
struct      goto        switch
union       return      case
long        sizeof      default
short       break       entry
unsigned    continue
auto        if

The entry keyword is not currently implemented by any compiler but is reserved for future use. Some
implementations also reserve the words fortran and asm.

† UNIX is a Trademark of Bell Laboratories.

2.4 Constants
There are several kinds of constants, as listed below. Hardware characteristics which affect sizes are
summarized in §2.6.
2.4.1 Integer constants
An integer constant consisting of a sequence of digits is taken to be octal if it begins with 0 (digit
zero), decimal otherwise. The digits 8 and 9 have octal value 10 and 11 respectively. A sequence of
digits preceded by 0x or 0X (digit zero) is taken to be a hexadecimal integer. The hexadecimal digits
include a or A through f or F with values 10 through 15. A decimal constant whose value exceeds the
largest signed machine integer is taken to be long; an octal or hex constant which exceeds the largest
unsigned machine integer is likewise taken to be long.
2.4.2 Explicit long constants
A decimal, octal, or hexadecimal integer constant immediately followed by l (letter ell) or L is a long
constant. As discussed below, on some machines integer and long values may be considered identical.
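
The three notations can be checked against one another. A minimal sketch in present-day C, with hypothetical names throughout; it leaves out the digits 8 and 9 in octal constants, which current compilers reject rather than accept as described above.

```c
/* The value 255 written in decimal, octal, and hexadecimal. */
enum { DECIMAL = 255, OCTAL = 0377, HEX = 0xff };

/* Hypothetical helper: returns 1 if all three spellings agree. */
int same_constant(void)
{
    return DECIMAL == OCTAL && OCTAL == HEX;
}

/* Hypothetical helper: returns 1 if the l and L suffixes both
 * give a constant the size of a long. */
int suffix_gives_long(void)
{
    return sizeof(1L) == sizeof(long) && sizeof(1l) == sizeof(long);
}
```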

2.4.3 Character constants
A character constant is a character enclosed in single quotes, as in 'x'. The value of a character
constant is the numerical value of the character in the machine's character set.
Certain non-graphic characters, the single quote ' and the backslash \, may be represented according
to the following table of escape sequences:

newline            NL (LF)    \n
horizontal tab     HT         \t
backspace          BS         \b
carriage return    CR         \r
form feed          FF         \f
backslash          \          \\
single quote       '          \'
bit pattern        ddd        \ddd

The escape \ddd consists of the backslash followed by 1, 2, or 3 octal digits which are taken to specify the
value of the desired character. A special case of this construction is \0 (not followed by a digit), which
indicates the character NUL. If the character following a backslash is not one of those specified, the
backslash is ignored.
2.4.4 Floating constants
A floating constant consists of an integer part, a decimal point, a fraction part, an e or E, and an
optionally signed integer exponent. The integer and fraction parts both consist of a sequence of digits.
Either the integer part or the fraction part (not both) may be missing; either the decimal point or the e
and the exponent (not both) may be missing. Every floating constant is taken to be double-precision.

2.5 Strings
A string is a sequence of characters surrounded by double quotes, as in "...". A string has type
"array of characters" and storage class static (see §4 below) and is initialized with the given characters.
All strings, even when written identically, are distinct. The compiler places a null byte \0 at the end of
each string so that programs which scan the string can find its end. In a string, the double quote character " must be preceded by a \; in addition, the same escapes as described for character constants may be
used. Finally, a \ and an immediately following newline are ignored.
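
The null byte and the backslash-newline rule can both be observed directly. A minimal sketch in present-day C with hypothetical names: sizeof counts the null byte the compiler appends, strlen stops at it, and a string split with a backslash at the end of a line is joined with nothing in between.

```c
#include <stddef.h>
#include <string.h>

static const char word[] = "hello";

/* Hypothetical helper: bytes stored for "hello", null byte included. */
size_t bytes_stored(void)
{
    return sizeof word;
}

/* Hypothetical helper: characters found before the null byte. */
size_t chars_seen(void)
{
    return strlen(word);
}

/* Hypothetical helper: the backslash and newline vanish, giving "abcdef". */
const char *joined(void)
{
    return "abc\
def";
}
```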
2.6 Hardware characteristics
The following table summarizes certain hardware properties which vary from machine to machine.
Although these affect program portability, in practice they are less of a problem than might be thought a
priori.

            DEC PDP-11    Honeywell 6000    IBM 370    Interdata 8/32
            ASCII         ASCII             EBCDIC     ASCII
char        8 bits        9 bits            8 bits     8 bits
int         16            36                32         32
short       16            36                16         16
long        32            36                32         32
float       32            36                32         32
double      64            72                64         64
range       ±10^±38       ±10^±38           ±10^±76    ±10^±76

The VAX-11 is identical to the PDP-11 except that integers have 32 bits.
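
A program can ask its own compiler for the corresponding figures instead of relying on a table. This is a minimal sketch in present-day C; the helper names are hypothetical, and CHAR_BIT from <limits.h> is a later addition to the language that gives the number of bits in a char.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: convert a size in bytes (as given by sizeof)
 * into a size in bits on the machine the program is compiled for. */
int bits_in(size_t nbytes)
{
    return (int)(nbytes * CHAR_BIT);
}

/* Hypothetical helper: returns 1 if short, int, and long are in
 * non-decreasing order of size, as every column of the table is. */
int sizes_are_ordered(void)
{
    return sizeof(short) <= sizeof(int) && sizeof(int) <= sizeof(long);
}
```

For example, bits_in(sizeof(int)) would give 16 on a PDP-11 and 32 on a VAX-11, matching the table.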

3. Syntax notation
In the syntax notation used in this manual, syntactic categories are indicated by italic type, and literal
words and characters in bold type. Alternative categories are listed on separate lines. An optional terminal or non-terminal symbol is indicated by the subscript "opt," so that

{ expression_opt }

indicates an optional expression enclosed in braces. The syntax is summarized in §18.
4. What's in a name?
C bases the interpretation of an identifier upon two attributes of the identifier: its storage class and its
type. The storage class determines the location and lifetime of the storage associated with an identifier;
the type determines the meaning of the values found in the identifier's storage.
There are four declarable storage classes: automatic, static, external, and register. Automatic variables are local to each invocation of a block (§9.2), and are discarded upon exit from the block; static
variables are local to a block, but retain their values upon reentry to a block even after control has left
the block; external variables exist and retain their values throughout the execution of the entire program,
and may be used for communication between functions, even separately compiled functions. Register
variables are (if possible) stored in the fast registers of the machine; like automatic variables they are
local to each block and disappear on exit from the block.
C supports several fundamental types of objects:
Objects declared as characters (char) are large enough to store any member of the implementation's
character set, and if a genuine character from that character set is stored in a character variable, its value
is equivalent to the integer code for that character. Other quantities may be stored into character variables, but the implementation is machine-dependent.
Up to three sizes of integer, declared short int, int, and long int, are available. Longer
integers provide no less storage than shorter ones, but the implementation may make either short
integers, or long integers, or both, equivalent to plain integers. "Plain" integers have the natural size
suggested by the host machine architecture; the other sizes are provided to meet special needs.
Unsigned integers, declared unsigned, obey the laws of arithmetic modulo 2^n where n is the
number of bits in the representation. (On the PDP-11, unsigned long quantities are not supported.)
Single-precision floating point (float) and double-precision floating point (double) may be
synonymous in some implementations.
Because objects of the foregoing types can usefully be interpreted as numbers, they will be referred
to as arithmetic types. Types char and int of all sizes will collectively be called integral types. float
and double will collectively be called floating types.
Besides the fundamental arithmetic types there is a conceptually infinite class of derived types constructed from the fundamental types in the following ways:
arrays of objects of most types;
functions which return objects of a given type;
pointers to objects of a given type;
structures containing a sequence of objects of various types;
unions capable of containing any one of several objects of various types.
In general these methods of constructing objects can be applied recursively.

5. Objects and lvalues
An object is a manipulatable region of storage; an lvalue is an expression referring to an object. An
obvious example of an lvalue expression is an identifier. There are operators which yield lvalues: for
example, if E is an expression of pointer type, then *E is an lvalue expression referring to the object to
which E points. The name "lvalue" comes from the assignment expression E1 = E2 in which the left
operand E1 must be an lvalue expression. The discussion of each operator below indicates whether it
expects lvalue operands and whether it yields an lvalue.
6. Conversions
A number of operators may, depending on their operands, cause conversion of the value of an
operand from one type to another. This section explains the result to be expected from such conversions. §6.6 summarizes the conversions demanded by most ordinary operators; it will be supplemented as
required by the discussion of each operator.
6.1 Characters and integers
A character or a short integer may be used wherever an integer may be used. In all cases the value
is converted to an integer. Conversion of a shorter integer to a longer always involves sign extension;
integers are signed quantities. Whether or not sign-extension occurs for characters is machine dependent,
but it is guaranteed that a member of the standard character set is non-negative. Of the machines treated
by this manual, only the PDP-11 sign-extends. On the PDP-11, character variables range in value from
-128 to 127; the characters of the ASCII alphabet are all positive. A character constant specified with an
octal escape suffers sign extension and may appear negative; for example, '\377' has the value -1.
When a longer integer is converted to a shorter or to a char, it is truncated on the left; excess bits
are simply discarded.
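
Whether characters sign-extend on a given machine can be discovered at run time. The following is a minimal sketch in present-day C, assuming 8-bit characters; the helper names are hypothetical, and CHAR_MIN from <limits.h> is a later addition that answers the same question at compile time.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if a char with all 8 bits set
 * becomes negative when widened to int, 0 if it stays positive. */
int char_sign_extends(void)
{
    char c = (char)0xFF;        /* the bit pattern written '\377' in the text */
    int i = c;                  /* widened as described above */

    return i < 0;
}

/* Hypothetical helper: truncation on the left keeps only the low byte. */
int low_byte(int n)
{
    return (unsigned char)n;
}
```

On a machine that behaves like the PDP-11, char_sign_extends returns 1; on one where characters are unsigned it returns 0.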

6.2 float and double
All floating arithmetic in C is carried out in double-precision; whenever a float appears in an
expression it is lengthened to double by zero-padding its fraction. When a double must be converted
to float, for example by an assignment, the double is rounded before truncation to float length.

6.3 floating and integral
Conversions of floating values to integral type tend to be rather machine-dependent; in particular the
direction of truncation of negative numbers varies from machine to machine. The result is undefined if
the value will not fit in the space provided.
Conversions of integral values to floating type are well behaved. Some loss of precision occurs if the
destination lacks sufficient bits.
6.4 Pointers and integers
An integer or long integer may be added to or subtracted from a pointer; in such a case the first is
converted as specified in the discussion of the addition operator.
Two pointers to objects of the same type may be subtracted; in this case the result is converted to an
integer as specified in the discussion of the subtraction operator.

6.5 Unsigned
Whenever an unsigned integer and a plain integer are combined, the plain integer is converted to
unsigned and the result is unsigned. The value is the least unsigned integer congruent to the signed
integer (modulo 2^wordsize). In a 2's complement representation, this conversion is conceptual and there is
no actual change in the bit pattern.
When an unsigned integer is converted to long, the value of the result is the same numerically as
that of the unsigned integer. Thus the conversion amounts to padding with zeros on the left.
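
The consequence of this rule that most often surprises people is that a negative int compared with an unsigned value is converted first and so compares as a very large number. A minimal sketch in present-day C with hypothetical helper names; UINT_MAX from <limits.h> is the largest unsigned value, 2^wordsize - 1.

```c
#include <limits.h>

/* Hypothetical helper: the conversion applied to a plain integer
 * when it meets an unsigned one. */
unsigned to_unsigned(int i)
{
    return (unsigned)i;
}

/* Hypothetical helper: compare a plain integer with an unsigned one.
 * i is converted to unsigned before the comparison is made. */
int mixed_less(int i, unsigned u)
{
    return i < u;
}
```

Here to_unsigned(-1) is UINT_MAX, so mixed_less(-1, 1u) is 0 even though -1 is less than 1 as signed numbers.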
6.6 Arithmetic conversions
A great many operators cause conversions and yield result types in a similar way. This pattern will
be called the "usual arithmetic conversions."
First, any operands of type char or short are converted to int, and any of type float are converted to double.
Then, if either operand is double, the other is converted to double and that is the type of the
result.
Otherwise, if either operand is long, the other is converted to long and that is the type of the
result.
Otherwise, if either operand is unsigned, the other is converted to unsigned and that is the type
of the result.
Otherwise, both operands must be int, and that is the type of the result.
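
Two of these rules are easy to observe: a char operand is widened to int before arithmetic, and a double operand pulls an int operand up to double. A minimal sketch in present-day C with hypothetical helper names; it deliberately avoids float, which current compilers no longer always lengthen to double.

```c
/* Hypothetical helper: both operands are int, so the result is int
 * and the fractional part of the quotient is lost. */
int int_quotient(int a, int b)
{
    return a / b;
}

/* Hypothetical helper: one operand is double, so a is converted to
 * double and the division is carried out in double. */
double mixed_quotient(int a, double b)
{
    return a / b;
}

/* Hypothetical helper: returns 1 if adding 1 to a char yields an int. */
int char_sum_is_int(void)
{
    char c = 'a';

    return sizeof(c + 1) == sizeof(int);
}
```

Thus int_quotient(7, 2) is 3 while mixed_quotient(7, 2.0) is 3.5.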
7. Expressions
The precedence of expression operators is the same as the order of the major subsections of this
section, highest precedence first. Thus, for example, the expressions referred to as the operands of + (§7.4)
are those expressions defined in §§7.1-7.3. Within each subsection, the operators have the same
precedence. Left- or right-associativity is specified in each subsection for the operators discussed therein.
The precedence and associativity of all the expression operators is summarized in the grammar of §18.
Otherwise the order of evaluation of expressions is undefined. In particular the compiler considers
itself free to compute subexpressions in the order it believes most efficient, even if the subexpressions
involve side effects. The order in which side effects take place is unspecified. Expressions involving a
commutative and associative operator (*, +, &, |, ^) may be rearranged arbitrarily, even in the presence
of parentheses; to force a particular order of evaluation an explicit temporary must be used.
The handling of overflow and divide check in expression evaluation is machine-dependent. All
existing implementations of C ignore integer overflows; treatment of division by 0, and all floating-point
exceptions, varies between machines, and is usually adjustable by a library function.
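The use of an explicit temporary to force an order of evaluation can be sketched as follows. The helper functions first and second and the trace array are hypothetical; they exist only to record the order in which the calls occur. The syntax is that of present-day standard C.

```c
static int trace[2];   /* records the order in which the helpers ran */
static int ntrace;

static int first(void)  { trace[ntrace++] = 1; return 10; }
static int second(void) { trace[ntrace++] = 2; return 20; }

/* Calls first() before second() no matter how the compiler would
   order the operands of +, because the first result is stored in a
   temporary by a separate statement. */
int ordered_sum(void)
{
    int t;

    ntrace = 0;
    t = first();
    return t + second();
}

/* Returns the i-th entry of the recorded call order. */
int trace_at(int i)
{
    return trace[i];
}
```

Writing first() + second() in a single expression would give the same sum, but the order of the two calls would be left to the compiler.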
7.1 Primary expressions
Primary expressions involving ., ->, subscripting, and function calls group left to right.

primary-expression:
    identifier
    constant
    string
    ( expression )
    primary-expression [ expression ]
    primary-expression ( expression-list_opt )
    primary-lvalue . identifier
    primary-expression -> identifier
expression-list:
    expression
    expression-list , expression
An identifier is a primary expression, provided it has been suitably declared as discussed below. Its type
is specified by its declaration. If the type of the identifier is "array of ...", however, then the value of
the identifier-expression is a pointer to the first object in the array, and the type of the expression is
"pointer to ...". Moreover, an array identifier is not an lvalue expression. Likewise, an identifier which
is declared "function returning ...", when used except in the function-name position of a call, is
converted to "pointer to function returning ...".
A constant is a primary expression. Its type may be int, long, or double depending on its form.
Character constants have type int; floating constants are double.
A string is a primary expression. Its type is originally "array of char"; but following the same rule
given above for identifiers, this is modified to "pointer to char" and the result is a pointer to the first
character in the string. (There is an exception in certain initializers; see §8.6.)
A parenthesized expression is a primary expression whose type and value are identical to those of the
unadorned expression. The presence of parentheses does not affect whether the expression is an lvalue.
A primary expression followed by an expression in square brackets is a primary expression. The
intuitive meaning is that of a subscript. Usually, the primary expression has type "pointer to ...", the
subscript expression is int, and the type of the result is "...". The expression E1[E2] is identical (by
definition) to *((E1)+(E2)). All the clues needed to understand this notation are contained in this
section together with the discussions in §§7.1, 7.2, and 7.4 on identifiers, *, and + respectively; §14.3 below
summarizes the implications.
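The identity between E1[E2] and *((E1)+(E2)) may be illustrated by the sketch below. The array table and the three accessor functions are hypothetical names chosen for the illustration.

```c
static int table[4] = { 10, 20, 30, 40 };

/* The ordinary subscript form. */
int by_subscript(int i)
{
    return table[i];
}

/* The form to which the subscript is equivalent by definition. */
int by_indirection(int i)
{
    return *((table) + (i));
}

/* Because + is commutative, the operands may be exchanged. */
int by_reversed(int i)
{
    return i[table];
}
```

All three functions return the same element for a given i; the reversed form is shown only to make the definition concrete and is not recommended style.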

A function call is a primary expression followed by parentheses containing a possibly empty,
comma-separated list of expressions which constitute the actual arguments to the function. The primary
expression must be of type "function returning ...", and the result of the function call is of type "...".
As indicated below, a hitherto unseen identifier followed immediately by a left parenthesis is contextually
declared to represent a function returning an integer; thus in the most common case, integer-valued
functions need not be declared.
Any actual arguments of type float are converted to double before the call; any of type char or
short are converted to int; and as usual, array names are converted to pointers. No other conversions
are performed automatically; in particular, the compiler does not compare the types of actual arguments
with those of formal arguments. If conversion is needed, use a cast; see §7.2, 8.7.
In preparing for the call to a function, a copy is made of each actual parameter; thus, all
argument-passing in C is strictly by value. A function may change the values of its formal parameters, but these
changes cannot affect the values of the actual parameters. On the other hand, it is possible to pass a
pointer on the understanding that the function may change the value of the object to which the pointer
points. An array name is a pointer expression. The order of evaluation of arguments is undefined by the
language; take note that the various compilers differ.
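The difference between changing a formal parameter and changing the object to which a pointer argument points can be sketched as follows. The function names are hypothetical, and the declarations use present-day prototype syntax, which this manual does not describe.

```c
/* Changes only the local copy of the argument. */
void bump_copy(int n)
{
    n = n + 1;
}

/* Changes the object to which the pointer argument points. */
void bump_through(int *p)
{
    *p = *p + 1;
}

int call_by_value_demo(void)
{
    int x = 5;

    bump_copy(x);       /* x is still 5 */
    bump_through(&x);   /* x is now 6 */
    return x;
}
```

The pointer itself is still passed by value; only the object it designates is shared between caller and callee.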
Recursive calls to any function are permitted.
A primary expression followed by a dot followed by an identifier is an expression. The first
expression must be an lvalue naming a structure or a union, and the identifier must name a member of the
structure or union. The result is an lvalue referring to the named member of the structure or union.
A primary expression followed by an arrow (built from a - and a >) followed by an identifier is an
expression. The first expression must be a pointer to a structure or a union and the identifier must name
a member of that structure or union. The result is an lvalue referring to the named member of the
structure or union to which the pointer expression points.
Thus the expression E1->MOS is the same as (*E1).MOS. Structures and unions are discussed in
§8.5. The rules given here for the use of structures and unions are not enforced strictly, in order to allow
an escape from the typing mechanism. See §14.1.

7.2 Unary operators
Expressions with unary operators group right-to-left.
unary-expression:
    * expression
    & lvalue
    - expression
    ! expression
    ~ expression
    ++ lvalue
    -- lvalue
    lvalue ++
    lvalue --
    ( type-name ) expression
    sizeof expression
    sizeof ( type-name )

The unary * operator means indirection: the expression must be a pointer, and the result is an lvalue
referring to the object to which the expression points. If the type of the expression is "pointer to ...",
the type of the result is "...".
The result of the unary & operator is a pointer to the object referred to by the lvalue. If the type of
the lvalue is "...", the type of the result is "pointer to ...".
The result of the unary - operator is the negative of its operand. The usual arithmetic conversions
are performed. The negative of an unsigned quantity is computed by subtracting its value from 2^n,
where n is the number of bits in an int. There is no unary + operator.
The result of the logical negation operator ! is 1 if the value of its operand is 0, 0 if the value of its
operand is non-zero. The type of the result is int. It is applicable to any arithmetic type or to pointers.
The ~ operator yields the one's complement of its operand. The usual arithmetic conversions are
performed. The type of the operand must be integral.
The object referred to by the lvalue operand of prefix ++ is incremented. The value is the new value
of the operand, but is not an lvalue. The expression ++x is equivalent to x+=1. See the discussions of
addition (§7.4) and assignment operators (§7.14) for information on conversions.


The lvalue operand of prefix -- is decremented analogously to the prefix ++ operator.
When postfix ++ is applied to an lvalue the result is the value of the object referred to by the lvalue.
After the result is noted, the object is incremented in the same manner as for the prefix ++ operator.
The type of the result is the same as the type of the lvalue expression.
When postfix -- is applied to an lvalue the result is the value of the object referred to by the lvalue.
After the result is noted, the object is decremented in the same manner as for the prefix -- operator. The type
of the result is the same as the type of the lvalue expression.
An expression preceded by the parenthesized name of a data type causes conversion of the value of
the expression to the named type. This construction is called a cast. Type names are described in §8.7.
The sizeof operator yields the size, in bytes, of its operand. (A byte is undefined by the language
except in terms of the value of sizeof. However, in all existing implementations a byte is the space
required to hold a char.) When applied to an array, the result is the total number of bytes in the array.
The size is determined from the declarations of the objects in the expression. This expression is
semantically an integer constant and may be used anywhere a constant is required. Its major use is in
communication with routines like storage allocators and I/O systems.
The sizeof operator may also be applied to a parenthesized type name. In that case it yields the
size, in bytes, of an object of the indicated type.
The construction sizeof(type) is taken to be a unit, so the expression sizeof(type)-2 is the
same as (sizeof(type))-2.
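A common use of sizeof on an array is to recover the number of elements, as in this sketch. The array table and the function names are hypothetical, and the type size_t belongs to later standards rather than to this manual.

```c
#include <stddef.h>

static long table[12];

/* Total number of bytes occupied by the array. */
size_t table_bytes(void)
{
    return sizeof(table);
}

/* Number of elements: total bytes divided by the bytes in one element. */
size_t table_count(void)
{
    return sizeof(table) / sizeof(table[0]);
}

/* sizeof(long) is taken as a unit, so this is (sizeof(long)) - 2. */
size_t long_size_minus_two(void)
{
    return sizeof(long) - 2;
}
```

Because the size is determined from the declaration, table_count continues to give the right answer if the dimension of table is changed.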

7.3 Multiplicative operators
The multiplicative operators *, /, and % group left-to-right. The usual arithmetic conversions are
performed.

multiplicative-expression:
    expression * expression
    expression / expression
    expression % expression

The binary * operator indicates multiplication. The * operator is associative and expressions with
several multiplications at the same level may be rearranged by the compiler.
The binary / operator indicates division. When positive integers are divided truncation is toward 0,
but the form of truncation is machine-dependent if either operand is negative. On all machines covered
by this manual, the remainder has the same sign as the dividend. It is always true that (a/b)*b + a%b
is equal to a (if b is not 0).
The binary % operator yields the remainder from the division of the first expression by the second.
The usual arithmetic conversions are performed. The operands must not be float.
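The identity relating / and % can be checked directly. The function below is a hypothetical helper, and it assumes that b is not 0.

```c
/* Returns 1 when (a/b)*b + a%b reproduces a. The caller must
   ensure that b is not 0. */
int div_identity_holds(int a, int b)
{
    return (a / b) * b + a % b == a;
}
```

The identity holds whatever the signs of the operands, although the individual values of a/b and a%b for negative operands depend on the form of truncation used.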

7.4 Additive operators
The additive operators + and - group left-to-right. The usual arithmetic conversions are performed.
There are some additional type possibilities for each operator.
additive-expression:
    expression + expression
    expression - expression

The result of the + operator is the sum of the operands. A pointer to an object in an array and a value of
any integral type may be added. The latter is in all cases converted to an address offset by multiplying it
by the length of the object to which the pointer points. The result is a pointer of the same type as the
original pointer, and which points to another object in the same array, appropriately offset from the
original object. Thus if P is a pointer to an object in an array, the expression P+1 is a pointer to the next
object in the array.
No further type combinations are allowed for pointers.
The + operator is associative and expressions with several additions at the same level may be
rearranged by the compiler.
The result of the - operator is the difference of the operands. The usual arithmetic conversions are
performed. Additionally, a value of any integral type may be subtracted from a pointer, and then the
same conversions as for addition apply.
If two pointers to objects of the same type are subtracted, the result is converted (by division by the
length of the object) to an int representing the number of objects separating the pointed-to objects.
This conversion will in general give unexpected results unless the pointers point to objects in the same
array, since pointers, even to objects of the same type, do not necessarily differ by a multiple of the
object-length.
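The scaling applied in pointer addition and subtraction can be sketched as follows. The array samples and the function names are hypothetical. In present-day standard C the difference of two pointers has the type ptrdiff_t rather than int, and the sketch follows that usage.

```c
#include <stddef.h>

static double samples[8];

/* Number of elements (not bytes) separating two elements of the array. */
ptrdiff_t elements_between(int i, int j)
{
    double *p = &samples[i];
    double *q = &samples[j];

    return q - p;
}

/* Distance in bytes covered by adding 1 to a pointer to double. */
size_t step_in_bytes(void)
{
    double *p = samples;

    return (size_t)((char *)(p + 1) - (char *)p);
}
```

Both pointers in elements_between refer to the same array, which is the only case in which the result is dependable.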

7.5 Shift operators
The shift operators << and >> group left-to-right. Both perform the usual arithmetic conversions on
their operands, each of which must be integral. Then the right operand is converted to int; the type of
the result is that of the left operand. The result is undefined if the right operand is negative, or greater
than or equal to the length of the object in bits.

shift-expression:
    expression << expression
    expression >> expression
The value of E1<<E2 is E1 (interpreted as a bit pattern) left-shifted E2 bits; vacated bits are 0-filled.
The value of E1>>E2 is E1 right-shifted E2 bit positions. The right shift is guaranteed to be logical
(0-fill) if E1 is unsigned; otherwise it may be (and is, on the PDP-11) arithmetic (fill by a copy of the sign
bit).
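The guarantee of a logical right shift for unsigned operands can be sketched as follows. The macro UINT_BITS and the function names are hypothetical, and the macro assumes that every bit of an unsigned int takes part in its value.

```c
#include <limits.h>

/* Width of an unsigned int in bits on this implementation. */
#define UINT_BITS (sizeof(unsigned) * CHAR_BIT)

/* Right shift of an unsigned value is logical: zeros enter from the
   left, so only the original top bit remains. */
unsigned top_bit(unsigned x)
{
    return x >> (UINT_BITS - 1);
}

/* Left shift fills the vacated low-order bits with zeros. */
unsigned times_eight(unsigned x)
{
    return x << 3;
}
```

The shift count in top_bit is one less than the width of the operand, so it stays within the range for which the result is defined.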
7.6 Relational operators
The relational operators group left-to-right, but this fact is not very useful; a<b<c does not mean
what it seems to.

relational-expression:
    expression < expression
    expression > expression
    expression <= expression
    expression >= expression
The operators < (less than), > (greater than), <= (less than or equal to) and >= (greater than or equal to)
all yield 0 if the specified relation is false and 1 if it is true. The type of the result is int. The usual
arithmetic conversions are performed. Two pointers may be compared; the result depends on the relative
locations in the address space of the pointed-to objects. Pointer comparison is portable only when the
pointers point to objects in the same array.
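The remark that a<b<c does not mean what it seems to can be made concrete. In the sketch below, whose function names are hypothetical, the first function uses the chained form and the second states the intended test.

```c
/* Groups as (a < b) < c: the 0 or 1 from the first comparison is
   then compared with c. */
int chained(int a, int b, int c)
{
    return a < b < c;
}

/* Tests whether b lies strictly between a and c. */
int between(int a, int b, int c)
{
    return a < b && b < c;
}
```

For the arguments 3, 2, 1 the chained form yields 1, because 3<2 is 0 and 0<1 is true, while between yields 0.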
7.7 Equality operators

equality-expression:
    expression == expression
    expression != expression
The == (equal to) and the != (not equal to) operators are exactly analogous to the relational operators
except for their lower precedence. (Thus a<b == c<d is 1 whenever a<b and c<d have the same
truth-value).
A pointer may be compared to an integer, but the result is machine dependent unless the integer is
the constant 0. A pointer to which 0 has been assigned is guaranteed not to point to any object, and will
appear to be equal to 0; in conventional usage, such a pointer is considered to be null.
7.8 Bitwise AND operator

and-expression:
    expression & expression
The & operator is associative and expressions involving & may be rearranged. The usual arithmetic
conversions are performed; the result is the bitwise AND function of the operands. The operator applies
only to integral operands.
7.9 Bitwise exclusive OR operator

exclusive-or-expression:
    expression ^ expression
The ^ operator is associative and expressions involving ^ may be rearranged. The usual arithmetic
conversions are performed; the result is the bitwise exclusive OR function of the operands. The operator
applies only to integral operands.

7.10 Bitwise inclusive OR operator
inclusive-or-expression:
    expression | expression
The | operator is associative and expressions involving | may be rearranged. The usual arithmetic
conversions are performed; the result is the bitwise inclusive OR function of its operands. The operator
applies only to integral operands.
7.11 Logical AND operator

logical-and-expression:
    expression && expression
The && operator groups left-to-right. It returns 1 if both its operands are non-zero, 0 otherwise. Unlike
&, && guarantees left-to-right evaluation; moreover the second operand is not evaluated if the first
operand is 0.
The operands need not have the same type, but each must have one of the fundamental types or be
a pointer. The result is always int.
7.12 Logical OR operator

logical-or-expression:
    expression || expression
The || operator groups left-to-right. It returns 1 if either of its operands is non-zero, and 0 otherwise.
Unlike |, || guarantees left-to-right evaluation; moreover, the second operand is not evaluated if the
value of the first operand is non-zero.
The operands need not have the same type, but each must have one of the fundamental types or be
a pointer. The result is always int.
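Because the second operand of && or || is evaluated only when needed, the first operand can guard the second. The sketch below uses hypothetical names; the counter probes exists only to show whether the second operand was evaluated.

```c
static int probes;   /* counts how many times is_positive() is called */

static int is_positive(int *p)
{
    probes++;
    return *p > 0;
}

/* The pointer is examined only after it is known not to be null. */
int safe_positive(int *p)
{
    return p != 0 && is_positive(p);
}

/* The indirection is skipped when the pointer is null. */
int null_or_zero(int *p)
{
    return p == 0 || *p == 0;
}

/* Returns the number of calls made to is_positive() so far. */
int probe_count(void)
{
    return probes;
}
```

Calling safe_positive with a null pointer returns 0 without calling is_positive, so probe_count is unchanged by such a call.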
7.13 Conditional operator

conditional-expression:
    expression ? expression : expression
Conditional expressions group right-to-left. The first expression is evaluated and if it is non-zero, the
result is the value of the second expression, otherwise that of third expression. If possible, the usual
arithmetic conversions are performed to bring the second and third expressions to a common type;
otherwise, if both are pointers of the same type, the result has the common type; otherwise, one must be a
pointer and the other the constant 0, and the result has the type of the pointer. Only one of the second
and third expressions is evaluated.
7.14 Assignment operators
There are a number of assignment operators, all of which group right-to-left. All require an lvalue as
their left operand, and the type of an assignment expression is that of its left operand. The value is the
value stored in the left operand after the assignment has taken place. The two parts of a compound
assignment operator are separate tokens.

assignment-expression:
    lvalue = expression
    lvalue += expression
    lvalue -= expression
    lvalue *= expression
    lvalue /= expression
    lvalue %= expression
    lvalue >>= expression
    lvalue <<= expression
    lvalue &= expression
    lvalue ^= expression
    lvalue |= expression
In the simple assignment with =, the value of the expression replaces that of the object referred to by
the lvalue. If both operands have arithmetic type, the right operand is converted to the type of the left
preparatory to the assignment.
The behavior of an expression of the form E1 op= E2 may be inferred by taking it as equivalent to
E1 = E1 op (E2); however, E1 is evaluated only once. In += and -=, the left operand may be a
pointer, in which case the (integral) right operand is converted as explained in §7.4; all right operands
and all non-pointer left operands must have arithmetic type.
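The statement that E1 is evaluated only once matters when E1 has side effects. In the sketch below, whose names are hypothetical, the subscript expression calls a function that counts its own invocations.

```c
static int calls;      /* number of times next_index() has run */
static int slots[4];

static int next_index(void)
{
    return calls++;
}

/* The left operand contains a function call; it is evaluated exactly
   once, so each use of += advances the index by one, not two. */
int add_to_next_slot(int amount)
{
    slots[next_index()] += amount;
    return calls;
}

/* Returns the current contents of slot i. */
int slot(int i)
{
    return slots[i];
}
```

Had the assignment been written out as slots[next_index()] = slots[next_index()] + amount, the function would have been called twice and two different slots would have been involved.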
The compilers currently allow a pointer to be assigned to an integer, an integer to a pointer, and a
pointer to a pointer of another type. The assignment is a pure copy operation, with no conversion. This
usage is nonportable, and may produce pointers which cause addressing exceptions when used. However,
it is guaranteed that assignment of the constant 0 to a pointer will produce a null pointer distinguishable
from a pointer to any object.

7.15 Comma operator

comma-expression:
    expression , expression

A pair of expressions separated by a comma is evaluated left-to-right and the value of the left expression
is discarded. The type and value of the result are the type and value of the right operand. This operator
groups left-to-right. In contexts where comma is given a special meaning, for example in a list of actual
arguments to functions (§7.1) and lists of initializers (§8.6), the comma operator as described in this
section can only appear in parentheses; for example,

f(a, (t=3, t+2), c)
has three arguments, the second of which has the value 5.
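The example can be made into a complete sketch. The function f below is a hypothetical three-argument function that records its second argument; the values 1 and 10 stand in for a and c.

```c
static int second_arg;   /* the second argument most recently passed to f */

static int f(int a, int b, int c)
{
    second_arg = b;
    return a + b + c;
}

/* The parenthesized comma expression is a single argument: t is set
   to 3, that value is discarded, and t+2 is passed. */
int comma_demo(void)
{
    int t;

    return f(1, (t = 3, t + 2), 10);
}

/* Returns the second argument seen by the latest call of f. */
int last_second_arg(void)
{
    return second_arg;
}
```

Without the inner parentheses the commas would separate arguments, and f would be called with four arguments instead of three.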

8. Declarations
Declarations are used to specify the interpretation which C gives to each identifier; they do not
necessarily reserve storage associated with the identifier. Declarations have the form
declaration:
    decl-specifiers declarator-list_opt ;

The declarators in the declarator-list contain the identifiers being declared. The decl-specifiers consist of a
sequence of type and storage class specifiers.
decl-specifiers:
    type-specifier decl-specifiers_opt
    sc-specifier decl-specifiers_opt

The list must be self-consistent in a way described below.
8.1 Storage class specifiers
The sc-specifiers are:
sc-specifier:
    auto
    static
    extern
    register
    typedef

The typedef specifier does not reserve storage and is called a "storage class specifier" only for syntactic
convenience; it is discussed in §8.8. The meanings of the various storage classes were discussed in §4.
The auto, static, and register declarations also serve as definitions in that they cause an
appropriate amount of storage to be reserved. In the extern case there must be an external definition
(§10) for the given identifiers somewhere outside the function in which they are declared.
A register declaration is best thought of as an auto declaration, together with a hint to the
compiler that the variables declared will be heavily used. Only the first few such declarations are effective.
Moreover, only variables of certain types will be stored in registers; on the PDP-11, they are int, char,
or pointer. One other restriction applies to register variables: the address-of operator & cannot be applied
to them. Smaller, faster programs can be expected if register declarations are used appropriately, but
future improvements in code generation may render them unnecessary.


At most one sc-specifier may be given in a declaration. If the sc-specifier is missing from a
declaration, it is taken to be auto inside a function, extern outside. Exception: functions are never automatic.

8.2 Type specifiers
The type-specifiers are
type-specifier:
    char
    short
    int
    long
    unsigned
    float
    double
    struct-or-union-specifier
    typedef-name

The words long, short, and unsigned may be thought of as adjectives; the following combinations are
acceptable.

    short int
    long int
    unsigned int
    long float
The meaning of the last is the same as double. Otherwise, at most one type-specifier may be given in a
declaration. If the type-specifier is missing from a declaration, it is taken to be int.
Specifiers for structures and unions are discussed in §8.5; declarations with typedef names are
discussed in §8.8.
8.3 Declarators
The declarator-list appearing in a declaration is a comma-separated sequence of declarators, each of
which may have an initializer.
declarator-list:
    init-declarator
    init-declarator , declarator-list
init-declarator:
    declarator initializer_opt

Initializers are discussed in §8.6. The specifiers in the declaration indicate the type and storage class of
the objects to which the declarators refer. Declarators have the syntax:
declarator:
    identifier
    ( declarator )
    * declarator
    declarator ( )
    declarator [ constant-expression_opt ]

The grouping is the same as in expressions.

8.4 Meaning of declarators
Each declarator is taken to be an assertion that when a construction of the same form as the
declarator appears in an expression, it yields an object of the indicated type and storage class. Each declarator
contains exactly one identifier; it is this identifier that is declared.
If an unadorned identifier appears as a declarator, then it has the type indicated by the specifier
heading the declaration.
A declarator in parentheses is identical to the unadorned declarator, but the binding of complex
declarators may be altered by parentheses. See the examples below.
Now imagine a declaration

T D1

where T is a type-specifier (like int, etc.) and D1 is a declarator. Suppose this declaration makes the
identifier have type "... T," where the "..." is empty if D1 is just a plain identifier (so that the type of
x in "int x" is just int). Then if D1 has the form

*D

the type of the contained identifier is "... pointer to T."
If D1 has the form

D()

then the contained identifier has the type "... function returning T."
If D1 has the form

D[constant-expression]

or

D[]

then the contained identifier has type "... array of T." In the first case the constant expression is an
expression whose value is determinable at compile time, and whose type is int. (Constant expressions
are defined precisely in §15.) When several "array of" specifications are adjacent, a multi-dimensional
array is created; the constant expressions which specify the bounds of the arrays may be missing only for
the first member of the sequence. This elision is useful when the array is external and the actual
definition, which allocates storage, is given elsewhere. The first constant-expression may also be omitted
when the declarator is followed by initialization. In this case the size is calculated from the number of
initial elements supplied.
An array may be constructed from one of the basic types, from a pointer, from a structure or union,
or from another array (to generate a multi-dimensional array).
Not all the possibilities allowed by the syntax above are actually permitted. The restrictions are as
follows: functions may not return arrays, structures, unions or functions, although they may return
pointers to such things; there are no arrays of functions, although there may be arrays of pointers to
functions. Likewise a structure or union may not contain a function, but it may contain a pointer to a
function.
As an example, the declaration
int i, *ip, f(), *fip(), (*pfi)();

declares an integer i, a pointer ip to an integer, a function f returning an integer, a function fip
returning a pointer to an integer, and a pointer pfi to a function which returns an integer. It is
especially useful to compare the last two. The binding of *fip() is *(fip()), so that the declaration
suggests, and the same construction in an expression requires, the calling of a function fip, and then using
indirection through the (pointer) result to yield an integer. In the declarator (*pfi)(), the extra
parentheses are necessary, as they are also in an expression, to indicate that indirection through a pointer
to a function yields a function, which is then called; it returns an integer.
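The contrast between *fip() and (*pfi)() can be seen in use. The sketch below gives the functions bodies and uses present-day prototype syntax; the variable value and the wrapper functions through_fip and through_pfi are hypothetical additions.

```c
static int value = 42;

int f(void) { return 7; }            /* function returning int */
int *fip(void) { return &value; }    /* function returning pointer to int */
int (*pfi)(void) = f;                /* pointer to function returning int */

/* Call fip, then apply indirection to the pointer it returns. */
int through_fip(void)
{
    return *fip();
}

/* Apply indirection to pfi to obtain a function, then call it. */
int through_pfi(void)
{
    return (*pfi)();
}
```

In each case the expression has the same shape as the declarator, which is the rule of thumb the preceding paragraph describes.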
As another example,

float fa[17], *afp[17];
declares an array of float numbers and an array of pointers to float numbers. Finally,
static int x3d[3][5][7];
declares a static three-dimensional array of integers, with rank 3×5×7. In complete detail, x3d is an
array of three items; each item is an array of five arrays; each of the latter arrays is an array of seven
integers. Any of the expressions x3d, x3d[i], x3d[i][j], x3d[i][j][k] may reasonably appear in
an expression. The first three have type "array," the last has type int.
8.5 Structure and union declarations
A structure is an object consisting of a sequence of named members. Each member may have any
type. A union is an object which may, at a given time, contain any one of several members. Structure
and union specifiers have the same form.

struct-or-union-specifier:
    struct-or-union { struct-decl-list }
    struct-or-union identifier { struct-decl-list }
    struct-or-union identifier
struct-or-union:
    struct
    union
The struct-decl-list is a sequence of declarations for the members of the structure or union:

struct-decl-list:
    struct-declaration
    struct-declaration struct-decl-list
struct-declaration:
    type-specifier struct-declarator-list ;
struct-declarator-list:
    struct-declarator
    struct-declarator , struct-declarator-list
In the usual case, a struct-declarator is just a declarator for a member of a structure or union. A
structure member may also consist of a specified number of bits. Such a member is also called a field; its
length is set off from the field name by a colon.

struct-declarator:
    declarator
    declarator : constant-expression
    : constant-expression
Within a structure, the objects declared have addresses which increase as their declarations are read
left-to-right. Each non-field member of a structure begins on an addressing boundary appropriate to its type;
therefore, there may be unnamed holes in a structure. Field members are packed into machine integers;
they do not straddle words. A field which does not fit into the space remaining in a word is put into the
next word. No field may be wider than a word. Fields are assigned right-to-left on the PDP-11,
left-to-right on other machines.
A struct-declarator with no declarator, only a colon and a width, indicates an unnamed field useful
for padding to conform to externally-imposed layouts. As a special case, an unnamed field with a width
of 0 specifies alignment of the next field at a word boundary. The "next field" presumably is a field, not
an ordinary structure member, because in the latter case the alignment would have been automatic.
The language does not restrict the types of things that are declared as fields, but implementations are
not required to support any but integer fields. Moreover, even int fields may be considered to be
unsigned. On the PDP-11, fields are not signed and have only integer values. In all implementations,
there are no arrays of fields, and the address-of operator & may not be applied to them, so that there are
no pointers to fields.
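A structure containing fields might be declared as in the sketch below. The structure flags, its member names, and the function store_mode are hypothetical. Only unsigned fields are used, since the treatment of signed fields varies between implementations.

```c
struct flags {
    unsigned ready : 1;   /* a one-bit field */
    unsigned mode  : 3;   /* a three-bit field, holding 0 through 7 */
    unsigned       : 0;   /* unnamed zero-width field: align the next field */
    unsigned count : 4;   /* begins at a word boundary */
};

/* Stores m in the three-bit field and returns what the field holds.
   Values too large for the field are reduced modulo 8. */
unsigned store_mode(unsigned m)
{
    struct flags fl;

    fl.ready = 1;
    fl.mode = m;
    fl.count = 0;
    return fl.mode;
}
```

The order in which the fields are laid out within a word is not visible to this code, which is why a declaration of this kind is portable while assumptions about the resulting bit pattern are not.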
A union may be thought of as a structure all of whose members begin at offset 0 and whose size is
sufficient to contain any of its members. At most one of the members can be stored in a union at any
time.
A structure or union specifier of the second form, that is, one of
    struct identifier { struct-decl-list }
    union identifier { struct-decl-list }
declares the identifier to be the structure tag (or union tag) of the structure specified by the list. A
subsequent declaration may then use the third form of specifier, one of
    struct identifier
    union identifier
Structure tags allow definition of self-referential structures; they also permit the long part of the
declaration to be given once and used several times. It is illegal to declare a structure or union which contains
an instance of itself, but a structure or union may contain a pointer to an instance of itself.


The names of members and tags may be the same as ordinary variables. However, names of tags
and members must be mutually distinct.
Two structures may share a common initial sequence of members; that is, the same member may
appear in two different structures if it has the same type in both and if all previous members are the same
in both. (Actually, the compiler checks only that a name in two different structures has the same type
and offset in both, but if preceding members differ the construction is nonportable.)
A simple example of a structure declaration is

struct tnode {
    char tword[20];
    int count;
    struct tnode *left;
    struct tnode *right;
};
which contains an array of 20 characters, an integer, and two pointers to similar structures. Once this
declaration has been given, the declaration

struct tnode s, *sp;
declares s to be a structure of the given sort and sp to be a pointer to a structure of the given sort. With
these declarations, the expression
sp->count
refers to the count field of the structure to which sp points;

s.left
refers to the left subtree pointer of the structure s; and

s.right->tword[0]
refers to the first character of the tword member of the right subtree of s.
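The member references above can be exercised on a small tree. In the sketch below the declaration of struct tnode is repeated so that the fragment stands alone; the objects leaf and root and the three functions are hypothetical. The structures are static because, as §8.6 explains, automatic aggregates may not be initialized.

```c
struct tnode {
    char tword[20];
    int count;
    struct tnode *left;
    struct tnode *right;
};

static struct tnode leaf = { "zebra", 1, 0, 0 };
static struct tnode root = { "mouse", 2, 0, &leaf };

/* sp->count refers to the count member of the structure sp points to. */
int root_count(void)
{
    struct tnode *sp = &root;

    return sp->count;
}

/* First character of the tword member of the right subtree. */
char right_initial(void)
{
    return root.right->tword[0];
}

/* E1->MOS is the same as (*E1).MOS. */
int arrow_matches_dot(void)
{
    struct tnode *sp = &root;

    return sp->count == (*sp).count;
}
```

The initializer for root uses the address of leaf, which is permitted for static variables because it reduces to the address of a previously declared variable.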
8.6 Initialization
A declarator may specify an initial value for the identifier being declared. The initializer is preceded
by =, and consists of an expression or a list of values nested in braces.
initializer:
    = expression
    = { initializer-list }
    = { initializer-list , }
initializer-list:
    expression
    initializer-list , initializer-list
    { initializer-list }

All the expressions in an initializer for a static or external variable must be constant expressions,
which are described in §15, or expressions which reduce to the address of a previously declared variable,
possibly offset by a constant expression. Automatic or register variables may be initialized by arbitrary
expressions involving constants, and previously declared variables and functions.
Static and external variables which are not initialized are guaranteed to start off as 0; automatic and
register variables which are not initialized are guaranteed to start off as garbage.
When an initializer applies to a scalar (a pointer or an object of arithmetic type), it consists of a
single expression, perhaps in braces. The initial value of the object is taken from the expression; the same
conversions as for assignment are performed.
When the declared variable is an aggregate (a structure or array) then the initializer consists of a
brace-enclosed, comma-separated list of initializers for the members of the aggregate, written in
increasing subscript or member order. If the aggregate contains subaggregates, this rule applies recursively to
the members of the aggregate. If there are fewer initializers in the list than there are members of the
aggregate, then the aggregate is padded with 0's. It is not permitted to initialize unions or automatic
aggregates.


Braces may be elided as follows. If the initializer begins with a left brace, then the succeeding
comma-separated list of initializers initializes the members of the aggregate; it is erroneous for there to
be more initializers than members. If, however, the initializer does not begin with a left brace, then only
enough elements from the list are taken to account for the members of the aggregate; any remaining
members are left to initialize the next member of the aggregate of which the current aggregate is a part.
A final abbreviation allows a char array to be initialized by a string. In this case successive
characters of the string initialize the members of the array.
For example,
    int x[] = { 1, 3, 5 };
declares and initializes x as a 1-dimensional array which has three members, since no size was specified
and there are three initializers.
    float y[4][3] = {
        { 1, 3, 5 },
        { 2, 4, 6 },
        { 3, 5, 7 },
    };
is a completely-bracketed initialization: 1, 3, and 5 initialize the first row of the array y[0], namely
y[0][0], y[0][1], and y[0][2]. Likewise the next two lines initialize y[1] and y[2]. The initializer ends
early and therefore y[3] is initialized with 0. Precisely the same effect could have been
achieved by
    float y[4][3] = {
        1, 3, 5, 2, 4, 6, 3, 5, 7
    };
The initializer for y begins with a left brace, but that for y[0] does not, therefore 3 elements from the
list are used. Likewise the next three are taken successively for y[1] and y[2]. Also,
    float y[4][3] = {
        { 1 }, { 2 }, { 3 }, { 4 }
    };
initializes the first column of y (regarded as a two-dimensional array) and leaves the rest 0.
Finally,
    char msg[] = "Syntax error on line %s\n";
shows a character array whose members are initialized with a string.

8.7 Type names
In two contexts (to specify type conversions explicitly by means of a cast, and as an argument of
sizeof) it is desired to supply the name of a data type. This is accomplished using a "type name,"
which in essence is a declaration for an object of that type which omits the name of the object.
type-name:
    type-specifier abstract-declarator
abstract-declarator:
    empty
    ( abstract-declarator )
    * abstract-declarator
    abstract-declarator ()
    abstract-declarator [ constant-expression_opt ]
To avoid ambiguity, in the construction
    ( abstract-declarator )
the abstract-declarator is required to be non-empty. Under this restriction, it is possible to identify
uniquely the location in the abstract-declarator where the identifier would appear if the construction were
a declarator in a declaration. The named type is then the same as the type of the hypothetical identifier. For example,


    int
    int *
    int *[3]
    int (*)[3]
    int *()
    int (*)()
name respectively the types "integer," "pointer to integer," "array of 3 pointers to integers," "pointer
to an array of 3 integers," "function returning pointer to integer," and "pointer to function returning an
integer."
8.8 Typedef
Declarations whose "storage class" is typedef do not define storage, but instead define identifiers
which can be used later as if they were type keywords naming fundamental or derived types.
typedef-name:
    identifier
Within the scope of a declaration involving typedef, each identifier appearing as part of any declarator
therein becomes syntactically equivalent to the type keyword naming the type associated with the identifier
in the way described in §8.4. For example, after
    typedef int MILES, *KLICKSP;
    typedef struct { double re, im; } complex;
the constructions
    MILES distance;
    extern KLICKSP metricp;
    complex z, *zp;
are all legal declarations; the type of distance is int, that of metricp is "pointer to int," and that of
z is the specified structure. zp is a pointer to such a structure.
typedef does not introduce brand new types, only synonyms for types which could be specified in
another way. Thus in the example above distance is considered to have exactly the same type as any
other int object.
9. Statements
Except as indicated, statements are executed in sequence.
9.1 Expression statement
Most statements are expression statements, which have the form
    expression ;
Usually expression statements are assignments or function calls.

9.2 Compound statement, or block
So that several statements can be used where one is expected, the compound statement (also, and
equivalently, called "block") is provided:
compound-statement:
    { declaration-list_opt statement-list_opt }
declaration-list:
    declaration
    declaration declaration-list
statement-list:
    statement
    statement statement-list
If any of the identifiers in the declaration-list were previously declared, the outer declaration is pushed
down for the duration of the block, after which it resumes its force.

Any initializations of auto or register variables are performed each time the block is entered at
the top. It is currently possible (but a bad practice) to transfer into a block; in that case the initializations
are not performed. Initializations of static variables are performed only once when the program begins
execution. Inside a block, extern declarations do not reserve storage so initialization is not permitted.
9.3 Conditional statement
The two forms of the conditional statement are
    if ( expression ) statement
    if ( expression ) statement else statement
In both cases the expression is evaluated and if it is non-zero, the first substatement is executed. In the
second case the second substatement is executed if the expression is 0. As usual the "else" ambiguity is
resolved by connecting an else with the last encountered else-less if.
9.4 While statement
The while statement has the form
while ( expression ) statement
The substatement is executed repeatedly so long as the value of the expression remains non-zero. The
test takes place before each execution of the statement.

9.5 Do statement
The do statement has the form
do statement while ( expression ) ;
The substatement is executed repeatedly until the value of the expression becomes zero. The test takes
place after each execution of the statement.
9.6 For statement
The for statement has the form
    for ( expression-1_opt ; expression-2_opt ; expression-3_opt ) statement
This statement is equivalent to
    expression-1 ;
    while ( expression-2 ) {
        statement
        expression-3 ;
    }
Thus the first expression specifies initialization for the loop; the second specifies a test, made before each
iteration, such that the loop is exited when the expression becomes 0; the third expression often specifies
an incrementation which is performed after each iteration.
Any or all of the expressions may be dropped. A missing expression-2 makes the implied while
clause equivalent to while(1); other missing expressions are simply dropped from the expansion above.
9.7 Switch statement
The switch statement causes control to be transferred to one of several statements depending on
the value of an expression. It has the form
switch ( expression ) statement
The usual arithmetic conversion is performed on the expression, but the result must be int. The
statement is typically compound. Any statement within the statement may be labeled with one or more case
prefixes as follows:
    case constant-expression :
where the constant expression must be int. No two of the case constants in the same switch may have
the same value. Constant expressions are precisely defined in §15.
There may also be at most one statement prefix of the form


    default :

When the switch statement is executed, its expression is evaluated and compared with each case
constant. If one of the case constants is equal to the value of the expression, control is passed to the
statement following the matched case prefix. If no case constant matches the expression, and if there is a
default prefix, control passes to the prefixed statement. If no case matches and if there is no default
then none of the statements in the switch is executed.
case and default prefixes in themselves do not alter the flow of control, which continues unimpeded
across such prefixes. To exit from a switch, see break, §9.8.
Usually the statement that is the subject of a switch is compound. Declarations may appear at the
head of this statement, but initializations of automatic or register variables are ineffective.

9.8 Break statement
The statement
    break ;
causes termination of the smallest enclosing while, do, for, or switch statement; control passes to the
statement following the terminated statement.

9.9 Continue statement
The statement

continue ;
causes control to pass to the loop-continuation portion of the smallest enclosing while, do, or for
statement; that is to the end of the loop. More precisely, in each of the statements
    while ( ... ) {        do {                     for ( ... ) {
        ...                    ...                      ...
    contin: ;              contin: ;                contin: ;
    }                      } while ( ... ) ;        }
a continue is equivalent to goto contin. (Following the contin: is a null statement, §9.13.)

9.10 Return statement
A function returns to its caller by means of the return statement, which has one of the forms
    return ;
    return expression ;
In the first case the returned value is undefined. In the second case, the value of the expression is
returned to the caller of the function. If required, the expression is converted, as if by assignment, to the
type of the function in which it appears. Flowing off the end of a function is equivalent to a return with
no returned value.
9.11 Goto statement
Control may be transferred unconditionally by means of the statement
    goto identifier ;
The identifier must be a label (§9.12) located in the current function.
9.12 Labeled statement
Any statement may be preceded by label prefixes of the form
    identifier :
which serve to declare the identifier as a label. The only use of a label is as a target of a goto. The
scope of a label is the current function, excluding any sub-blocks in which the same identifier has been
redeclared. See §11.

9.13 Null statement
The null statement has the form
    ;
A null statement is useful to carry a label just before the } of a compound statement or to supply a null
body to a looping statement such as while.
10. External definitions
A C program consists of a sequence of external definitions. An external definition declares an
identifier to have storage class extern (by default) or perhaps static, and a specified type. The
type-specifier (§8.2) may also be empty, in which case the type is taken to be int. The scope of external
definitions persists to the end of the file in which they are declared just as the effect of declarations
persists to the end of a block. The syntax of external definitions is the same as that of all declarations,
except that only at this level may the code for functions be given.
10.1 External function definitions
Function definitions have the form
function-definition:
    decl-specifiers_opt function-declarator function-body
The only sc-specifiers allowed among the decl-specifiers are extern or static; see §11.2 for the
distinction between them. A function declarator is similar to a declarator for a "function returning ..." except
that it lists the formal parameters of the function being defined.
function-declarator:
    declarator ( parameter-list_opt )
parameter-list:
    identifier
    identifier , parameter-list
The function-body has the form
function-body:
    declaration-list compound-statement
The identifiers in the parameter list, and only those identifiers, may be declared in the declaration list.
Any identifiers whose type is not given are taken to be int. The only storage class which may be
specified is register; if it is specified, the corresponding actual parameter will be copied, if possible,
into a register at the outset of the function.
A simple example of a complete function definition is
    int max(a, b, c)
    int a, b, c;
    {
        int m;

        m = (a > b) ? a : b;
        return((m > c) ? m : c);
    }
Here int is the type-specifier; max(a, b, c) is the function-declarator; int a, b, c; is the
declaration-list for the formal parameters; { ... } is the block giving the code for the statement.
C converts all float actual parameters to double, so formal parameters declared float have their
declaration adjusted to read double. Also, since a reference to an array in any context (in particular as
an actual parameter) is taken to mean a pointer to the first element of the array, declarations of formal
parameters declared "array of ..." are adjusted to read "pointer to ...". Finally, because structures,
unions and functions cannot be passed to a function, it is useless to declare a formal parameter to be a
structure, union or function (pointers to such objects are of course permitted).


10.2 External data definitions
An external data definition has the form
data-definition:
    declaration
The storage class of such data may be extern (which is the default) or static, but not auto or
register.

11. Scope rules
A C program need not all be compiled at the same time: the source text of the program may be kept
in several files, and precompiled routines may be loaded from libraries. Communication among the
functions of a program may be carried out both through explicit calls and through manipulation of external
data.
Therefore, there are two kinds of scope to consider: first, what may be called the lexical scope of an
identifier, which is essentially the region of a program during which it may be used without drawing
"undefined identifier" diagnostics; and second, the scope associated with external identifiers, which is
characterized by the rule that references to the same external identifier are references to the same object.

11.1 Lexical scope
The lexical scope of identifiers declared in external definitions persists from the definition through
the end of the source file in which they appear. The lexical scope of identifiers which are formal
parameters persists through the function with which they are associated. The lexical scope of identifiers declared
at the head of blocks persists until the end of the block. The lexical scope of labels is the whole of the
function in which they appear.
Because all references to the same external identifier refer to the same object (see §11.2) the
compiler checks all declarations of the same external identifier for compatibility; in effect their scope is
increased to the whole file in which they appear.
In all cases, however, if an identifier is explicitly declared at the head of a block, including the block
constituting a function, any declaration of that identifier outside the block is suspended until the end of
the block.
Remember also (§8.5) that identifiers associated with ordinary variables on the one hand and those
associated with structure and union members and tags on the other form two disjoint classes which do
not conflict. Members and tags follow the same scope rules as other identifiers. typedef names are in
the same class as ordinary identifiers. They may be redeclared in inner blocks, but an explicit type must
be given in the inner declaration:
    typedef float distance;
    ...
    {
        auto int distance;
        ...
The int must be present in the second declaration, or it would be taken to be a declaration with no
declarators and type distance.†

11.2 Scope of externals
If a function refers to an identifier declared to be extern, then somewhere among the files or
libraries constituting the complete program there must be an external definition for the identifier. All
functions in a given program which refer to the same external identifier refer to the same object, so care
must be taken that the type and size specified in the definition are compatible with those specified by each
function which references the data.
The appearance of the extern keyword in an external definition indicates that storage for the
identifiers being declared will be allocated in another file. Thus in a multi-file program, an external data
definition without the extern specifier must appear in exactly one of the files. Any other files which
wish to give an external definition for the identifier must include the extern in the definition. The
identifier can be initialized only in the declaration where storage is allocated.
Identifiers declared static at the top level in external definitions are not visible in other files.
Functions may be declared static.

† It is agreed that the ice is thin here.


12. Compiler control lines
The C compiler contains a preprocessor capable of macro substitution, conditional compilation, and
inclusion of named files. Lines beginning with # communicate with this preprocessor. These lines have
syntax independent of the rest of the language; they may appear anywhere and have effect which lasts
(independent of scope) until the end of the source program file.
12.1 Token replacement
A compiler-control line of the form

#define identifier token-string
(note: no trailing semicolon) causes the preprocessor to replace subsequent instances of the identifier with
the given string of tokens. A line of the form
    #define identifier( identifier , ... , identifier ) token-string
where there is no space between the first identifier and the (, is a macro definition with arguments.
Subsequent instances of the first identifier followed by a (, a sequence of tokens delimited by commas, and a
) are replaced by the token string in the definition. Each occurrence of an identifier mentioned in the
formal parameter list of the definition is replaced by the corresponding token string from the call. The
actual arguments in the call are token strings separated by commas; however commas in quoted strings or
protected by parentheses do not separate arguments. The number of formal and actual parameters must
be the same. Text inside a string or a character constant is not subject to replacement.
In both forms the replacement string is rescanned for more defined identifiers. In both forms a long
definition may be continued on another line by writing \ at the end of the line to be continued.
This facility is most valuable for definition of "manifest constants," as in

#define TABSIZE 100
int table[TABSIZE];
A control line of the form

#undef identifier
causes the identifier's preprocessor definition to be forgotten.
12.2 File inclusion
A compiler control line of the form

#include "filename"
causes the replacement of that line by the entire contents of the file filename. The named file is searched
for first in the directory of the original source file, and then in a sequence of standard places.
Alternatively, a control line of the form
    #include <filename>
searches only the standard places, and not the directory of the source file.
#include's may be nested.
12.3 Conditional compilation
A compiler control line of the form
#if constant-expression

checks whether the constant expression (see §15) evaluates to non-zero. A control line of the form
    #ifdef identifier
checks whether the identifier is currently defined in the preprocessor; that is, whether it has been the
subject of a #define control line. A control line of the form
    #ifndef identifier
checks whether the identifier is currently undefined in the preprocessor.
All three forms are followed by an arbitrary number of lines, possibly containing a control line


    #else
and then by a control line
    #endif
If the checked condition is true then any lines between #else and #endif are ignored. If the checked
condition is false then any lines between the test and an #else or, lacking an #else, the #endif, are
ignored.
These constructions may be nested.
12.4 Line control
For the benefit of other preprocessors which generate C programs, a line of the form
    #line constant identifier
causes the compiler to believe, for purposes of error diagnostics, that the line number of the next source
line is given by the constant and the current input file is named by the identifier. If the identifier is
absent the remembered file name does not change.
13. Implicit declarations
It is not always necessary to specify both the storage class and the type of identifiers in a declaration.
The storage class is supplied by the context in external definitions and in declarations of formal
parameters and structure members. In a declaration inside a function, if a storage class but no type is given, the
identifier is assumed to be int; if a type but no storage class is indicated, the identifier is assumed to be
auto. An exception to the latter rule is made for functions, since auto functions are meaningless (C
being incapable of compiling code into the stack); if the type of an identifier is "function returning ...", it
is implicitly declared to be extern.
In an expression, an identifier followed by ( and not already declared is contextually declared to be
"function returning int".

14. Types revisited
This section summarizes the operations which can be performed on objects of certain types.

14.1 Structures and unions
There are only two things that can be done with a structure or union: name one of its members (by
means of the . operator); or take its address (by unary &). Other operations, such as assigning from or
to it or passing it as a parameter, draw an error message. In the future, it is expected that these
operations, but not necessarily others, will be allowed.
§7.1 says that in a direct or indirect structure reference (with . or ->) the name on the right must
be a member of the structure named or pointed to by the expression on the left. To allow an escape
from the typing rules, this restriction is not firmly enforced by the compiler. In fact, any lvalue is allowed
before ., and that lvalue is then assumed to have the form of the structure of which the name on the
right is a member. Also, the expression before a -> is required only to be a pointer or an integer. If a
pointer, it is assumed to point to a structure of which the name on the right is a member. If an integer,
it is taken to be the absolute address, in machine storage units, of the appropriate structure.
Such constructions are non-portable.

14.2 Functions
There are only two things that can be done with a function: call it, or take its address. If the name
of a function appears in an expression not in the function-name position of a call, a pointer to the
function is generated. Thus, to pass one function to another, one might say
    int f();
    ...
    g(f);
Then the definition of g might read


    g(funcp)
    int (*funcp)();
    {
        ...
        (*funcp)();
        ...
    }
Notice that f must be declared explicitly in the calling routine since its appearance in g(f) was not
followed by (.
14.3 Arrays, pointers, and subscripting
Every time an identifier of array type appears in an expression, it is converted into a pointer to the
first member of the array. Because of this conversion, arrays are not lvalues. By definition, the subscript
operator [] is interpreted in such a way that E1[E2] is identical to *((E1)+(E2)). Because of the
conversion rules which apply to +, if E1 is an array and E2 an integer, then E1[E2] refers to the E2-th
member of E1. Therefore, despite its asymmetric appearance, subscripting is a commutative operation.
A consistent rule is followed in the case of multi-dimensional arrays. If E is an n-dimensional array
of rank i×j×...×k, then E appearing in an expression is converted to a pointer to an (n-1)-dimensional
array with rank j×...×k. If the * operator, either explicitly or implicitly as a result of
subscripting, is applied to this pointer, the result is the pointed-to (n-1)-dimensional array, which itself
is immediately converted into a pointer.
For example, consider
    int x[3][5];
Here x is a 3×5 array of integers. When x appears in an expression, it is converted to a pointer to (the
first of three) 5-membered arrays of integers. In the expression x[i], which is equivalent to *(x+i), x
is first converted to a pointer as described; then i is converted to the type of x, which involves
multiplying i by the length of the object to which the pointer points, namely 5 integer objects. The results are
added and indirection applied to yield an array (of 5 integers) which in turn is converted to a pointer to
the first of the integers. If there is another subscript the same argument applies again; this time the
result is an integer.
It follows from all this that arrays in C are stored row-wise (last subscript varies fastest) and that the
first subscript in the declaration helps determine the amount of storage consumed by an array but plays
no other part in subscript calculations.
14.4 Explicit pointer conversions
Certain conversions involving pointers are permitted but have implementation-dependent aspects.
They are all specified by means of an explicit type-conversion operator, §§7.2 and 8.7.
A pointer may be converted to any of the integral types large enough to hold it. Whether an int or
long is required is machine dependent. The mapping function is also machine dependent, but is
intended to be unsurprising to those who know the addressing structure of the machine. Details for
some particular machines are given below.
An object of integral type may be explicitly converted to a pointer. The mapping always carries an
integer converted from a pointer back to the same pointer, but is otherwise machine dependent.
A pointer to one type may be converted to a pointer to another type. The resulting pointer may
cause addressing exceptions upon use if the subject pointer does not refer to an object suitably aligned in
storage. It is guaranteed that a pointer to an object of a given size may be converted to a pointer to an
object of a smaller size and back again without change.
For example, a storage-allocation routine might accept a size (in bytes) of an object to allocate, and
return a char pointer; it might be used in this way.
    extern char *alloc();
    double *dp;

    dp = (double *) alloc(sizeof(double));
    *dp = 22.0 / 7.0;
alloc must ensure (in a machine-dependent way) that its return value is suitable for conversion to a
pointer to double; then the use of the function is portable.

The pointer representation on the PDP-11 corresponds to a 16-bit integer and is measured in bytes.
chars have no alignment requirements; everything else must have an even address.
On the Honeywell 6000, a pointer corresponds to a 36-bit integer; the word part is in the left 18 bits,
and the two bits that select the character in a word just to their right. Thus char pointers are measured
in units of 2^16 bytes; everything else is measured in units of 2^18 machine words. double quantities and
aggregates containing them must lie on an even word address (0 mod 2^19).
The IBM 370 and the Interdata 8/32 are similar. On both, addresses are measured in bytes;
elementary objects must be aligned on a boundary equal to their length, so pointers to short must be 0 mod 2,
to int and float 0 mod 4, and to double 0 mod 8. Aggregates are aligned on the strictest boundary
required by any of their constituents.
15. Constant expressions
In several places C requires expressions which evaluate to a constant: after case, as array bounds,
and in initializers. In the first two cases, the expression can involve only integer constants, character
constants, and sizeof expressions, possibly connected by the binary operators
    +  -  *  /  %  &  |  ^  <<  >>  ==  !=  <  >  <=  >=
or by the unary operators
    -  ~
or by the ternary operator
    ?:
Parentheses can be used for grouping, but not for function calls.
More latitude is permitted for initializers; besides constant expressions as discussed above, one can
also apply the unary & operator to external or static objects, and to external or static arrays subscripted
with a constant expression. The unary & can also be applied implicitly by appearance of unsubscripted
arrays and functions. The basic rule is that initializers must evaluate either to a constant or to the
address of a previously declared external or static object plus or minus a constant.
16. Portability considerations
Certain parts of C are inherently machine dependent. The following list of potential trouble spots is
not meant to be all-inclusive, but to point out the main ones.
Purely hardware issues like word size and the properties of floating point arithmetic and integer
division have proven in practice to be not much of a problem. Other facets of the hardware are reflected in
differing implementations. Some of these, particularly sign extension (converting a negative character
into a negative integer) and the order in which bytes are placed in a word, are a nuisance that must be
carefully watched. Most of the others are only minor problems.
The number of register variables that can actually be placed in registers varies from machine to
machine, as does the set of valid types. Nonetheless, the compilers all do things properly for their own
machine; excess or invalid register declarations are ignored.
Some difficulties arise only when dubious coding practices are used. It is exceedingly unwise to write
programs that depend on any of these properties.
The order of evaluation of function arguments is not specified by the language. It is right to left on
the PDP-11, and VAX-11, left to right on the others. The order in which side effects take place is also
unspecified.
Since character constants are really objects of type int, multi-character character constants may be
permitted. The specific implementation is very machine dependent, however, because the order in which
characters are assigned to a word varies from one machine to another.
Fields are assigned to words and characters to integers right-to-left on the PDP-11 and VAX-11 and
left-to-right on other machines. These differences are invisible to isolated programs which do not indulge
in type punning (for example, by converting an int pointer to a char pointer and inspecting the
pointed-to storage), but must be accounted for when conforming to externally-imposed storage layouts.
The language accepted by the various compilers differs in minor details. Most notably, the current
PDP-11 compiler will not initialize structures containing bit-fields, and does not accept a few assignment
operators in certain contexts where the value of the assignment is used.


17. Anachronisms
Since C is an evolving language, certain obsolete constructions may be found in older programs.
Although most versions of the compiler support such anachronisms, ultimately they will disappear,
leaving only a portability problem behind.
Earlier versions of C used the form =op instead of op= for assignment operators. This leads to
ambiguities, typified by
    x=-1
which actually decrements x since the = and the - are adjacent, but which might easily be intended to
assign -1 to x.
The syntax of initializers has changed: previously, the equals sign that introduces an initializer was
not present, so instead of
    int x = 1;
one used
    int x 1;
The change was made because the initialization
    int f (1+2)
resembles a function declaration closely enough to confuse the compilers.

18. Syntax Summary
This summary of C syntax is intended more for aiding comprehension than as an exact statement of
the language.
18.1 Expressions
The basic expressions are:
expression:
    primary
    * expression
    & expression
    - expression
    ! expression
    ~ expression
    ++ lvalue
    -- lvalue
    lvalue ++
    lvalue --
    sizeof expression
    ( type-name ) expression
    expression binop expression
    expression ? expression : expression
    lvalue asgnop expression
    expression , expression
primary:
    identifier
    constant
    string
    ( expression )
    primary ( expression-list_opt )
    primary [ expression ]
    lvalue . identifier
    primary -> identifier
lvalue:
    identifier
    primary [ expression ]
    lvalue . identifier
    primary -> identifier
    * expression
    ( lvalue )

The primary-expression operators
    ()  []  .  ->
have highest priority and group left-to-right. The unary operators
    *  &  -  !  ~  ++  --  sizeof  ( type-name )
have priority below the primary operators but higher than any binary operator, and group right-to-left.
Binary operators group left-to-right; they have priority decreasing as indicated below. The conditional
operator groups right to left.


binop:
    *  /  %
    +  -
    >>  <<
    <  >  <=  >=
    ==  !=
    &
    ^
    |
    &&
    ||
    ?:

Assignment operators all have the same priority, and all group right-to-left.
asgnop:
    =  +=  -=  *=  /=  %=  >>=  <<=  &=  ^=  |=

The comma operator has the lowest priority, and groups left-to-right.

18.2 Declarations
declaration:
    decl-specifiers init-declarator-list_opt ;
decl-specifiers:
    type-specifier decl-specifiers_opt
    sc-specifier decl-specifiers_opt
sc-specifier:
    auto
    static
    extern
    register
    typedef
type-specifier:
    char
    short
    int
    long
    unsigned
    float
    double
    struct-or-union-specifier
    typedef-name
init-declarator-list:
    init-declarator
    init-declarator , init-declarator-list
init-declarator:
    declarator initializer_opt
declarator:
    identifier
    ( declarator )
    * declarator
    declarator ()
    declarator [ constant-expression_opt ]

struct-or-union-specifier:
    struct { struct-decl-list }
    struct identifier { struct-decl-list }
    struct identifier
    union { struct-decl-list }
    union identifier { struct-decl-list }
    union identifier
struct-decl-list:
    struct-declaration
    struct-declaration struct-decl-list
struct-declaration:
    type-specifier struct-declarator-list ;
struct-declarator-list:
    struct-declarator
    struct-declarator , struct-declarator-list
struct-declarator:
    declarator
    declarator : constant-expression
    : constant-expression
initializer:
    = expression
    = { initializer-list }
    = { initializer-list , }
initializer-list:
    expression
    initializer-list , initializer-list
    { initializer-list }
type-name:
    type-specifier abstract-declarator
abstract-declarator:
    empty
    ( abstract-declarator )
    * abstract-declarator
    abstract-declarator ()
    abstract-declarator [ constant-expression_opt ]
typedef-name:
    identifier

18.3 Statements
compound-statement:
    { declaration-list_opt statement-list_opt }
declaration-list:
    declaration
    declaration declaration-list

statement-list:
    statement
    statement statement-list
statement:
    compound-statement
    expression ;
    if ( expression ) statement
    if ( expression ) statement else statement
    while ( expression ) statement
    do statement while ( expression ) ;
    for ( expression-1_opt ; expression-2_opt ; expression-3_opt ) statement
    switch ( expression ) statement
    case constant-expression : statement
    default : statement
    break ;
    continue ;
    return ;
    return expression ;
    goto identifier ;
    identifier : statement

18.4 External definitions
program:
    external-definition
    external-definition program
external-definition:
    function-definition
    data-definition
function-definition:
    type-specifier_opt function-declarator function-body
function-declarator:
    declarator ( parameter-list_opt )
parameter-list:
    identifier
    identifier , parameter-list
function-body:
    type-decl-list function-statement
function-statement:
    { declaration-list_opt statement-list }
data-definition:
    extern_opt type-specifier_opt init-declarator-list_opt ;
    static_opt type-specifier_opt init-declarator-list_opt ;

18.5 Preprocessor
    #define identifier token-string
    #define identifier( identifier , ... , identifier ) token-string
    #undef identifier
    #include "filename"
    #include <filename>
    #if constant-expression
    #ifdef identifier
    #ifndef identifier
    #else
    #endif
    #line constant identifier


Recent Changes to C
November 15, 1978
A few extensions have been made to the C language beyond what is described in the reference document ("The C Programming Language," Kernighan and Ritchie, Prentice-Hall, 1978).

1. Structure assignment
Structures may be assigned, passed as arguments to functions, and returned by functions. The types
of operands taking part must be the same. Other plausible operators, such as equality comparison, have
not been implemented.
There is a subtle defect in the PDP-11 implementation of functions that return structures: if an interrupt occurs during the return sequence, and the same function is called reentrantly during the interrupt,
the value returned from the first call may be corrupted. The problem can occur only in the presence of
true interrupts, as in an operating system or a user program that makes significant use of signals; ordinary
recursive calls are quite safe.
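
A minimal sketch of the three operations follows. The structure point and the functions shifted and check_struct_assignment are hypothetical names invented for this example.

```c
#include <assert.h>

struct point { int x, y; };

/* Takes a structure by value and returns a structure by value. */
struct point shifted(struct point p, int dx, int dy)
{
    p.x += dx;                  /* changes only the local copy */
    p.y += dy;
    return p;
}

/* Hypothetical self-check for this sketch. */
void check_struct_assignment(void)
{
    struct point a = { 1, 2 };
    struct point b;

    b = a;                      /* whole-structure assignment */
    b = shifted(b, 10, 20);     /* passed and returned by value */

    assert(a.x == 1 && a.y == 2);       /* the original is untouched */
    assert(b.x == 11 && b.y == 22);
    /* There is no "a == b"; members must be compared one by one. */
}
```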

2. Enumeration type
There is a new data type analogous to the scalar types of Pascal. To the type-specifiers in the syntax
on p. 193 of the C book add

    enum-specifier

with syntax

enum-specifier:
    enum { enum-list }
    enum identifier { enum-list }
    enum identifier
enum-list:
    enumerator
    enum-list , enumerator
enumerator:
    identifier
    identifier = constant-expression

The role of the identifier in the enum-specifier is entirely analogous to that of the structure tag in a
struct-specifier: it names a particular enumeration. For example,

    enum color { chartreuse, burgundy, claret, winedark };
    enum color *cp, col;

makes color the enumeration-tag of a type describing various colors, and then declares cp as a pointer
to an object of that type, and col as an object of that type.
The identifiers in the enum-list are declared as constants, and may appear wherever constants are
required. If no enumerators with = appear, then the values of the constants begin at 0 and increase by 1
as the declaration is read from left to right. An enumerator with = gives the associated identifier the
value indicated; subsequent identifiers continue the progression from the assigned value.
Enumeration tags and constants must all be distinct, and, unlike structure tags and members, are
drawn from the same set as ordinary identifiers.
Objects of a given enumeration type are regarded as having a type distinct from objects of all other
types, and lint flags type mismatches. In the PDP-11 implementation all enumeration variables are treated
as if they were int.
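
The numbering rule can be checked with a short sketch. The color enumeration is the one from the text; the shade enumeration and the function check_enum_values are hypothetical additions for illustration.

```c
#include <assert.h>

enum color { chartreuse, burgundy, claret, winedark };

/* Hypothetical second enumeration showing explicit values. */
enum shade { pale = 10, medium, dark = 20, darkest };

void check_enum_values(void)
{
    enum color col = claret;
    enum color *cp = &col;

    /* Without "=", values start at 0 and increase by 1. */
    assert(chartreuse == 0 && burgundy == 1);
    assert(*cp == 2 && winedark == 3);

    /* After "=", the progression continues from the assigned value. */
    assert(medium == 11 && darkest == 21);
}
```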


A Tour Through the Portable C Compiler
S. C. Johnson
Bell Laboratories
Murray Hill, New Jersey 07974

Introduction
A C compiler has been implemented that has proved to be quite portable, serving as the
basis for C compilers on roughly a dozen machines, including the Honeywell 6000, IBM 370,
and Interdata 8/32. The compiler is highly compatible with the C language standard. 1
Among the goals of this compiler are portability, high reliability, and the use of state-of-the-art techniques and tools wherever practical. Although the efficiency of the compiling process is not a primary goal, the compiler is efficient enough, and produces good enough code, to
serve as a production compiler.
The language implemented is highly compatible with the current PDP-11 version of C.
Moreover, roughly 75% of the compiler, including nearly all the syntactic and semantic routines, is machine independent. The compiler also serves as the major portion of the program
lint, described elsewhere. 2
A number of earlier attempts to make portable compilers are worth noting. While on
CO-OP assignment to Bell Labs in 1973, Alan Snyder wrote a portable C compiler which was
the basis of his Master's Thesis at M.I.T. 3 This compiler was very slow and complicated, and
contained a number of rather serious implementation difficulties; nevertheless, a number of
Snyder's ideas appear in this work.
Most earlier portable compilers, including Snyder's, have proceeded by defining an intermediate language, perhaps based on three-address code or code for a stack machine, and writing a machine independent program to translate from the source code to this intermediate
code. The intermediate code is then read by a second pass, and interpreted or compiled. This
approach is elegant, and has a number of advantages, especially if the target machine is far
removed from the host. It suffers from some disadvantages as well. Some constructions, like
initialization and subroutine prologs, are difficult or expensive to express in a machine
independent way that still allows them to be easily adapted to the target assemblers. Most of
these approaches require a symbol table to be constructed in the second (machine dependent)
pass, and/or require powerful target assemblers. Also, many conversion operators may be generated that have no effect on a given machine, but may be needed on others (for example,
pointer to pointer conversions usually do nothing in C, but must be generated because there
are some machines where they are significant).
For these reasons, the first pass of the portable compiler is not entirely machine
independent. It contains some machine dependent features, such as initialization, subroutine
prolog and epilog, certain storage allocation functions, code for the switch statement, and code
to throw out unneeded conversion operators.
As a crude measure of the degree of portability actually achieved, the Interdata 8/32 C
compiler has roughly 600 machine dependent lines of source out of 4600 in Pass 1, and 1000
out of 3400 in Pass 2. In total, 1600 out of 8000, or 20%, of the total source is machine
dependent (12% in Pass 1, 30% in Pass 2). These percentages can be expected to rise slightly
as the compiler is tuned. The percentage of machine-dependent code for the IBM is 22%, for
the Honeywell 25%. If the assembler format and structure were the same for all these
machines, perhaps another 5-10% of the code would become machine independent.
These figures are sufficiently misleading as to be almost meaningless. A large fraction of
the machine dependent code can be converted in a straightforward, almost mechanical way.
On the other hand, a certain amount of the code requires hard intellectual effort to convert,
since the algorithms embodied in this part of the code are typically complicated and machine
dependent.
To summarize, however, if you need a C compiler written for a machine with a reasonable architecture, the compiler is already three quarters finished!
Overview
This paper discusses the structure and organization of the portable compiler. The intent
is to give the big picture, rather than discussing the details of a particular machine implementation. After a brief overview and a discussion of the source file structure, the paper describes
the major data structures, and then delves more closely into the two passes. Some of the
theoretical work on which the compiler is based, and its application to the compiler, is discussed elsewhere. 4 One of the major design issues in any C compiler, the design of the calling
sequence and stack frame, is the subject of a separate memorandum. 5
The compiler consists of two passes, pass1 and pass2, that together turn C source code
into assembler code for the target machine. The two passes are preceded by a preprocessor,
that handles the #define and #include statements, and related features (e.g., #ifdef, etc.).
It is a nearly machine independent program, and will not be further discussed here.
The output of the preprocessor is a text file that is read as the standard input of the first
pass. This produces as standard output another text file that becomes the standard input of
the second pass. The second pass produces, as standard output, the desired assembler
language source code. The preprocessor and the two passes all write error messages on the
standard error file. Thus the compiler itself makes few demands on the I/O library support,
aiding in the bootstrapping process.
Although the compiler is divided into two passes, this represents historical accident more
than deep necessity. In fact, the compiler can optionally be loaded so that both passes
operate in the same program. This "one pass" operation eliminates the overhead of reading
and writing the intermediate file, so the compiler operates about 30% faster in this mode. It
also occupies about 30% more space than the larger of the two component passes.
Because the compiler is fundamentally structured as two passes, even when loaded as
one, this document primarily describes the two pass version.
The first pass does the lexical analysis, parsing, and symbol table maintenance. It also
constructs parse trees for expressions, and keeps track of the types of the nodes in these trees.
Additional code is devoted to initialization. Machine dependent portions of the first pass
serve to generate subroutine prologs and epilogs, code for switches, and code for branches,
label definitions, alignment operations, changes of location counter, etc.
The intermediate file is a text file organized into lines. Lines beginning with a right
parenthesis are copied by the second pass directly to its output file, with the parenthesis
stripped off. Thus, when the first pass produces assembly code, such as subroutine prologs,
etc., each line is prefaced with a right parenthesis; the second pass passes these lines
through to the assembler.
The major job done by the second pass is generation of code for expressions. The
expression parse trees produced in the first pass are written onto the intermediate file in Polish Prefix form: first, there is a line beginning with a period, followed by the source file line
number and name on which the expression appeared (for debugging purposes). The successive
lines represent the nodes of the parse tree, one node per line. Each line contains the node
number, type, and any values (e.g., values of constants) that may appear in the node. Lines
representing nodes with descendants are immediately followed by the left subtree of descendants, then the right. Since the number of descendants of any node is completely determined
by the node number, there is no need to mark the end of the tree.
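
Why no end marker is needed can be seen in a small sketch of a prefix-form reader. This is not the compiler's reader: the node numbers, the integer stream standing in for the intermediate file, and the functions arity and skip_tree are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical node numbers, for illustration only. */
enum { ICON = 1, NAME, UMINUS, PLUS, MUL };

/* Number of descendants implied by a node number. */
static int arity(int op)
{
    switch (op) {
    case PLUS: case MUL: return 2;      /* binary operators */
    case UMINUS:         return 1;      /* unary operator */
    default:             return 0;      /* leaves */
    }
}

/* Consume one complete tree starting at stream[pos] and return the
   index just past it.  The recursion stops by itself once every
   operator has received its descendants. */
int skip_tree(const int *stream, int pos)
{
    int i, n = arity(stream[pos++]);

    for (i = 0; i < n; i++)
        pos = skip_tree(stream, pos);   /* left subtree, then right */
    return pos;
}

/* Hypothetical self-check for this sketch. */
void check_prefix(void)
{
    /* a + (-b) * 3 in prefix form, followed by the start of a new tree */
    static const int stream[] = { PLUS, NAME, MUL, UMINUS, NAME, ICON, NAME };

    assert(skip_tree(stream, 0) == 6);
    assert(skip_tree(stream, 6) == 7);
}
```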
There are only two other line types in the intermediate file. Lines beginning with a left
square bracket ('[') represent the beginning of blocks (delimited by { ... } in the C source);
lines beginning with right square brackets (']') represent the end of blocks. The remainder of
these lines tell how much stack space, and how many register variables, are currently in use.
Thus, the second pass reads the intermediate files, copies the ')' lines, makes note of the
information in the '[' and ']' lines, and devotes most of its effort to the '.' lines and their associated expression trees, turning them into assembly code to evaluate the expressions.
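
The dispatch on the first character of each intermediate-file line might be sketched as follows. The enumeration and the functions classify and assembler_text are hypothetical; the actual reader is organized differently and also parses the rest of each line.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for the kinds of line in the intermediate file. */
enum linekind { COPY_LINE, BLOCK_BEGIN, BLOCK_END, EXPR_START, TREE_NODE };

enum linekind classify(const char *line)
{
    switch (line[0]) {
    case ')': return COPY_LINE;     /* assembly text from the first pass */
    case '[': return BLOCK_BEGIN;   /* stack and register usage follows */
    case ']': return BLOCK_END;
    case '.': return EXPR_START;    /* line number and file name follow */
    default:  return TREE_NODE;     /* one node of an expression tree */
    }
}

/* For a ')' line, the text handed on to the assembler:
   the same line with the parenthesis stripped off. */
const char *assembler_text(const char *line)
{
    return line + 1;
}

/* Hypothetical self-check for this sketch. */
void check_lines(void)
{
    assert(classify(")\tjsr pc,_f") == COPY_LINE);
    assert(strcmp(assembler_text(")\tjsr pc,_f"), "\tjsr pc,_f") == 0);
    assert(classify(". 12 prog.c") == EXPR_START);
}
```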
In the one pass version of the compiler, the expression trees that are built by the first
pass have been declared to have room for the second pass information as well. Instead of
writing the trees onto an intermediate file, each tree is transformed in place into an acceptable
form for the code generator. The code generator then writes the result of compiling this tree
onto the standard output. Instead of '[' and ']' lines in the intermediate file, the information
is passed directly to the second pass routines. Assembly code produced by the first pass is
simply written out, without the need for ')' at the head of each line.

The Source Files
The compiler source consists of 22 source files. Two files, manifest and macdefs, are
header files included with all other files. Manifest has declarations for the node numbers,
types, storage classes, and other global data definitions. Macdefs has machine-dependent
definitions, such as the size and alignment of the various data representations. Two machine
independent header files, mfile1 and mfile2, contain the data structure and manifest
definitions for the first and second passes, respectively. In the second pass, a machine dependent header file, mac2defs, contains declarations of register names, etc.
There is a file, common, containing (machine independent) routines used in both passes.
These include routines for allocating and freeing trees, walking over trees, printing debugging
information, and printing error messages. There are two dummy files, comm1.c and comm2.c,
that simply include common within the scope of the appropriate pass1 or pass2 header files.
When the compiler is loaded as a single pass, common only needs to be included once:
comm2.c is not needed.
Entire sections of this document are devoted to the detailed structure of the passes. For
the moment, we just give a brief description of the files. The first pass is obtained by compiling and loading scan.c, cgram.c, xdefs.c, pftn.c, trees.c, optim.c, local.c, code.c, and
comm1.c. Scan.c is the lexical analyzer, which is used by cgram.c, the result of applying
Yacc 6 to the input grammar cgram.y. Xdefs.c is a short file of external definitions. Pftn.c
maintains the symbol table, and does initialization. Trees.c builds the expression trees, and
computes the node types. Optim.c does some machine independent optimizations on the
expression trees. Comm1.c includes common, that contains service routines common to the
two passes of the compiler. All the above files are machine independent. The files local.c and
code.c contain machine dependent code for generating subroutine prologs, switch code, and
the like.
The second pass is produced by compiling and loading reader.c, allo.c, match.c,
comm2.c, order.c, local2.c, and table.c. Reader.c reads the intermediate file, and controls the
major logic of the code generation. Allo.c keeps track of busy and free registers. Match.c
controls the matching of code templates to subtrees of the expression tree to be compiled.
Comm2.c includes the file common, as in the first pass. The above files are machine independent. Order.c controls the machine dependent details of the code generation strategy.
Local2.c has many small machine dependent routines, and tables of opcodes, register types,
etc. Table.c has the code template tables, which are also clearly machine dependent.


Data Structure Considerations.
This section discusses the node numbers, type words, and expression trees, used
throughout both passes of the compiler.
The file manifest defines those symbols used throughout both passes. The intent is to
use the same symbol name (e.g., MINUS) for the given operator throughout the lexical
analysis, parsing, tree building, and code generation phases; this requires some synchronization with the Yacc input file, cgram.y, as well.
A token like MINUS may be seen in the lexical analyzer before it is known whether it is
a unary or binary operator; clearly, it is necessary to know this by the time the parse tree is
constructed. Thus, an operator (really a macro) called UNARY is provided, so that MINUS
and UNARY MINUS are both distinct node numbers. Similarly, many binary operators exist
in an assignment form (for example, -= ), and the operator ASG may be applied to such node
names to generate new ones, e.g. ASG MINUS.
It is frequently desirable to know if a node represents a leaf (no descendants), a unary
operator (one descendant) or a binary operator (two descendants). The macro optype(o)
returns one of the manifest constants LTYPE, UTYPE, or BITYPE, respectively, depending
on the node number o. Similarly, asgop(o) returns true if o is an assignment operator
number (=, +=, etc.), and logop(o) returns true if o is a relational or logical (&&, ||, or !)
operator.
C has a rich typing structure, with a potentially infinite number of types. To begin with,
there are the basic types: CHAR, SHORT, INT, LONG, the unsigned versions known as
UCHAR, USHORT, UNSIGNED, ULONG, and FLOAT, DOUBLE, and finally STRTY (a
structure), UNIONTY, and ENUMTY. Then, there are three operators that can be applied to
types to make others: if t is a type, we may potentially have types pointer to t, function
returning t, and array of t's generated from t. Thus, an arbitrary type in C consists of a
basic type, and zero or more of these operators.
In the compiler, a type is represented by an unsigned integer; the rightmost four bits
hold the basic type, and the remaining bits are divided into two-bit fields, containing 0 (no
operator), or one of the three operators described above. The modifiers are read right to left
in the word, starting with the two-bit field adjacent to the basic type, until a field with 0 in it
is reached. The macros PTR, FTN, and ARY represent the pointer to, function returning,
and array of operators. The macro values are shifted so that they align with the first two-bit
field; thus PTR+ INT represents the type for an integer pointer, and
ARY+ (PTR<<2) + (FTN<<4) +DOUBLE
represents the type of an array of pointers to functions returning doubles.
The type words are ordinarily manipulated by macros. If t is a type word, BTYPE(t)
gives the basic type. ISPTR(t), ISARY(t), and ISFTN(t) ask if an object of this type is a
pointer, array, or a function, respectively. MODTYPE(t,b) sets the basic type of t to b.
DECREF(t) gives the type resulting from removing the first operator from t. Thus, if t is a
pointer to t', a function returning t', or an array of t', then DECREF(t) would equal t'.
INCREF(t) gives the type representing a pointer to t. Finally, there are operators for dealing
with the unsigned types: ISUNSIGNED(t) returns true if t is one of the four basic unsigned
types; in this case, DEUNSIGN(t) gives the associated 'signed' type. Similarly,
UNSIGNABLE(t) returns true if t is one of the four basic types that could become unsigned,
and ENUNSIGN(t) returns the unsigned analogue of t in this case.
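
The type-word layout and a few of the macros can be sketched as below. The macro names are those used in the text, but the numeric values of the masks and of the basic types are assumptions chosen to fit the description (four bits of basic type, then two-bit modifier fields); the values in manifest may differ.

```c
#include <assert.h>

#define BTMASK  017     /* assumed: low four bits hold the basic type */
#define TSHIFT  2       /* width of one modifier field */
#define TMASK   060     /* assumed: the first modifier field */
#define PTR     020     /* assumed modifier values, already aligned */
#define FTN     040
#define ARY     060

enum { CHAR = 2, INT = 4, DOUBLE = 7 };     /* illustrative basic types */

#define BTYPE(t)   ((t) & BTMASK)
#define ISPTR(t)   (((t) & TMASK) == PTR)
#define ISFTN(t)   (((t) & TMASK) == FTN)
#define ISARY(t)   (((t) & TMASK) == ARY)

/* Push a "pointer to" modifier in front of the existing ones. */
#define INCREF(t)  (((((t) & ~BTMASK) << TSHIFT) | PTR) | ((t) & BTMASK))

/* Remove the first modifier, keeping the basic type. */
#define DECREF(t)  ((((t) >> TSHIFT) & ~BTMASK) | ((t) & BTMASK))

/* Hypothetical self-check for this sketch. */
void check_types(void)
{
    /* array of pointers to functions returning double */
    unsigned t = ARY + (PTR << 2) + (FTN << 4) + DOUBLE;

    assert(ISARY(t) && BTYPE(t) == DOUBLE);
    t = DECREF(t);                      /* pointer to function ... */
    assert(ISPTR(t));
    t = DECREF(t);                      /* function returning double */
    assert(ISFTN(t));
    assert(DECREF(t) == DOUBLE);

    assert(INCREF(INT) == PTR + INT);
    assert(DECREF(INCREF(PTR + CHAR)) == PTR + CHAR);
}
```

Because the modifiers are read starting next to the basic type, adding or removing the outermost operator is a shift of the upper bits, which is what makes INCREF and DECREF cheap.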
The other important global data structure is that of expression trees. The actual shapes
of the nodes are given in mfile1 and mfile2. They are not the same in the two passes; the
first pass nodes contain dimension and size information, while the second pass nodes contain
register allocation information. Nevertheless, all nodes contain fields called op, containing the
node number, and type, containing the type word. A function called talloc() returns a
pointer to a new tree node. To free a node, its op field need merely be set to FREE. The
other fields in the node will remain intact at least until the next allocation.
Nodes representing binary operators contain fields, left and right, that contain pointers
to the left and right descendants. Unary operator nodes have the left field, and a value field
called rval. Leaf nodes, with no descendants, have two value fields: lval and rval.
At appropriate times, the function tcheck() can be called, to check that there are no
busy nodes remaining. This is used as a compiler consistency check. The function tcopy(p)
takes a pointer p that points to an expression tree, and returns a pointer to a disjoint copy of
the tree. The function walkf(p,f) performs a postorder walk of the tree pointed to by p, and
applies the function f to each node. The function fwalk(p,f,d) does a preorder walk of the
tree pointed to by p. At each node, it calls a function f, passing to it the node pointer, a
value passed down from its ancestor, and two pointers to values to be passed down to the left
and right descendants (if any). The value d is the value passed down to the root. Fwalk is
used for a number of tree labeling and debugging activities.
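
A postorder walk in the style of walkf can be sketched as follows. The node layout is reduced to the op, left, and right fields, the node numbers are hypothetical, and optype is written as a function here rather than as the table-driven macro the compiler uses.

```c
#include <assert.h>
#include <stddef.h>

enum { LTYPE, UTYPE, BITYPE };          /* leaf, unary, binary */
enum { NAME = 1, UMINUS, PLUS };        /* hypothetical node numbers */

typedef struct node {
    int op;                             /* node number */
    struct node *left, *right;
} NODE;

static int optype(int op)
{
    return op == PLUS ? BITYPE : op == UMINUS ? UTYPE : LTYPE;
}

/* Postorder walk: visit the descendants first, then the node itself. */
void walkf(NODE *p, void (*f)(NODE *))
{
    int ty = optype(p->op);

    if (ty != LTYPE)
        walkf(p->left, f);
    if (ty == BITYPE)
        walkf(p->right, f);
    (*f)(p);
}

static int visited[8], nvisited;

static void record(NODE *p)
{
    visited[nvisited++] = p->op;
}

/* Hypothetical self-check: walk the tree for a + (-b). */
void check_walk(void)
{
    static NODE a = { NAME, NULL, NULL }, b = { NAME, NULL, NULL };
    static NODE neg = { UMINUS, &b, NULL };
    static NODE sum = { PLUS, &a, &neg };

    nvisited = 0;
    walkf(&sum, record);
    assert(nvisited == 4);
    assert(visited[0] == NAME && visited[1] == NAME);
    assert(visited[2] == UMINUS && visited[3] == PLUS);
}
```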
The other major data structure, the symbol table, exists only in pass one, and will be
discussed later.

Pass One
The first pass does lexical analysis, parsing, symbol table maintenance, tree building,
optimization, and a number of machine dependent things. This pass is largely machine
independent, and the machine independent sections can be pretty successfully ignored. Thus,
they will be only sketched here.
Lexical Analysis
The lexical analyzer is a conceptually simple routine that reads the input and returns
the tokens of the C language as it encounters them: names, constants, operators, and keywords. The conceptual simplicity of this job is confounded a bit by several other simple jobs
that unfortunately must go on simultaneously. These include
• Keeping track of the current filename and line number, and occasionally setting this
  information as the result of preprocessor control lines.
• Skipping comments.
• Properly dealing with octal, decimal, hex, floating point, and character constants, as well
  as character strings.
To achieve speed, the program maintains several tables that are indexed into by character value, to tell the lexical analyzer what to do next. To achieve portability, these tables
must be initialized each time the compiler is run, in order that the table entries reflect the
local character set values.
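
The idea of a character-indexed table filled in at run time can be sketched as below. The table name lxclass, the class names, and the functions lxinit and lxkind are hypothetical; the tables in scan.c hold more detailed information than a single class code.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical character classes. */
enum { C_OTHER, C_LETTER, C_DIGIT, C_SPACE };

static unsigned char lxclass[UCHAR_MAX + 1];

/* Fill the table from character literals, so the entries follow
   whatever character set the compiler is running on. */
void lxinit(void)
{
    const char *p;

    for (p = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_"; *p; p++)
        lxclass[(unsigned char)*p] = C_LETTER;
    for (p = "0123456789"; *p; p++)
        lxclass[(unsigned char)*p] = C_DIGIT;
    for (p = " \t\n"; *p; p++)
        lxclass[(unsigned char)*p] = C_SPACE;
}

/* One table lookup tells the analyzer how to treat the character. */
int lxkind(int c)
{
    return lxclass[(unsigned char)c];
}
```

Since no numeric character codes appear in the source, the same code builds a correct table on an ASCII or an EBCDIC host.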
Parsing
As mentioned above, the parser is generated by Yacc from the grammar on file cgram.y.
The grammar is relatively readable, but contains some unusual features that are worth comment.
Perhaps the strangest feature of the grammar is the treatment of declarations. The
problem is to keep track of the basic type and the storage class while interpreting the various
stars, brackets, and parentheses that may surround a given name. The entire declaration
mechanism must be recursive, since declarations may appear within declarations of structures
and unions, or even within a sizeof construction inside a dimension in another declaration!
There are some difficulties in using a bottom-up parser, such as produced by Yacc, to
handle constructions where a lot of left context information must be kept around. The problem is that the original PDP-11 compiler is top-down in implementation, and some of the
semantics of C reflect this. In a top-down parser, the input rules are restricted somewhat, but
one can naturally associate temporary storage with a rule at a very early stage in the
recognition of that rule. In a bottom-up parser, there is more freedom in the specification of
rules, but it is more difficult to know what rule is being matched until the entire rule is seen.
The parser described by cgram.c makes effective use of the bottom-up parsing mechanism in
some places (notably the treatment of expressions), but struggles against the restrictions in
others. The usual result is that it is necessary to run a stack of values "on the side", independent of the Yacc value stack, in order to be able to store and access information deep within
inner constructions, where the relationship of the rules being recognized to the total picture is
not yet clear.
In the case of declarations, the attribute information (type, etc.) for a declaration is carefully kept immediately to the left of the declarator (that part of the declaration involving the
name). In this way, when it is time to declare the name, the name and the type information
can be quickly brought together. The "$0" mechanism of Yacc is used to accomplish this.
The result is not pretty, but it works. The storage class information changes more slowly, so
it is kept in an external variable, and stacked if necessary. Some of the grammar could be
considerably cleaned up by using some more recent features of Yacc, notably actions within
rules and the ability to return multiple values for actions.
A stack is also used to keep track of the current location to be branched to when a
break or continue statement is processed.
This use of external stacks dates from the time when Yacc did not permit values to be
structures. Some, or most, of this use of external stacks could be eliminated by redoing the
grammar to use the mechanisms now provided. There are some areas, however, particularly
the processing of structure, union, and enum declarations, function prologs, and switch statement processing, where having all the affected data together in an array speeds later processing; in these cases, use of external storage seems essential.
The cgram.y file also contains some small functions used as utility functions in the
parser. These include routines for saving case values and labels in processing switches, and
stacking and popping values on the external stack described above.
Storage Classes
C has a finite, but fairly extensive, number of storage classes available. One of the compiler design decisions was to process the storage class information totally in the first pass; by
the second pass, this information must have been totally dealt with. This means that all of
the storage allocation must take place in the first pass, so that references to automatics and
parameters can be turned into references to cells lying a certain number of bytes offset from
certain machine registers. Much of this transformation is machine dependent, and strongly
depends on the storage class.
The classes include EXTERN (for externally declared, but not defined variables),
EXTDEF (for external definitions), and similar distinctions for USTATIC and STATIC,
UFORTRAN and FORTRAN (for fortran functions) and ULABEL and LABEL. The storage
classes REGISTER and AUTO are obvious, as are STNAME, UNAME, and ENAME (for
structure, union, and enumeration tags), and the associated MOS, MOU, and MOE (for the
members). TYPEDEF is treated as a storage class as well. There are two special storage
classes: PARAM and SNULL. SNULL is used to distinguish the case where no explicit
storage class has been given; before an entry is made in the symbol table the true storage class
is discovered. Similarly, PARAM is used for the temporary entry in the symbol table made
before the declaration of function parameters is completed.
The most complexity in the storage class process comes from bit fields. A separate
storage class is kept for each width bit field; a k bit bit field has storage class k plus FIELD.
This enables the size to be quickly recovered from the storage class.
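
A sketch of this encoding follows. FIELD is the name used in the text, but its value, the mask FLDSIZ, and the helper macros are assumptions for illustration; the only requirement shown is that FIELD lies above every ordinary storage class number and every legal field width.

```c
#include <assert.h>

#define FIELD   0100    /* assumed value: a bit above all ordinary classes */
#define FLDSIZ  077     /* assumed mask covering the possible widths */

/* Hypothetical helpers built on the "k plus FIELD" rule. */
#define FIELDCLASS(k)  (FIELD + (k))
#define ISFIELD(c)     (((c) & FIELD) != 0)
#define FIELDWIDTH(c)  ((c) & FLDSIZ)

/* Hypothetical self-check for this sketch. */
void check_fields(void)
{
    int c = FIELDCLASS(5);              /* a 5 bit bit field */

    assert(ISFIELD(c));
    assert(FIELDWIDTH(c) == 5);         /* size recovered from the class */
    assert(!ISFIELD(3));                /* an ordinary class number */
}
```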

Symbol Table Maintenance.
The symbol table routines do far more than simply enter names into the symbol table;
considerable semantic processing and checking is done as well. For example, if a new declaration comes in, it must be checked to see if there is a previous declaration of the same symbol.
If there is, there are many cases. The declarations may agree and be compatible (for example,
an extern declaration can appear twice) in which case the new declaration is ignored. The
new declaration may add information (such as an explicit array dimension) to an already
present declaration. The new declaration may be different, but still correct (for example, an
extern declaration of something may be entered, and then later the definition may be seen).
The new declaration may be incompatible, but appear in an inner block; in this case, the old
declaration is carefully hidden away, and the new one comes into force until the block is left.
Finally, the declarations may be incompatible, and an error message must be produced.
A number of other factors make for additional complexity. The type declared by the
user is not always the type entered into the symbol table (for example, if a formal parameter
to a function is declared to be an array, C requires that this be changed into a pointer before
entry in the symbol table). Moreover, there are various kinds of illegal types that may be
declared which are difficult to check for syntactically (for example, a function returning an
array). Finally, there is a strange feature in C that requires structure tag names and member
names for structures and unions to be taken from a different logical symbol table than ordinary identifiers. Keeping track of which kind of name is involved is a bit of a struggle (consider
typedef names used within structure declarations, for example).
The symbol table handling routines have been rewritten a number of times to extend
features, improve performance, and fix bugs. They address the above problems with reasonable effectiveness but a singular lack of grace.
When a name is read in the input, it is hashed, and the routine lookup is called,
together with a flag which tells which symbol table should be searched (actually, both symbol
tables are stored in one, and a flag is used to distinguish individual entries). If the name is
found, lookup returns the index to the entry found; otherwise, it makes a new entry, marks it
UNDEF (undefined), and returns the index of the new entry. This index is stored in the rval
field of a NAME node.
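The behavior of lookup described above can be sketched as follows. This is a minimal illustration, not the compiler's own code: the table size, the hash function, the linear probing, the field names, and the value of the SMOS flag are all assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

#define SYMTSZ 64               /* hypothetical table size */
#define UNDEF  0                /* type marker: not yet declared */
#define SMOS   01               /* assumed flag: tag/member table entry */

struct symtab {
    char name[16];              /* names are truncated to 15 characters here */
    int  stype;                 /* UNDEF until a declaration is processed */
    int  sflags;                /* which logical table the entry belongs to */
    int  used;
};

static struct symtab stab[SYMTSZ];

/* Hash the name and probe linearly; both logical tables share stab[]. */
int lookup(const char *name, int flag)
{
    unsigned h = 0;
    const char *p;
    int i, n;

    for (p = name; *p; p++)
        h = h * 31 + (unsigned char)*p;
    i = (int)(h % SYMTSZ);

    for (n = 0; n < SYMTSZ; n++, i = (i + 1) % SYMTSZ) {
        if (!stab[i].used) {    /* not found: make a new UNDEF entry */
            strncpy(stab[i].name, name, sizeof stab[i].name - 1);
            stab[i].stype = UNDEF;
            stab[i].sflags = flag;
            stab[i].used = 1;
            return i;
        }
        if (stab[i].sflags == flag &&
            strncmp(stab[i].name, name, sizeof stab[i].name - 1) == 0)
            return i;           /* found in the requested logical table */
    }
    return -1;                  /* table full */
}
```

Looking up the same spelling with two different flags yields two distinct indices, which is the effect of keeping two logical tables in one physical table.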
When a declaration is being parsed, this NAME node is made part of a tree with
UNARY MUL nodes for each *, LB nodes for each array descriptor (the right descendant has
the dimension), and UNARY CALL nodes for each function descriptor. This tree is passed to
the routine tymerge, along with the attribute type of the whole declaration; this routine collapses the tree to a single node, by calling tyreduce, and then modifies the type to reflect the
overall type of the declaration.
Dimension and size information is stored in a table called dimtab. To properly describe
a type in C, one needs not just the type information but also size information (for structures
and enums) and dimension information (for arrays). Sizes and offsets are dealt with in the
compiler by giving the associated indices into dimtab. Tymerge and tyreduce call dstash to
put the discovered dimensions away into the dimtab array. Tymerge returns a pointer to a
single node that contains the symbol table index in its rval field, and the size and dimension
indices in fields csiz and cdim, respectively. This information is properly considered part of
the type in the first pass, and is carried around at all times.
To enter an element into the symbol table, the routine defid is called; it is handed a
storage class, and a pointer to the node produced by tymerge. Defid calls fixtype, which
adjusts and checks the given type depending on the storage class, and converts null types
appropriately. It then calls fixclass, which does a similar job for the storage class; it is here,
for example, that register declarations are either allowed or changed to auto.
The new declaration is now compared against an older one, if present, and several pages
of validity checks performed. If the definitions are compatible, with possibly some added
information, the processing is straightforward. If the definitions differ, the block levels of the
current and the old declaration are compared. The current block level is kept in blevel, an
external variable; the old declaration level is kept in the symbol table. Block level 0 is for
external declarations, 1 is for arguments to functions, and 2 and above are blocks within a
function. If the current block level is the same as the old declaration, an error results. If the
current block level is higher, the new declaration overrides the old. This is done by marking
the old symbol table entry "hidden", and making a new entry, marked "hiding". Lookup will
skip over hidden entries. When a block is left, the symbol table is searched, and any entries
defined in that block are destroyed; if they hid other entries, the old entries are "unhidden".
This nice block structure is warped a bit because labels do not follow the block structure
rules (one can do a goto into a block, for example); default definitions of functions in inner
blocks also persist clear out to the outermost scope. This implies that cleaning up the symbol
table after block exit is more subtle than it might first seem.
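The hiding and unhiding of entries can be illustrated with a small sketch. Everything here is assumed for the example (the table layout, the flag values, and the routine names find, declare and leave_block); it also ignores the exceptions for labels and default function definitions mentioned above.

```c
#include <assert.h>
#include <string.h>

#define NSYM    32
#define SHIDDEN 01      /* entry is hidden by a declaration in an inner block */
#define SHIDES  02      /* entry hides a declaration from an outer block */

struct sym { const char *name; int level; int flags; int used; };

static struct sym tab[NSYM];
static int blevel = 0;  /* 0 external, 1 arguments, 2 and up blocks */

/* Return the visible entry for name, skipping hidden ones; -1 if none. */
int find(const char *name)
{
    int i;
    for (i = 0; i < NSYM; i++)
        if (tab[i].used && !(tab[i].flags & SHIDDEN) &&
            strcmp(tab[i].name, name) == 0)
            return i;
    return -1;
}

/* Enter name at the current block level; -1 means redeclaration or full. */
int declare(const char *name)
{
    int old = find(name), i;

    if (old >= 0 && tab[old].level == blevel)
        return -1;                       /* same level: an error results */
    for (i = 0; i < NSYM && tab[i].used; i++)
        ;
    if (i == NSYM)
        return -1;
    tab[i].name = name;
    tab[i].level = blevel;
    tab[i].flags = 0;
    tab[i].used = 1;
    if (old >= 0) {                      /* inner level: override the old one */
        tab[old].flags |= SHIDDEN;
        tab[i].flags |= SHIDES;
    }
    return i;
}

/* Destroy entries of the block being left and unhide what they hid. */
void leave_block(void)
{
    int i, j, best;

    for (i = 0; i < NSYM; i++) {
        if (!tab[i].used || tab[i].level != blevel)
            continue;
        if (tab[i].flags & SHIDES) {
            best = -1;                   /* most deeply nested hidden namesake */
            for (j = 0; j < NSYM; j++)
                if (tab[j].used && (tab[j].flags & SHIDDEN) &&
                    strcmp(tab[j].name, tab[i].name) == 0 &&
                    (best < 0 || tab[j].level > tab[best].level))
                    best = j;
            if (best >= 0)
                tab[best].flags &= ~SHIDDEN;
        }
        tab[i].used = 0;
    }
    blevel--;
}
```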
For successful new definitions, defid also initializes a "general purpose" field, offset, in
the symbol table. It contains the stack offset for automatics and parameters, the register
number for register variables, the bit offset into the structure for structure members, and the
internal label number for static variables and labels. The offset field is set by falloc for bit
fields, and dclstruct for structures and unions.
The symbol table entry itself thus contains the name, type word, size and dimension
offsets, offset value, and declaration block level. It also has a field of flags, describing what
symbol table the name is in, and whether the entry is hidden, or hides another. Finally, a
field gives the line number of the last use, or of the definition, of the name. This is used
mainly for diagnostics, but is useful to lint as well.
In some special cases, there is more than the above amount of information kept for the
use of the compiler. This is especially true with structures; for use in initialization, structure
declarations must have access to a list of the members of the structure. This list is also kept
in dimtab. Because a structure can be mentioned long before the members are known, it is
necessary to have another level of indirection in the table. The two words following the csiz
entry in dimtab are used to hold the alignment of the structure, and the index in dimtab of
the list of members. This list contains the symbol table indices for the structure members,
terminated by a -1.

Tree Building
The portable compiler transforms expressions into expression trees. As the parser recognizes each rule making up an expression, it calls buildtree which is given an operator number,
and pointers to the left and right descendants. Buildtree first examines the left and right
descendants, and, if they are both constants, and the operator is appropriate, simply does the
constant computation at compile time, and returns the result as a constant. Otherwise, buildtree allocates a node for the head of the tree, attaches the descendants to it, and ensures that
conversion operators are generated if needed, and that the type of the new node is consistent
with the types of the operands. There is also a considerable amount of semantic complexity
here; many combinations of types are illegal, and the portable compiler makes a strong effort
to check the legality of expression types completely. This is done both for lint purposes, and
to prevent such semantic errors from being passed through to the code generator.
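The constant folding step of buildtree might look roughly like the sketch below. The node layout, the operator set, and the helper talloc are simplified assumptions; the real routine also deals with types and conversions, which are left out here.

```c
#include <assert.h>
#include <stdlib.h>

enum op { ICON, NAME, PLUS, MINUS, MUL };

typedef struct node {
    enum op op;
    long lval;                  /* constant value when op == ICON */
    struct node *left, *right;
} NODE;

/* Allocate and fill in one tree node. */
NODE *talloc(enum op op, long v, NODE *l, NODE *r)
{
    NODE *p = malloc(sizeof *p);
    if (p == NULL)
        abort();
    p->op = op; p->lval = v; p->left = l; p->right = r;
    return p;
}

/* Build a binary node, folding it when both descendants are constants. */
NODE *buildtree(enum op op, NODE *l, NODE *r)
{
    if (l->op == ICON && r->op == ICON) {
        switch (op) {
        case PLUS:  l->lval += r->lval; break;
        case MINUS: l->lval -= r->lval; break;
        case MUL:   l->lval *= r->lval; break;
        default:    return talloc(op, 0, l, r);  /* operator not foldable */
        }
        free(r);
        return l;               /* the result is itself a constant node */
    }
    return talloc(op, 0, l, r); /* otherwise make a new head node */
}
```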
The heart of buildtree is a large table, accessed by the routine opact. This routine
maps the types of the left and right operands into a rather smaller set of descriptors, and then
accesses a table (actually encoded in a switch statement) which for each operator and pair of
types causes an action to be returned. The actions are logical or's of a number of separate
actions, which may be carried out by buildtree. These component actions may include checking the left side to ensure that it is an lvalue (can be stored into), applying a type conversion
to the left or right operand, setting the type of the new node to the type of the left or right
operand, calling various routines to balance the types of the left and right operands, and
suppressing the ordinary conversion of arrays and function operands to pointers. An important operation is OTHER, which causes some special code to be invoked in buildtree, to
handle issues which are unique to a particular operator. Examples of this are structure and
union reference (actually handled by the routine stref), the building of NAME, ICON,
STRING and FCON (floating point constant) nodes, unary * and &, structure assignment,
and calls. In the case of unary * and &, buildtree will cancel a * applied to a tree, the top
node of which is &, and conversely.
Another special operation is PUN; this causes the compiler to check for type
mismatches, such as intermixing pointers and integers.
The treatment of conversion operators is still a rather strange area of the compiler (and
of C!). The recent introduction of type casts has only confounded this situation. Most of the
conversion operators are generated by calls to tymatch and ptmatch, both of which are given
a tree, and asked to make the operands agree in type. Ptmatch treats the case where one of
the operands is a pointer; tymatch treats all other cases. Where these routines have decided
on the proper type for an operand, they call makety, which is handed a tree, and a type word,
dimension offset, and size offset. If necessary, it inserts a conversion operation to make the
types correct. Conversion operations are never inserted on the left side of assignment operators, however. There are two conversion operators used; PCONV, if the conversion is to a
non-basic type (usually a pointer), and SCONV, if the conversion is to a basic type (scalar).
To allow for maximum flexibility, every node produced by buildtree is given to a
machine dependent routine, clocal, immediately after it is produced. This is to allow more or
less immediate rewriting of those nodes which must be adapted for the local machine. The
conversion operations are given to clocal as well; on most machines, many of these conversions
do nothing, and should be thrown away (being careful to retain the type). If this operation is
done too early, however, later calls to buildtree may get confused about correct type of the
subtrees; thus clocal is given the conversion ops only after the entire tree is built. This topic
will be dealt with in more detail later.

Initialization
Initialization is one of the messier areas in the portable compiler. The only consolation
is that most of the mess takes place in the machine independent part, where it may be
safely ignored by the implementor of the compiler for a particular machine.
The basic problem is that the semantics of initialization really calls for a co-routine
structure; one collection of programs reading constants from the input stream, while another,
independent set of programs places these constants into the appropriate spots in memory.
The dramatic differences in the local assemblers also come to the fore here. The parsing
problems are dealt with by keeping a rather extensive stack containing the current state of the
initialization; the assembler problems are dealt with by having a fair number of machine
dependent routines.
The stack contains the symbol table number, type, dimension index, and size index for
the current identifier being initialized. Another entry has the offset, in bits, of the beginning
of the current identifier. Another entry keeps track of how many elements have been seen, if
the current identifier is an array. Still another entry keeps track of the current member of a
structure being initialized. Finally, there is an entry containing flags which keep track of the
current state of the initialization process (e.g., tell if a } has been seen for the current
identifier.)
When an initialization begins, the routine beginit is called; it handles the alignment restrictions, if any, and calls instk to create the stack entry. This is done by first making an
entry on the top of the stack for the item being initialized. If the top entry is an array,
another entry is made on the stack for the first element. If the top entry is a structure,
another entry is made on the stack for the first member of the structure. This continues until
the top element of the stack is a scalar. Instk then returns, and the parser begins collecting
initializers.
When a constant is obtained, the routine doinit is called; it examines the stack, and does
whatever is necessary to assign the current constant to the scalar on the top of the stack.
gotscal is then called, which rearranges the stack so that the next scalar to be initialized gets
placed on top of the stack. This process continues until the end of the initializers; endinit
cleans up. If a { or } is encountered in the string of initializers, it is handled by calling
ilbrace or irbrace, respectively.
A central issue is the treatment of the "holes" that arise as a result of alignment restrictions or explicit requests for holes in bit fields. There is a global variable, inoff, which contains the current offset in the initialization (all offsets in the first pass of the compiler are in
bits). Doinit figures out from the top entry on the stack the expected bit offset of the next
identifier; it calls the machine dependent routine inforce which, in a machine dependent way,
forces the assembler to set aside space if need be so that the next scalar seen will go into the
appropriate bit offset position. The scalar itself is passed to one of the machine dependent
routines fincode (for floating point initialization), incode (for fields, and other initializations
less than an int in size), and cinit (for all other initializations). The size is passed to all these
routines, and it is up to the machine dependent routines to ensure that the initializer occupies
exactly the right size.
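The bookkeeping for holes can be sketched in a few lines. The variable inoff and the routine name inforce come from the text; the bodies, the helpers upoff and putscalar, and the counter padbits (which stands in for the filler the assembler would be asked to emit) are assumptions for illustration.

```c
#include <assert.h>

static long inoff = 0;      /* current offset in the initialization, in bits */
static long padbits = 0;    /* filler requested so far (stand-in for output) */

/* Stand-in for the machine dependent inforce: pad until inoff reaches n. */
void inforce(long n)
{
    assert(n >= inoff);     /* initializers are laid down in increasing order */
    padbits += n - inoff;
    inoff = n;
}

/* Round off up to the next multiple of align (both in bits). */
long upoff(long off, long align)
{
    return (off + align - 1) / align * align;
}

/* Place one scalar of the given size and alignment; return its bit offset. */
long putscalar(long size, long align)
{
    long at = upoff(inoff, align);

    inforce(at);            /* create the hole, if any, before the scalar */
    inoff += size;
    return at;
}
```

For a structure holding a char followed by an int on a machine with 8-bit chars and 32-bit aligned ints, the second call finds a 24-bit hole to fill.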
Character strings represent a bit of an exception. If a character string is seen as the initializer for a pointer, the characters making up the string must be put out under a different
location counter. When the lexical analyzer sees the quote at the head of a character string, it
returns the token STRING, but does not do anything with the contents. The parser calls
getstr, which sets up the appropriate location counters and flags, and calls lxstr to read and
process the contents of the string.
If the string is being used to initialize a character array, lxstr calls putbyte, which in
effect simulates doinit for each character read. If the string is used to initialize a character
pointer, lxstr calls a machine dependent routine, bycode, which stashes away each character.
The pointer to this string is then returned, and processed normally by doinit.
The null at the end of the string is treated as if it were read explicitly by lxstr.

Statements
The first pass addresses four main areas; declarations, expressions, initialization, and
statements. The statement processing is relatively simple; most of it is carried out in the
parser directly. Most of the logic is concerned with allocating label numbers, defining the
labels, and branching appropriately. An external symbol, reached, is 1 if a statement can be
reached, 0 otherwise; this is used to do a bit of simple flow analysis as the program is being
parsed, and also to avoid generating the subroutine return sequence if the subroutine cannot
"fall through" the last statement.
Conditional branches are handled by generating an expression node, CBRANCH, whose
left descendant is the conditional expression and the right descendant is an ICON node containing the internal label number to be branched to. For efficiency, the semantics are that the
label is gone to if the condition is false.
The switch statement is compiled by collecting the case entries, and an indication as to
whether there is a default case; an internal label number is generated for each of these, and
remembered in a big array. The expression comprising the value to be switched on is compiled when the switch keyword is encountered, but the expression tree is headed by a special
node, FORCE, which tells the code generator to put the expression value into a special distinguished register (this same mechanism is used for processing the return statement). When
the end of the switch block is reached, the array containing the case values is sorted, and
checked for duplicate entries (an error); if all is correct, the machine dependent routine
genswitch is called, with this array of labels and values in increasing order. Genswitch can
assume that the value to be tested is already in the register which is the usual integer return
value register.
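The work done at the end of a switch block, before genswitch is called, amounts to a sort and a scan for duplicates. A minimal sketch, with the structure layout and the routine name swprep assumed for the example:

```c
#include <assert.h>
#include <stdlib.h>

struct sw { long sval; int slab; };     /* case value and internal label */

/* Order two table entries by case value. */
static int swcmp(const void *a, const void *b)
{
    long x = ((const struct sw *)a)->sval;
    long y = ((const struct sw *)b)->sval;
    return (x > y) - (x < y);
}

/* Sort the n collected cases; return 0 if a value is duplicated, else 1. */
int swprep(struct sw *tab, int n)
{
    int i;

    qsort(tab, (size_t)n, sizeof tab[0], swcmp);
    for (i = 1; i < n; i++)
        if (tab[i].sval == tab[i - 1].sval)
            return 0;                   /* duplicate case: an error */
    return 1;                           /* table is ready for genswitch */
}
```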

Optimization
There is a machine independent file, optim.c, which contains a relatively short optimization routine, optim. Actually the word optimization is something of a misnomer; the results
are not optimum, only improved, and the routine is in fact not optional; it must be called for
proper operation of the compiler.
Optim is called after an expression tree is built, but before the code generator is called.
The essential part of its job is to call clocal on the conversion operators. On most machines,
the treatment of & is also essential: by this time in the processing, the only node which is a
legal descendant of & is NAME. (Possible descendants of * have been eliminated by buildtree.) The address of a static name is, almost by definition, a constant, and can be
represented by an ICON node on most machines (provided that the loader has enough power).
Unfortunately, this is not universally true; on some machines, such as the IBM 370, the issue of
addressability rears its ugly head; thus, before turning a NAME node into an ICON node, the
machine dependent function andable is called.
The optimization attempts of optim are currently quite limited. It is primarily concerned with improving the behavior of the compiler with operations one of whose arguments is
a constant. In the simplest case, the constant is placed on the right if the operation is commutative. The compiler also makes a limited search for expressions such as
(x+a)+b

where a and b are constants, and attempts to combine a and b at compile time. A number of
special cases are also examined; additions of 0 and multiplications by 1 are removed, although
the correct processing of these cases to get the type of the resulting tree correct is decidedly
nontrivial. In some cases, the addition or multiplication must be replaced by a conversion op
to keep the types from becoming fouled up. In cases where a relational operation is
being done, and one operand is a constant, the operands are permuted, and the operator
altered, if necessary, to put the constant on the right. Finally, multiplications by a power of 2
are changed to shifts.
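Several of these rewrites can be shown on a toy tree. The sketch below is an assumption-laden simplification: it ignores types entirely (which, as noted, is where the real difficulty lies), and the node layout and the helper names mk, ispow2 and log2of are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

enum op { ICON, NAME, PLUS, MUL, LS };

typedef struct node {
    enum op op;
    long lval;                  /* constant value when op == ICON */
    struct node *left, *right;
} NODE;

NODE *mk(enum op op, long v, NODE *l, NODE *r)
{
    NODE *p = malloc(sizeof *p);
    if (p == NULL)
        abort();
    p->op = op; p->lval = v; p->left = l; p->right = r;
    return p;
}

static int ispow2(long v) { return v > 0 && (v & (v - 1)) == 0; }
static int log2of(long v) { int k = 0; while (v > 1) { v >>= 1; k++; } return k; }

NODE *optim(NODE *p)
{
    NODE *t;

    if (p->op == ICON || p->op == NAME)
        return p;
    p->left = optim(p->left);
    p->right = optim(p->right);

    /* commutative operator: put the constant on the right */
    if ((p->op == PLUS || p->op == MUL) &&
        p->left->op == ICON && p->right->op != ICON) {
        t = p->left; p->left = p->right; p->right = t;
    }
    /* (x+a)+b becomes x+(a+b), with a+b computed now */
    if (p->op == PLUS && p->right->op == ICON &&
        p->left->op == PLUS && p->left->right->op == ICON) {
        t = p->left;
        t->right->lval += p->right->lval;
        free(p->right);
        free(p);
        p = t;
    }
    if (p->right->op == ICON) {
        long v = p->right->lval;

        /* x+0 and x*1 are just x */
        if ((p->op == PLUS && v == 0) || (p->op == MUL && v == 1)) {
            t = p->left;
            free(p->right);
            free(p);
            return t;
        }
        /* multiplication by a power of 2 becomes a left shift */
        if (p->op == MUL && ispow2(v)) {
            p->op = LS;
            p->right->lval = log2of(v);
        }
    }
    return p;
}
```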
There are dozens of similar optimizations that can be, and should be, done. It seems
likely that this routine will be expanded in the relatively near future.

Machine Dependent Stuff
A number of the first pass machine dependent routines have been discussed above. In
general, the routines are short, and easy to adapt from machine to machine. The two exceptions to this general rule are clocal and the function prolog and epilog generation routines,
bfcode and efcode.
Clocal has the job of rewriting, if appropriate and desirable, the nodes constructed by
buildtree. There are two major areas where this is important; NAME nodes and conversion
operations. In the case of NAME nodes, clocal must rewrite the NAME node to reflect the
actual physical location of the name in the machine. In effect, the NAME node must be
examined, the symbol table entry found (through the rval field of the node), and, based on
the storage class of the node, the tree must be rewritten. Automatic variables and parameters
are typically rewritten by treating the reference to the variable as a structure reference, off the
register which holds the stack or argument pointer; the stref routine is set up to be called in
this way, and to build the appropriate tree. In the most general case, the tree consists of a
unary * node, whose descendant is a + node, with the stack or argument register as left
operand, and a constant offset as right operand. In the case of LABEL and internal static
nodes, the rval field is rewritten to be the negative of the internal label number; a negative
rval field is taken to be an internal label number. Finally, a name of class REGISTER must
be converted into a REG node, and the rval field replaced by the register number. In fact,
this part of the clocal routine is nearly machine independent; only for machines with addressability problems (IBM 370 again!) does it have to be noticeably different.
The conversion operator treatment is rather tricky. It is necessary to handle the application of conversion operators to constants in clocal, in order that all constant expressions can
have their values known at compile time. In extreme cases, this may mean that some simulation of the arithmetic of the target machine might have to be done in a cross-compiler. In the
most common case, conversions from pointer to pointer do nothing. For some machines, however, conversion from byte pointer to short or long pointer might require a shift or rotate
operation, which would have to be generated here.
The extension of the portable compiler to machines where the size of a pointer depends
on its type would be straightforward, but has not yet been done.
The other major machine dependent issue involves the subroutine prolog and epilog generation. The hard part here is the design of the stack frame and calling sequence; this design
issue is discussed elsewhere. 5 The routine bfcode is called with the number of arguments the
function is defined with, and an array containing the symbol table indices of the declared
parameters. Bfcode must generate the code to establish the new stack frame, save the return
address and previous stack pointer value on the stack, and save whatever registers are to be
used for register variables. The stack size and the number of register variables are not known
when bfcode is called, so these numbers must be referred to by assembler constants, which are
defined when they are known (usually in the second pass, after all register variables, automatics, and temporaries have been seen). The final job is to find those parameters which may
have been declared register, and generate the code to initialize the register with the value
passed on the stack. Once again, for most machines, the general logic of bfcode remains the
same, but the contents of the printf calls in it will change from machine to machine. Efcode
is rather simpler, having just to generate the default return at the end of a function. This
may be nontrivial in the case of a function returning a structure or union, however.
There seems to be no really good place to discuss structures and unions, but this is as
good a place as any. The C language now supports structure assignment, and the passing of
structures as arguments to functions, and the receiving of structures back from functions.
This was added rather late to C, and thus to the portable compiler. Consequently, it fits in
less well than the older features. Moreover, most of the burden of making these features work
is placed on the machine dependent code.
There are both conceptual and practical problems. Conceptually, the compiler is structured around the idea that to compute something, you put it into a register and work on it.
This notion causes a bit of trouble on some machines (e.g., machines with 3-address opcodes),
but matches many machines quite well. Unfortunately, this notion breaks down with structures. The closest that one can come is to keep the addresses of the structures in registers.
The actual code sequences used to move structures vary from the trivial (a multiple byte
move) to the horrible (a function call), and are very machine dependent.
The practical problem is more painful. When a function returning a structure is called,
this function has to have some place to put the structure value. If it places it on the stack, it
has difficulty popping its stack frame. If it places the value in a static temporary, the routine
fails to be reentrant. The most logically consistent way of implementing this is for the caller
to pass in a pointer to a spot where the called function should put the value before returning.
This is relatively straightforward, although a bit tedious, to implement, but means that the
caller must have properly declared the function type, even if the value is never used. On some
machines, such as the Interdata 8/32, the return value simply overlays the argument region
(which on the 8/32 is part of the caller's stack frame). The caller takes care of leaving enough
room if the returned value is larger than the arguments. This also assumes that the caller
knows and declares the function properly.
The PDP-11 and the VAX have stack hardware which is used in function calls and
returns; this makes it very inconvenient to use either of the above mechanisms. In these
machines, a static area within the called function is allocated, and the function return value is
copied into it on return; the function returns the address of that region. This is simple to
implement, but is non-reentrant. However, the function can now be called as a subroutine
without being properly declared, without the disaster which would otherwise ensue. No
matter what choice is taken, the convention is that the function actually returns the address
of the return structure value.
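Two of these conventions can be written out by hand in C, which may make the tradeoff more concrete. The functions below are illustrative only; they show what the generated code effectively does, with the structure pt and both function names invented for the example.

```c
#include <assert.h>

struct pt { int x, y; };

/* Convention 1: the caller passes in a pointer to a spot for the value. */
struct pt *mk_caller_area(struct pt *result, int x, int y)
{
    result->x = x;
    result->y = y;
    return result;          /* the address of the return structure value */
}

/* Convention 2: a static area in the called function; not reentrant. */
struct pt *mk_static_area(int x, int y)
{
    static struct pt area;

    area.x = x;
    area.y = y;
    return &area;           /* every call hands back the same address */
}
```

With the static area, a second call overwrites the value produced by the first, so the caller must copy the structure out before calling again.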
In building expression trees, the portable compiler takes a bit for granted about structures. It assumes that functions returning structures actually return a pointer to the structure, and it assumes that a reference to a structure is actually a reference to its address. The
structure assignment operator is rebuilt so that the left operand is the structure being
assigned to, but the right operand is the address of the structure being assigned; this makes it
easier to deal with
a=b=c
and similar constructions.
There are four special tree nodes associated with these operations: STASG (structure
assignment), STARG (structure argument to a function call), and STCALL and UNARY
STCALL (calls of a function with nonzero and zero arguments, respectively). These four
nodes are unique in that the size and alignment information, which can be determined by the
type for all other objects in C, must be known to carry out these operations; special fields are
set aside in these nodes to contain this information, and special intermediate code is used to
transmit this information.

First Pass Summary
There are many other issues which have been ignored here, partly to justify the title
"tour", and partially because they have seemed to cause little trouble. There are some debugging flags which may be turned on, by giving the compiler's first pass the argument
-X[flags]
Some of the more interesting flags are -Xd for the defining and freeing of symbols, -Xi for
initialization comments, and -Xb for various comments about the building of trees. In many
cases, repeating the flag more than once gives more information; thus, -Xddd gives more
information than -Xd. In the two pass version of the compiler, the flags should not be set
when the output is sent to the second pass, since the debugging output and the intermediate
code both go onto the standard output.
We turn now to consideration of the second pass.

Pass Two
Code generation is far less well understood than parsing or lexical analysis, and for this
reason the second pass is far harder to discuss in a file by file manner. A great deal of the
difficulty is in understanding the issues and the strategies employed to meet them. Any particular function is likely to be reasonably straightforward.
Thus, this part of the paper will concentrate a good deal on the broader aspects of strategy in the code generator, and will not get too intimate with the details.
Overview.
It is difficult to organize a code generator to be flexible enough to generate code for a
large number of machines, and still be efficient for any one of them. Flexibility is also important when it comes time to tune the code generator to improve the output code quality. On
the other hand, too much flexibility can lead to semantically incorrect code, and potentially a
combinatorial explosion in the number of cases to be considered in the compiler.
One goal of the code generator is to have a high degree of correctness. It is very desirable to have the compiler detect its own inability to generate correct code, rather than to produce incorrect code. This goal is achieved by having a simple model of the job to be done
(e.g., an expression tree) and a simple model of the machine state (e.g., which registers are
free). The act of generating an instruction performs a transformation on the tree and the
machine state; hopefully, the tree eventually gets reduced to a single node. If each of these
instruction/transformation pairs is correct, and if the machine state model really represents
the actual machine, and if the transformations reduce the input tree to the desired single
node, then the output code will be correct.
For most real machines, there is no definitive theory of code generation that encompasses all the C operators. Thus the selection of which instruction/transformations to generate, and in what order, will have a heuristic flavor. If, for some expression tree, no transformation applies, or, more seriously, if the heuristics select a sequence of
instruction/transformations that do not in fact reduce the tree, the compiler will report its
inability to generate code, and abort.
A major part of the code generator is concerned with the model and the transformations;
most of this is machine independent, or depends only on simple tables. The flexibility
comes from the heuristics that guide the transformations of the trees, the selection of
subgoals, and the ordering of the computation.

The Machine Model
The machine is assumed to have a number of registers, of at most two different types: A
and B. Within each register class, there may be scratch (temporary) registers and dedicated
registers (e.g., register variables, the stack pointer, etc.). Requests to allocate and free registers involve only the temporary registers.
Each of the registers in the machine is given a name and a number in the mac2defs file;
the numbers are used as indices into various tables that describe the registers, so they should
be kept small. One such table is the rstatus table on file local2.c. This table is indexed by
register number, and contains expressions made up from manifest constants describing the
register types: SAREG for dedicated AREG's, SAREG|STAREG for scratch AREG's, and
SBREG and SBREG|STBREG similarly for BREG's. There are macros that access this information: isbreg(r) returns true if register number r is a BREG, and istreg(r) returns true if
register number r is a temporary AREG or BREG. Another table, rnames, contains the register names; this is used when putting out assembler code and diagnostics.
The usage of registers is kept track of by an array called busy. Busy[r] is the number
of uses of register r in the current tree being processed. The allocation and freeing of registers will be discussed later as part of the code generation algorithm.
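A sketch of these tables for an imaginary six-register machine follows. The names rstatus, busy, isbreg and istreg are from the text; the bit values, the register assignment, and the routines getreg and putreg are assumptions made up for the illustration.

```c
#include <assert.h>

#define SAREG  01       /* assumed bit values for the register classes */
#define STAREG 02
#define SBREG  04
#define STBREG 010
#define REGSZ  6

/* Imaginary machine: r0-r1 scratch A, r2 dedicated A,
 * r3 scratch B, r4-r5 dedicated B. */
static int rstatus[REGSZ] = {
    SAREG|STAREG, SAREG|STAREG, SAREG,
    SBREG|STBREG, SBREG, SBREG
};

static int busy[REGSZ]; /* number of uses of each register in the tree */

#define isbreg(r) (rstatus[r] & SBREG)
#define istreg(r) (rstatus[r] & (STAREG|STBREG))

/* Claim a free scratch register of class SAREG or SBREG; -1 if none. */
int getreg(int rclass)
{
    int r;

    for (r = 0; r < REGSZ; r++)
        if ((rstatus[r] & rclass) && istreg(r) && busy[r] == 0) {
            busy[r]++;
            return r;
        }
    return -1;
}

/* Give up one use of register r. */
void putreg(int r)
{
    assert(busy[r] > 0);
    busy[r]--;
}
```

Dedicated registers are never handed out, since requests to allocate and free registers involve only the temporaries.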
General Organization
As mentioned above, the second pass reads lines from the intermediate file, copying
through to the output unchanged any lines that begin with a ')', and making note of the information about stack usage and register allocation contained on lines beginning with ']' and '['.
The expression trees, whose beginning is indicated by a line beginning with '.', are read and
rebuilt into trees. If the compiler is loaded as one pass, the expression trees are immediately
available to the code generator.
The actual code generation is done by a hierarchy of routines. The routine delay is first
given the tree; it attempts to delay some postfix ++ and -- computations that might reasonably be done after the smoke clears. It also attempts to handle comma (,) operators by computing the left side expression first, and then rewriting the tree to eliminate the operator.
Delay calls codgen to control the actual code generation process. Codgen takes as arguments
a pointer to the expression tree, and a second argument that, for socio-historical reasons, is
called a cookie. The cookie describes a set of goals that would be acceptable for the code generation: these are assigned to individual bits, so they may be logically or'ed together to form a
large number of possible goals. Among the possible goals are FOREFF (compute for side
effects only; don't worry about the value), INTEMP (compute and store value into a temporary location in memory), INAREG (compute into an A register), INTAREG (compute into
a scratch A register), INBREG and INTBREG similarly, FORCC (compute for condition
codes), and FORARG (compute it as a function argument; e.g., stack it if appropriate).
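Because each goal is a single bit, a cookie can name several acceptable outcomes at once, and testing a candidate against it is one mask operation. In the sketch below the goal names are from the text, but the bit values, the tmpl structure, and the routine match are assumptions for illustration.

```c
#include <assert.h>

#define FOREFF   01     /* assumed bit assignments for the goals */
#define INAREG   02
#define INTAREG  04
#define INBREG   010
#define INTBREG  020
#define FORCC    040
#define INTEMP   0100
#define FORARG   0200

/* A hypothetical code template and the kinds of result it can deliver. */
struct tmpl { const char *insn; int result; };

/* Return the first template whose result satisfies the cookie, or -1. */
int match(const struct tmpl *t, int n, int cookie)
{
    int i;

    for (i = 0; i < n; i++)
        if (t[i].result & cookie)
            return i;
    return -1;
}
```

A call such as match(t, n, INAREG|INTEMP) asks for anything that leaves the value either in an A register or in a temporary.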
Codgen first canonicalizes the tree by calling canon. This routine looks for certain
transformations that might now be applicable to the tree. One, which is very common and
very powerful, is to fold together an indirection operator (UNARY MUL) and a register
(REG); in most machines, this combination is addressable directly, and so is similar to a
NAME in its behavior. The UNARY MUL and REG are folded together to make another
node type called OREG. In fact, in many machines it is possible to directly address not just
the cell pointed to by a register, but also cells differing by a constant offset from the cell
pointed to by the register. Canon also looks for such cases, calling the machine dependent
routine notoff to decide if the offset is acceptable (for example, in the IBM 370 the offset
must be between 0 and 4095 bytes). Another optimization is to replace bit field operations by
shifts and masks if the operation involves extracting the field. Finally, a machine dependent
routine, sucomp, is called that computes the Sethi-Ullman numbers for the tree (see below).
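The OREG folding described above might be sketched as follows. This is an illustration under stated assumptions, not the compiler's own code: the NODE layout, the function fold_oreg, and the stand-in notoff_sketch (assumed here to return nonzero when the offset is not acceptable) are all hypothetical; only the 0 to 4095 range is taken from the IBM 370 example in the text.

```c
#include <stddef.h>

enum op { REG, ICON, PLUS, UNARY_MUL, OREG };

typedef struct node {
    enum op op;
    struct node *left, *right;
    long lval;   /* constant value, or offset for an OREG */
    int rval;    /* register number for REG and OREG */
} NODE;

/* Stand-in for the machine dependent check. Assumption: nonzero
 * means the offset is NOT acceptable (IBM 370 style range). */
static int notoff_sketch(long off)
{
    return off < 0 || off > 4095;
}

/* Fold *(reg) or *(reg + const) into a single OREG node, in place.
 * Returns 1 if the node was rewritten, 0 otherwise. */
int fold_oreg(NODE *p)
{
    NODE *l;

    if (p->op != UNARY_MUL || p->left == NULL)
        return 0;
    l = p->left;
    if (l->op == REG) {                      /* *(reg) */
        p->op = OREG;
        p->rval = l->rval;
        p->lval = 0;
        p->left = p->right = NULL;
        return 1;
    }
    if (l->op == PLUS && l->left != NULL && l->right != NULL &&
        l->left->op == REG && l->right->op == ICON &&
        !notoff_sketch(l->right->lval)) {    /* *(reg + const) */
        p->op = OREG;
        p->rval = l->left->rval;
        p->lval = l->right->lval;
        p->left = p->right = NULL;
        return 1;
    }
    return 0;
}
```

A tree whose offset falls outside the acceptable range is left alone, so that some later rewriting step can compute the address into a register.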
After the tree is canonicalized, codgen calls the routine store whose job is to select a
subtree of the tree to be computed and (usually) stored before beginning the computation of
the full tree. Store must return a tree that can be computed without need for any temporary
storage locations. In effect, the only store operations generated while processing the subtree
must be as a response to explicit assignment operators in the tree. This division of the job
marks one of the more significant, and successful, departures from most other compilers. It
means that the code generator can operate under the assumption that there are enough registers to do its job, without worrying about temporary storage. If a store into a temporary
appears in the output, it is always as a direct result of logic in the store routine; this makes
debugging easier.
One consequence of this organization is that code is not generated by a treewalk. There
are theoretical results that support this decision. 7 It may be desirable to compute several subtrees and store them before tackling the whole tree; if a subtree is to be stored, this is known
before the code generation for the subtree is begun, and the subtree is computed when all
scratch registers are available.
The store routine decides what subtrees, if any, should be stored by making use of
numbers, called Sethi-Ullman numbers, that give, for each subtree of an expression tree, the
minimum number of scratch registers required to compile the subtree, without any stores into
temporaries. 8 These numbers are computed by the machine-dependent routine sucomp, called
by canon. The basic notion is that, knowing the Sethi-Ullman numbers for the descendants
of a node, and knowing the operator of the node and some information about the machine, the
Sethi-Ullman number of the node itself can be computed. If the Sethi-Ullman number for a
tree exceeds the number of scratch registers available, some subtree must be stored. Unfortunately, the theory behind the Sethi-Ullman numbers applies only to uselessly simple
machines and operators. For the rich set of C operators, and for machines with asymmetric
registers, register pairs, different kinds of registers, and exceptional forms of addressing, the
theory cannot be applied directly. The basic idea of estimation is a good one, however, and
well worth applying; the application, especially when the compiler comes to be tuned for high
code quality, goes beyond the park of theory into the swamp of heuristics. This topic will be
taken up again later, when more of the compiler structure has been described.
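For orientation, the textbook form of the computation can be sketched in a few lines of C. This is the idealized version only, assuming interchangeable registers, register-to-register operators, and leaves that must each be loaded into a register; it is not the machine-dependent sucomp of any real port, and the names TNODE, sucomp_sketch, and needs_store are hypothetical.

```c
#include <stddef.h>

typedef struct tnode {
    struct tnode *left, *right;  /* both NULL for a leaf; a unary
                                    operator uses left only */
    int su;                      /* computed register need */
} TNODE;

/* Label every node with the number of registers needed to
 * evaluate it without stores, on the idealized machine. */
int sucomp_sketch(TNODE *p)
{
    int l, r;

    if (p->left == NULL && p->right == NULL)
        return p->su = 1;                       /* leaf: one load */
    if (p->right == NULL)
        return p->su = sucomp_sketch(p->left);  /* unary operator */
    l = sucomp_sketch(p->left);
    r = sucomp_sketch(p->right);
    /* Equal needs cost one extra register; otherwise the harder
     * side is done first and the larger need dominates. */
    return p->su = (l == r) ? l + 1 : (l > r ? l : r);
}

/* Some subtree must be stored if the need exceeds the supply. */
int needs_store(const TNODE *p, int nscratch)
{
    return p->su > nscratch;
}
```

On this model the tree for (a + b) * (c + d) is labeled 3, so with only two scratch registers one of the sums would have to be stored first.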
After examining the Sethi-Ullman numbers, store selects a subtree, if any, to be stored,
and returns the subtree and the associated cookie in the external variables stotree and stocook. If a subtree has been selected, or if the whole tree is ready to be processed, the routine
order is called, with a tree and cookie. Order generates code for trees that do not require
temporary locations. Order may make recursive calls on itself, and, in some cases, on codgen;
for example, when processing the operators &&, ||, and comma (','), that have a left to right
evaluation, it is incorrect for store to examine the right operand for subtrees to be stored. In
these cases, order will call codgen recursively when it is permissible to work on the right
operand. A similar issue arises with the ? : operator.
The order routine works by matching the current tree with a set of code templates. If a
template is discovered that will match the current tree and cookie, the associated assembly
language statement or statements are generated. The tree is then rewritten, as specified by
the template, to represent the effect of the output instruction(s). If no template match is
found, first an attempt is made to find a match with a different cookie; for example, in order
to compute an expression with cookie INTEMP (store into a temporary storage location), it is
usually necessary to compute the expression into a scratch register first. If all attempts to
match the tree fail, the heuristic part of the algorithm becomes dominant. Control is typically
given to one of a number of machine-dependent routines that may in turn recursively call
order to achieve a subgoal of the computation (for example, one of the arguments may be
computed into a temporary register). After this subgoal has been achieved, the process begins
again with the modified tree. If the machine-dependent heuristics are unable to reduce the
tree further, a number of default rewriting rules may be considered appropriate. For example,
if the left operand of a + is a scratch register, the + can be replaced by a += operator; the
tree may then match a template.
To close this introduction, we will discuss the steps in compiling code for the expression

a += b
where a and b are static variables.
To begin with, the whole expression tree is examined with cookie FOREFF, and no
match is found. Search with other cookies is equally fruitless, so an attempt at rewriting is
made. Suppose we are dealing with the Interdata 8/32 for the moment. It is recognized that
the left hand and right hand sides of the += operator are addressable, and in particular the
left hand side has no side effects, so it is permissible to rewrite this as

a = a + b
and this is done. No match is found on this tree either, so a machine dependent rewrite is
done; it is recognized that the left hand side of the assignment is addressable, but the right
hand side is not in a register, so order is called recursively, being asked to put the right hand
side of the assignment into a register. This invocation of order searches the tree for a match,
and fails. The machine dependent rule for + notices that the right hand operand is addressable; it decides to put the left operand into a scratch register. Another recursive call to order
is made, with the tree consisting solely of the leaf a, and the cookie asking that the value be
placed into a scratch register. This now matches a template, and a load instruction is emitted.
The node consisting of a is rewritten in place to represent the register into which a is loaded,
and this third call to order returns. The second call to order now finds that it has the tree
reg + b
to consider. Once again, there is no match, but the default rewriting rule rewrites the + as a
+= operator, since the left operand is a scratch register. When this is done, there is a match:
in fact,
reg += b
simply describes the effect of the add instruction on a typical machine. After the add is emitted, the tree is rewritten to consist merely of the register node, since the result of the add is
now in the register. This agrees with the cookie passed to the second invocation of order, so
this invocation terminates, returning to the first level. The original tree has now become

a = reg
which matches a template for the store instruction. The store is output, and the tree rewritten to become just a single register node. At this point, since the top level call to order was
interested only in side effects, the call to order returns, and the code generation is completed;
we have generated a load, add, and store, as might have been expected.
The effect of machine architecture on this is considerable. For example, on the
Honeywell 6000, the machine dependent heuristics recognize that there is an "add to storage"
instruction, so the strategy is quite different; b is loaded into a register, and then an add to
storage instruction generated to add this register into a. The transformations, involving as
they do the semantics of C, are largely machine independent. The decisions as to when to use
them, however, are almost totally machine dependent.
Having given a broad outline of the code generation process, we shall next consider the
heart of it: the templates. This leads naturally into discussions of template matching and
register allocation, and finally a discussion of the machine dependent interfaces and strategies.

The Templates
The templates describe the effect of the target machine instructions on the model of
computation around which the compiler is organized. In effect, each template has five logical
sections, and represents an assertion of the form:
If we have a subtree of a given shape (1), and we have a goal (cookie) or goals to achieve
(2), and we have sufficient free resources (3), then we may emit an instruction or
instructions (4), and rewrite the subtree in a particular manner (5), and the rewritten
tree will achieve the desired goals.
These five sections will be discussed in more detail later. First, we give an example of a
template:
ASG PLUS,       INAREG,
        SAREG,  TINT,
        SNAME,  TINT,
                0,      RLEFT,
                "       add     AL,AR\n",

The top line specifies the operator (+=) and the cookie (compute the value of the subtree into
an AREG). The second and third lines specify the left and right descendants, respectively, of
the += operator. The left descendant must be a REG node, representing an A register, and
have integer type, while the right side must be a NAME node, and also have integer type.
The fourth line contains the resource requirements (no scratch registers or temporaries
needed), and the rewriting rule (replace the subtree by the left descendant). Finally, the
quoted string on the last line represents the output to the assembler: lower case letters, tabs,
spaces, etc. are copied verbatim to the output; upper case letters trigger various macro-like
expansions. Thus, AL would expand into the Address form of the Left operand - presumably the register number. Similarly, AR would expand into the name of the right operand.
The add instruction of the last section might well be emitted by this template.
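The expansion of the output string can be illustrated with a small sketch. It handles only the two macros AL and AR named above and copies every other character verbatim; the function expand_sketch, its parameters, and the operand spellings used in the usage note are assumptions made for illustration, not the compiler's own expand routine.

```c
#include <stddef.h>
#include <string.h>

/* Copy cstring into out (capacity n bytes), replacing the macro AL
 * by the string al and AR by the string ar. Any other character is
 * copied unchanged. Returns 0 on success, -1 if out is too small. */
int expand_sketch(const char *cstring, const char *al, const char *ar,
                  char *out, size_t n)
{
    size_t used = 0;

    while (*cstring != '\0') {
        const char *piece = NULL;
        size_t len = 1;

        if (cstring[0] == 'A' && cstring[1] == 'L')
            piece = al;
        else if (cstring[0] == 'A' && cstring[1] == 'R')
            piece = ar;
        if (piece != NULL)
            len = strlen(piece);
        if (used + len + 1 > n)      /* keep room for the final NUL */
            return -1;
        if (piece != NULL) {
            memcpy(out + used, piece, len);
            cstring += 2;            /* skip the two-letter macro */
        } else {
            out[used] = *cstring++;  /* verbatim copy */
        }
        used += len;
    }
    if (n == 0)
        return -1;
    out[used] = '\0';
    return 0;
}
```

With the left operand spelled r3 and the right operand spelled b (both hypothetical), the template string from the example expands to an add instruction on r3 and b.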
In principle, it would be possible to make separate templates for all legal combinations
of operators, cookies, types, and shapes. In practice, the number of combinations is very large.
Thus, a considerable amount of mechanism is present to permit a large number of subtrees to
be matched by a single template. Most of the shape and type specifiers are individual bits,
and can be logically or'ed together. There are a number of special descriptors for matching
classes of operators. The cookies can also be combined. As an example of the kind of template that really arises in practice, the actual template for the Interdata 8/32 that subsumes
the above example is:
ASG OPSIMP,     INAREG|FORCC,
        SAREG,                          TINT|TUNSIGNED|TPOINT,
        SAREG|SNAME|SOREG|SCON,         TINT|TUNSIGNED|TPOINT,
                0,      RLEFT|RESCC,
                "       OI      AL,AR\n",

Here, OPSIMP represents the operators +, -, |, &, and ^. The OI macro in the output string
expands into the appropriate Integer Opcode for the operator. The left and right sides can be
integers, unsigned, or pointer types. The right side can be, in addition to a name, a register, a
memory location whose address is given by a register and displacement (OREG), or a constant. Finally, these instructions set the condition codes, and so can be used in condition contexts: the cookie and rewriting rules reflect this.
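One way to picture such a template is as a record of bit masks, with a match succeeding when every field of the tree intersects the corresponding field of the template. The sketch below assumes this representation; the structure optab_sketch, the bit values, the operator code ASG_PLUS_OP, and the function template_matches are all hypothetical, and operator classes such as OPSIMP are not modeled.

```c
/* Assumed bit encodings, for illustration only. */
#define INAREG     002
#define FORCC      040
#define SAREG      001
#define SNAME      002
#define SOREG      004
#define SCON       010
#define TINT       001
#define TUNSIGNED  002
#define TPOINT     004
#define ASG_PLUS_OP  1   /* hypothetical code for the += operator */

struct optab_sketch {
    int op;               /* operator at the root */
    int cookie;           /* goals this template can achieve */
    int lshape, ltype;    /* acceptable left shapes and types */
    int rshape, rtype;    /* acceptable right shapes and types */
    int needs;            /* resources required */
    int rewrite;          /* how to rewrite the tree afterwards */
    const char *cstring;  /* assembler output, with macros */
};

struct operand {
    int shape, type;      /* one bit set in each */
};

/* Nonzero if the template applies to the given root operator,
 * cookie, and left and right operands. */
int template_matches(const struct optab_sketch *q, int op, int cookie,
                     struct operand l, struct operand r)
{
    return q->op == op
        && (q->cookie & cookie) != 0
        && (q->lshape & l.shape) != 0 && (q->ltype & l.type) != 0
        && (q->rshape & r.shape) != 0 && (q->rtype & r.type) != 0;
}
```

Or'ing several shapes or types into one field is what lets a single table entry stand for many concrete instruction forms.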

The Template Matching Algorithm.
The heart of the second pass is the template matching algorithm, in the routine match.
Match is called with a tree and a cookie; it attempts to match the given tree against some
template that will transform it according to one of the goals given in the cookie. If a match is
successful, the transformation is applied; expand is called to generate the assembly code, and
then reclaim rewrites the tree, and reclaims the resources, such as registers, that might have
become free as a result of the generated code.
This part of the compiler is among the most time critical. There is a spectrum of implementation techniques available for doing this matching. The most naive algorithm simply
looks at the templates one by one. This can be considerably improved upon by restricting the
search for an acceptable template. It would be possible to do better than this if the templates
were given to a separate program that ate them and generated a template matching subroutine. This would make maintenance of the compiler much more complicated, however, so this
has not been done.
The matching algorithm is actually carried out by restricting the range in the table that
must be searched for each opcode. This introduces a number of complications, however, and
needs a bit of sympathetic help by the person constructing the compiler in order to obtain
best results. The exact tuning of this algorithm continues; it is best to consult the code and
comments in match for the latest version.
In order to match a template to a tree, it is necessary to match not only the cookie and
the op of the root, but also the types and shapes of the left and right descendants (if any) of
the tree. A convention is established here that is carried out throughout the second pass of
the compiler. If a node represents a unary operator, the single descendant is always the "left"
descendant. If a node represents a unary operator or a leaf node (no descendants) the "right"
descendant is taken by convention to be the node itself. This enables templates to easily
match leaves and conversion operators, for example, without any additional mechanism in the
matching program.
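The convention for the "right" descendant can be captured in a one-line helper. This is a sketch only; the type MNODE, the enumeration names, and the function right_for_match are assumptions introduced for illustration.

```c
#include <stddef.h>

enum optype { LTYPE, UTYPE, BITYPE };  /* leaf, unary, binary */

typedef struct mnode {
    enum optype ty;
    struct mnode *left, *right;
} MNODE;

/* The "right" descendant as seen by the matcher: the real right
 * child for a binary operator, and the node itself for a unary
 * operator or a leaf. */
MNODE *right_for_match(MNODE *p)
{
    return p->ty == BITYPE ? p->right : p;
}
```

A template for a leaf can therefore describe the leaf's shape and type in its right-hand fields, with no special case in the matcher.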
The type matching is straightforward; it is possible to specify any combination of basic
types, general pointers, and pointers to one or more of the basic types. The shape matching is
somewhat more complicated, but still pretty simple. Templates have a collection of possible
operand shapes on which the opcode might match. In the simplest case, an add operation
might be able to add to either a register variable or a scratch register, and might be able (with
appropriate help from the assembler) to add an integer constant (ICON), a static memory cell
(NAME), or a stack location (OREG).
It is usually attractive to specify a number of such shapes, and distinguish between them
when the assembler output is produced. It is possible to describe the union of many elementary shapes such as ICON, NAME, OREG, AREG or BREG (both scratch and register forms),
etc. To handle at least the simple forms of indirection, one can also match some more complicated forms of trees; STARNM and STARREG can match more complicated trees headed by
an indirection operator, and SFLD can match certain trees headed by a FLD operator: these
patterns call machine dependent routines that match the patterns of interest on a given
machine. The shape SWADD may be used to recognize NAME or OREG nodes that lie on
word boundaries: this may be of some importance on word-addressed machines. Finally,
there are some special shapes: these may not be used in conjunction with the other shapes,
but may be defined and extended in machine dependent ways. The special shapes SZERO,
SONE, and SMONE are predefined and match constants 0, 1, and -1, respectively; others are
easy to add and match by using the machine dependent routine special.
When a template has been found that matches the root of the tree, the cookie, and the
shapes and types of the descendants, there is still one bar to a total match: the template may
call for some resources (for example, a scratch register). The routine allo is called, and it
attempts to allocate the resources. If it cannot, the match fails; no resources are allocated. If
successful, the allocated resources are given numbers 1, 2, etc. for later reference when the
assembly code is generated. The routines expand and reclaim are then called. The match
routine then returns a special value, MDONE. If no match was found, the value MNOPE is
returned; this is a signal to the caller to try more cookie values, or attempt a rewriting rule.
Match is also used to select rewriting rules, although the way of doing this is pretty straightforward. A special cookie, FORREW, is used to ask match to search for a rewriting rule. The
rewriting rules are keyed to various opcodes; most are carried out in order. Since the question
of when to rewrite is one of the key issues in code generation, it will be taken up again later.

Register Allocation.
The register allocation routines, and the allocation strategy, play a central role in the
correctness of the code generation algorithm. If there are bugs in the Sethi-Ullman computation that cause the number of needed registers to be underestimated, the compiler may run
out of scratch registers; it is essential that the allocator keep track of those registers that are
free and busy, in order to detect such conditions.
Allocation of registers takes place as the result of a template match; the routine allo is
called with a word describing the number of A registers, B registers, and temporary locations
needed. The allocation of temporary locations on the stack is relatively straightforward, and
will not be further covered; the bookkeeping is a bit tricky, but conceptually trivial, and
requests for temporary space on the stack will never fail.
Register allocation is less straightforward. The two major complications are pairing and
sharing. In many machines, some operations (such as multiplication and division), and/or
some types (such as longs or double precision) require even/odd pairs of registers. Operations
of the first type are exceptionally difficult to deal with in the compiler; in fact, their theoretical properties are rather bad as well. 9 The second issue is dealt with rather more successfully;
a machine dependent function called szty(t) is called that returns 1 or 2, depending on the
number of A registers required to hold an object of type t. If szty returns 2, an even/odd pair
of A registers is allocated for each request.
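The handling of types that need a pair might be sketched as follows. The register count, the busy table, the type codes, and the names szty_sketch and alloc_for_type are assumptions made for illustration; the real allocator also deals with B registers, temporaries, and sharing.

```c
#define NREGS 8

static int busy[NREGS];   /* nonzero if the register is in use */

/* Stand-in for the machine dependent szty(t). Assumption: type
 * code 1 (say, a double) needs two registers, anything else one. */
static int szty_sketch(int t)
{
    return t == 1 ? 2 : 1;
}

/* Allocate registers for an object of type t. For a two-register
 * type, only even/odd pairs (0,1), (2,3), ... are considered.
 * Returns the first register number, or -1 if none is free. */
int alloc_for_type(int t)
{
    int r, step = szty_sketch(t);

    for (r = 0; r + step <= NREGS; r += step) {
        if (busy[r] || (step == 2 && busy[r + 1]))
            continue;
        busy[r] = 1;
        if (step == 2)
            busy[r + 1] = 1;
        return r;
    }
    return -1;
}
```

Note that a single busy register spoils the whole pair it belongs to, which is one reason pair requirements are awkward to account for when estimating register needs.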
The other issue, sharing, is more subtle, but important for good code quality. When
registers are allocated, it is possible to reuse registers that hold address information, and use
them to contain the values computed or accessed. For example, on the IBM 360, if register 2
has a pointer to an integer in it, we may load the integer into register 2 itself by saying:
L       2,0(2)

If register 2 had a byte pointer, however, the sequence for loading a character involves clearing
the target register first, and then inserting the desired character:

SR      3,3
IC      3,0(2)

In the first case, if register 3 were used as the target, it would lead to a larger number of registers used for the expression than were required; the compiler would generate inefficient code.
On the other hand, if register 2 were used as the target in the second case, the code would
simply be wrong. In the first case, register 2 can be shared while in the second, it cannot.
In the specification of the register needs in the templates, it is possible to indicate
whether required scratch registers may be shared with possible registers on the left or the
right of the input tree. In order that a register be shared, it must be scratch, and it must be
used only once, on the appropriate side of the tree being compiled.
The allo routine thus has a bit more to do than meets the eye; it calls freereg to obtain
a free register for each A and B register request. Freereg makes multiple calls on the routine
usable to decide if a given register can be used to satisfy a given need. Usable calls shareit if
the register is busy, but might be shared. Finally, shareit calls ushare to decide if the desired
register is actually in the appropriate subtree, and can be shared.
Just to add additional complexity, on some machines (such as the IBM 370) it is possible
to have "double indexing" forms of addressing; these are represented by OREGs with the
base and index registers encoded into the register field. While the register allocation and
deallocation per se is not made more difficult by this phenomenon, the code itself is somewhat
more complex.
Having allocated the registers and expanded the assembly language, it is time to reclaim
the resources; the routine reclaim does this. Many operations produce more than one result.
For example, many arithmetic operations may produce a value in a register, and also set the
condition codes. Assignment operations may leave results both in a register and in memory.
Reclaim is passed three parameters: the tree and cookie that were matched, and the rewriting
field of the template. The rewriting field allows the specification of possible results; the tree is
rewritten to reflect the results of the operation. If the tree was computed for side effects only
(FOREFF), the tree is freed, and all resources in it reclaimed. If the tree was computed for
condition codes, the resources are also freed, and the tree replaced by a special node type,
FORCC. Otherwise, the value may be found in the left argument of the root, the right argument of the root, or one of the temporary resources allocated. In these cases, first the
resources of the tree, and the newly allocated resources, are freed; then the resources needed
by the result are made busy again. The final result must always match the shape of the input
cookie; otherwise, the compiler error "cannot reclaim" is generated. There are some machine
dependent ways of preferring results in registers or memory when there are multiple results
matching multiple goals in the cookie.

The Machine Dependent Interface
The files order.c, local2.c, and table.c, as well as the header file mac2defs, represent the
machine dependent portion of the second pass. The machine dependent portion can be
roughly divided into two: the easy portion and the hard portion. The easy portion tells the
compiler the names of the registers, and arranges that the compiler generate the proper
assembler formats, opcode names, location counters, etc. The hard portion involves the
Sethi-Ullman computation, the rewriting rules, and, to some extent, the templates. It is hard
because there are no real algorithms that apply; most of this portion is based on heuristics.
This section discusses the easy portion; the next several sections will discuss the hard portion.
If the compiler is adapted from a compiler for a machine of similar architecture, the easy
part is indeed easy. In mac2defs, the register numbers are defined, as well as various parameters for the stack frame, and various macros that describe the machine architecture. If double
indexing is to be permitted, for example, the symbol R2REGS is defined. Also, a number of
macros that are involved in function call processing, especially for unusual function call
mechanisms, are defined here.
In local2.c, a large number of simple functions are defined. These do things such as
write out opcodes, register names, and address forms for the assembler. Part of the function
call code is defined here; that is nontrivial to design, but typically rather straightforward to
implement. Among the easy routines in order.c are routines for generating a created label,
defining a label, and generating the arguments of a function call.
These routines tend to have a local effect, and depend in a fairly straightforward way on
the target assembler and the design decisions already made about the compiler. Thus they
will not be further treated here.
The Rewriting Rules
When a tree fails to match any template, it becomes a candidate for rewriting. Before
the tree is rewritten, the machine dependent routine nextcook is called with the tree and the
cookie; it suggests another cookie that might be a better candidate for the matching of the
tree. If all else fails, the templates are searched with the cookie FORREW, to look for a
rewriting rule. The rewriting rules are of two kinds; for most of the common operators, there
are machine dependent rewriting rules that may be applied; these are handled by machine
dependent functions that are called and given the tree to be computed. These routines may
recursively call order or codgen to cause certain subgoals to be achieved; if they actually call
for some alteration of the tree, they return 1, and the code generation algorithm recanonicalizes and tries again. If these routines choose not to deal with the tree, the default rewriting
rules are applied.
The assignment ops, when rewritten, call the routine setasg. This is assumed to rewrite
the tree at least to the point where there are no side effects in the left hand side. If there is
still no template match, a default rewriting is done that causes an expression such as
a += b
to be rewritten as
a = a + b
This is a useful default for certain mixtures of strange types (for example, when a is a bit field
and b a character) that otherwise might need separate table entries.
Simple assignment, structure assignment, and all forms of calls are handled completely
by the machine dependent routines. For historical reasons, the routines generating the calls
return 1 on failure, 0 on success, unlike the other routines.
The machine dependent routine setbin handles binary operators; it too must do most of
the job. In particular, when it returns 0, it must do so with the left hand side in a temporary
register. The default rewriting rule in this case is to convert the binary operator into the associated assignment operator; since the left hand side is assumed to be a temporary register, this
preserves the semantics and often allows a considerable saving in the template table.
The increment and decrement operators may be dealt with by the machine dependent
routine setincr. If this routine chooses not to deal with the tree, the rewriting rule replaces
x++
by
((x += 1) - 1)
which preserves the semantics. Once again, this is not too attractive for the most common
cases, but can generate close to optimal code when the type of x is unusual.
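That the default rewriting preserves the semantics can be checked directly in C itself. The two functions below are hypothetical names introduced for the comparison; both yield the old value of the variable and leave it incremented by one.

```c
/* The original form: the value is the old x, and x is incremented. */
int post_incr(int *x)
{
    return (*x)++;
}

/* The default rewriting: increment first, then subtract 1 to
 * recover the old value. */
int post_incr_rewritten(int *x)
{
    return ((*x += 1) - 1);
}
```

The rewritten form costs an extra subtraction, which is why a direct template or a machine dependent setincr is preferable for the common cases.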
Finally, the indirection (UNARY MUL) operator is also handled in a special way. The
machine dependent routine offstar is extremely important for the efficient generation of code.
Offstar is called with a tree that is the direct descendant of a UNARY MUL node; its job is to
transform this tree so that the combination of UNARY MUL with the transformed tree
becomes addressable. On most machines, offstar can simply compute the tree into an A or B
register, depending on the architecture, and then canon will make the resulting tree into an
OREG. On many machines, offstar can profitably choose to do less work than computing its
entire argument into a register. For example, if the target machine supports OREGS with a
constant offset from a register, and offstar is called with a tree of the form
expr + const
where const is a constant, then offstar need only compute expr into the appropriate form of
register. On machines that support double indexing, offstar may have even more choice as to
how to proceed. The proper tuning of offstar, which is not typically too difficult, should be
one of the first tries at optimization attempted by the compiler writer.

The Sethi-Ullman Computation
The heart of the heuristics is the computation of the Sethi-Ullman numbers. This computation is closely linked with the rewriting rules and the templates. As mentioned before,
the Sethi-Ullman numbers are expected to estimate the number of scratch registers needed to
compute the subtrees without using any stores. However, the original theory does not apply
to real machines. For one thing, the theory assumes that all registers are interchangeable.
Real machines have general purpose, floating point, and index registers, register pairs, etc.
The theory also does not account for side effects; this rules out various forms of pathology
that arise from assignment and assignment ops. Condition codes are also undreamed of.
Finally, the influence of types, conversions, and the various addressability restrictions and
extensions of real machines are also ignored.
Nevertheless, for a "useless" theory, the basic insight of Sethi and Ullman is amazingly
useful in a real compiler. The notion that one should attempt to estimate the resource needs
of trees before starting the code generation provides a natural means of splitting the code generation problem, and provides a bit of redundancy and self checking in the compiler. Moreover, if writing the Sethi-Ullman routines is hard, describing, writing, and debugging the alternative (routines that attempt to free up registers by stores into temporaries "on the fly") is
even worse. Nevertheless, it should be clearly understood that these routines exist in a realm
where there is no "right" way to write them; it is an art, the realm of heuristics, and, consequently, a major source of bugs in the compiler. Often, the early, crude versions of these routines give little trouble; only after the compiler is actually working and the code quality is
being improved do serious problems have to be faced. Having a simple, regular machine architecture is worth quite a lot at this time.
The major problems arise from asymmetries in the registers: register pairs, having
different kinds of registers, and the related problem of needing more than one register (frequently a pair) to store certain data types (such as longs or doubles). There appears to be no
general way of treating this problem; solutions have to be fudged for each machine where the
problem arises. On the Honeywell 66, for example, there are only two general purpose registers, so a need for a pair is the same as the need for two registers. On the IBM 370, the register pair (0,1) is used to do multiplications and divisions; registers 0 and 1 are not generally
considered part of the scratch registers, and so do not require allocation explicitly. On the
Interdata 8/32, after much consideration, the decision was made not to try to deal with the
register pair issue; operations such as multiplication and division that required pairs were simply assumed to take all of the scratch registers. Several weeks of effort had failed to produce
an algorithm that seemed to have much chance of running successfully without inordinate
debugging effort. The difficulty of this issue should not be minimized; it represents one of the
main intellectual efforts in porting the compiler. Nevertheless, this problem has been fudged
with a degree of success on nearly a dozen machines, so the compiler writer should not abandon hope.
The Sethi-Ullman computations interact with the rest of the compiler in a number of
rather subtle ways. As already discussed, the store routine uses the Sethi-Ullman numbers to
decide which subtrees are too difficult to compute in registers, and must be stored. There are
also subtle interactions between the rewriting routines and the Sethi-Ullman numbers. Suppose we have a tree such as

A - B
where A and B are expressions; suppose further that B takes two registers, and A one. It is
possible to compute the full expression in two registers by first computing B, and then, using
the scratch register used by B, but not containing the answer, compute A. The subtraction
can then be done, computing the expression. (Note that this assumes a number of things, not
the least of which are register-to-register subtraction operators and symmetric registers.) If the
machine dependent routine setbin, however, is not prepared to recognize this case and compute the more difficult side of the expression first, the Sethi-Ullman number must be set to
three. Thus, the Sethi-Ullman number for a tree should represent the code that the
machine-dependent routines are actually willing to generate.
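The A-B example can be made concrete with a minimal sketch. It assumes an idealized machine with symmetric registers and models A and B as leaves of known cost; the names node, su_number, and su_number_left_first are hypothetical and are not the compiler's own. The first function is the textbook labelling, which assumes the harder operand is computed first; the second models a code generator that always computes the left operand first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression node; a leaf has both children NULL. */
struct node {
    struct node *left, *right;
    int leaf_cost;              /* registers needed when the node is a leaf */
};

static int max_int(int a, int b) { return a > b ? a : b; }

static int is_leaf(const struct node *n)
{
    return n->left == NULL && n->right == NULL;
}

/* Textbook labelling for binary trees: assumes the harder operand is
 * evaluated first and that all registers are interchangeable. */
int su_number(const struct node *n)
{
    assert(n != NULL);
    if (is_leaf(n))
        return n->leaf_cost;
    int l = su_number(n->left);
    int r = su_number(n->right);
    return l == r ? l + 1 : max_int(l, r);
}

/* Labelling for a generator that always evaluates the left operand
 * first: its result ties up one register while the right is computed. */
int su_number_left_first(const struct node *n)
{
    assert(n != NULL);
    if (is_leaf(n))
        return n->leaf_cost;
    int l = su_number_left_first(n->left);
    int r = su_number_left_first(n->right);
    return max_int(l, r + 1);
}
```

With A costing one register and B two, the first function gives two and the second gives three, which is the difference the text describes.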
The interaction can go the other way. If we take an expression such as
*(p+i)

where p is a pointer and i an integer, this can probably be done in one register on most
machines. Thus, its Sethi-Ullman number would probably be set to one. If double indexing is
possible in the machine, a possible way of computing the expression is to load both p and i
into registers, and then use double indexing. This would use two scratch registers; in such a
case, it is possible that the scratch registers might be unobtainable, or might make some other
part of the computation run out of registers. The usual solution is to cause offstar to ignore
opportunities for double indexing that would tie up more scratch registers than the Sethi-Ullman number had reserved.
In summary, the Sethi-Ullman computation represents much of the craftsmanship and
artistry in any application of the portable compiler. It is also a frequent source of bugs. Algorithms are available that will produce nearly optimal code for specialized machines, but unfortunately most existing machines are far removed from these ideals. The best way of proceeding in practice is to start with a compiler for a similar machine to the target, and proceed very
carefully.

Register Allocation
After the Sethi-Ullman numbers are computed, order calls a routine, rallo, that does
register allocation, if appropriate. This routine does relatively little, in general; this is especially true if the target machine is fairly regular. There are a few cases where it is assumed
that the result of a computation takes place in a particular register; switch and function
return are the two major places. The expression tree has a field, rall, that may be filled with
a register number; this is taken to be a preferred register, and the first temporary register allocated by a template match will be this preferred one, if it is free. If not, no particular action
is taken; this is just a heuristic. If no register preference is present, the field contains
NOPREF. In some cases, the result must be placed in a given register, no matter what. The
register number is placed in rall, and the mask MUSTDO is logically or'ed in with it. In this
case, if the subtree is requested in a register, and comes back in a register other than the
demanded one, it is moved by calling the routine rmove. If the target register for this move is
busy, it is a compiler error.
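The preference mechanism can be sketched as follows. The names NOPREF and MUSTDO come from the text, but the numeric values, the register count NREGS, and the functions pick_register and move_target are assumptions made for illustration only.

```c
#include <assert.h>

/* Illustrative values only; the real definitions are in the compiler's headers. */
#define NOPREF  0x1000          /* no register preference */
#define MUSTDO  0x2000          /* result must end up in the named register */
#define NREGS   4               /* hypothetical number of scratch registers */

/* Pick a scratch register for a result.  `busy` is a bit mask of the
 * registers in use.  Returns -1 if no register is free. */
int pick_register(int rall, unsigned busy)
{
    if (rall != NOPREF) {
        int want = rall & ~MUSTDO;
        assert(want >= 0 && want < NREGS);
        if (!(busy & (1u << want)))
            return want;        /* the preferred register is free */
    }
    for (int r = 0; r < NREGS; r++)
        if (!(busy & (1u << r)))
            return r;           /* preference is only a heuristic */
    return -1;
}

/* After a subtree has been computed into register `got`, decide whether
 * a register-to-register move is required.  Returns the target register,
 * or -1 when no move is needed. */
int move_target(int rall, int got)
{
    if (rall == NOPREF || !(rall & MUSTDO))
        return -1;
    int want = rall & ~MUSTDO;
    return want == got ? -1 : want;
}
```

Only the MUSTDO case ever yields a move, which matches the remark that this is the sole source of moves between scratch registers.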
Note that this mechanism is the only one that will ever cause a register-to-register move
between scratch registers (unless such a move is buried in the depths of some template). This
simplifies debugging. In some cases, there is a rather strange interaction between the register
allocation and the Sethi-Ullman number; if there is an operator or situation requiring a particular register, the allocator and the Sethi-Ullman computation must conspire to ensure that the
target register is not being used by some intermediate result of some far-removed computation. This is most easily done by making the special operation take all of the free registers,
preventing any other partially-computed results from cluttering up the works.
Compiler Bugs
The portable compiler has an excellent record of generating correct code. The requirement for reasonable cooperation between the register allocation, Sethi-Ullman computation,
rewriting rules, and templates builds quite a bit of redundancy into the compiling process.
The effect of this is that, in a surprisingly short time, the compiler will start generating
correct code for those programs that it can compile. The hard part of the job then becomes
finding and eliminating those situations where the compiler refuses to compile a program
because it knows it cannot do it right. For example, a template may simply be missing; this
may either give a compiler error of the form "no match for op ... ", or cause the compiler to go
into an infinite loop applying various rewriting rules. The compiler has a variable, nrecur,
that is set to 0 at the beginning of an expression, and incremented at key spots in the
compilation process; if this parameter gets too large, the compiler decides that it is in a loop,
and aborts. Loops are also characteristic of botches in the machine-dependent rewriting rules.
Bad Sethi-Ullman computations usually cause the scratch registers to run out; this often
means that the Sethi-Ullman number was underestimated, so store did not store something it
should have; alternatively, it can mean that the rewriting rules were not smart enough to find
the sequence that sucomp assumed would be used.
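The loop guard can be illustrated with a toy model. Only the name nrecur is taken from the text; the limit, the integer "forms", and the two rewriting rules that undo each other are hypothetical.

```c
#include <assert.h>

#define NRECUR_LIMIT 1000       /* hypothetical threshold */

static int nrecur;

/* Two deliberately botched rewriting rules that undo each other:
 * form 0 is rewritten to form 1, and form 1 back to form 0. */
static int rewrite(int form) { return form == 0 ? 1 : 0; }

/* Returns 0 when a matchable form (here, form 2) is reached,
 * or -1 when the loop guard trips. */
int compile_expression(int form)
{
    assert(form >= 0 && form <= 2);
    nrecur = 0;                 /* reset at the start of each expression */
    while (form != 2) {
        if (++nrecur > NRECUR_LIMIT)
            return -1;          /* the compiler decides it is in a loop */
        form = rewrite(form);
    }
    return 0;
}
```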
The best approach when a compiler error is detected involves several stages. First, try to
get a small example program that steps on the bug. Second, turn on various debugging flags
in the code generator, and follow the tree through the process of being matched and rewritten.
Some flags of interest are -e, which prints the expression tree, -r, which gives information
about the allocation of registers, -a, which gives information about the performance of rallo,
and -o, which gives information about the behavior of order. This technique should allow
most bugs to be found relatively quickly.
Unfortunately, finding the bug is usually not enough; it must also be fixed! The
difficulty arises because a fix to the particular bug of interest tends to break other code that
already works. Regression tests, tests that compare the performance of a new compiler against
the performance of an older one, are very valuable in preventing major catastrophes.

Summary and Conclusion
The portable compiler has been a useful tool for providing C capability on a large
number of diverse machines, and for testing a number of theoretical constructs in a practical
setting. It has many blemishes, both in style and functionality. It has been applied to many
more machines than first anticipated, of a much wider range than originally dreamed of. Its
use has also spread much faster than expected, leaving parts of the compiler still somewhat
raw in shape.
On the theoretical side, there is some hope that the skeleton of the sucomp routine
could be generated for many machines directly from the templates; this would give a considerable boost to the portability and correctness of the compiler, but might affect tunability and
code quality. There is also room for more optimization, both within optim and in the form of
a portable "peephole" optimizer.
On the practical, development side, the compiler could probably be sped up and made
smaller without doing too much violence to its basic structure. Parts of the compiler deserve
to be rewritten; the initialization code, register allocation, and parser are prime candidates. It
might be that doing some or all of the parsing with a recursive descent parser might save
enough space and time to be worthwhile; it would certainly ease the problem of moving the
compiler to an environment where Yacc is not already present.
Finally, I would like to thank the many people who have sympathetically, and even
enthusiastically, helped me grapple with what has been a frustrating program to write, test,
and install. D. M. Ritchie and E. N. Pinson provided needed early encouragement and philosophical guidance; M. E. Lesk, R. Muha, T. G. Peterson, G. Riddle, L. Rosier, R. W. Mitze, B.
R. Rowland, S. I. Feldman, and T. B. London have all contributed ideas, gripes, and all, at
one time or another, climbed "into the pits" with me to help debug. Without their help this
effort would have not been possible; with it, it was often kind of fun.

References
1. B. W. Kernighan and D. M. Ritchie, The C Programming Language, Prentice-Hall,
   Englewood Cliffs, New Jersey, 1978.
2. S. C. Johnson, "Lint, a C Program Checker," Comp. Sci. Tech. Rep. No. 65, 1978.
   Updated version TM 78-1273-3.
3. A. Snyder, A Portable Compiler for the Language C, Master's Thesis, M.I.T., Cambridge, Mass., 1974.
4. S. C. Johnson, "A Portable Compiler: Theory and Practice," Proc. 5th ACM Symp. on
   Principles of Programming Languages, pp. 97-104, January 1978.
5. M. E. Lesk, S. C. Johnson, and D. M. Ritchie, The C Language Calling Sequence, 1977.
6. S. C. Johnson, "Yacc - Yet Another Compiler-Compiler," Comp. Sci. Tech. Rep. No.
   32, Bell Laboratories, Murray Hill, New Jersey, July 1975.
7. A. V. Aho and S. C. Johnson, "Optimal Code Generation for Expression Trees," J.
   Assoc. Comp. Mach., vol. 23, no. 3, pp. 488-501, 1975. Also in Proc. ACM Symp. on
   Theory of Computing, pp. 207-217, 1975.
8. R. Sethi and J. D. Ullman, "The Generation of Optimal Code for Arithmetic Expressions," J. Assoc. Comp. Mach., vol. 17, no. 4, pp. 715-728, October 1970. Reprinted as
   pp. 229-247 in Compiler Techniques, ed. B. W. Pollack, Auerbach, Princeton NJ (1972).
9. A. V. Aho, S. C. Johnson, and J. D. Ullman, "Code Generation for Machines with Multiregister Operations," Proc. 4th ACM Symp. on Principles of Programming Languages,
   pp. 21-28, January 1977.

A Tour Through the UNIX† C Compiler
D. M. Ritchie
Bell Laboratories,
Murray Hill, New Jersey 07974

The Intermediate Language
Communication between the two phases of the compiler proper is carried out by means
of a pair of intermediate files. These files are treated as having identical structure, although
the second file contains only the code generated for strings. It is convenient to write strings
out separately to reduce the need for multiple location counters in a later assembly phase.
The intermediate language is not machine-independent; its structure in a number of
ways reflects the fact that C was originally a one-pass compiler chopped in two to reduce the
maximum memory requirement. In fact, only the latest version of the compiler has a complete intermediate language at all. Until recently, the first phase of the compiler generated
assembly code for those constructions it could deal with, and passed expression parse trees, in
absolute binary form, to the second phase for code generation. Now, at least, all inter-phase
information is passed in a describable form, and there are no absolute pointers involved, so
the coupling between the phases is not so strong.
The areas in which the machine (and system) dependencies are most noticeable are
1. Storage allocation for automatic variables and arguments has already been performed,
   and nodes for such variables refer to them by offset from a display pointer. Type
   conversion (for example, from integer to pointer) has already occurred using the assumption of byte addressing and 2-byte words.
2. Data representations suitable to the PDP-11 are assumed; in particular, floating point
   constants are passed as four words in the machine representation.
As it happens, each intermediate file is represented as a sequence of binary numbers
without any explicit demarcations. It consists of a sequence of conceptual lines, each headed
by an operator, and possibly containing various operands. The operators are small numbers;
to assist in recognizing failure in synchronization, the high-order byte of each operator word is
always the octal number 376. Operands are either 16-bit binary numbers or strings of characters representing names. Each name is terminated by a null character. There is no alignment
requirement for numerical operands and so there is no padding after a name string.
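A reader for this layout might look like the following sketch. It works on an in-memory copy of the file and assumes PDP-11 byte order (low byte first) for the 16-bit numbers; the cursor structure and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor over an in-memory copy of an intermediate file. */
struct cursor {
    const unsigned char *p;
    const unsigned char *end;
};

/* Read one 16-bit number, low byte first (PDP-11 byte order, assumed).
 * Returns -1 at end of input. */
long read_word(struct cursor *c)
{
    assert(c != NULL);
    if (c->end - c->p < 2)
        return -1;
    unsigned lo = c->p[0], hi = c->p[1];
    c->p += 2;
    return (long)(lo | (hi << 8));
}

/* Read an operator word and check the synchronization byte (octal 376).
 * Returns the small operator number, or -1 on end of input or loss of
 * synchronization. */
int read_operator(struct cursor *c)
{
    long w = read_word(c);
    if (w < 0 || ((w >> 8) & 0377) != 0376)
        return -1;
    return (int)(w & 0377);
}

/* Read a null-terminated name.  Returns a pointer to it, or NULL if the
 * input ends before the terminator.  No padding is skipped afterwards. */
const char *read_name(struct cursor *c)
{
    const unsigned char *start = c->p;
    while (c->p < c->end && *c->p != '\0')
        c->p++;
    if (c->p == c->end)
        return NULL;
    c->p++;                     /* step over the terminating null */
    return (const char *)start;
}
```

Because names are not padded, a number that follows an odd-length name starts on an odd byte, which is why the sketch reads bytes rather than aligned words.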
The binary representation was chosen to avoid the necessity of converting to and from
character form and to minimize the size of the files. It would be very easy to make each
operator-operand 'line' in the file be a genuine, printable line, with the numbers in octal or
decimal; this in fact was the representation originally used.
The operators fall naturally into two classes: those which represent part of an expression,
and all others. Expressions are transmitted in a reverse-Polish notation; as they are being
read, a tree is built which is isomorphic to the tree constructed in the first phase. Expressions
are passed as a whole, with no non-expression operators intervening. The reader maintains a
stack; each leaf of the expression tree (name, constant) is pushed on the stack; each unary
operator replaces the top of the stack by a node whose operand is the old top-of-stack; each
binary operator replaces the top pair on the stack with a single entry. When the expression is
complete there is exactly one item on the stack. Following each expression is a special operator which passes the unique previous expression to the 'optimizer' described below and then to
the code generator.

†UNIX is a Trademark of Bell Laboratories.
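The stack discipline used to rebuild the tree from the reverse-Polish stream can be sketched as below. The node layout, the three-way classification of operators, and the fixed stack size are assumptions for illustration, not the compiler's own data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum kind { LEAF, UNARY, BINARY };      /* hypothetical classification */

struct tnode {
    int op;
    struct tnode *left, *right;
};

#define STACK_MAX 64
static struct tnode *stack[STACK_MAX];
static int sp;                          /* number of items on the stack */

static struct tnode *mknode(int op, struct tnode *l, struct tnode *r)
{
    struct tnode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->op = op;
    n->left = l;
    n->right = r;
    return n;
}

/* Feed one operator from the reverse-Polish stream to the reader. */
void feed(enum kind k, int op)
{
    struct tnode *l, *r;

    switch (k) {
    case LEAF:                          /* name or constant: push it */
        assert(sp < STACK_MAX);
        stack[sp++] = mknode(op, NULL, NULL);
        break;
    case UNARY:                         /* replace the top of the stack */
        assert(sp >= 1);
        stack[sp - 1] = mknode(op, stack[sp - 1], NULL);
        break;
    case BINARY:                        /* replace the top pair by one entry */
        assert(sp >= 2);
        r = stack[--sp];
        l = stack[--sp];
        stack[sp++] = mknode(op, l, r);
        break;
    }
}

/* Called for the operator that follows a complete expression:
 * exactly one item must remain on the stack. */
struct tnode *finish(void)
{
    assert(sp == 1);
    return stack[--sp];
}
```

Feeding the stream a, b, unary minus, plus yields the tree for 'a + (-b)'.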
Here is the list of operators not themselves part of expressions.

EOF
marks the end of an input file.

BDATA flag data ...
specifies a sequence of bytes to be assembled as static data. It is followed by pairs of
words; the first member of the pair is non-zero to indicate that the data continue; a zero
flag is not followed by data and terminates the operator. The data bytes occupy the
low-order part of a word.
WDATA flag data ...
specifies a sequence of words to be assembled as static data; it is identical to the BDATA
operator except that entire words, not just bytes, are passed.

PROG
means that subsequent information is to be compiled as program text.

DATA
means that subsequent information is to be compiled as static data.
BSS
means that subsequent information is to be compiled as uninitialized static data.

SYMDEF name
means that the symbol name is an external name defined in the current program. It is
produced for each external data or function definition.
CSPACE name size
indicates that the name refers to a data area whose size is the specified number of bytes.
It is produced for external data definitions without explicit initialization.
SSPACE size
indicates that size bytes should be set aside for data storage. It is used to pad out short
initializations of external data and to reserve space for static (internal) data. It will be
preceded by an appropriate label.
EVEN
is produced after each external data definition whose size is not an integral number of
words. It is not produced after strings except when they initialize a character array.

NLABEL name
is produced just before a BDATA or WDATA initializing external data, and serves as a
label for the data.

RLABEL name
is produced just before each function definition, and labels its entry point.
SNAME name number
is produced at the start of each function for each static variable or label declared
therein. Subsequent uses of the variable will be in terms of the given number. The code
generator uses this only to produce a debugging symbol table.
ANAME name number
Likewise, each automatic variable's name and stack offset is specified by this operator.
Arguments count as automatics.
RNAME name number
Each register variable is similarly named, with its register number.
SAVE number
produces a register-save sequence at the start of each function, just after its label (RLABEL).
SETREG number
is used to indicate the number of registers used for register variables. It actually gives
the register number of the lowest free register; it is redundant because the RNAME
operators could be counted instead.
PROFIL
is produced before the save sequence for functions when the profile option is turned on.
It produces code to count the number of times the function is called.
SWIT deflab line label value ...
is produced for switches. When control flows into it, the value being switched on is in
the register forced by RFORCE (below). The switch statement occurred on the indicated line of the source, and the label number of the default location is deflab. Then the
operator is followed by a sequence of label-number and value pairs; the list is terminated
by a 0 label.
LABEL number
generates an internal label. It is referred to elsewhere using the given number.
BRANCH number
indicates an unconditional transfer to the internal label number given.
RETRN
produces the return sequence for a function. It occurs only once, at the end of each
function.
EXPR line
causes the expression just preceding to be compiled. The argument is the line number in
the source where the expression occurred.
NAME class type name
NAME class type number
indicates a name occurring in an expression. The first form is used when the name is
external; the second when the name is automatic, static, or a register. Then the number
indicates the stack offset, the label number, or the register number as appropriate. Class
and type encoding is described elsewhere.
CON type value
transmits an integer constant. This and the next two operators occur as part of expressions.
FCON type 4-word-value
transmits a floating constant as four words in PDP-11 notation.
SFCON type value
transmits a floating-point constant whose value is correctly represented by its high-order
word in PDP-11 notation.
NULL
indicates a null argument list of a function call in an expression; call is a binary operator
whose second operand is the argument list.
CBRANCH label cond
produces a conditional branch. It is an expression operator, and will be followed by an
EXPR. The branch to the label number takes place if the expression's truth value is the
same as that of cond. That is, if cond=1 and the expression evaluates to true, the
branch is taken.
binary-operator type
There are binary operators corresponding to each such source-language operator; the
type of the result of each is passed as well. Some perhaps-unexpected ones are:
COMMA, which is a right-associative operator designed to simplify right-to-left evaluation of function arguments; prefix and postfix ++ and --, whose second operand is the
increment amount, as a CON; QUEST and COLON, to express the conditional expression as 'a?(b:c)'; and a sequence of special operators for expressing relations between
pointers, in case pointer comparison is different from integer comparison (e.g. unsigned).
unary-operator type
There are also numerous unary operators. These include ITOF, FTOI, FTOL, LTOF,
ITOL, LTOI which convert among floating, long, and integer; JUMP which branches
indirectly through a label expression; INIT, which compiles the value of a constant
expression used as an initializer; RFORCE, which is used before a return sequence or a
switch to place a value in an agreed-upon register.
Expression Optimization
Each expression tree, as it is read in, is subjected to a fairly comprehensive analysis.
This is performed by the optim routine and a number of subroutines; the major things done
are
1. Modifications and simplifications of the tree so its value may be computed more
   efficiently and conveniently by the code generator.
2. Marking each interior node with an estimate of the number of registers required to
   evaluate it. This register count is needed to guide the code generation algorithm.
One thing that is definitely not done is discovery or exploitation of common subexpressions, nor is this done anywhere in the compiler.
The basic organization is simple: a depth-first scan of the tree. Optim does nothing for
leaf nodes (except for automatics; see below), and calls unoptim to handle unary operators.
For binary operators, it calls itself to process the operands, then treats each operator
separately. One important case is commutative and associative operators, which are handled
by acommute.
Here is a brief catalog of the transformations carried out by optim itself. It is not
intended to be complete. Some of the transformations are machine-dependent, although they
may well be useful on machines other than the PDP-11.
1. As indicated in the discussion of unoptim below, the optimizer can create a node type
   corresponding to the location addressed by a register plus a constant offset. Since this is
   precisely the implementation of automatic variables and arguments, where the register is
   fixed by convention, such variables are changed to the new form to simplify later processing.
2. Associative and commutative operators are processed by the special routine acommute.
3. After processing by acommute, the bitwise & operator is turned into a new andn operator; 'a & b' becomes 'a andn ~b'. This is done because the PDP-11 provides no and
   operator, but only andn. A similar transformation takes place for '=&'.
4. Relationals are turned around so the more complicated expression is on the left. (So
   that '2 > f(x)' becomes 'f(x) < 2'.) This improves code generation since the algorithm
   prefers to have the right operand require fewer registers than the left.
5. An expression minus a constant is turned into the expression plus the negative constant,
   and the acommute routine is called to take advantage of the properties of addition.
6. Operators with constant operands are evaluated.
7. Right shifts (unless by 1) are turned into left shifts with a negated right operand, since
   the PDP-11 lacks a general right-shift operator.
8. A number of special cases are simplified, such as division or multiplication by 1, and
   shifts by 0.
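Three of these rewrites can be checked with a small sketch. The function andn models the bit-clear operation the text refers to; all function names are hypothetical, and unsigned arithmetic stands in for the machine's wrap-around integer arithmetic.

```c
#include <assert.h>

/* Model of the andn operation: keep the bits of a that are NOT set in b. */
unsigned andn(unsigned a, unsigned b) { return a & ~b; }

/* Item 3: 'a & b' is rewritten as 'a andn ~b'. */
unsigned and_via_andn(unsigned a, unsigned b) { return andn(a, ~b); }

/* Item 5: 'x - c' is rewritten as 'x + (-c)'. */
unsigned sub_via_add(unsigned x, unsigned c) { return x + (0u - c); }

/* Item 4: the mirror image of a relational operator, used when the two
 * operands are exchanged so that the harder one ends up on the left. */
enum relop { LT, LE, GT, GE, EQ, NE };

enum relop swap_operands(enum relop op)
{
    switch (op) {
    case LT: return GT;
    case LE: return GE;
    case GT: return LT;
    case GE: return LE;
    default: return op;         /* EQ and NE are symmetric */
    }
}
```

Note that exchanging operands mirrors the operator ('>' becomes '<'); it does not negate it.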
The unoptim routine performs the same sort of processing for unary operators.
1. '*&x' and '&*x' are simplified to 'x'.
2. If r is a register and c is a constant or the address of a static or external variable, the
   expressions '*(r+c)' and '*r' are turned into a special kind of name node which expresses
   the name itself and the offset. This simplifies subsequent processing because such constructions can appear as the address of a PDP-11 instruction.
3. When the unary '&' operator is applied to a name node of the special kind just discussed, it is reworked to make the addition explicit again; this is done because the PDP-11 has no 'load address' instruction.
4. Constructions like '*r++' and '*--r' where r is a register are discovered and marked as
   being implementable using the PDP-11 auto-increment and -decrement modes.
5. If '!' is applied to a relational, the '!' is discarded and the sense of the relational is
   reversed.
6. Special cases involving reflexive use of negation and complementation are discovered.
7. Operations applying to constants are evaluated.
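The treatment of '!' applied to a relational can be sketched as a table of opposite operators. The enumeration and function names are hypothetical; eval is included only so the equivalence can be checked.

```c
#include <assert.h>

enum relop { LT, LE, GT, GE, EQ, NE };

/* Reverse the sense of a relational: '!(a < b)' becomes 'a >= b'. */
enum relop negate(enum relop op)
{
    switch (op) {
    case LT: return GE;
    case LE: return GT;
    case GT: return LE;
    case GE: return LT;
    case EQ: return NE;
    default: return EQ;         /* NE */
    }
}

/* Evaluate a relational on two integers, for checking purposes. */
int eval(enum relop op, int a, int b)
{
    switch (op) {
    case LT: return a < b;
    case LE: return a <= b;
    case GT: return a > b;
    case GE: return a >= b;
    case EQ: return a == b;
    default: return a != b;     /* NE */
    }
}
```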
The acommute routine, called for associative and commutative operators, discovers clusters of the same operator at the top levels of the current tree, and arranges them in a list: for
'a+((b+c)+(d+f))' the list would be 'a,b,c,d,f'. After each subtree is optimized, the list is
sorted in decreasing difficulty of computation; as mentioned above, the code generation algorithm works best when left operands are the difficult ones. The 'degree of difficulty' computed
is actually finer than the mere number of registers required; a constant is considered simpler
than the address of a static or external, which is simpler than reference to a variable. This
makes it easy to fold all the constants together, and also to merge together the sum of a constant and the address of a static or external (since in such nodes there is space for an 'offset'
value). There are also special cases, like multiplication by 1 and addition of 0.
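The sort-and-fold step for a cluster of '+' operands might be sketched as follows. The three difficulty classes follow the ordering given in the text, but the term structure and the function fold_sum are hypothetical, and the real routine also takes register counts into account.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical difficulty classes, simplest first. */
enum cls { CONST, ADDR, VAR };

struct term {
    enum cls cls;
    int value;          /* constant value, or an identifying number otherwise */
    int offset;         /* ADDR only: constant offset carried in the node */
};

/* qsort comparison: decreasing difficulty, so constants collect at the end. */
static int by_difficulty(const void *a, const void *b)
{
    const struct term *x = a, *y = b;
    return (int)y->cls - (int)x->cls;
}

/* Sort the operand list of a '+' cluster and fold what can be folded.
 * Returns the new number of terms. */
size_t fold_sum(struct term *t, size_t n)
{
    assert(t != NULL);
    qsort(t, n, sizeof *t, by_difficulty);

    /* Constants are now contiguous at the tail: add them together. */
    while (n >= 2 && t[n - 1].cls == CONST && t[n - 2].cls == CONST) {
        t[n - 2].value += t[n - 1].value;
        n--;
    }
    /* A constant next to a static or external address goes into its offset. */
    if (n >= 2 && t[n - 1].cls == CONST && t[n - 2].cls == ADDR) {
        t[n - 2].offset += t[n - 1].value;
        n--;
    }
    /* Special case: addition of 0 is dropped (unless it is the only term). */
    if (n >= 2 && t[n - 1].cls == CONST && t[n - 1].value == 0)
        n--;
    return n;
}
```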
A special routine is invoked to handle sums of products. Distrib is based on the fact that it is
better to compute 'c1*c2*x + c1*y' as 'c1*(c2*x + y)' and makes the divisibility tests required
to assure the correctness of the transformation. This transformation is rarely possible with
code directly written by the user, but it invariably occurs as a result of the implementation of
multi-dimensional arrays.
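The divisibility test can be illustrated with a short sketch; the function name and signature are hypothetical. Given a sum k1*x + k2*y with constant factors, it reports whether the sum can be rewritten as c*(m*x + y).

```c
#include <assert.h>
#include <stddef.h>

/* Try to rewrite k1*x + k2*y as c*(m*x + y), with c = k2 and m = k1/k2.
 * Succeeds (returns 1) only when k2 divides k1 exactly. */
int distrib(int k1, int k2, int *c, int *m)
{
    assert(c != NULL && m != NULL);
    if (k2 == 0 || k1 % k2 != 0)
        return 0;
    *c = k2;
    *m = k1 / k2;
    return 1;
}
```

As an example of the array case, assuming 2-byte integers and a declaration 'int a[10][20]', the byte offset of a[i][j] is 40*i + 2*j, which the test above allows to be computed as 2*(20*i + j).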
Finally, acommute reconstructs a tree from the list of expressions which result.
Code Generation
The grand plan for code-generation is independent of any particular machine; it depends
largely on a set of tables. But this fact does not necessarily make it very easy to modify the
compiler to produce code for other machines, both because there is a good deal of machine-dependent structure in the tables, and because in any event such tables are non-trivial to
prepare.
The arguments to the basic code generation routine rcexpr are a pointer to a tree
representing an expression, the name of a code-generation table, and the number of a register
in which the value of the expression should be placed. Rcexpr returns the number of the
register in which the value actually ended up; its caller may need to produce a mov instruction
if the value really needs to be in the given register. There are four code generation tables.
Regtab is the basic one, which actually does the job described above: namely, compile
code which places the value represented by the expression tree in a register.
Cctab is used when the value of the expression is not actually needed, but instead the
value of the condition codes resulting from evaluation of the expression. This table is used,
for example, to evaluate the expression after if. It is clearly silly to calculate the value (0 or 1)
of the expression 'a==b' in the context 'if (a==b) ... '
The sptab table is used when the value of an expression is to be pushed on the stack, for
example when it is an actual argument. For example in the function call 'f(a)' it is a bad idea
to load a into a register which is then pushed on the stack, when there is a single instruction
which does the job.
The efftab table is used when an expression is to be evaluated for its side effects, not its
value. This occurs mostly for expressions which are statements, which have no value. Thus
the code for the statement 'a = b' need produce only the appropriate mov instruction, and
need not leave the value of b in a register, while in the expression 'a + (b = c)' the value of 'b
= c' will appear in a register.
All of the tables besides regtab are rather small, and handle only a relatively few special
cases. If one of these subsidiary tables does not contain an entry applicable to the given
expression tree, rcexpr uses regtab to put the value of the expression into a register and then
fixes things up; nothing need be done when the table was efftab, but a tst instruction is produced when the table called for was cctab, and a mov instruction, pushing the register on the
stack, when the table was sptab.
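The fallback among the four tables can be sketched as below. The table names come from the text; the toy tree representation (a bit mask saying which tables have an applicable entry), the stub standing in for the table search, and the fixup enumeration are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum table { REGTAB, CCTAB, SPTAB, EFFTAB };
enum fixup { FIX_NONE, FIX_TST, FIX_PUSH };

/* Toy tree: bit i set means table i has an entry applicable to the tree. */
typedef unsigned toy_tree;

/* Stand-in for the table search: returns the register used, or -1. */
static int cexpr_stub(toy_tree t, enum table tab, int reg)
{
    return (t & (1u << tab)) ? reg : -1;
}

/* Try the requested table; if it has no entry, compile into a register
 * with regtab and report which fix-up instruction is then needed. */
int rcexpr_sketch(toy_tree t, enum table tab, int reg, enum fixup *fix)
{
    assert(fix != NULL);
    *fix = FIX_NONE;

    int r = cexpr_stub(t, tab, reg);
    if (r >= 0 || tab == REGTAB)
        return r;

    r = cexpr_stub(t, REGTAB, reg);
    if (r < 0)
        return -1;
    if (tab == CCTAB)
        *fix = FIX_TST;         /* set the condition codes with a tst */
    else if (tab == SPTAB)
        *fix = FIX_PUSH;        /* push the register with a mov */
    return r;                   /* efftab: nothing more to do */
}
```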
The rcexpr routine itself picks off some special cases, then calls cexpr to do the real
work. Cexpr tries to find an entry applicable to the given tree in the given table, and returns
-1 if no such entry is found, letting rcexpr try again with a different table. A successful
match yields a string containing both literal characters which are written out and pseudo-operations, or macros, which are expanded. Before studying the contents of these strings we
will consider how table entries are matched against trees.
Recall that most non-leaf nodes in an expression tree contain the name of the operator,
the type of the value represented, and pointers to the subtrees (operands). They also contain
an estimate of the number of registers required to evaluate the expression, placed there by the
expression-optimizer routines. The register counts are used to guide the code generation process, which is based on the Sethi-Ullman algorithm.
The main code generation tables consist of entries each containing an operator number
and a pointer to a subtable for the corresponding operator. A subtable consists of a sequence
of entries, each with a key describing certain properties of the operands of the operator
involved; associated with the key is a code string. Once the subtable corresponding to the
operator is found, the subtable is searched linearly until a key is found such that the properties demanded by the key are compatible with the operands of the tree node. A successful
match returns the code string; an unsuccessful search, either for the operator in the main
table or a compatible key in the subtable, returns a failure indication.
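The two-level search can be sketched as follows. The structures are hypothetical, and the key is reduced to a maximum acceptable difficulty for each operand; the real keys also carry type and indirection requirements, described later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of the two operands of a tree node. */
struct operands { int left_difficulty, right_difficulty; };

struct subentry {
    int left_max, right_max;    /* simplified key: maximum difficulty accepted */
    const char *code;           /* code string; NULL marks the end of a subtable */
};

struct mainentry {
    int op;                     /* operator number; 0 marks the end of the table */
    const struct subentry *sub;
};

/* Find the operator in the main table, then search its subtable linearly
 * for the first compatible key.  Returns the code string or NULL. */
const char *match(const struct mainentry *tab, int op, struct operands o)
{
    assert(tab != NULL);
    for (; tab->op != 0; tab++) {
        if (tab->op != op)
            continue;
        for (const struct subentry *s = tab->sub; s->code != NULL; s++)
            if (o.left_difficulty <= s->left_max &&
                o.right_difficulty <= s->right_max)
                return s->code; /* first compatible key wins */
        return NULL;            /* operator found, no compatible key */
    }
    return NULL;                /* operator not in the main table */
}
```

Because the first compatible key wins, the order of entries within a subtable matters: more specific keys have to come before more general ones.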
The tables are all contained in a file which must be processed to obtain an assembly
language program. Thus they are written in a special-purpose language. To provide
definiteness to the following discussion, here is an example of a subtable entry.
%n,aw
F
add A2,R
The '%' indicates the key; the information following (up to a blank line) specifies the code
string. Very briefly, this entry is in the subtable for '+' of regtab; the key specifies that the
left operand is any integer, character, or pointer expression, and the right operand is any word
quantity which is directly addressible (e.g. a variable or constant). The code string calls for
the generation of the code to compile the left (first) operand into the current register ('F') and
then to produce an 'add' instruction which adds the second operand ('A2') to the register ('R').
All of the notation will be explained below.
Only three features of the operands are used in deciding whether a match has occurred.
They are:
1. Is the type of the operand compatible with that demanded?
2. Is the 'degree of difficulty' (in a sense described below) compatible?
3. The table may demand that the operand have a '*' (indirection operator) as its highest
   operator.
As suggested above, the key for a subtable entry is indicated by a '%' and a comma-separated pair of specifications for the operands. (The second specification is ignored for
unary operators.) A specification indicates a type requirement by including one of the following letters. If no type letter is present, any integer, character, or pointer operand will satisfy
the requirement (not float, double, or long).
b
A byte (character) operand is required.
w
A word (integer or pointer) operand is required.
f
A float or double operand is required.
d
A double operand is required.
l
A long (32-bit integer) operand is required.
Before discussing the 'degree of difficulty' specification, the algorithm has to be
explained more completely. Rcexpr (and cexpr) are called with a register number in which to
place their result. Registers 0, 1, ... are used during evaluation of expressions; the maximum
register which can be used in this way depends on the number of register variables, but in any
event only registers 0 through 4 are available since r5 is used as a stack frame header and r6
(sp) and r7 (pc) have special hardware properties. The code generation routines assume that
when called with register n as argument, they may use n+1, ... (up to the first register
variable) as temporaries. Consider the expression 'X+Y', where both X and Y are expressions. As a first approximation, there are three ways of compiling code to put this expression
in register n.
1. If Y is an addressible cell, (recursively) put X into register n and add Y to it.
2. If Y is an expression that can be calculated in k registers, where k is smaller than the
   number of registers available, compile X into register n, Y into register n+1, and add
   register n+1 to n.
3. Otherwise, compile Y into register n, save the result in a temporary (actually, on the
   stack), compile X into register n, then add in the temporary.
The distinction between cases 2 and 3 therefore depends on whether the right operand
can be compiled in fewer than k registers, where k is the number of free registers left after
registers 0 through n are taken: 0 through n-1 are presumed to contain already computed
temporary results; n will, in case 2, contain the value of the left operand while the right is
being evaluated.
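The choice among the three cases can be sketched as a small decision function. The names are hypothetical. The text is not entirely consistent about whether the register bound is strict; this sketch assumes the 'no more than k' reading used in the description of the 'e' specification.

```c
#include <assert.h>

enum strategy { ADD_DIRECT, ADD_REGISTER, ADD_VIA_TEMP };

/* Number of temporaries usable above register n: n+1 up to, but not
 * including, the first register variable.  With no register variables
 * the first unavailable register is 5, so registers 0 through 4 remain. */
int free_after(int n, int first_regvar)
{
    assert(n >= 0 && n < first_regvar);
    return first_regvar - 1 - n;
}

/* Choose how to compile X+Y into register n.
 *   y_addressible: nonzero if Y is an addressible cell
 *   y_regs:        registers needed to compute Y
 *   k:             registers left after 0 through n are taken */
enum strategy choose(int y_addressible, int y_regs, int k)
{
    if (y_addressible)
        return ADD_DIRECT;      /* case 1: add Y straight from memory */
    if (y_regs <= k)
        return ADD_REGISTER;    /* case 2: Y goes into register n+1 */
    return ADD_VIA_TEMP;        /* case 3: Y first, saved on the stack */
}
```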
These considerations should make clear the specification codes for the degree of
difficulty, bearing in mind that a number of special cases are also present:
z
is satisfied when the operand is zero, so that special code can be produced for expressions like 'x = 0'.
1
is satisfied when the operand is the constant 1, to optimize cases like left and right shift
by 1, which can be done efficiently on the PDP-11.
c
is satisfied when the operand is a positive (16-bit) constant; this takes care of some special cases in long arithmetic.
a
is satisfied when the operand is addressable; this occurs not only for variables and constants, but also for some more complicated constructions, such as indirection through a
simple variable, '*p++' where p is a register variable (because of the PDP-11's autoincrement address mode), and '*(p+c)' where p is a register and c is a constant. Precisely, the requirement is that the operand refers to a cell whose address can be written
as a source or destination of a PDP-11 instruction.
e
is satisfied by an operand whose value can be generated in a register using no more than
k registers, where k is the number of registers left (not counting the current register).
The 'e' stands for 'easy.'
n
is satisfied by any operand. The 'n' stands for 'anything.'
These degrees of difficulty are considered to lie in a linear ordering and any operand
which satisfies an earlier-mentioned requirement will satisfy a later one. Since the subtables
are searched linearly, if a '1' specification is included, almost certainly a 'z' must be written
first to prevent expressions containing the constant 0 from being compiled as if the 0 were 1.
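The linear ordering and the linear search can be illustrated with a short sketch. It is a simplified model under stated assumptions: each operand is summarized by the most restrictive code it satisfies, the table is a plain list of (key, code) pairs, and the names ORDER, satisfies and first_match are hypothetical.

```python
# Degree-of-difficulty codes, most restrictive first, as listed in the text.
ORDER = ["z", "1", "c", "a", "e", "n"]

def satisfies(operand_class, required):
    """An operand meeting an earlier code also meets every later one."""
    return ORDER.index(operand_class) <= ORDER.index(required)

def first_match(entries, operand_class):
    """Search the subtable linearly and return the first matching code string."""
    for key, code in entries:
        if satisfies(operand_class, key):
            return code
    return None

# Without a 'z' entry ahead of '1', a zero operand falls into the '1' entry.
bad = [("1", "inc R"), ("n", "add (sp)+,R")]
good = [("z", ""), ("1", "inc R"), ("a", "add A2,R"), ("n", "add (sp)+,R")]
```

With the table named bad, a zero operand selects 'inc R', which is exactly the miscompilation the text warns about; the table named good avoids it.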
Finally, a key specification may contain a '*' which requires the operand to have an
indirection as its leading operator. Examples below should clarify the utility of this
specification.
Now let us consider the contents of the code string associated with each subtable entry.
Conventionally, lower-case letters in this string represent literal information which is copied
directly to the output. Upper-case letters generally introduce specific macro-operations, some
of which may be followed by modifying information. The code strings in the tables are written with tabs and new-lines used freely to suggest instructions which will be generated; the
table-compiling program compresses tabs (using the 0200 bit of the next character) and throws
away some of the new-lines. For example the macro 'F' is ordinarily written on a line by
itself; but since its expansion will end with a new-line, the new-line after 'F' itself is dispensable. This is all to reduce the size of the stored tables.
The first set of macro-operations is concerned with compiling subtrees. Recall that this
is done by the cexpr routine. In the following discussion the 'current register' is generally the
argument register to cexpr; that is, the place where the result is desired. The 'next register' is
numbered one higher than the current register. (This explanation isn't fully true because of
complications, described below, involving operations which require even-odd register pairs.)
F
causes a recursive call to the rcexpr routine to compile code which places the value of
the first (left) operand of the operator in the current register.
F1 generates code which places the value of the first operand in the next register. It is
incorrectly used if there might be no next register; that is, if the degree of difficulty of
the first operand is not 'easy', since another register might then not be available.
FS generates code which pushes the value of the first operand on the stack, by calling
rcexpr specifying sptab as the table.
Analogously,
S, S1, SS
compile the second (right) operand into the current register, the next register, or onto
the stack.
To deal with registers, there are
R
which expands into the name of the current register.
R1 which expands into the name of the next register.
R+ which expands into the name of the current register plus 1. It was suggested above
that this is the same as the next register, except for complications; here is one of them.
Long integer variables have 32 bits and require 2 registers; in such cases the next register
is the current register plus 2. The code would like to talk about both halves of the long
quantity, so R refers to the register with the high-order part and R+ to the low-order
part.
R- This is another complication, involving division and mod. These operations involve a
pair of registers of which the odd-numbered contains the left operand. Cexpr arranges
that the current register is odd; the R- notation allows the code to refer to the next
lower, even-numbered register.
To refer to addressable quantities, there are the notations:
A1 causes generation of the address specified by the first operand. For this to be legal, the
operand must be addressable; its key must contain an 'a' or a more restrictive
specification.
A2 correspondingly generates the address of the second operand providing it has one.
We now have enough mechanism to show a complete, if suboptimal, table for the +
operator on word or byte operands.

%n,z
    F

%n,1
    F
    inc R

%n,aw
    F
    add A2,R

%n,e
    F
    S1
    add R1,R

%n,n
    SS
    F
    add (sp)+,R
The first two sequences handle some special cases. Actually it turns out that handling a right
operand of 0 is unnecessary since the expression-optimizer throws out adds of 0. Adding 1 by
using the 'increment' instruction is done next, and then the case where the right operand is
addressable. It must be a word quantity, since the PDP-11 lacks an 'add byte' instruction.
Finally the cases where the right operand either can, or cannot, be done in the available registers are treated.
The next macro-instructions are conveniently introduced by noticing that the above
table is suitable for subtraction as well as addition, since no use is made of the commutativity
of addition. All that is needed is substitution of 'sub' for 'add' and 'dec' for 'inc.' Considerable saving of space is achieved by factoring out several similar operations.
I
is replaced by a string from another table indexed by the operator in the node being
expanded. This secondary table actually contains two strings per operator.
I' is replaced by the second string in the side table entry for the current operator.
Thus, given that the entries for '+' and '-' in the side table (which is called instab) are
'add' and 'inc,' 'sub' and 'dec' respectively, the middle of the above addition table can be
written

%n,1
    F
    I' R

%n,aw
    F
    I A2,R

and it will be suitable for subtraction, and several other operators, as well.
Next, there is the question of character and floating-point operations.
B1 generates the letter 'b' if the first operand is a character, 'f' if it is float or double, and
nothing otherwise. It is used in a context like 'movB1' which generates a 'mov', 'movb',
or 'movf' instruction according to the type of the operand.
B2 is just like B1 but applies to the second operand.

BE generates 'b' if either operand is a character and null otherwise.
BF generates 'f' if the type of the operator node itself is float or double, otherwise null.
For example, there is an entry in efftab for the '=' operator
%a,aw
%ab,a
    IBE A2,A1
Note first that two key specifications can be applied to the same code string. Next, observe
that when a word is assigned to a byte or to a word, or a byte is assigned to a byte, a single
instruction, a mov or movb as appropriate, does the job. However, when a byte is assigned to
a word, it must pass through a register to implement the sign-extension rules:
%a,n
    S
    IB1 R,A1
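The type-suffix macros can be modelled in a few lines. This is a sketch only: operand types are represented by the hypothetical strings "char", "int", "float" and "double", the instab string for '=' is assumed to be 'mov', and the address macros A1 and A2 are left unexpanded.

```python
def b1(first_type):
    """B1: 'b' for a character operand, 'f' for float or double, else nothing."""
    if first_type == "char":
        return "b"
    if first_type in ("float", "double"):
        return "f"
    return ""

def be(first_type, second_type):
    """BE: 'b' if either operand is a character, else nothing."""
    return "b" if "char" in (first_type, second_type) else ""

def expand_assign(first_type, second_type, opcode="mov"):
    """Expand 'IBE A2,A1', taking I to be the assumed opcode string."""
    return opcode + be(first_type, second_type) + " A2,A1"
```

For a byte destination, expand_assign("char", "int") gives 'movb A2,A1'; for two word operands it gives 'mov A2,A1'.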

Next, there is the question of handling indirection properly. Consider the expression 'X
+ *Y', where X and Y are expressions. Assuming that Y is more complicated than just a variable, but on the other hand qualifies as 'easy' in the context, the expression would be compiled by placing the value of X in a register, that of *Y in the next register, and adding the
registers. It is easy to see that a better job can be done by compiling X, then Y (into the next
register), and producing the instruction symbolized by 'add (R1),R'. This scheme avoids generating the instruction 'mov (R1),R1' required actually to place the value of *Y in a register.
A related situation occurs with the expression 'X + *(p+6)', which exemplifies a construction
frequent in structure and array references. The addition table shown above would produce
    [put X in register R]
    mov p,R1
    add $6,R1
    mov (R1),R1
    add R1,R
when the best code is
    [put X in R]
    mov p,R1
    add 6(R1),R
As we said above, a key specification for a code table entry may require an operand to have an
indirection as its highest operator. To make use of the requirement, the following macros are
provided.
F* The first operand must have the form *X. If in particular it has the form *(Y + c), for
some constant c, then code is produced which places the value of Y in the current register. Otherwise, code is produced which loads X into the current register.
F1* resembles F* except that the next register is loaded.
S* resembles F* except that the second operand is loaded.
S1* resembles S* except that the next register is loaded.
FS* The first operand must have the form '*X'. Push the value of X on the stack.
SS* resembles FS* except that it applies to the second operand.
To capture the constant that may have been skipped over in the above macros, there are
#1 The first operand must have the form *X; if in particular it has the form *(Y + c) for c
a constant, then the constant is written out, otherwise a null string.
#2 is the same as #1 except that the second operand is used.
Now we can improve the addition table above. Just before the '%n,e' entry, put
%n,ew*
    F
    S1*
    add #2(R1),R
and just before the '%n,n' put
%n,nw*
    SS*
    F
    add *(sp)+,R
When using the stacking macros there is no place to use the constant as an index word, so
that particular special case doesn't occur.
The constant mentioned above can actually be more general than a number. Any quantity acceptable to the assembler as an expression will do, in particular the address of a static
cell, perhaps with a numeric offset. If x is an external character array, the expression 'x[i+5]
= 0' will generate the code
    mov i,r0
    clrb x+5(r0)
via the table entry (in the '=' part of efftab)
%e*,z
    F
    I'B1 #1(R)
Some machine operations place restrictions on the registers used. The divide instruction, used
to implement the divide and mod operations, requires the dividend to be placed in the odd
member of an even-odd pair; other peculiarities of multiplication make it simplest to put the
multiplicand in an odd-numbered register. There is no theory which optimally accounts for
this kind of requirement. Cexpr handles it by checking for a multiply, divide, or mod operation; in these cases, its argument register number is incremented by one or two so that it is
odd, and if the operation was divide or mod, so that it is a member of a free even-odd pair.
The routine which determines the number of registers required estimates, conservatively, that
at least two registers are required for a multiplication and three for the other peculiar operators. After the expression is compiled, the register where the result actually ended up is
returned. (Divide and mod are actually the same operation except for the location of the
result).
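The register adjustment for these operators might look like the following sketch. It is an interpretation, not the compiler's code: the function name is hypothetical, operators are given as the strings '*', '/' and '%', and it is assumed that a multiply leaves an already-odd register unchanged while divide and mod must also keep the even partner at or above the requested register.

```python
def adjust_register(n, op):
    """Pick the register cexpr would actually use for operator op.

    n is the register requested by the caller; registers below n are
    presumed to hold earlier results and must not be disturbed.
    """
    if op == "*":
        # Multiply only needs an odd-numbered register.
        return n if n % 2 == 1 else n + 1
    if op in ("/", "%"):
        # Divide and mod need a free even-odd pair; the operand goes in
        # the odd member, so the even partner must not be below n.
        return n + 1 if n % 2 == 0 else n + 2
    return n
```

Under these assumptions a divide requested in register 1 is moved to register 3, so that the pair (2, 3) is used and register 1 is left alone.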
These operations are the ones which cause results to end up in unexpected places, and
this possibility adds a further level of complexity. The simplest way of handling the problem
is always to move the result to the place where the caller expected it, but this will produce
unnecessary register moves in many simple cases; 'a = b*c' would generate
    mov b,r1
    mul c,r1
    mov r1,r0
    mov r0,a
The next thought is to use the passed-back information as to where the result landed to change
the notion of the current register. While compiling the '=' operation above, which comes from
a table entry like
%a,e
    S
    mov R,A1

it is sufficient to redefine the meaning of 'R' after processing the 'S' which does the multiply.
This technique is in fact used; the tables are written in such a way that correct code is produced. The trouble is that the technique cannot be used in general, because it invalidates the
count of the number of registers required for an expression. Consider just 'a*b + X' where X
is some expression. The algorithm assumes that the value of a*b, once computed, requires
just one register. If there are three registers available, and X requires two registers to compute, then this expression will match a key specifying '%n,e'. If a*b is computed and left in
register 1, then there are, contrary to expectations, no longer two registers available to compute X, but only one, and bad code will be produced. To guard against this possibility, cexpr
checks the result returned by recursive calls which implement F, S and their relatives. If the
result is not in the expected register, then the number of registers required by the other
operand is checked; if it can be done using those registers which remain even after making
unavailable the unexpectedly-occupied register, then the notions of the 'next register' and possibly the 'current register' are redefined. Otherwise a register-copy instruction is produced. A
register-copy is also always produced when the current operator is one of those which have
odd-even requirements.
Finally, there are a few loose-end macro operations and facts about the tables. The
operators:
V
is used for long operations. It is written with an address like a machine instruction; it
expands into 'adc' (add carry) if the operation is an additive operator, 'sbc' (subtract
carry) if the operation is a subtractive operator, and disappears, along with the rest of
the line, otherwise. Its purpose is to allow common treatment of logical operations,
which have no carries, and additive and subtractive operations, which generate carries.
T
generates a 'tst' instruction if the first operand of the tree does not set the condition
codes correctly. It is used with divide and mod operations, which require a sign-extended 32-bit operand. The code table for the operations contains an 'sxt' (sign-extend) instruction to generate the high-order part of the dividend.
H
is analogous to the 'F' and 'S' macros, except that it calls for the generation of code for
the current tree (not one of its operands) using regtab. It is used in cctab for all the
operators which, when executed normally, set the condition codes properly according to
the result. It prevents a 'tst' instruction from being generated for constructions like 'if
(a+b) ...' since after calculation of the value of 'a+b' a conditional branch can be written
immediately.
All of the discussion above is in terms of operators with operands. Leaves of the expression tree (variables and constants), however, are peculiar in that they have no operands. In
order to regularize the matching process, cexpr examines its operand to determine if it is a
leaf; if so, it creates a special 'load' operator whose operand is the leaf, and substitutes it for
the argument tree; this allows the table entry for the created operator to use the 'Al' notation
to load the leaf into a register.
Purely to save space in the tables, pieces of subtables can be labelled and referred to
later. It turns out, for example, that rather large portions of the efftab table for the '='
and '=+' operators are identical. Thus '=' has an entry

%[move3:]
%a,aw
%ab,a
    IBE A2,A1
while part of the '=+' table is
%aw,aw
%   [move3]
Labels are written as '%[ ... : ]', before the key specifications; references are written with '% [
... ]' after the key. Peculiarities in the implementation make it necessary that labels appear
before references to them.
The example illustrates the utility of allowing separate keys to point to the same code
string. The assignment code works properly if either the right operand is a word, or the left
operand is a byte; but since there is no 'add byte' instruction the addition code has to be restricted to word operands.

Delaying and reordering
Intertwined with the code generation routines are two other, interrelated processes. The
first, implemented by a routine called delay, is based on the observation that naive code generation for the expression 'a = b++' would produce
    mov b,r0
    inc b
    mov r0,a
The point is that the table for postfix ++ has to preserve the value of b before incrementing
it; the general way to do this is to preserve its value in a register. A cleverer scheme would
generate
mov b,a
inc b
Delay is called for each expression input to rcexpr, and it searches for postfix ++ and -- operators. If one is found applied to a variable, the tree is patched to bypass the operator and
compiled as it stands; then the increment or decrement itself is done. The effect is as if 'a =
b; b++' had been written. In this example, of course, the user himself could have done the
same job, but more complicated examples are easily constructed, for example 'switch (x++)'.
An essential restriction is that the condition codes not be required. It would be incorrect to
compile 'if (a++) ...' as
tst a
inc a
beq
because the 'inc' destroys the required setting of the condition codes.
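The tree patching done by delay can be sketched on a toy tree representation. The representation is an assumption made for illustration: variables are strings, operators are tuples whose first element is the operator name, and 'post++' and 'post--' are hypothetical names for the postfix operators.

```python
POSTFIX = ("post++", "post--")

def _strip(tree, deferred):
    """Replace postfix ++/-- on a variable by the variable itself."""
    if isinstance(tree, str):
        return tree
    op, args = tree[0], tree[1:]
    if op in POSTFIX and len(args) == 1 and isinstance(args[0], str):
        deferred.append((op, args[0]))   # to be done after the expression
        return args[0]
    return (op,) + tuple(_strip(a, deferred) for a in args)

def delay(tree, needs_condition_codes=False):
    """Return the patched tree and the list of deferred increments.

    If the caller needs the condition codes, nothing is deferred, because a
    later 'inc' or 'dec' would destroy them.
    """
    deferred = []
    if needs_condition_codes:
        return tree, deferred
    return _strip(tree, deferred), deferred
```

For 'a = b++' the sketch returns the tree for 'a = b' together with one deferred increment of b, matching the effect of 'a = b; b++'.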
Reordering is a similar sort of optimization. Many cases which it detects are useful
mainly with register variables. If r is a register variable, the expression 'r = x+y' is best compiled as
mov x,r
add y,r
but the code tables would produce
    mov x,r0
    add y,r0
    mov r0,r
which is in fact preferred if r is not a register. (If r is not a register, the two sequences are the
same size, but the second is slightly faster.) The scheme is to compile the expression as if it
had been written 'r = x; r =+ y'. The reorder routine is called with a pointer to each tree
that rcexpr is about to compile; if it has the right characteristics, the 'r = x' tree is constructed and passed recursively to rcexpr; then the original tree is modified to read 'r =+ y'
and the calling instance of rcexpr compiles that instead. Of course the whole business is itself
recursive so that more extended forms of the same phenomenon are handled, like 'r = x + y |
z'.
Care does have to be taken to avoid 'optimizing' an expression like 'r = x + r' into 'r =
x; r =+ r'. It is required that the right operand of the expression on the right of the '=' be a
name, distinct from the register variable.
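The first reordering case, including the guard just described, can be sketched as follows. The tree representation (strings for names, three-element tuples for binary operators) and the set of operators treated this way are assumptions for illustration; the old-style assignment operators are written '=+', '=|' and so on, as in the text.

```python
# Binary operators assumed eligible for the rewrite (illustrative only).
FOLDABLE = {"+", "-", "|"}

def reorder(tree, register_vars):
    """Split 'r = x op y' into 'r = x' followed by 'r =op y' when allowed."""
    if not (isinstance(tree, tuple) and len(tree) == 3):
        return [tree]
    op, left, right = tree
    if op != "=" or left not in register_vars:
        return [tree]
    if not (isinstance(right, tuple) and len(right) == 3):
        return [tree]
    rop, x, y = right
    # The right operand must be a name distinct from the register variable.
    if rop not in FOLDABLE or not isinstance(y, str) or y == left:
        return [tree]
    # Recurse so that 'r = x + y | z' becomes three statements.
    return reorder(("=", left, x), register_vars) + [("=" + rop, left, y)]
```

The expression 'r = x + r' is returned unchanged, since rewriting it as 'r = x; r =+ r' would compute the wrong value.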

The second case that reorder handles is expressions of the form 'r = X' used as a subexpression. Again, the code out of the tables for 'x = r = y' would be
    mov y,r0
    mov r0,r
    mov r0,x
whereas if r were a register it would be better to produce
mov y,r
mov r,x
When reorder discovers that a register variable is being assigned to in a subexpression, it calls
rcexpr recursively to compile the subexpression, then fiddles the tree passed to it so that the
register variable itself appears as the operand instead of the whole subexpression. Here care
has to be taken to avoid an infinite regress, with rcexpr and reorder calling each other forever
to handle assignments to registers.
A third set of cases treated by reorder comes up when any name, not necessarily a register, occurs as a left operand of an assignment operator other than '=' or as an operand of
prefix '++' or '--'. Unless condition-code tests are involved, when a subexpression like '(a
=+ b)' is seen, the assignment is performed and the argument tree modified so that a is its
operand; effectively 'x + (y =+ z)' is compiled as 'y =+ z; x + y'. Similarly, prefix increment
and decrement are pulled out and performed first, then the remainder of the expression.
Throughout code generation, the expression optimizer is called whenever delay or
reorder change the expression tree. This allows some special cases to be found that otherwise
would not be seen.


Introduction to the f77 I/O Library
David L. Wasley
University of California, Berkeley
Berkeley, California 94720

The f77 I/O library, libI77.a, includes routines to perform all of the standard types of
FORTRAN input and output. Several enhancements and extensions to FORTRAN I/O have
been added. The f77 library routines use the C stdio library routines to provide efficient
buffering for file I/O.
1. FORTRAN I/O
The requirements of the ANSI standard impose significant overhead on programs that do
large amounts of I/O. Formatted I/O can be very "expensive" while direct access binary I/O is
usually very efficient. Because of the complexity of FORTRAN I/O, some general concepts
deserve clarification.
1.1. Types of I/O
There are three forms of I/O: formatted, unformatted, and list-directed. The last
is related to formatted but does not obey all the rules for formatted I/O. There are two
modes of access to external and internal files: direct and sequential. The definition of a
logical record depends upon the combination of I/O form and mode specified by the FORTRAN I/O statement.
1.1.1. Direct access
A logical record in a direct access external file is a string of bytes of a length specified
when the file is opened. Read and write statements must not specify logical records longer
than the original record size definition. Shorter logical records are allowed. Unformatted
direct writes leave the unfilled part of the record undefined. Formatted direct writes cause
the unfilled record to be padded with blanks.
1.1.2. Sequential access
Logical records in sequentially accessed external files may be of arbitrary and variable length. Logical record length for unformatted sequential files is determined by the size
of items in the iolist. The requirements of this form of I/O cause the external physical record
size to be somewhat larger than the logical record size. For formatted write statements, logical record length is determined by the format statement interacting with the iolist at execution time. The "newline" character is the logical record delimiter. Formatted sequential
access causes one or more logical records ending with "newline" characters to be read or written.
1.1.3. List directed I/O
Logical record length for list-directed I/O is relatively meaningless. On output, the
record length is dependent on the magnitude of the data items. On input, the record length is
determined by the data types and the file contents.


1.1.4. Internal I/O
The logical record length for an internal read or write is the length of the character
variable or array element. Thus a simple character variable is a single logical record. A character variable array is similar to a fixed length direct access file, and obeys the same rules.
Unformatted I/O is not allowed on "internal" files.
1.2. I/O execution
Note that each execution of a FORTRAN unformatted I/O statement causes a single
logical record to be read or written. Each execution of a FORTRAN formatted I/O statement
causes one or more logical records to be read or written.
A slash, "/", will terminate assignment of values to the input list during list-directed
input and the remainder of the current input line is skipped. The standard is rather vague on
this point but seems to require that a new external logical record be found at the start of any
formatted input. Therefore data following the slash is ignored and may be used to comment
the data file.
Direct access list-directed I/O is not allowed. Unformatted internal I/O is not
allowed. Both the above will be caught by the compiler. All other flavors of I/O are allowed,
although some are not part of the ANSI standard.
Any error detected during I/O processing will cause the program to abort unless alternative action has been provided specifically in the program. Any I/O statement may include an
err= clause (and iostat= clause) to specify an alternative branch to be taken on errors (and
return the specific error code). Read statements may include end= to branch on end-of-file.
File position and the value of I/O list items are undefined following an error.
2. Implementation details
Some details of the current implementation may be useful in understanding constraints
on FORTRAN I/O.
2.1. Number of logical units
The maximum number of logical units that a program may have open at one time is the
same as the UNIX† system limit, currently 20. Unit numbers must be in the range 0 - 19
because they are used to index an internal control table.
2.2. Standard logical units
By default, logical units 0, 5, and 6 are opened to "stderr", "stdin", and "stdout" respectively. However they can be re-defined with an open statement. To preserve error reporting,
it is an error to close logical unit 0 although it may be reopened to another file.
If you want to open the default file name for any preconnected logical unit, remember to
close the unit first. Redefining the standard units may impair normal console I/O. An alternative is to use shell re-direction to externally re-define the above units. To re-define default
blank control or format of the standard input or output files, use the open statement specifying the unit number and no file name (see § 2.4).
The standard units, 0, 5, and 6, are named internally "stderr", "stdin", and "stdout"
respectively. These are not actual file names and can not be used for opening these units.
Inquire will not return these names and will indicate that the above units are not named
unless they have been opened to real files. The names are meant to make error reporting
more meaningful.
† UNIX is a trademark of Bell Laboratories.


2.3. Vertical format control
Simple vertical format control is implemented. The logical unit must be opened for
sequential access with form = 'print' (see § 3.2). Control codes "0" and "1" are replaced in
the output file with "\n" and "\f" respectively. The control character "+" is not implemented
and, like any other character in the first position of a record written to a "print" file, is
dropped. No vertical format control is recognized for direct formatted output or list
directed output.
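The treatment of the first character of a record written to a "print" file can be modelled in a few lines. This is a sketch of the rule as stated above, not the library's code: the function name is hypothetical, and record termination is assumed to be handled elsewhere.

```python
def print_file_record(record):
    """Apply vertical format control to one record of a 'print' file.

    The first character is consumed: "0" becomes a newline, "1" a form
    feed, and any other character (including "+" and blank) is dropped.
    """
    if not record:
        return ""
    control, rest = record[0], record[1:]
    prefix = {"0": "\n", "1": "\f"}.get(control, "")
    return prefix + rest
```

A record such as '1Title' would therefore be written as a form feed followed by 'Title', while '+overprint' loses its first character and is not overprinted.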
2.4. The open statement
An open statement need not specify a file name. If it refers to a logical unit that is
already open, the blank= and form= specifiers may be redefined without affecting the
current file position. Otherwise, if status = 'scratch' is specified, a temporary file with a
name of the form "tmp.FXXXXXX" will be opened, and, by default, will be deleted when
closed or during termination of program execution. Any other status= specifier without an
associated file name results in opening a file named "fort.N" where N is the specified logical
unit number.
It is an error to try to open an existing file with status = 'new'. It is an error to try to
open a nonexistent file with status = 'old'. By default, status = 'unknown' will be
assumed, and a file will be created if necessary.
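The naming and status rules for an open statement can be summarized in a small sketch. The function names and return values are hypothetical, and the scratch name is shown only as the template quoted above, since the text does not say how the trailing characters are chosen.

```python
def default_name(unit, status="unknown"):
    """File name used when an open statement gives no file name."""
    if status == "scratch":
        return "tmp.FXXXXXX"        # template only; real names differ
    return "fort.%d" % unit

def check_status(status, exists):
    """Outcome of opening a file with status 'new', 'old' or 'unknown'."""
    if status == "new" and exists:
        return "error"
    if status == "old" and not exists:
        return "error"
    return "open" if exists else "create"
```

For example, opening unit 7 with no file name and the default status refers to 'fort.7', which is created if it does not already exist.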
By default, files are positioned at their beginning upon opening, but see ioinit(3f) for
alternatives. Existing files are never truncated on opening. Sequentially accessed external
files are truncated to the current file position on close, backspace, or rewind only if the
last access to the file was a write. An endfile always causes such files to be truncated to the
current file position.

2.5. Format interpretation
Formats are parsed at the beginning of each execution of a formatted I/O statement.
Upper as well as lower case characters are recognized in format statements and all the alphabetic arguments to the I/O library routines.
If the external representation of a datum is too large for the field width specified, the
specified field is filled with asterisks (*). On Ew.dEe output, the exponent field will be filled
with asterisks if the exponent representation is too large. This will only happen if "e" is zero
(see appendix B).
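The field-overflow rule can be illustrated with a short sketch. The function name is hypothetical; the text is assumed to be the already-converted external representation, and it is right-justified in the field as FORTRAN numeric output normally is.

```python
def fit_field(text, width):
    """Place converted text in a field of the given width.

    If the text is too long for the field, the whole field is filled
    with asterisks instead.
    """
    if len(text) > width:
        return "*" * width
    return text.rjust(width)
```

Writing the value 12345 with a field width of 3 thus produces '***' rather than a truncated number.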
On output, a real value that is truly zero will display as "0." to distinguish it from a very
small non-zero value. This occurs in F and G format conversions. This was not done for E
and D since the embedded blanks in the external datum cause problems for other input systems.
Non-destructive tabbing is implemented for both internal and external formatted I/O.
Tabbing left or right on output does not affect previously written portions of a record. Tabbing right on output causes unwritten portions of a record to be filled with blanks. Tabbing
right off the end of an input logical record is an error. Tabbing left beyond the beginning of
an input logical record leaves the input pointer at the beginning of the record. The format
specifier T must be followed by a positive non-zero number. If it is not, it will have a
different meaning (see § 3.1).
Tabbing left requires seek ability on the logical unit. Therefore it is not allowed in I/O
to a terminal or pipe. Likewise, nondestructive tabbing in either direction is possible only on
a unit that can seek. Otherwise tabbing right or spacing with X will write blanks on the output.


2.6. List directed output
In formatting list directed output, the I/O system tries to prevent output lines longer
than 80 characters. Each external datum will be separated by two spaces. List-directed output of complex values includes an appropriate comma. List-directed output distinguishes
between real and double precision values and formats them differently. Output of a character string that includes "\n" is interpreted reasonably by the output system.
2.7. I/O errors
If I/O errors are not trapped by the user's program an appropriate error message will be
written to "stderr" before aborting. An error number will be printed in [ ] along with a brief
error message showing the logical unit and I/O state. Error numbers < 100 refer to UNIX
errors, and are described in the introduction to chapter 2 of the UNIX Programmer's Manual.
Error numbers ≥ 100 come from the I/O library, and are described further in the appendix to
this writeup. For internal I/O, part of the string will be printed with "|" at the current position in the string. For external I/O, part of the current record will be displayed if the error
was caused during reading from a file that can backspace.
3. Non-"ANSI Standard" extensions
Several extensions have been added to the I/O system to provide for functions omitted
or poorly defined in the standard. Programmers should be aware that these are non-portable.
3.1. Format specifiers
B is an acceptable edit control specifier. It causes return to the default mode of blank
interpretation. This is consistent with S which returns to default sign control.
P by itself is equivalent to 0P. It resets the scale factor to the default value, 0.
The form of the Ew.dEe format specifier has been extended to D also. The form Ew.d.e
is allowed but is not standard. The "e" field specifies the minimum number of digits or
spaces in the exponent field on output. If the value of the exponent is too large, the exponent
notation e or d will be dropped from the output to allow one more character position. If this
is still not adequate, the "e" field will be filled with asterisks (*). The default value for "e" is 2.
An additional form of tab control specification has been added. The ANSI standard
forms TRn, TLn, and Tn are supported where n is a positive non-zero number. If T or nT is
specified, tabbing will be to the next (or n-th) 8-column tab stop. Thus columns of
alphanumerics can be lined up without counting.
A format control specifier has been added to suppress the newline at the end of the last
record of a formatted sequential write. The specifier is a dollar sign ($). It is constrained by
the same rules as the colon (:). It is used typically for console prompts. For example:
write (*, "('enter value for x: ',$)")
read(*,*) x
Radices other than 10 can be specified for formatted integer I/O conversion. The
specifier is patterned after P, the scale factor for floating point conversion. It remains in effect
until another radix is specified or format interpretation is complete. The specifier is defined as
[n]R where 2 .;;; n .;;; 36. If n is omitted, the default decimal radix is restored.
In conjunction with the above, a sign control specifier has been added to cause integer
values to be interpreted as unsigned during output conversion. The specifier is SU and
remains in effect until another sign control specifier is encountered, or format interpretation is
complete. Radix and "unsigned" specifiers could be used to format a hexadecimal dump, as
follows:

2000 format ( SU, 16R, 8I10.8 )
Note: Unsigned integer values greater than (2**30 - 1), i.e. any signed negative value, can not
be read by FORTRAN input routines. All internal values will be output correctly.
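The combined effect of the SU and 16R specifiers can be sketched in C. This is only an illustration of the conversion and not the library's code: the name format_unsigned, the 32-bit integer width, and the use of lower case digits are assumptions made for the example.

```c
#include <stdint.h>
#include <string.h>

/* Write `value`, read as an unsigned 32-bit quantity, in `radix` (2..36)
 * with at least `min_digits` digits, zero filled on the left.
 * Returns the number of characters written, or -1 on a bad radix or a
 * buffer that is too small.  (Hypothetical helper, for illustration.) */
int format_unsigned(int32_t value, int radix, int min_digits,
                    char *buf, size_t size)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char tmp[32];                       /* a 32-bit value needs at most 32 digits */
    uint32_t u = (uint32_t) value;      /* same bits, interpreted as unsigned */
    int n = 0, len;

    if (radix < 2 || radix > 36)
        return -1;
    do {                                /* digits come out least significant first */
        tmp[n++] = digits[u % (uint32_t) radix];
        u /= (uint32_t) radix;
    } while (u != 0);

    len = n > min_digits ? n : min_digits;
    if ((size_t) len + 1 > size)
        return -1;
    memset(buf, '0', (size_t) (len - n));   /* leading zero fill */
    for (int i = len - n; n > 0; i++)       /* copy digits in reading order */
        buf[i] = tmp[--n];
    buf[len] = '\0';
    return len;
}
```

With a radix of 16 and a minimum of 8 digits, as in the I10.8 field above, a signed value of -1 is rendered with all of its bits visible rather than as a minus sign and a magnitude.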
3.2. Print files
The ANSI standard is ambiguous regarding the definition of a "print" file. Since UNIX
has no default "print" file, an additional form= specifier is now recognized in the open statement. Specifying form = 'print' implies formatted and enables vertical format control for
that logical unit. Vertical format control is interpreted only on sequential formatted writes to
a "print" file.
The inquire statement will return print in the form= string variable for logical units
opened as "print" files. It will return -1 for the unit number of an unconnected file.
If a logical unit is already open, an open statement including the form= option or the
blank= option will do nothing but re-define those options. This instance of the open statement need not include the file name, and must not include a file name if unit= refers to a
standard input or output. Therefore, to re-define the standard output as a "print" file, use:

open (unit=6, form='print')
3.3. Scratch files
A close statement with status = 'keep' may be specified for temporary files. This is
the default for all other files. Remember to get the scratch file's real name, using inquire , if
you want to re-open it later.

3.4. List directed I/O
List directed read has been modified to allow input of a string not enclosed in quotes.
The string must not start with a digit, and can not contain a separator (, or /) or blank (space
or tab). A newline will terminate the string unless escaped with \. Any string not meeting the
above restrictions must be enclosed in quotes (" or ').
Internal list-directed I/O has been implemented. During internal list reads, bytes are
consumed until the iolist is satisfied, or the 'end-of-file' is reached. During internal list writes,
records are filled until the iolist is satisfied. The length of an internal array element should be
at least 20 bytes to avoid logical record overflow when writing double precision values. Internal list read was implemented to make command line decoding easier. Internal list write
should be avoided.
4. Running older programs
Traditional FORTRAN environments usually assume carriage control on all logical units,
usually interpret blank spaces on input as "0"s, and often provide attachment of global file
names to logical units at run time. There are several routines in the I/O library to provide
these functions.
4.1. Traditional unit control parameters
If a program reads and writes only units 5 and 6, then including -lI66 in the f77 command will cause carriage control to be interpreted on output and cause blanks to be zeros on
input without further modification of the program. If this is not adequate, the routine
ioinit(3f) can be called to specify control parameters separately, including whether files should
be positioned at their beginning or end upon opening.

4.2. Preattachment of logical units
The ioinit routine also can be used to attach logical units to specific files at run time. It
will look for names of a user specified form in the environment and open the corresponding
logical unit for sequential formatted I/O. Names must be of the form PREFIXnn where
PREFIX is specified in the call to ioinit and nn is the logical unit to be opened. Unit
numbers < 10 must include the leading "0".
Ioinit should prove adequate for most programs as written. However, it is written in
FORTRAN-77 specifically so that it may serve as an example for similar user-supplied routines. A copy may be retrieved by "ar x /usr/lib/libI77.a ioinit.f".
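The PREFIXnn naming rule can be sketched in C as follows. This is not the ioinit source, which is written in FORTRAN-77; the function names and the restriction to units 0 through 99 are assumptions made for the illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Build the environment name PREFIXnn for a logical unit, keeping the
 * leading zero for units below 10.  Returns 0 on success, -1 if the unit
 * does not fit in two digits or the buffer is too small.
 * (Hypothetical helper, for illustration.) */
int unit_env_name(const char *prefix, int unit, char *name, size_t size)
{
    int n;

    if (unit < 0 || unit > 99)
        return -1;
    n = snprintf(name, size, "%s%02d", prefix, unit);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Return the file name attached to `unit` through the environment,
 * or NULL if no such variable is set. */
const char *unit_file_from_env(const char *prefix, int unit)
{
    char name[64];

    if (unit_env_name(prefix, unit, name, sizeof name) != 0)
        return NULL;
    return getenv(name);
}
```

Under these assumptions, a prefix of FORT and unit 7 give the name FORT07, so setting FORT07 in the environment before the program starts would select the file for that unit.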

5. Magnetic tape I/O
Because the I/O library uses stdio buffering, reading or writing magnetic tapes should be
done with great caution, or avoided if possible. A set of routines has been provided to read
and write arbitrary sized buffers to or from tape directly. The buffer must be a character
object. Internal I/O can be used to fill or interpret the buffer. These routines do not use
normal FORTRAN I/O processing and do not obey FORTRAN I/O rules. See tapeio(3f).
6. Caveat Programmer
The I/O library is extremely complex, yet we believe there are few bugs left. We've tried
to make the system as correct as possible according to the ANSI X3.9-1978 document and
keep it compatible with the UNIX file system. Exceptions to the standard are noted in appendix B.

Appendix A
I/O Library Error Messages
The following error messages are generated by the I/O library. The error numbers are
returned in the iostat= variable if the err= return is taken. Error numbers < 100 are generated by the UNIX kernel. See the introduction to chapter 2 of the UNIX Programmers
Manual for their description.

/* 100 */ "error in format"
See error message output for the location
of the error in the format. Can be caused
by more than 10 levels of nested (), or
an extremely long format statement.

/* 101 */ "illegal unit number"
It is illegal to close logical unit 0.
Negative unit numbers are not allowed.
The upper limit is system dependent.

/* 102 */ "formatted io not allowed"
The logical unit was opened for
unformatted I/O.

/* 103 */ "unformatted io not allowed"
The logical unit was opened for
formatted I/O.

/* 104 */ "direct io not allowed"
The logical unit was opened for sequential
access, or the logical record length was
specified as 0.

/* 105 */ "sequential io not allowed"
The logical unit was opened for direct
access I/O.

/* 106 */ "can't backspace file"
The file associated with the logical unit
can't seek. May be a device or a pipe.

/* 107 */ "off beginning of record"
The format specified a left tab beyond the
beginning of an internal input record.

/* 108 */ "can't stat file"
The system can't return status information
about the file. Perhaps the directory is
unreadable.

/* 109 */ "no * after repeat count"
Repeat counts in list-directed I/O must be
followed by an * with no blank spaces.


/* 110 */ "off end of record"
A formatted write tried to go beyond the
logical end-of-record. An unformatted read
or write will also cause this.

/* 111 */ "truncation failed"
The truncation of an external sequential file on
'close', 'backspace', 'rewind' or 'endfile' failed.

/* 112 */ "incomprehensible list input"
List input has to be just right.

/* 113 */ "out of free space"
The library dynamically creates buffers for
internal use. You ran out of memory for this.
Your program is too big!

/* 114 */ "unit not connected"
The logical unit was not open.

/* 115 */ "read unexpected character"
Certain format conversions can't tolerate
non-numeric data. Logical data must be
T or F.

/* 116 */ "blank logical input field"

/* 117 */ "'new' file exists"
You tried to open an existing file with
"status= 'new'".

/* 118 */ "can't find 'old' file"
You tried to open a non-existent file
with "status='old'".

/* 119 */ "unknown system error"
Shouldn't happen, but .....

/* 120 */ "requires seek ability"
Direct access requires seek ability.
Sequential unformatted I/O requires seek
ability on the file due to the special
data structure required. Tabbing left
also requires seek ability.

/* 121 */ "illegal argument"
Certain arguments to 'open', etc. will be
checked for legitimacy. Often only non-default forms are looked for.

/* 122 */ "negative repeat count"
The repeat count for list directed input
must be a positive integer.

/* 123 */ "illegal operation for unit"
An operation was requested for a device
associated with the logical unit which
was not possible. This error is returned
by the tape I/O routines if attempting to
read past end-of-tape, etc.
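
A C routine that receives an iostat= value from Fortran could map it back to the text above with a table such as the following sketch. The message strings are copied from this appendix; the function name io_error_text and the use of strerror for kernel error numbers below 100 are assumptions made for the illustration, not part of the library.

```c
#include <string.h>

/* Messages 100 through 123, in the order listed in this appendix. */
static const char *const f77_io_messages[] = {
    "error in format",              /* 100 */
    "illegal unit number",          /* 101 */
    "formatted io not allowed",     /* 102 */
    "unformatted io not allowed",   /* 103 */
    "direct io not allowed",        /* 104 */
    "sequential io not allowed",    /* 105 */
    "can't backspace file",         /* 106 */
    "off beginning of record",      /* 107 */
    "can't stat file",              /* 108 */
    "no * after repeat count",      /* 109 */
    "off end of record",            /* 110 */
    "truncation failed",            /* 111 */
    "incomprehensible list input",  /* 112 */
    "out of free space",            /* 113 */
    "unit not connected",           /* 114 */
    "read unexpected character",    /* 115 */
    "blank logical input field",    /* 116 */
    "'new' file exists",            /* 117 */
    "can't find 'old' file",        /* 118 */
    "unknown system error",         /* 119 */
    "requires seek ability",        /* 120 */
    "illegal argument",             /* 121 */
    "negative repeat count",        /* 122 */
    "illegal operation for unit"    /* 123 */
};

/* Return the text for an error number: library messages for 100..123,
 * the system's own text for 1..99, and NULL for anything else.
 * (Hypothetical helper, for illustration.) */
const char *io_error_text(int iostat)
{
    int count = (int) (sizeof f77_io_messages / sizeof f77_io_messages[0]);

    if (iostat >= 100 && iostat < 100 + count)
        return f77_io_messages[iostat - 100];
    if (iostat > 0 && iostat < 100)
        return strerror(iostat);        /* kernel error numbers */
    return NULL;
}
```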

Appendix B
Exceptions to the ANSI Standard
A few exceptions to the ANSI standard remain.
1) Vertical format control
The "+" carriage control specifier is not implemented. It would be difficult to implement it correctly and still provide UNIX-like file I/O.
Furthermore, the carriage control implementation is asymmetrical. A file written with
carriage control interpretation can not be read again with the same characters in column 1.
An alternative to interpreting carriage control internally is to run the output file through
a "FORTRAN output filter" before printing. This filter could recognize a much broader range
of carriage control and include terminal dependent processing.
2) Default files
Files created by default use of rewind or endfile statements are opened for sequential formatted access. There is no way to redefine such a file to allow direct or unformatted access.
3) Lower case strings
It is not clear if the ANSI standard requires internally generated strings to be upper case
or not. As currently written, the inquire statement will return lower case strings for any
alphanumeric data.
4) Exponent representation on Ew.dEe output
If the field width for the exponent is too small, the standard allows dropping the
exponent character but only if the exponent is > 99. This system does not enforce that restriction. Further, the standard implies that the entire field, 'w', should be filled with asterisks if
the exponent can not be displayed. This system fills only the exponent field in the above case
since that is more diagnostic.

A Portable Fortran 77 Compiler
S. I. Feldman
P. J. Weinberger
Bell Laboratories
Murray Hill, New Jersey 07974

1. INTRODUCTION
The Fortran language has been revised. The new language, known as Fortran 77, became an
official American National Standard [1] on April 3, 1978. Fortran 77 supplants 1966 Standard
Fortran [2]. We report here on a compiler and run-time system for the new extended language.
The compiler and computation library were written by S.I.F., the I/O system by P.J.W. We
believe ours to be the first complete Fortran 77 system to be implemented. This compiler is
designed to be portable to a number of different machines, to be correct and complete, and to
generate code compatible with calling sequences produced by compilers for the C language [3].
In particular, it is in use on UNIX systems. Two families of C compilers are in use at Bell
Laboratories, those based on D. M. Ritchie's PDP-11 compiler [4] and those based on S. C.
Johnson's portable C compiler [5]. This Fortran compiler can drive the second passes of either
family. In this paper, we describe the language compiled, interfaces between procedures, and
file formats assumed by the I/O system. We will describe implementation details in companion
papers.
1.1. Usage
At present, versions of the compiler run on and compile for the PDP-11, the VAX-11/780,
and the Interdata 8/32 UNIX systems. The command to run the compiler is
f77 flags file ...
f77 is a general-purpose command for compiling and loading Fortran and Fortran-related
files. EFL [6] and Ratfor [7] source files will be preprocessed before being presented to
the Fortran compiler. C and assembler source files will be compiled by the appropriate
programs. Object files will be loaded. (The f77 and cc commands cause slightly different
loading sequences to be generated, since Fortran programs need a few extra libraries and a
different startup routine than do C programs.) The following file name suffixes are understood:

.f    Fortran source file
.F    Fortran source file
.e    EFL source file
.r    Ratfor source file
.c    C source file
.s    Assembler source file
.o    Object file

Arguments whose names end with .f are taken to be Fortran 77 source programs; they are
compiled, and each object program is left on the file in the current directory whose name
is that of the source with .o substituted for .f.
Arguments whose names end with .F are also taken to be Fortran 77 source programs;
these are first processed by the C preprocessor before being compiled by f77.

Arguments whose names end with .r or .e are taken to be Ratfor or EFL source programs,
respectively; these are first transformed by the appropriate preprocessor, then compiled by
f77.
In the same way, arguments whose names end with .c or .s are taken to be C or assembly
source programs and are compiled or assembled, producing a .o file.
The following flags are understood:
-c           Compile but do not load. Output for s.f, s.F, s.e, s.r, s.c, or s.s is put
             on file s.o.
-g           Have the compiler produce additional symbol table information for
             dbx(1). This only applies on the Vax UNIX system. Do not use with -O.
-i2          On machines which support short integers, make the default integer constants and variables short (see section 2.14). (-i4 is the standard value
             of this option). All logical quantities will be short.
-m           Apply the M4 macro preprocessor to each EFL or Ratfor source file before
             using the appropriate compiler.
-ofile       Put executable module on file file. (Default is a.out).
-onetrip     Compile code that performs every do loop at least once (see section 2.12).
-p           Generate code to produce usage profiles.
-pg          Generate code in the manner of -p, but invoke a run-time recording
             mechanism that keeps more extensive statistics.
-w           Suppress all warning messages.
-w66         Suppress warnings about Fortran 66 features used.
-u           Make the default type of a variable undefined (see section 2.3).
-C           Compile code that checks that subscripts are within array bounds.
-Dname=def
-Dname       Define the name to the C preprocessor, as if by '#define'. If no definition
             is given, the name is defined as "1". (.F files only).
-Estr        Use the string str as an EFL option in processing .e files.
-F           Ratfor and EFL source programs are pre-processed into Fortran files,
             but those files are not compiled or removed.
-Idir        '#include' files whose names do not begin with '/' are always sought first
             in the directory of the file argument, then in directories named in -I
             options, then in directories on a standard list. (.F files only).
-O           Invoke the object code optimizer. Do not use with -g.
-Rstr        Use the string str as a Ratfor option in processing .r files.
-U           Do not convert upper case letters to lower case. The default is to convert
             Fortran programs to lower case except within character string constants.
-S           Generate assembler output for each source file, but do not assemble it.
             Assembler output for a source file s.f, s.F, s.e, s.r, or s.c is put on file
             s.s.
Other flags, all library names (arguments beginning -l), and any names not ending with
one of the understood suffixes are passed to the loader.
1.2. Documentation Conventions
In running text, we write Fortran keywords and other literal strings in boldface lower case.
Examples will be presented in lightface lower case. Names representing a class of values
will be printed in italics.

1.3. Implementation Strategy
The compiler and library are written entirely in C. The compiler generates C compiler
intermediate code. Since there are C compilers running on a variety of machines, relatively small changes will make this Fortran compiler generate code for any of them.
Furthermore, this approach guarantees that the resulting programs are compatible with C
usage. The runtime computational library is complete. The runtime I/O library makes
use of D. M. Ritchie's Standard C I/O package [8] for transferring data. With the few
exceptions described below, only documented calls are used, so it should be relatively
easy to modify to run on other operating systems.

2. LANGUAGE EXTENSIONS
Fortran 77 includes almost all of Fortran 66 as a subset. We describe the differences briefly in
Appendix A. The most important additions are a character string data type, file-oriented
input/output statements, and random access I/O. Also, the language has been cleaned up considerably.
In addition to implementing the language specified in the new Standard, our compiler implements a few extensions described in this section. Most are useful additions to the language.
The remainder are extensions to make it easier to communicate with C procedures or to permit
compilation of old (1966 Standard) programs.

2.1. Double Complex Data Type
The new type double complex is defined. Each datum is represented by a pair of double
precision real variables. A double complex version of every complex built-in function is
provided. The specific function names begin with z instead of c.

2.2. Internal Files
The Fortran 77 standard introduces "internal files" (memory arrays), but restricts their
use to formatted sequential I/O statements. Our I/O system also permits internal files to
be used in formatted direct reads and writes.
2.3. Implicit Undefined Statement
Fortran 66 has a fixed rule that the type of a variable that does not appear in a type statement is integer if its first letter is i, j, k, l, m or n, and real otherwise. Fortran 77 has an
implicit statement for overriding this rule. As an aid to good programming practice, we
permit an additional type, undefined. The statement
implicit undefined (a-z)
turns off the automatic data typing mechanism, and the compiler will issue a diagnostic
for each variable that is used but does not appear in a type statement. Specifying the -u
compiler flag is equivalent to beginning each procedure with this statement.
2.4. Recursion
Procedures may call themselves, directly or through a chain of other procedures.
2.5. Automatic Storage
Two new keywords are recognized, static and automatic. These keywords may appear as
"types" in type statements and in implicit statements. Local variables are static by
default; there is exactly one copy of the datum, and its value is retained between calls.
There is one copy of each variable declared automatic for each invocation of the procedure. Automatic variables may not appear in equivalence, data, or save statements.


2.6. Source Input Format
The Standard expects input to the compiler to be in 72-column format: except in comment lines, the first five characters are the statement number, the next is the continuation
character, and the next 66 are the body of the line. (If there are fewer than 72 characters
on a line, the compiler pads it with blanks; characters after the seventy-second are
ignored.)
In order to make it easier to type Fortran programs, our compiler also accepts input in
variable length lines. An ampersand "&" in the first position of a line indicates a continuation line; the remaining characters form the body of the line. A tab character in one
of the first six positions of a line signals the end of the statement number and continuation part of the line; the remaining characters form the body of the line. A tab elsewhere
on the line is treated as another kind of blank by the compiler.
In the Standard, there are only 26 letters - Fortran is a one-case language. Consistent
with ordinary UNIX system usage, our compiler expects lower case input. By default, the
compiler converts all upper case characters to lower case except those inside character
constants. However, if the -U compiler flag is specified, upper case letters are not
transformed. In this mode, it is possible to specify external names with upper case letters
in them, and to have distinct variables differing only in case. Regardless of the setting of
the flag, keywords will only be recognized in lower case.

2.7. Include Statement
The statement
include 'stuff'
is replaced by the contents of the file stuff; include statements may be nested to a reasonable depth, currently ten.

2.8. Binary Initialization Constants
A variable may be initialized in a data statement by a binary constant, denoted by a letter
followed by a quoted string. If the letter is b, the string is binary, and only zeroes and
ones are permitted. If the letter is o, the string is octal, with digits 0-7. If the letter is z
or x, the string is hexadecimal, with digits 0-9, a-f. Thus, the statements

integer a(3)
data a / b'1010', o'12', z'a' /
initialize all three elements of a to ten.
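
The equivalence of the three constants can be checked with a short C sketch built on the standard strtol routine. The function name parse_init_constant is hypothetical, and the sketch is somewhat more permissive than the rule stated above (strtol also accepts a sign, upper case hexadecimal digits, and a leading 0x in base 16); it is an illustration, not the compiler's own scanner.

```c
#include <stdlib.h>

/* Convert a constant written as b'...', o'...', z'...' or x'...'.
 * Stores the value in *value and returns 0, or returns -1 if the
 * letter, the quoting, or a digit is not acceptable.
 * (Hypothetical helper, for illustration.) */
int parse_init_constant(const char *text, long *value)
{
    int base;
    long v;
    char *end;

    switch (text[0]) {
    case 'b': base = 2;  break;
    case 'o': base = 8;  break;
    case 'z':
    case 'x': base = 16; break;
    default:  return -1;
    }
    if (text[1] != '\'' || text[2] == '\'')
        return -1;                      /* missing quote or empty string */
    v = strtol(text + 2, &end, base);
    if (end == text + 2 || end[0] != '\'' || end[1] != '\0')
        return -1;                      /* bad digit or missing closing quote */
    *value = v;
    return 0;
}
```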

2.9. Character Strings
For compatibility with C usage, the following backslash escapes are recognized:
\n    newline
\t    tab
\b    backspace
\f    form feed
\0    null
\'    apostrophe (does not terminate a string)
\"    quotation mark (does not terminate a string)
\\    \
\x    x, where x is any other character
Fortran 77 only has one quoting character, the apostrophe. Our compiler and I/O system
recognize both the apostrophe (') and the double-quote ("). If a string begins with
one variety of quote mark, the other may be embedded within it without using the
repeated quote or backslash escapes.

Each character string constant appearing outside a data statement is followed by a null
character to ease communication with C routines.
2.10. Hollerith
Fortran 77 does not have the old Hollerith "nh" notation, though the new Standard
recommends implementing the old Hollerith feature in order to improve compatibility
with old programs. In our compiler, Hollerith data may be used in place of character
string constants, and may also be used to initialize non-character variables in data statements.
2.11. Equivalence Statements
As a very special and peculiar case, Fortran 66 permits an element of a multiply-dimensioned array to be represented by a singly-subscripted reference in equivalence
statements. Fortran 77 does not permit this usage, since subscript lower bounds may now
be different from 1. Our compiler permits single subscripts in equivalence statements,
under the interpretation that all missing subscripts are equal to 1. A warning message is
printed for each such incomplete subscript.
2.12. One-Trip DO Loops
The Fortran 77 Standard requires that the range of a do loop not be performed if the initial value is already past the limit value, as in
do 10 i = 2, 1

The 1966 Standard stated that the effect of such a statement was undefined, but it was
common practice that the range of a do loop would be performed at least once. In order
to accommodate old programs, though they were in violation of the 1966 Standard, the
-onetrip compiler flag causes non-standard loops to be generated.
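The difference between the two rules can be modelled in C by computing how many times the range of a loop is performed. The sketch below uses the Fortran 77 iteration count formula, MAX(INT((m2 - m1 + m3)/m3), 0), and treats the one-trip rule as raising a count of zero to one; the function name do_trip_count is hypothetical, and the way the compiler actually generates such loops is not described here.

```c
/* Number of times the range of  do ... i = init, limit, step  is performed.
 * With onetrip == 0 the Fortran 77 rule applies; with onetrip != 0 the
 * range is performed at least once, as many 1966-era compilers did.
 * Returns -1 for a zero step, which Fortran does not allow.
 * (Hypothetical model, for illustration.) */
long do_trip_count(long init, long limit, long step, int onetrip)
{
    long count;

    if (step == 0)
        return -1;
    count = (limit - init + step) / step;   /* C division truncates toward zero */
    if (count < 0)
        count = 0;
    if (onetrip && count < 1)
        count = 1;
    return count;
}
```

For the loop shown above (initial value 2, limit 1, step 1) the model gives zero trips under the Standard and one trip under -onetrip; loops whose range would be performed anyway are unaffected.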
2.13. Commas in Formatted Input
The I/O system attempts to be more lenient than the Standard when it seems worthwhile.
When doing a formatted read of non-character variables, commas may be used as value
separators in the input record, overriding the field lengths given in the format statement.
Thus, the format

(i10, f20.10, i4)
will read the record

-345,.05e-3,12
correctly.

2.14. Short Integers
On machines that support halfword integers, the compiler accepts declarations of type
integer*2. (Ordinary integers follow the Fortran rules about occupying the same space as
a REAL variable; they are assumed to be of C type long int; halfword integers are of C
type short int.) An expression involving only objects of type integer*2 is of that type.
Generic functions return short or long integers depending on the actual types of their
arguments. If a procedure is compiled using the -i2 flag, all small integer constants will
be of type integer*2. If the precision of an integer-valued intrinsic function is not determined by the generic function rules, one will be chosen that returns the prevailing length
(integer*2 when the -i2 command flag is in effect). When the -i2 option is in effect, all
quantities of type logical will be short. Note that these short integer and logical quantities
do not obey the standard rules for storage association.

2.15. Additional Intrinsic Functions
This compiler supports all of the intrinsic functions specified in the Fortran 77 Standard.
In addition, there are functions for performing bitwise Boolean operations (or, and, xor,
and not) and for accessing the UNIX command arguments (getarg and iargc) and environment (getenv).

3. VIOLATIONS OF THE STANDARD
We know only a few ways in which our Fortran system violates the new standard:
3.1. Double Precision Alignment
The Fortran Standards (both 1966 and 1977) permit common or equivalence statements
to force a double precision quantity onto an odd word boundary, as in the following example:
real a(4)
double precision b,c

equivalence (a(1),b), (a(4),c)
Some machines (e.g., Honeywell 6000, IBM 360) require that double precision quantities
be on double word boundaries; other machines (e.g., IBM 370), run inefficiently if this
alignment rule is not observed. It is possible to tell which equivalenced and common
variables suffer from a forced odd alignment, but every double precision argument would
have to be assumed on a bad boundary. To load such a quantity on some machines, it
would be necessary to use separate operations to move the upper and lower halves into
the halves of an aligned temporary, then to load that double precision temporary; the
reverse would be needed to store a result. We have chosen to require that all double precision real and complex quantities fall on even word boundaries on machines with
corresponding hardware requirements, and to issue a diagnostic if the source code
demands a violation of the rule.
3.2. Dummy Procedure Arguments
If any argument of a procedure is of type character, all dummy procedure arguments of
that procedure must be declared in an external statement. This requirement arises as a
subtle corollary of the way we represent character string arguments and of the one-pass
nature of the compiler. A warning is printed if a dummy procedure is not declared external. Code is correct if there are no character arguments.

3.3. T and TL Formats
The implementation of the t (absolute tab) and tl (leftward tab) format codes is defective.
These codes allow rereading or rewriting part of the record which has already been processed (section 6.3.2 in Appendix A). The implementation uses seeks, so if the unit is
not one which allows seeks, such as a terminal, the program is in error. A benefit of the
implementation chosen is that there is no upper limit on the length of a record, nor is it
necessary to predeclare any record lengths except where specifically required by Fortran or
the operating system.
3.4. Carriage Control
The Standard leaves as implementation dependent which logical unit(s) are treated as
"printer" files. In this implementation there is no printer file and thus no carriage control
is recognized on formatted output, except by special arrangement [9].

3.5. Assigned Goto
The optional list associated with an assigned goto statement is not checked against the
actual assigned value during execution.
4. INTER-PROCEDURE INTERFACE
To be able to write C procedures that call or are called by Fortran procedures, it is necessary to
know the conventions for procedure names, data representation, return values, and argument
lists that the compiled code obeys.

4.1. Procedure Names
On UNIX systems, the name of a common block or a Fortran procedure has an underscore
appended to it by the compiler to distinguish it from a C procedure or external variable
with the same user-assigned name. Fortran library procedure names have embedded
underscores to avoid clashes with user-assigned subroutine names.
4.2. Data Representations
The following is a table of corresponding Fortran and C declarations:

Fortran                 C
integer*2 x             short int x;
integer x               long int x;
logical x               long int x;
real x                  float x;
double precision x      double x;
complex x               struct { float r, i; } x;
double complex x        struct { double dr, di; } x;
character*6 x           char x[6];

(By the rules of Fortran, integer, logical, and real data occupy the same amount of
memory.)
4.3. Return Values
A function of type integer, logical, real, or double precision declared as a C function
returns the corresponding type. A complex or double complex function is equivalent to a
C routine with an additional initial argument that points to the place where the return
value is to be stored. Thus,
complex function f( ... )
is equivalent to
f_(temp, ... )
struct { float r, i; } *temp;
A character-valued function is equivalent to a C routine with two extra initial arguments:
a data address and a length. Thus,
character*15 function g( ... )
is equivalent to

g_(result, length, ... )
char result[ ];
long int length;

and could be invoked in C by

char chars[15];

g_(chars, 15L, ... );
Subroutines are invoked as if they were integer-valued functions whose value specifies
which alternate return to use. Alternate return arguments (statement labels) are not
passed to the function, but are used to do an indexed branch in the calling procedure. (If
the subroutine has no entry points with alternate return arguments, the returned value is
undefined.) The statement
call nret(*1, *2, *3)
is treated exactly as if it were the computed goto

goto (1, 2, 3), nret()
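
As an illustration of the complex-function convention, the following sketch gives a C body for a routine that a Fortran program could declare as "complex function cmul(a, b)". The name cmul, the struct tag fcomplex, and the void return type are assumptions made for the example; the result is delivered only through the extra first argument, and the operands arrive by address.

```c
/* Layout of a Fortran complex datum, following the table in section 4.2. */
struct fcomplex { float r, i; };

/* C body for the hypothetical Fortran function
 *     complex function cmul(a, b)
 *     complex a, b
 * The trailing underscore follows the naming rule of section 4.1. */
void cmul_(struct fcomplex *result, struct fcomplex *a, struct fcomplex *b)
{
    struct fcomplex t;      /* temporary, so result may alias a or b */

    t.r = a->r * b->r - a->i * b->i;
    t.i = a->r * b->i + a->i * b->r;
    *result = t;
}
```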
4.4. Argument Lists
All Fortran arguments are passed by address. In addition, for every argument that is of
type character or that is a dummy procedure, an argument giving the length of the value
is passed. (The string lengths are long int quantities passed by value.) The order of arguments is then:

Extra arguments for complex and character functions
Address for each datum or function
A long int for each character or procedure argument
Thus, the call in
external f
character*7 s
integer b(3)
call sam(f, b(2), s)
is equivalent to that in

int f();
char s[7];
long int b[3];
sam_(f, &b[1], s, 0L, 7L);
Note that the first element of a C array always has subscript zero, but Fortran arrays begin
at 1 by default. Fortran arrays are stored in column-major order, C arrays are stored in
row-major order.
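A C routine written to be called from Fortran with a character argument might therefore look like the sketch below. The names upcase and trimlen are hypothetical. The points being illustrated are the trailing underscore, the address argument, and the length passed by value as a long int after the ordinary arguments; note that a character variable is blank padded and carries no terminating null.

```c
#include <ctype.h>

/* Callable from Fortran as   call upcase(s)   where s is a character
 * variable.  Converts s to upper case in place.  The value returned is
 * the alternate-return selector; 0 is used here because the routine has
 * no alternate returns.  (Hypothetical routine, for illustration.) */
int upcase_(char *s, long len)
{
    long i;

    for (i = 0; i < len; i++)
        s[i] = (char) toupper((unsigned char) s[i]);
    return 0;
}

/* Callable from Fortran as an integer function   n = trimlen(s)  .
 * Returns the length of s without its trailing blanks. */
long trimlen_(char *s, long len)
{
    while (len > 0 && s[len - 1] == ' ')
        len--;
    return len;
}
```

From C, the same routines are invoked with the length supplied explicitly, for example upcase_(s, 7L) for a character*7 variable.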
5. FILE FORMATS

5.1. Structure of Fortran Files
Fortran requires four kinds of external files: sequential formatted and unformatted, and
direct formatted and unformatted. On UNIX systems, these are all implemented as ordinary files which are assumed to have the proper internal structure.
Fortran I/O is based on records. When a direct file is opened in a Fortran program, the
record length of the records must be given, and this is used by the Fortran I/O system to
make the file look as if it is made up of records of the given length. In the special case
that the record length is given as 1, the files are not considered to be divided into records,
but are treated as byte-addressable byte strings; that is, as ordinary UNIX file system files.
(A read or write request on such a file keeps consuming bytes until satisfied, rather than
being restricted to a single record.)
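Since a direct file is an ordinary file viewed as fixed-length records, locating a record reduces to a multiplication and a seek. The following C sketch shows the arithmetic; the function names are hypothetical and the code is an illustration rather than the library's implementation.

```c
#include <stdio.h>

/* Position fp at the first byte of record recno (numbered from 1) in a
 * file of fixed-length records of reclen bytes.  Returns 0 on success.
 * (Hypothetical helper, for illustration.) */
int seek_record(FILE *fp, long recno, long reclen)
{
    if (recno < 1 || reclen < 1)
        return -1;
    return fseek(fp, (recno - 1) * reclen, SEEK_SET) == 0 ? 0 : -1;
}

/* Read record recno into buf, which must hold reclen bytes.
 * Returns the number of bytes actually read, or -1 if the seek failed. */
long read_record(FILE *fp, long recno, long reclen, char *buf)
{
    if (seek_record(fp, recno, reclen) != 0)
        return -1;
    return (long) fread(buf, 1, (size_t) reclen, fp);
}
```

With a record length of 1 the record number is simply a byte offset plus one, which matches the byte-addressable view described above.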
The peculiar requirements on sequential unformatted files make it unlikely that they will
ever be read or written by any means except Fortran I/O statements. Each record is preceded and followed by an integer containing the record's length in bytes.
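The record layout just described can be sketched in C as below. The width and byte order of the length word are machine dependent; the sketch assumes a C int written in the host's native form, and the function names are hypothetical, so files produced this way should not be expected to match those of any particular Fortran system.

```c
#include <stdio.h>

/* Write one record: length, data, length.  Returns 0 on success.
 * (Hypothetical helper; the length word is assumed to be a native int.) */
int write_unformatted_record(FILE *fp, const void *data, int nbytes)
{
    if (nbytes < 0)
        return -1;
    if (fwrite(&nbytes, sizeof nbytes, 1, fp) != 1)
        return -1;
    if (nbytes > 0 && fwrite(data, 1, (size_t) nbytes, fp) != (size_t) nbytes)
        return -1;
    if (fwrite(&nbytes, sizeof nbytes, 1, fp) != 1)
        return -1;
    return 0;
}

/* Read one record into buf (capacity bufsize bytes).  Returns the record
 * length, or -1 on end of file, a short read, a record that does not fit,
 * or leading and trailing lengths that disagree. */
int read_unformatted_record(FILE *fp, void *buf, int bufsize)
{
    int head, tail;

    if (fread(&head, sizeof head, 1, fp) != 1)
        return -1;
    if (head < 0 || head > bufsize)
        return -1;
    if (head > 0 && fread(buf, 1, (size_t) head, fp) != (size_t) head)
        return -1;
    if (fread(&tail, sizeof tail, 1, fp) != 1 || tail != head)
        return -1;
    return head;
}
```

The trailing copy of the length allows a reader positioned after a record to find where that record began, which is what a backspace needs.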
The Fortran I/O system breaks sequential formatted files into records while reading by
using each newline as a record separator. The result of reading off the end of a record is
undefined according to the Standard. The I/O system is permissive and treats the record
as being extended by blanks. On output, the I/O system will write a newline at the end of
each record. It is also possible for programs to write newlines for themselves. This is an
error, but the only effect will be that the single record the user thought he wrote will be
treated as more than one record when being read or backspaced over.
5.2. Portability Considerations
The Fortran I/O system uses only the facilities of the standard C I/O library, a widely
available and fairly portable package, with the following two nonstandard features: the I/O
system needs to know whether a file can be used for direct I/O, and whether or not it is
possible to backspace. Both of these facilities are implemented using the fseek routine, so
there is a routine canseek which determines if fseek will have the desired effect. Also,
the inquire statement provides the user with the ability to find out if two files are the
same, and to get the name of an already opened file in a form which would enable the
program to reopen it. Therefore there are two routines which depend on facilities of the
operating system to provide these two services. In any case, the I/O system runs on the
PDP-11, VAX-11/780, and Interdata 8/32 UNIX systems.
5.3. Pre-Connected Files and File Positions
Units 5, 6, and 0 are preconnected when the program starts. Unit 5 is connected to the
standard input, unit 6 is connected to the standard output, and unit 0 is connected to the
standard error unit. All are connected for sequential formatted I/O.
All the other units are also preconnected when execution begins. Unit n is connected to
a file named fort.n. These files need not exist, nor will they be created unless their units
are used without first executing an open. The default connection is for sequential formatted
I/O.
The Standard does not specify where a file which has been explicitly opened for sequential
I/O is initially positioned. The I/O system will position the file at the beginning. Therefore a write will destroy any data already in the file, but a read will work reasonably. To
position a file to its end, use a 'read' loop, or the system dependent 'fseek' function. The
preconnected units 0, 5, and 6 are positioned as they come from the program's parent
process.
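The 'read' loop mentioned above might be sketched as follows. This is only an illustration: the unit number 9 and the file name log.txt are arbitrary assumptions, and the backspace after the end-of-file condition is what the Standard expects before writing; the text does not say whether this I/O system requires it.

```fortran
c     Sketch: append to an existing sequential formatted file by
c     reading to end of file first.  Unit 9 and the name log.txt
c     are hypothetical choices made for this example.
      program append
      character*80 line
      open(9, file='log.txt', status='old', err=30)
c     skip over every existing record
   10 read(9, '(a)', end=20) line
      goto 10
c     step back over the end-of-file condition, then write
   20 backspace 9
      write(9, '(a)') 'appended record'
      close(9)
      stop
   30 write(0, '(a)') ' cannot open log.txt'
      end
```

Because an explicitly opened file starts at its beginning, omitting the loop would overwrite the existing data rather than extend it.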


APPENDIX A: Differences Between Fortran 66 and Fortran 77
The following is a very brief description of the differences between the 1966 [2] and the 1977
[1] Standard languages. We assume that the reader is familiar with Fortran 66. We do not pretend to be complete, precise, or unbiased, but plan to describe what we feel are the most
important aspects of the new language. The best current information on the 1977 Standard is
in publications of the X3J3 Subcommittee of the American National Standards Institute, and the
ANSI X3.9-1978 document, the official description of the language. The Standard is written in
English rather than a meta-language, but it is forbidding and legalistic. A number of tutorials
and textbooks are available (see Appendix B).

1. Features Deleted from Fortran 66
1.1. Hollerith
All notions of "Hollerith" (nh) as data have been officially removed, although our compiler, like almost all in the foreseeable future, will continue to support this archaism.
1.2. Extended Range
In Fortran 66, under a set of very restrictive and rarely-understood conditions, it is permissible to jump out of the range of a do loop, then jump back into it. Extended range
has been removed in the Fortran 77 language. The restrictions are so special, and the
implementation of extended range is so unreliable in many compilers, that this change
really counts as no loss.
2. Program Form
2.1. Blank Lines
Completely blank lines are now legal comment lines.
2.2. Program and Block Data Statements
A main program may now begin with a statement that gives that program an external
name:
program work
Block data procedures may also have names.
block data stuff
There is now a rule that only one unnamed block data procedure may appear in a program. (This rule is not enforced by our system.) The Standard does not specify the effect
of the program and block data names, but they are clearly intended to aid conventional
loaders.
2.3. ENTRY Statement
Multiple entry points are now legal. Subroutine and function subprograms may have additional entry points, declared by an entry statement with an optional argument list.
entry extra(a, b, c)
Execution begins at the first statement following the entry line. All variable declarations
must precede all executable statements in the procedure. If the procedure begins with a
subroutine statement, all entry points are subroutine names. If it begins with a function
statement, each entry is a function entry point, with type determined by the type declared
for the entry name. If any entry is a character-valued function, then all entries must be.
In a function, an entry name of the same type as that where control entered must be
assigned a value. Arguments do not retain their values between calls. (The ancient trick
of calling one entry point with a large number of arguments to cause the procedure to
"remember" the locations of those arguments, then invoking an entry with just a few
arguments for later calculation, is still illegal. Furthermore, the trick doesn't work in our
implementation, since arguments are not kept in static storage.)
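As an illustration of these rules, here is a minimal sketch of a subroutine with one additional entry point. The names setup, bump, and count are invented for this example. Since arguments are not remembered between calls, the value shared by the two entries is kept in a saved local variable instead.

```fortran
c     Sketch: a subroutine with an extra entry point.  All names
c     here are hypothetical.  Declarations for both entries come
c     before the first executable statement.
      subroutine setup(n)
      integer n, k, count
      save count
      count = n
      return
c     execution of "call bump(k)" begins at the statement below
      entry bump(k)
      count = count + k
      write(6, '(a, i4)') ' count =', count
      return
      end

      program demo
      call setup(10)
      call bump(5)
      end
```

The call to bump adds 5 to the value stored by setup, so the program reports a count of 15.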
2.4. DO Loops
do variables and range parameters may now be of integer, real, or double precision types.
(The use of floating point do variables is very dangerous because of the possibility of
unexpected roundoff, and we strongly recommend against their use.) The action of the do
statement is now defined for all values of the do parameters. The statement
do 10 i = l, u, d
performs max(0, ⌊(u-l+d)/d⌋) iterations. The do variable has a predictable value when
exiting a loop: the value at the time a goto or return terminates the loop; otherwise the
value that failed the limit test.
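The iteration count can be checked against an actual loop with a short program. This is a sketch with arbitrarily chosen bounds; it uses integer division, which agrees with the bracketed quotient for the positive values used here.

```fortran
c     Sketch: compare the iteration-count formula with a real loop.
c     The values of l, u, and d are arbitrary.
      program trips
      integer i, l, u, d, n, count
      l = 1
      u = 10
      d = 3
c     count predicted by the formula
      n = max(0, (u - l + d)/d)
      count = 0
      do 10 i = l, u, d
         count = count + 1
   10 continue
c     i now holds the value that failed the limit test
      write(6, '(3i6)') n, count, i
      end
```

With these values the loop runs for i = 1, 4, 7, 10, so both counts are 4 and the do variable is left at 13.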
2.5. Alternate Returns
In a subroutine or subroutine entry statement, some of the arguments may be noted by
an asterisk, as in
subroutine s(a, *, b, *)
The meaning of the "alternate returns" is described in section 5.2 of Appendix A.

3. Declarations
3.1. CHARACTER Data Type
One of the biggest improvements to the language is the addition of a character-string data
type. Local and common character variables must have a length denoted by a constant
expression:
character*17 a, b(3,4)
character*(6+3) c

If the length is omitted entirely, it is assumed equal to 1. A character string argument
may have a constant length, or the length may be declared to be the same as that of the
corresponding actual argument at run time by a statement like
character*(*) a
(There is an intrinsic function len that returns the actual length of a character string.)
Character arrays and common blocks containing character variables must be packed: in an
array of character variables, the first character of one element must follow the last character of the preceding element, without holes.
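A short sketch of the passed-length declaration and the len intrinsic follows. The subroutine name show and the variables are made up for this example.

```fortran
c     Sketch: a dummy argument that takes its length from the
c     actual argument.  The names are hypothetical.
      subroutine show(a)
      character*(*) a
      write(6, '(a, i3)') ' length =', len(a)
      return
      end

      program chars
      character*17 s
      character*(6+3) c
      s = 'abc'
      c = 'abc'
      call show(s)
      call show(c)
      call show('hello')
      end
```

The three calls report lengths of 17, 9, and 5: len returns the declared length of the actual argument, not the number of non-blank characters stored in it.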
3.2. IMPLICIT Statement
The traditional implied declaration rules still hold: a variable whose name begins with i, j,
k, l, m, or n is of type integer; other variables are of type real, unless otherwise declared.
This general rule may be overridden with an implicit statement:
implicit real(a-c,g), complex(w-z), character*(17) (s)
declares that variables whose name begins with an a, b, c, or g are real, those beginning
with w, x, y, or z are assumed complex, and so on. It is still poor practice to depend on
implicit typing, but this statement is an industry standard.


3.3. PARAMETER Statement
It is now possible to give a constant a symbolic name, as in
parameter (x=17, y=x/3, pi=3.14159d0, s='hello')
The type of each parameter name is governed by the same implicit and explicit rules as
for a variable. The right side of each equal sign must be a constant expression (an
expression made up of constants, operators, and already defined parameters).

3.4. Array Declarations
Arrays may now have as many as seven dimensions. (Only three were permitted in
1966.) The lower bound of each dimension may be declared to be other than 1 by using a
colon. Furthermore, an adjustable array bound may be an integer expression involving
constants, arguments, and variables in common.
real a(-5:3, 7, m:n), b(n+1:2*n)
The upper bound on the last dimension of an array argument may be denoted by an asterisk to indicate that the upper bound is not specified:
integer a(5, *), b(*), c(0:1, -2:*)
3.5. SAVE Statement
A poorly known rule of Fortran 66 is that local variables in a procedure do not necessarily
retain their values between invocations of that procedure. At any instant in the execution
of a program, if a common block is declared neither in the currently executing procedure
nor in any of the procedures in the chain of callers, all of the variables in that common
block also become undefined. (The only exceptions are variables that have been defined
in a data statement and never changed.) These rules permit overlay and stack implementations for the affected variables. Fortran 77 permits one to specify that certain variables
and common blocks are to retain their values between invocations. The declaration
save a, /b/, c
leaves the values of the variables a and c and all of the contents of common block b
unaffected by a return. The simple declaration
save
has this effect on all variables and common blocks in the procedure. A common block
must be saved in every procedure in which it is declared if the desired effect is to occur.
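The effect of save on a local variable can be seen in a small counter function. This is a sketch only; the names next and count are invented for the example.

```fortran
c     Sketch: a local variable that keeps its value between
c     invocations because it is named in a save statement.
      integer function next()
      integer count
      save count
      data count /0/
      count = count + 1
      next = count
      return
      end

      program saver
      integer next, i, k
      do 10 i = 1, 3
         k = next()
   10 continue
      write(6, '(i4)') k
      end
```

After three calls the program prints 3. Without the save statement the Standard would allow count to become undefined on each return, since it is changed after being set by the data statement.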

3.6. INTRINSIC Statement
All of the functions specified in the Standard are in a single category, "intrinsic functions", rather than being divided into "intrinsic" and "basic external" functions. If an
intrinsic function is to be passed to another procedure, it must be declared intrinsic.
Declaring it external (as in Fortran 66) causes a function other than the built-in one to be
passed.

4. Expressions

4.1. Character Constants
Character string constants are marked by strings surrounded by apostrophes. If an apostrophe is to be included in a constant, it is repeated:
'abc'
'ain''t'

There are no null (zero-length) character strings in Fortran 77. Our compiler has two
different quotation marks, ' and ". (See section 2.9 in the main text.)

4.2. Concatenation
One new operator has been added, character string concatenation, marked by a double
slash "//". The result of a concatenation is the string containing the characters of the
left operand followed by the characters of the right operand. The strings
'ab' // 'cd'
'abcd'

are equal. The strings being concatenated must be of constant length in all concatenations
that are not the right sides of assignments. (The only concatenation expressions in which
a character string declared adjustable with a "*(*)" modifier or a substring denotation
with nonconstant position values may appear are the right sides of assignments.)

4.3. Character String Assignment
The left and right sides of a character assignment may not share storage. (The assumed
implementation of character assignment is to copy characters from the right to the left
side.) If the left side is longer than the right, it is padded with blanks. If the left side is
shorter than the right, trailing characters are discarded.

4.4. Substrings
It is possible to extract a substring of a character variable or character array element, using
the colon notation:
a(i,j) (m:n)

is the string of (n-m+1) characters beginning at the m-th character of the character array
element a(i,j). Results are undefined unless m ≤ n. Substrings may be used on the left
sides of assignments and as procedure actual arguments.
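The rules for concatenation, assignment, and substrings can be tried together in a few lines. The following is a sketch; the variable names and the bracket characters used to make blanks visible are choices made for this example.

```fortran
c     Sketch: concatenation, blank padding, truncation, and
c     substrings.  Brackets make trailing blanks visible.
      program substr
      character*6 a
      character*3 b
c     'abcd' is padded with two blanks to fill a
      a = 'ab' // 'cd'
c     only the first three characters fit in b
      b = a
      write(6, '(5a)') '[', a, '] [', b, ']'
c     a substring on the left side of an assignment
      a(5:6) = 'xy'
      write(6, '(3a)') '[', a(2:5), ']'
      end
```

The first line of output shows a as abcd followed by two blanks and b as abc; the second shows the substring bcdx.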

4.5. Exponentiation

It is now permissible to raise real quantities to complex powers, or complex quantities to
real or complex powers. (The principal part of the logarithm is used.) Also, multiple
exponentiation is now defined:
a**b**c is equivalent to a ** (b**c)

4.6. Relaxation of Restrictions
Mixed mode expressions are now permitted. (For instance, it is permissible to combine
integer and complex quantities in an expression.)
Constant expressions are permitted where a constant is allowed, except in data statements. (A constant expression is made up of explicit constants and parameters and the
Fortran operators, except for exponentiation to a floating-point power.) An adjustable
dimension may now be an integer expression involving constants, arguments, and variables in common.
Subscripts may now be general integer expressions; the old cv ± c' rules have been
removed. do loop bounds may be general integer, real, or double precision expressions.
Computed goto expressions and I/O unit numbers may be general integer expressions.


5. Executable Statements
5.1. IF-THEN-ELSE
At last, the if-then-else branching structure has been added to Fortran. It is called a
"Block If". A Block If begins with a statement of the form
if ( ... ) then
and ends with an
end if
statement. Two other new statements may appear in a Block If. There may be several
else if ( ... ) then
statements, followed by at most one
else
statement. If the logical expression in the Block If statement is true, the statements following it up to the next else if, else, or end if are executed. Otherwise, the next else if
statement in the group is executed. If none of the else if conditions are true, control
passes to the statements following the else statement, if any. (The else block must follow
all else if blocks in a Block If. Of course, there may be Block Ifs embedded inside of
other Block If structures.) A case construct may be rendered:
if (s .eq. 'ab') then
else if (s .eq. 'cd') then
else
end if

5.2. Alternate Returns
Some of the arguments of a subroutine call may be statement labels preceded by an asterisk, as in:
call joe(j, *10, m, *2)
A return statement may have an integer expression, such as:
return k
If the entry point has n alternate return (asterisk) arguments and if 1 ≤ k ≤ n, the return
is followed by a branch to the corresponding statement label; otherwise the usual return to
the statement following the call is executed.
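A complete, if artificial, use of alternate returns might look like the following sketch. The subroutine name check, its limits, and the labels are all invented for this example.

```fortran
c     Sketch: a range check that reports its result through
c     alternate returns.  Names and limits are hypothetical.
      subroutine check(x, *, *)
      real x
c     branch to the first or second label given by the caller
      if (x .lt. 0.0) return 1
      if (x .gt. 100.0) return 2
c     plain return: continue after the call statement
      return
      end

      program altret
      real x
      x = -1.0
      call check(x, *10, *20)
      write(6, '(a)') ' in range'
      stop
   10 write(6, '(a)') ' negative'
      stop
   20 write(6, '(a)') ' too large'
      end
```

With x set to -1.0 the subroutine executes return 1, so control resumes at label 10 and the program prints "negative".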

6. Input/Output

6.1. Format Variables
A format may be the value of a character expression (constant or otherwise), or be stored
in a character array, as in:
write(6, '(i5)') x


6.2. END=, ERR=, and IOSTAT= Clauses
A read or write statement may contain end=, err=, and iostat= clauses, as in:
write(6, 101, err=20, iostat=a(4))
read(5, 101, err=20, end=30, iostat=x)
Here 5 and 6 are the units on which the I/O is done, 101 is the statement number of the
associated format, 20 and 30 are statement numbers, and a and x are integers. If an error
occurs during I/O, control returns to the program at statement 20. If the end of the file is
reached, control returns to the program at statement 30. In any case, the variable
referred to in the iostat= clause is given a value when the I/O statement finishes. (Yes,
the value is assigned to the name on the right side of the equal sign.) This value is zero if
all went well, negative for end of file, and some positive value for errors.
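One way to use the iostat= value is to drive a read loop from it, as in the following sketch. The program name, the i10 format, and the messages are assumptions made for illustration.

```fortran
c     Sketch: add up integers from the standard input, using the
c     sign of the iostat= value to decide what to do next.
      program sumup
      integer n, ios, total
      total = 0
   10 read(5, '(i10)', iostat=ios) n
c     negative: end of file; positive: an error occurred
      if (ios .lt. 0) goto 20
      if (ios .gt. 0) goto 30
      total = total + n
      goto 10
   20 write(6, '(a, i10)') ' total =', total
      stop
   30 write(0, '(a, i6)') ' read error, iostat =', ios
      end
```

The same control flow could be written with end=20 and err=30 on the read statement; iostat= has the advantage of making the error number available for the message.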
6.3. Formatted I/O
6.3.1. Character Constants
Character constants in formats are copied literally to the output. It is not allowed to read
into character constants or hollerith fields.
A format may be specified as a character constant within the read or write statement.
write(6,'(i2,'' isn''''t '',i1)') 7, 4
produces

 7 isn't 4
In the example above, the format is the character constant

(i2,' isn''t ',i1)
and the imbedded character constant
 isn't
is copied into the output.

The example could have been written more legibly by taking advantage of the two types
of quote marks.
write(6,'(i2," isn''t ",i1)') 7, 4
However, the double quote is not standard Fortran 77.
6.3.2. Positional Editing Codes
t, tl, tr, and x codes control where the next character is in the record. trn or nx specifies
that the next character is n to the right of the current position. tln specifies that the next
character is n to the left of the current position, allowing parts of the record to be reconsidered. tn says that the next character is to be character number n in the record. (See
section 3.3 in the main text.)
6.3.3. Colon
A colon in the format terminates the I/O operation if there are no more data items in the
I/O list, otherwise it has no effect. In the fragment
x='("hello", :, " there", i4)'
write(6, x) 12
write(6, x)
the first write statement prints
hello there 12
while the second only prints
hello

6.3.4. Optional Plus Signs
According to the Standard, each implementation has the option of putting plus signs in
front of non-negative numeric output. The sp format code may be used to make the
optional plus signs actually appear for all subsequent items while the format is active. The
ss format code guarantees that the I/O system will not insert the optional plus signs, and
the s format code restores the default behavior of the I/O system. (Since we never put
out optional plus signs, ss and s codes have the same effect in our implementation.)

6.3.5. Blanks on Input
Blanks in numeric input fields, other than leading blanks, will be ignored following a bn
code in a format statement, and will be treated as zeros following a bz code in a format
statement. The default for a unit may be changed by using the open statement. (Blanks
are ignored by default.)

6.3.6. Unrepresentable Values
The Standard requires that if a numeric item cannot be represented in the form required
by a format code, the output field must be filled with asterisks. (We think this should
have been an option.)

6.3.7. Iw.m
There is a new integer output code, iw.m. It is the same as iw, except that there will be at
least m digits in the output field, including, if necessary, leading zeros. The case iw.0 is
special, in that if the value being printed is 0, the output field is entirely blank. iw.1 is
the same as iw.
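The three cases can be compared side by side. In this sketch the brackets are added only to make the field boundaries visible.

```fortran
c     Sketch: the iw, iw.m, and iw.0 forms of the integer code.
      program iwm
c     plain i5: blank filled
      write(6, '(a, i5, a)') '[', 42, ']'
c     i5.3: at least three digits, so a leading zero appears
      write(6, '(a, i5.3, a)') '[', 42, ']'
c     i5.0 with a zero value: the field is entirely blank
      write(6, '(a, i5.0, a)') '[', 0, ']'
      end
```

The three lines of output are [   42], [  042], and [     ].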

6.3.8. Floating Point
On input, exponents may start with the letter E, D, e, or d. All have the same meaning.
On output we always use e or d. The e and d format codes also have identical meanings.
A leading zero before the decimal point in e output without a scale factor is optional with
the implementation. There is a gw.d format code which is the same as ew.d and fw.d on
input, but which chooses f or e formats for output depending on the size of the number
and of d.

6.3.9. "A" Format Code
The a code is used for character data. aw uses a field width of w, while a plain a uses the
length of the internal character item.

6.4. Standard Units
There are default formatted input and output units. The statement

read 10, a, b
reads from the standard unit using format statement 10. The default unit may be explicitly specified by an asterisk, as in
read(*, 10) a,b
Similarly, the standard output unit is specified by a print statement or an asterisk unit:
print 10
write(*, 10)

6.5. List-Directed Formatting
List-directed 1/0 is a kind of free form input for sequential 1/0. It is invoked by using an
asterisk as the format identifier, as in
read(6, *) a,b,c
On input, values are separated by strings of blanks and possibly a comma. Values, except
for character strings, cannot contain blanks. End of record counts as a blank, except in
character strings, where it is ignored. Complex constants are given as two real constants
separated by a comma and enclosed in parentheses. A null input field, such as between
two consecutive commas, means the corresponding variable in the 1/0 list is not changed.
Values may be preceded by repetition counts, as in

4*(3.,2.) 2*, 4*'hello'
which stands for 4 complex constants, 2 null values, and 4 string constants.
For output, suitable formats are chosen for each item. The values of character strings are
printed; they are not enclosed in quotes, so they cannot be read back using list-directed
input.
6.6. Direct I/O
A file connected for direct access consists of a set of equal-sized records each of which is
uniquely identified by a positive integer. The records may be written or read in any order,
using direct access 1/0 statements.
Direct access read and write statements have an extra argument, rec=, which gives the
record number to be read or written.
read(2, rec=13, err=20) (a(i), i=1, 203)
reads the thirteenth record into the array a.
The size of the records must be given by an open statement (see below). Direct access
files may be connected for either formatted or unformatted I/O.
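A small formatted direct access file can be written and read back out of order as follows. This is a sketch: the unit number 8, the file name recs.dat, and the record length of 10 bytes are arbitrary choices for the example.

```fortran
c     Sketch: write five fixed-length records, then read the
c     third one directly.  Unit, name, and recl are hypothetical.
      program direct
      integer i, k
      open(8, file='recs.dat', access='direct',
     +     form='formatted', recl=10)
      do 10 i = 1, 5
         write(8, '(i10)', rec=i) i*i
   10 continue
c     records may be read in any order
      read(8, '(i10)', rec=3) k
      write(6, '(i4)') k
      close(8, status='delete')
      end
```

Record 3 holds the square of 3, so the program prints 9. The i10 format fills each 10-byte record exactly.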

6.7. Internal Files
Internal files are character string objects, such as variables or substrings, or arrays of type
character. In the former cases there is only a single record in the file; in the latter case
each array element is a record. The Standard includes only sequential formatted I/O on
internal files. (I/O is not a very precise term to use here, but internal files are dealt with
using read and write.) There is no list-directed I/O on internal files. Internal files are
used by giving the name of the character object in place of the unit number, as in
character*80 x
read(5,'(a)') x
read(x,'(i3,i4)') n1,n2
which reads a character string into x and then reads two integers from the front of it. A
sequential read or write always starts at the beginning of an internal file.
We also support a compatible extension, direct I/O on internal files. This is like direct I/O
on external files, except that the number of records in the file cannot be changed. In this
case a record is a single element of an array of character strings.
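Internal files work for output as well as input. The following sketch formats two integers into a character variable and then decodes them again; the variable names are invented for the example.

```fortran
c     Sketch: a write to, then a read from, an internal file.
      program intern
      character*20 buf
      integer n1, n2
c     encode two integers into the character variable
      write(buf, '(i3, i4)') 12, 345
c     each read starts again at the beginning of buf
      read(buf, '(i3, i4)') n1, n2
      write(6, '(2i6)') n1, n2
      end
```

The program prints 12 and 345, the values recovered from the first seven characters of buf.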

6.8. OPEN, CLOSE, and INQUIRE Statements
These statements are used to connect and disconnect units and files, and to gather information about units and files.
6.8.1. OPEN
The open statement is used to connect a file with a unit, or to alter some properties of the
connection. The following is a minimal example.
open(1, file='fortjunk')
open takes a variety of arguments with meanings described below.
unit= a small non-negative integer which is the unit to which the file is to be connected.
We allow, at the time of this writing, 0 through 19. If this parameter is the first one
in the open statement, the unit= can be omitted.
iostat= is the same as in read or write.
err= is the same as in read or write.
file= a character expression, which when stripped of trailing blanks, is the name of the
file to be connected to the unit. The filename should not be given if the
status='scratch'.
status= one of 'old', 'new', 'scratch', or 'unknown'. If this parameter is not given,
'unknown' is assumed. The meaning of 'unknown' is processor dependent; our system will create the file if it doesn't exist. If 'scratch' is given, a temporary file will
be created. Temporary files are destroyed at the end of execution. If 'new' is given,
the file must not exist. It will be created for both reading and writing. If 'old' is
given, it is an error for the file not to exist.
access= 'sequential' or 'direct', depending on whether the file is to be opened for
sequential or direct I/O.
form= 'formatted' or 'unformatted'. On UNIX systems form='print' implies 'formatted'
with vertical format control.
recl= a positive integer specifying the record length of the direct access file being opened.
We measure all record lengths in bytes. On UNIX systems a record length of 1 has
the special meaning explained in section 5.1 of the text.
blank= 'null' or 'zero'. This parameter has meaning only for formatted I/O. The default
value is 'null'. 'zero' means that blanks, other than leading blanks, in numeric input
fields are to be treated as zeros.
Opening a new file on a unit which is already connected has the effect of first closing the
old file.

6.8.2. CLOSE
close severs the connection between a unit and a file. The unit number must be given.
The optional parameters are iostat= and err= with their usual meanings, and status=
either 'keep' or 'delete'. For scratch files the default is 'delete'; otherwise 'keep' is the
default. 'delete' means the file will be removed. A simple example is
close(3, err=17)
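The open and close statements can be seen working together on a scratch file. This is a minimal sketch; the unit number and the message text are assumptions made for the example.

```fortran
c     Sketch: use a scratch file as temporary storage.  No file
c     name is given because status='scratch'.
      program scr
      integer k
      open(3, status='scratch', form='formatted')
      write(3, '(i6)') 17
c     go back to the beginning before reading
      rewind(3)
      read(3, '(i6)') k
      write(6, '(i6)') k
c     the default status for a scratch file is 'delete'
      close(3, err=17)
      stop
   17 write(0, '(a)') ' close failed'
      end
```

The value 17 written to the scratch file is read back and printed, and the temporary file is removed by the close.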

6.8.3. INQUIRE
The inquire statement gives information about a unit ("inquire by unit") or a file
("inquire by file"). Simple examples are:
inquire(unit=3, name=xx)
inquire(file='junk', number=n, exist=l)
file= a character variable specifies the file the inquire is about. Trailing blanks in the file
name are ignored.
unit= an integer variable specifies the unit the inquire is about. Exactly one of file= or
unit= must be used.
iostat=, err= are as before.
exist= a logical variable. The logical variable is set to .true. if the file or unit exists and
is set to .false. otherwise.
opened= a logical variable. The logical variable is set to .true. if the file is connected to
a unit or if the unit is connected to a file, and it is set to .false. otherwise.
number= an integer variable to which is assigned the number of the unit connected to
the file, if any.
named= a logical variable to which is assigned .true. if the file has a name, or .false.
otherwise.
name= a character variable to which is assigned the name of the file (inquire by file) or
the name of the file connected to the unit (inquire by unit). The name will be the
full name of the file.
access= a character variable to which will be assigned the value 'sequential' if the connection is for sequential I/O, 'direct' if the connection is for direct I/O. The value
becomes undefined if there is no connection.
sequential= a character variable to which is assigned the value 'yes' if the file could be
connected for sequential I/O, 'no' if the file could not be connected for sequential
I/O, and 'unknown' if we can't tell.
direct= a character variable to which is assigned the value 'yes' if the file could be connected for direct I/O, 'no' if the file could not be connected for direct I/O, and 'unknown' if we can't tell.
form= a character variable to which is assigned the value 'unformatted' if the file is connected for unformatted I/O, 'formatted' if the file is connected for formatted I/O, or
'print' for formatted I/O with vertical format control.
formatted= a character variable to which is assigned the value 'yes' if the file could be
connected for formatted I/O, 'no' if the file could not be connected for formatted
I/O, and 'unknown' if we can't tell.
unformatted= a character variable to which is assigned the value 'yes' if the file could be
connected for unformatted I/O, 'no' if the file could not be connected for unformatted I/O, and 'unknown' if we can't tell.
recl= an integer variable to which is assigned the record length of the records in the file
if the file is connected for direct access.
nextrec= an integer variable to which is assigned one more than the number of the
last record read from a file connected for direct access.
blank= a character variable to which is assigned the value 'null' if null blank control is in
effect for the file connected for formatted I/O, 'zero' if blanks are being converted to
zeros and the file is connected for formatted I/O.
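A few of these parameters can be exercised with a short program. This is a sketch only: the variable names are invented, the file name is borrowed from the open example, and the exact text returned in the name= variable depends on the system.

```fortran
c     Sketch: inquire by unit after connecting a file.  The
c     variable names are hypothetical.
      program inq
      logical ex, op
      integer n
      character*64 nm
      open(1, file='fortjunk')
      inquire(unit=1, exist=ex, opened=op, number=n, name=nm)
      write(6, '(2l2, i3, 1x, a)') ex, op, n, nm
      close(1, status='delete')
      end
```

Both logical values should come back true and the unit number should be 1, since the unit has just been connected.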
The gentle reader will remember that the people who wrote the Standard probably weren't
thinking of his needs. Here is an example. The declarations are omitted.
open(1, file='/dev/console')
On a UNIX system this statement opens the console for formatted sequential I/O. An
inquire statement for either unit 1 or file "/dev/console" would reveal that the file exists,
is connected to unit 1, has a name, namely "/dev/console", is opened for sequential I/O,
could be connected for sequential I/O, could not be connected for direct I/O (can't seek),
is connected for formatted I/O, could be connected for formatted I/O, could not be connected for unformatted I/O (can't seek), has neither a record length nor a next record
number, and is ignoring blanks in numeric fields.
In the FORTRAN environment, the only way to discover what permissions you have for a
file is to open it and try to read and write it. The err= parameter will return system error
numbers. The inquire statement does not give a way of determining permissions.
For further discussion of the UNIX Fortran I/O system see "Introduction to the f77 I/O
Library" [9].


APPENDIX B: References and Bibliography
References
1. American National Standard Programming Language FORTRAN, ANSI X3.9-1978. New York:
American National Standards Institute, 1978.
2. USA Standard FORTRAN, USAS X3.9-1966. New York: United States of America Standards
Institute, 1966. Clarified in Comm. ACM 12:289 (1969) and Comm. ACM 14:628 (1971).
3. Kernighan, B. W., and D. M. Ritchie. The C Programming Language. Englewood Cliffs:
Prentice-Hall, 1978.
4. Ritchie, D. M. Private communication.
5. Johnson, S. C. "A Portable Compiler: Theory and Practice," Proceedings of Fifth ACM Symposium on Principles of Programming Languages. 1978.
6. Feldman, S. I. "An Informal Description of EFL," internal memorandum.
7. Kernighan, B. W. "RATFOR-A Preprocessor for Rational Fortran," Bell Laboratories Computing Science Technical Report #55. 1977.
8. Ritchie, D. M. Private communication.
9. Wasley, D. L. "Introduction to the f77 I/O Library", UNIX Programmer's Manual, Volume
2c.
Bibliography

The following books or documents describe aspects of Fortran 77. This list cannot pretend to
be complete. Certainly no particular endorsement is implied.
1. Brainerd, Walter S., et al. Fortran 77 Programming. Harper & Row, 1978.
2. Day, A. C. Compatible Fortran. Cambridge University Press, 1979.
3. Dock, V. Thomas. Structured Fortran IV Programming. West, 1979.
4. Feldman, S. I. "The Programming Language EFL," Bell Laboratories Technical Report. June
1979.
5. Hume, J. N., and R. C. Holt. Programming Fortran 77. Reston, 1979.
6. Katzan, Harry, Jr. Fortran 77. Van Nostrand-Reinhold, 1978.
7. Meissner, Loren P., and Organick, Elliott I. Fortran 77 Featuring Structured Programming,
Addison-Wesley, 1979.
8. Merchant, Michael J. ABC's of Fortran Programming. Wadsworth, 1979.
9. Page, Rex, and Richard Didday. Fortran 77 for Humans. West, 1980.
10. Wagener, Jerrold L. Principles of Fortran 77 Programming. Wiley, 1980.

RATFOR 2-111

RATFOR - A Preprocessor for a Rational Fortran
Brian W. Kernighan
Bell Laboratories
Murray Hill, New Jersey 07974
1. INTRODUCTION

Most programmers will agree that Fortran
is an unpleasant language to program in, yet
there are many occasions when they are forced
to use it. For example, Fortran is often the only
language thoroughly supported on the local computer. Indeed, it is the closest thing to a universal programming language currently available:
with care it is possible to write large, truly portable Fortran programs[1]. Finally, Fortran is
often the most "efficient" language available,
particularly for programs requiring much computation.
But Fortran is unpleasant. Perhaps the
worst deficiency is in the control flow statements
- conditional branches and loops - which
express the logic of the program. The conditional statements in Fortran are primitive. The
Arithmetic IF forces the user into at least two
statement numbers and two (implied) GOTO's; it
leads to unintelligible code, and is eschewed by
good programmers. The Logical IF is better, in
that the test part can be stated clearly, but
hopelessly restrictive because the statement that
follows the IF can only be one Fortran statement
(with some further restrictions!). And of course
there can be no ELSE part to a Fortran IF: there
is no way to specify an alternative action if the
IF is not satisfied.
The Fortran DO restricts the user to going
forward in an arithmetic progression. It is fine
for "1 to N in steps of 1 (or 2 or ...)", but there
is no direct way to go backwards, or even (in
ANSI Fortran[2]) to go from 1 to N-1. And of
course the DO is useless if one's problem doesn't
map into an arithmetic progression.
The result of these failings is that Fortran
programs must be written with numerous labels
and branches. The resulting code is particularly
difficult to read and understand, and thus hard
to debug and modify.

When one is faced with an unpleasant
language, a useful technique is to define a new
language that overcomes the deficiencies, and to
translate it into the unpleasant one with a
preprocessor. This is the approach taken with
Ratfor. (The preprocessor idea is of course not
new, and preprocessors for Fortran are especially
popular today. A recent listing [3] of preprocessors shows more than 50, of which at least half a
dozen are widely available.)
2. LANGUAGE DESCRIPTION
Design
Ratfor attempts to retain the merits of
Fortran (universality, portability, efficiency)
while hiding the worst Fortran inadequacies.
The language is Fortran except for two aspects.
First, since control flow is central to any program, regardless of the specific application, the
primary task of Ratfor is to conceal this part of
Fortran from the user, by providing decent control flow structures. These structures are
sufficient and comfortable for structured programming in the narrow sense of programming
without GOTO's. Second, since the preprocessor
must examine an entire program to translate the
control structure, it is possible at the same time
to clean up many of the "cosmetic" deficiencies
of Fortran, and thus provide a language which is
easier and more pleasant to read and write.
Beyond these two aspects - control flow
and cosmetics - Ratfor does nothing about the
host of other weaknesses of Fortran. Although it
would be straightforward to extend it to provide
character strings, for example, they are not
needed by everyone, and of course the preprocessor would be harder to implement.
Throughout, the design principle which has
determined what should be in Ratfor and what
should not has been: Ratfor doesn't know any
Fortran. Any language feature which would
require that Ratfor really understand Fortran
has been omitted. We will return to this point
in the section on implementation.

This paper is a revised and expanded version of one published in Software-Practice and Experience, October 1975. The Ratfor described here is the one in use on UNIX and GCOS at Bell Laboratories, Murray Hill, N. J. UNIX is a Trademark of Bell Laboratories.

Even within the confines of control flow
and cosmetics, we have attempted to be selective
in what features to provide. The intent has
been to provide a small set of the most useful
constructs, rather than to throw in everything
that has ever been thought useful by someone.
The rest of this section contains an informal description of the Ratfor language. The
control flow aspects will be quite familiar to
readers used to languages like Algol, PL/I, Pascal, etc., and the cosmetic changes are equally
straightforward. We shall concentrate on showing what the language looks like.
Statement Grouping
Fortran provides no way to group statements together, short of making them into a
subroutine. The standard construction "if a
condition is true, do this group of things," for
example,
    if (x > 100)
        { call error("x>100"); err = 1; return }
cannot be written directly in Fortran. Instead a
programmer is forced to translate this relatively
clear thought into murky Fortran, by stating the
negative condition and branching around the
group of statements:
          if (x .le. 100) goto 10
          call error(5hx>100)
          err = 1
          return
    10    ...
When the program doesn't work, or when it
must be modified, this must be translated back
into a clearer form before one can be sure what
it does.
Ratfor eliminates this error-prone and
confusing back-and-forth translation; the first
form is the way the computation is written in
Ratfor. A group of statements can be treated as
a unit by enclosing them in the braces { and }.
This is true throughout the language: wherever a
single Ratfor statement can be used, there can
be several enclosed in braces. (Braces seem
clearer and less obtrusive than begin and end
or do and end, and of course do and end
already have Fortran meanings.)
Cosmetics contribute to the readability of
code, and thus to its understandability. The
character ">" is clearer than ".GT.", so Ratfor
translates it appropriately, along with several
other similar shorthands. Although many Fortran compilers permit character strings in quotes
(like "x>100"), quotes are not allowed in ANSI
Fortran, so Ratfor converts it into the right
number of H's: computers count better than
people do.
Ratfor is a free-form language: statements
may appear anywhere on a line, and several may
appear on one line if they are separated by semicolons. The example above could also be written
as
    if (x > 100) {
        call error("x>100")
        err = 1
        return
    }
In this case, no semicolon is needed at the end
of each line because Ratfor assumes there is one
statement per line unless told otherwise.
Of course, if the statement that follows the
if is a single statement (Ratfor or otherwise), no
braces are needed:
    if (y <= 0.0 & z <= 0.0)
        write(6, 20) y, z
No continuation need be indicated because the
statement is clearly not finished on the first line.
In general Ratfor continues lines when it seems
obvious that they are not yet done. (The continuation convention is discussed in detail later.)
Although a free-form language permits
wide latitude in formatting styles, it is wise to
pick one that is readable, then stick to it. In
particular, proper indentation is vital, to make
the logical structure of the program obvious to
the reader.

The "else" Clause
Ratfor provides an else statement to handle the construction "if a condition is true, do
this thing, otherwise do that thing."
    if (a <= b)
        { sw = 0; write(6, 1) a, b }
    else
        { sw = 1; write(6, 1) b, a }
This writes out the smaller of a and b, then the
larger, and sets sw appropriately.
The Fortran equivalent of this code is circuitous indeed:
          if (a .gt. b) goto 10
          sw = 0
          write(6, 1) a, b
          goto 20
    10    sw = 1
          write(6, 1) b, a
    20    ...
This is a mechanical translation; shorter forms
exist, as they do for many similar situations.
But all translations suffer from the same problem: since they are translations, they are less
clear and understandable than code that is not a
translation. To understand the Fortran version,
one must scan the entire program to make sure
that no other statement branches to statements
10 or 20 before one knows that indeed this is an
if-else construction. With the Ratfor version,
there is no question about how one gets to the
parts of the statement. The if-else is a single
unit, which can be read, understood, and ignored
if not relevant. The program says what it
means.
As before, if the statement following an if
or an else is a single statement, no braces are
needed:
    if (a <= b)
        sw = 0
    else
        sw = 1
The syntax of the if statement is
    if (legal Fortran condition)
        Ratfor statement
    else
        Ratfor statement
where the else part is optional. The legal Fortran condition is anything that can legally go
into a Fortran Logical IF. Ratfor does not check
this clause, since it does not know enough Fortran to know what is permitted. The Ratfor
statement is any Ratfor or Fortran statement, or
any collection of them in braces.

Nested if's
Since the statement that follows an if or
an else can be any Ratfor statement, this leads
immediately to the possibility of another if or
else. As a useful example, consider this problem: the variable f is to be set to -1 if x is less
than zero, to +1 if x is greater than 100, and to
0 otherwise. Then in Ratfor, we write
    if (x < 0)
        f = -1
    else if (x > 100)
        f = +1
    else
        f = 0
Here the statement after the first else is
another if-else. Logically it is just a single
statement, although it is rather complicated.
This code says what it means. Any version written in straight Fortran will necessarily
be indirect because Fortran does not let you say
what you mean. And as always, clever shortcuts
may turn out to be too clever to understand a
year from now.
Following an else with an if is one way to
write a multi-way branch in Ratfor. In general
the structure
    if (...)
    else if (...)
    else if (...)
    else
provides a way to specify the choice of exactly
one of several alternatives. (Ratfor also provides
a switch statement which does the same job in
certain special cases; in more general situations,
we have to make do with spare parts.) The tests
are laid out in sequence, and each one is followed by the code associated with it. Read
down the list of decisions until one is found that
is satisfied. The code associated with this condition is executed, and then the entire structure is
finished. The trailing else part handles the
"default" case, where none of the other conditions apply. If there is no default action, this
final else part is omitted:
    if (x < 0)
        x = 0
    else if (x > 100)
        x = 100

if-else ambiguity
There is one thing to notice about complicated structures involving nested if's and else's.
Consider
    if (x > 0)
        if (y > 0)
            write(6, 1) x, y
        else
            write(6, 2) y
There are two if's and only one else. Which if
does the else go with?
This is a genuine ambiguity in Ratfor, as
it is in many other programming languages. The
ambiguity is resolved in Ratfor (as elsewhere) by
saying that in such cases the else goes with the
closest previous un-else'ed if. Thus in this case,
the else goes with the inner if, as we have indicated by the indentation.
It is a wise practice to resolve such cases
by explicit braces, just to make your intent clear.
In the case above, we would write
    if (x > 0) {
        if (y > 0)
            write(6, 1) x, y
        else
            write(6, 2) y
    }
which does not change the meaning, but leaves
no doubt in the reader's mind. If we want the
other association, we must write
    if (x > 0) {
        if (y > 0)
            write(6, 1) x, y
    }
    else
        write(6, 2) y

The "switch" Statement
The switch statement provides a clean
way to express multi-way branches which branch
on the value of some integer-valued expression.
The syntax is
    switch (expression) {
        case expr1:
            statements
        case expr2, expr3:
            statements
        default:
            statements
    }
Each case is followed by a list of comma-separated integer expressions. The expression
inside switch is compared against the case
expressions expr1, expr2, and so on in turn until
one matches, at which time the statements following that case are executed. If no cases
match expression, and there is a default section, the statements with it are done; if there is
no default, nothing is done. In all situations,
as soon as some block of statements is executed,
the entire switch is exited immediately.
(Readers familiar with C[4] should beware that
this behavior is not the same as the C switch.)

The "do" Statement
The do statement in Ratfor is quite similar to the DO statement in Fortran, except that it
uses no statement number. The statement
number, after all, serves only to mark the end of
the DO, and this can be done just as easily with
braces. Thus
    do i = 1, n {
        x(i) = 0.0
        y(i) = 0.0
        z(i) = 0.0
    }
is the same as
          do 10 i = 1, n
          x(i) = 0.0
          y(i) = 0.0
          z(i) = 0.0
    10    continue

The syntax is:
do legal-Fortran-DO-text
Ratfor statement
The part that follows the keyword do has to be
something that can legally go into a Fortran DO
statement. Thus if a local version of Fortran
allows DO limits to be expressions (which is not
currently permitted in ANSI Fortran), they can
be used in a Ratfor do.
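
As a rough illustration of why the user never needs to write the statement number, the following Python sketch (not part of Ratfor) shows how a translator could supply the label itself. The function name translate_do and the default label 10 are assumptions made for this example only; the text does not say which numbers the real preprocessor picks.

```python
def translate_do(control, body, label=10):
    """Turn a Ratfor-style do (control text plus body lines) into labeled Fortran.

    `control` is the text after the keyword do, e.g. "i = 1, n".
    `label` is a hypothetical statement number chosen by the translator.
    """
    # The control text is copied through untouched; it is never parsed.
    lines = ["      do %d %s" % (label, control)]
    # Body statements start in column 7.
    lines += ["      " + stmt for stmt in body]
    # The label field occupies columns 1-5; column 6 is left blank.
    lines.append("%-5d continue" % label)
    return lines


for line in translate_do("i = 1, n", ["x(i) = 0.0", "y(i) = 0.0"]):
    print(line)
```

Because the control text is passed along unexamined, anything the local Fortran accepts in a DO statement works here too, which is the point the paragraph above makes.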
The Ratfor statement part will often be
enclosed in braces, but as with the if, a single
statement need not have braces around it. This
code sets an array to zero:
    do i = 1, n
        x(i) = 0.0
Slightly more complicated,
    do i = 1, n
        do j = 1, n
            m(i, j) = 0
sets the entire array m to zero, and
    do i = 1, n
        do j = 1, n
            if (i < j)
                m(i, j) = -1
            else if (i == j)
                m(i, j) = 0
            else
                m(i, j) = +1
sets the upper triangle of m to -1, the diagonal
to zero, and the lower triangle to +1. (The
operator == is "equals", that is, ".EQ.".) In each
case, the statement that follows the do is logically a single statement, even though complicated, and thus needs no braces.

"break" and "next"
Ratfor provides a statement for leaving a
loop early, and one for beginning the next iteration. break causes an immediate exit from the
do; in effect it is a branch to the statement after
the do. next is a branch to the bottom of the
loop, so it causes the next iteration to be done.
For example, this code skips over negative values
in an array:
    do i = 1, n {
        if (x(i) < 0.0)
            next
        process positive element
    }
break and next also work in the other Ratfor
looping constructions that we will talk about in
the next few sections.
break and next can be followed by an
integer to indicate breaking or iterating that
level of enclosing loop; thus
break 2
exits from two levels of enclosing loops, and
break 1 is equivalent to break. next 2
iterates the second enclosing loop. (Realistically,
multi-level break's and next's are not likely to
be much used because they lead to code that is
hard to understand and somewhat risky to
change.)
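
For readers who know a language with similar statements, the sketch below restates the behavior in Python; it is an analogy, not Ratfor. Python's continue plays the role of next and its break the role of break. Python has no multi-level break, so the effect of "break 2" is imitated with a flag. The function name and the sentinel value are assumptions for the example.

```python
def sum_until_sentinel(rows, sentinel=-1):
    """Add up non-negative entries row by row; stop everything at the sentinel."""
    total = 0
    done = False
    for row in rows:              # outer loop
        for v in row:             # inner loop
            if v == sentinel:
                done = True       # together with the test below, acts like "break 2"
                break             # leave the inner loop ...
            if v < 0:
                continue          # like "next": go on to the next iteration
            total += v
        if done:
            break                 # ... and then the outer loop as well
    return total
```

The extra flag is exactly the kind of bookkeeping that a multi-level break removes, at the cost of the readability concerns noted above.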
The "while" Statement
One of the problems with the Fortran DO
statement is that it generally insists upon being
done once, regardless of its limits. If a loop
begins
    DO I = 2, 1
this will typically be done once with I set to 2,
even though common sense would suggest that
perhaps it shouldn't be. Of course a Ratfor do
can easily be preceded by a test
    if (j <= k)
        do i = j, k {
            ...
        }
but this has to be a conscious act, and is often
overlooked by programmers.
A more serious problem with the DO statement is that it encourages that a program be
written in terms of an arithmetic progression
with small positive steps, even though that may
not be the best way to write it. If code has to be
contorted to fit the requirements imposed by the
Fortran DO, it is that much harder to write and
understand.
To overcome these difficulties, Ratfor provides a while statement, which is simply a loop:
"while some condition is true, repeat this group
of statements". It has no preconceptions about
why one is looping. For example, this routine to
compute sin(x) by the Maclaurin series combines
two termination criteria.
    real function sin(x, e)
    # returns sin(x) to accuracy e, by
    # sin(x) = x - x**3/3! + x**5/5! - ...
    sin = x
    term = x
    i = 3
    while (abs(term)>e & i<100) {
        term = -term * x**2 / float(i*(i-1))
        sin = sin + term
        i = i + 2
    }
    return
    end
Notice that if the routine is entered with
term already smaller than e, the loop will be
done zero times, that is, no attempt will be
made to compute x**3 and thus a potential
underflow is avoided. Since the test is made at
the top of a while loop instead of the bottom, a
special case disappears - the code works at one
of its boundaries. (The test i<100 is the other
boundary - making sure the routine stops after
some maximum number of iterations.)
As an aside, a sharp character "#" in a
line marks the beginning of a comment; the rest
of the line is comment. Comments and code can
co-exist on the same line - one can make marginal remarks, which is not possible with
Fortran's "C in column 1" convention. Blank
lines are also permitted anywhere (they are not
in Fortran); they should be used to emphasize
the natural divisions of a program.
The syntax of the while statement is
    while (legal Fortran condition)
        Ratfor statement
As with the if, legal Fortran condition is something that can go into a Fortran Logical IF, and
Ratfor statement is a single statement, which
may be multiple statements in braces.
The while encourages a style of coding
not normally practiced by Fortran programmers.
For example, suppose nextch is a function
which returns the next input character both as a
function value and in its argument. Then a loop
to find the first non-blank character is just
    while (nextch(ich) == iblank)
        ;
A semicolon by itself is a null statement, which
is necessary here to mark the end of the while;
if it were not present, the while would control
the next statement. When the loop is broken,
ich contains the first non-blank. Of course the
same code can be written in Fortran as
    100   if (nextch(ich) .eq. iblank) goto 100
but many Fortran programmers (and a few compilers) believe this line is illegal. The language
at one's disposal strongly influences how one
thinks about a problem.

The "for" Statement
The for statement is another Ratfor loop,
which attempts to carry the separation of loop-body from reason-for-looping a step further than
the while. A for statement allows explicit initialization and increment steps as part of the
statement. For example, a DO loop is just
    for (i = 1; i <= n; i = i + 1) ...
This is equivalent to
    i = 1
    while (i <= n) {
        ...
        i = i + 1
    }
The initialization and increment of i have been
moved into the for statement, making it easier
to see at a glance what controls the loop.
The for and while versions have the
advantage that they will be done zero times if n
is less than 1; this is not true of the do.
The loop of the sine routine in the previous section can be re-written with a for as
    for (i=3; abs(term) > e & i < 100; i=i+2) {
        term = -term * x**2 / float(i*(i-1))
        sin = sin + term
    }
The syntax of the for statement is
    for ( init ; condition ; increment )
        Ratfor statement
init is any single Fortran statement, which gets
done once before the loop begins. increment is
any single Fortran statement, which gets done at
the end of each pass through the loop, before the
test. condition is again anything that is legal in
a logical IF. Any of init, condition, and increment may be omitted, although the semicolons
must always be present. A non-existent condition is treated as always true, so for(;;) is an
indefinite repeat. (But see the repeat-until in
the next section.)
The for statement is particularly useful
for backward loops, chaining along lists, loops
that might be done zero times, and similar
things which are hard to express with a DO statement, and obscure to write out with IF's and
GOTO's. For example, here is a backwards DO
loop to find the last non-blank character on a
card:
    for (i = 80; i > 0; i = i - 1)
        if (card(i) != blank)
            break
("!=" is the same as ".NE."). The code scans the
columns from 80 through to 1. If a non-blank is
found, the loop is immediately broken. (break
and next work in for's and while's just as in
do's). If i reaches zero, the card is all blank.
This code is rather nasty to write with a
regular Fortran DO, since the loop must go forward, and we must explicitly set up proper conditions when we fall out of the loop. (Forgetting
this is a common error.) Thus:
          DO 10 J = 1, 80
          I = 81 - J
          IF (CARD(I) .NE. BLANK) GO TO 11
    10    CONTINUE
          I = 0
    11    ...
The version that uses the for handles the termination condition properly for free; i is zero when
we fall out of the for loop.
The increment in a for need not be an
arithmetic progression; the following program
walks along a list (stored in an integer array
ptr) until a zero pointer is found, adding up elements from a parallel array of values:
    sum = 0.0
    for (i = first; i > 0; i = ptr(i))
        sum = sum + value(i)
Notice that the code works correctly if the list is
empty. Again, placing the test at the top of a
loop instead of the bottom eliminates a potential
boundary error.
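
The list-walking loop can be tried out directly. The following Python sketch (an illustration, not Ratfor) mirrors it with a loop that tests at the top; the arrays are padded at index 0 so that subscripts start at 1 as they do in Fortran, and the function name is an assumption for the example.

```python
def sum_list(first, ptr, value):
    """Sum the values on a linked list stored in parallel arrays.

    Subscripts are 1-based as in Fortran, so element 0 of each Python list
    is unused padding; a pointer of 0 marks the end of the list.
    """
    total = 0.0
    i = first
    while i > 0:            # test at the top: an empty list does no work
        total += value[i]
        i = ptr[i]          # the "increment" step follows the chain
    return total


ptr = [0, 3, 0, 2]          # element 1 -> 3 -> 2 -> end
value = [0.0, 1.5, 2.5, 4.0]
print(sum_list(1, ptr, value))
print(sum_list(0, ptr, value))   # empty list
```

Calling it with first equal to 0 exercises the empty-list boundary that the text mentions: the body is never entered and the sum stays at zero.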

The "repeat-until" statement
In spite of the dire warnings, there are
times when one really needs a loop that tests at
the bottom after one pass through. This service
is provided by the repeat-until:
    repeat
        Ratfor statement
    until (legal Fortran condition)
The Ratfor statement part is done once, then
the condition is evaluated. If it is true, the loop
is exited; if it is false, another pass is made.
The until part is optional, so a bare
repeat is the cleanest way to specify an infinite
loop. Of course such a loop must ultimately be
broken by some transfer of control such as stop,
return, or break, or an implicit stop such as
running out of input with a READ statement.
As a matter of observed fact[8], the
repeat-until statement is much less used than
the other looping constructions; in particular, it
is typically outnumbered ten to one by for and
while. Be cautious about using it, for loops that
test only at the bottom often don't handle null
cases well.

More on break and next
break exits immediately from do, while,
for, and repeat-until. next goes to the test
part of do, while and repeat-until, and to the
increment step of a for.

"return" Statement
The standard Fortran mechanism for
returning a value from a function uses the name
of the function as a variable which can be
assigned to; the last value stored in it is the
function value upon return. For example, here
is a routine equal which returns 1 if two arrays
are identical, and zero if they differ. The array
ends are marked by the special value -1.
    # equal - compare str1 to str2;
    #    return 1 if equal, 0 if not
    integer function equal(str1, str2)
    integer str1(100), str2(100)
    integer i

    for (i = 1; str1(i) == str2(i); i = i + 1)
        if (str1(i) == -1) {
            equal = 1
            return
        }
    equal = 0
    return
    end
In many languages (e.g., PL/I) one instead
says
    return (expression)
to return a value from a function. Since this is
often clearer, Ratfor provides such a return
statement: in a function F,
return(expression) is equivalent to
    { F = expression; return }
For example, here is equal again:
    # equal - compare str1 to str2;
    #    return 1 if equal, 0 if not
    integer function equal(str1, str2)
    integer str1(100), str2(100)
    integer i

    for (i = 1; str1(i) == str2(i); i = i + 1)
        if (str1(i) == -1)
            return(1)
    return(0)
    end
If there is no parenthesized expression after
return, a normal RETURN is made. (Another
version of equal is presented shortly.)

Cosmetics
As we said above, the visual appearance of
a language has a substantial effect on how easy
it is to read and understand programs. Accordingly, Ratfor provides a number of cosmetic
facilities which may be used to make programs
more readable.

Free-form Input
Statements can be placed anywhere on a
line; long statements are continued automatically, as are long conditions in if, while, for,
and until. Blank lines are ignored. Multiple
statements may appear on one line, if they are
separated by semicolons. No semicolon is
needed at the end of a line, if Ratfor can make
some reasonable guess about whether the statement ends there. Lines ending with any of the
characters
    =  +  -  *  ,  |  &  (  _
are assumed to be continued on the next line.
Underscores are discarded wherever they occur;
all others remain as part of the statement.
Any statement that begins with an all-numeric field is assumed to be a Fortran label,
and placed in columns 1-5 upon output. Thus
    write(6, 100); 100 format("hello")
is converted into
          write(6, 100)
    100   format(5hhello)

Translation Services
Text enclosed in matching single or double
quotes is converted to nH... but is otherwise
unaltered (except for formatting - it may get
split across card boundaries during the reformatting process). Within quoted strings, the
backslash '\' serves as an escape character: the
next character is taken literally. This provides a
way to get quotes (and of course the backslash
itself) into quoted strings:
    "\\\'"
is a string containing a backslash and an apostrophe. (This is not the standard convention of
doubled quotes, but it is easier to use and more
general.)
Any line that begins with the character
'%' is left absolutely unaltered except for stripping off the '%' and moving the line one position to the left. This is useful for inserting control cards, and other things that should not be
transmogrified (like an existing Fortran program). Use '%' only for ordinary statements,
not for the condition parts of if, while, etc., or
the output may come out in an unexpected
place.
The following character translations are
made, except within single or double quotes or
on a line beginning with a '%'.
    ==    .eq.        !=    .ne.
    >     .gt.        >=    .ge.
    <     .lt.        <=    .le.
    &     .and.       |     .or.
    !     .not.       ^     .not.
In addition, the following translations are provided for input devices with restricted character
sets.
    [     {           ]     }
    $(    {           $)    }

"define" Statement
Any string of alphanumeric characters can
be defined as a name; thereafter, whenever that
name occurs in the input (delimited by non-alphanumerics) it is replaced by the rest of the
definition line. (Comments and trailing white
spaces are stripped off). A defined name can be
arbitrarily long, and must begin with a letter.
define is typically used to create symbolic
parameters:
    define ROWS 100
    define COLS 50

    dimension a(ROWS), b(ROWS, COLS)
    if (i > ROWS | j > COLS) ...


Alternately, definitions may be written as
define(ROWS, 100)
In this case, the defining text is everything after
the comma up to the balancing right
parenthesis; this allows multi-line definitions.
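
The replacement rule is simple enough to model in a few lines. The Python sketch below is an illustration under stated assumptions, not the Ratfor implementation: it handles only the one-line "define NAME text" form, it treats every "#" as starting a comment (even inside quotes), and it does not rescan replacement text for further names. The function name expand_defines is made up for the example.

```python
import re


def expand_defines(lines):
    """Apply `define NAME text` lines to the lines that follow them."""
    table = {}
    out = []
    for line in lines:
        # Strip comments and trailing white space, as the text describes.
        line = line.split("#", 1)[0].rstrip()
        m = re.match(r"\s*define\s+([A-Za-z][A-Za-z0-9]*)\s+(.*)$", line)
        if m:
            table[m.group(1)] = m.group(2)
            continue
        # Replace whole alphanumeric tokens only, so ROWS does not match ROWS2.
        out.append(re.sub(r"[A-Za-z0-9]+",
                          lambda t: table.get(t.group(0), t.group(0)),
                          line))
    return out


source = [
    "define ROWS 100   # table height",
    "define COLS 50",
    "dimension a(ROWS), b(ROWS, COLS)",
]
print(expand_defines(source))
```

Matching on complete alphanumeric tokens is what "delimited by non-alphanumerics" amounts to: a defined name is never replaced when it is only part of a longer identifier.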
It is generally a wise practice to use symbolic parameters for most constants, to help
make clear the function of what would otherwise
be mysterious numbers. As an example, here is
the routine equal again, this time with symbolic
constants.
    define YES  1
    define NO   0
    define EOS  -1
    define ARB  100

    # equal - compare str1 to str2;
    #    return YES if equal, NO if not
    integer function equal(str1, str2)
    integer str1(ARB), str2(ARB)
    integer i

    for (i = 1; str1(i) == str2(i); i = i + 1)
        if (str1(i) == EOS)
            return(YES)
    return(NO)
    end

"include" Statement
The statement
include file
inserts the file found on input stream file into
the Ratfor input in place of the include statement. The standard usage is to place COMMON
blocks on a file, and include that file whenever

Ratfor 2-119
a copy is needed:
subroutine x
include commonblocks
end
suroutine y
include commonblocks
end
This ensures that all copies of the COMMON
blocks are identical
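
The effect of include is plain textual insertion, which the following Python sketch imitates. It is only a model: a dictionary stands in for the files so that the example is self-contained, nested includes are not handled, and the function name and the sample COMMON line are assumptions.

```python
def expand_includes(lines, files):
    """Replace each `include name` line with the lines of files[name].

    `files` maps a file name to a list of lines; it stands in for the
    file system in this sketch.
    """
    out = []
    for line in lines:
        words = line.split()
        if len(words) == 2 and words[0] == "include":
            out.extend(files[words[1]])   # splice the file in place
        else:
            out.append(line)
    return out


files = {"commonblocks": ["common /blk/ a(100), n"]}
program = ["subroutine x", "include commonblocks", "end",
           "subroutine y", "include commonblocks", "end"]
for line in expand_includes(program, files):
    print(line)
```

Both subroutines receive the same text, so a change to the one shared file changes every copy at once.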
Pitfalls, Botches, Blemishes and other Failings
Ratfor catches certain syntax errors, such
as missing braces, else clauses without an if,
and most errors involving missing parentheses in
statements. Beyond that, since Ratfor knows no
Fortran, any errors you make will be reported by
the Fortran compiler, so you will from time to
time have to relate a Fortran diagnostic back to
the Ratfor source.
Keywords are reserved - using if, else,
etc., as variable names will typically wreak
havoc. Don't leave spaces in keywords. Don't
use the Arithmetic IF.
The Fortran nH convention is not recognized anywhere by Ratfor; use quotes instead.
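
The counting that Ratfor does on the programmer's behalf when it turns a quoted string into nH form can be sketched as follows. This Python function is an illustration, not the actual implementation: it converts one complete quoted string, honors the backslash escape described under Translation Services, and ignores the splitting of long strings across card boundaries. The name to_hollerith is an assumption.

```python
def to_hollerith(quoted):
    """Convert a quoted string such as '"hello"' to nH form (here '5hhello')."""
    if len(quoted) < 2 or quoted[0] not in "\"'" or quoted[-1] != quoted[0]:
        raise ValueError("not a quoted string: %r" % quoted)
    body = quoted[1:-1]
    chars = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            i += 1              # backslash: take the next character literally
        chars.append(body[i])
        i += 1
    # The count is the number of characters actually emitted.
    return "%dh%s" % (len(chars), "".join(chars))


print(to_hollerith('"hello"'))
print(to_hollerith('"x>100"'))
```

Because the count is computed from the characters emitted after escapes are resolved, it stays correct when the string is edited, which is the advantage over writing the H count by hand.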

3. IMPLEMENTATION
Ratfor was originally written in C[4] on
the UNIX operating system[5]. The language is
specified by a context free grammar and the
compiler constructed using the YACC compiler-compiler[6].
The Ratfor grammar is simple and
straightforward, being essentially
    prog : stat
         | prog stat
    stat : if (...) stat
         | if (...) stat else stat
         | while (...) stat
         | for (...; ...; ...) stat
         | do ... stat
         | repeat stat
         | repeat stat until (...)
         | switch (...) { case ...: prog ...
                          default: prog }
         | return
         | break
         | next
         | digits stat
         | { prog }
         | anything unrecognizable

The observation that Ratfor knows no Fortran
follows directly from the rule that says a statement is "anything unrecognizable". In fact most
of Fortran falls into this category, since any
statement that does not begin with one of the
keywords is by definition "unrecognizable."
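
A small Python sketch may make the "anything unrecognizable" rule concrete. It is a model of the idea only: the keyword list is assembled from the grammar above and is an assumption about what a recognizer would check, and the function name classify is made up for the example.

```python
KEYWORDS = ("if", "else", "while", "for", "do", "repeat", "until",
            "switch", "case", "default", "return", "break", "next")


def classify(statement):
    """Return the leading keyword of a statement, "label" for a leading
    all-numeric field, "group" for a brace, or "other" for anything else."""
    text = statement.lstrip()
    if text.startswith("{"):
        return "group"
    # Take the leading run of letters and digits as the first token.
    n = 0
    while n < len(text) and text[n].isalnum():
        n += 1
    token = text[:n]
    if token.isdigit():
        return "label"
    if token in KEYWORDS:
        return token
    return "other"      # ordinary Fortran: copied through without analysis


for s in ["if (x > 0) y = 1", "call error(1)", "100 format(5hhello)"]:
    print(classify(s))
```

Everything that lands in the "other" branch is never inspected further, which is why errors in such statements surface only when the Fortran compiler sees them.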
Code generation is also simple. If the first
thing on a source line is not a keyword (like if,
else, etc.) the entire statement is simply copied
to the output with appropriate character translation and formatting. (Leading digits are treated
as a label.) Keywords cause only slightly more
complicated actions. For example, when if is
recognized, two consecutive labels L and L+1
are generated and the value of L is stacked. The
condition is then isolated, and the code
          if (.not. (condition)) goto L
is output. The statement part of the if is then
translated. When the end of the statement is
encountered (which may be some distance away
and include nested if's, of course), the code
    L     continue
is generated, unless there is an else clause, in
which case the code is
          goto L+1
    L     continue
In this latter case, the code
    L+1   continue
is produced after the statement part of the else.
Code generation for the various loops is equally
simple.
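
The scheme just described is short enough to write down. The Python sketch below follows the description for if and if-else; it is not the Ratfor source. The helper emit, the parameter names, and the default label 100 are assumptions for the example, and the statement parts are passed in already translated, so the stacking of L for nested if's is left to the caller.

```python
def emit(label, text):
    """Format one fixed-form line: label in columns 1-5, text from column 7."""
    field = "" if label is None else str(label)
    return "%-5s %s" % (field, text)


def gen_if(cond, then_lines, else_lines=None, L=100):
    """Generate Fortran for if (cond) ... [else ...] using labels L and L+1.

    then_lines and else_lines are lists of already-translated Fortran lines.
    """
    out = [emit(None, "if (.not. (%s)) goto %d" % (cond, L))]
    out += then_lines
    if else_lines is None:
        out.append(emit(L, "continue"))
    else:
        out.append(emit(None, "goto %d" % (L + 1)))   # skip over the else part
        out.append(emit(L, "continue"))
        out += else_lines
        out.append(emit(L + 1, "continue"))
    return out


for line in gen_if("a .le. b",
                   [emit(None, "sw = 0")],
                   [emit(None, "sw = 1")], L=10):
    print(line)
```

A nested if is handled by generating the inner statement first with its own pair of labels and passing the resulting lines in as then_lines or else_lines.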
One might argue that more care should be
taken in code generation. For example, if there
is no trailing else,
    if (i > 0) x = a
should be left alone, not converted into
          if (.not. (i .gt. 0)) goto 100
          x = a
    100   continue

But what are optimizing compilers for, if not to
improve code? It is a rare program indeed
where this kind of "inefficiency" will make even
a measurable difference. In the few cases where
it is important, the offending lines can be protected by '%'.
The use of a compiler-compiler is
definitely the preferred method of software
development. The language is well-defined, with
few syntactic irregularities. Implementation is
quite simple; the original construction took
under a week. The language is sufficiently simple, however, that an ad hoc recognizer can be
readily constructed to do the same job if no
compiler-compiler is available.
The C version of Ratfor is used on UNIX
and on the Honeywell GCOS systems. C compilers are not as widely available as Fortran,
however, so there is also a Ratfor written in
itself and originally bootstrapped with the C version. The Ratfor version was written so as to
translate into the portable subset of Fortran
described in [1], so it is portable, having been
run essentially without change on at least twelve
distinct machines. (The main restrictions of the
portable subset are: only one character per
machine word; subscripts in the form c*v ± c;
avoiding expressions in places like DO loops; consistency in subroutine argument usage, and in
COMMON declarations. Ratfor itself will not gratuitously generate non-standard Fortran.)
The Ratfor version is about 1500 lines of
Ratfor (compared to about 1000 lines of C); this
compiles into 2500 lines of Fortran. This expansion ratio is somewhat higher than average, since
the compiled code contains unnecessary
occurrences of COMMON declarations. The execution time of the Ratfor version is dominated
by two routines that read and write cards.
Clearly these routines could be replaced by
machine coded local versions; unless this is done,
the efficiency of other parts of the translation
process is largely irrelevant.

4. EXPERIENCE
Good Things
"It's so much better than Fortran" is the
most common response of users when asked how
well Ratfor meets their needs. Although cynics
might consider this to be vacuous, it does seem
to be true that decent control flow and cosmetics
convert Fortran from a bad language into quite
a reasonable one, assuming that Fortran data
structures are adequate for the task at hand.
Although there are no quantitative results,
users feel that coding in Ratfor is at least twice
as fast as in Fortran. More important, debugging and subsequent revision are much faster
than in Fortran. Partly this is simply because
the code can be read. The looping statements
which test at the top instead of the bottom seem
to eliminate or at least reduce the occurrence of
a wide class of boundary errors. And of course it
is easy to do structured programming in Ratfor;
this self-discipline also contributes markedly to
reliability.
One interesting and encouraging fact is
that programs written in Ratfor tend to be as
readable as programs written in more modern
languages like Pascal. Once one is freed from
the shackles of Fortran's clerical detail and rigid

input format, it is easy to write code that is
readable, even esthetically pleasing. For example, here is a Ratfor implementation of the
linear table search discussed by Knuth [7]:
    A(m+1) = x
    for (i = 1; A(i) != x; i = i + 1)
        ;
    if (i > m) {
        m = i
        B(i) = 1
    }
    else
        B(i) = B(i) + 1
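
For comparison, here is the same sentinel search as a Python sketch. It is an illustration of the technique, not a translation produced by any tool. The arrays are padded at index 0 to keep the 1-based subscripts, the function returns the updated m because Python cannot modify the caller's integer, and the function name is an assumption.

```python
def count_occurrence(A, B, m, x):
    """Record one occurrence of x in table A with counts in B; return new m.

    A and B are 1-based (index 0 unused) and must have room at m+1.
    """
    A[m + 1] = x                 # sentinel guarantees the search stops
    i = 1
    while A[i] != x:
        i += 1
    if i > m:                    # stopped at the sentinel: x is new
        m = i
        B[i] = 1
    else:                        # found among the first m entries
        B[i] += 1
    return m


A = [None, 0, 0, 0, 0]
B = [None, 0, 0, 0, 0]
m = 0
for x in (7, 9, 7):
    m = count_occurrence(A, B, m, x)
print(m, A[1:m + 1], B[1:m + 1])
```

Storing x at position m+1 before the search removes the need for a separate bounds test inside the loop.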

A large corpus (5400 lines) of Ratfor, including a
subset of the Ratfor preprocessor itself, can be
found in [8].

Bad Things
The biggest single problem is that many
Fortran syntax errors are not detected by Ratfor
but by the local Fortran compiler. The compiler
then prints a message in terms of the generated
Fortran, and in a few cases this may be difficult
to relate back to the offending Ratfor line, especially if the implementation conceals the generated Fortran. This problem could be dealt
with by tagging each generated line with some
indication of the source line that created it, but
this is inherently implementation-dependent, so
no action has yet been taken. Error message
interpretation is actually not so arduous as
might be thought. Since Ratfor generates no
variables, only a simple pattern of IF's and
GOTO's, data-related errors like missing DIMENSION statements are easy to find in the Fortran.
Furthermore, there has been a steady improvement in Ratfor's ability to catch trivial syntactic
errors like unbalanced parentheses and quotes.
There are a number of implemeritation
weaknesses that are a nuisance, especially to new
users. For example, keywords are reserved.
This rarely makes any difference, except for
those hardy souls who want to use an Arithmetic
IF. A few standard Fortran constructions are not
accepted by Ratfor, and this is perceived as a
problem by users with a large corpus of existing
Fortran programs. Protecting every line with a
'%' is not really a complete solution, although it
serves as a stop-gap. The best long-term solution is provided by the program Struct [9],
which converts arbitrary Fortran programs into
Ratfor.
Users who export programs often complain
that the generated Fortran is "unreadable"
because it is not tastefully formatted and contains extraneous CONTINUE statements. To some
extent this can be ameliorated (Ratfor now has
an option to copy Ratfor comments into the generated
Fortran), but it has always seemed that
effort is better spent on the input language than
on the output esthetics.
One final problem is partly attributable to
success - since Ratfor is relatively easy to
modify, there are now several dialects of Ratfor.
Fortunately, so far most of the differences are in
character set, or in invisible aspects like code
generation.

5. CONCLUSIONS
Ratfor demonstrates that with modest
effort it is possible to convert Fortran from a
bad language into quite a good one. A preprocessor is clearly a useful way to extend or
ameliorate the facilities of a base language.
When designing a language, it is important
to concentrate on the essential requirement of
providing the user with the best language possible for a given effort. One must avoid throwing
in "features" - things which the user may trivially construct within the existing framework.
One must also avoid getting sidetracked
on irrelevancies. For instance it seems pointless
for Ratfor to prepare a neatly formatted listing
of either its input or its output. The user is
presumably capable of the self-discipline
required to prepare neat input that reflects his
thoughts. It is much more important that the
language provide free-form input so he can format it neatly. No one should read the output
anyway except in the most dire circumstances.

Acknowledgements
C. A. R. Hoare once said that "One thing
[the language designer] should not do is to
include untried ideas of his own." Ratfor follows
this precept very closely - everything in it has
been stolen from someone else. Most of the control flow structures are taken directly from the
language C [4] developed by Dennis Ritchie; the
comment and continuation conventions are
adapted from Altran [10].
I am grateful to Stuart Feldman, whose
patient simulation of an innocent user during
the early days of Ratfor led to several design
improvements and the eradication of bugs. He
also translated the C parse-tables and YACC
parser into Fortran for the first Ratfor version of
Ratfor.
References
[1] B. G. Ryder, "The PFORT Verifier," Software-Practice & Experience, October 1974.
[2] American National Standard Fortran. American National Standards Institute, New York, 1966.
[3] For-word: Fortran Development Newsletter, August 1975.
[4] B. W. Kernighan and D. M. Ritchie, The C Programming Language, Prentice-Hall, Inc., 1978.
[5] D. M. Ritchie and K. L. Thompson, "The UNIX Time-sharing System." CACM, July 1974.
[6] S. C. Johnson, "YACC - Yet Another Compiler-Compiler." Bell Laboratories Computing Science Technical Report #32, 1978.
[7] D. E. Knuth, "Structured Programming with goto Statements." Computing Surveys, December 1974.
[8] B. W. Kernighan and P. J. Plauger, Software Tools, Addison-Wesley, 1976.
[9] B. S. Baker, "Struct - A Program which Structures Fortran", Bell Laboratories internal memorandum, December 1975.
[10] A. D. Hall, "The Altran System for Rational Function Manipulation - A Survey." CACM, August 1971.

Appendix: Usage on UNIX and GCOS.
Beware - local customs vary. Check with a native before going into the jungle.
UNIX
The program ratfor is the basic translator; it takes either a list of file names or the standard
input and writes Fortran on the standard output. Options include -6x, which uses x as a continuation
character in column 6 (UNIX uses & in column 1), and -C, which causes Ratfor comments to be copied
into the generated Fortran.
The program rc provides an interface to the ratfor command which is much the same as cc.
Thus
    rc [options] files
compiles the files specified by files. Files with names ending in .r are Ratfor source; other files are
assumed to be for the loader. The flags -C and -6x described above are recognized, as are

    -c    compile only; don't load
    -f    save intermediate Fortran .f files
    -r    Ratfor only; implies -c and -f
    -2    use big Fortran compiler (for large programs)
    -U    flag undeclared variables (not universally available)

Other flags are passed on to the loader.

GCOS
The program ./ratfor is the bare translator, and is identical to the UNIX version, except that the
continuation convention is & in column 6. Thus
./ratfor files >output
translates the Ratfor source on files and collects the generated Fortran on file 'output' for subsequent
processing.
./rc provides much the same services as rc (within the limitations of GCOS), regrettably with a
somewhat different syntax. Options recognized by ./rc include

    name       Ratfor source or library, depending on type
    h=/name    make TSS H* file (runnable version); run as /name
    r=/name    update and use random library
    a=         compile as ascii (default is bcd)
    C=         copy comments into Fortran
    f=name     Fortran source file
    g=name     gmap source file

Other options are as specified for the ./cc command described in [4].

TSO, TSS, and other systems
Ratfor exists on various other systems; check with the author for specifics.

EFL 2-123

The Programming Language EFL
Stuart I. Feldman
Bell Laboratories
Murray Hill, New Jersey 07974

1. INTRODUCTION

1.1. Purpose
EFL is a clean, general purpose computer language intended to encourage portable programming. It has a uniform and readable syntax and good data and control flow structuring.
EFL programs can be translated into efficient Fortran code, so the EFL programmer can take
advantage of the ubiquity of Fortran, the valuable libraries of software written in that language,
and the portability that comes with the use of a standardized language, without suffering from
Fortran's many failings as a language. It is especially useful for numeric programs. Thus, the
EFL language permits the programmer to express complicated ideas in a comprehensible way,
while permitting access to the power of the Fortran environment.
1.2. History
EFL can be viewed as a descendant of B. W. Kernighan's Ratfor [1]; the name originally
stood for 'Extended Fortran Language'. A. D. Hall designed the initial version of the language
and wrote a preliminary version of a compiler. I extended and modified the language and wrote
a full compiler (in C) for it. The current compiler is much more than a simple preprocessor: it
attempts to diagnose all syntax errors, to provide readable Fortran output, and to avoid a
number of niggling restrictions. To achieve this goal, a sizable two-pass translator is needed.
1.3. Notation
In examples and syntax specifications, boldface type is used to indicate literal words and
punctuation, such as while. Words in italic type indicate an item in a category, such as an
expression. A construct surrounded by double brackets represents a list of one or more of those
items, separated by commas. Thus, the notation
    [[ item ]]

could refer to any of the following:
item
item, item
item, item, item

The reader should have a fair degree of familiarity with some procedural language. There
will be occasional references to Ratfor and to Fortran which may be ignored if the reader is
unfamiliar with those languages.


2. LEXICAL FORM
2.1. Character Set
The following characters are legal in an EFL program:
    letters         a b c d e f g h i j k l m
                    n o p q r s t u v w x y z
    digits          0 1 2 3 4 5 6 7 8 9
    white space     blank tab
    quotes          ' "
    sharp           #
    continuation    _
    braces          { }
    parentheses     ( )
    other           , ; : . + - * /
                    = < > & ~ | $

Letter case (upper or lower) is ignored except within strings, so 'a' and 'A' are treated as the
same character. All of the examples below are printed in lower case. An exclamation mark
('!') may be used in place of a tilde ('~'). Square brackets ('[' and ']') may be used in place of
braces ('{' and '}').

2.2. Lines
EFL is a line-oriented language. Except in special cases (discussed below), the end of a
line marks the end of a token and the end of a statement. The trailing portion of a line may be
used for a comment. There is a mechanism for diverting input from one source file to another,
so a single line in the program may be replaced by a number of lines from the other file. Diagnostic messages are labeled with the line number of the file on which they are detected.
2.2.1. White Space
Outside of a character string or comment, any sequence of one or more spaces or tab
characters acts as a single space. Such a space terminates a token.
2.2.2. Comments
A comment may appear at the end of any line. It is introduced by a sharp (#) character,
and continues to the end of the line. (A sharp inside of a quoted string does not mark a comment.) The sharp and succeeding characters on the line are discarded. A blank line is also a
comment. Comments have no effect on execution.
2.2.3. Include Files
It is possible to insert the contents of a file at a point in the source text, by referencing it
in a line like
    include joe
No statement or comment may follow an include on a line. In effect, the include line is
replaced by the lines in the named file, but diagnostics refer to the line number in the included
file. Includes may be nested at least ten deep.

2.2.4. Continuation
Lines may be continued explicitly by using the underscore (_) character. If the last character of a line (after comments and trailing white space have been stripped) is an underscore,
the end of line and the initial blanks on the next line are ignored. Underscores are ignored in
other contexts (except inside of quoted strings). Thus

    1_000_000_
    000

equals 10^9.
There are also rules for continuing lines automatically: the end of line is ignored whenever it is obvious that the statement is not complete. To be specific, a statement is continued if
the last token on a line is an operator, comma, left brace, or left parenthesis. (A statement is
not continued just because of unbalanced braces or parentheses.) Some compound statements
are also continued automatically; these points are noted in the sections on executable statements.
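
The explicit continuation rule can be sketched in Python as follows. This is a minimal illustration and not the EFL compiler's method: the function names are hypothetical, and the automatic continuation rules (trailing operator, comma, left brace, or left parenthesis) are deliberately left out.

```python
def strip_comment(line):
    """Drop a trailing '#' comment, ignoring sharps inside quoted strings."""
    quote = None
    for pos, ch in enumerate(line):
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "#":
            return line[:pos]
    return line


def drop_underscores(text):
    """Remove underscores that lie outside quoted strings."""
    out, quote = [], None
    for ch in text:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "_":
            continue
        out.append(ch)
    return "".join(out)


def join_explicit(lines):
    """Join lines whose last significant character is an underscore."""
    joined, pending, continuing = [], "", False
    for raw in lines:
        text = strip_comment(raw).rstrip()
        if continuing:
            text = pending + text.lstrip()   # initial blanks are ignored
        if text.endswith("_"):
            pending, continuing = text[:-1], True
        else:
            joined.append(drop_underscores(text))
            pending, continuing = "", False
    if continuing:
        joined.append(drop_underscores(pending))
    return joined
```

Applied to the two-line example above, join_explicit(["1_000_000_", "    000"]) yields the single line 1000000000.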

2.2.5. Multiple Statements on a Line
A semicolon terminates the current statement. Thus, it is possible to write more than one
statement on a line. A line consisting only of a semicolon, or a semicolon following a semicolon, forms a null statement.
2.3. Tokens
A program is made up of a sequence of tokens. Each token is a sequence of characters.
A blank terminates any token other than a quoted string. End of line also terminates a token
unless explicit continuation (see above) is signaled by an underscore.
2.3.1. Identifiers
An identifier is a letter or a letter followed by letters or digits. The following is a list of
the reserved words that have special meaning in EFL. They will be discussed later.

    array              exit         precision
    automatic          external     procedure
    break              false        read
    call               field        readbin
    case               for          real
    character          function     repeat
    common             go           return
    complex            goto         select
    continue           if           short
    debug              implicit     sizeof
    default            include      static
    define             initial      struct
    dimension          integer      subroutine
    do                 internal     true
    double             lengthof     until
    doubleprecision    logical      value
    else               long         while
    end                next         write
    equivalence        option       writebin

The use of these words is discussed below. These words may not be used for any other purpose.

2.3.2. Strings
A character string is a sequence of characters surrounded by quotation marks. If the
string is bounded by single-quote marks ( ' ), it may contain double quote marks ( " ), and vice
versa. A quoted string may not be broken across a line boundary.

    'hello there'
    "ain't misbehavin'"
2.3.3. Integer Constants
An integer constant is a sequence of one or more digits.
0
57
123456

2.3.4. Floating Point Constants
A floating point constant contains a dot and/or an exponent field. An exponent field is a
letter d or e followed by an optionally signed integer constant. If I and J are integer constants
and E is an exponent field, then a floating constant has one of the following forms:

    .I
    I.
    I.J
    IE
    I.E
    .IE
    I.JE
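
The seven forms listed above can be checked with a single regular expression. The following Python sketch is an illustration under stated assumptions: the function name is hypothetical, letter case is ignored as Section 2.1 describes, and a leading sign is not treated as part of the constant.

```python
import re

# Either a dotted number with an optional exponent, or digits with a
# mandatory exponent (the form IE). A bare integer matches neither branch.
_FLOAT = re.compile(
    r"(?:(?:\d+\.\d*|\.\d+)(?:[de][+-]?\d+)?|\d+[de][+-]?\d+)",
    re.IGNORECASE,
)


def is_floating_constant(text):
    """True if text has one of the forms .I  I.  I.J  IE  I.E  .IE  I.JE"""
    return _FLOAT.fullmatch(text) is not None
```

For example, is_floating_constant("5.3d-6") holds, while is_floating_constant("57") does not, since an integer constant has neither a dot nor an exponent field.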
2.3.5. Punctuation
Certain characters are used to group or separate objects in the language. These are

    parentheses    ( )
    braces         { }
    comma          ,
    semicolon      ;
    colon          :
    end-of-line

The end-of-line is a token (statement separator) when the line is neither blank nor continued.
2.3.6. Operators
The EFL operators are written as sequences of one or more non-alphanumeric characters.

    +    -    *    /    **
    <    <=   >    >=   ==   ~=
    &&   ||   &    |
    +=   -=   *=   /=   **=
    &&=  ||=  &=   |=
    ->   .    $

A dot ('.') is an operator when it qualifies a structure element name, but not when it acts as a
decimal point in a numeric constant. There is a special mode (see the Atavisms section) in
which some of the operators may be represented by a string consisting of a dot, an identifier,
and a dot (e.g., .lt.).
2.4. Macros
EFL has a simple macro substitution facility. An identifier may be defined to be equal to
a string of tokens; whenever that name appears as a token in the program, the string replaces it.
A macro name is given a value in a define statement like

    define count n += 1

Any time the name count appears in the program, it is replaced by the statement

    n += 1

A define statement must appear alone on a line; the form is

    define name rest-of-line

Trailing comments are part of the string.
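
The substitution described above can be sketched in Python. This is a rough illustration, not the compiler's algorithm: parse_define and expand are hypothetical names, and for brevity the sketch does not protect identifiers that occur inside quoted strings or numeric constants.

```python
import re


def parse_define(line):
    """Split 'define name rest-of-line' into (name, replacement text)."""
    keyword, name, rest = line.strip().split(None, 2)
    if keyword.lower() != "define":
        raise ValueError("not a define statement")
    return name.lower(), rest


def expand(line, macros):
    """Replace each identifier that names a macro by its replacement text."""
    def swap(match):
        word = match.group(0)
        return macros.get(word.lower(), word)   # letter case is ignored
    return re.sub(r"[A-Za-z][A-Za-z0-9]*", swap, line)
```

With macros = dict([parse_define("define count n += 1")]), the line "if(x > 2) count" expands to "if(x > 2) n += 1", while "counter = 0" is left alone because the whole identifier must match.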
3. PROGRAM FORM

3.1. Files
A file is a sequence of lines. A file is compiled as a single unit. It may contain one or
more procedures. Declarations and options that appear outside of a procedure affect the
succeeding procedures on that file.
3.2. Procedures
Procedures are the largest grouping of statements in EFL. Each procedure has a name by
which it is invoked. (The first procedure invoked during execution, known as the main procedure, has the null name.) Procedure calls and argument passing are discussed in Section 8.
3.3. Blocks
Statements may be formed into groups inside of a procedure. To describe the scope of
names, it is convenient to introduce the ideas of block and of nesting level. The beginning of a
program file is at nesting level zero. Any options, macro definitions, or variable declarations
there are also at level zero. The text immediately following a procedure statement is at level 1.
After the declarations, a left brace marks the beginning of a new block and increases the nesting level by 1; a right brace drops the level by 1. (Braces inside declarations do not mark
blocks.) (See Section 7.2). An end statement marks the end of the procedure, level 1, and the
return to level 0. A name (variable or macro) that is defined at level k is defined throughout
that block and in all deeper nested levels in which that name is not redefined or redeclared.
Thus, a procedure might look like the following:

    # block 0
    procedure george
    real x
    x = 1
    if(x > 2)
        {    # new block
        integer x    # a different variable
        do x = 1,7
            write(,x)
        }    # end of block
    end    # end of procedure, return to block 0

3.4. Statements
A statement is terminated by end of line or by a semicolon. Statements are of the following types:


Option
Include
Define
Procedure
End
Declarative
Executable
The option statement is described in Section 10. The include, define, and end statements have
been described above; they may not be followed by another statement on a line. Each procedure begins with a procedure statement and finishes with an end statement; these are discussed in Section 8. Declarations describe types and values of variables and procedures. Executable statements cause specific actions to be taken. A block is an example of an executable
statement; it is made up of declarative and executable statements.
3.5. Labels

An executable statement may have a label which may be used in a branch statement. A
label is an identifier followed by a colon, as in

    read(, x)
    if(x < 3) goto error

    error: fatal("bad input")

4. DATA TYPES AND VARIABLES
EFL supports a small number of basic (scalar) types. The programmer may define objects
made up of variables of basic type; other aggregates may then be defined in terms of previously
defined aggregates.
4.1. Basic Types
The basic types are

    logical
    integer
    field(m:n)
    real
    complex
    long real
    long complex
    character(n)
A logical quantity may take on the two values true and false. An integer may take on any
whole number value in some machine-dependent range. A field quantity is an integer restricted
to a particular closed interval ([m:n]). A 'real' quantity is a floating point approximation to a
real or rational number. A long real is a more precise approximation to a rational. (Real quantities are represented as single precision floating point numbers; long reals are double precision
floating point numbers.) A complex quantity is an approximation to a complex number, and is
represented as a pair of reals. A character quantity is a fixed-length string of n characters.
4.2. Constants
There is a notation for a constant of each basic type.
A logical may take on the two values

    true
    false
An integer or field constant is a fixed point constant, optionally preceded by a plus or minus
sign, as in

17
-94

+6
0
A long real ('double precision') constant is a floating point constant containing an exponent
field that begins with the letter d. A real ('single precision') constant is any other floating point
constant. A real or long real constant may be preceded by a plus or minus sign. The following
are valid real constants:

    17.3
    -.4
    7.9e-6    (= 7.9 x 10^-6)
    14e9      (= 1.4 x 10^10)

The following are valid long real constants:

    7.9d-6    (= 7.9 x 10^-6)
    5d3

A character constant is a quoted string.

4.3. Variables
A variable is a quantity with a name and a location. At any particular time the variable
may also have a value. (A variable is said to be undefined before it is initialized or assigned its
first value, and after certain indefinite operations are performed.) Each variable has certain
attributes:

4.3.1. Storage Class
The association of a name and a location is either transitory or permanent. Transitory
association is achieved when arguments are passed to procedures. Other associations are permanent (static). (A future extension of EFL may include dynamically allocated variables.)

4.3.2. Scope of Names
The names of common areas are global, as are procedure names: these names may be
used anywhere in the program. All other names are local to the block in which they are
declared.

4.3.3. Precision
Floating point variables are either of normal or long precision. This attribute may be
stated independently of the basic type.

4.4. Arrays
It is possible to declare rectangular arrays (of any dimension) of values of the same type.
The index set is always a cross-product of intervals of integers. The lower and upper bounds of
the intervals must be constants for arrays that are local or common. A formal argument array
may have intervals that are of length equal to one of the other formal arguments. An element
of an array is denoted by the array name followed by a parenthesized comma-separated list of
integer values, each of which must lie within the corresponding interval. (The intervals may
include negative numbers.) Entire arrays may be passed as procedure arguments or in
input/output lists, or they may be initialized; all other array references must be to individual
elements.
4.5. Structures
It is possible to define new types which are made up of elements of other types. The
compound object is known as a structure; its constituents are called members of the structure.
The structure may be given a name, which acts as a type name in the remaining statements
within the scope of its declaration. The elements of a structure may be of any type (including
previously defined structures), or they may be arrays of such objects. Entire structures may be
passed to procedures or be used in input/output lists; individual elements of structures may be
referenced. The uses of structures will be detailed below. The following structure might
represent a symbol table:
    struct tableentry
        {
        character(8) name
        integer hashvalue
        integer numberofelements
        field(0:1) initialized, used, set
        field(0:10) type
        }

5. EXPRESSIONS
Expressions are syntactic forms that yield a value. An expression may have any of the
following forms, recursively applied:

primary
( expression )
unary-operator expression
expression binary-operator expression

In the following table of operators, all operators on a line have equal precedence and have
higher precedence than operators on later lines. The meanings of these operators are described
in sections 5.3 and 5.4.

    **
    *   /   unary + -   ++ --
    +   -
    <   <=  >   >=  ==  ~=
    &   &&
    |   ||
    =   +=  -=  *=  /=  **=  &=  |=  &&=  ||=

Examples of expressions are

    a<b && b<c
    -(a + sin(x)) / (5+cos(x))**2
5.1. Primaries
Primaries are the basic elements of expressions, as follows:

5.1.1. Constants
Constants are described in Section 4.2.
5.1.2. Variables
Scalar variable names are primaries. They may appear on the left or the right side of an
assignment. Unqualified names of aggregates (structures or arrays) may only appear as procedure arguments and in input/output lists.
5.1.3. Array Elements
An element of an array is denoted by the array name followed by a parenthesized list of
subscripts, one integer value for each declared dimension:
a(5)

b(6, -3, 4)

5.1.4. Structure Members
A structure name followed by a dot followed by the name of a member of that structure
constitutes a reference to that element. If that element is itself a structure, the reference may
be further qualified.
a.b
x(3).y(4).z(5)

5.1.5. Procedure Invocations
A procedure is invoked by an expression of one of the forms

procedurename ( )
procedurename ( expression )
procedurename ( expression-1, ..., expression-n )
The procedurename is either the name of a variable declared external or it is the name of a
function known to the EFL compiler (see Section 8.5), or it is the actual name of a procedure,
as it appears in a procedure statement. If a procedurename is declared external and is an argument of the current procedure, it is associated with the procedure name passed as actual argument; otherwise it is the actual name of a procedure. Each expression in the above is called an
actual argument. Examples of procedure invocations are

    f(x)
    work()
    g(x, y+3, 'xx')
When one of these procedure invocations is to be performed, each of the actual argument
expressions is first evaluated. The types, precisions, and bounds of actual and formal arguments should agree. If an actual argument is a variable name, array element, or structure
member, the called procedure is permitted to use the corresponding formal argument as the left
side of an assignment or in an input list; otherwise it may only use the value. After the formal
and actual arguments are associated, control is passed to the first executable statement of the
procedure. When a return statement is executed in that procedure, or when control reaches
the end statement of that procedure, the function value is made available as the value of the
procedure invocation. The type of the value is determined by the attributes of the procedurename that are declared or implied in the calling procedure, which must agree with the
attributes declared for the function in its procedure. In the special case of a generic function,
the type of the result is also affected by the type of the argument. See Chapter 8 for details.

5.1.6. Input/Output Expressions
The EFL input/output syntactic forms may be used as integer primaries that have a nonzero value if an error occurs during the input or output. See Section 7.7.
5.1.7. Coercions
An expression of one precision or type may be converted to another by an expression of
the form
attributes ( expression )

At present, the only attributes permitted are precision and basic types. Attributes are separated
by white space. An arithmetic value of one type may be coerced to any other arithmetic type; a
character expression of one length may be coerced to a character expression of another length;
logical expressions may not be coerced to a nonlogical type. As a special case, a quantity of
complex or long complex type may be constructed from two integer or real quantities by passing two expressions (separated by a comma) in the coercion. Examples and equivalent values
are

    integer(5.3) = 5
    long real(5) = 5.0d0
    complex(5,3) = 5+3i
Most conversions are done implicitly, since most binary operators permit operands of different
arithmetic types. Explicit coercions are of most use when it is necessary to convert the type of
an actual argument to match that of the corresponding formal parameter in a procedure call.
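
The three coercion examples above have close analogues in Python's built-in conversions, shown here only as a cross-language illustration. The dictionary name is hypothetical, and Python's float is used as a stand-in for long real on the assumption that it is double precision.

```python
# Each key is the EFL coercion from the text; each value is a Python
# expression that yields the equivalent value.
coercion_examples = {
    "integer(5.3)": int(5.3),          # fractional part dropped
    "long real(5)": float(5),          # 5.0 in double precision
    "complex(5,3)": complex(5, 3),     # built from two components
}
```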

5.1.8. Sizes
There is a notation which yields the amount of memory required to store a datum or an
item of specified type:

    sizeof ( leftside )
    sizeof ( attributes )

In the first case, leftside can denote a variable, array, array element, or structure member. The
value of sizeof is an integer, which gives the size in arbitrary units. If the size is needed in
terms of the size of some specific unit, this can be computed by division:

    sizeof(x) / sizeof(integer)

yields the size of the variable x in integer words.
The distance between consecutive elements of an array may not equal sizeof because certain data types require final padding on some machines. The lengthof operator gives this larger
value, again in arbitrary units. The syntax is
lengthof ( leftside )
lengthof ( attributes )

5.2. Parentheses
An expression surrounded by parentheses is itself an expression. A parenthesized expression must be evaluated before an expression of which it is a part is evaluated.
5.3. Unary Operators
All of the unary operators in EFL are prefix operators. The result of a unary operator has
the same type as its operand.

5.3.1. Arithmetic
Unary + has no effect. A unary - yields the negative of its operand.
The prefix operator ++ adds one to its operand. The prefix operator -- subtracts one
from its operand. The value of either expression is the result of the addition or subtraction.
For these two operators, the operand must be a scalar, array element, or structure member of
arithmetic type. (As a side effect, the operand value is changed.)
5.3.2. Logical
The only logical unary operator is complement (~). This operator is defined by the equations

    ~ true = false
    ~ false = true
5.4. Binary Operators
Most EFL operators have two operands, separated by the operator. Because the character
set must be limited, some of the operators are denoted by strings of two or three special characters. All binary operators except exponentiation are left associative.
5.4.1. Arithmetic
The binary arithmetic operators are

    +     addition
    -     subtraction
    *     multiplication
    /     division
    **    exponentiation

Exponentiation is right associative: a**b**c = a**(b**c) = a^(b^c). The operations have the conventional meanings: 8+2 = 10, 8-2 = 6, 8*2 = 16, 8/2 = 4, 8**2 = 8^2 = 64.
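
Right associativity can be made concrete with a small Python sketch. The helper power_chain is a hypothetical name; it groups a chain of exponentiations from the right, which is also how Python's own ** operator behaves.

```python
def power_chain(operands):
    """Evaluate a ** b ** c ... grouping from the right: a ** (b ** c)."""
    result = operands[-1]
    for base in reversed(operands[:-1]):
        result = base ** result
    return result
```

So power_chain([2, 3, 2]) computes 2**(3**2) = 512, whereas the left-grouped (2**3)**2 would give 64.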
The type of the result of a binary operation A op B is determined by the types of its
operands:

                                        Type of B
    Type of A       integer        real           long real      complex        long complex
    integer         integer        real           long real      complex        long complex
    real            real           real           long real      complex        long complex
    long real       long real      long real      long real      long complex   long complex
    complex         complex        complex        long complex   complex        long complex
    long complex    long complex   long complex   long complex   long complex   long complex

If the type of an operand differs from the type of the result, the calculation is done as if the
operand were first coerced to the type of the result. If both operands are integers, the result is
of type integer, and is computed exactly. (Quotients are truncated toward zero, so 8/3 = 2.)
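
One way to read the result-type table is that the result is complex if either operand is complex and long if either operand is long. The Python sketch below encodes that reading, which is an observation about the table rather than a rule stated in the text. The function names are hypothetical. The second function shows truncation toward zero, which differs from Python's floor division for negative quotients.

```python
def result_type(a, b):
    """Type of `a op b` for the five arithmetic types in the table."""
    is_complex = "complex" in a or "complex" in b
    is_long = a.startswith("long") or b.startswith("long")
    if is_complex:
        return "long complex" if is_long else "complex"
    if is_long:
        return "long real"
    return "real" if "real" in (a, b) else "integer"


def int_divide(a, b):
    """Integer quotient truncated toward zero (Python's // floors instead)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q
```

For instance, result_type("long real", "complex") is "long complex", and int_divide(-8, 3) is -2 where Python's -8 // 3 is -3.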

5.4.2. Logical
The two binary logical operations in EFL, and and or, are defined by the truth tables:

    A        B        A and B    A or B
    false    false    false      false
    false    true     false      true
    true     false    false      true
    true     true     true       true

Each of these operators comes in two forms. In one form, the order of evaluation is specified.
The expression

    a && b

is evaluated by first evaluating a; if it is false then the expression is false and b is not evaluated;
otherwise the expression has the value of b. The expression

    a || b

is evaluated by first evaluating a; if it is true then the expression is true and b is not evaluated;
otherwise the expression has the value of b. The other forms of the operators (& for and and |
for or) do not imply an order of evaluation. With the latter operators, the compiler may speed
up the code by evaluating the operands in any order.
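
The difference between the two forms can be observed in Python, whose `and` short-circuits like && while `&` on booleans evaluates both operands. This is an analogy only: unlike EFL's &, Python fixes a left-to-right order for `&`. The names probe and demo are hypothetical.

```python
calls = []


def probe(name, value):
    """Record that an operand was evaluated, then yield its value."""
    calls.append(name)
    return value


def demo():
    calls.clear()
    short = probe("a", False) and probe("b", True)   # like a && b
    after_short = list(calls)
    calls.clear()
    both = probe("a", False) & probe("b", True)      # like a & b
    after_both = list(calls)
    return short, after_short, both, after_both
```

Here demo() returns (False, ['a'], False, ['a', 'b']): the short-circuit form never evaluates b once a is false.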

5.4.3. Relational Operators
There are six relations between arithmetic quantities. These operators are not associative.

    EFL Operator    Meaning
    <               less than
    <=              less than or equal to
    ==              equal to
    ~=              not equal to
    >               greater than
    >=              greater than or equal

Since the complex numbers are not ordered, the only relational operators that may take complex operands are == and ~=. The character collating sequence is not defined.
5.4.4. Assignment Operators
All of the assignment operators are right associative. The simple form of assignment is

    basic-left-side = expression

A basic-left-side is a scalar variable name, array element, or structure member of basic type.
This statement computes the expression on the right side, and stores that value (possibly after
coercing the value to the type of the left side) in the location named by the left side. The
value of the assignment expression is the value assigned to the left side after coercion.
There is also an assignment operator corresponding to each binary arithmetic and logical
operator. In each case, a op= b is equivalent to a = a op b. (The operator and equal sign
must not be separated by blanks.) Thus, n += 2 adds 2 to n. The location of the left side is
evaluated only once.
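
The remark that the location of the left side is evaluated only once matters when the subscript has a side effect. Python's augmented assignment behaves the same way, so it can serve as an illustration. The counter function next_index is a hypothetical helper.

```python
calls = 0


def next_index():
    """Return 0, 1, 2, ... on successive calls (a subscript with a side effect)."""
    global calls
    calls += 1
    return calls - 1


a = [10, 20, 30]
a[next_index()] += 2                      # subscript evaluated once: a[0] += 2

b = [10, 20, 30]
b[next_index()] = b[next_index()] + 2     # evaluated twice: two different slots
```

After these statements a is [12, 20, 30], while the spelled-out form has read one element of b and stored into another, leaving b as [10, 20, 22].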

5.5. Dynamic Structures
EFL does not have an address (pointer, reference) type. However, there is a notation for
dynamic structures,

    leftside -> structurename

This expression is a structure with the shape implied by structurename but starting at the location of leftside. In effect, this overlays the structure template at the specified location. The leftside must be a variable, array, array element, or structure member. The type of the leftside
must be one of the types in the structure declaration. An element of such a structure is
denoted in the usual way using the dot operator. Thus,

    place(i) -> st.elt

refers to the elt member of the st structure starting at the i-th element of the array place.

5.6. Repetition Operator
Inside of a list, an element of the form

integer-constant-expression $ constant-expression

is equivalent to the appearance of the expression a number of times equal to the first expression.
Thus,
(3, 3$4, 5)

is equivalent to
(3, 4, 4, 4, 5)
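
The expansion performed by the repetition operator can be sketched in Python. The representation is an assumption made for illustration: a repeated element count $ value is written as the tuple (count, value), and expand_repeats is a hypothetical name.

```python
def expand_repeats(items):
    """Expand (count, value) pairs; plain values pass through unchanged."""
    out = []
    for item in items:
        if isinstance(item, tuple):
            count, value = item
            out.extend([value] * count)
        else:
            out.append(item)
    return out
```

With this encoding, expand_repeats([3, (3, 4), 5]) gives [3, 4, 4, 4, 5], matching the list above.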
5.7. Constant Expressions
If an expression is built up out of operators (other than functions) and constants, the
value of the expression is a constant, and may be used anywhere a constant is required.
6. DECLARATIONS
Declaration statements describe the meaning, shape, and size of named objects in the
EFL language.
6.1. Syntax
A declaration statement is made up of attributes and variables. Declaration statements are
of two forms:

attributes variable-list
attributes { declarations }

In the first case, each name in the variable-list has the specified attributes. In the second, each
name in the declarations also has the specified attributes. A variable name may appear in more
than one variable list, so long as the attributes are not contradictory. Each name of a nonargument variable may be accompanied by an initial value specification. The declarations inside the
braces are one or more declaration statements. Examples of declarations are
integer k=2
long real b(7,3)
common (cname)
{
    integer i
    long real array(5,0:3) x, y
    character(7) ch
}
6.2. Attributes
6.2.1. Basic Types
The following are basic types in declarations
logical
integer
field(m:n)
character(k)
real
complex


In the above, the quantities k, m, and n denote integer constant expressions with the properties k > 0 and n > m.

6.2.2. Arrays
The dimensionality may be declared by an array attribute
array(b1, ..., bn)
Each of the bi may either be a single integer expression or a pair of integer expressions
separated by a colon. The pair of expressions form a lower and an upper bound; the single
expression is an upper bound with an implied lower bound of 1. The number of dimensions is
equal to n, the number of bounds. All of the integer expressions must be constants. An
exception is permitted only if all of the variables associated with an array declarator are formal
arguments of the procedure; in this case, each bound must have the property that
upper-lower+1 is equal to a formal argument of the procedure. (The compiler has limited ability to simplify expressions, but it will recognize important cases such as (0:n-1).) The upper
bound for the last dimension (bn) may be marked by an asterisk (*) if the size of the array is
not known. The following are legal array attributes:

array(5)
array(5, 1:5, -3:0)
array(5, *)
array(0:m-1, m)
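
The bound rules can be checked with a little arithmetic. The Python sketch below (not EFL) assumes each bound is given either as an integer, meaning an upper bound with lower bound 1, or as a (lower, upper) pair; the function names are hypothetical.

```python
def extents(bounds):
    """Return the extent of each dimension.

    Each bound is an int (upper bound, implied lower bound 1)
    or a (lower, upper) pair.
    """
    sizes = []
    for b in bounds:
        lower, upper = (1, b) if isinstance(b, int) else b
        sizes.append(upper - lower + 1)
    return sizes


def element_count(bounds):
    """Total number of elements implied by the bounds."""
    total = 1
    for size in extents(bounds):
        total *= size
    return total


# Models array(5, 1:5, -3:0).
print(extents([5, (1, 5), (-3, 0)]))
print(element_count([5, (1, 5), (-3, 0)]))
```

Under these assumptions the second dimension, 1:5, has the same extent as a plain 5, and -3:0 has four elements.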
6.2.3. Structures
A structure declaration is of the form
struct structname { declaration statements }
The structname is optional; if it is present, it acts as if it were the name of a type in the rest of
its scope. Each name that appears inside the declarations is a member of the structure, and has a
special meaning when used to qualify any variable declared with the structure type. A name
may appear as a member of any number of structures, and may also be the name of an ordinary
variable, since a structure member name is used only in contexts where the parent type is
known. The following are valid structure attributes:

struct xx
{
integer a, b
real x(5)
}
struct { xx z (3); character(5) y }
The last line defines a structure containing an array of three xx's and a character string.

6.2.4. Precision
Variables of floating point (real or complex) type may be declared to be long to ensure
they have higher precision than ordinary floating point variables. The default precision is short.
6.2.5. Common
Certain objects called common areas have external scope, and may be referenced by any
procedure that has a declaration for the name using a
common ( commonareaname )
attribute. All of the variables declared with a particular common attribute are in the same


block; the order in which they are declared is significant. Declarations for the same block in
differing procedures must have the variables in the same order and with the same types, precision, and shapes, though not necessarily with the same names.
6.2.6. External
If a name is used as the procedure name in a procedure invocation, it is implicitly
declared to have the external attribute. If a procedure name is to be passed as an argument, it
is necessary to declare it in a statement of the form
external [ name ]

If a name has the external attribute and it is a formal argument of the procedure, then it is
associated with a procedure identifier passed as an actual argument at each call. If the name is
not a formal argument, then that name is the actual name of a procedure, as it appears in the
corresponding procedure statement.
6.3. Variable List
The elements of a variable list in a declaration consist of a name, an optional dimension
specification, and an optional initial value specification. The name follows the usual rules. The
dimension specification is the same form and meaning as the parenthesized list in an array
attribute. The initial value specification is an equal sign (=) followed by a constant expression.
If the name is an array, the right side of the equal sign may be a parenthesized list of constant
expressions, or repeated elements or lists; the total number of elements in the list must not
exceed the number of elements of the array, which are filled in column-major order.
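
Column-major order means the first subscript varies fastest. The Python sketch below (not EFL) models the fill order for a two-dimensional array. The function name and the use of None for elements that receive no initial value are assumptions; the text does not say what uninitialized elements contain.

```python
def fill_column_major(rows, cols, values, default=None):
    """Place values into a rows x cols grid, first subscript varying fastest."""
    if len(values) > rows * cols:
        raise ValueError("more initial values than array elements")
    grid = [[default] * cols for _ in range(rows)]
    for n, value in enumerate(values):
        col, row = divmod(n, rows)
        grid[row][col] = value
    return grid


# A 2 x 3 array initialized with the list (1, 2, 3, 4, 5, 6).
for row in fill_column_major(2, 3, [1, 2, 3, 4, 5, 6]):
    print(row)
```

The first column receives 1 and 2, the second 3 and 4, and so on.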

6.4. The Initial Statement
An initial value may also be specified for a simple variable, array, array element, or
member of a structure using a statement of the form
initial [ var = val ]
The var may be a variable name, array element specification, or member of structure. The
right side follows the same rules as for an initial value specification in other declaration statements.
7. EXECUTABLE STATEMENTS
Every useful EFL program contains executable statements; otherwise it would not do
anything and would not need to be run. Statements are frequently made up of other statements. Blocks are the most obvious case, but many other forms contain statements as constituents.
To increase the legibility of EFL programs, some of the statement forms can be broken
without an explicit continuation. A square (□) in the syntax represents a point where the end
of a line will be ignored.

7.1. Expression Statements
7.1.1. Subroutine Call
A procedure invocation that returns no value is known as a subroutine call. Such an
invocation is a statement. Examples are
work (in, out)
run()

Input/output statements (see Section 7.7) resemble procedure invocations but do not
yield a value. If an error occurs the program stops.


7.1.2. Assignment Statements
An expression that is a simple assignment (=) or a compound assignment (+= etc.) is a
statement:
a = b
a = sin(x)/6
x *= y
7.2. Blocks
A block is a compound statement that acts as a statement. A block begins with a left
brace, optionally followed by declarations, optionally followed by executable statements, followed by a right brace. A block may be used anywhere a statement is permitted. A block is
not an expression and does not have a value. An example of a block is
{
integer i    # this variable is unknown outside the braces
big = 0
do i = 1,n
    if(big < a(i))
        big = a(i)
}

7.3. Test Statements
Test statements permit execution of certain statements conditional on the truth of a predicate.
7.3.1. If Statement
The simplest of the test statements is the if statement, of form
if ( logical-expression ) □ statement

The logical expression is evaluated; if it is true, then the statement is executed.

7.3.2. If-Else
A more general statement is of the form
if ( logical-expression ) □ statement-1 □ else □ statement-2
If the expression is true then statement-1 is executed, otherwise statement-2 is executed. Either
of the consequent statements may itself be an if-else so a completely nested test sequence is
possible:
if(x<y)
    if(a<b)
        k = 1
    else
        k = 2
else
    if(a<b)
        m = 1
    else
        m = 2

An else applies to the nearest preceding un-elsed if. A more common use is as a sequential test:

if(x == 1)
    k = 1
else if(x==3 | x==5)
    k = 2
else
    k = 3
7.3.3. Select Statement
A multiway test on the value of a quantity is succinctly stated as a select statement, which
has the general form

select ( expression ) □ block
Inside the block two special types of labels are recognized. A prefix of the form
case [ constant ] :
marks the statement to which control is passed if the expression in the select has a value equal

to one of the case constants. If the expression equals none of these constants, but there is a
label default inside the select, a branch is taken to that point; otherwise the statement following
the right brace is executed. Once execution begins at a case or default label, it continues until
the next case or default is encountered. The else-if example above is better written as

select(x)
{
case 1:
    k = 1
case 3,5:
    k = 2
default:
    k = 3
}
Note that control does not 'fall through' to the next case.
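
Because exactly one arm of a select runs, it behaves like an if/else-if chain rather than like a C switch. A minimal Python model of the example (not EFL; the function name select_k is hypothetical):

```python
def select_k(x):
    """Model of the select(x) example: exactly one arm runs."""
    if x == 1:              # case 1:
        k = 1
    elif x in (3, 5):       # case 3,5:
        k = 2
    else:                   # default:
        k = 3
    return k


print([select_k(x) for x in (1, 3, 5, 7)])
```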

7.4. Loops
The loop forms provide the best way of repeating a statement or sequence of operations.
The simplest (while) form is theoretically sufficient, but it is very convenient to have the more
general loops available, since each expresses a mode of control that arises frequently in practice.

7.4.1. While Statement
This construct has the form
while ( logical-expression ) □ statement
The expression is evaluated; if it is true, the statement is executed, and then the test is performed again. If the expression is false, execution proceeds to the next statement.

7.5. For Statement
The for statement is a more elaborate looping construct. It has the form
for ( initial-statement , □ logical-expression , □ iteration-statement ) □ body-statement
Except for the behavior of the next statement (see Section 7.6.3), this construct is equivalent
to

initial-statement
while ( logical-expression )
{
    body-statement
    iteration-statement
}
This form is useful for general arithmetic iterations, and for various pointer-type operations.
The sum of the integers from 1 to 100 can be computed by the fragment

n = 0
for(i = 1, i <= 100, i += 1)
    n += i
Alternatively, the computation could be done by the single statement
for( {n=0; i=1}, i<=100, {n+=i; ++i} )
    ;
Note that the body of the for loop is a null statement in this case. An example of following a
linked list will be given later.
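
The for-to-while equivalence can be traced by hand. The Python sketch below (not EFL) spells out the four parts of the summation loop in the order the expansion gives them; the comments name the corresponding part of the for statement.

```python
n = 0
i = 1                 # initial-statement
while i <= 100:       # logical-expression
    n += i            # body-statement
    i += 1            # iteration-statement

print(n)
```

On exit i is 101, the first value that fails the test.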

7.5.1. Repeat Statement
The statement
repeat □ statement
executes the statement, then does it again, without any termination test. Obviously, a test
inside the statement is needed to stop the loop.
7.5.2. Repeat ... Until Statement
The while loop performs a test before each iteration. The statement
repeat □ statement □ until ( logical-expression )
executes the statement, then evaluates the logical expression; if it is true the loop is complete; otherwise control returns to the statement. Thus, the body is always executed at least once. The
until refers to the nearest preceding repeat that has not been paired with an until. In practice,
this appears to be the least frequently used looping construct.

7.5.3. Do Loops
The simple arithmetic progression is a very common one in numerical applications. EFL
has a special loop form for ranging over an ascending arithmetic sequence:
do variable = expression-1, expression-2, expression-3
    statement
The variable is first given the value expression-1. The statement is executed, then expression-3
is added to the variable. The loop is repeated until the variable exceeds expression-2. If
expression-3 and the preceding comma are omitted, the increment is taken to be 1. The loop
above is equivalent to

t2 = expression-2
t3 = expression-3
for(variable = expression-1 , variable <= t2 , variable += t3)
    statement
(The compiler translates EFL do statements into Fortran DO statements, which are in turn usually compiled into excellent code.) The do variable may not be changed inside of the loop, and
expression-1 must not exceed expression-2. The sum of the first hundred positive integers could

be computed by

n = 0

do i = 1, 100
n += i
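
The point of the temporaries in the equivalent for loop is that the limit and the increment are evaluated once, before the loop starts. The Python generator below (not EFL) is a sketch of that expansion; the name efl_do is hypothetical, and it follows the for-loop form, which tests before the first iteration.

```python
def efl_do(first, last, step=1):
    """Yield the values taken by the do variable.

    The limit and increment are captured once, before the loop starts,
    in the manner of the temporaries t2 and t3.
    """
    t2, t3 = last, step
    value = first
    while value <= t2:
        yield value
        value += t3


print(sum(efl_do(1, 100)))
print(list(efl_do(1, 10, 3)))
```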
7.6. Branch Statements
Most of the need for branch statements in programs can be averted by using the loop and
test constructs, but there are programs where they are very useful.
7.6.1. Goto Statement
The most general, and most dangerous, branching statement is the simple unconditional
goto label
After executing this statement, the next statement performed is the one following the given
label. Inside of a select the case labels of that block may be used as labels, as in the following
example:
select(k)
{
case 1:
    error(7)
case 2:
    k = 2
    goto case 4
case 3:
    k = 5
    goto case 4
case 4:
    fixup(k)
    goto default
default:
    prmsg("ouch")
}
(If two select statements are nested, the case labels of the outer select are not accessible from
the inner one.)
7.6.2. Break Statement
A safer statement is one which transfers control to the statement following the current
select or loop form. A statement of this sort is almost always needed in a repeat loop:
repeat
{
    do a computation
    if (finished)
        break
}

More general forms permit controlling a branch out of more than one construct.


break 3
transfers control to the statement following the third loop and/or select surrounding the statement. It is possible to specify which type of construct (for, while, repeat, do, or select) is to
be counted. The statement
break while
breaks out of the first surrounding while statement. Either of the statements
break 3 for
break for 3
will transfer to the statement after the third enclosing for loop.
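
Python has no multi-level break, so the counted form can only be modeled indirectly. The sketch below (not EFL) uses an exception that carries the number of levels still to be left; the class Break and the helper loop are hypothetical, and the typed forms such as break for 3 are not modeled.

```python
class Break(Exception):
    """Signal that `levels` enclosing loops should be left."""

    def __init__(self, levels):
        super().__init__(levels)
        self.levels = levels


def loop(iterable, body):
    """Run body(item) for each item, honoring a multi-level Break."""
    for item in iterable:
        try:
            body(item)
        except Break as b:
            if b.levels > 1:
                # Pass the request on to the next enclosing loop.
                raise Break(b.levels - 1) from None
            return


visited = []


def inner(i):
    def body(j):
        visited.append((i, j))
        if (i, j) == (1, 1):
            raise Break(2)      # plays the role of `break 2`
    loop(range(3), body)


loop(range(3), inner)
print(visited)
```

Both loops stop as soon as the pair (1, 1) has been visited.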

7.6.3. Next Statement
The next statement causes the first surrounding loop statement to go on to the next iteration: the next operation performed is the test of a while, the iteration-statement of a for, the
body of a repeat, the test of a repeat...until, or the increment of a do. Elaborations similar to
those for break are available:
next
next 3
next 3 for
next for 3
A next statement ignores select statements.
7.6.4. Return
The last statement of a procedure is followed by a return of control to the caller. If it is
desired to effect such a return from any other point in the procedure, a
return
statement may be executed. Inside a function procedure, the function value is specified as an
argument of the statement:
return ( expression )
7.7. Input/Output Statements
EFL has two input statements (read and readbin), two output statements (write and writebin), and three control statements (endfile, rewind, and backspace). These forms may be
used either as a primary with an integer value or as a statement. If an exception occurs when
one of these forms is used as a statement, the result is undefined but will probably be treated as
a fatal error. If they are used in a context where they return a value, they return zero if no
exception occurs. For the input forms, a negative value indicates end-of-file and a positive
value an error. The input/output part of EFL very strongly reflects the facilities of Fortran.
7.7.1. Input/Output Units
Each I/O statement refers to a 'unit', identified by a small positive integer. Two special
units are defined by EFL, the standard input unit and the standard output unit. These particular
units are assumed if no unit is specified in an I/O transmission statement.
The data on the unit are organized into records. These records may be read or written in a
fixed sequence, and each transmission moves an integral number of records. Transmission
proceeds from the first record until the end of file.

7.7.2. Binary Input/Output
The readbin and writebin statements transmit data in a machine-dependent but swift
manner. The statements are of the form
writebin ( unit , binary-output-list )
readbin ( unit , binary-input-list )
Each statement moves one unformatted record between storage and the device. The unit is an
integer expression. A binary-output-list is an iolist (see below) without any format specifiers. A
binary-input-list is an iolist without format specifiers in which each of the expressions is a variable name, array element, or structure member.

7.7.3. Formatted Input/Output
The read and write statements transmit data in the form of lines of characters. Each
statement moves one or more records (lines). Numbers are translated into decimal notation.
The exact form of the lines is determined by format specifications, whether provided explicitly
in the statement or implicitly. The syntax of the statements is
write ( unit , formatted-output-list )
read ( unit , formatted-input-list )
The lists are of the same form as for binary I/O, except that the lists may include format
specifications. If the unit is omitted, the standard input or output unit is used.

7.7.4. Iolists
An iolist specifies a set of values to be written or a set of variables into which values are to
be read. An iolist is a list of one or more ioexpressions of the form

expression
{ iolist}
do-specification { iolist }

For formatted I/O, an ioexpression may also have the forms
ioexpression : format-specifier
: format-specifier

A do-specification looks just like a do statement, and has a similar effect: the values in the braces
are transmitted repeatedly until the do execution is complete.

7.7.5. Formats
The following are permissible format-specifiers. The quantities w, d, and k must be
integer constant expressions.

i(w)      integer with w digits
f(w,d)    floating point number of w characters,
          d of them to the right of the decimal point
e(w,d)    floating point number of w characters,
          d of them to the right of the decimal point,
          with the exponent field marked with the letter e
l(w)      logical field of width w characters,
          the first of which is t or f
          (the rest are blank on output, ignored on input),
          standing for true and false respectively
c         character string of width equal to the length of the datum
c(w)      character string of width w
s(k)      skip k lines
x(k)      skip k spaces
"..."     use the characters inside the string as a Fortran format

If no format is specified for an item in a formatted input/output statement, a default form is
chosen.

If an item in a list is an array name, then the entire array is transmitted as a sequence of
elements, each with its own format. The elements are transmitted in column-major order, the
same order used for array initializations.
7.7.6. Manipulation statements
The three input/output statements
backspace (unit)
rewind (unit)
endfile (unit)
look like ordinary procedure calls, but may be used either as statements or as integer expressions which yield non-zero if an error is detected. backspace causes the specified unit to back
up, so that the next read will re-read the previous record, and the next write will over-write it.
rewind moves the device to its beginning, so that the next input statement will read the first
record. endfile causes the file to be marked so that the record most recently written will be the
last record on the file, and any attempt to read past is an error.
8. PROCEDURES
Procedures are the basic unit of an EFL program, and provide the means of segmenting a
program into separately compilable and named parts.
8.1. Procedure Statement
Each procedure begins with a statement of one of the forms

procedure
attributes procedure procedurename
attributes procedure procedurename ( )
attributes procedure procedurename ( [ name ] )
The first case specifies the main procedure, where execution begins. In the two other cases, the
attributes may specify precision and type, or they may be omitted entirely. The precision and
type of the procedure may be declared in an ordinary declaration statement. If no type is
declared, then the procedure is called a subroutine and no value may be returned for it. Otherwise, the procedure is a function and a value of the declared type is returned for each call.
Each name inside the parentheses in the last form above is called a formal argument of the procedure.


8.2. End Statement
Each procedure terminates with a statement
end
8.3. Argument Association
When a procedure is invoked, the actual arguments are evaluated. If an actual argument
is the name of a variable, an array element, or a structure member, that entity becomes associated with the formal argument, and the procedure may reference the values in the object, and
assign to it. Otherwise, the value of the actual is associated with the formal argument, but the
procedure may not attempt to change the value of that formal argument.
If the value of one of the arguments is changed in the procedure, it is not permitted that
the corresponding actual argument be associated with another formal argument or with a common element that is referenced in the procedure.

8.4. Execution and Return Values
After actual and formal arguments have been associated, control passes to the first executable statement of the procedure. Control returns to the invoker either when the end statement
of the procedure is reached or when a return statement is executed. If the procedure is a function (has a declared type), and a return (value) is executed, the value is coerced to the correct
type and precision and returned.

8.5. Known Functions
A number of functions are known to EFL, and need not be declared. The compiler
knows the types of these functions. Some of them are generic; i.e., they name a family of functions that differ in the types of their arguments and return values. The compiler chooses which
element of the set to invoke based upon the attributes of the actual arguments.
8.5.1. Minimum and Maximum Functions
The generic functions are min and max. The min calls return the value of their smallest
argument; the max calls return the value of their largest argument. These are the only functions that may take different numbers of arguments in different calls. If any of the arguments
are long real then the result is long real. Otherwise, if any of the arguments are real then the
result is real; otherwise all the arguments and the result must be integer. Examples are
min(5, x, -3.20)
max(i, z)
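
The result-type rule for min and max amounts to picking the widest type among the arguments. A small Python sketch of that rule (not EFL; the table RANK and the function name result_type are hypothetical, and types are represented as plain strings):

```python
# Wider types have higher rank.
RANK = {"integer": 0, "real": 1, "long real": 2}


def result_type(arg_types):
    """Return the widest type among the arguments."""
    return max(arg_types, key=RANK.__getitem__)


print(result_type(["integer", "real", "real"]))
print(result_type(["integer", "long real"]))
```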
8.5.2. Absolute Value
The abs function is a generic function that returns the magnitude of its argument. For
integer and real arguments the type of the result is identical to the type of the argument; for
complex arguments the type of the result is the real of the same precision.
8.5.3. Elementary Functions
The following generic functions take arguments of real, long real, or complex type and
return a result of the same type:

sin      sine function
cos      cosine function
exp      exponential function ($e^x$)
log      natural (base e) logarithm
log10    common (base 10) logarithm
sqrt     square root function ($\sqrt{x}$)

In addition, the following functions accept only real or long real arguments:

atan     atan(x) = $\tan^{-1} x$
atan2    atan2(x,y) = $\tan^{-1} (x/y)$

8.5.4. Other Generic Functions
The sign function takes two arguments of identical type; sign(x,y) = sgn(y)|x|. The
mod function yields the remainder of its first argument when divided by its second. These
functions accept integer and real arguments.
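
A Python model of these two functions follows (not EFL). Two details are assumptions borrowed from Fortran rather than stated in the text: a zero second argument to sign is treated as positive, and the remainder from mod takes the sign of the first argument.

```python
def sign(x, y):
    """Magnitude of x with the sign of y (zero y counts as positive here)."""
    return abs(x) if y >= 0 else -abs(x)


def mod(a, b):
    """Remainder of a divided by b, carrying the sign of a (an assumption)."""
    r = abs(a) % abs(b)
    return r if a >= 0 else -r


print(sign(3, -2), sign(-3, 2))
print(mod(7, 3), mod(-7, 3), mod(7, -3))
```

Note that Python's own % operator would give 2 for -7 % 3, which is why the helper works on magnitudes.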

9. ATAVISMS
Certain facilities are included in the EFL language to ease the conversion of old Fortran
or Ratfor programs to EFL.

9.1. Escape Lines
In order to make use of nonstandard features of the local Fortran compiler, it is occasionally necessary to pass a particular line through to the EFL compiler output. A line that begins
with a percent sign ('%') is copied through to the output, with the percent sign removed but no
other change. Inside of a procedure, each escape line is treated as an executable statement. If
a sequence of lines constitute a continued Fortran statement, they should be enclosed in braces.
9.2. Call Statement
A subroutine call may be preceded by the keyword call.
call joe
call work (17)
9.3. Obsolete Keywords
The following keywords are recognized as synonyms of EFL keywords:
Fortran             EFL
double precision    long real
function            procedure
subroutine          procedure (untyped)

9.4. Numeric Labels
Standard statement labels are identifiers. A numeric (positive integer constant) label is
also permitted; the colon is optional following a numeric label.

9.5. Implicit Declarations
If a name is used but does not appear in a declaration, the EFL compiler gives a warning
and assumes a declaration for it. If it is used in the context of a procedure invocation, it is
assumed to be a procedure name; otherwise it is assumed to be a local variable defined at nesting level 1 in the current procedure. The assumed type is determined by the first letter of the
name. The association of letters and types may be given in an implicit statement, with syntax
implicit ( letter-list ) type
where a letter-list is a list of individual letters or ranges (pair of letters separated by a minus
sign). If no implicit statement appears, the following rules are assumed:
implicit (a-h, o-z) real
implicit (i-n) integer
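
The default rule is easy to state as a function of the first letter. The Python sketch below (not EFL) models only the default rules, not user-supplied implicit statements; the function name is hypothetical, and folding the letter to lower case is an assumption.

```python
def implicit_type(name):
    """Default type of an undeclared name under the default implicit rules."""
    first = name[0].lower()     # case folding is an assumption
    return "integer" if "i" <= first <= "n" else "real"


print(implicit_type("count"), implicit_type("k"))
```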

9.6. Computed goto
Fortran contains an indexed multi-way branch; this facility may be used in EFL by the
computed goto:
goto ( [ label ] ) , expression
The expression must be of type integer and be positive but be no larger than the number of
labels in the list. Control is passed to the statement marked by the label whose position in the
list is equal to the expression.

9.7. Go To Statement
In unconditional and computed goto statements, it is permissible to separate the go and to
words, as in
go to xyz
9.8. Dot Names
Fortran uses a restricted character set, and represents certain operators by multi-character
sequences. There is an option (dots=on; see Section 10.2) which forces the compiler to recognize the forms in the second column below:
<        .lt.
<=       .le.
>        .gt.
>=       .ge.
==       .eq.
~=       .ne.
&        .and.
|        .or.
&&       .andand.
||       .oror.
~        .not.
true     .true.
false    .false.

In this mode, no structure element may be named lt, le, etc. The readable forms in the left
column are always recognized.


9.9. Complex Constants
A complex constant may be written as a parenthesized list of real quantities, such as
(1.5, 3.0)
The preferred notation is by a type coercion,
complex(1.5, 3.0)
9.10. Function Values
The preferred way to return a value from a function in EFL is the return (value) construct. However, the name of the function acts as a variable to which values may be assigned;
an ordinary return statement returns the last value assigned to that name as the function value.
9.11. Equivalence
A statement of the form
equivalence v1, v2, ..., vn
declares that each of the vi starts at the same memory location. Each of the vi may be a variable name, array element name, or structure member.
9.12. Minimum and Maximum Functions
There are a number of non-generic functions in this category, which differ in the required
types of the arguments and the type of the return value. They may also have variable numbers
of arguments, but all the arguments must have the same type.
Function    Argument Type    Result Type
amin0       integer          real
amin1       real             real
min0        integer          integer
min1        real             integer
dmin1       long real        long real

amax0       integer          real
amax1       real             real
max0        integer          integer
max1        real             integer
dmax1       long real        long real

10. COMPILER OPTIONS
A number of options can be used to control the output and to tailor it for various compilers and systems. The defaults chosen are conservative, but it is sometimes necessary to
change the output to match peculiarities of the target environment.
Options are set with statements of the form
option [ opt ]
where each opt is of one of the forms
optionname
optionname = optionvalue

The optionvalue is either a constant (numeric or string) or a name associated with that option.
The two names yes and no apply to a number of options.


10.1. Default Options
Each option has a default setting. It is possible to change the whole set of defaults to
those appropriate for a particular environment by using the system option. At present, the only
valid values are system=unix and system=gcos.
10.2. Input Language Options
The dots option determines whether the compiler recognizes .lt. and similar forms. The
default setting is no.
10.3. Input/Output Error Handling
The ioerror option can be given three values: none means that none of the I/O statements
may be used in expressions, since there is no way to detect errors. The implementation of the
ibm form uses ERR= and END= clauses. The implementation of the fortran77 form uses
IOSTAT= clauses.

10.4. Continuation Conventions
By default, continued Fortran statements are indicated by a character in column 6 (Standard Fortran). The option continue=column1 puts an ampersand (&) in the first column of
the continued lines instead.
10.5. Default Formats
If no format is specified for a datum in an iolist for a read or write statement, a default is
provided. The default formats can be changed by setting certain options:
Option      Type
iformat     integer
rformat     real
dformat     long real
zformat     complex
zdformat    long complex
lformat     logical

The associated value must be a Fortran format, such as
option rformat=f22.6

10.6. Alignments and Sizes
In order to implement character variables, structures, and the sizeof and lengthof operators, it is necessary to know how much space various Fortran data types require, and what
boundary alignment properties they demand. The relevant options are
Fortran Type    Size Option    Alignment Option
integer         isize          ialign
real            rsize          ralign
long real       dsize          dalign
complex         zsize          zalign
logical         lsize          lalign

The sizes are given in terms of an arbitrary unit; the alignment is given in the same units. The
option charperint gives the number of characters per integer variable.


10.7. Default Input/Output Units
The options ftnin and ftnout are the numbers of the standard input and output units.
The default values are ftnin=5 and ftnout=6.
10.8. Miscellaneous Output Control Options
Each Fortran procedure generated by the compiler will be preceded by the value of the
procheader option.
No Hollerith strings will be passed as subroutine arguments if hollincall=no is specified.
The Fortran statement numbers normally start at 1 and increase by 1. It is possible to
change the increment value by using the deltastno option.

11. EXAMPLES
In order to show the flavor of programming in EFL, we present a few examples. They are
short, but show some of the convenience of the language.
11.1. File Copying
The following short program copies the standard input to the standard output, provided
that the input is a formatted file containing lines no longer than a hundred characters.
procedure # main program
character(100) line
while( read( , line) == 0 )
    write( , line)
end
Since read returns zero until the end of file (or a read error), this program keeps reading and
writing until the input is exhausted.
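
For comparison, the same read-until-exhausted loop can be written in Python (not EFL). The function name copy_lines is hypothetical, and in-memory streams stand in for the standard input and output units so that the sketch can be run anywhere.

```python
import io


def copy_lines(src, dst):
    """Copy src to dst one line at a time until the input is exhausted."""
    while True:
        line = src.readline()
        if line == "":          # an empty string marks end of file
            break
        dst.write(line)


out = io.StringIO()
copy_lines(io.StringIO("first\nsecond\n"), out)
print(out.getvalue(), end="")
```

Passing sys.stdin and sys.stdout instead would copy the real standard streams.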

11.2. Matrix Multiplication
The following procedure multiplies the m x n matrix a by the n x p matrix b to give the
m x p matrix c. The calculation obeys the formula $c_{ij} = \sum_{k} a_{ik} b_{kj}$.
procedure matmul (a,b,c, m,n,p)
integer i, j, k, m, n, p
long real a(m,n), b(n,p), c(m,p)
do i = 1,m
    do j = 1,p
    {
        c(i,j) = 0
        do k = 1,n
            c(i,j) += a(i,k) * b(k,j)
    }
end
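
The same triple loop, transcribed into Python (not EFL) as a quick check of the index pattern. Matrices are represented as lists of rows with zero-based subscripts, and the result is returned rather than stored through an argument; both are choices made for this sketch.

```python
def matmul(a, b):
    """Multiply the m x n matrix a by the n x p matrix b."""
    m, n, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```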

11.3. Searching a Linked List
Assume we have a list of pairs of numbers (x ,y). The list is stored as a linked list sorted
in ascending order of x values. The following procedure searches this list for a particular value
of x and returns the corresponding y value.

define LAST 0
define NOTFOUND -1
integer procedure val (list, first, x)

# list is an array of structures.
# Each structure contains a thread index value, an x, and a y value.
struct
{
    integer nextindex
    integer x, y
} list(*)
integer first, p, arg

for(p = first , p ~= LAST && list(p).x <= x , p = list(p).nextindex)
    if(list(p).x == x)
        return( list(p).y )

return(NOTFOUND)
end
The search is a single for loop that begins with the head of the list and examines items until
either the list is exhausted (p == LAST) or until it is known that the specified value is not on
the list (list(p).x > x). The two tests in the conjunction must be performed in the specified
order to avoid using an invalid subscript in the list(p) reference. Therefore, the && operator is
used. The next element in the chain is found by the iteration statement p=list(p).nextindex.
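
The search logic can be exercised with a Python model (not EFL). Here the array of structures is represented as a dictionary from index to a (nextindex, x, y) tuple, which is a choice made for this sketch; Python's and operator short-circuits in the same way as &&, so the subscript is never used once p reaches LAST.

```python
LAST = 0
NOTFOUND = -1


def val(nodes, first, x):
    """Search a sorted linked list; nodes maps index -> (nextindex, x, y)."""
    p = first
    while p != LAST and nodes[p][1] <= x:
        if nodes[p][1] == x:
            return nodes[p][2]
        p = nodes[p][0]
    return NOTFOUND


# Three nodes chained 1 -> 3 -> 2, with x values 10, 20, 30.
nodes = {1: (3, 10, 100), 3: (2, 20, 200), 2: (LAST, 30, 300)}
print(val(nodes, 1, 20), val(nodes, 1, 25), val(nodes, 1, 99))
```

The second call stops early because the list is sorted; the third runs off the end of the list.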
11.4. Walking a Tree
As an example of a more complicated problem, let us imagine we have an expression tree
stored in a common area, and that we want to print out an infix form of the tree. Each node is
either a leaf (containing a numeric value) or it is a binary operator, pointing to a left and a right
descendant. In a recursive language, such a tree walk would be implemented by the following
simple pseudocode:

    if this node is a leaf
        print its value
    otherwise
        print a left parenthesis
        print the left node
        print the operator
        print the right node
        print a right parenthesis

In a nonrecursive language like EFL, it is necessary to maintain an explicit stack to keep track
of the current state of the computation. The following procedure calls a procedure outch to
print a single character and a procedure outval to print a value.


    procedure walk(first)    # print out an expression tree

    integer first            # index of root node
    integer currentnode
    integer stackdepth
    common(nodes) struct
        {
        character(1) op
        integer leftp, rightp
        real val
        } tree(100)          # array of structures

    struct
        {
        integer nextstate
        integer nodep
        } stackframe(100)

    define NODE   tree(currentnode)
    define STACK  stackframe(stackdepth)

    # nextstate values
    define DOWN   1
    define LEFT   2
    define RIGHT  3

    # initialize stack with root node
    stackdepth = 1
    STACK.nextstate = DOWN
    STACK.nodep = first

    while( stackdepth > 0 )
        {
        currentnode = STACK.nodep
        select(STACK.nextstate)
            {
            case DOWN:
                if(NODE.op == " ")    # a leaf
                    {
                    outval( NODE.val )
                    stackdepth -= 1
                    }
                else
                    {                 # a binary operator node
                    outch( "(" )
                    STACK.nextstate = LEFT
                    stackdepth += 1
                    STACK.nextstate = DOWN
                    STACK.nodep = NODE.leftp
                    }

            case LEFT:
                outch( NODE.op )
                STACK.nextstate = RIGHT
                stackdepth += 1
                STACK.nextstate = DOWN
                STACK.nodep = NODE.rightp

            case RIGHT:
                outch( ")" )
                stackdepth -= 1
            }
        }
    end
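The explicit-stack technique is not specific to EFL. The following Python sketch applies the same three-state scheme (DOWN, LEFT, RIGHT). It is an illustration under stated assumptions: the tree is a dictionary mapping a node index to an (op, leftp, rightp, val) tuple, and the output is collected into a string instead of being sent to outch and outval.

```python
DOWN, LEFT, RIGHT = 1, 2, 3


def walk(tree, first):
    """Return the infix form of the expression tree rooted at index first."""
    out = []
    stack = [[DOWN, first]]          # each frame is [nextstate, nodep]
    while stack:
        frame = stack[-1]
        op, leftp, rightp, value = tree[frame[1]]
        if frame[0] == DOWN:
            if op == " ":            # a leaf
                out.append(str(value))
                stack.pop()
            else:                    # a binary operator node
                out.append("(")
                frame[0] = LEFT      # where to resume in this frame
                stack.append([DOWN, leftp])
        elif frame[0] == LEFT:
            out.append(op)
            frame[0] = RIGHT
            stack.append([DOWN, rightp])
        else:                        # RIGHT: both subtrees are done
            out.append(")")
            stack.pop()
    return "".join(out)


# The expression 1 + (2 * 3); a blank op marks a leaf.
tree = {
    1: ("+", 2, 3, 0),
    2: (" ", 0, 0, 1),
    3: ("*", 4, 5, 0),
    4: (" ", 0, 0, 2),
    5: (" ", 0, 0, 3),
}
```

Each frame records where to resume once the subtree pushed above it has been printed. A recursive language would keep the same information implicitly in its return address.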
12. PORTABILITY
One of the major goals of the EFL language is to make it easy to write portable programs.
The output of the EFL compiler is intended to be acceptable to any Standard Fortran compiler
(unless the fortran77 option is specified).
12.1. Primitives
Certain EFL operations cannot be implemented in portable Fortran, so a few machine-dependent procedures must be provided in each environment.
12.1.1. Character String Copying
The subroutine eflasc is called to copy one character string to another. If the target string
is shorter than the source, the final characters are not copied. If the target string is longer, its
end is padded with blanks. The calling sequence is
    subroutine eflasc(a, la, b, lb)
    integer a(*), la, b(*), lb
and it must copy the first lb characters from b to the first la characters of a.
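The truncate-or-pad rule for string copying can be illustrated with a short Python sketch. The function name copy_padded and the use of Python strings are assumptions for illustration only. The real primitive works on Fortran integer arrays and must be written for each machine.

```python
def copy_padded(source, target_len):
    """Return source as it would appear in a target of target_len characters."""
    # Slicing drops the final characters when the target is shorter;
    # ljust pads with blanks when the target is longer.
    return source[:target_len].ljust(target_len)


short_target = copy_padded("abcdef", 3)
long_target = copy_padded("ab", 5)
```

Here short_target holds the first three characters of the source, and long_target holds the source followed by three blanks.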

12.1.2. Character String Comparisons
The function eflcmc is invoked to determine the order of two character strings. The
declaration is
    integer function eflcmc(a, la, b, lb)
    integer a(*), la, b(*), lb
The function returns a negative value if the string a of length la precedes the string b of length
lb. It returns zero if the strings are equal, and a positive value otherwise. If the strings are of
differing length, the comparison is carried out as if the end of the shorter string were padded
with blanks.
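The comparison rule can be sketched in Python as follows. The name compare_padded is hypothetical, and the ordering used is Python's code-point order, which is an assumption: the real primitive uses whatever collating sequence the host machine provides.

```python
def compare_padded(a, b):
    """Return a negative, zero, or positive value as a precedes, equals, or follows b."""
    width = max(len(a), len(b))
    # Compare as if the shorter string were padded with blanks.
    pa, pb = a.ljust(width), b.ljust(width)
    return (pa > pb) - (pa < pb)
```

With this rule, "ab" and "ab  " compare equal, because trailing blanks are not significant.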
13. ACKNOWLEDGMENTS
A. D. Hall originated the EFL language and wrote the first compiler for it; he also gave
inestimable aid when I took up the project. B. W. Kernighan and W. S. Brown made a number
of useful suggestions about the language and about this report. N. L. Schryer has acted as willing, cheerful, and severe first user and helpful critic of each new version and facility. J. L.
Blue, L. C. Kaufman, and D. D. Warner made very useful contributions by making serious use
of the compiler, and noting and tolerating its misbehaviors.
14. REFERENCE
1. B. W. Kernighan, "Ratfor - A Preprocessor for a Rational Fortran", Bell Laboratories
   Computing Science Technical Report #55.

APPENDIX A. Relation Between EFL and Ratfor
There are a number of differences between Ratfor and EFL, since EFL is a defined
language while Ratfor is the union of the special control structures and the language accepted
by the underlying Fortran compiler. Ratfor running over Standard Fortran is almost a subset of
EFL. Most of the features described in the Atavisms section are present to ease the conversion
of Ratfor programs to EFL.
There are a few incompatibilities: The syntax of the for statement is slightly different in
the two languages: the three clauses are separated by semicolons in Ratfor, but by commas in
EFL. (The initial and iteration statements may be compound statements in EFL because of
this change). The input/output syntax is quite different in the two languages, and there is no
FORMAT statement in EFL. There are no ASSIGN or assigned GOTO statements in EFL.
The major linguistic additions are character data, factored declaration syntax, block structure, assignment and sequential test operators, generic functions, and data structures. EFL permits more general forms for expressions, and provides a more uniform syntax. (One need not
worry about the Fortran/Ratfor restrictions on subscript or DO expression forms, for example.)
APPENDIX B. COMPILER
B.1. Current Version
The current version of the EFL compiler is a two-pass translator written in portable C. It
implements all of the features of the language described above except for long complex
numbers. Versions of this compiler run under the GCOS and UNIX† operating systems.

B.2. Diagnostics
The EFL compiler diagnoses all syntax errors. It gives the line and file name (if known)
on which the error was detected. Warnings are given for variables that are used but not explicitly declared.

B.3. Quality of Fortran Produced
The Fortran produced by EFL is quite clean and readable. To the extent possible, the
variable names that appear in the EFL program are used in the Fortran code. The bodies of
loops and test constructs are indented. Statement numbers are consecutive. Few unneeded
GOTO and CONTINUE statements are used. It is considered a compiler bug if incorrect Fortran is produced (except for escaped lines). The following is the Fortran procedure produced
by the EFL compiler for the matrix multiplication example (Section 11.2):
      subroutine matmul(a, b, c, m, n, p)
      integer m, n, p
      double precision a(m, n), b(n, p), c(m, p)
      integer i, j, k
      do 3 i = 1, m
         do 2 j = 1, p
            c(i, j) = 0
            do 1 k = 1, n
               c(i, j) = c(i, j)+a(i, k)*b(k, j)
   1        continue
   2     continue
   3  continue
      end

The following is the procedure for the tree walk (Section 11.4):
      subroutine walk(first)
      integer first
      common /nodes/ tree
      integer tree(4, 100)
      real tree1(4, 100)
      integer staame(2, 100), stapth, curode
      integer const1(1)
      equivalence (tree(1,1), tree1(1,1))
      data const1(1)/4h    /
c print out an expression tree
c index of root node
c array of structures
c nextstate values
c initialize stack with root node
      stapth = 1
      staame(1, stapth) = 1
      staame(2, stapth) = first
   1  if (stapth .le. 0) goto 9
         curode = staame(2, stapth)
         goto 7
   2        if (tree(1, curode) .ne. const1(1)) goto 3
               call outval(tree1(4, curode))
c a leaf
               stapth = stapth-1
               goto 4
   3           call outch(1h()
c a binary operator node
               staame(1, stapth) = 2
               stapth = stapth+1
               staame(1, stapth) = 1
               staame(2, stapth) = tree(2, curode)
   4        goto 8
   5        call outch(tree(1, curode))
            staame(1, stapth) = 3
            stapth = stapth+1
            staame(1, stapth) = 1
            staame(2, stapth) = tree(3, curode)
            goto 8
   6        call outch(1h))
            stapth = stapth-1
            goto 8
   7        if (staame(1, stapth) .eq. 3) goto 6
            if (staame(1, stapth) .eq. 2) goto 5
            if (staame(1, stapth) .eq. 1) goto 2
   8     continue
         goto 1
   9  continue
      end

†UNIX is a Trademark of Bell Laboratories.

APPENDIX C. CONSTRAINTS ON THE DESIGN OF THE EFL LANGUAGE
Although Fortran can be used to simulate any finite computation, there are realistic limits
on the generality of a language that can be translated into Fortran. The design of EFL was constrained by the implementation strategy. Certain of the restrictions are petty (six character
external names), but others are sweeping (lack of pointer variables). The following paragraphs
describe the major limitations imposed by Fortran.
C.1. External Names
External names (procedure and COMMON block names) must be no longer than six
characters in Fortran. Further, an external name is global to the entire program. Therefore,
EFL can support block structure within a procedure, but it can have only one level of external
name if the EFL procedures are to be compilable separately, as are Fortran procedures.
C.2. Procedure Interface
The Fortran standards, in effect, permit arguments to be passed between Fortran procedures either by reference or by copy-in/copy-out. This indeterminacy of specification shows
through into EFL. A program that depends on the method of argument transmission is illegal
in either language.
There are no procedure-valued variables in Fortran: a procedure name may only be passed
as an argument or be invoked; it cannot be stored. Fortran (and EFL) would be noticeably
simpler if a procedure variable mechanism were available.
C.3. Pointers
The most grievous problem with Fortran is its lack of a pointer-like data type. The implementation of the compiler would have been far easier if certain hard cases could have been
handled by pointers. Further, the language could have been simplified considerably if pointers
were accessible in Fortran. (There are several ways of simulating pointers by using subscripts,
but they founder on the problems of external variables and initialization.)

C.4. Recursion
Fortran procedures are not recursive, so it was not practical to permit EFL procedures to
be recursive. (Recursive procedures with arguments can be simulated only with great pain.)
C.5. Storage Allocation
The definition of Fortran does not specify the lifetime of variables. It would be possible
but cumbersome to implement stack or heap storage disciplines by using COMMON blocks.


Berkeley Pascal User's Manual
Version 3.0 - July 1983
William N. Joy, Susan L. Graham, Charles B. Haley,
Marshall Kirk McKusick, and Peter B. Kessler
Computer Science Division
Department of Electrical Engineering and Computer Science
University of California, Berkeley
Berkeley, California 94720
Introduction
The Berkeley Pascal User's Manual consists of five major sections and an appendix. In
section 1 we give sources of information about UNIX, about the programming language Pascal,
and about the Berkeley implementation of the language. Section 2 introduces the Berkeley
implementation and provides a number of tutorial examples. Section 3 discusses the error diagnostics produced by the translators pc and pi, and the runtime interpreter px. Section 4
describes input/output with special attention given to features of the interactive implementation
and to features unique to UNIX. Section 5 gives details on the components of the system and
explanation of all relevant options. The User's Manual concludes with an appendix to Wirth's
Pascal Report with which it forms a precise definition of the implementation.
History of the implementation
The first Berkeley system was written by Ken Thompson in early 1976. The main
features of the present system were implemented by Charles Haley and William Joy during the
latter half of 1976. Earlier versions of this system have been in use since January, 1977.
The system was moved to the VAX-11 by Peter Kessler and Kirk McKusick with the porting of the interpreter in the spring of 1979, and the implementation of the compiler in the summer of 1980.


1. Sources of information
This section lists the resources available for information about general features of UNIX,†
text editing, the Pascal language, and the Berkeley Pascal implementation, concluding with a list
of references. The available documents include both so-called standard documents - those
distributed with all UNIX systems - and documents (such as this one) written at Berkeley.

1.1. Where to get documentation
Current documentation for most of the UNIX system is available "on line" at your terminal. Details on getting such documentation interactively are given in section 1.2.
1.2. Documentation describing UNIX
The following documents are those recommended as tutorial and reference material about
the UNIX system. We give the documents with the introductory and tutorial materials first, the
reference materials last.
UNIX For Beginners - Second Edition
This document is the basic tutorial for UNIX available with the standard system.
Communicating with UNIX
This is also a basic tutorial on the system and assumes no previous familiarity with computers; it was written at Berkeley.
An introduction to the C shell
This document introduces csh, the shell in common use at Berkeley, and provides a good
deal of general description about the way in which the system functions. It provides a useful
glossary of terms used in discussing the system.
UNIX Programmer's Manual
This manual is the major source of details on the components of the UNIX system. It consists of an Introduction, a permuted index, and eight command sections. Section 1 consists of
descriptions of most of the "commands" of UNIX. Most of the other sections have limited
relevance to the user of Berkeley Pascal, being of interest mainly to system programmers.
UNIX documentation often refers the reader to sections of the manual. Such a reference
consists of a command name and a section number or name. An example of such a reference
would be: ed (1). Here ed is a command name - the standard UNIX text editor, and '(1)' indicates that its documentation is in section 1 of the manual.
The pieces of the Berkeley Pascal system are pi (1), px (1), the combined Pascal translator
and interpretive executor pix (1), the Pascal compiler pc (1), the Pascal execution profiler pxp
(1), and the Pascal cross-reference generator pxref (1).
It is possible to obtain a copy of a manual section by using the man (1) command. To get
the Pascal documentation just described one could issue the command:

% man pi
to the shell. The user input here is shown in bold face; the '% ', which was printed by the shell
as a prompt, is not. Similarly the command:
% man man
asks the man command to describe itself.

†UNIX is a Trademark of Bell Laboratories.

1.3. Text editing documents
The following documents introduce the various UNIX text editors. Most Berkeley users
use a version of the text editor ex: either edit, which is a version of ex for new and casual users,
ex itself, or vi (visual) which focuses on the display editing portion of ex.
A Tutorial Introduction to the UNIX Text Editor
This document, written by Brian Kernighan of Bell Laboratories, is a tutorial for the standard UNIX text editor ed. It introduces you to the basics of text editing, and provides enough
information to meet day-to-day editing needs for ed users.
Edit: A tutorial
This introduces the use of edit, an editor similar to ed which provides a more hospitable
environment for beginning users.
Ex/edit Command Summary
This summarizes the features of the editors ex and edit in a concise form. If you have
used a line oriented editor before this summary alone may be enough to get you started.
Ex Reference Manual - Version 3.5
A complete reference on the features of ex and edit.
An Introduction to Display Editing with Vi
Vi is a display oriented text editor. It can be used on most any CRT terminal, and uses the
screen as a window into the file you are editing. Changes you make to the file are reflected in
what you see. This manual serves both as an introduction to editing with vi and a reference
manual.
Vi Quick Reference
This reference card is a handy quick guide to vi; you should get one when you get the
introduction to vi.
1.4. Pascal documents - The language
This section describes the documents on the Pascal language which are likely to be most
useful to the Berkeley Pascal user. Complete references for these documents are given in section 1.6.
Pascal User Manual
By Kathleen Jensen and Niklaus Wirth, the User Manual provides a tutorial introduction
to the features of the language Pascal, and serves as an excellent quick-reference to the
language. The reader with no familiarity with Algol-like languages may prefer one of the Pascal
text books listed below, as they provide more examples and explanation. Particularly important
here are pages 116-118 which define the syntax of the language. Sections 13 and 14 and
Appendix F pertain only to the 6000-3.4 implementation of Pascal.
Pascal Report
By Niklaus Wirth, this document is bound with the User Manual. It is the guiding reference for implementors and the fundamental definition of the language. Some programmers
find this report too concise to be of practical use, preferring the User Manual as a reference.


Books on Pascal
Several good books which teach Pascal or use it as a medium are available. The books by
Wirth, Systematic Programming and Algorithms + Data Structures = Programs, use Pascal as a vehicle for teaching programming and data structure concepts respectively. They are both recommended. Other books on Pascal are listed in the references below.
1.5. Pascal documents - The Berkeley Implementation
This section describes the documentation which is available describing the Berkeley implementation of Pascal.
User's Manual
The document you are reading is the User's Manual for Berkeley Pascal. We often refer
the reader to the Jensen-Wirth User Manual mentioned above, a different document with a
similar name.
Manual sections
The sections relating to Pascal in the UNIX Programmer's Manual are pix (1), pi (1), pc
(1), px (1), pxp (1), and pxref (1). These sections give a description of each program, summarize the available options, indicate files used by the program, give basic information on the diagnostics produced and include a list of known bugs.
Implementation notes
For those interested in the internal organization of the Berkeley Pascal system there are a
series of Implementation Notes describing these details. The Berkeley Pascal PX Implementation
Notes describe the Pascal interpreter px; and the Berkeley Pascal PXP Implementation Notes
describe the structure of the execution profiler pxp.
1.6. References
UNIX Documents

Communicating With UNIX
Computer Center
University of California, Berkeley
January, 1978.

Edit: a tutorial
Ricki Blau and James Joyce
Computing Services Division, Computing Affairs
University of California, Berkeley
January, 1978.

Ex/edit Command Summary
Computer Center
University of California, Berkeley
August, 1978.

Ex Reference Manual - Version 3.5
An Introduction to Display Editing with Vi
Vi Quick Reference
William Joy
Computer Science Division
Department of Electrical Engineering and Computer Science
University of California, Berkeley
October, 1980.

An Introduction to the C shell (Revised)
William Joy
Computer Science Division
Department of Electrical Engineering and Computer Science
University of California, Berkeley
October, 1980.

Brian W. Kernighan
UNIX for Beginners - Second Edition
Bell Laboratories
Murray Hill, New Jersey.

Brian W. Kernighan
A Tutorial Introduction to the UNIX Text Editor
Bell Laboratories
Murray Hill, New Jersey.

Dennis M. Ritchie and Ken Thompson
The UNIX Time Sharing System
Communications of the ACM
July 1974
365-378.

B. W. Kernighan and M. D. McIlroy
UNIX Programmer's Manual - Seventh Edition
Bell Laboratories
Murray Hill, New Jersey
December, 1978.
(Virtual VAX/11 Version,
U. C. Berkeley
Berkeley, Ca.
November, 1980.)

Pascal Language Documents

Conway, Gries and Zimmerman
A Primer on PASCAL
Winthrop, Cambridge Mass.
1976, 433 pp.

Kathleen Jensen and Niklaus Wirth
Pascal - User Manual and Report
Springer-Verlag, New York.
1975, 167 pp.

C. A. G. Webster
Introduction to Pascal
Heyden and Son, New York
1976, 129 pp.

Niklaus Wirth
Algorithms + Data Structures = Programs
Prentice-Hall, New York.
1976, 366 pp.

Niklaus Wirth
Systematic Programming
Prentice-Hall, New York.
1973, 169 pp.

Berkeley Pascal documents
The following documents are available from the Computer Center Library at the University of California, Berkeley.

William N. Joy, Susan L. Graham, and Charles B. Haley
Berkeley Pascal User's Manual - Version 2.0
October 1980.

William N. Joy
Berkeley Pascal PX Implementation Notes
Version 1.1, April 1979.
(VAX-11 Version 2.0 by Kirk McKusick, December, 1979)

William N. Joy
Berkeley Pascal PXP Implementation Notes
Version 1.1, April 1979.

2. Basic UNIX Pascal
The following sections explain the basics of using Berkeley Pascal. In examples here we
use the text editor ex (1). Users of the text editor ed should have little trouble following these
examples, as ex is similar to ed. We use ex because it allows us to make clearer examples.† The
new UNIX user will find it helpful to read one of the text editor documents described in section
1.3 before continuing with this section.

2.1. A first program
To prepare a program for Berkeley Pascal we first need to have an account on UNIX and to
'login' to the system on this account. These procedures are described in the documents Communicating with UNIX and UNIX for Beginners.
Once we are logged in we need to choose a name for our program; let us call it 'first' as
this is the first example. We must also choose a name for the file in which the program will be
stored. The Berkeley Pascal system requires that programs reside in files which have names
ending with the sequence '.p' so we will call our file 'first.p'.
A sample editing session to create this file would begin:
% ex first.p
"first.p" [New file]
:
We didn't expect the file to exist, so the error diagnostic doesn't bother us. The editor now
knows the name of the file we are creating. The ':' prompt indicates that it is ready for command input. We can add the text for our program using the 'append' command as follows.
:append
program first(output)
begin
    writeln('Hello, world!')
end.
.
:
The line containing the single '.' character here indicated the end of the appended text. The ':'
prompt indicates that ex is ready for another command. As the editor operates in a temporary
work space we must now store the contents of this work space in the file 'first.p' so we can use
the Pascal translator and executor pix on it.
:write
"first.p" [New file] 4 lines, 59 characters
:quit
%

We wrote out the file from the edit buffer here with the 'write' command, and ex indicated the
number of lines and characters written. We then quit the editor, and now have a prompt from
the shell.‡

† Users with CRT terminals should find the editor vi more pleasant to use; we do not show its use here
because its display oriented nature makes it difficult to illustrate.
‡ Our examples here assume you are using csh.

We are ready to try to translate and execute our program.
% pix first.p
Tue Oct 14 21:37 1980  first.p:
     2  begin
e --------↑--- Inserted ';'
Execution begins...
Hello, world!
Execution terminated.

1 statements executed in 0.02 seconds cpu time.
%
The translator first printed a syntax error diagnostic. The number 2 here indicates that
the rest of the line is an image of the second line of our program. The translator is saying that
it expected to find a ';' before the keyword begin on this line. If we look at the Pascal syntax
charts in the Jensen-Wirth User Manual, or at some of the sample programs therein, we will see
that we have omitted the terminating ';' of the program statement on the first line of our program.
One other thing to notice about the error diagnostic is the letter 'e' at the beginning. It
stands for 'error', indicating that our input was not legal Pascal. The fact that it is an 'e' rather
than an 'E' indicates that the translator managed to recover from this error well enough that
generation of code and execution could take place. Execution is possible whenever no fatal 'E'
errors occur during translation. The other classes of diagnostics are 'w' warnings, which do not
necessarily indicate errors in the program, but point out inconsistencies which are likely to be
due to program bugs, and 's' standard-Pascal violations.†
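The rule that decides whether execution can take place can be stated compactly. The following Python sketch is only an illustration of that rule. The function name and the representation of diagnostics as a list of flag letters are assumptions, not part of the Pascal system.

```python
def can_execute(flags):
    """Return True when no fatal 'E' diagnostic was issued.

    flags is a list of the letters that begin each diagnostic line:
    'e' (recovered error), 'E' (fatal error), 'w' (warning),
    's' (standard-Pascal violation).
    """
    return "E" not in flags


first_run = can_execute(["e"])            # the 'first.p' run above
fatal_run = can_execute(["w", "e", "E"])  # a run with a fatal error
```

Only the capital 'E' suppresses execution; the other three classes leave the translated program runnable.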
After completing the translation of the program to interpretive code, the Pascal system
indicates that execution of the translated program began. The output from the execution of the
program then appeared. At program termination, the Pascal runtime system indicated the
number of statements executed, and the amount of cpu time used, with the resolution of the
latter being 1/60'th of a second.
Let us now fix the error in the program and translate it to a permanent object code file obj
using pi. The program pi translates Pascal programs but stores the object code instead of executing it*.

% ex first.p
"first.p" 4 lines, 59 characters
:1 print
program first(output)
:s/$/;
program first(output);
:write
"first.p" 4 lines, 60 characters
:quit
% pi first.p
%
†The standard Pascal warnings occur only when the associated s translator option is enabled. The s option is
discussed in sections 5.1 and A.6 below. Warning diagnostics are discussed at the end of section 3.2; the associated w option is described in section 5.2.
*This script indicates some other useful approaches to debugging Pascal programs. As in ed we can shorten
commands in ex to an initial prefix of the command name as we did with the substitute command here. We
have also used the '!' shell escape command here to execute other commands with a shell without leaving
the editor.

If we now use the UNIX ls list files command we can see what files we have:
% ls

first.p
obj
%

The file 'obj' here contains the Pascal interpreter code. We can execute this by typing:
% px obj
Hello, world!

1 statements executed in 0.02 seconds cpu time.
%
Alternatively, the command:
% obj

will have the same effect. Some examples of different ways to execute the program follow.

% px
Hello, world!

1 statements executed in 0.02 seconds cpu time.
% pi -p first.p
% px obj
Hello, world!
% pix -p first.p
Hello, world!
%

Note that px will assume that 'obj' is the file we wish to execute if we don't tell it otherwise. The last two translations use the -p no-post-mortem option to eliminate execution
statistics and 'Execution begins' and 'Execution terminated' messages. See section 5.2 for
more details. If we now look at the files in our directory we will see:
% ls

first.p
obj
%
We can give our object program a name other than 'obj' by using the move command mv (1).
Thus to name our program 'hello':
% mv obj hello

% hello
Hello, world!
% ls

first.p
hello
%
Finally we can get rid of the Pascal object code by using the rm (1) remove file command, e.g.:
% rm hello
% ls

first.p
%


For small programs which are being developed pix tends to be more convenient to use
than pi and px. Except for absence of the obj file after a pix run, a pix command is equivalent to
a pi command followed by a px command. For larger programs, where a number of runs testing
different parts of the program are to be made, pi is useful as this obj file can be executed any
desired number of times.

2.2. A larger program
Suppose that we have used the editor to put a larger program in the file 'bigger.p'. We
can list this program with line numbers by using the program cat -n, i.e.:
% cat -n bigger.p

This program is similar to program 4. 9 on page 30 of the Jensen-Wirth User Manual. A
number of problems have been introduced into this example for pedagogical reasons.
If we attempt to translate and execute the program using pix we get the following
response:

% pix bigger.p
Tue Oct 14 21:37 1980  bigger.p:
     9      h = 34;         (* Character position of x-axis *)
w --------------------------↑--- (* in a (* ... *) comment
    16      for i := 0 to lim begin
e ----------------------------↑--- Inserted keyword do
    18          y := exp(-x9 * sin(i * x);
E -------------------------↑--- Undefined variable
e ----------------------------------------↑--- Inserted ')'
    19          n := Round(s * y) + h;
E ---------------------↑--- Undefined function
E ----------------------------------↑--- Undefined variable
    23      writeln('*')
e ----------↑--- Inserted ';'
    24  end.
E ------↑--- Expected keyword until
E ----------↑--- Unexpected end-of-file - QUIT
Execution suppressed due to compilation errors
%

Since there were fatal 'E' errors in our program, no code was generated and execution was
necessarily suppressed. One thing which would be useful at this point is a listing of the program with the error messages. We can get this by using the command:

% pi -l bigger.p
There is no point in using pix here, since we know there are fatal errors in the program. This
command will produce the output at our terminal. If we are at a terminal which does not produce a hard copy we may wish to print this listing off-line on a line printer. We can do this
with the command:

% pi -l bigger.p | lpr
In the next few sections we will illustrate various aspects of the Berkeley Pascal system by
correcting this program.

2.3. Correcting the first errors
Most of the errors which occurred in this program were syntactic errors, those in the format and structure of the program rather than its content. Syntax errors are flagged by printing
the offending line, and then a line which flags the location at which an error was detected. The
flag line also gives an explanation stating either a possible cause of the error, a simple action
which can be taken to recover from the error so as to be able to continue the analysis, a symbol
which was expected at the point of error, or an indication that the input was 'malformed'. In
the last case, the recovery may skip ahead in the input to a point where analysis of the program
can continue.
In this example, the first error diagnostic indicates that the translator detected a comment
within a comment. While this is not considered an error in 'standard' Pascal, it usually
corresponds to an error in the program which is being translated. In this case, we have accidentally omitted the trailing '*)' of the comment on line 8. We can begin an editor session to
correct this problem by doing:

% ex bigger.p
"bigger.p" 24 lines, 512 characters
:8s/$/ *)
    s = 32;         (* 32 character width for interval [x, x+1] *)

The second diagnostic, given after line 16, indicates that the keyword do was expected
before the keyword begin in the for statement. If we examine the statement syntax chart on
page 118 of the Jensen-Wirth User Manual we will discover that do is a necessary part of the for
statement. Similarly, we could have referred to section C.3 of the Jensen-Wirth User Manual
to learn about the for statement and gotten the same information there. It is often useful to
refer to these syntax charts and to the relevant sections of this book.
We can correct this problem by first scanning for the keyword for in the file and then substituting the keyword do to appear in front of the keyword begin there. Thus:
:/for
    for i := 0 to lim begin
:s/begin/do &
    for i := 0 to lim do begin
The next error in the program is easy to pinpoint. On line 18, we didn't hit the shift key and
got a '9' instead of a ')'. The translator diagnosed that 'x9' was an undefined variable and,
later, that a ')' was missing in the statement. It should be stressed that pi is not suggesting that
you should insert a ')' before the ';'. It is only indicating that making this change will help it to
be able to continue analyzing the program so as to be able to diagnose further errors. You
must then determine the true cause of the error and make the appropriate correction to the
source text.
This error also illustrates the fact that one error in the input may lead to multiple error
diagnostics. Pi attempts to give only one diagnostic for each error, but single errors in the
input sometimes appear to be more than one error. It is also the case that pi may not detect an
error when it occurs, but may detect it later in the input. This would have happened in this
example if we had typed 'x' instead of 'x9'.
The translator next detected, on line 19, that the function Round and the variable h were
undefined. It does not know about Round because Berkeley Pascal normally distinguishes
between upper and lower case.† On UNIX lower-case is preferred,‡ and all keywords and built-in
procedure and function names are composed of lower-case letters, just as they are in the
Jensen-Wirth Pascal Report. Thus we need to use the function round here. As far as h is concerned, we can see why it is undefined if we look back to line 9 and note that its definition was
lost in the non-terminated comment. This diagnostic need not, therefore, concern us.
† In "standard" Pascal no distinction is made based on case.
‡ One good reason for using lower-case is that it is easier to type.
The next error which occurred in the program caused the translator to insert a ';' before
the statement calling writeln on line 23. If we examine the program around the point of error
we will see that the actual error is that the keyword until and an associated expression have
been omitted here. Note that the diagnostic from the translator does not indicate the actual
error, and is somewhat misleading. The translator made the correction which seemed to be
most plausible. As the omission of a ';' character is a common mistake, the translator chose to
indicate this as a possible fix here. It later detected that the keyword until was missing, but not
until it saw the keyword end on line 24. The combination of these diagnostics indicates to us
the true problem.
The final syntactic error message indicates that the translator needed an end keyword to
match the begin at line 15. Since the end at line 24 is supposed to match this begin, we can
infer that another begin must have been mismatched, and have matched this end. Thus we see
that we need an end to match the begin at line 16, and to appear before the final end. We can
make these corrections:
:/x9/s//x)
        y := exp(-x) * sin(i * x);
:+s/Round/round
        n := round(s * y) + h;
:/write
                write(' ');
:/
        writeln('*')
:insert
        until n = 0;
.
:$
end.
:insert
        end
.

At the end of each procedure or function and the end of the program the translator summarizes references to undefined variables and improper usages of variables. It also gives warnings about potential errors. In our program, the summary errors do not indicate any further
problems but the warning that c is unused is somewhat suspicious. Examining the program we
see that the constant was intended to be used in the expression which is an argument to sin, so
we can correct this expression, and translate the program. We have now made a correction for
each diagnosed error in our program.

:?i ?s//c /
        y := exp(-x) * sin(c * x);
:write
"bigger.p" 26 lines, 538 characters
:quit
% pi bigger.p
%

It should be noted that the translator suppresses warning diagnostics for a particular procedure,
function or the main program when it finds severe syntax errors in that part of the source text.
This is to prevent possibly confusing and incorrect warning diagnostics from being produced.
Thus these warning diagnostics may not appear in a program with bad syntax errors until these
errors are corrected.
We are now ready to execute our program for the first time. We will do so in the next
section after giving a listing of the corrected program for reference purposes.

% cat -n bigger.p
     1  (*
     2   * Graphic representation of a function
     3   *    f(x) = exp(-x) * sin(2 * pi * x)
     4   *)
     5  program graph1(output);
     6  const
     7          d = 0.0625;     (* 1/16, 16 lines for interval [x, x+1] *)
     8          s = 32;         (* 32 character width for interval [x, x+1] *)
     9          h = 34;         (* Character position of x-axis *)
    10          c = 6.28138;    (* 2 * pi *)
    11          lim = 32;
    12  var
    13          x, y: real;
    14          i, n: integer;
    15  begin
    16          for i := 0 to lim do begin
    17                  x := d / i;
    18                  y := exp(-x) * sin(c * x);
    19                  n := round(s * y) + h;
    20                  repeat
    21                          write(' ');
    22                          n := n - 1
    23                  until n = 0;
    24                  writeln('*')
    25          end
    26  end.
%
2.4. Executing the second example
We are now ready to execute the second example. The following output was produced by
our first run.

% px
Execution begins ...

Floating point division error
        Error in "graph1"+2 near line 17.
Execution terminated abnormally.
2 statements executed in 0.05 seconds cpu time.
%

Here the interpreter is presenting us with a runtime error diagnostic. It detected a 'division by
zero' at line 17. Examining line 17, we see that we have written the statement 'x := d / i'
instead of 'x := d * i'. We can correct this and rerun the program:

% ex bigger.p
"bigger.p" 26 lines, 538 characters
:17
        x := d / i
:s'/'*
        x := d * i
:write
"bigger.p" 26 lines, 538 characters
:q
% pix bigger.p
Execution begins ...
[Graph output not reproduced: one '*' per line, with the column of each '*' tracing f(x) = exp(-x) * sin(2 * pi * x) down the page.]
Execution terminated.

2550 statements executed in 0.30 seconds cpu time.
%
This appears to be the output we wanted. We could now save the output in a file if we
wished by using the shell to redirect the output:
% px > graph

We can use cat (1) to see the contents of the file graph. We can also make a listing of the
graph on the line printer without putting it into a file, e.g.

% px | lpr
Execution begins ...
Execution terminated.
2550 statements executed in 0.37 seconds cpu time.
%
Note here that the statistics lines came out on our terminal. The statistics line comes out on
the diagnostic output (unit 2). There are two ways to get rid of the statistics line. We can
redirect the statistics message to the printer using the syntax '|&' to the shell rather than '|', i.e.:

% px |& lpr
%
or we can translate the program with the p option disabled on the command line as we did
above. This will disable all post-mortem dumping including the statistics line, thus:

% pi -p bigger.p
% px | lpr
%

This option also disables the statement limit which normally guards against infinite looping.
You should not use it until your program is debugged. Also if p is specified and an error
occurs, you will not get run time diagnostic information to help you determine what the problem is.

2.5. Formatting the program listing
It is possible to use special lines within the source text of a program to format the program listing. An empty line (one with no characters on it) corresponds to a 'space' macro in an
assembler, leaving a completely blank line without a line number. A line containing only a
control-I (form-feed) character will cause a page eject in the listing with the corresponding line
number suppressed. This corresponds to an 'eject' pseudo-instruction. See also section 5.2 for
details on the n and i options of pi.
2.6. Execution profiling
An execution profile consists of a structured listing of (all or part of) a program with
information about the number of times each statement in the program was executed for a particular run of the program. These profiles can be used for several purposes. In a program
which was abnormally terminated due to excessive looping or recursion or by a program fault,
the counts can facilitate location of the error. Zero counts mark portions of the program which
were not executed; during the early debugging stages they should prompt new test data or a reexamination of the program logic. The profile is perhaps most valuable, however, in drawing
attention to the (typically small) portions of the program that dominate execution time. This
information can be used for source level optimization.
An example
A prime number is a number which is divisible only by itself and the number one. The
program primes, written by Niklaus Wirth, determines the first few prime numbers. In translating the program we have specified the z option to pix. This option causes the translator to generate counters and count instructions sufficient in number to determine the number of times
each statement in the program was executed.† When execution of the program completes,
either normally or abnormally, this count data is written to the file pmon.out in the current
directory.‡ It is then possible to prepare an execution profile by giving pxp the name of the file
associated with this data, as was done in the following example.
† The counts are completely accurate only in the absence of runtime errors and nonlocal goto statements.
This is not generally a problem, however, as in structured programs nonlocal goto statements occur infrequently, and counts are incorrect after abnormal termination only when the upward look described below to
get a count passes a suspended call point.
% pix -l -z primes.p
Berkeley Pascal PI -- Version 2.0 (Sat Oct 18 21:01:54 1980)

Tue Oct 14 21:38 1980  primes.p

     1  program primes(output);
     2  const n = 50; n1 = 7; (*n1 = sqrt(n)*)
     3  var i,k,x,inc,lim,square,l: integer;
     4      prim: boolean;
     5      p,v: array[1..n1] of integer;
     6  begin
     7     write(2:6, 3:6); l := 2;
     8     x := 1; inc := 4; lim := 1; square := 9;
     9     for i := 3 to n do
    10     begin (*find next prime*)
    11        repeat x := x + inc; inc := 6-inc;
    12           if square <= x then
    13              begin lim := lim+1;
    14                 v[lim] := square; square := sqr(p[lim+1])
    15              end;
    16           k := 2; prim := true;
    17           while prim and (k<lim) do
    18           begin k := k+1;
    19              if v[k] < x then v[k] := v[k] + 2*p[k];
    20              prim := x <> v[k]
    21           end
    22        until prim;
    23        if i <= n1 then p[i] := x;
    24        write(x:6); l := l+1;
    25        if l = 10 then
    26           begin writeln; l := 0
    27           end
    28     end;
    29     writeln;
    30  end.
Execution begins ...
     2     3     5     7    11    13    17    19    23    29
    31    37    41    43    47    53    59    61    67    71
    73    79    83    89    97   101   103   107   109   113
   127   131   137   139   149   151   157   163   167   173
   179   181   191   193   197   199   211   223   227   229

Execution terminated.
1404 statements executed in 0.17 seconds cpu time.
%
‡ Pmon.out has a name similar to mon.out, the monitor file produced by the profiling facility of the C compiler
cc (1). See prof (1) for a discussion of the C compiler profiling facilities.


Discussion
The header lines of the outputs of pix and pxp in this example indicate the version of the
translator and execution profiler in use at the time this example was prepared. The time given
with the file name (also on the header line) indicates the time of last modification of the program source file. This time serves to version stamp the input program. Pxp also indicates the
time at which the profile data was gathered.
% pxp -z primes.p

Berkeley Pascal PXP -- Version 1.1 (May 7, 1979)
Tue Oct 14 21:38 1980  primes.p
Profiled Tue Oct 21 18:48 1980

     1         1.---|program primes(output);
     2              |const
     2              |    n = 50;
     2              |    n1 = 7; (*n1 = sqrt(n)*)
     3              |var
     3              |    i, k, x, inc, lim, square, l: integer;
     4              |    prim: boolean;
     5              |    p, v: array [1..n1] of integer;
     6              |begin
     7              |    write(2: 6, 3: 6);
     7              |    l := 2;
     8              |    x := 1;
     8              |    inc := 4;
     8              |    lim := 1;
     8              |    square := 9;
     9              |    for i := 3 to n do begin (*find next prime*)
     9        48.---|        repeat
    11        76.---|            x := x + inc;
    11              |            inc := 6 - inc;
    12              |            if square <= x then begin
    13         5.---|                lim := lim + 1;
    14              |                v[lim] := square;
    14              |                square := sqr(p[lim + 1])
    14              |            end;
    16              |            k := 2;
    16              |            prim := true;
    17              |            while prim and (k < lim) do begin
    18       157.---|                k := k + 1;
    19              |                if v[k] < x then
    19        42.---|                    v[k] := v[k] + 2 * p[k];
    20              |                prim := x <> v[k]
    20              |            end
    20              |        until prim;
    23              |        if i <= n1 then
    23         5.---|            p[i] := x;
    24              |        write(x: 6);
    24              |        l := l + 1;
    25              |        if l = 10 then begin
    26         5.---|            writeln;
    26              |            l := 0
    26              |        end
    26              |    end;
    29              |    writeln
    29              |end.

To determine the number of times a statement was executed, one looks to the left of the
statement and finds the corresponding vertical bar '|'. If this vertical bar is labelled with a count
then that count gives the number of times the statement was executed. If the bar is not
labelled, we look up in the listing to find the first '|' directly above the original one which
has a count, and that is the answer. Thus, in our example, k was incremented 157 times on line
18, while the write procedure call on line 24 was executed 48 times as given by the count on the
repeat.
More information on pxp can be found in its manual section pxp (1) and in sections 5.4,
5.5 and 5.10.


3. Error diagnostics
This section of the User's Manual discusses the error diagnostics of the programs pi, pc
and px. Pix is a simple but useful program which invokes pi and px to do all the real processing.
See its manual section pix (1) and section 5.2 below for more details. All the diagnostics given
by pi will also be given by pc.
3.1. Translator syntax errors
A few comments on the general nature of the syntax errors usually made by Pascal programmers and the recovery mechanisms of the current translator may help in using the system.
Illegal characters
Characters such as '$', '!', and '@' are not part of the language Pascal. If they are found
in the source program, and are not part of a constant string, a constant character, or a comment, they are considered to be 'illegal characters'. This can happen if you leave off an opening string quote '''. Note that the character '"', although used in English to quote strings, is
not used to quote strings in Pascal. Most non-printing characters in your input are also illegal
except in character constants and character strings. Except for the tab and form feed characters, which are used to ease formatting of the program, non-printing characters in the input file
print as the character '?' so that they will show in your listing.
String errors
There is no character string of length 0 in Pascal. Consequently the input '''' is not
acceptable. Similarly, encountering an end-of-line after an opening string quote ''' without
encountering the matching closing quote yields the diagnostic "Unmatched ' for string". It is
permissible to use the character '#' instead of ''' to delimit character and constant strings for
portability reasons. For this reason, a spuriously placed '#' sometimes causes the diagnostic
about unbalanced quotes. Similarly, a '#' in column one is used when preparing programs
which are to be kept in multiple files. See section 5.11 for details.
Comments in a comment, non-terminated comments
As we saw above, these errors are usually caused by leaving off a comment delimiter.
You can convert parts of your program to comments without generating this diagnostic since
there are two different kinds of comments - those delimited by '{' and '}', and those delimited
by '(*' and '*)'. Thus consider:
{ This is a comment enclosing a piece of program
a := functioncall;      (* comment within comment *)
procedurecall;
lhs := rhs;             (* another comment *)
}
By using one kind of comment exclusively in your program you can use the other delimiters when you need to "comment out" parts of your program.† In this way you will also allow
the translator to help by detecting statements accidentally placed within comments.
If a comment does not terminate before the end of the input file, the translator will point
to the beginning of the comment, indicating that the comment is not terminated. In this case
processing will terminate immediately. See the discussion of "QUIT" below.
† If you wish to transport your program, especially to the 6000-3.4 implementation, you should use the character sequence '(*' to delimit comments. For transportation over the rcslink to Pascal 6000-3.4, the character
'#' should be used to delimit characters and constant strings.


Digits in numbers
This part of the language is a minor nuisance. Pascal requires digits in real numbers both
before and after the decimal point. Thus the following statements, which look quite reasonable
to FORTRAN users, generate diagnostics in Pascal:

Tue Oct 14 21:37 1980  digits.p:
     4  r := 0.;
e -------------^--- Digits required after decimal point
     5  r := .0;
e -----------^--- Digits required before decimal point
     6  r := 1.e10;
e -------------^--- Digits required after decimal point
     7  r := .05e-10;
e -----------^--- Digits required before decimal point

These same constructs are also illegal as input to the Pascal interpreter px.
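As a minimal sketch, the four assignments can be rewritten so that every real constant has at least one digit on each side of the decimal point. The program heading, the variable declaration and the writeln calls are our own illustrative additions and are not part of digits.p.

```pascal
program digits(output);
var
    r: real;
begin
    r := 0.0;       (* instead of 0.      : digit added after the point  *)
    writeln(r);
    r := 0.0;       (* instead of .0      : digit added before the point *)
    writeln(r);
    r := 1.0e10;    (* instead of 1.e10   : digit added after the point  *)
    writeln(r);
    r := 0.05e-10;  (* instead of .05e-10 : digit added before the point *)
    writeln(r)
end.
```

The same rule applies to numbers typed as input to a running program: a value such as '.5' should be entered as '0.5'.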
Replacements, insertions, and deletions
When a syntax error is encountered in the input text, the parser invokes an error recovery
procedure. This procedure examines the input text immediately after the point of error and
considers a set of simple corrections to see whether they will allow the analysis to continue.
These corrections involve replacing an input token with a different token, inserting a token, or
deleting an input token. Most of these changes will not cause fatal syntax errors. The exception is the insertion of or replacement with a symbol such as an identifier
or a number; in this case the recovery makes no attempt to determine which identifier or what
number should be inserted, hence these are considered fatal syntax errors.
Consider the following example.

% pix -l synerr.p
Berkeley Pascal PI -- Version 2.0 (Sat Oct 18 21:01:54 1980)

Tue Oct 21 23:51 1980  synerr.p

     1  program syn(output);
     2  var i, j are integer;
e ---------------^--- Replaced identifier with a ':'
     3  begin
     4          for j :* 1 to 20 begin
e ---------------------^--- Replaced '*' with a '='
e -------------------------------^--- Inserted keyword do
     5                  write(j);
     6                  i = 2 ** j;
e ------------------------^--- Inserted ':'
E -----------------------------^--- Inserted identifier
     7                  writeln(i))
E --------------------------------^--- Deleted ')'
     8          end
     9  end.
%

The only surprise here may be that Pascal does not have an exponentiation operator, hence the
complaint about '**'. This error illustrates that, if you assume that the language has a feature
which it does not, the translator diagnostic may not indicate this, as the translator is unlikely to
recognize the construct you supply.
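One possible corrected version of synerr.p is sketched here. Since Pascal has no exponentiation operator, we assume the intent was to print successive powers of two and compute them by repeated multiplication; that reading of the program, and the initialization of i, are our assumptions. The sketch also assumes that integer can hold 1048576 (2 to the power 20).

```pascal
program syn(output);
var
    i, j: integer;
begin
    i := 1;
    for j := 1 to 20 do begin
        write(j);
        i := i * 2;     (* i now holds 2 to the power j *)
        writeln(i)
    end
end.
```

Each diagnosed point is addressed: ':' replaces 'are' in the declaration, ':=' is used for assignment, do follows the for limits, and the extra ')' is gone.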

Undefined or improper identifiers
If an identifier is encountered in the input but is undefined, the error recovery will replace
it with an identifier of the appropriate class. Further references to this identifier will be summarized at the end of the containing procedure or function or at the end of the program if the
reference occurred in the main program. Similarly, if an identifier is used in an inappropriate
way, e.g. if a type identifier is used in an assignment statement, or if a simple variable is used
where a record variable is required, a diagnostic will be produced and an identifier of the
appropriate type inserted. Further incorrect references to this identifier will be flagged only if
they involve incorrect use in a different way, with all incorrect uses being summarized in the
same way as undefined variable uses are.
Expected symbols, malformed constructs
If none of the above mentioned corrections appear reasonable, the error recovery will
examine the input to the left of the point of error to see if there is only one symbol which can
follow this input. If this is the case, the recovery will print a diagnostic which indicates that the
given symbol was 'Expected'.
In cases where none of these corrections resolve the problems in the input, the recovery
may issue a diagnostic that indicates that the input is "malformed". If necessary, the translator
may then skip forward in the input to a place where analysis can continue. This process may
cause some errors in the text to be missed.
Consider the following example:
% pix -l synerr2.p

Berkeley Pascal PI -- Version 2.0 (Sat Oct 18 21:01:54 1980)
Tue Oct 14 21:38 1980  synerr2.p
     1  program synerr2(input,outpu);
     2  integer a(10)
E ------^--- Malformed declaration
     3  begin
     4          read(b);
E -------------------^--- Undefined variable
     5          for c := 1 to 10 do
E ------------------^--- Undefined variable
     6                  a(c) := b * c;
E ----------------------^--- Undefined procedure
E ---------------------------^--- Malformed statement
     7  end.
E 1 - File outpu listed in program statement but not declared
e 1 - The file output must appear in the program statement file list
In program synerr2:
  E - a undefined on line 6
  E - b undefined on line 4
  E - c undefined on lines 5 6
Execution suppressed due to compilation errors
%
Here we misspelled output and gave a FORTRAN style variable declaration which the translator
diagnosed as a 'Malformed declaration'. When, on line 6, we used '(' and ')' for subscripting
(as in FORTRAN) rather than the '[' and ']' which are used in Pascal, the translator noted that a
was not defined as a procedure. This occurred because procedure and function argument lists
are delimited by parentheses in Pascal. As it is not permissible to assign to procedure calls the
translator diagnosed a malformed statement at the point of assignment.
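A corrected version of synerr2.p might look like the following sketch. The types chosen for b and c, and the writeln added so that the stored values are actually used, are our assumptions; the original text does not say what the program was meant to do with the array.

```pascal
program synerr2(input, output);
var
    a: array [1..10] of integer;    (* Pascal array declaration, not 'integer a(10)' *)
    b, c: integer;
begin
    read(b);
    for c := 1 to 10 do begin
        a[c] := b * c;              (* '[' and ']' for subscripting *)
        writeln(a[c])
    end
end.
```

With output spelled correctly in the program statement and every identifier declared in a var part, none of the diagnostics shown above should recur.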


Expected and unexpected end-of-file, "QUIT"
If the translator finds a complete program, but there is more non-comment text in the
input file, then it will indicate that an end-of-file was expected. This situation may occur after a
bracketing error, or if too many ends are present in the input. The message may appear after
the recovery says that it "Expected '.'" since '.' is the symbol that terminates a program.
If severe errors in the input prohibit further processing the translator may produce a diagnostic followed by "QUIT". One example of this was given above - a non-terminated comment; another example is a line which is longer than 160 characters. Consider also the following example.

% pix -l mism.p
Berkeley Pascal PI -- Version 2.0 (Sat Oct 18 21:01:54 1980)
Tue Oct 14 21:38 1980  mism.p
     1  program mismatch(output)
     2  begin
e ------^--- Inserted ';'
     3          writeln('***');
     4          { The next line is the last line in the file }
     5          writeln
E ---------------------^--- Unexpected end-of-file - QUIT
%
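For comparison, a repaired mism.p could read as in the sketch below, assuming the program was only meant to print a line of asterisks followed by an empty line. The ';' after the program heading and the closing 'end.' are the two missing pieces; the reworded comment is ours.

```pascal
program mismatch(output);
begin
    writeln('***');
    { The next line is no longer the last line in the file }
    writeln
end.
```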

3.2. Translator semantic errors
The extremely large number of semantic diagnostic messages which the translator produces makes it unreasonable to discuss each message or group of messages in detail. The messages are, however, very informative. We will here explain the typical formats and the terminology used in the error messages so that you will be able to make sense out of them. In any
case in which a diagnostic is not completely comprehensible you can refer to the User Manual
by Jensen and Wirth for examples.

Format of the error diagnostics
As we saw in the example program above, the error diagnostics from the Pascal translator
include the number of a line in the text of the program as well as the text of the error message.
While this number is most often the line where the error occurred, it is occasionally the
number of a line containing a bracketing keyword like end or until. In this case, the diagnostic
may refer to the previous statement. This occurs because of the method the translator uses for
sampling line numbers. The absence of a trailing ';' in the previous statement causes the line
number corresponding to the end or until to become associated with the statement. As Pascal
is a free-format language, the line number associations can only be approximate and may seem
arbitrary to some users. This is the only notable exception, however, to reasonable associations.

Incompatible types
Since Pascal is a strongly typed language, many semantic errors manifest themselves as
type errors. These are called 'type clashes' by the translator. The types allowed for various
operators in the language are summarized on page 108 of the Jensen-Wirth User Manual. It is
important to know that the Pascal translator, in its diagnostics, distinguishes between the following type 'classes':

array     Boolean   char      file      integer
pointer   real      record    scalar    string

These words are plugged into a great number of error messages. Thus, if you tried to assign an
integer value to a char variable you would receive a diagnostic like the following:
Tue Oct 14 21:37 1980  clash.p:
E 7 - Type clash: integer is incompatible with char
... Type of expression clashed with type of variable in assignment
In this case, one error produced a two line error message. If the same error occurs more than
once, the same explanatory diagnostic will be given each time.
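As a sketch of how such a clash is usually avoided, the standard functions chr and ord convert explicitly between integer and char. The program below is our own illustration, not the clash.p of the example; the character it prints depends on the character set of the machine.

```pascal
program convert(output);
var
    c: char;
    i: integer;
begin
    i := 65;
    (* c := i  would draw: Type clash: integer is incompatible with char *)
    c := chr(i);        (* explicit conversion from integer to char *)
    i := ord(c) + 1;    (* and from char back to integer *)
    writeln(c, ' ', i)
end.
```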
Scalar
The only class whose meaning is not self-explanatory is 'scalar'. Scalar has a precise
meaning in the Jensen-Wirth User Manual where, in fact, it refers to char, integer, real, and
Boolean types as well as the enumerated types. For the purposes of the Pascal translator, scalar
in an error message refers to a user-defined, enumerated type, such as ops in the example
above or color in
type color = (red, green, blue)

For integers, the more explicit denotation integer is used. Although it would be correct, in the
context of the User Manual, to refer to an integer variable as a scalar variable, pi prefers the
more specific identification.
Function and procedure type errors
For built-in procedures and functions, two kinds of errors occur. If the routines are called
with the wrong number of arguments a message similar to:
Tue Oct 14 21:38 1980  sin1.p:
E 12 - sin takes exactly one argument
is given. If the type of the argument is wrong, a message like
Tue Oct 14 21:38 1980  sin2.p:
E 12 - sin's argument must be integer or real, not char
is produced. A few functions and procedures implemented in Pascal 6000-3.4 are diagnosed as
unimplemented in Berkeley Pascal, notably those related to segmented files.
Can't read and write scalars, etc.
The messages which state that scalar (user-defined) types cannot be written to and read from
files are often mysterious. It is in fact the case that if you define
type color = (red, green, blue)

"standard" Pascal does not associate these constants with the strings 'red', 'green', and 'blue'
in any way. An extension has been added which allows enumerated types to be read and written; however, if the program is to be portable, you will have to write your own routines to perform these functions. Standard Pascal only allows the reading of characters, integers and real
numbers from text files. You cannot read strings or Booleans. It is possible to make a
file of color
but the representation is binary rather than string.
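A portable output routine of the kind mentioned above can be sketched with a case statement. The procedure name writecolor and the surrounding test program are our own; only the type color comes from the text.

```pascal
program colors(output);
type
    color = (red, green, blue);
var
    c: color;

(* Write the name of an enumerated value as text. *)
procedure writecolor(c: color);
begin
    case c of
        red:   write('red');
        green: write('green');
        blue:  write('blue')
    end
end;

begin
    for c := red to blue do begin
        writecolor(c);
        writeln
    end
end.
```

This prints the three names, one per line, and uses only standard Pascal. A matching input routine would have to read characters and compare them against the same names.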


Expression diagnostics
The diagnostics for semantically ill-formed expressions are very explicit. Consider this
sample translation:
% pi -l expr.p

Berkeley Pascal PI -- Version 2.0 (Sat Oct 18 21:01:54 1980)

Tue Oct 14 21:37 1980  expr.p
     1  program x(output);
     2  var
     3          a: set of char;
     4          b: Boolean;
     5          c: (red, green, blue);
     6          p: ^ integer;
     7          A: alfa;
     8          B: packed array [1..5] of char;
     9  begin
    10          b := true;
    11          c := red;
    12          new(p);
    13          a := [];
    14          A := 'Hello, yellow';
    15          b := a and b;
    16          a := a * 3;
    17          if input < 2 then writeln('boo');
    18          if p <= 2 then writeln('sure nuff');
    19          if A = B then writeln('same');
    20          if c = true then writeln('hue''s and color''s')
    21  end.
E 14 - Constant string too long
E 15 - Left operand of and must be Boolean, not set
E 16 - Cannot mix sets with integers and reals as operands of *
E 17 - files may not participate in comparisons
E 18 - pointers and integers cannot be compared - operator was <=
E 19 - Strings not same length in = comparison
E 20 - scalars and Booleans cannot be compared - operator was =
In program x:
  w - constant green is never used
  w - constant blue is never used
  w - variable B is used but never set
%
This example is admittedly far-fetched, but illustrates that the error messages are sufficiently
clear to allow easy determination of the problem in the expressions.

Type equivalence
Several diagnostics produced by the Pascal translator complain about 'non-equivalent
types'. In general, Berkeley Pascal considers variables to have the same type only if they were
declared with the same constructed type or with the same type identifier. Thus, the variables x
and y declared as

var
    x: ^ integer;
    y: ^ integer;

do not have the same type. The assignment

x := y
thus produces the diagnostics:
Tue Oct 14 21:38 1980  typequ.p:
E 7 - Type clash: non-identical pointer types
... Type of expression clashed with type of variable in assignment
Thus it is always necessary to declare a type such as
type intptr = ^ integer;
and use it to declare
var x: intptr; y: intptr;
Note that if we had initially declared
var x, y: ^ integer;
then the assignment statement would have worked. The statement

x^ := y^
is allowed in either case. Since the parameter to a procedure or function must be declared with
a type identifier rather than a constructed type, it is always necessary, in practice, to declare any
type which will be used in this way.
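The declarations recommended above can be put together as in the following sketch. The procedure show and the values used are our own illustrative choices; intptr, x and y follow the text.

```pascal
program typequ(output);
type
    intptr = ^integer;
var
    x, y: intptr;

(* The parameter is declared with the type identifier intptr. *)
procedure show(p: intptr);
begin
    writeln(p^)
end;

begin
    new(x);
    new(y);
    y^ := 7;
    x^ := y^;       (* copies the integer; allowed however x and y are declared *)
    show(x);
    dispose(x);
    x := y;         (* pointer assignment; x and y must have the same type *)
    show(x)
end.
```

Both calls to show print the value 7. Because x and y share the type identifier intptr, the assignment x := y is accepted without a type clash.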
Unreachable statements
Berkeley Pascal flags unreachable statements. Such statements usually correspond to
errors in the program logic. Note that a statement is considered to be reachable if there is a
potential path of control, even if it can never be taken. Thus, no diagnostic is produced for the
statement:
if false then
    writeln('impossible!')
Goto's into structured statements
The translator detects and complains about goto statements which transfer control into
structured statements (for, while, etc.). It does not allow such jumps, nor does it allow branching from the then part of an if statement into the else part. Such checks are made only within
the body of a single procedure or function.
Unused variables, never set variables
Although pi always clears variables to 0 at procedure and function entry, pc does not
unless runtime checking is enabled using the C option. It is not good programming practice to
rely on this initialization. To discourage this practice, and to help detect errors in program
logic, pi flags as a 'w' warning error:
1) Use of a variable which is never assigned a value.
2) A variable which is declared but never used, distinguishing between those variables
for which values are computed but which are never used, and those completely
unused.
In fact, these diagnostics are applied to all declared items. Thus a const or a procedure which is
declared but never used is flagged. The w option of pi may be used to suppress these warnings;
see sections 5.1 and 5.2.
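A small sketch of the recommended practice follows: every variable is assigned before it is used, so the program does not depend on variables being cleared to 0. The program itself is our own illustration.

```pascal
program total(output);
var
    i, sum: integer;
begin
    sum := 0;       (* explicit; do not rely on variables being cleared to 0 *)
    for i := 1 to 10 do
        sum := sum + i;
    writeln(sum)
end.
```

The program prints 55. Had the assignment sum := 0 been left out, the result would depend on whether the variable happened to be cleared at entry, which the text says pc does not do unless runtime checking is enabled.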

3.3. Translator panics, i/o errors
Panics
One class of error which rarely occurs, but which causes termination of all processing
when it does, is a panic. A panic indicates a translator-detected internal inconsistency. A typical
panic message is:
snark (rvalue) line=110 yyline=109
Snark in pi
If you receive such a message, the translation will be quickly and perhaps ungracefully terminated. You should contact a teaching assistant or a member of the system staff, after saving
a copy of your program for later inspection. If you were making changes to an existing program
when the problem occurred, you may be able to work around the problem by ascertaining
which change caused the snark and making a different change or correcting an error in the program. A small number of panics are possible in px. All panics should be reported to a teaching
assistant or systems staff so that they can be fixed.
Out of memory
The only other error which will abort translation when no errors are detected is running
out of memory. All tables in the translator, with the exception of the parse stack, are dynamically allocated, and can grow to take up the full available process space of 64000 bytes on the
PDP-11. On the VAX-11, table sizes are extremely generous and very large (25000 line) programs have been easily accommodated. For the PDP-11, it is generally true that the size of the
largest translatable program is directly related to procedure and function size. A number of
non-trivial Pascal programs, including some with more than 2000 lines and 2500 statements,
have been translated and interpreted using Berkeley Pascal on PDP-11's. Notable among these
are the Pascal-S interpreter, a large set of programs for automated generation of code generators, and a general context-free parsing program which has been used to parse sentences with a
grammar for a superset of English. In general, very large programs should be translated using
pc and the separate compilation facility.
If you receive an out of space message from the translator during translation of a large
procedure or function, or one containing a large number of string constants, you may yet be able
to translate your program if you break this one procedure or function into several routines.

I/O errors
Other errors which you may encounter when running pi relate to input-output. If pi cannot open the file you specify, or if the file is empty, you will be so informed.
3.4. Run-time errors
We saw, in our second example, a run-time error. We here give the general description
of run-time errors. The more unusual interpreter error messages are explained briefly in the
manual section for px (1).
Start-up errors
These errors occur when the object file to be executed is not available or appropriate.
Typical errors here are caused by the specified object file not existing, not being a Pascal object,
or being inaccessible to the user.
Program execution errors
These errors occur when the program interacts with the Pascal runtime environment in an
inappropriate way. Typical errors are values or subscripts out of range, bad arguments to built-in functions, exceeding the statement limit because of an infinite loop, or running out of
memory.‡ The interpreter will produce a backtrace after the error occurs, showing all the active
routine calls, unless the p option was disabled when the program was translated. Unfortunately,
no variable values are given and no way of extracting them is available.*
As an example of such an error, assume that we have accidentally declared the constant
n1 to be 6, instead of 7, on line 2 of the program primes as given in section 2.6 above. If we
run this program we get the following response.
% pix primes.p
Execution begins...
     2     3     5     7    11    13    17    19    23    29
    31    37    41    43    47    53    59    61    67    71
    73    79    83    89    97   101   103   107   109   113
   127   131   137   139   149   151   157   163   167

Subscript out of range
Error in "primes"+8 near line 14.
Execution terminated abnormally.
941 statements executed in 0.50 seconds cpu time.
%
Here the interpreter indicates that the program terminated abnormally due to a subscript
out of range near line 14, which is eight lines into the body of the program primes.
Interrupts
If the program is interrupted while executing and the p option was not specified, then a
backtrace will be printed.† The file pmon.out of profile information will be written if the program was translated with the z option enabled to pi or pix.

I/O interaction errors
The final class of interpreter errors results from inappropriate interactions with files,
including the user's terminal. Included here are bad formats for integer and real numbers
(such as no digits after the decimal point) when reading.

‡The checks for running out of memory are not foolproof and there is a chance that the interpreter will fault,
producing a core image when it runs out of memory. This situation occurs very rarely.
*On the VAX-11, each variable is restricted to allocate at most 65000 bytes of storage (this is a PDP-11ism that
has survived to the VAX.)
†Occasionally, the Pascal system will be in an inconsistent state when this occurs, e.g. when an interrupt terminates a procedure or function entry or exit. In this case, the backtrace will only contain the current line.
A reverse call order list of procedures will not be given.

4. Input/output
This section describes features of the Pascal input/output environment, with special consideration of the features peculiar to an interactive implementation.
4.1. Introduction
Our first sample programs, in section 2, used the file output. We gave examples there of
redirecting the output to a file and to the line printer using the shell. Similarly, we can read the
input from a file or another program. Consider the following Pascal program which is similar to
the program cat (1).

% pix -l kat.p <primes
Berkeley Pascal PI -- Version 2.0 (Sat Oct 18 21:01:54 1980)

Tue Oct 14 21:38 1980  kat.p

     1  program kat(input, output);
     2  var
     3      ch: char;
     4  begin
     5      while not eof do begin
     6          while not eoln do begin
     7              read(ch);
     8              write(ch)
     9          end;
    10          readln;
    11          writeln
    12      end
    13  end { kat }.
Execution begins...
     2     3     5     7    11    13    17    19    23    29
    31    37    41    43    47    53    59    61    67    71
    73    79    83    89    97   101   103   107   109   113
   127   131   137   139   149   151   157   163   167   173
   179   181   191   193   197   199   211   223   227   229

Execution terminated.

925 statements executed in 0.15 seconds cpu time.
%

Here we have used the shell's syntax to redirect the program input from a file primes in
which we had placed the output of our prime number program of section 2.6. It is also possible
to 'pipe' input to this program much as we piped input to the line printer daemon lpr (1)
before. Thus, the same output as above would be produced by
% cat primes | pix -l kat.p

All of these examples use the shell to control the input and output from files. One very
simple way to associate Pascal files with named UNIX† files is to place the file name in the program statement. For example, suppose we have previously created the file data. We then use
it as input to another version of a listing program.
†UNIX is a Trademark of Bell Laboratories.

% cat data
line one.
line two.
line three is the end.
% pix -l copydata.p
Berkeley Pascal PI -- Version 2.0 (Sat Oct 18 21:01:54 1980)

Tue Oct 14 21:37 1980  copydata.p

     1  program copydata(data, output);
     2  var
     3      ch: char;
     4      data: text;
     5  begin
     6      reset(data);
     7      while not eof(data) do begin
     8          while not eoln(data) do begin
     9              read(data, ch);
    10              write(ch)
    11          end;
    12          readln(data);
    13          writeln
    14      end
    15  end { copydata }.
Execution begins...
line one.
line two.
line three is the end.
Execution terminated.

134 statements executed in 0.08 seconds cpu time.
%

By mentioning the file data in the program statement, we have indicated that we wish it to
correspond to the UNIX file data. Then, when we 'reset(data)', the Pascal system opens our file
'data' for reading. More sophisticated, but less portable, examples of using UNIX files will be
given in sections 4.5 and 4.6. There is a portability problem even with this simple example.
Some Pascal systems attach meaning to the ordering of the files in the program statement file
list. Berkeley Pascal does not do so.

4.2. Eof and eoln
An extremely common problem encountered by new users of Pascal, especially in the
interactive environment offered by UNIX, relates to the definitions of eof and eoln. These functions are supposed to be defined at the beginning of execution of a Pascal program, indicating
whether the input device is at the end of a line or the end of a file. Setting eof or eoln actually
corresponds to an implicit read in which the input is inspected, but no input is "used up". In
fact, there is no way the system can know whether the input is at the end-of-file or the end-of-line unless it attempts to read a line from it. If the input is from a previously created file, then
this reading can take place without run-time action by the user. However, if the input is from a
terminal, then the input is what the user types.† If the system were to do an initial read
automatically at the beginning of program execution, and if the input were a terminal, the user
would have to type some input before execution could begin. This would make it impossible
for the program to begin by prompting for input or printing a herald.
†It is not possible to determine whether the input is a terminal, as the input may appear to be a file but actually be a pipe, the output of a program which is reading from the terminal.
Berkeley Pascal has been designed so that an initial read is not necessary. At any given
time, the Pascal system may or may not know whether the end-of-file or end-of-line conditions
are true. Thus, internally, these functions can have three values - true, false, and "I don't
know yet; if you ask me I'll have to find out". All files remain in this last, indeterminate state
until the Pascal program requires a value for eof or eoln either explicitly or implicitly, e.g. in a
call to read. The important point to note here is that if you force the Pascal system to determine whether the input is at the end-of-file or the end-of-line, it will be necessary for it to
attempt to read from the input.
Thus consider the following example code
while not eof do begin
    write('number, please? ');
    read(i);
    writeln('that was a ', i: 2)
end
At first glance, this may appear to be a correct program for requesting, reading and echoing
numbers. Notice, however, that the while loop asks whether eof is true before the request is
printed. This will force the Pascal system to decide whether the input is at the end-of-file. The
Pascal system will give no messages; it will simply wait for the user to type a line. By producing the desired prompting before testing eof, the following code avoids this problem:
write('number, please? ');
while not eof do begin
    read(i);
    writeln('that was a ', i: 2);
    write('number, please? ')
end
The user must still type a line before the while test is completed, but the prompt will ask for it.
This example, however, is still not correct. To understand why, it is first necessary to know, as
we will discuss below, that there is a blank character at the end of each line in a Pascal text file.
The read procedure, when reading integers or real numbers, is defined so that, if there are only
blanks left in the file, it will return a zero value and set the end-of-file condition. If, however,
there is a number remaining in the file, the end-of-file condition will not be set even if it is the
last number, as read never reads the blanks after the number, and there is always at least one
blank. Thus the modified code will still put out a spurious
that was a 0
at the end of a session with it when the end-of-file is reached. The simplest way to correct the
problem in this example is to use the procedure readln instead of read here. In general, unless
we test the end-of-file condition both before and after calls to read or readln, there will be
inputs for which our program will attempt to read past end-of-file.
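
The readln correction suggested above might be sketched as the following complete program. The program name ask is hypothetical, and the sketch assumes that each input line holds exactly one integer; it is an illustration rather than a definitive solution.

```pascal
program ask(input, output);
var
    i: integer;
begin
    { Prompt first, so the user sees it before eof forces a read. }
    write('number, please? ');
    while not eof do begin
        { readln consumes the rest of the line, including the
          end-of-line blank, so the next eof test sees the true
          state of the input. }
        readln(i);
        writeln('that was a ', i: 2);
        write('number, please? ')
    end
end.
```

Because readln discards the blank that ends each line, no blanks are left over after the last number to provoke a spurious zero when the end-of-file is reached.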

4.3. More about eoln
To have a good understanding of when eoln will be true it is necessary to know that in any
file there is a special character indicating end-of-line, and that, in effect, the Pascal system
always reads one character ahead of the Pascal read commands.† For instance, in response to
'read(ch)', the system sets ch to the current input character and gets the next input character.
If the current input character is the last character of the line, then the next input character
from the file is the new-line character, the normal UNIX line separator. When the read routine
gets the new-line character, it replaces that character by a blank (causing every line to end with
a blank) and sets eoln to true. Eoln will be true as soon as we read the last character of the line
and before we read the blank character corresponding to the end of line. Thus it is almost
always a mistake to write a program which deals with input in the following way:
read(ch);
if eoln then
    Done with line
else
    Normal processing
as this will almost surely have the effect of ignoring the last character in the line. The
'read(ch)' belongs as part of the normal processing.

†In Pascal terms, 'read(ch)' corresponds to 'ch := input^; get(input)'.

Given this framework, it is not hard to explain the function of a readln call, which is
defined as:
while not eoln do
    get(input);
get(input);
This advances the file until the blank corresponding to the end-of-line is the current input symbol and then discards this blank. The next character available from read will therefore be the
first character of the next line, if one exists.
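
As an illustration of keeping 'read(ch)' inside the normal processing, the following sketch counts the characters on each input line. The program name linelen and the variable n are hypothetical names chosen for this example.

```pascal
program linelen(input, output);
var
    ch: char;
    n: integer;
begin
    while not eof do begin
        n := 0;
        { Test eoln before reading, so that the last character
          of the line is still processed. }
        while not eoln do begin
            read(ch);
            n := n + 1
        end;
        { Discard the blank that stands for the end-of-line. }
        readln;
        writeln(n: 1)
    end
end.
```

Testing eoln before each read, rather than after it, is what keeps the final character of a line from being ignored.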

4.4. Output buffering
A final point about Pascal input-output must be noted here. This concerns the buffering
of the file output. It is extremely inefficient for the Pascal system to send each character to the
user's terminal as the program generates it for output; even less efficient if the output is the
input of another program such as the line printer daemon lpr (1). To gain efficiency, the Pascal
system "buffers" the output characters (i.e. it saves them in memory until the buffer is full and
then emits the entire buffer in one system interaction.) However, to allow interactive prompting to work as in the example given above, this prompt must be printed before the Pascal system waits for a response. For this reason, Pascal normally prints all the output which has been
generated for the file output whenever
1) A writeln occurs, or
2) The program reads from the terminal, or
3) The procedure message or flush is called.
Thus, in the code sequence
for i := 1 to 5 do begin
    write(i: 2);
    Compute a lot with no output
end;
writeln
the output integers will not print until the writeln occurs. The delay can be somewhat disconcerting, and you should be aware that it will occur. By setting the b option to 0 before the program statement by inserting a comment of the form
(*$b0*)
we can cause output to be completely unbuffered, with a corresponding horrendous degradation
in program efficiency. Option control in comments is discussed in section 5.
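
A less drastic alternative to unbuffered output is to call flush at the points where partial output should appear. The sketch below assumes that flush takes the file as its argument, as in 'flush(output)'; the program name slow and the inner loop, which merely stands in for a long computation, are hypothetical.

```pascal
program slow(output);
var
    i, j, k: integer;
begin
    for i := 1 to 5 do begin
        write(i: 2);
        { Assumed form of the call: push out the partial line
          now instead of waiting for the writeln. }
        flush(output);
        { Stand-in for a long computation with no output. }
        k := 0;
        for j := 1 to 10000 do
            k := (k + j) mod 7
    end;
    writeln
end.
```

With the flush call in place, each integer should appear as soon as it is written, while the rest of the program's output remains buffered as usual.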


4.5. Files, reset, and rewrite
It is possible to use extended forms of the built-in functions reset and rewrite to get more
general associations of UNIX file names with Pascal file variables. When a file other than input
or output is to be read or written, then the reading or writing must be preceded by a reset or
rewrite call. In general, if the Pascal file variable has never been used before, there will be no
UNIX filename associated with it. As we saw in section 2.9, by mentioning the file in the program statement, we could cause a UNIX file with the same name as the Pascal variable to be
associated with it. If we do not mention a file in the program statement and use it for the first
time with the statement
reset(f)
or
rewrite(f)
then the Pascal system will generate a temporary name of the form 'tmp.x' for some character
'x', and associate this UNIX file name with the Pascal file. The first such generated name
will be 'tmp.1' and the names continue by incrementing their last character through the ASCII
set. The advantage of using such temporary files is that they are automatically removed by the
Pascal system as soon as they become inaccessible. They are not removed, however, if a runtime error causes termination while they are in scope.
To cause a particular UNIX pathname to be associated with a Pascal file variable we can
give that name in the reset or rewrite call, e.g. we could have associated the Pascal file data with
the file 'primes' in our example in section 3.1 by doing:
reset(data, 'primes')
instead of a simple
reset(data)
In this case it is not essential to mention 'data' in the program statement, but it is still a good
idea because it serves as an aid to program documentation. The second parameter to reset and
rewrite may be any string value, including a variable. Thus the names of UNIX files to be associated with Pascal file variables can be read in at run time. Full details on file name/file variable
associations are given in section A.3.
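
The two-argument form of reset might be used as in the following sketch, which copies the UNIX file 'primes' to the output. The program name copyprimes is hypothetical; the body follows the copydata program of section 4.1.

```pascal
program copyprimes(data, output);
var
    ch: char;
    data: text;
begin
    { Associate the Pascal file variable data with the UNIX
      file primes, then copy it line by line. }
    reset(data, 'primes');
    while not eof(data) do begin
        while not eoln(data) do begin
            read(data, ch);
            write(ch)
        end;
        readln(data);
        writeln
    end
end.
```

Mentioning data in the program statement is kept here only as documentation, since the explicit name in the reset call determines which UNIX file is opened.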

4.6. Argc and argv
Each UNIX process receives a variable length sequence of arguments, each of which is a
variable length character string. The built-in function argc and the built-in procedure argv can
be used to access and process these arguments. The value of the function argc is the number
of arguments to the process. By convention, the arguments are treated as an array, and
indexed from 0 to argc-1, with the zeroth argument being the name of the program being executed. The rest of the arguments are those passed to the command on the command line.
Thus, the command

% obj /etc/motd /usr/dict/words hello
will invoke the program in the file obj with argc having a value of 4. The zeroth element
accessed by argv will be 'obj', the first '/etc/motd', etc.
Pascal does not provide variable size arrays, nor does it allow character strings of varying
length. For this reason, argv is a procedure and has the syntax
argv(i, a)
where i is an integer and a is a string variable. This procedure call assigns the (possibly truncated or blank padded) i'th argument of the current process to the string variable a. The file
manipulation routines reset and rewrite will strip trailing blanks from their optional second
arguments so that this blank padding is not a problem in the usual case where the arguments
are file names.
We are now ready to give a Berkeley Pascal program 'kat', based on that given in section
3.1 above, which can be used with the same syntax as the UNIX system program cat (1).

% cat kat.p
program kat(input, output);
var
    ch: char;
    i: integer;
    name: packed array [1..100] of char;
begin
    i := 1;
    repeat
        if i < argc then begin
            argv(i, name);
            reset(input, name);
            i := i + 1
        end;
        while not eof do begin
            while not eoln do begin
                read(ch);
                write(ch)
            end;
            readln;
            writeln
        end
    until i >= argc
end { kat }.
%
Note that the reset call to the file input here, which is necessary for a clear program, may be
disallowed on other systems. As this program deals mostly with argc and argv and UNIX system
dependent considerations, portability is of little concern.
If this program is in the file 'kat.p', then we can do

% pi kat.p
% mv obj kat
% kat primes
     2     3     5     7    11    13    17    19    23    29
    31    37    41    43    47    53    59    61    67    71
    73    79    83    89    97   101   103   107   109   113
   127   131   137   139   149   151   157   163   167   173
   179   181   191   193   197   199   211   223   227   229

930 statements executed in 0.18 seconds cpu time.
% kat
This is a line of text.
This is a line of text.
The next line contains only an end-of-file (an invisible control-d!)
The next line contains only an end-of-file (an invisible control-d!)
287 statements executed in 0.03 seconds cpu time.
%

Thus we see that, if it is given arguments, 'kat' will, like cat, copy each one in turn. If no
arguments are given, it copies from the standard input. Thus it will work as it did before, with
% kat < primes

now equivalent to
% kat primes

although the mechanisms are quite different in the two cases. Note that if 'kat' is given a bad
file name, for example:

% kat xxxxqqq
Could not open xxxxqqq: No such file or directory
Error in "kat"+5 near line 11.
4 statements executed in 0.02 seconds cpu time.
%

it will give a diagnostic and a post-mortem control flow backtrace for debugging. If we were
going to use 'kat', we might want to translate it differently, e.g.:

% pi -pb kat.p
% mv obj kat
Here we have disabled the post-mortem statistics printing, so as not to get the statistics or the
full traceback on error. The b option will cause the system to block buffer the input/output so
that the program will run more efficiently on large files. We could have also specified the t
option to turn off runtime tests if that was felt to be a speed hindrance to the program. Thus
we can try the last examples again:

% kat xxxxqqq
Could not open xxxxqqq: No such file or directory
Error in "kat"

% kat primes
     2     3     5     7    11    13    17    19    23    29
    31    37    41    43    47    53    59    61    67    71
    73    79    83    89    97   101   103   107   109   113
   127   131   137   139   149   151   157   163   167   173
   179   181   191   193   197   199   211   223   227   229

%
The interested reader may wish to try writing a program which accepts command line
arguments like pi does, using argc and argv to process them.
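
One possible starting point for such a program is sketched below. It only classifies each argument as an option (beginning with '-') or as an ordinary argument; the program name args, the variable arg, and the 100 character limit are choices made for this illustration.

```pascal
program args(output);
var
    i: integer;
    arg: packed array [1..100] of char;
begin
    { Argument 0 is the program name, so start at 1. }
    for i := 1 to argc - 1 do begin
        { argv blank pads (or truncates) the argument to fit arg. }
        argv(i, arg);
        if arg[1] = '-' then
            writeln('option:   ', arg[2])
        else
            writeln('argument: ', arg)
    end
end.
```

Only the first character after the '-' is examined here, matching the single character option names used by pi; a fuller program would also scan the remaining characters, as in '-pb'.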


5. Details on the components of the system

5.1. Options
The programs pi, pc, and pxp take a number of options.† There is a standard UNIX‡ convention for passing options to programs on the command line, and this convention is followed
by the Berkeley Pascal system programs. As we saw in the examples above, option related
arguments consisted of the character '-' followed by a single character option name.
Except for the b option which takes a single digit value, each option may be set on
(enabled) or off (disabled.) When an on/off valued option appears on the command line of pi
or pix it inverts the default setting of that option. Thus
% pi -l foo.p
enables the listing option l, since it defaults off, while
% pi -t foo.p

disables the run time tests option t, since it defaults on.
In addition to inverting the default settings of pi options on the command line, it is also
possible to control the pi options within the body of the program by using comments of a special form illustrated by

{$l-}
Here we see that the opening comment delimiter (which could also be a '(*') is immediately followed by the character '$'. After this '$', which signals the start of the option list, we
can place a sequence of letters and option controls, separated by ',' characters.§ The most basic
actions for options are to set them, thus
{$l+ Enable listing}
or to clear them
{$t-,p- No run-time tests, no post mortem analysis}
Notice that '+' always enables an option and '-' always disables it, no matter what the default
is. Thus '-' has a different meaning in an option comment than it has on the command line.
As shown in the examples, normal comment text may follow the option list.
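
A minimal sketch of an option comment in context follows. The program hello is a hypothetical example; the comment is placed ahead of the program statement so that the listing is enabled from the first line.

```pascal
{$l+ Enable listing}
program hello(output);
begin
    writeln('Hello, world')
end.
```

Translating this file with a plain 'pi' command should then produce a listing even though l was not given on the command line.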

5.2. Options common to Pi, Pc, and Pix
The following options are common to both the compiler and the interpreter. With each
option we give its default setting, the setting it would have if it appeared on the command line,
and a sample command using the option. Most options are on/off valued, with the b option
taking a single digit value.

†As pix uses pi to translate Pascal programs, it takes the options of pi also. We refer to them here, however,
as pi options.
‡UNIX is a Trademark of Bell Laboratories.
§This format was chosen because it is used by Pascal 6000-3.4. In general the options common to both implementations are controlled in the same way so that comment control in options is mostly portable. It is
recommended, however, that only one control be put per comment for maximum portability, as the Pascal
6000-3.4 implementation will ignore controls after the first one which it does not recognize.


Buffering of the file output - b
The b option controls the buffering of the file output. The default is line buffering, with
flushing at each reference to the file input and under certain other circumstances detailed in section 5 below. Mentioning b on the command line, e.g.
% pi -b assembler.p

causes standard output to be block buffered, where a block is some system-defined number of
characters. The b option may also be controlled in comments. It, unique among the Berkeley
Pascal options, takes a single digit value rather than an on or off setting. A value of 0, e.g.

{$b0}
causes the file output to be unbuffered. Any value 2 or greater causes block buffering and is
equivalent to the flag on the command line. The option control comment setting b must precede the program statement.
Include file listing - i
The i option takes the name of an include file, procedure or function name and causes it
to be listed while translating.† Typical uses would be

% pix -i scanner.i compiler.p
to make a listing of the routines in the file scanner.i, and
% pix -i scanner compiler.p

to make a listing of only the routine scanner. This option is especially useful for conservation-minded programmers making partial program listings.
Make a listing - l
The l option enables a listing of the program. The l option defaults off. When specified
on the command line, it causes a header line identifying the version of the translator in use and
a line giving the modification time of the file being translated to appear before the actual program listing. The l option is pushed and popped by the i option at appropriate points in the
program.
Standard Pascal only - s
The s option causes many of the features of the UNIX implementation which are not
found in standard Pascal to be diagnosed as 's' warning errors. This option defaults off and is
enabled when mentioned on the command line. Some of the features which are diagnosed are:
non-standard procedures and functions, extensions to the procedure write, and the padding of
constant strings with blanks. In addition, all letters are mapped to lower case except in strings
and characters so that the case of keywords and identifiers is effectively ignored. The s option
is most useful when a program is to be transported, thus

% pi -s isitstd.p
will produce warnings unless the program meets the standard.
Runtime tests - t and C
These options control the generation of tests that subrange variable values are within
bounds at run time. pi defaults to generating tests and uses the option t to disable them. pc
defaults to not generating tests, and uses the option C to enable them. Disabling runtime tests
also causes assert statements to be treated as comments.‡
†Include files are discussed in section 5.9.
‡See section A.1 for a description of assert statements.


Suppress warning diagnostics - w
The w option, which defaults on, allows the translator to print a number of warnings
about inconsistencies it finds in the input program. Turning this option off with a comment of
the form
{$w-}
or on the command line
% pi -w tryme.p
suppresses these usually useful diagnostics.
Generate counters for a pxp execution profile - z
The z option, which defaults off, enables the production of execution profiles. Specifying z on the command line, i.e.
% pi -z foo.p

or enabling it in a comment before the program statement, causes pi and pc to insert operations in the interpreter code to count the number of times each statement was executed. An
example of using pxp was given in section 2.6; its options are described in section 5.6. Note
that the z option cannot be used on separately compiled programs.

5.3. Options available in Pi
Post-mortem dump - p
The p option defaults on, and causes the runtime system to initiate a post-mortem backtrace when an error occurs. It also causes px to count statements in the executing program,
enforcing a statement limit to prevent infinite loops. Specifying p on the command line disables these checks and the ability to give this post-mortem analysis. It does make smaller and
faster programs, however. It is also possible to control the p option in comments. To prevent
the post-mortem backtrace on error, p must be off at the end of the program statement. Thus,
the Pascal cross-reference program was translated with
% pi -pbt pxref.p

5.4. Options available in Px
The first argument to px is the name of the file containing the program to be interpreted.
If no arguments are given, then the file obj is executed. If more arguments are given, they are
available to the Pascal program by using the built-ins argc and argv as described in section 4.6.
Px may also be invoked automatically. In this case, whenever a Pascal object file name is
given as a command, the command will be executed with px prepended to it; that is
% obj primes

will be converted to read
% px obj primes

5.5. Options available in Pc
Generate assembly language - S
The program is compiled and the assembly language output is left in a file with the suffix .s.
Thus
% pc -S foo.p

creates a file foo.s. No executable file is created.
Symbolic Debugger Information - g
The g option causes the compiler to generate information needed by sdb(1), the symbolic
debugger. For a complete description of sdb see Volume 2c of the UNIX Reference Manual.
Redirect the output file - o
The name argument after the -o is used as the name of the output file instead of a.out. Its
typical use is to name the compiled program using the root of the file name. Thus:
% pc -o myprog myprog.p

causes the compiled program to be called myprog.
Generate counters for a prof execution profile - p
The compiler produces code which counts the number of times each routine is called.
The profiling is based on a periodic sample taken by the system rather than on the inline counters
used by pxp. This results in less degradation in execution, at somewhat of a loss in accuracy.
See prof(1) for a more complete description.
Run the object code optimizer - O
The output of the compiler is run through the object code optimizer. This increases
compile time in exchange for a decrease in compiled code size and execution time.
5.6. Options available in Pxp
Pxp takes, on its command line, a list of options followed by the program file name, which
must end in '.p' as it must for pi, pc, and pix. Pxp will produce an execution profile if any of
the z, t or c options is specified on the command line. If none of these options is specified,
then pxp functions as a program reformatter.
It is important to note that only the z and w options of pxp, which are common to pi, pc,
and pxp, can be controlled in comments. All other options must be specified on the command
line to have any effect.
The following options are relevant to profiling with pxp.
Include the bodies of all routines in the profile - a
Pxp normally suppresses printing the bodies of routines which were never executed, to
make the profile more compact. This option forces all routine bodies to be printed.
Suppress declaration parts from a profile - d
Normally a profile includes declaration parts. Specifying d on the command line
suppresses declaration parts.

Eliminate include directives - e
Normally, pxp preserves include directives to the output when reformatting a program, as
though they were comments. Specifying -e causes the contents of the specified files to be
reformatted into the output stream instead. This is an easy way to eliminate include directives,
e.g. before transporting a program.
Fully parenthesize expressions - f
Normally pxp prints expressions with the minimal parenthesization necessary to preserve
the structure of the input. This option causes pxp to fully parenthesize expressions. Thus the
statement which prints as
d := a + b mod c / e
with minimal parenthesization, the default, will print as
d := a + ((b mod c) / e)
with the f option specified on the command line.
Left justify all procedures and functions - j
Normally, each procedure and function body is indented to reflect its static nesting depth.
This option prevents this nesting and can be used if the indented output would be too wide.
Print a table summarizing procedure and function calls - t
The t option causes pxp to print a table summarizing the number of calls to each procedure and function in the program. It may be specified in combination with the z option, or
separately.
Enable and control the profile - z
The z profile option is very similar to the i listing control option of pi. If z is specified on
the command line, then all arguments up to the source file argument which ends in '.p' are
taken to be the names of procedures and functions or include files which are to be profiled. If
this list is null, then the whole file is to be profiled. A typical command for extracting a profile
of part of a large program would be

% pxp -z test parser.i compiler.p
This specifies that profiles of the routines in the file parser.i and the routine test are to be made.

5.7. Formatting programs using pxp
The program pxp can be used to reformat programs, by using a command of the form

% pxp dirty.p > clean.p
Note that since the shell creates the output file 'clean.p' before pxp executes, 'clean.p' and
'dirty.p' must not be the same file.
Pxp automatically paragraphs the program, performing housekeeping chores such as comment alignment, and treating blank lines, lines containing exactly one blank and lines containing only a form-feed character as though they were comments, preserving their vertical spacing
effect in the output. Pxp distinguishes between four kinds of comments:
1) Left marginal comments, which begin in the first column of the input line and are
   placed in the first column of an output line.
2) Aligned comments, which are preceded by no input tokens on the input line. These
   are aligned in the output with the running program text.
3) Trailing comments, which are preceded in the input line by a token with no more
   than two spaces separating the token from the comment.
4) Right marginal comments, which are preceded in the input line by a token from
   which they are separated by at least three spaces or a tab. These are aligned down
   the right margin of the output, currently to the first tab stop after the 40th column
   from the current "left margin".
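The four rules above can be read as a small decision procedure. The following Python sketch is only an illustration of that reading, not the code pxp actually uses; the function name classify_comment and its 0-based column argument are assumptions made for this example.

```python
def classify_comment(line, start):
    """Classify the comment beginning at 0-based column `start` of `line`.

    Hypothetical helper that models the four kinds of comments described
    in the text; it is not taken from pxp itself.
    """
    before = line[:start]
    if start == 0:
        return "left marginal"      # begins in the first column
    if before.strip() == "":
        return "aligned"            # no input tokens precede the comment
    # White space separating the last token from the comment.
    gap = before[len(before.rstrip()):]
    if "\t" in gap or len(gap) >= 3:
        return "right marginal"     # a tab, or at least three spaces
    return "trailing"               # no more than two spaces


for text in ("{A left marginal comment}",
             "    {An aligned comment}",
             "i := 1; {Trailing i comment}",
             "j := 1;    {Right marginal comment}"):
    print(classify_comment(text, text.index("{")), "<-", text)
```

Only the classification is modelled here; the placement of each kind of comment in the output is left to the description above.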
Consider the following program.
% cat comments.p
{ This is a left marginal comment. }
program hello(output);
var i : integer; {This is a trailing comment}
j : integer;    {This is a right marginal comment}
k : array [ 1..10] of array [1..10] of integer;    {Marginal, but past the margin}
 {
  An aligned, multi-line comment
  which explains what this program is
  all about
 }
begin
i := 1; {Trailing i comment}
{A left marginal comment}
 {An aligned comment}
j := 1;    {Right marginal comment}
k[1] := 1;
writeln(i, j, k[1])
end.
When formatted by pxp the following output is produced.

% pxp comments.p
{ This is a left marginal comment. }

program hello(output);
var
    i: integer; {This is a trailing comment}
    j: integer;                                 {This is a right marginal comment}
    k: array [1..10] of array [1..10] of integer;       {Marginal, but past the margin}
{
  An aligned, multi-line comment
  which explains what this program is
  all about
}
begin
    i := 1; {Trailing i comment}
{A left marginal comment}
    {An aligned comment}
    j := 1;                                     {Right marginal comment}
    k[1] := 1;
    writeln(i, j, k[1])
end.


The following formatting related options are currently available in pxp. The options f and j
described in the previous section may also be of interest.
Strip comments -s
The s option causes pxp to remove all comments from the input text.
Underline keywords -_
A command line argument of the form -_ as in

% pxp -_ dirty.p
can be used to cause pxp to underline all keywords in the output for enhanced readability.
Specify indenting unit -[23456789]
The normal unit which pxp uses to indent a structure statement level is 4 spaces. By giving an argument of the form -d with d a digit, 2 ≤ d ≤ 9, you can specify that d spaces are to
be used per level instead.
5.8. Pxref
The cross-reference program pxref may be used to make cross-referenced listings of Pascal
programs. To produce a cross-reference of the program in the file 'foo.p' one can execute the
command:
% pxref foo.p

The cross-reference is, unfortunately, not block structured. Full details on pxref are given in its
manual section pxref (1).
5.9. Multi-file programs
A text inclusion facility is available with Berkeley Pascal. This facility allows the interpolation of source text from other files into the source stream of the translator. It can be used to
divide large programs into more manageable pieces for ease in editing, listing, and maintenance.
The include facility is based on that of the UNIX C compiler. To trigger it you can place
the character '#' in the first portion of a line and then, after an arbitrary number of blanks or
tabs, the word 'include' followed by a filename enclosed in single ''' or double '"' quotation
marks. The file name may be followed by a semicolon ';' if you wish to treat this as a pseudo-Pascal statement. The filenames of included files must end in '.i'. An example of the use of
included files in a main program would be:

program compiler(input, output, obj);
#include "globals.i"
#include "scanner.i"
#include "parser.i"
#include "semantics.i"
begin
{ main program }
end.

At the point the include pseudo-statement is encountered in the input, the lines from the
included file are interpolated into the input stream. For the purposes of translation and runtime
diagnostics and statement numbers in the listings and post-mortem backtraces, the lines in the
included file are numbered from 1. Nested includes are possible up to 10 deep.
See the descriptions of the i option of pi in section 5.2 above; this can be used to control
listing when include files are present.
When a non-trivial line is encountered in the source text after an include finishes, the
'popped' filename is printed, in the same manner as above.
For the purposes of error diagnostics when not making a listing, the filename will be
printed before each diagnostic if the current filename has changed since the last filename was
printed.
5.10. Separate Compilation with Pc
A separate compilation facility is provided with the Berkeley Pascal compiler, pc. This
facility allows programs to be divided into a number of files and the pieces to be compiled individually, to be linked together at some later time. This is especially useful for large programs,
where small changes would otherwise require time-consuming re-compilation of the entire
program.
Normally, pc expects to be given entire Pascal programs. However, if given the -c
option on the command line, it will accept a sequence of definitions and declarations, and compile them into a .o file, to be linked with a Pascal program at a later time. In order that procedures and functions be available across separately compiled files, they must be declared with
the directive external. This directive is similar to the directive forward in that it must precede
the resolution of the function or procedure, and formal parameters and function result types
must be specified at the external declaration and may not be specified at the resolution.
Type checking is performed across separately compiled files. Since Pascal type definitions
define unique types, any types which are shared between separately compiled files must be the
same definition. This seemingly impossible problem is solved using a facility similar to the
include facility discussed above. Definitions may be placed in files with the extension .h and
the files included by separately compiled files. Each definition from a .h file defines a unique
type, and all uses of a definition from the same .h file define the same type. Similarly, the
facility is extended to allow the definition of consts and the declaration of labels, vars, and
external functions and procedures. Thus procedures and functions which are used between
separately compiled files must be declared external, and must be so declared in a .h file
included by any file which calls or resolves the function or procedure. Conversely, functions
and procedures declared external may only be so declared in .h files. These files may be
included only at the outermost level, and thus define or declare global objects. Note that since
only external function and procedure declarations (and not resolutions) are allowed in .h files,
statically nested functions and procedures cannot be declared external.
An example of the use of included .h files in a program would be:
program compiler(input, output, obj);
#include "globals.h"
#include "scanner.h"
#include "parser.h"
#include "semantics.h"
begin
{ main program }
end.
This might include in the main program the definitions and declarations of all the global
labels, consts, types and vars from the file globals.h, and the external function and procedure
declarations for each of the separately compiled files for the scanner, parser and semantics. The
header file scanner.h would contain declarations of the form:
type
token = record
{ token fields }
end;
function scan (var inputfile: text): token;
external;

Then the scanner might be in a separately compiled file containing:
#include "globals.h"
#include "scanner.h"
function scan;
begin
{ scanner code }
end;
which includes the same global definitions and declarations and resolves the scanner functions
and procedures declared external in the file scanner.h.


A. Appendix to Wirth's Pascal Report
This section is an appendix to the definition of the Pascal language in Niklaus Wirth's
Pascal Report and, with that Report, precisely defines the Berkeley implementation. This
appendix includes a summary of extensions to the language, gives the ways in which the
undefined specifications were resolved, gives limitations and restrictions of the current implementation, and lists the added functions and procedures available. It concludes with a list of
differences with the commonly available Pascal 6000-3.4 implementation, and some comments
on standard and portable Pascal.
A.1. Extensions to the language Pascal
This section defines non-standard language constructs available in Berkeley Pascal. The s
standard Pascal option of the translators pi and pc can be used to detect these extensions in programs which are to be transported.
String padding
Berkeley Pascal will pad constant strings with blanks in expressions and as value parameters to make them as long as is required. The following is a legal Berkeley Pascal program:
program x(output);
var z : packed array [ 1 .. 13 ] of char;
begin
z := 'red';

writeln(z)
end.

The padded blanks are added on the right. Thus the assignment above is equivalent to:
z := 'red          '
which is standard Pascal.
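The padding rule can be mimicked outside Pascal. The Python sketch below is an illustration only: the helper name pad_string_constant is hypothetical, and raising an error for a constant longer than the target is an assumption, since the text above does not say what happens in that case.

```python
def pad_string_constant(value, length):
    """Pad `value` on the right with blanks to exactly `length` characters.

    Hypothetical model of the string padding extension described above.
    """
    if len(value) > length:
        # Assumption: the over-long case is not covered by the text.
        raise ValueError("constant is longer than the target string type")
    return value.ljust(length)


z = pad_string_constant("red", 13)   # packed array [1..13] of char
print(repr(z), len(z))
```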
Octal constants, octal and hexadecimal write
Octal constants may be given as a sequence of octal digits followed by the character 'b' or
'B'. The forms
write(a:n oct)
and
write(a:n hex)
cause the internal representation of expression a, which must be Boolean, character, integer,
pointer, or a user-defined enumerated type, to be written in octal or hexadecimal respectively.
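As a rough illustration of the notation, the following Python sketch reads an octal constant written with a trailing 'b' or 'B' and renders an integer in octal or hexadecimal. The function names are hypothetical, and the 32-bit mask, the lower-case hexadecimal digits and the absence of padding are assumptions; the text above does not specify the exact output layout.

```python
OCTAL_DIGITS = "01234567"


def parse_octal_constant(text):
    """Return the value of an octal constant such as '17b' or '777B'."""
    if len(text) < 2 or text[-1] not in "bB":
        raise ValueError("octal constants end in 'b' or 'B'")
    digits = text[:-1]
    if any(ch not in OCTAL_DIGITS for ch in digits):
        raise ValueError("not an octal digit sequence")
    return int(digits, 8)


def format_oct(value):
    """Octal digits of the 32-bit representation of `value` (assumed width)."""
    return format(value & 0xFFFFFFFF, "o")


def format_hex(value):
    """Hexadecimal digits of the 32-bit representation (assumed lower case)."""
    return format(value & 0xFFFFFFFF, "x")


print(parse_octal_constant("17b"))
print(format_oct(-1), format_hex(-1))
```

Under the 32-bit assumption the widest octal rendering has 11 digits and the widest hexadecimal rendering has 8.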
Assert statement
An assert statement causes a Boolean expression to be evaluated each time the statement
is executed. A runtime error results if any of the expressions evaluates to be false. The assert
statement is treated as a comment if run-time tests are disabled. The syntax for assert is:
assert < expr >


Enumerated type input-output
Enumerated types may be read and written. On output the string name associated with
the enumerated value is output. If the value is out of range, a runtime error occurs. On input
an identifier is read and looked up in a table of names associated with the type of the variable,
and the appropriate internal value is assigned to the variable being read. If the name is not
found in the table a runtime error occurs.
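The name-table lookup described above can be pictured with a short Python sketch. It is an analogy rather than the Berkeley Pascal runtime: the Color type and the helper names are hypothetical, and a Python RuntimeError stands in for the Pascal runtime error.

```python
from enum import Enum


class Color(Enum):          # hypothetical enumerated type
    red = 0
    green = 1
    blue = 2


def write_enum(value):
    """On output, the string name associated with the value is produced."""
    return value.name


def read_enum(enum_type, identifier):
    """On input, look the identifier up in the table of names for the type."""
    try:
        return enum_type[identifier]
    except KeyError:
        raise RuntimeError("name not found in table: " + identifier)


print(write_enum(Color.green))
print(read_enum(Color, "blue"))
```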
Structure returning functions
An extension has been added which allows functions to return arbitrary sized structures
rather than just scalars as in the standard.
Separate compilation
The compiler pc has been extended to allow separate compilation of programs. Procedures and functions declared at the global level may be compiled separately. Type checking
of calls to separately compiled routines is performed at load time to insure that the program as
a whole is consistent. See section 5.10 for details.

A.2. Resolution of the undefined specifications
File name - file variable associations
Each Pascal file variable is associated with a named UNIX† file. Except for input and output, which are exceptions to some of the rules, a name can become associated with a file in any
of three ways:
1) If a global Pascal file variable appears in the program statement then it is associated
   with the UNIX file of the same name.
2) If a file was reset or rewritten using the extended two-argument form of reset or
   rewrite then the given name is associated.
3) If a file which has never had a UNIX name associated is reset or rewritten without
   specifying a name via the second argument, then a temporary name of the form
   'tmp.x' is associated with the file. Temporary names start with 'tmp.1' and continue
   by incrementing the last character in the USASCII ordering. Temporary files are
   removed automatically when their scope is exited.
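The temporary naming scheme in the third rule can be sketched as a generator. This Python fragment is illustrative only; the name temp_names is hypothetical, and stopping at the last printable USASCII character is an assumption, since the text does not say what happens when the sequence is exhausted.

```python
from itertools import islice


def temp_names():
    """Yield 'tmp.1', 'tmp.2', ... by stepping the last character in ASCII order."""
    code = ord("1")
    while code < 127:               # assumption: stop after '~' (code 126)
        yield "tmp." + chr(code)
        code += 1


print(list(islice(temp_names(), 12)))
```

Because the ordering is that of the character set rather than of the digits, the name following 'tmp.9' ends in ':' and not in '10'.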
The program statement
The syntax of the program statement is:
program <id> ( <file id> { , <file id> } ) ;
The file identifiers (other than input and output) must be declared as variables of file type in the
global declaration part.

The files input and output
The formal parameters input and output are associated with the UNIX standard input and
output and have a somewhat special status. The following rules must be noted:
1) The program heading must contain the formal parameter output. If input is used,
   explicitly or implicitly, then it must also be declared here.
2) Unlike all other files, the Pascal files input and output must not be defined in a
   declaration, as their declaration is automatically:
   var input, output: text
3) The procedure reset may be used on input. If no UNIX file name has ever been associated with input, and no file name is given, then an attempt will be made to 'rewind'
   input. If this fails, a run time error will occur. Rewrite calls to output act as for any
   other file, except that output initially has no associated file. This means that a simple
   rewrite(output)
   associates a temporary name with output.
†UNIX is a Trademark of Bell Laboratories.

Details for files
If a file other than input is to be read, then reading must be initiated by a call to the procedure reset which causes the Pascal system to attempt to open the associated UNIX file for reading. If this fails, then a runtime error occurs. Writing of a file other than output must be initiated by a rewrite call, which causes the Pascal system to create the associated UNIX file and to
then open the file for writing only.
Buffering
The buffering for output is determined by the value of the b option at the end of the program statement. If it has its default value 1, then output is buffered in blocks of up to 512
characters, flushed whenever a writeln occurs and at each reference to the file input. If it has
the value 0, output is unbuffered. Any value of 2 or more gives block buffering without line or
input reference flushing. All other output files are always buffered in blocks of 512 characters.
All output buffers are flushed when the files are closed at scope exit, whenever the procedure
message is called, and can be flushed using the built-in procedure flush.
An important point for an interactive implementation is the definition of 'input↑'. If input
is a teletype, and the Pascal system reads a character at the beginning of execution to define
'input↑', then no prompt could be printed by the program before the user is required to type
some input. For this reason, 'input↑' is not defined by the system until its definition is needed,
reading from a file occurring only when necessary.
The character set
Seven bit USASCII is the character set used on UNIX. The standard Pascal symbols 'and',
'or', 'not', '<=', '>=', '<>', and the uparrow '↑' (for pointer qualification) are recognized.†
Less portable are the synonyms tilde '~' for not, '&' for and, and '|' for or.
Upper and lower case are considered to be distinct. Keywords and built-in procedure and
function names are composed of all lower case letters. Thus the identifiers GOTO and GOto
are distinct both from each other and from the keyword goto. The standard type 'boolean' is
also available as 'Boolean'.
Character strings and constants may be delimited by the character ''' or by the character
'#'; the latter is sometimes convenient when programs are to be transported. Note that the '#'
character has special meaning when it is the first character on a line - see Multi-file programs
below.


The standard types
The standard type integer is conceptually defined as
type integer = minint .. maxint;
Integer is implemented with 32 bit twos complement arithmetic. Predefined constants of type
integer are:
const maxint = 2147483647; minint = -2147483648;
†On many terminals and printers, the up arrow is represented as a circumflex '^'. These are not distinct
characters, but rather different graphic representations of the same internal codes.
The proposed standard for Pascal considers them to be the same.
The standard type char is conceptually defined as

type char = minchar .. maxchar;
Built-in character constants are 'minchar' and 'maxchar', 'bell' and 'tab'; ord(minchar) = 0,
ord(maxchar) = 127.
The type real is implemented using 64 bit floating point arithmetic. The floating point
arithmetic is done in 'rounded' mode, and provides approximately 17 digits of precision with
numbers as small as 10 to the negative 38th power and as large as 10 to the 38th power.
Comments
Comments can be delimited by either '{' and '}' or by '(*' and '*)'. If the character '{'
appears in a comment delimited by '{' and '}', a warning diagnostic is printed. A similar warning will be printed if the sequence '(*' appears in a comment delimited by '(*' and '*)'. The
restriction implied by this warning is not part of standard Pascal, but detects many otherwise
subtle errors.
Option control
Options of the translators may be controlled in two distinct ways. A number of options
may appear on the command line invoking the translator. These options are given as one or
more strings of letters preceded by the character ' - ' and cause the default setting of each given
option to be changed. This method of communication of options is expected to predominate
for UNIX. Thus the command
% pi -l -s foo.p

translates the file foo.p with the listing option enabled (as it normally is off), and with only
standard Pascal features available.
If more control over the portions of the program where options are enabled is required,
then option control in comments can and should be used. The format for option control in
comments is identical to that used in Pascal 6000-3.4. One places the character '$' as the first
character of the comment and follows it by a comma separated list of directives. Thus an
equivalent to the command line example given above would be:
{$l+,s+ listing on, standard Pascal}
as the first line of the program. The 'l' option is more appropriately specified on the command
line, since it is extremely unlikely in an interactive environment that one wants a listing of the
program each time it is translated.
Directives consist of a letter designating the option, followed either by a '+' to turn the
option on, or by a '-' to turn the option off. The b option takes a single digit instead of a '+'
or '-'.
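The directive format lends itself to a very small parser. The Python sketch below illustrates the format as described; it is not code from pi or pc. The name parse_option_comment is hypothetical, the argument is assumed to be the comment text with its delimiters already removed, and for brevity the sketch accepts a digit after any option letter although the text grants this only to b.

```python
import re

# One directive: an option letter followed by '+', '-' or a single digit.
_DIRECTIVE = re.compile(r"([A-Za-z])([+\-]|\d)")


def parse_option_comment(body):
    """Return {letter: setting} for a comment body such as '$l+,s+ text'."""
    if not body.startswith("$"):
        return {}                       # not an option control comment
    options = {}
    pos = 1
    while True:
        match = _DIRECTIVE.match(body, pos)
        if match is None:
            break
        letter, setting = match.groups()
        if setting == "+":
            options[letter] = True
        elif setting == "-":
            options[letter] = False
        else:
            options[letter] = int(setting)
        pos = match.end()
        if pos < len(body) and body[pos] == ",":
            pos += 1                    # another directive follows
        else:
            break                       # the rest is ordinary comment text
    return options


print(parse_option_comment("$l+,s+ listing on, standard Pascal"))
```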
Notes on the listings
The first page of a listing includes a banner line indicating the version and date of generation of pi or pc. It also includes the UNIX path name supplied for the source file and the date of
last modification of that file.
Within the body of the listing, lines are numbered consecutively and correspond to the
line numbers for the editor. Currently, two special kinds of lines may be used to format the
listing: a line consisting of a form-feed character, control-l, which causes a page eject in the listing, and a line with no characters which causes the line number to be suppressed in the listing,
creating a truly blank line. These lines thus correspond to 'eject' and 'space' macros found in
many assemblers. Non-printing characters are printed as the character '?' in the listing.†
†The character generated by a control-i indents to the next 'tab stop'. Tab stops are set every 8 columns in
UNIX. Tabs thus provide a quick way of indenting in the program.


The standard procedure write
If no minimum field length parameter is specified for a write, the following default values
are assumed:
integer     10
real        22
Boolean     length of 'true' or 'false'
char        1
string      length of the string
oct         11
hex         8
The end of each line in a text file should be explicitly indicated by 'writeln(f)', where
'writeln(output)' may be written simply as 'writeln'. For UNIX, the built-in function 'page(f)'
puts a single ASCII form-feed character on the output file. For programs which are to be transported the filter pcc can be used to interpret carriage control, as UNIX does not normally do so.
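The default field widths listed above can be summarized in a lookup function. This Python sketch is a model for illustration: default_width and write_integer are hypothetical names, Python types stand in for the Pascal ones, and right justification within the field is an assumption based on usual Pascal practice rather than on the text.

```python
def default_width(value):
    """Default minimum field width for a value, following the table above."""
    if isinstance(value, bool):             # must be tested before int
        return len("true") if value else len("false")
    if isinstance(value, int):
        return 10
    if isinstance(value, float):
        return 22
    if isinstance(value, str):
        return len(value)                   # a char gives 1, a string its length
    raise TypeError("no default width for this type")


def write_integer(value):
    """Render an integer in its default field (assumed right justified)."""
    return str(value).rjust(default_width(value))


print(repr(write_integer(42)))
```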
A.3. Restrictions and limitations
Files
Files cannot be members of files or members of dynamically allocated structures.
Arrays, sets and strings
The calculations involving array subscripts and set elements are done with 16 bit arithmetic. This restricts the types over which arrays and sets may be defined. The lower bound of
such a range must be greater than or equal to -32768, and the upper bound less than 32768.
In particular, strings may have any length from 1 to 65535 characters, and sets may contain no
more than 65535 elements.
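The bounds restriction can be stated as a one-line predicate. The Python sketch below is only a restatement of the rule for checking candidate ranges; the name valid_index_range is hypothetical, and requiring the lower bound not to exceed the upper bound is an added assumption.

```python
def valid_index_range(lower, upper):
    """True if lower..upper fits the 16 bit arithmetic described above."""
    # Assumption: an empty range (lower > upper) is treated as invalid.
    return -32768 <= lower <= upper < 32768


print(valid_index_range(1, 10))
print(valid_index_range(0, 32768))
```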

Line and symbol length
There is no intrinsic limit on the length of identifiers. Identifiers are considered to be distinct if they differ in any single position over their entire length. There is a limit, however, on
the maximum input line length. This limit is quite generous, however, currently exceeding 160
characters.
Procedure and function nesting and program size
At most 20 levels of procedure and function nesting are allowed. There is no fundamental, translator defined limit on the size of the program which can be translated. The ultimate
limit is supplied by the hardware and thus, on the PDP-11, by the 16 bit address space. If one
runs up against the 'ran out of memory' diagnostic the program may yet translate if smaller
procedures are used, as a lot of space is freed by the translator at the completion of each procedure or function in the current implementation.
On the VAX-11, there is an implementation defined limit of 65536 bytes per variable.
There is no limit on the number of variables.
Overflow
There is currently no checking for overflow on arithmetic operations at run-time on the
PDP-11. Overflow checking is performed on the VAX-11 by the hardware.


A.4. Added types, operators, procedures and functions
Additional predefined types
The type alfa is predefined as:

type alfa = packed array [ 1..10 ] of char
The type intset is predefined as:

type intset = set of 0..127

In most cases the context of an expression involving a constant set allows the translator to
determine the type of the set, even though the constant set itself may not uniquely determine
this type. In the cases where it is not possible to determine the type of the set from local context, the expression type defaults to a set over the entire base type unless the base type is
integer†. In the latter case the type defaults to the current binding of intset, which must be
"type set of (a subrange of) integer" at that point.
Note that if intset is redefined via:
type intset = set of 0..58;

then the default integer set is the implicit intset of Pascal 6000-3.4.
Additional predefined operators
The relationals '<' and '>' of proper set inclusion are available. With a and b sets, note
that
(not (a < b)) <> (a >= b)

As an example consider the sets a = [0,2] and b = [1]. The only relation true between these
sets is '<>'.
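The example can be checked with any language that offers set inclusion operators. The Python sketch below uses the built-in set type, whose '<' and '>=' operators denote proper subset and superset; the function name relations is hypothetical, and Python's '==' and '!=' stand in for the Pascal '=' and '<>'.

```python
def relations(a, b):
    """Evaluate the six set relations between a and b."""
    return {
        "<": a < b,      # proper inclusion
        "<=": a <= b,
        "=": a == b,
        ">=": a >= b,
        ">": a > b,      # proper inclusion, reversed
        "<>": a != b,
    }


a = {0, 2}
b = {1}
print(relations(a, b))
# For these sets, not (a < b) and (a >= b) have different truth values.
print((not (a < b)) != (a >= b))
```

Set inclusion is only a partial order, which is why the negation of one relation cannot be replaced by its apparent opposite.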

Non-standard procedures
argv(i,a)          where i is an integer and a is a string variable assigns the (possibly
                   truncated or blank padded) i'th argument of the invocation of the
                   current UNIX process to the variable a. The range of valid i is 0 to
                   argc-1.
date(a)            assigns the current date to the alfa variable a in the format 'dd
                   mmm yy ', where 'mmm' is the first three characters of the month,
                   i.e. 'Apr'.
flush(f)           writes the output buffered for Pascal file f into the associated UNIX
                   file.
halt               terminates the execution of the program with a control flow backtrace.
linelimit(f,x)‡    with f a textfile and x an integer expression causes the program to
                   be abnormally terminated if more than x lines are written on file f.
                   If x is less than 0 then no limit is imposed.
message(x,...)     causes the parameters, which have the format of those to the built-in
                   procedure write, to be written unbuffered on the diagnostic unit
                   2, almost always the user's terminal.
null               a procedure of no arguments which does absolutely nothing. It is
                   useful as a place holder, and is generated by pxp in place of the
                   invisible empty statement.
remove(a)          where a is a string causes the UNIX file whose name is a, with trailing
                   blanks eliminated, to be removed.
reset(f,a)         where a is a string causes the file whose name is a (with blanks
                   trimmed) to be associated with f in addition to the normal function
                   of reset.
rewrite(f,a)       is analogous to 'reset' above.
stlimit(i)         where i is an integer sets the statement limit to be i statements.
                   Specifying the p option to pc disables statement limit counting.
time(a)            causes the current time in the form ' hh:mm:ss ' to be assigned to
                   the alfa variable a.

†The current translator makes a special case of the construct 'if ... in [ ... ]' and enforces only the more lax
restriction on 16 bit arithmetic given above in this case.
‡Currently ignored by pdp-11 px.

Non-standard functions
argc               returns the count of arguments when the Pascal program was
                   invoked. Argc is always at least 1.
card(x)            returns the cardinality of the set x, i.e. the number of elements contained in the set.
clock              returns an integer which is the number of central processor milliseconds of user time used by this process.
expo(x)            yields the integer valued exponent of the floating-point representation of x; expo(x) = entier(log2(abs(x))).
random(x)          where x is a real parameter, evaluated but otherwise ignored,
                   invokes a linear congruential random number generator. Successive
                   seeds are generated as (seed*a + c) mod m and the new random
                   number is a normalization of the seed to the range 0.0 to 1.0; a is
                   62605, c is 113218009, and m is 536870912. The initial seed is
                   7774755.
seed(i)            where i is an integer sets the random number generator seed to i
                   and returns the previous seed. Thus seed(seed(i)) has no effect
                   except to yield value i.
sysclock           an integer function of no arguments returns the number of central
                   processor milliseconds of system time used by this process.
undefined(x)       a Boolean function. Its argument is a real number and it always
                   returns false.
wallclock          an integer function of no arguments returns the time in seconds
                   since 00:00:00 GMT January 1, 1970.
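The description of random and seed is concrete enough to sketch. The Python class below uses the constants given above; everything else is an assumption made for illustration: the class name PascalRandom is hypothetical, and dividing the new seed by m is only one plausible reading of "a normalization of the seed to the range 0.0 to 1.0".

```python
A = 62605
C = 113218009
M = 536870912            # 2 to the 29th power
INITIAL_SEED = 7774755


class PascalRandom:
    """Hypothetical model of the generator behind random(x) and seed(i)."""

    def __init__(self, seed=INITIAL_SEED):
        self._seed = seed

    def random(self, x=0.0):
        """Advance the seed; the argument x is ignored, as in the text."""
        self._seed = (self._seed * A + C) % M
        return self._seed / M            # assumed normalization

    def seed(self, i):
        """Set the seed to i and return the previous seed."""
        previous = self._seed
        self._seed = i
        return previous


gen = PascalRandom()
print(gen.random(), gen.random())
print(gen.seed(gen.seed(5)))             # leaves the generator state unchanged
```

The last line illustrates why seed(seed(i)) has no effect: the inner call installs i and returns the old seed, and the outer call reinstalls the old seed and returns i.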

A.5. Remarks on standard and portable Pascal
It is occasionally desirable to prepare Pascal programs which will be acceptable at other
Pascal installations. While certain system dependencies are bound to creep in, judicious design
and programming practice can usually eliminate most of the non-portable usages. Wirth's Pascal Report concludes with a standard for implementation and program exchange.
In particular, the following differences may cause trouble when attempting to transport
programs between this implementation and Pascal 6000-3.4. Using the s translator option may
serve to indicate many problem areas.†
†The s option does not, however, check that identifiers differ in the first 8 characters. Pi and pc also do not
check the semantics of packed.


Features not available in Berkeley Pascal
Segmented files and associated functions and procedures.
The function trunc with two arguments.
Arrays whose indices exceed the capacity of 16 bit arithmetic.
Features available in Berkeley Pascal but not in Pascal 6000-3.4
The procedures reset and rewrite with file names.
The functions argc, seed, sysclock, and wallclock.
The procedures argv, flush, and remove.
Message with arguments other than character strings.
Write with keyword hex.
The assert statement.
Reading and writing of enumerated types.
Allowing functions to return structures.
Separate compilation of programs.
Comparison of records.
Other problem areas
Sets and strings are more general in Berkeley Pascal; see the restrictions given in the
Jensen-Wirth User Manual for details on the 6000-3.4 restrictions.
The character set differences may cause problems, especially the use of the function chr,
characters as arguments to ord, and comparisons of characters, since the character set ordering
differs between the two machines.
The Pascal 6000-3.4 compiler uses a less strict notion of type equivalence. In Berkeley
Pascal, types are considered identical only if they are represented by the same type identifier.
Thus, in particular, unnamed types are unique to the variables/fields declared with them.
Pascal 6000-3.4 doesn't recognize our option flags, so it is wise to put the control of
Berkeley Pascal options to the end of option lists or, better yet, restrict the option list length to
one.
For Pascal 6000-3.4 the ordering of files in the program statement has significance. It is
desirable to place input and output as the first two files in the program statement.

Franz Lisp Manual 2-211

CHAPTER 1
FRANZ LISP

1.1. FRANZ LISP† was created as a tool to further research in symbolic and algebraic manipulation, artificial intelligence, and programming languages at the University of California at Berkeley. Its roots are in a PDP-11 Lisp system which originally came from Harvard. As it grew it adopted features of Maclisp and Lisp Machine Lisp which enables our
work to be shared with colleagues at the Laboratory for Computer Science at M.I.T.
Substantial compatibility with other Lisp dialects (Interlisp, UCILisp, CMULisp) is
achieved by means of support packages and compiler switches. The heart of FRANZ LISP
is written almost entirely in the programming language C. Of course, it has been greatly
extended by additions written in Lisp. A small part is written in the assembly language
for the current host machines, VAXen and a couple of flavors of 68000. Because FRANZ
LISP is written in C, it is relatively portable and easy to comprehend.
FRANZ LISP is capable of running large lisp programs in a timesharing environment,
has facilities for arrays and user defined structures, has a user controlled reader with
character and word macro capabilities, and can interact directly with compiled Lisp, C,
Fortran, and Pascal code.
This document is a reference manual for the FRANZ LISP system. It is not a Lisp
primer or introduction to the language. Some parts will be of interest only to those
maintaining FRANZ LISP at their computer site. This document is divided into four
Movements. In the first one we will attempt to describe the language of FRANZ LISP
precisely and completely as it now stands (Opus 38.69, June 1983). In the second Movement we will look at the reader, function types, arrays and exception handling. In the
third Movement we will look at several large support packages written to help the FRANZ
LISP user, namely the trace package, compiler, fixit and stepping package. Finally the
fourth movement contains an index into the other movements. In the rest of this
chapter we shall examine the data types of FRANZ LISP. The conventions used in the
description of the FRANZ LISP functions will be given in §1.3 -- it is very important that
these conventions are understood.

1.2. Data Types FRANZ LISP has fourteen data types. In this section we shall look in
detail at each type and if a type is divisible we shall look inside it. There is a Lisp function type which will return the type name of a lisp object. This is the official FRANZ LISP
name for that type and we will use this name and this name only in the manual to avoid
confusing the reader. The types are listed in terms of importance rather than alphabetically.

†It is rumored that this name has something to do with Franz Liszt [Frants List] (1811-1886) a Hungarian composer and keyboard virtuoso. These allegations have never been proven.


1.2.0. lispval This is the name we use to describe any lisp object. The function type
will never return 'lispval'.

1.2.1. symbol This object corresponds to a variable in most other programming
languages. It may have a value or may be 'unbound'. A symbol may be lambda
bound meaning that its current value is stored away somewhere and the symbol is
given a new value for the duration of a certain context. When the Lisp processor
leaves that context, the symbol's current value is thrown away and its old value is
restored.
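The save-and-restore behavior of lambda binding can be modelled outside Lisp. The following Python sketch is only an illustration: the Symbol class, the UNBOUND marker and call_with_binding are hypothetical names, and nothing here is claimed to reflect how FRANZ LISP actually stores bindings.

```python
# Hypothetical model of lambda binding; not FRANZ LISP's implementation.
UNBOUND = object()  # marker for a symbol that has no value


class Symbol:
    def __init__(self, pname):
        self.pname = pname
        self.value = UNBOUND


def call_with_binding(symbol, new_value, body):
    """Run body() with symbol temporarily bound to new_value."""
    saved = symbol.value          # the current value is stored away
    symbol.value = new_value      # new value for the duration of the context
    try:
        return body()
    finally:
        symbol.value = saved      # the old value is restored on exit


x = Symbol("x")
x.value = 10
inner = call_with_binding(x, 99, lambda: x.value)
```

Inside the call the symbol has the new value; once the context is left, the value it had before is visible again.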
A symbol may also have a function binding. This function binding is static; it cannot
be lambda bound. Whenever the symbol is used in the functional position of a Lisp
expression the function binding of the symbol is examined (see Chapter 4 for more
details on evaluation).
A symbol may also have a property list, another static data structure. The property list
consists of a list of an even number of elements, considered to be grouped as pairs.
The first element of the pair is the indicator, the second the value of that indicator.
Each symbol has a print name (pname) which is how this symbol is accessed from
input and referred to on (printed) output.
A symbol also has a hashlink used to link symbols together in the oblist -- this field is
inaccessible to the lisp user.
Symbols are created by the reader and by the functions concat, maknam and their
derivatives. Most symbols live on FRANZ LISP's sole oblist, and therefore two symbols
with the same print name are usually the exact same object (they are eq). Symbols
which are not on the oblist are said to be uninterned. The function maknam
creates uninterned symbols while concat creates interned ones.

Subpart name       Get value    Set value    Type

value              eval         set          lispval
                                setq
property list      plist        setplist     list or nil
                   get          putprop
                                defprop
function binding   getd         putd         array, binary, list
                                def          or nil
print name         get_pname                 string
hash link

1.2.2. list A list cell has two parts, called the car and cdr. List cells are created by the
function cons.

Subpart name       Get value    Set value    Type

car                car          rplaca       lispval
cdr                cdr          rplacd       lispval

1.2.3. binary This type acts as a function header for machine coded functions. It has
two parts, a pointer to the start of the function and a symbol whose print name
describes the argument discipline. The discipline (if lambda, macro or nlambda) determines whether the arguments to this function will be evaluated by the caller before
this function is called. If the discipline is a string (specifically "subroutine", "function",
"integer-function", "real-function", "c-function", "double-c-function", or "vector-c-function")
then this function is a foreign subroutine or function (see §8.5 for more details on
this). Although the type of the entry field of a binary type object is usually string or
other, the object pointed to is actually a sequence of machine instructions.
Objects of type binary are created by mfunction, cfasl, and getaddress.

Subpart name       Get value    Set value    Type

entry              getentry                  string or fixnum
discipline         getdisc      putdisc      symbol or fixnum

1.2.4. fixnum A fixnum is an integer constant in the range -2^31 to 2^31 - 1. Small
fixnums (-1024 to 1023) are stored in a special table so they needn't be allocated each
time one is needed.

1.2.5. flonum A flonum is a double precision real number in the range ±2.9×10^-37 to
±1.7×10^38. There are approximately sixteen decimal digits of precision.

1.2.6. bignum A bignum is an integer of potentially unbounded size. When integer
arithmetic exceeds the limits of fixnums mentioned above, the calculation is automatically done with bignums. Should calculation with bignums give a result which can be
represented as a fixnum, then the fixnum representation will be used†. This contraction is known as integer normalization. Many Lisp functions assume that integers are
normalized. Bignums are composed of a sequence of list cells and a cell known as an
sdot. The user should consider a bignum structure indivisible and use functions such
as haipart, and bignum-leftshift to extract parts of it.

†The current algorithms for integer arithmetic operations will return (in certain cases) a result between ±2^30 and
2^31 as a bignum although this could be represented as a fixnum.


1.2.7. string A string is a null terminated sequence of characters. Most functions of
symbols which operate on the symbol's print name will also work on strings. The
default reader syntax is set so that a sequence of characters surrounded by double
quotes is a string.

1.2.8. port A port is a structure which the system I/O routines can reference to
transfer data between the Lisp system and external media. Unlike other Lisp objects
there are a very limited number of ports (20). Ports are allocated by infile and outfile
and deallocated by close and resetio. The print function prints a port as a percent sign
followed by the name of the file it is connected to (if the port was opened by fileopen,
infile, or outfile). During initialization, FRANZ LISP binds the symbol piport to a port
attached to the standard input stream. This port prints as %$stdin. There are ports
connected to the standard output and error streams, which print as %$stdout and
%$stderr. This is discussed in more detail at the beginning of Chapter 5.

1.2.9. vector Vectors are indexed sequences of data. They can be used to implement a
notion of user-defined types, via their associated property list. They make hunks
(see below) logically unnecessary, although hunks are very efficiently garbage collected. There is a second kind of vector, called an immediate-vector, which stores
binary data. The name that the function type returns for immediate-vectors is vectori.
Immediate-vectors could be used to implement strings and block-flonum arrays, for
example. Vectors are discussed in chapter 9. The functions new-vector, and vector,
can be used to create vectors.

Subpart name       Get value    Set value    Type

datum[i]           vref         vset         lispval
property           vprop        vsetprop     lispval
                                vputprop
size               vsize                     fixnum

1.2.10. array Arrays are rather complicated types and are fully described in Chapter 9.
An array consists of a block of contiguous data, a function to access that data and
auxiliary fields for use by the accessing function. Since an array's accessing function
is created by the user, an array can have any form the user chooses (e.g. n-dimensional, triangular, or hash table).
Arrays are created by the function marray.

Subpart name       Get value    Set value    Type

access function    getaccess    putaccess    binary, list
                                             or symbol
auxiliary          getaux       putaux       lispval
data               arrayref     replace      block of contiguous
                                set          lispval
length             getlength    putlength    fixnum
delta              getdelta     putdelta     fixnum

1.2.11. value A value cell contains a pointer to a lispval. This type is used mainly by
arrays of general lisp objects. Value cells are created with the ptr function. A value
cell containing a pointer to the symbol 'foo' is printed as '(ptr to)foo'

1.2.12. hunk A hunk is a vector of from 1 to 128 lispvals. Once a hunk is created (by
hunk or makhunk) it cannot grow or shrink. The access time for an element of a
hunk is slower than a list cell element but faster than an array. Hunks are really only
allocated in sizes which are powers of two, but can appear to the user to be any size
in the 1 to 128 range. Users of hunks must realize that (not (atom 'lispval)) will
return true if lispval is a hunk. Most lisp systems do not have a direct test for a list
cell and instead use the above test and assume that a true result means lispval is a list
cell. In FRANZ LISP you can use dtpr to check for a list cell. Although hunks are not
list cells, you can still access the first two hunk elements with cdr and car and you can
access any hunk element with cxr†. You can set the value of the first two elements of
a hunk with rplacd and rplaca and you can set the value of any element of the hunk
with rplacx. A hunk is printed by printing its contents surrounded by { and }. However, a hunk cannot be read in this way in the standard lisp system. It is easy to
write a reader macro to do this if desired.

1.2.13. other Occasionally, you can obtain a pointer to storage not allocated by the lisp
system. One example of this is the entry field of those FRANZ LISP functions written
in C. Such objects are classified as of type other. Foreign functions which call malloc
to allocate their own space, may also inadvertently create such objects. The garbage
collector is supposed to ignore such objects.

1.3. Documentation The conventions used in the following chapters were designed to give
a great deal of information in a brief space. The first line of a function description contains the function name in bold face and then lists the arguments, if any. The arguments
all have names which begin with a letter or letters and an underscore. The letter(s)
gives the allowable type(s) for that argument according to this table.

†In a hunk, the function cdr references the first element and car the second.

Letter   Allowable type(s)

g        any type
s        symbol (although nil may not be allowed)
t        string
l        list (although nil may be allowed)
n        number (fixnum, flonum, bignum)
i        integer (fixnum, bignum)
x        fixnum
b        bignum
f        flonum
u        function type (either binary or lambda body)
y        binary
v        vector
V        vectori
a        array
e        value
p        port (or nil)
h        hunk

In the first line of a function description, those arguments preceded by a quote mark are
evaluated (usually before the function is called). The quoting convention is used so that
we can give a name to the result of evaluating the argument and we can describe the
allowable types. If an argument is not quoted it does not mean that that argument will
not be evaluated, but rather that if it is evaluated, the time at which it is evaluated will
be specifically mentioned in the function description. Optional arguments are surrounded by square brackets. An ellipsis (...) means zero or more occurrences of an
argument of the directly preceding type.


CHAPTER 2
Data Structure Access

The following functions allow one to create and manipulate the various types of lisp data
structures. Refer to §1.2 for details of the data structures known to FRANZ LISP.

2.1. Lists
The following functions exist for the creation and manipulation of lists. Lists are
composed of a linked list of objects called either 'list cells', 'cons cells' or 'dtpr cells'.
Lists are normally terminated with the special symbol nil. nil is both a symbol and a
representation for the empty list ().

2.1.1. list creation
(cons 'g_arg1 'g_arg2)
RETURNS: a new list cell whose car is g_arg1 and whose cdr is g_arg2.

(xcons 'g_arg1 'g_arg2)
EQUIVALENT TO: (cons 'g_arg2 'g_arg1)

(ncons 'g_arg)
EQUIVALENT TO: (cons 'g_arg nil)

(list ['g_arg1 ... ])
RETURNS: a list whose elements are the g_argi.

(append 'l_arg1 'l_arg2)
RETURNS: a list containing the elements of l_arg1 followed by l_arg2.
NOTE: To generate the result, the top level list cells of l_arg1 are duplicated and the cdr of
the last list cell is set to point to l_arg2. Thus this is an expensive operation if
l_arg1 is large. See the descriptions of nconc and tconc for cheaper ways of doing
the append if the list l_arg1 can be altered.

(append1 'l_arg1 'g_arg2)
RETURNS: a list like l_arg1 with g_arg2 as the last element.
NOTE: this is equivalent to (append 'l_arg1 (list 'g_arg2)).

; A common mistake is using append to add one element to the end of a list
-> (append '(a b c d) 'e)
(a b c d . e)
; The user intended to say:
-> (append '(a b c d) '(e))
(a b c d e)
; better is append1
-> (append1 '(a b c d) 'e)
(a b c d e)

(quote! [g_qformi] ... [! 'g_eformi] ... [!! 'l_formi] ...)
RETURNS: The list resulting from the splicing and insertion process described below.
NOTE: quote! is the complement of the list function. list forms a list by evaluating each form
in the argument list; evaluation is suppressed if the form is quoted. In quote!,
each form is implicitly quoted. To be evaluated, a form must be preceded by one
of the evaluate operations ! and !!. ! g_eform evaluates g_eform and the value is
inserted in the place of the call; !! l_form evaluates l_form and the value is spliced
into the place of the call.
'Splicing in' means that the parentheses surrounding the list are removed as the
example below shows. Use of the evaluate operators can occur at any level in a
form argument.
Another way to get the effect of the quote! function is to use the backquote character macro (see §8.3.3).

(quote! cons ! (cons 1 2) 3) = (cons (1 . 2) 3)
(quote! 1 !! (list 2 3 4) 5) = (1 2 3 4 5)
(setq quoted 'evaled)(quote! ! ((I am ! quoted))) = ((I am evaled))
(quote! try ! '(this ! one)) = (try (this ! one))

(bignum-to-list 'b_arg)
RETURNS: A list of the fixnums which are used to represent the bignum.
NOTE: the inverse of this function is list-to-bignum.

(list-to-bignum 'l_ints)
WHERE: l_ints is a list of fixnums.
RETURNS: a bignum constructed of the given fixnums.
NOTE: the inverse of this function is bignum-to-list.

2.1.2. list predicates
(dtpr 'g_arg)
RETURNS: t iff g_arg is a list cell.
NOTE: that (dtpr '()) is nil.

(listp 'g_arg)
RETURNS: t iff g_arg is a list object or nil.

(tailp 'l_x 'l_y)
RETURNS: l_x, if a list cell eq to l_x is found by cdring down l_y zero or more times, nil
otherwise.

-> (setq x '(a b c d) y (cddr x))
(c d)
-> (and (dtpr x) (listp x))    ; x and y are dtprs and lists
t
-> (dtpr '())                  ; () is the same as nil and is not a dtpr
nil
-> (listp '())                 ; however it is a list
t
-> (tailp y x)
(c d)

(length 'l_arg)
RETURNS: the number of elements in the top level of list l_arg.

2.1.3. list accessing

(car 'l_arg)
(cdr 'l_arg)
RETURNS: cons cell. (car (cons x y)) is always x, (cdr (cons x y)) is always y. In FRANZ
LISP, the cdr portion is located first in memory. This is hardly noticeable, and
seems to bother few.

(c..r 'lh_arg)
WHERE: the .. represents any positive number of a's and d's.
RETURNS: the result of accessing the list structure in the way determined by the function
name. The a's and d's are read from right to left, a d directing the access down
the cdr part of the list cell and an a down the car part.
NOTE: lh_arg may also be nil, and it is guaranteed that the car and cdr of nil is nil. If
lh_arg is a hunk, then (car 'lh_arg) is the same as (cxr 1 'lh_arg) and (cdr 'lh_arg)
is the same as (cxr 0 'lh_arg).
It is generally hard to read and understand the context of functions with large
strings of a's and d's, but these functions are supported by rapid accessing and
open-compiling (see Chapter 12).

(nth 'x_index 'l_list)
RETURNS: the nth element of l_list, assuming zero-based index. Thus (nth 0 l_list) is the
same as (car l_list). nth is both a function, and a compiler macro, so that more
efficient code might be generated than for nthelem (described below).
NOTE: If x_index is negative or is not less than the length of the list, nil is returned.
(nthcdr 'x_index 'l_list)
RETURNS: the result of cdring down the list l_list x_index times.
NOTE: If x_index is less than 0, then (cons nil 'l_list) is returned.
(nthelem 'x_arg1 'l_arg2)
RETURNS: The x_arg1'st element of the list l_arg2.
NOTE: This function comes from the PDP-11 lisp system.
(last 'l_arg)
RETURNS: the last list cell in the list l_arg.

EXAMPLE: last does NOT return the last element of a list!
(last '(a b)) = (b)

(ldiff 'l_x 'l_y)
RETURNS: a list of all elements in l_x but not in l_y, i.e., the list difference of l_x and
l_y.
NOTE: l_y must be a tail of l_x, i.e., eq to the result of applying some number of cdr's to
l_x. Note that the value of ldiff is always new list structure unless l_y is nil,
in which case (ldiff l_x nil) is l_x itself. If l_y is not a tail of l_x, ldiff generates
an error.
EXAMPLE: (ldiff 'l_x (member 'g_foo 'l_x)) gives all elements in l_x up to the first g_foo.

2.1.4. list manipulation
(rplaca 'lh_arg1 'g_arg2)
RETURNS: the modified lh_arg1.
SIDE EFFECT: the car of lh_arg1 is set to g_arg2. If lh_arg1 is a hunk then the second
element of the hunk is set to g_arg2.
(rplacd 'lh_arg1 'g_arg2)
RETURNS: the modified lh_arg1.
SIDE EFFECT: the cdr of lh_arg1 is set to g_arg2. If lh_arg1 is a hunk then the first element of the hunk is set to g_arg2.
(attach 'g_x 'l_l)
RETURNS: l_l whose car is now g_x, whose cadr is the original (car l_l), and whose cddr is
the original (cdr l_l).
NOTE: what happens is that g_x is added to the beginning of list l_l yet maintaining the
same list cell at the beginning of the list.
(delete 'g_val 'l_list ['x_count])
RETURNS: the result of splicing g_val from the top level of l_list no more than x_count
times.
NOTE: x_count defaults to a very large number, thus if x_count is not given, all
occurrences of g_val are removed from the top level of l_list. g_val is compared
with successive car's of l_list using the function equal.
SIDE EFFECT: l_list is modified using rplacd, no new list cells are used.
(delq 'g_val 'l_list ['x_count])
(dremove 'g_val 'l_list ['x_count])
RETURNS: the result of splicing g_val from the top level of l_list no more than x_count
times.
NOTE: delq (and dremove) are the same as delete except that eq is used for comparison
instead of equal.

; note that you should use the value returned by delete or delq
; and not assume that l_list will always show the deletions.
; For example
-> (setq test '(a b c a d e))
(a b c a d e)
-> (delete 'a test)
(b c d e)      ; the value returned is what we would expect
-> test
(a b c d e)    ; but test still has the first a in the list!
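The reason test keeps its leading a can be seen in a small model of list cells. The Python sketch below is an assumption-laden illustration, not FRANZ LISP source: the Cons class and the helpers from_list and to_list are hypothetical, Python's == stands in for equal, None stands in for nil, and the optional x_count argument is left out.

```python
class Cons:
    """A list cell with a car and a cdr; None plays the role of nil."""
    def __init__(self, car, cdr=None):
        self.car, self.cdr = car, cdr


def from_list(items):
    head = None
    for item in reversed(items):
        head = Cons(item, head)
    return head


def to_list(cell):
    out = []
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out


def delete(val, cell):
    """Splice out every top-level cell whose car equals val, reusing cells."""
    while cell is not None and cell.car == val:
        cell = cell.cdr                  # leading matches are skipped, not unlinked
    prev = cell
    while prev is not None and prev.cdr is not None:
        if prev.cdr.car == val:
            prev.cdr = prev.cdr.cdr      # the analogue of rplacd
        else:
            prev = prev.cdr
    return cell


test = from_list(["a", "b", "c", "a", "d", "e"])
result = delete("a", test)
```

The variable test still points at the original first cell, which no rplacd-style update can remove; only the returned value starts past it.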

(remq 'g_x 'l_l ['x_count])
(remove 'g_x 'l_l)
RETURNS: a copy of l_l with all top level elements equal to g_x removed. remq uses eq
instead of equal for comparisons.
NOTE: remove does not modify its arguments like delete and delq do.
(insert 'g_object 'l_list 'u_comparefn 'g_nodups)
RETURNS: a list consisting of l_list with g_object destructively inserted in a place determined by the ordering function u_comparefn.
NOTE: (comparefn 'g_x 'g_y) should return something non-nil if g_x can precede g_y in
sorted order, nil if g_y must precede g_x. If u_comparefn is nil, alphabetical order
will be used. If g_nodups is non-nil, an element will not be inserted if an equal element is already in the list. insert does binary search to determine where to insert
the new element.
(merge 'l_data1 'l_data2 'u_comparefn)
RETURNS: the merged list of the two input sorted lists l_data1 and l_data2 using binary
comparison function u_comparefn.
NOTE: (comparefn 'g_x 'g_y) should return something non-nil if g_x can precede g_y in
sorted order, nil if g_y must precede g_x. If u_comparefn is nil, alphabetical order
will be used. u_comparefn should be thought of as "less than or equal". merge
changes both of its data arguments.
(subst 'g_x 'g_y 'l_s)
(dsubst 'g_x 'g_y 'l_s)
RETURNS: the result of substituting g_x for all equal occurrences of g_y at all levels in l_s.
NOTE: If g_y is a symbol, eq will be used for comparisons. The function subst does not
modify l_s but the function dsubst (destructive substitution) does.
(lsubst 'l_x 'g_y 'l_s)
RETURNS: a copy of l_s with l_x spliced in for every occurrence of g_y at all levels.
Splicing in means that the parentheses surrounding the list l_x are removed as
the example below shows.

-> (subst '(a b c) 'x '(x y z (x y z) (x y z)))
((a b c) y z ((a b c) y z) ((a b c) y z))
-> (lsubst '(a b c) 'x '(x y z (x y z) (x y z)))
(a b c y z (a b c y z) (a b c y z))

(subpair 'l_old 'l_new 'l_expr)
WHERE: there are the same number of elements in l_old as l_new.
RETURNS: the list l_expr with all occurrences of an object in l_old replaced by the
corresponding one in l_new. When a substitution is made, a copy of the value
to substitute in is not made.
EXAMPLE: (subpair '(a c) '(x y) '(a b c d)) = (x b y d)

(nconc 'l_arg1 'l_arg2 ['l_arg3 ... ])
RETURNS: A list consisting of the elements of l_arg1 followed by the elements of l_arg2
followed by l_arg3 and so on.
NOTE: The cdr of the last list cell of l_argi is changed to point to l_argi+1.

; nconc is faster than append because it doesn't allocate new list cells.
-> (setq lis1 '(a b c))
(a b c)
-> (setq lis2 '(d e f))
(d e f)
-> (append lis1 lis2)
(a b c d e f)
-> lis1
(a b c)          ; note that lis1 has not been changed by append
-> (nconc lis1 lis2)
(a b c d e f)    ; nconc returns the same value as append
-> lis1
(a b c d e f)    ; but in doing so alters lis1
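The difference in cost and side effect between append and nconc follows from which list cells each one touches. The Python sketch below models this under stated assumptions: Cons, from_list and to_list are hypothetical helpers, None stands for nil, and only the two-argument forms are shown. It is not FRANZ LISP source.

```python
class Cons:
    """A list cell with a car and a cdr; None plays the role of nil."""
    def __init__(self, car, cdr=None):
        self.car, self.cdr = car, cdr


def from_list(items):
    head = None
    for item in reversed(items):
        head = Cons(item, head)
    return head


def to_list(cell):
    out = []
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out


def append(l1, l2):
    """Duplicate the top-level cells of l1; the result shares l2."""
    if l1 is None:
        return l2
    return Cons(l1.car, append(l1.cdr, l2))


def nconc(l1, l2):
    """Point the cdr of l1's last cell at l2; no new cells are allocated."""
    if l1 is None:
        return l2
    last = l1
    while last.cdr is not None:
        last = last.cdr
    last.cdr = l2
    return l1


lis1 = from_list(["a", "b", "c"])
lis2 = from_list(["d", "e", "f"])
copied = append(lis1, lis2)
after_append = to_list(lis1)     # lis1 as seen after append
joined = nconc(lis1, lis2)
```

append allocates one new cell per element of its first argument, while nconc returns the very same first cell it was given.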

(reverse 'l_arg)
(nreverse 'l_arg)
RETURNS: the list l_arg with the elements at the top level in reverse order.
NOTE: The function nreverse does the reversal in place, that is the list structure is
modified.
(nreconc 'l_arg 'g_arg)
EQUIVALENT TO: (nconc (nreverse 'l_arg) 'g_arg)

2.2. Predicates
The following functions test for properties of data objects. When the result of the
test is either 'false' or 'true', then nil will be returned for 'false' and something other
than nil (often t) will be returned for 'true'.

(arrayp 'g_arg)
RETURNS: t iff g_arg is of type array.

(atom 'g_arg)
RETURNS: t iff g_arg is not a list or hunk object.
NOTE: (atom '()) returns t.

(bcdp 'g_arg)
RETURNS: t iff g_arg is a data object of type binary.
NOTE: the name of this function is a throwback to the PDP-11 Lisp system.

(bigp 'g_arg)
RETURNS: t iff g_arg is a bignum.

(dtpr 'g_arg)
RETURNS: t iff g_arg is a list cell.
NOTE: that (dtpr '()) is nil.

(hunkp 'g_arg)
RETURNS: t iff g_arg is a hunk.

(listp 'g_arg)
RETURNS: t iff g_arg is a list object or nil.

(stringp 'g_arg)
RETURNS: t iff g_arg is a string.

(symbolp 'g_arg)
RETURNS: t iff g_arg is a symbol.

(valuep 'g_arg)
RETURNS: t iff g_arg is a value cell.

(vectorp 'v_vector)
RETURNS: t iff the argument is a vector.

(vectorip 'v_vector)
RETURNS: t iff the argument is an immediate-vector.


(type 'g_arg)
(typep 'g_arg)
RETURNS: a symbol whose pname describes the type of g_arg.

(signp s_test 'g_val)
RETURNS: t iff g_val is a number and the given test s_test on g_val returns true.
NOTE: The fact that signp simply returns nil if g_val is not a number is probably the most
important reason that signp is used. The permitted values for s_test and what they
mean are given in this table.
s_test   tested

l        g_val < 0
le       g_val ≤ 0
e        g_val = 0
n        g_val ≠ 0
ge       g_val ≥ 0
g        g_val > 0
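The signp table maps directly onto a lookup of comparison operators. The Python sketch below is a hypothetical model, not the FRANZ LISP definition: it uses Python strings for the s_test names, True and False for t and nil, and treats only Python int and float values as numbers.

```python
import operator

# s_test names mapped to the comparison against zero listed in the table.
SIGN_TESTS = {
    "l": operator.lt,
    "le": operator.le,
    "e": operator.eq,
    "n": operator.ne,
    "ge": operator.ge,
    "g": operator.gt,
}


def signp(s_test, g_val):
    """Return True iff g_val is a number and passes s_test.

    A non-number yields False instead of an error, which is the
    property the NOTE singles out as the main reason to use signp.
    """
    if isinstance(g_val, bool) or not isinstance(g_val, (int, float)):
        return False
    return SIGN_TESTS[s_test](g_val, 0)
```

For instance, signp("ge", 0) is true, while signp("g", "foo") is simply false.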

(eq 'g_argl 'g_arg2)
RETURNS: t if g_argl and g_arg2 are the exact same lisp object.
NOTE: Eq simply tests if g_argl and g_arg2 are located in the exact same place in memory.
Lisp objects which print the same are not necessarily eq. The only objects
guaranteed to be eq are interned symbols with the same print name. [Unless a
symbol is created in a special way (such as with uconcat or maknam) it will be
interned.]
(neq 'g_x 'g_y)
RETURNS: t if g_x is not eq to g_y, otherwise nil.

(equal 'g_argl 'g_arg2)
(eqstr 'g_argl 'g_arg2)
RETURNS: t iff g_arg1 and g_arg2 have the same structure as described below.
NOTE: g_arg1 and g_arg2 are equal if
(1) they are eq.
(2) they are both fixnums with the same value
(3) they are both flonums with the same value
(4) they are both bignums with the same value
(5) they are both strings and are identical.
(6) they are both lists and their cars and cdrs are equal


; eq is much faster than equal, especially in compiled code,
; however you cannot use eq to test for equality of numbers outside
; of the range -1024 to 1023. equal will always work.
- > (eq 1023 1023)
t

- > (eq 1024 1024)
nil

- > (equal 1024 1024)
t
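The behavior of eq on numbers follows from the table of small fixnums (-1024 to 1023) described in §1.2.4: values in that range are shared objects, while larger ones are allocated each time. The Python sketch below models that idea only. The Fixnum class, the SMALL table and make_fixnum are hypothetical names, and Python's identity test (is) stands in for eq.

```python
class Fixnum:
    def __init__(self, value):
        self.value = value


# Preallocated objects for the small range, created once.
SMALL = {n: Fixnum(n) for n in range(-1024, 1024)}


def make_fixnum(n):
    """Return the shared object for a small value, a fresh one otherwise."""
    cached = SMALL.get(n)
    return cached if cached is not None else Fixnum(n)


def eq(a, b):
    return a is b                 # same place in memory


def equal(a, b):
    return a.value == b.value     # same numeric value


small_eq = eq(make_fixnum(1023), make_fixnum(1023))
large_eq = eq(make_fixnum(1024), make_fixnum(1024))
large_equal = equal(make_fixnum(1024), make_fixnum(1024))
```

Two separately created objects for 1024 hold the same value but are not the same object, so only the value comparison succeeds.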

(not 'g_arg)
(null 'g_arg)
RETURNS: t iff g_arg is nil.

(member 'g_arg1 'l_arg2)
(memq 'g_arg1 'l_arg2)
RETURNS: that part of the l_arg2 beginning with the first occurrence of g_arg1. If g_arg1 is
not in the top level of l_arg2, nil is returned.
NOTE: member tests for equality with equal, memq tests for equality with eq.

2.3. Symbols and Strings
In many of the following functions the distinction between symbols and strings is
somewhat blurred. To remind ourselves of the difference, a string is a null terminated
sequence of characters, stored as compactly as possible. Strings are used as constants in
FRANZ LISP. They eval to themselves. A symbol has additional structure: a value, property list, function binding, as well as its external representation (or print-name). If a
symbol is given to one of the string manipulation functions below, its print name will be
used.
Another popular way to represent strings in Lisp is as a list of fixnums which
represent characters. The suffix 'n' to a string manipulation function indicates that it
returns a string in this form.

2.3.1. symbol and string creation

(concat ['stn_arg1 ... ])
(uconcat ['stn_arg1 ... ])
RETURNS: a symbol whose print name is the result of concatenating the print names, string
characters or numerical representations of the stn_argi.
NOTE: If no arguments are given, a symbol with a null pname is returned. concat places
the symbol created on the oblist, the function uconcat does the same thing but does
not place the new symbol on the oblist.
EXAMPLE: (concat 'abc (add 3 4) "def") = abc7def
(concatl 'l_arg)
EQUIVALENT TO: (apply 'concat 'l_arg)

(implode 'l_arg)
(maknam 'l_arg)
WHERE: l_arg is a list of symbols, strings and small fixnums.
RETURNS: The symbol whose print name is the result of concatenating the first characters
of the print names of the symbols and strings in the list. Any fixnums are converted to the equivalent ascii character. In order to concatenate entire strings or
print names, use the function concat.
NOTE: implode interns the symbol it creates, maknam does not.
(gensym ['s_leader])
RETURNS: a new uninterned atom beginning with the first character of s_leader's pname,
or beginning with g if s_leader is not given.
NOTE: The symbol looks like x0nnnnn where x is s_leader's first character and nnnnn is
the number of times you have called gensym.
(copysymbol 's_arg 'g_pred)
RETURNS: an uninterned symbol with the same print name as s_arg. If g_pred is non nil,
then the value, function binding and property list of the new symbol are made
eq to those of s_arg.
(ascii 'x_charnum)
WHERE: x_charnum is between 0 and 255.
RETURNS: a symbol whose print name is the single character whose fixnum representation
is x_charnum.

(intern 's_arg)
RETURNS: s_arg
SIDE EFFECT: s_arg is put on the oblist if it is not already there.

(remob 's_symbol)
RETURNS: s_symbol
SIDE EFFECT: s_symbol is removed from the oblist.

(rematom 's_arg)
RETURNS: t if s_arg is indeed an atom.
SIDE EFFECT: s_arg is put on the free atoms list, effectively reclaiming an atom cell.
NOTE: This function does not check to see if s_arg is on the oblist or is referenced anywhere. Thus calling rematom on an atom in the oblist may result in disaster when
that atom cell is reused!

2.3.2. string and symbol predicates
(boundp 's_name)
RETURNS: nil if s_name is unbound, that is, it has never been given a value. If s_name has
the value g_val, then (nil . g_val) is returned.
(alphalessp 'st_arg1 'st_arg2)
RETURNS: t iff the 'name' of st_arg1 is alphabetically less than the name of st_arg2. If
st_arg is a symbol then its 'name' is its print name. If st_arg is a string, then its
'name' is the string itself.

2.3.3. symbol and string accessing
(symeval 's_arg)
RETURNS: the value of symbol s_arg.
NOTE: It is illegal to ask for the value of an unbound symbol. This function has the same
effect as eval, but compiles into much more efficient code.
(get_pname 's_arg)
RETURNS: the string which is the print name of s_arg.


(plist 's_arg)
RETURNS: the property list of s_arg.

(getd 's_arg)
RETURNS: the function definition of s_arg or nil if there is no function definition.
NOTE: the function definition may turn out to be an array header.

(getchar 's_arg 'x_index)
(nthchar 's_arg 'x_index)
(getcharn 's_arg 'x_index)
RETURNS: the x_indexth character of the print name of s_arg or nil if x_index is less than
1 or greater than the length of s_arg's print name.
NOTE: getchar and nthchar return a symbol with a single character print name, getcharn
returns the fixnum representation of the character.

(substring 'st_string 'x_index ['x_length])
(substringn 'st_string 'x_index ['x_length])
RETURNS: a string of length at most x_length starting at x_indexth character in the string.
NOTE: If x_length is not given, all of the characters for x_index to the end of the string
are returned. If x_index is negative the string begins at the x_indexth character
from the end. If x_index is out of bounds, nil is returned.
NOTE: substring returns a list of symbols, substringn returns a list of fixnums. If substringn
is given a 0 x_length argument then a single fixnum which is the x_indexth character is returned.

2.3.4. symbol and string manipulation
(set 's_argl 'g_arg2)
RETURNS: g_arg2.
SIDE EFFECT: the value of s_argl is set to g_arg2.
(setq s_atml 'g_vall [ s_atm2 'g_val2 ...... ])
WHERE: the arguments are pairs of atom names and expressions.
RETURNS: the last g_vali.
SIDE EFFECT: each s_atmi is set to have the value g_vali.
NOTE: set evaluates all of its arguments, setq does not evaluate the s_atmi.

(desetq sl_pattern1 'g_exp1 [...])
RETURNS: g_expn
SIDE EFFECT: This acts just like setq if all the sl_patterni are symbols. If sl_patterni is a
list then it is a template which should have the same structure as g_expi.
The symbols in sl_pattern are assigned to the corresponding parts of g_exp.
EXAMPLE: (desetq (a b (c . d)) '(1 2 (3 4 5)))
sets a to 1, b to 2, c to 3, and d to (4 5).
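The template matching performed by desetq can be pictured as a parallel walk down the pattern and the value. The Python sketch below is a hypothetical model of that walk, not the FRANZ LISP definition: list cells are (car, cdr) tuples, None stands for nil, symbols are Python strings, assignments go into a dictionary named env, and lst is an invented helper.

```python
def lst(*items, tail=None):
    """Build a nil-terminated chain of (car, cdr) pairs."""
    for item in reversed(items):
        tail = (item, tail)
    return tail


def desetq(pattern, value, env):
    """Assign each symbol in pattern the corresponding part of value."""
    if pattern is None:              # nil in the template: nothing to assign
        return
    if isinstance(pattern, str):     # a symbol takes whatever sits here
        env[pattern] = value
        return
    car_pat, cdr_pat = pattern       # a list cell: walk both halves
    car_val, cdr_val = value
    desetq(car_pat, car_val, env)
    desetq(cdr_pat, cdr_val, env)


env = {}
pattern = lst("a", "b", ("c", "d"))      # (a b (c . d))
value = lst(1, 2, lst(3, 4, 5))          # (1 2 (3 4 5))
desetq(pattern, value, env)
```

Because d sits in the cdr position of the dotted pair, it receives the remaining list (4 5) and not a single element.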
(setplist 's_atm 'l_plist)
RETURNS: l_plist.
SIDE EFFECT: the property list of s_atm is set to l_plist.
(makunbound 's_arg)
RETURNS: s_arg
SIDE EFFECT: the value of s_arg is made 'unbound'. If the interpreter attempts to evaluate s_arg before it is again given a value, an unbound variable error will
occur.
(aexplode 's_arg)
(explode 'g_arg)
(aexplodec 's_arg)
(explodec 'g_arg)
(aexploden 's_arg)
(exploden 'g_arg)
RETURNS: a list of the characters used to print out s_arg or g_arg.
NOTE: The functions beginning with 'a' are internal functions which are limited to symbol
arguments. The functions aexplode and explode return a list of characters which print
would use to print the argument. These characters include all necessary escape
characters. Functions aexplodec and explodec return a list of characters which patom
would use to print the argument (i.e. no escape characters). Functions aexploden
and exploden are similar to aexplodec and explodec except that a list of fixnum
equivalents of characters are returned.

-> (setq x '|quote this \| ok?|)
|quote this \| ok?|
-> (explode x)
(q u o t e |\\| | | t h i s |\\| | | |\\| |\|| |\\| | | o k ?)
; note that |\\| just means the single character: backslash.
; and |\|| just means the single character: vertical bar
; and | | means the single character: space
-> (explodec x)
(q u o t e | | t h i s | | |\|| | | o k ?)
-> (exploden x)
(113 117 111 116 101 32 116 104 105 115 32 124 32 111 107 63)

2.4. Vectors
See Chapter 9 for a discussion of vectors. They are intermediate in efficiency
between arrays and hunks.

2.4.1. vector creation
(new-vector 'x_size ['g_fill ['g_prop]])
RETURNS: A vector of length x_size. Each data entry is initialized to g_fill, or to nil, if the
argument g_fill is not present. The vector's property is set to g_prop, or to nil,
by default.
(new-vectori-byte 'x_size ['g_fill ['g_prop]])
(new-vectori-word 'x_size ['g_fill ['g_prop]])
(new-vectori-long 'x_size ['g_fill ['g_prop]])
RETURNS: A vectori with x_size elements in it. The actual memory requirement is two
long words + x_size*(n bytes), where n is 1 for new-vectori-byte, 2 for new-vectori-word, or 4 for new-vectori-long. Each data entry is initialized to g_fill, or
to zero, if the argument g_fill is not present. The vector's property is set to
g_prop, or nil, by default.

Vectors may be created by specifying multiple initial values:
(vector ['g_val0 'g_val1 ... ])
RETURNS: a vector, with as many data elements as there are arguments. It is quite possible to have a vector with no data elements. The vector's property will be null.
(vectori-byte ['x_val0 'x_val2 ... ])
(vectori-word ['x_val0 'x_val2 ... ])
(vectori-long ['x_val0 'x_val2 ... ])
RETURNS: a vectori, with as many data elements as there are arguments. The arguments
are required to be fixnums. Only the low order byte or word is used in the case
of vectori-byte and vectori-word. The vector's property will be null.

2.4.2. vector reference
(vref 'v_vect 'x_index)
(vrefi-byte 'V_vect 'x_bindex)
(vrefi-word 'V_vect 'x_windex)
(vrefi-long 'V_vect 'x_lindex)
RETURNS: the desired data element from a vector. The indices must be fixnums. Indexing is zero-based. The vrefi functions sign extend the data.
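
The sign extension performed by the vrefi functions can be modelled outside Lisp. The following Python sketch is an illustration only: the helper name vrefi, the use of a bytes object as the immediate-vector, and the little-endian byte order are assumptions made for the example, not part of FRANZ LISP.

```python
def vrefi(data: bytes, index: int, width: int) -> int:
    """Model of vrefi-byte/-word/-long: read element `index` as a signed integer.

    `width` is the element size in bytes (1, 2 or 4). The index counts
    elements, not bytes. Little-endian order is an assumption.
    """
    start = index * width
    chunk = data[start:start + width]
    if index < 0 or len(chunk) != width:
        raise IndexError("index out of range")
    # signed=True performs the sign extension
    return int.from_bytes(chunk, byteorder="little", signed=True)


raw = bytes([0xFF, 0x7F, 0x00, 0x80])
print(vrefi(raw, 0, 1))  # byte 0xFF read as a signed byte
print(vrefi(raw, 0, 2))  # word 0x7FFF
print(vrefi(raw, 1, 2))  # word 0x8000, negative after sign extension
```

The same four bytes give different values depending on the element width, which is why the index passed to each function is an element number rather than a byte address.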


(vprop 'Vv_vect)
RETURNS: The Lisp property associated with a vector.
(vget 'Vv_vect 'g_ind)
RETURNS: The value stored under g_ind if the Lisp property associated with Vv_vect is a
disembodied property list.
(vsize 'Vv_vect)
(vsize-byte 'V_vect)
(vsize-word 'V_vect)
RETURNS: the number of data elements in the vector. For immediate-vectors, the functions vsize-byte and vsize-word return the number of data elements, if one
thinks of the binary data as being comprised of bytes or words.

2.4.3. vector modification
(vset 'v_vect 'x_index 'g_val)
(vseti-byte 'V_vect 'x_bindex 'x_val)
(vseti-word 'V_vect 'x_windex 'x_val)
(vseti-long 'V_vect 'x_lindex 'x_val)
RETURNS: the datum.
SIDE EFFECT: The indexed element of the vector is set to the value. As noted above, for
vseti-word and vseti-byte, the index is construed as the number of the data
element within the vector. It is not a byte address. Also, for those two
functions, the low order byte or word of x_val is what is stored.
(vsetprop 'Vv_vect 'g_value)
RETURNS: g_value. This should be either a symbol or a disembodied property list whose
car is a symbol identifying the type of the vector.
SIDE EFFECT: the property list of Vv_vect is set to g_value.
(vputprop 'Vv_vect 'g_value 'g_ind)
RETURNS: g_value.
SIDE EFFECT: If the vector property of Vv_vect is a disembodied property list, then
vputprop adds the value g_value under the indicator g_ind. Otherwise, the
old vector property is made the first element of the list.

2.5. Arrays
See Chapter 9 for a complete description of arrays. Some of these functions are
part of a Maclisp array compatibility package, which represents only one simple way of
using the array structure of FRANZ LISP.

2.5.1. array creation


(marray 'g_data 's_access 'g_aux 'x_length 'x_delta)
RETURNS: an array type with the fields set up from the above arguments in the obvious
way (see §1.2.10).
(*array 's_name 's_type 'x_dim1 ... 'x_dimn)
(array s_name s_type x_dim1 ... x_dimn)
WHERE: s_type may be one of t, nil, fixnum, flonum, fixnum-block and flonum-block.
RETURNS: an array of type s_type with n dimensions of extents given by the x_dimi.
SIDE EFFECT: If s_name is non-nil, the function definition of s_name is set to the array
structure returned.
NOTE: These functions create a Maclisp compatible array. In FRANZ LISP arrays of type t,
nil, fixnum and flonum are equivalent and the elements of these arrays can be any
type of lisp object. Fixnum-block and flonum-block arrays are restricted to fixnums
and flonums respectively and are used mainly to communicate with foreign functions (see §8.5).
NOTE: *array evaluates its arguments, array does not.

2.5.2. array predicate
(arrayp 'g_arg)
RETURNS: t iff g_arg is of type array.

2.5.3. array accessors
(getaccess 'a_array)
(getaux 'a_array)
(getdelta 'a_array)
(getdata 'a_array)
(getlength 'a_array)
RETURNS: the field of the array object a_array given by the function name.
(arrayref 'a_name 'x_ind)
RETURNS: the x_indth element of the array object a_name. x_ind of zero accesses the first
element.
NOTE: arrayref uses the data, length and delta fields of a_name to determine which object
to return.


(arraycall s_type 'as_array 'x_ind1 ... )
RETURNS: the element selected by the indices from the array as_array of type s_type.
NOTE: If as_array is a symbol then the function binding of this symbol should contain an
array object.
s_type is ignored by arraycall but is included for compatibility with Maclisp.
(arraydims 's_name)
RETURNS: a list of the type and bounds of the array s_name.
(listarray 'sa_array ['x_elements])
RETURNS: a list of all of the elements in array sa_array. If x_elements is given, then only
the first x_elements are returned.

; We will create a 3 by 4 array of general lisp objects
-> (array ernie t 3 4)
array[12]

; the array header is stored in the function definition slot of the
; symbol ernie
-> (arrayp (getd 'ernie))
t
-> (arraydims (getd 'ernie))
(t 3 4)

; store in ernie[2][2] the list (test list)
-> (store (ernie 2 2) '(test list))
(test list)

; check to see if it is there
-> (ernie 2 2)
(test list)

; now use the low level function arrayref to find the same element
; arrays are 0 based and row-major (the last subscript varies the fastest)
; thus element [2][2] is the 10th element, (starting at 0).
-> (arrayref (getd 'ernie) 10)
(ptr to)(test list)

; the result is a value cell (thus the (ptr to))
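
The arithmetic that turns the subscripts [2][2] of a 3 by 4 array into linear element 10 can be sketched as follows. This is a Python illustration of row-major indexing in general; the function name row_major_index is hypothetical and is not a FRANZ LISP function.

```python
def row_major_index(dims, subscripts):
    """Linear offset of `subscripts` in a zero-based, row-major array.

    The last subscript varies fastest, as stated in the example above.
    """
    if len(dims) != len(subscripts):
        raise ValueError("wrong number of subscripts")
    index = 0
    for extent, sub in zip(dims, subscripts):
        if not 0 <= sub < extent:
            raise IndexError("subscript out of range")
        index = index * extent + sub
    return index


# element [2][2] of a 3 by 4 array: 2 * 4 + 2
print(row_major_index((3, 4), (2, 2)))
```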

2.5.4. array manipulation


(putaccess 'a_array 'su_func)
(putaux 'a_array 'g_aux)
(putdata 'a_array 'g_arg)
(putdelta 'a_array 'x_delta)
(putlength 'a_array 'x_length)
RETURNS: the second argument to the function.
SIDE EFFECT: The field of the array object given by the function name is replaced by the
second argument to the function.
(store 'l_arexp 'g_val)
WHERE: l_arexp is an expression which references an array element.
RETURNS: g_val
SIDE EFFECT: the array location which contains the element which l_arexp references is
changed to contain g_val.
(fillarray 's_array 'l_itms)
RETURNS: s_array
SIDE EFFECT: the array s_array is filled with elements from l_itms. If there are not
enough elements in l_itms to fill the entire array, then the last element of
l_itms is used to fill the remaining parts of the array.
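
The padding rule of fillarray can be illustrated with a short Python model. The name fill is hypothetical, and the behavior when l_itms is longer than the array (extra items ignored) or empty (an error) is an assumption of the sketch, since the text above does not specify it.

```python
def fill(size, items):
    """Model of fillarray: copy `items` into `size` slots, repeating the
    last item when the list runs out. Extra items are ignored (assumption)."""
    if not items:
        raise ValueError("need at least one item to fill with")
    return [items[i] if i < len(items) else items[-1] for i in range(size)]


print(fill(5, ["a", "b"]))
```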

2.6. Hunks
Hunks are vector-like objects whose size can range from 1 to 128 elements. Internally hunks are allocated in sizes which are powers of 2. In order to create hunks of a
given size, a hunk with at least that many elements is allocated and a distinguished symbol EMPTY is placed in those elements not requested. Most hunk functions respect those
distinguished symbols, but there are two (*makhunk and *rplacx) which will overwrite the
distinguished symbol.
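
The allocation scheme described above can be sketched in Python. This is a model only: the names make_hunk and hunksize_model and the EMPTY sentinel object are stand-ins chosen for the example, and the real allocator's details are not described in this section.

```python
EMPTY = object()  # stand-in for the distinguished EMPTY symbol


def make_hunk(values):
    """Allocate the smallest power-of-two block that holds `values`
    and mark the unused slots with EMPTY."""
    n = len(values)
    if not 1 <= n <= 128:
        raise ValueError("hunk size must be between 1 and 128")
    capacity = 1
    while capacity < n:
        capacity *= 2
    return list(values) + [EMPTY] * (capacity - n)


def hunksize_model(hunk):
    """Count only the requested elements, skipping EMPTY slots."""
    return sum(1 for v in hunk if v is not EMPTY)


h = make_hunk([1, 2, 3])
print(len(h), hunksize_model(h))  # allocated slots, visible size
```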

2.6.1. hunk creation
(hunk 'g_val1 ['g_val2 ... 'g_valn])
RETURNS: a hunk of length n whose elements are initialized to the g_vali.
NOTE: the maximum size of a hunk is 128.
EXAMPLE: (hunk 4 'sharp 'keys) = {4 sharp keys}


(makhunk 'xl_arg)
RETURNS: a hunk of length xl_arg initialized to all nils if xl_arg is a fixnum. If xl_arg is a
list, then we return a hunk of size (length 'xl_arg) initialized to the elements in
xl_arg.
NOTE: (makhunk '(a b c)) is equivalent to (hunk 'a 'b 'c).
EXAMPLE: (makhunk 4) = {nil nil nil nil}
(*makhunk 'x_arg)
RETURNS: a hunk of size 2^x_arg (2 raised to the power x_arg) initialized to EMPTY.
NOTE: This is only to be used by such functions as hunk and makhunk which create and
initialize hunks for users.

2.6.2. hunk accessor
(cxr 'x_ind 'h_hunk)
RETURNS: element x_ind (starting at 0) of hunk h_hunk.
(hunk-to-list 'h_hunk)
RETURNS: a list consisting of the elements of h_hunk.

2.6.3. hunk manipulators
(rplacx 'x_ind 'h_hunk 'g_val)
(*rplacx 'x_ind 'h_hunk 'g_val)
RETURNS: h_hunk
SIDE EFFECT: Element x_ind (starting at 0) of h_hunk is set to g_val.
NOTE: rplacx will not modify one of the distinguished (EMPTY) elements whereas *rplacx
will.
(hunksize 'h_arg)
RETURNS: the size of the hunk h_arg.
EXAMPLE: (hunksize (hunk 1 2 3)) = 3

2.7. Bcds
A bcd object contains a pointer to compiled code and to the type of function object
the compiled code represents.


(getdisc 'y_bcd)
(getentry 'y_bcd)
RETURNS: the field of the bcd object given by the function name.
(putdisc 'y_func 's_discipline)
RETURNS: s_discipline
SIDE EFFECT: Sets the discipline field of y_func to s_discipline.

2.8. Structures
There are three common structures constructed out of list cells: the assoc list, the
property list and the tconc list. The functions below manipulate these structures.

2.8.1. assoc list
An 'assoc list' (or alist) is a common lisp data structure. It has the form
((key1 . value1) (key2 . value2) (key3 . value3) ... (keyn . valuen))
(assoc 'g_arg1 'l_arg2)
(assq 'g_arg1 'l_arg2)
RETURNS: the first top level element of l_arg2 whose car is equal (with assoc) or eq (with
assq) to g_arg1.
NOTE: Usually l_arg2 has an a-list structure and g_arg1 acts as key.
(sassoc 'g_arg1 'l_arg2 'sl_func)
RETURNS: the result of (cond ((assoc 'g_arg 'l_arg2) (apply 'sl_func nil)))
NOTE: sassoc is written as a macro.
(sassq 'g_arg1 'l_arg2 'sl_func)
RETURNS: the result of (cond ((assq 'g_arg 'l_arg2) (apply 'sl_func nil)))
NOTE: sassq is written as a macro.


; assoc or assq is given a key and an assoc list and returns
; the key and value item if it exists, they differ only in how they test
; for equality of the keys.

-> (setq alist '((alpha . a) ((complex key) . b) (junk . x)))
((alpha . a) ((complex key) . b) (junk . x))

; we should use assq when the key is an atom
-> (assq 'alpha alist)
(alpha . a)

; but it may not work when the key is a list
-> (assq '(complex key) alist)
nil

; however assoc will always work
-> (assoc '(complex key) alist)
((complex key) . b)
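
The difference between the two lookups comes down to structural equality versus object identity. A rough Python analogue, offered only as an illustration, uses == for equal and is for eq; the names assoc_model and assq_model are hypothetical.

```python
def assoc_model(key, alist):
    """Return the first (key, value) pair whose key is structurally equal."""
    for pair in alist:
        if pair[0] == key:
            return pair
    return None


def assq_model(key, alist):
    """Return the first (key, value) pair whose key is the very same object."""
    for pair in alist:
        if pair[0] is key:
            return pair
    return None


complex_key = ["complex", "key"]
alist = [("alpha", "a"), (complex_key, "b"), ("junk", "x")]

fresh_key = ["complex", "key"]            # equal contents, different object
print(assoc_model(fresh_key, alist))      # found by structural comparison
print(assq_model(fresh_key, alist))       # not found: not the same object
print(assq_model(complex_key, alist))     # found: identical object
```

A freshly built list is never identical to the one stored in the a-list, which mirrors why assq fails on '(complex key) in the transcript above.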

(sublis 'l_alst 'l_exp)
WHERE: l_alst is an a-list.
RETURNS: the list l_exp with every occurrence of keyi replaced by vali.
NOTE: new list structure is returned to prevent modification of l_exp. When a substitution
is made, a copy of the value to substitute in is not made.

2.8.2. property list
A property list consists of an alternating sequence of keys and values. Normally
a property list is stored on a symbol. A list is a 'disembodied' property list if it contains an odd number of elements, the first of which is ignored.
(plist 's_name)
RETURNS: the property list of s_name.

(setplist 's_atm 'l_plist)
RETURNS: l_plist.
SIDE EFFECT: the property list of s_atm is set to l_plist.


(get 'ls_name 'g_ind)
RETURNS: the value under indicator g_ind in ls_name's property list if ls_name is a symbol.
NOTE: If there is no indicator g_ind in ls_name's property list nil is returned. If ls_name
is a list of an odd number of elements then it is a disembodied property list. get
searches a disembodied property list by starting at its cdr, and comparing every
other element with g_ind, using eq.
(getl 'ls_name 'l_indicators)
RETURNS: the property list ls_name beginning at the first indicator which is a member of
the list l_indicators, or nil if none of the indicators in l_indicators are on
ls_name's property list.
NOTE: If ls_name is a list, then it is assumed to be a disembodied property list.

(putprop 'ls_name 'g_val 'g_ind)
(defprop ls_name g_val g_ind)
RETURNS: g_val.
SIDE EFFECT: Adds to the property list of ls_name the value g_val under the indicator
g_ind.
NOTE: putprop evaluates its arguments, defprop does not. ls_name may be a disembodied
property list, see get.
(remprop 'ls_name 'g_ind)
RETURNS: the portion of ls_name's property list beginning with the property under the
indicator g_ind. If there is no g_ind indicator in ls_name's plist, nil is returned.
SIDE EFFECT: the value under indicator g_ind and g_ind itself are removed from the property list of ls_name.
NOTE: ls_name may be a disembodied property list, see get.

-> (putprop 'xlate 'a 'alpha)
a
-> (putprop 'xlate 'b 'beta)
b
-> (plist 'xlate)
(alpha a beta b)
-> (get 'xlate 'alpha)
a
; use of a disembodied property list:
-> (get '(nil fateman rjf sklower kls foderaro jkf) 'sklower)
kls
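
The search that get performs on a disembodied property list (skip the first element, then compare every other element with the indicator) can be sketched in Python. The function name get_disembodied is hypothetical, and Python string comparison stands in for eq on symbols.

```python
def get_disembodied(plist, indicator):
    """Model of get on a disembodied property list.

    plist[0] is ignored; indicators sit at odd positions and each value
    follows its indicator. Returns None (modelling nil) when absent.
    """
    for i in range(1, len(plist) - 1, 2):
        if plist[i] == indicator:
            return plist[i + 1]
    return None


people = [None, "fateman", "rjf", "sklower", "kls", "foderaro", "jkf"]
print(get_disembodied(people, "sklower"))
```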


2.8.3. tconc structure
A tconc structure is a special type of list designed to make it easy to add objects
to the end. It consists of a list cell whose car points to a list of the elements added
with tconc or lconc and whose cdr points to the last list cell of the list pointed to by
the car.
(tconc 'l_ptr 'g_x)
WHERE: l_ptr is a tconc structure.
RETURNS: l_ptr with g_x added to the end.
(lconc 'l_ptr 'l_x)
WHERE: l_ptr is a tconc structure.
RETURNS: l_ptr with the list l_x spliced in at the end.

; A tconc structure can be initialized in two ways.
; nil can be given to tconc in which case tconc will generate
; a tconc structure.
-> (setq foo (tconc nil 1))
((1) 1)

; Since tconc destructively adds to
; the list, you can now add to foo without using setq again.
-> (tconc foo 2)
((1 2) 2)
-> foo
((1 2) 2)

; Another way to create a null tconc structure
; is to use (ncons nil).
-> (setq foo (ncons nil))
(nil)
-> (tconc foo 1)
((1) 1)

; now see what lconc can do
-> (lconc foo nil)
((1) 1)                 ; no change
-> (lconc foo '(2 3 4))
((1 2 3 4) 4)
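
The reason a tconc structure makes appending cheap is that its cdr always points at the last cell, so no traversal is needed. The Python model below is a sketch under stated assumptions: the Cons class and the function names are invented for the illustration, and lconc_model appends items one at a time rather than splicing the argument list in place as lconc is described as doing.

```python
class Cons:
    """A minimal list cell with car and cdr fields."""
    def __init__(self, car=None, cdr=None):
        self.car = car
        self.cdr = cdr


def tconc_model(ptr, x):
    """Append x in constant time; ptr.car is the list head, ptr.cdr the last cell."""
    if ptr is None:
        ptr = Cons()              # like (ncons nil)
    cell = Cons(x)
    if ptr.car is None:
        ptr.car = cell            # first element
    else:
        ptr.cdr.cdr = cell        # hook onto the current last cell
    ptr.cdr = cell                # remember the new last cell
    return ptr


def lconc_model(ptr, items):
    """Append every item of a Python list (simplification of splicing)."""
    for item in items:
        ptr = tconc_model(ptr, item)
    return ptr


def elements(ptr):
    """Collect the elements held in the tconc structure."""
    out, cell = [], ptr.car
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out


foo = tconc_model(None, 1)
tconc_model(foo, 2)               # destructive: no reassignment needed
lconc_model(foo, [3, 4])
print(elements(foo), foo.cdr.car)
```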

2.8.4. fclosures
An fclosure is a functional object which admits some data manipulations. They
are discussed in §8.4. Internally, they are constructed from vectors.


(fclosure 'l_vars 'g_funobj)
WHERE: l_vars is a list of variables, g_funobj is any object that can be funcalled (including fclosures).
RETURNS: A vector which is the fclosure.

(fclosure-alist 'v_fclosure)
RETURNS: An association list representing the variables in the fclosure. This is a snapshot
of the current state of the fclosure. If the bindings in the fclosure are changed,
any previously calculated results of fclosure-alist will not change.
(fclosure-function 'v_fclosure)
RETURNS: the functional object part of the fclosure.
(fclosurep 'v_fclosure)
RETURNS: t iff the argument is an fclosure.

(symeval-in-fclosure 'v_fclosure 's_symbol)
RETURNS: the current binding of a particular symbol in an fclosure.

(set-in-fclosure 'v_fclosure 's_symbol 'g_newvalue)
RETURNS: g_newvalue.
SIDE EFFECT: The variable s_symbol is bound in the fclosure to g_newvalue.

2.9. Random functions
The following functions don't fall into any of the classifications above.
(bcdad 's_funcname)
RETURNS: a fixnum which is the address in memory where the function s_funcname
begins. If s_funcname is not a machine coded function (binary) then bcdad
returns nil.

(copy 'g_arg)
RETURNS: A structure equal to g_arg but with new list cells.

(copyint* 'x_arg)
RETURNS: a fixnum with the same value as x_arg but in a freshly allocated cell.

(cpyl 'xvt_arg)
RETURNS: a new cell of the same type as xvt_arg with the same value as xvt_arg.


(getaddress 's_entry1 's_binder1 'st_discipline1 [... ... ...])
RETURNS: the binary object which s_binder1's function field is set to.
NOTE: This looks in the running lisp's symbol table for a symbol with the same name as
s_entryi. It then creates a binary object whose entry field points to s_entryi and
whose discipline is st_disciplinei. This binary object is stored in the function field
of s_binderi. If st_disciplinei is nil, then "subroutine" is used by default. This is
especially useful for cfasl users.
(macroexpand 'g_form)
RETURNS: g_form after all macros in it are expanded.
NOTE: This function will only macroexpand expressions which could be evaluated and it
does not know about the special nlambdas such as cond and do, thus it misses many
macro expansions.
(ptr 'g_arg)
RETURNS: a value cell initialized to point to g_arg.

(quote g_arg)
RETURNS: g_arg.
NOTE: the reader allows you to abbreviate (quote foo) as 'foo.

(kwote 'g_arg)
RETURNS: (list (quote quote) g_arg).

(replace 'g_argl 'g_arg2)
WHERE: g_arg1 and g_arg2 must be the same type of lispval and not symbols or hunks.
RETURNS: g_arg2.
SIDE EFFECT: The effect of replace is dependent on the type of the g_argi although one
will notice a similarity in the effects. To understand what replace does to
fixnum and flonum arguments, you must first understand that such
numbers are 'boxed' in FRANZ LISP. What this means is that if the symbol
x has a value 32412, then in memory the value element of x's symbol
structure contains the address of another word of memory (called a box)
with 32412 in it.
Thus, there are two ways of changing the value of x: the first is to change
the value element of x's symbol structure to point to a word of memory
with a different value. The second way is to change the value in the box
which x points to. The former method is used almost all of the time, the
latter is used very rarely and has the potential to cause great confusion.
The function replace allows you to do the latter, i.e., to actually change the
value in the box.
You should watch out for these situations. If you do (setq y x), then both x
and y will point to the same box. If you now (replace x 12345), then y will
also have the value 12345. And, in fact, there may be many other pointers
to that box.
Another problem with replacing fixnums is that some boxes are read-only.
The fixnums between -1024 and 1023 are stored in a read-only area and
attempts to replace them will result in an "Illegal memory reference" error
(see the description of copyint* for a way around this problem).
For the other valid types, the effect of replace is easy to understand. The
fields of g_arg1's structure are made eq to the corresponding fields of
g_arg2's structure. For example, if x and y have lists as values then the
effect of (replace x y) is the same as (rplaca x (car y)) and (rplacd x (cdr y)).
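
The aliasing hazard of replacing a boxed fixnum can be modelled in a few lines of Python. The Box class and the function aliasing_demo are hypothetical names used only to illustrate the two ways of changing a value described above.

```python
class Box:
    """Stand-in for the word of memory that holds a boxed number."""
    def __init__(self, value):
        self.value = value


def aliasing_demo():
    x = Box(32412)
    y = x                  # like (setq y x): both names share one box
    x.value = 12345        # like (replace x 12345): the box itself changes
    shared = y.value       # y sees the new value

    x = Box(7)             # like (setq x 7): x now points at a different box
    untouched = y.value    # y is unaffected by the rebinding
    return shared, untouched


print(aliasing_demo())
```

Changing the pointer affects one name only, while changing the contents of the box affects every name that points to it.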
(scons 'x_arg 'bs_rest)
WHERE: bs_rest is a bignum or nil.
RETURNS: a bignum whose first bigit is x_arg and whose higher order bigits are bs_rest.
(setf g_refexpr 'g_value)
NOTE: setf is a generalization of setq. Information may be stored by binding variables,
replacing entries of arrays and vectors, or being put on property lists, among others. Setf will allow the user to store data into some location, by mentioning the
operation used to refer to the location. Thus, the first argument may be partially
evaluated, but only to the extent needed to calculate a reference. setf returns
g_value.

(setf x 3) = (setq x 3)
(setf (car x) 3) = (rplaca x 3)
(setf (get foo 'bar) 3) = (putprop foo 3 'bar)
(setf (vref vector index) value) = (vset vector index value)

(sort 'l_data 'u_comparefn)
RETURNS: a list of the elements of l_data ordered by the comparison function u_comparefn
SIDE EFFECT: the list l_data is modified rather than allocating new storage.
NOTE: (comparefn g_x g_y) should return something non-nil if g_x can precede g_y in
sorted order; nil if g_y must precede g_x. If u_comparefn is nil, alphabetical order
will be used.
(sortcar 'l_list 'u_comparefn)
RETURNS: a list of the elements of l_list with the car's ordered by the sort function
u_comparefn.
SIDE EFFECT: the list l_list is modified rather than allocating new storage.
NOTE: Like sort, if u_comparefn is nil, alphabetical order will be used.


CHAPTER 3
Arithmetic Functions

This chapter describes FRANZ LISP's functions for doing arithmetic. Often the same function is known by many names, such as add which is also plus, sum, and +. This is due to our
desire to be compatible with other Lisps. The FRANZ LISP user is advised to avoid using functions with names such as + and * unless their arguments are fixnums. The Lisp compiler takes
advantage of the fact that their arguments are fixnums.
An attempt to divide or to generate a floating point result outside of the range of floating
point numbers will cause a floating exception signal from the UNIX operating system. The user
can catch and process this interrupt if desired (see the description of the signal function).

3.1. simple arithmetic functions
(add ['n_arg1 ... ])
(plus ['n_arg1 ... ])
(sum ['n_arg1 ... ])
(+ ['x_arg1 ... ])
RETURNS: the sum of the arguments. If no arguments are given, 0 is returned.
NOTE: if the size of the partial sum exceeds the limit of a fixnum, the partial sum will be
converted to a bignum. If any of the arguments are flonums, the partial sum will
be converted to a flonum when that argument is processed and the result will thus
be a flonum. Currently, if in the process of doing the addition a bignum must be
converted into a flonum an error message will result.
(add1 'n_arg)
(1+ 'x_arg)
RETURNS: its argument plus 1.

(diff ['n_arg1 ... ])
(difference ['n_arg1 ... ])
(- ['x_arg1 ... ])
RETURNS: the result of subtracting from n_arg1 all subsequent arguments. If no arguments
are given, 0 is returned.
NOTE: See the description of add for details on data type conversions and restrictions.


(sub1 'n_arg)
(1- 'x_arg)
RETURNS: its argument minus 1.

(minus 'n_arg)
RETURNS: zero minus n_arg.

(product ['n_arg1 ... ])
(times ['n_arg1 ... ])
(* ['x_arg1 ... ])
RETURNS: the product of all of its arguments. It returns 1 if there are no arguments.
NOTE: See the description of the function add for details and restrictions to the automatic
data type coercion.
(quotient ['n_arg1 ... ])
(/ ['x_arg1 ... ])
RETURNS: the result of dividing the first argument by succeeding ones.
NOTE: If there are no arguments, 1 is returned. See the description of the function add
for details and restrictions of data type coercion. A divide by zero will cause a
floating exception interrupt -- see the description of the signal function.
(*quo 'i_x 'i_y)
RETURNS: the integer part of i_x / i_y.

(Divide 'i_dividend 'i_divisor)
RETURNS: a list whose car is the quotient and whose cadr is the remainder of the division
of i_dividend by i_divisor.
NOTE: this is restricted to integer division.
(Emuldiv 'x_fact1 'x_fact2 'x_addn 'x_divisor)
RETURNS: a list of the quotient and remainder of this operation:
((x_fact1 * x_fact2) + (sign extended) x_addn) / x_divisor.
NOTE: this is useful for creating a bignum arithmetic package in Lisp.

3.2. predicates
(numberp 'g_arg)
(numbp 'g_arg)
RETURNS: t iff g_arg is a number (fixnum, flonum or bignum).

(fixp 'g_arg)
RETURNS: t iff g_arg is a fixnum or bignum.

(floatp 'g_arg)
RETURNS: t iff g_arg is a flonum.

(evenp 'x_arg)
RETURNS: t iff x_arg is even.

(oddp 'x_arg)
RETURNS: t iff x_arg is odd.

(zerop 'g_arg)
RETURNS: t iff g_arg is a number equal to 0.

(onep 'g_arg)
RETURNS: t iff g_arg is a number equal to 1.

(plusp 'n_arg)
RETURNS: t iff n_arg is greater than zero.

(minusp 'g_arg)
RETURNS: t iff g_arg is a negative number.
(greaterp ['n_arg1 ... ])
(> 'fx_arg1 'fx_arg2)
(>& 'x_arg1 'x_arg2)
RETURNS: t iff the arguments are in a strictly decreasing order.
NOTE: In functions greaterp and > the function difference is used to compare adjacent
values. If any of the arguments are non-numbers, the error message will come
from the difference function. The arguments to > must be both fixnums or both
flonums. The arguments to >& must both be fixnums.

(lessp ['n_arg1 ... ])
(< 'fx_arg1 'fx_arg2)
(<& 'x_arg1 'x_arg2)
RETURNS: t iff the arguments are in a strictly increasing order.
NOTE: In functions lessp and < the function difference is used to compare adjacent values.
If any of the arguments are non-numbers, the error message will come from the
difference function. The arguments to < may be either fixnums or flonums but
must be the same type. The arguments to <& must be fixnums.

(= 'fx_arg1 'fx_arg2)
(=& 'x_arg1 'x_arg2)
RETURNS: t iff the arguments have the same value. The arguments to = must be
either both fixnums or both flonums. The arguments to =& must be fixnums.

3.3. trigonometric functions
(cos 'fx_angle)
RETURNS: the (flonum) cosine of fx_angle (which is assumed to be in radians).

(sin 'fx_angle)
RETURNS: the sine of fx_angle (which is assumed to be in radians).

(acos 'fx_arg)
RETURNS: the (flonum) arc cosine of fx_arg in the range 0 to π.

(asin 'fx_arg)
RETURNS: the (flonum) arc sine of fx_arg in the range -π/2 to π/2.

(atan 'fx_arg1 'fx_arg2)
RETURNS: the (flonum) arc tangent of fx_arg1/fx_arg2 in the range -π to π.

3.4. bignum functions
(haipart bx_number x_bits)
RETURNS: a fixnum (or bignum) which contains the x_bits high bits of (abs bx_number) if
x_bits is positive, otherwise it returns the (abs x_bits) low bits of
(abs bx_number).
(haulong bx_number)
RETURNS: the number of significant bits in bx_number.
NOTE: the result is equal to the least integer greater than or equal to the base two logarithm
of one plus the absolute value of bx_number.
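
The two definitions above can be checked against each other with a small Python model. The names haulong_model and haipart_model are hypothetical; the sketch relies on the fact that Python's int.bit_length() is exactly the least integer greater than or equal to the base two logarithm of one plus the magnitude.

```python
def haulong_model(n):
    """Number of significant bits in n: ceil(log2(1 + |n|))."""
    return abs(n).bit_length()


def haipart_model(n, bits):
    """High `bits` bits of |n| if bits is positive, else the low |bits| bits."""
    magnitude = abs(n)
    if bits > 0:
        return magnitude >> max(magnitude.bit_length() - bits, 0)
    return magnitude & ((1 << -bits) - 1)


print(haulong_model(255), haulong_model(256))
print(haipart_model(0b110010, 3), haipart_model(0b110010, -3))
```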
(bignum-leftshift bx_arg x_amount)
RETURNS: bx_arg shifted left by x_amount. If x_amount is negative, bx_arg will be shifted
right by the magnitude of x_amount.
NOTE: If bx_arg is shifted right, it will be rounded to the nearest even number.


(sticky-bignum-leftshift 'bx_arg 'x_amount)
RETURNS: bx_arg shifted left by x_amount. If x_amount is negative, bx_arg will be shifted
right by the magnitude of x_amount and rounded.
NOTE: sticky rounding is done this way: after shifting, the low order bit is changed to 1 if
any 1's were shifted off to the right.
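
Sticky rounding as described in the NOTE can be sketched in Python. The function name sticky_leftshift_model is hypothetical, and treating negative numbers by shifting their magnitude is an assumption of this sketch; the text above does not say how the sign is handled.

```python
def sticky_leftshift_model(n, amount):
    """Shift left by `amount`; for negative amounts shift right and set the
    low-order bit if any 1 bits were shifted off (sticky rounding)."""
    if amount >= 0:
        return n << amount
    shift = -amount
    magnitude = abs(n)
    result = magnitude >> shift
    if magnitude & ((1 << shift) - 1):   # were any 1 bits lost?
        result |= 1
    return -result if n < 0 else result


print(sticky_leftshift_model(0b1011, -2))  # bits 11 are lost, low bit forced to 1
print(sticky_leftshift_model(0b1000, -2))  # nothing but zeros lost
```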

3.5. bit manipulation
(boole 'x_key 'x_v1 'x_v2 ...)
RETURNS: the result of the bitwise boolean operation as described in the following table.
NOTE: If there are more than 3 arguments, then evaluation proceeds left to right with each
partial result becoming the new value of x_v1. That is,
(boole 'key 'v1 'v2 'v3) = (boole 'key (boole 'key 'v1 'v2) 'v3).
In the following table, * represents bitwise and, + represents bitwise or, ⊕
represents bitwise xor and ¬ represents bitwise negation and is the highest precedence operator.

(boole 'key 'x 'y)

key       0         1         2         3         4         5         6         7
result    0         x*y       ¬x*y      y         x*¬y      x         x⊕y       x+y
names               and                           bitclear            xor       or

key       8         9         10        11        12        13        14        15
result    ¬(x+y)    ¬(x⊕y)    ¬x        ¬x+y      ¬y        x+¬y      ¬x+¬y     -1
names     nor       equiv               implies                       nand
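
All sixteen boole operations follow one pattern: each of the four low bits of the key selects one combination of x and y. The Python sketch below assumes the conventional encoding in which key 1 is and, key 6 is xor and key 7 is or; the names boole2 and boole_model are hypothetical, and Python's unbounded integers stand in for fixnums.

```python
from functools import reduce


def boole2(key, x, y):
    """Two-argument bitwise boolean operation selected by a 4-bit key."""
    result = 0
    if key & 1:
        result |= x & y        # bits set in both x and y
    if key & 2:
        result |= ~x & y       # bits set in y only
    if key & 4:
        result |= x & ~y       # bits set in x only
    if key & 8:
        result |= ~x & ~y      # bits set in neither
    return result


def boole_model(key, first, *rest):
    """Left-to-right evaluation: each partial result becomes the new first value."""
    return reduce(lambda acc, v: boole2(key, acc, v), rest, first)


print(boole_model(1, 12, 10))      # and
print(boole_model(7, 12, 10))      # or
print(boole_model(6, 12, 10))      # xor
print(boole_model(1, 15, 7, 3))    # and over three values
```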

(lsh 'x_val 'x_amt)
RETURNS: x_val shifted left by x_amt if x_amt is positive. If x_amt is negative, then lsh
returns x_val shifted right by the magnitude of x_amt.
NOTE: This always returns a fixnum even for those numbers whose magnitude is so large
that they would normally be represented as a bignum, i.e. shifted bits are lost. For
more general bit shifters, see bignum-leftshift and sticky-bignum-leftshift.
(rot 'x_val 'x_amt)
RETURNS: x_val rotated left by x_amt if x_amt is positive. If x_amt is negative, then x_val
is rotated right by the magnitude of x_amt.
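
The difference between a shift that loses bits and a rotation that keeps them can be modelled in Python. The sketch assumes a 32-bit two's complement fixnum and a logical (zero-filling) right shift; neither is stated in this section, and the names lsh_model and rot_model are hypothetical.

```python
WIDTH = 32                      # assumed fixnum width
MASK = (1 << WIDTH) - 1


def to_signed(value):
    """Interpret the low 32 bits of `value` as a two's complement number."""
    value &= MASK
    return value - (1 << WIDTH) if value & (1 << (WIDTH - 1)) else value


def lsh_model(val, amt):
    """Shift left (amt > 0) or right (amt < 0); bits shifted out are lost."""
    unsigned = val & MASK
    shifted = unsigned << amt if amt >= 0 else unsigned >> -amt
    return to_signed(shifted)


def rot_model(val, amt):
    """Rotate left (amt > 0) or right (amt < 0) within 32 bits."""
    unsigned = val & MASK
    k = amt % WIDTH             # a right rotation by n equals a left rotation by 32 - n
    return to_signed((unsigned << k) | (unsigned >> (WIDTH - k)))


print(lsh_model(1, 32), rot_model(1, 32))   # shift loses the bit, rotate keeps it
print(rot_model(1, -1))                     # the bit wraps around to the sign position
```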

3.6. other functions


(abs 'n_arg)
(absval 'n_arg)
RETURNS: the absolute value of n_arg.

(exp 'fx_arg)
RETURNS: e raised to the fx_arg power (flonum).

(expt 'n_base 'n_power)
RETURNS: n_base raised to the n_power power.
NOTE: if either of the arguments are flonums, the calculation will be done using log and
exp.

(fact 'x_arg)
RETURNS: x_arg factorial. (fixnum or bignum)

(fix 'n_arg)
RETURNS: a fixnum as close as we can get to n_arg.
NOTE: fix will round down. Currently, if n_arg is a flonum larger than the size of a
fixnum, this will fail.

(float 'n_arg)
RETURNS: a flonum as close as we can get to n_arg.
NOTE: if n_arg is a bignum larger than the maximum size of a flonum, then a floating
exception will occur.
(log 'fx_arg)
RETURNS: the natural logarithm of fx_arg.

(max 'n_arg1 ... )
RETURNS: the maximum value in the list of arguments.

(min 'n_arg1 ... )
RETURNS: the minimum value in the list of arguments.

(mod 'i_dividend 'i_divisor)
(remainder 'i_dividend 'i_divisor)
RETURNS: the remainder when i_dividend is divided by i_divisor.
NOTE: The result will have the same sign as i_dividend.
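
A remainder that takes the sign of the dividend corresponds to division that truncates toward zero. The Python sketch below illustrates that convention; the function name remainder_model is hypothetical, and the explicit truncation is needed because Python's own % operator follows the sign of the divisor instead.

```python
def remainder_model(dividend, divisor):
    """Remainder of truncating integer division; sign follows the dividend."""
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient              # truncate toward zero
    return dividend - quotient * divisor


for a, b in [(7, 3), (-7, 3), (7, -3), (-7, -3)]:
    print(a, b, remainder_model(a, b))
```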


(*mod 'x_dividend 'x_divisor)
RETURNS: the balanced representation of x_dividend modulo x_divisor.
NOTE: the range of the balanced representation is abs(x_divisor)/2 to (abs(x_divisor)/2)
- x_divisor + 1.

(random ['x_limit])
RETURNS: a fixnum between 0 and x_limit - 1 if x_limit is given. If x_limit is not given,
any fixnum, positive or negative, might be returned.
(sqrt 'fx_arg)
RETURNS: the square root of fx_arg.


CHAPTER 4
Special Functions

(and [g_argl ... ])
RETURNS: the value of the last argument if all arguments evaluate to a non-nil value, otherwise and returns nil. It returns t if there are no arguments.
NOTE: the arguments are evaluated left to right and evaluation will cease with the first nil
encountered.
(apply 'u_func 'l_args)
RETURNS: the result of applying function u_func to the arguments in the list l_args.
NOTE: If u_func is a lambda, then the (length l_args) should equal the number of formal
parameters for the u_func. If u_func is a nlambda or macro, then l_args is bound
to the single formal parameter.

; add1 is a lambda of 1 argument
-> (apply 'add1 '(3))
4

; we will define plus1 as a macro which will be equivalent to add1
-> (def plus1 (macro (arg) (list 'add1 (cadr arg))))
plus1
-> (plus1 3)
4

; now if we apply a macro we obtain the form it changes to.
-> (apply 'plus1 '(plus1 3))
(add1 3)

; if we funcall a macro however, the result of the macro is eval'ed
; before it is returned.
-> (funcall 'plus1 '(plus1 3))
4

; for this particular macro, the car of the arg is not checked
; so that this too will work
-> (apply 'plus1 '(foo 3))
(add1 3)

(arg ['x_numb])
RETURNS: if x_numb is specified then the x_numb'th argument to the enclosing lexpr. If
x_numb is not specified then this returns the number of arguments to the
enclosing lexpr.
NOTE: it is an error to the interpreter if x_numb is given and out of range.

(break [g_message ['g_pred]])
WHERE: if g_message is not given it is assumed to be the null string, and if g_pred is not
given it is assumed to be t.
RETURNS: the value of (*break 'g_pred 'g_message)
(*break 'g_pred 'g_message)
RETURNS: nil immediately if g_pred is nil, else the value of the next (return 'value)
expression typed in at top level.
SIDE EFFECT: If the predicate, g_pred, evaluates to non-null, the lisp system stops and
prints out 'Break ' followed by g_message. It then enters a break loop
which allows one to interactively debug a program. To continue execution
from a break you can use the return function. To return to top level or
another break level, you can use retbrk or reset.
(caseq 'g_key-form l_clause1 ...)
WHERE: l_clausei is a list of the form (g_comparator ['g_formi ... ]). The comparators
may be symbols, small fixnums, a list of small fixnums or symbols.
NOTE: The way caseq works is that it evaluates g_key-form, yielding a value we will call
the selector. Each clause is examined until the selector is found consistent with the
comparator. For a symbol, or a fixnum, this means the two must be eq. For a list,
this means that the selector must be eq to some element of the list.
The symbol t has special semantics: it matches anything, and consequently, should
be the last comparator. Then, having chosen a clause, caseq evaluates each form
within that clause and
RETURNS: the value of the last form. If no comparators are matched, caseq returns nil.

Here are two ways of defining the same function:

-> (defun fate (personna)
     (caseq personna
       (cow '(jumped over the moon))
       (cat '(played nero))
       ((dish spoon) '(ran away together))
       (t '(lived happily ever after))))
fate

-> (defun fate (personna)
     (cond
       ((eq personna 'cow) '(jumped over the moon))
       ((eq personna 'cat) '(played nero))
       ((memq personna '(dish spoon)) '(ran away together))
       (t '(lived happily ever after))))
fate
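
The clause-matching rule of caseq (a single comparator must equal the selector, a list comparator must contain it, and t matches anything) can be modelled in Python. This is an illustration only: caseq_model is a hypothetical name, Python True stands in for t, and == stands in for eq.

```python
def caseq_model(selector, clauses):
    """Return the value of the first clause whose comparator matches.

    `clauses` is a list of (comparator, thunk) pairs. A comparator is a
    single value, a list of values, or True (matches anything).
    """
    for comparator, thunk in clauses:
        if comparator is True:
            return thunk()
        candidates = comparator if isinstance(comparator, (list, tuple)) else (comparator,)
        if any(selector == c for c in candidates):
            return thunk()
    return None                 # no comparator matched: nil


def fate(personna):
    return caseq_model(personna, [
        ("cow", lambda: "jumped over the moon"),
        ("cat", lambda: "played nero"),
        (["dish", "spoon"], lambda: "ran away together"),
        (True, lambda: "lived happily ever after"),
    ])


print(fate("spoon"))
print(fate("dog"))
```

The result forms are wrapped in thunks so that only the chosen clause is evaluated, as with caseq itself.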


(catch g_exp [ls_tag])
WHERE: if ls_tag is not given, it is assumed to be nil.
RETURNS: the result of (*catch 'ls_tag g_exp)
NOTE: catch is defined as a macro.
(*catch 'ls_tag g_exp)
WHERE: ls_tag is either a symbol or a list of symbols.
RETURNS: the result of evaluating g_exp or the value thrown during the evaluation of
g_exp.
SIDE EFFECT: this first sets up a 'catch frame' on the lisp runtime stack. Then it begins
to evaluate g_exp. If g_exp evaluates normally, its value is returned. If,
however, a value is thrown during the evaluation of g_exp then this *catch
will return with that value if one of these cases is true:
(1) the tag thrown to is ls_tag
(2) ls_tag is a list and the tag thrown to is a member of this list
(3) ls_tag is nil.
NOTE: Errors are implemented as a special kind of throw. A catch with no tag will not
catch an error but a catch whose tag is the error type will catch that type of error.
See Chapter 10 for more information.
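As an illustration of the catch frame described above, the following sketch uses *catch and *throw to leave a mapc loop early. The function name first-negative and the tag found are hypothetical, chosen only for this example.

```lisp
; Hypothetical helper: return the first negative number in lis, or nil.
; The *throw unwinds to the enclosing *catch whose tag is found.
(defun first-negative (lis)
  (*catch 'found
          (progn
            (mapc '(lambda (x)
                     (if (minusp x) (*throw 'found x)))
                  lis)
            nil)))    ; reached only if nothing was thrown
```

Under these assumptions, (first-negative '(3 1 -4 2)) should return -4, and a list with no negative element should yield nil.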
(comment [g_arg ...])
RETURNS: the symbol comment.
NOTE: This does absolutely nothing.
(cond [l_clause1 ...])
RETURNS: the last value evaluated in the first clause satisfied. If no clauses are satisfied
then nil is returned.
NOTE: This is the basic conditional 'statement' in lisp. The clauses are processed from left
to right. The first element of a clause is evaluated. If it evaluates to a non-null
value then that clause is satisfied and all following elements of that clause are
evaluated. The last value computed is returned as the value of the cond. If there
is just one element in the clause then its value is returned. If the first element of a
clause evaluates to nil, then the other elements of that clause are not evaluated and
the system moves to the next clause.
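A short sketch of the clause-by-clause processing just described. The function name classify and the returned symbols are hypothetical.

```lisp
; Clauses are tried in order; the first whose test is non-nil is used.
(defun classify (n)
  (cond ((not (numberp n)) 'not-a-number)
        ((minusp n) 'negative)
        ((zerop n) 'zero)
        (t 'positive)))          ; t as the final test acts as a default
```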
(cvttointlisp)
SIDE EFFECT: The reader is modified to conform with the Interlisp syntax. The character
% is made the escape character and special meanings for comma, backquote
and backslash are removed. Also the reader is told to convert upper case to
lower case.


(cvttofranzlisp)
SIDE EFFECT: The reader is modified to conform with Franz's default syntax. One would
run this function after having run cvttomaclisp, only. Backslash is made
the escape character, and super-brackets are reinstated. The reader is
reminded to distinguish between upper and lower case.

(cvttomaclisp)
SIDE EFFECT: The reader is modified to conform with Maclisp syntax. The character / is
made the escape character and the special meanings for backslash, left and
right bracket are removed. The reader is made case-insensitive.

(cvttoucilisp)
SIDE EFFECT: The reader is modified to conform with UCI Lisp syntax. The character /
is made the escape character, tilde is made the comment character,
exclamation point takes on the unquote function normally held by comma, and
backslash, comma, semicolon become normal characters. Here too, the
reader is made case-insensitive.

(debug s_msg)
SIDE EFFECT: Enter the Fixit package described in Chapter 15. This package allows you
to examine the evaluation stack in detail. To leave the Fixit package type
'ok'.

(debugging 'g_arg)
SIDE EFFECT: If g_arg is non-null, Franz unlinks the transfer tables, does a (*rset t) to
turn on evaluation monitoring and sets the all-error catcher (ER%all) to be
debug-err-handler. If g_arg is nil, all of the above changes are undone.

(declare [g_arg ... ])
RETURNS: nil
NOTE: this is a no-op to the evaluator. It has special meaning to the compiler (see
Chapter 12).
(def s_name (s_type l_argl g_exp1 ...))
WHERE: s_type is one of lambda, nlambda, macro or lexpr.
RETURNS: s_name
SIDE EFFECT: This defines the function s_name to the lisp system. If s_type is nlambda
or macro then the argument list l_argl must contain exactly one non-nil
symbol.
(defmacro s_name l_arg g_exp1 ...)
(defcmacro s_name l_arg g_exp1 ...)
RETURNS: s_name
SIDE EFFECT: This defines the macro s_name. defmacro makes it easy to write macros
since it makes the syntax just like defun. Further information on defmacro
is in §8.3.2. defcmacro defines compiler-only macros, or cmacros. A cmacro
is stored on the property list of a symbol under the indicator cmacro. Thus
a function can have a normal definition and a cmacro definition. For an
example of the use of cmacros, see the definitions of nthcdr and nth in
/usr/lib/lisp/common2.l


(defun s_name [s_mtype] ls_argl g_exp1 ...)
WHERE: s_mtype is one of fexpr, expr, args or macro.
RETURNS: s_name
SIDE EFFECT: This defines the function s_name.
NOTE: this exists for Maclisp compatibility, it is just a macro which changes the defun
form to the def form. An s_mtype of fexpr is converted to nlambda and of expr to
lambda. Macro remains the same. If ls_argl is a non-nil symbol, then the type is
assumed to be lexpr and ls_argl is the symbol which is bound to the number of
args when the function is entered.
For compatibility with the Lisp Machine lisp, there are three types of optional
parameters that can occur in ls_argl: &optional declares that the following symbols
are optional, and may or may not appear in the argument list to the function; &rest
symbol declares that all forms in the function call that are not accounted for by
previous lambda bindings are to be assigned to symbol; and &aux form1 ... formn
declares that the formi are either symbols, in which case they are lambda bound to
nil, or lists, in which case the first element of the list is lambda bound to the
second, evaluated element.

; def and defun here are used to define identical functions
; you can decide for yourself which is easier to use.

-> (def append1 (lambda (lis extra) (append lis (list extra))))
append1

-> (defun append1 (lis extra) (append lis (list extra)))
append1

; Using the & forms ...

-> (defun test (a b &optional c &aux (retval 0) &rest z)
      (if c then (msg "Optional arg present" N
                      "c is " c N))
      (msg "rest is " z N
           "retval is " retval N))
test

-> (test 1 2 3 4)
Optional arg present
c is 3
rest is (4)
retval is 0

(defvar s_variable ['g_init])
RETURNS: s_variable.
NOTE: This form is put at the top level in files, like defun.
SIDE EFFECT: This declares s_variable to be special. If g_init is present, and s_variable is
unbound when the file is read in, s_variable will be set to the value of
g_init. An advantage of '(defvar foo)' over '(declare (special foo))' is that
if a file containing defvars is loaded (or fasl'ed) in during compilation, the
variables mentioned in the defvar's will be declared special. The only way
to have that effect with '(declare (special foo))' is to include the file.

(do l_vrbs l_test g_exp1 ...)
RETURNS: the last form in the cdr of l_test evaluated, or a value explicitly given by a
return evaluated within the do body.
NOTE: This is the basic iteration form for FRANZ LISP. l_vrbs is a list of zero or more
var-init-repeat forms. A var-init-repeat form looks like:
(s_name [g_init [g_repeat]])
There are three cases depending on what is present in the form. If just s_name is
present, this means that when the do is entered, s_name is lambda-bound to nil
and is never modified by the system (though the program is certainly free to
modify its value). If the form is (s_name 'g_init) then the only difference is that
s_name is lambda-bound to the value of g_init instead of nil. If g_repeat is also
present then s_name is lambda-bound to g_init when the loop is entered and after
each pass through the do body s_name is bound to the value of g_repeat.
l_test is either nil or has the form of a cond clause. If it is nil then the do body
will be evaluated only once and the do will return nil. Otherwise, before the do
body is evaluated the car of l_test is evaluated and if the result is non-null, this
signals an end to the looping. Then the rest of the forms in l_test are evaluated and
the value of the last one is returned as the value of the do. If the cdr of l_test is
nil, then nil is returned -- thus this is not exactly like a cond clause.
g_exp1 and those forms which follow constitute the do body. A do body is like a
prog body and thus may have labels and one may use the functions go and return.
The sequence of evaluations is this:
(1) The init forms are evaluated left to right and stored in temporary locations.
(2) Simultaneously all do variables are lambda bound to the value of their init forms or
nil.
(3) If l_test is non-null, then the car is evaluated and if it is non-null, the rest of the
forms in l_test are evaluated and the last value is returned as the value of the do.
(4) The forms in the do body are evaluated left to right.
(5) If l_test is nil the do function returns with the value nil.
(6) The repeat forms are evaluated and saved in temporary locations.
(7) The variables with repeat forms are simultaneously bound to the values of those
forms.
(8) Go to step 3.
NOTE: there is an alternate form of do which can be used when there is only one do
variable. It is described next.


; this is a simple function which numbers the elements of a list.
; It uses a do function with two local variables.
-> (defun printem (lis)
      (do ((xx lis (cdr xx))
           (i 1 (1+ i)))
          ((null xx) (patom "all done") (terpr))
          (print i)
          (patom ": ")
          (print (car xx))
          (terpr)))
printem
-> (printem '(a b c d))
1: a
2: b
3: c
4: d
all done
nil

(do s_name g_init g_repeat g_test g_exp1 ...)
NOTE: this is another, less general, form of do. It is evaluated by:
(1) evaluating g_init
(2) lambda binding s_name to value of g_init
(3) g_test is evaluated and if it is not nil the do function returns with nil.
(4) the do body is evaluated beginning at g_expl.
(5) the repeat form is evaluated and stored in s_name.
(6) go to step 3.
RETURNS: nil
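A minimal sketch of this single-variable form, following the six steps listed above. The variable name i and the bound 3 are arbitrary choices for illustration.

```lisp
; i starts at 1; after each pass it is set to (1+ i);
; the loop ends, returning nil, as soon as (greaterp i 3) is non-nil.
(do i 1 (1+ i) (greaterp i 3)
    (print i)
    (terpr))
```

Under this reading the body runs for i equal to 1, 2 and 3, and the do itself returns nil.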
(environment [l_when1 l_what1 l_when2 l_what2 ...])
(environment-maclisp [l_when1 l_what1 l_when2 l_what2 ...])
(environment-lmlisp [l_when1 l_what1 l_when2 l_what2 ...])
WHERE: the when's are a subset of (eval compile load), and the symbols have the same
meaning as they do in 'eval-when'.
The what's may be
(files file1 file2 ... fileN),
which insure that the named files are loaded. To see if filei is loaded, it looks
for a 'version' property under filei's property list. Thus to prevent multiple
loading, you should put
(putprop 'myfile t 'version),
at the end of myfile.l.
Another acceptable form for a what is
(syntax type)
where type is one of maclisp, intlisp, ucilisp, franzlisp. This sets the syntax
correctly.


environment-maclisp sets the environment to that which 'liszt -m' would generate.
environment-lmlisp sets up the lisp machine environment. This is like
maclisp but it has additional macros. For these specialized environments, only
the files clauses are useful.
(environment-maclisp (compile eval) (files foo bar))
(err ['s_value [nil]])
RETURNS: nothing (it never returns).
SIDE EFFECT: This causes an error and if this error is caught by an errset then that errset
will return s_value instead of nil. If the second arg is given, then it must
be nil (MAClisp compatibility).
(error ['s_message1 ['s_message2]])
RETURNS: nothing (it never returns).
SIDE EFFECT: s_message1 and s_message2 are patomed if they are given and then err is
called (with no arguments), which causes an error.
(errset g_expr [s_flag])
RETURNS: a list of one element, which is the value resulting from evaluating g_expr. If an
error occurs during the evaluation of g_expr, then the locus of control will
return to the errset which will then return nil (unless the error was caused by a
call to err, with a non-null argument).
SIDE EFFECT: S_flag is evaluated before g_expr is evaluated. If s_flag is not given, then it
is assumed to be t. If an error occurs during the evaluation of g_expr, and
s_flag evaluated to a non-null value, then the error message associated with
the error is printed before control returns to the errset.
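The three outcomes described for errset, err and error can be sketched as follows. The symbol oops and the message string are hypothetical; the second argument nil suppresses the printing done by errset itself.

```lisp
(errset (plus 1 2))              ; no error: a one-element list holding the value
(errset (err 'oops) nil)         ; err with a non-null argument: errset returns oops
(errset (error "no good") nil)   ; error calls err with no arguments: errset returns nil
```

Because a successful evaluation is wrapped in a list, a caller can tell a normal result of nil, returned as (nil), apart from a caught error, returned as nil.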
(eval 'g_val ['x_bind-pointer])
RETURNS: the result of evaluating g_val.
NOTE: The evaluator evaluates g_val in this way:
If g_val is a symbol, then the evaluator returns its value. If g_val had never been
assigned a value, then this causes an 'Unbound Variable' error. If x_bind-pointer
is given, then the variable is evaluated with respect to that pointer (see evalframe
for details on bind-pointers).
If g_val is of type value, then its value is returned. If g_val is of any other type
than list, g_val is returned.
If g_val is a list object then g_val is either a function call or array reference. Let
g_car be the first element of g_val. We continually evaluate g_car until we end up
with a symbol with a non-null function binding or a non-symbol. Call what we end
up with: g_func.

G_func must be one of three types: list, binary or array. If it is a list then the first
element of the list, which we shall call g_functype, must be either lambda,
nlambda, macro or lexpr. If g_func is a binary, then its discipline, which we shall
call g_functype, is either lambda, nlambda, macro or a string. If g_func is an array
then this form is evaluated specially, see Chapter 9 on arrays. If g_func is a list or
binary, then g_functype will determine how the arguments to this function, the cdr
of g_val, are processed. If g_functype is a string, then this is a foreign function call
(see §8.5 for more details).


If g_functype is lambda or lexpr, the arguments are evaluated (by calling eval recursively) and stacked. If g_functype is nlambda then the argument list is stacked. If
g_functype is macro then the entire form, g_val is stacked.

Next, the formal variables are lambda bound. The formal variables are the cadr of
g_func. If g_functype is nlambda, lexpr or macro, there should only be one formal
variable. The values on the stack are lambda bound to the formal variables except
in the case of a lexpr, where the number of actual arguments is bound to the formal variable.
After the binding is done, the function is invoked, either by jumping to the entry
point in the case of a binary or by evaluating the list of forms beginning at cddr
g_func. The result of this function invocation is returned as the value of the call to
eval.
(evalframe 'x_pdlpointer)
RETURNS: an evalframe descriptor for the evaluation frame just before x_pdlpointer. If
x_pdlpointer is nil, it returns the evaluation frame of the frame just before the
current call to evalframe.
NOTE: An evalframe descriptor describes a call to eval, apply or funcall. The form of the
descriptor is
(type pdl-pointer expression bind-pointer np-index lbot-index)
where type is 'eval' if this describes a call to eval or 'apply' if this is a call to apply
or funcall. pdl-pointer is a number which describes this context. It can be passed
to evalframe to obtain the next descriptor and can be passed to freturn to cause a
return from this context. bind-pointer is the size of variable binding stack when
this evaluation began. The bind-pointer can be given as a second argument to eval
in order to evaluate variables in the same context as this evaluation. If type is
'eval' then expression will have the form (function-name arg1 ...). If type is 'apply'
then expression will have the form (function-name (arg1 ...)). np-index and
lbot-index are pointers into the argument stack (also known as the namestack array) at
the time of call. lbot-index points to the first argument, np-index points one
beyond the last argument.
In order for there to be enough information for evalframe to return, you must call
(*rset t).
EXAMPLE: (progn (evalframe nil))
returns (eval 2147478600 (progn (evalframe nil)) 1 8 7)
(evalhook 'g_form 'su_evalfunc ['su_funcallfunc])
RETURNS: the result of evaluating g_form after lambda binding 'evalhook' to su_evalfunc
and, if it is given, lambda binding 'funcallhook' to su_funcallfunc.
NOTE: As explained in §14.4, the function eval may pass the job of evaluating a form to a
user 'hook' function when various switches are set. The hook function normally
prints the form to be evaluated on the terminal and then evaluates it by calling
evalhook. Evalhook does the lambda binding mentioned above and then calls eval
to evaluate the form after setting an internal switch to tell eval not to call the user's
hook function just this one time. This allows the evaluation process to advance
one step and yet insure that further calls to eval will cause traps to the hook
function (if su_evalfunc is non-null).
In order for evalhook to work, (*rset t) and (sstatus evalhook t) must have been
done previously.

(exec s_arg1 ...)
RETURNS: the result of forking and executing the command named by concatenating the
s_argi together with spaces in between.

(exece 's_fname ['l_args ['l_envir]])
RETURNS: the error code from the system if it was unable to execute the command
s_fname with arguments l_args and with the environment set up as specified in
l_envir. If this function is successful, it will not return, instead the lisp system
will be overlaid by the new command.

(freturn 'x_pdl-pointer 'g_retval)
RETURNS: g_retval from the context given by x_pdl-pointer.
NOTE: A pdl-pointer denotes a certain expression currently being evaluated. The
pdl-pointer for a given expression can be obtained from evalframe.

(frexp 'f_arg)
RETURNS: a list cell (exponent . mantissa) which represents the given flonum.
NOTE: The exponent will be a fixnum, the mantissa a 56 bit bignum. If you think of
the binary point occurring right after the high order bit of mantissa, then
f_arg = 2^exponent * mantissa.
(funcall 'u_func ['g_argl ... ])
RETURNS: the value of applying function u_func to the arguments g_argi and then evaluating that result if u_func is a macro.
NOTE: If u_func is a macro or nlambda then there should be only one g_arg. funcall is the
function which the evaluator uses to evaluate lists. If foo is a lambda or lexpr or
array, then (funcall 'foo 'a 'b 'c) is equivalent to (foo 'a 'b 'c). If foo is a nlambda
then (funcall 'foo '(a b c)) is equivalent to (foo a b c). Finally, if foo is a macro
then (funcall 'foo '(foo a b c)) is equivalent to (foo a b c).
(funcallhook 'l_form 'su_funcallfunc ['su_evalfunc])
RETURNS: the result of funcalling the (car l_form) on the already evaluated arguments in
the (cdr l_form) after lambda binding 'funcallhook' to su_funcallfunc and, if it is
given, lambda binding 'evalhook' to su_evalfunc.
NOTE: This function is designed to continue the evaluation process with as little work as
possible after a funcallhook trap has occurred. It is for this reason that the form of
l_form is unorthodox: its car is the name of the function to call and its cdr is a list
of arguments to stack (without evaluating again) before calling the given function.
After stacking the arguments but before calling funcall an internal switch is set to
prevent funcall from passing the job of funcalling to su_funcallfunc. If funcall is
called recursively in funcalling l_form and if su_funcallfunc is non-null, then the
arguments to funcall will actually be given to su_funcallfunc (a lexpr) to be funcalled.
In order for evalhook to work, (*rset t) and (sstatus evalhook t) must have been
done previously. A more detailed description of evalhook and funcallhook is given
in Chapter 14.


(function u_func)
RETURNS: the function binding of u_func if it is a symbol with a function binding;
otherwise u_func is returned.
(getdisc 'y_func)
RETURNS: the discipline of the machine coded function (either lambda, nlambda or
macro).
(go g_labexp)
WHERE: g_labexp is either a symbol or an expression.
SIDE EFFECT: If g_labexp is an expression, that expression is evaluated and should result
in a symbol. The locus of control moves to just following the symbol
g_labexp in the current prog or do body.
NOTE: this is only valid in the context of a prog or do body. The interpreter and compiler
will allow non-local go's although the compiler won't allow a go to leave a function
body. The compiler will not allow g_labexp to be an expression.

(if 'g_a 'g_b)
(if 'g_a 'g_b 'g_c ...)
(if 'g_a then 'g_b [...] [elseif 'g_c then 'g_d ...] [else 'g_e [...]])
(if 'g_a then 'g_b [...] [elseif 'g_c thenret] [else 'g_d [...]])
NOTE: The various forms of if are intended to be a more readable conditional statement,
to be used in place of cond. There are two varieties of if, with keywords, and
without. The keyword-less variety is inherited from common Maclisp usage. A
keyword-less, two argument if is equivalent to a one-clause cond, i.e. (cond (a b)).
Any other keyword-less if must have at least three arguments. The first two
arguments are the first clause of the equivalent cond, and all remaining arguments are
shoved into a second clause beginning with t. Thus, the second form of if is
equivalent to
(cond (a b) (t c ...)).
The keyword variety has the following grouping of arguments: a predicate, a
then-clause, and optional else-clause. The predicate is evaluated, and if the result is
non-nil, the then-clause will be performed, in the sense described below.
Otherwise, (i.e. the result of the predicate evaluation was precisely nil), the else-clause
will be performed.
Then-clauses will either consist entirely of the single keyword thenret, or will start
with the keyword then, and be followed by at least one general expression. (These
general expressions must not be one of the keywords.) To actuate a thenret means
to cease further evaluation of the if, and to return the value of the predicate just
calculated. The performance of the longer clause means to evaluate each general
expression in turn, and then return the last value calculated.

The else-clause may begin with the keyword else and be followed by at least one
general expression. The rendition of this clause is just like that of a then-clause.
An else-clause may begin alternatively with the keyword elseif, and be followed
(recursively) by a predicate, then-clause, and optional else-clause. Evaluation of
this clause is just evaluation of an if form, with the same predicate, then- and
else-clauses.
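The keyword variety can be sketched as below. Both function names (kind-of and or-default) and the returned symbols are hypothetical, introduced only to show the then, elseif, else and thenret keywords in use.

```lisp
; A chain of then / elseif / else clauses.
(defun kind-of (x)
  (if (null x) then 'empty
   elseif (atom x) then 'atom
   else 'list))

; thenret returns the value of the predicate itself when it is non-nil.
(defun or-default (x)
  (if x thenret else 'default))
```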


(I-throw-err 'l_token)
WHERE: l_token is the cdr of the value returned from a *catch with the tag
ER%unwind-protect.
RETURNS: nothing (never returns in the current context)
SIDE EFFECT: The error or throw denoted by l_token is continued.
NOTE: This function is used to implement unwind-protect which allows the processing of a
transfer of control through a certain context to be interrupted, a user function to be
executed and then the transfer of control to continue. The form of l_token is
either
(t tag value) for a throw or
(nil type message valret contuab uniqueid [arg ...]) for an error.
This function is not to be used for implementing throws or errors and is only
documented here for completeness.
(let l_args g_exp1 ... g_expm)
RETURNS: the result of evaluating g_expm within the bindings given by l_args.
NOTE: l_args is either nil (in which case let is just like progn) or it is a list of binding
objects. A binding object is a list (symbol expression). When a let is entered all of
the expressions are evaluated and then simultaneously lambda bound to the
corresponding symbols. In effect, a let expression is just like a lambda expression
except the symbols and their initial values are next to each other which makes the
expression easier to understand. There are some added features to the let
expression: A binding object can just be a symbol, in which case the expression
corresponding to that symbol is 'nil'. If a binding object is a list and the first
element of that list is another list, then that list is assumed to be a binding template
and let will do a desetq on it.
(let* l_args g_exp1 ... g_expn)
RETURNS: the result of evaluating g_expn within the bindings given by l_args.
NOTE: This is identical to let except the expressions in the binding list l_args are evaluated
and bound sequentially instead of in parallel.
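The difference between parallel and sequential binding can be seen in a small sketch (the variable names a and b are arbitrary):

```lisp
; let: both init expressions are evaluated before either symbol is bound.
(let ((a 1) (b 2))
  (list a b))

; let*: bindings are made one at a time, so b's expression may refer to a.
(let* ((a 1) (b (1+ a)))
  (list a b))
```

Both forms should return (1 2). Writing (b (1+ a)) in the plain let would instead refer to whatever value a had outside the let.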
(lexpr-funcall 'g_function ['g_arg1 ...] 'l_argn)
NOTE: This is a cross between funcall and apply. The last argument must be a list
(possibly empty). The elements of that list are stacked and then the function is funcalled.
EXAMPLE: (lexpr-funcall 'list 'a '(b c d)) is the same as
(funcall 'list 'a 'b 'c 'd)

(listify 'x_count)
RETURNS: a list of x_count of the arguments to the current function (which must be a
lexpr).
NOTE: normally arguments 1 through x_count are returned. If x_count is negative then a
list of the last abs(x_count) arguments is returned.


(map 'u_func 'l_arg1 ...)
RETURNS: l_arg1
NOTE: The function u_func is applied to successive sublists of the l_argi. All sublists
should have the same length.
(mapc 'u_func 'l_arg1 ...)
RETURNS: l_arg1.
NOTE: The function u_func is applied to successive elements of the argument lists. All of
the lists should have the same length.
(mapcan 'u_func 'l_arg1 ...)
RETURNS: nconc applied to the results of the functional evaluations.
NOTE: The function u_func is applied to successive elements of the argument lists. All
sublists should have the same length.
(mapcar 'u_func 'l_arg1 ...)
RETURNS: a list of the values returned from the functional application.
NOTE: the function u_func is applied to successive elements of the argument lists. All
sublists should have the same length.
(mapcon 'u_func 'l_arg1 ...)
RETURNS: nconc applied to the results of the functional evaluation.
NOTE: the function u_func is applied to successive sublists of the argument lists. All
sublists should have the same length.
(maplist 'u_func 'l_arg1 ...)
RETURNS: a list of the results of the functional evaluations.
NOTE: the function u_func is applied to successive sublists of the arguments lists. All
sublists should have the same length.
Readers may find the following summary table useful in remembering the differences
between the six mapping functions:

                                        Value returned is
Argument to functional is     l_arg1     list of results     nconc of results

elements of list              mapc       mapcar              mapcan
sublists                      map        maplist             mapcon
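A few calls illustrating the distinctions the mapping functions draw. The argument lists are arbitrary, and the comments state what each call should return given the descriptions above.

```lisp
(mapcar '1+ '(1 2 3))                      ; elements, list of results: (2 3 4)
(maplist 'length '(a b c))                 ; sublists, list of results: (3 2 1)
(mapcan '(lambda (x) (list x x)) '(a b))   ; elements, nconc of results: (a a b b)
(mapc 'print '(a b))                       ; elements, used for effect: returns (a b)
```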


(mfunction t_entry 's_disc)
RETURNS: a lisp object of type binary composed of t_entry and s_disc.
NOTE: t_entry is a pointer to the machine code for a function, and s_disc is the discipline
(e.g. lambda).
(oblist)
RETURNS: a list of all symbols on the oblist.

(or [g_arg1 ...])
RETURNS: the value of the first non-null argument or nil if all arguments evaluate to nil.
NOTE: Evaluation proceeds left to right and stops as soon as one of the arguments
evaluates to a non-null value.
(prog l_vrbls g_exp1 ...)
RETURNS: the value explicitly given in a return form or else nil if no return is done by the
time the last g_expi is evaluated.
NOTE: the local variables are lambda bound to nil then the g_expi are evaluated from left
to right. This is a prog body (obviously) and this means that any symbols seen are
not evaluated, instead they are treated as labels. This also means that return's and
go's are allowed.
(prog1 'g_exp1 ['g_exp2 ...])
RETURNS: g_exp1
(prog2 'g_exp1 'g_exp2 ['g_exp3 ...])
RETURNS: g_exp2
NOTE: the forms are evaluated from left to right and the value of g_exp2 is returned.
(progn 'g_exp1 ['g_exp2 ...])
RETURNS: the last g_expi.
(progv 'l_locv 'l_initv g_exp1 ...)
WHERE: l_locv is a list of symbols and l_initv is a list of expressions.
RETURNS: the value of the last g_expi evaluated.
NOTE: The expressions in l_initv are evaluated from left to right and then lambda-bound
to the symbols in l_locv. If there are too few expressions in l_initv then the
missing values are assumed to be nil. If there are too many expressions in l_initv then
the extra ones are ignored (although they are evaluated). Then the g_expi are
evaluated left to right. The body of a progv is like the body of a progn, it is not a
prog body. (Cf. let)


(purcopy 'g_exp)
RETURNS: a copy of g_exp with new pure cells allocated wherever possible.
NOTE: pure space is never swept up by the garbage collector, so this should only be done
on expressions which are not likely to become garbage in the future. In certain
cases, data objects in pure space become read-only after a dumplisp and then an
attempt to modify the object will result in an illegal memory reference.
(purep 'g_exp)
RETURNS: t iff the object g_exp is in pure space.

(putd 's_name 'u_func)
RETURNS: u_func
SIDE EFFECT: this sets the function binding of symbol s_name to u_func.
(return ['g_val])
RETURNS: g_val (or nil if g_val is not present) from the enclosing prog or do body.
NOTE: this form is only valid in the context of a prog or do body.
(selectq 'g_key-form [l_clausel .. .])
NOTE: This function is just like caseq (see above), except that the symbol otherwise has
the same semantics as the symbol t, when used as a comparator.
(setarg 'x_argnum 'g_val)
WHERE: x_argnum is greater than zero and less than or equal to the number of
arguments to the lexpr.
RETURNS: g_val
SIDE EFFECT: the lexpr's x_argnum'th argument is set to g_val.
NOTE: this can only be used within the body of a lexpr.
(throw 'g_val [s_tag])
WHERE: if s_tag is not given, it is assumed to be nil.
RETURNS: the value of (*throw 's_tag 'g_val).

(*throw 's_tag 'g_val)
RETURNS: g_val from the first enclosing catch with the tag s_tag or with no tag at all.
NOTE: this is used in conjunction with *catch to cause a clean jump to an enclosing
context.
(unwind-protect g_protected [g_cleanupl ... ])
RETURNS: the result of evaluating g_protected.
NOTE: Normally g_protected is evaluated and its value remembered, then the g_cleanupi
are evaluated and finally the saved value of g_protected is returned. If something
should happen when evaluating g_protected which causes control to pass through
g_protected and thus through the call to the unwind-protect, then the g_cleanupi
will still be evaluated. This is useful if g_protected does something sensitive which
must be cleaned up whether or not g_protected completes.
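A typical use is releasing a port whatever happens while it is in use. This is a sketch only: the function name write-report and the file name report.out are hypothetical, and it assumes outfile returns a port open for writing.

```lisp
; The port is closed whether print completes normally or an error
; or throw passes control out through the unwind-protect.
(defun write-report (data)
  (let ((port (outfile 'report.out)))
    (unwind-protect
      (print data port)
      (close port))))
```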


CHAPTER 5
Input/Output

The following functions are used to read from and write to external devices (e.g. files)
and programs (through pipes). All I/O goes through the lisp data type called the port. A port
may be open for either reading or writing, but usually not both simultaneously (see fileopen).
There are only a limited number of ports (20) and they will not be reclaimed unless they are
closed. All ports are reclaimed by a resetio call, but this drastic step won't be necessary if the
program closes what it uses.
If a port argument is not supplied to a function which requires one or if a bad port argument (such as nil) is given, then FRANZ LISP will use the default port according to this scheme:
If input is being done then the default port is the value of the symbol piport and if output is
being done then the default port is the value of the symbol poport. Furthermore, if the value
of piport or poport is not a valid port, then the standard input or standard output will be used,
respectively.
The standard input and standard output are usually the keyboard and terminal display
unless your job is running in the background and its input or output is connected to a pipe. All
output which goes to the standard output will also go to the port ptport if it is a valid port.
Output destined for the standard output will not reach the standard output if the symbol ^w is
non-nil (although it will still go to ptport if ptport is a valid port).
Some of the functions listed below reference files directly. FRANZ LISP has borrowed a
convenient shorthand notation from /bin/csh, concerning naming files. If a file name begins
with ~ (tilde), and the symbol tilde-expansion is bound to something other than nil, then
FRANZ LISP expands the file name. It takes the string of characters between the leading
tilde and the first slash as a user-name. Then, that initial segment of the filename is
replaced by the home directory of the user. The null username
is taken to be the current user.
Having gone to the effort of searching the password file, FRANZ LISP remembers the user
directory, in case it gets asked to do so again. Tilde-expansion is performed in the following
functions: cfasl, chdir, fasl, ffasl, fileopen, infile, load, outfile, probef, sys:access, sys:unlink.
(cfasl 'st_file 'st_entry 'st_funcname ['st_disc ['st_library]])
RETURNS: t
SIDE EFFECT: This is used to load in a foreign function (see §8.4). The object file st_file
is loaded into the lisp system. St_entry should be an entry point in the file
just loaded. The function binding of the symbol s_funcname will be set to
point to st_entry, so that when the lisp function s_funcname is called,
st_entry will be run. st_disc is the discipline to be given to s_funcname.
st_disc defaults to "subroutine" if it is not given or if it is given as nil. If
st_library is non-null, then after st_file is loaded, the libraries given in
st_library will be searched to resolve external references. The form of
st_library should be something like "-lS -lm". The C library ("-lc") is
always searched so when loading in a C file you probably won't need to
specify a library. For Fortran files, you should specify "-lF77" and if you
are doing any I/O, the library entry should be "-lI77 -lF77". For Pascal files
"-lpc" is required.

NOTE: This function may be used to load the output of the assembler, C compiler, Fortran
compiler, and Pascal compiler but NOT the lisp compiler (use fasl for that). If a
file has more than one entry point, then use getaddress to locate and set up other
foreign functions.
It is an error to load in a file which has a global entry point of the same name as a
global entry point in the running lisp. As soon as you load in a file with cfasl, its
global entry points become part of the lisp's entry points. Thus you cannot cfasl in
the same file twice unless you use removeaddress to change certain global entry
points to local entry points.

(close 'p_port)
RETURNS: t
SIDE EFFECT: the specified port is drained and closed, releasing the port.
NOTE: The standard defaults are not used in this case since you probably never want to
close the standard output or standard input.
(cprintf 'st_format 'xfst_val ['p_port])
RETURNS: xfst_val
SIDE EFFECT: The UNIX formatted output function printf is called with arguments
st_format and xfst_val. If xfst_val is a symbol then its print name is passed
to printf. The format string may contain characters which are just printed
literally and it may contain special formatting commands preceded by a percent sign. The complete set of formatting characters is described in the
UNIX manual. Some useful ones are %d for printing a fixnum in decimal,
%f or %e for printing a flonum, and %s for printing a character string (or
print name of a symbol).
EXAMPLE: (cprintf "Pi equals %f" 3.14159) prints 'Pi equals 3.14159'

(drain ['p_port])
RETURNS: nil
SIDE EFFECT: If this is an output port then the characters in the output buffer are all sent
to the device. If this is an input port then all pending characters are
flushed. The default port for this function is the default output port.
(ex [s_filename])
(vi [s_filename])
(exl [s_filename])
(vil [s_filename])
RETURNS:
SIDE EFFECT: The lisp system starts up an editor on the file named as the argument. It
will try appending .l to the file if it can't find it. The functions exl and vil
will load the file after you finish editing it. These functions will also
remember the name of the file so that on subsequent invocations, you
don't need to provide the argument.
NOTE: These functions do not evaluate their argument.

(fasl 'st_name ['st_mapf ['g_warn]])
WHERE: st_mapf and g_warn default to nil.
RETURNS: t if the function succeeded, nil otherwise.
SIDE EFFECT: this function is designed to load in an object file generated by the lisp compiler Liszt. File names for object files usually end in '.o', so fasl will
append '.o' to st_name (if it is not already present). If st_mapf is non nil,
then it is the name of the map file to create. Fasl writes in the map file the
names and addresses of the functions it loads and defines. Normally the
map file is created (i.e. truncated if it exists), but if (sstatus appendmap t) is
done then the map file will be appended. If g_warn is non nil and if a function is loaded from the file which is already defined, then a warning message will be printed.
NOTE: fasl only looks in the current directory for the file to load. The function load looks
through a user-supplied search path and will call fasl if it finds a file with the same
root name and a '.o' extension. In most cases the user would be better off using
the function load rather than calling fasl directly.
(ffasl 'st_file 'st_entry 'st_funcname ['st_discipline ['st_library]])
RETURNS: the binary object created.
SIDE EFFECT: the Fortran object file st_file is loaded into the lisp system. St_entry should
be an entry point in the file just loaded. A binary object will be created and
its entry field will be set to point to st_entry. The discipline field of the
binary will be set to st_discipline or "subroutine" by default. If st_library is
present and non-null, then after st_file is loaded, the libraries given in
st_library will be searched to resolve external references. The form of
st_library should be something like "-lS -ltermcap". In any case, the standard Fortran libraries will be searched also to resolve external references.
NOTE: in F77 on Unix, the entry point for the fortran function foo is named '_foo_'.
(filepos 'p_port ['x_pos])
RETURNS: the current position in the file if x_pos is not given or else x_pos if x_pos is
given.
SIDE EFFECT: If x_pos is given, the next byte to be read or written to the port will be at
position x_pos.
(filestat 'st_filename)
RETURNS: a vector containing various numbers which the UNIX operating system assigns
to files. If the file doesn't exist, an error is invoked. Use probef to determine if
the file exists.
NOTE: The individual entries can be accessed by mnemonic functions of the form
filestat:field, where field may be any of atime, ctime, dev, gid, ino, mode, mtime,
nlink, rdev, size, type, uid. See the UNIX programmer's manual for a more
detailed description of these quantities.

(flatc 'g_form ['x_max])
RETURNS: the number of characters required to print g_form using patom. If x_max is
given, and if flatc determines that it will return a value greater than x_max,
then it gives up and returns the current value it has computed. This is useful if
you just want to see if an expression is larger than a certain size.
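The early exit controlled by x_max can be illustrated with a small Python sketch. It assumes the printed form is available as a sequence of text pieces; flat_count and pieces are hypothetical names, and the real function walks a lisp object rather than a list of strings.

```python
def flat_count(pieces, x_max=None):
    """Count printed characters, giving up once the running total exceeds x_max."""
    total = 0
    for piece in pieces:
        total += len(piece)
        if x_max is not None and total > x_max:
            return total    # partial count: enough to know the form is larger than x_max
    return total
```

The caller compares the result against x_max; a value above x_max means only that the form is too big, not how big it is.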
(flatsize 'g_form ['x_max])
RETURNS: the number of characters required to print g_form using print. The meaning of
x_max is the same as for flatc.
NOTE: Currently this just explode's g_form and checks its length.
(fileopen 'st_filename 'st_mode)
RETURNS: a port for reading or writing (depending on st_mode) the file st_filename.
SIDE EFFECT: the given file is opened (or created if opened for writing and it doesn't yet
exist).
NOTE: this function call provides a direct interface to the operating system's fopen function. The mode may be more than just "r" for read, "w" for write or "a" for append.
The modes "r+", "w+" and "a+" permit both reading and writing on a port provided that fseek is done between changes in direction. See the UNIX manual
description of fopen for more details. This routine does not look through a search
path for a given file.
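Python's built-in open accepts the same fopen-style mode strings, so the modes named in the note can be tried out there. The sketch below is Python, not Franz Lisp; the helper names and the temporary file are illustrative only. It shows "w", "a" and "r+", with a seek between writing and reading as the note requires.

```python
import os
import tempfile


def overwrite_then_reread(path):
    """Open in "r+" mode, write, seek, then read back through the same handle."""
    with open(path, "r+") as port:
        port.write("HELLO")      # overwrites the first five characters
        port.seek(0)             # reposition before switching from writing to reading
        return port.read()


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as port:    # "w" creates or truncates
            port.write("hello world")
        with open(path, "a") as port:    # "a" appends at the end
            port.write("!")
        return overwrite_then_reread(path)
    finally:
        os.remove(path)
```

Calling demo() returns "HELLO world!": the append added the final character and the "r+" write replaced the start of the file in place.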

(fseek 'p_port 'x_offset 'x_flag)
RETURNS: the position in the file after the function is performed.
SIDE EFFECT: this function positions the read/write pointer before a certain byte in the
file. If x_flag is 0 then the pointer is set to x_offset bytes from the beginning of the file. If x_flag is 1 then the pointer is set to x_offset bytes from
the current location in the file. If x_flag is 2 then the pointer is set to
x_offset bytes from the end of the file.
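The three x_flag values correspond to the whence codes of the C library's fseek, which Python's seek also uses. A short Python sketch on an in-memory ten-byte buffer (the buffer contents and the function name are illustrative):

```python
import io


def seek_positions():
    """Exercise whence codes 0, 1 and 2 on a ten-byte in-memory file."""
    port = io.BytesIO(b"0123456789")
    from_start = port.seek(3, 0)     # 3 bytes from the beginning -> position 3
    from_here = port.seek(2, 1)      # 2 bytes past the current position -> 5
    from_end = port.seek(-4, 2)      # 4 bytes before the end -> 6
    next_byte = port.read(1)         # the pointer sits just before this byte
    return from_start, from_here, from_end, next_byte
```

seek_positions() returns (3, 5, 6, b"6"); each call reports the position after the move, as fseek is described as doing above.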

(infile 's_filename)
RETURNS: a port ready to read s_filename.
SIDE EFFECT: this tries to open s_filename and if it cannot or if there are no ports available it gives an error message.
NOTE: to allow your program to continue on a file-not-found error, you can use something
like:
(cond ((null (setq myport (car (errset (infile name) nil))))
       (patom "couldn't open the file")))

which will set myport to the port to read from if the file exists or will print a message if it couldn't open it and also set myport to nil. To simply determine if a file
exists, there is a function named probef.

(load 's_filename ['st_map ['g_warn]])
RETURNS: t
NOTE: The function of load has changed since previous releases of FRANZ LISP and the
following description should be read carefully.
SIDE EFFECT: load now serves the function of both fasl and the old load. Load will search
a user defined search path for a lisp source or object file with the filename
s_filename (with the extension .l or .o added as appropriate). The search
path which load uses is the value of (status load-search-path). The default is
(|.| /usr/lib/lisp) which means look in the current directory first and then
/usr/lib/lisp. The file which load looks for depends on the last two characters of s_filename. If s_filename ends with ".l" then load will only look for
a file named s_filename and will assume that this is a FRANZ LISP source file.
If s_filename ends with ".o" then load will only look for a file named
s_filename and will assume that this is a FRANZ LISP object file to be fasled
in. Otherwise, load will first look for s_filename.o, then s_filename.l and
finally s_filename itself. If it finds s_filename.o it will assume that this is an
object file, otherwise it will assume that it is a source file. An object file is
loaded using fasl and a source file is loaded by reading and evaluating each
form in the file. The optional arguments st_map and g_warn are passed to
fasl should fasl be called.
NOTE: load requires a port to open the file s_filename. It then lambda binds the symbol
piport to this port and reads and evaluates the forms.
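The name-selection rule that load applies can be written down as a small Python sketch. It models only the order of candidate names and whether each is treated as source or object; how the candidates interleave with the directories of the search path is not stated above and is left out. The names load_names, resolve and exists are hypothetical.

```python
def load_names(filename):
    """Return (name, kind) candidates in the order load is described as trying them."""
    if filename.endswith(".l"):
        return [(filename, "source")]
    if filename.endswith(".o"):
        return [(filename, "object")]
    return [(filename + ".o", "object"),
            (filename + ".l", "source"),
            (filename, "source")]


def resolve(filename, exists):
    """Pick the first candidate for which exists(name) is true, or None."""
    for name, kind in load_names(filename):
        if exists(name):
            return name, kind
    return None
```

An "object" result corresponds to loading with fasl; a "source" result corresponds to reading and evaluating each form.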
(makereadtable ['s_flag])
WHERE: if s_flag is not present it is assumed to be nil.
RETURNS: a readtable equal to the original readtable if s_flag is non-null, or else equal to
the current readtable. See chapter 7 for a description of readtables and their
uses.
(msg [l_option ...] ['g_msg ...])
NOTE: This function is intended for printing short messages. Any of the arguments or
options presented can be used any number of times, in any order. The messages
themselves (g_msg) are evaluated, and then they are transmitted to patom. Typically, they are strings, which evaluate to themselves. The options are interpreted
specially:

msg Option Summary

(P p_portname)    causes subsequent output to go to the port p_portname
                  (port should be opened previously)
B                 print a single blank.
(B 'n_b)          evaluate n_b and print that many blanks.
N                 print a single newline by calling terpr.
(N 'n_n)          evaluate n_n and transmit that many newlines to the stream.
D                 drain the current port.

(nwritn ['p_port])
RETURNS: the number of characters in the buffer of the given port but not yet written out
to the file or device. The buffer is flushed automatically when filled, or when
terpr is called.
(outfile 's_filename ['st_type])
RETURNS: a port or nil
SIDE EFFECT: this opens a port to write s_filename. If st_type is given and if it is a symbol or string whose name begins with 'a', then the file will be opened in
append mode, that is the current contents will not be lost and the next data
will be written at the end of the file. Otherwise, the file opened is truncated by outfile if it existed beforehand. If there are no free ports, outfile
returns nil. If one cannot write on s_filename, an error is signalled.
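The choice between append and truncate depends only on the first character of st_type. A one-function Python sketch of that decision, using fopen-style mode letters for the result (outfile_mode is a hypothetical name):

```python
def outfile_mode(st_type=None):
    """Return "a" (append) when st_type's name starts with 'a', else "w" (truncate)."""
    if st_type is not None and str(st_type).startswith("a"):
        return "a"
    return "w"
```

So both "a" and "append" select append mode, while a missing st_type or any other name leads to truncation.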
(patom 'g_exp ['p_port])
RETURNS: g_exp
SIDE EFFECT: g_exp is printed to the given port or the default port. If g_exp is a symbol
or string, the print name is printed without any escape characters around
special characters in the print name. If g_exp is a list then patom has the
same effect as print.
(pntlen 'xfs_arg)
RETURNS: the number of characters needed to print xfs_arg.

(portp 'g_arg)
RETURNS: t iff g_arg is a port.
(pp [l_option] s_name1 ...)
RETURNS: t
SIDE EFFECT: If s_namei has a function binding, it is pretty-printed, otherwise if s_namei
has a value then that is pretty-printed. Normally the output of the pretty-printer goes to the standard output port poport. The options allow you to
redirect it.

PP Option Summary

(F s_filename)      direct future printing to s_filename
(P p_portname)      causes output to go to the port p_portname
                    (port should be opened previously)
(E g_expression)    evaluate g_expression and don't print

(princ 'g_arg ['p_port])
EQUIVALENT TO: patom.
(print 'g_arg ['p_port])
RETURNS: nil
SIDE EFFECT: prints g_arg on the port p_port or the default port.
(probef 'st_file)
RETURNS: t iff the file st_file exists.
NOTE: Just because it exists doesn't mean you can read it.
(pp-form 'g_form ['p_port])
RETURNS: t
SIDE EFFECT: g_form is pretty-printed to the port p_port (or poport if p_port is not
given). This is the function which pp uses. pp-form does not look for function definitions or values of variables; it just prints out the form it is given.
NOTE: This is useful as a top-level-printer, cf. top-level in Chapter 6.

(ratom ['p_port ['g_eof]])
RETURNS: the next atom read from the given or default port. On end of file, g_eof
(default nil) is returned.
(read ['p_port ['g_eof]])
RETURNS: the next lisp expression read from the given or default port. On end of file,
g_eof (default nil) is returned.
NOTE: An error will occur if the reader is given an ill formed expression. The most common error is too many right parentheses (note that this is not considered an error
in Maclisp).
(readc ['p_port ['g_eof]])
RETURNS: the next character read from the given or default port. On end of file, g_eof
(default nil) is returned.
(readlist 'l_arg)
RETURNS: the lisp expression read from the list of characters in l_arg.
(removeaddress 's_name1 ['s_name2 ...])
RETURNS: nil
SIDE EFFECT: the entries for the s_namei in the Lisp symbol table are removed. This is
useful if you wish to cfasl or ffasl in a file twice, since it is illegal for a symbol in the file you are loading to already exist in the lisp symbol table.
(resetio)
RETURNS: nil
SIDE EFFECT: all ports except the standard input, output and error are closed.
(setsyntax 's_symbol 's_synclass ['ls_func])
RETURNS: t
SIDE EFFECT: this sets the code for s_symbol to s_synclass in the current readtable. If
s_synclass is macro or splicing then ls_func is the associated function. See
Chapter 7 on the reader for more details.
(sload 's_file)
SIDE EFFECT: the file s_file (in the current directory) is opened for reading and each form
is read, printed and evaluated. If the form is recognizable as a function
definition, only its name will be printed, otherwise the whole form is
printed.
NOTE: This function is useful when a file refuses to load because of a syntax error and you
would like to narrow down where the error is.

(tab 'x_col ['p_port])
SIDE EFFECT: enough spaces are printed to put the cursor on column x_col. If the cursor
is beyond x_col to start with, a terpr is done first.
(terpr ['p_port])
RETURNS: nil
SIDE EFFECT: a terminate line character sequence is sent to the given port or the default
port. This will also drain the port.

(terpri ['p_port])
EQUIVALENT TO: terpr.

(tilde-expand 'st_filename)
RETURNS: a symbol whose pname is the tilde-expansion of the argument, (as discussed at
the beginning of this chapter). If the argument does not begin with a tilde, the
argument itself is returned.
(tyi ['p_port])
RETURNS: the fixnum representation of the next character read. On end of file, -1 is
returned.

(tyipeek ['p_port])
RETURNS: the fixnum representation of the next character to be read.
NOTE: This does not actually read the character, it just peeks at it.
(tyo 'x_char ['p_port])
RETURNS: x_char.
SIDE EFFECT: the character whose fixnum representation is x_char is printed on the
given output port or the default output port.
(untyi 'x_char ['p_port])
SIDE EFFECT: x_char is put back in the input buffer so a subsequent tyi or read will read it
first.
NOTE: a maximum of one character may be put back.
(username-to-dir 'st_name)
RETURNS: the home directory of the given user. The result is stored, to avoid unnecessarily searching the password file.
(zapline)
RETURNS: nil
SIDE EFFECT: all characters up to and including the line termination character are read
and discarded from the last port used for input.
NOTE: this is used as the macro function for the semicolon character when it acts as a
comment character.

CHAPTER 6

System Functions

This chapter describes the functions used to interact with internal components of the Lisp
system and operating system.
(allocate 's_type 'x_pages)
WHERE: s_type is one of the FRANZ LISP data types described in §1.3.
RETURNS: x_pages.
SIDE EFFECT: FRANZ LISP attempts to allocate x_pages of type s_type. If there aren't
x_pages of memory left, no space will be allocated and an error will occur.
The storage that is allocated is not given to the caller, instead it is added to
the free storage list of s_type. The functions segment and small-segment
allocate blocks of storage and return it to the caller.
(argv 'x_argnumb)
RETURNS: a symbol whose pname is the x_argnumbth argument (starting at 0) on the
command line which invoked the current lisp.
NOTE: if x_argnumb is less than zero, a fixnum whose value is the number of arguments
on the command line is returned. (argv 0) returns the name of the lisp you are
running.
(baktrace)
RETURNS: nil
SIDE EFFECT: the lisp runtime stack is examined and the names of (most of) the functions
currently in execution are printed, most active first.
NOTE: this will occasionally miss the names of compiled lisp functions due to incomplete
information on the stack. If you are tracing compiled code, then baktrace won't be
able to interpret the stack unless (sstatus translink nil) was done. See the function
showstack for another way of printing the lisp runtime stack.
(boundp 's_name)
RETURNS: nil if s_name is unbound, that is, it has never been given a value. If s_name has
the value g_val, then (nil . g_val) is returned.

(chdir 's_path)
RETURNS: t iff the system call succeeds.
SIDE EFFECT: the current directory is set to s_path. Among other things, this will affect the
default location where the input/output functions look for and create files.
NOTE: chdir follows the standard UNIX conventions: if s_path does not begin with a slash,
the default path is changed to the current path with s_path appended. Chdir
employs tilde-expansion (discussed in Chapter 5).

(command-line-args)
RETURNS: a list of the arguments typed on the command line, either to the lisp interpreter,
or saved lisp dump, or application compiled with the autorun option (liszt -r).
(deref 'x_addr)
RETURNS: The contents of x_addr, when thought of as a longword memory location.
NOTE: This may be useful in constructing arguments to C functions out of 'dangerous'
areas of memory.
(dumplisp s_name)
RETURNS: nil
SIDE EFFECT: the current lisp is dumped to the named file. When s_name is executed,
you will be in a lisp in the same state as when the dumplisp was done.
NOTE: dumplisp will fail if one tries to write over the current running file. UNIX does not
allow you to modify the file you are running.

(eval-when l_time g_exp1 ...)
SIDE EFFECT: l_time may contain any combination of the symbols load, eval, and compile.
The effects of load and compile are discussed in §12.3.2.1, which covers the compiler. If eval
is present however, this simply means that the expressions g_exp1 and so
on are evaluated from left to right. If eval is not present, the forms are not
evaluated.
(exit ['x_code])
RETURNS: nothing (it never returns).
SIDE EFFECT: the lisp system dies with exit code x_code or 0 if x_code is not specified.
(fake 'x_addr)
RETURNS: the lisp object at address x_addr.
NOTE: This is intended to be used by people debugging the lisp system.

(fork)
RETURNS: nil to the child process and the process number of the child to the parent.
SIDE EFFECT: A copy of the current lisp system is made in memory and both lisp systems
now begin to run. This function can be used interactively to temporarily
save the state of Lisp (as shown below), but you must be careful that only
one of the lisps interacts with the terminal after the fork. The wait function is useful for this.

-> (setq foo 'bar)              ;; set a variable
bar
-> (cond ((fork)(wait)))        ;; duplicate the lisp system and
nil                             ;; make the parent wait
-> foo                          ;; check the value of the variable
bar
-> (setq foo 'baz)              ;; give it a new value
baz
-> foo                          ;; make sure it worked
baz
-> (exit)                       ;; exit the child
(5274 . 0)                      ;; the wait function returns this
-> foo                          ;; we check to make sure parent was
bar                             ;; not modified.
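The same parent-waits-for-child pattern can be sketched in Python on a UNIX system, assuming os.fork is available. This is an analogue of the session, not Franz Lisp: Python's fork returns 0 in the child where the lisp fork returns nil, and the function name is hypothetical.

```python
import os


def fork_and_wait():
    """Fork, let the child change its own copy of a variable, and wait in the parent."""
    foo = "bar"
    pid = os.fork()
    if pid == 0:                 # child process
        foo = "baz"              # changes only the child's copy
        os._exit(0)              # leave the child immediately with exit code 0
    # Parent: block until the child dies, much as (wait) does in the session.
    waited_pid, status = os.waitpid(pid, 0)
    return waited_pid == pid, os.WEXITSTATUS(status), foo
```

The parent sees the child's process id and exit code, and its own foo is still "bar", which is the point the session makes with its final check.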

(gc)
RETURNS: nil
SIDE EFFECT: this causes a garbage collection.
NOTE: The function gcafter is not called automatically after this function finishes.
Normally the user doesn't have to call gc since garbage collection occurs automatically
whenever internal free lists are exhausted.

(gcafter s_type)
WHERE: s_type is one of the FRANZ LISP data types listed in §1.3.
NOTE: this function is called by the garbage collector after a garbage collection which was
caused by running out of data type s_type. This function should determine if more
space need be allocated and if so should allocate it. There is a default gcafter function but users who want control over space allocation can define their own -- but
note that it must be an nlambda.
(getenv 's_name)
RETURNS: a symbol whose pname is the value of s_name in the current UNIX environment. If s_name doesn't exist in the current environment, a symbol with a null
pname is returned.

(hashtabstat)
RETURNS: a list of fixnums representing the number of symbols in each bucket of the
oblist.
NOTE: the oblist is stored as a hash table of buckets. Ideally there would be the same
number of symbols in each bucket.
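To make the idea of per-bucket counts concrete, here is a small Python sketch. The hash used (sum of character codes modulo the bucket count) is purely an assumption for illustration and is not claimed to be the one FRANZ LISP uses; bucket_counts is a hypothetical name.

```python
def bucket_counts(names, n_buckets):
    """Count how many names fall into each bucket under a toy hash function."""
    counts = [0] * n_buckets
    for name in names:
        counts[sum(map(ord, name)) % n_buckets] += 1
    return counts
```

A result such as [0, 2, 1, 1] shows an uneven spread; a perfectly even table would have the same count in every position.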
(help [sx_arg])
SIDE EFFECT: If sx_arg is a symbol then the portion of this manual beginning with the
description of sx_arg is printed on the terminal. If sx_arg is a fixnum or
the name of one of the appendices, that chapter or appendix is printed on
the terminal. If no argument is provided, help prints the options that it
recognizes. The program 'more' is used to print the manual on the terminal; it will stop after each page and will continue after the space key is
pressed.
(include s_filename)
RETURNS: nil
SIDE EFFECT: The given filename is loaded into the lisp.
NOTE: this is similar to load except the argument is not evaluated. Include means something special to the compiler.
(include-if 'g_predicate s_filename)
RETURNS: nil
SIDE EFFECT: This has the same effect as include, but is only actuated if the predicate is
non-nil.
(includef 's_filename)
RETURNS: nil
SIDE EFFECT: this is the same as include except the argument is evaluated.
(includef-if 'g_predicate s_filename)
RETURNS: nil
SIDE EFFECT: This has the same effect as includef, but is only actuated if the predicate is
non-nil.
(maknum 'g_arg)
RETURNS: the address of its argument converted into a fixnum.
(monitor ['xs_maxaddr])
RETURNS: t
SIDE EFFECT: If xs_maxaddr is t then profiling of the entire lisp system is begun. If
xs_maxaddr is a fixnum then profiling is done only up to address
xs_maxaddr. If xs_maxaddr is not given, then profiling is stopped and the
data obtained is written to the file 'mon.out' where it can be analyzed with
the UNIX 'prof' program.
NOTE: this function only works if the lisp system has been compiled in a special way; otherwise, an error is invoked.

(opval 's_arg ['g_newval])
RETURNS: the value associated with s_arg before the call.
SIDE EFFECT: If g_newval is specified, the value associated with s_arg is changed to
g_newval.
NOTE: opval keeps track of storage allocation. If s_arg is one of the data types then opval
will return a list of three fixnums representing the number of items of that type in
use, the number of pages allocated and the number of items of that type per page.
You should never try to change the value opval associates with a data type using
opval.
If s_arg is pagelimit then opval will return (and set if g_newval is given) the maximum amount of lisp data pages it will allocate. This limit should remain small
unless you know your program requires lots of space as this limit will catch programs in infinite loops which gobble up memory.
(*process 'st_command ['g_readp ['g_writep]])
RETURNS: either a fixnum if one argument is given, or a list of two ports and a fixnum if
two or three arguments are given.
NOTE: *process starts another process by passing st_command to the shell (it first tries
/bin/csh, then it tries /bin/sh if /bin/csh doesn't exist). If only one argument is
given to *process, *process waits for the new process to die and then returns the exit
code of the new process. If two or three arguments are given, *process starts
the process and then returns a list which, depending on the value of g_readp and
g_writep, may contain i/o ports for communicating with the new process. If
g_writep is non-null, then a port will be created which the lisp program can use to
send characters to the new process. If g_readp is non-null, then a port will be
created which the lisp program can use to read characters from the new process.
The value returned by *process is (readport writeport pid) where readport and writeport are either nil or a port based on the value of g_readp and g_writep. Pid is
the process id of the new process. Since it is hard to remember the order of
g_readp and g_writep, the functions *process-send and *process-receive were written
to perform the common functions.
(*process-receive 'st_command)
RETURNS: a port which can be read.
SIDE EFFECT: The command st_command is given to the shell and it is started running in
the background. The output of that command is available for reading via
the port returned. The input of the command process is set to /dev/null.
(*process-send 'st_command)
RETURNS: a port which can be written to.
SIDE EFFECT: The command st_command is given to the shell and it is started running in
the background. The lisp program can provide input for that command by
sending characters to the port returned by this function. The output of the
command process is set to /dev/null.

(process s_pgrm [s_frompipe s_topipe])
RETURNS: if the optional arguments are not present, a fixnum which is the exit code when
s_pgrm dies. If the optional arguments are present, it returns a fixnum which is
the process id of the child.
NOTE: This command is obsolete. New programs should use one of the *process commands given above.
SIDE EFFECT: If s_frompipe and s_topipe are given, they are bound to ports which are
pipes which direct characters from FRANZ LISP to the new process and to
FRANZ LISP from the new process respectively. Process forks a process
named s_pgrm and waits for it to die iff there are no pipe arguments given.
(ptime)
RETURNS: a list of two elements, the first is the amount of processor time used by the lisp
system so far, the second is the amount of time used by the garbage collector so
far.
NOTE: the time is measured in those units used by the times(2) system call, usually 60ths
of a second. The first number includes the second number. The amount of time
used by garbage collection is not recorded until the first call to ptime. This is done
to prevent overhead when the user is not interested in garbage collection times.
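Because the first number includes the second, the time spent outside the garbage collector is their difference. A short Python sketch of the conversion, assuming 60 ticks per second as the note says is usual; the constant and the function name are illustrative.

```python
TICKS_PER_SECOND = 60   # "usually 60ths of a second"; an assumption for any given system


def ptime_seconds(ptime_result):
    """Convert a (total, gc) pair of tick counts into seconds: total, gc, non-gc."""
    total_ticks, gc_ticks = ptime_result
    non_gc_ticks = total_ticks - gc_ticks      # the first number includes the second
    return (total_ticks / TICKS_PER_SECOND,
            gc_ticks / TICKS_PER_SECOND,
            non_gc_ticks / TICKS_PER_SECOND)
```

For a result of (600 120), this gives 10 seconds in total, 2 of them in garbage collection and 8 elsewhere.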
(reset)
SIDE EFFECT: the lisp runtime stack is cleared and the system restarts at the top level by
executing a (funcall top-level nil).
(restorelisp 's_name)
SIDE EFFECT: this reads in file s_name (which was created by savelisp) and then does a
(reset).
NOTE: This is only used on VMS systems where dumplisp cannot be used.
(retbrk ['x_level])
WHERE: x_level is a small integer of either sign.
SIDE EFFECT: The default error handler keeps a notion of the current level of the error
caught. If x_level is negative, control is thrown to this default error
handler whose level is that many less than the present, or to top-level if
there aren't enough. If x_level is non-negative, control is passed to the
handler at that level. If x_level is not present, the value -1 is taken by
default.
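The level arithmetic for retbrk can be written as a small Python sketch. It assumes, purely for illustration, that top-level is represented by level 0; retbrk_target is a hypothetical name.

```python
def retbrk_target(current_level, x_level=-1):
    """Return the handler level control would go to; 0 stands for top-level here."""
    if x_level < 0:
        # Relative move: that many levels below the present one, or top-level
        # if there aren't enough levels.
        return max(current_level + x_level, 0)
    return x_level    # non-negative: go to the handler at exactly that level
```

From level 3, the default call goes to level 2, while an argument of -5 ends up at top-level.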
(*rset 'g_flag)
RETURNS: g_flag
SIDE EFFECT: If g_flag is non nil then the lisp system will maintain extra information
about calls to eval and funcall. This record keeping slows down the evaluation but this is required for the functions evalhook, funcallhook, and evalframe to work. To debug compiled lisp code the transfer tables should be
unlinked: (sstatus translink nil)

(savelisp 's_name)
RETURNS: t
SIDE EFFECT: the state of the Lisp system is saved in the file s_name. It can be read in
by restorelisp.
NOTE: This is only used on VMS systems where dumplisp cannot be used.
(segment 's_type 'x_size)
WHERE: s_type is one of the data types given in §1.3
RETURNS: a segment of contiguous lispvals of type s_type.
NOTE: In reality, segment returns a new data cell of type s_type and allocates space for
x_size - 1 more s_type's beyond the one returned. Segment always allocates new
space and does so in 512 byte chunks. If you ask for 2 fixnums, segment will actually allocate 128 of them thus wasting 126 fixnums. The function small-segment is a
smarter space allocator and should be used whenever possible.
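The waste figure in the note follows from simple arithmetic: 512 bytes hold 128 fixnums, which implies 4 bytes per fixnum. A Python sketch of that calculation, assuming requests are rounded up to whole 512-byte chunks (the rounding for larger requests and the names below are assumptions):

```python
CHUNK_BYTES = 512


def segment_allocation(x_size, item_bytes=4):
    """Return (items allocated, items wasted) for a request of x_size items."""
    per_chunk = CHUNK_BYTES // item_bytes       # 128 fixnums per chunk at 4 bytes each
    chunks = -(-x_size // per_chunk)            # ceiling division: whole chunks only
    allocated = chunks * per_chunk
    return allocated, allocated - x_size
```

segment_allocation(2) returns (128, 126), the case quoted in the note.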
(shell)
RETURNS: the exit code of the shell when it dies.
SIDE EFFECT: this forks a new shell and returns when the shell dies.
(showstack)
RETURNS: nil
SIDE EFFECT: all forms currently in evaluation are printed, beginning with the most
recent. For compiled code the most that showstack will show is the function name and it may miss some functions.
(signal 'x_signum 's_name)
RETURNS: nil if no previous call to signal has been made, or the previously installed
s_name.
SIDE EFFECT: this declares that the function named s_name will handle the signal number
x_signum. If s_name is nil, the signal is ignored. Presently only four
UNIX signals are caught, they and their numbers are: Interrupt(2), Floating
exception(8), Alarm(14), and Hang-up(1).
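The four signal numbers can be cross-checked against the constants in Python's signal module on a UNIX system. This sketch only looks the numbers up; it installs no handlers, and the dictionary and function names are illustrative.

```python
import signal

# The four signals the text says are caught, keyed by the names used there.
CAUGHT = {
    "Hang-up": signal.SIGHUP,
    "Interrupt": signal.SIGINT,
    "Floating exception": signal.SIGFPE,
    "Alarm": signal.SIGALRM,
}


def signal_numbers():
    """Map each signal name to its integer number on the current system."""
    return {name: int(sig) for name, sig in CAUGHT.items()}
```

On common UNIX systems this yields 1, 2, 8 and 14 respectively, matching the numbers listed above.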
(sizeof 'g_arg)
RETURNS: the number of bytes required to store one object of type g_arg, encoded as a
fixnum.
(small-segment 's_type 'x_cells)
WHERE: s_type is one of fixnum, flonum and value.
RETURNS: a segment of x_cells data objects of type s_type.
SIDE EFFECT: This may call segment to allocate new space or it may be able to fill the
request on a page already allocated. The value returned by small-segment is
usually stored in the data subpart of an array object.

(sstatus g_type g_val)
RETURNS: g_val
SIDE EFFECT: If g_type is not one of the special sstatus codes described in the next few
pages this simply sets g_val as the value of status type g_type in the system
status property list.
(sstatus appendmap g_val)
RETURNS: g_val
SIDE EFFECT: If g_val is non-null when fas/ is told to create a load map, it will append to
the file name given in the fas/ command, rather than creating a new map
file. The initial value is nil.
(sstatus automatic-reset g_val)
RETURNS: g_val
SIDE EFFECT: If g_val is non-null when an error occurs which no one wants to handle, a
reset will be done instead of entering a primitive internal break loop. The
initial value is t.
(sstatus chainatom g_val)
RETURNS: g_val
SIDE EFFECT: If g_val is non nil and a car or cdr of a symbol is done, then nil will be
returned instead of an error being signaled. This only affects the interpreter, not the compiler. The initial value is nil.
(sstatus dumpcore g_val)
RETURNS: g_val
SIDE EFFECT: If g_val is nil, FRANZ LISP tells UNIX that a segmentation violation or bus
error should cause a core dump. If g_val is non nil then FRANZ LISP will
catch those errors and print a message advising the user to reset.
NOTE: The initial value for this flag is nil, and only those knowledgeable of the innards of
the lisp system should ever set this flag non nil.
(sstatus dumpmode x_val)
RETURNS: x_val
SIDE EFFECT: All subsequent dumplisps will be done in mode x_val. x_val may be either
413 or 410 (decimal).
NOTE: the advantage of mode 413 is that the dumped Lisp can be demand paged in when
first started, which will make it start faster and disrupt other users less. The initial
value is 413.

(sstatus evalhook g_val)
RETURNS: g_val
SIDE EFFECT: When g_val is non nil, this enables the evalhook and funcallhook traps in
the evaluator. See §14.4 for more details.
(sstatus feature g_val)
RETURNS: g_val
SIDE EFFECT: g_val is added to the (status features) list.
(sstatus gcstrings g_val)
RETURNS: g_val
SIDE EFFECT: if g_val is non-null, and if string garbage collection was enabled when the
lisp system was compiled, string space will be garbage collected.
NOTE: the default value for this is nil since in most applications garbage collecting strings
is a waste of time.
(sstatus ignoreeof g_val)
RETURNS: g_val
SIDE EFFECT: If g_val is non-null when an end of file (CNTL-D on UNIX) is typed to the
standard top-level interpreter, it will be ignored rather than cause the lisp
system to exit. If the standard input is a file or pipe then this has no
effect: an EOF will always cause lisp to exit. The initial value is nil.
(sstatus nofeature g_val)
RETURNS: g_val
SIDE EFFECT: g_val is removed from the status features list if it was present.
(sstatus translink g_val)
RETURNS: g_val
SIDE EFFECT: If g_val is nil then all transfer tables are cleared and further calls through
the transfer table will not cause the fast links to be set up. If g_val is the
symbol on then all possible transfer table entries will be linked and the flag
will be set to cause fast links to be set up dynamically. Otherwise all that is
done is to set the flag to cause fast links to be set up dynamically. The initial value is nil.
NOTE: For a discussion of transfer tables, see §12.8.
(sstatus uctolc g_val)
RETURNS: g_val
SIDE EFFECT: If g_val is not nil then all unescaped capital letters in symbols read by the
reader will be converted to lower case.
NOTE: This allows FRANZ LISP to be compatible with single case lisp systems (e.g.
Maclisp, Interlisp and UCILisp).

(status g_code)
RETURNS: the value associated with the status code g_code if g_code is not one of the
special cases given below.
(status ctime)
RETURNS: a symbol whose print name is the current time and date.
EXAMPLE: (status ctime) = Sun Jun 29 16:51:26 198~
NOTE: This has been made obsolete by time-string, described below.

(status feature g_val)
RETURNS: t iff g_val is in the status features list.
(status features)
RETURNS: the value of the features code, which is a list of features which are present in
this system. You add to this list with (sstatus feature 'g_val) and test if feature
g_feat is present with (status feature 'g_feat).
(status isatty)
RETURNS: t iff the standard input is a terminal.

(status localtime)
RETURNS: a list of fixnums representing the current time.
EXAMPLE: (status localtime) = (3 51 13 31 6 81 5 211 1)
means 3rd second, 51st minute, 13th hour (1 p.m.), 31st day, month 6
(0 = January), year 81 (0 = 1900), day of the week 5 (0 = Sunday), 211th
day of the year and daylight savings time is in effect.
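
The nine fields in the example above appear in the same order as the members of C's struct tm, so the list can be decoded mechanically. The following Python sketch is an illustration only and is not part of FRANZ LISP: the function name decode_localtime and the dictionary keys are hypothetical, and the input is the list from the example.

```python
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday"]

def decode_localtime(fields):
    """Label the nine fixnums of a (status localtime) style list."""
    sec, minute, hour, mday, month, year, wday, yday, isdst = fields
    return {
        "second": sec,
        "minute": minute,
        "hour": hour,
        "day": mday,
        "month": MONTHS[month],      # 0 = January
        "year": 1900 + year,         # 0 = 1900
        "weekday": WEEKDAYS[wday],   # 0 = Sunday
        "day_of_year": yday,         # passed through unchanged
        "dst": bool(isdst),          # non-zero means daylight savings time
    }

decoded = decode_localtime([3, 51, 13, 31, 6, 81, 5, 211, 1])
print(decoded["weekday"], decoded["day"], decoded["month"], decoded["year"])
```

For the list in the example this prints Friday 31 July 1981, which agrees with the weekday field (5) given in the list.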
(status syntax s_char)
NOTE: This function should not be used. See the description of getsyntax (in Chapter 7)
for a replacement.
(status undeffunc)
RETURNS: a list of all functions which transfer table entries point to but which are not
defined at this point.
NOTE: Some of the undefined functions listed could be arrays which have yet to be
created.
(status version)
RETURNS: a string which is the current lisp version name.
EXAMPLE: (status version) = "Franz Lisp, Opus 38.61"

(syscall 'x_index ['xst_arg1 ... ])
RETURNS: the result of issuing the UNIX system call number x_index with arguments
xst_argi.
NOTE: The UNIX system calls are described in section 2 of the UNIX Programmer's
manual. If xst_argi is a fixnum, then its value is passed as an argument; if it is a
symbol then its pname is passed; and finally if it is a string then the string itself is
passed as an argument. Some useful syscalls are:
(syscall 20) returns process id.
(syscall 13) returns the number of seconds since Jan 1, 1970.
(syscall 10 'foo) will unlink (delete) the file foo.
(sys:access 'st_filename 'x_mode)
(sys:chmod 'st_filename 'x_mode)
(sys:gethostname)
(sys:getpid)
(sys:getpwnam 'st_username)
(sys:link 'st_oldfilename 'st_newfilename)
(sys:time)
(sys:unlink 'st_filename)
NOTE: We have been warned that the actual system call numbers may vary among
different UNIX systems. Users concerned about portability may wish to use this
group of functions. Another advantage is that tilde-expansion is performed on all
filename arguments. These functions do what is described in the system call section of your UNIX manual.
sys:getpwnam returns a vector of four entries from the password file, being the
user name, user id, group id, and home directory.

(time-string ['x_seconds])
RETURNS: an ascii string, giving the time and date which was x_seconds after UNIX's idea
of creation (Midnight, Jan 1, 1970 GMT). If no argument is given, time-string
returns the current date. This supplants (status ctime), and may be used to
make the results of filestat more intelligible.
(top-level)
RETURNS: nothing (it never returns)
NOTE: This function is the top-level read-eval-print loop. Its main utility is that if you
redefine it and do a (reset), then the redefined (top-level) is invoked. The default
top-level for Franz allows one to specify one's own printer or reader by binding the
symbols top-level-printer and top-level-reader. One can let the default top-level do most
of the drudgery in catching resets and reading in .lisprc files by binding the symbol
user-top-level to a routine that concerns itself only with the read-eval-print loop.

(wait)
RETURNS: a dotted pair (processid . status) when the next child process dies.


CHAPTER 7
The Lisp Reader

7.1. Introduction
The read function is responsible for converting a stream of characters into a Lisp
expression. Read is table driven and the table it uses is called a readtable. The print
function does the inverse of read: it converts a Lisp expression into a stream of
characters. Typically the conversion is done in such a way that if that stream of characters
were read by read, the result would be an expression equal to the one print was given.
Print must also refer to the readtable in order to determine how to format its output.
The explode function, which returns a list of characters rather than printing them, must
also refer to the readtable.
A readtable is created with the makereadtable function, modified with the setsyntax
function and interrogated with the getsyntax function. The structure of a readtable is
hidden from the user - a readtable should only be manipulated with the three functions
mentioned above.
There is one distinguished readtable called the current readtable whose value determines what read, print and explode do. The current readtable is the value of the symbol
readtable. Thus it is possible to rapidly change the current syntax by lambda binding a
different readtable to the symbol readtable. When the binding is undone, the syntax
reverts to its old form.
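
The effect of lambda binding readtable can be pictured as a save, replace, restore sequence around a body of code. The Python sketch below is only a model of that behaviour, not FRANZ LISP code: the dictionary current, the helper binding and the string values "standard" and "raw" are hypothetical stand-ins for the symbol readtable and for readtable objects.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the value cell of the Lisp symbol `readtable`.
current = {"readtable": "standard"}

@contextmanager
def binding(name, value):
    """Temporarily rebind `name`, restoring the old value on exit."""
    saved = current[name]
    current[name] = value
    try:
        yield
    finally:
        current[name] = saved      # the binding is undone here

def active_readtable():
    """What a reader or printer would consult at this moment."""
    return current["readtable"]

seen = [active_readtable()]
with binding("readtable", "raw"):
    seen.append(active_readtable())    # inside the binding
seen.append(active_readtable())        # after the binding is undone
print(seen)
```

The list printed is ['standard', 'raw', 'standard']: the alternate table is in force only while the binding exists.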

7.2. Syntax Classes
The readtable describes how each of the 128 ascii characters should be treated by
the reader and printer. Each character belongs to a syntax class which has three properties:

character class Tells what the reader should do when it sees this character. There are a large
number of character classes. They are described below.

separator Most types of tokens the reader constructs are one character long. Four token
types have an arbitrary length: number (1234), symbol print name (franz), escaped
symbol print name (|franz|), and string ("franz"). The reader can easily determine
when it has come to the end of one of the last two types: it just looks for the
matching delimiter (| or "). When the reader is reading a number or symbol print
name, it stops reading when it comes to a character with the separator property.
The separator character is pushed back into the input stream and will be the first
character read when the reader is called again.

escape Tells the printer when to put escapes in front of, or around, a symbol whose print
name contains this character. There are three possibilities: always escape a symbol
with this character in it, only escape a symbol if this is the only character in the
symbol, and only escape a symbol if this is the first character in the symbol. [note:
The printer will always escape a symbol which, if printed out, would look like a
valid number.]

When the Lisp system is built, Lisp code is added to a C-coded kernel and the
result becomes the standard lisp system. The readtable present in the C-coded kernel,
called the raw readtable, contains the bare necessities for reading in Lisp code. During
the construction of the complete Lisp system, a copy is made of the raw readtable and
then the copy is modified by adding macro characters. The result is what is called the
standard readtable. When a new readtable is created with makereadtable, a copy is made
of either the raw readtable or the current readtable (which is likely to be the standard
readtable).

7.3. Reader operations
The reader has a very simple algorithm. It is either scanning for a token, collecting
a token, or processing a token. Scanning involves reading characters and throwing away
those which don't start tokens (such as blanks and tabs). Collecting means gathering the
characters which make up a token into a buffer. Processing may involve creating symbols, strings, lists, fixnums, bignums or flonums or calling a user written function called
a character macro.
The components of the syntax class determine when the reader switches between
the scanning, collecting and processing states. The reader will continue scanning as long
as the character class of the characters it reads is cseparator. When it reads a character
whose character class is not cseparator it stores that character in its buffer and begins the
collecting phase.
If the character class of that first character is ccharacter, cnumber, cperiod, or csign,
then it will continue collecting until it runs into a character whose syntax class has the
separator property. (That last character will be pushed back into the input buffer and will
be the first character read next time.) Now the reader goes into the processing phase,
checking to see if the token it read is a number or symbol. It is important to note that
after the first character is collected the component of the syntax class which tells the
reader to stop collecting is the separator property, not the character class.
If the character class of the character which stopped the scanning is not ccharacter,
cnumber, cperiod, or csign, then the reader processes that character immediately. The
character classes csingle-macro, csingle-splicing-macro, and csingle-infix-macro will act like
ccharacter if the following character is not a cseparator. The processing which is done for a
given character class is described in detail in the next section.
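
The scanning and collecting states described above can be modelled in a few lines. The Python sketch below is an illustration under stated assumptions, not the reader's actual implementation: the function syntax is a hypothetical toy readtable covering only blanks, parentheses, digits, signs, the period and ordinary characters, and macro characters, brackets and delimiters are left out.

```python
def syntax(ch):
    """Toy readtable: return (character class, separator property) for ch."""
    if ch in " \t\n":
        return ("cseparator", True)
    if ch == "(":
        return ("cleft-paren", True)
    if ch == ")":
        return ("cright-paren", True)
    if ch.isdigit():
        return ("cnumber", False)
    if ch in "+-":
        return ("csign", False)
    if ch == ".":
        return ("cperiod", False)
    return ("ccharacter", False)

# Classes whose first character starts an arbitrary-length token.
COLLECTING = {"ccharacter", "cnumber", "cperiod", "csign"}

def tokens(text):
    """Split text into tokens using the scanning/collecting rules."""
    out = []
    i = 0
    while i < len(text):
        cls, _ = syntax(text[i])
        if cls == "cseparator":          # scanning: throw the character away
            i += 1
            continue
        start = i
        i += 1
        if cls in COLLECTING:            # collecting: stop on the separator
            while i < len(text) and not syntax(text[i])[1]:   # property only
                i += 1
        # Not advancing past the stopping character models the push back.
        out.append(text[start:i])        # processing would happen here
    return out

result = tokens("(foo  -12 a.b)")
print(result)
```

The call prints ['(', 'foo', '-12', 'a.b', ')']. Note that a.b stays one token because, in this toy table as in the text, the period does not carry the separator property, and that the character class of later characters plays no part in deciding where a token ends.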

7.4. Character classes

ccharacter

raw readtable:A-Z a-z ^H !#$%&*,/:;<=>?@^_`{}~
standard readtable:A-Z a-z ^H !$%&*/:;<=>?@^_{}~
A normal character.

cnumber

raw readtable:0-9
standard readtable:0-9
This type is a digit. The syntax for an integer (fixnum or bignum) is a string of cnumber
characters optionally followed by a cperiod. If the digits are not followed by a cperiod,
then they are interpreted in base ibase which must be eight or ten. The syntax for a
floating point number is zero or more cnumbers followed by a cperiod and then
followed by one or more cnumbers. A floating point number may also be an integer or
floating point number followed by 'e' or 'd', an optional '+' or '-' and then zero or
more cnumbers.
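
The number syntax in the cnumber description above can be written down as regular expressions. The Python sketch below is one reading of that description and not the reader's own code: the name classify is hypothetical, an optional leading sign is accepted as suggested by the csign class, and the treatment of the digits 8 and 9 when ibase is eight is not covered by the text, so it is not modelled here (Python's int would raise ValueError).

```python
import re

SIGN = r"[+-]?"
INTEGER = r"[0-9]+\.?"            # cnumbers, optionally followed by a cperiod
FLOAT = r"[0-9]*\.[0-9]+"         # zero or more cnumbers, cperiod, one or more
EXP = r"[ed][+-]?[0-9]*"          # 'e' or 'd', optional sign, zero or more

INTEGER_RE = re.compile(rf"{SIGN}({INTEGER})")
FLOAT_RE = re.compile(rf"{SIGN}(?:{FLOAT}|(?:{FLOAT}|{INTEGER}){EXP})")

def classify(token, ibase=10):
    """Return (kind, value); value is filled in only for integers."""
    if ibase not in (8, 10):
        raise ValueError("ibase must be eight or ten")
    m = INTEGER_RE.fullmatch(token)
    if m:
        digits = m.group(1)
        if digits.endswith("."):
            value = int(digits[:-1], 10)   # trailing period: always decimal
        else:
            value = int(digits, ibase)     # otherwise read in base ibase
        return ("integer", -value if token.startswith("-") else value)
    if FLOAT_RE.fullmatch(token):
        return ("float", None)
    return ("symbol", None)                # anything else is not a number

print(classify("17", ibase=8), classify("17.", ibase=8), classify("1.5e3"))
```

With ibase set to eight, the token 17 is read as the integer fifteen while 17. is read as seventeen, which is the distinction the trailing cperiod exists to make.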

csign

raw readtable:+-
standard readtable:+-
A leading sign for a number. No other characters should be given this class.

cleft-paren

raw readtable: (
standard readtable: (

A left parenthesis. Tells the reader to begin forming a list.
cright-paren

raw readtable:)
standard readtable:)
A right parenthesis. Tells the reader that it has reached the end of a list.

cleft-bracket

raw readtable: [
standard readtable: [
A left bracket. Tells the reader that it should begin forming a list. See the description
of cright-bracket for the difference between cleft-bracket and cleft-paren.

cright-bracket

raw readtable:]
standard readtable:]
A right bracket. A cright-bracket finishes the formation of the current list and all
enclosing lists until it finds one which begins with a cleft-bracket or until it reaches
the top level list.
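
The difference between a right parenthesis and a right bracket can be seen in a small list builder. The Python sketch below is a model written for this explanation, not the reader's implementation: the name read_form is hypothetical, tokens are assumed to be already separated, and a right parenthesis is assumed to close the current list whichever character opened it, a case the text does not address.

```python
def read_form(tokens):
    """Build nested Python lists from tokens, honouring ( [ ) and ]."""
    stack = []                      # (opening token, items collected so far)
    for tok in tokens:
        if tok in ("(", "["):
            stack.append((tok, []))
        elif tok in (")", "]"):
            if not stack:
                raise ValueError("close with no open list")
            while True:
                opener, items = stack.pop()
                if not stack:
                    return items    # the top level list is finished
                stack[-1][1].append(items)
                # ")" closes one list; "]" keeps going until it has
                # closed a list that was opened with "[".
                if tok == ")" or opener == "[":
                    break
        elif not stack:
            return tok              # a bare atom at top level
        else:
            stack[-1][1].append(tok)
    raise ValueError("unexpected end of input")

print(read_form(list("(a(b(c]")))
print(read_form(list("(a[b(c(d]e)")))
```

The first call returns ['a', ['b', ['c']]], since the bracket closes everything up to the top level list. The second returns ['a', ['b', ['c', ['d']]], 'e'], since the bracket stops after closing the list opened by the left bracket, leaving the outer list open for e.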

cperiod

raw readtable:.
standard readtable:.
The period is used to separate the elements of a cons cell [e.g. (a . (b . nil)) is the same as
(a b)]. cperiod is also used in numbers as described above.

cseparator

raw readtable:^I-^M esc space
standard readtable:^I-^M esc space
Separates tokens. When the reader is scanning, these characters are passed over. Note:
there is a difference between the cseparator character class and the separator property of a
syntax class.

csingle-quote

raw readtable:'
standard readtable:'
This causes read to be called recursively and the list (quote <value read>) to be
returned.

csymbol-delimiter

raw readtable:|
standard readtable:|
This causes the reader to begin collecting characters and to stop only when another
identical csymbol-delimiter is seen. The only way to escape a csymbol-delimiter within a
symbol name is with a cescape character. The collected characters are converted into a
string which becomes the print name of a symbol. If a symbol with an identical print
name already exists, then the allocation is not done, rather the existing symbol is used.

cescape

raw readtable:\
standard readtable:\
This causes the next character read in to be treated as a vcharacter. A character
whose syntax class is vcharacter has a character class ccharacter and does not have the
separator property so it will not separate symbols.

cstring-delimiter

raw readtable:"
standard readtable:"
This is the same as csymbol-delimiter except the result is returned as a string instead of a
symbol.

csingle-character-symbol

raw readtable:none
standard readtable:none
This returns a symbol whose print name is the single character which has been collected.

cmacro

raw readtable:none
standard readtable:`,
The reader calls the macro function associated with this character and the current readtable, passing it no arguments. The result of the macro is added to the structure the
reader is building, just as if that form were directly read by the reader. More details on
macros are provided below.

csplicing-macro

raw readtable:none
standard readtable:#;
A csplicing-macro differs from a cmacro in the way the result is incorporated in the
structure the reader is building. A csplicing-macro must return a list of forms (possibly
empty). The reader acts as if it read each element of the list itself without the
surrounding parentheses.

csingle-macro

raw readtable:none
standard readtable:none
This causes the reader to check the next character. If it is a cseparator then this acts like a
cmacro. Otherwise, it acts like a ccharacter.

csingle-splicing-macro

raw readtable:none
standard readtable:none
This is triggered like a csingle-macro; however, the result is spliced in like a csplicing-macro.

cinfix-macro

raw readtable:none
standard readtable:none
This differs from a cmacro in that the macro function is passed a form representing
what the reader has read so far. The result of the macro replaces what the reader had
read so far.

csingle-infix-macro

raw readtable:none
standard readtable:none
This differs from the cinfix-macro in that the macro will only be triggered if the character
following the csingle-infix-macro character is a cseparator.

cillegal

raw readtable:^@-^G ^N-^Z ^\-^_ rubout
standard readtable:^@-^G ^N-^Z ^\-^_ rubout
These characters cause the reader to signal an error if read.

7.5. Syntax classes
The readtable maps each character into a syntax class. The syntax class contains
three pieces of information: the character class, whether this is a separator, and the
escape properties. The first two properties are used by the reader, the last by the printer
(and explode). The initial lisp system has the following syntax classes defined. The user
may add syntax classes with add-syntax-class. For each syntax class, we list the properties
of the class and which characters have this syntax class by default. More information
about each syntax class can be found under the description of the syntax class's character
class.
vcharacter
ccharacter
raw readtable:A-Z a-z ^H !#$%&*,/:;<=>?@^_`{}~
standard readtable:A-Z a-z ^H !$%&*/:;<=>?@^_{}~

vnumber
cnumber
raw readtable:0-9
standard readtable:0-9

vsign
csign
raw readtable:+-
standard readtable:+-

vleft-paren
cleft-paren
escape-always
separator
raw readtable:(
standard readtable:(

vright-paren
cright-paren
escape-always
separator
raw readtable:)
standard readtable:)

vleft-bracket
cleft-bracket
escape-always
separator
raw readtable:[
standard readtable:[

vright-bracket
cright-bracket
escape-always
separator
raw readtable:]
standard readtable:]

vperiod
cperiod
escape-when-unique
raw readtable:.
standard readtable:.

vseparator
cseparator
escape-always
separator
raw readtable:^I-^M esc space
standard readtable:^I-^M esc space

vsingle-quote
csingle-quote
escape-always
separator
raw readtable:'
standard readtable:'

vsymbol-delimiter
csymbol-delimiter
escape-always
raw readtable:|
standard readtable:|

vescape
cescape
escape-always
raw readtable:\
standard readtable:\

vstring-delimiter
cstring-delimiter
escape-always
raw readtable:"
standard readtable:"

vsingle-character-symbol
csingle-character-symbol
separator

vmacro
cmacro
escape-always
separator

vsplicing-macro
csplicing-macro
escape-always
separator

vsingle-macro
csingle-macro