Document:	ULTRIX Guide to the nawk Utility
Order Number:	AA-PBKPA-TE
Revision:	0
Pages:	80
ULTRIX

Guide to the nawk Utility

Order Number: AA-PBKPA-TE
June 1990

Product Version:

nawk Version 1.0

Operating System and Version:

ULTRIX Version 4.0 or higher

This manual is a tutorial description of the nawk text-processing utility and programming
language.

digital equipment corporation
maynard, massachusetts

Restricted Rights: Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in
subparagraph (c) (1) (ii) of the Rights in Technical Data and Computer Software clause of DFARS 252.227-7013.
© Digital Equipment Corporation 1990
All rights reserved.

The information in this document is subject to change without notice and should not be construed as a commitment
by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may
appear in this document.
The software described in this document is furnished under a license and may be used or copied only in accordance
with the terms of such license.
No responsibility is assumed for the use or reliability of software on equipment that is not supplied by Digital or its
affiliated companies.

The following are trademarks of Digital Equipment Corporation:

CDA
DDIF
DDIS
DEC
DECnet
DECstation

DECUS
DECwindows
DTIF
MASSBUS
MicroVAX
Q-bus
ULTRIX
ULTRIX Mail Connection

ULTRIX Worksystem Software
VAX
VAXstation
VMS
VMS/ULTRIX Connection
VT
XUI

INTEL is a trademark of Intel Corporation.
Xenix, MS-DOS, and MS-OS/2 are trademarks of Microsoft Corporation.
MKS and MKS AWK are trademarks of Mortice Kern Systems, Inc.
PC-DOS is a trademark of International Business Machines, Inc.
UNIX is a registered trademark of AT&T in the USA and other countries.

Contents

About This Manual
    Audience
    Organization
    Related Documents
    Conventions

Basic Concepts
    1.1 Data Files
        1.1.1 Records
        1.1.2 Fields
    1.2 The Shape of a Program
        1.2.1 Simple Patterns
        1.2.2 Numbers and Strings
        1.2.3 The Print Action
        1.2.4 Additional Points About Rules
    1.3 Running nawk Programs
        1.3.1 The nawk Command Line
        1.3.2 Program Files
        1.3.3 Sources of Data
        1.3.4 Saving nawk Output

Simple Arithmetic
    2.1 Arithmetic Operations
        2.1.1 Operation Ordering
    2.2 Formatted Output
        2.2.1 Placeholders
        2.2.2 Escape Sequences
    2.3 Variables
        2.3.1 The Increment and Decrement Operators
        2.3.2 Initial Values
        2.3.3 Built-In Record-Oriented Variables
    2.4 Arithmetic Functions

Patterns and Regular Expressions
    3.1 Using Matching Expressions
    3.2 Metacharacters
    3.3 Using Matching Expressions with Strings
    3.4 Applying Actions to a Group of Lines
    3.5 Combining Conditions in Patterns

Actions and Control Structures
    4.1 Adding Comments
    4.2 The if Statement
        4.2.1 A Word on Style
    4.3 Using Compound Statements
    4.4 The while Loop
    4.5 The for Loop
    4.6 The next Statement
    4.7 The exit Statement

String Manipulation
    5.1 String Variables
        5.1.1 Built-In String Variables
        5.1.2 String vs. Numeric Variables
    5.2 String Concatenation
    5.3 String Manipulation Functions

Arrays
    6.1 Arrays with Integer Subscripts
    6.2 Generalized Arrays
        6.2.1 String Subscripts vs. Numeric Subscripts
    6.3 Deleting Array Elements
    6.4 Multidimensional Arrays

User-Defined Functions
    7.1 Defining Functions
    7.2 Recursion
    7.3 Call By Value
    7.4 Passing Arrays to Functions

Enhancing Your nawk Programs
    8.1 The getline Function
        8.1.1 Reading from the Current Input
        8.1.2 Reading a Line into a String Variable
        8.1.3 Reading from a New File
        8.1.4 Reading from Other Commands
        8.1.5 Redirecting Output to Files and Pipes
    8.2 The system Function
    8.3 Compound Assignments
    8.4 The sortgen Program

Appendix A  Order of Operations

Appendix B  Example Files

Examples
    8-1: sortgen Program for nawk

Tables
    2-1: Arithmetic Operations
    2-2: Format String Placeholders
    2-3: Escape Sequences for nawk
    2-4: Built-In Record-Oriented Variables
    2-5: Common Mathematical Functions
    3-1: Metacharacters Recognized by nawk
    5-1: Built-In String Variables
    8-1: Compound Assignments

About This Manual

The Guide to the nawk Utility introduces the important principles and concepts of the
nawk programming language and utility, and shows how they can be used for
productive programming. This manual is a tutorial that teaches you how to use
nawk; it is also a reference manual that you can use later.

Audience
This manual is a guide for intermediate users of the ULTRIX system. If you are a
novice user, you might want to read the chapter on regular expressions in The Big
Gray Book: The Next Step with ULTRIX before using this manual.

Organization
This book contains eight chapters and two appendixes. The following list gives a
brief description of the book's contents:
Chapter 1     Introduces nawk and describes the basic concepts of the language.

Chapter 2     Describes how to use nawk to perform mathematical calculations.

Chapter 3     Describes how to use pattern matching and regular expressions in
              nawk programs.

Chapter 4     Describes the actions you can make nawk perform, and discusses how
              to use control structures to create more powerful nawk programs.

Chapter 5     Describes how to manipulate strings with nawk.

Chapter 6     Describes how to use arrays of information with nawk.

Chapter 7     Describes how to create your own custom functions for nawk
              programs.

Chapter 8     Describes how to tailor your nawk programs.

Appendix A    Describes the order in which nawk performs operations when
              executing a program.

Appendix B    Contains copies of the example files used in this manual.

Related Documents
The Little Gray Book: An ULTRIX Primer introduces the ULTRIX operating system
and some of the tools and utilities discussed here, and is a handy reference as you
read this book.
The Big Gray Book: The Next Step with ULTRIX provides more information on
ULTRIX utilities. The Guide to the nawk Utility is a thorough tutorial description of
an enhanced version of the awk utility discussed in The Big Gray Book.

Another excellent reference for nawk is The AWK Programming Language, by
Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan (Addison-Wesley,
1988). Aho, Weinberger, and Kernighan created awk, of which nawk is an
enhanced version, at AT&T Laboratories.
The ULTRIX Reference Pages provide details of the commands and utilities
described in this book. Experienced programmers may prefer to turn directly to
nawk(1) in the Reference Pages.

Conventions
The following typeface conventions are used in this manual:

%               The default user prompt is your system name followed by a right
                angle bracket. In this manual, a percent sign ( % ) is used to
                represent this prompt.

user input      This bold typeface is used in interactive examples to indicate
                typed user input.

system output   This typeface is used in interactive examples to indicate system
                output and also in code examples and other screen displays. In
                text, this typeface is used to indicate the exact name of a
                command, option, partition, pathname, directory, or file.

UPPERCASE
lowercase       The ULTRIX system differentiates between lowercase and
                uppercase characters. Literal strings that appear in text,
                examples, syntax descriptions, and function definitions must be
                typed exactly as shown.

rlogin          In syntax descriptions and function definitions, this typeface is
                used to indicate terms that you must type exactly as shown.

filename        In examples, syntax descriptions, and function definitions, italics
                are used to indicate variable values; and in text, to give references
                to other documents.

macro           In text, bold type is used to introduce new terms.

.
.
.               A vertical ellipsis indicates that a portion of an example that
                would normally be present is not shown.

CTRL/x          This symbol is used in examples to indicate that you must hold
                down the CTRL key while pressing the key x that follows the
                slash. When you use this key combination, the system sometimes
                echoes the resulting character, using a circumflex ( ^ ) to represent
                the CTRL key (for example, ^C for CTRL/C). Sometimes the
                sequence is not echoed.

Basic Concepts

The nawk language is an easy-to-use programming language that lets you work with
information that is stored in files. With nawk programs, you can do these things:
•  Display all of the information in a file, or selected pieces of information
•  Perform calculations with numeric information from a file
•  Prepare reports based on information from a file
•  Analyze text for spelling and frequency of words and letters

At first glance, these operations seem elementary. However, later chapters show how
they can be combined to perform complicated tasks.
You will find that nawk is a good first programming language. It allows most of the
logical constructs of modern computing languages: if-else statements, while
and for loops, function calls, and so on. It is easy to learn, and allows beginners to
get results with little effort. At the same time, it introduces all the important
concepts of programming and prepares users for more complicated languages.
Every programming language has its own way of looking at the world. To write
programs in the language, you must learn to see things from the language's point of
view.
This chapter examines the fundamentals of nawk:

•  The kind of information it works with
•  The "shape" of a nawk program
•  How to run nawk programs

1.1 Data Files
Almost all nawk programs work with data. Programs can obtain data typed in from
the terminal or from the output of other commands (through pipes); but usually data
is obtained from data files.
Data files for nawk are always text files. This means that the files contain readable
text, made up of letters, digits, punctuation characters, and so on. For example, you
could create a data file containing information about the hobbies of a group of
people. Each line in this file would give a person's name, one of that person's
hobbies, how many hours a week the person spends on the hobby, and how much
money the hobby costs per year. Using a separate line for each of a person's
hobbies, the file might look like this:
Jim     reading         15      100.00
Jim     bridge          4       10.00
Jim     role-playing    5       70.00
Linda   bridge          12      30.00
Linda   cartooning      [?]     75.00
Katie   jogging         14      120.00
Katie   reading         10      60.00
John    role-playing    8       100.00
John    jogging         8       30.00
Andrew  wind-surfing    20      1000.00
Lori    jogging         5       30.00
Lori    weight-lifting  12      200.00
Lori    bridge          2       0.00

If you want to follow the examples using this file, create a copy of the file and name
it hobbies. There are other example files used in this manual; you might want to
create copies of them as well. Appendix B contains copies of all the example files.

1.1.1 Records
A nawk data file is a collection of records. A record contains a number of pieces of
information about a single item; these pieces are called fields. In the hobbies file,
each line is a separate record, giving a complete set of information about one
person's hobby.
Records are separated by a record separator character, which is usually the new-line
character. A new-line character shows where one line of text ends and another
begins; in a file using new-line as a record separator, each line of the file is a separate
record. All the examples in this manual use the new-line character as a record
separator.

1.1.2 Fields
A record consists of a number of fields. A field is a single piece of information. For
example, the following record from the hobbies file contains four fields:
Jim     reading         15      100.00

The information in the first field is Jim, the second is reading, and so on.
Specify fields in the same order in each record; that way nawk and other tools can
easily access a particular piece of information in any record.
The fields of a record are separated by one or more field separator characters. In
the hobbies file, strings of blank characters (spaces) separate the fields.
By default, nawk uses white space (any number of blanks or tab characters) to
separate fields. You can change this default, as you will see in Section 1.3.1.

1.2 The Shape of a Program
A nawk program looks like this:
pattern { actions }
pattern { actions }
pattern { actions }

Each line is a separate instruction or rule. The nawk utility looks through the
data files record by record and executes the rules, in the given order, on each record.


1.2.1 Simple Patterns
A rule has this form:
[pattern] [ {actions} ]

The form of a rule is called its syntax. This syntax indicates that the given set of
actions is to be performed on every record that meets a certain set of conditions. The
conditions are given by the pattern part of the rule. The brackets indicate that both
the pattern part and the actions part are optional.
The pattern of a rule often looks for records that have a particular value in some
field. The notation $1 stands for the first field of a record, $2 stands for the second
field, and so on. The special notation $0 represents the entire record. A pair of
equal signs ( == ) stands for "is equal to." For example:
$2 == "jogging" { print }

This rule tells nawk to print any record whose second field is jogging.
This rule is a complete nawk program. If you ran this program on the hobbies
file, nawk would look through the file record by record (line by line). Whenever a
line had jogging as its second field, nawk would print the complete record. The
output from the program would therefore be as follows:
Katie   jogging         14      120.00
John    jogging         8       30.00
Lori    jogging         5       30.00

Here is another example; ask yourself what the following nawk program does:
$1 == "John" { print }

As you probably guessed, this program prints every record that has John as its first
field. The output would be as follows:
John    role-playing    8       100.00
John    jogging         8       30.00

The same sort of search can be performed on any text database. The only difference
is that databases tend to contain a great deal more data than the example contains.
The previous examples both used the print action. In fact, this action does not
have to be written explicitly; if a nawk rule does not contain an action, print is
assumed. The two example programs you've seen could have been written as
follows, with the same effect:
$2 == "jogging"

and
$1 == "John"

The use of the two equal signs ( == ) is an example of a comparison operation. The
nawk language recognizes several other types of comparison:
!=      Not equal
<       Less than
>       Greater than
<=      Less than or equal
>=      Greater than or equal

For example, consider each of the following rules as complete programs, and decide
what the programs do with the hobbies file:
(a)  $1 != "Linda" { print }
(b)  $3 > 10
(c)  $4 < 100.00
(d)  $4 <= 100.00

These rules have the following effects:

(a)  Prints all records whose first field is not Linda.

(b)  Prints all records whose third field is greater than 10. Remember that when
     there is no explicit action, print is assumed.

(c)  Prints all records whose fourth field is less than 100.00.

(d)  Prints all records whose fourth field is less than or equal to 100.00.

1.2.2 Numbers and Strings
In the previous examples, there are quotation marks ( " ) around Linda in (a), but
none in any of the other rules. The nawk language distinguishes between string
values, which are enclosed in quotation marks, and numeric values, which are not.
A string value is a sequence of characters like "abc". Any characters are allowed,
even digits, as in "abc 123". Strings can contain any number of characters. A
string with zero characters is called the null string and is written "".
A numeric value is mostly made up of digits, but it can also have a sign and a
decimal point. The following are all valid numerical values in nawk:
10      -78      0.34      +2.56      -.92

The nawk language does not let you put commas inside numbers. For example, you
must write 1000 instead of 1,000.
Note
The nawk utility lets you use exponential or scientific notation.
Exponents are given as e or E followed by an optionally signed
exponent. Thus, the following values are all equivalent:
1E3      1.0e3      10E2      1000

When numbers are compared (with operators like > and <), comparisons are made in
accordance with the usual rules of arithmetic. When strings are compared,
comparisons are made in accordance with the ASCII¹ collating order. This is a little
like alphabetical order; for example:
$1 >= "Katie"

This program will print out the Katie, Linda, and Lori lines, as you would
expect from alphabetical order. However, ASCII collating order differs from
alphabetical order in a number of respects; for example, lowercase letters are greater
than uppercase ones, so that a is greater than Z.
The complete ASCII collating order is given in the ascii(7) Reference Page.
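
The difference between alphabetical order and ASCII order can be seen in a small sketch. It assumes a POSIX-style awk installed under the name awk (on ULTRIX the command would be nawk), and it sets LC_ALL=C so that strings are compared byte by byte. The four input names are made up for the demonstration; the lowercase name andrew is a hypothetical entry, not part of the hobbies file.

```sh
# Hypothetical input: three capitalized names and one lowercase name.
# LC_ALL=C (an assumption for modern systems) forces plain ASCII comparison.
matched=$(printf '%s\n' Jim Katie Linda andrew | LC_ALL=C awk '$1 >= "Katie"')
printf '%s\n' "$matched"
```

The name andrew is selected even though it comes before Katie alphabetically, because every lowercase letter follows every uppercase letter in ASCII.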

1 ASCII is an abbreviation for American Standard Code for Information Interchange; most computer systems use
the ASCII code to represent characters.


1.2.3 The Print Action
So far, the only action you have learned is print. As you have seen, print can
display an entire record. It can also display selected fields of the record, as in the
following example:
$2 == "bridge" { print $1 }

This rule displays the first field of every record whose second field is bridge. The
output is as follows:
Jim
Linda
Lori

The print command can display more than one field. If you give print a list of
fields separated by commas, print displays the given fields separated by single
blanks. For example:
$1 == "Jim" { print $2,$3,$4 }

This program produces the following output:
reading 15 100.00
bridge 4 10.00
role-playing 5 70.00

The print action can display strings and numbers along with fields. For example:
$1 == "John" { print "$",$4 }

This program's output looks like this:
$ 100.00
$ 30.00

In this example, the print action prints out a string containing a dollar sign ( $ )
followed by a blank, followed by the value of the fourth field in each selected record.
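
As a small runnable sketch of mixing strings and fields in one print action, the commands below feed three records from the hobbies file to a rule through a pipe. The sketch assumes a POSIX-style awk installed as awk (nawk on ULTRIX); the wording of the printed sentence is only an illustration.

```sh
# Three records in the layout of the hobbies file, sent to awk through a pipe.
out=$(printf '%s\n' \
  'Jim     bridge   4   10.00' \
  'Katie   reading  10  60.00' \
  'Linda   bridge   12  30.00' |
  awk '$2 == "bridge" { print $1, "plays", $2, "for", $3, "hours" }')
printf '%s\n' "$out"
```

Each comma in the print list becomes a single blank in the output, so the runs of blanks that separate the fields in the input do not appear in the printed lines.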

As an exercise, predict the output of the following programs:
(a)  $1 == "Lori" { print $1, "spends $", $4, "on", $2 }
(b)  $2 == "jogging" { print $1, "jogs", $3, "hours a week" }
(c)  $4 > 100.00 { print $1, "has an expensive hobby" }

1.2.4 Additional Points About Rules
You can put any number of extra blanks and tabs into nawk patterns and actions.
For example:
{ print $1 , $2 , $3 }

You can leave out the pattern part of a rule. In this case, the action part is applied to
every record in the file. The following example is a complete nawk program that
displays every record in the data file.
{ print }

You can leave out the action part of a rule. In this case, the default action is
print. The following example is a complete nawk program that displays every
record whose first field is Andrew:
$1 == "Andrew"

This is equivalent to the following:
$1 == "Andrew" { print }


When a nawk program contains several rules, nawk applies every appropriate rule to
the first record, then every appropriate rule to the second record, and so on. Rules
are applied in order. For example:
$1 == "Linda"
$2 == "bridge" { print $1 }

This program produces the following output:
Jim
Linda   bridge          12      30.00
Linda
Linda   cartooning      [?]     75.00
Lori

The nawk program looks through the file record by record. The following record is
the first to satisfy one of the patterns:
Jim     bridge          4       10.00

As a result, nawk prints out the first field of the record (as dictated by the second
rule). The next record of interest is
Linda   bridge          12      30.00

This record satisfies the pattern of the first rule, so the whole record is printed. It
also satisfies the pattern of the second rule, so the first field is printed again. The
nawk program continues through the file, record by record, executing the appropriate
actions when the pattern is satisfied.

1.3 Running nawk Programs
You can run nawk programs in two ways:
•  From a command line
•  From a program file

The following sections describe these two methods.

1.3.1 The nawk Command Line
The simplest nawk command line has the following form:
nawk 'program' datafile

The nawk program is enclosed in apostrophes, or single quotation marks ( '). The
datafile argument gives the name of the data file. For example, the following
command executes the program $1 == "Linda" on the hobbies file:
% nawk '$1 == "Linda"' hobbies

You can also type in a multiline program within apostrophes, provided that the shell
you are using allows this construction. For example:
nawk '
$1 == "Linda"
$2 == "bridge" { print $1 }
' hobbies

As mentioned in a previous section, the default is for nawk to assume that record
fields are separated by space and tab characters. If the data file uses different field
separator characters, you must indicate this on the command line. You do this with
an option of the following form:
-Fstring

The string lists the characters used to separate fields. For example:
nawk -F":" '{ print $3 }' file.dat

This command line indicates that the given data file uses colons (:) to separate fields in its
records. The -F option must come before the quoted program rules.
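
The effect of the -F option can be tried without creating a data file, as in the following sketch. It assumes a POSIX-style awk installed as awk (nawk on ULTRIX), and the two colon-separated records are hypothetical data in a name:hobby:hours layout.

```sh
# Hypothetical colon-separated records: name:hobby:hours.
out=$(printf '%s\n' 'Jim:reading:15' 'Linda:bridge:12' |
  awk -F":" '{ print $3 }')
printf '%s\n' "$out"
```

With the colon as the field separator, $3 is the hours value of each record.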

1.3.2 Program Files
Short programs like the ones discussed in this chapter can be entered on a single
command line. Later chapters discuss longer programs, which cannot be typed on a
single line. Such programs are most easily executed from a program file.
A program file is a text file that contains a nawk program. You can create program
files with any text editor. For example, you might create a program file named
lbprog.nawk that contains the following lines:
$1 == "Linda"
$2 == "bridge" { print $1 }

To execute a program on a particular data file, use the following command:
nawk -f progfile datafile

The name progfile is the name of the file that contains the nawk program, and
datafile is the name of the data file. The following example runs the program in
lbprog.nawk on the data in hobbies:
nawk -f lbprog.nawk hobbies

If the data file does not use the default separator characters, you must specify a -F
option after the progfile name. For example:
nawk -f prog.nawk -F":" file.dat

As an exercise, execute the examples in this chapter on the hobbies file. Run
some from the command line and some from program files.
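
One way to try the program-file method end to end is sketched below. It assumes a POSIX-style awk installed as awk (nawk on ULTRIX) and a shell that provides mktemp; the temporary directory and the three-record data file are conveniences for the sketch, not part of the manual's setup.

```sh
# Work in a scratch directory so no existing files are touched.
dir=$(mktemp -d)

# A three-record excerpt of the hobbies file.
printf '%s\n' \
  'Jim bridge 4 10.00' \
  'Linda bridge 12 30.00' \
  'Katie reading 10 60.00' > "$dir/hobbies"

# The program file holds the rules exactly as they would appear inside quotes.
cat > "$dir/lbprog.nawk" <<'EOF'
$1 == "Linda"
$2 == "bridge" { print $1 }
EOF

out=$(awk -f "$dir/lbprog.nawk" "$dir/hobbies")
printf '%s\n' "$out"
rm -r "$dir"
```

Because the rules live in a file, no shell quoting is needed around them, which is one reason longer programs are easier to manage this way.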

1.3.3 Sources of Data
If you do not specify a data file on the command line, nawk reads data from the
terminal. If you issue a command as in the following example, nawk prints the second
word of every line you type in:
nawk '{ print $2 }'

When you are entering data from the terminal, mark the end of the data by typing
CTRL/D. For example:
% nawk '{ print $2 }'
Jim     reading         15      100.00
reading
Jim     bridge          4       10.00
bridge
Jim     role-playing    5       70.00
role-playing
Linda   bridge          12      30.00
bridge
Linda   cartooning      [?]     75.00
cartooning
CTRL/D
%

You can specify several data files on the nawk command line. For example:
nawk -f progfile data1 data2 data3 ...

When nawk finishes reading the first data file, data1, it moves to data2, and so
on.
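
The following sketch shows two data files being read one after the other. It assumes a POSIX-style awk installed as awk (nawk on ULTRIX) and a shell that provides mktemp; the file names data1 and data2 and their one-record contents are hypothetical.

```sh
dir=$(mktemp -d)

# Two hypothetical one-record data files.
printf '%s\n' 'Jim reading 15 100.00' > "$dir/data1"
printf '%s\n' 'Lori bridge 2 0.00'    > "$dir/data2"

# The files are read in the order in which they are named.
out=$(awk '{ print $1 }' "$dir/data1" "$dir/data2")
printf '%s\n' "$out"
rm -r "$dir"
```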

1.3.4 Saving nawk Output
You can save a nawk program's output in a file by using output redirection. To do
this, specify a right angle bracket (> ) and a file name at the end of any nawk
command line. For example:
nawk -f progfile datafile >outfile

This command line writes the output from the nawk program to a file named
outfile. In this case, the output is not displayed on the terminal screen. For more
information about redirection, see the chapter on the shell in The Little Gray Book:
An ULTRIX Primer.
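
Output redirection can be checked with a short sketch like the one below. It assumes a POSIX-style awk installed as awk (nawk on ULTRIX) and a shell that provides mktemp; the two-record data file and the name outfile are used only for illustration.

```sh
dir=$(mktemp -d)
printf '%s\n' 'John jogging 8 30.00' 'Lori jogging 5 30.00' > "$dir/hobbies"

# Nothing is displayed by this command; the output goes to outfile.
awk '{ print $1 }' "$dir/hobbies" > "$dir/outfile"

# Read the saved output back to confirm what was written.
saved=$(cat "$dir/outfile")
printf '%s\n' "$saved"
rm -r "$dir"
```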


Simple Arithmetic

The nawk language makes it easy for you to perform calculations with numbers
contained in data files. This chapter discusses how nawk does arithmetic and shows
examples of programs using these features.
Note that nawk performs arithmetic operations in exactly the same way as the C
programming language. Therefore, knowledge of nawk is good preparation for
learning C.

2.1 Arithmetic Operations
Here is an example of a nawk program that uses simple arithmetic:
$3 > 10 { print $1, $2, $3-10 }

In the print statement, $3-10 subtracts 10 from the value of the third field in the
record. The print statement prints this result. If you apply this program to the
hobbies file shown in the previous chapter, the output will be as follows:
Jim reading 5
Linda bridge 2
Katie jogging 4
Andrew wind-surfing 10
Lori weight-lifting 2

The program works like this: if someone spends more than 10 hours on a hobby, the
program prints the person's name, the name of the hobby, and the number of extra
hours the person spends on the hobby (the number of hours more than 10).
The notation $3-10 is called an arithmetic expression. It performs an arithmetic
operation and comes up with a result; the result of the arithmetic is called the value
of the expression.
The nawk language recognizes the arithmetic operations shown in Table 2-1.
Table 2-1: Arithmetic Operations

Operation         Operator    Example
Addition          A + B       2+3 is 5
Subtraction       A - B       7-3 is 4
Multiplication    A * B       2*4 is 8
Division          A / B       6/3 is 2
Negation          - A         -9 is -9
Remainder         A % B       7%3 is 1
Exponentiation    A ^ B       3^2 is 9


The remainder operation is also known as the modulus or integer remainder
operation. The value of a modulus operation is the integer remainder you get when
you divide A by B. For example:
7 % 3

This expression has a value of 1, because when you divide 7 by 3, you get a quotient
of 2 and a remainder of 1.
The value for the exponentiation operation A ^ B is the value of A raised to the
exponent B. For example:

3 ^ 2

This expression has the value 9 (that is, 3x3).
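
Several of the operators in Table 2-1 can be tried on one line of input, as in this sketch. It assumes a POSIX-style awk installed as awk (nawk on ULTRIX); the input values 7 and 3 are arbitrary.

```sh
# One input record with two numeric fields, 7 and 3.
# Prints, in order: sum, difference, product, remainder, and 7 to the power 3.
out=$(printf '%s\n' '7 3' |
  awk '{ print $1 + $2, $1 - $2, $1 * $2, $1 % $2, $1 ^ $2 }')
printf '%s\n' "$out"
```

Division is left out of the sketch because 7/3 is not a whole number, and the way such values are displayed is a matter of output format rather than arithmetic.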
Here are some programs that perform simple arithmetic with the hobbies file. Try
to figure out what they do and what they will print out.
(a)  $1 == "Katie" { print $2, $3/7 }
(b)  { print $1, $2, $3/7 }
(c)  $1 == "Jim" { print $1, $2, "$", $4/52 }
(d)  { print $1, "$", $4*1.05 }

After you have thought about the programs, run them to see if they produce the
output you have predicted. An explanation of each program follows:

(a)  Because field 3 gives the average number of hours per week that a person
     spends on a hobby, $3/7 shows the average number of hours per day.
     Program (a) therefore prints out the number of hours per day Katie spends on
     each of her hobbies.

(b)  This is a variation on program (a). It prints out the number of hours per day
     each person spends on each hobby.

(c)  Field 4 gives the amount of money a person spent this year on a particular
     hobby. Dividing this by 52 gives the average amount of money spent per week.

(d)  If the current inflation rate is 5 percent, multiplying this year's expenses by 1.05
     will give the amount of money the same person might expect to spend next
     year. This is the information that program (d) prints out.

2.1.1 Operation Ordering
Expressions can contain several operations. For example:
A+B*C

As is customary in mathematics, all multiplications and divisions (and remainder
operations) are performed before additions and subtractions. When handling the
expression A+B*C, nawk performs B*C first and then adds A. The value of 2+3*4
is therefore 14 (3x4 first, then add 2). If you want a particular operation done first,
enclose it in parentheses. For example:
(A+B)*C

When evaluating this expression, nawk performs the addition before the
multiplication. Therefore, (2+3)*4 is 20. (Add 2 and 3 first, then multiply by 4.)
For example, consider the following program:
{ print $4/($3*52) }


Field 4 is the amount of money a person spent on a hobby in the last year. Field 3 is
the average number of hours a week the person spent on that hobby, so $3*52 is the
number of hours in 52 weeks (one year). The value $4/($3*52) is therefore the
amount of money that the person spent on the hobby per hour.
Appendix A shows the order of evaluation for nawk expressions.
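The ordering rules above can be checked without any data file. The following is a minimal sketch, assuming an awk-compatible interpreter is installed under the name awk (substitute nawk where appropriate); the variable names a, b, and c are chosen only for illustration.

```shell
# Evaluate three expressions in a BEGIN rule; no input is read.
# a: multiplication is done before addition
# b: parentheses force the addition to be done first
# c: parentheses group the divisor, as in $4/($3*52)
out=$(awk 'BEGIN {
    a = 2 + 3 * 4
    b = (2 + 3) * 4
    c = 100 / (4 * 5)
    print a, b, c
}')
echo "$out"
```

The three values printed correspond to the three expressions in the order they were assigned.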

2.2 Formatted Output
With nawk, you can specify the format you want your output to take. For example:
$1 == "Jim" { print "$", $4/52 }
This program produces the following output:
$ 1.923077
$ .192308
$ 1.346154

This output shows the amount of money per week that Jim spent on his hobbies.
However, it is customary to write money amounts with only two digits after the
decimal point. How can you change the program to make the money amounts look
more normal? The answer is to use the printf action instead of print. The
printf statement lets you specify the format in which output should be printed.
A printf action has the following form:
{ printf format-string, value, value, ... }

The format-string indicates the format in which output should be printed. The values
give the data to be printed.
A format string contains two kinds of items:

• Normal characters, which are just printed out as is

• Placeholders, which are replaced with values given later in the printf action

As an example, try running the following program on the hobbies file:
$2 == "bridge" { printf "%5s plays bridge\n", $1 }
This nawk program will produce the following output:
  Jim plays bridge
Linda plays bridge
 Lori plays bridge

The following format string has one placeholder, %5s:
"%5s plays bridge\n"

The first (and only) value printed by this program is $1; when the printf
statement prints its output, the placeholder is replaced by the value of field 1. The
rest of the format string is printed as is. (Note that the format string ends in \n; this
symbol is explained in Section 2.2.2.)

2.2.1 Placeholders
The form of a placeholder tells nawk how to print out the associated value. All
placeholders begin with a percent sign ( % ) and end in a letter. Table 2-2 shows the
most common letters used in placeholders.

Table 2-2: Format String Placeholders

Placeholder   Description

%d            An integer in decimal form (base 10)
%e            A floating point number in scientific notation, as in -d.ddddddE+dd
%f            A floating point number in conventional form, as in -ddd.dddddd
%g            A floating point number in either e or f form, whichever is shorter; also,
              non-significant zeroes are not printed
%o            An unsigned integer in octal form (base 8)
%s            A string
%x            An unsigned integer in hexadecimal form (base 16)

For example, the following format string contains two placeholders:
"%s %d\n"

The notation %s represents a string and %d represents a decimal integer.
You can put additional information between the percent sign and the letter at the end
of the placeholder. If you put an integer there, as in %5s, the number is used as a
width. The corresponding value is printed using (at least) the given number of
characters. For example:
$2 == "bridge" { printf "%5s plays bridge\n", $1 }

Here, the value of the string $1 replaces the placeholder %5s and is always printed
using at least five characters. The output, therefore, is as follows:
  Jim plays bridge
Linda plays bridge
 Lori plays bridge

If you did not specify the 5 in the placeholder, the output would be different. For
example:
$2 == "bridge" { printf "%s plays bridge\n", $1 }

This program produces the following output:
Jim plays bridge
Linda plays bridge
Lori plays bridge

If no width is given, nawk prints values using the smallest number of characters
possible.
The nawk language also lets you put a minus sign ( - ) in front of the number in the
width position. The amount of output space will be the same, but the information
will be left-justified. For example:
$2 == "bridge" { printf "%-5s plays bridge\n", $1 }

This program's output looks like this:
Jim   plays bridge
Linda plays bridge
Lori  plays bridge

A placeholder for a floating point number may also contain a precision. This is
written as a decimal point followed by an integer. A precision determines the
number of digits to be printed after the decimal point in a floating point number. For
example:
$1 == "John" { printf "$%.2f on %s\n", $4/52, $2 }

Here, the placeholder %.2f indicates that all floating point numbers are to be printed
with two digits after the decimal point. This program produces the following output:
$1.92 on role-playing
$.58 on jogging

Using both a width and a precision can improve the appearance of your program's
output. For example:
$1 == "John" { printf "$%4.2f on %s\n", $4/52, $2 }

This program's output looks like this:
$1.92 on role-playing
$0.58 on jogging

The %4.2f indicates that the corresponding floating point value is to be printed
with a width of four characters, with two characters after the decimal point. Note
that the decimal point itself is counted in the width.
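As a brief sketch of how width, left-justification, and precision combine, the following command formats two sample records. It assumes an awk-compatible interpreter installed as awk; the two input lines are supplied inline for illustration rather than read from the hobbies file, and the square brackets are added only to make the padding visible.

```shell
# %5s   right-justifies field 1 in five columns
# %-5s  left-justifies field 1 in five columns
# %6.2f prints the weekly amount in six columns with two decimal places
out=$(printf 'Jim bridge 4 10.00\nLinda bridge 12 30.00\n' |
    awk '{ printf "[%5s] [%-5s] [%6.2f]\n", $1, $1, $4/52 }')
echo "$out"
```

Because Linda is already five characters long, no padding is added for that name in either column.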
Here are a few more nawk programs that work on the hobbies file. Predict what
each will print out, and run them to see if your prediction is right.
(a)  { printf "%6s %s\n", $1, $2 }
(b)  { printf "%20s: %2d hours/week\n", $2, $3 }
(c)  $1 == "Katie" { printf "%20s: $%6.2f\n", $2, $4 }

2.2.2 Escape Sequences
All of the format strings shown so far have ended in \n. This kind of construct is
called an escape sequence. All escape sequences are made from a backslash
character ( \ ) followed by one, two, or three other characters.
You use escape sequences inside strings to represent special characters. In particular,
the \n escape sequence represents the new-line character. A \n in a printf format
string tells nawk to start printing output at the beginning of a new line. For
example:
$1 == "Lori" { printf " %s", $2 }

This program produces the following output:
jogging weight-lifting bridge

The output is all on one line; without the \n escape sequence, printf does not
start new lines. This action is different from that of print, which begins a new line
each time it executes.
You can use the \n escape sequence in the middle of a format string. For example:
$1 == "John" { printf "%s:\n%d\n", $2, $3 }

This program's output looks like this:
role-playing:
8
jogging:
8

The first new-line escape sequence starts a new line after the colon; the second starts
a new line after the value of $3.
Table 2-3 shows the valid nawk escape sequences.

Table 2-3: Escape Sequences for nawk

Escape   Interpretation

\"       Quotation mark
\a       Audible bell
\b       Backspace
\f       Formfeed
\n       New-line
\r       Carriage return
\t       Horizontal tab
\v       Vertical tab
\ooo     ASCII character, octal ooo

Use the escape sequence \" (a backslash followed by a quotation mark) when you
want a string to contain an actual quotation mark. For example:
"He said, \"Hello\"."

By entering this escape sequence, you indicate that the quotation mark character is
inside the string; if you left out the backslash, nawk would think that the quotation
mark before Hello was marking the end of the string.
Because a backslash followed by another character looks like an escape sequence,
you must type two backslashes ( \\ ) if you want to put a single backslash character
in a string. For example:
{ print "The backslash (\\) character" }

The output from this program is as follows:
The backslash (\) character
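The escape sequences described in this section can be tried together. The following is a small sketch, assuming an awk-compatible interpreter installed as awk; the strings are illustrative only.

```shell
# \" puts a quotation mark inside a string, \\ puts a single backslash,
# and \n ends a line. The last printf has no \n, so no new line is started.
out=$(awk 'BEGIN {
    printf "He said, \"Hello\".\n"
    printf "one\\two\n"
    printf "last line, no new-line"
}')
echo "$out"
```

The awk program is enclosed in single quotation marks so that the shell passes the backslashes to awk unchanged.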

2.3 Variables
Suppose you want to find out how many people have jogging as a hobby. To do
this, you have to look through the hobbies file, record by record, and keep a count
of the number of records that have jogging in their second field. This means you
must remember the count from one record to the next.
A nawk program remembers information by using variables. A variable is a storage
place for information. Every variable has a name and a value. A variable is given a
value with an action of the following form:
name = value

The nawk utility assigns the specified value to the variable that has the given name.
The following example assigns the value 0 (zero) to the variable count:
count = 0

Do not confuse the assignment operator ( = ) with the equality test operator ( ==). A
single equal sign ( = ) stores a value in a variable. A pair of equal signs ( == ) tests to
see if two values are equal.
You can use variables in expressions. For example:
count + 1

The value of this expression is the current value of count plus 1.
Now consider the action in the following example:
count = count + 1

Your nawk program first finds the value of count + 1 and then assigns this value
to count. This action increases the value of count by 1. You can use this kind of
action in a program to count how many people have jogging as a hobby.
BEGIN { count = 0 }                                   [1]
$2 == "jogging" { count = count + 1 }                 [2]
END { printf "%d people like jogging.\n", count }     [3]

A line by line review of this program follows:

[1]

When a rule has BEGIN as its pattern, the associated action is performed before
nawk has looked at any of the records in the data file. Therefore, nawk begins
by assigning the value 0 to count.

[2]

This line adds one to count every time nawk finds a record with jogging in
the second field.

[3]

When a rule has END as its pattern, the associated action is performed after
nawk has looked at all records in the data files specified on the command line.
Thus, after nawk has looked at all the records, the printf action prints out
the count of people who jog. The output from the program will be as follows:
3 people like jogging.

Notice how the value of count is printed out in place of the %d placeholder.
Here are a few more programs that use variables. Examine the programs and try to
figure out what they are doing.
(a)

BEGIN { count = 0 }
$1 == "John" { count = count + 1 }
END { printf "John has %d hobbies.\n", count }

(b)

BEGIN { sum = 0 }
$1 == "Linda" { sum = sum + $4 }
END { printf "Linda spends $%6.2f a year\n", sum }

(c)

BEGIN { hours = 0 }
$1 == "Lori" { hours = hours + $3 }
END { printf "Lori passes %d hours/week\n", hours }

Here is what each of these programs does:
(a)

This program counts the number of hobbies that John has.

(b)

This program adds up the amount of money that Linda spent on hobbies in the
past year.

(c)

This program calculates the number of hours a week that Lori spends on her
hobbies.

Using variables, you can write even more complex programs. For example, consider
the following:
BEGIN { sum = 0; count = 0 }
$2 == "role-playing" {
    count = count + 1
    sum = sum + $4
}
END { printf "Average per person: $%6.2f\n", sum/count }

This program has two variables. The count variable keeps track of the number of
people with role-playing as a hobby, and sum keeps track of the amount of money
spent on role-playing. When sum is divided by count, the result is the average
amount spent on role-playing.
Notice that the action part of the BEGIN rule contains two assignment instructions.
A semicolon is used to separate the two instructions. The second rule in the program
also has two assignments:
count = count + 1
sum = sum + $4

These two instructions are on separate lines. When an action contains more than one
instruction, you can separate the instructions with semicolons or put them on separate
lines.
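The two ways of separating instructions can be compared side by side. The following sketch assumes an awk-compatible interpreter installed as awk; the three input records and the variable names n1, s1, n2, and s2 are made up for illustration.

```shell
# The second rule separates its two instructions with a semicolon.
# The third rule puts the same two instructions on separate lines.
# Both rules should accumulate the same count and sum.
out=$(printf 'Ann chess 3 20.00\nBob chess 5 40.00\nCy golf 2 90.00\n' |
    awk '
BEGIN { n1 = 0; s1 = 0; n2 = 0; s2 = 0 }
$2 == "chess" { n1 = n1 + 1; s1 = s1 + $4 }
$2 == "chess" {
    n2 = n2 + 1
    s2 = s2 + $4
}
END { printf "%d %.2f %d %.2f\n", n1, s1/n1, n2, s2/n2 }')
echo "$out"
```

Each pair of numbers in the output is a count followed by an average, so the two pairs should be identical.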
Variables can be used in the pattern part of a rule. For example:
BEGIN { max = 0 }
$3 > max { max = $3 }
END {printf "The maximum time is %d hours.\n", max}

This program finds the maximum value of field 3 in the hobbies file. The
maximum is set to 0 to start. Then, if a record has a value in field 3 that is greater
than the current value of max, max is set to this new value. At the end of the data
file, max will hold the largest value found.
As an exercise, try to write a nawk program that examines the hobbies file and
calculates the average number of hours per week that someone spends on any one
hobby. Then write a program that calculates the average number of hours per year
that a person spends on any one hobby.

2.3.1 The Increment and Decrement Operators
You know how to advance the value held in a variable with an addition operation:
count = count + 1

This is such a common operation that nawk has a special operator for incrementing
variables by 1:
count++

A pair of minus signs ( -- ) is the counterpart of ++. This operator decrements
(subtracts 1 from) the current value of a variable. For example, to subtract 1 from
count, you could use either of these two forms:
count = count - 1
count--
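A short sketch of both operators at work follows, assuming an awk-compatible interpreter installed as awk. The variable names seen and left and the three input lines are illustrative only.

```shell
# For each input record, add 1 to seen and subtract 1 from left.
# left starts at 3, so after three records it should reach 0.
out=$(printf 'a\nb\nc\n' | awk '
BEGIN { left = 3 }
{ seen++; left-- }
END { print seen, left }')
echo "$out"
```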

2.3.2 Initial Values
If you use any variable in an arithmetic expression before you assign the variable a
value, the variable is automatically given the value 0. This means that the BEGIN
rule in the following program could be left out:

BEGIN { count = 0 }
$2 == "jogging" { count = count + 1 }
END { printf "%d people jog\n", count }
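The same program can be run without its BEGIN rule to confirm this behavior. The following is a minimal sketch, assuming an awk-compatible interpreter installed as awk; the three input records are made up for illustration.

```shell
# count is never assigned before its first use, so it starts at 0.
out=$(printf 'Ann jogging\nBob chess\nCy jogging\n' |
    awk '$2 == "jogging" { count = count + 1 }
         END { printf "%d people jog\n", count }')
echo "$out"
```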

2.3.3 Built-In Record-Oriented Variables
The nawk language has several built-in variables that you can use in your programs.
You do not have to assign values to these variables; nawk automatically assigns the
values for you. Table 2-4 describes some of the important numeric built-in variables.
These variables have to do with information about records.

Table 2-4: Built-In Record-Oriented Variables

Variable   Description

NR         Contains the number of records that have been read so far. When nawk is
           looking at the first record, NR has the value 1; when nawk is looking at
           the second record, NR has the value 2; and so on. In a BEGIN rule, NR
           has the value 0. In an END rule, NR contains the total number of records
           that were read. The following rule prints the total number of data records
           read by the nawk program:
               END { print NR }

FNR        Like NR, but counts the number of records that have been read so far from
           the current file. When several data files are given on the nawk command
           line, FNR is set back to 1 when nawk begins reading each new file. Thus,
           the following rule will print the line number in the current file, followed
           by a colon, followed by the contents of the current line:
               { printf "%d:%s\n", FNR, $0 }

NF         Gives the number of fields in the current record. For the hobbies file,
           NF is 4 for each line because there are four fields in each record. In an
           arbitrary text file, NF gives the number of words on the current line in the
           file; by default, the fields of a file are assumed to be separated by blanks,
           so each word on a line is considered to be a separate field. The following
           program therefore prints out the total number of words in the file:
               { count = count + NF }
               END { print count }

You can use built-in variables in place of any other variable or value. In particular,
they can appear in the pattern part of a rule. For example:
NF > 10 { print }

This rule prints out any record that has more than ten fields. Here is another
example:
NR == 5 { print }

This rule prints out record 5 in a file; the pattern selection criterion is true only when
NR is 5.
Try to predict what the following example will do:
{ print $NF }

Because NF is the number of fields in the current record, it is also the number of the
last field in the record. Therefore, $NF refers to the contents of the last field in a
record, and the command in the previous example prints the last field in every record
in the data file.
To test your understanding of almost everything discussed in this chapter, try to
predict what the following rule will print:
(NR % 5) == 0

The expression NR % 5 calculates the remainder of NR divided by 5. The rule prints
out a record whenever this remainder is equal to 0. Therefore, the rule prints out
every fifth record from the data file.
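The built-in variables can be combined in a single rule. The following sketch assumes an awk-compatible interpreter installed as awk and uses four made-up input lines; it selects every second record and reports its record number, its field count, and its last field.

```shell
# (NR % 2) == 0 is true for records 2, 4, 6, and so on.
# NF is the number of fields, so $NF is the last field of the record.
out=$(printf 'one two three\nfour five\nsix\nseven eight nine ten\n' |
    awk '(NR % 2) == 0 { printf "%d: %d fields, last is %s\n", NR, NF, $NF }')
echo "$out"
```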
As an exercise, write nawk programs to do the following:
(a)

Print every record that does not have exactly three fields.

(b)

Print the total number of words and total number of lines in a text file. (This is
two thirds of what the wc(1) command does.)

(c)

Print the total number of records that have either four fields or five fields.

(d)

Print the average number of words per line in a text file.

Write these programs and test them by running them on arbitrary text files. Once
you have solutions that work, compare them against the following answers:
(a)  NF != 3

(b)  { words = words + NF }
     END { printf "Words = %d, Lines = %d\n", words, NR }

(c)  NF == 4 { count = count + 1 }
     NF == 5 { count = count + 1 }
     END { print count }

(d)  { words = words + NF }
     END { printf "Average = %d\n", words/NR }

There are often several ways to write a given program; your solutions may differ
from the ones presented here.

2.4 Arithmetic Functions
In nawk, a function can be compared to a car assembly line: you feed in various
parts and raw materials at one end, and you get out a complete product at the other
end. In nawk, a function is fed data values (called the arguments of the function)
and the final product is also a data value (called the result of the function).
You may already be familiar with this kind of function in mathematics. For
example, mathematics uses sin to stand for a function that calculates the
trigonometric sine of an angle. If you "feed" an angle into the sin function, the
number returned is the trigonometric sine of the given angle. The angle is the
argument of the function, and the sine is the result.
In nawk, you use functions inside expressions. For example:
y = sin(x)

The right hand side of the assignment is a function call. The name of the function
is sin; this name is immediately followed by the function's arguments, which are

enclosed in parentheses. When a nawk program contains a function call, nawk
calculates the result of the function and uses that result in the expression that contains
the function call. In the statement y = sin(x), nawk calculates the number that is
the sine of the given angle and then assigns that number to the variable y.
Another nawk function is sqrt, whose result is the square root of its argument.
The following statement assigns the value 4 to x:
x = sqrt(16)

To show how you can use these functions, suppose you have a set of data that
contains one number per line. Here is a program that reads these numbers and prints
out the square root of each:
{ printf "Number: %f, Root: %f\n", $1, sqrt($1) }

You can run this program with the following command line, and then type in
numbers from the terminal:
% awk '{ printf "Number: %f, Root: %f\n", $1, sqrt($1) }'

Each time you press the RETURN key at the end of the line, nawk prints out the
square root of the number.
Any argument of a function can be an expression instead of a single value. For
example:
y = sin(2*x)

Your nawk program will calculate the value of the expression and then use the
resulting value as the argument of the function.
The nawk language recognizes the most common mathematical functions, as shown
in Table 2-5.

Table 2-5: Common Mathematical Functions

Function     Result

sin(x)       Sine of x, where x is in radians
cos(x)       Cosine of x, where x is in radians
atan2(y,x)   Arctangent of y/x in range -π to π radians
log(x)       Natural logarithm (base e)
exp(x)       Exponential (e to the power x)
sqrt(x)      Square root of x
int(x)       Integer part of x
rand()       Random number n, 0 ≤ n < 1
srand(x)     Sets x as seed for rand()

Several of these functions need a little more explanation.
The int function takes a floating point number as an argument and returns an
integer. The integer is the floating point number without its fractional part. For
example:
int(6.3)

This expression has the value 6. The following expression has the value -7. Note
that the fractional part is removed (truncated), not rounded.
int(-7.4)

The next expression has the value 8:
int(8.99999)
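These values can be confirmed from the command line. The following is a minimal sketch, assuming an awk-compatible interpreter installed as awk; it also evaluates sqrt(16) from the earlier example.

```shell
# int drops the fractional part (truncates toward zero); it does not round.
out=$(awk 'BEGIN { print int(6.3), int(-7.4), int(8.99999), sqrt(16) }')
echo "$out"
```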

A call to rand returns a random number greater than or equal to 0 and less than 1.
In this way, you can get a sequence of random numbers. You can use srand to set
the starting point (seed) for a random number sequence. If you set the seed to a
particular value, you will always get the same sequence of numbers from rand.
This is useful if you want a program to use rand but obtain the same results every
time the program runs.
As an example of how you can use rand, here is a sequence of instructions that
could be used in a nawk program to simulate a roll of two six-sided dice.
die1 = int(6 * rand() + 1)
die2 = int(6 * rand() + 1)

The function call rand() obtains a random floating point number from 0 to 1 (not
including 1). Note that the function call needs the parentheses, even though rand
requires no argument values. Multiplying the random number by 6 gives a floating
point value from 0 to 6 (not including 6). Adding 1 gives a floating point value from
1 to 7 (not including 7). Applying the int function to this floating point value
drops the fraction part, giving an integer from 1 to 6.
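The dice instructions can be combined with srand to obtain a repeatable roll. The following sketch assumes an awk-compatible interpreter installed as awk; the seed value 42 and the shell variable names prog, first, and second are arbitrary choices for illustration.

```shell
# Seed the generator with a fixed value, then roll two dice.
# Running the same program twice should give the same pair of numbers,
# because the seed (and therefore the sequence from rand) is the same.
prog='BEGIN {
    srand(42)
    die1 = int(6 * rand() + 1)
    die2 = int(6 * rand() + 1)
    print die1, die2
}'
first=$(awk "$prog")
second=$(awk "$prog")
echo "first run: $first, second run: $second"
```

The particular numbers depend on the awk implementation, so none are shown here; only the range (1 to 6) and the repeatability are expected to hold.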

Patterns and Regular Expressions

So far, this manual has discussed three kinds of patterns: comparisons, and the
special patterns BEGIN and END. This chapter discusses a fourth kind: regular
expressions.
A regular expression is a way of telling nawk to select records that contain certain
strings of characters. For example, the following rule tells nawk to print all records
that contain the string ri:
/ri/ { print }

Applying this rule to the hobbies file produces this output:
Jim     bridge            4    10.00
Linda   bridge           12    30.00
Lori    jogging           5    30.00
Lori    weight-lifting   12   200.00
Lori    bridge            2     0.00

All these records contain ri, either in Lori or bridge.
Regular expressions are always enclosed in slashes. For example:
/ing/

This expression finds all the records that contain ing.
The nawk language pays attention to the case of letters in regular expressions. For
example,
/li/

will print the record that contains weight-lifting; however, the /li/ does not
match the Linda records because the L in Linda is uppercase.
It is important to recognize the difference between two rules like the following:
$1 == "Lori"
/Lori/

To satisfy the first of these patterns, a record must have its first field exactly equal to
the string Lori. If the first field is Lorie, for example, the comparison will not be
true and the pattern will not be satisfied. With the regular expression /Lori/ the
string Lori can appear anywhere in the record, and can be all or part of a field.
This regular expression would match a string like Lorie.
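The difference between the two rules can be seen by running both against the same input. The following sketch assumes an awk-compatible interpreter installed as awk; the three input records are made up for illustration.

```shell
# The first rule requires field 1 to be exactly Lori.
# The second rule accepts Lori anywhere in the record.
out=$(printf 'Lori bridge\nLorie chess\nAnn Lori-fan\n' |
    awk '$1 == "Lori" { printf "exact: %s\n", $0 }
         /Lori/       { printf "regex: %s\n", $0 }')
echo "$out"
```

Only the first record satisfies the comparison, while all three records satisfy the regular expression.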

3.1 Using Matching Expressions
If the pattern in a rule is a regular expression, nawk looks for a matching string
anywhere in a record. Sometimes, however, you only want to look for a matching
string in a particular field of a record. In this case, you can use a matching
expression.

Two types of expressions check for matches:

• The following expression is true if the string matches the given regular
  expression:
  string ~ /regular-expression/

• The following expression is true if the string does not match the given regular
  expression:
  string !~ /regular-expression/

The statement in the following program looks for matching strings; applied to the
hobbies file, it will print all records that have ri contained somewhere in the
second field:
$2 ~ /ri/

This example produces the following output:
Jim     bridge    4   10.00
Linda   bridge   12   30.00
Lori    bridge    2    0.00

The following rule looks for nonmatching strings; it will print all records that do not
have the letter J somewhere in the first field:
$1 !~ /J/

Note that the following two patterns are equivalent because $0 represents the whole
record:
/Lori/
$0 ~ /Lori/
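Both matching operators can be tried on a few records. The following sketch assumes an awk-compatible interpreter installed as awk; the four two-field input records are supplied inline for illustration.

```shell
# $2 ~ /ri/ looks for ri in field 2 only, so Lori in field 1 does not count.
# $1 !~ /J/ selects records whose first field contains no J.
out=$(printf 'Jim bridge\nLori jogging\nKatie reading\nJohn bridge\n' |
    awk '$2 ~ /ri/ { printf "match: %s\n", $0 }
         $1 !~ /J/ { printf "no J: %s\n", $0 }')
echo "$out"
```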

3.2 Metacharacters
Several characters have special meanings when they are used in regular expressions.
These special characters, known as metacharacters, are described in Table 3-1.

Table 3-1: Metacharacters Recognized by nawk

Character   Description

^           Stands for the beginning of a field. For example:
                $2 ~ /^b/ { print }
            This rule prints any record whose second field begins with b.

$           Stands for the end of a field. For example:
                $2 ~ /g$/ { print }
            This rule prints any record whose second field ends with g.

.           Matches any single character (except the new-line). For example:
                $2 ~ /i.g/ { print }
            This rule selects the records with fields containing ing, and also selects
            the records containing bridge (idg).

|           Means "or." For example:
                /Linda|Lori/
            This regular expression matches either of the strings Linda or Lori.

*           Indicates zero or more repetitions of a character. For example, /ab*c/
            matches abc, abbc, abbbc, and so on. It also matches ac (zero
            repetitions of b). The asterisk is most frequently used in conjunction with
            the period ( .* ). Because the period matches any character except the
            new-line, the period/asterisk combination matches an arbitrary string of
            zero or more characters. For example:
                $2 ~ /^r.*g$/ { print }
            This rule prints any record whose second field begins with r, ends in g,
            and has any set of characters between (for example, reading and
            role-playing).

+           Similar to the asterisk, but stands for one or more repetitions of a
            character. For example, /ab+c/ matches abc, abbc, and so on; but it does
            not match ac.

?           Similar to the asterisk, but stands for zero or one repetitions of a
            character. For example, /ab?c/ matches ac and abc, but not abbc, and so on.

{m,n}       Indicates m to n repetitions of a character (where m and n are both
            integers). For example, /ab{2,4}c/ matches abbc, abbbc, and
            abbbbc, but nothing else.

[X]         Matches any one of the set of characters X given inside the brackets. For
            example:
                $1 ~ /^[LJ]/ { print }
            This rule prints any record whose first field begins with either L or J. As
            a special case, [:lower:] inside brackets stands for any lowercase
            letter, [:upper:] inside brackets stands for any uppercase letter,
            [:alpha:] inside brackets stands for any letter, and [:digit:]
            inside brackets stands for any digit. For example:
                /[[:digit:][:alpha:]]/
            This expression matches a digit or letter.

[^X]        Matches any one character that is not in the set X that follows the
            circumflex ( ^ ). For example:
                $1 ~ /^[^LJ]/ { print }
            This rule prints any record whose first field does not begin with L or J.
                $1 ~ /^[^[:digit:]]/ { print }
            This rule prints any record whose first field does not begin with a digit.

(X)         Matches anything that the regular expression X does. Parentheses are used
            to control the way in which other special characters behave. For example,
            the asterisk ( * ) normally applies to the single character that immediately
            precedes it. For example, /abc*d/ matches abd, abcd, abccd, and so
            on. However, /a(bc)*d/ matches ad, abcd, abcbcd, and so on.

When a metacharacter appears in a regular expression, it usually has its special
meaning. If you want to use one of these characters literally (without its special
meaning), put a backslash in front of the character. For example, the following
statement prints all records that contain a dollar sign ( $ ) followed by a 1:
/\$1/ { print }

If you wrote the expression without the backslash, nawk would search for records in
which the end of the record is followed by a 1, which is impossible.

Because the backslash has this special meaning, it too is considered a metacharacter.
If you want to create a regular expression that matches a backslash, you must
therefore use two backslashes ( \\ ).
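Escaped metacharacters can be tested on a few lines of text. The following sketch assumes an awk-compatible interpreter installed as awk; the three input lines are made up for illustration.

```shell
# /\$1/ matches a literal dollar sign followed by 1.
# /\\/  matches a single literal backslash.
out=$(printf '%s\n' 'cost $1 each' 'cost 1 each' 'path a\b' |
    awk '/\$1/ { printf "dollar: %s\n", $0 }
         /\\/  { printf "backslash: %s\n", $0 }')
echo "$out"
```

The second input line contains neither a dollar sign nor a backslash, so neither rule selects it.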

3.3 Using Matching Expressions with Strings
Until now, you have seen matching operations that contain regular expressions inside
slash ( / ) characters. Matching operations can also refer to normal strings; for
example:
$1 ~ "xyz"

This has the same effect as the following statement:
$1 ~ /xyz/

Regular expressions written between slashes are compiled when the program is read.
To use a string as a regular expression, nawk constructs a dynamic regular expression
out of the string. Dynamic regular expressions take more time to compile than regular
expressions, but they are more powerful because the string can be any string value,
including one held in a variable.
When a matching operation uses a string instead of a regular expression, and the
string contains one or more metacharacters, the situation is a little bit tricky. If you
want to escape a metacharacter (have it taken literally), you must use two
backslashes instead of one. For example, suppose you want to look for strings of the
form "$1.00" in field 4 of a record. Using regular expressions, you would write
the statement as follows to show that both the dollar sign ( $ ) and the period ( . )
should be taken literally:
$4 ~ /\$1\.00/

With strings, you would have to write the statement like this:
$4 ~ "\\$1\\.00"

Two backslashes are needed instead of one. The reason is simple: as discussed in
Chapter 2, you need to type two backslashes inside a quoted string to get the effect of
one. For example:
{ print "The backslash character: \\" }

This program prints the following:
The backslash character: \

To match an actual backslash with a dynamic regular expression, you must use four,
as in:
$1 ~ "\\\\"

The literal string "\\\\" is read by nawk and turned into a string consisting of
two backslashes ( \\ ). When used as a dynamic regular expression, this will match
one backslash.
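The slash form and the string form can be compared directly. The following sketch assumes an awk-compatible interpreter installed as awk; the three input records are made up, and the amount is placed in field 2 rather than field 4 to keep the sample short.

```shell
# Both rules look for a literal $1.00 in field 2.
# The slash form needs one backslash per metacharacter;
# the string form needs two, because the string is processed first.
out=$(printf '%s\n' 'a $1.00' 'b $1x00' 'c 1.00' |
    awk '$2 ~ /\$1\.00/   { printf "regex: %s\n", $0 }
         $2 ~ "\\$1\\.00" { printf "string: %s\n", $0 }')
echo "$out"
```

Only the first record contains a dollar sign followed by 1, a period, and 00, so it should be the only record selected, once by each rule.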

3.4 Applying Actions to a Group of Lines
Pattern ranges let you apply an action to a group of lines. A rule that applies to a
pattern range has the following form:
pattern1, pattern2 { action }

This rule performs the given action on every line, starting at an occurrence of
pattern1 and ending at the next occurrence of pattern2 (inclusive). For example:
NR == 1, NR == 10 { print $1 }

This rule prints the first field of each of the first 10 input lines. It starts when NR is 1
and ends when NR is 10. Here is another example, using the hobbies file as its
data file:
/Jim/, /Linda/ { print $2 }

This example produces the following output:
reading
bridge
role-playing
bridge

As you can see, this program prints the second field of all lines between an
occurrence of Jim and an occurrence of Linda.
After nawk has found a record matching pattern2, it begins to look for a line
matching pattern1 again. In the following example, nawk prints the first range of
records from reading to role, then starts looking for reading again.
/reading/, /role/

The output from this program looks like this:
Jim     reading        15   100.00
Jim     bridge          4    10.00
Jim     role-playing    5    70.00
Katie   reading        10    60.00
John    role-playing    8   100.00

It is important to remember that nawk starts performing the rule's action as soon as
there is a record that matches pattern1. A nawk program does not check to make
sure that there is a line matching pattern2 in the rest of the file. For example:
/Lori/, /Jim/ { print $2 }

In this case, nawk begins printing at the first record that contains Lori, and
continues until it reaches the end of the file, finding no record that matches the
second pattern, Jim.
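Both behaviors, restarting a range and running to the end of the file, can be seen in one small run. The following sketch assumes an awk-compatible interpreter installed as awk; the words start and stop and the seven input lines are made up for illustration.

```shell
# The first range runs from the first start line through the stop line.
# The second start line opens a new range that never finds a stop line,
# so it continues to the end of the input.
out=$(printf 'a\nstart\nb\nstop\nc\nstart\nd\n' |
    awk '/start/, /stop/ { print }')
echo "$out"
```

The lines a and c fall outside both ranges and are not printed.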

3.5 Combining Conditions in Patterns
A double ampersand ( && ) operator means AND. It is used to combine conditions in
patterns. For example:
$3 > 10 && $4 > 100.00 { print $1, $2 }

In this case, nawk prints the first and second fields of any record where $3 is greater
than 10 and $4 is greater than 100.00. Here is another example:
$1 ~ /J/ && $4 < 50.00

This rule prints all records in which the first field $1 contains a J and the fourth field
$4 is less than 50.00.
The double vertical bar ( || ) operator means OR. It is also used to combine
conditions in patterns. For example:
$1 == "Linda" || $1 == "Lori"

This rule prints any record whose first field is either Linda or Lori. Here is
another example:
/jogging/ || /reading/ { sum = sum + $4 }
END { print sum }

This program calculates the total money spent by hobbyists on both jogging and
reading (because sum is increased if the hobby is either jogging or reading).
This program is equivalent to the following program:
/jogging|reading/ { sum = sum + $4 }
END { print sum }

These last two examples demonstrate that there are often several ways of writing the
same program.
The double ampersand and double vertical bar operators can only be used to combine
complete pattern expressions. For example, a pattern like the following does not test
whether $1 is Lori; the string "Lori" standing alone counts as a true condition, so
the pattern selects every record:
$1 == "Linda" || "Lori"

You must write this kind of pattern this way:
$1 == "Linda" || $1 == "Lori"
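The two operators can be tried on a few records in the style of the hobbies file. The following sketch assumes an awk-compatible interpreter installed as awk; the four input records are sample data supplied inline for illustration.

```shell
# The first rule is true when field 1 is Linda OR field 1 is Lori.
# The second rule is true when field 1 contains J AND field 4 is below 50.00.
out=$(printf 'Linda bridge 12 30.00\nLori jogging 5 30.00\nJim reading 15 100.00\nJohn jogging 8 30.00\n' |
    awk '$1 == "Linda" || $1 == "Lori" { printf "either: %s\n", $1 }
         $1 ~ /J/ && $4 < 50.00        { printf "both: %s\n", $1 }')
echo "$out"
```

The Jim record contains a J but fails the second condition, so the AND rule does not select it.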

For practice with the concepts discussed in this chapter, write programs that do the
following:
(a) Print every record that begins with A and contains more than four fields.
(b) Print the number of records that contain a dollar sign ($).
(c) Print records 10 through 20 of every data file.
(d) Print every tenth record of a file, plus the record that immediately follows the
    tenth record (records 10 and 11, records 20 and 21, and so on).

When you have written your programs, compare them against the solutions that
follow. Remember that there may be several ways to write the same program.
(a) /^A/ && NF > 4

(b) /\$/ { count = count + 1 }
    END  { print count }

(c) FNR == 10, FNR == 20

(d) (NR % 10) == 0, (NR % 10) == 1
    or
    ((NR % 10) == 0) || ((NR % 10) == 1)

Actions and Control Structures

So far, you have learned three actions: print, printf, and assignments. In this
chapter, you will examine a wide variety of constructs that may appear in the action
part of a nawk rule. Note that most of these are virtually identical to constructs in
the C programming language.

4.1 Adding Comments
A comment is a note inside your program, explaining what the program is doing.
Your nawk program ignores comments, so they do not affect how your program
behaves, but they do help explain what is going on.
A comment begins with a number sign (#). When nawk sees the number sign in a
program (outside of a quoted string or regular expression), it ignores the rest of the
line. For example:
# This program adds up the hours John spends on hobbies
/John/ { sum = sum + $3 }    # field 3 is hours
END    { print sum }

The first line of this program explains what the program is doing. This is useful
when you have a number of nawk programs stored in different files and you cannot
remember which program is which. A comment at the beginning of the program lets
you identify the program without having to read through the code and figure out what
is going on.
The following example shows another way in which you can use comments:
/John/ { sum = sum + $3 }    # field 3 is hours
A comment on the end of a line can give further information about what that line is
doing. In this case, it explains the meaning of the number in field 3 of the record.
It is a good practice to use comments in your programs. Without meaningful
comments, you may find it difficult to understand a program if you look at it several
months after you wrote it. Comments also make it easier for others to understand the
programs you write.

4.2 The if Statement
An if statement lets you perform an action if a specified condition is true. The
statement has the following form:

if (expression) statement1 else statement2

Typically, the expression in an if statement has a true/false value. If the value is
true, statement1 is performed; otherwise, statement2 is performed. The else
statement2 part is optional.
To see how if statements are used, consider the following programs, which examine
a file of baseball scores. This file is named baseball, and it looks like this:

Brewers
Brewers
Blue Jays

Tigers
Blue Jays
Red Sox

5
2

9
6
7

Each line gives the home team first and the visitors second. Fields in each record are
separated by tab characters (shown here as wide spaces) instead of single blanks,
because some team names contain blanks. This means that you must use the
following option when you run command-line nawk programs on the baseball file:
-F"\t"

This option is equivalent to having the following line in a nawk program file:
BEGIN { FS = "\t" }

(The built-in FS variable is explained in Chapter 5.)
Consider the following program:
{ if ($2 > $4) print "Home"
  else print "Visitor"
}

This program prints Home when the home team's score ( $2 ) is greater than the
visiting team's, and prints Visitor otherwise.
The else part of an if statement can be omitted. In this case, nawk does nothing
if the expression of the if statement is not true. For example:
$1 ~ /Tigers/ { if ($2 > $4) win++ }
END { print win }

This is a simple program that looks at all the Tigers' home games and prints out the
number of times the Tigers won. On records where $2 is not greater than $4, nawk
takes no action.
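
As a quick check of the if statement without an else part, the following sketch runs the Tigers program on three invented, tab-separated score records. The data, the variable names, and the use of the command name awk (rather than nawk) are assumptions made for illustration.

```shell
# Hypothetical scores: home team, home score, visitors, visitor score,
# separated by tab characters.
data=$(printf 'Tigers\t5\tBlue Jays\t3\nTigers\t2\tRed Sox\t7\nBrewers\t4\tTigers\t1\n')

# Count only the home games that the Tigers won; other records are ignored.
out=$(printf '%s\n' "$data" |
      awk -F"\t" '$1 ~ /Tigers/ { if ($2 > $4) win++ }
                  END { print win }')
printf '%s\n' "$out"
```

Only the first record satisfies both the pattern and the if condition, so the count is 1.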
As a more complicated example, consider this program:
$1 ~ /Yankees/ {
        if ($2 > $4) print "Home Win"
        else         print "Home Loss"
}
$3 ~ /Yankees/ {
        if ($4 > $2) print "Away Win"
        else         print "Away Loss"
}

This program runs through the baseball scores looking for games involving the
Yankees. Appropriate messages are written for each possible outcome.
This next program is similar to the previous program. However, this program keeps
track of the number of wins and losses, at home and away, then prints these values at
the end:
$1 ~ /Yankees/ {
        if ($2 > $4) hw++
        else         hl++
}
$3 ~ /Yankees/ {
        if ($4 > $2) aw++
        else         al++
}
END {
        printf "Home Wins: %d\n", hw
        printf "Home Losses: %d\n", hl
        printf "Away Wins: %d\n", aw
        printf "Away Losses: %d\n", al
}

4.2.1 A Word on Style
Note the way in which indentation is used in the preceding program:
• Except in trivial cases, the program begins a new line after every opening
  brace ({).
• Every else is lined up under the corresponding if.
• Parallel statements, like the sequence of printf instructions, are lined up
  underneath each other.

It is not necessary to write nawk programs in this way, but appropriate indentation
and spacing make programs easier to read and understand. Your style for writing
programs can also help you spot errors as you type in your program. For example, if
you always try to make opening and closing braces line up, it is easy to notice if you
leave out a brace.
The indentation format used in the rest of this guide demonstrates a clean, readable
programming style. All programmers develop personal preferences as they become
familiar with a language, and you may decide to deviate from this guide's style in
some respects. The important thing is to have a style and to follow it consistently in
all your programs. It may not make much difference now, when your programs are
relatively simple; but as your programs become more complex, you will find that
style will be an important aid to writing programs that work correctly.

4.3 Using Compound Statements
In an if statement, you might sometimes want to perform several instructions. You
can do this by enclosing the instructions in braces. Such a construct is called a
compound statement.
For example, consider the following program:
{
        if ($2 > $4) {
                homewin++
                printf "The %s defeated the %s.\n", $1, $3
        } else {
                homeloss++
                printf "The %s defeated the %s.\n", $3, $1
        }
}
END {
        printf "The home team won %d times.\n", homewin
        printf "The home team lost %d times.\n", homeloss
}

The first action is applied to every record in the file. It keeps a count of how many
times the home team wins and how many times the home team loses. It also prints
out a line telling who defeated whom. The END action summarizes the results after
they have been calculated.
As another example, the following program examines the games involving the
Orioles:
$1 ~ /Orioles/ {
        if ($2 > $4) {
                win++           # Home win
                printf "%s: %d, %s: %d\n",$1,$2,$3,$4
        } else {
                loss++          # Home loss
                printf "%s: %d, %s: %d\n",$3,$4,$1,$2
        }
}
$3 ~ /Orioles/ {
        if ($4 > $2) {
                win++           # Away win
                printf "%s: %d, %s: %d\n",$3,$4,$1,$2
        } else {
                loss++          # Away loss
                printf "%s: %d, %s: %d\n",$1,$2,$3,$4
        }
}
END {
        printf "Wins: %d, Losses: %d\n", win, loss
}

Each line of output from the first two actions will have the following form:
Winning team: score, Losing team: score

The final line of output (from the END rule) summarizes the Orioles' wins and losses.
Examine this program closely to see how it works. The program is straightforward,
but you should make sure you understand how it covers all the possible cases.
One if statement can contain another. For example, the previous program could
have been written as follows:
/Orioles/ {
        if ($2 > $4) {
                # Home team wins
                printf "%s: %d, %s: %d\n",$1,$2,$3,$4
                if ($1 ~ /Orioles/)
                        win++
                else
                        loss++
        } else {
                # Home team loses
                printf "%s: %d, %s: %d\n",$3,$4,$1,$2
                if ($3 ~ /Orioles/)
                        win++
                else
                        loss++
        }
}
END {
        printf "Wins: %d, Losses: %d\n", win, loss
}

This version of the program determines whether the game was won by the home
team, prints out the scores with the winner first, and then checks to see if the Orioles
were the home team or the visitors. The previous version of the program split the
problem into two parts: one action performed when the Orioles were the home team
and one when they were not.

4.4 The while Loop
A while loop repeats one or more other instructions as long as a given condition
holds true. A while loop has the following format:
while (expression) statement

The statement can be a single statement or a compound statement. For example, the
file numbers contains a set of one to ten random numbers on each line. The
following program adds up the numbers on each line and prints the line's total:
{
        sum = 0
        i = 1
        while (i <= NF) {
                sum = sum + $i
                i = i + 1
        }
        print sum
}

The variable i counts fields in the record. While i is less than or equal to the total
number of fields in the record, the while loop adds the value of the ith field to sum
and then adds 1 to i. The loop then starts again; if the new value of i is still less
than or equal to the total number of fields, the loop adds the value of the next field.
The loop stops when i is greater than NF.
As another example, here is a program that uses the same data file and prints out the
maximum value on each line:
{
        max = $1                # starting max is field 1
        i = 2
        while (i <= NF) {
                if ($i > max) max = $i
                i = i + 1
        }
        print max
}

On each line, the variable max starts out with the value of the first field (the first
number). The while loop then moves across the record number by number, using
an if statement to test whether a field is greater than the current value of max. If a
greater value is found, max is assigned the new maximum value. After the loop, the
maximum value is printed.
What does this program do if there is only one number on a particular line? In that
case, NF would be 1. The nawk program would execute the following statements
and find that i was already greater than NF:
max = $1
i = 2
while (i <= NF)

...

Therefore, nawk would not execute any of the instructions in the while loop at all.
If the condition part of a while loop is false when the loop is first encountered, the
statements in the loop are not executed.
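
The single-number case can be tried directly. The sketch below feeds three invented lines, one of which holds a single number, to the maximum-value program; the data and the variable out are hypothetical, and the command name awk is assumed.

```shell
# Hypothetical contents of the numbers file; the second line has one field.
data='3 9 4
7
10 2'

out=$(printf '%s\n' "$data" | awk '{
    max = $1            # starting max is field 1
    i = 2
    while (i <= NF) {   # false immediately when NF is 1, so the body is skipped
        if ($i > max) max = $i
        i = i + 1
    }
    print max
}')
printf '%s\n' "$out"
```

For the line containing only 7, the loop body never runs and the starting value of max is printed unchanged.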
As an exercise, try to write a program that reads a normal text file and writes out the
text, one word per line.

4.5 The for Loop
A for loop is another way to repeat instructions as long as a given condition holds
true. A for loop has the following format:

for (expression1; expression2; expression3) statement

This loop is equivalent to the following instruction sequence:
expression1
while (expression2) {
        statement
        expression3
}

For example, the following program uses a for loop to print the fields of each
record in reverse order:
{
        for (i = NF; i > 0; i--)
                printf "%s ", $i
        printf "\n"
}

The program that prints the maximum value in an input line could be written as
follows:
{
        max = $1
        for (i = 2; i <= NF; i++)
                if ($i > max) max = $i
        print max
}

As you can see, the for loop is just a short-hand way of writing a certain kind of
while loop. Another form of the for loop is described in Chapter 6.
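
To see the equivalence concretely, the following sketch sums the fields of each line twice, once with a for loop and once with the corresponding while loop. The input lines and shell variable names are invented, and the command name awk is assumed.

```shell
# Hypothetical input: a few numbers on each line.
data='1 2 3
10 20'

# Version using a for loop.
with_for=$(printf '%s\n' "$data" | awk '{
    sum = 0
    for (i = 1; i <= NF; i++) sum = sum + $i
    print sum
}')

# The same computation written as the equivalent while loop.
with_while=$(printf '%s\n' "$data" | awk '{
    sum = 0
    i = 1
    while (i <= NF) {
        sum = sum + $i
        i++
    }
    print sum
}')

printf '%s\n' "$with_for"
```

Both versions print the same totals, one per input line.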

4.6 The next Statement
The next statement tells nawk to skip immediately to the next record in the data
file. In the following example, a next statement is added to the baseball score
program from Section 4.2.
{
        if (NF < 4) {
                printf "Not enough fields: %s\n", $0
                next
        }
        if ($2 > $4) print "Home Win"
        else print "Home loss"
}

If a particular record has fewer than four fields, this program will print a warning
message and skip to processing the next record. This bypasses the rest of the
instructions in the rule. It also bypasses any other rules that might normally be
applied to this record. As this example shows, next is often used when a program
finds a record that does not have the format you expect.

You can also use next to skip to the next record if you do not want the record
processed by any of the remaining rules. For example:
$1 ~ /Orioles/  { count++; next }
$3 ~ /Orioles/  { count++ }

This program prevents the record from being counted twice if it happens to have
Orioles in both the first and third fields. You could also write this program as
follows:
($1 ~ /Orioles/) || ($3 ~ /Orioles/) { count++ }
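
The effect of next on the count can be sketched with three invented records, the last of which names the Orioles in both team fields. The data, the END rule that prints the count, and the shell variable names are assumptions added for illustration; the command name awk is assumed.

```shell
# Hypothetical tab-separated scores; the third record matches both rules.
data=$(printf 'Orioles\t3\tRed Sox\t2\nTigers\t1\tOrioles\t4\nOrioles\t5\tOrioles\t0\n')

# With next, a record matched by the first rule never reaches the second.
with_next=$(printf '%s\n' "$data" | awk -F"\t" '
$1 ~ /Orioles/ { count++; next }
$3 ~ /Orioles/ { count++ }
END { print count }')

# Without next, the third record is counted by both rules.
without_next=$(printf '%s\n' "$data" | awk -F"\t" '
$1 ~ /Orioles/ { count++ }
$3 ~ /Orioles/ { count++ }
END { print count }')

printf 'with next: %s, without next: %s\n' "$with_next" "$without_next"
```

With next the three records are counted once each; without it the total is one too high.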

Using the next instruction inside a BEGIN rule tells nawk to start normal
processing (by reading the first record of the first file). In other words, the next
instruction indicates that you have finished the action associated with the BEGIN
pattern.


4.7 The exit Statement
The exit statement makes a nawk program behave as if it has just reached the end
of data input. No further input is read. If there is an END action, it is executed
before the program terminates. As with next, exit is often used when input data
is found to be in error.
If exit appears inside the END action, it terminates the program immediately.
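
A minimal sketch of exit used for error handling follows. The input values, the validity test, and the messages are invented for illustration; the command name awk is assumed.

```shell
# Hypothetical input: one number per line, with a bad record in the middle.
data='4
8
oops
15'

out=$(printf '%s\n' "$data" | awk '
$1 !~ /^[0-9]+$/ { print "bad input: " $0; exit }   # stop reading input here
                 { sum = sum + $1 }
END              { print "sum so far: " sum }')      # still runs after exit
printf '%s\n' "$out"
```

The record 15 is never read, but the END action still reports the total accumulated before the bad record.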


String Manipulation

The preceding chapters have used quoted strings extensively. This chapter discusses
strings in more detail and shows the various operations that manipulate strings.

5.1 String Variables
In Chapter 2, you learned how to use numeric variables: variables that contained
numbers. Variables can also contain strings. For example:
a = "string"

This statement assigns a string to a variable a. As an example of how this can be
used, here is a simple program that checks a text file for duplicate lines (places where
two adjacent lines are identical):
{
        if ($0 == lastline) printf "%d: %s\n", FNR, $0
        lastline = $0
}

The variable lastline represents the contents of the previous line in the file. In
the action of the program, the current record $0 is compared to the previous record
(stored in lastline). If the two are equal, the printf action prints the line
number FNR and the contents of the line. At the end of the action, lastline is
assigned the contents of the current line (so that it can be compared to the next line).
You might wonder what lastline contains when the program first begins. After
all, nothing is assigned to lastline until the first line has been read. All string
variables begin with a null string value. A null string is a string, but it
contains no characters. It is written "". When used in an arithmetic expression, a
null string has the value 0.
As another example of a program that uses string variables, here is a program that
writes out the last line of a file:
{ line = $0 }
END { print line }

The value of each input line is assigned to the variable line. At the end of the file,
line contains the contents of the last line in the file. Therefore, the END action
prints out the contents of that line.
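
The duplicate-line program from this section can be tried on a few invented lines. The sample text and the shell variable out are hypothetical, and the command name awk is assumed.

```shell
# Hypothetical text with two pairs of adjacent duplicate lines.
data='alpha
beta
beta
gamma
gamma'

out=$(printf '%s\n' "$data" | awk '{
    if ($0 == lastline) printf "%d: %s\n", FNR, $0
    lastline = $0       # remember this line for the next comparison
}')
printf '%s\n' "$out"
```

Each reported line number is that of the second line in a duplicated pair.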

5.1.1 Built-In String Variables
In Chapter 3, you learned about the built-in numeric variables NF, NR, and FNR. The
nawk language also provides the built-in string variables shown in Table 5-1.

Table 5-1: Built-In String Variables
Variable

Description

FILENAME

Contains the name of the current input file. For example, when you apply
programs to the hobbies file, the value of FILENAME is hobbies (if
that is the file you are using). If the input is coming from the nawk
standard input, the value of FILENAME is the string "-".

FS

The field separator string. Specifies the character that is used to separate
fields in the current file. The default value for FS is " " (a single blank),
which as a special case matches both blank and tab. However, if the
command line contains a -F option specifying a different field separator,
FS is a string containing the given separator character. A program can
also assign values to FS to indicate new field separator characters. For
example, you could create a data file whose first line gives the character
that is to be used to separate fields in the records in the rest of the file. A
nawk program could then contain the following rule:
FNR == 1 { FS = $0 }

This says that the field separator string FS is to be assigned the contents of
the first record in the current data file. The character in this line will then
be used as the field separator for the rest of the file (unless the program
changes the value of FS again).
Any FS value of more than one character is used as a regular expression.
See the INPUT section of the nawk(1) reference page for details.
RS

The input record separator string. Just as FS specifies the string that is
used to separate fields within records, RS specifies the string that is used to
separate one record from another. By default, RS contains a new-line
character, which means that input records are separated by new-line
characters. However, a different character may be assigned to RS. For
example, the following statement says that input records are separated by
semicolons (;):
RS = ";"

This would let you have several records on one line, or a single record that
extends over several lines.
To separate records by empty lines, specify the following:
RS = ""

OFS

The output field separator string. When the print action is used to
print several values, as in { print A, B, C }, the output field
separator string is printed between each two of the values. By default,
OFS contains a single blank character. However, if you make the
assignment OFS = " : ", the output values will be separated by space-colon-space.

ORS

The output record separator string. When the print action is used, the
output record separator is printed at the end of each record. By default,
ORS is the new-line character.

OFMT

The default output format for numbers when they are printed by
print. This is a format string like the one used by printf. By
default, it is %.6g, indicating that numbers are to be printed with a
maximum of six significant digits. By changing OFMT, you
can display more or less precision.
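
The following sketch shows several of these variables working together. The input string, the chosen separator values, and the shell variable names are invented for illustration; the command name awk is assumed.

```shell
# Hypothetical input: records end with semicolons, fields are split on colons.
# There is deliberately no trailing new-line after the last semicolon.
out=$(printf 'a:b:c;d:e:f;' | awk '
BEGIN { FS = ":"; RS = ";"; OFS = " - " }
      { print $1, $3 }')
printf '%s\n' "$out"

# OFMT controls how print formats a number that is not an integer.
fmt=$(awk 'BEGIN { OFMT = "%.2f"; x = 3.14159; print x }')
printf '%s\n' "$fmt"
```

The separators are assigned in a BEGIN action so that they are in effect before the first record is read.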


5.1.2 String vs. Numeric Variables
Because string variables start out with the null string value while numeric variables
start out as 0, the question arises: how can nawk differentiate between string and
numeric variables, especially when execution is starting and a variable has not been
used yet? The answer is that a variable is assumed to contain a string unless you use
it as a number. For example, if you have a program that consists of
{ print X }

with no value assigned to X, the variable is assumed to be a string. Thus, the output
will be a blank line for each line of input; if X had been taken as a number, the
output would be zero for each line of input.

In an action like X = $1, the variable X will be taken as a number if the form of $1
looks like a number; otherwise, it will be taken as a string. Consider the record in
the following example:
3 ...

Here, the first field looks like a number, so X will normally be taken to be a numeric
variable. On the other hand, consider this example:
7ABC ...

The first field cannot be a number (even though it starts with a digit), so X will be
taken to be a string variable.
There are times when you want a value to be treated as a string, even though it looks
like a number. For example, suppose a file contains the string 1e1. In some
contexts, this could be a number (with an exponential part); in other contexts, you
might want to interpret this as a string. To make sure that a value is taken as a
string, even when it might look numeric, concatenate it with an empty string, by
placing a pair of quotation marks ("") after it. For example:

X = $2 ""
This makes sure that the value in $2 is interpreted as a string, even if it looks like a
number. Therefore, X will be a string variable.
Similarly, if you want to make sure that a value is taken to be a number, just add
zero to it. For example:

X = $3 + 0
In this case, $3 will be taken to be a number because it is involved in an arithmetic
operation. What happens if $3 is not a valid number? If $3 starts with something
that looks like a number, as in 7ABC, the numeric value of the string is the number.
Thus, the numeric value of 7ABC is 7. If the field does not start with anything that
looks like a number, the numeric value of the string is zero. Thus the numeric value
of ABC is 0.
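
These conversions can be checked with a short sketch. The input record and the variable names n, z, s, and t are invented for illustration; the command name awk is assumed.

```shell
# Hypothetical record: field 1 starts with a digit, field 2 looks like 1e1.
out=$(printf '7ABC 1e1\n' | awk '{
    n = $1 + 0      # leading digits give the numeric value 7
    z = "ABC" + 0   # no leading number, so the value is 0
    s = $2 ""       # concatenation with "" forces a string
    t = $2 + 0      # adding zero forces a number (1e1 is 10)
    print n, z, s, t
}')
printf '%s\n' "$out"
```

The same field, $2, prints as 1e1 when forced to be a string and as 10 when forced to be a number.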

5.2 String Concatenation
When a line in a program contains two or more strings that are separated only by
blank characters, the strings are concatenated (joined) into one long string. The
following expression is an example of string concatenation:
$2


The following action prints the contents of the first three fields, joined together into
one string:
{ print $1 $2 $3 }

Suppose your input line is:
A B C

Then the output will be as follows:
ABC

Consider the following example as applied to the hobbies file:
$1 ~ /John/ { print "$" $4 }

This example's output looks like this:
$100.00
$30.00

The dollar sign ($) is concatenated with the contents of the fourth field in all the
appropriate records.
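
Concatenation can also build a value that is stored in a variable before it is printed. In the sketch below, the input record and the variable label are invented for illustration; the command name awk is assumed.

```shell
# Hypothetical hobbies record: name, hobby, hours, money.
out=$(printf 'John jogging 5 30.00\n' | awk '$1 ~ /John/ {
    label = $1 ":" $2       # three strings joined into one
    print label, "$" $4     # the comma inserts OFS; the blank concatenates
}')
printf '%s\n' "$out"
```

Note the difference between the comma, which separates print arguments with the output field separator, and plain juxtaposition, which joins strings with nothing between them.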

5.3 String Manipulation Functions
Chapter 3 introduced numeric functions like sin and sqrt. The nawk language
also provides the following functions that perform string operations:
length

Returns an integer that is the length of the current record (the number of
characters in the record, without the new-line on the end). For example, the
following program calculates the total number of characters in a file (except
for new-line characters):
{ sum = sum + length }
END { print sum }

length(s)

Returns an integer that is the length of the string s. For example, the
following program prints out the length of the first field in each record of the
data file:
{ print length($1) }

The function call length($0) is equivalent to length.
gsub(regexp,replacement)

Puts the replacement string replacement in place of every string matching the
regular expression regexp in the current record. For example:
{
        gsub(/John/,"Jonathan")
        print
}

This program checks every record in the data file for the regular expression
John. Every matching string is replaced with Jonathan and printed out.
As a result, the output of the program is exactly like the input except that
every occurrence of John has been changed to Jonathan. This form of
the gsub function returns an integer that tells how many substitutions were
made in the current record. This result will be zero if the record has no
strings that match regexp.

sub(regexp,replacement)
Works like gsub, except that it only replaces the first occurrence of a string
matching regexp in the current record.
gsub(regexp,replacement,string_var)

Puts the replacement string replacement in place of every string matching the
regular expression regexp in the string string_var. For example:
{
        gsub(/John/, "Jonathan", $1)
        print
}

This program is similar to the previous program, but the replacement is only
made in the first field of each record. This form of the gsub function
returns an integer that tells how many substitutions were made in string_var.
sub(regexp,replacement,string_var)
Works like gsub, except that it replaces only the first occurrence of a string
matching regexp in the string string_var.
index(string,substring)
Searches the given string for the appearance of the given substring. If the
substring cannot be found, index returns zero; otherwise, it returns the
number (origin 1) of the character in string where substring begins. For
example:
index("abcd", "cd")

This expression returns the integer 3 because cd is found beginning at the third
character of abcd.
match(string,regexp)

Determines if string contains a substring that matches the regular expression
(pattern) regexp. If so, match returns an index giving the position of the
matching substring within string; if not, it returns zero. This function also
sets a variable named RSTART to the index where the matching string starts,
and sets a variable named RLENGTH to the length of the matching string.
substr(string,pos)

Returns the last part of string, beginning at a particular character position.
The argument pos is an integer, giving the number of a character.
Numbering begins at 1. For example:
substr("abcd",3)

The value of this expression is the string cd.
substr(string,pos,length)
Returns the part of string that begins at the character position given by pos
and has the length given by length. For example:
substr("abcdefg",3,2)

The value of this expression is cd (a string of length 2 beginning at position
3).

sprintf(format,value1,value2, ...)
Returns the string value that would be printed by the following printf
action:
printf(format,value1,value2, ...)

For example,
str = sprintf("%d %d!!!\n",2,3)

assigns the string
"2 3!!!\n"

to the string variable str.
tolower(string)
Returns the value of string, but with all the letters in lowercase. (This
function is not found in all versions of awk.)

toupper(string)
Returns the value of string, but with all the letters in uppercase. (This
function is not found in all versions of awk.)
ord(string)
Converts the first character of string into a number. This number gives the
decimal value of the character in the ASCII character set. (This function is
not found in all versions of awk.)
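
Several of these functions can be exercised together in one short sketch. The input line and the strings passed to each function are invented for illustration; only functions described above that are widely available are used, and the command name awk is assumed.

```shell
# Hypothetical input line containing the name John twice.
out=$(printf 'John met John\n' | awk '{
    n = gsub(/John/, "Jonathan")        # n is the number of replacements made
    print n, $0
    print index("abcd", "cd"), substr("abcdefg", 3, 2), length("abcd")
    if (match("foobar", /o+/))          # sets RSTART and RLENGTH on success
        print RSTART, RLENGTH
    print sprintf("%d %d!!!", 2, 3)
}')
printf '%s\n' "$out"
```

The match call finds the substring oo, which starts at position 2 and has length 2.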


Arrays

In most programming languages, an array is an ordered list of values, similar to a
table of information. Arrays in nawk are more flexible than arrays in most other
languages, but it is helpful to begin by discussing the traditional concept of an array.

6.1 Arrays with Integer Subscripts
The simplest sort of array is a list of values (either numbers or strings). The values
in the list are called the elements of the array.
Elements in an array are most commonly referred to by number. For example, the
first element in the array could be number 1, the second could be number 2, and so
on. These numbers are called subscripts of the array elements.
A nawk array has a name, similar to a variable name. To refer to an element of an
array, you give the name of the array followed by brackets containing the element's
subscript. For example:
arr[3]

This statement refers to element 3 in an array named arr.
A statement like the following creates an array named arr whose elements are all
the fields of the current record:
for (i=1; i<=NF; i++)
        arr[i] = $i

The following program stores the entire contents of the input file in an array called
lines:
{ lines[NR] = $0 }

Remember that the variable NR is incremented by 1 for each line that is read in, so
the elements in the lines array will be the lines of the input file, in order.
The following program reads the contents of a data file and stores the input in
lines:
      { lines[NR] = $0 }
END   { for (i=NR; i>0; i--) print lines[i] }

When all the lines have been read in, the END action prints out the lines in reverse
order. The program therefore reads lines of text and then prints them in reverse
order.
As another example of the simple use of arrays, suppose you have a file that contains
12 columns of numbers and you want to add up the numbers in each column. You
could do this with the following program:
      { for (i=1; i<=12; i++) sum[i] = sum[i] + $i }
END   { for (i=1; i<=12; i++) print sum[i] }

Each element in the array called sum holds a running total of the sum of numbers in
the corresponding column.
Notice that the previous examples make extensive use of the for statement. This is
true of many programs that use arrays.
Also notice that you do not need a special statement to create (declare) an array. If a
statement in a program contains a name followed by a value in brackets, the name is
assumed to refer to an array, and the array is created automatically. A name must not
be used as both a variable and an array in the same nawk program.
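
The line-reversal program from this section can be run on a few invented lines as a check. The input text and the shell variable out are hypothetical; the command name awk is assumed.

```shell
# Hypothetical three-line input.
out=$(printf 'one\ntwo\nthree\n' | awk '
      { lines[NR] = $0 }                            # store each line by number
END   { for (i = NR; i > 0; i--) print lines[i] }   # print from last to first
')
printf '%s\n' "$out"
```

The array lines is never declared; it comes into existence the first time an element is assigned.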

6.2 Generalized Arrays
Most programming languages let you create arrays that use numbers as subscripts;
nawk also lets you create arrays that have string values as subscripts. For example,
here is a program that calculates how much each person spends on all his or her
hobbies.
{ money[$1] += $4 }

The array in this program is named money; the subscripts are the names of the
people in the hobbies file. The elements of the array are therefore as follows:
money["Jim"]
money["Linda"]
money["John"]

(Note that the following statements are equivalent:
money[$1] += $4
money[$1] = money[$1] + $4

This notation is explained in Section 8.3.)
Apply this program to the following input record:
Jim

reading

100.00

The action becomes
money["Jim"] += 100.00

As with all numeric variables, money["Jim"] starts out with a value of zero. At
the end of the program, the array element will contain the amount of money that Jim
spends on all his hobbies.
To print the contents of the money array, you can use a new form of the for
statement:
for (s in money) print s, money[s]

This form of the for statement executes the print action once for every value that
is used as a subscript for the money array. In each loop, the variable s has one of
the subscript values. Therefore, the first time through the loop, s might have the
value Jim, the next time Linda, and so on. The order is undefined. Therefore,
the complete program prints out the amount that each person spends on his or her
hobbies:
      { money[$1] += $4 }
END   { for (s in money) print s, money[s] }

Run this program to see how it works. After you have done so, replace the print
action with printf to produce more understandable output.
Generalized arrays have a wide variety of applications. For example, the following
program produces a list of all the words used in an input text file:

      { for (i=1; i<=NF; i++)
            wordlist[$i] = 1 }
END   { for (x in wordlist)
            print x }

Assigning 1 to each element of wordlist is just a dummy action; the important
thing is that the program creates an element of wordlist whose subscript value is
one of the words in the input text file. The for loop in the END action then prints
out all the words that were used as subscript values; this list is the set of all words
used in the file.
As an exercise, modify the preceding program so that it keeps a count of how often
each word is used in the input file. At the end, the program should print out each
word that appears in the file and how often the word was used.
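
One possible approach to this exercise is sketched below; it is only one of several ways to write the program. The array name count, the sample text, and the use of the sort command to put the undefined for-in order into a predictable sequence are all assumptions made for illustration; the command name awk is assumed.

```shell
# Hypothetical two-line input text.
out=$(printf 'the cat saw the dog\nthe dog ran\n' | awk '
      { for (i = 1; i <= NF; i++) count[$i]++ }     # one counter per word
END   { for (w in count) print w, count[w] }
' | sort)
printf '%s\n' "$out"
```

Because an array element starts out as zero when used as a number, count[$i]++ works without any initialization.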

6.2.1 String Subscripts vs. Numeric Subscripts
This chapter began by showing arrays with numeric subscripts because those types of
arrays are most familiar to programmers. However, all nawk array subscripts are
converted to strings. For example, the subscript in a[1] is converted to a string,
giving a["1"]. In a[01], the numeric subscript is first converted to its simplest
form, a[1], which is then converted to the string a["1"] as before.
Floating point subscripts are converted to the simplest equivalent integer, then
converted to the corresponding string. Thus a[1.0] is converted to a[1] and then
converted to a["1"]. Therefore, the following forms are all equivalent:
a[1]
a[1.0]
a["1"]

Note that the array element a["01"] is not equivalent to the ones in the preceding
examples because "1" is not the same string as "01".
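
The subscript conversions can be verified with a short sketch. The variable names p, q, and r are invented, and the membership test with the in operator is assumed to be supported by the awk implementation in use (it is in the implementations commonly found today).

```shell
out=$(awk 'BEGIN {
    a[1] = "x"
    p = (a[1.0] == "x")                 # 1.0 refers to the same element as 1
    q = (a["1"] == "x")                 # so does the string "1"
    r = (("01" in a) ? "yes" : "no")    # but the string "01" is a different subscript
    print p, q, r
}')
printf '%s\n' "$out"
```

A true comparison yields 1, so the first two values confirm that all three forms name the same element.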

6.3 Deleting Array Elements
Because array elements are stored in the computer's memory, you can decrease
memory requirements by deleting elements when you are finished using them. To do
this, use the following statement:
delete arrayname[subscript]

For example:
delete money["Jim"]

As an extension of standard awk, the following statement deletes the entire array:
delete money

This statement is equivalent to the following:
for (ind in money)
delete money[ind]
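
A brief sketch of deleting a single element follows. The array contents and the variable names n and state are invented for illustration; the in operator used for the membership test is assumed to be available, and the command name awk is assumed.

```shell
out=$(awk 'BEGIN {
    money["Jim"] = 100
    money["Linda"] = 50
    delete money["Jim"]                 # remove one element only
    for (name in money) n++             # count the elements that remain
    state = (("Jim" in money) ? "present" : "deleted")
    print n, state
}')
printf '%s\n' "$out"
```

After the delete statement, the for loop visits only the remaining subscript.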


6.4 Multidimensional Arrays
The nawk language lets you define arrays with more than one subscript. Subscripts
are separated by commas and enclosed in brackets, as in the following example:
a[1,2] = 3
b["cat", "dog", "bird"] = "horse"

The following example creates a multidimensional array that records different animal
names:
name["chicken", "female"] = "hen"
name["chicken", "male"] = "rooster"
name["chicken", "young"] = "chick"
name["cattle", "female"] = "cow"
name["cattle", "male"] = "bull"
name["cattle", "young"] = "calf"

As you can see, it is simple to create and manipulate a database that is just a
multidimensional nawk array.
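
A small lookup against such an array might look like the sketch below. The membership test with a parenthesized subscript list is standard awk syntax rather than something this section describes, and the wrapper name animal_demo is hypothetical; awk is assumed to stand in for nawk.

```shell
# Hypothetical wrapper around a two-subscript lookup.
animal_demo() {
    awk 'BEGIN {
        name["chicken", "male"] = "rooster"
        name["cattle", "young"] = "calf"
        print name["chicken", "male"]
        # A parenthesized subscript list tests for membership
        if (("cattle", "young") in name)
            print "cattle/young is defined"
        if (!(("cattle", "old") in name))
            print "cattle/old is not defined"
    }'
}
animal_demo
```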


User-Defined Functions

Previous chapters discuss numeric functions like sin and sqrt, and string functions
like gsub and length. This chapter shows how nawk lets you create your own
functions to perform similar kinds of operations.

7.1 Defining Functions
In a nawk program, a function definition looks like this:
function name(argument-list) {
    statements
}
The argument-list is a list of one or more names, separated by commas, that represent
argument values passed to the function. When an argument name is used in the
statements of a function, it is replaced by a copy of the corresponding argument
value.
For example, here is a simple function that takes a single numeric argument N and
returns a random integer between 1 and N (inclusive):
function random(N) {
    return (int(N * rand() + 1))
}

This function uses two built-in functions discussed in Chapter 3: rand (which
returns a random floating point number between 0 and 1) and int (which returns the
integer part of a floating point number). The expression N * rand() + 1 yields
a random floating point number between 1 and N+1 (not including N+1 itself).
Applying the int function to this floating point number obtains an integer between 1
and N. The return statement returns this value as the result of the function
random.
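
One way to convince yourself that the result always lies between 1 and N is to call the function many times and count any value that falls outside the range. The sketch below does this for N = 6. The call to srand with a fixed seed and the wrapper name random_check are assumptions made for repeatability, and awk stands in for nawk.

```shell
# Hypothetical range check for the random function.
random_check() {
    awk 'function random(N) { return int(N * rand() + 1) }
    BEGIN {
        srand(1)                    # fixed seed, chosen only for repeatability
        bad = 0
        for (i = 1; i <= 1000; i++) {
            r = random(6)
            if (r < 1 || r > 6 || r != int(r))
                bad++               # count anything that is not an integer 1..6
        }
        print bad " rolls out of range"
    }'
}
random_check
```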

Once you define the random function, you can use it anywhere in your program that
you would use other functions.
For example, if you have a file that contains people's names in its first field, and each
of these people is going to roll two six-sided dice, you could simulate this situation
with the following program:
function random(N) {
    return (int(N * rand() + 1))
}
{
    score = random(6) + random(6)
    printf "%s rolls %d\n", $1, score
}

This program consists of a definition for the random function and a rule to be
applied to every record in the file. The score variable contains the sum of two
simulated six-sided die rolls. This value is printed, along with the name of the
person who rolled the dice.

You can test this program on the hobbies file. Remember, however, that the file
contains several lines for most people, so the output will show more than one roll per
person.
As another example of the random function, here is the program used to generate
the random baseball scores in the baseball file. The input data file contains a
single line giving the names of baseball teams (separated by tabs).
BEGIN {FS = "\t"}    # Tab is field separator
function random(N) {
    # Produce random number between 1 and N
    return ( int(N * rand() + 1) )
}
{
    # Read in names of baseball teams
    for (i = 1; i <= NF; i++)
        team[i] = $i
    # Generate 100 random scores
    for (i = 1; i <= 100; i++) {
        # Choose teams
        hometeam = team[random(NF)]
        visteam = team[random(NF)]

        # Make sure teams are different
        while (hometeam == visteam)
            visteam = team[random(NF)]

        # Generate scores
        homescore = random(13)
        visscore = random(13)

        # Make sure scores are different
        while (homescore == visscore)
            visscore = random(13)

        # Print out score
        printf "%s\t%d\t",hometeam,homescore
        printf "%s\t%d\n",visteam,visscore
    }
}

The comments in the program should make it easy to understand what is happening
in each section. The program chooses two different teams at random from the list in
an input file. It then assigns each team a random score from 1 to 13 (a range typical
of baseball scores) and prints the results with two printf statements. (We could
also have used a single printf statement.)
As another example of the random function, here is the program used to generate
the random lists of numbers in the numbers file:
function random(N) {
    # Produce random integer between 1 and N
    return ( int(N * rand() + 1) )
}
BEGIN {
    for (i = 1; i <= 30; i++) {
        for (j = random(10); j > 0; j--)
            printf "%d ",random(100)
        printf "\n"
    }
    exit
}

This program has only a BEGIN rule. This rule prints out 30 lines, each of which
contains a random number of integers in the range 1 to 100. Note that random is
used both to choose the integers and to decide how many of these integers will
appear on each line.

7.2 Recursion
A function can call itself; this process is called recursion. One example of a
recursive function is the factorial function, which is called with the following
form:
factorial(N)

This factorial function produces the number that is the product of all positive integers
less than or equal to N. For example:
factorial(4)

The result of this expression is 4x3x2x1, or 24. The factorial of any N less than 1 is
defined as 1.
The following function definition defines the factorial function recursively:
function factorial(N) {
    if (N <= 1)
        return 1
    else
        return N * factorial(N-1)
}

If N is less than or equal to 1, the factorial is 1. Otherwise, the factorial of N is N
times the factorial of N-1. Thus the factorial of 4 (4x3x2x1) is 4 times the factorial
of 3 (3x2x1). The factorial function calls itself recursively to figure out the
appropriate result.

By the way, the factorial function demonstrates that a function can have more
than one return statement. When a return statement is executed, the function
immediately stops executing and returns the given value as the function result.
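
To watch the recursion at work, the function can be called from a BEGIN rule for a few small values. This is a minimal sketch; the wrapper name factorial_demo is hypothetical and awk stands in for nawk. Note that there is no space between the function name and the opening parenthesis in the calls.

```shell
# Hypothetical driver that prints n and factorial(n) for n = 0..5.
factorial_demo() {
    awk 'function factorial(N) {
        if (N <= 1)
            return 1                      # base case stops the recursion
        else
            return N * factorial(N-1)     # recursive case
    }
    BEGIN {
        for (n = 0; n <= 5; n++)
            print n, factorial(n)
    }'
}
factorial_demo
```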

7.3 Call By Value
When a program calls a user-defined function, nawk makes copies of the argument
values passed to the function and the function does all its work using those copies.
For example, suppose a program is using a variable named X and calls a user-defined
function F:
F(X)

The function F is given a copy of the current value of X. Because F only has a copy,
the function cannot affect the current value of X. For example, consider this
program:
function exchange(A,B) {
    temp = A
    A = B
    B = temp
}
{
    exchange($1,$2)
    print $0
}


In this program, it appears that the exchange function swaps the values of
arguments A and B. The value of A is temporarily stored in temp; the value of B is
assigned to A and the saved value of A is assigned to B. Now, when the main rule of
the program issues the function call exchange($1, $2), does nawk swap the
values of the first two fields of the current record? No, the function is only working
with copies of the two fields; the function does not change the fields themselves.
Note that the definition of exchange does not have a return statement. It is not
necessary for functions to return values. If a function does not have a return
statement, the function ends when the last statement is executed.
If a function does not use return to return a result, do not use that function as if it
did return a result. A function with no return statement yields a meaningless
(undefined) result value.
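
The effect of call by value can be seen by running the exchange function on a one-line input and comparing it with a swap done directly in the rule. In this sketch the sample input, the output labels, and the wrapper name exchange_demo are illustrative; declaring temp as an extra parameter to keep it local is a common awk idiom, not something the original program does. awk stands in for nawk.

```shell
# Hypothetical comparison of a by-value swap with an inline swap.
exchange_demo() {
    echo "one two" | awk '
    function exchange(A, B,    temp) {    # temp is an extra parameter used as a local
        temp = A; A = B; B = temp         # swaps only the copies
    }
    {
        exchange($1, $2)
        print "after exchange: " $0
        temp = $1; $1 = $2; $2 = temp     # assigning to the fields themselves works
        print "after inline swap: " $0
    }'
}
exchange_demo
```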

7.4 Passing Arrays to Functions
When an array is passed as an argument to a function, it is passed by reference.
This means that the function works with the actual array, not with a copy. Anything
that the function does to the array has an effect on the original array.
For example, the split function is a built-in function that takes an array as an
argument. It has the following form:
split(string, array)

The split function breaks up the string into fields, and assigns each of the fields to
an element of the array. The first field is assigned to array[1], the next to
array[2], and so on. Fields are assumed to be separated with the field separator
string FS. If you want to use a different field separator string, you can use the
following format:
split(string, array, fsstring)

The value of fsstring is the field separator string you want to use instead of FS. The
result of split is the number of fields that string contained.
Note that split actually changes the elements of array. When an array is passed
to a function, the function may change the array elements.
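
User-defined functions can change a caller's array in the same way that split does. The following sketch fills an array with split and then passes it to a function that doubles every element. The helper name double_all, the wrapper name array_demo, and the sample string are hypothetical; awk stands in for nawk.

```shell
# Hypothetical example of an array passed by reference.
array_demo() {
    awk '
    function double_all(arr, n,    i) {   # i is an extra parameter used as a local
        for (i = 1; i <= n; i++)
            arr[i] = arr[i] * 2           # changes the array of the caller
    }
    BEGIN {
        n = split("3:5:8", nums, ":")     # third argument replaces FS
        double_all(nums, n)
        print n, nums[1], nums[2], nums[3]
    }'
}
array_demo
```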


Enhancing Your nawk Programs

This chapter discusses additional ways you can tailor your nawk programs to serve
your needs.

8.1 The getline Function
The getline function reads input from the current data file or from a different file.
The function has several different forms, discussed in the sections that follow.

8.1.1 Reading from the Current Input
In its simplest form, getline is called as follows:
getline

This reads a new record from the current data file. The function automatically
changes the value of $0 and all the other field values. It also changes variables like
NF, NR, and FNR. In other words, using getline in this way is exactly like what
happens when nawk reads in a new record in the normal way. For example:
/XYZ/ { print ; getline ; print }

First, this rule prints any record that contains the string XYZ. Next, the getline
function reads the next record, and the final print prints that new record.
Therefore, the rule prints every record that contains XYZ and also the record that
follows (regardless of what the next record contains).
When getline reads a new record, the previous record is discarded; subsequent
rules are applied to the new record, if appropriate. For example:
/XYZ/ { print ; getline ; print }
/ABC/ { ... some action ... }

The ABC rule in this program will be applied to the new record (if appropriate); it
will not be applied to the XYZ record because that record is discarded when the new
record is read.
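
The following sketch makes the discarded record visible. The first input line contains both XYZ and ABC, yet the ABC rule reports only the record that getline read afterward. The input text, the message printed by the ABC rule, and the wrapper name getline_demo are made up for illustration; awk stands in for nawk.

```shell
# Hypothetical input showing which record the second rule sees.
getline_demo() {
    printf 'XYZ ABC first\nABC second\nother\n' | awk '
    /XYZ/ { print ; getline ; print }
    /ABC/ { print "ABC rule saw: " $0 }'
}
getline_demo
```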
If a call to getline appears in the BEGIN action, nawk immediately starts reading
the first data file specified on the command line.

8.1.2 Reading a Line into a String Variable
The getline function can also be called in the following form:
getline variable

This form reads a new line from the current data file but assigns the contents of the
line to the named string variable. The variables NR and FNR are changed to reflect
that another record has been read from the input data file; however, the contents of
$0 and NF are unchanged. Therefore, the following example reads a line into the
variable X and compares this new line to the old line that is still stored in $0:
getline X
if (X == $0)
    print "Duplicate line"

8.1.3 Reading from a New File
Another form of getline reads a line from a different file instead of the current
data file:
getline var <"filename"

This form of the function reads a line from the given file and stores the contents of
the line in the string variable var. For example, here is a simple program that
compares the current data file to another file named testfile and prints out a
message if the two are not identical:
{
    getline X <"testfile"
    if ($0 != X)
        print "Not identical!"
}

This rule is executed for every line in the data file. Every time the action is
executed, the getline function reads a new line from testfile and compares it
with the current line from the data file. If the two files differ at any point, the
message "Not identical!" is printed.
A program may also call getline with the form
getline <"filename"
In this case, a line is read from the given file and assigned to $0. The value of NF is
changed to reflect the new record in $0, but the variables NR and FNR are not
changed because the record was not read from the current data file.
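
A slightly more defensive version of the file comparison is sketched below. It relies on two things this section does not cover, so treat them as assumptions about the local awk: the value returned by getline (1 for a line read, 0 at end of file, -1 on error in standard awk) and the -v command-line option for passing in the file name. The temporary files and the wrapper name compare_demo are hypothetical.

```shell
# Hypothetical comparison of two small temporary files.
compare_demo() {
    dir=$(mktemp -d)
    printf 'a\nb\nc\n' > "$dir/data"
    printf 'a\nB\nc\n' > "$dir/testfile"
    awk -v other="$dir/testfile" '{
        if ((getline X < other) <= 0)     # 0 is end of file, -1 is an error
            X = ""
        if ($0 != X)
            print "Not identical at line " NR
    }' "$dir/data"
    rm -r "$dir"
}
compare_demo
```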

8.1.4 Reading from Other Commands
The getline function can also be used to read data produced by another command
or program:
"command" | getline var

This form of the function executes the given command and gathers the command's
output. The first line of output is piped into (assigned to) the string variable var.
For example, the following program executes the date command and assigns the
output of the command to the string variable now:
"date" | getline now

The following statements read the current date into the variable now and check to see
if the date string contains Apr:
"date" | getline now
if (now ~ /.*Apr.*/)
    print "April Shower Time!"

You can also pipe command output into $0. This is done with a statement of the
following form:
"command" | getline

This form of getline changes the value of $0 and NF but does not change NR or
FNR.
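
Both pipe forms can be tried with a command whose output is predictable. In the sketch below, the echo command line and the wrapper name pipe_demo are made up for the example, and awk stands in for nawk. Repeated reads from the same command string continue with the next line of its output.

```shell
# Hypothetical demonstration of reading command output with getline.
pipe_demo() {
    awk 'BEGIN {
        cmd = "echo one; echo two"
        cmd | getline first      # first line of output goes into the variable
        cmd | getline            # next line of the same output goes into $0
        close(cmd)               # finished with this pipe
        print first, $0, NF
    }'
}
pipe_demo
```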

8.1.5 Redirecting Output to Files and Pipes
You can redirect the output of print and printf to a file or a pipe. Details are
given in the Output section of the nawk(1) reference page.
Only a limited number of files and pipes can be opened at one time. You can use the
close function to close files during execution. In this way, any number of files and
pipes can be used during the execution of a nawk program. You can close both
input files (used by getline) and output files (used by print and printf).
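
One visible effect of close on an input file is that the next getline from that file starts again at the first line. The sketch below illustrates this under the assumption that the local awk accepts the -v option; the temporary file and the wrapper name close_demo are hypothetical.

```shell
# Hypothetical example: closing an input file rewinds it for the next read.
close_demo() {
    f=$(mktemp)
    printf 'first\nsecond\n' > "$f"
    awk -v file="$f" 'BEGIN {
        getline a < file
        close(file)              # the next read reopens the file at the top
        getline b < file
        print a, b
    }'
    rm "$f"
}
close_demo
```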

8.2 The system Function
The previous section showed how you can execute programs and system commands
from nawk programs using the getline function. You can also execute
commands with the system function. This function has the following form:
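system("command line")

The following statement executes a cd command to change the current directory to
directory XYZ:
system("cd XYZ")

Note that the command runs in a separate shell process, so the change of directory
lasts only for that command and does not affect the nawk program itself.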
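
In standard awk, system returns the exit status of the command it ran, which a program can test. This section does not say so for nawk Version 1.0, so the sketch below is an assumption about the local awk. The wrapper name system_demo is hypothetical, and the true and false commands are used only because their exit statuses are known.

```shell
# Hypothetical check of the value returned by system.
system_demo() {
    awk 'BEGIN {
        ok  = system("true")     # a command that succeeds
        bad = system("false")    # a command that fails
        print ok, bad
    }'
}
system_demo
```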

8.3 Compound Assignments
The nawk language lets you use a shorthand notation for some common assignment
operations. For example, the following statements are equivalent:
sum = sum + value
sum += value

Note, however, that the second form is simpler to write.
The += operation is an example of a compound assignment. Table 8-1 shows all
the compound assignment operations of nawk and their equivalents:

Table 8-1: Compound Assignments

Compound Operation    Equivalent    Compound Operation    Equivalent
A += B                A = A + B     A /= B                A = A / B
A -= B                A = A - B     A %= B                A = A % B
A *= B                A = A * B     A ^= B                A = A ^ B

For example, you could use the following program on the hobbies file to calculate
how many hours a week John spends on his hobbies:
/John/ { sum += $3 }
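
The rule above only accumulates the total. To see the result, an END rule can print it. The following sketch does so on three lines taken from the hobbies file; the END rule and the wrapper name hours_demo are additions for illustration, and awk stands in for nawk.

```shell
# Hypothetical run of the compound assignment on sample hobbies lines.
hours_demo() {
    printf 'John role-playing 8 100.00\nJohn jogging 8 30.00\nLori bridge 2 0.00\n' |
    awk '/John/ { sum += $3 }     # same as sum = sum + $3
         END    { print sum }'
}
hours_demo
```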


8.4 The sortgen Program
It can be difficult to remember all of the options to the sort command. As an
example of the power of nawk, this section presents a nawk program, named
sortgen, that generates the correct options for a specification.
The sortgen program is described in detail in The AWK Programming Language.
Briefly, sortgen takes a description of the layout of the fields in a record and emits
a command line for sort that will carry out the desired sort.
Note that sortgen uses 1-origin (the first field to be sorted on is field 1), and writes
the sort command line to use sort's 0-origin field labeling. Example 8-1 shows
the definition of sortgen:

Example 8-1: sortgen Program for nawk
# sortgen - generate sort command
#   input: sequence of lines describing sort options
#   output: command line for sort

BEGIN { key = 0 }

/no |not |n't /  { print "error: cannot do negatives:", $0; ok = 1 }

# rules for global variables

{ ok = 0 }
/uniq|discard.*(iden|dupl)/   { uniq = " -u"; ok = 1 }
/separ.*tab|tab.*separ/       { sep = "t'\t'"; ok = 1 }
/separ/   { for (i = 1; i <= NF; i++)
                if (length($i) == 1)
                    sep = "t'" $i "'"
            ok = 1
          }
/key/     { key++; dokey(); ok = 1 }   # new key; must come in order

# rules for each key

/dict/                             { dict[key] = "d"; ok = 1 }
/ignore.*(space|blank)/            { blank[key] = "b"; ok = 1 }
/fold|case/                        { fold[key] = "f"; ok = 1 }
/num/                              { num[key] = "n"; ok = 1 }
/rev|descend|decreas|down|oppos/   { rev[key] = "r"; ok = 1 }
/month/                            { month[key] = "M"; ok = 1 }
/forward|ascend|increas|up|alpha/  { next }   # this is default
!ok       { print "error: cannot understand:", $0 }

END {     # print flags for each key
    cmd = "sort" uniq
    flag = dict[0] blank[0] fold[0] rev[0] num[0] month[0] sep
    if (flag) cmd = cmd " -" flag
    for (i = 1; i <= key; i++)
        if (pos[i] != "") {
            flag = pos[i] dict[i] blank[i] fold[i]
            flag = flag rev[i] num[i] month[i]
            if (flag) cmd = cmd " +" flag
            if (pos2[i]) cmd = cmd " -" pos2[i]
        }
    print cmd
}

function dokey(   i) {    # determine position of key
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9]+$/) {
            pos[key] = $i - 1    # sort uses 0-origin
            break
        }
    for (i++; i <= NF; i++)
        if ($i ~ /^[0-9]+$/) {
            pos2[key] = $i
            break
        }
    if (pos[key] == "")
        printf("error: invalid key specification: %s\n", $0)
    if (pos2[key] == "")
        pos2[key] = pos[key] + 1
}


Order of Operations

This appendix lists the order of operations for nawk, from highest precedence
(operations done first) to lowest (operations done last). You can use parentheses ()
to change this ordering.

Operators                              Description

$i  V[a]                               field, array element
V++  V--  ++V  --V                     increment, decrement
A^B                                    exponentiation
+A  -A  !A                             unary plus, unary minus, logical NOT
A*B  A/B  A%B                          multiplication, division, remainder
A+B  A-B                               addition, subtraction
A B                                    string concatenation
A<B  A>B  A<=B  A>=B  A!=B  A==B       comparison
A~B  A!~B                              regular expression matching
A in V                                 array membership
A && B                                 logical AND
A || B                                 logical OR
A ? B : C                              conditional expression
V=B  V+=B  V-=B  V*=B  V/=B  V%=B  V^=B    assignment

In this table, A, B, and C can be any expression; i is any expression yielding an
integer; and V is any variable.
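
A few expressions show how the table applies in practice. This is only a sketch: the expressions and the wrapper name precedence_demo are chosen for illustration, and awk stands in for nawk.

```shell
# Hypothetical expressions that exercise several rows of the table.
precedence_demo() {
    awk 'BEGIN {
        print 2 + 3 * 4                 # multiplication before addition
        print 2 * 3 ^ 2                 # exponentiation before multiplication
        print 1 + 2 " " 3 + 4           # addition before string concatenation
        x = 1 < 2 ? "yes" : "no"        # comparison, then conditional, then assignment
        print x
    }'
}
precedence_demo
```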

Example Files

This appendix contains copies of all the example files used in this manual.

The hobbies File
Fields in this file are separated by spaces. When creating files that will use nawk's
default value for FS, you can enter a single space or as many spaces as needed to
make the fields align neatly.
Jim      reading         15    100.00
Jim      bridge           4     10.00
Jim      role-playing     5     70.00
Linda    bridge          12     30.00
Linda    cartooning       5     75.00
Katie    jogging         14    120.00
Katie    reading         10     60.00
John     role-playing     8    100.00
John     jogging          8     30.00
Andrew   wind-surfing    20   1000.00
Lori     jogging          5     30.00
Lori     weight-lifting  12    200.00
Lori     bridge           2      0.00

The baseball File
Fields in this file are separated by tabs. Note that the fields do not line up uniformly
when you look at the file on your terminal. This irregularity occurs because exactly
one tab is used between fields; using multiple tabs to make the fields line up in neat
columns would result in nawk's seeing two adjacent tabs as the field separators
before and after an empty field. When creating the baseball file, key in the
information as in this example:
% cat > baseball
Brewers[TAB]5[TAB]Tigers[TAB]9
[Ctrl/D]

Here is the file:
Brewers 5
Brewers 2
Blue Jays
Indians 6
Yankees 7
Orioles 10
Brewers 6
Red Sox 3
Red Sox 6
Blue Jays

Tigers 9
Blue Jays
8
Red Sox
Blue Jays
Brewers 2
Indians 1
Yankees 3
Indians 12
Yankees 2
Brewers
8

6
7
7

Orioles 2
Indians 6
Orioles 6
Red Sox 7
Yankees 9
Brewers 4
Tigers 9
Tigers 10
Brewers 10
Indians 4
Blue Jays
Yankees 11
Orioles 5
Yankees 12
Orioles 1
Yankees 5
Orioles 6
Indians 12
Red Sox 3
Blue Jays
Yankees 9
Orioles 10
Red Sox 5
Yankees 13
Orioles 4
Yankees 11
Tigers 4
Red Sox 3
Yankees 1
Yankees 8
Orioles 1
Blue Jays
Indians 8
Brewers 2
Brewers 2
Orioles 7
Yankees 4
Red Sox 11
Tigers 6
Indians 11
Orioles 8
Yankees 9
Tigers 8
Indians 1
Blue Jays
Indians 12
Yankees 8
Indians 2
Brewers 6
Brewers 13
Blue Jays
Orioles 2
Orioles 1
Red Sox 5
Brewers 3
Blue Jays
Blue Jays
Tigers 7
Brewers 2
Blue Jays
Red Sox 4
Yankees 12
Brewers 4
Tigers 2
Orioles 4


7
Blue Jays
Blue Jays
9
12
Blue Jays
Blue Jays
11
Indians 10
Blue Jays
5
Blue Jays
10
Red Sox 9
Red Sox 9
Tigers 12
Brewers 5
8
Tigers 2
Red Sox 6
Blue Jays
13
Red Sox 8
Brewers 4
Indians 13
Tigers 9
Blue Jays
12
Orioles 8
9
Orioles 6
Indians 7
Orioles 2
Brewers 6
Brewers 6
Indians 9
Indians 13
Brewers 10
Indians 8
Tigers 10
12
Blue Jays
Indians 8
9
Blue Jays
9
Orioles 5
Indians 7
Indians 2
Orioles 6
Orioles 12
Brewers 13
Yankees 12
Red Sox 7
Brewers 13
Indians 7
8
Blue Jays
Red Sox 5
8
Tigers 9
Indians 5
Orioles 12
Red Sox 2
Indians 9
Tigers 7
9
Yankees 11
Blue Jays
9
Yankees 9
Tigers 13
Red Sox 6
8
Brewers 5
11
Brewers 3
Tigers 5
Red Sox 1
9
Indians 5
Orioles 5
Blue Jays
8
8
Blue Jays
Blue Jays
6

Orioles 10
Tigers 5
Brewers 9
Blue Jays
Yankees 2
Brewers 12
Indians 4
Red Sox 2
Yankees 6
Indians 8
Yankees 8
Orioles 4
Red Sox 9
Yankees 8
Indians 3
Indians 1
Red Sox 8
Brewers 7
Indians 11
Yankees 3
Orioles 9
Indians 12
Tigers 11
Brewers 7
Red Sox 13

Brewers 3
Red Sox 2
Tigers 12
11
Tigers
Blue Jays
Orioles 6
Tigers 8
Tigers 7
Brewers 11
Brewers 11
Red Sox 11
Yankees 5
Yankees 10
Tigers 13
Brewers 8
Blue Jays
Brewers 13
Orioles 6
Yankees 4
Red Sox 11
Indians 6
Red Sox 11
Orioles 12
Indians 9
Brewers 8

1
13

The numbers File
Fields in this file are separated by spaces.
74 33 66
8 87 40
68 46
53 40 5 45 50
19 54 12 55 35 70 77 5 22 100
44 21 66 43 20
58 98 44 12 2 20 12 60 55 12
2 43
10 46 1 57
46
58 7 52 83 90 43 63 69 64
17 2 46 42 14 84 7 65
83 63 73 63 15 59 71 63
35 82 24
14 23 60 35 94 95 82 82 10
48 59 33 39 99
90 88
51 50 58 1 56 86 94 19 31 26
50 36 42 41 95
40 76 88 68
7 94 5 5 49 68 56
44 69 41 45 33 72 47 60 49 35
96 21
46 52 47 26 26 45 89 34 79 65
36 28 93 63 20 17 73 96
5 56 88 79 60
55 1 1 91 12 36 67 58
42 12 57 63
55 13 35
33 11 47


Index

Special Characters
, (comma)
See comma
' (apostrophe)
See apostrophe
^ (circumflex)
See circumflex
{ } (braces)
See braces
| (vertical bar)
See vertical bar
. (period)
See period
" " (quotation marks)
See quotation marks
$ (dollar sign)
See dollar sign
$0 notation, 1-3
% (percent sign)
See percent sign
& (ampersand)
See ampersand
( ) (parentheses)
See parentheses
* (asterisk)
See asterisk
+ (plus sign)
See plus sign
; (semicolon)
See semicolon
= (equal sign)
See equal sign
? (question mark)
See question mark
[ ] (brackets)
See brackets
- (minus sign)
See minus sign
\ (backslash)
See backslash

A
action, 1-3
after processing input, 2-7
before processing input, 2-7
compound, 4-3
default, 1-5
omitting from rules, 1-5
print, 1-5
implied if no action specified, 1-3
printf, 2-3
alphabetical order, 1-4
ampersand
double, for multiple conditions, 3-5
AND operator, 3-5
apostrophe
for enclosing a nawk program, 1-6
arguments
for numeric functions, 2-10, 2-11
passing mechanisms for, 7-1, 7-3, 7-4
arithmetic operations, 2-1
functions in, 2-10
operators for, list of, 2-1t
remainder (modulus), 2-2
arrays
creating, 6-2
deleting elements from, 6-3
generalized, 6-2

arrays (cont.)
generalized (cont.)
applications for, 6-3
multidimensional, 6-4
names of, 6-2
passing mechanism to functions, 7-4
subscripts, 6-1
floating-point numbers as, 6-3
non-equivalent strings in, 6-3
treatment of, by nawk, 6-3
using strings as, 6-2
syntax of references to, 6-1
ASCII collating order, 1-4
assigning values, 2-6, 2-9
assignment operator, 2-6
asterisk
in regular expressions, 3-2t
atan2 function, 2-11t

B
backslash
preventing interpretation of metacharacters with,
3-4
printing in a string, 2-6
BEGIN pattern, 2-7
next statement in action for, 4-6
braces
in regular expressions, 3-2t
brackets
in regular expressions, 3-2t
built-in variables, 2-9, 5-1t

c
calculating with nawk, 2-1
case of letters, 3-1
changing in a string, 5-6
character
escape sequences for certain, 2-5
normal, 2-3
with special meaning to nawk, 3-2
circumflex
in regular expressions, 3-2t


close function, 8-3
comma, to separate fields, 1-5
command line, running nawk from, 1-6
comments in nawk programs, 4-1
comparing values, 1-3, 1-4
operators for, list of, 1-3t
compound assignments, 8-3
list of, 8-3t
compound statements, 4-3
concatenating strings, 5-3
conditions, 1-3
multiple, 3-5, 3-6
control structures
else statement, 4-1
exit statement, 4-7
for loop, 4-5, 6-2
if statement, 4-1
next statement, 4-6
while loop, 4-4
converting a string to a number, 5-6
cos function, 2-11t
creating arrays, 6-2
creating your own functions, 7-1 to 7-4
using built-in functions, 7-1

D
data
entering from the terminal, 1-7
files, 1-1, 1-8
form of, 1-1
sources of, 1-1, 1-7
decimal point in numbers, 2-3, 2-5
decrementing values, 2-8
defining your own functions, 7-1 to 7-4
using built-in functions, 7-1
dollar sign
in regular expressions, 3-2t
to indicate fields, 1-3
dynamic regular expressions, 3-4

formatting variables as strings, 5-6

element
deleting from an array, 6-3

FS variable, 4-2, 5-1t
functions
argument passing mechanisms, 7-1, 7-3, 7-4

of an array, 6-1

call by reference, 7-4

else statement, 4-1

call by value, 7-3

END pattern, 2-7, 4-3

closing files or pipes, 8-3

exit statement in action for, 4-7

defining your own, 7-1 to 7-4

equal sign
assigning values to variables with, 2-6

using built-in functions, 7-1
getline, 8-1

testing equality with, 1-3

reading from a different file with, 8-2

escape sequences, 2-5

reading from other commands with, 8-2

list of, 2-6t

numeric

executing commands from a nawk program, 8-3

arguments for, 2-10, 2-11

exit statement, 4-7

described, 2-10

exp function, 2-11t

list of, 2-11t

exponential notation, 1-4

results of, 2-10

expressions

string, 5-4