Document: ULTRIX Guide to the nawk Utility
Order Number: AA-PBKPA-TE
Revision: 0
Pages: 80
ULTRIX Guide to the nawk Utility

Order Number: AA-PBKPA-TE

June 1990

Product Version: nawk Version 1.0
Operating System and Version: ULTRIX Version 4.0 or higher

This manual is a tutorial description of the nawk text-processing utility and programming language.

digital equipment corporation
maynard, massachusetts

Restricted Rights: Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause of DFARS 252.227-7013.

© Digital Equipment Corporation 1990
All rights reserved.
© Mortice Kern Systems, Inc., 1987, 1990

The information in this document is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in this document.

The software described in this document is furnished under a license and may be used or copied only in accordance with the terms of such license.

No responsibility is assumed for the use or reliability of software on equipment that is not supplied by Digital or its affiliated companies.

The following are trademarks of Digital Equipment Corporation: DECUS, ULTRIX Worksystem Software, DECwindows, VAX, DDIF, DDIS, DEC, MASSBUS, MicroVAX, Q-bus, VMS, VMS/ULTRIX Connection, VT, DECstation, ULTRIX Mail Connection, CDA, DECnet, DTIF, VAXstation, ULTRIX, and XUI.

INTEL is a trademark of Intel Corporation. Xenix, MS-DOS, and MS-OS/2 are trademarks of Microsoft Corporation. MKS and MKS AWK are trademarks of Mortice Kern Systems, Inc. PC-DOS is a trademark of International Business Machines, Inc. UNIX is a registered trademark of AT&T in the USA and other countries.
Contents

About This Manual
    Audience
    Organization
    Related Documents
    Conventions

1 Basic Concepts
    1.1 Data Files
        1.1.1 Records
        1.1.2 Fields
    1.2 The Shape of a Program
        1.2.1 Simple Patterns
        1.2.2 Numbers and Strings
        1.2.3 The Print Action
        1.2.4 Additional Points About Rules
    1.3 Running nawk Programs
        1.3.1 The nawk Command Line
        1.3.2 Program Files
        1.3.3 Sources of Data
        1.3.4 Saving nawk Output

2 Simple Arithmetic
    2.1 Arithmetic Operations
        2.1.1 Operation Ordering
    2.2 Formatted Output
        2.2.1 Placeholders
        2.2.2 Escape Sequences
    2.3 Variables
        2.3.1 The Increment and Decrement Operators
        2.3.2 Initial Values
        2.3.3 Built-In Record-Oriented Variables
    2.4 Arithmetic Functions

3 Patterns and Regular Expressions
    3.1 Using Matching Expressions
    3.2 Metacharacters
    3.3 Using Matching Expressions with Strings
    3.4 Applying Actions to a Group of Lines
    3.5 Combining Conditions in Patterns

4 Actions and Control Structures
    4.1 Adding Comments
    4.2 The if Statement
        4.2.1 A Word on Style
    4.3 Using Compound Statements
    4.4 The while Loop
    4.5 The for Loop
    4.6 The next Statement
    4.7 The exit Statement

5 String Manipulation
    5.1 String Variables
        5.1.1 String vs. Numeric Variables
        5.1.2 Built-In String Variables
    5.2 String Concatenation
    5.3 String Manipulation Functions

6 Arrays
    6.1 Arrays with Integer Subscripts
    6.2 Arrays with String Subscripts
        6.2.1 String Subscripts vs. Numeric Subscripts
    6.3 Deleting Array Elements
    6.4 Multidimensional Arrays

7 User-Defined Functions
    7.1 Defining Functions
    7.2
    7.3 Call By Value
    7.4 Passing Arrays to Functions

8 Enhancing Your nawk Programs
    8.1 The getline Function
        8.1.1 Reading from the Current Input
        8.1.2 Reading a Line into a String Variable
        8.1.3 Reading from a New File
        8.1.4 Reading from Other Commands
        8.1.5 Redirecting Output to Files and Pipes
    8.2 The system Function
    8.3 Compound Assignments
    8.4 The sortgen Program

A Order of Operations

B Example Files

Examples
    8-1: sortgen Program for nawk

Tables
    2-1: Arithmetic Operations
    2-2: Format String Placeholders
    2-3: Escape Sequences for nawk
    2-4: Built-In Record-Oriented Variables
    2-5: Common Mathematical Functions
    3-1: Metacharacters Recognized by nawk
    5-1: Built-In String Variables
    8-1: Compound Assignments

About This Manual

The Guide to the nawk Utility introduces the important principles and concepts of the nawk programming language and utility, and shows how they can be used for productive programming. This manual is a tutorial that teaches you how to use nawk; it is also a reference manual that you can use later.

Audience

This manual is a guide for intermediate users of the ULTRIX system. If you are a novice user, you might want to read the chapter on regular expressions in The Big Gray Book: The Next Step with ULTRIX before using this manual.

Organization

This book contains eight chapters and two appendixes. The following list gives a brief description of the book's contents:

Chapter 1     Introduces nawk and describes the basic concepts of the language.
Chapter 2     Describes how to use nawk to perform mathematical calculations.
Chapter 3     Describes how to use pattern matching and regular expressions in nawk programs.
Chapter 4     Describes the actions you can make nawk perform, and discusses how to use control structures to create more powerful nawk programs.
Chapter 5     Describes how to manipulate strings with nawk.
Chapter 6     Describes how to use arrays of information with nawk.
Chapter 7     Describes how to create your own custom functions for nawk programs.
Chapter 8     Describes how to tailor your nawk programs.
Appendix A    Describes the order in which nawk performs operations when executing a program.
Appendix B    Contains copies of the example files used in this manual.

Related Documents

The Little Gray Book: An ULTRIX Primer introduces the ULTRIX operating system and some of the tools and utilities discussed here, and is a handy reference as you read this book.

The Big Gray Book: The Next Step with ULTRIX provides more information on ULTRIX utilities. The Guide to the nawk Utility is a thorough tutorial description of an enhanced version of the awk utility discussed in The Big Gray Book.

Another excellent reference for nawk is The AWK Programming Language, by Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan (Addison-Wesley, 1988). Aho, Weinberger, and Kernighan created awk, of which nawk is an enhanced version, at AT&T Laboratories.

The ULTRIX Reference Pages provide details of the commands and utilities described in this book. Experienced programmers may prefer to turn directly to nawk(1) in the Reference Pages.

Conventions

The following typeface conventions are used in this manual:

%                The default user prompt is your system name followed by a right angle bracket. In this manual, a percent sign (%) is used to represent this prompt.
user input       This bold typeface is used in interactive examples to indicate typed user input.
system output    This typeface is used in interactive examples to indicate system output and also in code examples and other screen displays.
                 In text, this typeface is used to indicate the exact name of a command, option, partition, pathname, directory, or file.
UPPERCASE
lowercase        The ULTRIX system differentiates between lowercase and uppercase characters. Literal strings that appear in text, examples, syntax descriptions, and function definitions must be typed exactly as shown.
rlogin           In syntax descriptions and function definitions, this typeface is used to indicate terms that you must type exactly as shown.
filename         In examples, syntax descriptions, and function definitions, italics are used to indicate variable values; and in text, to give references to other documents.
macro            In text, bold type is used to introduce new terms.
⋮                A vertical ellipsis indicates that a portion of an example that would normally be present is not shown.
CTRL/x           This symbol is used in examples to indicate that you must hold down the CTRL key while pressing the key x that follows the slash. When you use this key combination, the system sometimes echoes the resulting character, using a circumflex (^) to represent the CTRL key (for example, ^C for CTRL/C). Sometimes the sequence is not echoed.

1 Basic Concepts

The nawk language is an easy-to-use programming language that lets you work with information that is stored in files. With nawk programs, you can do these things:

• Display all of the information in a file, or selected pieces of information
• Perform calculations with numeric information from a file
• Prepare reports based on information from a file
• Analyze text for spelling and frequency of words and letters

At first glance, these operations seem elementary. However, later chapters show how they can be combined to perform complicated tasks.

You will find that nawk is a good first programming language. It allows most of the logical constructs of modern computing languages: if-else statements, while and for loops, function calls, and so on.
It is easy to learn, and allows beginners to get results with little effort. At the same time, it introduces all the important concepts of programming and prepares users for more complicated languages.

Every programming language has its own way of looking at the world. To write programs in the language, you must learn to see things from the language's point of view. This chapter examines the fundamentals of nawk:

• The kind of information it works with
• The "shape" of a nawk program
• How to run nawk programs

1.1 Data Files

Almost all nawk programs work with data. Programs can obtain data typed in from the terminal or from the output of other commands (through pipes), but usually data is obtained from data files.

Data files for nawk are always text files. This means that the files contain readable text, made up of letters, digits, punctuation characters, and so on.

For example, you could create a data file containing information about the hobbies of a group of people. Each line in this file would give a person's name, one of that person's hobbies, how many hours a week the person spends on the hobby, and how much money the hobby costs per year. Using a separate line for each of a person's hobbies, the file might look like this:

Jim      reading         15   100.00
Jim      bridge           4    10.00
Jim      role-playing     5    70.00
Linda    bridge          12    30.00
Linda    cartooning       5    75.00
Katie    jogging         14   120.00
Katie    reading         10    60.00
John     role-playing     8   100.00
John     jogging          8    30.00
Andrew   wind-surfing    20  1000.00
Lori     jogging          5    30.00
Lori     weight-lifting  12   200.00
Lori     bridge           2     0.00

If you want to follow the examples using this file, create a copy of the file and name it hobbies. There are other example files used in this manual; you might want to create copies of them as well. Appendix B contains copies of all the example files.

1.1.1 Records

A nawk data file is a collection of records. A record contains a number of pieces of information about a single item; these pieces are called fields.
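If you are following along on a modern system, the following sketch creates the hobbies file from Section 1.1 with a shell here-document and then counts its records. It assumes a POSIX sh; because each record is one line, the standard wc -l command gives the record count. The sketch separates fields with single blanks instead of the aligned columns shown above, which makes no difference to nawk's default field splitting.

```shell
#!/bin/sh
# Create the hobbies data file used throughout this manual.
# Fields: name, hobby, hours per week, dollars per year.
# One record per line; a single blank separates the fields.
cat > hobbies <<'EOF'
Jim reading 15 100.00
Jim bridge 4 10.00
Jim role-playing 5 70.00
Linda bridge 12 30.00
Linda cartooning 5 75.00
Katie jogging 14 120.00
Katie reading 10 60.00
John role-playing 8 100.00
John jogging 8 30.00
Andrew wind-surfing 20 1000.00
Lori jogging 5 30.00
Lori weight-lifting 12 200.00
Lori bridge 2 0.00
EOF

# Records are separated by new-line characters, so counting lines counts records.
# tr removes the padding that some versions of wc print before the number.
records=$(wc -l < hobbies | tr -d ' ')
echo "hobbies contains $records records"
```

The sketch writes hobbies into the current directory, so you can run it once and then use the file with the nawk commands in the rest of this chapter.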
In the hobbies file, each line is a separate record, giving a complete set of information about one person's hobby.

Records are separated by a record separator character, which is usually the new-line character. A new-line character shows where one line of text ends and another begins; in a file using new-line as a record separator, each line of the file is a separate record. All the examples in this manual use the new-line character as a record separator.

1.1.2 Fields

A record consists of a number of fields. A field is a single piece of information. For example, the following record from the hobbies file contains four fields:

Jim      reading         15   100.00

The information in the first field is Jim, the second is reading, and so on. Specify fields in the same order in each record; that way nawk and other tools can easily access a particular piece of information in any record.

The fields of a record are separated by one or more field separator characters. In the hobbies file, strings of blank characters (spaces) separate the fields. By default, nawk uses white space (any number of blanks or tab characters) to separate fields. You can change this default, as you will see in Section 1.3.1.

1.2 The Shape of a Program

A nawk program looks like this:

pattern { actions }
pattern { actions }
pattern { actions }

Each line is a separate instruction or rule. The nawk utility looks through the data files record by record and executes the rules, in the given order, on each record.

1.2.1 Simple Patterns

A rule has this form:

[ pattern ] [ { actions } ]

The form of a rule is called its syntax. This syntax indicates that the given set of actions is to be performed on every record that meets a certain set of conditions. The conditions are given by the pattern part of the rule. The brackets indicate that both the pattern part and the actions part are optional.

The pattern of a rule often looks for records that have a particular value in some field.
The notation $1 stands for the first field of a record, $2 stands for the second field, and so on. The special notation $0 represents the entire record. A pair of equal signs (==) stands for "is equal to." For example:

$2 == "jogging" { print }

This rule tells nawk to print any record whose second field is jogging. This rule is a complete nawk program. If you ran this program on the hobbies file, nawk would look through the file record by record (line by line). Whenever a line had jogging as its second field, nawk would print the complete record. The output from the program would therefore be as follows:

Katie    jogging         14   120.00
John     jogging          8    30.00
Lori     jogging          5    30.00

Here is another example; ask yourself what the following nawk program does:

$1 == "John" { print }

As you probably guessed, this program prints every record that has John as its first field. The output would be as follows:

John     role-playing     8   100.00
John     jogging          8    30.00

The same sort of search can be performed on any text database. The only difference is that databases tend to contain a great deal more data than the example contains.

The previous examples both used the print action. In fact, this action does not have to be written explicitly; if a nawk rule does not contain an action, print is assumed. The two example programs you've seen could have been written as follows, with the same effect:

$2 == "jogging"

and

$1 == "John"

The use of the two equal signs (==) is an example of a comparison operation. The nawk language recognizes several other types of comparison:

!=    Not equal
<     Less than
>     Greater than
<=    Less than or equal
>=    Greater than or equal

For example, consider each of the following rules as complete programs, and decide what the programs do with the hobbies file:

(a) $1 != "Linda" { print }
(b) $3 > 10
(c) $4 < 100.00
(d) $4 <= 100.00

These rules have the following effects:

(a) Prints all records whose first field is not Linda.
(b) Prints all records whose third field is greater than 10. Remember that when there is no explicit action, print is assumed.
(c) Prints all records whose fourth field is less than 100.00.
(d) Prints all records whose fourth field is less than or equal to 100.00.

1.2.2 Numbers and Strings

In the previous examples, there are quotation marks (") around Linda in (a), but none in any of the other rules. The nawk language distinguishes between string values, which are enclosed in quotation marks, and numeric values, which are not.

A string value is a sequence of characters like "abc". Any characters are allowed, even digits, as in "abc123". Strings can contain any number of characters. A string with zero characters is called the null string and is written "".

A numeric value is mostly made up of digits, but it can also have a sign and a decimal point. The following are all valid numerical values in nawk:

10    0.34    -78    +2.56    -.92

The nawk language does not let you put commas inside numbers. For example, you must write 1000 instead of 1,000.

Note

The nawk utility lets you use exponential or scientific notation. Exponents are given as e or E followed by an optionally signed exponent. Thus, the following values are all equivalent:

1E3    1.0e3    10E2    1000

When numbers are compared (with operators like > and <), comparisons are made in accordance with the usual rules of arithmetic. When strings are compared, comparisons are made in accordance with the ASCII¹ collating order. This is a little like alphabetical order; for example:

$1 >= "Katie"

This program will print out the Katie, Linda, and Lori lines, as you would expect from alphabetical order. However, ASCII collating order differs from alphabetical order in a number of respects; for example, lowercase letters are greater than uppercase ones, so that a is greater than Z. The complete ASCII collating order is given in the ascii(7) Reference Page.
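A quick way to see the difference between numeric and string comparison is to feed a few records to awk on standard input. This sketch assumes a POSIX sh and a modern awk command, which accepts these nawk programs unchanged (on ULTRIX, type nawk instead). It runs awk with LC_ALL=C because, on current systems, string comparison may otherwise follow the locale's collating order rather than plain ASCII order.

```shell
#!/bin/sh
# Five records taken from the hobbies file.
data='Jim reading 15 100.00
Linda bridge 12 30.00
Katie jogging 14 120.00
John jogging 8 30.00
Lori bridge 2 0.00'

# "Katie" is quoted, so $1 is compared as a string in ASCII collating order.
by_name=$(printf '%s\n' "$data" | LC_ALL=C awk '$1 >= "Katie"')

# 10 is not quoted, so $3 is compared as a number.
busy=$(printf '%s\n' "$data" | LC_ALL=C awk '$3 > 10')

# Lowercase letters sort after uppercase ones, so "apple" >= "Katie" is true.
mixed=$(printf 'apple\nKatie\nJim\n' | LC_ALL=C awk '$1 >= "Katie"')

printf '%s\n\n%s\n\n%s\n' "$by_name" "$busy" "$mixed"
```

The last comparison selects apple but not Jim, which is the ASCII rule described above: the lowercase a has a higher code than the uppercase K.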
¹ ASCII is an abbreviation for American Standard Code for Information Interchange; most computer systems use the ASCII code to represent characters.

1.2.3 The Print Action

So far, the only action you have learned is print. As you have seen, print can display an entire record. It can also display selected fields of the record, as in the following example:

$2 == "bridge" { print $1 }

This rule displays the first field of every record whose second field is bridge. The output is as follows:

Jim
Linda
Lori

The print command can display more than one field. If you give print a list of fields separated by commas, print displays the given fields separated by single blanks. For example:

$1 == "Jim" { print $2,$3,$4 }

This program produces the following output:

reading 15 100.00
bridge 4 10.00
role-playing 5 70.00

The print action can display strings and numbers along with fields. For example:

$1 == "John" { print "$",$4 }

This program's output looks like this:

$ 100.00
$ 30.00

In this example, the print action prints a dollar sign ($), then a blank (because the two values are separated by a comma), then the value of the fourth field in each selected record.

As an exercise, predict the output of the following programs:

(a) $1 == "Lori" { print $1, "spends $", $4, "on", $2 }
(b) $2 == "jogging" { print $1, "jogs", $3, "hours a week" }
(c) $4 > 100.00 { print $1, "has an expensive hobby" }

1.2.4 Additional Points About Rules

You can put any number of extra blanks and tabs into nawk patterns and actions. For example:

{   print   $1 ,  $2 ,  $3   }

You can leave out the pattern part of a rule. In this case, the action part is applied to every record in the file. The following example is a complete nawk program that displays every record in the data file:

{ print }

You can leave out the action part of a rule. In this case, the default action is print.
The following example is a complete nawk program that displays every record whose first field is Andrew:

$1 == "Andrew"

This is equivalent to the following:

$1=="Andrew" { print }

When a nawk program contains several rules, nawk applies every appropriate rule to the first record, then every appropriate rule to the second record, and so on. Rules are applied in order. For example:

$1 == "Linda"
$2 == "bridge" { print $1 }

This program produces the following output:

Jim
Linda    bridge          12    30.00
Linda
Linda    cartooning       5    75.00
Lori

The nawk program looks through the file record by record. The following record is the first to satisfy one of the patterns:

Jim      bridge           4    10.00

As a result, nawk prints out the first field of the record (as dictated by the second rule). The next record of interest is:

Linda    bridge          12    30.00

This record satisfies the pattern of the first rule, so the whole record is printed. It also satisfies the pattern of the second rule, so the first field is printed again. The nawk program continues through the file, record by record, executing the appropriate actions when the pattern is satisfied.

1.3 Running nawk Programs

You can run nawk programs in two ways:

• From a command line
• From a program file

The following sections describe these two methods.

1.3.1 The nawk Command Line

The simplest nawk command line has the following form:

nawk 'program' datafile

The nawk program is enclosed in apostrophes, or single quotation marks ('). The datafile argument gives the name of the data file. For example, the following command executes the program $1 == "Linda" on the hobbies file:

% nawk '$1 == "Linda"' hobbies

You can also type in a multiline program within apostrophes, provided that the shell you are using allows this construction. For example:

nawk '
$1 == "Linda"
$2 == "bridge" { print $1 }
' hobbies

As mentioned in a previous section, the default is for nawk to assume that record fields are separated by space and tab characters.
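You can watch the default field splitting at work. In this sketch (again assuming a POSIX sh and an awk command that accepts nawk programs), the three records use different mixtures of blanks and tabs, and the last one begins with blanks. The program prints the first two fields of each record, joined by a single blank.

```shell
#!/bin/sh
# Three records: runs of blanks, a tab mixed with blanks, and leading blanks.
# printf turns each \t into a tab character and each \n into a new-line.
fields=$(printf 'Jim   reading    15 100.00\nLinda\t  bridge\t12 30.00\n   Lori jogging 5 30.00\n' |
    awk '{ print $1, $2 }')

# With the default separator, leading blanks are ignored and any run of
# blanks and tabs counts as a single separator.
printf '%s\n' "$fields"
```

All three records split into the same four fields that the aligned hobbies file produces, so the output is Jim reading, Linda bridge, and Lori jogging, one pair per line.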
If the data file uses different field 1-6 Basic Concepts separator characters, you must indicate this on the command line. You do this with an option of the following form: —Fstring The string lists the characters used to separate fields. For example: nawk -F":" /{ print $3 }'’ file.dat This rule indicates that the given data file uses colons (:) to separate fields in its records. The -F option must come before the quoted program rules. 1.3.2 Program Files Short programs like the ones discussed in this chapter can be entered on a single command line. Later chapters discuss longer programs, which cannot be typed on a single line. Such programs are most easily executed from a program file. A program file is a text file that contains a nawk program. You can create program files with any text editor. For example, you might create a program file named lbprog.nawk that contains the following lines: $1 == "Linda" $2 == "bridge" { print $1 } To execute a program on a particular data file, use the following command: nawk -f progfile datafile The name progfile is the name of the file that contains the nawk program, and datafile is the name of the data file. The following example runs the program in lbprog.nawk on the data in hobbies: nawk -f lbprog.nawk hobbies If the data file does not use the default separator characters, you must specify a -F option after the progfile name. For example: nawk -f prog.nawk -F":" file.dat As an exercise, execute the examples in this chapter on the hobbies file. Run some from the command line and some from program files. 1.3.3 Sources of Data If you do not specify a data file on the command line, nawk reads data from the terminal. If you issue a command as in the following example, nawk prints the first word of every line you type in: nawk '{ print $2 }’ When you are entering data from the terminal, mark the end of the data by typing CTRL/D. 
For example:

% nawk '{ print $2 }'
Jim reading 15 100.00
reading
Jim bridge 4 10.00
bridge
Jim role-playing 5 70.00
role-playing
Linda bridge 12 30.00
bridge
Linda cartooning 5 75.00
cartooning
%

You can specify several data files on the nawk command line. For example:

nawk -f progfile data1 data2 data3 ...

When nawk finishes reading the first data file, data1, it moves to data2, and so on.

1.3.4 Saving nawk Output

You can save a nawk program's output in a file by using output redirection. To do this, specify a right angle bracket (>) and a file name at the end of any nawk command line. For example:

nawk -f progfile datafile >outfile

This command line writes the output from the nawk program to a file named outfile. In this case, the output is not displayed on the terminal screen. For more information about redirection, see the chapter on the shell in The Little Gray Book: An ULTRIX Primer.

2 Simple Arithmetic

The nawk language makes it easy for you to perform calculations with numbers contained in data files. This chapter discusses how nawk does arithmetic and shows examples of programs using these features.

Note that nawk performs arithmetic operations in much the same way as the C programming language, so knowledge of nawk is good preparation for learning C. (One difference is that nawk does all arithmetic in floating point: 7/2 is 3.5 in nawk, but 3 in C integer arithmetic.)

2.1 Arithmetic Operations

Here is an example of a nawk program that uses simple arithmetic:

$3 > 10 { print $1, $2, $3-10 }

In the print statement, $3-10 subtracts 10 from the value of the third field in the record. The print statement prints this result.
If you apply this program to the hobbies file shown in the previous chapter, the output will be as follows:

Jim reading 5
Linda bridge 2
Katie jogging 4
Andrew wind-surfing 10
Lori weight-lifting 2

The program works like this: if someone spends more than 10 hours on a hobby, the program prints the person's name, the name of the hobby, and the number of extra hours the person spends on the hobby (the number of hours more than 10).

The notation $3-10 is called an arithmetic expression. It performs an arithmetic operation and comes up with a result; the result of the arithmetic is called the value of the expression.

The nawk language recognizes the arithmetic operations shown in Table 2-1.

Table 2-1: Arithmetic Operations

Operation         Operator    Example
Addition          A + B       2+3 is 5
Subtraction       A - B       7-3 is 4
Multiplication    A * B       2*4 is 8
Division          A / B       6/3 is 2
Negation          - A         -9 is -9
Remainder         A % B       7%3 is 1
Exponentiation    A ^ B       3^2 is 9

The remainder operation is also known as the modulus or integer remainder operation. The value of a modulus operation is the integer remainder you get when you divide A by B. For example:

7 % 3

This expression has a value of 1, because when you divide 7 by 3, you get a quotient of 2 and a remainder of 1.

The value for the exponentiation operation A ^ B is the value of A raised to the exponent B. For example:

3 ^ 2

This expression has the value 9 (that is, 3×3).

Here are some programs that perform simple arithmetic with the hobbies file. Try to figure out what they do and what they will print out.

(a) $1 == "Katie" { print $2, $3/7 }
(b) { print $1, $2, $3/7 }
(c) $1 == "Jim" { print $1, $2, "$", $4/52 }
(d) { print $1, "$", $4*1.05 }

After you have thought about the programs, run them to see if they produce the output you have predicted.
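One way to run all four programs at once is sketched below, assuming a POSIX sh and an awk command that accepts nawk programs (substitute nawk on ULTRIX). The helper function run is a hypothetical name introduced only for this sketch; it feeds the hobbies records to one program. Modern awk implementations print non-integer results with six significant digits (the default OFMT of %.6g), so the number of digits you see may differ from other sample output in this manual.

```shell
#!/bin/sh
# The hobbies records from Section 1.1.
hobbies='Jim reading 15 100.00
Jim bridge 4 10.00
Jim role-playing 5 70.00
Linda bridge 12 30.00
Linda cartooning 5 75.00
Katie jogging 14 120.00
Katie reading 10 60.00
John role-playing 8 100.00
John jogging 8 30.00
Andrew wind-surfing 20 1000.00
Lori jogging 5 30.00
Lori weight-lifting 12 200.00
Lori bridge 2 0.00'

# run PROGRAM (hypothetical helper): apply one program to the hobbies records.
# LC_ALL=C keeps the period as the decimal point in the output.
run() {
    printf '%s\n' "$hobbies" | LC_ALL=C awk "$1"
}

a=$(run '$1 == "Katie" { print $2, $3/7 }')
b=$(run '{ print $1, $2, $3/7 }')
c=$(run '$1 == "Jim" { print $1, $2, "$", $4/52 }')
d=$(run '{ print $1, "$", $4*1.05 }')

printf '(a)\n%s\n(b)\n%s\n(c)\n%s\n(d)\n%s\n' "$a" "$b" "$c" "$d"
```

Notice that whole-number results such as 14/7 are printed without a decimal point (program (a) prints jogging 2), while 10/7 is printed as 1.42857.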
An explanation of each program follows:

(a) Because field 3 gives the average number of hours per week that a person spends on a hobby, $3/7 shows the average number of hours per day. Program (a) therefore prints out the number of hours per day Katie spends on each of her hobbies.
(b) This is a variation on program (a). It prints out the number of hours per day each person spends on each hobby.
(c) Field 4 gives the amount of money a person spent this year on a particular hobby. Dividing this by 52 gives the average amount of money spent per week.
(d) If the current inflation rate is 5 percent, multiplying this year's expenses by 1.05 will give the amount of money the same person might expect to spend next year. This is the information that program (d) prints out.

2.1.1 Operation Ordering

Expressions can contain several operations. For example:

A+B*C

As is customary in mathematics, all multiplications and divisions (and remainder operations) are performed before additions and subtractions. When handling the expression A+B*C, nawk performs B*C first and then adds A. The value of 2+3*4 is therefore 14 (3×4 first, then add 2).

If you want a particular operation done first, enclose it in parentheses. For example:

(A+B)*C

When evaluating this expression, nawk performs the addition before the multiplication. Therefore, (2+3)*4 is 20. (Add 2 and 3 first, then multiply by 4.)

For example, consider the following program:

{ print $4/($3*52) }

Field 4 is the amount of money a person spent on a hobby in the last year. Field 3 is the average number of hours a week the person spent on that hobby, so $3*52 is the number of hours in 52 weeks (one year). The value $4/($3*52) is therefore the amount of money that the person spent on the hobby per hour.

Appendix A shows the order of evaluation for nawk expressions.

2.2 Formatted Output

With nawk, you can specify the format you want your output to take.
For example:

    $1 == "Jim" { print "$", $4/52 }

This program produces the following output:

    $ 1.923077
    $ 0.192308
    $ 1.346154

This output shows the amount of money per week that Jim spent on his hobbies. However, it is customary to write money amounts with only two digits after the decimal point. How can you change the program to make the money amounts look more normal?

The answer is to use the printf action instead of print. The printf statement lets you specify the format in which output should be printed. A printf action has the following form:

    { printf format-string, value, value, ... }

The format-string indicates the format in which output should be printed. The values give the data to be printed.

A format string contains two kinds of items:

•  Normal characters, which are printed as is
•  Placeholders, which are replaced with values given later in the printf action

As an example, try running the following program on the hobbies file:

    $2 == "bridge" { printf "%5s plays bridge\n", $1 }

This nawk program produces the following output:

      Jim plays bridge
    Linda plays bridge
     Lori plays bridge

The format string has one placeholder, %5s:

    "%5s plays bridge\n"

The first (and only) value printed by this program is $1; when the printf statement prints its output, the placeholder is replaced by the value of field 1. The rest of the format string is printed as is. (Note that the format string ends in \n; this symbol is explained in Section 2.2.2.)

2.2.1 Placeholders

The form of a placeholder tells nawk how to print the associated value. All placeholders begin with a percent sign (%) and end in a letter. Table 2-2 shows the most common letters used in placeholders.

Table 2-2: Format String Placeholders

    Placeholder  Description
    d            An integer in decimal form (base 10)
    e            A floating point number in scientific notation, as in -d.ddddddE+dd
    f            A floating point number in conventional form, as in -ddd.dddddd
    g            A floating point number in either e or f form, whichever is shorter;
                 also, non-significant zeros are not printed
    o            An unsigned integer in octal form (base 8)
    s            A string
    x            An unsigned integer in hexadecimal form (base 16)

For example, the following format string contains two placeholders:

    "%s %d\n"

The notation %s represents a string and %d represents a decimal integer.

You can put additional information between the percent sign and the letter at the end of the placeholder. If you put an integer there, as in %5s, the number is used as a width. The corresponding value is printed using (at least) the given number of characters. For example:

    $2 == "bridge" { printf "%5s plays bridge\n", $1 }

Here, the value of the string $1 replaces the placeholder %5s and is always printed using at least five characters. The output, therefore, is as follows:

      Jim plays bridge
    Linda plays bridge
     Lori plays bridge

If you do not specify the 5 in the placeholder, the output is different. For example:

    $2 == "bridge" { printf "%s plays bridge\n", $1 }

This program produces the following output:

    Jim plays bridge
    Linda plays bridge
    Lori plays bridge

If no width is given, nawk prints values using the smallest number of characters possible.

The nawk language also lets you put a minus sign (-) in front of the number in the width position. The amount of output space will be the same, but the information will be left-justified. For example:

    $2 == "bridge" { printf "%-5s plays bridge\n", $1 }

This program's output looks like this:

    Jim   plays bridge
    Linda plays bridge
    Lori  plays bridge

A placeholder for a floating point number may also contain a precision. This is written as a decimal point followed by an integer. A precision determines the number of digits to be printed after the decimal point in a floating point number. For example:

    $1 == "John" { printf "$%.2f\n", $4/52 }

Here, the placeholder %.2f indicates that all floating point numbers are to be printed with two digits after the decimal point. This program produces the following output:

    $1.92
    $0.58

Using both a width and a precision can improve the appearance of your program's output. For example:

    $1 == "John" { printf "$%4.2f on %s\n", $4/52, $2 }

This program's output looks like this:

    $1.92 on role-playing
    $0.58 on jogging

The %4.2f indicates that the corresponding floating point value is to be printed with a width of four characters, with two characters after the decimal point. Note that the decimal point itself is counted in the width.

Here are a few more nawk programs that work on the hobbies file. Predict what each will print, and run them to see if your prediction is right.

    (a) { printf "%6s %s\n", $1, $2 }
    (b) { printf "%20s: %2d hours/week\n", $2, $3 }
    (c) $1 == "Katie" { printf "%20s: $%6.2f\n", $2, $4 }

2.2.2 Escape Sequences

All of the format strings shown so far have ended in \n. This kind of construct is called an escape sequence. All escape sequences are made from a backslash character (\) followed by one, two, or three other characters.

You use escape sequences inside strings to represent special characters. In particular, the \n escape sequence represents the new-line character. A \n in a printf format string tells nawk to start printing output at the beginning of a new line. For example:

    $1 == "Lori" { printf " %s", $2 }

This program produces the following output:

     jogging weight-lifting bridge

The output is all on one line; without the \n escape sequence, printf does not start new lines. This action is different from that of print, which begins a new line each time it executes.

You can use the \n escape sequence in the middle of a format string.
For example:

    $1 == "John" { printf "%s:\n%d\n", $2, $3 }

This program's output looks like this:

    role-playing:
    8
    jogging:
    8

The first new-line escape sequence starts a new line after the colon; the second starts a new line after the value of $3.

Table 2-3 shows the valid nawk escape sequences.

Table 2-3: Escape Sequences for nawk

    Escape  Interpretation              Escape  Interpretation
    \"      Quotation mark              \n      New-line
    \a      Audible bell                \r      Carriage return
    \b      Backspace                   \t      Horizontal tab
    \f      Formfeed                    \v      Vertical tab
    \ooo    ASCII character, octal ooo

Use the escape sequence \" (a backslash followed by a quotation mark) when you want a string to contain an actual quotation mark. For example:

    "He said, \"Hello\"."

By entering this escape sequence, you indicate that the quotation mark character is inside the string; if you left out the backslash, nawk would think that the quotation mark before Hello was marking the end of the string.

Because a backslash followed by another character looks like an escape sequence, you must type two backslashes (\\) if you want to put a single backslash character in a string. For example:

    { print "The backslash (\\) character" }

The output from this program is as follows:

    The backslash (\) character

2.3 Variables

Suppose you want to find out how many people have jogging as a hobby. To do this, you have to look through the hobbies file, record by record, and keep a count of the number of records that have jogging in their second field. This means you must remember the count from one record to the next.

A nawk program remembers information by using variables. A variable is a storage place for information. Every variable has a name and a value. A variable is given a value with an action of the following form:

    name = value

The nawk utility assigns the specified value to the variable that has the given name.
The following example assigns the value 0 (zero) to the variable count:

    count = 0

Do not confuse the assignment operator (=) with the equality test operator (==). A single equal sign (=) stores a value in a variable. A pair of equal signs (==) tests to see if two values are equal.

You can use variables in expressions. For example:

    1 + count

The value of this expression is the current value of count plus 1. Now consider the action in the following example:

    count = count + 1

Your nawk program first finds the value of count + 1 and then assigns this value to count. This action increases the value of count by 1. You can use this kind of action in a program to count how many people have jogging as a hobby:

    BEGIN { count = 0 }                                  [1]
    $2 == "jogging" { count = count + 1 }                [2]
    END { printf "%d people like jogging.\n", count }    [3]

A line-by-line review of this program follows:

[1] When a rule has BEGIN as its pattern, the associated action is performed before nawk has looked at any of the records in the data file. Therefore, nawk begins by assigning the value 0 to count.

[2] This line adds one to count every time nawk finds a record with jogging in the second field.

[3] When a rule has END as its pattern, the associated action is performed after nawk has looked at all records in the data files specified on the command line. Thus, after nawk has looked at all the records, the printf action prints the count of people who jog.

The output from the program will be as follows:

    3 people like jogging.

Notice how the value of count is printed in place of the %d placeholder.

Here are a few more programs that use variables. Examine the programs and try to figure out what they are doing.
    (a) BEGIN { count = 0 }
        $1 == "John" { count = count + 1 }
        END { printf "John has %d hobbies.\n", count }

    (b) BEGIN { sum = 0 }
        $1 == "Linda" { sum = sum + $4 }
        END { printf "Linda spends $%6.2f a year\n", sum }

    (c) BEGIN { hours = 0 }
        $1 == "Lori" { hours = hours + $3 }
        END { printf "Lori passes %d hours/week\n", hours }

Here is what each of these programs does:

(a) This program counts the number of hobbies that John has.
(b) This program adds up the amount of money that Linda spent on hobbies in the past year.
(c) This program calculates the number of hours a week that Lori spends on her hobbies.

Using variables, you can write even more complex programs. For example, consider the following:

    BEGIN { sum = 0; count = 0 }
    $2 == "role-playing" {
        count = count + 1
        sum = sum + $4
    }
    END { printf "Average per person: $%6.2f\n", sum/count }

This program has two variables. The count variable keeps track of the number of people with role-playing as a hobby, and sum keeps track of the amount of money spent on role-playing. When sum is divided by count, the result is the average amount spent on role-playing.

Notice that the action part of the BEGIN rule contains two assignment instructions. A semicolon is used to separate the two instructions. The second rule in the program also has two assignments:

    count = count + 1
    sum = sum + $4

These two instructions are on separate lines. When an action contains more than one instruction, you can separate the instructions with semicolons or put them on separate lines.

Variables can be used in the pattern part of a rule. For example:

    BEGIN { max = 0 }
    $3 > max { max = $3 }
    END { printf "The maximum time is %d hours.\n", max }

This program finds the maximum value of field 3 in the hobbies file. The maximum is set to 0 to start. Then, if a record has a value in field 3 that is greater than the current value of max, max is set to this new value. At the end of the data file, max will hold the largest value found.
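The counting, averaging, and maximum programs above can be tried without the full hobbies file. The following is a minimal sketch, assuming a POSIX-compatible awk is on your PATH (on ULTRIX you would set AWK=nawk). The shell function names and the six sample records are made up for illustration; the records follow the hobbies-file layout of name, hobby, hours per week, and dollars per year.

```shell
#!/bin/sh
# Sketch: running this section's variable-based programs on sample data.
# Assumption: a POSIX-compatible awk is on PATH; set AWK=nawk on ULTRIX.
AWK=${AWK:-awk}

# Hypothetical sample records: name, hobby, hours/week, dollars/year.
sample_hobbies() {
    cat <<'EOF'
Jim reading 15 100.00
Jim role-playing 5 70.00
John role-playing 8 100.00
John jogging 8 30.00
Lori jogging 5 30.00
Lori weight-lifting 12 200.00
EOF
}

# Counter variable: how many records have jogging in field 2.
count_joggers() {
    "$AWK" 'BEGIN { count = 0 }
            $2 == "jogging" { count = count + 1 }
            END { printf "%d people like jogging.\n", count }'
}

# Two variables: a count and a running sum, combined in the END rule.
average_roleplaying() {
    "$AWK" 'BEGIN { sum = 0; count = 0 }
            $2 == "role-playing" {
                count = count + 1
                sum = sum + $4
            }
            END { printf "Average per person: $%6.2f\n", sum/count }'
}

# A variable used in a pattern: keep the largest value of field 3.
max_hours() {
    "$AWK" 'BEGIN { max = 0 }
            $3 > max { max = $3 }
            END { printf "The maximum time is %d hours.\n", max }'
}

sample_hobbies | count_joggers        # 2 people like jogging.
sample_hobbies | average_roleplaying  # Average per person: $ 85.00
sample_hobbies | max_hours            # The maximum time is 15 hours.
```

Wrapping each program in a shell function keeps the awk text identical to the listings above while letting you pipe any file in the same layout through it.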
As an exercise, try to write a nawk program that examines the hobbies file and calculates the average number of hours per week that someone spends on any one hobby. Then write a program that calculates the average number of hours per year that a person spends on any one hobby.

2.3.1 The Increment and Decrement Operators

You know how to advance the value held in a variable with an addition operation:

    count = count + 1

This is such a common operation that nawk has a special operator for incrementing variables by 1:

    count++

A pair of minus signs (--) is the counterpart of ++. This operator decrements (subtracts 1 from) the current value of a variable. For example, to subtract 1 from count, you could use either of these two forms:

    count = count - 1
    count--

2.3.2 Initial Values

If you use any variable in an arithmetic expression before you assign the variable a value, the variable is automatically given the value 0. This means that the BEGIN rule in the following program could be left out:

    BEGIN { count = 0 }
    $2 == "jogging" { count = count + 1 }
    END { printf "%d people jog\n", count }

2.3.3 Built-In Record-Oriented Variables

The nawk language has several built-in variables that you can use in your programs. You do not have to assign values to these variables; nawk automatically assigns the values for you. Table 2-4 describes some of the important numeric built-in variables. These variables have to do with information about records.

Table 2-4: Built-In Record-Oriented Variables

Variable  Description

NR        Contains the number of records that have been read so far. When nawk is looking at the first record, NR has the value 1; when nawk is looking at the second record, NR has the value 2; and so on. In a BEGIN rule, NR has the value 0. In an END rule, NR contains the total number of records that were read.
          The following rule prints the total number of data records read by the nawk program:

              END { print NR }

FNR       Like NR, but counts the number of records that have been read so far from the current file. When several data files are given on the nawk command line, FNR is set back to 1 when nawk begins reading each new file. Thus, the following rule prints the line number in the current file, followed by a colon, followed by the contents of the current line:

              { printf "%d:%s\n", FNR, $0 }

NF        Gives the number of fields in the current record. For the hobbies file, NF is 4 for each line because there are four fields in each record. In an arbitrary text file, NF gives the number of words on the current line in the file; by default, the fields of a file are assumed to be separated by blanks, so each word on a line is considered to be a separate field. The following program therefore prints the total number of words in the file:

              { count = count + NF }
              END { print count }

You can use built-in variables in place of any other variable or value. For example, they can appear in the pattern part of a rule:

    NF > 10 { print }

This rule prints any record that has more than ten fields. Here is another example:

    NR == 5 { print }

This rule prints record 5 in a file; the pattern selection criterion is true only when NR is 5.

Try to predict what the following example will do:

    { print $NF }

Because NF is the number of fields in the current record, it is also the number of the last field in the record. Therefore, $NF refers to the contents of the last field in a record, and the command in the previous example prints the last field in every record in the data file.

To test your understanding of almost everything discussed in this chapter, try to predict what the following rule will print:

    (NR % 5) == 0

The expression NR%5 calculates the remainder of NR divided by 5. The rule prints a record whenever this remainder is equal to 0.
Therefore, the rule prints every fifth record from the data file.

As an exercise, write nawk programs to do the following:

(a) Print every record that does not have exactly three fields.
(b) Print the total number of words and total number of lines in a text file. (This is two thirds of what the wc(1) command does.)
(c) Print the total number of records that have either four fields or five fields.
(d) Print the average number of words per line in a text file.

Write these programs and test them by running them on arbitrary text files. Once you have solutions that work, compare them against the following answers:

    (a) NF != 3

    (b) { words = words + NF }
        END { printf "Words = %d, Lines = %d\n", words, NR }

    (c) NF == 4 { count = count + 1 }
        NF == 5 { count = count + 1 }
        END { print count }

    (d) { words = words + NF }
        END { printf "Average words = %d\n", words/NR }

There are often several ways to write a given program; your solutions may differ from the ones presented here.

2.4 Arithmetic Functions

In nawk, a function can be compared to a car assembly line: you feed in various parts and raw materials at one end, and you get a complete product at the other end. In nawk, a function is fed data values (called the arguments of the function) and the final product is also a data value (called the result of the function).

You may already be familiar with this kind of function in mathematics. For example, mathematics uses sin to stand for a function that calculates the trigonometric sine of an angle. If you "feed" an angle into the sin function, the number returned is the trigonometric sine of the given angle. The angle is the argument of the function, and the sine is the result.

In nawk, you use functions inside expressions. For example:

    y = sin(x)

The right-hand side of the assignment is a function call. The name of the function is sin; this name is immediately followed by the function's arguments, which are enclosed in parentheses.
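To experiment with function calls, a BEGIN rule is convenient because it runs before any input is read. The sketch below assumes a POSIX-compatible awk on your PATH (AWK=nawk on ULTRIX) and wraps a few calls in hypothetical shell functions: sqrt applied to each input line in the "Number: ..., Root: ..." layout used in this section, int showing truncation, and a seeded version of the two-dice roll described later in this section.

```shell
#!/bin/sh
# Sketch: calling built-in arithmetic functions from awk.
# Assumption: a POSIX-compatible awk is on PATH; set AWK=nawk on ULTRIX.
AWK=${AWK:-awk}

# sqrt() applied to the first field of each input line.
roots() {
    "$AWK" '{ printf "Number: %f, Root: %f\n", $1, sqrt($1) }'
}

# int() drops the fractional part (truncation, not rounding).
truncation_demo() {
    "$AWK" 'BEGIN { print int(6.3), int(-7.4), int(8.99999) }'
}

# Two six-sided dice; srand(seed) makes the sequence repeatable for a
# given seed (the actual values depend on the awk implementation).
roll_dice() {
    "$AWK" -v seed="${1:-1}" 'BEGIN {
        srand(seed)
        die1 = int(6 * rand() + 1)
        die2 = int(6 * rand() + 1)
        print die1, die2
    }'
}

printf '16\n2\n' | roots   # first line: Number: 16.000000, Root: 4.000000
truncation_demo            # 6 -7 8
roll_dice 42               # two integers from 1 to 6
```

Passing the seed with -v keeps the awk program text fixed while letting you repeat or vary the dice sequence from the command line.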
When a nawk program contains a function call, nawk calculates the result of the function and uses that result in the expression that contains the function call. In the statement y = sin(x), nawk calculates the number that is the sine of the given angle and then assigns that number to the variable y.

Another nawk function is sqrt, whose result is the square root of its argument. The following statement assigns the value 4 to x:

    x = sqrt(16)

To show how you can use these functions, suppose you have a set of data that contains one number per line. Here is a program that reads these numbers and prints the square root of each:

    { printf "Number: %f, Root: %f\n", $1, sqrt($1) }

You can run this program with the following command line, and then type in numbers from the terminal:

    % nawk '{ printf "Number: %f, Root: %f\n", $1, sqrt($1) }'

Each time you press the RETURN key at the end of a line, nawk prints the square root of the number.

Any argument of a function can be an expression instead of a single value. For example:

    y = sin(2*x)

Your nawk program will calculate the value of the expression and then use the resulting value as the argument of the function.

The nawk language recognizes the most common mathematical functions, as shown in Table 2-5.

Table 2-5: Common Mathematical Functions

    Function     Result
    sin(x)       Sine of x, where x is in radians
    cos(x)       Cosine of x, where x is in radians
    atan2(y,x)   Arctangent of y/x in range -π to π radians
    log(x)       Natural logarithm (base e)
    exp(x)       Exponential (e^x)
    sqrt(x)      Square root of x
    int(x)       Integer part of x
    rand()       Random number n, 0 <= n < 1
    srand(x)     Sets x as seed for rand()

Several of these functions need a little more explanation. The int function takes a floating point number as an argument and returns an integer. The integer is the floating point number without its fractional part. For example:

    int(6.3)

This expression has the value 6. The following expression has the value -7.
Note that the fractional part is removed (truncated), not rounded.

    int(-7.4)

The next expression has the value 8:

    int(8.99999)

A call to rand returns a random number greater than or equal to 0 and less than 1. In this way, you can get a sequence of random numbers. You can use srand to set the starting point (seed) for a random number sequence. If you set the seed to a particular value, you will always get the same sequence of numbers from rand. This is useful if you want a program to use rand but obtain uniform results every time the program runs.

As an example of how you can use rand, here is a sequence of instructions that could be used in a nawk program to simulate a roll of two six-sided dice:

    die1 = int(6 * rand() + 1)
    die2 = int(6 * rand() + 1)

The function call rand() obtains a random floating point number from 0 to 1 (not including 1). Note that the function call needs the parentheses, even though rand requires no argument values. Multiplying the random number by 6 gives a floating point value from 0 to 6 (not including 6). Adding 1 gives a floating point value from 1 to 7 (not including 7). Applying the int function to this floating point value drops the fraction part, giving an integer from 1 to 6.

3 Patterns and Regular Expressions

So far, this manual has discussed three kinds of patterns: comparisons, and the special patterns BEGIN and END. This chapter discusses a fourth kind: regular expressions. A regular expression is a way of telling nawk to select records that contain certain strings of characters. For example, the following rule tells nawk to print all records that contain the string ri:

    /ri/ { print }

Applying this rule to the hobbies file produces this output:

    Jim    bridge           4   10.00
    Linda  bridge          12   30.00
    Lori   jogging          5   30.00
    Lori   weight-lifting  12  200.00
    Lori   bridge           2    0.00

All these records contain ri, either in Lori or bridge.

Regular expressions are always enclosed in slashes.
For example:

    /ing/

This expression finds all the records that contain ing.

The nawk language pays attention to the case of letters in regular expressions. For example, /li/ matches the record that contains weight-lifting; however, /li/ does not match the Linda records because the L in Linda is uppercase.

It is important to recognize the difference between two rules like the following:

    $1 == "Lori"
    /Lori/

To satisfy the first of these patterns, a record must have its first field exactly equal to the string Lori. If the first field is Lorie, for example, the comparison will not be true and the pattern will not be satisfied. With the regular expression /Lori/, the string Lori can appear anywhere in the record, and can be all or part of a field. This regular expression would match a string like Lorie.

3.1 Using Matching Expressions

If the pattern in a rule is a regular expression, nawk looks for a matching string anywhere in a record. Sometimes, however, you only want to look for a matching string in a particular field of a record. In this case, you can use a matching expression.
Two types of expressions check for matches:

•  The following expression is true if the string matches the given regular expression:

       string ~ /regular-expression/

•  The following expression is true if the string does not match the given regular expression:

       string !~ /regular-expression/

The pattern in the following program looks for matching strings; applied to the hobbies file, it prints all records that have ri contained somewhere in the second field:

    $2 ~ /ri/

This example produces the following output:

    Jim    bridge   4  10.00
    Linda  bridge  12  30.00
    Lori   bridge   2   0.00

The following rule looks for nonmatching strings; it prints all records that do not have the letter J somewhere in the first field:

    $1 !~ /J/

Note that the following two patterns are equivalent because $0 represents the whole record:

    /Lori/
    $0 ~ /Lori/

3.2 Metacharacters

Several characters have special meanings when they are used in regular expressions. These special characters, known as metacharacters, are described in Table 3-1.

Table 3-1: Metacharacters Recognized by nawk

Character  Description

^          Stands for the beginning of the string being matched (in these examples, a field). For example:

               $2 ~ /^b/ { print }

           This rule prints any record whose second field begins with b.

$          Stands for the end of the string being matched (in these examples, a field). For example:

               $2 ~ /g$/ { print }

           This rule prints any record whose second field ends with g.

.          Matches any single character (except the new-line). For example:

               $2 ~ /i.g/ { print }

           This rule selects the records with fields containing ing, and also selects the records containing bridge (idg).

|          Means "or." For example:

               /Linda|Lori/

           This regular expression matches either of the strings Linda or Lori.

*          Indicates zero or more repetitions of a character. For example, /ab*c/ matches abc, abbc, abbbc, and so on. It also matches ac (zero repetitions of b). The asterisk is most frequently used in conjunction with the period (.*).
           Because the period matches any character except the new-line, the period/asterisk combination matches an arbitrary string of zero or more characters. For example:

               $2 ~ /^r.*g$/ { print }

           This rule prints any record whose second field begins with r, ends in g, and has any set of characters between (for example, reading and role-playing).

+          Similar to the asterisk, but stands for one or more repetitions of a character. For example, /ab+c/ matches abc, abbc, and so on; but it does not match ac.

?          Similar to the asterisk, but stands for zero or one repetitions of a character. For example, /ab?c/ matches ac and abc, but not abbc, and so on.

{m,n}      Indicates m to n repetitions of a character (where m and n are both integers). For example, /ab{2,4}c/ matches abbc, abbbc, and abbbbc, but not abc or abbbbbc.

[X]        Matches any one of the set of characters X given inside the brackets. For example:

               $1 ~ /^[LJ]/ { print }

           This rule prints any record whose first field begins with either L or J. As a special case, [:lower:] inside brackets stands for any lowercase letter, [:upper:] inside brackets stands for any uppercase letter, [:alpha:] inside brackets stands for any letter, and [:digit:] inside brackets stands for any digit. For example:

               /[[:digit:][:alpha:]]/

           This expression matches a digit or letter.

[^X]       Matches any one character that is not in the set X that follows the circumflex (^). For example:

               $1 ~ /^[^LJ]/ { print }

           This rule prints any record whose first field does not begin with L or J.

               $1 ~ /^[^[:digit:]]/ { print }

           This rule prints any record whose first field does not begin with a digit.

(X)        Matches anything that the regular expression X does. Parentheses are used to control the way in which other special characters behave. For example, the asterisk (*) normally applies to the single character that immediately precedes it, so /abc*d/ matches abd, abcd, abccd, and so on. However, /a(bc)*d/ matches ad, abcd, abcbcd, and so on.
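Several rows of Table 3-1 can be checked quickly from the shell. This is a minimal sketch under the same assumption of a POSIX-compatible awk on PATH (AWK=nawk on ULTRIX); the function names and sample records are hypothetical. It leaves out the {m,n} and [:class:] forms, because some awk implementations do not support them.

```shell
#!/bin/sh
# Sketch: exercising regular-expression metacharacters from Table 3-1.
# Assumption: a POSIX-compatible awk is on PATH; set AWK=nawk on ULTRIX.
AWK=${AWK:-awk}

# Hypothetical sample records: name, hobby, hours/week, dollars/year.
sample_hobbies() {
    cat <<'EOF'
Jim reading 15 100.00
Jim bridge 4 10.00
Katie reading 10 60.00
Linda bridge 12 30.00
John role-playing 8 100.00
Lori weight-lifting 12 200.00
EOF
}

# ^ . * $ : second field begins with r and ends with g.
r_to_g() { "$AWK" '$2 ~ /^r.*g$/ { print $1, $2 }'; }

# [^X] : first field does not begin with L or J.
not_l_or_j() { "$AWK" '$1 ~ /^[^LJ]/ { print $1 }'; }

# | : the record mentions Linda or Lori anywhere.
linda_or_lori() { "$AWK" '/Linda|Lori/ { print $1 }'; }

# * + ? and ( ) on fixed strings; each result is 1 (match) or 0.
repetition_demo() {
    "$AWK" 'BEGIN {
        star  = ("ac" ~ /ab*c/)          # zero b characters allowed
        plus  = ("ac" ~ /ab+c/)          # at least one b required
        quest = ("abbc" ~ /ab?c/)        # at most one b allowed
        group = ("abcbcd" ~ /a(bc)*d/)   # bc repeated as a unit
        print star, plus, quest, group
    }'
}

sample_hobbies | r_to_g          # Jim reading, Katie reading, John role-playing
sample_hobbies | not_l_or_j      # Katie
sample_hobbies | linda_or_lori   # Linda, Lori
repetition_demo                  # 1 0 0 1
```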
When a metacharacter appears in a regular expression, it usually has its special meaning. If you want to use one of these characters literally (without its special meaning), put a backslash in front of the character. For example, the following statement prints all records that contain a dollar sign ($) followed by a 1:

    /\$1/ { print }

If you wrote the expression without the backslash, nawk would search for records in which the end of the record is followed by a 1, which is impossible.

Because the backslash has this special meaning, it too is considered a metacharacter. If you want to create a regular expression that matches a backslash, you must therefore use two backslashes (\\).

3.3 Using Matching Expressions with Strings

Until now, you have seen matching operations that contain regular expressions inside slash (/) characters. Matching operations can also refer to normal strings; for example:

    $1 ~ "xyz"

This has the same effect as the following statement:

    $1 ~ /xyz/

Regular expressions are compiled when the program is read. To use a string as a regular expression, nawk constructs a dynamic regular expression out of the string. Dynamic regular expressions take more time to compile than regular expressions written between slashes, but they are more powerful, because the string can be computed while the program runs.

When a matching operation uses a string instead of a regular expression, and the string contains one or more metacharacters, the situation is a little bit tricky. If you want to escape a metacharacter (have it taken literally), you must use two backslashes instead of one. For example, suppose you want to look for strings of the form "$1.00" in field 4 of a record. Using regular expressions, you would write the statement as follows to show that both the dollar sign ($) and the period (.) should be taken literally:

    $4 ~ /\$1\.00/

With strings, you would have to write the statement like this:

    $4 ~ "\\$1\\.00"

Two backslashes are needed instead of one.
The reason is simple: as discussed in Chapter 2, you need to type two backslashes inside a quoted string to get the effect of one. For example:

    { print "The backslash character: \\" }

This program prints the following:

    The backslash character: \

To match an actual backslash with a dynamic regular expression, you must use four, as in:

    $1 ~ "\\\\"

The literal string "\\\\" is read by nawk and turned into a string consisting of \\. When used as a dynamic regular expression, this will match one backslash.

3.4 Applying Actions to a Group of Lines

Pattern ranges let you apply an action to a group of lines. A rule that applies to a pattern range has the following form:

    pattern1, pattern2 { action }

This rule performs the given action on every line, starting at an occurrence of pattern1 and ending at the next occurrence of pattern2 (inclusive). For example:

    NR == 1, NR == 10 { print $1 }

This rule prints the first field of each of the first 10 input lines. It starts when NR is 1 and ends when NR is 10. Here is another example, using the hobbies file as its data file:

    /Jim/, /Linda/ { print $2 }

This example produces the following output:

    reading
    bridge
    role-playing
    bridge

As you can see, this program prints the second field of all lines between an occurrence of Jim and an occurrence of Linda.

After nawk has found a record matching pattern2, it begins to look for a line matching pattern1 again. In the following example, nawk prints the first range of records from reading to role, then starts looking for reading again.

    /reading/, /role/

The output from this program looks like this:

    Jim    reading       15  100.00
    Jim    bridge         4   10.00
    Jim    role-playing   5   70.00
    Katie  reading       10   60.00
    John   role-playing   8  100.00

It is important to remember that nawk starts performing the rule's action as soon as there is a record that matches pattern1. A nawk program does not check to make sure that there is a line matching pattern2 in the rest of the file.
For example:

    /Lori/, /Jim/ { print $2 }

In this case, nawk begins printing at the first record that contains Lori, and continues until it reaches the end of the file, finding no record that matches the second pattern, Jim.

3.5 Combining Conditions in Patterns

The double ampersand (&&) operator means AND. It is used to combine conditions in patterns. For example:

    $3 > 10 && $4 > 100.00 { print $1, $2 }

In this case, nawk prints the first and second fields of any record where $3 is greater than 10 and $4 is greater than 100.00. Here is another example:

    $1 ~ /J/ && $4 < 50.00

This rule prints all records in which the first field ($1) contains a J and the fourth field ($4) is less than 50.00.

The double vertical bar (||) operator means OR. It is also used to combine conditions in patterns. For example:

    $1 == "Linda" || $1 == "Lori"

This rule prints any record whose first field is either Linda or Lori. Here is another example:

    /jogging/ || /reading/ { sum = sum + $4 }
    END { print sum }

This program calculates the total money spent by hobbyists on both jogging and reading (because sum is increased if the hobby is either jogging or reading). This program is equivalent to the following program:

    /jogging|reading/ { sum = sum + $4 }
    END { print sum }

These last two examples demonstrate that there are often several ways of writing the same program.

The double ampersand and double vertical bar operators can only be used to combine complete pattern expressions. For example, the following pattern does not test whether the first field is Linda or Lori; the string "Lori" on its own always counts as true, so the pattern selects every record:

    $1 == "Linda" || "Lori"

You must write this kind of pattern this way:

    $1 == "Linda" || $1 == "Lori"

For practice with the concepts discussed in this chapter, write programs that do the following:

(a) Print every record that begins with A and contains more than four fields.
(b) Print the number of records that contain a dollar sign ($).
(c) Print records 10 through 20 of every data file.
(d) Print every tenth record of a file, plus the record that immediately follows the tenth record (records 10 and 11, records 20 and 21, and so on).

When you have written your programs, compare them against the solutions that follow. Remember that there may be several ways to write the same program.

    (a) /^A/ && NF > 4

    (b) /\$/ { count = count + 1 }
        END { print count }

    (c) FNR == 10, FNR == 20

    (d) (NR % 10) == 0, (NR % 10) == 1

        or

        ((NR % 10) == 0) || ((NR % 10) == 1)

4 Actions and Control Structures

So far, you have learned three actions: print, printf, and assignments. In this chapter, you will examine a wide variety of constructs that may appear in the action part of a nawk rule. Note that most of these are virtually identical to constructs in the C programming language.

4.1 Adding Comments

A comment is a note inside your program, explaining what the program is doing. Your nawk program ignores comments, so they do not affect how your program behaves, but they do help explain what is going on.

A comment begins with a number sign (#). When nawk sees the number sign in a program (outside of a quoted string or regular expression), it ignores the rest of the line. For example:

    # This program adds up the hours John spends on hobbies
    /John/ { sum = sum + $3 }   # field 3 is hours
    END { print sum }

The first line of this program explains what the program is doing. This is useful when you have a number of nawk programs stored in different files and you cannot remember which program is which. A comment at the beginning of the program lets you identify the program without having to read through the code and figure out what is going on.

The following example shows another way in which you can use comments:

    /John/ { sum = sum + $3 }   # field 3 is hours

A comment on the end of a line can give further information about what that line is doing. In this case, it explains the meaning of the number in field 3 of the record.
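Comments are most useful when a program is kept in its own file. The following sketch assumes a POSIX-compatible awk on PATH (AWK=nawk on ULTRIX) and a system that provides mktemp; it writes the commented John-hours program from this section to a temporary file and runs it with awk's -f option. The shell function name and the input records are illustrative only.

```shell
#!/bin/sh
# Sketch: storing a commented awk program in a file and running it with -f.
# Assumptions: a POSIX-compatible awk is on PATH (AWK=nawk on ULTRIX)
# and mktemp(1) is available for the temporary program file.
AWK=${AWK:-awk}

john_hours() {
    prog=$(mktemp) || return 1
    cat >"$prog" <<'EOF'
# This program adds up the hours John spends on hobbies
/John/ { sum = sum + $3 }   # field 3 is hours
END    { print sum }
EOF
    "$AWK" -f "$prog"        # reads records from standard input
    status=$?
    rm -f "$prog"
    return "$status"
}

printf 'John role-playing 8 100.00\nJohn jogging 8 30.00\nLori bridge 2 0.00\n' | john_hours   # 16
```

In everyday use you would keep the program in a permanent file instead of a temporary one; the leading comment then identifies it at a glance.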
It is a good practice to use comments in your programs. Without meaningful comments, you may find it difficult to understand a program if you look at it several months after you wrote it. Comments also make it easier for others to understand the programs you write.

4.2 The if Statement

An if statement lets you perform an action if a specified condition is true. The statement has the following form:

    if (expression)
        statement1
    else
        statement2

Typically, the expression in an if statement has a true/false value. If the value is true, statement1 is performed; otherwise, statement2 is performed. The else statement2 part is optional.

To see how if statements are used, consider the following programs, which examine a file of baseball scores. This file is named baseball, and it looks like this:

    5 Tigers Brewers N Blue Blue 8 Red Jays 9 (o) Brewers Jays Sox 7

Each line gives the home team first and the visitors second. Fields in each record are separated by tab characters (shown here as wide spaces) instead of single blanks, because some team names contain blanks. This means that you must use the following option when you run command-line nawk programs on the baseball file:

    -F"\t"

This option is equivalent to having the following line in a nawk program file:

    BEGIN { FS = "\t" }

(The built-in FS variable is explained in Chapter 5.)

Consider the following program:

    {
        if ($2 > $4)
            print "Home"
        else
            print "Visitor"
    }

This program prints Home when the home team's score ($2) is greater than the visiting team's, and prints Visitor otherwise.

The else part of an if statement can be omitted. In this case, nawk does nothing if the expression of the if statement is not true. For example:

    $1 ~ /Tigers/ { if ($2 > $4) win++ }
    END { print win }

This is a simple program that looks at all the Tigers' home games and prints the number of times the Tigers won. On records where $2 is not greater than $4, nawk takes no action.
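The two if programs above can be tried on a few made-up games laid out like the baseball file (home team, home score, visitor, visitor score, separated by tabs). This is a minimal sketch assuming a POSIX-compatible awk on PATH (AWK=nawk on ULTRIX); the scores and function names are invented for illustration. One deliberate change: the END rule prints win + 0, so a team with no home wins prints 0 instead of an empty line.

```shell
#!/bin/sh
# Sketch: if statements over tab-separated game scores.
# Assumption: a POSIX-compatible awk is on PATH; set AWK=nawk on ULTRIX.
AWK=${AWK:-awk}

# Invented games: home team, home score, visitor, visitor score.
sample_scores() {
    printf 'Tigers\t5\tBrewers\t3\n'
    printf 'Blue Jays\t8\tRed Sox\t9\n'
    printf 'Tigers\t1\tBlue Jays\t4\n'
}

# if/else: report which side won each game. -F '\t' splits on tabs,
# so team names containing blanks stay in one field.
winner() {
    "$AWK" -F '\t' '{
        if ($2 > $4)
            print "Home"
        else
            print "Visitor"
    }'
}

# if without else: count Tigers home wins; losing records are ignored.
tigers_home_wins() {
    "$AWK" -F '\t' '$1 ~ /Tigers/ { if ($2 > $4) win++ }
                    END { print win + 0 }'
}

sample_scores | winner            # Home, Visitor, Visitor (one per line)
sample_scores | tigers_home_wins  # 1
```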
As a more complicated example, consider this program:

    $1 ~ /Yankees/ {
        if ($2 > $4)
            print "Home Win"
        else
            print "Home Loss"
    }
    $3 ~ /Yankees/ {
        if ($4 > $2)
            print "Away Win"
        else
            print "Away Loss"
    }

This program runs through the baseball scores looking for games involving the Yankees. Appropriate messages are written for each possible outcome.

This next program is similar to the previous program. However, this program keeps track of the number of wins and losses, at home and away, then prints these values at the end:

    $1 ~ /Yankees/ {
        if ($2 > $4)
            hw++
        else
            hl++
    }
    $3 ~ /Yankees/ {
        if ($4 > $2)
            aw++
        else
            al++
    }
    END {
        printf "Home Wins:   %d\n", hw
        printf "Home Losses: %d\n", hl
        printf "Away Wins:   %d\n", aw
        printf "Away Losses: %d\n", al
    }

4.2.1 A Word on Style

Note the way in which indentation is used in the preceding program:

•  Except in trivial cases, the program begins a new line after every opening brace ({).
•  Every else is lined up under the corresponding if.
•  Parallel statements, like the sequence of printf instructions, are lined up underneath each other.

It is not necessary to write nawk programs in this way, but appropriate indentation and spacing make programs easier to read and understand. Your style for writing programs can also help you spot errors as you type in your program. For example, if you always try to make opening and closing braces line up, it is easy to notice if you leave out a brace.

The indentation format used in the rest of this guide demonstrates a clean, readable programming style. All programmers develop personal preferences as they become familiar with a language, and you may decide to deviate from this guide's style in some respects. The important thing is to have a style and to follow it consistently in all your programs.
It may not make much difference now, when your programs are relatively simple; but as your programs become more complex, you will find that style will be an important aid to writing programs that work correctly.

4.3 Using Compound Statements

In an if statement, you might sometimes want to perform several instructions. You can do this by enclosing the instructions in braces. Such a construct is called a compound statement. For example, consider the following program:

    {
        if ($2 > $4) {
            homewin++
            printf "The %s defeated the %s.\n", $1, $3
        } else {
            homeloss++
            printf "The %s defeated the %s.\n", $3, $1
        }
    }
    END {
        printf "The home team won %d times.\n", homewin
        printf "The home team lost %d times.\n", homeloss
    }

The first action is applied to every record in the file. It keeps a count of how many times the home team wins and how many times the home team loses. It also prints out a line telling who defeated whom. The END action summarizes the results after they have been calculated.

As another example, the following program examines the games involving the Orioles:

    $1 ~ /Orioles/ {
        if ($2 > $4) {
            win++                       # Home win
            printf "%s: %d, %s: %d\n", $1, $2, $3, $4
        } else {
            loss++                      # Home loss
            printf "%s: %d, %s: %d\n", $3, $4, $1, $2
        }
    }
    $3 ~ /Orioles/ {
        if ($4 > $2) {
            win++                       # Away win
            printf "%s: %d, %s: %d\n", $3, $4, $1, $2
        } else {
            loss++                      # Away loss
            printf "%s: %d, %s: %d\n", $1, $2, $3, $4
        }
    }
    END { printf "Wins: %d, Losses: %d\n", win, loss }

Each line of output from the first two actions has the following form:

    Winning team: score, Losing team: score

The final line of output (from the END rule) summarizes the Orioles' wins and losses.

Examine this program closely to see how it works. The program is straightforward, but you should make sure you understand how it covers all the possible cases.

One if statement can contain another.
For example, the previous program could have been written as follows:

    /Orioles/ {
        if ($2 > $4) {                  # Home team wins
            printf "%s: %d, %s: %d\n", $1, $2, $3, $4
            if ($1 ~ /Orioles/)
                win++
            else
                loss++
        } else {                        # Home team loses
            printf "%s: %d, %s: %d\n", $3, $4, $1, $2
            if ($3 ~ /Orioles/)
                win++
            else
                loss++
        }
    }
    END { printf "Wins: %d, Losses: %d\n", win, loss }

This version of the program determines whether the game was won by the home team, prints out the scores with the winner first, and then checks to see if the Orioles were the home team or the visitors. The previous version of the program split the problem into two parts: one action performed when the Orioles were the home team and one when they were not.

4.4 The while Loop

A while loop repeats one or more other instructions as long as a given condition holds true. A while loop has the following format:

    while (expression)
        statement

The statement can be a single statement or a compound statement. For example, the file numbers contains one to ten random numbers on each line. The following program adds up the numbers on each line and prints the line's total:

    {
        sum = 0
        i = 1
        while (i <= NF) {
            sum = sum + $i
            i = i + 1
        }
        print sum
    }

The variable i counts fields in the record. While i is less than or equal to the total number of fields in the record, the while loop adds the value of the ith field to sum and then adds 1 to i. The loop then starts again; if the new value of i is still less than or equal to the total number of fields, the loop adds the value of the next field. The loop stops when i is greater than NF.

As another example, here is a program that uses the same data file and prints out the maximum value on each line:

    {
        max = $1            # starting max is field 1
        i = 2
        while (i <= NF) {
            if ($i > max)
                max = $i
            i = i + 1
        }
        print max
    }

On each line, the variable max starts out with the value of the first field (the first number).
The while loop then moves across the record number by number, using an if statement to test whether a field is greater than the current value of max. If a greater value is found, max is assigned the new maximum value. After the loop, the maximum value is printed.

What does this program do if there is only one number on a particular line? In that case, NF is 1. The nawk program would execute the following statements and find that i (now 2) was already greater than NF:

    max = $1
    i = 2
    while (i <= NF)

Therefore, nawk would not execute any of the instructions in the while loop at all. If the condition part of a while loop is false when the loop is first encountered, the statements in the loop are not executed.

As an exercise, try to write a program that reads a normal text file and writes out the text, one word per line.

4.5 The for Loop

A for loop is another way to repeat instructions as long as a given condition holds true. A for loop has the following format:

    for (expression1; expression2; expression3)
        statement

This loop is equivalent to the following instruction sequence:

    expression1
    while (expression2) {
        statement
        expression3
    }

For example, the following program uses a for loop that counts down from NF to print the words of each input line in reverse order:

    {
        for (i = NF; i > 0; i--)
            printf "%s ", $i
        printf "\n"
    }

(To solve the exercise given at the end of Section 4.4, count up instead and print one field per line, as in for (i = 1; i <= NF; i++) print $i.)

The program that prints the maximum value in an input line could be written as follows:

    {
        max = $1
        for (i = 2; i <= NF; i++)
            if ($i > max)
                max = $i
        print max
    }

As you can see, the for loop is just a shorthand way of writing a certain kind of while loop. Another form of the for loop is described in Chapter 6.

4.6 The next Statement

The next statement tells nawk to skip immediately to the next record in the data file. In the following example, a next statement is added to the baseball score program from Section 4.2.
    {
        if (NF < 4) {
            printf "Not enough fields: %s\n", $0
            next
        }
        if ($2 > $4)
            print "Home Win"
        else
            print "Home Loss"
    }

If a particular record has fewer than four fields, this program prints a warning message and skips to processing the next record. This bypasses the rest of the instructions in the rule. It also bypasses any other rules that might normally be applied to this record. As this example shows, next is often used when a program finds a record that does not have the format you expect.

You can also use next to skip to the next record if you do not want the record processed by any of the remaining rules. For example:

    $1 ~ /Orioles/ { count++; next }
    $3 ~ /Orioles/ { count++ }

This program prevents the record from being counted twice if it happens to have Orioles in both the first and third fields. You could also write this program as follows:

    ($1 ~ /Orioles/) || ($3 ~ /Orioles/) { count++ }

Using the next instruction inside a BEGIN rule tells nawk to start normal processing (by reading the first record of the first file). In other words, the next instruction indicates that you have finished the action associated with the BEGIN pattern. (This use of next is not supported in all versions of awk.)

4.7 The exit Statement

The exit statement makes a nawk program behave as if it has just reached the end of data input. No further input is read. If there is an END action, it is executed before the program terminates. As with next, exit is often used when input data is found to be in error. If exit appears inside the END action, it terminates the program immediately.

5 String Manipulation

The preceding chapters have used quoted strings extensively. This chapter discusses strings in more detail and shows the various operations that manipulate strings.

5.1 String Variables

In Chapter 2, you learned how to use numeric variables: variables that contain numbers. Variables can also contain strings.
For example:

    a = "string"

This statement assigns a string to a variable a. As an example of how this can be used, here is a simple program that checks a text file for duplicate lines (places where two adjacent lines are identical):

    {
        if ($0 == lastline)
            printf "%d: %s\n", FNR, $0
        lastline = $0
    }

The variable lastline represents the contents of the previous line in the file. In the action of the program, the current record $0 is compared to the previous record (stored in lastline). If the two are equal, the printf action prints the line number FNR and the contents of the line. At the end of the action, lastline is assigned the contents of the current line (so that it can be compared to the next line).

You might wonder what lastline contains when the program first begins. After all, nothing is assigned to lastline until the first line has been read. All string variables begin with a null string value. A null string is a string, but it contains no characters. It is written "" (two quotation marks with nothing between them). When used in an arithmetic expression, a null string has the value 0.

As another example of a program that uses string variables, here is a program that writes out the last line of a file:

        { line = $0 }
    END { print line }

The value of each input line is assigned to the variable line. At the end of the file, line contains the contents of the last line in the file. Therefore, the END action prints out the contents of that line.

5.1.1 Built-In String Variables

In Chapter 3, you learned about the built-in numeric variables NF, NR, and FNR. The nawk language also provides the built-in string variables shown in Table 5-1.

Table 5-1: Built-In String Variables

Variable
    Description

FILENAME
    Contains the name of the current input file. For example, when you apply programs to the hobbies file, the value of FILENAME is hobbies (if that is the file you are using). If the input is coming from the nawk standard input, the value of FILENAME is the string "-".

FS
    The field separator string.
    Specifies the character that is used to separate fields in the current file. The default value for FS is " " (a single blank), which as a special case matches both blank and tab. However, if the command line contains a -F option specifying a different field separator, FS is a string containing the given separator character. A program can also assign values to FS to indicate new field separator characters. For example, you could create a data file whose first line gives the character that is to be used to separate fields in the records in the rest of the file. A nawk program could then contain the following rule:

        FNR == 1 { FS = $0 }

    This says that the field separator string FS is to be assigned the contents of the first record in the current data file. The character in this line will then be used as the field separator for the rest of the file (unless the program changes the value of FS again).

    Any FS value of more than one character is used as a regular expression. See the INPUT section of the nawk(1) reference page for details.

RS
    The input record separator string. Just as FS specifies the string that is used to separate fields within records, RS specifies the string that is used to separate one record from another. By default, RS contains a new-line character, which means that input records are separated by new-line characters. However, a different character may be assigned to RS. For example, the following statement says that input records are separated by semicolons (;):

        RS = ";"

    This would let you have several records on one line, or a single record that extends over several lines. To separate records by empty lines, specify the following:

        RS = ""

OFS
    The output field separator string. When the print action is used to print several values, as in { print A, B, C }, the output field separator string is printed between each pair of adjacent values. By default, OFS contains a single blank character.
    However, if you make the assignment OFS = " : ", the output values will be separated by space-colon-space.

ORS
    The output record separator string. When the print action is used, the output record separator is printed at the end of each record. By default, ORS is the new-line character.

OFMT
    The default output format for numbers when they are printed by print. This is a format string like the one used by printf. By default, it is "%.6g", indicating that numbers are to be printed with at most six significant digits. By changing OFMT, you can display more or less precision.

5.1.2 String vs. Numeric Variables

Because string variables start out with the null string value while numeric variables start out as 0, the question arises: how can nawk differentiate between string and numeric variables, especially when execution is starting and a variable has not been used yet?

The answer is that a variable is assumed to contain a string unless you use it as a number. For example, if you have a program that consists of { print X } with no value assigned to X, the variable is assumed to be a string. Thus, the output will be a blank line for each line of input; if X had been taken as a number, the output would be zero for each line of input.

In an action like X = $1, the variable X will be taken as a number if the form of $1 looks like a number; otherwise, it will be taken as a string. Consider the record in the following example:

    3 ...

Here, the first field looks like a number, so X will normally be taken to be a numeric variable. On the other hand, consider this example:

    7ABC ...

The first field cannot be a number (even though it starts with a digit), so X will be taken to be a string variable.

There are times when you want a value to be treated as a string, even though it looks like a number. For example, suppose a file contains the string 1e1.
In some contexts, this could be a number (with an exponential part); in other contexts, you might want to interpret this as a string. To make sure that a value is taken as a string, even when it might look numeric, concatenate it with an empty string by placing a pair of quotation marks ("") after it. For example:

    X = $2 ""

This makes sure that the value in $2 is interpreted as a string, even if it looks like a number. Therefore, X will be a string variable.

Similarly, if you want to make sure that a value is taken to be a number, just add zero to it. For example:

    X = $3 + 0

In this case, $3 will be taken to be a number because it is involved in an arithmetic operation.

What happens if $3 is not a valid number? If $3 starts with something that looks like a number, as in 7ABC, the numeric value of the string is that leading number. Thus, the numeric value of 7ABC is 7. If the field does not start with anything that looks like a number, the numeric value of the string is zero. Thus the numeric value of ABC is 0.

5.2 String Concatenation

When a line in a program contains two or more strings that are separated only by blank characters, the strings are concatenated (joined) into one long string. The following expression is an example of string concatenation:

    $2 ""

The following action prints the contents of the first three fields, joined together into one string:

    { print $1 $2 $3 }

Suppose your input line is:

    A B C

Then the output will be as follows:

    ABC

Consider the following example as applied to the hobbies file:

    $1 ~ /John/ { print "$" $4 }

This example's output looks like this:

    $100.00
    $30.00

The dollar sign ($) is concatenated with the contents of the fourth field in all the appropriate records.

5.3 String Manipulation Functions

Chapter 3 introduced numeric functions like sin and sqrt.
The nawk language also provides the following functions that perform string operations:

length
    Returns an integer that is the length of the current record (the number of characters in the record, without the new-line on the end). For example, the following program calculates the total number of characters in a file (except for new-line characters):

            { sum = sum + length }
        END { print sum }

length(s)
    Returns an integer that is the length of the string s. For example, the following program prints out the length of the first field in each record of the data file:

        { print length($1) }

    The function call length($0) is equivalent to length.

gsub(regexp, replacement)
    Puts the replacement string replacement in place of every string matching the regular expression regexp in the current record. For example:

        {
            gsub(/John/, "Jonathan")
            print
        }

    This program checks every record in the data file for the regular expression John. Every matching string is replaced with Jonathan, and the record is printed out. As a result, the output of the program is exactly like the input except that every occurrence of John has been changed to Jonathan.

    This form of the gsub function returns an integer that tells how many substitutions were made in the current record. This result will be zero if the record has no strings that match regexp.

sub(regexp, replacement)
    Works like gsub, except that it only replaces the first occurrence of a string matching regexp in the current record.

gsub(regexp, replacement, stringvar)
    Puts the replacement string replacement in place of every string matching the regular expression regexp in the string stringvar. For example:

        {
            gsub(/John/, "Jonathan", $1)
            print
        }

    This program is similar to the previous program, but the replacement is only made in the first field of each record.

    This form of the gsub function returns an integer that tells how many substitutions were made in stringvar.
sub(regexp, replacement, stringvar)
    Works like gsub, except that it replaces only the first occurrence of a string matching regexp in the string stringvar.

index(string, substring)
    Searches the given string for the appearance of the given substring. If the substring cannot be found, index returns zero; otherwise, it returns the number (origin 1) of the character in string where substring begins. For example:

        index("abcd", "cd")

    This expression returns the integer 3 because cd is found beginning at the third character of abcd.

match(string, regexp)
    Determines if string contains a substring that matches the regular expression (pattern) regexp. If so, match returns an index giving the position of the matching substring within string; if not, it returns zero. This function also sets a variable named RSTART to the index where the matching string starts, and sets a variable named RLENGTH to the length of the matching string.

substr(string, pos)
    Returns the last part of string, beginning at a particular character position. The argument pos is an integer, giving the number of a character. Numbering begins at 1. For example:

        substr("abcd", 3)

    The value of this expression is the string cd.

substr(string, pos, length)
    Returns the part of string that begins at the character position given by pos and has the length given by length. For example:

        substr("abcdefg", 3, 2)

    The value of this expression is cd (a string of length 2 beginning at position 3).

sprintf(format, value1, value2, ...)
    Returns the string value that would be printed by the following printf action:

        printf(format, value1, value2, ...)

    For example,

        str = sprintf("%d %d!!!\n", 2, 3)

    assigns the string "2 3!!!\n" to the string variable str.

tolower(string)
    Returns the value of string, but with all the letters in lowercase. (This function is not found in all versions of awk.)

toupper(string)
    Returns the value of string, but with all the letters in uppercase.
    (This function is not found in all versions of awk.)

ord(string)
    Converts the first character of string into a number. This number gives the decimal value of the character in the ASCII character set. (This function is not found in all versions of awk.)

6 Arrays

In most programming languages, an array is an ordered list of values, similar to a table of information. Arrays in nawk are more flexible than arrays in most other languages, but it is helpful to begin by discussing the traditional concept of an array.

6.1 Arrays with Integer Subscripts

The simplest sort of array is a list of values (either numbers or strings). The values in the list are called the elements of the array. Elements in an array are most commonly referred to by number. For example, the first element in the array could be number 1, the second could be number 2, and so on. These numbers are called subscripts of the array elements.

A nawk array has a name, similar to a variable name. To refer to an element of an array, you give the name of the array followed by brackets containing the element's subscript. For example:

    arr[3]

This expression refers to element 3 in an array named arr. A statement like the following creates an array named arr whose elements are all the fields of the current record:

    for (i = 1; i <= NF; i++)
        arr[i] = $i

The following program stores the entire contents of the input file in an array called lines:

    { lines[NR] = $0 }

Remember that the variable NR is incremented by 1 for each line that is read in, so the elements in the lines array will be the lines of the input file, in order. The following program reads the contents of a data file and stores the input in lines:

        { lines[NR] = $0 }
    END {
        for (i = NR; i > 0; i--)
            print lines[i]
    }

When all the lines have been read in, the END action prints out the lines in reverse order. The program therefore reads lines of text and then prints them in reverse order.
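The reverse-order program can be packaged as follows. This is a minimal sketch assuming a POSIX-compatible awk installed as awk (on ULTRIX, substitute nawk), and the function name reverse_lines is hypothetical. Because every record is kept in the lines array until the END action runs, memory use grows with the size of the input.

```shell
# reverse_lines (hypothetical name): the Section 6.1 program that stores
# each record in lines[NR] and prints the records last to first.
reverse_lines() {
    awk '
        { lines[NR] = $0 }
    END {
        for (i = NR; i > 0; i--)
            print lines[i]
    }' "$@"
}
```

For example, printf 'one\ntwo\nthree\n' | reverse_lines prints three, two, and one on separate lines. With empty input, NR is 0 in the END action, so the loop body never runs and nothing is printed.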
As another example of the simple use of arrays, suppose you have a file that contains 12 columns of numbers and you want to add up the numbers in each column. You could do this with the following program:

    {
        for (i = 1; i <= 12; i++)
            sum[i] = sum[i] + $i
    }
    END {
        for (i = 1; i <= 12; i++)
            print sum[i]
    }

Each element in the array called sum holds a running total of the numbers in the corresponding column.

Notice that the previous examples make extensive use of the for statement. This is true of many programs that use arrays.

Also notice that you do not need a special statement to create (declare) an array. If a statement in a program contains a name followed by a value in brackets, the name is assumed to refer to an array, and the array is created automatically. A name must not be used as both a variable and an array in the same nawk program.

6.2 Generalized Arrays

Most programming languages let you create arrays that use numbers as subscripts; nawk also lets you create arrays that have string values as subscripts. For example, here is a program that calculates how much each person spends on all his or her hobbies:

    { money[$1] += $4 }

The array in this program is named money; the subscripts are the names of the people in the hobbies file. The elements of the array are therefore as follows:

    money["Jim"]
    money["Linda"]
    money["John"]

(Note that the following statements are equivalent:

    money[$1] += $4
    money[$1] = money[$1] + $4

This notation is explained in Section 8.3.)

Apply this program to the following input record:

    Jim reading 15 100.00

The action becomes:

    money["Jim"] += 100.00

As with all numeric variables, money["Jim"] starts out with a value of zero. At the end of the program, the array element will contain the amount of money that Jim spends on all his hobbies.
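A minimal sketch of the money array at work follows. It assumes a POSIX-compatible awk installed as awk and hobbies records laid out as name, hobby, hours, and cost; the function name hobby_spending and its person argument (handed to awk with -v as the variable who) are hypothetical. The END action looks up one named element, so it needs no loop over the array.

```shell
# hobby_spending (hypothetical name): accumulates field 4 (cost) per
# person in money[$1], then prints the total for the person named in
# the first argument. Any remaining arguments are input files.
hobby_spending() {
    person=$1
    shift
    awk -v who="$person" '
        { money[$1] += $4 }
    END { printf "%s: %.2f\n", who, money[who] }
    ' "$@"
}
```

For example, given the hypothetical records Jim reading 15 100.00 and Jim bird-watching 10 25.50, hobby_spending Jim prints Jim: 125.50. A name that never appears prints 0.00, because an unused array element starts out empty and counts as zero in printf's %f conversion.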
To print the contents of the money array, you can use a new form of the for statement:

    for (s in money)
        print s, money[s]

This form of the for statement executes the print action once for every value that is used as a subscript for the money array. In each loop, the variable s has one of the subscript values. Therefore, the first time through the loop, s might have the value Jim, the next time Linda, and so on. The order is undefined.

Therefore, the complete program prints out the amount that each person spends on his or her hobbies:

        { money[$1] += $4 }
    END {
        for (s in money)
            print s, money[s]
    }

Run this program to see how it works. After you have done so, replace the print action with printf to produce more understandable output.

Generalized arrays have a wide variety of applications. For example, the following program produces a list of all the words used in an input text file:

    {
        for (i = 1; i <= NF; i++)
            wordlist[$i] = 1
    }
    END {
        for (x in wordlist)
            print x
    }

Assigning 1 to each element of wordlist is just a dummy action; the important thing is that the program creates an element of wordlist whose subscript value is one of the words in the input text file. The for loop in the END action then prints out all the words that were used as subscript values; this list is the set of all words used in the file.

As an exercise, modify the preceding program so that it keeps a count of how often each word is used in the input file. At the end, the program should print out each word that appears in the file and how often the word was used.

6.2.1 String Subscripts vs. Numeric Subscripts

This chapter began by showing arrays with numeric subscripts because those types of arrays are most familiar to programmers. However, all nawk array subscripts are converted to strings. For example, the subscript in a[1] is converted to a string, giving a["1"].
In a[01], the numeric subscript is first converted to its simplest form, a[1], which is then converted to the string a["1"] as before. A floating point subscript whose value is a whole number is converted to the equivalent integer, then converted to the corresponding string. Thus a[1.0] is converted to a[1] and then to a["1"]. Therefore, the following forms are all equivalent:

    a[1]
    a[1.0]
    a["1"]

Note that the array element a["01"] is not equivalent to the ones in the preceding examples because "1" is not the same string as "01".

6.3 Deleting Array Elements

Because array elements are stored in the computer's memory, you can decrease memory requirements by deleting elements when you are finished using them. To do this, use the following statement:

    delete arrayname[subscript]

For example:

    delete money["Jim"]

As an extension of standard awk, the following statement deletes the entire array:

    delete money

This statement is equivalent to the following:

    for (ind in money)
        delete money[ind]

6.4 Multidimensional Arrays

The nawk language lets you define arrays with more than one subscript. Subscripts are separated by commas and enclosed in brackets, as in the following examples:

    a[1,2] = 3
    b["cat", "dog", "bird"] = "horse"

The following example creates a multidimensional array that records different animal names:

    name["chicken", "female"] = "hen"
    name["chicken", "male"]   = "rooster"
    name["chicken", "young"]  = "chick"
    name["cattle", "female"]  = "cow"
    name["cattle", "male"]    = "bull"
    name["cattle", "young"]   = "calf"

As you can see, it is simple to create and manipulate a database that is just a multidimensional nawk array.

7 User-Defined Functions

Previous chapters discuss numeric functions like sin and sqrt, and string functions like gsub and length. This chapter shows how nawk lets you create your own functions to perform similar kinds of operations.
7.1 Defining Functions

In a nawk program, a function definition looks like this:

    function name(argument-list) {
        statements
    }

The argument-list is a list of one or more names, separated by commas, that represent argument values passed to the function. When an argument name is used in the statements of a function, it is replaced by a copy of the corresponding argument value.

For example, here is a simple function that takes a single numeric argument N and returns a random integer between 1 and N (inclusive):

    function random(N) {
        return (int(N * rand() + 1))
    }

This function uses two built-in functions discussed in Chapter 3: rand (which returns a random floating point number that is at least 0 but less than 1) and int (which returns the integer part of a floating point number). The expression N * rand() + 1 therefore yields a random floating point number between 1 and N+1 (not including N+1 itself). Applying the int function to this floating point number obtains an integer between 1 and N. The return statement returns this value as the result of the function random.

Once you define the random function, you can use it anywhere in your program that you would use other functions. For example, if you have a file that contains people's names in its first field, and each of these people is going to roll two six-sided dice, you could simulate this situation with the following program:

    function random(N) {
        return (int(N * rand() + 1))
    }
    {
        score = random(6) + random(6)
        printf "%s rolls %d\n", $1, score
    }

This program consists of a definition for the random function and a rule to be applied to every record in the file. The score variable contains the sum of two simulated six-sided die rolls. This value is printed, along with the name of the person who rolled the dice.

You can test this program on the hobbies file. Remember, however, that the file contains several lines for most people, so the output will show more than one roll per person.
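The dice program can be run as shown in this sketch, which assumes a POSIX-compatible awk installed as awk; the function name roll_dice is hypothetical. It adds one thing the original program does not have: a call to srand() in a BEGIN action, which seeds rand from the time of day. Without it, many awk implementations return the same sequence of random numbers on every run.

```shell
# roll_dice (hypothetical name): the random() function and dice rule
# from Section 7.1, plus srand() so that successive runs differ.
roll_dice() {
    awk '
    function random(N) {
        return (int(N * rand() + 1))    # integer from 1 to N
    }
    BEGIN { srand() }                   # seed from the time of day
    {
        score = random(6) + random(6)
        printf "%s rolls %d\n", $1, score
    }' "$@"
}
```

For example, printf 'Jim\nLinda\n' | roll_dice prints one line per name, such as Jim rolls 7. Whatever the seed, every score lies between 2 and 12, since each call to random(6) returns an integer from 1 to 6.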
As another example of the random function, here is the program used to generate the random baseball scores in the baseball file. The input data file contains a single line giving the names of baseball teams (separated by tabs).

    BEGIN { FS = "\t" }                 # Tab is field separator

    function random(N) {
        # Produce random number between 1 and N
        return (int(N * rand() + 1))
    }

    {
        # Read in names of baseball teams
        for (i = 1; i <= NF; i++)
            team[i] = $i

        # Generate 100 random scores
        for (i = 1; i <= 100; i++) {
            # Choose teams
            hometeam = team[random(NF)]
            visteam = team[random(NF)]

            # Make sure teams are different
            while (hometeam == visteam)
                visteam = team[random(NF)]

            # Generate scores
            homescore = random(13)
            visscore = random(13)

            # Make sure scores are different
            while (homescore == visscore)
                visscore = random(13)

            # Print out score
            printf "%s\t%d\t", hometeam, homescore
            printf "%s\t%d\n", visteam, visscore
        }
    }

The comments in the program should make it easy to understand what is happening in each section. The program chooses two different teams at random from the list in an input file. It then assigns each team a random score from 1 to 13 (a range typical of baseball scores), making sure that the two scores differ, and prints the results with two printf statements. (We could also have used a single printf statement.)

As another example of the random function, here is the program used to generate the random lists of numbers in the numbers file:

    function random(N) {
        # Produce random integer between 1 and N
        return (int(N * rand() + 1))
    }
    BEGIN {
        for (i = 1; i <= 30; i++) {
            for (j = random(10); j > 0; j--)
                printf "%d ", random(100)
            printf "\n"
        }
        exit
    }

This program has only a BEGIN rule. This rule prints out 30 lines, each of which contains a random number (from 1 to 10) of integers in the range 1 to 100. Note that random is used both to choose the integers and to decide how many of these integers will appear on each line.

7.2 Recursion

A function can call itself; this process is called recursion.
One example of a recursive function is the factorial function, which is called with the following form:

    factorial(N)

This factorial function produces the number that is the product of all positive integers less than or equal to N. For example:

    factorial(4)

The result of this expression is 4x3x2x1, or 24. The factorial of any N less than 1 is defined as 1. The following function definition defines the factorial function recursively:

    function factorial(N) {
        if (N <= 1)
            return 1
        else
            return N * factorial(N-1)
    }

If N is less than or equal to 1, the factorial is 1. Otherwise, the factorial of N is N times the factorial of N-1. Thus the factorial of 4 (4x3x2x1) is 4 times the factorial of 3 (3x2x1). The factorial function calls itself recursively to figure out the appropriate result.

By the way, the factorial function demonstrates that a function can have more than one return statement. When a return statement is executed, the function immediately stops executing and returns the given value as the function result.

7.3 Call By Value

When a program calls a user-defined function, nawk makes copies of the argument values passed to the function, and the function does all its work using those copies. For example, suppose a program is using a variable named X and calls a user-defined function F:

    F(X)

The function F is given a copy of the current value of X. Because F has only a copy, the function cannot affect the current value of X. For example, consider this program:

    function exchange(A, B) {
        temp = A
        A = B
        B = temp
    }
    {
        exchange($1, $2)
        print $0
    }

In this program, it appears that the exchange function swaps the values of arguments A and B. The value of A is temporarily stored in temp; the value of B is assigned to A, and the saved value of A is assigned to B. Now, when the main rule of the program issues the function call

    exchange($1, $2)

does nawk swap the values of the first two fields of the current record?
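One way to find out is to run the program. The sketch below assumes a POSIX-compatible awk is available as awk; the wrapper try_exchange and the helper function swap_elements are hypothetical additions. The first print shows the record after calling exchange; the second shows, for contrast, that swapping two elements of an array does take effect, because arrays are passed by reference (Section 7.4).

```shell
#!/bin/sh
# Sketch only: assumes a POSIX-compatible awk is available as "awk".
# try_exchange (hypothetical) prints each record after calling
# exchange($1, $2), then the first two fields after swapping them
# inside an array.
try_exchange() {
  awk '
    # exchange receives copies of its arguments, so the swap below
    # changes only the copies, never the fields of the record.
    function exchange(A, B) {
      temp = A
      A = B
      B = temp
    }
    # swap_elements (hypothetical) changes the array itself, because
    # arrays are passed by reference. The extra parameter t acts as a
    # local variable, a common awk convention.
    function swap_elements(arr, i, j,   t) {
      t = arr[i]
      arr[i] = arr[j]
      arr[j] = t
    }
    {
      exchange($1, $2)
      print $0                 # the fields are unchanged
      split($0, f)
      swap_elements(f, 1, 2)
      print f[1], f[2]         # the array elements really were swapped
    }
  '
}

echo "first second third" | try_exchange
```

This prints first second third (the record is untouched) followed by second first.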
No, the function is working only with copies of the two fields; the function does not change the fields themselves.

Note that the definition of exchange does not have a return statement. It is not necessary for functions to return values. If a function does not have a return statement, the function ends when the last statement is executed. If a function does not use return to return a result, do not use that function as if it did return a result. A function with no return statement yields a meaningless (undefined) result value.

7.4 Passing Arrays to Functions

When an array is passed as an argument to a function, it is passed by reference. This means that the function works with the actual array, not with a copy. Anything that the function does to the array has an effect on the original array.

For example, the split function is a built-in function that takes an array as an argument. It has the following form:

    split(string, array)

The split function breaks up the string into fields and assigns each of the fields to an element of the array. The first field is assigned to array[1], the next to array[2], and so on. Fields are assumed to be separated with the field separator string FS. If you want to use a different field separator string, you can use the following format:

    split(string, array, fsstring)

The value of fsstring is the field separator string you want to use instead of FS. The result of split is the number of fields that string contained.

Note that split actually changes the elements of array. In the same way, when an array is passed to a user-defined function, the function may change the array elements.

Chapter 8: Enhancing Your nawk Programs

This chapter discusses additional ways you can tailor your nawk programs to serve your needs.

8.1 The getline Function

The getline function reads input from the current data file or from a different file. The function has several different forms, discussed in the sections that follow.
8.1.1 Reading from the Current Input

In its simplest form, getline is called as follows:

    getline

This reads a new record from the current data file. The function automatically changes the value of $0 and all the other field values. It also changes variables like NF, NR, and FNR. In other words, using getline in this way is exactly like what happens when nawk reads in a new record in the normal way. For example:

    /XYZ/ { print ; getline ; print }

First, this rule prints any record that contains the string XYZ. Next, the getline function reads the next record, and the final print prints that new record. Therefore, the rule prints every record that contains XYZ and also the record that follows (regardless of what the next record contains).

When getline reads a new record, the previous record is discarded; subsequent rules are applied to the new record, if appropriate. For example:

    /XYZ/ { print ; getline ; print }
    /ABC/ { ... some action ... }

The ABC rule in this program will be applied to the new record (if appropriate); it will not be applied to the XYZ record because that record is discarded when the new record is read.

If a call to getline appears in the BEGIN action, nawk immediately starts reading the first data file specified on the command line.

8.1.2 Reading a Line into a String Variable

The getline function can also be called in the following form:

    getline variable

This form reads a new line from the current data file but assigns the contents of the line to the named string variable. The variables NR and FNR are changed to reflect that another record has been read from the input data file; however, the contents of $0 and NF are unchanged.
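The two forms described in Sections 8.1.1 and 8.1.2 can be compared side by side. This is a minimal sketch, assuming a POSIX-compatible awk is available as awk; the wrapper names print_match_and_next and show_getline_var, and the variable name line, are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes a POSIX-compatible awk is available as "awk".

# print_match_and_next (hypothetical): the rule from Section 8.1.1.
# Plain getline replaces $0 with the next record, so the second print
# shows the record that follows each XYZ record.
print_match_and_next() {
  awk '/XYZ/ { print ; getline ; print }'
}

# show_getline_var (hypothetical): "getline line" stores the next line
# in the variable line and advances NR, but leaves $0 and NF alone.
show_getline_var() {
  awk 'NR == 1 {
    getline line
    print NR, $0, NF, line
  }'
}

printf 'one\nXYZ here\nafter\nlast\n' | print_match_and_next
printf 'a b\nc d e\n' | show_getline_var    # prints "2 a b 2 c d e"
```

The first command prints XYZ here and then after. If XYZ appears on the last record, getline has nothing left to read, returns 0, and leaves $0 unchanged, so that record is printed twice.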
Because $0 still holds the old record, the following example reads a line into the variable X and compares this new line with the old line stored in $0:

    getline X
    if (X == $0)
        print "Duplicate line"

8.1.3 Reading from a New File

Another form of getline reads a line from a different file instead of the current data file:

    getline var <"filename"

This form of the function reads a line from the given file and stores the contents of the line in the string variable var. For example, here is a simple program that compares the current data file to another file named testfile and prints a message if the two are not identical:

    {
        getline X <"testfile"
        if ($0 != X)
            print "Not identical!"
    }

This rule is executed for every line in the data file. Every time the action is executed, the getline function reads a new line from testfile and compares it with the current line from the data file. For every line read from the current data file, another line is read from testfile and the two lines are compared. Each time the two lines differ, the message "Not identical!" is printed.

A program may also call getline with the form

    getline <"filename"

In this case, a line is read from the given file and assigned to $0. The value of NF is changed to reflect the new record in $0, but the variables NR and FNR are not changed because the record was not read from the current data file.

8.1.4 Reading from Other Commands

The getline function can also be used to read data produced by another command or program:

    "command" | getline var

This form of the function executes the given command and gathers the command’s output. The first line of output is piped into (assigned to) the string variable var; each later execution of the same expression reads the next line of output.
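Because a single "command" | getline var reads just one line, reading all of a command's output takes a loop: getline returns 1 for each line it reads and 0 when the output is exhausted. The sketch below assumes a POSIX-compatible awk available as awk, with the command run by the system shell; it closes the pipe afterward with the close function described in Section 8.1.5. The wrapper name count_command_lines is hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes a POSIX-compatible awk is available as "awk".
# count_command_lines (hypothetical) runs the command given as $1 from
# inside awk and prints how many lines it produced and the last line.
count_command_lines() {
  awk -v cmd="$1" 'BEGIN {
    n = 0
    # Each call reads the next output line; getline returns 1 when it
    # reads a line, 0 at the end of the output, and -1 on error.
    while ((cmd | getline line) > 0) {
      n++
      last = line
    }
    close(cmd)    # release the pipe (and allow the command to be rerun)
    print n, last
  }'
}

count_command_lines 'echo one; echo two; echo three'    # prints "3 three"
```

Parenthesizing (cmd | getline line) before comparing with 0 keeps the pipe and the comparison from being parsed in an unintended order.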
For example, the following statement executes the date command and assigns the output of the command to the string variable now:

    "date" | getline now

The following statements read the current date into the variable now and check to see if the date string contains Apr:

    "date" | getline now
    if (now ~ /.*Apr.*/)
        print "April Shower Time!"

You can also pipe command output into $0. This is done with a statement of the following form:

    "command" | getline

This form of getline changes the value of $0 and NF but does not change NR or FNR.

8.1.5 Redirecting Output to Files and Pipes

You can redirect the output of print and printf to a file or a pipe. Details are given in the Output section of the nawk(1) reference page.

Only a limited number of files and pipes can be open at one time. You can use the close function to close files and pipes during execution. In this way, any number of files and pipes can be used during the execution of a nawk program. You can close both input files (used by getline) and output files (used by print and printf).

8.2 The system Function

The previous section showed how you can execute programs and system commands from nawk programs using the getline function. You can also execute commands with the system function. This function has the following form:

    system("command line")

The following statement executes a cd command to change the current directory to directory XYZ:

    system("cd XYZ")

Because system runs the command line in a separate shell, the change of directory applies only to that shell, which exits when the command line finishes; it does not change the current directory of nawk itself or of commands run by later system calls. To run a command in XYZ, put both steps in one command line, for example system("cd XYZ; ls").

8.3 Compound Assignments

The nawk language lets you use a shorthand notation for some common assignment operations. For example, the following statements are equivalent:

    sum = sum + value
    sum += value

The second form is simpler to write. The += operation is an example of a compound assignment.
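A quick way to confirm that the shorthand and longhand forms agree. This minimal sketch assumes a POSIX-compatible awk is available as awk; the wrapper names sum_hours and compound_demo are hypothetical, and the sample records are taken from the hobbies file in Appendix B.

```shell
#!/bin/sh
# Sketch only: assumes a POSIX-compatible awk is available as "awk".

# sum_hours (hypothetical): totals field 3 for records whose first field
# equals the given name, once with += and once with the longhand form.
sum_hours() {
  awk -v who="$1" '
    $1 == who {
      sum += $3               # compound assignment
      total = total + $3      # equivalent longhand assignment
    }
    END { print sum + 0, total + 0 }
  '
}

# compound_demo (hypothetical): applies the compound assignment
# operators in turn to one variable, printing the value after each.
compound_demo() {
  awk 'BEGIN {
    a = 10
    a += 5;  print a    # 15
    a -= 3;  print a    # 12
    a *= 2;  print a    # 24
    a /= 8;  print a    # 3
    a ^= 2;  print a    # 9
    a %= 4;  print a    # 1
  }'
}

printf 'John role-playing 8 100.00\nJohn jogging 8 30.00\nLori bridge 2 0.00\n' |
  sum_hours John      # prints "16 16"
compound_demo
```

Both totals agree, and each compound operator gives the same result as writing out its longhand equivalent.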
Table 8-1 shows all the compound assignment operations of nawk and their equivalents:

Table 8-1: Compound Assignments

    Compound Operation    Equivalent
    A += B                A = A + B
    A -= B                A = A - B
    A *= B                A = A * B
    A /= B                A = A / B
    A %= B                A = A % B
    A ^= B                A = A ^ B

For example, you could use the following program on the hobbies file to calculate how many hours a week John spends on his hobbies:

    /John/ { sum += $3 }
    END    { print sum }

The END rule prints the total after all the records have been read.

8.4 The sortgen Program

It can be difficult to remember all of the options to the sort command. As an example of the power of nawk, this section presents a nawk program, named sortgen, that generates the correct sort options for a specification. The sortgen program is described in detail in The AWK Programming Language. Briefly, sortgen takes a description of the layout of the fields in a record and emits a command line for sort that will carry out the desired sort. Note that sortgen uses 1-origin field numbering (the first field to be sorted on is field 1), and writes the sort command line to use sort’s 0-origin field labeling.

Example 8-1 shows the definition of sortgen:

Example 8-1: sortgen Program for nawk

    # sortgen - generate sort command
    # input:  sequence of lines describing sort options
    # output: sort command line

    BEGIN { key = 0 }

    /no |not |n't / { print "error: cannot do negatives:", $0; ok = 1 }

    # rules for global variables
    { ok = 0 }
    /uniq|discard.*(iden|dupl)/ { uniq = " -u"; ok = 1 }
    /separ.*tab|tab.*separ/     { sep = "t'\t'"; ok = 1 }
    /separ/ { for (i = 1; i <= NF; i++)
                  if (length($i) == 1)
                      sep = "t'" $i "'"
              ok = 1
            }
    /key/   { key++; dokey(); ok = 1 }   # new key; must come in order

    # rules for each key
    /dict/                            { dict[key] = "d"; ok = 1 }
    /ignore.*(space|blank)/           { blank[key] = "b"; ok = 1 }
    /fold|case/                       { fold[key] = "f"; ok = 1 }
    /num/                             { num[key] = "n"; ok = 1 }
    /rev|descend|decreas|down|oppos/  { rev[key] = "r"; ok = 1 }
    /month/                           { month[key] = "M"; ok = 1 }
    /forward|ascend|increas|up|alpha/ { next }   # this is sort's default
    !ok { print "error: cannot understand:", $0 }

    END {                       # print flags for each key
        cmd = "sort" uniq
        flag = dict[0] blank[0] fold[0] rev[0] num[0] month[0] sep
        if (flag) cmd = cmd " -" flag
        for (i = 1; i <= key; i++)
            if (pos[i] != "") {
                flag = pos[i] dict[i] blank[i] fold[i] rev[i] num[i] month[i]
                if (flag) cmd = cmd " +" flag
                if (pos2[i]) cmd = cmd " -" pos2[i]
            }
        print cmd
    }

    function dokey(   i) {      # determine position of key
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+$/) {
                pos[key] = $i - 1   # sort uses 0-origin
                break
            }
        for (i++; i <= NF; i++)
            if ($i ~ /^[0-9]+$/) {
                pos2[key] = $i
                break
            }
        if (pos[key] == "")
            printf("error: invalid key specification: %s\n", $0)
        if (pos2[key] == "")
            pos2[key] = pos[key] + 1
    }

Appendix A: Order of Operations

This appendix lists the order of operations for nawk, from highest precedence (operations done first) to lowest (operations done last). You can use parentheses () to change this ordering.
    Operators                                  Description
    $i  V[a]                                   field, array element
    V++  V--  ++V  --V                         increment, decrement
    A^B                                        exponentiation
    +A  -A  !A                                 unary plus, unary minus, logical NOT
    A*B  A/B  A%B                              multiplication, division, remainder
    A+B  A-B                                   addition, subtraction
    A B                                        string concatenation
    A<B  A>B  A<=B  A>=B  A!=B  A==B           comparison
    A~B  A!~B                                  regular expression matching
    A in V                                     array membership
    A && B                                     logical AND
    A || B                                     logical OR
    A?B:C                                      conditional expression
    V=B  V+=B  V-=B  V*=B  V/=B  V%=B  V^=B    assignment

In this table, A, B, and C can be any expression; i is any expression yielding an integer; and V is any variable.

Appendix B: Example Files

This appendix contains copies of all the example files used in this manual.

The hobbies File

Fields in this file are separated by spaces. When creating files that will use nawk’s default value for FS, you can enter a single space or as many spaces as needed to make the fields align neatly.

    Jim     reading         15    100.00
    Jim     bridge           4     10.00
    Jim     role-playing     5     70.00
    Linda   bridge          12     30.00
    Linda   cartooning       5     75.00
    Katie   jogging         14    120.00
    Katie   reading         10     60.00
    John    role-playing     8    100.00
    John    jogging          8     30.00
    Andrew  wind-surfing    20   1000.00
    Lori    jogging          5     30.00
    Lori    weight-lifting  12    200.00
    Lori    bridge           2      0.00

The baseball File

Fields in this file are separated by tabs. Note that the fields do not line up uniformly when you look at the file on your terminal. This irregularity occurs because exactly one tab is used between fields; using multiple tabs to make the fields line up in neat columns would result in nawk’s seeing two adjacent tabs as the field separators before and after an empty field.
When creating the baseball file, key in the information as in this example, pressing the Tab key where [TAB] appears:

    % cat > baseball
    Brewers[TAB]5[TAB]Tigers[TAB]9

Here is the file:

Brewers 5 Tigers Brewers 2 Blue Blue Jays 9 Jays 8 Red 6 Sox Indians 6 Blue Yankees 7 Brewers Orioles 10 Indians 1 Brewers 6 Yankees 3 Red Sox 3 Indians 12 Red 6 Yankees 2 8 Brewers Blue Sox Jays Jays 7 7 2 2 Sox Yankees Brewers Tigers Jays 12 Blue Jays 11 Indians Blue Blue 10 Jays Jays o Tigers Jays Blue Red Brewers 10 Red Indians 4 Tigers 12 8 Brewers Blue Jays Sox Sox 9 9 Yankees 11 Tigers 2 Orioles 5 Red 6 Yankees 12 Blue Orioles 1 Red Sox 8 Yankees 5 Brewers 4 Orioles 6 Indians 13 Indians 12 Tigers 9 Red 3 Blue Sox Blue Jays Yankees 9 Sox Jays Orioles Orioles 6 10 Indians 7 Red 5 Orioles 2 Yankees 13 Brewers 6 Orioles 4 Brewers 6 Yankees 11 Indians 9 Tigers 4 Indians 13 Red Sox 3 Brewers 10 Yankees 1 Indians 8 Yankees 8 Tigers 10 Orioles 1 Blue s 9 Indians 8 Blue Brewers 2 Orioles 5 Brewers 2 Indians 7 Orioles 7 Indians 2 Yankees 4 Orioles 6 Red 1 Orioles 12 Tigers 6 Brewers 13 Indians 1 Yankees 12 Orioles 8 Red 7 Yankees 9 Brewers 13 Tigers 8 Indians 7 Indians 1 Blue Sox Blue Jays 12 9 Orioles Sox 13 Jays 12 Jays Indians Jays 8 Red Indians 12 Tigers S Yankees 8 Indians 5 8 Sox Indians 2 Orioles 12 Brewers 6 Red 2 Brewers 13 Indians 9 9 Tigers 2 Yankees 11 Orioles 1 Blue Red Sox 5 Yankees 9 Brewers 3 Tigers 13 Blue Jays Orioles Sox Jays 8 Red Sox Blue Jays 11 Brewers 3 Tigers 7 Brewers Brewers 2 Tigers 5 9 Red Sox Jays 4 Indians 5 Yankees 12 Orioles 5 Brewers 4 Blue Jays Tigers 2 Blue Jays Orioles 4 Blue Jays Red Sox 7 Jays Blue Blue 8 9 Jays Sox 8 w Red Jays Blue [+)} Orioles Blue Q© Indians HWOdHO-JO N Orioles Orioles 10 Brewers Tigers 5 Red Sox 2 Brewers 9 Tigers 12 11 Tigers Jays 2 Blue Brewers 12 Orioles 6 Indians Red Sox Yankees Indians Yankees Orioles Sox Red Yankees Indians WO Yankees W 0NN Blue 3 Tigers 8 Tigers 7 Brewers 11 Brewers 11 Red Sox 11 Yankees 5 Yankees 10 Tigers 13 Brewers 8 Red Yankees
Orioles Orioles 6 Yankees 4 Red Sox 11 Ry Indians Tigers =Y Brewers Sox Red =N Indians 12 13 Brewers Indians 6 Red Sox 11 Oriocles 12 w Brewers 1 13 Jays 120 Blue O WHF Indians Sox Jays Indians 9 Brewers 8

The numbers File

Fields in this file are separated by spaces.

74 8 33 87 66 40 46 68 45 50 53 40 5 19 54 12 55 35 44 21 66 43 20 98 44 58 2 70 77 22 60 55 12 12 2 20 5 100 12 43 10 46 1 57 7 2 52 46 83 42 46 58 17 90 43 14 84 63 69 7 65 63 15 59 71 63 95 82 82 83 63 73 35 82 24 14 23 60 35 94 48 59 33 39 99 90 88 51 50 58 1 50 36 42 41 76 88 68 5 49 68 56 41 45 33 72 40 7 94 5 56 69 96 21 46 52 47 26 26 36 28 93 63 20 5 56 88 79 60 55 1 1 91 12 36 42 12 57 63 13 35 33 11 47 10 31 26 95 44 55 19 94 86 64 47 60 49 35 45 89 34 79 65 17 73 96 67 58

How to Order Additional Documentation

Technical Support

If you need help deciding which documentation best meets your needs, call 800-343-4040 before placing your electronic, telephone, or direct mail order.

Electronic Orders

To place an order at the Electronic Store, dial 800-234-1998 using a 1200- or 2400-baud modem from anywhere in the USA, Canada, or Puerto Rico. If you need assistance using the Electronic Store, call 800-DIGITAL (800-344-4825).

Telephone and Direct Mail Orders

    Your Location              Call            Contact
    Continental USA, Alaska,   800-DIGITAL     Digital Equipment Corporation
    or Hawaii                                  P.O. Box CS2008
                                               Nashua, New Hampshire 03061
    Puerto Rico                809-754-7575    Local Digital Subsidiary
    Canada                     800-267-6215    Digital Equipment of Canada
                                               Attn: DECdirect Operations KAO2/2
                                               P.O. Box 13000
                                               100 Herzberg Road
                                               Kanata, Ontario, Canada K2K 2A6
    International                              Local Digital subsidiary or
                                               approved distributor
    Internal*                                  SSB Order Processing - WMO/E15
                                               or
                                               Software Supply Business
                                               Digital Equipment Corporation
                                               Westminster, Massachusetts 01473

    * For internal orders, you must submit an Internal Software Order Form (EN-01740-07).

Reader’s Comments

ULTRIX Guide to the nawk Utility
AA-PBKPA-TE

Please use this postage-paid form to comment on this manual. If you require a written reply to a software problem and are eligible to receive one under Software Performance Report (SPR) service, submit your comments on an SPR form.

Thank you for your assistance.
Please rate this manual (Excellent, Good, Fair, Poor):

    Accuracy (software works as manual says)
    Completeness (enough information)
    Clarity (easy to understand)
    Organization (structure of subject matter)
    Figures (useful)
    Examples (useful)
    Index (ability to find topic)
    Page layout (easy to find information)

What would you like to see more/less of?
What do you like best about this manual?
What do you like least about this manual?
Please list errors you have found in this manual (page and description):
Additional comments or suggestions to improve this manual:
What version of the software described by this manual are you using?

Name/Title, Dept., Company, Date, Mailing Address, Email, Phone

Business Reply Mail, First-Class Mail Permit No. 33, Maynard, MA. No postage necessary if mailed in the United States; postage will be paid by addressee:

    DIGITAL EQUIPMENT CORPORATION
    OPEN SOFTWARE PUBLICATIONS MANAGER
    ZKO3-2/Z04
    110 SPIT BROOK ROAD
    NASHUA NH 03062-9987