UNIX in a Nutshell: A Desktop Quick Reference for SVR 4 and Solaris 7

Paperback (Third Edition)

Product Details

ISBN-13: 9781565924277
Publisher: O'Reilly Media, Incorporated
Publication date: 09/01/1999
Series: In a Nutshell (O'Reilly) Series
Edition description: Third Edition
Pages: 624
Product dimensions: 6.03(w) x 8.99(h) x 1.16(d)

About the Author

Arnold Robbins, an Atlanta native, is a professional programmer and technical author. He has worked with Unix systems since 1980, when he was introduced to a PDP-11 running a version of Sixth Edition Unix. He has been a heavy AWK user since 1987, when he became involved with gawk, the GNU project's version of AWK. As a member of the POSIX 1003.2 balloting group, he helped shape the POSIX standard for AWK. He is currently the maintainer of gawk and its documentation. He is also coauthor of the sixth edition of O'Reilly's Learning the vi Editor. Since late 1997, he and his family have been living happily in Israel.

Read an Excerpt

Chapter 11: The awk Programming Language

11.1 Conceptual Overview

awk is a pattern-matching program for processing files, especially when they are databases. The new version of awk, called nawk, provides additional capabilities.[1] Every modern Unix system comes with a version of new awk, and its use is recommended over old awk.

[1] It really isn't so new. The additional features were added in 1984, and it was first shipped with System V Release 3.1 in 1987. Nevertheless, the name was never changed on most systems.

Different systems vary in what the two versions are called. Some have oawk and awk, for the old and new versions, respectively. Others have awk and nawk. Still others only have awk, which is the new version. This example shows what happens if your awk is the old one:

$ awk 1 /dev/null
awk: syntax error near line 1
awk: bailing out near line 1

awk exits silently if it is the new version.
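The same probe can be wrapped in a small shell test, for example at the top of an installation script. This is only a sketch; the variable name awk_kind is illustrative and not part of awk.

```shell
# Classify the awk found on PATH. The one-word program "1" is a pattern
# with no procedure, which new awk accepts and old awk rejects.
if awk 1 /dev/null 2>/dev/null; then
    awk_kind="new"
else
    awk_kind="old"
fi
echo "awk is the $awk_kind version"
```

On a system whose awk is old, the script could then try nawk or gawk in the same way.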

Source code for the latest version of awk, from Bell Labs, can be downloaded starting at Brian Kernighan's home page: http://cm.bell-labs.com/~bwk. Michael Brennan's mawk is available via anonymous FTP from ftp://ftp.whidbey.net/pub/brennan/mawk1.3.3.tar.gz. Finally, the Free Software Foundation has a version of awk called gawk, available from ftp://gnudist.gnu.org/gnu/gawk/gawk-3.0.4.tar.gz. All three programs implement "new" awk. Thus, references below such as "nawk only," apply to all three. gawk has additional features.

With original awk, you can:

  • Think of a text file as made up of records and fields in a textual database.

  • Perform arithmetic and string operations.

  • Use programming constructs such as loops and conditionals.

  • Produce formatted reports.

With nawk, you can also:

  • Define your own functions.

  • Execute Unix commands from a script.

  • Process the results of Unix commands.

  • Process command-line arguments more gracefully.

  • Work more easily with multiple input streams.

  • Flush open output files and pipes (latest Bell Labs awk).

In addition, with GNU awk (gawk), you can:

  • Use regular expressions to separate records, as well as fields.

  • Skip to the start of the next file, not just the next record.

  • Perform more powerful string substitutions.

  • Retrieve and format system time values.
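Two of the nawk additions listed above, running a Unix command and processing a command's output, can be sketched as follows. The commands used here (echo and test) are arbitrary stand-ins chosen only because they exist on any Unix system.

```shell
out=$(awk 'BEGIN {
    # Process the results of a Unix command: read its output line by line.
    cmd = "echo one; echo two; echo three"
    while ((cmd | getline line) > 0)
        n++
    close(cmd)

    # Execute a Unix command: system() returns its exit status.
    status = system("test -d /")

    print n, status
}')
echo "$out"    # prints: 3 0
```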

11.2 Command-Line Syntax

The syntax for invoking awk has two forms:

awk  [options]  'script'  var=value  file(s)
awk  [options]  -f scriptfile  var=value  file(s)

You can specify a script directly on the command line, or you can store a script in a scriptfile and specify it with -f. nawk allows multiple -f scripts. Variables can be assigned a value on the command line. The value can be a literal, a shell variable ($name), or a command substitution (`cmd`), but the value is available only after the BEGIN statement is executed.

awk operates on one or more files. If none are specified (or if - is specified), awk reads from the standard input.
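The timing of a var=value argument is easy to see in a short test. In this sketch the variable name n and the sample input are made up; the trailing - makes awk read the standard input after the assignment has been processed.

```shell
out=$(printf 'a\nb\n' | awk '
BEGIN { print "BEGIN sees [" n "]" }             # assignment not yet processed
      { print "record " NR " sees [" n "]" }
' n=5 -)
echo "$out"
```

The BEGIN line shows an empty value, while both records show 5.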

The recognized options are:

-Ffs

Set the field separator to fs. This is the same as setting the system variable FS. Original awk allows the field separator to be only a single character. nawk allows fs to be a regular expression. Each input line, or record, is divided into fields by whitespace (blanks or tabs) or by some other user-definable field separator. Fields are referred to by the variables $1, $2,..., $n. $0 refers to the entire record.

-v var=value

Assign a value to variable var. This allows assignment before the script begins execution (available in nawk only).

To print the first three (colon-separated) fields of each record on separate lines:

awk -F: '{ print $1; print $2; print $3 }' /etc/passwd

More examples are shown in the section "Simple Pattern-Procedure Examples."
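As a further sketch of the two options, the following command (new awk only) uses a regular expression as the field separator and sets a variable with -v so that it is already visible in BEGIN. The variable name prefix and the input line are invented for illustration.

```shell
out=$(printf 'a1b22c\n' | awk -F'[0-9]+' -v prefix='field' '
BEGIN { print "BEGIN sees prefix=" prefix }
      { for (i = 1; i <= NF; i++) print prefix i "=" $i }
')
echo "$out"
```

Each run of digits acts as one separator, so the record splits into the three fields a, b, and c.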

11.3 Patterns and Procedures

awk scripts consist of patterns and procedures:

pattern  { procedure }

Both are optional. If pattern is missing, { procedure } is applied to all lines; if { procedure } is missing, the matched line is printed.

11.3.1 Patterns

A pattern can be any of the following:

/regular expression/
relational expression
pattern-matching expression
BEGIN
END

  • Expressions can be composed of quoted strings, numbers, operators, functions, defined variables, or any of the predefined variables described later in the section "Built-in Variables."

  • Regular expressions use the extended set of metacharacters and are described in Chapter 6, Pattern Matching.

  • ^ and $ refer to the beginning and end of a string (such as the fields), respectively, rather than the beginning and end of a line. In particular, these metacharacters will not match at a newline embedded in the middle of a string.

  • Relational expressions use the relational operators listed in the section "Operators" later in this chapter. For example, $2 > $1 selects lines for which the second field is greater than the first. Comparisons can be either string or numeric. Thus, depending on the types of data in $1 and $2, awk does either a numeric or a string comparison. This can change from one record to the next.

  • Pattern-matching expressions use the operators ~ (match) and !~ (don't match). See the section "Operators" later in this chapter.

  • The BEGIN pattern lets you specify procedures that take place before the first input line is processed. (Generally, you set global variables here.)

  • The END pattern lets you specify procedures that take place after the last input record is read.

  • In nawk, BEGIN and END patterns may appear multiple times. The procedures are merged as if there had been one large procedure.
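The switch between numeric and string comparison described for $2 > $1 can be seen with two sample records. This is a small sketch with made-up data: in the first record both fields look like numbers, so 9 > 10 is tested numerically; in the second, 10x is not a number, so the fields are compared as strings.

```shell
out=$(printf '10 9\n10x 9\n' | awk '$2 > $1')
echo "$out"    # prints: 10x 9
```

Only the second record is selected, because the string "9" sorts after the string "10x".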

Except for BEGIN and END, patterns can be combined with the Boolean operators || (or), && (and), and ! (not). A range of lines can also be specified using comma-separated patterns:

pattern,pattern
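Both forms can be tried on a few lines of sample text. The data and the variable names below are illustrative only.

```shell
data='# a comment line here
START
alpha beta gamma
x
END
tail one two'

# Range pattern: every record from the first match through the second.
range=$(printf '%s\n' "$data" | awk '/^START$/,/^END$/')

# Boolean combination: more than two fields AND not a comment line.
combo=$(printf '%s\n' "$data" | awk 'NF > 2 && !/^#/')

printf '%s\n--\n%s\n' "$range" "$combo"
```

The range selects the four lines from START through END; the combined pattern selects the two non-comment lines that have three fields.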

11.3.2 Procedures

Procedures consist of one or more commands, functions, or variable assignments, separated by newlines or semicolons, and contained within curly braces. Commands fall into five groups:

  • Variable or array assignments

  • Printing commands

  • Built-in functions

  • Control-flow commands

  • User-defined functions (nawk only)

11.3.3 Simple Pattern-Procedure Examples

  • Print first field of each line:

    { print $1 }

  • Print all lines that contain pattern:

    /pattern/

  • Print first field of lines that contain pattern:

    /pattern/ { print $1 }

  • Select records containing more than two fields:

    NF > 2

  • Interpret input records as a group of lines up to a blank line. Each line is a single field:

    BEGIN { FS = "\n"; RS = "" }

  • Print fields 2 and 3 in switched order, but only on lines whose first field matches the string "URGENT":

    $1 ~ /URGENT/ { print $3, $2 }

  • Count and print the number of lines that contain pattern:

    /pattern/ { ++x }
    END { print x }

  • Add numbers in second column and print total:

    { total += $2 }
    END { print "column total is", total}

  • Print lines that contain fewer than 20 characters:

    length($0) < 20

  • Print each line that begins with Name: and that contains exactly seven fields:

    NF == 7 && /^Name:/

  • Print the fields of each input record in reverse order, one per line:

    {
            for (i = NF; i >= 1; i--)
                    print $i
    }
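The multiline-record example above sets FS and RS but has no procedure of its own. A possible way to complete it, using invented address data, is to print the first line of each record along with the number of lines it contains:

```shell
out=$(printf 'Ann\n12 Oak St\n\nBob\n9 Elm Rd\nApt 4\n' | awk '
BEGIN { FS = "\n"; RS = "" }                    # blank-line-separated records
      { print NR ": " $1 " (" NF " lines)" }
')
echo "$out"
```

With this input the two records are reported as "1: Ann (2 lines)" and "2: Bob (3 lines)".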

11.4 Built-in Variables

Version  Variable     Description
awk      FILENAME     Current filename
         FS           Field separator (a space)
         NF           Number of fields in current record
         NR           Number of the current record
         OFMT         Output format for numbers ("%.6g") and for conversion to string
         OFS          Output field separator (a space)
         ORS          Output record separator (a newline)
         RS           Record separator (a newline)
         $0           Entire input record
         $n           nth field in current record; fields are separated by FS
nawk     ARGC         Number of arguments on command line
         ARGV         An array containing the command-line arguments, indexed from 0 to ARGC - 1
         CONVFMT      String conversion format for numbers ("%.6g") (POSIX)
         ENVIRON      An associative array of environment variables
         FNR          Like NR, but relative to the current file
         RLENGTH      Length of the string matched by match() function
         RSTART       First position in the string matched by match() function
         SUBSEP       Separator character for array subscripts ("\034")
gawk     ARGIND       Index in ARGV of current input file
         ERRNO        A string indicating the error when a redirection fails for getline or if close() fails
         FIELDWIDTHS  A space-separated list of field widths to use for splitting up the record, instead of FS
         IGNORECASE   When true, all regular expression matches, string comparisons, and calls to index() ignore case
         RT           The text matched by RS, which can be a regular expression in gawk
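The difference between NR and FNR, and the effect of OFS, show up as soon as awk reads more than one file. The sketch below creates two throwaway files (the names one and two are arbitrary) and needs new awk for FNR.

```shell
tmp=$(mktemp -d)
printf 'a b\nc d e\n' > "$tmp/one"
printf 'f\n'          > "$tmp/two"

# OFS is placed between the comma-separated items of each print.
out=$(cd "$tmp" && awk 'BEGIN { OFS = "-" } { print FILENAME, NR, FNR, NF }' one two)
rm -rf "$tmp"
echo "$out"
```

NR keeps counting across both files, while FNR starts again at 1 for the second file.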

11.5 Operators

The following table lists the operators, in order of increasing precedence, that are available in awk. Note: while ** and **= are common extensions, they are not part of POSIX awk.

Symbol                     Meaning
= += -= *= /= %= ^= **=    Assignment
?:                         C conditional expression (nawk only)
||                         Logical OR (short-circuit)
&&                         Logical AND (short-circuit)
in                         Array membership (nawk only)
~ !~                       Match regular expression and negation
< <= > >= != ==            Relational operators
(blank)                    Concatenation
+ -                        Addition, subtraction
* / %                      Multiplication, division, and modulus (remainder)
+ - !                      Unary plus and minus, and logical negation
^ **                       Exponentiation
++ --                      Increment and decrement, either prefix or postfix
$                          Field reference
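A few of the less obvious consequences of this ordering can be checked directly. The expressions below are a sketch chosen for illustration; they need new awk because of the ?: operator.

```shell
out=$(awk 'BEGIN {
    print -2 ^ 2                      # ^ binds tighter than unary minus: -(2^2)
    print 2 ^ 3 ^ 2                   # ^ groups right to left: 2^(3^2)
    print 1 " " 2 + 3                 # + binds tighter than concatenation
    answer = (1 > 2) ? "yes" : "no"   # ?: is evaluated before the assignment
    print answer
}')
echo "$out"
```

The four lines printed are -4, 512, "1 5", and no.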

11.6 Variables and Array Assignments

Variables can be assigned a value with an = sign. For example:

FS = ","

Expressions using the operators +, -, *, /, and % (modulo) can be assigned to variables.

Arrays can be created with the split() function (see below), or they can simply be named in an assignment statement. Array elements can be subscripted with numbers (array[1], ..., array[n]) or with strings. Arrays subscripted by strings are called associative arrays.[2] For example, to count the number of widgets you have, you could use the following script:

/widget/ { count["widget"]++ }               # Count widgets
END      { print count["widget"] }            # Print the count

[2] In fact, all arrays in awk are associative; numeric subscripts are converted to strings before using them as array subscripts. Associative arrays are one of awk's most powerful features.

You can use the special for loop to read all the elements of an associative array:

for (item in array)
        process array[item]

The index of the array is available as item, while the value of an element of the array can be referenced as array[item].
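As a concrete sketch of this loop, the following command counts how often each word occurs in some made-up input. awk does not define the order in which for (item in array) visits the elements, so the output is passed through sort.

```shell
out=$(printf 'red blue red\ngreen red blue\n' | awk '
{ for (i = 1; i <= NF; i++) count[$i]++ }     # one element per distinct word
END {
    for (word in count)
        print word, count[word]
}' | sort)
echo "$out"
```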

You can use the operator in to see if an element exists by testing to see if its index exists (nawk only):

if (key in array)
        ...

This sequence tests that array[key] exists, but you cannot use it to test the value of the element referenced by array[key].

You can also delete individual elements of the array using the delete statement (nawk only).
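One reason to prefer the in operator is that merely referring to an element creates it. The sketch below (new awk only) illustrates this along with delete; the array name seen and the helper function status are hypothetical names used just for this example.

```shell
out=$(awk '
function status(arr, key) { return (key in arr) ? "present" : "absent" }
BEGIN {
    seen["a"] = 1
    print "before reference:", status(seen, "b")
    empty = (seen["b"] == "")       # merely reading seen["b"] creates it
    print "after reference:", status(seen, "b")
    delete seen["b"]
    print "after delete:", status(seen, "b")
}')
echo "$out"
```

The element is absent at first, present after the comparison has touched it, and absent again once it is deleted.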

11.6.1 Escape Sequences

Within string and regular expression constants, the following escape sequences may be used. Note: The \x escape sequence is a common extension; it is not part of POSIX awk.

Sequence  Meaning            Sequence  Meaning
\a        Alert (bell)       \v        Vertical tab
\b        Backspace          \\        Literal backslash
\f        Form feed          \nnn      Octal value nnn
\n        Newline            \xnn      Hexadecimal value nn
\r        Carriage return    \"        Literal double quote (in strings)
\t        Tab                \/        Literal slash (in regular expressions)
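A short test exercises several of these sequences, avoiding the nonstandard \x form. The strings are arbitrary examples.

```shell
out=$(awk 'BEGIN {
    printf "name\tvalue\n"            # tab and newline
    print  "a \"quoted\" word"        # literal double quotes
    print  "back\\slash"              # literal backslash
    print  "\101\102\103"             # octal codes for A, B, and C
    if ("a/b" ~ /a\/b/)               # literal slash inside a regular expression
        print "slash matched"
}')
echo "$out"
```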

11.7 User-Defined Functions

nawk allows you to define your own functions. This makes it easy to encapsulate sequences of steps that need to be repeated into a single place, and reuse the code from anywhere in your program. Note: for user-defined functions, no space is allowed between the function name and the left parenthesis when the function is called.

The following function capitalizes each word in a string. It has one parameter, named input, and five local variables, which are written as extra parameters.

# capitalize each word in a string
function capitalize(input,    result, words, n, i, w)
{
        result = ""
        n = split(input, words, " ")
        for (i = 1; i <= n; i++) {
                w = words[i]
                w = toupper(substr(w, 1, 1)) substr(w, 2)
                if (i > 1)
                        result = result " "
                result = result w
        }
        return result
}

# main program, for testing
{ print capitalize($0) }

With this input data:

A test line with words and numbers like 12 on it.

This program produces:

A Test Line With Words And Numbers Like 12 On It.
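A second, smaller sketch shows two further points about functions: arrays are passed by reference, and the extra parameters really are local. The function name maxof and the test data are made up for illustration.

```shell
out=$(printf '3 17 5\n' | awk '
# Return the largest of the first n elements of arr.
# i and best are local variables, written as extra parameters.
function maxof(arr, n,    i, best)
{
    best = arr[1] + 0                 # adding 0 forces a numeric value
    for (i = 2; i <= n; i++)
        if (arr[i] + 0 > best)
            best = arr[i] + 0
    return best
}

{
    i = "untouched"                   # a global with the same name as the local i
    n = split($0, nums, " ")
    print maxof(nums, n), i
}')
echo "$out"    # prints: 17 untouched
```

The global i keeps its value even though the function uses a variable of the same name in its loop.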

11.8 Group Listing of awk Functions and Commands

The following table classifies awk functions and commands.

Arithmetic  String      Control Flow  I/O          Time         Program-
Functions   Functions   Statements    Processing   Functions    ming
atan2[3]    gensub[4]   break         close[3]     strftime[4]  delete[3]
cos[3]      gsub[3]     continue      fflush[5]    systime[4]   function[3]
exp         index       do/while[3]   getline[3]                system[3]
int         length      exit          next
log         match[3]    for           nextfile[5]
rand[3]     split       if            print
sin[3]      sprintf     return[3]     printf
sqrt        sub[3]      while
srand[3]    substr
            tolower[3]
            toupper[3]

[3] Available in nawk.

[4] Available in gawk.

[5] Available in Bell Labs awk and gawk.
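Several of the string functions in the table can be tried together in one BEGIN procedure. This is a sketch using an arbitrary sample string; it sticks to functions available in new awk and avoids the gawk-only ones.

```shell
out=$(awk 'BEGIN {
    s = "hello, world"
    print length(s), index(s, "world")            # 12 8
    print toupper(substr(s, 1, 5))                # HELLO
    if (match(s, /o+/)) print RSTART, RLENGTH     # 5 1
    n = gsub(/o/, "0", s); print n, s             # 2 hell0, w0rld
    print sprintf("%05.1f", 3.14159)              # 003.1
}')
echo "$out"
```

Note that gsub() changes s in place and returns the number of substitutions made.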

11.9 Implementation Limits

Many versions of awk have various implementation limits, on things such as:

  • Number of fields per record

  • Number of characters per input record

  • Number of characters per output record

  • Number of characters per field

  • Number of characters per printf string

  • Number of characters in literal string

  • Number of characters in character class

  • Number of files open

  • Number of pipes open

  • The ability to handle 8-bit characters and characters that are all zero (ASCII NUL)

gawk does not have limits on any of these items, other than those imposed by the machine architecture and/or the operating system.

11.10 Alphabetical Summary of Functions and Commands

The following alphabetical list of keywords and functions includes all that are available in awk, nawk, and gawk. nawk includes all old awk functions and keywords, plus some additional ones (marked as {N}). gawk includes all nawk functions and keywords, plus some additional ones (marked as {G}). Items marked with {B} are available in the Bell Labs awk. Items that aren't marked with a symbol are available in all versions....

Table of Contents

Dedication

Preface

Commands and Shells

Chapter 1: Introduction

Chapter 2: Unix Commands

Chapter 3: The Unix Shell: An Overview

Chapter 4: The Bourne Shell and Korn Shell

Chapter 5: The C Shell

Text Editing and Processing

Chapter 6: Pattern Matching

Chapter 7: The Emacs Editor

Chapter 8: The vi Editor

Chapter 9: The ex Editor

Chapter 10: The sed Editor

Chapter 11: The awk Programming Language

Text Formatting

Chapter 12: nroff and troff

Chapter 13: mm Macros

Chapter 14: ms Macros

Chapter 15: me Macros

Chapter 16: man Macros

Chapter 17: troff Preprocessors

Software Development

Chapter 18: The Source Code Control System

Chapter 19: The Revision Control System

Chapter 20: The make Utility

Appendixes

ASCII Character Set

Obsolete Commands

Bibliography

Colophon

Preface

The third edition of UNIX in a Nutshell (for System V) generally follows the dictum that "if it's not broken, don't fix it." This edition has the following new features:

  • Many mistakes and typographical errors have been fixed.
  • Coverage of Solaris 7, the latest version of the SVR4-based operating system from Sun Microsystems.
  • Coverage of over 50 new commands has been added, mostly in Chapter 2.
  • The Korn shell chapter now covers both the 1988 and the 1993 versions of ksh.
  • The Emacs chapter now covers GNU emacs version 20.
  • A new chapter describes the troff man macros.
  • Each chapter on the troff macro packages comes with a simple example document showing the order in which to use the macros.
  • The troff preprocessors chapter now covers refer and its related programs.
  • The RCS chapter now covers version 5.7 of RCS.
  • A new "UNIX Bibliography" chapter lists books that every UNIX Wizard should have on his or her bookshelf.
  • Coverage of commands that are no longer generally useful but that still come with SVR4 or Solaris has been moved to an appendix.

Audience

This quick reference should be of interest to UNIX users and UNIX programmers, as well as to anyone (such as a system administrator) who might offer direct support to users and programmers. The presentation is geared mainly toward people who are already familiar with the UNIX system - that is, you know what you want to do, and you even have some idea how to do it. You just need a reminder about the details. For example, if you want to remove the third field from a database, you might think, "I know I can use the cut command, but what are the options?" In many cases, specific examples are provided to show how a command is used.

This quick reference might also help people who are familiar with some aspects of UNIX but not with others. Many chapters include an overview of the particular topic. While this isn't meant to be comprehensive, it's usually sufficient to get you started in unfamiliar territory.

And some of you may be coming from a UNIX system that runs the BSD or SunOS 4.1 version. To help with such a transition, SVR4 and Solaris include a group of "compatibility" commands, many of which are presented in this guide.

Finally, if you're new to the UNIX operating system, and you're feeling bold, you might appreciate this book as a quick tour of what UNIX has to offer. Section 1-4, Beginner's Guide, can point you to the most useful commands, and you'll find brief examples of how to use them, but take note: this book should not be used in place of a good beginner's tutorial on UNIX. (You might try Learning the UNIX Operating System for that.) This quick reference should be a supplement, not a substitute. (There are references throughout the text to other relevant O'Reilly books that will help you learn the subject matter under discussion; you may be better off detouring to those books first.)

Scope of This Book

The quick reference is divided into five parts:

  • Part I (Chapters 1 through 5) describes the syntax and options for UNIX commands and for the Bourne, Korn, and C shells.
  • Part II (Chapters 6 through 11) presents various editing tools and describes their command set (alphabetically and by group). Part II begins with a review of pattern matching, including examples geared toward specific editors.
  • Part III (Chapters 12 through 17) describes the nroff and troff text formatting programs, related macro packages, and the preprocessors tbl, eqn, pic, and refer.
  • Part IV (Chapters 18 through 20) summarizes the UNIX utilities for software development - SCCS, RCS, and make.
  • Part V contains loose ends: a table of ASCII characters and equivalent values, a bibliography of UNIX books, and an appendix that covers obsolete commands that are still part of SVR4 and/or Solaris.

Conventions

The quick reference follows certain typographic conventions, outlined below:

Constant Width
is used for directory names, filenames, commands, and options. All terms shown in constant width are typed literally. It is also used to show the contents of files or the output from commands.

Constant Italic
is used in syntax and command summaries to show generic text; these should be replaced with user-supplied values.

Constant Bold
is used in examples and tables to show commands or other text that should be typed literally by the user.

Italic
is used to show generic arguments and options; these should be replaced with user-supplied values. Italic is also used to highlight comments in examples.

Bold Italic
is used for headings.

Bold
is used for summary headings, such as in the command summary chapter.

%, $ are used in some examples as the C shell prompt (%) and as the Bourne shell or Korn shell prompt ($).
?, > are used in some examples as the C shell secondary prompt (?) and as the Bourne shell or Korn shell secondary prompt (>).
program(N) indicates the "man page" for program in section N of the online manual. For example, echo(1) means the entry for the echo command.
[ ] surround optional elements in a description of syntax. (The brackets themselves should never be typed.) Note that many commands show the argument [files]. If a filename is omitted, standard input (usually the keyboard) is assumed. End keyboard input with an end-of-file character.
EOF
indicates the end-of-file character (normally CTRL-D).
^x, CTRL-x
indicates a "control character," typed by holding down the CONTROL key and the x key for any key x.
| is used in syntax descriptions to separate items for which only one alternative may be chosen at a time.
---> is used at the bottom of a right-hand page to show that the current entry continues on the next page. The continuation is marked by a <---.

A final word about syntax. In many cases, the space between an option and its argument can be omitted. In other cases, the spacing (or lack of spacing) must be followed strictly. For example, -wn (no intervening space) might be interpreted differently from -w n. It's important to notice the spacing used in option syntax.

Acknowledgments

Thanks to Yosef Gold for letting me share his office, allowing me to work efficiently and productively. Deb Cameron revised Chapter 7, The Emacs Editor. Thanks to Gigi Estabrook at O'Reilly & Associates for her help and support.

Good reviewers make for good books, even though they also make for more work for the author. I would like to thank Glenn Barry (Sun Microsystems) for a number of helpful comments. Nelson H. F. Beebe (University of Utah Department of Mathematics) went through the book with a fine-tooth comb; it is greatly improved for his efforts. A special thanks to Brian Kernighan (Bell Labs) for his review and comments. The troff-related chapters in particular benefited from his authority and expertise, as did the rest of the book (not to mention much of UNIX!). Nelson H. F. Beebe and Dennis Ritchie (Bell Labs) provided considerable help in putting together Chapter 22, A UNIX Bibliography.

Finally, much thanks to my wonderful wife Miriam; without her love and support this project would not have been possible.

Arnold Robbins
Nof Ayalon, ISRAEL
March, 1999

Acknowledgments From the Second Edition

Many people helped this book along the way. The first edition resulted from the efforts of the following staff members of O'Reilly & Associates: Jean Diaz, Dale Dougherty, Daniel Gilly, Linda Mui, Tim O'Reilly, Thomas Van Raalte, Linda Walsh, Sue Willing, and Donna Woonteiler.

The second edition has a new cover and new interior layout, designed by Edie Freedman. Arthur Saarinen drew the referee figures. Chris Reilley and Jeff Robbins assisted with graphics. The manuscript was formatted using troff macros that were implemented by Linda Mui and Lenny Muellner, and the manuscript was prepared through the efforts of Donna Woonteiler, Sue Willing, and especially Rosanne Wagger. Christine Kenney and Peter Mui were valuable resources, tracking down useful information and passing it along.

Special thanks to the technical reviewers for reading the drafts and fielding all kinds of questions; the book has profited greatly from the comments of Tan Bronson (Microvation Consultants), Peter van der Linden, and Mike Loukides (O'Reilly & Associates).

We'd Like to Hear From You

We have tested and verified all of the information in this book to the best of our ability, but you may find that features have changed (or even that we have made mistakes!). Please let us know about any errors you find, as well as your suggestions for future editions, by writing:

O'Reilly & Associates, Inc.
101 Morris Street
Sebastopol, CA 95472
1-800-998-9938 (in the US or Canada)
1-707-829-0515 (international/local)
1-707-829-0104 (FAX)

You can also send us messages electronically. To be put on the mailing list or request a catalog, send email to:

info@oreilly.com

To ask technical questions or comment on the book, send email to:

bookquestions@oreilly.com
