UNIX in a Nutshell: A Desktop Quick Reference for SVR 4 and Solaris 7 by Arnold Robbins, Paperback | Barnes & Noble
UNIX in a Nutshell: A Desktop Quick Reference for SVR 4 and Solaris 7

by Arnold Robbins
Overview

You may have seen Unix quick-reference guides, but you've never seen anything like UNIX in a Nutshell. Not a scaled-down quick reference of common commands, UNIX in a Nutshell is a complete reference containing all commands and options, along with generous descriptions and examples that put the commands in context. For all but the thorniest Unix problems, this one reference should be all the documentation you need.

The third edition of UNIX in a Nutshell includes thorough coverage of System V Release 4. To that, author Arnold Robbins has added the latest information about:

  • Sixty new commands in The Alphabetical Summary of Commands
  • Solaris 7
  • Shell syntax (sh, csh, and the 1988 and 1993 versions of ksh)
  • Regular expression syntax
  • vi and ex commands, as well as newly updated Emacs information
  • sed and awk commands
  • troff and related commands and macros, with a new section on refer
  • make, RCS (version 5.7), and SCCS commands

In addition, there is a new Unix bibliography to guide the reader to further reading about the Unix environment.

If you currently use Unix SVR4, or if you're a Solaris user, you'll want this book. UNIX in a Nutshell is the most comprehensive quick reference on the market, a must for any Unix user.

Product Details

ISBN-13: 9781565924277
Publisher: O'Reilly Media, Incorporated
Publication date: 09/01/1999
Series: In a Nutshell (O'Reilly) Series
Edition description: Third Edition
Pages: 624
Product dimensions: 6.03(w) x 8.99(h) x 1.16(d)

Read an Excerpt

Chapter 11: The awk Programming Language

11.1 Conceptual Overview

awk is a pattern-matching program for processing files, especially when they are databases. The new version of awk, called nawk, provides additional capabilities.[1] Every modern Unix system comes with a version of new awk, and its use is recommended over old awk.

[1] It really isn't so new. The additional features were added in 1984, and it was first shipped with System V Release 3.1 in 1987. Nevertheless, the name was never changed on most systems.

Different systems vary in what the two versions are called. Some have oawk and awk, for the old and new versions, respectively. Others have awk and nawk. Still others only have awk, which is the new version. This example shows what happens if your awk is the old one:

$ awk 1 /dev/null
awk: syntax error near line 1
awk: bailing out near line 1

awk exits silently if it is the new version.
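
The same test can be scripted. The following is a minimal sketch, assuming a POSIX shell; the variable name awk_flavor is made up for illustration. It relies only on the behavior described above: old awk fails on the program 1, while new awk exits silently with a zero status.

```shell
# Hypothetical check: classify the awk found first on PATH.
# "awk 1 /dev/null" is a syntax error in old awk and a silent success in new awk.
if awk 1 /dev/null 2>/dev/null; then
    awk_flavor="new"
else
    awk_flavor="old"
fi
echo "awk on this system appears to be the $awk_flavor version"
```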

Source code for the latest version of awk, from Bell Labs, can be downloaded starting at Brian Kernighan's home page: http://cm.bell-labs.com/~bwk. Michael Brennan's mawk is available via anonymous FTP from ftp://ftp.whidbey.net/pub/brennan/mawk1.3.3.tar.gz. Finally, the Free Software Foundation has a version of awk called gawk, available from ftp://gnudist.gnu.org/gnu/gawk/gawk-3.0.4.tar.gz. All three programs implement "new" awk. Thus, references below such as "nawk only" apply to all three. gawk has additional features.

With original awk, you can:

  • Think of a text file as made up of records and fields in a textual database.

  • Perform arithmetic and string operations.

  • Use programming constructs such as loops and conditionals.

  • Produce formatted reports.

With nawk, you can also:

  • Define your own functions.

  • Execute Unix commands from a script.

  • Process the results of Unix commands.

  • Process command-line arguments more gracefully.

  • Work more easily with multiple input streams.

  • Flush open output files and pipes (latest Bell Labs awk).

In addition, with GNU awk (gawk), you can:

  • Use regular expressions to separate records, as well as fields.

  • Skip to the start of the next file, not just the next record.

  • Perform more powerful string substitutions.

  • Retrieve and format system time values.

11.2 Command-Line Syntax

The syntax for invoking awk has two forms:

awk  [options]  'script'  var=value  file(s)
awk  [options]  -f scriptfile  var=value  file(s)

You can specify a script directly on the command line, or you can store a script in a scriptfile and specify it with -f. nawk allows multiple -f scripts. Variables can be assigned a value on the command line. The value can be a literal, a shell variable ($name), or a command substitution (`cmd`), but the value is available only after the BEGIN statement is executed.

awk operates on one or more files. If none are specified (or if - is specified), awk reads from the standard input.

The recognized options are:

-Ffs

Set the field separator to fs. This is the same as setting the system variable FS. Original awk allows the field separator to be only a single character. nawk allows fs to be a regular expression. Each input line, or record, is divided into fields by whitespace (blanks or tabs) or by some other user-definable field separator. Fields are referred to by the variables $1, $2,..., $n. $0 refers to the entire record.

-v var=value

Assign a value to variable var. This allows assignment before the script begins execution (available in nawk only).
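
The difference between a var=value operand and -v can be seen in a small sketch. It assumes a new awk and a POSIX shell; the variable n and the shell names operand_out and option_out are invented for the illustration.

```shell
# Operand assignment: n is still unset while BEGIN runs,
# and is set by the time the first record is read ("-" means standard input).
operand_out=$(echo line |
    awk 'BEGIN { print "BEGIN sees [" n "]" } { print "main sees [" n "]" }' n=5 -)

# -v assignment: n is already set when BEGIN runs.
option_out=$(awk -v n=5 'BEGIN { print "BEGIN sees [" n "]" }')

printf '%s\n%s\n' "$operand_out" "$option_out"
```

If a value is needed inside BEGIN, -v is usually the simpler choice.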

To print the first three (colon-separated) fields of each record on separate lines:

awk -F: '{ print $1; print $2; print $3 }' /etc/passwd

More examples are shown in the section "Simple Pattern-Procedure Examples."
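
Because nawk accepts a regular expression as the field separator, one -F option can cover several delimiters. This is a sketch with made-up input, assuming a new awk; fs_out is a hypothetical shell variable.

```shell
# Fields are separated by a comma or semicolon, followed by any number of spaces.
fs_out=$(printf 'alpha,  beta;gamma\n' |
    awk -F'[,;] *' '{ print $2; print NF }')
echo "$fs_out"
```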

11.3 Patterns and Procedures

awk scripts consist of patterns and procedures:

pattern  { procedure }

Both are optional. If pattern is missing, { procedure } is applied to all lines; if { procedure } is missing, the matched line is printed.

11.3.1 Patterns

A pattern can be any of the following:

/regular expression/
relational expression
pattern-matching expression
BEGIN
END

  • Expressions can be composed of quoted strings, numbers, operators, functions, defined variables, or any of the predefined variables described later in the section "Built-in Variables."

  • Regular expressions use the extended set of metacharacters and are described in Chapter 6, Pattern Matching.

  • ^ and $ refer to the beginning and end of a string (such as the fields), respectively, rather than the beginning and end of a line. In particular, these metacharacters will not match at a newline embedded in the middle of a string.

  • Relational expressions use the relational operators listed in the section "Operators" later in this chapter. For example, $2 > $1 selects lines for which the second field is greater than the first. Comparisons can be either string or numeric. Thus, depending on the types of data in $1 and $2, awk does either a numeric or a string comparison. This can change from one record to the next.

  • Pattern-matching expressions use the operators ~ (match) and !~ (don't match). See the section "Operators" later in this chapter.

  • The BEGIN pattern lets you specify procedures that take place before the first input line is processed. (Generally, you set global variables here.)

  • The END pattern lets you specify procedures that take place after the last input record is read.

  • In nawk, BEGIN and END patterns may appear multiple times. The procedures are merged as if there had been one large procedure.
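
The note above about string versus numeric comparison is easy to miss, so here is a small sketch with invented data, assuming a new awk. On the first line both fields look like numbers and are compared numerically; on the second, "9a" is not a number, so the two fields are compared as strings.

```shell
# Line 1: 9 > 10 is false numerically, so nothing is printed.
# Line 2: "9a" > "10" is true as a string comparison ("9" sorts after "1").
cmp_out=$(printf '10 9\n10 9a\n' | awk '$2 > $1 { print "selected:", $0 }')
echo "$cmp_out"
```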

Except for BEGIN and END, patterns can be combined with the Boolean operators || (or), && (and), and ! (not). A range of lines can also be specified using comma-separated patterns:

pattern,pattern
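
As an illustration only, the following sketch applies a range pattern and a Boolean combination to the same made-up input. The marker words START and END in the data are arbitrary text, not the awk keywords.

```shell
input='a
START
b
c
END
d'

# Range: print from the first line matching /START/ through the next line matching /END/.
range_out=$(printf '%s\n' "$input" | awk '/START/,/END/')

# Boolean combination: skip the first record and any line containing an uppercase letter.
bool_out=$(printf '%s\n' "$input" | awk 'NR > 1 && !/[A-Z]/')

printf '%s\n--\n%s\n' "$range_out" "$bool_out"
```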

11.3.2 Procedures

Procedures consist of one or more commands, functions, or variable assignments, separated by newlines or semicolons, and contained within curly braces. Commands fall into five groups:

  • Variable or array assignments

  • Printing commands

  • Built-in functions

  • Control-flow commands

  • User-defined functions (nawk only)

11.3.3 Simple Pattern-Procedure Examples

  • Print first field of each line:

    { print $1 }

  • Print all lines that contain pattern:

    /pattern/

  • Print first field of lines that contain pattern:

    /pattern/ { print $1 }

  • Select records containing more than two fields:

    NF > 2

  • Interpret input records as a group of lines up to a blank line. Each line is a single field:

    BEGIN { FS = "\n"; RS = "" }

  • Print fields 2 and 3 in switched order, but only on lines whose first field matches the string "URGENT":

    $1 ~ /URGENT/ { print $3, $2 }

  • Count and print the number of lines that match pattern:

    /pattern/ { ++x }
    END { print x }

  • Add numbers in second column and print total:

    { total += $2 }
    END { print "column total is", total}

  • Print lines that contain fewer than 20 characters:

    length($0) < 20

  • Print each line that begins with Name: and that contains exactly seven fields:

    NF == 7 && /^Name:/

  • Print the fields of each input record in reverse order, one per line:

    {
            for (i = NF; i >= 1; i--)
                    print $i
    }
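
The blank-line-separated record example above only sets FS and RS. A fuller sketch, using an invented two-entry address list, shows what the settings do once a procedure is added; para_out is a hypothetical name.

```shell
# Each blank-line-separated block is one record; each line within it is one field.
para_out=$(printf 'Jane Doe\n12 Elm St\n\nJohn Roe\n9 Oak Ave\n' |
    awk 'BEGIN { FS = "\n"; RS = "" }
         { print NR ": " $1 " (" NF " lines)" }')
echo "$para_out"
```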

11.4 Built-in Variables

Version  Variable     Description
awk      FILENAME     Current filename
         FS           Field separator (a space)
         NF           Number of fields in current record
         NR           Number of the current record
         OFMT         Output format for numbers ("%.6g") and for conversion to string
         OFS          Output field separator (a space)
         ORS          Output record separator (a newline)
         RS           Record separator (a newline)
         $0           Entire input record
         $n           nth field in current record; fields are separated by FS
nawk     ARGC         Number of arguments on command line
         ARGV         An array containing the command-line arguments, indexed from 0 to ARGC - 1
         CONVFMT      String conversion format for numbers ("%.6g") (POSIX)
         ENVIRON      An associative array of environment variables
         FNR          Like NR, but relative to the current file
         RLENGTH      Length of the string matched by match() function
         RSTART       First position in the string matched by match() function
         SUBSEP       Separator character for array subscripts ("\034")
gawk     ARGIND       Index in ARGV of current input file
         ERRNO        A string indicating the error when a redirection fails for getline or if close() fails
         FIELDWIDTHS  A space-separated list of field widths to use for splitting up the record, instead of FS
         IGNORECASE   When true, all regular expression matches, string comparisons, and calls to index() ignore case
         RT           The text matched by RS, which can be a regular expression in gawk
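
A few of these variables can be seen working together in the following sketch, which uses invented input and assumes a new awk. Assigning to a field ($1 = $1) makes awk rebuild $0 with OFS between the fields, and the commas in print also emit OFS.

```shell
# Print the record number, the field count, and the record rejoined with "-".
vars_out=$(printf 'a b c\nd e\n' |
    awk 'BEGIN { OFS = "-" } { $1 = $1; print NR, NF, $0 }')
echo "$vars_out"
```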

11.5 Operators

The following table lists the operators, in order of increasing precedence, that are available in awk. Note: while ** and **= are common extensions, they are not part of POSIX awk.

Symbol                      Meaning
= += -= *= /= %= ^= **=     Assignment
?:                          C conditional expression (nawk only)
||                          Logical OR (short-circuit)
&&                          Logical AND (short-circuit)
in                          Array membership (nawk only)
~ !~                        Match regular expression and negation
< <= > >= != ==             Relational operators
(blank)                     Concatenation
+ -                         Addition, subtraction
* / %                       Multiplication, division, and modulus (remainder)
+ - !                       Unary plus and minus, and logical negation
^ **                        Exponentiation
++ --                       Increment and decrement, either prefix or postfix
$                           Field reference
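
The practical effect of the ordering is easiest to see with concatenation, which binds less tightly than arithmetic. A minimal sketch, assuming a new awk (needed for ?:); the values are arbitrary.

```shell
prec_out=$(awk 'BEGIN {
    print 1 " " 2 + 3                  # addition happens before concatenation
    print 2 + 3 * 4                    # multiplication happens before addition
    x = (5 > 3) ? "yes" : "no"         # conditional expression (nawk only)
    print x
}')
echo "$prec_out"
```

When in doubt, parentheses make the intended grouping explicit.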

11.6 Variables and Array Assignments

Variables can be assigned a value with an = sign. For example:

FS = ","

Expressions using the operators +, -, *, /, and % (modulo) can be assigned to variables.

Arrays can be created with the split() function (see below), or they can simply be named in an assignment statement. Array elements can be subscripted with numbers (array[1], ..., array[n]) or with strings. Arrays subscripted by strings are called associative arrays.[2] For example, to count the number of widgets you have, you could use the following script:

[2] In fact, all arrays in awk are associative; numeric subscripts are converted to strings before using them as array subscripts. Associative arrays are one of awk's most powerful features.

/widget/ { count["widget"]++ }               # Count widgets
END      { print count["widget"] }           # Print the count

You can use the special for loop to read all the elements of an associative array:

for (item in array)
        process array[item]

The index of the array is available as item, while the value of an element of the array can be referenced as array[item].
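
As a sketch of the loop in use, the following counts how often each word appears in the first field of some made-up input. The order in which for (item in array) visits elements is not specified, so the output is piped through sort to make it predictable.

```shell
# Tally the first field of every record, then print each key with its count.
count_out=$(printf 'widget\ngadget\nwidget\n' |
    awk '{ count[$1]++ }
         END { for (item in count) print item, count[item] }' |
    sort)
echo "$count_out"
```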

You can use the operator in to see if an element exists by testing to see if its index exists (nawk only):

if (index in array)
        ...

This sequence tests that array[index] exists, but you cannot use it to test the value of the element referenced by array[index].

You can also delete individual elements of the array using the delete statement (nawk only).
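
The membership test and delete can be sketched together. The array name stock and its keys are invented for the example, and a new awk is assumed. Note that the test reports on existence only: an element whose value is 0 is still present.

```shell
member_out=$(awk 'BEGIN {
    stock["widget"] = 0
    if ("widget" in stock)    print "widget exists (value is 0)"
    if (!("gadget" in stock)) print "gadget does not exist"
    delete stock["widget"]
    if (!("widget" in stock)) print "widget was deleted"
}')
echo "$member_out"
```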

11.6.1 Escape Sequences

Within string and regular expression constants, the following escape sequences may be used. Note: The \x escape sequence is a common extension; it is not part of POSIX awk.

Sequence  Meaning            Sequence  Meaning
\a        Alert (bell)       \v        Vertical tab
\b        Backspace          \\        Literal backslash
\f        Form feed          \nnn      Octal value nnn
\n        Newline            \xnn      Hexadecimal value nn
\r        Carriage return    \"        Literal double quote (in strings)
\t        Tab                \/        Literal slash (in regular expressions)
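
A short sketch exercising several of these sequences follows. It avoids \x because, as noted, that one is an extension. The strings are arbitrary, and octal 101 and 102 are the ASCII codes for A and B.

```shell
esc_out=$(awk 'BEGIN {
    printf "name\tvalue\n"                       # tab and newline
    print "say \"hi\""                           # literal double quotes in a string
    print "\101\102"                             # octal values for A and B
    if ("a/b" ~ /a\/b/) print "slash matched"    # literal slash in a regular expression
}')
echo "$esc_out"
```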

11.7 User-Defined Functions

nawk allows you to define your own functions. This makes it easy to encapsulate sequences of steps that need to be repeated into a single place, and reuse the code from anywhere in your program. Note: for user-defined functions, no space is allowed between the function name and the left parenthesis when the function is called.

The following function capitalizes each word in a string. It has one parameter, named input, and five local variables, which are written as extra parameters.

# capitalize each word in a string
function capitalize(input,    result, words, n, i, w)
{
        result = ""
        n = split(input, words, " ")
        for (i = 1; i <= n; i++) {
                w = words[i]
                w = toupper(substr(w, 1, 1)) substr(w, 2)
                if (i > 1)
                        result = result " "
                result = result w
        }
        return result
}

# main program, for testing
{ print capitalize($0) }

With this input data:

A test line with words and numbers like 12 on it.

This program produces:

A Test Line With Words And Numbers Like 12 On It.
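
The extra-parameter convention for local variables can be checked with a smaller sketch. The function bump and the variables i and total are hypothetical names chosen for this illustration; a new awk is assumed.

```shell
scope_out=$(awk '
function bump(x,    i) {    # i is an extra parameter, so it is local to bump
    i = x + 1
    total += i              # total is not a parameter, so it is global
    return i
}
BEGIN {
    i = 100                 # global i, untouched by the calls below
    bump(1); bump(2)
    print i, total
}')
echo "$scope_out"
```

The global i keeps its value of 100, while total accumulates 2 + 3 across the two calls.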

11.8 Group Listing of awk Functions and Commands

The following table classifies awk functions and commands.

Arithmetic  String      Control Flow  I/O           Time         Program-
Functions   Functions   Statements    Processing    Functions    ming
atan2[3]    gensub[4]   break         close[3]      strftime[4]  delete[3]
cos[3]      gsub[3]     continue      fflush[5]     systime[4]   function[3]
exp         index       do/while[3]   getline[3]                 system[3]
int         length      exit          next
log         match[3]    for           nextfile[5]
rand[3]     split       if            print
sin[3]      sprintf     return[3]     printf
sqrt        sub[3]      while
srand[3]    substr
            tolower[3]
            toupper[3]

[3] Available in nawk.

[4] Available in gawk.

[5] Available in Bell Labs awk and gawk.
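
To give a feel for how the string functions combine, here is a sketch that uses only functions available in nawk (none of the gawk-only ones). The sample string and the variable names are made up.

```shell
func_out=$(awk 'BEGIN {
    s = "user=alice;uid=1001"
    n = split(s, pair, ";")              # pair[1] is "user=alice", pair[2] is "uid=1001"
    if (match(s, /[0-9]+/))              # sets RSTART and RLENGTH on success
        id = substr(s, RSTART, RLENGTH)
    gsub(/=/, ": ", s)                   # replace every = in s
    print n, id
    print toupper(substr(s, 1, 4)) substr(s, 5)
    print sprintf("%05.1f", 3.14159), length("awk"), index("nutshell", "shell")
}')
echo "$func_out"
```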

11.9 Implementation Limits

Many versions of awk have various implementation limits, on things such as:

  • Number of fields per record

  • Number of characters per input record

  • Number of characters per output record

  • Number of characters per field

  • Number of characters per printf string

  • Number of characters in literal string

  • Number of characters in character class

  • Number of files open

  • Number of pipes open

  • The ability to handle 8-bit characters and characters that are all zero (ASCII NUL)

gawk does not have limits on any of these items, other than those imposed by the machine architecture and/or the operating system.

11.10 Alphabetical Summary of Functions and Commands

The following alphabetical list of keywords and functions includes all that are available in awk, nawk, and gawk. nawk includes all old awk functions and keywords, plus some additional ones (marked as {N}). gawk includes all nawk functions and keywords, plus some additional ones (marked as {G}). Items marked with {B} are available in the Bell Labs awk. Items that aren't marked with a symbol are available in all versions....

Meet the Author

Arnold Robbins, an Atlanta native, is a professional programmer and technical author. He has worked with Unix systems since 1980, when he was introduced to a PDP-11 running a version of Sixth Edition Unix. He has been a heavy AWK user since 1987, when he became involved with gawk, the GNU project's version of AWK. As a member of the POSIX 1003.2 balloting group, he helped shape the POSIX standard for AWK. He is currently the maintainer of gawk and its documentation. He is also coauthor of the sixth edition of O'Reilly's Learning the vi Editor. Since late 1997, he and his family have been living happily in Israel.
