sed and awk

by Dale Dougherty, Arnold Robbins

Overview

sed & awk describes two text processing programs that are mainstays of the UNIX programmer's toolbox.

sed is a "stream editor" for editing streams of text that might be too large to edit as a single file, or that might be generated on the fly as part of a larger data processing step. The most common operation done with sed is substitution, replacing one block of text with another.

awk is a complete programming language. Unlike many conventional languages, awk is "data driven": you specify what kind of data you are interested in and the operations to be performed when that data is found. awk does many things for you, including automatically opening and closing data files, reading records, breaking the records up into fields, and counting the records. While awk provides the features of most conventional programming languages, it also includes some unconventional features, such as extended regular expression matching and associative arrays. sed & awk describes both programs in detail and includes a chapter of example sed and awk scripts.

This edition covers features of sed and awk that are mandated by the POSIX standard. This most notably affects awk, where POSIX standardized a new variable, CONVFMT, and new functions, toupper() and tolower(). The CONVFMT variable specifies the conversion format to use when converting numbers to strings (awk used to use OFMT for this purpose). The toupper() and tolower() functions each take a (presumably mixed case) string argument and return a new version of the string with all letters translated to the corresponding case.

In addition, this edition covers GNU sed, newly available since the first edition. It also updates the first edition coverage of Bell Labs nawk and GNU awk (gawk), covers mawk, an additional freely available implementation of awk, and briefly discusses three commercial versions of awk, MKS awk, Thompson Automation awk (tawk), and Videosoft (VSAwk).
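As a rough illustration of the features named in this overview (not an excerpt from the book), the following sketch tries a sed substitution, the two case-conversion functions, and CONVFMT. The sample strings and variable names are made up for the example, and a POSIX-conforming awk is assumed.

```shell
# sed substitution: replace the first match of a pattern on each line.
echo 'the quick brown fox' | sed 's/quick/slow/'

# toupper() and tolower() return case-converted copies of their argument.
awk 'BEGIN { s = "Hello, World"; print toupper(s); print tolower(s) }'

# CONVFMT governs number-to-string conversion (forced here by concatenating
# the number with an empty string); OFMT still governs how print formats a
# number that is printed directly.
awk 'BEGIN { CONVFMT = "%.2f"; x = 3.14159; s = x ""; print s; print x }'
```

In the last command the first line of output reflects CONVFMT and the second reflects the default OFMT, which is one way to see that the two variables are separate.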

Editorial Reviews


Fatbrain Review

Serious UNIX programmers and administrators will enjoy the second edition of this best-selling book on a set of the most popular UNIX utilities, sed and awk. Why? Because it covers awk as described by the POSIX standard as well as NetBSD, FreeBSD, and the Linux versions of awk.

The journey begins with an overview of the basic operations of sed and awk, showing a progression in functionality from grep to sed to awk. The next stop is writing sed scripts. You'll learn the syntax of sed commands, and advanced features, including multiple pattern space and hold space commands.

The book then moves to writing scripts for awk. Discussions include pattern matching, expressions, relational and Boolean operators, and information retrieval. The text also explains awk's built-in functions and user-defined functions. The authors help you learn by outlining the development of an index processing application, and they offer readers contact information for obtaining various versions of awk. This tutorial includes a miscellany of sed and awk scripting styles and techniques.

Product Details

ISBN-13: 9781565922259
Publisher: O'Reilly Media, Incorporated
Publication date: 03/28/1997
Series: Nutshell Handbooks Series
Edition description: Second Edition
Pages: 434
Sales rank: 602,740
Product dimensions: 9.00(w) x 7.00(h) x 1.00(d)

Read an Excerpt


From Chapter 7: Writing Scripts for awk

As mentioned in the preface, this book describes POSIX awk; that is, the awk language as specified by the POSIX standard. Before diving into the details, we'll provide a bit of history.

The original awk was a nice little language. It first saw the light of day with Version 7 UNIX, around 1978. It caught on, and people used it for significant programming.

In 1985, the original authors, seeing that awk was being used for more serious programming than they had ever intended, decided to beef up the language. (See Chapter 11, A Flock of awks, for a description of the original awk, and all the things it did not have when compared to the new one.) The new version was finally released to the world at large in 1987, and it is this version that is still found on SunOS 4.1.x systems.

In 1989, for System V Release 4, awk was updated in some minor ways. This version became the basis for the awk feature list in the POSIX standard. POSIX clarified a number of things about awk, and added the CONVFMT variable (to be discussed later in this chapter).

As you read the rest of this book, bear in mind that the term awk refers to POSIX awk, and not to any particular implementation, whether the original one from Bell Labs, or any of the others discussed in Chapter 11. However, in the few cases where different versions have fundamental differences of behavior, that will be pointed out in the main body of the discussion.

Playing the Game

To write an awk script, you must become familiar with the rules of the game. The rules can be stated plainly and you will find them described in Appendix B, Quick Reference for awk, rather than in this chapter. The goal of this chapter is not to describe the rules but to show you how to play the game. In this way, you will become acquainted with many of the features of the language and see examples that illustrate how scripts actually work. Some people prefer to begin by reading the rules, which is roughly equivalent to learning to use a program from its manual page or learning to speak a language by scanning its rules of grammar--not an easy task. Having a good grasp of the rules, however, is essential once you begin to use awk regularly. But the more you use awk, the faster the rules of the game become second nature. You learn them through trial and error--spending a long time trying to fix a silly syntax error such as a missing space or brace has a magical effect upon long-term memory. Thus, the best way to learn to write scripts is to begin writing them. As you make progress writing scripts, you will no doubt benefit from reading the rules (and rereading them) in Appendix B or the awk manpage or The AWK Programming Language book. You can do that later--let's get started now.

Hello, World

It has become a convention to introduce a programming language by demonstrating the "Hello, world" program. Showing how this program works in awk will demonstrate just how unconventional awk is. In fact, it's necessary to show several different approaches to printing "Hello, world."

In the first example, we create a file named test that contains a single line. This example shows a script that contains the print statement:

$ echo 'this line of data is ignored' > test
$ awk '{ print "Hello, world" }' test

Hello, world

This script has only a single action, which is enclosed in braces. That action is to execute the print statement for each line of input. In this case, the test file contains only a single line; thus, the action occurs once. Note that the input line is read but never output.

Now let's look at another example. Here, we use a file that contains the line "Hello, world."

$ cat test2
Hello, world
$ awk '{ print }' test2
Hello, world

In this example, "Hello, world" appears in the input file. The same result is achieved because the print statement, without arguments, simply outputs each line of input. If there were additional lines of input, they would be output as well.
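To see the point about additional lines, here is a small sketch that feeds two lines through the same script. The second input line is invented for the example, and the data is piped in rather than read from test2.

```shell
# print with no arguments writes the current input line unchanged,
# so every line of input comes back out.
printf 'Hello, world\nGoodbye, world\n' | awk '{ print }'

# Naming the whole record explicitly as $0 gives the same output.
printf 'Hello, world\nGoodbye, world\n' | awk '{ print $0 }'
```

Both commands should echo the two input lines in order.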

Both of these examples illustrate that awk is usually input-driven. That is, nothing happens unless there are lines of input on which to act. When you invoke the awk program, it reads the script that you supply, checking the syntax of your instructions. Then awk attempts to execute the instructions for each line of input. Thus, the print statement will not be executed unless there is input from the file.

To verify this for yourself, try entering the command line in the first example but omit the filename. You'll find that because awk expects input to come from the keyboard, it will wait until you give it input to process: press RETURN several times, then type an EOF (CTRL-D on most systems) to signal the end of input. For each time that you pressed RETURN, the action that prints "Hello, world" will be executed.
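The keyboard experiment can also be approximated without typing, as in this sketch: a pipe supplies three empty lines in place of three presses of RETURN, and the end of the pipe stands in for the EOF character.

```shell
# Three empty input lines, then end of input.
# The action runs once per line, so the message appears three times.
printf '\n\n\n' | awk '{ print "Hello, world" }'
```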

There is yet another way to write the "Hello, world" message and not have awk wait for input. This method associates the action with the BEGIN pattern. The BEGIN pattern specifies actions that are performed before the first line of input is read.

$ awk 'BEGIN { print "Hello, world" }'
Hello, world

Awk prints the message, and then exits. If a program has only a BEGIN pattern, and no other statements, awk will not process any input files.
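One way to check this behavior is to run both kinds of script against empty input. This is a sketch, using /dev/null to stand in for "no input at all":

```shell
# BEGIN-only program: prints once even though there is no input.
awk 'BEGIN { print "Hello, world" }' < /dev/null

# Main-action-only program on the same empty input: prints nothing,
# because the action needs a line of input before it can run.
awk '{ print "Hello, world" }' < /dev/null
```

The first command prints the message and the second prints nothing, which matches the input-driven behavior described earlier.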

Awk's Programming Model

It's important to understand the basic model that awk offers the programmer. Part of the reason why awk is easier to learn than many programming languages is that it offers such a well-defined and useful model to the programmer.

An awk program consists of what we will call a main input loop. A loop is a routine that is executed over and over again until some condition exists that terminates it. You don't write this loop; it is given--it exists as the framework within which the code that you do write will be executed. The main input loop in awk is a routine that reads one line of input from a file and makes it available for processing. The actions you write to do the processing assume that there is a line of input available. In another programming language, you would have to create the main input loop as part of your program. It would have to open the input file and read one line at a time. This is not necessarily a lot of work, but it illustrates a basic awk shortcut and makes it easier for you to write your program.
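For comparison, here is a sketch of what writing the loop yourself can look like, using a plain shell loop as the "other language". The temporary file and its contents are hypothetical, created only for this illustration.

```shell
# Hypothetical sample data, written to a temporary file for this sketch.
sample=$(mktemp)
printf 'first line\nsecond line\n' > "$sample"

# Explicit version: open the file, read one line at a time, act on each line.
while IFS= read -r line; do
    printf '%s\n' "$line"
done < "$sample"

# awk version: the loop is supplied; only the per-line action is written.
awk '{ print }' "$sample"

rm -f "$sample"
```

Both halves print the same two lines; the difference is only in who writes the loop.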

The main input loop is executed as many times as there are lines of input. As you saw in the "Hello, world" examples, this loop does not execute until there is a line of input. It terminates when there is no more input to be read.

Awk allows you to write two special routines that can be executed before any input is read and after all input is read. These are the procedures associated with the BEGIN and END rules, respectively. In other words, you can do some preprocessing before the main input loop is ever executed and you can do some postprocessing after the main input loop has terminated. The BEGIN and END procedures are optional.

You can think of an awk script as having potentially three major parts: what happens before, what happens during, and what happens after processing the input. Figure 7-1 shows the relationship of these parts in the flow of control of an awk script...
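A minimal sketch of a script with all three parts follows. The input lines, the messages, and the count variable are invented for the example.

```shell
printf 'alpha\nbeta\ngamma\n' | awk '
# Before: runs once, ahead of the first line of input.
BEGIN { print "before any input" }

# During: the main input loop runs this action once per line.
{ count++; print "line: " $0 }

# After: runs once, when the input is exhausted.
END { print "after input, lines seen: " count }
'
```

With three lines of input, the BEGIN message appears first, then one "line:" message per input line, then the END message reporting a count of 3.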

Meet the Author

Dale Dougherty is the publisher of the O'Reilly Network and Director of O'Reilly Research. Dale has been instrumental in many of O'Reilly's most important efforts, including founding O'Reilly & Associates with Tim O'Reilly. He was the developer and publisher of Global Network Navigator (GNN), the first commercial Web site. Dale was developer and publisher of Web Review, the online magazine for Web designers, and he was O'Reilly & Associates' first editor. Dale has written and edited numerous books at O'Reilly & Associates. Dougherty is a Lecturer in the School of Information Management and Systems (SIMS) at the University of California at Berkeley.

Arnold Robbins, an Atlanta native, is a professional programmer and technical author. He has worked with Unix systems since 1980, when he was introduced to a PDP-11 running a version of Sixth Edition Unix. He has been a heavy AWK user since 1987, when he became involved with gawk, the GNU project's version of AWK. As a member of the POSIX 1003.2 balloting group, he helped shape the POSIX standard for AWK. He is currently the maintainer of gawk and its documentation. He is also coauthor of the sixth edition of O'Reilly's Learning the vi Editor. Since late 1997, he and his family have been living happily in Israel.

Customer Reviews

Most Helpful Customer Reviews

Sed and Awk 4.6 out of 5 based on 0 ratings. 34 reviews.
Anonymous More than 1 year ago
Does what it says: helps you learn sed and awk. Also helps you learn regular expressions.
Guest More than 1 year ago
Unix has earned itself quite a reputation for its potent tools, used for batch editing of text files (like program output). Sed and Awk are two of these tools. Sed is a direct descendant of Ed, the original Unix line editor, which employs regular expressions, a powerful method for describing patterns in text, for operations like substitute, append, or delete. Awk is a complete scripting language with programming structures like conditionals, loops, and functions, developed in the 1970s by Alfred Aho, Peter Weinberger, and Brian Kernighan (hence A-W-K). The trio has also written a book on Awk.

Dale Dougherty (in the 2nd edition with Arnold Robbins, maintainer of GNU Awk and author of several more books on the Awk programming language) has done a good job of making a thoroughly readable tutorial on Sed and Awk. However, it remains a mystery to me how they managed to fill no fewer than 407 pages with it. Mind you, Sed and Awk are not really big monsters. There are something like two dozen operators in Sed (most of which you will probably never use), and the syntax of Awk mimics that of the C programming language, so it is likely that you know it already. Once you grok the idea of regular expressions, you should become a proficient user of Awk in about 30 minutes.

In conclusion, go buy the book if you need to manipulate text files on Unix and you think you need a lengthy tutorial with a gentle learning curve. Otherwise, short references on Awk and Sed, like the ones in Unix Power Tools, and a bunch of examples showing some tricks you might not think of, will probably be more useful. In addition, it is good to know that during the nineties, much of the focus drifted from Awk to Perl, so you might consider a book on Perl as well.

Guest More than 1 year ago
You will get sick of hearing from your co-workers how 'they gotta get this book'. It's that good. Both of these fantastic tools helped out as we needed to provide detailed file information during the latest ILOVEYOU virus fiasco on over 250 servers.