From Chapter 1: bash Basics
...Pipelines
It is also possible to redirect the output of a command into the standard input of another command instead of a file. The construct that does this is called the pipe, notated as |. A command line that includes two or more commands connected with pipes is called a pipeline.
Pipes are very often used with the more command, which works just like cat except that it prints its output screen by screen, pausing for the user to type SPACE (next screen), RETURN (next line), or other commands. If you're in a directory with a large number of files and you want to see details about them, ls -l | more will give you a detailed listing a screen at a time.
Pipelines can get very complex, and they can also be combined with other I/O redirectors. To see a sorted listing of the file cheshire a screen at a time, type sort < cheshire | more. To print it instead of viewing it on your terminal, type sort < cheshire | lp.
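The combination of input redirection and a pipe can be tried on a throwaway file. The following is only a sketch: the file contents are made up, a temporary file stands in for cheshire, and head -n 2 takes the place of more so that the commands run without waiting for a keypress.

```shell
# Create a small sample file standing in for "cheshire" (hypothetical contents).
tmpfile=$(mktemp)
printf '%s\n' grin cat alice queen > "$tmpfile"

# sort reads the file through <, and its output is piped to the next command.
# head -n 2 stands in for more so the example does not pause for input.
first_two=$(sort < "$tmpfile" | head -n 2)
printf '%s\n' "$first_two"

rm -f "$tmpfile"
```

On an interactive terminal you would replace head -n 2 with more, as in the text.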
Here's a more complicated example. The file /etc/passwd stores information about users' accounts on a UNIX system. Each line in the file contains a user's login name, user ID number, encrypted password, home directory, login shell, and other information. The first field of each line is the login name; fields are separated by colons (:). A sample line might look like this:
cam:LM1c7GhNesD4GhF3iEHrHrH4FeCKB/:501:100:Cameron Newham:/home/cam:/bin/bash
To get a sorted listing of all users on the system, type:
$ cut -d: -f1 < /etc/passwd | sort
(Actually, you can omit the <, since cut accepts input filename arguments.) The cut command extracts the first field (-f1), where fields are separated by colons (-d:), from the input. The entire pipeline will print a list that looks like this:
adm
bin
cam
daemon
davidgc
ftp
games
gonzo
...
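To experiment without touching the real /etc/passwd, you can run the same pipeline on a small file in the same format. This is a sketch under assumptions: the three account lines are invented for illustration, and the variable names are arbitrary. It also shows that the form with < and the form with a filename argument give the same result.

```shell
# Sample data in /etc/passwd format (made-up accounts, not real entries).
sample=$(mktemp)
cat > "$sample" <<'EOF'
gonzo:x:502:100:Gonzo:/home/gonzo:/bin/bash
cam:x:501:100:Cameron Newham:/home/cam:/bin/bash
adm:x:3:4:adm:/var/adm:/bin/sh
EOF

# Extract field 1 (-f1) using ':' as the delimiter (-d:), then sort.
with_redirect=$(cut -d: -f1 < "$sample" | sort)
# The same pipeline, with the file passed to cut as an argument.
with_argument=$(cut -d: -f1 "$sample" | sort)

printf '%s\n' "$with_redirect"
rm -f "$sample"
```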
If you want to send the list directly to the printer (instead of your screen), you can extend the pipeline like this:
$ cut -d: -f1 < /etc/passwd | sort | lp
Now you should see how I/O redirectors and pipelines support the UNIX building block philosophy. The notation is extremely terse and powerful. Just as important, the pipe concept eliminates the need for messy temporary files to store command output before it is fed into other commands.
For example, to do the same sort of thing as the above command line on other operating systems (assuming that equivalent utilities are available...), you need three commands, plus a fourth to delete the temporary files. On DEC's VAX/VMS system, they might look like this:
$ cut [etc]passwd /d=":" /f=1 /out=temp1
$ sort temp1 /out=temp2
$ print temp2
$ delete temp1 temp2
After sufficient practice, you will find yourself routinely typing in powerful command pipelines that do in one line what it would take several commands (and temporary files) in other operating systems to accomplish.
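The contrast can be reproduced within the shell itself. The sketch below runs the same job twice on invented sample data: once step by step with temporary files, and once as a single pipeline. The file and variable names are hypothetical, and the printing step is left out so that nothing is sent to a printer.

```shell
# Sample colon-separated data (made-up entries).
data=$(mktemp); temp1=$(mktemp); temp2=$(mktemp)
printf '%s\n' 'games:x:12' 'bin:x:1' 'ftp:x:14' > "$data"

# Step-by-step version: each stage writes a temporary file for the next.
cut -d: -f1 "$data" > "$temp1"
sort "$temp1" > "$temp2"
stepwise=$(cat "$temp2")
rm -f "$temp1" "$temp2"

# Pipeline version: one line, no intermediate files to create or delete.
piped=$(cut -d: -f1 "$data" | sort)

rm -f "$data"
[ "$stepwise" = "$piped" ] && echo "same result"
```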
Background Jobs
Pipes are actually a special case of a more general feature: doing more than one thing at a time. This is a capability that many other commercial operating systems don't have, because of the rigid limits that they tend to impose upon users. UNIX, on the other hand, was developed in a research lab and meant for internal use, so it does relatively little to impose limits on the resources available to users on a computer--as usual, leaning towards uncluttered simplicity rather than overcomplexity.
"Doing more than one thing at a time" means running more than one program at the same time. You do this when you invoke a pipeline; you can also do it by logging on to a UNIX system as many times simultaneously as you wish. (If you try that on an IBM VM/CMS system, for example, you will get an obnoxious "already logged in" message.)
The shell also lets you run more than one command at a time during a single login session. Normally, when you type a command and hit RETURN, the shell will let the command have control of your terminal until it is done; you can't type in further commands until the first one is done. But if you want to run a command that does not require user input and you want to do other things while the command is running, put an ampersand (&) after the command.
This is called running the command in the background, and a command that runs in this way is called a background job; by contrast, a job run the normal way is called a foreground job. When you start a background job, you get your shell prompt back immediately, enabling you to enter other commands.
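Here is a minimal sketch of a background job that you can run as a script. It goes slightly beyond the text in two respects: sleep 1 stands in for a long-running command, and the script uses the shell's $! parameter and the wait command (neither of which has been introduced yet) to track the job and collect its exit status.

```shell
# Start a command in the background; sleep stands in for a long-running job.
sleep 1 &
bg_pid=$!            # $! holds the process ID of the most recent background job

# The shell does not wait for the job, so other commands can run right away.
echo "shell is free while job $bg_pid runs"

# wait blocks until the background job finishes and returns its exit status.
bg_status=0
wait "$bg_pid" || bg_status=$?
echo "background job exited with status $bg_status"
```

In an interactive session you would simply type the command followed by & and carry on working at the prompt.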
The most obvious use for background jobs is programs that take a long time to run, such as sort or uncompress on large files. For example, assume you just got an enormous compressed file loaded into your directory from magnetic tape. Let's say the file is gcc.tar.Z, which is a compressed archive file that contains well over 10 MB of source code files.
Type uncompress gcc.tar & (you can omit the .Z), and the system will start a job in the background that uncompresses the data "in place" and ends up with the file gcc.tar. Right after you type the command, you will see a line like this:
[1] 175
followed by your shell prompt, meaning that you can enter other commands. Those numbers give you ways of referring to your background job; Chapter 8 explains them in detail.
You can check on background jobs with the command jobs. For each background job, jobs prints a line similar to the above but with an indication of the job's status:
[1]+ Running uncompress gcc.tar &
When the job finishes, you will see a message like this right before your shell prompt:
[1]+ Done uncompress gcc.tar
The message changes if your background job terminated with an error; again, see Chapter 8 for details...
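A script can detect a failed background job through its exit status. This sketch assumes a stand-in command, sh -c 'exit 3', that does nothing but fail with status 3; it relies on the shell's $! parameter and wait command rather than on the interactive messages shown above.

```shell
# A background job that fails: sh -c 'exit 3' stands in for a command
# that terminates with an error.
sh -c 'exit 3' &
failed_pid=$!

# wait returns the exit status of the job it waited for.
failed_status=0
wait "$failed_pid" || failed_status=$?
echo "job $failed_pid finished with exit status $failed_status"
```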