
Talking with Computers: Explorations in the Science and Technology of Computing
316
Talking with Computers: Explorations in the Science and Technology of Computing
316Hardcover
-
SHIP THIS ITEMIn stock. Ships in 1-2 days.PICK UP IN STORE
Your local store may have stock of this item.
Available within 2 business hours
Related collections and offers
Overview
Product Details
ISBN-13: | 9780521834254 |
---|---|
Publisher: | Cambridge University Press |
Publication date: | 03/15/2004 |
Pages: | 316 |
Product dimensions: | 7.01(w) x 10.00(h) x 0.75(d) |
About the Author
Read an Excerpt
Cambridge University Press
0521834252 - Talking with Computers - Explorations in the Science and Technology of Computing - by Thomas Dean
Excerpt
CHAPTER ONE
Talking with Computers
Hardly a day goes by that I don't write at least one short computer program: a few lines of code to explore an idea or help organize my thoughts. I think of it as simply talking with my computer, and more and more often there is a computer available to talk with, often several of them joining in the conversation simultaneously. Each time you click on a link in a browser, you cause a sequence of computations involving dozens if not hundreds of computers scattered all over the world.
Making a computation happen is not, however, the same thing as programming. There are lots of powerful programs written by talented programmers that you can call up with a click of a mouse or few keystrokes. These programs animate computers, breathing life and spirit into lumps of metal and plastic. Even if you know what's going on inside computers and computer programs, it's easy to imagine that programs are spells and the programmers who create them are sorcerers.When you click on the icon for a program, you invoke these spells and the spells conjure up spirits in the machine. But this book isn't about invoking the spells of others; it's about creating your own spells and conjuring spirits of your own design.
This is not to say I won't encourage you to use code written by other programmers. Quite the contrary: an important part of the power of computing is that good spells can be reused as often as needed. Programming is about weaving together the spells of others, conjuring your own spirits, and animating the computer to dance to your bidding.This book is about practical conjuring, about revealing what's behind some of the magic associated with computing while at the same time learning to create your own magic. And the best way to begin a book about computing and computer programming is to sit down in front of a computer and start programming.
1.1 COMPUTERS EVERYWHERE
The laptop computer on our breakfast table is connected to a wide-area network (usually called the World Wide Web or the Internet) through a local-area wireless network (our house) and a broadband connection supported by a local cable television company. I leave the laptop on the table so I can read the headlines from the online news services or check the weather while I eat breakfast. I also use it to write and run small programs.
I use a program on my laptop (it's called ssh for "secure shell") to tunnel through the firewall protecting the computers in the computer science department at Brown and open a shell (a special program that lets me interact more or less directly with the operating system - a variant of Unix in this case) on the machine sitting in my office (its name, by the way, is "klee" for the artist Paul Klee - see Figure 1.1 for the inspiration for this naming - and its symbolic address on the Internet is "klee.cs.brown.edu").
Figure 1.1: Paul Klee's "Twittering Machine" (1922) © 2003 Artist Rights Society (ARS) New York, VG Bild-Kunst, Bonn Digital Image © The Museum A Modern Art/Licensed by SCALA/Art Resource, NY |
When I say "open a shell," I mean that I make a window appear on my laptop screen into which I can type commands to be interpreted by the shell program. When I say that a program "interprets" a command, I mean that the program reads the command I've typed and converts it into instructions that the computer can carry out, thereby executing the command. The results of executing the command, usually one or more lines of text, are then displayed in the same window as the command was typed. The shell lets me write and run programs to do all sorts of routine tasks from checking football statistics to keeping track of all of my email messages, digital photos and music files.
The program ssh allows me to work remotely on computers that "trust me" in such a way that the information sent back and forth between my laptop and klee can't be deciphered by someone with access to the wires on which the information is transmitted and doesn't allow a malicious hacker to break into either my laptop or klee. I could open a shell on any of several hundred machines residing within the firewall, but I generally choose to do it on my own machine rather than slow down or "steal cycles" from a machine being used by someone else.
The time will soon come however when it won't make much sense to talk about "my machine" - computation will become as pervasive as indoor plumbing. The Internet has blurred the distinction among individual computers. I'm almost always connected to the Internet, but most of the time I don't think about what computer I'm talking with. When I'm in the department at Brown but not in my office, I walk around with my laptop connected to the department wireless network, which connects to a wide-area network and then to the Internet. Right this minute I'm working at my laptop, typing into a shell that's running on the computer in my home office a few feet away, but in another window running on my laptop I'm connected to klee. For all I know, the data that's flowing between these computers may be circling the globe, zipping through cables under the ocean and bouncing off satellites along the way. Indeed, I could pretty easily force the data to go through Zurich, Seattle or Tokyo.
Given the current state of the art, though, I do have to think a bit about where I am, or rather where the program is that's currently interpreting my keystrokes. The reason I have to know which computer I'm working on is that different machines have different software, offer different services and have access to different sources of data. I'm pretty confident that I won't lose data that's stored on the machines in the department because I trust the folks who maintain those machines and perform the backups on the file system there. But I have to do the backups on my laptop and the machine in my home office myself, and I know I'm not very careful about doing them.
Eventually, with the exception of very specialized programs and services, I won't have to worry about what computers are running the programs I need. This is already true to a certain extent if you restrict your computing to what you can do within a web browser, and yes, you can do a lot of useful computing within a web browser. Some people don't even distinguish between their web browser and their computer; they do everything - email, news, shopping, entertainment and education - from within their web browser.
For the last twenty years, I've been using programs to work on computers thousands of miles away. In the early '90s, it seemed miraculous to be sitting in a Paris hotel room running programs on the computer in my office in Providence or telling a computer at Stanford to transfer files to the portable computer on my bed in the hotel room. Today, most "netizens" take this amazing connectivity for granted and, though they may not know the magic incantations that animate these processes, they routinely run programs on remote computers and fetch files with the click of a mouse.
1.2 EVERYDAY MAGIC
I want to give you some examples of everyday programming, not fancy programming, just examples of talking with computers and getting them to do interesting things. I'll use the phrase "invoking a program" to mean making a program run, usually by typing its name and then zero or more expressions or "arguments" that provide additional direction or information. Invoking programs with specific arguments is one of the simplest ways to talk with a computer.
In summer 2002, I kept a journal to record ideas for this book. I put the journal entries in a collection of files and directories on klee. Here I'm invoking a program called wc (for "word count") by typing into a shell running on klee in order to see how much I wrote in my journal during August: /u/tld/email/book % wc -l ./journal/02/08/*/*.txt
465 ./journal/02/08/01/day.txt
323 ./journal/02/08/02/day.txt
207 ./journal/02/08/04/day.txt
445 ./journal/02/08/08/day.txt
215 ./journal/02/08/12/day.txt
299 ./journal/02/08/16/day.txt
700 ./journal/02/08/24/day.txt
335 ./journal/02/08/30/day.txt
857 ./journal/02/08/31/day.txt
3846 total
The /u/tld/email/book % part was printed by the shell. It's called the "prompt" and when I'm in the shell window (the portion of my computer screen dedicated to the shell) the cursor is positioned at the end of the prompt waiting for me to type something. I've modified the shell - the shell is itself programmable - so that the prompt always displays the default directory in which the shell looks for files.
When I'm finished typing I signal the shell, usually by hitting the "return" (or "enter") key on my keyboard, to interpret what I just typed. The directory /u/tld/email/ is where I generally store files related to my daily activities. /u/tld/email/book/ is the temporary directory I created for files related to working on this book.
I typed wc -l ./journal/02/08/*/*.txt and then hit the return key as part of my conversation with the shell and so indirectly with the operating system running on klee. More often than not when you invoke one program, that program invokes another program, and that program another, and so on, with some programs possibly invoking several other programs at once. A computer operating system is just another program, really a collection of many programs written (and rewritten) by many different people. You can think of the operating system as the accumulated wisdom of a host of very clever programmers who packed it with everything they felt was fundamentally useful for building other programs.
Other programs, applications such as web browsers and word processors, are run "on top of" or "under the control of" the operating system. The operating system sees all and controls all; it's only through the operating system that your programs can get information from the outside world (through a local network or the World Wide Web) or send files to printers or grab data stored on disks or CDs. If this seems mysterious, don't worry; it really is complicated. The good news is that for the most part you don't have to understand the details, since the operating system hides a lot of the computer's complexities from the programmer. This ability to hide complexity is essential in developing large complicated programs and makes learning to program much easier.
The specific command I typed told the shell to run the program wc to count lines (the -l argument) in the files specified by the pattern ./journal/02/08/*/*.txt, where * is a "wildcard" that matches any string of characters. With no perceptible pause, the shell printed out the next ten lines, which you can think of as the answer to my question or the result of the computation. The specified pattern matched nine files. Each of the first nine lines contains the name of a file that matched the pattern preceded by the number of lines of text in that file. For the first file listed, the first * matched 01 and the second * matched day. The last line is the total number of lines of text in all of the files.
Let me say a few words about file systems and the strange strings of characters containing slashes (/). A slash with no preceding text indicates the "root" directory; as far we're concerned, everything is stored under the root of the file system. The u in /u/ is a symbolic link to the /home/ directory on the Brown file system where the directories and files of computer users like me are stored. For the most part, symbolic links are invisible to users but allow system managers to handle large file systems more efficiently and transparently. The /u/tld/ designates my home directory, where all my files are stored; my login name is tld for the initials of my name, Thomas Linus Dean.
Most computer file systems are organized hierarchically. So, for instance, my email directory /u/tld/email/ is one of many files and directories stored in my home directory, and the directory /u/tld/email/book/ is one of many files and directories stored in my email directory.
Files can be named absolutely with respect to the root directory or relatively with respect to some other starting directory. When you're in the directory /u/tld/email/book/, ./journal/02/08/30/day.txt is a shorthand reference (or relative path name) for /u/tld/email/book/journal/02/08/30/day. txt, which is the full name (or absolute path name) for the file. I keep all files for journal entries written in 2002 in ./journal/02/, all files for August 2002 in ./journal/02/08/, and all files for 30 August 2002 in ./journal/02/08/30/. If I had typed wc -l ./journal/02/*/01/*.txt, the shell would have reported on all journal entries written on the first day of some month in 2002.
Absolute and relative path names can be confusing until you've played with them a bit, and even then you can easily get lost in a file system consisting of thousands of directories, in the same way that you can get lost navigating in a collection of web pages. For the most part, however, the nested, hierarchical directory structure makes it relatively easy to keep track of where you are and is a useful way to organize all sorts of data (including web pages). Consider these files from my journal directory: /u/tld/email/book/journal/02/year.txt
/u/tld/email/book/journal/02/year.htm
/u/tld/email/book/journal/02/08/month.txt
/u/tld/email/book/journal/02/08/month.htm
/u/tld/email/book/journal/02/08/30/day.txt
/u/tld/email/book/journal/02/08/30/day.htm
/u/tld/email/book/journal/02/08/31/day.txt
/u/tld/email/book/journal/02/08/31/day.htm
/u/tld/email/book/journal/02/09/month.txt
/u/tld/email/book/journal/02/09/month.htm
/u/tld/email/book/journal/02/09/01/day.txt
/u/tld/email/book/journal/02/09/01/day.htm
/u/tld/email/book/journal/02/09/02/day.txt
/u/tld/email/book/journal/02/09/02/day.htm
When listed this way, it's hard to discern the organizational structure inherent in calendars, though it's there in the absolute path names if you look hard enough. The underlying structure is similar to a tree, with files corresponding to leaves and directories corresponding to branches. Figure 1.2 shows these files as a tree (or, rather, the branch of the tree called /u/tld/email/book/journal/02/). You can think of Figure 1.2 as grafted onto the tree rooted at / at the branch /u/tld/email/journal/. The same basic tree-like structure that underlies hierarchical file systems in most modern operating systems appears again and again in computer science.
Figure 1.2: Files and directories organized hierarchically in a tree-like structure
The incantation wc -l ./journal/02/08/*/*.txt really is a program of sorts, albeit a short and rather cryptic one. That this short program called another program wc is not at all unusual: most programming languages provide access to all sorts of specialized programs. Even + in a language that allows 1 + 2 is a program (and not a simple one if you understand how computers handle arithmetic).
Shells and other means of interacting with operating systems offer a wide range of powerful programs that can be orchestrated to perform tasks. For example, the next program (called a shell script) renames all files with the extension html to have the extension htm. The HTML ("hypertext markup language") files that comprise web sites are conventionally identified using either the three-letter extension htm or the four-letter extension html. Both conventions are common in practice. Unfortunately, some programs require one or the other exclusively and if your files are in the wrong format, you have to convert them. I used the program ls (for "list directory contents") to list all of the files in the current directory prior to executing the program to rename the files. After executing the program, I used ls again to show that the shell script worked as advertised. In the remainder of this chapter, I've simplified the prompt to just %; each appearance of % signals the beginning of another typed command. % ls
home.html syllabus.html
% ls | sed 's/html// ' | awk '{print "mv " $1 "html " $1 "htm"}' | sh
% ls
home.htm syllabus.htm
The program starts by listing the set of files with the extension html.1 Unix programmers call the vertical bars (|) "pipes": they convert the output of one program, ls in this case, into the input to another program. The sed 's/html//' part of the program takes each file name in turn and rips off the html part; the output of sed 's/html//' is two truncated file names home. and syllabus.. The next | causes these file names to be piped into the program fragment awk '{print "mv " $1 "html " $1 "htm"}' that essentially writes two little programs that are themselves shell scripts and look like mv home.html home.htm and mv syllabus.html syllabus.htm (mv is the "move" or "rename" command and requires you to specify both the original and the new names of the file you're renaming). The output of awk '{print "mv " $1 "html " $1 "htm"}' is fed into the program sh (yet another shell - remember we're already typing to one shell) via the last |. Figure 1.3 illustrates how the intermediate results from the different steps in this computation are piped from one step to the next.
Figure 1.3: Intermediate results flowing through pipes connecting one command to the next in a shell script
If you think about it, this little program is pretty interesting despite its simple task. The program actually wrote a couple of littler programs, started up a shell and submitted those programs to the new shell to run, producing the desired outcome. Programs that write and run other programs and even replicate or improve upon themselves are relatively common, for example, computer viruses.
With a few simple modifications, this program could change the names of thousands of files stored in any number of directories and on any number of computers. With just a little more work, you could write a program that would go inside each of these files and change any reference in the text to a file with extension html to have the extension htm. If you were maintaining a web site with thousands of web pages spread across hundreds of directories, you might end up writing and running similar programs frequently.
By the way, there's always more than one way to solve any given programming problem, and I certainly don't claim that this shell script is the most elegant or efficient way of handling renaming. We could, for example, eliminate awk from our script and manage everything using sed, as in ls | sed 's/\(.*\)html/mv \1html \1htm/' | sh. However, the most compact program is not always the best: I find the original script clearer and more appealing, though I admit this is an aesthetic judgment.
Let's try something a little more complicated. Many web masters use a language called Perl both to maintain web pages and to do computations for visitors to their web pages. Suppose a bunch of your text files refer to dates in the format mm/dd/yy and you want to change them so as to name months explicitly, perhaps to avoid confusion with the alternative format dd/mm/yy. You might also want to try to clear up the ambiguity of the century implied when only two digits are used for the year. I just created a short text file of the sort I'd like to modify and I'll get the shell to print it out using cat, a Unix command for creating, displaying and stringing together (or concatenating) files: % cat dates.txt
The date 1/1/00 should be changed,
as should 12/31/1999 and 1/1/2002,
but not the file /usr/local/bin/.
I've written a little Perl program in a file called program.pl to perform the conversion, and again I get the shell to print it out: % cat program.pl
@month = (Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec);
while ($- = <ARGV>) {
s"\b(\d\d?)/(\d\d?)/(?:19)?(\d\d)\b"$month[$1-1] $2, 19$3";
s"\b(\d\d?)/(\d\d?)/(?:20)(\d\d)\b"$month[$1-1] $2, 20$3";
print "$-";
}
© Cambridge University Press