- Shopping Bag ( 0 items )
Regular expressions are a powerful tool for manipulating text and data. If you don't use them yet, you will discover in this book a whole new world of mastery over your data. If you already use them, you'll appreciate this book's unprecedented detail and breadth of coverage. If you think you know all you need to know about regular expressions, this book is a stunning eye-opener.With regular expressions, you can save yourself time and aggravation while dealing with documents, mail messages, log files — you name it...
Ships from: Waltham, MA
Usually ships in 1-2 business days
Ships from: Waltham, MA
Usually ships in 1-2 business days
Ships from: Waltham, MA
Usually ships in 1-2 business days
Ships from: acton, MA
Usually ships in 1-2 business days
Regular expressions are a powerful tool for manipulating text and data. If you don't use them yet, you will discover in this book a whole new world of mastery over your data. If you already use them, you'll appreciate this book's unprecedented detail and breadth of coverage. If you think you know all you need to know about regular expressions, this book is a stunning eye-opener.With regular expressions, you can save yourself time and aggravation while dealing with documents, mail messages, log files — you name it — any type of text or data. For example, regular expressions can play a vital role in constructing a World Wide Web CGI script, which can involve text and data of all sorts.Regular expressions are not a tool in and of themselves, but are included as part of a larger utility. The classic example is grep. These days, regular expressions can be found everywhere, such as in:
"Regular expressions" are a powerful tool for data and text manipulation. For example, you might use regular expressions to search through a number of files for certain key words and accompanying phrases, then organize the results into a table of contents or an index. Regular expressions are commonly used in a variety of programming environments (Delphi & Visual C++ for example), scripting languages (Perl, Tcl, awk & Python), editors (emacs, vi, & Nisus Writer), specialized tools (lex, Expect, & sed), and as a "power user" utility in web browsers. This book is intended for novices as well as grizzled experts, and Friedl presents the material in a particularly engaging, almost storylike, manner. He introduces and overviews regular expressions, features, and flavors, covering the mechanics of expression processing and their crafting. regular expressions. Tool-specific information is provided as well.
In the forties, Warren McCulluch and Walter Pitts created neuron-level models of how the nervous system operates. The mathematician, Stephen Kleene, later described these models using his mathematical notation called regular expressions. Ken Thompson incorporated that system of notation into qed (the grandfather of the UNIX ed) and eventually into grep. Ever since that time, regular expressions have constantly seeped into UNIX and UNIX-like utilities. These are the regular expressions that are not used to explain nerves; these are ones used to get on them.
Grep, egrep, vi, sed, lex, awk, emacs, Perl, Tcl, and Python support regular expressions. In fact, regular expressions (regexes) are an essential part of these utilities. Unfortunately, regexes usually don't seem very important in the documentation. Man pages only casually mention regexes with usually no theoretical explanation and very limited practical discussion. This leaves most of us crafting regexes like we're playing the board game Battleship -- keep guessing until we sink a solution.
Mastering Regular Expressions is an important work about an often overlooked concept that permeates UNIX: the regular expression. It is clear and practical. The regular expression is explained by concept and not by rote. Friedl is trying to get us to think in regular expressions and not to blindly parrot (or pirate) his examples.
The book uses quick do-you-really-understand questions at key spots in the text. The book is constructed so that you have to turn the current page to see the answer to a question. Unfortunately, this prevents the "accidental" hint. Be honest. It's human nature to have wondering eyes when a question is too hard. If you can't answer a question in this book, then you know for a fact that you can't answer it. It's too hard to "accidentally" turn a page.
That's only one example of its thoughtful design. The typography is the best use of typography as a communication tool that I have seen. With code snippets like "if ($input =~ m/^([-+]?[0-9]+(\.[0-9]*)?)([CF])$/)", the typography takes you by the hand with a series of underlining, shading, and contrived characters that highlight important sections of the code. Whereas in most books the "Typographical Conventions Used" section is fluff, read and memorize this section before you cross the river regex with Friedl. You can't journey to the other side without it.
So, what exactly is a regular expression? What makes it regular? Don't look for those answers here. Friedl gives you practical concepts, not a theoretical framework. He defines regular expressions as "the key to powerful, flexible, and efficient text processing." He clarifies that with "regular expressions...allow you to describe and parse text." That is the closest thing to a definition of regular expressions in the book. That is also like defining the sun as our source of heat and light, but failing to mention that it is the ball of flame in the center of the solar system. The title of the book is not The Theory of Regular Expressions but Mastering Regular Expressions. Look for practical advice on crafting efficient regexes at your desk and not for conversation tidbits in the break room.--Dr. Dobb's Electronic Review of Computer Books
Start Your Engines!
Let's see how much I can milk this engine analogy for. The whole point of having an engine is so that you can get from Point A to Point B without doing much work. The engine does the work for you so you can relax and enjoy the Rich Corinthian Leather. The engine's primary task is to turn the wheels, and how it does that isn't really a concern of yours. Or is it?
Two Kinds of Engines
Well, what if you had an electric car? They've been around for a long time, but they aren't as common as gas cars because they're hard to design well. If you had one, though, you would have to remember not to put gas in it. If you had a gasoline engine, well, watch out for sparks! An electric engine more or less just runs, but a gas engine might need some babysitting. You can get much better performance just by changing little things like your spark plug gaps, air filter, or brand of gas. Do it wrong and the engine's performance deteriorates, or, worse yet, it stalls.
Each engine might do its work differently, but the end result is that the wheels turn. You still have to steer properly if you want to get anywhere, but that's an entirely different issue.
Let's stoke the fire by adding another variable: the California Emissions Standards. Some engines adhere to California's strict pollution standards, and some engines don't. These aren't really different kinds of engines, just new variations on what's already around. The standard regulates a result of the engine's work, the emissions, but doesn't say one thing or the other about how the engine should go about achieving those cleaner results. So, our two classes of engine are divided into four types: electric (adhering and non-adhering) and gasoline (adhering and non-adhering).
I Come to think of it, I bet that an electric engine can qualify for the standard without much change, so it's not really impacted very much - the standard just "blesses" the clean results that are already par for the course. The gas engine, on the other hand, needs some major tweaking and a bit of re-tooling before it can qualify. Owners of this kind of engine need to pay particular care to what they feed it -use the wrong kind of gas and you're in big trouble in more ways than one.
The impact of standards
Better pollution standards are a good thing, but they require that the driver exercise more thought and foresight (well, at least for gas engines, as I noted in the previous paragraph). Frankly, however, the standard doesn't impact most people since all the other states still do their own thing and don't follow California's standard... yet. It's probably just a matter of time.
Okay, so you realize that these four types of engines can be classified into three groups (the two kinds for gas, and electric in general). You know about the differences, and that in the end they all still turn the wheels. What you don't know is what the heck this has to do with regular expressions!
More than you might imagine.
Regex Engine Types
There are two fundamentally different types of regex engines: one called "DFA" (the electric engine of our story) and one called "NFA" (the gas engine). The details follow shortly ( 101), but for now just consider them names, like Bill and Ted. Or electric and gas.
Both engine types have been around for a long time, but like its gasoline counterpart, the NFA type seems to be used more often. Tools that use an NFA engine include Tcl, Perl, Python, GNU Emacs, ed, sed, vi, most versions of grep, and even a few versions of egrep and awk. On the other hand, a DFA engine is found in almost all versions of egrep and awk, as well as lex and flex. Table 4-1 on the next page lists a few common programs available for a wide variety of platforms and the regex engine that most versions use. A generic version means that it's an old tool with many clones-I have listed notably different clones that I'm aware of.
As Chapter 3 illustrated, 20 years of development with both DFAs and NFAs resulted in a lot of needless variety. Things were dirty. The POSIX standard came in to clean things up by specifying clearly which metacharacters an engine should support, and exactly the results you could expect from them. Superficial details aside, the DFAs (our electric engines) were already well suited to adhere to the standard, but the kind of results an NFA traditionally provided were quite different from the new standard, so changes were needed. As a result, broadly speaking, we have three types of regex engines:
POSIX standardized the workings of over 70 programs, including traditional regex-wielding tools such as awk, ed, egrep, expr, grep, lex, and sed. Most of these tools' regex flavor had (and still have) the weak powers equivalent to a moped. So weak, in fact, that I don't find them interesting for discussing regular expressions. Although they can certainly be extremely useful tools, you won't find much mention of expr, ed, and sed in this book. Well, to be fair, some modern versions of these tools have been retrofitted with a more-powerful flavor. This is commonly done to grep, a direct regex sibling of sed, ed, and expr.
On the other hand, egrep, awk, and lex were normally implemented with the electric DFA engine, so the new standard primarily just confirmed the status quo-no big changes. However, there were some gas-powered versions of these programs which had to be changed if they wanted to be POSIX-compliant. The gas engines that passed the California Emission Standards tests (POSIX NFA) were fine in that they produced results according to the standard, but the necessary changes only enhanced their fickleness to proper tuning. Where before you might get by with slightly misaligned spark plugs, you now have absolutely no tolerance. Gasoline that used to be "good enough" now causes knocks and pings. But so long as you know how to maintain your baby, the engine runs smoothly. And cleanly....
Here, you will learn how to use regular expressions to solve problems and get the most out of tools that provide them. Not only that, but much more: this book is about mastering regular expressions.
If you use a computer, you can benefit from regular expressions all the time (even if you don't realize it). When accessing World Wide Web search engines, with your editor, word processor, configuration scripts, and system tools, regular expressions are often provided as "power user" options. Languages such as Awk, Elisp, Expect, Perl, Python, and Tcl have regular-expression support built in (regular expressions are the very heart of many programs written in these languages), and regular-expression libraries are available for most other languages. For example, quite soon after Java became available, a regular-expression library was built and made freely available on the Web. Regular expressions are found in editors and programming environments such as vi, Delphi, Emacs, Brief, Visual C++, Nisus Writer, and many, many more. Regular expressions are very popular.
There's a good reason that regular expressions are found in so many diverse applications: they are extremely powerful. At a low level, a regular expression describes a chunk of text. You might use it to verify a user's input, or perhaps to sift through large amounts of data. On a higher level, regular expressions allow you to master your data. Control it. Put it to work for you. To master regular expressions is to master your data.
Sadly, that hasn't been the case. Regular-expression documentation is certainly plentiful, and has been available for a long time. (I read my first regular-expression-related manual back in 1981.) The problem, it seems, is that the documentation has traditionally centered on the "low-level view" that I mentioned a moment ago. You can talk all you want about how paints adhere to canvas, and the science of how colors blend, but this won't make you a great painter. With painting, as with any art, you must touch on the human aspect to really make a statement. Regular expressions, composed of a mixture of symbols and text, might seem to be a cold, scientific enterprise, but I firmly believe they are very much creatures of the right half of the brain. They can be an outlet for creativity, for cunningly brilliant programming, and for the elegant solution.
I'm not talented at anything that most people would call art. I go to karaoke bars in Kyoto a lot, but I make up for the lack of talent simply by being loud. I do, however, feel very artistic when I can devise an elegant solution to a tough problem. In much of my work, regular expressions are often instrumental in developing those elegant solutions. Because it's one of the few outlets for the artist in me, I have developed somewhat of a passion for regular expressions. It is my goal in writing this book to share some of that passion.
Anyone who uses GNU Emacs or vi, or programs in Perl, Tcl, Python, or Awk, should find a gold mine of detail, hints, tips, and understanding that can be put to immediate use. The detail and thoroughness is simply not found anywhere else.
Regular expressions are an idea -- one that is implemented in various ways by various utilities (many, many more than are specifically presented in this book). If you master the general concept of regular expressions, it's a short step to mastering a particular implementation. This book concentrates on that idea, so most of the knowledge presented here transcend the utilities used in the examples.