Mastering Regular Expressions: Understand Your Data and Be More Productive

by Jeffrey Friedl

Paperback (Third Edition)

Price: $53.99 (original price $59.99; save 10%)

Overview

Regular expressions are an extremely powerful tool for manipulating text and data. They are now standard features in a wide range of languages and popular tools, including Perl, Python, Ruby, Java, VB.NET and C# (and any language using the .NET Framework), PHP, and MySQL.

If you don't use regular expressions yet, you will discover in this book a whole new world of mastery over your data. If you already use them, you'll appreciate this book's unprecedented detail and breadth of coverage. If you think you know all you need to know about regular expressions, this book is a stunning eye-opener.

As this book shows, a command of regular expressions is an invaluable skill. Regular expressions allow you to code complex and subtle text processing that you never imagined could be automated. Regular expressions can save you time and aggravation. They can be used to craft elegant solutions to a wide range of problems. Once you've mastered regular expressions, they'll become an invaluable part of your toolkit. You will wonder how you ever got by without them.

Despite their wide availability, flexibility, and unparalleled power, regular expressions are frequently underutilized. Yet what is power in the hands of an expert can be fraught with peril for the unwary. Mastering Regular Expressions will help you navigate the minefield to becoming an expert and help you optimize your use of regular expressions.

Mastering Regular Expressions, Third Edition, now includes a full chapter devoted to PHP and its powerful and expressive suite of regular expression functions, in addition to enhanced PHP coverage in the central "core" chapters. Furthermore, this edition has been updated throughout to reflect advances in other languages, including expanded in-depth coverage of Sun's java.util.regex package, which has emerged as the standard Java regex implementation.

Topics include:

  • A comparison of features among different versions of many languages and tools
  • How the regular expression engine works
  • Optimization (major savings available here!)
  • Matching just what you want, but not what you don't want
  • Sections and chapters on individual languages

Written in the lucid, entertaining tone that makes a complex, dry topic crystal-clear to programmers, and sprinkled with solutions to complex real-world problems, Mastering Regular Expressions, Third Edition offers a wealth of information that you can put to immediate use.

Reviews of this new edition and the second edition:

"There isn't a better (or more useful) book available on regular expressions."

—Zak Greant, Managing Director, eZ Systems

"A real tour-de-force of a book which not only covers the mechanics of regexes in extraordinary detail but also talks about efficiency and the use of regexes in Perl, Java, and .NET...If you use regular expressions as part of your professional work (even if you already have a good book on whatever language you're programming in) I would strongly recommend this book to you."

—Dr. Chris Brown, Linux Format

"The author does an outstanding job leading the reader from regex novice to master. The book is extremely easy to read and chock full of useful and relevant examples...Regular expressions are valuable tools that every developer should have in their toolbox. Mastering Regular Expressions is the definitive guide to the subject, and an outstanding resource that belongs on every programmer's bookshelf. Ten out of Ten Horseshoes."

—Jason Menard, Java Ranch


Product Details

ISBN-13: 9780596528126
Publisher: O'Reilly Media, Incorporated
Publication date: 08/15/2006
Edition description: Third Edition
Pages: 542
Sales rank: 395,100
Product dimensions: 7.00(w) x 9.19(h) x 1.30(d)

About the Author

Jeffrey Friedl was raised in the countryside of Rootstown, Ohio, and had aspirations of being an astronomer until one day he noticed a TRS-80 Model I sitting unused in the corner of the chem lab (bristling with a full 16K of RAM, no less). He eventually began using Unix (and regular expressions) in 1980, and earned degrees in Computer Science from Kent (BS) and the University of New Hampshire (MS). He did kernel development for Omron Corporation in Kyoto, Japan for eight years before moving in 1997 to Silicon Valley to apply his regular-expression know-how to financial news and data for a little-known company called "Yahoo!"



When faced with the daunting task of filling his copious free time, Jeffrey enjoys playing Ultimate Frisbee and basketball with friends at Yahoo!, programming his house, and feeding the squirrels and jays in his back yard. He also enjoys spending time with his wife Fumie, and preparing for the Fall 2002 release of their first "software project" together.

Table of Contents

Preface
1. Introduction to Regular Expressions
    Solving Real Problems
    Regular Expressions as a Language
    The Filename Analogy
    The Language Analogy
    The Regular-Expression Frame of Mind
    If You Have Some Regular-Expression Experience
    Searching Text Files: Egrep
    Egrep Metacharacters
    Start and End of the Line
    Character Classes
    Matching Any Character with Dot
    Alternation
    Ignoring Differences in Capitalization
    Word Boundaries
    In a Nutshell
    Optional Items
    Other Quantifiers: Repetition
    Parentheses and Backreferences
    The Great Escape
    Expanding the Foundation
    Linguistic Diversification
    The Goal of a Regular Expression
    A Few More Examples
    Regular Expression Nomenclature
    Improving on the Status Quo
    Summary
    Personal Glimpses
2. Extended Introductory Examples
    About the Examples
    A Short Introduction to Perl
    Matching Text with Regular Expressions
    Toward a More Real-World Example
    Side Effects of a Successful Match
    Intertwined Regular Expressions
    Intermission
    Modifying Text with Regular Expressions
    Example: Form Letter
    Example: Prettifying a Stock Price
    Automated Editing
    A Small Mail Utility
    Adding Commas to a Number with Lookaround
    Text-to-HTML Conversion
    That Doubled-Word Thing
3. Overview of Regular Expression Features and Flavors
    A Casual Stroll Across the Regex Landscape
    The Origins of Regular Expressions
    At a Glance
    Care and Handling of Regular Expressions
    Integrated Handling
    Procedural and Object-Oriented Handling
    A Search-and-Replace Example
    Search and Replace in Other Languages
    Care and Handling: Summary
    Strings, Character Encodings, and Modes
    Strings as Regular Expressions
    Character-Encoding Issues
    Regex Modes and Match Modes
    Common Metacharacters and Features
    Character Representations
    Character Classes and Class-Like Constructs
    Anchors and Other "Zero-Width Assertions"
    Comments and Mode Modifiers
    Grouping, Capturing, Conditionals, and Control
    Guide to the Advanced Chapters
4. The Mechanics of Expression Processing
    Start Your Engines!
    Two Kinds of Engines
    New Standards
    Regex Engine Types
    From the Department of Redundancy Department
    Testing the Engine Type
    Match Basics
    About the Examples
    Rule 1: The Match That Begins Earliest Wins
    Engine Pieces and Parts
    Rule 2: The Standard Quantifiers Are Greedy
    Regex-Directed Versus Text-Directed
    NFA Engine: Regex-Directed
    DFA Engine: Text-Directed
    First Thoughts: NFA and DFA in Comparison
    Backtracking
    A Really Crummy Analogy
    Two Important Points on Backtracking
    Saved States
    Backtracking and Greediness
    More About Greediness and Backtracking
    Problems of Greediness
    Multi-Character "Quotes"
    Using Lazy Quantifiers
    Greediness and Laziness Always Favor a Match
    The Essence of Greediness, Laziness, and Backtracking
    Possessive Quantifiers and Atomic Grouping
    Possessive Quantifiers, ?+, *+, ++, and {m,n}+
    The Backtracking of Lookaround
    Is Alternation Greedy?
    Taking Advantage of Ordered Alternation
    NFA, DFA, and POSIX
    "The Longest-Leftmost"
    POSIX and the Longest-Leftmost Rule
    Speed and Efficiency
    Summary: NFA and DFA in Comparison
    Summary
5. Practical Regex Techniques
    Regex Balancing Act
    A Few Short Examples
    Continuing with Continuation Lines
    Matching an IP Address
    Working with Filenames
    Matching Balanced Sets of Parentheses
    Watching Out for Unwanted Matches
    Matching Delimited Text
    Knowing Your Data and Making Assumptions
    Stripping Leading and Trailing Whitespace
    HTML-Related Examples
    Matching an HTML Tag
    Matching an HTML Link
    Examining an HTTP URL
    Validating a Hostname
    Plucking Out a URL in the Real World
    Extended Examples
    Keeping in Sync with Your Data
    Parsing CSV Files
6. Crafting an Efficient Expression
    A Sobering Example
    A Simple Change--Placing Your Best Foot Forward
    Efficiency Versus Correctness
    Advancing Further--Localizing the Greediness
    Reality Check
    A Global View of Backtracking
    More Work for a POSIX NFA
    Work Required During a Non-Match
    Being More Specific
    Alternation Can Be Expensive
    Benchmarking
    Know What You're Measuring
    Benchmarking with Java
    Benchmarking with VB.NET
    Benchmarking with Python
    Benchmarking with Ruby
    Benchmarking with Tcl
    Common Optimizations
    No Free Lunch
    Everyone's Lunch is Different
    The Mechanics of Regex Application
    Pre-Application Optimizations
    Optimizations with the Transmission
    Optimizations of the Regex Itself
    Techniques for Faster Expressions
    Common Sense Techniques
    Expose Literal Text
    Expose Anchors
    Lazy Versus Greedy: Be Specific
    Split Into Multiple Regular Expressions
    Mimic Initial-Character Discrimination
    Use Atomic Grouping and Possessive Quantifiers
    Lead the Engine to a Match
    Unrolling the Loop
    Method 1: Building a Regex From Past Experiences
    The Real "Unrolling-the-Loop" Pattern
    Method 2: A Top-Down View
    Method 3: An Internet Hostname
    Observations
    Using Atomic Grouping and Possessive Quantifiers
    Short Unrolling Examples
    Unrolling C Comments
    The Freeflowing Regex
    A Helping Hand to Guide the Match
    A Well-Guided Regex is a Fast Regex
    Wrapup
    In Summary: Think!
7. Perl
    Regular Expressions as a Language Component
    Perl's Greatest Strength
    Perl's Greatest Weakness
    Perl's Regex Flavor
    Regex Operands and Regex Literals
    How Regex Literals Are Parsed
    Regex Modifiers
    Regex-Related Perlisms
    Expression Context
    Dynamic Scope and Regex Match Effects
    Special Variables Modified by a Match
    The qr/.../ Operator and Regex Objects
    Building and Using Regex Objects
    Viewing Regex Objects
    Using Regex Objects for Efficiency
    The Match Operator
    Match's Regex Operand
    Specifying the Match Target Operand
    Different Uses of the Match Operator
    Iterative Matching: Scalar Context, with /g
    The Match Operator's Environmental Relations
    The Substitution Operator
    The Replacement Operand
    The /e Modifier
    Context and Return Value
    The Split Operator
    Basic Split
    Returning Empty Elements
    Split's Special Regex Operands
    Split's Match Operand with Capturing Parentheses
    Fun with Perl Enhancements
    Using a Dynamic Regex to Match Nested Pairs
    Using the Embedded-Code Construct
    Using local in an Embedded-Code Construct
    A Warning About Embedded Code and my Variables
    Matching Nested Constructs with Embedded Code
    Overloading Regex Literals
    Problems with Regex-Literal Overloading
    Mimicking Named Capture
    Perl Efficiency Issues
    "There's More Than One Way to Do It"
    Regex Compilation, the /o Modifier, qr/.../, and Efficiency
    Understanding the "Pre-Match" Copy
    The Study Function
    Benchmarking
    Regex Debugging Information
    Final Comments
8. Java
    Judging a Regex Package
    Technical Issues
    Social and Political Issues
    Object Models
    A Few Abstract Object Models
    Growing Complexity
    Packages, Packages, Packages
    Why So Many "Perl5" Flavors?
    Lies, Damn Lies, and Benchmarks
    Recommendations
    Sun's Regex Package
    Regex Flavor
    Using java.util.regex
    The Pattern.compile() Factory
    The Matcher Object
    Other Pattern Methods
    A Quick Look at Jakarta-ORO
    ORO's Perl5Util
    A Mini Perl5Util Reference
    Using ORO's Underlying Classes
9. .NET
    .NET's Regex Flavor
    Additional Comments on the Flavor
    Using .NET Regular Expressions
    Regex Quickstart
    Package Overview
    Core Object Overview
    Core Object Details
    Creating Regex Objects
    Using Regex Objects
    Using Match Objects
    Using Group Objects
    Static "Convenience" Functions
    Regex Caching
    Support Functions
    Advanced .NET
    Regex Assemblies
    Matching Nested Constructs
    Capture Objects
Index