I have been reading several introductory Perl books recently and thought Data Munging with Perl, by David Cross, looked like a good second Perl book. After all, what the author calls data munging (reading and writing data, and converting data from one format to another) is firmly within the computing mainstream. And the Web has not given us any less data or any fewer data formats to deal with.
But what is "munging"? Perl's interpreted nature and obscure-looking syntax appeal to me. Perhaps it is this same bent that causes me veritable excitement upon reading the assertion in Chapter 2, that "most data munging tasks look like: read input, Munge (or process), write output."
In a day in which the prominent development approach consists of objects whose interactions are not known until run time, and where a hot development technique more resembles a new variety of computing buddy system, this assertion has an appealing historical ring to it. This is something COBOL programmers and programmers coding UNIX filters can agree upon.
Cross clearly believes that fiddling with data and converting among formats are still important in the life of the working programmer, and that Perl is the language for the task.
For instance, UNIX-style filters are high on the list of techniques Cross recommends for munging. Other tips are "don't throw anything away" (sometimes it pays to read in more data than you currently need), design your data structures well in the beginning, and "don't do too much processing in the input routine." The third means to leave something for the munging (processing) routine to do.
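The read/munge/write shape and the tips above can be sketched in a few lines of Perl. This is only an illustration, not code from the book: the colon-separated record layout, the field names, and the helper names read_records, munge, and write_report are all assumptions made for the example.

```perl
use strict;
use warnings;

# Read stage: keep every field of every record ("don't throw anything away")
# and do no processing beyond splitting the line into a hash.
sub read_records {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;    # skip blank lines
        # Assumed layout: name:dept:salary
        my ($name, $dept, $salary) = split /:/, $line;
        push @records, { name => $name, dept => $dept, salary => $salary };
    }
    return \@records;
}

# Munge stage: the real work happens here, here a salary total per department.
sub munge {
    my ($records) = @_;
    my %total;
    $total{ $_->{dept} } += $_->{salary} for @$records;
    return \%total;
}

# Write stage: one tab-separated line per department, sorted by name.
sub write_report {
    my ($fh, $total) = @_;
    print {$fh} "$_\t$total->{$_}\n" for sort keys %$total;
}

# Demo on in-memory filehandles; a real filter would pass \*STDIN and \*STDOUT.
my $input = "alice:eng:100\nbob:ops:80\ncarol:eng:120\n";
open my $in, '<', \$input or die "cannot open in-memory input: $!";
my $output = '';
open my $out, '>', \$output or die "cannot open in-memory output: $!";
write_report($out, munge(read_records($in)));
close $out;
print $output;    # eng 220, then ops 80, tab-separated
```

Keeping the three stages in separate subroutines means the input format or the report format can change without touching the processing in the middle.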
Data Munging with Perl's 12 chapters address progressively more "interesting" types of data, with strategies for dealing with each. Each section includes an introductory rationale followed by examples in Perl.
Cross sometimes refines these examples. My favorite section, "How Not To Parse HTML" in Chapter 8 (by annihilating everything between "<" and ">"), is followed in Chapter 9 by several Perl add-in modules that do the parsing for you.
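To see why deleting everything between "<" and ">" earns a "how not to" label, consider the following sketch. It is not the book's code; the helper name strip_tags_naively and the sample markup are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: the naive approach, deleting everything
# from a "<" up to the next ">".
sub strip_tags_naively {
    my ($html) = @_;
    (my $text = $html) =~ s/<[^>]*>//g;
    return $text;
}

# Works on simple markup...
my $simple = strip_tags_naively('<p>Hello, <b>world</b>!</p>');

# ...but a ">" inside an attribute value ends the "tag" too early
# and leaves attribute debris behind.
my $broken = strip_tags_naively('<p>1 <b>bold</b> <img alt="x > y"></p>');

print "$simple\n";    # Hello, world!
print "$broken\n";    # 1 bold  y">
```

The regular expression has no idea that the second ">" sits inside a quoted attribute, which is the kind of case a real parsing module is built to handle.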
Techniques are also presented from simple to complex. Reading data line-by-line into an array of strings or an array of hashes may work for record-oriented data (Chapter 6), whereas parsing with an extension module (XML::Parser) works better for data with strict idiosyncratic structure.
Chapter 7 provides a discussion on reading binary with read() and unpack() that I found useful.
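As a rough illustration of the read() and unpack() combination, here is a minimal sketch. The record layout (a 16-bit ID, a 32-bit count, and an 8-byte space-padded name), the template, and the helper name read_binary_records are assumptions invented for this example, not taken from the book.

```perl
use strict;
use warnings;

# Assumed record layout: unsigned 16-bit big-endian ID (n),
# unsigned 32-bit big-endian count (N), 8-byte space-padded name (A8).
my $TEMPLATE    = 'n N A8';
my $RECORD_SIZE = length pack $TEMPLATE, 0, 0, '';    # 2 + 4 + 8 = 14 bytes

# Read fixed-size records until end of file, decoding each with unpack().
sub read_binary_records {
    my ($fh) = @_;
    binmode $fh;
    my @records;
    while (1) {
        my $got = read($fh, my $buf, $RECORD_SIZE);
        die "read failed: $!" unless defined $got;
        last if $got == 0;                               # clean end of file
        die "truncated record" if $got < $RECORD_SIZE;
        my ($id, $count, $name) = unpack $TEMPLATE, $buf;
        push @records, { id => $id, count => $count, name => $name };
    }
    return \@records;
}

# Demo: build two records with pack() and read them back from memory.
my $data = pack($TEMPLATE, 1, 70000, 'widget') . pack($TEMPLATE, 2, 5, 'gear');
open my $fh, '<', \$data or die "cannot open in-memory data: $!";
my $records = read_binary_records($fh);
printf "%d %d %s\n", @{$_}{qw(id count name)} for @$records;
```

The same template string drives both pack() and unpack(), so the record size and the field decoding cannot drift apart.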
I also like the explanation of regular expressions in Chapter 4, which starts with simple examples that become more all-encompassing. For example, /regular expression/ matches the literal text "regular expression" and /[a-z]/ matches any single lowercase letter.
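The two patterns quoted above behave as follows in practice. The third pattern, a key = value matcher, is my own hypothetical example of a "more encompassing" expression and does not come from the book.

```perl
use strict;
use warnings;

my @results;

# A literal pattern matches wherever that exact text appears.
push @results, ("a regular expression" =~ /regular expression/) ? 1 : 0;

# A character class matches any one character from the set.
push @results, ("ABC" =~ /[a-z]/) ? 1 : 0;    # no lowercase letter present
push @results, ("ABc" =~ /[a-z]/) ? 1 : 0;    # the "c" matches

# Hypothetical larger pattern: capture a key and a value around "=".
my ($key, $value) = "colour = blue" =~ /^(\w+)\s*=\s*(.*)$/;

print "@results\n";        # 1 0 1
print "$key -> $value\n";  # colour -> blue
```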
The same goes for the explanation of parsers (page 159): tokens, rules, grammars, top-down and bottom-up parsing. This is lucid stuff for such a short space and a topic that has so much theory attached to it.
However, Data Munging with Perl suffers from the "same page syndrome."
While writing a Java book, I was beset by the urge (which I now think misguided) to write the "short history of programming languages" in an introductory chapter. You know the stuff: Languages evolved from machine language to assembler, which gave way to high-level languages. My editor averred, "That's fine to include, Doug, just so everyone is on the same page."
This attitude stems from the goal of publishers to sell a book to a cross-over audience such as Intermediate to Advanced. The result is that there are conservatively hundreds of computer books on the market that say the same things.
And this syndrome also subjects readers to some strange contradictions. On page 139 of Data Munging, you have the author introducing ASCII text, what it is and how it takes more space to store than the same data in binary. But the Perl examples in this book can only be understood by a veteran. If I can read the Perl in this book without help, why would I not know about ASCII already?
But this minor flaw only annoyed me a little, making Data Munging a bit wordier than it might have been. With its narrow focus on a language of current interest, this book does not quite rise to the level of "Software Tools," but it still shows some good Perl programming, and provides convincing evidence of the value of data structures beyond the halls of academia along the way.
I found the sample problems and the author's solutions to be very
well done. I especially liked the design tips...
Well worth the price, and a good starting point for more advanced
A very good resource for programmers who want to learn more about
data parsing, data filters, and data conversion...
"The book's chapters are concise, the coverage is comprehensive, and the examples are plentiful and relevant. I've been using Perl's data munging capabilities heavily for many years, and I still picked up some useful new insights from Cross' book."
"Coders looking to transform data somehow and hackers who want to take advantage of Perl's unique features will improve their knowledge and understanding. If you find yourself working with files or records in Perl, this book will save you time and trouble."
"Munging" is a computer term referring to the process of data conversion. Perl is particularly well suited to data munging and this programmer's guide provides advice on how to most efficiently manipulate data using Perl. After the manipulation of unstructured, record-oriented, fixed-width, and binary data is explored, the work moves into the realms of hierarchical data structures and parsers such as HTML and XML parsing tools. Finally, a demonstration of how to write one's own parsers for data structures is provided. Annotation © Book News, Inc., Portland, OR (booknews.com)