Perl & LWP
by Sean M. Burke
Perl soared to popularity as a language for creating and managing web content, but with LWP (Library for WWW in Perl), Perl is equally adept at consuming information on the Web. LWP is a suite of modules for fetching and processing web pages.
The Web is a vast data source that contains everything from stock prices to movie credits, and with LWP all that data is just a few lines of code away. Anything you do on the Web, whether it's buying or selling, reading or writing, uploading or downloading, from news to e-commerce, can be controlled with Perl and LWP. You can automate web-based purchase orders as easily as you can set up a program to download MP3 files from a web site.
Perl & LWP covers:
- Understanding LWP and its design
- Fetching and analyzing URLs
- Extracting information from HTML using regular expressions and tokens
- Working with the structure of HTML documents using trees
- Setting and inspecting HTTP headers and response codes
- Accessing information that requires authentication
- Cooperating with proxy caches
- Writing web spiders (also known as robots) in a safe fashion
Perl & LWP includes many step-by-step examples that show how to apply the various techniques. Programs that extract information from the web sites of BBC News, AltaVista, ABEBooks.com, and the Weather Underground, to name just a few, are explained in detail, so that you understand how and why they work.
Perl programmers who want to automate and mine the web can pick up this book and be immediately productive. Written by a contributor to LWP, and with a foreword by one of LWP's creators, Perl & LWP is the authoritative guide to this powerful and popular toolkit.
- Publisher: O'Reilly Media, Incorporated
- Publication date:
- Sold by: Barnes & Noble
- Format: NOOK Book
- File size: 3 MB
Most Helpful Customer Reviews
Perl & LWP covers the art of automating HTML page retrieval and parsing. If you've ever wanted to grab news headlines from your favorite website, or automatically check web pages for specific text, this book guides you through the tools and techniques required. In particular, the code snippets, single-use examples, and recurring programs make up a large portion of the text and are accompanied by detailed explanations of how they work and how to modify them. Similar examples are used throughout the book to provide continuity and comparison as more complex topics are introduced. The book's chapters fall into three sections: retrieving web pages, parsing HTML, and advanced topics. Additionally, the seven appendixes provide a great reference for common topics encountered as you write your own LWP-based programs. Chapters 1 to 5 introduce general web concepts, including URLs, HTTP requests and responses, and HTML forms, while also introducing core modules like LWP::UserAgent and HTTP::Response. To help you find what you are looking for in HTML, chapters 6 to 9 cover simple regular expressions, token parsing, and tree building. Lastly, chapters 10 to 12 cover HTML modification using trees, cookies, authentication, and spiders. The code examples require a good understanding of Perl syntax, and a basic understanding of HTML is beneficial. A few helpful tips and gotchas are sprinkled through the text, but they are often buried, making them hard to find or even notice. One topic I would have liked to see covered is HTTPS. Even if the details of configuring LWP to support HTTPS were omitted, a reference or pointer to online material would have been helpful. Overall, this is a great book for learning LWP and how it can help you automate and simplify HTML parsing and web crawling.
I was definitely interested when I first heard that O'Reilly were publishing a book on LWP. LWP is a definitive collection of Perl modules covering everything you could think of doing with URIs, HTML, and HTTP. While 'web services' are the buzzword-friendly technology of the day, sometimes you need to roll your sleeves up and get a bit dirty scraping screens and hacking at HTML. For such a deep subject, this book weighs in at a slim 242 pages. This is a very good thing. I'm far too busy to read the massive shelf-destroying tomes that seem to be churned out recently. It covers everything you need to know with concise examples, which is what makes this book really shine. You start with the basics using LWP::Simple and progress to more advanced topics using LWP::UserAgent, HTTP::Cookies, and WWW::RobotRules. Sean shows finger-saving tips and shortcuts that take you more than a couple of notches above what you can learn from the lwpcook manpage, with enough depth to satisfy an experienced LWP hacker. This book is a great reference: just flick through and you'll find a relevant chapter with an example to save the day. Chapters include filling in forms and extracting data from HTML using regular expressions, then more advanced topics using HTML::TokeParser, and then my preferred tool, the author's own HTML::TreeBuilder. The book ends with a chapter on spidering, with excellent coverage of design and warnings to get you started on your web trawling.