Perl and LWP

Perl soared to popularity as a language for creating and managing web content, but with LWP (Library for WWW in Perl), Perl is equally adept at consuming information on the Web. LWP is a suite of modules for fetching and processing web pages.

The Web is a vast data source that contains everything from stock prices to movie credits, and with LWP all that data is just a few lines of code away. Anything you do on the Web, whether it's buying or selling, reading or writing, or uploading or downloading, from news to e-commerce, can be controlled with Perl and LWP. You can automate web-based purchase orders as easily as you can set up a program to download MP3 files from a web site.

Perl & LWP covers:

  • Understanding LWP and its design
  • Fetching and analyzing URLs
  • Extracting information from HTML using regular expressions and tokens
  • Working with the structure of HTML documents using trees
  • Setting and inspecting HTTP headers and response codes
  • Managing cookies
  • Accessing information that requires authentication
  • Extracting links
  • Cooperating with proxy caches
  • Writing web spiders (also known as robots) in a safe fashion

Perl & LWP includes many step-by-step examples that show how to apply the various techniques. Programs to extract information from the web sites of BBC News, AltaVista,, and the Weather Underground, to name just a few, are explained in detail, so that you understand how and why they work.

Perl programmers who want to automate and mine the web can pick up this book and be immediately productive. Written by a contributor to LWP, and with a foreword by one of LWP's creators, Perl & LWP is the authoritative guide to this powerful and popular toolkit.

This comprehensive guide to LWP and its applications comes with many practical examples. Topics include fetching Web pages, submitting forms, using various techniques for HTML parsing, handling cookies and authentication.


Product Details

  • ISBN-13: 9780596001780
  • Publisher: O'Reilly Media, Incorporated
  • Publication date: 6/28/2002
  • Edition number: 1
  • Pages: 262
  • Product dimensions: 7.08 (w) x 9.20 (h) x 0.68 (d) in

Meet the Author

Sean M. Burke is an active member of the Perl community. Trained as a linguist, he also develops tools for software internationalization and native language preservation.


Table of Contents

Audience for This Book;
Structure of This Book;
Order of Chapters;
Important Standards Documents;
Conventions Used in This Book;
Comments & Questions;
Chapter 1: Introduction to Web Automation;
1.1 The Web as Data Source;
1.2 History of LWP;
1.3 Installing LWP;
1.4 Words of Caution;
1.5 LWP in Action;
Chapter 2: Web Basics;
2.1 URLs;
2.2 An HTTP Transaction;
2.3 LWP::Simple;
2.4 Fetching Documents Without LWP::Simple;
2.5 Example: AltaVista;
2.7 Example: Babelfish;
Chapter 3: The LWP Class Model;
3.1 The Basic Classes;
3.2 Programming with LWP Classes;
3.3 Inside the do_GET and do_POST Functions;
3.4 User Agents;
3.5 HTTP::Response Objects;
3.6 LWP Classes: Behind the Scenes;
Chapter 4: URLs;
4.1 Parsing URLs;
4.2 Relative URLs;
4.3 Converting Absolute URLs to Relative;
4.4 Converting Relative URLs to Absolute;
Chapter 5: Forms;
5.1 Elements of an HTML Form;
5.2 LWP and GET Requests;
5.3 Automating Form Analysis;
5.4 Idiosyncrasies of HTML Forms;
5.5 POST Example: License Plates;
5.6 POST Example:;
5.7 File Uploads;
5.8 Limits on Forms;
Chapter 6: Simple HTML Processing with Regular Expressions;
6.1 Automating Data Extraction;
6.2 Regular Expression Techniques;
6.3 Troubleshooting;
6.4 When Regular Expressions Aren't Enough;
6.5 Example: Extracting Links from a Bookmark File;
6.6 Example: Extracting Links from Arbitrary HTML;
6.7 Example: Extracting Temperatures from Weather Underground;
Chapter 7: HTML Processing with Tokens;
7.1 HTML as Tokens;
7.2 Basic HTML::TokeParser Use;
7.3 Individual Tokens;
7.4 Token Sequences;
7.5 More HTML::TokeParser Methods;
7.6 Using Extracted Text;
Chapter 8: Tokenizing Walkthrough;
8.1 The Problem;
8.2 Getting the Data;
8.3 Inspecting the HTML;
8.4 First Code;
8.5 Narrowing In;
8.6 Rewrite for Features;
8.7 Alternatives;
Chapter 9: HTML Processing with Trees;
9.1 Introduction to Trees;
9.2 HTML::TreeBuilder;
9.3 Processing;
9.4 Example: BBC News;
9.5 Example: Fresh Air;
Chapter 10: Modifying HTML with Trees;
10.1 Changing Attributes;
10.2 Deleting Images;
10.3 Detaching and Reattaching;
10.4 Attaching in Another Tree;
10.5 Creating New Elements;
Chapter 11: Cookies, Authentication, and Advanced Requests;
11.1 Cookies;
11.2 Adding Extra Request Header Lines;
11.3 Authentication;
11.4 An HTTP Authentication Example: The Unicode Mailing Archive;
Chapter 12: Spiders;
12.1 Types of Web-Querying Programs;
12.2 A User Agent for Robots;
12.3 Example: A Link-Checking Spider;
12.4 Ideas for Further Expansion;
Appendix A: LWP Modules;
Appendix B: HTTP Status Codes;
B.1 100s: Informational;
B.2 200s: Successful;
B.3 300s: Redirection;
B.4 400s: Client Errors;
B.5 500s: Server Errors;
Appendix C: Common MIME Types;
Appendix D: Language Tags;
Appendix E: Common Content Encodings;
Appendix F: ASCII Table;
Appendix G: User's View of Object-Oriented Modules;
G.1 A User's View of Object-Oriented Modules;
G.2 Modules and Their Functional Interfaces;
G.3 Modules with Object-Oriented Interfaces;
G.4 What Can You Do with Objects?;
G.5 What's in an Object?;
G.6 What Is an Object Value?;
G.7 So Why Do Some Modules Use Objects?;
G.8 The Gory Details;


Customer Reviews

Average Rating: 4.5 (2 ratings)

  • Anonymous

    Posted October 14, 2002

    Automate HTML retrieval and parsing

    Perl & LWP covers the art of automating HTML page retrieval and parsing. If you've ever wanted to grab news headlines from your favorite website, or automatically check web pages for specific text, this book guides you through the tools and techniques required. In particular, the code snippets, single-use examples, and recurring programs make up a large portion of the text and are accompanied by detailed explanations of how they work and how to modify them. Similar examples are used throughout the book to provide continuity and comparison as more complex topics are introduced. The book's chapters fall into three sections: retrieving web pages, parsing HTML, and advanced topics. Additionally, the seven appendixes provide a great reference for common topics encountered as you write your own LWP-based programs. Chapters 1 to 5 introduce general web concepts, including URLs, HTTP requests and responses, and HTML forms, while also introducing core modules like LWP::UserAgent and HTTP::Response. To help find what you are looking for in HTML, simple regular expressions, token parsing, and tree building are covered in chapters 6 to 9. Lastly, chapters 10 to 12 cover HTML modification using trees, cookies, authentication, and spiders. The code examples presented require a good understanding of Perl syntax, and a basic understanding of HTML is beneficial. A few helpful tips and gotchas are sprinkled throughout the text, but they are often buried, making them hard to find or even notice. One topic I would have liked to see covered was HTTPS. Even if the details of configuring LWP to support HTTPS were omitted, a reference or pointer to online material would have been helpful. Overall, this is a great book for learning LWP and how it can help you automate and simplify HTML parsing and web crawling.

    1 out of 1 people found this review helpful.

  • Anonymous

    Posted July 16, 2002

    Excellent coverage of LWP, packed full of useful examples

    I was definitely interested when I first heard that O'Reilly were publishing a book on LWP. LWP is a definitive collection of Perl modules covering everything you could think of doing with URIs, HTML, and HTTP. While 'web services' are the buzzword-friendly technology of the day, sometimes you need to roll your sleeves up and get a bit dirty scraping screens and hacking at HTML. For such a deep subject, this book weighs in at a slim 242 pages. This is a very good thing. I'm far too busy to read the massive shelf-destroying tomes that seem to be churned out these days. It covers everything you need to know with concise examples, which is what makes this book really shine. You start with the basics using LWP::Simple and progress to more advanced topics using LWP::UserAgent, HTTP::Cookies, and WWW::RobotRules. Sean shows finger-saving tips and shortcuts that take you more than a couple of notches above what you can learn from the lwpcook manpage, with enough depth to satisfy even an experienced LWP hacker. This book is a great reference: just flick through and you'll find a relevant chapter with an example to save the day. Chapters cover filling in forms and extracting data from HTML using regular expressions, then more advanced topics using HTML::TokeParser, and then my preferred tool, the author's own HTML::TreeBuilder. The book ends with a chapter on spidering, with excellent coverage of design and warnings to get you started on your web trawling.

    1 out of 1 people found this review helpful.

  • Anonymous

    Posted December 4, 2010

    No text was provided for this review.
