Refactoring HTML: Improving the Design of Existing Web Applications [NOOK Book]




NOOK Book (eBook)
Price: $31.49 (save 12%)
List price: $35.99


Like any other software system, Web sites gradually accumulate “cruft” over time. They slow down. Links break. Security and compatibility problems mysteriously appear. New features don’t integrate seamlessly. Things just don’t work as well. In an ideal world, you’d rebuild from scratch. But you can’t: there’s no time or money for that. Fortunately, there’s a solution: You can refactor your Web code using easy, proven techniques, tools, and recipes adapted from the world of software development.

In Refactoring HTML, Elliotte Rusty Harold explains how to use refactoring to improve virtually any Web site or application. Writing for programmers and non-programmers alike, Harold shows how to refactor for better reliability, performance, usability, security, accessibility, compatibility, and even search engine placement. Step by step, he shows how to migrate obsolete code to today’s stable Web standards, including XHTML, CSS, and REST—and eliminate chronic problems like presentation-based markup, stateful applications, and “tag soup.”

The book’s extensive catalog of detailed refactorings and practical “recipes for success” are organized to help you find specific solutions fast, and get maximum benefit for minimum effort. Using this book, you can quickly improve site performance now—and make your site far easier to enhance, maintain, and scale for years to come.

Topics covered include

• Recognizing the “smells” of Web code that should be refactored
• Transforming old HTML into well-formed, valid XHTML, one step at a time
• Modernizing existing layouts with CSS
• Updating old Web applications: replacing POST with GET, replacing old contact forms, and refactoring JavaScript
• Systematically refactoring content and links
• Restructuring sites without changing the URLs your users rely upon
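
As a rough illustration of what recognizing a "smell" can look like in practice, the sketch below uses Python's standard html.parser module to count a few constructs that the book's table of contents lists as refactoring targets: center, font, b, i, applet, and embed elements, plus img elements that lack an alt attribute. The names SmellCounter, find_smells, and PRESENTATIONAL are hypothetical and were invented for this example. It is not code from the book, and a real audit would check far more.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements the table of contents singles out for replacement (assumed list,
# chosen for illustration; it is not exhaustive).
PRESENTATIONAL = {"center", "font", "b", "i", "applet", "embed"}


class SmellCounter(HTMLParser):
    """Counts a few markup "smells" in an HTML document (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.smells = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag in PRESENTATIONAL:
            self.smells[f"<{tag}> element"] += 1
        if tag == "img" and "alt" not in dict(attrs):
            self.smells["img without alt"] += 1


def find_smells(html):
    """Return a dict mapping each smell found to how often it occurs."""
    parser = SmellCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.smells)


sample = '<CENTER><FONT color="red">Sale!</FONT></CENTER><img src="logo.gif">'
print(find_smells(sample))
```

Running a counter like this across a site before and after a cleanup pass gives a crude measure of progress. It says nothing about validity or well-formedness, which need a validator.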

This book will be an indispensable resource for Web designers, developers, project managers, and anyone who maintains or updates existing sites. It will be especially helpful to Web professionals who learned HTML years ago, and want to refresh their knowledge with today’s standards-compliant best practices.

Product Details

  • ISBN-13: 9780132701877
  • Publisher: Pearson Education
  • Publication date: 3/30/2012
  • Sold by: Barnes & Noble
  • Format: eBook
  • Edition number: 1
  • Pages: 368
  • File size: 3 MB

Meet the Author

Elliotte Rusty Harold is an internationally respected writer, programmer, and educator. His Cafe con Leche Web site has become one of the most popular sites for information on XML. In addition, he is the author and coauthor of numerous books, the most recent of which are Java I/O (O’Reilly, 2006), Java Network Programming (O’Reilly, 2004), Effective XML (Addison-Wesley, 2003), and XML in a Nutshell (O’Reilly, 2002).

Table of Contents

Foreword by Martin Fowler
Foreword by Bob DuCharme
About the Author

Chapter 1 Refactoring
Why Refactor
When to Refactor
What to Refactor To
Objections to Refactoring

Chapter 2 Tools
Backups, Staging Servers, and Source Code Control
Validators
Testing
Regular Expressions
Tidy
TagSoup

Chapter 3 Well-Formedness
What Is Well-Formedness?
Change Name to Lowercase
Quote Attribute Value
Fill In Omitted Attribute Value
Replace Empty Tag with Empty-Element Tag
Add End-tag
Remove Overlap
Convert Text to UTF-8
Escape Less-Than Sign
Escape Ampersand
Escape Quotation Marks in Attribute Values
Introduce an XHTML DOCTYPE Declaration
Terminate Each Entity Reference
Replace Imaginary Entity References
Introduce a Root Element
Introduce the XHTML Namespace

Chapter 4 Validity
Introduce a Transitional DOCTYPE Declaration
Remove All Nonexistent Tags
Add an alt Attribute
Replace embed with object
Introduce a Strict DOCTYPE Declaration
Replace center with CSS
Replace font with CSS
Replace i with em or CSS
Replace b with strong or CSS
Replace the color Attribute with CSS
Convert img Attributes to CSS
Replace applet with object
Replace Presentational Elements with CSS
Nest Inline Elements inside Block Elements

Chapter 5 Layout
Replace Table Layouts
Replace Frames with CSS Positions
Move Content to the Front
Mark Up Lists as Lists
Replace blockquote/ul Indentation with CSS
Replace Spacer GIFs
Add an ID Attribute
Add Width and Height to an Image

Chapter 6 Accessibility
Convert Images to Text
Add Labels to Form Input
Introduce Standard Field Names
Turn on Autocomplete
Add Tab Indexes to Forms
Introduce Skip Navigation
Add Internal Headings
Move Unique Content to the Front of Links and Headlines
Make the Input Field Bigger
Introduce Table Descriptions
Introduce Acronym Elements
Introduce lang Attributes

Chapter 7 Web Applications
Replace Unsafe GET with POST
Replace Safe POST with GET
Redirect POST to GET
Enable Caching
Prevent Caching
Introduce ETag
Replace Flash with HTML
Add Web Forms 2.0 Types
Replace Contact Forms with mailto Links
Block Robots
Escape User Input

Chapter 8 Content
Correct Spelling
Repair Broken Links
Move a Page
Remove the Entry Page
Hide E-mail Addresses

Appendix A Regular Expressions
Characters That Match Themselves
Metacharacters
Wildcards
Quantifiers

Index



Foreword by Martin Fowler

In just over a decade the Web has gone from a technology with promise to a major part of the world's infrastructure. It's been a fascinating time, and many useful resources have been built in the process. But, as with any technology, we've learned as we go how best to use it and the technology itself has matured to help us use it better.

However complex a web application, it finally hits the glass in the form of HTML—the universal web page description language. HTML is a computer language, albeit a very limited and specialized one. As such, if you want a system that you can evolve easily over time, you need to pay attention to writing HTML that is clear and understandable. But just like any computer language, or indeed any writing at all, it's hard to get it right first time. Clear code comes from writing and rewriting with a determination to create something that is easy to follow.

Rewriting code carries a risk of introducing bugs. Several years ago, I wrote about a technique called refactoring, which is a disciplined way of rewriting code that can greatly reduce the chances of introducing bugs while reworking software. Refactoring has made a big impact on regular software languages. Many programmers use it as part of their daily work to help them keep code clear and enhance their future productivity. Tools have sprung up to automate refactoring tasks, to further improve the workflow.

Just as refactoring can make a big improvement to regular programming, the same basic idea can work with HTML. The refactoring steps are different, but the underlying philosophy is the same. By learning how to refactor your HTML, you can keep your HTML clean and easy to change into the future, allowing you to make the inevitable changes more quickly. These techniques can also allow you to bring web sites into line with the improvements in web technologies, specifically allowing you to move toward supporting XHTML and CSS.

Elliotte Rusty Harold has long had a permanent place on my bookshelf for his work on XML technologies and open source software for XML processing. I've always respected him as a fine programmer and writer. With this book he brings the benefits of refactoring into the HTML world.

—Martin Fowler

Foreword by Bob DuCharme

A key to the success of the World Wide Web has always been the ease with which just about anyone can create a web page and put it where everyone can see it. As people create sets of interlinked pages, their web sites become more useful to a wider audience, and stories of web millionaires inspire these web developers to plan greater things.

Many find, though, that as their web sites get larger, they have growing pains. Revised links lead to nowhere, pages look different in different browsers, and it becomes more difficult to keep track of what's where, especially when trying to apply changes consistently throughout the site. This is when many who built their own web site call in professional help, but now with Refactoring HTML, you can become that professional. And, if you're already a web pro, you can become a better one.

There are many beginner-level introductions to web technologies, but this book is the first to tie together intermediate-level discussions of all the key technologies for creating professional, maintainable, accessible web sites. You may already be an expert in one or two of the topics covered by this book, but few people know all of them as well as Elliotte, and he's very good at explaining them. (I know XML pretty well, but this book has shown me that some simple changes to my CSS habits will benefit all of the web pages I've created.)

For each recommendation in the book, Elliotte lays out the motivation for why it's a good idea, the potential trade-offs for following the recommendation, and the mechanics of implementing it, giving you a full perspective on the how and why of each tip. For detecting problems, I'll stop short of comparing his use of smell imagery with Proust's, but it's pretty evocative nevertheless.

I've read several of Elliotte's books, but not all of them. When I heard that Refactoring HTML was on the way, I knew right away that I'd want to read it, and I was glad to get an advance look. I learned a lot, and I know that you will, too.

—Bob DuCharme
Solutions Architect, Innodata Isogen

© Copyright Pearson Education. All rights reserved.


Customer Reviews

  • Anonymous

    Posted May 12, 2008

    use CSS and XHTML

    The Web means mostly webpages written in HTML. The popularity of HTML is overwhelming, yet it has well-known problems: there is no intrinsic separation of semantic content from presentation details, and the tag syntax is very sloppy. Harold explains in clear and strong terms why you should clean up your webpages, mostly by using CSS and by making sure [and checking] that the pages are well formed and valid under XHTML. This is not a text on CSS; if you are going to follow the precepts of the book, you will need another book dedicated to CSS.

    The strength of Harold's message is in its clarity. He is trying to influence you in a top-down manner, to make these strategic decisions. For example, by going with CSS, you simplify maintenance, because files are factored into CSS files, which layout people can work on, and semantic content files, which can be the purview of others who are more involved with intrinsic information processing. The latter files also have the advantage that they can be used with different types of display devices and programs, not just the typical web browser. Think of cellphones, or devices for the blind.

    That last point is another good one he makes. Writing pages that are also accessible to the blind is not just good for that reason. It lets you focus not on what the page looks like, but on what it means. Why is this good? Because it improves the chance that search engines will look at and positively classify your semantic files. Search engines often deprecate presentation instructions and CSS files; they are looking for files with high semantic content. Also, by factoring using CSS files, the resultant set of files gets smaller, which reduces outgoing bandwidth from your web server. For large, popular sites, this can be a cost saving.

    Writing well-formed and [better yet] XHTML-valid pages also increases the chances that different browsers can accurately show the pages. The reason is that browsers have been written to pragmatically show HTML where the tag structure is sloppy. To do this, a browser has to make certain display assumptions with a badly written file. The problem is that different browsers make different assumptions, and so some HTML files will not display well, or at all.

    There are also smaller tips scattered through the book. Suppose you have an image that shows essentially only text. Replace the image with text: less bandwidth is consumed, and search engines don't really do much with images. [Image analysis is very intensive and hard.] So giving them more meaningful text instead of images helps your page ranking. As a side note, some spammers do precisely the opposite: they use images that mostly display text, to evade a search engine or antispam software that keys off suspicious text. Relatedly, you should always have an alt attribute describing the image. It helps the blind visitor, but mostly it helps a search engine classify the image.

    There is one unintended irony in the book's last page. It talks about hiding your email address in the webpage from screen-scraper bots run by spammers harvesting email addresses. One way is to use JavaScript to generate the address, where the script is run by the visitor's browser as it displays the page. This is to evade spammers. The irony is that a spammer can use this very method when sending spam email. Many antispam programs now use a blacklist, since spam often has links to the spammer's domain. But the programs usually [always?] check against static links in an email. The spammer can write JavaScript that dynamically makes links, to evade this. Sure, browsers that have JavaScript turned off will not show these links. But in fact, most users turn JavaScript on, because many websites use it. And the spammer might figure that the loss of links due to no JavaScript is greatly outweighed by being able to evade the now almost axiomatic use of blacklists by antispam
