Chapter 1: The XML Galaxy
XML stands for the eXtensible Markup Language. It is a new markup language, developed by the W3C (World Wide Web Consortium), mainly to overcome limitations in HTML. The W3C is the organization in charge of the development and maintenance of most Web standards, most notably HTML. For more information on the W3C, visit its Web site at www.w3.org.
HTML is an immensely popular markup language. According to some studies there are 800 million Web pages, all based on HTML. HTML is supported by thousands of applications including browsers, editors, email software, databases, contact managers, and more.
Originally, the Web was a solution for publishing scientific documents. Today it has grown into a full-fledged medium, equal to print and TV. More importantly, the Web is an interactive medium because it supports applications such as online shops, electronic banking, trading, and forums.
To accommodate this phenomenal popularity, HTML has been extended over the years. Many new tags have been introduced. The first version of HTML had a dozen tags; the latest version (HTML 4.0) has close to 100 tags (not counting browser-specific tags).
However, not everything is rosy with HTML. It has grown into a complex language. At almost 100 tags, it is definitely not a small language. The combinations of tags are almost endless, and the result of a particular combination of tags might differ from one browser to another.
Finally, despite all the tags already included in HTML, more are needed. Electronic commerce applications need tags for product references, prices, names, addresses, and more. Streaming needs tags to control the flow of images and sound. Search engines need more precise tags for keywords and descriptions. Security needs tags for signing. The list of applications that need new HTML tags is almost endless.
However, adding even more tags to an overblown language is hardly a satisfactory solution. It appears that HTML is already on the verge of collapsing under its own weight, so why continue adding tags?
Worse, although many applications need more tags, some applications would greatly benefit if there were fewer, not more, tags in HTML. The W3C expects that by the year 2002, 75% of surfers won't be using a PC. Rather, they will access the Web from a personal digital assistant, such as the popular PalmPilot, or from so-called smart phones.
These machines are not as powerful as PCs. They cannot process a complex language like HTML, much less a version of HTML that would include more tags.
Another, but related, problem is that it takes many tags to format a page. It is not uncommon to see pages that have more markup than content! These pages are slow to download and to display.
In conclusion, even though HTML is a popular and successful markup language, it has some major shortcomings. XML was developed to address these shortcomings. It was not introduced for the sake of novelty.
XML exists because HTML was successful. Therefore, XML incorporates many successful features of HTML. XML also exists because HTML could not live up to new demands. Therefore, XML breaks new ground where it is appropriate.
It is difficult to change a successful technology like HTML, so, not surprisingly, XML has raised some level of controversy.
Let's make it clear: XML is unlikely to replace HTML in the near or medium term. XML does not threaten the Web but introduces new possibilities. Work is already under way to combine XML and HTML in XHTML, an XML version of HTML. At the time of this writing, XHTML version 1.0 is not yet finalized. However, it is expected that XHTML will soon be adopted by the W3C.
Some of the areas where XML will be useful in the near term include:
- large Web site maintenance. XML would work behind the scenes to simplify the creation of HTML documents
- exchange of information between organizations
- offloading and reloading of databases
- syndicated content, where content is being made available to different Web sites
- electronic commerce applications where different organizations collaborate to serve a customer
- scientific applications with new markup languages for mathematical and chemical formulas
- electronic books with new markup languages to express rights and ownership
- handheld devices and smart phones with new markup languages optimized for these "alternative" devices
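To make the information-exchange and electronic-commerce items above more concrete, here is a minimal sketch of what a document with application-specific tags might look like and how a program could read it. Every element and attribute name in it (order, customer, product, reference, price, currency) is invented for illustration and belongs to no standard vocabulary. The sketch uses Python's standard xml.etree.ElementTree module only for brevity; it is not one of the book's own examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document: all tag and attribute names are assumptions
# made up for this sketch, not part of any published markup language.
ORDER_XML = """\
<order>
  <customer>
    <name>John Doe</name>
    <address>34 Fountain Square Plaza</address>
  </customer>
  <product reference="XML-001">
    <description>XML Editor</description>
    <price currency="USD">499.00</price>
  </product>
</order>
"""


def summarize(xml_text):
    """Return (customer name, product reference, price as a float)."""
    root = ET.fromstring(xml_text)
    name = root.findtext("customer/name")
    product = root.find("product")
    reference = product.get("reference")
    price = float(product.findtext("price"))
    return name, reference, price


print(summarize(ORDER_XML))
```

Because the tags name the data they hold (a price, a product reference) rather than how it should be displayed, the receiving organization can pick out the fields it needs without guessing at the layout.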
You don't need to know Java to read this book. There is very little Java involved (again, most of the code in the final example is based on techniques that you will learn in this book) and Appendix A, "Crash Course on Java," will teach you just enough Java to understand the examples.
A First Look at XML
The idea behind XML is deceptively simple. It aims at answering the conflicting demands that arrive at the W3C for the future of HTML.
On one hand, people need more tags. And these new tags are increasingly specialized. For example, mathematicians want tags for formulas. Chemists also want tags for formulas but they are not the same.
On the other hand, authors and developers want fewer tags. HTML is already so complex! As handheld devices gain in popularity, the need for a simpler markup language is also apparent because small devices, like the PalmPilot, are not powerful enough to process HTML pages...