SGML for Dummies


SGML (Standard Generalized Markup Language) begat HTML (HyperText Markup Language), the simple but essential coding scheme behind every one of the millions of pages that make up the World Wide Web. HTML is a mere subset of SGML, so just imagine the benefits of using a robust document formatting language that's independent of any particular computer platform or desktop publishing software.

With SGML For Dummies, you'll not only imagine those benefits but also reap them. Author Bill von Hagen provides practical, easy-to-understand coverage of topics such as

  • What SGML is, where it came from, and where it's heading
  • How SGML can solve otherwise intractable documentation problems
  • Whether using SGML makes sense for your business or organization
  • Whether you should create your own SGML document types or try to work with established ones
  • How to convert your existing documents into SGML efficiently
  • How to get the best out of both SGML and desktop publishing

Plus, SGML For Dummies comes with a bonus CD-ROM containing valuable software, including

  • A demo version of Corel WordPerfect that includes the Corel Logic Compiler and Layout Designer for a complete SGML solution
  • Digitome Electronic Publishing's IDM Personal Edition -- a powerful software package for generating Windows Help, RTF, and Lotus Notes files from your SGML files
  • Sample SGML applications from SGML Systems Engineering

Product Details

  • ISBN-13: 9780764501753
  • Publisher: Wiley, John & Sons, Incorporated
  • Publication date: 7/28/1998
  • Series: For Dummies Series
  • Edition number: 1
  • Pages: 408
  • Product dimensions: 7.48 (w) x 9.23 (h) x 1.05 (d)

First Chapter

Chapter 1
Writing Documents on a Computer: Form(at) vs. Function

In This Chapter

  • Figuring out what a markup language is
  • To manually format or not to manually format, that is the question
  • Aren't style sheets good enough?
  • Introducing SGML
  • Thinking about SGML's features and benefits

Text formatting and word-processing programs were among the earliest programs written for interactive computer systems. I never found it much fun to write anything on a typewriter, even with erasable bond paper and a 55-gallon drum of White-out. Maybe I'm just not a very good typist or speller. It also can be hard to convince yourself to reorganize something after you've already typed 200 pages of copy.

Today's computer software for creative, business, and technical writing gives you the freedom to experiment, lets you painlessly make major changes in the tone and organization of your documents, and generally empowers writers. However, sometimes power can be a dangerous thing because you can spend more time worrying about the appearance of a document than about its content. You can also create documents that are so thoroughly customized that they become nightmarish to work with if you want to do something as simple as changing the size of the paper they're printed on.

This chapter discusses how documents are written on a computer, shows different approaches to writing documents on a computer, and introduces the Standard Generalized Markup Language (SGML) as a solution to many of the problems that plague writers, writing groups, and documentation managers.

What Is a Markup Language?

In documentation circles, the set of formatting commands associated with any element of a document is typically called the markup for that element. A markup language is the set of commands that you use to tell your word-processing or publishing package how to format a specific part of your document. The term "markup" has its roots in the history of printing -- editors used to "mark up" the copy for a newspaper or book, writing down formatting instructions for the person who was setting the type. When looking at a document in a text editor, word processor, or publishing package, a good way to think of markup is "everything in your document that isn't the text of the document." The computer uses the markup commands to do special things for you, such as making text bold, centering it, and so on.

The following is an example of a markup language, showing some of the formatting commands used by the standard UNIX document formatter, troff:

.ce
.ps 24
\fBThis is the Title of Chapter One\fP
.ps 11
.ad l

The special codes used in troff are lines that begin with a period and three-character sequences starting with a backslash (\). Line by line, this sample troff code says

  • Start centering the following text.
  • Change the point size to 24 point.
  • Switch to a bold font, display the text "This is the Title of Chapter One," and revert to the previous font.
  • Change the point size to 11 point.
  • Switch to a filled, left-justified environment.

That's a lot of work simply to format a single part of your document, and it becomes incredibly tedious in a large document. To simplify things, most word-processing and publishing software lets you define macros or styles that allow you to apply the same formatting to different parts of a document. These are specific formatting attributes that are associated with different elements of your document. For example, you may create a style that defines the way that you want top-level headings to appear in your document -- they should be numbered automatically, printed in 24-point bold Helvetica, and always flush-left on the page. Troff provides a macro package that lets you identify something as a title by simply saying

.TI "This is the Title of Chapter One"

Applying a style to a specific part of your document is frequently referred to as tagging that part of the document with that style.
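
To make the idea of a macro concrete, here is a minimal Python sketch of how a title macro might expand into the five low-level steps listed earlier (center, 24 point, bold title, 11 point, left-justified). The expansion table is an assumption made for illustration; .ce, .ps, and .ad are standard troff requests, but a real macro package defines its own expansion, and the expand function is a hypothetical helper.

```python
# Hypothetical expansion of a title macro into low-level troff requests.
# The request list follows the five steps described in the text; a real
# macro package would supply its own definition.
MACROS = {
    ".TI": [".ce", ".ps 24", "\\fB{text}\\fP", ".ps 11", ".ad l"],
}

def expand(line):
    """Return the low-level lines that one input line stands for."""
    name, _, rest = line.partition(" ")
    if name not in MACROS:
        return [line]  # ordinary text or an unknown request: pass through
    text = rest.strip().strip('"')
    return [template.format(text=text) for template in MACROS[name]]

source = ['.TI "This is the Title of Chapter One"', "Body text follows."]
for line in source:
    for out in expand(line):
        print(out)
```

The writer tags the title once, and changing the look of every title means editing a single table entry rather than every chapter.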

When using older word-processing and publishing software like troff, you create your documents as text files, embed the appropriate markup commands, and then process them with troff to generate an output file that is formatted for your printer. You can create the text files for your document using whatever text editor you like. Today's graphical word-processing and publishing software frees you from having to use text files and embed cryptic commands in them, but internally they still do the same thing. Some software packages, such as Corel WordPerfect, allow you to examine what's going on under the covers. Figure 1-1 is a sample screen from WordPerfect showing its Reveal Codes feature.

I have used this example to show that markup languages aren't something old or archaic, though my troff example certainly is. Word-processing and publishing software always needs to associate different parts of your document with how they should be formatted. Markup languages are even somewhat coming back into vogue, thanks to HTML, the HyperText Markup Language that is used to create documents on the World Wide Web. However, like WordPerfect, which uses markup commands under the covers but doesn't show them to you unless you ask, many word-processing and publishing packages can produce HTML without your having to do anything special.

Since you've read this far, I'll let you in on an ugly secret that few people know -- HTML is actually just one specific instance of SGML. More about this later. . . .

Nightmare on Format Street

As mentioned in the previous section, there are two basic ways to write a document using a word-processing or publishing program. You can concentrate on the formatting of a document, assigning specific fonts, font sizes, and justification to each part of the document as you write it. Or you can focus on the logical organization of your document, making sure that it is composed of chapters that each have multiple sections that you can easily identify.

Concentrating on using markup commands to specify the formatting of a document is often referred to as procedural markup because it tells a specific word-processing or publishing package exactly what to do when displaying or printing certain parts of the document. Concentrating on identifying different parts of your document by what they mean within a document is often referred to as descriptive or logical markup because you use markup to describe the purpose of a certain part of your document. Your word-processing or publishing package then applies some specific style to that particular item.

Choosing your focus: disposable versus durable

People create two basic types of documents -- disposable documents, like personal letters, and durable documents, like the documentation for a software product. It doesn't really matter how you organize a letter or whether you use a specific set of styles in it because you'll probably never use it again after you write and mail it. Durable documents, like technical and user manuals, are documents that you and others will update and reissue over the lifetime of a product. Not all products actually have a lifetime, but we should at least hope that they will!

Problems with durable documents

Focusing on how a document is formatted is fine for one-time documents because you simply want to write it once and make it look nice. However, for durable documents, you want to minimize the number of times that you manually tweak the formatting of different parts of the document. This is important for several reasons:

  • Documents that contain a lot of manual formatting are more difficult to maintain. It's not always easy to tell that something has been changed manually. Even if you're writing in a program that lets you see the exact formatting codes for each piece of the document, it's often hard to spot small changes. Making small changes in manual formatting throughout a documentation set is time-consuming and painful!
  • You often make small changes in response to the way a specific word or paragraph currently looks in your document. The next time you modify the document, a page break or relationship between two paragraphs may not matter anymore or may actually do the wrong thing.
  • Similar to the preceding bullet, you often make small changes to a document to "fix" the way it prints on a specific printer or from a specific word-processing or publishing application. If you switch printers or word-processing programs, your document may format or print incorrectly.

That's a lot of work to do throughout a large document. For large documents or for sets of documents that are supposed to look the same, we'll probably all agree that it's better to work in a way that lets you uniformly apply the same formatting attributes to parts of your document that serve the same purpose throughout a document.

Aren't Style Sheets Good Enough?

Most word-processing and publishing software allows you to group the set of macros and styles that you want to use on a single project into style sheets, which all writers working on the same project use. Writers can either copy the style sheets onto their computers or share a single copy from a central source on a network. In theory, when everyone uses the same style sheet, finishes their work, and prints the final version of their piece of the document, you combine everyone's documents together, and they all look the same.

Unfortunately, that's rarely the case. Style sheets don't provide many of the consistency guarantees that you might like to see in your documents. Imagine a simple style sheet that contains these four elements:

  • Chapter heading -- the title of a chapter in your document. This uses a specific font, is automatically numbered, and so on.
  • Intro paragraph -- an introductory paragraph following a chapter heading that describes the contents of a chapter.
  • Text paragraph -- a normal paragraph of text in your document.
  • Section heading -- the title of a section in your document.

Most applications that use style sheets allow you to specify a default value for the next element in your document. In this example, you can specify that the paragraph that follows a chapter heading has the intro paragraph style. You and any writers using this style sheet agree that you'll all follow this convention, maybe even writing it down into a style guide, and off you go.

Unfortunately, style sheets don't prevent people from ignoring conventions. If a style guide isn't available and someone forgets to tell a new writer what conventions to use, she might use the text paragraph style everywhere, even where she should use the intro paragraph style. Unless the two styles look glaringly different, you might not catch this mistake until after you print the document.

Style sheets also can't ensure that everything that's supposed to be in a document is actually there. Nothing prevents you from accidentally forgetting to include an introductory paragraph in a new chapter, even though it's present in all the other chapters in a book.

Finally, you should know that style sheets are usually specific to a particular word-processing or publishing package. If you decide or need to switch to a different word processor or publishing system, you may not be able to transfer the style sheets into that program. You may have to reenter all of that information in the new system.

Introducing SGML

The Standard Generalized Markup Language (SGML) was created to solve many of the potential problems raised in the previous sections. SGML is the result of years of working with documents on computers. Here are the basic principles of SGML:

  • It provides you with a way to define the structure of your documents and to easily identify the different parts of that structure.
  • It separates the structure and content of your documents from their appearance.
  • It is an open approach to documentation that is not tied to a specific word-processing or publishing package. Many popular word-processing and publishing packages support SGML, either directly or through add-in software.

SGML rigorously enforces structure and consistency in your documentation, which is one of the reasons that it has become so popular. Because SGML documents conform to a specific structure, they are easier for computer programs to work with. For example, you can write programs that translate SGML documents into other common formats, such as the HyperText Markup Language (HTML), fairly easily because you can always predict the parts of a document that you can encounter next when translating it. Similarly, if you want to store an SGML document with many similar records (such as a catalog) in a database, you can set up and enforce the relationships between the different records that hold different parts of the document without too much trouble.
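
As a rough illustration of why predictable structure makes translation easy, the following Python sketch rewrites SGML-style tags as HTML tags by table lookup. The element names and the mapping to HTML are assumptions chosen for this example, and the sketch handles only plain start and end tags without attributes.

```python
import re

# Hypothetical mapping from document element names to HTML element names.
TAG_MAP = {"PARAGRAPH": "p", "KEYWORD": "strong"}

# Matches a plain start tag such as <KEYWORD> or an end tag such as </KEYWORD>.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")

def to_html(sgml):
    """Rewrite each known tag; an unknown element name raises KeyError."""
    def swap(match):
        slash, name = match.groups()
        return "<" + slash + TAG_MAP[name.upper()] + ">"
    return TAG.sub(swap, sgml)

print(to_html("<PARAGRAPH><KEYWORD>SGML</KEYWORD> is very cool</PARAGRAPH>"))
```

Because the translator only ever meets elements that the document type allows, a lookup table is enough, and anything unexpected fails loudly instead of slipping through.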

Writing documents in SGML is much like writing programs in a computer language because most SGML tools verify the structure of your document as you write it. They do this by enforcing syntax. Just like in the English language, SGML documents have a certain form that they have to follow. In English, sentences have to (or, at least, should) follow basic rules, like "subject verb object." In SGML, documents have to conform to some basic structure that says something like "documents consist of a title, followed by chapters and appendices." Enforcing the structure of a document prevents many common errors and also simplifies writing many types of documents. It's always clear what parts of a document can come next, and it's also impossible to forget to insert a mandatory part of a document. Although this can be constraining, it can help guarantee a structurally consistent and complete documentation set.
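
The kind of rule quoted above can be sketched in a few lines of Python. This is only an illustration: real SGML states such rules in a DTD, and the exact rule used here (one title, at least one chapter, then optional appendices) is an assumption.

```python
import re

# Hypothetical content model: one title, one or more chapters, then any
# number of appendices. A DTD would express this; Python is used only to
# show the idea.
CONTENT_MODEL = re.compile(r"title( chapter)+( appendix)*")

def is_valid(parts):
    """Check a sequence of top-level part names against the content model."""
    return CONTENT_MODEL.fullmatch(" ".join(parts)) is not None

print(is_valid(["title", "chapter", "chapter", "appendix"]))
print(is_valid(["chapter", "title"]))
print(is_valid(["title", "appendix"]))
```

Only the first document passes. The second puts its parts in the wrong order, and the third forgets the mandatory chapter, which is exactly the sort of omission a style sheet cannot catch.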

Acronym alert!

Anyone working in the computer field knows that it's impossible to pick up a computer-related text that isn't saturated with acronyms. Although they sometimes make text look like an explosion in an uppercase factory, acronyms do provide a way to quickly refer to concepts that would otherwise be a mouthful. BTW (By The Way), this book is no different.

SGML is an accepted standard defined by the International Organization for Standardization (ISO, a non-partisan group whose whole purpose is to promote standardization in the sciences) in ISO 8879. Because SGML is a standard, some industries require that documentation work be done in SGML. For example, any contractor or subcontractor doing work for the Department of Defense or many aerospace companies must submit any associated documentation in SGML form.

Basic concepts

The markup used in SGML documents consists of elements, which are the building blocks of an SGML document. Each element is surrounded by a pair of beginning and ending expressions called tags. A sample sentence in SGML looks like this:

<PARAGRAPH><KEYWORD>SGML</KEYWORD> is very cool</PARAGRAPH>

In this sentence, <PARAGRAPH> and <KEYWORD> are beginning tags, and </KEYWORD> and </PARAGRAPH> are their corresponding ending tags. In most cases, the ending tag associated with any beginning tag is the name of the beginning tag preceded by the slash character -- there are some possible exceptions to this, which are discussed in Chapter 3. Tags are usually referred to using the name of the opening tag, such as in the expression "the <KEYWORD> tag." SGML tags do not have to be in uppercase, but I'm using that convention to make them stand out more in the examples used throughout this book.

SGML elements identify the purpose of the text that they contain. In this example, "SGML is very cool" is identified as a paragraph, and the word "SGML" is identified as a keyword within that paragraph. SGML elements have to be correctly nested within each other. For example, you should not close the <PARAGRAPH> element before closing the <KEYWORD> element. Most SGML software automatically prevents you from making such syntax errors. For example, incorrect markup of the preceding example would be the following:

<PARAGRAPH><KEYWORD>SGML is very cool</PARAGRAPH></KEYWORD>

Elements are the fundamental building blocks of an SGML document. Tags identify the boundaries of an instance of an element, which is just a fancy way of saying "an element with some specific content."
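
The nesting rule can be sketched as a small stack-based check in Python. This is a simplified illustration, not how any particular SGML tool works: it assumes every element has an explicit ending tag and no attributes, and the function name and sample strings are made up for the example.

```python
import re

# Matches a plain start tag such as <KEYWORD> or an end tag such as </KEYWORD>.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")

def properly_nested(text):
    """Return True if every end tag closes the most recently opened element."""
    open_elements = []
    for slash, name in TAG.findall(text):
        if not slash:
            open_elements.append(name)  # start tag: remember it
        elif not open_elements or open_elements.pop() != name:
            return False                # end tag does not match the innermost element
    return not open_elements            # anything still open was never closed

good = "<PARAGRAPH><KEYWORD>SGML</KEYWORD> is very cool</PARAGRAPH>"
bad = "<PARAGRAPH><KEYWORD>SGML is very cool</PARAGRAPH></KEYWORD>"
print(properly_nested(good))
print(properly_nested(bad))
```

The first string passes and the second fails, because </PARAGRAPH> arrives while <KEYWORD> is still the innermost open element.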

The set of elements available to you when writing a document depends on the Document Type Definition (DTD) associated with that document. How you actually apply these elements and how they are displayed on the screen depends on the word-processing or publishing software you use. You can create your own DTD to define and enforce your own documentation requirements, or you can use one of many that are freely available on the Internet. Most word-processing and publishing software that supports SGML comes fully loaded with one or two of the most common DTDs. See Chapter 3 for more information on DTDs and how they are organized.

It's important to understand how SGML word-processing and publishing software provides support for different DTDs. An SGML document contains introductory markup that specifies the DTD used by that document. The SGML software then loads the files associated with that DTD, and away you go!

Because I haven't filled my quota of acronyms in this section, there's one more that's central to SGML's open approach to documentation. This is the Formatting Output Specification Instance, or FOSI. A FOSI is one common way of defining the formatting that is associated with each part of a DTD. You specify things like page size, margins, the fonts used by various elements, and so on, in the FOSI. You can create multiple FOSIs for a single DTD and then specify which one you want to use when you print your document. Just as with creating a DTD, the way you create a FOSI depends on which SGML word-processing or publishing package you use. See Chapter 9 for more information on FOSIs and other ways of specifying the formatting of an SGML document.
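
The idea of keeping several formatting specifications for one set of elements can be sketched in Python as follows. The element names, property names, and values are all invented for illustration, and a real FOSI has its own syntax that looks nothing like a Python dictionary.

```python
# Two hypothetical formatting specifications for the same element types.
PRINT_SPEC = {
    "title": {"font": "Helvetica", "size": 24, "bold": True},
    "para": {"font": "Times", "size": 11, "bold": False},
}
DRAFT_SPEC = {
    "title": {"font": "Courier", "size": 12, "bold": True},
    "para": {"font": "Courier", "size": 12, "bold": False},
}

def describe(element_types, spec):
    """List how each element type would be set under the chosen specification."""
    lines = []
    for name in element_types:
        style = spec[name]
        weight = " bold" if style["bold"] else ""
        lines.append(f"{name}: {style['size']}-point{weight} {style['font']}")
    return lines

for line in describe(["title", "para"], PRINT_SPEC):
    print(line)
for line in describe(["title", "para"], DRAFT_SPEC):
    print(line)
```

The document itself never changes. Only the specification chosen at print time does, which is the separation of content and appearance that the FOSI provides.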

Different strokes for different folks

Different word-processing and publishing packages have different terms for central SGML concepts like the DTD and FOSI. Not all SGML software even uses a FOSI, but all SGML word-processing or publishing software has to have some way of defining how its SGML documents are formatted. If you use Adobe FrameMaker+SGML, you are probably on a first-name basis with its Element Definition Document (EDD) files, which are its FOSI equivalent. If you use Corel WordPerfect, you are probably familiar with its Logic (LGC) files, which are a compiled form of its DTD, and its Layout Specification Instance (LSI) files, which are its FOSI equivalent. No matter what you call them, DTDs and FOSIs are critical pieces of the SGML puzzle because they define the structure of your documents and keep the content of your documents separate from their formatting.

Besides using different terminology, different SGML word-processing and publishing packages take very different approaches to how your document appears on the screen while you work on it. If you are used to What-You-See-Is-What-You-Get (WYSIWYG) word-processing and publishing software, you will be in for a surprise when you see various SGML tools. Some very popular and powerful SGML software packages, such as Arbortext Adept and Softquad's Author/Editor products, emphasize the structure of a document rather than try to display what it will look like when printed. I call packages like these "QUASIWYG" software packages because what you see on the screen is something like what you'll see when you print your document, but not really. These packages are also sometimes referred to as "WYSIWYN" (What-You-See-Is-What-You-Need), since their focus is on showing you the structure of a document plus some visual hints as to what parts of a document are different levels of headings, and so on. Figure 1-2 is a sample screen from a document in Arbortext Adept.

Other SGML software shows your documents in more or less the same way as they'll look when they're printed. Software such as Corel WordPerfect and Adobe's FrameMaker+SGML are good examples of this type of software. Figure 1-3 is a sample screen showing an SGML document in Corel WordPerfect. Figure 1-4 is a sample screen showing a document in Adobe FrameMaker+SGML.

The approach that different tools take to display your documents tends to show the roots of the company. Arbortext and Softquad have always produced SGML tools, so their emphasis has always been on structured documentation. Corel and Adobe produced word-processing and desktop publishing software long before becoming involved in SGML, so their SGML tools can't afford to alienate their existing customers.

Features and Benefits of SGML

Depending on how much documentation you already have, how large it is, and how it was originally written, switching to SGML can either be easy or difficult. When time and money are involved, "Don't fix it if it isn't already broken" is a common saying. I discuss some specific benefits of using SGML in detail in the next chapter, but here are some of the high points.

SGML can save you from worrying about some common issues that plague writers and documentation managers:

  • Is a document organized correctly? An SGML DTD defines the relationship between different parts of a document, such as where certain parts can appear. For example, level one headings can only appear in a chapter, level two headings only in a section, and so on.
  • Is a document complete? Using SGML, you can specify that certain parts of a document are required and easily identify sections that are missing.
  • Do similar parts of a document look the same when they are printed? Parts of an SGML document with the same purpose (headings, paragraphs, lists, and so on) print exactly the same way.
  • Can I produce documents in different styles and formats from a single documentation set? By separating the structure and appearance of your documents, SGML makes it easy to change output formats, page sizes, and printers.
  • What am I going to do if I ever switch word-processing or publishing packages? SGML is a standard that is independent of specific software packages. You can edit and use SGML documents in any SGML tool that can use your DTD.
  • How can I reuse text in different documents? SGML makes it easy to create documents out of many smaller pieces. These modular documents help you share boilerplate information between different documents. Boilerplate is typically introductory or background information that you want to write once but use in many places.

I go into these topics in more detail later, but you should be able to see the benefits that just these few basic points can bring. If you've ever struggled to make two documents look exactly the same, anguished over why a certain part of a document is formatting the way that it is, or kicked yourself for not noticing a problem until after a document was printed, you should realize that SGML can help eliminate these sorts of problems. Hindsight is 20/20 -- you can't fix the past, but you can learn from it and plan for a better future.

Because SGML separates the content of a document from how that document appears when it's printed, writers and documentation groups that use SGML can be more productive than others. In part, this is because they do not have to spend lots of time concentrating on formatting details. They also have the ability to make better use of the documents that they've already written by sharing information between documents and easily producing abstracts, extracts, or catalogs.

An interesting anecdote is that my copy of The SGML Handbook by Charles Goldfarb, the official bible of SGML, is bound upside down -- if you open the front cover, you see the last page, upside down. Clearly, SGML is not the cure for every documentation problem! SGML may not even be right or cost-effective for you or the types of writing that you do. Like any other change to how you currently work, it costs more than just the purchase price of the software. You may have to learn to write documents in a structured fashion or learn a new set of styles and conventions. For many writers, focusing on the elements that make up a document requires some rethinking of how they work, simply because they may never have thought of documents in a structured fashion before. Also, you or someone you work with will have to become an expert in the SGML tools you use. There's a startup cost in every home improvement.

A short history of SGML

SGML is a descendant of two efforts for standardizing the documentation industry in the late 1960s, coming from different ends of the spectrum. One was an industry effort to standardize the control codes used internally by composition hardware, which sets the type used when printing a book. The other was an effort at IBM to develop a common set of standards for creating its internal and product documentation.

Prior to the late 1960s, the layout and formatting information for specific printers and typesetters was embedded in documents that were to be printed on those devices. If you wanted to get a different firm to print your documents on different hardware, you usually had to pay a conversion fee to convert everything to the codes used by the new hardware. You also had to factor the time required for the conversion into your schedule. To try to solve this problem, the Graphic Communications Association (GCA for you acronym buffs) created GenCode to standardize the formatting and layout codes used by different printing and typesetting hardware.

Starting in 1969, a group at IBM led by Charles Goldfarb, the father of SGML, developed the Generalized Markup Language (GML) to build upon the ground laid by the GenCode initiative. GML added the notion of defining a document type that specified the relationships between all of the parts of a document.

In the late 1970s, the American National Standards Institute (ANSI), the people who brought you such popular standards as ASCII (American Standard Code for Information Interchange), established a committee to build on the ideas introduced by GML. ANSI wanted to develop a truly standard markup language. It brought together people who had worked on GML, such as Goldfarb, with people who had worked on the GenCode project. The first draft of the SGML standard was published in 1980. The final text of the SGML standard was published in 1986.

The U.S. government started using SGML in 1983, when the Internal Revenue Service (I think we all know their acronym!) and the Department of Defense adopted a draft of the SGML standard. In 1987, the U.S. government's Computer Aided Logistics and Support (CALS) program (designed to develop formal procedures for any facet of governmental purchasing or contracting) organized a committee to examine SGML as a standard for government work. This is one case when the phrase "good enough for government work" is a good thing. In 1988, this committee published a military standard for SGML, MIL-M-28001.
