With SGML For Dummies, you'll not only imagine those benefits but also reap them. Author Bill von Hagen provides practical, easy-to-understand coverage of SGML.
Plus, SGML For Dummies comes with a bonus CD-ROM containing valuable software.
In This Chapter
Text formatting and word-processing programs were among the earliest programs written for interactive computer systems. I never found it much fun to write anything on a typewriter, even with erasable bond paper and a 55-gallon drum of White-out. Maybe I'm just not a very good typist or speller. It also can be hard to convince yourself to reorganize something after you've already typed 200 pages of copy.
Today's computer software for creative, business, and technical writing gives you the freedom to experiment, lets you painlessly make major changes in the tone and organization of your documents, and generally empowers writers. However, sometimes power can be a dangerous thing because you can spend more time worrying about the appearance of a document than about its content. You can also create documents that are so thoroughly customized that they become nightmarish to work with if you want to do something as simple as changing the size of the paper they're printed on.
This chapter discusses how documents are written on a computer, shows different approaches to writing documents on a computer, and introduces the Standard Generalized Markup Language (SGML) as a solution to many of the problems that plague writers, writing groups, and documentation managers.
In documentation circles, the set of formatting commands associated with any element of a document is typically called the markup for that element. A markup language is the set of commands that you use to tell your word-processing or publishing package how to format a specific part of your document. The term "markup" has its roots in the history of printing -- editors used to "mark up" the copy for a newspaper or book, writing down formatting instructions for the person who was setting the type. When looking at a document in a text editor, word processor, or publishing package, a good way to think of markup is "everything in your document that isn't the text of the document." The computer uses the markup commands to do special things for you, such as making text bold, centering it, and so on.
The following is an example of a markup language, showing some of the formatting commands used by the standard UNIX document formatter, troff:
That's a lot of work simply to format a single part of your document, and it would be incredibly tedious in a large document. To simplify things, most word-processing and publishing software lets you define macros or styles that allow you to apply the same formatting to different parts of a document. These are specific formatting attributes that are associated with different elements of your document. For example, you may create a style that defines the way that you want top-level headings to appear in your document -- they should be numbered automatically, printed in 24-point bold Helvetica, and always flush-left on the page. Troff provides a macro package that lets you identify something as a title by simply saying
.TI "This is the Title of Chapter One"
Applying a style to a specific part of your document is frequently referred to as tagging that part of the document with that style.
When using older word-processing and publishing software like troff, you create your documents as text files, embed the appropriate markup commands, and then process them with troff to generate an output file that is formatted for your printer. You can create the text files for your document using whatever text editor you like. Today's graphical word-processing and publishing software frees you from having to use text files and embed cryptic commands in them, but internally it still does the same thing. Some software packages, such as Corel WordPerfect, allow you to examine what's going on under the covers. Figure 1-1 is a sample screen from WordPerfect showing its Reveal Codes feature.
I have used this example to show that markup languages aren't something old or archaic, though my troff example certainly is. Word-processing and publishing software always needs to associate different parts of your document with how they should be formatted. Markup languages are even somewhat coming back into vogue, thanks to HTML, the HyperText Markup Language that is used to create documents on the World Wide Web. However, like WordPerfect, which uses markup commands under the covers but doesn't show them to you unless you ask, many word-processing and publishing packages can produce HTML without your having to do anything special.
Since you've read this far, I'll let you in on an ugly secret that few people know -- HTML is actually just one specific application of SGML. More about this later. . . .
As mentioned in the previous section, there are two basic ways to write a document using a word-processing or publishing program. You can concentrate on the formatting of a document, assigning specific fonts, font sizes, and justification to each part of the document as you write it. Or you can focus on the logical organization of your document, making sure that it is composed of chapters that each have multiple sections that you can easily identify.
Concentrating on using markup commands to specify the formatting of a document is often referred to as procedural markup because it tells a specific word-processing or publishing package exactly what to do when displaying or printing certain parts of the document. Concentrating on identifying different parts of your document by what they mean within a document is often referred to as descriptive or logical markup because you use markup to describe the purpose of a certain part of your document. Your word-processing or publishing package then applies some specific style to that particular item.
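To make the difference concrete, here is a minimal sketch in Python (a general-purpose programming language, not anything specific to a particular publishing package). The element names, the sample text, and both style tables are invented for illustration. The point is only that the document says what each part is, while a separate table decides how that part looks.

```python
# Hypothetical style tables: each maps a logical element name to formatting.
PRINT_STYLE = {
    "heading": str.upper,
    "paragraph": lambda text: "    " + text,
}
SCREEN_STYLE = {
    "heading": lambda text: "== " + text + " ==",
    "paragraph": lambda text: text,
}

# The document records only what each part is, not how it should look.
DOCUMENT = [
    ("heading", "Why Markup Matters"),
    ("paragraph", "Descriptive markup names the purpose of the text."),
]


def render(document, style):
    """Apply the formatting that the style table assigns to each element."""
    return [style[element](text) for element, text in document]


print(render(DOCUMENT, PRINT_STYLE))
print(render(DOCUMENT, SCREEN_STYLE))
```

Changing the look of every heading means editing one entry in a style table. The document itself stays untouched, which is exactly what procedural markup can't offer.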
People create two basic types of documents -- disposable documents, like personal letters, and durable documents, like the documentation for a software product. It doesn't really matter how you organize a letter or whether you use a specific set of styles in it because you'll probably never use it again after you write and mail it. Durable documents, like technical and user manuals, are documents that you and others will update and reissue over the lifetime of a product. Not all products actually have a lifetime, but we should at least hope that they will!
Focusing on how a document is formatted is fine for one-time documents because you simply want to write it once and make it look nice. However, for durable documents, you want to minimize the number of times that you manually tweak the formatting of different parts of the document. This is important for several reasons:
That's a lot of work to do throughout a large document. For large documents or for sets of documents that are supposed to look the same, we'll probably all agree that it's better to work in a way that lets you uniformly apply the same formatting attributes to parts of your document that serve the same purpose throughout a document.
Most word-processing and publishing software allows you to group the set of macros and styles that you want to use on a single project into style sheets, which all writers working on the same project use. Writers can either copy the style sheets onto their computers or share a single copy from a central source on a network. In theory, when everyone uses the same style sheet, finishes their work, and prints the final version of their piece of the document, you combine everyone's documents together, and they all look the same.
Unfortunately, that's rarely the case. Style sheets don't provide many of the consistency guarantees that you might like to see in your documents. Imagine a simple style sheet that contains these four elements:
Most applications that use style sheets allow you to specify a default value for the next element in your document. In this example, you can specify that the paragraph that follows a chapter heading has the intro paragraph style. You and any writers using this style sheet agree that you'll all follow this convention, maybe even writing it down into a style guide, and off you go.
Unfortunately, style sheets don't prevent people from ignoring conventions. If a style guide isn't available and someone forgets to tell a new writer what conventions to use, she might use the text paragraph style everywhere, even where she should use the intro paragraph style. Unless the two styles look glaringly different, you might not catch this mistake until after you print the document.
Style sheets also can't ensure that everything that's supposed to be in a document is actually there. Nothing prevents you from accidentally forgetting to include an introductory paragraph in a new chapter, even though it's present in all the other chapters in a book.
Finally, you should know that style sheets are usually specific to a particular word-processing or publishing package. If you decide or need to switch to a different word processor or publishing system, you may not be able to transfer the style sheets into that program. You may have to reenter all of that information in the new system.
The Standard Generalized Markup Language (SGML) was created to solve many of the potential problems raised in the previous sections. SGML is the result of years of working with documents on computers. Here are the basic principles of SGML:
SGML rigorously enforces structure and consistency in your documentation, which is one of the reasons that it has become so popular. Because SGML documents conform to a specific structure, they are easier for computer programs to work with. For example, you can write programs that translate SGML documents into other common formats, such as the HyperText Markup Language (HTML), fairly easily because you can always predict the parts of a document that you can encounter next when translating it. Similarly, if you want to store an SGML document with many similar records (such as a catalog) in a database, you can set up and enforce the relationships between the different records that hold different parts of the document without too much trouble.
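As a rough idea of what such a translation program might look like, here is a minimal Python sketch. The element names PARAGRAPH and KEYWORD and their HTML counterparts are assumptions made for this illustration, and a real translator would use a full SGML parser rather than simple pattern matching.

```python
import re

# Hypothetical mapping from SGML element names to HTML element names.
TAG_MAP = {"PARAGRAPH": "p", "KEYWORD": "strong"}


def to_html(sgml_text):
    """Rewrite each known beginning or ending tag as its HTML counterpart."""
    def swap(match):
        slash, name = match.group(1), match.group(2).upper()
        # An element name that is missing from TAG_MAP raises KeyError here.
        return "<" + slash + TAG_MAP[name] + ">"

    return re.sub(r"<(/?)([A-Za-z]+)>", swap, sgml_text)


print(to_html("<PARAGRAPH><KEYWORD>SGML</KEYWORD> is very cool.</PARAGRAPH>"))
```

Because the set of elements is known in advance, the whole translation reduces to a lookup table. That predictability is what makes structured documents friendly to programs.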
Writing documents in SGML is much like writing programs in a computer language because most SGML tools verify the structure of your document as you write it. They do this by enforcing syntax. Just like in the English language, SGML documents have a certain form that they have to follow. In English, sentences have to (or, at least, should) follow basic rules, like "subject verb object." In SGML, documents have to conform to some basic structure that says something like "documents consist of a title, followed by chapters and appendices." Enforcing the structure of a document prevents many common errors and also simplifies writing many types of documents. It's always clear what parts of a document can come next, and it's also impossible to forget to insert a mandatory part of a document. Although this can be constraining, it can help guarantee a structurally consistent and complete documentation set.
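The kind of checking that SGML tools do can be sketched in a few lines of Python. This is only an illustration: the rule below is my own reading of "a title, followed by chapters and appendices" (one title, at least one chapter, then any number of appendices), and real SGML software takes its rules from a DTD rather than from a hard-coded pattern.

```python
import re

# Assumed rule: one title, then one or more chapters, then any number of appendices.
DOCUMENT_RULE = re.compile(r"title( chapter)+( appendix)*")


def is_valid(parts):
    """Return True if the sequence of part names matches the assumed rule."""
    return DOCUMENT_RULE.fullmatch(" ".join(parts)) is not None


print(is_valid(["title", "chapter", "chapter", "appendix"]))  # True
print(is_valid(["chapter", "chapter"]))  # False: the title is missing
print(is_valid(["title", "appendix", "chapter"]))  # False: appendix before chapter
```

A style sheet can't catch either of the last two mistakes. A structural rule catches both before the document ever reaches the printer.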
Anyone working in the computer field knows that it's impossible to pick up a computer-related text that isn't saturated with acronyms. Although they sometimes make text look like an explosion in an uppercase factory, acronyms do provide a way to quickly refer to concepts that would otherwise be a mouthful. BTW (By The Way), this book is no different.
SGML is an accepted standard defined by the International Organization for Standardization (ISO, a non-partisan group whose whole purpose is to promote standardization in the sciences) in ISO standard 8879. Because SGML is a standard, some industries require that documentation work be done in SGML. For example, any contractor or subcontractor doing work for the Department of Defense or many aerospace companies must submit any associated documentation in SGML form.
The markup used in SGML documents consists of elements, which are the building blocks of an SGML document. Each element is surrounded by a pair of beginning and ending expressions called tags. A sample sentence in SGML looks like this:
<PARAGRAPH><KEYWORD>SGML</KEYWORD> is very cool.</PARAGRAPH>
In this sentence, <PARAGRAPH> and <KEYWORD> are beginning tags, and </KEYWORD> and </PARAGRAPH> are their corresponding ending tags. In most cases, the ending tag associated with any beginning tag is the name of the beginning tag preceded by the slash character -- there are some possible exceptions to this, which are discussed in Chapter 3. Tags are usually referred to using the name of the opening tag, such as in the expression "the <KEYWORD> tag." SGML tags do not have to be in uppercase, but I'm using that convention to make them stand out more in the examples used throughout this book.
SGML elements identify the purpose of the text that they contain. In this example, "SGML is very cool" is identified as a paragraph, and the word "SGML" is identified as a keyword within that paragraph. SGML elements have to be correctly nested within each other. For example, you should not close the <PARAGRAPH> element before closing the <KEYWORD> element. Most SGML software automatically prevents you from making such syntax errors. For example, incorrect markup of the preceding example would be the following:
<PARAGRAPH><KEYWORD>SGML</PARAGRAPH></KEYWORD> is very cool.
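The nesting rule is easy to check mechanically. The following Python sketch is a simplified illustration, not how any particular SGML package works: it assumes that every element has an explicit beginning tag and ending tag, and it ignores attributes and the exceptions mentioned earlier.

```python
import re


def is_properly_nested(text):
    """Return True if every ending tag closes the most recently opened element."""
    open_elements = []
    for slash, name in re.findall(r"<(/?)([A-Za-z]+)>", text):
        if not slash:
            # Beginning tag: remember that this element is now open.
            open_elements.append(name)
        elif not open_elements or open_elements.pop() != name:
            # Ending tag that doesn't match the innermost open element.
            return False
    # Anything still open at the end was never closed.
    return not open_elements


print(is_properly_nested("<PARAGRAPH><KEYWORD>SGML</KEYWORD> is very cool.</PARAGRAPH>"))  # True
print(is_properly_nested("<PARAGRAPH><KEYWORD>SGML</PARAGRAPH></KEYWORD> is very cool."))  # False
```

The second sample fails because the ending tag for PARAGRAPH arrives while KEYWORD is still the innermost open element.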
Elements are the fundamental building blocks of an SGML document. Tags identify the boundaries of an instance of an element, which is just a fancy way of saying "an element with some specific content."
The set of elements available to you when writing a document depends on the Document Type Definition (DTD) associated with that document. How you actually apply these elements and how they are displayed on the screen depends on the word-processing or publishing software you use. You can create your own DTD to define and enforce your own documentation requirements, or you can use one of many that are freely available on the Internet. Most word-processing and publishing software that supports SGML comes fully loaded with one or two of the most common DTDs. See Chapter 3 for more information on DTDs and how they are organized.
It's important to understand how SGML word-processing and publishing software provides support for different DTDs. An SGML document contains introductory markup that specifies the DTD used by that document. The SGML software then loads the files associated with that DTD, and away you go!
Because I haven't filled my quota of acronyms in this section, there's one more that's central to SGML's open approach to documentation. This is the Formatting Output Specification Instance, or FOSI. A FOSI is one common way of defining the formatting that is associated with each part of a DTD. You specify things like page size, margins, the fonts used by various elements, and so on, in the FOSI. You can create multiple FOSIs for a single DTD and then specify which one you want to use when you print your document. Just as with creating a DTD, the way you create a FOSI depends on which SGML word-processing or publishing package you use. See Chapter 9 for more information on FOSIs and other ways of specifying the formatting of an SGML document.
Different word-processing and publishing packages have different terms for central SGML concepts like the DTD and FOSI. Not all SGML software even uses a FOSI, but all SGML word-processing or publishing software has to have some way of defining how its SGML documents are formatted. If you use Adobe FrameMaker+SGML, you are probably on a first-name basis with its Element Definition Document (EDD) files, which are its FOSI equivalent. If you use Corel WordPerfect, you are probably familiar with its Logic (LGC) files, which are a compiled form of its DTD, and its Layout Specification Instance (LSI) files, which are its FOSI equivalent. No matter what you call them, DTDs and FOSIs are critical pieces of the SGML puzzle because they define the structure of your documents and keep the content of your documents separate from their formatting.
Besides using different terminology, different SGML word-processing and publishing packages take very different approaches to how your document appears on the screen while you work on it. If you are used to What-You-See-Is-What-You-Get (WYSIWYG) word-processing and publishing software, you will be in for a surprise when you see various SGML tools. Some very popular and powerful SGML software packages, such as Arbortext Adept and SoftQuad's Author/Editor products, emphasize the structure of a document rather than trying to display what it will look like when printed. I call packages like these "QUASIWYG" software packages because what you see on the screen is something like what you'll see when you print your document, but not really. These packages are also sometimes referred to as "WYSIWYN" (What-You-See-Is-What-You-Need), since their focus is on showing you the structure of a document plus some visual hints as to what parts of a document are different levels of headings, and so on. Figure 1-2 is a sample screen from a document in Arbortext Adept.
Other SGML software shows your documents in more or less the same way as they'll look when they're printed. Corel WordPerfect and Adobe FrameMaker+SGML are good examples of this type of software. Figure 1-3 is a sample screen showing an SGML document in Corel WordPerfect. Figure 1-4 is a sample screen showing a document in Adobe FrameMaker+SGML.
The approach that different tools take to displaying your documents tends to reflect the roots of the company that makes them. Arbortext and SoftQuad have always produced SGML tools, so their emphasis has always been on structured documentation. Corel and Adobe produced word-processing and desktop publishing software long before becoming involved in SGML, so their SGML tools can't afford to alienate their existing customers.
Depending on how much documentation you already have, how large it is, and how it was originally written, switching to SGML can be either easy or difficult. When time and money are involved, "Don't fix it if it isn't already broken" is a common saying. I discuss some specific benefits of using SGML in detail in the next chapter, but here are some of the high points.
SGML can save you from worrying about some common issues that plague writers and documentation managers:
I go into these topics in more detail later, but you should be able to see the benefits that just these few basic points can bring. If you've ever struggled to make two documents look exactly the same, anguished over why a certain part of a document is formatted the way that it is, or kicked yourself for not noticing a problem until after a document was printed, you should realize that SGML can help eliminate these sorts of problems. Hindsight is 20/20 -- you can't fix the past, but you can learn from it and plan for a better future.
Because SGML separates the content of a document from how that document appears when it's printed, writers and documentation groups that use SGML can be more productive than others. In part, this is because they do not have to spend lots of time concentrating on formatting details. They can also make better use of the documents that they've already written by sharing information between documents and easily producing abstracts, extracts, or catalogs.
An interesting anecdote is that my copy of The SGML Handbook by Charles Goldfarb, the official bible of SGML, is bound upside down -- if you open the front cover, you see the last page, upside down. Clearly, SGML is not the cure for every documentation problem! SGML may not even be right or cost-effective for you or the types of writing that you do. Like any other change to how you currently work, it costs more than just the purchase price of the software. You may have to learn to write documents in a structured fashion or learn a new set of styles and conventions. For many writers, focusing on the elements that make up a document requires some rethinking of how they work, simply because they may never have thought of documents in a structured fashion before. Also, you or someone you work with will have to become an expert in the SGML tools you use. There's a startup cost in every home improvement.
A short history of SGML
SGML is a descendant of two efforts to standardize the documentation industry in the late 1960s, coming from different ends of the spectrum. One was an industry effort to standardize the control codes used internally by composition hardware, which sets the type used when printing a book. The other was an effort at IBM to develop a common set of standards for creating its internal and product documentation.
Prior to the late 1960s, the layout and formatting information for specific printers and typesetters was embedded in documents that were to be printed on those devices. If you wanted to get a different firm to print your documents on different hardware, you usually had to pay a conversion fee to convert everything to the codes used by the new hardware. You also had to factor the time required for the conversion into your schedule. To try to solve this problem, the Graphic Communications Association (GCA for you acronym buffs) created GenCode to standardize the formatting and layout codes used by different printing and typesetting hardware.
Starting in 1969, a group at IBM led by Charles Goldfarb, the father of SGML, developed the Generalized Markup Language (GML) to build upon the ground laid by the GenCode initiative. GML added the notion of defining a document type that specified the relationships between all of the parts of a document.
In the late 1970s, the American National Standards Institute (ANSI), the people who brought you such popular standards as ASCII (American Standard Code for Information Interchange), established a committee to build on the ideas introduced by GML. ANSI wanted to develop a truly standard markup language. It brought together people who had worked on GML, such as Goldfarb, with people who had worked on the GenCode project. The first draft of the SGML standard was published in 1980. The final text of the SGML standard was published in 1986.
The U.S. government started using SGML in 1983, when the Internal Revenue Service (I think we all know their acronym!) and the Department of Defense adopted a draft of the SGML standard. In 1987, the U.S. government's Computer Aided Logistics and Support (CALS) program (designed to develop formal procedures for any facet of governmental purchasing or contracting) organized a committee to examine SGML as a standard for government work. This is one case when the phrase "good enough for government work" is a good thing. In 1988, this committee published a military standard for SGML, MIL-M-28001.