The XML and SGML Cookbook: Recipes for Structured Information

by Richard A. Jelliffe

Paperback(BK&CD ROM)

$47.79 (original price $54.99; you save 13%)

Overview

Every month, the demand for SGML expertise grows, yet few people have mastered this breakthrough technology for managing information. With The XML & SGML Cookbook, you can move from SGML novice to expert faster than ever before. Based on a successful training course, this book provides dozens of instantly usable Document Type Definition (DTD) "recipes" for virtually every type of document - and it delivers a practical understanding of document structure, patterns and form, so you can go "beyond the cookbook."

  • Proven recipes for all the most common editorial structures.
  • Databases, tables, forms, lists, and multiple-version documents.
  • Frontmatter, metadata, formatting, and backmatter.
  • Practical tips and warnings for SGML, XML, HTML, TEI, and CALS publishing.
  • Detailed coverage of building documents for international use.
  • All DTDs on CD-ROM - plus extensive state-of-the-art SGML tools!

Quickly learn the skills and sensitivities it's taken SGML experts years to develop.

Discover how to manage critical tradeoffs between simplicity and richness, and between immediate and future applications.

Learn to build DTDs that serve the needs of different users and different media, using techniques that are equally applicable in both SGML and XML environments.

The CD-ROM contains all the book's DTDs, plus an extensive library of great SGML tools, including EditTime SGML Editor sampler and OmniMark Light sampler.

Whether you're a publishing manager, information professional, system integrator or anyone else who needs stronger SGML expertise fast, there's no better solution than The XML & SGML Cookbook.

Product Details

ISBN-13: 9780136142232
Publisher: Prentice Hall Professional Technical Reference
Publication date: 05/20/1998
Series: Charles F. Goldfarb Series on Structured Information Management
Edition description: BK&CD ROM
Pages: 656
Product dimensions: 7.02(w) x 9.21(h) x 1.75(d)

Read an Excerpt

Preface

As I complete this book today, I have two exciting new publications fresh from the Net, still warm from the laser printer: the final texts of XML 1.0 (the World Wide Web Consortium's subset of SGML, the Extensible Markup Language) and Web SGML (the corrections and enhancements to the SGML standard, ISO 8879, for WWW uses such as XML).



These two publications together revolutionize electronic publishing; they make SGML's traditional advantages for large-scale corporate publishing available at the desktop. Most importantly, they give you, the owner of data and the creator of data systems, control of your documents: SGML and its Web-optimized subset XML represent the triumph of the Open Systems movement. Not only can we have open systems for network protocols and languages, but also for the data which animate them.

This book is a practical guide for this brave new world - ideas and declarations for SGML and XML element type sets and entity sets that implement the important and useful document structures. I have tried to put in many resources which may be hard to track down otherwise. And I have paid particular attention to trying to bring out the deeper model in XML, one which will not be familiar to readers coming from HTML or proprietary markup languages: processing instructions and notations in particular.

This is not a tutorial on syntax: there are many excellent books available, including the previous books in this series. In particular, I have avoided reference to SGML declarations and to the more rarely implemented, optional features of SGML. I have included a particularly detailed treatment of one area for which there has until now been little detailed treatment: characters, glyphs and internationalization.

Order, Structures, Patterns & Forms

This book is about order, structures, patterns and forms. In this book, order means the underlying, abstract (and sometimes ineffable) relationships and natures of things; structure means how some order is captured in some concrete markup; pattern means a kind of template or recipe used for creating structures (i.e., "pattern" in the dressmaking sense, not the text-processing sense); and form means a particular conformance between one structure and another (i.e., "form" in the concrete-laying sense, not the metaphysical sense).

Like any good cookbook, as well as showing how to make a structure, The XML & SGML Cookbook also tries to explain why, to explore the alternatives, and give the various pros and cons.

The freedom of a highly generalized technology like SGML can cause unease for new document-system designers. Being able to move in any direction is not much comfort if you cannot afford to go in the wrong direction! Fortunately, during the ten years of SGML's existence as an International Standard, convergent approaches and solutions to many common document patterns have emerged. This book attempts to catalog and discuss the best, the most instructive and the most useful of them.

Document Systems

A consideration of system constraints and factors outside the scope of the document type declaration is so often the thing that makes a project successful. Because of this, this book is aimed at the document-system designer rather than just the "DTD writer." Information is never managed in a vacuum; documents exist as part of a document system. Sometimes the system is closed, sometimes open-ended. If the information is valuable enough to warrant management, your whole document system should usefully be considered when creating a great DTD.

In any case, this book will also be of use to those who generate XML documents, and who might never create formal declarations for element types using SGML as their notation. If you are one of these people, I urge you to learn and attempt to use SGML content models in your informal documentation at least: SGML provides a very convenient and well-thought-out notation, suitable for many kinds of structures, and there are graphical visualization tools available to help.
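As a rough illustration of the content-model notation mentioned above (this sketch is not taken from the book), the Python snippet below reads the element declarations of a small document and prints each content model back in the familiar notation. The memo document type and every element name in it are hypothetical, invented only for this example; the parsing relies on the standard library's expat bindings, which report declarations but do not validate.

```python
import xml.parsers.expat as expat

# Hypothetical memo document type, invented purely for illustration.
DOC = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo    (to+, from, subject?, body)>
  <!ELEMENT to      (#PCDATA)>
  <!ELEMENT from    (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT body    (para | list)*>
  <!ELEMENT para    (#PCDATA)>
  <!ELEMENT list    (para+)>
]>
<memo><to>A</to><from>B</from><body/></memo>"""

M = expat.model

# Occurrence indicators: none, optional, zero-or-more, one-or-more.
QUANT = {
    M.XML_CQUANT_NONE: "",
    M.XML_CQUANT_OPT: "?",
    M.XML_CQUANT_REP: "*",
    M.XML_CQUANT_PLUS: "+",
}


def render(model):
    """Turn expat's (type, quantifier, name, children) tuple into notation."""
    kind, quant, name, children = model
    if kind == M.XML_CTYPE_NAME:
        return name + QUANT[quant]
    if kind == M.XML_CTYPE_EMPTY:
        return "EMPTY"
    if kind == M.XML_CTYPE_ANY:
        return "ANY"
    if kind == M.XML_CTYPE_MIXED:
        parts = ["#PCDATA"] + [render(child) for child in children]
        return "(" + " | ".join(parts) + ")" + QUANT[quant]
    # Remaining kinds are groups: a sequence (,) or a choice (|).
    separator = ", " if kind == M.XML_CTYPE_SEQ else " | "
    return "(" + separator.join(render(child) for child in children) + ")" + QUANT[quant]


def content_models(xml_text):
    """Collect the content model of every element type declared in the text."""
    models = {}
    parser = expat.ParserCreate()

    def on_element_decl(name, model):
        models[name] = render(model)

    parser.ElementDeclHandler = on_element_decl
    parser.Parse(xml_text, True)
    return models


MODELS = content_models(DOC)
for element_name, content_model in MODELS.items():
    print(element_name, content_model)
```

Even without a validating parser, a line such as `(to+, from, subject?, body)` records sequence and occurrence compactly, which is the kind of informal documentation use the paragraph above suggests.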

So even though order can be discovered in all kinds of places in documents, many times the structures are loose, have exceptions or are incomplete. Consequently, the patterns for element type sets in this book are presented as prototypes and exemplars that you can take and reshape to your particular needs, rather than as templates which you must obediently cut and paste.

The document-system designer needs to be aware of the limits of DTD elegance. A pattern that the designer may perceive as the archetype for authors may in fact merely be a stereotype of their needs. A pattern can only be used successfully to reveal some actual order, never to impose a spurious order.

Document-system designers tend to have neat and schematic minds that reject disorder, sometimes at the price of wanting to see order where there is none: a mirage from some previous document. So this book, as well as giving patterns, also gives some principles for selecting patterns. The need for elegance must be moderated by the need for success. I hope that readers who come to this book expecting neat cookie-cutter solutions will instead be empowered and enabled to understand the issues and tradeoffs most appropriate for their individual needs.

Terminology

XML is bringing a rich influx of people from different disciplines and technologies into the SGML world, and so there is quite a variety and duplication of terminology. In order to keep sentences under control, I have used some common simplified terms which emphasize the SGML keywords used.

This Book               ISO-ese
ANY element             element having a declared content type of ANY
CDATA attribute         attribute having a declared value of CDATA
CDATA element           element having a declared content type of CDATA
CDATA entity            CDATA entity
CDATA marked section    CDATA marked section
container element       element having subelements
EMPTY element           empty element
ID attribute            attribute having a declared value of ID
IDREF attribute         attribute having declared value of IDREF
NDATA entity            NDATA entity
NMTOKEN attribute       attribute having declared value of NMTOKEN
RCDATA element          element having a declared content type of RCDATA
SDATA entity            SDATA entity
SUBDOC entity           subdocument entity

In this book, an attribute ID means an attribute with the name ID; an ID attribute means an attribute having the declared value ID; but the attribute ID means that attribute with the name ID in the example snippet. It is good usage that an attribute ID should be an ID attribute and that an attribute IDREF should be an IDREF attribute.
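The distinction between an attribute ID and an ID attribute can be made concrete with a short sketch (not from the book). The document type below, including the element names doc, para, xref and note, is hypothetical. The Python code uses the standard library's expat bindings to list each declared attribute with its name and declared value, and a small helper, also invented for this example, checks the "good usage" rule stated above.

```python
import xml.parsers.expat as expat

# Hypothetical declarations: para and xref follow the good-usage rule,
# while note has an attribute named ID whose declared value is CDATA.
DOC = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc  (para | xref | note)*>
  <!ELEMENT para (#PCDATA)>
  <!ELEMENT xref EMPTY>
  <!ELEMENT note (#PCDATA)>
  <!ATTLIST para ID    ID    #IMPLIED>
  <!ATTLIST xref IDREF IDREF #REQUIRED>
  <!ATTLIST note ID    CDATA #IMPLIED>
]>
<doc><para ID="p1">text</para><xref IDREF="p1"/></doc>"""


def declared_attributes(xml_text):
    """Return (element, attribute name, declared value) for each declaration."""
    found = []
    parser = expat.ParserCreate()

    def on_attlist(elname, attname, declared_value, default, required):
        found.append((elname, attname, declared_value))

    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return found


def follows_good_usage(attname, declared_value):
    """An attribute named ID or IDREF should have the declared value of that name."""
    if attname in ("ID", "IDREF"):
        return declared_value == attname
    return True


DECLS = declared_attributes(DOC)
for elname, attname, declared_value in DECLS:
    verdict = "good usage" if follows_good_usage(attname, declared_value) else "attribute ID, but not an ID attribute"
    print(f"{elname}: name={attname}, declared value={declared_value} ({verdict})")
```

In this sketch, the attribute on note is "an attribute ID" in the book's terms, because of its name, but it is not "an ID attribute", because its declared value is CDATA.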

I intend to maintain a Web page giving any errata for this book, at the Prentice Hall PTR Web site www.phptr.com.

Rick Jelliffe
Sydney, Australia

Table of Contents

Foreword
Preface
Index of Patterns, Structures and Forms
Part 1: Systems of Documents
  Documents & Publications
    Explicit and Implicit Document Type
    Love, Bloat and Prudence
    A Six-View Model of Publications
      Viewing Page Layout
      Viewing Page Objects
      Viewing Glyphs
      Viewing Characters
      Viewing Editorial Structure
      Viewing Topic Structure
      The Flow of Dependence
    Fads, Trends, Polemics
    Conflicts of Interest: HTML and SMDL
    What about Non-Text?
    Documents versus APIs
  The Nature of Markup
    What is Good Markup?
    Generic and Specific Tagging
    Which is Better: Generic Markup or Specific Markup?
    Underlying Forms
    Embedding Other Kinds of Data
    The Worst DTD in the World?
  Software Engineering
    DTDs and Patterns
      Reusable Components
      Architectures
      Information Units
      Cohesion and Coupling
    Waterfalls and Spirals
      Diagrams
      Maler and El Andaloussi's Methodology
    Exploration and Prototypes
      Prototyping
      Exploratory DTD Design
    The Human Side
      Viewpoint Analysis
      Scenario Analysis
      User Interfaces are Documents
      Involvement
      Useful Skills
  Implementation Choices
    DTD Style Checklist
    Do You Need Full SGML?
    Almost SGML
      Non-Standard Generalized Markup
      HTML
      XML
      Your Own Simplified SGML
      SGML with User Extensions
    Thumbs Rule, OK!
      Language Analogy
      Object Relationships
      Occurrence
      Sequence
  The Document in Use
    Declarations are Not Enough!
    Processing SGML
      SGML Tools
      Text Tools
      Storage Management Tools
      Hybrid Tools
    Groovy Steps with SGML
      The Scrub
      The Massage
      The Tweak
    Growth of DTDs
    Top 10 Reasons Why DTDs Fail
  Design Principles
Part 2: Document Patterns
  Common Attributes
    SGML
    HTML & XML
    XLL
    TEI
    SGML Extended Facilities
      Default Value Lists
      Data Attributes for Elements
      Limiting the Target Element Types of IDREFs
      Common Data Attributes
    The Unspecified Attribute
    Paragraphs
    Architectural Forms
  The Document Shell
    HTML
    Information Units
    The Advantages of a Simple Head
  Paragraphs
    Paragraphs versus Text Blocks
    Paragraphs versus Paragraph Groups
    Paragraph Contents
    Paragraphs Nested Inside Paragraphs
    Subparagraphs
    ID Attributes
    Development of the Paragraph
    Paragraph Breaks
    Paragraph Groups Revisited
  Sequences
    Examples of Sequences
    Bad Mixed Content
    Simplifying the Linear Form
  Named Data
    Fielded Text
      Sequences of Fielded Text
    Element References
    Description Tables
    Importing ASCII Dumps
    Schema and Type Extension using Parameter Entities
  Tables
    Direct Markup versus Element Reference
    Simple HTML-Style Tables
    ICADD Tables
    CALS Tables
    HTML 4 Tables
  Interactive Systems
    Entity
    Element
    Processing Instruction
  Formal Public Identifiers
    OASIS (SGML Open) Entity Catalogs
    SGML and MIME
  Data Content Notations
    Some FPIs for Notations
    ISO Standard
    Time and Space
    Non-Standard
  Formal System Identifiers
    Formal System Identifiers
    Becoming an FSI User
  Embedded Notations
    Naming
    Stylesheets and Scripts
    Defining Data Types
      Lexical Typing using Standard Notation Names
      Lexical Typing using Lexical Models
      HyLex and POSIX Regular Expression Delimiters
      An Attribute for Dates using HyLex
      Date using POSIX Regular Expressions
    Embedding Other Notations
      Fragment Interchange
  Organizing & Documenting DTDs
    Core Element Type Sets
    Base and Derived DTDs
    Architectural Forms
    DTD Versions
    Multiple Pass DTDs
    Unaccounted-for Elements
      Simple
      Richer
    Documenting Your DTD
      External Documents
      Comments
      Additional Requirements
      Descriptions in the Document Instance
Part 3: Characters & Glyphs
  About Characters & Glyphs
    The ISO Character/Glyph Model
    Millefiori: 1000 Flowers
    Modern Printed Scripts
    Character Repertoire
      Using Entities
      Using Elements
    Characters
    Collation
      Simple Collation for English
      Collation for Western European Languages
      Fuzzy Transforms
      Explicit Markup
  Typeface, Script, & Language
    Typeface
      Western
      Eastern
      Specifying Exact Font
      Design Group
    Script Codes
    Language Codes
    Country Codes
    Multilingual Documents
      Inline Localizable
      Interlaced Multilingual
      Multilingual Hyperdocument
      Multilingual World Wide Web
    TEI Writing System Declaration
  The Flowering of Coded Character Sets
    The Joy of Sets
      Telegraph Codes: Five-Bit Sets
      ASCII, EBCDIC, and ISO 646: Seven-Bit Sets
      ISO 8859, ISCII, JIS X 201: Eight-Bit Sets
      Extended Eight-Bit Sets
      Sixteen-Bit Sets
      Extended Sixteen-Bit Sets
      Universal Sets
    Literals
    Character Set and Encodings
      WG4 Character Encoding Model
      How To Specify Character Encoding
  Them's the Breaks
    Spaces, Words, Hyphens and Lines
    Word Segmentation
      Joining
      Splitting Words and Hyphenation
      Finding
    White-space
    Word Segmentation in Chinese
  Special Characters & SDATA
    Using SDATA Entities
      SDATA Entity Text
    Quality Assurance on Characters
    Accents
    HTML Entities
    Mathematical Scripts and Symbols
    XML
  From Characters To Glyphs
    Glyph Mapping
    Glyph Selection
      Glyph Selection with Entities
    Size
    Superscripts and Subscripts
    Color Codes
      Black, Grays and White
      Colors
    Typographical Embellishments
  East Asian Issues
    Custom Symbols
    Extra Characters
    Gaiji & User-Defined Characters
    Custom Fonts
    Marking Up Handwritten Text
    Ruby Annotations
    Native-Language Markup
Appendixes
  Appendix A: ISO Special Characters
  Appendix B: HTML Special Characters
  Appendix C: TEI Special Characters
  Appendix D: Index of XML Special Characters
Bibliography
Index
Acknowledgements
Colophon
