Read an Excerpt
JOE CELKO'S SQL PROGRAMMING STYLE
By Joe Celko
MORGAN KAUFMANN PUBLISHERS
Copyright © 2005 Elsevier Inc.
All right reserved.
ISBN: 978-0-08-047883-8
Chapter One
Names and Data Elements
This is the old joke: "When I was a kid, we had three cats."
"What were their names?"
"Cat, cat, and cat."
"That sounds screwed up. How did you tell them apart?"
"Who cares? Cats don't come when you call them anyway!"
Your data will not come when it is called either if you do not give it a name that is always distinct and recognizable. This is an important part of any database project. Bad names for the data elements make the code difficult, or even impossible, to read.
I am not kidding about impossible to read. In the old days, software companies used to deliberately scramble source code names and remove formatting to hide the algorithm from the buyers. The tradition seems to linger on, even if not by intent. In August 2004, a SQL newsgroup had a posting in which all of the names were one letter and a long string of digits.
There are now ISO-11179 metadata standards that describe rules for naming data elements and for registering standards. Because they are an ISO standard, they are what you should be using in SQL as well as everywhere else.
That standard, a bit of typography, and some common sense will give you the rules you need to get started.
1.1 Names
In the early days, every programmer had his or her own personal naming conventions. Unfortunately, they were often highly creative. My favorite was a guy who picked a theme for his COBOL paragraph names: one program might use countries, another might use flowers, and so forth. This is obviously weird behavior even for a programmer, but many programmers had personal systems that made sense to themselves but not to other people.
For example, the first FORTRAN I used allowed only six-letter names, so I became adept at using and inventing six-letter names. Programmers who started with weakly typed or typeless languages like to use Hungarian notation (see Leszynski and Reddick). Old habits are hard to give up.
When software engineering became the norm, every shop developed its own naming conventions and enforced them with some kind of data dictionary. Perhaps the most widespread set of rules was MIL STD 8320.1, set up by the U.S. Department of Defense, but it never became popular outside of the federal government. This was a definite improvement over the prior nonsystem, but each shop varied quite a bit; some had formal rules for name construction, whereas others simply registered whatever the first name given to a data element was.
Today, we have ISO-11179 standards, which are becoming increasingly widespread, required for certain government work, and being put into data repository products. Tools and repositories of standardized encoding schemes are being built to this standard. Given this and XML as a standard exchange format, ISO-11179 will be the way that metadata is referenced in the future.
1.1.1 Watch the Length of Names
Rationale:
The SQL-92 standards have a maximum identifier length of 18 characters. This length came from the older COBOL standards. These days, SQL implementations allow longer names, but if you cannot say it in 18 characters, then you have a problem. Table 1.1 shows the maximum length for names of the most important SQL schema objects according to ISO and several popular SQL products.
The numbers in the table are either bytes or characters. A maximum character length can be smaller than a maximum byte length if you use a multibyte character set.
Do not use super-long names. People have to read them, type them, and print them out. They also have to be able to understand those names when they look at the code, search for them in the data dictionary, and so forth. Finally, the names need to be shared in host programs that might not allow the same maximum length.
But do not go to the other extreme of highly condensed names that are impossible to read without weeks of study. The old Bachman design tool was used to build DB2 databases back when column length was limited to 18 bytes. Sometimes the tool would change the logical attribute name to a physical column name by removing all of the vowels. Craig Mullins referred to this as "Bachman having a vowel movement on my DDL." This is a bad approach to getting the name to fit within a smaller number of characters.
Exceptions:
These exceptions would be on a case-by-case basis and probably the result of legacy systems that had different naming restrictions.
1.1.2 Avoid All Special Characters in Names
Rationale:
Special characters in a name make it difficult or impossible to use the same name in the database and the host language programs or even to move a schema to another SQL product.
Table 1.2 shows the characters allowed in names by the standards and popular SQL products.
Generally, the first character of a name must be a letter, whereas subsequent characters may be letters, digits, or _ (underscore). Any database management system (DBMS) might also allow $, #, or @, but no DBMS allows all three, and in any case the special characters are not usable everywhere (Microsoft attaches special meaning to names that begin with @ or # and Oracle discourages special characters in the names of certain objects).
But what is a letter? In the original SQL, all letters had to be uppercase Latin, so there were only 26 choices. Nowadays the repertoire is more extensive, but be wary of characters outside the Latin-1 character set for the following reasons:
1. IBM cannot always recognize a letter. It just accepts that any multibyte character except space is a letter and will not attempt to determine whether it's uppercase or lowercase.
2. IBM and Oracle use the database's character set and so could have a migration problemwith exotic letters. Microsoft uses Unicode and so does not have this problem.
Intermediate SQL-92 does not allow an identifier to end in an underscore. It is also not a good idea to put multiple underscores together; modern printers make it difficult to count the number of underscores in a chain.
Exceptions: None
1.1.3 Avoid Quoted Identifiers
Rationale:
This feature was added to SQL-92. Its main use has been to alias column names to make printouts look like reports. This kludge defeats the purpose of a tiered architecture. Instead, it destroys portability of the code and invites poorly constructed names. Table 1.3 shows the characteristics of delimited identifiers.
If you find the character-set restrictions of names onerous, you can avoid them by putting identifiers inside double quotes. The result is a delimited identifier (or quoted identifier in Oracle terminology). Delimited identifiers may startwith, and contain, any character. It is a bit uncertain how one can include the double quote (") character. The standard way is to double it, as in "Empl"oyees" but that's not always documented.
Support for delimited names is nearly universal, with only two major exceptions: (1) IBM will not allow nonalphanumeric characters for labels and variable names inside stored procedures, and (2) Microsoft will not allow quoted identifiers if the QUOTED_IDENTIFIER switch is off. The reason for the first exception is, perhaps, that IBM converts SQL procedures into another computer language before compilation. Suppose you make a table with a delimited identifier, for example:
CREATE TABLE "t" ("column1" INTEGER NOT NULL);
Now try to get that table with a regular identifier, thus:
SELECT column1 FROM t;
Will this work? According to the SQL standard, it should not, but with Microsoft, it might. The reason is case sensitivity, which we discuss in section 1.1.4.
The quoted identifiers do not work well with hot languages, especially when they have spaces or special characters. For example, this is a valid insertion statement:
INSERT INTO Table ([field with space]) VALUES (value);
ADO generates the following code:
INSERT INTO Table (field with space) VALUES (value);
which is a syntax error.
Exceptions:
If you need to communicate a result to someone who cannot read or understand the properly constructed column names in Latin-1, then use quoted aliases to format the output. I have done this for Polish and Chinese speakers.
I also use quoted names inside documentation so that they will immediately read as the name of a schema object and not a regular word in the sentence.
The usual reason for this error is that the programmer confuses a data element name with a display header. In traditional procedural languages, the data file and the application are in the same tier; in SQL, the database is totally separate from the front end where the data is displayed.
1.1.4 Enforce Capitalization Rules to Avoid CaseSensitivity Problems
Rationale:
Case-sensitivity rules vary from product to product.
Standard SQL, IBM, and Oracle will convert regular identifiers to uppercase but will not convert delimited identifiers to uppercase. For Microsoft, the case-sensitivity rule has nothing to do with whether the name is regular or delimited. Instead, identifiers depend on the default collation. If the default collation is case insensitive, then t equals T. If it's case sensitive, then t does not equal T.
To sum up, there are two case-sensitivity problems. The first is that the delimited identifier "t" and the regular identifier t differ if one follows the SQL standard. The second is that Microsoft does not follow the SQL standard. These problems make it difficult for one naming convention to fit everyone.
Exceptions:
I will give a simple set of rules based on principles of readability and typography, but there are other possible conventions:
1. Avoid delimited identifiers so you have no problems.
2. IBM uses only uppercase. Unfortunately, this is difficult to read and looks like you are still programming on a punchcard system.
3. Microsoft and Oracle use lowercase except where it would look odd. Unfortunately, the definition of looking odd is not at all precise. Sometimes reserved words are uppercased, sometimes lowercased, and so forth.
1.2 Follow the ISO-11179 Standards Naming Conventions
This is a fairly new ISO standard for metadata, and it is not well understood. Fortunately, the parts that a SQL programmer needs to know are pretty obvious and simple. The real problem is in the many ways that people violate them. A short summary of the NCITS L8 Metadata Standards Committee rules for data elements can be found at the following sites:
http://pueblo.lbl.gov/~olken/X3L8/drafts/draft.docs.html http://lists.oasis-open.org/archives/ubl-ndrsc/200111/ msg00005.html
Also the pdf file: www.oasis-open.org/committees/download.php/6233/ c002349_ISO_IEC_11179
and the draft: www.iso.org/iso/en/ittf/PubliclyAvailableStandards/ c002349_ISO_IEC_11179-1_1999(E).zip
The ISO-11179 standard is broken down into six sections:
11179-1: Framework for the Specification and Standardization of Data Elements Definitions 11179-2: Classification for Data Elements 11179-3: Basic Attributes of Data Elements 11179-4: Rules and Guidelines for the Formulation of Data 11179-5: Naming and Identification Principles for Data 11179-6: Registration of Data Elements
1.2.1 ISO-11179 for SQL
Rationale:
Although the formal standards are good, they are very general. It is handy to have a set of rules aimed at the SQL developer in his or her own language. Some of the interpretations given here are the consensus of experts, as taken from newsgroups and private e-mails.
Taking the rules from Section ISO-11179-4, a scalar data element should do the following:
1. Be unique (within any data dictionary in which it appears).
2. Be stated in the singular.
3. State what the concept is, not only what it is not.
4. Be stated as a descriptive phrase or sentence(s).
5. Contain only commonly understood abbreviations.
6. Be expressed without embedding definitions of other data elements or underlying concepts.
7. Tables, sets, and other collections shall be named with a collective, class, or plural name.
8. Procedures shall have a verb in their name.
9. A copy (alias) of a table shall include the base table name as well as the role it is playing at that time.
(Continues...)
Excerpted from JOE CELKO'S SQL PROGRAMMING STYLE by Joe Celko Copyright © 2005 by Elsevier Inc.. Excerpted by permission of MORGAN KAUFMANN PUBLISHERS. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.