Joe Celko's Trees and Hierarchies in SQL for Smarties

( 1 )

Overview

The demand for SQL information and training continues to grow with the need for a database behind every website capable of offering web-based information queries. SQL is the de facto standard for database retrieval, and if you need to access, update, or utilize data in a modern database management system, youwill need SQL to do it. TheSecond Editionof Joe Celko's Trees and Hierarchies in SQL for Smarties covers two new sets of extensions over three entirelynew chapters and expounds upon the changes that have ...

See more details below
Paperback
$43.77
BN.com price
(Save 8%)$47.95 List Price
Other sellers (Paperback)
  • All (8) from $2.82   
  • New (3) from $29.75   
  • Used (5) from $2.82   
Joe Celko's Trees and Hierarchies in SQL for Smarties

Available on NOOK devices and apps  
  • NOOK Devices
  • NOOK HD/HD+ Tablet
  • NOOK
  • NOOK Color
  • NOOK Tablet
  • Tablet/Phone
  • NOOK for Windows 8 Tablet
  • NOOK for iOS
  • NOOK for Android
  • NOOK Kids for iPad
  • PC/Mac
  • NOOK for Windows 8
  • NOOK for PC
  • NOOK for Mac
  • NOOK Study
  • NOOK for Web

Want a NOOK? Explore Now

NOOK Book (eBook)
$46.95
BN.com price

Overview

The demand for SQL information and training continues to grow with the need for a database behind every website capable of offering web-based information queries. SQL is the de facto standard for database retrieval, and if you need to access, update, or utilize data in a modern database management system, youwill need SQL to do it. TheSecond Editionof Joe Celko's Trees and Hierarchies in SQL for Smarties covers two new sets of extensions over three entirelynew chapters and expounds upon the changes that have occurred in SQL standards since the previous edition's publication. Benefit from mastering the challenging aspects of these database applications in SQL as taught by Joe Celko, one of the most-read SQL authors in the world.

• Expert advice from a noted SQL authority and award-winning columnist who has given 10 years of service to the ANSI SQL standards committee

• Teaches scores of advanced techniques that can be used with any product, in any SQL environment

• Offers graph theory and programming techniques for working around deficiencies and gives insight into real-world challenges

Audience: Database application developers (from enterprise-level application builders to small business developers).

Read More Show Less

Editorial Reviews

From the Publisher
"I want to say clearly that I think the subject of this proposed book is one for which there will be considerable demand...the topic is poorly understood in general and a good book on the subject will be helpful to the SQL community at large. This book should be of great interest to real-world application programmers...I think that this book would be used on a day-to-day basis (rather than languish on a shelf until some special problem arose)." -Jim Melton, author of SQL:1999.
Read More Show Less

Product Details

Meet the Author

Joe Celko served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards.

Mr. Celko is author a series of books on SQL and RDBMS for Elsevier/MKP. He is an independent consultant based in Austin, Texas.

He has written over 1200 columns in the computer trade and academic press, mostly dealing with data and databases.

Read More Show Less

Read an Excerpt

JOE CELKO'S TREES AND HIERARCHIES IN SQL FOR SMARTIES


By Joe Celko

MORGAN KAUFMANN

Copyright © 2004 Elsevier Inc.
All right reserved.

ISBN: 978-0-08-049169-1


Chapter One

Graphs, Trees, and Hierarchies

LET'S START WITH a little mathematical background. Graph theory is a branch of mathematics that deals with abstract structures, known as graphs. These are not the presentation charts that you get out of a spreadsheet package.

Very loosely speaking, a graph is a diagram of "dots" (called nodes or vertices) and "lines" (edges) that model some kind of "flow" or relationship. The edges can be undirected or directed. Graphs are very general models. In circuit diagrams the edges are the wires and the nodes are the components. On a road map the nodes are the towns and the edges are the roads. Flowcharts, organizational charts, and a hundred other common abstract models you see every day are all shown as graphs.

A directed graph allows a "flow" along the edges in one direction only, as shown by the arrowheads, whereas an undirected graph allows the flow to travel in both directions. Exactly what is flowing depends on what you are modeling with the graph.

The convention is that an edge must join two (and only two) nodes. This lets us show an edge as an ordered pair of nodes, such as (Atlanta, Boston) if we are dealing with a map, or (a, b) in a more abstract notation. There is an implication in a directed graph that the direction is shown by the ordering. In an undirected graph we know that (a, b) = (b, a), however.

A node can sit alone or have any number of edges associated with it. A node can also be self-referencing, as in (a, a).

The terminology used in graph theory will vary, depending on which book you had in your finite math class. The following list, in informal language, includes the terms I will use in this book.

Order of a graph: number of nodes in the graph

Degree: the number of edges at a node, without regard to whether the graph is directed or undirected

Indegree: the number of edges coming into a node in a directed graph

Outdegree: the number of edges leaving a node in a directed graph

Subgraph: a graph that is a subset of another graph's edges and nodes

Walk: a subgraph of alternating edges and nodes that are connected to each other in such a way that you can trace around it without lifting your finger

Path: a subgraph that does not cross over itself—there is a starting node with degree one, an ending node with degree one, and all other nodes have degree two. It is a special case of a walk. It is a "connect-the-dots" puzzle.

Cycle: a subgraph that "makes a loop," so that all nodes have degree two. In a directed graph all the nodes of a cycle have outdegree one and indegree one (Figure 1.1).

Connected graph: a graph in which all pairs of nodes are connected by a path. Informally, the graph is all in one piece.

Forest: a collection of separate trees. Yes, I am defining this term before we finally get to discussing trees.

There are a lot more terms to describe special kinds of graphs, but frankly, we will not use them in this book. We are supposed to be learning SQL programming, not graph theory.

The strength of graphs as problem-solving tools is that the nodes and edges can be given extra attributes that adapt this general model to a particular problem. Edges can be assigned "weights," such as expected travel time for the roads on a highway map. Nodes can be assigned "colors" that put them into groups, such as men and women. Look around and you will see how they are used.

1.1 Modeling a Graph in a Program

Long before there was SQL, programmers represented graphs in the programming language that they had. People used pointer chains in assembly language or system development languages such as 'C' to build very direct representations of graphs with machine language instructions. However, unlike the low level system development languages, the later, higher-level languages, such as Pascal, LISP, and PL/I, did not expose the hardware to the programmer. Pointers in these higher level languages were abstracted to hide references to the actual physical storage and often required that the pointers point to variables or structures of a particular type. (See PL/I's ADDR() function, pointer datatypes, and based variables as examples of this kind of language construct.)

Traditional application development languages do not have pointers, but they often have arrays. In particular, FORTRAN only had arrays for a data structure; a good FORTRAN programmer could use them for just about anything. Early versions of FORTRAN did not have character-string data types—everything was either an integer or a floating-point number. This meant the model of a graph had to be created by numbering the nodes and using the node numbers as subscripts to index into the arrays.

Once the array techniques for graphs were developed, they became part of the "programmer's folklore" and were implemented in other languages. I will use a pseudocode to explain the techniques.

1.1.1 Adjacency Lists for Graphs

The adjacency list model represents the edges of the graph as pairs of nodes, similar to the following computer code:

DECLARE ARRAY GraphList OF RECORD [edge CHAR(1), in_node INTEGER, out_node INTEGER];

With data:

GraphList edge in_node out_node 'a' 1 2 'b' 2 3 'c' 4 2 'd' 1 4

The algorithms that we used were based on loops that made the connections between two edges, in which the in_node of one row was equal to the out_node of another row.

1.1.2 Adjacency Arrays for Graphs

Many of the computational languages had library functions for matrix operations; therefore it was logical to put the graph into an array where it could be manipulated with these functions.

Given (n) nodes, you could declare an (n) by (n) array with zeros and ones in the cells. A "one" meant that there was an edge between the two nodes represented by the row and column of the cell, and a "zero" meant that there was not.

You can actually represent a two-dimensional array; for example: A[0:5, 0:5], with a table like this:

CREATE TABLE Array_A (edge CHAR(10) NOT NULL, i INTEGER NOT NULL UNIQUE CHECK (i BETWEEN 0 AND 5), j INTEGER NOT NULL UNIQUE CHECK (j BETWEEN 0 AND 5), PRIMARY KEY (i, j));

I have a chapter in SQL For Smarties on how to do basic matrix math with such tables. However, because SQL was not meant to be used this way, the code to implement the old Adjacency Array algorithms is rather baroque. Array was added to SQL-99 as a "collection type" for columns, but it is not widely implemented and has serious limitations—it is a vector, or one-dimensional array, and not a full multidimensional structure.

1.1.3 Finding a Path in General Graphs in SQL

There is a classic problem in graph theory that illustrates how expensive it can be to do general graphs in SQL. What we want is a list of paths from any two nodes in a directed graph in which the edges have a weight. The sum of these weights gives us the cost of each path so that we can pick the cheapest path.

The best way is probably to use the Floyd-Warshall or Johnson algorithm in a procedural language and load a table with the results. However, I want to do this in pure SQL as an exercise. Let's start with a simple graph and represent it as an adjacency list with weights on the edges.

CREATE TABLE Graph (source CHAR(2) NOT NULL, destination CHAR(2) NOT NULL, cost INTEGER NOT NULL, PRIMARY KEY (source, destination));

I obtained data for this table from the book Introduction to Algorithms by Cormen, Leiserson, and Rivest (Cambridge, Mass., MIT Press, 1990, p. 518; ISBN 0-262-03141-8). This book is very popular in college courses in the United States. I made one decision that will be important later—I added self-traversal edges—the node is both the source and the destination so the cost of those paths is zero.

INSERT INTO Graph VALUES ('s', 's', 0), ('s', 'u', 3), ('s', 'x', 5), ('u', 'u', 0), ('u', 'v', 6), ('u', 'x', 2), ('v', 'v', 0), ('v', 'y', 2), ('x', 'u', 1), ('x', 'v', 4), ('x', 'x', 0), ('x', 'y', 6), ('y', 's', 3), ('y', 'v', 7), ('y', 'y', 0);

I am not happy about this approach, because I have to decide the maximum number of edges in a path before I start looking for an answer. However, this will work, and I know that a path will have no more than the total number of nodes in the graph. Let's create a table to hold the paths:

CREATE TABLE Paths (step_1 CHAR(2) NOT NULL, step_2 CHAR(2) NOT NULL, step_3 CHAR(2) NOT NULL, step_4 CHAR(2) NOT NULL, step_5 CHAR(2) NOT NULL, total_cost INTEGER NOT NULL, path_length INTEGER NOT NULL, PRIMARY KEY (step_1, step_2, step_3, step_4, step_5));

The step_1 node is where I begin the path. The other columns are the second step, third step, fourth step, and so forth. The last step column is the end of the journey. The total_cost column is the total cost, based on the sum of the weights of the edges, on this path. The path_length column is harder to explain, but for now let's just say that it is a count of the nodes visited in the path.

To keep things easier let's look at all the paths from "s" to "y" in the graph. The INSERT INTO statement for constructing that set looks like this:

INSERT INTO Paths SELECT G1.source, it is 's' in this example G2.source, G3.source, G4.source, G4.destination, it is 'y' in this example (G1.cost + G2.cost + G3.cost + G4.cost), (CASE WHEN G1.source NOT IN (G2.source, G3.source, G4.source) THEN 1 ELSE 0 END + CASE WHEN G2.source NOT IN (G1.source, G3.source, G4.source) THEN 1 ELSE 0 END + CASE WHEN G3.source NOT IN (G1.source, G2.source, G4.source) THEN 1 ELSE 0 END + CASE WHEN G4.source NOT IN (G1.source, G2.source, G3.source) THEN 1 ELSE 0 END) FROM Graph AS G1, Graph AS G2, Graph AS G3, Graph AS G4 WHERE G1.source = 's' AND G1.destination = G2.source AND G2.destination = G3.source AND G3.destination = G4.source AND G4.destination = 'y';

I put in "s" and "y" as the source and destination of the path and made sure that the destination of one step in the path was the source of the next step in the path. This is a combinatorial explosion, but it is easy to read and understand.

The sum of the weights is the cost of the path, which is easy to understand. The path_length calculation is a bit harder. This sum of CASE expressions looks at each node in the path. If it is unique within the row, it is assigned a value of one; if it is not unique within the row, it is assigned a value of zero.

All paths will have five steps in them, because that is the way the table is declared. However, what if a path shorter than five steps exists between the two nodes? That is where the self-traversal rows are used! Consecutive pairs of steps in the same row can be repetitions of the same node.

Here is what the rows of the paths table look like after this INSERT INTO statement, ordered by descending path_length, and then by ascending cost:

Paths step_1 step_2 step_3 step_4 step_5 total_cost path_length

s s x x y 11 0 s s s x y 11 1 s x x x y 11 1 s x u x y 14 2 s s u v y 11 2 s s u x y 11 2 s s x v y 11 2 s s x y y 11 2 s u u v y 11 2 s u u x y 11 2 s u v v y 11 2 s u x x y 11 2 s x v v y 11 2 s x x v y 11 2 s x x y y 11 2 s x y y y 11 2 s x y v y 20 4 s x u v y 14 4 s u v y y 11 4 s u x v y 11 4 s u x y y 11 4 s x v y y 11 4

(Continues...)



Excerpted from JOE CELKO'S TREES AND HIERARCHIES IN SQL FOR SMARTIES by Joe Celko Copyright © 2004 by Elsevier Inc.. Excerpted by permission of MORGAN KAUFMANN. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.

Read More Show Less

Table of Contents

1 Graphs, trees, and hierarchies 3
2 Adjacency list model 17
3 Path enumeration models 35
4 Nested set model of hierarchies 45
5 Frequent insertion trees 101
6 The linear version of the nested sets model 137
7 Binary trees 143
8 Other models for trees 157
9 Proprietary extensions for trees 169
10 Hierarchies in data modeling 175
11 Hierarchical encoding schemes 191
12 Hierarchical database systems (IMS) 199
Read More Show Less

Customer Reviews

Average Rating 5
( 1 )
Rating Distribution

5 Star

(1)

4 Star

(0)

3 Star

(0)

2 Star

(0)

1 Star

(0)

Your Rating:

Your Name: Create a Pen Name or

Barnes & Noble.com Review Rules

Our reader reviews allow you to share your comments on titles you liked, or didn't, with others. By submitting an online review, you are representing to Barnes & Noble.com that all information contained in your review is original and accurate in all respects, and that the submission of such content by you and the posting of such content by Barnes & Noble.com does not and will not violate the rights of any third party. Please follow the rules below to help ensure that your review can be posted.

Reviews by Our Customers Under the Age of 13

We highly value and respect everyone's opinion concerning the titles we offer. However, we cannot allow persons under the age of 13 to have accounts at BN.com or to post customer reviews. Please see our Terms of Use for more details.

What to exclude from your review:

Please do not write about reviews, commentary, or information posted on the product page. If you see any errors in the information on the product page, please send us an email.

Reviews should not contain any of the following:

  • - HTML tags, profanity, obscenities, vulgarities, or comments that defame anyone
  • - Time-sensitive information such as tour dates, signings, lectures, etc.
  • - Single-word reviews. Other people will read your review to discover why you liked or didn't like the title. Be descriptive.
  • - Comments focusing on the author or that may ruin the ending for others
  • - Phone numbers, addresses, URLs
  • - Pricing and availability information or alternative ordering information
  • - Advertisements or commercial solicitation

Reminder:

  • - By submitting a review, you grant to Barnes & Noble.com and its sublicensees the royalty-free, perpetual, irrevocable right and license to use the review in accordance with the Barnes & Noble.com Terms of Use.
  • - Barnes & Noble.com reserves the right not to post any review -- particularly those that do not follow the terms and conditions of these Rules. Barnes & Noble.com also reserves the right to remove any review at any time without notice.
  • - See Terms of Use for other conditions and disclaimers.
Search for Products You'd Like to Recommend

Recommend other products that relate to your review. Just search for them below and share!

Create a Pen Name

Your Pen Name is your unique identity on BN.com. It will appear on the reviews you write and other website activities. Your Pen Name cannot be edited, changed or deleted once submitted.

 
Your Pen Name can be any combination of alphanumeric characters (plus - and _), and must be at least two characters long.

Continue Anonymously
Sort by: Showing 1 Customer Reviews
  • Anonymous

    Posted April 29, 2012

    To rp

    You boy or girl

    Was this review helpful? Yes  No   Report this review
Sort by: Showing 1 Customer Reviews

If you find inappropriate content, please report it to Barnes & Noble
Why is this product inappropriate?
Comments (optional)