Joe Celko's SQL Puzzles and Answers, Second Edition

In this complete revision and expansion of his first SQL Puzzles book, Joe Celko challenges you with his trickiest puzzles—and then helps solve them with a variety of solutions and explanations. Joe demonstrates the thought processes that are involved in attacking a problem from an SQL perspective to help advanced database programmers solve the puzzles you frequently face. These techniques not only help with the puzzle at hand, but help develop the mindset needed to solve the many difficult SQL puzzles you face every day. Of course, part of the fun is to see whether or not you can write better solutions than Joe’s.

* A great collection of tricky SQL puzzles with a variety of solutions and explanations.

* Uses the proven format of puzzles and solutions to provide a user-friendly, practical look into SQL programming problems - many of which will help users solve their own problems.

* New edition features:
— Many new puzzles added!
— Dozens of new solutions to puzzles, and using features in SQL-99
— Code is edited to conform to SQL STYLE rules
— New chapter on temporal query puzzles
— New chapter on common misconceptions about SQL and RDBMS that leads to problems

Joe Celko's SQL Puzzles and Answers, Second Edition

62.95 In Stock

Joe Celko's SQL Puzzles and Answers, Second Edition

Add to Wishlist

Joe Celko's SQL Puzzles and Answers, Second Edition

eBook

$62.95

eBook
$62.95

Available on Compatible NOOK devices, the free NOOK App and in My Digital Library.

WANT A NOOK? Explore Now

Buy As Gift

Related collections and offers

Overview

Product Details

ISBN-13:	9780080491684
Publisher:	Elsevier Science
Publication date:	09/21/2006
Series:	Morgan Kaufmann Series in Data Management Systems Series
Sold by:	Barnes & Noble
Format:	eBook
Pages:	352
File size:	9 MB

About the Author

Joe Celko served 10 years on ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards.

Mr. Celko is author a series of books on SQL and RDBMS for Elsevier/MKP. He is an independent consultant based in Austin, Texas.

He has written over 1200 columns in the computer trade and academic press, mostly dealing with data and databases.

Read an Excerpt

JOE CELKO'S SQL PUZZLES & ANSWERS

By Joe Celko

MORGAN KAUFMANN PUBLISHERS

Copyright © 2007 Elsevier Inc.
All right reserved.
ISBN: 978-0-08-049168-4

Chapter One

PUZZLE 1 FISCAL YEAR TABLES

Let's write some CREATE TABLE statements that are as complete as possible. This little exercise is important because SQL is a declarative language and you need to learn how to specify things in the database instead of in the code.

The table looks like this:

CREATE TABLE FiscalYearTable1 (fiscal_year INTEGER, start_date DATE, end_date DATE);

It stores date ranges for determining what fiscal year any given date belongs to. For example, the federal government runs its fiscal year from October 1 until the end of September. The scalar subquery you would use to do this table lookup is:

(SELECT F1.fiscal_year FROM FiscalYearTable1 AS F1 WHERE outside_date BETWEEN F1.start_date AND F1.end_date)

Your assignment is to add all the constraints you can think of to the table to guarantee that it contains only correct information.

While vendors all have different date and time functions, let's assume that all we have is the SQL-92 temporal arithmetic and the function EXTRACT ([YEAR | MONTH | DAY] FROM <date expression>), which returns an integer that represents a field within a date.

Answer #1

1. First things first; make all the columns NOT NULL since there is no good reason to allow them to be NULL.

2. Most SQL programmers immediately think in terms of adding a PRIMARY KEY, so you might add the constraint PRIMARY KEY (fiscal_year, start_date, end_date) because the fiscal year is really another name for the pair (start_date, end_date). This is not enough, because it would allow this sort of error:

(1995, '1994-10-01', '1995-09-30') (1996, '1995-10-01', '1996-08-30') <== error! (1997, '1996-10-01', '1997-09-30') (1998, '1997-10-01', '1997-09-30')

You could continue along the same lines and fix some problems by adding the constraints UNIQUE (fiscal_year), UNIQUE (start_date), and UNIQUE (end_date), since we do not want duplicate dates in any of those columns.

3. The constraint that almost everyone forgets to add because it is so obvious is:

CHECK (start_date < end_date) or CHECK (start_date <= end_date), as is appropriate.

4. A better way would be to use the constraint PRIMARY KEY (fiscal_year) as before, but then since the start and end dates are the same within each year, you could use constraints on those column declarations:

CREATE TABLE FiscalYearTable1 (fiscal_year INTEGER NOT NULL PRIMARY KEY, start_date DATE NOT NULL, CONSTRAINT valid_start_date CHECK ((EXTRACT (YEAR FROM start_date) = fiscal_year - 1) AND (EXTRACT (MONTH FROM start_date) = 10) AND CHECK (EXTRACT (DAY FROM start_date) = 01)), end_date DATE NOT NULL, CONSTRAINT valid_end_date CHECK ((EXTRACT (YEAR FROM end_date) = fiscal_year) AND (EXTRACT (MONTH FROM end_date) = 09) AND (EXTRACT (DAY FROM end_date) = 30)));

You could argue for making each predicate a separate constraint to give more detailed error messages. The predicates on the year components of the start_date and end_date columns also guarantee uniqueness because they are derived from the unique fiscal year.

5. Unfortunately, this method does not work for all companies. Many companies have an elaborate set of rules that involve taking into account the weeks, weekends, and weekdays involved.

They do this to arrive at exactly 360 days or 52 weeks in their accounting year. In fact, there is a fairly standard accounting practice of using a "4 weeks, 4 weeks, 5 weeks" quarter with some fudging at the end of the year; you can have a leftover week between 3 and 11 days. The answer is a FiscalMonth table along the same lines as this FiscalYears example.

A constraint that will work surprisingly well for such cases is:

CHECK ((end_date - start_date) = INTERVAL 359 DAYS)

where you adjust the number of days to fit your rules (i.e., 52 weeks * 7 days = 364 days). If the rules allow some variation in the size of the fiscal year, then replace the equality test with a BETWEEN predicate.

Now, true confession time. When I have to load such a table in a database, I get out my spreadsheet and build a table using the built-in temporal functions. Spreadsheets have much better temporal functions than databases, and there is a good chance that the accounting department already has the fiscal calendar in a spreadsheet.

PUZZLE 2 ABSENTEES

This problem was presented on the MS ACCESS forum on CompuServe by Jim Chupella. He wanted to create a database that tracks employee absentee rates. Here is the table you will use:

CREATE TABLE Absenteeism (emp_id INTEGER NOT NULL REFERENCES Personnel (emp_id), absent_date DATE NOT NULL, reason_code CHAR (40) NOT NULL REFERENCES ExcuseList (reason_code), severity_points INTEGER NOT NULL CHECK (severity_points BETWEEN 1 AND 4), PRIMARY KEY (emp_id, absent_date));

An employee ID number identifies each employee. The reason_code is a short text explanation for the absence (for example, "hit by beer truck," "bad hair day," and so on) that you pull from an ever-growing and imaginative list, and severity point is a point system that scores the penalty associated with the absence.

If an employee accrues 40 severity points within a one-year period, you automatically discharge that employee. If an employee is absent more than one day in a row, it is charged as a long-term illness, not as a typical absence. The employee does not receive severity points on the second, third, or later days, nor do those days count toward his or her total absenteeism.

Your job is to write SQL to enforce these two business rules, changing the schema if necessary.

Answer #1

Looking at the first rule on discharging personnel, the most common design error is to try to drop the second, third, and later days from the table. This approach messes up queries that count sick days, and makes chains of sick days very difficult to find.

The trick is to allow a severity score of zero, so you can track the long-term illness of an employee in the Absenteeism table. Simply change the severity point declaration to "CHECK (severity_points BETWEEN 0 AND 4)" so that you can give a zero to those absences that do not count. This is a trick newbies miss because storing a zero seems to be a waste of space, but zero is a number and the event is a fact that needs to be noted.

UPDATE Absenteeism SET severity_points= 0, reason_code = 'long term illness' WHERE EXISTS (SELECT * FROM Absenteeism AS A2 WHERE Absenteeism.emp_id = A2.emp_id AND Absenteeism.absent_date = (A2.absent_date INTERVAL 1 DAY));

When a new row is inserted, this updatewill look for another absence on the day before and change its severity point score and reason_code in accordance with your first rule.

The second rule for firing an employee requires that you know what his or her current point score is. You would write that query as follows:

SELECT emp_id, SUM(severity_points) FROM Absenteeism GROUP BY emp_id;

This is the basis for a grouped subquery in the DELETE statement you finally want. Personnel with less than 40 points will return a NULL, and the test will fail.

DELETE FROM Personnel WHERE emp_id = (SELECT A1.emp_id FROM Absenteeism AS A1 WHERE A1.emp_id = Personnel.emp_id GROUP BY A1.emp_id HAVING SUM(severity_points) >= 40);

The GROUP BY clause is not really needed in SQL-92, but some older SQL implementations will require it.

Answer #2

Bert Scalzo, a senior instructor for Oracle Corporation, pointed out that the puzzle solution had two flaws and room for performance improvements.

The flaws are quite simple. First, the subquery does not check for personnel accruing 40 or more severity points within a one-year period, as required. It requires the addition of a date range check in the WHERE clause:

DELETE FROM Personnel WHERE emp_id = (SELECT A1.emp_id FROM Absenteeism AS A1 WHERE A1.emp_id = Personnel.emp_id AND absent_date BETWEEN CURRENT_TIMESTAMP - INTERVAL 365 DAYS AND CURRENT_TIMESTAMP GROUP BY A1.emp_id HAVING SUM(severity_points) >= 40);

Second, this SQL code deletes only offending personnel and not their absences. The related Absenteeism row must be either explicitly or implicitly deleted as well. You could replicate the above deletion for the Absenteeism table. However, the best solution is to add a cascading deletion clause to the Absenteeism table declaration:

CREATE TABLE Absenteeism ( ... emp_id INTEGER NOT NULL REFERENCES Personnel(emp_id) ON DELETE CASCADE, ...);

The performance suggestions are based on some assumptions. If you can safely assume that the UPDATE is run regularly and people do not change their departments while they are absent, then you can improve the UPDATE command's subquery:

UPDATE Absenteeism AS A1 SET severity_points = 0, reason_code = 'long term illness' WHERE EXISTS (SELECT * FROM absenteeism as A2 WHERE A1.emp_id = A2.emp_id AND (A1.absent_date + INTERVAL 1 DAY) = A2.absent_date);

There is still a problem with long-term illnesses that span weeks. The current situation is that if you want to spend your weekends being sick, that is fine with the company. This is not a very nice place to work. If an employee reports in absent on Friday of week number 1, all of week number 2, and just Monday of week number 3, the UPDATE will catch only the five days from week number 2 as long-term illness. The Friday and Monday will show up as sick days with severity points. The subquery in the UPDATE requires additional changes to the missed-date chaining.

I would avoid problems with weekends by having a code for scheduled days off (weekends, holidays, vacation, and so forth) that carry a severity point of zero. A business that has people working weekend shifts would need such codes.

(Continues...)

Excerpted from JOE CELKO'S SQL PUZZLES & ANSWERS by Joe Celko Copyright © 2007 by Elsevier Inc.. Excerpted by permission of MORGAN KAUFMANN PUBLISHERS. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.

PUZZLE 01: FISCAL YEAR TABLES
PUZZLE 02: ABSENTEES
PUZZLE 03: THE ANESTHESIA PUZZLE
PUZZLE 04: SECURITY BADGES
PUZZLE 05: ALPHA DATA
PUZZLE 06: HOTEL RESERVATIONS
PUZZLE 07: KEEPING A PORTFOLIO
PUZZLE 08: SCHEDULING PRINTERS
PUZZLE 09: AVAILABLE SEATS PUZZLE
PUZZLE 10: WAGES OF SIN:

PUZZLE 11: WORK ORDER
PUZZLE 12: CLAIMS STATUS
PUZZLE 13: TEACHERS
PUZZLE 14: TELEPHONE
PUZZLE 15: FIND THE LAST TWO SALARIES
PUZZLE 16: MECHANICS
PUZZLE 17: EMPLOYMENT AGENCY
PUZZLE 18: JUNK MAIL
PUZZLE 19: TOP SALESMEN
PUZZLE 20: TEST RESULTS

PUZZLE 21: AIRPLANES AND PILOTS
PUZZLE 22: LANDLORD
PUZZLE 23: MAGAZINE
PUZZLE 24 ONE IN TEN
PUZZLE 25: MILESTONE
PUZZLE 26: DATAFLOW DIAGRAMS
PUZZLE 27: FINDING EQUAL SETS
PUZZLE 28: CALCULATE THE SINE FUNCTION
PUZZLE 29: FIND THE MODE COMPUTATION
PUZZLE 30: AVERAGE SALES WAIT

PUZZLE 31: BUYING ALL THE PRODUCTS
PUZZLE 32: COMPUTING TAXES
PUZZLE 33: COMPUTING DEPRECIATION
PUZZLE 34: CONSULTANT BILLING
PUZZLE 35: REQUISITIONS
PUZZLE 36: DOUBLE DUTY
PUZZLE 37: MOVING AVERAGE
PUZZLE 38: JOURNAL UPDATING
PUZZLE 39: INSURANCE LOSSES
PUZZLE 40: PERMUTATIONS:

PUZZLE 41: BUDGETING
PUZZLE 42: COUNTING FISH
PUZZLE 43: GRADUATION
PUZZLE 44: PAIRS OF STYLES
PUZZLE 45: PEPPERONI PIZZA
PUZZLE 46: SALES PROMOTIONS
PUZZLE 47: BLOCKS OF SEATS
PUZZLE 48: UN-GROUPING
PUZZLE 49: WIDGET COUNT
PUZZLE 50: TWO OF THREE

PUZZLE 51: BUDGET VERSUS ACTUAL
PUZZLE 52: PERSONNEL PROBLEM
PUZZLE 53: COLLAPSING A TABLE BY COLUMNS
PUZZLE 54: POTENTIAL DUPLICATES
PUZZLE 55: PLAYING THE PONIES
PUZZLE 56: HOTEL ROOM NUMBERS
PUZZLE 57: GAPS - 1
PUZZLE 58: GAPS - 2
PUZZLE 59: MERGING TIME PERIODS
PUZZLE 60: BARCODES

PUZZLE 61: SORTING A STRING
PUZZLE 62: REPORT FORMATTING
PUZZLE 63: CONTIGUOUS GROUPINGS
PUZZLE 64: BOXES
PUZZLE 65: AGE RANGES FOR PRODUCTS
PUZZLE 66: FOUR CONSECUTIVE ABSENCES
PUZZLE 67: STABLE MARRIAGES PROBLEM
PUZZLE 68: CATCHING THE NEXT BUS
PUZZLE 69: LIFO-FIFO INVENTORY
PUZZLE 70: STOCK TRENDS

PUZZLE 71: CALCULATIONS
PUZZLE 72: SCHEDULING SERVICE CALLS
PUZZLE 73: A LITTLE DATA SCRUBBING

PUZZLE 74: DERIVED TABLES OR NOT?
PUZZLE 75: FINDING A PUB

From the B&N Reads Blog

Page 1 of

Joe Celko's SQL Puzzles and Answers, Second Edition

Joe Celko's SQL Puzzles and Answers, Second Edition

eBook

eBook

Related collections and offers

Overview

Product Details

About the Author

Read an Excerpt

JOE CELKO'S SQL PUZZLES & ANSWERS

MORGAN KAUFMANN PUBLISHERS

Chapter One

Table of Contents

Customer Reviews

Related collections and offers

Overview

Product Details

About the Author

Read an Excerpt

JOE CELKO'S SQL PUZZLES & ANSWERS

MORGAN KAUFMANN PUBLISHERS

Chapter One

Table of Contents

Related Subjects

Customer Reviews