Read an Excerpt
  JOE CELKO'S SQL PUZZLES & ANSWERS 
 By Joe Celko 
 MORGAN KAUFMANN PUBLISHERS 
 Copyright © 2007   Elsevier Inc. 
All right reserved.
 ISBN: 978-0-08-049168-4 
    Chapter One 
  PUZZLE  1 FISCAL YEAR TABLES    
  Let's write some CREATE TABLE statements that are as complete as  possible. This little exercise is important because SQL is a declarative  language and you need to learn how to specify things in the database  instead of in the code.  
     The table looks like this:  
  CREATE TABLE FiscalYearTable1  (fiscal_year INTEGER,   start_date DATE,   end_date DATE);  
     It stores date ranges for determining what fiscal year any given date  belongs to. For example, the federal government runs its fiscal year from  October 1 until the end of September. The scalar subquery you would  use to do this table lookup is:  
  (SELECT F1.fiscal_year    FROM FiscalYearTable1 AS F1   WHERE outside_date BETWEEN F1.start_date AND F1.end_date)  
     Your assignment is to add all the constraints you can think of to the  table to guarantee that it contains only correct information.  
     While vendors all have different date and time functions, let's assume  that all we have is the SQL-92 temporal arithmetic and the function  EXTRACT ([YEAR | MONTH | DAY] FROM <date expression>), which  returns an integer that represents a field within a date.  
  
  Answer #1  
  1. First things first; make all the columns NOT NULL since there     is no good reason to allow them to be NULL.  
  2. Most SQL programmers immediately think in terms of adding     a PRIMARY KEY, so you might add the constraint PRIMARY     KEY (fiscal_year, start_date, end_date) because the fiscal year is     really another name for the pair (start_date, end_date). This is     not enough, because it would allow this sort of error:  
     (1995, '1994-10-01', '1995-09-30')     (1996, '1995-10-01', '1996-08-30') <== error!     (1997, '1996-10-01', '1997-09-30')     (1998, '1997-10-01', '1997-09-30')  
        You could continue along the same lines and fix some problems     by adding the constraints UNIQUE (fiscal_year),     UNIQUE (start_date), and UNIQUE (end_date), since we do     not want duplicate dates in any of those columns.  
  3. The constraint that almost everyone forgets to add because it is     so obvious is:  
        CHECK (start_date < end_date) or CHECK (start_date <=     end_date), as is appropriate.  
  4. A better way would be to use the constraint PRIMARY KEY     (fiscal_year) as before, but then since the start and end dates     are the same within each year, you could use constraints on     those column declarations:  
  CREATE TABLE FiscalYearTable1  (fiscal_year INTEGER NOT NULL PRIMARY KEY,    start_date DATE NOT NULL,    CONSTRAINT valid_start_date     CHECK ((EXTRACT (YEAR FROM start_date) = fiscal_year - 1)              AND (EXTRACT (MONTH FROM start_date) = 10)              AND CHECK (EXTRACT (DAY FROM start_date) = 01)),    end_date DATE NOT NULL,    CONSTRAINT valid_end_date      CHECK ((EXTRACT (YEAR FROM end_date) = fiscal_year)       AND (EXTRACT (MONTH FROM end_date) = 09)       AND (EXTRACT (DAY FROM end_date) = 30)));  
          You could argue for making each predicate a separate constraint       to give more detailed error messages. The predicates on       the year components of the start_date and end_date columns       also guarantee uniqueness because they are derived from the       unique fiscal year.  
  5. Unfortunately, this method does not work for all companies.     Many companies have an elaborate set of rules that involve taking     into account the weeks, weekends, and weekdays involved.  
          They do this to arrive at exactly 360 days or 52 weeks in their       accounting year. In fact, there is a fairly standard accounting       practice of using a "4 weeks, 4 weeks, 5 weeks" quarter with       some fudging at the end of the year; you can have a leftover       week between 3 and 11 days. The answer is a FiscalMonth       table along the same lines as this FiscalYears example.  
     A constraint that will work surprisingly well for such cases is:  
    CHECK ((end_date - start_date) = INTERVAL 359 DAYS)  
  where you adjust the number of days to fit your rules (i.e., 52 weeks * 7  days = 364 days). If the rules allow some variation in the size of the fiscal  year, then replace the equality test with a BETWEEN predicate.  
     Now, true confession time. When I have to load such a table in a  database, I get out my spreadsheet and build a table using the built-in  temporal functions. Spreadsheets have much better temporal functions  than databases, and there is a good chance that the accounting  department already has the fiscal calendar in a spreadsheet.  
  
  PUZZLE  2 ABSENTEES  
  This problem was presented on the MS ACCESS forum on CompuServe  by Jim Chupella. He wanted to create a database that tracks employee  absentee rates. Here is the table you will use:  
  CREATE TABLE Absenteeism  (emp_id INTEGER NOT NULL REFERENCES Personnel (emp_id),   absent_date DATE NOT NULL,   reason_code CHAR (40) NOT NULL REFERENCES ExcuseList  (reason_code),   severity_points INTEGER NOT NULL CHECK (severity_points  BETWEEN 1 AND 4),   PRIMARY KEY (emp_id, absent_date));  
     An employee ID number identifies each employee. The reason_code  is a short text explanation for the absence (for example, "hit by beer  truck," "bad hair day," and so on) that you pull from an ever-growing  and imaginative list, and severity point is a point system that scores the  penalty associated with the absence.  
     If an employee accrues 40 severity points within a one-year period,  you automatically discharge that employee. If an employee is absent  more than one day in a row, it is charged as a long-term illness, not as a  typical absence. The employee does not receive severity points on the  second, third, or later days, nor do those days count toward his or her  total absenteeism.  
     Your job is to write SQL to enforce these two business rules, changing  the schema if necessary.  
  
  Answer #1  
  Looking at the first rule on discharging personnel, the most common  design error is to try to drop the second, third, and later days from the  table. This approach messes up queries that count sick days, and makes  chains of sick days very difficult to find.  
     The trick is to allow a severity score of zero, so you can track the long-term  illness of an employee in the Absenteeism table. Simply change the  severity point declaration to "CHECK (severity_points BETWEEN 0  AND 4)" so that you can give a zero to those absences that do not count.  This is a trick newbies miss because storing a zero seems to be a waste of  space, but zero is a number and the event is a fact that needs to be noted.  
  UPDATE Absenteeism     SET severity_points= 0,         reason_code = 'long term illness'   WHERE EXISTS         (SELECT *            FROM Absenteeism AS A2           WHERE Absenteeism.emp_id = A2.emp_id            AND Absenteeism.absent_date = (A2.absent_date INTERVAL  1 DAY));  
  
     When a new row is inserted, this updatewill look for another absence  on the day before and change its severity point score and reason_code in  accordance with your first rule.  
     The second rule for firing an employee requires that you know what  his or her current point score is. You would write that query as follows:  
  SELECT emp_id, SUM(severity_points)     FROM Absenteeism   GROUP BY emp_id;  
  
     This is the basis for a grouped subquery in the DELETE statement  you finally want. Personnel with less than 40 points will return a NULL,  and the test will fail.  
  DELETE FROM Personnel   WHERE emp_id = (SELECT A1.emp_id                    FROM Absenteeism AS A1                   WHERE A1.emp_id = Personnel.emp_id                   GROUP BY A1.emp_id                  HAVING SUM(severity_points) >= 40);  
     The GROUP BY clause is not really needed in SQL-92, but some older  SQL implementations will require it.  
  
  Answer #2  
  Bert Scalzo, a senior instructor for Oracle Corporation, pointed out that  the puzzle solution had two flaws and room for performance  improvements.  
     The flaws are quite simple. First, the subquery does not check for  personnel accruing 40 or more severity points within a one-year period,  as required. It requires the addition of a date range check in the WHERE  clause:  
  DELETE FROM Personnel   WHERE emp_id = (SELECT A1.emp_id        FROM Absenteeism AS A1            WHERE A1.emp_id = Personnel.emp_id               AND absent_date               BETWEEN CURRENT_TIMESTAMP - INTERVAL 365 DAYS               AND CURRENT_TIMESTAMP               GROUP BY A1.emp_id               HAVING SUM(severity_points) >= 40);  
  
     Second, this SQL code deletes only offending personnel and not their  absences. The related Absenteeism row must be either explicitly or  implicitly deleted as well. You could replicate the above deletion for the  Absenteeism table. However, the best solution is to add a cascading  deletion clause to the Absenteeism table declaration:  
  CREATE TABLE Absenteeism    ( ... emp_id INTEGER NOT NULL                 REFERENCES Personnel(emp_id)                 ON DELETE CASCADE,      ...);  
     The performance suggestions are based on some assumptions. If you  can safely assume that the UPDATE is run regularly and people do not  change their departments while they are absent, then you can improve  the UPDATE command's subquery:  
  UPDATE Absenteeism AS A1     SET severity_points = 0,         reason_code = 'long term illness'   WHERE EXISTS     (SELECT *         FROM absenteeism as A2        WHERE A1.emp_id = A2.emp_id          AND (A1.absent_date + INTERVAL 1 DAY) =  A2.absent_date);  
     There is still a problem with long-term illnesses that span weeks. The  current situation is that if you want to spend your weekends being sick,  that is fine with the company. This is not a very nice place to work. If an  employee reports in absent on Friday of week number 1, all of week  number 2, and just Monday of week number 3, the UPDATE will catch  only the five days from week number 2 as long-term illness. The Friday  and Monday will show up as sick days with severity points. The subquery  in the UPDATE requires additional changes to the missed-date chaining.  
     I would avoid problems with weekends by having a code for  scheduled days off (weekends, holidays, vacation, and so forth) that  carry a severity point of zero. A business that has people working  weekend shifts would need such codes.  
  (Continues...)  
  
     
 
 Excerpted from JOE CELKO'S SQL PUZZLES & ANSWERS by Joe Celko  Copyright © 2007   by Elsevier Inc..   Excerpted by permission of MORGAN KAUFMANN PUBLISHERS. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.