Database Modeling and Design: Logical Design

Database Modeling and Design: Logical Design

by Toby J. Teorey, Sam S. Lightstone, Tom Nadeau, H.V. Jagadish
     
 

View All Available Formats & Editions

Database Modeling and Design focuses on techniques for database design in relational database systems. The book discusses the entity-relationship approach and the unified modeling language approach; both are used throughout the book when it refers to logical database design for the specification of data requirements and conceptual modeling.
The book starts with a… See more details below

Overview

Database Modeling and Design focuses on techniques for database design in relational database systems. The book discusses the entity-relationship approach and the unified modeling language approach; both are used throughout the book when it refers to logical database design for the specification of data requirements and conceptual modeling.
The book starts with a description of the life cycle of databases, and it goes on to discuss the entity-relationship model and unified modeling languages. It demonstrates how data modeling concepts are used in the process of database design. The book covers database normalization, including information about equivalence on the function of the entity-relationship and unified modeling language conceptual models, and the relational model of the Boyce Codd normal form.
The book explores the difference between object-oriented database systems and relational database systems. It includes a description on handling the impedance mismatch problem by the extensions made to relational systems. Web technologies, including an overview and specific database design issues on XML, are also covered in this book. Lastly, IBMs Rational Data Architect, Computer Associates AllFusion ERwin Data Modeler, and Sysbases PowerDesigner are discussed, and their uses for handling complex data modeling problems are demonstrated.
This book will serve as a guide for both novice and experienced professional database practitioners.

  • In-depth detail and plenty of real-world, practical examples throughout
  • Loaded with design rules and illustrative case studies that are applicable to any SQL, UML, or XML-based system
  • Immediately useful to anyone tasked with the creation of data models for the integration of large-scale enterprise data.

Read More

Editorial Reviews

From the Publisher
"An explicit presentation on Business Intelligence."
- P. Pichappan, Department of Information Science, Annamalai University, India

Product Details

ISBN-13:
9780123820211
Publisher:
Elsevier Science
Publication date:
02/10/2011
Series:
Morgan Kaufmann Series in Data Management Systems
Sold by:
Barnes & Noble
Format:
NOOK Book
Pages:
352
File size:
6 MB

Meet the Author

Toby J. Teorey is a professor in the Electrical Engineering and Computer Science Department at the University of Michigan, Ann Arbor. He received his B.S. and M.S. degrees in electrical engineering from the University of Arizona, Tucson, and a Ph.D. in computer sciences from the University of Wisconsin, Madison. He was general chair of the 1981 ACM SIGMOD Conference and program chair for the 1991 Entity-Relationship Conference. Professor Teorey’s current research focuses on database design and data warehousing, OLAP, advanced database systems, and performance of computer networks. He is a member of the ACM and the IEEE Computer Society.
Sam Lightstone is a Senior Technical Staff Member and Development Manager with IBM’s DB2 product development team. His work includes numerous topics in autonomic computing and relational database management systems. He is cofounder and leader of DB2’s autonomic computing R&D effort. He is Chair of the IEEE Data Engineering Workgroup on Self Managing Database Systems and a member of the IEEE Computer Society Task Force on Autonomous and Autonomic Computing. In 2003 he was elected to the Canadian Technical Excellence Council, the Canadian affiliate of the IBM Academy of Technology. He is an IBM Master Inventor with over 25 patents and patents pending; he has published widely on autonomic computing for relational database systems. He has been with IBM since 1991.
Tom Nadeau is the founder of Aladdin Software (aladdinsoftware.com) and works in the area of data and text mining. He received his B.S. degree in computer science and M.S. and Ph.D. degrees in electrical engineering and computer science from the University of Michigan, Ann Arbor. His technical interests include data warehousing, OLAP, data mining and machine learning. He won the best paper award at the 2001 IBM CASCON Conference.
H.V. Jagadish is a professor in EE and CS at the University of Michigan, Ann Arbor, where he is part of the database group affiliated with the bioinformatics program and the Center for Computational Medicine and Bioinformatics. Prior to joining the Michigan faculty, he spent over a decade at AT&T Bell Laboratories as a research scientist where he became head of the Database division.

Read More

Read an Excerpt

DATABASE MODELING AND DESIGN

Logical Design
By TOBY TEOREY SAM LIGHTSTONE TOM NADEAU H. V. JAGADISH

MORGAN KAUFMANN

Copyright © 2011 Elsevier Inc.
All right reserved.

ISBN: 978-0-12-382021-1


Chapter One

INTRODUCTION

CHAPTER OUTLINE Data and Database Management 2 Database Life Cycle 3 Conceptual Data Modeling 9 Summary 10 Tips and Insights for Database Professionals 10 Literature Summary 11

Database technology has evolved rapidly in the past three decades since the rise and eventual dominance of relational database systems. While many specialized database systems (object-oriented, spatial, multimedia, etc.) have found substantial user communities in the sciences and engineering, relational systems remain the dominant database technology for business enterprises.

Relational database design has evolved from an art to a science that has been partially implementable as a set of software design aids. Many of these design aids have appeared as the database component of computer-aided software engineering (CASE) tools, and many of them offer interactive modeling capability using a simplified data modeling approach. Logical design—that is, the structure of basic data relationships and their definition in a particular database system—is largely the domain of application designers. The work of these designers can be effectively done with tools such as the ERwin Data Modeler or Rational Rose with Unified Modeling Language (UML), as well as with a purely manual approach. Physical design—the creation of efficient data storage and retrieval mechanisms on the computing platform you are using—is typically the domain of the database administrator (DBA). Today's DBAs have a variety of vendor-supplied tools available to help design the most efficient databases. This book is devoted to the logical design methodologies and tools most popular for relational databases today. Physical design methodologies and tools are covered in a separate book.

In this chapter, we review the basic concepts of database management and introduce the role of data modeling and database design in the database life cycle.

Data and Database Management

The basic component of a file in a file system is a data item, which is the smallest named unit of data that has meaning in the real world—for example, last name, first name, street address, ID number, and political party. A group of related data items treated as a unit by an application is called a record. Examples of types of records are order, salesperson, customer, product, and department. A file is a collection of records of a single type. Database systems have built upon and expanded these definitions: In a relational database, a data item is called a column or attribute, a record is called a row or tuple, and a file is called a table.

A database is a more complex object; it is a collection of interrelated stored data that serves the needs of multiple users within one or more organizations—that is, an interrelated collection of many different types of tables. The motivation for using databases rather than files has been greater availability to a diverse set of users, integration of data for easier access and update for complex transactions, and less redundancy of data.

A database management system (DBMS) is a generalized software system for manipulating databases. A DBMS supports a logical view (schema, subschema); physical view (access methods, data clustering); data definition language; data manipulation language; and important utilities such as transaction management and concurrency control, data integrity, crash recovery, and security. Relational database systems, the dominant type of systems for well-formatted business databases, also provide a greater degree of data independence than the earlier hierarchical and network (CODASYL) database management systems. Data independence is the ability to make changes in either the logical or physical structure of the database without requiring reprogramming of application programs. It also makes database conversion and reorganization much easier. Relational DBMSs provide a much higher degree of data independence than previous systems; they are the focus of our discussion on data modeling.

Database Life Cycle

The database life cycle incorporates the basic steps involved in designing a global schema of the logical database, allocating data across a computer network, and defining local DBMS-specific schemas. Once the design is completed, the life cycle continues with database implementation and maintenance. This chapter contains an overview of the database life cycle, as shown in Figure 1.1. In succeeding chapters we will focus on the database design process from the modeling of requirements through logical design (Steps I and II below). We illustrate the result of each step of the life cycle with a series of diagrams in Figure 1.2. Each diagram shows a possible form of the output of each step so the reader can see the progression of the design process from an idea to an actual database implementation. These forms are discussed in much more detail in Chapters 2–6.

I. Requirements analysis. The database requirements are determined by interviewing both the producers and users of data and using the information to produce a formal requirements specification. That specification includes the data required for processing, the natural data relationships, and the software platform for the database implementation. As an example, Figure 1.2 (Step I) shows the concepts of products, customers, salespersons, and orders being formulated in the mind of the end user during the interview process.

II. Logical design. The global schema, a conceptual data model diagram that shows all the data and their relationships, is developed using techniques such as entity-relationship (ER) or UML. The data model constructs must be ultimately transformed into tables.

a. Conceptual data modeling. The data requirements are analyzed and modeled by using an ER or UML diagram that includes many features we will study in Chapters 2 and 3, for example, semantics for optional relationships, ternary relationships, supertypes, and subtypes (categories). Processing requirements are typically specified using natural language expressions or SQL commands along with the frequency of occurrence. Figure 1.2 (Step II.a) shows a possible ER model representation of the product/customer database in the mind of the end user.

b. View integration. Usually, when the design is large and more than one person is involved in requirements analysis, multiple views of data and relationships occur, resulting in inconsistencies due to variance in taxonomy, context, or perception. To eliminate redundancy and inconsistency from the model, these views must be "rationalized" and consolidated into a single global view. View integration requires the use of ER semantic tools such as identification of synonyms, aggregation, and generalization. In Figure 1.2 (Step II.b), two possible views of the product/customerdatabase are merged into a single global view based on common data for customer and order. View integration is also important when applications have to be integrated, and each may be written with its own view of the database. c. Transformation of the conceptual data model to SQL tables. Based on a categorization of data modeling constructs and a set of mapping rules, each relationship and its associated entities are transformed into a set of DBMS-specific candidate relational tables. We will show these transformations in standard SQL in Chapter 5. Redundant tables are eliminated as part of this process. In our example, the tables in Step II.c of Figure 1.2 are the result of transformation of the integrated ER model in Step II.b. d. Normalization of tables. Given a table (R), a set of attributes (B) is functionally dependent on another set of attributes (A) if, at each instant of time, each A value is associated with exactly one B value. Functional dependencies (FDs) are derived from the conceptual data model diagram and the semantics of data relationships in the requirements analysis. They represent the dependencies among data elements that are unique identifiers (keys) of entities. Additional FDs, which represent the dependencies between key and nonkey attributes within entities, can be derived from the requirements specification. Candidate relational tables associated with all derived FDs are normalized (i.e., modified by decomposing or splitting tables into smaller tables) using standard normalization techniques. Finally, redundancies in the data that occur in normalized candidate tables are analyzed further for possible elimination, with the constraint that data integrity must be preserved. An example of normalization of the Salesperson table into the new Salesperson and SalesVacations tables is shown in Figure 1.2 from Step II.c to Step II.d.

We note here that database tool vendors tend to use the term logical model to refer to the conceptual data model, and they use the term physical model to refer to the DBMS-specific implementation model (e.g., SQL tables). We also note that many conceptual data models are obtained not from scratch, but from the process of reverse engineering from an existing DBMS-specific schema (Silberschatz et al., 2010).

III. Physical design. The physical design step involves the selection of indexes (access methods), partitioning, and clustering of data. The logical design methodology in Step II simplifies the approach to designing large relational databases by reducing the number of data dependencies that need to be analyzed. This is accomplished by inserting the conceptual data modeling and integration steps (Steps II.a and II.b of Figure 1.2) into the traditional relational design approach. The objective of these steps is an accurate representation of reality. Data integrity is preserved through normalization of the candidate tables created when the conceptual data model is transformed into a relational model. The purpose of physical design is to then optimize performance. As part of the physical design, the global schema can sometimes be refined in limited ways to reflect processing (query and transaction) requirements if there are obvious large gains to be made in efficiency. This is called denormalization. It consists of selecting dominant processes on the basis of high frequency, high volume, or explicit priority; defining simple extensions to tables that will improve query performance; evaluating total cost for query, update, and storage; and considering the side effects, such as possible loss of integrity. This is particularly important for online analytical processing (OLAP) applications.

IV. Database implementation, monitoring, and modification. Once the design is completed, the database can be created through implementation of the formal schema using the data definition language (DDL) of a DBMS. Then the data manipulation language (DML) can be used to query and update the database, as well as to set up indexes and establish constraints, such as referential integrity. The language SQL contains both DDL and DML constructs; for example, the create table command represents DDL, and the select command represents DML. As the database begins operation, monitoring indicates whether performance requirements are being met. If they are not being satisfied, modifications should be made to improve performance. Other modifications may be necessary when requirements change or end user expectations increase with good performance. Thus, the life cycle continues with monitoring, redesign, and modifications. In the next two chapters we look first at the basic data modeling concepts; then, starting in Chapter 4, we apply these concepts to the database design process.

(Continues...)



Excerpted from DATABASE MODELING AND DESIGN by TOBY TEOREY SAM LIGHTSTONE TOM NADEAU H. V. JAGADISH Copyright © 2011 by Elsevier Inc.. Excerpted by permission of MORGAN KAUFMANN. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.

Read More

Customer Reviews

Average Review:

Write a Review

and post it to your social network

     

Most Helpful Customer Reviews

See all customer reviews >