Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner / Edition 1

Original price: $126.50

Product Details

ISBN-13: 9780470084854
Publisher: Wiley
Publication date: 12/11/2006
Edition description: New Edition
Pages: 298
Product dimensions: 7.32(w) x 10.12(h) x 0.79(d)

About the Author

GALIT SHMUELI, PhD, is Associate Professor of Statistics and Director of the eMarkets Research Lab in the Robert H. Smith School of Business at the University of Maryland. Dr. Shmueli is the coauthor of Statistical Methods in e-Commerce Research and Modeling Online Auctions, both published by Wiley.

NITIN R. PATEL, PhD, is Chairman and cofounder of Cytel, Inc., based in Cambridge, Massachusetts. A Fellow of the American Statistical Association, Dr. Patel has also served as a Visiting Professor at the Massachusetts Institute of Technology for over ten years.

PETER C. BRUCE is President and owner of, the leading provider of online education in statistics.

Table of Contents

Foreword
Preface
Acknowledgments
Introduction
What Is Data Mining?
Where Is Data Mining Used?
The Origins of Data Mining
The Rapid Growth of Data Mining
Why Are There So Many Different Methods?
Terminology and Notation
Road Maps to This Book
Overview of the Data Mining Process
Introduction
Core Ideas in Data Mining
Supervised and Unsupervised Learning
The Steps in Data Mining
Preliminary Steps
Building a Model: Example with Linear Regression
Using Excel for Data Mining
Problems
Data Exploration and Dimension Reduction
Introduction
Practical Considerations
House Prices in Boston
Data Summaries
Data Visualization
Correlation Analysis
Reducing the Number of Categories in Categorical Variables
Principal Components Analysis
Breakfast Cereals
Principal Components
Normalizing the Data
Using Principal Components for Classification and Prediction
Problems
Evaluating Classification and Predictive Performance
Introduction
Judging Classification Performance
Accuracy Measures
Cutoff for Classification
Performance in Unequal Importance of Classes
Asymmetric Misclassification Costs
Oversampling and Asymmetric Costs
Classification Using a Triage Strategy
Evaluating Predictive Performance
Problems
Multiple Linear Regression
Introduction
Explanatory vs. Predictive Modeling
Estimating the Regression Equation and Prediction
Example: Predicting the Price of Used Toyota Corolla Automobiles
Variable Selection in Linear Regression
Reducing the Number of Predictors
How to Reduce the Number of Predictors
Problems
Three Simple Classification Methods
Introduction
Predicting Fraudulent Financial Reporting
Predicting Delayed Flights
The Naive Rule
Naive Bayes
Conditional Probabilities and Pivot Tables
A Practical Difficulty
A Solution: Naive Bayes
Advantages and Shortcomings of the Naive Bayes Classifier
k-Nearest Neighbors
Riding Mowers
Choosing k
k-NN for a Quantitative Response
Advantages and Shortcomings of k-NN Algorithms
Problems
Classification and Regression Trees
Introduction
Classification Trees
Recursive Partitioning
Example 1: Riding Mowers
Measures of Impurity
Evaluating the Performance of a Classification Tree
Acceptance of Personal Loan
Avoiding Overfitting
Stopping Tree Growth: CHAID
Pruning the Tree
Classification Rules from Trees
Regression Trees
Prediction
Measuring Impurity
Evaluating Performance
Advantages, Weaknesses, and Extensions
Problems
Logistic Regression
Introduction
The Logistic Regression Model
Example: Acceptance of Personal Loan
Model with a Single Predictor
Estimating the Logistic Model from Data: Computing Parameter Estimates
Interpreting Results in Terms of Odds
Why Linear Regression Is Inappropriate for a Categorical Response
Evaluating Classification Performance
Variable Selection
Evaluating Goodness of Fit
Example of Complete Analysis: Predicting Delayed Flights
Data Preprocessing
Model Fitting and Estimation
Model Interpretation
Model Performance
Goodness of Fit
Variable Selection
Logistic Regression for More Than Two Classes
Ordinal Classes
Nominal Classes
Problems
Neural Nets
Introduction
Concept and Structure of a Neural Network
Fitting a Network to Data
Tiny Dataset
Computing Output of Nodes
Preprocessing the Data
Training the Model
Classifying Accident Severity
Avoiding Overfitting
Using the Output for Prediction and Classification
Required User Input
Exploring the Relationship Between Predictors and Response
Advantages and Weaknesses of Neural Networks
Problems
Discriminant Analysis
Introduction
Example 1: Riding Mowers
Example 2: Personal Loan Acceptance
Distance of an Observation from a Class
Fisher's Linear Classification Functions
Classification Performance of Discriminant Analysis
Prior Probabilities
Unequal Misclassification Costs
Classifying More Than Two Classes
Medical Dispatch to Accident Scenes
Advantages and Weaknesses
Problems
Association Rules
Introduction
Discovering Association Rules in Transaction Databases
Example 1: Synthetic Data on Purchases of Phone Faceplates
Generating Candidate Rules
The Apriori Algorithm
Selecting Strong Rules
Support and Confidence
Lift Ratio
Data Format
The Process of Rule Selection
Interpreting the Results
Statistical Significance of Rules
Example 2: Rules for Similar Book Purchases
Summary
Problems
Cluster Analysis
Introduction
Example: Public Utilities
Measuring Distance Between Two Records
Euclidean Distance
Normalizing Numerical Measurements
Other Distance Measures for Numerical Data
Distance Measures for Categorical Data
Distance Measures for Mixed Data
Measuring Distance Between Two Clusters
Hierarchical (Agglomerative) Clustering
Minimum Distance (Single Linkage)
Maximum Distance (Complete Linkage)
Group Average (Average Linkage)
Dendrograms: Displaying Clustering Process and Results
Validating Clusters
Limitations of Hierarchical Clustering
Nonhierarchical Clustering: The k-Means Algorithm
Initial Partition into k Clusters
Problems
Cases
Charles Book Club
German Credit
Tayko Software Cataloger
Segmenting Consumers of Bath Soap
Direct-Mail Fundraising
Catalog Cross-Selling
Predicting Bankruptcy
References
Index
