Data Mining and Business Analytics with R / Edition 1

by Johannes Ledolter

ISBN-10: 111844714X
ISBN-13: 9781118447147
Pub. Date: 06/17/2013
Publisher: Wiley
Original price: $129.00


Product Details

ISBN-13: 9781118447147
Publisher: Wiley
Publication date: 06/17/2013
Edition description: New Edition
Pages: 368
Sales rank: 1,104,922
Product dimensions: 15.00(w) x 9.40(h) x 0.80(d)

About the Author

JOHANNES LEDOLTER, PhD, is Professor in both the Department of Management Sciences and the Department of Statistics and Actuarial Science at the University of Iowa. He is a Fellow of the American Statistical Association and the American Society for Quality, and an Elected Member of the International Statistical Institute. Dr. Ledolter is the coauthor of Statistical Methods for Forecasting, Achieving Quality Through Continual Improvement, and Statistical Quality Control: Strategies and Tools for Continual Improvement, all published by Wiley.


Table of Contents

Preface ix

Acknowledgments xi

1. Introduction 1

Reference 6

2. Processing the Information and Getting to Know Your Data 7

2.1 Example 1: 2006 Birth Data 7

2.2 Example 2: Alumni Donations 17

2.3 Example 3: Orange Juice 31

References 39

3. Standard Linear Regression 40

3.1 Estimation in R 43

3.2 Example 1: Fuel Efficiency of Automobiles 43

3.3 Example 2: Toyota Used-Car Prices 47

Appendix 3.A The Effects of Model Overfitting on the Average Mean Square Error of the Regression Prediction 53

References 54

4. Local Polynomial Regression: A Nonparametric Regression Approach 55

4.1 Model Selection 56

4.2 Application to Density Estimation and the Smoothing of Histograms 58

4.3 Extension to the Multiple Regression Model 58

4.4 Examples and Software 58

References 65

5. Importance of Parsimony in Statistical Modeling 67

5.1 How Do We Guard Against False Discovery? 67

References 70

6. Penalty-Based Variable Selection in Regression Models with Many Parameters (LASSO) 71

6.1 Example 1: Prostate Cancer 74

6.2 Example 2: Orange Juice 78

References 82

7. Logistic Regression 83

7.1 Building a Linear Model for Binary Response Data 83

7.2 Interpretation of the Regression Coefficients in a Logistic Regression Model 85

7.3 Statistical Inference 85

7.4 Classification of New Cases 86

7.5 Estimation in R 87

7.6 Example 1: Death Penalty Data 87

7.7 Example 2: Delayed Airplanes 92

7.8 Example 3: Loan Acceptance 100

7.9 Example 4: German Credit Data 103

References 107

8. Binary Classification, Probabilities, and Evaluating Classification Performance 108

8.1 Binary Classification 108

8.2 Using Probabilities to Make Decisions 108

8.3 Sensitivity and Specificity 109

8.4 Example: German Credit Data 109

9. Classification Using a Nearest Neighbor Analysis 115

9.1 The k-Nearest Neighbor Algorithm 116

9.2 Example 1: Forensic Glass 117

9.3 Example 2: German Credit Data 122

Reference 125

10. The Naïve Bayesian Analysis: A Model for Predicting a Categorical Response from Mostly Categorical Predictor Variables 126

10.1 Example: Delayed Airplanes 127

Reference 131

11. Multinomial Logistic Regression 132

11.1 Computer Software 134

11.2 Example 1: Forensic Glass 134

11.3 Example 2: Forensic Glass Revisited 141

Appendix 11.A Specification of a Simple Triplet Matrix 147

References 149

12. More on Classification and a Discussion on Discriminant Analysis 150

12.1 Fisher’s Linear Discriminant Function 153

12.2 Example 1: German Credit Data 154

12.3 Example 2: Fisher Iris Data 156

12.4 Example 3: Forensic Glass Data 157

12.5 Example 4: MBA Admission Data 159

Reference 160

13. Decision Trees 161

13.1 Example 1: Prostate Cancer 167

13.2 Example 2: Motorcycle Acceleration 179

13.3 Example 3: Fisher Iris Data Revisited 182

14. Further Discussion on Regression and Classification Trees, Computer Software, and Other Useful Classification Methods 185

14.1 R Packages for Tree Construction 185

14.2 Chi-Square Automatic Interaction Detection (CHAID) 186

14.3 Ensemble Methods: Bagging, Boosting, and Random Forests 188

14.4 Support Vector Machines (SVM) 192

14.5 Neural Networks 192

14.6 The R Package Rattle: A Useful Graphical User Interface for Data Mining 193

References 195

15. Clustering 196

15.1 k-Means Clustering 196

15.2 Another Way to Look at Clustering: Applying the Expectation-Maximization (EM) Algorithm to Mixtures of Normal Distributions 204

15.3 Hierarchical Clustering Procedures 212

References 219

16. Market Basket Analysis: Association Rules and Lift 220

16.1 Example 1: Online Radio 222

16.2 Example 2: Predicting Income 227

References 234

17. Dimension Reduction: Factor Models and Principal Components 235

17.1 Example 1: European Protein Consumption 238

17.2 Example 2: Monthly US Unemployment Rates 243

18. Reducing the Dimension in Regressions with Multicollinear Inputs: Principal Components Regression and Partial Least Squares 247

18.1 Three Examples 249

References 257

19. Text as Data: Text Mining and Sentiment Analysis 258

19.1 Inverse Multinomial Logistic Regression 259

19.2 Example 1: Restaurant Reviews 261

19.3 Example 2: Political Sentiment 266

Appendix 19.A Relationship Between the Gentzkow Shapiro Estimate of “Slant” and Partial Least Squares 268

References 271

20. Network Data 272

20.1 Example 1: Marriage and Power in Fifteenth Century Florence 274

20.2 Example 2: Connections in a Friendship Network 278

References 292

Appendix A: Exercises 293

Exercise 1 294

Exercise 2 294

Exercise 3 296

Exercise 4 298

Exercise 5 299

Exercise 6 300

Exercise 7 301

Appendix B: References 338

Index 341

Customer Reviews


Rated 2 out of 5 based on 1 review.
loyolanaveenyonnex 8 months ago
Big data, big lies, big theft. American-led, NATO ("North America Technology Orangutans") liars: can we even understand the data they collect? They just want to do research at other people's expense. A few good lies, and the few dirty men behind the lies. Singularity: a robot better than humans? Oh sure, the same people who built the Y2K bug and then looted $200 billion by the year 2000 will now develop a robot to do our work? Watson and IBM: Watson has not earned any revenue. Tesla Inc., Giga Factory, Nevada, USA: Mr. Elon Musk, a customer of BNY Mellon, is no legal heir of Nikola Tesla. He is getting Panasonic of Japan to build the batteries, and all he does is talk and enjoy resorts with his latest wife. Check a Tesla car and you will see the Li-ion battery is made by Panasonic. Elon is like school dropout Larry Ellison of Oracle, good at PowerPoint presentations at Oracle World: one world and unlimited looting ideas! I could give a few more, but I leave it to the great IIT and MIT students to figure out the scams: quantum computing, 5G.