Data Mining and Business Analytics with R [NOOK Book]



NOOK Book (eBook)
Price: $71.49
List Price: $125.00 (Save 42%)


Collecting, analyzing, and extracting valuable information from a large amount of data requires easily accessible, robust, computational and analytical tools. Data Mining and Business Analytics with R utilizes the open source software R for the analysis, exploration, and simplification of large high-dimensional data sets. As a result, readers are provided with the needed guidance to model and interpret complicated data and become adept at building powerful models for prediction and classification.

Highlighting both underlying concepts and practical computational skills, Data Mining and Business Analytics with R begins with coverage of standard linear regression and the importance of parsimony in statistical modeling. The book includes important topics such as penalty-based variable selection (LASSO); logistic regression; regression and classification trees; clustering; principal components and partial least squares; and the analysis of text and network data. In addition, the book presents:

• A thorough discussion and extensive demonstration of the theory behind the most useful data mining tools

• Illustrations of how to use the outlined concepts in real-world situations

• Readily available additional data sets and related R code allowing readers to apply their own analyses to the discussed materials

• Numerous exercises to help readers with computing skills and deepen their understanding of the material

Data Mining and Business Analytics with R is an excellent graduate-level textbook for courses on data mining and business analytics. The book is also a valuable reference for practitioners who collect and analyze data in the fields of finance, operations management, marketing, and the information sciences.


Editorial Reviews

From the Publisher
“I first taught a Ph.D. level course in business applications of data mining 10 years ago. I regularly search the web, looking for business-oriented data mining books, and this is the first one I have found that is suitable for an MS in business analytics. I plan to use it. Anyone who teaches such a class and is inclined toward R should consider this text.” (Journal of the American Statistical Association, 1 January 2014)

Product Details

  • ISBN-13: 9781118572153
  • Publisher: Wiley, John & Sons, Incorporated
  • Publication date: 5/28/2013
  • Sold by: Barnes & Noble
  • Format: eBook
  • Edition number: 1
  • Pages: 368
  • Sales rank: 1,320,042
  • File size: 12 MB

Meet the Author

JOHANNES LEDOLTER, PhD, is Professor in both the Department of Management Sciences and the Department of Statistics and Actuarial Science at the University of Iowa. He is a Fellow of the American Statistical Association and the American Society for Quality, and an Elected Member of the International Statistical Institute. Dr. Ledolter is the coauthor of Statistical Methods for Forecasting, Achieving Quality Through Continual Improvement, and Statistical Quality Control: Strategies and Tools for Continual Improvement, all published by Wiley.


Table of Contents

Preface

Acknowledgments

1. Introduction

Reference

2. Processing the Information and Getting to Know Your Data

2.1 Example 1: 2006 Birth Data

2.2 Example 2: Alumni Donations

2.3 Example 3: Orange Juice

References

3. Standard Linear Regression

3.1 Estimation in R

3.2 Example 1: Fuel Efficiency of Automobiles

3.3 Example 2: Toyota Used-Car Prices

Appendix 3.A The Effects of Model Overfitting on the Average Mean Square Error of the Regression Prediction

References

4. Local Polynomial Regression: a Nonparametric Regression Approach

4.1 Model Selection

4.2 Application to Density Estimation and the Smoothing of Histograms

4.3 Extension to the Multiple Regression Model

4.4 Examples and Software

References

5. Importance of Parsimony in Statistical Modeling

5.1 How Do We Guard Against False Discovery

References

6. Penalty-Based Variable Selection in Regression Models with Many Parameters (LASSO)

6.1 Example 1: Prostate Cancer

6.2 Example 2: Orange Juice

References

7. Logistic Regression

7.1 Building a Linear Model for Binary Response Data

7.2 Interpretation of the Regression Coefficients in a Logistic Regression Model

7.3 Statistical Inference

7.4 Classification of New Cases

7.5 Estimation in R

7.6 Example 1: Death Penalty Data

7.7 Example 2: Delayed Airplanes

7.8 Example 3: Loan Acceptance

7.9 Example 4: German Credit Data

References

8. Binary Classification, Probabilities, and Evaluating Classification Performance

8.1 Binary Classification

8.2 Using Probabilities to Make Decisions

8.3 Sensitivity and Specificity

8.4 Example: German Credit Data

9. Classification Using a Nearest Neighbor Analysis

9.1 The k-Nearest Neighbor Algorithm

9.2 Example 1: Forensic Glass

9.3 Example 2: German Credit Data

Reference

10. The Naïve Bayesian Analysis: a Model for Predicting a Categorical Response from Mostly Categorical Predictor Variables

10.1 Example: Delayed Airplanes

Reference

11. Multinomial Logistic Regression

11.1 Computer Software

11.2 Example 1: Forensic Glass

11.3 Example 2: Forensic Glass Revisited

Appendix 11.A Specification of a Simple Triplet Matrix

References

12. More on Classification and a Discussion on Discriminant Analysis

12.1 Fisher’s Linear Discriminant Function

12.2 Example 1: German Credit Data

12.3 Example 2: Fisher Iris Data

12.4 Example 3: Forensic Glass Data

12.5 Example 4: MBA Admission Data

Reference

13. Decision Trees

13.1 Example 1: Prostate Cancer

13.2 Example 2: Motorcycle Acceleration

13.3 Example 3: Fisher Iris Data Revisited

14. Further Discussion on Regression and Classification Trees, Computer Software, and Other Useful Classification Methods

14.1 R Packages for Tree Construction

14.2 Chi-Square Automatic Interaction Detection (CHAID)

14.3 Ensemble Methods: Bagging, Boosting, and Random Forests

14.4 Support Vector Machines (SVM)

14.5 Neural Networks

14.6 The R Package Rattle: A Useful Graphical User Interface for Data Mining

References

15. Clustering

15.1 k-Means Clustering

15.2 Another Way to Look at Clustering: Applying the Expectation-Maximization (EM) Algorithm to Mixtures of Normal Distributions

15.3 Hierarchical Clustering Procedures

References

16. Market Basket Analysis: Association Rules and Lift

16.1 Example 1: Online Radio

16.2 Example 2: Predicting Income

References

17. Dimension Reduction: Factor Models and Principal Components

17.1 Example 1: European Protein Consumption

17.2 Example 2: Monthly US Unemployment Rates

18. Reducing the Dimension in Regressions with Multicollinear Inputs: Principal Components Regression and Partial Least Squares

18.1 Three Examples

References

19. Text as Data: Text Mining and Sentiment Analysis

19.1 Inverse Multinomial Logistic Regression

19.2 Example 1: Restaurant Reviews

19.3 Example 2: Political Sentiment

Appendix 19.A Relationship Between the Gentzkow Shapiro Estimate of “Slant” and Partial Least Squares

References

20. Network Data

20.1 Example 1: Marriage and Power in Fifteenth Century Florence

20.2 Example 2: Connections in a Friendship Network

References

Appendix A: Exercises

Exercise 1

Exercise 2

Exercise 3

Exercise 4

Exercise 5

Exercise 6

Exercise 7

Appendix B: References

Index
