Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R

by Daniel D. Gutierrez

Paperback

Price: $45.57 (original price $49.95; save 9%)

Product Details

ISBN-13: 9781634620963
Publisher: Technics Publications, LLC
Publication date: 10/30/2015
Pages: 284
Product dimensions: 7.30(w) x 9.10(h) x 0.80(d)

Table of Contents

Introduction 1

How This Book is Organized 3

Intended Audience for This Book 6

What You Will Need 7

R Code and Figures 8

Going Beyond This Book 8

Contacting the Author 9

Chapter 1 Machine Learning Overview 11

Types of Machine Learning 13

Use Case Examples of Machine Learning 14

Acquire Valued Shoppers Challenge 15

Netflix 16

Algorithmic Trading Challenge 17

Heritage Health Prize 18

Marketing 19

Sales 20

Supply Chain 20

Risk Management 20

Customer Support 20

Human Resources 20

Google Flu Trends 21

Process of Machine Learning 21

Mathematics Behind Machine Learning 26

Becoming a Data Scientist 27

R Project for Statistical Computing 29

RStudio 30

Using R Packages 32

Data Sets 34

Using R in Production 35

Summary 36

Chapter 2 Data Access 37

Managing Your Working Directory 39

Types of Data Files 40

Sources of Data 41

Downloading Data Sets From the Web 41

Reading CSV Files 44

Reading Excel Files 45

Using File Connections 46

Reading JSON Files 48

Scraping Data From Websites 49

SQL Databases 51

SQL Equivalents in R 55

Reading Twitter Data 60

Reading Data From Google Analytics 62

Writing Data 66

Summary 68

Chapter 3 Data Munging 69

Feature Engineering 72

Data Pipeline 74

Data Sampling 75

Revise Variable Names 76

Create New Variables 78

Discrete Numeric Values 79

Date Handling 80

Binary Categorical Variables 83

Merge Data Sets 85

Ordering Data Sets 86

Reshape Data Sets 88

Data Manipulation Using dplyr 89

Handle Missing Data 93

Feature Scaling 95

Dimensionality Reduction 96

Summary 100

Chapter 4 Exploratory Data Analysis 103

Numeric Summaries 105

Exploratory Visualizations 108

Histograms 108

Boxplots 110

Barplots 113

Density Plots 114

Scatterplots 116

QQ-Plots 123

Heatmaps 124

Missing Value Plots 125

Expository Plots 126

Summary 127

Chapter 5 Regression 129

Simple Linear Regression 130

Multiple Linear Regression 143

Polynomial Regression 152

Summary 159

Chapter 6 Classification 161

A Simple Example 162

Logistic Regression 164

Classification Trees 169

Naive Bayes 173

K Nearest Neighbors 178

Support Vector Machines 182

Neural Networks 187

Ensembles 194

Random Forests 197

Gradient Boosting Machines 201

Summary 204

Chapter 7 Evaluating Model Performance 207

Overfitting 208

Bias and Variance 215

Confounders 218

Data Leakage 220

Measuring Regression Performance 222

Measuring Classification Performance 227

Cross Validation 230

Other Machine Learning Diagnostics 238

Get More Training Observations 239

Feature Reduction 239

Feature Addition 240

Add Polynomial Features 240

Fine Tuning the Regularization Parameter 240

Summary 240

Chapter 8 Unsupervised Learning 243

Clustering 244

Simulating Clusters 246

Hierarchical Clustering 247

K-Means Clustering 255

Principal Component Analysis 260

Summary 271

Index 273
