Table of Contents
Preface ix
1 Test-Driven Machine Learning 1
History of Test-Driven Development 2
TDD and the Scientific Method 2
TDD Makes a Logical Proposition of Validity 3
TDD Involves Writing Your Assumptions Down on Paper or in Code 5
TDD and Scientific Method Work in Feedback Loops 5
Risks with Machine Learning 6
Unstable Data 6
Underfitting 6
Overfitting 8
Unpredictable Future 9
What to Test for to Reduce Risks 9
Mitigate Unstable Data with Seam Testing 9
Check Fit by Cross-Validating 10
Reduce Overfitting Risk by Testing the Speed of Training 12
Monitor for Future Shifts with Precision and Recall 13
Conclusion 13
2 A Quick Introduction to Machine Learning 15
What Is Machine Learning? 15
Supervised Learning 16
Unsupervised Learning 16
Reinforcement Learning 17
What Can Machine Learning Accomplish? 17
Mathematical Notation Used Throughout the Book 18
Conclusion 19
3 K-Nearest Neighbors Classification 21
History of K-Nearest Neighbors Classification 22
House Happiness Based on a Neighborhood 22
How Do You Pick K? 25
Guessing K 25
Heuristics for Picking K 26
Algorithms for Picking K 29
What Makes a Neighbor "Near"? 29
Minkowski Distance 30
Mahalanobis Distance 31
Determining Classes 32
Beard and Glasses Detection Using KNN and OpenCV 34
The Class Diagram 35
Raw Image to Avatar 36
The Face Class 39
The Neighborhood Class 42
Conclusion 50
4 Naive Bayesian Classification 51
Using Bayes's Theorem to Find Fraudulent Orders 51
Conditional Probabilities 52
Inverse Conditional Probability (aka Bayes's Theorem) 54
Naive Bayesian Classifier 54
The Chain Rule 55
Naivety in Bayesian Reasoning 55
Pseudocount 57
Spam Filter 58
The Class Diagram 58
Data Source 58
Email Class 59
Tokenization and Context 62
The SpamTrainer 63
Error Minimization Through Cross-Validation 70
Conclusion 74
5 Hidden Markov Models 75
Tracking User Behavior Using State Machines 75
Emissions/Observations of Underlying States 77
Simplification through the Markov Assumption 79
Using Markov Chains Instead of a Finite State Machine 79
Hidden Markov Model 80
Evaluation: Forward-Backward Algorithm 80
Using User Behavior 81
The Decoding Problem through the Viterbi Algorithm 84
The Learning Problem 85
Part-of-Speech Tagging with the Brown Corpus 85
The Seam of Our Part-of-Speech Tagger: CorpusParser 86
Writing the Part-of-Speech Tagger 88
Cross-Validating to Get Confidence in the Model 96
How to Make This Model Better 97
Conclusion 97
6 Support Vector Machines 99
Solving the Loyalty Mapping Problem 99
Derivation of SVM 101
Nonlinear Data 102
The Kernel Trick 102
Soft Margins 106
Using SVM to Determine Sentiment 108
The Class Diagram 108
Corpus Class 109
Return a Unique Set of Words from the Corpus 113
The CorpusSet Class 114
The Sentiment Classifier Class 118
Improving Results Over Time 123
Conclusion 123
7 Neural Networks 125
History of Neural Networks 125
What Is an Artificial Neural Network? 126
Input Layer 127
Hidden Layers 128
Neurons 129
Output Layer 135
Training Algorithms 135
Building Neural Networks 139
How Many Hidden Layers? 139
How Many Neurons for Each Layer? 139
Tolerance for Error and Max Epochs 140
Using a Neural Network to Classify a Language 140
Writing the Seam Test for Language 143
Cross-Validating Our Way to a Network Class 145
Tuning the Neural Network 149
Convergence Testing 149
Precision and Recall for Neural Networks 150
Wrap-Up of Example 150
Conclusion 150
8 Clustering 151
User Cohorts 152
K-Means Clustering 154
The K-Means Algorithm 154
The Downside of K-Means Clustering 155
Expectation Maximization (EM) Clustering 155
The Impossibility Theorem 157
Categorizing Music 157
Gathering the Data 158
Analyzing the Data with K-Means 159
EM Clustering 161
EM Jazz Clustering Results 165
Conclusion 166
9 Kernel Ridge Regression 167
Collaborative Filtering 167
Linear Regression Applied to Collaborative Filtering 169
Introducing Regularization, or Ridge Regression 171
Kernel Ridge Regression 173
Wrap-Up of Theory 173
Collaborative Filtering with Beer Styles 174
Data Set 174
The Tools We Will Need 174
Reviewer 177
Writing the Code to Figure Out Someone's Preference 179
Collaborative Filtering with User Preferences 182
Conclusion 182
10 Improving Models and Data Extraction 185
The Problem with the Curse of Dimensionality 185
Feature Selection 186
Feature Transformation 189
Principal Component Analysis (PCA) 192
Independent Component Analysis (ICA) 193
Monitoring Machine Learning Algorithms 195
Precision and Recall: Spam Filter 196
The Confusion Matrix 198
Mean Squared Error 198
The Wilds of Production Environments 200
Conclusion 201
11 Putting It All Together 203
Machine Learning Algorithms Revisited 203
How to Use This Information for Solving Problems 205
What's Next for You? 205
Index 207