Thoughtful Machine Learning: A Test-Driven Approach

by Matthew Kirk

Paperback

$42.99 

Overview

Learn how to apply test-driven development (TDD) to machine-learning algorithms, and catch mistakes that could sink your analysis. In this practical guide, author Matthew Kirk takes you through the principles of TDD and machine learning, and shows you how to apply TDD to several machine-learning algorithms, including naive Bayesian classifiers and neural networks.

Machine-learning algorithms often have tests baked in, but those tests can’t account for human errors in the surrounding code. Rather than blindly relying on machine-learning results, as many researchers do, you can mitigate the risk of errors with TDD and write clean, stable machine-learning code. If you’re familiar with Ruby 2.1, you’re ready to start.

  • Apply TDD to write and run tests before you start coding
  • Learn the best uses and tradeoffs of eight machine-learning algorithms
  • Use real-world examples to test each algorithm through engaging, hands-on exercises
  • Understand the similarities between TDD and the scientific method for validating solutions
  • Be aware of the risks of machine learning, such as underfitting and overfitting data
  • Explore techniques for improving your machine-learning models or data extraction
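
To give a flavor of the test-first workflow the book teaches, here is a minimal sketch (not taken from the book) using RSpec: the test is written first, then a small Minkowski distance helper, a metric Chapter 3 covers. The names minkowski_distance and minkowski.rb are illustrative assumptions, and the code sticks to methods available in Ruby 2.1.

    # minkowski_spec.rb -- written first, in TDD style; it fails until
    # minkowski.rb below exists. Run with: rspec minkowski_spec.rb
    require_relative 'minkowski'

    RSpec.describe 'minkowski_distance' do
      it 'reduces to Euclidean distance when p = 2' do
        expect(minkowski_distance([0, 0], [3, 4], 2)).to be_within(1e-9).of(5.0)
      end

      it 'reduces to Manhattan distance when p = 1' do
        expect(minkowski_distance([1, 2], [4, 6], 1)).to eq(7)
      end
    end

    # minkowski.rb -- the implementation, written only after the test exists.
    def minkowski_distance(a, b, p)
      # Sum |x - y|^p over paired coordinates, then take the p-th root.
      a.zip(b).reduce(0.0) { |acc, (x, y)| acc + (x - y).abs**p }**(1.0 / p)
    end

The point of the ordering is that the test pins down the helper's expected behavior (known Euclidean and Manhattan special cases) before any implementation detail exists, so a coding slip shows up as a red test rather than as a silently skewed analysis.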

Product Details

ISBN-13: 9781449374068
Publisher: O'Reilly Media, Incorporated
Publication date: 10/20/2014
Pages: 233
Product dimensions: 6.90(w) x 9.00(h) x 0.70(d)

About the Author

Matthew Kirk holds a B.S. in Economics and a B.S. in Applied and Computational Mathematical Sciences with a concentration in Quantitative Economics from the University of Washington. He started Modulus 7, a data science and Ruby development consulting firm, in early 2012. Matthew has spoken around the world about using machine learning and data science with Ruby.

Table of Contents

Preface ix

1 Test-Driven Machine Learning 1

History of Test-Driven Development 2

TDD and the Scientific Method 2

TDD Makes a Logical Proposition of Validity 3

TDD Involves Writing Your Assumptions Down on Paper or in Code 5

TDD and Scientific Method Work in Feedback Loops 5

Risks with Machine Learning 6

Unstable Data 6

Underfitting 6

Overfitting 8

Unpredictable Future 9

What to Test for to Reduce Risks 9

Mitigate Unstable Data with Seam Testing 9

Check Fit by Cross-Validating 10

Reduce Overfitting Risk by Testing the Speed of Training 12

Monitor for Future Shifts with Precision and Recall 13

Conclusion 13

2 A Quick Introduction to Machine Learning 15

What Is Machine Learning? 15

Supervised Learning 16

Unsupervised Learning 16

Reinforcement Learning 17

What Can Machine Learning Accomplish? 17

Mathematical Notation Used Throughout the Book 18

Conclusion 19

3 K-Nearest Neighbors Classification 21

History of K-Nearest Neighbors Classification 22

House Happiness Based on a Neighborhood 22

How Do You Pick K? 25

Guessing K 25

Heuristics for Picking K 26

Algorithms for Picking K 29

What Makes a Neighbor "Near"? 29

Minkowski Distance 30

Mahalanobis Distance 31

Determining Classes 32

Beard and Glasses Detection Using KNN and OpenCV 34

The Class Diagram 35

Raw Image to Avatar 36

The Face Class 39

The Neighborhood Class 42

Conclusion 50

4 Naive Bayesian Classification 51

Using Bayes's Theorem to Find Fraudulent Orders 51

Conditional Probabilities 52

Inverse Conditional Probability (aka Bayes's Theorem) 54

Naive Bayesian Classifier 54

The Chain Rule 55

Naivety in Bayesian Reasoning 55

Pseudocount 57

Spam Filter 58

The Class Diagram 58

Data Source 58

Email Class 59

Tokenization and Context 62

The SpamTrainer 63

Error Minimization Through Cross-Validation 70

Conclusion 74

5 Hidden Markov Models 75

Tracking User Behavior Using State Machines 75

Emissions/Observations of Underlying States 77

Simplification through the Markov Assumption 79

Using Markov Chains Instead of a Finite State Machine 79

Hidden Markov Model 80

Evaluation: Forward-Backward Algorithm 80

Using User Behavior 81

The Decoding Problem through the Viterbi Algorithm 84

The Learning Problem 85

Part-of-Speech Tagging with the Brown Corpus 85

The Seam of Our Part-of-Speech Tagger: CorpusParser 86

Writing the Part-of-Speech Tagger 88

Cross-Validating to Get Confidence in the Model 96

How to Make This Model Better 97

Conclusion 97

6 Support Vector Machines 99

Solving the Loyalty Mapping Problem 99

Derivation of SVM 101

Nonlinear Data 102

The Kernel Trick 102

Soft Margins 106

Using SVM to Determine Sentiment 108

The Class Diagram 108

Corpus Class 109

Return a Unique Set of Words from the Corpus 113

The CorpusSet Class 114

The Sentiment Classifier Class 118

Improving Results Over Time 123

Conclusion 123

7 Neural Networks 125

History of Neural Networks 125

What Is an Artificial Neural Network? 126

Input Layer 127

Hidden Layers 128

Neurons 129

Output Layer 135

Training Algorithms 135

Building Neural Networks 139

How Many Hidden Layers? 139

How Many Neurons for Each Layer? 139

Tolerance for Error and Max Epochs 140

Using a Neural Network to Classify a Language 140

Writing the Seam Test for Language 143

Cross-Validating Our Way to a Network Class 145

Tuning the Neural Network 149

Convergence Testing 149

Precision and Recall for Neural Networks 150

Wrap-Up of Example 150

Conclusion 150

8 Clustering 151

User Cohorts 152

K-Means Clustering 154

The K-Means Algorithm 154

The Downside of K-Means Clustering 155

Expectation Maximization (EM) Clustering 155

The Impossibility Theorem 157

Categorizing Music 157

Gathering the Data 158

Analyzing the Data with K-Means 159

EM Clustering 161

EM Jazz Clustering Results 165

Conclusion 166

9 Kernel Ridge Regression 167

Collaborative Filtering 167

Linear Regression Applied to Collaborative Filtering 169

Introducing Regularization, or Ridge Regression 171

Kernel Ridge Regression 173

Wrap-Up of Theory 173

Collaborative Filtering with Beer Styles 174

Data Set 174

The Tools We Will Need 174

Reviewer 177

Writing the Code to Figure Out Someone's Preference 179

Collaborative Filtering with User Preferences 182

Conclusion 182

10 Improving Models and Data Extraction 185

The Problem with the Curse of Dimensionality 185

Feature Selection 186

Feature Transformation 189

Principal Component Analysis (PCA) 192

Independent Component Analysis (ICA) 193

Monitoring Machine Learning Algorithms 195

Precision and Recall: Spam Filter 196

The Confusion Matrix 198

Mean Squared Error 198

The Wilds of Production Environments 200

Conclusion 201

11 Putting It All Together 203

Machine Learning Algorithms Revisited 203

How to Use This Information for Solving Problems 205

What's Next for You? 205

Index 207
