Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications

by Chip Huyen

Paperback

$65.99 

Overview

Machine learning systems are both complex and unique. Complex because they consist of many different components and involve many different stakeholders. Unique because they're data dependent, with data varying wildly from one use case to the next. In this book, you'll learn a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.

Author Chip Huyen, co-founder of Claypot AI, considers each design decision—such as how to process and create training data, which features to use, how often to retrain models, and what to monitor—in the context of how it can help your system as a whole achieve its objectives. The iterative framework in this book uses actual case studies backed by ample references.

This book will help you tackle scenarios such as:

  • Engineering data and choosing the right metrics to solve a business problem
  • Automating the process for continually developing, evaluating, deploying, and updating models
  • Developing a monitoring system to quickly detect and address issues your models might encounter in production
  • Architecting an ML platform that serves across use cases
  • Developing responsible ML systems

Product Details

ISBN-13: 9781098107963
Publisher: O'Reilly Media, Incorporated
Publication date: 06/21/2022
Pages: 386
Sales rank: 231,554
Product dimensions: 7.00(w) x 9.19(h) x 0.80(d)

About the Author

Chip Huyen (https://huyenchip.com) is a co-founder of Claypot AI, a platform for real-time machine learning. Through her work at NVIDIA, Netflix, and Snorkel AI, she has helped some of the world's largest organizations develop and deploy machine learning systems. She teaches CS 329S: Machine Learning Systems Design at Stanford; this book is based on the lecture notes for that course.

LinkedIn included her among Top Voices in Software Development (2019) and Top Voices in Data Science & AI (2020). She is also the author of four bestselling Vietnamese books, including the series Xach ba lo len va Di (Pack Your Bag and Go), and runs a Discord server on MLOps with over 6,000 members (https://discord.com/invite/Mw77HPrgjF).

Table of Contents

Preface ix

1 Overview of Machine Learning Systems 1

When to Use Machine Learning 3

Machine Learning Use Cases 9

Understanding Machine Learning Systems 12

Machine Learning in Research Versus in Production 12

Machine Learning Systems Versus Traditional Software 22

Summary 23

2 Introduction to Machine Learning Systems Design 25

Business and ML Objectives 26

Requirements for ML Systems 29

Reliability 29

Scalability 30

Maintainability 31

Adaptability 31

Iterative Process 32

Framing ML Problems 35

Types of ML Tasks 36

Objective Functions 40

Mind Versus Data 43

Summary 46

3 Data Engineering Fundamentals 49

Data Sources 50

Data Formats 53

JSON 54

Row-Major Versus Column-Major Format 54

Text Versus Binary Format 57

Data Models 58

Relational Model 59

NoSQL 63

Structured Versus Unstructured Data 66

Data Storage Engines and Processing 67

Transactional and Analytical Processing 67

ETL: Extract, Transform, and Load 70

Modes of Dataflow 72

Data Passing Through Databases 72

Data Passing Through Services 73

Data Passing Through Real-Time Transport 74

Batch Processing Versus Stream Processing 78

Summary 79

4 Training Data 81

Sampling 82

Nonprobability Sampling 83

Simple Random Sampling 84

Stratified Sampling 84

Weighted Sampling 85

Reservoir Sampling 86

Importance Sampling 87

Labeling 88

Hand Labels 88

Natural Labels 91

Handling the Lack of Labels 94

Class Imbalance 102

Challenges of Class Imbalance 103

Handling Class Imbalance 105

Data Augmentation 113

Simple Label-Preserving Transformations 114

Perturbation 114

Data Synthesis 116

Summary 118

5 Feature Engineering 119

Learned Features Versus Engineered Features 120

Common Feature Engineering Operations 123

Handling Missing Values 123

Scaling 126

Discretization 128

Encoding Categorical Features 129

Feature Crossing 132

Discrete and Continuous Positional Embeddings 133

Data Leakage 135

Common Causes for Data Leakage 137

Detecting Data Leakage 140

Engineering Good Features 141

Feature Importance 142

Feature Generalization 144

Summary 146

6 Model Development and Offline Evaluation 149

Model Development and Training 150

Evaluating ML Models 150

Ensembles 156

Experiment Tracking and Versioning 162

Distributed Training 168

AutoML 172

Model Offline Evaluation 178

Baselines 179

Evaluation Methods 181

Summary 188

7 Model Deployment and Prediction Service 191

Machine Learning Deployment Myths 194

Myth 1: You Only Deploy One or Two ML Models at a Time 194

Myth 2: If We Don't Do Anything, Model Performance Remains the Same 195

Myth 3: You Won't Need to Update Your Models as Much 196

Myth 4: Most ML Engineers Don't Need to Worry About Scale 196

Batch Prediction Versus Online Prediction 197

From Batch Prediction to Online Prediction 201

Unifying Batch Pipeline and Streaming Pipeline 203

Model Compression 206

Low-Rank Factorization 206

Knowledge Distillation 208

Pruning 208

Quantization 209

ML on the Cloud and on the Edge 212

Compiling and Optimizing Models for Edge Devices 214

ML in Browsers 222

Summary 223

8 Data Distribution Shifts and Monitoring 225

Causes of ML System Failures 226

Software System Failures 227

ML-Specific Failures 229

Data Distribution Shifts 237

Types of Data Distribution Shifts 237

General Data Distribution Shifts 241

Detecting Data Distribution Shifts 242

Addressing Data Distribution Shifts 248

Monitoring and Observability 250

ML-Specific Metrics 251

Monitoring Toolbox 256

Observability 259

Summary 261

9 Continual Learning and Test in Production 263

Continual Learning 264

Stateless Retraining Versus Stateful Training 265

Why Continual Learning? 268

Continual Learning Challenges 270

Four Stages of Continual Learning 274

How Often to Update Your Models 279

Test in Production 281

Shadow Deployment 282

A/B Testing 283

Canary Release 285

Interleaving Experiments 285

Bandits 287

Summary 291

10 Infrastructure and Tooling for MLOps 293

Storage and Compute 297

Public Cloud Versus Private Data Centers 300

Development Environment 302

Dev Environment Setup 303

Standardizing Dev Environments 306

From Dev to Prod: Containers 308

Resource Management 311

Cron, Schedulers, and Orchestrators 311

Data Science Workflow Management 314

ML Platform 319

Model Deployment 320

Model Store 321

Feature Store 325

Build Versus Buy 327

Summary 329

11 The Human Side of Machine Learning 331

User Experience 331

Ensuring User Experience Consistency 332

Combatting "Mostly Correct" Predictions 332

Smooth Failing 334

Team Structure 334

Cross-functional Teams Collaboration 335

End-to-End Data Scientists 335

Responsible AI 339

Irresponsible AI: Case Studies 341

A Framework for Responsible AI 347

Summary 353

Epilogue 355

Index 357
