Table of Contents
Preface ix
1 Fundamentals of Linear Algebra for Deep Learning 1
Data Structures and Operations 1
Matrix Operations 3
Vector Operations 6
Matrix-Vector Multiplication 7
The Fundamental Spaces 7
The Column Space 7
The Null Space 10
Eigenvectors and Eigenvalues 13
Summary 15
2 Fundamentals of Probability 17
Events and Probability 17
Conditional Probability 20
Random Variables 22
Expectation 24
Variance 25
Bayes' Theorem 27
Entropy, Cross Entropy, and KL Divergence 29
Continuous Probability Distributions 32
Summary 36
3 The Neural Network 39
Building Intelligent Machines 39
The Limits of Traditional Computer Programs 40
The Mechanics of Machine Learning 41
The Neuron 45
Expressing Linear Perceptrons as Neurons 47
Feed-Forward Neural Networks 48
Linear Neurons and Their Limitations 51
Sigmoid, Tanh, and ReLU Neurons 51
Softmax Output Layers 54
Summary 54
4 Training Feed-Forward Neural Networks 55
The Fast-Food Problem 55
Gradient Descent 57
The Delta Rule and Learning Rates 58
Gradient Descent with Sigmoidal Neurons 60
The Backpropagation Algorithm 61
Stochastic and Minibatch Gradient Descent 63
Test Sets, Validation Sets, and Overfitting 65
Preventing Overfitting in Deep Neural Networks 71
Summary 76
5 Implementing Neural Networks in PyTorch 77
Introduction to PyTorch 77
Installing PyTorch 77
PyTorch Tensors 78
Tensor Initialization 78
Tensor Attributes 79
Tensor Operations 80
Gradients in PyTorch 83
The PyTorch nn Module 84
PyTorch Datasets and DataLoaders 87
Building the MNIST Classifier in PyTorch 89
Summary 93
6 Beyond Gradient Descent 95
The Challenges with Gradient Descent 95
Local Minima in the Error Surfaces of Deep Networks 96
Model Identifiability 97
How Pesky Are Spurious Local Minima in Deep Networks? 98
Flat Regions in the Error Surface 101
When the Gradient Points in the Wrong Direction 104
Momentum-Based Optimization 106
A Brief View of Second-Order Methods 109
Learning Rate Adaptation 111
AdaGrad: Accumulating Historical Gradients 111
RMSProp: Exponentially Weighted Moving Average of Gradients 112
Adam: Combining Momentum and RMSProp 113
The Philosophy Behind Optimizer Selection 115
Summary 116
7 Convolutional Neural Networks 117
Neurons in Human Vision 117
The Shortcomings of Feature Selection 118
Vanilla Deep Neural Networks Don't Scale 121
Filters and Feature Maps 122
Full Description of the Convolutional Layer 127
Max Pooling 131
Full Architectural Description of Convolutional Networks 132
Closing the Loop on MNIST with Convolutional Networks 134
Image Preprocessing Pipelines Enable More Robust Models 136
Accelerating Training with Batch Normalization 137
Group Normalization for Memory-Constrained Learning Tasks 139
Building a Convolutional Network for CIFAR-10 141
Visualizing Learning in Convolutional Networks 143
Residual Learning and Skip Connections for Very Deep Networks 147
Building a Residual Network with Superhuman Vision 149
Leveraging Convolutional Filters to Replicate Artistic Styles 152
Learning Convolutional Filters for Other Problem Domains 154
Summary 155
8 Embedding and Representation Learning 157
Learning Lower-Dimensional Representations 157
Principal Component Analysis 158
Motivating the Autoencoder Architecture 160
Implementing an Autoencoder in PyTorch 161
Denoising to Force Robust Representations 171
Sparsity in Autoencoders 174
When Context Is More Informative than the Input Vector 177
The Word2Vec Framework 179
Implementing the Skip-Gram Architecture 182
Summary 188
9 Models for Sequence Analysis 189
Analyzing Variable-Length Inputs 189
Tackling seq2seq with Neural N-Grams 190
Implementing a Part-of-Speech Tagger 192
Dependency Parsing and SyntaxNet 197
Beam Search and Global Normalization 203
A Case for Stateful Deep Learning Models 206
Recurrent Neural Networks 207
The Challenges with Vanishing Gradients 210
Long Short-Term Memory Units 213
PyTorch Primitives for RNN Models 218
Implementing a Sentiment Analysis Model 219
Solving seq2seq Tasks with Recurrent Neural Networks 224
Augmenting Recurrent Networks with Attention 227
Dissecting a Neural Translation Network 230
Self-Attention and Transformers 239
Summary 242
10 Generative Models 243
Generative Adversarial Networks 244
Variational Autoencoders 249
Implementing a VAE 259
Score-Based Generative Models 264
Denoising Autoencoders and Score Matching 269
Summary 274
11 Methods in Interpretability 275
Overview 275
Decision Trees and Tree-Based Algorithms 276
Linear Regression 280
Methods for Evaluating Feature Importance 281
Permutation Feature Importance 281
Partial Dependence Plots 282
Extractive Rationalization 283
LIME 288
SHAP 292
Summary 297
12 Memory-Augmented Neural Networks 299
Neural Turing Machines 299
Attention-Based Memory Access 301
NTM Memory Addressing Mechanisms 303
Differentiable Neural Computers 307
Interference-Free Writing in DNCs 309
DNC Memory Reuse 310
Temporal Linking of DNC Writes 311
Understanding the DNC Read Head 312
The DNC Controller Network 313
Visualizing the DNC in Action 314
Implementing the DNC in PyTorch 317
Teaching a DNC to Read and Comprehend 321
Summary 323
13 Deep Reinforcement Learning 325
Deep Reinforcement Learning Masters Atari Games 325
What Is Reinforcement Learning? 326
Markov Decision Processes 328
Policy 329
Future Return 330
Discounted Future Return 331
Explore Versus Exploit 331
ε-Greedy 333
Annealed ε-Greedy 333
Policy Versus Value Learning 334
Pole-Cart with Policy Gradients 335
OpenAI Gym 335
Creating an Agent 335
Building the Model and Optimizer 337
Sampling Actions 337
Keeping Track of History 337
Policy Gradient Main Function 338
PGAgent Performance on Pole-Cart 340
Trust-Region Policy Optimization 341
Proximal Policy Optimization 345
Q-Learning and Deep Q-Networks 347
The Bellman Equation 347
Issues with Value Iteration 348
Approximating the Q-Function 348
Deep Q-Network 348
Training DQN 349
Learning Stability 349
Target Q-Network 350
Experience Replay 350
From Q-Function to Policy 350
DQN and the Markov Assumption 351
DQN's Solution to the Markov Assumption 351
Playing Breakout with DQN 351
Building Our Architecture 354
Stacking Frames 354
Setting Up Training Operations 354
Updating Our Target Q-Network 354
Implementing Experience Replay 355
DQN Main Loop 356
DQNAgent Results on Breakout 358
Improving and Moving Beyond DQN 358
Deep Recurrent Q-Networks 359
Asynchronous Advantage Actor-Critic Agent 359
UNsupervised REinforcement and Auxiliary Learning 360
Summary 361
Index 363