Data Science and Machine Learning: Mathematical and Statistical Methods

"This textbook is a well-rounded, rigorous, and informative work presenting the mathematics behind modern machine learning techniques. It hits all the right notes: the choice of topics is up-to-date and perfect for a course on data science for mathematics students at the advanced undergraduate or early graduate level. This book fills a sorely-needed gap in the existing literature by not sacrificing depth for breadth, presenting proofs of major theorems and subsequent derivations, as well as providing a copious amount of Python code. I only wish a book like this had been around when I first began my journey!" -Nicholas Hoell, University of Toronto

"This is a well-written book that provides a deeper dive into data-scientific methods than many introductory texts. The writing is clear, and the text logically builds up regularization, classification, and decision trees. Compared to its probable competitors, it carves out a unique niche." -Adam Loy, Carleton College

The purpose of Data Science and Machine Learning: Mathematical and Statistical Methods is to provide an accessible, yet comprehensive textbook intended for students interested in gaining a better understanding of the mathematics and statistics that underpin the rich variety of ideas and machine learning algorithms in data science.

Key Features:

Focuses on mathematical understanding.
Presentation is self-contained, accessible, and comprehensive.
Extensive list of exercises and worked-out examples.
Many concrete algorithms with Python code.
Full color throughout.

Further resources can be found on the authors' website: https://github.com/DSML-book/Lectures

Product Details

ISBN-13:	9781000731071
Publisher:	CRC Press
Publication date:	11/20/2019
Series:	Chapman & Hall/CRC Machine Learning & Pattern Recognition
Sold by:	Barnes & Noble
Format:	eBook
Pages:	532
File size:	34 MB

About the Authors

Dirk P. Kroese, PhD, is a Professor of Mathematics and Statistics at The University of Queensland. He has published over 120 articles and five books in a wide range of areas in mathematics, statistics, data science, machine learning, and Monte Carlo methods. He is a pioneer of the well-known Cross-Entropy method, an adaptive Monte Carlo technique that is used around the world to help solve difficult estimation and optimization problems in science, engineering, and finance.

Zdravko Botev, PhD, is an Australian Mathematical Sciences Institute Lecturer in Data Science and Machine Learning with an appointment at the University of New South Wales in Sydney, Australia. He received the 2018 Christopher Heyde Medal of the Australian Academy of Science for distinguished research in the Mathematical Sciences.

Thomas Taimre, PhD, is a Senior Lecturer of Mathematics and Statistics at The University of Queensland. His research interests range from applied probability and Monte Carlo methods to applied physics and the remarkably universal self-mixing effect in lasers. He has published over 100 articles, holds a patent, and is the coauthor of Handbook of Monte Carlo Methods (Wiley).

Radislav Vaisman, PhD, is a Lecturer of Mathematics and Statistics at The University of Queensland. His research interests lie at the intersection of applied probability, machine learning, and computer science. He has published over 20 articles and two books.

Table of Contents

Preface
Notation
1. Importing, Summarizing, and Visualizing Data
  1.1 Introduction
  1.2 Structuring Features According to Type
  1.3 Summary Tables
  1.4 Summary Statistics
  1.5 Visualizing Data
    1.5.1 Plotting Qualitative Variables
    1.5.2 Plotting Quantitative Variables
    1.5.3 Data Visualization in a Bivariate Setting
  Exercises
2. Statistical Learning
  2.1 Introduction
  2.2 Supervised and Unsupervised Learning
  2.3 Training and Test Loss
  2.4 Tradeoffs in Statistical Learning
  2.5 Estimating Risk
    2.5.1 In-Sample Risk
    2.5.2 Cross-Validation
  2.6 Modeling Data
  2.7 Multivariate Normal Models
  2.8 Normal Linear Models
  2.9 Bayesian Learning
  Exercises
3. Monte Carlo Methods
  3.1 Introduction
  3.2 Monte Carlo Sampling
    3.2.1 Generating Random Numbers
    3.2.2 Simulating Random Variables
    3.2.3 Simulating Random Vectors and Processes
    3.2.4 Resampling
    3.2.5 Markov Chain Monte Carlo
  3.3 Monte Carlo Estimation
    3.3.1 Crude Monte Carlo
    3.3.2 Bootstrap Method
    3.3.3 Variance Reduction
  3.4 Monte Carlo for Optimization
    3.4.1 Simulated Annealing
    3.4.2 Cross-Entropy Method
    3.4.3 Splitting for Optimization
    3.4.4 Noisy Optimization
  Exercises
4. Unsupervised Learning
  4.1 Introduction
  4.2 Risk and Loss in Unsupervised Learning
  4.3 Expectation–Maximization (EM) Algorithm
  4.4 Empirical Distribution and Density Estimation
  4.5 Clustering via Mixture Models
    4.5.1 Mixture Models
    4.5.2 EM Algorithm for Mixture Models
  4.6 Clustering via Vector Quantization
    4.6.1 K-Means
    4.6.2 Clustering via Continuous Multiextremal Optimization
  4.7 Hierarchical Clustering
  4.8 Principal Component Analysis (PCA)
    4.8.1 Motivation: Principal Axes of an Ellipsoid
    4.8.2 PCA and Singular Value Decomposition (SVD)
  Exercises
5. Regression
  5.1 Introduction
  5.2 Linear Regression
  5.3 Analysis via Linear Models
    5.3.1 Parameter Estimation
    5.3.2 Model Selection and Prediction
    5.3.3 Cross-Validation and Predictive Residual Sum of Squares
    5.3.4 In-Sample Risk and Akaike Information Criterion
    5.3.5 Categorical Features
    5.3.6 Nested Models
    5.3.7 Coefficient of Determination
  5.4 Inference for Normal Linear Models
    5.4.1 Comparing Two Normal Linear Models
    5.4.2 Confidence and Prediction Intervals
  5.5 Nonlinear Regression Models
  5.6 Linear Models in Python
    5.6.1 Modeling
    5.6.2 Analysis
    5.6.3 Analysis of Variance (ANOVA)
    5.6.4 Confidence and Prediction Intervals
    5.6.5 Model Validation
    5.6.6 Variable Selection
  5.7 Generalized Linear Models
  Exercises
6. Regularization and Kernel Methods
  6.1 Introduction
  6.2 Regularization
  6.3 Reproducing Kernel Hilbert Spaces
  6.4 Construction of Reproducing Kernels
    6.4.1 Reproducing Kernels via Feature Mapping
    6.4.2 Kernels from Characteristic Functions
    6.4.3 Reproducing Kernels Using Orthonormal Features
    6.4.4 Kernels from Kernels
  6.5 Representer Theorem
  6.6 Smoothing Cubic Splines
  6.7 Gaussian Process Regression
  6.8 Kernel PCA
  Exercises
7. Classification
  7.1 Introduction
  7.2 Classification Metrics
  7.3 Classification via Bayes’ Rule
  7.4 Linear and Quadratic Discriminant Analysis
  7.5 Logistic Regression and Softmax Classification
  7.6 K-nearest Neighbors Classification
  7.7 Support Vector Machine
  7.8 Classification with Scikit-Learn
  Exercises
8. Decision Trees and Ensemble Methods
  8.1 Introduction
  8.2 Top-Down Construction of Decision Trees
    8.2.1 Regional Prediction Functions
    8.2.2 Splitting Rules
    8.2.3 Termination Criterion
    8.2.4 Basic Implementation
  8.3 Additional Considerations
    8.3.1 Binary Versus Non-Binary Trees
    8.3.2 Data Preprocessing
    8.3.3 Alternative Splitting Rules
    8.3.4 Categorical Variables
    8.3.5 Missing Values
  8.4 Controlling the Tree Shape
    8.4.1 Cost-Complexity Pruning
    8.4.2 Advantages and Limitations of Decision Trees
  8.5 Bootstrap Aggregation
  8.6 Random Forests
  8.7 Boosting
  Exercises
9. Deep Learning
  9.1 Introduction
  9.2 Feed-Forward Neural Networks
  9.3 Back-Propagation
  9.4 Methods for Training
    9.4.1 Steepest Descent
    9.4.2 Levenberg–Marquardt Method
    9.4.3 Limited-Memory BFGS Method
    9.4.4 Adaptive Gradient Methods
  9.5 Examples in Python
    9.5.1 Simple Polynomial Regression
    9.5.2 Image Classification
  Exercises
A. Linear Algebra and Functional Analysis
  A.1 Vector Spaces, Bases, and Matrices
  A.2 Inner Product
  A.3 Complex Vectors and Matrices
  A.4 Orthogonal Projections
  A.5 Eigenvalues and Eigenvectors
    A.5.1 Left- and Right-Eigenvectors
  A.6 Matrix Decompositions
    A.6.1 (P)LU Decomposition
    A.6.2 Woodbury Identity
    A.6.3 Cholesky Decomposition
    A.6.4 QR Decomposition and the Gram–Schmidt Procedure
    A.6.5 Singular Value Decomposition
    A.6.6 Solving Structured Matrix Equations
  A.7 Functional Analysis
  A.8 Fourier Transforms
    A.8.1 Discrete Fourier Transform
    A.8.2 Fast Fourier Transform
B. Multivariate Differentiation and Optimization
  B.1 Multivariate Differentiation
    B.1.1 Taylor Expansion
    B.1.2 Chain Rule
  B.2 Optimization Theory
    B.2.1 Convexity and Optimization
    B.2.2 Lagrangian Method
    B.2.3 Duality
  B.3 Numerical Root-Finding and Minimization
    B.3.1 Newton-Like Methods
    B.3.2 Quasi-Newton Methods
    B.3.3 Normal Approximation Method
    B.3.4 Nonlinear Least Squares
  B.4 Constrained Minimization via Penalty Functions
C. Probability and Statistics
  C.1 Random Experiments and Probability Spaces
  C.2 Random Variables and Probability Distributions
  C.3 Expectation
  C.4 Joint Distributions
  C.5 Conditioning and Independence
    C.5.1 Conditional Probability
    C.5.2 Independence
    C.5.3 Expectation and Covariance
    C.5.4 Conditional Density and Conditional Expectation
  C.6 Functions of Random Variables
  C.7 Multivariate Normal Distribution
  C.8 Convergence of Random Variables
  C.9 Law of Large Numbers and Central Limit Theorem
  C.10 Markov Chains
  C.11 Statistics
  C.12 Estimation
    C.12.1 Method of Moments
    C.12.2 Maximum Likelihood Method
  C.13 Confidence Intervals
  C.14 Hypothesis Testing
D. Python Primer
  D.1 Getting Started
  D.2 Python Objects
  D.3 Types and Operators
  D.4 Functions and Methods
  D.5 Modules
  D.6 Flow Control
  D.7 Iteration
  D.8 Classes
  D.9 Files
  D.10 NumPy
    D.10.1 Creating and Shaping Arrays
    D.10.2 Slicing
    D.10.3 Array Operations
    D.10.4 Random Numbers
  D.11 Matplotlib
    D.11.1 Creating a Basic Plot
  D.12 Pandas
    D.12.1 Series and DataFrame
    D.12.2 Manipulating Data Frames
    D.12.3 Extracting Information
    D.12.4 Plotting
  D.13 Scikit-learn
    D.13.1 Partitioning the Data
    D.13.2 Standardization
    D.13.3 Fitting and Prediction
    D.13.4 Testing the Model
  D.14 System Calls, URL Access, and Speed-Up
Bibliography
Index

Editorial Reviews

"The first impression when handling and opening this book at a random page is superb. A big format (A4) and heavy weight, because the paper quality is high, along with a spectacular style and large font, much colour and many plots, and blocks of python code enhanced in colour boxes. This makes the book attractive and easy to study...The book is a very well-designed data science course, with mathematical rigor in mind. Key concepts are highlighted in red in the margins, often with links to other parts of the book...This book will be excellent for those that want to build a strong mathematical foundation for their knowledge on the main machine learning techniques, and at the same time get python recipes on how to perform the analyses for worked examples."
- Victor Moreno, ISCB News, December 2020
"The way the Python code was written follows the algorithm closely. This is very useful for readers who wish to understand the rationale and flow of the background knowledge. In each chapter, the authors recommend further readings for those who plan to learn advanced topics. Another useful part is that the Python implementation of different statistical learning algorithms is discussed throughout the book. At the end of each chapter, extensive exercises are designed. These exercises can help readers understand the content better. This book would be a good reference for readers who are already experienced with statistical analysis and are looking for theoretical background knowledge of the algorithms."
- Yin-Ju Lai and Chuhsing Kate Hsiao, Biometrics, vol. 77, issue 4, 2021
