More About This Textbook
Overview
This book is a thorough introduction to the most important topics in data mining and machine learning. It begins with a detailed review of classical function estimation and proceeds with chapters on nonlinear regression, classification, and ensemble methods. The final chapters focus on clustering, dimension reduction, variable selection, and multiple comparisons. All these topics have undergone extraordinarily rapid development in recent years, and this treatment offers a modern perspective emphasizing the most recent contributions. The presentation of foundational results is detailed and includes many accessible proofs not readily available outside original sources. While the orientation is conceptual and theoretical, the main points are regularly reinforced by computational comparisons.
Intended primarily as a graduate-level textbook for statistics, computer science, and electrical engineering students, this book assumes only a strong foundation in undergraduate statistics and mathematics, and facility with R packages. The text has a wide variety of problems, many of an exploratory nature. There are numerous computed examples, complete with code, so that further computations can be carried out readily. The book also serves as a handbook for researchers who want a conceptual overview of the central topics in data mining and machine learning.
Table of Contents
Preface
1 Variability, Information, and Prediction
1.0.1 The Curse of Dimensionality
1.0.2 The Two Extremes
1.1 Perspectives on the Curse
1.1.1 Sparsity
1.1.2 Exploding Numbers of Models
1.1.3 Multicollinearity and Concurvity
1.1.4 The Effect of Noise
1.2 Coping with the Curse
1.2.1 Selecting Design Points
1.2.2 Local Dimension
1.2.3 Parsimony
1.3 Two Techniques
1.3.1 The Bootstrap
1.3.2 Cross-Validation
1.4 Optimization and Search
1.4.1 Univariate Search
1.4.2 Multivariate Search
1.4.3 General Searches
1.4.4 Constraint Satisfaction and Combinatorial Search
1.5 Notes
1.5.1 Hammersley Points
1.5.2 Edgeworth Expansions for the Mean
1.5.3 Bootstrap Asymptotics for the Studentized Mean
1.6 Exercises
2 Local Smoothers
2.1 Early Smoothers
2.2 Transition to Classical Smoothers
2.2.1 Global Versus Local Approximations
2.2.2 LOESS
2.3 Kernel Smoothers
2.3.1 Statistical Function Approximation
2.3.2 The Concept of Kernel Methods and the Discrete Case
2.3.3 Kernels and Stochastic Designs: Density Estimation
2.3.4 Stochastic Designs: Asymptotics for Kernel Smoothers
2.3.5 Convergence Theorems and Rates for Kernel Smoothers
2.3.6 Kernel and Bandwidth Selection
2.3.7 Linear Smoothers
2.4 Nearest Neighbors
2.5 Applications of Kernel Regression
2.5.1 A Simulated Example
2.5.2 Ethanol Data
2.6 Exercises
3 Spline Smoothing
3.1 Interpolating Splines
3.2 Natural Cubic Splines
3.3 Smoothing Splines for Regression
3.3.1 Model Selection for Spline Smoothing
3.3.2 Spline Smoothing Meets Kernel Smoothing
3.4 Asymptotic Bias, Variance, and MISE for Spline Smoothers
3.4.1 Ethanol Data Example - Continued
3.5 Splines Redux: Hilbert Space Formulation
3.5.1 Reproducing Kernels
3.5.2 Constructing an RKHS
3.5.3 Direct Sum Construction for Splines
3.5.4 Explicit Forms
3.5.5 Nonparametrics in Data Mining and Machine Learning
3.6 Simulated Comparisons
3.6.1 What Happens with Dependent Noise Models?
3.6.2 Higher Dimensions and the Curse of Dimensionality
3.7 Notes
3.7.1 Sobolev Spaces: Definition
3.8 Exercises
4 New Wave Nonparametrics
4.1 Additive Models
4.1.1 The Backfitting Algorithm
4.1.2 Concurvity and Inference
4.1.3 Nonparametric Optimality
4.2 Generalized Additive Models
4.3 Projection Pursuit Regression
4.4 Neural Networks
4.4.1 Backpropagation and Inference
4.4.2 Barron's Result and the Curse
4.4.3 Approximation Properties
4.4.4 Barron's Theorem: Formal Statement
4.5 Recursive Partitioning Regression
4.5.1 Growing Trees
4.5.2 Pruning and Selection
4.5.3 Regression
4.5.4 Bayesian Additive Regression Trees: BART
4.6 MARS
4.7 Sliced Inverse Regression
4.8 ACE and AVAS
4.9 Notes
4.9.1 Proof of Barron's Theorem
4.10 Exercises
5 Supervised Learning: Partition Methods
5.1 Multiclass Learning
5.2 Discriminant Analysis
5.2.1 Distance-Based Discriminant Analysis
5.2.2 Bayes Rules
5.2.3 Probability-Based Discriminant Analysis
5.3 Tree-Based Classifiers
5.3.1 Splitting Rules
5.3.2 Logic Trees
5.3.3 Random Forests
5.4 Support Vector Machines
5.4.1 Margins and Distances
5.4.2 Binary Classification and Risk
5.4.3 Prediction Bounds for Function Classes
5.4.4 Constructing SVM Classifiers
5.4.5 SVM Classification for Nonlinearly Separable Populations
5.4.6 SVMs in the General Nonlinear Case
5.4.7 Some Kernels Used in SVM Classification
5.4.8 Kernel Choice, SVMs and Model Selection
5.4.9 Support Vector Regression
5.4.10 Multiclass Support Vector Machines
5.5 Neural Networks
5.6 Notes
5.6.1 Hoeffding's Inequality
5.6.2 VC Dimension
5.7 Exercises
6 Alternative Nonparametrics
6.1 Ensemble Methods
6.1.1 Bayes Model Averaging
6.1.2 Bagging
6.1.3 Stacking
6.1.4 Boosting
6.1.5 Other Averaging Methods
6.1.6 Oracle Inequalities
6.2 Bayes Nonparametrics
6.2.1 Dirichlet Process Priors
6.2.2 Polya Tree Priors
6.2.3 Gaussian Process Priors
6.3 The Relevance Vector Machine
6.3.1 RVM Regression: Formal Description
6.3.2 RVM Classification
6.4 Hidden Markov Models - Sequential Classification
6.5 Notes
6.5.1 Proof of Yang's Oracle Inequality
6.5.2 Proof of Lecue's Oracle Inequality
6.6 Exercises
7 Computational Comparisons
7.1 Computational Results: Classification
7.1.1 Comparison on Fisher's Iris Data
7.1.2 Comparison on Ripley's Data
7.2 Computational Results: Regression
7.2.1 Vapnik's sinc Function
7.2.2 Friedman's Function
7.2.3 Conclusions
7.3 Systematic Simulation Study
7.4 No Free Lunch
7.5 Exercises
8 Unsupervised Learning: Clustering
8.1 Centroid-Based Clustering
8.1.1 K-Means Clustering
8.1.2 Variants
8.2 Hierarchical Clustering
8.2.1 Agglomerative Hierarchical Clustering
8.2.2 Divisive Hierarchical Clustering
8.2.3 Theory for Hierarchical Clustering
8.3 Partitional Clustering
8.3.1 Model-Based Clustering
8.3.2 Graph-Theoretic Clustering
8.3.3 Spectral Clustering
8.4 Bayesian Clustering
8.4.1 Probabilistic Clustering
8.4.2 Hypothesis Testing
8.5 Computed Examples
8.5.1 Ripley's Data
8.5.2 Iris Data
8.6 Cluster Validation
8.7 Notes
8.7.1 Derivatives of Functions of a Matrix
8.7.2 Kruskal's Algorithm: Proof
8.7.3 Prim's Algorithm: Proof
8.8 Exercises
9 Learning in High Dimensions
9.1 Principal Components
9.1.1 Main Theorem
9.1.2 Key Properties
9.1.3 Extensions
9.2 Factor Analysis
9.2.1 Finding Λ and ψ
9.2.2 Finding K
9.2.3 Estimating Factor Scores
9.3 Projection Pursuit
9.4 Independent Components Analysis
9.4.1 Main Definitions
9.4.2 Key Results
9.4.3 Computational Approach
9.5 Nonlinear PCs and ICA
9.5.1 Nonlinear PCs
9.5.2 Nonlinear ICA
9.6 Geometric Summarization
9.6.1 Measuring Distances to an Algebraic Shape
9.6.2 Principal Curves and Surfaces
9.7 Supervised Dimension Reduction: Partial Least Squares
9.7.1 Simple PLS
9.7.2 PLS Procedures
9.7.3 Properties of PLS
9.8 Supervised Dimension Reduction: Sufficient Dimensions in Regression
9.9 Visualization I: Basic Plots
9.9.1 Elementary Visualization
9.9.2 Projections
9.9.3 Time Dependence
9.10 Visualization II: Transformations
9.10.1 Chernoff Faces
9.10.2 Multidimensional Scaling
9.10.3 Self-Organizing Maps
9.11 Exercises
10 Variable Selection
10.1 Concepts from Linear Regression
10.1.1 Subset Selection
10.1.2 Variable Ranking
10.1.3 Overview
10.2 Traditional Criteria
10.2.1 Akaike Information Criterion (AIC)
10.2.2 Bayesian Information Criterion (BIC)
10.2.3 Choices of Information Criteria
10.2.4 Cross Validation
10.3 Shrinkage Methods
10.3.1 Shrinkage Methods for Linear Models
10.3.2 Grouping in Variable Selection
10.3.3 Least Angle Regression
10.3.4 Shrinkage Methods for Model Classes
10.3.5 Cautionary Notes
10.4 Bayes Variable Selection
10.4.1 Prior Specification
10.4.2 Posterior Calculation and Exploration
10.4.3 Evaluating Evidence
10.4.4 Connections Between Bayesian and Frequentist Methods
10.5 Computational Comparisons
10.5.1 The n>p Case
10.5.2 When p>n
10.6 Notes
10.6.1 Code for Generating Data in Section 10.5
10.7 Exercises
11 Multiple Testing
11.1 Analyzing the Hypothesis Testing Problem
11.1.1 A Paradigmatic Setting
11.1.2 Counts for Multiple Tests
11.1.3 Measures of Error in Multiple Testing
11.1.4 Aspects of Error Control
11.2 Controlling the Familywise Error Rate
11.2.1 One-Step Adjustments
11.2.2 Stepwise p-Value Adjustments
11.3 PCER and PFER
11.3.1 Null Domination
11.3.2 Two Procedures
11.3.3 Controlling the Type I Error Rate
11.3.4 Adjusted p-Values for PFER/PCER
11.4 Controlling the False Discovery Rate
11.4.1 FDR and Other Measures of Error
11.4.2 The Benjamini-Hochberg Procedure
11.4.3 A BH Theorem for a Dependent Setting
11.4.4 Variations on BH
11.5 Controlling the Positive False Discovery Rate
11.5.1 Bayesian Interpretations
11.5.2 Aspects of Implementation
11.6 Bayesian Multiple Testing
11.6.1 Fully Bayes: Hierarchical
11.6.2 Fully Bayes: Decision Theory
11.7 Notes
11.7.1 Proof of the Benjamini-Hochberg Theorem
11.7.2 Proof of the Benjamini-Yekutieli Theorem
References
Index