Advances in Learning Theory: Methods, Models and Applications

Hardcover

$168.00

Overview

In recent years, considerable progress has been made in understanding problems of learning and generalization. In this context, intelligence essentially means the ability to perform well on new data after a model has been learned from given data. Such problems arise in many different areas and are becoming increasingly important in applications such as bioinformatics, multimedia, computer vision and signal processing, internet search and information retrieval, data mining and text mining, finance, fraud detection, measurement systems, process control and several others. New technologies now make it possible to generate massive amounts of data containing a wealth of information that remains to be explored. Often the dimensionality of the input spaces in these novel applications is huge, as in the analysis of microarray data, where expression levels of thousands of genes must be analyzed from only a limited number of experiments. Without dimensionality reduction, classical statistical paradigms show fundamental shortcomings in this regime. Facing these challenges, new mathematical foundations and models are needed so that such data can be processed reliably. The subjects in this publication are highly interdisciplinary and relate to problems studied in neural networks, machine learning, mathematics and statistics.
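
To make the "huge dimensionality, few samples" setting above concrete, here is a minimal, self-contained Python sketch (not taken from the book; the synthetic data sizes, noise level and random seed are arbitrary choices for illustration). It fits unregularized least squares when the number of input dimensions far exceeds the number of training samples: the fit reproduces the training data essentially perfectly, yet its error on new data stays far above the noise floor, which is the kind of shortcoming that motivates the regularization and dimensionality-reduction methods treated in the chapters below.

```python
# Illustrative sketch only: ordinary (unregularized) least squares in a
# p >> n regime, on synthetic data. All sizes are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test, p = 50, 1000, 2000        # few samples, huge input dimension
w_true = np.zeros(p)
w_true[:10] = 1.0                          # only 10 inputs actually carry signal

X_train = rng.normal(size=(n_train, p))
X_test = rng.normal(size=(n_test, p))
y_train = X_train @ w_true + 0.1 * rng.normal(size=n_train)
y_test = X_test @ w_true + 0.1 * rng.normal(size=n_test)

# With p >> n the normal equations are underdetermined; take the
# minimum-norm solution via the dual n x n system (X X^T) alpha = y.
alpha = np.linalg.solve(X_train @ X_train.T, y_train)
w_hat = X_train.T @ alpha

train_mse = np.mean((X_train @ w_hat - y_train) ** 2)
test_mse = np.mean((X_test @ w_hat - y_test) ** 2)
print(f"training MSE: {train_mse:.4f}")    # essentially zero: a perfect fit
print(f"test MSE:     {test_mse:.4f}")     # far above the noise floor of 0.01
```

The dual n x n formulation used to compute the minimum-norm solution is, incidentally, the same device that makes kernel-based methods (a recurring theme in the chapters below) computationally tractable when the feature space is very high- or even infinite-dimensional.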

Product Details

ISBN-10: 1586033417
ISBN-13: 9781586033415
Publisher: SAGE Publications
Publication date: 05/01/2003
Series: NATO Science, #190
Pages: 440
Product dimensions: 6.14(w) x 9.21(h) x 1.00(d)

Table of Contents

Preface
Organizing committee
List of chapter contributors
1 An Overview of Statistical Learning Theory
1.1 Setting of the Learning Problem
1.1.1 Function estimation model
1.1.2 Problem of risk minimization
1.1.3 Three main learning problems
1.1.4 Empirical risk minimization induction principle
1.1.5 Empirical risk minimization principle and the classical methods
1.1.6 Four parts of learning theory
1.2 The Theory of Consistency of Learning Processes
1.2.1 The key theorem of the learning theory
1.2.2 The necessary and sufficient conditions for uniform convergence
1.2.3 Three milestones in learning theory
1.3 Bounds on the Rate of Convergence of the Learning Processes
1.3.1 The structure of the growth function
1.3.2 Equivalent definition of the VC dimension
1.3.3 Two important examples
1.3.4 Distribution independent bounds for the rate of convergence of learning processes
1.3.5 Problem of constructing rigorous (distribution dependent) bounds
1.4 Theory for Controlling the Generalization of Learning Machines
1.4.1 Structural risk minimization induction principle
1.5 Theory of Constructing Learning Algorithms
1.5.1 Methods of separating hyperplanes and their generalization
1.5.2 Sigmoid approximation of indicator functions and neural nets
1.5.3 The optimal separating hyperplanes
1.5.4 The support vector network
1.5.5 Why can neural networks and support vector networks generalize?
1.6 Conclusion
2 Best Choices for Regularization Parameters in Learning Theory: On the Bias-Variance Problem
2.1 Introduction
2.2 RKHS and Regularization Parameters
2.3 Estimating the Confidence
2.4 Estimating the Sample Error
2.5 Choosing the optimal [gamma]
2.6 Final Remarks
3 Cucker Smale Learning Theory in Besov Spaces
3.1 Introduction
3.2 Cucker Smale Functional and the Peetre K-Functional
3.3 Estimates for the CS-Functional in Anisotropic Besov Spaces
4 High-dimensional Approximation by Neural Networks
4.1 Introduction
4.2 Variable-basis Approximation and Optimization
4.3 Maurey-Jones-Barron's Theorem
4.4 Variation with respect to a Set of Functions
4.5 Rates of Approximate Optimization over Variable Basis Functions
4.6 Comparison with Linear Approximation
4.7 Upper Bounds on Variation
4.8 Lower Bounds on Variation
4.9 Rates of Approximation of Real-valued Boolean Functions
5 Functional Learning through Kernels
5.1 Some Questions Regarding Machine Learning
5.2 r.k.h.s Perspective
5.2.1 Positive kernels
5.2.2 r.k.h.s and learning in the literature
5.3 Three Principles on the Nature of the Hypothesis Set
5.3.1 The learning problem
5.3.2 The evaluation functional
5.3.3 Continuity of the evaluation functional
5.3.4 Important consequence
5.3.5 IR[superscript x] the set of the pointwise defined functions on x
5.4 Reproducing Kernel Hilbert Space (r.k.h.s)
5.5 Kernel and Kernel Operator
5.5.1 How to build r.k.h.s.?
5.5.2 Carleman operator and the regularization operator
5.5.3 Generalization
5.6 Reproducing Kernel Spaces (r.k.h.s)
5.6.1 Evaluation spaces
5.6.2 Reproducing kernels
5.7 Representer Theorem
5.8 Examples
5.8.1 Examples in Hilbert space
5.8.2 Other examples
5.9 Conclusion
6 Leave-one-out Error and Stability of Learning Algorithms with Applications
6.1 Introduction
6.2 General Observations about the Leave-one-out Error
6.3 Theoretical Attempts to Justify the Use of the Leave-one-out Error
6.3.1 Early work in non-parametric statistics
6.3.2 Relation to VC-theory
6.3.3 Stability
6.3.4 Stability of averaging techniques
6.4 Kernel Machines
6.4.1 Background on kernel machines
6.4.2 Leave-one-out error for the square loss
6.4.3 Bounds on the leave-one-out error and stability
6.5 The Use of the Leave-one-out Error in Other Learning Problems
6.5.1 Transduction
6.5.2 Feature selection and rescaling
6.6 Discussion
6.6.1 Sensitivity analysis, stability, and learning
6.6.2 Open problems
7 Regularized Least-Squares Classification
7.1 Introduction
7.2 The RLSC Algorithm
7.3 Previous Work
7.4 RLSC vs. SVM
7.5 Empirical Performance of RLSC
7.6 Approximations to the RLSC Algorithm
7.6.1 Low-rank approximations for RLSC
7.6.2 Nonlinear RLSC application: image classification
7.7 Leave-one-out Bounds for RLSC
8 Support Vector Machines: Least Squares Approaches and Extensions
8.1 Introduction
8.2 Least Squares SVMs for Classification and Function Estimation
8.2.1 LS-SVM classifiers and link with kernel FDA
8.2.2 Function estimation case and equivalence to a regularization network solution
8.2.3 Issues of sparseness and robustness
8.2.4 Bayesian inference of LS-SVMs and Gaussian processes
8.3 Primal-dual Formulations to Kernel PCA and CCA
8.3.1 Kernel PCA as a one-class modelling problem and a primal-dual derivation
8.3.2 A support vector machine formulation to Kernel CCA
8.4 Large Scale Methods and On-line Learning
8.4.1 Nyström method
8.4.2 Basis construction in the feature space using fixed size LS-SVM
8.5 Recurrent Networks and Control
8.6 Conclusions
9 Extension of the [nu]-SVM Range for Classification
9.1 Introduction
9.2 [nu] Support Vector Classifiers
9.3 Limitation in the Range of [nu]
9.4 Negative Margin Minimization
9.5 Extended [nu]-SVM
9.5.1 Kernelization in the dual
9.5.2 Kernelization in the primal
9.6 Experiments
9.7 Conclusions and Further Work
10 Kernel Methods for Text Processing
10.1 Introduction
10.2 Overview of Kernel Methods
10.3 From Bag of Words to Semantic Space
10.4 Vector Space Representations
10.4.1 Basic vector space model
10.4.2 Generalised vector space model
10.4.3 Semantic smoothing for vector space models
10.4.4 Latent semantic kernels
10.4.5 Semantic diffusion kernels
10.5 Learning Semantics from Cross Language Correlations
10.6 Hypertext
10.7 String Matching Kernels
10.7.1 Efficient computation of SSK
10.7.2 n-grams - a language independent approach
10.8 Conclusions
11 An Optimization Perspective on Kernel Partial Least Squares Regression
11.1 Introduction
11.2 PLS Derivation
11.2.1 PCA regression review
11.2.2 PLS analysis
11.2.3 Linear PLS
11.2.4 Final regression components
11.3 Nonlinear PLS via Kernels
11.3.1 Feature space K-PLS
11.3.2 Direct kernel partial least squares
11.4 Computational Issues in K-PLS
11.5 Comparison of Kernel Regression Methods
11.5.1 Methods
11.5.2 Benchmark cases
11.5.3 Data preparation and parameter tuning
11.5.4 Results and discussion
11.6 Case Study for Classification with Uneven Classes
11.7 Feature Selection with K-PLS
11.8 Thoughts and Conclusions
12 Multiclass Learning with Output Codes
12.1 Introduction
12.2 Margin-based Learning Algorithms
12.3 Output Coding for Multiclass Problems
12.4 Training Error Bounds
12.5 Finding Good Output Codes
12.6 Conclusions
13 Bayesian Regression and Classification
13.1 Introduction
13.1.1 Least squares regression
13.1.2 Regularization
13.1.3 Probabilistic models
13.1.4 Bayesian regression
13.2 Support Vector Machines
13.3 The Relevance Vector Machine
13.3.1 Model specification
13.3.2 The effective prior
13.3.3 Inference
13.3.4 Making predictions
13.3.5 Properties of the marginal likelihood
13.3.6 Hyperparameter optimization
13.3.7 Relevance vector machines for classification
13.4 The Relevance Vector Machine in Action
13.4.1 Illustrative synthetic data: regression
13.4.2 Illustrative synthetic data: classification
13.4.3 Benchmark results
13.5 Discussion
14 Bayesian Field Theory: from Likelihood Fields to Hyperfields
14.1 Introduction
14.2 The Bayesian framework
14.2.1 The basic probabilistic model
14.2.2 Bayesian decision theory and predictive density
14.2.3 Bayes' theorem: from prior and likelihood to the posterior
14.3 Likelihood models
14.3.1 Log-probabilities, energies, and density estimation
14.3.2 Regression
14.3.3 Inverse quantum theory
14.4 Prior models
14.4.1 Gaussian prior factors and approximate symmetries
14.4.2 Hyperparameters and hyperfields
14.4.3 Hyperpriors for hyperfields
14.4.4 Auxiliary fields
14.5 Summary
15 Bayesian Smoothing and Information Geometry
15.1 Introduction
15.2 Problem Statement
15.3 Probability-Based Inference
15.4 Information-Based Inference
15.5 Single-Case Geometry
15.6 Average-Case Geometry
15.7 Similar-Case Modeling
15.8 Locally Weighted Geometry
15.9 Concluding Remarks
16 Nonparametric Prediction
16.1 Introduction
16.2 Prediction for Squared Error
16.3 Prediction for 0-1 Loss: Pattern Recognition
16.4 Prediction for Log Utility: Portfolio Selection
17 Recent Advances in Statistical Learning Theory
17.1 Introduction
17.2 Problem Formulations
17.2.1 Uniform convergence of empirical means
17.2.2 Probably approximately correct learning
17.3 Summary of "Classical" Results
17.3.1 Fixed distribution case
17.3.2 Distribution-free case
17.4 Recent Advances
17.4.1 Intermediate families of probability measures
17.4.2 Learning with prior information
17.5 Learning with Dependent Inputs
17.5.1 Problem formulations
17.5.2 Definition of [beta]-mixing
17.5.3 UCEM and PAC learning with [beta]-mixing inputs
17.6 Applications to Learning with Inputs Generated by a Markov Chain
17.7 Conclusions
18 Neural Networks in Measurement Systems (an engineering view)
18.1 Introduction
18.2 Measurement and Modeling
18.3 Neural Networks
18.4 Support Vector Machines
18.5 The Nature of Knowledge, Prior Information
18.6 Questions Concerning Implementation
18.7 Conclusions
List of participants
Subject Index
Author Index