Table of Contents
1 Introduction 1
1.1 A brief history 1
1.1.1 Probabilistic methods in computational linguistics 1
1.1.2 Supervised and unsupervised training 2
1.1.3 Semisupervised learning 3
1.2 Semisupervised learning 4
1.2.1 Major varieties of learning problem 4
1.2.2 Motivation 6
1.2.3 Evaluation 7
1.2.4 Active learning 8
1.3 Organization and assumptions 8
1.3.1 Leading ideas 8
1.3.2 Mathematical background 10
1.3.3 Notation 11
2 Self-training and Co-training 13
2.1 Classification 13
2.1.1 The standard setting 13
2.1.2 Features and rules 14
2.1.3 Decision lists 16
2.2 Self-training 18
2.2.1 The algorithm 19
2.2.2 Parameters and variants 20
2.2.3 Evaluation 23
2.2.4 Symmetry of features and instances 25
2.2.5 Related algorithms 27
2.3 Co-training 28
3 Applications of Self-Training and Co-Training 31
3.1 Part-of-speech tagging 31
3.2 Information extraction 33
3.3 Parsing 35
3.4 Word senses 36
3.4.1 WordNet 36
3.4.2 Word-sense disambiguation 38
3.4.3 Taxonomic inference 40
4 Classification 43
4.1 Two simple classifiers 43
4.1.1 Naive Bayes 43
4.1.2 k-nearest-neighbor classifier 45
4.2 Abstract setting 48
4.2.1 Function approximation 48
4.2.2 Defining success 50
4.2.3 Fit and simplicity 52
4.3 Evaluating detectors and classifiers that abstain 53
4.3.1 Confidence-rated classifiers 53
4.3.2 Measures for detection 54
4.3.3 Idealized performance curves 57
4.3.4 The multiclass case 59
4.4 Binary classifiers and ECOC 62
5 Mathematics for Boundary-Oriented Methods 67
5.1 Linear separators 67
5.1.1 Representing a hyperplane 67
5.1.2 Eliminating the threshold 69
5.1.3 The point-normal form 70
5.1.4 Naive Bayes decision boundary 72
5.2 The gradient 74
5.2.1 Graphs and domains 74
5.2.2 Convexity 76
5.2.3 Differentiation of vector and matrix expressions 79
5.2.4 An example: linear regression 81
5.3 Constrained optimization 83
5.3.1 Optimization 83
5.3.2 Equality constraints 84
5.3.3 Inequality constraints 87
5.3.4 The Wolfe dual 91
6 Boundary-Oriented Methods 95
6.1 The perceptron 97
6.1.1 The algorithm 97
6.1.2 An example 99
6.1.3 Convergence 100
6.1.4 The perceptron algorithm as gradient descent 101
6.2 Game self-teaching 103
6.3 Boosting 105
6.3.1 Abstention 110
6.3.2 Semisupervised boosting 111
6.3.3 Co-boosting 113
6.4 Support Vector Machines (SVMs) 114
6.4.1 The margin 114
6.4.2 Maximizing the margin 116
6.4.3 The nonseparable case 119
6.4.4 Slack in the separable case 121
6.4.5 Multiple slack points 123
6.4.6 Transductive SVMs 125
6.4.7 Training a transductive SVM 127
6.5 Null-category noise model 129
7 Clustering 131
7.1 Cluster and label 131
7.2 Clustering concepts 132
7.2.1 Objective 132
7.2.2 Distance and similarity 133
7.2.3 Graphs 136
7.3 Hierarchical clustering 137
7.4 Self-training revisited 139
7.4.1 k-means clustering 139
7.4.2 Pseudo relevance feedback 140
7.5 Graph mincut 143
7.6 Label propagation 146
7.6.1 Clustering by propagation 146
7.6.2 Self-training as propagation 147
7.6.3 Co-training as propagation 150
7.7 Bibliographic notes 152
8 Generative Models 153
8.1 Gaussian mixtures 153
8.1.1 Definition and geometric interpretation 153
8.1.2 The linear discriminant decision boundary 156
8.1.3 Decision-directed approximation 159
8.1.4 McLachlan's algorithm 162
8.2 The EM algorithm 163
8.2.1 Maximizing likelihood 163
8.2.2 Relative frequency estimation 164
8.2.3 Divergence 166
8.2.4 The EM algorithm 169
9 Agreement Constraints 175
9.1 Co-training 175
9.1.1 The conditional independence assumption 176
9.1.2 The power of conditional independence 178
9.2 Agreement-based self-teaching 182
9.3 Random fields 184
9.3.1 Applied to self-training and co-training 184
9.3.2 Gibbs sampling 186
9.3.3 Markov chains and random walks 187
9.4 Bibliographic notes 192
10 Propagation Methods 193
10.1 Label propagation 194
10.2 Random walks 196
10.3 Harmonic functions 198
10.4 Fluids 203
10.4.1 Flow 203
10.4.2 Pressure 205
10.4.3 Conservation of energy 209
10.4.4 Thomson's principle 210
10.5 Computing the solution 213
10.6 Graph mincuts revisited 215
10.7 Bibliographic notes 220
11 Mathematics for Spectral Methods 221
11.1 Some basic concepts 221
11.1.1 The norm of a vector 221
11.1.2 Matrices as linear operators 222
11.1.3 The column space 222
11.2 Eigenvalues and eigenvectors 224
11.2.1 Definition of eigenvalues and eigenvectors 224
11.2.2 Diagonalization 225
11.2.3 Orthogonal diagonalization 226
11.3 Eigenvalues and the scaling effects of a matrix 227
11.3.1 Matrix norms 227
11.3.2 The Rayleigh quotient 228
11.3.3 The 2×2 case 230
11.3.4 The general case 232
11.3.5 The Courant-Fischer minimax theorem 234
11.4 Bibliographic notes 236
12 Spectral Methods 237
12.1 Simple harmonic motion 237
12.1.1 Harmonics 237
12.1.2 Mixtures of harmonics 239
12.1.3 An oscillating particle 241
12.1.4 A vibrating string 243
12.2 Spectra of matrices and graphs 251
12.2.1 The spectrum of a matrix 252
12.2.2 Relating matrices and graphs 253
12.2.3 The Laplacian matrix and graph spectrum 256
12.3 Spectral clustering 257
12.3.1 The second smallest eigenvector of the Laplacian 257
12.3.2 The cut size and the Laplacian 259
12.3.3 Approximating cut size 260
12.3.4 Minimizing cut size 262
12.3.5 Ratiocut 263
12.4 Spectral methods for semisupervised learning 265
12.4.1 Harmonics and harmonic functions 265
12.4.2 Eigenvalues and energy 267
12.4.3 The Laplacian and random fields 268
12.4.4 Harmonic functions and the Laplacian 270
12.4.5 Using the Laplacian for regularization 272
12.4.6 Transduction to induction 274
12.5 Bibliographic notes 275
Bibliography 277
Index 301