Instance Selection and Construction for Data Mining / Edition 1

The ability to analyze and understand massive data sets lags far behind the ability to gather and store them. To meet this challenge, knowledge discovery and data mining (KDD) is growing rapidly as a field. However, no matter how powerful computers are now or become in the future, KDD researchers and practitioners must consider how to manage ever-growing data, a growth that is, ironically, driven by the extensive use of computers and the ease of collecting data with them. Many different approaches have been used to address this data explosion, such as scaling up algorithms and reducing data. Instance selection (also called example or tuple selection) refers to methods or algorithms that select or search for a representative portion of the data that can fulfill a KDD task as if the whole data set were used. Instance selection is directly related to data reduction and is becoming increasingly important in many KDD applications because of the need for processing efficiency, storage efficiency, or both.
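As a minimal sketch of the simplest form of instance selection, the Python below draws a simple random sample without replacement and checks that a statistic computed on the sample is close to its full-data value. The function name `select_instances`, its `fraction` and `seed` parameters, and the synthetic Gaussian data are illustrative assumptions; estimating a mean stands in for a real KDD task.

```python
import random
import statistics


def select_instances(data, fraction, seed=0):
    """Return a simple random sample (without replacement) of `data`.

    Hypothetical helper: `fraction` is the share of instances to keep and
    `seed` makes the selection reproducible.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(data) * fraction))
    rng = random.Random(seed)
    return rng.sample(data, k)  # sample() never picks the same position twice


# Assumed synthetic data set: 100,000 noisy measurements centred on 50.
rng = random.Random(42)
full = [rng.gauss(50.0, 10.0) for _ in range(100_000)]

subset = select_instances(full, fraction=0.01)
print(len(subset), "instances kept out of", len(full))
print("full-data mean:", round(statistics.fmean(full), 2))
print("sample mean:   ", round(statistics.fmean(subset), 2))
```

With 1% of the data, the sample mean typically lands within a fraction of a unit of the full-data mean, which is the sense in which a sample can serve "as if the whole data set were used" for this particular task.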
One of the major means of instance selection is sampling, in which a sample is selected for testing and analysis and randomness is a key element of the process. Instance selection also covers methods that require search. Examples include density estimation (finding the representative instances, i.e., data points, for a cluster); boundary hunting (finding the critical instances that form boundaries separating data points of different classes); and data squashing (producing new, weighted data with equivalent sufficient statistics). Other important issues related to instance selection extend to unwanted precision, focusing, concept drift, noise and outlier removal, data smoothing, and so on.
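To illustrate boundary hunting, the sketch below implements Hart's condensed nearest neighbor rule, one well-known search-based method that keeps only the instances a 1-nearest-neighbor classifier needs to separate the classes. It is offered as a generic example, not as the method of any particular chapter; the helper names `nearest_label` and `condensed_nearest_neighbor` and the toy one-dimensional data are assumptions.

```python
import math


def nearest_label(point, store):
    """Label of the stored instance closest to `point` (Euclidean distance)."""
    best = min(store, key=lambda inst: math.dist(point, inst[0]))
    return best[1]


def condensed_nearest_neighbor(instances):
    """Hart-style condensing over (feature_tuple, label) pairs.

    Starts from the first instance and adds every instance that the current
    store misclassifies, repeating passes until a full pass adds nothing.
    """
    store = [instances[0]]
    changed = True
    while changed:
        changed = False
        for inst in instances:
            if inst in store:
                continue
            if nearest_label(inst[0], store) != inst[1]:
                store.append(inst)  # a misclassified instance lies near a boundary
                changed = True
    return store


# Assumed toy data: class "a" at x = 0..9, class "b" at x = 10..19.
data = [((float(x), 0.0), "a") for x in range(10)]
data += [((float(x), 0.0), "b") for x in range(10, 20)]

store = condensed_nearest_neighbor(data)
print(len(store), "of", len(data), "instances kept")
print(sorted(p[0] for p, _ in store))
```

Apart from the arbitrary seed instance, the points retained here sit near x = 10, where the two classes meet, and 1-NN on the reduced store still classifies every training instance correctly.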
Instance Selection and Construction for Data Mining brings researchers and practitioners together to report new developments and applications, to share hard-learned experiences so that others can avoid similar pitfalls, and to shed light on the future development of instance selection. This volume serves as a comprehensive reference for graduate students, practitioners, and researchers in KDD.

Editorial Reviews

For graduate students, practitioners, and researchers in the field of knowledge discovery and data mining, this reference provides an overview of recent work on preprocessing techniques that help handle vast amounts of data. Instance selection and construction are a set of techniques that reduce the quantity of data by selecting a subset and/or constructing a reduced set of data that resembles the original. This collection, which evolved from a two-year project on instance selection, discusses real-world applications, recounts programmers' experiences to help others avoid pitfalls, and sheds light on the future of the field. It covers research on sampling, boundary hunting, data squashing, and other techniques. Annotation © Book News, Inc., Portland, OR
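Instance construction can be sketched as a simple moment-matching form of data squashing: the values are grouped into bins, and each bin is replaced by two weighted pseudo-points at mean ± standard deviation, each carrying half the bin's count. This reproduces each bin's count, sum, and sum of squares, so the overall mean and (population) variance are preserved. It is a deliberately simplified illustration, not the likelihood-based data squashing (LDS) algorithm of Chapter 12; `squash`, `weighted_mean_var`, and `bin_width` are hypothetical names.

```python
import math
import random
import statistics
from collections import defaultdict


def squash(values, bin_width):
    """Replace each bin of values by two weighted pseudo-points.

    Within a bin of n values with mean mu and population std sd, the points
    mu - sd and mu + sd, each weighted n/2, keep the bin's count, sum, and
    sum of squares unchanged.
    """
    bins = defaultdict(list)
    for v in values:
        bins[int(v // bin_width)].append(v)
    squashed = []
    for key in sorted(bins):
        group = bins[key]
        n = len(group)
        mu = statistics.fmean(group)
        sd = statistics.pstdev(group)
        squashed.append((mu - sd, n / 2))
        squashed.append((mu + sd, n / 2))
    return squashed


def weighted_mean_var(pairs):
    """Total weight, weighted mean, and weighted population variance."""
    total = sum(w for _, w in pairs)
    mean = sum(x * w for x, w in pairs) / total
    var = sum(w * (x - mean) ** 2 for x, w in pairs) / total
    return total, mean, var


# Assumed synthetic data: 10,000 standard-normal values.
rng = random.Random(7)
values = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

squashed = squash(values, bin_width=0.5)
total, mean_s, var_s = weighted_mean_var(squashed)
print(len(values), "->", len(squashed), "weighted pseudo-points")
print(math.isclose(mean_s, statistics.fmean(values), abs_tol=1e-9))
print(math.isclose(var_s, statistics.pvariance(values), rel_tol=1e-9))
```

Because only the first two moments are matched, a model whose sufficient statistics go beyond mean and variance would need a richer construction, which is where methods such as LDS come in.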

Table of Contents

Foreword xi
Preface xiii
Acknowledgments xv
Contributing Authors xvii
Part I Background and Foundation
1 Data Reduction via Instance Selection 3
1.1 Background 3
1.2 Major Lines of Work 7
1.3 Evaluation Issues 11
1.4 Related Work 13
1.5 Distinctive Contributions 14
1.6 Conclusion and Future Work 18
2 Sampling: Knowing Whole from Its Part 21
2.1 Introduction 21
2.2 Basics of Sampling 22
2.3 General Considerations 23
2.4 Categories of Sampling Methods 26
2.5 Choosing Sampling Methods 36
2.6 Conclusion 37
3 A Unifying View on Instance Selection 39
3.1 Introduction 39
3.2 Focusing Tasks 40
3.3 Evaluation Criteria for Instance Selection 43
3.4 A Unifying Framework for Instance Selection 45
3.5 Evaluation 49
3.6 Conclusions 52
Part II Instance Selection Methods
4 Competence Guided Instance Selection for Case-Based Reasoning 59
4.1 Introduction 59
4.2 Related Work 60
4.3 A Competence Model for CBR 64
4.4 Competence Footprinting 66
4.5 Experimental Analysis 69
4.6 Current Status 74
4.7 Conclusions 74
5 Identifying Competence-Critical Instances for Instance-Based Learners 77
5.1 Introduction 77
5.2 Defining the Problem 78
5.3 Review 82
5.4 Comparative Evaluation 89
5.5 Conclusions 91
6 Genetic-Algorithm-Based Instance and Feature Selection 95
6.1 Introduction 95
6.2 Genetic Algorithms 97
6.3 Performance Evaluation 104
6.4 Effect on Neural Networks 108
6.5 Some Variants 109
6.6 Concluding Remarks 111
7 The Landmark Model: An Instance Selection Method for Time Series Data 113
7.1 Introduction 114
7.2 The Landmark Data Model and Similarity Model 118
7.3 Data Representation 125
7.4 Conclusion 128
Part III Use of Sampling Methods
8 Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms 133
8.1 Introduction 134
8.2 General Rule Selection Problem 136
8.3 Adaptive Sampling Algorithm 138
8.4 An Application of AdaSelect 142
8.5 Concluding Remarks 149
9 Progressive Sampling 151
9.1 Introduction 152
9.2 Progressive Sampling 153
9.3 Determining an Efficient Schedule 155
9.4 Detecting Convergence 161
9.5 Adaptive Scheduling 162
9.6 Empirical Comparison of Sampling Schedules 163
9.7 Discussion 167
9.8 Conclusion 168
10 Sampling Strategy for Building Decision Trees from Very Large Databases Comprising Many Continuous Attributes 171
10.1 Introduction 171
10.2 Induction of Decision Trees 172
10.3 Local Sampling Strategies for Decision Trees 175
10.4 Experiments 182
10.5 Conclusion and Future Work 186
11 Incremental Classification Using Tree-Based Sampling for Large Data 189
11.1 Introduction 190
11.2 Related Work 192
11.3 Incremental Classification 193
11.4 Sampling for Incremental Classification 198
11.5 Empirical Results 201
Part IV Unconventional Methods
12 Instance Construction via Likelihood-Based Data Squashing 209
12.1 Introduction 210
12.2 The LDS Algorithm 213
12.3 Evaluation: Logistic Regression 215
12.4 Evaluation: Neural Networks 221
12.5 Iterative LDS 222
12.6 Discussion 224
13 Learning via Prototype Generation and Filtering 227
13.1 Introduction 228
13.2 Related Work 228
13.3 Our Proposed Algorithm 235
13.4 Empirical Evaluation 239
13.5 Conclusions and Future Work 241
14 Instance Selection Based on Hypertuples 245
14.1 Introduction 246
14.2 Definitions and Notation 247
14.3 Merging Hypertuples while Preserving Classification Structure 249
14.4 Merging Hypertuples to Maximize Density 253
14.5 Selection of Representative Instances 257
14.6 NN-Based Classification Using Representative Instances 258
14.7 Experiment 259
14.8 Summary and Conclusion 260
15 KBIS: Using Domain Knowledge to Guide Instance Selection 263
15.1 Introduction 264
15.2 Motivation 266
15.3 Methodology 267
15.4 Experimental Setup 274
15.5 Analysis and Evaluation 275
15.6 Conclusions 277
Part V Instance Selection in Model Combination
16 Instance Sampling for Boosted and Standalone Nearest Neighbor Classifiers 283
16.1 Introduction 284
16.2 Related Research 286
16.3 Sampling for a Standalone Nearest Neighbor Classifier 288
16.4 Coarse Reclassification 290
16.5 A Taxonomy of Instance Types 294
16.6 Conclusions 297
17 Prototype Selection Using Boosted Nearest-Neighbors 301
17.1 Introduction 302
17.2 From Instances to Prototypes and Weak Hypotheses 305
17.3 Experimental Results 310
17.4 Conclusion 316
18 DAGGER: Instance Selection for Combining Multiple Models Learnt from Disjoint Subsets 319
18.1 Introduction 320
18.2 Related Work 321
18.3 The DAGGER Algorithm 323
18.4 A Proof 327
18.5 The Experimental Method 329
18.6 Results 330
18.7 Discussion and Future Work 334
Part VI Applications of Instance Selection
19 Using Genetic Algorithms for Training Data Selection in RBF Networks 339
19.1 Introduction 340
19.2 Training Set Selection: A Brief Review 340
19.3 Genetic Algorithms 342
19.4 Experiments 344
19.5 A Real-World Regression Problem 348
19.6 Conclusions 354
20 An Active Learning Formulation for Instance Selection with Applications to Object Detection 357
20.1 Introduction 358
20.2 The Theoretical Formulation 359
20.3 Comparing Sample Complexity 363
20.4 Instance Selection in an Object Detection Scenario 370
20.5 Conclusion 373
21 Filtering Noisy Instances and Outliers 375
21.1 Introduction 376
21.2 Background and Related Work 377
21.3 Noise Filtering Algorithms 379
21.4 Experimental Evaluation 386
21.5 Summary and Further Work 391
22 Instance Selection Based on Support Vector Machine 395
22.1 Introduction 396
22.2 Support Vector Machines 397
22.3 Instance Discovery Based on Support Vector Machines 398
22.4 Application to the Meningoencephalitis Data Set 401
22.5 Discussion 406
22.6 Conclusions 407
Appendix Meningoencephalitis Data Set 410
Index 413