Guide to Intelligent Data Analysis: How to Intelligently Make Sense of Real Data / Edition 1

Hardcover (Print)

Overview

Each passing year bears witness to the development of ever more powerful computers, increasingly fast and cheap storage media, and even higher bandwidth data connections. This makes it easy to believe that we can now - at least in principle - solve any problem we are faced with so long as we only have enough data.

Yet this is not the case. Although large databases allow us to retrieve many different single pieces of information and to compute simple aggregations, general patterns and regularities often go undetected. Furthermore, it is exactly these patterns, regularities, and trends that are often most valuable.

To avoid the danger of "drowning in information, but starving for knowledge," the branch of research known as data analysis has emerged, and a considerable number of methods and software tools have been developed. However, it is not these tools alone but the intelligent application of human intuition in combination with computational power, of sound background knowledge with computer-aided modeling, and of critical reflection with convenient automatic model construction, that results in successful intelligent data analysis projects. Guide to Intelligent Data Analysis provides a hands-on instructional approach to many basic data analysis techniques, and explains how these are used to solve data analysis problems.

Topics and features:

Guides the reader through the process of data analysis, following the interdependent steps of project understanding, data understanding, data preparation, modeling, and deployment and monitoring.

Equips the reader with the necessary information in order to obtain hands-on experience of the topics under discussion

Provides a review of the basics of classical statistics that support and justify many data analysis methods, and a glossary of statistical terms

Includes numerous examples using R and KNIME, together with appendices introducing the open source software

Integrates illustrations and case-study-style examples to support pedagogical exposition

Supplies further tools and information at the associated website: idaguide.net/

This practical and systematic textbook/reference for graduate and advanced undergraduate students is also essential reading for all professionals who face data analysis problems. Moreover, it is a book that remains useful long after a first reading.


Editorial Reviews

From the Publisher
From the reviews:
“The authors, leading scholars in the field based in Germany and Spain, seek to offer a hands-on instructional approach to basic data analysis techniques and consider their use in solving problems. The reader is taken through the process, following the interlinked steps of project understanding, data understanding, data preparation, modelling, and deployment and monitoring. The text reviews the basics of classical statistics that support and justify many data analysis methods, and includes a glossary of statistical terms.” (Times Higher Education, 26 May 2011)
“The clear and complete exposition of arguments, along with the attention to formalization and the balanced number of bibliographic references, make this book a bright introduction to intelligent data analysis. It is an excellent choice for graduate or advanced undergraduate courses, as well as for researchers and professionals who want to get acquainted with this field of study. … Overall, the authors hit their target producing a textbook that aids in understanding the basic processes, methods, and issues for intelligent data analysis.” (Corrado Mencar, ACM Computing Reviews, April, 2011)
“The book provides a thorough introduction to data mining that covers theoretical background as well as the use of tools (KNIME and R). The book is intended as a textbook for a broad audience from graduate and advanced undergraduate students to professional data analysts. … each chapter ends with a list of references to identify relevant research. Hence, I recommend this book as an introductory text on data analysis for the intended target audience.” (Gottfried Vossen, Zentralblatt MATH, Vol. 1210, 2011)

Product Details

  • ISBN-13: 9781848822597
  • Publisher: Springer London
  • Publication date: 7/28/2010
  • Series: Texts in Computer Science Series, #42
  • Edition description: 2010
  • Edition number: 1
  • Pages: 394
  • Sales rank: 917,455
  • Product dimensions: 6.10 (w) x 9.30 (h) x 1.00 (d)

Table of Contents

1 Introduction 1

1.1 Motivation 1

1.1.1 Data and Knowledge 2

1.1.2 Tycho Brahe and Johannes Kepler 4

1.1.3 Intelligent Data Analysis 6

1.2 The Data Analysis Process 7

1.3 Methods, Tasks, and Tools 11

1.4 How to Read This Book 13

References 14

2 Practical Data Analysis: An Example 15

2.1 The Setup 15

2.2 Data Understanding and Pattern Finding 16

2.3 Explanation Finding 20

2.4 Predicting the Future 21

2.5 Concluding Remarks 23

3 Project Understanding 25

3.1 Determine the Project Objective 26

3.2 Assess the Situation 28

3.3 Determine Analysis Goals 30

3.4 Further Reading 31

References 32

4 Data Understanding 33

4.1 Attribute Understanding 34

4.2 Data Quality 37

4.3 Data Visualization 40

4.3.1 Methods for One and Two Attributes 40

4.3.2 Methods for Higher-Dimensional Data 48

4.4 Correlation Analysis 59

4.5 Outlier Detection 62

4.5.1 Outlier Detection for Single Attributes 63

4.5.2 Outlier Detection for Multidimensional Data 64

4.6 Missing Values 65

4.7 A Checklist for Data Understanding 68

4.8 Data Understanding in Practice 69

4.8.1 Data Understanding in KNIME 70

4.8.2 Data Understanding in R 73

References 78

5 Principles of Modeling 81

5.1 Model Classes 82

5.2 Fitting Criteria and Score Functions 85

5.2.1 Error Functions for Classification Problems 87

5.2.2 Measures of Interestingness 89

5.3 Algorithms for Model Fitting 89

5.3.1 Closed Form Solutions 89

5.3.2 Gradient Method 90

5.3.3 Combinatorial Optimization 92

5.3.4 Random Search, Greedy Strategies, and Other Heuristics 92

5.4 Types of Errors 93

5.4.1 Experimental Error 94

5.4.2 Sample Error 99

5.4.3 Model Error 100

5.4.4 Algorithmic Error 101

5.4.5 Machine Learning Bias and Variance 101

5.4.6 Learning Without Bias? 102

5.5 Model Validation 102

5.5.1 Training and Test Data 102

5.5.2 Cross-Validation 103

5.5.3 Bootstrapping 104

5.5.4 Measures for Model Complexity 105

5.6 Model Errors and Validation in Practice 111

5.6.1 Errors and Validation in KNIME 111

5.6.2 Validation in R 111

5.7 Further Reading 113

References 113

6 Data Preparation 115

6.1 Select Data 115

6.1.1 Feature Selection 116

6.1.2 Dimensionality Reduction 121

6.1.3 Record Selection 121

6.2 Clean Data 123

6.2.1 Improve Data Quality 123

6.2.2 Missing Values 124

6.3 Construct Data 127

6.3.1 Provide Operability 127

6.3.2 Assure Impartiality 129

6.3.3 Maximize Efficiency 131

6.4 Complex Data Types 134

6.5 Data Integration 135

6.5.1 Vertical Data Integration 136

6.5.2 Horizontal Data Integration 136

6.6 Data Preparation in Practice 138

6.6.1 Data Preparation in KNIME 139

6.6.2 Data Preparation in R 141

References 142

7 Finding Patterns 145

7.1 Hierarchical Clustering 147

7.1.1 Overview 148

7.1.2 Construction 150

7.1.3 Variations and Issues 152

7.2 Notion of (Dis-)Similarity 155

7.3 Prototype- and Model-Based Clustering 162

7.3.1 Overview 162

7.3.2 Construction 164

7.3.3 Variations and Issues 167

7.4 Density-Based Clustering 169

7.4.1 Overview 170

7.4.2 Construction 171

7.4.3 Variations and Issues 173

7.5 Self-organizing Maps 175

7.5.1 Overview 175

7.5.2 Construction 176

7.6 Frequent Pattern Mining and Association Rules 179

7.6.1 Overview 179

7.6.2 Construction 181

7.6.3 Variations and Issues 187

7.7 Deviation Analysis 194

7.7.1 Overview 194

7.7.2 Construction 195

7.7.3 Variations and Issues 197

7.8 Finding Patterns in Practice 198

7.8.1 Finding Patterns with KNIME 199

7.8.2 Finding Patterns in R 201

7.9 Further Reading 203

References 204

8 Finding Explanations 207

8.1 Decision Trees 208

8.1.1 Overview 209

8.1.2 Construction 210

8.1.3 Variations and Issues 213

8.2 Bayes Classifiers 218

8.2.1 Overview 218

8.2.2 Construction 220

8.2.3 Variations and Issues 224

8.3 Regression 229

8.3.1 Overview 230

8.3.2 Construction 231

8.3.3 Variations and Issues 234

8.3.4 Two Class Problems 242

8.4 Rule Learning 244

8.4.1 Propositional Rules 245

8.4.2 Inductive Logic Programming or First-Order Rules 251

8.5 Finding Explanations in Practice 253

8.5.1 Finding Explanations with KNIME 253

8.5.2 Using Explanations with R 255

8.6 Further Reading 257

References 258

9 Finding Predictors 259

9.1 Nearest-Neighbor Predictors 261

9.1.1 Overview 261

9.1.2 Construction 263

9.1.3 Variations and Issues 265

9.2 Artificial Neural Networks 269

9.2.1 Overview 269

9.2.2 Construction 272

9.2.3 Variations and Issues 276

9.3 Support Vector Machines 277

9.3.1 Overview 278

9.3.2 Construction 282

9.3.3 Variations and Issues 283

9.4 Ensemble Methods 284

9.4.1 Overview 284

9.4.2 Construction 286

9.4.3 Further Reading 289

9.5 Finding Predictors in Practice 290

9.5.1 Finding Predictors with KNIME 290

9.5.2 Using Predictors in R 292

References 294

10 Evaluation and Deployment 297

10.1 Evaluation 297

10.2 Deployment and Monitoring 299

References 301

A Statistics 303

A.1 Terms and Notation 304

A.2 Descriptive Statistics 305

A.2.1 Tabular Representations 305

A.2.2 Graphical Representations 306

A.2.3 Characteristic Measures for One-Dimensional Data 309

A.2.4 Characteristic Measures for Multidimensional Data 316

A.2.5 Principal Component Analysis 318

A.3 Probability Theory 323

A.3.1 Probability 323

A.3.2 Basic Methods and Theorems 327

A.3.3 Random Variables 333

A.3.4 Characteristic Measures of Random Variables 339

A.3.5 Some Special Distributions 343

A.4 Inferential Statistics 349

A.4.1 Random Samples 350

A.4.2 Parameter Estimation 351

A.4.3 Hypothesis Testing 361

B The R Project 369

B.1 Installation and Overview 369

B.2 Reading Files and R Objects 370

B.3 R Functions and Commands 372

B.4 Libraries/Packages 373

B.5 R Workspace 373

B.6 Finding Help 374

B.7 Further Reading 374

C KNIME 375

C.1 Installation and Overview 375

C.2 Building Workflows 377

C.3 Example Flow 378

C.4 R Integration 380

References 383

Appendix A 383

Appendix B 383

Index 385
