More About This Textbook
Overview
The Handbook of Statistical Analysis and Data Mining Applications is a comprehensive professional reference book that guides business analysts, scientists, engineers and researchers (both academic and industrial) through all stages of data analysis, model building and implementation. The Handbook helps one discern the technical and business problem, understand the strengths and weaknesses of modern data mining algorithms, and employ the right statistical methods for practical application. Use this book to address massive and complex datasets with novel statistical approaches and be able to objectively evaluate analyses and solutions. It has clear, intuitive explanations of the principles and tools for solving problems using modern analytic techniques, and discusses their application to real problems, in ways accessible and beneficial to practitioners across industries, from science and engineering to medicine, academia, and commerce. This handbook brings together, in a single resource, all the information a beginner will need to understand the tools and issues in data mining to build successful data mining solutions.
Editorial Reviews
From the Publisher
“Great introduction to the real-world process of data mining. The overviews, practical advice, tutorials, and extra CD material make this book an invaluable resource for both new and experienced data miners.” Karl Rexer, PhD (President & Founder of Rexer Analytics, Boston, Massachusetts)
“If you want to roll up your sleeves and execute on predictive analytics, this is your definite, go-to resource. To put it lightly, if this book isn't on your shelf, you're not a data miner.” Eric Siegel, Ph.D., President, Prediction Impact, Inc. and Founding Chair, Predictive Analytics World
Meet the Author
Dr. Robert Nisbet was trained initially in Ecology and Ecosystems Analysis. He has over 30 years’ experience in complex systems analysis and modeling, most recently as a Researcher (University of California, Santa Barbara). In business, he pioneered the design and development of configurable data mining applications for retail sales forecasting, and for Churn, Propensity-to-buy, and Customer Acquisition in the Telecommunications, Insurance, Banking, and Credit industries. In addition to data mining, he has expertise in data warehousing technology for Extract, Transform, and Load (ETL) operations, Business Intelligence reporting, and data quality analyses. He is lead author of the “Handbook of Statistical Analysis & Data Mining Applications” (Academic Press, 2009), and a coauthor of “Practical Text Mining” (Academic Press, 2012). Currently, he serves as an Instructor in the University of California, Irvine Predictive Analytics Certification Program, teaching online courses in Effective Data Preparation, and co-teaching Introduction to Predictive Analytics.
Dr. John Elder heads the United States’ leading data mining consulting team, with offices in Charlottesville, Virginia; Washington, D.C.; and Baltimore, Maryland (www.datamininglab.com). Founded in 1995, Elder Research, Inc. focuses on investment, commercial, and security applications of advanced analytics, including text mining, image recognition, process optimization, cross-selling, biometrics, drug efficacy, credit scoring, market sector timing, and fraud detection. John obtained a B.S. and an M.E.E. in electrical engineering from Rice University and a Ph.D. in systems engineering from the University of Virginia, where he’s an adjunct professor teaching Optimization or Data Mining. Prior to 16 years at ERI, he spent five years in aerospace defense consulting, four years heading research at an investment management firm, and two years in Rice's Computational & Applied Mathematics Department.
Dr. Gary Miner received a B.S. from Hamline University, St. Paul, MN, with biology, chemistry, and education majors; an M.S. in zoology and population genetics from the University of Wyoming; and a Ph.D. in biochemical genetics from the University of Kansas as the recipient of a NASA predoctoral fellowship. He pursued additional National Institutes of Health postdoctoral studies at the University of Minnesota and the University of Iowa, eventually becoming immersed in the study of affective disorders and Alzheimer's disease.
In 1985, he and his wife, Dr. Linda Winters-Miner, founded the Familial Alzheimer's Disease Research Foundation, which became a leading force in organizing both local and international scientific meetings, bringing together all the leaders in the field of genetics of Alzheimer's from several countries, resulting in the first major book on the genetics of Alzheimer’s disease. In the mid-1990s, Dr. Miner turned his data analysis interests to the business world, joining the team at StatSoft and deciding to specialize in data mining. He started developing what eventually became the Handbook of Statistical Analysis and Data Mining Applications (coauthored with Drs. Robert A. Nisbet and John Elder), which received the 2009 American Publishers Award for Professional and Scholarly Excellence (PROSE). Their follow-up collaboration, Practical Text Mining and Statistical Analysis for Nonstructured Text Data Applications, also received a PROSE award in February of 2013. Overall, Dr. Miner’s career has focused on medicine and health issues, so serving as the ‘project director’ for this current book on ‘Predictive Analytics of Medicine - Healthcare Issues’ fit his knowledge and skills perfectly.
Gary also serves as VP & Scientific Director of Healthcare Predictive Analytics Corp and as Merit Reviewer for PCORI (Patient Centered Outcomes Research Institute), which awards grants for predictive analytics research into the comparative effectiveness and heterogeneous treatment effects of medical interventions, including drugs, among different genetic groups of patients. Additionally, he teaches online classes in ‘Introduction to Predictive Analytics’, ‘Text Analytics’, and ‘Risk Analytics’ for the University of California, Irvine, and other classes in medical predictive analytics for the University of California, San Diego. He spends most of his time in his primary role as Senior Analyst-Healthcare Applications Specialist for Dell - Information Management Group, Dell Software (through Dell’s acquisition of StatSoft in April 2014).
Read an Excerpt
HANDBOOK OF STATISTICAL ANALYSIS AND DATA MINING APPLICATIONS
By Robert Nisbet, John Elder, Gary Miner
Academic Press
Copyright © 2009 Elsevier Inc. All rights reserved.
ISBN: 9780080912035
Chapter One
The Background for Data Mining Practice
OUTLINE
Preamble
A Short History of Statistics and Data Mining
Modern Statistics: A Duality?
Two Views of Reality
The Rise of Modern Statistical Analysis: The Second Generation
Machine Learning Methods: The Third Generation
Statistical Learning Theory: The Fourth Generation
Postscript
PREAMBLE
You must be interested in learning how to practice data mining; otherwise, you would not be reading this book. We know that there are many books available that will give a good introduction to the process of data mining. Most books on data mining focus on the features and functions of various data mining tools or algorithms. Some books do focus on the challenges of performing data mining tasks. This book is designed to give you an introduction to the practice of data mining in the real world of business.
One of the first things considered in building a business data mining capability in a company is the selection of the data mining tool. It is difficult to penetrate the hype erected around the description of these tools by the vendors. The fact is that even the most mediocre of data mining tools can create models that are at least 90% as good as the best tools. A 90% solution performed with a relatively cheap tool might be more cost-effective in your organization than a more expensive tool. How do you choose your data mining tool? Few reviews are available. The best listing of tools by popularity is maintained and updated yearly by KDNuggets.com. Some detailed reviews available in the literature go beyond just a discussion of the features and functions of the tools (see Nisbet, 2006, Parts 1–3). The interest in an unbiased and detailed comparison is great. We are told the "most downloaded document in data mining" is the comprehensive but decade-old tool review by Elder and Abbott (1998).
The other considerations in building a business's data mining capability are forming the data mining team, building the data mining platform, and forming a foundation of good data mining practice. This book will not discuss the building of the data mining platform. This subject is discussed in many other books, some in great detail. A good overview of how to build a data mining platform is presented in Data Mining: Concepts and Techniques (Han and Kamber, 2006). The primary focus of this book is to present a practical approach to building cost-effective data mining models aimed at increasing company profitability, using tutorials and demo versions of common data mining tools.
Just as important as these considerations in practice is the background against which they must be performed. We must not imagine that the background doesn't matter ... it does matter, whether or not we recognize it initially. The reason it matters is that the capabilities of statistical and data mining methodology were not developed in a vacuum. Analytical methodology was developed in the context of prevailing statistical and analytical theory. But the major driver in this development was a very pressing need to provide a simple and repeatable analysis methodology in medical science. From this beginning developed modern statistical analysis and data mining. To understand the strengths and limitations of this body of methodology and use it effectively, we must understand the strengths and limitations of the statistical theory from which it developed. This theory was developed by scientists and mathematicians who "thought" it out. But this thinking was not one-sided or unidirectional; there arose several views on how to solve analytical problems. To understand how to approach the solving of an analytical problem, we must understand the different ways different people tend to think. This history of statistical theory behind the development of various statistical techniques bears strongly on the ability of the technique to serve the tasks of a data mining project.
A SHORT HISTORY OF STATISTICS AND DATA MINING
Analysis of patterns in data is not new. The concepts of average and grouping can be dated back to the 6th century BC in Ancient China, following the invention of the bamboo rod abacus (Goodman, 1968). In Ancient China and Greece, statistics were gathered to help heads of state govern their countries in fiscal and military matters. (This makes you wonder if the words statistic and state might have sprung from the same root.) In the sixteenth and seventeenth centuries, games of chance were popular among the wealthy, prompting many questions about probability to be addressed to famous mathematicians (Fermat, Leibnitz, etc.). These questions led to much research in mathematics and statistics during the ensuing years.
MODERN STATISTICS: A DUALITY?
Two branches of statistical analysis developed in the eighteenth century: Bayesian and classical statistics. (See Figure 1.1.) To treat both fairly in the context of history, we will consider both in the First Generation of statistical analysis. For the Bayesians, the probability of an event's occurrence is equal to the probability of its past occurrence times the likelihood of its occurrence in the future. Analysis proceeds based on the concept of conditional probability: the probability of an event occurring given that another event has already occurred. Bayesian analysis begins with the quantification of the investigator's existing state of knowledge, beliefs, and assumptions. These subjective priors are combined with observed data quantified probabilistically through an objective function of some sort. The classical statistical approach (that flowed out of mathematical works of Gauss and Laplace) considered that the joint probability, rather than the conditional probability, was the appropriate basis for analysis. The joint probability function expresses the probability that simultaneously X takes the specific values x and Y takes value y, as a function of x and y.
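The distinction between joint and conditional probability can be made concrete with a small count table. The sketch below is only an illustration: the two events and all counts are invented, and the function names are hypothetical. It computes the joint probability that both events occur and the conditional probability of one event given the other.

```python
from fractions import Fraction

# Hypothetical counts for two binary events, invented for illustration:
# A = "customer churned", B = "customer filed a complaint".
counts = {
    (True, True): 30,     # A and B
    (True, False): 20,    # A and not B
    (False, True): 10,    # not A and B
    (False, False): 140,  # neither A nor B
}
total = sum(counts.values())  # 200 records in all


def joint(a, b):
    """P(A=a, B=b): the probability that both outcomes occur together."""
    return Fraction(counts[(a, b)], total)


def marginal_b(b):
    """P(B=b): the joint probability summed over both outcomes of A."""
    return joint(True, b) + joint(False, b)


def conditional_a_given_b(a, b):
    """P(A=a | B=b) = P(A=a, B=b) / P(B=b)."""
    return joint(a, b) / marginal_b(b)


p_joint = joint(True, True)                 # 30 of 200 records
p_cond = conditional_a_given_b(True, True)  # 30 of the 40 records where B holds
print("P(A and B) =", p_joint)
print("P(A | B)   =", p_cond)
```

With these made-up counts the two quantities differ sharply: the joint probability is taken over all records, while the conditional probability is taken only over the records in which the conditioning event occurred.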
Interest in probability picked up early among biologists following Mendel in the latter part of the nineteenth century. Sir Francis Galton, founder of the School of Eugenics in England, and his successor, Karl Pearson, developed the concepts of regression and correlation for analyzing genetic data. Later, Pearson and colleagues extended their work to the social sciences. Following Pearson, Sir R. A. Fisher in England developed his system for inference testing in medical studies based on his concept of standard deviation. While the development of probability theory flowed out of the work of Galton and Pearson, early predictive methods followed Bayes's approach. Bayesian approaches to inference testing could lead to widely different conclusions by different medical investigators because they used different sets of subjective priors. Fisher's goal in developing his system of statistical inference was to provide medical investigators with a common set of tools for use in comparison studies of effects of different treatments by different investigators. But to make his system work even with large samples, Fisher had to make a number of assumptions to define his "Parametric Model."
Assumptions of the Parametric Model
1. Data Fits a Known Distribution (e.g., Normal, Logistic, Poisson, etc.)
Fisher's early work was based on calculation of the parameter standard deviation, which assumes that data are distributed in a normal distribution. The normal distribution is bell-shaped, with the mean (average) at the top of the bell, with "tails" falling off evenly at the sides. Standard deviation is, loosely, the "average" deviation of a value from the mean: the square root of the averaged squared deviations. In this calculation, however, averaging is accomplished by dividing the sum of the squared deviations by the total – 1. This subtraction expresses (to some extent) the increased uncertainty of the result due to grouping (summing the squared deviations). Subsequent developments used modified parameters based on the logistic or Poisson distributions. The assumption of a particular known distribution is necessary in order to draw upon the characteristics of the distribution function for making inferences. All of these parametric methods run the gauntlet of dangers related to force-fitting data from the real world into a mathematical construct that does not fit.
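As a concrete reference point, the sketch below computes the sample standard deviation in its conventional form (the square root of the sum of squared deviations divided by n - 1) and, for contrast, the mean absolute deviation, which is a different quantity. The data values are invented for illustration, and the result is checked against Python's standard `statistics.stdev`.

```python
import math
import statistics

# Made-up sample, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mean = sum(data) / n

# Sample variance: sum of squared deviations divided by (n - 1).
squared_devs = [(x - mean) ** 2 for x in data]
sample_variance = sum(squared_devs) / (n - 1)

# Sample standard deviation: square root of the sample variance.
sample_sd = math.sqrt(sample_variance)

# Mean absolute deviation: average of |x - mean|, divided by n.
mean_abs_dev = sum(abs(x - mean) for x in data) / n

print("sample standard deviation:", round(sample_sd, 4))
print("library value            :", round(statistics.stdev(data), 4))
print("mean absolute deviation  :", mean_abs_dev)
```

The two measures of spread do not coincide in general, which is why it matters which one a method assumes.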
2. Factor Independency
In parametric predictive systems, the variable to be predicted (Y) is considered as a function of predictor variables (X's) that are assumed to have independent effects on Y. That is, the effect on Y of each X-variable is not dependent on effects on Y of any other X-variable. This situation could be created in the laboratory by allowing only one factor (e.g., a treatment) to vary, while keeping all other factors constant (e.g., temperature, moisture, light, etc.). But, in the real world, such laboratory control is absent. As a result, some factors that do affect other factors are permitted to have a joint effect on Y. This problem is called collinearity. When it occurs between more than two factors, it is termed multicollinearity. The multicollinearity problem led statisticians to use an interaction term in the relationship that supposedly represented the combined effects. Use of this interaction term functioned as a magnificent kluge, and the reality of its effects was seldom analyzed. Later development included a number of interaction terms, one for each interaction the investigator might be presenting.
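One simple way to screen for the problem described here is to check how strongly pairs of predictors are correlated with each other before modeling. The sketch below is an assumption-laden illustration, not a method taken from the book: the predictor names echo the examples in the text, the values are invented, and `pearson_r` is a hypothetical helper written from the standard Pearson correlation formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented field measurements: moisture falls as temperature rises,
# while light varies with no clear relation to temperature.
temperature = [10.0, 12.0, 15.0, 18.0, 21.0, 25.0]
moisture = [80.0, 77.0, 70.0, 66.0, 58.0, 50.0]
light = [5.0, 9.0, 4.0, 8.0, 6.0, 7.0]

r_temp_moisture = pearson_r(temperature, moisture)
r_temp_light = pearson_r(temperature, light)

print("temperature vs moisture:", round(r_temp_moisture, 3))
print("temperature vs light   :", round(r_temp_light, 3))
```

A correlation near -1 or +1 between two predictors suggests that their effects on Y cannot be cleanly separated; a value near 0 is weaker evidence of a problem, although pairwise checks alone can miss dependencies involving three or more factors.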
3. Linear Additivity
Not only must the X-variables be independent, their effects on Y must be cumulative and linear. That means the effect of each factor is added to or subtracted from the combined effects of all X-variables on Y. But what if the relationship between Y and the predictors (X-variables) is not additive, but multiplicative or divisive? Such functions can be expressed only by exponential equations that usually generate very nonlinear relationships. Assumption of linear additivity for these relationships may cause large errors in the predicted outputs. This is often the case with their use in business data systems.
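To see why a multiplicative relationship breaks the additivity assumption, consider a hypothetical function Y = 2 * x1 * x2 (invented here purely for illustration). On the raw scale, the effect of a change in x1 depends on the value of x2. Taking logarithms, a common transformation that is this sketch's choice rather than something the text prescribes, turns the product into a sum, so the same change in x1 has a constant effect.

```python
import math


def y_multiplicative(x1, x2):
    """Hypothetical multiplicative relationship: Y = 2 * x1 * x2."""
    return 2.0 * x1 * x2


# Raw scale: the effect on Y of raising x1 from 1 to 2 depends on x2,
# so the effects of x1 and x2 are not additive.
effect_at_x2_1 = y_multiplicative(2, 1) - y_multiplicative(1, 1)
effect_at_x2_5 = y_multiplicative(2, 5) - y_multiplicative(1, 5)

# Log scale: log Y = log 2 + log x1 + log x2, so the same step in x1
# changes log Y by the same amount whatever x2 is.
log_effect_at_x2_1 = math.log(y_multiplicative(2, 1)) - math.log(y_multiplicative(1, 1))
log_effect_at_x2_5 = math.log(y_multiplicative(2, 5)) - math.log(y_multiplicative(1, 5))

print("raw effect at x2=1:", effect_at_x2_1)
print("raw effect at x2=5:", effect_at_x2_5)
print("log effect at x2=1:", round(log_effect_at_x2_1, 4))
print("log effect at x2=5:", round(log_effect_at_x2_5, 4))
```

Fitting a linear additive model to the raw values would have to pick a single effect for x1, which cannot be right at both levels of x2.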
4. Constant Variance (Homoscedasticity)
The variance throughout the range of each variable is assumed to be constant. This means that if you divided the range of a variable into bins, the variance across all records for bin #1 is the same as the variance for all the other bins in the range of that variable. If the variance throughout the range of a variable differs significantly from constancy, it is said to be heteroscedastic. The error in the predicted value caused by the combined heteroscedasticity among all variables can be quite significant.
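The binning check described here can be sketched in a few lines. The example below is a rough illustration under stated assumptions: the records are invented (x, y) pairs in which the spread of y grows with x, the bin width of 10 is arbitrary, and `variance_by_bin` is a hypothetical helper. It groups records into bins along x and compares the sample variance of y in each bin.

```python
import statistics

# Invented (x, y) records: the spread of y grows as x increases.
records = [
    (1, 0.1), (2, -0.2), (3, 0.1), (4, -0.1),
    (11, 1.0), (12, -1.5), (13, 2.0), (14, -1.0),
    (21, 5.0), (22, -6.0), (23, 4.0), (24, -5.5),
]


def variance_by_bin(rows, bin_width):
    """Group rows into bins along x and return the sample variance of y per bin."""
    bins = {}
    for x, y in rows:
        bins.setdefault(int(x // bin_width), []).append(y)
    return {b: statistics.variance(ys) for b, ys in sorted(bins.items())}


bin_variances = variance_by_bin(records, bin_width=10)
variance_ratio = max(bin_variances.values()) / min(bin_variances.values())

for b, v in bin_variances.items():
    print(f"bin {b}: variance = {v:.4f}")
print("largest / smallest variance:", round(variance_ratio, 1))
```

Under constant variance the per-bin values would be roughly equal; a ratio far from 1, as in this made-up data, points to heteroscedasticity. Formal tests exist for this purpose, but the bin comparison mirrors the description in the text.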
5. Variables Must Be Numerical and Continuous
The assumption that variables must be numerical and continuous means that data must be numeric (or it must be transformable to a number before analysis) and the number must be part of a distribution that is inherently continuous. Integer values in a string are not continuous; they are discrete. Classical parametric statistical methods are not valid for use with discrete data, because the probability distributions for continuous and discrete data are different. But both scientists and business analysts have used them anyway.
In his landmark paper, Fisher (1921; see Figure 1.2) began with the broad definition of probability as the intrinsic probability of an event's occurrence divided by the probability of occurrence of all competing events (very Bayesian). By the end of his paper, Fisher modified his definition of probability for use in medical analysis (the goal of his research) as the intrinsic probability of an event's occurrence, period. He named this quantity likelihood. From that foundation, he developed the concepts of standard deviation based on the normal distribution. Those who followed Fisher began to refer to likelihood as probability. The concept of likelihood approaches the classical concept of probability only as the sample size becomes very large and the effects of subjective priors approach zero (von Mises, 1957). In practice, these two conditions may be satisfied sufficiently if the initial distribution of the data is known and the sample size is relatively large (following the Law of Large Numbers).
Why did this duality of thought arise in the development of statistics? Perhaps it is because of the broader duality that pervades all of human thinking. This duality can be traced all the way back to the ancient debate between Plato and Aristotle.
TWO VIEWS OF REALITY
Whenever we consider solving a problem or answering a question, we start by conceptualizing it. That means we do one of two things: (1) try to reduce it to key elements or (2) try to conceive of it in general terms. We call people who take each of these approaches "detail people" and "big picture people," respectively. What we don't consider is that this distinction has its roots deep in Greek philosophy in the works of Aristotle and Plato.
Aristotle
Aristotle (Figure 1.3) believed that the true being of things (reality) could be discerned only by what the eye could see, the hand could touch, etc. He believed that the highest level of intellectual activity was the detailed study of the tangible world around us. Only in that way could we understand reality. Based on this approach to truth, Aristotle was led to believe that you could break down a complex system into pieces, describe the pieces in detail, put the pieces together and understand the whole. For Aristotle, the "whole" was equal to the sum of its parts. This nature of the whole was viewed by Aristotle in a manner that was very machine-like.
Science gravitated toward Aristotle very early. The nature of the world around us was studied by looking very closely at the physical elements and biological units (species) that composed it. As our understanding of the natural world matured into the concept of the ecosystem, it was discovered that many characteristics of ecosystems could not be explained by traditional (Aristotelian) approaches. For example, in the science of forestry, we discovered that when a tropical rain forest is cut down on the periphery of its range, it may take a very long time to regenerate (if it does at all). We learned that the reason for this is that in areas of relative stress (e.g., peripheral areas), the primary characteristics necessary for the survival and growth of tropical trees are maintained by the forest itself! High rainfall leaches nutrients down beyond the reach of the tree roots, so almost all of the nutrients for tree growth must come from recently fallen leaves and branches. When you cut down the forest, you remove that source of nutrients. The forest canopy also maintains favorable conditions of light, moisture, and temperature required by the trees. Removing the forest removes the very factors necessary for it to exist at all in that location. These factors emerge only when the system is whole and functioning. Many complex systems are like that, even business systems. In fact, these emergent properties may be the major drivers of system stability and predictability.
To understand the failure of Aristotelian philosophy for completely defining the world, we must return to Ancient Greece and consider Aristotle's rival, Plato.
Plato
Plato (Figure 1.4) was Aristotle's teacher for 20 years, and they both agreed to disagree on the nature of being. While Aristotle focused on describing tangible things in the world by detailed studies, Plato focused on the world of ideas that lay behind these tangibles. For Plato, the only thing that had lasting being was an idea. He believed that the most important things in human existence were beyond what the eye could see and the hand could touch. Plato believed that the influence of ideas transcended the world of tangible things that commanded so much of Aristotle's interest. For Plato, the "whole" of reality was greater than the sum of its tangible parts.
(Continues...)
Table of Contents
Preface
Forewords
Introduction
PART I: History of Phases of Data Analysis, Basic Theory, and the Data Mining Process
Chapter 1. History – The Phases of Data Analysis throughout the Ages
Chapter 2. Theory
Chapter 3. The Data Mining Process
Chapter 4. Data Understanding and Preparation
Chapter 5. Feature Selection – Selecting the Best Variables
Chapter 6. Accessory Tools and Advanced Features in Data
PART II: The Algorithms in Data Mining and Text Mining, and the Organization of the Three Most Common Data Mining Tools
Chapter 7. Basic Algorithms
Chapter 8. Advanced Algorithms
Chapter 9. Text Mining
Chapter 10. Organization of 3 Leading Data Mining Tools
Chapter 11. Classification Trees = Decision Trees
Chapter 12. Numerical Prediction (Neural Nets and GLM)
Chapter 13. Model Evaluation and Enhancement
Chapter 14. Medical Informatics
Chapter 15. Bioinformatics
Chapter 16. Customer Response Models
Chapter 17. Fraud Detection
PART III: Tutorials - Step-by-Step Case Studies as a Starting Point to Learn How to Do Data Mining Analyses
Tutorials
PART IV: Paradox of Complex Models; Using the “Right Model for the Right Use”, Ongoing Development, and the Future
Chapter 18. Paradox of Ensembles and Complexity
Chapter 19. The Right Model for the Right Use
Chapter 20. The Top 10 Data Mining Mistakes
Chapter 21. Prospect for the Future – Developing Areas in Data Mining
Chapter 22. Summary
GLOSSARY of STATISTICAL and DATA MINING TERMS
INDEX
CD – With Additional Tutorials, data sets, Power Points, and Data Mining software