Data Science for Business: What you need to know about data mining and data-analytic thinking / Edition 1

Pub. Date: 08/16/2013
Publisher: O'Reilly Media, Incorporated


Price: $30.99 (original price $39.99; you save 23%)


Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the "data-analytic thinking" necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.

Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You’ll not only learn how to improve communication between business stakeholders and data scientists, but also how to participate intelligently in your company’s data science projects. You’ll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making.

  • Understand how data science fits in your organization—and how you can use it for competitive advantage
  • Treat data as a business asset that requires careful investment if you’re to gain real value
  • Approach business problems data-analytically, using the data-mining process to gather good data in the most appropriate way
  • Learn general concepts for actually extracting knowledge from data
  • Apply data science principles when interviewing data science job candidates

Product Details

ISBN-13: 9781449361327
Publisher: O'Reilly Media, Incorporated
Publication date: 08/16/2013
Edition description: New Edition
Pages: 414
Sales rank: 60,066
Product dimensions: 7.00(w) x 9.10(h) x 0.90(d)

About the Author

Foster Provost is Professor and NEC Faculty Fellow at the NYU Stern School of Business, where he teaches in the MBA, Business Analytics, and Data Science programs. Former Editor-in-Chief for the journal Machine Learning, Professor Provost has co-founded several successful companies focusing on data science for marketing.

Tom Fawcett holds a Ph.D. in machine learning and has worked in industry R&D for more than two decades for companies such as GTE Laboratories, NYNEX/Verizon Labs, and HP Labs. His published work has become standard reading in data science both on methodology (evaluating data mining results) and on applications (fraud detection and spam filtering).

Table of Contents

Our Conceptual Approach to Data Science;
To the Instructor;
Other Skills and Concepts;
Sections and Notation;
Using Examples;
Safari® Books Online;
How to Contact Us;
Chapter 1: Introduction: Data-Analytic Thinking;
1.1 The Ubiquity of Data Opportunities;
1.2 Example: Hurricane Frances;
1.3 Example: Predicting Customer Churn;
1.4 Data Science, Engineering, and Data-Driven Decision Making;
1.5 Data Processing and “Big Data”;
1.6 From Big Data 1.0 to Big Data 2.0;
1.7 Data and Data Science Capability as a Strategic Asset;
1.8 Data-Analytic Thinking;
1.9 This Book;
1.10 Data Mining and Data Science, Revisited;
1.11 Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist;
1.12 Summary;
Chapter 2: Business Problems and Data Science Solutions;
2.1 From Business Problems to Data Mining Tasks;
2.2 Supervised Versus Unsupervised Methods;
2.3 Data Mining and Its Results;
2.4 The Data Mining Process;
2.5 Implications for Managing the Data Science Team;
2.6 Other Analytics Techniques and Technologies;
2.7 Summary;
Chapter 3: Introduction to Predictive Modeling: From Correlation to Supervised Segmentation;
3.1 Models, Induction, and Prediction;
3.2 Supervised Segmentation;
3.3 Visualizing Segmentations;
3.4 Trees as Sets of Rules;
3.5 Probability Estimation;
3.6 Example: Addressing the Churn Problem with Tree Induction;
3.7 Summary;
Chapter 4: Fitting a Model to Data;
4.1 Classification via Mathematical Functions;
4.2 Regression via Mathematical Functions;
4.3 Class Probability Estimation and Logistic “Regression”;
4.4 Example: Logistic Regression versus Tree Induction;
4.5 Nonlinear Functions, Support Vector Machines, and Neural Networks;
4.6 Summary;
Chapter 5: Overfitting and Its Avoidance;
5.1 Generalization;
5.2 Overfitting;
5.3 Overfitting Examined;
5.4 Example: Overfitting Linear Functions;
5.5 * Example: Why Is Overfitting Bad?;
5.6 From Holdout Evaluation to Cross-Validation;
5.7 The Churn Dataset Revisited;
5.8 Learning Curves;
5.9 Overfitting Avoidance and Complexity Control;
5.10 Summary;
Chapter 6: Similarity, Neighbors, and Clusters;
6.1 Similarity and Distance;
6.2 Nearest-Neighbor Reasoning;
6.3 Some Important Technical Details Relating to Similarities and Neighbors;
6.4 Clustering;
6.5 Stepping Back: Solving a Business Problem Versus Data Exploration;
6.6 Summary;
Chapter 7: Decision Analytic Thinking I: What Is a Good Model?;
7.1 Evaluating Classifiers;
7.2 Generalizing Beyond Classification;
7.3 A Key Analytical Framework: Expected Value;
7.4 Evaluation, Baseline Performance, and Implications for Investments in Data;
7.5 Summary;
Chapter 8: Visualizing Model Performance;
8.1 Ranking Instead of Classifying;
8.2 Profit Curves;
8.3 ROC Graphs and Curves;
8.4 The Area Under the ROC Curve (AUC);
8.5 Cumulative Response and Lift Curves;
8.6 Example: Performance Analytics for Churn Modeling;
8.7 Summary;
Chapter 9: Evidence and Probabilities;
9.1 Example: Targeting Online Consumers With Advertisements;
9.2 Combining Evidence Probabilistically;
9.3 Applying Bayes’ Rule to Data Science;
9.4 A Model of Evidence “Lift”;
9.5 Example: Evidence Lifts from Facebook "Likes";
9.6 Summary;
Chapter 10: Representing and Mining Text;
10.1 Why Text Is Important;
10.2 Why Text Is Difficult;
10.3 Representation;
10.4 Example: Jazz Musicians;
10.5 * The Relationship of IDF to Entropy;
10.6 Beyond Bag of Words;
10.7 Example: Mining News Stories to Predict Stock Price Movement;
10.8 Summary;
Chapter 11: Decision Analytic Thinking II: Toward Analytical Engineering;
11.1 Targeting the Best Prospects for a Charity Mailing;
11.2 Our Churn Example Revisited with Even More Sophistication;
Chapter 12: Other Data Science Tasks and Techniques;
12.1 Co-occurrences and Associations: Finding Items That Go Together;
12.2 Profiling: Finding Typical Behavior;
12.3 Link Prediction and Social Recommendation;
12.4 Data Reduction, Latent Information, and Movie Recommendation;
12.5 Bias, Variance, and Ensemble Methods;
12.6 Data-Driven Causal Explanation and a Viral Marketing Example;
12.7 Summary;
Chapter 13: Data Science and Business Strategy;
13.1 Thinking Data-Analytically, Redux;
13.2 Achieving Competitive Advantage with Data Science;
13.3 Sustaining Competitive Advantage with Data Science;
13.4 Attracting and Nurturing Data Scientists and Their Teams;
13.5 Examine Data Science Case Studies;
13.6 Be Ready to Accept Creative Ideas from Any Source;
13.7 Be Ready to Evaluate Proposals for Data Science Projects;
13.8 A Firm’s Data Science Maturity;
Chapter 14: Conclusion;
14.1 The Fundamental Concepts of Data Science;
14.2 What Data Can’t Do: Humans in the Loop, Revisited;
14.3 Privacy, Ethics, and Mining Data About Individuals;
14.4 Is There More to Data Science?;
14.5 Final Example: From Crowd-Sourcing to Cloud-Sourcing;
14.6 Final Words;
Proposal Review Guide;
Business and Data Understanding;
Data Preparation;
Evaluation and Deployment;
Another Sample Proposal;
Scenario and Proposal;

Customer Reviews

Data Science for Business: What you need to know about data mining and data-analytic thinking 5 out of 5 based on 0 ratings. 1 review.
Anonymous More than 1 year ago
When trying to learn about a new field, one of the most common difficulties is to find books (and other materials) that have the right "depth". All too often one ends up with either a friendly but largely useless book that oversimplifies or a heavy academic tome that, though authoritative and comprehensive, is condemned to sit gathering dust in one's shelves. "Data Science for Business" gets it just right.

What I mean might become clearer if I point out what this book is *not*:

  • It is *not* a computer science textbook with a focus on theoretical derivations and algorithms.
  • It is *not* a "cookbook" that provides "step-by-step" guidance with little to no explanation of what one is doing.
  • It is *not* your standard "management" title on the cool tech du jour available at airport stands and meant to be read in one sitting (buzzwords, hype and overly enthusiastic statements making up for the dearth of actual content).

Instead, it is close to being the perfect guide for the intelligent reader who -- regardless of whether s/he has a tech background -- has a sincere desire to learn how the tools and principles of data science can be used to extract meaningful information from huge datasets. Highly recommended.