Think Stats: Exploratory Data Analysis / Edition 2

by Allen B. Downey
O'Reilly Media, Incorporated


Original price: $34.99


If you know how to program, you have the skills to turn data into knowledge, using tools of probability and statistics. This concise introduction shows you how to perform statistical analysis computationally, rather than mathematically, with programs written in Python.

By working with a single case study throughout this thoroughly revised book, you’ll learn the entire process of exploratory data analysis—from collecting data and generating statistics to identifying patterns and testing hypotheses. You’ll explore distributions, rules of probability, visualization, and many other tools and concepts.

New chapters on regression, time series analysis, survival analysis, and analytic methods will enrich your discoveries.

  • Develop an understanding of probability and statistics by writing and testing code
  • Run experiments to test statistical behavior, such as generating samples from several distributions
  • Use simulations to understand concepts that are hard to grasp mathematically
  • Import data from most sources with Python, rather than relying on data that’s cleaned and formatted for statistics tools
  • Use statistical inference to answer questions about real-world data

Product Details

ISBN-13: 9781491907337
Publisher: O'Reilly Media, Incorporated
Publication date: 10/27/2014
Edition description: 2nd ed.
Pages: 226
Sales rank: 774,220
Product dimensions: 6.90(w) x 9.10(h) x 0.60(d)

About the Author

Allen Downey is an Associate Professor of Computer Science at the Olin College of Engineering. He has taught computer science at Wellesley College, Colby College and U.C. Berkeley. He has a Ph.D. in Computer Science from U.C. Berkeley and Master’s and Bachelor’s degrees from MIT.

Table of Contents

Why I Wrote This Book;
How I Wrote This Book;
Contributor List;
Conventions Used in This Book;
Using Code Examples;
Safari® Books Online;
How to Contact Us;
Chapter 1: Statistical Thinking for Programmers;
1.1 Do First Babies Arrive Late?;
1.2 A Statistical Approach;
1.3 The National Survey of Family Growth;
1.4 Tables and Records;
1.5 Significance;
1.6 Glossary;
Chapter 2: Descriptive Statistics;
2.1 Means and Averages;
2.2 Variance;
2.3 Distributions;
2.4 Representing Histograms;
2.5 Plotting Histograms;
2.6 Representing PMFs;
2.7 Plotting PMFs;
2.8 Outliers;
2.9 Other Visualizations;
2.10 Relative Risk;
2.11 Conditional Probability;
2.12 Reporting Results;
2.13 Glossary;
Chapter 3: Cumulative Distribution Functions;
3.1 The Class Size Paradox;
3.2 The Limits of PMFs;
3.3 Percentiles;
3.4 Cumulative Distribution Functions;
3.5 Representing CDFs;
3.6 Back to the Survey Data;
3.7 Conditional Distributions;
3.8 Random Numbers;
3.9 Summary Statistics Revisited;
3.10 Glossary;
Chapter 4: Continuous Distributions;
4.1 The Exponential Distribution;
4.2 The Pareto Distribution;
4.3 The Normal Distribution;
4.4 Normal Probability Plot;
4.5 The Lognormal Distribution;
4.6 Why Model?;
4.7 Generating Random Numbers;
4.8 Glossary;
Chapter 5: Probability;
5.1 Rules of Probability;
5.2 Monty Hall;
5.3 Poincaré;
5.4 Another Rule of Probability;
5.5 Binomial Distribution;
5.6 Streaks and Hot Spots;
5.7 Bayes’s Theorem;
5.8 Glossary;
Chapter 6: Operations on Distributions;
6.1 Skewness;
6.2 Random Variables;
6.3 PDFs;
6.4 Convolution;
6.5 Why Normal?;
6.6 Central Limit Theorem;
6.7 The Distribution Framework;
6.8 Glossary;
Chapter 7: Hypothesis Testing;
7.1 Testing a Difference in Means;
7.2 Choosing a Threshold;
7.3 Defining the Effect;
7.4 Interpreting the Result;
7.5 Cross-Validation;
7.6 Reporting Bayesian Probabilities;
7.7 Chi-Square Test;
7.8 Efficient Resampling;
7.9 Power;
7.10 Glossary;
Chapter 8: Estimation;
8.1 The Estimation Game;
8.2 Guess the Variance;
8.3 Understanding Errors;
8.4 Exponential Distributions;
8.5 Confidence Intervals;
8.6 Bayesian Estimation;
8.7 Implementing Bayesian Estimation;
8.8 Censored Data;
8.9 The Locomotive Problem;
8.10 Glossary;
Chapter 9: Correlation;
9.1 Standard Scores;
9.2 Covariance;
9.3 Correlation;
9.4 Making Scatterplots in Pyplot;
9.5 Spearman’s Rank Correlation;
9.6 Least Squares Fit;
9.7 Goodness of Fit;
9.8 Correlation and Causation;
9.9 Glossary;