Table of Contents
Preface iii
1 Variables, Population, and Samples 1
1.1 Statistics is concerned with describing and explaining how things vary 1
1.2 Inferences about populations based on sample information are subject to error 3
1.3 Other forms of random samples include stratified and cluster sampling 7
2 Basic Ideas of Statistical Inference 11
2.1 There are two basic tasks of statistical inference 11
2.2 To evaluate degree of certainty, first consider all possible samples 13
2.3 To find degree of certainty, find the sampling distribution of the statistic 16
2.4 A handy way of representing a distribution is to draw a histogram 19
2.5 The larger the sample, the less spread out is the sampling distribution 22
2.6 So what? 28
3 Describing Data for a Single Variable 32
3.1 There are many ways of summarizing a set of data 32
3.2 Often we are interested in the entire population distribution 33
3.3 Some distributions can be approximated by mathematical functions 34
3.4 A histogram and frequency distribution are very useful descriptive tools 38
3.5 In a histogram, area represents relative frequency 45
3.6 The mean, median, and mode are useful for describing central tendency 48
3.7 The range and interquartile range are measures of variability 53
3.8 The standard deviation is the most commonly used measure of variability 56
3.9 The standard deviation is computed differently for samples versus populations 58
4 Some Distributions Used in Statistical Inference 64
4.1 Knowing the sampling distribution of a statistic allows us to draw inferences from sample data 64
4.2 The standard normal distribution is used to find areas under any normal curve 64
4.3 The binomial distribution is used for variables that count the number of yeses 74
4.4 To calculate binomial probabilities, we need to find the probability of each possible outcome 75
4.5 We next find the number of relevant outcomes and multiply it by the probability of each relevant outcome 77
4.6 Binomial probabilities can be computed from a general formula and are also available in tables 78
4.7 A binomial distribution with large n and moderate p is approximately normal 82
5 Interval Estimation 92
5.1 The standard error of a statistic is the standard deviation of its sampling distribution 92
5.2 The CLT can be applied to draw inferences about the population mean 95
5.3 If we know σ, we can use z scores to form a confidence interval for the mean 97
6 Hypothesis Testing 101
6.1 The CLT can also be applied to perform hypothesis tests concerning averages 101
6.2 Rejecting a true null hypothesis is called a type I error 103
6.3 P values come in two flavors: one-tailed and two-tailed 105
6.4 Failing to reject a false null hypothesis is called a type II error 108
6.5 Do not confuse statistical significance with practical significance 110
7 Drawing Inference About a Population Mean 112
7.1 The normal distribution can be used to test hypotheses about μ when σ is known 112
7.2 If we must estimate the standard error from the data, we lose some certainty 118
7.3 The family of t distributions is used to draw inferences when σ is unknown 120
7.4 The t distribution with n - 1 df is used to draw inferences about a mean 122
8 Further Topics in Inference About Single Populations 127
8.1 Proportions can be treated as a special kind of mean 127
8.2 The sign test requires few assumptions 131
8.3 The Wilcoxon signed rank test works with ranks instead of the original units 135
9 Drawing Inferences About Group Differences 142
9.1 Many scientific hypotheses can be stated in terms of group differences 142
9.2 For two paired samples, the problem reduces to the one-sample case 144
9.3 The two-sample t test is used for problems involving independent samples 147
9.4 The Wilcoxon rank sum test is another method for comparing two groups 155
10 One-Way Analysis of Variance 161
10.1 Analysis of variance is a general method used in analyzing group differences 161
10.2 One-way ANOVA can be used to test for differences among more than two groups 164
10.3 The Kruskal-Wallis test is a nonparametric analog of one-way ANOVA 172
11 Describing Relationships Between Two Variables 177
11.1 A scatterplot shows the shape of a relationship between two variables 177
11.2 Easy-r is a simple measure of the strength of a monotonic relationship 180
11.3 The correlation coefficient tells how well a straight line describes the plot 189
11.4 A form of t test can be used to draw inferences about correlation coefficients 194
11.5 Spearman's rho and Kendall's tau are nonparametric measures of correlation 198
11.6 A nonzero r does not imply causality, nor does a zero r imply no relationship 201
12 Introduction to Regression Methods 205
12.1 Often we wish to form a model to describe how one variable responds to others 205
12.2 Stochastic models may include one independent variable, or several 207
12.3 Regression methods are used to estimate the parameters of a stochastic model 212
12.4 The coefficient of determination tells how well the model fits the data 218
12.5 The F test is used to test the model as a whole for statistical significance 220
12.6 Individual regression coefficients can be tested using a form of t test 223
13 Further Topics in Regression 228
13.1 Stepwise methods can be used to select independent variables for a model 228
13.2 Beware of the misuses of stepwise regression! 235
13.3 Plotting the residuals can help identify violations of assumptions 240
13.4 Violations of assumptions can be treated by a variety of methods 251
14 Further Topics in Analysis of Variance 255
14.1 Analysis of variance is used to compare group means 255
14.2 Interaction means that the effect of one factor depends on another factor 258
14.3 The F test is used to test for significant main effects and interactions 260
15 Analyzing Categorical Data 267
15.1 In many situations our data are not measurements, but simply counts 267
15.2 Expected counts are what we would observe on the average if the null is true 270
15.3 The chi-square test evaluates the difference between observed and expected frequencies 273
References 279
Appendix A The Coronary Care Data Set 281
Appendix B The Electricity Consumer Questionnaire Data Set 286
Appendix C Selected Advanced Topics 289
C.1 Checking for normality and homogeneity 289
C.2 Dealing with lack of homogeneity and normality 291
C.3 Comparing individual groups in ANOVA 292
C.4 Fisher's z transformation for testing correlation 294
C.5 Derivation of least-squares formulas for simple regression coefficients 296
C.6 Formula for standard error of b1 in simple regression 296
C.7 Formulas for mean squares and df in regression 297
Appendix D Statistical Tables 298
D.1 Standard normal distribution 299
D.2 Binomial probabilities 300
D.3 Student's t distribution 305
D.4 Critical values for the Wilcoxon signed rank test 306
D.5 Critical values for the Wilcoxon rank sum test 307
D.6 The F distribution 312
D.7 The chi-square distribution 320
D.8 Critical values of r, the correlation coefficient 321
Index 323
Statpal Manual M-1
Statpal Manual Index M-93