|Preface to the Second Edition: For the Instructor|
|Preface to the Second Edition: For the Student|
|Pt. 1||Descriptive Statistics|
|Ch. 1||Introduction to Psychological Statistics|
|Ch. 2||Frequency Tables, Graphs, and Distributions|
|Ch. 3||Measures of Central Tendency and Variability|
|Ch. 4||Standardized Scores and the Normal Distribution|
|Pt. 2||One- and Two-Sample Hypothesis Tests|
|Ch. 5||Introduction to Hypothesis Testing: The One-Sample z Test|
|Ch. 6||Interval Estimation and the t Distribution|
|Ch. 7||The t Test for Two Independent Sample Means|
|Ch. 8||Statistical Power and Effect Size|
|Pt. 3||Hypothesis Tests Involving Two Measures on Each Subject|
|Ch. 9||Linear Correlation|
|Ch. 10||Linear Regression|
|Ch. 11||The Matched t Test|
|Pt. 4||Analysis of Variance without Repeated Measures|
|Ch. 12||One-Way Independent ANOVA|
|Ch. 13||Multiple Comparisons|
|Ch. 14||Two-Way ANOVA|
|Pt. 5||Analysis of Variance with Repeated Measures|
|Ch. 15||Repeated Measures ANOVA|
|Ch. 16||Two-Way Mixed Design ANOVA|
|Pt. 6||Multiple Regression and Its Connection to ANOVA|
|Ch. 17||Multiple Regression|
|Ch. 18||The Regression Approach to ANOVA|
|Pt. 7||Nonparametric Statistics|
|Ch. 19||The Binomial Distribution|
|Ch. 20||Chi-Square Tests|
|Ch. 21||Statistical Tests for Ordinal Data|
|App. A: Statistical Tables|
|App. B: Answers To Selected Exercises|
Note: The Figures and/or Tables mentioned in this sample chapter do not appear on the Web.
If you have not already read the Preface to the Student, please do so now. Many readers have developed the habit of skipping the Preface because it is often used by the author as a soapbox or as an opportunity to give his or her autobiography and to thank many people the reader has never heard of. The preface of this text is different and plays a particularly important role. You may have noticed that this book uses a unique form of organization (each chapter is broken into A, B, and C sections). The preface explains the rationale for this unique format and explains how you can derive the most benefit from it.
What Is (Are) Statistics?
An obvious way to begin a text about statistics is to pose the rhetorical question "What is statistics?" However, it is also proper to pose the question "What are statistics?" because the term statistics can be used in at least two different ways. In one sense, statistics refers to a collection of numerical facts, such as a set of performance measures for a baseball team (e.g., batting averages of the players) or the results of the latest U.S. census (e.g., the average size of households in the United States). So the answer to "What are statistics?" is that they are observations organized into numerical form.
In a second sense, statistics refers to a branch of mathematics that is concerned with methods for understanding and summarizing collections of numbers. So the answer to "What is statistics?" is that it is a set of methods for dealing with numerical facts. Psychologists, like other scientists, refer to numerical facts as data. The word data is a plural noun and always takes a plural verb, as in "the data were analyzed." (The singular form, datum, is rarely used.) Actually, there is a third meaning for the term statistics, which distinguishes a statistic from a parameter. To explain this distinction, I have to contrast samples with populations, which I will do at the end of this section.
As a part of mathematics, statistics has a theoretical side that can get very abstract. This text, however, deals only with applied statistics. It describes methods for data analysis that have been worked out by statisticians but does not show how these methods were derived from more fundamental mathematical principles. For that part of the story, you would need to read a text on theoretical statistics (e.g., Hogg & Craig, 1965).
The title of this text uses the phrase "psychological statistics." This could mean a collection of numerical facts about psychology (e.g., how large a percentage of the population claims to be happy), but as you have probably guessed, it actually refers to those statistical methods that are commonly applied to the analysis of psychological data. Indeed, just about every kind of statistical method has been used at one time or another to analyze some set of psychological data. The methods presented in this text are the ones usually taught in an intermediate (advanced undergraduate or master's level) statistics course for psychology students, and they have been chosen because they are not only commonly used but are also simple to explain. Unfortunately, some methods that are now used frequently in psychological research (e.g., structural equation modeling) are too complex to be covered adequately at this level.
One part of applied statistics is concerned only with summarizing the set of data that a researcher has collected; this is called descriptive statistics. If all sixth graders in the United States take the same standardized exam, and you want a system for describing each student's standing with respect to the others, you need descriptive statistics. However, most psychological research involves relatively small groups of people from which inferences are drawn about the larger population; this branch of statistics is called inferential statistics. If you have a random sample of 100 patients who have been taking a new antidepressant drug, and you want to make a general statement about the drug's possible effectiveness in the entire population, you need inferential statistics. This text begins with a presentation of several procedures that are commonly used to create descriptive statistics. Although such methods can be used just to describe data, it is quite common to use these descriptive statistics as the basis for inferential procedures. The bulk of the text is devoted to some of the most common procedures of inferential statistics.
Statistics and Research
The reason a course in statistics is nearly universally required for psychology students is that statistical methods play a critical role in most types of psychological research. However, not all forms of research rely on statistics. For instance, it was once believed that only humans make and use tools. Then chimpanzees were observed stripping leaves from branches before inserting the branches into holes in logs to "fish" for termites to eat (van Lawick-Goodall, 1971). Certainly such an observation has to be replicated by different scientists in different settings before becoming widely accepted as evidence of toolmaking among chimpanzees, but statistical analysis is not necessary.
On the other hand, suppose you wanted to know whether a glass of warm milk at bedtime will help insomniacs get to sleep faster. In this case, the results are not likely to be obvious. You don't expect the warm milk to "knock out" any of the subjects or even to help every one of them. The effect of the milk is likely to be small and noticeable only after averaging the time it takes a number of subjects to fall asleep (the sleep latency) and comparing that to the average for a (control) group that does not get the milk. Descriptive statistics is required to demonstrate that there is a difference between the two groups, and inferential statistics is needed to show that if the experiment were repeated, it would be likely that the difference would be in the same direction. (If warm milk really has no effect on sleep latency, the next experiment would be just as likely to show that warm milk slightly increases sleep latency as to show that it slightly decreases it.)
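The descriptive half of this comparison can be sketched in a few lines of Python. The latencies below are invented purely for illustration (they do not come from any study), and the variable names are hypothetical. The sketch only averages each group and takes the difference; it does not perform the inferential step.

```python
from statistics import mean

# Hypothetical sleep latencies in minutes (invented for illustration only).
milk_group = [22.0, 35.5, 18.0, 41.0, 27.5]      # drank warm milk at bedtime
control_group = [25.0, 38.0, 30.5, 44.0, 29.5]   # no milk

milk_mean = mean(milk_group)
control_mean = mean(control_group)

# A positive difference means the milk group fell asleep faster on average.
difference = control_mean - milk_mean

print(f"milk mean:    {milk_mean:.1f} min")
print(f"control mean: {control_mean:.1f} min")
print(f"difference:   {difference:.1f} min")
```

Notice that individual latencies overlap heavily between the two groups; only the averages separate them, which is why a single subject's result would tell you very little.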
Variables and Constants
A key concept in the above example is that the time it takes to fall asleep varies from one insomniac to another and also varies after a person drinks warm milk. Because sleep latency varies, it is called a variable. If sleep latency were the same for everyone, it would be a constant, and you really wouldn't need statistics to evaluate your research. It would be obvious after testing a few subjects whether the milk was having an effect. But, because sleep latency varies from person to person and from night to night, it would not be obvious whether a particular case of shortened sleep latency was due to warm milk or just to the usual variability. Rather than focusing on any one instance of sleep latency, you would probably use statistics to compare a whole set of sleep latencies of people who drank warm milk with another whole set of people who did not.
In the field of physics there are many important constants (e.g., the speed of light, the mass of a proton), but most human characteristics vary a great deal from person to person. The number of chambers in the heart is a constant for humans (four), but resting heart rate is a variable. Many human variables (e.g., beauty, charisma) are easy to observe but hard to measure precisely or reliably. Because the types of statistical procedures that can be used to analyze the data from a research study depend in part on the way the variables involved were measured, we turn to this topic next.
Scales of Measurement
Measurement is a system for assigning values to observations in a consistent and reproducible way. When most people think of measurement, they think first of physical measurement, in which numbers and measurement units (e.g., minutes and seconds for sleep latency) are used in a precise way. However, in a broad sense, measurement need not involve numbers at all.
Facial expressions can be classified by the emotions they express (e.g., anger, happiness, surprise). The different emotions can be considered values on a nominal scale; the term nominal refers to the fact that the values are simply named, rather than assigned numbers. (Some emotions can be identified quite reliably, even across diverse cultures and geographical locations; see Ekman, 1982.) If numbers are assigned to the values of a nominal scale, they are assigned arbitrarily and therefore cannot be used for mathematical operations. For example, the Diagnostic and Statistical Manual of the American Psychiatric Association (the latest version is DSM-IV) assigns a number as well as a name to each psychiatric diagnosis (e.g., the number 300.3 designates obsessive-compulsive disorder). However, it makes no sense to use these numbers mathematically; for instance, you cannot average the numerical diagnoses of all the members in a family to find out the "average" mental illness of the family. Even the order of the assigned numbers is arbitrary; the higher DSM-IV numbers do not indicate more severe diagnoses.
Many variables that are important to psychology (e.g., gender, type of psychotherapy) can be measured only on a nominal scale, so we will be dealing with this level of measurement throughout the text. Nominal scales are often referred to as categorical scales because the different levels of the scale represent distinct categories; each object measured is assigned to one and only one category. A nominal scale is also referred to as a qualitative level of measurement because each level has a different quality and therefore cannot be compared with other levels with respect to quantity.
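A short Python sketch can make the point about arbitrary numbering concrete. The observations and both numeric codings below are made up for illustration. Counting how often each category occurs is meaningful on a nominal scale, whereas the "mean" of the codes changes whenever the arbitrary numbering changes.

```python
from collections import Counter
from statistics import mean

# Hypothetical classifications of six facial expressions.
expressions = ["anger", "happiness", "surprise", "happiness", "anger", "happiness"]

# Counting category membership is a legitimate operation on nominal data.
counts = Counter(expressions)
most_common_label, most_common_count = counts.most_common(1)[0]

# Two equally arbitrary ways of numbering the same categories (assumed here).
coding_a = {"anger": 1, "happiness": 2, "surprise": 3}
coding_b = {"anger": 2, "happiness": 3, "surprise": 1}

mean_a = mean(coding_a[e] for e in expressions)
mean_b = mean(coding_b[e] for e in expressions)

print(counts)
print(most_common_label, most_common_count)
print(mean_a, mean_b)  # the two "averages" disagree, so neither means anything
```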
A quantitative level of measurement is being used when the different values of a scale can be placed in order. For instance, an elementary school teacher may rate the handwriting of each student in a class as excellent, good, fair, or poor. Unlike the categories of a nominal scale, these designations have a meaningful order and therefore constitute an ordinal scale. One can add the percentage of students rated excellent to the percentage of students rated good, for instance, and then make the statement that a certain percentage of the students have handwriting that is "better than fair." Often the levels of an ordinal scale are given numbers, as when a coach rank-orders the gymnasts on a team based on ability. These numbers are not arbitrary like the numbers that may be assigned to the categories of a nominal scale; the gymnast ranked number 2 is better than the gymnast ranked number 4, and gymnast number 3 is somewhere between. However, the rankings cannot be treated as real numbers; that is, it cannot be assumed that the third-ranked gymnast is midway between the second and the fourth. In fact, it could be the case that the number 2 gymnast is much better than either number 3 or 4 and that number 3 is only slightly better than number 4 (as shown in Figure 1.1). Although the average of the numbers 2 and 4 is 3, the average of the abilities of the number 2 and 4 gymnasts is not equivalent to the abilities of gymnast number 3.
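The gymnast example can be worked through numerically. The ability scores below are hypothetical (no such scores are given in the text); they are chosen so that gymnast 2 is much better than gymnasts 3 and 4, who are close to each other. The sketch shows that averaging ranks and averaging the underlying abilities give different answers.

```python
from statistics import mean

# Hypothetical underlying ability scores, keyed by rank (1 = best).
ability_by_rank = {2: 9.5, 3: 8.1, 4: 8.0}

# Averaging the rank numbers 2 and 4 gives 3 ...
mean_of_ranks = mean([2, 4])

# ... but the average ability of gymnasts 2 and 4 is not the ability of gymnast 3.
mean_ability_2_and_4 = mean([ability_by_rank[2], ability_by_rank[4]])
ability_of_3 = ability_by_rank[3]

print(mean_of_ranks)          # 3
print(mean_ability_2_and_4)   # 8.75
print(ability_of_3)           # 8.1
```

The ranks preserve order (9.5 > 8.1 > 8.0) but discard the spacing, which is exactly the information that arithmetic on the ranks would need.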
A typical example of the use of an ordinal scale in psychology is when photographs of human faces are rank-ordered for attractiveness. A less obvious example is the measurement of anxiety by means of a self-rated questionnaire (on which subjects indicate the frequency of various anxiety symptoms in their lives using numbers corresponding to never, sometimes, often, etc.). Higher scores can generally be thought of as indicating greater amounts of anxiety, but it is not likely that the anxiety difference between subjects scoring 20 and 30 is going to be exactly the same as the anxiety difference between subjects scoring 40 and 50. Nonetheless, scores from anxiety questionnaires and similar psychological measures are usually dealt with mathematically by researchers as though they were certain the scores were equally spaced throughout the scale.
The topic of measurement is a complex one to which entire textbooks have been devoted, so we will not delve further into measurement controversies here. For our purposes, you should be aware that when dealing with an ordinal scale (when you are sure of the order of the levels but not sure that the levels are equally spaced), you should use statistical procedures that have been devised specifically for use with ordinal data. The descriptive statistics that apply to ordinal data will be discussed in the next two chapters. The use of inferential statistics with ordinal data will not be presented until Chapter 21. (Although it can be argued that inferential ordinal statistics should be used more frequently, such procedures are not used very often in psychological research.)
Interval and Ratio Scales
In general, physical measurements have a level of precision that goes beyond the ordinal property previously described. We are confident that the inch marks on a ruler are equally spaced; we know that considerable effort goes into making sure of this. Because we know that the space, or interval, between 2 and 3 inches is the same as that between 3 and 4 inches, we can say that this measurement scale possesses the interval property (see Figure 1.2a). Such scales are based on units of measurement (e.g., the inch); a unit at one part of the scale is always the same size as a unit at any other part of the scale. It is therefore permissible to treat the numbers on this kind of scale as actual numbers and to assume that a measurement of three units is exactly halfway between two and four units.
In addition, most physical measurements possess what is called the ratio property. This means that when your measurement scale tells you that you now have twice as many units of the variable as before, you really do have twice as much of the variable. Measurements of sleep latency in minutes and seconds have this property. When a subject's sleep latency is 20 minutes, it has taken that person twice as long to fall asleep as a subject with a sleep latency of 10 minutes. Measuring the lengths of objects with a ruler also involves the ratio property. Scales that have the ratio property in addition to the interval property are called ratio scales (see Figure 1.2b).
Whereas all ratio scales have the interval property, there are some scales that have the interval property but not the ratio property. These scales are called interval scales. Such scales are relatively rare in the realm of physical measurement; perhaps the best known examples are the Celsius (also known as centigrade) and Fahrenheit temperature scales. The degrees are equally spaced, according to the interval property, but one cannot say that something that has a temperature of 40 degrees is twice as hot as something that has a temperature of 20 degrees. The reason these two temperature scales lack the ratio property is that the zero point for each is arbitrary. The two scales have different zero points (0 °C = 32 °F), but in neither case does zero indicate a total lack of heat. (Heat comes from the motion of particles within a substance, and as long as there is some motion, there is some heat.) In contrast, the Kelvin scale of temperature is a true ratio scale because its zero point represents absolute zero temperature--a total lack of heat. (Theoretically, the motion of internal particles has stopped completely.)
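The 40-degree versus 20-degree comparison can be checked with the standard conversion formulas. This is a minimal Python sketch; the function and variable names are chosen here for illustration. The same two temperatures yield a different ratio on each scale, and only the Kelvin ratio reflects a true zero point.

```python
import math

def celsius_to_kelvin(c):
    """Shift the zero point to absolute zero (0 K = -273.15 °C)."""
    return c + 273.15

def celsius_to_fahrenheit(c):
    """Standard Celsius-to-Fahrenheit conversion."""
    return c * 9 / 5 + 32

warm_c, cool_c = 40.0, 20.0

ratio_celsius = warm_c / cool_c
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"Celsius ratio:    {ratio_celsius:.3f}")     # 2.000
print(f"Fahrenheit ratio: {ratio_fahrenheit:.3f}")  # 1.529
print(f"Kelvin ratio:     {ratio_kelvin:.3f}")      # 1.068
```

On the Kelvin scale, the warmer object turns out to be only about 7 percent "hotter" than the cooler one, not twice as hot.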
Although interval scales may be rare when dealing with physical measurement, they are not uncommon in psychological research. If we grant that IQ scores have the interval property (which is open to debate), we still would not consider IQ a ratio scale. It doesn't make sense to say that someone who scores a zero on a particular IQ test has no intelligence at all, unless intelligence is defined very narrowly. And does it make sense to say that someone with an IQ of 150 is exactly twice as intelligent as someone who scores 75?
One distinction among variables that affects the way they are measured is that some variables vary continuously, whereas others have only a finite (or countable) number of levels with no intermediate values possible. The latter variables are said to be discrete (see Figure 1.3). A simple example of a continuous variable is height; no matter how close two people are in height, it is possible to find someone whose height is midway between those two people. (Quantum physics has shown that there are limitations to the precision of measurement, and it may be meaningless to talk of continuous variables at the quantum level, but these concepts have no practical implications for psychological research.)
An example of a discrete variable is the size of a family. This variable can be measured on a ratio scale by simply counting the family members, but it does not vary continuously--a family can have two or three children, but there is no meaningful value in between. The size of a family will always be a whole number and never involve a fraction. The distinction between discrete and continuous variables affects some of the procedures for displaying and describing data, as you will see in the next chapter. Fortunately, however, the inferential statistics discussed in Parts II through VI of this text are not affected by whether the variable measured is discrete or continuous.
Scales versus Variables
It is important not to confuse variables with the scales with which they are measured. For instance, the temperature of the air outside can be measured on an ordinal scale (e.g., the hottest day of the year, the third hottest day), an interval scale (degrees Celsius or Fahrenheit), or a ratio scale (degrees Kelvin); these three scales are measuring the same physical quantity but yield very different measurements. In many cases, a variable that varies continuously, such as charisma, can only be measured crudely, with relatively few levels (e.g., highly charismatic, somewhat charismatic, not at all charismatic). On the other hand, a continuous variable such as generosity can be measured rather precisely by the exact amount of money donated to charity in a year (which is at least one aspect of generosity). Although in an ultimate sense all scales are discrete, scales with very many levels relative to the quantities measured are treated as continuous for display purposes, whereas scales with relatively few levels are usually treated as discrete (see Chapter 2). Of course, the scale used to measure a discrete variable is always treated as discrete.
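As a rough illustration of one variable measured on three scales, the Python sketch below takes a handful of made-up daily temperatures and expresses each as an ordinal rank (1 = hottest), an interval measurement (degrees Celsius), and a ratio measurement (kelvins). The day labels and readings are hypothetical.

```python
# Hypothetical daily high temperatures in degrees Celsius (interval scale).
celsius = {"Mon": 21.0, "Tue": 30.5, "Wed": 25.0, "Thu": 28.0}

# Ordinal scale: rank the days from hottest (1) to coolest.
days_hottest_first = sorted(celsius, key=celsius.get, reverse=True)
rank = {day: position + 1 for position, day in enumerate(days_hottest_first)}

# Ratio scale: the same temperatures expressed in kelvins.
kelvin = {day: c + 273.15 for day, c in celsius.items()}

for day in celsius:
    print(day, rank[day], celsius[day], round(kelvin[day], 2))
```

The ranks record only that Tuesday was hotter than Thursday, not by how much; the Celsius and Kelvin values keep the spacing, and only the Kelvin values support statements about ratios.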
Parametric versus Nonparametric Statistics
Because the kinds of statistical procedures described in Parts II through VI of this text are just as valid for interval scales as they are for ratio scales, it is customary to lump these two types of scales together and refer to interval/ratio scales or interval/ratio data when some statement applies equally to both types of scales. The data from interval/ratio scales can be described in terms of smooth distributions, which will be explained in greater detail in the next few chapters. These data distributions sometimes resemble well-known mathematical distributions, which can be described by just a few values called parameters. (I will expand on this point in Section C.) Statistical procedures based on distributions and their parameters are called parametric statistics. With interval/ratio data it is often (but not always) appropriate to use parametric statistics. Conversely, parametric statistics are truly valid only when you are dealing with interval/ratio data. The bulk of this text (i.e., Parts II through VI) is devoted to parametric statistics. If your data have been measured on a nominal or ordinal scale, or your interval/ratio data do not meet the distributional assumptions of parametric statistics (which will be explained at the appropriate time), you should be using nonparametric statistics (described in Part VII).
Returning to the experiment in which one group of insomniacs gets warm milk before bedtime and the other does not, note that there are actually two variables involved in this experiment. One of these, sleep latency, has already been discussed; it is being measured on a ratio scale. The other variable is less obvious; it is group membership. That is, subjects "vary" as to which experimental condition they are in--some receive milk, and some do not. This variable, which in this case has only two levels, is called the independent variable. A subject's "level" on this variable--that is, which group a subject is placed in--is determined at random by the experimenter and is independent of anything that happens during the experiment. The other variable, sleep latency, is called the dependent variable because its value depends (it is hoped) at least partially on the value of the independent variable. That is, sleep latency is expected to depend in part on whether the subject drinks milk before bedtime. Notice that the independent variable is measured on a nominal scale (the two categories are "milk" and "no milk"). However, because the dependent variable is being measured on a ratio scale, parametric statistical analysis can be performed. If neither of the variables were measured on an interval or ratio scale (for example, if sleep latency were categorized as simply less than or greater than 10 minutes), a nonparametric statistical procedure would be needed (see Part VII). If the independent variable were also being measured on an interval/ratio scale (e.g., amount of milk given), you would still use parametric statistics, but of a different type (see Chapter 9). I will discuss different experimental designs as they become relevant to the statistical procedures I am describing. For now, I will simply point out that parametric statistics can be used to analyze the data from an experiment, even if one of the variables is measured on a nominal scale.
Experimental versus Correlational Research
It is important to realize that not all research involves experiments; much of the research in some areas of psychology involves measuring differences between groups that were not created by the researcher. For instance, insomniacs can be compared to normal sleepers on variables such as anxiety. If inferential statistics shows that insomniacs, in general, differ from normal people in daily anxiety, it is interesting, but we still do not know whether the greater anxiety causes the insomnia, the insomnia causes the greater anxiety, or some third variable (e.g., increased muscle tension) causes both. We cannot make causal conclusions because we are not in control of who is an insomniac and who is not. Nonetheless, such correlational studies can produce useful insights and sometimes suggest confirming experiments.
To continue this example: If a comparison of insomniacs and normals reveals a statistically reliable difference in the amount of sugar consumed daily, these results suggest that sugar consumption may be interfering with sleep. In this case, correlational research has led to an interesting hypothesis that can be tested more conclusively by means of an experiment. A researcher randomly selects two groups of sugar-eating insomniacs; one group is restricted from eating sugar and the other is not. If the sugar-restricted insomniacs sleep better, that evidence supports the notion that sugar consumption interferes with sleep. If there is no sleep difference between the groups, the causal connection may be in the opposite direction (i.e., lack of sleep may produce an increased craving for sugar), or the insomnia may be due to some as yet unidentified third variable (e.g., maybe anxiety produces both insomnia and a craving for sugar). The statistical analysis is generally the same for both experimental and correlational research; it is the causal conclusions that differ.
Populations versus Samples
In psychological research, measurements are often performed on some aspect of a person. The psychologist may want to know about people's ability to remember faces or solve anagrams or experience happiness. The collection of all people who could be measured, or in whom the psychologist is interested, is called the population. However, it is not always people who are the subjects of measurement in psychological research; a population can consist of laboratory rats, mental hospitals, married couples, small towns, and so forth. Indeed, as far as theoretical statisticians are concerned, a population is just a set (ideally one that is infinitely large) of numbers. The statistical procedures used to analyze data are the same regardless of where the numbers come from (as long as certain assumptions are met, as subsequent chapters will make clear). In fact, the statistical methods you will be studying in this text were originally devised to solve problems in agriculture, beer manufacture, human genetics, and other diverse areas.
If you had measurements for an entire population, you would have so many numbers that you would surely want to use descriptive statistics to summarize your results. This would also enable you to compare any individual to the rest of the population, compare two different variables measured on the same population, or even to compare two different populations measured on the same variable. More often, practical limitations will prevent you from gathering all of the measurements that you might want. In such cases you would obtain measurements for some subset of the population; this subset is called a sample (see Figure 1.4).
Sampling is something we all do in daily life. If you have tried two or three items from the menu of a nearby restaurant and have not liked any of them, you do not have to try everything on the menu before deciding not to dine at that restaurant anymore. When you are conducting research, you follow a more formal sampling procedure. If you have obtained measurements on a sample, you would probably begin by using descriptive statistics to summarize the data in your sample. But it is not likely that you would stop there; usually you would then use the procedures of inferential statistics to draw some conclusions about the entire population from which you obtained your sample. Strictly speaking, these conclusions would be valid only if your sample was a random sample. In reality, truly random samples are virtually impossible to obtain, and most research is conducted on samples of convenience (e.g., students in an introductory psychology class who must either "volunteer" for some experiments or complete some alternative assignment). To the extent that one's sample is not truly random, it may be difficult to generalize one's results to the larger population. The role of sampling in inferential statistics will be discussed at greater length in Part II.
Now we come to the third definition for the term statistic. A statistic is a value derived from the data in a sample rather than a population. It could be a value derived from all of the data in the sample, such as the mean, or it could be just one measurement in the sample, such as the maximum value. If the same mathematical operation used to derive a statistic from a sample is performed on the entire population from which you selected the sample, the result is called a population parameter rather than a sample statistic. As you will see, sample statistics are often used to make estimates of, or draw inferences about, corresponding population parameters.
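The distinction between a statistic and a parameter can be sketched with a simulated population. Everything in this Python example is an assumption made for illustration: the population is 10,000 made-up sleep latencies drawn from a normal distribution with a mean of 30 and a standard deviation of 10, and the random seed is fixed only so that the run is repeatable.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# A simulated population of 10,000 hypothetical sleep latencies (minutes).
population = [rng.gauss(30.0, 10.0) for _ in range(10_000)]

# The mean computed on the entire population is a parameter.
parameter = mean(population)

# A simple random sample of 100 measurements, drawn without replacement.
sample = rng.sample(population, 100)

# The same operation applied to the sample yields a statistic.
statistic = mean(sample)

# A statistic can also be a single measurement, such as the sample maximum.
sample_max = max(sample)

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
print(f"sample maximum (statistic):  {sample_max:.2f}")
```

The sample mean will usually land close to, but not exactly on, the population mean; using the former to estimate the latter is the inferential step described above.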
Many descriptive statistics, as well as sample statistics that are used for inference, are found by means of statistical formulas. Often these formulas are applied to all of the measurements that have been collected, so a notational system is needed for referring to many data points at once. It is also frequently necessary to add many measurements together, so a symbol is needed to represent this operation. Throughout the text, Section B will be reserved for a presentation of the nuts and bolts of statistical analysis. The first Section B will present the building blocks of all statistical formulas: subscripted variables and summation signs.
1. Descriptive statistics is concerned with summarizing a given set of measurements, whereas inferential statistics is concerned with generalizing beyond the given data to some larger potential set of measurements.
2. The type of descriptive or inferential statistics that can be applied to a set of data depends, in part, on the type of measurement scale that was used to obtain the data.
3. If the different levels of a variable can be named, but not placed in any specific order, a nominal scale is being used. The categories in a nominal scale can be numbered, but the numbers cannot be used in any mathematical way--even the ordering of the numbers would be arbitrary.
4. If the levels of a scale can be ordered, but the intervals between adjacent levels are not guaranteed to be the same size, you are dealing with an ordinal scale. The levels can be assigned numbers, as when subjects or items are rank-ordered along some dimension, but these numbers cannot be used for arithmetical operations (e.g., we cannot be sure that the average of ranks 1 and 3, for instance, equals rank 2).
5. If the intervals corresponding to the units of measurement on a scale are always equal (e.g., the difference between 2 and 3 units is the same as between 4 and 5 units), the scale has the interval property. Scales that have equal intervals but do not have a true zero point are called interval scales.
6. If an interval scale has a true zero point (i.e., zero on the scale indicates a total absence of the variable being measured), the ratio between two measurements will be meaningful (a fish that is 30 inches long is twice as long as one that is 15 inches long). A scale that has both the interval and the ratio properties is called a ratio scale.
7. A variable that has countable levels with no values possible between any two adjacent levels is called a discrete variable. A variable that can be measured with infinite precision (i.e., intermediate measurements are always possible), at least in theory, is called a continuous variable. In practice, most physical measurements are treated as continuous even though they are not infinitely precise.
8. The entire set of measurements about which one is concerned is referred to as a population. The measurements that make up a population can be from individual people, families, animals, hospitals, cities, and so forth. A subset of a population is called a sample, especially if the subset is considerably smaller than the population and is chosen at random.
9. Values that are derived from and in some way summarize samples are called statistics, whereas values that describe a population are called parameters.
10. If at least one of your variables has been measured on an interval or ratio scale, and certain additional assumptions have been met, it may be appropriate to use parametric statistics to draw inferences about population parameters from sample statistics. If all of your variables have been measured on ordinal or nominal scales, or the assumptions of parametric statistics have not been met, it may be necessary to use nonparametric statistics.
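The rule of thumb in point 10 can be written as a small decision function. This Python sketch is only a restatement of that point, not a substitute for checking the assumptions of any particular test; the function name `suggest_family` and its arguments are hypothetical.

```python
VALID_SCALES = {"nominal", "ordinal", "interval", "ratio"}

def suggest_family(scales, assumptions_met):
    """Suggest 'parametric' or 'nonparametric' following summary point 10.

    scales: the measurement scale of each variable in the analysis.
    assumptions_met: whether the additional parametric assumptions hold.
    """
    unknown = set(scales) - VALID_SCALES
    if unknown:
        raise ValueError(f"unknown scale(s): {sorted(unknown)}")

    # At least one variable must be interval or ratio for parametric methods.
    has_interval_or_ratio = any(s in {"interval", "ratio"} for s in scales)

    if has_interval_or_ratio and assumptions_met:
        return "parametric"
    return "nonparametric"

# Warm-milk experiment: nominal grouping variable, ratio-scale sleep latency.
print(suggest_family(["nominal", "ratio"], assumptions_met=True))    # parametric
# Latency recoded as under/over 10 minutes: both variables are nominal.
print(suggest_family(["nominal", "nominal"], assumptions_met=True))  # nonparametric
```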