Opinions about education programs and practices are offered frequently by children, parents, teachers, and policymakers. Credible studies of the impact of programs on the performance of children are far less frequent. Researchers use a variety of tools to determine the impact and efficacy of programs, including sample surveys, narrative studies, and exploratory research. However, randomized field trials, which are commonly used in other disciplines, are rarely employed to measure the impact of education practice. Evidence Matters explores the history and current status of research in education and encourages the more frequent use of such trials. Judith Gueron (Manpower Demonstration Research Corporation) discusses the challenges involved in randomized trials and offers practical advice drawn from experience. Robert Boruch (Wharton School, University of Pennsylvania), Dorothy de Moya (Campbell Collaboration Secretariat), and Brooke Snyder (University of Pennsylvania) explore the use of randomized field trials in education and other fields. David Cohen, Stephen Raudenbush, and Deborah Loewenberg Ball (all from the University of Michigan) review the history of progress in education over the past forty years and urge increased research on coherent instruction regimes. Maris Vinovskis (University of Michigan) examines the history and role of the U.S. Department of Education in developing rigorous evaluation of federal programs, and suggests a new National Center for Evaluation and Development. Thomas Cook and Monique Renee Payne (both from Northwestern University) take on the claim that randomized field trials are inappropriate in the U.S. education system. Gary Burtless (Brookings Institution) explores the political and professional factors that influence randomized field trials in economic programs, examining possible explanations for their lack of frequent use in education.
Carol Weiss (Harvard University) provides a brief history of community studies in the United States and suggests a variety of alternatives to randomization. It is difficult to gauge the impact of various approaches in education. But the authors give a variety of concrete examples to illustrate the feasibility of randomized trials, and the circumstances under which they are appropriate. By offering a variety of suggestions to improve the methods used to evaluate education programs, the contributors to this volume seek to improve education in the United States. Frederick Mosteller is Roger I. Lee Professor in Mathematical Statistics, emeritus, in the department of statistics at Harvard University. Robert Boruch is the University Trustee Chair Professor in the Graduate School of Education and the statistics department at the Wharton School, University of Pennsylvania.
|Publisher:||Brookings Institution Press|
|Product dimensions:||6.00(w) x 9.00(h) x (d)|
About the Author
Frederick Mosteller is Roger I. Lee Professor in Mathematical Statistics, emeritus, in the department of statistics at Harvard University. Robert Boruch is the University Trustee Chair Professor in the Graduate School of Education and the statistics department at the Wharton School, University of Pennsylvania.
Read an Excerpt
Randomized Trials in Education Research
Edited by Frederick Mosteller & Robert Boruch
BROOKINGS INSTITUTION PRESS
Copyright © 2001 Brookings Institution Press.
All rights reserved.
Overview and New Directions
THIS BOOK PRESENTS some histories and current status of research on practices in education from several points of view. Our main goal is to help improve education in schools in the United States by encouraging the gathering of better evidence on the impact of educational practices.
That governments have become seriously interested in the quality of research in the education sector is plain. The U.S. Congress's Committee on Education and the Work Force, for example, has been concerned about the wide dissemination of flawed, untested educational initiatives that can be detrimental to children. In 2000 this concern led to the development of a bill designed to address problems in this arena: the Scientifically Based Education Research, Evaluation, and Statistics and Information Act of 2000. Among other things, the bill called for increased scientific rigor in education research and tried to specify what this meant. It called for controlled experiments and the use of properly constituted comparison groups in quantitative research and for scientifically based qualitative research standards.
That particular bill did not pass. But it is a clear signal of congressional interest. Partly as a consequence, the National Research Council has convened a committee titled Scientific Principles in Educational Research to help make explicit the standards of evidence that are appropriate. The National Academy of Education and the Social Science Research Council have initiated a different related effort, focusing on the intellectual organization and state of educational research.
Examples from other countries are not difficult to identify. In the United Kingdom, for example, the Department for Education and Employment, in cooperation with the Economic and Social Research Council, has set up a new Centre for Evidence-Informed Policy and Practice at London's Institute of Education to improve the knowledge base. This move was driven partly by the country's concerns about ideology parading as intellectual inquiry and about the relevance and timeliness of research and the intelligibility of its results. The biennial Conferences on Evidence-Based Policy at Durham University (United Kingdom) anticipated much of this more recent governmental concern.
An appropriate approach to studying the value of an intervention depends on the question that is asked. This book focuses on evidence about what interventions work better. Some of the best evidence to address this question can be generated in randomized field trials (RFTs). By this, we mean situations in which individuals or entire organizations are randomly assigned to one of two or more interventions. The groups that are so constructed do not differ systematically. That is, there are no hidden factors that would lead the groups to differ in unknown ways. When the groups are statistically equivalent at the outset, we can then be assured that comparison of the relative effectiveness of the interventions will be fair. That is, the properly executed RFT will produce estimates of relative effects that are statistically unbiased. Furthermore, one can make legitimate statistical statements about one's confidence in the results.
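The logic of random assignment described above can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not a method taken from the book: the test-score model, the hidden "preparation" factor, the effect size, and the function name `simulate_trial` are all hypothetical. It compares the difference in mean outcomes when a coin flip decides who receives an intervention with the difference obtained when better-prepared students are more likely to enroll themselves.

```python
import random
import statistics


def simulate_trial(n_students=20000, true_effect=5.0, seed=0):
    """Return (randomized_estimate, self_selected_estimate) on simulated data.

    All numbers here are illustrative assumptions, not data from any study.
    """
    rng = random.Random(seed)

    # Hypothetical hidden factor (e.g., prior preparation) that raises scores
    # whether or not a student receives the intervention.
    preparation = [rng.gauss(0, 10) for _ in range(n_students)]

    def outcome(i, treated):
        # Score = baseline + hidden factor + intervention effect + noise.
        noise = rng.gauss(0, 5)
        return 50 + preparation[i] + (true_effect if treated else 0.0) + noise

    def difference_in_means(assignment):
        treated = [outcome(i, True) for i in range(n_students) if assignment[i]]
        control = [outcome(i, False) for i in range(n_students) if not assignment[i]]
        return statistics.mean(treated) - statistics.mean(control)

    # Randomized field trial: a coin flip decides assignment, so the hidden
    # factor is balanced across groups on average.
    randomized = [rng.random() < 0.5 for _ in range(n_students)]

    # Self-selection: better-prepared students are more likely to enroll,
    # so the groups differ systematically before the intervention starts.
    self_selected = [p + rng.gauss(0, 10) > 0 for p in preparation]

    return difference_in_means(randomized), difference_in_means(self_selected)


if __name__ == "__main__":
    rft_estimate, selected_estimate = simulate_trial()
    print(f"true effect:            {5.0:.2f}")
    print(f"randomized estimate:    {rft_estimate:.2f}")
    print(f"self-selected estimate: {selected_estimate:.2f}")
```

Under these assumptions the randomized estimate should land close to the true effect, while the self-selected comparison overstates it, because the hidden factor is mixed in with the intervention. Real trials raise issues of attrition, clustering, and compliance that this sketch ignores.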
This book also focuses on empirical studies of the relative effectiveness of programs in education. Other kinds of research are important in building up to controlled studies of program effectiveness, in augmenting such studies, and in adding to what we understand about education and its effects. These other kinds of research include sample surveys, designed to understand which children need assistance and to what extent. They include narrative studies of how children, their parents, their teachers, and policymakers make decisions about their needs. The other kinds of studies include exploratory research on new ideas about what might work. Even throat-clearing essays at times contribute to understanding.
Overview of the Chapters
Chapter 2, by Judith Gueron, discusses the difficulties in practice and their resolutions when randomization is used. The chapter includes counsel about carrying out randomized field trials, using the experiences of the Manpower Demonstration Research Corporation as a guide. Judith Gueron's long and successful experience with such randomized trials brings reality to her advice about carrying out studies. Among the problems is the task of persuading the sponsors and participants of the value of this approach. Her organization has executed many such investigations by persuading sponsors that there is no easier way to get the answers to the right questions, by meeting legal and ethical standards, and by ensuring that each study is large enough in numbers and duration not to miss the effects being studied. Impacts often emerge over time. Gueron emphasizes the importance of giving people the services to which they are entitled, of addressing previously unanswered questions of sponsors and participants, and of having adequate procedures to assure participants of data confidentiality, and to ensure that the innovations offer more than the usual treatment. Although Gueron's advice is especially oriented toward RFTs, most of it has value for any investigator trying to study a sensitive social problem in the turmoil of the real world.
Chapter 3, by Robert Boruch, Dorothy de Moya, and Brooke Snyder, presents many examples of evaluation in education and elsewhere using RFTs. The authors emphasize the systematic study of field trials themselves in the world literature and offer a variety of viewpoints for considering these investigations. Internationally, the number of RFTs is increasing, presumably because the quality of evidence from these studies seems stronger than that from other kinds. Chapter 3 introduces several ways of considering the value of such trials: by looking at the severity of the problem, the realism of the investigation for policy guidance, and the ability to estimate costs and benefits. Attention is also given to the international Campbell Collaboration, which prepares systematic reviews of comparative studies in education, social science, criminal justice, and other areas. Cooperation between the older Cochrane Collaboration in health care and Campbell may be of help to both groups.
In chapter 4, David Cohen, Stephen Raudenbush, and Deborah Ball offer a historical view of progress in education since about 1960. For researchers in education, they argue, the question "Do resources matter?" is far less important than "What resources matter, how, and under what circumstances?" They push on to ask, "What instructional approaches, embedded in what instructional goals, suffice to ensure that students achieve the goals?" Adding resources to schools, they point out, does not automatically bring improved performance to the children. We need to attend to the environment that accompanies resources and to the interactions among such variables as money spent on instruction, years in school, depth of teachers' subject matter preparation, time on task, class size, and how instruction is actually carried out. The authors think that we need to move away from thinking about resources as money to considering environment, students, teachers, and their interactions. They want to see more research not on the components of instructional approaches, but on coherent instructional regimes. These regimes need to be compared for their effect on learning.
The remaining chapters turn to diverse topics, beginning with the pros and cons of using RFTs. The authors recognize the advantages of RFTs in relation to alternatives and ask why they are infrequently used in education. Thomas Cook and Monique Payne suggest that the rarity of RFTs is due to educational researchers' objections to randomized trials. They present arguments to refute these objections. Maris Vinovskis focuses on the institutional history of federally sponsored evaluations, arguing that the rarity of good studies is a function of weak political and administrative support for rigorous research. Gary Burtless focuses on chance, social convention, and major political impediments to randomized trials. And Carol Weiss discusses the practical difficulty of community-wide trials and identifies ways to study education in anticipation of RFTs or in parallel with them. A second topic of discussion is alternatives to experiments. Burtless and Cook and Payne cite evidence on defects in the design and results of nonrandomized trials, especially the difficulty of ensuring that estimates of relative effects are interpretable and statistically unbiased. All of the authors recognize that randomized trials are impossible at times, and they identify alternatives that may suffice, depending on the context of the evaluation. Weiss introduces some approaches that help to build understanding "before the randomizer arrives."
The essays also introduce refreshed ways of thinking about these topics and others. In chapter 5, Maris Vinovskis comments on the checkered history and role of the U.S. Department of Education (USDE) in rigorous evaluation of federal programs. He outlines major contributions such as the Coleman report and those of various political administrations, such as President Lyndon Johnson's enhanced investments in education. He also notes that other administrations, such as President Richard Nixon's, reduced investments in research. He describes how various institutional arrangements have emerged and then disappeared, institutions such as the National Institute of Education. To judge from Vinovskis's description, the National Center for Education Statistics has made substantial progress since the 1980s in the collection and dissemination of descriptive statistics, including student achievement data. According to Vinovskis, some senior executives at the U.S. Department of Education have supported rigorous evaluation, although the record in producing such evaluations has been spotty. The Even Start Family Program is used to illustrate a remarkable effort to sponsor and to execute a randomized field trial.
One reason for the scarcity of randomized trials, says Vinovskis, is that the USDE's Planning and Evaluation Service has focused on many short-term studies that assist the Department of Education. Hence the unit's resources are spread thinly. Similarly, he is concerned that regional laboratories and R&D centers spread their limited resources among many different small-scale projects rather than concentrating them on a few larger and longer-term initiatives. Vinovskis therefore urges policymakers to consider reconfiguration of the institutions responsible for program development and evaluations, beginning perhaps with a new National Center for Evaluation and Development in the USDE to handle large-scale evaluations and development projects. He would structure this new entity so as to reduce political influences on evaluations, for example, by providing for a six-year term of office for the center's commissioner. The proposed center would help consolidate evaluations across education programs and reduce what Vinovskis views as a fragmentation of federal monetary investments in evaluation and development.
In chapter 6, Thomas Cook and Monique Payne recognize that education research is strong in that it has produced high-quality description, survey methods and data, and achievement testing. They believe, however, that education research has been weak in establishing causal relationships. This weakness is based, they say, on some contemporary researchers' objections to using RFTs to understand such relationships. They present counterarguments and empirical evidence to meet these objections.
In response to criticism that RFTs stem from an oversimplified theory of causation, Cook and Payne point out that the purpose of such trials is "more narrow and practical: to identify whether one or more possible causal agents cause change in a given outcome." Opponents of RFTs, they add, have stressed qualitative methods that depend on different assumptions and involve a different objective: hypothesis generation rather than hypothesis testing. Cook and Payne also disagree that RFTs are inappropriate in the complex U.S. education system, noting that experiments can be designed and executed to take into account both the similarities among schools and their heterogeneity. Furthermore, they do not believe that randomized assignment is premature until a good program theory and mediational processes are developed, as some claim. They maintain that the value of randomized field trials depends not on having good program theory, but on making a fair comparison and protecting against statistical bias, regardless of the particular theory. Pointing to trials that have been mounted successfully, they disagree that randomization has been used and has failed. Such criticisms, they note, are often based on nonrandomized trials and on the imperfections of randomized trials in fields other than education.
Claims that RFTs involve trade-offs that are not worth making sometimes take the form of rejecting internal validity in favor of external validity. That is, the opponents of trials attach higher value to the generalizability of a study's results (external validity) and lower value to unbiased estimates of the relative effectiveness of programs (internal validity). Cook and Payne recognize that a given trial may or may not be designed to ensure that the results are generalizable from a particular sample to larger populations. They encourage the use of designs that "minimize the external validity loss."
In considering the complaint that RFTs are unethical, Cook and Payne recognize the legitimate tensions between a lottery and measures of merit as a device for allocating scarce resources. They maintain that randomization can be justified when we do not know which intervention is more effective, especially when resources for interventions are limited. They also examine the view that there are good alternatives to RFTs. They encourage the reader to recognize qualitative case studies as adjuncts to RFTs, rather than as substitutes for a randomized trial. They criticize another purported alternative, theory-based evaluation, which Weiss encourages, pointing out that theory-based approaches are rarely sufficiently explicit. Weiss offers some counterexamples. Furthermore, when there are multiple ways of making theory explicit, the theories are unspecific about timelines, reciprocal causal influences, and counterfactuals. Their concern about others' claims that quasi experiments are an alternative to randomized field trials lies in the fact that the phrase "quasi experiment" is often used promiscuously and that the quasi-experimental designs are often poorly conceived and executed.
In chapter 7, Gary Burtless notes that sizable RFTs have often been mounted to evaluate certain economic programs started on a pilot basis. Among the basic reasons for trials in these areas, he says, is that randomization ensures that we know the direction of causality and helps remove any systematic correlation between program status and unobserved characteristics of participants and nonparticipants. Burtless argues that RFTs can be designed to meet ethical standards while permitting policymakers to test new programs and new variations on old programs. In the area of welfare, says Burtless, some RFTs have been done where economists have found important defects in some of the statistical alternatives. These defects have led to wrong conclusions about the effects of programs.
Burtless also discusses political and professional influences on our willingness to do RFTs in the area of welfare and employment and training, as opposed to education programs. First, Burtless notes that economists knowledgeable about RFTs have held important cabinet positions in the U.S. government in agencies such as the U.S. Department of Labor, but not in the U.S. Department of Education. Second, the influence depends on the extent of the federal role in financial, political, and regulatory environments. For instance, the federal government has a substantial role in drugs that are put on the commercial market partly because of regulations that require good evidence on the effects of new drugs. It does not have the same arrangement in place to determine whether education interventions are safe or effective. Burtless also recognizes that some target populations are politically weak, suggesting that their members then are not well positioned to oppose an RFT.
RFTs appear infrequently in education, Burtless suggests, partly because the federal financial share in education is small in relation to state and local investments. Consequently, the federal government has limited influence on mounting RFTs to test innovations in education. Another reason is that economists and other social scientists with a strong interest in rigorous evaluation have not held important positions in federal education departments or the legislative forums that shape policy on education programs. In addition, argues Burtless, teachers, parents, and others in the education community view RFTs as potentially denying benefits to children who need assistance. Furthermore, school administrators and teachers believe that they will lose control in some respects when a trial is undertaken. Finally, because the education community is politically influential, states Burtless, its members have the capacity to impede or stop a trial that they consider unethical or apt to reduce their authority.
In Carol Weiss's view, outlined in chapter 8, in some situations the sampling of individuals is not an appropriate basis for measuring program success. This would be the case in the study of community programs where the purpose is not to change the behavior of the individuals in the community but to change the community itself. One could sample communities, but this approach leads to large studies and the further difficulty that initial conditions in the chosen communities may differ substantially. Weiss offers a brief history of community studies in the United States. She also suggests a variety of alternatives to randomization, including the use of qualitative information to get at the rich structure of the activities and their outcomes.
Carol Weiss's main recommendations as alternatives to randomization are theory-based evaluation (TBE) and Ruling Out Alternative Explanations. By way of example, she describes a program expected to create jobs in a depressed community and a theory of change for an initiative to create a healthier atmosphere for adolescents (considering both positive and negative aspects). By spelling out the theories, Weiss argues, we are able to focus on events and consequences that demonstrate its weaknesses. We know what to pay attention to. We can gather data appropriate to the theory from the beginning and see if it is supported or whether some alternative theory is needed.
Weiss's second approach is to rule out alternative explanations, which she does by employing the usual requirements for indications of causality. For example, we may know the order in time that things should occur, and when they do not, we can eliminate an explanation. Or when it appears unlikely that a sequence of requirements can all be true, we can again eliminate an explanation.
Excerpted from EVIDENCE MATTERS by Frederick Mosteller & Robert Boruch. Copyright © 2001 by Brookings Institution Press. Excerpted by permission. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Table of Contents
|1||Overview and New Directions|
|2||The Politics of Random Assignment: Implementing Studies and Affecting Policy|
|3||The Importance of Randomized Field Trials in Education and Related Areas|
|4||Resources, Instruction, and Research|
|5||Missing in Practice? Development and Evaluation at the U.S. Department of Education|
|6||Objecting to the Objections to Using Random Assignment in Educational Research|
|7||Randomized Field Trials for Policy Evaluation: Why Not in Education?|
|8||What to Do until the Random Assigner Comes|
|Conference Participants and Contributors|