All across the social sciences, from development economics to political science, researchers are going into the field to collect data and learn about the world. Successful randomized controlled trials have brought about enormous gains, but less is learned when projects fail. In Failing in the Field, Dean Karlan and Jacob Appel examine the taboo subject of failure in field research so that researchers might avoid the same pitfalls in future work. Drawing on the experiences of top social scientists working in developing countries, this book describes five common categories of failures, reviews six case studies in detail, and concludes with reflections on best (and worst) practices for designing and running field projects, with an emphasis on randomized controlled trials. Failing in the Field is an invaluable “how-not-to” guide to conducting fieldwork and running randomized controlled trials in development settings.
Publisher: Princeton University Press
Product dimensions: 5.00 (w) x 8.10 (h) x 0.60 (d)
About the Author
Dean Karlan is professor of economics and finance at Northwestern University and president of Innovations for Poverty Action. He is the recipient of a Guggenheim Fellowship. Jacob Appel previously worked with Innovations for Poverty Action and now designs and runs field experiments with the Behavioural Insights Team.
Read an Excerpt
Failing in the Field
What We Can Learn When Field Research Goes Wrong
By Dean Karlan, Jacob Appel
PRINCETON UNIVERSITY PRESS
Copyright © 2016 Dean Karlan and Jacob Appel
All rights reserved.
INAPPROPRIATE RESEARCH SETTING
MANTRA: Thou shalt have a well-understood context and an intervention that maps sensibly to a plausible theory of change.
How is a research study and evaluation born? One policy-driven story proceeds as follows: Begin with observation of a problem or challenge; collaborate with informed constituents who have local knowledge; form theories and hypotheses about the cause of the problem; collect diagnostic information; conceive an intervention that may correct the problem; test it; lastly, iterate, tinker, continue testing, all with an eye to learning how to scale. In cases like these, appropriateness of setting is not a question. The entire research process arises from, and is crafted around, the particular context where the problem exists.
This is a nice plan, but not all research follows it. Hypotheses and intervention ideas often originate elsewhere — perhaps they are extensions or consequences of an existing theory, or inspired by results from other research or from experiences in neighboring countries or countries far away. In such cases, researchers with theories or hypotheses already in hand set out to search for appropriate sites to run experiments. As they consider alternatives, they often look for goodness-of-fit on a variety of key characteristics.
First, researchers must verify that the people in the proposed sample frame are actually facing the problem or challenge to which the intervention is a possible answer. This may seem obvious, but in practice it is not always evident. Imagine an informational public health campaign that told people about the importance of sleeping under bed nets. Ideally, you would try to test such a campaign where the information about bed nets is genuinely new. But without a costly pre-survey, how could you tell whether people already know about them?
Second, researchers often seek out partner organizations that can help develop or deliver interventions. Such partners must be both willing and able to participate. In the chapter on partner organization challenges, we will say more about what constitutes "willing and able." In terms of a research setting, the key consideration is that all interventions under study are sufficiently developed and standardized to permit a fair and consistent test. Embarking on a rigorous impact evaluation with a partner that is still tinkering with its product or process can be disastrous. (Side note to avoid confusion: a common misunderstanding is that in order to do an RCT one must make sure that the actual delivery of a service is homogeneous across all treatment recipients. This is not the case. Take, for example, community-driven development programs. In such programs the process is static, but what happens in each village is not — and that is fine. One can evaluate whether that process works to generate higher community cohesion, better provision of public goods, etc. It is important to remember that one is then not testing the impact of building health clinics, if that happens to be a public good some communities choose to build; rather one is testing the impact of the process of community-driven development.)
Where to draw the line between the tinkering stage and firmly established is not so well-defined. Implementers always face a natural learning curve, commit forgivable beginner's mistakes, and need to make adjustments on the fly — all of which can wreak havoc when research requires steadfast adherence to experimental protocols. A guiding principle: the intervention should be well-enough defined such that if it works, it is clear what "it" is that could then be replicated or scaled.
Finally, researchers must look at physical, social, and political features of the setting. First, these should fit the intervention and the theory that underlies it. Here is an obvious one: tests of malaria prevention programs should only be undertaken in areas where malaria is prevalent. A less obvious one: suppose we want to test the impact of timely market price information on farmers' decisions about where to sell their produce. Such a study should be conducted only where farmers genuinely have choices — where there are multiple markets within reasonable travel distance, and those markets allow new entrants, and so forth. Even if we do not see farmers switching markets beforehand, that does not imply they are unable to do so; indeed, their mobility may be revealed by doing the study. Second, context must permit delivery of the intervention. Studying the impact of an in-school reproductive health class likely will not work if talking about sex is taboo. Finally, data collection needs to be possible, whether through survey (Are people willing to talk honestly?) or administrative data (Are institutions willing to share proprietary information?).
In practice, choosing a setting is often a complex process. It takes time and effort, judgment, and a theory that describes how the underlying context will interact with the treatment to be tested.
Problems often arise when researchers try to shoehorn a fit. It is natural, when almost all criteria are met, to convince oneself that "we are close enough" — and especially tempting to do so if considerable resources have already been sunk into the work. In such cases it is easy to become fixated on "getting it done" or credulous that potential obstacles will work themselves out. Beware.
Ultimately this is about risk management. These are not binary and certain conditions in the world. Rather, much of the process relies on trust and judgment. There are a few common pitfalls worth mentioning.
POORLY TIMED STUDIES
Concurrent events, though unrelated to the study, sometimes alter the environment in ways that can compromise research. It could be a change in anything — politics, technology, weather, policy. Such a change arrives as a potential wrinkle in an otherwise well-considered plan. With staff committed, partners on board, and funding lined up, researchers rarely feel they have the luxury to wait and see what happens.
In a study we will see in chapter 9, researchers studied a microcredit product in India that involved borrowers buying chicks from a supplier, raising them, and selling the grown chickens to a distributor. A tight schedule had been created for the supplier and distributor to visit borrowers' villages on pre-specified dates, making drop-offs and pickups in a flatbed truck. But a software upgrade for the microlender, unrelated to the research, took far longer than expected and delayed the launch of the study until the beginning of the Indian monsoon season. Daily rains would make some roads impassable, throwing a wrench in the distribution schedule, and would also make it more difficult for clients to raise chickens in the first place. At this point, the study had already been considerably delayed but researchers decided to press on — a mistake in hindsight.
Similarly, in chapter 10 we discuss a study with a microlender in which loan officers were tasked with delivering health and business training to clients as they repaid their loans. Shortly before the study began, and again unrelated to the research, the lender changed a policy for first-time borrowers. Instead of giving them twenty-four hours of new client orientation (much of which was spent driving home the importance of making payments on time), it shortened the program to eight hours. Though the nuts and bolts of the loan product remained the same, repayment rates for new clients fell immediately following the change. Loan officers found themselves forced to split their limited time between chasing down payments from delinquent clients and delivering health and business training modules, a tension that (along with other issues) ultimately doomed the study.
TECHNICALLY INFEASIBLE INTERVENTIONS
Many studies rely on infrastructure — roads, power grids, cold chains for medicine — simply to deliver treatment. Researchers can verify directly whether these things exist and how reliable they are, but doing so takes time and resources. Such investigation may seem unnecessary, especially when local forecasts are optimistic. The power is always supposed to come back on — next week. Partner staff will say with certainty, "Everyone here has a phone." If the study depends on it, do not take anybody's word for it. Get the data, either from a reliable third party or, better yet, by surveying directly.
Alternatively, learn the hard way. In a study we discuss in chapter 6, Peruvian microfinance bank Arariwa took clients through a multimedia financial literacy training program that included short DVD-based segments. Hoping to avoid the expense of buying all new equipment, the researchers had asked the loan officers who would do the training (and who had been working with these clients for years) whether they would be able to borrow TVs and DVD players from friends, family, or clients. The loan officers were confident they could, so the project went ahead without new equipment. Turns out audio/video setups are harder to come by in rural Peru than the loan officers had suspected; an audit midway through the project revealed that almost none of the loan officers had succeeded in showing clients the DVD segments.
Interventions tested with RCTs often have some novel features: they are additions or tweaks to existing programs or products, expansions to new sites, or new approaches altogether. Such novelty might be precisely what makes for an interesting study. But it also makes it hard to predict what will happen when the intervention lands in the field.
Starting an evaluation too soon — that is, launching before the team has thoroughly kicked the tires on the product or program under study — can be a mistake. On this front, both researchers and partner organizations often fall prey to overconfidence or to false optimism. Most of the details are settled and the outstanding questions that remain may not seem major at the time. The implementers may bring a wealth of relevant experience to the table and may also be able to draw on lessons from similar programs run in similar settings. But experience suggests that each new implementation brings its own unique, and potentially disruptive, wrinkles.
In a case we will see in chapter 7, a microlender in Ghana partnered with researchers to study the relationship between interest rates and client demand for microloans. (As a side note, this project is actually how the coauthors of this book met. Jacob was the research assistant for Dean and others back in 2006, helping to launch this study — a tad earlier than we should have.) The lender, which had previously made only traditional group-based loans, created its first individual-based microloan product for the study and conceived a direct marketing campaign to invite prospective clients to apply. Door-to-door marketing was a new tactic for the lender, and they wisely ran a pre-pilot to test it before launch. Because the new product was similar in many ways to its predecessors, and because experienced staff would be handling it, the research team deemed it unnecessary to extend the pre-pilot to include loan application and administration. But this proved to be a mistake, as seemingly minor changes in the application process created major operational delays.
In another case, discussed in chapter 11, two established Indian firms, a microlender and an insurer, teamed up to bundle a rudimentary insurance policy with the lender's microloans. All parties suspected that fairly priced insurance would be an appealing perk and make for a more desirable product, so they launched it sight unseen. It turned out clients saw it as a burden — an unexpected reaction that completely undermined the study.
RESEARCHERS NOT KNOWING WHEN TO WALK AWAY
Researchers are like hopeful hikers in a mountain cabin. The idea — the peak — is there, beckoning through the window. In the previous examples, they are impetuous, setting off with imperfect knowledge of the terrain and weather forecast, in spite of (or ignorant of) the hazards that might await them. Once on the trail, they find an environment more hostile than they expected. In the cases presented in part 2, we will detail their encounters with rain and sleet, rockslides, impassable crevasses, and the like.
What if the weather is so bad they cannot even open the cabin door? As in the other examples, the hikers are committed and eager, and loath to give up: they have already obtained their permits, invested in gear, and come a long way to the park. Rather than a fight against the elements, though, they find themselves in a war of attrition. How long will they hold out for a chance at the summit? When is the right time to cut their losses, pack up the car, and head back down the mountain?
Researchers Billy Jack of Georgetown University, Tavneet Suri of MIT, and Chris Woodruff of the University of Warwick lived this dilemma through a study about supply-chain credit. They had partnered with a local bank and a consumer product distributor to create a loan product that allowed retailers to purchase additional inventory from the distributor on credit, all using mobile phones. Having the partners on board was an achievement in itself: both were established, experienced firms with the capacity to deliver such a product and with footprints large enough to support a robust test of it. With an implementation plan in place, research staff on the ground, and agreements signed by all parties, it appeared everyone was ready to set out for the summit.
But every time they circled up for a final gear check, they found something was out of place. First, the bank discovered its back-end system could not accommodate the structure of the product to which everyone had initially agreed. This realization prompted a second round of discussions, after which the research team submitted a new product proposal that remained true to the initial terms and fit the bank's back end. The bank considered this for a few months and finally wrote back with changes that differed from both the initial agreement and the second discussions. A third set of meetings reconciled these differences and set in place a revised implementation plan. As the launch date neared, the research team requested screenshots of the bank's back-end software to verify the product terms had been set as agreed and discovered that the bank had raised the interest rate of the loan significantly. (The minutes the bank had recorded from the previous meeting, including an agreement on an interest rate, differed from what had actually been discussed.) This prompted a fourth meeting, which produced another revised plan, including a limited pilot to start immediately. The pilot quickly ran into technical issues with the software that supported mobile phone integration, rendering the product unviable and forcing them to pause the project again.
Amid all the false starts, the research team had recruited participants and run three separate baseline surveys, all of which had to be scrapped when launch was delayed once more. Getting the picture? The whole saga took nearly three years to unfold; cost countless hours of planning, re-planning, and negotiation from researchers and partners alike; and produced nothing but frustration. Along the way, the researchers could see their chances were narrowing. More than once they considered shutting down the project, but there always seemed to be a glimmer of hope that encouraged them to try again or hold another meeting with the partner. The longer they stayed, the more invested they became, and the more painful (and wasteful) an exit appeared.
At some level this is a straightforward story of failing to acknowledge sunk cost — doubling down in an effort to capitalize on expenditures of time and resources that cannot be recovered. Though we may understand intellectually that continuing is futile, the inclination to do so still persists. We do not want our work to go to waste. Paradoxically, of all the intellectual challenges that arise in the course of designing and implementing a rigorous research study, the greatest may be deciding when to pull the plug.
Excerpted from Failing in the Field by Dean Karlan, Jacob Appel. Copyright © 2016 Dean Karlan and Jacob Appel. Excerpted by permission of PRINCETON UNIVERSITY PRESS.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
Table of Contents
Introduction: Why Failures?
Part I Leading Causes of Research Failures
1 Inappropriate Research Setting
2 Technical Design Flaws
3 Partner Organization Challenges
4 Survey and Measurement Execution Problems
5 Low Participation Rates
Part II Case Studies
6 Credit and Financial Literacy Training: No Delivery Means No Impact
7 Interest Rate Sensitivity: Ignoring the Elephant in the Room
8 Youth Savings: Real Money Drumming up Fake People
9 Poultry Loans: Trying to Fly without a Pilot
10 Child Health and Business Training with Credit: No Such Thing as a Simple Study
11 Bundling Credit and Insurance: Turns Out More Is Less
Appendix: Checklist for Avoiding Failures