3ie Replication Paper 1, by Annette N Brown, Drew B Cameron, Benjamin DK Wood, March 2014. Available as pdf
“1. Introduction: Every so often, a well-publicised replication study comes along that, for a brief period, catalyses serious discussion about the importance of replication for social science research, particularly in economics. The most recent example is the Herndon, Ash, and Pollin replication study (2013) showing that the famous and highly influential work of Reinhart and Rogoff (2010) on the relationship between debt and growth is flawed.
McCullough and McKitrick (2009) document numerous other examples from the past few decades of replication studies that expose serious weaknesses in policy influential research across several fields. The disturbing inability of Dewald et al. (1986) to replicate many of the articles in their Journal of Money, Credit and Banking experiment is probably the most well-known example of the need for more replication research in economics. Yet, replication studies are rarely published and remain the domain of graduate student exercises and the occasional controversy.
This paper takes up the case for replication research, specifically internal replication, or the reanalysis of original data to address the original evaluation question. This focus helps to demonstrate that replication is a crucial element in the production of evidence for evidence-based policymaking, especially in low- and middle-income countries.
Following an overview of the main challenges facing this type of research, the paper then presents a typology of replication approaches for addressing the challenges. The approaches include pure replication, measurement and estimation analysis (MEA), and theory of change analysis (TCA). Although the challenges presented are not new, the discussion here is meant to highlight that the call for replication is not about catching bad or irresponsible researchers. It is about addressing very real challenges in the research and publication processes and thus about producing better evidence to inform development policymaking.”
Other quotes:
“When single evaluations are influential, and any contradictory evaluations of similar interventions can be easily discounted for contextual reasons, the minimum requirement for validating policy recommendations should be recalculating and re-estimating the measurements and findings using the original raw data to confirm the published results, or a pure replication.”
“On the bright side, there is some evidence of a correlation between public data availability and increased citation counts in the social sciences. Gleditsch (2003) finds that articles published in the Journal of Conflict Resolution that offer data in any form receive twice as many citations as comparable papers without available data (Gleditsch et al. 2003; Evanschitzky et al. 2007).”
“Replication should be seen as part of the process for translating research findings into evidence for policy and not as a way to catch or call out researchers who, in all likelihood, have the best of intentions when conducting and submitting their research, but face understandable challenges. These challenges include the inevitability of human error, the uncontrolled nature of social science, reporting and publication bias, and the pressure to derive policy recommendations from empirical findings.”
“Even in the medical sciences, the analysis of heterogeneity of outcomes, or post-trial subgroup analysis, is not accorded ‘any special epistemic status’ by the United States Food and Drug Administration rules (Deaton 2010 p.440). In the social sciences, testing for and understanding heterogeneous outcomes is crucial to policymaking. An average treatment effect demonstrated by an RCT could result from a few strongly positive outcomes and many negative outcomes, rather than from many positive outcomes, a distinction that would be important for programme design. Most RCT-based studies in development do report heterogeneous outcomes. Indeed, researchers are often required to do so by funders who want studies to have policy recommendations. As such, RCTs as practised – estimating treatment effects for groups not subject to random assignment – face the same challenges as other empirical social science studies.”
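The point about average treatment effects can be made concrete with a small numerical sketch. The numbers below are invented purely for illustration and do not come from the paper: two hypothetical programmes have the same average effect, yet in one of them most participants are made worse off.

```python
from statistics import mean

# Hypothetical individual-level treatment effects (illustrative values only,
# not taken from the paper or from any real study).
effects_skewed = [10.0] * 2 + [-1.0] * 8   # a few large gains, many small losses
effects_uniform = [1.2] * 10               # everyone gains a little


def average_treatment_effect(effects):
    """Mean of the individual effects, i.e. the headline average effect."""
    return mean(effects)


def share_harmed(effects):
    """Fraction of participants whose individual effect is negative."""
    return sum(1 for e in effects if e < 0) / len(effects)


for name, effects in [("skewed", effects_skewed), ("uniform", effects_uniform)]:
    print(name, average_treatment_effect(effects), share_harmed(effects))
```

Both lists average to the same value, but the share of participants harmed differs sharply, which is the distinction the quote says would matter for programme design. In a real trial the individual effects are not observed directly, so this is only a thought experiment about what an average can conceal.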
“King (2006) encourages graduate students to conduct replication studies but, in his desire to help students publish, he suggests they may leave out replication findings that support the original article and instead look for findings that contribute by changing people’s minds about something. About sensitivity analysis, King (2006 p.121) advises, ‘If it turns out that all those other changes don’t change any substantive conclusions, then leave them out or report them’” Aaarrrggghhh!
Rick Davies Comment: This paper is well worth reading!