Running Randomized Evaluations: A Practical Guide

Glennerster, Rachel, and Kudzai Takavarasha. Running Randomized Evaluations: A Practical Guide. Princeton: Princeton University Press, 2013.

Overview

This book provides a comprehensive yet accessible guide to running randomized impact evaluations of social programs. Drawing on the experience of researchers at the Abdul Latif Jameel Poverty Action Lab, which has run hundreds of such evaluations in dozens of countries throughout the world, it offers practical insights on how to use this powerful technique, especially in resource-poor environments.

This step-by-step guide explains why and when randomized evaluations are useful, in what situations they should be used, and how to prioritize different evaluation opportunities. It shows how to design and analyze studies that answer important questions while respecting the constraints of those working on and benefiting from the program being evaluated. The book gives concrete tips on issues such as improving the quality of a study despite tight budget constraints, and demonstrates how the results of randomized impact evaluations can inform policy.

With its self-contained modules, this one-of-a-kind guide is easy to navigate. It also includes invaluable references and a checklist of the common pitfalls to avoid.

Provides the most up-to-date guide to running randomized evaluations of social programs, especially in developing countries

Offers practical tips on how to complete high-quality studies in even the most challenging environments

Self-contained modules allow for easy reference and flexible teaching and learning

Comprehensive yet nontechnical

Contents pages and more (via Amazon) & brief chapter summaries

From the first chapter: “This chapter provides an example of how a randomized evaluation can lead to large-scale change and provides a road map for an evaluation and for the rest of the book.”

Book review: “The impact evaluation primer you have been waiting for?” by Markus Goldstein, Development Impact blog, 27 November 2013

YouTube video: Book launch talk (1:21) “On 21 November 2013, the author of ‘Running Randomized Evaluations’ and Executive Director of J-PAL, Rachel Glennerster, launched the new book at the World Bank. This was followed by a panel discussion with Alix Zwane, Executive Director of Evidence Action; Mary Ann Bates, Deputy Director of J-PAL North America; and David Evans, Senior Economist, Office of the Chief Economist, Africa Region, World Bank, led by the Head of DIME, Arianna Legovini.”

Working with messy data sets? Two useful and free tools

I have just come across two useful apps (aka software packages, aka tools) for when you are working with someone else’s data sets and/or data sets from multiple sources and times. Or just your own data that was in a less-than-perfect state when you last left it :-)

  • OpenRefine: Initially developed by Google and now open source with its own support and development community. You can explore the characteristics of a data set, clean it in quick and comprehensive moves, transform its layout and formats, and reconcile and match multiple data sets. There are videos and documentation showing you how to do all this, as well as a book you can purchase. The Wikipedia entry provides a good overview. (A short scripted sketch of this kind of clean-up follows this list.)
  • Tabula: This package allows you to extract tables of data from PDFs, a task which can otherwise be very tiresome, messy and error prone.
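If you prefer to script this kind of clean-up, here is a minimal pandas sketch of the sort of operations OpenRefine automates: trimming whitespace, normalising case, surfacing inconsistent spellings and dropping duplicate rows. The file and column names are purely hypothetical.

```python
# A minimal, hypothetical clean-up script: pandas standing in for the kind of
# trimming, case-normalising and duplicate-spotting that OpenRefine automates.
import pandas as pd

df = pd.read_csv("household_survey_2013.csv")   # hypothetical raw export

# Trim stray whitespace and normalise case so "  kilifi " and "KILIFI" match
df["district"] = df["district"].str.strip().str.title()
df["enumerator"] = df["enumerator"].str.strip().str.upper()

# The manual equivalent of OpenRefine's clustering view: list distinct
# spellings and their counts to spot near-duplicates worth merging
print(df["district"].value_counts())

# Drop exact duplicate interviews and save a cleaned copy
df = df.drop_duplicates(subset=["household_id", "interview_date"])
df.to_csv("household_survey_2013_clean.csv", index=False)
```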

And some other packages I have yet to explore

“Quality evidence for policymaking. I’ll believe it when I see the replication”

3ie Replication Paper 1, by Annette N Brown, Drew B Cameron, Benjamin DK Wood, March 2014. Available as pdf

“1. Introduction:  Every so often, a well-publicised replication study comes along that, for a brief period, catalyses serious discussion about the importance of replication for social science research, particularly in economics. The most recent example is the Herndon, Ash, and Pollin replication study (2013) showing that the famous and highly influential work of Reinhart and Rogoff (2010) on the relationship between debt and growth is flawed.

McCullough and McKitrick (2009) document numerous other examples from the past few decades of replication studies that expose serious weaknesses in policy influential research across several fields. The disturbing inability of Dewald et al. (1986) to replicate many of the articles in their Journal of Money, Credit and Banking experiment is probably the most well-known example of the need for more replication research in economics. Yet, replication studies are rarely published and remain the domain of graduate student exercises and the occasional controversy.

This paper takes up the case for replication research, specifically internal replication, or the reanalysis of original data to address the original evaluation question. This focus helps to demonstrate that replication is a crucial element in the production of evidence for evidence-based policymaking, especially in low-and middle-income countries.

Following an overview of the main challenges facing this type of research, the paper then presents a typology of replication approaches for addressing the challenges. The approaches include pure replication, measurement and estimation analysis (MEA), and theory of change analysis (TCA). Although the challenges presented are not new, the discussion here is meant to highlight that the call for replication is not about catching bad or irresponsible researchers. It is about addressing very real challenges in the research and publication processes and thus about producing better evidence to inform development policymaking.”

Other quotes:

“When single evaluations are influential, and any contradictory evaluations of similar interventions can be easily discounted for contextual reasons, the minimum requirement for validating policy recommendations should be recalculating and re-estimating the measurements and findings using the original raw data to confirm the published results, or a pure replication.”
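To make the idea of a pure replication concrete, here is a minimal sketch of what that recalculation step can look like in practice. Everything in it is hypothetical: the data file, the variable names and the published coefficient are invented for illustration, and statsmodels is just one convenient way to re-run an OLS estimate.

```python
# A hypothetical "pure replication" check: re-estimate a published headline
# coefficient from the authors' raw data and compare it with the reported value.
import pandas as pd
import statsmodels.formula.api as smf

PUBLISHED_COEF = 0.42   # headline estimate reported in the (invented) paper
TOLERANCE = 0.01        # how close the re-estimate must be to count as confirmed

df = pd.read_csv("original_study_data.csv")              # the original raw data
model = smf.ols("outcome ~ treatment + age + female", data=df).fit()
reestimate = model.params["treatment"]

print(f"Published: {PUBLISHED_COEF:.3f}  Re-estimated: {reestimate:.3f}")
if abs(reestimate - PUBLISHED_COEF) <= TOLERANCE:
    print("Pure replication: published result reproduced within tolerance.")
else:
    print("Discrepancy: check data handling, sample restrictions and model specification.")
```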

“On the bright side, there is some evidence of a correlation between public data availability and increased citation counts in the social sciences. Gleditsch (2003) finds that articles published in the Journal of Conflict Resolution that offer data in any form receive twice as many citations as comparable papers without available data (Gleditsch et al. 2003; Evanschitzky et al. 2007). ”

“Replication should be seen as part of the process for translating research findings into evidence for policy and not as a way to catch or call out researchers who, in all likelihood, have the best of intentions when conducting and submitting their research, but face understandable challenges. These challenges include the inevitability of human error, the uncontrolled nature of social science, reporting and publication bias, and the pressure to derive policy recommendations from empirical findings”

“Even in the medical sciences, the analysis of heterogeneity of outcomes, or post-trial subgroup analysis, is not accorded ‘any special epistemic status’ by the United States Food and Drug Administration rules (Deaton 2010 p.440). In the social sciences, testing for and understanding heterogeneous outcomes is crucial to policymaking. An average treatment effect demonstrated by an RCT could result from a few strongly positive outcomes and many negative outcomes, rather than from many positive outcomes, a distinction that would be important for programme design. Most RCT-based studies in development do report heterogeneous outcomes. Indeed, researchers are often required to do so by funders who want studies to have policy recommendations. As such, RCTs as practised – estimating treatment effects for groups not subject to random assignment – face the same challenges as other empirical social science studies.”
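A toy calculation (mine, not the paper’s) illustrates why that distinction matters: two hypothetical programmes can report exactly the same average treatment effect even though one produces a few big winners and many small losers while the other delivers a uniform modest gain.

```python
# Two invented programmes with identical average treatment effects but very
# different distributions of outcomes across 100 participants.
programme_a = [10.0] * 20 + [-1.0] * 80   # a few large gains, many small losses
programme_b = [1.2] * 100                 # the same modest gain for everyone

ate_a = sum(programme_a) / len(programme_a)
ate_b = sum(programme_b) / len(programme_b)
print(ate_a, ate_b)   # both 1.2, yet the programme design implications differ sharply
```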

“King (2006) encourages graduate students to conduct replication studies but, in his desire to help students publish, he suggests they may leave out replication findings that support the original article and instead look for findings that contribute by changing people’s minds about something. About sensitivity analysis, King (2006 p.121) advises, ‘If it turns out that all those other changes don’t change any substantive conclusions, then leave them out or report them’.” Aaarrrggghhh!

Rick Davies Comment: This paper is well worth reading!

Democracy, Governance and Randomised Media Assistance

By Devra C. Moehler. BBC Media Action Research Report, Issue 03, March 2014, Governance. Available as pdf

Foreword by BBC Media Action

“This report summarises how experimental design has been used to assess the effectiveness of governance interventions and to understand the effects of the media on political opinion and behaviour. It provides an analysis of the benefits and drawbacks of experimental approaches and also highlights how field experiments can challenge the assumptions made by media support organisations about the role of the media in different countries.

The report highlights that – despite interest in the use of RCTs to assess governance outcomes – only a small number of field experiments have been conducted in the area of media, governance and democracy.

The results of these experiments are not widely known among donors or implementers. This report aims to address that gap. It shows that media initiatives have led to governance outcomes including improved accountability. However, they have also at times had unexpected adverse effects.

The studies conducted to date have been confined to a small number of countries and the research questions posed were linked to specific intervention and governance outcomes. As a result, there is a limit to what policymakers and practitioners can infer. While this report highlights an opportunity for more experimental research, it also identifies that the complexity of media development can hinder the efficacy of experimental evaluation. It cautions that low-level interventions (eg those aimed at individuals as opposed to working at a national or organisational level) best lend themselves to experimentation. This could create incentives for researchers to undertake experimental research that answers questions focused on individual change rather than wider organisational and systemic change. For example, it would be relatively easy to assess whether a training course does or does not work. Researchers can randomise the journalists that were trained and assess the uptake and implementation of skills. However, it would be much harder to assess how capacity-building efforts affect a media house, its editorial values, content, audiences and media/state relations.

Designing such experiments will be challenging. The intention of this report is to start a conversation both within our own organisation and externally. As researchers we should be prepared to discover that experimentation may not be feasible or relevant for evaluation. In order to strengthen the evidence base, practitioners, researchers and donors need to agree which research questions can and should be answered using experimental research, and, in the absence of experimental research, to agree what constitutes good evidence.

BBC Media Action welcomes feedback on this report and all publications published under our Bridging Theory and Practice Research Dissemination Series.”

Contents
Introduction 5
Chapter 1: Background on DG field experiments 7
Chapter 2: Background on media development assistance and evaluation 9
Chapter 3: Current experiments and quasi-experimental studies on media in developing countries 11
Field experiments
Quasi-experiments
Chapter 4: Challenges of conducting field experiments on media development 21
Level of intervention
Complexity of intervention
Research planning under ambiguity
Chapter 5: Challenges to learning from field experiments on media development 26
Chapter 6: Solutions and opportunities 29
Research in media scarce environments
Test assumptions about media effects
To investigate influences on media
References 33

Feminist Evaluation & Research: Theory & Practice

 

 

Sharon Brisolara PhD (Editor), Denise Seigart PhD (Editor), Saumitra SenGupta PhD (Editor)
Paperback: 368 pages | Publisher: The Guilford Press | Publication date: March 28, 2014 | ISBN-10: 1462515207 | ISBN-13: 978-1462515202 | Edition: 1
Available on Amazon (though at an expensive US$43 for a paperback!)

No reviews available online as yet, but links to these will be posted here when they become available

CONTENTS

I. Feminist Theory, Research and Evaluation

1. Feminist Theory: Its Domain and Applications, Sharon Brisolara
2. Research and Evaluation: Intersections and Divergence, Sandra Mathison
3. Researcher/Evaluator Roles and Social Justice, Elizabeth Whitmore
4. A Transformative Feminist Stance: Inclusion of Multiple Dimensions of Diversity with Gender, Donna M. Mertens
5. Feminist Evaluation for Nonfeminists, Donna Podems

II. Feminist Evaluation in Practice

6. An Explication of Evaluator Values: Framing Matters, Kathryn Sielbeck-Mathes and Rebecca Selove
7. Fostering Democracy in Angola: A Feminist-Ecological Model for Evaluation, Tristi Nichols
8. Feminist Evaluation in South Asia: Building Bridges of Theory and Practice, Katherine Hay
9. Feminist Evaluation in Latin American Contexts, Silvia Salinas Mulder and Fabiola Amariles

III. Feminist Research in Practice

10. Feminist Research and School-Based Health Care: A Three-Country Comparison, Denise Seigart
11. Feminist Research Approaches to Empowerment in Syria, Alessandra Galié
12. Feminist Research Approaches to Studying Sub-Saharan Traditional Midwives, Elaine Dietsch
Final Reflection. Feminist Social Inquiry: Relevance, Relationships, and Responsibility, Jennifer C. Greene

 

Independent Commission for Aid Impact publishes report on “How DFID Learns”

Terms of Reference for the review

The review itself, available here, published 4th April 2014

Selected quotes:

“Overall Assessment: Amber-Red: DFID has allocated at least £1.2 billion for research, evaluation and personnel development (2011-15). It generates considerable volumes of information, much of which, such as funded research, is publicly available. DFID itself is less good at using it and building on experience so as to turn learning into action. DFID does not clearly identify how its investment in learning links to its performance and delivering better impact. DFID has the potential to be excellent at organisational learning if its best practices become common. DFID staff learn well as individuals. They are highly motivated and DFID provides opportunities and resources for them to learn. DFID is not yet, however, managing all the elements that contribute to how it learns as a single, integrated system. DFID does not review the costs, benefits and impact of learning. Insufficient priority is placed on learning during implementation. The emphasis on results can lead to a bias to the positive. Learning from both success and failure should be systematically encouraged”.

RD Comment: The measurement of organisational learning is no easy matter, so it is likely that a lot of people would be very interested to know more about the ICAI approach. The ICAI report does define learning, as follows:

“We define learning as the extent to which DFID gains and uses knowledge to influence its policy, strategy, plans and actions. This includes knowledge from both its own work and that of others. Our report makes a distinction between the knowledge DFID collects and how it is actively applied, which we term as ‘know-how’.”

Okay, and how is this assessed in practice? The key word in this definition is “influence”. Influencing is a notoriously difficult process and outcome to measure. Unfortunately the ICAI report does not explain how influence was assessed or measured. Annex 5 does show how the topic of learning was broken down into four areas: making programme choices; creating theories of change; choosing delivery mechanisms; and adapting and improving implementation of its activities. The report also provides some information on the sources used: “The 31 ICAI reports considered by the team examined 140 DFID programmes across 40 countries/territories, including visits undertaken to 24 DFID country offices” … “We spoke to 92 individuals, of whom 87 were DFID staff from: 11 DFID fragile state country offices; 5 non-fragile small country offices; 16 HQ departments; and 13 advisory cadres”. But how influence was measured remains unclear. ICAI could do better at modelling good practice here, i.e. transparency of evaluation methods. Perhaps then DFID could learn from ICAI about how to assess its (DFID’s) own learning in the future. Maybe…

Other quotes

“DFID is always losing and gaining knowledge. Staff are continuously leaving and joining DFID (sometimes referred to as ‘churn’). Fragile states are particularly vulnerable to high staff turnover by UK-based staff. For instance, in Afghanistan, DFID informed us that staff turnover is at a rate of 50% per year. We are aware of one project in the Democratic Republic of Congo having had five managers in five years. DFID inform us that a staff appointment typically lasts slightly under three years.” A table that follows shows an overall rate of around 10% per year.

“DFID does not track or report on the overall impact of evaluations. The challenge of synthesising, disseminating and using knowledge from an increasing number of evaluation reports is considerable. DFID reports what evaluations are undertaken and it comments on their quality. The annual evaluation report also provides some summary findings. We would have expected DFID also to report the impact that evaluations have on what it does and what it achieves. Such reporting would cover actions taken in response to individual evaluations and their impact on DFID’s overall value for money and effectiveness.” It is the case that some agencies do systematically track what happens to the recommendations made in evaluation reports.

“DFID has, however, outsourced much of its knowledge production. Of the £1.5 billion for knowledge generation and learning, it has committed at least £1.2 billion to fund others outside DFID to produce knowledge it can use (specifically research, evaluation and PEAKS). Staff are now primarily consumers of knowledge products rather than producers of knowledge itself. We note that there are risks to this model; staff may not have the practical experience that allows them wisely to use this knowledge to make programming decisions.”

“We note that annual and project completion reviews are resources that are not fully supporting DFID’s learning. We are concerned that the lesson-learning section was removed from the standard format of these reports and is no longer required. Lessons from these reports are not being systematically collated and that there is no central resource regularly quality assuring reviews.”

RD Comment: Paras 2.50 to 2.52 are entertaining. A UK Gov model is presented of how people learn, DFID staff are interviewed about how they think they learn, then differences between the model and what staff report are ascribed to the staff’s lack of understanding: “This indicates that DFID staff do not consciously and sufficiently use the experience of their work for learning. It also indicates, within DFID, an over-identification of learning with formal training.” OR… maybe it indicates that the model was wrong and the staff were right???

This para might also raise a smile or two: “There is evidence that DFID staff are sometimes using evidence selectively. It appears this is often driven by managers requiring support for decisions. While such selective use of evidence is not the usual practice across the department, it appears to be occurring with sufficient regularity to be a concern. It is clearly unacceptable.” Golly…

Process tracing: A list

  • Understanding Process Tracing, David Collier, University of California, Berkeley. PS: Political Science & Politics 44, no. 4 (2011): 823–30. 7 pages.
    • Abstract: “Process tracing is a fundamental tool of qualitative analysis. This method is often invoked by scholars who carry out within-case analysis based on qualitative data, yet frequently it is neither adequately understood nor rigorously applied. This deficit motivates this article, which offers a new framework for carrying out process tracing. The reformulation integrates discussions of process tracing and causal-process observations, gives greater attention to description as a key contribution, and emphasizes the causal sequence in which process-tracing observations can be situated. In the current period of major innovation in quantitative tools for causal inference, this reformulation is part of a wider, parallel effort to achieve greater systematization of qualitative methods. A key point here is that these methods can add inferential leverage that is often lacking in quantitative analysis. This article is accompanied by online teaching exercises, focused on four examples from American politics, two from comparative politics, three from international relations, and one from public health/epidemiology”
      • Great explanation of the difference between straw-in-the-wind tests, hoop tests, smoking-gun tests and doubly-decisive tests, using the Sherlock Holmes story “Silver Blaze”.
  • Case selection techniques in Process-tracing and the implications of taking the study of causal mechanisms seriously, Derek Beach and Rasmus Brun Pedersen, 2012, 33 pages
    • Abstract: “This paper develops guidelines for each of the three variants of Process-tracing (PT): explaining outcome PT, theory-testing, and theory-building PT. Case selection strategies are not relevant when we are engaging in explaining outcome PT due to the broader conceptualization of outcomes that is a product of the different understandings of case study research (and science itself) underlying this variant of PT. Here we simply select historically important cases because they are for instance the First World War, not a ‘case of’ failed deterrence or crisis decision-making. Within the two theory-centric variants of PT, typical case selection strategies are most applicable. A typical case is one that is a member of the set of X, Y and the relevant scope conditions for the mechanism. We put forward that pathway cases, where scores on other causes are controlled for, are less relevant when we take the study of mechanisms seriously in PT, given that we are focusing our attention on how a mechanism contributes to produce Y, not on the causal effects of an X upon values of Y. We also discuss the role that deviant cases play in theory-building PT, suggesting that PT cannot stand alone, but needs to be complemented with comparative analysis of the deviant case with typical cases”
  • Process-Tracing Methods: Foundations and Guidelines, Derek Beach, Rasmus Brun Pedersen,  The University of Michigan Press (15 Dec 2012), 248 pages.
    • Description: “Process-tracing in social science is a method for studying causal mechanisms linking causes with outcomes. This enables the researcher to make strong inferences about how a cause (or set of causes) contributes to producing an outcome. Derek Beach and Rasmus Brun Pedersen introduce a refined definition of process-tracing, differentiating it into three distinct variants and explaining the applications and limitations of each. The authors develop the underlying logic of process-tracing, including how one should understand causal mechanisms and how Bayesian logic enables strong within-case inferences. They provide instructions for identifying the variant of process-tracing most appropriate for the research question at hand and a set of guidelines for each stage of the research process.” View the Table of Contents here:
  • Mahoney, James. 2012. “The Logic of Process Tracing Tests in the Social Sciences.” Sociological Methods & Research (March): 1–28. doi:10.1177/0049124112437709.
    • Abstract: This article discusses process tracing as a methodology for testing hypotheses in the social sciences. With process tracing tests, the analyst combines preexisting generalizations with specific observations from within a single case to make causal inferences about that case. Process tracing tests can be used to help establish that (1) an initial event or process took place, (2) a subsequent outcome also occurred, and (3) the former was a cause of the latter. The article focuses on the logic of different process tracing tests, including hoop tests, smoking gun tests, and straw in the wind tests. New criteria for judging the strength of these tests are developed using ideas concerning the relative importance of necessary and sufficient conditions. Similarities and differences between process tracing and the deductive-nomological model of explanation are explored.
  • Goertz, Gary, and James Mahoney. 2012. A Tale of Two Cultures: Qualitative and Quantitative Research in the Social Sciences. Princeton University Press. See chapter 8 on causal mechanisms and process tracing, and the surrounding chapters 7 and 9, which make up a section on within-case analysis.
  • Hutchings, Claire. ‘Process Tracing: Draft Protocol’. Oxfam, 2013. Plus an associated blog posting and an Effectiveness Review which made use of the protocol
  • Schneider, C.Q., Rohlfing, I., 2013. Combining QCA and Process Tracing in Set-Theoretic Multi-Method Research. Sociological Methods & Research 42, 559–597. doi:10.1177/0049124113481341
    • Abstract:  Set-theoretic methods and Qualitative Comparative Analysis (QCA) in particular are case-based methods. There are, however, only few guidelines on how to combine them with qualitative case studies. Contributing to the literature on multi-method research (MMR), we offer the first comprehensive elaboration of principles for the integration of QCA and case studies with a special focus on case selection. We show that QCA’s reliance on set-relational causation in terms of necessity and sufficiency has important consequences for the choice of cases. Using real world data for both crisp-set and fuzzy-set QCA, we show what typical and deviant cases are in QCA-based MMR. In addition, we demonstrate how to select cases for comparative case studies aiming to discern causal mechanisms and address the puzzles behind deviant cases. Finally, we detail the implications of modifying the set-theoretic cross-case model in the light of case-study evidence. Following the principles developed in this article should increase the inferential leverage of set-theoretic MMR.”
  • Rohlfing, Ingo. “Comparative Hypothesis Testing Via Process Tracing.” Sociological Methods & Research 43, no. 4 (November 1, 2014): 606–42. doi:10.1177/0049124113503142.
    • Abstract: Causal inference via process tracing has received increasing attention during recent years. A 2 × 2 typology of hypothesis tests takes a central place in this debate. A discussion of the typology demonstrates that its role for causal inference can be improved further in three respects. First, the aim of this article is to formulate case selection principles for each of the four tests. Second, in focusing on the dimension of uniqueness of the 2 × 2 typology, I show that it is important to distinguish between theoretical and empirical uniqueness when choosing cases and generating inferences via process tracing. Third, I demonstrate that the standard reading of the so-called doubly decisive test is misleading. It conflates unique implications of a hypothesis with contradictory implications between one hypothesis and another. In order to remedy the current ambiguity of the dimension of uniqueness, I propose an expanded typology of hypothesis tests that is constituted by three dimensions.
  • Bennett, A., Checkel, J. (Eds.), 2014. Process Tracing: From Metaphor to Analytic Tool. Cambridge University Press.
  • Befani, Barbara, and John Mayne. “Process Tracing and Contribution Analysis: A Combined Approach to Generative Causal Inference for Impact Evaluation.” IDS Bulletin 45, no. 6 (2014): 17–36. doi:10.1111/1759-5436.12110.
    • Abstract: This article proposes a combination of a popular evaluation approach, contribution analysis (CA), with an emerging method for causal inference, process tracing (PT). Both are grounded in generative causality and take a probabilistic approach to the interpretation of evidence. The combined approach is tested on the evaluation of the contribution of a teaching programme to the improvement of school performance of girls, and is shown to be preferable to either CA or PT alone. The proposed procedure shows that established Bayesian principles and PT tests, based on both science and common sense, can be applied to assess the strength of qualitative and quali-quantitative observations and evidence, collected within an overarching CA framework; thus shifting the focus of impact evaluation from ‘assessing impact’ to ‘assessing confidence’ (about impact).

  • Punton, M., Welle, K., 2015. Straws-in-the-wind, Hoops and Smoking Guns: What can Process Tracing Offer to Impact Evaluation?
    • Abstract:  “This CDI Practice Paper by Melanie Punton and Katharina Welle explains the methodological and theoretical foundations of process tracing, and discusses its potential application in international development impact evaluations. It draws on two early applications of process tracing for assessing impact in international development interventions: Oxfam Great Britain (GB)’s contribution to advancing universal health care in Ghana, and the impact of the Hunger and Nutrition Commitment Index (HANCI) on policy change in Tanzania. In a companion to this paper, Practice Paper 10 Annex describes the main steps in applying process tracing and provides some examples of how these steps might be applied in practice.”
  • Weller, N., & Barnes, J. (2016). Pathway Analysis and the search for causal mechanisms. Sociological Methods & Research, 45(3), 424–457.
    • Abstract: The study of causal mechanisms interests scholars across the social sciences. Case studies can be a valuable tool in developing knowledge and hypotheses about how causal mechanisms function. The usefulness of case studies in the search for causal mechanisms depends on effective case selection, and there are few existing guidelines for selecting cases to study causal mechanisms. We outline a general approach for selecting cases for pathway analysis: a mode of qualitative research that is part of a mixed-method research agenda, which seeks to (1) understand the mechanisms or links underlying an association between some explanatory variable, X1, and an outcome, Y, in particular cases and (2) generate insights from these cases about mechanisms in the unstudied population of cases featuring the X1/Y relationship. The gist of our approach is that researchers should choose cases for comparison in light of two criteria. The first criterion is the expected relationship between X1/Y, which is the degree to which cases are expected to feature the relationship of interest between X1 and Y. The second criterion is variation in case characteristics or the extent to which the cases are likely to feature differences in characteristics that can facilitate hypothesis generation. We demonstrate how to apply our approach and compare it to a leading example of pathway analysis in the so-called resource curse literature, a prominent example of a correlation featuring a nonlinear relationship and multiple causal mechanisms.
  • Befani, Barbara, and Gavin Stedman-Bryce. “Process Tracing and Bayesian Updating for Impact Evaluation.” Evaluation, June 24, 2016. doi:10.1177/1356389016654584. (A toy sketch of the Bayesian updating step follows this list.)
    • Abstract: Commissioners of impact evaluation often place great emphasis on assessing the contribution made by a particular intervention in achieving one or more outcomes, commonly referred to as a ‘contribution claim’. Current theory-based approaches fail to provide evaluators with guidance on how to collect data and assess how strongly or weakly such data support contribution claims. This article presents a rigorous quali-quantitative approach to establish the validity of contribution claims in impact evaluation, with explicit criteria to guide evaluators in data collection and in measuring confidence in their findings. Coined ‘Contribution Tracing’, the approach is inspired by the principles of Process Tracing and Bayesian Updating, and attempts to make these accessible, relevant and applicable by evaluators. The Contribution Tracing approach, aided by a symbolic ‘contribution trial’, adds value to impact evaluation theory-based approaches by: reducing confirmation bias; improving the conceptual clarity and precision of theories of change; providing more transparency and predictability to data-collection efforts; and ultimately increasing the internal validity and credibility of evaluation findings, namely of qualitative statements. The approach is demonstrated in the impact evaluation of the Universal Health Care campaign, an advocacy campaign aimed at influencing health policy in Ghana.
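For readers new to the Bayesian vocabulary used in the last two items above, the sketch below shows the bare updating arithmetic that underlies process-tracing tests. It is not taken from any of the cited papers, and the probabilities are purely illustrative: evidence that is unlikely unless the claim is true (a smoking gun) shifts confidence far more than evidence that is expected but also common when the claim is false (a hoop test).

```python
# Toy Bayesian updating for a contribution claim H given a piece of evidence E.
# Illustrative numbers only; real applications must justify each probability.

def update(prior: float, p_e_if_true: float, p_e_if_false: float) -> float:
    """Bayes' rule: P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|~H)P(~H)]."""
    numerator = p_e_if_true * prior
    return numerator / (numerator + p_e_if_false * (1 - prior))

prior = 0.5  # start agnostic about the contribution claim

# "Smoking gun": evidence rarely found, and very unlikely if the claim is false.
print(update(prior, p_e_if_true=0.40, p_e_if_false=0.05))   # ~0.89

# "Hoop test" passed: evidence expected if the claim is true, but fairly common anyway.
print(update(prior, p_e_if_true=0.95, p_e_if_false=0.60))   # ~0.61
```

Failing the hoop test is the damaging outcome: with the same illustrative numbers, not observing the expected evidence drops the posterior to about 0.11.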

Conference: Next Generation Evaluation: Embracing Complexity, Connectivity and Change

“On Nov. 14th 2013, FSG and Stanford Social Innovation Review convened Next Generation Evaluation: Embracing Complexity, Connectivity and Change to discuss emerging ideas that are defining the future of social sector evaluation. The Conference brought together nearly 400 participants to learn about the trends driving the need for evaluation to evolve, the characteristics and approaches that represent Next Generation Evaluation, and potential implications for the social sector.”

The conference website provides 8 video recordings of presentations and pdf copies of many more.

Introducing Next Generation Evaluation, Hallie Preskill, Managing Director, FSG

Developmental Evaluation: An Approach to Evaluating Complex Social Change Initiatives, Kathy Brennan, Research and Evaluation Advisor, AARP

Shared Measurement: A Catalyst to Drive Collective Learning and Action, Patricia Bowie, Consultant, Magnolia Place Community Initiative

Using Data for Good: The Potential and Peril of Big Data, Lucy Bernholz, Visiting Scholar, Center on Philanthropy and Civil Society, Stanford University

Frontiers of Innovation: A Case Study in Using Developmental Evaluation to Improve Outcomes for Vulnerable Children, James Radner, Assistant Professor, University of Toronto

Project SAM: A Case Study in Shared Performance Measurement For Community Impact, Sally Clifford, Program Director, Experience Matters, and Tony Banegas, Philanthropy Advisor, Arizona Community Foundation

UN Global Pulse: A Case Study in Leveraging Big Data for Global Development, Robert Kirkpatrick, Director, UN Global Pulse

Panel: Implications for the Social Sector (“So What?”). Presenters: Lisbeth Schorr, Senior Fellow, Center for the Study of Social Policy; Fay Twersky, Director, Effective Philanthropy Group, The William and Flora Hewlett Foundation; Alicia Grunow, Senior Managing Partner, Design, Development, and Improvement Research, Carnegie Foundation for the Advancement of Teaching. Moderator: Srik Gopalakrishnan, Director, FSG

Small Group Discussion: Implications for Individuals and Organizations (“Now What?”), Moderator: Eva Nico, Director, FSG

Embracing Complexity, Connectivity, and Change, Brenda Zimmerman, Professor, York University

Rapid Review of Embedding Evaluation in UK Department for International Development

February 2014. Executive Summary … Final Report

“Purpose of the rapid review:  Since 2009/10, there has been a drive within the Department for International Development (DFID) to strengthen the evidence base upon which policy and programme decisions are made. Evaluation plays a central role in this and DFID has introduced a step change to embed evaluation more firmly within its programmes. The primary purpose of this rapid review is to inform DFID and the international development evaluation community of the progress made and the challenges and opportunities encountered in embedding evaluation across the organisation.”

Selected quotes:…

“There has been a strong drive to recruit, accredit and train staff in evaluation in DFID since 2011. There have been 25 advisers working in a solely or shared evaluation role, a further 12 advisers in roles with an evaluation component, 150 staff accredited in evaluation and 700 people receiving basic training. …While the scaling up of capacity has been rapid, the depth of this capacity is less than required. The number of embedded advisory posts created is significantly fewer than envisaged at the outset, with eight of 25 advisers working 50% or less on evaluation. ”

“The embedding evaluation approach has contributed to a significant, but uneven, increase in the quantity of evaluations commissioned by DFID. These have increased from around 12 per year, prior to 2011, to an estimated 40 completed evaluations in 2013/14”

“The focus of evaluation has changed to become almost exclusively programme oriented. There are very few thematic or country level evaluations planned whereas previously these types of evaluations accounted for the majority of DFID’s evaluation portfolio. This presents a challenge to DFID as it seeks to synthesise the learning from individual projects and programmes into broader lessons for policy and programme planning and design.”

“The embedding evaluation approach has been accompanied by a significant increase in the number of evaluations which has, in turn, led to an increase in the total amount spent on evaluation. However, the average total cost per evaluation has changed little since 2010. ”

“Externally procured evaluation costs appear to be in line with those of other donors. However, forecasts of future spending on evaluation indicate a likely increase in the median amount that DFID pays directly for evaluations. For non-impact evaluations the median budget is £200,000 and for IEs the median budget is £500,000. This represents a significant under-estimation of evaluation costs.”

“Evaluation accounts for a median of 1.9% of programme value, which is in line with expectations. The amount DFID spends on IEs is higher at 2.6% of programme value but this is consistent with the figures of other donors such as the Millennium Challenge Corporation and the World Bank”.

“There has been considerable enthusiasm shown by programme managers for conducting IEs, which now comprise 28% of planned evaluations.”

Do you have a Data Management Plan?

Sam Held discusses Data Management Plans in his 14 February 2014 AEA blog posting on Federal (US) Data Sharing Policies

“A recent trend in the STEM fields is the call to share or access research data, especially data collected with federal funding. The result is requirements from the federal agencies for data management plans in grants, but the different agencies have different requirements. NSF requires a plan for every grant, but NIH only requires plans for grants over $500,000.

The common theme in all policies is “data should be widely and freely available as possible while safeguarding the privacy of participants, and protecting confidential and proprietary data” (NIH’s Statement on Sharing Data 2/26/2003). The call for a data sharing plan forces the PIs, evaluators, and those involved with the proposals to consider what data will be collected, how will it be stored and preserved, and what will be the procedures for sharing or distributing the data within privacy or legal requirements (i.e., HIPAA or IRB requirements). To me – the most important feature here is data formatting. What format will the data be in now and still be accessible or usable in the future or to those who cannot afford expensive software?”
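On that last point about formats, one common (and here purely illustrative) answer is to keep the analysis data set in an open format such as CSV, accompanied by a plain machine-readable data dictionary, so that it remains usable without proprietary software. A minimal sketch with invented field names:

```python
# Hypothetical example: write study data to CSV and document it with a small
# JSON data dictionary, both readable without any proprietary software.
import csv
import json

records = [
    {"id": 1, "age": 34, "enrolled": True},
    {"id": 2, "age": 29, "enrolled": False},
]

with open("study_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "age", "enrolled"])
    writer.writeheader()
    writer.writerows(records)

data_dictionary = {
    "id": "Anonymised participant identifier",
    "age": "Age in years at enrolment",
    "enrolled": "Whether the participant completed enrolment (True/False)",
}
with open("study_data_dictionary.json", "w") as f:
    json.dump(data_dictionary, f, indent=2)
```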

He then points to DMPTool, an online system from the University of California for developing Data Management Plans. The site’s most useful component is its collection of funder requirements and more than 20 plan templates, including those for NIH, NSF, NEH, and some private foundations. See more at: http://aea365.org/blog/stem-tig-week-sam-held-on-federal-data-sharing-policies/

 
