Quantitative and Qualitative Methods in Impact Evaluation and Measuring Results

Governance and Social Development Resource Centre. Issues Paper by Sabine Garbarino and Jeremy Holland March 2009

1 Introduction
There has been a renewed interest in impact evaluation in recent years amongst development agencies and donors. Additional attention was drawn to the issue recently by a Center for Global Development (CGD) report calling for more rigorous impact evaluations, where ‘rigorous’ was taken to mean studies which tackle the selection bias aspect of the attribution problem (CGD, 2006). This argument was not universally well received in the development community; among other reasons there was the mistaken belief that supporters of rigorous impact evaluations were pushing for an approach solely based on randomised control trials (RCTs). While ‘randomisers’ have appeared to gain the upper hand in a lot of the debates, particularly in the United States, the CGD report in fact recognises a range of approaches, and the entity set up as a result of its efforts, 3ie, is moving even more strongly towards mixed methods (White, nd). The Department for International Development (DFID) in its draft policy statements similarly stresses the opportunities arising from a synthesis of quantitative and qualitative approaches in impact evaluation. Other work underway on ‘measuring results’ and ‘using numbers’ recognises the need to find standard indicators which capture non-material impacts and which are sensitive to social difference. This work also stresses the importance of supplementing standard indicators with narrative that can capture those dimensions of poverty that are harder to measure. This paper contributes to the ongoing debate on ‘more and better’ impact evaluations by highlighting experience on combining qualitative and quantitative methods for impact evaluation to ensure that we:

1. measure the different impact of donor interventions on different groups of people and

2. measure the different dimensions of poverty, particularly those that are not readily quantified but which poor people themselves identify as important, such as dignity, respect, security and power.

A third framing question was added during the discussions with DFID staff on the use of the research process itself as a way of increasing accountability and empowerment of the poor.

This paper does not intend to provide a detailed account of different approaches to impact evaluation nor an overview of proposed solutions to specific impact evaluation challenges. Instead it defines and reviews the case for combining qualitative and quantitative approaches to impact evaluation. An important principle that emerges in this discussion is that of equity, or what McGee (2003, 135) calls ‘equality of difference’. By promoting various forms of mixing we are moving methodological discussion away from a norm in development research in which qualitative research plays ‘second fiddle’ to conventional empiricist investigation. This means, for example, that contextual studies should not be used simply to confirm or ‘window dress’ the findings of non-contextual surveys. Instead they should play a more rigorous role of observing and evaluating impacts, even replacing, when appropriate, large-scale and lengthy surveys that can ‘overgenerate’ information in an untimely fashion for policy audiences.

The remainder of the paper is structured as follows. Section 2 briefly sets the scene by summarising the policy context. Section 3 clarifies the terminology surrounding qualitative and quantitative approaches, including participatory research. Section 4 reviews options for combining and sequencing qualitative and quantitative methods and data and looks at recent methodological innovations in measuring and analysing qualitative impacts. Section 5 addresses the operational issues to consider when combining methods in impact evaluation. Section 6 briefly concludes.

The Elusive Craft of Evaluating Advocacy

Original paper by Steven Teles, Department of Political Science, Johns Hopkins University, and Mark Schmitt, Roosevelt Institute. Published with support provided by The William and Flora Hewlett Foundation. Found courtesy of @alb202

A version of this paper was published in the Stanford Social Innovation Review in May 2011 and is available as a pdf.

“The political process is chaotic and often takes years to unfold, making it difficult to use traditional measures to evaluate the effectiveness of advocacy organizations. There are, however, unconventional methods one can use to evaluate advocacy organizations and make strategic investments in that arena”

Cost-Benefit Analysis in World Bank Projects

by Andrew Warner, Independent Evaluation Group, June 2010. Available as pdf

Cost-benefit analysis used to be one of the World Bank’s signature issues. It helped establish its reputation as the knowledge Bank and served to demonstrate its commitment to measuring results and ensuring accountability to taxpayers. It was the Bank’s answer to the results agenda long before that term became popular. This report takes stock of what has happened to cost-benefit analysis at the Bank, based on analysis of four decades of project data, project appraisal and completion reports from recent fiscal years, and interviews with current Bank staff. The percentage of projects that are justified by cost-benefit analysis has been declining for several decades, due to both a decline in standards and difficulty in applying cost-benefit analysis. Where cost-benefit analysis is applied to justify projects, there are examples of excellent analysis but also examples of a lack of attention to fundamental analytical issues such as the public sector rationale and comparison of the chosen project against alternatives. Cost-benefit analysis of completed projects is hampered by the failure to collect relevant data, particularly for low-performing projects. The Bank’s use of cost-benefit analysis for decisions is limited because the analysis is usually prepared after making the decision to proceed with the project.

This study draws two broad conclusions. First, the Bank needs to revisit the policy for cost-benefit analysis in a way that recognizes legitimate difficulties in quantifying benefits while preserving a high degree of rigor in justifying projects. Second, it needs to ensure that when cost-benefit analysis is done it is done with quality, rigor, and objectivity, as poor data and analysis misinform, and do not improve results. Reforms are required to project appraisal procedures to ensure objectivity, improve both the analysis and the use of evidence at appraisal, and ensure effective use of cost-benefit analysis in decision-making.

Evaluating Legislation and Other Non-Expenditure Instruments in the Area of Information Society and Media

March 2011, DG INFORMATION SOCIETY AND MEDIA

“[This manual] … addresses the pressing need for more guidance in evaluating legislation, which will become an increasingly important part of the [EC] policy officers’ work in the coming years, just as impact assessment is now. More focus on evaluation should be seen as an opportunity to enhance the quality of policy making.

It explains what the evaluation of legislation and other non-expenditure instruments is and gives simple and straightforward guidance on how to go about it. After reading the manual a reader will:

  • understand more fully what the purpose of evaluating legislation is and how it can help with policy work;
  • be able to structure well and efficiently conduct the evaluation process to obtain relevant and high-quality output and to maximize the use of the results of an evaluation in the policy cycle;
  • predict risks and difficulties in the evaluation process and be able to minimize them;
  • understand what the basic evaluation methods and techniques are, and how to use them effectively in the specific context of evaluating legislation;
  • be able to find additional guidance and support.

The manual is divided in two volumes. Volume 1 will guide a reader step-by-step through the entire evaluation process, from preparation and planning to data collection and analysis, to the dissemination and the use of evaluation results. Volume 2 contains a toolbox on methods and techniques tailored to the specific needs of evaluating legislation”

Additional Information/documentation

Mr Bartek Tokarz
European Commission – Directorate General Information Society and Media
Policy Coordination and Strategy
Evaluation and Monitoring
Tel no.:  +32 2 298 58 90
email: infso-c3 in the domain ec.europa.eu

MECHANISM EXPERIMENTS AND POLICY EVALUATIONS

Jens Ludwig, Jeffrey R. Kling, Sendhil Mullainathan, Working Paper 17062, NATIONAL BUREAU OF ECONOMIC RESEARCH, 1050 Massachusetts Avenue, Cambridge, MA 02138, May 2011. pdf copy available

A mechanism experiment is “an experiment that does not test a policy, but one which tests a causal mechanism that underlies a policy”

ABSTRACT
Randomized controlled trials are increasingly used to evaluate policies. How can we make these experiments as useful as possible for policy purposes? We argue greater use should be made of experiments that identify behavioral mechanisms that are central to clearly specified policy questions, what we call “mechanism experiments.” These types of experiments can be of great policy value even if the intervention that is tested (or its setting) does not correspond exactly to any realistic policy option.

RD comment: Well worth reading. Actually entertaining.

See also a blog posting about them by David McKenzie: What are “Mechanism Experiments” and should we be doing more of them? (6 June 2011)

“Intelligence is about creating and adjusting stories”

…says Gregory Treverton, in his Prospect article “What should we expect of our spies?”, June 2011

RD comment: How do you assess the performance of intelligence agencies, in the way they collect and make sense of the world around them? How do you explain their failure to anticipate some of the biggest developments of the last thirty years, including the collapse of the Soviet Union, the absence of weapons of mass destruction (WMD) in Iraq, and the contagion effects of the more recent Arab Spring?

The American intelligence agencies described by Treverton struggle to make sense of vast masses of information, much of which is incomplete and ambiguous. Storylines emerge and become dominant, which have some degree of fit with the surrounding political context. “Questions not asked or stories not imagined by policy are not likely to be developed by intelligence”. Referring to the end of the Soviet Union, Treverton identifies two possible counter-measures: “What we could have expected of intelligence was not better prediction but earlier and better monitoring of internal shortcomings. We could also have expected competing stories to challenge the prevailing one. Very late, in 1990, an NIE, “The deepening crisis in the USSR”, did just that, laying out four different scenarios, or stories, for the coming year”.

Discussing the WMD story, he remarks “the most significant part of the WMD story was what intelligence and policy shared: a deeply held mindset that Saddam must have WMD…In the end if most people believe one thing, arguing for another is hard. There is little pressure to rethink the issue and the few dissenters in intelligence are lost in the wilderness. What should have been expected from intelligence in this case was a section of the assessments asking what was the best case that could be made that Iraq did not have WMD.”

Both sets of suggestions seem to have some relevance to the production of evaluations. Should alternate interpretations be more visible? Should evaluation reports contain their own best counter-arguments (as a free-standing section, not simply as straw men to be dutifully propped up then knocked down)?

There are also other echoes in Treverton’s paper of the practice and problems of monitoring and evaluating aid interventions. One is the pressing demand for immediate information, at the expense of a long term perspective: “We used to do analysis, now we do reporting” says one American analyst. Some aid agency staff have reported similar problems. Impact evaluations? Yes, that would be good, but in reality we are busy meeting the demand for information about more immediate aspects of performance.

Interesting conclusions as well: “At the NIC, I came to think that, for all the technology, strategic analysis was best done in person. I came to think that our real products weren’t those papers, the NIEs. Rather they were the NIOs, the National Intelligence Officers—the experts, not papers. We all think we can absorb information more efficiently by reading, but my advice to my policy colleagues was to give intelligence officers some face time… In 20 minutes, though, the intelligence officers can sharpen the question, and the policy official can calibrate the expertise of the analyst. In that conversation, intelligence analysts can offer advice; they don’t need to be as tightly restricted as they are on paper by the “thou shalt not traffic in policy” edict. Expectations can be calibrated on both sides of the conversation. And the result might even be better policy.”

Evidence of the effectiveness of evidence?

Heart + Mind? Or Just Heart? Experiments in Aid Effectiveness (And a Contest!) by Dean Karlan, 27 May 2011. Found courtesy of @poverty_action

RD comment: There is a killer assumption behind many of the efforts being made to measure aid effectiveness: that evidence of the effectiveness of specific aid interventions will make a difference. That is, it will be used to develop better policies and practices. But, as far as I know, much less effort is being invested in testing this assumption, to find out when and where evidence works this way, or not. This is worrying, because anyone looking into how policies are actually made knows that it is often not a pretty picture.

That is why, contrary to my normal policy, I am publicising a blog posting. This posting is by Dean Karlan on an actual experiment that looks at the effect of providing evidence of an aid intervention (a specific form of micro-finance assistance) on the willingness of individual donors to make donations to the aid agency that is delivering the intervention. This relatively simple experiment is now underway.

Equally interesting is the fact that the author has launched, albeit on a very modest scale, a prediction market on the likely results of this experiment. Visitors to the blog are asked to make their predictions on the results of the experiment. When the results of the experiment are available Dean will identify and reward the most successful “bidder” (with two free copies of his new book More Than Good Intentions). Apart from the fun element involved, the use of a prediction market will enable Dean to identify to what extent his experiment has generated new knowledge [i.e. experiment results differ a lot from the average prediction], versus confirmed existing common knowledge [i.e. results = the average prediction]. That sort of thing does not happen very often.
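The comparison described above, between the experiment's result and the average prediction, can be sketched in a few lines of Python. This is only an illustration: the function name `compare_to_predictions`, the choice of the sample standard deviation as a measure of spread, and all the numbers are assumptions made here, not anything taken from Dean Karlan's experiment or blog.

```python
from statistics import mean, stdev

def compare_to_predictions(predictions, actual):
    """Summarise how far an observed result sits from the crowd's forecast.

    predictions: list of numeric forecasts submitted before the result is known
    actual: the observed result of the experiment
    """
    avg = mean(predictions)
    # Spread of the forecasts; with a single forecast there is no spread to report.
    spread = stdev(predictions) if len(predictions) > 1 else 0.0
    # A gap near zero suggests the result confirmed common knowledge;
    # a large gap suggests the experiment produced new knowledge.
    gap = actual - avg
    # Position of the forecast closest to the observed result (the best "bidder").
    winner = min(range(len(predictions)),
                 key=lambda i: abs(predictions[i] - actual))
    return {"mean_prediction": avg, "spread": spread,
            "gap": gap, "winner_index": winner}

# Hypothetical forecasts and result, in arbitrary units, for illustration only.
predictions = [2.0, 5.0, 8.0, 10.0, 5.0]
actual = 12.0
summary = compare_to_predictions(predictions, actual)
print(summary)
```

How large a gap counts as "new knowledge" is a judgment call; comparing the gap with the spread of the forecasts is one possible yardstick, not a rule stated in the post.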

So, I encourage you to visit Dean’s blog and participate. You do this by making your predictions using the Comment facility at the end of the blog (where you can also read others’ predictions already made, plus their comments).

Good Enough Guide to Impact Measurement – Rapid Onset Natural Disasters

[from the Emergency Capacity Building Project website]

Published on 6 April 2011

The Department for International Development (DfID / UKAID) awarded a grant for the ECB Project to develop a new Good Enough Guide to Impact Measurement. Led by Dr. Vivien Walden from Oxfam, a team of ECB specialists from CRS, Save the Children, and World Vision will work together with the British University of East Anglia (UEA).

This guide, and supporting capacity-building materials, will include the development of an impact measurement methodology for rapid onset natural disasters. The methodologies will be field tested by the editorial team in Pakistan and one other country location from September 2011 onwards.

The team welcomes suggestions and input on developing methodologies for impact measurement. Contact us with your ideas at info@ecbproject.org

Tools and Methods for Evaluating the Efficiency of Development Interventions

Palenberg, M. (2011), Evaluation Working Papers. Bonn: Bundesministerium für wirtschaftliche Zusammenarbeit und Entwicklung. Available as pdf.

Foreword:

Previous BMZ Evaluation Working Papers have focused on measuring impact. The present paper explores approaches for assessing efficiency. Efficiency is a powerful concept for decision making and ex-post assessments of development interventions but, nevertheless, often treated rather superficially in project appraisal, project completion and evaluation reports. Assessing efficiency is not an easy task but with potential for improvements, as the report shows. Starting with definitions and the theoretical foundations the author proposes a three level classification related to the analytical power of efficiency analysis methods. Based on an extensive literature review and a broad range of interviews, the report identifies and describes 15 distinct methods and explains how they can be used to assess efficiency. It concludes with an overall assessment of the methods described and with recommendations for their application and further development.

The “Real Book” for story evaluation methods

Marc Maxson, Irene Guijt, and others, 2010. GlobalGiving Foundation (supported by Rockefeller Foundation). Available as pdf. See also the related website.

[“Real Book” = The Real Book is a central part of the culture of playing music where improvisation is essential. Real books are not for beginners: the reader interprets scant notation, and builds on her own familiarity with chords. The Real Book allows musicians to play an approximate version of hundreds of new songs quickly]

About this book
“This is a collection of narratives that serve to illustrate some not-so-obvious lessons that affected our story pilot project in Kenya. We gathered a large body of community stories that revealed what people in various communities believed they needed, what services they were getting, and what they would like to see happen in the future. By combining many brief narratives with a few contextual questions we were able to compare and analyze thousands of stories. Taken together, these stories and their meanings provide a perspective with both depth and breadth: Broad enough to inform an organization’s strategic thinking about the root causes of social ailments, yet deep and real enough to provoke specific and immediate follow-up actions by the local organizations of whom community members speak.

We believe that local people are the “experts” on what they want and know who has (or has not) been helping them. And like democracy, letting them define the problems and solutions that deserve to be discussed is the best method we’ve found for aggregating that knowledge. Professionals working in this field can draw upon the wisdom of this crowd for understanding the local context, and build upon what they know. Community efforts are complex, and our aim is not to predict the future, but help local leaders manage the present. If projects are observed from many angles – especially by those for whom success affects their livelihood – and implementers use these perspectives to mitigate risks and avoid early failure, the probability of future success will be much greater.”

RD comment 1: See also a different perspective on the Global Giving experience: Networks of self-categorised stories, by Rick Davies

RD comment 2: What I like about this doc: 1. Lots of warts-and-all descriptions of data collection, with all the problems that occur in real life. 2. The imaginative improvement on Cognitive Edge’s use of triads as self-signifying tools: a circular device called the story marbles approach. This enables respondents to choose which of x categories they will use and then indicate to what extent each of these categories applies to their story. It meets the requirement the author described thus: “What we need is a means to let the storyteller define the right question while also constraining the possible questions enough that we will derive useful clusters of stories with similar question frames.”