Evaluation Revisited – Improving the Quality of Evaluative Practice by Embracing Complexity

Utrecht Conference Report. Irene Guijt, Jan Brouwers, Cecile Kusters, Ester Prins and Bayaz Zeynalova. March 2011. Available as PDF.

This report summarises the outline and outputs of the conference ‘Evaluation Revisited: Improving the Quality of Evaluative Practice by Embracing Complexity’, which took place on May 20-21, 2010. It also adds insights and observations on the conference themes that emerged in subsequent presentations about the conference at specific events.

Contents (109 pages):

1 What is Contested and What is at Stake
1.1 Trends at Loggerheads
1.2 What is at Stake?
1.3 About the May Conference
1.4 About the Report
2 Four Concepts Central to the Conference
2.1 Rigour
2.2 Values
2.3 Standards
2.4 Complexity
3 Three Questions and Three Strategies for Change
3.1 What does ‘evaluative practice that embraces complexity’ mean in practice?
3.2 Trade-offs and their Consequences
3.3 (Re)legitimise Choice for Complexity
4 The Conference Process in a Nutshell

Behavioral economics and randomized trials: trumpeted, attacked and parried

This is the title of a blog posting by Chris Blattman, which points to and comments on a debate in the Boston Review, March/April 2011.

The focus of the debate is an article by Rachel Glennerster and Michael Kremer, titled Small Changes, Big Results: Behavioral Economics at Work in Poor Countries.

“Behavioral economics has changed the way we implement public policy in the developed world. It is time we harness its approaches to alleviate poverty in developing countries as well.”

This article is part of Small Changes, Big Results, a forum on applying behavioral economics to global development. The forum includes the following seven responses to Glennerster and Kremer, together with their reply.

Diane Coyle: There’s nothing irrational about rising prices and falling demand. (March 14)

Eran Bendavid: Randomized trials are not infallible—just look at medicine. (March 15)

Pranab Bardhan: As the experimental program becomes its own kind of fad, other issues in development are being ignored. (March 16)

José Gómez-Márquez: We want to empower locals to invent, so they can be collaborators, not just clients. (March 17)

Chloe O’Gara: You can’t teach a child to read with an immunization schedule. (March 17)

Jishnu Das, Shantayanan Devarajan, and Jeffrey S. Hammer: Even if experiments show us what to do, can we rely on government action? (March 18)

Daniel N. Posner: We cannot hope to understand individual behavior apart from the community itself. (March 21)

Rachel Glennerster and Michael Kremer reply: Context is important, and meticulous experimentation can improve our understanding of it. (March 22)

PS (26th March 2011): See also Ben Goldacre’s Bad Science column in today’s Guardian: Unlikely boost for clinical trials (When ethics committees kill).

“At present there is a bizarre paradox in medicine. When there is no evidence on which treatment is best, out of two available options, then you can choose one randomly at will, on a whim, in clinic, and be subject to no special safeguards. If, however, you decide to formally randomise in the same situation, and so generate new knowledge to improve treatments now and in the future, then suddenly a world of administrative obstruction opens up before you.

This is not an abstract problem. Here is one example. For years in A&E, patients with serious head injury were often treated with steroids, in the reasonable belief that this would reduce swelling, and so reduce crushing damage to the brain, inside the fixed-volume box of your skull.

Researchers wanted to randomise unconscious patients to receive steroids, or no steroids, instantly in A&E, to find out which was best. This was called the CRASH trial, and it was a famously hard fought battle with ethics committees, even though both treatments – steroids, or no steroids – were in widespread, routine use. Finally, when approval was granted, it turned out that steroids were killing patients.”

When is the rigorous impact evaluation of development projects a luxury, and when is it a necessity?

by Michael Clemens and Gabriel Demombynes, Center for Global Development, 10/11/2010. Download (PDF, 733 KB)

“The authors study one high-profile case: the Millennium Villages Project (MVP), an experimental and intensive package intervention to spark sustained local economic development in rural Africa. They illustrate the benefits of rigorous impact evaluation in this setting by showing that estimates of the project’s effects depend heavily on the evaluation method.

Comparing trends at the MVP intervention sites in Kenya, Ghana, and Nigeria to trends in the surrounding areas yields much more modest estimates of the project’s effects than the before-versus-after comparisons published thus far by the MVP. Neither approach constitutes a rigorous impact evaluation of the MVP, which is impossible to perform due to weaknesses in the evaluation design of the project’s initial phase. These weaknesses include the subjective choice of intervention sites, the subjective choice of comparison sites, the lack of baseline data on comparison sites, the small sample size, and the short time horizon. They describe how the next wave of the intervention could be designed to allow proper evaluation of the MVP’s impact at little additional cost.”
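The methodological point is easy to see with a toy calculation. The sketch below (Python, with purely hypothetical numbers, not the MVP data) contrasts a before-versus-after estimate with one that nets out the trend in surrounding areas; as the paper stresses, neither calculation is rigorous when sites are chosen subjectively.

```python
# Toy illustration with hypothetical numbers (not MVP data): why a
# before-versus-after comparison can overstate a project's effect relative
# to a comparison that nets out the trend in surrounding areas.

# Hypothetical outcome index at an intervention village and a nearby area
village_before, village_after = 100.0, 130.0           # project site
surrounding_before, surrounding_after = 100.0, 120.0   # non-project area

# Before-versus-after: attributes the entire change to the project
before_after_estimate = village_after - village_before

# Difference-in-differences: subtracts the change in the surrounding area
did_estimate = ((village_after - village_before)
                - (surrounding_after - surrounding_before))

print(f"Before-versus-after estimate:       {before_after_estimate:+.1f}")  # +30.0
print(f"Difference-in-differences estimate: {did_estimate:+.1f}")           # +10.0
```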

See responses to this paper here.

Do we need a Minimum Level of Failure (MLF)?

This is the title of a new posting on the Rick on the Road blog, the editorial arm of Monitoring and Evaluation NEWS. It argues that improving aid effectiveness by identifying and culling the worst performers is a different, and possibly more appropriate, strategy than identifying and replicating the best performers. This argument ties directly into the debate about RCTs, which some regard as the best means of improving aid effectiveness.
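As a purely illustrative sketch (all numbers hypothetical, drawn neither from the post nor from DFID data), the two strategies act on opposite ends of a portfolio’s performance distribution:

```python
# Purely illustrative sketch (hypothetical numbers) of the two portfolio
# strategies contrasted in the post: culling the worst-performing projects
# versus replicating the best-performing ones.
import random
import statistics

random.seed(7)

# A hypothetical portfolio of 100 projects with effectiveness scores.
portfolio = [random.gauss(5, 2) for _ in range(100)]
baseline_mean = statistics.mean(portfolio)

# Strategy A: enforce a "minimum level of failure" by cutting the bottom 10%.
culled = sorted(portfolio)[10:]
culling_mean = statistics.mean(culled)

# Strategy B: replicate the top 10%, assuming (hypothetically) that copies
# achieve only 70% of the originals' effectiveness in new contexts.
top = sorted(portfolio)[-10:]
replicated = portfolio + [0.7 * score for score in top]
replication_mean = statistics.mean(replicated)

print(f"Baseline portfolio mean:        {baseline_mean:.2f}")
print(f"After culling the bottom 10%:   {culling_mean:.2f}")
print(f"After replicating the top 10%:  {replication_mean:.2f}")
# Which strategy comes out ahead depends entirely on the assumed replication
# fidelity and on how reliably each tail can be identified; the sketch only
# makes the contrast between the two strategies concrete.
```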

PS 22 October: Kirstin Hinds of the DFID Evaluation Department has pointed out (in a reply to my original blog posting) that DFID has published a more recent independent review of project completion reports (covering 2005-2008) which may be of interest.

Other recent postings can be found on the Rick on the Road blog; a full list of all editorial posts is available here.

“Instruments, Randomization and Learning about Development”

Angus Deaton, Research Program in Development Studies, Center for Health and Wellbeing, Princeton University, March 2010. Full text as PDF

ABSTRACT
There is currently much debate about the effectiveness of foreign aid and about what kind of projects can engender economic development. There is skepticism about the ability of econometric analysis to resolve these issues, or of development agencies to learn from their own experience. In response, there is increasing use in development economics of randomized controlled trials (RCTs) to accumulate credible knowledge of what works, without over-reliance on questionable theory or statistical methods. When RCTs are not possible, the proponents of these methods advocate quasi-randomization through instrumental variable (IV) techniques or natural experiments. I argue that many of these applications are unlikely to recover quantities that are useful for policy or understanding: two key issues are the misunderstanding of exogeneity, and the handling of heterogeneity. I illustrate from the literature on aid and growth. Actual randomization faces similar problems as does quasi-randomization, notwithstanding rhetoric to the contrary. I argue that experiments have no special ability to produce more credible knowledge than other methods, and that actual experiments are frequently subject to practical problems that undermine any claims to statistical or epistemic superiority. I illustrate using prominent experiments in development and elsewhere. As with IV methods, RCT-based evaluation of projects, without guidance from an understanding of underlying mechanisms, is unlikely to lead to scientific progress in the understanding of economic development. I welcome recent trends in development experimentation away from the evaluation of projects and towards the evaluation of theoretical mechanisms.
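One of the abstract’s central points, that randomisation identifies an average effect which can mask policy-relevant heterogeneity, can be made concrete with a small simulation. The sketch below is a hypothetical illustration, not an example from the paper; the sub-groups, effect sizes, and population shares are invented.

```python
# Hypothetical simulation: a clean RCT recovers the *average* treatment
# effect, which can conceal strongly heterogeneous responses.
import random

random.seed(1)
N = 20_000

def true_effect(credit_constrained: bool) -> float:
    # Invented sub-group effects: the programme helps one group, harms the other
    return 2.0 if credit_constrained else -0.5

treated_outcomes, control_outcomes = [], []
for _ in range(N):
    constrained = random.random() < 0.3          # 30% of this hypothetical population
    assigned_treatment = random.random() < 0.5   # randomised assignment
    outcome = random.gauss(0, 1)                 # baseline outcome plus noise
    if assigned_treatment:
        outcome += true_effect(constrained)
        treated_outcomes.append(outcome)
    else:
        control_outcomes.append(outcome)

ate_hat = (sum(treated_outcomes) / len(treated_outcomes)
           - sum(control_outcomes) / len(control_outcomes))

# True average effect is 0.3 * 2.0 + 0.7 * (-0.5) = 0.25: positive on average,
# even though 70% of participants are made worse off in this toy setup.
print(f"Estimated average treatment effect: {ate_hat:.2f}")
```

Extrapolating such an average to a context with a different mix of sub-groups could change the sign of the predicted impact, which is one reason the abstract stresses understanding the underlying mechanisms rather than evaluating projects in isolation.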

See also Why Works? by Lawrence Haddad, Development Horizons blog

See also Carlos Barahona’s Randomised Control Trials for the Impact Evaluation of Development Initiatives: A Statistician’s Point of View. Introduction: This paper [an ILAC Working Paper] contains the technical and practical reflections of a statistician on the use of Randomised Control Trial (RCT) designs for evaluating the impact of development initiatives. It is divided into three parts. The first part discusses RCTs in impact evaluation, their origin, how they have developed, and the debate they have generated in evaluation circles. The second part examines difficult issues faced in applying RCT designs to the impact evaluation of development initiatives: to what extent this type of design can be applied rigorously, the validity of the assumptions underlying RCT designs in this context, and the opportunities and constraints inherent in their adoption. The third part discusses some of the ethical issues raised by RCTs, the need to establish ethical standards for studies of development options, and the need for an open mind in the selection of research methods and tools.
