How should we understand “clinical equipoise” when doing RCTs in development?

World Bank Blogs

 Submitted by David McKenzie on 2013/09/02

While the blog was on break over the last month, a couple of posts caught my attention that discussed whether it is ethical to do experiments on programs that we think we know will make people better off. First up, Paul Farmer on the Lancet Global Health blog writes:

“What happens when people who previously did not have access are provided with the kind of health care that most of The Lancet’s readership takes for granted? Not very surprisingly, health outcomes are improved: fewer children die when they are vaccinated against preventable diseases; HIV-infected patients survive longer when they are treated with antiretroviral therapy (ART); maternal deaths decline when prenatal care is linked to caesarean sections and anti-haemorrhagic agents to address obstructed labour and its complications; and fewer malaria deaths occur, and drug-resistant strains are slower to emerge, when potent anti-malarials are used in combination rather than as monotherapy.

It has long been the case that randomized clinical trials have been held up as the gold standard of clinical research… This kind of study can only be carried out ethically if the intervention being assessed is in equipoise, meaning that the medical community is in genuine doubt about its clinical merits. It is troubling, then, that clinical trials have so dominated outcomes research when observational studies of interventions like those cited above, which are clearly not in equipoise, are discredited to the point that they are difficult to publish”

This was followed by a post by Eric Djimeu on the 3ie blog asking what else development economics should be learning from clinical trials.

Evaluating simple interventions that turn out to be not so simple

Conditional Cash Transfer (CCT) programs have been cited in the past as examples of projects that are suitable for testing via randomised controlled trials. They are relatively simple interventions that can be delivered in a standardised manner. Or so it seemed.

Last year Lant Pritchett, Salimah Samji and Jeffrey Hammer wrote this interesting (if at times difficult to read) paper, “It’s All About MeE: Using Structured Experiential Learning (‘e’) to Crawl the Design Space” (the abstract is reproduced below). In the course of that paper they argued that CCT programs are not as simple as they might seem. Looking at three real-life examples, they identified at least 10 different characteristics of CCTs that need to be specified correctly in order for them to work as expected. Some of these involve binary choices (whether to do x or y) and some involve tuning of a numerical variable. This means there were at least 2 to the power of 10, i.e. 1,024, different possible designs. They also pointed out that while changes to some of these characteristics make only a small difference to the results achieved, others, including some binary choices, can make quite major differences. In other words, overall it may well be a rugged rather than a smooth design space (a toy illustration is sketched below). The question then arises: how well suited are RCTs to exploring such spaces?
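To make the combinatorics concrete, here is a minimal Python sketch. The design-choice names and the toy outcome function are invented for illustration and are not taken from the paper; the point is simply that ten yes/no choices already yield 1,024 candidate designs, and that in a rugged space flipping a single choice can change the result substantially.

```python
# Minimal sketch (not from the Pritchett et al. paper): ten binary design choices
# give 2**10 = 1024 candidate designs, and a "rugged" outcome surface means that
# flipping a single choice can change results a lot.
from itertools import product

# Ten hypothetical binary design choices for a CCT (names invented for illustration).
CHOICES = [
    "condition_on_attendance", "condition_on_health_visits", "pay_mother",
    "pay_per_child", "monthly_payment", "verify_compliance",
    "penalise_first_breach", "graduated_benefit", "include_secondary_school",
    "pay_via_bank_account",
]

designs = list(product([0, 1], repeat=len(CHOICES)))
print(len(designs))  # 1024 -- every combination of the ten yes/no choices

# Toy outcome function (assumed): a smooth part plus one bad interaction that
# creates a "cliff" in the design space.
def toy_outcome(design):
    smooth = 0.1 * sum(design)  # smooth part: more features, slightly better
    cliff = 0.5 if (design[0] and not design[5]) else 0.0  # conditioning without verification hurts
    return smooth - cliff

d1 = (1, 1, 1, 1, 1, 0, 0, 0, 0, 0)  # verify_compliance switched off
d2 = (1, 1, 1, 1, 1, 1, 0, 0, 0, 0)  # identical except verify_compliance switched on
print(f"{toy_outcome(d1):.2f} vs {toy_outcome(d2):.2f}")  # 0.00 vs 0.60: one flip, a large difference
```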

Today the World Bank Development Blog posted an interesting confirmation of the point made in the Pritchett et al. paper, in a post titled: Defining Conditional Cash Transfer Programs: An Unconditional Mess. In effect they point out that the design space is even more complicated than Pritchett et al. describe. They conclude:

So, if you’re a donor or a policymaker, it is important not to frame your question to be about the relative effectiveness of “conditional” vs. “unconditional” cash transfer programs: the line between these concepts is too blurry. It turns out that your question needs to be much more precise than that. It is better to define the feasible range of options available to you first (politically, ethically, etc.), and then go after evidence of relative effectiveness of design options along the continuum from a pure UCT to a heavy-handed CCT. Alas, that evidence is the subject of another post…

So stay tuned for their next installment. Of course, you could quibble that even this conclusion is a bit optimistic, in that it talks about a continuum of design options, when in fact it is a multi-dimensional space with both smooth and rugged bits.

PS: Here is the abstract of the Pritchett et al. paper:

“There is an inherent tension between implementing organizations—which have specific objectives and narrow missions and mandates—and executive organizations—which provide resources to multiple implementing organizations. Ministries of finance/planning/budgeting allocate across ministries and projects/programmes within ministries, development organizations allocate across sectors (and countries), foundations or philanthropies allocate across programmes/grantees. Implementing organizations typically try to do the best they can with the funds they have and attract more resources, while executive organizations have to decide what and who to fund. Monitoring and Evaluation (M&E) has always been an element of the accountability of implementing organizations to their funders. There has been a recent trend towards much greater rigor in evaluations to isolate causal impacts of projects and programmes and more ‘evidence base’ approaches to accountability and budget allocations. Here we extend the basic idea of rigorous impact evaluation—the use of a valid counter-factual to make judgments about causality—to emphasize that the techniques of impact evaluation can be directly useful to implementing organizations (as opposed to impact evaluation being seen by implementing organizations as only an external threat to their funding). We introduce structured experiential learning (which we add to M&E to get MeE) which allows implementing agencies to actively and rigorously search across alternative project designs using the monitoring data that provides real time performance information with direct feedback into the decision loops of project design and implementation. Our argument is that within-project variations in design can serve as their own counter-factual and this dramatically reduces the incremental cost of evaluation and increases the direct usefulness of evaluation to implementing agencies. The right combination of M, e, and E provides the right space for innovation and organizational capability building while at the same time providing accountability and an evidence base for funding agencies.” Paper available as PDF.

I especially like this point about within-project variation (which I have argued for in the past): “Our argument is that within-project variations in design can serve as their own counter-factual and this dramatically reduces the incremental cost of evaluation and increases the direct usefulness of evaluation to implementing agencies.”
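As a rough illustration of that idea, here is a small simulation sketch of my own (assumed effect sizes and invented data, nothing from the paper): units within the same project are assigned to two variants of the design, and each variant's outcomes act as the counterfactual for the other.

```python
# Rough sketch (simulated data, assumed effect sizes -- not from the paper) of
# within-project design variation serving as its own counterfactual: units in the
# same project receive variant A or variant B, and the variants are compared directly.
import random
import statistics

random.seed(1)

# Simulated outcomes for units assigned to two design variants within one project.
variant_a = [random.gauss(0.50, 0.10) for _ in range(200)]  # e.g. the default design
variant_b = [random.gauss(0.56, 0.10) for _ in range(200)]  # e.g. a tweaked design

diff = statistics.mean(variant_b) - statistics.mean(variant_a)
se = (statistics.variance(variant_a) / len(variant_a)
      + statistics.variance(variant_b) / len(variant_b)) ** 0.5

print(f"estimated effect of the design tweak: {diff:.3f} (standard error {se:.3f})")
# The comparison group comes from inside the project itself, which is what keeps
# the incremental cost of this kind of evaluation low.
```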

 

Impact Evaluation Toolkit: Measuring the Impact of Results Based Financing on Maternal and Child Health

Christel Vermeersch, Elisa Rothenbühler, Jennifer Renee Sturdy, for the World Bank
Version 1.0. June 2012

Download full document: English [PDF, 3.83MB] / Español [PDF, 3.47MB] / Francais [PDF, 3.97MB]

View online: http://www.worldbank.org/health/impactevaluationtoolkit

“The Toolkit was developed with funding from the Health Results Innovation Trust Fund (HRITF). The objective of the HRITF is to design, implement and evaluate sustainable results-based financing (RBF) pilot programs that improve maternal and child health outcomes for accelerating progress towards reaching MDGs 1c, 4 & 5. A key element of this program is to ensure a rigorous and well designed impact evaluation is embedded in each country’s RBF project in order to document the extent to which RBF programs are effective, operationally feasible, and under what circumstances. The evaluations are essential for generating new evidence that can inform and improve RBF, not only in the HRITF pilot countries, but also elsewhere. The HRITF finances grants for countries implementing RBF pilots, knowledge and learning activities, impact evaluations, as well as analytical work.”

Evaluating the Evaluators: Some Lessons from a Recent World Bank Self-Evaluation

February 21, 2012 blog posting by Johannes Linn, at Brookings
Found via @WorldBank_IEG tweet

“The World Bank’s Independent Evaluation Group (IEG) recently published a self-evaluation of its activities. Besides representing current thinking among evaluation experts at the World Bank, it also more broadly reflects some of the strengths and gaps in the approaches that evaluators use to assess and learn from the performance of the international institutions with which they work…. Johannes Linn served as an external peer reviewer of the self-evaluation and provides a bird’s-eye view on the lessons learned.”

Key lessons as seen by Linn

  • An evaluation of evaluations should focus not only on process, but also on the substantive issues that the institution is grappling with.
  • An evaluation of the effectiveness of evaluations should include a professional assessment of the quality of evaluation products.
  • An evaluation of evaluations should assess:
    o How effectively impact evaluations are used;
    o How scaling up of successful interventions is treated;
    o How the experience of other comparable institutions is utilized;
    o Whether and how the internal policies, management practices and incentives of the institution are effectively assessed;
    o Whether and how the governance of the institution is evaluated; and
    o Whether and how internal coordination, cooperation and synergy among units within the organizations are assessed.

Read the complete posting, with arguments behind each of the above points, here

World Bank – Raising the Bar on Transparency, Accountability and Openness

Blog posting by Hannah George on Thu, 02/16/2012 – 18:01. Found via @TimShorten

“The World Bank has taken landmark steps to make information accessible to the public and globally promote transparency and accountability, according to the first annual report on the World Bank’s Access to Information (AI) Policy. [20/02/2012 – link is not working – here is a link to a related doc, World Bank Policy on Access to Information Progress Report: January through March 2011]

“The World Bank’s Access to Information Policy continues to set the standard for other institutions to strive for,” said Chad Dobson, executive director of the Bank Information Center. Publish What You Fund recently rated the Bank “best performer” in terms of aid transparency out of 58 donors for the second year in a row. Furthermore, the Center for Global Development and Brookings ranked the International Development Association (the World Bank’s Fund for the Poorest) as a top donor in transparency and learning in its 2011 Quality of Official Development Assistance Assessment (QuODA).”

Cost-Benefit Analysis in World Bank Projects

by Andrew Warner, Independent Evaluation Group, June 2010. Available as pdf

Cost-benefit analysis used to be one of the World Bank’s signature issues. It helped establish its reputation as the knowledge Bank and served to demonstrate its commitment to measuring results and ensuring accountability to taxpayers. It was the Bank’s answer to the results agenda long before that term became popular. This report takes stock of what has happened to cost-benefit analysis at the Bank, based on analysis of four decades of project data, project appraisal and completion reports from recent fiscal years, and interviews with current Bank staff. The percentage of projects that are justified by cost-benefit analysis has been declining for several decades, due to both a decline in standards and difficulty in applying cost-benefit analysis. Where cost-benefit analysis is applied to justify projects, there are examples of excellent analysis but also examples of a lack of attention to fundamental analytical issues such as the public sector rationale and comparison of the chosen project against alternatives. Cost-benefit analysis of completed projects is hampered by the failure to collect relevant data, particularly for low-performing projects. The Bank’s use of cost-benefit analysis for decisions is limited because the analysis is usually prepared after making the decision to proceed with the project.

This study draws two broad conclusions. First, the Bank needs to revisit the policy for cost-benefit analysis in a way that recognizes legitimate difficulties in quantifying benefits while preserving a high degree of rigor in justifying projects. Second, it needs to ensure that when cost-benefit analysis is done it is done with quality, rigor, and objectivity, as poor data and analysis misinform, and do not improve results. Reforms are required to project appraisal procedures to ensure objectivity, improve both the analysis and the use of evidence at appraisal, and ensure effective use of cost-benefit analysis in decision-making.

WRITING TERMS OF REFERENCE FOR AN EVALUATION: A HOW-TO GUIDE

Independent Evaluation Group, World Bank 2011. Available as pdf.

“The terms of reference (ToR) document defines all aspects of how a consultant or a team will conduct an evaluation. It defines the objectives and the scope of the evaluation, outlines the responsibilities of the consultant or team, and provides a clear description of the resources available to conduct the study. Developing an accurate and well-specified ToR is a critical step in managing a high-quality evaluation. The evaluation ToR document serves as the basis for a contractual arrangement with one or more evaluators and sets the parameters against which the success of the assignment can be measured.

The specific content and format for a ToR will vary to some degree based on organizational requirements, local practices, and the type of assignment. However, a few basic principles and guidelines inform the development of any evaluation ToR. This publication provides user-friendly guidance for writing ToRs by covering the following areas:

1. Definition and function. What is a ToR? When is one needed? What are its objectives? This section also highlights how an evaluation ToR is different from other ToRs.
2. Content. What should be included in a ToR? What role(s) will each of the sections of the document serve in supporting and facilitating the completion of a high-quality evaluation?
3. Preparation. What needs to be in place for a practitioner or team to develop the ToR for an evaluation or review?
4. Process. What steps should be taken to develop an effective ToR? Who should be involved for each of these steps?

A quality checklist and some Internet resources are included in this publication to foster good practice in writing ToRs for evaluations and reviews of projects and programs. The publication also provides references and resources for further information.”

[RD Comment: See also: Guidance on Terms of Reference for an Evaluation: A List, listing ToRs guidance documents produced by 9 different organisations]

Impact Evaluation in Practice

Paul J. Gertler, Sebastian Martinez, Patrick Premand, Laura B. Rawlings, Christel M. J. Vermeersch, World Bank, 2011

Impact Evaluation in Practice is available as downloadable pdf, and can be bought online.

“Impact Evaluation in Practice presents a non-technical overview of how to design and use impact evaluation to build more effective programs to alleviate poverty and improve people’s lives. Aimed at policymakers, project managers and development practitioners, the book offers experts and non-experts alike a review of why impact evaluations are important and how they are designed and implemented. The goal is to further the ability of policymakers and practitioners to use impact evaluations to help make policy decisions based on evidence of what works most effectively.

The book is accompanied by a set of training material — including videos and PowerPoint presentations — developed for the “Turning Promises to Evidence” workshop series of the Office of the Chief Economist for Human Development. It is a reference and self-learning tool for policy-makers interested in using impact evaluations and was developed to serve as a manual for introductory courses on impact evaluation as well as a teaching resource for trainers in academic and policy circles.”

CONTENTS
PART ONE. INTRODUCTION TO IMPACT EVALUATION
Chapter 1. Why Evaluate?
Chapter 2. Determining Evaluation Questions
PART TWO. HOW TO EVALUATE
Chapter 3. Causal Inference and Counterfactuals
Chapter 4. Randomized Selection Methods
Chapter 5. Regression Discontinuity Design
Chapter 6. Difference-in-Differences
Chapter 7. Matching
Chapter 8. Combining Methods
Chapter 9. Evaluating Multifaceted Programs
PART THREE. HOW TO IMPLEMENT AN IMPACT EVALUATION
Chapter 10. Operationalizing the Impact Evaluation Design
Chapter 11. Choosing the Sample
Chapter 12. Collecting Data
Chapter 13. Producing and Disseminating Findings
Chapter 14. Conclusion
