Evaluating Legislation and Other Non-Expenditure Instruments in the Area of Information Society and Media

March 2011, DG INFORMATION SOCIETY AND MEDIA

[This manual] “… addresses the pressing need for more guidance in evaluating legislation, which will become an increasingly important part of the [EC] policy officers’ work in the coming years, just as impact assessment is now. More focus on evaluation should be seen as an opportunity to enhance the quality of policy making.

It explains what the evaluation of legislation and other non-expenditure instruments is and gives simple and straightforward guidance on how to go about it. After reading the manual a reader will:

  • understand more fully what the purpose of evaluating legislation is and how it can help with policy work;
  • be able to structure and conduct the evaluation process well and efficiently, to obtain relevant, high-quality output and to maximize the use of the results of an evaluation in the policy cycle;
  • predict risks and difficulties in the evaluation process and be able to minimize them;
  • understand what the basic evaluation methods and techniques are, and how to use them effectively in the specific context of evaluating legislation;
  • be able to find additional guidance and support.

The manual is divided into two volumes. Volume 1 will guide a reader step-by-step through the entire evaluation process, from preparation and planning, through data collection and analysis, to the dissemination and use of evaluation results. Volume 2 contains a toolbox of methods and techniques tailored to the specific needs of evaluating legislation.”


Additional Information/documentation

Mr Bartek Tokarz
European Commission – Directorate General Information Society and Media
Policy Coordination and Strategy
Evaluation and Monitoring
Tel no.:  +32 2 298 58 90
email: infso-c3 in the domain ec.europa.eu

MECHANISM EXPERIMENTS AND POLICY EVALUATIONS

 

Jens Ludwig, Jeffrey R. Kling, Sendhil Mullainathan, Working Paper 17062, National Bureau of Economic Research, 1050 Massachusetts Avenue, Cambridge, MA 02138, May 2011. PDF copy available.

A mechanism experiment is “an experiment that does not test a policy, but one which tests a causal mechanism that underlies a policy”

ABSTRACT
Randomized controlled trials are increasingly used to evaluate policies. How can we make these experiments as useful as possible for policy purposes? We argue greater use should be made of experiments that identify behavioral mechanisms that are central to clearly specified policy questions, what we call “mechanism experiments.” These types of experiments can be of great policy value even if the intervention that is tested (or its setting) does not correspond exactly to any realistic policy option.

RD comment: Well worth reading. Actually entertaining.

See also a blog posting about them by David McKenzie, posted 6 June 2011: What are “Mechanism Experiments” and should we be doing more of them?

“Intelligence is about creating and adjusting stories”

…says Gregory Treverton in his Prospect article “What should we expect of our spies?”, June 2011.

RD comment: How do you assess the performance of intelligence agencies, in the way they collect and make sense of the world around them? How do you explain their failure to predict some of the biggest developments in the last thirty years, including the collapse of the Soviet Union, the failure to find weapons of mass destruction (WMD) in Iraq, and the contagion effects in the more recent Arab Spring?

The American intelligence agencies described by Treverton struggle to make sense of vast masses of information, much of which is incomplete and ambiguous. Storylines that have some degree of fit with the surrounding political context emerge and become dominant. “Questions not asked or stories not imagined by policy are not likely to be developed by intelligence”. Referring to the end of the Soviet Union, Treverton identifies two possible counter-measures: “What we could have expected of intelligence was not better prediction but earlier and better monitoring of internal shortcomings. We could also have expected competing stories to challenge the prevailing one. Very late, in 1990, an NIE, ‘The deepening crisis in the USSR’, did just that, laying out four different scenarios, or stories, for the coming year.”

Discussing the WMD story, he remarks: “the most significant part of the WMD story was what intelligence and policy shared: a deeply held mindset that Saddam must have WMD… In the end if most people believe one thing, arguing for another is hard. There is little pressure to rethink the issue and the few dissenters in intelligence are lost in the wilderness. What should have been expected from intelligence in this case was a section of the assessments asking what was the best case that could be made that Iraq did not have WMD.”

Both sets of suggestions seem to have some relevance to the production of evaluations. Should alternative interpretations be more visible? Should evaluation reports contain their own best counter-arguments (as a free-standing section, not simply as straw men to be dutifully propped up and then knocked down)?

There are also other echoes in Treverton’s paper of the practice and problems of monitoring and evaluating aid interventions. One is the pressing demand for immediate information, at the expense of a long-term perspective: “We used to do analysis, now we do reporting”, says one American analyst. Some aid agency staff have reported similar problems. Impact evaluations? Yes, that would be good, but in reality we are busy meeting the demand for information about more immediate aspects of performance.

Interesting conclusions as well: “At the NIC, I came to think that, for all the technology, strategic analysis was best done in person. I came to think that our real products weren’t those papers, the NIEs. Rather they were the NIOs, the National Intelligence Officers—the experts, not papers. We all think we can absorb information more efficiently by reading, but my advice to my policy colleagues was to give intelligence officers some face time… In 20 minutes, though, the intelligence officers can sharpen the question, and the policy official can calibrate the expertise of the analyst. In that conversation, intelligence analysts can offer advice; they don’t need to be as tightly restricted as they are on paper by the “thou shalt not traffic in policy” edict. Expectations can be calibrated on both sides of the conversation. And the result might even be better policy.”

Evidence of the effectiveness of evidence?

Heart + Mind? Or Just Heart? Experiments in Aid Effectiveness (And a Contest!), by Dean Karlan, 27 May 2011, 4:00 pm. Found courtesy of @poverty_action

RD comment: There is a killer assumption behind many of the efforts being made to measure aid effectiveness – that evidence of the effectiveness of specific aid interventions will make a difference; that is, that it will be used to develop better policies and practices. But, as far as I know, much less effort is being invested in testing this assumption, to find out when and where evidence works this way, and when it does not. This is worrying, because anyone looking into how policies are actually made knows that it is often not a pretty picture.

That is why, contrary to my normal policy, I am publicising a blog posting. The posting, by Dean Karlan, describes an actual experiment that looks at the effect of providing evidence about the effectiveness of an aid intervention (a specific form of micro-finance assistance) on the willingness of individual donors to make donations to the aid agency delivering the intervention. This relatively simple experiment is now underway.

Equally interesting is the fact that the author has launched, albeit on a very modest scale, a prediction market on the likely results of this experiment. Visitors to the blog are asked to make their predictions about the results of the experiment. When the results are available, Dean will identify and reward the most successful “bidder” (with two free copies of his new book More Than Good Intentions). Apart from the fun element involved, the use of a prediction market will enable Dean to identify to what extent his experiment has generated new knowledge [i.e. the experiment results differ a lot from the average prediction], versus confirmed existing common knowledge [i.e. the results equal the average prediction]. That sort of thing does not happen very often.
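For readers who like to see the logic spelled out, here is a minimal sketch of the comparison described above. It is not Dean’s actual scoring rule (the post does not specify one); the numeric outcome measure, the bidder names and values, and the one-standard-deviation cut-off are purely illustrative assumptions.

```python
# Sketch: compare an observed experimental result with a crowd of predictions,
# to judge whether the experiment generated new knowledge (result far from the
# average prediction) or confirmed common knowledge (result close to it).
from statistics import mean, stdev

def knowledge_gain(predictions: list[float], observed: float) -> str:
    """Classify the result relative to the crowd's average prediction."""
    avg = mean(predictions)
    spread = stdev(predictions) if len(predictions) > 1 else 0.0
    spread = spread or 1.0  # avoid division by zero if all predictions are identical
    distance = abs(observed - avg) / spread  # distance in units of crowd spread
    if distance > 1.0:  # illustrative cut-off, not from the blog post
        return f"new knowledge: result {observed} is {distance:.1f} SDs from the average prediction {avg:.2f}"
    return f"confirms common knowledge: result {observed} is close to the average prediction {avg:.2f}"

def closest_bidder(predictions: dict[str, float], observed: float) -> str:
    """Identify the most successful 'bidder': the closest prediction wins."""
    return min(predictions, key=lambda name: abs(predictions[name] - observed))

if __name__ == "__main__":
    bids = {"ana": 0.05, "ben": 0.20, "carla": 0.12}  # hypothetical predicted lift in donation rate
    result = 0.18                                     # hypothetical observed lift
    print(knowledge_gain(list(bids.values()), result))
    print("winner:", closest_bidder(bids, result))
```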

So, I encourage you to visit Dean’s blog and participate. You can do this by making your predictions using the Comment facility at the end of the blog (where you can also read the predictions already made by others, plus their comments).

Good Enough Guide to Impact Measurement – Rapid Onset Natural Disasters

[from the Emergency Capacity Building Project website]

Published on 6 April 2011

The Department for International Development (DfID / UKAID) awarded a grant for the ECB Project to develop a new Good Enough Guide to Impact Measurement. Led by Dr Vivien Walden from Oxfam, a team of ECB specialists from CRS, Save the Children, and World Vision will work together with the University of East Anglia (UEA) in the UK.

This guide, and supporting capacity-building materials, will include the development of an impact measurement methodology for rapid onset natural disasters. The methodologies will be field-tested by the editorial team in Pakistan and one other country from September 2011 onwards.

The team welcomes suggestions and input on developing methodologies for impact measurement. Contact us with your ideas at info@ecbproject.org

Tools and Methods for Evaluating the Efficiency of Development Interventions

Palenberg, M. (2011), Evaluation Working Papers. Bonn: Bundesministerium für wirtschaftliche Zusammenarbeit und Entwicklung (BMZ). Available as pdf.

Foreword:

Previous BMZ Evaluation Working Papers have focused on measuring impact. The present paper explores approaches for assessing efficiency. Efficiency is a powerful concept for decision making and for ex-post assessments of development interventions, but it is nevertheless often treated rather superficially in project appraisal, project completion and evaluation reports. Assessing efficiency is not an easy task, but there is potential for improvement, as the report shows. Starting with definitions and theoretical foundations, the author proposes a three-level classification based on the analytical power of efficiency analysis methods. Drawing on an extensive literature review and a broad range of interviews, the report identifies and describes 15 distinct methods and explains how they can be used to assess efficiency. It concludes with an overall assessment of the methods described and with recommendations for their application and further development.

The “Real Book” for story evaluation methods

Marc Maxson, Irene Guijt, and others, 2010. GlobalGiving Foundation (supported by the Rockefeller Foundation). Available as pdf. See also the related website.

[“Real Book” = The Real Book is a central part of the culture of playing music in which improvisation is essential. Real books are not for beginners: the reader interprets scant notation and builds on her own familiarity with chords. The Real Book allows musicians to play an approximate version of hundreds of new songs quickly.]

About this book
“This is a collection of narratives that serve to illustrate some not-so-obvious lessons that affected our story pilot project in Kenya. We gathered a large body of community stories that revealed what people in various communities believed they needed, what services they were getting, and what they would like to see happen in the future. By combining many brief narratives with a few contextual questions we were able to compare and analyze thousands of stories. Taken together, these stories and their meanings provide a perspective with both depth and breadth: Broad enough to inform an organization’s strategic thinking about the root causes of social ailments, yet deep and real enough to provoke specific and immediate follow-up actions by the local organizations of whom community members speak.

We believe that local people are the “experts” on what they want and know who has (or has not) been helping them. And like democracy, letting them define the problems and solutions that deserve to be discussed is the best method we’ve found for aggregating that knowledge. Professionals working in this field can draw upon the wisdom of this crowd for understanding the local context, and build upon what they know. Community efforts are complex, and our aim is not to predict the future, but help local leaders manage the present. If projects are observed from many angles – especially by those for whom success affects their livelihood – and implementers use these perspectives to mitigate risks and avoid early failure, the probability of future success will be much greater.”


RD comment 1: See also a different perspective on the Global Giving experience: Networks of self-categorised stories, by Rick Davies

RD comment 2: What I like about this document: (1) lots of warts-and-all descriptions of data collection, with all the problems that occur in real life; (2) the imaginative improvement on Cognitive Edge’s use of triads as self-signification tools: a circular device called the “story marbles” approach (see the sketch below). This enables respondents to choose which of x categories they will use and then indicate to what extent each of those categories applies to their story. It meets the requirement the author described thus: “What we need is a means to let the storyteller define the right question while also constraining the possible questions enough that we will derive useful clusters of stories with similar question frames.”
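The sketch below is a hypothetical rendering of that story-marbles idea, not the GlobalGiving implementation: the category menu, the fixed budget of ten marbles, and the clustering-by-frame step are my own illustrative assumptions about how such self-signified stories could be recorded and compared.

```python
# Sketch: a storyteller picks a few categories from a larger menu and distributes
# a fixed number of "marbles" across them to show how strongly each applies;
# stories sharing the same self-chosen category frame can then be compared.
from typing import Dict, List

CATEGORY_MENU = ["health", "education", "water", "security", "livelihoods", "governance"]
TOTAL_MARBLES = 10  # assumed fixed budget per story

def signify_story(story_id: str, allocation: Dict[str, int]) -> Dict:
    """Validate a self-signification and convert marble counts to weights."""
    assert all(cat in CATEGORY_MENU for cat in allocation), "unknown category"
    assert sum(allocation.values()) == TOTAL_MARBLES, "marbles must sum to the budget"
    return {"story": story_id, "weights": {c: n / TOTAL_MARBLES for c, n in allocation.items()}}

def frame_of(signified: Dict) -> frozenset:
    """The 'question frame' is simply the set of categories the teller chose."""
    return frozenset(signified["weights"])

def cluster_by_frame(stories: List[Dict]) -> Dict[frozenset, List[str]]:
    """Group stories that share the same self-chosen frame, so stories answering
    similar implicit questions end up in comparable clusters."""
    clusters: Dict[frozenset, List[str]] = {}
    for s in stories:
        clusters.setdefault(frame_of(s), []).append(s["story"])
    return clusters

if __name__ == "__main__":
    stories = [
        signify_story("s1", {"water": 6, "health": 4}),
        signify_story("s2", {"water": 3, "health": 7}),
        signify_story("s3", {"education": 5, "governance": 5}),
    ]
    for frame, ids in cluster_by_frame(stories).items():
        print(sorted(frame), ids)
```

Clustering on the set of self-chosen categories is just one simple reading of the author’s requirement that storytellers define the question while still yielding useful clusters of stories with similar question frames.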

Synthesis Study of DFID’s Strategic Evaluations 2005 – 2010

 

A report produced for the Independent Commission for Aid Impact by Roger Drew, January 2011. Available as pdf.

Summary

S1. This report examined central evaluations of DFID’s work published from 2006 to 2010. This included:
– 41 reports of the International Development Committee (IDC)
– Two Development Assistance Committee (DAC) peer reviews
– 10 National Audit Office (NAO) reports
– 63 reports of evaluations from DFID’s Evaluation Department (EVD)

S2. These evaluations consisted of various types:
– Studies of DFID’s work overall (16%)
– Studies with a geographic focus (46%)
– Studies of themes or sectors (19%)
– Studies of how aid is delivered (19%) (see Figure 1)

S3. During this period, DFID’s business model involved allocating funds through divisional programmes. Analysis of these evaluation studies according to this business model shows that:
– Across regional divisions, the amount of money covered per study varied from £63 million in Europe and Central Asia to £427 million in East and Central Africa.
– Across non-regional divisions, the amount of money covered per study varied from £84 million in Policy Division to £5,305 million in Europe and Donor Relations (see Figure 2).

S4. Part of the explanation of these differences is that the evaluations studied form only part of the overall scrutiny of DFID’s work. In particular, its policy on evaluation commits DFID to rely on the evaluation systems of partner multilateral organisations for assessment of the effectiveness and efficiency of multilateral aid. No central reviews of data generated through those systems were included in the documents reviewed for this study. The impact of DFID’s Bilateral and Multilateral Aid Reviews was not considered, as the Reviews had not been completed by the time this study was undertaken.

S5. The evaluations reviewed had a strong focus on DFID’s bilateral aid programmes at country level. There was a good match overall between the frequency of studying countries and the amount of DFID bilateral aid received (see Table 4). Despite the growing focus on fragile states, such countries were still less likely to be studied than non-fragile countries. Countries that received large amounts of DFID bilateral aid but were not evaluated in the last five years included Tanzania, Iraq and Somalia (see Table 5). Regional programmes in Africa also received large amounts of DFID bilateral aid but were not centrally evaluated. Country programme evaluations did not consider DFID’s multilateral aid specifically. None of the evaluations reviewed considered why the distribution of DFID’s multilateral aid by country differs so significantly from that of its bilateral aid. For example, Turkey is the single largest recipient of DFID multilateral aid but receives almost nothing bilaterally (see Table 7).

S6. The evaluations reviewed covered a wide range of thematic, sectoral and policy issues (see Figure 3). These evaluations were, however, largely standalone exercises, rather than drawing retrospectively on data gathered in other evaluations or prospectively building questions into proposed evaluations. More use could have been made of syntheses of country programme evaluations for this purpose.

S7. The evaluations explored in detail the delivery of DFID’s bilateral aid and issues of how aid could be delivered more effectively. The evaluations covered the provision of multilateral aid in much less detail (see paragraph S4). One area not covered in the evaluations is the increasing use of multilateral organisations to deliver bilateral aid programmes. This more than trebled from £389 million in 2005/6 to £1.3 billion in 2009/10 and, by 2009/10, was more than double the amount being provided as financial aid through both general and sectoral budget support combined.

[RD comment:  I had the impression that DFID, like many bilateral donors, does very few ex-post evaluations, so I wanted to find out how correct this view was. I searched for “ex-post” and found nothing. The question then is whether the new Independent Commission for Aid Impact (ICAI) will address this gap – see more on this here]

A CAN OF WORMS? IMPLICATIONS OF RIGOROUS IMPACT EVALUATIONS FOR DEVELOPMENT AGENCIES

Eric Roetman, International Child Support. Email: eric.roetman@ics.nl

3ie Working Paper 11, March 2011. Found courtesy of @txtpablo

Abstract
“Development agencies are under great pressure to show results and evaluate the impact of projects and programmes. This paper highlights the practical and ethical dilemmas of conducting impact evaluations for NGOs (Non Governmental Organizations). Specifically the paper presents the case of the development organization, International Child Support (ICS). For almost a decade, all of ICS’ projects in West Kenya were evaluated through rigorous, statistically sound, impact evaluations. However, as a result of logistical and ethical dilemmas ICS decided to put less emphasis on these evaluations. This particular case shows that rigorous impact evaluations are more than an additional step in the project cycle; impact evaluations influence every step of the programme and project design. These programmatic changes, which are needed to make rigorous impact evaluations possible, may go against the strategy and principles of many development agencies. Therefore, impact evaluations not only require additional resources but also present organizations with a dilemma if they are willing to change their approach and programmes.”

[RD comment: I think this abstract is somewhat misleading. My reading of the story in this paper is that ICS’s management made some questionable decisions, not that there was something intrinsically questionable about rigorous impact evaluations per se. In the first half of the story, ICS management allowed researchers, and their methodological needs, to drive ICS programming decisions, rather than to serve and inform those decisions. In the second half, the evidence from some studies of the efficacy of particular forms of participatory development seems to have been overridden by the sheer strength of ICS’s beliefs in the primacy of participatory approaches. Of course, this would not be the first time that evidence has been sidelined when an organisation’s core values and beliefs are threatened.]

Theory-Based Stakeholder Evaluation

Morten Balle Hansen and Evert Vedung, American Journal of Evaluation 31(3): 295-313, 2010. Available as pdf.

Abstract
“This article introduces a new approach to program theory evaluation called theory-based stakeholder evaluation or the TSE model for short. Most theory-based approaches are program theory driven and some are stakeholder oriented as well. Practically all of the latter fuse the program perceptions of the various stakeholder groups into one unitary program theory. The TSE model keeps the program theories of the diverse stakeholder groups apart from each other and from the program theory embedded in the institutionalized intervention itself. This represents, the authors argue, an important clarification and extension of the standard theory-based evaluation. The TSE model is elaborated to enhance theory-based evaluation of interventions characterized by conflicts and competing program theories. The authors argue that especially in evaluations of complex and complicated multilevel and multisite interventions, the presence of competing theories is likely and the TSE model may prove useful.”
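To make the abstract’s central distinction concrete, here is a minimal illustrative sketch (mine, not Hansen and Vedung’s): each stakeholder group’s program theory, and the theory embedded in the institutionalised intervention itself, is stored and examined separately rather than fused into one unitary theory. All names and causal chains are invented.

```python
# Sketch: keep stakeholder program theories apart and surface the points where
# they compete, which is where evaluation questions are likely to be contested.
from dataclasses import dataclass, field

@dataclass
class ProgramTheory:
    holder: str                 # stakeholder group, or "intervention" for the embedded theory
    causal_chain: list[str]     # assumed sequence from activities to outcomes

@dataclass
class TSEEvaluation:
    theories: list[ProgramTheory] = field(default_factory=list)

    def add(self, holder: str, causal_chain: list[str]) -> None:
        self.theories.append(ProgramTheory(holder, causal_chain))

    def points_of_conflict(self) -> set[str]:
        """Claims asserted by some theories but not shared by all of them:
        candidates for evaluation questions where theories compete."""
        all_claims = [set(t.causal_chain) for t in self.theories]
        if not all_claims:
            return set()
        shared = set.intersection(*all_claims)
        return set.union(*all_claims) - shared

if __name__ == "__main__":
    tse = TSEEvaluation()
    tse.add("intervention", ["training delivered", "skills improve", "service quality improves"])
    tse.add("managers", ["training delivered", "compliance improves", "service quality improves"])
    tse.add("front-line staff", ["training delivered", "workload increases", "service quality unchanged"])
    print(tse.points_of_conflict())
```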
