The Science of Evaluation: A Realist Manifesto

Pawson, Ray. 2013. The Science of Evaluation: A Realist Manifesto. UK: Sage Publications. http://www.uk.sagepub.com

Chapter 1 is available as a pdf. Hopefully other chapters will also become available this way, because this 240-page book is expensive.

Contents

Preface: The Armchair Methodologist and the Jobbing Researcher
PART ONE: PRECURSORS AND PRINCIPLES
Precursors: From the Library of Ray Pawson
First Principles: A Realist Diagnostic Workshop
PART TWO: THE CHALLENGE OF COMPLEXITY – DROWNING OR WAVING?
A Complexity Checklist
Contested Complexity
Informed Guesswork: The Realist Response to Complexity
PART THREE: TOWARDS EVALUATION SCIENCE
Invisible Mechanisms I: The Long Road to Behavioural Change
Invisible Mechanisms II: Clinical Interventions as Social Interventions
Synthesis as Science: The Bumpy Road to Legislative Change
Conclusion: A Mutually Monitoring, Disputatious Community of Truth Seekers

Reviews

Twelve reasons why climate change adaptation M&E is challenging

Bours, Dennis, Colleen McGinn, and Patrick Pringle. 2014. “Guidance Note 1: Twelve Reasons Why Climate Change Adaptation M&E Is Challenging.” SEA Change & UKCIP. Available as a pdf

“Introduction: Climate change adaptation (CCA) refers to how people and systems adjust to the actual or expected effects of climate change. It is often presented as a cyclical process developed in response to climate change impacts or their social, political, and economic consequences. There has been a recent upsurge of interest in CCA among international development agencies resulting in stand-alone adaptation programs as well as efforts to mainstream CCA into existing development strategies. The scaling up of adaptation efforts and the iterative nature of the adaptation process means that Monitoring and Evaluation (M&E) will play a critical role in informing and improving adaptation policies and activities. Although many CCA programmes may look similar to other development interventions, they do have specific and distinct characteristics that set them apart. These stem from the complex nature of adaptation itself. CCA is a dynamic process that cuts across scales and sectors of intervention, and extends long past any normal project cycle. It is also inherently uncertain: we cannot be entirely sure about the course of climate change consequences, as these will be shaped by societal decisions taken in the future. How then should we define, measure, and assess the achievements of an adaptation programme? The complexities inherent in climate adaptation programming call for a nuanced approach to M&E research. This is not, however, always being realised in practice. CCA poses a range of thorny challenges for evaluators. In this Guidance Note, we identify twelve challenges that make M&E of CCA programmes difficult, and highlight strategies to address each. While most are not unique to CCA, together they present a distinctive package of dilemmas that need to be addressed.”

See also: Bours, Dennis, Colleen McGinn, and Patrick Pringle. 2013. Monitoring and evaluation for climate change adaptation: A synthesis of tools, frameworks and approaches, UKCIP & SeaChange, pdf version (3.4 MB)

See also: Bours, Dennis, Colleen McGinn, and Patrick Pringle. 2014. “Guidance Note 2: Selecting Indicators for Climate Change Adaptation Programming.” SEA Change CoP & UKCIP

“This second Guidance Note follows on from that discussion with a narrower question: how does one go about choosing appropriate indicators? We begin with a brief review of approaches to CCA programme design, monitoring, and evaluation (DME). We then go on to discuss how to identify appropriate indicators. We demonstrate that CCA does not necessarily call for a separate set of indicators; rather, the key is to select a medley that appropriately frames progress towards adaptation and resilience. To this end, we highlight the importance of process indicators, and conclude with remarks about how to use indicators thoughtfully and well.”

Monitoring and evaluating civil society partnerships

A GSDRC Help Desk response

Request: Please identify approaches and methods used by civil society organisations (international NGOs and others) to monitor and evaluate the quality of their relationships with partner (including southern) NGOs. Please also provide a short comparative analysis.

Helpdesk response

Key findings: This report lists and describes tools used by NGOs to monitor the quality of their relationships with partner organisations. It begins with a brief analysis of the types of tools and their approaches, then describes each tool. This paper focuses on tools which monitor the partnership relationship itself, rather than the impact or outcomes of the partnership. While there is substantial general literature on partnerships, there is less literature on this particular aspect.

Within the development literature, ‘partnership’ is most often used to refer to international or high-income country NGOs partnering with low-income country NGOs, which may be grassroots or small-scale. Much of a ‘north-south’ partnership arrangement centres around funding, meaning accountability arrangements are often reporting and audit requirements (Brehm, 2001). As a result, much of the literature and analysis is heavily biased towards funding and financial accountability. There is a commonly noted power imbalance in the literature, with northern partners controlling the relationship and requiring southern partners to report to them on use of funds. Most partnerships are weak on ensuring Northern accountability to Southern organisations (Brehm, 2001). Most monitoring tools are aimed at bilateral partnerships.

The tools explored in the report are those which evaluate the nature of the partnership, rather than the broader issue of partnership impact. The ‘quality’ of relationships is best described by BOND, which characterises the highest quality of partnership as joint working, adequate time and resources allocated specifically to partnership working, and improved overall effectiveness. Most of the tools use qualitative, perception-based methods, including interviewing staff from both partner organisations and discussing relevant findings. There are not many specific tools available, as most organisations rely on generic internal feedback and consultation sessions rather than comprehensive monitoring and evaluation of relationships. As a result, this report presents only six tools, these being the ones most often referred to by experts.

Full response: http://www.gsdrc.org/docs/open/HDQ1024.pdf

The Availability of Research Data Declines Rapidly with Article Age

Summarised on SciDev.Net as "Most research data lost as scientists switch storage tech", from this source:

Current Biology, 19 December 2013
DOI: 10.1016/j.cub.2013.11.014

Highlights

  • We examined the availability of data from 516 studies between 2 and 22 years old
  • The odds of a data set being reported as extant fell by 17% per year
  • Broken e-mails and obsolete storage devices were the main obstacles to data sharing
  • Policies mandating data archiving at publication are clearly needed

Summary

“Policies ensuring that research data are available on public archives are increasingly being implemented at the government [1], funding agency [2,3,4], and journal [5,6] level. These policies are predicated on the idea that authors are poor stewards of their data, particularly over the long term [7], and indeed many studies have found that authors are often unable or unwilling to share their data [8,9,10,11]. However, there are no systematic estimates of how the availability of research data changes with time since publication. We therefore requested data sets from a relatively homogenous set of 516 articles published between 2 and 22 years ago, and found that availability of the data was strongly affected by article age. For papers where the authors gave the status of their data, the odds of a data set being extant fell by 17% per year. In addition, the odds that we could find a working e-mail address for the first, last, or corresponding author fell by 7% per year. Our results reinforce the notion that, in the long term, research data cannot be reliably preserved by individual researchers, and further demonstrate the urgent need for policies mandating data sharing via public archives.”
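To give a rough feel for what a 17% annual fall in odds implies over the 2–22 year span studied, here is a small illustrative calculation. This is a sketch of my own, not from the paper; the starting odds are assumed purely for the example.

```python
# Illustrative only: the compound effect of a 17% annual decline in the odds
# that a data set is still extant. The starting odds below are assumed, not
# taken from the paper.

def odds_after_years(initial_odds: float, years: int, annual_decline: float = 0.17) -> float:
    """Odds remaining after `years`, assuming a constant proportional decline."""
    return initial_odds * (1 - annual_decline) ** years

def odds_to_probability(odds: float) -> float:
    return odds / (1 + odds)

initial_odds = 4.0  # assumed: an 80% chance that a freshly published data set is extant
for years in (2, 10, 20):
    odds = odds_after_years(initial_odds, years)
    print(f"after {years:2d} years: odds = {odds:.2f}, probability = {odds_to_probability(odds):.0%}")
```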

Rick Davies comment: I suspect the situation with data generated by development aid projects (and their evaluations) is much, much worse. I have been unable to get access to data generated within the last 12 months by one DFID co-funded project in Africa. I am now trying to see if data used in a recent analysis of the (DFID funded) Chars Livelihoods Programme is available.

I am also making my own episodic attempts to make data sets publicly available that have been generated by my own work in the past. One is a large set of household survey data from Mogadishu in 1986, and another is household survey data from Vietnam generated in 1996 (baseline) and 2006 (follow up). One of the challenges is finding a place on the internet that specialises in making such data available (especially development project data). Any ideas?

PS 2014 01 07: Missing raw data is not the only problem. Lack of contact information about the evaluators/researchers who were associated with the data collection is another. In their exemplary blog about their use of QCA, Raab and Stuppert comment on their search for evaluation reports:

“Most of the 74 evaluation reports in our first coding round do not display the evaluator’s or the commissioner’s contact details. In some cases, the evaluators remain anonymous; in other cases, the only e-mail address available in the report is a generic info@xyz.org. This has surprised us – in our own evaluation practice, we always include our e-mail addresses so that our counterparts can get in touch with us in case, say, they wish to work with us again.”

PS 2014 02 01: Here is another interesting article about missing data and missing policies about making data available: Troves of Personal Data, Forbidden to Researchers (NYT, May 21, 2012)

“At leading social science journals, there are few clear guidelines on data sharing. “The American Journal of Sociology does not at present have a formal position on proprietary data,” its editor, Andrew Abbott, a sociologist at the University of Chicago, wrote in an e-mail. “Nor does it at present have formal policies enforcing the sharing of data.”

 The problem is not limited to the social sciences. A recent review found that 44 of 50 leading scientific journals instructed their authors on sharing data but that fewer than 30 percent of the papers they published fully adhered to the instructions. A 2008 review of sharing requirements for genetics data found that 40 of 70 journals surveyed had policies, and that 17 of those were “weak.””

Aid on the Edge of Chaos…

… Rethinking International Cooperation in a Complex World

by Ben Ramalingam, Oxford University Press, 2013. Viewable in part via Google Books (and fully searchable with key words)

Publishers summary:

A ground-breaking book on the state of the aid business, bridging policy, practice and science. Gets inside the black box of aid to highlight critical flaws in the ways agencies learn, strategise, organise, and evaluate themselves. Shows how ideas from the cutting edge of complex systems science have been used to address social, economic and political issues, and how they can contribute to the transformation of aid. An open, accessible style with cartoons by a leading illustrator. Draws on workshops, conferences, over five years of research, and hundreds of interviews.

Rick Davies comments (but not a review): Where to start…? This is a big book, in size and ambition, but also in the breadth of the author’s knowledge and contacts in the field. There have been many reviews of the book, so I will simply link to some here, to start with: Duncan Green (Oxfam), Tom Kirk (LSE), Nick Perkins (AllAfrica), Paul van Gardingen and Andrée Carter (SciDevnet), Melissa Leach (Steps Centre), Owen Barder, Philip Ball, IRIN, New Scientist and Lucy Noonan (Clear Horizon). See also Ben’s own Aid on the Edge of Chaos blog.

Evaluation issues are discussed in two sections: Watching the Watchman (pages 101-122), and Performance Dynamics, Dynamic Performance (pages 351-356). That is about 7% of the book as a whole, which is a bigger percentage than most development projects spend on evaluation! Of course there is a lot more to Ben’s book that relates to evaluation outside of these sections.

One view of the idea of systems being on the edge of chaos is that it is about organisations (biological and social) evolving to a point where they find a viable balance between sensitivity to new information and retention of past information (as embedded in existing structures and processes), i.e. their learning strategies. That said, what strikes me most about aid organisations, as a sector, is how stable they are. Perhaps way too stable. Mortality rates are very low compared to private sector enterprises. Does this suggest that, as a set, aid organisations are not as effective at learning as they could be?

I also wondered to what extent the idea of being on the edge of chaos (i.e. a certain level of complexity) could be operationalised/measured, and thus developed into something that was more than a metaphor. However, Ben and other authors (Melanie Mitchell) have highlighted the limitations of various attempts to measure complexity. In fact the very attempt to do so, at least as a single (i.e. one-dimensional) measure, seems somewhat ironic. But perhaps degrees of complexity could be mapped in a space defined by multiple measures? For example: (a) diversity of agents, (b) density of connections between them, and (c) the degrees of freedom or agency each agent has. …a speculation.
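As a purely illustrative sketch of that speculation (not anything proposed in the book), those three measures could be computed for a small network of agents. The agents, ties and ‘agency’ scores below are invented for the example, and the sketch assumes the NetworkX library.

```python
# Sketch: locating a system in a three-dimensional "complexity space"
# (agent diversity, connection density, mean agent freedom). Entirely
# illustrative; the agents, edges and agency scores are invented.
import math
import networkx as nx

G = nx.Graph()
# Each agent has a type and a rough "agency" score (degrees of freedom, 0-1).
agents = {
    "ngo_a": ("NGO", 0.7), "ngo_b": ("NGO", 0.6),
    "donor": ("donor", 0.9), "gov": ("government", 0.8),
    "community": ("community", 0.4),
}
for name, (kind, agency) in agents.items():
    G.add_node(name, kind=kind, agency=agency)
G.add_edges_from([("donor", "ngo_a"), ("donor", "ngo_b"),
                  ("ngo_a", "community"), ("gov", "ngo_a"), ("gov", "donor")])

# (a) diversity of agents: Shannon entropy over agent types
kinds = [d["kind"] for _, d in G.nodes(data=True)]
proportions = [kinds.count(k) / len(kinds) for k in set(kinds)]
diversity = -sum(p * math.log(p) for p in proportions)

# (b) density of connections: actual edges as a share of possible edges
density = nx.density(G)

# (c) average degrees of freedom (agency) of the agents
freedom = sum(d["agency"] for _, d in G.nodes(data=True)) / G.number_of_nodes()

print(f"diversity={diversity:.2f}, density={density:.2f}, mean agency={freedom:.2f}")
```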

Ben has been kind enough to quote some of my views on complexity issues, including those on the representation of complexity (page 351). The limitations of linear Theories of Change (ToC) are discussed at various points in the book, and alternatives are explored, including network models and agent-based simulation models. While I am sympathetic to their wider use, I do continue to be surprised at how little complexity aid agency staff can actually cope with when presented with a ToC that has to be a working part of a Monitoring and Evaluation Framework for a project. And I have a background concern that the whole enthusiasm for ToCs these days still betrays a deep desire for plan-ability that in reality is at odds with the real world within which aid agencies work.

In his chapter on Dynamic Change Ben describes an initiative called Artificial Intelligence for Development and the attempt to use quantitative approaches and “big data” sources to understand more about the dynamics of development (e.g. market movements, migration, and more) as they occur, or at least shortly afterwards. Mobile phone usage is one of the data sets that are becoming more available in many locations around the world. I think this is fascinating stuff, but it is in stark contrast with my experience of the average development project, where there is little in the way of readily available big data that is or could be used for project management and wider lesson learning. Where there is survey data it is rarely publicly available, although the open data and transparency movements are starting to have some effect.

On the more positive side, where data is available, there are new “big data” approaches that agencies can use and adapt. There is now an array of data mining methods that can be used to inductively find patterns (clusters and associations) in data sets, some of which are free and open source (see, for example, RapidMiner). While these searches can be informed by prior theories, they are not necessarily locked in by them – they are open to discovery of unexpected patterns and surprise. Whereas the average ToC is a relatively small and linear construct, data mining software can quickly and systematically explore relationships within much larger sets of attributes/measures describing the interventions, their targets and their wider context.
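As a minimal sketch of this kind of inductive pattern-finding (my own illustration, using scikit-learn rather than RapidMiner, and an invented set of project records):

```python
# Sketch: inductively finding clusters in project records without a prior theory.
# The data below is invented purely to illustrate the workflow.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Rows = projects; columns = attributes such as budget (USD '000),
# number of partner organisations, and a 0-100 outcome score.
projects = np.array([
    [120, 2, 35], [150, 3, 40], [900, 8, 75],
    [850, 7, 80], [130, 2, 30], [920, 9, 70],
])

X = StandardScaler().fit_transform(projects)      # put attributes on a common scale
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for project, label in zip(projects, labels):
    print(f"project {project.tolist()} -> cluster {label}")
```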

Some of the complexity science concepts described in the book provide limited added value, in my view. For example, the idea of a fitness landscape, which comes from evolutionary theory. Some of its proposed use, as in chapter 17, is almost a self-caricature: “Implementers first need to establish the overall space of possibilities for a given project, programme or policy, then ‘dynamically crawl the design space’ by simultaneously trying out design alternatives and then adapting the project sequentially based on the results” (Pritchett et al). On the other hand, there were some ideas I would definitely like to follow up on, most notably agent-based modelling, especially participatory approaches to modelling (pages 175-80, 283-95). Simulations are evaluable in two ways: by analysis of their fit with historic data, and by the accuracy of their predictions of future data points. But they do require data, and that perhaps is an issue that could be explored a bit more. When facing uncertain futures, and when using a portfolio of strategies to cope with that uncertainty, a lot more data is needed than when pursuing a single intervention in a more stable and predictable environment. [end of ramble :-) ]
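Here is a toy illustration of the two evaluation routes mentioned above – fit with historic data and accuracy of prediction on later data points. Both the simulation model and the “observed” series are invented; this is only a sketch of the idea, not a real agent-based model.

```python
# Toy example: evaluating a simulation by (1) fit with historic data and
# (2) accuracy of prediction on later data points. All numbers are invented.
import random

def simulate(n_years: int, growth: float, seed: int = 1) -> list[float]:
    """Very simple stochastic model of an outcome indicator over time."""
    random.seed(seed)
    value, series = 100.0, []
    for _ in range(n_years):
        value *= (1 + growth + random.uniform(-0.02, 0.02))
        series.append(value)
    return series

def mean_abs_error(observed: list[float], modelled: list[float]) -> float:
    return sum(abs(o - m) for o, m in zip(observed, modelled)) / len(observed)

observed = [103, 107, 112, 118, 123, 130, 135, 142]   # hypothetical historic + later data
modelled = simulate(n_years=8, growth=0.045)

historic_fit = mean_abs_error(observed[:5], modelled[:5])    # fit to the first five years
predictive_fit = mean_abs_error(observed[5:], modelled[5:])  # accuracy on the later years
print(f"historic fit (MAE): {historic_fit:.1f}, predictive accuracy (MAE): {predictive_fit:.1f}")
```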


Learning about Measuring Advocacy and Policy Change: Are Baselines always Feasible and Desirable?

by Chris Barnett, an IDS Practice Paper in Brief, July 2013. Available as pdf

Summary: This paper captures some recent challenges that emerged from establishing a baseline for an empowerment and accountability fund. It is widely accepted that producing a baseline is logical and largely uncontested – with the recent increased investment in baselines being largely something to be welcomed. This paper is therefore not a challenge to convention, but rather a note of caution: where adaptive programming is necessary, and there are multiple pathways to success, then the ‘baseline endline’ survey tradition has its limitations. This is particularly so for interventions which seek to alter complex political-economic dynamics, such as between citizens and those in power.

Concluding paragraph: It is not that baselines are impossible, but that in such cases process tracking and ex post assessments may be necessary to capture the full extent of the results and impacts where programmes are flexible, demand-led, and working on change areas that cannot be fully specified from the outset. Developing greater robustness around methodologies to  evaluate the work of civil society – particularly E&A initiatives that seek to advocate and influence policy change – should therefore not be limited to simple baseline (plus end-line) survey traditions.

Rick Davies’ comment: This is a welcome discussion of something that can too easily be taken for granted as a “good thing”. Years ago I was reviewing a maternal and child health project being implemented in multiple districts in Indonesia. There was baseline data for the year before the project started, and data on the same key indicators for the following four years when the project intervention took place. The problem was that the values on the indicators during the project period varied substantially from year to year, raising a big doubt in my mind as to how reliable the baseline measure was as a measure of pre-intervention status. I suspect the pre-intervention values also varied substantially from year to year. So to be useful at all, a baseline in these circumstances would probably be better in the form of a moving average of x previous years – which would only be doable if the necessary data could be found!
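A minimal sketch of that moving-average idea (the indicator values are invented, and "window" here stands in for the x previous years):

```python
# Sketch: a baseline as the moving average of the x years before the intervention,
# rather than a single pre-intervention year. The indicator values are invented.
def moving_average_baseline(pre_intervention_values: list[float], window: int) -> float:
    """Average of the last `window` pre-intervention observations."""
    recent = pre_intervention_values[-window:]
    return sum(recent) / len(recent)

annual_indicator = [48, 61, 43, 57, 52]   # hypothetical values for the five years before the project
single_year_baseline = annual_indicator[-1]
smoothed_baseline = moving_average_baseline(annual_indicator, window=3)

print(f"single-year baseline: {single_year_baseline}")
print(f"3-year moving-average baseline: {smoothed_baseline:.1f}")
```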

Reading Chris Barnett’s paper I also recognised (in hindsight) another problem. Its Assumption 1 (the baseline is ‘year zero’) probably did not hold (as he suggests it often does not) in a number of districts, where the same agency had already been working beforehand.

THE CHALLENGES OF EVIDENCE

Dr Ruth Levitt, November 2013. Available as pdf

PROVOCATION PAPER FOR THE ALLIANCE FOR USEFUL EVIDENCE, NESTA

Excerpt…

“Which of these statements is true?

Evidence is essential stuff. It is objective. It answers questions and helps us to solve problems. It helps us to predict. It puts decisions on the right track. Evidence makes sure that decisions are safer. Evidence can turn guesswork into certainty. Evidence tells us what works. It explains why people think and act as they do. It alerts us to likely consequences and implications. It shows us where and when to intervene. We have robust methods for using evidence. Evidence is information; information is abundant. It is the most reliable basis for making policy. Evidence is the most reliable basis for improving practice. There has never been a better time for getting hold of evidence.

Now, what about truth in any of these statements?

Evidence is dangerous stuff. Used unscrupulously it can do harm. It is easily misinterpreted and misrepresented. It is often inconclusive. Evidence is often insufficient or unsuitable for our needs. We will act on it even when it is inadequate or contradictory or biased. We ignore or explain away evidence that doesn’t suit our prejudices. We may not spot where evidence has flaws. It can conceal rather than reveal, confuse rather than clarify. It can exaggerate or understate what is actually known. It can confuse us. Evidence can be manipulated politically. We can be persuaded to accept false correlations. A forceful advocate can distort what the evidence actually says.

The answer is that each statement in each cluster is sometimes true, in particular circumstances.”


Learning about Theories of Change for the Monitoring and Evaluation of Research Uptake

IDS Practice Paper in Brief 14, September 2013. Chris Barnett and Robbie Gregorowski. Available as pdf

Abstract: “This paper captures lessons from recent experiences on using ‘theories of change’ amongst organisations involved in the research–policy interface. The literature in this area highlights much of the complexity inherent in the policymaking process, as well as the challenges around finding meaningful ways to measure research uptake. As a tool, ‘theories of change’ offers much, but the paper argues that the very complexity and dynamism of the research-to-policy process means that any theory of change will be inadequate in this context. Therefore, rather than overcomplicating a static depiction of change at the start (to be evaluated at the end), incentives need to be in place to regularly collect evidence around the theory, test it periodically, and then reflect and reconsider its relevance and assumptions.”

Evidence-Based Policy and Systemic Change: Conflicting Trends?

2013. Springfield Working Paper Series #1. Dr Ben Taylor, btaylor@springfieldcentre.com. Available as pdf

Abstract: “Two concurrent but incompatible trends have emerged in development in recent years. Firstly, evidence-based policy and the results agenda have become ubiquitous amongst government policymakers in recent years, including in development. Secondly, there has been a realisation of the utility of systemic approaches to development policy and programming in order to bring about sustainable change for larger numbers of people. This paper highlights the negative impacts of the former trend on development and, more acutely, its incompatibility with the latter trend. The paper then highlights positive signs of a change in thinking in development that have begun to emerge, leading to a more pragmatic and contextually nuanced approach to measuring progress, and identifies the need for further research in this area, calling for evaluation of approaches rather than searching for a silver bullet. The paper draws on a review of the evidence together with a number of key informant interviews with practitioners from the field.”

Utilization Focused Evaluation: A primer for evaluators

by Ricardo Ramirez and Dal Brodhead, 2013. Available as pdf.

Also available in Spanish and French. See http://evaluationandcommunicationinpractice.ca/ufe-primer-get-it-here/

“Who is this Primer For?

This Primer is for practitioner evaluators who have heard of UFE and are keen to test drive the approach. Throughout this Primer we refer to the value of having a mentor to assist an evaluator who is using UFE for the first time. Our collective experiences with UFE indicated that having a mentor was, for many UFE participants, an essential support, and it reflects how we learned and mentored UFE.

Evaluators may use elements of a UFE in their work naturally, for example by engaging users in planning the process or in assisting them in the utilization of findings. This Primer, however, walks the reader through UFE by systematically covering all of the 12 steps. It reflects deeply on the UFE evaluation practice and builds from it.

A second audience for the Primer is project implementers. In the five UFE experiences that underpin this Primer, the primary users of the evaluations were the research projects’ implementers — although other users could have been selected such as funders or beneficiaries. This qualification is important as the Primer will also interest funders of research and commissioners of evaluation.

Funders frequently have resources to commission evaluations. Funders have the power to support useful evaluations. They can, as well, choose not to support the evaluations. Supporting useful evaluation using UFE requires working differently than in the past with regards to both evaluators and the evaluands. This Primer offers some insights into how to do this.

While this Primer is based on UFE experiences completed with five research projects in the field of ICTD, there is scope for the lessons to apply to a wider variety of projects in other sectors.

This primer is not a stand-alone manual. For that purpose readers are referred to the fourth edition of Utilization-Focused Evaluation by Michael Quinn Patton, as well as his most recent Essentials of Utilization-Focused Evaluation (2012).

This primer is also not a training module. Readers interested in that use are referred to the UFE Curriculum. It provides modules that were developed and adapted to different audiences. They are available at: http://evaluationinpractice.wordpress.com/
