Monitoring and Evaluation NEWS

Aid on the Edge of Chaos…

… Rethinking International Cooperation in a Complex World

by Ben Ramalingam, Oxford University Press, 2013. Viewable in part via Google Books (and fully searchable with key words)

Publishers summary:

A ground breaking book on the state of the aid business, bridging policy, practice and science. Gets inside the black box of aid to highlight critical flaws in the ways agencies learn, strategise, organise, and evaluate themselves. Shows how ideas from the cutting edge of complex systems science have been used to address social, economic and political issues, and how they can contribute to the transformation of aid. An open accessible style with cartoons by a leading illustrator. Draws on workshops, conferences, over five years of research, and hundreds of interviews.

Rick Davies comments (but not a review): Where to start…? This is a big book, in size and ambition. But also in the breadth of the author’s knowledge and contacts in the field. There have been many reviews of the book, so I will simply link to some here, to start with: Duncan Green (Oxfam), Tom Kirk (LSE), Nick Perkins (AllAfrica), Paul van Gardingen and Andrée Carter (SciDevnet), Melissa Leach (Steps Centre), Owen Barder, Philip Ball , IRIN , New Scientist and Lucy Noonan (Clear Horizon) See also Ben’s own Aid on the Edge of Chaos blog.

Evauation issues are discussed in two sections: Watching the Watchman (pages 101-122), and Performance Dynamics, Dynamic Performance (pages 351-356). That is about 7% of the book as a whole, which is a bigger percentage than most development projects spend on evaluation! Of course there is a lot more to Ben’s book that relates to evaluation outside of these sections.

One view of the idea of systems being on the edge of chaos is that it is about organisations (biological and social) evolving to a point where they find a viable balance between sensitivity to new information and retention of past information (as embedded in existing structures and processes) i.e learning strategies. That said, what strikes me the most about aid organisations, as a sector, is how stable they are. Perhaps way too stable. Mortality rates are very low compared to private sector enterprises. Does this suggest that as a set aid organisations are not as effective at learning as they could be?

I also wondered to what extent the idea of being on the edge of chaos (i.e a certain level of complexity) could be operationalised/measured, and thus developed into something that was more than a metaphor. However, Ben and other authors (Melanie Mitchel) have highlighted the limitations of various attempts to measure complexity. In fact the very attempt to do so, at least in a single (i.e. one dimensional) measure seems somewhat ironical. But perhaps degrees of complexity could be mapped in a space defined by multiple measures? For example: (a) diversity of agents, (b) density of connections between them, (c) the degrees of freedom or agency each agent has. …a speculation.

Ben has been kind enough to quote some of my views on complexity issues, including those on the representation of complexity (page 351). The limitations of linear Theories of Change (ToC) are discussed at various points in the book, and alternatives are explored, including network models and agent based simulation models. While I am sympathetic to their wider use I do continue to be surprised at how little complexity aid agency staff can actually cope with when presented with a ToC that has to be a working part of a Monitoring and Evaluation Framework, for a project. And I have a background concern that the whole enthusiasm for ToCs these days still belies a deep desire for plan-ablity that in reality is at odds with the real world within which aid agencies work.

In his chapter on Dynamic Change Ben describes an initiative called Artificial Intelligence for Development and the attempt to use quantitative approaches and “big data” sources to understand more about the dynamics of development (e.g. market movements, migration, and more) as they occur, or at least shortly afterwards. Mobile phone usage being one of the data sets that are becoming more available in many locations around the world. I think this is fascinating stuff, but it is in stark contrast with my experience of the average development project, where there is little in the way of readily available big data that is or could be used for project management and wider lesson learning. Where there is survey data it is rarely publicly available, although the open data and transparency movements are starting to have some effect.

On the more positive side, where data is available, there are new “big data” approaches that agencies can use and adapt. There is now an array of data mining methods that can be used to inductively find patterns (clusters and associations) in data sets, some of which are free and open source (See Rapid Miner). While these searches can be informed by prior theories, they are not necessarily locked in by them – they are open to discovery of unexpected patterns and surprise. Whereas the average ToC is a relatively small and linear construct, data mining software can quickly and systematically explore relationships within much larger sets of attributes/measures describing the interventions, their targets and their wider context.

Some of the complexity science concepts described in the book provide limited added value, in my view. For example, the idea of a fitness landscape, which comes from evolutionary theory. Some of its proposed use, as in chapter 17, is almost a self caricature: “Implementers first need to establish the overall space of possiblities for a given project, programme or policy, then ‘dynamically crawl the design space by simultaneously trying out design alternatives and then adapting the project sequentially based on the results” (Pritchett et al). On the other hand, there were some ideas I would definitely like to follow up on, most notably agent based modelling, especially participatory based modeling (pages 175-80, 283-95). Simulations are evaluable, in two ways: by analysis of fit with historic data and accuracy of predictions of future data points. But they do require data, and that perhaps is an issue that could be explored a bit more. When facing uncertain futures and when using a portfolio of strategies to cope with that uncertainty a lot more data is needed than when pursuing a single intervention in a more stable and predictable environment. [end of ramble :-)

Multiple Pathways to Policy Impact: Testing an Uptake Theory with QCA

by Barbara Befani, IDS Centre for Development Impact, PRACTICE PAPER. Number 05 October 2013. Available as pdf

Abstract: Policy impact is a complex process influenced by multiple factors. An intermediate step in this process is policy uptake, or the adoption of measures by policymakers that reflect research findings and recommendations. The path to policy uptake often involves activism, lobbying and advocacy work by civil society organisations, so an earlier intermediate step could be termed ‘advocacy uptake’; which would be the use of research findings and recommendations by Civil Society Organisations (CSOs) in their efforts to influence government policy. This CDI Practice Paper by Barbara Befani proposes a ‘broad-brush’ theory of policy uptake (more precisely of ‘advocacy uptake’) and then tests it using two methods: (1) a type of statistical analysis and (2) a variant of Qualitative Comparative Analysis (QCA). The pros and cons of both families of methods are discussed in this paper, which shows that QCA offers the power of generalisation whilst also capturing some of the complexity of middle-range explanation. A limited number of pathways to uptake are identified, which are at the same time moderately sophisticated (considering combinations of causal factors rather than additions) and cover a medium number of cases (40), allowing a moderate degree of generalisation. – See more at: http://www.ids.ac.uk/publication/multiple-pathways-to-policy-impact-testing-an-uptake-theory-with-qca#sthash.HEg4Smra.dpuf

Rick Davies comment: What I like about this paper is the way it shows, quite simply, how measurements of the contribution of different possible causal conditions in terms of averages, and correlations between these, can be uniformative and even misleading. In contrast, a QCA analysis of the different configurations of causal conditions can be much more enlightening and easier to relate to what are often complex realities in the ground.

I have taken the liberty of re-analysing the fictional data set provided in the annex, using a Decision Tree software (within RapidMiner). This is a means of triangulating the results of QCA analyses. It uses the same kind of data set and produces results which are comparable in structure, but the method of analysis is different. Shown below is a Decision Tree representing seven configurations of conditions that can be found in Befani’s data set of 40 cases. It makes use of 4 of the five conditions described in the paper. These are shown as nodes in the tree diagram.

(click on image to enlarge and get a clearer image!)

The 0 and 1 values on the various branches indicate whether the condition immediately above is present or not. The first configuration on the left says that if there is no ACCESS then research UPTAKE (12 cases at the red leaf) does not take place. This is a statement of a sufficient cause. The branch on the right, represents a configuration of three conditions, which says that where ACCESS to research is present, and recommendations are consistent with measures previously (PREV) recommended by the organisation, and where the research findings are disseminated within the organisation by a local ‘champion (CHAMP) then research UPTAKE (8 cases at the blue leaf) does take place.

Overall the findings shown in the Decision Tree model are consistent with the QCA analyses in terms of the number of configurations (seven) and the configurations that are associated with the largest number of cases (i.e. their coverage). However there were small differences in descriptions of two sets of cases where there was no uptake (red leaves). In the third branch (configuration) from the left above, the QCA analysis indicated that it was the presence of INTERNAL CONFLICT (different approaches to the same policy problem within the organisation) that played a role, rather than the presence of a (perhaps ineffectual) CHAMPION. In the third branch (configuration) from the right the QCA analysis proposed a fourth necessary condition (QUALITY), in addtion to the three shown in the Decision Tree. Here the Decision Tree seems the more parsimonious solution. However, in both sets of cases where differences in findings have occured it would make most sense to then proceed with within-case investigations of the causal processes at work.

PS: Here is the dataset, in case anyone wants to play with it

Learning about Measuring Advocacy and Policy Change: Are Baselines always Feasible and Desirable?

by Chris Barnett, an IDS Practice Paper in Brief, July 2013 Available as pdf

Summary: This paper captures some recent challenges that emerged from establishing a baseline for an empowerment and accountability fund. It is widely accepted that producing a baseline is logical and largely uncontested – with the recent increased investment in baselines being largely something to be welcomed. This paper is therefore not a challenge to convention, but rather a note of caution: where adaptive programming is necessary, and there are multiple pathways to success, then the ‘baseline endline’ survey tradition has its limitations. This is particularly so for interventions which seek to alter complex political-economic dynamics, such as between citizens and those in power.

Concluding paragraph: It is not that baselines are impossible, but that in such cases process tracking and ex post assessments may be necessary to capture the full extent of the results and impacts where programmes are flexible, demand-led, and working on change areas that cannot be fully specified from the outset. Developing greater robustness around methodologies to evaluate the work of civil society – particularly E&A initiatives that seek to advocate and influence policy change – should therefore not be limited to simple baseline (plus end-line) survey traditions.

Rick Davies’ comment: This is a welcome discussion on something that can too easily be taken for granted as a “good thing”. Years ago I was reviewing a maternal and child health project being implemented in multiple districts in Indonesia. There was baseline data for the year before the project started, and data on the same key indicators for the following four years when the project intervention took place. The problem was that the values on the indicators during the project period varied substantially from year to year, raising a big doubt in my mind as to how reliable the baseline measure was, as a measure of pre-intervention status. I suspect the pre-intervention values also varied substantially from year to year. So to be useful at all, a baseline in these circumstances would probably better be in the form of a moving average of x previous years – which would only be doable if the necessary data could be found!

Reading Chris Barnet’s paper I also recognised (in hindsight) another problem. Their Assumption 1: The baseline is ‘year zero’ probably did not hold (as he suggests it often does not) in a number of districts, where the same agency had already been working beforehand

THE CHALLENGES OF EVIDENCE

Dr Ruth Levitt, November 2013 Available as pdf

PROVOCATION PAPER FOR THE ALLIANCE FOR USEFUL EVIDENCE, NESTA

Excerpt….

“Which of these statements is true?

Evidence is essential stuff. It is objective. It answers questions and helps us to solve problems. It helps us to predict. It puts decisions on the right track. Evidence makes sure that decisions are safer. Evidence can turn guesswork into certainty. Evidence tells us what works. It explains why people think and act as they do. It alerts us to likely consequences and implications. It shows us where and when to intervene. We have robust methods for using evidence. Evidence is information; information is abundant. It is the most reliable basis for making policy. Evidence is the most reliable basis for improving practice. There has never been a better time for getting hold of evidence.

Now, what about truth in any of these statements?

Evidence is dangerous stuff. Used unscrupulously it can do harm. It is easily misinterpreted and misrepresented. It is often inconclusive. Evidence is often insufficient or unsuitable for our needs. We will act on it even when it is inadequate or contradictory or biased. We ignore or explain away evidence that doesn’t suit our prejudices. We may not spot where evidence has flaws. It can conceal rather than reveal, confuse rather than clarify. It can exaggerate or understate what is actually known. It can confuse us. Evidence can be manipulated politically. We can be persuaded to accept false correlations. A forceful advocate can distort what the evidence actually says.

The answer is that each statement in each cluster is sometimes true, in particular circumstances

Learning about Theories of Change for the Monitoring and Evaluation of Research Uptake

IDS Practice Paper in Brief, 1 4, SEPTEMBER 2013. Chris Barnett and Robbie Gregorowski. Available as pdf

Abstract: “This paper captures lessons from recent experiences on using ‘theories of change’ amongst organisations involved in the research–policy interface. The literature in this area highlights much of the complexity inherent in the policymaking process, as well as the challenges around finding meaningful ways to measure research uptake. As a tool, ‘theories of change’ offers much, but the paper argues that the very complexity and dynamism of the research-to-policy process means that any theory of change will be inadequate in this context. Therefore, rather than overcomplicating a static depiction of change at the start (to be evaluated at the end), incentives need to be in place to regularly collect evidence around the theory, test it periodically, and then reflect and reconsider its relevance and assumptions

Evidence-Based Policy and Systemic Change: Conflicting Trends?

2013 . Springfield Working Paper Series # 1 , Dr. Ben Taylor btaylor@springfieldcentre.com Available as pdf

Abstract: “Two concurrent but incompatible trends have emerged in development in recent years. Firstly, evidence-based policy and the results agenda have come become ubiquitous amongst government policymakers in recent years including in development. Secondly, there has been a realisation of the utility of systemic approaches to development policy and programming in order to bring about sustainable change for larger numbers of people. This paper highlights the negative impacts of this former trend on development and, more acutely, its incompatibility with the latter trend. The paper then highlights positive signs of a change in thinking in development that have begun to emerge to lead to a more pragmatic and contextually nuanced approach to measuring progress and identifies the need for further research in this area, calling for evaluation of approaches rather than searching for a silver bullet. The paper draws on a review of the evidence together with a number of key informant interviews with practitioners from the field.”

A review of evaluations of interventions related to violence against women and girls – using QCA and process tracing

In this posting I am drawing attention to a blog by Michaela Raab and Wolf Stuppert, which is exceptional (or at least unusual) in a number of respects. The blog is called http://www.evawreview.de/

Firstly the blog is not just about the results of a review, but more importantly, about the review process, written as the review process proceeds. (I have not seen many of these kinds of blogs around, but if you know about any others please let me know)

Secondly the blog is about the use of of QCA and process tracing. There have been a number of articles about QCA in the journal Evaluation but generally speaking relatively few evaluators working with development projects know much about QCA or process tracing.

Thirdly, the blog is about the use of QCA and process tracing as a means of doing a review of findings of past evaluations of interventions related to violence against women and girls. In other words it is another approach to undertaking a kind of systematic review, notably one which does not require throwing out 95% of the available studies because their contents don’t fit the methodology being used to do the systematic review.

Fourthly, it is about combining the use of QCA and process tracing, i.e. combining cross-case comparisons with within-case analyses. QCA can help identify causal configurations of conditions associated with specific outcomes. But once found these associations need to be examined in depth to ensure there are plausible causal mechanisms at work. That is where process tracing comes into play.

I have two hopes for the EVAWG Review blog. One is that it will provide a sufficiently transparent account of the use of QCA to enable new potential users to understand how it works, along with an appreciation of its potentials and difficulties. The other is that the dataset used in the QCA analysis will be made publicly available, ideally via the blog itself. One of the merits of QCA analyses, as published so far, is that the datasets are often published as part of the published articles, which means others can then re-analyse the same data, perhaps from a different perspective. For example, I would like to test the results of the QCA analyses by using another method for generating results which have a comparable structure (i.e. descriptions of one or more configurations of conditions associated with the presence and absence of expected outcomes). I have described this method elsewhere (Decision Tree algorithms, as used in data mining)

There are also some challenges that will face this use of QCA, which I would like to see how the blog’s authors will try to deal with. In RCTs there need to be both comparable interventions and comparable outcomes e.g. cash transfers provided to many people in some standardised manner, and a common measure of household poverty status. With QCA (and Decision Tree) analyses comparable outcomes are still needed, but not comparable interventions. These can be many and varied, as can be the wider context in which they are provided. The challenge with Raab and Stuppert’s work on VAWG is that there will be many and varied outcome measures as well and interventions. They will probably need to do multiple QCA analyses, focusing on sub-sets of evaluations within which there are one or more comparable outcomes. But by focusing in this way, they may end up with too few cases (evaluations) to produce plausible results, given the diversity of (possibly) causal conditions they will be exploring.

There is a much bigger challenge still. On re-reading the blog I realised this is not simply a kind of systematic review of the available evidence, using a different method. Instead it is a kind of meta-evaluation, where the focus is on comparison of the evaluation methods used in the population of evaluation they manage to amass. The problem of finding comparable outcomes is much bigger here. For example, on what basis will they rate or categorise evaluations as successful (e.g. valid and/or useful)? There seems to be a chicken and egg problem lurking here. Help!

PS1: I should add that this work is being funded by DFID, but the types of evaluations being reviewed is not limited to evaluations of DFID projects

PS2 2013 11 07 : I now see from the team’s latest blog posting the the common outcome of interest will be the usefullness of the evaluation. I would be interested to see how they assess usefullness , in some way that is reasonably reliable.

PS3 2014 01 07: I continue to be impressed by the team’s efforts to publicly document the progress of their work. Their Scoping Report is now available online, along with a blog commentary on progress to date (2013 01 06)

PS4 2014 03 27: The Inception Report is now available on the VAWG blog. It is well worth reading, especially the sections explaining the methodology and the evaluation team’s response to comments by the the Specialised Evaluation and Quality Assurance Service (SEQUAS, 4 March 2014) on pages 56-62, some of which are quite tough.

Some related/relevant reading:

Compasss: A web site devoted to resources on QCA
Goertz, Gary, and James Mahoney. A Tale of Two Cultures: Qualitative and Quantitative Research in the Social Sciences. Princeton University Press, 2012.
Mahoney, James. “Mahoney, J. (2012). The Logic of Process Tracing Tests in the Social Sciences. 1-28.” Sociological Methods & Research XX(X) (March 2, 2012): 1–28. doi:10.1177/0049124112437709.
Where there is no single Theory of Change: The uses of Decision Tree models. Rick Davies, 2012

Utilization Focused Evaluation A primer for evaluators

by Ricardo Ramirez, Dal Brodhead. 2013. Available as pdf.

Also available in Spanish and French. See http://evaluationandcommunicationinpractice.ca/ufe-primer-get-it-here/

“Who is this Primer For?

This Primer is for practitioner evaluators who have heard of UFE and are keen to test drive the approach. Throughout this Primer we refer to the value of having a mentor to assist an evaluator who is using UFE for the first time. Our collective experiences with UFE indicated having a mentor was, for many UFE participants, an essential support and it reflects how we learned and mentored UFE.

>Evaluators may use elements of a UFE in their work naturally, for example by engaging users in planning the process or in assisting them in the utilization of findings. This Primer, however, walks the reader through UFE by systematically covering all of the 12 steps. It reflects deeply on the UFE evaluation practice and builds from it.

A second audience for the Primer is project implementers. In the five UFE experiences that underpin this Primer, the primary users of the evaluations were the research projects’ implementers — although other users could have been selected such as funders or beneficiaries. This qualification is important as the Primer will also interest funders of research and commissioners of evaluation.

Funders frequently have resources to commission evaluations. Funders have the power to support useful evaluations. They can, as well, choose not to support the evaluations. Supporting useful evaluation using UFE requires working differently than in the past with regards to both evaluators and the evaluands. This Primer offers some insights into how to do this.

While this Primer is based on UFE experiences completed with five research projects in the field of ICTD, there is scope for the lessons to apply to a wider variety of project in other sectors.

This primer is not a stand-alone manual. For that purpose readers are referred to the fourth edition of Utilization-Focused Evaluation by Michael Quinn Patton, as well as his most recent Essentials of Utilization-Focused Evaluation (2012)

This primer is also not a training module. Readers interested in that use are referred to the UFE Curriculum. It provides modules that were developed and adapted to different audiences. They are available at: http://evaluationinpractice.wordpress.com/ “

Rapid needs assessments: Severity and priority measures

by Aldo Benini (received 8th October 2013) abenini@starpower.net http://aldo-benini.org/

“Rapid assessments after disasters gauge the intensity of unmet needs across various spheres of life, commonly referred to as “sectors”. Sometimes two different measures of needs are used concurrently – a “severity score” independently given in each sector and a “priority score”, a relative measure comparing levels of needs to those of other sectors. Needs in every assessed locality are thus scored twice.

“Severity and priority – Their measurement in rapid needs assessments” clarifies the conceptual relationship. Aldo Benini wrote this note for the Assessment Capacities Project (ACAPS) in Geneva following the Second Joint Rapid Assessment of Northern Syria (J-RANS II) in May 2013. It investigates the construction and functioning of severity and priority scales, using data from Syria as well as from an earlier assessment in Yemen. In both assessments, the severity scales differentiated poorly. Therefore an artificial dataset was created to simulate what associations can realistically be expected between severity and priority measures. The note discusses several alternative measurement formulations and the logic of comparisons among sectors and among affected locations.

Readers find the note as well as files needed to replicate the simulation here; the author welcomes new ideas for the measurement of the severity and priority of needs in general and improvements to the simulation code in particular.”

Planning Evaluability Assessments: A Synthesis of the Literature with Recommendations

Report of a study commissioned by the Department for International Development
DFID Working Paper No. 40. By Rick Davies, August 2013. Available as pdf
See also the DFID website:https://www.gov.uk/government/publications/planning-evaluability-assessments

[From the Executive Summary] “The purpose of this synthesis paper is to produce a short practically oriented report that summarises the literature on Evaluability Assessments, and highlights the main issues for consideration in planning an Evaluability Assessment. The paper was commissioned by the Evaluation Department of the UK Department for International Development (DFID) but intended for use both within and beyond DFID.

The synthesis process began with an online literature search, carried out in November 2012. The search generated a bibliography of 133 documents including journal articles, books, reports and web pages, published from 1979 onwards. Approximately half (44%) of the documents were produced by international development agencies. The main focus of the synthesis is on the experience of international agencies and on recommendations relevant to their field of work.

Amongst those agencies the following OECD DAC definition of evaluability is widely accepted and has been applied within this report: “The extent to which an activity or project can be evaluated in a reliable and credible fashion”.

Eighteen recommendations about the use of Evaluability Assessments are presented here [in the Executive Summary], based on the synthesis of the literature in the main body of the report. The report is supported by annexes, which include an outline structure for Terms of Reference for an Evaluability Assessment.]

The full bibliography referred to in the study can be found online here: http://mande.co.uk/wp-content/uploads/2013/02/Zotero-report.htm

Postscript: A relevant xkcd perspective?

Aid on the Edge of Chaos…

Like this:

Multiple Pathways to Policy Impact: Testing an Uptake Theory with QCA

Like this:

Learning about Measuring Advocacy and Policy Change: Are Baselines always Feasible and Desirable?

Like this:

THE CHALLENGES OF EVIDENCE

Like this:

Learning about Theories of Change for the Monitoring and Evaluation of Research Uptake

Like this:

Evidence-Based Policy and Systemic Change: Conflicting Trends?

Like this:

A review of evaluations of interventions related to violence against women and girls – using QCA and process tracing

Like this:

Utilization Focused Evaluation A primer for evaluators

Like this:

Rapid needs assessments: Severity and priority measures

Like this:

Planning Evaluability Assessments: A Synthesis of the Literature with Recommendations

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this: