Process Tracing as a Practical Evaluation Method: Comparative Learning from Six Evaluations

By Alix Wadeson, Bernardo Monzani and Tom Aston
March 2020. pdf available here

Rick Davies comment: This is the most interesting and useful paper I have seen yet written on process tracing and its use for evaluation purposes. A good mix of methodology discussion, practical examples and useful recommendations.

The impact of impact evaluation

WIDER Working Paper 2020/20. Richard Manning, Ian Goldman, and Gonzalo Hernández Licona. PDF copy available

Abstract: In 2006 the Center for Global Development’s report ‘When Will We Ever Learn? Improving lives through impact evaluation’ bemoaned the lack of rigorous impact evaluations. The authors of the present paper researched international organizations and countries including Mexico, Colombia, South Africa, Uganda, and Philippines to understand how impact evaluations and systematic reviews are being implemented and used, drawing out the emerging lessons. The number of impact evaluations has risen (to over 500 per year), as have those of systematic reviews and other synthesis products, such as evidence maps. However, impact evaluations are too often donor-driven, and not embedded in partner governments. The willingness of politicians and top policymakers to take evidence seriously is variable, even in a single country, and the use of evidence is not tracked well enough. We need to see impact evaluations within a broader spectrum of tools available to support policymakers, ranging from evidence maps, rapid evaluations, and rapid synthesis work, to formative/process evaluations and classic impact evaluations and systematic reviews.

Selected quotes

4.1 Adoption of IEs

On the basis of our survey, we feel that real progress has been made since 2006 in the adoption of IEs to assess programmes and policies in LMICs. As shown above, this progress has not just been in terms of the number of IEs commissioned, but also in the topics covered, and in the development of a more flexible suite of IE products. There is also some evidence, though mainly anecdotal, that the insistence of the IE community on rigour has had some effect both in levering up the quality of other forms of evaluation and in gaining wider acceptance that ‘before and after’ evaluations with no valid control group tell one very little about the real impact of interventions. In some countries, such as South Africa, Mexico, and Colombia, institutional arrangements have favoured the use of evaluations, including IEs, although more uptake is needed.

There is also perhaps a clearer understanding of where IE techniques can or cannot usefully be applied, or combined with other types of evaluation.

At the same time, some limitations are evident. In the first place, despite the application of IE techniques to new areas, the field remains dominated by medical trials and interventions in the social sectors. Second, even in the health sector, other types of evaluation still account for the bulk of total evaluations, whether by donor agencies or LMIC governments.

Third, despite the increase in willingness of a few LMICs to finance and commission their own IEs, the majority of IEs on policies and programmes in such countries are still financed and commissioned by donor agencies, albeit in some cases with the topics defined by the countries, such as in 3ie’s policy windows. In quite a few cases, the prime objectives of such IEs are domestic accountability and/or learning within the donor agency. We believe that greater local ownership of IEs is highly desirable. While there is much that could not have been achieved without donor finance and commissioning, our sense is that—as with other forms of evaluation—a more balanced pattern of finance and commissioning is needed if IEs are to become a more accepted part of national evidence systems.

Fourth, the vast majority of IEs in LMICs appear to have ‘northern’ principal investigators. Undoubtedly, quality and rigour are essential to IEs, but it is important that IEs should not be perceived as a supply-driven product of a limited number of high-level academic departments in, for the most part, Anglo-Saxon universities, sometimes mediated through specialist consultancy firms. Fortunately, ‘southern’ capacity is increasing, and some programmes have made significant investments in developing this. We take the view that this progress needs to be ramped up very considerably in the interests of sustainability, local institutional development, and contributing over time to the local culture of evidence.

Fifth, as pointed out in Section 2.1, the financing of IEs depends to a troubling extent on a small body of official agencies and foundations that regard IEs as extremely important products. Major shifts in policy by even a few such agencies could radically reduce the number of IEs being financed.

Finally, while IEs of individual interventions are numerous and often valuable to the programmes concerned, IEs that transform thinking about policies or broad approaches to key issues of development are less evident. The natural tools for such results are more often synthesis products than one-off IEs, and to these we now turn.

4.2 Adoption of synthesis products (building body of evidence)

Systematic reviews and other meta-analyses depend on an adequate underpinning of well structured IEs, although methodological innovation is now using a more diverse set of sources. The take-off of such products therefore followed the rise in the stock of IEs, and can be regarded as a further wave of the ‘evidence revolution’, as it has been described by Howard White (2019). Such products are increasingly necessary, as the evidence from individual IEs grows.

As with IEs, synthesis products have diversified from full systematic reviews to a more flexible suite of products. We noted examples from international agencies in Section 2.1 and to a lesser extent from countries in Section 3, but many more could be cited. In several cases, synthesis products seek to integrate evidence from quasi-experimental evaluations (e.g. J-PAL’s Policy Insights) or other high-quality research and evaluation evidence.

The need to understand what is now available and where the main gaps in knowledge exist has led in recent years to the burgeoning of evidence maps, pioneered by 3ie but now produced by a variety of institutions and countries. The example of the 500+ evaluations in Uganda cited earlier shows the range of evidence that already exists, which should be mapped and used before new evidence is sought. This should be a priority in all countries.

The popularity of evidence maps shows that there is now a real demand to ‘navigate’ the growing body of IE-based evidence in an efficient manner, as well as to understand the gaps that still exist. The innovation happening also in rapid synthesis shows the demand for synthesis products—but more synthesis is still needed in many sectors and, bearing in mind the expansion in IEs, should be increasingly possible.

Bayesian belief networks – Their use in humanitarian scenarios: An invitation to explorers

By Aldo Benini. July 2018. Available here as a pdf


This is an invitation for humanitarian data analysts and others –  assessment, policy and advocacy specialists, response planners and grant writers – to enhance the reach and quality of scenarios by means of so-called Bayesian belief networks. Belief networks are a powerful technique for structuring scenarios in a qualitative as well as quantitative approach. Modern software, with elegant graphical user interfaces, makes for rapid learning, convenient drafting, effortless calculation and compelling presentation in workshops, reports and Web pages.

In recent years, scenario development in humanitarian analysis has grown. Until now, however, the community has hardly ever tried out belief networks, in contrast to the natural disaster and ecological communities. This note offers a small demonstration. We build a simple belief network using information currently (mid-July 2018) available on a recent violent crisis in Nigeria. We produce and discuss several possible scenarios for the next three months, computing probabilities of two humanitarian outcomes.
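To give a feel for the kind of calculation a belief network automates, here is a minimal sketch in plain Python. It is not the model from the note: the node names (violence, access, high displacement), the network shape and every probability are hypothetical values chosen for illustration, and only one outcome is computed rather than the two discussed by the author. The function sums over all combinations of the two driver nodes that are consistent with a chosen scenario.

```python
from itertools import product

# Hypothetical priors for two driver nodes (illustrative numbers only).
P_VIOLENCE = {True: 0.4, False: 0.6}  # violence escalates in the next three months
P_ACCESS = {True: 0.7, False: 0.3}    # humanitarian access stays open

# Hypothetical conditional probability table:
# P(high displacement | violence, access)
P_DISPLACEMENT = {
    (True, True): 0.6,
    (True, False): 0.9,
    (False, True): 0.1,
    (False, False): 0.3,
}


def prob_high_displacement(evidence=None):
    """Return P(high displacement | evidence) by enumerating all driver states.

    `evidence` fixes some driver nodes, e.g. {"violence": True} for a scenario
    in which violence is assumed to escalate.
    """
    evidence = evidence or {}
    numerator = 0.0
    denominator = 0.0
    for violence, access in product([True, False], repeat=2):
        state = {"violence": violence, "access": access}
        # Skip driver states that contradict the scenario being examined.
        if any(state[name] != value for name, value in evidence.items()):
            continue
        weight = P_VIOLENCE[violence] * P_ACCESS[access]
        denominator += weight
        numerator += weight * P_DISPLACEMENT[(violence, access)]
    return numerator / denominator


baseline = prob_high_displacement()
escalation = prob_high_displacement({"violence": True})
print(f"Baseline: {baseline:.3f}; escalation scenario: {escalation:.3f}")
```

With these made-up numbers the baseline probability is 0.372 and the escalation scenario gives 0.69. Dedicated belief network software does the same bookkeeping for much larger networks and lets evidence be entered on any node, including the outcomes.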

Figure 1: Belief network with probability bar charts (segment)

We conclude with reflections on the contributions of belief networks to humanitarian scenario building and elsewhere. While much speaks for this technique, the growth of competence, the uses in workshops and the interpretation of graphs and statistics need to be fostered cautiously, with consideration for the real-world complexity and for the doubts that stakeholders may harbor about quantitative approaches. This note is in its first draft. It needs to be revised, possibly by several authors, in order to connect to progress in humanitarian scenario methodologies, expert judgment and workshop didactics.

RD Comment: See also the comment and links provided below by Simon Henderson on his experience (with IOD/PARC) of trialing the use of Bayesian belief networks

Representing Theories of Change: Technical Challenges with Evaluation Consequences

A CEDIL Inception Paper, by Rick Davies. August 2018.  A pdf copy is available here 


Abstract: This paper looks at the technical issues associated with the representation of Theories of Change and the implications of design choices for the evaluability of those theories. The focus is on the description of connections between events rather than the events themselves, because this is seen as a widespread design weakness. Using examples and evidence from Internet sources, six structural problems are described, along with their consequences for evaluation.

The paper then outlines a range of different ways of addressing these problems which could be used by programme designers, implementers and evaluators. The paper concludes with some cautious speculation on why the design problems are so endemic, but also points a way forward. Four strands of work are identified that CEDIL and DFID could invest in to develop solutions identified in the paper.

Table of Contents

What is a theory of change?
What is the problem?
A summary of the problems….
And a word in defence….
Six possible ways forward
Why so little progress?
Implications for CEDIL and DFID

Postscript: Michael Bamberger’s 2018 07 13 comments on this paper

I think this is an extremely useful and well-documented paper.  Framing the discussion around the 6 problems, and the possible ways forward is a good way to organize the presentation.  The documentation and links that you present will be greatly appreciated, as well as the graphical illustrations of the different approaches.
Without getting into too much detail, the following are a few general thoughts on this very useful paper:
  1. A criticism of many TOCs is that they only describe how a program will achieve its intended objectives and they do not address the challenges of identifying and monitoring potential unintended, and often undesired, outcomes (UOs). While some UOs could not have been anticipated, many others could, and these should perhaps be built into the model.  For example, there is an extensive literature documenting negative consequences for women of political and economic empowerment, often including increased domestic violence.  So these could be built into the TOC, but in many cases they are not.
  2. Many, but certainly not all, TOCs do not adequately address the challenges of emergence: the fact that the environment in which the program operates, the political and organizational arrangements, and the characteristics of the target population and how they respond to the program are all likely to change significantly during the life of the project.  Many TOCs implicitly assume that the project and its environment remain relatively stable throughout the project lifetime.  Of course, many of the models you describe do not assume a stable environment, but it might be useful to flag the challenges of emergence. Many agencies are starting to become interested in agile project management to address the emergence challenge.
  3. Given the increasing recognition that most evaluation approaches do not adequately address complexity, and the interest in complexity-responsive evaluation approaches, you might like to focus more directly on how TOCs can address complexity.  Complexity is, of course, implicit in much of your discussion, but it might be useful to highlight the term.
  4. Do you think it would be useful to include a section on how big data and data analytics can strengthen the ability to develop more sophisticated TOCs?  Many agencies may feel that many of the techniques you mention would not be feasible with the kinds of data they collect and their current analytical tools.
  5. Related to the previous point, it might be useful to include a brief discussion of how accessible the quite sophisticated methods that you discuss would be to many evaluation offices.  What kinds of expertise would be required?  Where would the data come from?  How much would it cost?  You don’t need to go into too much detail but many readers would like guidance on which approaches are likely to be accessible to which kinds of agency.
  6. Your discussion of “Why so little progress?” is critical.  It is my impression that among the agencies with whom I have worked,  while many evaluations pay lip-service to TOC, the full potential of the approach is very often not utilized.  Often the TOC is constructed at the start of a project with major inputs from an external consultant.  The framework is then rarely consulted again until the final evaluation report is being written, and there are even fewer instances where it is regularly tested, updated and revised.  There are of course many exceptions, and I am sure experience may be different with other kinds of agencies.  However, I think that many implementing agencies (and many donors) have very limited expectations concerning what they hope TOC will contribute.  There is probably very little appetite among many implementing agencies (as opposed to a few funding agencies such as DFID) for more refined models.
  7. Among agencies where this is the case, it will be necessary to demonstrate the value-added of investing time and resources in more refined TOCs.  So it might be useful to expand the discussion of the very practical, as opposed to the broader theoretical, justifications for investing in the existing TOC.
  8. In addition to the above considerations, many evaluators tend to be quite conservative in their choice of methodologies and they are often reluctant to adopt new methodologies – particularly if these use approaches with which they are not familiar.  New approaches, such as some of those you describe can also be seen as threatening if they might undermine the status of the evaluation professional as expert in his/her field.

Searching for Success: A Mixed Methods Approach to Identifying and Examining Positive Outliers in Development Outcomes

by Caryn Peiffer and Rosita Armytage, April 2018, Development Leadership Program Research Paper 52. Available as pdf

Summary: Increasingly, development scholars and practitioners are reaching for exceptional examples of positive change to better understand how developmental progress occurs. These are often referred to as ‘positive outliers’, but also ‘positive deviants’ and ‘pockets of effectiveness’.
Studies in this literature promise to identify and examine positive developmental change occurring in otherwise poorly governed states. However, to identify success stories, such research largely relies on cases’ reputations, and, by doing so, overlooks cases that have not yet garnered a reputation for their developmental progress.

This paper presents a novel three-stage methodology for identifying and examining positive outlier cases that does not rely solely on reputations. It therefore promises to uncover ‘hidden’ cases of developmental progress as well as those that have been recognised.

The utility of the methodology is demonstrated through its use in uncovering two country case studies in which surprising rates of bribery reduction occurred, though the methodology has much broader applicability. The advantage of the methodology is validated by the fact that, in both of the cases identified, the reductions in bribery that occurred were largely previously unrecognised.

Introduction
Literature review: How positive outliers are selected
Stage 1: Statistically identifying potential positive outliers in bribery reduction
Stage 2: Triangulating statistical data
Stage 3: In-country case study fieldwork
Promise realised: Uncovering hidden ‘positive outliers’
Conclusion
References
Appendix: Excluded samples from pooled GCB dataset

Rick Davies comment: This is a paper that has been waiting to be published, one that unites a qual and quant approach to identifying AND understanding positive deviance / positive outliers [I do prefer the latter term, promoted by the authors of this paper]

The authors use regression analysis to identify statistical outliers, which is appropriate where numerical data is available. Where the data is binary/categorical it is possible to use other methods to identify such outliers. See this page on the use of the EvaLC3 Excel app to find positive outliers in binary data sets.
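As a rough illustration of the regression route, the sketch below fits a one-predictor least-squares line and flags cases whose outcome sits well above what the line predicts. This is not the authors' model or data: the case names, the single predictor, the outcome values and the cut-off of two standard deviations of the residuals are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical cases: (name, predictor, outcome). The outcome could stand for
# a reduction in bribery rates, and the predictor for any covariate thought
# to explain it. All values are invented for illustration.
CASES = [
    ("A", 1, 1.0), ("B", 2, 2.0), ("C", 3, 3.0), ("D", 4, 4.0),
    ("E", 5, 11.0),  # does far better than its predictor value suggests
    ("F", 6, 6.0), ("G", 7, 7.0), ("H", 8, 8.0), ("I", 9, 9.0),
    ("J", 10, 10.0),
]


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def positive_outliers(cases, threshold=2.0):
    """Names of cases whose residual exceeds `threshold` residual std devs."""
    names = [name for name, _, _ in cases]
    xs = [x for _, x, _ in cases]
    ys = [y for _, _, y in cases]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    # Only positive residuals count: the case did better than predicted.
    return [n for n, r in zip(names, residuals) if r > threshold * spread]


print(positive_outliers(CASES))  # ['E']
```

Cases flagged this way are only candidates. In the paper's terms they would still need triangulation against other data and in-country fieldwork before being treated as genuine positive outliers.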

Where there is no single Theory of Change: The uses of Decision Tree models

Eliciting tacit and multiple Theories of Change

Rick Davies, November 2012. Unpublished paper. Available as a pdf version here, and as a 4-page summary version

This paper begins by identifying situations where a theory-of-change led approach to evaluation can be difficult, if not impossible. It then introduces the idea of systematic rather than ad hoc data mining and the types of data mining approaches that exist. The rest of the paper then focuses on one data mining method known as Decision Trees, also known as Classification Trees.  The merits of Decision Tree models are spelled out and then the processes of constructing Decision Trees are explained. These include the use of computerised algorithms and ethnographic methods, using expert inquiry and more participatory processes. The relationships of Decision Tree analyses to related methods are then explored, specifically Qualitative Comparative Analysis (QCA) and Network Analysis. The final section of the paper identifies potential applications of Decision Tree analyses, covering the elicitation of tacit and multiple Theories of Change, the analysis of project generated data and the meta-analysis of data from multiple evaluations. Readers are encouraged to explore these usages.

Included in the list of merits of Decision Tree models is the possibility of differentiating what are necessary and/or sufficient causal conditions and the extent to which a cause is a contributory cause (a la Mayne)
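The necessary/sufficient distinction can be made concrete with a small check on binary case data, similar to what one reads off a single branch of a tree. The sketch below is not the procedure described in the paper: the attribute names (local_partner, training) and the five cases are hypothetical. A condition is treated as sufficient if every case with the condition has the outcome, and as necessary if every case with the outcome has the condition.

```python
# Hypothetical project cases with binary attributes and a binary outcome.
CASES = [
    {"local_partner": True, "training": True, "outcome": True},
    {"local_partner": True, "training": True, "outcome": True},
    {"local_partner": True, "training": False, "outcome": True},
    {"local_partner": True, "training": False, "outcome": False},
    {"local_partner": False, "training": False, "outcome": False},
]


def condition_status(cases, condition, outcome="outcome"):
    """Report whether `condition` is necessary and/or sufficient for `outcome`."""
    with_condition = [c for c in cases if c[condition]]
    with_outcome = [c for c in cases if c[outcome]]
    # Sufficient: the condition never occurs without the outcome.
    sufficient = bool(with_condition) and all(c[outcome] for c in with_condition)
    # Necessary: the outcome never occurs without the condition.
    necessary = bool(with_outcome) and all(c[condition] for c in with_outcome)
    return {"necessary": necessary, "sufficient": sufficient}


for name in ("local_partner", "training"):
    print(name, condition_status(CASES, name))
```

In this invented data set a local partner is necessary but not sufficient, while training is sufficient but not necessary. Real data rarely split this cleanly, which is where measures of how often a condition contributes to the outcome become more informative than a strict yes/no test.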

Comments on this paper are being sought. Please post them below or email Rick Davies at

Separate but related:

See also: An example application of Decision Tree (predictive) models (10th April 2013)

Postscript 2013 03 20: Probably the best book on Decision Tree algorithms is:

Rokach, Lior, and Oded Z. Maimon. Data Mining with Decision Trees: Theory and Applications. World Scientific, 2008. A pdf copy is available