Simple but not simplistic: Findings from a theory-driven retrospective evaluation of a small projects program

By Larry Dershem, Maya Komakhidze, Mariam Berianidze, in Evaluation and Program Planning 97 (2023) 102267. A link to the article is available, and will be active for 30 days. After that, contact the authors.

Why I like this evaluation – see below – and the lesson I may have learned

Background and purpose: From 2010–2019, United States Peace Corps Volunteers in Georgia implemented 270 small projects as part of the US Peace Corps/Georgia Small Projects Assistance (SPA) Program. In early 2020, the US Peace Corps/Georgia office commissioned a retrospective evaluation of these projects. The key evaluation questions were: 1) To what degree were SPA Program projects successful in achieving the SPA Program objectives over the ten years, 2) To what extent can the achieved outcomes be attributed to the SPA Program's interventions, and 3) How can the SPA Program be improved to increase the likelihood of success of future projects.

Methods: Three theory-driven methods were used to answer the evaluation questions. First, a performance rubric was collaboratively developed with SPA Program staff to clearly identify which small projects had achieved intended outcomes and satisfied the SPA Program's criteria for successful projects. Second, qualitative comparative analysis was used to understand the conditions that led to successful and unsuccessful projects and to obtain a causal package of conditions that was conducive to a successful outcome. Third, causal process tracing was used to unpack how and why the conjunction of conditions identified through qualitative comparative analysis was sufficient for a successful outcome.

Findings: Based on the performance rubric, thirty-one percent (82) of small projects were categorized as successful. Using Boolean minimization of a truth table based on cross-case analysis of successful projects, a causal package of five conditions was sufficient to produce the likelihood of a successful outcome. Of the five conditions in the causal package, the productive relationship of two conditions was sequential whereas for the remaining three conditions it was simultaneous. Distinctive characteristics explained the remaining successful projects that had only several of the five conditions present from the causal package. A causal package, comprised of the conjunction of two conditions, was sufficient to produce the likelihood of an unsuccessful project.

Conclusions: Despite having modest grant amounts, short implementation periods, and a relatively straightforward intervention logic, success in the SPA Program was uncommon over the ten years because a complex combination of conditions was necessary to achieve success. In contrast, project failure was more frequent and uncomplicated. However, by focusing on the causal package of five conditions during project design and implementation, the success of small projects can be increased.
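The truth-table step behind this kind of finding can be sketched in a few lines. This is a toy illustration of crisp-set QCA, not the authors' analysis: cases are grouped into truth-table rows by their condition profile, each row's consistency (the share of its cases showing the outcome) is scored, and the rows meeting the threshold become the input to Boolean minimization. The five cases below are invented for illustration.

```python
# Toy crisp-set QCA truth table: invented data, NOT the SPA data set.
from collections import defaultdict

cases = [  # (profile of three conditions as 0/1, outcome as 0/1)
    ((1, 1, 0), 1),
    ((1, 1, 0), 1),
    ((1, 0, 1), 0),
    ((0, 1, 1), 1),
    ((0, 1, 1), 0),
]

# Group cases into truth-table rows by condition profile.
rows = defaultdict(list)
for profile, outcome in cases:
    rows[profile].append(outcome)

THRESHOLD = 1.0  # perfect consistency, the crisp-set standard for sufficiency
for profile, outcomes in sorted(rows.items()):
    consistency = sum(outcomes) / len(outcomes)
    print(profile, round(consistency, 2), consistency >= THRESHOLD)
```

Only the (1, 1, 0) row is consistent with the outcome here; with real data, the consistent rows would then be minimized into a simpler sufficient expression.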

Why I like this paper:

1. The clear explanation of the basic QCA process
2. The detailed connection made between the conditions being investigated and the background theory of change about the projects being analysed.
3. The section on causal process tracing, which investigates alternative sequencing of conditions
4. The within-case descriptions of modal cases (true positives) and of the cases which were successful but not covered by the intermediate solution (false negatives), and the contextual background given for each of the conditions being investigated.
5. The investigation of the causes of the absence of the outcome, which all too often is not given sufficient attention in other studies/evaluations
6. The points made in the summary, especially about the possibility of causal configurations changing over time, and the proposal to include characteristics of the intermediate solution in the project proposal screening stage. It has bugged me for a long time how little attention is given to the theory embodied in project proposal screening processes, let alone to testing details of these assessments against subsequent outcomes. I know the authors were not proposing this specifically here, but the idea of revising the selection process in the light of new evidence on prior performance is consistent with it and makes a lot of sense
7. The fact that the data set is part of the paper and open to reanalysis by others (see below)

New lessons, at least for me, about satisficing versus optimising

It could be argued that the search for sufficient conditions (individual or configurations) is a minimalist ambition, a form of "satisficing" rather than optimising. In the above authors' analysis, their "intermediate solution", which met the criteria of sufficiency, accounted for 5 of the 12 cases where the expected outcome was present.

A more ambitious, optimising approach would be to seek maximum classification accuracy (= (TP+TN)/(TP+FP+FN+TN)), even if this comes at the initial cost of a few false positives. In my investigation of the same data set there was a single condition that was not sufficient, yet accounted for 9 of the same 12 cases (NEED). This was at the cost of some inconsistency, i.e. two false positives also being present when this single condition was present (Cases 10 & 25). This solution covered 75% of the cases with expected outcomes, versus 42% with the satisficing solution.
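The comparison can be made explicit in confusion-matrix terms. The sketch below uses only the figures quoted above (12 outcome-present cases; 5 covered by the sufficient solution with no false positives, 9 covered by the single condition with 2 false positives); since the outcome-absent counts are not restated here, accuracy is left as a general function rather than computed for the SPA data.

```python
# Coverage and accuracy for the two solutions discussed above.
def coverage(tp, outcome_present):
    """Share of outcome-present cases the solution accounts for."""
    return tp / outcome_present

def accuracy(tp, fp, fn, tn):
    """Classification accuracy = (TP + TN) / (TP + FP + FN + TN)."""
    return (tp + tn) / (tp + fp + fn + tn)

OUTCOME_PRESENT = 12

satisficing_tp, satisficing_fp = 5, 0  # sufficient intermediate solution
optimising_tp, optimising_fp = 9, 2    # single condition, not sufficient

print(round(coverage(satisficing_tp, OUTCOME_PRESENT), 2))  # 0.42
print(round(coverage(optimising_tp, OUTCOME_PRESENT), 2))   # 0.75
```

Whether the higher coverage is worth the two false positives is exactly the trade-off discussed next.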

What might need to be taken into account when considering this choice between optimising and satisficing? One factor is the nature of the performance of the two false-positive cases: was it near the boundary of what would be seen as successful performance, i.e. a near miss, or was it a really bad fail? Secondly, if it was a really bad fail, how significant was that degree of failure for the lives of the people involved? How damaging was it? Thirdly, how avoidable was that failure? In the future, is there a clear way in which these types of failure could be avoided, or not?

This argument relates to a point I have made on many occasions elsewhere. Different situations require different concerns about the nature of failure. An investor in the stock market can afford a high proportion of false positives in their predictions, so long as their classification accuracy is above 50% and they have plenty of time available. In the longer term they will be able to recover their losses and make a profit. But a brain surgeon can afford only an absolute minimum of false positives. If a patient dies as a result of a wrong interpretation of what was needed, that life is unrecoverable, and no amount of subsequent successful operations will make a difference. At the very most, the surgeon will have learnt how to avoid such catastrophic mistakes in the future.

So my argument here is: let's not be too satisfied with satisficing solutions. Let's make sure that we have at the very least tried to find the optimal solution (defined in terms of highest classification accuracy) and then looked closely at the extent to which that optimal solution can be afforded.

PS 1: Where there are "imbalanced classes", i.e. a high proportion of outcome-absent cases (or vice versa), an alternative measure known as "balanced accuracy" is preferred. Balanced accuracy = ((TP/(TP+FN)) + (TN/(TN+FP)))/2.
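A quick sketch of why balanced accuracy matters (the counts here are illustrative only, not from the SPA data): it is the mean of sensitivity and specificity, which stops a dominant class from flattering the score.

```python
# Balanced accuracy: mean of sensitivity (recall on outcome-present cases)
# and specificity (recall on outcome-absent cases). Illustrative counts only.
def balanced_accuracy(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# With imbalanced classes, plain accuracy can look high even when the
# minority (outcome-present) class is classified poorly:
tp, fp, fn, tn = 8, 10, 4, 90
plain = (tp + tn) / (tp + fp + fn + tn)       # 98/112, roughly 0.88
balanced = balanced_accuracy(tp, fp, fn, tn)  # (8/12 + 90/100)/2, roughly 0.78
```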

PS 2: If you have any examples of QCA studies that have compared sufficient solutions with non-sufficient but more (classification) accurate solutions, please let me know. They may be more common than I am assuming.

Process tracing: A list

  • Understanding Process Tracing, David Collier, University of California, Berkeley. PS: Political Science and Politics 44, no. 4 (2011): 823–30. 7 pages.
    • Abstract: “Process tracing is a fundamental tool of qualitative analysis. This method is often invoked by scholars who carry out within-case analysis based on qualitative data, yet frequently it is neither adequately understood nor rigorously applied. This deficit motivates this article, which offers a new framework for carrying out process tracing. The reformulation integrates discussions of process tracing and causal-process observations, gives greater attention to description as a key contribution, and emphasizes the causal sequence in which process-tracing observations can be situated. In the current period of major innovation in quantitative tools for causal inference, this reformulation is part of a wider, parallel effort to achieve greater systematization of qualitative methods. A key point here is that these methods can add inferential leverage that is often lacking in quantitative analysis. This article is accompanied by online teaching exercises, focused on four examples from American politics, two from comparative politics, three from international relations, and one from public health/epidemiology”
      • Great explanation of the difference between straw-in-the-wind tests, hoop tests, smoking-gun tests and doubly-decisive tests, using Sherlock Holmes story “Silver Blaze”
  • Case selection techniques in Process-tracing and the implications of taking the study of causal mechanisms seriously, Derek Beach, Rasmus Brun Pedersen, 2012, 33 pages
    • Abstract: “This paper develops guidelines for each of the three variants of Process-tracing (PT): explaining outcome PT, theory-testing, and theory-building PT. Case selection strategies are not relevant when we are engaging in explaining outcome PT due to the broader conceptualization of outcomes that is a product of the different understandings of case study research (and science itself) underlying this variant of PT. Here we simply select historically important cases because they are for instance the First World War, not a ‘case of’ failed deterrence or crisis decision-making. Within the two theory-centric variants of PT, typical case selection strategies are most applicable. A typical case is one that is a member of the set of X, Y and the relevant scope conditions for the mechanism. We put forward that pathway cases, where scores on other causes are controlled for, are less relevant when we take the study of mechanisms seriously in PT, given that we are focusing our attention on how a mechanism contributes to produce Y, not on the causal effects of an X upon values of Y. We also discuss the role that deviant cases play in theory-building PT, suggesting that PT cannot stand alone, but needs to be complemented with comparative analysis of the deviant case with typical cases”
  • Process-Tracing Methods: Foundations and Guidelines, Derek Beach, Rasmus Brun Pedersen,  The University of Michigan Press (15 Dec 2012), 248 pages.
    • Description: “Process-tracing in social science is a method for studying causal mechanisms linking causes with outcomes. This enables the researcher to make strong inferences about how a cause (or set of causes) contributes to producing an outcome. Derek Beach and Rasmus Brun Pedersen introduce a refined definition of process-tracing, differentiating it into three distinct variants and explaining the applications and limitations of each. The authors develop the underlying logic of process-tracing, including how one should understand causal mechanisms and how Bayesian logic enables strong within-case inferences. They provide instructions for identifying the variant of process-tracing most appropriate for the research question at hand and a set of guidelines for each stage of the research process.” View the Table of Contents here:
  • Mahoney, James. 2012. “The Logic of Process Tracing Tests in the Social Sciences.” Sociological Methods & Research XX(X) (March): 1–28. doi:10.1177/0049124112437709.
    • Abstract: This article discusses process tracing as a methodology for testing hypotheses in the social sciences. With process tracing tests, the analyst combines preexisting generalizations with specific observations from within a single case to make causal inferences about that case. Process tracing tests can be used to help establish that (1) an initial event or process took place, (2) a subsequent outcome also occurred, and (3) the former was a cause of the latter. The article focuses on the logic of different process tracing tests, including hoop tests, smoking gun tests, and straw in the wind tests. New criteria for judging the strength of these tests are developed using ideas concerning the relative importance of necessary and sufficient conditions. Similarities and differences between process tracing and the deductive-nomological model of explanation are explored.
  • Goertz, Gary, and James Mahoney. 2012. A Tale of Two Cultures: Qualitative and Quantitative Research in the Social Sciences. Princeton University Press. See chapter 8 on causal mechanisms and process tracing, and the surrounding chapters 7 and 9 which make up a section on within-case analysis
  • Hutchings, Claire. ‘Process Tracing: Draft Protocol’. Oxfam, 2013. Plus an associated blog posting and an Effectiveness Review which made use of the protocol
  • Schneider, C.Q., Rohlfing, I., 2013. Combining QCA and Process Tracing in Set-Theoretic Multi-Method Research. Sociological Methods & Research 42, 559–597. doi:10.1177/0049124113481341
    • Abstract: “Set-theoretic methods and Qualitative Comparative Analysis (QCA) in particular are case-based methods. There are, however, only few guidelines on how to combine them with qualitative case studies. Contributing to the literature on multi-method research (MMR), we offer the first comprehensive elaboration of principles for the integration of QCA and case studies with a special focus on case selection. We show that QCA’s reliance on set-relational causation in terms of necessity and sufficiency has important consequences for the choice of cases. Using real world data for both crisp-set and fuzzy-set QCA, we show what typical and deviant cases are in QCA-based MMR. In addition, we demonstrate how to select cases for comparative case studies aiming to discern causal mechanisms and address the puzzles behind deviant cases. Finally, we detail the implications of modifying the set-theoretic cross-case model in the light of case-study evidence. Following the principles developed in this article should increase the inferential leverage of set-theoretic MMR.”
  • Rohlfing, Ingo. “Comparative Hypothesis Testing Via Process Tracing.” Sociological Methods & Research 43, no. 4 (November 1, 2014): 606–42. doi:10.1177/0049124113503142.
    • Abstract: Causal inference via process tracing has received increasing attention during recent years. A 2 × 2 typology of hypothesis tests takes a central place in this debate. A discussion of the typology demonstrates that its role for causal inference can be improved further in three respects. First, the aim of this article is to formulate case selection principles for each of the four tests. Second, in focusing on the dimension of uniqueness of the 2 × 2 typology, I show that it is important to distinguish between theoretical and empirical uniqueness when choosing cases and generating inferences via process tracing. Third, I demonstrate that the standard reading of the so-called doubly decisive test is misleading. It conflates unique implications of a hypothesis with contradictory implications between one hypothesis and another. In order to remedy the current ambiguity of the dimension of uniqueness, I propose an expanded typology of hypothesis tests that is constituted by three dimensions.
  • Bennett, A., Checkel, J. (Eds.), 2014. Process Tracing: From Metaphor to Analytic Tool. Cambridge University Press
  • Befani, Barbara, and John Mayne. “Process Tracing and Contribution Analysis: A Combined Approach to Generative Causal Inference for Impact Evaluation.” IDS Bulletin 45, no. 6 (2014): 17–36. doi:10.1111/1759-5436.12110.
    • Abstract: This article proposes a combination of a popular evaluation approach, contribution analysis (CA), with an emerging method for causal inference, process tracing (PT). Both are grounded in generative causality and take a probabilistic approach to the interpretation of evidence. The combined approach is tested on the evaluation of the contribution of a teaching programme to the improvement of school performance of girls, and is shown to be preferable to either CA or PT alone. The proposed procedure shows that established Bayesian principles and PT tests, based on both science and common sense, can be applied to assess the strength of qualitative and quali-quantitative observations and evidence, collected within an overarching CA framework; thus shifting the focus of impact evaluation from ‘assessing impact’ to ‘assessing confidence’ (about impact).

  • Punton, M., Welle, K., 2015. Straws-in-the-wind, Hoops and Smoking Guns: What can Process Tracing Offer to Impact Evaluation?
    • Abstract:  “This CDI Practice Paper by Melanie Punton and Katharina Welle explains the methodological and theoretical foundations of process tracing, and discusses its potential application in international development impact evaluations. It draws on two early applications of process tracing for assessing impact in international development interventions: Oxfam Great Britain (GB)’s contribution to advancing universal health care in Ghana, and the impact of the Hunger and Nutrition Commitment Index (HANCI) on policy change in Tanzania. In a companion to this paper, Practice Paper 10 Annex describes the main steps in applying process tracing and provides some examples of how these steps might be applied in practice.”
  • Weller, N., & Barnes, J. (2016). Pathway Analysis and the search for causal mechanisms. Sociological Methods & Research, 45(3), 424–457.
    • Abstract: The study of causal mechanisms interests scholars across the social sciences. Case studies can be a valuable tool in developing knowledge and hypotheses about how causal mechanisms function. The usefulness of case studies in the search for causal mechanisms depends on effective case selection, and there are few existing guidelines for selecting cases to study causal mechanisms. We outline a general approach for selecting cases for pathway analysis: a mode of qualitative research that is part of a mixed-method research agenda, which seeks to (1) understand the mechanisms or links underlying an association between some explanatory variable, X1, and an outcome, Y, in particular cases and (2) generate insights from these cases about mechanisms in the unstudied population of cases featuring the X1/Y relationship. The gist of our approach is that researchers should choose cases for comparison in light of two criteria. The first criterion is the expected relationship between X1/Y, which is the degree to which cases are expected to feature the relationship of interest between X1 and Y. The second criterion is variation in case characteristics or the extent to which the cases are likely to feature differences in characteristics that can facilitate hypothesis generation. We demonstrate how to apply our approach and compare it to a leading example of pathway analysis in the so-called resource curse literature, a prominent example of a correlation featuring a nonlinear relationship and multiple causal mechanisms.
  • Befani, Barbara, and Gavin Stedman-Bryce. “Process Tracing and Bayesian Updating for Impact Evaluation.” Evaluation, June 24, 2016, 1356389016654584. doi:10.1177/1356389016654584.
    • Abstract: Commissioners of impact evaluation often place great emphasis on assessing the contribution made by a particular intervention in achieving one or more outcomes, commonly referred to as a ‘contribution claim’. Current theory-based approaches fail to provide evaluators with guidance on how to collect data and assess how strongly or weakly such data support contribution claims. This article presents a rigorous quali-quantitative approach to establish the validity of contribution claims in impact evaluation, with explicit criteria to guide evaluators in data collection and in measuring confidence in their findings. Coined ‘Contribution Tracing’, the approach is inspired by the principles of Process Tracing and Bayesian Updating, and attempts to make these accessible, relevant and applicable by evaluators. The Contribution Tracing approach, aided by a symbolic ‘contribution trial’, adds value to impact evaluation theory-based approaches by: reducing confirmation bias; improving the conceptual clarity and precision of theories of change; providing more transparency and predictability to data-collection efforts; and ultimately increasing the internal validity and credibility of evaluation findings, namely of qualitative statements. The approach is demonstrated in the impact evaluation of the Universal Health Care campaign, an advocacy campaign aimed at influencing health policy in Ghana.