Simple but not simplistic: Findings from a theory-driven retrospective evaluation of a small projects program

By Larry Dershem, Maya Komakhidze, and Mariam Berianidze, in Evaluation and Program Planning 97 (2023) 102267. A link to the article will be active for 30 days; after that, contact the authors.

Why I like this evaluation, and the lesson I may have learned – see below.

Background and purpose: From 2010–2019, the United States Peace Corps Volunteers in Georgia implemented 270 small projects as part of the US Peace Corps/Georgia Small Projects Assistance (SPA) Program. In early 2020, the US Peace Corps/Georgia office commissioned a retrospective evaluation of these projects. The key evaluation questions were: 1) To what degree were SPA Program projects successful in achieving the SPA Program objectives over the ten years? 2) To what extent can the achieved outcomes be attributed to the SPA Program’s interventions? 3) How can the SPA Program be improved to increase the likelihood of success of future projects?

Methods: Three theory-driven methods were used to answer the evaluation questions. First, a performance rubric was collaboratively developed with SPA Program staff to clearly identify which small projects had achieved intended outcomes and satisfied the SPA Program’s criteria for successful projects. Second, qualitative comparative analysis was used to understand the conditions that led to successful and unsuccessful projects and to obtain a causal package of conditions that was conducive to a successful outcome. Third, causal process tracing was used to unpack how and why the conjunction of conditions identified through qualitative comparative analysis was sufficient for a successful outcome.

Findings: Based on the performance rubric, thirty-one percent (82) of small projects were categorized as successful. Using Boolean minimization of a truth table based on cross-case analysis of successful projects, a causal package of five conditions was sufficient to produce the likelihood of a successful outcome. Of the five conditions in the causal package, the productive relationship of two conditions was sequential whereas for the remaining three conditions it was simultaneous. Distinctive characteristics explained the remaining successful projects that had only several of the five conditions present from the causal package. A causal package, comprised of the conjunction of two conditions, was sufficient to produce the likelihood of an unsuccessful project.

Conclusions: Despite having modest grant amounts, short implementation periods, and a relatively straightforward intervention logic, success in the SPA Program was uncommon over the ten years because a complex combination of conditions was necessary to achieve success. In contrast, project failure was more frequent and uncomplicated. However, by focusing on the causal package of five conditions during project design and implementation, the success of small projects can be increased.
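As background for readers new to QCA, here is a minimal sketch (my own, not from the paper) of what a truth table involves: cases are grouped by their configuration of conditions, and each configuration's consistency with the outcome is calculated before Boolean minimization. The condition labels, cases, and values below are invented for illustration.

```python
from collections import defaultdict

# Minimal QCA-style truth table sketch: group cases by configuration of
# conditions and compute each configuration's consistency with the outcome.
# All case data here are invented for illustration only.
cases = [
    # (case id, (C1, C2, C3), outcome) with 1 = present, 0 = absent
    ("P01", (1, 1, 1), 1),
    ("P02", (1, 1, 1), 1),
    ("P03", (1, 0, 1), 0),
    ("P04", (0, 0, 1), 0),
    ("P05", (1, 0, 1), 1),
]

rows = defaultdict(list)
for _, configuration, outcome in cases:
    rows[configuration].append(outcome)

for configuration, outcomes in sorted(rows.items(), reverse=True):
    consistency = sum(outcomes) / len(outcomes)
    print("conditions:", configuration,
          "| n =", len(outcomes),
          "| consistency =", round(consistency, 2))
```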

Why I like this paper:

1. The clear explanation of the basic QCA process
2. The detailed connection made between the conditions being investigated and the background theory of change about the projects being analysed.
3. The section on causal process tracing, which investigates alternative sequencing of conditions.
4. The within-case descriptions of modal cases (true positives) and of the cases which were successful but not covered by the intermediate solution (false negatives), and the contextual background given for each of the conditions being investigated.
5. The investigation of the causes of the absence of the outcome, which is all too often not given sufficient attention in other studies/evaluations.
6. The points made in the summary, especially about the possibility of causal configurations changing over time, and the proposal to include characteristics of the intermediate solution in the project proposal screening stage. It has bugged me for a long time how little attention is given to the theory embodied in project proposal screening processes, let alone to testing details of these assessments against subsequent outcomes. I know the authors were not proposing this specifically here, but the idea of revising the selection process in the light of new evidence about prior performance is consistent with their approach and makes a lot of sense.
7. The fact that the data set is part of the paper and open to reanalysis by others (see below)

New lessons, at least for me, about satisficing versus optimising

It could be argued that the search for sufficient conditions (individual conditions or configurations of conditions) is a minimalist ambition, a form of “satisficing” rather than optimising. In the above authors’ analysis, their “intermediate solution”, which met the criteria of sufficiency, accounted for 5 of the 12 cases where the expected outcome was present.

A more ambitious, optimising approach would be to seek maximum classification accuracy (= (TP+TN)/(TP+FP+FN+TN)), even if this comes at the initial cost of a few false positives. In my investigation of the same data set there was a single condition (NEED) that was not sufficient, yet accounted for 9 of the same 12 cases. This was at the cost of some inconsistency, i.e. two false positives also being present when this single condition was present (Cases 10 & 25). This solution covered 75% of the cases with expected outcomes, versus 42% with the satisficing solution.
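To make the satisficing/optimising comparison concrete, here is a minimal Python sketch (my own, not from the paper) of the two measures being discussed. Only the coverage figures quoted above are plugged in; the true-negative and false-positive totals needed for full classification accuracy depend on the complete data set, so that function is left generic.

```python
# Classification accuracy and coverage from confusion-matrix counts.
# Only the coverage counts below come from this post (5 of 12 vs 9 of 12
# outcome-present cases); accuracy also needs the TN and FP counts from
# the full data set, so the function is shown in generic form.

def classification_accuracy(tp, fp, fn, tn):
    """(TP + TN) / (TP + FP + FN + TN): share of all cases correctly classified."""
    return (tp + tn) / (tp + fp + fn + tn)

def coverage(tp, fn):
    """TP / (TP + FN): share of outcome-present cases covered by the solution."""
    return tp / (tp + fn)

print(coverage(tp=5, fn=7))   # sufficiency-based intermediate solution: ~0.42
print(coverage(tp=9, fn=3))   # single condition NEED: 0.75
```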

What might need to be taken into account when considering whether to prefer optimising over satisficing? One factor is the nature of the performance of the two false positive cases. Was it near the boundary of what would be seen as successful performance, i.e. a near miss? Or was it a really bad fail? Secondly, if it was a really bad fail, how significant was that degree of failure for the lives of the people involved? How damaging was it? Thirdly, how avoidable was that failure? In the future, is there a clear way in which these types of failure could be avoided, or not?

This argument relates to a point I have made on many occasions elsewhere. Different situations require different concerns about the nature of failure. An investor in the stock market can afford a high proportion of false positives in their predictions, so long as their classification accuracy is above 50% and they have plenty of time available. In the longer term they will be able to recover their losses and make a profit. But a brain surgeon can afford only an absolute minimum of false positives. If a patient dies as a result of a wrong interpretation of what was needed, that life is unrecoverable, and no amount of subsequent successful operations will make a difference. At the very most, the surgeon will have learnt how to avoid such catastrophic mistakes in the future.

So my argument here is: let’s not be too satisfied with satisficing solutions. Let’s make sure that we have at the very least always tried to find the optimal solution (defined in terms of highest classification accuracy) and then looked closely at the extent to which that optimal solution can be afforded.

PS 1: Where there are “imbalanced classes”, i.e. a high proportion of outcome-absent cases (or vice versa), an alternative measure known as “balanced accuracy” is preferred. Balanced accuracy = (TP/(TP+FN) + TN/(TN+FP))/2.
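A minimal sketch of the balanced accuracy calculation, with purely illustrative counts:

```python
# Balanced accuracy: the average of sensitivity (recall on outcome-present cases)
# and specificity (recall on outcome-absent cases), useful with imbalanced classes.

def balanced_accuracy(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# Illustrative counts only, not from the SPA data set:
print(balanced_accuracy(tp=9, fp=2, fn=3, tn=20))   # approx. 0.83
```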

PS 2: If you have any examples of QCA studies that have compared sufficient solutions with non-sufficient but more (classification) accurate solutions, please let me know. They may be more common than I am assuming.

Techniques to Identify Themes (in text/interview data)

Ryan, G. W., & Bernard, H. R. (2003). Techniques to Identify Themes. Field Methods, 15(1), 85–109. https://doi.org/10.1177/1525822X02239569  

Abstract: Theme identification is one of the most fundamental tasks in qualitative research. It also is one of the most mysterious. Explicit descriptions of theme discovery are rarely found in articles and reports, and when they are, they are often relegated to appendices or footnotes. Techniques are shared among small groups of social scientists, but sharing is impeded by disciplinary or epistemological boundaries. The techniques described here are drawn from across epistemological and disciplinary boundaries. They include both observational and manipulative techniques and range from quick word counts to laborious, in-depth, line-by-line scrutiny. Techniques are compared on six dimensions: (1) appropriateness for data types, (2) required labor, (3) required expertise, (4) stage of analysis, (5) number and types of themes to be generated, and (6) issues of reliability and validity.

Contents (as in headings used)
  • What is a theme
  • HOW DO YOU KNOW A THEME WHEN YOU SEE ONE?
  • WHERE DO THEMES COME FROM?
  • SCRUTINY TECHNIQUES—THINGS TO LOOK FOR
    • Repetitions
    • Indigenous Typologies or Categories
    • Metaphors and Analogies
    • Transitions
    • Similarities and Differences
    • Linguistic Connectors
    • Missing Data
    • Theory-Related Material
  • PROCESSING TECHNIQUES
    • Cutting and Sorting
    • Word Lists and Key Words in Context (KWIC)
    • Word Co-Occurrence
    • Metacoding
  • SELECTING AMONG TECHNIQUES
    • Kind of Data
    • Expertise
    • Labor
    • Number and Kinds of Themes
    • Reliability and Validity
  • FURTHER RESEARCH
  • NOTES
  • REFERENCES
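One of the processing techniques listed above, key words in context (KWIC), is easy to illustrate. A minimal sketch follows; it is my own toy example, not code from the paper.

```python
# Minimal key-words-in-context (KWIC) sketch: list each occurrence of a keyword
# with a window of surrounding words, a quick way to scan for candidate themes.

def kwic(text, keyword, window=4):
    words = text.split()
    hits = []
    for i, w in enumerate(words):
        if w.lower().strip(".,;:!?\"'") == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            hits.append(f"{left} [{w}] {right}")
    return hits

sample = ("The volunteers said the project gave them confidence. "
          "Confidence, they felt, came from doing the work themselves.")
for line in kwic(sample, "confidence"):
    print(line)
```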

An Institutional View of Algorithmic Impact Assessments

Selbst, A. (2021). An Institutional View of Algorithmic Impact Assessments. Harvard Journal of Law and Technology, 35(10), 78. The author has indicated that the downloadable version of the paper has “draft” status.
First some general points about its relevance:
  1. Rich people get personalised one-to-one attention and services. Poor people get processed by algorithms. That may be a bit of a caricature, but there is also some truth there. Consider loan applications, bail applications, recruitment decisions, welfare payments, and perhaps medical diagnoses and treatments, depending on the source of the service. There is therefore a good reason for any evaluators concerned with equity to pay close attention to how algorithms affect the lives of the poorest sections of societies.
  2. This paper reminded me of the importance of impact assessments, as distinct from impact evaluations. The former are concerned with “effects-of-a-cause”, as distinct from the “causes-of-an-effect”, which is the focus of impact evaluations. In this paper impact assessment is specifically concerned with negative impacts, which is a narrower ambit than I have seen previously in my sphere of work, but complementary to the expectations of positive impact associated with impact evaluations. It may reflect the narrowness of my inhabited part of the evaluation world, but my feeling is that impact evaluations get way more attention than impact assessments. Yet one could argue that the default situation should be the reverse. Though I can’t quite articulate my reasoning, I think it is something to do with the perception that, most of the time, the world acts on us rather than us acting on the world.
Some selected quotes:
  1. The impact assessment approach has two principal aims. The first goal is to get the people who build systems to think methodically about the details and potential impacts of a complex project before its implementation, and therefore head off risks before they become too costly to correct. As proponents of values-in-design have argued for decades, the earlier in project development that social values are considered, the more likely that the end result will reflect those social values. The second goal is to create and provide documentation of the decisions made in development and their rationales, which in turn can lead to better accountability for those decisions and useful information for future policy  interventions (p.6)
    1. This Article will argue in part that once filtered through the institutional logics of the private sector, the first goal of improving systems through better design will only be effective in those organizations motivated by social obligation rather than mere compliance, but second goal of producing information needed for better policy and public understanding is what really can make the AIA regime worthwhile (p.8)
  2. Among all possible regulatory approaches, impact assessments are most useful where projects have unknown and hard-to-measure impacts on society, where the people creating the project and the ones with the knowledge and expertise to estimate its impacts have inadequate incentives to generate the needed information, and where the public has no other means to create that information. What is attractive about the AIA (Algorithmic Impact Assessment) is that we are now in exactly such a situation with respect to algorithmic harms. (p.7)
  3. The Article proceeds in four parts. Part I introduces the AIA, and explains why it is likely a useful approach…. Part II briefly surveys different models of AIA that have been proposed as well as two alternatives: self-regulation and audits… Part III examines how institutional forces shape regulation and compliance, seeking to apply those lessons to the case of AIAs…. Ultimately, the Part concludes that AIAs may not be fully successful in their primary goal of getting individual firms to consider social problems early, but that the second goal of policy-learning may well be more successful because it does not require full substantive compliance. Finally, Part IV looks at what we can learn from the technical community. This part discusses many relevant developments within technology industry and scholarship: empirical research into how firms understand AI fairness and ethics, proposals for documentation standards coming from academic and industrial labs, trade groups, standards organizations, and various self-regulatory framework proposals. (p.9)

Five ways to ensure that models serve society: A manifesto

Saltelli, A., Bammer, G., Bruno, I., Charters, E., Fiore, M. D., Didier, E., Espeland, W. N., Kay, J., Piano, S. L., Mayo, D., Pielke Jr., R., Portaluri, T., Porter, T. M., Puy, A., Rafols, I., Ravetz, J. R., Reinert, E., Sarewitz, D., Stark, P. B., … Vineis, P. (2020). Five ways to ensure that models serve society: A manifesto. Nature, 582(7813), 482–484. https://doi.org/10.1038/d41586-020-01812-9

The five ways:

    1. Mind the assumptions
      • “One way to mitigate these issues is to perform global uncertainty and sensitivity analyses. In practice, that means allowing all that is uncertain — variables, mathematical relationships and boundary conditions — to vary simultaneously as runs of the model produce its range of predictions. This often reveals that the uncertainty in predictions is substantially larger than originally asserted”
    2. Mind the hubris
      • “Most modellers are aware that there is a tradeoff between the usefulness of a model and the breadth it tries to capture. But many are seduced by the idea of adding complexity in an attempt to capture reality more accurately. As modellers incorporate more phenomena, a model might fit better to the training data, but at a cost. Its predictions typically become less accurate.”
    3. Mind the framing
      • “Match purpose and context. Results from models will at least partly reflect the interests, disciplinary orientations and biases of the developers. No one model can serve all purposes.”
    4. Mind the consequences
      • “Quantification can backfire. Excessive regard for producing numbers can push a discipline away from being roughly right towards being precisely wrong. Undiscriminating use of statistical tests can substitute for sound judgement. By helping to make risky financial products seem safe, models contributed to derailing the global economy in 2007–08 (ref. 5).”
    5. Mind the unknowns
      • “Acknowledge ignorance. For most of the history of Western philosophy, self-awareness of ignorance was considered a virtue, the worthy object of intellectual pursuit.”

“Ignore the five, and model predictions become Trojan horses for unstated interests and values”

“Models’ assumptions and limitations must be appraised openly and honestly. Process and ethics matter as much as intellectual prowess”

“Mathematical models are a great way to explore questions. They are also a dangerous way to assert answers. Asking models for certainty or consensus is more a sign of the difficulties in making controversial decisions than it is a solution, and can invite ritualistic use of quantification”
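The global uncertainty and sensitivity analysis described under “Mind the assumptions” can be sketched very simply: all uncertain inputs of a model are varied simultaneously and the resulting spread of predictions is reported. The toy model and parameter ranges below are invented for illustration only.

```python
import random

# Toy model: the predicted outcome depends on three uncertain inputs.
def toy_model(growth_rate, uptake, leakage):
    return 1000 * growth_rate * uptake * (1 - leakage)

# Vary all uncertain inputs simultaneously (a simple Monte Carlo approach)
# and look at the range of predictions rather than a single point estimate.
random.seed(1)
predictions = []
for _ in range(10_000):
    growth_rate = random.uniform(0.8, 1.2)
    uptake = random.uniform(0.3, 0.9)
    leakage = random.uniform(0.0, 0.4)
    predictions.append(toy_model(growth_rate, uptake, leakage))

predictions.sort()
print("median prediction:", round(predictions[len(predictions) // 2]))
print("90% interval:", round(predictions[int(0.05 * len(predictions))]),
      "to", round(predictions[int(0.95 * len(predictions))]))
```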

A broken system – why literature searching needs a FAIR revolution

Gusenbauer, Michael, and Neal R. Haddaway. ‘Which Academic Search Systems Are Suitable for Systematic Reviews or Meta-Analyses? Evaluating Retrieval Qualities of Google Scholar, PubMed, and 26 Other Resources’. Research Synthesis Methods, 2019.

Haddaway, Neal, and Michael Gusenbauer. 2020. ‘A Broken System – Why Literature Searching Needs a FAIR Revolution’. LSE (blog). 3 February 2020.

“….searches on Google Scholar are neither reproducible, nor transparent.  Repeated searches often retrieve different results and users cannot specify detailed search queries, leaving it to the system to interpret what the user wants.

However, systematic reviews in particular need to use rigorous, scientific methods in their quest for research evidence. Searches for articles must be as objective, reproducible and transparent as possible. With systems like Google Scholar, searches are not reproducible – a central tenet of the scientific method. 

Specifically, we believe there is a very real need to drastically overhaul how we discover research, driven by the same ethos as in the Open Science movement. The FAIR data principles offer an excellent set of criteria that search system providers can adapt to make their search systems more adequate for scientific search, not just for systematic searching, but also in day-to-day research discovery:

  • Findable: Databases should be transparent in how search queries are interpreted and in the way they select and rank relevant records. With this transparency researchers should be able to choose fit-for-purpose databases clearly based on their merits.
  • Accessible: Databases should be free-to-use for research discovery (detailed analysis or visualisation could require payment). This way researchers can access all knowledge available via search.
  • Interoperable: Search results should be readily exportable in bulk for integration into evidence synthesis and citation network analysis (similar to the concept of ‘research weaving’ proposed by Shinichi Nakagawa and colleagues). Standardised export formats help analysis across databases.
  • Reusable: Citation information (including abstracts) should not be restricted by copyright to permit reuse/publication of summaries/text analysis etc.

Rick Davies comment: I highly recommend using Lens.org, a search facility mentioned in the second paper above.

Predict science to improve science

DellaVigna, Stefano, Devin Pope, and Eva Vivalt. 2019. ‘Predict Science to Improve Science’. Science 366 (6464): 428–29.

Selected quotes follow:

The limited attention paid to predictions of research results stands in contrast to a vast literature in the social sciences exploring people’s ability to make predictions in general.

We stress three main motivations for a more systematic collection of predictions of research results. 1. The nature of scientific progress. A new result builds on the consensus, or lack thereof, in an area and is often evaluated for how surprising, or not, it is. In turn, the novel result will lead to an updating of views. Yet we do not have a systematic procedure to capture the scientific views prior to a study, nor the updating that takes place afterward.

2. A second benefit of collecting predictions is that they can not only reveal when results are an important departure from expectations of the research community and improve the interpretation of research results, but they can also potentially help to mitigate publication bias. It is not uncommon for research findings to be met by claims that they are not surprising. This may be particularly true when researchers find null results, which are rarely published even when authors have used rigorous methods to answer important questions (15). However, if priors are collected before carrying out a study, the results can be compared to the average expert prediction, rather than to the null hypothesis of no effect. This would allow researchers to confirm that some results were unexpected, potentially making them more interesting and informative because they indicate rejection of a prior held by the research community; this could contribute to alleviating publication bias against null results.


3. A third benefit of collecting predictions systematically is that it makes it possible to improve the accuracy of predictions. In turn, this may help with experimental design. For example, envision a behavioral research team consulted to help a city recruit a more diverse police department. The team has a dozen ideas for reaching out to minority applicants, but the sample size allows for only three treatments to be tested with adequate statistical power. Fortunately, the team has recorded forecasts for several years, keeping track of predictive accuracy, and they have learned that they can combine team members’ predictions, giving more weight to “superforecasters” (9). Informed by its longitudinal data on forecasts, the team can elicit predictions for each potential project and weed out those interventions judged to have a low chance of success or focus on those interventions with a higher value of information. In addition, the research results of those projects that did go forward would be more impactful if accompanied by predictions that allow better interpretation of results in light of the conventional wisdom.
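The superforecaster-weighting idea in point 3 amounts to a weighted average of individual forecasts. A minimal sketch, with hypothetical names, forecasts, and weights (the paper does not prescribe a particular scheme):

```python
# Combine team members' forecasts, giving more weight to those with a better
# track record ("superforecasters"). Names, forecasts, and weights are hypothetical.

forecasts = {"Ana": 0.30, "Ben": 0.55, "Chris": 0.40}   # predicted effect sizes
weights = {"Ana": 2.0, "Ben": 1.0, "Chris": 1.0}        # higher = better track record

weighted_mean = (sum(forecasts[p] * weights[p] for p in forecasts)
                 / sum(weights.values()))
print(round(weighted_mean, 3))   # weighted average of the three forecasts
```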

Rick Davies comment: I have argued, for years, that evaluators should start by eliciting clients’ and other stakeholders’ predictions of the outcomes of interest that the evaluation might uncover (e.g. Bangladesh, 2004). But I can’t think of any instance, yet, where my efforts have been successful. I do, however, have an upcoming opportunity and will try once again, perhaps armed with these two papers.

See also Stefano DellaVigna and Devin Pope. 2016. ‘Predicting Experimental Results: Who Knows What?’ National Bureau of Economic Research.

ABSTRACT
Academic experts frequently recommend policies and treatments. But how well do they anticipate the impact of different treatments? And how do their predictions compare to the predictions of non-experts? We analyze how 208 experts forecast the results of 15 treatments involving monetary and non-monetary motivators in a real-effort task. We compare these forecasts to those made by PhD students and non-experts: undergraduates, MBAs, and an online sample. We document seven main results. First, the average forecast of experts predicts quite well the experimental results. Second, there is a strong wisdom-of-crowds effect: the average forecast outperforms 96 per cent of individual forecasts. Third, correlates of expertise—citations, academic rank, field, and contextual experience—do not improve forecasting accuracy. Fourth, experts as a group do better than non-experts, but not if accuracy is defined as rank-ordering treatments. Fifth, measures of effort, confidence, and revealed ability are predictive of forecast accuracy to some extent, especially for non-experts. Sixth, using these measures we identify ‘superforecasters’ among the non-experts who outperform the experts out of sample. Seventh, we document that these results on forecasting accuracy surprise the forecasters themselves. We present a simple model that organizes several of these results and we stress the implications for the collection of forecasts of future experimental results.

See also: The Social Science Prediction Platform, developed by the same authors.

Twitter responses to this post:

Howard White@HowardNWhite Ask decision-makers what they expect research findings to be before you conduct the research to help assess the impact of the research. Thanks to @MandE_NEWS for the pointer. https://socialscienceprediction.org

Marc Winokur@marc_winokur Replying to @HowardNWhite and @MandE_NEWS For our RCT of DR in CO, the child welfare decision makers expected a “no harm” finding for safety, while other stakeholders expected kids to be less safe. When we found no difference in safety outcomes, but improvements in family engagement, the research impact was more accepted

Nature editorial: “Tell it like it is”

22 January 2020. Aimed at researchers, but equally relevant to evaluators. Quoted in full below, available online here. Bold highlighting is mine

Every research paper tells a story, but the pressure to provide ‘clean’ narratives is harmful to the scientific endeavour. Research manuscripts provide an account of how their authors addressed a research question or questions, the means they used to do so, what they found and how the work (dis)confirms existing hypotheses or generates new ones. The current research culture is characterized by significant pressure to present research projects as conclusive narratives that leave no room for ambiguity or for conflicting or inconclusive results. The pressure to produce such clean narratives, however, represents a significant threat to validity and runs counter to the reality of what science looks like.

Prioritizing conclusive over transparent research narratives incentivizes a host of questionable research practices: hypothesizing after the results are known, selectively reporting only those outcomes that confirm the original predictions or excluding from the research report studies that provide contradictory or messy results. Each of these practices damages credibility and presents a distorted picture of the research that prevents cumulative knowledge.

During peer review, reviewers may occasionally suggest that the authors ‘reframe’ the reported work. While this is not problematic for exploratory research, it is inappropriate for confirmatory research—that is, research that tests pre-existing hypotheses. Altering the hypotheses or predictions of confirmatory research after the fact invalidates inference and renders the research fundamentally unreliable. Although these reframing suggestions are made in good faith, we will always overrule them, asking authors to present their hypotheses and predictions as originally intended.

Preregistration is being increasingly adopted across different fields as a means of preventing questionable research practices and increasing transparency. As a journal, we strongly support the preregistration of confirmatory research (and currently mandate registration for clinical trials). However, preregistration has little value if authors fail to abide by it or do not transparently report whether their project differs from what they preregistered and why. We ask that authors provide links to their preregistrations, specify the date of preregistration and transparently report any deviations from the original protocol in their manuscripts.

There is occasionally valid reason to deviate from the preregistered protocol, especially if that protocol did not have the benefit of peer review before the authors carried out their research (as in Registered Reports). For instance, it sometimes becomes apparent during peer review that a preregistered analysis is inappropriate or suboptimal. For all deviations from the preregistered protocol, we ask authors to indicate in their manuscripts how they deviated from their original plan and explain their reason for doing so (e.g., flaw, suboptimality, etc.). To ensure transparency, unless a preregistered analysis plan is unquestionably flawed, we ask that authors also report the results of their preregistered analyses alongside the new analyses.

Occasionally, authors may be tempted to drop a study from their report for reasons other than poor quality (or reviewers may make that recommendation)—for instance, because the results are incompatible with other studies reported in the paper. We discourage this practice; in multistudy research papers, we ask that authors report all of the work they carried out, regardless of outcome. Authors may speculate as to why some of their work failed to confirm their hypotheses and need to appropriately caveat their conclusions, but dropping studies simply exacerbates the file-drawer problem and presents the conclusions of research as more definitive than they are.

No research project is perfect; there are always limitations that also need to be transparently reported. In 2019, we made it a requirement that all our research papers include a limitations section, in which authors explain methodological and other shortcomings and explicitly acknowledge alternative interpretations of their findings.

Science is messy, and the results of research rarely conform fully to plan or expectation. ‘Clean’ narratives are an artefact of inappropriate pressures and the culture they have generated. We strongly support authors in their efforts to be transparent about what they did and what they found, and we commit to publishing work that is robust, transparent and appropriately presented, even if it does not yield ‘clean’ narratives.

Published online: 21 January 2020. https://doi.org/10.1038/s41562-020-0818-9

Mental models for conservation research and practice


Conservation Letters, February 2019. Katie Moon, Angela M. Guerrero, Vanessa M. Adams, Duan Biggs, Deborah A. Blackman, Luke Craven, Helen Dickinson, Helen Ross
https://conbio.onlinelibrary.wiley.com/doi/epdf/10.1111/conl.12642

Abstract: Conservation practice requires an understanding of complex social-ecological processes of a system and the different meanings and values that people attach to them. Mental models research offers a suite of methods that can be used to reveal these understandings and how they might affect conservation outcomes. Mental models are representations in people’s minds of how parts of the world work. We seek to demonstrate their value to conservation and assist practitioners and researchers in navigating the choices of methods available to elicit them. We begin by explaining some of the dominant applications of mental models in conservation: revealing individual assumptions about a system, developing a stakeholder-based model of the system, and creating a shared pathway to conservation. We then provide a framework to “walkthrough” the stepwise decisions in mental models research, with a focus on diagram-based methods. Finally, we discuss some of the limitations of mental models research and application that are important to consider. This work extends the use of mental models research in improving our ability to understand social-ecological systems, creating a powerful set of tools to inform and shape conservation initiatives.

Our paper aims to assist researchers and practitioners to navigate the choices available in mental models research methods. The paper is structured into three sections. The first section explores some of the dominant applications and thus value of mental models for conservation research and practice. The second section provides a “walk through” of the step-wise decisions that can be useful when engaging in mental models research, with a focus on diagram-based methods. We present a framework to assist in this “walk through,” which adopts a pragmatist perspective. This perspective focuses on the most appropriate strategies to understand and resolve problems, rather than holding to a firm philosophical position (e.g., Sil & Katzenstein, 2010). The third section discusses some of the limitations of mental models research and application.

1 INTRODUCTION

2 THE ROLE FOR MENTAL MODELS IN CONSERVATION

2.1 Revealing individual assumptions about a system

2.2 Developing a stakeholder-based model of the system

2.3 Creating a shared pathway to conservation

3 THE TYPE OF MENTAL MODEL NEEDED

4 ELICITING OR DEVELOPING CONCEPTS AND OBJECTS

5 MODELING RELATIONSHIPS WITHIN MENTAL MODELS

5.1 Mapping qualitative relationships

5.2 Quantifying qualitative relationships

5.3 Analyzing systems based on mental models

6 COMPARING MENTAL MODELS

7 LIMITATIONS OF MENTAL MODELS RESEARCH FOR CONSERVATION POLICY AND PRACTICE

8 ADVANCING MENTAL MODELS FOR CONSERVATION

Computational Modelling of Public Policy: Reflections on Practice

Gilbert N, Ahrweiler P, Barbrook-Johnson P, et al. (2018) Computational Modelling of Public Policy: Reflections on Practice. Journal of Artificial Societies and Social Simulation 21: 1–14. pdf copy available

Abstract: Computational models are increasingly being used to assist in developing, implementing and evaluating public policy. This paper reports on the experience of the authors in designing and using computational models of public policy (‘policy models’, for short). The paper considers the role of computational models in policy making, and some of the challenges that need to be overcome if policy models are to make an effective contribution. It suggests that policy models can have an important place in the policy process because they could allow policy makers to experiment in a virtual world, and have many advantages compared with randomised control trials and policy pilots. The paper then summarises some general lessons that can be extracted from the authors’ experience with policy modelling. These general lessons include the observation that often the main benefit of designing and using a model is that it provides an understanding of the policy domain, rather than the numbers it generates; that care needs to be taken that models are designed at an appropriate level of abstraction; that although appropriate data for calibration and validation may sometimes be in short supply, modelling is often still valuable; that modelling collaboratively and involving a range of stakeholders from the outset increases the likelihood that the model will be used and will be fit for purpose; that attention needs to be paid to effective communication between modellers and stakeholders; and that modelling for public policy involves ethical issues that need careful consideration. The paper concludes that policy modelling will continue to grow in importance as a component of public policy making processes, but if its potential is to be fully realised, there will need to be a melding of the cultures of computational modelling and policy making.

Selected quotes: For these reasons, the ability to make ‘point predictions’, i.e. forecasts of specific values at a specific time in the future, is rarely possible. More possible is a prediction that some event will or will not take place, or qualitative statements about the type or direction of change of values. Understanding what sort of unexpected outcomes can emerge and something of the nature of how these arise also helps design policies that can be responsive to unexpected outcomes when they do arise. It can be particularly helpful in changing environments to use the model to explore what might happen under a range of possible, but different, potential futures – without any commitment about which of these may eventually transpire. Even more valuable is a finding that the model shows that certain outcomes could not be achieved given the assumptions of the model. An example of this is the use of a whole system energy model to develop scenarios that meet the decarbonisation goals set by the EU for 2050 (see, for example, RAENG 2015).

Rick Davies comment: A concise and very informative summary with many useful references. Definitely worth reading! I like the big emphasis on the need for ongoing collaboration and communication between model developers and their clients and other model stakeholders. However, I would have liked to see some discussion of the pros and cons of different approaches to modelling, e.g. agent-based models vs Fuzzy Cognitive Mapping and other approaches, not just examples of different modelling applications, useful as they were.
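For readers unfamiliar with the second approach I mention, here is a minimal sketch of the basic mechanics of a Fuzzy Cognitive Map: concepts linked by signed, weighted influences, with activation levels updated iteratively. The concepts, weights, and starting values are all invented for illustration, and this is a bare-bones version of the technique rather than a recommended implementation.

```python
import math

# Minimal Fuzzy Cognitive Map sketch: concepts linked by signed weights,
# states updated iteratively and squashed into the 0-1 range.
concepts = ["funding", "community participation", "project success"]
# weights[i][j] = influence of concept i on concept j (invented values)
weights = [
    [0.0, 0.3, 0.5],   # funding -> participation, success
    [0.0, 0.0, 0.6],   # participation -> success
    [0.0, 0.2, 0.0],   # success feeds back to participation
]

def squash(x):
    return 1 / (1 + math.exp(-x))   # logistic squashing into (0, 1)

state = [0.8, 0.4, 0.1]              # initial activation levels
for _ in range(20):                  # iterate towards a steady pattern
    state = [squash(state[j] + sum(state[i] * weights[i][j] for i in range(3)))
             for j in range(3)]

for name, value in zip(concepts, state):
    print(f"{name}: {value:.2f}")
```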

See also: Uprichard, E and Penn, A (2016) Dependency Models: A CECAN Evaluation and Policy Practice Note for policy analysts and evaluators. CECAN. Available at: https://www.cecan.ac.uk/sites/default/files/2018-01/EMMA%20PPN%20v1.0.pdf (accessed 6 June 2018).

Wiki Surveys: Open and Quantifiable Social Data Collection

by Matthew J. Salganik and Karen E. C. Levy, PLOS ONE
Published: May 20, 2015. https://doi.org/10.1371/journal.pone.0123483

Abstract: In the social sciences, there is a longstanding tension between data collection methods that facilitate quantification and those that are open to unanticipated information. Advances in technology now enable new, hybrid methods that combine some of the benefits of both approaches. Drawing inspiration from online information aggregation systems like Wikipedia and from traditional survey research, we propose a new class of research instruments called wiki surveys. Just as Wikipedia evolves over time based on contributions from participants, we envision an evolving survey driven by contributions from respondents. We develop three general principles that underlie wiki surveys: they should be greedy, collaborative, and adaptive. Building on these principles, we develop methods for data collection and data analysis for one type of wiki survey, a pairwise wiki survey. Using two proof-of-concept case studies involving our free and open-source website www.allourideas.org, we show that pairwise wiki surveys can yield insights that would be difficult to obtain with other methods.

Also explained in detail in this Vimeo video: https://vimeo.com/51369546
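To give a feel for the mechanics, here is a minimal sketch of turning pairwise votes into a simple score per idea, using a plain win rate. The paper develops its own, more careful estimation methods; the ideas and votes below are invented.

```python
from collections import defaultdict

# Each vote is (winning idea, losing idea) from one pairwise comparison.
votes = [
    ("more bike lanes", "free parking"),
    ("more bike lanes", "wider roads"),
    ("free parking", "wider roads"),
    ("wider roads", "free parking"),
]

wins = defaultdict(int)
appearances = defaultdict(int)
for winner, loser in votes:
    wins[winner] += 1
    appearances[winner] += 1
    appearances[loser] += 1

# Simple score: the share of comparisons each idea has won.
for idea in sorted(appearances, key=lambda i: wins[i] / appearances[i], reverse=True):
    print(f"{idea}: {wins[idea] / appearances[idea]:.2f}")
```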
