Monitoring and Evaluation NEWS – Page 3 – A news service focusing on developments in monitoring and evaluation methods relevant to development programmes with social development objectives. Managed by Rick Davies, since 1997

On the usefulness of deliberate (but bounded) randomness in decision making

An introduction

In many spheres of human activity, relevant information may be hard to find, and it may be of variable quality. Human capacities to objectively assess that information may also be limited and variable. Extreme cases may be easy to assess, e.g. projects or research that are definitely worth/not worth funding, or papers that are definitely worth/not worth publishing. But in between these extremes there may be substantial uncertainty, and thus room for tacit assumptions and unrecognised biases to influence judgements. In some fields the size of this zone of uncertainty may be quite big (see Adam, 2019 below), so the consequences at stake can be substantial. This is the territory where a number of recent papers have argued that an explicitly random decision-making process may be the best approach to take.

After you have scanned the references below, continue on to some musings about the implications for how we think about complexity.

The literature (a sample)

- Nesta (2020) Why randomise funding? How randomisation can improve the diversity of ideas
- Osterloh, M., & Frey, B. S. (2020, March 9). To ensure the quality of peer reviewed research introduce randomness. Impact of Social Sciences. https://blogs.lse.ac.uk/impactofsocialsciences/2020/03/09/to-ensure-the-quality-of-peer-reviewed-research-introduce-randomness/
  - Why random selection of contributions to which the referees do not agree? This procedure reduces the “conservative bias”, i.e. the bias against unconventional ideas. Where there is uncertainty over the quality of a contribution, referees have little evidence to draw on in order to make accurate evaluations. However, unconventional ideas may well yield high returns in the future. Under these circumstances a randomised choice among the unorthodox contributions is advantageous.
  - …two [possible] types of error: type I errors (“reject errors”) implying that a correct hypothesis is rejected, and type 2 errors implying that a false hypothesis is accepted (“accept errors”). The former matters more than the latter. “Reject errors” stop promising new ideas, sometimes for a long time, while “accept errors” lead to a waste of money, but may be detected soon once published. This is the reason why it is more difficult to identify “reject errors” than “accept errors”. Through randomisation the risks of “reject errors” are diversified.

Osterloh, M., & Frey, B. S. (2020). How to avoid borrowed plumes in academia. Research Policy, 49(1), 103831. https://doi.org/10.1016/j.respol.2019.103831 Abstract: Publications in top journals today have a powerful influence on ac

Liu, M., Choy, V., Clarke, P., Barnett, A., Blakely, T., & Pomeroy, L. (2020). The acceptability of using a lottery to allocate research funding: A survey of applicants. Research Integrity and Peer Review, 5(1), 3. https://doi.org/10.1186/s41073-019-0089-z
- Background: The Health Research Council of New Zealand is the first major government funding agency to use a lottery to allocate research funding for their Explorer Grant scheme. … the Health Research Council of New Zealand wanted to hear from applicants about the acceptability of the randomisation process and anonymity of applicants. … The survey asked about the acceptability of using a lottery and if the lottery meant researchers took a different approach to their application. Results:… There was agreement that randomisation is an acceptable method for allocating Explorer Grant funds with 63% (n = 79) in favour and 25% (n = 32) against. There was less support for allocating funds randomly for other grant types with only 40% (n = 50) in favour and 37% (n = 46) against. Support for a lottery was higher amongst those that had won funding. Multiple respondents stated that they supported a lottery when ineligible applications had been excluded and outstanding applications funded, so that the remaining applications were truly equal. Most applicants reported that the lottery did not change the time they spent preparing their application. Conclusions: The Health Research Council’s experience through the Explorer Grant scheme supports further uptake of a modified lottery.

Roumbanis, L. (2019). Peer Review or Lottery? A Critical Analysis of Two Different Forms of Decision-making Mechanisms for Allocation of Research Grants. Science, Technology, & Human Values, 44(6), 994–1019. https://doi.org/10.1177/0162243918822744

Adam, D. (2019). Science funders gamble on grant lotteries. A growing number of research agencies are assigning money randomly. Nature, 575(7784), 574–575. https://doi.org/10.1038/d41586-019-03572-7
- …says that existing selection processes are inefficient. Scientists have to prepare lengthy applications, many of which are never funded, and assessment panels spend most of their time sorting out the specific order in which to place mid-ranking ideas. Low and high quality applications are easy to rank, she says. “But most applications are in the midfield, which is very big.”
- The fund tells applicants how far they got in the process, and feedback from them has been positive, he says. “Those that got into the ballot and miss out don’t feel as disappointed. They know they were good enough to get funded and take it as the luck of the draw.”

- Fang, F. C., & Casadevall, A. (2016). Research Funding: The Case for a Modified Lottery. MBio, 7(2).
  - ABSTRACT The time-honored mechanism of allocating funds based on ranking of proposals by scientific peer review is no longer effective, because review panels cannot accurately stratify proposals to identify the most meritorious ones. Bias has a major influence on funding decisions, and the impact of reviewer bias is magnified by low funding paylines. Despite more than a decade of funding crisis, there has been no fundamental reform in the mechanism for funding research. This essay explores the idea of awarding research funds on the basis of a modified lottery in which peer review is used to identify the most meritorious proposals, from which funded applications are selected by lottery. We suggest that a modified lottery for research fund allocation would have many advantages over the current system, including reducing bias and improving grantee diversity with regard to seniority, race, and gender.

Avin, S (2015) Breaking the grant cycle: on the rational allocation of public resources to scientific research projects
- Abstract: The thesis presents a reformative criticism of science funding by peer review. The criticism is based on epistemological scepticism, regarding the ability of scientific peers, or any other agent, to have access to sufficient information regarding the potential of proposed projects at the time of funding. The scepticism is based on the complexity of factors contributing to the merit of scientific projects, and the rate at which the parameters of this complex system change their values. By constructing models of different science funding mechanisms, a construction supported by historical evidence, computational simulations show that in a significant subset of cases it would be better to select research projects by a lottery mechanism than by selection based on peer review. This last result is used to create a template for an alternative funding mechanism that combines the merits of peer review with the benefits of random allocation, while noting that this alternative is not so far removed from current practice as may first appear.

Schulson, M. (2014). If you can’t choose wisely, choose randomly. Aeon. A quick review of known instances of the use of randomness across different cultures, nationalities and periods of history.

Fang, F. C., & Casadevall, A. (2014, April 14). Taking the Powerball Approach to Funding Medical Research. Wall Street Journal.

Stone, P. (2011). The Luck of the Draw: The Role of Lotteries in Decision Making.
- From the earliest times, people have used lotteries to make decisions–by drawing straws, tossing coins, picking names out of hats, and so on. We use lotteries to place citizens on juries, draft men into armies, assign students to schools, and even on very rare occasions, select lifeboat survivors to be eaten. Lotteries make a great deal of sense in all of these cases, and yet there is something absurd about them. Largely, this is because lottery-based decisions are not based upon reasons. In fact, lotteries actively prevent reason from playing a role in decision making at all. Over the years, people have devoted considerable effort to solving this paradox and thinking about the legitimacy of lotteries as a whole. However, these scholars have mainly focused on lotteries on a case-by-case basis, not as a part of a comprehensive political theory of lotteries. In The Luck of the Draw, Peter Stone surveys the variety of arguments proffered for and against lotteries and argues that they only have one true effect relevant to decision making: the “sanitizing effect” of preventing decisions from being made on the basis of reasons. While this rationale might sound strange to us, Stone contends that in many instances, it is vital that decisions be made without the use of reasons. By developing innovative principles for the use of lottery-based decision making, Stone lays a foundation for understanding when it is–and when it is not–appropriate to draw lots when making political decisions both large and small
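
The "modified lottery" idea that recurs in the references above (Fang & Casadevall, 2016; the Explorer Grant respondents in Liu et al., 2020) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any funder's actual procedure: the function name, the two score thresholds and the example scores are all hypothetical. Peer review is used only to exclude weak proposals and to fund the clearly outstanding ones; the remaining awards are drawn by lot from the midfield.

```python
import random


def modified_lottery(scores, n_awards, fund_above, reject_below, seed=None):
    """Allocate awards by peer review at the extremes and by lot in between.

    scores: dict mapping proposal id -> review score (hypothetical scale).
    fund_above / reject_below: hypothetical thresholds bounding the midfield.
    Returns (funded_on_merit, funded_by_lottery).
    """
    rng = random.Random(seed)

    # Clearly outstanding proposals are funded on merit, best first.
    outstanding = [p for p, s in scores.items() if s >= fund_above]
    outstanding.sort(key=lambda p: scores[p], reverse=True)
    funded_on_merit = outstanding[:n_awards]

    # Proposals below the lower threshold never enter the ballot.
    midfield = [p for p, s in scores.items() if reject_below <= s < fund_above]

    # Whatever awards remain are drawn at random from the midfield.
    remaining = n_awards - len(funded_on_merit)
    funded_by_lottery = rng.sample(midfield, min(remaining, len(midfield)))
    return funded_on_merit, funded_by_lottery


# Hypothetical review scores on a 0-10 scale.
example_scores = {"A": 9.5, "B": 9.1, "C": 7.0, "D": 6.8, "E": 6.5, "F": 6.1, "G": 3.0}
merit, lottery = modified_lottery(example_scores, n_awards=4,
                                  fund_above=9.0, reject_below=6.0, seed=42)
print("Funded on merit:", merit)
print("Funded by lottery:", lottery)
```

Passing a seed makes the draw reproducible, which a real scheme might want for auditability; that is a design choice of this sketch rather than something the papers prescribe.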

Randomness in other species

- Drew, L. (2020). Random Search Wired Into Animals May Help Them Hunt. Quanta Magazine. Retrieved 2 February 2021, from https://www.quantamagazine.org/random-search-wired-into-animals-may-help-them-hunt-20200611/
  - Of special interest here is the description of Levy walks, a variety of randomised movement where the frequency distribution of distances moved has one long tail. Levy walks have been the subject of exploration across multiple disciplines, as seen in…

- Reynolds, A. M. (2018). Current status and future directions of Lévy walk research. Biology Open, 7(1). https://doi.org/10.1242/bio.030106
  - Levy walks are specialised forms of random walks composed of clusters of multiple short steps with longer steps between them…. They are particularly advantageous when searching in uncertain or dynamic environments where the spatial scales of searching patterns cannot be tuned to target distributions…Nature repeatedly reveals the limits of our imagination. Lévy walks once thought to be the preserve of probabilistic foragers have now been identified in the movement patterns of human hunter-gatherers

Levy walk random versus Brownian motion random movement
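
The contrast between the two kinds of random movement can be made concrete with a small simulation. The sketch below is a simplification under assumptions of my own: Lévy step lengths are drawn from a Pareto (power-law) distribution with a hypothetical tail exponent of 1.5, Brownian step lengths from the absolute value of a Gaussian, and directions are uniform. The function names are illustrative only.

```python
import math
import random
import statistics


def levy_step(rng, alpha=1.5, min_step=1.0):
    """Heavy-tailed step length via inverse-transform sampling of a Pareto law."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids division by zero
    return min_step * u ** (-1.0 / alpha)


def brownian_step(rng, scale=1.0):
    """Thin-tailed step length: absolute value of a Gaussian draw."""
    return abs(rng.gauss(0.0, scale))


def walk(step_fn, n_steps, seed=0):
    """Simulate a 2-D random walk; return the end position and all step lengths."""
    rng = random.Random(seed)
    x = y = 0.0
    steps = []
    for _ in range(n_steps):
        length = step_fn(rng)
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x += length * math.cos(angle)
        y += length * math.sin(angle)
        steps.append(length)
    return (x, y), steps


def tail_ratio(steps):
    """Longest step relative to the median step: a crude measure of the long tail."""
    return max(steps) / statistics.median(steps)


_, levy_steps = walk(levy_step, 10_000)
_, brownian_steps = walk(brownian_step, 10_000)
print("Levy tail ratio:", round(tail_ratio(levy_steps), 1))
print("Brownian tail ratio:", round(tail_ratio(brownian_steps), 1))
```

In the Lévy case the longest step is typically orders of magnitude larger than the median step, which is the "clusters of short steps with longer steps between them" pattern described by Reynolds (2018); in the Brownian case the two stay within the same order of magnitude.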

Implications for thinking about complexity

Uncertainty of future states is a common characteristic of many complex systems, though not unique to these. One strategy that human organisations can use to deal with uncertainty is to build up capital reserves, thus enhancing longer term resilience, albeit at the cost of more immediate efficiencies. From the first set of papers referenced above, it seems like the deliberate and bounded use of randomness could provide a useful second option. The work being done on Levy walks also suggests that there are interesting variations on randomisation that should be explored. It is already the case that the designers of search/optimisation algorithms have headed this way. If you are interested, you can read further on the subject of what are called “Levy Flight” algorithms.

On a more light hearted note, I would be interested to hear from the Cynefin school on how comfortable they would be marketing this approach to “managing” uncertainty to the managers and leaders they seem keen to engage with.

Another thought… years ago I did an analysis of data that had been collected on development projects funded by the then DFID’s Civil Society Challenge Fund. This included data on project proposals, proposal assessments, and project outcomes. I used RapidMiner Studio’s Decision Tree module to develop predictive models of achievement ratings of the funded projects. Somewhat disappointingly, I failed to identify any attributes of project proposals, or of how they had been initially assessed, which were good predictors of the subsequent performance of those projects. There are a number of possible reasons why this might be so. One may be the scale of the uncertainty gap between the evident likely failures and the evident likely successes. Various biases may have skewed judgements within this zone in a way that undermined the longer term predictive use of the proposal screening and approval process. Somewhat paradoxically, if a lottery mechanism had instead been used for selecting fundable proposals in the uncertainty zone, this may well have led to the approval process being a better predictor of eventual project performance.

Postscript: Subsequent finds…

The Powerball Revolution. By Malcolm Gladwell (n.d.). Revisionist History Season 5 Episode 3. Retrieved 7 April 2021, from http://revisionisthistory.com/episodes/44-the-powerball-revolution
- On school student council lotteries in Bolivia
  - “Running for an office” and “Running an office” can be two very different things. Lotteries diminish the former and put the focus on the latter
  - “It’s a more diverse group” that ends up on the council, compared to those selected via election
  - “Nobody knows anything”: initial impressions of capacity are often not good predictors of leadership capacity, contrary to the assumption that voters can be good predictors of capacity in office.
- Medical research grant review and selection
  - Review scores of proposals are poor predictors of influential and innovative research (based on citation analysis), but have been in use for decades.
- A boarding school in New Jersey

Mapping the Standards of Evidence used in UK social policy.

Puttick, R. (2018). Mapping the Standards of Evidence used in UK social policy. Alliance for Useful Evidence.

“Our analysis focuses on 18 frameworks used by 16 UK organisations for judging evidence used in UK domestic social policy which are relevant to government, charities, and public service providers.

In summary:

• There has been a rapid proliferation of standards of evidence and other evidence frameworks since 2000. This is a very positive development and reflects the increasing sophistication of how evidence is generated and used in social policy.
• There are common principles underpinning them, particularly the shared goal of improving decision-making, but they often ask different questions, are engaging different audiences, generate different content, and have varying uses. This variance reflects the host organisation’s goals, which can be to inform its funding decisions, to make recommendations to the wider field, or to provide a resource for providers to help them evaluate.
• It may be expected that all evidence frameworks assess whether an intervention is working, but this is not always the case, with some frameworks assessing the quality of evidence, not the success of the intervention itself.
• The differences between the standards of evidence are often for practical reasons and reflect the host organisation’s goals. However, there is a need to consider more philosophical and theoretical tensions about what constitutes good evidence. We identified examples of different organisations reaching different conclusions about the same intervention; one thought it worked well, and the other was less confident. This is a problem: Who is right? Does the intervention work, or not? As the field develops, it is crucial that confusion and disagreement is minimised.
• One suggested response to minimise confusion is to develop a single set of standards of evidence. Although this sounds inherently sensible, our research has identified several major challenges which would need to be overcome to achieve this.
• We propose that the creation of a single set of standards of evidence is considered in greater depth through engagement with both those using standards of evidence, and those being assessed against them. This engagement would also help share learning and insights to ensure that standards of evidence are effectively achieving their goals.”

Computational Modelling: Technological Futures

Council for Science and Technology & Government Office for Science, 2020. Available as pdf

Not the most thrilling/enticing title, but definitely of interest. Chapter 3 provides a good overview of different ways of building models. Well worth a read, and very readable.

Recommendation 2: Decision-makers need to be intelligent customers for models, and those that supply models should provide appropriate guidance to model users to support proper use and interpretation. This includes providing suitable model documentation detailing the model purpose, assumptions, sensitivities, and limitations, and evidence of appropriate quality assurance.

Chapters 1-3

The Alignment Problem: Machine Learning and Human Values

By Brian Christian. 334 pages. 2020 Norton. Author’s web page here

Brian Christian talking about his book on YouTube

RD comment: This is one of the most interesting and informative books I have read in the last few years. Totally relevant for evaluators thinking about the present and about future trends

Releasing the power of digital data for development. A guide to new opportunities

Releasing the power of digital data for development: A guide to new opportunities. (2020). Frontier Technologies, UKAID, NIRAS.

Available online here: https://datafutures.org/knowledge-products/frontier-data-study-insights-and-guidance-about-how-to-use-digital-data-to-support-the-sdgs/

Contents

Section 1 Executive Summary
Section 2 Introduction
Section 3 Understanding and navigating the new data landscape
Section 4 What is needed to release the new potential?
Section 5 Further considerations
Appendix 1: Data opportunities potentially useful now in testing environments
Appendix 2: Bibliography and further reading
Appendix 3: Methodological notes

Executive Summary

There are 8 conclusions we discuss in this report.

1. There is justified excitement and proven benefits in the use of new digital data sources, particularly where timeliness of data is important or there are persistent gaps in traditional data sources. This might include data from fragile and conflict-affected states, data supporting decision-making about marginalised population groups, or in finding solutions to address persistent ethical issues where traditional sources have not proved adequate.

2. In many cases, improvements in and greater access to traditional data sources could be more effective than just new data alone, including developing traditional data in tandem with new data sources. This includes innovations in digitising traditional data sources, supporting the sharing of data between and within organisations, and integrating the use of new data sources with traditional data.

3. Decision-making around the use of new data sources should be highly devolved by empowering individual staff and be focused on multiple dimensions of data quality, not least because there are no “one size fits all” rules that determine how new digital data sources fit to specific needs, subject matters or geographies. This could be supported by ensuring:
a. Research, innovation, and technical support are highly demand-led, driven by specific data user needs in specific contexts; and
b. Staff have accessible guidance that demystifies the complexities of new data sources, clarifies the benefits and risks that need to be managed, and allows them to be ‘data brokers’ confident in navigating the new data landscape, innovating in it, and coordinating the technical expertise of others.

The main report includes a description of the evidence and conclusions in a way that supports these aims, including a set of guides for staff about the most promising new data sources.

4. Where traditional data sources are failing to provide the detailed data needed, most new data sources provide a potential route to helping with the Agenda 2030 goal to ‘leave no-one behind,’ as often they can provide additional granularity on population sub-groups. But, to avoid harming the interests of marginalised groups, strong ethical frameworks are needed, and affected people should be involved in decision-making about how data is processed and used. Action is also required to ensure strong data protection environments according to each type of new data and the contexts of its use.

5. New data sources with the highest potential added value for exploitation now, especially when combined with each other or traditional data sources, were found to be:
a. data from Earth Observation (EO) platforms (including satellites and drones)
b. passive location data from mobile phones

6. While there are specific limitations and risks in different circumstances, each of these data sources provides for significant gains in certain dimensions of data quality compared to some traditional sources and other new data sources. The use of Artificial Intelligence (AI) techniques, such as through machine learning, has high potential to add value to digital datasets in terms of improving aspects of data quality from many different sources, such as social media data, and particularly with large complex datasets and across multiple data sources.

7. Beyond the current time horizon, the most potential for emerging data sources is likely to come from:
• The next generation of Artificial Intelligence
• The next generation of Earth Observation platforms
• Privacy Preserving Data Sharing (PPDS) via the Cloud and
• the Internet of Things (IoT).
No significant other data sources, technologies or techniques were found with high potential to benefit FCDO’s work, which seems to be in line with its current research agenda and innovative activities. Some longer-term data prospects have been identified and these could be monitored to observe increases in their potential in the future.

8. Several other factors are relevant to the optimal use of digital data sources which should be investigated and/or work in these areas maintained. These include important internal and external corporate developments, importantly including continued support to Open Data/ data sharing and enhanced data security systems to underpin it, learning across disciplinary boundaries with official statistics principles at the core, and continued support to capacity-building of national statistical systems in developing countries in traditional data and data innovation.

Calling Bullshit: THE ART OF SKEPTICISM IN A DATA-DRIVEN WORLD

Reviews

Wired review article

Guardian review article

Forbes review article

Kirkus Review article

Podcast Interview with the authors here

ABOUT CALLING BULLSHIT (=publisher blurb)
“Bullshit isn’t what it used to be. Now, two science professors give us the tools to dismantle misinformation and think clearly in a world of fake news and bad data.

Misinformation, disinformation, and fake news abound and it’s increasingly difficult to know what’s true. Our media environment has become hyperpartisan. Science is conducted by press release. Startup culture elevates bullshit to high art. We are fairly well equipped to spot the sort of old-school bullshit that is based in fancy rhetoric and weasel words, but most of us don’t feel qualified to challenge the avalanche of new-school bullshit presented in the language of math, science, or statistics. In Calling Bullshit, Professors Carl Bergstrom and Jevin West give us a set of powerful tools to cut through the most intimidating data.

You don’t need a lot of technical expertise to call out problems with data. Are the numbers or results too good or too dramatic to be true? Is the claim comparing like with like? Is it confirming your personal bias? Drawing on a deep well of expertise in statistics and computational biology, Bergstrom and West exuberantly unpack examples of selection bias and muddled data visualization, distinguish between correlation and causation, and examine the susceptibility of science to modern bullshit.

We have always needed people who call bullshit when necessary, whether within a circle of friends, a community of scholars, or the citizenry of a nation. Now that bullshit has evolved, we need to relearn the art of skepticism.”

Evaluation Failures: 22 Tales of Mistakes Made and Lessons Learned

Edited by: Kylie Hutchinson – Community Solutions, Vancouver, Canada. 2018 Published by Sage. https://us.sagepub.com/en-us/nam/evaluation-failures/book260109

But $30 for a 184-page paperback is going to limit its appeal! The electronic version is similarly expensive, more like the cost of a hardback. Fortunately, two example chapters (1 and 8) are available as free pdfs, see below. Reading those two chapters makes me think the rest of the book would also be well worth reading. It is not often you see anything written at length about evaluation failures. Perhaps we should set up an online confessional, where we can line up to anonymously confess our un/professional sins. I will certainly be one of those needing to join such a queue! :)

PART I. MANAGE THE EVALUATION

Chapter 1. It’s Not Me, It’s You: The Value of Addressing Conflict Head On

Chapter 2. The Scope Creep Train Wreck: How Responsive Evaluation Can Go Off the Rails

Chapter 3. The Buffalo Jump: Lessons After the Fall

Chapter 4. Evaluator Self-Evaluation: When Self-Flagellation Is Not Enough

PART II. ENGAGE STAKEHOLDERS

Chapter 5. That Alien Feeling: Engaging All Stakeholders in the Universe

Chapter 6. Seeds of Failure: How the Evaluation of a West African

Chapter 7. I Didn’t Know I Would Be a Tightrope Walker Someday: Balancing Evaluator Responsiveness and Independence

Chapter 8. When National Pride Is Beyond Facts: Navigating Conflicting Stakeholder Requirements

PART III. BUILD EVALUATION CAPACITY

Chapter 9. Stars in Our Eyes: What Happens When Things Are Too Good to Be True

PART IV. DESCRIBE THE PROGRAM

Chapter 10. A “Failed” Logic Model: How I Learned to Connect With All Stakeholders

Chapter 11. Lost Without You: A Lesson in System Mapping and Engaging Stakeholders

PART V. FOCUS THE EVALUATION DESIGN

Chapter 12. You Got to Know When to Hold ’Em: An Evaluation That Went From Bad to Worse

Chapter 13. The Evaluation From Hell: When Evaluators and Clients Don’t Quite Fit

PART VI. GATHER CREDIBLE EVIDENCE

Chapter 14. The Best Laid Plans of Mice and Evaluators: Dealing With Data Collection Surprises in the Field

Chapter 15. Are You My Amigo, or My Chero? The Importance of Cultural Competence in Data Collection and Evaluation

Chapter 16. OMG, Why Can’t We Get the Data? A Lesson in Managing Evaluation Expectations

Chapter 17. No, Actually, This Project Has to Stop Now: Learning When to Pull the Plug

Chapter 18. Missing in Action: How Assumptions, Language, History, and Soft Skills Influenced a Cross-Cultural Participatory Evaluation

PART VII. JUSTIFY CONCLUSIONS

Chapter 19. “This Is Highly Illogical”: How a Spock Evaluator Learns That Context and Mixed Methods Are Everything

Chapter 20. The Ripple That Became a Splash: The Importance of Context and Why I Now Do Data Parties

Chapter 21. The Voldemort Evaluation: How I Learned to Survive Organizational Dysfunction, Confusion, and Distrust

PART VIII. REPORT AND ENSURE USE

Chapter 22. The Only Way Out Is Through

Conclusion

Free Coursera online course: Qualitative Comparative Analysis (QCA)

Highly recommended! A well organised and very clear and systematic exposition. Available at: https://www.coursera.org/learn/qualitative-comparative-analysis

About this Course

Welcome to this massive open online course (MOOC) about Qualitative Comparative Analysis (QCA). Please read the points below before you start the course. This will help you prepare well for the course and attend it properly. It will also help you determine if the course offers the knowledge and skills you are looking for.

What can you do with QCA?

QCA is a comparative method that is mainly used in the social sciences for the assessment of cause-effect relations (i.e. causation).
QCA is relevant for researchers who normally work with qualitative methods and are looking for a more systematic way of comparing and assessing cases.
QCA is also useful for quantitative researchers who like to assess alternative (more complex) aspects of causation, such as how factors work together in producing an effect.
QCA can be used for the analysis of cases on all levels: macro (e.g. countries), meso (e.g. organizations) and micro (e.g. individuals).
QCA is mostly used for research of small- and medium-sized samples and populations (10-100 cases), but it can also be used for larger groups. Ideally, the number of cases is at least 10.
QCA cannot be used if you are doing an in-depth study of one case
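
To give a feel for what "how factors work together in producing an effect" means in practice, here is a minimal sketch of the first step of a crisp-set QCA: building a truth table. It is not taken from the course. The case data, condition names and function name are all hypothetical, and the later steps of a real QCA (choosing consistency thresholds, logical minimisation) are deliberately left out.

```python
from collections import defaultdict


def truth_table(cases, conditions, outcome):
    """Group cases by their configuration of binary conditions.

    For each configuration, report the number of cases and the consistency:
    the share of those cases in which the outcome is present (coded 1).
    """
    rows = defaultdict(lambda: {"n": 0, "positive": 0})
    for case in cases:
        config = tuple(case[c] for c in conditions)
        rows[config]["n"] += 1
        rows[config]["positive"] += case[outcome]
    return {
        config: {"n": r["n"], "consistency": r["positive"] / r["n"]}
        for config, r in rows.items()
    }


# Hypothetical projects coded 1/0 on two conditions and one outcome.
example_cases = [
    {"funding": 1, "local_partner": 1, "success": 1},
    {"funding": 1, "local_partner": 1, "success": 1},
    {"funding": 1, "local_partner": 0, "success": 0},
    {"funding": 0, "local_partner": 1, "success": 0},
    {"funding": 0, "local_partner": 1, "success": 1},
    {"funding": 0, "local_partner": 0, "success": 0},
]

table = truth_table(example_cases, ["funding", "local_partner"], "success")
for config, row in sorted(table.items(), reverse=True):
    print(config, row)
```

In this made-up data only the combination of both conditions is fully consistent with success, while neither condition alone is, which is the kind of conjunctural pattern QCA is designed to surface.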

What will you learn in this course?

The course is designed for people who have no or little experience with QCA.
After the course you will understand the methodological foundations of QCA.
After the course you will know how to conduct a basic QCA study by yourself.

How is this course organized?

The MOOC takes five weeks. The specific learning objectives and activities per week are mentioned in appendix A of the course guide. Please find the course guide under Resources in the main menu.
The learning objectives with regard to understanding the foundations of QCA and practically conducting a QCA study are pursued throughout the course. However, week 1 focuses more on the general analytic foundations, and weeks 2 to 5 are more about the practical aspects of a QCA study.
The activities of the course include watching the videos, consulting supplementary material where necessary, and doing assignments. The activities should be done in that order: first watch the videos; then consult supplementary material (if desired) for more details and examples; then do the assignments.
There are 10 assignments. Appendix A in the course guide states the estimated time needed to make the assignments and how the assignments are graded. Only assignments 1 to 6 and 8 are mandatory. These 7 mandatory assignments must be completed successfully to pass the course.
Making the assignments successfully is one condition for receiving a course certificate. Further information about receiving a course certificate can be found here: https://learner.coursera.help/hc/en-us/articles/209819053-Get-a-Course-Certificate

About the supplementary material

The course can be followed by watching the videos. It is not absolutely necessary yet recommended to study the supplementary reading material (as mentioned in the course guide) for further details and examples. Further, because some of the covered topics are quite technical (particularly topics in weeks 3 and 4 of the course), we provide several worked examples that supplement the videos by offering more specific illustrations and explanation. These worked examples can be found under Resources in the main menu.
Note that the supplementary readings are mostly not freely available. Books have to be bought or might be available in a university library; journal publications have to be ordered online or are accessible via a university license.
The textbook by Schneider and Wagemann (2012) functions as the primary reference for further information on the topics that are covered in the MOOC. Appendix A in the course guide mentions which chapters in that book can be consulted for which week of the course.
The publication by Schneider and Wagemann (2012) is comprehensive and detailed, and covers almost all topics discussed in the MOOC. However, for further study, appendix A in the course guide also mentions some additional supplementary literature.
Please find the full list of references for all citations (mentioned in this course guide, in the MOOC, and in the assignments) in appendix B of the course guide.

Fadi Hirzalla

Assistant Professor / Senior Lecturer

Erasmus Graduate School of Social Sciences (EGSH), Erasmus University Rotterdam

Five ways to ensure that models serve society: A manifesto

Saltelli, A., Bammer, G., Bruno, I., Charters, E., Di Fiore, M., Didier, E., Espeland, W. N., Kay, J., Lo Piano, S., Mayo, D., Pielke Jr, R., Portaluri, T., Porter, T. M., Puy, A., Rafols, I., Ravetz, J. R., Reinert, E., Sarewitz, D., Stark, P. B., … Vineis, P. (2020). Five ways to ensure that models serve society: A manifesto. Nature, 582(7813), 482–484. https://doi.org/10.1038/d41586-020-01812-9

The five ways:

1. Mind the assumptions
  - “One way to mitigate these issues is to perform global uncertainty and sensitivity analyses. In practice, that means allowing all that is uncertain — variables, mathematical relationships and boundary conditions — to vary simultaneously as runs of the model produce its range of predictions. This often reveals that the uncertainty in predictions is substantially larger than originally asserted”
2. Mind the hubris
  - “Most modellers are aware that there is a tradeoff between the usefulness of a model and the breadth it tries to capture. But many are seduced by the idea of adding complexity in an attempt to capture reality more accurately. As modellers incorporate more phenomena, a model might fit better to the training data, but at a cost. Its predictions typically become less accurate”
3. Mind the framing
  - “Match purpose and context. Results from models will at least partly reflect the interests, disciplinary orientations and biases of the developers. No one model can serve all purposes.”
4. Mind the consequences
  - “Quantification can backfire. Excessive regard for producing numbers can push a discipline away from being roughly right towards being precisely wrong. Undiscriminating use of statistical tests can substitute for sound judgement. By helping to make risky financial products seem safe, models contributed to derailing the global economy in 2007–08 (ref. 5).”
5. Mind the unknowns
  - “Acknowledge ignorance. For most of the history of Western philosophy, self-awareness of ignorance was considered a virtue, the worthy object of intellectual pursuit”
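
The first point, letting everything that is uncertain vary simultaneously, can be illustrated with a toy Monte Carlo run. This is a sketch under assumptions of my own, not the authors' method: the model (beneficiaries reached = population × uptake × retention), the input ranges and all names are hypothetical, and it only propagates uncertainty; it does not compute formal sensitivity indices.

```python
import random

# Hypothetical plausible ranges for each uncertain input.
RANGES = {
    "population": (8000, 12000),
    "uptake": (0.2, 0.5),
    "retention": (0.5, 0.9),
}


def model(population, uptake, retention):
    """Toy model: number of beneficiaries still reached at the end of a project."""
    return population * uptake * retention


def sample_inputs(rng, vary):
    """Draw inputs named in `vary` from their range; hold the rest at the midpoint."""
    inputs = {}
    for name, (low, high) in RANGES.items():
        inputs[name] = rng.uniform(low, high) if name in vary else (low + high) / 2
    return inputs


def prediction_interval(vary, n_runs=5000, seed=1):
    """Return the 5th and 95th percentiles of the model output over many runs."""
    rng = random.Random(seed)
    outputs = sorted(model(**sample_inputs(rng, vary)) for _ in range(n_runs))
    return outputs[int(0.05 * n_runs)], outputs[int(0.95 * n_runs) - 1]


one_low, one_high = prediction_interval(vary={"uptake"})
all_low, all_high = prediction_interval(vary=set(RANGES))
print("Only uptake varies:", round(one_low), "to", round(one_high))
print("All inputs vary:   ", round(all_low), "to", round(all_high))
```

Varying one input at a time understates the spread of predictions compared with varying them all together, which is the manifesto's point that uncertainty is often "substantially larger than originally asserted".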

“Ignore the five, and model predictions become Trojan horses for unstated interests and values”

“Models’ assumptions and limitations must be appraised openly and honestly. Process and ethics matter as much as intellectual prowess”

“Mathematical models are a great way to explore questions. They are also a dangerous way to assert answers. Asking models for certainty or consensus is more a sign of the difficulties in making controversial decisions than it is a solution, and can invite ritualistic use of quantification”

Evaluating the Future

A blog posting and (summarising) podcast, produced for the EU Evaluation Support Services, by Rick Davies, June 2020

The podcast is available here, on the Capacity4Dev website

The blog posting full text is here as a pdf

- Limitations of common evaluative thinking
- Scenario planning
- Risk vs uncertainty
- Additional evaluation criteria
- Meaningful differences
- Other information sources