Monitoring and Evaluation NEWS

“Doing Good Better” by William Macaskill

https://effectivealtruism.org/doing-good-better
By the co-founder of the Effective Altruism movement. You can find and follow multiple EA groups on twitter, by searching for “Effective Altruism”, with an without the space between the two words.

Well worth reading. A good example of wide ranging applied evaluative thinking

Contents page

Book reviews

2015 The Guardian
2015 London Review of Books
2015 New York Times
GoodReads: 650 reviews, average rating 4.25 / 5

Techniques to Identify Themes (in text/interview data)

Ryan, G. W., & Bernard, H. R. (2003). Techniques to Identify Themes. Field Methods, 15(1), 85–109. https://doi.org/10.1177/1525822X02239569

Downloadable pdf available here (link may look broken but is is not) Recommended

Abstract: Theme identification is one of the most fundamental tasks in qualitative research. It also is one of the most mysterious. Explicit descriptions of theme discovery are rarely found in articles and reports, and when they are, they are often relegated to appendices or footnotes. Techniques are shared among small groups of social scientists, but sharing is impeded by disciplinary or epistemological boundaries. The techniques described here are drawn from across epistemological and disciplinary boundaries. They include both observational and manipulative techniques and range from quick word counts to laborious, in-depth, line-by-line scrutiny. Techniques are compared on six dimensions: (1) appropriateness for data types, (2) required labor, (3) required expertise, (4) stage of analysis, (5) number and types of themes to be generated, and (6) issues of reliability and validity.

Contents (as in headings used)

What is a theme
HOW DO YOU KNOW A THEME WHEN YOU SEE ONE?
WHERE DO THEMES COME FROM?
SCRUTINY TECHNIQUES—THINGS TO LOOK FOR
- Repetitions
- Indigenous Typologies or Categories
- Metaphors and Analogies
- Transitions
- Similarities and Differences
- Linguistic Connectors
- Missing Data
- Theory-Related Material
PROCESSING TECHNIQUES
- Cutting and Sorting
- Word Lists and Key Words in Context (KWIC)
- Word Co-Occurrence
- Metacoding
SELECTING AMONG TECHNIQUES
- Kind of Data
- Expertise
- Labor
- Number and Kinds of Themes
- Reliability and Validity
FURTHER RESEARCH
NOTES
REFERENCES

Defining the Agenda: Key Lessons for Funders and Commissioners of Ethical Research in Fragile and Conflict Affected Contexts

By Leslie Groves-Williams. Funded by UK Research and Innovation (UKRI) and developed in collaboration with UNICEF, Office of Research – Innocenti. A pdf copy is available online here

Publicised here because the issues and lessons identified also seem relevant to many evaluation activities

Text of the Introduction: The ethical issues that affect all research are amplified significantly in fragile and conflict-affected contexts. The power imbalances between local and international researchers are increased and the risk of harm is augmented within a context where safeguards are often reduced and the probabilities of unethical research that would be prohibited elsewhere are magnified. Funders and commissioners need to be confident that careful ethical scrutiny of the research process is conducted to mitigate risk, avoid potential harm and maximize the benefit of the commissioned research for affected populations, including through improving the quality and accuracy of data collected. The UKRI and UNICEF Ethical Research in Fragile and Conflict-Affected Contexts: Guidelines for Reviewers can support you to ensure that appropriate ethical scrutiny is taking place at review phase. But, what about mitigating for risks at the funding and commissioning phases? These phases are often not subject to ethical review yet carry strong ethical risks and opportunities. As a commissioner or a funder designing a call for research in fragile and conflict-affected contexts, how confident are you that you are commissioning the research in the most ethical way?

This document brings together some key lessons learned that provide guidance for funders and commissioners of research in fragile and conflict-affected contexts to ensure that ethical standards are applied, not just at the review stage, but also in formulating the research agenda. These lessons fall into four clusters:

1. Ethical Agenda Setting
2. Ethical Partnerships
3. Ethical Review
4. Ethical Resourcing.
In addition to highlighting the lessons, this paper provides mitigation strategies for funders and commissioners to explore as they seek to avoid the ethical risks highlighted

Algorithmic Impact Assessment – Three+ useful publications by Data & Society

In the movies, when a machine decides to be the boss — or humans let it — things go wrong. Yet despite myriad dystopian warnings, control by machines is fast becoming our reality. Photo: The Conversation / Shutterstock

As William Gibson famously said circa 1992 “The future is already here — it’s just not very evenly distributed” In 2021 the future is certainly here in the form of algorithms (rather than people) that manage low paid workers ( distribution centres, delivery services, etc), welfare service recipients and those caught up in the justice system. Plus anyone else having to deal with chatbots when trying to get through to other kinds of service providers. But is a counter-revolution brewing? Read on…

Selected quotes

“Algorithmic accountability is the process of assigning responsibility for harm when algorithmic decision-making results in discriminatory and inequitable outcomes”

“Among many applications, algorithms are used to:

• Sort résumés for job applications;
• Allocate social services;
• Decide who sees advertisements for open positions, housing, and products;
• Decide who should be promoted or fired;
• Estimate a person’s risk of committing crimes or the length of a prison term;
• Assess and allocate insurance and benefits;
• Obtain and determine credit; and
• Rank and curate news and information in search engines.”

“Algorithmic systems present a special challenge to assessors, because the harms of these systems are unevenly distributed, emerge only after they are integrated into society, or are often only visible in the aggregate”

“What our research indicates is that the risk of self-regulation lies not so much in a corrupted reporting and assessment process, but in the capacity of industry to define the methods and metrics used to measure the impact of proposed systems”

Algorithmic Accountability: A Primer. Data & S0ciety. Caplan, R., Donovan, J., Hanson, L., & Matthews, J. (2018). 26 pages

CONTENTS

What Is an Algorithm?
How Are Algorithms Used to Make Decisions?
Example: Racial Bias in Algorithms of Incarceration
Complications with Algorithmic Systems
• Fairness and Bias
• Opacity and Transparency
• Repurposing Data and
Repurposing Algorithms
• Lack of Standards for Auditing
• Power and Control
• Trust and Expertise
What is Algorithmic Accountability?
• Auditing by Journalists
• Enforcement and Regulation

Assembling accountability: Algorithmic Impact Assessment for the Public Interest. Data & Society. Moss, E., Watkins, E. A., Singh, R., Elish, M. C., & Jacob Metcalf. (2021).

- The 6 page Policy brief
- The 56 page full report

In summary: The Algorithmic Impact Assessment is a new concept for regulating algorithmic systems and protecting the public interest. Assembling Accountability: Algorithmic Impact Assessment for the Public Interest is a report that maps the challenges of constructing algorithmic impact assessments (AIAs) and provides a framework for evaluating the effectiveness of current and proposed AIA regimes. This framework is a practical tool for regulators, advocates, public-interest technologists, technology companies, and critical scholars who are identifying, assessing, and acting upon algorithmic harms.

First, report authors Emanuel Moss, Elizabeth Anne Watkins, Ranjit Singh, Madeleine Clare Elish, and Jacob Metcalf analyze the use of impact assessment in other domains, including finance, the environment, human rights, and privacy. Building on this comparative analysis, they then identify common components of existing impact assessment practices in order to provide a framework for evaluating current and proposed AIA regimes. The authors find that a singular, generalized model for AIAs would not be effective due to the variances of governing bodies, specific systems being evaluated, and the range of impacted communities.

After illustrating the novel decision points required for the development of effective AIAs, the report specifies ten necessary components that constitute robust impact assessment regimes.

CONTENTS

INTRODUCTION
What is an Impact?
What is Accountability?
What is Impact Assessment?
THE CONSTITUTIVE COMPONENTS OF IMPACT ASSESSMENT
Sources of Legitimacy
Actors and Forum
Catalyzing Event
Time Frame
Public Access
Public Consultation
Method
Assessors
Impacts
Harms and Redress
TOWARD ALGORITHMIC IMPACT ASSESSMENTS
Existing and Proposed AIA Regulations
Algorithmic Audits
External (Third and Second Party) Audits
Internal (First-Party) Technical Audits and
Governance Mechanisms
Sociotechnical Expertise
CONCLUSION:
GOVERNING WITH AIAs
ACKNOWLEDGMENTS

See also

Databite No. 145: Algorithmic Governance and the State of Impact Assessment in the EU, US, and Canada. (n.d.). Data & Society. Retrieved 18 December 2021,
Moss, E. (2021, June 29). Assembling Accountability, from the Ground Up. Medium.

Structured Analytic Techniques for Intelligence Analysis

This is the title of the 3rd edition of the same, by Randolph H. Pherson and Richards J. Heuer Jr, published by Sage in 2019

It is not cheap book, so I am not encouraging its purchase, but I am encouraging the perusal of its contents via the contents list and via Amazon’s “Look inside” facility.

Why so? The challenges facing intelligence analysts are especially difficult, so any methods used to address these may be of wider interest. These are spelled out in the Foreword, as follows:

This report is of interest in a number of ways:

To what extent are the challenges faced similar/different to those of evaluations of publicly visible interventions?
How different is the tool set, and the categorisation of the contents of that set?
How much research has gone into the development and testing of this tool set?

The challenges

Some of these challenges are also faced by evaluation teams working in more overt and less antagonistic settings, albeit to a lesser degree. For example, what will work in future in a slightly different settings (1), missing and ambiguous evidence (2), and with clients and other stakeholders who may intentionally or unintentionally not disclose or actually mislead (3) , and whose recommendations can affect peoples lives, positively and negatively (4).

The contents of the tool set

My first impression is that this book casts its net much wider than the average evaluation text (if there is such a thing). The families of methods include team working, organising, exploring, diagnosing, reframing, foresight, decision support, and more. Secondly, there are quite a few methods within these families I had not heard of before, including Bowtie analysis, opportunities incubator, morphological analysis, premortem analysis, deception detection and inconsistencies finder. The last two are of particular interest. Hopefully they are more than just a method brand name.

Research and testing

Worth looking at, alongside this publication, is this 17 page paper by Artner, S., Girven, R., & Bruce, J. (2016). Assessing the Value of Structured Analytic Techniques in the U.S. Intelligence Community. RAND Corporation. Its key findings are summarised as follows:

- The U.S. Intelligence Community does not systematically evaluate the effectiveness of structured analytic techniques, despite their increased use.
- One promising method of assessing these techniques would be to initiate qualitative reviews of their contribution in bodies of intelligence production on a variety of topics, in addition to interviews with authors, managers, and consumers.
- A RAND pilot study found that intelligence publications using these techniques generally addressed a broader range of potential outcomes and implications than did other analyses.
- Quantitative assessments correlating the use of structured techniques to measures of analytic quality, along with controlled experiments using these techniques, could provide a fuller picture of their contribution to intelligence analysis.

See also Chang, W., & Berdini, E. (2017). Restructuring Structured Analytic Techniques in Intelligence. For an interesting in-depth analysis of bias risks and how the are managed and possibly mismanaged. Here is the abstract:

Structured analytic techniques (SATs) are intended to improve intelligence analysis by checking the two canonical sources of error: systematic biases and random noise. Although both goals are achievable, no one knows how close the current generation of SATs comes to achieving either of them. We identify two root problems: (1) SATs treat bipolar biases as unipolar. As a result, we lack metrics for gauging possible over-shooting—and have no way of knowing when SATs that focus on suppressing one bias (e.g., over-confidence) are triggering the opposing bias (e.g., under-confidence); (2) SATs tacitly assume that problem decomposition (e.g., breaking reasoning into rows and columns of matrices corresponding to hypotheses and evidence) is a sound means of reducing noise in assessments. But no one has ever actually tested whether decomposition is adding or subtracting noise from the analytic process—and there are good reasons for suspecting that decomposition will, on balance, degrade the reliability of analytic judgment. The central shortcoming is that SATs have not been subject to sustained scientific of the sort that could reveal when they are helping or harming the cause of delivering accurate assessments of the world to the policy community.

Both sound like serious critiques, but compared to what? There are probably plenty of evaluation methods where the same criticism could be applied – no one has subjected them to serious evaluation.

An Institutional View of Algorithmic Impact Assessments

Selbst, A. (2021). An Institutional View of Algorithmic Impact Assessments. Harvard Journal of Law and Technology, 35(10), 78. The author has indicated that paper that can be downloaded has a “draft” status.

First some general points about its relevance:

Rich people get personalised one-to-one attention and services. Poor people get processed by algorithms. That may be a bit of a caricature, but there is also some truth there. Consider loan applications, bail applications, recruitment decisions, welfare payments. And perhaps medical diagnoses and treatments, depending to the source of service. There is therefore a good reason for any evaluators concerned with equity to pay close attention to how algorithms affect the lives of the poorest sections of societies.
This paper reminded me of the importance of impact assessments, as distinct from impact evaluations. The former are concerned with “effects-of-a-cause“, as distinct from the “causes-of-an-effect” , which is the focus of impact evaluations. In this paper impact assessment is specifically concerned about negative impacts, which is a narrower ambit than I have seen previously in my sphere of work. But complementary to the expectations of positive impact associated with impact evaluations. It may reflect the narrowness of my inhabited part of the evaluation world, but my feeling is that impact evaluations get way more attention than impact assessments. Yet once could argue that the default situation should be the reverse. Though I cant quite articulate my reasoning … I think it is something to do with the perception that most of the time the world acts on us, relative to us acting on the world.

Some selected quotes:

The impact assessment approach has two principal aims. The first goal is to get the people who build systems to think methodically about the details and potential impacts of a complex project before its implementation, and therefore head off risks before they become too costly to correct. As proponents of values-in-design have argued for decades, the earlier in project development that social values are considered, the more likely that the end result will reflect those social values. The second goal is to create and provide documentation of the decisions made in development and their rationales, which in turn can lead to better accountability for those decisions and useful information for future policy interventions (p.6)
1. This Article will argue in part that once filtered through the institutional logics of the private sector, the first goal of improving systems through better design will only be effective in those organizations motivated by social obligation rather than mere compliance, but second goal of producing information needed for better policy and public understanding is what really can make the AIA regime worthwhile (p.8)
Among all possible regulatory approaches, impact assessments are most useful where projects have unknown and hard-to-measure impacts on society, where the people creating the project and the ones with the knowledge and expertise to estimate its impacts have inadequate incentives to generate the needed information, and where the public has no other means to create that information. What is attractive about the AIA (Algorithmic Impact Assessment) is that we are now in exactly such a situation with respect to algorithmic harms. (p.7)
The Article proceeds in four parts. Part I introduces the AIA, and
explains why it is likely a useful approach….Part II briefly surveys different models of AIA that have been proposed as well as two alternatives: self-regulation and audits…Part III examines how institutional forces shape regulation and compliance, seeking to apply those lessons to the case of AIAs….Ultimately, the Part concludes that AIAs may not be
fully successful in their primary goal of getting individual firms to consider
social problems early, but that the second goal of policy-learning may well be
more successful because it does not require full substantive compliance. Finally, Part IV looks at what we can learn from the technical community. This part discusses many relevant developments within technology industry and scholarship: empirical research into how firms understand AI fairness and ethics, proposals for documentation standards coming from academic and industrial labs, trade groups, standards organizations, and various self-regulatory framework proposal.(p.9)

The revised UNEG Ethical Guidelines for Evaluations (2020)

The UNEG Ethical Guidelines for Evaluation were first published in 2008. This document is a revision of the original document and was approved at the UNEG AGM 2020. These revised guidelines are consistent with the standards of conduct in the Charter of the United Nations, the Staff Regulations and Rules of the United Nations, the Standards of Conduct for the International Civil Service, and in the Regulations Governing the Status, Basic Rights and Duties of Officials other than Secretariat. They are also consistent with the United Nations’ core values of Integrity, Professionalism and Respect for Diversity, the humanitarian principles of Humanity, Neutrality, Impartiality and Independence and the values enshrined in the Universal Declaration of Human Rights.

This document aims to support UN entity leaders and governing bodies as well as those organizing and conducting evaluations for the UN to ensure that an ethical lens informs day to day evaluation practice.

This document provides:

Four ethical principles for evaluation;
Tailored guidelines for entity leaders and governing bodies, evaluation organizers, and evaluation practitioners;
A detachable Pledge of Commitment to Ethical Conduct in Evaluation that all those involved in evaluations will be required to sign.

These guidelines are designed to be useful and applicable to all UN agencies, regardless of differences in mission (operational vs. normative agencies), in structures (centralized vs. decentralized), in the contexts for the work (development, peacekeeping, humanitarian) and in the nature of evaluations that are undertaken (oversight/accountability focused vs. learning).

“The Checklist Manifesto”, another perspective on managing the problem of extreme complexity

The Checklist Manifesto by Atul Gwande, 2009

Atul differentiates two types of problems that we face when dealing with extreme complexity. One is that of ignorance, there is a lot we simply don’t know. Unpredictability is a facet of complexity that many writers on the subject of complexity have given plenty of attention to, along with possible ways of managing that unpredictability. The other problem that Atul identifies is that of ineptitude. This is our inability to make good use of knowledge that is already available. He gives many examples where complex bodies of knowledge already exist that can make a big difference to people’s lives, notably in the field of medicine. But because of the very scale of those bodies of knowledge the reality is that people often are not cable of making full use of it and sometimes the consequences are disastrous. This facet of complexity is not something I’ve seen given very much attention to in the literature on complexity, at least that which I have come across. So I read this book with great interest, an interest magnified no doubt by my previous interest in, and experiments with, the use of weighted checklists, which are documented elsewhere on this website.

Another distinction that he makes is between task checklists and communication checklists. The first are all about avoiding dumb mistakes, forgetting to do things we should know that have to be done. The second is about coping with unexpected events, and the necessary characteristics of how we should cope by communicating relevant information to relevant people. He gives some interesting examples from the (big) building industry, where given the complexity of modern construction activities, and the extensive use of task checklists, there are still inevitably various unexpected hitches which have to be responded to effectively, without jeopardising the progress or safety of the construction process.

Some selected quotes:

Checklists helped ensure a higher standard of baseline performance.
Medicine has become the art of managing extreme complexity – and a test of whether such extreme complexity can, in fact, be humanely mastered”
Team work may just be hard in certain lines of work. Under conditions of extreme complexity, we inevitably rely on a division of tasks and expertise…But the evidence suggests that we need them to see their job not just as performing their isolated set of tasks well, but also helping the group get the best possible results
It is common to misconceived power checklists function in complex lines of work. They are not comprehensive how to guides whether for building a skyscraper or getting a plane out of trouble. They are quick and simple tools aimed to buttress the skills of expert professionals. And by remaining swift and usable and resolutely modest, they are saving thousands upon thousands of lives.
When you are making a checklist, you have a number of key decisions. You must define a clear pause point at which the checklist is supposed to be used (unless the moment is obvious, like when a warning light goes on or an engine fails) you must decide whether you want a do-confirm checklist or read-do checklist. With a do-confirm checklist team members perform their jobs from memory and experience, often separately. But then they stop. They paused to run the checklist and confirm that everything that was supposed to be done was done. With the read-do checklist, on the other hand, people carry out the task as they check them off, it’s more like a recipe. So for any new checklist created from scratch, you have to pick the type that makes the most sense of the situation.
We are obsessed in medicine with having great components – the best drugs, the best devices, the best specialists – but paid little attention to how to make them fit together well. Berwisk notes how wrongheaded this approach is ‘anyone who understands systems will know immediately that optimising part is not a great route to system excellent ‘he says.

I could go on, but I would rather keep reading the book… :-)

On the usefulness of deliberate (but bounded) randomness in decision making

An introduction

In many spheres of human activity, relevant information may be hard to find, and it may be of variable quality. Human capacities to objectively assess that information may also be limited and variable. Extreme cases may be easy to assess e.g projects or research that is definitely worth/not worth funding or papers that are definitely worth/not worth publishing. But in between these extremes there may be substantial uncertainty and thus room for tacit assumptions and unrecognised biases to influence judgements. In some fields the size of this zone of uncertainty may be quite big (see Adam, 2019 below), so the consequences at stake can be substantial. This is the territory where a number of recent papers have argued that an explicitly random decision making process may be the best approach to take.

After you have scanned the references below, continue on to some musings about implications for how we think about complexity

The literature (a sample)

- Nesta (2020) Why randomise funding? How randomisation can improve the diversity of ideas
- Osterloh, M., & Frey, B. S. (2020, March 9). To ensure the quality of peer reviewed research introduce randomness. Impact of Social Sciences. https://blogs.lse.ac.uk/impactofsocialsciences/2020/03/09/to-ensure-the-quality-of-peer-reviewed-research-introduce-randomness/
  - Why random selection of contributions to which the referees do not agree? This procedure reduces the “conservative bias”, i.e. the bias against unconventional ideas. Where there is uncertainty over the quality of a contribution, referees have little evidence to draw on in order to make accurate evaluations. However, unconventional ideas may well yield high returns in the future. Under these circumstances a randomised choice among the unorthodox contributions is advantageous.
  - …two [possible] types of error: type I errors (“reject errors”) implying that a correct hypothesis is rejected, and type 2 errors implying that a false hypothesis is accepted (“accept errors”). The former matters more than the latter. “Reject errors” stop promising new ideas, sometimes for a long time, while “accept errors” lead to a waste of money, but may be detected soon once published. This is the reason why it is more difficult to identify “reject errors” than “accept errors”. Through randomisation the risks of “reject errors” are diversified.

Osterloh, M., & Frey, B. S. (2020). How to avoid borrowed plumes in academia. Research Policy, 49(1), 103831. https://doi.org/10.1016/j.respol.2019.103831 Abstract: Publications in top journals today have a powerful influence on ac

Liu, M., Choy, V., Clarke, P., Barnett, A., Blakely, T., & Pomeroy, L. (2020). The acceptability of using a lottery to allocate research funding: A survey of applicants. Research Integrity and Peer Review, 5(1), 3. https://doi.org/10.1186/s41073-019-0089-z
- Background: The Health Research Council of New Zealand is the first major government funding agency to use a lottery to allocate research funding for their Explorer Grant scheme. … the Health Research Council of New Zealand wanted to hear from applicants about the acceptability of the randomisation process and anonymity of applicants. … The survey asked about the acceptability of using a lottery and if the lottery meant researchers took a different approach to their application. Results:… There was agreement that randomisation is an acceptable method for allocating Explorer Grant funds with 63% (n = 79) in favour and 25% (n = 32) against. There was less support for allocating funds randomly for other grant types with only 40% (n = 50) in favour and 37% (n = 46) against. Support for a lottery was higher amongst those that had won funding. Multiple respondents stated that they supported a lottery when ineligible applications had been excluded and outstanding applications funded, so that the remaining applications were truly equal. Most applicants reported that the lottery did not change the time they spent preparing their application. Conclusions: The Health Research Council’s experience through the Explorer Grant scheme supports further uptake of a modified lottery.

Roumbanis, L. (2019). Peer Review or Lottery? A Critical Analysis of Two Different Forms of Decision-making Mechanisms for Allocation of Research Grants. Science, Technology, & Human Values, 44(6), 994–1019. https://doi.org/10.1177/0162243918822744

Adam, D. (2019). Science funders gamble on grant lotteries.A growing number of research agencies are assigning money randomly. Nature, 575(7784), 574–575. https://doi.org/10.1038/d41586-019-03572-7
- ….says that existing selection processes are inefficient. Scientists have to prepare lengthy applications, many of which are never funded, and assessment panels spend most of their time sorting out the specific order in which to place mid-ranking ideas. Low and high quality applications are easy to rank, she says. “But most applications are in the midfield, which is very big
- The fund tells applicants how far they got in the process, and feedback from them has been positive, he says. “Those that got into the ballot and miss out don’t feel as disappointed. They know they were good enough to get funded and take it as the luck of the draw.”

- Fang, F. C., & Casadevall, A. (2016). Research Funding: The Case for a Modified Lottery. MBio, 7(2).
  - ABSTRACT The time-honored mechanism of allocating funds based on ranking of proposals by scienti?c peer review is no longer effective, because review panels cannot accurately stratify proposals to identify the most meritorious ones. Bias has a major in?uence on funding decisions, and the impact of reviewer bias is magni?ed by low funding paylines. Despite more than a decade of funding crisis, there has been no fundamental reform in the mechanism for funding research. This essay explores the idea of awarding research funds on the basis of a modi?ed lottery in which peer review is used to identify the most meritorious proposals, from which funded applications are selected by lottery. We suggest that a modi?ed lottery for research fund allocation would have many advantages over the current system, including reducing bias and improving grantee diversity with regard to seniority, race, and gender.

Avin, S (2015) Breaking the grant cycle: on the rational allocation of public resources to scientific research projects
- Abstract: The thesis presents a reformative criticism of science funding by peer review. The criticism is based on epistemological scepticism, regarding the ability of scientific peers, or any other agent, to have access to sufficient information regarding the potential of proposed projects at the time of funding. The scepticism is based on the complexity of factors contributing to the merit of scientific projects, and the rate at which the parameters of this complex system change their values. By constructing models of different science funding mechanisms, a construction supported by historical evidence, computational simulations show that in a significant subset of cases it would be better to select research projects by a lottery mechanism than by selection based on peer review. This last result is used to create a template for an alternative funding mechanism that combines the merits of peer review with the benefits of random allocation, while noting that this alternative is not so far removed from current practice as may first appear.
Schulson, M. (2014). If you can’t choose wisely, choose randomly. Aeon. A quick review of known instances of the use of randomness across different cultures, nationalities and periods of history
Casadevall, F. C. F. A. (2014, April 14). Taking the Powerball Approach to Funding Medical Research. Wall Street Journal.

Stone, P. (2011). The Luck of the Draw: The Role of Lotteries in Decision Making. In The Luck of the Draw: The Role of Lotteries in Decision Making.
- From the earliest times, people have used lotteries to make decisions–by drawing straws, tossing coins, picking names out of hats, and so on. We use lotteries to place citizens on juries, draft men into armies, assign students to schools, and even on very rare occasions, select lifeboat survivors to be eaten. Lotteries make a great deal of sense in all of these cases, and yet there is something absurd about them. Largely, this is because lottery-based decisions are not based upon reasons. In fact, lotteries actively prevent reason from playing a role in decision making at all. Over the years, people have devoted considerable effort to solving this paradox and thinking about the legitimacy of lotteries as a whole. However, these scholars have mainly focused on lotteries on a case-by-case basis, not as a part of a comprehensive political theory of lotteries. In The Luck of the Draw, Peter Stone surveys the variety of arguments proffered for and against lotteries and argues that they only have one true effect relevant to decision making: the “sanitizing effect” of preventing decisions from being made on the basis of reasons. While this rationale might sound strange to us, Stone contends that in many instances, it is vital that decisions be made without the use of reasons. By developing innovative principles for the use of lottery-based decision making, Stone lays a foundation for understanding when it is–and when it is not–appropriate to draw lots when making political decisions both large and small

Randomness in other species

- Drew, L. (2020). Random Search Wired Into Animals May Help Them Hunt. Quanta Magazine. Retrieved 2 February 2021, from https://www.quantamagazine.org/random-search-wired-into-animals-may-help-them-hunt-20200611/
  - - Of special interest here is the description of Levy walks, a variety of randomised movement where the frequency distribution of distances moved has one long tail. Levy walks have been the subject of exploration across multiple disciples, as seen in…

- Reynolds, A. M. (2018). Current status and future directions of Lévy walk research. Biology Open, 7(1). https://doi.org/10.1242/bio.030106
  - - Levy walks are specialised forms of random walks composed of clusters of multiple short steps with longer steps between them…. They are particularly advantageous when searching in uncertain or dynamic environments where the spatial scales of searching patterns cannot be tuned to target distributions…Nature repeatedly reveals the limits of our imagination. Lévy walks once thought to be the preserve of probabilistic foragers have now been identified in the movement patterns of human hunter-gatherers

Levy walk random versus Brownian motion random movement

Implications for thinking about complexity

Uncertainty of future states is a common characteristic of many complex systems, though not unique to these. One strategy that human organisations can use to deal with uncertainty is to build up capital reserves, thus enhancing longer term resilience albeit at the cost of more immediate efficiencies. From the first set of papers referenced above, it seems like the deliberate and bounded use of randomness could provide a useful second option. The work being done on Levy walks also suggests that there are interesting variations on randomisation that should be explored. It is already the case the designers of search/opitimisation algorithms have headed this way. If you are interested, you can read further on the subject of what are called “Levy Flight ” algorithms.

On a more light hearted note, I would be interested to hear from the Cynefin school on how comfortable they would be marketing this approach to “managing” uncertainty to the managers and leaders they seem keen to engage with.

Another thought…years ago I did an analysis of data that had been collected on development projects that had been funded by the then DFID’s funded Civil Society Challenge Fund. This included data on project proposals, proposal assessments, and project outcomes. I used Rapid Miner Studio’s Decision Tree module to develop predictive models of achievement ratings of the funded projects. Somewhat disappointingly, I failed to identify any attributes of project proposals, or how they had been initially assessed, which were good predictors of the subsequent performance of those projects. There are number of possible reasons why this might so. One of which may be the scale of the uncertainty gap between the evident likely failures and the evident likely successes. Various biases may have skewed judgements within this zone in a way that undermined the longer term predictive use of the proposal screening and approval process. Somewhat paradoxically, if instead a lottery mechanism had been used for selecting fundable proposals in the uncertainty zone this may well have led to the approval process being a better predictor eventual project performance.

Postscript: Subsequent finds…

The Powerball Revolution. By Malcom Gladwell (n.d.). Revisionist History Season 5 Episode 3. Retrieved 7 April 2021, from http://revisionisthistory.com/episodes/44-the-powerball-revolution
- On school student council lotteries in Bolivia
  - “Running for an office” and “Running an office” can be two very different things. Lotteries diminish the former and put the focus on the latter
  - “Its a more diverse group” that end up on the council, compared to those selected via election
  - “Nobody knows anything” -initial impressions of capacity are often not good predictors of leadership capacity. Contra assumption that voters can be good predictors of capacity in office.
- Medical research grant review and selection
  - Review scores of proposals are poor predictors of influential and innovative research (based on citation analysis), but has been in use for decades.
- A boarding school in New Jersey

Mapping the Standards of Evidence used in UK social policy.

Puttick, R. (2018). Mapping the Standards of Evidence used in UK social policy. Alliance for Useful Evidence.

“Our analysis focuses on 18 frameworks used by 16 UK organisations for judging evidence used in UK domestic social policy which are relevant to government, charities, and public service providers.

In summary:

• There has been a rapid proliferation of standards of evidence and other evidence frameworks since 2000. This is a very positive development and reflects the increasing sophistication of how evidence is generated and used in social policy.
• There are common principles underpinning them, particularly the shared goal of improving decision-making, but they often ask different questions, are engaging different audiences, generate different content, and have varying uses. This variance reflects the host organisation’s goals, which can be to inform its funding decisions, to make recommendations to the wider field, or to provide a resource for providers to help them evaluate.
• It may be expected that all evidence frameworks assess whether an intervention is working, but this is not always the case, with some frameworks assessing the quality of evidence, not the success of the intervention itself.
• The differences between the standards of evidence are often for practical reasons and reflect the host organisation’s goals. However, there is a need to consider more philosophical and theoretical tensions about what constitutes good evidence. We identified examples of different organisations reaching different conclusions about the same intervention; one thought it worked well, and the other was less confident. This is a problem: Who is right? Does the intervention work, or not? As the field develops, it is crucial that confusion and disagreement is minimised.
• One suggested response to minimise confusion is to develop a single set of standards of evidence. Although this sounds inherently sensible, our research has identified several major challenges which would need to be overcome to achieve this.
• We propose that the creation of a single set of standards of evidence is considered in greater depth through engagement with both those using standards of evidence, and those being assessed against them. This engagement would also help share learning and insights to ensure that standards of evidence are effectively achieving their goals.

“Doing Good Better” by William Macaskill

Like this:

Techniques to Identify Themes (in text/interview data)

Like this:

Defining the Agenda: Key Lessons for Funders and Commissioners of Ethical Research in Fragile and Conflict Affected Contexts

Like this:

Algorithmic Impact Assessment – Three+ useful publications by Data & Society

Like this:

Structured Analytic Techniques for Intelligence Analysis

Like this:

An Institutional View of Algorithmic Impact Assessments

Like this:

The revised UNEG Ethical Guidelines for Evaluations (2020)

Like this:

“The Checklist Manifesto”, another perspective on managing the problem of extreme complexity

Like this:

On the usefulness of deliberate (but bounded) randomness in decision making

An introduction

The literature (a sample)

Randomness in other species

Implications for thinking about complexity

Postscript: Subsequent finds…

Like this:

Mapping the Standards of Evidence used in UK social policy.

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

Share this:

Like this:

An introduction

The literature (a sample)

Randomness in other species

Implications for thinking about complexity

Postscript: Subsequent finds…

Share this:

Like this:

Share this:

Like this: