Findings and recommendations from the Evaluation Methods Advisory Panel at WFP

WFP (2024) Annual Report from the Evaluation Methods Advisory Panel at WFP 2023 in Review. Rome, Italy. Available at: (accessed 14 March 2024).
Declaration of interest: I was a member of the panel from 2022-23

1. Approaches and methods
2. Evaluation guidance
3. Use of theory-based evaluation
4. Evaluability assessments and linkages with evaluation design
5. Triangulation, clarity, and transparency
6. Lessons to strengthen WFP’s evaluation function
Annex 1: Short biographies of members of the EMAP
Annex 2: Evaluation documents reviewed by the EMAP
Annex 3: Selection of evaluations for review by the EMAP

The Evaluation Methods Advisory Panel

Given the increase in the number of evaluations and the complex and diverse contexts in which the World Food Programme (WFP) operates, the WFP Office of Evaluation (OEV) has created an Evaluation Methods Advisory Panel (EMAP) to support improving evaluation methodology, approaches, and methods, and to reflect on international best practice and innovations in these areas. The Panel was launched in January 2022. Currently composed of six members (listed in annex 1), it complements provisions in the WFP evaluation quality assurance system (EQAS).

Purpose and Scope

The aims of the Annual Review are to:

  • Reflect on evaluation approaches and methods used in evaluations,
    and progress towards improving and broadening the range of
  • Identify systemic and structural challenges
  • Derive lessons to increase quality and utility in future evaluations

The EMAP Annual Report covers most evaluations conducted by WFP’s evaluation function – Policy Evaluations (PEs), Complex Emergency Evaluations (CEEs), Strategic Evaluations (SEs), Decentralized Evaluations (DEs), and Country Strategic Plan Evaluations (CSPEs) – in 2022-2023 (see Annex 3). It is based on reviews undertaken by EMAP members (“the reviewers”), and discussions and workshops between the reviewers and WFP. EMAP has not examined system-wide and impact evaluations.


Two approaches to the EMAP reviews were undertaken. In one strand of activities, EMAP members received a selection of completed CSPE and DE evaluation reports (ERs), and the related terms of reference (ToR) and inception reports (IRs), for their review. The other strand of EMAP activities was giving feedback on draft outputs for Policy Evaluations (PEs), Complex Emergency Evaluations (CEEs) and Strategic Evaluations (SEs).

Two EMAP advisers wrote this Annual Report; the process of preparing it entailed:

  • Review of the advice provided by EMAP on WFP evaluations during 2023.
  • Discussion of the draft annual report with OEV, Regional Evaluation Officers (REOs) and other EMAP advisors in a two-day workshop at WFP. This report incorporates key elements from these discussions.

As in 2022, the 2023 review faced the following limitations:

  • The review included 14 DEs, 10 CSPEs, 5 PEs, 3 CEEs and 3 SEs, but analysed outputs were at different stages of development. EMAP reviewers
    prepared review reports for DEs and CSPEs based on finalised ToRs,
    inception and evaluation reports. Conversely, for SEs, PEs and CEEs, the
    reviews examined draft concept notes, ToRs, IRs, ERs, and two literature
  • Not all EMAP reviews undertaken in 2023 were finalised in time for the
    synthesis process undertaken to prepare the Annual Report.
  • Most reviews followed a structure provided by WFP which varied by
    evaluation type. For instance, the DE review template included a section on
    overall evaluation approaches and methods which was not included in the
    CSPE review template. Some reviews did not use the templates provided but
    added comments directly to the draft reports.
  • Finally, reviewing written evaluation outputs presented challenges to
    explaining why something did or did not happen in an evaluation process
  • Unlike in 2022, there was no opportunity for the EMAP to discuss the draft annual report as a panel before sharing it with OEV. The 2023 Annual Report was, however, discussed in a workshop with EMAP members and OEV staff, including regional evaluation officers, to validate the results and discuss potential ways forward across the different types of evaluations in WFP.

The Australian Centre for Evaluation – plans, context, critiques

The plan

The 2023–24 Budget includes $10 million over four years to establish an Australian Centre for Evaluation (ACE) in the Australian Treasury. The Australian Centre for Evaluation will improve the volume, quality, and impact of evaluations across the Australian Public Service (APS), and work in close collaboration with evaluation units in other departments and agencies.

The context

The critique(s)

    •  Risky behaviour — three predictable problems with the Australian Centre for Evaluation, by Patricia Rogers.  Some highlighted points, among many others of interest:
      • Three predictable problems
        • Firstly, the emphasis on impact evaluations risks displacing attention from other types of evaluation that are needed for accountable and effective government
        • Secondly, the emphasis on a narrow range of approaches to impact evaluation risks producing erroneous or misleading findings.
        • Thirdly, the focus on ‘measuring what works’ creates risks in terms of how evidence is used to inform policy and practice, especially in terms of equity.
          • These approaches are designed to answer the question “what works” on average, which is a blunt and often inappropriate guide to what should be done in a particular situation. “What works” on average can be ineffective or even harmful for certain groups; “what doesn’t work” on average might be effective in certain circumstances. ….This simplistic focus on “what works” risks presenting evidence-informed policy as being about applying an algorithm where the average effect is turned into a policy prescription for all.

Other developments

    • In September 2022 the Commonwealth Evaluation Community of Practice (CoP) was launched as a way of bringing people together to support and promote better practice evaluation across the policy cycle. The CoP Terms of Reference state that it is open to all Australian government officials with a role or interest in evaluation that can access community events, discussion boards and a SharePoint Workspace. According to the Department of Finance the CoP membership has grown to over 400 people with representatives from around 70 entities and companies.
      • It would be interesting to be a “fly on the wall” amidst such discussions

My own two pence worth

  • Not only do we need a diversity of evaluation approaches (vs “RCTs rule okay!”), we also need to get away from the idea of even one approach alone being sufficient for many evaluations – which are often asking multiple complex questions. We need more combinatorial thinking, rather than single solution thinking. So, for example, combining “causes of an effect” analyses with “effects of a cause” analyses
  • Getting away from “average effect” thinking (but not abandoning it altogether) is also an essential step forward. We need more attention to both positive and negative deviants from any averages. We also need more attention to configurational analyses, looking at packages of causes, rather than the role of multiple single factors treated as isolated (which in reality they are not). As pointed out by Patricia, equity is important – not just effectiveness and efficiency – i.e. the different consequences for different groups need to be identified. Yes, the question is not “What works” but “what works for whom in what ways and under what circumstances”.
    • Re “This simplistic focus on “what works” risks presenting evidence-informed policy as being about applying an algorithm where the average effect is turned into a policy prescription for all.” Yes, what we want to avoid (or minimise) is a society where “While the rich get personalised one to one services, the rest get management by algorithm”.
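
To make the “average effect” point concrete, here is a minimal arithmetic sketch in Python. The two groups and all the numbers are invented purely for illustration (they do not come from any evaluation); the point is only that a positive average can sit on top of one group that benefits and another that is harmed.

```python
from statistics import mean

# Hypothetical effect sizes (treated minus comparison) for two made-up groups.
# These figures are illustrative assumptions, not data from any study.
effects_by_group = {
    "group_a": [4, 5, 6, 5],      # the programme helps this group
    "group_b": [-3, -2, -4, -3],  # the programme harms this group
}

# "What works on average": pool everyone and take one mean.
all_effects = [e for effects in effects_by_group.values() for e in effects]
average_effect = mean(all_effects)

# "What works for whom": look at each group separately.
group_effects = {group: mean(e) for group, e in effects_by_group.items()}

print(f"average effect: {average_effect}")
for group, effect in group_effects.items():
    print(f"{group}: {effect}")
```

Read on its own, the pooled average would suggest rolling the programme out to everyone, while the group-level view shows that doing so would harm group_b.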

Connecting Foresight and Evaluation

This posting has been prompted by an email shared by Petra Mikkolainen, Senior Consultant – Development Evaluation, with NIRAS, a Finnish consulting firm. It follows my attendance at a UNESCO workshop last week that also looked at bridging Foresight and Evaluation

A good place to start is this NIRAS guide: 14 mental barriers to integrating futures-thinking in evaluations and how to overcome them

A new trend is emerging simultaneously in the field of evaluation and foresight: combining foresight with evaluation and evaluation with foresight. Evaluators realise that evaluation must become more future sensitive, while futures thinking experts consider that foresight should use more lessons from past events to strengthen the analysis of possible futures. This new mindset is useful, given that evaluation and foresight complement each other like two pieces of a puzzle. However, before we can move on with the debate, we must clarify what we mean by each concept and related key terms. This discussion paper serves as your quick guide to evaluation and foresight terminology.

Then there is “Evaluation must become future-sensitive – easy to implement ideas on how to do it in practice”

Evaluation – by definition – assesses past events to give recommendations for future action. There is an underlying assumption that what has (or has not) worked in the past will also work (or will not) in the future. In other words, it is supposed that the context in which the past events occurred will remain the same. This idea seems problematic in the current world, where volatility, uncertainty, complexity, and ambiguity (VUCA) are the new normal. One solution is to integrate methods of foresight into the evaluation project cycle. This idea of combining evaluation and foresight is relatively new and untested in the sector. This discussion paper proposes ways this integration can be done in practice in different steps of the evaluation project cycle.

Then there is: 14 mental barriers to integrating futures-thinking in evaluations and how to overcome them

There are two types of basic human reactions to new things: (1) “Yes, let’s try it!” and (2) “No, I don’t want that!”. We might observe one of these experiences in our minds when thinking about integrating foresight concepts and tools into development evaluation to make it more valuable and responsive to support transformative change. The danger with the first response is a lack of critical thinking about whether the approach is relevant to the situation. On the other hand, the second response might prevent reaching new levels of learning and co-creation. In this blog, I explore 14 types of resistance to applying futures-thinking in evaluation and suggest solutions with an attitude of positive curiosity.

One of the foresight methods mentioned on page 10 of the second document is ParEvo:

The ParEvo tool developed by Rick Davies is a web-assisted programme for building future (or past) scenarios in a participatory manner (Davies, 2022). It has been used in evaluations, and as described by Davies “When used to look forward ParEvo can be seen as a form of participatory exploration of alternate futures. When used to look back it can be seen as a form of participatory public history”. The website includes plenty of information on its applications.

Defining the Agenda: Key Lessons for Funders and Commissioners of Ethical Research in Fragile and Conflict Affected Contexts

By Leslie Groves-Williams. Funded by UK Research and Innovation (UKRI) and developed in collaboration with UNICEF, Office of Research – Innocenti.  A pdf copy is available online here

Publicised here because the issues and lessons identified also seem relevant to many evaluation activities

Text of the Introduction: The ethical issues that affect all research are amplified significantly in fragile and conflict-affected contexts. The power imbalances between local and international researchers are increased and the risk of harm is augmented within a context where safeguards are often reduced and the probabilities of unethical research that would be prohibited elsewhere are magnified. Funders and commissioners need to be confident that careful ethical scrutiny of the research process is conducted to mitigate risk, avoid potential harm and maximize the benefit of the commissioned research for affected populations, including through improving the quality and accuracy of data collected. The UKRI and UNICEF Ethical Research in Fragile and Conflict-Affected Contexts: Guidelines for Reviewers can support you to ensure that appropriate ethical scrutiny is taking place at review phase. But, what about mitigating for risks at the funding and commissioning phases? These phases are often not subject to ethical review yet carry strong ethical risks and opportunities. As a commissioner or a funder designing a call for research in fragile and conflict-affected contexts, how confident are you that you are commissioning the research in the most ethical way?

This document brings together some key lessons learned that provide guidance for funders and commissioners of research in fragile and conflict-affected contexts to ensure that ethical standards are applied, not just at the review stage, but also in formulating the research agenda. These lessons fall into four clusters:

1. Ethical Agenda Setting
2. Ethical Partnerships
3. Ethical Review
4. Ethical Resourcing.
In addition to highlighting the lessons, this paper provides mitigation strategies for funders and commissioners to explore as they seek to avoid the ethical risks highlighted

Algorithmic Impact Assessment – Three+ useful publications by Data & Society

In the movies, when a machine decides to be the boss — or humans let it — things go wrong. Yet despite myriad dystopian warnings, control by machines is fast becoming our reality. Photo: The Conversation / Shutterstock
As William Gibson famously said  circa 1992 “The future is already here — it’s just not very evenly distributed”  In 2021 the future is certainly here in the form of algorithms (rather than people) that manage low paid workers ( distribution centres, delivery services, etc), welfare service recipients and those caught up in the justice system. Plus anyone else having to deal with chatbots when trying to get through to other kinds of service providers. But is a counter-revolution brewing? Read on…

Selected quotes

“Algorithmic accountability is the process of assigning responsibility for harm when algorithmic decision-making results in discriminatory and inequitable outcomes”

“Among many applications, algorithms are used to:

• Sort résumés for job applications;
• Allocate social services;
• Decide who sees advertisements for open positions, housing, and products;
• Decide who should be promoted or fired;
• Estimate a person’s risk of committing crimes or the length of a prison term;
• Assess and allocate insurance and benefits;
• Obtain and determine credit; and
• Rank and curate news and information in search engines.”

“Algorithmic systems present a special challenge to assessors, because the harms of these systems are unevenly distributed, emerge only after they are integrated into society, or are often only visible in the aggregate”

“What our research indicates is that the risk of self-regulation lies not so much in a corrupted reporting and assessment process, but in the capacity of industry to define the methods and metrics used to measure the impact of proposed systems”

Algorithmic Accountability: A Primer. Data & Society. Caplan, R., Donovan, J., Hanson, L., & Matthews, J. (2018). 26 pages
What Is an Algorithm?
How Are Algorithms Used to Make Decisions?
Example: Racial Bias in Algorithms of Incarceration
Complications with Algorithmic Systems
• Fairness and Bias
• Opacity and Transparency
• Repurposing Data and Repurposing Algorithms
• Lack of Standards for Auditing
• Power and Control
• Trust and Expertise
What is Algorithmic Accountability?
• Auditing by Journalists
• Enforcement and Regulation
Assembling accountability: Algorithmic Impact Assessment for the Public Interest. Data & Society. Moss, E., Watkins, E. A., Singh, R., Elish, M. C., & Metcalf, J. (2021).


In summary: The Algorithmic Impact Assessment is a new concept for regulating algorithmic systems and protecting the public interest. Assembling Accountability: Algorithmic Impact Assessment for the Public Interest is a report that maps the challenges of constructing algorithmic impact assessments (AIAs) and provides a framework for evaluating the effectiveness of current and proposed AIA regimes. This framework is a practical tool for regulators, advocates, public-interest technologists, technology companies, and critical scholars who are identifying, assessing, and acting upon algorithmic harms.

First, report authors Emanuel Moss, Elizabeth Anne Watkins, Ranjit Singh, Madeleine Clare Elish, and Jacob Metcalf analyze the use of impact assessment in other domains, including finance, the environment, human rights, and privacy. Building on this comparative analysis, they then identify common components of existing impact assessment practices in order to provide a framework for evaluating current and proposed AIA regimes. The authors find that a singular, generalized model for AIAs would not be effective due to the variances of governing bodies, specific systems being evaluated, and the range of impacted communities.

After illustrating the novel decision points required for the development of effective AIAs, the report specifies ten necessary components that constitute robust impact assessment regimes.


What is an Impact?
What is Accountability?
What is Impact Assessment?
Sources of Legitimacy
Actors and Forum
Catalyzing Event
Time Frame
Public Access
Public Consultation
Harms and Redress
Existing and Proposed AIA Regulations
Algorithmic Audits
External (Third and Second Party) Audits
Internal (First-Party) Technical Audits and Governance Mechanisms
Sociotechnical Expertise
See also

The revised UNEG Ethical Guidelines for Evaluations (2020)

The UNEG Ethical Guidelines for Evaluation were first published in 2008. This document is a revision of the original document and was approved at the UNEG AGM 2020. These revised guidelines are consistent with the standards of conduct in the Charter of the United Nations, the Staff Regulations and Rules of the United Nations, the Standards of Conduct for the International Civil Service, and in the Regulations Governing the Status, Basic Rights and Duties of Officials other than Secretariat. They  are also consistent with the United Nations’ core values of Integrity, Professionalism and Respect for Diversity, the humanitarian principles of Humanity, Neutrality, Impartiality and Independence and the values enshrined in the Universal Declaration of Human Rights.

This document aims to support UN entity leaders and governing bodies as well as those organizing and conducting evaluations for the UN to ensure that an ethical lens informs day to day evaluation practice.

This document provides:

  • Four ethical principles for evaluation;
  • Tailored guidelines for entity leaders and governing bodies, evaluation organizers, and evaluation practitioners;
  • A detachable Pledge of Commitment to Ethical Conduct in Evaluation that all those involved in evaluations will be required to sign.

These guidelines are designed to be useful and applicable to all UN agencies, regardless of differences in mission (operational vs. normative agencies), in structures (centralized vs. decentralized), in the contexts for the work (development, peacekeeping, humanitarian) and in the nature of evaluations that are undertaken (oversight/accountability focused vs. learning).

On the usefulness of deliberate (but bounded) randomness in decision making


An introduction

In many spheres of human activity, relevant information may be hard to find, and it may be of variable quality. Human capacities to objectively assess that information may also be limited and variable. Extreme cases may be easy to assess, e.g. projects or research that is definitely worth/not worth funding or papers that are definitely worth/not worth publishing. But in between these extremes there may be substantial uncertainty and thus room for tacit assumptions and unrecognised biases to influence judgements. In some fields the size of this zone of uncertainty may be quite big (see Adam, 2019 below), so the consequences at stake can be substantial. This is the territory where a number of recent papers have argued that an explicitly random decision making process may be the best approach to take.

After you have scanned the references below, continue on to some musings about implications for how we think about complexity

The literature (a sample)

    • Nesta (2020) Why randomise funding? How randomisation can improve the diversity of ideas
    • Osterloh, M., & Frey, B. S. (2020, March 9). To ensure the quality of peer reviewed research introduce randomness. Impact of Social Sciences.  
      • Why random selection of contributions to which the referees do not agree? This procedure reduces the “conservative bias”, i.e. the bias against unconventional ideas. Where there is uncertainty over the quality of a contribution, referees have little evidence to draw on in order to make accurate evaluations. However, unconventional ideas may well yield high returns in the future. Under these circumstances a randomised choice among the unorthodox contributions is advantageous.
      • …two [possible] types of error: type I errors (“reject errors”) implying that a correct hypothesis is rejected, and type 2 errors implying that a false hypothesis is accepted (“accept errors”). The former matters more than the latter. “Reject errors” stop promising new ideas, sometimes for a long time, while “accept errors” lead to a waste of money, but may be detected soon once published. This is the reason why it is more difficult to identify “reject errors” than “accept errors”. Through randomisation the risks of “reject errors” are diversified.
  • Osterloh, M., & Frey, B. S. (2020). How to avoid borrowed plumes in academia. Research Policy, 49(1), 103831. Abstract: Publications in top journals today have a powerful influence on ac
  • Liu, M., Choy, V., Clarke, P., Barnett, A., Blakely, T., & Pomeroy, L. (2020). The acceptability of using a lottery to allocate research funding: A survey of applicants. Research Integrity and Peer Review, 5(1), 3.
    • Background: The Health Research Council of New Zealand is the first major government funding agency to use a lottery to allocate research funding for their Explorer Grant scheme. …  the Health Research Council of New Zealand wanted to hear from applicants about the acceptability of the randomisation process and anonymity of applicants.   The survey asked about the acceptability of using a lottery and if the lottery meant researchers took a different approach to their application. Results:… There was agreement that randomisation is an acceptable method for allocating Explorer Grant funds with 63% (n = 79) in favour and 25% (n = 32) against. There was less support for allocating funds randomly for other grant types with only 40% (n = 50) in favour and 37% (n = 46) against. Support for a lottery was higher amongst those that had won funding. Multiple respondents stated that they supported a lottery when ineligible applications had been excluded and outstanding applications funded, so that the remaining applications were truly equal. Most applicants reported that the lottery did not change the time they spent preparing their application. Conclusions: The Health Research Council’s experience through the Explorer Grant scheme supports further uptake of a modified lottery.
  • Roumbanis, L. (2019). Peer Review or Lottery? A Critical Analysis of Two Different Forms of Decision-making Mechanisms for Allocation of Research Grants. Science, Technology, & Human Values, 44(6), 994–1019.
  • Adam, D. (2019). Science funders gamble on grant lotteries. A growing number of research agencies are assigning money randomly. Nature, 575(7784), 574–575.
    • ….says that existing selection processes are inefficient. Scientists have to prepare lengthy applications, many of which are never funded, and assessment panels spend most of their time sorting out the specific order in which to place mid-ranking ideas. Low- and high-quality applications are easy to rank, she says. “But most applications are in the midfield, which is very big.”
    • The fund tells applicants how far they got in the process, and feedback from them has been positive, he says. “Those that got into the ballot and miss out don’t feel as disappointed. They know they were good enough to get funded and take it as the luck of the draw.”
  • Fang, F. C., & Casadevall, A. (2016). Research Funding: The Case for a Modified Lottery. MBio, 7(2).
    • ABSTRACT The time-honored mechanism of allocating funds based on ranking of proposals by scientific peer review is no longer effective, because review panels cannot accurately stratify proposals to identify the most meritorious ones. Bias has a major influence on funding decisions, and the impact of reviewer bias is magnified by low funding paylines. Despite more than a decade of funding crisis, there has been no fundamental reform in the mechanism for funding research. This essay explores the idea of awarding research funds on the basis of a modified lottery in which peer review is used to identify the most meritorious proposals, from which funded applications are selected by lottery. We suggest that a modified lottery for research fund allocation would have many advantages over the current system, including reducing bias and improving grantee diversity with regard to seniority, race, and gender.
  • Avin, S (2015) Breaking the grant cycle: on the rational allocation of public resources to scientific research projects
    • Abstract: The thesis presents a reformative criticism of science funding by peer review. The criticism is based on epistemological scepticism, regarding the ability of scientific peers, or any other agent, to have access to sufficient information regarding the potential of proposed projects at the time of funding. The scepticism is based on the complexity of factors contributing to the merit of scientific projects, and the rate at which the parameters of this complex system change their values. By constructing models of different science funding mechanisms, a construction supported by historical evidence, computational simulations show that in a significant subset of cases it would be better to select research projects by a lottery mechanism than by selection based on peer review. This last result is used to create a template for an alternative funding mechanism that combines the merits of peer review with the benefits of random allocation, while noting that this alternative is not so far removed from current practice as may first appear.
  • Schulson, M. (2014). If you can’t choose wisely, choose randomly. Aeon. A quick review of known instances of the use of randomness across different cultures, nationalities and periods of history
  • Fang, F. C., & Casadevall, A. (2014, April 14). Taking the Powerball Approach to Funding Medical Research. Wall Street Journal.
  • Stone, P. (2011). The Luck of the Draw: The Role of Lotteries in Decision Making.
    • From the earliest times, people have used lotteries to make decisions–by drawing straws, tossing coins, picking names out of hats, and so on. We use lotteries to place citizens on juries, draft men into armies, assign students to schools, and even on very rare occasions, select lifeboat survivors to be eaten. Lotteries make a great deal of sense in all of these cases, and yet there is something absurd about them. Largely, this is because lottery-based decisions are not based upon reasons. In fact, lotteries actively prevent reason from playing a role in decision making at all. Over the years, people have devoted considerable effort to solving this paradox and thinking about the legitimacy of lotteries as a whole. However, these scholars have mainly focused on lotteries on a case-by-case basis, not as a part of a comprehensive political theory of lotteries. In The Luck of the Draw, Peter Stone surveys the variety of arguments proffered for and against lotteries and argues that they only have one true effect relevant to decision making: the “sanitizing effect” of preventing decisions from being made on the basis of reasons. While this rationale might sound strange to us, Stone contends that in many instances, it is vital that decisions be made without the use of reasons. By developing innovative principles for the use of lottery-based decision making, Stone lays a foundation for understanding when it is–and when it is not–appropriate to draw lots when making political decisions both large and small
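
The “modified lottery” idea that recurs in the papers above (screen out the clearly unfundable, fund the clearly outstanding, then draw lots among the midfield) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the two score thresholds and the example scores are all hypothetical, and none of the cited funders is known to use this exact procedure.

```python
import random

def modified_lottery(scores, n_awards, fund_above, reject_below, seed=None):
    """Pick proposals to fund using a bounded lottery (illustrative sketch).

    scores       -- dict of proposal id -> peer review score (hypothetical scale)
    n_awards     -- number of grants available
    fund_above   -- assumed threshold: scores at or above this are funded outright
    reject_below -- assumed threshold: scores below this are excluded
    """
    rng = random.Random(seed)

    # Step 1: peer review handles the easy extremes.
    outstanding = [p for p, s in scores.items() if s >= fund_above]
    midfield = [p for p, s in scores.items() if reject_below <= s < fund_above]

    # Step 2: fund the outstanding proposals first, best score first.
    funded = sorted(outstanding, key=lambda p: scores[p], reverse=True)[:n_awards]

    # Step 3: fill any remaining awards by drawing lots among the midfield.
    remaining = n_awards - len(funded)
    if remaining > 0:
        funded += rng.sample(midfield, min(remaining, len(midfield)))
    return funded

# Made-up review scores for six proposals.
scores = {"A": 9.5, "B": 3.0, "C": 6.0, "D": 6.5, "E": 7.0, "F": 2.0}
funded = modified_lottery(scores, n_awards=3, fund_above=9, reject_below=5, seed=1)
print(funded)
```

In this example proposal A is always funded, B and F are never funded, and the two remaining awards are drawn at random from C, D and E, which is where reviewers would otherwise spend most of their time arguing over rank order.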

Randomness in other species

    • Drew, L. (2020). Random Search Wired Into Animals May Help Them Hunt. Quanta Magazine. Retrieved 2 February 2021, from
        • Of special interest here is the description of Levy walks, a variety of randomised movement where the frequency distribution of distances moved has one long tail. Levy walks have been the subject of exploration across multiple disciplines, as seen in…
    • Reynolds, A. M. (2018). Current status and future directions of Lévy walk research. Biology Open, 7(1).
        • Levy walks are specialised forms of random walks composed of clusters of multiple short steps with longer steps between them…. They are particularly advantageous when searching in uncertain or dynamic environments where the spatial scales of searching patterns cannot be tuned to target distributions…Nature repeatedly reveals the limits of our imagination. Lévy walks once thought to be the preserve of probabilistic foragers have now been identified in the movement patterns of human hunter-gatherers
Levy walk random versus Brownian motion random movement
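
For readers who want to see the difference rather than read about it, the following Python sketch generates both kinds of movement. It is a rough illustration, not the model used in the papers above: step lengths for the Levy-style walk are drawn from a Pareto (power-law) distribution with an assumed shape parameter of 1.5, and a thin-tailed walk with half-normal step lengths stands in for Brownian motion. Both choices, and all function names, are assumptions made for the example.

```python
import math
import random

def walk(n_steps, step_length, rng):
    """Take n_steps in uniformly random directions; return end point and step lengths."""
    x = y = 0.0
    lengths = []
    for _ in range(n_steps):
        length = step_length(rng)
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x += length * math.cos(angle)
        y += length * math.sin(angle)
        lengths.append(length)
    return (x, y), lengths

def levy_step(rng, alpha=1.5):
    # Heavy-tailed step lengths: mostly short, occasionally very long.
    # alpha is an assumed shape parameter, not a value taken from the literature cited here.
    return rng.paretovariate(alpha)

def brownian_step(rng, scale=1.0):
    # Thin-tailed step lengths: very long steps are vanishingly rare.
    return abs(rng.gauss(0.0, scale))

rng = random.Random(42)
_, levy_lengths = walk(10_000, levy_step, rng)
_, brownian_lengths = walk(10_000, brownian_step, rng)

print("longest Levy-style step:", max(levy_lengths))
print("longest Brownian-style step:", max(brownian_lengths))
```

The longest step in the Levy-style walk dwarfs anything in the Brownian-style walk, which is the “clusters of short steps with occasional long relocations” pattern described by Reynolds.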

Implications for thinking about complexity

Uncertainty of future states is a common characteristic of many complex systems, though not unique to these. One strategy that human organisations can use to deal with uncertainty is to build up capital reserves, thus enhancing longer term resilience albeit at the cost of more immediate efficiencies. From the first set of papers referenced above, it seems like the deliberate and bounded use of randomness could provide a useful second option. The work being done on Levy walks also suggests that there are interesting variations on randomisation that should be explored. It is already the case that designers of search/optimisation algorithms have headed this way. If you are interested, you can read further on the subject of what are called “Levy Flight” algorithms.

On a more light hearted note, I would be interested to hear from the Cynefin school on how comfortable they would be marketing this approach to “managing” uncertainty to the managers and leaders they seem keen to engage with.

Another thought…years ago I did an analysis of data that had been collected on development projects funded by the then DFID’s Civil Society Challenge Fund. This included data on project proposals, proposal assessments, and project outcomes. I used Rapid Miner Studio’s Decision Tree module to develop predictive models of achievement ratings of the funded projects. Somewhat disappointingly, I failed to identify any attributes of project proposals, or how they had been initially assessed, which were good predictors of the subsequent performance of those projects. There are a number of possible reasons why this might be so. One of these may be the scale of the uncertainty gap between the evident likely failures and the evident likely successes. Various biases may have skewed judgements within this zone in a way that undermined the longer term predictive use of the proposal screening and approval process. Somewhat paradoxically, if instead a lottery mechanism had been used for selecting fundable proposals in the uncertainty zone, this may well have led to the approval process being a better predictor of eventual project performance.

Postscript: Subsequent finds…

  •  The Powerball Revolution. By Malcolm Gladwell (n.d.). Revisionist History Season 5 Episode 3. Retrieved 7 April 2021, from
    • On school student council lotteries in Bolivia
      • “Running for an office” and “Running an office” can be two very different things. Lotteries diminish the former and put the focus on the latter
      • “It’s a more diverse group” that ends up on the council, compared to those selected via election
      • “Nobody knows anything”: initial impressions of capacity are often not good predictors of leadership capacity. This runs contrary to the assumption that voters can be good predictors of capacity in office.
    • Medical research grant review and selection
      • Review scores of proposals are poor predictors of influential and innovative research (based on citation analysis), yet this kind of scoring has been in use for decades.
    • A boarding school in New Jersey


Mapping the Standards of Evidence used in UK social policy.

Puttick, R. (2018). Mapping the Standards of Evidence used in UK social policy. Alliance for Useful Evidence.
“Our analysis focuses on 18 frameworks used by 16 UK organisations for judging evidence used in UK domestic social policy which are relevant to government, charities, and public service providers.
In summary:
• There has been a rapid proliferation of standards of evidence and other evidence frameworks since 2000. This is a very positive development and reflects the increasing sophistication of how evidence is generated and used in social policy.
• There are common principles underpinning them, particularly the shared goal of improving decision-making, but they often ask different questions, are engaging different audiences, generate different content, and have varying uses. This variance reflects the host organisation’s goals, which can be to inform its funding decisions, to make recommendations to the wider field, or to provide a resource for providers to help them evaluate.
• It may be expected that all evidence frameworks assess whether an intervention is working, but this is not always the case, with some frameworks assessing the quality of evidence, not the success of the intervention itself.
• The differences between the standards of evidence are often for practical reasons and reflect the host organisation’s goals. However, there is a need to consider more philosophical and theoretical tensions about what constitutes good evidence. We identified examples of different organisations reaching different conclusions about the same intervention; one thought it worked well, and the other was less confident. This is a problem: Who is right? Does the intervention work, or not? As the field develops, it is crucial that confusion and disagreement is minimised.
• One suggested response to minimise confusion is to develop a single set of standards of evidence. Although this sounds inherently sensible, our research has identified several major challenges which would need to be overcome to achieve this.
• We propose that the creation of a single set of standards of evidence is considered in greater depth through engagement with both those using standards of evidence, and those being assessed against them. This engagement would also help share learning and insights to ensure that standards of evidence are effectively achieving their goals.”

Computational Modelling: Technological Futures

Council for Science and Technology & Government Office for Science, 2020. Available as pdf

Not the most thrilling/enticing title, but definitely of interest. Chapter 3 provides a good overview of different ways of building models. Well worth a read, and definitely readable.

Recommendation 2: Decision-makers need to be intelligent customers for models, and those that supply models should provide appropriate guidance to model users to support proper use and interpretation. This includes providing suitable model documentation detailing the model purpose, assumptions, sensitivities, and limitations, and evidence of appropriate quality assurance.

Chapters 1-3

The Alignment Problem: Machine Learning and Human Values

By Brian Christian. 334 pages. 2020 Norton. Author’s web page here

Brian Christian talking about his book on YouTube

RD comment: This is one of the most interesting and informative books I have read in the last few years. It is totally relevant for evaluators thinking about the present and about future trends.
