## How to find the right answer when the “wisdom of the crowd” fails?

Posted on 9 April, 2017 – 6:39 PM

Dizikes, P. (2017). Better wisdom from crowds. MIT News Office. Retrieved from http://news.mit.edu/2017/algorithm-better-wisdom-crowds-0125. PDF copy.

Ross, E. (n.d.). How to find the right answer when the “wisdom of the crowd” fails. Nature News. https://doi.org/10.1038/nature.2017.21370

Prelec, D., Seung, H. S., & McCoy, J. (2017). A solution to the single-question crowd wisdom problem. Nature, 541(7638), 532–535. https://doi.org/10.1038/nature21054

Dizikes: The wisdom of crowds is not always perfect, but two scholars at MIT’s Sloan Neuroeconomics Lab, along with a colleague at Princeton University, have found a way to make it better. Their method, explained in a newly published paper, uses a technique the researchers call the “surprisingly popular” algorithm to better extract correct answers from large groups of people. As such, it could refine “wisdom of crowds” surveys, which are used in political and economic forecasting, as well as many other collective activities, from pricing artworks to grading scientific research proposals.

The new method is simple. For a given question, people are asked two things: what they think the right answer is, and what they think popular opinion will be. The variation between the two aggregate responses indicates the correct answer. [Ross: In most cases, the answers that exceeded expectations were the correct ones. Example: if Answer A was given by 70% but 80% expected it to be given, and Answer B was given by 30% but only 20% expected it to be given, then Answer B would be the “surprisingly popular” answer.]
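The selection rule described above can be sketched in a few lines of code. This is my own minimal illustration of the idea, not the authors' implementation; the function name and data structures are invented, using the vote shares from Ross's example:

```python
def surprisingly_popular(votes, predicted):
    """Pick the answer whose actual vote share most exceeds the
    crowd's prediction of that vote share.

    votes     -- {answer: fraction of respondents who chose it}
    predicted -- {answer: fraction respondents expected to choose it}
    """
    return max(votes, key=lambda a: votes[a] - predicted[a])

# Ross's example: A gets 70% of votes but was expected to get 80%;
# B gets 30% but was only expected to get 20%.
votes = {"A": 0.70, "B": 0.30}
predicted = {"A": 0.80, "B": 0.20}
print(surprisingly_popular(votes, predicted))  # → B
```

Answer B "outperforms expectations" by +10 percentage points while A underperforms by −10, so B is selected even though A won the simple majority vote.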

“In situations where there is enough information in the crowd to determine the correct answer to a question, that answer will be the one [that] most outperforms expectations,” says paper co-author Drazen Prelec, a professor at the MIT Sloan School of Management as well as the Department of Economics and the Department of Brain and Cognitive Sciences.

The paper is built on both theoretical and empirical work. The researchers first derived their result mathematically, then assessed how it works in practice, through surveys spanning a range of subjects, including U.S. state capitals, general knowledge, medical diagnoses by dermatologists, and art auction estimates.

Across all these areas, the researchers found that the “surprisingly popular” algorithm reduced errors by 21.3 percent compared to simple majority votes, and by 24.2 percent compared to basic confidence-weighted votes (where people express how confident they are in their answers). And it reduced errors by 22.2 percent compared to another kind of confidence-weighted vote, one that takes the answers with the highest average confidence levels.

But “… Prelec and Steyvers both caution that this algorithm won’t solve all of life’s hard problems. It only works on factual topics: people will have to figure out the answers to political and philosophical questions the old-fashioned way”.

Rick Davies comment: This method could be useful in an evaluation context, especially where participatory methods are needed or potentially useful.


## Fact Checking websites serving as public evidence-monitoring services: Some sources

Posted on 2 March, 2017 – 7:42 AM

These services seem to be getting more attention lately, so I thought it would be worthwhile compiling a list of some of the kinds of fact checking websites that exist, and how they work.

Fact checkers have the potential to influence policies at all stages of the policy development and implementation process, not by promoting particular policy positions based on evidence, but by policing the boundaries of what should be considered acceptable as factual evidence. They are responsive rather than pro-active.

International

• International Fact-Checking Network (IFCN) covers:
  • Research, trends and formats in fact-checking worldwide
  • Online and offline training resources for fact-checkers
  • Occasional collaborative efforts in international fact-checking, such as the ‘Relay-Check’ on refugees in the European Union
• Its host, Poynter.org, says it will investigate:
  • Using technology to turbo-power fact-checking
  • Measuring the impact of fact-checkers
  • Funding fact-checking
  • The ethics of fact-checking
• Duke Reporters’ Lab keeps a count of active and inactive fact-checking services here
• Aggregators:

American websites

• PolitiFact – a fact-checking website that rates the accuracy of claims by elected officials and others who speak up in American politics.
• FactCheck – monitors the factual accuracy of what is said by major U.S. political players in the form of TV ads, debates, speeches, interviews and news releases.
• Media Bias/Fact Check – claims to be “the most comprehensive media bias resource on the internet”, but content is mainly American.

Australia

United Kingdom

Discussions of the role of fact checkers

A related item, just seen…

• This site is “taking the edge off rant mode” by making readers pass a factual knowledge quiz before commenting. “If everyone can agree that this is what the article says, then they have a much better basis for commenting on it.”

Update 20/03/2017: Read Tim Harford’s blog posting on The Problem With Facts (pdf copy here), and the communicative value of eliciting curiosity.


## Integrating Big Data into the Monitoring and Evaluation of Development Programmes

Posted on 24 January, 2017 – 7:39 PM
Bamberger, M. (2016). Integrating Big Data into the Monitoring and Evaluation of Development Programmes. United Nations Global Pulse. Retrieved from http://unglobalpulse.org/big-data-monitoring-and-evaluation-report. PDF copy available.

Context: “This report represents a basis for integrating big data and data analytics in the monitoring and evaluation of development programmes. The report proposes a Call to Action, which hopes to inspire development agencies and particularly evaluators to collaborate with data scientists and analysts in the exploration and application of new data sources, methods, and technologies. Most of the applications of big data in international development do not currently focus directly on monitoring, and even less on evaluation. Instead they relate more to research, planning and operational use using big data. Many development agencies are still in the process of defining their policies on big data and it can be anticipated that applications to the monitoring and evaluation of development programmes will start to be incorporated more widely in the near future. This report includes examples and ways that big data, together with related information and communications technologies (ICTs) are already being used in programme monitoring, evaluation and learning. The data revolution has been underway for perhaps a decade now. One implication for international development is that new sources of real–time information about people are for the first time available and accessible. In 2015, in an unprecedented, inclusive and open process, 193 members states of the United Nations adopted, by consensus, the 2030 Agenda for sustainable development. The 17 Sustainable Development Goals (SDGs) contained in the 2030 Agenda constitute a transformative plan for people, planet, prosperity, partnerships and peace. All of these factors are creating a greater demand for new complexity–responsive evaluation designs that are flexible, cost effective and provide real–time information. 
At the same time, the rapid and exciting developments in the areas of new information technology (big data, information and communications technologies) are creating the expectation, that the capacity to collect and analyse larger and more complex kinds of data, is increasing. The report reviews the opportunities and challenges for M&E in this new, increasingly digital international development context. The SDGs are used to illustrate the need to rethink current approaches to M&E practices, which are no longer able to address the complexities of evaluation and interaction among the 17 Goals. This endeavour hopes to provide a framework for monitoring and evaluation practitioners in taking advantage of the data revolution to improve the design of their programmes and projects to support the achievement of the Sustainable Development Goals and the 2030 Agenda.

Rick Davies comment: As well as my general interest in this paper, I have two particular interests in its contents. One is what it says about small (rather than big) data and how big data analysis techniques may be relevant to the analysis of small data sets. In my experience many development agencies have rather small data sets, which are often riddled with missing data points. The other is what the paper has to say about predictive analytics, a field of analysis (within data mining defined more widely) that I think has a lot of relevance to M&E of development programmes.

Re the references to predictive analytics, I was disappointed to see this explanation on page 48: “Predictive analytics (PA) uses patterns of associations among variables to predict future trends. The predictive models are usually based on Bayesian statistics and identify the probability distributions for different outcomes“. In my understanding, Bayesian classification algorithms are only one of a number of predictive analytics tools which generate classifications (read predictive models). Here are some classifications of the different algorithms that are available: (a) Example A, focused on classification algorithms – with some limitations, (b) Example B, looking at classification algorithms within the wider ambit of data mining methods, from Maimon and Rokach (2010, p.6). Bamberger’s narrow definition is unfortunate because there are simpler and more transparent methods available, such as Decision Trees, which would be easier for many evaluators to use and whose results could be more easily communicated to their clients.
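To illustrate how transparent a tree-based prediction model can be, here is a minimal sketch of the simplest possible case: a one-split “decision stump”, the building block of a Decision Tree. The toy data and all names are invented for illustration; this is not from the report:

```python
from collections import Counter

def best_stump(cases, outcome_key):
    """Find the single attribute = value split that best predicts the
    outcome, scored by classification accuracy on the given cases."""
    attrs = [k for k in cases[0] if k != outcome_key]
    best = None
    for attr in attrs:
        for value in sorted({c[attr] for c in cases}):
            inside = [c[outcome_key] for c in cases if c[attr] == value]
            outside = [c[outcome_key] for c in cases if c[attr] != value]
            # Predict the majority outcome on each side of the split
            correct = Counter(inside).most_common(1)[0][1]
            if outside:
                correct += Counter(outside).most_common(1)[0][1]
            accuracy = correct / len(cases)
            if best is None or accuracy > best[0]:
                best = (accuracy, attr, value)
    return best

# Invented toy data: which project attribute best predicts success?
cases = [
    {"training": "yes", "funding": "high", "success": True},
    {"training": "yes", "funding": "low",  "success": True},
    {"training": "no",  "funding": "high", "success": False},
    {"training": "no",  "funding": "low",  "success": False},
]
print(best_stump(cases, "success"))  # → (1.0, 'training', 'no')
```

In this toy data the “training” attribute splits the cases perfectly while “funding” does no better than chance, and the result can be read off directly as a rule – the kind of communicable output that makes tree methods attractive for evaluators and their clients.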

Re my first interest re small data, I was more pleased to see this statement: “While some data analytics are based on the mining of very large data sets with very large numbers of cases and variables, it is also possible to apply many of the techniques such as predictive modelling with smaller data sets”. This heightens the importance of clearly spelling out the different ways in which predictive analytics work can be done.

I also agreed with the follow-on paragraph: “While predictive analytics are well developed, much less progress has been made on causal (attribution) analysis. Commercial predictive analytics tends to focus on what happened, or is predicted to happen (e.g. click rates on web sites), with much less attention to why outcomes change in response to variations in inputs (e.g. the wording or visual presentation of an on–line message). From the evaluation perspective, a limitation of predictive analysis is that it is not normally based on a theoretical framework, such as a theory of change, which explains the process through which outcomes are likely to be achieved. This is an area where there is great potential for collaboration between big data analytics and current impact evaluation methodologies”. My approach to connecting these two types of analysis is explained on the EvalC3 website. This involves connecting cross-case analysis (using predictive analytics tools, for example) to within-case analysis (using process tracing or simpler tools, for example) through carefully thought-through case selection and comparison strategies.

My interest and argument for focusing more on small data was reinforced when I saw this plausible and likely situation: “The limited access of many agencies to big data is another major consideration” (p69) – not a minor issue in a paper on the use and uses of big data! Though the paper does highlight the many and varied sources that are becoming increasingly available, and the risks and opportunities associated with their use.


## Monitoring and Evaluation in Health and Social Development: Interpretive and Social Development Perspectives

Posted on 17 January, 2017 – 4:47 PM

Edited by Stephen Bell and Peter Aggleton. Routledge 2016. View on Google Books

interpretive researchers thus attempt to understand phenomena through accessing the meanings participants assign to them

“...interpretive and ethnographic approaches are side-lined in much contemporary evaluation work and current monitoring and evaluation practice remains heavily influenced by more positivist approaches”

attribution is not the only purpose of impact evaluation

Lack of familiarity with qualitative approaches by programme staff and donor agencies also influences the preference for quantitative methods in monitoring and evaluation work

Contents

1. Interpretive and Ethnographic Perspectives – Alternative Approaches to Monitoring and Evaluation Practice

2. The Political Economy of Evidence: Personal Reflections on the Value of the Interpretive Tradition and its Methods

3. Measurement, Modification and Transferability: Evidential Challenges in the Evaluation of Complex Interventions

4. What Really Works? Understanding the Role of ‘Local Knowledges’ in the Monitoring and Evaluation of a Maternal, Newborn and Child Health Project in Kenya

PART 2: Programme Design 5. Permissions, Vacations and Periods of Self-regulation: Using Consumer Insight to Improve HIV Treatment Adherence in Four Central American Countries

6. Generating Local Knowledge: A Role for Ethnography in Evidence-based Programme Design for Social Development

7. Interpretation, Context and Time: An Ethnographically Inspired Approach to Strategy Development for Tuberculosis Control in Odisha, India

8. Designing Health and Leadership Programmes for Young Vulnerable Women Using Participatory Ethnographic Research in Freetown, Sierra Leone

Part 3: Monitoring Processes

9. Using Social Mapping Techniques to Guide Programme Redesign in the Tingim Laip HIV Prevention and Care Project in Papua New Guinea

10. Pathways to Impact: New Approaches to Monitoring and Improving Volunteering for Sustainable Environmental Management

11. Ethnographic Process Evaluation: A Case Study of an HIV Prevention Programme with Injecting Drug Users in the USA

12. Using the Reality Check Approach to Shape Quantitative Findings: Experience from Mixed Method Evaluations in Ghana and Nepal

Part 4: Understanding Impact and Change

13. Innovation in Evaluation: Using SenseMaker to Assess the Inclusion of Smallholder Farmers in Modern Markets

14. The Use of the Rapid PEER Approach for the Evaluation of Sexual and Reproductive Health Programmes

15. Using Interpretive Research to Make Quantitative Evaluation More Effective: Oxfam’s Experience in Pakistan and Zimbabwe

16. Can Qualitative Research Rigorously Evaluate Programme Impact? Evidence from a Randomised Controlled Trial of an Adolescent Sexual Health Programme in Tanzania

Rick Davies Comment: [Though this may reflect my reading biases…] It seems that this strand of thinking has not been in the forefront of M&E attention for a long time (i.e. maybe since the 1990s to early 2000s), so it is good to see this new collection of papers, by a large collection of both old and new faces (33 in all).


## New books on the pros and cons of algorithms

Posted on 5 January, 2017 – 2:42 PM

Algorithms are means of processing data in ways that can aid our decision making. One of the weak areas of evaluation practice is guidance on data analysis, as distinct from data gathering. In the last year or so I have been searching for useful books on the subject of algorithms – what they are, how they work and the risks and opportunities associated with their use. Here are a couple of books I have found worth reading, plus some blog postings.

Books

Christian, B., & Griffiths, T. (2016). Algorithms To Live By: The Computer Science of Human Decisions. William Collins. An excellent overview of a wide range of types of algorithms and how they work. I have read this book twice and found a number of ideas within it that have been practically useful for me in my work.

O’Neil, C. (2016). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. New York: Crown Publishing Group. A more depressing book, but a necessary read nevertheless, highlighting the risks posed to human welfare by poorly designed and/or poorly used algorithms. One of the examples cited is labor/staff scheduling algorithms, which very effectively minimize labor costs for employers, but at the cost of employees not being able to predictably schedule child care, second jobs or part-time further education, thus in effect locking those people into membership of a low-cost labor pool. Some algorithms are able to optimize multiple objectives, e.g. labor costs and labor turnover (representing longer-term costs), but both objectives are still employer-focused. Another area of concern is customer segmentation, where algorithms fed on big data sets enable companies to differentially (and non-transparently) price products and services being sold to ever smaller segments of their consumer population. In the insurance market this can mean that instead of the whole population sharing the costs of health insurance risks, which may in real life fall more on some than others, those costs will now be imposed more specifically on those with the high risks (regardless of the origins of those risks: genetic, environmental or an unknown mix).

Ezrachi, A., & Stucke, M. E. (2016). Virtual Competition: The Promise and Perils of the Algorithm-Driven Economy. Cambridge, Massachusetts: Harvard University Press. This one is a more in-depth analysis than the one above, focusing on the implications for how our economies work, and can fail to work.

Blog postings

Kleinberg, J., Ludwig, J., & Mullainathan, S. (2016, December 8). A Guide to Solving Social Problems with Machine Learning. Retrieved January 5, 2017, from Harvard Business Review website. A blog posting, easy to read and informative

Knight, Will, (2016, November 23) How to Fix Silicon Valley’s Sexist Algorithms, MIT Technology Review

Lipton, Zachary Chase, (2016) The foundations of algorithmic bias. KD Nuggets

Nicholas Diakopoulos and Sorelle Friedler (2016, November 17) How to Hold Algorithms Accountable,  MIT Technology Review. Algorithmic systems have a way of making mistakes or leading to undesired consequences. Here are five principles to help technologists deal with that.


## Dealing with missing data: A list

Posted on 20 November, 2016 – 12:48 PM

In this post “missing data” does not mean absence of whole categories of data, which is a common enough problem, but missing data values within a given data set.

While this is a common problem in almost all spheres of research/evaluation, it seems particularly common in more qualitative and participatory inquiry, where the same questions may not be asked of all participants/respondents. It is also likely to be a problem when data is extracted from documentary sources produced by different parties, e.g. project completion reports.

Some types of strategies (from Analytics Vidhya):

1. Deletion
   1. Listwise deletion: all cases with any missing data are dropped.
   2. Pairwise deletion: an analysis is carried out with all cases in which the variable of interest is present. The sub-set of cases used will vary according to the sub-set of variables which are the focus of each analysis.
2. Substitution
   1. Mean/Mode/Median imputation: replacing the missing data for a given attribute with the mean or median (quantitative attribute) or mode (qualitative attribute) of all known values of that variable. Two variants:
      1. Generalized: calculated across all cases.
      2. Similar case: calculated separately for different sub-groups, e.g. men versus women.
   2. K Nearest Neighbour (KNN) imputation: the missing values of an attribute are imputed using the values found in the k other cases that are most similar on the remaining attributes (where k = the number of neighbouring cases used).
   3. Prediction model: using a sub-set of cases with no missing values, a model is developed that best predicts the attribute of interest. This model is then applied to predict the missing values in the sub-set of cases with missing values. A variant, for continuous data:
      1. Regression substitution: using multiple-regression analysis to estimate a missing value.
3. Error estimation (tbc)

Note: I would like this list to focus on easily usable references i.e. those not requiring substantial knowledge of statistics and/or the subject of missing data
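In that spirit of easy usability, the simplest of the substitution strategies can be sketched in a few lines of plain Python. The function name and the toy data are my own invention, with `None` standing in for a missing value:

```python
from statistics import mean, mode

def impute(values, kind="mean"):
    """Replace None entries with the mean (quantitative attribute)
    or mode (qualitative attribute) of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known) if kind == "mean" else mode(known)
    return [fill if v is None else v for v in values]

ages = [25, None, 31, 40, None]      # quantitative attribute
sexes = ["f", "m", None, "f", "f"]   # qualitative attribute
print(impute(ages))                  # → [25, 32, 31, 40, 32]
print(impute(sexes, kind="mode"))    # → ['f', 'm', 'f', 'f', 'f']
```

The “similar case” variant would simply apply the same function separately within each sub-group (e.g. once for men, once for women) rather than across all cases at once.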


## The sources of algorithmic bias

Posted on 20 November, 2016 – 9:56 AM

“The foundations of algorithmic bias“, by Zachary Chase Lipton, 2016. pdf copy here. Original source here

“This morning, millions of people woke up and impulsively checked Facebook. They were greeted immediately by content curated by Facebook’s newsfeed algorithms. To some degree, this news might have influenced their perceptions of the day’s news, the economy’s outlook, and the state of the election. Every year, millions of people apply for jobs. Increasingly, their success might lie, in part, in the hands of computer programs tasked with matching applications to job openings. And every year, roughly 12 million people are arrested. Throughout the criminal justice system, computer-generated risk-assessments are used to determine which arrestees should be set free. In all these situations, algorithms are tasked with making decisions.

Algorithmic decision-making mediates more and more of our interactions, influencing our social experiences, the news we see, our finances, and our career opportunities. We task computer programs with approving lines of credit, curating news, and filtering job applicants. Courts even deploy computerized algorithms to predict “risk of recidivism”, the probability that an individual relapses into criminal behavior. It seems likely that this trend will only accelerate as breakthroughs in artificial intelligence rapidly broadened the capabilities of software.

Turning decision-making over to algorithms naturally raises worries about our ability to assess and enforce the neutrality of these new decision makers. How can we be sure that the algorithmically curated news doesn’t have a political party bias or job listings don’t reflect a gender or racial bias? What other biases might our automated processes be exhibiting that we wouldn’t even know to look for?”

Rick Davies Comment: This paper is well worth reading. It starts by explaining the basics (what an algorithm is and what machine learning is). Then it goes into detail about three sources of bias: (a) biased data, (b) bias by omission, and (c) surrogate objectives. It does not throw the baby out with the bathwater, i.e. condemn the use of algorithms altogether because of some bad practices and weaknesses in their use and design.

[TAKEAWAYS]

Many of the problems with bias in algorithms are similar to problems with bias in humans. Some articles suggest that we can detect our own biases and therefore correct for them, while for machine learning we cannot. But this seems far-fetched. We have little idea how the brain works. And ample studies show that humans are flagrantly biased in college admissions, employment decisions, dating behavior, and more. Moreover, we typically detect biases in human behavior post hoc by evaluating human behavior, not through an a priori examination of the processes by which we think.

Perhaps the most salient difference between human and algorithmic bias may be that with human decisions, we expect bias. Take, for example, the well-documented racial biases among employers, who are less likely to call back workers with typically black names than those with white names but identical resumes. We detect these biases because we suspect that they exist and have decided that they are undesirable, and therefore vigilantly test for their existence.

As algorithmic decision-making slowly moves from simple rule-based systems towards more complex, human-level decision making, it’s only reasonable to expect that these decisions are susceptible to bias.

Perhaps, by treating this bias as a property of the decision itself and not focusing overly on the algorithm that made it, we can bring to bear the same tools and institutions that have helped to strengthen ethics and equality in the workplace, college admissions etc. over the past century.

• How to Hold Algorithms Accountable, Nicholas Diakopoulos and Sorelle Friedler. MIT Technology Review, November 17, 2016. Algorithmic systems have a way of making mistakes or leading to undesired consequences. Here are five principles to help technologists deal with that.
• Is the Gig Economy Rigged? by Will Knight November 17, 2016 A new study suggests that racial and gender bias affect the freelancing websites TaskRabbit and Fiverr—and may be baked into underlying algorithms.


## Two useful papers on the practicalities of doing Realist Evaluation

Posted on 15 August, 2016 – 2:43 PM

1. Punton, M., Vogel, I., & Lloyd, R. (2016, April). Reflections from a Realist Evaluation in Progress: Scaling Ladders and Stitching Theory. IDS. Available here

2. Manzano, A. (2016). The craft of interviewing in realist evaluation. Evaluation, 22(3), 342–360. Available here.

Rick Davies comment: I have listed these two papers here because I think they both make useful contributions towards enabling people (myself and others) to understand how to actually do a Realist Evaluation. My previous reading of comments that Realist Evaluation (RE) is “an approach” or “a way of thinking” rather than a method has not been encouraging. Both of these papers provide practically relevant details. The Punton et al paper includes comments about the difficulties encountered and where they deviated from current or suggested practice and why, which I found refreshing.

I have listed some issues of interest to me below, with reflections on the contributions of the two papers.

#### Interviews as sources of knowledge

Interviews of stakeholders about if, how and why a program works are a key resource in most REs (Punton et al). Respondents’ views are both sources of theories and sources of evidence for and against those theories, and there seems to be potential for mixing these up in a way that makes the process of theory elicitation and testing less explicit than it should be. Punton et al have partially addressed this by coding the status of views about reported outcomes as “observed”, “anticipated” or “implied”. The same approach could be taken with the recording of respondents’ views on the context and mechanisms involved.

Manzano makes a number of useful distinctions between RE and constructivist interview approaches. But one distinction that is made seems unrealistic, so to speak: “…data collected through qualitative interviews are not considered constructions. Data are instead considered “evidence for real phenomena and processes”. But respondents themselves, as shown in some quotes in the paper, will indicate that on some issues they are not sure, they have forgotten or they are guessing. What is real here is that respondents are themselves making their best efforts to construct some sense out of a situation. So the issue of careful coding of the status of respondents’ views – as to whether they are theories or not, and if observations, what status these have – is important.

#### How many people to interview

According to Manzano there is no simple answer to this question, but it is clear that in the early stages of a RE the emphasis is on capturing a diversity of stakeholder views in such a way that the diversity of possible CMOs might be identified. So I was worried that the Punton et al paper referred to interviews being conducted in only 5 of the 11 countries where the BCURE program was operating. If some contextual differences are more influential than others, then I would guess that cross-country differences would be one such type of difference. I know that in all evaluations resources are in limited supply and choices need to be made. But this one puzzled me.

[Later edit] I think part of the problem here is the lack of what could be called an explicit search strategy. The problem is that the number of useful CMOs that could be identified is potentially equal to the number of people affected by a program, or perhaps even a multiple of that if they encountered the program on multiple occasions. Do you try to identify all of these, or do you stop when the number of new CMOs starts to drop off, per extra x number of interviewees? Each of these is a kind of search strategy. One pragmatic way of limiting the number of possible CMOs to investigate might be to decide in advance on just how dis-aggregated an analysis of “what works for whom in what circumstances” should be. To do this one would need to be clear on what the unit of analysis should be. I partially agree and disagree with Manzano’s point that “the unit of analysis is not the person, but the events and processes around them, every unique program participant uncovers a collection of micro-events and processes, each of which can be explored in multiple ways to test theories”. From my point of view, the person, especially the intended beneficiaries, should be the central focus, and selected events and processes are relevant in as much as they impinge on these people’s lives. I would re-edit the phrase above as follows: “what works for whom in what circumstances”

If the unit of analysis is some category of persons then my guess is that the smallest unit of analysis would be a group of people probably defined by a combination of geographic dimensions (e.g. administrative units) and demographic dimensions (e.g. gender, religion, ethnicity of people to be affected). The minimal number of potential differences between these units of analysis seems to be N-1 (where N = number of identifiable groups) as shown by this fictional example below, where each green node is a point of difference between groups of people. Each of these points of difference could be explained by a particular CMO.

I have one reservation about this approach. It requires some form of prior knowledge about the groupings that matter. That is not unreasonable when evaluating a program that had an explicit goal about reaching particular people. But I am wondering if there is also a more inductive search option. [To be continued…perhaps]

#### How to interview

Both papers had lots of useful advice on how to interview, from a RE perspective. This is primarily from a theory elicitation and clarification perspective.

#### How to conceptualise CMOs

Both papers noted difficulties in operationalising the idea of CMOs, but also had useful advice in this area. Manzano broke the concept of Context down into sub-constructs, such as characteristics of the patients, staff and infrastructure in the setting she was examining. Punton et al introduced a new category of Intervention, alongside Context and Mechanism. In a development aid context this makes a lot of sense to me. Both authors used interviewing methods that avoided any reference to “CMOs” as a technical term.

#### Consolidating the theories

After exploring what could be an endless variety of CMOs, a RE process needs to enter a consolidation phase. Manzano points out: “In summary, this phase gives more detailed consideration to a smaller number of CMOs which belong to many families of CMOs”. Punton et al refer to a process of abstraction that leads to more general explanations “which encompass findings from across different respondents and country settings”. This process sounds very similar in principle to the process of minimization used in QCA, which takes a more algorithm-based approach. To my surprise the Punton et al paper highlights differences between QCA and RE rather than potential synergies. A good point about their paper is that it explains this stage in more detail than Manzano’s, which is focused more specifically on interview processes.

#### Testing the theories

The Punton et al paper does not go into this territory because of the early stage of the work that it is describing. Manzano makes more reference to this process, but mainly in the context of interviews that are eliciting people’s theories. This is the territory where more light needs to be shone in future, hopefully by follow-up papers by Punton et al. My continuing impression is that theory elicitation and testing are so bound up together that the process of testing is effectively not transparent and thus difficult to verify or replicate. But readers could point me to other papers where this view could be corrected…:-)


## The Value of Evaluation: Tools for Budgeting and Valuing Evaluations

Posted on 4 August, 2016 – 9:12 PM
Barr, J., Rinnert, D., Lloyd, R., Dunne, D., & Hentinnen, A. (2016, August). The Value of Evaluation: Tools for Budgeting and Valuing Evaluations. Research for Development Output, GOV.UK. Itad & DFID.

Exec Summary (first part): “DFID has been at the forefront of supporting the generation of evidence to meet the increasing demand for knowledge and evidence about what works in international development. Monitoring and evaluation have become established tools for donor agencies and other actors to demonstrate accountability and to learn. At the same time, the need to demonstrate the impact and value of evaluation activities has also increased. However, there is currently no systematic approach to valuing the benefits of an evaluation, whether at the individual or at the portfolio level.

This paper argues that the value proposition of evaluations for DFID is context-specific, but that it is closely linked to the use of the evaluation and the benefits conferred to stakeholders by the use of the evidence that the evaluation provides. Although it may not always be possible to quantify and monetise this value, it should always be possible to identify and articulate it.

In the simplest terms, the cost of an evaluation should be proportionate to the value that an evaluation is expected to generate. This means that it is important to be clear about the rationale, purpose and intended use of an evaluation before investing in one. To provide accountability for evaluation activity, decision makers are also interested to know whether an evaluation was ‘worth it’ after it has been completed. Namely, did the investment in the evaluation generate information that is in itself more valuable and useful than using the funds for another purpose?

Against this background, this paper has been commissioned by DFID to answer two main questions:

1. What different methods and approaches can be used to estimate the value of evaluations before commissioning decisions are taken and what tools and approaches are available to assess the value of an already concluded evaluation?

2. How can these approaches be simplified and merged into a practical framework that can be applied and further developed by evaluation commissioners to make evidence-based decisions about whether and how to evaluate before commissioning and contracting?”

Rick Davies comment: The points I noted/highlighted…
• “…there is surprisingly little empirical evidence available to demonstrate the benefits of evaluation, or to show they can be estimated” … “‘Evidence’ is thus usually seen as axiomatically ‘a good thing’”
• “A National Audit Office (NAO) review (2013) of evaluation in government was critical across its sample of departments – it found that: ‘There is little systematic information from the government on how it has used the evaluation evidence that it has commissioned or produced’.”
• “…there is currently no systematic approach to valuing the benefits of an evaluation, whether at the individual or at the portfolio level”
• “…most ex-ante techniques may be too time-consuming for evaluation commissioners, including DFID, to use routinely”
• “The concept of ‘value’ of evaluations is linked to whether and how the knowledge generated during or from an evaluation will be used and by whom.”

The paper proposes that:

• “Consider selecting a sample of evaluations for ex-post valuation within any given reporting period”. Earlier it notes that “…a growing body of ex-post valuation of evaluations at the portfolio level, and their synthesis, will build an evidence base to inform evaluation planning and create a feedback loop that informs learning about commissioning more valuable evaluations”
• “Qualitative approaches that include questionnaires and self-evaluation may offer some merits for commissioners in setting up guidance to standardise the way ongoing and ex-post information is collected on evaluations for ex-post assessment of the benefits of evaluations.”
• “Consider using a case study template for valuing DFID evaluations”
• “An ex-ante valuation framework is included in this paper (see section 4) which incorporates information from the examination of the above techniques and recommendations. Commissioners could use this framework to develop a tool, to assess the potential benefit of evaluations to be commissioned”

While I agree with all of these…

• There is already a body of empirically-oriented literature on evaluation use dating back to the 1980s that should be given adequate attention. See my probably incomplete bibliography here. This includes a very recent 2016 study by USAID.
• The use of case studies of the kind used by the Research Excellence Framework (REF), known as ‘Impact Case Studies’, makes sense. As this paper noted: “The impact case studies do not need to be representative of the spread of research activity in the unit rather they should provide the strongest examples of impact”. They are, in other words, a kind of “Most Significant Change” story, including the MSC-type requirement that there be “a list of sufficient sources that could, if audited, corroborate key claims made about the impact of the research”. Evaluation use is not a kind of outcome where it seems to make much sense investing a lot of effort into establishing “average effects”. Per unit of money invested, it would seem to make more sense to search for the most significant changes (both positive and negative) that people perceive as the effects of an evaluation.
• The ex-ante valuation framework is in effect a “loose” Theory of Change, which needs to be put to use and then tested for its predictive value! Interpreted in crude terms, presumably the more of the criteria listed in the Evaluation Decision Framework (on page 26) that are met by a given evaluation, the higher our expectations that the evaluation will be used and have an impact. There are stacks of normative frameworks around telling us how to do things, e.g. on how to have effective partnerships. However, good ideas like these need to be disciplined by some effort to test them against what happens in reality.

## Process Tracing and Bayesian updating for impact evaluation

Posted on 14 July, 2016 – 4:43 PM
Befani, B., & Stedman-Bryce, G. (2016). Process Tracing and Bayesian updating for impact evaluation. Evaluation. https://doi.org/10.1177/1356389016654584


Abstract: Commissioners of impact evaluation often place great emphasis on assessing the contribution made by a particular intervention in achieving one or more outcomes, commonly referred to as a ‘contribution claim’. Current theory-based approaches fail to provide evaluators with guidance on how to collect data and assess how strongly or weakly such data support contribution claims. This article presents a rigorous quali-quantitative approach to establish the validity of contribution claims in impact evaluation, with explicit criteria to guide evaluators in data collection and in measuring confidence in their findings. Coined as ‘Contribution Tracing’, the approach is inspired by the principles of Process Tracing and Bayesian Updating, and attempts to make these accessible, relevant and applicable by evaluators. The Contribution Tracing approach, aided by a symbolic ‘contribution trial’, adds value to impact evaluation theory-based approaches by: reducing confirmation bias; improving the conceptual clarity and precision of theories of change; providing more transparency and predictability to data-collection efforts; and ultimately increasing the internal validity and credibility of evaluation findings, namely of qualitative statements. The approach is demonstrated in the impact evaluation of the Universal Health Care campaign, an advocacy campaign aimed at influencing health policy in Ghana.


Rick Davies comment: Unfortunately this paper is behind a paywall, but it may become more accessible in the future. If so, I recommend reading it, along with some related papers. These include a recent IIED paper on process tracing: Clearing the fog: new tools for improving the credibility of impact claims, by Barbara Befani, Stefano D’Errico, Francesca Booker, and Alessandra Giuliani, which is also about combining process tracing with Bayesian updating. The other is Azad, K. (n.d.). An Intuitive (and Short) Explanation of Bayes’ Theorem, which helped me a lot. Also worth watching out for are future courses on contribution tracing run by Pamoja. I attended their first three-day training event on contribution tracing this week. It was hard going, but by the third day I felt I was getting on top of the subject matter. It was run by Befani and Stedman-Bryce, the authors of the main paper above.

Why am I recommending this reading? Because the combination of process tracing and Bayesian probability calculation strikes me as a systematic and transparent way of assessing the evidence for and against a causal claim. The downside is the initial difficulty of understanding the concepts involved. As with some other impact assessment tools and methods, what you gain in rigour seems to be put at risk by the difficulty of communicating how the method works, leaving non-specialist audiences having to trust your judgement, which is what the use of such methods tries to avoid in the first place. The other issue which I think needs more attention is how you aggregate or synthesise multiple contribution claims that are found to have substantial posterior probability. And niggling in the background is a thought: what happens to all the contribution claims that are found not to be supported?
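The Bayesian updating at the heart of Contribution Tracing can be sketched in a few lines. This is a hedged illustration with invented numbers, not the worked example from the paper: confidence in a contribution claim is revised after observing a piece of evidence, where "sensitivity" is the probability of seeing the evidence if the claim is true and the "type I error" is the probability of seeing it if the claim is false.

```python
def update(prior, sensitivity, type1_error):
    """Posterior probability of the claim after observing the evidence,
    via Bayes' theorem: P(claim | evidence)."""
    p_evidence = sensitivity * prior + type1_error * (1 - prior)
    return sensitivity * prior / p_evidence

prior = 0.5           # agnostic starting confidence in the claim
sensitivity = 0.8     # evidence is likely if the claim is true...
type1_error = 0.1     # ...but unlikely if it is false (a "smoking gun" test)

posterior = update(prior, sensitivity, type1_error)
print(round(posterior, 3))  # 0.889: confidence in the claim rises sharply
```

Evidence with a low type I error does most of the work here, which is why the approach pushes evaluators to look for items of evidence that would be improbable if the contribution claim were false.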