Monitoring and Evaluation NEWS

The Alignment Problem: Machine Learning and Human Values

By Brian Christian. 334 pages. Norton, 2020. Author’s web page here

Brian Christian talking about his book on YouTube

RD comment: This is one of the most interesting and informative books I have read in the last few years. Totally relevant for evaluators thinking about the present and about future trends.

Releasing the power of digital data for development. A guide to new opportunities

Releasing the power of digital data for development: A guide to new opportunities. (2020). Frontier Technologies, UKAID, NIRAS.

Available online here: https://datafutures.org/knowledge-products/frontier-data-study-insights-and-guidance-about-how-to-use-digital-data-to-support-the-sdgs/

Contents

Section 1 Executive Summary
Section 2 Introduction
Section 3 Understanding and navigating the new data landscape
Section 4 What is needed to release the new potential?
Section 5 Further considerations
Appendix 1: Data opportunities potentially useful now in testing environments
Appendix 2: Bibliography and further reading
Appendix 3: Methodological notes

Executive Summary

There are 8 conclusions we discuss in this report.

1. There is justified excitement and proven benefits in the use of new digital data sources, particularly where timeliness of data is important or there are persistent gaps in traditional data sources. This might include data from fragile and conflict-affected states, data supporting decision-making about marginalised population groups, or in finding solutions to address persistent ethical issues where traditional sources have not proved adequate.

2. In many cases, improvements in and greater access to traditional data sources could be more effective than just new data alone, including developing traditional data in tandem with new data sources. This includes innovations in digitising traditional data sources, supporting the sharing of data between and within organisations, and integrating the use of new data sources with traditional data.

3. Decision-making around the use of new data sources should be highly devolved by empowering individual staff and be focused on multiple dimensions of data quality, not least because there are no “one size fits all” rules that determine how new digital data sources fit to specific needs, subject matters or geographies. This could be supported by ensuring:
a. Research, innovation, and technical support are highly demand-led, driven by specific data user needs in specific contexts; and
b. Staff have accessible guidance that demystifies the complexities of new data sources, clarifies the benefits and risks that need to be managed, and allows them to be ‘data brokers’ confident in navigating the new data landscape, innovating in it, and coordinating the technical expertise of others.

The main report includes a description of the evidence and conclusions in a way that supports these aims, including a set of guides for staff about the most promising new data sources.

4. Where traditional data sources are failing to provide the detailed data needed, most new data sources provide a potential route to helping with the Agenda 2030 goal to ‘leave no-one behind,’ as often they can provide additional granularity on population sub-groups. But, to avoid harming the interests of marginalised groups, strong ethical frameworks are needed, and affected people should be involved in decision-making about how data is processed and used. Action is also required to ensure strong data protection environments according to each type of new data and the contexts of its use.

5. New data sources with the highest potential added value for exploitation now, especially when combined with each other or traditional data sources, were found to be:
a. data from Earth Observation (EO) platforms (including satellites and drones)
b. passive location data from mobile phones

6. While there are specific limitations and risks in different circumstances, each of these data sources provides for significant gains in certain dimensions of data quality compared to some traditional sources and other new data sources. The use of Artificial Intelligence (AI) techniques, such as through machine learning, has high potential to add value to digital datasets in terms of improving aspects of data quality from many different sources, such as social media data, and particularly with large complex datasets and across multiple data sources.

7. Beyond the current time horizon, the most potential for emerging data sources is likely to come from:
• The next generation of Artificial Intelligence
• The next generation of Earth Observation platforms
• Privacy Preserving Data Sharing (PPDS) via the Cloud and
• the Internet of Things (IoT).
No significant other data sources, technologies or techniques were found with high potential to benefit FCDO’s work, which seems to be in line with its current research agenda and innovative activities. Some longer-term data prospects have been identified and these could be monitored to observe increases in their potential in the future.

8. Several other factors are relevant to the optimal use of digital data sources which should be investigated and/or work in these areas maintained. These include important internal and external corporate developments, notably continued support to Open Data/data sharing and enhanced data security systems to underpin it, learning across disciplinary boundaries with official statistics principles at the core, and continued support to capacity-building of national statistical systems in developing countries in traditional data and data innovation.

Calling Bullshit: The Art of Skepticism in a Data-Driven World

Reviews

Wired review article

Guardian review article

Forbes review article

Kirkus Review article

Podcast Interview with the authors here

About Calling Bullshit (publisher blurb)
“Bullshit isn’t what it used to be. Now, two science professors give us the tools to dismantle misinformation and think clearly in a world of fake news and bad data.

Misinformation, disinformation, and fake news abound and it’s increasingly difficult to know what’s true. Our media environment has become hyperpartisan. Science is conducted by press release. Startup culture elevates bullshit to high art. We are fairly well equipped to spot the sort of old-school bullshit that is based in fancy rhetoric and weasel words, but most of us don’t feel qualified to challenge the avalanche of new-school bullshit presented in the language of math, science, or statistics. In Calling Bullshit, Professors Carl Bergstrom and Jevin West give us a set of powerful tools to cut through the most intimidating data.

You don’t need a lot of technical expertise to call out problems with data. Are the numbers or results too good or too dramatic to be true? Is the claim comparing like with like? Is it confirming your personal bias? Drawing on a deep well of expertise in statistics and computational biology, Bergstrom and West exuberantly unpack examples of selection bias and muddled data visualization, distinguish between correlation and causation, and examine the susceptibility of science to modern bullshit.

We have always needed people who call bullshit when necessary, whether within a circle of friends, a community of scholars, or the citizenry of a nation. Now that bullshit has evolved, we need to relearn the art of skepticism.”

Evaluation Failures: 22 Tales of Mistakes Made and Lessons Learned

Edited by Kylie Hutchinson (Community Solutions, Vancouver, Canada). Published by Sage, 2018. https://us.sagepub.com/en-us/nam/evaluation-failures/book260109

But $30 for a 184-page paperback is going to limit its appeal! The electronic version is similarly expensive, more like the cost of a hardback. Fortunately, two example chapters (1 and 8) are available as free pdfs, see below. Reading those two chapters makes me think the rest of the book would also be well worth reading. It is not often you see anything written at length about evaluation failures. Perhaps we should set up an online confessional, where we can line up to anonymously confess our un/professional sins. I will certainly be one of those needing to join such a queue! :)

PART I. MANAGE THE EVALUATION

Chapter 1. It’s Not Me, It’s You: The Value of Addressing Conflict Head On

Chapter 2. The Scope Creep Train Wreck: How Responsive Evaluation Can Go Off the Rails

Chapter 3. The Buffalo Jump: Lessons After the Fall

Chapter 4. Evaluator Self-Evaluation: When Self-Flagellation Is Not Enough

PART II. ENGAGE STAKEHOLDERS

Chapter 5. That Alien Feeling: Engaging All Stakeholders in the Universe

Chapter 6. Seeds of Failure: How the Evaluation of a West African

Chapter 7. I Didn’t Know I Would Be a Tightrope Walker Someday: Balancing Evaluator Responsiveness and Independence

Chapter 8. When National Pride Is Beyond Facts: Navigating Conflicting Stakeholder Requirements

PART III. BUILD EVALUATION CAPACITY

Chapter 9. Stars in Our Eyes: What Happens When Things Are Too Good to Be True

PART IV. DESCRIBE THE PROGRAM

Chapter 10. A “Failed” Logic Model: How I Learned to Connect With All Stakeholders

Chapter 11. Lost Without You: A Lesson in System Mapping and Engaging Stakeholders

PART V. FOCUS THE EVALUATION DESIGN

Chapter 12. You Got to Know When to Hold ’Em: An Evaluation That Went From Bad to Worse

Chapter 13. The Evaluation From Hell: When Evaluators and Clients Don’t Quite Fit

PART VI. GATHER CREDIBLE EVIDENCE

Chapter 14. The Best Laid Plans of Mice and Evaluators: Dealing With Data Collection Surprises in the Field

Chapter 15. Are You My Amigo, or My Chero? The Importance of Cultural Competence in Data Collection and Evaluation

Chapter 16. OMG, Why Can’t We Get the Data? A Lesson in Managing Evaluation Expectations

Chapter 17. No, Actually, This Project Has to Stop Now: Learning When to Pull the Plug

Chapter 18. Missing in Action: How Assumptions, Language, History, and Soft Skills Influenced a Cross-Cultural Participatory Evaluation

PART VII. JUSTIFY CONCLUSIONS

Chapter 19. “This Is Highly Illogical”: How a Spock Evaluator Learns That Context and Mixed Methods Are Everything

Chapter 20. The Ripple That Became a Splash: The Importance of Context and Why I Now Do Data Parties

Chapter 21. The Voldemort Evaluation: How I Learned to Survive Organizational Dysfunction, Confusion, and Distrust

PART VIII. REPORT AND ENSURE USE

Chapter 22. The Only Way Out Is Through

Conclusion

Free Coursera online course: Qualitative Comparative Analysis (QCA)

Highly recommended! A well-organised, very clear and systematic exposition. Available at: https://www.coursera.org/learn/qualitative-comparative-analysis

About this Course

Welcome to this massive open online course (MOOC) about Qualitative Comparative Analysis (QCA). Please read the points below before you start the course. This will help you prepare well for the course and attend it properly. It will also help you determine if the course offers the knowledge and skills you are looking for.

What can you do with QCA?

QCA is a comparative method that is mainly used in the social sciences for the assessment of cause-effect relations (i.e. causation).
QCA is relevant for researchers who normally work with qualitative methods and are looking for a more systematic way of comparing and assessing cases.
QCA is also useful for quantitative researchers who would like to assess alternative (more complex) aspects of causation, such as how factors work together in producing an effect.
QCA can be used for the analysis of cases on all levels: macro (e.g. countries), meso (e.g. organizations) and micro (e.g. individuals).
QCA is mostly used for research of small- and medium-sized samples and populations (10-100 cases), but it can also be used for larger groups. Ideally, the number of cases is at least 10.
QCA cannot be used if you are doing an in-depth study of one case.
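To make the idea of factors “working together” a little more concrete, here is a minimal sketch of one core step in crisp-set QCA: grouping cases into a truth table by their configuration of binary conditions, and computing for each configuration the share of its cases that show the outcome (its consistency as a sufficient condition). The condition names (A, B), the outcome Y and the case data are invented for illustration and are not from the course; later steps such as logical minimisation are omitted, and real studies would normally use dedicated software (for example the R package QCA).

```python
from collections import defaultdict

# Hypothetical crisp-set data: each case has binary conditions A, B and outcome Y.
cases = {
    "case1": {"A": 1, "B": 1, "Y": 1},
    "case2": {"A": 1, "B": 1, "Y": 1},
    "case3": {"A": 1, "B": 0, "Y": 0},
    "case4": {"A": 0, "B": 1, "Y": 0},
    "case5": {"A": 0, "B": 0, "Y": 0},
    "case6": {"A": 1, "B": 1, "Y": 0},
}

CONDITIONS = ("A", "B")


def truth_table(cases, conditions, outcome="Y"):
    """Group cases by configuration; return {config: (n_cases, consistency)}."""
    groups = defaultdict(list)
    for values in cases.values():
        config = tuple(values[c] for c in conditions)
        groups[config].append(values[outcome])
    table = {}
    for config, outcomes in groups.items():
        # Crisp-set consistency of sufficiency: the share of cases with this
        # configuration that also show the outcome.
        table[config] = (len(outcomes), sum(outcomes) / len(outcomes))
    return table


table = truth_table(cases, CONDITIONS)
for config, (n, consistency) in sorted(table.items(), reverse=True):
    print(dict(zip(CONDITIONS, config)), "n =", n, "consistency =", round(consistency, 2))
```

In this made-up data, A and B present together are followed by the outcome in two of three cases (consistency of about 0.67), while neither condition on its own is ever followed by the outcome. That is the kind of conjunctural pattern QCA is designed to surface.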

What will you learn in this course?

The course is designed for people who have no or little experience with QCA.
After the course you will understand the methodological foundations of QCA.
After the course you will know how to conduct a basic QCA study by yourself.

How is this course organized?

The MOOC takes five weeks. The specific learning objectives and activities per week are mentioned in appendix A of the course guide. Please find the course guide under Resources in the main menu.
The learning objectives with regard to understanding the foundations of QCA and practically conducting a QCA study are pursued throughout the course. However, week 1 focuses more on the general analytic foundations, and weeks 2 to 5 are more about the practical aspects of a QCA study.
The activities of the course include watching the videos, consulting supplementary material where necessary, and doing assignments. The activities should be done in that order: first watch the videos; then consult supplementary material (if desired) for more details and examples; then do the assignments.
There are 10 assignments. Appendix A in the course guide states the estimated time needed to complete the assignments and how they are graded. Only assignments 1 to 6 and 8 are mandatory. These 7 mandatory assignments must be completed successfully to pass the course.
Completing the assignments successfully is one condition for receiving a course certificate. Further information about receiving a course certificate can be found here: https://learner.coursera.help/hc/en-us/articles/209819053-Get-a-Course-Certificate

About the supplementary material

The course can be followed by watching the videos. Studying the supplementary reading material (as mentioned in the course guide) is not strictly necessary, but it is recommended for further details and examples. Further, because some of the covered topics are quite technical (particularly topics in weeks 3 and 4 of the course), we provide several worked examples that supplement the videos by offering more specific illustrations and explanation. These worked examples can be found under Resources in the main menu.
Note that the supplementary readings are mostly not freely available. Books have to be bought or might be available in a university library; journal publications have to be ordered online or are accessible via a university license.
The textbook by Schneider and Wagemann (2012) functions as the primary reference for further information on the topics that are covered in the MOOC. Appendix A in the course guide mentions which chapters in that book can be consulted for which week of the course.
The publication by Schneider and Wagemann (2012) is comprehensive and detailed, and covers almost all topics discussed in the MOOC. However, for further study, appendix A in the course guide also mentions some additional supplementary literature.
Please find the full list of references for all citations (mentioned in this course guide, in the MOOC, and in the assignments) in appendix B of the course guide.

Fadi Hirzalla

Assistant Professor / Senior Lecturer

Erasmus Graduate School of Social Sciences (EGSH), Erasmus University Rotterdam

Five ways to ensure that models serve society: A manifesto

Saltelli, A., Bammer, G., Bruno, I., Charters, E., Di Fiore, M., Didier, E., Espeland, W. N., Kay, J., Lo Piano, S., Mayo, D., Pielke Jr, R., Portaluri, T., Porter, T. M., Puy, A., Rafols, I., Ravetz, J. R., Reinert, E., Sarewitz, D., Stark, P. B., … Vineis, P. (2020). Five ways to ensure that models serve society: A manifesto. Nature, 582(7813), 482–484. https://doi.org/10.1038/d41586-020-01812-9

The five ways:

1. Mind the assumptions
  - “One way to mitigate these issues is to perform global uncertainty and sensitivity analyses. In practice, that means allowing all that is uncertain — variables, mathematical relationships and boundary conditions — to vary simultaneously as runs of the model produce its range of predictions. This often reveals that the uncertainty in predictions is substantially larger than originally asserted”
2. Mind the hubris
  - “Most modellers are aware that there is a tradeoff between the usefulness of a model and the breadth it tries to capture. But many are seduced by the idea of adding complexity in an attempt to capture reality more accurately. As modellers incorporate more phenomena, a model might fit better to the training data, but at a cost. Its predictions typically become less accurate”
3. Mind the framing
  - “Match purpose and context. Results from models will at least partly reflect the interests, disciplinary orientations and biases of the developers. No one model can serve all purposes.”
4. Mind the consequences
  - “Quantification can backfire. Excessive regard for producing numbers can push a discipline away from being roughly right towards being precisely wrong. Undiscriminating use of statistical tests can substitute for sound judgement. By helping to make risky financial products seem safe, models contributed to derailing the global economy in 2007–08 (ref. 5).”
5. Mind the unknowns
  - “Acknowledge ignorance. For most of the history of Western philosophy, self-awareness of ignorance was considered a virtue, the worthy object of intellectual pursuit”

“Ignore the five, and model predictions become Trojan horses for unstated interests and values”

“Models’ assumptions and limitations must be appraised openly and honestly. Process and ethics matter as much as intellectual prowess”

“Mathematical models are a great way to explore questions. They are also a dangerous way to assert answers. Asking models for certainty or consensus is more a sign of the difficulties in making controversial decisions than it is a solution, and can invite ritualistic use of quantification”
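The first of the five ways, letting everything that is uncertain vary simultaneously rather than one input at a time, can be illustrated with a small Monte Carlo sketch. The toy model, its three inputs, their ranges and the baseline values below are all invented for illustration and are not taken from the manifesto. The sketch only compares the spread of predictions under one-at-a-time variation with the spread when all inputs vary together; a full global sensitivity analysis would also apportion the output variance among the inputs, which this sketch does not do.

```python
import random
import statistics


# Toy model (hypothetical): the output depends on three uncertain inputs,
# including an interaction between a and b.
def model(a, b, c):
    return a * b ** 2 + c


# Hypothetical uncertainty ranges (uniform) and baseline "best guess" values.
RANGES = {"a": (0.5, 1.5), "b": (1.0, 3.0), "c": (0.0, 1.0)}
BASELINE = {"a": 1.0, "b": 2.0, "c": 0.5}


def sample_outputs(vary, n=10_000, seed=42):
    """Run the model n times, drawing the inputs named in `vary` uniformly
    from their ranges and holding all other inputs at baseline values."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n):
        inputs = dict(BASELINE)
        for name in vary:
            low, high = RANGES[name]
            inputs[name] = rng.uniform(low, high)
        outputs.append(model(**inputs))
    return outputs


# One-at-a-time: vary each input alone, others fixed at baseline.
oat_spread = {name: statistics.pstdev(sample_outputs([name])) for name in RANGES}
# Global: vary all uncertain inputs simultaneously.
global_spread = statistics.pstdev(sample_outputs(list(RANGES)))

for name, sd in oat_spread.items():
    print(f"vary {name} only: sd = {sd:.2f}")
print(f"vary all together: sd = {global_spread:.2f}")
```

In this toy case the interaction between a and b means the spread of predictions when everything varies together is larger than the spread from any single input varied alone, echoing the manifesto’s point that uncertainty in predictions is often larger than originally asserted.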

Evaluating the Future

A blog posting and (summarising) podcast, produced for the EU Evaluation Support Services, by Rick Davies, June 2020

The podcast is available here, on the Capacity4Dev website

The blog posting full text is here as a pdf

- Limitations of common evaluative thinking
- Scenario planning
- Risk vs uncertainty
- Additional evaluation criteria
- Meaningful differences
- Other information sources

Story Completion exercises: An idea worth borrowing?

Yesterday, Theo Nabben, a friend and colleague of mine and an MSC trainer, sent me a link to a web page full of information about a method called Story Completion: https://www.psych.auckland.ac.nz/en/about/story-completion.html

Background

Story Completion is a qualitative research method first developed in the field of psychology but subsequently taken up primarily by feminist researchers. It was originally of interest as a method of enquiring about psychological meanings, particularly those that people could not or did not want to explicitly communicate. However, it was subsequently re-conceptualised as a valuable method of accessing and investigating social discourses. These two different perspectives have been described as essentialist versus social constructionist.

Story completion is a useful tool for accessing meaning-making around a particular topic of interest. It is particularly useful for exploring (dominant) assumptions about a topic. This type of research can be framed as exploring either perceptions and understandings or social/discursive constructions of a topic.

This 2019 paper by Clarke et al. provides a good overview and is my main source for the comments and explanations on this page.

How It Works

The researcher provides the participant with the beginning of the story, called the stem. Typically this is one sentence long but can be longer. For example…

“Catherine has decided that she needs to lose weight. Full of enthusiasm, and in order to prevent her from changing her mind, she is telling her friends in the pub about her plans and motivations.”

The participant is then asked by the researcher to extend that story, by explaining – usually in writing – what happens next. Typically this storyline is about a third person (e.g. a Catherine), not about the participant themselves.

In practice, this kind of enquiry can take various forms, as suggested by Figure 1 below.

Figure 1: Four different versions of a Story Completion inquiry

Analysis of responses can be done in two ways: (a) horizontally, comparing across respondents, or (b) vertically, looking at changes over time within the narratives.

Here is a good how-to-do-it introduction to Story Completion: http://blogs.brighton.ac.uk/sasspsychlab/2017/10/15/story-completion/

And here is an annotated bibliography that looks very useful: https://cdn.auckland.ac.nz/assets/psych/about/our-research/documents/Resources%20for%20qualitative%20story%20completion%20(July%202019).pdf

How it could be useful for monitoring and evaluation purposes

Story Completion exercises could be a good way of identifying different stakeholders’ views of the possible consequences of an intervention. Variations in the text of the story stem could allow exploration of consequences that might vary by gender or other social differences. Variations in the respondents being interviewed would allow exploration of differences in perspective on the consequences a specific intervention might have.

Of course, these responses will need interpretation and would benefit from further questioning. Participatory processes could be designed to enable this type of follow-up, rather than simply relying on third parties (e.g. researchers), however well informed they might be.

Variations could be developed where literacy is likely to be a problem. Voice recordings could be made instead, and small groups could be encouraged to collectively develop a response to the stem. There would seem to be plenty of room for creativity here.

Postscript

There is considerable overlap between the Story Completion method and how the ParEvo participatory scenario planning process works.

The commonality of the two methods is that they are both narrative-based. They both start with a story stem/seed designed by the researcher/facilitator. Then the respondent/participants add an extension onto that story stem describing what happens next. Both methods are future-orientated and largely other-orientated, in other words not about the storytellers themselves. And, once the narratives have been developed, both processes pay quite a lot of attention to how they can be analysed and compared.

Now for some key differences. With ParEvo, the process of narrative development involves multiple people rather than one person. This means multiple alternative storylines can develop, some of which die out, some of which continue, and some of which branch into multiple variants. The other difference, already implied, is that the ParEvo process goes through multiple iterations, whereas the Story Completion process has only one iteration. So in the case of ParEvo the storylines accumulate multiple segments of text, with a new segment added with each iteration. Content analysis can be carried out on the results of both Story Completion and ParEvo exercises. But in the case of ParEvo it is also possible to analyse the structure of people’s participation and how it relates to the contents of the storylines.
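The structural difference described above (storylines that branch, continue or die out over several iterations, with contributions from several participants) can be pictured as a tree of text segments. The Python sketch below is hypothetical: it is not ParEvo’s own implementation or data format, and the participant names, segment texts and the `Segment` class are invented for illustration. It shows how the total number of storylines, the storylines surviving to the final iteration, and the number of contributions per participant (one simple aspect of the structure of participation) could be computed.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One piece of a storyline: text added by one participant in one iteration."""
    author: str
    text: str
    iteration: int
    children: list[Segment] = field(default_factory=list)

    def extend(self, author: str, text: str) -> Segment:
        """Add a new segment continuing this one in the next iteration."""
        child = Segment(author, text, self.iteration + 1)
        self.children.append(child)
        return child


def storylines(seed: Segment) -> list[list[Segment]]:
    """Return every complete storyline (root-to-leaf path) as a list of segments."""
    if not seed.children:
        return [[seed]]
    return [[seed] + path for child in seed.children for path in storylines(child)]


def surviving(seed: Segment, final_iteration: int) -> list[list[Segment]]:
    """Storylines extended up to the final iteration; the others 'died out'."""
    return [path for path in storylines(seed) if path[-1].iteration == final_iteration]


def contributions(seed: Segment) -> Counter:
    """Count segments added by each participant (the facilitator's seed is excluded)."""
    counts = Counter()
    stack = list(seed.children)
    while stack:
        segment = stack.pop()
        counts[segment.author] += 1
        stack.extend(segment.children)
    return counts


# Hypothetical exercise: a facilitator's seed followed by three iterations.
seed = Segment("facilitator", "A drought hits the district...", 0)
s1 = seed.extend("Ana", "The village council meets...")
s2 = seed.extend("Ben", "Families start to migrate...")
s1a = s1.extend("Chen", "A water-sharing plan is agreed...")
s1b = s1.extend("Ben", "The plan collapses...")  # s1 branches into two variants
s1a.extend("Ana", "Neighbouring villages copy the plan...")
# s2 and s1b are not extended further, so those storylines die out.

print(len(storylines(seed)), "storylines in total")
print(len(surviving(seed, final_iteration=3)), "surviving to iteration 3")
print(contributions(seed))
```

Representing each segment with its author and iteration keeps the content (for content analysis) and the participation structure (who extended which storyline, and when) in the same object, so the two can be related to each other.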

Brian Castellani’s Map of the Complexity Sciences

I have limited tolerance for “complexity babble”, that is, people talking about complexity in abstract, ungrounded and, in effect, practically inconsequential terms, and in ways that give no acknowledgement to the surrounding history of ideas.

So, I really appreciate the work Brian has put into his “Map of the Complexity Sciences”, produced in 2018, and think it deserves wider circulation. Note that this is one of a number of iterations, and more iterations are likely in the future. Click on the image to go to a bigger copy.

And please note: in the bigger copy, clicking on any node will take you, via its hypertext link, to another web page providing detailed information about that concept or person. A lot of work has gone into the construction of this map, which deserves recognition.

Here is a discussion of an earlier iteration: https://www.theoryculturesociety.org/brian-castellani-on-the-complexity-sciences/

Process Tracing as a Practical Evaluation Method: Comparative Learning from Six Evaluations

By Alix Wadeson, Bernardo Monzani and Tom Aston
March 2020. pdf available here

Rick Davies comment: This is the most interesting and useful paper I have yet seen on process tracing and its use for evaluation purposes. A good mix of methodology discussion, practical examples and useful recommendations.
