Calling Bullshit: THE ART OF SKEPTICISM IN A DATA-DRIVEN WORLD

Reviews

Wired review article

Guardian review article

Forbes review article

Kirkus Review article

Podcast Interview with the authors here

ABOUT CALLING BULLSHIT (publisher blurb)
“Bullshit isn’t what it used to be. Now, two science professors give us the tools to dismantle misinformation and think clearly in a world of fake news and bad data.

Misinformation, disinformation, and fake news abound and it’s increasingly difficult to know what’s true. Our media environment has become hyperpartisan. Science is conducted by press release. Startup culture elevates bullshit to high art. We are fairly well equipped to spot the sort of old-school bullshit that is based in fancy rhetoric and weasel words, but most of us don’t feel qualified to challenge the avalanche of new-school bullshit presented in the language of math, science, or statistics. In Calling Bullshit, Professors Carl Bergstrom and Jevin West give us a set of powerful tools to cut through the most intimidating data.

You don’t need a lot of technical expertise to call out problems with data. Are the numbers or results too good or too dramatic to be true? Is the claim comparing like with like? Is it confirming your personal bias? Drawing on a deep well of expertise in statistics and computational biology, Bergstrom and West exuberantly unpack examples of selection bias and muddled data visualization, distinguish between correlation and causation, and examine the susceptibility of science to modern bullshit.

We have always needed people who call bullshit when necessary, whether within a circle of friends, a community of scholars, or the citizenry of a nation. Now that bullshit has evolved, we need to relearn the art of skepticism.”

Evaluation Failures: 22 Tales of Mistakes Made and Lessons Learned

Edited by Kylie Hutchinson (Community Solutions, Vancouver, Canada). Published by Sage, 2018. https://us.sagepub.com/en-us/nam/evaluation-failures/book260109

But $30 for a 184-page paperback is going to limit its appeal! The electronic version is similarly expensive, more like the cost of a hardback. Fortunately, two example chapters (1 and 8) are available as free pdfs, see below. Reading those two chapters makes me think the rest of the book would also be well worth reading. It is not often that you see anything written at length about evaluation failures. Perhaps we should set up an online confessional, where we can line up to anonymously confess our un/professional sins. I will certainly be one of those needing to join such a queue! :)

PART I. MANAGE THE EVALUATION
Chapter 2. The Scope Creep Train Wreck: How Responsive Evaluation Can Go Off the Rails
Chapter 3. The Buffalo Jump: Lessons After the Fall
Chapter 4. Evaluator Self-Evaluation: When Self-Flagellation Is Not Enough
PART II. ENGAGE STAKEHOLDERS
Chapter 5. That Alien Feeling: Engaging All Stakeholders in the Universe
Chapter 6. Seeds of Failure: How the Evaluation of a West African
Chapter 7. I Didn’t Know I Would Be a Tightrope Walker Someday: Balancing Evaluator Responsiveness and Independence
PART III. BUILD EVALUATION CAPACITY
Chapter 9. Stars in Our Eyes: What Happens When Things Are Too Good to Be True
PART IV. DESCRIBE THE PROGRAM
Chapter 10. A “Failed” Logic Model: How I Learned to Connect With All Stakeholders
Chapter 11. Lost Without You: A Lesson in System Mapping and Engaging Stakeholders
PART V. FOCUS THE EVALUATION DESIGN
Chapter 12. You Got to Know When to Hold ’Em: An Evaluation That Went From Bad to Worse
Chapter 13. The Evaluation From Hell: When Evaluators and Clients Don’t Quite Fit
PART VI. GATHER CREDIBLE EVIDENCE
Chapter 14. The Best Laid Plans of Mice and Evaluators: Dealing With Data Collection Surprises in the Field
Chapter 15. Are You My Amigo, or My Chero? The Importance of Cultural Competence in Data Collection and Evaluation
Chapter 16. OMG, Why Can’t We Get the Data? A Lesson in Managing Evaluation Expectations
Chapter 17. No, Actually, This Project Has to Stop Now: Learning When to Pull the Plug
Chapter 18. Missing in Action: How Assumptions, Language, History, and Soft Skills Influenced a Cross-Cultural Participatory Evaluation
PART VII. JUSTIFY CONCLUSIONS
Chapter 19. “This Is Highly Illogical”: How a Spock Evaluator Learns That Context and Mixed Methods Are Everything
Chapter 20. The Ripple That Became a Splash: The Importance of Context and Why I Now Do Data Parties
Chapter 21. The Voldemort Evaluation: How I Learned to Survive Organizational Dysfunction, Confusion, and Distrust
PART VIII. REPORT AND ENSURE USE
Chapter 22. The Only Way Out Is Through
Conclusion

Free Coursera online course: Qualitative Comparative Analysis (QCA)

Highly recommended! A well-organised, clear and systematic exposition. Available at: https://www.coursera.org/learn/qualitative-comparative-analysis

About this Course

Welcome to this massive open online course (MOOC) about Qualitative Comparative Analysis (QCA). Please read the points below before you start the course. This will help you prepare well for the course and follow it properly. It will also help you determine if the course offers the knowledge and skills you are looking for.

What can you do with QCA?

  • QCA is a comparative method that is mainly used in the social sciences for the assessment of cause-effect relations (i.e. causation). (A minimal illustrative sketch of the core truth-table step follows this list.)
  • QCA is relevant for researchers who normally work with qualitative methods and are looking for a more systematic way of comparing and assessing cases.
  • QCA is also useful for quantitative researchers who would like to assess alternative (more complex) aspects of causation, such as how factors work together in producing an effect.
  • QCA can be used for the analysis of cases at all levels: macro (e.g. countries), meso (e.g. organizations) and micro (e.g. individuals).
  • QCA is mostly used for research on small- and medium-sized samples and populations (10-100 cases), but it can also be used for larger groups. Ideally, the number of cases is at least 10.
  • QCA cannot be used if you are doing an in-depth study of a single case.
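As referenced in the first bullet above, here is a minimal, purely illustrative sketch (in Python) of the core step of a crisp-set QCA: building a truth table that groups cases by their configuration of binary conditions and scores how consistently each configuration is associated with the outcome. The cases, the conditions A, B, C, and the consistency threshold are invented for illustration; they are not taken from the course, and a real analysis would normally use dedicated QCA software rather than hand-rolled code.

```python
# Minimal sketch of the truth-table step of a crisp-set QCA.
# Cases, conditions and threshold are hypothetical, for illustration only.
from itertools import groupby

# Each case: binary condition values (A, B, C) and a binary outcome Y.
cases = [
    {"A": 1, "B": 1, "C": 0, "Y": 1},
    {"A": 1, "B": 1, "C": 0, "Y": 1},
    {"A": 1, "B": 0, "C": 1, "Y": 0},
    {"A": 0, "B": 1, "C": 1, "Y": 1},
    {"A": 0, "B": 0, "C": 0, "Y": 0},
]

conditions = ["A", "B", "C"]

def truth_table(cases, conditions, outcome="Y", threshold=0.8):
    """Group cases by their configuration of conditions and report, for each
    configuration, how consistently it is linked to the outcome."""
    key = lambda c: tuple(c[k] for k in conditions)
    rows = []
    for config, group in groupby(sorted(cases, key=key), key=key):
        group = list(group)
        consistency = sum(c[outcome] for c in group) / len(group)
        rows.append({
            "config": dict(zip(conditions, config)),
            "n_cases": len(group),
            "consistency": consistency,
            "outcome_present": consistency >= threshold,
        })
    return rows

for row in truth_table(cases, conditions):
    print(row)
```

In a full csQCA the configurations judged sufficient for the outcome would then be logically minimised (e.g. with the Quine-McCluskey algorithm) into a simpler solution formula; the sketch stops at the truth table.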

What will you learn in this course?

  • The course is designed for people who have no or little experience with QCA.
  • After the course you will understand the methodological foundations of QCA.
  • After the course you will know how to conduct a basic QCA study by yourself.

How is this course organized?

  • The MOOC takes five weeks. The specific learning objectives and activities per week are mentioned in appendix A of the course guide. Please find the course guide under Resources in the main menu.
  • The learning objectives with regard to understanding the foundations of QCA and practically conducting a QCA study are pursued throughout the course. However, week 1 focuses more on the general analytic foundations, and weeks 2 to 5 are more about the practical aspects of a QCA study.
  • The activities of the course include watching the videos, consulting supplementary material where necessary, and doing assignments. The activities should be done in that order: first watch the videos; then consult supplementary material (if desired) for more details and examples; then do the assignments.
  • There are 10 assignments. Appendix A in the course guide states the estimated time needed to complete the assignments and how the assignments are graded. Only assignments 1 to 6 and 8 are mandatory. These 7 mandatory assignments must be completed successfully to pass the course.
  • Completing the assignments successfully is one condition for receiving a course certificate. Further information about receiving a course certificate can be found here: https://learner.coursera.help/hc/en-us/articles/209819053-Get-a-Course-Certificate

About the supplementary material

  • The course can be followed by watching the videos. It is not absolutely necessary, but it is recommended, to study the supplementary reading material (as mentioned in the course guide) for further details and examples. Further, because some of the covered topics are quite technical (particularly topics in weeks 3 and 4 of the course), we provide several worked examples that supplement the videos by offering more specific illustrations and explanation. These worked examples can be found under Resources in the main menu.
  • Note that the supplementary readings are mostly not freely available. Books have to be bought or might be available in a university library; journal publications have to be ordered online or are accessible via a university license.
  • The textbook by Schneider and Wagemann (2012) functions as the primary reference for further information on the topics that are covered in the MOOC. Appendix A in the course guide mentions which chapters in that book can be consulted for which week of the course.
  • The publication by Schneider and Wagemann (2012) is comprehensive and detailed, and covers almost all topics discussed in the MOOC. However, for further study, appendix A in the course guide also mentions some additional supplementary literature.
  • Please find the full list of references for all citations (mentioned in this course guide, in the MOOC, and in the assignments) in appendix B of the course guide.

Five ways to ensure that models serve society: A manifesto

Saltelli, A., Bammer, G., Bruno, I., Charters, E., Fiore, M. D., Didier, E., Espeland, W. N., Kay, J., Lo Piano, S., Mayo, D., Pielke Jr., R., Portaluri, T., Porter, T. M., Puy, A., Rafols, I., Ravetz, J. R., Reinert, E., Sarewitz, D., Stark, P. B., … Vineis, P. (2020). Five ways to ensure that models serve society: A manifesto. Nature, 582(7813), 482–484. https://doi.org/10.1038/d41586-020-01812-9

The five ways:

    1. Mind the assumptions
      • “One way to mitigate these issues is to perform global uncertainty and sensitivity analyses. In practice, that means allowing all that is uncertain — variables, mathematical relationships and boundary conditions — to vary simultaneously as runs of the model produce its range of predictions. This often reveals that the uncertainty in predictions is substantially larger than originally asserted” (A toy illustration of this idea follows this list.)
    2. Mind the hubris
      • “Most modellers are aware that there is a tradeoff between the usefulness of a model and the breadth it tries to capture. But many are seduced by the idea of adding complexity in an attempt to capture reality more accurately. As modellers incorporate more phenomena, a model might fit better to the training data, but at a cost. Its predictions typically become less accurate”
    3. Mind the framing
      • “Match purpose and context. Results from models will at least partly reflect the interests, disciplinary orientations and biases of the developers. No one model can serve all purposes.”
    4. Mind the consequences
      • “Quantification can backfire. Excessive regard for producing numbers can push a discipline away from being roughly right towards being precisely wrong. Undiscriminating use of statistical tests can substitute for sound judgement. By helping to make risky financial products seem safe, models contributed to derailing the global economy in 2007–08 (ref. 5).”
    5. Mind the unknowns
      • “Acknowledge ignorance. For most of the history of Western philosophy, self-awareness of ignorance was considered a virtue, the worthy object of intellectual pursuit”
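As flagged under point 1, the sketch below shows, in Python, what a global uncertainty analysis can look like in its simplest form: every uncertain input is drawn afresh on each model run, all varying simultaneously, and the spread of the resulting predictions is summarised. The toy model, the input ranges and the reported percentiles are assumptions made purely for illustration; they do not come from the manifesto, and a real analysis would also use formal sensitivity measures (e.g. Sobol indices) to attribute the output uncertainty to individual inputs.

```python
# Minimal sketch of global uncertainty analysis: vary all uncertain inputs
# simultaneously across many runs and examine the spread of predictions.
# The toy model and input ranges below are hypothetical.
import random

def model(growth_rate, contact_rate, baseline):
    """A deliberately trivial toy model, not any model from the manifesto."""
    return baseline * (1 + growth_rate) * contact_rate

def global_uncertainty(n_runs=10_000, seed=42):
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_runs):
        # Draw every uncertain input at once, rather than one at a time.
        growth_rate = rng.uniform(0.01, 0.10)
        contact_rate = rng.uniform(5, 15)
        baseline = rng.uniform(80, 120)
        outputs.append(model(growth_rate, contact_rate, baseline))
    outputs.sort()
    return {
        "median": outputs[n_runs // 2],
        "5th percentile": outputs[int(0.05 * n_runs)],
        "95th percentile": outputs[int(0.95 * n_runs)],
    }

print(global_uncertainty())
```

Even in this toy case the 5th-95th percentile range is wide, which is the manifesto's point: letting all uncertainties vary together often reveals far more uncertainty in the predictions than a single "best estimate" run suggests.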

“Ignore the five, and model predictions become Trojan horses for unstated interests and values”

“Models’ assumptions and limitations must be appraised openly and honestly. Process and ethics matter as much as intellectual prowess”

“Mathematical models are a great way to explore questions. They are also a dangerous way to assert answers. Asking models for certainty or consensus is more a sign of the difficulties in making controversial decisions than it is a solution, and can invite ritualistic use of quantification”

Evaluating the Future

A blog posting and (summarising) podcast, produced for the EU Evaluation Support Services, by Rick Davies, June 2020

The podcast is available here, on the Capacity4Dev website

The full text of the blog posting is available here as a pdf. It covers:

    • Limitations of common evaluative thinking
    • Scenario planning
    • Risk vs uncertainty
    • Additional evaluation criteria
    • Meaningful differences
    • Other information sources

Process Tracing as a Practical Evaluation Method: Comparative Learning from Six Evaluations

By Alix Wadeson, Bernardo Monzani and Tom Aston
March 2020. pdf available here

Rick Davies comment: This is the most interesting and useful paper I have yet seen on process tracing and its use for evaluation purposes. It offers a good mix of methodology discussion, practical examples and useful recommendations.

The Power of Experiments: Decision Making in a Data-Driven World


By Michael Luca and Max H. Bazerman, March 2020. Published by MIT Press

How organizations—including Google, StubHub, Airbnb, and Facebook—learn from experiments in a data-driven world.

Abstract

Have you logged into Facebook recently? Searched for something on Google? Chosen a movie on Netflix? If so, you’ve probably been an unwitting participant in a variety of experiments—also known as randomized controlled trials—designed to test the impact of changes to an experience or product. Once an esoteric tool for academic research, the randomized controlled trial has gone mainstream—and is becoming an important part of the managerial toolkit. In The Power of Experiments: Decision Making in a Data-Driven World, Michael Luca and Max Bazerman explore the value of experiments and the ways in which they can improve organizational decisions. Drawing on real-world experiments and case studies, Luca and Bazerman show that going by gut is no longer enough—successful leaders need frameworks for moving between data and decisions. Experiments can save companies money—eBay, for example, discovered how to cut $50 million from its yearly advertising budget without losing customers. Experiments can also bring to light something previously ignored, as when Airbnb was forced to confront rampant discrimination by its hosts. The Power of Experiments introduces readers to the topic of experimentation and the managerial challenges that surround them. Looking at experiments in the tech sector and beyond, this book offers lessons and best practices for making the most of experiments.
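To make the idea of "moving between data and decisions" concrete, here is a minimal sketch of how the result of a simple two-arm randomized experiment (an A/B test) might be analysed: compare the conversion rates of the control and treatment groups and attach a two-sided p-value using a standard two-proportion z-test. The counts are hypothetical and are not taken from any experiment described in the book.

```python
# Minimal sketch of analysing a two-arm randomized experiment (A/B test)
# with a two-proportion z-test. The counts below are hypothetical.
from math import sqrt, erf

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Return the lift in conversion rate, the z statistic and a two-sided p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal approximation
    return p_b - p_a, z, p_value

# Hypothetical experiment: 10,000 users per arm, control converts 500, treatment 560.
diff, z, p = two_proportion_ztest(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
print(f"lift = {diff:.3%}, z = {z:.2f}, p = {p:.3f}")
```

The decision-making point the book stresses sits outside the arithmetic: whether a lift of this size, at this level of statistical confidence, is worth acting on is a managerial judgement, not something the test answers by itself.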

See also a World bank blog review by David McKenzie

The impact of impact evaluation

WIDER Working Paper 2020/20. Richard Manning, Ian Goldman, and Gonzalo Hernández Licona. PDF copy available

Abstract: In 2006 the Center for Global Development’s report ‘When Will We Ever Learn? Improving lives through impact evaluation’ bemoaned the lack of rigorous impact evaluations. The authors of the present paper researched international organizations and countries including Mexico, Colombia, South Africa, Uganda, and the Philippines to understand how impact evaluations and systematic reviews are being implemented and used, drawing out the emerging lessons. The number of impact evaluations has risen (to over 500 per year), as has the number of systematic reviews and other synthesis products, such as evidence maps. However, impact evaluations are too often donor-driven, and not embedded in partner governments. The willingness of politicians and top policymakers to take evidence seriously is variable, even in a single country, and the use of evidence is not tracked well enough. We need to see impact evaluations within a broader spectrum of tools available to support policymakers, ranging from evidence maps, rapid evaluations, and rapid synthesis work, to formative/process evaluations and classic impact evaluations and systematic reviews.

Selected quotes

4.1 Adoption of IEs

On the basis of our survey, we feel that real progress has been made since 2006 in the adoption of IEs to assess programmes and policies in LMICs. As shown above, this progress has not just been in terms of the number of IEs commissioned, but also in the topics covered, and in the development of a more flexible suite of IE products. There is also some evidence, though mainly anecdotal, that the insistence of the IE community on rigour has had some effect both in levering up the quality of other forms of evaluation and in gaining wider acceptance that ‘before and after’ evaluations with no valid control group tell one very little about the real impact of interventions. In some countries, such as South Africa, Mexico, and Colombia, institutional arrangements have favoured the use of evaluations, including IEs, although more uptake is needed.

There is also perhaps a clearer understanding of where IE techniques can or cannot usefully be applied, or combined with other types of evaluation.

At the same time, some limitations are evident. In the first place, despite the application of IE techniques to new areas, the field remains dominated by medical trials and interventions in the social sectors. Second, even in the health sector, other types of evaluation still account for the bulk of total evaluations, whether by donor agencies or LMIC governments.

Third, despite the increase in willingness of a few LMICs to finance and commission their own IEs, the majority of IEs on policies and programmes in such countries are still financed and commissioned by donor agencies, albeit in some cases with the topics defined by the countries, such as in 3ie’s policy windows. In quite a few cases, the prime objectives of such IEs are domestic accountability and/or learning within the donor agency. We believe that greater local ownership of IEs is highly desirable. While there is much that could not have been achieved without donor finance and commissioning, our sense is that—as with other forms of evaluation—a more balanced pattern of finance and commissioning is needed if IEs are to become a more accepted part of national evidence systems.

Fourth, the vast majority of IEs in LMICs appear to have ‘northern’ principal investigators. Undoubtedly, quality and rigour are essential to IEs, but it is important that IEs should not be perceived as a supply-driven product of a limited number of high-level academic departments in, for the most part, Anglo-Saxon universities, sometimes mediated through specialist consultancy firms. Fortunately, ‘southern’ capacity is increasing, and some programmes have made significant investments in developing this. We take the view that this progress needs to be ramped up very considerably in the interests of sustainability, local institutional development, and contributing over time to the local culture of evidence.

Fifth, as pointed out in Section 2.1, the financing of IEs depends to a troubling extent on a small body of official agencies and foundations that regard IEs as extremely important products. Major shifts in policy by even a few such agencies could radically reduce the number of IEs being financed.

Finally, while IEs of individual interventions are numerous and often valuable to the programmes concerned, IEs that transform thinking about policies or broad approaches to key issues of development are less evident. The natural tools for such results are more often synthesis products than one-off IEs, and to these we now turn.

4.2 Adoption of synthesis products (building body of evidence)

Systematic reviews and other meta-analyses depend on an adequate underpinning of well-structured IEs, although methodological innovation is now using a more diverse set of sources. The take-off of such products therefore followed the rise in the stock of IEs, and can be regarded as a further wave of the ‘evidence revolution’, as it has been described by Howard White (2019). Such products are increasingly necessary, as the evidence from individual IEs grows.

As with IEs, synthesis products have diversified from full systematic reviews to a more flexible suite of products. We noted examples from international agencies in Section 2.1 and to a lesser extent from countries in Section 3, but many more could be cited. In several cases, synthesis products seek to integrate evidence from quasi-experimental evaluations (e.g. J-PAL’s Policy Insights) or other high-quality research and evaluation evidence.

The need to understand what is now available and where the main gaps in knowledge exist has led in recent years to the burgeoning of evidence maps, pioneered by 3ie but now produced by a variety of institutions and countries. The example of the 500+ evaluations in Uganda cited earlier shows the range of evidence that already exists, which should be mapped and used before new evidence is sought. This should be a priority in all countries.

The popularity of evidence maps shows that there is now a real demand to ‘navigate’ the growing body of IE-based evidence in an efficient manner, as well as to understand the gaps that still exist. The innovation happening also in rapid synthesis shows the demand for synthesis products—but more synthesis is still needed in many sectors and, bearing in mind the expansion in IEs, should be increasingly possible.

A broken system – why literature searching needs a FAIR revolution

Gusenbauer, Michael, and Neal R. Haddaway. ‘Which Academic Search Systems Are Suitable for Systematic Reviews or Meta-Analyses? Evaluating Retrieval Qualities of Google Scholar, PubMed, and 26 Other Resources’. Research Synthesis Methods, 2019.

Haddaway, Neal, and Michael Gusenbauer. 2020. ‘A Broken System – Why Literature Searching Needs a FAIR Revolution’. LSE (blog). 3 February 2020.

“… searches on Google Scholar are neither reproducible, nor transparent. Repeated searches often retrieve different results and users cannot specify detailed search queries, leaving it to the system to interpret what the user wants.

However, systematic reviews in particular need to use rigorous, scientific methods in their quest for research evidence. Searches for articles must be as objective, reproducible and transparent as possible. With systems like Google Scholar, searches are not reproducible – a central tenet of the scientific method. 

Specifically, we believe there is a very real need to drastically overhaul how we discover research, driven by the same ethos as in the Open Science movement. The FAIR data principles offer an excellent set of criteria that search system providers can adapt to make their search systems more adequate for scientific search, not just for systematic searching, but also in day-to-day research discovery:

  • Findable: Databases should be transparent in how search queries are interpreted and in the way they select and rank relevant records. With this transparency researchers should be able to choose fit-for-purpose databases clearly based on their merits.
  • Accessible: Databases should be free-to-use for research discovery (detailed analysis or visualisation could require payment). This way researchers can access all knowledge available via search.
  • Interoperable: Search results should be readily exportable in bulk for integration into evidence synthesis and citation network analysis (similar to the concept of ‘research weaving’ proposed by Shinichi Nakagawa and colleagues). Standardised export formats help analysis across databases.
  • Reusable: Citation information (including abstracts) should not be restricted by copyright to permit reuse/publication of summaries/text analysis etc.”

Rick Davies comment: I highly recommend using Lens.org, a search facility mentioned in the second paper above.
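Prompted by the reproducibility point above, here is a minimal sketch of one pragmatic workaround that can be used today: record every search as a structured log entry (database, exact query string, filters, hit count, timestamp) so that it can be reported in full in a systematic review and re-run later. The fields, file format and example entry are assumptions chosen for illustration; this is not a published standard, nor an API of Lens.org or any other search system.

```python
# Minimal sketch of logging literature searches reproducibly: one JSON record
# per search, appended to a log file. Fields and example values are illustrative.
import json
from datetime import datetime, timezone

def log_search(path, database, query, filters, hits):
    """Append a structured record of a single search to a JSON-lines log."""
    entry = {
        "database": database,
        "query": query,        # the exact string as entered
        "filters": filters,    # e.g. date range, document type
        "hits": hits,          # number of records returned
        "searched_on": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

# Hypothetical usage: the query, filters and hit count are invented.
log_search(
    "search_log.jsonl",
    database="Lens.org",
    query='"impact evaluation" AND ("food security" OR agriculture)',
    filters={"years": "2010-2020", "type": "journal article"},
    hits=412,
)
```

A log like this does not make the underlying search system any more transparent, but it does make your own search strategy reportable and repeatable, which is the minimum a systematic review needs.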