The surprising usefulness of simple measures: HappyOrNot terminals

As described in this very readable article by David Owen:
Customer Satisfaction at the Push of a Button – HappyOrNot terminals look simple, but the information they gather is revelatory. New Yorker, 2 February 2018, pages 26-29

Read the full article here

Points of interest covered by the article include:

  1. What is so good about them?
  2. Why do they work so well?
  3. Can people “game” the data that is collected?
  4. The value of immediacy of data collection
  5. How value is added to data points by information about location and time
  6. Examples of real-life, large-scale applications
  7. What is the worst thing that could happen?

Other articles on the same subject:

Rick Davies comment: I like the design of the simple experiment described in the first paragraph of this article. Because the locations of the petrol stations were different, and thus not comparable, the managers swapped the “treatment” given to each station, i.e. the staff they thought were making a difference to the performance of these stations.
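
For readers who like to see the logic spelled out, here is a minimal sketch in Python of how such a swap can be read: if performance follows the staff rather than staying with the location, the staff are the more plausible driver. All figures are invented for illustration and are not from the article.

```python
# Hypothetical weekly sales figures for two petrol stations, before and after
# the managers swapped the staff teams between them.
performance = {
    "station_A": {"before_swap": 120, "after_swap": 95},   # started with team X
    "station_B": {"before_swap": 80,  "after_swap": 105},  # started with team Y
}

def change(station: str) -> int:
    p = performance[station]
    return p["after_swap"] - p["before_swap"]

delta_a, delta_b = change("station_A"), change("station_B")

# If station A falls while station B rises once the teams are exchanged,
# the effect has travelled with the staff, not the site.
if delta_a < 0 < delta_b:
    print("Performance followed the staff: team X appears to add the value.")
else:
    print("No clear evidence that the staff drive the difference.")
```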

Searching for Success: A Mixed Methods Approach to Identifying and Examining Positive Outliers in Development Outcomes

by Caryn Peiffer and Rosita Armytage, April 2018, Development Leadership Program Research Paper 52. Available as pdf

Summary: Increasingly, development scholars and practitioners are reaching for exceptional examples of positive change to better understand how developmental progress occurs. These are often referred to as ‘positive outliers’, but also ‘positive deviants’ and ‘pockets of effectiveness’.
Studies in this literature promise to identify and examine positive developmental change occurring in otherwise poorly governed states. However, to identify success stories, such research largely relies on cases’ reputations, and, by doing so, overlooks cases that have not yet garnered a reputation for their developmental progress.

This paper presents a novel three-stage methodology for identifying and examining positive outlier cases that does not rely solely on reputations. It therefore promises to uncover ‘hidden’ cases of developmental progress as well as those that have been recognised.

The utility of the methodology is demonstrated through its use in uncovering two country case studies in which surprising rates of bribery reduction occurred, though the methodology has much broader applicability. The advantage of the methodology is validated by the fact that, in both of the cases identified, the reductions in bribery that occurred were largely previously unrecognised.

Contents: 
Summary
Introduction 1
Literature review: How positive outliers are selected 2
Stage 1: Statistically identifying potential positive outliers in bribery reduction 3
Stage 2: Triangulating statistical data 6
Stage 3: In-country case study fieldwork 7
Promise realised: Uncovering hidden ‘positive outliers’ 8
Conclusion 9
References 11
Appendix: Excluded samples from pooled GCB dataset 13

Rick Davies comment: This is a paper that has been waiting to be published, one that unites a qual and quant approach to identifying AND understanding positive deviance / positive outliers [I do prefer the latter term, promoted by the authors of this paper].

The authors use regression analysis to identify statistical outliers, which is appropriate where numerical data is available. Where the data is binary/categorical, it is possible to use other methods to identify such outliers. See this page on the use of the EvalC3 Excel app to find positive outliers in binary data sets.
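
As a rough illustration of what that Stage 1 step can look like in practice, here is a minimal Python sketch (not the authors' code; the dataset, column names, and cut-off are all assumptions): fit a model of the outcome on plausible predictors, then flag the cases that do much better than the model predicts.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical country-level dataset: the file name and column names are
# illustrative assumptions, not the paper's actual data.
df = pd.read_csv("bribery_panel.csv")

# Model the outcome (reduction in reported bribery rates) on plausible predictors
X = sm.add_constant(df[["gdp_per_capita", "governance_index"]])
model = sm.OLS(df["bribery_reduction"], X).fit()

# Flag cases performing much better than the model predicts
df["residual"] = model.resid
cutoff = df["residual"].mean() + 2 * df["residual"].std()
candidates = df.loc[df["residual"] > cutoff, ["country", "residual"]]

print(candidates.sort_values("residual", ascending=False))
```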

“How to Publish Statistically Insignificant Results in Economics”

…being the title of a blog posting submitted by David Evans, 28 March 2018, on the World Bank’s Development Impact website here

…and reproduced in full here…


Sometimes, finding nothing at all can unlock the secrets of the universe. Consider this story from astronomy, recounted by Lily Zhao: “In 1823, Heinrich Wilhelm Olbers gazed up and wondered not about the stars, but about the darkness between them, asking why the sky is dark at night. If we assume a universe that is infinite, uniform and unchanging, then our line of sight should land on a star no matter where we look. For instance, imagine you are in a forest that stretches around you with no end. Then, in every direction you turn, you will eventually see a tree. Like trees in a never-ending forest, we should similarly be able to see stars in every direction, lighting up the night sky as bright as if it were day. The fact that we don’t indicates that the universe either is not infinite, is not uniform, or is somehow changing.”

What can “finding nothing” – statistically insignificant results – tell us in economics? In his breezy personal essay, MIT economist Alberto Abadie makes the case that statistically insignificant results are at least as interesting as significant ones. You can see excerpts of his piece below.

In case it’s not obvious from the above, one of Abadie’s key points (in a deeply reductive nutshell) is that results are interesting if they change what we believe (or “update our priors”). With most public policy interventions, there is no reason that the expected impact would be zero. So there is no reason that the only finding that should change our beliefs is a non-zero finding.
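
To see why in miniature, consider a toy Bayesian update (the numbers are invented, not from Abadie's paper): a precise but statistically insignificant estimate can still move a prior belief a long way.

```python
# A toy normal-normal Bayesian update (all numbers invented): if your prior belief
# is that the programme has a clearly positive effect, a precisely estimated
# "insignificant" result pulls that belief strongly towards zero - in other words,
# it is informative even though it is not significant.
prior_mean, prior_var = 2.0, 1.0        # prior: effect of about +2, fairly uncertain
data_estimate, data_var = 0.1, 0.25     # study: near-zero estimate, quite precise

posterior_var = 1 / (1 / prior_var + 1 / data_var)
posterior_mean = posterior_var * (prior_mean / prior_var + data_estimate / data_var)

print(f"Prior mean effect:     {prior_mean:.2f}")
print(f"Posterior mean effect: {posterior_mean:.2f}")   # ~0.48: belief has shifted a lot
```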

Indeed, a quick review of popular papers (crowdsourced from Twitter) with key results that are statistically insignificantly different from zero showed that the vast majority reported an insignificant result in a context where many readers would expect a positive result.
For example…

  • You think wealth improves health? Not so fast! (Cesarini et al., QJE, 2016)
  • Okay, if wealth doesn’t affect health, maybe you think that education reduces mortality? Nuh-uh! (Meghir, Palme, & Simeonova, AEJ: Applied, 2018)
  • You think going to an elite school improves your test scores? Not! (Abdulkadiroglu, Angrist, & Pathak, Econometrica, 2014)
  • Do you still think going to an elite school improves your test scores, but only in Kenya? No way! (Lucas & Mbiti, AEJ: Applied, 2014)
  • You think increasing teacher salaries will increase student learning? Nice try! (de Ree et al., QJE, 2017)
  • You believe all the hype about microcredit and poverty? Think again! (Banerjee et al., AEJ: Applied, 2015)

and even

  • You think people born on Friday the 13th are unlucky? Think again! (Cesarini et al., Kyklos, 2015)

It also doesn’t hurt if people’s expectations are fomented by active political debate.

  • Do you believe that cutting taxes on individual dividends will increase corporate investment? Better luck next time! (Yagan, AER, 2015)
  • Do you believe that Mexican migrant agricultural laborers drive down wages for U.S. workers? We think not! (Clemens, Lewis, & Postel, AER, forthcoming)
  • Okay, maybe not the Mexicans. But what about Cuban immigrants? Nope! (Card, Industrial and Labor Relations Review, 1990)

In cases where you wouldn’t expect readers to have a strong prior, papers sometimes play up a methodological angle.

  • Do you believe that funding community projects in Sierra Leone will improve community institutions? No strong feelings? It didn’t. But we had a pre-analysis plan which proves we aren’t cherry picking among a thousand outcomes, like some other paper on this topic might do. (Casey, Glennerster, & Miguel, QJE, 2012)
  • Do you think that putting flipcharts in schools in Kenya improves student learning? What, you don’t really have an opinion about that? Well, they don’t. And we provide a nice demonstration that a prospective randomized-controlled trial can totally flip the results of a retrospective analysis. (Glewwe et al., JDE, 2004)

Sometimes, when reporting a statistically insignificant result, authors take special care to highlight what they can rule out.

  • “We find no evidence that wealth impacts mortality or health care utilization… Our estimates allow us to rule out effects on 10-year mortality one sixth as large as the cross-sectional wealth-mortality gradient.” In other words, we can rule out even a pretty small effect. “The effects on most other child outcomes, including drug consumption, scholastic performance, and skills, can usually be bounded to a tight interval around zero.” (Cesarini et al., QJE, 2016)
  • “We estimate insignificant effects of the [Swedish education] reform [that increased years of compulsory schooling] on mortality in the affected cohort. From the confidence intervals, we can rule out effects larger than 1–1.4 months of increased life expectancy.” (Meghir, Palme, & Simeonova, AEJ: Applied, 2018)
  • “We can rule out even modest positive impacts on test scores.” (de Ree et al., QJE, 2017)
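
The arithmetic behind this "what can we rule out?" reasoning is simple, and a small sketch may help (the estimate and standard error below are invented, not taken from any of the papers): an estimate can be statistically insignificant and yet its confidence interval still excludes all effects above a given size.

```python
import scipy.stats as st

estimate = 0.5          # hypothetical estimated effect (e.g. test-score points)
std_error = 1.0         # its standard error

z = st.norm.ppf(0.975)  # ~1.96 for a 95% confidence interval
lower, upper = estimate - z * std_error, estimate + z * std_error

print(f"95% CI: [{lower:.2f}, {upper:.2f}]")                  # [-1.46, 2.46]
print("Statistically insignificant at 5%:", lower < 0 < upper)

# The result is insignificant, yet any true effect larger than `upper`
# (or more negative than `lower`) is ruled out at the 95% level.
print(f"Effects above {upper:.2f} can be ruled out.")
```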

Of course, not all insignificant results are created equal. In the design of a research project, data that illuminates what kind of statistically insignificant result you have can help. Consider five (non-exhaustive) potential reasons for an insignificant result proposed by Glewwe and Muralidharan (and summarized in my blog post on their paper, which I adapt below).

  1. The intervention doesn’t work. (This is the easiest conclusion, but it’s often the wrong one.)
  2. The intervention was implemented poorly. Textbooks in Sierra Leone made it to schools but never got distributed to students (Sabarwal et al. 2014).
  3. The intervention led to substitution away from program inputs by other actors. School grants in India lost their impact in the second year when households lowered their education spending to compensate (Das et al. 2013).
  4. The intervention works for some participants, but it doesn’t alleviate a binding constraint for the average participant. English language textbooks in rural Kenya only benefitted the top students, who were the only ones who could read them (Glewwe et al. 2009).
  5. The intervention will only work with complementary interventions. School grants in Tanzania only worked when complemented with teacher performance pay (Mbiti et al. 2014).

Here are two papers that – just in the abstract – demonstrate detective work to understand what’s going on behind their insignificant results.

For example #1, in Atkin et al. (QJE, 2017), few soccer ball producing firms in Pakistan take up a technology that reduces waste. Why?

“We hypothesize that an important reason for the lack of adoption is a misalignment of incentives within firms: the key employees (cutters and printers) are typically paid piece rates, with no incentive to reduce waste, and the new technology slows them down, at least initially. Fearing reductions in their effective wage, employees resist adoption in various ways, including by misinforming owners about the value of the technology.”

And then, they implemented a second experiment to test the hypothesis.

“To investigate this hypothesis, we implemented a second experiment among the firms that originally received the technology: we offered one cutter and one printer per firm a lump-sum payment, approximately a month’s earnings, conditional on demonstrating competence in using the technology in the presence of the owner. This incentive payment, small from the point of view of the firm, had a significant positive effect on adoption.”

Wow! You thought we had a null result, but by the end of the abstract, we produced a statistically significant result!

For example #2, Michalopoulos and Papaioannou (QJE, 2014) can’t run a follow-up experiment because they’re looking at the partition of African ethnic groups by political boundaries imposed half a century ago. “We show that differences in countrywide institutional structures across the national border do not explain within-ethnicity differences in economic performance.” What? Do institutions not matter? Need we rethink everything we learned from Why Nations Fail? Oh ho, the “average noneffect…masks considerable heterogeneity.” This is a version of Reason 4 from Glewwe and Muralidharan above.

These papers remind us that economists need to be detectives as well as plumbers, especially in the context of insignificant results.

Towards the end of the paper that began this post, Abadie writes that “we advocate a visible reporting and discussion of non-significant results in empirical practice.” I agree. Non-significant results can change our minds. They can teach us. But authors have to do the work to show readers what they should learn. And editors and reviewers need to be open to it.

What else can you read about this topic?


 

The Politics of Evidence: From evidence-based policy to the good governance of evidence

by Justin Parkhurst © 2017 – Routledge,
Available as a pdf, readable online, and in hardback or paperback

“There has been an enormous increase in interest in the use of evidence for public policymaking, but the vast majority of work on the subject has failed to engage with the political nature of decision making and how this influences the ways in which evidence will be used (or misused) within political areas. This book provides new insights into the nature of political bias with regards to evidence and critically considers what an ‘improved’ use of evidence would look like from a policymaking perspective”

“Part I describes the great potential for evidence to help achieve social goals, as well as the challenges raised by the political nature of policymaking. It explores the concern of evidence advocates that political interests drive the misuse or manipulation of evidence, as well as counter-concerns of critical policy scholars about how appeals to ‘evidence-based policy’ can depoliticise political debates. Both concerns reflect forms of bias – the first representing technical bias, whereby evidence use violates principles of scientific best practice, and the second representing issue bias in how appeals to evidence can shift political debates to particular questions or marginalise policy-relevant social concerns”

“Part II then draws on the fields of policy studies and cognitive psychology to understand the origins and mechanisms of both forms of bias in relation to political interests and values. It illustrates how such biases are not only common, but can be much more predictable once we recognise their origins and manifestations in policy arenas”

“Finally, Part III discusses ways to move forward for those seeking to improve the use of evidence in public policymaking. It explores what constitutes ‘good evidence for policy’, as well as the ‘good use of evidence’ within policy processes, and considers how to build evidence-advisory institutions that embed key principles of both scientific good practice and democratic representation. Taken as a whole, the approach promoted is termed the ‘good governance of evidence’ – a concept that represents the use of rigorous, systematic and technically valid pieces of evidence within decision-making processes that are representative of, and accountable to, populations served”

Contents
Part I: Evidence-based policymaking – opportunities and challenges
Chapter 1. Introduction
Chapter 2. Evidence-based policymaking – an important first step, and the need to take the next
Part II: The politics of evidence
Chapter 3. Bias and the politics of evidence
Chapter 4. The overt politics of evidence – bias and the pursuit of political interests
Chapter 5. The subtle politics of evidence – the cognitive-political origins of bias
Part III: Towards the good governance of evidence
Chapter 6. What is ‘good evidence for policy’? From hierarchies to appropriate evidence.
Chapter 7. What is the ‘good use of evidence’ for policy?
Chapter 8. From evidence-based policy to the good governance of evidence

Wiki Surveys: Open and Quantifiable Social Data Collection

by Matthew J. Salganik and Karen E. C. Levy, PLOS ONE
Published: May 20, 2015 https://doi.org/10.1371/journal.pone.0123483

Abstract: In the social sciences, there is a longstanding tension between data collection methods that facilitate quantification and those that are open to unanticipated information. Advances in technology now enable new, hybrid methods that combine some of the benefits of both approaches. Drawing inspiration from online information aggregation systems like Wikipedia and from traditional survey research, we propose a new class of research instruments called wiki surveys. Just as Wikipedia evolves over time based on contributions from participants, we envision an evolving survey driven by contributions from respondents. We develop three general principles that underlie wiki surveys: they should be greedy, collaborative, and adaptive. Building on these principles, we develop methods for data collection and data analysis for one type of wiki survey, a pairwise wiki survey. Using two proof-of-concept case studies involving our free and open-source website www.allourideas.org, we show that pairwise wiki surveys can yield insights that would be difficult to obtain with other methods.

Also explained in detail in this Vimeo video: https://vimeo.com/51369546
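
To make the "pairwise" idea concrete, here is a deliberately simplified sketch in Python: respondents repeatedly choose between two ideas, and each idea is scored by how often it wins its match-ups. This simple win-rate tally is not the estimation method used in the paper (the authors use a more sophisticated statistical model); the votes below are invented.

```python
from collections import defaultdict

votes = [  # (winning idea, losing idea), one tuple per respondent choice
    ("more bike lanes", "free wifi in parks"),
    ("more bike lanes", "later library hours"),
    ("later library hours", "free wifi in parks"),
]

wins, appearances = defaultdict(int), defaultdict(int)
for winner, loser in votes:
    wins[winner] += 1
    appearances[winner] += 1
    appearances[loser] += 1

# Score each idea by the share of its match-ups that it won
scores = {idea: wins[idea] / appearances[idea] for idea in appearances}
for idea, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{idea}: won {score:.0%} of its match-ups")
```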

OPM’s approach to assessing Value for Money

by Julian King, Oxford Policy Management. January 2018. Available as pdf

Excerpt from Foreword:

In 2016, Oxford Policy Management (OPM) teamed up with Julian King, an evaluation specialist, who worked with staff from across the company to develop the basis of a robust and distinct OPM approach to assessing VfM. The methodology was successfully piloted during the annual reviews of the Department for International Development’s (DFID) Sub-National Governance programme in Pakistan and MUVA, a women’s economic empowerment programme in Mozambique. The approach involves making transparent, evidence-based judgements about how well resources are being used, and whether the value derived is good enough to justify the investment.

To date, we have applied this approach on upwards of a dozen different development projects and programmes, spanning a range of clients, countries, sectors, and budgets. It has been well received by our clients (both funding agencies and partner governments) and project teams alike, who in particular appreciate the use of explicit evaluative reasoning. This involves developing definitions of what acceptable / good / excellent VfM looks like, in the context of each specific project. Critically, these definitions are co-developed and endorsed upfront, in advance of implementation and before the evidence is gathered, which provides an agreed, objective, and transparent basis for making judgements.
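
A toy sketch in Python may help convey the "agree the standards first, judge the evidence later" logic described above. The criterion, thresholds, and figures are invented for illustration only; they are not OPM's actual rubrics.

```python
# Rating standards agreed with stakeholders BEFORE evidence is gathered.
# (The criterion and thresholds here are hypothetical.)
RUBRIC = {
    "excellent":  lambda cost_per_outcome: cost_per_outcome <= 80,
    "good":       lambda cost_per_outcome: cost_per_outcome <= 100,
    "acceptable": lambda cost_per_outcome: cost_per_outcome <= 130,
}

def judge(cost_per_outcome: float) -> str:
    """Return the highest agreed standard that the evidence meets."""
    for rating in ("excellent", "good", "acceptable"):
        if RUBRIC[rating](cost_per_outcome):
            return rating
    return "poor"

# Evidence gathered later (hypothetical): the programme delivered its
# headline outcome at a cost of 95 units per beneficiary.
print(judge(95))   # -> "good"
```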

Table of contents
Foreword 1
Acknowledgements 3
Executive summary 4
1 Background 7
1.1 What is VfM? 7
1.2 Why evaluate VfM? 7
1.3 Context 8
1.4 Overview of this guide 8
2 Conceptual framework for VfM assessment 9
2.1 Explicit evaluative reasoning 9
2.2 VfM criteria 10
2.3 DFID’s VfM criteria: the Four E’s 10
2.4 Limitations of the Four E’s 12
2.5 Defining criteria and standards for the Four E’s 13
2.6 Mixed methods evidence 15
2.7 Economic analysis 15
2.8 Complexity and emergent strategy 17
2.9 Perspective matters in VfM 19
2.10 Integration with M&E frameworks 19
2.11 Timing of VfM assessments 20
2.12 Level of effort required 20
3 Designing and implementing a VfM framework 21
3.1 Step 1: theory of change 21
3.2 Steps 2 and 3: VfM criteria and standards 22
3.3 Step 4: identifying evidence required 26
3.4 Step 5: gathering evidence 26
3.5 Steps 6–7: analysis, synthesis, and judgements 27
3.6 Step 8: reporting 29
Bibliography 30
Contact us

Review by Dr E. Jane Davidson, author of Evaluation Methodology Basics (Sage, 2005) and Director of Real Evaluation LLC, Seattle

Finally, an approach to Value for Money that breaks free of the “here’s the formula” approach and instead emphasises the importance of thoughtful and well-evidenced evaluative reasoning. Combining an equity lens with insights and perspectives from diverse stakeholders helps us understand the value of different constellations of outcomes relative to the efforts and investments required to achieve them. This step-by-step guide helps decision makers figure out how to answer the VfM question in an intelligent way when some of the most valuable outcomes may be the hardest to measure – as they so often are.

 

Bit by Bit: Social Research in the Digital Age

by Matthew J. Salganik, Princeton University Press, 2017

Very positive reviews by…

Selected quotes:

“Overall, the book relies on a repeated narrative device, imagining how a social scientist and a data scientist might approach the same research opportunity. Salganik suggests that where data scientists are glass-half-full people and see opportunities, social scientists are quicker to highlight problems (the glass-half-empty camp). He is also upfront about how he has chosen to write the book, adopting the more optimistic view of the data scientist, while holding on to the caution expressed by social scientists”

“Salganik argues that data scientists most often work with “readymades”, social scientists with “custommades”, illustrating the point through art: data scientists are more like Marcel Duchamp, using existing objects to make art; meanwhile, social scientists operate in the custom-made style of Michelangelo, which offers a neat fit between research questions and data, but does not scale well. The book is thus a call to arms, to encourage more interdisciplinary research and for both sides to see the potential merits and drawbacks of each approach. It will be particularly welcome to researchers who have already started to think along similar lines, of which I suspect there are many”

  • Illustrates important ideas with examples of outstanding research
  • Combines ideas from social science and data science in an accessible style and without jargon
  • Goes beyond the analysis of “found” data to discuss the collection of “designed” data such as surveys, experiments, and mass collaboration
  • Features an entire chapter on ethics
  • Includes extensive suggestions for further reading and activities for the classroom or self-study

Matthew J. Salganik is professor of sociology at Princeton University, where he is also affiliated with the Center for Information Technology Policy and the Center for Statistics and Machine Learning. His research has been funded by Microsoft, Facebook, and Google, and has been featured on NPR and in such publications as the New Yorker, the New York Times, and the Wall Street Journal.

Contents

Preface
1 Introduction
2 Observing Behavior
3 Asking Questions
4 Running Experiments
5 Creating Mass Collaboration
6 Ethics
7 The Future
Acknowledgments
References
Index

More detailed contents page available via Amazon Look Inside

PS: See also this Vimeo video presentation by Salganik: Wiki Surveys – Open and Quantifiable Social Data Collection plus this PLOS paper on the same topic.

 

Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor.

Virginia Eubanks, (2018), New York, NY: St. Martin’s Press

Unfortunately, a contents list does not seem to be available online. But here is a lengthy excerpt from the book.

And here is a YouTube interview with the author, in which University at Albany political scientist Virginia Eubanks discusses her new book “Automating Inequality: How High Tech Tools Profile, Police, and Punish the Poor” (taped 12/05/2017).

 

Impact Evaluation of Development Interventions: A Practical Guide

by Howard White and David A. Raitzer. Published by the Asian Development Bank, 2017. Available as a pdf (3.12Mb)

The publisher says “This book offers guidance on the principles, methods, and practice of impact evaluation. It contains material for a range of audiences, from those who may use or manage impact evaluations to applied researchers”

“Impact evaluation is an empirical approach to estimating the causal effects of interventions, in terms of both magnitude and statistical significance. Expanded use of impact evaluation techniques is critical to rigorously derive knowledge from development operations and for development investments and policies to become more evidence-based and effective. To help backstop more use of impact evaluation approaches, this book introduces core concepts, methods, and considerations for planning, designing, managing, and implementing impact evaluation, supplemented by examples. The topics covered range from impact evaluation purposes to basic principles, specific methodologies, and guidance on field implementation. It has materials for a range of audiences, from those who are interested in understanding evidence on “what works” in development, to those who will contribute to expanding the evidence base as applied researchers.”
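
For readers new to the field, the core idea of estimating a causal effect "in terms of both magnitude and statistical significance" can be sketched very simply for a randomized design. This is not code from the book; the data below are simulated for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=50, scale=10, size=500)        # outcomes without the programme
treatment = rng.normal(loc=53, scale=10, size=500)      # outcomes with the programme

effect = treatment.mean() - control.mean()              # magnitude of the impact
t_stat, p_value = stats.ttest_ind(treatment, control)   # statistical significance

print(f"Estimated impact: {effect:.2f} points (p = {p_value:.3f})")
```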

Contents 

  • Introduction: Impact Evaluation for Evidence-Based Development
  • Using Theories of Change to Identify Impact Evaluation Questions
  • The Core Concepts of Impact Evaluation
  • Randomized Controlled Trials
  • Nonexperimental Designs
  • What and How to Measure: Data Collection for Impact Evaluation
  • Sample Size Determination for Data Collection
  • Managing the Impact Evaluation Process
  • Appendixes

Rick Davies’ comments: I have only scanned, not read, this book. But some of the sections that I found of interest included:

  • 3.4 Time Dimension of Impacts…not always covered, but very important when planning the timing of evaluations of any kind
  • Page 2: “Impact evaluations are empirical studies that quantify the causal effects of interventions on outcomes of interest”. I am surprised that the word “explain” is not also included in this definition. Or perhaps it is an intentionally minimalist definition, and omission does not mean it has to be ignored
  • Page 23 on the Funnel of Attribution, which I would like to see presented in the form of overlapping sets
  • There could be better acknowledgment, by referencing, of other sources, e.g. Outcome Mapping (p25, re behavioral change) and Realist Evaluation (p41)
  • Good explanations of the technical terms used, on pages 42 and 44 for example
  • Overcoming resistance to RCTs (p59) and 10 things that can go wrong with RCTs (p61)
  • The whole of chapter 6 on data collection
  • and lots more…

The Tyranny of Metrics

The Tyranny of Metrics, by Jerry Z. Muller, Princeton University Press, RRP £19.95 / $24.95, 240 pages

See Tim Harford’s review of this book in the Financial Times, 24 January 2018

Some quotes: Muller shows that metrics are often used as a substitute for relevant experience, by managers with generic rather than specific expertise. Muller does not claim that metrics are always useless, but that we expect too much from them as a tool of management. …

The Tyranny of Metrics does us a service in briskly pulling together parallel arguments from economics, management science, philosophy and psychology along with examples from education, policing, medicine, business and the military.

In an excellent final chapter, Muller summarises his argument thus: “measurement is not an alternative to judgement: measurement demands judgement: judgement about whether to measure, what to measure, how to evaluate the significance of what’s been measured, whether rewards and penalties will be attached to the results, and to whom to make the measurements available”.

The book does not engage seriously enough with the possibility that the advantages of metric-driven accountability might outweigh the undoubted downsides. Tellingly, Muller complains of a university ratings metric that rewards high graduation rates, access for disadvantaged students, and low costs. He says these requirements are “mutually exclusive”, but they are not. They are in tension with each other.

Nor does this book reckon with evidence that mechanical statistical predictions often beat the subjective judgment of experts.

…and perhaps most curiously, there is no discussion of computers, cheap sensors, or big data. In this respect, at least, the book could have been written in the 1980s.

Table of Contents

Introduction 1
I THE ARGUMENT
1 The Argument in a Nutshell 17
2 Recurring Flaws 23
II THE BACKGROUND
3 The Origins of Measuring and Paying for Performance 29
4 Why Metrics Became So Popular 39
5 Principals, Agents, and Motivation 49
6 Philosophical Critiques 59
III THE MISMEASURE OF ALL THINGS? Case Studies
7 Colleges and Universities 67
8 Schools 89
9 Medicine 103
10 Policing 125
11 The Military 131
12 Business and Finance 137
13 Philanthropy and Foreign Aid 153
EXCURSUS
14 When Transparency Is the Enemy of Performance: Politics, Diplomacy, Intelligence, and Marriage 159
IV CONCLUSIONS
15 Unintended but Predictable Negative Consequences 169
16 When and How to Use Metrics: A Checklist 175
Acknowledgments 185
Notes 189
Index 213

Search inside this book using a Google Books view
