The Tyranny of Metrics

The Tyranny of Metrics, by Jerry Z. Muller, Princeton University Press, RRP £19.95 / $24.95, 240 pages

See Tim Harford’s review of this book in the Financial Times, 24 January 2018

Some quotes: Muller shows that metrics are often used as a substitute for relevant experience, by managers with generic rather than specific expertise. Muller does not claim that metrics are always useless, but that we expect too much from them as a tool of management. ….

The Tyranny of Metrics does us a service in briskly pulling together parallel arguments from economics, management science, philosophy and psychology along with examples from education, policing, medicine, business and the military.

 In an excellent final chapter, Muller summarises his argument thus: “measurement is not an alternative to judgement: measurement demands judgement: judgement about whether to measure, what to measure, how to evaluate the significance of what’s been measured, whether rewards and penalties will be attached to the results, and to whom to make the measurements available”. 

The book does not engage seriously enough with the possibility that the advantages of metric-driven accountability might outweigh the undoubted downsides. Tellingly, Muller complains of a university ratings metric that rewards high graduation rates, access for disadvantaged students, and low costs. He says these requirements are “mutually exclusive”, but they are not. They are in tension with each other.

Nor does this book reckon with evidence that mechanical statistical predictions often beat the subjective judgment of experts.

…and perhaps most curiously, there is no discussion of computers, cheap sensors, or big data. In this respect, at least, the book could have been written in the 1980s.

Table of Contents

Introduction 1
I THE ARGUMENT
1 The Argument in a Nutshell 17
2 Recurring Flaws 23
II THE BACKGROUND
3 The Origins of Measuring and Paying for Performance 29
4 Why Metrics Became So Popular 39
5 Principals, Agents, and Motivation 49
6 Philosophical Critiques 59
III THE MISMEASURE OF ALL THINGS? Case Studies
7 Colleges and Universities 67
8 Schools 89
9 Medicine 103
10 Policing 125
11 The Military 131
12 Business and Finance 137
13 Philanthropy and Foreign Aid 153
EXCURSUS
14 When Transparency Is the Enemy of Performance: Politics, Diplomacy, Intelligence, and Marriage 159
IV CONCLUSIONS
15 Unintended but Predictable Negative Consequences 169
16 When and How to Use Metrics: A Checklist 175
Acknowledgments 185
Notes 189
Index 213

Search inside this book using a Google Books view

Analyzing Social Networks

Second edition, to be published by Sage in January 2018
Stephen P. Borgatti – University of Kentucky, USA
Martin G. Everett – Manchester University, UK
Jeffrey C. Johnson – University of Florida, USA

Publisher’s blurb: “Designed to walk beginners through core aspects of collecting, visualizing, analyzing, and interpreting social network data, this book will get you up-to-speed on the theory and skills you need to conduct social network analysis. Using simple language and equations, the authors provide expert, clear insight into every step of the research process—including basic maths principles—without making assumptions about what you know. With a particular focus on NetDraw and UCINET, the book introduces relevant software tools step-by-step in an easy-to-follow way.

In addition to the fundamentals of network analysis and the research process, this new Second Edition focuses on:

  • Digital data and social networks like Twitter
  • Statistical models to use in SNA, like QAP and ERGM
  • The structure and centrality of networks
  • Methods for cohesive subgroups/community detection
Supported by new chapter exercises, a glossary, and a fully updated companion website, this text is the perfect student-friendly introduction to social network analysis.”

Detailed contents list here

 

The Ethics of Influence: Government in the Age of Behavioral Science

by Cass R. Sunstein, Cambridge University Press, 2016

Contents:

1. The age of behavioral science
2. Choice and its architecture
3. ‘As judged by themselves’
4. Values
5. Fifty shades of manipulation
6. Do people like nudges? Empirical findings
7. Green by default? Ethical challenges for environmental protection
8. Mandates – a very brief recapitulation
Appendix A. American attitudes toward thirty-four nudges
Appendix B. Survey questions
Appendix C. Executive Order 13707: using behavioral science insights to better serve the American people

Amazon blurb: “In recent years, ‘nudge units’ or ‘behavioral insights teams’ have been created in the United States, the United Kingdom, Germany, and other nations. All over the world, public officials are using the behavioral sciences to protect the environment, promote employment and economic growth, reduce poverty, and increase national security. In this book, Cass R. Sunstein, the eminent legal scholar and best-selling co-author of Nudge (2008), breaks new ground with a deep yet highly readable investigation into the ethical issues surrounding nudges, choice architecture, and mandates, addressing such issues as welfare, autonomy, self-government, dignity, manipulation, and the constraints and responsibilities of an ethical state. Complementing the ethical discussion, The Ethics of Influence: Government in the Age of Behavioral Science contains a wealth of new data on people’s attitudes towards a broad range of nudges, choice architecture, and mandates.”

Book Review by Roger Frantz (pdf)

Norms in the Wild: How to Diagnose, Measure, and Change Social Norms

Cristina Bicchieri, Oxford University Press, 2016. View Table of Contents

Publisher summary:

  1. Presents evidence-based tools for assessing and intervening on various social behaviors
  2. Illustrates the role of mass media and autonomous “first movers” at the forefront of wide-scale behavioral change
  3. Provides dichotomous models for assessing normative behaviors
  4. Explains why well-tested interventions sometimes fail to change behavior

 

Amazon blurb: “The philosopher Cristina Bicchieri here develops her theory of social norms, most recently explained in her 2006 volume The Grammar of Society. Bicchieri challenges many of the fundamental assumptions of the social sciences. She argues that when it comes to human behavior, social scientists place too much stress on rational deliberation. In fact, many choices occur without much deliberation at all. Bicchieri’s theory accounts for these automatic components of behavior, where individuals react automatically to cues–those cues often pointing to the social norms that govern our choices in a social world.

Bicchieri’s work has broad implications not only for understanding human behavior, but for changing it for better outcomes. People have a strong conditional preference for following social norms, but that also means manipulating those norms (and the underlying social expectations) can produce beneficial behavioral changes. Bicchieri’s recent work with UNICEF has explored the applicability of her views to issues of human rights and well-being. Is it possible to change social expectations around forced marriage, genital mutilations, and public health practices like vaccinations and sanitation? If so, how? What tools might we use? This short book explores how social norms work, and how changing them–changing preferences, beliefs, and especially social expectations–can potentially improve lives all around the world.”


How to Measure Anything: Finding the Value of Intangibles in Business [and elsewhere]

3rd edition, by Douglas W. Hubbard

PDF copy of the 2nd edition available here

Building up from simple concepts to illustrate the hands-on yet intuitively easy application of advanced statistical techniques, How to Measure Anything reveals the power of measurement in our understanding of business and the world at large. This insightful and engaging book shows you how to measure those things in your business that until now you may have considered “immeasurable,” including technology ROI, organizational flexibility, customer satisfaction, and technology risk.

Offering examples that will get you to attempt measurements, even when it seems impossible, this book provides you with the substantive steps for measuring anything, especially uncertainty and risk. Don’t wait. Read this book and find out:

  • The three reasons why things may seem immeasurable but are not
  • Inspirational examples of where seemingly impossible measurements were resolved with surprisingly simple methods
  • How computing the value of information will show that you probably have been measuring all the wrong things
  • How not to measure risk
  • Methods for measuring “soft” things like happiness, satisfaction, quality, and more

Amazon.com Review: Now updated with new research and even more intuitive explanations, this is a demystifying explanation of how managers can inform themselves to make less risky, more profitable business decisions. This insightful and eloquent book will show you how to measure those things in your own business that, until now, you may have considered “immeasurable,” including customer satisfaction, organizational flexibility, technology risk, and technology ROI.

  • Adds even more intuitive explanations of powerful measurement methods and shows how they can be applied to areas such as risk management and customer satisfaction
  • Continues to boldly assert that any perception of “immeasurability” is based on certain popular misconceptions about measurement and measurement methods
  • Shows the common reasoning for calling something immeasurable, and sets out to correct those ideas
  • Offers practical methods for measuring a variety of “intangibles”
  • Adds recent research, especially with regard to methods that seem like measurement but are in fact a kind of “placebo effect” for management – and explains how to tell effective methods from management mythology
  • Written by recognized expert Douglas Hubbard, creator of Applied Information Economics

How to Measure Anything, Second Edition illustrates how the author has used his approach across various industries and how any problem, no matter how difficult, ill-defined, or uncertain, can lend itself to measurement using proven methods.

See also Julia Galef’s podcast interview with the author.


PRISM: TOOLKIT FOR EVALUATING THE OUTCOMES AND IMPACTS OF SMALL/MEDIUM-SIZED CONSERVATION PROJECTS

WHAT IS PRISM?

PRISM is a toolkit that aims to support small/medium-sized conservation projects to effectively evaluate the outcomes and impacts of their work.

The toolkit has been developed by a collaboration of several conservation NGOs with additional input from scientists and practitioners from across the conservation sector.

The toolkit is divided into four main sections:

Introduction and Key Concepts: Provides a basic overview of the theory behind evaluation relevant to small/medium-sized conservation projects

Designing and Implementing the Evaluation: Guides users through a simple, step-by-step process for evaluating project outcomes and impacts, including identifying what you need to evaluate, how to collect evaluation data, analysing/interpreting results, and deciding what to do next.

Modules: Provides users with additional guidance and directs users towards methods for evaluating outcomes/impacts resulting from five different kinds of conservation action:

  • Awareness and Attitudes
  • Capacity Development
  • Livelihoods and Governance
  • Policy
  • Species and Habitat Management

Method factsheets: Outlines over 60 practical, easy-to-use methods and supplementary guidance factsheets for collecting, analysing and interpreting evaluation data

Toolkit Website: https://conservationevaluation.org/
PDF copy of the manual (download request form): https://conservationevaluation.org/download/

Recent readings: replication of findings (or not), arguments for/against “mixed methods”, and uses of algorithms (public accountability, costs/benefits, metadata)

Recently noted papers of interest on my Twitter feed:

  • Go Forth and Replicate: On Creating Incentives for Repeat Studies. Scientists have few direct incentives to replicate other researchers’ work, including precious little funding to do replications. Can that change? 09.11.2017 / BY Michael Schulson
    • “A survey of 1,500 scientists, conducted by the journal Nature last year, suggested that researchers often weren’t telling their colleagues — let alone publishing the results — when other researchers’ findings failed to replicate.”… “Each year, the [US] federal government spends more than $30 billion on basic scientific research. Universities and private foundations spend around $20 billion more, according to one estimate. Virtually none of that money is earmarked for research replication”…”In reality, major scientific communities have been beset these last several years over inadequate replication, with some studies heralded as groundbreaking exerting their influence in the scientific literature — sometimes for years, and with thousands of citations — before anyone bothers to reproduce the experiments and discover that they don’t hold water. In fields ranging from cancer biology to social psychology, there’s mounting evidence that replication does not happen nearly enough. The term “replication crisis” is now well on its way to becoming a household phrase.”
  • WHEN GOVERNMENT RULES BY SOFTWARE, CITIZENS ARE LEFT IN THE DARK. TOM SIMONITE, WIRED, BUSINESS, 08.17.17, 07:00 AM
    • “Most governments the professors queried didn’t appear to have the expertise to properly consider or answer questions about the predictive algorithms they use”…”Researchers believe predictive algorithms are growing more prevalent – and more complex. “I think that probably makes things harder,” says Goodman.”…”Danielle Citron, a law professor at the University of Maryland, says that pressure from state attorneys general, court cases, and even legislation will be necessary to change how local governments think about, and use, such algorithms. “Part of it has to come from law,” she says. “Ethics and best practices never gets us over the line because the incentives just aren’t there.”
  • The evolution of machine learning. Posted Aug 8, 2017 by Catherine Dong (@catzdong), TechCrunch. (A toy sketch of the pipeline described below appears at the end of this list.)
    • “Machine learning engineering happens in three stages — data processing, model building and deployment and monitoring. In the middle we have the meat of the pipeline, the model, which is the machine learning algorithm that learns to predict given input data. The first stage involves cleaning and formatting vast amounts of data to be fed into the model. The last stage involves careful deployment and monitoring of the model. We found that most of the engineering time in AI is not actually spent on building machine learning models — it’s spent preparing and monitoring those models. Despite the focus on deep learning at the big tech company AI research labs, most applications of machine learning at these same companies do not rely on neural networks and instead use traditional machine learning models. The most common models include linear/logistic regression, random forests and boosted decision trees.”
  • The Most Crucial Design Job Of The Future. What is a data ethnographer, and why is it poised to become so important? 2017.7.24 BY CAROLINE SINDERS. Co-Design
    • Why we need metadata (data about the data we are using). “I advocate we need data ethnography, a term I define as the study of the data that feeds technology, looking at it from a cultural perspective as well as a data science perspective”…”Data is a reflection of society, and it is not neutral; it is as complex as the people who make it.”
  • The Mystery of Mixing Methods. Despite significant progress on mixed methods approaches, their application continues to be (partly) shrouded in mystery, and the concept itself can be subject to misuse. March 28, 2017 By Jos Vaessen. IEG
    • “The lack of an explicit (and comprehensive) understanding of the principles underlying mixed methods inquiry has led to some confusion and even misuses of the concept in the international evaluation community.”
    • Three types of misuse are identified
    • Five valid reasons for using mixed methods: Triangulation, Complementarity, Development, Initiation, Expansion
  • To err is algorithm: Algorithmic fallibility and economic organisation. Wednesday, 10 May 2017. NESTA
    • We should not stop using algorithms simply because they make errors. Without them, many popular and useful services would be unviable. However, we need to recognise that algorithms are fallible and that their failures have costs. This points at an important trade-off between more (algorithm-enabled) beneficial decisions and more (algorithm-caused) costly errors. Where lies the balance? Economics is the science of trade-offs, so why not think about this topic like economists? This is what I have done ahead of this blog, creating three simple economics vignettes that look at key aspects of algorithmic decision-making. These are the key questions:
      Risk: When should we leave decisions to algorithms, and how accurate do those algorithms need to be?
      Supervision: How do we combine human and machine intelligence to achieve desired outcomes?
      Scale: What factors enable and constrain our ability to ramp up algorithmic decision-making?
  • A taxonomy of algorithmic accountability. Cory Doctorow / 6:20 am Wed May 31, 2017 Boing Boing
    • “Eminent computer scientist Ed Felten has posted a short, extremely useful taxonomy of four ways that an algorithm can fail to be accountable to the people whose lives it affects: it can be protected by claims of confidentiality (“how it works is a trade secret”); by complexity (“you wouldn’t understand how it works”); unreasonableness (“we consider factors supported by data, even when there’s no obvious correlation”); and injustice (“it seems impossible to explain how the algorithm is consistent with law or ethics”)”
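
To make the three pipeline stages described in the TechCrunch piece above concrete, here is a toy scikit-learn sketch of my own (synthetic data, not from the article): stage 1 is data processing, stage 2 is a “traditional” model of the kind the article says still dominates in practice, and stage 3 stands in for deployment and monitoring by checking held-out accuracy.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Stage 1: data processing -- synthetic data here, with scaling handled inside the pipeline.
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Stage 2: model building -- logistic regression, one of the "traditional" workhorse models.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Stage 3: deployment and monitoring -- in production this means serving the model and
# tracking its performance over time; here held-out accuracy is a simple proxy.
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```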

Why have evaluators been slow to adopt big data analytics?

This is a question posed by Michael Bamberger in his blog posting on the MERL Tech website, titled Building bridges between evaluators and big data analysts. There he puts forward eight reasons (four main ones and four subsidiary points), none of which I disagree with. But I have my own perspective on the same question, and I posted the following points as a comment underneath his blog posting.

My take on “Why have evaluators been slow to adopt big data analytics?”

1. “Big data? I am having enough trouble finding any useful data! How to analyse big data is ‘a problem we would like to have’.” This is what I suspect many evaluators are thinking.

2. “Data mining is BAD” – because data mining is seen by evaluators as something ad hoc and non-transparent, whereas the best data mining practice is systematic and transparent.

3. “Correlation does not mean causation” – many evaluators have not updated this formulation to the more useful “Association is a necessary but insufficient basis for a strong causal claim”.

4. Evaluators focus on explanatory models and give little attention to the uses of predictive models, yet both are useful in the real world, as is the combination of the two. Some predictive models can become explanatory models through follow-up within-case investigations.

5. Lack of appreciation of the limits of manual hypothesis formulation and testing (useful as it can be) as a means of accumulating knowledge. In a project with four outputs and four outcomes there can be 16 different individual causal links between outputs and outcomes, and therefore 2^16 = 65,536 possible combinations of these causal links. That is a lot of theories to choose from. In this context, search algorithms can be very useful (see the first sketch at the end of this list).

6. Lack of knowledge of, and confidence in, machine learning software. There is still work to be done to make this software more user-friendly. Rapid Miner, BigML, and EvalC3 are heading in the right direction.

7. Most evaluators probably don’t know that you can use the above software on small data sets; these tools do not work only with large data sets. Yesterday I was using EvalC3 with a data set describing only 25 cases.

8. The difficulty of understanding some machine learning findings. Decision tree models (one kind of machine learning) are eminently readable (see the second sketch at the end of this list), but few people can explain the internal logic of specific prediction models generated by artificial neural networks (another kind of machine learning, often used for classification of images). This lack of explainability presents a major problem for public accountability. Public accountability for the behavior and use of algorithms is shaping up to be a BIG issue, as highlighted in this week’s Economist leader article on advances in facial recognition software: What machines can tell from your face.
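
To make point 5 concrete, here is a minimal Python sketch (my own illustration, not EvalC3 output) that counts and enumerates the candidate theories for a hypothetical project with four outputs and four outcomes:

```python
from itertools import product

# Hypothetical project: 4 outputs x 4 outcomes = 16 possible output-to-outcome links.
n_links = 4 * 4

# A candidate theory is any subset of those links (each link present or absent),
# so there are 2**16 candidate theories.
print(2 ** n_links)  # 65536

# A search algorithm can enumerate all of them in well under a second;
# a human testing hypotheses one at a time cannot.
all_theories = product([0, 1], repeat=n_links)
print(sum(1 for _ in all_theories))  # 65536, confirmed by brute-force enumeration
```

And on points 6 to 8, the sketch below is illustrative only: it uses scikit-learn rather than EvalC3, and the 25-case data set and attribute names (attr_A to attr_D) are invented. It fits a decision tree to 25 cases and prints the resulting if/then rules, which is exactly the kind of readable model that an artificial neural network does not give you.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(seed=1)

# Invented data set: 25 cases, four binary project attributes, one binary outcome.
attribute_names = ["attr_A", "attr_B", "attr_C", "attr_D"]
X = rng.integers(0, 2, size=(25, 4))
y = X[:, 0] & X[:, 2]  # outcome present only when attr_A AND attr_C are both present

# A small decision tree fits happily on just 25 cases.
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# The fitted model prints as human-readable if/then rules,
# unlike the weight matrices inside a neural network.
print(export_text(model, feature_names=attribute_names))
```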

Update, 2017-09-19: See Michael Bamberger’s response to my comments above in the Comment section below. It is copied from his original response posted here: http://merltech.org/building-bridges-between-evaluators-and-big-data-analysts/


Order and Diversity: Representing and Assisting Organisational Learning in Non-Government Aid Organisations.

No, history did not begin three years ago ;-)

“It was twenty years ago today…” well, almost. Here is a link to my 1998 PhD thesis of the above title. It was based on field work I carried out in Bangladesh between 1992 and 1995. Chapter 8 describes the first implementation of what later became the Most Significant Change impact monitoring technique. But there is a lot more of value in this thesis as well, including an analysis of the organisational learning literature up to that date, an analysis of the Bangladesh NGO sector in the early 1990s, and a summary of thinking about evolutionary epistemology. Unlike all too many PhDs, this one was useful, even for the immediate subjects of my field work. CCDB was still using the impact monitoring process I helped them set up (i.e. MSC) when I visited them again in the early 2000s, albeit with some modifications to suit its expanded use.

Abstract: The aim of this thesis is to develop a coherent theory of organisational learning which can generate practical means of assisting organisational learning. The thesis develops and applies this theory to one class of organisations known as non-government organisations (NGOs), and more specifically to those NGOs who receive funds from high income countries but who work for the benefit of the poor in low income countries. Of central concern are the processes whereby these NGOs learn from the rural and urban poor with whom they work.
The basis of the theory of organisational learning used in this thesis is modern evolutionary theory, and more particularly, evolutionary epistemology. It is argued that this theory provides a means of both representing and assisting organisational learning. Firstly, it provides a simple definition of learning that can be operationalised at multiple scales of analysis: that of individuals, organisations, and populations of organisations. Differences in the forms of organisational learning that do take place can be represented using a number of observable attributes of learning which are derived from an interpretation of evolutionary theory. The same evolutionary theory can also provide useful explanations of processes thus defined and represented. Secondly, an analysis of organisational learning using these observable attributes and background theory also suggests two ways in which organisational learning can be assisted. One is the use of specific methods within NGOs: a type of participatory monitoring. The second is the use of particular interventions by their donors: demands for particular types of information which are indicative of how and where the NGO is learning.

In addition to these practical implications, it is argued that a specific concern with organisational learning can be related to a wider problematic which should be of concern to Development Studies: one which is described as “the management of diversity”. Individual theories, organisations, and larger social structures may not survive in the face of diversity and change. In surviving they may constrain and/or enable other agents, with feedback effects into the scale and forms of diversity possible. The management of diversity can be analysed descriptively and prescriptively, at multiple scales of aggregation.

 

Twitter posts tagged as #evaluation

This post should feature a continually updated feed of all tweets tagged #evaluation

