Straws-in-the-wind, Hoops and Smoking Guns: What can Process Tracing Offer to Impact Evaluation?

[Spotted via a tweet by Chris Roche]
Punton, M., Welle, K., 2015. Straws-in-the-wind, Hoops and Smoking Guns: What can Process Tracing Offer to Impact Evaluation? Available as pdf

See also the Annex Applying Process Tracing in Five Steps, also available as pdf

Abstract:  “This CDI Practice Paper by Melanie Punton and Katharina Welle explains the methodological and theoretical foundations of process tracing, and discusses its potential application in international development impact evaluations. It draws on two early applications of process tracing for assessing impact in international development interventions: Oxfam Great Britain (GB)’s contribution to advancing universal health care in Ghana, and the impact of the Hunger and Nutrition Commitment Index (HANCI) on policy change in Tanzania. In a companion to this paper, Practice Paper 10 Annex describes the main steps in applying process tracing and provides some examples of how these steps might be applied in practice.”

Annex abstract: “This Practice Paper Annex describes the main steps in applying process tracing, as adapted from Process-Tracing Methods: Foundations and Guidelines (Beach and Pedersen 2013). It also provides some examples of how these steps might be applied in practice, drawing on a case study discussed in CDI Practice Paper 10.”

Rick Davies Comment: This is one of a number of recent publications on process tracing (see the bibliography below). The good thing about this IDS publication is its practical orientation: it focuses on how to do process tracing. However, I think there are three gaps which concern me:

  • Not highlighting how process tracing (based on within-case investigations) can be complementary to cross-case investigations (which can be done using the Configurational or Regularity approaches in Box 1 of this paper). While within-case investigations can elaborate on the how-things-work question, cross-case investigations can tell us about the scale on which these things are happening (i.e. their coverage). The former is about mechanisms, the latter is about associations. A good causal claim will involve both association(s) and mechanism(s).
  • Not highlighting the close connection between conceptions of necessary and sufficient causes and the four types of tests the paper describes. The concepts of necessary and/or sufficient causes provide a means of connecting the two levels of analysis: they can be used to describe what is happening at both levels (causal conditions and configurations in cross-case investigations, and mechanisms in within-case investigations).
  • Not highlighting that there are two important elements to the tests, not just one (probability). One is the ability to disprove a proposition of sufficiency or necessity through the existence of a single contrary case; the other is the significance of the prior probability of an event happening. See more below…

The Better Evaluation website describes the relationship between the tests and the types of causes as follows (with some extra annotations of my own):

  • ‘Hoop’ test is failed when examination of a case shows the presence of a Necessary causal condition but the outcome of interest is not present.
    • Passing a common hoop condition is more persuasive than an uncommon one [This is the Bayesian bit referred to in the IDS paper – the significance of an event is affected by our prior assumptions about its occurrence]
  • ‘Smoking Gun’ test is passed when examination of a case shows the presence of a Sufficient causal condition.
    • Passing an uncommon smoking gun condition is more persuasive than a common one [The Bayesian bit again – see the worked example after this list]
  • ‘Doubly Definitive’ test is passed when examination of a case shows that a condition provides both Necessary and Sufficient support for the explanation. Such tests tend to be rare.
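
To make the Bayesian point concrete, here is a minimal sketch in Python (my own illustration, not taken from the IDS paper or the Better Evaluation page). It shows why passing an uncommon ("unique") piece of evidence shifts our confidence in a causal claim much more than passing a common one, even when the prior belief and the probability of seeing the evidence if the claim is true are identical:

    # Minimal sketch: Bayes' rule applied to process tracing test evidence.
    # All numbers are illustrative assumptions, not taken from the paper.

    def posterior(prior, p_e_given_h, p_e_given_not_h):
        """P(claim | evidence) by Bayes' rule."""
        numerator = p_e_given_h * prior
        return numerator / (numerator + p_e_given_not_h * (1 - prior))

    prior = 0.5  # illustrative starting belief in the causal claim

    # Common evidence: quite likely to turn up even if the claim is false
    print(posterior(prior, p_e_given_h=0.6, p_e_given_not_h=0.4))   # ~0.60

    # Uncommon evidence: rarely seen unless the claim is true (a smoking gun)
    print(posterior(prior, p_e_given_h=0.6, p_e_given_not_h=0.05))  # ~0.92

The same arithmetic, applied to the absence of evidence that should almost certainly be present if the claim is true, shows why failing a hoop test is so damaging to a claim.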

Instead, the authors (possibly following other cited authors) make use of two related concepts, certainty and uniqueness, in place of necessity and sufficiency. I am not sure that this helps much: certainty arises from something being a necessity, not the other way around.

Postscript: I have set up a Zotero-based online bibliography on process tracing here, displaying all the papers I have come across in recent years; it may be of interest to readers of MandE NEWS.

Evidence based medicine: a movement in crisis?

Greenhalgh, T., Howick, J., Maskrey, N., 2014. Evidence based medicine: a movement in crisis? BMJ 348. http://www.bmj.com/content/348/bmj.g3725 Available as pdf

This paper popped up in a comment made on Chris Roche’s posting on the From Poverty to Power website.

Rick Davies comment: The paper is interesting in the first instance because both the debate and the practice of evidence based policy seem to be much further ahead in the field of medicine than in development aid (…broad generalisation that this is…). There are also parallels between different approaches in medicine and different approaches in development aid:

  • In medicine, one approach is rule based, focused on average effects when trying to meet common needs in populations; the other is expertise based, focused on the specific and often unique needs of individuals.
  • In development aid, one approach is centrally planned, nationally rolled-out services meeting basic needs such as water supply or education; the other is much more person-centred, participatory rural and other development programs.

I have written a blog post in response to some issues raised in this paper here, titled In defense of the (careful) use of algorithms and the need for dialogue between tacit (expertise) and explicit (rules) forms of knowledge.

Excerpts from the BMJ paper

Box 1: Crisis in evidence based medicine?

  • The evidence based “quality mark” has been misappropriated by vested interests

  • The volume of evidence, especially clinical guidelines, has become unmanageable

  • Statistically significant benefits may be marginal in clinical practice

  • Inflexible rules and technology driven prompts may produce care that is management driven rather than patient centred

  • Evidence based guidelines often map poorly to complex multimorbidity

Box 2: What is real evidence based medicine and how do we achieve it?

Real evidence based medicine:
  • Makes the ethical care of the patient its top priority

  • Demands individualised evidence in a format that clinicians and patients can understand

  • Is characterised by expert judgment rather than mechanical rule following

  • Shares decisions with patients through meaningful conversations

  • Builds on a strong clinician-patient relationship and the human aspects of care

  • Applies these principles at community level for evidence based public health

Actions to deliver real evidence based medicine
  • Patients must demand better evidence, better presented, better explained, and applied in a more personalised way

  • Clinical training must go beyond searching and critical appraisal to hone expert judgment and shared decision making skills

  • Producers of evidence summaries, clinical guidelines, and decision support tools must take account of who will use them, for what purposes, and under what constraints

  • Publishers must demand that studies meet usability standards as well as methodological ones

  • Policy makers must resist the instrumental generation and use of “evidence” by vested interests

  • Independent funders must increasingly shape the production, synthesis, and dissemination of high quality clinical and public health evidence

  • The research agenda must become broader and more interdisciplinary, embracing the experience of illness, the psychology of evidence interpretation, the negotiation and sharing of evidence by clinicians and patients, and how to prevent harm from overdiagnosis

Process Tracing: From Metaphor to Analytic Tool

Bennett, A., Checkel, J. (Eds.), 2014. Process Tracing: From Metaphor to Analytic Tool. Cambridge University Press

Search the contents via Google Books

“This book argues that techniques falling under the label of process tracing are particularly well suited for measuring and testing hypothesized causal mechanisms. Indeed, a growing number of political scientists now invoke the term. Despite or perhaps because of this fact, a buzzword problem has arisen, where process tracing is mentioned, but often with little thought or explication of how it works in practice. As one sharp observer has noted, proponents of qualitative methods draw upon various debates – over mechanisms and causation, say – to argue that process tracing is necessary and good. Yet, they have done much less work to articulate the criteria for determining whether a particular piece of research counts as good process tracing (Waldner 2012: 65–68). Put differently, “there is substantial distance between the broad claim that ‘process tracing is good’ and the precise claim ‘this is an instance of good process tracing’” (Waldner 2011: 7).

This volume addresses such concerns, and does so along several dimensions. Meta-theoretically, it establishes a philosophical basis for process tracing – one that captures mainstream uses while simultaneously being open to applications by interpretive scholars. Conceptually, contributors explore the relation of process tracing to mechanism-based understandings of causation. Most importantly, we articulate best practices for individual process-tracing accounts – for example, criteria for how micro to go and how to deal with the problem of equifinality (the possibility that there may be multiple pathways leading to the same outcome).

Ours is an applied methods book – and not a standard methodology text – where the aim is to show how process tracing works in practice. If Van Evera (1997), George and Bennett (2005), Gerring (2007a), and Rohlfing (2012) set the state of the art for case studies, then our volume is a logical follow-on, providing clear guidance for what is perhaps the central within-case method – process tracing.

Despite all the recent attention, process tracing – or the use of evidence from within a case to make inferences about causal explanations of that case – has in fact been around for thousands of years. Related forms of analysis date back to the Greek historian Thucydides and perhaps even to the origins of human language and society. It is nearly impossible to avoid historical explanations and causal inferences from historical cases in any purposive human discourse or activity.

Although social science methodologists have debated and elaborated on formal approaches to inference such as statistical analysis for over a hundred years, they have only recently coined the term “process tracing” or attempted to explicate its procedures in a systematic way. Perhaps this is because drawing causal inferences from historical cases is a more intuitive practice than statistical analysis and one that individuals carry out in their everyday lives. Yet, the seemingly intuitive nature of process tracing obscures that its unsystematic use is fraught with potential inferential errors; it is thus important to utilize rigorous methodological safeguards to reduce such risks.

The goal of this book is therefore to explain the philosophical foundations, specific techniques, common evidentiary sources, and best practices of process tracing to reduce the risks of making inferential errors in the analysis of historical and contemporary cases. This introductory chapter first defines process tracing and discusses its foundations in the philosophy of social science. We then address its techniques and evidentiary sources, and advance ten best-practice criteria for judging the quality of process tracing in empirical research. The chapter concludes with an analysis of the methodological issues specific to process tracing on general categories of theories, including structural-institutional, cognitive-psychological, and sociological. Subsequent chapters take up this last issue in greater detail and assess the contributions of process tracing in particular research programs or bodies of theory”

Preface
Part I. Introduction:
1. Process tracing: from philosophical roots to best practices Andrew Bennett and Jeffrey T. Checkel
Part II. Process Tracing in Action:
2. Process tracing the effects of ideas Alan M. Jacobs
3. Mechanisms, process, and the study of international institutions Jeffrey T. Checkel
4. Efficient process tracing: analyzing the causal mechanisms of European integration Frank Schimmelfennig
5. What makes process tracing good? Causal mechanisms, causal inference, and the completeness standard in comparative politics David Waldner
6. Explaining the Cold War’s end: process tracing all the way down? Matthew Evangelista
7. Process tracing, causal inference, and civil war Jason Lyall
Part III. Extensions, Controversies, and Conclusions:
8. Improving process tracing: the case of multi-method research Thad Dunning
9. Practice tracing Vincent Pouliot
10. Beyond metaphors: standards, theory, and the ‘where next’ for process tracing Jeffrey T. Checkel and Andrew Bennett
Appendix. Disciplining our conjectures: systematizing process tracing with Bayesian analysis.

See also: Bennett, A., 2008. Process Tracing: A Bayesian Perspective. The Oxford Handbook of Political Methodology Chapter 30. Pages 702–721. (a pdf)

How to interpret P values, according to xkcd :-)

Background: 
“When you perform a hypothesis test in statistics, a p-value helps you determine the significance of your results. Hypothesis tests are used to test the validity of a claim that is made about a population. This claim that’s on trial, in essence, is called the null hypothesis…” (continue here...)
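
As a minimal, hypothetical sketch of why p-values need careful interpretation (my own illustration, not from the quoted source or the comic): if you test enough comparisons on pure noise at the conventional p < 0.05 threshold, some of them will look "significant" by chance alone, which is exactly the point the xkcd strip below makes with its jelly beans.

    # Minimal simulation sketch: 20 comparisons where the null hypothesis is
    # true by construction; roughly 1 in 20 will still come out "significant".
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n_tests, false_positives = 20, 0
    for _ in range(n_tests):
        a = rng.normal(size=30)  # both samples drawn from the same distribution
        b = rng.normal(size=30)
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:
            false_positives += 1

    print(f"{false_positives} of {n_tests} null comparisons had p < 0.05")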

The xkcd view

 

THE FUTURE OF EVALUATION: 10 PREDICTIONS (& you can add your votes)

From John Gargani’s EvalBlog post of 30 January 2015. See also the 40+ comments posted there.

As someone said, “Making predictions can be difficult, especially about the future”

Give your opinions on these predictions via the online poll at the bottom of this page, and see what others think

See also other writers’ predictions, accessible via links after the opinion poll.

(1) Most evaluations will be internal.

The growth of internal evaluation, especially in corporations adopting environmental and social missions, will continue.  Eventually, internal evaluation will overshadow external evaluation.  The job responsibilities of internal evaluators will expand and routinely include organizational development, strategic planning, and program design.  Advances in online data collection and real-time reporting will increase the transparency of internal evaluation, reducing the utility of external consultants.

(2) Evaluation reports will become obsolete.

After-the-fact reports will disappear entirely.  Results will be generated and shared automatically—in real time—with links to the raw data and documentation explaining methods, samples, and other technical matters.  A new class of predictive reports, preports, will emerge.  Preports will suggest specific adjustments to program operations that anticipate demographic shifts, economic shocks, and social trends.

(3) Evaluations will abandon data collection in favor of data mining.

Tremendous amounts of data are being collected in our day-to-day lives and stored digitally.  It will become routine for evaluators to access and integrate these data.  Standards will be established specifying the type, format, security, and quality of “core data” that are routinely collected from existing sources.  As in medicine, core data will represent most of the outcome and process measures that are used in evaluations.

(4) A national registry of evaluations will be created.

Evaluators will begin to record their studies in a central, open-access registry as a requirement of funding.  The registry will document research questions, methods, contextual factors, and intended purposes prior to the start of an evaluation.  Results will be entered or linked at the end of the evaluation.  The stated purpose of the database will be to improve evaluation synthesis, meta-analysis, meta-evaluation, policy planning, and local program design.  It will be the subject of prolonged debate.

(5) Evaluations will be conducted in more open ways.

Evaluations will no longer be conducted in silos.  Evaluations will be public activities that are discussed and debated before, during, and after they are conducted.  Social media, wikis, and websites will be re-imagined as virtual evaluation research centers in which like-minded stakeholders collaborate informally across organizations, geographies, and socioeconomic strata.

(6) The RFP will RIP.

The purpose of an RFP is to help someone choose the best service at the lowest price.  RFPs will no longer serve this purpose well because most evaluations will be internal (see 1 above), information about how evaluators conduct their work will be widely available (see 5 above), and relevant data will be immediately accessible (see 3 above).  Internal evaluators will simply drop their data—quantitative and qualitative—into competing analysis and reporting apps, and then choose the ones that best meet their needs.

(7) Evaluation theories (plural) will disappear.

Over the past 20 years, there has been a proliferation of theories intended to guide evaluation practice.  Over the next ten years, there will be a convergence of theories until one comprehensive, contingent, context-sensitive theory emerges.  All evaluators—quantitative and qualitative; process-oriented and outcome-oriented; empowerment and traditional—will be able to use the theory in ways that guide and improve their practice.

(8) The demand for evaluators will continue to grow.

The demand for evaluators has been growing steadily over the past 20 to 30 years.  Over the next ten years, the demand will not level off due to the growth of internal evaluation (see 1 above) and the availability of data (see 3 above).

(9) The number of training programs in evaluation will increase.

There is a shortage of evaluation training programs in colleges and universities.  The shortage is driven largely by how colleges and universities are organized around disciplines.  Evaluation is typically found as a specialty within many disciplines in the same institution.  That disciplinary structure will soften and the number of evaluation-specific centers and training programs in academia will grow.

(10) The term evaluation will go out of favor.

The term evaluation sets the process of understanding a program apart from the process of managing a program.  Good evaluators have always worked to improve understanding and management.  When they do, they have sometimes been criticized for doing more than determining the merit of a program.  To more accurately describe what good evaluators do, evaluation will become known by a new name, such as social impact management.

 

 

See also…

How Systematic Is That Systematic Review? The Case of Improving Learning Outcomes

(copy of a blog posting by David Evans on 2015/03/02 on the World Bank Development Impact blog)

Rick Davies Comment: I have highlighted interesting bits of text in red. The conclusions, also in red, are worth noting. And…make sure you check out the great (as often)  xkcd comic at the end of the posting below :-) 

“With the rapid expansion of impact evaluation evidence has come the cottage industry of the systematic review. Simply put, a systematic review is supposed to “sum up the best available research on a specific question.” We found 238 reviews in 3ie’s database of systematic reviews of “the effectiveness of social and economic interventions in low- and middle- income countries,” seeking to sum up the best evidence on topics as diverse as the effect of decentralized forest management on deforestation and the effect of microcredit on women’s control over household spending.
But how definitive are these systematic reviews really? Over the past two years, we noticed that there were multiple systematic reviews on the same topic: How to improve learning outcomes for children in low and middle income countries. In fact, we found six! Of course, these reviews aren’t precisely the same: Some only include randomized-controlled trials (RCTs) and others include quasi-experimental studies. Some examine only how to improve learning outcomes and others include both learning and access outcomes. One only includes studies in Africa. But they all have the common core of seeking to identify what improves learning outcomes.

Here are the six studies:

  1. Identifying Effective Education Interventions in Sub-Saharan Africa: A Meta-Analysis of Rigorous Impact Evaluations, by Conn (2014)
  2. School Resources and Educational Outcomes in Developing Countries: A Review of the Literature from 1990-2010, by Glewwe et al. (2014)
  3. The Challenge of Education and Learning in the Developing World, by Kremer et al. (2013)
  4. Quality Education for All Children? What Works in Education in Developing Countries, by Krishnaratne et al. (2013)
  5. Improving Learning in Primary Schools of Developing Countries: A Meta-Analysis of Randomized Experiments, by McEwan (2014)
  6. Improving Educational Outcomes in Developing Countries: Lessons from Rigorous Evaluations, by Murnane & Ganimian (2014)

Between them, they cover an enormous amount of educational research. They identify 227 studies that measure the impact of some intervention on learning outcomes in the developing world. 134 of those are RCTs. There are studies from around the world, with many studies from China, India, Chile, and – you guessed it – Kenya. But as we read the abstracts and intros of the reviews, there was some overlap, but also quite a bit of divergence. One highlighted that pedagogical interventions were the most effective; another that information and computer technology interventions raised test scores the most; and a third highlighted school materials as most important.

What’s going on? In a recent paper, we try to figure it out.

Differing Compositions. Despite having the same topic, these studies don’t study the same papers. In fact, they don’t even come close. Out of 227 total studies that have learning outcomes across the six reviews, only 3 studies are in all six reviews, per the figure below. That may not be surprising since there are differences in the inclusion criteria (RCTs only, Africa only, etc.). Maybe some of those studies aren’t the highest quality. But only 13 studies are even in the majority (4, 5, or 6) of reviews. 159 of the total studies (70 percent!) are only included in one review. 74 of those are RCTs and so are arguably of higher quality and should be included in more reviews. (Of course, there are low-quality RCTs and high-quality non-RCTs. That’s just an example.) The most comprehensive of the reviews covers less than half of the studies.

If we do a more parsimonious analysis, looking only at RCTs with learning outcomes at the primary level between 1990 and 2010 in Sub-Saharan Africa (which is basically the intersection of the inclusion criteria of the six reviews), we find 42 total studies, and the median number included in a given systematic review is 15, about one-third. So there is surprisingly little overlap in the studies that these reviews examine.
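
As an aside, the composition analysis described above boils down to counting, for each study, how many reviews include it. Here is a minimal sketch of that bookkeeping in Python, using entirely made-up study and review names rather than the authors' actual dataset:

    # Minimal sketch with hypothetical data: count how many reviews include each study.
    from collections import Counter

    reviews = {
        "ReviewA": {"Textbooks_Kenya", "FlipCharts_Kenya", "Incentives_Kenya", "Study1"},
        "ReviewB": {"Textbooks_Kenya", "FlipCharts_Kenya", "Incentives_Kenya", "Study2"},
        "ReviewC": {"Textbooks_Kenya", "Study3"},
    }

    counts = Counter(study for included in reviews.values() for study in included)
    in_all = sorted(s for s, c in counts.items() if c == len(reviews))
    in_one = sorted(s for s, c in counts.items() if c == 1)
    print("In every review:", in_all)
    print("In only one review:", in_one)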

What about categorization? The reviews also vary in how they classify the same studies. For example, a program providing merit scholarships to girls in Kenya is classified alternatively as a school fee reduction, a cash transfer, a student incentive, or a performance incentive. Likewise, a program that provided computer-assisted learning in India is alternatively classified as “computers or technology” or “materials.”

What drives the different conclusions? Composition or categorization? We selected one positive recommendation from each review and examined which studies were driving that recommendation. We then counted how many of those studies were included in other reviews. As the figure below shows, the proportion varies enormously, but the median value is 33%: In other words, another review would likely have just one third of the studies driving a major recommendation in a given review. So composition matters a lot. This is why, for example, McEwan finds much bigger results for computers than others do: The other reviews include – on average – just one third of the studies that drive his result.

At the same time, categorization plays a role. One review highlights the provision of materials as one of the best ways to improve test scores. But several of the key studies that those authors call “materials,” other authors categorize as “computers” or “instructional technology.” While those are certainly materials, not all materials are created equal.

The variation is bigger on the inside. Systematic reviews tend to group interventions into categories (like “incentives” or “information provision” or “computers”), but saying that one of these delivers the highest returns on average masks the fact the variation within these groups is often as big or bigger than the variation across groups. When McEwan finds that computer interventions deliver the highest returns on average, it can be easy to forget that the same category of interventions includes a lot of clunkers, as you can see in the forest plot from his paper, below. (We’re looking at you, One Laptop Per Child in Peru or in Uruguay; but not at you, program providing laptops in China. Man, there’s even heterogeneity within intervention sub-categories!) Indeed, out of 11 categories of interventions in McEwan’s paper, 5 have a bigger standard deviation across effect sizes within the category than across effect sizes in the entire review sample. And for another 5, the standard deviation within category is more than half the standard deviation of the full sample. This is an argument for reporting effectiveness at lower levels of aggregation of intervention categories.
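
To make the within-versus-between spread comparison concrete, here is a minimal sketch with invented effect sizes (purely illustrative, not McEwan's data):

    # Minimal sketch: compare the spread of effect sizes within one intervention
    # category against the spread across the whole (invented) sample.
    import numpy as np

    effects = {
        "computers":  [0.30, -0.05, 0.62, 0.01],  # big wins and clunkers together
        "materials":  [0.08, 0.12, 0.05, 0.10],
        "incentives": [0.15, 0.25, -0.02, 0.40],
    }

    all_effects = np.concatenate([np.array(v) for v in effects.values()])
    print("SD across full sample:", round(all_effects.std(ddof=1), 3))
    for name, vals in effects.items():
        print(f"SD within {name}:", round(np.std(vals, ddof=1), 3))

If the within-category figure rivals the full-sample figure, reporting only the category average hides most of what matters.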


Source: McEwan (2014)

What does this tell us? First, it’s worth investing in an exhaustive search. Maybe it’s even worth replicating searches. Second, it may be worthwhile to combine systematic review methodologies, such as meta-analysis (which is very systematic but excludes some studies) and narrative review (which is not very systematic but allows inclusion of lots of studies, as well as examination of the specific elements of an intervention category that make it work, or not work). Third, maintain low aggregation of intervention categories so that the categories can actually be useful.

Finally, and perhaps most importantly, take systematic reviews with a grain of salt. What they recommend very likely has good evidence behind it; but it may not be the best category of intervention, since chances are, a lot of evidence didn’t make it into the review.

Oh, and what are the three winning studies that made it into all six systematic reviews?

  1. Many Children Left Behind? Textbooks and Test Scores in Kenya, by Kremer, Glewwe, & Moulin (2009)
  2. Retrospective vs. Prospective Analysis of School Inputs: The Case of Flip Charts in Kenya, by Glewwe, Kremer, Moulin, and Zitzewitz (2004)
  3. Incentives to Learn, by Kremer, Miguel, & Thornton (2009)

Tomorrow, we’ll write briefly on what kinds of interventions are recommended most consistently across the reviews.

Future work. Can someone please now do a systematic review of our systematic review of the systematic reviews?

Credit: xkcd

Overview: An open source document clustering and search tool

Overview is an open-source tool originally designed to help journalists find stories in large numbers of documents, by automatically sorting them according to topic and providing a fast visualization and reading interface. It’s also used for qualitative research, social media conversation analysis, legal document review, digital humanities, and more. Overview does at least three things really well.

  • Find what you don’t even know to look for.
  • See broad trends or patterns across many documents.
  • Make exhaustive manual reading faster, when all else fails.

Search is a wonderful tool when you know what you’re trying to find — and Overview includes advanced search features. It’s less useful when you start with a hunch or an anonymous tip. Or there might be many different ways to phrase what you’re looking for, or you could be struggling with poor quality material and OCR error. By automatically sorting documents by topic, Overview gives you a fast way to see what you have.

In other cases you’re interested in broad patterns. Overview’s topic tree shows the structure of your document set at a glance, and you can tag entire folders at once to label documents according to your own category names. Then you can export those tags to create visualizations.

Rick Davies Comment: This service could be quite useful in various ways, including clustering sets of Most Significant Change (MSC) stories, or micro-narratives from SenseMaker-type exercises, or collections of Twitter tweets found via a keyword search. For those interested in the details, and preferring transparency to apparent magic, Overview uses the k-means clustering algorithm, which is explained broadly here. One caveat: the processing of documents can take some time, so you may want to pop out for a cup of coffee while waiting. For those into algorithms, here is a healthy critique of careless use of k-means clustering, i.e. not paying attention to when its assumptions about the structure of the underlying data are inappropriate.
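
For readers who want to see the basic recipe behind this kind of tool, here is a minimal sketch in Python using scikit-learn (an assumption on my part; Overview has its own implementation). TF-IDF vectors plus k-means is the standard way to cluster short documents by topic, and the toy documents below are invented for illustration only:

    # Minimal sketch: cluster a handful of toy documents by topic with k-means.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    docs = [
        "most significant change stories from the water supply project",
        "tweets about the election collected by keyword search",
        "sensemaker micro-narratives on household livelihoods",
        "more stories of change collected from field staff interviews",
    ]

    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.labels_)  # cluster id assigned to each document

As the critique linked above points out, k-means assumes roughly spherical, similarly sized clusters, so results should always be checked against a reading of the documents themselves.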

It is the combination of keyword searching and automatic clustering that seems most useful to me…so far. Another good feature is the ability to label clusters of interest with one or more tags.

I have uploaded 69 blog postings from my Rick on the Road blog. If you want to see how Overview hierarchically clusters these documents, let me know and I will enter your email address, which will let Overview give you access. It seems, so far, that there is no simple way of sharing access (but I am inquiring).

Research on the use and influence of evaluations: The beginnings of a list

This is intended to be the start of an accumulating list of references on the subject of evaluation use, particularly papers that review specific sets or examples of evaluations rather than discussing the issues in a less grounded way.

2016

2015

2014

2012

2009

2000

1997

1986

Related docs

  • Improving the use of monitoring & evaluation processes and findings. Conference Report, Centre for Development Innovation, Wageningen, June 2014  
    • “An existing framework of four areas of factors influencing use…:
      1. Quality factors, relating to the quality of the evaluation. These factors include the evaluation design, planning, approach, timing, dissemination and the quality and credibility of the evidence.
      2. Relational factors: personal and interpersonal; role and influence of the evaluation unit; networks, communities of practice.
      3. Organisational factors: culture, structure and knowledge management.
      4. External factors that affect utilisation in ways beyond the influence of the primary stakeholders and the evaluation process.”
  • Bibliography provided by ODI, in response to this post Jan 2015. Includes all ODI publications found using keyword “evaluation” – a bit too broad, but still useful
  • ITIG – Utilization of Evaluations – Bibliography. International Development Evaluation Association. Produced circa 2011/12.

The Checklist: If something so simple can transform intensive care, what else can it do?

Fascinating article by Atul Gawande in the New Yorker magazine, Annals of Medicine, December 10, 2007 issue.

Selected quotes:

There are degrees of complexity, though, and intensive-care medicine has grown so far beyond ordinary complexity that avoiding daily mistakes is proving impossible even for our super-specialists. The I.C.U., with its spectacular successes and frequent failures, therefore poses a distinctive challenge: what do you do when expertise is not enough?

The checklists provided two main benefits, Pronovost observed. First, they helped with memory recall, especially with mundane matters that are easily overlooked in patients undergoing more drastic events. A second effect was to make explicit the minimum, expected steps in complex processes. Pronovost was surprised to discover how often even experienced personnel failed to grasp the importance of certain precautions.

In the Keystone Initiative’s first eighteen months, the hospitals saved an estimated hundred and seventy-five million dollars in costs and more than fifteen hundred lives. The successes have been sustained for almost four years—all because of a stupid little checklist.

But the prospect pushes against the traditional culture of medicine, with its central belief that in situations of high risk and complexity what you want is a kind of expert audacity—the right stuff, again. Checklists and standard operating procedures feel like exactly the opposite, and that’s what rankles many people.

“The fundamental problem with the quality of American medicine is that we’ve failed to view delivery of health care as a science. The tasks of medical science fall into three buckets. One is understanding disease biology. One is finding effective therapies. And one is insuring those therapies are delivered effectively. That third bucket has been almost totally ignored by research funders, government, and academia. It’s viewed as the art of medicine. That’s a mistake, a huge mistake. And from a taxpayer’s perspective it’s outrageous.”

This was followed by the book The Checklist Manifesto: How to Get Things Right (January 4, 2011).

If it’s good enough for surgeons and airline pilots, is it good enough for evaluators?

See also this favorite paper of mine by Scriven: “The Logic and Methodology of Checklists” (2005):

Procedures for the use of the humble checklist, while no one would deny their utility, in evaluation and elsewhere, are usually thought to fall somewhat below the entry level of what we call a methodology, let alone a theory. But many checklists used in evaluation incorporate a quite complex theory, or at least a set of assumptions, which we are well advised to uncover— and the process of validating an evaluative checklist is a task calling for considerable sophistication. Interestingly, while the theory underlying a checklist is less ambitious than the kind that we normally call program theory, it is often all the theory we need for an evaluation.

Here is a list of evaluation checklists, courtesy of Michigan State University.

Serious question: How do you go about constructing good versus useless/ineffective checklists? Is there a meta-checklist covering this task? :-)

Here is one reader’s attempt at such a meta-checklist: http://www.marketade.com/old/checklist-manifesto-book-review.html

Predictive Analytics and Data Mining: Concepts and Practice with RapidMiner

Author(s): Kotu & Deshpande. Release date: 5 December 2014. Published by Morgan Kaufmann. Print book ISBN: 9780128014608. eBook ISBN: 9780128016503. Pages: 446.

Look inside the book here

Key Features

  • Demystifies data mining concepts with easy to understand language
  • Shows how to get up and running fast with 20 commonly used powerful techniques for predictive analysis
  • Explains the process of using open source RapidMiner tools
  • Discusses a simple 5 step process for implementing algorithms that can be used for performing predictive analytics
  • Includes practical use cases and examples

Chapter headings

  • Introduction
  • Data Mining Process
  • Data Exploration
  • Classification
  • Regression
  • Association
  • Clustering
  • Model Evaluation
  • Text Mining
  • Time Series
  • Anomaly Detection
  • Advanced Data Mining
  • Getting Started with RapidMiner

Rick Davies comment: This looks like a very useful book and I have already ordered a copy. RapidMiner is a free, open source suite of data mining algorithms that can be assembled as modules, according to purpose. I have used RapidMiner a lot for one specific purpose: to construct Decision Tree models of relationships between project context and intervention conditions and project outcomes. For more on data mining, and Decision Trees in particular, see my Data Mining posting on the Better Evaluation website.
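
For readers unfamiliar with Decision Trees, here is a minimal sketch in Python with scikit-learn (invented data and variable names, not from any of the projects mentioned; RapidMiner's Decision Tree operator produces the same kind of model through its graphical interface):

    # Minimal sketch: a Decision Tree relating project conditions to outcomes.
    # The data and feature names below are purely illustrative assumptions.
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Columns: [local partner involved, training provided, funding above median]
    X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
    y = [1, 0, 1, 0, 1, 0]  # 1 = outcome achieved, 0 = not achieved

    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
    print(export_text(tree, feature_names=["partner", "training", "funding"]))

The printed rules ("if training provided then outcome achieved", in this toy example) are what make such models readable as candidate causal configurations, which can then be probed further with within-case methods such as process tracing.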
