Evaluation in the Extreme: Research, Impact and Politics in Violently Divided Societies

Posted on 25 November, 2015 – 6:11 PM

Kenneth Bush – University of York Heslington, York, England
Colleen Duggan – International Development Research Centre, Ottawa
Published October 2015 by Sage

“Over the past two decades, there has been an increase in the funding of research in and on violently divided societies. But how do we know whether research makes any difference to these societies—is the impact constructive or destructive? This book is the first to systematically explore this question through a series of case studies written by those on the front line of applied research. It offers clear and logical ways to understand the positive or negative role that research, or any other aid intervention, might have in developing societies affected by armed conflict, political unrest and/or social violence.”

Kenneth Bush is Altajir Lecturer and Executive Director of the Post-war Reconstruction and Development Unit, University of York (UK). From 2016: School of Government & International Affairs, Durham University.

Colleen Duggan is a Senior Programme Specialist in the Policy and Evaluation Division of the International Development Research Centre, Ottawa.

Download EBook: http://www.idrc.ca/EN/Resources/Publications/openebooks/584-7/index.html

The sustainable development goals as a network of targets

Posted on 13 November, 2015 – 12:01 AM

DESA Working Paper No. 141, ST/ESA/2015/DWP/141, March 2015
Towards integration at last? The sustainable development goals as a network of targets
David Le Blanc, Department of Economic & Social Affairs

ABSTRACT “In 2014, UN Member States proposed a set of Sustainable Development Goals (SDGs), which will succeed the Millennium Development Goals (MDGs) as reference goals for the international development community for the period 2015-2030. The proposed goals and targets can be seen as a network, in which links among goals exist through targets that refer to multiple goals. Using network analysis techniques, we show that some thematic areas covered by the SDGs are well connected among one another. Other parts of the network have weaker connections with the rest of the system. The SDGs as a whole are a more integrated system than the MDGs were, which may facilitate policy integration across sectors. However, many of the links among goals that have been documented in biophysical, economic and social dimensions are not explicitly reflected in the SDGs. Beyond the added visibility that the SDGs provide to links among thematic areas, attempts at policy integration across various areas will have to be based on studies of the biophysical, social and economic systems.”

Rick Davies comment: This is an example of something I would like to see many more of: what are, in effect, network Theories of Change, in place of overly simplified hierarchical models, which typically have few if any feedback loops (aka cyclic graphs). Request: Could the author make the underlying data set publicly available, so other people can do their own network analyses? I know the data set could be reconstructed from existing sources on the SDGs, but… it could save a lot of unnecessary work. Also, the paper should provide a footnote explaining the layout algorithm used to generate the network diagrams.

Some simple improvements that could be made to the existing network diagrams:

  • Vary node size by centrality (the number of immediate connections each node has with other nodes)
  • Represent target nodes as squares and goal nodes as circles, rather than all as circles
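
As a rough illustration of these two suggestions, here is a minimal Python sketch. The edge list, the node names and the size parameters are all hypothetical (they are not taken from the DESA paper), and the output is only a styling table that a plotting tool of your choice could consume.

```python
# Hypothetical edge list: each pair links a target to a goal it refers to.
# These names and links are invented for illustration, not taken from the paper.
edges = [
    ("Target A", "Goal 1"),
    ("Target A", "Goal 2"),
    ("Target B", "Goal 2"),
    ("Target B", "Goal 3"),
    ("Target C", "Goal 1"),
    ("Target C", "Goal 2"),
    ("Target C", "Goal 3"),
]


def degree(edges):
    """Count the immediate connections (degree) of every node."""
    counts = {}
    for a, b in edges:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


def node_style(edges, base_size=100, step=150):
    """Return {node: (size, shape)}.

    Size grows linearly with degree; shape distinguishes targets from goals.
    base_size and step are arbitrary drawing units (an assumption).
    """
    style = {}
    for node, deg in degree(edges).items():
        shape = "square" if node.startswith("Target") else "circle"
        style[node] = (base_size + step * deg, shape)
    return style


styles = node_style(edges)
for node in sorted(styles):
    print(node, styles[node])
```

Degree is only the simplest centrality measure; other measures could be substituted in `degree` without changing the rest of the sketch.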

What is now needed is a two-mode network diagram showing which agencies (perhaps UN agencies for a start) are prioritizing which SDGs. This will help focus minds on where coordination needs are greatest, i.e. between which specific agencies, regarding which specific goals. Here is an example of this kind of network diagram from Ghana, showing which government agencies prioritised which governance objectives in the Ghana Poverty Reduction Strategy (GPRS), more than a decade ago. (Blue nodes = government agencies, red nodes = GPRS governance objectives, thicker lines = higher priority). The existence of SDG targets as well as goals could make an updated version of this kind of exercise even more useful.
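
To make the idea concrete, here is a minimal Python sketch of how a two-mode (agency by goal) data set could be turned into a list of coordination needs. The agencies, goals and priority weights are hypothetical, and scoring a shared goal by the lower of the two agencies' priorities is just one possible assumption, not an established method.

```python
from itertools import combinations

# Hypothetical two-mode data: (agency, goal, priority weight). Not real data.
priorities = [
    ("Agency A", "Goal 1", 3),
    ("Agency A", "Goal 2", 1),
    ("Agency B", "Goal 1", 2),
    ("Agency C", "Goal 2", 3),
    ("Agency C", "Goal 1", 1),
]


def agencies_by_goal(priorities):
    """Group the data by goal: {goal: {agency: weight}}."""
    by_goal = {}
    for agency, goal, weight in priorities:
        by_goal.setdefault(goal, {})[agency] = weight
    return by_goal


def coordination_needs(priorities):
    """List, for each pair of agencies, the goals they both prioritise.

    Each shared goal is scored by the lower of the two priority weights
    (an illustrative assumption about how to measure joint interest).
    """
    needs = {}
    for goal, weights in agencies_by_goal(priorities).items():
        for a, b in combinations(sorted(weights), 2):
            shared = min(weights[a], weights[b])
            needs.setdefault((a, b), []).append((goal, shared))
    return needs


needs = coordination_needs(priorities)
for pair, goals in needs.items():
    print(pair, goals)
```

Pairs of agencies that share many goals, or share goals with high scores, would be the first places to look for coordination needs.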


Crowdsourced research: Many hands make tight work

Posted on 11 October, 2015 – 10:44 AM

Crowdsourced research: Many hands make tight work, Raphael Silberzahn & Eric L. Uhlmann, Nature, 07 October 2015

Selected quotes:

“Crowdsourcing research can balance discussions, validate findings and better inform policy”

Crowdsourcing research can reveal how conclusions are contingent on analytical choices. Furthermore, the crowdsourcing framework also provides researchers with a safe space in which they can vet analytical approaches, explore doubts and get a second, third or fourth opinion. Discussions about analytical approaches happen before committing to a particular strategy. In our project, the teams were essentially peer reviewing each other’s work before even settling on their own analyses. And we found that researchers did change their minds through the course of analysis.

Crowdsourcing also reduces the incentive for flashy results. A single-team project may be published only if it finds significant effects; participants in crowdsourced projects can contribute even with null findings. A range of scientific possibilities are revealed, the results are more credible and analytical choices that seem to sway conclusions can point research in fruitful directions. What is more, analysts learn from each other, and the creativity required to construct analytical methodologies can be better appreciated by the research community and the public.

The transparency resulting from a crowdsourced approach should be particularly beneficial when important policy issues are at stake. The uncertainty of scientific conclusions about, for example, the effects of the minimum wage on unemployment, and the consequences of economic austerity policies should be investigated by crowds of researchers rather than left to single teams of analysts.

Under the current system, strong storylines win out over messy results. Worse, once a finding has been published in a journal, it becomes difficult to challenge. Ideas become entrenched too quickly, and uprooting them is more disruptive than it ought to be. The crowdsourcing approach gives space to dissenting opinions.

Researchers who are interested in starting or participating in collaborative crowdsourcing projects can access resources available online. We have publicly shared all our materials and survey templates, and the Center for Open Science has just launched ManyLab, a web space where researchers can join crowdsourced projects.

Summary of this Nature article in this week’s Economist (Honest disagreement about methods may explain irreproducible results, The Economist, p. 82, October 10th, 2015):

“IT SOUNDS like an easy question for any half-competent scientist to answer. Do dark-skinned footballers get given red cards more often than light-skinned ones? But, as Raphael Silberzahn of IESE, a Spanish business school, and Eric Uhlmann of INSEAD, an international one (he works in the branch in Singapore), illustrate in this week’s Nature, it is not. The answer depends on whom you ask, and the methods they use.

Dr Silberzahn and Dr Uhlmann sought their answers from 29 research teams. They gave their volunteers the same wodge of data (covering 2,000 male footballers for a single season in the top divisions of the leagues of England, France, Germany and Spain) and waited to see what would come back.

The consensus was that dark-skinned players were about 1.3 times more likely to be sent off than were their light-skinned confrères. But there was a lot of variation. Nine of the research teams found no significant relationship between a player’s skin colour and the likelihood of his receiving a red card. Of the 20 that did find a difference, two groups reported that dark-skinned players were less, rather than more, likely to receive red cards than their paler counterparts (only 89% as likely, to be precise). At the other extreme, another group claimed that dark-skinned players were nearly three times as likely to be sent off.

Dr Uhlmann and Dr Silberzahn are less interested in football than in the way science works. Their study may shed light on a problem that has quite a few scientists worried: the difficulty of reproducing many results published in journals.

Fraud, unconscious bias and the cherry-picking of data have all been blamed at one time or another—and all, no doubt, contribute. But Dr Uhlmann’s and Dr Silberzahn’s work offers another explanation: that even scrupulously honest scientists may disagree about how best to attack a data set. Their 29 volunteer teams used a variety of statistical models (“everything from Bayesian clustering to logistic regression and linear modelling”, since you ask) and made different decisions about which variables within the data set were deemed relevant. (Should a player’s playing position on the field be taken into account? Or the country he was playing in?) It was these decisions, the authors reckon, that explain why different teams came up with different results.

How to get around this is a puzzle. But when important questions are being considered—when science is informing government decisions, for instance—asking several different researchers to do the analysis, and then comparing their results, is probably a good idea.”

See also another summary of the Nature article in: A Fix for Social Science, Francis Diep, Pacific Standard, 7th October


How Useful Are RCTs in Evaluating Transparency and Accountability Projects?

Posted on 24 September, 2015 – 4:32 PM
by LEAVY, J., IDS Research, Evidence and Learning Working Paper, Issue 1, September 2014. 37 pages. Available as pdf
List of abbreviations
1 Introduction
1.1 Objectives of this review
2 Impact evaluation and RCTs
2.1 Impact evaluation definitions
2.1.1 Causality and the counterfactual
2.2 Strengths and conditions of RCTs
3 T&A initiatives
3.1 What are ‘transparency’ and ‘accountability’?
3.2 Characteristics of T&A initiatives
3.2.1 Technology for T&A
3.3 Measuring (the impact of) T&A
4 RCT evaluation of T&A initiatives
4.1 What do we already know?
4.2 Implications for evaluation design
5 How effective are RCTs in measuring the impact of T&A programmes?
5.1 Analytical framework for assessing RCTs in IE of T&A initiatives
5.2 Search methods
5.3 The studies
5.4 Analysis
5.4.1 Design
5.4.2 Contribution
5.4.3 Explanation
5.4.4 Effects
5.5 Summary
6 Conclusion
References
Rick Davies comment: I liked the systematic way in which the author reviewed the different aspects of 15 relevant RCTs, as documented in section 5. The Conclusions section was balanced and pragmatic

Social Network Analysis for [M&E of] Program Implementation

Posted on 1 September, 2015 – 2:25 PM
Valente, T.W., Palinkas, L.A., Czaja, S., Chu, K.-H., Brown, C.H., 2015. Social Network Analysis for Program Implementation. PLoS ONE 10, e0131712. doi:10.1371/journal.pone.0131712 Available as pdf

“Abstract: This paper introduces the use of social network analysis theory and tools for implementation research. The social network perspective is useful for understanding, monitoring, influencing, or evaluating the implementation process when programs, policies, practices, or principles are designed and scaled up or adapted to different settings. We briefly describe common barriers to implementation success and relate them to the social networks of implementation stakeholders. We introduce a few simple measures commonly used in social network analysis and discuss how these measures can be used in program implementation. Using the four stage model of program implementation (exploration, adoption, implementation, and sustainment) proposed by Aarons and colleagues [1] and our experience in developing multi-sector partnerships involving community leaders, organizations, practitioners, and researchers, we show how network measures can be used at each stage to monitor, intervene, and improve the implementation process. Examples are provided to illustrate these concepts. We conclude with expected benefits and challenges associated with this approach”.

Selected quotes:

“Getting evidence-based programs into practice has increasingly been recognized as a concern in many domains of public health and medicine [4, 5]. Research has shown that there is a considerable lag between an invention or innovation and its routine use in a clinical or applied setting [6]. There are many challenges in scaling up proven programs so that they reach the many people in need [7–9].”

“Partnerships are vital to the successful adoption, implementation and sustainability of successful programs. Indeed, evidence-based programs that have progressed to implementation and translation stages report that effective partnerships with community-based, school, or implementing agencies are critical to their success [11, 17, 18]. Understanding which partnerships can be created and maintained can be accomplished via social network analysis.”

The median impact narrative

Posted on 29 August, 2015 – 4:06 PM

Rick Davies comment: The text below is an excerpt from a longer blog posting found here: Impact as narrative, by Bruce Wydick

“I want to suggest one particular tool that I will call the “median impact narrative,” which (though not precisely the average–because the average typically does not factually exist) recounts the narrative of the one or a few of the middle-impact subjects in a study. So instead of highlighting the outlier, Juana, who has built a small textile empire from a few microloans, we conclude with a paragraph describing Eduardo, who after two years of microfinance borrowing, has dedicated more hours to growing his carpentry business and used microloans to weather two modest-size economic shocks to his household, an illness to his wife and the theft of some tools. If one were to choose the subject for the median impact narrative rigorously it could involve choosing the treated subject whose realized impacts represent the closest Euclidean distance (through a weighting of impact variables via the inverse of the variance-covariance matrix) to the estimated ATTs.

Consider, for example, the “median impact narrative” of the outstanding 2013 Haushofer and Shapiro study of GiveDirectly, a study finding an array of substantial impacts from unconditional cash transfers in Kenya. The median impact narrative might recount the experience of Joseph, a goat herder with a family of six who received $1100 in five electronic cash transfers. Joseph and his wife both have only two years of formal schooling and have always struggled to make ends meet with their four children. At baseline, Joseph’s children went to bed hungry an average of three days a week. Eighteen months after receiving the transfers, his goat herd increased by 51%, bringing added economic stability to his household. He also reported a 30% reduction in his children going to bed hungry in the period before the follow-up survey, and a 42% reduction in number of days his children went completely without food. Tests of his cortisol indicated that Joseph experienced a reduction in stress, about 0.14 standard deviations relative to same difference in the control group. This kind of narrative on the median subject from this particular study cements a truthful image of impact into the mind of a reader.

A false dichotomy has emerged between the use of narrative and data analysis; either can be equally misleading or helpful in conveying truth about causal effects. As researchers begin to incorporate narrative into their scientific work, it will begin to create a standard for the appropriate use of narrative by non-profits, making it easier to insist that narratives present an unbiased picture that represents a truthful image of average impacts.”
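
The selection rule described in the excerpt (choose the treated subject whose realized impacts are closest to the estimated ATTs, weighting by the inverse of the variance-covariance matrix) can be sketched in a few lines of Python. This is only one reading of that sentence: it assumes a per-subject impact measure is available for each outcome variable, and every number below is made up. The names Juana and Eduardo are borrowed from the excerpt, the others are invented.

```python
def covariance(rows):
    """Sample variance-covariance matrix of the columns of `rows`."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]


def solve(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    k = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (aug[i][k] - tail) / aug[i][i]
    return x


def median_impact_case(impacts, att):
    """Return the id of the subject whose impacts are closest to the ATTs.

    Distance is d' S^-1 d, where d is the gap between a subject's impacts
    and the ATTs and S is the variance-covariance matrix of the impacts.
    """
    ids = list(impacts)
    cov = covariance([impacts[i] for i in ids])

    def squared_distance(case_id):
        d = [x - a for x, a in zip(impacts[case_id], att)]
        return sum(di * wi for di, wi in zip(d, solve(cov, d)))

    return min(ids, key=squared_distance)


# Hypothetical per-subject impacts on two outcome variables (made-up numbers).
impacts = {
    "Juana": [9.0, 8.0],
    "Eduardo": [2.0, 1.5],
    "Maria": [1.0, 3.0],
    "Luis": [3.0, 0.5],
    "Ana": [0.0, 1.0],
}
att = [3.0, 2.8]  # hypothetical estimated ATTs, here simply the column means

print(median_impact_case(impacts, att))
```

Weighting by the inverse variance-covariance matrix means that impact variables measured on larger scales, or strongly correlated with one another, do not dominate the choice of case.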

Some of the attached readers’ comments are also of interest, e.g.:

“The basic point is a solid and important one: sampling strategy matters to qualitative work and for understanding what really happened for a range of people.

One consideration for sampling is that the same observables (independent vars) that drive sub-group analyses can also be used to help determine a qualitative sub-sample (capturing medians, outliers in both directions, etc).

A second consideration: in the spirit of Lieberman’s call for nested analyses (or other forms of linked and sequential qual-quant work), the results of quantitative work can be used to inform sampling of later qualitative work, targeting those representing the range of outcome values.”

Read more on this topic from this reader here http://blogs.worldbank.org/publicsphere/1-2014

Rick Davies comment: If the argument for using median impact narratives is accepted, the interesting question for me is then how we identify median cases. Bruce Wydick seems to suggest above that this would be done by looking at impact measures and finding a median case among those (Confession: I don’t fully understand his reference to Euclidean distance and ATTs). I would argue that we need to look at median-ness not only in impacts, but also in other attributes of the cases, including the context and interventions experienced by each case. One way of doing this is to use Hamming distance as a measure of similarity between cases, an idea I have discussed elsewhere. This can be done with very basic categorical data, as well as variable data.
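
A minimal Python sketch of the Hamming distance idea follows. It assumes each case is described by the same ordered set of categorical attributes, and it treats the "median" case as the one with the smallest total distance to all the others (strictly speaking a medoid). The cases and attribute values are hypothetical.

```python
def hamming(a, b):
    """Number of attributes on which two cases differ."""
    return sum(x != y for x, y in zip(a, b))


def median_case(cases):
    """Return the id of the case with the smallest total Hamming distance
    to all other cases (ties go to the first such case listed)."""
    def total_distance(case_id):
        return sum(
            hamming(cases[case_id], attrs)
            for other_id, attrs in cases.items()
            if other_id != case_id
        )
    return min(cases, key=total_distance)


# Hypothetical cases: (context, intervention, extra support, observed impact)
cases = {
    "Case 1": ("rural", "loan", "training", "improved"),
    "Case 2": ("rural", "loan", "no training", "improved"),
    "Case 3": ("urban", "loan", "training", "improved"),
    "Case 4": ("urban", "grant", "no training", "unchanged"),
    "Case 5": ("rural", "loan", "training", "unchanged"),
}

print(median_case(cases))
```

Because the attributes include context and intervention as well as impact, the selected case is "median" across all of them, not on impact alone.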

Postscript: Some readers might ask, “Why not simply choose sources of impact narratives from a random sample of cases, as you might do with quantitative data?” Well, with a random sample of quantitative data you can average the responses. You cannot do that with a random sample of narrative data: there is no way of “averaging” the content of a set of texts. You would instead end up with a set of stories that readers might then themselves “average out” into one overall impression in their own minds, and that will not be a very transparent or consistent process.

What methods may be used in impact evaluations of humanitarian assistance?

Posted on 19 August, 2015 – 6:18 PM

Jyotsna Puri, Anastasia Aladysheva, Vegard Iversen, Yashodhan Ghorpade, Tilman Brück, International Initiative for Impact Evaluation (3ie) Working Paper 22, December 2014. Available as pdf

“Humanitarian crises are complex situations where the demand for aid has traditionally far exceeded its supply. The humanitarian assistance community has long asked for better evidence on how each dollar should be effectively spent. Impact evaluations of humanitarian assistance can help answer these questions and also respond to the increasing call to estimate the impact of humanitarian assistance and supplement the rich tradition for undertaking real-time and process evaluations in the sector. This working paper gives an overview of the methodological techniques that can be used to address some of the important questions in this area, while simultaneously considering the special circumstances and constraints associated with humanitarian assistance.”

Executive summary
1. Introduction
2. Defining and categorising humanitarian emergencies and humanitarian action
3. Defining and discussing high-quality, theory-based impact evaluations 
3.1 Various forms of evaluations
3.2 Impact evaluations in non-emergency settings
3.3 Impact evaluations in emergency settings
3.4 Objectives of impact evaluations
3.5 Methods for impact evaluations
4. A conceptual framework for using impact evaluations in humanitarian emergencies
5. Impact evaluations of humanitarian assistance: a review of the literature
5.1 Emergency relief
5.2 Recovery and resilience
5.3 General discussion on methods used by studies
6. Using appropriate methods to overcome ethical concerns
7. Case studies
Case study 1: Multiple interventions or a multi-agency intervention
Case study 2: Unanticipated emergencies
Case study 3: A complex emergency involving flooding and conflict
Case study 4: A protracted emergency – internally displaced peoples in DRC
Case study 5: Using impact evaluations to estimate the effect of assistance after typhoons in the Philippines
Case study 6: Using impact evaluations to estimate the effect of assistance in the recovery phase in the absence of ex ante planning
8. Conclusions
Appendix A: Table on impact evaluations of humanitarian relief

Case-Selection [for case studies]: A Diversity of Methods and Criteria

Posted on 19 August, 2015 – 12:12 PM
Gerring, J., Cojocaru, L., 2015. Case-Selection: A Diversity of Methods and Criteria. January 2015. Available as pdf

Excerpt: “Case-selection plays a pivotal role in case study research. This is widely acknowledged, and is implicit in the practice of describing case studies by their method of selection – typical, deviant, crucial, and so forth. It is also evident in the centrality of case-selection in methodological work on the case study, as witnessed by this symposium. By contrast, in large-N cross-case research one would never  describe a study solely by its method of sampling. Likewise, sampling occupies a specialized methodological niche within the literature and is not front-and-center in current methodological debates. The reasons for this contrast are revealing and provide a fitting entrée to our subject.

First, there is relatively little variation in methods of sample construction for cross-case research. Most samples are randomly sampled from a known population or are convenience samples, employing all the data on the subject that is available. By contrast, there are myriad approaches to case-selection in case study research, and they are quite disparate, offering many opportunities for researcher bias in the selection of cases (“cherry-picking”).

Second, there is little methodological debate about the proper way to construct a sample in cross-case research. Random sampling is the gold standard and departures from this standard are recognized as inferior. By contrast, in case study research there is no consensus about how best to choose a case, or a small set of cases, for intensive study.

Third, the construction of a sample and the analysis of that sample are clearly delineated, sequential tasks in cross-case research. By contrast, in case study research they blend into one another. Choosing a case often implies a method of analysis, and the method of analysis may drive the selection of cases.

Fourth, because cross-case research encompasses a large sample – drawn randomly or incorporating as much evidence as is available – its findings are less likely to be driven by the composition of the sample. By contrast, in case study research the choice of a case will very likely determine the substantive findings of the case study.

Fifth, because cross-case research encompasses a large sample claims to external validity are fairly easy to evaluate, even if the sample is not drawn randomly from a well-defined population. By contrast, in case study research it is often difficult to say what a chosen case is a case of – referred to as a problem of “casing.”

Finally, taking its cue from experimental research, methodological discussion of cross-case research tends to focus on issues of internal validity, rendering the problem of case-selection less relevant. Researchers want to know whether a study is true for the studied sample. By contrast, methodological discussion of case study research tends to focus on issues of external validity. This could be a product of the difficulty of assessing case study evidence, which tends to demand a great deal of highly specialized subject expertise and usually does not draw on formal methods of analysis that would be easy for an outsider to assess. In any case, the effect is to further accentuate the role of case-selection. Rather than asking whether the case is correctly analyzed readers want to know whether the results are generalizable, and this leads back to the question of case-selection.”

Other recent papers on case selection methods:

Herron, M.C., Quinn, K.M., 2014. A Careful Look at Modern Case Selection Methods. Sociological Methods & Research
Nielsen, R.A., 2014. Case Selection via Matching. http://www.mit.edu/~rnielsen/Case%20Selection%20via%20Matching.pdf

Participatory Approaches (to impact evaluation – a pluralist view)

Posted on 12 August, 2015 – 6:15 PM

Methodological Briefs. Impact Evaluation No. 5 by Irene Guijt (and found via the Better Evaluation website). Available as pdf.

“This guide, written by Irene Guijt for UNICEF, looks at the use of participatory approaches in impact evaluation…..By asking the question, ‘Who should be involved, why and how?’ for each step of an impact evaluation, an appropriate and context-specific participatory approach can be developed”


  • Participatory approaches: a brief description
  • When is it appropriate to use this method?
  • How to make the most of participatory approaches
  • Ethical concerns
  • Which other methods work well with this one?
  • Participation in analysis and feedback of results
  • Examples of good practices and challenges

Rick Davies comment: I like the pluralist approach this paper takes towards the use of participatory approaches. It is practically oriented rather than driven by an ideological belief that people’s participation must always be maximised. That said, I did find Table 1, “Types of participation by programme participants in impact evaluation”, out of place, because it is a typology with a very simple linear scale, with fairly obvious indications of not only what kinds of participation are possible, but which ones are more desirable. On the other hand, I thought Box 3 was really useful, because it spelled out a number of useful questions to ask about possible forms of participation at each stage of the evaluation design, implementation and review process. It is worth noting that, given the 22 questions, and assuming for argument’s sake that they each had binary answers, there are at least 2 to the power of 22 different ways of building participation into an evaluation, i.e. 4,194,304 ways! That seems a bit closer to reality to me, relative to the earlier classification of four types in Table 1.
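
For anyone who wants to check the arithmetic, or see how quickly the count grows if the answers are not binary, here is a tiny Python sketch. The three-answer variant is purely hypothetical.

```python
def participation_designs(questions, answers_per_question=2):
    """Number of distinct combinations of answers across all questions."""
    return answers_per_question ** questions


binary_count = participation_designs(22)       # 22 questions, yes/no answers
ternary_count = participation_designs(22, 3)   # hypothetical: three answers each

print(f"{binary_count:,}")
print(f"{ternary_count:,}")
```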

I think the one area here where I would like more detail and examples is on participatory approaches to the analysis of data. Not the collection of data, but its analysis. There is some discussion on page 11 about causality, which would be great to see further developed. I often feel that this is an area of participatory practice where a yellow post-it note might as well be placed, saying “here a miracle occurs”.

The use of Data Envelopment Analysis to calculate priority scores in needs assessments

Posted on 10 August, 2015 – 6:40 PM

by Aldo Benini, July 2015

Priority indices have grown popular for identifying communities most affected by disasters. Responders have produced a number of formats and formulas. Most of these combine indicators using weights and aggregations decided by analysts. Often the rationales for these are weak. In such situations, a data-driven methodology may be preferable. This note discusses the suitability of different approaches. It offers a basic tutorial of a DEA freeware application that works closely with MS Excel. The demo data are from the response to Typhoon Haiyan in the Philippines in 2013. Mirrored from the Assessment Capacities Project (ACAPS) Web site with permission.

Rick Davies comment: I have dipped into this paper and resolved to learn more about Data Envelopment Analysis. It looks like it could be quite useful.
