Scaling Up What Works: Experimental Evidence on External Validity in Kenyan Education

Centre for Global Development Working Paper 321, 27 March 2013. Tessa Bold, Mwangi Kimenyi, Germano Mwabu, Alice Ng’ang’a, and Justin Sandefur.
Available as a PDF.


The recent wave of randomized trials in development economics has provoked criticisms regarding external validity. We investigate two concerns—heterogeneity across beneficiaries and implementers—in a randomized trial of contract teachers in Kenyan schools. The intervention, previously shown to raise test scores in NGO-led trials in Western Kenya and parts of India, was replicated across all Kenyan provinces by an NGO and the government. Strong effects of short-term contracts produced in controlled experimental settings are lost in weak public institutions: NGO implementation produces a positive effect on test scores across diverse contexts, while government implementation yields zero effect. The data suggest that the stark contrast in success between the government and NGO arms can be traced back to implementation constraints and political economy forces put in motion as the program went to scale.

Rick Davies comment: This study attends to two of the concerns I have raised in a recent blog post (My two particular problems with RCTs): (a) the neglect of important internal variations in performance arising from a focus on average treatment effects, and (b) the neglect of the causal role of contextual factors (the institutional setting in this case), which happens when the context is in effect treated as an externality.

It reinforces my view of the importance of a configurational view of causation. This kind of analysis should be within the reach of experimental studies as well as methods like QCA. For years agricultural scientists have devised and used factorial designs (albeit with fewer factors than the number of conditions found in most QCA studies).

On this subject I came across this relevant quote from R. A. Fisher:

“If the investigator confines his attention to any single factor we may infer either that he is the unfortunate victim of a doctrinaire theory as to how experimentation should proceed, or that the time, material or equipment at his disposal is too limited to allow him to give attention to more than one aspect of his problem…

… Indeed in a wide class of cases (by using factorial designs) an experimental investigation, at the same time as it is made more comprehensive, may also be made more efficient if by more efficient we mean that more knowledge and a higher degree of precision are obtainable by the same number of observations.”

And also, from Wikipedia, another Fisher quote:

“No aphorism is more frequently repeated in connection with field trials, than that we must ask Nature few questions, or, ideally, one question, at a time. The writer is convinced that this view is wholly mistaken.”

And also

‘Realist evaluation – understanding how programs work in their context’; An expert seminar with Dr. Gill Westhorp; Wageningen, the Netherlands; 29-03-2011


Dear colleague,

With pleasure we would like to announce an expert seminar with Dr. Gill Westhorp on 29th March 2011: ‘Realist evaluation – understanding how programs work in their context’.

Realist evaluation (Pawson and Tilley, 1997) is one type of theory-based evaluation. It aims to explore “what works, for whom, in what contexts, to what extent and how”. It adopts a particular understanding of how programs work, and uses a particular format for program theories to help guide evaluation design, data collection and analysis.

Realist evaluation has a particular focus on understanding the interactions between programs and their contexts and the ways that these influence how programs work. Evaluation expert Dr. Gill Westhorp will discuss the concepts and assumptions that underpin this theory-based evaluation approach. What does realist evaluation bring to the table of evaluating development programs? How is it different from existing approaches to evaluation in development? How does it understand, and deal with, complexity? What new insights can help strengthen the utility of evaluation for development?

During the morning, Gill will introduce the basic assumptions and key concepts in realist evaluation.  She will also briefly demonstrate how these ideas can be built into other evaluation models using two examples.  These models – realist action research and realist program logic – are participatory models which were designed for use in settings where limited resources, lack of capacity to collect outcomes data, complex programs, and (sometimes) small participant numbers make evaluation difficult.  In the afternoon, the practical implications for evaluation design, data collection and analysis will be discussed. Examples and practical exercises will be included throughout the day.

For those interested and not too far away, please do come and join this interesting event!

Please find attached the factsheet flyer and registration form. We also suggest you make an early hotel booking, as the hotel is already quite full. Please indicate to the hotel that you are booking a room for the ‘expert seminar realist evaluation’.

Note: the expert seminar with Dr. Michael Quinn Patton on ‘developmental evaluation’ unfortunately had to be cancelled due to personal reasons. We hope to organise another opportunity with him early next year.

Looking forward to meeting you here at the expert seminar on realist evaluation!

Cecile Kusters (CDI), Irene Guijt (Learning by Design), Jan Brouwers (Context, international cooperation) and Paula Bilinsky (CDI)

Kind regards / Hartelijke groeten,

Cecile Kusters
Participatory Planning, Monitoring & Evaluation – Managing for Impact
Multi-Stakeholder Processes and Social Learning
Centre for Development Innovation
Wageningen UR
P.O. Box 88, 6700 AB Wageningen, The Netherlands
Tel. +31 (0)317 481407 (direct), +31 (0)317 486800 (reception)
Fax +31 (0)317 486801
PPME resource portal:
MSP resource portal: