The purpose of this page
- To provide information on existing checklists of this kind.
- To collate and prompt ideas on how improved versions of an Evaluation Questions Checklist could be developed.
Rationale: The selection of evaluation questions is central to the design of an evaluation. If these choices are not the best possible, the evaluation will be weakened accordingly.
Feedback: Please feel free to use the Comment facility at the bottom of this page to make suggestions or comments.
Contents on this page:
1. Existing resources
1.1 Checklists
- Delahais, T. (2022). Writing Better Evaluation Questions. Quadrant Conseil.
- Wingate, L., & Schroeter, D. (2016). Evaluation Questions Checklist for Program Evaluation. Western Michigan University, The Evaluation Center.
- Spark Policy Institute (2014). Developmental Evaluation Toolkit: Developmental Evaluation Questions.
- USAID (2013). Checklist for Defining Evaluation Questions.
- Davies, R. (2012). Evaluation questions: Managing agency, bias and scale.
- CDC (2013). Good Evaluation Questions: A Checklist to Help Focus Your Evaluation.
- Preskill, H., & Jones, N. (2009). A Practical Guide for Engaging Stakeholders in Developing Evaluation Questions. Robert Wood Johnson Foundation.
1.2 References on the use of checklists
- Gawande, A. (2011). The checklist manifesto: How to get things right. Profile Books. A great read.
- Scriven, M. (2007). The Logic and Methodology of Checklists. Western Michigan University. Also recommended.
- Stufflebeam, D. L. (2000). Guidelines for Developing Evaluation Checklists: The Checklists Development Checklist. Western Michigan University, The Evaluation Center.
2. Suggestions for an improved Evaluation Questions checklist
2.1 The purpose of a checklist
Suggested overall purpose: To help improve the usefulness and quality of an evaluation.
Why use a checklist? Checklists can serve the following purposes:
- To remind people of the range of options that are available when choosing evaluation questions,
- To be transparent about the choices that were made
- Optionally, to explain why other options were not taken up.
- To ensure adherence to some minimal standards, if they can be found
Rationale: The three lists below are a mix of options to be chosen from (Lists 2.2 & 2.3) and some provisional minimal standards (2.4), despite the fact that I was initially wary of proposing any "one-size-fits-all" set of attributes of good evaluation questions.
The "Priority (binary/ranked/rated)" column on the right side of the lists below is a nominal space which a checklist user might use to document whether the evaluation questions developed to date are of this (row) type, along with any caveats or conditions. In its simplest form the response there could be a yes/no.
2.2 A provisional checklist of types of evaluation questions that can be asked
1: Question types | Description | Priority (binary/ranked/rated) |
---|---|---|
1. Descriptive | about what happened: when, where, and who was involved. Without this information the other questions below can be difficult to answer | |
2. Valuative | about people's assessments of the value and significance of what happened (aka Normative? See below). These assessments will then inform choices about where to investigate… | |
3. Explanatory | about the causes of what happened or what happened as a result of a cause | |
4. Predictive | about likely consequences of what happened | |
5. Prescriptive | about what could or should be done about what happened, or what is expected to happen | |
2.2.1: Kinds of descriptions (relates to 1.1 above) | Priority (binary/ranked/rated) |
---|---|
1. Comparable descriptions of multiple instances | |
2. Detailed and specific descriptions about individual cases/instances | |
3. Historical accounts – of events over an extended period of time | |
4. Stages in an expected process: activities, outputs, outcomes, impacts | |
5. The “expected” versus “unexpected” status of events described | |
6. The degree of agreement/dissent about the nature of the events described | |
[… others to be identified] | |
PS: Worth reading "Descriptive statistics are essential to making complex analyses useful" (Murphy, 2022) and "The gorilla experiment, or why simple scatter plots can be so useful" (Yanai and Lercher, 2020). |
2.2.3: Kinds of value criteria (relates to 1.2 above) | Priority (binary/ranked/rated) |
---|---|
1. Preferences [aka Relevance? See below] | |
2. Equity – of process, of outcomes | |
3. Inclusion – partners/beneficiaries in design / steering / management / implementation | |
4. Sustainability – of an intervention and/or of its outcomes | |
5. Effectiveness | |
6. Economy, efficiency, cost-effectiveness | |
7. Resilience, robustness | |
8. Coherence – with the efforts of others | |
9. Consistency – with other activities of the same programme/agent | |
10. Fidelity – consistency of actual implementation with design intentions | |
For more options please also see: Evaluation Values and Criteria Checklist (Daniel L. Stufflebeam, 2001), and Better Criteria for Better Evaluation: Revised Evaluation Criteria Definitions and Principles for Use (OECD/DAC Network on Development Evaluation, 2019). |
2.2.4: Types of explanatory questions about… (relates to 1.3 above; see the illustrative sketch after this table) | Priority (binary/ranked/rated) |
---|---|
1. Causes where the intervention and expected outcome were both present (True Positives) | |
2. Causes where the intervention was present but the expected outcome was absent (False Positives) | |
3. Causes where the intervention was absent but the expected outcome was present (False Negatives) | |
4. Causes where the intervention and the expected outcome were both absent (True Negatives) | |
5. The effects of a cause (multifinality) [i.e. both situations 1 and 2 above] | |
6. The causes of an effect (equifinality) [i.e. both situations 1 and 3 above] | |
7. Necessary conditions for an outcome to be present (= no false negatives) | |
8. Sufficient conditions for an outcome to be present (= no false positives) | |
9. Probable conditions for an outcome to be present (false negatives and false positives are present, but they are in the minority) | |
[Others to be identified] |
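The four situations listed above can be read as the cells of a simple two-by-two table of cases (intervention present/absent versus outcome present/absent). The sketch below is a minimal illustration in Python, using entirely hypothetical case data, of how individual cases might be sorted into those cells and how the necessary ("no false negatives") and sufficient ("no false positives") status of an intervention, as in rows 7 and 8, can then be read off the counts.

```python
# A minimal sketch, with hypothetical data: sorting cases into the four
# situations above and checking the necessary/sufficient status of the intervention.
from collections import Counter

# Hypothetical cases, each recorded as (intervention_present, outcome_present)
cases = [
    (True, True), (True, True), (True, False),     # cases with the intervention
    (False, True), (False, False), (False, False),  # cases without the intervention
]

def classify(intervention: bool, outcome: bool) -> str:
    """Label one case as TP, FP, FN or TN."""
    if intervention and outcome:
        return "TP"  # 1. intervention and expected outcome both present
    if intervention and not outcome:
        return "FP"  # 2. intervention present, expected outcome absent
    if not intervention and outcome:
        return "FN"  # 3. intervention absent, expected outcome present
    return "TN"      # 4. intervention and expected outcome both absent

counts = Counter(classify(i, o) for i, o in cases)

# Necessary condition: the outcome never occurs without the intervention (no False Negatives).
is_necessary = counts["FN"] == 0
# Sufficient condition: the intervention never occurs without the outcome (no False Positives).
is_sufficient = counts["FP"] == 0

print(counts)                        # Counter({'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1})
print("necessary:", is_necessary)    # False for this hypothetical data
print("sufficient:", is_sufficient)  # False for this hypothetical data
```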
2.2.5: Types of prediction questions (relates to 1.4 above) | Priority (binary/ranked/rated) |
---|---|
1. Ownership: Who owns/supports particular predictions? | |
2. Evidence: What are the evidence requirements for particular predictions to be adequately tested? | |
3. Accuracy 1: How often is the outcome likely to be present (versus absent) when the intervention is present? (aka Consistency/Precision) | |
4. Accuracy 2: How often is the intervention likely to be present (versus absent) when the outcome is present? (aka Coverage/Sensitivity/Recall); see the worked example after this table | |
[…others to be identified] |
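Rows 3 and 4 can be expressed as simple proportions calculated from the same two-by-two counts used in the sketch above. The example below is illustrative only, with made-up counts; the labels follow the common definitions of precision (consistency) and recall (coverage).

```python
# A minimal worked example, with made-up counts, of the two accuracy measures above.
tp = 30  # intervention present, outcome present
fp = 10  # intervention present, outcome absent
fn = 20  # intervention absent, outcome present

# Accuracy 1 (Consistency/Precision): of the cases with the intervention,
# what proportion also show the outcome?
consistency = tp / (tp + fp)  # 30 / 40 = 0.75

# Accuracy 2 (Coverage/Sensitivity/Recall): of the cases with the outcome,
# what proportion also had the intervention?
coverage = tp / (tp + fn)     # 30 / 50 = 0.60

print(f"consistency = {consistency:.2f}, coverage = {coverage:.2f}")
```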
2.2.6: Types of prescriptions/recommendations that can be sought (relates to 1.5 above) | Priority (binary/ranked/rated) |
---|---|
1. Customised versus widely applicable recommendations | |
2. Prioritised recommendations | |
[…others to be identified] |
[Other content yet to be provided here]
2.3 Sourcing options for evaluation questions
1: Documents | Priority (binary/ranked/rated) |
---|---|
1. Previous evaluations [or monitoring or audits] of the same or similar activities, especially questions raised there that need clarification or remain unanswered | |
2. Strategy documents referring to the purposes and ambitions of this and other similar activities | |
3. Theory(ies) of Change (ToC) for this specific activity. | |
4. Published studies and reviews of the same kind of activity | |
[…others to be identified] |
2: People | Priority (binary/ranked/rated) |
---|---|
1. Funders: Those financing the activity, and others funding similar activities | |
2. Implementers: Persons and organisations involved in the delivery of the activity | |
3. Beneficiaries: Those intended to benefit from the activity | |
4. Other stakeholders, who are expected to have an interest in the activity (e.g. independent media, civil society organisations, researchers, members of parliament, government bodies, corporate interests…) | |
[…others to be identified] |
3: Process | Priority (binary/ranked/rated) |
---|---|
1. Time is allowed for the iterative development of evaluation questions, i.e. questions are proposed, responded to, refined, and then agreed on, with at least one such cycle | |
[…others to be identified] |
[Other content yet to be provided here]
2.4 Quality criteria for evaluation questions
Criteria | Description | Priority (binary/ranked/rated) |
---|---|---|
1. Ownership | | |
2. Usefulness | | |
3. Feasibility | | |
4. Relevance | | |
5. Prioritisation | | |
6. Uncertainty | | |
7. Writing style | | |
[…others to be identified] | | |
[Other content yet to be provided here]
2.5 Unresolved issues – your views please
- Uncertainty: For any given activity, people's views may vary as to whether it has been effective or not (or how it rates on any other criterion of interest). If an evaluation does not have the time and resources to examine and test every claim, choices have to be made as to which claims to focus on. Should the evaluation focus on claims that have a wide degree of support (e.g. 95% of x people think a claim is true), or on claims that are more evenly disputed (e.g. where 50% think the claim is true and 50% do not)?
- These differences of opinion may be evenly scattered across different stakeholder groups, or they may be associated with particular stakeholder groups. Should more attention be paid to differences of opinion that are of the second kind, associated with identifiable stakeholder groups?
- PS: The whole idea of focusing on testing claims, versus just making open-ended inquiries, is argued in detail here: Davies, R. (2012) Evaluation questions: Managing agency, bias and scale
- Relevance: This is the first of the six DAC criteria. But I have my doubts about whether this is the best word to use. The first para that explains the concept says "The extent to which the intervention objectives and design respond to beneficiaries', global, country, and partner/institution needs, policies, and priorities, and continue to do so if circumstances change." [my emphasis added]. In short, I think it is about how an activity fits with people's/organisations' preferences (expressed one way or another).
- Valuative/Normative, or?: Not sure what the best summary term is to describe questions "about people's assessments of the value and significance of what happened". Any advice?
- EQ workflow: Does it make sense to think about using different types of evaluation questions at different stages of a workflow? For example, as suggested in Figure 1 below [bearing in mind that in reality there are likely to be various feedback loops between these stages]:
Postscript: Courtesy of Tom Aston, I found this other diagram that suggests that valuation might better come first, before description (but I am not persuaded).
3. Endnote: Where Evaluation Questions fit in with other considerations
Evaluation questions should not be thought about in isolation. The diagram below is taken from the Austrian government's Guidance Document on Evaluability Assessments. What stakeholders want to know may not always need to be answered, or be answerable. Their questions need to fit, to some extent, with the Theory of Change (about what was intended), and with the current and potential availability of data (about what can be known).