Weighted Checklists

A Participatory Means Of Measuring Complex Change


What is a weighted checklist?

A weighted checklist has:

  • A list of items, each of which describes an attribute of an organisation or an event. The attribute may be either present or absent (indicated by a 1 or 0), or it may be present to a degree measured on a simple scale (e.g. 0 to 3).
  • A set of weights, which describe the relative importance of each item.
  • A summary score, based on the number of items identified as present, but adjusted by their individual weights.
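The three components can be sketched as a small data structure. This is a minimal illustration that assumes the summary score is a plain weighted sum of item scores; the `ChecklistItem` class, the `summary_score` function and the example items and weights are all hypothetical, not taken from any real checklist.

```python
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    description: str  # the attribute being checked
    weight: float     # relative importance of the item
    score: int        # 1/0 for present/absent, or a rating such as 0-3


def summary_score(items):
    """Weighted summary score: each item's score multiplied by its weight, summed."""
    return sum(item.weight * item.score for item in items)


# Hypothetical items for a community organisation.
checklist = [
    ChecklistItem("Has a written mission statement", weight=2, score=1),
    ChecklistItem("Holds regular members' meetings", weight=3, score=0),
    ChecklistItem("Publishes annual accounts", weight=5, score=1),
]
print(summary_score(checklist))  # 7
```

Absent items contribute nothing, so the score rises most when the heavily weighted items are present.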

Here is an example of a very simple customer satisfaction survey in the form of a weighted checklist, which was sent to me by a firm whose services I had used. In this case the survey respondents provided two sets of information: (a) in the second column, their views on how important each item was to them (the weights); and (b) in the third column, their views, based on their own experience, on how well the firm was doing on each criterion.

Once the responses have been collected, weighted scores can be calculated for individual respondents, along with an average score for all respondents. The process is as follows:

1. For each item, multiply the importance rating by the actual performance rating.

2. The sum of these products is the actual raw score.

3. For each item, multiply the importance rating by the highest possible performance rating.

4. The sum of these products is the highest possible raw score.

5. Divide the actual raw score (2) by the highest possible raw score (4), and multiply by 100, to get a percentage score for the respondent. A high percentage indicates a high degree of satisfaction, and vice versa.

6. Calculate the average percentage score across all the respondents.
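The six steps fit in a few lines of Python. This sketch assumes every item is rated on the same performance scale with a known maximum (here 3, as in the 0 to 3 scale mentioned earlier). The function names and the two respondents' ratings are hypothetical.

```python
def percentage_score(importance, performance, max_performance):
    """Steps 1-5 for one respondent: weighted score as a percentage of the maximum."""
    actual_raw = sum(w * p for w, p in zip(importance, performance))  # steps 1-2
    highest_raw = sum(w * max_performance for w in importance)        # steps 3-4
    return 100 * actual_raw / highest_raw                             # step 5


def average_percentage(responses, max_performance):
    """Step 6: mean percentage score across all respondents."""
    scores = [percentage_score(imp, perf, max_performance) for imp, perf in responses]
    return sum(scores) / len(scores)


# Hypothetical responses: (importance ratings, performance ratings) for three items,
# with performance rated 0-3.
responses = [
    ([3, 2, 1], [3, 1, 2]),
    ([1, 3, 3], [2, 3, 3]),
]
print(round(percentage_score(*responses[0], max_performance=3), 1))  # 72.2
print(round(average_percentage(responses, max_performance=3), 1))    # 83.7
```

Because each respondent's own weights appear in both the numerator and the denominator, respondents who use the importance scale differently can still be compared on the same 0 to 100 scale.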

This is a participatory form of weighted checklist, because respondents themselves determine the weights given to different items on the checklist. Other types of checklist use weightings solicited from experts. They are not the focus of the remainder of this paper. Judging from a Google search, these expert-weighted checklists seem to be used mainly for staff performance appraisal. For more information on these, see:

Where are (participatory) weighted checklists used?

1. When the event is complex and difficult to measure with any single indicator

Often people try to measure a change by finding a single measurable indicator that will capture it. For complex changes, such as improvements in people’s participation or changes in organisational capacity, finding such an indicator can be a major challenge. Often the chosen indicator seems far too simplistic, such as using the number of people participating in a particular type of meeting as an indicator of participation.

2. Where there may be multiple measures in place, but a single aggregate measure of overall performance is needed

Sometimes Logical Framework descriptions of project designs will include more than one indicator to track a given change that is recognised to be complex. However, this presents a further challenge: how to aggregate the evidence of change described by multiple indicators.

3. Where people’s views of the significance of what has happened differ

Users of a health centre may have views on how well the health service is performing that differ from those of the health centre staff, or of the senior managers of the health services.

The matrix below can be used to describe where each of three different methods (ordinary indicators, Most Significant Change stories, and weighted checklists) is most suitable.


What is different about (participatory) weighted checklists?

Weighted checklists separate out value data from observational data. In the example above, the second column asks about importance to you, the respondent. This is value data. The third column asks you about the company’s performance. This is observational data.

With conventional indicators, judgments about importance are made only once, when the choice is made whether or not to use a specific indicator. This happens at the planning stage, and is fixed thereafter. It is not possible to change the choice of indicator later on without losing continuity with the data that has been collected so far. With weighted checklists, the same set of observational data can be re-analysed with different sets of value data, reflecting the views of different stakeholders.
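To illustrate this kind of re-analysis, here is a sketch in which one set of observations (collected once) is scored against the weights of three stakeholder groups. The health centre items, the groups and their weights are all hypothetical; the score is the weighted percentage used in the six-step process, with 1/0 presence scores.

```python
# Observational data, collected once: is each item present (1) or not (0)?
observed = {"clean water": 1, "trained midwife": 0, "drug stock": 1, "opening hours": 1}

# Value data: hypothetical importance weights from three stakeholder groups.
weights_by_group = {
    "users":    {"clean water": 3, "trained midwife": 5, "drug stock": 4, "opening hours": 2},
    "staff":    {"clean water": 4, "trained midwife": 3, "drug stock": 3, "opening hours": 1},
    "managers": {"clean water": 2, "trained midwife": 2, "drug stock": 5, "opening hours": 3},
}


def weighted_percentage(observed, weights):
    """Share of the maximum possible weighted score that was actually achieved."""
    achieved = sum(weights[item] * present for item, present in observed.items())
    possible = sum(weights[item] for item in observed)
    return 100 * achieved / possible


for group, weights in weights_by_group.items():
    print(group, round(weighted_percentage(observed, weights), 1))
# users 64.3
# staff 72.7
# managers 83.3
```

The observations never change; only the value data does. In this invented case the missing midwife matters most to users, so their score for the same health centre is the lowest.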

Value data is meta-information: information about information. It can be of different kinds. In the simple example shown in the table above, respondents are asked about their preferences. Another survey could ask people which items they think are basic rights, which all people should have access to. This is the basis of the design of the Basic Necessities Survey (BNS). Or a survey could ask which items would be the most important causes of an overall outcome, e.g. improved community health. This was the subject of my posting on “Checklists as mini theories-of-change”. Because of these choices, participatory processes used to elicit checklist weightings should always be clear about what type of judgment is being sought.

Value data can be worth analysing in its own right. Different groups of stakeholders will usually vary in the extent to which their views agree with each other. We could measure and monitor this degree of alignment by looking at how participants’ ratings in the second column of the example above correlate with each other, using Excel. Social network diagrams could also be produced from the same data (in a participants x item ratings matrix) to show in more detail how various stakeholder groups are aligned with each other in their views. Of special importance in development project settings will be how the alignment of views between stakeholders changes over time. Is a stronger consensus developing or not?
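The same alignment measure can be computed outside Excel. This is a minimal sketch that assumes alignment is summarised as the average pairwise Pearson correlation between participants' importance ratings (other summaries are possible). The participant names and ratings are hypothetical. Computing the same figure for each survey round would show whether a stronger consensus is developing.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings.
    Undefined (division by zero) if a participant gives every item the same rating."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_alignment(ratings):
    """Average pairwise correlation of importance ratings across participants.
    `ratings` maps participant -> ratings for the same items in the same order."""
    pairs = list(combinations(ratings.values(), 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical importance ratings for four items (a participants x items matrix).
ratings = {"A": [3, 2, 1, 0], "B": [3, 2, 1, 0], "C": [0, 1, 2, 3]}
print(round(mean_alignment(ratings), 3))  # -0.333
```

Here A and B agree perfectly and C holds the opposite view, so the average is pulled well below zero. Looking at the individual pairwise values, rather than only the average, shows which groups are aligned.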

Changes over time are likely to be important in other ways as well. If a survey asks about the perceived importance of different health services (as well as their actual availability), rising expectations over time might be as important an indicator of development as any increase in the availability of services. Knowing that the public was expecting more could affect the responsiveness of local politicians to their constituents’ concerns. Differences between what people report as available and what is reported to be available through other sources of information could also be informative. They could highlight a lack of public knowledge about what is available, or raise questions about the validity of officials’ claims about what is available.

Potential problems

Constructing the checklist

A common challenge to the use of methods like the BNS is: “But who constructed the checklist? Surely the contents of this list, and what it omits, will affect the overall findings?” There are two ways of addressing this potential problem. The first is to ensure that the checklist contents are developed through a consultative process involving a range of stakeholders, especially those whose performance is being assessed. The second is to ensure that the checklist is long enough. The BNS checklist had around thirty items. The longer the checklist, the less vulnerable the aggregate score will be to the accidental omission of individual items that could be important. But there will also need to be some limit to the length of the list, because respondents’ interest is likely to wane towards the end of a long list.
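The claim that longer lists are less vulnerable to omissions can be checked with a simple leave-one-out test: drop each item in turn and see how far the percentage score moves. This sketch assumes equal weights and 1/0 scores. The two checklists are hypothetical, and the 30-item length simply echoes the BNS figure above.

```python
def percentage(scores, weights):
    """Weighted percentage score for one checklist."""
    return 100 * sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def max_omission_shift(scores, weights):
    """Largest change in the percentage score if any single item were left off the list."""
    full = percentage(scores, weights)
    shifts = []
    for i in range(len(scores)):
        kept_scores = scores[:i] + scores[i + 1:]
        kept_weights = weights[:i] + weights[i + 1:]
        shifts.append(abs(percentage(kept_scores, kept_weights) - full))
    return max(shifts)


# Hypothetical: two equally weighted checklists, both with half the items present.
short_scores, short_weights = [1, 0, 1, 0], [1] * 4
long_scores, long_weights = [1, 0] * 15, [1] * 30

print(round(max_omission_shift(short_scores, short_weights), 1))  # 16.7
print(round(max_omission_shift(long_scores, long_weights), 1))    # 1.7
```

With unequal weights, omitting a heavily weighted item would still move the score a lot. A long list only helps if no single item dominates the weighting.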

Long checklists also present another challenge: how to assign weightings to all the items. One way around this problem is to group the items into categories, as in the table above, and then assign weightings in two stages: (a) first for the four main categories, then (b) for the individual items within each category.
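One way to combine the two stages into a single weight per item is to normalise each level and multiply. This is an assumption: other combination rules are possible, and the category and item names below are hypothetical.

```python
def two_stage_weights(category_weights, item_weights):
    """Combine category weights with within-category item weights.
    Each level is normalised to sum to 1, so the final item weights also sum to 1."""
    total_category = sum(category_weights.values())
    combined = {}
    for category, items in item_weights.items():
        category_share = category_weights[category] / total_category
        total_items = sum(items.values())
        for item, weight in items.items():
            combined[item] = category_share * weight / total_items
    return combined


# Hypothetical stage (a): four categories; stage (b): items within each category.
category_weights = {"reliability": 4, "communication": 3, "price": 2, "staff attitude": 1}
item_weights = {
    "reliability": {"on time": 3, "right first time": 1},
    "communication": {"kept informed": 1},
    "price": {"clear quote": 1, "value for money": 1},
    "staff attitude": {"courteous": 1},
}
for item, weight in two_stage_weights(category_weights, item_weights).items():
    print(item, round(weight, 2))
```

Normalising within each category prevents a category from gaining influence merely because it contains more items.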

Interpreting the checklist

In a survey form it will not be easy to elicit the reasons why people have rated one item on a checklist as more important than another, but in workshop settings this can be easier. One way of eliciting these explanations from participating stakeholders is to make pair comparisons, asking “Why is this category of activities more important than that one?” Answers to this question provide insight into people’s views as consumers of services, as citizens with rights, or as managers with theories-of-change about how their interventions should work.
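The outcome of a round of pair comparisons can also be tallied into a ranking by counting how often each category is preferred. This is a minimal sketch that assumes every pair is compared once and each judgment is recorded together with the reason given. The categories and reasons are hypothetical, and the reasons, not the tally, are the main output.

```python
from itertools import combinations

# Hypothetical workshop judgments: for each pair (in alphabetical order),
# the category judged more important and the reason participants gave.
judgments = {
    ("drugs", "water"): ("water", "clean water prevents most of the illness we see"),
    ("opening hours", "water"): ("water", "without water the clinic cannot function"),
    ("drugs", "opening hours"): ("drugs", "an open clinic without drugs is of little use"),
}


def preferred(a, b):
    """Look up which of two categories was judged more important."""
    winner, _reason = judgments[tuple(sorted((a, b)))]
    return winner


def rank_by_pair_comparisons(categories):
    """Count pairwise 'wins' for each category and rank by that count."""
    wins = {c: 0 for c in categories}
    for a, b in combinations(categories, 2):
        wins[preferred(a, b)] += 1
    return sorted(categories, key=lambda c: wins[c], reverse=True), wins


ranking, wins = rank_by_pair_comparisons(["water", "drugs", "opening hours"])
print(ranking)  # ['water', 'drugs', 'opening hours']
print(wins)     # {'water': 2, 'drugs': 1, 'opening hours': 0}
```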

The same problem exists with understanding the observational data, especially where rating scales are used rather than yes/no answer options. With yes/no options the main requirement is that the items on the list are clearly defined entities or events. With ratings there is the possibility of significant differences in response styles, that is, in how people use the ratings available. One common strategy is to give respondents guidance on what would constitute a 0, 1, 2, or 3 on the rating scale.

Transparency issues

There is some debate, however, about whether the weightings of items should be made visible to respondents before a survey, or only later, once results have been aggregated. This is not an issue where the weightings themselves are obtained from the respondents, as is the case with the BNS. But it could influence the survey results where weights are decided before the survey is implemented. It could lead respondents, say health centre staff, to improve one aspect of their service more than another, because they know it receives a higher weighting in the checklist. But that response is not necessarily a bad thing, if those aspects of service really are more important than others. Being open about the weightings could give health centres some choices about how to improve their service, in contrast to performance measurement that relies on one key indicator.

Where weightings are obtained from stakeholders (including respondents) at a workshop held after the survey, the effects might be less easy to predict. Participants might be inclined to argue for higher weightings for items they know they have done well on, and vice versa. Making their raw checklist scores visible during the workshop discussion could help make this tendency evident, but is not likely to eradicate it. Structuring a debate between the proponents of different weightings might help force proposals that appear self-interested to be justified.
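One hypothetical way to make this tendency visible is to compute, for each participant, how many percentage points their own score would gain if their proposed weights were used instead of the group's average proposal. This check is not part of the original method. A large positive gain is a prompt for discussion, not proof of self-interest.

```python
def weighted_pct(scores, weights):
    """Percentage of the maximum weighted score achieved."""
    return 100 * sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def self_interest_gain(own_scores, own_proposal, all_proposals):
    """Percentage points a participant's own score gains if their proposed weights
    are used instead of the average of all participants' proposals."""
    n = len(all_proposals)
    mean_weights = [sum(column) / n for column in zip(*all_proposals)]
    return weighted_pct(own_scores, own_proposal) - weighted_pct(own_scores, mean_weights)


# Hypothetical: this participant did well on items 1 and 2, not on item 3,
# and proposes high weights for exactly those items.
own_scores = [1, 1, 0]
proposals = [[4, 4, 1], [2, 2, 3], [3, 1, 4]]
print(round(self_interest_gain(own_scores, proposals[0], proposals), 1))  # 22.2
```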

Postscript 1

I recommend THE LOGIC AND METHODOLOGY OF CHECKLISTS by Michael Scriven, Claremont Graduate University, updated in 2007. In the opening paragraph he says:

“The humble checklist, while no one would deny its utility in evaluation and elsewhere, is usually thought to fall somewhat below the entry level of what we call a methodology, let alone a theory. But many checklists used in evaluation incorporate a quite complex theory, or at least a set of assumptions, which we are well advised to uncover; and the process of validating an evaluative checklist is a task calling for considerable sophistication. Indeed, while the theory underlying a checklist is less ambitious than the kind that we normally call a program theory, it is often all the theory we need for an evaluation.”

This is a great paper, informative and a pleasure to read. Amongst other things, it gives a wider background to the use of checklists than I have provided above.

Postscript 2

In the DFID “Guidance on using the revised Logical Framework” there is now a section on IMPACT WEIGHTING.

Once you have defined your Outputs, you should assign a percentage for the contribution each is likely to make towards the achievement of the overall Purpose. The impact weights of all the Outputs must total 100% and each should be rounded to the nearest 5%. Impact Weightings for Outputs are intended to:
•  Promote a more considered approach to the choice of Outputs at project design stage; and
•  Provide a clearer link to how Output performance relates to project Purpose performance.

It appears from the DFID Annual Report formats that Output achievement ratings are multiplied by these weightings to produce a weighted measure of achievement.
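The two rules quoted from the guidance are easy to check mechanically, and the weighted measure of achievement follows directly from them. This sketch makes one assumption: each Output's achievement rating has already been converted to a share of its target achieved (0 to 1). How DFID mapped its ratings onto such shares is not stated here, and the Output names and values are hypothetical.

```python
def check_impact_weights(weights):
    """Apply the two quoted rules: weights total 100% and are multiples of 5%."""
    if sum(weights.values()) != 100:
        raise ValueError("impact weights must total 100%")
    if any(w % 5 != 0 for w in weights.values()):
        raise ValueError("each impact weight should be rounded to the nearest 5%")


def weighted_achievement(weights, achieved_share):
    """Weighted Output-level achievement as a percentage (maximum 100).
    `achieved_share` maps each Output to the share (0-1) of its target achieved;
    this conversion is an assumption, not the DFID rating scale itself."""
    check_impact_weights(weights)
    return sum(weights[output] * achieved_share[output] for output in weights)


weights = {"Output 1": 50, "Output 2": 30, "Output 3": 20}
achieved_share = {"Output 1": 1.0, "Output 2": 0.5, "Output 3": 0.75}
print(weighted_achievement(weights, achieved_share))  # 80.0
```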

Postscript 3

See also

  • The synthesis problem: Issues and methods in the combination of evaluation results into overall evaluative conclusions. By Michael Scriven, Claremont Graduate University, and E. Jane Davidson, CGU & Alliant University. A demonstration presented at the annual meeting of the American Evaluation Association, Honolulu, HI, November 2000. HIGHLY RECOMMENDED
  • Multi-criteria analysis: A manual. Department for Communities and Local Government. London. 2009. “This manual was commissioned by the Department for the Environment, Transport and the Regions in 2000 and remains, in 2009, the principal current central government guidance on the application of multi-criteria analysis (MCA) techniques. Since 2000 it has become more widely recognised in government that, where quantities can be valued in monetary terms, MCA is not a substitute for cost-benefit analysis, but it may be a complement; and that MCA techniques are diverse in both the kinds of problem that they address (for example prioritisation of programmes as well as single option selection) and in the techniques that they employ, ranging from decision conferencing to less resource intensive processes.”

Postscript 4 (14 March 2012)

Could you combine the use of weighted checklists and genetic algorithms (GAs), to discover candidate theories that have a good record of explaining variations in performance across a range of settings?

How it might work:

  • An aggregate score on a weighted checklist is a function of individual item scores x their respective weights.
  • The aggregate score can be seen as a prediction, which can be compared against observed measures of performance to assess its fitness, i.e. its accuracy.
  • The particular set of weights in a weighted checklist can be seen as a “theory” of what is needed for good performance.
  • Some theories (i.e. sets of weights) will be better than others at generating an accurate aggregate score.
  • GA software could find the combination of weights that generates the most accurate aggregate scores across a range of participants, each of whose performance is measured by a weighted checklist.
  • The theory embodied in that set of weights would then need to be tested by complementary means, including common-sense judgement (e.g. is this particular combination of performance attributes likely to occur in real life?).
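The steps above can be sketched as a small genetic algorithm in plain Python. This is a generic GA, not a reproduction of Excel Solver or of the mock-up file. The population size, number of generations, 0.3 mutation rate, one-point crossover and the data are all assumptions. Fitness here is the sum of squared differences between predicted and observed performance, and lower is better.

```python
import random


def predict(weights, attributes):
    """Aggregate checklist score for one case: attribute scores x weights, summed."""
    return sum(w * a for w, a in zip(weights, attributes))


def error(weights, cases, observed):
    """Fitness measure (lower is better): sum of squared prediction errors."""
    return sum((predict(weights, c) - o) ** 2 for c, o in zip(cases, observed))


def evolve_weights(cases, observed, values=range(5), pop_size=30,
                   generations=60, seed=1):
    """Minimal genetic algorithm searching for the weight vector (one weight per
    attribute, each drawn from `values`) that best predicts observed performance."""
    rng = random.Random(seed)
    values = list(values)
    n = len(cases[0])

    def fitness(w):
        return error(w, cases, observed)

    population = [[rng.choice(values) for _ in range(n)] for _ in range(pop_size)]
    history = []  # best error found in each generation
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # elitism: the better half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:             # mutation: reset one weight
                child[rng.randrange(n)] = rng.choice(values)
            children.append(child)
        population = parents + children
    population.sort(key=fitness)
    return population[0], history


# Hypothetical data: 8 projects scored 0-3 on 4 attributes, with "observed"
# performance generated from known weights so the search has a right answer.
cases = [[1, 2, 0, 3], [3, 1, 2, 0], [0, 0, 3, 1], [2, 3, 1, 1],
         [1, 1, 1, 2], [3, 0, 0, 3], [0, 2, 2, 2], [2, 1, 3, 0]]
true_weights = [3, 0, 2, 1]
observed = [predict(true_weights, c) for c in cases]

best, history = evolve_weights(cases, observed)
print(best, error(best, cases, observed))
```

Because the better half of each generation is carried forward unchanged, the best error never gets worse from one generation to the next. Whether the search recovers the generating weights exactly depends on the data and the random seed.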

Excel has an add-in called Solver that can use GA searches to find the optimal combination of variables for generating a target result. This Excel file shows a mock-up example describing 8 projects scored on 4 attributes. Solver found the combination of attribute weights that generated the most accurate prediction of actual performance (the differences between predicted and observed values being as close to zero as possible). My guess is that the larger the number of cases (projects) and the larger the number of attributes, the less sensitive the results would be to small variations in attribute scores (e.g. due to measurement error).
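The choice of fitness criterion matters. If the differences between predicted and observed values are simply added up, over-predictions and under-predictions cancel out, so a poor set of weights can also total zero. It is not known which error measure the mock-up used. This sketch, with invented numbers, shows why a sum of squared (or absolute) differences is the safer target.

```python
def signed_error(predicted, observed):
    """Sum of raw differences: positive and negative errors cancel out."""
    return sum(p - o for p, o in zip(predicted, observed))


def squared_error(predicted, observed):
    """Sum of squared differences: every error counts against the fit."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


observed = [10, 6, 8, 4]
perfect = [10, 6, 8, 4]
poor = [14, 2, 12, 0]  # each prediction is off by 4, but the errors cancel

print(signed_error(perfect, observed), signed_error(poor, observed))    # 0 0
print(squared_error(perfect, observed), squared_error(poor, observed))  # 0 64
```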

Why use a GA? The number of possible combinations of weightings that might generate successful predictions grows very rapidly: with only five possible values on each of four attributes there are already 625 combinations, and each extra attribute multiplies that number by five. Evolutionary processes are well known for their capacity to explore very large combinatorial spaces.
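The growth of the search space is easy to check. Assuming each weight can take one of five values, the number of distinct weight vectors is 5 raised to the number of attributes. Checking every combination is trivial for four attributes but becomes impractical well before twenty, which is where an evolutionary search becomes useful.

```python
def weight_combinations(values_per_weight, attributes):
    """Number of distinct weight vectors if every attribute's weight is chosen
    from the same set of values."""
    return values_per_weight ** attributes


for attributes in (4, 5, 10, 20):
    print(attributes, weight_combinations(5, attributes))
# 4 625
# 5 3125
# 10 9765625
# 20 95367431640625
```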

8 thoughts on “Weighted Checklists”

  1. I think this is a very interesting and promising approach. I also imagine that it would be very beneficial when used to assess (evaluate) performance of community based associations.

    I am thinking/brainstorming on some issues below:

    1- The measure of quality of a certain indicator: for example, in the tool presented above, latrines might be found and thus scored (1) while their quality is very low (broken). So I would really “stand for” the simple degree scale (0-3), which deals with the quality issue. But if the simple degree scale is used, each score from 1 to 3 will have a different weighting, which will make it more complicated!

    2- If an evaluator or someone else uses the tool to assess the performance of a community based association, s/he will need to examine a lot of evidence to prove or disprove each statement/indicator (e.g. the existence of a mission statement, or the existence of clear and fair staff policies, etc.).

    3- I also think that before calculating the total score of a tool, one should know beforehand what each range of scores would mean (20 – 40: this would mean …). But again this could also be misleading if high total scores were due to a focus on achieving the “highly weighted” indicators!!


  2. While weighted checklist quantifies performance it is important that there is a qualitative explanation to clarify further the scores.

  3. I have just noticed that the 2008 revision of the format used by DFID for its Annual Review reports on its funded projects includes not just an achievement rating for each output (1-5), but also a % weighting for each output (adding up to 100%), and that these are then combined to produce an aggregate performance score for the project at Output level, with a maximum possible score of 100%.

  4. Hello Rick!

    Very useful tips and article about Weighted checklist. A great way to start and spread the importance of having such.

    You may want to share them too with other management and event professionals over at Expert Checklist http://expertchecklists.com/. It’s a new web app for professionals working in difficult and complex environments where users can work together to create and discuss very effective checklists for their fields.

    The cool thing is that you can modify the list for yourself and print it as PDF. On the web site, you can also work together with other pros to improve the list or discuss changes.

  5. Hi Joevye
    It would be good if your site could support weighted versions of checklists.
    Is that a possibility in the future?

  6. HI Rick!!!

    I once worked with you for a short time at AATG on the mid-term review of the ActionAid Development Area 2 in the Gambia. I was the M&E Officer for DA2 at the time.

    I would appreciate receiving updates on the M&E world, as I regard you as my mentor.

    Momodou A. Jallow
    M&E Officer
    Agency for the Development of Women and Children – ADWAC
    Kerewan, The Gambia
    +220 9936 912
    +220 7936 912
    +220 6936 912
