Predictive Modeling


Predictive and explanatory models are both useful

We can make a distinction between predictive and explanatory models. A predictive model may say that where conditions A+B+C exist we will find outcome Y. But there may or may not be any causal mechanism connecting A+B+C to Y. That is not necessarily a problem. For example, if the model described conditions in a stock market, we could still make money from it. But if we wanted to engineer these conditions so that the outcome occurs, we would probably need to know something about the underlying causal processes. A predictive model could be developed into an explanatory model if we did some homework on what was happening inside the cases used to develop it. Different kinds of models may be useful for different kinds of people and situations.


We need models that can capture complicated causal processes, let alone complex ones…

One option is to use what are called “multiple conjunctural causation” models. This is the approach used by Qualitative Comparative Analysis (QCA), but it can also serve as the basis for other approaches to developing explanatory and/or predictive models.

An example would be a model that says A+B+C leads to Y, and also A+D+F leads to Y, and also notA+M+N leads to Y.

Here there are three configurations involving different combinations of seven conditions (present or absent), each of which can lead to Y. This feature is called equifinality.
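
As a minimal sketch, the three-configuration example can be written as a Boolean expression and checked against individual cases. The function name and the sample cases are hypothetical and used only for illustration.

```python
def predicts_y(case):
    """Return True if any one of the three configurations is fully present."""
    configurations = [
        case["A"] and case["B"] and case["C"],
        case["A"] and case["D"] and case["F"],
        (not case["A"]) and case["M"] and case["N"],
    ]
    # Equifinality: any single configuration is enough to predict Y.
    return any(configurations)


# Hypothetical cases: start with all seven conditions absent.
absent = dict.fromkeys("ABCDFMN", False)
case_1 = {**absent, "A": True, "B": True, "C": True}  # first configuration
case_2 = {**absent, "M": True, "N": True}             # third configuration (A absent)
case_3 = {**absent, "A": True, "M": True, "N": True}  # matches no configuration

results = [predicts_y(c) for c in (case_1, case_2, case_3)]
print(results)  # [True, True, False]
```

Note that case_3 differs from case_2 only in the presence of A, which is enough to break the third configuration.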

Another feature of these models is that they can be asymmetrical. The cause of the absence of Y may not simply be the absence of the conditions above. It may be some other combination altogether. For example, I may be motivated to go to a conference, but the reason I don’t go to a conference may not be lack of motivation but another pressing engagement.

Associated with this approach are the distinctions made between sufficient and necessary conditions:

  • A condition may be sufficient and necessary
  • A condition may be sufficient but not necessary
  • A condition may be necessary but not sufficient
  • A condition may be unnecessary and insufficient, but still important. It may be a necessary but insufficient part of a combination of conditions which is sufficient but not necessary (INUS).
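
A minimal sketch of how these distinctions could be checked against a data set, assuming each case is recorded as a set of present/absent conditions plus an outcome Y. The function names and the three cases are hypothetical.

```python
def is_sufficient(cases, condition, outcome="Y"):
    """Sufficient: every case where the condition is present also has the outcome."""
    # all() over an empty selection is True, so a condition that never
    # occurs passes trivially; check that it actually occurs in the data.
    return all(c[outcome] for c in cases if c[condition])


def is_necessary(cases, condition, outcome="Y"):
    """Necessary: every case with the outcome also has the condition."""
    return all(c[condition] for c in cases if c[outcome])


# Hypothetical data set of three cases.
cases = [
    {"A": True,  "B": True,  "Y": True},
    {"A": True,  "B": False, "Y": False},
    {"A": False, "B": False, "Y": False},
]

# For each condition: (sufficient?, necessary?)
summary = {
    name: (is_sufficient(cases, name), is_necessary(cases, name))
    for name in ("A", "B")
}
print(summary)  # {'A': (False, True), 'B': (True, True)}
```

In this toy data, A is necessary but not sufficient (the second case has A without Y), while B is both sufficient and necessary.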

How we can develop predictive models

There are quite a few alternative ways of developing good predictive models, given the availability of a data set with N different attributes and an outcome of interest that needs to be predicted. Here are the ones that I am aware of, and have explored:

  1. Manual hypothesis-led searches
  2. Algorithm-based searches, including
    1. Exhaustive searches of all possible combinations of attributes
    2. Decision Tree algorithms (there is more than one)
    3. Genetic algorithms, as available in the Solver add-in for Excel
    4. Quine-McCluskey algorithm, as built into QCA software. Here is an introductory page I wrote on QCA
  3. Ethnographic/participatory methods that can be used for predictive purposes
    1. Ethnographic decision tree models, as developed by Gladwin in 1989
    2. Hierarchical Card Sorting
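
To illustrate the exhaustive search option in the list above, here is a minimal sketch that scores every non-empty combination of attributes by how often "all attributes present" matches the outcome. The data, the function names and the choice of accuracy as the score are assumptions made for illustration; the sketch also considers only the presence of attributes, not their absence.

```python
from itertools import combinations


def accuracy(cases, attributes, outcome="Y"):
    """Share of cases where 'all attributes present' matches the outcome."""
    hits = sum(all(c[a] for a in attributes) == c[outcome] for c in cases)
    return hits / len(cases)


def exhaustive_search(cases, attributes, outcome="Y"):
    """Score every non-empty combination of attributes and return the best one."""
    candidates = (
        combo
        for size in range(1, len(attributes) + 1)
        for combo in combinations(attributes, size)
    )
    return max(candidates, key=lambda combo: accuracy(cases, combo, outcome))


# Hypothetical data set: three attributes and an outcome Y.
cases = [
    {"A": True,  "B": True,  "C": False, "Y": True},
    {"A": True,  "B": True,  "C": True,  "Y": True},
    {"A": True,  "B": False, "C": True,  "Y": False},
    {"A": False, "B": True,  "C": True,  "Y": False},
]

best = exhaustive_search(cases, ["A", "B", "C"])
print(best, accuracy(cases, best))  # ('A', 'B') 1.0
```

With N attributes there are 2^N - 1 non-empty combinations to score, so this brute-force approach is only practical for a modest number of attributes.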


EvalC3 is an Excel app that I have developed with the assistance of Aptivate, a software company based here in Cambridge, UK.

Its purpose is to enable users: (a) to identify sets of attributes that describe an intervention and its context, and which are good predictors of the achievement of an outcome of interest, (b) to compare and evaluate the performance of these predictive models, and (c) to identify relevant cases for follow-up within-case investigations to uncover any causal mechanisms at work.
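
Points (b) and (c) can be illustrated with a generic confusion matrix, which sorts cases by whether the model's prediction and the observed outcome agree. This is a sketch of the general idea only and is not a description of how the Excel app works internally; the function name, the model and the four cases are hypothetical.

```python
def classify_cases(cases, model, outcome="Y"):
    """Sort case names into the four cells of a confusion matrix."""
    cells = {"TP": [], "FP": [], "FN": [], "TN": []}
    for case in cases:
        predicted, actual = model(case), case[outcome]
        if predicted and actual:
            key = "TP"  # predicted and observed
        elif predicted:
            key = "FP"  # predicted but not observed
        elif actual:
            key = "FN"  # observed but not predicted
        else:
            key = "TN"  # neither predicted nor observed
        cells[key].append(case["name"])
    return cells


# Hypothetical cases and a hypothetical model: "A and B predict Y".
cases = [
    {"name": "case 1", "A": True,  "B": True,  "Y": True},
    {"name": "case 2", "A": True,  "B": True,  "Y": False},
    {"name": "case 3", "A": False, "B": True,  "Y": True},
    {"name": "case 4", "A": False, "B": False, "Y": False},
]

cells = classify_cases(cases, lambda c: c["A"] and c["B"])
accuracy = (len(cells["TP"]) + len(cells["TN"])) / len(cases)
print(cells, accuracy)
```

The counts in each cell give a basis for comparing models, and the case names in each cell suggest where to look next: true positives for evidence of a causal mechanism at work, false positives and false negatives for what the model is missing.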

Examples of specific types of uses are described here.

These predictions are based on the screening of a data set that (ideally) describes the attributes of a set of cases (projects, households, …), including interventions, the wider context and associated outcomes. While it involves systematic quantitative cross-case comparisons, its use should be informed by  within-case knowledge at both the pre-analysis planning and post-analysis interpretation stages.

The overall approach is based on the view that “association is a necessary but insufficient basis for a strong claim about causation”, which is a more useful perspective than simply saying “correlation does not equal causation”.

Influences: The design of this Excel package has been influenced by exposure to: (a) Qualitative Comparative Analysis (courtesy Barbara Befani), (b) RapidMiner open source predictive analytics software, (c) Goertz and Mahoney’s (2012) “A Tale of Two Cultures”. Look here for relevant references.

You can find more about EvalC3 via this supporting website. You can also request a copy of EvalC3 if you would like to help with testing out the software. It is free and available under a Creative Commons Attribution Non-Commercial Share Alike licence (contact me on





