Context: “This report represents a basis for integrating big data and data analytics in the monitoring and evaluation of development programmes. The report proposes a Call to Action, which hopes to inspire development agencies and particularly evaluators to collaborate with data scientists and analysts in the exploration and application of new data sources, methods, and technologies. Most of the applications of big data in international development do not currently focus directly on monitoring, and even less on evaluation. Instead they relate more to research, planning and operational use using big data. Many development agencies are still in the process of defining their policies on big data and it can be anticipated that applications to the monitoring and evaluation of development programmes will start to be incorporated more widely in the near future. This report includes examples and ways that big data, together with related information and communications technologies (ICTs) are already being used in programme monitoring, evaluation and learning. The data revolution has been underway for perhaps a decade now. One implication for international development is that new sources of real–time information about people are for the first time available and accessible. In 2015, in an unprecedented, inclusive and open process, 193 members states of the United Nations adopted, by consensus, the 2030 Agenda for sustainable development. The 17 Sustainable Development Goals (SDGs) contained in the 2030 Agenda constitute a transformative plan for people, planet, prosperity, partnerships and peace. All of these factors are creating a greater demand for new complexity–responsive evaluation designs that are flexible, cost effective and provide real–time information. At the same time, the rapid and exciting developments in the areas of new information technology (big data, information and communications technologies) are creating the expectation, that the capacity to collect and analyse larger and more complex kinds of data, is increasing. The report reviews the opportunities and challenges for M&E in this new, increasingly digital international development context. The SDGs are used to illustrate the need to rethink current approaches to M&E practices, which are no longer able to address the complexities of evaluation and interaction among the 17 Goals. This endeavour hopes to provide a framework for monitoring and evaluation practitioners in taking advantage of the data revolution to improve the design of their programmes and projects to support the achievement of the Sustainable Development Goals and the 2030 Agenda.”
Rick Davies comment: As well as my general interest in this paper, I have two particular interests in its contents. One is what it says about small (rather than big) data and how big data analysis techniques may be relevant to the analysis of small data sets. In my experience many development agencies have rather small data sets, which are often riddle with missing data points. The other is what the paper has to say about predictive analytics, a field of analysis (within data mining defined more widely) that I think has a lot of relevance to M&E of development programmes.
Re the references to predictive analytics, I was disappointed to see this explanation on page 48: “Predictive analytics (PA) uses patterns of associations among variables to predict future trends. The predictive models are usually based on Bayesian statistics and identify the probability distributions for different outcomes“. In my understanding Bayesian classification algorithms are only one of a number of predictive analytics tools which generate classifications (read predictive models). Here are some some classifications of the different algorithms that are available: (a) Example A, focused on classification algorithms – with some limitations, (b) Example B, looking at classification algorithms within the wider ambit of data mining methods, from Maimon and Rokach (2010; p.6) . Bamberger’s narrow definition is an unfortunate because there are simpler and more transparent methods available, such as Decision Trees, which would be easier for many evaluators to use and whose results could be more easily communicated to their clients.
Re my first interest re small data, I was more pleased to see this statement: “While some data analytics are based on the mining of very large data sets with very large numbers of cases and variables, it is also possible to apply many of the techniques such as predictive modelling with smaller data sets” This heightens the importance of clearly spelling out the different ways in which predictive analytics work can be done.
I was also agreeing with the follow on paragraph: “While predictive analytics are well developed, much less progress has been made on causal (attribution) analysis. Commercial predictive analytics tends to focus on what happened, or is predicted to happen (e.g. click rates on web sites), with much less attention to why outcomes change in response to variations in inputs (e.g. the wording or visual presentation of an on–line message). From the evaluation perspective, a limitation of predictive analysis is that it is not normally based on a theoretical framework, such as a theory of change, which explains the process through which outcomes are likely to be achieved. This is an area where there is great potential for collaboration between big data analytics and current impact evaluation methodologies” My approach to connecting these two types of analysis is explained on the EvalC3 website. This involves connecting cross-case analysis (using predictive analytics tools, for example) to within-case analysis (using process tracing or simpler tools, for example) through carefully thought though case selection and comparison strategies.
My interest and argument for focusing more on small data was reinforced when I saw this plausible and likely situation: “The limited access of many agencies to big data is another major consideration” (p69) – not a minor issue in a paper on the use and uses of big data! Though the paper does highlight the many and varied sources that are becoming increasingly available, and the risks and opportunities associated with their use.