“Big Data for Development: Opportunities & Challenges”

Published by Global Pulse, 29 May 2012

Abstract: “Innovations in technology and greater affordability of digital devices have presided over  today’s Age of Big Data, an umbrella term for the explosion in the quantity and diversity of high frequency digital data. These data hold the potential—as yet largely untapped— to allow decision makers to track development progress, improve social protection, and understand where existing policies and programmes require adjustment.  Turning Big Data—call logs, mobile-banking transactions, online user-generated content such as blog posts and Tweets, online searches, satellite images, etc.—into actionable information requires using computational techniques to unveil trends and patterns within and between these extremely large socioeconomic datasets. New insights gleaned from such data mining should complement official statistics, survey data, and information generated by Early Warning Systems, adding depth and nuances on human behaviours  and experiences—and doing so in real time, thereby narrowing both information and  time gaps. With the promise come questions about the analytical value and thus policy relevance of  this data—including concerns over the relevance of the data in developing country contexts, its representativeness, its reliability—as well as the overarching privacy issues of utilising personal data. This paper does not offer a grand theory of technology-driven social change in the Big Data era. Rather it aims to delineate the main concerns and challenges raised by “Big Data for Development” as concretely and openly as possible, and to suggest ways to address at least a few aspects of each.”

“It is important to recognise that Big Data and real-time analytics are no modern panacea for age-old development challenges.  That said, the diffusion of data science to the realm of international development nevertheless constitutes a genuine opportunity to bring powerful new tools to the fight against poverty, hunger and disease.”

“The paper is structured to foster dialogue around some of the following issues:

  • What types of new, digital data sources are potentially useful to the field of international development?
  • What kind of analytical tools, methodologies for analyzing Big Data have already been tried and tested by academia and the private sector, which could have utility for the public sector?
  • What challenges are posed by the potential of using digital data sources (Big Data) in development work?
  • What are some specific applications of Big Data in the field of global development?
  • How can we chart a way forward?”

Click here to download the PDF:

Read about Global Pulse. “Global Pulse is an innovation initiative launched by the Executive Office of the United Nations Secretary-General, in response to the need for more timely information to track and monitor the impacts of global and local socio-economic crises. The Global Pulse initiative is exploring how new, digital data sources and real-time analytics technologies can help policymakers understand human well-being and emerging vulnerabilities in real-time, in order to better protect populations from shocks.”

See also: World Bank Project Performance Ratings. “IEG independently validates all completion reports that the World Bank prepares for its projects (known as Implementation Completion Reports, or ICRs).  For a subset of completed projects (target coverage is 25%), IEG performs a more in-depth project evaluation that includes extensive primary research and field work.  The corresponding ICR Reviews and Project Performance Assessment Reports (PPARs), codify IEG’s assessments using Likert-scale project performance indicators.  The World Bank Project Performance Ratings database is the collection of more than 8000 project assessments covering about 6000 completed projects, since the unit was originally established in 1967.  It is the longest-running development project performance data collection of its kind.”(1981-2010)

Rick Davies comment: There is a great opportunity here for a data mining analysis to find decision rules that best predict successful projects [Caveat: GIVEN THE FIELDS AVAILABLE IN THIS DATA SET]

See also: Good countries or good projects? Macro and micro correlates of World Bank project performance. Authors: Denizer, Cevdet; Kaufmann, Daniel; Kraay, Aart; 2011/05/01, Policy Research working paper; no. WPS 5646. Summary: “The authors use data from more than 6,000 World Bank projects evaluated between 1983 and 2009 to investigate macro and micro correlates of project outcomes. They find that country-level “macro” measures of the quality of policies and institutions are very strongly correlated with project outcomes, confirming the importance of country-level performance for the effective use of aid resources. However, a striking feature of the data is that the success of individual development projects varies much more within countries than it does between countries. The authors assemble a large set of project-level “micro” correlates of project outcomes in an effort to explain some of this within-country variation. They find that measures of project size, the extent of project supervision, and evaluation lags are all significantly correlated with project outcomes, as are early-warning indicators that flag problematic projects during the implementation stage. They also find that measures of World Bank project task manager quality matter significantly for the ultimate outcome of projects. They discuss the implications of these findings for donor policies aimed at aid effectiveness.”

See also: A Few Useful Things to Know about Machine Learning. Pedro Domingos. Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195-2350, U.S.A. pedrod@cs.washington.edu

Do we need more attention to monitoring relative to evaluation?

This post title was prompted by my reading of Daniel Ticehurst’s paper (below), and some of my reading of literature on complexity theory and on data mining.

First, Daniel’s paper: Who is listening to whom, and how well and with what effect?   Daniel Ticehurst, October 16th, 2012. 34 pages


“I am a so called Monitoring and Evaluation (M&E) specialist although, as this paper hopefully reveals, my passion is monitoring. Hence I dislike the collective term ‘M&E’. I see them as very different things. I also dislike the setting up of Monitoring and especially Evaluation units on development aid programmes: the skills and processes necessary for good monitoring should be an integral part of management; and evaluation should be seen as a different function. I often find that ‘M&E’ experts, driven by donor insistence on their presence backed up by so-called evaluation departments with, interestingly, no equivalent structure, function or capacity for monitoring, over-complicate the already challenging task of managing development programmes. The work of a monitoring specialist, to avoid contradicting myself, is to help instil an understanding of the scope of what a good monitoring process looks like. Based on this, it is to support those responsible for managing programmes to work together in following this process through so as to drive better, not just comment on, performance.”

“I have spent most of my 20 years in development aid working on long term assignments mainly in various countries in Africa and exclusively on ‘M&E’ across the agriculture and private sector development sectors hoping to become a decent consultant. Of course, just because I have done nothing else but ‘M&E.’ does not mean I excel at both. However, it has meant that I have had opportunities to make mistakes and learn from them and the work of others. I make reference to the work of others throughout this paper from which I have learnt and continue to learn a great deal.”

“The purpose of this paper is to stimulate debate on what makes for good monitoring. It draws on my reading of history and perceptions of current practice, in the development aid and a bit in the corporate sectors. I dwell on the history deliberately as it throws up some good practice, thus relevant lessons and, with these in mind, pass some comment on current practice and thinking. This is particularly instructive regarding the resurgence of the aid industry’s focus on results and recent claims about how there is scant experience in involving intended beneficiaries and establishing feedback loops, in the agricultural sector anyway. The main audience I have in mind are not those associated with managing or carrying out evaluations. Rather, this paper seeks to highlight particular actions I hope will be useful to managers responsible for monitoring (be they directors in Ministries, managers in consulting companies, NGOs or civil servants in donor agencies who oversee programme implementation) and will improve a neglected area.”

Rick Davies comment: Complexity theory writers seem to give considerable emphasis to the idea of constant change and substantial unpredictability of complex adaptive systems (e.g. most human societies). Yet surprisingly enough we find more writings on complexity and evaluation than we do on complexity and monitoring. For a very crude bit of evidence, compare Google searches for “monitoring and complexity -evaluation” and “evaluation and complexity -monitoring”. The second search string returns twice as many results. This imbalance is strange because monitoring typically happens more frequently, and looks at smaller units of time, than evaluation. You would think its use would be more suited to complex projects and settings. Is this because we have not had in the past the necessary analytic tools to make best use of monitoring data? Is it also because the audiences for any use of the data have been quite small, limited perhaps to the implementing agency, their donor(s) and the intended beneficiaries at best? The latter should no longer be the case, given the global movement for greater transparency in the operations of aid programs, aided by continually widening internet access. In addition to the wide range of statistical tools suitable for hypothesis testing (generally under-utilised, even in their simplest forms, e.g. chi-square tests), there is now a range of data mining tools that are useful for more inductive pattern-finding purposes. (Dare I say it, but…) These are already in widespread use by big businesses to understand and predict their customers’ behaviours (e.g. their purchasing decisions). The analytic tools are there, and available in free open source forms (e.g. RapidMiner).
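To illustrate how simple the simplest of these tests can be, here is a minimal Python sketch of a chi-square test of independence on a 2x2 table. Everything specific in it is an assumption made for illustration: the counts are invented (sites that did or did not receive monthly monitoring visits, cross-tabulated against whether they met a target), the function name chi_square_2x2 is hypothetical, and no continuity correction is applied.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (chi-square statistic, p-value). No continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom, for which the upper-tail
    # probability has a closed form using the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Invented monitoring data (an assumption, not real project data):
# rows    = visited monthly / not visited monthly
# columns = target met / target not met
table = [[30, 10],
         [20, 20]]

chi2, p_value = chi_square_2x2(table)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

With these invented counts the statistic is about 5.33 and the p-value about 0.02, so the association between visits and targets would be unlikely to arise by chance alone. With small cell counts an exact test would be the safer choice.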

Where there is no single Theory of Change: The uses of Decision Tree models

Eliciting tacit and multiple Theories of Change

Rick Davies, November 2012. Available as pdf and a 4 page summary version

This paper begins by identifying situations where a theory-of-change led approach to evaluation can be difficult, if not impossible. It then introduces the idea of systematic rather than ad hoc data mining and the types of data mining approaches that exist. The rest of the paper then focuses on one data mining method known as Decision Trees, also known as Classification Trees.  The merits of Decision Tree models are spelled out and then the processes of constructing Decision Trees are explained. These include the use of computerised algorithms and ethnographic methods, using expert inquiry and more participatory processes. The relationships of Decision Tree analyses to related methods are then explored, specifically Qualitative Comparative Analysis (QCA) and Network Analysis. The final section of the paper identifies potential applications of Decision Tree analyses, covering the elicitation of tacit and multiple Theories of Change, the analysis of project generated data and the meta-analysis of data from multiple evaluations. Readers are encouraged to explore these usages.
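For readers who have not seen a Decision Tree algorithm at work, the following Python sketch shows the basic idea of the computerised approach: repeatedly pick the attribute that best separates the outcomes, then read the result off as IF ... THEN rules. It is a toy illustration under stated assumptions, not the procedure used in the paper. The nine projects, the attribute names (supervised, large_budget) and all function names are invented, attributes are yes/no only, and Gini impurity is just one of several possible splitting criteria.

```python
from collections import Counter

def gini(rows):
    """Gini impurity of the outcomes in rows (0.0 means all the same)."""
    n = len(rows)
    counts = Counter(outcome for _, outcome in rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(rows, attributes):
    """Return the yes/no attribute that most reduces impurity, or None."""
    best_attr, best_score = None, gini(rows)
    for attr in attributes:
        yes = [r for r in rows if r[0][attr]]
        no = [r for r in rows if not r[0][attr]]
        if not yes or not no:
            continue  # attribute does not divide this group at all
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
        if score < best_score:
            best_attr, best_score = attr, score
    return best_attr

def build_tree(rows, attributes):
    """Split recursively until a group is pure or no attribute helps."""
    attr = best_split(rows, attributes)
    if attr is None:
        # Leaf: report the most common outcome in this group
        return Counter(o for _, o in rows).most_common(1)[0][0]
    remaining = [a for a in attributes if a != attr]
    yes = [r for r in rows if r[0][attr]]
    no = [r for r in rows if not r[0][attr]]
    return {"attr": attr,
            "yes": build_tree(yes, remaining),
            "no": build_tree(no, remaining)}

def rules(tree, path=()):
    """Flatten a tree into a list of (conditions, outcome) rules."""
    if not isinstance(tree, dict):
        return [(path, tree)]
    a = tree["attr"]
    return (rules(tree["yes"], path + ((a, True),)) +
            rules(tree["no"], path + ((a, False),)))

# Invented project records: (attributes, outcome). Not real data.
projects = [
    ({"supervised": True,  "large_budget": False}, "success"),
    ({"supervised": True,  "large_budget": False}, "success"),
    ({"supervised": True,  "large_budget": True},  "failure"),
    ({"supervised": True,  "large_budget": True},  "failure"),
    ({"supervised": False, "large_budget": False}, "failure"),
    ({"supervised": False, "large_budget": True},  "failure"),
    ({"supervised": False, "large_budget": False}, "failure"),
    ({"supervised": True,  "large_budget": False}, "success"),
    ({"supervised": False, "large_budget": False}, "failure"),
]

tree = build_tree(projects, ["supervised", "large_budget"])
for path, outcome in rules(tree):
    conditions = " AND ".join(f"{a}={'yes' if v else 'no'}" for a, v in path)
    print(f"IF {conditions} THEN {outcome}")
```

Each printed rule is one branch of the tree, which is why these models can be read as a set of alternative routes to an outcome. Real data would need handling of ties, numeric attributes and overfitting, all of which this sketch ignores.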

Included in the list of merits of Decision Tree models is the possibility of differentiating what are necessary and/or sufficient causal conditions and the extent to which a cause is a contributory cause (a la Mayne)
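The necessary/sufficient distinction can be made concrete with a small Python sketch. It covers only that distinction and does not attempt to measure contributory causes. The cases, the attribute names and the function names are all invented for illustration. A condition is treated as necessary if the outcome never occurs without it, and as sufficient if the outcome always occurs with it.

```python
def necessary(cases, condition, outcome):
    """True if the outcome never occurs without the condition.

    Note: trivially True if the outcome never occurs at all.
    """
    return all(condition(c) for c in cases if outcome(c))

def sufficient(cases, condition, outcome):
    """True if the outcome always occurs when the condition holds.

    Note: trivially True if the condition never holds at all.
    """
    return all(outcome(c) for c in cases if condition(c))

# Invented cases, not real project data
cases = [
    {"supervised": True,  "large_budget": False, "success": True},
    {"supervised": True,  "large_budget": False, "success": True},
    {"supervised": True,  "large_budget": True,  "success": False},
    {"supervised": True,  "large_budget": True,  "success": False},
    {"supervised": False, "large_budget": False, "success": False},
    {"supervised": False, "large_budget": True,  "success": False},
    {"supervised": False, "large_budget": False, "success": False},
    {"supervised": True,  "large_budget": False, "success": True},
    {"supervised": False, "large_budget": False, "success": False},
]

def success(c):
    return c["success"]

def supervised(c):
    return c["supervised"]

def supervised_and_small(c):
    # A combined condition, like one branch of a Decision Tree
    return c["supervised"] and not c["large_budget"]

for name, cond in [("supervised", supervised),
                   ("supervised AND small budget", supervised_and_small)]:
    print(name,
          "| necessary:", necessary(cases, cond, success),
          "| sufficient:", sufficient(cases, cond, success))
```

In these invented cases supervision alone is necessary but not sufficient for success, while the combination of supervision and a small budget is both. Findings like this only hold for the cases examined and say nothing about cases not yet observed.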

Comments on this paper are being sought. Please post them below or email Rick Davies at rick@mande.co.uk

Separate but related:

See also: An example application of Decision Tree (predictive) models (10th April 2013)

Postscript 2013 03 20: Probably the best book on Decision Tree algorithms is:

Rokach, Lior, and Oded Z. Maimon. Data Mining with Decision Trees: Theory and Applications. World Scientific, 2008. A pdf copy is available