Releasing the power of digital data for development. A guide to new opportunities

Releasing the power of digital data for development: A guide to new opportunities. (2020). Frontier Technologies, UKAID, NIRAS.
Contents

Section 1 Executive Summary
Section 2 Introduction
Section 3 Understanding and navigating the new data landscape
Section 4 What is needed to release the new potential?
Section 5 Further considerations
Appendix 1: Data opportunities potentially useful now in testing environments
Appendix 2: Bibliography and further reading
Appendix 3: Methodological notes

Executive Summary

This report presents eight conclusions.

1. There is justified excitement about, and there are proven benefits from, the use of new digital data sources, particularly where timeliness of data is important or there are persistent gaps in traditional data sources. This might include data from fragile and conflict-affected states, data supporting decision-making about marginalised population groups, or finding solutions to persistent ethical issues where traditional sources have not proved adequate.

2. In many cases, improvements in and greater access to traditional data sources could be more effective than just new data alone, including developing traditional data in tandem with new data sources. This includes innovations in digitising traditional data sources, supporting the sharing of data between and within organisations, and integrating the use of new data sources with traditional data.

3. Decision-making around the use of new data sources should be highly devolved by empowering individual staff and be focused on multiple dimensions of data quality, not least because there are no “one size fits all” rules that determine how new digital data sources fit to specific needs, subject matters or geographies. This could be supported by ensuring:
a. Research, innovation, and technical support are highly demand-led, driven by specific data user needs in specific contexts; and
b. Staff have accessible guidance that demystifies the complexities of new data sources, clarifies the benefits and risks that need to be managed, and allows them to be ‘data brokers’ confident in navigating the new data landscape, innovating in it, and coordinating the technical expertise of others.

The main report includes a description of the evidence and conclusions in a way that supports these aims, including a set of guides for staff about the most promising new data sources.

4. Where traditional data sources are failing to provide the detailed data needed, most new data sources provide a potential route to helping with the Agenda 2030 goal to ‘leave no-one behind’, as often they can provide additional granularity on population sub-groups. But, to avoid harming the interests of marginalised groups, strong ethical frameworks are needed, and affected people should be involved in decision-making about how data is processed and used. Action is also required to ensure strong data protection environments according to each type of new data and the contexts of its use.

5. New data sources with the highest potential added value for exploitation now, especially when combined with each other or traditional data sources, were found to be:
a. data from Earth Observation (EO) platforms (including satellites and drones)
b. passive location data from mobile phones

6. While there are specific limitations and risks in different circumstances, each of these data sources provides for significant gains in certain dimensions of data quality compared to some traditional sources and other new data sources. The use of Artificial Intelligence (AI) techniques, such as through machine learning, has high potential to add value to digital datasets in terms of improving aspects of data quality from many different sources, such as social media data, and particularly with large complex datasets and across multiple data sources.

7. Beyond the current time horizon, the greatest potential among emerging data sources is likely to come from:
• the next generation of Artificial Intelligence
• the next generation of Earth Observation platforms
• Privacy-Preserving Data Sharing (PPDS) via the Cloud
• the Internet of Things (IoT).
No other significant data sources, technologies or techniques were found with high potential to benefit FCDO’s work, which seems to be in line with its current research agenda and innovation activities. Some longer-term data prospects have been identified; these could be monitored for increases in their potential in the future.

8. Several other factors relevant to the optimal use of digital data sources should be investigated, and/or work in these areas maintained. These include important internal and external corporate developments, notably continued support to Open Data/data sharing and the enhanced data security systems needed to underpin it; learning across disciplinary boundaries, with official statistics principles at the core; and continued support to capacity-building of national statistical systems in developing countries, in both traditional data and data innovation.

Brian Castellani’s Map of the Complexity Sciences

I have limited tolerance for “complexity babble”: that is, people talking about complexity in abstract, ungrounded and, in effect, practically inconsequential terms, and in ways that give no acknowledgement to the surrounding history of ideas.

So I really appreciate the work Brian has put into his “Map of the Complexity Sciences”, produced in 2018, and I thought it deserved wider circulation. Note that this is one of a number of iterations, and more are likely in the future. Click on the image to go to a bigger copy.

And please note: in the bigger copy, every node is a hypertext link; clicking on one will take you to another web page providing detailed information about that concept or person. A lot of work has gone into the construction of this map, and it deserves recognition.

Here is a discussion of an earlier iteration: https://www.theoryculturesociety.org/brian-castellani-on-the-complexity-sciences/

Linked Democracy Foundations, Tools, and Applications

Poblet, Marta, Pompeu Casanovas, and Víctor Rodríguez-Doncel. 2019. Linked Democracy: Foundations, Tools, and Applications. SpringerBriefs in Law. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-13363-4. Available in PDF form online

“It is only by mobilizing knowledge that is widely dispersed
across a genuinely diverse community that a free society can
hope to outperform its rivals while remaining true to its
values”

(Ober 2008, 5) cited on page v

Chapter 1 Introduction to Linked Data Abstract This chapter presents Linked Data, a new form of distributed data on the web which is especially suitable to be manipulated by machines and to share knowledge. By adopting the linked data publication paradigm, anybody can publish data on the web, relate it to data resources published by others and run artificial intelligence algorithms in a smooth manner. Open linked data resources may democratize the future access to knowledge by the mass of internet users, either directly or mediated through algorithms. Governments have enthusiastically adopted these ideas, which is in harmony with the broader open data movement.

Chapter 2 Deliberative and Epistemic Approaches to Democracy Abstract Deliberative and epistemic approaches to democracy are two important dimensions of contemporary democratic theory. This chapter studies these dimensions in the emerging ecosystem of civic and political participation tools, and appraises their collective value in a new distinct concept: linked democracy. Linked democracy is the distributed, technology-supported collective decision-making process, where data, information and knowledge are connected and shared by citizens online. Innovation and learning are two key elements of Athenian democracies which can be facilitated by the new digital technologies, and cross-disciplinary research involving computational scientists and democratic theorists can lead to new theoretical insights into democracy.

Chapter 3 Multilayered Linked Democracy An infinite amount of knowledge is waiting to be unearthed. —Hess and Ostrom (2007) Abstract Although confidence in democracy to tackle societal problems is falling, new civic participation tools are appearing supported by modern ICT technologies. These tools implicitly assume different views on democracy and citizenship which have not been fully analysed, but their main fault is their isolated operation in non-communicated silos. We can conceive public knowledge, like in Karl Popper’s World 3, as distributed and connected in different layers and by different connectors, much as it happens with the information in the web or the data in the linked data cloud. The interaction between people, technology and data is still to be defined before alternative institutions are founded, but the so-called linked democracy should rest on different layers of interaction: linked data, linked platforms and linked ecosystems; a robust connectivity between democratic institutions is fundamental in order to enhance the way knowledge circulates and collective decisions are made.

Chapter 4 Towards a Linked Democracy Model Abstract In this chapter we lay out the properties of participatory ecosystems as linked democracy ecosystems. The goal is to provide a conceptual roadmap that helps us to ground the theoretical foundations for a meso-level, institutional theory of democracy. The identification of the basic properties of a linked democracy ecosystem draws from different empirical examples that, to some extent, exhibit some of these properties. We then correlate these properties with Ostrom’s design principles for the management of common-pool resources (as generalised to groups cooperating and coordinating to achieve shared goals) to open up the question of how linked democracy ecosystems can be governed.

Chapter 5 Legal Linked Data Ecosystems and the Rule of Law Abstract This chapter introduces the notions of meta-rule of law and socio-legal ecosystems to both foster and regulate linked democracy. It explores the way of stimulating innovative regulations and building a regulatory quadrant for the rule of law. The chapter summarises briefly (i) the notions of responsive, better and smart regulation; (ii) requirements for legal interchange languages (legal interoperability); (iii) and cognitive ecology approaches. It shows how the protections of the substantive rule of law can be embedded into the semantic languages of the web of data and reflects on the conditions that make possible their enactment and implementation as a socio-legal ecosystem. The chapter suggests in the end a reusable multi-levelled meta-model and four notions of legal validity: positive, composite, formal, and ecological.

Chapter 6 Conclusion Communication technologies have permeated almost every aspect of modern life, shaping a densely connected society where information flows follow complex patterns on a worldwide scale. The World Wide Web created a global space of information, with its network of documents linked through hyperlinks. And a new network is woven, the Web of Data, with linked machine-readable data resources that enable new forms of computation and more solidly grounded knowledge. Parliamentary debates, legislation, information on political parties or political programs are starting to be offered as linked data in rhizomatic structures, creating new opportunities for electronic government, electronic democracy or political deliberation. Nobody could foresee that individuals, corporations and government institutions alike would participate …(continues)

Participatory modelling and mental models

These are the topics covered by two papers I have come across today, courtesy of Peter Barbrook-Johnson, of Surrey University. Both papers provide good overviews of their respective fields.

Moon, K., Adams, V. M., Dickinson, H., Guerrero, A. M., Biggs, D., Craven, L., … Ross, H. (2019). Mental models for conservation research and practice. Conservation Letters, 1–11.

Abstract: Conservation practice requires an understanding of complex social-ecological processes of a system and the different meanings and values that people attach to them. Mental models research offers a suite of methods that can be used to reveal these understandings and how they might affect conservation outcomes. Mental models are representations in people’s minds of how parts of the world work. We seek to demonstrate their value to conservation and assist practitioners and researchers in navigating the choices of methods available to elicit them. We begin by explaining some of the dominant applications of mental models in conservation: revealing individual assumptions about a system, developing a stakeholder-based model of the system, and creating a shared pathway to conservation. We then provide a framework to “walk through” the stepwise decisions in mental models research, with a focus on diagram-based methods. Finally, we discuss some of the limitations of mental models research and application that are important to consider. This work extends the use of mental models research in improving our ability to understand social-ecological systems, creating a powerful set of tools to inform and shape conservation initiatives.

PDF copy here

Voinov, A. (2018). Tools and methods in participatory modeling: Selecting the right tool for the job. Environmental Modelling and Software, 109, 232–255.

Abstract: Various tools and methods are used in participatory modelling, at different stages of the process and for different purposes. The diversity of tools and methods can create challenges for stakeholders and modelers when selecting the ones most appropriate for their projects. We offer a systematic overview, assessment, and categorization of methods to assist modelers and stakeholders with their choices and decisions. Most available literature provides little justification or information on the reasons for the use of particular methods or tools in a given study. In most of the cases, it seems that the prior experience and skills of the modelers had a dominant effect on the selection of the methods used. While we have not found any real evidence of this approach being wrong, we do think that putting more thought into the method selection process and choosing the most appropriate method for the project can produce better results. Based on expert opinion and a survey of modelers engaged in participatory processes, we offer practical guidelines to improve decisions about method selection at different stages of the participatory modeling process.

PDF copy here

Subjective measures in humanitarian analysis

A note for ACAPS by Aldo Benini (2018). PDF available at https://www.acaps.org/sites/acaps/files/resources/files/20180115_acaps_technical_note_subjective_measures_full_report.pdf

Purpose and motivation

This note seeks to sensitize analysts to the growing momentum of subjective methods and measures around, and eventually inside, the humanitarian field. It clarifies the nature of subjective measures and their place in humanitarian needs assessments. It weighs their strengths and challenges. It discusses, in considerable depth, a small number of instruments and methods that are ready, or have good potential, for humanitarian analysis.

Post World War II culture and society have seen an acceleration of subjectivity in all institutional realms, although at variable paces. The sciences responded with considerable lag. They have created new methodologies – “mixed methods” (quantitative and qualitative), “subjective measures”, self-assessments of all kinds – that claim an equal playing field with distant, mechanical objectivity. For the period 2000-2012, using the search term “subjective measure”, Google Scholar returns around 600 references per year; for the period 2013 – fall 2017, the figure quintuples to 3,000. Since 2012, the United Nations has been publishing the annual World Happiness Report; its first edition discusses validity and reliability of subjective measures at length.

Closer to the humanitarian domain, poverty measurement has increasingly appreciated subjective data. Humanitarian analysis is at the initial stages of feeling the change. Adding “AND humanitarian” to the above search term produces 8 references per year for the first period, and 40 for the second – a trickle, but undeniably an increase. Other searches confirm the intuition that something is happening below the surface; for instance, “mixed method AND humanitarian” returns 110 per year in the first, and 640 in the second period – a growth similar to that of “subjective measures”.

Still, in some quarters, subjectivity remains suspect. Language matters. Some collaborations on subjective measures have preferred billing them as “experience-based measures”. Who doubts experience? It is good salesmanship, but we stay with “subjective” unless the official name of the measure contains “experience”.

What follows 

We proceed as follows: In the foundational part, we discuss the nature of, motivation for, and reservations against, subjective measures. We provide illustrations from poverty measurement and from food insecurity studies. In the second part, we present three tools – scales, vignettes and hypothetical questions – with generic pointers as well as with specific case studies. We conclude with recommendations and by noting instruments that we have not covered, but which are likely to grow more important in years to come.

Rick Davies comment: Highly recommended!

PRISM: TOOLKIT FOR EVALUATING THE OUTCOMES AND IMPACTS OF SMALL/MEDIUM-SIZED CONSERVATION PROJECTS

WHAT IS PRISM?

PRISM is a toolkit that aims to support small/medium-sized conservation projects to effectively evaluate the outcomes and impacts of their work.

The toolkit has been developed by a collaboration of several conservation NGOs with additional input from scientists and practitioners from across the conservation sector.

The toolkit is divided into four main sections:

Introduction and Key Concepts: Provides a basic overview of the theory behind evaluation relevant to small/medium-sized conservation projects

Designing and Implementing the Evaluation: Guides users through a simple, step-by-step process for evaluating project outcomes and impacts, including identifying what you need to evaluate, how to collect evaluation data, analysing/interpreting results and deciding what to do next.

Modules: Provides users with additional guidance and directs users towards methods for evaluating outcomes/impacts resulting from five different kinds of conservation action:

  • Awareness and Attitudes
  • Capacity Development
  • Livelihoods and Governance
  • Policy
  • Species and Habitat Management

Method factsheets: Outlines over 60 practical, easy-to-use methods and supplementary guidance factsheets for collecting, analysing and interpreting evaluation data.

Toolkit Website: https://conservationevaluation.org/
PDF copy of manual – download request form: https://conservationevaluation.org/download/

Recent readings: Replication of findings (not), arguments for/against “mixed methods”, uses of algorithms (public accountability, costs/benefits, metadata)

Recently noted papers of interest on my Twitter feed:

  • Go Forth and Replicate: On Creating Incentives for Repeat Studies. Scientists have few direct incentives to replicate other researchers’ work, including precious little funding to do replications. Can that change? 09.11.2017 / BY Michael Schulson
    • “A survey of 1,500 scientists, conducted by the journal Nature last year, suggested that researchers often weren’t telling their colleagues — let alone publishing the results — when other researchers’ findings failed to replicate.”… “Each year, the [US] federal government spends more than $30 billion on basic scientific research. Universities and private foundations spend around $20 billion more, according to one estimate. Virtually none of that money is earmarked for research replication”…”In reality, major scientific communities have been beset these last several years over inadequate replication, with some studies heralded as groundbreaking exerting their influence in the scientific literature — sometimes for years, and with thousands of citations — before anyone bothers to reproduce the experiments and discover that they don’t hold water. In fields ranging from cancer biology to social psychology, there’s mounting evidence that replication does not happen nearly enough. The term “replication crisis” is now well on its way to becoming a household phrase.”
  • WHEN GOVERNMENT RULES BY SOFTWARE, CITIZENS ARE LEFT IN THE DARK. TOM SIMONITE, WIRED, BUSINESS, 08.17.1707:00 AM
    • “Most governments the professors queried didn’t appear to have the expertise to properly consider or answer questions about the predictive algorithms they use”…”Researchers believe predictive algorithms are growing more prevalent – and more complex. “I think that probably makes things harder,” says Goodman.”…”Danielle Citron, a law professor at the University of Maryland, says that pressure from state attorneys general, court cases, and even legislation will be necessary to change how local governments think about, and use, such algorithms. “Part of it has to come from law,” she says. “Ethics and best practices never gets us over the line because the incentives just aren’t there.”
  • The evolution of machine learning. Posted Aug 8, 2017 by Catherine Dong (@catzdong) TechCrunch
    • “Machine learning engineering happens in three stages — data processing, model building and deployment and monitoring. In the middle we have the meat of the pipeline, the model, which is the machine learning algorithm that learns to predict given input data. The first stage involves cleaning and formatting vast amounts of data to be fed into the model. The last stage involves careful deployment and monitoring of the model. We found that most of the engineering time in AI is not actually spent on building machine learning models — it’s spent preparing and monitoring those models. Despite the focus on deep learning at the big tech company AI research labs, most applications of machine learning at these same companies do not rely on neural networks and instead use traditional machine learning models. The most common models include linear/logistic regression, random forests and boosted decision trees.”
  • The Most Crucial Design Job Of The Future. What is a data ethnographer, and why is it poised to become so important? 2017.7.24 BY CAROLINE SINDERS. Co-Design
    • Why we need meta data (data about the data we are using). “I advocate we need data ethnography, a term I define as the study of the data that feeds technology, looking at it from a cultural perspective as well as a data science perspective”…”Data is a reflection of society, and it is not neutral; it is as complex as the people who make it.”
  • The Mystery of Mixing Methods. Despite significant progress on mixed methods approaches, their application continues to be (partly) shrouded in mystery, and the concept itself can be subject to misuse. March 28, 2017 By Jos Vaessen. IEG
    • “The lack of an explicit (and comprehensive) understanding of the principles underlying mixed methods inquiry has led to some confusion and even misuses of the concept in the international evaluation community.”
    • Three types of misuse
    • Five valid reasons for using mixed methods: (Triangulation, Complementarity, Development, Initiation, Expansion)
  • To err is algorithm: Algorithmic fallibility and economic organisation. Wednesday, 10 May 2017. NESTA
    • We should not stop using algorithms simply because they make errors. Without them, many popular and useful services would be unviable. However, we need to recognise that algorithms are fallible and that their failures have costs. This points at an important trade-off between more (algorithm-enabled) beneficial decisions and more (algorithm-caused) costly errors. Where lies the balance? Economics is the science of trade-offs, so why not think about this topic like economists? This is what I have done ahead of this blog, creating three simple economics vignettes that look at key aspects of algorithmic decision-making. These are the key questions:
      Risk: When should we leave decisions to algorithms, and how accurate do those algorithms need to be?
      Supervision: How do we combine human and machine intelligence to achieve desired outcomes?
      Scale: What factors enable and constrain our ability to ramp up algorithmic decision-making?
  • A taxonomy of algorithmic accountability. Cory Doctorow / 6:20 am Wed May 31, 2017 Boing Boing
    • “Eminent computer scientist Ed Felten has posted a short, extremely useful taxonomy of four ways that an algorithm can fail to be accountable to the people whose lives it affects: it can be protected by claims of confidentiality (“how it works is a trade secret”); by complexity (“you wouldn’t understand how it works”); unreasonableness (“we consider factors supported by data, even when there’s no obvious correlation”); and injustice (“it seems impossible to explain how the algorithm is consistent with law or ethics”).”

Why have evaluators been slow to adopt big data analytics?

This is a question posed by Michael Bamberger in his blog posting on the MERL Tech website, titled Building bridges between evaluators and big data analysts. There he puts forward eight reasons (four main ones and four subsidiary points), none of which I disagree with. But I have my own perspective on the same question, and I posted the following points as a comment underneath his blog posting.

My take on “Why have evaluators been slow to adopt big data analytics?”

1. “Big data? I am having enough trouble finding any useful data! How to analyse big data is ‘a problem we would like to have.’” This is what I suspect many evaluators are thinking.

2. “Data mining is BAD” – because data mining is seen by evaluators as something ad hoc and non-transparent, whereas the best data mining practices are systematic and transparent.

3. “Correlation does not mean causation” – many evaluators have not updated this formulation to the more useful “Association is a necessary but insufficient basis for a strong causal claim”.

4. Evaluators focus on explanatory models and give little attention to predictive models, but both are useful in the real world, including in combination. Some predictive models can become explanatory models, through follow-up within-case investigations.

5. Lack of appreciation of the limits of manual hypothesis formulation and testing (useful as it can be) as a means of accumulating knowledge. In a project with four outputs and four outcomes there can be 16 different individual causal links between outputs and outcomes, but 2 to the power of 16 possible combinations of these causal links. That’s a lot of theories to choose from (65,536). In this context, search algorithms can be very useful.
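The combinatorics in point 5 can be checked with a few lines of Python; the output and outcome labels below are invented placeholders, not taken from any real project:

```python
from itertools import product

outputs = ["Output1", "Output2", "Output3", "Output4"]      # hypothetical labels
outcomes = ["Outcome1", "Outcome2", "Outcome3", "Outcome4"]  # hypothetical labels

# Every possible individual causal link runs from one output to one outcome
links = list(product(outputs, outcomes))
print(len(links))        # 16 individual links

# A candidate theory of change is any subset of these links,
# so the number of candidate theories is the size of the power set
print(2 ** len(links))   # 65536 possible combinations
```

Even this small project design generates a search space far too large to work through theory by theory, which is where search algorithms earn their keep.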

6. Lack of knowledge of, and confidence in, the use of machine learning software. There is still work to be done to make this software more user-friendly. RapidMiner, BigML, and EvalC3 are heading in the right direction.

7. Most evaluators probably don’t know that you can use the above software on small data sets; it doesn’t only work with large ones. Yesterday I was using EvalC3 with a data set describing only 25 cases.

8. The difficulty of understanding some machine learning findings. Decision tree models (one means of machine learning) are eminently readable, but few can explain the internal logic of specific prediction models generated by artificial neural networks (another means of machine learning, often used for classification of images). Lack of explainability presents a major problem for public accountability. Public accountability for the behavior and use of algorithms is shaping up to be a BIG issue, as highlighted in this week’s Economist Leader article on advances in facial recognition software: What machines can tell from your face
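To illustrate the readability point in 8, here is a toy decision-tree model written out as plain Python rules; the case attributes and thresholds are invented for illustration, not drawn from any real evaluation dataset:

```python
# A sketch of the kind of prediction model a decision-tree learner produces:
# a short chain of if/else rules that a non-specialist can read and challenge.

def predict(case):
    """Predict the outcome for one case from its (hypothetical) attributes."""
    if case["local_partner"]:
        if case["funding_years"] >= 3:
            return "outcome achieved"
        return "outcome not achieved"
    if case["staff_trained"]:
        return "outcome achieved"
    return "outcome not achieved"

cases = [
    {"local_partner": True,  "funding_years": 4, "staff_trained": False},
    {"local_partner": True,  "funding_years": 1, "staff_trained": True},
    {"local_partner": False, "funding_years": 1, "staff_trained": True},
]

for case in cases:
    print(predict(case))
```

By contrast, the weights inside a trained neural network offer no such narrative for any single prediction, which is why the explainability and accountability debates concentrate there.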

Update 2017-09-19: See Michael Bamberger’s response to my comments above in the comment section below. It is copied from his original response, posted at http://merltech.org/building-bridges-between-evaluators-and-big-data-analysts/

Order and Diversity: Representing and Assisting Organisational Learning in Non-Government Aid Organisations.

No, history did not begin three years ago ;-)

“It was twenty years ago today…” – well, almost. Here is a link to my 1998 PhD thesis of the above title. It was based on fieldwork I carried out in Bangladesh between 1992 and 1995. Chapter 8 describes the first implementation of what later became the Most Significant Change impact monitoring technique. But there is a lot more of value in this thesis as well, including an analysis of the organisational learning literature up to that date, an analysis of the Bangladesh NGO sector in the early 1990s, and a summary of thinking about evolutionary epistemology. Unlike all too many PhDs, this one was useful, even for the immediate subjects of my fieldwork. CCDB was still using the impact monitoring process I helped them set up (i.e. MSC) when I visited them again in the early 2000s, albeit with some modifications to suit its expanded use.

Abstract: The aim of this thesis is to develop a coherent theory of organisational learning which can generate practical means of assisting organisational learning. The thesis develops and applies this theory to one class of organisations known as non-government organisations (NGOs), and more specifically to those NGOs who receive funds from high income countries but who work for the benefit of the poor in low income countries. Of central concern are the processes whereby these NGOs learn from the rural and urban poor with whom they work.
The basis of the theory of organisational learning used in this thesis is modern evolutionary theory, and more particularly, evolutionary epistemology. It is argued that this theory provides a means of both representing and assisting organisational learning. Firstly, it provides a simple definition of learning that can be operationalised at multiple scales of analysis: that of individuals, organisations, and populations of organisations. Differences in the forms of organisational learning that do take place can be represented using a number of observable attributes of learning which are derived from an interpretation of evolutionary theory. The same evolutionary theory can also provide useful explanations of processes thus defined and represented. Secondly, an analysis of organisational learning using these observable attributes and background theory also suggests two ways in which organisational learning can be assisted. One is the use of specific methods within NGOs: a type of participatory monitoring. The second is the use of particular interventions by their donors: demands for particular types of information which are indicative of how and where the NGO is learning. In addition to these practical implications, it is argued that a specific concern with organisational learning can be related to a wider problematic which should be of concern to Development Studies: one which is described as “the management of diversity”. Individual theories, organisations, and larger social structures may not survive in the face of diversity and change. In surviving they may constrain and/or enable other agents, with feedback effects into the scale and forms of diversity possible. The management of diversity can be analysed descriptively and prescriptively, at multiple scales of aggregation.

Twitter posts tagged #evaluation

This post should feature a continually updated feed of all tweets tagged #evaluation.

