Working with messy data sets? Two useful and free tools

I have just come across two useful apps (a.k.a. software packages, a.k.a. tools) for when you are working with someone else’s data sets, data sets from multiple sources and times, or just your own data that was in a less-than-perfect state when you last left it :-)

  • OpenRefine: Initially developed by Google and now open source, with its own support and development community. You can explore the characteristics of a data set, clean it in quick and comprehensive moves, transform its layout and formats, and reconcile and match multiple data sets. There is documentation, along with videos, to show you how to do all this. There is also a book, which you can purchase. The Wikipedia entry provides a good overview.
  • Tabula: This package allows you to extract tables of data from pdfs, a task which otherwise can be very tiresome, messy and error-prone (see the short sketch below).
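For those who prefer a scripted workflow, the same Tabula extraction engine can be driven from Python via the tabula-py wrapper. The sketch below is illustrative only: it assumes Java, tabula-py and pandas are installed, and “report.pdf” is a made-up file name.

import tabula

# Pull out every table the engine can detect, one pandas DataFrame per table
tables = tabula.read_pdf("report.pdf", pages="all", multiple_tables=True)

for i, df in enumerate(tables):
    print(f"Table {i}: {df.shape[0]} rows x {df.shape[1]} columns")
    # Save each table to CSV, ready for cleaning in OpenRefine or elsewhere
    df.to_csv(f"table_{i}.csv", index=False)

The desktop app remains the easier route for one-off jobs; a script like this earns its keep when the same extraction has to be repeated across many PDFs.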

And some other packages I have yet to explore

“Quality evidence for policymaking. I’ll believe it when I see the replication”

3ie Replication Paper 1, by Annette N Brown, Drew B Cameron, Benjamin DK Wood, March 2014. Available as pdf

“1. Introduction:  Every so often, a well-publicised replication study comes along that, for a brief period, catalyses serious discussion about the importance of replication for social science research, particularly in economics. The most recent example is the Herndon, Ash, and Pollin replication study (2013) showing that the famous and highly influential work of Reinhart and Rogoff (2010) on the relationship between debt and growth is flawed.

McCullough and McKitrick (2009) document numerous other examples from the past few decades of replication studies that expose serious weaknesses in policy influential research across several fields. The disturbing inability of Dewald et al. (1986) to replicate many of the articles in their Journal of Money, Credit and Banking experiment is probably the most well-known example of the need for more replication research in economics. Yet, replication studies are rarely published and remain the domain of graduate student exercises and the occasional controversy.

This paper takes up the case for replication research, specifically internal replication, or the reanalysis of original data to address the original evaluation question. This focus helps to demonstrate that replication is a crucial element in the production of evidence for evidence-based policymaking, especially in low- and middle-income countries.

Following an overview of the main challenges facing this type of research, the paper then presents a typology of replication approaches for addressing the challenges. The approaches include pure replication, measurement and estimation analysis (MEA), and theory of change analysis (TCA). Although the challenges presented are not new, the discussion here is meant to highlight that the call for replication is not about catching bad or irresponsible researchers. It is about addressing very real challenges in the research and publication processes and thus about producing better evidence to inform development policymaking.”

Other quotes:

“When single evaluations are influential, and any contradictory evaluations of similar interventions can be easily discounted for contextual reasons, the minimum requirement for validating policy recommendations should be recalculating and re-estimating the measurements and findings using the original raw data to confirm the published results, or a pure replication.”

“On the bright side, there is some evidence of a correlation between public data availability and increased citation counts in the social sciences. Gleditsch (2003) finds that articles published in the Journal of Conflict Resolution that offer data in any form receive twice as many citations as comparable papers without available data (Gleditsch et al. 2003; Evanschitzky et al. 2007). ”

“Replication should be seen as part of the process for translating research findings into evidence for policy and not as a way to catch or call out researchers who, in all likelihood, have the best of intentions when conducting and submitting their research, but face understandable challenges. These challenges include the inevitability of human error, the uncontrolled nature of social science, reporting and publication bias, and the pressure to derive policy recommendations from empirical findings”

“Even in the medical sciences, the analysis of heterogeneity of outcomes, or post-trial subgroup analysis, is not accorded ‘any special epistemic status’ by the United States Food and Drug Administration rules (Deaton 2010 p.440). In the social sciences, testing for and understanding heterogeneous outcomes is crucial to policymaking. An average treatment effect demonstrated by an RCT could result from a few strongly positive outcomes and many negative outcomes, rather than from many positive outcomes, a distinction that would be important for programme design. Most RCT-based studies in development do report heterogeneous outcomes. Indeed, researchers are often required to do so by funders who want studies to have policy recommendations. As such, RCTs as practised – estimating treatment effects for groups not subject to random assignment – face the same challenges as other empirical social science studies.”
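To make the quoted point about heterogeneity concrete, here is a purely illustrative calculation (the numbers are invented, not taken from the paper): a respectable average treatment effect can coexist with most participants being worse off.

# Invented numbers: how a positive average treatment effect (ATE) can arise
# from a few large gains and many small losses.
n_gainers, gain = 20, 3.0    # 20% of participants gain a lot
n_losers, loss = 80, -0.5    # 80% of participants lose a little

ate = (n_gainers * gain + n_losers * loss) / (n_gainers + n_losers)
print(ate)  # 0.2 -> positive on average, yet 80% of participants were harmed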

“King (2006) encourages graduate students to conduct replication studies but, in his desire to help students publish, he suggests they may leave out replication findings that support the original article and instead look for findings that contribute by changing people’s minds about something. About sensitivity analysis, King (2006 p.121) advises, ‘If it turns out that all those other changes don’t change any substantive conclusions, then leave them out or report them’.” Aaarrrggghhh!

Rick Davies Comment: This paper is well worth reading!

DEMOCRACY, GOVERNANCE AND RANDOMISED MEDIA ASSISTANCE

BY DEVRA C. MOEHLER, BBC Media Action RESEARCH REPORT // ISSUE 03 // MARCH 2014 // GOVERNANCE. Available as pdf

Foreword by BBC Media Action

“This report summarises how experimental design has been used to assess the effectiveness of governance interventions and to understand the effects of the media on political opinion and behaviour. It provides an analysis of the benefits and drawbacks of experimental approaches and also highlights how field experiments can challenge the assumptions made by media support organisations about the role of the media in different countries.

The report highlights that – despite interest in the use of RCTs to assess governance outcomes – only a small number of field experiments have been conducted in the area of media, governance and democracy.

The results of these experiments are not widely known among donors or implementers. This report aims to address that gap. It shows that media initiatives have led to governance outcomes including improved accountability. However, they have also at times had unexpected adverse effects.

The studies conducted to date have been confined to a small number of countries and the research questions posed were linked to specific intervention and governance outcomes. As a result, there is a limit to what policymakers and practitioners can infer. While this report highlights an opportunity for more experimental research, it also identifies that the complexity of media development can hinder the efficacy of experimental evaluation. It cautions that low-level interventions (eg those aimed at individuals as opposed to working at a national or organisational level) best lend themselves to experimentation. This could create incentives for researchers to undertake experimental research that answers questions focused on individual change rather than wider organisational and systemic change. For example, it would be relatively easy to assess whether a training course does or does not work. Researchers can randomise the journalists that were trained and assess the uptake and implementation of skills. However, it would be much harder to assess how capacity-building efforts affect a media house, its editorial values, content, audiences and media/state relations.

Designing such experiments will be challenging. The intention of this report is to start a conversation both within our own organisation and externally. As researchers we should be prepared to discover that experimentation may not be feasible or relevant for evaluation. In order to strengthen the evidence base, practitioners, researchers and donors need to agree which research questions can and should be answered using experimental research, and, in the absence of experimental research, to agree what constitutes good evidence.

BBC Media Action welcomes feedback on this report and all publications published under our Bridging Theory and Practice Research Dissemination Series.”

Contents
Introduction 5
Chapter 1: Background on DG field experiments 7
Chapter 2: Background on media development assistance and evaluation 9
Chapter 3: Current experiments and quasi-experimental studies on media in developing countries 11
Field experiments
Quasi-experiments
Chapter 4: Challenges of conducting field experiments on media development 21
Level of intervention
Complexity of intervention
Research planning under ambiguity
Chapter 5: Challenges to learning from field experiments on media development 26
Chapter 6: Solutions and opportunities 29
Research in media scarce environments
Test assumptions about media effects
To investigate influences on media
References 33

Independent Commission for Aid Impact publishes report on “How DFID Learns”

Terms of Reference for the review

The review itself, available here, published 4th April 2014

Selected quotes:

“Overall Assessment: Amber-Red: DFID has allocated at least £1.2 billion for research, evaluation and personnel development (2011-15). It generates considerable volumes of information, much of which, such as funded research, is publicly available. DFID itself is less good at using it and building on experience so as to turn learning into action. DFID does not clearly identify how its investment in learning links to its performance and delivering better impact. DFID has the potential to be excellent at organisational learning if its best practices become common. DFID staff learn well as individuals. They are highly motivated and DFID provides opportunities and resources for them to learn. DFID is not yet, however, managing all the elements that contribute to how it learns as a single, integrated system. DFID does not review the costs, benefits and impact of learning. Insufficient priority is placed on learning during implementation. The emphasis on results can lead to a bias to the positive. Learning from both success and failure should be systematically encouraged”.

RD Comment: The measurement of organisational learning is no easy matter, so it is likely that a lot of people would be very interested to know more about the ICAI approach. The ICAI report does define learning, as follows:

“We define learning as the extent to which DFID gains and uses knowledge to influence its policy, strategy, plans and actions. This includes knowledge from both its own work and that of others. Our report makes a distinction between the knowledge DFID collects and how it is actively applied, which we term as ‘know-how’.”

Okay, and how is this assessed in practice? The key word in this definition is “influence”. Influencing is a notoriously difficult process and outcome to measure. Unfortunately, the ICAI report does not explain how influence was assessed or measured. Annex 5 does show how the topic of learning was broken down into four areas: making programme choices; creating theories of change; choosing delivery mechanisms; and adapting and improving implementation of its activities. The report also provides some information on the sources used: “The 31 ICAI reports considered by the team examined 140 DFID programmes across 40 countries/territories, including visits undertaken to 24 DFID country offices” … “We spoke to 92 individuals, of whom 87 were DFID staff from: 11 DFID fragile state country offices; 5 non-fragile small country offices; 16 HQ departments; and 13 advisory cadres”. But how influence was measured remains unclear. ICAI could do better at modelling good practice here, i.e. transparency of evaluation methods. Perhaps then DFID could learn from ICAI about how to assess its (DFID’s) own learning in the future. Maybe…

Other quotes

“DFID is always losing and gaining knowledge. Staff are continuously leaving and joining DFID (sometimes referred to as ‘churn’). Fragile states are particularly vulnerable to high staff turnover by UK-based staff. For instance, in Afghanistan, DFID informed us that staff turnover is at a rate of 50% per year. We are aware of one project in the Democratic Republic of Congo having had five managers in five years. DFID inform us that a staff appointment typically lasts slightly under three years.” A table that follows shows an overall rate of around 10% per year.

“DFID does not track or report on the overall impact of evaluations. The challenge of synthesising, disseminating and using knowledge from an increasing number of evaluation reports is considerable. DFID reports what evaluations are undertaken and it comments on their quality. The annual evaluation report also provides some summary findings. We would have expected DFID also to report the impact that evaluations have on what it does and what it achieves. Such reporting would cover actions taken in response to individual evaluations and their impact on DFID’s overall value for money and effectiveness.” It is the case that some agencies do systematically track what happens to the recommendations made in evaluation reports.

“DFID has, however, outsourced much of its knowledge production. Of the £1.5 billion for knowledge generation and learning, it has committed at least £1.2 billion to fund others outside DFID to produce knowledge it can use (specifically research, evaluation and PEAKS). Staff are now primarily consumers of knowledge products rather than producers of knowledge itself. We note that there are risks to this model; staff may not have the practical experience that allows them wisely to use this knowledge to make programming decisions.”

“We note that annual and project completion reviews are resources that are not fully supporting DFID’s learning. We are concerned that the lesson-learning section was removed from the standard format of these reports and is no longer required. Lessons from these reports are not being systematically collated and there is no central resource regularly quality assuring reviews.”

RD Comment: Paras 2.50 to 2.52 are entertaining. A UK Gov model is presented of how people learn, DFID staff are interviewed about how they think they learn, and then differences between the model and what staff report are ascribed to staff’s lack of understanding: “This indicates that DFID staff do not consciously and sufficiently use the experience of their work for learning. It also indicates, within DFID, an over-identification of learning with formal training.” OR… maybe it indicates that the model was wrong and the staff were right???

This para might also raise a smile or two: “There is evidence that DFID staff are sometimes using evidence selectively. It appears this is often driven by managers requiring support for decisions. While such selective use of evidence is not the usual practice across the department, it appears to be occurring with sufficient regularity to be a concern. It is clearly unacceptable.” Golly…

Rapid Review of Embedding Evaluation in UK Department for International Development

February 2014. Executive Summary … Final Report

“Purpose of the rapid review:  Since 2009/10, there has been a drive within the Department for International Development (DFID) to strengthen the evidence base upon which policy and programme decisions are made. Evaluation plays a central role in this and DFID has introduced a step change to embed evaluation more firmly within its programmes. The primary purpose of this rapid review is to inform DFID and the international development evaluation community of the progress made and the challenges and opportunities encountered in embedding evaluation across the organisation.”

Selected quotes:

“There has been a strong drive to recruit, accredit and train staff in evaluation in DFID since 2011. There have been 25 advisers working in a solely or shared evaluation role, a further 12 advisers in roles with an evaluation component, 150 staff accredited in evaluation and 700 people receiving basic training. …While the scaling up of capacity has been rapid, the depth of this capacity is less than required. The number of embedded advisory posts created is significantly fewer than envisaged at the outset, with eight of 25 advisers working 50% or less on evaluation. ”

“The embedding evaluation approach has contributed to a significant, but uneven, increase in the quantity of evaluations commissioned by DFID. These have increased from around 12 per year, prior to 2011, to an estimated 40 completed evaluations in 2013/14”

“The focus of evaluation has changed to become almost exclusively programme oriented. There are very few thematic or country level evaluations planned whereas previously these types of evaluations accounted for the majority of DFID’s evaluation portfolio. This presents a challenge to DFID as it seeks to synthesise the learning from individual projects and programmes into broader lessons for policy and programme planning and design.”

“The embedding evaluation approach has been accompanied by a significant increase in the number of evaluations which has, in turn, led to an increase in the total amount spent on evaluation. However, the average total cost per evaluation has changed little since 2010. ”

“Externally procured evaluation costs appear to be in line with those of other donors. However, forecasts of future spending on evaluation indicate a likely increase in the median amount that DFID pays directly for evaluations. For non-impact evaluations the median budget is £200,000 and for IEs the median budget is £500,000. This represents a significant under-estimation of evaluation costs.”

“Evaluation accounts for a median of 1.9% of programme value, which is in line with expectations. The amount DFID spends on IEs is higher at 2.6% of programme value but this is consistent with the figures of other donors such as the Millennium Challenge Corporation and the World Bank”.

“There has been considerable enthusiasm shown by programme managers for conducting IEs, which now comprise 28% of planned evaluations.”

Do you have a Data Management Plan?

Sam Held discusses Data Management Plans in his 14 February 2014 AEA blog posting on Federal (US) Data Sharing Policies

“A recent trend in the STEM fields is the call to share or access research data, especially data collected with federal funding. The result is requirements from the federal agencies for data management plans in grants, but the different agencies have different requirements. NSF requires a plan for every grant, but NIH only requires plans for grants over $500,000.

The common theme in all policies is “data should be widely and freely available as possible while safeguarding the privacy of participants, and protecting confidential and proprietary data” (NIH’s Statement on Sharing Data 2/26/2003). The call for a data sharing plan forces the PIs, evaluators, and those involved with the proposals to consider what data will be collected, how will it be stored and preserved, and what will be the procedures for sharing or distributing the data within privacy or legal requirements (i.e., HIPAA or IRB requirements). To me – the most important feature here is data formatting. What format will the data be in now and still be accessible or usable in the future or to those who cannot afford expensive software?”

He then points to DMPTool – a University of California online system for developing Data Management Plans. This site includes more than 20 different templates for the plans, provided by different funding bodies.

DMPTool – a website from the University of California system for developing Data Management Plans. The best component of this site is their collection of funder requirements, including those for NIH, NSF, NEH, and some private foundations. This site includes templates for the plans. See more at: http://aea365.org/blog/stem-tig-week-sam-held-on-federal-data-sharing-policies/

 

Reflections on research processes in a development NGO: FIVDB’s survey in 2013 of the change in household conditions and of the effect of livelihood trainings

Received from Aldo Benini:

“Development NGOs are under increasing pressure to demonstrate impact. The methodological rigor of impact studies can challenge those with small research staffs and/or insufficient capacity to engage with outside researchers. “Reflections on research processes in a development NGO: Friends In Village Development Bangladesh’s (FIVDB) survey in 2013 of the change in household conditions and of the effect of livelihood trainings” (2013, with several others) grapples with some related dilemmas. On one side, it is a detailed and careful account of how a qualitative methodology known as “Community-based Change Ranking” and data from previous baseline surveys were combined to derive an estimate of the livelihood training effect distinct from highly diverse changes in household conditions. In the process, over 9,000 specific verbal change statements were condensed into a succinct household typology. On the other side, the report discusses challenges that regularly arise from the study design to the dissemination of findings. The choice of an intuitive impact metric (as opposed to one that may seem the best in the eyes of the analyst) and the communication of uncertainty in the findings are particularly critical.”

Produced by Aldo Benini, Wasima Samad Chowdhury, Arif Azad Khan, Rakshit Bhattacharjee, Friends In Village Development Bangladesh (FIVDB), 12 November 2013

PS: See also...

“Personal skills and social action” (2013, together with several others) is a sociological history of the 35-year effort, by Friends In Village Development Bangladesh (FIVDB), to create and amplify adult literacy training when major donors and leading NGOs had opted out of this sector. It is written in Amartya Sen’s perspective that

 “Illiteracy and innumeracy are forms of insecurity in themselves. Not to be able to read or write or count or communicate is itself a terrible deprivation. And if a person is thus reduced by illiteracy and innumeracy, we can not only see that the person is insecure to whom something terrible could happen, but more immediately, that to him or her, something terrible has actually happened”.

The study leads the reader from theories of literacy and human development through adult literacy in Bangladesh and the expert role of FIVDB to the learners’ experience and a concept of communicative competency that opens doors of opportunity. Apart from organizational history, the empirical research relied on biographic interviews with former learners and trainers, proportional piling to self-evaluate relevance and ability, analysis of test scores as well as village development budget simulations conducted with 33 Community Learning Center committees. A beautifully illustrated printed version is available from FIVDB.

 

Meta-evaluation of USAID’s Evaluations: 2009-2012

Author(s): Molly Hageboeck, Micah Frumkin, and Stephanie Monschein
Date Published: November 25, 2013

Report available as a pdf (a big file). See also video and PP presentations (worth reading!)

Context and Purpose

This evaluation of evaluations, or meta-evaluation, was undertaken to assess the quality of USAID’s evaluation reports. The study builds on USAID’s practice of periodically examining evaluation quality to identify opportunities for improvement. It covers USAID evaluations completed between January 2009 and December 2012. During this four-year period, USAID launched an ambitious effort called USAID Forward, which aims to integrate all aspects of the Agency’s programming approach, including program and project evaluations, into a modern, evidence-based system for realizing development results. A key element of this initiative is USAID’s Evaluation Policy, released in January 2011.

Meta-Evaluation Questions

The meta-evaluation on which this volume reports systematically examined 340 randomly selected evaluations and gathered qualitative data from USAID staff and evaluators to address three questions:

1. To what degree have quality aspects of USAID’s evaluation reports, and underlying practices, changed over time?

2. At this point in time, on which evaluation quality aspects or factors do USAID’s evaluation reports excel and where are they falling short?

3. What can be determined about the overall quality of USAID evaluation reports and where do the greatest opportunities for improvement lie?

 Meta-Evaluation Methodology and Study Limitations

The framework for this study recognizes that undertaking an evaluation involves a partnership between the client for an evaluation (USAID) and the evaluation team. Each party plays an important role in ensuring overall quality. Information on basic characteristics and quality aspects of 340 randomly selected USAID evaluation reports was a primary source for this study. Quality aspects of these evaluations were assessed using a 37-element checklist. Conclusions reached by the meta-evaluation also drew from results of four small-group interviews with staff from USAID’s technical and regional bureaus in Washington, 15 organizations that carry out evaluations for USAID, and a survey of 25 team leaders of recent USAID evaluations. MSI used chi-square and t–tests to analyze rating data. Qualitative data were analyzed using content analyses. No specific study limitation unduly hampered MSI’s ability to obtain or analyze data needed to address the three meta-evaluation questions. Nonetheless, the study would have benefited from reliable data on the cost and duration of evaluations, survey or conference call interviews with USAID Mission staff, and the consistent inclusion of the names of evaluation team leaders in evaluation reports.”

Rick Davies comment: Where is the dataset? 340 evaluations were scored on a 37-point checklist, and ten of the 37 checklist items were used to create an overall “score”. This data could be analysed in N different ways by many more people, if it were made readily available (see the illustrative sketch below). Responses please, from anyone…
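To illustrate what a released dataset would make possible, here is a minimal sketch of the kind of re-analysis anyone could then run. The ratings below are simulated, not the real MSI data, and scipy’s chi-square test of independence simply stands in for whatever tests MSI actually used.

import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

# Simulated stand-in for 340 evaluations scored on 10 checklist items (1 = element present)
pre_2011 = rng.binomial(1, 0.55, size=(150, 10))   # hypothetical pre-Evaluation Policy reports
post_2011 = rng.binomial(1, 0.70, size=(190, 10))  # hypothetical post-Evaluation Policy reports

# Did the share of reports meeting the first checklist item change after the policy?
table = np.array([
    [pre_2011[:, 0].sum(),  (1 - pre_2011[:, 0]).sum()],
    [post_2011[:, 0].sum(), (1 - post_2011[:, 0]).sum()],
])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")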

 

LineUp: Visual Analysis of Multi-Attribute Rankings

Gratzl, S., A. Lex, N. Gehlenborg, H. Pfister, and M. Streit. 2013. “LineUp: Visual Analysis of Multi-Attribute Rankings.” IEEE Transactions on Visualization and Computer Graphics 19 (12): 2277–86. doi:10.1109/TVCG.2013.173.

“Abstract—Rankings are a popular and universal approach to structuring otherwise unorganized collections of items by computing a rank for each item based on the value of one or more of its attributes. This allows us, for example, to prioritize tasks or to evaluate the performance of products relative to each other. While the visualization of a ranking itself is straightforward, its interpretation is not, because the rank of an item represents only a summary of a potentially complicated relationship between its attributes and those of the other items. It is also common that alternative rankings exist which need to be compared and analyzed to gain insight into how multiple heterogeneous attributes affect the rankings. Advanced visual exploration tools are needed to make this process efficient. In this paper we present a comprehensive analysis of requirements for the visualization of multi-attribute rankings. Based on these considerations, we propose LineUp – a novel and scalable visualization technique that uses bar charts. This interactive technique supports the ranking of items based on multiple heterogeneous attributes with different scales and semantics. It enables users to interactively combine attributes and flexibly refine parameters to explore the effect of changes in the attribute combination. This process can be employed to derive actionable insights as to which attributes of an item need to be modified in order for its rank to change. Additionally, through integration of slope graphs, LineUp can also be used to compare multiple alternative rankings on the same set of items, for example, over time or across different attribute combinations. We evaluate the effectiveness of the proposed multi-attribute visualization technique in a qualitative study. The study shows that users are able to successfully solve complex ranking tasks in a short period of time.”

“In this paper we propose a new technique that addresses the limitations of existing methods and is motivated by a comprehensive analysis of requirements of multi-attribute rankings considering various domains, which is the first contribution of this paper. Based on this analysis, we present our second contribution, the design and implementation of LineUp, a visual analysis technique for creating, refining, and exploring rankings based on complex combinations of attributes. We demonstrate the application of LineUp in two use cases in which we explore and analyze university rankings and nutrition data. We evaluate LineUp in a qualitative study that demonstrates the utility of our approach. The evaluation shows that users are able to solve complex ranking tasks in a short period of time.”

Rick Davies comment: I have been a long-time advocate of the usefulness of ranking measures in evaluation, because they can combine subjective judgements with numerical values. This tool is focused on ways of visualising and manipulating existing data rather than on the elicitation of the ranking data (a separate and important issue of its own). It includes a lot of options for weighting different attributes to produce overall ranking scores (see the sketch below).

Free open source software, instructions, example data sets, introductory videos and more are available here.
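LineUp itself is an interactive tool, but the core calculation it visualises – an overall rank derived from a weighted combination of heterogeneous attributes, each normalised to a common scale – can be sketched in a few lines. The items, attribute values and weights below are invented for illustration.

import pandas as pd

# Invented items and attributes on very different scales
items = pd.DataFrame({
    "citations": [120, 45, 300],
    "teaching":  [4.2, 4.8, 3.9],
    "funding":   [2.0, 0.8, 5.5],
}, index=["Uni A", "Uni B", "Uni C"])

weights = {"citations": 0.5, "teaching": 0.3, "funding": 0.2}

# Min-max normalise each attribute to [0, 1] so the scales are comparable,
# then combine with the chosen weights and rank the weighted scores
normalised = (items - items.min()) / (items.max() - items.min())
score = sum(normalised[col] * w for col, w in weights.items())
print(score.rank(ascending=False).sort_values())

Changing the weights re-orders the ranking, which is exactly the sensitivity that LineUp’s interactive bars and slope graphs are designed to expose.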

Qualitative Comparative Analysis (QCA): An application to compare national REDD+ policy processes

 

Sehring, Jenniver, Kaisa Korhonen-Kurki, and Maria Brockhaus. 2013. “Qualitative Comparative Analysis (QCA) An Application to Compare National REDD+ Policy Processes”. CIFOR. http://www.cifor.org/publications/pdf_files/WPapers/WP121Sehring.pdf.

“This working paper gives an overview of Qualitative Comparative Analysis (QCA), a method that enables systematic cross-case comparison of an intermediate number of case studies. It presents an overview of QCA and detailed descriptions of different versions of the method. Based on the experience applying QCA to CIFOR’s Global Comparative Study on REDD+, the paper shows how QCA can help produce parsimonious and stringent research results from a multitude of in-depth case studies developed by numerous researchers. QCA can be used as a structuring tool that allows researchers to share understanding and produce coherent data, as well as a tool for making inferences usable for policy advice.

REDD+ is still a young policy domain, and it is a very dynamic one. Currently, the benefits of QCA result mainly from the fact that it helps researchers to organize the evidence generated. However, with further and more differentiated case knowledge, and more countries achieving desired outcomes, QCA has the potential to deliver robust analysis that allows the provision of information, guidance and recommendations to ensure carbon-effective, cost-efficient and equitable REDD+ policy design and implementation.”

Rick Davies comment: I like this paper because it provides a good how-to-do-it overview of different forms of QCA, illustrated in a step-by-step fashion with one practical case example. It may not be quite enough to enable one to do a QCA from the very start, but it provides a very good starting point.
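For readers who want a feel for the mechanics, here is a minimal crisp-set-QCA-style truth table built from invented binary data. It only groups cases into configurations of conditions and reports how consistently each configuration goes with the outcome; dedicated QCA software (fsQCA, Tosmana, or the R QCA package) would go on to Boolean-minimise the consistent rows.

import pandas as pd

# Invented crisp-set data: three binary conditions and one binary outcome
cases = pd.DataFrame({
    "ownership": [1, 1, 0, 1, 0, 0, 1, 1],   # national ownership of the policy process
    "coalition": [1, 0, 1, 1, 0, 1, 1, 0],   # presence of a coalition for change
    "funding":   [1, 1, 0, 0, 1, 0, 1, 0],   # substantial external funding
    "outcome":   [1, 0, 0, 1, 0, 0, 1, 0],
}, index=[f"country_{i}" for i in range(1, 9)])

# Truth table: one row per configuration, with the number of cases and the
# consistency (share of cases in that configuration showing the outcome)
truth_table = (
    cases.groupby(["ownership", "coalition", "funding"])["outcome"]
         .agg(n_cases="size", consistency="mean")
         .reset_index()
         .sort_values("consistency", ascending=False)
)
print(truth_table)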
