The Katine Challenge: How to analyse 540+ stories about a rural development project

The Katine Community Partnerships Project in Soroti District, Uganda, funded by the Guardian and Barclays and implemented by AMREF, is exceptional in some respects and all too common in others.

It is exceptional in the degree to which its progress has been very publicly monitored since it began in October 2007. Not only have all project documents been made publicly available via the dedicated Guardian Katine website, but resident and visiting journalists have posted more than 540 stories about the people, the place and the project. These stories provide an invaluable, in-depth and dynamic picture of what has been happening in Katine, unparalleled by anything I have seen in any other development aid project.

On the flip side, the project is all too common in the kinds of design and implementation problems it has experienced, along with its fair share of unpredictable and very influential external events, including dramatic turnarounds in various government policies, plus the usual share of staffing and contracting problems.

The project has now completed its third year of operation and is heading into its fourth and final year, one more year than originally planned.

I have a major concern. It is during this final year that more knowledge about the project will be available than ever before, but at the same time its donors, and perhaps various staff within AMREF, will be becoming more interested in new events appearing over the horizon. For example, the Guardian will cease its intensive journalistic coverage of the project from this month, and attention is now focusing on their new international development website.

So, I would like to pose an important challenge to all the visitors to the Monitoring and Evaluation NEWS website, and the associated MandE NEWS email list:

How can the 540+ stories be put to good use? Is there some form of analysis that could be made of their contents that would help AMREF, the Guardian, Barclays, the people of Katine, and all of us learn more from the Katine project?

To help, I have uploaded an Excel file listing all the stories since December 2008, with working hypertext links. I will try to progressively extend this list back to the start of the project in late 2007. This list includes copies of all the progress reports, review and planning documents that AMREF has given the Guardian to be uploaded onto their website.
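For anyone who wants to work with the list programmatically, a short Python sketch like the one below could load and sort it. The filename and column names used here are assumptions, not a description of the actual file.

```python
# A minimal sketch of loading the story list with pandas.
# "katine_stories.xlsx", "Date", "Title" and "URL" are hypothetical names;
# check the actual file for its real layout.
import pandas as pd

stories = pd.read_excel("katine_stories.xlsx")
print(f"{len(stories)} stories listed")

# Sort chronologically, if a "Date" column is present
if "Date" in stories.columns:
    stories["Date"] = pd.to_datetime(stories["Date"], errors="coerce")
    stories = stories.sort_values("Date")
```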

If you have any questions or comments, please post them below as Comments on this posting.

What would be useful in the first instance is ideas about plans or strategies for analysing the data, and then volunteers to actually implement one or more of these plans.

PS: My understanding is that the data is by definition already in the public domain, and therefore anyone could make use of it. However, that use should be fair and not for profit. What we should be searching for here are lessons or truths that could be seen as having wider applicability, based as far as possible on sound argument and good evidence.

14 thoughts on “The Katine Challenge: How to analyse 540+ stories about a rural development project”

  1. Rick, I have recently heard of a software package that may fit the bill – designed for meta-analysis of reports and apparently able to cope with a large number of documents. I’ll put you in touch with the person who told me about it.

    Louise

  2. Hi Rene
    I know of Sensemaker, but I am not sure it is appropriate in this situation. With Sensemaker you want the story providers to self-index (rate/categorise) their stories to enable an indigenous structure to emerge, as a starting point in your own analysis. But in this case the stories have already been collected and were collected by a small number of journalists.

  3. Self-indexation is indeed one of the main features of Sensemaker, but I remember a case in which Sensemaker was used in a project where Singaporean policemen indexed hundreds of police cases afterwards. You could ask Irene Guijt for advice on this.

    Good luck

  4. Re use of Sensemaker, David Snowden has suggested: “Well, one way would be to present groups of the stories (individually) to groups of experts or people in equivalent positions for signification/indexing. That would allow analysis and also comparisons of the ways in which different people indexed overlapping material.”

  5. Hi Rick,
    I would suggest you employ NVivo, a software package for qualitative analysis, to analyze those 540+ stories. The challenge is of course selecting the relevant sentences/phrases in every story that could later be categorized into common themes (probably similar to ‘indexing’). Later on, once a large table of, say, respondents versus themes has been produced, you could further analyze it using cluster analysis or multidimensional scaling to categorize, e.g., views on the positive and negative impacts of the project, indifferent views among respondents, and other views. (A rough open-source sketch of the clustering step follows below.)
    My experience using a similar approach (with 40 respondents) showed that it was quite sensitive in distinguishing the different views of both direct and indirect beneficiaries of a tourism development project in West Bali, Indonesia.
    Hope this helps.
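    NVivo is proprietary software, but the clustering step suggested above can be roughly approximated with open-source tools. A sketch using scikit-learn (the story texts below are placeholders, and the choice of eight clusters is arbitrary):

```python
# A rough approximation of the theme-clustering step: TF-IDF vectors plus
# k-means. This is not NVivo; it only sketches the same general idea.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Placeholder texts; in practice, load the 540+ story bodies here
story_texts = [
    "The new health clinic in Katine opened this week...",
    "Farmers in Katine discussed this season's cassava harvest...",
    "A village meeting debated the borehole maintenance fees...",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(story_texts)

n_clusters = min(8, len(story_texts))  # 8 themes is an arbitrary choice
km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
labels = km.fit_predict(X)

# The highest-weighted terms in each cluster centre hint at its theme
terms = vectorizer.get_feature_names_out()
for i, centre in enumerate(km.cluster_centers_):
    top = centre.argsort()[-5:][::-1]
    print(f"cluster {i}:", ", ".join(terms[j] for j in top))
```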

  6. As far as tools go, ATLAS.ti is pretty fantastic for serving all manner of interrogation and can certainly handle large datasets.

    Of course, that doesn’t help us with how we interrogate the data, and to what end…

  7. A few fairly random ideas:
    1) What about starting by simply making a word cloud? (A minimal sketch follows this comment.)
    2) Like others above, my first thought was to use NVivo.
    3) How about working with groups of people from Katine to see what they make of them / what their analysis is. Could turn the text into audio, and from English into Ateso and Kumam. Probably not a viable strategy for all 540!
    4) Ask journalism students, and development studies students, in different countries (UK, Uganda, and others) to analyse subsets and see what they come up with both in terms of methodologies and in terms of analysis. Offer prizes. (I’d be interested in seeing what differences in analysis there were between professions and between countries).
    5) Ask playwrights to have a go at making sense of them by dramatising some.
    Good luck!
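    Idea (1) above is easy to prototype. A minimal sketch using the third-party wordcloud package (the texts shown are placeholders; extracting the actual story texts from the Guardian site is a separate step):

```python
# A minimal word-cloud sketch (pip install wordcloud matplotlib).
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Placeholder texts standing in for the 540+ stories
story_texts = [
    "Katine health clinic opens to long queues of patients...",
    "Teachers in Katine discuss school attendance and exam results...",
]

cloud = WordCloud(width=800, height=400, background_color="white")
cloud.generate(" ".join(story_texts))

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```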

  8. My proposal (which would need someone with technical expertise to implement it)

    1. Set up an online survey, using Survey Monkey, or similar, and invite participation, e.g. from the MandE NEWS email list

    2. Have a script within the survey form that would display, at random, one of the 540+ stories. Ask the participant to read this story.

    3. Then present the participant with a list of categories they could decide apply or don’t apply to the story (as many or as few as they see fit)

    4. Then allow participants to add their own text comments on the story

    5. Then collect some minimum data about the participants

    The classification data on the stories (step 3 above) can then be used to create a network visualisation of how all the various stories are connected. Different clusters of stories will become identifiable that would not easily be found through a directed search. See my post on Networks of self-categorised stories.
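    To make that last step concrete, here is a small sketch of how such classification data might be turned into a story network, assuming the survey results can be exported as story-to-category pairs (all identifiers and data below are hypothetical):

```python
# Build a network in which stories are linked whenever readers placed them
# in at least one common category; edge weight counts the shared categories.
from itertools import combinations
import networkx as nx  # pip install networkx

classifications = {  # hypothetical survey output: story id -> categories
    "story_012": {"health", "success"},
    "story_087": {"health", "politics"},
    "story_203": {"education", "success"},
}

G = nx.Graph()
for (a, cats_a), (b, cats_b) in combinations(classifications.items(), 2):
    shared = cats_a & cats_b
    if shared:
        G.add_edge(a, b, weight=len(shared), shared=sorted(shared))

# Clusters of connected stories can then be listed, or drawn with nx.draw()
for cluster in nx.connected_components(G):
    print(sorted(cluster))
```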

  9. Hi Rick,

    This is fascinating. Some great ideas. Just building on your classification/connections idea, you could set up different kinds of connections that may exist – e.g. characters, places, activities, events, behaviour changes, successes, failures…

    The idea being that you don’t provide the categories beforehand (you won’t necessarily know who the characters are or the influential events) but you suggest what kind of categories would be appropriate and let the reviewers build the taxonomy for you.

    Perhaps you have a few rounds where, at the beginning, the taxonomy is completely open, and then it narrows down through subsequent stages.

    Of course the analysis is going to be very dependent on the person reviewing the story, but perhaps after a while a pattern will emerge that can help you tackle the analysis in a more systematic way, i.e. perhaps a few archetypal or pivotal stories will emerge that can then be reviewed by a community or expert panel.

    Look forward to seeing how this method evolves.

    Simon

  10. With tagging systems, people build the taxonomy as they go along. But (a) one- or two-word tags have difficulty conveying much complex meaning, and (b) the array of tags available will change as more and more people participate.
    To do a network visualisation there has to be a common set of attributes available to all stories (tags or classifications). For this reason, my preference is for a pre-existing taxonomy, but one where people can place a story in one or more categories. Even with 10 categories this means there is a huge combinatorial space within which people can place stories (see the small illustration after the list below). And on top of that you provide the respondent with a comment space, to describe what is specific to the one location in that space they have defined by their choices.
    Some of the types of categories could be like those you mentioned:
    – is this story about success or failure?
    – is this story about health, education, livelihoods, ….
    – is this story about morality?
    – is this story about politics?
    – is this story about gender equity?
    – is this story about the importance of government taking the lead?
    – is this story about the importance of people organising independently of government?
    etc
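    To put a number on that combinatorial space: ten yes/no categories already define 2^10 = 1024 distinct positions a story can occupy, and the overlap between two stories’ “yes” sets gives a simple similarity measure. A tiny illustration (the category labels are adapted from the list above):

```python
# Each yes/no category doubles the number of possible positions a story
# can occupy; with 10 categories that is 2**10 = 1024.
categories = [
    "success or failure", "health/education/livelihoods", "morality",
    "politics", "gender equity", "government taking the lead",
    "people organising independently of government",
]
print(f"{2 ** len(categories)} possible positions for this shorter list")

def jaccard(a: set, b: set) -> float:
    """Share of 'yes' categories two stories have in common (0 to 1)."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

print(jaccard({"health/education/livelihoods", "success or failure"},
              {"health/education/livelihoods", "politics"}))  # ~0.33
```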


