Overview: An open source document clustering and search tool

Overview is an open-source tool originally designed to help journalists find stories in large numbers of documents, by automatically sorting them according to topic and providing a fast visualization and reading interface. It’s also used for qualitative research, social media conversation analysis, legal document review, digital humanities, and more. Overview does at least three things really well.

  • Find what you don’t even know to look for.
  • See broad trends or patterns across many documents.
  • Make exhaustive manual reading faster, when all else fails.

Search is a wonderful tool when you know what you’re trying to find, and Overview includes advanced search features. It’s less useful when you start with a hunch or an anonymous tip. Or there might be many different ways to phrase what you’re looking for, or you could be struggling with poor quality material and OCR errors. By automatically sorting documents by topic, Overview gives you a fast way to see what you have.

In other cases you’re interested in broad patterns. Overview’s topic tree shows the structure of your document set at a glance, and you can tag entire folders at once to label documents according to your own category names. Then you can export those tags to create visualizations.

Rick Davies Comment: This service could be quite useful in various ways, including clustering sets of Most Significant Change (MSC) stories, or micro-narratives from SenseMaker type exercises, or collections of Twitter tweets found via a keyword search. For those interested in the details, and preferring transparency to apparent magic, Overview uses the k-means clustering algorithm, which is explained broadly here. One caveat: the processing of documents can take some time, so you may want to pop out for a cup of coffee while waiting. For those into algorithms, here is a healthy critique of careless use of k-means clustering, i.e. not paying attention to when its assumptions about the structure of the underlying data are inappropriate.
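For readers who want to see what k-means does with text, here is a minimal sketch in plain Python. It is not Overview's code: the word-count vectors, the choice of the first k documents as starting centroids, and the four sample documents are all assumptions made for illustration, and a real tool would use better text weighting and initialisation.

```python
import math
from collections import Counter

def vectorize(docs):
    """Turn each document into a word-count vector over a shared vocabulary."""
    counts = [Counter(d.lower().split()) for d in docs]
    vocab = sorted(set(w for c in counts for w in c))
    return [[c[w] for w in vocab] for c in counts]

def normalize(v):
    """Scale a vector to unit length so long documents do not dominate."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain k-means. Assumption: the first k vectors are the starting centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical sample documents, invented for this sketch.
docs = [
    "water quality testing in village wells",
    "school enrolment and teacher training",
    "well water testing and water quality reports",
    "teacher training improves school results",
]
labels = kmeans([normalize(v) for v in vectorize(docs)], k=2)
print(labels)
```

With these four documents the two water-related texts land in one cluster and the two school-related texts in the other. The critique mentioned above still applies: k-means will always return k clusters, whether or not the documents really fall into k groups.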

It is the combination of searching using keywords and the automatic clustering that seems to be the most useful, to me…so far. Another good feature is the ability to label clusters of interest with one or more tags.

I have uploaded 69 blog postings from my Rick on the Road blog. If you want to see how Overview hierarchically clusters these documents, let me know. I will then enter your email, which will let Overview give you access. It seems, so far, that there is no simple way of sharing access (but I am inquiring).

M&E Software: A List

Well, the beginnings of a list…

Please note: No guarantee can be given about the accuracy of information provided on the linked websites about the M&E software concerned, and its providers.

Stand alone systems

  • AidProject M+E for Donor-funded aid projects
  • Flamingo and Monitoring Organiser: “In order to implement FLAMINGO, it is crucial to first define the inputs (or resources available), activities, outputs and outcomes”
  • HIV/AIDS Data Capturing And Reporting Platform [Monitoring and Evaluation System]
  • Impact Execution Software, from Newdea: Bridging the gap between activities and outcomes for funders and programs
  • PacPlan: “Results-Based Planning, Monitoring and Evaluation Software and Process Solution”
  • Prome Web: A project management, monitoring and evaluation software. Adapted for aid projects in developing countries
  • Sigmah: “humanitarian project management open source software”

Online systems

  • Activity Info: “an online humanitarian project monitoring tool, which helps humanitarian organizations to collect, manage, map and analyze indicators. ActivityInfo has been developed to simplify reporting and allow for real-time monitoring”
  • AKVO: “a paid-for platform that covers data collection, analysis, visualisation and reporting”
  • DevResults: “web-based project management tool specially designed for the international development community.” Including M&E, mapping, budgeting, checklists, forms, and collaboration facilities.
  • Granity: “Management and reporting software for Not-for-profits Making transparency easy”
  • IndiKit: Guidance on SMART indicators for relief and development programmes
  • Kashana: An open sourced, web-based Monitoring, Evaluation & Learning (MEL) product for development projects and organisations
  • KI-PROJECTS™ MONITORING AND EVALUATION SOFTWARE:
  • Kobo Toolbox: “a free, more user-friendly way to deploy Open Data Kit surveys. It was developed with humanitarian purposes in mind, but could be used in various contexts (and not just for surveys). There is an Android data collection app that works offline”
  • Logalto: “Collaborative Web-Based Software for Monitoring and Evaluation of International Development Projects”
  • M&E Online: “Web-based monitoring and evaluation software tool”
  • Monitoring and Evaluation Online: Online Monitoring and Evaluation Software Tool
  • Systmapp: “cloud-based software that uses a patent-pending methodology to connect monitoring, planning, and knowledge management for international development organisations”
  • TCS Aid360: “a web-based system enabling digitisation for the social development sector. It is a modular solution that supports Grant Management, Planning, Monitoring & Evaluation”
  • Views: online monitoring, evaluation and reporting system
  • WebMo: Web-based project monitoring for development cooperation

Survey supporting software

  • EpiSurveyor lets anyone create an account, design forms, download them to phones, and start collecting data in minutes, for free.
  • EthnoCorder is mobile multimedia survey software for your iPhone
  • KoBoToolbox is a suite of tools for field data collection for use in challenging environments. Free and open source
  • Magpi 
  • Mobile data collection tools – Comparison matrix – 13 tools including above
  • Online Survey Comparison Chart, comparing six different services
  • Open Data Kit (ODK) is a free and open-source set of tools which help organizations author, field, and manage mobile data collection solutions
  • REDCap, a secure web application for building and managing online surveys and databases… specifically geared to support online or offline data capture for research studies and operations
  • Sensemaker(c) “links micro-narratives with human sense-making to create advanced decision support, research and monitoring capability in both large and small organisations.”

Sector specific tools

  • Mwater for WASH, which explicitly aims to make the data (in this case water quality). Free and open source
  • Adaptive Management Software for Conservation projects. https://www.miradi.org/

Qualitative data analysis

  • Dedoose, a cross-platform app for analyzing qualitative and mixed methods research with text, photos, audio, videos, spreadsheet data and more
  • Nvivo, powerful software for qualitative data analysis.
  • HyperRESEARCH “…gives you complete access and control, with keyword coding, mind-mapping tools, theory building and much more”.

Data mining / predictive modeling

  • RapidMiner Studio. Free and paid for versions. Data Access (Connect to any data source, any format, at any scale), Data Exploration (Quickly discover patterns or data quality issues), Data Blending (Create the optimal data set for predictive analysis), Data Cleansing (Expertly cleanse data for advanced algorithms), Modeling (Efficiently build and deliver better models faster), Validation (Confidently & accurately estimate model performance)
  • BigML. Free and paid for versions. Online service. “Machine learning made easy”
  • EvalC3: Tools for exploring and evaluating complex causal configurations, developed by Rick Davies (Editor of MandE NEWS). Free and available with Skype video support

Program Logic Modelling

  • DoView – Visual outcomes and results planning
  • Dylomo: “a free* web-based tool that you can use to build and present program logic models that you can interact with”
  • IdeaTree – Simultaneous Collaboration & Brainstorming Using Mind Maps
  • Logframer 1.0 “a free project management application for projects based on the logical framework method”
  • Theory maker: a free web app by Steve Powell for making any kind of causal diagram, i.e. a diagram which uses arrows to say what contributes to what.
  • TOCO – Theory of Change Online. A free version is available.
  • DCED’s Evidence Framework – more a way of using a website than software as such, but definitely an approach that is replicable by others.
  • yEd – diagram editor that can be used to generate drawings of diagrams.  FREE. PS: There is now a web-based version of this excellent network drawing application

Excel-based tools

  • EvalC3: Tools for exploring and evaluating complex causal configurations, developed by Rick Davies (Editor of MandE NEWS). Free and available with Skype video support

Uncategorised yet

  • OpenRefine: Formerly called Google Refine, this is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
  • Overview is an open-source tool originally designed to help journalists find stories in large numbers of documents, by automatically sorting them according to topic and providing a fast visualization and reading interface. It’s also used for qualitative research, social media conversation analysis, legal document review, digital humanities, and more.

Other lists

Other other

If you have software, or lists of software, which you would like to see added here, please use the Comment facility below.

Social Network Analysis software: A list

A. Software I have some familiarity with:

UCINET & NetDraw (a combined package)

  • Easy to import data from Excel
  • Has a huge range of abilities to manipulate and edit the raw data
  • Has an online support group (Yahoo Groups)
  • There is a detailed text on how to use it
  • Files can be read by many other software packages
  • Not very expensive, and there is a free trial period
  • Undergoing continuous development
  • Widely used
  • Not easy to draw network diagrams on screen
  • Steep learning curve, many more bells and whistles than you may need
  • No easy to use introductory texts
  • Not easy to edit node and link attribute data on the NetDraw screen
  • PS: See Louise Clark’s very useful and detailed guide to working with NetDraw; and “A Brief Guide to Using NetDraw” by Steve Borgatti; and “NETDRAW – BASIC: A Practical Guide to Visualising Social Networks” by ONA Surveys

Visualyzer

  • Perhaps my favorite, because it is easy to draw and edit networks on screen, which is very useful in workshop settings
  • Attributes of nodes and links can be easily edited and displayed
  • Can import and export UCINET data
  • Very user-friendly manual
  • Free trial period
  • Now available at a more reasonable price!
  • No online support group
  • Does not seem to be undergoing continuous development

yED Graph Editor

  • Very good for network drawing
  • Many options for layouts
  • Can export files to work as web pages
  • Nodes can include weblinks, allowing quick access to much more information about each node
  • Free
  • Latest version (3.5) can now open data from Excel worksheets, in matrix, edgelist (relationships) and nodelist (actors) forms, including as many attributes for the actors and relationships as needed. It seems it will import both one-mode and two-mode (adjacency and affiliation) matrices. This is a major improvement.
  • They are working on capacity to export back to Excel, and ability to search actors and relationships by attribute. Both will be very useful
  • yED is rapidly moving up my list of most favored SNA software packages
  • Now also available as an online version: yED Live
  • Limited analysis capacity
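The matrix and edgelist forms mentioned above hold the same information in different layouts. As a rough illustration (this is not yEd's import code, and the names and ties are made up), the following sketch turns an edgelist into a one-mode adjacency matrix:

```python
def edge_list_to_matrix(edges, directed=False):
    """Convert (source, target) pairs into a 0/1 adjacency matrix.

    Returns the sorted node names and the matrix, where matrix[i][j] == 1
    means there is a tie from nodes[i] to nodes[j].
    """
    nodes = sorted({name for edge in edges for name in edge})
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
        if not directed:
            # Undirected ties are recorded in both directions.
            matrix[index[target]][index[source]] = 1
    return nodes, matrix

# Hypothetical edgelist: one row per relationship, as it might sit in a worksheet.
edges = [("Ann", "Bob"), ("Bob", "Cat"), ("Ann", "Cat"), ("Cat", "Dan")]

nodes, matrix = edge_list_to_matrix(edges)
for name, row in zip(nodes, matrix):
    print(name, row)
```

An edgelist is usually easier to type and to extend with relationship attributes (one extra column each), while the matrix form makes it easy to see at a glance who is connected to whom.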

Microsoft NodeXL

  • Free, works as a plug-in to Excel 2007
  • Undergoing continuous development
  • Online support group
  • All node and link attribute data is visible and easy to edit in Excel sheets, which is great
  • Nodes can include weblinks, I think
  • There is a useful users guide here
  • You can’t draw the network direct on the screen,
    • But by using the Excel sheet immediately below the screen you can add nodes and links, and edit their attributes, very easily
  • I have had difficulty in importing yEd (GraphML) files
    • PS: They report this is being addressed
  • The layout options (different algorithms) seem quite limited
  • I don’t yet know as much about it as the other packages above

C-IKNOW

  • I attended a presentation on C-IKNOW at the 2010 INSNA conference and found this package very impressive, for two broad reasons:
    • User-friendliness
    • Sophisticated range of capacities
  • This is an online service that is open to use by anyone, free of charge
  • Data can be imported, exported and generated by an associated online survey mechanism
  • There are multiple videos showing how different aspects of the package work, along with a detailed downloadable user guide
  • Development is ongoing and led by Noshir Contractor, a very smart person, and co-author of Theories of Communication Networks

Discourse Network Analyzer

  • “software which combines social network analysis and category-based content analysis. After applying categories to text portions, you can automatically extract two-mode networks or one-mode co-occurrence networks in several file formats. There are also some algorithms for longitudinal analysis.”
  • Exports to Excel (in CSV format), DL files (UCINET), and GraphML files (visone, yEd etc)
  • Free
  • Looks useful but I have yet to try it out on my own data
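To make the difference between two-mode and one-mode co-occurrence networks concrete, here is a small sketch. It is only an illustration of the general idea, not Discourse Network Analyzer's own algorithm, and the actors and categories are invented. The two-mode data links actors to the categories applied to their statements; the one-mode projection links two actors whenever they share at least one category, weighted by how many they share.

```python
from itertools import combinations

# Hypothetical two-mode (affiliation) data: actor -> categories coded in their text.
affiliations = {
    "Actor A": {"climate", "energy"},
    "Actor B": {"energy", "transport"},
    "Actor C": {"climate", "energy", "transport"},
}

def project_to_one_mode(affiliations):
    """Build a one-mode co-occurrence network from two-mode data.

    Returns a dict mapping each actor pair to the number of categories
    they have in common; pairs with nothing in common are left out.
    """
    weights = {}
    for a, b in combinations(sorted(affiliations), 2):
        shared = len(affiliations[a] & affiliations[b])
        if shared:
            weights[(a, b)] = shared
    return weights

co_occurrence = project_to_one_mode(affiliations)
for (a, b), weight in co_occurrence.items():
    print(a, "--", b, "shared categories:", weight)
```

The same projection could be run the other way round, linking categories that are used by the same actors.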

Visone

  • Patrick Kenis describes this as “very intuitive programme which can be used instantly in consultancy settings”
  • Free
  • Easy to draw networks live on screen
  • Continuous development, but not so often as UCINET

Gephi (last comments added 21 April 2011)

  • Open source (free)
  • Undergoing continuous development, but not so often as UCINET
  • Very sophisticated graphics, the emphasis is on visualisation as a means of exploratory data analysis
  • Capable of visualising very large networks quickly
  • Dynamic views of networks, as they change over time
  • Many filtering options
  • As in NodeXL, has a Data Table view to browse and edit data
  • Drawing networks on the screen is possible, but not so intuitive
  • Imports GraphML files (e.g. as used by yED, NodeXL), vna (as used by Netdraw), csv (used by Excel etc). Exports as csv (for Excel etc) and GraphML.
  • Has Plugins e.g. Social Network Data Import
  • Looks like it could become very good, in time

Others not yet examined in any detail

Inflow: [Not yet tested, but looks good]

Social Networks Visualizer (SocNetV) [Not yet tested, but used by Valdis Krebs]

Cytoscape Thomas Delahais says: “I’ve been using consistently Cytoscape, which was designed for neuro-biological analysis but works very well for social sciences! Cytoscape is free, open source and you should complete it with the Max Planck Analyser Plugin, which includes all or most of the usual indicators (diameter, shortest path, etc.) in a unique interface (free for non-commercial use if I remember well). Cytoscape needs some formatting first but then it is very easy to use, very easy to draw on screen too. As a sidenote this is the software I picked when I decided that Ucinet was too complicated for transferring this competency to my colleagues”

SocioWorks “is an innovative set of web tools for the online application of Social Network Analysis (SNA) methods to collect and analyze data regarding social relationships, from individual to institution to national levels.” (posted 2013 05 02)

B. Lists of software most of which I don’t know about, maintained by:

  1. KM4DEV list
  2. Wikipedia list
  3. International Network for Social Network Analysis list
  4. Top 10 Open-Source Social Network Development Platforms
  5. Mark Round’s “SNA Tools and Formats diagram – updated” showing how different software packages are linked by use of the same data formats

The number of social network analysis packages is exploding, a bit like the Cambrian explosion of organic life. No software package has yet achieved dominance because of its ability to meet a wide variety of needs.

C. Online SNA software

  • IdeaTree was not developed as SNA software, but in practice provides many of the same functions, in terms of visualisation. Key features: (a) it supports online collaborative development of network diagrams, (b) it seems quite user friendly, (c) data can be exported as XML (which can be converted elsewhere into GraphML) and as PDF documents

PS April 2011: GraphML is a format for storing network data, used by yED, Gephi, and others. The GraphML Primer provides a simple introduction to its use.
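For a feel of what a GraphML file contains, here is a minimal sketch that writes a tiny network using only the Python standard library. The node names, the "label" attribute and the helper name build_graphml are assumptions made for this example. Individual packages such as yEd add their own extensions for layout and display, so how they show a plain attribute like this may differ.

```python
import xml.etree.ElementTree as ET

# The standard GraphML namespace.
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def build_graphml(nodes, edges):
    """Return GraphML text for an undirected network.

    nodes: dict of node id -> label; edges: list of (source id, target id).
    """
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    # Declare one string attribute for nodes, stored under the key "d0".
    ET.SubElement(root, "key", {
        "id": "d0", "for": "node",
        "attr.name": "label", "attr.type": "string",
    })
    graph = ET.SubElement(root, "graph", id="G", edgedefault="undirected")
    for node_id, label in nodes.items():
        node = ET.SubElement(graph, "node", id=node_id)
        data = ET.SubElement(node, "data", key="d0")
        data.text = label
    for i, (source, target) in enumerate(edges):
        ET.SubElement(graph, "edge", id=f"e{i}", source=source, target=target)
    return ET.tostring(root, encoding="unicode")

# Hypothetical three-actor network.
nodes = {"n0": "Donor", "n1": "NGO", "n2": "Community group"}
edges = [("n0", "n1"), ("n1", "n2")]

graphml_text = build_graphml(nodes, edges)
print(graphml_text)
```

Saving the printed text to a file with a .graphml extension gives something that GraphML-aware packages should be able to open, though I have not checked this particular output against each of the packages listed above.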