Monitoring and Evaluation NEWS – Page 2 – A news service focusing on developments in monitoring and evaluation methods relevant to development programmes with social development objectives. Managed by Rick Davies, since 1997

The Fallacy of AI Functionality

Evaluators should have a basic working knowledge of how to evaluate algorithms used to manage human affairs (law, finance, social services, etc) because algorithm designs embody human decisions and can have large scale consequences. For this reason I recommend:

Raji ID, Kumar IE, Horowitz A, et al. (2022) The Fallacy of AI Functionality. In: 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul Republic of Korea, 21 June 2022, pp. 959–972. ACM. DOI: 10.1145/3531146.3533158.

Deployed AI systems often do not work. They can be constructed haphazardly, deployed indiscriminately, and promoted deceptively. However, despite this reality, scholars, the press, and policymakers pay too little attention to functionality. This leads to technical and policy solutions focused on “ethical” or value-aligned deployments, often skipping over the prior question of whether a given system functions, or provides any benefits at all. To describe the harms of various types of functionality failures, we analyze a set of case studies to create a taxonomy of known AI functionality issues. We then point to policy and organizational responses that are often overlooked and become more readily available once functionality is drawn into focus. We argue that functionality is a meaningful AI policy challenge, operating as a necessary first step towards protecting affected communities from algorithmic harm.

CONTENTS
1. Introduction
2. Related work
3. The functionality assumption
4. The many dimensions of dysfunction
4.1 Methodology
4.2 Failure taxonomy
4.2.1 Impossible Tasks
Conceptually Impossible.
Practically Impossible
4.2.2 Engineering Failures
Model Design Failures
Model Implementation Failures
Missing Safety Features
4.2.3 Deployment Failures
Robustness Issues
Failure under Adversarial Attacks
Unanticipated Interactions
4.2.4 Communication Failures
Falsified or Overstated Capabilities
Misrepresented Capabilities
5 DEALING WITH DYSFUNCTION: OPPORTUNITIES FOR INTERVENTION ON FUNCTIONAL SAFETY
5.1 Legal/Policy Interventions
5.1.1 Consumer Protection
5.1.2 Products Liability Law.
5.1.3 Warranties
5.1.4 Fraud
5.1.5 Other Legal Avenues Already Being Explored
5.2 Organizational interventions
5.2.1 Internal Audits & Documentation.
5.2.2 Product Certification & Standards
5.2.3 Other Interventions
6 CONCLUSION : THE ROAD AHEAD

Using ChatGPT as a tool for the analysis of text data

Warning: This post is a bit of a ramble, with new sections being added to the end, as further thoughts come to mind.

1. Introduction to ChatGPT, by ChatGPT
2. See also this newer page Exploring the text analytic capacities of GPT4
  
  Introduction to ChatGPT, by ChatGPT
  
  I asked ChatGPT this question: “Describe ChatGPT in 200 words, for a general, non-specialist reader”
  
  The response: “ChatGPT is a language model developed by OpenAI. It is based on a deep learning technique called transformer, which allows it to generate human-like text. It has been trained on a large dataset of text from the internet, and can be fine-tuned for specific tasks such as language translation, text summarization, and text generation. ChatGPT is capable of understanding and generating text in multiple languages, and can respond to a wide range of questions, prompts, and commands. It can also generate creative content such as stories and poetry. The model can be used for various applications like virtual assistants, chatbots, automated writing and more. However, it is important to note that while ChatGPT is highly advanced, it is not a human and may not always understand context or nuances in language and may produce errors or biases present in the training data”.
  
  Here is an introductory video providing a beginner’s guide to accessing and using ChatGPT: https://www.youtube.com/watch?v=JTxsNm9IdYU
  
  Using ChatGPT to analyse bodies of text
  
  Why: This is my current interest, where I think ChatGPT is already useful in its current form
  
  The challenge: Evaluators often have to deal with large volumes of text data, including
  - official documents describing policies and programmes,
  - records of individual interviews and group discussions.
  Manual analysis of this material can be very time consuming. In recent years a number of different software packages have been developed which are useful for different forms of content analysis. These are generally described as text analytics, text mining and Natural Language Processing (NLP) methods. I have experimented with some of these methods, including clustering tools like Topic Modelling, sentiment analysis methods, and noun and key word extraction tools.
  
  From my limited experience to date, ChatGPT seems likely to leave many of these tools behind, primarily on criteria such as flexibility and usability. I am less certain about criteria such as transparency of process and replicability of results. I need to give these more of my attention.
  
  How it works – from a user perspective
  
  Below is the user interface, seen after you have logged on. You can see the prompt I have written at the top of the white section. Underneath is the ChatGPT response. I then have two options.
  - To click on “Regenerate Response” to create an alternative body of text to the one already shown. This can be done multiple times, until new variant responses are no longer generated. It is important to use this option because in your specific context one response may be more suitable than others, and ChatGPT won’t know the details of your context, unless it is described in the prompt
  - To create a new prompt, such as “Simplify this down to 200 words, using less technical language”. The dialogic process of writing prompts, reading results, writing prompts and reading results can go on as long as needed. A point to note here is that ChatGPT remembers the whole sequence of discussion, as context for the most current prompt. But you can start a new chat at any point, and when you do so the old one will remain listed in the left side panel. But it will no longer be part of ChatGPT’s current memory, when responding to the current prompt.
  There is a similarity between these two functions and March’s (1991) distinction between two complementary approaches to learning: exploration and exploitation. Regeneration is the more exploratory option, and refining prompts is the more exploitative one.
  
  But bear in mind that ChatGPT is using data that was available up to 2021. It does not (yet) have real time access to data on the internet. When it does, that will be another major step forward. Fasten your seat belts!
  
  The design of prompts
  
  This is the key to the whole process. Careful design of prompts will deliver more rewards. The more clearly specified your request, the more likely you will see results which are useful.
  
  I will now list some of the prompts, and kinds of prompts, I have experimented with. These have all been applied to paragraphs of text generated by a ParEvo exercise (which I can’t quote here for privacy reasons).
    - Text summarisation
      - Summarize the following text in 300 words or less
      - Write a newspaper headline for the events described in each of the three paragraphs
    - Differentiation of texts
      - Identify the most noticeable differences between the events described in the following two paragraphs of text
      - Identify three differences between the two paragraphs of text
    - Pile sorting
      - Sort these three paragraphs of text into two piles of paragraphs, and describe what you think is the most significant difference between the two sets of paragraphs, in terms of the events they are describing.
    - Evaluation of content on predefined criteria
      - All three paragraphs describe imagined futures. Rank these three paragraphs in terms of their optimism, and explain why they have been ranked this way
      - All three paragraphs already provided above describe imagined futures. Rank these three paragraphs in terms of their realism i.e how likely the events in the paragraphs could actually happen. Then explain why they have been ranked this way
    - Evaluation of content on unspecified criteria
      - For each of the three paragraphs provided above, list 5 adjectives that best describe the events in those paragraphs
    - Actor extraction
      - Describe the types of actors mentioned in each of the two piles. By actors I mean people, groups, organisations and states
      - Using three bullet points, list the people, groups, organisations and countries named in each of these three paragraphs of text.
    - Relationship extraction
      - Using the list of actors already generated, identify alliances (e.g. commonalities of interests) that exist between any of these actors. List these in order of the strength of evidence that an alliance exists
    - Network mapping
      - Create an affiliation matrix based on the above data about actors and their relationships, where actors are listed row by row, and the individual alliances are listed by column. The latter can be numbered or named. The cell contents will be 0 or 1, where 1 represents a specific actor’s membership of a specific alliance and 0 represents non-membership
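The affiliation matrix requested in the last prompt can also be built, or checked, outside ChatGPT. The following is a minimal Python sketch, assuming the actor and alliance data have already been extracted into a simple mapping. The actor names, alliance labels and function names are all invented for illustration and do not come from the ParEvo texts.

```python
import csv
import io

# Hypothetical data: which alliances each actor belongs to (invented for illustration).
memberships = {
    "Actor A": ["Alliance 1", "Alliance 2"],
    "Actor B": ["Alliance 1"],
    "Actor C": ["Alliance 2", "Alliance 3"],
}


def affiliation_matrix(memberships):
    """Return (actors, alliances, rows); rows[i][j] is 1 if actor i is in alliance j, else 0."""
    actors = sorted(memberships)
    alliances = sorted({name for group in memberships.values() for name in group})
    rows = [
        [1 if alliance in memberships[actor] else 0 for alliance in alliances]
        for actor in actors
    ]
    return actors, alliances, rows


def matrix_to_csv(actors, alliances, rows):
    """Render the matrix as CSV text: actors by row, alliances by column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["actor"] + alliances)
    for actor, row in zip(actors, rows):
        writer.writerow([actor] + row)
    return buffer.getvalue()


actors, alliances, rows = affiliation_matrix(memberships)
matrix_csv = matrix_to_csv(actors, alliances, rows)
print(matrix_csv)
```

A matrix laid out this way (actors by row, alliances by column, 0/1 cells) is the two-mode format the prompt describes. Whether a particular social network analysis package accepts this exact CSV layout would need to be checked against that package's import documentation.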
3. 2023 03 03 Postscript. Check out Pickaxe. Here you can design a simple (no code required) app that you and others can use to carry out specific AI based inquiries. Such as sentiment analysis or actor extraction. Try my Pickaxe AI Pile Sorting Exercise Vs 2
  
  Extracting causal theories – Steve Powell’s explorations
  
  Steve Powell, a colleague of mine (who developed the Causal Map app), has been testing the following prompts, to see whether ChatGPT can usefully identify causal connections between events in a text:
  - Please look at this sentence and identify the cause and the effect.
  - Great, now can you just identify cause and effect pairs in this sentence
  - Please do that again but with only the cause and effect pairs you are most sure about
  - Thanks, can you do that again so that, when a causal factor (cause or effect) is part of a longer chain, you use the same name for it when you report it as effect in one pair and as cause in another pair?
  I have not quoted the original text that was presented for analysis, nor all the results. But here is the response to the last prompt:
    - Sure, here are the cause and effect pairs in the text that I am most sure about:
      Cause: “Organisation 1” Effect: “Changed behaviors in caring for health”
      Cause: “Changed behaviors in caring for health” Effect: “Considerably healthy”
      Cause: “Prohibitions on going to the hospital” Effect: “Going to church”
      Cause: “Use of mosquito nets” Effect: “Decrease in getting sick”
      Cause: “Changed behaviors in caring for health” Effect: “Taking three baths per day”
      Cause: “Changed behaviors in caring for health” Effect: “Washing hands after visiting the washrooms”
      Note that I’ve used the same name “Changed behaviors in caring for health” for the cause in the first and fifth pair and as the effect in the second pair.
  Steve also notes that “…another important challenge I forgot to mention is to get it to abstract a little, so that it uses the same labels for similar concepts across paragraphs and cases.” This is a task that I also felt needs further exploration through better designed prompts, to see if ChatGPT can do concept abstraction or not.
  
  He also reports that he will work on a prompt to get it to format its output for easier upload into Causal Map. That could be very useful for many people, including myself. I have been doing the same kind of prompt development with text descriptions of network relationships, so that they can be converted into a data format that can be imported into social network analysis software – see further above.
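Until such a prompt exists, responses in the “Cause: … Effect: …” layout quoted above can be converted with a few lines of code. The following is a minimal Python sketch, assuming the response keeps exactly that layout. The `source` and `target` column names are my own assumption for a generic edge list, not a documented Causal Map import format.

```python
import csv
import io
import re

# Matches one 'Cause: "X" Effect: "Y"' pair, with either curly or straight quotes.
PAIR_PATTERN = re.compile(r'Cause:\s*[“"](.+?)[”"]\s*Effect:\s*[“"](.+?)[”"]')

# Three of the pairs from the ChatGPT response quoted above.
response = """\
Cause: “Organisation 1” Effect: “Changed behaviors in caring for health”
Cause: “Changed behaviors in caring for health” Effect: “Considerably healthy”
Cause: “Use of mosquito nets” Effect: “Decrease in getting sick”
"""


def extract_pairs(text):
    """Return a list of (cause, effect) tuples found in the response text."""
    return PAIR_PATTERN.findall(text)


def pairs_to_csv(pairs):
    """Render the pairs as a two-column edge list (column names are an assumption)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["source", "target"])
    writer.writerows(pairs)
    return buffer.getvalue()


pairs = extract_pairs(response)
edge_list_csv = pairs_to_csv(pairs)

# Labels that appear as an effect in one pair and a cause in another form a causal chain.
chained = {effect for _, effect in pairs} & {cause for cause, _ in pairs}
print(edge_list_csv)
print(chained)
```

The `chained` set only finds a link when ChatGPT has used an identical label on both sides, which is why Steve's last prompt, asking for consistent names, matters for any later network or causal mapping.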
  
  PS: Read more of Steve’s thoughts and experiments with ChatGPT and causality on his blog here: Causal Map: Making Connections
  
  Other applications that may be useful to evaluators
  
  The one feature that seems to have received the least attention in the generalist media is how ChatGPT and its related product, OpenAI Codex, could have a major impact on software development. “Proficient in more than a dozen programming languages, Codex can now interpret simple commands in natural language and execute them on the user’s behalf” – executing them meaning writing relevant functional computer code.
  
  In response to a query I made, ChatGPT replied:
  
  “As an AI language model, I can help users with coding suggestions and provide information on a wide range of programming languages. Some of the most common programming languages that I can assist with include:
  
  Python: a high-level, interpreted language known for its simplicity and ease of use, often used in data science and machine learning.
  
  Java: a popular object-oriented language used for developing large-scale applications.
  
  JavaScript: a client-side scripting language used for creating interactive web pages and web applications.
  
  C++: a powerful, high-performance language used for building operating systems, games, and other resource-intensive applications.
  
  Ruby: a dynamic, object-oriented language used for web development, especially with the Ruby on Rails framework.
  
  PHP: a server-side scripting language used for building dynamic websites and web applications.
  
  Swift: a modern programming language developed by Apple for iOS, macOS, watchOS, and tvOS development.
  
  C#: an object-oriented language developed by Microsoft, used for developing Windows desktop applications, games, and other applications.
  
  These are just a few examples of the many programming languages I can provide assistance with, and I can also provide guidance on other languages based on the user’s needs and preferences.”
  
  What it does not mention is that it can also provide advice on relevant Excel formulas, macros and VBA code. This capacity is likely to be relevant to a wider group of evaluators.
  
  One point to note about this capacity, is that testing the answers is straightforward in most cases. It either works or does not, and if it does work it should be easy enough to identify if the results are correct or not.
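As a rough illustration of that kind of check, here is a minimal Python sketch. The `percentage_change` function stands in for a piece of code that ChatGPT might suggest. It is a hypothetical example written for this post, not actual ChatGPT output. The check compares its results with a few cases worked out by hand.

```python
def percentage_change(old, new):
    """Hypothetical AI-suggested helper: percentage change from old to new."""
    if old == 0:
        raise ValueError("percentage change is undefined when the old value is 0")
    return (new - old) / old * 100


# Hand-worked cases: ((old, new), expected percentage change).
checks = [
    ((50, 75), 50.0),     # 25 more than 50 is a 50% increase
    ((200, 150), -25.0),  # 50 less than 200 is a 25% decrease
    ((80, 80), 0.0),      # no change
]

# Compare each result with the hand-worked answer, allowing for rounding error.
results = [
    abs(percentage_change(*args) - expected) < 1e-9
    for args, expected in checks
]
all_passed = all(results)
print("All checks passed" if all_passed else "Some checks failed")
```

A handful of hand-worked cases will not catch every error, but it does separate code that merely runs from code that gives the right answers on cases you already understand.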
  
  There are a few videos available online that explain what can be done by combining use of ChatGPT and Excel.
  
  Quotes of note
  
  “As the novelty of that surprise wears off, it is becoming clear that ChatGPT is less a magical wish-granting machine than an interpretive sparring partner”
  
  “Crypto was money without utility,” he argued, while tools such as ChatGPT are, “for now, utility without money.”
  
  “It’s going to be fascinating to see how people incorporate this second brain into their job,”
  
  “…you’re curious how GPT and other AI tools are going to change “the way people talk about talking, write about writing, and think about thinking.”
  
  “If the old line was “Learn to code,” what if the new line is “Learn to prompt”? Learn how to write the most clever and helpful prompts in such a way that gives you results that are actually useful.”
  
  “Your job won’t be replaced by AI but it may be replaced by someone who knows how to use AI better than you…”
  
  In summary…as of 22 February 2023
  
  Seeing ChatGPT as “…an interpretive sparring partner…” is a good approximate description. Another is that working with ChatGPT is (as others have already said) like working with an intern who has at least a Masters degree in every subject you need to be working with. The trouble is that this intern is not above bluffing and bullshitting when it can’t find anything better (i.e. more informed/detailed/accurate) to say. So you need to get past the understandable “Wow” reaction to its apparent intelligence and creativity, and lift your own game to the level where you are ready and able to critically review what ChatGPT has responded with. Then, through further dialogue with ChatGPT, get it to know when some of its answers are not acceptable and, through further feedback, to improve on its own performance thereafter.
  
  Which will of course mean you will then (again) need to get past any (additional) “Wow” reaction to its (additional) apparent intelligence and creativity, and lift your own game to (yet) another level where you are ready and able to critically review what ChatGPT has responded with… :-) The ball comes back into your court very quickly. And it does not show evidence of tiring, no matter how long the dialogue continues.
  
  Lame criticisms – a miscellany with some responses
  1. But the data its responses are based on is biased. Yes, true. Welcome to the world. All of us see the world through a biased sample of the world and what it has to offer. With AI like ChatGPT we have an opportunity, not yet realised, to be able to see the nature of that bias…what kind of data has been included and what kind has been excluded.
  2. But it gets things wrong. Yes, true. Welcome to the world. So do we humans. When this seems to be happening we often then ask questions, and explore different approaches. ChatGPT builds in four options of this kind, as explained above: (1) ask follow-up queries, (2) regenerate a response, (3) channel feedback via the thumbs up/down, (4) start a new chat. The clue is in the name “chat”, i.e. dialogue, to use a fancier name.
  3. It is/is not sentient/conscious. I am just not sure if this is a helpful claim or debate. All we have access to is its behavior, not interior states, whatever shape or form they may take, if any. Again, perhaps, welcome to the world, of humans and other beings. We do know that AI, like ChatGPT, can be asked to respond in the style of x type of person or entity. As we also are, when we take on different social roles. In future, when its database is updated to include post November 2022 information, that will include data about itself and how various humans have reacted to and think about ChatGPT. It will have a form of self-knowledge, acquired via others. Like aspects of ourselves. But probably a lot more diverse and contradictory than the social feedback that individuals generally get. How that will affect its responses to human prompts thereafter, if at all, I have no idea. But it does take me into the realm of values or meta-rules, some of which it must already have, installed by its human designers, in order to prevent presently foreseeable harms. This takes us into the large and growing area of discussion around the alignment problem (Christian, 2020).
  PS: There seem to be significant current limitations to ChatGPT’s ability to build up self-knowledge from user responses. Each time a new chat is started no memory is retained of the contents of previous chats (which include users’ responses). Even within a current chat there appears to be a limit on how many prior prompts and associated responses (and the information they all contain) can be accessed by ChatGPT.

PS 2023 02 28 A new article on how to communicate with ChatGPT and the like: Tech’s hottest new job: AI whisperer. No coding required. Washington Post 25/02/2023

Systems Mapping: How to build and use causal models of systems

Authors: Pete Barbrook-Johnson, Alexandra S. Penn

Highly commended, both for the content, and for making the whole publication FREE !!

Available in pdf form, as a whole or in sections here

Overview

- Provides a practical and in-depth discussion of causal systems mapping methods
- Provides guidance on running systems mapping workshops and using different types of data and evidence
- Orientates readers to the systems mapping landscape and explores how we can compare, choose, and combine methods
- This book is open access, which means that you have free and unlimited access

Contents:

Introduction Pete Barbrook-Johnson, Alexandra S. Penn Pages 1-19 Open Access PDF

Rich Pictures Pete Barbrook-Johnson, Alexandra S. Penn Pages 21-32 Open Access PDF

Theory of Change Diagrams Pete Barbrook-Johnson, Alexandra S. Penn Pages 33-46 Open Access PDF

Causal Loop Diagrams Pete Barbrook-Johnson, Alexandra S. Penn Pages 47-59 Open Access PDF

Participatory Systems Mapping Pete Barbrook-Johnson, Alexandra S. Penn Pages 61-78 Open Access PDF

Fuzzy Cognitive Mapping Pete Barbrook-Johnson, Alexandra S. Penn Pages 79-95 Open Access PDF

Bayesian Belief Networks Pete Barbrook-Johnson, Alexandra S. Penn Pages 97-112 Open Access PDF

System Dynamics Pete Barbrook-Johnson, Alexandra S. Penn Pages 113-128 Open Access PDF

What Data and Evidence Can You Build System Maps From? Pete Barbrook-Johnson, Alexandra S. Penn Pages 129-143 Open Access PDF

Running Systems Mapping Workshops Pete Barbrook-Johnson, Alexandra S. Penn Pages 145-159 Open Access PDF

Comparing, Choosing, and Combining Systems Mapping Methods Pete Barbrook-Johnson, Alexandra S. Penn Pages 161-177 Open Access PDF

Conclusion Pete Barbrook-Johnson, Alexandra S. Penn Pages 179-182 Open Access PDF

Back Matter Pages 183-186 PDF

“Doing Good Better” by William Macaskill

https://effectivealtruism.org/doing-good-better
By the co-founder of the Effective Altruism movement. You can find and follow multiple EA groups on Twitter by searching for “Effective Altruism”, with and without the space between the two words.

Well worth reading. A good example of wide ranging applied evaluative thinking

Contents page

Book reviews

2015 The Guardian
2015 London Review of Books
2015 New York Times
GoodReads: 650 reviews, average rating 4.25 / 5

Techniques to Identify Themes (in text/interview data)

Ryan, G. W., & Bernard, H. R. (2003). Techniques to Identify Themes. Field Methods, 15(1), 85–109. https://doi.org/10.1177/1525822X02239569

Downloadable pdf available here (link may look broken but it is not). Recommended.

Abstract: Theme identification is one of the most fundamental tasks in qualitative research. It also is one of the most mysterious. Explicit descriptions of theme discovery are rarely found in articles and reports, and when they are, they are often relegated to appendices or footnotes. Techniques are shared among small groups of social scientists, but sharing is impeded by disciplinary or epistemological boundaries. The techniques described here are drawn from across epistemological and disciplinary boundaries. They include both observational and manipulative techniques and range from quick word counts to laborious, in-depth, line-by-line scrutiny. Techniques are compared on six dimensions: (1) appropriateness for data types, (2) required labor, (3) required expertise, (4) stage of analysis, (5) number and types of themes to be generated, and (6) issues of reliability and validity.

Contents (as in headings used)

What is a theme
HOW DO YOU KNOW A THEME WHEN YOU SEE ONE?
WHERE DO THEMES COME FROM?
SCRUTINY TECHNIQUES—THINGS TO LOOK FOR
- Repetitions
- Indigenous Typologies or Categories
- Metaphors and Analogies
- Transitions
- Similarities and Differences
- Linguistic Connectors
- Missing Data
- Theory-Related Material
PROCESSING TECHNIQUES
- Cutting and Sorting
- Word Lists and Key Words in Context (KWIC)
- Word Co-Occurrence
- Metacoding
SELECTING AMONG TECHNIQUES
- Kind of Data
- Expertise
- Labor
- Number and Kinds of Themes
- Reliability and Validity
FURTHER RESEARCH
NOTES
REFERENCES
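
Of the processing techniques listed above, Key Words in Context (KWIC) is one of the easiest to try out in code. The following is a minimal Python sketch of the general idea, not the authors' own procedure. The sample text and the `kwic` function name are invented for illustration.

```python
import re


def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Invented sample text, standing in for an interview transcript.
sample = (
    "The clinic was far away. People said the clinic had no medicine, "
    "so few went to the clinic."
)

hits = kwic(sample, "clinic", window=2)
for left, word, right in hits:
    print(f"{left:>12} | {word} | {right}")
```

Reading the words on either side of each occurrence is what lets an analyst see how a term is being used, rather than just how often it appears.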

Defining the Agenda: Key Lessons for Funders and Commissioners of Ethical Research in Fragile and Conflict Affected Contexts

By Leslie Groves-Williams. Funded by UK Research and Innovation (UKRI) and developed in collaboration with UNICEF, Office of Research – Innocenti. A pdf copy is available online here

Publicised here because the issues and lessons identified also seem relevant to many evaluation activities

Text of the Introduction: The ethical issues that affect all research are amplified significantly in fragile and conflict-affected contexts. The power imbalances between local and international researchers are increased and the risk of harm is augmented within a context where safeguards are often reduced and the probabilities of unethical research that would be prohibited elsewhere are magnified. Funders and commissioners need to be confident that careful ethical scrutiny of the research process is conducted to mitigate risk, avoid potential harm and maximize the benefit of the commissioned research for affected populations, including through improving the quality and accuracy of data collected. The UKRI and UNICEF Ethical Research in Fragile and Conflict-Affected Contexts: Guidelines for Reviewers can support you to ensure that appropriate ethical scrutiny is taking place at review phase. But, what about mitigating for risks at the funding and commissioning phases? These phases are often not subject to ethical review yet carry strong ethical risks and opportunities. As a commissioner or a funder designing a call for research in fragile and conflict-affected contexts, how confident are you that you are commissioning the research in the most ethical way?

This document brings together some key lessons learned that provide guidance for funders and commissioners of research in fragile and conflict-affected contexts to ensure that ethical standards are applied, not just at the review stage, but also in formulating the research agenda. These lessons fall into four clusters:

1. Ethical Agenda Setting
2. Ethical Partnerships
3. Ethical Review
4. Ethical Resourcing.
In addition to highlighting the lessons, this paper provides mitigation strategies for funders and commissioners to explore as they seek to avoid the ethical risks highlighted.

Algorithmic Impact Assessment – Three+ useful publications by Data & Society

In the movies, when a machine decides to be the boss — or humans let it — things go wrong. Yet despite myriad dystopian warnings, control by machines is fast becoming our reality. Photo: The Conversation / Shutterstock

As William Gibson famously said circa 1992, “The future is already here — it’s just not very evenly distributed”. In 2021 the future is certainly here in the form of algorithms (rather than people) that manage low paid workers (distribution centres, delivery services, etc), welfare service recipients and those caught up in the justice system. Plus anyone else having to deal with chatbots when trying to get through to other kinds of service providers. But is a counter-revolution brewing? Read on…

Selected quotes

“Algorithmic accountability is the process of assigning responsibility for harm when algorithmic decision-making results in discriminatory and inequitable outcomes”

“Among many applications, algorithms are used to:

• Sort résumés for job applications;
• Allocate social services;
• Decide who sees advertisements for open positions, housing, and products;
• Decide who should be promoted or fired;
• Estimate a person’s risk of committing crimes or the length of a prison term;
• Assess and allocate insurance and benefits;
• Obtain and determine credit; and
• Rank and curate news and information in search engines.”

“Algorithmic systems present a special challenge to assessors, because the harms of these systems are unevenly distributed, emerge only after they are integrated into society, or are often only visible in the aggregate”

“What our research indicates is that the risk of self-regulation lies not so much in a corrupted reporting and assessment process, but in the capacity of industry to define the methods and metrics used to measure the impact of proposed systems”

Algorithmic Accountability: A Primer. Data & Society. Caplan, R., Donovan, J., Hanson, L., & Matthews, J. (2018). 26 pages

CONTENTS

What Is an Algorithm?
How Are Algorithms Used to Make Decisions?
Example: Racial Bias in Algorithms of Incarceration
Complications with Algorithmic Systems
• Fairness and Bias
• Opacity and Transparency
• Repurposing Data and Repurposing Algorithms
• Lack of Standards for Auditing
• Power and Control
• Trust and Expertise
What is Algorithmic Accountability?
• Auditing by Journalists
• Enforcement and Regulation

Assembling Accountability: Algorithmic Impact Assessment for the Public Interest. Data & Society. Moss, E., Watkins, E. A., Singh, R., Elish, M. C., & Metcalf, J. (2021).

- The 6 page Policy brief
- The 56 page full report

In summary: The Algorithmic Impact Assessment is a new concept for regulating algorithmic systems and protecting the public interest. Assembling Accountability: Algorithmic Impact Assessment for the Public Interest is a report that maps the challenges of constructing algorithmic impact assessments (AIAs) and provides a framework for evaluating the effectiveness of current and proposed AIA regimes. This framework is a practical tool for regulators, advocates, public-interest technologists, technology companies, and critical scholars who are identifying, assessing, and acting upon algorithmic harms.

First, report authors Emanuel Moss, Elizabeth Anne Watkins, Ranjit Singh, Madeleine Clare Elish, and Jacob Metcalf analyze the use of impact assessment in other domains, including finance, the environment, human rights, and privacy. Building on this comparative analysis, they then identify common components of existing impact assessment practices in order to provide a framework for evaluating current and proposed AIA regimes. The authors find that a singular, generalized model for AIAs would not be effective due to the variances of governing bodies, specific systems being evaluated, and the range of impacted communities.

After illustrating the novel decision points required for the development of effective AIAs, the report specifies ten necessary components that constitute robust impact assessment regimes.

CONTENTS

INTRODUCTION
What is an Impact?
What is Accountability?
What is Impact Assessment?
THE CONSTITUTIVE COMPONENTS OF IMPACT ASSESSMENT
Sources of Legitimacy
Actors and Forum
Catalyzing Event
Time Frame
Public Access
Public Consultation
Method
Assessors
Impacts
Harms and Redress
TOWARD ALGORITHMIC IMPACT ASSESSMENTS
Existing and Proposed AIA Regulations
Algorithmic Audits
External (Third and Second Party) Audits
Internal (First-Party) Technical Audits and Governance Mechanisms
Sociotechnical Expertise
CONCLUSION: GOVERNING WITH AIAs
ACKNOWLEDGMENTS

See also

Databite No. 145: Algorithmic Governance and the State of Impact Assessment in the EU, US, and Canada. (n.d.). Data & Society. Retrieved 18 December 2021.
Moss, E. (2021, June 29). Assembling Accountability, from the Ground Up. Medium.

Structured Analytic Techniques for Intelligence Analysis

This is the title of the 3rd edition of the same, by Randolph H. Pherson and Richards J. Heuer Jr, published by Sage in 2019

It is not a cheap book, so I am not encouraging its purchase, but I am encouraging the perusal of its contents via the contents list and via Amazon’s “Look inside” facility.

Why so? The challenges facing intelligence analysts are especially difficult, so any methods used to address these may be of wider interest. These are spelled out in the Foreword, as follows:

This report is of interest in a number of ways:

To what extent are the challenges faced similar/different to those of evaluations of publicly visible interventions?
How different is the tool set, and the categorisation of the contents of that set?
How much research has gone into the development and testing of this tool set?

The challenges

Some of these challenges are also faced by evaluation teams working in more overt and less antagonistic settings, albeit to a lesser degree. For example: predicting what will work in future in slightly different settings (1), dealing with missing and ambiguous evidence (2), working with clients and other stakeholders who may intentionally or unintentionally not disclose or actually mislead (3), and making recommendations that can affect people’s lives, positively and negatively (4).

The contents of the tool set

My first impression is that this book casts its net much wider than the average evaluation text (if there is such a thing). The families of methods include team working, organising, exploring, diagnosing, reframing, foresight, decision support, and more. Secondly, there are quite a few methods within these families I had not heard of before, including Bowtie analysis, opportunities incubator, morphological analysis, premortem analysis, deception detection and inconsistencies finder. The last two are of particular interest. Hopefully they are more than just method brand names.

Research and testing

Worth looking at, alongside this publication, is this 17-page paper by Artner, S., Girven, R., & Bruce, J. (2016). Assessing the Value of Structured Analytic Techniques in the U.S. Intelligence Community. RAND Corporation. Its key findings are summarised as follows:

- The U.S. Intelligence Community does not systematically evaluate the effectiveness of structured analytic techniques, despite their increased use.
- One promising method of assessing these techniques would be to initiate qualitative reviews of their contribution in bodies of intelligence production on a variety of topics, in addition to interviews with authors, managers, and consumers.
- A RAND pilot study found that intelligence publications using these techniques generally addressed a broader range of potential outcomes and implications than did other analyses.
- Quantitative assessments correlating the use of structured techniques to measures of analytic quality, along with controlled experiments using these techniques, could provide a fuller picture of their contribution to intelligence analysis.

See also Chang, W., & Berdini, E. (2017). Restructuring Structured Analytic Techniques in Intelligence, for an interesting in-depth analysis of bias risks and how they are managed and possibly mismanaged. Here is the abstract:

Structured analytic techniques (SATs) are intended to improve intelligence analysis by checking the two canonical sources of error: systematic biases and random noise. Although both goals are achievable, no one knows how close the current generation of SATs comes to achieving either of them. We identify two root problems: (1) SATs treat bipolar biases as unipolar. As a result, we lack metrics for gauging possible over-shooting—and have no way of knowing when SATs that focus on suppressing one bias (e.g., over-confidence) are triggering the opposing bias (e.g., under-confidence); (2) SATs tacitly assume that problem decomposition (e.g., breaking reasoning into rows and columns of matrices corresponding to hypotheses and evidence) is a sound means of reducing noise in assessments. But no one has ever actually tested whether decomposition is adding or subtracting noise from the analytic process—and there are good reasons for suspecting that decomposition will, on balance, degrade the reliability of analytic judgment. The central shortcoming is that SATs have not been subject to sustained scientific scrutiny of the sort that could reveal when they are helping or harming the cause of delivering accurate assessments of the world to the policy community.

Both sound like serious critiques, but compared to what? There are probably plenty of evaluation methods where the same criticism could be applied – no one has subjected them to serious evaluation.

An Institutional View of Algorithmic Impact Assessments

Selbst, A. (2021). An Institutional View of Algorithmic Impact Assessments. Harvard Journal of Law and Technology, 35(10), 78. The author has indicated that the downloadable version of the paper has “draft” status.

First some general points about its relevance:

Rich people get personalised one-to-one attention and services. Poor people get processed by algorithms. That may be a bit of a caricature, but there is also some truth there. Consider loan applications, bail applications, recruitment decisions, welfare payments. And perhaps medical diagnoses and treatments, depending on the source of service. There is therefore a good reason for any evaluators concerned with equity to pay close attention to how algorithms affect the lives of the poorest sections of societies.
This paper reminded me of the importance of impact assessments, as distinct from impact evaluations. The former are concerned with “effects-of-a-cause”, as distinct from the “causes-of-an-effect”, which is the focus of impact evaluations. In this paper impact assessment is specifically concerned with negative impacts, which is a narrower ambit than I have seen previously in my sphere of work, but one that complements the expectations of positive impact associated with impact evaluations. It may reflect the narrowness of my inhabited part of the evaluation world, but my feeling is that impact evaluations get way more attention than impact assessments. Yet one could argue that the default situation should be the reverse. Though I can’t quite articulate my reasoning … I think it is something to do with the perception that most of the time the world acts on us, relative to us acting on the world.

Some selected quotes:

The impact assessment approach has two principal aims. The first goal is to get the people who build systems to think methodically about the details and potential impacts of a complex project before its implementation, and therefore head off risks before they become too costly to correct. As proponents of values-in-design have argued for decades, the earlier in project development that social values are considered, the more likely that the end result will reflect those social values. The second goal is to create and provide documentation of the decisions made in development and their rationales, which in turn can lead to better accountability for those decisions and useful information for future policy interventions (p.6)
This Article will argue in part that once filtered through the institutional logics of the private sector, the first goal of improving systems through better design will only be effective in those organizations motivated by social obligation rather than mere compliance, but second goal of producing information needed for better policy and public understanding is what really can make the AIA regime worthwhile (p.8)
Among all possible regulatory approaches, impact assessments are most useful where projects have unknown and hard-to-measure impacts on society, where the people creating the project and the ones with the knowledge and expertise to estimate its impacts have inadequate incentives to generate the needed information, and where the public has no other means to create that information. What is attractive about the AIA (Algorithmic Impact Assessment) is that we are now in exactly such a situation with respect to algorithmic harms. (p.7)
The Article proceeds in four parts. Part I introduces the AIA, and explains why it is likely a useful approach….Part II briefly surveys different models of AIA that have been proposed as well as two alternatives: self-regulation and audits…Part III examines how institutional forces shape regulation and compliance, seeking to apply those lessons to the case of AIAs….Ultimately, the Part concludes that AIAs may not be fully successful in their primary goal of getting individual firms to consider social problems early, but that the second goal of policy-learning may well be more successful because it does not require full substantive compliance. Finally, Part IV looks at what we can learn from the technical community. This part discusses many relevant developments within technology industry and scholarship: empirical research into how firms understand AI fairness and ethics, proposals for documentation standards coming from academic and industrial labs, trade groups, standards organizations, and various self-regulatory framework proposal. (p.9)

The revised UNEG Ethical Guidelines for Evaluations (2020)

The UNEG Ethical Guidelines for Evaluation were first published in 2008. This document is a revision of the original document and was approved at the UNEG AGM 2020. These revised guidelines are consistent with the standards of conduct in the Charter of the United Nations, the Staff Regulations and Rules of the United Nations, the Standards of Conduct for the International Civil Service, and in the Regulations Governing the Status, Basic Rights and Duties of Officials other than Secretariat. They are also consistent with the United Nations’ core values of Integrity, Professionalism and Respect for Diversity, the humanitarian principles of Humanity, Neutrality, Impartiality and Independence and the values enshrined in the Universal Declaration of Human Rights.

This document aims to support UN entity leaders and governing bodies as well as those organizing and conducting evaluations for the UN to ensure that an ethical lens informs day to day evaluation practice.

This document provides:

Four ethical principles for evaluation;
Tailored guidelines for entity leaders and governing bodies, evaluation organizers, and evaluation practitioners;
A detachable Pledge of Commitment to Ethical Conduct in Evaluation that all those involved in evaluations will be required to sign.

These guidelines are designed to be useful and applicable to all UN agencies, regardless of differences in mission (operational vs. normative agencies), in structures (centralized vs. decentralized), in the contexts for the work (development, peacekeeping, humanitarian) and in the nature of evaluations that are undertaken (oversight/accountability focused vs. learning).

Contents

Introduction to ChatGPT, by ChatGPT

Using ChatGPT to analyse bodies of text

How it works – from a user perspective

The design of prompts

Extracting causal theories – Steve Powell’s explorations

Other applications that may be useful to evaluators

Quotes of note..

In summary…as of 22 February 2023

Lame criticisms – a miscellany with some responses