Biases in AI (1) - Algorithmic & model bias

Algorithmic Bias

Systemic errors in computer systems that lead to unfair outcomes or judgements

Equality Act (2010): It is against the law to discriminate against someone because of a protected characteristic (age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, sexual orientation)

Racial bias in technology

  • System behaviour (including errors, faults, etc.) leading to unfair outcomes or judgements due to a person's race
  • Due to biased associations and stereotypes
  • Can be due to race-based correlations in targeted advertising, such as a postcode lottery
  • Can be the result of biased training data, due to 'historical' reasons or flawed human labelling

Racial algorithmic bias

  • Can be about the physical dimension of race
    • Darker skin tones not recognised by sensors
    • Computer-vision object detection found to misidentify people with darker skin 10% more often than those with lighter skin
    • Facial recognition used for identity matching in law enforcement

Facial recognition for gender classification is less accurate for those with darker skin

Google used to show ads for names but stopped after ads containing the word 'arrest' were found to appear more often for black-identifying first names (60%) than for white-identifying first names (48%)

LLM Bias - Generally, LLMs encode biases present in society, such as gender stereotypes

Schlesinger et al. suggest that, instead of just deleting words, chatbots should handle sensitive topics such as race, power, and justice well. Three areas of chatbot architecture are relevant for generating better responses:

  • Data corpora - focusing on the underlying data
  • Language understanding
  • Learning

Data corpora for generating responses

  • Rather than assume it can be avoided, anticipate 'race-talk': explicitly collect and aggregate dialogues to train chatbots in more respectful race-talk
  • Ensure diverse representation of cultural references across ethnicities
  • Diversity in who builds the database is more likely to lead to a diverse database!

Language understanding for generating responses

  • NLP focuses on probabilistic language models, based on which words (tokens) are likely to appear after one another: a powerful way to generate valid-sounding statements (a toy sketch follows this list)

  • This doesn't tell you whether a statement is racist, sexist, untrue, or otherwise problematic

  • NLP needs a growing capacity for meaning-in-context: the areas of semantics and pragmatics
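
A minimal sketch of the probabilistic idea, using a toy bigram model (the corpus and function names are illustrative, not from the lecture): the model learns which token tends to follow which, and can then generate fluent-looking text with no notion of whether a statement is true or offensive.

```python
from collections import Counter, defaultdict

# Toy corpus; a real language model is trained on billions of tokens.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each token follows each other token (bigram counts).
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def next_token_probs(prev):
    """Estimate P(next | prev) from the bigram counts."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def generate(start, length=6):
    """Greedily follow the most probable continuation."""
    out = [start]
    for _ in range(length):
        probs = next_token_probs(out[-1])
        if not probs:
            break
        out.append(max(probs, key=probs.get))
    return " ".join(out)

print(next_token_probs("the"))  # equal probability for cat/mat/dog/rug
print(generate("the"))          # fluent-looking, but no grasp of meaning
```

Nothing in this pipeline checks content: a biased corpus yields biased continuations just as fluently as unbiased ones.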

Word Embeddings

  • Word embeddings represent text data as vectors
    • Concepts such as similarity and association are computed from co-occurrence
    • In NLP, this is often conflated with meaning
    • Words are considered to have similar semantic meaning if their vectors are close together
    • Differences between vectors represent relationships between words (see the sketch below)
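
A minimal sketch of these geometric ideas with hand-made 3-d vectors (real embeddings such as word2vec or GloVe are learned from co-occurrence statistics and have hundreds of dimensions):

```python
import numpy as np

# Toy vectors chosen by hand for illustration only.
emb = {
    "king":  np.array([0.8, 0.6, 0.1]),
    "queen": np.array([0.8, 0.1, 0.6]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: 1 = same direction, 0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similar meanings -> nearby vectors.
print(cosine(emb["king"], emb["queen"]))

# Vector differences encode relationships: king - man + woman lands near queen.
target = emb["king"] - emb["man"] + emb["woman"]
print(max(emb, key=lambda w: cosine(emb[w], target)))  # 'queen'
```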

Research demonstrates that word embeddings contain biases in their geometry that reflect stereotypes present in broader society
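
A hedged sketch of how that geometry can be probed, in the spirit of association tests such as WEAT (the vectors and word lists below are invented for illustration): measure whether a target word sits closer to one attribute group than another.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def association(word_vec, group_a, group_b):
    """Mean similarity to group A minus mean similarity to group B.
    A positive score means the word leans towards group A in the space."""
    sim_a = np.mean([cosine(word_vec, v) for v in group_a])
    sim_b = np.mean([cosine(word_vec, v) for v in group_b])
    return sim_a - sim_b

# Illustrative vectors; a real test would use trained embeddings and
# curated word lists (e.g. gendered names vs. career/family words).
male_terms   = [np.array([0.9, 0.1]), np.array([0.8, 0.2])]
female_terms = [np.array([0.1, 0.9]), np.array([0.2, 0.8])]
engineer     = np.array([0.7, 0.3])  # an occupation vector skewed 'male'

print(association(engineer, male_terms, female_terms))  # > 0: stereotyped skew
```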

Example of a quantitative social-science exercise: embeddings trained on 100 years of text data show how stereotypes towards women and ethnic minorities change over time

What can be done about it

Change starts with understanding the problems

Approaches to mitigate bias:

  • Technical solutions: debiasing algorithms, better training data (a debiasing sketch follows this list)
  • Transparency and openness: being open about the limitations of training data and classification algorithms
  • Diversity and inclusion: who is involved in building the models, etc.
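
To make the 'technical solutions' bullet concrete, here is a minimal sketch of one debiasing idea, in the spirit of hard debiasing (Bolukbasi et al.); the vectors are toy values: estimate a bias direction from a definitional word pair, then remove its component from words that should be neutral.

```python
import numpy as np

# Toy vectors; real debiasing operates on trained embeddings.
he    = np.array([0.9, 0.1, 0.2])
she   = np.array([0.1, 0.9, 0.2])
nurse = np.array([0.2, 0.8, 0.5])  # occupation word skewed towards 'she'

# 1. Estimate a bias direction from a definitional pair.
g = he - she
g = g / np.linalg.norm(g)

# 2. Project that component out of a word that should be gender-neutral.
nurse_debiased = nurse - (nurse @ g) * g

print(nurse @ g)           # nonzero: biased component present
print(nurse_debiased @ g)  # ~0.0: orthogonal to the bias direction
```

Projection-based debiasing is known to remove only part of the measurable bias, which is one reason the transparency and diversity approaches above matter as well.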