Biases in AI (1) - Algorithmic & model bias

Algorithmic Bias

Systemic errors in computer systems that lead to unfair outcomes or judgements

Equality Act (2010): It is against the law to discriminate against someone because of a protected characteristic (age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex, sexual orientation)

Racial bias in technology

  • System behaviour (including errors, faults, etc.) leading to unfair outcomes or judgements due to a person's race
  • Due to biased associations and stereotypes
  • Can be due to race-based correlations in targeted advertising, such as a postcode lottery
  • Can be the result of biased training data, due to 'historical' reasons or flawed human labelling

Racial algorithmic bias

  • Can be about the physical dimension of race
    • Darker skin tones not recognised by sensors
    • Computer-vision object detection found to misidentify people with darker skin 10% more often than those with lighter skin
    • Facial recognition used for identity matching in law enforcement

Facial recognition for gender classification is less accurate for those with darker skin

Google used to show ads for names but stopped after ads containing the word 'arrest' were found to appear more often for black-identifying first names (60%) than for white-identifying first names (48%)

LLM Bias - Generally, LLMs encode biases present in society, such as gender stereotypes

Schlesinger et al. suggest that, instead of just deleting words, chatbots should handle sensitive topics such as race, power, and justice well. Three areas of chatbot architecture are relevant for generating better responses:

  • Data corpora - focusing on the underlying data
  • Language understanding
  • Learning

Data corpora for generating responses

  • Rather than assume it can be avoided, anticipate 'race-talk': explicitly collect and aggregate dialogues to train chatbots in more respectful race-talk
  • Ensure diverse representation of cultural references across ethnicities
  • Diversity in who builds the database is more likely to lead to a diverse database!

Language understanding for generating responses

  • NLP focuses on probabilistic language models, based on which words (tokens) are likely to appear after one another: a powerful way to generate valid-sounding statements (a toy sketch follows this list)

  • This doesn't tell you whether a statement is racist, sexist, untrue, or otherwise problematic

  • NLP needs a growing capacity for meaning-in-context: the areas of semantics and pragmatics
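
A minimal sketch of the probabilistic idea, using a toy bigram model (the corpus and function names are illustrative, not from the lecture): the model learns which token tends to follow which, and can then generate fluent-looking text with no notion of whether a statement is true or offensive.

```python
from collections import Counter, defaultdict

# Toy corpus; a real language model is trained on billions of tokens.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each token follows each other token (bigram counts).
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def next_token_probs(prev):
    """Estimate P(next | prev) from the bigram counts."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def generate(start, length=6):
    """Greedily follow the most probable continuation."""
    out = [start]
    for _ in range(length):
        probs = next_token_probs(out[-1])
        if not probs:
            break
        out.append(max(probs, key=probs.get))
    return " ".join(out)

print(next_token_probs("the"))  # equal probability for cat/mat/dog/rug
print(generate("the"))          # fluent-looking, but no grasp of meaning
```

Nothing in this pipeline checks content: a biased corpus yields biased continuations just as fluently as unbiased ones.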

Word Embeddings

  • Word embeddings represent text data as vectors
    • Concepts such as similarity and association are computed from co-occurrence
    • In NLP, this is often conflated with meaning
    • Words are considered to have similar semantic meaning if their vectors are close together
    • Differences between vectors represent relationships between words (see the sketch below)
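
A minimal sketch of these geometric ideas with hand-made 3-d vectors (real embeddings such as word2vec or GloVe are learned from co-occurrence statistics and have hundreds of dimensions):

```python
import numpy as np

# Toy vectors chosen by hand for illustration only.
emb = {
    "king":  np.array([0.8, 0.6, 0.1]),
    "queen": np.array([0.8, 0.1, 0.6]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: 1 = same direction, 0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similar meanings -> nearby vectors.
print(cosine(emb["king"], emb["queen"]))

# Vector differences encode relationships: king - man + woman lands near queen.
target = emb["king"] - emb["man"] + emb["woman"]
print(max(emb, key=lambda w: cosine(emb[w], target)))  # 'queen'
```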

Research demonstrates that word embeddings contain biases in their geometry that reflect stereotypes present in broader society
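
A hedged sketch of how that geometry can be probed, in the spirit of association tests such as WEAT (the vectors and word lists below are invented for illustration): measure whether a target word sits closer to one attribute group than another.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def association(word_vec, group_a, group_b):
    """Mean similarity to group A minus mean similarity to group B.
    A positive score means the word leans towards group A in the space."""
    sim_a = np.mean([cosine(word_vec, v) for v in group_a])
    sim_b = np.mean([cosine(word_vec, v) for v in group_b])
    return sim_a - sim_b

# Illustrative vectors; a real test would use trained embeddings and
# curated word lists (e.g. gendered names vs. career/family words).
male_terms   = [np.array([0.9, 0.1]), np.array([0.8, 0.2])]
female_terms = [np.array([0.1, 0.9]), np.array([0.2, 0.8])]
engineer     = np.array([0.7, 0.3])  # an occupation vector skewed 'male'

print(association(engineer, male_terms, female_terms))  # > 0: stereotyped skew
```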

Example of a quantitative social-science exercise: embeddings trained on 100 years of text data show how stereotypes towards women and ethnic minorities change over time

What can be done about it

Change starts with understanding the problems

Approaches to mitigate bias:

  • Technical solutions: debiasing algorithms, better training data (a debiasing sketch follows this list)
  • Transparency and openness: being open about the limitations of training data and classification algorithms
  • Diversity and inclusion: who is involved in building the models, etc.
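
To make the 'technical solutions' bullet concrete, here is a minimal sketch of one debiasing idea, in the spirit of hard debiasing (Bolukbasi et al.); the vectors are toy values: estimate a bias direction from a definitional word pair, then remove its component from words that should be neutral.

```python
import numpy as np

# Toy vectors; real debiasing operates on trained embeddings.
he    = np.array([0.9, 0.1, 0.2])
she   = np.array([0.1, 0.9, 0.2])
nurse = np.array([0.2, 0.8, 0.5])  # occupation word skewed towards 'she'

# 1. Estimate a bias direction from a definitional pair.
g = he - she
g = g / np.linalg.norm(g)

# 2. Project that component out of a word that should be gender-neutral.
nurse_debiased = nurse - (nurse @ g) * g

print(nurse @ g)           # nonzero: biased component present
print(nurse_debiased @ g)  # ~0.0: orthogonal to the bias direction
```

Projection-based debiasing is known to remove only part of the measurable bias, which is one reason the transparency and diversity approaches above matter as well.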