Biases in AI(2)

Dealing with biased data

Issue of ecological validity of data sets
- The extent to which study findings/data generalises to real-word settings.
Real callers vs. simulated callers
Differences in speech
- Recruited participants talk more, speak faster
- Real users ask for more help and interrupt the system more

Can mitigate bias by being transparent

Data Sheets

Another approach to provide meta data for transparency

Motivation - Who collected, who, how funded
Composition - How many instances, how sampled, data spilt
Collection Process - How collected, how metadata assigned, IRB, timeline, consent
Pre-processing, Uses, Distribution, Maintenance

Crowdsourcing

"Artificial Artificial Intelligence"
- Outsourcing processes and jobs to a distributed workforce who can perform these tasks virtually - usually via a marketplace
Commonly employed to obtain labelled data to train AI systems

Quality of Crowdsourcing

Literature survey paper about crowdsourcing practices inspired by best practices in structured content analysis, a longstanding methodology in the social sciences

Overall, findings indicate concern, given how crucial the quality of training data is and the difficulty of standardizing human judgement

Crowdwork is working, bad conditions, and no health care. Trying to get fairness for them

Algorithmic solutions and transparency

Debiasing words, homemaker is not related to women

Hard Debiasing - Removes the gender pair associations for gender neutral words Soft Debiasing - Reduces the differences between the sets while maintaining as much similarity to the original embedding as possible

Lipstick on a pig?

Shows debiasing techniques are problematic, superficial
Techniques only hide, not remove the bias

Biases in AI(2)

Dealing with biased data​

Data Sheets​

Crowdsourcing​

Quality of Crowdsourcing​

Algorithmic solutions and transparency​

Lipstick on a pig?​