Biases in AI(2)
Dealing with biased data
- Issue of ecological validity of data sets
- The extent to which study findings/data generalises to real-word settings.
- Real callers vs. simulated callers
- Differences in speech
- Recruited participants talk more, speak faster
- Real users ask for more help and interrupt the system more
Can mitigate bias by being transparent
Data Sheets
Another approach to provide meta data for transparency
- Motivation - Who collected, who, how funded
- Composition - How many instances, how sampled, data spilt
- Collection Process - How collected, how metadata assigned, IRB, timeline, consent
- Pre-processing, Uses, Distribution, Maintenance
Crowdsourcing
- "Artificial Artificial Intelligence"
- Outsourcing processes and jobs to a distributed workforce who can perform these tasks virtually - usually via a marketplace
- Commonly employed to obtain labelled data to train AI systems
Quality of Crowdsourcing
Literature survey paper about crowdsourcing practices inspired by best practices in structured content analysis, a longstanding methodology in the social sciences
- Overall, findings indicate concern, given how crucial the quality of training data is and the difficulty of standardizing human judgement
Crowdwork is working, bad conditions, and no health care. Trying to get fairness for them
Algorithmic solutions and transparency
Debiasing words, homemaker is not related to women
Hard Debiasing - Removes the gender pair associations for gender neutral words Soft Debiasing - Reduces the differences between the sets while maintaining as much similarity to the original embedding as possible
Lipstick on a pig?
- Shows debiasing techniques are problematic, superficial
- Techniques only hide, not remove the bias