Biases in AI (2)

Dealing with biased data

  • Issue of ecological validity of data sets
    • The extent to which study findings and data generalise to real-world settings.
  • Real callers vs. simulated callers
  • Differences in speech
    • Recruited participants talk more, speak faster
    • Real users ask for more help and interrupt the system more

Bias can be mitigated in part by being transparent about the data

Datasheets

Another approach is to provide metadata about a data set for transparency ("Datasheets for Datasets", Gebru et al.)

  • Motivation - Why the data set was created, who created it, how it was funded
  • Composition - How many instances, how they were sampled, data splits
  • Collection Process - How the data was collected, how metadata was assigned, IRB approval, timeline, consent
  • Pre-processing, Uses, Distribution, Maintenance
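
To make the sections above concrete, here is a hypothetical sketch of a datasheet as a Python structure. The section names follow the list above; the class and field names themselves are illustrative, not a standard API.

```python
from dataclasses import dataclass

@dataclass
class Datasheet:
    motivation: dict          # why created, who created it, how funded
    composition: dict         # number of instances, sampling, data splits
    collection_process: dict  # how collected, IRB approval, timeline, consent
    preprocessing: dict       # cleaning / labelling steps applied
    uses: dict                # intended and discouraged uses
    distribution: dict        # licence and access conditions
    maintenance: dict         # maintainer contact, update policy

# Made-up example values for illustration only.
sheet = Datasheet(
    motivation={"created_by": "Example Lab", "funded_by": "Example Grant"},
    composition={"n_instances": 10_000, "splits": {"train": 0.8, "test": 0.2}},
    collection_process={"irb_approved": True, "consent": "opt-in"},
    preprocessing={"steps": ["deduplication", "lowercasing"]},
    uses={"intended": ["dialogue-intent classification"]},
    distribution={"licence": "CC BY 4.0"},
    maintenance={"contact": "data@example.org"},
)
```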

Crowdsourcing

  • "Artificial Artificial Intelligence"
    • Outsourcing processes and jobs to a distributed workforce that performs these tasks virtually, usually via a marketplace
  • Commonly employed to obtain labelled data to train AI systems
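
As an illustration of how crowd judgements become training labels, here is a hedged sketch of majority voting, one common aggregation rule; the items and labels below are made up.

```python
from collections import Counter

# Several crowd judgements per item, collapsed to one label by majority vote.
annotations = {
    "item_1": ["spam", "spam", "not_spam"],
    "item_2": ["not_spam", "not_spam", "not_spam"],
}

def majority_label(labels):
    """Return the most frequent label (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

gold = {item: majority_label(labels) for item, labels in annotations.items()}
print(gold)  # {'item_1': 'spam', 'item_2': 'not_spam'}
```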

Quality of Crowdsourcing

A literature survey of crowdsourcing practices, inspired by best practices in structured content analysis, a long-standing methodology in the social sciences

  • Overall, the findings give cause for concern, given how crucial the quality of training data is and how difficult it is to standardise human judgement
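
One standard quality check borrowed from content analysis is inter-annotator agreement. Below is a small sketch that computes Cohen's kappa for two annotators from scratch; the labels are toy data.

```python
# Cohen's kappa: chance-corrected agreement between two annotators,
# reported in content analysis instead of raw percent agreement.
def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of the two
    # raters' marginal frequencies for that label.
    expected = sum(
        (rater_a.count(lab) / n) * (rater_b.count(lab) / n) for lab in labels
    )
    return (observed - expected) / (1 - expected)

a = ["pos", "pos", "neg", "neg", "pos"]
b = ["pos", "neg", "neg", "neg", "pos"]
print(round(cohens_kappa(a, b), 2))  # 0.62: substantial, but not perfect
```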

Crowdwork often means bad working conditions and no health care; there are ongoing efforts to secure fairness for crowdworkers

Algorithmic solutions and transparency

Debiasing word embeddings so that, for example, "homemaker" is not associated with women

Hard Debiasing - Removes the gender pair associations for gender-neutral words (see the sketch below)

Soft Debiasing - Reduces the differences between the sets while maintaining as much similarity to the original embedding as possible
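
A minimal sketch of the "neutralize" step of hard debiasing, using made-up 4-d vectors: the component of a gender-neutral word's vector along a gender direction is projected out. Full hard debiasing (Bolukbasi et al.) also includes an "equalize" step for definitional pairs like (he, she), omitted here.

```python
import numpy as np

def neutralize(w, g):
    """Remove from w its component along the gender direction g."""
    g = g / np.linalg.norm(g)
    return w - np.dot(w, g) * g

emb = {  # toy vectors for illustration only
    "he":        np.array([ 1.0, 0.2, 0.0, 0.1]),
    "she":       np.array([-1.0, 0.2, 0.0, 0.1]),
    "homemaker": np.array([-0.7, 0.5, 0.3, 0.2]),
}
gender_direction = emb["he"] - emb["she"]

debiased = neutralize(emb["homemaker"], gender_direction)
print(np.dot(debiased, gender_direction))  # ~0.0: gender component removed
```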

Lipstick on a pig?

  • Shows that debiasing techniques are problematic and superficial (Gonen & Goldberg)
  • The techniques only hide the bias, they do not remove it
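
A hedged sketch of one check in this spirit: if a classifier can still predict a word's original gender association from its debiased vector, the bias was hidden rather than removed. The vectors and labels below are random stand-ins; on real debiased embeddings such classifiers remain highly accurate.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
debiased_vectors = rng.normal(size=(100, 50))  # stand-in for debiased embeddings
original_bias = rng.integers(0, 2, size=100)   # 0/1: gender association before debiasing

# Train on part of the words, test whether the original bias is recoverable.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(debiased_vectors[:80], original_bias[:80])
print("recovery accuracy:", clf.score(debiased_vectors[80:], original_bias[80:]))
```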