The Sorting Hat Bias: Tidying Up Open Source Data for AI Startups

Imagine embarking on an AI startup journey and relying on open source data as if it were your own magical Sorting Hat. It's 'super smart', no question, but it's also 'super biased', tending to sort data into its preferred 'houses'. Much like a hat with a weakness for placing too many wizards in Gryffindor, open source data tends to favour certain kinds of data over others. And therein lies our predicament.

Unravelling the Bias Thread

Yes, I am a Harry Potter fan, and while chatting with my friend GPT-4, I started imagining a Sorting Hat that only placed students in Gryffindor and Slytherin, overlooking Ravenclaw and Hufflepuff entirely. That's what bias in open source data can look like. The AI ends up recognising only Gryffindors and Slytherins and neglecting the Ravenclaws and Hufflepuffs, creating an imbalance in the magical world it is meant to serve.

We Need Ravenclaws and Hufflepuffs

Easier said than done, but here is a simple three-step plan startups can follow to diversify their data.

1. Data Scrutiny: Begin by inspecting your open source data much like you would scrutinise a dragon before attempting to steal a golden egg. Identify the Gryffindor and Slytherin overload.

2. Data Enrichment: Having identified the favoured houses, it's time to even the score. Introduce Ravenclaw and Hufflepuff data into the mix, and be sure to label it accurately.

3. Testing: Time to test your retrained AI with a diverse crowd of witches and wizards from all houses. They will reveal whether your AI can now correctly identify and serve all of them equally.
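For the more hands-on Dumbledores, steps 1 and 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed method: it assumes each record carries a single group label (here, a hypothetical "house"), and the helper names and the 0.5 "fair share" threshold are illustrative choices. Step 2, gathering genuinely new Ravenclaw and Hufflepuff data, can't be automated away this easily.

```python
from collections import Counter


def label_shares(labels):
    """Step 1 (scrutiny): the fraction of examples carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def underrepresented(labels, expected_groups, threshold=0.5):
    """Flag groups whose share falls below threshold * an equal share.

    The 0.5 threshold is an illustrative assumption, not a standard.
    Groups missing from the data entirely count as a share of 0.
    """
    shares = label_shares(labels)
    fair_share = 1 / len(expected_groups)
    return sorted(
        group for group in expected_groups
        if shares.get(group, 0.0) < threshold * fair_share
    )


def per_group_accuracy(true_labels, predicted_labels):
    """Step 3 (testing): accuracy broken down by each example's true group."""
    correct = Counter()
    seen = Counter()
    for truth, prediction in zip(true_labels, predicted_labels):
        seen[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    return {group: correct[group] / seen[group] for group in seen}


# Hypothetical example: a training set dominated by two houses.
houses = ["Gryffindor", "Slytherin", "Ravenclaw", "Hufflepuff"]
training = (["Gryffindor"] * 45 + ["Slytherin"] * 45
            + ["Ravenclaw"] * 6 + ["Hufflepuff"] * 4)
print(underrepresented(training, houses))  # ['Hufflepuff', 'Ravenclaw']
```

Breaking accuracy down per group, rather than reporting one overall score, is the point of step 3: a model can look excellent on average while quietly failing the houses it rarely saw.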

The Never-Ending Story (that’s a different story, Cien!)

Tidying up a data set isn't a one-off event. It's an ongoing commitment, rather like pledging to avoid the Unforgivable Curses. As our Muggle and magical worlds evolve, so should our AI.

Be the best version of Dumbledore you can be!
