State-of-the-art machine learning algorithms used to detect hate speech online can misidentify conversations about chess as racist dialogue, according to a new study out of Carnegie Mellon University in the U.S. Words like “black,” “white,” “attack,” and “threat” appeared to trigger red flags in the systems.
The research was prompted by an incident last year, when the video platform flagged the account of a popular chess YouTuber for “harmful and dangerous” content. YouTube reinstated the account of Antonio Radić, a.k.a. agadmator, within 24 hours, without explaining the temporary ban.
“We don’t know what tools YouTube uses, but if they rely on artificial intelligence to detect racist language, this kind of accident can happen,” Ashiqur R. KhudaBukhsh, Ph.D., a project scientist in the university’s Language Technologies Institute who led the experiment, said in a statement.
KhudaBukhsh and Rupak Sarkar, a research engineer at the Institute, used two cutting-edge AI language classifiers to review more than 680,000 comments compiled from five popular chess-themed YouTube channels. They then manually reviewed a random sample of the comments the classifiers had flagged as hate speech and found that 82% of them were not hate speech at all.
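The pattern the researchers describe is easy to reproduce in miniature. The sketch below is a simplified illustration, not the classifiers used in the study: the trigger-word list and sample comments are hypothetical, and real systems are statistical rather than keyword-based. It shows how a crude flagger trips over ordinary chess vocabulary, and how spot-checking a random sample of flagged comments yields the kind of false-positive estimate the study reports.

```python
import random

# Hypothetical trigger words a crude hate-speech flagger might key on.
# Real classifiers are statistical, but the CMU study suggests chess terms
# like these can still dominate their decisions.
TRIGGER_WORDS = {"black", "white", "attack", "threat", "kill", "destroy"}

def naive_flagger(comment: str) -> bool:
    """Flag a comment if it contains any trigger word (toy model only)."""
    tokens = comment.lower().split()
    return any(token.strip(".,!?") in TRIGGER_WORDS for token in tokens)

# Hypothetical chess comments, stand-ins for the ~680,000 the study compiled.
comments = [
    "White's attack on the kingside is a serious threat to Black.",
    "Black should trade queens and kill White's attack before it starts.",
    "Great video, the endgame explanation was really clear!",
]

flagged = [c for c in comments if naive_flagger(c)]

# The study's method in miniature: draw a random sample of flagged comments
# and have humans judge how many are actually hate speech.
sample = random.sample(flagged, k=min(2, len(flagged)))
print("Flagged for human review:", sample)
```

Run as written, the first two comments are flagged purely because of chess vocabulary, which is exactly the failure mode the researchers observed at scale.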
Their findings highlight the difficulty of relying on artificial intelligence technology to police hate speech online at a time when demands for regulation are high, especially on social media platforms.
Speech-monitoring AI seems to offer social media platforms a cost-effective way of monitoring their users’ language at scale. But the history of such efforts shows the field has more promise than demonstrated efficacy. In 2016, Microsoft’s Tay, a Twitter chatbot programmed to refine its speech by chatting with real people on the platform, quickly turned into a racist and misogynistic mouthpiece. In 2019, a toxicity-detection algorithm developed by Google/Alphabet’s Jigsaw unit, designed for greater nuance by giving online speech a ‘toxicity’ score rather than a binary hate speech flag, was found to exhibit racial bias, rating speech patterns common within black American culture as toxic.
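For reference, a “toxicity score” differs from a binary flag only in that the continuous score is still turned into a yes/no decision somewhere downstream, usually by a threshold. A minimal sketch (the comments, scores, and threshold below are made up for illustration; this is not Jigsaw’s actual model or API):

```python
# Hypothetical toxicity scores in [0, 1]; a real system would get these
# from a trained model rather than hard-coded values.
scored_comments = {
    "White sacrifices the bishop to destroy Black's defense.": 0.71,
    "Nice game, thanks for the analysis.": 0.04,
}

THRESHOLD = 0.5  # assumed cutoff; platforms tune this, it is not a fixed value

for text, score in scored_comments.items():
    decision = "flagged" if score >= THRESHOLD else "allowed"
    print(f"{score:.2f}  {decision}: {text}")
```

Wherever the threshold is set, innocuous chess talk with a high score ends up flagged just as it would under a binary classifier.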
Most recently, in January, Facebook’s Oversight Board reversed two hate speech bans, finding the posts, when viewed in context by humans, did not qualify as hate speech.
While it’s important to develop and refine technological solutions to monitor hate speech (and arguably, despite the collateral damage to Radić and others, such efforts are better than nothing), those solutions don’t address the underlying problem. If AI can become hateful because of real people’s hate speech, its struggle to decide what is offensive and what is not may reflect our own. It’s a conversation the chess community (led by more than 1,700 grandmasters worldwide, only three of whom are black) and the world at large are only starting to grapple with.