Ezra Klein, lead explainer over at Vox, needs to have the direction of cause explained to him. He tweeted “leading AI models for processing hate speech were one-and-a-half times more likely to flag tweets as offensive or hateful when they were written by African Americans, and 2.2 times more likely to flag tweets written in African American English”.
His tweet pointed to the article “The algorithms that detect hate speech online are biased against black people: A new study shows that leading AI models are 1.5 times more likely to flag tweets written by African Americans as ‘offensive’ compared to other tweets.”
Some person calling herself (I think it’s a woman) “Cardi B” won the 2019 Grammy award for the “best” rap album Invasion of Privacy, on which is the song “I like it.” A snippet of the lyrics:
I like million dollar deals
Where’s my pen? Bitch I’m signin’
I like those Balenciagas, the ones that look like socks
I like going to the jeweler, I put rocks all in my watch
I like texts from my exes when they want a second chance
I like proving niggas wrong, I do what they say I can’t
Another big hit on the album seems to be “Be Careful” (my asterisks):
And putas, chillin’ poolside, livin’ two lives
I could’ve did what you did to me to you a few times
But if I did decide to slide, find a nigga
F*** him, suck his d***, you would’ve been pissed
But that’s not my M.O., I’m not that type of bitch
And karma for you is gon’ be who you end up with
Don’t make me sick, nigga
Now if some music lover were to quote these Grammy-winning works of art in a tweet, it could conceivably be labeled as “hate speech”. If that music lover were black, then a black would be algorithmically charged as a “hater”. And if more black tweeters than white tweeters appreciate this kind of Grammy-winning art, why, then, more blacks will be painted as “haters.”
Did I mention these songs were Grammy winning?
Enough jokes. Here are the easy facts.
A do-gooder compiles a list of “hate”. Tweets are compared against the list. Some are flagged as “hate”, others, presumably, as “love”. Anybody who’s written even one line of code knows how easy this is to do. As long as there are no typos, the algorithm will work. “Hate” tweets will be set on one side, “love” tweets on another.
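Here is roughly what that looks like in code: a minimal sketch of a keyword-list flagger, assuming the simplest possible design. The term list and the function names (`HATE_TERMS`, `tokenize`, `label_tweet`, `split_tweets`) are hypothetical placeholders for illustration, not anything Twitter or the study authors actually use. Note the function sees only the words, never who wrote them.

```python
import re

# Hypothetical placeholder terms; a real list would be compiled by a human curator.
HATE_TERMS = {"badword1", "badword2"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def label_tweet(text: str, terms: set[str] = HATE_TERMS) -> str:
    """Return "hate" if any token appears in the term list, otherwise "love".

    Only the words are inspected; the author of the tweet is never an input.
    """
    return "hate" if any(tok in terms for tok in tokenize(text)) else "love"


def split_tweets(tweets: list[str]) -> dict[str, list[str]]:
    """Set "hate" tweets on one side and "love" tweets on the other."""
    piles: dict[str, list[str]] = {"hate": [], "love": []}
    for tweet in tweets:
        piles[label_tweet(tweet)].append(tweet)
    return piles
```

For instance, `label_tweet("What a BADWORD1 day!")` gives `"hate"`, while a misspelling such as `label_tweet("badwrd1")` slips through as `"love"`, which is the "as long as there are no typos" caveat in action.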
Then if it later turns out that the “hate” tweets were written by blacks more than whites (proportionally, determined by external data), then blacks will be said to be bigger “haters” than whites.
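The "proportionally" step is just arithmetic: divide each group's flagged count by its total count, then take the ratio of the two rates. A figure like "1.5 times more likely" is presumably a ratio of this kind. The sketch below is a minimal illustration; the counts in the example are invented for arithmetic only and are not the studies' data.

```python
from fractions import Fraction


def flag_rate(flagged: int, total: int) -> Fraction:
    """Fraction of a group's tweets that the flagger marked as "hate"."""
    if total <= 0:
        raise ValueError("total must be positive")
    return Fraction(flagged, total)


def rate_ratio(flagged_a: int, total_a: int, flagged_b: int, total_b: int) -> Fraction:
    """How many times more often group A's tweets are flagged than group B's."""
    return flag_rate(flagged_a, total_a) / flag_rate(flagged_b, total_b)


# Hypothetical counts: 30 of 1,000 tweets flagged in group A, 20 of 1,000 in group B.
ratio = rate_ratio(30, 1000, 20, 1000)
print(float(ratio))  # 1.5
```

Group membership enters only here, after the flagging is done, through whatever external data assigns tweets to groups.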
It must be obvious that if the algorithm does not know, in advance, who is black and who white, then it is impossible for the algorithm to be “racist”. It will really be true that more blacks are “haters” than whites.
Alas, it is not obvious.
But two new studies show that AI trained to identify hate speech may actually end up amplifying racial bias. In one study, researchers found that leading AI models for processing hate speech were one-and-a-half times more likely to flag tweets as offensive or hateful when they were written by African Americans, and 2.2 times more likely to flag tweets written in African American English (which is commonly spoken by black people in the US). Another study found similar widespread evidence of racial bias against black speech in five widely used academic data sets for studying hate speech that totaled around 155,800 Twitter posts.
I heard some white politicians trying their tongues on African American English. How did it go? “I ain’t in no ways tired…” Which I guess means whites speak White American English. Skip it.
This is in large part because what is considered offensive depends on social context. Terms that are slurs when used in some settings — like the “n-word” or “queer” — may not be in others. But algorithms — and content moderators who grade the test data that teaches these algorithms how to do their job — don’t usually know the context of the comments they’re reviewing.
Okay, so nigger is not offensive if a black guy, or Mel Brooks, says it, but it is if a white guy says it. Not sure in what class Mark Twain fits (Huckleberry Finn is probably banned anyway). Queer is not offensive if a pervert says it, but it is if a normal says it.
Again, unless the algorithm knows in advance who is doing the saying, then there is no way to know if the words are “offensive”.
If the nervous programmers at Twitter are frightened of being called “racists”, then this is what they can do. Write an algorithm that identifies race and not “hate”. It won’t be perfect, but it can be reasonable.
Those whom the algorithm tags as black, along with these people's tweets, can then be automatically labeled “love” regardless of content. Whereas those people tagged as white, and their tweets, can be automatically labeled “hate”.
Problem solved! Indeed, this is the only way to solve it. I therefore predict that’s exactly what will happen.