When Artificial Intelligence is Consistent(ly Wrong)
This is my third post based on interviews with six individuals who work on or with artificial intelligence in the workplace. Here, I’ll focus on the development of artificial intelligence tools. It’s critical to keep in mind that these are tools constructed by people, which can sometimes lead to unintended consequences.
When building artificial intelligence, consistent results are key. While speaking with one individual about a machine learning model implemented at his company, he said that one thing he could count on was consistency - even when it was wrong, it was consistently wrong. In this case, the model classified customer service tickets, and his team knew when a ticket was classified incorrectly. An incorrect classification was fairly obvious, and the stakes were low. But consistently wrong results could have devastating effects in other situations. Companies and organizations building artificial intelligence, especially machine learning and deep learning, have a responsibility to think about two questions: how can things go wrong, and what can we do to prevent it?
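To make “consistently wrong” concrete: if you tally which true category keeps landing in which wrong bucket, a systematic error shows up as one dominant pair rather than noise spread everywhere. The sketch below is a hypothetical illustration with made-up ticket labels and a plain Python tally; it is not the interviewee’s actual pipeline.

```python
# Hypothetical sketch: spotting *systematic* errors in a ticket classifier.
# The labels and predictions are made up; in practice they would come from
# a held-out sample of real customer service tickets.
from collections import Counter

true_labels = ["billing", "billing", "shipping", "shipping", "returns", "returns"]
predicted   = ["billing", "billing", "returns",  "returns",  "returns", "returns"]

# Count (true, predicted) pairs to see where the errors concentrate.
error_counts = Counter((t, p) for t, p in zip(true_labels, predicted) if t != p)

for (true_cat, pred_cat), count in error_counts.most_common():
    print(f"{count}x: '{true_cat}' tickets labeled as '{pred_cat}'")
```

A single dominant (true, predicted) pair is the signature of a model that is wrong the same way every time - which, as in this team’s case, is at least easy to notice and route around.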
What has gone wrong?
Simply searching “AI mistakes” or “artificial intelligence failures” in your preferred search engine will yield many instances where artificial intelligence “failed.” Some relatively well-known failures include Google’s photo-tagging software identifying black people as gorillas in 2015 and a beauty pageant “judged by AI” in 2016 that ended with nearly all white winners. AI has also resulted in injury and death, such as the child who was injured when a security robot ran him over in Palo Alto and the first death associated with Tesla’s Autopilot feature.
What is the cause?
A cause of many artificial intelligence “failures” or “mistakes” is biased algorithms. And, unfortunately, many firms and organizations are relying on artificial intelligence built with flawed algorithms and taking the results as infallible and objective. I recently listened to an episode of the 99% Invisible podcast called “The Age of the Algorithm.” The episode focuses on Cathy O’Neil and her book, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, which explores the effects of poorly designed algorithms. She covers the court system’s use of programs like Northpointe’s COMPAS, which judges sometimes use to predict the likelihood of recidivism (committing another crime). However, many believe the tool is inherently biased against black defendants because it considers factors like whether the defendant grew up in a high-crime neighborhood. That is a terrifying prospect when a judge may weigh that score while deciding how long to sentence you to prison.
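One common way to surface this kind of bias is to compare error rates across groups: if people who did not reoffend are flagged as high risk far more often in one group than another, something upstream - often a proxy feature like neighborhood - deserves scrutiny. The sketch below uses fabricated records purely to show the check; it is not data from, or an analysis of, any real risk-scoring tool.

```python
# Illustrative sketch only: comparing false positive rates across groups.
# All records are fabricated for demonstration.
records = [
    # (group, reoffended, flagged_high_risk)
    ("A", False, True), ("A", False, True), ("A", True, True),  ("A", False, False),
    ("B", False, False), ("B", False, False), ("B", True, True), ("B", False, True),
]

def false_positive_rate(rows):
    """Share of people who did NOT reoffend but were still flagged high risk."""
    negatives = [r for r in rows if not r[1]]
    if not negatives:
        return 0.0
    return sum(1 for r in negatives if r[2]) / len(negatives)

for group in ("A", "B"):
    group_rows = [r for r in records if r[0] == group]
    print(f"group {group}: false positive rate = {false_positive_rate(group_rows):.2f}")
```

A large gap between the two rates is one signal that the model, or a proxy feature feeding it, is treating otherwise similar people differently.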
“Garbage in, garbage out.”
In addition to the two questions I raised earlier, firms and organizations can take a number of proactive steps to avoid consistently incorrect or biased results. Roman Yampolskiy proposed the following five best practices in an article for the Harvard Business Review (a rough sketch of the first point follows the list):
“Controlling user input to the system and limiting learning to verified data inputs.
Checking for racial, gender, age, and other common biases in your algorithms.
Explicitly analyzing how your software can fail, and then providing a safety mechanism for each possible failure.
Having a less “smart” backup product or service available.
Having a communications plan in place to address the media in case of an embarrassing failure. (Hint: Start with an apology.)”
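As a rough sketch of that first recommendation - controlling user input and limiting learning to verified data - a training pipeline can gate new examples behind simple checks before they ever reach the model. The field names, sources, and categories below are assumptions for illustration, not anything specified in the article.

```python
# Hypothetical validation gate: only verified examples enter the training set.
ALLOWED_CATEGORIES = {"billing", "shipping", "returns"}  # assumed label set

def is_verified(example: dict) -> bool:
    """Accept an example only if a person reviewed it and it looks sane."""
    return (
        example.get("source") == "human_reviewed"       # a person confirmed the label
        and example.get("label") in ALLOWED_CATEGORIES  # label is one we recognize
        and bool(example.get("text", "").strip())       # non-empty input text
    )

incoming = [
    {"text": "Where is my package?", "label": "shipping", "source": "human_reviewed"},
    {"text": "", "label": "billing", "source": "web_form"},
]

training_batch = [ex for ex in incoming if is_verified(ex)]
print(f"{len(training_batch)} of {len(incoming)} examples accepted for training")
```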
I’d like to add one more suggestion to his list: work with a diverse team. Individuals of different races, cultures, and backgrounds interacting with your product or service may draw out some of those biases naturally. In the end, however, it’s important to remember that while the results might be consistent, that doesn’t mean they’re right.