An AI-powered portrait generator went viral last week thanks to its ability to turn selfies into realistic Renaissance-style portraits. For people of color, however, the results leave much to be desired.
The flaw in AI Portrait Ars, an app built by researchers at the MIT-IBM Watson AI Lab, was first pointed out by Morgan Sung, a reporter at Mashable. She found that the app “whitened my skin to an unearthly pale tone, turned my flat nose into one with a prominent bridge and pointed end, and replaced my very hooded eyes with heavily lidded ones.” This result is both terribly disappointing and utterly predictable.
In 2019, AI developers should know that algorithmic bias not only exists but is a serious problem we must fight against. So why does it persist? And can we actually stop it? These are open questions that boil down to where you think the blame lies. Is it the case that these algorithms are exaggerating parts of human nature? Are algorithmic biases a reflection of our society’s systemic problems? In the case of AI Portrait Ars, it may help to trace why it couldn’t draw the faces of people of color in order to figure out why this continues to happen.
Part of the problem lies with how AI Portrait Ars fundamentally works. The program relies on a generative adversarial network (GAN), meaning there are two types of algorithms pitted against each other as adversaries to create its portraits. The first type are generative algorithms, responsible for generating new data. The second type are discriminator algorithms, responsible for deciding whether new data belongs to the training dataset.
With AI Portrait Ars, the generator learns how to create realistic portraits of people and the discriminator learns how to discern which aren’t convincing enough based on the dataset. Datasets, then, are of the utmost importance in determining whether the GAN will read certain data (facial features) as authentic. The training dataset has over 15,000 images, but it’s important to remember where these images were likely pulled from.
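The pressure the discriminator puts on the generator can be sketched in a few lines of Python. This is a minimal illustration that assumes the textbook GAN losses (the non-saturating variant for the generator); nothing in IBM's statement says which loss AI Portrait Ars actually used, and the scores below are hypothetical numbers chosen for illustration, not measurements from the app.

```python
import math


def discriminator_loss(score_real: float, score_fake: float) -> float:
    """Loss the discriminator minimizes.

    It is low when a real training image scores near 1 and a
    generated image scores near 0.
    """
    return -(math.log(score_real) + math.log(1.0 - score_fake))


def generator_loss(score_fake: float) -> float:
    """Non-saturating generator loss (an assumed, textbook choice).

    It is low when the discriminator rates a generated image as
    belonging to the training dataset.
    """
    return -math.log(score_fake)


# Hypothetical discriminator scores: the estimated probability that a
# generated portrait "belongs" to the training dataset.
score_resembles_training_set = 0.9
score_unlike_training_set = 0.1

print(round(generator_loss(score_resembles_training_set), 3))  # 0.105
print(round(generator_loss(score_unlike_training_set), 3))     # 2.303
```

In this toy setup the generator is penalized roughly 20 times more for an output the discriminator scores as unlike the training set. If the training portraits rarely show certain facial features, a generator trained this way is steered away from producing them.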
“This was an experiment by one of our researchers. The images from the app’s users were deleted immediately from our servers after the Renaissance portrait was generated. The experiment has run its course,” IBM Research said in a statement to Motherboard.
“Also, the tool reflects the data it was trained on: a collection of 15,000 portraits, predominantly from the Western European Renaissance period,” the company continued. “In some cases, it produced a strong alteration of colors and shapes. That’s a reality of the style, not the algorithm.”
This experiment, however, mirrors dozens of other AI and facial recognition experiments that have had far more accurate results for white people than people of color. If the experiment proved anything, it’s that AI researchers continue to be drawn to experiments and research that perpetuate the biases we already know exist in AI research.
It’s also not actually a “reality of the style” of Renaissance art that people of color weren’t in paintings of the era. There are many examples of people of color in European art history, though they are largely assumed by the masses to be non-existent in art from the Renaissance.
“The material available for illuminating the lives of individual Africans in Renaissance Europe through the visual arts is considerable, though little known to the wider public,” a lengthy 2013 report from Baltimore’s Walters Art Museum called “Revealing the African Presence in Renaissance Europe” notes.
It is important to “understand the period in terms of individuals of African ancestry, whom we encounter in arresting portrayals from life, testifying to the Renaissance adage that portraiture magically makes the absent present. We begin with slaves, moving up the social ladder to farmers, artisans, aristocrats, scholars, diplomats, and rulers from different parts of the African continent,” it continues.
The problem with AI Portrait Ars reflects how, historically, technology often functions as an extension of the status quo as opposed to a great equalizer. Color film, for example, was initially calibrated to look best with white skin tones, since white consumers were the preferred market. In the 1970s, what prompted the industry to even consider better rendering of darker colors was economic pressure from Kodak’s professional accounts. Furniture manufacturers were angry that their advertisements using Kodak color film didn’t capture the difference between dark-grained wood and light-grained wood, while chocolate confectioners were angry that the film couldn’t capture all the different shades of chocolate.
At this point, AI researchers, especially ones utilizing IBM’s Watson, should know better. In 2018, Joy Buolamwini, founder of the Algorithmic Justice League, published her MIT thesis analyzing facial recognition technology from IBM Watson, Microsoft, and Face++ (a Chinese artificial intelligence company). Buolamwini found that all of the programs had the highest error rates for dark-skinned women and the most accurate results with light-skinned men, but that IBM Watson had the highest disparity in the error rates between dark-skinned women and light-skinned men (the error rate was 34.4 percentage points higher for dark-skinned women). Buolamwini also found that as skin tones got darker, IBM Watson failed to correctly recognize a subject’s gender nearly 50 percent of the time.
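The kind of breakdown behind those figures, error rates reported separately for each skin-tone and gender group rather than as one overall number, can be sketched as follows. The records here are made up for illustration and are not Buolamwini's data; the function name and group labels are likewise hypothetical.

```python
from collections import defaultdict

# Made-up evaluation records: (skin_tone, gender, prediction_was_correct).
# These are illustrative only and do not reproduce any real study's results.
records = [
    ("lighter", "male", True), ("lighter", "male", True),
    ("lighter", "male", True), ("lighter", "male", True),
    ("lighter", "female", True), ("lighter", "female", True),
    ("lighter", "female", True), ("lighter", "female", False),
    ("darker", "male", True), ("darker", "male", True),
    ("darker", "male", True), ("darker", "male", False),
    ("darker", "female", True), ("darker", "female", True),
    ("darker", "female", False), ("darker", "female", False),
]


def error_rates_by_group(rows):
    """Return the error rate for each (skin_tone, gender) group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for skin_tone, gender, correct in rows:
        group = (skin_tone, gender)
        totals[group] += 1
        if not correct:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


rates = error_rates_by_group(records)

# A single aggregate number hides the spread between groups.
overall = sum(1 for _, _, correct in records if not correct) / len(records)

# Gap between the worst and best group, in percentage points.
gap_points = 100 * (max(rates.values()) - min(rates.values()))

print(f"overall error rate: {overall:.0%}")            # overall error rate: 25%
print(f"largest gap: {gap_points:.0f} percentage points")  # largest gap: 50 percentage points
```

In this toy data the overall error rate of 25 percent looks moderate, while the per-group view shows no errors for one group and a 50 percent error rate for another. That is why an aggregate accuracy figure alone can mask this kind of disparity.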
To IBM’s credit, Buolamwini’s research pushed the company to radically improve its facial recognition technology. This, however, hasn’t stopped the problem of racial bias from reappearing in other IBM products like AI Portrait Ars, or in the industry at large. Until we can root out the biases baked into our society that keep reemerging in each new generation of technology, what is to be done?
Caroline Sinders, a machine learning designer who previously worked with IBM Watson, told Motherboard that part of the problem lies with a “lack of awareness that we need to test multiple genders or multiple races.” At the same time, Sinders asked whether the solution is as simple as more diversity in data. “When these failures pop up, it really does highlight a lack of diversity in the sets. But also having a more diverse dataset for things that use facial images poses a problem where better facial apps lead to … better facial recognition. Do we necessarily want that?”
That’s a valid question when applied to the many in-the-field uses of AI and facial recognition technology, many of which are deployed disproportionately by police against people of color. As Sinders mentioned, better facial apps lead to better facial recognition. But do we need yet another AI face app at all?
Today, the problem of getting datasets that represent populations accurately and the legacy of technology being used to preserve power systems are very much interlinked. In a New York Times op-ed, Buolamwini talks about the “coded gaze,” a phenomenon where “A.I. systems are shaped by the priorities and prejudices — conscious and unconscious — of the people who design them.” The extremely high rates of misidentification that plague facial recognition software when used on people of color have led to calls for its complete and total ban. These embedded biases can affect hiring prospects, misidentify innocent people, and give unaccountable actors in the private sector or law enforcement apparatus greater information about our personal lives, without our consent. Some cities have already banned the technology, and Congress is expected to vote on legislation that would forbid face recognition in government-owned public housing.
All of this, however, shows that the best way to stop the problem is far from clear. Do we use more data to empower problematic technology? Do we use algorithms to de-bias other algorithms? Do we risk continuing to disrupt people’s lives while we figure this thing out? Maybe the answer is that we can’t, at least not without first questioning whether such fundamentally problematic technology should exist at all.