This article is from the WeChat public account: Tencent Research Institute (ID: cyberlawrc)

We live in an age surrounded by AI algorithms. Advances in technology have pushed AI past its old boundaries of use and into deeper decision-making fields that shape our lives. AI has become a recruiting interviewer, a sentencing assistant, an admissions reviewer ... The technology has undoubtedly brought us convenience, but at the same time an unavoidable problem has surfaced: algorithmic bias.

According to a BBC report on November 1, Apple co-founder Steve Wozniak claimed on social media that the Apple Card gave him a credit limit ten times higher than his wife's, even though the couple has no separate bank accounts or separate personal assets. This makes people wonder: is Apple's credit-limit algorithm sexist?

In fact, women are not the only ones being discriminated against, and the areas where bias is spreading go far beyond bank loans. In this feature, we start from the typical categories of algorithmic bias and examine how exactly bias penetrates the machine's brain, and how we might fight it in the future.

Typical categories of algorithmic bias

Insufficient technological inclusiveness

Joy Buolamwini, a scientist of Ghanaian descent, discovered by accident that face recognition software could not recognize her face unless she wore a white mask. Prompted by this, Joy launched the Gender Shades study and found that face recognition products from IBM, Microsoft, and Face++ all discriminate, to varying degrees, against women and darker-skinned people (that is, recognition accuracy for women and darker-skinned people is significantly lower than for men and lighter-skinned people), with a maximum gap of 34.3%.
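How is a gap like that 34.3% measured? Essentially by computing the model's accuracy separately for each demographic subgroup and comparing. The minimal Python sketch below shows the general idea; it is not the Gender Shades code, and the records, group names, and numbers in it are invented placeholders.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of dicts with 'group', 'label' and 'prediction' keys."""
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        total[r["group"]] += 1
        correct[r["group"]] += int(r["label"] == r["prediction"])
    return {g: correct[g] / total[g] for g in total}

# Hypothetical evaluation records for a gender classifier (placeholder data).
results = [
    {"group": "lighter-skinned male", "label": "male", "prediction": "male"},
    {"group": "lighter-skinned male", "label": "male", "prediction": "male"},
    {"group": "darker-skinned female", "label": "female", "prediction": "male"},
    {"group": "darker-skinned female", "label": "female", "prediction": "female"},
]

per_group = accuracy_by_group(results)
gap = max(per_group.values()) - min(per_group.values())
print(per_group)
print(f"maximum accuracy gap: {gap:.1%}")
```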

The essence of this problem is a lack of inclusiveness in face recognition technology toward different groups. It is like developing a product: it is natural and easy to cater to the habits of young and middle-aged users while overlooking the consequences for the elderly and children, or excluding people with disabilities from the user base altogether.

Image source: official website of the Algorithmic Justice League

Unfair predictions and decisions

If the inclusiveness problem falls mainly on minorities and women, unfair predictions and decisions can happen to anyone. Take hiring bias: HireVue, an AI interview tool used by well-known companies such as Goldman Sachs, Hilton, and Unilever, came under fire this November for its baffling decision preferences: the AI cannot tell whether you are frowning because you are thinking through a problem or because you are in a bad mood (which it reads as an irritable personality). The Durham Police in the United Kingdom have for several years used a crime prediction system that rated black people as twice as likely to commit crimes as white people, and that was also more inclined to classify white people as low-risk, lone offenders. (DeepTech 深科技)

In today's life, AI takes part in evaluation and decision-making in far more fields than these: beyond criminal justice and employment, it also reaches into finance and healthcare. AI decisions rely on learning from human decision preferences and outcomes, so machine bias is, in essence, a projection of the prejudices rooted in social tradition.

Displays of prejudice

Type "CEO" into a search engine and you get a string of white male faces; replace the keyword with "black girls" and, at one point, a flood of pornographic content appeared. Microsoft's chatbot Tay was pulled from Twitter just one day after launch because, under the influence of users, it began producing racist and extremist speech. (THU Data School) This kind of prejudice is not only learned from user interactions; it is also presented nakedly by AI products to an even wider audience, creating a chain-like cycle of prejudice.

Where does algorithmic bias come from?

Algorithms are not born discriminatory, and engineers rarely teach bias to them deliberately. So where does the bias come from? The answer lies in the core technique behind artificial intelligence: machine learning.

The machine learning process can be simplified into the following steps, and there are three main links through which bias is injected into the algorithm: dataset construction, objective setting and feature selection (the engineer's work), and data annotation.
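As a rough illustration of those three links, here is a schematic Python sketch. Every function in it is an invented placeholder rather than a real library API; the comments mark where each kind of bias can slip in.

```python
# Schematic only: invented placeholder functions, toy data.

def build_dataset(sources):
    # Link 1 - dataset construction: if `sources` over-represents "mainstream",
    # easy-to-collect groups, the model's picture of the world is already skewed.
    return [record for source in sources for record in source]

def annotate(dataset, labeler):
    # Link 3 - data annotation: a human labeler's subjective judgment
    # becomes the "ground truth" the model is trained to reproduce.
    return [(record, labeler(record)) for record in dataset]

def train(labeled_data, features, objective):
    # Link 2 - the engineer's choices: which features the model sees
    # and what objective it is optimized for.
    return {"features": features, "objective": objective, "examples": len(labeled_data)}

# Toy usage.
dataset = build_dataset([[{"text": "resume A"}, {"text": "resume B"}]])
labeled = annotate(dataset, labeler=lambda r: "hire" if "A" in r["text"] else "reject")
model = train(labeled, features=["years of experience"], objective="match past hiring decisions")
print(model)
```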

Dataset: the soil of prejudice

Datasets are the foundation of machine learning. If the dataset itself is not representative, it cannot objectively reflect reality, and the algorithm's decisions will inevitably be unfair.

The most common form of this problem is skewed proportions. Because it is easier to collect data from "mainstream", readily accessible groups, datasets tend to over-represent them and end up unevenly distributed across ethnicity and gender.

Facebook once announced that its face recognition system reached 97% accuracy when tested on Labeled Faces in the Wild, one of the world's best-known face recognition datasets. But when researchers examined this so-called gold-standard dataset, they found that nearly 77% of the faces were male and more than 80% were white. (All Media School) This means the trained algorithm can have trouble recognizing particular groups: on Facebook, photos of women and black people are less likely to be labeled accurately.
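Checking a dataset for this kind of skew is conceptually simple: count how each group is represented. Below is a minimal sketch using hypothetical records rather than LFW's actual metadata.

```python
from collections import Counter

def composition(records, attribute):
    """Share of each value of `attribute` among the records."""
    counts = Counter(r[attribute] for r in records)
    n = sum(counts.values())
    return {value: count / n for value, count in counts.items()}

# Hypothetical face metadata; a real audit would use the dataset's own labels.
faces = [
    {"gender": "male", "skin": "lighter"},
    {"gender": "male", "skin": "lighter"},
    {"gender": "male", "skin": "lighter"},
    {"gender": "female", "skin": "darker"},
]

print(composition(faces, "gender"))  # {'male': 0.75, 'female': 0.25}
print(composition(faces, "skin"))    # {'lighter': 0.75, 'darker': 0.25}
```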

Another situation is when existing social prejudices are carried into the dataset: when the raw data is itself the product of social prejudice, the algorithm learns the prejudiced relationships within it.

Amazon found that its recruiting system skewed against women because the algorithm was trained on the company's historical hiring data, and in the past Amazon had hired far more men. The algorithm learned this pattern from the dataset and was therefore more likely to overlook female applicants in its decisions. (MIT Technology Review)
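The mechanism is easy to reproduce with a toy example. The numbers below are invented, not Amazon's data; the point is only that a "model" which scores candidates by how often people like them were hired in the past simply replays the old imbalance.

```python
# Hypothetical historical hiring records (invented numbers).
history = (
      [{"gender": "male", "hired": True}] * 80
    + [{"gender": "male", "hired": False}] * 20
    + [{"gender": "female", "hired": True}] * 5
    + [{"gender": "female", "hired": False}] * 15
)

def historical_hire_rate(records, gender):
    group = [r for r in records if r["gender"] == gender]
    return sum(r["hired"] for r in group) / len(group)

# A naive "model" that uses the historical hire rate as a candidate's score
# inherits the bias baked into the training data.
print("score for male candidates:  ", historical_hire_rate(history, "male"))    # 0.8
print("score for female candidates:", historical_hire_rate(history, "female"))  # 0.25
```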

In fact, the databases behind almost every machine learning algorithm carry bias of some kind.

The engineer: rule maker

Algorithm engineers are involved in the system from beginning to end: they set the objective of the machine learning task, decide which model to use, and choose which features to include. Each of these choices quietly writes the rules the algorithm will follow.

The annotator: unintentional rulings

For unstructured datasets (large amounts of descriptive text, images, video, and so on), the algorithm cannot perform analysis directly. The data first has to be labeled manually so that structured dimensions can be extracted for training. To take a very simple example, Google Photos sometimes asks you to confirm whether a picture shows a cat; at that moment, you are taking part in labeling that picture.
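Here is a sketch of what that labeling step produces, using a made-up schema: the unstructured item (just an image file) gains a structured field that the training algorithm will later treat as ground truth.

```python
# Hypothetical annotation schema; field names are illustrative only.
raw_items = [
    {"id": "img_001", "path": "photos/001.jpg"},  # unstructured: just an image file
    {"id": "img_002", "path": "photos/002.jpg"},
]

def annotate(item, contains_cat):
    # The human judgment becomes a structured training label.
    return {"id": item["id"], "contains_cat": contains_cat}

training_rows = [
    annotate(raw_items[0], True),   # "Is this a cat?" -> yes
    annotate(raw_items[1], False),  # -> no
]
print(training_rows)
```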

When the annotator faces a "cat or dog" question, the worst outcome is simply a wrong answer; but when the question becomes "beautiful or ugly", prejudice creeps in. As data processors, annotators are often asked to make subjective value judgments, and this has become a major source of bias.

ImageNet is a typical case. As the world's largest image-recognition database, many of its pictures were manually annotated and tagged with fine-grained labels. "Although we cannot know whether the people doing the labeling carried such prejudices themselves, they defined what 'losers', 'sluts' and 'criminals' should look like ... The same problem can also arise with seemingly 'harmless' labels. After all, even the definitions of 'man' and 'woman' are open to question." (All Media School)

Trevor Paglen is one of the founders of the ImageNet Roulette project, which aims to show how perspectives, prejudices, and even offensive perceptions shape artificial intelligence. As he puts it: "The way we label images is a product of our worldview, and any classification system will reflect the values of its classifiers." Against different cultural backgrounds, people hold prejudices about different cultures and races.

The labeling process transfers personal prejudice into the data, the algorithm absorbs it, and a biased model is born. Today, manual labeling has become a business model in its own right, with many technology companies outsourcing their massive annotation workloads. This means algorithmic bias is being spread and amplified through a process that is at once "invisible" and "legitimized".

Summary

Because AI technology is so widely applied and its workings are a black box, algorithmic bias has long been a hidden but widespread social hazard. It produces unfair decisions, makes face recognition technology serve only part of the population, and displays prejudiced views in search results ...

But machines have never invented prejudice on their own. Prejudice is learned at several key links of machine learning: from the imbalance of datasets, to the bias in feature selection, to the subjectivity introduced by manual labeling. In migrating from humans to machines, prejudice has gained a kind of "invisibility" and "legitimacy", and it keeps being practiced and amplified.

Looking back, though, technology is merely a mirror of society and of people. In a way, algorithmic bias re-exposes truths hidden in dark corners and sounds an alarm in an otherwise forward-moving, seemingly bright moment. So part of the effort to cope with algorithmic bias must return to people themselves. Fortunately, technical self-discipline and attempts at governance can already greatly reduce the degree of bias and prevent it from expanding further. What are these methods?
