After the “double reduction” policy, should parents “lie flat” or keep “chicken-parenting” (pushing their children hard) in some other way? This has recently been a hot topic among “chicken-baby parents”, typified by the “Haidian moms”.

The research of Joshua D. Angrist, the American economist newly awarded the Nobel Prize, and his late collaborator Alan B. Krueger may help us understand this question better.

One of Angrist and Krueger’s most famous studies, from the 1990s, used “regression discontinuity” to show that people with more years of schooling are more likely to earn high incomes. It might seem that even a Nobel laureate has vindicated “chicken-parenting”. However, their research in this century found that, compared with attending ordinary public secondary schools, attending elite secondary schools did not lead to better college entrance examination results. Another Krueger study found no significant difference in income between graduates of Ivy League schools and graduates of ordinary state universities. So a more complete conclusion would be: receiving more education is beneficial, but the benefits of an elite education are not obvious.

In fact, there had been a great deal of research on education and personal achievement before them. Angrist won the Nobel Prize not because his conclusions were especially novel, but because the methods he used exclude the influence of many complex endogenous factors (such as personal endowment and family income), and so can confirm (or falsify) the existence of a causal relationship. (The influence of endogenous factors is easy to illustrate: a child from a wealthy family is more likely to receive a good education, and also finds it easier to land a high-income job through the parents’ contacts after graduation, so it is hard to know how much of the income is due to education.)

The ancient Greek philosopher Democritus once said: “Finding the cause of a thing is better than being the king of the Persians.” Why is causality so precious? Because it is so difficult to discover. And why is it difficult to discover? Because there are too many “obstacles” in our way.

1. Sample bias

The question of whether to “chicken-parent” has long preoccupied Americans too. In the 1940s, “Time” magazine reported that a survey of Yale University’s class of 1924 found an average annual income as high as $25,000. At the purchasing power of the U.S. dollar at that time, this was roughly equivalent to more than RMB 2 million. It would seem that as long as you find a way to get your child into a prestigious school, the child’s future is secure and you can sit back and relax.

In fact, many “obstacles” stand behind this conclusion. For example, does everyone tell the truth when reporting income? Those who want to avoid taxes or hide their wealth may underreport, while the vain may overstate. For another example, the politicians, entrepreneurs, and well-known scholars among Yale graduates are obviously easier for researchers to find, while the homeless and the bankrupt businessmen are not.

Consider another example. During the Spanish-American War in 1898, the death rate in the US Navy was 9 per 1,000, while the death rate among New Yorkers during the same period was 16 per 1,000. Navy recruiters used these figures to show that joining up was safe. This unreasonable conclusion is also caused by sample bias: everyone who enlists has to pass a medical examination, so most recruits are young and physically strong, while New Yorkers include many who are old, frail, sick, or disabled. The two figures are therefore not comparable at all.
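The Navy comparison can be made concrete with a small sketch. Only the two headline rates (9 and 16 per 1,000) come from the text; the split of the civilian population into two groups, the group sizes, and the per-group death rates are invented purely for illustration.

```python
# Death rates per 1,000 people. Only the Navy rate (9) and the overall
# civilian rate (16) come from the article; the breakdown below is hypothetical.
navy_rate = 9

# Hypothetical civilian population: (group size, deaths per 1,000)
civilians = {
    "healthy young adults": (300_000, 2),
    "everyone else": (700_000, 22),
}

total_size = sum(size for size, _ in civilians.values())
total_deaths = sum(size * rate / 1000 for size, rate in civilians.values())
civilian_overall_rate = total_deaths * 1000 / total_size

comparable_rate = civilians["healthy young adults"][1]

print(f"civilians overall:        {civilian_overall_rate:.0f} per 1,000")
print(f"Navy:                     {navy_rate} per 1,000")
print(f"comparable civilians:     {comparable_rate} per 1,000")
```

Under these assumed numbers the Navy looks safer than "New Yorkers" as a whole, yet is several times more dangerous than the civilian group that recruits actually resemble.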

2. Playing with concepts

Professor Li Lianjiang of the Chinese University of Hong Kong calls the process of inferring a population from a sample a “thrilling leap”: like a migrating salmon, one careless jump can land you in the mouth of a brown bear. For example, we often use the average to represent the overall situation, but the concept of “average” is very slippery. The American statistician Darrell Huff wrote in his classic book “How to Lie with Statistics” that the average is a trick that is often used, sometimes unintentionally, but more often knowingly.

“How to Lie with Statistics” (Chinese edition), by Darrell Huff, translated by Jin Yan and Wu Yujing, CITIC Publishing Group, 2018 edition

Going back to the example above, suppose we had tracked down every member of Yale’s class of 1924, and all of them were willing to report their real income. Would the average income we obtained be reliable? Obviously not. As netizens often joke, if you average me with the richest man in China, I am a billionaire too. The average is more convenient to calculate, but the median better reflects the typical case.
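A minimal sketch of the joke above, using Python’s standard `statistics` module. The income figures are made up for illustration: nine ordinary earners and one extremely rich person.

```python
from statistics import mean, median

# Hypothetical annual incomes in yuan: nine ordinary earners plus one
# extreme outlier (the "richest man" in the joke). All numbers are invented.
incomes = [60_000] * 9 + [10_000_000_000]

average_income = mean(incomes)
median_income = median(incomes)

print(f"mean:   {average_income:,.0f} yuan")
print(f"median: {median_income:,.0f} yuan")
```

With these assumed numbers the mean is over a billion yuan, while the median stays at 60,000 yuan, which is what nine of the ten people actually earn.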

Beyond the “average”, there are many other examples of “playing with concepts” in everyday life. For example, according to data released by some companies, ride-hailing drivers and delivery riders earn more than white-collar workers at many firms. But white-collar pay is calculated per working day, which is why netizens jokingly call using the office restroom “paid toileting”. The time ride-hailing drivers and delivery riders spend waiting for orders, by contrast, is not counted, and they are even less likely to be paid for time spent in the toilet.

For another example, the U.S. medical insurance agency uses the rate of unplanned readmission within 30 days of discharge (the “readmission rate”) as one of its indicators of a hospital’s quality of care. As a result, some hospitals classify returning patients as “outpatient” or “emergency” rather than “inpatient”, quietly switching the concept.

As another example, suppose a company claims that its profit margin is only 1%. It sounds like a conscientious company devoted to its customers. Yet investing in such a company would seem to pay less than putting the money in the bank at a good interest rate. Are its investors all fools? Not at all. Suppose I spend 100 yuan on goods every morning and sell them for 101 yuan in the afternoon. The profit margin is indeed only 1%, but my return on investment over a year is as high as 365%. The trick here is swapping the concept of “profit margin” for that of “return on investment”.
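The arithmetic behind the 1% and 365% figures can be checked in a few lines. This sketch assumes one buy-and-sell cycle on each of 365 days, with the daily profit withdrawn rather than reinvested, and measures the margin relative to cost, as the example in the text does.

```python
capital = 100.0       # yuan spent on goods each morning
sale_price = 101.0    # yuan received each afternoon
trading_days = 365    # assumption: one full cycle every day of the year

profit_per_cycle = sale_price - capital

# Margin per transaction, measured against cost as in the text.
profit_margin = profit_per_cycle / capital

# Simple (non-compounded) annual return on the 100 yuan of capital.
annual_profit = profit_per_cycle * trading_days
annual_roi = annual_profit / capital

print(f"profit margin per sale: {profit_margin:.0%}")
print(f"annual return on investment: {annual_roi:.0%}")
```

The same 100 yuan is reused every day, so a thin margin per sale and a very high annual return are not in conflict.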

3. Misleading publicity

“History is a little girl who can be dressed up as you please”, and so is data. Publicity built on statistics is like the “eye enlargement” and “face-slimming” modes of a photo-retouching app: it deliberately highlights some things and hides others.

For example, we know that the elderly generally spend more on medical care than the young, and that people with a medical history usually pay more for commercial insurance than those without one. Now suppose an insurance company tells you that anyone, regardless of age or health and without a medical examination, can buy its medical insurance for only one or two hundred yuan, with a maximum payout of several million. Have we stumbled on a great bargain? In fact, if we read the policy terms carefully, we may find that the deductible is set very high, which means most people will never receive a payout.

Publicity about traffic safety is even more interesting. Suppose someone says that the number of people killed in air crashes last year was 100 times the number in 1980. Are today’s planes less safe than earlier ones? Obviously not: far more people fly now than in 1980. For another example, suppose statistics show that thousands of people died in train accidents last year. Is taking the train less safe than we thought? In fact, it is possible that most of these thousands were killed while crossing the tracks or while illegally hitching rides on trains. Or suppose statistics tell you that driverless cars are involved in only one thousandth as many accidents as ordinary cars. Can we conclude that driverless cars are safer? No, because there are far fewer driverless cars than ordinary cars to begin with.
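The driverless-car case comes down to comparing counts instead of rates. In the sketch below every number is hypothetical, chosen only so that the driverless accident count is exactly one thousandth of the ordinary one, as in the example above.

```python
# All figures are hypothetical and chosen only to illustrate the point.
ordinary_cars = 300_000_000
ordinary_accidents = 3_000_000

driverless_cars = 10_000
driverless_accidents = 3_000   # one thousandth of the ordinary count

# Accidents per vehicle put both fleets on the same footing.
ordinary_rate = ordinary_accidents / ordinary_cars
driverless_rate = driverless_accidents / driverless_cars

print(f"ordinary:   {ordinary_accidents:>9,} accidents, {ordinary_rate:.1%} of vehicles")
print(f"driverless: {driverless_accidents:>9,} accidents, {driverless_rate:.1%} of vehicles")
```

With these invented numbers the driverless fleet has far fewer accidents in total but a much higher accident rate per vehicle, so the raw count alone says nothing about safety. A fairer comparison would also account for distance driven, which this sketch does not model.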

4. Wrong attribution

It is difficult to find a true causal relationship, but easy to find a false one. Wrong causal attribution usually takes one of the following forms:

The first is to mistake the effect for the cause. Suppose studies show that students who smoke usually have worse academic performance. Can we conclude that smoking harms learning? Not necessarily. It is also possible that students with poor grades take up smoking because they are depressed. Suppose studies show that the share of “older unmarried” people is higher among women with PhDs than among men with PhDs or women with master’s degrees. Can we conclude that doing a PhD makes it harder for women to find a partner? Not necessarily. Perhaps women with more education are simply more accepting of late marriage or of staying single. A more amusing example from Darrell Huff’s book: the residents of a Pacific island noticed that most healthy people had lice, and so believed that lice were good for health. In fact, it is because lice prefer healthy hosts. This too is a fallacy produced by reversing cause and effect.

The second is to confuse correlation with causation. Faced with a statistic such as “students who smoke have worse academic performance”, we may mistake the effect for the cause, or we may mistake covariation for causation. (Covariation means that two things which appear to be related are both the result of a third thing. For example, students who like to hang around with a rough crowd may both have worse grades and be more likely to smoke.) Darrell Huff’s book also cites an example: someone discovered that the salaries of Presbyterian ministers in Massachusetts were closely correlated with the price of rum in Havana, Cuba. Did the ministers buy a lot of rum after their pay rose? Or did the church raise their pay because it profited from reselling rum? In fact, there is no causal relationship between the two; both were the result of the worldwide rise in prices at the time.
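The rum-and-salary example can be imitated with simulated data. The sketch below is not based on any real figures: it invents a rising general price level and lets both series depend on it plus independent noise. The coefficients, noise levels, and sample size are all assumptions.

```python
import random
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 2000                # hypothetical number of observations

# The confounder: a general price level that rises steadily from 1.0 to about 3.0.
price_level = [1.0 + 2.0 * t / n for t in range(n)]

# Both series depend only on the price level plus their own independent noise.
salary = [1000 * p + rng.gauss(0, 50) for p in price_level]
rum_price = [10 * p + rng.gauss(0, 0.5) for p in price_level]

raw_corr = pearson(salary, rum_price)

# Remove the part explained by the price level (the coefficients are known
# here only because we constructed the data ourselves).
salary_resid = [s - 1000 * p for s, p in zip(salary, price_level)]
rum_resid = [r - 10 * p for r, p in zip(rum_price, price_level)]
residual_corr = pearson(salary_resid, rum_resid)

print(f"raw correlation:            {raw_corr:.3f}")
print(f"after removing price level: {residual_corr:.3f}")
```

The raw correlation is close to 1 even though neither series influences the other, and it largely disappears once the shared price level is taken out. With real data the confounder’s effect would have to be estimated, for example by regression, rather than subtracted directly.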

The third is to stretch a causal explanation beyond its scope. As mentioned earlier, Angrist and Krueger showed that people with more years of schooling are more likely to earn a high income. But a truth taken one step too far can become a fallacy. Their research subjects were limited to the compulsory education stage. In other words, people with master’s degrees do not necessarily earn more than those with bachelor’s degrees, and people with doctorates do not necessarily earn more than those with master’s degrees.

The fourth is to mistake the unrelated for the related. In January 2020, “The Lancet” published two epidemiological reports on COVID-19. From the statistics, it seemed that men were more likely to be infected with the new coronavirus. Further research found that this conclusion arose only because the samples were small and gender data was easier to obtain than other data; susceptibility is in fact unrelated to gender. For another example, if we find that Europeans and Americans who drink milk have a much higher rate of cancer than Africans who do not, can we conclude that drinking milk causes cancer? Not necessarily. It may simply be because average life expectancy in Africa is much shorter, and most people die of other causes before they can develop cancer. Drinking milk (perhaps) has nothing to do with cancer. Similarly, Yale graduates may be richer because their families already had money, which has nothing to do with their education.

In “How to Lie with Statistics”, Darrell Huff quotes the former British Prime Minister Benjamin Disraeli: “There are three kinds of lies in the world: lies, damned lies, and statistics.” In fact, apart from data that is false or incorrect, most data helps us make up for the limits of experience and intuition and so understand the world better. However, to expand human knowledge and wisdom we need not only data but also a scientific causal explanation of the data, along with a spirit of independent thinking that refuses to follow blindly. Just as the motto of the Enlightenment was “Dare to know”, in seeking the truth we can rely only on ourselves. I think this is the most important lesson we can draw from the research of Angrist and others.

(The author, Wang Xiang, is a researcher and assistant to the director of the Digital and Mobile Governance Laboratory of Fudan University. His main research areas are public governance in the digital age and human resource management in the public sector.)