This article is from the WeChat public account Jizhi Club (ID: swarma_org). Author: Tim Maudlin.

The promise of big data has tempted the scientific community into the lazy habit of substituting correlation for causation. But in the end we still have to ask causal questions. "The Book of Why," by Turing Award winner and computer scientist Judea Pearl and science writer Dana Mackenzie, offers a new answer to this ancient question.

Correlation is not causation

This maxim, though important, has gradually become a cliché, and examples of so-called spurious correlations abound. For instance, a Pacific island tribe believed that fleas were good for one's health, because healthy people had fleas while sick people did not. The correlation is real and robust, but fleas obviously do not cause health; they merely indicate that a person is in a healthy state. Fleas on a feverish individual simply abandon their host in search of a healthier one. No one should actively seek out fleas to ward off disease.

The problem arises from a simple observation: the evidence for causation seems to consist entirely of correlations. Yet seeing a correlation, we cannot tell whether there is a causal relationship. For example, the only reason we know that smoking causes lung cancer is that we have observed correlations between the two in particular circumstances. Hence the puzzle: if causation cannot be reduced to correlation, how can correlation serve as evidence of causation?

"The Book of Why," by computer scientist Judea Pearl and science writer Dana Mackenzie, is intended to provide a new answer to this ancient question.

Since the Enlightenment, this question has been discussed in some form by scientists and philosophers. In 2011, Pearl won the Turing Award, the highest honor in computer science, "for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning." This book aims to explain to general readers what causal reasoning means; his more technical book on the same topic, "Causality," published nearly twenty years earlier, has also been updated.

This new book is written in the first person, combining theory, history, and memoir. It details the technical tools of causal reasoning Pearl developed and his entire research career, including his long-running disputes with mainstream scientific institutions. The mainstream academic community had long been content to analyze correlations in data rather than to establish causation. The book is full of scientific and sociological insight and warning.


Of course, Pearl has a larger question in view: how we think, and the current fanaticism and hype surrounding artificial intelligence. He writes that this data-centric mindset still troubles us today. Eleven years ago, Wired magazine announced "The End of Theory," predicting that the torrent of data would make the scientific method obsolete.

Pearl strongly opposes this claim. "We live in an era that presumes Big Data to be the solution to all our problems," he writes, "but I hope with this book to convince you that data are profoundly dumb." Data can help us predict what will happen very well. Indeed, computers can drive cars and defeat humans at very complex strategy games, whether Go or Jeopardy! (an American TV quiz show covering a wide range of topics, in which contestants are given answers and clues and must respond with questions).

But even today's most advanced machine-learning techniques cannot explain the data, cannot tell us why the predictions work. For Pearl, what we are missing is a "model of reality," and such a model depends crucially on causes. Against the data fanatics, he argues that our brains are nothing like modern computers.

The basic question of causal reasoning

Consider the following scenario. Suppose there is a robust, statistically significant, long-term correlation between the color of a car and its annual accident rate. To be concrete, suppose red cars are involved in more accidents each year than cars of other colors, and you know of this research. You are about to buy a car: should you play it safe and rule out red cars altogether?

After some thought, you come up with speculations about the observed correlation. On the one hand, you reason, red cars may have a higher accident rate because the human visual system is worse at judging the distance and speed of red objects than of objects of other colors. In that case, a red car might genuinely get into more accidents, because other drivers often misjudge its speed and distance as they approach it, causing accidents.

On the other hand, you reason, the correlation may have nothing to do with any danger of the color itself; the red is merely a by-product of some other cause. Drivers who choose red cars may be more adventurous and thrill-seeking than the average driver, and hence involved in proportionally more accidents, quite apart from their driving ability. Or people who buy red cars may simply like driving more than others do, spending more hours on the road each year. In that case, even if these drivers are on average more careful than other drivers, red cars will in all probability figure in more accidents.

This example raises the basic question of causal reasoning: for any correlation there are many possible explanations. In the course of causal inquiry, how do we find the one closest to the truth?

In some cases, the best approach is to look for further correlations among different variables. For example, to check whether the increased accident rate is due to more time spent on the road, we should control for driving time. If the real source of the correlation is how much the drivers like to drive, rather than the color itself, then the correlation should disappear when we examine accidents per mile driven, or per hour driven.

This line of thought suggests that inferring causation from correlation is a matter of combing through a large enough data set for yet more correlations, using them to eliminate candidate causal hypotheses. On this operationalist way of thinking, all the answers are somehow already in the data; one need only find the right way to extract the iron law of cause and effect.
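The eliminative logic described above can be sketched in a small simulation. This is my own toy model, not an example from the book: all numbers are made up, and the "taste for driving" variable is hypothetical. Accidents here depend only on miles driven, never on color, yet the raw annual rates differ by color; controlling for mileage makes the difference vanish.

```python
import random

random.seed(0)

# Toy model (all parameters invented for illustration): a taste for
# driving causes both buying a red car and driving more miles; the
# color itself never causes accidents.
RATE_PER_MILE = 2e-6

def simulate_driver():
    loves_driving = random.random() < 0.3
    red = random.random() < (0.6 if loves_driving else 0.2)
    miles = 20_000 if loves_driving else 8_000
    accident = random.random() < RATE_PER_MILE * miles
    return red, miles, accident

drivers = [simulate_driver() for _ in range(200_000)]
red = [d for d in drivers if d[0]]
other = [d for d in drivers if not d[0]]

def annual_rate(group):
    return sum(a for _, _, a in group) / len(group)

def per_mile_rate(group):
    return sum(a for _, _, a in group) / sum(m for _, m, _ in group)

# Raw correlation: red cars really do have more accidents per year...
print(f"annual rate  red: {annual_rate(red):.4f}   other: {annual_rate(other):.4f}")
# ...but controlling for miles driven makes the difference disappear,
# which counts against the hypothesis that red causes accidents.
print(f"per-mile rate  red: {per_mile_rate(red):.2e}   other: {per_mile_rate(other):.2e}")
```

Under this model the annual rates differ markedly while the per-mile rates agree up to sampling noise, exactly the signature the article describes.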

Interpreting causality with causality

Pearl began work on artificial intelligence in the 1970s. Throughout the twentieth century, many in the scientific community regarded the very concept of causation as suspect unless it could be translated into purely statistical language. The outstanding question was how such a translation would work.

But gradually, Pearl found that the translation does not work. Causation really cannot be reduced to correlation, even in ever-larger data sets. Throwing more computing resources at the problem, for instance via his own early work on Bayesian networks (which apply Thomas Bayes's basic principle of updating the probabilities of linked events in an interconnected data set as new evidence arrives), never yields a solution. Simply put, you can never get causal information out without first putting causal hypotheses in.

This book tells how Pearl came to this realization, and how he then developed simple but powerful techniques, using what he calls "causal diagrams," to answer questions about causality, or to determine when such questions cannot be answered from the data at all. The book should be intelligible to any interested reader; it only asks readers to pause over some formulas to digest their concepts and meaning (though the precise details will require some effort even for those with a background in probability).

It is worth mentioning that in promoting the book, Pearl explains the use of causal hypotheses in an accessible way. A causal hypothesis is expressed not so much in statistical algebra as in an intuitive picture: a "directed graph" indicating the conjectured causal structure, with arrows pointing from hypothesized causes to their effects.

Such a directed causal graph has two basic building blocks. If two arrows leave the same node, we have a "fork" (a common cause), which can produce a statistically significant correlation between attributes that have no causal connection to each other (as in the hypothesis that a taste for red links car color to accident rate): A may cause both B and C, while B and C have no causal relationship. If instead two arrows enter the same node, we have a "collider," which must be handled differently: A and B may jointly cause C, yet A and B have no causal relationship.
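The collider case is the less intuitive of the two, so here is a minimal simulation of my own (not an example from the book): A and B are generated independently, and C is caused by both. Unconditionally, A tells us nothing about B; but restricting attention to cases where C occurred, that is, conditioning on the collider, manufactures a spurious correlation between its causes.

```python
import random

random.seed(1)

# Collider structure:  A -> C <- B, with A and B independent coin flips
# and C = A or B. (Toy model for illustration only.)
N = 100_000
samples = []
for _ in range(N):
    a = random.random() < 0.5
    b = random.random() < 0.5   # independent of a
    c = a or b                  # collider: both cause c
    samples.append((a, b, c))

def p_b_given_a(pairs):
    """Return (P(B | A), P(B | not A)) for a list of (a, b) pairs."""
    b_when_a = [b for a, b in pairs if a]
    b_when_not_a = [b for a, b in pairs if not a]
    return sum(b_when_a) / len(b_when_a), sum(b_when_not_a) / len(b_when_not_a)

# Unconditionally, A and B are uncorrelated:
p1, p0 = p_b_given_a([(a, b) for a, b, _ in samples])
print(f"P(B|A)={p1:.3f}   P(B|not A)={p0:.3f}")   # roughly equal

# Conditioning on the collider ("selecting on the effect") induces a
# spurious negative correlation between the two independent causes:
p1c, p0c = p_b_given_a([(a, b) for a, b, c in samples if c])
print(f"given C:  P(B|A)={p1c:.3f}   P(B|not A)={p0c:.3f}")   # now unequal
```

This is why forks and colliders demand opposite treatment: adjusting for a common cause removes a spurious correlation, while adjusting for a collider creates one.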

The distinction between these two structures matters greatly for causal inference. As Pearl demonstrates, the general analysis of a causal model is largely a matter of identifying the "backdoor" and "front-door" paths connecting the nodes and carefully adjusting for them as appropriate.

Let us look at some simple examples. We know that red cars are positively correlated with the annual accident rate, but what we do not know is whether painting a car red makes it more dangerous. So we begin to consider various causal hypotheses and represent them as directed graphs: nodes connected by arrows.

In one hypothesis, redness causes the high accident rate, so we draw an arrow from the node "red car" to the node "accidents."

In another hypothesis, certain personality traits cause both buying a red car and driving more, and driving more causes more accidents per year. In this causal diagram, arrows point from the "personality traits" node to the "red car" node and to the "more driving" node (personality traits are thus a common-cause fork), and another arrow points from "more driving" to "number of accidents." Personality traits lead to accidents only indirectly.

Schematic diagram of the hypothesized causal relationships between red cars and the accident rate

In this diagram there is still a connecting path from "red car" to "number of accidents" that explains the correlation, but it is a backdoor path: it runs through a common cause (personality). We can test the hypothesis by controlling either for "personality traits" (which may be unknown or hard to measure) or for "more driving" (which can be measured). If the correlation persists when either of these is controlled for, then we know this causal hypothesis is wrong.

But how do we decide which causal models to test in the first place? For Pearl, these models are supplied by theorists, on the basis of background information, commonsense conjecture, or even blind guesswork, not by the data. A causal graph lets us test hypotheses against the data, singly or jointly; it does not tell us which hypotheses to test. (Pearl insists: "We collect data only after we posit the causal model, after we state the scientific questions we wish to answer. This is in contrast with the traditional statistical approach, which does not even have a causal model.") Sometimes data refute a theory. Sometimes we find that the data at hand cannot decide between a pair of competing causal hypotheses, but that new data could be gathered and analyzed. And sometimes we find that no data at all could distinguish the hypotheses.

Although this method of extracting causal conclusions from statistical data with the help of hypothesized causal structures is quite simple, as the examples Pearl and Mackenzie give the reader show, the path by which Pearl arrived at these methods was difficult and roundabout. This is mainly because, by the time he began to promote the approach, the entire field of statistics had long since stopped talking about causation, so Pearl's method had to run counter to the field's "common sense." Pearl has been at odds with the mainstream since the late 1980s and early 1990s, and in this book he proudly recounts his intellectual and institutional resistance.

The second author of "The Book of Why," science writer Dana Mackenzie

This may seem an old and familiar story. Descriptions of the world, whether scientific or commonsensical, posit things that cannot be immediately observed (for example, laws and causal structures). Yet the data against which such theories are evaluated must be observable: that is what makes them data. So there is a gap between what we believe (the theory) and our grounds for believing it (the data). What philosophers call "the underdetermination of theory by evidence" means that data can never conclusively prove a theory correct. Sensitive minds find this epistemic gap intolerable, and so many scientific fields have tried, again and again, to close it by somehow recasting science as a structure built from observable data alone.

This "reduction to data" approach has been tried many times, from behaviorism in psychology to positivism in physics, and it has failed. In statistics, according to Pearl, it took the form of abandoning all talk of causation, because, as the Enlightenment philosopher David Hume pointed out in the 18th century, the causal relation between events is not directly observable. What we can observe, Hume said, is the conjunction of things, such as one thing consistently following another; but such conjunction is not causation.

The positivist upshot was simply not to speak of causes. As Pearl puts it, "You will find that looking for a 'cause' entry in the index of a statistics textbook is futile." Students may not say that x is the cause of y, only that x and y are "related."

But what we usually care about is effective intervention, and what works depends on the causal structure. If red cars have a higher accident rate because red really is hard to see accurately, then buying a car of a different color is safer. If the correlation is due only to a common cause, such as the buyer's psychological traits, then you may as well choose whatever color you like: avoiding red will not magically make you a better driver. It must be said that suppressing talk of causation in favor of talk of correlation has left the field of statistical analysis confused.

Randomized controlled trials are not enough

To be fair, mainstream academia did not so much suppress discussion of causation as demote it to a more specialized setting (experiment rather than observational study); experimental science is far more than statistical analysis, and it is experiment that was allowed to speak of causes.

In fact, there is one generally accepted way for observed correlations to count as evidence of causation: the randomized controlled trial (RCT). Suppose we randomly divide a large group of car buyers into two groups, experimental and control. We then force the experimental group to drive red cars and forbid the control group to do so. Since the groups are formed at random, it is highly likely (if they are large enough) that they are statistically similar in all other respects.

For example, the proportion of reckless drivers in each group will be about the same. If the number of accidents in the experimental group exceeds that in the control group by a statistically significant margin, we have "gold standard" evidence that the color itself causes accidents.

Of course, this is not quite the true gold standard; the stricter gold standard is the double-blind experiment, in which neither subjects nor experimenters know who is in which group. With car color, a double-blind experiment would require blindfolding the drivers, which would of course greatly increase the accident rate. So let us set that worry aside.

The key to a randomized controlled trial is that by assigning members to the two groups at random, rather than letting them choose, we control for all rival explanations at once. Of course, the random assignments are not metaphysically random; they are in fact determined by dice throws or random number generators. But any influence those mechanisms have will not reach statistical significance.
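The effect of random assignment can be seen by rerunning the same toy model as before (again, my own invented numbers, not from the book), but with car color assigned by coin flip instead of chosen by the driver. Randomization severs the link between personality and color, so the thrill-seekers are split evenly between the groups and the accident rates come out statistically equal.

```python
import random

random.seed(2)

# Same made-up model as before: accidents depend only on miles driven,
# and a taste for driving determines miles. But now color is ASSIGNED
# at random, as in an RCT, rather than chosen by the driver.
RATE_PER_MILE = 2e-6
N = 200_000

def trial():
    loves_driving = random.random() < 0.3
    red = random.random() < 0.5   # assigned by coin flip, not chosen
    miles = 20_000 if loves_driving else 8_000
    accident = random.random() < RATE_PER_MILE * miles
    return red, accident

results = [trial() for _ in range(N)]
n_red = sum(1 for r, _ in results if r)
rate_red = sum(a for r, a in results if r) / n_red
rate_other = sum(a for r, a in results if not r) / (N - n_red)

# Randomization balances the hidden personality trait across groups,
# so the accident rates agree up to sampling noise: color is exonerated.
print(f"red: {rate_red:.4f}   other: {rate_other:.4f}")
```

In the earlier observational version of this model the red group's annual rate was visibly higher; here the difference disappears, which is exactly what randomization buys us.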

Pearl marks this distinction with the "do-operator," which signifies an intervention rather than a mere observation of x. If I open my eyes and watch the traffic go by, I can record who is driving a red car and who is not. But do(drive a red car) requires making someone drive a red car, or making them drive a non-red one. This is precisely the difference between observational studies and randomized controlled trials: observational studies do not intervene. (For readers familiar with statistical methods, the book contains plenty of technical and computational detail about this calculus, but it can also be read and understood at the purely graphical level.)

Randomized controlled trials are not perfect

Pearl does not question the evidential value of randomized controlled trials. But RCTs are costly, difficult, and sometimes unethical. The best possible evidence that smoking causes cancer in humans would come from an experiment in which a large group of infants was randomly divided in two, one group forced to smoke two packs a day and the other forbidden to smoke. Such an experiment is obviously morally impermissible.

One of Pearl's main contributions is the development of the "do-calculus." He, his students, and his colleagues showed that if one starts with an accurate graphical model of the causal structure of a situation, with arrows indicating which variables may cause which others, then in some cases one can substitute observation for intervention. That is, suitable passive observational data can supply the same evidence as an RCT, provided the initial causal model is accurate. The advantage of the RCT is that it delivers causal evidence without any initial causal assumptions; the advantage of the do-calculus is that it can deliver equally strong causal conclusions without intervention.
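The simplest instance of this substitution is backdoor adjustment. Sticking with the same invented model, and assuming for illustration that the confounding trait Z happens to be recorded in the observational data, we can compute P(accident | do(red)) = Σ_z P(accident | red, z) P(z) from purely observational data and recover the RCT's answer: no effect of color.

```python
import random

random.seed(3)

# Observational data from the made-up model: personality Z is the
# common cause of car color and mileage, and here we assume Z is
# actually measured (a strong assumption, made for illustration).
RATE_PER_MILE = 2e-6
data = []
for _ in range(300_000):
    z = random.random() < 0.3                    # loves driving
    red = random.random() < (0.6 if z else 0.2)  # chosen, not assigned
    miles = 20_000 if z else 8_000
    accident = random.random() < RATE_PER_MILE * miles
    data.append((z, red, accident))

def p_acc_given(red_value, rows):
    matching = [r for r in rows if r[1] == red_value]
    return sum(a for _, _, a in matching) / len(matching)

# Naive observational comparison: biased by the backdoor path through Z.
naive_red = p_acc_given(True, data)
naive_other = p_acc_given(False, data)

# Backdoor adjustment: P(acc | do(red)) = sum over z of P(acc | red, z) P(z)
def p_do(red_value):
    total = 0.0
    for z in (True, False):
        stratum = [r for r in data if r[0] == z]
        total += p_acc_given(red_value, stratum) * len(stratum) / len(data)
    return total

print(f"observational: {naive_red:.4f} vs {naive_other:.4f}")   # differ
print(f"after do():    {p_do(True):.4f} vs {p_do(False):.4f}")  # agree
```

The adjusted quantities match what the randomized trial measured directly, which is the sense in which passive data plus an accurate causal graph can stand in for an intervention.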

Climbing the ladder of causation

The final part of the book enters philosophical territory. Pearl describes the move from merely observing correlations to testing causal claims as climbing from the first rung of the ladder of causation to the second.

The ladder of causation

That difference is just the one we have been attending to: between correlations in the data and conclusions about causal structure. But the situation is more complicated still, for Pearl insists there is a yet higher rung on the ladder: the third level, which involves counterfactual reasoning.

Counterfactual reasoning asserts what would have happened had the world been otherwise than it was. Consider the statement, "If Oswald had not shot Kennedy, someone else would have." The statement takes for granted that Oswald did shoot Kennedy and claims how things would have unfolded had he not. We may have no reason to believe this counterfactual is true, but it is easy enough to understand what it claims.
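In Pearl's framework such rung-three questions are answered by a three-step recipe on a structural causal model: abduction (infer the individual's unobserved background factors from what actually happened), action (surgically set the variable to its counterfactual value), and prediction (re-run the model). Here is a deliberately tiny sketch with an invented structural equation, not an example from the book, just to show the mechanics.

```python
# Toy structural causal model (my own, for illustration): Y is generated
# by the structural equation y = 2*x + u, where u is an unobserved,
# individual-specific background factor.

def f_y(x, u):
    """Structural equation for Y (assumed known)."""
    return 2 * x + u

# Step 1, abduction: from what actually happened to this individual
# (x = 1, y = 5), solve the structural equation for their u.
x_obs, y_obs = 1, 5
u = y_obs - 2 * x_obs        # u = 3 for this individual

# Step 2, action: override the mechanism for X, setting it to the
# counterfactual value ("had x been 2 instead...").
x_cf = 2

# Step 3, prediction: re-run the model with the SAME background u.
y_cf = f_y(x_cf, u)
print(f"factually y={y_obs}; had x been {x_cf}, y would have been {y_cf}")
```

The crucial point is that the answer is individual-level: it reuses the u inferred from this person's actual history, which is information that neither correlations (rung one) nor intervention experiments (rung two) provide on their own.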