What is behind the data?

The Translation Bureau is a translation team covering technology, business, the workplace, lifestyle and other fields, with a focus on new technologies, ideas and trends from abroad.

Editor’s note: In everyday data science work, people rarely think about the human side of the data, even though the data is ultimately about people. To be a well-rounded data scientist, you should therefore read not only technical articles but also works that study how people make choices and how data can be used to improve those choices. This article covers six books that shape a “worldview” rather than a “methodology” and can help you use data to better serve the real world. It is translated from “A Non-Technical Reading List for Data Science”, originally published on Medium.com.

Today’s data scientists are expected to learn a wide range of modeling techniques and algorithms. Take linear regression: many people use it without knowing why. As a result, many newcomers are ready to deploy a model at any moment without understanding the real-world situation it describes, and technical communities focus not on curing this blind use of technology but on debating which tool to choose (R or Python).

Contrary to what some data scientists may wish to believe, the world can never be reduced to numbers and algorithms. Ultimately, decisions are made by humans, and becoming a professional data scientist means understanding both people and data.

Consider the following case:

When the American company Opower, which analyzes energy data and gives users energy-saving advice, wanted to help people save money, it provided customers with plenty of data on their electricity use and costs. However, cold numbers alone were not enough to change behavior. To achieve that, Opower needed knowledge of psychology and behavioral science. For example, research showed that if a household’s energy bill displays a smiley face when its usage is below the neighborhood average and a frowning face when it is above, people use less energy, which also serves the efficiency goals of the energy companies. A municipal utility used this approach to send bills to randomly selected customers comparing their energy use with that of similar households and of the most efficient neighbors, along with advice on saving energy. By the second year, statistics showed that high-consumption households had cut their electricity use by 3%!

This simple intervention, showing people on their bills how their electricity use compares with their neighbors’, has saved millions of dollars and prevented millions of pounds of carbon dioxide emissions. For a data scientist this may be startling: simply changing how results are presented changes people’s behavior! But it came as no surprise to Opower’s chief scientific officer, Robert Cialdini. After all, he was a psychology professor who has written books on human behavior and studied consumer psychology extensively. Opower’s success sends a clear message: even if you can obtain any data you want, you still need to understand how people’s behavior interacts with that data, and sometimes the latter is more rewarding than the former.

Book list: six good books to get you started in data science

Compared with a smiley face, a histogram may not be the most effective visualization.

As data scientists, shaped by our daily work and our education, we rarely consider the human meaning of our data. Taking a step back, we seldom think about the social impact of our work. That is why, to be a well-rounded data scientist, you should read not only technical articles but also works that study how people make choices and how data can be used to improve those choices. Personally, I consider this crucial.

To this end, this article focuses on six books that shape a “worldview” rather than a “methodology”. They contain no mathematics or algorithmic explanations, but they teach us how to use data to better serve the real world.

No. 1 The Signal and the Noise

Author: Nate Silver


This is probably one of the most popular statistics books in the world. “Signal and noise” is a common metaphor in data science: the “signal” is the information we want and need, while the “noise” is everything else, usually irrelevant information that obscures the real signal or misleads us as we search for it.

In the era of big data, we are flooded with information. At the same time, as the pace of life accelerates, we are making more predictions, and making them faster.

The truth, however, is that many real-world predictions fail, at great social cost. This book examines predictions ranging from hurricanes to earthquakes, from the economy to the stock market, and from the NBA to political elections, and sets out to answer one question: how can we filter the real signal out of massive, cluttered data and discard the noise, so as to make accurate predictions? Silver argues that the future holds no precise answers; only the basic principles practiced by great forecasters can help us improve society.

Anyone can benefit from the simple advice the book offers:

  • Think like a fox (rather than a hedgehog): Inspired by a line of ancient Greek poetry, Isaiah Berlin wrote his famous essay “The Hedgehog and the Fox”, in which he distinguishes two types of thinkers: the hedgehog, who views the world through one big unifying idea, and the fox, who accepts that experience cannot be reduced to a single view and rejects any one overarching perspective. If you have only one big idea, you tend to look for evidence that confirms it and ignore anything that contradicts it. If you have many small ideas, you care more about what is right than about what supports your current beliefs, and you can abandon any idea once the evidence no longer supports it. These two ways of thinking also help explain why the people most confident in their predictions (such as stock-market pundits) tend to be wrong more often.

  • Make many predictions and get fast feedback: We estimate frequently occurring events more accurately, mainly because of feedback and improvement cycles. Each day’s weather forecast errors are fed back into the models to make tomorrow’s forecast better (one reason weather forecasting has improved dramatically over the past few decades). Rare situations are where we face our hardest choices, and in those cases using as much data as possible is key.

  • Use as many different data sources as possible: Every data provider has its own biases, but by aggregating different estimates you can average out their errors, an approach reflected on Silver’s FiveThirtyEight website. This wisdom-of-the-crowd method means drawing on sources that disagree with your point of view, rather than relying solely on “experts” in a particular field.

  • Include uncertainty intervals, and don’t be afraid to update your opinion when the evidence changes: The biggest mistake people make in forecasting is giving only a single number. That may be the answer the public wants to hear, but the world is never strictly black or white; it exists in shades of gray, and we have a responsibility to show this in our predictions. Expressing uncertainty can look like hedging (saying Hillary has a 70% chance of winning can seem designed never to be wrong, whatever the outcome), but in reality it is far more realistic than a single “yes/no”. People also tend to see changing one’s mind as a weakness, yet in data science, and in forming a worldview, revising our basic beliefs is almost inevitable, and the benefits outweigh the costs.
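The advice to aggregate many sources can be illustrated with a minimal Python sketch. The forecasts are hypothetical numbers, not data from the book or from FiveThirtyEight, and `compare_errors` is a made-up helper. It compares the average error of the individual forecasts with the error of their average. By the triangle inequality, the averaged forecast can never have a larger absolute error than the individual forecasts do on average, and it gains the most when the sources err in different directions.

```python
from statistics import mean


def compare_errors(forecasts, actual):
    """Return (mean absolute error of individual forecasts, absolute error of their average)."""
    individual = mean(abs(f - actual) for f in forecasts)
    aggregate = abs(mean(forecasts) - actual)
    return individual, aggregate


# Hypothetical estimates from five sources of a quantity whose true value is 50
forecasts = [44, 53, 58, 47, 49]
individual_error, aggregate_error = compare_errors(forecasts, actual=50)
print(f"average individual error: {individual_error:.1f}")  # 4.2
print(f"error of the averaged forecast: {aggregate_error:.1f}")  # 0.2
```

If every source shares the same bias (say, all five forecasts run 10 too high), averaging cannot remove it, which is why drawing on sources with different viewpoints matters.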

Notable quotes from the author:

Our predictions about the world will never be completely correct, but this should not stop us from relying on well-proven principles to make better predictions and thereby reduce our errors.

Good forecasters think about problems probabilistically. They are humble and diligent. They can clearly distinguish what is predictable from what is not. They attend to every small detail that brings them a step closer to the truth, and they can tell the noise from the signal.

Everything from the health of the global economy to the fight against terrorism depends on the quality of our predictions, and The Signal and the Noise can give you the answers you are looking for. Two books in the same vein are Philip Tetlock’s Superforecasting and Expert Political Judgment.

No. 2 Weapons of Math Destruction

Author: Cathy O’Neil


Data scientist Cathy O’Neil argues that we should be wary of the mathematical models that increasingly permeate our lives, because they may threaten the very fabric of our society.

By following people through the stages of their lives, O’Neil exposes in her book how these “weapons of math destruction” shape the future of individuals and of society. These weapons grade teachers and students, screen résumés, approve or deny loans, evaluate employees, and even monitor our health. O’Neil therefore calls on modelers to take responsibility for their algorithms and on policymakers to regulate how models are used. She also points out that, ultimately, controlling these models is up to us. This important book forces us to confront the problem and seek the truth.

We live in an age of algorithms, and their influence on our lives keeps growing. Where do we go to school? Should we take out a loan to buy a car? How much should we pay for health insurance? These questions are increasingly answered not by people but by big-data models. In theory, this should make society fairer, because everyone is judged by the same metrics and there should be no prejudice.

But as O’Neil’s book reveals, the opposite is true. Today’s widely used algorithmic models, even when riddled with errors, go unchecked and unquestioned. The problem of “reinforced discrimination” deserves particular reflection: if a poor student is denied a loan because the lending model deems him too risky (based solely on the neighborhood he lives in), he is deprived of the quality education that could help him escape poverty, and falls into a vicious circle.

The mathematical models we use today are often opaque, unregulated, contested, and even wrong. Worst of all, mathematical models and big-data algorithms can amplify prejudice and injustice. O’Neil’s book may seem pessimistic about machine learning models, but I prefer to see it as a necessary critique: because the enthusiasm around machine learning is so great, we need to be willing to step back and ask whether these tools really improve people’s lives, and how we as a society should accept them.

After reading this book, you will find that the weaponization of big data is everywhere. Seeing what the world’s most powerful data systems are doing may make you anxious, but we cannot solve problems we are not aware of.

In the author’s view, these big-data models are black boxes that combine scale, damage, and secrecy. She cites many cases from the United States in which big data and algorithms change people’s lives, and examines in particular how such algorithms shape everyday life in American cities.

The author argues that data, like a firearm, has no values of its own and is neutral; but data generated by human behavior is inevitably biased, and when algorithms built on that data act back on human behavior, they produce still more injustice.

O’Neil points out that once a predictive model is deployed, law enforcement is stepped up in the places it flags, and the new data collected there further “proves” the need for even more enforcement. Put bluntly, the more incidents a place already has on record, the more attention the algorithm gives it, eventually forming a distorted and even harmful feedback loop. The same concern applies to Facebook’s recent role in the US general election, and it lies at the heart of the heated debate among many Chinese experts and scholars over the recommendation model behind Toutiao (“Today’s Headlines”).
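The feedback loop O’Neil describes can be made concrete with a deliberately crude toy model in Python. This is an illustrative assumption, not a model from the book: each day a single patrol goes to whichever district has the most recorded incidents, and incidents are recorded only where the patrol is. The rates, starting counts, and the name `simulate_patrol_feedback` are all hypothetical.

```python
def simulate_patrol_feedback(true_rates, recorded, days):
    """Toy model: one patrol per day goes wherever the most incidents are on record.

    Incidents are recorded only where the patrol is, so the data reflect where
    we looked rather than where incidents actually happen.
    """
    recorded = list(recorded)  # copy so the caller's list is untouched
    for _ in range(days):
        target = max(range(len(recorded)), key=lambda d: recorded[d])
        # add the expected number of incidents observed in the patrolled district
        recorded[target] += true_rates[target]
    return recorded


# Hypothetical: district B truly has MORE incidents, but A starts one record ahead
counts = simulate_patrol_feedback(true_rates=[0.3, 0.5], recorded=[11, 10], days=100)
print(counts)
```

District A ends up with about 41 recorded incidents while B stays frozen at 10, even though B’s true rate is higher: the model never looks at B, so the data never give it a reason to.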

However, a machine learning algorithm is just a tool, and like any tool it cuts both ways: used properly, it can benefit humanity. Fortunately, we are still in the early days, which means we can reshape these tools to make sure they make objective decisions and produce the best outcomes for as many people as possible. The choices we make in this area now will shape data science for decades to come.

Although data science is a young field, it has already had a huge impact, for good and for ill, on the lives of millions of people. As pioneers in this new field, those of us working in it today have an obligation to ensure that our algorithms do not become weapons of math destruction.

If you want to work in data science, this book is a must-read.

No. 3 Algorithms to Live By: The Computer Science of Human Decisions

Author: Brian Christian & Tom Griffiths


How Not to Be Wrong: The Power of Mathematical Thinking

Author: Jordan Ellenberg

Computer science and statistics (like every other academic discipline) share a common problem in the classroom: they are taught abstractly and feel boring. Only when they are applied to real-world problems do they become interesting enough to explore. Both of these books turn dry subjects into entertaining, informative accounts of how to use algorithms, statistics, and math in everyday life.

An algorithm is a precise and complete description of how to solve a problem: a series of clear instructions. Algorithms give us a systematic way to describe problem-solving strategies. If we think about a problem and clearly understand which algorithm it corresponds to, we can solve it more easily, or solve it better.

For example, in Algorithms to Live By, the authors show how to use the exploration/exploitation trade-off and optimal stopping to decide how long we should keep searching for a spouse (or a new employee, a restaurant, and so on). Likewise, we can use sorting algorithms to organize our belongings so that we can quickly retrieve what we need. You may have come across these ideas before and may even be able to code them, but you have probably never used them to optimize your own life.
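The optimal-stopping idea can be sketched in Python using the classic “secretary problem” setup, which is my own modeling assumption: candidates arrive in random order, each can be ranked only against those already seen, and a rejected candidate cannot be recalled. The well-known strategy is to look at roughly the first 37% (a fraction of 1/e) without committing, then leap at the first candidate better than everyone seen so far. The function names and simulation parameters are illustrative.

```python
import math
import random


def look_then_leap(scores, look_fraction=1 / math.e):
    """Return the index chosen by the look-then-leap (37%) rule."""
    n = len(scores)
    cutoff = int(n * look_fraction)  # candidates observed without committing
    best_seen = max(scores[:cutoff], default=float("-inf"))
    for i in range(cutoff, n):
        if scores[i] > best_seen:
            return i  # first candidate who beats everyone in the look phase
    return n - 1  # nobody beat the benchmark: settle for the last candidate


def success_rate(n=100, trials=20_000, seed=0):
    """Estimate how often the rule picks the single best of n candidates."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        scores = rng.sample(range(n), n)  # a random ordering of distinct ranks
        wins += scores[look_then_leap(scores)] == n - 1
    return wins / trials


print(f"picked the best candidate in {success_rate():.1%} of simulated searches")
```

The estimate should land near 37%, far better than the 1% you would get by picking one of 100 candidates at random.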

Ellenberg’s book, in turn, focuses on the appeal of mathematics and on how to use mathematical principles to solve problems in everyday life. The author believes that mathematics helps us better understand the structure and nature of the world, and that it belongs in every thoughtful person’s toolbox for solving problems and avoiding false and mistaken reasoning. The book avoids complex terminology and uses real-life anecdotes, basic equations, and simple diagrams to give readers a math class that assumes no background.

Through stories, Ellenberg shows us the uses and misuses of statistical concepts such as linear regression, inference, Bayesian inference, and probability, to help us make better decisions. Applying the laws of probability, for example, shows that buying lottery tickets is a losing proposition except in the rare cases where the expected return is actually positive.
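The expected-value reasoning behind that lottery claim fits in a few lines of Python. The ticket price, odds, and payouts below are invented for illustration (they are not taken from Ellenberg’s book or any real lottery), and the “roll-down” case simply assumes that undistributed jackpot money has been pushed into a smaller prize tier, the kind of rare situation in which a ticket’s expected return can turn positive.

```python
from fractions import Fraction


def expected_value(ticket_price, prize_table):
    """Expected net return of one ticket; prize_table holds (probability, payout) pairs."""
    return sum(p * payout for p, payout in prize_table) - ticket_price


# Hypothetical ordinary draw: a $2 ticket with two prize tiers
ordinary = [(Fraction(1, 10_000_000), 5_000_000), (Fraction(1, 1_000), 100)]
# Hypothetical roll-down draw: unclaimed jackpot money inflates the small prize
roll_down = [(Fraction(1, 1_000), 2_500)]

print(expected_value(2, ordinary))   # -7/5: lose $1.40 per ticket on average
print(expected_value(2, roll_down))  # 1/2: gain $0.50 per ticket on average
```

Using `Fraction` keeps the arithmetic exact, so the sign of the result reflects the odds rather than floating-point rounding.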

The central idea of Ellenberg’s book is that mathematical thinking is “the extension of common sense by other means.” In many situations, mostly in the distant past, our intuition served us well, but in the modern world our first reactions are often completely wrong. In such cases, rather than rely on intuition, we can use probability and statistics to make the best decisions.

Both books strike the right level of rigor, with some formulas and logic and many practical examples. In them I found many data science concepts that I had never really mastered in class, and I read them over and over, enjoying the pleasure of learning. Of course, math, statistics, and computer science are useful only if they help you live better, and both books show uses of these disciplines you may never have considered.

No. 4 Thinking, Fast and Slow

Author: Daniel Kahneman


Humans are irrational, and we routinely make poor decisions in all kinds of situations. However, once we understand why we act this way instead of taking the best action, we can start changing our behavior to get better results; this is the core of Kahneman’s decades of experimental work. His research opened new dimensions in cognitive psychology, cognitive science, the study of reason and happiness, and behavioral economics, and this book is his masterpiece.

In Thinking, Fast and Slow, Kahneman describes more than thirty departures from rationality, grouped under heuristics (including the availability bias, anchoring, intuitive judgment, and the halo effect), overconfidence (including hindsight bias, the illusion of validity, and intuition versus formulas), and prospect theory (including decisions under risk, loss aversion, the endowment effect, and the fourfold pattern). Together they act like a three-dimensional mirror of thought, showing the brain’s reasoning process and its stubborn biases from every angle and helping you understand how you yourself make decisions.

Kahneman won the 2002 Nobel Prize in economics. Together with his research partner Amos Tversky (known for his research on decision-making), Richard Thaler (winner of the 2017 Nobel Prize in economics), and others, he brought behavioral economics into its heyday, taking this once-niche branch of economics to a much wider audience. Behavioral economics treats people as irrational decision makers rather than as rational agents who maximize utility, which is closer to the truth. This has led to dramatic changes in how choices are designed across many areas of life: not just economics, but also medicine, sports, business practice, energy efficiency, and retirement savings. We can also apply many of the book’s findings to data science, for example in how we present research results.

The foundation of Thinking, Fast and Slow is the author’s framework of two systems of human thought: System 1 and System 2. System 1 is the automatic, involuntary mode of thinking; System 2 is the deliberate, consciously controlled mode. Thinking or judging with System 1 is very fast and relies almost entirely on intuition and experience, so it is usually what forms our first impressions. But System 1 sometimes fails to reach a conclusion, or reaches the wrong one. In those cases we call on System 2 for slower, more effortful thinking to supplement or correct System 1.

This does not mean that System 1 is emotional and System 2 is rational. In fact, System 2 is often influenced by System 1, sometimes for the better and sometimes not. Moreover, System 2 is lazy and often skips checking System 1’s work, so the errors System 1 makes go uncorrected.

This division leaves our intuition full of mistakes. Experiments by Kahneman and other scientists show that complex, ornate sentences make information seem rich and convincing even when they say nothing, and that holding a pencil in the teeth to force a smile measurably improves participants’ mood, because the brain cannot tell whether you are really happy or just have a stick pressing against the corners of your mouth. The unreliability of our cognitive systems, together with the frequent incompleteness of our information, makes us more likely to rely on simple shortcuts to accomplish cognitive tasks in daily life.

These are Kahneman’s studies in traditional psychology, and it was this research that made him realize that economics’ assumption of the rational agent is flawed. When making choices, people judge utility by comparing future prospects with what they currently have, and the brain’s two modes of operation both shape the final judgment.

This book is essential for understanding how people make decisions and what we as data scientists can do to help them make better choices.

The book also draws conclusions about our two selves: the experiencing self and the remembering self. The experiencing self is what we feel in the moment during an event; the remembering self is how we perceive the event afterward, and it carries far more weight. The remembering self evaluates an experience based on its peak and its end, which has profound implications for health, life satisfaction, and how we get ourselves through unpleasant tasks. Because we remember an event for much longer than we experience it, we should strive to maximize the future satisfaction of our remembering self.
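A crude version of the peak-end rule can be written in a few lines of Python. Scoring a memory as the average of its worst moment and its final moment is a common simplification, not Kahneman’s exact model, and the discomfort ratings are hypothetical. The example shows the counterintuitive consequence: adding a milder final stretch increases total discomfort yet makes the remembered experience better.

```python
def peak_end_score(discomfort):
    """Simplified peak-end rule: remembered discomfort is the mean of the worst and the final moment."""
    return (max(discomfort) + discomfort[-1]) / 2


short_episode = [2, 5, 8]     # hypothetical discomfort ratings, one per minute
long_episode = [2, 5, 8, 4]   # the same episode plus an extra, milder final minute

print(sum(short_episode), peak_end_score(short_episode))  # 15 8.0
print(sum(long_episode), peak_end_score(long_episode))    # 19 6.0
```

The longer episode contains more total discomfort (19 versus 15) but is remembered as less unpleasant (6.0 versus 8.0), because memory largely ignores duration.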

If you want to understand practical human psychology, rather than the idealized version taught in traditional classrooms, this book is the best place to start. Strictly speaking, Kahneman is not primarily a popular-science writer, but the landmark research he did with his colleagues has profoundly changed our understanding of ourselves. By contrast, many recent studies focus on the neural basis of social behavior, exploring it with methods such as magnetic resonance brain imaging. Kahneman’s work focuses on human behavior and carefully avoids speculating too much about underlying mechanisms, so by the standards of today’s psychology it may seem unfashionable. But his experiments are full of ingenuity and insight, and their place in the history of science is no passing fad.

No. 5 The Black Swan: The Impact of the Highly Improbable

Author: Nassim Nicholas Taleb


Taleb earns his place on this list as the one outsider. A former quantitative trader, he made a fortune during the market downturns of 2000 and 2007, then turned to research and writing, and his work drew worldwide attention along with plenty of praise and criticism. Taleb came to believe that contemporary ways of thinking fail badly, especially in times of uncertainty. In The Black Swan, he argues that we turn a blind eye to the randomness that governs human affairs, and so when things do not go as expected, we are devastated. The Black Swan was first published in 2007, and the 2008 financial crisis and the events of 2016 have made it all the more persuasive, overturning a whole set of conventional mindsets.

Of course, given this central premise, a natural question arises: since improbable events rarely happen, shouldn’t we stop worrying about them? The key point is that although each improbable event is unlikely on its own, it is almost certain that many unexpected events will happen in your lifetime, or even within a single year. The probability of an economic collapse in any given year is small, but over a decade or so it becomes almost certain that the world will experience a recession.
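The arithmetic behind “unlikely each year, almost certain eventually” is short. Assuming, for simplicity, that events are independent and have fixed probabilities, the chance that at least one occurs is one minus the chance that none does. The probabilities below are hypothetical, chosen only to show the effect.

```python
def prob_at_least_one(p_per_year, years):
    """Chance that an event with a fixed, independent yearly probability occurs at least once."""
    return 1 - (1 - p_per_year) ** years


def prob_any(probabilities):
    """Chance that at least one of several independent rare events happens."""
    none_happen = 1.0
    for p in probabilities:
        none_happen *= 1 - p
    return 1 - none_happen


# Hypothetical: a 10% chance of a recession in any given year
print(f"{prob_at_least_one(0.10, 10):.0%}")  # 65% over a decade
print(f"{prob_at_least_one(0.10, 30):.0%}")  # 96% over three decades
# Hypothetical: 50 unrelated "1-in-50" surprises that could each occur this year
print(f"{prob_any([0.02] * 50):.0%}")  # 64%
```

Independence is the simplifying assumption here; in real economies shocks are often correlated, which the simple product does not capture.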

We should expect world-changing events to occur fairly often, and we should not defer to experts who are bound by past experience. Anyone who invests in the stock market knows that past performance does not predict future results, and we should remember this lesson in our data science models, which extrapolate from past data. Moreover, our world is not normally distributed but long-tailed: at the extremes sit events such as the Great Recession, or individuals such as Bill Gates whose wealth dwarfs everyone else’s. When such extreme events occur, no one is ready for them, because they far exceed the scale of anything that came before.
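The difference between a normal world and a long-tailed one can be shown deterministically in Python. Instead of random draws, the sketch evaluates each distribution’s inverse CDF at evenly spaced probabilities, then asks what share of the total comes from the single largest value. The parameters are hypothetical: a normal distribution with mean 100 and standard deviation 15, and a Pareto distribution with shape 1.1, a heavy-tailed family often used as a stylized model of wealth.

```python
from statistics import NormalDist


def top_share(values):
    """Fraction of the total contributed by the single largest value."""
    return max(values) / sum(values)


def quantile_sample(inverse_cdf, n):
    """Deterministic stand-in for a sample: the inverse CDF at evenly spaced probabilities."""
    return [inverse_cdf((i + 0.5) / n) for i in range(n)]


n = 10_000
normal = quantile_sample(NormalDist(mu=100, sigma=15).inv_cdf, n)
alpha = 1.1  # hypothetical Pareto shape; smaller values mean a heavier tail
pareto = quantile_sample(lambda u: (1 - u) ** (-1 / alpha), n)

print(f"normal: largest value is {top_share(normal):.4%} of the total")
print(f"pareto: largest value is {top_share(pareto):.1%} of the total")
```

In the normal case no single value matters much; in the Pareto case one observation accounts for a substantial slice of the total, which is the sense in which extremes dominate a long-tailed world.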

The Black Swan matters to data scientists because it shows that any model based solely on past performance can go wrong, often with catastrophic consequences. All machine learning models are built on past data, which means we should not trust them too much. Every model (including Taleb’s own) is flawed, and to stay as close to reality as possible we should make sure we have systems in place to cope with their inevitable failures.

It is worth mentioning that Taleb is known not only for his novel ideas but also for his extremely combative personality. He is willing to pick fights with almost anyone, frequently attacking scholars such as Steven Pinker (the American experimental psychologist) and public figures such as Nate Silver (the data analyst who correctly predicted all 50 states in the 2012 US presidential election). In our age of severe distortion, his ideas help us see some things coming, but his manner can be off-putting. Still, I think this book is worth reading because it offers a non-mainstream perspective.

(This book is the second volume of Taleb’s five-part Incerto, in which he lays out his complete philosophy. The Black Swan deals with events that are extremely unlikely to happen, while Antifragile, the fourth book in the series, discusses how not only to withstand shocks but to benefit from them. I think The Black Swan is the one closest to data science.)

Conclusion

After staring at a computer screen all day, I can’t think of a better way to end the day than reading a book (print, e-book, or audiobook). Remember that data science requires constantly adding tools to your toolbox: even when we want to relax and take our minds off work, we should not stop learning.

All of the books recommended above reward careful reading, and they will teach us a great deal about data science and about life. By showing what really drives human thinking, they are a useful complement to more technical works. Understanding how people actually think, rather than how idealized models assume they think, is as important as implementing those models if we want our data-driven decisions to be effective.

Translator: Xiaozhuo

Original link: https://towardsdatascience.com/a-non-technical-reading-list-for-data-science-d72451429a70