When you sign up for WeChat, you are asked to read an 8,300-character privacy agreement, and you must check “I have read and agree to the above terms” to proceed.

But we all know that you haven’t read it and don’t know what terms you agreed to.

In theory, once you agree, your address, phone number, chat history, contacts, photos, text messages, and call history can all be collected and analyzed by the app.

What do they do with your private data?

First, we need to know what information your app can collect.

Android’s developer documentation divides the permissions an app can request into normal permissions and dangerous permissions.

There are 140 normal permissions, which are granted without asking the user, such as the permission to control the vibration motor.

The more sensitive, dangerous permissions fall into 9 groups, covering things like recording audio, reading text messages, and accessing photos.

These grants are basically one-time. For example, if you turn on the microphone permission so that you can send voice messages through an app, that app can in theory record audio at any point while you are using it.
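To make the two permission classes concrete, here is a toy model in Python. It is an illustration under stated assumptions, not Android’s actual implementation: the permission strings are real Android constants, but the `App` class, its `use` method, and the tiny `NORMAL`/`DANGEROUS` sets are hypothetical and cover only a handful of examples. It models the persistent grant described above; newer Android versions also offer an “Only this time” option for the microphone, camera, and location, which this sketch ignores.

```python
from dataclasses import dataclass, field

# Hypothetical, heavily reduced classification for illustration only.
NORMAL = {"android.permission.VIBRATE", "android.permission.INTERNET"}
DANGEROUS = {"android.permission.RECORD_AUDIO",
             "android.permission.READ_SMS",
             "android.permission.CAMERA"}


@dataclass
class App:
    name: str
    granted: set = field(default_factory=set)

    def use(self, permission: str, user_agrees: bool = False) -> bool:
        """Return True if this toy app may use `permission` right now."""
        if permission in NORMAL:
            # Normal permissions are granted without asking the user.
            self.granted.add(permission)
            return True
        if permission in DANGEROUS:
            if permission in self.granted:
                # Granted once already: in this model there is no new prompt.
                return True
            if user_agrees:
                self.granted.add(permission)
                return True
            return False
        raise ValueError(f"permission not in this toy model: {permission}")


app = App("chat")
print(app.use("android.permission.VIBRATE"))                        # True
print(app.use("android.permission.RECORD_AUDIO"))                   # False
print(app.use("android.permission.RECORD_AUDIO", user_agrees=True))  # True
print(app.use("android.permission.RECORD_AUDIO"))                   # True
```

The last call shows the point of the paragraph: after a single “yes”, later microphone use needs no further consent in this model.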

Here it is important to distinguish between the permissions an app holds and the data it actually collects.

In their privacy policies, most apps promise that they will not eavesdrop on you or peek at your data.

But while you are actively using the microphone, it is harder to say. For example, according to a July 2019 report in the Guardian, Apple collected about 1% of users’ Siri recordings, each a few seconds long, to improve Siri’s recognition accuracy.

Apart from such product-improvement cases, however, there is currently no evidence that apps secretly upload and analyze your recordings or photos.

The more common use of your data is to build user profiles for analysts and advertisers. For example, WeChat knows which emoji are most popular among people born after 2000 and among people born in the 1960s, and that people born in the 1990s take public transport 25 times a month.

Video creators like us can also see the gender ratio, age distribution, devices used, and geographical distribution of our viewers in the creator dashboard.

Internet companies can do this because you agreed to all of it, even though you probably never read the terms.

Their bottom line, however, is that no one should be able to work backward from the aggregated data to an individual’s personal information.

For example, suppose we have 100 followers. In the dashboard, we see that 50% are from Guangdong, 40% from Jiangsu, and 10% from Hubei.

Now suppose one follower, call him Mei Yangyang, unfollows us. The geographical shares of the remaining 99 followers become 50.5%, 40.4%, and 9.1%.

In this way, that follower’s privacy is exposed: we can easily infer that he is from Hubei.
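This “differencing attack” takes only a few lines. The sketch below assumes, as in the example above, that the dashboard publishes exact percentages and the follower count before and after; the function names `region_shares` and `infer_departed_region` are hypothetical helpers written for this illustration.

```python
from collections import Counter


def region_shares(regions):
    """Percentage of followers per region, as a dashboard might show it."""
    counts = Counter(regions)
    total = len(regions)
    return {region: 100 * n / total for region, n in counts.items()}


def infer_departed_region(before, after, n_before):
    """Recover the region of the single follower who left, using only the
    two published percentage tables and the follower counts."""
    n_after = n_before - 1
    for region, pct in before.items():
        count_before = round(pct * n_before / 100)
        count_after = round(after.get(region, 0.0) * n_after / 100)
        if count_after == count_before - 1:
            return region
    return None


followers = ["Guangdong"] * 50 + ["Jiangsu"] * 40 + ["Hubei"] * 10
before = region_shares(followers)
followers.remove("Hubei")  # one follower from Hubei unfollows
after = region_shares(followers)

print({r: round(p, 1) for r, p in after.items()})
# {'Guangdong': 50.5, 'Jiangsu': 40.4, 'Hubei': 9.1}
print(infer_departed_region(before, after, 100))
# Hubei
```

Neither table reveals anyone on its own; it is the comparison of two exact aggregates that gives the individual away.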

A common solution to this problem is differential privacy.

The core idea of differential privacy is to add noise to the data so that the output is similar whether or not any single individual, such as that one follower, is included. Whether we have 100 or 99 followers, the reported share of viewers from Hubei might come out as 9%, 10%, or 11%.
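“Similar results” has a standard formal meaning. A randomized mechanism M satisfies ε-differential privacy if, for any two neighboring data sets D and D′ that differ in one individual (here, our 100 followers versus the 99 without the one who left), every set of possible outputs S is almost equally likely, where ε > 0 is the privacy budget:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S]
\qquad \text{for all neighboring } D, D' \text{ and all output sets } S
```

The smaller ε is, the closer e^ε is to 1, and the less an observer can tell from the output whether that individual is in the data.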

On the other hand, the noise cannot be arbitrary, or the data would lose their meaning.

The most common mechanism adds Laplace noise. The Laplace distribution is a common continuous probability distribution with a location parameter μ and a scale parameter b. With μ = 0, it forms a sharp peak at zero that falls off exponentially on both sides. The larger b is, the flatter the distribution, and the larger the fluctuations it adds to the data.
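For reference, the Laplace density with location μ and scale b is shown below, together with two facts that explain the “flatter” behavior: the peak height 1/(2b) drops as b grows, and both the typical size of the noise and its variance grow with b.

```latex
p(x \mid \mu, b) = \frac{1}{2b}\exp\!\left(-\frac{|x-\mu|}{b}\right),
\qquad \mathbb{E}\,|X-\mu| = b,
\qquad \operatorname{Var}(X) = 2b^{2}
```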

The amount of Laplace noise added to the original data depends on two parameters: the sensitivity Δf and the privacy budget ε, which is set in advance.

In simple terms, Δf is the largest change that adding or removing a single individual can cause in the result, and the Laplace scale parameter is b = Δf / ε.
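A minimal sketch of the Laplace mechanism using only Python’s standard library is shown below. Since `random` has no Laplace sampler, it uses the known fact that the difference of two independent exponential draws with mean b is Laplace(0, b). The query here is a count (followers from Hubei), for which one person changes the answer by at most 1, so Δf = 1; the value ε = 0.5 is an arbitrary assumption for illustration, and the helper names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Return true_value plus Laplace noise with scale b = sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    b = sensitivity / epsilon
    return true_value + laplace_noise(b, rng)


rng = random.Random(42)  # fixed seed so the run is reproducible
EPSILON = 0.5            # assumed privacy budget, chosen for illustration
SENSITIVITY = 1          # one follower changes a regional count by at most 1

# Noisy count of Hubei followers with and without the follower who left.
for label, hubei_count in [("100 followers", 10), ("99 followers", 9)]:
    noisy = laplace_mechanism(hubei_count, SENSITIVITY, EPSILON, rng)
    print(f"{label}: noisy Hubei count = {noisy:.1f}")
```

With b = 1 / 0.5 = 2, the noise is typically a couple of followers in either direction, which is enough to blur whether the true count is 10 or 9. Production systems usually rely on vetted libraries, since naive floating-point sampling has known subtleties.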

So the more a single individual can change the result, the larger the sensitivity Δf, and therefore the larger b and the flatter the distribution. The added noise may then be very large, which is what lets the two data sets produce similar outputs.

If one individual can change the result only slightly, Δf is small, the noise is likely to be close to 0, and the data remain useful.
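The trade-off can be quantified. For Laplace(0, b) noise, |noise| is exponentially distributed with mean b, so the probability that the noise stays within ±t is 1 − exp(−t/b). The sketch below compares a few sensitivities at an assumed ε = 1; the Δf values are illustrative, and `prob_noise_within` is a hypothetical helper.

```python
import math


def prob_noise_within(t, sensitivity, epsilon):
    """P(|noise| <= t) for Laplace noise with scale b = sensitivity / epsilon.
    For Laplace(0, b), |noise| is exponential with mean b, so this is 1 - exp(-t / b)."""
    b = sensitivity / epsilon
    return 1.0 - math.exp(-t / b)


EPSILON = 1.0  # assumed privacy budget
for sensitivity in (1, 10, 100):
    p = prob_noise_within(1.0, sensitivity, EPSILON)
    print(f"Δf = {sensitivity:>3}: b = {sensitivity / EPSILON:5.1f}, "
          f"P(|noise| <= 1) = {p:.3f}")
# Δf =   1: b =   1.0, P(|noise| <= 1) = 0.632
# Δf =  10: b =  10.0, P(|noise| <= 1) = 0.095
# Δf = 100: b = 100.0, P(|noise| <= 1) = 0.010
```

A counting query (Δf = 1) keeps the noise small most of the time, while a query that one person can swing by 100 needs noise about a hundred times larger for the same ε.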

Beyond inference from statistics, there is also the risk of outright data leaks, and internal leaks are more common than external attacks.

Internet companies usually address this through certification, the most important being ISO/IEC 27001. As the most widely recognized information security standard worldwide, 27001 covers 14 domains of information security, including human resource security and physical security.

Obtaining this certification is time-consuming and labor-intensive. Auditors review nearly 100 items through interviews, sampling, and on-site observation: for example, they check employees’ desks for sensitive information, whether electronic documents are marked with a confidentiality level, and whether employees have signed confidentiality agreements.

In 2014, ISO published ISO/IEC 27018, an extension of 27001 with stricter requirements for protecting personal data in public clouds. To meet it, services like Baidu Netdisk need more sophisticated privacy protection systems, such as a “data protection authority system” and a “data desensitization processing algorithm”.

This also means an app has to spend more money to protect your privacy. In China, only a few products from the BAT giants (Baidu, Alibaba, and Tencent), such as Baidu Cloud, Baidu Netdisk, and Alibaba Cloud, have passed both certifications.

With these certifications, we can say that our data is relatively secure.

But this does not mean your data is absolutely secure. Every privacy agreement today includes exceptions.

For example, Baidu Cloud’s privacy agreement lists 11 cases in which Baidu may collect and use your information without asking for your consent.

For example, pornography you upload may be deemed related to public safety or a criminal investigation, because disseminating obscene materials is a crime under Article 364 of the Criminal Law; in that case, your privacy no longer protects it.

In one 2018 judgment, a Mr. Chen from Anhui had uploaded 189 pornographic videos to Baidu Netdisk and sold them on Taobao under the name “Mystery HD Tutorial Video Design Materials of the Way Men and Women Colleagues Get Together”. He was sentenced to 3 years and 8 months in prison.

But if you upload other files, Baidu will still try to protect them.

For example, in a 21,000-word judgment closing a case that lasted 3 years, Baidu Netdisk had refused, in order to protect user privacy, to delete copies of the TV series “Hurry That Year” stored in users’ netdisks, and the first-instance court ordered it to pay 500,000 yuan. In the second instance, however, Baidu drew a distinction between storage and dissemination: without interfering with users’ private storage, the netdisk had actively cut off the channels through which the pirated content spread. As a result, the verdict was remarkably reversed, and the rights holder of “Hurry That Year” even had to pay the 40,000-yuan case acceptance fee.