This article is from the WeChat public account All Media Pie (ID: quanmeipai). Author: Tencent Media. Title image: IC photo.

Since the Cambridge Analytica incident [1], in which the data of 50 million users was leaked two years ago, people have gained a new understanding of both the potential of big data and the threat of data manipulation by platforms, and they have grown increasingly alert.

Internet users who are used to surfing the web know, more or less, that the various apps on their phones and computers are silently collecting their usage data and personal data. Different apps can even appear to "share information with each other": a keyword that a user searches for in the browser one second may show up as a matching advertisement in another app the next.

Faced with the obvious risk of having their information left exposed, quite a few users only caught on in hindsight. It was only when an Internet tycoon publicly stated that "Chinese users are more open and less sensitive to privacy issues" and are willing to trade privacy for convenience that netizens realized how seriously they had been offended in this matter.

As the black boxes hiding the harmful side of data have been opened one by one in recent years, some people have begun to resist collectively. For example, in just two days after the Cambridge Analytica incident was exposed, Facebook's market value shrank by $50 billion, and the negative effects continue to this day.

However, as a social platform with 2.5 billion users, Facebook remains remarkably resilient. At the same time, netizens have set out on a long road of fighting data manipulation, using policies, platform tools, and their own clever heads to take the initiative in protecting their information security.

In this issue, All Media Pie takes the data management tools that Facebook has opened to users as its main object of observation, reveals some details of how the platform acquires data, and summarizes some methods for taking back the initiative over your data.

What data did the platform get?

Often, it is not that users are insensitive about privacy, but that they know very little about the platform's data collection capabilities and the ways it uses data. Given this information asymmetry, if you want to take the initiative over your own data, you must first understand the basics of what data the platform collects.

On this year's International Data Privacy Day, Facebook opened up about its technology in an effort to restore its image. Its newsroom published an article, "Starting the Decade by Giving You More Control Over Your Privacy" [2], urging users to review their account privacy and visibility settings and introducing new privacy-related features: the official worldwide launch of off-Facebook activity and notifications for third-party authorized logins.

Let's look at off-Facebook activity first. For most users, it is a stepping stone for understanding the platform's ability to collect data. This feature lets you see which third-party apps and websites have sent your usage information to Facebook. Moreover, once a device has been logged in to the account, Facebook continues to receive data about the use of third-party apps on that device, even when the user is not running Facebook.

Off-Facebook activity related pages in Facebook software

The length of these reporting lists is stunning. Taking my own Facebook account as an example, the third-party platforms in the exported data include applications that I have logged in to with my Facebook account (games, shopping apps, news apps, academic platforms) as well as applications that I have never logged in to through Facebook and that seem to have nothing to do with it, such as Keep and Meitu Xiuxiu, which are more commonly used in China.

In terms of specific numbers, even though I mainly use WeChat, there are 62 third-party platforms in my off-Facebook activity list. Kaitlyn Tiffany, a technology reporter at The Atlantic, mentioned in an article exploring the feature that her off-Facebook activity includes 1,081 third-party platforms. [3]

As for why these seemingly unrelated platforms send data to Facebook, Facebook explains it as follows: users generate data when they use third-party platforms, and when those platforms use Facebook's business analytics tools, they upload the corresponding data to Facebook's database, so that Facebook comes to hold the data. Using the phone number, email address, and other information included in the data, Facebook can match the records to the corresponding users and then use them to push advertisements to those users.
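
The matching step described above can be illustrated with a minimal Python sketch. It is not Facebook's actual implementation: the record fields (`user_id`, `email`, `phone`, `app`, `action`), the function names, and the choice to compare SHA-256 fingerprints of normalized identifiers are all assumptions made for illustration.

```python
import hashlib


def normalize_email(email: str) -> str:
    """Lower-case and trim an email address so equal addresses compare equal."""
    return email.strip().lower()


def normalize_phone(phone: str) -> str:
    """Keep only the digits of a phone number."""
    return "".join(ch for ch in phone if ch.isdigit())


def fingerprint(value: str) -> str:
    """Hash a normalized identifier (assumption: SHA-256 is used for matching)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def identifier_keys(record: dict) -> list:
    """Return the fingerprints of whatever identifiers a record carries."""
    keys = []
    if record.get("email"):
        keys.append(fingerprint(normalize_email(record["email"])))
    if record.get("phone"):
        keys.append(fingerprint(normalize_phone(record["phone"])))
    return keys


def build_index(accounts: list) -> dict:
    """Map each identifier fingerprint to the (hypothetical) platform user id."""
    index = {}
    for account in accounts:
        for key in identifier_keys(account):
            index[key] = account["user_id"]
    return index


def match_events(events: list, index: dict) -> list:
    """Attach third-party events to platform users that share an identifier."""
    matched = []
    for event in events:
        user_id = next((index[k] for k in identifier_keys(event) if k in index), None)
        if user_id is not None:
            matched.append((user_id, event["app"], event["action"]))
    return matched
```

In this sketch, an event uploaded by a shopping app with the email " alice@example.com " would be attached to a platform account registered as "Alice@Example.com", while an event carrying an identifier the platform has never seen would stay unmatched.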

So it seems that Facebook's business analytics tooling is a mature and widely used commercial product. Its self-consistent logic is that the data is collected by third-party platforms, not by Facebook itself. For users, the upshot is the same: regardless of whether you actively log in with your Facebook account, the platform may obtain traces of your activity on third-party platforms.

That covers data from platforms other than Facebook. The user behavior that takes place inside Facebook's own product naturally cannot escape the platform's eyes either. To understand this, we can turn to another feature in Facebook's settings, Ad Preferences, in which the platform shows users another way it uses their data.

Six options in the Ad preferences feature

The "Your interests" section in Ad Preferences

This page shows users the "advertising labels" that the platform has attached to them, along with the reason for each label. In the interests column, users can see that the platform relies on the following two criteria when classifying their advertising interests:

1. You have performed actions related to the topic on Facebook.

2. You have used software from a related product.

In the options for controlling personal information, you can see that Facebook also pushes advertisements based on basic information such as education, relationship status, and age.

How information is collected and used in ad placement can also be glimpsed from the advertiser's side, through the audience targeting options that Facebook offers to brands.

There are five ways of targeting an audience [4]. The data collection revealed by these five methods is broadly consistent with the data discussed above:

1. User's own application data, including basic personal information, whether a purchase has been made in the past 7 or 30 days, and so on;

2. The merchant uploads a list of target users to Facebook for matching and analysis, which is consistent with the mechanism described for off-Facebook activity;

3. Attract users who have downloaded the advertiser's software but are not actively using it;

4. Target users according to the information on the merchant's Facebook page or the characteristics of the merchant's regular customers;

5. Record the Apple, Android, or Facebook account ID of customers when they make a purchase, store the corresponding data, and target them accordingly.

At this point, we have some understanding of the scope and logic of Facebook's data collection. If you want a more specific picture of what usage data Facebook has collected, you can also ask the platform to "download your information." After the platform has approved the request, enter your login password again to verify, and you will receive a huge compressed archive containing the information Facebook has collected about your use of the site.

All the information in the archive is presented as web pages. Every activity within the time range you selected is recorded in detail, down to a message you sent to a friend on the platform on a given day and the device you used, all clearly visible.

Off-Facebook activity and Ad Preferences data are placed in the ads_and_businesses folder, where you can see dozens of files with the .html suffix. Each file contains the data sent by one third-party platform. The data is ordered by time and labeled by event type, including ad request (AD_REQUEST), app open (ACTIVATE_APP), content view (VIEW_CONTENT), custom (CUSTOM), generated response (GEN_RESPONSE), and others.
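
To get a rough overview of such an export, one option is to count how often each event label appears in every file. The following Python sketch does only that. It assumes the labels appear as plain text inside the .html files and that the files sit directly in the folder you pass in; the internal layout of the pages is not described here, so the sketch deliberately scans raw text instead of parsing the page structure. The function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Event labels named in the article; an export may contain others.
EVENT_TYPES = ("AD_REQUEST", "ACTIVATE_APP", "VIEW_CONTENT", "CUSTOM", "GEN_RESPONSE")
EVENT_PATTERN = re.compile(r"\b(" + "|".join(EVENT_TYPES) + r")\b")


def count_event_types(text: str) -> Counter:
    """Count occurrences of each known event label in a page's raw text."""
    return Counter(EVENT_PATTERN.findall(text))


def summarize_folder(folder: str) -> dict:
    """Return {file name: Counter of event labels} for every .html file in folder."""
    summary = {}
    for path in sorted(Path(folder).glob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        summary[path.name] = count_event_types(text)
    return summary
```

Calling `summarize_folder("ads_and_businesses")` on an unpacked archive would then give a per-platform tally, for example how many times a given app reported that it was opened. Adjust the path if your export nests the files differently.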

The off-Facebook activity page in the downloaded data: details of user information sent to Facebook by third-party software.

Regaining autonomy: Do not use my data without my authorization

The data portrait is meticulous and precise. Once you dig into it, it feels as though real life takes place inside a digital panopticon, where every move is recorded as data.

The development of technology is irreversible. Today's users cannot easily give up the convenience that technology brings, but everyone still has the right to choose which of their usage data to open up. At least in terms of data privacy, users' voice is growing. Especially with the EU's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) in the United States, platforms have had to start reviewing and updating their data policies. Facebook proactively launching these features to improve the transparency of its data operations would have been hard to imagine in the past, but it is now becoming a reality.

Daniel Sauter, director of The New School's Master of Data Visualization program, believes that off-Facebook activity may have been launched to comply with the data collection and disclosure requirements of the CCPA, which took effect on January 1. The data that users can obtain is not very actionable in itself; it serves more as a notification. [5]

Although the data available to users is limited, the release of the tool gives users a new option: disconnecting the data from their account. Users can use this function to clear information they do not want others to obtain from their accounts. Once it is applied, your past usage information held by the platform is disconnected from your account, and neither the platform nor advertisers can track you through those traces.

The Ad Preferences feature also hands users some initiative. On this page, Facebook reveals how it sorts users into different "advertising categories" while giving them a choice: deciding what ads they see.

As shown in the figure, users can turn off the use of personal information tags such as occupation and education level for ad targeting, cut the link between data from third-party apps and ad placement, and block specific advertising topics. In this way, the ads you see are no longer decided unilaterally by the platform and advertisers; you take part in the decision too and can customize them within the range of choices the platform offers.

If the tools provided by the platform are the 1.0 manual for taking back the initiative over your data, then selectively disclosing information and confusing the platform's algorithms according to your own needs is the advanced 2.0 manual.

With the background above, we can see which data traces the platform and advertisers rely on. Users can review these categories of data in detail and consider the following questions:

Is there any information about my personal privacy or preferences that I don't want others to know? What kinds of data use might have an adverse effect on me? What kinds of data use can bring some convenience to my daily choices and purchases?

First, know what you want, then combine it with the information you have to work out a specific course of action. For example, if you do not want data from third-party apps to be captured by social platforms, you can prepare several phone numbers and email addresses to build separate identities. Use different phone numbers and email addresses on the two sides, so that platforms such as Facebook cannot "legitimately" match your personal information. If an inexplicable match still occurs, you can turn to the law to defend your rights.

For specific information that you do not want to disclose, pay closer attention while surfing the Internet and cut off the data at the source instead of relying on the disconnect option provided by the platform.

In addition, you can feed the algorithm confusing information to break out of the information cocoon of a single environment, so that the so-called "artificial intelligence" gets a taste of being "artificial stupidity".

A group of young people abroad came up with this method: exploiting a loophole in the password reset procedure, several people log in to the same Instagram account from different locations. Each person is interested in different topics, so the content they browse and like differs. This makes it hard for the algorithm to extract stable patterns, and the feed recommended to them always contains something new [6].
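
One way to see why a shared account is harder to profile is to measure how spread out its interactions are across topics. The Python sketch below uses Shannon entropy as that measure. This is an illustrative choice and not a description of how any platform's recommendation algorithm works; the topic names and the function name are hypothetical.

```python
import math
from collections import Counter


def topic_entropy(interactions: list) -> float:
    """Shannon entropy (in bits) of the topic distribution of a list of interactions.

    0.0 means every interaction is on a single topic; higher values mean the
    activity is spread more evenly over more topics.
    """
    counts = Counter(interactions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# One person with a single dominant interest.
single_user = ["sports"] * 8

# Four people sharing an account, each with a different interest.
shared_account = ["sports"] * 2 + ["cooking"] * 2 + ["travel"] * 2 + ["music"] * 2

print(topic_entropy(single_user), topic_entropy(shared_account))
```

Under this measure, the shared account's activity is more evenly spread than the single user's, which is the intuition behind the trick: the flatter the distribution, the less any one topic stands out as "the" interest of the account.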

Investing more effort in protecting data privacy will surely become routine social behavior in the future. As the creators and legal owners of data, audiences still have much to learn. Whether by making good use of the tools the platform provides or by using their ingenuity to outwit the algorithms, these are positive ways for audiences to take the initiative over their data in the participatory era.

Data, like technology, is neither good nor bad in itself. Whether the outcome is good or bad depends largely on how the platform collects and uses it. User-facing data transparency tools such as off-Facebook activity are worth emulating by more social platforms. At the very least, a platform should let users know at any time how their behavioral data is collected, matched, and used, rather than only popping up a privacy authorization agreement the first time the software is opened, which is not fair.

Reference link:


[2] https://

[3] / 605680 /