They tried to back up the entire internet.

Editor’s note: This article is from WeChat public account “Geeks Park” (ID: Geekpark), author Shen Zhihan.

The Wikipedia entry for Martin Luther King Jr. has more than 300 footnotes, including 66 book references.

This is why people trust Wikipedia. Nearly every statement in nearly every entry can be traced to a source, so readers can check the accuracy of the text against the reference materials.

But even an online encyclopedia like Wikipedia can record only so much. A New Yorker article titled “Can the Internet Be Archived?” put it this way: “The web always lives in the moment. It is illusory, short-lived, unstable, and unreliable. Sometimes the page you want to visit returns a 404… Sometimes the page you want has been overwritten with updated content. This is more troublesome, because the page won’t tell you that what you see is not what you were looking for.”

So is there a way to recover pages that now return a 404, or to see a page as it was before it was modified?

Back up the Internet

Someone tried to back up the entire internet.

In 1996, Brewster Kahle founded the Internet Archive, a public-interest website, out of concern that information on the web could not be preserved the way text printed in books can.

Many people describe the Internet Archive as the greatest search site. The Wayback Machine, the search tool Kahle developed, regularly crawls websites around the world and saves what it collects. It also sets priorities: how many pages it captures, and how often, differs from site to site.

To date, the Internet Archive has saved 330 billion web pages and page snapshots. What makes it remarkable is that this huge archive also holds 20 million books and texts, 8.5 million audio and video recordings, 3 million images, and 200,000 software programs.

All in all, the Internet Archive aims to make information easier to obtain and more accurate. Recently, the Internet Archive and Wikipedia teamed up to make Wikipedia even more reliable: the Internet Archive has linked 130,000 book references in Wikipedia’s footnotes to 50,000 digitally scanned, publicly available books in its collection (covering English, Greek, and Arabic). Readers can click the page number in a footnote to see a two-page preview of the cited passage in context.

Internet Archive: Record those forgotten Internet

Clicking the page number in a footnote opens a two-page preview of the cited passage in context | Internet Archive

Web Library

The New Yorker article cited above said, “The footnote is a milestone in the history of human civilization. It took centuries to invent and spread it. It took only a few years to destroy it.” In the past, the footnotes of books and papers let you pin down both the additional information and where it came from. Now that footnotes have moved to the Internet, you can still get more information by clicking a footnote link, but you never know which day that link will stop working.

In October 2016, Wikipedia and the Internet Archive announced a collaboration to fix the dead-link problem. InternetArchiveBot, developed by the Wayback Machine team led by Mark Graham, automatically scans Wikipedia footnotes for dead links and points each one to the copy of the page saved by the Wayback Machine. “We have edited 14 million links, and more than 11 million of them now link to the Internet Archive,” Graham said.
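The link repair described above can be pictured with a short sketch. It is not the bot's actual code: the function names, the `dead_links` set, and the `cited_on` date are hypothetical, and how the bot decides that a link is dead is not covered in this article. The only real-world detail relied on is the public Wayback Machine address pattern, `https://web.archive.org/web/<timestamp>/<original URL>`, which leads to the capture closest to the given timestamp.

```python
from datetime import datetime

# Public address pattern of the Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback Machine address for the capture closest to `when`."""
    stamp = when.strftime("%Y%m%d%H%M%S")  # 14-digit timestamp
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


def repair_footnotes(footnotes, dead_links, cited_on):
    """Return the footnote links with every dead one pointed at the archive.

    `dead_links` is assumed to be supplied by some earlier check
    (hypothetical here); live links are passed through unchanged.
    """
    repaired = []
    for url in footnotes:
        if url in dead_links:
            repaired.append(wayback_url(url, cited_on))
        else:
            repaired.append(url)
    return repaired
```

Passing the date on which the footnote was written as `cited_on` asks the archive for the version of the page the Wikipedia editor most likely saw, rather than the latest one.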

Linking books is similar, but more challenging. Graham explained that not every book has an ISBN, and not every footnote follows the correct citation format with a specific page number.

The Internet Archive calls itself a web library. Many brick-and-mortar libraries also lend books to users after digitizing them. When a cited book interests you, you can borrow it from the Internet Archive in electronic form.
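To make Graham's two obstacles concrete, here is a minimal sketch of a check that a footnote carries enough information to be linked to a scanned page. The `citation` dictionary and its `isbn` and `page` fields are hypothetical, not the Internet Archive's data model, and only 13-digit ISBNs are handled. The check-digit rule is the standard ISBN-13 one: weight the digits 1, 3, 1, 3, … and require the total to be divisible by 10.

```python
def is_valid_isbn13(raw: str) -> bool:
    """Check the ISBN-13 check digit, ignoring hyphens and spaces."""
    digits = [int(c) for c in raw if c.isdigit()]
    if len(digits) != 13:
        return False
    # Positions alternate between weight 1 and weight 3.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def linkable(citation: dict) -> bool:
    """A footnote can be linked only if it has a valid ISBN and a page number."""
    isbn = citation.get("isbn")
    page = citation.get("page")
    return bool(isbn) and is_valid_isbn13(isbn) and isinstance(page, int)
```

A footnote that fails either test would need a fuzzier match, for example on title and author, which is presumably part of what makes books harder to link than web pages.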

The Internet Archive began digitizing books in 2005 and now has 3.8 million of them in its “collections.” It currently has 22 locations around the world, where 100 employees scan at a rate of 1,000 books per day, yet millions of books are still waiting in line.

In the digital age, people are getting farther and farther away from books. Kahle said, “We want to start connecting Wikipedia with readers and books by weaving books into the Internet.”

Internet Archives

The youth of people born in the 1980s and 1990s may one day end with the closure of Tianya and Douban, and Facebook itself has existed for only a little more than a decade. The Internet has sped up the spread and turnover of information, and people forget it just as quickly. But in the Internet Archive, veteran users can still find the hot topic of its day, “Make Machine,” the Tianya community, and snapshots of a Sina Weibo homepage that now looks somewhat “non-mainstream.”

Snapshots of Tianya and Sina Weibo saved by the Internet Archive | Internet Archive

As The New Yorker commented, it is almost certain that anything not included in the Wayback Machine was never there at all.

On July 17, 2014, a Malaysia Airlines Boeing 777 crashed in Ukraine less than three hours after takeoff. Strelkov, a Ukrainian opposition commander, posted a message on the Russian social network VKontakte: “We just shot down a plane, an AN-26.” The post included a link to a video of the wreckage, which looked like that of a Boeing 777, and was subsequently deleted. The next day, the post was captured by the Wayback Machine, and the Internet Archive wrote on Facebook, “This is what we mean.”

As the Financial Times commented, false information and extremist content are being created and spread rapidly. In an era when social media information is constantly iterated and updated, the ability to record “who said what” and “when they said it,” in a form that cannot be altered, matters more than ever. Being able to study historical information from different periods through the Internet Archive is of even greater value.

For example, after Trump was elected, the Internet Archive collected more than 6,000 videos of him, including ones from before he took office, to help people identify and verify false information.

However, building a global Internet archive is not easy, in part because countries do not agree on legal issues such as legal deposit, copyright, and privacy.

At the beginning of this year, The Society of Authors stated that the Internet Archive’s practices amounted to infringement: all book scanning and lending in the UK must be authorized by the copyright owner, and each loan earns the author 8.52 pence in public lending payments. The society accused the Internet Archive of neither obtaining permission from authors nor paying any compensation.

Shortly afterward, a statement initiated by the National Writers Union and co-signed by 36 other organizations, including The Society of Authors, condemned the Internet Archive and its partner libraries for scanning and distributing e-books. The Internet Archive responded that it had signed on to the CDL (controlled digital lending) agreement, which allows a library to digitize a printed book and lend it to users without the copyright holder’s permission.

The condition is that the number of loans and the loan period are both capped. On the basis of fair use, the number of copies on loan must match the number of books held before digitization (once a physical copy is lent out, its electronic counterpart cannot be lent, and vice versa).

The law cannot keep up with the pace of technological change. Like many who dare to go first, the Internet Archive is caught between resource sharing and the primacy of copyright.
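The lending condition can be expressed as a small bookkeeping rule. The sketch below is only an illustration of that rule as described in this article, not the Internet Archive's lending system: the class name, its methods, and the 14-day default loan period are all hypothetical.

```python
class ControlledLendingTitle:
    """Tracks loans of one title under an owned-to-loaned limit."""

    def __init__(self, copies_owned: int, max_loan_days: int = 14):
        self.copies_owned = copies_owned      # physical copies held before digitization
        self.max_loan_days = max_loan_days    # hypothetical cap on loan length
        self.physical_out = 0
        self.digital_out = 0

    def available(self) -> int:
        """Copies that may still be lent, in either form."""
        return self.copies_owned - self.physical_out - self.digital_out

    def lend(self, digital: bool, days: int) -> bool:
        """Lend one copy if both the time cap and the copy cap allow it."""
        if days > self.max_loan_days or self.available() <= 0:
            return False
        if digital:
            self.digital_out += 1
        else:
            self.physical_out += 1
        return True

    def give_back(self, digital: bool) -> None:
        """Record a returned copy, freeing it for the next borrower."""
        if digital and self.digital_out > 0:
            self.digital_out -= 1
        elif not digital and self.physical_out > 0:
            self.physical_out -= 1
```

With a single owned copy, lending the physical book makes the next digital request fail until the book is returned, and the reverse holds as well, which is the “and vice versa” in the rule.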

Internet Archive founder Brewster Kahle | Wikipedia

“In the ancient days of the Internet in China, people did not just use the Internet; they took part in building it… For example, they went to Wikipedia to write entries and manage content. In the Chinese Internet world, people went to Douban to add entries for movies, books, music albums, and so on, making it easy for other users to mark, collect, and comment on them,” the online writer He Caitou once wrote.

This may be close to the Internet world that the Internet Archive wants to build. In Graham’s words, the Internet Archive hopes to spread all knowledge. Kahle said that although the Internet Archive is rooted in San Francisco, it has little to do with today’s Silicon Valley. He hopes that the “heritage” of all technologies will not end up in the hands of a few people. “I like the feeling that many people can win.”