Does AI transcription of audio into text infringe the copyright in the written work?

Editor’s note: This article is from the WeChat public account “Knowledge productivity” (ID: zhichanli); the author is 蓁蓁.

Amazon holds a dominant position in the global e-book and e-reader market, and its subsidiary Audible leads the market for audiobook publishing and distribution. But prominence brings trouble. Recently, seven book publishers sued Audible for copyright infringement in the US District Court for the Southern District of New York.

The seven publishers claim that Audible’s captioning service infringes their book copyrights, and they have asked a federal judge to bar Audible from using the books’ text in what it describes as an education-related service.

It is worth noting that Audible holds only the right to sell the audiobooks. The captions that accompany the audio are generated automatically by AI, and the text rights were never licensed. Does text generated by AI transcription therefore infringe copyright? Because there is no precedent, the seven publishers are looking for every available way to stop Audible from launching the service.

Publishing giants versus the audiobook leader

First, let us go through the background of this “clash of titans” in the reading industry.

In 2008, Amazon acquired Audible for $300 million. Audible is a 24-year-old company known for its audiobooks, and today it is also a major player in podcasts and other forms of audio entertainment. In 2018, Audible said that its users had downloaded nearly 3 billion hours of audio.

In July this year, Audible announced that it would officially launch the “Audible Captions” service when students return to school this fall. With this service, listeners can see text generated automatically by AI on their smartphone screens while listening to an audiobook. According to foreign media reports, Audible will offer students free captioned editions of “Catch-22” and “The Hunger Games”. Audible founder Don Katz said the service will help young people who struggle with reading.

This has angered seven publishers: the “Big Five” (Hachette, HarperCollins, Macmillan, Penguin Random House, and Simon & Schuster), the San Francisco-based publisher Chronicle Books, and Scholastic, the children’s publisher that holds the copyrights to “Harry Potter” and “The Hunger Games”.

The publishers argue that the Audible captioning service takes the publishers’ audiobooks, converts the narration into unauthorized text, and distributes the entire text of these new “e-books”, all without permission. They call this a textbook infringement directly prohibited by US copyright law, and they have sued Audible in the Southern District of New York. A text edition is not covered by Audible’s audiobook licenses, and machine-generated transcription may also introduce errors that compromise the quality of the work.

The publishers said: “If the service is not enjoined, Audible will use a digital distribution format that devalues the market for cross-format products and harms the interests of publishers, authors, and consumers.”

But Audible argues in a statement that the captioning service is only an educational feature designed to help young children improve their literacy skills. “It is not, and never intended to be, a book.” An Audible spokesperson elaborated on how Audible captions differ from an actual e-book and on the restrictions placed on users. The key difference between the AI-driven feature and an e-book, the spokesperson said, is that the user cannot turn pages and must wait for each line of text to be generated progressively while listening.

A case without US precedent poses challenges for both sides

So the question is: does AI transcription of audio into text infringe the copyright in the written work?

Sun Yuanwei, executive director of the Asia-Pacific Law Institute in the United States, said that presenting audio content as text, whether by artificial intelligence or by human effort, means converting an existing work into a transcript. To a certain extent, this closely resembles the display of subtitles in film and television programs.

Under the definition in Section 101 of the US Copyright Act, such a rendering may constitute a “derivative work”, that is, a translation of an existing work or any other form in which the work is recast, transformed, or adapted. Section 106(2) of the Act also expressly gives the right holder the exclusive right to prepare derivative works, and therefore the power to stop others from doing so without authorization. Engaging in such conduct without permission from the right holders (here, the plaintiff publishers) therefore carries a high risk of infringement.

Did Audible actually infringe copyright in this case? Sun Yuanwei believes that whether infringement occurred must be determined on the specific facts of the case. He added that the key issue is not whether text produced by AI speech-to-text conversion can itself enjoy copyright protection. Artificial intelligence is only a principal tool that helps the defendant’s product achieve its functions and goals.

Sun Yuanwei stated that the alleged infringement was not initiated, carried out, and completed truly “spontaneously” by a machine or device equipped with artificial intelligence. Behind the AI’s operation, a natural person still has to set it in motion (including by presetting specific operating instructions in the relevant software). The infringement dispute in this case is therefore not fundamentally different from a traditional infringement case.

Section 106 of the US Copyright Act gives the right holder of a work six exclusive rights: reproduction; preparation of derivative works; distribution (including sale, rental, lending, or other transfer of ownership); public performance (mainly for literary, musical, dramatic, and choreographic works, pantomimes, and motion pictures and other audiovisual works); public display (mainly for literary, musical, dramatic, and choreographic works, pantomimes, and pictorial, graphic, or sculptural works); and public performance of sound recordings by means of digital audio transmission.

US law has so far not explicitly granted right holders a right of “communication through information networks”. However, a special study by the US Copyright Office concluded that the six existing rights, especially the rights of reproduction, distribution, and public performance, already cover the same scope, so there is no need to amend the law.

The case may involve infringement of the plaintiffs’ rights of reproduction, derivation (adaptation or alteration), and distribution. Had it arisen in China, it might involve the original author’s rights of reproduction, distribution, adaptation, and translation, and the right of communication through information networks.

Because facts like these have arisen for the first time in the United States, the case is challenging for both parties. The plaintiffs may argue that “audio-to-text” conversion is like a translation and should be treated as creating a derivative work. They may also argue that the defendant’s actual operation involves unauthorized copying and dissemination (including dissemination over information networks) of all or part of the original works, which constitutes direct infringement. As a matter of strategy, the plaintiffs may also plead in the alternative: even if the defendant’s device or software does not itself directly infringe, it facilitates indirect infringement of the plaintiffs’ works. This could include contributory liability and vicarious liability and, depending on the circumstances, inducement of infringement.

The defendant may raise a “Cablevision defense” to argue that its conduct is not direct infringement. Relying on the Second Circuit Court of Appeals decision in Cartoon Network, LP v. CSC Holdings, Inc., 536 F.3d 121 (2d Cir. 2008), it could argue that converting audio into text is a continuous process in which each sentence appears only transiently and cannot be stored for any length of time. On this view, the service does not copy the plaintiffs’ work into another book. The constantly changing text scrolls across the screen like a marquee, and no complete copy of the book, from beginning to end, is ever made.

As for indirect infringement, the defendant may raise a “Sony Betamax defense” in order to claim fair use. Relying on the US Supreme Court’s decision in Sony Corporation of America v. Universal City Studios, Inc., 464 U.S. 417 (1984), commonly known as the Sony Betamax case, it could argue that this transcription function or service is “capable of commercially significant non-infringing uses”.

But the service has not yet officially launched, and the lawsuit is the plaintiffs’ attempt to strike at the root of the problem, using a preliminary injunction to kill the defendant’s product or service before it is born. Under these circumstances, the defendant will find it harder to produce evidence in support of the defenses above.

AI+IP still faces hurdles, and infringement liability is hard to determine

On the subject of AI+IP, domestic readers may recall that in May 2017 the poetry collection “Sunshine Lost Glass Window”, created by “Xiao Bing”, was officially published. “Xiao Bing” wrote the collection after studying the modern poems of 519 poets over more than 10,000 rounds of training.

Beyond Microsoft’s “Xiao Bing”, many companies have developed artificial intelligence products for creating various literary and artistic “works”. Google’s DeepDream can generate paintings, some of which have been sold at auction. Tencent’s DreamWriter robot and Toutiao’s Xiaomingbot can automatically generate news articles by algorithm and push them to users promptly.

Wan Yong, a professor at the Law School of Renmin University of China, pointed out in his article “Who Owns the Copyright in Artificial Intelligence ‘Works’?” that, compared with previous technological innovations, the challenge artificial intelligence poses to copyright law is the most fundamental and the most comprehensive.

In this regard, Sun Yuanwei said that discussion of AI+IP is still at a preliminary stage and that opinions differ widely. One question is whether results produced purely by artificial intelligence show “originality” and should be protected by copyright. Conversely, most current artificial intelligence relies on “absorbing” large amounts of existing material to help the machine carry out “deep learning”. When the output is suspected of “plagiarism”, can it bear the resulting liability for infringement?

Similarly, suppose the output of artificial intelligence should be granted rights, and the “deep learning” behind it drew on 1,000 different sources. Should those 1,000 contributors then be joint right holders? If a right has multiple co-owners, any license or use must be agreed unanimously by all of them. Is granting such rights worthwhile if coordination is that difficult?

Beyond questions of technical development and legal entitlement, artificial intelligence still faces some hard hurdles in practical application:

(1) Human beings cannot endow machines with moral and value judgment, or require it of them (that is, the ability to tell good from bad in different scenarios; in fact, even within one country, the same matter may be judged under many different moral standards or value factors);

(2) Artificial intelligence can learn deeply within a specific field and produce different analytical conclusions there, but it struggles to draw cross-field connections and analogies;

(3) Artificial intelligence has no emotions, so it cannot take irrational factors into account when considering, analyzing, and screening specific situations and problems.

One interesting recent development is the appearance of news reports about patent applications drafted entirely by artificial intelligence. The question behind them is how the related potential liabilities and risk exposures should be defined.

For now, AI+IP still faces many problems and hurdles, but it is encouraging that this case gives the judiciary an opportunity to address them and prompts people to think. How should the scope of the “services” that a third party provides around copyrighted works be defined, and by what standard? In the United States, telecommunications law and regulations protecting people with disabilities require that all audiovisual works carry captions covering all of their content for viewers to select, which implies very large business opportunities.

In the online environment, the outcome of the case is expected to clarify whether a license for audio includes permission to convert that audio into text.