Opinion: Alex Alben on AI and the Copyright Challenge
The current Writers Guild strike and a recent lawsuit filed by thousands of authors against the OpenAI corporation highlight the turbulent landscape created by generative AI and raise intriguing copyright questions. The lawsuit and the writers’ complaints center on one of the basic features of popular natural language models, namely that they “ingest” tens of thousands of books and other material, much of it still under copyright protection, in order to generate responses to user requests for new material.
Asking a generative AI tool to “write a three-act play about diverse individuals stranded on an island” will yield a competent work in a matter of seconds. The work will include scenes of bravery, survival, and probably a love story to boot. Because the AI program does not list the specific sources that it drew upon to create the new work, the question arises as to whether and how to compensate the tens of thousands or more authors whose works were scraped—without their overt permission—into the data set upon which the natural language model operates.
Fortunately, American copyright law has a clear test for whether an author’s original work has been infringed. If the new work is “substantially similar” to the previously published work, then the copyright in the original work is deemed to be violated. This test was explained by the renowned Justice Joseph Story in an 1841 case involving a biographer who had copied hundreds of George Washington’s letters. He concluded that, “It is certainly not necessary, to constitute an invasion of copyright, that the whole of a work should be copied, or even a large portion of it, in form or in substance. If so much is taken, that the value of the original is sensibly diminished, or the labors of the original author are substantially to an injurious extent appropriated by another, that is sufficient, in point of law, to constitute a piracy pro tanto.” In our system of justice, we rely upon juries to determine “whether the value of the original is sensibly diminished,” and if an AI tool produces a new work that fails this test, then it (and its corporate overlords) would be liable for infringement and damages.
The novel problem with large natural language models is that they draw on so many existing works that determining how much of any given work has been used is impossible. Yet this problem has its antecedents in the search engines that crawl the entire Internet in order to produce search results. In 2003 and 2007, cases were brought against early search engines, including Google, asserting that they violated copyright by producing search results that referenced or copied small portions of copyrighted material, such as thumbnail photos. The courts found that search engines needed to be trained on vast amounts of data in order to be useful and that the reproduction of links or small versions of images was permissible.
Another analog to the copyright challenge posed by AI can be found in the human endeavor of creative writing. Human authors spend much of their lives reading other books, watching films, listening to music and building their own internal “data sets” of works of fiction and nonfiction. When writers are asked to create new work, they unavoidably draw upon their past experiences and memories. We don’t ask musicians, video artists or writers to disclose all of the previous works they may have drawn on when they produce a new work, because we trust their creative process. If for some reason their new work closely resembles a previously published work still under copyright protection, then we have the substantial similarity test to fall back upon. Our copyright system is intentionally designed to favor the creation of new artistic works. That’s why common literary elements or “scènes à faire” are not protected under copyright law, which aims to protect original expression, not basic underlying ideas, such as the story of people stranded on an island.
One major exception to liability for using previously published copyrighted material is the “Fair Use” test, which Congress codified in the 1976 Copyright Act. If a use is for a purpose such as commentary, parody, research or scholarship, or if a work doesn’t diminish the economic market for the original work, then courts may find Fair Use. This past term, the Supreme Court ruled 7-2 that Andy Warhol’s use of Lynn Goldsmith’s preexisting photograph of the artist Prince did not meet the Fair Use test, concluding that Warhol should have licensed her original photo. This case may inform the way future courts analyze whether an AI tool has “transformed” an original copyrighted work.
It’s premature to conclude that evolving AI technologies won’t pose future legal questions that defy analysis under our existing copyright tests, yet it is interesting to note that the more these natural language models mimic human ways of thought and creativity, the more they might merit the creative protections that we currently accord to human beings.
Alex Alben teaches Privacy and Internet Law at the UCLA School of Law and is the author of “Analog Days--How Technology Rewrote Our Future.”