Case 4:23-cv-03416-KAW Document 1 Filed 07/07/23 Page 1 of 17
`
`
`
`Joseph R. Saveri (State Bar No. 130064)
`Cadio Zirpoli (State Bar No. 179108)
`Christopher K.L. Young (State Bar No. 318371)
`Kathleen J. McMahon (State Bar No. 340007)
`JOSEPH SAVERI LAW FIRM, LLP
`601 California Street, Suite 1000
`San Francisco, California 94108
Telephone: (415) 500-6800
Facsimile: (415) 395-9940
Email: jsaveri@saverilawfirm.com
czirpoli@saverilawfirm.com
cyoung@saverilawfirm.com
kmcmahon@saverilawfirm.com
`
`
`Matthew Butterick (State Bar No. 250953)
`1920 Hillhurst Avenue, #406
`Los Angeles, CA 90027
Telephone: (323) 968-2632
Facsimile: (415) 395-9940
Email: mb@buttericklaw.com
`
`Counsel for Individual and Representative Plaintiffs
`and the Proposed Class
`
`UNITED STATES DISTRICT COURT
`NORTHERN DISTRICT OF CALIFORNIA
`SAN FRANCISCO DIVISION
`
`
`Sarah Silverman, an individual;
`Christopher Golden, an individual;
`Richard Kadrey, an individual;
`
`Individual and Representative Plaintiffs,
`
`v.
`
`OpenAI, Inc., a Delaware nonprofit corporation; OpenAI, L.P., a
`Delaware limited partnership; OpenAI OpCo, L.L.C., a Delaware
`limited liability corporation; OpenAI GP, L.L.C., a Delaware
`limited liability company; OpenAI Startup Fund GP I, L.L.C.,
`a Delaware limited liability company; OpenAI Startup Fund I,
`L.P., a Delaware limited partnership; and OpenAI Startup Fund
`Management, LLC, a Delaware limited liability company,
`
Defendants.

Case No.

Complaint

Class Action

Demand for Jury Trial
`
`
`COMPLAINT
`
`
`
`
`
`
Plaintiffs Sarah Silverman, Christopher Golden, and Richard Kadrey (“Plaintiffs”), on behalf of themselves and all others similarly situated, bring this Class Action Complaint (the “Complaint”) against Defendants OpenAI, Inc., OpenAI, L.P., OpenAI OpCo, L.L.C., OpenAI GP, L.L.C., OpenAI Startup Fund I, L.P., OpenAI Startup Fund GP I, L.L.C. and OpenAI Startup Fund Management, LLC for direct copyright infringement, vicarious copyright infringement, violations of section 1202(b) of the Digital Millennium Copyright Act, unjust enrichment, violations of the California and common law unfair competition laws, and negligence. Plaintiffs seek injunctive relief and to recover damages as a result and consequence of Defendants’ unlawful conduct.

I. OVERVIEW

1. ChatGPT is a software product created, maintained, and sold by OpenAI.

2. ChatGPT is powered by two AI software programs called GPT-3.5 and GPT-4, also known as large language models. Rather than being programmed in the traditional way, a large language model is “trained” by copying massive amounts of text and extracting expressive information from it. This body of text is called the training dataset. Once a large language model has copied and ingested the text in its training dataset, it is able to emit convincingly naturalistic text outputs in response to user prompts.

3. A large language model’s output is therefore entirely and uniquely reliant on the material in its training dataset. Every time it assembles a text output, the model relies on the information it extracted from its training dataset.

4. Plaintiffs and Class members are authors of books. Plaintiffs and Class members have registered copyrights in the books they published. Plaintiffs and Class members did not consent to the use of their copyrighted books as training material for ChatGPT. Nonetheless, their copyrighted materials were ingested and used to train ChatGPT.

5. Indeed, when ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works, something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works.

6. Defendants, by and through the use of ChatGPT, benefit commercially and profit richly from the use of Plaintiffs’ and Class members’ copyrighted materials.
`
`
`
`
II. JURISDICTION AND VENUE

7. This Court has subject matter jurisdiction under 28 U.S.C. § 1331 because this case arises under the Copyright Act (17 U.S.C. § 501) and the Digital Millennium Copyright Act (17 U.S.C. § 1202).

8. Jurisdiction and venue are proper in this judicial district under 28 U.S.C. § 1391(c)(2) because defendant OpenAI, Inc. is headquartered in this district, and thus a substantial part of the events giving rise to the claim occurred in this district; and because a substantial part of the events giving rise to Plaintiffs’ claims occurred in this District, and a substantial portion of the affected interstate trade and commerce was carried out in this District. Each Defendant has transacted business, maintained substantial contacts, and/or committed overt acts in furtherance of the illegal scheme and conspiracy throughout the United States, including in this District. Defendants’ conduct has had the intended and foreseeable effect of causing injury to persons residing in, located in, or doing business throughout the United States, including in this District.

9. Under Civil Local Rule 3.2(c) and (e), assignment of this case to the San Francisco Division is proper because defendant OpenAI, Inc. is headquartered in San Francisco, and a substantial part of the events giving rise to Plaintiffs’ claims and the interstate trade and commerce involved and affected by Defendants’ conduct giving rise to the claims herein occurred in this Division.

`III. PARTIES
`
A. Plaintiffs

10. Plaintiff Sarah Silverman is a writer and performer who lives in California. Plaintiff Silverman owns a registered copyright in one book, called The Bedwetter. This book contains copyright-management information customarily included in published books, including the name of the author and the year of publication.

11. Plaintiff Christopher Golden is a writer who lives in Massachusetts. Mr. Golden owns registered copyrights in several books, including Ararat. This book contains the copyright-management information customarily included in published books, including the name of the author and the year of publication.
`
`
`
`
`
`
`
`
12. Plaintiff Richard Kadrey is a writer who lives in Pennsylvania. Plaintiff Kadrey owns registered copyrights in several books, including Sandman Slim. This book contains the copyright-management information customarily included in published books, including the name of the author and the year of publication.

13. A nonexhaustive list of registered copyrights owned by Plaintiffs is included as Exhibit A.

B. Defendants

14. Defendant OpenAI, Inc. is a Delaware nonprofit corporation with its principal place of business located at 3180 18th St, San Francisco, CA 94110.

15. Defendant OpenAI, L.P. is a Delaware limited partnership with its principal place of business located at 3180 18th St, San Francisco, CA 94110. OpenAI, L.P. is a wholly owned subsidiary of OpenAI, Inc. that is operated for profit. OpenAI, Inc. controls OpenAI, L.P. directly and through the other OpenAI entities.

16. Defendant OpenAI OpCo, L.L.C. is a Delaware limited liability company with its principal place of business located at 3180 18th Street, San Francisco, CA 94110. OpenAI OpCo, L.L.C. is a wholly owned subsidiary of OpenAI, Inc. that is operated for profit. OpenAI, Inc. controls OpenAI OpCo, L.L.C. directly and through the other OpenAI entities.

17. Defendant OpenAI GP, L.L.C. (“OpenAI GP”) is a Delaware limited liability company with its principal place of business located at 3180 18th Street, San Francisco, CA 94110. OpenAI GP is the general partner of OpenAI, L.P. OpenAI GP manages and operates the day-to-day business and affairs of OpenAI, L.P. OpenAI GP was aware of the unlawful conduct alleged herein and exercised control over OpenAI, L.P. throughout the Class Period. OpenAI, Inc. directly controls OpenAI GP.

18. Defendant OpenAI Startup Fund I, L.P. (“OpenAI Startup Fund I”) is a Delaware limited partnership with its principal place of business located at 3180 18th Street, San Francisco, CA 94110. OpenAI Startup Fund I was instrumental in the foundation of OpenAI, L.P., including the creation of its business strategy and providing initial funding. OpenAI Startup Fund I was aware of the unlawful conduct alleged herein and exercised control over OpenAI, L.P. throughout the Class Period.
`
`
`
`
`
`
`
`
19. Defendant OpenAI Startup Fund GP I, L.L.C. (“OpenAI Startup Fund GP I”) is a Delaware limited liability company with its principal place of business located at 3180 18th Street, San Francisco, CA 94110. OpenAI Startup Fund GP I is the general partner of OpenAI Startup Fund I. OpenAI Startup Fund GP I is a party to the unlawful conduct alleged herein. OpenAI Startup Fund GP I manages and operates the day-to-day business and affairs of OpenAI Startup Fund I.

20. Defendant OpenAI Startup Fund Management, LLC (“OpenAI Startup Fund Management”) is a Delaware limited liability company with its principal place of business located at 3180 18th Street, San Francisco, CA 94110. OpenAI Startup Fund Management is a party to the unlawful conduct alleged herein. OpenAI Startup Fund Management was aware of the unlawful conduct alleged herein and exercised control over OpenAI, L.P. throughout the Class Period.

`IV. AGENTS AND CO-CONSPIRATORS
`
21. The unlawful acts alleged against the Defendants in this class action complaint were authorized, ordered, or performed by the Defendants’ respective officers, agents, employees, representatives, or shareholders while actively engaged in the management, direction, or control of the Defendants’ businesses or affairs. The Defendants’ agents operated under the explicit and apparent authority of their principals. Each Defendant, and its subsidiaries, affiliates, and agents operated as a single unified entity.

22. Various persons and/or firms not named as Defendants may have participated as co-conspirators in the violations alleged herein and may have performed acts and made statements in furtherance thereof. Each acted as the principal, agent, or joint venture of, or for other Defendants with respect to the acts, violations, and common course of conduct alleged herein.

V. FACTUAL ALLEGATIONS

23. OpenAI creates and sells artificial-intelligence software products. Artificial intelligence is commonly abbreviated “AI.” AI software is designed to algorithmically simulate human reasoning or inference, often using statistical methods.

24. Certain AI products created and sold by OpenAI are known as large language models. A large language model (or “LLM” for short) is AI software designed to parse and emit natural language. Though a large language model is a software program, it is not created the way most software programs are, that is, by human software engineers writing code. Rather, a large language model is “trained” by copying massive amounts of text from various sources and feeding these copies into the model. This corpus of input material is called the training dataset. During training, the large language model copies each piece of text in the training dataset and extracts expressive information from it. The large language model progressively adjusts its output to more closely resemble the sequences of words copied from the training dataset. Once the large language model has copied and ingested all this text, it is able to emit convincing simulations of natural written language as it appears in the training dataset.

25. Much of the material in OpenAI’s training datasets, however, comes from copyrighted works, including books written by Plaintiffs, that were copied by OpenAI without consent, without credit, and without compensation.

26. Authors, including Plaintiffs, publish books with certain copyright management information. This information includes the book’s title, the ISBN number or copyright number, the author’s name, the copyright holder’s name, and terms and conditions of use. Most commonly, this information is found on the back of the book’s title page and is customarily included in all books, regardless of genre.

27. OpenAI has released a series of large language models, including GPT-1 (released June 2018), GPT-2 (February 2019), GPT-3 (May 2020), GPT-3.5 (March 2022), and most recently GPT-4 (March 2023). “GPT” is an abbreviation for “generative pre-trained transformer,” where pre-trained refers to the use of textual material for training, generative refers to the model’s ability to emit text, and transformer refers to the underlying training algorithm. Together, OpenAI’s large language models will be referred to as the “OpenAI Language Models.”

28. Many kinds of material have been used to train large language models. Books, however, have always been a key ingredient in training datasets for large language models because books offer the best examples of high-quality longform writing.

29. For instance, in its June 2018 paper introducing GPT-1 (called “Improving Language Understanding by Generative Pre-Training”), OpenAI revealed that it trained GPT-1 on BookCorpus, a collection of “over 7,000 unique unpublished books from a variety of genres including Adventure, Fantasy, and Romance.” OpenAI confirmed why a dataset of books was so valuable: “Crucially, it contains long stretches of contiguous text, which allows the generative model to learn to condition on long-range information.” Hundreds of large language models have been trained on BookCorpus, including those made by OpenAI, Google, Amazon, and others.

30. BookCorpus, however, is a controversial dataset. It was assembled in 2015 by a team of AI researchers for the purpose of training language models. They copied the books from a website called Smashwords, which hosts self-published novels that are available to readers at no cost. Those novels, however, are largely under copyright. They were copied into the BookCorpus dataset without consent, credit, or compensation to the authors.

31. OpenAI also copied many books while training GPT-3. In the July 2020 paper introducing GPT-3 (called “Language Models are Few-Shot Learners”), OpenAI disclosed that 15% of the enormous GPT-3 training dataset came from “two internet-based books corpora” that OpenAI simply called “Books1” and “Books2”.

32. Tellingly, OpenAI has never revealed what books are part of the Books1 and Books2 datasets, though there are some clues. First, OpenAI admitted these are “internet-based books corpora.” Second, both Books1 and Books2 are apparently much larger than BookCorpus. Based on numbers given in OpenAI’s paper about GPT-3, Books1 is apparently about nine times larger; Books2 is about 42 times larger. Since BookCorpus contained about 7,000 titles, this suggests Books1 would contain about 63,000 titles; Books2 would contain about 294,000 titles.

33. But there are only a handful of “internet-based books corpora” that would be able to deliver this much material.

34. As noted in Paragraph 32, supra, the OpenAI Books1 dataset can be estimated to contain about 63,000 titles. Project Gutenberg is an online archive of e-books whose copyright has expired. In September 2020, Project Gutenberg claimed to have “over 60,000” titles. Project Gutenberg has long been popular for training AI systems due to the lack of copyright. In 2018, a team of AI researchers created the “Standardized Project Gutenberg Corpus,” which contained “more than 50,000 books.” On information and belief, the OpenAI Books1 dataset is based on either the Standardized Project Gutenberg Corpus or Project Gutenberg itself, because of the roughly similar sizes of the two datasets.
`
`
`
`
35. As noted in Paragraph 32, supra, the OpenAI Books2 dataset can be estimated to contain about 294,000 titles. The only “internet-based books corpora” that have ever offered that much material are notorious “shadow library” websites like Library Genesis (aka LibGen), Z-Library (aka B-ok), Sci-Hub, and Bibliotik. The books aggregated by these websites have also been available in bulk via torrent systems. These flagrantly illegal shadow libraries have long been of interest to the AI-training community: for instance, an AI training dataset published in December 2020 by EleutherAI called “Books3” includes a recreation of the Bibliotik collection and contains nearly 200,000 books. On information and belief, the OpenAI Books2 dataset includes books copied from these “shadow libraries,” because those are the sources of trainable books most similar in nature and size to OpenAI’s description of Books2.

36. In March 2023, OpenAI’s paper introducing GPT-4 contained no information about its dataset at all: OpenAI claimed that “[g]iven both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about . . . dataset construction.” Later in the paper, OpenAI concedes it did “filter[ ] our dataset . . . to specifically reduce the quantity of inappropriate erotic text content.”

A. Interrogating the OpenAI Language Models using ChatGPT

37. ChatGPT is a language model created and sold by OpenAI. As its name suggests, ChatGPT is designed to offer a conversational style of interaction with a user. OpenAI offers ChatGPT through a web interface to individual users for $20 per month. Through the web interface, users can choose to use two versions of ChatGPT: one based on the GPT-3.5 model, and one based on the newer GPT-4 model.

38. OpenAI also offers ChatGPT to software developers through an application-programming interface (or “API”). The API allows developers to write programs that exchange data with ChatGPT. Access to ChatGPT via the API is billed on the basis of usage.

39. Regardless of how accessed, either through the web interface or through the API, ChatGPT allows users to enter text prompts, which ChatGPT then attempts to respond to in a natural way, i.e., ChatGPT can generate answers in a coherent and fluent way that closely mimics human language. If a user prompts ChatGPT with a question, ChatGPT will answer. If a user prompts ChatGPT with a command, ChatGPT will obey. If a user prompts ChatGPT to summarize a copyrighted book, it will do so.

40. ChatGPT’s output, like other LLMs, relies on the data upon which it is trained to generate new content. LLMs generate output based on patterns and connections drawn from the training data. For example, if an LLM is prompted to generate a writing in the style of a certain author, the LLM would generate content based on patterns and connections it learned from analysis of that author’s work within its training data.

41. On information and belief, the reason ChatGPT can accurately summarize a certain copyrighted book is because that book was copied by OpenAI and ingested by the underlying OpenAI Language Model (either GPT-3.5 or GPT-4) as part of its training data.

42. When ChatGPT was prompted to summarize books written by each of the Plaintiffs, it generated very accurate summaries. These summaries are attached as Exhibit B. The summaries get some details wrong. This is expected, since a large language model mixes together expressive material derived from many sources. Still, the rest of the summaries are accurate, which means that ChatGPT retains knowledge of particular works in the training dataset and is able to output similar textual content. At no point did ChatGPT reproduce any of the copyright management information Plaintiffs included with their published works.

`VI. CLASS ALLEGATIONS
`
A. Class Definition

43. Plaintiffs bring this action for damages and injunctive relief as a class action under Federal Rules of Civil Procedure 23(a), 23(b)(2), and 23(b)(3), on behalf of the following Class:

All persons or entities domiciled in the United States that own a United States copyright in any work that was used as training data for the OpenAI Language Models during the Class Period.

44. This Class definition excludes:
a. any of the Defendants named herein;
b. any of the Defendants’ co-conspirators;
c. any of Defendants’ parent companies, subsidiaries, and affiliates;
d. any of Defendants’ officers, directors, management, employees, subsidiaries, affiliates, or agents;
e. all governmental entities; and
f. the judges and chambers staff in this case, as well as any members of their immediate families.

B. Numerosity

45. Plaintiffs do not know the exact number of members in the Class. This information is in the exclusive control of Defendants. On information and belief, there are at least thousands of members in the Class geographically dispersed throughout the United States. Therefore, joinder of all members of the Class in the prosecution of this action is impracticable.

C. Typicality

46. Plaintiffs’ claims are typical of the claims of other members of the Class because Plaintiffs and all members of the Class were damaged by the same wrongful conduct of Defendants as alleged herein, and the relief sought herein is common to all members of the Class.

D. Adequacy

47. Plaintiffs will fairly and adequately represent the interests of the members of the Class because the Plaintiffs have experienced the same harms as the members of the Class and have no conflicts with any other members of the Class. Furthermore, Plaintiffs have retained sophisticated and competent counsel who are experienced in prosecuting federal and state class actions, as well as other complex litigation.

E. Commonality and Predominance

48. Numerous questions of law or fact common to each Class arise from Defendants’ conduct:
a. whether Defendants violated the copyrights of Plaintiffs and the Class when they downloaded copies of Plaintiffs’ copyrighted books and used them to train ChatGPT;
b. whether ChatGPT itself is an infringing derivative work based on Plaintiffs’ copyrighted books;
c. whether the text outputs of ChatGPT are infringing derivative works based on Plaintiffs’ copyrighted books;
d. whether Defendants violated the DMCA by removing copyright-management information (CMI) from Plaintiffs’ copyrighted books;
e. whether Defendants were unjustly enriched by the unlawful conduct alleged herein;
f. whether Defendants’ conduct alleged herein constitutes Unfair Competition under California Business and Professions Code section 17200 et seq.;
g. whether this Court should enjoin Defendants from engaging in the unlawful conduct alleged herein, and what the scope of that injunction would be;
h. whether any affirmative defense excuses Defendants’ conduct; and
i. whether any statute of limitations limits Plaintiffs’ and the Class’s potential for recovery.

49. These and other questions of law and fact common to the Class predominate over any questions affecting the members of the Class individually.

F. Other Class Considerations

50. Defendants have acted on grounds generally applicable to the Class. This class action is superior to alternatives, if any, for the fair and efficient adjudication of this controversy. Prosecuting the claims pleaded herein as a class action will eliminate the possibility of repetitive litigation. There will be no material difficulty in the management of this action as a class action. Further, final injunctive relief is appropriate with respect to the Class as a whole.

51. The prosecution of separate actions by individual Class members would create the risk of inconsistent or varying adjudications, establishing incompatible standards of conduct for Defendants.
`
VII. CLAIMS FOR RELIEF

COUNT 1
Direct Copyright Infringement
17 U.S.C. § 106
On Behalf of Plaintiffs and the Class

52. Plaintiffs incorporate by reference the preceding factual allegations.

53. As the owners of the registered copyrights in books used to train the OpenAI Language Models, Plaintiffs hold the exclusive rights to those texts under 17 U.S.C. § 106.
`
`
`
`
`
`
`
`
54. Plaintiffs never authorized OpenAI to make copies of their books, make derivative works, publicly display copies (or derivative works), or distribute copies (or derivative works). All those rights belong exclusively to Plaintiffs under copyright law.

55. On information and belief, to train the OpenAI Language Models, OpenAI relied on harvesting mass quantities of textual material from the public internet, including Plaintiffs’ books, which are available in digital formats.

56. OpenAI made copies of Plaintiffs’ books during the training process of the OpenAI Language Models without Plaintiffs’ permission. Specifically, OpenAI copied at least Plaintiff Silverman’s book The Bedwetter; Plaintiff Golden’s book Ararat; and Plaintiff Kadrey’s book Sandman Slim. Together, these books are referred to as the Infringed Works.

57. Because the OpenAI Language Models cannot function without the expressive information extracted from Plaintiffs’ works (and others) and retained inside them, the OpenAI Language Models are themselves infringing derivative works, made without Plaintiffs’ permission and in violation of their exclusive rights under the Copyright Act.

58. Plaintiffs have been injured by OpenAI’s acts of direct copyright infringement. Plaintiffs are entitled to statutory damages, actual damages, restitution of profits, and other remedies provided by law.
`
COUNT 2
Vicarious Copyright Infringement
17 U.S.C. § 106
On Behalf of Plaintiffs and the Class

59. Plaintiffs incorporate by reference the preceding factual allegations.

60. Because the output of the OpenAI Language Models is based on expressive information extracted from Plaintiffs’ works (and others), every output of the OpenAI Language Models is an infringing derivative work, made without Plaintiffs’ permission and in violation of their exclusive rights under the Copyright Act.

61. OpenAI has the right and ability to control the output of the OpenAI Language Models. OpenAI has benefited financially from the infringing output of the OpenAI Language Models. Therefore, every output from the OpenAI Language Models constitutes an act of vicarious copyright infringement.

62. Plaintiffs have been injured by OpenAI’s acts of vicarious copyright infringement. Plaintiffs are entitled to statutory damages, actual damages, restitution of profits, and other remedies provided by law.
`
COUNT 3
Digital Millennium Copyright Act:
Removal of Copyright Management Information
17 U.S.C. § 1202(b)
On Behalf of Plaintiffs and the Class

63. Plaintiffs incorporate by reference the preceding factual allegations.

64. Plaintiffs included one or more forms of copyright-management information (“CMI”) in each of the Plaintiffs’ Infringed Works, including: copyright notice, title and other identifying information, the name or other identifying information about the owners of each book, terms and conditions of use, and identifying numbers or symbols referring to CMI.

65. Without the authority of Plaintiffs and the Class, OpenAI copied the Plaintiffs’ Infringed Works and used them as training data for the OpenAI Language Models. By design, the training process does not preserve any CMI. Therefore, OpenAI intentionally removed CMI from the Plaintiffs’ Infringed Works in violation of 17 U.S.C. § 1202(b)(1).

66. Without the authority of Plaintiffs and the Class, Defendants created derivative works based on Plaintiffs’ Infringed Works. By distributing these works without their CMI, OpenAI violated 17 U.S.C. § 1202(b)(3).

67. OpenAI knew or had reasonable grounds to know that this removal of CMI would facilitate copyright infringement by concealing the fact that every output from the OpenAI Language Models is an infringing derivative work, synthesized entirely from expressive information found in the training data.

68. Plaintiffs have been injured by OpenAI’s removal of CMI. Plaintiffs are entitled to statutory damages, actual damages, restitution of profits, and other remedies provided by law.
`
`
`
`
`
`
`
`
`
`
`COUNT 4
`Unfair Competition
`Cal. Bus. & Prof. Code §§ 17200, et seq.
`On Behalf of Plaintiffs and the Class
`
`69.
`Plaintiffs incorporate by reference the preceding factual allegations.
`70. Defendants have engaged in unlawful business practices, including violating Plaintiffs’
`
`rights under the DMCA, and using Plaintiffs’ Infringed Works to train ChatGPT without Plaintiffs’ or
`
`the Class’s authorization.
`71. The unlawful business practices described herein violate California Business and
`
`Professions Code section 17200 et seq. (the “UCL”) because that conduct is otherwise unlawful by
`
`violating the DMCA.
`72. The unlawful business practices described herein violate the UCL because they are
`
`unfair, immoral, unethical, oppressive, unscrupulous or injurious to consumers, because, among other
`
`reasons, Defendants used Plaintiffs’ protected works to train ChatGPT for Defendants’ own
`
`commercial profit without Plaintiffs’ and the Class’s authorization. Defendants further knowingly
`
`designed ChatGPT to output portions or summaries of Plaintiffs’ copyrighted works without
`
`attribution, and they unfairly profit from and take credit for developing a commercial product based on
`
`unattributed reproductions of those stolen writings and ideas.
`73. The unlawful business practices described herein violate the UCL because consumers
`
`are likely to be deceived. Defendants knowingly and secretively trained ChatGPT on unauthorized
`
`copies of Plaintiffs’ copyright-protected work. Further, Defendants deceptively designed ChatGPT to
`
`output without any CMI or other credit to Plaintiffs and Class members whose Infringed Works
`
`comprise ChatGPT’s training dataset. Defendants deceptively marketed their product in a manner that
`
`fails to attribute the success of their product to the copyright-protected work on which it is based.