`Case 3:23-cv-04625 Document 1 Filed 09/08/23 Page 1 of 24
`
`
`
`
`DANIEL J. MULLER, SBN 193396
`dmuller@venturahersey.com
`VENTURA HERSEY & MULLER, LLP
`1506 Hamilton Avenue
`San Jose, California 95125
`Telephone: (408) 512-3022
`Facsimile: (408) 512-3023
`
`Attorneys for Plaintiff and the Class
`
`
`UNITED STATES DISTRICT COURT
`NORTHERN DISTRICT OF CALIFORNIA – SAN FRANCISCO DIVISION
`
`
`MICHAEL CHABON, DAVID HENRY
`HWANG, MATTHEW KLAM, RACHEL
`LOUISE SNYDER, AND AYELET
`WALDMAN,
`
`individually and on behalf of all others
`similarly situated,
`
`Plaintiffs,
`
`v.
`
`OPENAI, INC., OPENAI, L.P., OPENAI
`OPCO, LLC, OPENAI GP LLC, OPENAI
`STARTUP FUND GP I, LLC, OPENAI
`STARTUP FUND I, LP, and OPENAI
`STARTUP FUND MANAGEMENT, LLC,
`
`Defendants.
`
`Case No.
`
`CLASS ACTION COMPLAINT
`
`CLASS ACTION
`
`JURY TRIAL DEMANDED
`
`Plaintiffs Michael Chabon, David Henry Hwang, Matthew Klam, Rachel Louise Snyder,
`and Ayelet Waldman (“Plaintiffs”), individually and on behalf of all others similarly situated,
`bring this action against Defendants OpenAI, Inc., OpenAI, LP, OpenAI OpCo, LLC, OpenAI
`GP LLC, OpenAI Startup Fund I, LP, OpenAI Startup Fund GP I, LLC, and OpenAI Startup
`Fund Management, LLC (collectively, “Defendants” or “OpenAI”). Plaintiffs allege as follows
`based upon personal knowledge as to themselves and their own acts, and upon information and
`belief as to all other matters:
`
`NATURE OF ACTION
`1. This is a class action lawsuit brought by Plaintiffs on behalf of themselves and a
`Class of authors holding copyrights in their published works arising from OpenAI’s clear
`infringement of their intellectual property.
`2. OpenAI is a research company specializing in the development of artificial
`intelligence (“AI”) products, such as ChatGPT.
`3. ChatGPT is an AI chatbot, which produces responses to users’ text queries or
`prompts in a way that mimics human conversation.
`4. ChatGPT relies on other OpenAI products to function, namely Generative Pre-trained
`Transformer (“GPT”) models. “Generative,” in GPT, represents the model’s ability to
`respond to text inquiries, while “Pre-trained” refers to the model’s use of training datasets to
`program its responses, and “Transformer” concerns the model’s underlying algorithm allowing
`it to function.
`5. OpenAI has released five versions of GPT models, and the current version of
`ChatGPT runs on GPT-3.5 and GPT-4, depending on whether the user has subscribed to the
`premium version of ChatGPT. Only the version of ChatGPT that runs on GPT-3.5 is available
`at no cost to the public.
`6. OpenAI’s GPT models are types of “large language model,” which is a form of
`deep-learning algorithm programmed through “training datasets,” consisting of massive
`amounts of text data copied from the internet by OpenAI. The GPT models extract information
`from their training datasets in order to learn the statistical relationships between words, phrases,
`and sentences, which allow them to generate coherent and contextually relevant responses to
`user prompts or queries.
`7. A large language model’s responses to user prompts or queries are entirely and
`uniquely dependent on the text contained in its training dataset, necessarily processing and
`analyzing the information contained in its training dataset to generate responses.
`8. OpenAI incorporated Plaintiffs’ and Class members’ copyrighted works in
`datasets used to train its GPT models powering its ChatGPT product. Indeed, when ChatGPT is
`prompted, it generates not only summaries, but in-depth analyses of the themes present in
`Plaintiffs’ copyrighted works, which is only possible if the underlying GPT model was trained
`using Plaintiffs’ works.
`9. Plaintiffs and Class members did not consent to the use of their copyrighted
`works as training material for GPT models or for use with ChatGPT.
`10. Defendants, by and through their operation of ChatGPT, benefit commercially
`and profit handsomely from their unauthorized and illegal use of Plaintiffs’ and Class members’
`copyrighted works.
`
`JURISDICTION AND VENUE
`11. This Court has subject matter jurisdiction of this action pursuant to 28 U.S.C. § 1331
`because this case arises under the Copyright Act (17 U.S.C. § 501) and the Digital
`Millennium Copyright Act (17 U.S.C. § 1202).
`12. This Court has personal jurisdiction over Defendants pursuant to 18 U.S.C.
`§§ 1965(b) & (d), because they maintain their principal places of business in, and are thus
`residents of, this judicial district, maintain minimum contacts with the United States, this judicial
`district, and this State, and they intentionally avail themselves of the laws of the United States
`and this state by conducting a substantial amount of business in California. For these same
`reasons, venue properly lies in this District pursuant to 28 U.S.C. §§ 1391(a), (b) and (c).
`
`PARTIES
`
`A. Plaintiffs
`13. Plaintiff Michael Chabon (“Plaintiff Chabon”) is a resident of California.
`Plaintiff Chabon is an author who owns registered copyrights in many works, including but not
`limited to, The Mysteries of Pittsburgh, Wonder Boys, The Amazing Adventures of Kavalier &
`Clay, The Yiddish Policemen’s Union, Gentlemen of the Road, Telegraph Avenue, Fight of the
`Century, Kingdom of Olives and Ash, and Moonglow. Plaintiff Chabon is the recipient of the
`Pulitzer Prize for Fiction, Hugo, Nebula, Los Angeles Times Book Prize, and the National
`Jewish Book Award, among many others achieved over a writing career spanning
`more than 30 years. Plaintiff Chabon’s works include copyright-management information that
`provides information about the copyrighted work, including the title of the work, its ISBN or
`copyright registration number, the name of the author, and the year of publication.
`14. Plaintiff David Henry Hwang (“Plaintiff Hwang”) is a resident of New York.
`Plaintiff Hwang is a playwright and screenwriter who owns registered copyrights in many
`works, including but not limited to, M. Butterfly, Chinglish, Yellow Face, The Dance and the
`Railroad, and FOB, as well as the Broadway musical, Flower Drum Song (2002 revival).
`Plaintiff Hwang is a Tony Award winner and three-time nominee, a Grammy Award winner
`who has been twice nominated, a three-time OBIE Award winner, and a three-time finalist for
`the Pulitzer Prize in Drama. Plaintiff Hwang’s works include copyright-management
`information that provides information about the copyrighted work, including the title of the
`work, its ISBN or copyright registration number, the name of the author, and the year of
`publication.
`15. Plaintiff Matthew Klam (“Plaintiff Klam”) is a resident of Washington, D.C.
`Plaintiff Klam is an author who owns registered copyrights in several works, including but not
`limited to, Who is Rich?, and Sam the Cat and Other Stories. Plaintiff Klam is a recipient of a
`Guggenheim Fellowship, a Robert Bingham/PEN Award, a Whiting Writer’s Award, and a
`National Endowment for the Arts award. Plaintiff Klam’s works have been selected as Notable Books
`of the year by The New York Times, The Los Angeles Times, the Kansas City Star, and the
`Washington Post. Plaintiff Klam’s works include copyright-management information that
`provides information about the copyrighted work, including the title of the work, its ISBN or
`copyright registration number, the name of the author, and the year of publication.
`16. Plaintiff Rachel Louise Snyder (“Plaintiff Snyder”) is a resident of Washington,
`D.C. Plaintiff Snyder is an author who owns registered copyrights in many works, including but
`not limited to, Women We Buried, Women We Burned, No Visible Bruises – What We Don’t
`Know About Domestic Violence Can Kill Us, What We’ve Lost is Nothing, and Fugitive Denim:
`A Moving Story of People and Pants in the Borderless World of Global Trade. Plaintiff Snyder
`is a Guggenheim fellow and the recipient of the J. Anthony Lukas Work-in-Progress Award, the
`Hillman Prize, and the Helen Bernstein Book Award, and was a finalist for the National Book
`Critics Circle Award, Los Angeles Times Book Prize, and Kirkus Award. Her work has appeared
`in The New Yorker, The New York Times, Slate, and in many other publications. Plaintiff
`Snyder’s works include copyright-management information that provides information about the
`copyrighted work, including the title of the work, its ISBN or copyright registration number, the
`name of the author, and the year of publication.
`17. Plaintiff Ayelet Waldman (“Plaintiff Waldman”) is a resident of California.
`Plaintiff Waldman is an author and screen and television writer who owns registered copyrights
`in several works, including but not limited to, Love and Other Impossible Pursuits, Red Hook
`Road, Love and Treasure, Bad Mother, Daughter’s Keeper, A Really Good Day, Fight of the
`Century, and Kingdom of Olives and Ash. Plaintiff Waldman has been nominated for an Emmy
`and a Golden Globe and is the recipient of numerous awards including a Peabody, AFI award,
`and a PEN Award, among others. Plaintiff Waldman’s works include copyright-management
`information that provides information about the copyrighted work, including the title of the
`work, its ISBN or copyright registration number, the name of the author, and the year of
`publication.
`18. At all times relevant hereto, Plaintiffs have been and remain the holders of the
`exclusive rights under the Copyright Act of 1976 (17 U.S.C. §§ 101, et seq. and all amendments
`thereto) to reproduce, distribute, display, or license the reproduction, distribution, and/or display
`of the works identified in paragraphs 13-17, supra.
`B. Defendants
`19. Defendant OpenAI, Inc. is a Delaware nonprofit corporation with its principal
`place of business located at 3180 18th St., San Francisco, CA 94110.
`20. Defendant OpenAI, LP is a Delaware limited partnership with its principal place
`of business located at 3180 18th St., San Francisco, CA 94110. OpenAI, LP is a wholly owned
`subsidiary of OpenAI, Inc. that is operated for profit. OpenAI, Inc. controls OpenAI, LP directly
`and through the other OpenAI entities.
`21. Defendant OpenAI OpCo, LLC is a Delaware limited liability company with its
`principal place of business located at 3180 18th Street, San Francisco, CA 94110. OpenAI
`OpCo, LLC is a wholly owned subsidiary of OpenAI, Inc. that is operated for profit. OpenAI,
`Inc. controls OpenAI OpCo, LLC directly and through the other OpenAI entities.
`22. Defendant OpenAI GP, LLC is a Delaware limited liability company with its
`principal place of business located at 3180 18th Street, San Francisco, CA 94110. OpenAI GP,
`LLC is a general partner of OpenAI, LP. OpenAI GP manages and operates the day-to-day
`business and affairs of OpenAI, LP. OpenAI GP was aware of the unlawful conduct alleged
`herein and exercised control over OpenAI, LP throughout the Class Period. OpenAI, Inc. directly
`controls OpenAI GP.
`23. Defendant OpenAI Startup Fund I, LP is a Delaware limited partnership with its
`principal place of business located at 3180 18th Street, San Francisco, CA 94110. OpenAI
`Startup Fund I, LP was instrumental in the foundation of OpenAI, LP, including the creation of
`its business strategy and providing initial funding. OpenAI Startup Fund I was aware of the
`unlawful conduct alleged herein and exercised control over OpenAI, LP throughout the Class
`Period.
`24. Defendant OpenAI Startup Fund GP I, LLC is a Delaware limited liability
`company with its principal place of business located at 3180 18th Street, San Francisco, CA
`94110. OpenAI Startup Fund GP I, LLC is the general partner of OpenAI Startup Fund I.
`OpenAI Startup Fund GP I is a party to the unlawful conduct alleged herein. OpenAI Startup
`Fund GP I manages and operates the day-to-day business and affairs of OpenAI Startup Fund I.
`25. Defendant OpenAI Startup Fund Management, LLC is a Delaware limited
`liability company with its principal place of business located at 3180 18th Street, San Francisco,
`CA 94110. OpenAI Startup Fund Management, LLC is a party to the unlawful conduct herein.
`OpenAI Startup Fund Management was aware of the unlawful conduct alleged herein and
`exercised control over OpenAI, LP throughout the Class Period.
`FACTUAL ALLEGATIONS
`A. OpenAI’s Artificial Intelligence Products
`26. OpenAI researches, develops, releases, and maintains AI products with the
`intention that its products “benefit all of humanity.”1
`27. ChatGPT is among the products OpenAI has developed, engineered, released,
`and maintained, which utilizes another OpenAI product, GPT models, to respond to text prompts
`and queries in a natural, coherent, and fluent way through a web interface.
`28. OpenAI has released a series of upgrades to its GPT model, including GPT-1
`(released June 2018), GPT-2 (February 2019), GPT-3 (May 2020), GPT-3.5 (March 2022), and
`most recently, GPT-4 (March 2023).2
`29. The current version of ChatGPT utilizes both GPT-3.5 and GPT-4; however, the
`version of ChatGPT that allows users to choose between using GPT-3.5 and GPT-4 is only
`available to subscribers at a cost of $20 per month. Otherwise, users are only able to access the
`version of ChatGPT that relies on the GPT-3.5 model.3
`30. OpenAI makes ChatGPT available to software developers through an
`application-programming interface (“API”), which allows developers to write software
`programs that exchange data with ChatGPT.4 OpenAI charges developers for access to ChatGPT
`by the API on the basis of usage.
`
`
`1 About, OpenAI, https://openai.com/about
`2 Fawad Ali, GPT-1 to GPT-4: Each of OpenAI’s GPT Models Explained and Compared,
`Make Use Of (Apr. 11, 2023) https://www.makeuseof.com/gpt-models-explained-and-compared/
`3 Introducing ChatGPT Plus, OpenAI (Feb. 1, 2023) https://openai.com/blog/chatgpt-plus
`
`B. OpenAI Uses Copyrighted Works in its Training Datasets
`31. As mentioned in paragraph 6, supra, OpenAI pre-trains its GPT models using a
`dataset consisting of various sources and content types, including books, plays, articles,
`webpages, and other written works, to respond accurately to users’ prompts and queries.
`32. OpenAI has admitted that, of all sources and content types that can be used to
`train the GPT models, written works, plays and articles are valuable training material because
`they offer the best examples of high-quality, long form writing and “contain[] long stretches of
`contiguous text, which allows the generative model to learn to condition on long-range
`information.”5
`33. Upon information and belief, OpenAI builds the dataset it uses to train its GPT
`models by scraping the internet for text data.
`34. While casting a wide net across the internet to capture the most comprehensive
`set of content available allows OpenAI to better train its GPT models, this practice necessarily
`leads OpenAI to capture, download, and copy copyrighted written works, plays and articles.
`35. Among the content OpenAI has scraped from the internet to construct its training
`datasets are Plaintiffs’ copyrighted works.
`36. In its June 2018 paper introducing the GPT-1 model, Improving Language
`Understanding by Generative Pre-Training, OpenAI revealed that it trained the GPT-1 model
`using two datasets: “Common Crawl,” which is a massive dataset of web pages containing
`billions of words, and “BookCorpus,” which is a collection of “over 7,000 unique unpublished
`books from a variety of genres including Adventure, Fantasy, and Romance.”6
`
`
`4 OpenAI API, OpenAI (June 11, 2020) https://openai.com/blog/openai-api
`5 Alec Radford, Improving Language Understanding by Generative-Pre-Training, OpenAI
`(June 11, 2018).
`6 Id.; see also Fawad Ali, GPT-1 to GPT-4: Each of OpenAI’s GPT Models Explained and
`Compared, Make Use Of (Apr. 11, 2023) https://www.makeuseof.com/gpt-models-explained-and-compared/
`
`37. BookCorpus is a controversial dataset, assembled in 2015 by a team of AI
`researchers funded by Google and Samsung for the sole purpose of training language models
`like GPT by copying written works from a website called Smashwords, which hosts self-
`published novels, making them available to readers at no cost.7 Despite those novels being
`largely under copyright, they were copied into the BookCorpus dataset without consent, credit,
`or compensation to the authors.8
`38. OpenAI also copied many books while training GPT-3. In the July 2020 paper
`introducing GPT-3, Language Models are Few-Shot Learners, OpenAI disclosed that, in addition to
`using the “Common Crawl” and “WebText” datasets that capture web pages, 16% of the GPT-3
`training dataset came from “two internet-based book corpora,” which OpenAI simply refers
`to as “Books1” and “Books2.”9
`39. OpenAI has never revealed what books are part of the Books1 and Books2
`datasets or how they were obtained. OpenAI has offered a few clues, admitting that these are
`internet-based datasets that are much larger than BookCorpus.10 Based on the figures provided
`in its GPT-3 introductory paper, Books1 is nine times larger than BookCorpus, meaning it
`contains roughly 63,000 titles, and Books2 is 42 times larger, meaning it contains about 294,000
`titles.11
`40. A limited number of internet-based book corpora exist that contain this much
`material, meaning there are only a handful of possible sources OpenAI could have used to train
`the GPT-3 model.
`41. Project Gutenberg is an online archive of e-books whose copyrights have expired.
`Project Gutenberg has long been popular for training AI systems due to the lack of copyright. In
`2018, a team of AI researchers created the “Standardized Project Gutenberg Corpus,” which
`contained “more than 50,000 books.”12 On that information and belief, the OpenAI Books1
`dataset is based on either the Standardized Project Gutenberg Corpus or Project Gutenberg itself,
`because of the roughly similar sizes of the two datasets.
`
`
`7 Jack Bandy, Dirty Secrets of BookCorpus, a Key Dataset in Machine Learning, Medium
`(May 12, 2021) https://towardsdatascience.com/dirty-secrets-of-bookcorpus-a-key-dataset-in-machine-learning-6ee2927e8650
`8 Id.
`9 Tom B. Brown, Language Models are Few-Shot Learners, OpenAI (July 22, 2020).
`10 Id. at 9.
`11 Id.
`
`42. As for the Books2 dataset, the only “internet-based books corpora” that have ever
`offered that much material are infamous “shadow library” websites, like Library Genesis
`(“LibGen”), Z-Library, Sci-Hub, and Bibliotik, which host massive collections of pirated books,
`research papers, and other text-based materials.13 The materials aggregated by these websites
`have also been available in bulk through torrent systems.14
`43. These illegal shadow libraries have long been of interest to the AI-training
`community. For instance, an AI training dataset published in December 2020 by EleutherAI
`called “Books3” includes a recreation of the Bibliotik collection and contains nearly 200,000
`books.15 On information and belief, the OpenAI Books2 dataset includes books copied from
`these “shadow libraries,” because those are the sources of trainable books most similar in nature
`and size to OpenAI’s description of Books2.
`44. When OpenAI introduced GPT-4 in March 2023, the introductory paper
`contained no information about the dataset used to train it.16 Instead, OpenAI claims that,
`“[g]iven both the competitive landscape and the safety implications of large-scale models like
`GPT-4, this report contains no further details about . . . dataset construction.”17
`45. Regarding GPT-4, OpenAI has conceded that it did filter its dataset “to
`specifically reduce the quantity of inappropriate erotic text content,” implying that it again used
`a large dataset containing text works.18
`
`12 Martin Gerlach, et al., A standardized Project Gutenberg corpus for statistical analysis of
`natural language and quantitative linguistics, Cornell University (Dec. 19, 2018),
`https://arxiv.org/pdf/1812.08092.pdf
`13 See Claire Woodcock, ‘Shadow Libraries’ Are Moving Their Pirated Books to The Dark
`Web After Fed Crackdowns, Vice (Nov. 30, 2022).
`14 Id.
`15 See Alex Perry, A giant online book collection Meta used to train its AI is gone over
`copyright issues, Mashable (Aug. 18, 2023).
`16 GPT-4 Technical Report, OpenAI (Mar. 27, 2023).
`17 Id. at 2.
`18 Id. at 61.
`
`C. OpenAI Unlawfully Infringed Plaintiffs’ Copyrights
`46. As explained, ChatGPT’s responses to user queries or prompts, like other large
`language models, rely on the data upon which it is trained to generate responsive content. For
`example, if ChatGPT is prompted to generate a writing in the style of a certain author, GPT
`would generate content based on patterns and connections it learned from analysis of that
`author’s work within its training dataset.
`47. On information and belief, the reason ChatGPT can generate a writing in the style
`of a certain author or accurately summarize a certain copyrighted book and provide in-depth
`analysis of that book is that the book was copied by OpenAI and then copied and analyzed by the
`underlying GPT model as part of its training data.
`48. When ChatGPT is prompted to summarize copyrighted written works authored
`by Plaintiffs, it generates accurate, in-depth summaries and analyses of their works.
`49. For example, when prompted, ChatGPT accurately summarized Plaintiff
`Chabon’s novel The Amazing Adventures of Kavalier & Clay. When prompted to identify
`examples of trauma in The Amazing Adventures of Kavalier & Clay, ChatGPT identified six
`specific examples, including how the main character’s “experiences in Europe, including
`witnessing the persecution of Jews and the loss of his family, haunt him throughout the story.”
`When asked to write a paragraph in the style of The Amazing Adventures of Kavalier & Clay,
`ChatGPT generated a passage imitating Plaintiff Chabon’s writing style including references to
`the characters dealing with “the weight of the world at war.” Exhibit A.
`50. ChatGPT similarly provided in-depth summaries and analyses of Plaintiff
`Hwang’s play, The Dance and the Railroad. For example, when prompted, ChatGPT identified
`five key themes from The Dance and the Railroad, including “art and creativity as a form of
`resistance” and “using art as a form of escape from the harsh realities and dehumanization of
`labor.” Additionally, when prompted to produce a screenplay in the style of The Dance and the
`Railroad, ChatGPT produced a script in Plaintiff Hwang’s style involving a Chinese laborer
`toiling on the Central Pacific Railroad who “believe[s]
`in the power of art to keep [their] spirits alive.” Exhibit B.
`51. Likewise, ChatGPT provided in-depth summaries and analyses of Plaintiff
`Klam’s works. For example, when prompted, ChatGPT accurately summarized Plaintiff Klam’s
`novel Who is Rich? and correctly analyzed the key relationships between the novel’s central
`character and the other characters in the novel. When asked to identify the main themes in Who
`is Rich?, ChatGPT accurately identified seven main themes of the novel including “mid-life
`crisis and identity.” Further, when prompted to write a paragraph in the style of Who is Rich?,
`ChatGPT generated random passages authentically written in Plaintiff Klam’s writing style,
`including a reference to navigating the “treacherous waters of midlife.” Exhibit C.
`52. In the same vein, after being prompted to summarize Plaintiff Snyder’s book,
`What We’ve Lost is Nothing, ChatGPT accurately identified themes included within the novel,
`such as “safety, perception, and the fragility of human relationships.” Similarly, once prompted,
`ChatGPT accurately analyzed the theme of safety using a specific example from the text of
`Plaintiff Snyder’s copyrighted work, explaining that “the theme of safety is examined through
`the lens of a series of burglaries that occur in a suburban neighborhood . . . and how these
`incidents affect the characters and their perceptions of the world around them.” ChatGPT was
`also able to generate random passages authentically written in Plaintiff Snyder’s writing style
`when prompted. Exhibit D.
`53. Additionally, ChatGPT provided in-depth summaries and analyses of Plaintiff
`Waldman’s works. For instance, when prompted to summarize Plaintiff Waldman’s novel Love
`and Other Impossible Pursuits, ChatGPT accurately provided a summary and analysis of the
`novel. When prompted to identify specific instances of grief in Love and Other Impossible
`Pursuits, ChatGPT identified five specific instances of grief, including the protagonist Emelia’s
`loss of her infant daughter, a “loss that occurred before the events of the novel and [that] continue
`to haunt Emelia, affecting her emotional state and relationships.” When prompted to write a
`paragraph in the style of Love and Other Impossible Pursuits, ChatGPT generated a paragraph
`imitating Plaintiff Waldman’s writing style, including references to the “weight of her
`daughter’s absence.” Exhibit E.
`54. At no point did ChatGPT reproduce any of the copyright management
`information Plaintiffs included with their published works.
`55. Furthermore, at no point did Plaintiffs authorize OpenAI to download and copy
`their protected works, as described above.
`CLASS ALLEGATIONS
`56. Plaintiffs bring this action pursuant to the provisions of Rules 23(a), 23(b)(2),
`and 23(b)(3) of the Federal Rules of Civil Procedure, on behalf of themselves and the following
`proposed Class:
`
`All persons or entities in the United States that own a United States copyright in
`any written work that OpenAI used to train any GPT model during the Class
`Period.
`
`57. Excluded from the Class are Defendants, their employees, officers, directors, legal
`representatives, heirs, successors, and wholly or partly owned subsidiaries and affiliates;
`proposed Class counsel and their employees; the judicial officers and associated court staff
`assigned to this case and their immediate family members; all persons who make a timely
`election to be excluded from the Class; governmental entities; and the judge to whom this case
`is assigned and his/her immediate family.
`58. This action has been brought and may be properly maintained on behalf of the
`Class proposed herein under Federal Rule of Civil Procedure 23.
`59. Numerosity. Federal Rule of Civil Procedure 23(a)(1): The members of the Class
`are so numerous and geographically dispersed that individual joinder of all Class members is
`impracticable. On information and belief, there are at least tens of thousands of members in the
`Class. The Class members may be easily derived from Defendants’ records.
`60. Commonality and Predominance. Federal Rule of Civil Procedure 23(a)(2) and
`23(b)(3): This action involves common questions of law and fact, which predominate over any
`questions affecting individual Class members, including, without limitation:
`a. Whether Defendants engaged in the conduct alleged herein;
`b. Whether Defendants violated the copyrights of Plaintiffs and the Class when they
`downloaded and copied Plaintiffs’ and the Class’s copyrighted books;
`c. Whether ChatGPT itself is an infringing derivative work based on Plaintiffs’ and
`the Class’s copyrighted books;
`d. Whether the text responses of ChatGPT are infringing derivative works based on
`Plaintiffs’ and the Class’s copyrighted books;
`e. Whether Defendants violated the DMCA by removing copyright-management
`information from Plaintiffs’ and the Class’s copyrighted books;
`f. Whether Defendants were unjustly enriched by the unlawful conduct alleged
`herein;
`g. Whether Defendants’ conduct violates the California Unfair Competition Law;
`h. Whether Plaintiffs and the other Class members are entitled to equitable relief,
`including, but not limited to, restitution or injunctive relief; and
`i. Whether Plaintiffs and the other Class members are entitled to damages and other
`monetary relief and, if so, in what amount.
`61. Typicality. Federal Rule of Civil Procedure 23(a)(3): Plaintiffs’ claims are
`typical of the other Class members’ claims because, among other things, all Class members were
`comparably injured through Defendants’ wrongful conduct as described above.
`62. Adequacy. Federal Rule of Civil Procedure 23(a)(4): Plaintiffs are adequate
`Class representatives because their interests do not conflict with the interests of the other
`members of the Class they seek to represent; Plaintiffs have retained counsel competent and
`experienced in complex class action litigation; and Plaintiffs intend to prosecute this action
`vigorously. The interests of the Class will be fairly and adequately protected by Plaintiffs and
`their counsel.
`63. Declaratory and Injunctive Relief. Federal Rule of Civil Procedure 23(b)(2):
`Defendants have acted or refused to act on grounds generally applicable to Plaintiffs and the
`other members of the Class, thereby making appropriate final injunctive relief and declaratory
`relief with respect to the Class as a whole.
`64. Superiority. Federal Rule of Civil Procedure 23(b)(3): A class action is superior
`to any other available means for the fair and efficient adjudication of this controversy, and no
`unusual difficulties are likely to be encountered in the management of this class action. The
`damages or other financial detriment suffered by Plaintiffs and the other Class members are
`relatively small compared to the burden and expense that would be required to individually