
`Case 3:23-cv-04625 Document 1 Filed 09/08/23 Page 1 of 24
`
`
`
`
`DANIEL J. MULLER, SBN 193396
`dmuller@venturahersey.com
`VENTURA HERSEY & MULLER, LLP
`1506 Hamilton Avenue
`San Jose, California 95125
`Telephone: (408) 512-3022
`Facsimile: (408) 512-3023
`
`Attorneys for Plaintiff and the Class
`
`
`UNITED STATES DISTRICT COURT
`NORTHERN DISTRICT OF CALIFORNIA – SAN FRANCISCO DIVISION
`
`
`Case No.
`
`CLASS ACTION COMPLAINT
`
`CLASS ACTION
`
`
`
`
`JURY TRIAL DEMANDED
`
`MICHAEL CHABON, DAVID HENRY
`HWANG, MATTHEW KLAM, RACHEL
`LOUISE SNYDER, AND AYELET
`WALDMAN,
`
`individually and on behalf of all others
`similarly situated,
`
`
`
`Plaintiffs,
`
`
`v.
`
`OPENAI, INC., OPENAI, L.P., OPENAI
`OPCO, LLC, OPENAI GP LLC, OPENAI
`STARTUP FUND GP I, LLC, OPENAI
`STARTUP FUND I, LP, and OPENAI
`STARTUP FUND MANAGEMENT, LLC,
`
`
`
`Defendants.
`
`
`
`
`
`Plaintiffs Michael Chabon, David Henry Hwang, Matthew Klam, Rachel Louise Snyder,
`and Ayelet Waldman (“Plaintiffs”), individually and on behalf of all others similarly situated,
`bring this action against Defendants OpenAI, Inc., OpenAI, LP, OpenAI OpCo, LLC, OpenAI
`GP LLC, OpenAI Startup Fund I, LP, OpenAI Startup Fund GP I, LLC, and OpenAI Startup
`Fund Management, LLC (collectively, “Defendants” or “OpenAI”). Plaintiffs allege as follows
`based upon personal knowledge as to themselves and their own acts, and upon information and
`belief as to all other matters:
`
`NATURE OF ACTION
`1. This is a class action lawsuit brought by Plaintiffs on behalf of themselves and a
`Class of authors holding copyrights in their published works arising from OpenAI’s clear
`infringement of their intellectual property.
`2. OpenAI is a research company specializing in the development of artificial
`intelligence (“AI”) products, such as ChatGPT.
`3. ChatGPT is an AI chatbot, which produces responses to users’ text queries or
`prompts in a way that mimics human conversation.
`4. ChatGPT relies on other OpenAI products to function, namely Generative Pre-trained
`Transformer (“GPT”) models. “Generative,” in GPT, represents the model’s ability to
`respond to text inquiries, while “Pre-trained” refers to the model’s use of training datasets to
`program its responses, and “Transformer” concerns the model’s underlying algorithm allowing
`it to function.
`5. OpenAI has released five versions of GPT models, and the current version of
`ChatGPT runs on GPT-3.5 and GPT-4, depending on whether the user has subscribed to the
`premium version of ChatGPT. Only the version of ChatGPT that runs on GPT-3.5 is available
`at no cost to the public.
`6. OpenAI’s GPT models are types of “large language model,” which is a form of
`deep-learning algorithm programmed through “training datasets,” consisting of massive
`amounts of text data copied from the internet by OpenAI. The GPT models extract information
`from their training datasets in order to learn the statistical relationships between words, phrases,
`and sentences, which allow them to generate coherent and contextually relevant responses to
`user prompts or queries.
`7. A large language model’s responses to user prompts or queries are entirely and
`uniquely dependent on the text contained in its training dataset, necessarily processing and
`analyzing the information contained in its training dataset to generate responses.
`8. OpenAI incorporated Plaintiffs’ and Class members’ copyrighted works in
`datasets used to train its GPT models powering its ChatGPT product. Indeed, when ChatGPT is
`prompted, it generates not only summaries, but in-depth analyses of the themes present in
`Plaintiffs’ copyrighted works, which is only possible if the underlying GPT model was trained
`using Plaintiffs’ works.
`9. Plaintiffs and Class members did not consent to the use of their copyrighted
`works as training material for GPT models or for use with ChatGPT.
`10. Defendants, by and through their operation of ChatGPT, benefit commercially
`and profit handsomely from their unauthorized and illegal use of Plaintiffs’ and Class members’
`copyrighted works.
`
`JURISDICTION AND VENUE
`11. This Court has subject matter jurisdiction of this action pursuant to 28 U.S.C. §
`1331 because this case arises under the Copyright Act (17 U.S.C. § 501) and the Digital
`Millennium Copyright Act (17 U.S.C. § 1202).
`12. This Court has personal jurisdiction over Defendants pursuant to 18 U.S.C.
`§§ 1965(b) & (d), because they maintain their principal places of business in, and are thus
`residents of, this judicial district, maintain minimum contacts with the United States, this judicial
`district, and this State, and they intentionally avail themselves of the laws of the United States
`and this state by conducting a substantial amount of business in California. For these same
`reasons, venue properly lies in this District pursuant to 28 U.S.C. §§ 1391(a), (b) and (c).
`
`
`PARTIES
`
`A. Plaintiffs
`13. Plaintiff Michael Chabon (“Plaintiff Chabon”) is a resident of California.
`Plaintiff Chabon is an author who owns registered copyrights in many works, including but not
`limited to, The Mysteries of Pittsburgh, Wonder Boys, The Amazing Adventures of Kavalier &
`Clay, The Yiddish Policemen’s Union, Gentlemen of the Road, Telegraph Avenue, Fight of the
`Century, Kingdom of Olives and Ash, and Moonglow. Plaintiff Chabon is the recipient of the
`Pulitzer Prize for Fiction, Hugo, Nebula, Los Angeles Times Book Prize, and the National
`Jewish Book Award, among many others earned over a writing career spanning
`more than 30 years. Plaintiff Chabon’s works include copyright-management information that
`provides information about the copyrighted work, including the title of the work, its ISBN or
`copyright registration number, the name of the author, and the year of publication.
`14. Plaintiff David Henry Hwang (“Plaintiff Hwang”) is a resident of New York.
`Plaintiff Hwang is a playwright and screenwriter who owns registered copyrights in many
`works, including but not limited to, M. Butterfly, Chinglish, Yellow Face, The Dance and the
`Railroad, and FOB, as well as the Broadway musical, Flower Drum Song (2002 revival).
`Plaintiff Hwang is a Tony Award winner and three-time nominee, a Grammy Award winner
`who has been twice nominated, a three-time OBIE Award winner, and a three-time finalist for
`the Pulitzer Prize in Drama. Plaintiff Hwang’s works include copyright-management
`information that provides information about the copyrighted work, including the title of the
`work, its ISBN or copyright registration number, the name of the author, and the year of
`publication.
`15. Plaintiff Matthew Klam (“Plaintiff Klam”) is a resident of Washington, D.C.
`Plaintiff Klam is an author who owns registered copyrights in several works, including but not
`limited to, Who is Rich?, and Sam the Cat and Other Stories. Plaintiff Klam is a recipient of a
`Guggenheim Fellowship, a Robert Bingham/PEN Award, a Whiting Writer’s Award, and a
`National Endowment for the Arts award. Plaintiff Klam’s works have been selected as Notable Books
`of the year by The New York Times, The Los Angeles Times, the Kansas City Star, and the
`Washington Post. Plaintiff Klam’s works include copyright-management information that
`provides information about the copyrighted work, including the title of the work, its ISBN or
`copyright registration number, the name of the author, and the year of publication.
`16. Plaintiff Rachel Louise Snyder (“Plaintiff Snyder”) is a resident of Washington,
`D.C. Plaintiff Snyder is an author who owns registered copyrights in many works, including but
`not limited to, Women We Buried, Women We Burned, No Visible Bruises – What We Don’t
`Know About Domestic Violence Can Kill Us, What We’ve Lost is Nothing, and Fugitive Denim:
`A Moving Story of People and Pants in the Borderless World of Global Trade. Plaintiff Snyder
`is a Guggenheim fellow and the recipient of the J. Anthony Lukas Work-in-Progress Award, the
`Hillman Prize, and the Helen Bernstein Book Award, and was a finalist for the National Book
`Critics Circle Award, Los Angeles Times Book Prize, and Kirkus Award. Her work has appeared
`in The New Yorker, The New York Times, Slate, and in many other publications. Plaintiff
`Snyder’s works include copyright-management information that provides information about the
`copyrighted work, including the title of the work, its ISBN or copyright registration number, the
`name of the author, and the year of publication.
`17. Plaintiff Ayelet Waldman (“Plaintiff Waldman”) is a resident of California.
`Plaintiff Waldman is an author and screen and television writer who owns registered copyrights
`in several works, including but not limited to, Love and Other Impossible Pursuits, Red Hook
`Road, Love and Treasure, Bad Mother, Daughter’s Keeper, A Really Good Day, Fight of the
`Century, and Kingdom of Olives and Ash. Plaintiff Waldman has been nominated for an Emmy
`and a Golden Globe and is the recipient of numerous awards including a Peabody, AFI award,
`and a PEN Award, among others. Plaintiff Waldman’s works include copyright-management
`information that provides information about the copyrighted work, including the title of the
`work, its ISBN or copyright registration number, the name of the author, and the year of
`publication.
`18. At all times relevant hereto, Plaintiffs have been and remain the holders of the
`exclusive rights under the Copyright Act of 1976 (17 U.S.C. §§ 101, et seq. and all amendments
`thereto) to reproduce, distribute, display, or license the reproduction, distribution, and/or display of
`the works identified in paragraphs 13-17, supra.
`B. Defendants
`19. Defendant OpenAI, Inc. is a Delaware nonprofit corporation with its principal
`place of business located at 3180 18th St., San Francisco, CA 94110.
`20. Defendant OpenAI, LP is a Delaware limited partnership with its principal place
`of business located at 3180 18th St., San Francisco, CA 94110. OpenAI, LP is a wholly owned
`subsidiary of OpenAI, Inc. that is operated for profit. OpenAI, Inc. controls OpenAI, LP directly
`and through the other OpenAI entities.
`21. Defendant OpenAI OpCo, LLC is a Delaware limited liability company with its
`principal place of business located at 3180 18th Street, San Francisco, CA 94110. OpenAI
`OpCo, LLC is a wholly owned subsidiary of OpenAI, Inc. that is operated for profit. OpenAI,
`Inc. controls OpenAI OpCo, LLC directly and through the other OpenAI entities.
`22. Defendant OpenAI GP, LLC is a Delaware limited liability company with its
`principal place of business located at 3180 18th Street, San Francisco, CA 94110. OpenAI GP,
`LLC is a general partner of OpenAI, LP. OpenAI GP manages and operates the day-to-day
`business and affairs of OpenAI, LP. OpenAI GP was aware of the unlawful conduct alleged
`herein and exercised control over OpenAI, LP throughout the Class Period. OpenAI, Inc. directly
`controls OpenAI GP.
`23. Defendant OpenAI Startup Fund I, LP is a Delaware limited partnership with its
`principal place of business located at 3180 18th Street, San Francisco, CA 94110. OpenAI
`Startup Fund I, LP was instrumental in the foundation of OpenAI, LP, including the creation of
`its business strategy and providing initial funding. OpenAI Startup Fund I was aware of the
`unlawful conduct alleged herein and exercised control over OpenAI, LP throughout the Class
`Period.
`24. Defendant OpenAI Startup Fund GP I, LLC is a Delaware limited liability
`company with its principal place of business located at 3180 18th Street, San Francisco, CA
`94110. OpenAI Startup Fund GP I, LLC is the general partner of OpenAI Startup Fund I.
`OpenAI Startup Fund GP I is a party to the unlawful conduct alleged herein. OpenAI Startup
`Fund GP I manages and operates the day-to-day business and affairs of OpenAI Startup Fund I.
`25. Defendant OpenAI Startup Fund Management, LLC is a Delaware limited
`liability company with its principal place of business located at 3180 18th Street, San Francisco,
`CA 94110. OpenAI Startup Fund Management, LLC is a party to the unlawful conduct alleged herein.
`OpenAI Startup Fund Management was aware of the unlawful conduct alleged herein and
`exercised control over OpenAI, LP throughout the Class Period.
`
`FACTUAL ALLEGATIONS
`A. OpenAI’s Artificial Intelligence Products
`26. OpenAI researches, develops, releases, and maintains AI products with the
`intention that its products “benefit all of humanity.”1
`27. ChatGPT is among the products OpenAI has developed, engineered, released,
`and maintained, which utilizes another OpenAI product, GPT models, to respond to text prompts
`and queries in a natural, coherent, and fluent way through a web interface.
`28. OpenAI has released a series of upgrades to its GPT model, including GPT-1
`(released June 2018), GPT-2 (February 2019), GPT-3 (May 2020), GPT-3.5 (March 2022), and
`most recently, GPT-4 (March 2023)2.
`29. The current version of ChatGPT utilizes both GPT-3.5 and GPT-4; however, the
`version of ChatGPT that allows users to choose between using GPT-3.5 and GPT-4 is only
`available to subscribers at a cost of $20 per month. Otherwise, users are only able to access the
`version of ChatGPT that relies on the GPT-3.5 model.3
`30. OpenAI makes ChatGPT available to software developers through an
`application-programming interface (“API”), which allows developers to write software
`programs that exchange data with ChatGPT.4 OpenAI charges developers for access to ChatGPT
`by the API on the basis of usage.
`
`1 About, OpenAI, https://openai.com/about
`2 Fawad Ali, GPT-1 to GPT-4: Each of OpenAI’s GPT Models Explained and Compared,
`Make Use Of (Apr. 11, 2023) https://www.makeuseof.com/gpt-models-explained-and-compared/
`3 Introducing ChatGPT Plus, OpenAI (Feb. 1, 2023) https://openai.com/blog/chatgpt-plus
`
`B. OpenAI Uses Copyrighted Works in its Training Datasets
`31. As mentioned in paragraph 6, supra, OpenAI pre-trains its GPT models using a
`dataset consisting of various sources and content types, including books, plays, articles,
`webpages, and other written works, to respond accurately to users’ prompts and queries.
`32. OpenAI has admitted that, of all sources and content types that can be used to
`train the GPT models, written works, plays and articles are valuable training material because
`they offer the best examples of high-quality, long form writing and “contain[] long stretches of
`contiguous text, which allows the generative model to learn to condition on long-range
`information.”5
`33. Upon information and belief, OpenAI builds the dataset it uses to train its GPT
`models by scraping the internet for text data.
`34. While casting a wide net across the internet to capture the most comprehensive
`set of content available allows OpenAI to better train its GPT models, this practice necessarily
`leads OpenAI to capture, download, and copy copyrighted written works, plays and articles.
`35. Among the content OpenAI has scraped from the internet to construct its training
`datasets are Plaintiffs’ copyrighted works.
`36. In its June 2018 paper introducing the GPT-1 model, Improving Language
`Understanding by Generative Pre-Training, OpenAI revealed that it trained the GPT-1 model
`using two datasets: “Common Crawl,” which is a massive dataset of web pages containing
`billions of words, and “BookCorpus,” which is a collection of “over 7,000 unique unpublished
`books from a variety of genres including Adventure, Fantasy, and Romance.”6
`
`
`4 OpenAI API, OpenAI (June 11, 2020) https://openai.com/blog/openai-api
`5 Alec Radford, Improving Language Understanding by Generative Pre-Training, OpenAI
`(June 11, 2018).
`6 Id.; see also Fawad Ali, GPT-1 to GPT-4: Each of OpenAI’s GPT Models Explained and
`Compared, Make Use Of (Apr. 11, 2023) https://www.makeuseof.com/gpt-models-explained-and-compared/
`37. BookCorpus is a controversial dataset, assembled in 2015 by a team of AI
`researchers funded by Google and Samsung for the sole purpose of training language models
`like GPT by copying written works from a website called Smashwords, which hosts self-
`published novels, making them available to readers at no cost.7 Despite those novels being
`largely under copyright, they were copied into the BookCorpus dataset without consent, credit,
`or compensation to the authors.8
`38. OpenAI also copied many books while training GPT-3. In the July 2020 paper
`introducing GPT-3, Language Models are Few-Shot Learners, OpenAI disclosed that, in addition
`to using the “Common Crawl” and “WebText” datasets that capture web pages, 16% of the
`GPT-3 training dataset came from “two internet-based book corpora,” which OpenAI simply refers
`to as “Books1” and “Books2.”9
`39. OpenAI has never revealed what books are part of the Books1 and Books2
`datasets or how they were obtained. OpenAI has offered a few clues, admitting that these are
`internet-based datasets that are much larger than BookCorpus.10 Based on the figures provided
`in its GPT-3 introductory paper, Books1 is nine times larger than BookCorpus, which holds roughly
`7,000 titles, meaning Books1 contains roughly 63,000 titles (9 × 7,000), and Books2 is 42 times
`larger, meaning it contains about 294,000 titles (42 × 7,000).11
`40. A limited number of internet-based book corpora exist that contain this much
`material, meaning there are only a handful of possible sources OpenAI could have used to train
`the GPT-3 model.
`41. Project Gutenberg is an online archive of e-books whose copyrights have expired.
`Project Gutenberg has long been popular for training AI systems due to the lack of copyright. In
`2018, a team of AI researchers created the “Standardized Project Gutenberg Corpus,” which
`contained “more than 50,000 books.”12 On information and belief, the OpenAI Books1
`dataset is based on either the Standardized Project Gutenberg Corpus or Project Gutenberg itself,
`because of the roughly similar sizes of the two datasets.
`
`7 Jack Bandy, Dirty Secrets of BookCorpus, a Key Dataset in Machine Learning, Medium
`(May 12, 2021) https://towardsdatascience.com/dirty-secrets-of-bookcorpus-a-key-dataset-in-machine-learning-6ee2927e8650
`8 Id.
`9 Tom B. Brown, Language Models are Few-Shot Learners, OpenAI (July 22, 2020).
`10 Id. at 9.
`11 Id.
`
`42. As for the Books2 dataset, the only “internet-based books corpora” that have ever
`offered that much material are infamous “shadow library” websites, like Library Genesis
`(“LibGen”), Z-Library, Sci-Hub, and Bibliotik, which host massive collections of pirated books,
`research papers, and other text-based materials.13 The materials aggregated by these websites
`have also been available in bulk through torrent systems.14
`43. These illegal shadow libraries have long been of interest to the AI-training
`community. For instance, an AI training dataset published in December 2020 by EleutherAI
`called “Books3” includes a recreation of the Bibliotik collection and contains nearly 200,000
`books.15 On information and belief, the OpenAI Books2 dataset includes books copied from
`these “shadow libraries,” because those are the sources of trainable books most similar in nature
`and size to OpenAI’s description of Books2.
`44. When OpenAI introduced GPT-4 in March 2023, the introductory paper
`contained no information about the dataset used to train it.16 Instead, OpenAI claims that,
`“[g]iven both the competitive landscape and the safety implications of large-scale models like
`GPT-4, this report contains no further details about . . . dataset construction.”17
`45. Regarding GPT-4, OpenAI has conceded that it did filter its dataset “to
`specifically reduce the quantity of inappropriate erotic text content,” implying that it again used
`a large dataset containing text works.18
`
`12 Martin Gerlach, et al., A standardized Project Gutenberg corpus for statistical analysis of
`natural language and quantitative linguistics, Cornell University (Dec. 19, 2018),
`https://arxiv.org/pdf/1812.08092.pdf
`13 See Claire Woodcock, ‘Shadow Libraries’ Are Moving Their Pirated Books to The Dark
`Web After Fed Crackdowns, Vice (Nov. 30, 2022).
`14 Id.
`15 See Alex Perry, A giant online book collection Meta used to train its AI is gone over
`copyright issues, Mashable (Aug. 18, 2023).
`16 GPT-4 Technical Report, OpenAI (Mar. 27, 2023).
`17 Id. at 2.
`18 Id. at 61.
`
`C. OpenAI Unlawfully Infringed Plaintiffs’ Copyrights
`46. As explained, ChatGPT’s responses to user queries or prompts, like other large
`language models, rely on the data upon which it is trained to generate responsive content. For
`example, if ChatGPT is prompted to generate a writing in the style of a certain author, GPT
`would generate content based on patterns and connections it learned from analysis of that
`author’s work within its training dataset.
`47. On information and belief, the reason ChatGPT can generate a writing in the style
`of a certain author or accurately summarize a certain copyrighted book and provide in-depth
`analysis of that book is that the work was copied by OpenAI and copied and analyzed by the
`underlying GPT model as part of its training data.
`48. When ChatGPT is prompted to summarize copyrighted written works authored
`by Plaintiffs, it generates accurate, in-depth summaries and analyses of their works.
`49. For example, when prompted, ChatGPT accurately summarized Plaintiff
`Chabon’s novel The Amazing Adventures of Kavalier & Clay. When prompted to identify
`examples of trauma in The Amazing Adventures of Kavalier & Clay, ChatGPT identified six
`specific examples, including how the main character’s “experiences in Europe, including
`witnessing the persecution of Jews and the loss of his family, haunt him throughout the story.”
`When asked to write a paragraph in the style of The Amazing Adventures of Kavalier & Clay,
`ChatGPT generated a passage imitating Plaintiff Chabon’s writing style including references to
`the characters dealing with “the weight of the world at war.” Exhibit A.
`50. ChatGPT similarly provided in-depth summaries and analyses of Plaintiff
`Hwang’s play, The Dance and the Railroad. For example, when prompted, ChatGPT identified
`five key themes from The Dance and the Railroad, including “art and creativity as a form of
`resistance” and “using art as a form of escape from the harsh realities and dehumanization of
`labor.” Additionally, when prompted to produce a screenplay in the style of The Dance and the
`Railroad, ChatGPT produced a script written in Plaintiff Hwang’s style, involving a Chinese
`laborer toiling on the Central Pacific Railroad who “believe[s]
`in the power of art to keep [their] spirits alive.” Exhibit B.
`51. Likewise, ChatGPT provided in-depth summaries and analyses of Plaintiff
`Klam’s works. For example, when prompted, ChatGPT accurately summarized Plaintiff Klam’s
`novel Who is Rich? and correctly analyzed the key relationships between the novel’s central
`character and the other characters in the novel. When asked to identify the main themes in Who
`is Rich?, ChatGPT accurately identified seven main themes of the novel including “mid-life
`crisis and identity.” Further, when prompted to write a paragraph in the style of Who is Rich?,
`ChatGPT generated random passages authentically written in Plaintiff Klam’s writing style,
`including a reference to navigating the “treacherous waters of midlife.” Exhibit C.
`52. In the same vein, after being prompted to summarize Plaintiff Snyder’s book,
`What We’ve Lost is Nothing, ChatGPT accurately identified themes included within the novel,
`such as “safety, perception, and the fragility of human relationships.” Similarly, once prompted,
`ChatGPT accurately analyzed the theme of safety using a specific example from the text of
`Plaintiff Snyder’s copyrighted work, explaining that “the theme of safety is examined through
`the lens of a series of burglaries that occur in a suburban neighborhood . . . and how these
`incidents affect the characters and their perceptions of the world around them.” ChatGPT was
`also able to generate random passages authentically written in Plaintiff Snyder’s writing style
`when prompted. Exhibit D.
`53. Additionally, ChatGPT provided in-depth summaries and analyses of Plaintiff
`Waldman’s works. For instance, when prompted to summarize Plaintiff Waldman’s novel Love
`and Other Impossible Pursuits, ChatGPT accurately provided a summary and analysis of the
`novel. When prompted to identify specific instances of grief in Love and Other Impossible
`Pursuits, ChatGPT identified five specific instances of grief, including the protagonist Emelia’s
`loss of her infant daughter, a “loss that occurred before the events of the novel and [that] continue
`to haunt Emelia, affecting her emotional state and relationships.” When prompted to write a
`paragraph in the style of Love and Other Impossible Pursuits, ChatGPT generated a paragraph
`imitating Plaintiff Waldman’s writing style, including references to the “weight of her
`daughter’s absence.” Exhibit E.
`54. At no point did ChatGPT reproduce any of the copyright management
`information Plaintiffs included with their published works.
`55. Furthermore, at no point did Plaintiffs authorize OpenAI to download and copy
`their protected works, as described above.
`CLASS ALLEGATIONS
`56. Plaintiffs bring this action pursuant to the provisions of Rules 23(a), 23(b)(2),
`and 23(b)(3) of the Federal Rules of Civil Procedure, on behalf of themselves and the following
`proposed Class:
`
`All persons or entities in the United States that own a United States copyright in
`any written work that OpenAI used to train any GPT model during the Class
`Period.
`
`57. Excluded from the Class are Defendants, their employees, officers, directors, legal
`representatives, heirs, successors, and wholly- or partly-owned subsidiaries and affiliates;
`proposed Class counsel and their employees; the judicial officers and associated court staff
`assigned to this case and their immediate family members; all persons who make a timely
`election to be excluded from the Class; governmental entities; and the judge to whom this case
`is assigned and his/her immediate family.
`58. This action has been brought and may be properly maintained on behalf of the
`Class proposed herein under Federal Rule of Civil Procedure 23.
`59. Numerosity. Federal Rule of Civil Procedure 23(a)(1): The members of the Class
`are so numerous and geographically dispersed that individual joinder of all Class members is
`impracticable. On information and belief, there are at least tens of thousands of members in the
`Class. The Class members may be easily derived from Defendants’ records.
`60. Commonality and Predominance. Federal Rule of Civil Procedure 23(a)(2) and
`23(b)(3): This action involves common questions of law and fact, which predominate over any
`questions affecting individual Class members, including, without limitation:
`a. Whether Defendants engaged in the conduct alleged herein;
`b. Whether Defendants violated the copyrights of Plaintiffs and the Class when they
`downloaded and copied Plaintiffs’ and the Class’s copyrighted books;
`c. Whether ChatGPT itself is an infringing derivative work based on Plaintiffs’ and
`the Class’s copyrighted books;
`d. Whether the text responses of ChatGPT are infringing derivative works based on
`Plaintiffs’ and the Class’s copyrighted books;
`e. Whether Defendants violated the DMCA by removing copyright-management
`information from Plaintiffs’ and the Class’s copyrighted books;
`f. Whether Defendants were unjustly enriched by the unlawful conduct alleged
`herein;
`g. Whether Defendants’ conduct violates the California Unfair Competition Law;
`h. Whether Plaintiffs and the other Class members are entitled to equitable relief,
`including, but not limited to, restitution or injunctive relief; and
`i. Whether Plaintiffs and the other Class members are entitled to damages and other
`monetary relief and, if so, in what amount.
`61. Typicality. Federal Rule of Civil Procedure 23(a)(3): Plaintiffs’ claims are
`typical of the other Class members’ claims because, among other things, all Class members were
`comparably injured through Defendants’ wrongful conduct as described above.
`62. Adequacy. Federal Rule of Civil Procedure 23(a)(4): Plaintiffs are adequate
`Class representatives because their interests do not conflict with the interests of the other
`members of the Class they seek to represent; Plaintiffs have retained counsel competent and
`experienced in complex class action litigation; and Plaintiffs intend to prosecute this action
`vigorously. The interests of the Class will be fairly and adequately protected by Plaintiffs and
`their counsel.
`63. Declaratory and Injunctive Relief. Federal Rule of Civil Procedure 23(b)(2):
`Defendants have acted or refused to act on grounds generally applicable to Plaintiffs and the
`other members of the Class, thereby making appropriate final injunctive relief and declaratory
`relief with respect to the Class as a whole.
`64. Superiority. Federal Rule of Civil Procedure 23(b)(3): A class action is superior
`to any other available means for the fair and efficient adjudication of this controversy, and no
`unusual difficulties are likely to be encountered in the management of this class action. The
`damages or other financial detriment suffered by Plaintiffs and the other Class members are
`relatively small compared to the burden and expense that would be required to individually
