Zhang et al v. Google LLC et al, 3:24-cv-02531, No. 1 (N.D.Cal. Apr. 26, 2024)

Case 3:24-cv-02531 Document 1 Filed 04/26/24 Page 1 of 16

Joseph R. Saveri (State Bar No. 130064)
JOSEPH SAVERI LAW FIRM, LLP
601 California Street, Suite 1505
San Francisco, CA 94108
Telephone: (415) 500-6800
Facsimile: (415) 395-9940
Email: jsaveri@saverilawfirm.com

Matthew Butterick (State Bar No. 250953)
1920 Hillhurst Avenue, #406
Los Angeles, CA 90027
Telephone: (323) 968-2632
Facsimile: (415) 395-9940
Email: mb@buttericklaw.com

Laura M. Matson (pro hac vice pending)
LOCKRIDGE GRINDAL NAUEN PLLP
100 Washington Avenue South, Suite 2200
Minneapolis, MN 55401
Telephone: (612) 339-6900
Facsimile: (612) 339-0981
Email: lmmatson@locklaw.com

Counsel for Individual and Representative
Plaintiffs and the Proposed Class
(continues on signature page)

UNITED STATES DISTRICT COURT
NORTHERN DISTRICT OF CALIFORNIA
SAN FRANCISCO DIVISION

Jingna Zhang, an individual;
Sarah Andersen, an individual;
Hope Larson, an individual; and
Jessica Fink, an individual;

Individual and Representative Plaintiffs,

v.

Google LLC, a Delaware limited liability company; and
Alphabet Inc., a Delaware corporation;

Defendants.

Case No.

Complaint

Class Action

Demand for Jury Trial

Plaintiffs Jingna Zhang, Sarah Andersen, Hope Larson, and Jessica Fink (together “Plaintiffs”), on behalf of themselves and all others similarly situated, bring this class-action complaint (“Complaint”) against defendants Google LLC (“Google”) and Alphabet Inc. (“Alphabet”) (together “Defendants”).

OVERVIEW

1. Artificial intelligence—commonly abbreviated “AI”—denotes software that is designed to algorithmically simulate human reasoning or inference, often using statistical methods.

2. Imagen is an AI software product created, maintained, and sold by Google. Imagen is a text-to-image diffusion model. A text-to-image diffusion model takes as input a short text description of an image (also known as a text prompt) and then uses a machine-learning technique called diffusion to generate an image in response to the prompt.

3. Rather than being programmed in the traditional way—that is, by human programmers writing code—a diffusion model is trained by copying an enormous quantity of digital images with associated text captions, extracting protected expression from these works, and transforming that protected expression into a large set of numbers called weights that are stored within the model. These weights are entirely and uniquely derived from the protected expression in the training dataset. Whenever a diffusion model generates an image in response to a user prompt, it is performing a computation that relies on these stored weights, with the goal of imitating the protected expression ingested from the training dataset.

4. Training a model first requires amassing a huge corpus of data, called a dataset. The AI models at issue in this complaint were trained on datasets containing millions of images paired with descriptive captions. In this complaint, each image–caption pair is called a training image. During training of the model, the training images in the dataset are directly copied in full and then completely ingested by the model, meaning that protected expression from every training image enters the model. As it copies and ingests billions of training images, the model progressively develops the ability to generate outputs that mimic the protected expression copied from the dataset.

5. Plaintiffs and Class members are visual artists. They own registered copyrights in certain training images that Google has admitted copying to train Imagen. Plaintiffs and Class members never authorized Google to use their copyrighted works as training material.

6. These copyrighted training images were copied multiple times by Google during the training process for Imagen. Because Imagen contains weights that represent a transformation of the protected expression in the training dataset, Imagen is itself an infringing derivative work.

7. Alphabet, as the corporate parent of Google, also commercially benefits from these acts of massive copyright infringement.

JURISDICTION AND VENUE

8. This Court has subject-matter jurisdiction under 28 U.S.C. § 1331 because this case arises under the Copyright Act (17 U.S.C. § 501).

9. Jurisdiction and venue are proper in this judicial district under 28 U.S.C. § 1391(c)(2) because Defendants are headquartered in this district. Google created the Imagen model and, in cooperation with Alphabet, distributes it commercially. Therefore, a substantial part of the events giving rise to the claim occurred in this District. A substantial portion of the affected interstate trade and commerce was carried out in this District. Defendants have transacted business, maintained substantial contacts, and/or committed overt acts in furtherance of the illegal scheme and conspiracy throughout the United States, including in this District. Defendants’ conduct has had the intended and foreseeable effect of causing injury to persons residing in, located in, or doing business throughout the United States, including in this District.

10. Under Civil Local Rule 3-2(c), assignment of this case to the San Francisco Division is proper because this case pertains to intellectual-property rights, which under General Order No. 44 is deemed a district-wide case category, and therefore venue is proper in any courthouse in this District.

PLAINTIFFS

11. Plaintiff Jingna Zhang is a photographer who lives in Washington.

12. Plaintiff Sarah Andersen is a cartoonist and illustrator who lives in Oregon.

13. Plaintiff Hope Larson is a cartoonist and illustrator who lives in North Carolina.

14. Plaintiff Jessica Fink is a cartoonist and illustrator who lives in New York.

15. A nonexhaustive list of registered copyrights owned by Plaintiffs is included as Exhibit A: Plaintiff Copyright Registrations. A nonexhaustive list of copyrighted images registered by Plaintiffs and infringed by Defendants is included as Exhibit B: Plaintiff Images in LAION-400M.

16. The images shown in Exhibit B are offered as a representative sample of works by Plaintiffs that appear in the LAION-400M dataset—not an exhaustive or complete list. Plaintiffs confirmed that these particular images were in the LAION-400M dataset by searching for their own names on two websites that allow searching of the LAION datasets: https://haveibeentrained.com and https://rom1504.github.io/clip-retrieval/. On information and belief, all of Plaintiffs’ works that were registered as part of the collections in Exhibit A and were online were scraped into the LAION-400M dataset.

DEFENDANTS

17. Defendant Google LLC is a Delaware limited liability company with its principal place of business at 1600 Amphitheatre Parkway, Mountain View, CA 94043.

18. Defendant Alphabet Inc. is a Delaware corporation with its principal place of business at 1600 Amphitheatre Parkway, Mountain View, CA 94043. In 2015, Google became a subsidiary of Alphabet.

AGENTS AND CO-CONSPIRATORS

19. The unlawful acts alleged against the Defendants in this Complaint were authorized, ordered, or performed by the Defendants’ respective officers, agents, employees, representatives, or shareholders while actively engaged in the management, direction, or control of the Defendants’ businesses or affairs. The Defendants’ agents operated under the explicit and apparent authority of their principals. Each Defendant and its subsidiaries, affiliates, and agents operated as a single unified entity.

20. Various persons or firms not named as defendants may have participated as co-conspirators in the violations alleged herein and may have performed acts and made statements in furtherance thereof. Each acted as the principal, agent, or joint venturer of, or for, other Defendants with respect to the acts, violations, and common course of conduct alleged herein.

FACTUAL ALLEGATIONS

21. Google is a diversified technology company whose lines of business include internet advertising and cloud-computing services. As part of these businesses, Google creates and distributes artificial-intelligence software products.

22. One such product is Imagen, a text-to-image diffusion model that takes as input a short text description of an image and then uses AI techniques to generate an image in response to the prompt.

23. In May 2022, Google announced Imagen in a paper called “Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding.”1 In the paper, Google admits that it trained Imagen on “the publicly available Laion [sic] dataset … with ≈ 400M image-text pairs.”2

24. Initially, Google did not release Imagen to the public. Google explained its reasoning on the website for Imagen: “the data requirements of text-to-image models have led researchers to rely heavily on large, mostly uncurated, web-scraped datasets … we also utilized LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes … As such, there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place.”3

25. LAION-400M also contains copyrighted works owned by Plaintiffs and the Class, including those in Exhibit B.

1 Available at https://arxiv.org/abs/2205.11487
2 Id. at 7.
3 See https://imagen.research.google/

26. Despite its professed commitment to “not release Imagen for public use without further safeguards,”4 Google soon reversed course.

27. In November 2022, Google made Imagen publicly available to a select group of users through its AI Test Kitchen app. According to reporting at the time, Google “announced it will be adding Imagen—in a very limited form—to its AI Test Kitchen app as a way to collect early feedback on the technology.”5

28. In January 2023, plaintiff Sarah Andersen and two other artists filed the first lawsuit in the U.S. challenging the legality of training text-to-image diffusion models on copyrighted work without consent, credit, or compensation. That case, Andersen v. Stability AI et al. (Case No. 23-cv-00201, N.D. Cal.), challenged two models similar to Imagen—called Stable Diffusion and Midjourney—both of which were also trained on the LAION dataset. (The Andersen case is currently proceeding.)

29. In May 2023, Google made Imagen even more widely available through its commercial AI cloud-computing service, called Vertex AI. In a blog post about Vertex AI, Google described the model this way: “Imagen, our text-to-image foundation model, lets organizations generate and customize studio-grade images at scale for any business need.”6

30. In October 2023, Google made Imagen even more widely available through a tool called Search Generative Experience. According to reporting at the time, “If you’re opted in to [Search Generative Experience] through Google’s Search Labs program, you can just type your query into the Google search bar. After you do, [Search Generative Experience] can create a few images based on your prompt that you can pick from. The tool is powered by the Imagen family of AI models.”7

4 Id.
5 See https://www.theverge.com/2022/11/2/23434361/google-text-to-image-ai-model-imagen-test-kitchen-app
6 See https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-launches-new-ai-models-opens-generative-ai-studio
7 See https://www.theverge.com/2023/10/12/23913337/google-ai-powered-search-sge-images-written-drafts

31. In December 2023, Google released the successor to Imagen, called Imagen 2. Unlike the paper that accompanied the initial version of Imagen, Google’s introduction of Imagen 2 carefully omits a detailed description of its training dataset. Google limits itself to vague comments such as “From the outset, we invested in training data safety for Imagen 2, and added technical guardrails to limit problematic outputs like violent, offensive, or sexually explicit content.”8

32. On information and belief, Google did not disclose details about the training dataset for Imagen 2 because it was aware of the Andersen v. Stability AI et al. case and hoped to avoid being named as a defendant in a lawsuit over the legality of training on mass quantities of copyrighted works without consent, credit, or compensation.

33. On information and belief, Google included LAION-400M in its training dataset for Imagen 2, because a) it had already done so for the first version of Imagen, and b) one of the architects of the LAION image datasets, Romain Beaumont, is a Google employee, whom Google hired specifically to exercise influence over the LAION organization and its image datasets.

A KEY SOURCE OF GOOGLE’S TRAINING DATA: LAION

34. LAION (acronym for “Large-Scale Artificial Intelligence Open Network”) is an organization based in Hamburg, Germany. According to its website, LAION is led by Christoph Schuhmann. LAION’s stated goal is “to make large-scale machine learning models, datasets and related code available to the general public.”9 All of LAION’s projects are made available for free.

35. Since 2021, a key member of LAION’s team has been Romain Beaumont, who describes himself on the LAION website as an “open source contributor … I like to apply scale and deep learning to build AI apps and models.”10

36. LAION’s most well-known projects are the datasets of training images it has released for training machine-learning models, which are now widely used in the AI industry.

37. In August 2021, LAION released LAION-400M, a dataset of 400 million training images assembled from images accessible on the public internet. At the time, LAION-400M was the largest freely available dataset of its kind. Until December 2023, LAION distributed the LAION-400M dataset to the public through its own website and elsewhere. (In December 2023, due to the discovery of child sexual-abuse material (“CSAM”) in the LAION datasets, the LAION organization retracted these datasets—including LAION-400M—from the public internet.)

8 See https://deepmind.google/technologies/imagen-2/
9 https://laion.ai/about/
10 See https://laion.ai/team/

38. Also in August 2021, Romain Beaumont created an online tool called Clip Retrieval that acted as a search interface to LAION to check whether certain artists or artworks were included in the LAION-400M dataset.11 Beaumont’s tool was popular. It was online until December 2023. (In December 2023, it was disabled due to the aforementioned issues with CSAM in the LAION datasets.)

39. Romain Beaumont was a primary author of the paper that introduced the LAION-400M dataset, titled “LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs,” released in November 2021 (hereafter, the “Beaumont–LAION Paper”).12

40. When one downloads the LAION-400M dataset, one gets a list of metadata records, one for each training image. Each record includes the URL of the image, the image caption, a measurement of the similarity of the caption and image, an NSFW flag (indicating the probability that the image contains so-called “not safe for work” content), and the width and height of the image.

41. The actual images referenced in the LAION-400M dataset records are not included with the dataset. Anyone who wishes to use LAION-400M for training their own machine-learning model must first acquire copies of the actual images from their URLs. To facilitate the copying of these images, Romain Beaumont created a software tool called `img2dataset` that takes the LAION-400M metadata records as input and makes copies of the referenced images from the URLs in each metadata record, thereby creating local copies. The `img2dataset` tool is distributed from a page Beaumont controls on GitHub.13 LAION promotes the `img2dataset` tool in its documentation for LAION-400M. (“This metadata dataset purpose is to download the images for the whole dataset or a subset of it by supplying it to the very efficient `img2dataset` tool.”14)

11 See https://rom1504.github.io/clip-retrieval
12 https://arxiv.org/abs/2111.02114
13 https://github.com/rom1504/img2dataset
14 See https://laion.ai/blog/laion-400-open-dataset/

42. Training a model with the LAION-400M dataset cannot begin without first using `img2dataset` or another similar tool to download the images in the dataset. Thus, because Google has trained Imagen on LAION-400M, Google has necessarily made one or more copies of images belonging to Plaintiffs as shown in Exhibit B, either by using Romain Beaumont’s `img2dataset` tool or another. Plaintiffs never authorized any of these LAION dataset users to copy their images or use them for training any models.

43. One of the entities that has made unauthorized copies of the LAION-400M training images is LAION itself. According to the Beaumont–LAION Paper, LAION made the dataset by starting with Common Crawl metadata records. Common Crawl is a corpus of 250 billion web pages copied from the public web, including assets like Plaintiffs’ images (https://commoncrawl.org/). The metadata records contain web URLs. According to the Beaumont–LAION Paper, LAION created training images by first “pars[ing] through [the metadata records] from Common Crawl and pars[ing] out all HTML IMG tags containing an alt-text attribute [that is, a text caption].” Then, LAION “download[ed] the raw images from the parsed URLs.” Beaumont–LAION Paper at 3.

44. Sometime after the release of LAION-400M in August 2021, a company called Stability AI funded LAION’s creation of a similar but much larger dataset. In March 2022, Stability AI CEO Mostaque called himself “the biggest backer of LAION.”15

45. But Google wasn’t far behind. In March 2022, Google hired Romain Beaumont as a full-time software engineer, a position he has held since. On information and belief, Google hired Beaumont primarily to influence the creation of future LAION image datasets, based on a) Beaumont’s key role creating LAION-400M—which Google used to train Imagen; b) Beaumont’s control of the `img2dataset` tool that was essential to using the LAION-400M dataset; and c) Beaumont’s control of the Clip Retrieval website that was essential to searching the LAION-400M dataset.

46. Later in March 2022, LAION released LAION-5B, a dataset of 5.85 billion training images—more than 14 times bigger than LAION-400M. The author of the LAION blog post announcing LAION-5B was Romain Beaumont.16

15 https://discord.com/channels/662267976984297473/938713143759216720/954674533942591510
16 See https://laion.ai/blog/laion-5b/

47. In August 2022, Romain Beaumont created a specialized AI model to rate the aesthetic quality of an image, and used this model to create subsets of the LAION-5B training images filtered by aesthetic quality, which Beaumont called LAION-Aesthetics. In its introduction of Imagen 2 in December 2023, Google said, “We trained a specialized image aesthetics model based on human preferences for qualities like good lighting, framing, exposure, sharpness, and more. Each image was given an aesthetics score which helped condition Imagen 2 to give more weight to images in its training dataset that align with qualities humans prefer.”17 On information and belief, Beaumont’s work on LAION-Aesthetics formed the basis of Imagen 2’s “aesthetics model,” since at the time Beaumont was both a contributor to LAION and a full-time employee of Google.

48. In October 2022, Romain Beaumont was a primary author of the paper about LAION-5B, called “LAION-5B: An open large-scale dataset for training next generation image-text models” (hereafter, the “Beaumont–LAION-5B Paper”). According to the Beaumont–LAION-5B Paper, LAION-400M is a subset of LAION-5B, meaning every image in LAION-400M is also in LAION-5B.

49. Just like the LAION-400M dataset, the actual images referenced in the LAION-5B dataset records are not included with the dataset. Anyone who wishes to use LAION-5B for training their own machine-learning model must first acquire copies of the actual images from their URLs. As mentioned above, to facilitate the copying of these images, Romain Beaumont created a software tool called `img2dataset` that takes the LAION-5B metadata records as input and makes copies of the referenced images from the URLs in each metadata record, thereby creating local copies. The `img2dataset` tool is distributed from a page Beaumont controls on GitHub.18

17 See https://deepmind.google/technologies/imagen-2/
18 https://github.com/rom1504/img2dataset

COUNT 1
Direct Copyright Infringement (17 U.S.C. § 501)
against Google

50. The preceding factual allegations are incorporated by reference.

51. As the owners of the registered copyrights in the works in Exhibit B, Plaintiffs hold the exclusive rights to those works under the U.S. Copyright Act (17 U.S.C. § 106).

52. Plaintiffs never authorized Google to use their copyrighted work in any way. Nevertheless, Google repeatedly violated Plaintiffs’ exclusive rights under § 106 and continues to do so today. Plaintiffs and the Class members never authorized Google to make copies of their works, make derivative works, publicly display copies (or derivative works), or distribute copies (or derivative works).

53. On information and belief, Google has used Plaintiffs’ training images to train other versions of Imagen, including Imagen 2, and so-called “multimodal” models that are trained on training images as well as text, such as Google Gemini. Collectively, Imagen and other models that Google trained on LAION-400M are called the Google–LAION Models.

54. The LAION-400M dataset contains only URLs of training images, not the actual training images. Therefore, anyone who wishes to use LAION-400M for training their own machine-learning model must first acquire copies of the actual training images from their URLs. Consistent with this, in preparation for training the Google–LAION Models, Google made one or more copies of the LAION-400M training images, including the Plaintiff works in Exhibit B, so they could be fed to the Google–LAION Models as training data. The copies made of each copyrighted work were substantially similar to that copyrighted work.

55. During the training of the Google–LAION Models, Google made a series of intermediate copies of the LAION-400M training images, including the Plaintiff works in Exhibit B. The intermediate copies of each copyrighted work that Google made during training of the Google–LAION Models were substantially similar to that copyrighted work.

56. Plaintiffs have been injured by Google’s acts of direct copyright infringement. Plaintiffs are entitled to statutory damages, actual damages, restitution of profits, and other remedies provided by law.

COUNT 2
Vicarious Copyright Infringement
against Alphabet

57. The preceding factual allegations are incorporated by reference.

58. Alphabet was the corporate parent of Google during its training of the Google–LAION Models and remains its corporate parent.

59. As the corporate parent of Google, Alphabet benefited financially from the infringing activity of Google when it trained the Google–LAION Models on Plaintiffs’ works, and continues to benefit financially from the deployment of the Google–LAION Models.

60. As the corporate parent of Google, Alphabet had the right and ability to supervise the infringing activity of Google when it trained the Google–LAION Models on Plaintiffs’ works. Alphabet failed to exercise that right and ability.

61. Plaintiffs have been injured by Alphabet’s acts of vicarious copyright infringement. Plaintiffs are entitled to statutory damages, actual damages, restitution of profits, and other remedies provided by law.

CLASS ALLEGATIONS

62. The “Class Period” as defined in this Complaint begins no later than April 26, 2021 and runs through the present. Because Plaintiffs do not yet know when the unlawful conduct alleged herein began, but believe, on information and belief, that the conduct likely began earlier than the date listed above, Plaintiffs reserve the right to amend the Class Period to comport with the facts and evidence uncovered during further investigation or through discovery.

63. Class definition. Plaintiffs bring this action for damages and injunctive relief as a class action under Federal Rules of Civil Procedure 23(a), 23(b)(2), and 23(b)(3), on behalf of the following Class:

All persons or entities domiciled in the United States that own a United States copyright in any work that Google used as a training image for the Google–LAION Models during the Class Period.

64. This Class definition excludes:
a. Defendants named herein;
b. any of the Defendants’ co-conspirators;
c. any of Defendants’ parent companies, subsidiaries, and affiliates;
d. any of Defendants’ officers, directors, management, employees, subsidiaries, affiliates, or agents;
e. all governmental entities; and
f. the judges and chambers staff in this case, as well as any members of their immediate families.

65. Numerosity. Plaintiffs do not know the exact number of members in the Class. This information is in the exclusive control of Defendants. On information and belief, there are at least thousands of members in the Class geographically dispersed throughout the United States. Therefore, joinder of all members of the Class in the prosecution of this action is impracticable.

66. Typicality. Plaintiffs’ claims are typical of the claims of other members of the Class because Plaintiffs and all members of the Class were damaged by the same wrongful conduct of Defendants as alleged herein, and the relief sought herein is common to all members of the Class.

67. Adequacy. Plaintiffs will fairly and adequately represent the interests of the members of the Class because the Plaintiffs have experienced the same harms as the members of the Class and have no conflicts with any other members of the Class. Furthermore, Plaintiffs have retained sophisticated and competent counsel who are experienced in prosecuting federal and state class actions, as well as other complex litigation.

68. Commonality and predominance. Numerous questions of law or fact common to each Class member arise from Defendants’ conduct and predominate over any questions affecting the members of the Class individually:
a. Whether Defendants violated the copyrights of Plaintiffs and the Class when they obtained copies of Plaintiffs’ copyrighted images and used them to train the Google–LAION Models.
b. Whether any affirmative defense excuses Defendants’ conduct.
c. Whether any statutes of limitation constrain the potential for recovery for Plaintiffs and the Class.

69. Other class considerations. Defendants have acted on grounds generally applicable to the Class. This class action is superior to alternatives, if any, for the fair and efficient adjudication of this controversy. Prosecuting the claims pleaded herein as a class action will eliminate the possibility of repetitive litigation. There will be no material difficulty in the management of this action as a class action.

70. The prosecution of separate actions by individual Class members would create the risk of inconsistent or varying adjudications, establishing incompatible standards of conduct for Defendants.

DEMAND FOR JUDGMENT

Wherefore, Plaintiffs request that the Court enter judgment on their behalf and on behalf of the Class defined herein, by ordering:
a) This action may proceed as a class action, with Plaintiffs serving as Class Representatives, and with Plaintiffs’ counsel as Class Counsel.
b) Judgment in favor of Plaintiffs and the Class and against Defendants.
c) An award of statutory and other damages under 17 U.S.C. § 504 for violations of the copyrights of Plaintiffs and the Class by Defendants.
d) Destruction or other reasonable disposition of all copies Defendants made or used in violation of the exclusive rights of Plaintiffs and the Class, under 17 U.S.C. § 503(b).
e) Pre- and post-judgment interest on the damages awarded to Plaintiffs and the Class, and that such interest be awarded at the highest legal rate from and after the date this class action complaint is first served on Defendants.
f) Defendants are to be jointly and severally responsible financially for the costs and expenses of a Court-approved notice program through post and media designed to give immediate notification to the Class.
g) Further relief for Plaintiffs and the Class as may be just and proper.
`
JURY TRIAL DEMANDED

Under Federal Rule of Civil Procedure 38(b), Plaintiffs demand a trial by jury of all the claims asserted in this Complaint so triable.
`
Dated: April 26, 2024

By: /s/ Joseph R. Saveri
Joseph R. Saveri

Joseph R. Saveri (State Bar No. 130064)
Cadio Zirpoli (State Bar No. 179108)
Christopher K. L. Young (State Bar No. 318371)
Elissa Buchanan (State Bar No. 249996)
`JOSE
