`
`
`
Joseph R. Saveri (State Bar No. 130064)
JOSEPH SAVERI LAW FIRM, LLP
601 California Street, Suite 1505
San Francisco, CA 94108
Telephone: (415) 500-6800
Facsimile: (415) 395-9940
Email: jsaveri@saverilawfirm.com
`
`Matthew Butterick (State Bar No. 250953)
`1920 Hillhurst Avenue, #406
`Los Angeles, CA 90027
`Telephone: (323) 968-2632
`Facsimile: (415) 395-9940
Email: mb@buttericklaw.com
`
`Laura M. Matson (pro hac vice pending)
`LOCKRIDGE GRINDAL NAUEN PLLP
`100 Washington Avenue South, Suite 2200
`Minneapolis, MN 55401
`Telephone: (612) 339-6900
`Facsimile: (612) 339-0981
`Email: lmmatson@locklaw.com
`
`Counsel for Individual and Representative
`Plaintiffs and the Proposed Class
`(continues on signature page)
`
`UNITED STATES DISTRICT COURT
`NORTHERN DISTRICT OF CALIFORNIA
`SAN FRANCISCO DIVISION
`
`
`Jingna Zhang, an individual;
`Sarah Andersen, an individual;
`Hope Larson, an individual; and
`Jessica Fink, an individual;
`
`Individual and Representative Plaintiffs,
`
`v.
`
`Google LLC, a Delaware limited liability company; and
`Alphabet Inc., a Delaware corporation;
`
`Defendants.
`
`
`
`Case No.
`
`Complaint
`
`Class Action
`
`Demand for Jury Trial
`
`
`
`
`Case 3:24-cv-02531 Document 1 Filed 04/26/24 Page 2 of 16
`
`
`
`
`
Plaintiffs Jingna Zhang, Sarah Andersen, Hope Larson, and Jessica Fink (together “Plaintiffs”), on behalf of themselves and all others similarly situated, bring this class-action complaint (“Complaint”) against defendants Google LLC (“Google”) and Alphabet Inc. (“Alphabet”) (together “Defendants”).
`
`OVERVIEW
`
1. Artificial intelligence—commonly abbreviated “AI”—denotes software that is designed to algorithmically simulate human reasoning or inference, often using statistical methods.

2. Imagen is an AI software product created, maintained, and sold by Google. Imagen is a text-to-image diffusion model. A text-to-image diffusion model takes as input a short text description of an image (also known as a text prompt) and then uses a machine-learning technique called diffusion to generate an image in response to the prompt.

3. Rather than being programmed in the traditional way—that is, by human programmers writing code—a diffusion model is trained by copying an enormous quantity of digital images with associated text captions, extracting protected expression from these works, and transforming that protected expression into a large set of numbers called weights that are stored within the model. These weights are entirely and uniquely derived from the protected expression in the training dataset. Whenever a diffusion model generates an image in response to a user prompt, it is performing a computation that relies on these stored weights, with the goal of imitating the protected expression ingested from the training dataset.

4. Training a model first requires amassing a huge corpus of data, called a dataset. The AI models at issue in this complaint were trained on datasets containing millions of images paired with descriptive captions. In this complaint, each image–caption pair is called a training image. During training of the model, the training images in the dataset are directly copied in full and then completely ingested by the model, meaning that protected expression from every training image enters the model. As it copies and ingests billions of training images, the model progressively develops the ability to generate outputs that mimic the protected expression copied from the dataset.
`
`1 · complaint
`
`
`
`
`
`
5. Plaintiffs and Class members are visual artists. They own registered copyrights in certain training images that Google has admitted copying to train Imagen. Plaintiffs and Class members never authorized Google to use their copyrighted works as training material.

6. These copyrighted training images were copied multiple times by Google during the training process for Imagen. Because Imagen contains weights that represent a transformation of the protected expression in the training dataset, Imagen is itself an infringing derivative work.

7. Alphabet, as the corporate parent of Google, also commercially benefits from these acts of massive copyright infringement.
`
`JURISDICTION AND VENUE
`
8. This Court has subject-matter jurisdiction under 28 U.S.C. § 1331 because this case arises under the Copyright Act (17 U.S.C. § 501).

9. Jurisdiction and venue are proper in this judicial District under 28 U.S.C. § 1391(c)(2) because Defendants are headquartered in this District. Google created the Imagen model and, in cooperation with Alphabet, distributes it commercially. Therefore, a substantial part of the events giving rise to the claim occurred in this District. A substantial portion of the affected interstate trade and commerce was carried out in this District. Defendants have transacted business, maintained substantial contacts, and/or committed overt acts in furtherance of the illegal scheme and conspiracy throughout the United States, including in this District. Defendants’ conduct has had the intended and foreseeable effect of causing injury to persons residing in, located in, or doing business throughout the United States, including in this District.

10. Under Civil Local Rule 3-2(c), assignment of this case to the San Francisco Division is proper because this case pertains to intellectual-property rights, which under General Order No. 44 is deemed a district-wide case category, and therefore venue is proper in any courthouse in this District.
`
`PLAINTIFFS
`
11. Plaintiff Jingna Zhang is a photographer who lives in Washington.

12. Plaintiff Sarah Andersen is a cartoonist and illustrator who lives in Oregon.
`
`
`
`
`
`
13. Plaintiff Hope Larson is a cartoonist and illustrator who lives in North Carolina.

14. Plaintiff Jessica Fink is a cartoonist and illustrator who lives in New York.

15. A nonexhaustive list of registered copyrights owned by Plaintiffs is included as Exhibit A: Plaintiff Copyright Registrations. A nonexhaustive list of copyrighted images registered by Plaintiffs and infringed by Defendants is included as Exhibit B: Plaintiff Images in LAION-400M.

16. The images shown in Exhibit B are offered as a representative sample of works by Plaintiffs that appear in the LAION-400M dataset—not an exhaustive or complete list. Plaintiffs confirmed that these particular images were in the LAION-400M dataset by searching for their own names on two websites that allow searching of the LAION datasets: https://haveibeentrained.com and https://rom1504.github.io/clip-retrieval/. On information and belief, all of Plaintiffs’ works that were registered as part of the collections in Exhibit A and were online were scraped into the LAION-400M dataset.
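A name search of the kind described in paragraph 16 can be approximated locally once a dataset's metadata has been downloaded. The sketch below is a hypothetical illustration. It assumes the metadata is available as (URL, caption) pairs and checks whether a caption mentions a given name. It does not reproduce either website's search method. The clip-retrieval tool, for instance, is built around CLIP embedding similarity rather than exact text matching. Note also that a plain caption match finds only images whose captions happen to include the artist's name.

```python
from typing import Iterable, List, Tuple


def find_by_name(records: Iterable[Tuple[str, str]], name: str) -> List[str]:
    """Return the URLs of records whose caption mentions `name`.

    `records` is assumed to be an iterable of (url, caption) pairs taken from
    a metadata file; matching is a plain case-insensitive substring test.
    """
    needle = name.casefold()
    return [url for url, caption in records if needle in caption.casefold()]


# Hypothetical example records; a real dataset holds hundreds of millions.
sample = [
    ("https://example.com/1.jpg", "Comic strip by Jane Doe"),
    ("https://example.com/2.jpg", "A bowl of fruit"),
    ("https://example.com/3.jpg", "jane doe illustration, ink on paper"),
]
print(find_by_name(sample, "Jane Doe"))
# ['https://example.com/1.jpg', 'https://example.com/3.jpg']
```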
`
`DEFENDANTS
`
17. Defendant Google LLC is a Delaware limited liability company with its principal place of business at 1600 Amphitheatre Parkway, Mountain View, CA 94043.

18. Defendant Alphabet Inc. is a Delaware corporation with its principal place of business at 1600 Amphitheatre Parkway, Mountain View, CA 94043. In 2015, Google became a subsidiary of Alphabet.
`
`AGENTS AND CO-CONSPIRATORS
`
19. The unlawful acts alleged against the Defendants in this Complaint were authorized, ordered, or performed by the Defendants’ respective officers, agents, employees, representatives, or shareholders while actively engaged in the management, direction, or control of the Defendants’ businesses or affairs. The Defendants’ agents operated under the explicit and apparent authority of their principals. Each Defendant and its subsidiaries, affiliates, and agents operated as a single unified entity.
`
`
`
`
`
`
20. Various persons or firms not named as defendants may have participated as co-conspirators in the violations alleged herein and may have performed acts and made statements in furtherance thereof. Each acted as the principal, agent, or joint venturer of, or for, other Defendants with respect to the acts, violations, and common course of conduct alleged herein.
`
`FACTUAL ALLEGATIONS
`
21. Google is a diversified technology company whose lines of business include internet advertising and cloud-computing services. As part of these businesses, Google creates and distributes artificial-intelligence software products.

22. One such product is Imagen, a text-to-image diffusion model that takes as input a short text description of an image and then uses AI techniques to generate an image in response to the prompt.

23. In May 2022, Google announced Imagen in a paper called “Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding.”1 In the paper, Google admits that it trained Imagen on “the publicly available Laion [sic] dataset … with ≈ 400M image-text pairs.”2

24. Initially, Google did not release Imagen to the public. Google explained its reasoning on the website for Imagen: “the data requirements of text-to-image models have led researchers to rely heavily on large, mostly uncurated, web-scraped datasets … we also utilized LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes … As such, there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place.”3

25. LAION-400M also contains copyrighted works owned by Plaintiffs and the Class, including those in Exhibit B.
`
`
`1 Available at https://arxiv.org/abs/2205.11487
`2 Id. at 7.
`3 See https://imagen.research.google/
`
`
`
`
`
`
26. Despite its professed commitment to “not release Imagen for public use without further safeguards,”4 Google soon reversed course.

27. In November 2022, Google made Imagen publicly available to a select group of users through its AI Test Kitchen app. According to reporting at the time, Google “announced it will be adding Imagen—in a very limited form—to its AI Test Kitchen app as a way to collect early feedback on the technology.”5

28. In January 2023, plaintiff Sarah Andersen and two other artists filed the first lawsuit in the U.S. challenging the legality of training text-to-image diffusion models on copyrighted work without consent, credit, or compensation. That case, Andersen v. Stability AI et al. (Case No. 23-cv-00201, N.D. Cal.), challenged two models similar to Imagen, called Stable Diffusion and Midjourney, both of which were also trained on the LAION dataset. (The Andersen case is currently proceeding.)

29. In May 2023, Google made Imagen even more widely available through its commercial AI cloud-computing service, called Vertex AI. In a blog post about Vertex AI, Google stated: “Imagen, our text-to-image foundation model, lets organizations generate and customize studio-grade images at scale for any business need.”6

30. In October 2023, Google made Imagen even more widely available through a tool called Search Generative Experience. According to reporting at the time, “If you’re opted in to [Search Generative Experience] through Google’s Search Labs program, you can just type your query into the Google search bar. After you do, [Search Generative Experience] can create a few images based on your prompt that you can pick from. The tool is powered by the Imagen family of AI models.”7

31. In December 2023, Google released the successor to Imagen, called Imagen 2. Unlike the paper that accompanied the initial version of Imagen, Google’s introduction of Imagen 2 carefully omits a detailed description of its training dataset. Google limits itself to vague comments such as “From the outset, we invested in training data safety for Imagen 2, and added technical guardrails to limit problematic outputs like violent, offensive, or sexually explicit content.”8

4 Id.
5 See https://www.theverge.com/2022/11/2/23434361/google-text-to-image-ai-model-imagen-test-kitchen-app
6 See https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-launches-new-ai-models-opens-generative-ai-studio
7 See https://www.theverge.com/2023/10/12/23913337/google-ai-powered-search-sge-images-written-drafts

32. On information and belief, Google did not disclose details about the training dataset for Imagen 2 because it was aware of the Andersen v. Stability AI et al. case and hoped to avoid being named as a defendant in a lawsuit over the legality of training on mass quantities of copyrighted works without consent, credit, or compensation.

33. On information and belief, Google included LAION-400M in its training dataset for Imagen 2 because a) it had already done so for the first version of Imagen, and b) one of the architects of the LAION image datasets, Romain Beaumont, is a Google employee, whom Google hired specifically to exercise influence over the LAION organization and its image datasets.
`
`A KEY SOURCE OF GOOGLE’S TRAINING DATA: LAION
`
34. LAION (acronym for “Large-Scale Artificial Intelligence Open Network”) is an organization based in Hamburg, Germany. According to its website, LAION is led by Christoph Schuhmann. LAION’s stated goal is “to make large-scale machine learning models, datasets and related code available to the general public.”9 All of LAION’s projects are made available for free.

35. Since 2021, a key member of LAION’s team has been Romain Beaumont, who describes himself on the LAION website as an “open source contributor … I like to apply scale and deep learning to build AI apps and models.”10

36. LAION’s most well-known projects are the datasets of training images it has released for training machine-learning models, which are now widely used in the AI industry.

37. In August 2021, LAION released LAION-400M, a dataset of 400 million training images assembled from images accessible on the public internet. At the time, LAION-400M was the largest freely available dataset of its kind. Until December 2023, LAION distributed the LAION-400M dataset to the public through its own website and elsewhere. (In December 2023, due to the discovery of child sexual-abuse material (“CSAM”) in the LAION datasets, the LAION organization retracted these datasets—including LAION-400M—from the public internet.)

8 See https://deepmind.google/technologies/imagen-2/
9 https://laion.ai/about/
10 See https://laion.ai/team/

38. Also in August 2021, Romain Beaumont created an online tool called Clip Retrieval that served as a search interface to the LAION datasets, allowing users to check whether certain artists or artworks were included in the LAION-400M dataset.11 Beaumont’s tool was popular. It was online until December 2023. (In December 2023, it was disabled due to the aforementioned issues with CSAM in the LAION datasets.)

39. In November 2021, Romain Beaumont was a primary author of the paper that introduced the LAION-400M dataset, titled “LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs” (hereafter, the “Beaumont–LAION Paper”).12

40. When one downloads the LAION-400M dataset, one gets a list of metadata records, one for each training image. Each record includes the URL of the image, the image caption, a measurement of the similarity of the caption and image, an NSFW flag (indicating the probability the image contains so-called “not safe for work” content), and the width and height of the image.
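A metadata record of the kind described in paragraph 40 can be pictured as a small, fixed set of fields. The sketch below is a minimal illustration. It assumes hypothetical field names (`url`, `caption`, `similarity`, `nsfw`, `width`, `height`) and a string-valued NSFW label. The column names and encodings in the actual LAION-400M files may differ. What the sketch makes concrete is that each record holds a pointer to the image (its URL) and not the image itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """One LAION-style metadata record (field names are hypothetical)."""
    url: str           # location of the image on the web; no pixel data is stored
    caption: str       # text caption paired with the image
    similarity: float  # caption-image similarity measurement
    nsfw: str          # NSFW label; its exact encoding is an assumption
    width: int         # image width in pixels
    height: int        # image height in pixels


def parse_row(row: dict) -> MetadataRecord:
    """Build a record from a raw row, such as one produced by a CSV reader."""
    return MetadataRecord(
        url=row["url"],
        caption=row["caption"],
        similarity=float(row["similarity"]),
        nsfw=row["nsfw"],
        width=int(row["width"]),
        height=int(row["height"]),
    )


record = parse_row({
    "url": "https://example.com/painting.jpg",
    "caption": "oil painting of a harbor at dusk",
    "similarity": "0.34",
    "nsfw": "UNLIKELY",  # hypothetical label value
    "width": "1024",
    "height": "768",
})
print(record.url, record.width, record.height)
# https://example.com/painting.jpg 1024 768
```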
41. The actual images referenced in the LAION-400M dataset records are not included with the dataset. Anyone who wishes to use LAION-400M for training their own machine-learning model must first acquire copies of the actual images from their URLs. To facilitate the copying of these images, Romain Beaumont created a software tool called `img2dataset` that takes the LAION-400M metadata records as input and makes copies of the referenced images from the URLs in each metadata record, thereby creating local copies. The `img2dataset` tool is distributed from a page Beaumont controls on GitHub.13 LAION promotes the `img2dataset` tool in its documentation for LAION-400M. (“This metadata dataset purpose is to download the images for the whole dataset or a subset of it by supplying it to the very efficient `img2dataset` tool.”14)
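The download step described in paragraph 41 can be sketched in a few lines of standard-library Python. This is a hypothetical illustration of what any such tool must do. It is not the code or interface of `img2dataset`. The sketch takes (URL, caption) pairs, fetches each URL, and writes a local copy of the image next to its caption. The function names, the numbered file layout, and the `.jpg` extension are assumptions. A production tool would also add parallel downloading and fuller error reporting, and this sketch omits both.

```python
import os
import urllib.request
from typing import Callable, Iterable, List, Tuple


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw bytes at `url` using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_images(
    records: Iterable[Tuple[str, str]],
    out_dir: str,
    fetch: Callable[[str], bytes] = fetch_url,
) -> List[str]:
    """Save a local copy of each (url, caption) record; return the image paths.

    URLs that cannot be fetched (dead links, timeouts, malformed URLs) are
    skipped, since URL lists gathered from the web contain broken entries.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for i, (url, caption) in enumerate(records):
        try:
            data = fetch(url)
        except (OSError, ValueError):
            continue  # urllib errors are OSError subclasses; bad URLs raise ValueError
        stem = os.path.join(out_dir, f"{i:09d}")
        # The .jpg extension is an assumption; a real tool would inspect the bytes.
        with open(stem + ".jpg", "wb") as f:
            f.write(data)
        with open(stem + ".txt", "w", encoding="utf-8") as f:
            f.write(caption)  # keep the caption beside its image copy
        written.append(stem + ".jpg")
    return written
```

The fetch function is passed in as a parameter. This keeps the copying logic separate from the network code, so the sketch can be exercised offline with a stand-in fetcher.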
`
`
`11 See https://rom1504.github.io/clip-retrieval
`12 https://arxiv.org/abs/2111.02114
`13 https://github.com/rom1504/img2dataset
`14 See https://laion.ai/blog/laion-400-open-dataset/
`
`
`
`
`
42. Training a model with the LAION-400M dataset cannot begin without first using `img2dataset` or another similar tool to download the images in the dataset. Thus, because Google has trained Imagen on LAION-400M, Google has necessarily made one or more copies of images belonging to Plaintiffs as shown in Exhibit B, either by using Romain Beaumont’s `img2dataset` tool or another. Plaintiffs never authorized Google or any other user of the LAION datasets to copy their images or use them for training any models.

43. One of the entities that has made unauthorized copies of the LAION-400M training images is LAION itself. According to the Beaumont–LAION Paper, LAION made the dataset by starting with Common Crawl metadata records. Common Crawl is a corpus of 250 billion web pages copied from the public web, including assets like Plaintiffs’ images (https://commoncrawl.org/). The metadata records contain web URLs. According to the Beaumont–LAION Paper, LAION created training images by first “pars[ing] through [the metadata records] from Common Crawl and pars[ing] out all HTML IMG tags containing an alt-text attribute [that is, a text caption].” Then, LAION “download[ed] the raw images from the parsed URLs.” Beaumont–LAION Paper at 3.
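The parsing step quoted in paragraph 43 finds `IMG` tags that carry an alt-text attribute and pairs each image URL with that text. Python's standard `html.parser` module can illustrate it. This is a minimal sketch under stated assumptions and is not LAION's pipeline. It takes one page's HTML as a string. Common Crawl distributes its data in archive formats that this sketch does not read. The sketch also resolves relative `src` values against a hypothetical page URL and skips images whose alt text is missing or blank.

```python
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import urljoin


class ImgAltParser(HTMLParser):
    """Collect (image URL, alt text) pairs from <img> tags that carry alt text."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.pairs: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; self-closing <img/> tags
        # are routed here too by the default handle_startendtag.
        if tag != "img":
            return
        a = dict(attrs)
        src, alt = a.get("src"), a.get("alt")
        if src and alt and alt.strip():  # keep only images with a non-empty caption
            self.pairs.append((urljoin(self.base_url, src), alt.strip()))


def extract_image_caption_pairs(html: str, base_url: str) -> List[Tuple[str, str]]:
    """Return (absolute image URL, caption) pairs found in one HTML page."""
    parser = ImgAltParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.pairs


page = '<p><img src="/art/cat.png" alt="A sleeping cat"><img src="logo.png"></p>'
print(extract_image_caption_pairs(page, "https://example.com/gallery/"))
# [('https://example.com/art/cat.png', 'A sleeping cat')]
```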
44. Sometime after the release of LAION-400M in August 2021, a company called Stability AI funded LAION’s creation of a similar but much larger dataset. In March 2022, Stability AI CEO Emad Mostaque called himself “the biggest backer of LAION.”15

45. But Google wasn’t far behind. In March 2022, Google hired Romain Beaumont as a full-time software engineer, a position he has held since. On information and belief, Google hired Beaumont primarily to influence the creation of future LAION image datasets, based on a) Beaumont’s key role in creating LAION-400M, which Google used to train Imagen; b) Beaumont’s control of the `img2dataset` tool that was essential to using the LAION-400M dataset; and c) Beaumont’s control of the Clip Retrieval website that was essential to searching the LAION-400M dataset.

46. Later in March 2022, LAION released LAION-5B, a dataset of 5.85 billion training images—more than 14 times bigger than LAION-400M. The author of the LAION blog post announcing LAION-5B was Romain Beaumont.16
`
`
`15 https://discord.com/channels/662267976984297473/938713143759216720/954674533942591510
`16 See https://laion.ai/blog/laion-5b/
`
`
`
`
`
`
47. In August 2022, Romain Beaumont created a specialized AI model to rate the aesthetic quality of an image, and used this model to create subsets of the LAION-5B training images filtered by aesthetic quality, which Beaumont called LAION-Aesthetics. In its introduction of Imagen 2 in December 2023, Google said “We trained a specialized image aesthetics model based on human preferences for qualities like good lighting, framing, exposure, sharpness, and more. Each image was given an aesthetics score which helped condition Imagen 2 to give more weight to images in its training dataset that align with qualities humans prefer.”17 On information and belief, Beaumont’s work on LAION-Aesthetics formed the basis of Imagen 2’s “aesthetics model”, since at the time Beaumont was both a contributor to LAION and a full-time employee of Google.
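Paragraph 47 describes two uses of an aesthetic score. The first is filtering a dataset down to a higher-scoring subset, as with LAION-Aesthetics. The second is giving higher-scoring images more weight during training, as Google describes for Imagen 2. The sketch below illustrates both under explicit assumptions. The scores are taken as given, and the scoring model itself is not shown. The threshold values are hypothetical. The softmax weighting is a technique chosen here for illustration. The quoted Google statement does not say how the score conditions Imagen 2, so the sketch should not be read as Google's method.

```python
import math
from typing import Dict, List


def aesthetic_subset(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep the IDs of images whose aesthetic score meets the threshold."""
    return [image_id for image_id, score in scores.items() if score >= threshold]


def sampling_weights(scores: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Map scores to sampling probabilities that favor higher-scoring images.

    Uses a softmax: higher scores get exponentially more weight, and a larger
    temperature flattens the distribution toward uniform.
    """
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp((s - top) / temperature) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


# Hypothetical scores on an arbitrary scale.
scores = {"img_a": 7.0, "img_b": 5.0, "img_c": 6.5}
print(aesthetic_subset(scores, threshold=6.5))  # ['img_a', 'img_c']
weights = sampling_weights(scores)
print(max(weights, key=weights.get))  # img_a
```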
48. In October 2022, Romain Beaumont was a primary author of the paper about LAION-5B, called “LAION-5B: An open large-scale dataset for training next generation image-text models” (hereafter, the “Beaumont–LAION-5B Paper”). According to the Beaumont–LAION-5B Paper, LAION-400M is a subset of LAION-5B, meaning every image in LAION-400M is also in LAION-5B.

49. Just like the LAION-400M dataset, the actual images referenced in the LAION-5B dataset records are not included with the dataset. Anyone who wishes to use LAION-5B for training their own machine-learning model must first acquire copies of the actual images from their URLs. As mentioned above, to facilitate the copying of these images, Romain Beaumont created a software tool called `img2dataset` that takes the LAION-5B metadata records as input and makes copies of the referenced images from the URLs in each metadata record, thereby creating local copies. The `img2dataset` tool is distributed from a page Beaumont controls on GitHub.18
`
`
`
`
`
`17 See https://deepmind.google/technologies/imagen-2/
`18 https://github.com/rom1504/img2dataset
`
`COUNT 1
`
`Direct Copyright Infringement (17 U.S.C. § 501)
`
`against Google
`
50. The preceding factual allegations are incorporated by reference.

51. As the owners of the registered copyrights in the works in Exhibit B, Plaintiffs hold the exclusive rights to those works under the U.S. Copyright Act (17 U.S.C. § 106).

52. Plaintiffs never authorized Google to use their copyrighted work in any way. Nevertheless, Google repeatedly violated Plaintiffs’ exclusive rights under § 106 and continues to do so today. Plaintiffs and the Class members never authorized Google to make copies of their works, make derivative works, publicly display copies (or derivative works), or distribute copies (or derivative works).

53. On information and belief, Google has used Plaintiffs’ training images to train other versions of Imagen, including Imagen 2, and so-called “multimodal” models that are trained on training images as well as text, such as Google Gemini. Collectively, Imagen and other models that Google trained on LAION-400M are called the Google–LAION Models.

54. The LAION-400M dataset contains only the URLs and associated metadata of training images, not the actual training images. Therefore, anyone who wishes to use LAION-400M for training their own machine-learning model must first acquire copies of the actual training images from their URLs. Consistent with this, in preparation for training the Google–LAION Models, Google made one or more copies of the LAION-400M training images, including the Plaintiff works in Exhibit B, so they could be fed to the Google–LAION Models as training data. The copies made of each copyrighted work were substantially similar to that copyrighted work.

55. During the training of the Google–LAION Models, Google made a series of intermediate copies of the LAION-400M training images, including the Plaintiff works in Exhibit B. The intermediate copies of each copyrighted work that Google made during training of the Google–LAION Models were substantially similar to that copyrighted work.
`
`
`
`
`
`
56. Plaintiffs have been injured by Google’s acts of direct copyright infringement. Plaintiffs are entitled to statutory damages, actual damages, restitution of profits, and other remedies provided by law.
`
`COUNT 2
`
`Vicarious Copyright Infringement
`
`against Alphabet
`
57. The preceding factual allegations are incorporated by reference.

58. Alphabet was the corporate parent of Google during its training of the Google–LAION Models and remains its corporate parent.

59. As the corporate parent of Google, Alphabet benefitted financially from the infringing activity of Google when it trained the Google–LAION Models on Plaintiffs’ works, and continues to benefit financially from the deployment of the Google–LAION Models.

60. As the corporate parent of Google, Alphabet had the right and ability to supervise the infringing activity of Google when it trained the Google–LAION Models on Plaintiffs’ works. Alphabet failed to exercise that right and ability.

61. Plaintiffs have been injured by Alphabet’s acts of vicarious copyright infringement. Plaintiffs are entitled to statutory damages, actual damages, restitution of profits, and other remedies provided by law.
`
`CLASS ALLEGATIONS
`
62. The “Class Period” as defined in this Complaint begins at least as early as April 26, 2021 and runs through the present. Because Plaintiffs do not yet know when the unlawful conduct alleged herein began, but believe, on information and belief, that the conduct likely began earlier than the date listed above, Plaintiffs reserve the right to amend the Class Period to comport with the facts and evidence uncovered during further investigation or through discovery.
`
`
`
`
`
`
63. Class definition. Plaintiffs bring this action for damages and injunctive relief as a class action under Federal Rules of Civil Procedure 23(a), 23(b)(2), and 23(b)(3), on behalf of the following Class:
`
All persons or entities domiciled in the United States that own a United States copyright in any work that Google used as a training image for the Google–LAION Models during the Class Period.

64. This Class definition excludes:
a. Defendants named herein;
b. any of the Defendants’ co-conspirators;
c. any of Defendants’ parent companies, subsidiaries, and affiliates;
d. any of Defendants’ officers, directors, management, employees, subsidiaries, affiliates, or agents;
e. all governmental entities; and
f. the judges and chambers staff in this case, as well as any members of their immediate families.

65. Numerosity. Plaintiffs do not know the exact number of members in the Class. This information is in the exclusive control of Defendants. On information and belief, there are at least thousands of members in the Class geographically dispersed throughout the United States. Therefore, joinder of all members of the Class in the prosecution of this action is impracticable.

66. Typicality. Plaintiffs’ claims are typical of the claims of other members of the Class because Plaintiffs and all members of the Class were damaged by the same wrongful conduct of Defendants as alleged herein, and the relief sought herein is common to all members of the Class.

67. Adequacy. Plaintiffs will fairly and adequately represent the interests of the members of the Class because the Plaintiffs have experienced the same harms as the members of the Class and have no conflicts with any other members of the Class. Furthermore, Plaintiffs have retained sophisticated and competent counsel who are experienced in prosecuting federal and state class actions, as well as other complex litigation.
`
`
`
`
`
`
68. Commonality and predominance. Numerous questions of law or fact common to each Class member arise from Defendants’ conduct and predominate over any questions affecting the members of the Class individually:
a. Whether Defendants violated the copyrights of Plaintiffs and the Class when they obtained copies of Plaintiffs’ copyrighted images and used them to train the Google–LAION Models.
b. Whether any affirmative defense excuses Defendants’ conduct.
c. Whether any statutes of limitation constrain the potential for recovery for Plaintiffs and the Class.

69. Other class considerations. Defendants have acted on grounds generally applicable to the Class. This class action is superior to alternatives, if any, for the fair and efficient adjudication of this controversy. Prosecuting the claims pleaded herein as a class action will eliminate the possibility of repetitive litigation. There will be no material difficulty in the management of this action as a class action.

70. The prosecution of separate actions by individual Class members would create the risk of inconsistent or varying adjudications, establishing incompatible standards of conduct for Defendants.
`
`DEMAND FOR JUDGMENT
`
Wherefore, Plaintiffs request that the Court enter judgment on their behalf and on behalf of the Class defined herein, by ordering:
a) This action may proceed as a class action, with Plaintiffs serving as Class Representatives, and with Plaintiffs’ counsel as Class Counsel.
b) Judgment in favor of Plaintiffs and the Class and against Defendants.
c) An award of statutory and other damages under 17 U.S.C. § 504 for violations of the copyrights of Plaintiffs and the Class by Defendants.
d) Destruction or other reasonable disposition of all copies Defendants made or used in violation of the exclusive rights of Plaintiffs and the Class, under 17 U.S.C. § 503(b).
`
`
e) Pre- and post-judgment interest on the damages awarded to Plaintiffs and the Class, and that such interest be awarded at the highest legal rate from and after the date this class action complaint is first served on Defendants.
f) Defendants are to be jointly and severally responsible financially for the costs and expenses of a Court-approved notice program through post and media designed to give immediate notification to the Class.
g) Further relief for Plaintiffs and the Class as may be just and proper.

JURY TRIAL DEMANDED

Under Federal Rule of Civil Procedure 38(b), Plaintiffs demand a trial by jury of all the claims asserted in this Complaint so triable.
`
`
`
`Dated: April 26, 2024
`
`
`
`By:
`
`/s/ Joseph R. Saveri
`Joseph R. Saveri
`
`
`
`
`
`Joseph R. Saveri (State Bar No. 130064)
`Cadio Zirpoli (State Bar No. 179108)
`Christopher K. L. Young (State Bar No. 318371)
`Elissa Buchanan (State Bar No. 249996)
`JOSE