Case 3:24-cv-01451-CRB Document 180 Filed 11/04/25 Page 1 of 7

UNITED STATES DISTRICT COURT
NORTHERN DISTRICT OF CALIFORNIA

In Re Mosaic LLM Litigation

Case No. 24-cv-01451-CRB (LJC)

ORDER RESOLVING ECF NO. 176
Re: Dkt. No. 176

Before the Court is the parties’ joint discovery letter brief regarding additional proposed
search terms and Plaintiffs’ allegedly deficient responses to interrogatories. The Court assumes
the parties’ familiarity with the procedural and factual background of the case.

Defendants’ request that Plaintiffs run search terms 1 to 27 is denied. Defendants’ request
that Plaintiffs run search terms 28 to 99 is denied, and in lieu of these search terms, Defendants are
ordered to identify up to ten search terms regarding “licensing agreements for AI training data and
related commentary.” The parties are ordered to further meet and confer to reach an agreement on
the replacement terms. Plaintiffs are ordered to supplement their responses to Interrogatory
No. 11 (No. 14 for Plaintiff Keene).

I. Additional Search Terms

The parties agreed on thirty search terms that Plaintiffs would run against their ESI.
Defendants now request that Plaintiffs run an additional ninety-nine search terms. ECF No. 176
at 3. Defendants’ proposed additional search terms, attached to the instant discovery brief at
Exhibit A, fall into two categories. Terms 1 to 27 relate to third-party AI tools (such as
“ChatGPT!” “Ernie!” “Claude!”) and terms 28 to 99 relate to licensing agreements (such as
“Musk! AND (licens! OR train! OR deal! OR agreem! OR data! OR right! OR AI! OR GenAI!
OR pay! OR contract! OR royalt!)”). See ECF No. 176-1. Plaintiffs object that terms 1 to 27 are
irrelevant and overbroad and that terms 28 to 99 are not proportionate to the needs of the case. See
ECF No. 176 at 4.

A. Terms 1 to 27

Terms 1 to 27 “seek documents regarding Plaintiffs’ use of, and commentary about, third-party
AI tools.” ECF No. 176 at 1. Defendants contend that terms 1 to 27 seek documents
relevant to their fair-use defense, arguing that “Plaintiffs’ use of, and views about, AI tools are
directly relevant to (1) the transformative nature of these technologies, (2) the public benefits they
deliver, and (3) Plaintiffs’ theory that they cause market harm.” ECF No. 176 at 1. They also
argue that the terms are responsive to their RFP Nos. 16 (seeking documents relating to Plaintiffs’
use of LLMs), 24 (seeking documents concerning the market for copyrighted works in training
LLMs or other AI technology), and 32 (seeking documents regarding Plaintiffs’ knowledge of the
use of their copyrighted works for training LLMs). Id.; see, e.g., ECF No. 176-4 at 32-35 (Plaintiff
Rebecca Makkai’s Objections to Defendants’ First Set of RFPs). Plaintiffs argue that their own
use of LLMs is irrelevant to Defendants’ fair-use defense and that, in any event, they have already
produced documents hitting on the search string “‘generative ai*’ OR ‘genai*’ OR ‘gen ai*’ OR
‘artificial intelligence’ OR ‘ai*’.” ECF No. 176 at 4. Without conceding relevance, they argue
that this search string has already hit on documents related to their use of third-party LLMs, and
that running 27 additional search terms consisting of the names of third-party companies and LLMs
would be redundant.

The Court agrees with Plaintiffs. Federal Rule of Civil Procedure 26(b) provides that
“[p]arties may obtain discovery regarding any nonprivileged matter that is relevant to any party’s
claim or defense and proportional to the needs of the case.” Defendants have not shown that their
proposed terms are designed to hit on relevant documents. They contend that terms 1 to 27 will
hit on documents that are “directly relevant to the first and fourth fair-use factors” of their fair-use
defense. The Copyright Act provides that use of a copyrighted work for purposes considered
“fair,” such as for “criticism, comment, news reporting, teaching, … scholarship, or research, is
not an infringement of copyright.” 17 U.S.C. § 107. Although fair use is a flexible concept, the
Copyright Act instructs courts to consider four factors to assess whether an otherwise infringing use is
fair:

(1) the purpose and character of the use, including whether such use
is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work; (3) the amount and
substantiality of the portion used in relation to the copyrighted work
as a whole; and (4) the effect of the use upon the potential market for
or value of the copyrighted work.

Id.; Google LLC v. Oracle America, Inc., 593 U.S. 1, 20 (2021). The focus of the first factor is “to
see … whether the new work merely supersede[s] the objects of the original creation or instead
adds something new, with a further purpose or different character, altering the first with new
expression, meaning, or message.” Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 579 (1994)
(quotation marks and citations omitted); see Andy Warhol Found. for the Visual Arts, Inc. v.
Goldsmith, 598 U.S. 508, 528 (2023). “[I]t asks, in other words, whether and to what extent the
new work is transformative.” Campbell, 510 U.S. at 579 (quotation marks and citations omitted).
Thus, the relevant inquiry under factor one is whether, and to what extent, MosaicML’s alleged
use of Plaintiffs’ works alters or transforms Plaintiffs’ works. The undersigned does not see, and
Defendants have not explained, how documents regarding Plaintiffs’ use of or commentary about
other generative AI models would bear on “whether and to what extent” MosaicML’s alleged use
of Plaintiffs’ works is transformative. Id.

The fourth factor, considered the most important, “requires courts to consider … the extent
of market harm caused by the particular actions of the alleged infringer” and “whether unrestricted
and widespread conduct of the sort engaged in by the defendant … would result in a substantially
adverse impact on the potential market for the original.” Id. (quotation marks omitted).
Defendants contend that terms 1 to 27 will allow them to assess “whether any market impact can
be traced to the MPT models, which are not generally available, as distinct from generally
available AI models.” ECF No. 176 at 2. The Court agrees, at a general level, that documents
showing the market harm to Plaintiffs caused by third-party generative AI models could be
relevant both to show the impact of “unrestricted and widespread conduct of the sort engaged in
by the defendant” and to show the impact of Defendants’ versus third parties’ models. Campbell,
510 U.S. at 579. But terms 1 to 27 are not tailored to hit on documents regarding the harm caused
by third-party models; rather, they are designed to hit on all documents that include the names of certain
models. This will presumably encompass documents regarding Plaintiffs’ “use of” and opinions
about these models, which Defendants have not shown are relevant to the fourth factor. ECF No. 176
at 1. Plaintiffs have already produced documents responsive to the string “‘generative ai*’ OR
‘genai*’ OR ‘gen ai*’ OR ‘artificial intelligence’ OR ‘ai*’.” Absent a more robust showing of
relevance, requiring them to run the names of twenty-seven specific generative AI models and
companies against their ESI is disproportionate to the needs of the case. Defendants’ request that
Plaintiffs run terms 1 to 27 is denied.

B. Terms 28 to 99

Terms 28 to 99 “concern licensing agreements for AI training data and related
commentary.” ECF No. 176 at 2. The terms generally use the same format: the name of a
publisher, technology company or CEO involved in generative AI, or news source, in conjunction
with the string: “(licens! OR train! OR deal! OR agreem! OR data! OR right! OR AI! OR GenAI!
OR pay! OR contract! OR royalt!).” See ECF No. 176-1.[1] Although Plaintiffs do not dispute that
documents regarding licensing of their asserted copyrighted works are relevant, they argue that
terms 28 to 99 are overbroad and disproportionate to the needs of the case. They note that they
have already collected and produced documents hitting on the parties’ negotiated “search terms
relating to agreements with publishers and third parties regarding the asserted works,” and have
agreed to run the string “(Licens! OR agree! OR collab! OR partner! OR permiss!) w/10 data!
AND train! OR ‘training data’ OR dataset!)” and produce responsive documents, which they
represent will cover documents “involving seeking or contemplating the use of works as training
data for AI.” ECF No. 176 at 4 n.6. Plaintiffs contend that terms 28 to 99, which are not
“connected to Plaintiffs’ asserted works” or licensing for AI training purposes, are thus “overbroad
and duplicative.” Id. at 4.

[1] For example, term 48 is “((Open! /3 AI!) OR OpenAI!) AND (licens! OR train! OR deal! OR
agreem! OR data! OR right! OR AI! OR GenAI! OR pay! OR contract! OR royalt!).” ECF No.
146-1 at 4. Term 92 is “(Le /3 Monde!) AND (licens! OR train! OR deal! OR agreem! OR data!
OR right! OR AI! OR GenAI! OR pay! OR contract! OR royalt!).” Id. at 6.

The Court agrees that terms 28 to 99 are unreasonably overbroad. Term 89 is illustrative.
This proposed term is “Time! AND (licens! OR train! OR deal! OR agreem! OR data! OR right!
OR AI! OR GenAI! OR pay! OR contract! OR royalt!)” and hits on 101,797 unique documents
across the five Plaintiffs. This presumably encompasses documents pertaining to any
“agreement,” any “deal,” or any “license” the Plaintiffs have with Time, in no way limited to
documents regarding Plaintiffs’ asserted works or “licensing agreements for AI training data and
related commentary.” Similarly, term 53, “author! AND (licens! OR train! OR deal! OR agreem!
OR data! OR right! OR AI! OR GenAI! OR pay! OR contract! OR royalt!),” which hits on 54,431
unique documents, is not tailored to hit on relevant and responsive documents. Requiring
Plaintiffs to run search terms that will likely hit on tens of thousands of unresponsive and
irrelevant documents is overly burdensome and not proportional to the needs of the case. See Fed.
R. Civ. P. 26(b).

Plaintiffs ask the Court to deny Defendants’ request for them to run terms 28 to 99, and
instead permit Plaintiffs “to run the search terms proposed in their July 17, 2025 proposals
identified in Exhibit F.” ECF No. 176 at 5. There is no Exhibit F attached to the instant dispute,
and the Court assumes that Plaintiffs are referring to their proposal at Exhibit E, where they offer
to run the following search terms: “(big! /3 tech!) w/4 (license OR licensing OR licenses OR deal
OR deals OR agreem! OR right OR rights)” and “Transform! w/4 (license OR licensing OR
licenses OR deal OR deals OR agreem! OR right OR rights).” ECF No. 176-5 at 2. The Court declines to
adopt Plaintiffs’ proposal and instead orders the following:

First, if Plaintiffs have not already done so, they must run the string “(Licens! OR agree!
OR collab! OR partner! OR permiss!) w/10 data! AND train! OR ‘training data’ OR dataset!)” and
produce responsive documents immediately.

Second, in lieu of terms 28 to 99, by November 6, 2025, Defendants may propose up to ten
replacement search terms designed to hit on documents concerning “licensing agreements for AI
training data and related commentary.” ECF No. 176 at 2. If Plaintiffs object to Defendants’
proposed terms, they must send their objections to Defendants by November 10, 2025, and, for
each term they object to, they must (1) provide hit counts (using the same format as the parties’
Exhibit A) and (2) identify which portion(s) of the proposed terms generate noise. The parties
must then meet and confer in person or over video conference and engage in a good-faith effort to
reach an agreement as to the ten search terms. If the parties are unable to resolve their dispute,
they may submit a joint status report to the Court, not to exceed five pages, by November 13,
2025. The status report must include Defendants’ proposed terms and Plaintiffs’ objections and
hit reports.

II. Response to Interrogatory No. 11/No. 14

In addition to the search term dispute, Defendants complain that the five Plaintiffs’
responses to Interrogatory No. 11 (Interrogatory No. 14 for Plaintiff Keene) are deficient. ECF
No. 176 at 2. Interrogatory No. 11/14 requests that each Plaintiff “identify all locations on the
internet, including any websites or data repositories, where any copies of Your Asserted Works, in
whole or in part, are or have been available, and for each, state whether the Work was made
available with Your authorization and any steps You have taken to have those copies
removed.” ECF No. 176-4 at 4 (Interrogatories to Plaintiff Rebecca Makkai). Plaintiffs argue that,
because the internet includes an estimated 1.2 billion webpages, it is unreasonable for them to identify
all locations where their works are available. ECF No. 176 at 4. They propose identifying the
“traditional distribution channels” where their works are available and any takedown requests they
have made. Id. at 5. Defendants characterize Plaintiffs’ objections as unnecessary hand-waving
and clarify that Plaintiffs do not need to “search the entire internet,” but rather just respond “to the
fullest extent possible” after making a reasonable investigation. Id.

Federal Rule of Civil Procedure 33(b) requires that interrogatories be answered
“fully.” “In general, a responding party is not required ‘to conduct extensive research in order to
answer an interrogatory, but a reasonable effort to respond must be made.’” Gorrell v. Sneath,
292 F.R.D. 629, 632 (E.D. Cal. 2013) (quoting Haney v. Saldana, 04-cv-05936, 2010 WL
3341939, at *3 (E.D. Cal. Sept. 21, 2007)). “A party has an obligation to conduct a reasonable
inquiry into the factual basis of its discovery responses.” Nat’l Acad. of Recording Arts & Scis.,
Inc. v. On Point Events, LP, 256 F.R.D. 678, 680 (C.D. Cal. 2009). The Court agrees with
Defendants that, as Plaintiffs need not “conduct extensive research” to respond to the
interrogatory, Plaintiffs are not required to “scour all locations on the internet and locate every
single webpage where Plaintiffs’ copyrighted works appear.” Gorrell, 292 F.R.D. at 632
(quotation marks omitted); ECF No. 176 at 5. Plaintiffs may satisfy their obligation to fully
respond by identifying all locations on the internet where they know their works are available,
conducting a reasonable and good-faith search to identify other locations, identifying the “traditional
distribution channels” containing Plaintiffs’ asserted works, and describing all efforts Plaintiffs have
taken to remove unauthorized copies. Plaintiffs must supplement their responses to Interrogatory
Nos. 11/14 by November 12, 2025.

IT IS SO ORDERED.

Dated: November 4, 2025

LISA J. CISNEROS
United States Magistrate Judge