`United States District Court
`Northern District of California
`
`
`
`UNITED STATES DISTRICT COURT
`NORTHERN DISTRICT OF CALIFORNIA
`
`In Re Mosaic LLM Litigation
`
`Case No. 24-cv-01451-CRB (LJC)
`
`
`ORDER RESOLVING ECF NO. 176
`Re: Dkt. No. 176
`
`
`Before the Court is the parties’ joint discovery letter brief regarding additional proposed
`search terms and Plaintiffs’ allegedly deficient responses to interrogatories. The Court assumes
`the parties’ familiarity with the procedural and factual background of the case.
`Defendants’ request that Plaintiffs run search terms 1 to 27 is denied. Defendants’ request
`that Plaintiffs run search terms 28 to 99 is denied, and in lieu of these search terms, Defendants are
`ordered to identify up to ten search terms regarding “licensing agreements for AI training data and
`related commentary.” The parties are ordered to further meet and confer to reach an agreement on
`the replacement terms. Plaintiffs are ordered to supplement their responses to Interrogatory
`No. 11 (No. 14 for Plaintiff Keene).
`I. Additional Search Terms
`The parties agreed on thirty search terms that Plaintiffs would run against their ESI.
`Defendants now request that Plaintiffs run an additional ninety-nine search terms. ECF No. 176
`at 3. Defendants’ proposed additional search terms, attached to the instant discovery brief at
`Exhibit A, fall into two categories. Terms 1 to 27 relate to third-party AI tools (such as
`“ChatGPT!”, “Ernie!”, and “Claude!”) and terms 28 to 99 relate to licensing agreements (such as
`“Musk! AND (licens! OR train! OR deal! OR agreem! OR data! OR right! OR AI! OR GenAI!
`Case 3:24-cv-01451-CRB Document 180 Filed 11/04/25 Page 1 of 7
`OR pay! OR contract! OR royalt!). See ECF No. 176-1. Plaintiffs object that terms 1 to 27 are
`irrelevant and overbroad and that terms 28 to 99 are not proportionate to the needs of the case. See
`ECF No. 176 at 4.
`A. Terms 1 to 27
`Terms 1 to 27 “seek documents regarding Plaintiffs’ use of, and commentary about, third-
`party AI tools.” ECF No. 176 at 1. Defendants contend that terms 1 to 27 seek documents
`relevant to their fair-use defense, arguing that “Plaintiffs’ use of, and views about, AI tools are
`directly relevant to (1) the transformative nature of these technologies, (2) the public benefits they
`deliver, and (3) Plaintiffs’ theory that they cause market harm.” ECF No. 176 at 1. They also
`argue that the terms are responsive to their RFP Nos. 16 (seeking documents relating to Plaintiffs’
`use of LLMs), 24 (seeking documents concerning the market for copyrighted works in training
`LLMs or other AI technology) and 32 (seeking documents regarding Plaintiffs’ knowledge of the
`use of their copyrighted works for training LLMs). Id.; see, e.g., ECF No. 176-4 at 32-35 (Plaintiff
`Rebecca Makkai’s Objections to Defendants’ First Set of RFPs). Plaintiffs argue that their own
`use of LLMs is irrelevant to Defendants’ fair use defense and that, in any event, they have already
`produced documents hitting on the search string “‘generative ai*’ OR ‘genai*’ OR ‘gen ai*’ OR
`‘artificial intelligence’ OR ‘ai*’.” ECF No. 176 at 4. Without conceding relevance, they argue
`that this search string has already hit on documents related to their use of third-party LLMs, and
`running 27 additional search terms consisting of the names of third-party companies and LLMs
`would be redundant.
`The Court agrees with Plaintiffs. Federal Rule of Civil Procedure 26(b) provides that
`“[p]arties may obtain discovery regarding any nonprivileged matter that is relevant to any party's
`claim or defense and proportional to the needs of the case.” Defendants have not shown that their
`proposed terms are designed to hit on relevant documents. They contend that terms 1 to 27 will
`hit on documents that are “directly relevant to the first and fourth fair-use factors” of their fair-use
`defense. The Copyright Act provides that use of a copyrighted work for purposes considered
`“fair,” such as for “criticism, comment, news reporting, teaching, … scholarship, or research, is
`not an infringement of copyright.” 17 U.S.C. § 107. Although fair use is a flexible concept, the
`Copyright Act instructs courts to consider four factors to assess if an otherwise infringing use is
`fair:
`
`(1) the purpose and character of the use, including whether such use
`is of a commercial nature or is for nonprofit educational purposes;
`(2) the nature of the copyrighted work; (3) the amount and
`substantiality of the portion used in relation to the copyrighted work
`as a whole; and (4) the effect of the use upon the potential market for
`or value of the copyrighted work.
`Id.; Google LLC v. Oracle America, Inc., 593 U.S. 1, 20 (2021). The focus of the first factor is “to
`see … whether the new work merely supersede[s] the objects of the original creation or instead
`adds something new, with a further purpose or different character, altering the first with new
`expression, meaning, or message.” Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 579 (1994)
`(quotation marks and citations omitted); see Andy Warhol Found. for the Visual Arts, Inc. v.
`Goldsmith, 598 U.S. 508, 528 (2023). “[I]t asks, in other words, whether and to what extent the
`new work is transformative.” Campbell, 510 U.S. at 579 (quotation marks and citations omitted).
`Thus, the relevant inquiry under factor one is whether, and to what extent, MosaicML’s alleged
`use of Plaintiffs’ works alters or transforms Plaintiffs’ works. The undersigned does not see, and
`Defendants have not explained, how documents regarding Plaintiffs’ use of or commentary about
`other generative AI models would bear on “whether and to what extent” MosaicML’s alleged use
`of Plaintiffs’ works is transformative. Id.
`The fourth factor, considered the most important, “requires courts to consider … the extent
`of market harm caused by the particular actions of the alleged infringer” and “whether unrestricted
`and widespread conduct of the sort engaged in by the defendant ... would result in a substantially
`adverse impact on the potential market for the original.” Id. (quotation marks omitted).
`Defendants contend that terms 1 to 27 will allow them to assess “whether any market impact can
`be traced to the MPT models, which are not generally available, as distinct from generally
`available AI models.” ECF No. 176 at 2. The Court agrees, at a general level, that documents
`showing the market harm to Plaintiffs caused by third-party generative AI models could be
`relevant both to show the impact of “unrestricted and widespread conduct of the sort engaged in
`by the defendant” and to show the impact of Defendants’ versus third parties’ models. Campbell,
`510 U.S. at 579. But terms 1 to 27 are not tailored to hit on documents regarding the harm caused
`by third-party models; rather, they are designed to hit on all documents that include the names of certain
`models. This will presumably encompass documents regarding Plaintiffs’ “use of” and opinions
`of these models, which Defendants have not shown are relevant to the fourth factor. ECF No. 176
`at 1. Plaintiffs have already produced documents responsive to the string “‘generative ai*’ OR
`‘genai*’ OR ‘gen ai*’ OR ‘artificial intelligence’ OR ‘ai*’.” Absent a more robust showing of
`relevance, requiring them to run the names of twenty-seven specific generative AI models and
`companies against their ESI is disproportionate to the needs of the case. Defendants’ request that
`Plaintiffs run terms 1 to 27 is denied.
`B. Terms 28 to 99
`Terms 28 to 99 “concern licensing agreements for AI training data and related
`commentary.” ECF No. 176 at 2. The terms generally use the same format: the name of a
`publisher, technology company or CEO involved in generative AI, or news source, in conjunction
`with the string: “(licens! OR train! OR deal! OR agreem! OR data! OR right! OR AI! OR GenAI!
`OR pay! OR contract! OR royalt!).” See ECF No. 176-1.¹ Although Plaintiffs do not dispute that
`documents regarding licensing of their asserted copyrighted works are relevant, they argue that
`terms 28 to 99 are overbroad and disproportionate to the needs of the case. They note that they
`have already collected and produced documents hitting on the parties’ negotiated “search terms
`relating to agreements with publishers and third parties regarding the asserted works,” and have
`agreed to run the string “(Licens! OR agree! OR collab! OR partner! OR permiss!) w/10 data!
`AND train! OR ‘training data’ OR dataset!)” and produce responsive documents, which they
`represent will cover documents “involving seeking or contemplating the use of works as training
`data for AI.” ECF No. 176 at 4 n.6. Plaintiffs contend that terms 28 to 99, which are not
`“connected to Plaintiffs’ asserted works” or licensing for AI training purposes, are thus “overbroad
`and duplicative.” Id. at 4.
`
`1 For example, term 48 is “((Open! /3 AI!) OR OpenAI!) AND (licens! OR train! OR deal! OR
`agreem! OR data! OR right! OR AI! OR GenAI! OR pay! OR contract! OR royalt!).” ECF No.
`176-1 at 4. Term 92 is “(Le /3 Monde!) AND (licens! OR train! OR deal! OR agreem! OR data!
`OR right! OR AI! OR GenAI! OR pay! OR contract! OR royalt!).” Id. at 6.
`The Court agrees that terms 28 to 99 are unreasonably overbroad. Term 89 is illustrative.
`This proposed term is: “Time! AND (licens! OR train! OR deal! OR agreem! OR data! OR right!
`OR AI! OR GenAI! OR pay! OR contract! OR royalt!)” and hits on 101,797 unique documents
`across the five Plaintiffs. This presumably encompasses documents pertaining to any
`“agreement,” any “deal,” or any “license” the Plaintiffs have with Time, in no way limited to
`documents regarding Plaintiffs’ asserted works or “licensing agreements for AI training data and
`related commentary.” Similarly, term 53—“author! AND (licens! OR train! OR deal! OR agreem!
`OR data! OR right! OR AI! OR GenAI! OR pay! OR contract! OR royalt!)”—hitting on 54,431
`unique documents, is not tailored to hit on relevant and responsive documents. Requiring
`Plaintiffs to run search terms that will likely hit on tens of thousands of unresponsive and
`irrelevant documents is overly burdensome and not proportional to the needs of the case. See Fed.
`R. Civ. P. 26(b).
`Plaintiffs ask the Court to deny Defendants’ request for them to run terms 28 to 99, and
`instead permit Plaintiffs “to run the search terms proposed in their July 17, 2025 proposals
`identified in Exhibit F.” ECF No. 176 at 5. There is no Exhibit F attached to the instant dispute,
`and the Court assumes that Plaintiffs are referring to their proposal at Exhibit E, where they offer
`to run the following search terms: “(big! /3 tech!) w/4 (license OR licensing OR licenses OR deal
`OR deals OR agreem! OR right OR rights)” and “Transform! w/4 (license OR licensing OR
`licenses OR deal OR deals OR agreem! OR right OR rights).” ECF No. 176-5 at 2. The Court declines to
`adopt Plaintiffs’ proposal and instead orders the following:
`First, if Plaintiffs have not already done so, they must run the string “(Licens! OR agree!
`OR collab! OR partner! OR permiss!) w/10 data! AND train! OR ‘training data’ OR dataset!)” and
`produce responsive documents immediately.
`Second, in lieu of terms 28 to 99, by November 6, 2025, Defendants may propose up to ten
`replacement search terms designed to hit on documents concerning “licensing agreements for AI
`training data and related commentary.” ECF No. 176 at 2. If Plaintiffs object to Defendants’
`proposed terms, they must send their objections to Defendants by November 10, 2025, and, for
`each term they object to, they must (1) provide hit counts (using the same format as the parties’
`Exhibit A) and (2) identify which portion(s) of the proposed terms generate noise. The parties
`must then meet and confer in person or over video conference and engage in a good-faith effort to
`reach an agreement as to the ten search terms. If the parties are unable to resolve their dispute,
`they may submit a joint status report to the Court, not to exceed five pages, by November 13,
`2025. The status report must include Defendants’ proposed terms and Plaintiffs’ objections and
`hit reports.
`II. Response to Interrogatory No. 11/No. 14
`In addition to the search term dispute, Defendants complain that the five Plaintiffs’
`responses to Interrogatory No. 11 (Interrogatory No. 14 for Plaintiff Keene) are deficient. ECF
`No. 176 at 2. Interrogatory No. 11/14 requests that each Plaintiff “identify all locations on the
`internet, including any websites or data repositories, where any copies of Your Asserted Works, in
`whole or in part, are or have been available, and for each, state whether the Work was made
`available with Your authorization and any steps You have taken to have those copies
`removed.” ECF No. 176-4 at 4 (Interrogatories to Plaintiff Rebecca Makkai). Plaintiffs argue that
`as the internet includes an estimated 1.2 billion webpages, it is unreasonable for them to identify
`all locations where their works are available. ECF No. 176 at 4. They propose identifying the
`“traditional distribution channels” where their works are available and any takedown requests they
`have made. Id. at 5. Defendants characterize Plaintiffs’ objections as unnecessary hand-waving
`and clarify that Plaintiffs do not need to “search the entire internet,” but rather just respond “to the
`fullest extent possible” after making a reasonable investigation. Id.
`Federal Rule of Civil Procedure 33(b) requires that interrogatories be answered
`“fully.” “In general, a responding party is not required ‘to conduct extensive research in order to
`answer an interrogatory, but a reasonable effort to respond must be made.’” Gorrell v. Sneath,
`292 F.R.D. 629, 632 (E.D. Cal. 2013) (quoting Haney v. Saldana, 04-cv-05936, 2010 WL
`3341939, at *3 (E.D. Cal. Sept. 21, 2007)). “A party has an obligation to conduct a reasonable
`inquiry into the factual basis of its discovery responses.” Nat’l Acad. of Recording Arts & Scis.,
`Inc. v. On Point Events, LP, 256 F.R.D. 678, 680 (C.D. Cal. 2009). The Court agrees with
`Defendants that, as Plaintiffs need not “conduct extensive research” to respond to the
`interrogatory, Plaintiffs are not required to “scour all locations on the internet and locate every
`single webpage where Plaintiffs’ copyrighted works appear.” Gorrell, 292 F.R.D. at 632
`(quotation marks omitted); ECF No. 176 at 5. Plaintiffs may satisfy their obligation to fully
`respond by identifying all locations on the internet where they know their works are available,
`conducting a reasonable and good-faith search to identify other locations, identifying the “traditional
`distribution channels” containing Plaintiffs’ asserted works, and describing all steps Plaintiffs have
`taken to remove unauthorized copies. Plaintiffs must supplement their responses to Interrogatory
`Nos. 11/14 by November 12, 2025.
`IT IS SO ORDERED.
`Dated: November 4, 2025
`LISA J. CISNEROS
`United States Magistrate Judge