by Dennis Crouch
Anthropic and a certified class of book authors have reportedly reached a class-wide settlement in Bartz v. Anthropic PBC, the Northern District of California case challenging the company’s ingestion of millions of books as training data to build Claude. The parties filed a notice on August 26, 2025, stating that they executed a binding term sheet and will seek preliminary approval in early September. The procedural posture reveals two critical features: (1) Judge Alsup’s June order finding that training on lawfully obtained books constituted fair use as a matter of law; and (2) his conclusion that Anthropic’s acquisition and storage of “pirated” works in a central library could constitute infringement—potentially exposing the company to staggering statutory damages.
The settlement averts that trial (and class certification appeal) but not the broader policy questions surrounding AI training and copyright. Judge Alsup’s analysis demonstrates why fair use isn’t a blanket defense for LLM development. While he granted summary judgment for Anthropic on training with lawfully obtained copies, he distinguished between “training” and maintaining a “shadow library” as separate uses under copyright law. This distinction matters: even where certain training uses qualify as fair use, developers remain exposed to infringement claims based on how they acquire and store training materials. The ruling reinforces that fair use under 17 U.S.C. § 107 requires a fact-specific, work-by-work analysis—not a categorical exemption for AI development.
The Evolving Policy Landscape
The settlement occurs against a shifting federal backdrop on AI and copyright policy. Earlier this year, the U.S. Copyright Office released a pre-publication report concluding that many scraping-based training regimes would constitute infringement absent explicit permission from rightsholders. However, the day after the report’s release, President Trump removed Register of Copyrights Shira Perlmutter – a move I interpret as partially reflecting the administration’s prioritization of AI development over copyright enforcement concerns.
This policy tension raises fundamental questions about the appropriate balance between innovation incentives and authorial rights. One way to frame current developments is as an effort to normalize uncompensated use of copyrighted works for AI “infrastructure.” From a constitutional perspective, if the government were to mandate or effectively authorize such uncompensated use, this might constitute a Taking requiring just compensation under the Fifth Amendment. Alternatively, one might invoke the more provocative analogy of Johnson v. M’Intosh, 21 U.S. (8 Wheat.) 543 (1823), which describes assertions of sovereign prerogative over private interests without compensation requirements.
Toward a Licensing Framework
I generally favor an approach that promotes AI development while recognizing authors’ legitimate interests in compensation. The challenge lies in avoiding copyright thickets that would make legitimate AI training prohibitively expensive or practically impossible because of transaction costs, holdout problems, and fair use uncertainty. But U.S. copyright law has solved this type of problem before, using statutory and collective licensing schemes that reduce these barriers while ensuring rightsholder compensation. Here, a tailored training-data license – potentially with opt-out provisions – could operate well: acknowledging authors’ economic interests, reducing incentives for clandestine scraping, and bringing legitimate firms into compliance.
Competitive Implications
Anthropic’s willingness to settle reflects its position as a well-capitalized market leader. Paying substantial settlement amounts is feasible for companies with major cash reserves like Anthropic, Google, and OpenAI, but this dynamic creates concerning competitive effects. If access to lawful training corpora requires large upfront payments, the playing field tilts toward incumbents and disadvantages new entrants and undercapitalized startups by creating a costly barrier to entry. I believe any new framework should take pains to ensure broad access rather than exclusive arrangements, with cost structures that don’t erect barriers to entry for the next wave of AI innovators.
Technical Workarounds
From an engineering perspective, some developers are exploring approaches that reduce copying risks altogether. A streaming “read-and-destroy” methodology could download and process training materials without creating permanent copies – perhaps maintaining data only in temporary memory during direct GPU training. While such approaches diverge from current training norms, they represent potential technical solutions that could sidestep infringement concerns entirely.
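The streaming idea can be illustrated with a minimal sketch. The functions below (`fetch_chunks`, `tokenize`, `train_step`, `stream_train`) are hypothetical names I am using for illustration, not any actual vendor pipeline: each chunk of text is fetched, used for a single training step, and immediately discarded, so only aggregate model state persists and no permanent copy of the source text is ever written to disk.

```python
# Illustrative "read-and-destroy" streaming sketch (hypothetical names).
# Each chunk lives only in temporary memory; only aggregate statistics
# (a stand-in for model weights) survive the loop.

from typing import Dict, Iterator, List


def fetch_chunks(source: List[str]) -> Iterator[str]:
    """Stand-in for a network stream; yields one chunk at a time."""
    for chunk in source:
        yield chunk  # only this chunk is resident in memory


def tokenize(chunk: str) -> List[str]:
    """Toy tokenizer: whitespace split."""
    return chunk.split()


def train_step(tokens: List[str], stats: Dict[str, int]) -> None:
    """Stand-in for a GPU training step: only aggregate token counts
    survive; the underlying text itself does not."""
    for tok in tokens:
        stats[tok] = stats.get(tok, 0) + 1


def stream_train(source: List[str]) -> Dict[str, int]:
    """Process a stream chunk-by-chunk without persisting any chunk."""
    stats: Dict[str, int] = {}
    for chunk in fetch_chunks(source):
        train_step(tokenize(chunk), stats)
        del chunk  # raw text dropped; nothing is written to disk
    return stats


if __name__ == "__main__":
    model_stats = stream_train(["the cat sat", "the dog ran"])
    print(model_stats)
```

Whether a court would treat these ephemeral in-memory copies differently from a retained library is, of course, an open legal question; the sketch only shows that the engineering pattern is straightforward.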
What are your thoughts on how this should all play out?