Copyright Thicket and President Trump’s AI Training Data Solution

by Dennis Crouch

President Trump’s recent remarks at the White House AI summit signaled an important policy shift in the AI copyright debate. In his view, a “common sense” application of intellectual property rules should allow AI developers to train on books and articles without paying for or negotiating a license for each one. The bottom line: requiring copyright licensing would hamper the continued growth of AI in the US.

 You can’t be expected to have a successful AI program when every single article, book, or anything else that you’ve read or studied, you’re supposed to pay for. Gee. I read a book. I’m supposed to pay somebody. And, you know, we we appreciate that, but you just can’t do it because it’s not doable. And if you’re going to try and do that, you’re not going to have a successful program.

The US continues to see AI development as a particular American national interest — both as the key to continued economic growth and for national security purposes.  For Trump, China immediately comes to mind, with his statement “China’s not” paying for copyright. “And if you’re going to be beating China . . . you have to be able to play by the same set of rules.”

Although his suggestion here is that training on copyrighted data should be fair use, President Trump concludes that AI outputs should not copy the underlying works: “Of course, you can’t copy or plagiarize an article. But if you read an article and learn from it, we have to allow AI to use that pool of knowledge without going through the complexity of contract negotiations of which there would be thousands for every time we use AI.”

These remarks come on the heels of upheaval at the Copyright Office and a new federal AI strategy. In May 2025, the Trump Administration removed Librarian of Congress Carla Hayden and, days later, fired Shira Perlmutter, the Register of Copyrights. Those firings appeared to be tied to the Copyright Office’s release of a landmark report on generative AI training that concluded many current industry practices “likely do not qualify as fair use.” That report (issued in pre-publication form) reasoned that ingesting entire copyrighted works into an AI is prima facie infringement absent an exception, and it ultimately suggested that in many situations the use “will not be fair use.”

Although President Trump discussed copyright in his speech, the actual AI Action Plan barely mentioned intellectual property, with only a brief note about addressing security risks to American IP.  There is also no mention of IP protection for AI outputs.

Trump’s remarks come amid dozens of pending lawsuits filed by authors, artists, publishers, and other rights-holders against AI developers, accusing them of copyright infringement for using books, articles, code, images, and music in training. The defendants have uniformly asserted fair use as a defense.

One way to think about the concern here in the AI copyright debate is the idea of “copyright thickets,” analogous to patent thickets. Patent thickets have been described as dense webs of overlapping patent rights that complicate or block subsequent technological development. Similarly, copyright thickets can block AI developers because rights are fragmented across vast amounts of training data, potentially requiring millions of individual permissions or licenses. When they arise, copyright thickets pose an even greater challenge than their patent counterparts because they last substantially longer. Patents generally last 20 years from filing and are most often held by the same company during the entire patent term. Copyrights span generations and are usually transferred multiple times during their term, often without recordation of ownership. The general idea here is that of an “anticommons.” Michael A. Heller, The Tragedy of the Anticommons: Property in the Transition from Marx to Markets, 111 HARV. L. REV. 621 (1998).

Inherent in the thicket problem is a generational tension: rights holders from past generations argue they are entitled to reap benefits from their creative efforts — they did the work and deserve the promised rights. However, those rights are now potentially blocking access to the best AI possible. The question then becomes one of balance: how to honor past creativity without obstructing the next generation’s ability to innovate.

One potential solution already used in copyright law is compulsory licensing pools. Such licensing mechanisms consolidate rights into a single pool accessible via standardized fees and terms. Adopting a similar model for AI training data could alleviate many of the transaction costs and complexities associated with copyright thickets, ensuring fair compensation for rights holders while fostering ongoing innovation. President Trump’s pessimism that “it’s not doable” is warranted if you imagine individual license negotiations with every US copyright holder on earth. But that issue can be addressed with collective licensing models. In its 2025 report mentioned above, the US Copyright Office examined this question in some depth but ultimately declined to endorse such regimes without true evidence of market failure.

In a recent law review article, Using Intellectual Property to Regulate Artificial Intelligence, 89 Mo. L. Rev. 1 (2024), I argued that IP law is an ineffective tool for directly governing AI’s trajectory and that over-reliance on IP as a regulatory tool can actually hinder innovation.  The article advances two primary claims that are directly relevant to the current copyright debate: first, that while IP plays a role in guiding innovative behaviors in AI development, it lacks the levers necessary to address the broader societal implications of AI technology; and second, that IP rights may actually hinder AI regulation and development through mechanisms like stringent copyright restrictions on training data. This second point is the primary focus of Trump’s remarks, as the tension between copyright protection and AI innovation represents exactly the kind of regulatory mismatch the article identified.  See also, Mark A. Lemley & Bryan Casey, Fair Learning, 99 Tex. L. Rev. 743 (2021) (arguing that AI ‘reading’ of copyright works for training should be considered fair use).

2 thoughts on “Copyright Thicket and President Trump’s AI Training Data Solution”

  1. 1

    As previously noted, the flip side of weak or non-existent copyright recovery for AI systems’ collection and distribution of data and text is likely to be that more original sources of data and text are placed behind paywalls to prevent digital scraping, which can result in inaccurate or incomplete AI system responses.
