Raw Story v. OpenAI: The Constitutional Hurdle That Tripped Up Raw Story's AI Lawsuit

by Dennis Crouch

In my view, some of the weakest anti-AI copyright claims have been brought under 17 U.S.C. § 1202(b)(1), a provision of the Digital Millennium Copyright Act (DMCA) that prohibits intentional removal or alteration of copyright management information (CMI). The statute broadly defines CMI to include not just copyright notices, but also titles, author information, owner information, terms of use, and other identifying information conveyed with copies of works. Any violation also requires proof that the CMI-remover knew or had “reasonable grounds to know” that such removal would induce, enable, facilitate, or conceal copyright infringement.

In Raw Story v. OpenAI, the online news organization alleged that OpenAI violated § 1202(b)(1) by removing CMI from thousands of its news articles when incorporating them into training datasets for ChatGPT. Notably, the plaintiffs did not bring direct copyright infringement claims, instead focusing solely on alleged CMI removal. The articles in question were published online with author, title, and copyright information, which plaintiffs claimed OpenAI stripped away when creating its training sets. While OpenAI has not published the contents of these training sets, plaintiffs relied on “approximations” suggesting their articles appeared without CMI. They argued this evidenced intentional CMI removal, reasoning that if ChatGPT had been trained on articles with intact CMI, it would output such information when generating responses.

The most recent news in the case is that S.D.N.Y. Judge Colleen McMahon has dismissed the claims brought by Raw Story (and AlterNet Media) — holding that the plaintiffs lacked Article III standing to pursue the case.

Raw Story Order

The first three Articles of the U.S. Constitution establish our system of separated powers: Article I vests legislative power in Congress; Article II establishes executive power under the President; and Article III creates the federal judiciary, limiting its power to deciding actual “Cases” and “Controversies” rather than hypothetical questions. From this constitutional limit, the Supreme Court has developed a standing doctrine that requires plaintiffs to establish three essential elements: (1) a concrete and particularized injury-in-fact that is actual or imminent, not conjectural or hypothetical, (2) a causal connection between that injury and the defendant’s challenged conduct, and (3) a likelihood that the injury would be redressed by a favorable judicial decision. Lujan v. Defenders of Wildlife, 504 U.S. 555 (1992). The actual injury requirement is particularly crucial: the Supreme Court has repeatedly emphasized that plaintiffs cannot rely on bare statutory violations alone, but must demonstrate real-world harm or a material risk of harm.

The standing dispute in this case centers on injury, with the district court agreeing with OpenAI that, even if CMI removal violated the statute, the plaintiffs failed to show that they were harmed by the purported removal. This fits well with the Supreme Court’s recent decision in TransUnion LLC v. Ramirez, 594 U.S. 413 (2021), which held that plaintiffs lacked standing to sue over inaccurate information in internal credit files that was never disseminated to third parties. Drawing this parallel, the court concluded that alleged CMI removal from internal training data, without more, cannot support standing.

Particularly noteworthy was the court’s rejection of plaintiffs’ attempt to analogize their injury to traditional property-based copyright harms – where infringement is the injury. While acknowledging that Congress intended § 1202 to protect copyright interests, Judge McMahon emphasized that the DMCA’s CMI provisions serve a different purpose than core copyright protections – they aim to “ensure the integrity of the electronic marketplace by preventing fraud and misinformation,” not to vindicate property rights directly.

The court distinguished this case from situations where CMI removal might support standing, such as if plaintiffs had alleged ChatGPT actually disseminated facsimiles of their articles without proper attribution — as has been alleged in other cases against OpenAI.

The court also addressed plaintiffs’ alternative argument for standing to seek injunctive relief based on the risk of future harm from ChatGPT potentially outputting their content without attribution. While acknowledging that threatened future injuries can sometimes support claims for injunctive relief, the district court found the allegations too speculative given the vast amount of training data involved and lack of specific examples suggesting imminent harm to plaintiffs’ works.

The court attempted to add some caveats – saying that the ruling should not be read as blessing unauthorized use of copyrighted materials for AI training. Rather, the holding here is that the DMCA CMI removal claim is not the proper avenue for addressing the issue — leaving open whether there may be another legal theory that works — presumably including direct copyright infringement claims.