US Copyright Office Generative AI Inquiry: Where are the Thresholds?

by Dennis Crouch

Generative Artificial Intelligence (GenAI) systems like Midjourney and ChatGPT that can generate creative works have brought a wave of new questions and complexities to copyright law. On the heels of a recent court decision denying registrability of an AI-created work, the U.S. Copyright Office has issued a formal notice of inquiry seeking public comments to help analyze AI’s copyright implications and form policy recommendations for both the Office and Congress. The notice is quite extensive and raises fundamental questions that many have been discussing for several years: copyrightability of AI outputs, use of copyrighted material to train AI systems, infringement liability, labeling of AI content, and more. The Copyright Office’s inquiry is an attempt to respond to AI’s rapidly growing impact on creative industries. [Link to the Notice]

The following is a rough overview of three core inquiries that I identified in the notice. You can also read the notice yourself by clicking the link above.

A core inquiry is whether original works that would ordinarily be copyrightable should be denied protection unless a human author is identified. Generative AI models produce outputs like text, art, music, and video that appear highly creative and would certainly meet copyright’s originality standard if created by natural persons. Further, if human contribution is required, the questions shift to the level of human contribution necessary and the procedural requirements to claim and prove human authorship. As the notice states, “Although we believe the law is clear that copyright protection in the United States is limited to works of human authorship, questions remain about where and how to draw the line between human creation and AI-generated content.” Factors could include the relative or absolute level of human input, creative control by the human, or even a word count. With copyright it is helpful to have some bright lines that streamline registration without substantial case-by-case lawyer input for each work, but any hard rule risks glossing over nuanced situations. Although the notice focuses on copyrightability, ownership questions will also come into play.

A second core inquiry focuses on the training data that is fundamental to today’s generative AI models. The Copyright Office seeks input on the legality of training generative models on copyrighted works obtained via the open internet but without an express license. In particular, the Office seeks information about “the collection and curation of AI datasets, how those datasets are used to train AI models, the sources of materials ingested into training, and whether permission by and/or compensation for copyright owners is or should be required when their works are included.” Presumably different training approaches could have different copyright implications. In particular, an approach that does not store or actually copy the underlying works would be less likely to be infringing.

In building the training model, we often have copying of works without a license, and so the key inquiry under current law appears to be the extent to which fair use protects the AI system developers. In other areas, Congress and the Copyright Office have stepped in with compulsory licensing models that could possibly work here, such as a system of providing a few pennies for each web page. Our system also supports voluntary collective licensing via joint management organizations, perhaps supported by a minimum royalty rate. One issue is that many of the entities assembling training data are doing so secretly and would like to keep both the data and how the model uses it as trade secret information. That lack of transparency will raise technical challenges and costs for the underlying copyright holders.

A third core area focuses on infringement liability associated with AI outputs that result in a copy or an improper derivative work. Who is liable: the AI system developers, the model creators, and/or the end users? A traditional approach would allow for joint liability. Again, though, the lack of transparency may make copying difficult to prove, although perhaps availability and likelihood are enough. On this point, the notice also asks about the idea of labeling or watermarking AI content, as suggested by a recent White House / industry agreement. Although I see this issue as outside of copyright law, the inquiry contemplates some penalty for failure to label.

Everyone is floundering a bit in terms of how to incorporate generative AI into our worldview. I see the Copyright Office AI inquiry as a real attempt to seek creative and potentially transformative solutions. The public is invited to provide input by submitting comments by the October 18, 2023 deadline. There will also be a short period for reply comments responding to initial submissions, closing on November 15, 2023.

34 thoughts on “US Copyright Office Generative AI Inquiry: Where are the Thresholds?”

  1. 9

    Perhaps the case Burrow-Giles Lithographic Co. v. Sarony, 111 U.S. 53 (1884), has some relevance. It involved the question of whether photographs were copyrightable.

    Burrow-Giles tried to argue that photography (similar in some respects to use of an AI tool today) was merely a mechanical process rather than an art, and could not embody an author’s “idea”.

    The Supreme Court disagreed. Sarony had “by posing the said Oscar Wilde in front of the camera, selecting and arranging the costume, draperies, and other various accessories in said photograph, arranging the subject so as to present graceful outlines, arranging and disposing the light and shade, suggesting and evoking the desired expression, and from such disposition, arrangement, or representation, made entirely by the plaintiff, he produced the picture in suit.” This control that Sarony exercised over the subject matter showed that he was an “author” of “an original work of art”.

    Perhaps then the amount of input and control a human provides to the AI could be equated to posing and arranging a subject, setting parameters and limits, etc. Even though the camera/AI tool does most of the actual work in generating the output, the final selection of output material and elimination of worthless junk can also be creative.

    1. 9.1

      Certainly, but this has already been addressed in the copyright context; and, similar to the copyright context, when AI is the one that takes the photograph (or is the inventor), then there is no designated IP protection for you.

      1. 9.1.1

        Quite this; it is both most akin to, and also reads onto, the Lockean-derived proscription whereby a video camera set to record at random times in random directions produces no stills that can find purchase in copyright protection.

        1. 9.1.1.1

          I did chuckle at your prose here Malcolm, but that aside, I note that you (as usual), have
          F
          A
          I
          L
          E
          D
          to engage on the merits of my (many) counter points to you.

          While you casually joust with 6 as to who may be better labeled “Gramps,” your lack of substantive engagement and steadfast denial of reality are more convincing than anything (so far) that you have chosen to provide.

  2. 8

    This is from a recent article from Wired magazine (the link is very easy to find) involving Stephen Thaler. The title of the article is “The Inventor Behind a Rush of AI Copyright Suits Is Trying to Show His Bot Is Sentient.”

    Here is an interesting quote from the article:
    One of Thaler’s main supporters wants to set precedents that will encourage people to use AI for social good. But Thaler himself says his cases aren’t about IP; they’re about personhood. He believes the AI system that he wants recognized as an inventor, DABUS, is sentient, and that these lawsuits are a good way to draw attention to the existence of his new species. “DABUS and all of this intellectual property is not about setting precedents with the law. It’s about setting precedents in terms of human acceptance,” he says. “There is a new species here on Earth, and it’s called DABUS.”

    This aligns with what I have been saying about Thaler. His efforts are all about giving DABUS personhood and/or having it declared sentient.

    While I don’t doubt the possibility of an artificial intelligence being sufficiently aware to be declared sentient some time in the future, that time is not now (or anywhere close to it).

    1. 8.2

      Perhaps because he already lost at patents and copyrights, he now mouths the “new species” line.

      It hardly matters.

      Inventor (like photographer) is a much lower bar than the full legal sentience being asserted here.

      And, as I have stated all along, even as DABUS may satisfy “inventor” AND “artist,” the Lockean nature of our patent clause prohibits non-human grant of the legal rights of either copyright or patent.

      Maybe DABUS and Naruto can take a group selfie in their woes…

  3. 7

    >>A second important core inquiry focuses on training data that is fundamental to today’s generative AI models.

    This reminds me of the attempts by Microsoft in the early 1980s to claim a copyright on the code you generated with their compilers. Yes, that is real. This is what gave rise to the Turbo products, as no one wanted to give Microsoft rights to the products we wrote.

    You have to judge the output of the AI to determine whether it violates copyright. Trying to restrict the training data is outrageous; it is an obvious rights grab by the entrenched.

    1. 7.1

      How can something that is “abstract” “steal” training data?

      According to the courts, the AI is abstract and something abstract cannot use training data.

      Notice how when reality is distorted in one area of law that it almost always causes problems in other areas of law.

    1. 6.1

      And this is a real question. Inventions are described by claims. So, tell us what the invention is without using “abstract” claims—i.e., invalid claims.

  4. 5

    DC: “text, art, music, and video that appear highly creative”

    What does “highly creative” mean in this context? Specifically, how does a person distinguish a “highly creative” work of art (in any medium) from merely “creative”?

    I know you feel compelled to promote “generative AI” as this super awesome thing but maybe you can show us all some examples of completely computer-generated artwork or text (no human input) that impress you. The plain fact of the matter is 99.999% of what these computers “generate” is just c r a p. What’s appealing to certain business owners, of course, is that it’s cheap c r a p which means that human workers can be fired and money can be saved. Is the service or product better as a result? Sure doesn’t seem that way. Maybe we just need to wait a few months? Ha ha ha.

    1. 5.1

      Your feelings are noted.

      As are the hard numbers of those who have picked up and used this thing that you so disdain.

      But you be you.

      1. 5.1.1

        People have been using computers and sampling to generate artworks for a long time. It’s never been the case that if you claim ownership of a “highly creative” but infringing work you get a pass “because I used a computer.” Likewise, it’s never been the case that if a machine pumps out 10,000 infringing artworks into the public domain (for free, or for sale), the owner/controller of the machine (i.e., the one who decided to make the machine “run” in that manner) isn’t responsible.

        The only interesting “new” issue, it seems, is how much unlicensed copying is taking place at the “back end” that isn’t protected by fair use. If I am providing a service to artists whereby I provide a library of materials for the sole purpose of allowing you to experiment with them (through the “AI” interface) to amuse yourself (or others) with a “work” that might or might not be legally derivative, well…

        1. 5.1.1.1

          Please inform yourself of what gen AI is.

          It is most definitely not “People have been using computers and sampling to generate artworks for a long time.”

          Your emotive rants are plainly embarrassing.

        2. 5.1.1.2

          In my podcast feed is this: link to open.spotify.com

          You might want to pay attention to the objective discussion on the very recent change in computing approach that has resulted in generative AI.

          Or you may want to cling to your emotions and continue to rant, showing everyone how c1ue1ess you choose to be.

          You be you.

    2. 5.2

      “The plain fact of the matter is 99.999% of what these computers “generate” is just c r a p.”

      wut? Obviously you gotta sort through for good stuff and have an experienced hand at the wheel.

      “What’s appealing to certain business owners, of course, is that it’s cheap c r a p which means that human workers can be fired and money can be saved.”

      Well if those workers were only making that in the first place then that’s fine, you know, if that’s what satisfies the market.

      Just fyi tho, from my point of view, a whole lot of AI stuff is in fact wayyyy better than what the humans previously making it were generating. And we’re still on ground level 0 of AI.

      1. 5.2.1

        6,

        I wonder if you grasped your own conflicting statements between:

        Obviously you gotta sort through for good stuff and have an experienced hand at the wheel.

        and

        from my point of view, a whole lot of AI stuff is in fact wayyyy better than what the humans previously making it were generating. And we’re still on ground level 0 of AI.

        (plus, you forgot to emphasize the non-human generative aspect directly at point here)

        1. 5.2.1.1

          That’s just how AI works, it makes you 100+ pics/vids and you pick the best 1-5 or so of them to use, and those 1-5 will be better than what the human artist would have done on 1 pic “for the same price/time invested”. Often way better.

          “plus, you forgot to emphasize the non-human generative aspect directly at point here”

          I literally couldn’t give two shts less.

          1. 5.2.1.1.1

            Your bowel movements only emphasize your miss.

            You are, of course, free to “miss” as much as you may choose, but your feelings simply
            Do
            Not
            Matter.

            1. 5.2.1.1.2.1

              Meh, you chase a point hardly worth it (note Malcolm that 6 did provide some context of price/time invested — so whatever ‘point of better being defined’ is selected, the comparative context is provided).

              But regardless of that — you still need to grasp the ‘generative’ portion of generative AI. You have not done so yet.

            2. 5.2.1.1.2.2

              I’m not sure if you’re ta rded or just underexposed. It literally took 30 seconds for me to locate AI art that is better than what 99.9% of normal artists will ever produce in their lives. And that doesn’t take into account that the pic in question likely didn’t take longer than a day (or less) for the AI artist to make using AI tools where it may have taken 2 days or a week just to come close by human hand.

              Conventional AI artists knocking out 99.9% of competition from normals:

              link to aiartists.org

              Or google “best AI artists”. Their stuff will come out quicker than most could otherwise make it.

              In the adult space (don’t google if you don’t like adult ai), literally just the first one I saw in 30 seconds of looking that is knocking 99.9% of artists out:

              rule 34 eugeneric AI

              There’s thousands of others, or tens of thousands by now. Thousands n tens of thousands more not artsy-cartoonish like that one. If you think 99.9% of conventional artists can match that quality in less than a day (or even ever in many cases) you’re delusional.

              Many people starting to make whole books. Soon it will be super high quality whole movies (just provide storyboards, scripts and voices etc. and AI takes over, you string the scenes made together yourself for now). That’s one person making a whole film. Well, with AI and the AI sla ve class of people that empower modern AI (see the documentaries coming out about them). On and on.

              1. 5.2.1.1.2.2.1

                Note the one example I gave above made 54 PAGES (not single pics) of images that in terms of quality will knock out the stuff of 99.9% of other artists competing with him. He made that in ONE MONTH as compared to it taking probably 8-10 years to even try to scratch that quantity competing conventionally, if you’re top 99.9%. That’s literally just the first example I found in literally 30 seconds.

                MM bro, you’re an old man that doesn’t know what is happening.

                1. “ you’re an old man that doesn’t know what is happening.”

                  LOL – It’s like some little kid in 1977 screeching “DISCO RULES!”

              2. 5.2.1.1.2.2.2

                “ Soon it will be super high quality whole movies”

                What’s your reference for a “quality movie”?

                Trying not to laugh here at mommy’s basement boy pretending to be an expert on art.

  5. 4

    a system of providing a few pennies for each web page


    By all accounts we have reached the point where training the best language models requires literally all the high-quality, human-generated text available. That’s trillions of words; billions of pages; tens of millions of dollars even at just “a few pennies” per page.
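
    A rough sanity check of that scale (the page count, words-per-page, and per-page rate below are illustrative assumptions for the arithmetic, not figures from the notice or this comment):

    ```python
    # Back-of-the-envelope licensing cost under purely illustrative assumptions.
    pages = 2_000_000_000        # assume ~2 billion web pages in a training corpus
    words_per_page = 1_000       # assume ~1,000 words per page
    cents_per_page = 3           # "a few pennies" per page

    total_words = pages * words_per_page              # ~2 trillion words
    total_cost_dollars = pages * cents_per_page / 100

    print(f"{total_words:,} words, ${total_cost_dollars:,.0f} in per-page fees")
    # -> 2,000,000,000,000 words, $60,000,000 in per-page fees
    ```

    Even with these modest assumptions the bill lands in the tens of millions of dollars, which is the scale described above.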

    If what you want is for the small number of big companies that currently dominate AI to continue to be the only entities that can afford to be in the game, by all means adopt a compulsory licensing system. But if you want to see real competition and innovation, then using content for training is going to need to be free.

  6. 3

    The copyright office seeks input on the legality of training generative models on copyrighted works obtained via the open internet, but without an express license.

    Kevin Drum has some useful thoughts on this subject.

    In particular, an approach that does not store or actually copy the underlying works would be less likely to be be infringing.

    I can agree that this is a relative point of distinction, but I do wonder whether focusing on this point might lead us astray. If I borrow a book from the library and read it, that is not a copyright violation. If I read it so many times that I memorize it, that is still not a copyright violation.

    In other words, the “copy” stored in my head (and wholly inaccessible to anyone else) is not a copyright violation. A “copy” stored in a chatbot server and similarly inaccessible seems awfully analogous. Is there a good reason why we should treat one as a violation and the other not?

    1. 3.1

      Drum here might be in the right ballpark, but there really is nothing useful there.

      At least it’s not vapid Sprint Left propaganda.

  7. 2

    >we often have copying of works without license, and so the key inquiry under current law appears to be the extent that fair use applies to protect the AI system generators.

    That seems to skip a few steps… it’s always a bit unclear to me if you actually need a license to merely read a work made available on the public internet (yes, the computer makes a temporary copy in memory). And, even if you do need one, making the work available on the public internet would seem to create an implied license to read it (particularly if the TOS doesn’t expressly forbid this use).

    IMHO, the strongest claim is actually the DMCA one, not infringement (given fair use). The DMCA copyright management junk is one of those cases where the statutory language could be read to fit… but, imho, Congress intended it to be limited to distribution of literal copies, not the creation of highly-transformative derivatives.

  8. 1

    One can also consider AI-based programming tools such as GitHub Copilot, a typical usage of which is for the programmer to write a series of comments, which the model then turns into lines of code. In a compiled language, the comments go away, leaving only the AI-generated executable code. Normally, executable code is still covered by copyright.
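
    As a minimal, hypothetical illustration of that workflow (the prompt comment and the suggested function below are invented for this example and are not actual Copilot output):

    ```python
    # Human-written prompt comment:
    # Return the n-th Fibonacci number, computed iteratively.

    # Function body suggested by the AI tool (hypothetical completion):
    def fibonacci(n: int) -> int:
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a
    ```

    Because this sketch is in a non-compiled language, the human-authored comment remains in the source alongside the generated code; in a compiled language, only the generated logic would survive into the executable.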

    It is possible to write an entire, non-trivial program this way. The main difference between this and, say, a Midjourney-generated image is that rather than a single prompt, the program will typically have been generated by a series of prompts. Does that make a difference, even if the resulting executable no longer contains any of the original human-authored text? Would it make a difference if the program were written in a non-compiled language, in which the human-authored comments remain intact?

    In other cases Copilot is used to augment rather than replace human code authorship, so the human programmer will still write at least some of the code themselves or may substantially edit the AI-generated code. In that case it’s closer to using a Midjourney-generated image as a starting point and then editing it in Photoshop or using it as a background in a larger work. I think most people would say that is still copyrightable, derivative work considerations aside.

Comments are closed.