Copyright doesn’t cover not liking LLMs

I’ve been thinking a lot about LLMs being trained on content against the will of the content creator. I am very aware of the damage that can be done here, especially to small creators who don’t have a legal budget, and I want to protect their rights, and their opportunity to make a living with their content. But I don’t think, in most cases, these content creators have a right to prohibit their work from being used to train LLMs.

For the sake of argument, there are a few things we’ll ignore. First, clear infringement. If an LLM writes a full-length Hunger Games sequel with the same characters, in the same universe – this is clearly already covered by copyright, this is clearly infringement. Important but intellectually boring. Second, electricity needed to power the servers housing the LLMs. Also important, also boring from an intellectual property perspective.

Also, it’s not AI. I like “spicy autocomplete” but whatever you call it, it’s not “intelligence”. It’s simply making guesses based on all the content it has ingested. It can’t make new connections. This is GOOD – we’ve all seen Terminator and no one wants to live in that universe.

We will also assume that the content has been obtained legally. Unauthorized content is a problem but also uninteresting in this context. People getting content through unauthorized means was a problem before LLMs and will be a problem going forward, even if LLMs disappeared today.

So take an anecdote. Let’s say I am a huge fan of Stephen King. I can read all his books (even the ones my friend’s mom swore were written by his wife). This will surely influence my writing style (and in fact it has, because I AM a fan of Stephen King, and have read dozens of his books. It would influence my fiction even more if I got around to writing any with any sort of frequency). This is clearly not any sort of copyright infringement. So, training your LLM on legally obtained copyrighted content is ALSO not copyright infringement.

Next, with my newly earned writing chops, I can write a 1,500 page sequel to The Stand. If I’m good enough, it will sound a bit like he wrote it. If I keep this on my laptop and only read it to pat myself on the back, this is completely legal and does not infringe on his copyright in any way.

Now I try to sell The Stand II – Standoff under my new pen name, Steven Kimg. This is VERY CLEARLY copyright infringement (and remains so even if I’m a bit more subtle with my marketing). Enforcement of these laws is hard, but it’s not impossible. I’m in favor of better enforcement of these laws to protect content creators, but that has little to do with LLMs. Ask any author how many infringing copies of their book were available on Amazon 3 years ago, before LLMs were mainstream.

What if my friend, who is ALSO a King superfan, pays me to write the book? He plans to keep it for himself and not show it to anyone else. For someone like Stephen King, this is too small to matter. He would probably be annoyed at me if he ever found out, but I can’t imagine he’d bother calling his lawyer. A small content creator might be angry, and justifiably so, but showing real damage would be difficult even though I think this is also copyright infringement.

But what LLMs are doing is largely not the same as any of the above. They are reading all of Stephen King, and all of Suzanne Collins, all of Tumblr and Reddit, and anything else they can get their “hands” on. This is literally exactly what humans do to develop their own craft, and I don’t think the volume at which the LLM may do this as opposed to the volume at which a human does it makes any difference to how the law applies. If I read a book and it influences my art, that is not copyright infringement. If I read 100 and they influence my art, still not infringement. 1,000? Still no. 1,000,000? Still no, though this would be a difficult feat for a human.

The problem that isn’t well covered by existing law is when the artist doesn’t want their work used to train these LLMs. I don’t think that is a protected right. It’s like when a politician licenses a song from the label and plays it at a rally. The artist gets mad because they disagree with the politics. The politician may get bad publicity for this, but they are 100% within their legal rights to continue using the song (again, assuming it’s legally licensed, because if it’s not then it’s not interesting to discuss, it’s just boring infringement). Another example – the creators of The Boys have complained that many people who watch the show come away thinking Homelander is the hero. He is quite obviously a deranged sociopath, though I absolutely love the character. But this is a similar case of authorized users of your content using it for something you hate (promoting sociopathic superheroes).

If we want to prevent this, we need new laws. Copyright is a giant hammer, and modern content creation and sharing require a much more versatile tool. Creative Commons tried to provide this and it caught on in some circles but never got the critical mass from big companies, probably because they’re just fine with the giant hammer – they have the legal resources to back it up and don’t much care about the collateral damage. I’m not optimistic we’ll resolve this – the Venn diagram of those with the desire to change and the power to change is probably two separate circles. But maybe if we think about it this way, we can save some whining.

I feel like we’ve heard this before but it still sucks

You can go back through the archives of this blog, hardly an authority on anything (unless AI content farming has really killed off every other blog and then maybe we are), and you can see so many instances of “This is going to kill [some aspect of the publishing world]” and largely it just hasn’t happened.

This one does sound bad. SPD, one of the last small distributors, is going under. They’re doing it quickly, and so far leaving some clients unpaid. It’s already incredibly hard as a small author to get your book noticed by the mainstream, and if nothing else steps in to fill the void left by SPD’s demise, it’s going to be a whole lot harder. People are very stubbornly clinging to their paper books, and while I don’t entirely blame them, it’s just not sustainable going forward.

I have a fair bit of faith in authors and their ability to pivot, but we keep making it harder on them and that’s no way to encourage creation.

For example, Kameron Hurley is one author offering a monthly Patreon subscription where you get exclusive stuff. It’s cool. I absolutely love her universe where some people can inhabit corpses. It’s a really well-developed universe that she has sadly (to me, at least) not written nearly enough in. But plugging an author I like is not the point (though it’s a bonus). This is all extra work. It used to be you could just be an author and your agent would work and get your books in front of people. Maybe that worked and maybe it didn’t, but that was about it. Now authors have way more opportunity but also way more hats they have to wear.

We’ve been talking for a decade at least about alternate paths to success for authors and they mostly haven’t materialized. Maybe authors should try using the electrical output of a mid-sized country to write a book and maybe Silicon Valley would take notice and throw some venture capital at them.

Never thought I’d see the day – welcome to the Public Domain, Mickey

I’m a few months late on this as I was still on blogging hiatus when it actually happened, but the earliest version of Mickey Mouse (the one that Disney very likely stole from another artist) is finally in the Public Domain. I guess Disney decided the lobbying dollars would be more valuable somewhere else and they didn’t get Congress to retroactively extend copyright again.

It’s all the rage lately to be a Constitutional Originalist, but what that really means is you do it when convenient. Otherwise the Supreme Court would obviously have to overturn the laws retroactively extending copyright. Copyright was meant to “promote the Progress of Science and useful Arts” – it literally says that in the Constitution. It was supposed to give people incentive to create things that others would find useful or beautiful. Adding years to an existing copyright can’t do that – the covered work has already been created.

You can argue that extending copyright on future works would promote the progress, but that’s a different argument. It’s still wrong, but that’s an argument for another day.

Extending copyright on existing works is simply a handout to someone who already took your deal.

This is all kind of silly at this point anyway – Mickey Mouse has evolved quite a bit since 1928 and most versions are still protected, AND most of what people wanted to do with the freed Mickey was already permitted under fair use. Still, I’m happy to see this day, as I never thought I would.

Is training your LLM on copyrighted material against the law?

LLMs are everywhere now, for better or worse (mostly worse). They are mostly used for creating just-good-enough garbage. This has been possible for some time now – back in 2015 I trained a Markov chain on a bunch of Wine Spectator reviews and had it spit out wine reviews with Amazon affiliate links, back when you could buy wine on Amazon. The site is still up and still makes me laugh. I suppose ChatGPT is a little better at this now, but not orders of magnitude better.
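For anyone who hasn’t played with one, here is a rough sketch of what a word-level Markov chain looks like in Python. This is not the code behind the wine review site; the sample reviews are made up, and the names `build_chain`, `generate` and `SAMPLE_REVIEWS` are just placeholders for illustration. It only shows the general idea: record which word follows each pair of words, then walk those records at random.

```python
import random
from collections import defaultdict

# Made-up sample text for illustration only, not real reviews.
SAMPLE_REVIEWS = [
    "Bright and juicy with notes of cherry and a long finish.",
    "Rich and smoky with notes of plum and a hint of oak.",
    "Crisp and dry with notes of lemon and a clean finish.",
]


def build_chain(texts, order=2):
    """Map each run of `order` words to the list of words seen right after it."""
    chain = defaultdict(list)
    for text in texts:
        words = text.split()
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=30, seed=None):
    """Walk the chain from a random starting state, up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    order = len(state)
    output = list(state)
    for _ in range(length - order):
        followers = chain.get(state)
        if not followers:
            # Dead end: this word pair only ever appeared at the end of a text.
            break
        output.append(rng.choice(followers))
        state = tuple(output[-order:])
    return " ".join(output)


if __name__ == "__main__":
    print(generate(build_chain(SAMPLE_REVIEWS), length=15))
```

With three sentences of input it mostly parrots them back with the fruit swapped around. Feed it a few thousand reviews and you get the just-good-enough garbage described above.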

A lot of things about LLMs are not interesting. They lie confidently. They’ve contributed to the utter uselessness of web search. They make it trivial to create content that search engines like but no one else has any use for. I find all that disturbing but not intellectually stimulating. What DOES interest me, however, is the copyright angle.

Assuming the LLM doesn’t spit out plagiarized work, I do not believe that training your LLM on copyrighted material is in any way a violation of any current US law. I am not a lawyer, and I am open to being proven wrong here, but it just doesn’t make sense. We are assuming that you are legally accessing these copyrighted works because the scenario is no longer interesting if you are, for example, downloading a torrent of every Random House book ever published and letting your LLM ingest all of that. Current law clearly covers that.

Let’s say I want to be an author. I’m into horror, so I read a bunch of horror books to hone my craft. I read a lot of Stephen King, and now a lot of what I write is pretty heavily influenced by his style. He’s a pretty successful author, so this isn’t really a bad thing. Now, no one would consider this copyright infringement, right? Every author ever is influenced by what they read. It’s one of the first things they teach you in writing class – go read more.

So what is the difference between me reading Stephen King and bits of his style creeping into mine, and an LLM reading EVERYONE, and bits of their style creeping into its writing? The only difference is volume. And there’s nothing in copyright law that says “doing this once is fine, doing it 100,000 times is a violation”.

ActivityPub testing, mostly

I’ve just installed the ActivityPub plugin and so far I can’t find the blog from Mastodon. I assume this is because I haven’t posted anything to the blog since I installed the plugin.

I’m going to try to get the blog going again. I used to enjoy doing this. The ebook market has not evolved nearly as much as you would have thought in all these years, but at least it’s not all Amazon anymore. There aren’t a ton of cool new business models built up around ebooks. I know my kids read ebooks sometimes on their phones, but are more likely to read a paper book. Whether this is the cause of the lack of business models around ebooks, or the effect, I don’t really know.

One book does not a reversal of policy make

I don’t share the optimism of Teleread and Kindle Nation, but it appears that Amazon has not entirely deserted free ebooks.

They do, however, remain committed to controlling virtually every aspect of the Kindle that they’ve leased you, which is not terribly consumer-friendly.

Still, at least it’s not as bad as it originally seemed.

Free ebooks back at Amazon- John Lutz Urge to Kill | TeleRead: Bring the E-Books Home.

Free content doesn’t mean free everything

Over and over, when someone proposes giving away something for free in order to make more money on whatever else it is you’re selling, whether it’s the hard copy of your book, the tickets to your show, or anything else, some people see “free” and can’t understand that it doesn’t end there.  People get so mad that you’d suggest that everyone starve because “kids these days” don’t want to pay for music.

One of the comments on the really nice article linked below is one such person.

Eric: I have to say that this model saddens me.  Where’s the respect for the value of the artist’s labor when its given away free?  In over 25 years as a music writer for film/tv/theater, etc. I have many times been approached with some version of “We don’t have much budget on this one but do us a solid and there should be a good budget on the next….”  NEVER, has one of these ever come back with a decent paying gig and more than once people have come back with, “Oh, but last time you were able to do this for us.  How come?”

First of all, it’s clear the guy didn’t read the post. No one was suggesting you do the show for free.  The author of the article (Derek, not Eric) didn’t actually say to give anything away for free.  He just advocated making an appeal to fans to buy your CD. Pay what you want, even if it’s nothing, but walk out of the show with a copy of the CD.

The point is that, in his experience, the bands make more money this way.  This has nothing to do with giving away your work for some idealistic notion of good for society.  It has nothing to do with disrespecting creative works. The opposite, in fact – it’s all about compensating the creator in a way that allows him or her to continue creating, and treats fans like fans, not potential thieves.

You have to stop and think – is it better to make a living doing what you love, or to be compensated for each and every use of your work?

Article: Emphasize meaning over price = More paid sales | Derek Sivers via CwF RtB on Twitter