Newspapers, copyright, artists, and GenAI

The New York Times has sued OpenAI for copyright infringement. Authors and artists have sued image and text generators similarly.

I am sympathetic to the artists and writers (and not just because I've been working on a novel for far too long already without finishing it): these are famously low-income professions, and these automations are likely to render them as economically useful as blacksmiths or coopers. Even if^[0] they win their current lawsuits, even if the famous AI models have to be deleted and any new ones have to be made with only suitably licensed material, they will still be in trouble: I know there's enough freely licensed material because such models have already been made.

The newspapers, though? I can't find it in me to be sympathetic to the newspapers.

First, as faceless corporations, they're somewhat inhuman at the best of times. This isn't a good reason in any objective sense, but it tells you where the limits of my empathy are.

Second, and more pertinently, newspapers have a long history of getting things wrong. You've probably heard a quote that's (incorrectly!) attributed to Mark Twain: "If you don’t read the newspaper you are uninformed, if you do read the newspaper you are misinformed." The general attitude represented by the quote goes back to at least Thomas Jefferson in 1807, though, as this is about an English-language quote and the earliest newspapers weren't in English, I'd expect equivalent complaints to have been made in other languages that just weren't relevant to the quote researchers. You may also have heard of the Gell-Mann amnesia effect: "[t]he phenomenon of people trusting newspapers for topics which they are not knowledgeable about, despite recognizing them to be extremely inaccurate on certain topics which they are knowledgeable about".

Now, does the idea in these quotes ring any bells? To me, the idea sounds like the term "hallucination"^[1] in the context of AI: a response generated by an AI which contains false or misleading information presented as fact.

I'm told that you can't copyright facts. You can, in some jurisdictions, get a database right^[2], but that's a different thing. Conditional on this, an already-trained LLM can be used to read all newspapers (if search engines can process websites to construct indexes, why not?), extract all the claims of fact within each article along with their sources, then cross-reference each claim to make sure it's well attested, and write up the results in the style of an out-of-copyright or public-domain author. The newspapers lose even harder than the artists and authors, because the AI is so cheap it can now afford to actually check the sources. That's not merely theoretical: the free version of ChatGPT doesn't do this, but it's very easy to work with the underlying API to enable this kind of thing.
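To make the cross-referencing step concrete, here is a minimal sketch in Python. Everything in it is my own illustration rather than anything a real product does: the names `well_attested` and `toy_extract` are hypothetical, and the toy extractor (one claim per line, lowercased) is only a stand-in for the part where you would actually ask an LLM to pull the factual claims out of an article.

```python
from collections import defaultdict
from typing import Callable, Iterable


def toy_extract(article_text: str) -> list[str]:
    """Stand-in for an LLM call: treat each non-empty line as one claim.

    Claims are normalised (lowercase, no trailing full stop) so that the
    same fact reported by two papers compares equal.
    """
    return [
        line.strip().rstrip(".").lower()
        for line in article_text.splitlines()
        if line.strip()
    ]


def well_attested(
    articles: Iterable[tuple[str, str]],
    extract_claims: Callable[[str], list[str]],
    min_outlets: int = 2,
) -> dict[str, list[str]]:
    """Keep only claims reported by at least `min_outlets` distinct outlets.

    `articles` is a sequence of (outlet name, article text) pairs.
    """
    outlets_by_claim: dict[str, set[str]] = defaultdict(set)
    for outlet, text in articles:
        for claim in extract_claims(text):
            outlets_by_claim[claim].add(outlet)
    return {
        claim: sorted(outlets)
        for claim, outlets in outlets_by_claim.items()
        if len(outlets) >= min_outlets
    }


articles = [
    ("Paper A", "The bridge opened in 1932.\nThe mayor resigned."),
    ("Paper B", "the bridge opened in 1932"),
    ("Paper A", "The mayor resigned."),
]
print(well_attested(articles, toy_extract))
# {'the bridge opened in 1932': ['Paper A', 'Paper B']}
```

Note that the mayor's resignation is dropped even though it appears twice, because both mentions come from the same paper. Counting distinct outlets is a crude proxy for "well attested" (papers copy each other, and wire stories get reprinted), so a serious version would want to trace each claim back to its cited source instead.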

So, I think newspapers can't really be protected by copyright even if they win these cases. Even with database rights, the only way they can really defend themselves is by writing fiction (not mere opinion, fiction, and then what's the point of the paper?), or by having exclusive access to some source unavailable to other papers. To be clear, I agree that exclusive access is both really useful and does happen. The problem is that when it happens with regard to an important story, even when the papers are correct they can lose, and have lost, court cases over it, or just get paid a visit by men in suits under orders to destroy their laptops.

What remains is, rather unfortunately given the importance of a free press, that LLMs and papers are really competing for the attention of those who find them compelling, not those who seek the truth — Goodhart's law strikes again.

[0] I'm absolutely not qualified to guess the outcomes. I know vastly less law than even the free version of ChatGPT and yet it would still be deeply unwise to trust ChatGPT to answer legal questions.

[1] Personally I prefer the alternative names "confabulation" and "delusion".

[2] Also: is it reasonable that newspapers have a 70-year monopoly on this stuff? Does society really benefit from them being able to sue for unpaid use of the text of, for example, a 1974 job advert for grill chefs or a report on a 1994 factory explosion?

Tags: AI, Copyright, Opinion

Categories: AI