- cross-posted to:
- [email protected]
A number of suits have been filed regarding the use of copyrighted material during training of AI systems. But the Times’ suit goes well beyond that to show how the material ingested during training can come back out during use. “Defendants’ GenAI tools can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples,” the suit alleges.
The suit alleges—and we were able to verify—that it’s comically easy to get GPT-powered systems to offer up content that is normally protected by the Times’ paywall. The suit shows a number of examples of GPT-4 reproducing large sections of articles nearly verbatim.
The suit includes screenshots of ChatGPT being given the title of a piece at The New York Times and asked for the first paragraph, which it delivers. Getting the ensuing text is apparently as simple as repeatedly asking for the next paragraph.
The suit is dismissive of attempts to justify this as a form of fair use. “Publicly, Defendants insist that their conduct is protected as ‘fair use’ because their unlicensed use of copyrighted content to train GenAI models serves a new ‘transformative’ purpose,” the suit notes. “But there is nothing ‘transformative’ about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it.”
The suit seeks nothing less than the erasure of any GPT instances that the parties have trained using material from the Times, as well as the destruction of the datasets that were used for the training. It also asks for a permanent injunction to prevent similar conduct in the future. The Times also wants money, lots and lots of money: “statutory damages, compensatory damages, restitution, disgorgement, and any other relief that may be permitted by law or equity.”
At best it will be a slight setback; I think the cat’s just outta the bag now.
There’s really no cat. We’ve been using algorithms to do stuff for a very long time (thousands of years), and there’s literally no intelligence behind what people are calling “artificial intelligence”. It’s just another algorithm. This is just another increment in automation like all the rest (plow, printing press, loom, assembly line, computer, etc.), except the marketing is making it sound even more fundamental, when in reality it’s really less impressive (e.g. a spell-checking feature added to a word-processing program!).
Will capitalists still use the term “artificial intelligence” to try to justify whatever BS they’re pulling—against other capitalists in the market, but especially against workers? Of course. Just like they’re likely to keep using the term “sharing” to bypass labor protections and other regulation having to do with taxis, hotels, etc.
Anyway, we really don’t have a horse in this race. Either the capitalists wanting to preserve “intellectual property” win and Napsters and Pirate Bays keep getting taken down, or the SPAM-engine capitalists win and everything we try to do gets flooded with so much barely camouflaged marketing junk that we can’t sort through it all. Heads they win/tails we lose. Or whatever boring dumbassery winds up getting settled on in the middle to maximally both preserve and enhance our exploitation, which is the most likely result.
The cat, in this case, isn’t necessarily an actual artificial intelligence but is instead a cursed abuse of linear algebra smashed into a shape beyond human comprehension using an unimaginable amount of data and computational power.