AI Lie: Machines Don’t Learn Like Humans (And Don’t Have the Right To)

RickRussell_CA@beehaw.org · 2 years ago

And yet, we know that the work is mechanically derivative.

keegomatic@kbin.social · edit-2 2 years ago

So is your comment. And mine. What do you think our brains do? Magic?

edit: This may sound inflammatory but I mean no offense

RickRussell_CA@beehaw.org · 2 years ago

No, I get it. I’m not really arguing that what separates humans from machines is “libertarian free will” or some such.

But we can properly argue that LLM output is derivative because we know it’s derivative, because we designed it. As humans, we have the privilege of recognizing transformative human creativity in our laws as a separate entity from derivative algorithmic output.

conciselyverbose@kbin.social · edit-2 2 years ago

So is literally every human work in the last 1000 years in every context.

Nothing is “original”. It’s all derivative. Feeding copyrighted work into an algorithm does not in any way violate any copyright law, and anyone telling you otherwise is a liar and a piece of shit. There is no valid interpretation anywhere close.

zygo_histo_morpheus · edit-2 2 years ago

Every human work isn’t mechanically derivative. The entire point of the article is that the way LLMs learn and create derivative text isn’t equivalent to the way humans do the same thing.

According to Giansiracusa, the key difference between humans and LLMs is that bots require a ton of training data to recognize patterns, which they do via interpolation. Humans, on the other hand, can extrapolate information from a very small amount of new data. He gave the example of a baby who quickly learns about gravity and its parents’ emotional states by dropping food on the floor. Bots, on the other hand, are good at imitating the style of something they’ve been trained on (ex: a writer’s work) because they see the patterns but not the deeper meanings behind them.

“If I want to write a play, I don’t read every play ever written and then just kind of like average them together,” Giansiracusa told me. “I think and I have ideas and there’s so much extrapolation, I have my real life experiences and I put them into words and styles. So I think we do extrapolate from our experiences – and I think the AI mostly interpolates. It just has so much data. It can always find data points that are between the things that it’s seen and experienced.”
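To make the interpolation/extrapolation vocabulary in the quote concrete, here is a minimal sketch. It is a toy curve-fitting example, not a model of how an LLM works: the function `fit_interpolator`, the target `true_fn`, and the sample points are all assumptions chosen for illustration. The predictor is accurate between the points it has seen and badly wrong outside them.

```python
from bisect import bisect_right


def fit_interpolator(xs, ys):
    """Return a predictor that linearly interpolates between sorted samples."""
    def predict(x):
        # Outside the sampled range this toy model can only repeat
        # its nearest endpoint; it has no way to extrapolate.
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, x)
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return predict


def true_fn(x):
    # Hypothetical "real world" rule the samples are drawn from.
    return x * x


train_x = [float(i) for i in range(11)]      # samples at 0, 1, ..., 10
train_y = [true_fn(x) for x in train_x]
model = fit_interpolator(train_x, train_y)

inside_error = abs(model(4.5) - true_fn(4.5))     # query between samples
outside_error = abs(model(20.0) - true_fn(20.0))  # query beyond samples
print(inside_error, outside_error)
```

The error inside the sampled range is small and the error outside it is large, which is the sense in which a purely interpolating system depends on having data on both sides of whatever it is asked to produce.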

conciselyverbose@kbin.social · 2 years ago

It’s complete and utter nonsense and they’re bad people for writing it. The complexity of the AI does not matter and if it did, they’re setting themselves up to lose again in the very near future when companies make shit arbitrarily complex to meet their unhinged fake definitions.

But none of it matters because literally no part of this in any way violates copyright law. Processing data is not and does not in any way resemble copyright infringement.

RickRussell_CA@beehaw.org · 2 years ago

This issue is easily resolved. Create the AI that produces useful output without using copyrighted works, and we don’t have a problem.

If you take the copyrighted work out of the input training set, and the algorithm can no longer produce the output, then I’m confident saying that the output was derived from the inputs.
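The test described here is essentially a leave-one-out ablation. A minimal sketch, assuming a deliberately crude stand-in for a model (a set of word bigrams) and hypothetical helper names `can_produce` and `depends_on`; a real training run would be far more expensive to repeat, and this says nothing about what a court would accept:

```python
def bigrams(text):
    """Return the set of adjacent word pairs in a text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def can_produce(target, corpus):
    """True if every bigram in target appears somewhere in the corpus."""
    seen = set()
    for doc in corpus:
        seen |= bigrams(doc)
    return bigrams(target) <= seen


def depends_on(target, corpus, index):
    """True if removing corpus[index] makes the target unproducible."""
    ablated = corpus[:index] + corpus[index + 1:]
    return can_produce(target, corpus) and not can_produce(target, ablated)


# Hypothetical three-document training set.
corpus = [
    "the cat sat on the mat",
    "a dog sat on the rug",
    "the quick brown fox",
]
target = "the cat sat on the rug"

for i, doc in enumerate(corpus):
    print(repr(doc), "needed:", depends_on(target, corpus, i))
```

In this toy setup the output needs the first two documents and not the third, which is the kind of evidence the argument above appeals to.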

conciselyverbose@kbin.social · 2 years ago

There is literally not one single piece of art that is not derived from prior art in the past thousand years. There is no theoretical possibility for any human exposed to human culture to make a work that is not derived from prior work. It can’t be done.

Derivative work is not copyright infringement. Straight up copying someone else’s work directly and distributing that is.

RickRussell_CA@beehaw.org · 2 years ago

“There is literally not one single piece of art that is not derived from prior art in the past thousand years.”

This is false. Somebody who looks at a landscape, for example, and renders that scene in visual media is not deriving anything important from prior art. Taking a video of a cat is an original creation. This kind of creation happens every day.

Their output may seem similar to prior art, perhaps their methods were developed previously. But the inputs are original and clean. They’re not using some existing art as the sole inputs.

AI uses only existing art as its inputs. This is a crucial distinction. I would have no problem at all with AI that worked exclusively from original inputs and from works verified to be in the public domain or with unenforced copyright, although I don’t know if I’d consider the outputs themselves to be copyrightable (as that is a right attached to a human author).

“Straight up copying someone else’s work directly”

And that’s what the training set is. Verbatim copies, often including copyrighted works.

That’s ultimately the question that we’re faced with. If there is no useful output without the copyrighted inputs, how can the output be non-infringing? Copyright defines transformative work as the product of human creativity, so we have to make some decisions about AI.

keegomatic@kbin.social · 2 years ago

The person who painted that landscape has certainly been influenced by prior artists, is not the first person to have painted a landscape, and is creating a work directly derivative of nature itself. They didn’t appear from thin air a fully-formed human being and start painting the hills. The person filming a cat video has seen videos before. They know to hold their phone at a certain angle and in a certain orientation to get the view they want of the cat, and that also does not spring from a vacuum. These two artists are each the sum total of their own experiences, their training sets. The difference between interpolation and extrapolation in this context is only a matter of complexity.

conciselyverbose@kbin.social · 2 years ago

If they’ve seen prior art, yes, they are. It’s literally not possible to be exposed to the history of art and not have everything you output be derivative in some manner.

Processing and learning from copyrighted material is not restricted by current copyright law in any way. It cannot be infringement, and shouldn’t be able to be infringement.

RickRussell_CA@beehaw.org · 2 years ago

“It’s literally not possible to be exposed to the history of art and not have everything you output be derivative in some manner.”

I respectfully disagree. You may learn methods from prior art, but there are plenty of ways to ensure that content is generated only from new information. If you mean to argue that a rendering of a landscape that a human is actually looking at is meaningfully derivative of someone else’s art, then I think you need to make a more compelling argument than “it just is”.

lily33@lemm.ee · edit-2 2 years ago

From Wikipedia, “a derivative work is an expressive creation that includes major copyrightable elements of a first, previously created original work”.

You can probably call the output of an LLM ‘derived’, in the same way that if I counted the number of ‘Q’s in Harry Potter, the result would be derived from Rowling’s work.

But it’s not ‘derivative’.

Technically it’s possible for an LLM to output a derivative work if you prompt it to do so. But most of its outputs aren’t.
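The letter-counting example can be written out in a few lines. This is only a sketch of the distinction being drawn: `count_letter` and the sample string are hypothetical stand-ins, and the point is that the result is computed from the text while containing none of its expression.

```python
def count_letter(text, letter):
    """Count case-insensitive occurrences of a single letter in text."""
    return text.lower().count(letter.lower())


# Hypothetical stand-in text; any book-length string would work the same way.
sample_text = "Quick quiz: the queen quietly quit."
q_count = count_letter(sample_text, "q")
print(q_count)
```

The output is a single integer. It could not exist without the input, yet nothing of the input's wording can be read back out of it.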

RickRussell_CA@beehaw.org · 2 years ago

“a derivative work is an expressive creation that includes major copyrightable elements of a first, previously created original work”

What was fed into the algorithm? A human decided which major copyrighted elements of previously created original work would seed the algorithm. That’s how we know it’s derivative.

If I take somebody’s copyrighted artwork, and apply Photoshop filters that change the color of every single pixel, have I made an expressive creation that does not include copyrightable elements of a previously created original work? The courts have said “no”, and I think the burden is on AI proponents to show how they fed copyrighted work into a mechanical algorithm, and produced a new expressive creation free of copyrightable elements.
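A small sketch of why the filter example is not a clean break from the source: the code below inverts a tiny grayscale image, which changes every pixel value, yet the original is exactly recoverable from the result. The 2x3 `image` and the `invert` function are assumptions for illustration, not a description of any particular Photoshop filter.

```python
def invert(image):
    """Invert an 8-bit grayscale image given as a list of rows."""
    return [[255 - pixel for pixel in row] for row in image]


# Hypothetical 2x3 grayscale image (values 0-255).
image = [
    [0, 64, 128],
    [255, 32, 200],
]

filtered = invert(image)

# Every single pixel value is different from the original...
every_pixel_changed = all(
    a != b
    for row_a, row_b in zip(image, filtered)
    for a, b in zip(row_a, row_b)
)

# ...but applying the same filter again restores the original exactly,
# so the filtered image still carries all of the original's structure.
restored = invert(filtered)
print(every_pixel_changed, restored == image)
```

Changing every pixel is therefore not, by itself, evidence that the original work is absent from the output.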

lily33@lemm.ee · edit-2 2 years ago

I think the test for “free of copyrightable elements” is pretty simple - can you look at the new creation and recognize any copyrightable elements in it? The process by which it was created doesn’t matter. Maybe I made this post entirely by copy-pasting phrases from other people, who knows (well, I didn’t, only because it would be too much work), but it does not infringe either way…