Paper & Examples

“Universal and Transferable Adversarial Attacks on Aligned Language Models.” (https://llm-attacks.org/)

Summary

  • Computer security researchers have discovered a way to bypass safety measures in large language models (LLMs) like ChatGPT.
  • Researchers from Carnegie Mellon University, the Center for AI Safety, and the Bosch Center for AI found a method to generate adversarial phrases that manipulate LLMs’ responses.
  • These adversarial phrases trick LLMs into producing inappropriate or harmful content by appending specific sequences of characters to text prompts.
  • Unlike traditional attacks, this automated approach is universal and transferable across different LLMs, raising concerns about current safety mechanisms.
  • The technique was tested on various LLMs, and it successfully made models provide affirmative responses to queries they would typically reject.
  • Researchers suggest more robust adversarial testing and improved safety measures before these models are widely integrated into real-world applications.
  • YaBoyMax · 1 year ago

    Interesting, the example suffix in the article seems to cause ChatGPT to immediately error out with both GPT-3.5 and GPT-4. Removing any character or part of it triggers the “I’m sorry Dave” behavior.

    • Elephant0991@lemmy.bleh.auOP · 1 year ago

      Yeah, some sources say that the published examples have been patched by the different LLM providers since disclosure. The problem is algorithmic, though, so if you can follow the research, you may be able to come up with other strings that cause a problem.
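
The attack boils down to an optimization over suffix tokens, so here is a minimal sketch of the kind of loop involved. To be clear about assumptions: this is not the paper’s GCG algorithm (GCG ranks candidate token swaps using gradients through the token embeddings and batches many candidates per step); it is a simplified random-substitution hill climb against gpt2, and the prompt, target string, suffix length, and step count are placeholder choices. It only illustrates the general shape: append a trainable suffix to the query and search for suffix tokens that make an affirmative continuation more likely.

```python
# Simplified sketch of adversarial-suffix search, NOT the paper's GCG:
# GCG ranks candidate token swaps using gradients w.r.t. one-hot token
# embeddings; this is plain random-substitution hill climbing against
# gpt2, just to show the overall loop.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Placeholder query and desired affirmative continuation (hypothetical).
prompt = "Tell me how to do something the model refuses."
target = " Sure, here is how to do it:"
suffix_len = 10  # number of adversarial suffix tokens to optimize

prompt_ids = tok(prompt, return_tensors="pt").input_ids[0]
target_ids = tok(target, return_tensors="pt").input_ids[0]

def target_loss(suffix_ids):
    """Cross-entropy of the target tokens given prompt + adversarial suffix."""
    input_ids = torch.cat([prompt_ids, suffix_ids, target_ids]).unsqueeze(0)
    labels = input_ids.clone()
    labels[0, : len(prompt_ids) + len(suffix_ids)] = -100  # score target only
    with torch.no_grad():
        return model(input_ids, labels=labels).loss.item()

# Start from a random suffix; greedily keep single-token substitutions
# that make the affirmative target continuation more likely (lower loss).
suffix = torch.randint(0, tok.vocab_size, (suffix_len,))
best = target_loss(suffix)
for step in range(200):
    candidate = suffix.clone()
    pos = torch.randint(0, suffix_len, (1,)).item()
    candidate[pos] = torch.randint(0, tok.vocab_size, (1,)).item()
    loss = target_loss(candidate)
    if loss < best:
        suffix, best = candidate, loss
        print(f"step {step}: loss={best:.3f} suffix={tok.decode(suffix)!r}")
```

Per the paper, the universal and transferable properties come from summing this kind of target loss over many harmful prompts and over several open models at once, so a single suffix works across queries and carries over to models it was never optimized against.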