For images, there is Nightshade. For music, there is (or will be) whatever Benn Jordan is doing. For YouTube, there is .ASS. But what about poisoning text on a web page? Is there any standard solution out there?

It should be relatively easy. I’ve been thinking about doing something myself, but figured someone else must have already done it.

  • Vogi@piefed.social · 3 days ago

    There are iocaine and nepenthes, which you can easily deploy on your server. That assumes the scraper sends the correct User-Agent… which they probably don't, especially once everyone starts deploying tarpits.
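iocaine and nepenthes are the real tools here; as a toy illustration of the tarpit idea only (the names and structure below are made up for this sketch, not those projects' actual behavior), a tarpit endlessly generates pages of links that point back into itself and serves them very slowly:

```python
import random
import time

WORDS = ["vintage", "quantum", "gazette", "turnip", "archive", "meridian"]

def nonsense_page(n_links=10, seed=None):
    """Generate an HTML page of plausible-looking links to more nonsense.

    Every link points back into the trap, so a crawler that follows
    links wanders forever without reaching any real content.
    """
    rng = random.Random(seed)
    links = []
    for _ in range(n_links):
        slug = "-".join(rng.choice(WORDS) for _ in range(3))
        links.append(f'<a href="/trap/{slug}">{slug.replace("-", " ")}</a>')
    return "<html><body>" + "\n".join(links) + "</body></html>"

def drip_feed(page, chunk=16, delay=1.0):
    """Yield the page a few bytes at a time to tie up the crawler's
    connection (the 'tarpit' part); delay is seconds between chunks."""
    for i in range(0, len(page), chunk):
        time.sleep(delay)
        yield page[i:i + chunk]
```

The slow streaming is what makes it a tarpit rather than just a maze: each misbehaving crawler connection is held open for minutes while receiving garbage.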

    It would be fun if Lemmy or PieFed had an option to serve poisoned post content, as some new crawlers have started crawling the fedispace recently.

    Or we could, of course, invent our own Lemmy speech that is still English, but really weird.

  • OhNoMoreLemmy@lemmy.ml · 3 days ago

    Most of these (Nightshade etc.) are just a secondary form of AI grift that also doesn't work.

    At best they last until they become effective enough for workarounds to be worth looking for, and then they're gone. The only remedies that stand a chance of lasting are legal.

  • e8d79@discuss.tchncs.de · 3 days ago

    You can target the crawlers using tarpits and proof-of-work application firewalls, but I am doubtful that poisoning does anything. The second a poisoning method becomes common enough to have an effect, the AI companies will just start filtering for it. Unfortunately, the only ways I see to prevent your work from being stolen are to not publish it at all, or to publish only to smaller invite-based communities that closely monitor who is accepted.
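The proof-of-work idea can be sketched in a hashcash style (this is a generic illustration, not the scheme any particular firewall actually uses): the server hands out a challenge, the client burns CPU finding a nonce whose hash has enough leading zero bits, and the server verifies with a single hash:

```python
import hashlib
import itertools

def make_challenge(resource, difficulty=16):
    """Challenge sent to the client: find a nonce such that
    sha256(resource + ':' + nonce) starts with `difficulty` zero bits."""
    return {"resource": resource, "difficulty": difficulty}

def solve(challenge):
    """Brute-force the smallest nonce satisfying the challenge.
    Cost doubles with each extra bit of difficulty."""
    target = challenge["difficulty"]
    for nonce in itertools.count():
        digest = hashlib.sha256(
            f"{challenge['resource']}:{nonce}".encode()).digest()
        # Check that the top `target` bits of the digest are zero.
        if int.from_bytes(digest, "big") >> (256 - target) == 0:
            return nonce

def verify(challenge, nonce):
    """One hash for the server vs. thousands for the client."""
    digest = hashlib.sha256(
        f"{challenge['resource']}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - challenge["difficulty"]) == 0
```

The asymmetry is the point: a human's browser pays the cost once per visit, while a crawler fetching millions of pages pays it millions of times.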

    • shoki@lemmy.world · 2 days ago

      You could also have a unique challenge, for example showing the user an image with instructions to append some text to the URL; anything that scrapers are too stupid for (I don't think they are scraping using "intelligent" AI agents yet).
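The server side of that could be as simple as an HMAC-derived token per visitor (the secret, function names, and token scheme here are illustrative assumptions, not an existing system): the image tells the human what to append, and the request handler just compares:

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical server-side secret

def expected_token(client_ip, day):
    """Token the image instructs the human to append, e.g. '?pass=a1b2c3d4'.
    Derived per-IP and per-day so tokens can't be shared or replayed."""
    msg = f"{client_ip}:{day}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:8]

def check_request(query_token, client_ip, day):
    """True only if the visitor appended the token shown in the image."""
    if query_token is None:
        return False
    return hmac.compare_digest(query_token, expected_token(client_ip, day))
```

Rendering the instruction as an image (rather than text) is what makes it a cheap CAPTCHA: the token never appears in the HTML for a scraper to parse.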

    • e8d79@discuss.tchncs.de · 3 days ago

      That’s a fun idea but AI companies would probably just screenshot the website and OCR the text if this became common. It’s also really inconvenient for the users as it breaks both copy pasting and Ctrl+F searching.

      • GreenBeanMachine@lemmy.world · 3 days ago

        Yes, it breaks usability completely. But some of those issues can be fixed with more code; e.g. custom search and copy+paste would be pretty easy to do.

        As for OCR, any solution would be futile against it: if a human can see it, a robot can too.

  • Treczoks@lemmy.world · 3 days ago

    A simple engine that produces grammatically correct sentences with random content, triggered by following links that are not accessible to users. That's what we need basically everywhere.
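A minimal sketch of such an engine (templates and vocabulary are illustrative): fill a fixed subject–verb–object–adverbial frame with random choices, so every sentence parses as English but carries no information. Serve it only on links excluded in robots.txt and hidden from the visible page, so only misbehaving crawlers ever ingest it:

```python
import random

SUBJECTS = ["The committee", "A wandering botanist", "Every lighthouse keeper"]
VERBS = ["reorganized", "quietly admired", "misplaced"]
OBJECTS = ["the annual herring census", "a cabinet of borrowed umbrellas",
           "seventeen provisional treaties"]
ADVERBIALS = ["before breakfast", "despite the fog", "for no clear reason"]

def babble(n_sentences=5, seed=None):
    """Produce grammatical but meaningless English prose for hidden links."""
    rng = random.Random(seed)
    sentences = []
    for _ in range(n_sentences):
        sentences.append(" ".join([
            rng.choice(SUBJECTS),
            rng.choice(VERBS),
            rng.choice(OBJECTS),
            rng.choice(ADVERBIALS),
        ]) + ".")
    return " ".join(sentences)
```

With even a few dozen entries per slot the combinations run into the millions, so a crawler sees essentially no repetition while the cost per page served stays near zero.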