Forget IPs: using cryptography to verify bot and agent traffic

Nemeski@lemm.ee · 8 days ago

refalo · 7 days ago

The article starts out talking about malicious bots that DoS your site, but how would a crypto signature fix that? Couldn’t the client just change the signature whenever it gets blocked?

Dave@lemmy.nz · edit-2 4 days ago

From my understanding, that’s not quite the intent.

Currently, there are a bunch of bots that behave themselves. For example, Google’s search crawler.

They identify themselves with a user agent, e.g. Googlebot, so Cloudflare knows what they are and doesn’t block them.

Unfortunately, some bad bots pretend to be Googlebot by setting the same user agent. To counteract this, Cloudflare checks the source IP of the traffic against Google’s known IP address ranges to make sure it’s actually coming from Google. If the user agent says it’s Googlebot but the request isn’t coming from a Google IP range, they block it because it’s probably bad.
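A rough sketch of that user agent plus IP range check, just to make it concrete. The `KNOWN_RANGES` table and the `classify` function are made up for illustration, and the ranges are documentation placeholders, not Google's real ones. I don't know how Cloudflare actually implements this.

```python
import ipaddress

# Hypothetical table of bot name -> IP ranges it is expected to come from.
# 192.0.2.0/24 is an RFC 5737 documentation block, NOT a real Google range.
KNOWN_RANGES = {
    "googlebot": [ipaddress.ip_network("192.0.2.0/24")],
}


def classify(user_agent: str, remote_ip: str) -> str:
    """Return 'verified', 'spoofed', or 'unknown' for a request."""
    ip = ipaddress.ip_address(remote_ip)
    ua = user_agent.lower()
    for bot, networks in KNOWN_RANGES.items():
        if bot in ua:
            # The user agent claims to be a known bot: check the source IP.
            if any(ip in net for net in networks):
                return "verified"
            return "spoofed"
    # The user agent does not claim to be any bot we know about.
    return "unknown"
```

The weak spot is the table itself: somebody has to keep it current.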

But knowing which IPs are OK and which aren’t is a challenge because they change over time.

So the proposal here, as I understand it, is to create a system where a bot operator publishes a public key and signs its requests, so you can prove that Googlebot really is from Google, AmazonBot is from Amazon, etc., and not another crawler pretending.

The spammy ones can keep generating new domains and keys, but you know for sure it’s not Googlebot or whatever.

So it helps “good” traffic prove who it is. It’s not supposed to be for tracking bad traffic.

refalo · 4 days ago

I wonder how long it will be until they start requiring signatures for individual people.

Kissaki · 7 days ago

For those building bots, we propose signing the authority of the target URI, i.e. www.example.com, and a way to retrieve the bot public key in the form of signature-agent, if present, i.e. crawler.search.google.com for Google Search, operator.openai.com for OpenAI Operator, workers.dev for Cloudflare Workers.

They’re proposing that each request include the request target and a pointer to where the bot’s public key can be fetched. Because the key is served from the operator’s own domain, a valid signature lets you verify which domain the request came from.

refalo · edit-2 5 days ago

So when that gets blocked, they can just generate a new key. I don’t see how this really stops anyone that wants to keep going.

Kissaki · 5 days ago

The point is it makes them identifiable. If you block anything that can’t be authenticated, plus everything that authenticates via *.google.com, you are effectively blocking everything from Google.

If you fear they will switch to other domains to evade the block, you’ll have to use an allow-list.
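A minimal sketch of what such an allow-list policy could look like. It assumes the signature check has already happened elsewhere and is passed in as a boolean. `ALLOWED_AGENTS` and `decide` are hypothetical names, and the entries are just the example domains from the quoted article.

```python
from typing import Optional

# Hypothetical allow-list of signature-agent domains you choose to trust.
ALLOWED_AGENTS = {
    "crawler.search.google.com",
    "operator.openai.com",
}


def decide(signature_agent: Optional[str], signature_valid: bool) -> str:
    """Return 'allow' or 'block' for a request claiming to be a bot."""
    # Anything unsigned, or with a signature that did not verify, is blocked.
    if not signature_agent or not signature_valid:
        return "block"
    # Normalise case and a trailing dot before comparing.
    host = signature_agent.lower().rstrip(".")
    # A freshly registered domain with a valid key is still not on the list.
    return "allow" if host in ALLOWED_AGENTS else "block"
```

With an allow-list, generating a new key or buying a new domain gains nothing, because unknown domains are blocked by default.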

refalo · 4 days ago

OK, so effectively this shifts the work from blocking IPs to blocking domains. It might slow down some smaller players, but I imagine anyone with a decent amount of money can afford an insane number of domains.