My instance is getting pummeled by scrapers crawling nonsense, like issue and pull request searches with every possible combination of labels.

Everything’s coming from a shitload of different residential IPs at a very fast cadence.

There’s just not that much content on my instance to warrant this traffic. It could be scraped in a minute or two like this if it were legitimate traffic.

  • Kissaki
    8 days ago

    Possibly AI company crawlers. When they came up there was a lot of bad publicity and reports of actively malicious and toxic crawling behavior, including ban evasion.

    You can think about locking some url paths behind valid login sessions, or use a proof of work proxy guard.
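    As a rough illustration of the first option, here is a minimal sketch of a rule that decides which requests should require a login. The path suffixes and query parameter names are assumptions about what the expensive list views look like, not taken from Forgejo's documentation, and `needs_login` is a hypothetical helper you would call from whatever proxy or middleware sits in front of the instance.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed rule set: list views whose query strings can explode combinatorially.
EXPENSIVE_SUFFIXES = ("/issues", "/pulls")
# Assumed filter parameters; adjust to whatever shows up in your access logs.
EXPENSIVE_PARAMS = {"labels", "q", "sort", "milestone", "assignee"}


def needs_login(url: str) -> bool:
    """Return True if this request should be limited to logged-in users."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path.endswith(EXPENSIVE_SUFFIXES):
        return False
    # Only gate filtered views; the plain list page stays public.
    params = parse_qs(parts.query, keep_blank_values=True)
    return any(name in EXPENSIVE_PARAMS for name in params)
```

    With this rule, the plain issue list and all code views stay public, and only the filtered searches that the bots are enumerating get gated.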

    Anubis is the popular tool for that. I’ve seen maybe three alternatives, one of which is from Cloudflare.
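    To show what a proof of work guard does in principle, here is a minimal sketch of a hash-based challenge. It is not Anubis's code or protocol; the function names and the `challenge:nonce` format are made up for illustration. The server hands out a random challenge, the client has to find a nonce whose SHA-256 hash starts with enough zero bits, and the server checks the answer with a single hash.

```python
import hashlib
import secrets


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def make_challenge() -> str:
    """Server side: issue a random, unpredictable challenge string."""
    return secrets.token_hex(16)


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check a submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce, about 2**difficulty hashes on average."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce
```

    The asymmetry is the point: a single visitor pays the solving cost once, while a crawler fetching many pages from many IPs pays it over and over, and checking each answer stays cheap for the server.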

    See also related Codeberg ticket (Forgejo instance) https://codeberg.org/forgejo/discussions/issues/319

    If you search, you can find various blog posts about these issues, and not just about Forgejo.

    • treadful@lemmy.zipOP
      8 days ago

      > Possibly AI company crawlers. When they came up there was a lot of bad publicity and reports of actively malicious and toxic crawling behavior, including ban evasion.

      That was kind of what I was thinking, but if that’s true, they’re wasting so much bandwidth and compute. Going through every combination of issue labels does not get them any useful code to hoover up. They could’ve just cloned my repos and been done with it.

      > You can think about locking some url paths behind valid login sessions, or use a proof of work proxy guard.

      > Anubis is the popular tool for that. I’ve seen maybe three alternatives, one of which is from Cloudflare.

      Really don’t want to Cloudflare, but Anubis is interesting. If I can’t shake these bots, maybe I’ll consider this. Thanks.