• 2 Posts
  • 414 Comments
Joined 2 years ago
Cake day: August 23rd, 2023

  • sudo to Linux · FOSS infrastructure is under attack by AI companies
    24 hours ago

    The assumption is correct. PoW has been proven to significantly reduce bot traffic.

    What you’re doing is filtering out bots that can’t be bothered to execute JavaScript. You don’t need a computationally heavy PoW task to do that.

    meanwhile the mere existence of residential proxies has exploded the availability of easy bot campaigns.

    Correct, and that’s why they’re the number one expense for any scraping company. Any scraper that can’t be bothered to spin up a headless browser isn’t going to cough up the dough for residential proxies.

    Demonstrably false… people already do this with abysmal results. Need to visit a clownflare site? Endless captcha loops. No thanks

    That’s not what “demonstrably false” even means. Canvas fingerprinting filters out bots better than PoW. What you’re complaining about is overly strict settings locking some users out. Crank your Anubis settings too high and you’ll have users waiting a long time while their batteries drain.
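
    On the point above about bots that can’t execute JavaScript: a no-PoW gate is enough to catch those. Here’s a minimal sketch as TypeScript/Express middleware (the js_check cookie name and unsigned nonce are made up for illustration; a real deployment would sign and expire the value):

        import crypto from "node:crypto";
        import express from "express";

        const app = express();

        // Gate: only clients that actually executed our script carry the cookie.
        app.use((req, res, next) => {
          if ((req.headers.cookie ?? "").includes("js_check=")) return next();
          const nonce = crypto.randomBytes(8).toString("hex");
          // Serve a stub page whose only job is to set the cookie via JS and
          // reload; curl-level scrapers never get past this step.
          res.send(`<script>
            document.cookie = "js_check=${nonce}; path=/";
            location.reload();
          </script>`);
        });

        app.get("/", (_req, res) => res.send("actual content"));
        app.listen(8080);

    No hashing involved, and it filters the same non-JS bots a PoW challenge does.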


  • I write a lot of web scrapers. This sounds like the weakest solution with the most cost to regular users. I don’t even have to spoof anything, just solve some hash?

    1. Just open a browser and let Anubis hash.
    2. Intercept its token.
    3. Scrape the API with the token.
    4. Schedule a background task that keeps solving fresh hashes at a steady rate.

    I should spin this up and test it out before I talk more shit about it. But the concept of slowing me down just by forcing extra computation is laughable. You’d have to crank the difficulty up so high that you’d hurt your own userbase.
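
    For the curious, here’s a minimal sketch of steps 1–3 in TypeScript with Playwright, assuming the solved challenge comes back as ordinary cookies (not verified against any particular Anubis release):

        import { chromium } from "playwright";

        // Steps 1-2: open a real browser, let the challenge JS run, grab the cookies.
        async function solveOnce(url: string): Promise<string> {
          const browser = await chromium.launch();
          const page = await browser.newPage();
          await page.goto(url, { waitUntil: "networkidle" }); // challenge solves here
          const cookies = await page.context().cookies();
          await browser.close();
          return cookies.map((c) => `${c.name}=${c.value}`).join("; ");
        }

        // Step 3: replay the token with a plain HTTP client, no browser needed.
        async function scrape(url: string, cookieHeader: string): Promise<string> {
          const resp = await fetch(url, {
            headers: { cookie: cookieHeader, "user-agent": "Mozilla/5.0" },
          });
          return resp.text();
        }

        const jar = await solveOnce("https://example.org/");
        console.log(await scrape("https://example.org/some/page", jar));

    Step 4 is just rerunning solveOnce on a timer before the token expires.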


  • sudo to Linux · FOSS infrastructure is under attack by AI companies
    2 days ago

    Proof of Work is a terrible solution because it assumes computational costs are a significant expense for scrapers compared to proxy costs. It’ll never come close to costing as much as residential proxies, and meanwhile every smartphone user will be complaining about your website draining their battery.

    You can do something like challenging only data center IPs, but you’ll have to do better than Proof-of-Work. Canvas fingerprinting would work.
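
    For reference, the classic canvas probe looks roughly like this in browser-side TypeScript (the drawing parameters are arbitrary, and what counts as a “bot” render is deployment-specific): draw a fixed scene, hash the pixels, compare against known headless outputs.

        // Render fixed text/shapes and hash the result. GPU, driver, and font
        // stack all perturb the pixels; headless browsers tend to produce
        // well-known or blank renders, which is the tell.
        async function canvasFingerprint(): Promise<string> {
          const canvas = document.createElement("canvas");
          canvas.width = 240;
          canvas.height = 60;
          const ctx = canvas.getContext("2d")!;
          ctx.textBaseline = "top";
          ctx.font = "16px Arial";
          ctx.fillStyle = "#f60";
          ctx.fillRect(100, 5, 80, 30);
          ctx.fillStyle = "#069";
          ctx.fillText("fingerprint-probe-123", 4, 20);
          const bytes = new TextEncoder().encode(canvas.toDataURL());
          const digest = await crypto.subtle.digest("SHA-256", bytes);
          return [...new Uint8Array(digest)]
            .map((b) => b.toString(16).padStart(2, "0"))
            .join("");
        }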


  • sudo to Linux · FOSS infrastructure is under attack by AI companies
    2 days ago

    What’s confusing the hell out of me is: why are they bothering to scrape the git blame page? Just download the entire git repo and feed that into your LLM!

    9 times out of 10, the best solution is to block nonresidential IPs. Residential proxies exist, but they’re far more expensive than cloud proxies and providers will ask questions. Residential proxies are sketch AF and basically guarded like munitions. Some rookie LLM maker isn’t going to figure that out.
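
    Sketched as Express middleware, blocking datacenter traffic is only a few lines (the ASN lookup here is a placeholder for a GeoLite2-ASN style database; the ASNs listed are real cloud providers):

        import express from "express";

        // Placeholder type: in practice this queries a MaxMind GeoLite2-ASN
        // database or your CDN's edge metadata.
        type AsnLookup = (ip: string) => number | undefined;

        const DATACENTER_ASNS = new Set([
          16509, // Amazon AWS
          15169, // Google
          8075,  // Microsoft
          14061, // DigitalOcean
        ]);

        function blockDatacenters(lookup: AsnLookup): express.RequestHandler {
          return (req, res, next) => {
            const asn = lookup(req.ip ?? "");
            if (asn !== undefined && DATACENTER_ASNS.has(asn)) {
              res.status(403).send("No cloud IPs, thanks.");
              return;
            }
            next();
          };
        }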

    Anubis also sounds trivial to beat. If it’s just crunching numbers and not attempting to fingerprint the browser, then it’s just a case of feeding the page into Playwright and moving on.


  • That’s what’s confusing me. Rober’s hypothesis is that without lidar the Tesla couldn’t detect the wall. But claiming that autopilot shut itself off before impact means the Tesla did detect the wall and decided impact was imminent, which disproves his point.

    If you watch the in-car footage, autopilot is on for all of three seconds, and by the time it’s on, the impact is already unavoidable. That said, Teslas should have lidar and should probably do something other than disengage before hitting the wall, but I suspect their cameras were good enough to detect the wall through lack of parallax or something like that.