• @[email protected]
      15 points • edited • 28 days ago

      but not the misuse of public content

      Is that an admission that they don’t own the content others posted on their site?

  • @[email protected]
    54 points • 28 days ago

    Not gonna lie, this ultimately seems like a win for the Internet. The years of troubleshooting solutions Reddit provided can (hopefully) be archived, but the fewer people rely on the site itself, the better. At least in my opinion.

    • @[email protected]
      2 points • 26 days ago

      I disagree, kinda. Stack Overflow is the other option for questions, and it's a lot less user-friendly, and Lemmy has never shown up in search results for me. If something comes along and makes it simple, great! In the meantime, though, I just see a lot more ad-filled hellhole sites.

  • @[email protected]
    52 points • 28 days ago

    I remember finding Google’s robots.txt when it first came out. It was a cute little ASCII-art robot with a heart that said, “We love robots!”

      • Asudox
        6 points • 28 days ago

        my shiny metal ass

  • @[email protected]
    8 points • 28 days ago

    As annoying as this is, it’s to prevent LLMs from training themselves using Reddit content, and that’s probably the greater of the two evils.

    • FurblandOP
      37 points • 28 days ago

      That’s all well and good, but how many LLMs do you think actually respect robots.txt?
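For context: robots.txt is purely advisory, so "respecting" it means the crawler voluntarily checks the file before fetching. A minimal sketch of that check with Python's standard-library `urllib.robotparser` (the robots.txt contents and the bot name are hypothetical, loosely modeled on Reddit's blanket disallow):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt resembling Reddit's new blanket policy:
# every user agent is disallowed from every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks before each fetch; nothing in the
# protocol enforces the answer.
print(rp.can_fetch("SomeBot", "https://www.reddit.com/r/linux/"))  # False
```

The key point is that the `can_fetch` call happens, or doesn't, entirely at the crawler's discretion.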

      • @[email protected]
        English
        14 points • 28 days ago

        from my limited experience, about half? i had to finally set up a robots.txt last month after Anthropic decided it would be OK to crawl my Wikipedia mirror from about a dozen different IP addresses simultaneously, non-stop, without any rate limiting, and bring it to its knees. fuck them for it, but at least it stopped once i added robots.txt.

        Facebook, Amazon, and a few others are ignoring that robots.txt, on the other hand. they have the decency to do it slowly enough that i’d never notice unless i checked the logs, at least.
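A robots.txt for that situation might block the AI crawlers by name while leaving the rest of the site open. A minimal sketch (ClaudeBot and GPTBot are the user-agent strings Anthropic and OpenAI publicly document for their crawlers; any others would need their own stanzas):

```
User-agent: ClaudeBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
```

As the comments above note, this only helps against crawlers that choose to honor it.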

    • Anas
      12 points • 28 days ago

      It’s to prevent LLMs from training themselves using reddit content, unless they pay the party that took no part in creating said content

      FTFY