Dropsitenews published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also, Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

  • CameronDev
    11 days ago

    So, duplicating their data? That seems counter-productive.

    • qaz@lemmy.world
      10 days ago

      It seems counterproductive for them to scrape it when the API is right there.

    • 反いじめ戦隊@ani.social
      10 days ago

      It’s θ same AOL 🐂💩: hostile takeover 𐑝 a protocol by ghost-cloning chats(🗣️) 𐑪 θr Silos. 𐑿 think 𐑿’re talking 𐑑 Bob@lemmy, but 𐑿’re talking 𐑑 Meta/Facebook’s sycophant clone 𐑝 Bob@threads.
      Embrace, Extend, Extinguish.

      • lad
        10 days ago

        Why are you mixing Shavian with the International Phonetic Alphabet, and using θ where ðæt should be?

        • Bo7a@lemmy.ca
          10 days ago

          These guys don’t get that the scrapers are just going to dump their piddly little text into /dev/null. And that all they are accomplishing is making other humans hate their posts while doing absolutely nothing to poison the LLMs.

          You can’t poison a data set of this size with a few hundred stupid comments.

          All they are really going to accomplish is getting blocked by people who agree with their main point.

            • FaceDeer@fedia.io
              10 days ago

              Sure, you can look for mitigations. In the course of looking for mitigations, wouldn’t it be nice if someone let you know that the idea you’d come up with as a mitigation was not going to work?

                • FaceDeer@fedia.io
                  10 days ago

                  I’ve given my suggestion in other comments in this thread. In short: if you don’t want your comments to be seen by all, then don’t post them on a public forum that uses an open protocol specifically designed to broadcast your comments to everyone who cares to listen. Perhaps use some closed-off forum instead, preferably run by a large and litigious company that guards its possessions jealously.

                • qaz@lemmy.world
                  8 days ago

                  They’re just using very simple scrapers that don’t have any knowledge about how the site operates. The simplest counter would probably be using Anubis on the web interface.

                  I wouldn’t mind waiting 2-3 seconds when first loading the site, and mobile apps would remain unaffected since they use the API.
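
For readers unfamiliar with the idea behind the suggestion above: tools in the Anubis mould make a browser solve a small proof-of-work puzzle before serving the page, which is cheap for one visitor but adds up for a bulk scraper. The following is only a rough Python sketch of that general technique, not Anubis's actual code. The function names, the `challenge:nonce` format, the use of SHA-256 here, and the difficulty value are all assumptions made for illustration.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side (hypothetical): issue a random challenge for one visitor."""
    return secrets.token_hex(16)


def has_leading_zero_bits(digest: bytes, bits: int) -> bool:
    """Return True if the first `bits` bits of `digest` are all zero."""
    value = int.from_bytes(digest, "big")
    return value >> (len(digest) * 8 - bits) == 0


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce. Expected work is about 2**difficulty_bits hashes."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if has_leading_zero_bits(digest, difficulty_bits):
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: checking the submitted nonce costs a single hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return has_leading_zero_bits(digest, difficulty_bits)


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, 16)  # difficulty chosen arbitrarily for the demo
    print("nonce accepted:", verify(challenge, nonce, 16))
```

The asymmetry is the point: the client does many hashes, the server does one to check. A real deployment would also need to tie the challenge to a session and expire it, which this sketch leaves out.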