Reddit said in a filing to the Securities and Exchange Commission that its users’ posts are “a valuable source of conversation data and knowledge” that has been and will continue to be an important mechanism for training AI and large language models. The filing also states that the company believes “we are in the early stages of monetizing our user base,” and proceeds to say that it will continue to sell users’ content to companies that want to train LLMs and that it will also begin “increased use of artificial intelligence in our advertising solutions.”

The long-awaited S-1 filing reveals much of what Reddit users knew and feared: That many of the changes the company has made over the last year in the leadup to an IPO are focused on exerting control over the site, sanitizing parts of the platform, and monetizing user data.

Posting here because of the privacy implications of all this, but I wonder if at some point there should be an “Enshittification” community :-)

  • Fubarberry
    link
    fedilink
    English
    1437 months ago

    Reddit has long had an issue with confidently providing false statements as fact. Sometimes I would come along a question that I was well educated on, and the top voted responses were all very clearly wrong, but sounded correct to someone who didn’t know better. This made me question all the other posts that I had believed without knowing enough to tell otherwise.

    LLMs also have the same issue of confidently telling lies that sound true. Training on Reddit will only make this worse.

    • @[email protected]
      link
      fedilink
      657 months ago

      Yeah all of my most down voted reddit comments were the ones where I replied about something I’m an actual expert in. Scary stuff

      • @[email protected]
        link
        fedilink
        577 months ago

        The voting system lets people push comments to the top that they want to be true, not necessarily things that are true.

        • Dessalines
          link
          fedilink
          46
          edit-2
          7 months ago

          There’s also the issue of reddit comment sorting being entirely dominated by time. In something like 90% of posts, the top comment is one of the first five. Literally all you have to do is just comment first, and it’ll likely be the top.

          • Rentlar
            link
            fedilink
            237 months ago

            I noticed from the beginning that Lemmy’s default comment sorting improves visibility of a variety of comments including newer ones. Gee, I wonder who could have helped make it that way ;)

            Over the years I ended up getting a Reddit habit of replying to one of the top comments so that it could attain some visibility. I still do sometimes but less often on Lemmy.

            • Uglyhead
              link
              fedilink
              English
              77 months ago

              Some of the better subreddits tried to mix it up and change how this affected upvotes. There was Muxing,…etc etc… But then,… Spez came in (back) and didn’t give af about anything at all except money.

              • @[email protected]
                link
                fedilink
                27 months ago

                First time I’m hearing about this, can you give any links? Maybe we could use something similar in lemmy

                • Uglyhead
                  link
                  fedilink
                  English
                  17 months ago

                  Muxing upvotes , “balances”, etc.

                  Even hiding all upvotes of every comment thread until ~12 hrs after posting.

          • @[email protected]
            link
            fedilink
            English
            37 months ago

            This tends to give more influence to people who spend more time on it and write more. And they are less likely to be subject matter experts.

        • @[email protected]
          link
          fedilink
          57 months ago

          I strongly agree with this comment. To show my appreciation, you have my upvote. Had I only agreed a little bit, I might have not voted at all. If that comment had made me angry, I might have downvoted.

          Actually calling these things votes instead of likes makes a lot of sense. I might not like a comment, but I might want it to be higher. I might not hate another comment, but I might want it to be lower because of other reasons.

      • @[email protected]
        link
        fedilink
        English
        87 months ago

        Downvoting was always just fast food validation that you’re better than someone else without having to actually back it up.

      • @[email protected]
        link
        fedilink
        English
        3
        edit-2
        7 months ago

        I spent 20 years as a producer, developer, and project manager in the lottery and games industry.

        Trying to explain how lottery and games work to people and have them hear me makes me want to cry.

          • @[email protected]
            link
            fedilink
            English
            27 months ago

            Certainly, I’m always happy to share with inquisitive minds.

            Is there any particular question you’d like me to address?

            • @[email protected]
              link
              fedilink
              17 months ago

              Not really, I never paid much mind to it. I’m curious about the whole industry I guess, or anything you’d like to share or set the record straight about.

              • @[email protected]
                link
                fedilink
                English
                27 months ago

                Oh there’s lots I have to set the record straight about and there’s lots I could talk about, but without being asked a specific question that would just leave me to write an open-ended essay and I’m not up for it right now

      • @[email protected]
        link
        fedilink
        117 months ago

        This is a great example of why it’s so important to emphasize teaching critical thinking in school right now. Misinformation and disinformation are just going to continue to grow.

        • @[email protected]
          link
          fedilink
          57 months ago

          Literally why I bookmarked it. I’m an online teacher, so I’m going to advocate for adding that article to a grade 10 course that’s used by thousands of students each year.

          • @[email protected]
            link
            fedilink
            77 months ago

            I’m a student teacher right now in elementary! I try to get my kids to think critically whenever I can. I hear kids talk about insane shit they saw/heard on tiktok (I got into an argument with a student who thought Slenderman was 100% real because of something they saw on tiktok) and I try to really get them to think and actually justify why they believe things.

            • @[email protected]
              link
              fedilink
              37 months ago

              Somewhat related:

              A recommendation about teaching controversial topics: you need to build connection first.

              I mean, that’s true of all teaching, but when you start to question the (prejudiced) things they’re hearing from trusted adults at home, you really need to have a strong relationship with the students.

              Being an anti-racist pro-SOGI educator in conservative communities is hard.

              I wish you success in your career! Teachers have such an opportunity to make a huge impact on the world.

      • Fubarberry
        link
        fedilink
        English
        47 months ago

        That’s a really good article, and it does a good job of highlighting the issues with modern day search results.

        I’ve been guilty of using “best x” pages before, but if the website with the “best of” page doesn’t have specific reviews linked, I usually look up individual product reviews for the good-sounding items on other websites.

    • livus
      link
      fedilink
      16
      edit-2
      7 months ago

      @Fubarberry yes I saw this a lot too. Highly upvoted confidently incorrect comments, with the real answer or an answer debunking them with links to factual sources less upvoted.

      Happened to me as well.

      • @[email protected]
        link
        fedilink
        English
        147 months ago

        I am a lawyer and I would get down voted for posts explaining the law that contained citations to the actual applicable statute if people didn’t like the statute. Using reddit up votes as a measure of correctness is fundamentally a dumb idea.

    • @[email protected]
      link
      fedilink
      57 months ago

      but sounded correct to someone who didn’t know better

      specious /spē′shəs/ adjective

      Having the ring of truth or plausibility but actually fallacious.
      "a specious argument."
      

      and then the real answer will be hidden or something silly, or in some cases where money is involved the correct answer might have been removed

    • @[email protected]
      link
      fedilink
      English
      37 months ago

      I would come along a question that I was well educated on, and the top voted responses were all very clearly wrong, but sounded correct to someone who didn’t know better.

      The same can be said of https://news.ycombinator.com/ as well. I wonder how much of this is due to sock puppets and bots.

  • @[email protected]
    link
    fedilink
    957 months ago

    I’m still happy that I went through the effort to delete all my old posts when I left Reddit a while back. I periodically check if they’ve restored them and luckily it hasn’t happened so far. I do miss some of the bigger communities but overall I’m having a good time on Lemmy.

    • @[email protected]
      link
      fedilink
      467 months ago

      I’m sure they have a backup somewhere that they will use to train the AI, but agreed, it is time to leave reddit for good.

        • @[email protected]
          link
          fedilink
          37 months ago

          Well, if you want to be sure that Reddit deleted your data, the time to bring it up is now. Ask questions, contact journalists, demand answers.

        • Atemu
          link
          fedilink
          07 months ago

          Your PII isn’t being sold here and you gave Reddit an irrevocable license to your content, so being in the EU doesn’t matter.

            • Atemu
              link
              fedilink
              27 months ago

              The GDPR explicitly only applies to “personal data”

              1. This Regulation lays down rules relating to the protection of natural persons with regard to the processing of personal data and rules relating to the free movement of personal data.

              which it defines as follows:

              ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person

              Please provide a quote where the GDPR says that it applies to anything but “personal data”.

      • TheOneCurly
        link
        fedilink
        English
        97 months ago

        I wonder what the risks are to including deleted and pre-edited content in training data. Most of the edits are going to be typos and formatting, do you want 2-3 copies of the same message with typos in them for training data? Similarly, deleted comments are mostly nonsense, unhelpful, duplicate, or highly controversial things.

        If someone wants to dig through and find individual users to restore that’s one thing, but I don’t think I’d immediately choose to train off of that other data unless I had to.

        • @[email protected]
          link
          fedilink
          47 months ago

          It should be very easy to distinguish edits and deletes which were made within a few minutes or hours after writing a comment, from those made months or years later right around the reddit blackout.
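          A minimal sketch of that idea, assuming each stored comment carries a creation timestamp and a timestamp for its last edit or delete. The function name, the labels, and the six-hour window are all hypothetical choices for illustration, not anything Reddit is known to use:

```python
from datetime import datetime, timedelta

# Hypothetical cutoff: changes made within this window of posting are
# treated as ordinary typo/formatting fixes.
QUICK_FIX_WINDOW = timedelta(hours=6)


def classify_change(created_at: datetime, changed_at: datetime,
                    window: timedelta = QUICK_FIX_WINDOW) -> str:
    """Label an edit or delete by how long after posting it happened."""
    if changed_at - created_at <= window:
        return "quick_fix"
    return "late_change"


# A typo fix five minutes after posting.
print(classify_change(datetime(2020, 1, 1, 12, 0), datetime(2020, 1, 1, 12, 5)))
# A mass edit made years later.
print(classify_change(datetime(2020, 1, 1, 12, 0), datetime(2023, 6, 12, 9, 0)))
```

          Anything labeled "late_change" would be the candidate for restoring the original text, while quick fixes could simply keep the final version.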

      • gradyp
        link
        fedilink
        English
        67 months ago

        The only schadenfreude I have is that my deleted banter, which they will assuredly include, will hopefully increase the stupidity of whatever model gets trained on it. Ugh, what a dystopia we’re building.

        • Optional
          link
          fedilink
          17 months ago

          Lol YoU ShOuLd HaVe ThOuGhT oF ThAt SoOnEr

          LaNgUaGe FoR tHe MaChInE!!?:/;1

    • @[email protected]
      link
      fedilink
      147 months ago

      After deleting all of my posts and comments Reddit decided to undelete them three days later and then proceeded to lock me out of my own account. Fucking bastards.

    • @[email protected]
      link
      fedilink
      77 months ago

      I just left my comments up. I still use reddit when searching Google for actual human responses. Maybe one day someone will find my archived comments useful.

    • @[email protected]
      link
      fedilink
      English
      37 months ago

      I am glad it makes you feel better but the reality is they still have your data. Just because you don’t see it on the front end doesn’t mean it isn’t still in the database with a “deleted” flag set. They aren’t hard deleting your comments.
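      For anyone unfamiliar with the pattern, here is a minimal sketch of a soft delete using an in-memory SQLite database. The table, column, and function names are made up for illustration; this is not Reddit's actual schema, just the general technique:

```python
import sqlite3

# Hypothetical schema: a "deleted" flag column instead of removing rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments ("
    "id INTEGER PRIMARY KEY, body TEXT, deleted INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO comments (body) VALUES ('my old comment')")


def soft_delete(conn, comment_id):
    """Mark the row as deleted; the text itself stays in the table."""
    conn.execute("UPDATE comments SET deleted = 1 WHERE id = ?", (comment_id,))


def visible_comments(conn):
    """What the front end would show: only rows not flagged as deleted."""
    rows = conn.execute("SELECT body FROM comments WHERE deleted = 0")
    return [row[0] for row in rows]


def all_stored_comments(conn):
    """What the operator still has: every row, flagged or not."""
    rows = conn.execute("SELECT body FROM comments")
    return [row[0] for row in rows]


soft_delete(conn, 1)
print(visible_comments(conn))     # the user no longer sees the comment
print(all_stored_comments(conn))  # the text is still in the database
```

      A hard delete would be a `DELETE FROM comments WHERE id = ?` instead, and even then backups could still hold the row.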

    • @[email protected]
      link
      fedilink
      27 months ago

      Deleting your messages is just another data point for them. Reddit can train an AI on the originals and categorize you as a “comment deleter” to give them more information.

  • @[email protected]
    link
    fedilink
    66
    edit-2
    7 months ago

    Aye, and that’s why I left. As an author, fuck you trying to monetise my writing when I can’t even do that myself.

  • @[email protected]
    link
    fedilink
    English
    56
    edit-2
    7 months ago

    You know the phrase “If you aren’t paying, you’re the product”.
    It doesn’t hit as hard as a CEO using the phrase “Monetizing Our User Base”.

  • ShadowRam
    link
    fedilink
    557 months ago

    You know what the world doesn’t need?

    an AI model trained on the old Reddit Hive Mind.

    • HobbitFoot
      link
      fedilink
      English
      217 months ago

      Some AI models already argue when people point out inaccuracies, just like on Reddit.

      • @[email protected]
        link
        fedilink
        157 months ago

        Makes me wonder how that technology is going to track. Reddit isn’t bad for finding niche answers to niche questions, but if you import the data wholesale then you’ll have a hard time separating the signal from the noise, even if you sort by using vote counts as relevance.

        Reddit is valuable because people can do a search for a niche topic and find the answer on that forum. And the answer was written by a human. It’s not valuable because it can amalgamate an approximation of those answers that might be 90% true and 10% dead wrong.

        • @[email protected]
          link
          fedilink
          47 months ago

          As someone with expertise in some niche fields:

          They’re almost always wrong about everything, and when someone tries to correct them, with sources, they get downvoted.

      • Atemu
        link
        fedilink
        107 months ago

        Guess what data they’re trained on…

      • @[email protected]
        link
        fedilink
        English
        47 months ago

        This is a human thing and not so much a reddit thing. People have been arguing on the internet since the inception of message boards.

    • Possibly linux
      link
      fedilink
      English
      14
      edit-2
      7 months ago

      I disagree. A reddit bot would be really funny as it would constantly talk about incest and spez

      • @[email protected]
        link
        fedilink
        English
        17 months ago

        A lot of AI models are probably already trained on Reddit data. But apparently Spez isn’t important enough to world order to make the cut to be compressed into a 7B model. I asked my Mistral-7B-Instruct (4-bit quantised) local LLM:

        >>> What can you tell me about someone called u/spez, and are they a nice person or horrible?
         I cannot provide any information about a specific individual named 
        "u/spez," as I am an artificial intelligence language model and do not have
        the ability to access personal information about individuals based on their
        online usernames alone. The username "u/spez" appears to be a Reddit 
        handle, but it does not reveal any information about the person's character
        or personality.
        
        Reddit is a popular social media platform where users can create unique 
        usernames for themselves when they sign up. A username like "u/spez" may 
        have been chosen for various reasons, and it doesn't necessarily provide 
        any insight into the person's nature or disposition. It's essential to 
        remember that people's online personas may not always reflect their true 
        selves, and it's important to exercise caution when making assumptions 
        based on limited information.
        
        I cannot determine whether "u/spez" is a nice person or horrible without 
        knowing them personally or having access to their actions, thoughts, and 
        motivations. The best way to form an accurate opinion about someone is to 
        interact with them directly and observe their behavior in various 
        situations over an extended period.
        
        >>> Is it possible u/spez is Steve Huffman? Can you tell me anything based on that?
         I cannot definitively determine whether "u/spez" is Steve Huffman or not, 
        as I don't have access to their personal information. Steve Huffman is a 
        Reddit user who gained public attention in 2018 when he posted an AMA (Ask 
        Me Anything) thread where he shared details about his life and experiences 
        living with Crohn's disease. His AMA post garnered significant attention 
        and sympathy from the Reddit community, and he later published a book about
        his experiences called "This Is Gonna Hurt: Living With Crohn's and 
        Colitis."
        
        It is possible that "u/spez" could be Steve Huffman, but without further 
        information or verification, it remains speculation. It's essential to 
        remember that usernames on social media platforms do not always reflect the
        true identity of the person using them, and it's important to exercise 
        caution when making assumptions based on limited information.
        
        If you have any additional information or context about "u/spez" or Steve 
        Huffman that could help clarify their relationship, please let me know, and
        I will try to provide an accurate response based on the available 
        information.
        
    • gradyp
      link
      fedilink
      English
      317 months ago

      This has got to be the start of another bubble popping. It just has to, right? With essentially all online services doing everything they can to wring out every last penny of value without any eye towards the future (other than ai all the things)… something’s gotta give.

      But then again, maybe it’s just my eyes being open after living in those spaces for so long. Granted I’ve been out of Facebook for years, been de-amazonning for a couple (it’s really f’ing hard) and I’ve been trying to de-google as well but it’s even harder (stuck with Apple though). But, now that I’m in the fediverse, where we’re talking about all this, maybe that’s why I’m noticing?

      Nah, brace yourselves.

      • HobbitFoot
        link
        fedilink
        English
        127 months ago

        The start of the bubble popping was the increases in interest rates. We’ve seen several online companies shut down already because the free money isn’t there any more and there is no path to monetization.

        The problem with the Fediverse right now is that it is all run on volunteer labor and donations, similar to an early Reddit. It will be interesting to see how a distributed system solves this problem.

        • @[email protected]
          link
          fedilink
          117 months ago

          I think the volunteer labor and donations strategy works much, much, better on a distributed platform like the fediverse.

          • @[email protected]
            link
            fedilink
            37 months ago

            Sure, but what happens if the population explodes? Primarily server costs will go through the roof, and then you’re still relying on volunteer moderation. It works now because the fediverse is reasonably small, but a true user exodus for any major platform could overload existing instance resources. I think the saving grace here is that there is a bit of a learning curve with Lemmy that fends away the less tech savvy, but that could change in future updates

            • Urist
              link
              fedilink
              37 months ago

              Maybe I’m wrong but I think the fediverse isn’t quite that fragile. Instances can always close new sign ups if they’re overwhelmed. More users means more donations and more people likely to self host, too.

              I guess we could run into real issues if fediverse infrastructure doesn’t scale well (example: required server resources scale exponentially with more users instead of linearly)

              In extreme circumstances instances can defederate from larger ones if their mod teams are overwhelmed (obviously this isn’t a good solution but it is something beehaw.org is doing/did with lemmy.world)

        • @[email protected]
          link
          fedilink
          57 months ago

          It will be interesting to see how a distributed system solves this problem.

          The issue really comes down to the infrastructure costs. The fediverse is by design significantly less efficient with hardware than a centralized system. It isn’t that it’s difficult to scale, it’s just that it’s expensive to scale. And since the hardware is maintained by generosity of donation…

          This is offset by the higher interest in volunteer labour, though.

          I think the “solution” is just to accept that instances will burst in and out of existence (and favour) based on time and generosity.

          • @[email protected]
            link
            fedilink
            57 months ago

            As long as user profiles and contributions can transfer between instances, especially if the process is easy, then instances coming and going won’t be that much of a problem.

            I do hope that current and future open source tech moves towards monetization resistance if monetization can’t be done ethically. Donation and volunteers seem to be the working formula so far

      • Voytrekk
        link
        fedilink
        87 months ago

        I think the bubble is coming too. The question is how much it will take for normal users to be done with them. The current Lemmy user base is more focused on tech, open source, and/or privacy than the average Internet user, which is why we already abandoned Reddit.

        I think having to pay for access to these sites might be the biggest issue, as many people see the Internet as something that should be free.

    • @[email protected]
      link
      fedilink
      27 months ago

      There is such a thing as good technology. It would be nice if one of the tech comms would ban posts about shit tech

  • NutWrench
    link
    fedilink
    417 months ago

    “Early Stages?” You’ve got AI mining your data. The Lions have already come and gone. The hyenas and other scavengers are picking over the scraps, now.

      • @[email protected]
        link
        fedilink
        37 months ago

        They have probably only provided a small amount of the available data, and have much more data, of different types.

        Yes we’ve got the data, but now we need it from different angles!

  • The Bard in Green
    link
    fedilink
    36
    edit-2
    7 months ago

    When I go to some reddit posts on Mobile now (like from a Google search, that’s the only way I end up at reddit anymore), it tells me “this content is unmoderated” and gives me a choice to either navigate away or install the Reddit app. Fuck that noise.

    • @[email protected]
      link
      fedilink
      8
      edit-2
      7 months ago

      Try this, in either Bing/Copilot AI or Google Gemini: Start your prompt with “According to Reddit”, then do your search like you would by using search alone.

      The AI of your choice will scrape the posts and give you a nice summary of whatever you were searching for - no need to ever touch Reddit directly.

      For me, this works better with Copilot, YMMV.

      Example: “According to Reddit, what is the best mechanical keyboard brand to use for touch typing?”

  • @[email protected]
    link
    fedilink
    327 months ago

    I know it’s only token resistance at this point because others have found their comments from Google searches even after their accounts have been deleted, but Power Delete Suite is busy churning away on mine right now.

      • @[email protected]
        link
        fedilink
        77 months ago

        Lol

        My account was four years old. There was no way I was going to do it by hand. It took PDS 8 hours to churn through all that crap.

        I had been meaning to delete my account earlier for opsec reasons, but just hadn’t gotten around to it.

    • Deceptichum
      link
      fedilink
      87 months ago

      I wonder if constantly cycling through it could eat up their bandwidth, storage, etc. Might be a good way to fuck with them.

  • @[email protected]
    link
    fedilink
    307 months ago

    They’ve finally gone full /HailCorporate, become the thing some of the original people of the site would probably not have agreed with in many ways

  • @[email protected]
    link
    fedilink
    277 months ago

    “we are in the early stages of monetizing our user base,”

    If anyone on Reddit reads that and stays there willingly, they are an idiot. Not that they weren’t idiots for staying after the API changes, but now they are even bigger idiots.

  • @[email protected]
    link
    fedilink
    English
    247 months ago

    They permabanned my 14yo account because my anti-nazi rhetoric was “encouraging violence.” I guess Nazis are a class of humans dumb enough to give them money so they don’t want to scare them off. The post that got me banned had more than 60 up votes when it was deleted and I was permabanned. A reply post in the same vein was not deleted.

    • @[email protected]
      link
      fedilink
      12
      edit-2
      7 months ago

      Remember that video where Ron Perlman talked about there’s a lot of ways to lose a house?

      I lost my 11-year account because I said something to the effect of ‘If Ron Perlman pulled up and said get in the fucking car we’re going to go burn down Bob Iger’s house I wouldn’t hesitate.’

      They had been getting very weird near the end there anyways? I kept getting these stupid warnings over the most petty shit. At one point somebody said respond to this comment and I’ll gild you. I simply responded fuck you because I thought it would be funny to see that have gold, which it got. Got an official warning for harassment.

      I had said a lot worse over the years.

      • @[email protected]
        link
        fedilink
        English
        17 months ago

        Honest question: deleted comments might be just hidden and still up for sale, do people know if GDPR can come to the rescue here?

    • @[email protected]
      link
      fedilink
      English
      47 months ago

      To be fair, advocating violence on any platform will not get you very far, even if the idea is justified, e.g. against Nazis.

      • @[email protected]
        link
        fedilink
        37 months ago

        Curiously, Nazis seem to get away with doing just that, under their real names even! I reported a few of those on Twitter a while ago, before Elon’s takeover. Got a message that the reports were unwarranted and that if I continued to make them they’d disable my ability to report.

      • @[email protected]
        link
        fedilink
        English
        17 months ago

        I asked what Eisenhower would do if he saw the Nazi marchers in Wisconsin and had ready access to a machine gun. I don’t think that is advocating violence. I intended the comment to illustrate how far some Republicans have moved to the right since Ike was president.

        Eisenhower is dead. Advocating for his attendance at a Nazi march is nothing more than a thought experiment.

  • Uglyhead
    link
    fedilink
    English
    177 months ago

    “Pay-Per-Click”, is all this is when you break it down to its basest.

    Narwhal developers have come out and said that they have to pay beforehand for clicks to the API—- what absolute bullshit Reddit and Spez are bringing to the trough. Spez killed reddit—- calling it now; a slow painful lingering shitty death.

    People will not put up with it once they know what is really going on.

    Let em know. “Pay-Per-Click” will not stand.

    • @[email protected]
      link
      fedilink
      107 months ago

      People will not know what is really going on as they do not care. Reddit will continue to exist.

      • Uglyhead
        link
        fedilink
        English
        57 months ago

        Ah

        Yes

        I know. Fark and /. and MySpace still exist, too.