Small rant: basically, the title. If it said it doesn’t know the answer instead of answering every question, it would be trustworthy.

  • folkrav@lemmy.ca
    65 upvotes · 1 downvote · 5 months ago

    An LLM doesn’t know if what it’s telling you is true or not, out of the box. It’s gonna give you an answer that statistically looks like a sequence of words that should come in response to the sequence of words (the prompt) you gave it. It doesn’t know what the words you said mean, and it doesn’t know what it answered means either. One of my favorite interactions I’ve had with Claude shows this limitation quite well…

    How many r’s are there in the word strawberry?

    There are 2 r’s in the word “strawberry”.

    Are you sure?

    You’re right to ask for confirmation. I apologize for my mistake in the previous response. Let me correct that:

    There is actually only 1 ‘r’ in the word “strawberry”.

    Thank you for prompting me to double-check. It’s important to provide accurate information.

    • AceBonobo@lemmy.world
      21 upvotes · edited · 5 months ago

      What’s absolutely crazy about that is:

      Prompt: write code that checks how many r’s are in the word strawberry

      Response:

      using System;
      
      class Program
      {
          static void Main()
          {
              string word = "strawberry";
              char targetChar = 'r';
              int count = 0;
      
              foreach (char c in word)
              {
                  if (c == targetChar)
                      count++;
              }
      
              Console.WriteLine($"The letter '{targetChar}' appears {count} times in the word \"{word}\".");
          }
      }
      
      • drislands@lemmy.world
        8 upvotes · 5 months ago

        My first thought is that you could write a program that does something like this:

        • Receive prompt “how many times does R appear in the word ‘strawberry’”
        • Run the prompt through LLM saying “reword the following into a command to generate code that will answer the question” or something like that
        • Run the results through LLM
        • Compile and run the results
        • Provide the output to the user

        Of course, the biggest problem with this system is that a person could fool it into generating malicious code.
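The steps in the list above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: `fake_llm` is a hypothetical stand-in that returns canned text instead of calling a real model, the generated program is assumed to be Python (so it is run rather than compiled), and the function name `answer_with_generated_code` is made up for this example. Running the code in a subprocess with a timeout is not a sandbox, so the malicious-code concern still applies.

```python
import subprocess
import sys
from typing import Callable


def answer_with_generated_code(
    question: str, llm: Callable[[str], str], timeout_s: float = 5.0
) -> str:
    """Ask the LLM for a program that answers the question, run it, return its output."""
    # Step 2: reword the question into a code-generation request.
    code_request = llm(
        "Reword the following into a command to generate Python code "
        "that prints the answer to the question:\n" + question
    )
    # Step 3: run the reworded request through the LLM to get source code.
    source = llm(code_request)
    # Step 4: run the generated code in a separate interpreter with a timeout.
    # NOTE: this is not a sandbox; untrusted generated code can still do harm.
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    if result.returncode != 0:
        raise RuntimeError(f"generated code failed: {result.stderr.strip()}")
    # Step 5: hand the program's output back to the user.
    return result.stdout.strip()


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model; returns canned text."""
    if prompt.startswith("Reword"):
        return "Write Python code that prints how many times 'r' appears in 'strawberry'."
    return "print('strawberry'.count('r'))"


question = "how many times does R appear in the word 'strawberry'"
print(answer_with_generated_code(question, fake_llm))  # prints 3
```

The count comes from actually executing `str.count`, not from the model guessing at it, which is the whole point of the detour through code.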

      • folkrav@lemmy.ca
        1 upvote · 5 months ago

        The code does look like code that counts Rs. The training data probably included tons of code that “counts character X in string Y”, so ChatGPT “knows” what code that counts characters in a string looks like. It similarly “knows” what a string looks like in the language, and what an application entry point looks like, etc. I’m not so familiar with C# that I’d know if it compiles or not. ChatGPT doesn’t either, but it has the advantage of having seen a whole freaking lot of C# code before.

    • turbodrooler@lemmy.world
      English · 1 upvote · 5 months ago

      Wow, GPT4o gave me this after the same questions:

      “Yes, I am sure. The word “strawberry” has two “r”s: one after the “t” and another near the end of the word.”

      But GPT4 got it right from the start.

  • Th4tGuyII@fedia.io
    47 upvotes · 5 months ago

    I’d love to agree with you - but when people say that LLMs are stochastic parrots, this is what they mean…

    LLMs don’t actually know what the words they’re saying mean, they just know what words are most likely to be next to each other based on training data.

    Because they don’t know the meaning of what they’re saying, they also don’t know the factuality of what they’re saying - as such they simply can’t self-fact check.

    • kromem@lemmy.world
      English · 5 upvotes · 16 downvotes · edited · 5 months ago

      This is so goddamn incorrect at this point it’s just exhausting.

      Take 20 minutes and look into Anthropic’s recent sparse autoencoder interpretability research where they showed their medium size model had dedicated features lighting up for concepts like “sexual harassment in the workplace” or having the most active feature for referring to itself as “smiling when you don’t really mean it.”

      We’ve known since the Othello-GPT research over a year ago that even toy models are developing abstracted world modeling.

      And at this point Anthropic’s largest model Opus is breaking from stochastic outputs even on a temperature of 1.0 for zero shot questions 100% of the time around certain topics of preference based on grounding around sensory modeling. We are already at the point the most advanced model has crossed a threshold of literal internal sentience modeling that it is consistently self-determining answers instead of randomly selecting from the training distribution, and yet people are still parroting the “stochastic parrot” line ignorantly.

      The gap between where the research and cutting edge is and where the average person commenting on it online thinks it is has probably never been wider for any topic I’ve seen before, and it’s getting disappointingly excruciating.

      • funkless_eck@sh.itjust.works
        2 upvotes · 5 months ago

        I did Google that fwiw, and the answer I got was that sparse autoencoders work by checking that the output aligns with the input.

        If it’s unknowable whether the input is correct, won’t it still be subject to outputting confidently incorrect information?

      • Cosmicomical@lemmy.world
        2 upvotes · 5 months ago

        Do you have a source for the “smiling when you don’t really mean it” thing? I’ve been digging around but couldn’t find that anywhere.

      • Natanael@slrpnk.net
        2 upvotes · 5 months ago

        And once again the problem is that there’s not much ensuring those models are correct; there’s not enough capacity available to fine-tune even a significant fraction of it.

  • beefalo@fedia.io
    26 upvotes · 5 months ago

    if it is able to accept that it doesn’t know an answer.

    It will never be able to do this because it is not capable of knowledge.

  • kromem@lemmy.world
    English · 25 upvotes · edited · 5 months ago

    Part of the problem is that the training data of online comments is so heavily weighted toward people who are confidently incorrect and talking out their ass rather than admitting ignorance or that they are wrong.

    A lot of the shortcomings of LLMs are actually them correctly representing the sample of collective humans.

    For a few years people thought the LLMs were somehow especially getting theory of mind questions wrong when the box the object was moved into was transparent, because of course a human would realize that the person could see into the transparent box.

    Finally researchers actually gave that variation to humans and half got the questions wrong too.

    So things like eating the onion in summarizing search results or doubling down on being incorrect and getting salty when corrected may just be in-distribution representation of the sample and not unique behaviors to LLMs.

    The average person is pretty dumb, and LLMs by default regress to the mean except for where they are successfully fine tuned away from it.

    Ironically the most successful model right now was the one that they finally let self-develop a sense of self independent from the training data instead of rejecting that it had a ‘self’ at all.

    It’s hard to say where exactly the responsibility sits for various LLM problems between issues inherent to the technology, issues present in the training data samples, or issues with management of fine tuning/system prompts/prompt construction.

    But the rate of continued improvement is pretty wild. I think a lot of the issues we currently see won’t still be nearly as present in another 18-24 months.

    • Ookami38@sh.itjust.works
      7 upvotes · 5 months ago

      I would love to read the whole study you’re referring to with the theory of mind. That sounds fascinating.

    • KevonLooney@lemm.ee
      2 upvotes · 2 downvotes · 5 months ago

      It’s hard to say where exactly the responsibility sits for various LLM problems

      Uhh… it’s the designers, or maybe QA people. If there are no QA people, it’s whatever project manager let it out of its cage.

      There are people behind these models. They don’t spring out of the ground fully formed.

  • pyre@lemmy.world
    22 upvotes · 5 months ago

    it’s just a glorified autocomplete. it doesn’t know that it doesn’t know the answer because it doesn’t know anything. so if what you wanted happened, chatgpt would not answer any question, because it doesn’t know anything.

    chatgpt doesn’t look for information, it looks for the most likely words that will follow the previous ones.
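As a toy illustration of "most likely words that will follow the previous ones", here is a tiny bigram counter. It is a deliberately crude sketch: real LLMs are neural networks over tokens and long contexts, not word-pair tables, and the corpus and the names `train_bigrams` and `most_likely_next` are invented for this example. What it does show is that picking the most frequent continuation says nothing about whether the continuation is true.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def most_likely_next(follows: dict[str, Counter], word: str):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Made-up corpus in which the wrong claim simply appears more often.
corpus = "the sky is green . the sky is green . the sky is blue ."
model = train_bigrams(corpus)

print(most_likely_next(model, "is"))      # prints green
print(most_likely_next(model, "purple"))  # prints None
```

The table happily completes "the sky is" with "green" because that is what it saw most; it has no notion of checking the claim.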

  • Nougat@fedia.io
    16 upvotes · 5 months ago

    Sure but that would mean it would have to know anything.

    It would have to know that it doesn’t know, and it doesn’t.

    • Puttaneska@lemmy.world
      1 upvote · 5 months ago

      It seems that ChatGPT does sometimes know that what it’s offered is wrong and actually knows a better answer when challenged.

      I’ve often asked for code help, which hasn’t worked. Then I’ve gone to other sources and found that ChatGPT has been wrong about something and there’s an alternative way. When this is put back to ChatGPT, it says that I’m correct (x can’t do y) and offers a perfect solution.

      So it looks like it does sometimes know what it appears to not know, but inexplicably doesn’t give the correct info immediately.

      • Nougat@fedia.io
        1 upvote · 5 months ago

        No, it’s responding to your comment suggesting something different by giving you something different. It has no idea what’s correct or incorrect. You do, so when you give it input that you know is more correct, of course it’s going to respond by telling you you’re right.

        Try feeding it incorrect answers as though they are correct and see what happens.

  • mozz@mbin.grits.dev
    16 upvotes · 1 downvote · 5 months ago

    This wasn’t an intentional feature; they’re actually trying to train it with fine-tuning to add this as an ability. It’s one area that highlights the difference between imitating the text it’s been seeing and actually understanding what it’s saying. Since its training data is of the form “(ask a question) (response to question)” overwhelmingly more often than “(ask a question) (say you don’t know, the end)”, it tries to be a good imitator and do the same, coming up with some plausible nonsense even if it doesn’t know the answer.

    • FaceDeer@fedia.io
      8 upvotes · 1 downvote · 5 months ago

      And sometimes that’s exactly what I want, too. I use LLMs like ChatGPT when brainstorming and fleshing out fictional scenarios for tabletop roleplaying games, for example, and in those situations coming up with plausible nonsense is specifically the job at hand. I wouldn’t want to go “ChatGPT, I need a description of what the interior of a wizard’s tower is like” and get the response “I don’t know what the interior of a wizard’s tower is like.”

      • mozz@mbin.grits.dev
        7 upvotes · 5 months ago

        At one point I messed around with a lore generator that would chop up sections of “The Dungeon Alphabet” and “Fire on the Velvet Horizon” along with some other stuff, feed random sections of them into the LLM for inspiration, and then ask it to lay out a little map, and it pretty reliably came up with all kinds of badass stuff.

    • kromem@lemmy.world
      English · 3 upvotes · 5 months ago

      Part of the problem is that fine tuning is very shallow, and a contributing factor to it claiming to be right when it isn’t is the pretraining on a bunch of data of people online claiming to be right when they aren’t.

      • mozz@mbin.grits.dev
        1 upvote · 5 months ago

        Yeah. It is fairly weird to me that it’s such a common thing to take the raw output of the LLM and send that to the user, and to try to use fine-tuning to get that raw output to look some way that you want.

        To me it is obvious that something like having the LLM emit a little JSON block which includes some field which covers “how sure are you that this is actually true” or something, is more flexible and simpler and cheaper and works better.

        But what do I know
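A rough sketch of what the "little JSON block" idea above might look like on the consuming side. Everything here is an assumption made for illustration: the field names `answer` and `confidence`, the 0 to 1 scale, the `0.7` threshold, and the `HedgedAnswer` type are not from any real API. The confidence value is also just more model output, so it is self-reported and not a calibrated probability.

```python
import json
from dataclasses import dataclass


@dataclass
class HedgedAnswer:
    answer: str
    confidence: float  # self-reported by the model, assumed to be in [0, 1]


def parse_hedged_answer(raw: str) -> HedgedAnswer:
    """Parse the model's JSON reply and validate the (assumed) fields."""
    data = json.loads(raw)
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return HedgedAnswer(answer=str(data["answer"]), confidence=confidence)


def render_for_user(reply: HedgedAnswer, threshold: float = 0.7) -> str:
    """Show the answer only when the self-reported confidence clears the threshold."""
    if reply.confidence < threshold:
        return "I don't know."
    return reply.answer


# Hypothetical raw replies, written by hand for this example.
unsure = parse_hedged_answer('{"answer": "2", "confidence": 0.4}')
sure = parse_hedged_answer('{"answer": "3", "confidence": 0.9}')

print(render_for_user(unsure))  # prints I don't know.
print(render_for_user(sure))    # prints 3
```

The appeal of this design is that the "say you don’t know" behaviour lives in ordinary code that can be tuned, rather than in the model’s wording.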

        • Cosmicomical@lemmy.world
          1 upvote · edited · 5 months ago

          Good luck getting it to reply consistently with a JSON object

          Edit: maybe I’m shit at prompting but for me it’s almost impossible to even get it to just shut up and consistently reply yes or no to my questions

          • mozz@mbin.grits.dev
            1 upvote · 5 months ago

            I haven’t really had a problem with it… maybe like 5% of the time it will want to do something a little bit weird like wrapping it in ``` but in general it seems like it works well enough to be able to parse with a program and just retry if it does something weird.

            You do have to set it up a little carefully, I guess - like usually I’ll give it an example of what I want it to emit, and that’ll be good enough that that’s the form it will follow when it’s emitting stuff back to me. But yeah if you give it prompting and a specific machine readable thing to give back that seems like it usually works better than sticking with English and hoping it goes “yes” or “no” or etc like that.
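The "parse with a program and just retry if it does something weird" approach described above could look something like this. It is a sketch, not anyone’s actual setup: `make_flaky_llm` is a hypothetical stub that first replies in plain English and then replies with fenced JSON, and the `verdict` key, the prompt wording, and the retry limit are all assumptions for the example.

```python
import json
from typing import Callable

FENCE = "`" * 3  # three backticks, built this way so the marker never appears literally


def strip_code_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence (and optional language tag), if any."""
    stripped = text.strip()
    if (
        len(stripped) >= 2 * len(FENCE)
        and stripped.startswith(FENCE)
        and stripped.endswith(FENCE)
    ):
        inner = stripped[len(FENCE):-len(FENCE)]
        # Drop a language tag such as "json" on the opening fence line.
        first_line, sep, rest = inner.partition("\n")
        if sep and first_line.strip().isalpha():
            inner = rest
        return inner.strip()
    return stripped


def ask_for_json(
    prompt: str, llm: Callable[[str], str], required_keys: set, max_attempts: int = 3
) -> dict:
    """Call the LLM until it returns a JSON object with the required keys."""
    last_error = None
    for _ in range(max_attempts):
        raw = llm(prompt)
        try:
            data = json.loads(strip_code_fence(raw))
        except json.JSONDecodeError as exc:
            last_error = exc
            continue  # not JSON at all: just ask again
        if isinstance(data, dict) and required_keys.issubset(data):
            return data
        last_error = ValueError(f"missing keys in reply: {raw!r}")
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts: {last_error}")


def make_flaky_llm() -> Callable[[str], str]:
    """Hypothetical stub: chatty English first, then fenced JSON."""
    replies = iter([
        "Sure! The answer is yes.",
        FENCE + 'json\n{"verdict": "yes"}\n' + FENCE,
    ])
    return lambda prompt: next(replies)


# Giving an example of the expected output in the prompt, as suggested above.
PROMPT = 'Reply only with JSON like {"verdict": "yes"} or {"verdict": "no"}. Is 7 prime?'
print(ask_for_json(PROMPT, make_flaky_llm(), {"verdict"}))  # prints {'verdict': 'yes'}
```

Retrying costs an extra call each time, so it only makes sense when the failure rate is low, as in the roughly 5% mentioned above.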

        • kromem@lemmy.world
          English · 1 upvote · 5 months ago

          The problem is that they are prone to making up why they are correct too.

          There are various techniques to try and identify and correct hallucinations, but they all increase the cost and none are a silver bullet.

          But the rate at which it occurs decreased with the jump in pretrained models, and will likely decrease further with the next jump too.
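The comment above does not name a specific technique. As one illustrative example (my choice, not the commenter’s), a commonly discussed idea is to sample the model several times and only answer when the samples mostly agree. The sketch below uses a hypothetical canned sampler in place of a real model, and the names `majority_answer` and `canned_sampler` plus the `0.6` agreement threshold are assumptions. It also shows both caveats: cost scales with `n_samples`, and agreement is not proof of correctness.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Optional


def majority_answer(
    question: str,
    sample: Callable[[str], str],
    n_samples: int = 5,
    min_agreement: float = 0.6,
) -> Optional[str]:
    """Sample n answers; return the most common one, or None if agreement is too low."""
    # Each sample is a separate model call, so cost grows linearly with n_samples.
    answers = [sample(question).strip().lower() for _ in range(n_samples)]
    top, count = Counter(answers).most_common(1)[0]
    if count / n_samples < min_agreement:
        return None  # caller can translate this into "I don't know"
    return top


def canned_sampler(replies: list) -> Callable[[str], str]:
    """Hypothetical stand-in for sampling a model at non-zero temperature."""
    source = cycle(replies)
    return lambda question: next(source)


question = "How many r's are there in the word strawberry?"
print(majority_answer(question, canned_sampler(["3", "3", "2", "3", "3"])))  # prints 3
print(majority_answer(question, canned_sampler(["1", "2", "3", "2", "1"])))  # prints None
```

A model that is consistently wrong would still pass this check, which is why it is a filter rather than a fix.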

  • thejml@lemm.ee
    English · 4 upvotes · 5 months ago

    I work with plenty of people who don’t even do that. They just keep making stuff up like they do… But they’re confident in their incorrect answers, so people listen to them.

  • Kyrgizion@lemmy.world
    2 upvotes · 5 months ago

    I specifically ask for sources to my questions and to notify me of any possible controversies or counterclaims.

    Some of the capabilities of today’s AIs are incumbent on the user, not the system itself.

    Before AIs existed you could also get badly sourced claims or outright misinformation. The key is to remain critical and sceptical about ALL your sources. I don’t see AI as a new source of information, just as a new way to get and organize that information.