• andrew_bidlaw@sh.itjust.works
    23 upvotes · 13 hours ago

    As it learns from our data, no wonder it fucks up at regexps. They’re arcane knowledge, inaccessible to us mere mortals and to LLMs alike.

    • ryathal@sh.itjust.works
      18 upvotes, 2 downvotes · 12 hours ago

      If you know even a little about how an LLM works, it’s obvious why regex is basically impossible for it. I suspect Perl has similar problems, but no one is capable of actually validating that.

      • Ignotum@lemmy.world
        1 upvote · 4 hours ago

        What do you mean it’s impossible for it? I know how LLMs work, but I don’t know of any such limitations.

        Write me a regex that matches a letter repeated four times, followed by a 3 or 4 digit number

        Here’s your regex: ([a-zA-Z])\1{3}\d{3,4}
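One way to check the regex above is to run it against a few inputs. This is a sketch using Python’s `re` module, chosen only for illustration since the thread doesn’t name a regex flavor; the helper names and test strings are hypothetical. The pattern relies on the backreference `\1`, which Python supports. It compares `fullmatch` (the whole string must fit) with `search` (a match anywhere in the string).

```python
import re

# The regex from the reply: a letter captured in group 1, then \1{3}
# requires that same letter three more times (four in total), then
# \d{3,4} requires three or four digits.
PATTERN = re.compile(r"([a-zA-Z])\1{3}\d{3,4}")


def matches_whole(s: str) -> bool:
    """True only if the entire string fits the pattern (anchored at both ends)."""
    return PATTERN.fullmatch(s) is not None


def matches_anywhere(s: str) -> bool:
    """True if the pattern occurs somewhere in the string (unanchored)."""
    return PATTERN.search(s) is not None


if __name__ == "__main__":
    # Hypothetical test inputs, picked to probe each part of the pattern.
    for s in ["aaaa123", "BBBB4567", "aaab123", "aaaa12", "aaaa12345", "xaaaa123"]:
        print(f"{s!r}: whole={matches_whole(s)} anywhere={matches_anywhere(s)}")
```

Left unanchored, the pattern also finds a hit inside longer strings such as `aaaa12345` or `xaaaa123`. If the whole input has to fit, wrap it in `^...$` or use `fullmatch`.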

        • ryathal@sh.itjust.works
          2 upvotes, 1 downvote · 3 hours ago

          They aren’t context-aware; they work on statistical probability. An LLM can replicate things it has seen a lot of, like a tutorial regex, but it can’t apply that to build a more complicated one. Regex in the wild isn’t really standard at all, because it’s rarely used to solve common problems. The model has a bunch of random regexes from code it analyzed and will spit out something that looks similar.