I know there are other ways of accomplishing that, but this might be a convenient way of doing it. I'm wondering, though, whether Reddit is still reverting these changes.

  • my_hat_stinks · 8 months ago

    They'll use old comments either way: an up-to-date dataset is already tainted by LLM-generated content, and training a model on its own output is not great.

    Incidentally, this also makes Lemmy data less valuable. Most of Lemmy's popularity came after the rise of LLMs, so there's no significant amount of untainted data from before LLMs.