The Center for AI Safety, an AI doom crank nonprofit, and Scale AI have released a new AI benchmark called “Humanity’s Last Exam.” This supposedly tests “world-class expert-level reasoning and know…
It’s just pure grift: they’re creating an experiment whose outcome tells us no new information. Even if models stop ‘improving’ today, it’s a static benchmark, and by EOY worked solutions will have leaked into the training data of any new models, so performance will saturate to 90%. At which point Dan and the AI Safety folks at his fake ass not-4-profit can clutch their pearls and claim humanity is obsolete, so they need more billionaire funding to save us, & Sam and Dario can get more investors to buy them GPUs. If anything, I’m hoping the FrontierMath debacle will inoculate us all against this bullshit (at least I think it’s stolen some of the thunder from their benchmark’s attempt to hype the end of days🫠)
I mean is it really leaking if you can get access to the dataset without signing anything agreeing to not leak it? When I last checked you could just like look at the questions after checking a box acknowledging that they can see your email address but that’s it.