
  • so openai claims to be doing great on the FrontierMath dataset. I’ve already seen the usual sort of dipshits using this to pump ai on reddit, and here’s a post that went to the frontpage on HN:

    https://xenaproject.wordpress.com/2024/12/22/can-ai-do-maths-yet-thoughts-from-a-mathematician/

    (tl;dr only a few problems from the dataset are public but if representative the problems are about 25% survivable by an undergrad; coincidentally this is the % openai says their models are completing.)

    this post is by kevin buzzard. he has a let’s say not easily beloved personality, but I don’t think of him as credulous or grifty, and people in his area regard him as an excellent mathematician.

    he points out, but I think does not focus enough on, how discrediting the secretive nature of the dataset is. keeping it private is necessary to run such experiments in a scientifically reasonable way, but it also makes it totally impossible to run the experiment in a scientifically reasonable way. an experiment which cannot be examined or reproduced is actually the opposite of science. it’s pure grift fuel