Wondering if Modern LLMs like GPT4, Claude Sonnet and llama 3 are closer to human intelligence or next word predictor. Also not sure if this graph is right way to visualize it.

  • I won’t pretend I understand all the math and the notation they use, but the abstract/conclusions seem clear enough.

    I’d argue what they’re presenting here isn’t the LLM actually “reasoning”. I don’t think the paper really claims that the AI does either.

    The CoT process they describe here I think is somewhat analogous to a very advanced version of prompting an LLM something like “Answer like a subject matter expert” and finding it improves the quality of the answer.

    They basically help break the problem into smaller steps and get the LLM to answer smaller questions based on those smaller steps. This likely also helps the AI because it was trained on these explained steps, or on smaller problems that it might string together.

    I think it mostly helps to transform the prompt into something that is easier for an LLM to respond accurately to. And because each substep is less complex, the LLM has an easier time as well. But the mechanism to break down a problem is quite rigid and not something trainable.

    It’s super cool tech, don’t get me wrong. But I wouldn’t say the AI is really “reasoning” here. It’s being prompted in a really clever way to increase the answer quality.