Large Language Models (LLMs) have demonstrated remarkable performance in solving complex reasoning tasks through mechanisms like Chain-of-Thought (CoT) prompting, which emphasizes verbose, step-by-step reasoning. However, humans typically employ a more efficient strategy: drafting concise intermediate thoughts that capture only essential information. In this work, we propose Chain of Draft (CoD), a novel paradigm inspired by human cognitive processes, where LLMs generate minimalistic yet informative intermediate reasoning outputs while solving tasks. By reducing verbosity and focusing on critical insights, CoD matches or surpasses CoT in accuracy while using as little as 7.6% of the tokens, significantly reducing cost and latency across various reasoning tasks.
Standard: Answer the question directly. Do not return any preamble, explanation, or reasoning.
Chain-of-Thought: Think step by step to answer the following question. Return the answer at the end of the response after a separator ####.
Chain-of-Draft: Think step by step, but only keep a minimum draft for each thinking step, with 5 words at most. Return the answer at the end of the response after a separator ####.
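If you want to try the comparison yourself, here's a minimal sketch that runs all three prompts against an OpenAI-compatible chat API and compares completion-token counts. The model name and sample question are placeholders of mine, not taken from the paper's repo, and the #### split just follows the separator convention in the prompts above.

```python
# Minimal sketch: compare Standard / CoT / CoD prompting styles.
# Assumes an OpenAI-compatible API; OPENAI_API_KEY must be set in the environment.
from openai import OpenAI

client = OpenAI()

PROMPTS = {
    "standard": "Answer the question directly. Do not return any preamble, "
                "explanation, or reasoning.",
    "cot": "Think step by step to answer the following question. Return the "
           "answer at the end of the response after a separator ####.",
    "cod": "Think step by step, but only keep a minimum draft for each "
           "thinking step, with 5 words at most. Return the answer at the end "
           "of the response after a separator ####.",
}

# Placeholder question; any reasoning problem works here.
question = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
            "than the ball. How much does the ball cost?")

for name, system_prompt in PROMPTS.items():
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; swap in whatever model you're testing
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    text = resp.choices[0].message.content
    # The CoT and CoD prompts put the final answer after "####".
    answer = text.split("####")[-1].strip()
    # CoD should use far fewer completion tokens than CoT at similar accuracy.
    print(f"{name}: {resp.usage.completion_tokens} tokens -> {answer}")
```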
That's interesting. Good tip.
Looking at their repo, they've tested this with models that have not been trained to generate chain-of-thought outputs, simply by varying the system prompt. It's therefore more of a proof of concept, but I can imagine that if you trained a model to do this natively it could work.
Using the same prompt with QwQ made no difference for me (the chain of thought was still very long and quite verbose), while using it with Qwen2.5 Coder made the output extremely terse and not very useful for open-ended questions.