Seeking to make large language models more adaptable to challenging tasks that require reasoning, MIT researchers found that strategically applying a method known as test-time training with task-specific examples can boost an LLM's accuracy more than sixfold.