

The official Anthropic post/announcement
Very interesting read
Highlights: the math guessing game (lol), the bullshitting in its “thinking out loud”, identifying hidden (trained) biases, looking ahead when producing text, following multi-step reasoning, analyzing jailbreak prompts, and the analysis of anti-hallucination training and hallucinations.
At the same time, we recognize the limitations of our current approach. Even on short, simple prompts, our method only captures a fraction of the total computation performed by Claude, and the mechanisms we do see may have some artifacts based on our tools which don’t reflect what is going on in the underlying model. It currently takes a few hours of human effort to understand the circuits we see, even on prompts with only tens of words.
Arguably, the openness lies in the fact that EU OS can switch from one to another at some point if that becomes necessary.
Supporting multiple alternatives within the same platform and OS is costly. The cost is not only in integration but also in user training and troubleshooting, particularly around the many subtle differences, big and small. Focusing on one, for now anyway, makes sense.