Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark We measured token usage across reasoning models: open models output 1.5-4x more tokens than closed models on identical tasks, but with huge variance depending on task type (up to 10x on simple questions). This hidden cost often negates per-token pricing advantages. Token efficiency should become a primary target alongside accuracy benchmarks, especially considering non-reasoning use cases. Read the thorough review of reasoning efficiency across the open and closed model landscape in our latest blog post in collaboration with our researcher in residence, Tim. See more of their work here:
19,39K