Here's a comparison of GPT-5 and Grok 4. GPT-5 with tools scores between Grok 4 and Grok 4 Heavy on the Humanity's Last Exam benchmark.
Given that it's a single agent rather than a swarm of agents, that's very impressive.
*GPT-5 Pro