GPT-5 and Opus 4.1 still fail my eval, "Can the AI plot a short story for my Masculine Mongoose series?" Success is EY-hard; I've only composed 3 stories like that. But the AI failures feel like very far misses. They didn't get the point of a Bruce Kent story.
The short story series in question: Bruce Kent #1: Bruce Kent #2 (skippable): Bruce Kent #3:
The AIs take their shots: GPT-5 Thinking: Opus 4.1 Extended Thinking: