I'm confused about this. It's so incongruent with my actual day-to-day experience of LLMs, which is that they mostly sound the same now, and have since Claude 3.5 Sonnet. Either benchmarks have become meaningless, or models have become so good that further improvements can't be felt through vibes, only detected through rigorous analysis.
Tolga Bilge · 11 Aug, 10:47
Are the sigmoids in the room with us right now?