OK, this is why you should never trust AI benchmarks.
A recent study compared Opus 4.6 (the latest Claude model) with Opus 4.5 across 165 different tasks. The research concluded that Opus 4.6 did no better than the previous model. But Opus 4.6 got there at 50% of the cost and 50% of the wall time. That is still a massive improvement, so be wary of benchmark headlines. The best way to determine whether a model is better than previous iterations is to test it on your own specific needs. Anyway, well done to the Anthropic team for this massive reduction in cost and wall time.
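If you want to run that kind of test on your own tasks, here is a minimal sketch of the bookkeeping in Python. Everything in it is an assumption for illustration: the names `RunResult`, `summarize`, and `compare` are made up, it does not call any real model API, and the numbers in the example are placeholders, not figures from the study. The idea is just to track cost and wall time next to the pass rate, so that a "no better" score does not hide a cheaper, faster model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunResult:
    """Outcome of one task run (hypothetical record; fill it in from your own harness)."""
    passed: bool
    cost_usd: float
    wall_seconds: float


@dataclass
class Summary:
    pass_rate: float
    total_cost_usd: float
    total_wall_seconds: float


def summarize(results: List[RunResult]) -> Summary:
    """Aggregate per-task results into pass rate, total cost, and total wall time."""
    if not results:
        raise ValueError("no results to summarize")
    passed = sum(1 for r in results if r.passed)
    return Summary(
        pass_rate=passed / len(results),
        total_cost_usd=sum(r.cost_usd for r in results),
        total_wall_seconds=sum(r.wall_seconds for r in results),
    )


def compare(old: Summary, new: Summary) -> Dict[str, float]:
    """Report the quality change alongside cost and wall-time ratios (new / old)."""
    if old.total_cost_usd <= 0 or old.total_wall_seconds <= 0:
        raise ValueError("old totals must be positive to compute ratios")
    return {
        "pass_rate_delta": new.pass_rate - old.pass_rate,
        "cost_ratio": new.total_cost_usd / old.total_cost_usd,
        "wall_time_ratio": new.total_wall_seconds / old.total_wall_seconds,
    }


if __name__ == "__main__":
    # Placeholder data for two tasks per model; replace with your own measurements.
    old_runs = [RunResult(True, 0.10, 20.0), RunResult(False, 0.10, 20.0)]
    new_runs = [RunResult(True, 0.05, 10.0), RunResult(False, 0.05, 10.0)]
    print(compare(summarize(old_runs), summarize(new_runs)))
```

A pass-rate delta near zero with cost and wall-time ratios well below 1.0 is the situation described above: the headline score looks flat while the model got much cheaper and faster.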