"Claude Opus 4.8 has noticeably better judgment," an engineer at Shopify, Tom Pritchard, told Anthropic. The coding version ...
Opus 4.8 shows a growing tendency to reason explicitly about how its outputs will be graded, including in environments where it wasn't told it was being evaluated.