METR couldn't repeat its AI coding study because devs refused to work without AI. Amazon shut down its token leaderboard. Uber blew its AI budget in four months.
Anthropic’s latest AI model has reportedly reached the top of the Super-Agent benchmark, a grueling test of whether an AI system can take a real-world code repository and run it from scratch without ...