Tag Archives: and Spain's grid blackout

The Download: AI benchmarks, and Spain’s grid blackout

It’s not easy being one of Silicon Valley’s favorite benchmarks.  SWE-Bench (pronounced “swee bench”) launched in November 2024 as a way to evaluate an AI model’s coding skill. It has since quickly become one of the most popular tests in AI. A SWE-Bench score has become a mainstay of major model releases from OpenAI, Anthropic, […]