OpenAI announces it will no longer evaluate models on SWE-bench Verified, citing contamination and data leakage issues. The organization recommends SWE-bench Pro as a replacement.
OpenAI plans to retire the SWE-bench Verified benchmark, citing flaws that undermine its validity as a coding performance measure. The move highlights concerns about memorization in AI model evaluations.