A year ago, we verified a preview of an unreleased version of @OpenAI o3 (High) that scored 88% on ARC-AGI-1 at an estimated $4.5k/task. Today, we've verified a new GPT-5.2 Pro (X-High) SOTA score of 90.5% at $11.64/task. This represents a ~390X improvement in cost efficiency in one year.
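A quick sanity check of the ~390X figure. This is a minimal sketch that assumes "efficiency" means the ratio of cost per task between the two verified results. It ignores the score difference (88% vs 90.5%), and the $4.5k input is itself an estimate, so the result is approximate.

```python
# Ratio of per-task cost between two verified ARC-AGI-1 results.
# Assumption: "efficiency improvement" = old cost per task / new cost per task.
O3_PREVIEW_COST_PER_TASK = 4_500.00  # USD, estimated, o3 preview (High), 88%
GPT52_PRO_COST_PER_TASK = 11.64      # USD, GPT-5.2 Pro (X-High), 90.5%


def efficiency_gain(old_cost: float, new_cost: float) -> float:
    """Return how many times cheaper per task the newer result is."""
    return old_cost / new_cost


gain = efficiency_gain(O3_PREVIEW_COST_PER_TASK, GPT52_PRO_COST_PER_TASK)
print(f"~{gain:.0f}x cheaper per task")  # prints "~387x cheaper per task"
```

Rounded to two significant figures, 387 becomes the ~390X quoted in the thread.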
We also verified that GPT-5.2 Pro (High) is SOTA for ARC-AGI-2, scoring 54.2% at $15.72/task. (Due to API timeouts, we were unable to reliably verify GPT-5.2 Pro (X-High) on ARC-AGI-2.) All verified GPT-5.2 family scores:
ARC-AGI is achieving its 2019 goal of pushing AI beyond memorization toward efficient, on-the-fly adaptation. Reasoning systems now show genuine fluid intelligence on simple tasks.
Even with this big efficiency improvement, a large gap remains between AI and humans. The 2025 Grand Prize goal was $0.20/task, and humans are several orders of magnitude more efficient on an energy basis. There is still much to learn from ARC-AGI-1 and ARC-AGI-2.
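To put the remaining cost gap in numbers, here is a rough sketch that divides the verified per-task costs quoted above by the $0.20/task target. The thread does not say which benchmark the target applies to, so both verified costs are shown. This covers dollar cost only, not the energy-based comparison with humans.

```python
# Remaining cost gap: how many times above the $0.20/task target each
# verified result sits. Assumption: a direct cost-per-task comparison.
GRAND_PRIZE_TARGET = 0.20  # USD/task, 2025 Grand Prize goal
ARC_AGI_1_COST = 11.64     # USD/task, GPT-5.2 Pro (X-High), ARC-AGI-1
ARC_AGI_2_COST = 15.72     # USD/task, GPT-5.2 Pro (High), ARC-AGI-2


def cost_gap(current_cost: float, target_cost: float) -> float:
    """Return how many times more expensive current_cost is than target_cost."""
    return current_cost / target_cost


gap_agi1 = cost_gap(ARC_AGI_1_COST, GRAND_PRIZE_TARGET)
gap_agi2 = cost_gap(ARC_AGI_2_COST, GRAND_PRIZE_TARGET)
print(f"ARC-AGI-1: {gap_agi1:.1f}x above target")  # prints "ARC-AGI-1: 58.2x above target"
print(f"ARC-AGI-2: {gap_agi2:.1f}x above target")  # prints "ARC-AGI-2: 78.6x above target"
```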
ARC-AGI-3 (2026) will push AI capability and efficiency even further. Designed to measure how efficiently AI can learn and generalize in novel environments, it will be a first-of-its-kind Interactive Reasoning Benchmark. Stay tuned.
If shipping hundreds of novel games that test the frontier of AI in just a few months sounds exciting, join the engineering team creating ARC-AGI-3.