Google Gemini 3 Makes a Huge Leap on the ARC‑AGI Benchmark
- By Bruce Nielson
- ML & AI Specialist
Google’s Gemini 3 has posted a standout result on the ARC Prize Leaderboard for ARC‑AGI‑1: its Deep Think preview scored about 87.5%, a very strong showing on a benchmark focused on abstract reasoning. ARC‑AGI‑1 (Abstraction and Reasoning Corpus for AGI) is designed to test fluid intelligence. Each task provides a few input/output grid examples, and a model must infer the underlying transformation rule and apply it to a new test input rather than rely on memorization. The benchmark emphasizes skill-acquisition efficiency, rewarding reasoning and generalization over brute-force performance. (ARC Prize Technical Report)
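To make "infer the rule from a few examples" concrete, here is a minimal sketch of what an ARC‑style task looks like and what solving one means. It borrows the public ARC JSON layout (a `train` list of input/output pairs and a `test` list, with grids of integers 0 to 9 standing for colors). The toy task, the function names `infer_rule` and `solve`, and the five-entry `CANDIDATES` table are hypothetical and invented for illustration. The solver is a naive brute-force search over that tiny set of whole-grid transformations. It is not how Gemini 3 or any leaderboard entry works, and real ARC tasks use far richer rules than a fixed list like this can express.

```python
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]  # ARC-style grid: rows of integers 0-9, each integer a color

# Hypothetical toy task in the ARC JSON layout: demonstration pairs plus a test input.
# The hidden rule in this made-up example is a left-right mirror.
toy_task: Dict[str, List[Dict[str, Grid]]] = {
    "train": [
        {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
        {"input": [[3, 4], [0, 5]], "output": [[4, 3], [5, 0]]},
    ],
    "test": [{"input": [[7, 0, 8]]}],
}

# Hypothetical hypothesis space: a few whole-grid transformations, tried in order.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "flip_lr": lambda g: [row[::-1] for row in g],
    "flip_ud": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
    "rotate_90_cw": lambda g: [list(col) for col in zip(*g[::-1])],
}


def infer_rule(train_pairs: List[Dict[str, Grid]]) -> Optional[str]:
    """Return the first candidate consistent with every demonstration pair, or None."""
    for name, fn in CANDIDATES.items():
        if all(fn(pair["input"]) == pair["output"] for pair in train_pairs):
            return name
    return None


def solve(task: Dict[str, List[Dict[str, Grid]]]) -> List[Optional[Grid]]:
    """Infer a rule from the train pairs, then apply it to each test input."""
    rule = infer_rule(task["train"])
    if rule is None:
        # No candidate explains all demonstrations: the hypothesis space is too small.
        return [None for _ in task["test"]]
    return [CANDIDATES[rule](item["input"]) for item in task["test"]]


if __name__ == "__main__":
    print(infer_rule(toy_task["train"]))  # flip_lr
    print(solve(toy_task))  # [[[8, 0, 7]]]
```

The point of the sketch is the shape of the problem: two demonstrations are enough to pin down the rule here only because the candidate set is tiny. ARC tasks are hard precisely because the space of plausible rules is open-ended, so a model has to construct the right abstraction rather than pick it from a short list.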
This 87.5% score suggests that Gemini 3 Deep Think is inferring and applying abstract rules rather than merely pattern matching, and it positions the model as capable of structured, AGI‑style problem solving. Beyond ARC‑AGI‑1, Google has publicly confirmed that Gemini 3 Deep Think scores 45.1% on ARC‑AGI‑2, a harder follow-up benchmark; Google reports that result as achieved with code execution. (Google Blog, VentureBeat)
If these results hold, they represent a meaningful step forward: Gemini 3 is not just a larger LLM but a reasoning-first model capable of abstract problem solving. High performance on ARC‑AGI‑1 points to efficient learning and generalization, core aspects of intelligence that many benchmarks don’t test, and is a clear sign that AI systems are beginning to handle tasks previously out of reach for conventional models.