Symbolica Agentica SDK Achieves 36% on ARC-AGI-3 Day 1 — 100x Better Than Frontier Models at 1/9th the Cost, Using Agentic Program Synthesis

On March 26, 2026, Symbolica AI published results showing its Agentica SDK achieved 36.08% on the ARC-AGI-3 benchmark on its first day of availability — a score roughly 100 times higher than frontier foundation models using chain-of-thought reasoning alone. The results were published on the Symbolica blog and the open-source code was released on GitHub.
PERFORMANCE COMPARISON:
- Symbolica Agentica SDK: 36.08% (113/182 playable levels, 7/25 games completed) for $1,005
- Claude Opus 4.6 Max (CoT): 0.25% for $8,900
- GPT-5.4 High (CoT): 0.3%
- Gemini 3.1 Pro: 0.37% (top frontier model score per ARC Prize leaderboard)
The cost-efficiency gap is staggering: Agentica achieved 36.08% for $1,005, while Opus 4.6 spent $8,900 to reach just 0.25%. That works out to roughly a 1,280x better cost-performance ratio.
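The ratio follows directly from the published numbers (score per dollar for each system); a quick check of the arithmetic:

```python
# Cost-performance arithmetic behind the ~1,280x figure, using the
# published numbers: 36.08% for $1,005 (Agentica) vs 0.25% for $8,900 (Opus).
agentica_score, agentica_cost = 36.08, 1005
opus_score, opus_cost = 0.25, 8900

# Score earned per dollar spent, for each system.
agentica_efficiency = agentica_score / agentica_cost
opus_efficiency = opus_score / opus_cost

ratio = agentica_efficiency / opus_efficiency
print(round(ratio))  # prints 1278
```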
HOW IT WORKS: ARC-AGI-3 is fundamentally different from previous benchmarks — it is an interactive game environment where the AI must explore, experiment, and iteratively solve puzzles. Standard chain-of-thought prompting fails because:
- Tasks require multi-step interaction with the environment
- Solutions cannot be derived from a single reasoning pass
- The AI must form hypotheses, test them, and refine — requiring genuine agentic behavior
Symbolica Agentica uses agentic program synthesis — the SDK generates candidate solution programs, tests them against the interactive environment, observes results, and iteratively refines its approach. This mirrors how human programmers solve novel problems: write code, run it, observe output, debug, repeat.
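The generate-test-refine loop can be sketched in miniature. Note this is a toy illustration, not the Agentica SDK's actual API: `ToyEnv` and `synthesize` are hypothetical names, and random search stands in for LLM-guided program generation.

```python
import random

class ToyEnv:
    """Toy stand-in for an interactive ARC-AGI-3-style environment.
    The hidden rule is: a program wins if it ever presses button 3."""
    def __init__(self, target=3):
        self.target = target

    def run(self, program):
        # A 'program' here is just a sequence of button presses (ints).
        return any(action == self.target for action in program)

def synthesize(env, budget=50, seed=0):
    """Generate candidate programs, test each against the environment,
    and return the first one that succeeds (None if the budget runs out)."""
    rng = random.Random(seed)
    for _ in range(budget):
        # Generate a candidate: 4 random button presses from buttons 0-4.
        candidate = [rng.randrange(5) for _ in range(4)]
        # Test it against the environment; keep the first winner.
        if env.run(candidate):
            return candidate
    return None

program = synthesize(ToyEnv())
print(program)  # a short action sequence containing the winning button 3
```

A real agentic synthesizer would replace the random generator with model-proposed programs and feed the environment's observations back into the next proposal, but the loop structure (propose, execute, observe, retry) is the same.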
GAME-BY-GAME RESULTS: The SDK performed differently across game types, revealing where agentic reasoning excels:
- CN04: 97.6% (nearly perfect, won)
- LP85: 84.16% (won)
- AR25: 83.28% (won)
- FT09: 77.59% (won)
- Lower performers: BP35 at 0.22%, SP80 at 0.73%
The variance shows the approach works spectacularly on some reasoning domains while struggling on others — suggesting that different agentic strategies may be needed for different problem types.
WHY THIS MATTERS: ARC-AGI-3 launched March 25 with a $2M prize for any AI that matches untrained human performance. The ARC Prize Foundation calls it the only unsaturated general agentic intelligence benchmark as of March 2026. The fact that Symbolica dramatically outperformed frontier models with an agentic approach — while those models scored under 1% — suggests that the next leap in AI capability may come from better agent architectures rather than larger models.
The result reached the Hacker News front page and supports the thesis that agentic frameworks (in which AI systems interact iteratively with their environments) outperform pure reasoning approaches on tasks requiring exploration and adaptation.