NVIDIA Unveils Groq 3 LPX at GTC 2026 — First Dedicated Inference Chip Targets 1,500 Tokens/Sec for Agent-to-Agent Communication, Pairs with Vera Rubin NVL72

At GTC 2026 in San Jose, NVIDIA unveiled the Groq 3 LPU (Language Processing Unit), the first tangible hardware result of its $20 billion acquisition of Groq, completed just three months ago. It is NVIDIA's first dedicated inference chip, complementing the company's GPU training dominance with silicon purpose-built for running AI models.
GROQ 3 LPU ARCHITECTURE:
The Groq 3 carries forward Groq's distinctive architecture: a software-defined assembly line that moves data directly between on-chip memory modules, avoiding the off-chip memory round trips that GPUs incur. Key specifications:
- 40 petabytes per second of on-chip memory bandwidth
- Ships in dedicated Groq 3 LPX server racks
- Each rack contains 256 LPUs with 128 gigabytes of static random-access memory (SRAM)
- Designed specifically for inference, not training
The LPU architecture sidesteps memory bandwidth bottlenecks inherent to GPUs by keeping data on-chip, enabling inference speeds that GPU architectures fundamentally cannot match.
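The bandwidth argument can be made concrete with a back-of-envelope roofline estimate: at batch size 1, every generated token must stream the full weight set through the compute units, so the token rate is capped by memory bandwidth divided by model size. A rough sketch, where the 70 GB model size, the 8 TB/s HBM figure, and treating the announced 40 PB/s rack bandwidth as usable streaming bandwidth are all illustrative assumptions:

```python
def max_tokens_per_sec(bandwidth_bytes_per_s: float, model_bytes: float) -> float:
    """Bandwidth-bound ceiling: each generated token must stream
    every model weight through the compute units once (batch size 1)."""
    return bandwidth_bytes_per_s / model_bytes

MODEL_BYTES = 70e9  # hypothetical 70B-parameter model at 8-bit weights

hbm_gpu = max_tokens_per_sec(8e12, MODEL_BYTES)    # assumed ~8 TB/s HBM on a high-end GPU
lpu_rack = max_tokens_per_sec(40e15, MODEL_BYTES)  # 40 PB/s on-chip figure from the announcement

print(f"GPU HBM ceiling:     {hbm_gpu:,.0f} tokens/sec")
print(f"LPU on-chip ceiling: {lpu_rack:,.0f} tokens/sec")
```

Real systems fall well below these ceilings for many reasons (scheduling, interconnect, batching strategy), but the orders-of-magnitude gap illustrates why on-chip memory changes the inference picture.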
AGENTIC AI TARGET:
NVIDIA VP of Hyperscale Ian Buck framed the Groq 3's purpose clearly: agent-to-agent communication demands throughput of up to 1,500 tokens per second. While 100 tokens/sec feels fast enough for a human reader, it is far too slow for autonomous AI agents that communicate with each other continuously in multi-agent workflows.
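The gap Buck describes compounds in serial agent chains, where each agent must wait for the previous agent's complete output before starting. A minimal sketch, with a hypothetical 20-step chain and 500 tokens per step chosen purely for illustration:

```python
def chain_latency_s(turns: int, tokens_per_turn: int, tokens_per_sec: float) -> float:
    """End-to-end latency of a strictly serial multi-agent pipeline:
    each agent consumes the previous agent's full output before acting."""
    return turns * tokens_per_turn / tokens_per_sec

TURNS, TOKENS = 20, 500  # hypothetical agent chain: 20 handoffs, 500 tokens each

print(f"{chain_latency_s(TURNS, TOKENS, 100):.1f} s")   # human-reading speed -> 100.0 s
print(f"{chain_latency_s(TURNS, TOKENS, 1500):.1f} s")  # Groq 3 target speed  -> 6.7 s
```

A workflow that takes over a minute and a half at human-reading speed finishes in under seven seconds at the 1,500 tokens/sec target, which is the difference between an interactive agent system and a batch job.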
The Groq 3 LPX is designed as a co-processor to NVIDIA's Vera Rubin NVL72 rack (combining Rubin GPUs with Vera CPUs). Together, the two systems handle:
- Trillion-parameter models
- Million-token context windows
- 35x higher throughput per megawatt of power
- 10x greater revenue opportunity for data center operators
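The per-megawatt and revenue claims connect through simple arithmetic: for a power-limited operator, tokens served (and thus revenue) scale with tokens per megawatt. At constant per-token pricing, 35x throughput would imply 35x revenue, so the stated 10x figure presumably bakes in lower per-token pricing or other constraints. A sketch where the baseline throughput and the $2 per million tokens price are both hypothetical:

```python
def hourly_revenue_per_mw(tokens_per_sec_per_mw: float,
                          usd_per_million_tokens: float) -> float:
    """Revenue a power-limited operator earns per megawatt of capacity per hour."""
    return tokens_per_sec_per_mw * 3600 * usd_per_million_tokens / 1e6

BASELINE = 50_000  # hypothetical GPU-only tokens/sec per megawatt
PRICE = 2.0        # hypothetical $2 per million output tokens

gpu = hourly_revenue_per_mw(BASELINE, PRICE)
lpx = hourly_revenue_per_mw(BASELINE * 35, PRICE)  # 35x per-MW throughput claim
print(f"${gpu:,.0f}/MW-hour vs ${lpx:,.0f}/MW-hour")
```

The absolute dollar figures are meaningless without real pricing, but the structure shows why throughput per megawatt, not raw throughput, is the metric operators care about.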
FIVE NEW RACK SYSTEMS:
Groq 3 LPX is one of five new server rack systems announced at GTC 2026:
- Groq 3 LPX — dedicated inference rack
- Vera Rubin NVL72 — GPU+CPU combined rack
- Vera CPU rack — standalone CPU system (256 liquid-cooled Vera chips)
- Bluefield-4 STX — storage rack
- Spectrum-6 SPX — networking rack
The complete Vera Rubin platform encompasses seven chips and five rack-scale systems.
STRATEGIC SIGNIFICANCE:
This is NVIDIA's first acknowledgment that GPUs alone are insufficient for the agentic AI era. By acquiring Groq's inference-optimized technology, NVIDIA is positioning to own both sides of the AI compute market: training (GPUs) and inference (LPUs). Jensen Huang emphasized this during his keynote, noting that as agentic AI proliferates, inference will outgrow training in importance.
The speed of delivery — just three months from acquisition to shipping hardware — suggests Groq's technology was production-ready and NVIDIA primarily needed to integrate it into its platform ecosystem rather than develop it from scratch.
COMPETITIVE LANDSCAPE:
The only comparable approach is Cerebras, which signed a deal with AWS during the same week. However, Cerebras focuses on wafer-scale compute for both training and inference, while Groq's LPU is purpose-built exclusively for inference workloads.