NVIDIA Unveils Groq 3 LPX at GTC 2026 — First Dedicated Inference Chip Targets 1,500 Tokens/Sec for Agent-to-Agent Communication, Pairs with Vera Rubin NVL72

At GTC 2026 in San Jose, NVIDIA unveiled the Groq 3 LPU (Language Processing Unit), the first tangible hardware result of its $20 billion acquisition of Groq, completed just three months ago. It is NVIDIA's first dedicated inference chip, complementing the company's GPU training dominance with silicon purpose-built for running AI models.
GROQ 3 LPU ARCHITECTURE:
The Groq 3 carries forward Groq's distinctive architecture: a software-defined assembly line that moves data directly between on-chip memory modules, avoiding the off-chip memory round trips that GPUs incur. Key specifications:
- 40 petabytes per second of on-chip memory bandwidth
- Ships in dedicated Groq 3 LPX server racks
- Each rack contains 256 LPUs with 128 gigabytes of static random-access memory (SRAM)
- Designed specifically for inference, not training
The LPU architecture sidesteps memory bandwidth bottlenecks inherent to GPUs by keeping data on-chip, enabling inference speeds that GPU architectures fundamentally cannot match.
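The bandwidth argument can be made concrete with a back-of-envelope roofline estimate: at batch size 1, every generated token must stream the full weight set through the compute units, so the token rate is capped by memory bandwidth divided by model size. A rough sketch, where the 70 GB model size, the 8 TB/s HBM figure, and treating the announced 40 PB/s rack bandwidth as usable streaming bandwidth are all illustrative assumptions:

```python
def max_tokens_per_sec(bandwidth_bytes_per_s: float, model_bytes: float) -> float:
    """Bandwidth-bound ceiling: each generated token must stream
    every model weight through the compute units once (batch size 1)."""
    return bandwidth_bytes_per_s / model_bytes

MODEL_BYTES = 70e9  # hypothetical 70B-parameter model at 8-bit weights

hbm_gpu = max_tokens_per_sec(8e12, MODEL_BYTES)    # assumed ~8 TB/s HBM on a high-end GPU
lpu_rack = max_tokens_per_sec(40e15, MODEL_BYTES)  # 40 PB/s on-chip figure from the announcement

print(f"GPU HBM ceiling:     {hbm_gpu:,.0f} tokens/sec")
print(f"LPU on-chip ceiling: {lpu_rack:,.0f} tokens/sec")
```

Real systems fall well below these ceilings for many reasons (scheduling, interconnect, batching strategy), but the orders-of-magnitude gap illustrates why on-chip memory changes the inference picture.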
AGENTIC AI TARGET:
NVIDIA VP of Hyperscale Ian Buck framed the Groq 3's purpose clearly: agent-to-agent communication demands throughput of up to 1,500 tokens per second. While 100 tokens/sec feels fast enough for a human reader, it is far too slow for autonomous AI agents that communicate with each other continuously in multi-agent workflows.
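The gap Buck describes compounds in serial agent chains, where each agent must wait for the previous agent's complete output before starting. A minimal sketch, with a hypothetical 20-step chain and 500 tokens per step chosen purely for illustration:

```python
def chain_latency_s(turns: int, tokens_per_turn: int, tokens_per_sec: float) -> float:
    """End-to-end latency of a strictly serial multi-agent pipeline:
    each agent consumes the previous agent's full output before acting."""
    return turns * tokens_per_turn / tokens_per_sec

TURNS, TOKENS = 20, 500  # hypothetical agent chain: 20 handoffs, 500 tokens each

print(f"{chain_latency_s(TURNS, TOKENS, 100):.1f} s")   # human-reading speed -> 100.0 s
print(f"{chain_latency_s(TURNS, TOKENS, 1500):.1f} s")  # Groq 3 target speed  -> 6.7 s
```

A workflow that takes over a minute and a half at human-reading speed finishes in under seven seconds at the 1,500 tokens/sec target, which is the difference between an interactive agent system and a batch job.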
The Groq 3 LPX is designed as a co-processor to NVIDIA's Vera Rubin NVL72 rack (combining Rubin GPUs with Vera CPUs). Together, the two systems handle:
- Trillion-parameter models
- Million-token context windows
- 35x higher throughput per megawatt of power
- 10x greater revenue opportunity for data center operators
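The per-megawatt and revenue claims connect through simple arithmetic: for a power-limited operator, tokens served (and thus revenue) scale with tokens per megawatt. At constant per-token pricing, 35x throughput would imply 35x revenue, so the stated 10x figure presumably bakes in lower per-token pricing or other constraints. A sketch where the baseline throughput and the $2 per million tokens price are both hypothetical:

```python
def hourly_revenue_per_mw(tokens_per_sec_per_mw: float,
                          usd_per_million_tokens: float) -> float:
    """Revenue a power-limited operator earns per megawatt of capacity per hour."""
    return tokens_per_sec_per_mw * 3600 * usd_per_million_tokens / 1e6

BASELINE = 50_000  # hypothetical GPU-only tokens/sec per megawatt
PRICE = 2.0        # hypothetical $2 per million output tokens

gpu = hourly_revenue_per_mw(BASELINE, PRICE)
lpx = hourly_revenue_per_mw(BASELINE * 35, PRICE)  # 35x per-MW throughput claim
print(f"${gpu:,.0f}/MW-hour vs ${lpx:,.0f}/MW-hour")
```

The absolute dollar figures are meaningless without real pricing, but the structure shows why throughput per megawatt, not raw throughput, is the metric operators care about.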
FIVE NEW RACK SYSTEMS:
Groq 3 LPX is one of five new server rack systems announced at GTC 2026:
- Groq 3 LPX — dedicated inference rack
- Vera Rubin NVL72 — GPU+CPU combined rack
- Vera CPU rack — standalone CPU system (256 liquid-cooled Vera chips)
- Bluefield-4 STX — storage rack
- Spectrum-6 SPX — networking rack
The complete Vera Rubin platform encompasses seven chips and five rack-scale systems.
STRATEGIC SIGNIFICANCE:
This is NVIDIA's first acknowledgment that GPUs alone are insufficient for the agentic AI era. By acquiring Groq's inference-optimized technology, NVIDIA is positioning to own both sides of the AI compute market: training (GPUs) and inference (LPUs). Jensen Huang emphasized this during his keynote, noting that as agentic AI proliferates, inference will outgrow training in importance.
The speed of delivery — just three months from acquisition to shipping hardware — suggests Groq's technology was production-ready and NVIDIA primarily needed to integrate it into its platform ecosystem rather than develop it from scratch.
COMPETITIVE LANDSCAPE:
The only comparable approach is Cerebras, which signed a deal with AWS during the same week. However, Cerebras focuses on wafer-scale compute for both training and inference, while Groq's LPU is purpose-built exclusively for inference workloads.