Ollama 0.19 Preview: MLX-Powered Apple Silicon Acceleration Specifically Optimized for AI Coding Agents and OpenClaw

Ollama announced the v0.19 preview on March 31, 2026: a fundamental architecture shift from llama.cpp to Apple's MLX framework on Apple Silicon, optimized specifically for AI coding agents and personal-assistant workloads.
Key Performance Gains (Qwen3.5-35B-A3B with NVFP4 quantization on M5 chips):
- 1851 tokens/s prefill speed, a large improvement over the previous Q4_K_M implementation
- 134 tokens/s decode speed
- Leverages the GPU Neural Accelerators in the M5/M5 Pro/M5 Max to improve both time-to-first-token and generation speed
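At the quoted speeds, per-turn latency is easy to estimate. A minimal back-of-envelope sketch, assuming throughput stays constant across prompt lengths (real prefill and decode curves only approximate this):

```python
# Back-of-envelope latency from the quoted M5 throughput numbers.
# Assumes constant throughput regardless of prompt length, which is
# an approximation of real prefill/decode behavior.

PREFILL_TOK_S = 1851  # prompt processing speed (tokens/s)
DECODE_TOK_S = 134    # generation speed (tokens/s)

def turn_latency(prompt_tokens: int, output_tokens: int) -> float:
    """Seconds for one request: time-to-first-token + generation time."""
    ttft = prompt_tokens / PREFILL_TOK_S
    generation = output_tokens / DECODE_TOK_S
    return ttft + generation

# A typical agent turn: a large tool-laden prompt, a short tool-call reply.
print(f"{turn_latency(8000, 200):.1f}s")  # ~4.3s prefill + ~1.5s decode
```

For agent workloads, where prompts are long and replies are short, prefill speed dominates, which is why the cache improvements below matter so much.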
Agent-Specific Optimizations: The release explicitly targets agentic coding use cases with three cache improvements:
- Cross-conversation cache reuse: lower memory use and more cache hits when conversations branch from a shared system prompt, which is critical for tool-calling agents like Claude Code that resend the same system prompt on every turn
- Intelligent checkpoints: Cache snapshots at strategic prompt locations, reducing prompt processing and enabling faster responses during multi-turn agent interactions
- Smarter eviction: Shared prefixes survive longer even when older branches are dropped
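The reuse and eviction behaviors above can be illustrated with a toy token-prefix tree. This is a sketch of the general prefix-caching idea, not Ollama's actual implementation; the names `PrefixCache` and `evict_private_branches` are invented for illustration:

```python
# Illustrative sketch (not Ollama's code) of prefix-cache reuse and
# prefix-aware eviction: tokens shared by several conversations form a
# shared prefix in the tree, and eviction drops single-use branches
# while letting shared prefixes survive.

class PrefixCache:
    def __init__(self):
        self.root = {"children": {}, "refs": 0}

    def insert(self, tokens):
        """Store one conversation; return how many tokens were cache hits."""
        node, hits = self.root, 0
        for tok in tokens:
            if tok in node["children"]:
                hits += 1  # this prefix token was already cached
            else:
                node["children"][tok] = {"children": {}, "refs": 0}
            node = node["children"][tok]
            node["refs"] += 1  # count conversations using this prefix
        return hits

    def evict_private_branches(self):
        """Drop subtrees used by only one conversation; shared prefixes stay."""
        def prune(node):
            node["children"] = {
                t: prune(c) for t, c in node["children"].items() if c["refs"] > 1
            }
            return node
        prune(self.root)

cache = PrefixCache()
system = ["<sys>", "You", "are", "a", "coding", "agent"]
cache.insert(system + ["fix", "bug"])
# A second conversation branching off the same system prompt reuses it:
print(cache.insert(system + ["write", "tests"]))  # prints 6
```

Checkpointing, the second improvement, amounts to snapshotting KV state at interior nodes of such a tree so a new branch can resume from the deepest shared ancestor instead of reprocessing the whole prompt.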
NVFP4 Production Parity: Ollama now quantizes with NVIDIA's NVFP4 format, the same format inference providers use in production. Local development with Ollama therefore produces the same quality results as cloud deployments, which matters for agent developers testing locally before deploying.
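The rough shape of NVFP4-style quantization can be sketched as blockwise rounding onto the FP4 (E2M1) grid with a shared per-block scale. Real NVFP4 stores that scale in FP8 (E4M3) alongside a second per-tensor scale; this toy version omits both and keeps everything in Python floats:

```python
# Hedged sketch of NVFP4-style block quantization: each block of up to
# 16 values shares one scale, and each value is rounded to the nearest
# magnitude representable in FP4 (E2M1). Real NVFP4 additionally stores
# the block scale in FP8 (E4M3) plus a per-tensor scale, omitted here.

E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # positive FP4 magnitudes

def quantize_block(block):
    """Return (scale, codes) for one block of floats."""
    scale = max(abs(x) for x in block) / 6.0 or 1.0  # map max |x| to 6.0
    codes = []
    for x in block:
        mag = min(E2M1, key=lambda g: abs(abs(x) / scale - g))
        codes.append(-mag if x < 0 else mag)
    return scale, codes

def dequantize_block(scale, codes):
    return [c * scale for c in codes]

scale, codes = quantize_block([0.12, -0.6, 0.31, 0.05])
print([round(v, 3) for v in dequantize_block(scale, codes)])
# prints [0.1, -0.6, 0.3, 0.05]
```

The appeal of format parity is that this rounding behavior, and hence the model's numerical error profile, matches what the cloud deployment sees, rather than differing as it would between Q4_K_M locally and FP4 in production.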
Explicit Agent Integrations: The blog post specifically highlights integration with:
- OpenClaw personal assistants
- Claude Code coding agent
- OpenCode
- Codex
Each can be started with a command-line launcher, e.g. 'ollama launch claude --model qwen3.5:35b-a3b-coding-nvfp4' or 'ollama launch openclaw --model qwen3.5:35b-a3b-coding-nvfp4'.
The release requires a Mac with 32GB+ unified memory. Within 4 hours of posting it reached #2 on Hacker News with 187 points and 77 comments.