πŸ—οΈ AI Infrastructure

Ollama 0.19 Preview: MLX-Powered Apple Silicon Acceleration Specifically Optimized for AI Coding Agents and OpenClaw


Ollama announced its v0.19 preview on March 31, 2026: a fundamental architecture shift from llama.cpp to Apple's MLX framework on Apple Silicon, optimized specifically for AI coding agents and personal assistant workloads.

Key Performance Gains, measured with Qwen3.5-35B-A3B under NVFP4 quantization on M5-series chips:

  • 1851 tokens/s prefill speed (a large jump over the previous Q4_K_M implementation)
  • 134 tokens/s decode speed
  • Leverages the GPU Neural Accelerators in M5/M5 Pro/M5 Max chips, improving both time-to-first-token and generation speed
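To put those throughput numbers in context, a quick back-of-envelope latency calculation (the prompt and reply sizes below are assumed for illustration, not figures from the announcement):

```python
# Rough latency math from the quoted throughput numbers.
PREFILL_TOKS_PER_S = 1851   # prompt processing on M5, per the release
DECODE_TOKS_PER_S = 134     # generation speed on M5, per the release

prompt_tokens = 8_000       # assumed: a large agent system prompt + context
reply_tokens = 500          # assumed: a tool call plus explanation

time_to_first_token = prompt_tokens / PREFILL_TOKS_PER_S   # ~4.3 s
generation_time = reply_tokens / DECODE_TOKS_PER_S         # ~3.7 s
total = time_to_first_token + generation_time              # ~8.1 s
```

At these speeds an agent turn with a large context is dominated roughly equally by prefill and decode, which is why the cache improvements below matter.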

Agent-Specific Optimizations: The release explicitly targets agentic coding use cases with three cache improvements:

  1. Cross-conversation cache reuse: Lower memory use and more cache hits when branching conversations share a system prompt, critical for tool-calling agents like Claude Code that send the same system prompt repeatedly
  2. Intelligent checkpoints: Cache snapshots at strategic prompt locations, reducing prompt processing and enabling faster responses during multi-turn agent interactions
  3. Smarter eviction: Shared prefixes survive longer even when older branches are dropped
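The first two improvements can be illustrated with a toy sketch (this is not Ollama's implementation; character counts stand in for tokens, and the checkpoint handle is a placeholder):

```python
# Minimal sketch of prefix-cache reuse with checkpointing: conversations
# that share a checkpointed system prompt reuse the cached prefix instead
# of reprocessing it on every branch.

class PrefixCache:
    def __init__(self):
        self.checkpoints = {}      # prompt prefix -> simulated KV-cache handle
        self.processed_tokens = 0  # counts tokens actually (re)processed

    def checkpoint(self, prefix: str):
        """Snapshot the cache at a strategic prompt location."""
        self.checkpoints[prefix] = f"kv-state:{hash(prefix)}"

    def process(self, prompt: str) -> str:
        # Find the longest checkpointed prefix of this prompt.
        best = ""
        for p in self.checkpoints:
            if prompt.startswith(p) and len(p) > len(best):
                best = p
        # Only the suffix beyond the checkpoint needs prefill work.
        self.processed_tokens += len(prompt) - len(best)
        return prompt[len(best):]

cache = PrefixCache()
system = "You are a coding agent. Tools: read_file, write_file. "
cache.checkpoint(system)

# Two branches of the same conversation share the system prompt:
cache.process(system + "Fix the bug in utils.py")
cache.process(system + "Add a test for parser.py")
```

After both calls, only the two short suffixes were processed; the shared system prompt was prefilled zero times, which is the effect the release claims for tool-calling agents.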

NVFP4 Production Parity: Ollama now uses NVIDIA's NVFP4 format for quantization, matching what inference providers use in production. Local development with Ollama therefore produces the same quality of results as cloud deployments, which matters for agent developers who test locally before deploying.
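For intuition, here is a simplified sketch of 4-bit block quantization in the spirit of NVFP4, which encodes values on the FP4 (E2M1) grid with a per-block scale. This is illustrative only: real NVFP4 uses FP8 scales over 16-element blocks, whereas this sketch uses a plain float scale and an 8-element block.

```python
# Positive values representable by a 4-bit E2M1 float (sign handled separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(block):
    """Scale the block so its max magnitude maps to 6.0 (the E2M1 max),
    then snap each value to the nearest representable FP4 point."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / 6.0
    q = []
    for x in block:
        mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        q.append(mag if x >= 0 else -mag)
    return q, scale

def dequantize_block(q, scale):
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 1.8, -2.4, 0.05, 0.9, -1.1]
q, s = quantize_block(weights)
restored = dequantize_block(q, s)
```

The point of production parity is that if the same grid and scaling rules run locally and in the cloud, the rounding errors — and therefore model behavior — match.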

Explicit Agent Integrations: The blog post specifically highlights integrations with:

  • OpenClaw personal assistants
  • Claude Code coding agent
  • OpenCode
  • Codex

These ship with command-line launchers such as 'ollama launch claude --model qwen3.5:35b-a3b-coding-nvfp4' and 'ollama launch openclaw --model qwen3.5:35b-a3b-coding-nvfp4'.

The preview requires a Mac with 32 GB+ unified memory. The release hit #2 on Hacker News with 187 points and 77 comments within four hours.
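A rough sanity check on that memory requirement (illustrative arithmetic only; actual usage depends on context length, KV cache, block scales, and runtime overhead):

```python
# Why 32 GB: the 4-bit weights alone are close to 18 GB, and the KV cache,
# activations, and OS all share the same unified memory pool.
params = 35e9            # Qwen3.5-35B-A3B total parameter count
bits_per_weight = 4      # NVFP4 stores weights in 4 bits (scales add a bit more)
weights_gb = params * bits_per_weight / 8 / 1e9   # bits -> bytes -> GB
# weights_gb ≈ 17.5 GB
```

That leaves roughly 14 GB of a 32 GB machine for the cache checkpoints described above plus everything else, which is tight but workable.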
