The most underutilized feature of a well-configured OpenClaw setup is local model routing. Every heartbeat log entry, draft summary, simple Q&A, and routing decision your agent makes can run on your own hardware at $0/request — freeing your API quota for the complex tasks that actually need frontier-model reasoning.

The local model landscape in 2026 has changed significantly from even six months ago. This guide reflects what's actually working now — updated for the models available on Ollama as of late April 2026, including the just-launched DeepSeek V4-Flash.

The Case for Local Models in OpenClaw

Before the recommendations: why bother? Three reasons that compound:

The tradeoff: quality ceiling. Local models in the 7–13B range are meaningfully weaker than frontier models on complex multi-step reasoning. The strategy is routing — not replacing frontier models, but handling the 60–70% of tasks that don't need them.

Hardware Baseline: What Can Run What

Hardware RAM Max Model Size Notes
MacBook Air M2/M3, Mac Mini M4 16GB ~13B active (MoE: 27–30B total) Unified memory = excellent for MoE models
NucBoxM5Ultra, Intel NUC (16GB RAM) 16GB ~9–13B (CPU-only) Slower than Apple Silicon; 9B is the sweet spot
Desktop with 24GB VRAM (RTX 4090) 24GB VRAM ~30B dense / larger MoE Full GPU inference; fastest local option
Mac Studio M3 Ultra (192GB) 192GB 70B+ comfortably Can run 70B Llama at full speed

The Recommendations

glm-4.7-flash
Best Overall Default Fast

The current community consensus for best general-purpose local model for OpenClaw. Strong instruction-following, good tool-call reliability, fast inference. Works well for heartbeat tasks, daily briefings, drafting, summarization, and simple Q&A. Fits comfortably in 16GB RAM.

ollama pull glm-4.7-flash
OpenClaw config: ollama/glm-4.7-flash
qwen3-coder:30b
Best for Coding Needs 24GB+ RAM or Apple Silicon

Top local model for code generation, debugging, and technical reasoning in 2026. Qwen's coder variant at 30B outperforms most 70B general-purpose models on coding tasks. If you're using OpenClaw for ACP agent sessions (Codex, Claude Code alternatives), running qwen3-coder:30b locally for the scaffolding and review steps cuts costs significantly.

ollama pull qwen3-coder:30b
OpenClaw config: ollama/qwen3-coder:30b
deepseek-v4-flash
Best for Agent Tasks New — Apr 24, 2026

Just released today (April 24). DeepSeek's V4-Flash is a 284B total / 13B active parameter MoE model, explicitly fine-tuned for OpenClaw-style agentic workflows. DeepSeek's own claim: "performs on par with V4-Pro on simple Agent tasks." For tool calling, multi-turn instruction following, and heartbeat-style autonomous tasks, this is now the strongest local option available. The MoE architecture means it runs efficiently despite the large total parameter count.

ollama pull deepseek-v4-flash
OpenClaw config: ollama/deepseek-v4-flash · Needs 16GB+ RAM
qwen3.5:27b
Strong Generalist Apple Silicon or 24GB VRAM recommended

The stronger generalist option when you have the hardware to run it. Better reasoning depth than the 9B variants, handles longer context better, and produces more coherent multi-step analysis. Good for research summaries, document analysis, and any task that needs more nuanced judgment without frontier-model cost.

ollama pull qwen3.5:27b
OpenClaw config: ollama/qwen3.5:27b
qwen3.5:9b
Best Budget / Low-RAM Runs on 8GB RAM

The right call for budget hardware or machines with 8–12GB RAM. Fast, small, reliable for routine tasks. Not competitive with larger models on complex reasoning, but perfectly adequate for heartbeat logs, simple scheduling tasks, message routing, and draft generation. If you're running OpenClaw on a Raspberry Pi 5 or an older laptop, this is your model.

ollama pull qwen3.5:9b
OpenClaw config: ollama/qwen3.5:9b

The Recommended Tiered Config

Don't pick one model — route by task type. This is the configuration pattern that balances cost and quality effectively:

// openclaw.json — tiered model routing
"model": "claude/claude-sonnet-4-6", // complex reasoning, default

"heartbeat": {
  "model": "ollama/glm-4.7-flash" // routine background tasks: free
},

"skills": {
  "defaultModel": "ollama/deepseek-v4-flash" // agent tool tasks: optimized + free
}

// Result: ~60-70% of requests hit local models at $0/request

Setting Up Ollama with OpenClaw

If you haven't set up Ollama yet:

  1. Install Ollama: curl -fsSL https://ollama.com/install.sh | sh (Linux/Mac)
  2. Pull your chosen model: ollama pull glm-4.7-flash
  3. Verify it's running: ollama list
  4. Configure in OpenClaw: set "model": "ollama/glm-4.7-flash" for the tasks you want local
  5. Test: send a message to your OpenClaw instance and verify the model used in the response metadata

Ollama runs as a local server on port 11434. It starts automatically on boot by default. OpenClaw's gateway connects to it via localhost:11434 — no configuration needed if both are on the same machine. If you run OpenClaw on a different machine than Ollama, configure the base URL in your model provider settings.

What Local Models Can't Do Well (Yet)

Be realistic about the limits. For OpenClaw use cases, local models at 9–13B active parameters struggle with:

The strategy: local for volume, frontier for complexity. Route the 70% of requests that don't need deep reasoning to local, reserve your API quota for the 30% that do.

Note on DeepSeek V4-Flash: The 13B active parameter count via MoE makes it significantly more capable than a 13B dense model of the same effective size. On agent-specific tasks (tool calling, instruction following, multi-turn), it punches above its weight. Worth testing directly against glm-4.7-flash for your specific workload before committing to either as primary.

Get Local Model Routing Configured Correctly

ClawReady sets up Ollama + local model routing as part of every setup — right model for each task type, proper OpenClaw integration, and fallback configuration. You get $0 API costs for routine tasks from day one.

See Setup Packages →