The most underutilized feature of a well-configured OpenClaw setup is local model routing. Every heartbeat log entry, draft summary, simple Q&A, and routing decision your agent makes can run on your own hardware at $0/request — freeing your API quota for the complex tasks that actually need frontier-model reasoning.
The local model landscape in 2026 has changed significantly from even six months ago. This guide reflects what's actually working now — updated for the models available on Ollama as of late April 2026, including the just-launched DeepSeek V4-Flash.
The Case for Local Models in OpenClaw
Before the recommendations: why bother? Three reasons that compound:
- Cost: API costs for high-volume agentic use add up fast. Heartbeat tasks firing every hour, research summaries, draft generation — these are cheap per-call but expensive at scale. Local models cost electricity.
- Resilience: If your cloud provider restricts access (as Anthropic did for some OpenClaw users recently), local models keep your agent running. No API key, no rate limit, no dependence on a provider's uptime.
- Latency: For simple tasks on fast hardware, a local 9B model responds in 1–3 seconds. No network round-trip, no rate limiting, no queuing.
The tradeoff: quality ceiling. Local models in the 7–13B range are meaningfully weaker than frontier models on complex multi-step reasoning. The strategy is routing — not replacing frontier models, but handling the 60–70% of tasks that don't need them.
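To make the routing share concrete, here is a back-of-envelope sketch of how it translates into API spend. The request volume and per-request price are made-up placeholders for illustration, not measured figures, and `monthly_api_cost` is a hypothetical helper.

```python
def monthly_api_cost(requests_per_day: int,
                     cost_per_request: float,
                     local_share: float,
                     days: int = 30) -> float:
    """API spend when `local_share` of requests run locally at $0/request."""
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    cloud_requests = requests_per_day * days * (1.0 - local_share)
    return cloud_requests * cost_per_request


# Hypothetical workload: 500 requests/day at an assumed $0.01 per request.
baseline = monthly_api_cost(500, 0.01, local_share=0.0)
routed = monthly_api_cost(500, 0.01, local_share=0.65)
print(f"all-cloud: ${baseline:.2f}/month, routed: ${routed:.2f}/month")
```

Plug in your own volume and blended per-request cost; the point is only that spend scales with the share of requests that still go to the cloud.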
Hardware Baseline: What Can Run What
| Hardware | RAM | Max Model Size | Notes |
|---|---|---|---|
| MacBook Air M2/M3, Mac Mini M4 | 16GB | ~13B active (MoE: 27–30B total) | Unified memory = excellent for MoE models |
| NucBoxM5Ultra, Intel NUC (16GB RAM) | 16GB | ~9–13B (CPU-only) | Slower than Apple Silicon; 9B is the sweet spot |
| Desktop with 24GB VRAM (RTX 4090) | 24GB VRAM | ~30B dense / larger MoE | Full GPU inference; fastest local option |
| Mac Studio M2 Ultra (192GB) | 192GB | 70B+ comfortably | Runs a 70B Llama model entirely in unified memory |
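A rough way to sanity-check a row in this table is to estimate the size of the weights from parameter count and quantization width. The sketch below covers weights only. The 25% headroom reserved for the KV cache, the runtime, and the OS is an assumption, and both function names are hypothetical. For MoE models, pass the total parameter count, since every expert has to be stored even though only some are active per token.

```python
def estimated_weights_gb(total_params_billions: float,
                         bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    return total_params_billions * bits_per_weight / 8.0


def fits(total_params_billions: float,
         bits_per_weight: float,
         memory_gb: float,
         headroom_fraction: float = 0.25) -> bool:
    """True if the weights fit while leaving `headroom_fraction` of memory free.

    The headroom (KV cache, runtime, OS) is an assumed figure, not a measurement.
    """
    usable_gb = memory_gb * (1.0 - headroom_fraction)
    return estimated_weights_gb(total_params_billions, bits_per_weight) <= usable_gb


for params, memory in [(9, 16), (30, 16), (30, 24), (70, 192)]:
    size = estimated_weights_gb(params, 4)
    print(f"{params}B @ 4-bit ~ {size:.1f} GB, fits in {memory} GB: {fits(params, 4, memory)}")
```

Real memory use depends on the quantization format and context length, so treat the result as a first filter rather than a guarantee.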
The Recommendations
The current community consensus for best general-purpose local model for OpenClaw. Strong instruction-following, good tool-call reliability, fast inference. Works well for heartbeat tasks, daily briefings, drafting, summarization, and simple Q&A. Fits comfortably in 16GB RAM.
Top local model for code generation, debugging, and technical reasoning in 2026. Qwen's coder variant at 30B outperforms most 70B general-purpose models on coding tasks. If you're using OpenClaw for ACP agent sessions (Codex, Claude Code alternatives), running qwen3-coder:30b locally for the scaffolding and review steps cuts costs significantly.
Just released today (April 24). DeepSeek's V4-Flash is a 284B total / 13B active parameter MoE model, explicitly fine-tuned for OpenClaw-style agentic workflows. DeepSeek's own claim: "performs on par with V4-Pro on simple Agent tasks." For tool calling, multi-turn instruction following, and heartbeat-style autonomous tasks, this is now the strongest local option available. Because it is an MoE, only 13B parameters are active per token, so generation speed is closer to that of a 13B model than a 284B one. The full set of weights still has to be held in memory or offloaded, so check the download size against your hardware before pulling.
The stronger generalist option when you have the hardware to run it. Better reasoning depth than the 9B variants, handles longer context better, and produces more coherent multi-step analysis. Good for research summaries, document analysis, and any task that needs more nuanced judgment without frontier-model cost.
The right call for budget hardware or machines with 8–12GB RAM. Fast, small, reliable for routine tasks. Not competitive with larger models on complex reasoning, but perfectly adequate for heartbeat logs, simple scheduling tasks, message routing, and draft generation. If you're running OpenClaw on a Raspberry Pi 5 or an older laptop, this is your model.
The Recommended Tiered Config
Don't pick one model; route by task type. This is the configuration pattern that balances cost and quality effectively:

    {
      "model": "claude/claude-sonnet-4-6", // complex reasoning, default
      "heartbeat": {
        "model": "ollama/glm-4.7-flash" // routine background tasks: free
      },
      "skills": {
        "defaultModel": "ollama/deepseek-v4-flash" // agent tool tasks: optimized + free
      }
    }
    // Result: ~60-70% of requests hit local models at $0/request
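The idea behind this config can be sketched as a simple lookup from task type to model, with the frontier model as the fallback. This is an illustration of the pattern only, not OpenClaw's internal routing logic; `ROUTES`, `DEFAULT_MODEL`, and `pick_model` are hypothetical names, and the model identifiers are copied from the config above.

```python
# Hypothetical routing table mirroring the tiered config.
DEFAULT_MODEL = "claude/claude-sonnet-4-6"  # complex reasoning, default

ROUTES = {
    "heartbeat": "ollama/glm-4.7-flash",   # routine background tasks
    "skill": "ollama/deepseek-v4-flash",   # agent tool tasks
}


def pick_model(task_type: str) -> str:
    """Return the model for a task type, falling back to the frontier default."""
    return ROUTES.get(task_type, DEFAULT_MODEL)


for task_type in ("heartbeat", "skill", "research"):
    print(task_type, "->", pick_model(task_type))
```

Anything without an explicit local route lands on the default, which keeps unknown or complex work on the stronger model.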
Setting Up Ollama with OpenClaw
If you haven't set up Ollama yet:
- Install Ollama: `curl -fsSL https://ollama.com/install.sh | sh` (Linux/Mac)
- Pull your chosen model: `ollama pull glm-4.7-flash`
- Verify it's running: `ollama list`
- Configure in OpenClaw: set `"model": "ollama/glm-4.7-flash"` for the tasks you want local
- Test: send a message to your OpenClaw instance and verify the model used in the response metadata
Ollama runs as a local server on port 11434. It starts automatically on boot by default. OpenClaw's gateway connects to it via localhost:11434 — no configuration needed if both are on the same machine. If you run OpenClaw on a different machine than Ollama, configure the base URL in your model provider settings.
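If you want to confirm from a script that the server is reachable and which models it has, a minimal sketch using Ollama's `GET /api/tags` endpoint looks like this. The function names are hypothetical. Change `base_url` if Ollama runs on a different machine.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default port mentioned above


def parse_model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON returned by /api/tags."""
    return [model["name"] for model in tags_payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Ask the Ollama server which models are available locally.

    Raises OSError (urllib.error.URLError is a subclass) if the server is unreachable.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.load(resp))
```

Calling `list_local_models()` should return the same names that `ollama list` shows. A connection error usually means the server is not running or the base URL is wrong.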
What Local Models Can't Do Well (Yet)
Be realistic about the limits. For OpenClaw use cases, local models at 9–13B active parameters struggle with:
- Multi-step agentic planning — chains of 5+ tool calls with conditional logic
- Novel problem solving — anything requiring genuine reasoning about unfamiliar situations
- Long-form synthesis — pulling insights from 10+ sources into a coherent analysis
- SOUL.md constraint adherence under pressure — complex behavioral rules are harder to enforce at smaller model sizes
The strategy: local for volume, frontier for complexity. Route the 60–70% of requests that don't need deep reasoning to local models, and reserve your API quota for the remaining 30–40% that do.
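The limits listed above can be turned into a rough escalation rule. The sketch below is a heuristic, not something OpenClaw provides: the `Task` fields and `needs_frontier` are hypothetical, and the thresholds (5 tool calls, 10 sources) are simply the numbers from the list.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a request, filled in by whatever dispatches it."""
    expected_tool_calls: int = 0
    source_count: int = 0
    novel: bool = False  # unfamiliar situation with no established pattern


def needs_frontier(task: Task) -> bool:
    """Escalate when a task hits one of the known weak spots of small local models."""
    return (
        task.expected_tool_calls >= 5   # multi-step agentic planning
        or task.source_count >= 10      # long-form synthesis
        or task.novel                   # novel problem solving
    )


print(needs_frontier(Task(expected_tool_calls=2)))   # routine task stays local
print(needs_frontier(Task(source_count=12)))         # large synthesis escalates
```

Tune the thresholds against your own failure cases; the cutoffs here are starting points rather than measured boundaries.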
Note on DeepSeek V4-Flash: Because its 13B active parameters are drawn from a much larger MoE, it is significantly more capable than a 13B dense model. On agent-specific tasks (tool calling, instruction following, multi-turn), it punches above its weight. It is worth testing directly against glm-4.7-flash on your own workload before committing to either as your primary local model.
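One way to run that comparison is to send the same prompts to both models through Ollama's `POST /api/generate` endpoint and look at the answers and the generation speed side by side. This is a minimal sketch: the helper names are hypothetical, the model tags are assumed to match what you pulled, and judging answer quality is left to you.

```python
import json
import urllib.request


def tokens_per_second(result: dict) -> float:
    """Generation speed from an /api/generate response (eval_duration is in nanoseconds)."""
    return result["eval_count"] * 1e9 / result["eval_duration"]


def generate(model: str, prompt: str,
             base_url: str = "http://localhost:11434",
             timeout: float = 300.0) -> dict:
    """Run one non-streaming completion and return the parsed JSON response."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


def compare(models: list[str], prompts: list[str]) -> None:
    """Print each model's answer and speed for every prompt."""
    for prompt in prompts:
        for model in models:
            result = generate(model, prompt)
            print(f"[{model}] {tokens_per_second(result):.1f} tok/s")
            print(result["response"].strip(), "\n")
```

For example, `compare(["glm-4.7-flash", "deepseek-v4-flash"], your_prompts)` with a handful of real heartbeat and tool-style prompts from your own logs gives a more useful signal than generic benchmarks.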
Get Local Model Routing Configured Correctly
ClawReady sets up Ollama + local model routing as part of every setup — right model for each task type, proper OpenClaw integration, and fallback configuration. You get $0 API costs for routine tasks from day one.