If you've been running OpenClaw with Claude or GPT-4 for a while, you've probably felt the API bill. Active agents doing real work can burn $50–$200/month or more in API costs — and that's a real barrier to leaving it running 24/7.
Ollama is the solution most people know about but few have actually set up correctly with OpenClaw. It runs open-source AI models locally on your own hardware — no API key, no per-token cost, no data leaving your machine.
Here's the full picture: how to set it up, which models actually work well with OpenClaw's tool-use requirements, and when local models fall short.
The Cost Case
You don't need to go fully local to cut the bill. The hybrid approach is the sweet spot: use local models for routine tasks (summarizing, drafting, heartbeat cycles, research) and route complex reasoning or tool-heavy work to cloud APIs. Most agents spend 70–85% of their tokens on tasks where a capable local model is good enough.
What You Need
- Hardware: Any machine with at least 8GB RAM (16GB+ recommended). GPU accelerates inference dramatically but isn't required — CPU-only works fine for smaller models.
- Ollama: Free, open-source model runtime. Works on Mac, Linux, Windows.
- OpenClaw: Any recent version (2026.x) — Ollama support is built in.
Context window note: OpenClaw needs a large context window; plan on at least 64k tokens for any local model you use with it. Most Ollama models default to 2k–8k — you must override this in your Modelfile or via the num_ctx API parameter. Skipping this is the #1 reason local models feel "dumb" in OpenClaw setups.
Step 1: Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama --version
# Should show: ollama version 0.x.x
On Mac, Ollama runs as a menu bar app after installation. On Linux, it runs as a systemd service. It serves a local API on port 11434 by default.
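To confirm the server is up, hit the local API directly; the version endpoint makes a quick smoke test:
curl http://127.0.0.1:11434/api/version
# Expect JSON like: {"version":"0.x.x"}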
Step 2: Pull a Model
Not all models work equally well with OpenClaw. The key requirement is strong tool/function calling support — OpenClaw relies heavily on structured tool calls that many smaller models handle poorly.
| Model | Size | Tool Use | Speed (CPU) | Best For |
|---|---|---|---|---|
| qwen2.5:7b | 4.7GB | ✅ Excellent | Fast | Everyday tasks, heartbeat, drafting |
| qwen3:8b | 5.2GB | ✅ Excellent | Fast | Best overall for most setups |
| llama3.1:8b | 4.7GB | ⚠️ Good | Fast | General use, slightly weaker tool calls |
| qwen2.5:14b | 9GB | ✅ Excellent | Moderate | Better reasoning, needs 16GB+ RAM |
| mistral:7b | 4.1GB | ⚠️ Moderate | Fast | Drafting/writing, weaker on tools |
| llama3.3:70b | 43GB | ✅ Excellent | Slow on CPU | Near-Claude quality, needs GPU/high RAM |
Our recommendation for most setups: Start with qwen3:8b. It's fast on CPU, has excellent tool calling, and works well with OpenClaw's tool use patterns out of the box.
ollama pull qwen3:8b
# Downloads ~5.2GB — takes a few minutes
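Before wiring it into OpenClaw, check that the model actually advertises tool support. On recent Ollama versions, ollama show lists each model's capabilities:
ollama show qwen3:8b
# Look for "tools" under Capabilities; models without it will struggle with OpenClaw's structured tool calls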
Step 3: Set the Context Window
This step is skipped by almost everyone and causes the most problems. Default context windows are far too small for OpenClaw's needs.
# Create the file
cat > ~/qwen3-openclaw.Modelfile << 'EOF'
FROM qwen3:8b
PARAMETER num_ctx 65536
EOF
# Build the custom model
ollama create qwen3-openclaw -f ~/qwen3-openclaw.Modelfile
Now you have a model named qwen3-openclaw that runs qwen3:8b with a 64k context window. Use this model name in your OpenClaw config.
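Two quick checks that the build worked and the parameter stuck:
ollama list
# qwen3-openclaw should appear alongside qwen3:8b
ollama show qwen3-openclaw
# The Parameters section should list: num_ctx 65536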
Step 4: Wire It Into OpenClaw
Open ~/.openclaw/openclaw.json and configure Ollama as a model provider:
{
"agents": {
"defaults": {
"model": {
"primary": "anthropic/claude-sonnet-4-6",
"fallbacks": [
"ollama/qwen3-openclaw"
]
}
}
},
"providers": {
"ollama": {
"baseUrl": "http://127.0.0.1:11434"
}
}
}
This sets Claude as your primary model and Ollama as a fallback. You can also flip this — set Ollama as primary and Claude as fallback for when local isn't sufficient.
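Flipped, the same model block looks like this (local-first, with Claude as the escape hatch for harder work):
"model": {
  "primary": "ollama/qwen3-openclaw",
  "fallbacks": [
    "anthropic/claude-sonnet-4-6"
  ]
}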
Hybrid routing tip: Set a per-agent model override for your heartbeat/background agent to use Ollama, and keep Claude for your main interactive agent. Most heartbeat tasks (scanning, logging, research) run fine on local models, saving Claude tokens for conversations that actually need it.
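As a rough sketch, a per-agent override could look like the snippet below. Only agents.defaults appears in the config above, so the heartbeat key and its exact shape here are illustrative; check your OpenClaw version's config reference for the real schema:
"agents": {
  "defaults": {
    "model": { "primary": "anthropic/claude-sonnet-4-6" }
  },
  "heartbeat": {
    "model": { "primary": "ollama/qwen3-openclaw" }
  }
}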
Step 5: Test It
# Force a specific model for one message
openclaw chat --model ollama/qwen3-openclaw "List 3 things you can help me with"
If it responds coherently with tool-aware output, you're set. If it returns garbled JSON or ignores tool schemas, recheck your Modelfile context size — that's almost always the culprit.
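If results still look off, take OpenClaw out of the loop and query Ollama directly. A minimal non-streaming request looks like this; the reply comes back in the response field of the returned JSON:
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "qwen3-openclaw",
  "prompt": "Reply with the single word: ok",
  "stream": false
}'
# If you skipped the custom Modelfile, you can set the context per request instead by adding
# "options": {"num_ctx": 65536} to the request body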
Making Ollama Survive Reboots
On Linux, Ollama installs as a systemd service automatically. Verify:
systemctl status ollama
# Should show: active (running)
# If not enabled:
sudo systemctl enable ollama
sudo systemctl start ollama
On Mac, Ollama runs in the menu bar and starts automatically on login.
When to Use Local vs. Cloud
Local models are great for:
- Heartbeat cycles and background tasks
- Drafting, summarizing, reformatting
- Research scanning and logging
- Routine tool calls (file ops, web searches)
- Any task where latency doesn't matter much
Stick with cloud APIs for:
- Complex multi-step reasoning chains
- Code generation that needs to actually work first try
- Tasks that need current knowledge (smaller local models often have older training cutoffs)
- Anything where you need the absolute best quality output
- Real-time conversations where response speed matters
Don't go 100% local for anything client-facing. Local models at the 7–14B scale are good — not great. For work that goes to clients, gets published, or makes decisions, keep a cloud model in the loop as primary or reviewer.
Performance Expectations (CPU-Only)
On a modern laptop or mini PC with no GPU:
- qwen3:8b / qwen2.5:7b — 8–15 tokens/sec. Perfectly usable for background tasks. Feels slow for live chat.
- 14B models — 4–7 tokens/sec. Workable for overnight/batch tasks, sluggish for interactive use.
- 70B models — 1–2 tokens/sec on CPU. Not practical without a GPU.
For interactive chat, you'll feel the slowness of CPU inference. For background agents (heartbeat tasks, research scans, log writing), it doesn't matter — the agent runs while you sleep.
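Rather than trusting these numbers, measure your own hardware: ollama run has a --verbose flag that prints timing stats after each response, and the eval rate line is your effective tokens/sec:
ollama run qwen3-openclaw --verbose "Summarize the plot of Hamlet in two sentences."
# Check the "eval rate" figure in the stats printed after the response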
🦞 Want This Set Up Properly?
Getting Ollama + OpenClaw hybrid routing configured correctly takes about 30–60 minutes the first time — and most people hit at least one snag with context windows or model routing. ClawReady sets this up as part of our standard setup packages.
Book a Free 15-Min Call →