Running OpenClaw with Ollama is a great setup: zero API cost, full privacy, works offline. But large local models (gemma4, qwen3, llama3.3, anything over ~13B) are slow, and OpenClaw's default timeout wasn't designed for slow hardware.
The result: OpenClaw sends a request to Ollama, waits, and gives up before the model finishes generating. You get an error like "LLM request timed out" or "Error: connect ECONNREFUSED", or just silence.
Most common scenario: it works fine on short prompts but fails on longer ones (complex reasoning, multi-step tasks, or anything with large context). That's the timeout in action: the model is still thinking, but OpenClaw has already given up.
What's Actually Happening
OpenClaw has a configurable LLM request timeout (default: 60 seconds). On fast hardware with a small quantized model, 60 seconds is plenty. On CPU-only hardware with a 31B model at Q4, a single complex response can take 3-8 minutes.
Two separate timeouts can hit you:
- OpenClaw's request timeout: the time it waits for the first token back from Ollama
- Ollama's own keep-alive timeout: if Ollama unloads the model between requests, the next one has to reload it from disk, adding 30-90 seconds of latency before the first token
Fix both and the problem goes away.
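Before changing anything, it helps to know how long your model actually takes. You can time a full generation against Ollama directly, bypassing OpenClaw entirely. This sketch assumes Ollama is listening on its default port (11434) and that the model name matches your config:

# Time one complete non-streaming generation (includes any model-load delay)
time curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwen3.5:9b", "prompt": "Explain TCP slow start in one paragraph.", "stream": false}' \
  > /dev/null

If the wall-clock time here exceeds your configured timeout, OpenClaw will always give up first. The JSON response also carries load_duration and eval_count fields if you want to separate reload lag from raw generation speed.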
The 4 Fixes (Ranked by Effort)
Fix 1: Increase OpenClaw's Timeouts
In your openclaw.json, add or update the timeout settings for your local model provider:
{
  "models": {
    "local": {
      "provider": "ollama",
      "model": "qwen3.5:9b",
      "requestTimeoutMs": 300000,
      "streamTimeoutMs": 600000
    }
  }
}
requestTimeoutMs: 300000 is 5 minutes; streamTimeoutMs: 600000 is 10 minutes. Adjust based on your hardware; CPU-only machines running a 30B+ model may need even higher values.
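If you'd rather script the change than hand-edit JSON, a jq one-liner can patch both values in place. The config path below is an assumption; point it at wherever your openclaw.json actually lives:

# Path is an assumption -- substitute your real openclaw.json location
CONFIG="$HOME/.openclaw/openclaw.json"
jq '.models.local.requestTimeoutMs = 300000 | .models.local.streamTimeoutMs = 600000' \
  "$CONFIG" > "$CONFIG.tmp" && mv "$CONFIG.tmp" "$CONFIG"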
Fix 2: Keep the Model Loaded in Ollama
By default, Ollama unloads models from memory after 5 minutes of inactivity. When OpenClaw's heartbeat fires 30 minutes later, Ollama has to reload the model, adding a 30-90 second cold-start delay before the first token. That alone can trigger the timeout.
Set Ollama's keep-alive to indefinite by adding an environment variable to your Ollama systemd service or startup command:
# For systemd: edit /etc/systemd/system/ollama.service
[Service]
Environment="OLLAMA_KEEP_ALIVE=-1"
# Then reload:
sudo systemctl daemon-reload && sudo systemctl restart ollama
OLLAMA_KEEP_ALIVE=-1 keeps the model loaded in memory until Ollama exits. Only do this if you have enough RAM. If memory is tight, use OLLAMA_KEEP_ALIVE=30m to extend to 30 minutes instead.
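To confirm the setting took effect, Ollama's CLI can show what's loaded and when it's due to be evicted:

# If you launch Ollama by hand instead of via systemd:
OLLAMA_KEEP_ALIVE=-1 ollama serve

# After sending one request, list loaded models.
# With keep-alive at -1, the UNTIL column shows the model staying loaded indefinitely.
ollama ps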
Fix 3: Switch to a Model Your Hardware Can Handle
If you're running a 31B model on a machine without a GPU, you're asking for timeout trouble. CPU inference on a large model is simply too slow for OpenClaw's conversational use case.
Practical hardware-to-model guidance (a quick way to benchmark your own machine follows the switch commands below):
- 8-16GB RAM, CPU-only: qwen2.5:7b Q4, qwen3.5:9b Q4, gemma3:9b Q4. These generate at 8-15 tokens/sec on modern CPUs.
- 16-32GB RAM, CPU-only: qwen2.5:14b Q4, qwen3:14b Q4. Expect 4-8 tokens/sec; workable but slow.
- GPU available (4-8GB VRAM): llama3.3:8b, gemma3:12b. Fast enough for any timeout setting.
- 31B+ models on CPU: not recommended for interactive use. Fine for overnight batch tasks, not for heartbeat-driven conversations.
# Pull a faster model
ollama pull qwen3.5:9b
# Update openclaw.json
"model": "qwen3.5:9b"
Fix 4: Route Complex Tasks to a Cloud Model
OpenClaw supports multiple model profiles. Route fast, simple tasks (heartbeat checks, short responses, memory reads) to your local model and complex reasoning to a cloud API. This cuts your API cost by 70-90% while keeping response quality and speed high where it matters.
{
  "models": {
    "default": "local",
    "local": {
      "provider": "ollama",
      "model": "qwen3.5:9b",
      "requestTimeoutMs": 300000
    },
    "powerful": {
      "provider": "anthropic",
      "model": "claude-sonnet-4-6"
    }
  }
}
In your SOUL.md or AGENTS.md, instruct the agent to use model: powerful only for complex multi-step tasks. Heartbeats and simple replies stay on local at $0.
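One way to phrase that instruction (the wording is illustrative, not a required OpenClaw syntax):

## Model routing
- Default to the local model for heartbeats, short replies, and memory reads.
- Switch to model: powerful only for multi-step reasoning, long code generation,
  or tasks that already failed on local.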
Diagnosing Which Timeout You're Hitting
Quick way to tell:
- Fails after exactly 60 seconds: OpenClaw timeout. Fix 1.
- Fails after 30-90 seconds on the first message, then works: Ollama model reload lag. Fix 2.
- Fails unpredictably on longer prompts but not short ones: Model too slow for your hardware. Fix 3.
- Intermittent failures with no clear pattern: memory pressure causing Ollama to swap. Check htop during a request (or see the snippet after this list); if RAM is maxed out, the model is paging to disk. Fix 3 (smaller model) or add swap space.
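If you suspect swapping but htop isn't installed, two lightweight checks while a long prompt runs:

# Sample memory once per second; if "available" approaches zero, the model is paging
watch -n 1 free -h

# On systemd installs, tail Ollama's logs for model load/eviction messages
journalctl -u ollama -f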
Quick Reference: Recommended Settings by Hardware
# CPU-only (8-16GB RAM): qwen3.5:9b recommended
requestTimeoutMs: 180000 # 3 min
streamTimeoutMs: 360000 # 6 min
OLLAMA_KEEP_ALIVE: 60m
# CPU-only (16-32GB RAM): qwen2.5:14b
requestTimeoutMs: 300000 # 5 min
streamTimeoutMs: 600000 # 10 min
OLLAMA_KEEP_ALIVE: 30m
# GPU (4GB+ VRAM): gemma3:12b or llama3.3:8b
requestTimeoutMs: 60000 # 1 min (default is fine)
streamTimeoutMs: 120000 # 2 min
OLLAMA_KEEP_ALIVE: -1 # keep loaded
Start with Fix 1 (timeout increase) + Fix 2 (keep-alive). Those two changes solve the issue for ~80% of setups in under 10 minutes total.