Running OpenClaw with Ollama is a great setup: zero API cost, full privacy, works offline. But large local models (gemma4, qwen3, llama3.3, anything over ~13B) are slow, and OpenClaw's default timeout wasn't designed for slow hardware.

The result: OpenClaw sends a request to Ollama, waits, and then gives up before the model finishes generating. You get an error like LLM request timed out or Error: connect ECONNREFUSED or just silence.

Most common scenario: it works fine on short prompts but fails on longer ones (complex reasoning, multi-step tasks, or anything with large context). That's the timeout in action: the model is still thinking, but OpenClaw has already given up.

What's Actually Happening

OpenClaw has a configurable LLM request timeout (default: 60 seconds). On fast hardware with a small quantized model, 60 seconds is plenty. On CPU-only hardware with a 31B model at Q4, a single complex response can take 3–8 minutes.

Two separate timeouts can hit you:

  • OpenClaw's LLM request timeout (60 seconds by default), which gives up before a slow model finishes generating.
  • Ollama's keep-alive timeout (5 minutes by default), which unloads the model from memory, so the next request pays a cold-start reload before the first token even appears.

Fix both and the problem goes away.
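
You can see how close you are to the limit by timing a raw generation against Ollama directly, outside OpenClaw. A minimal check, assuming Ollama is listening on its default port and using whatever model your config points at:

# Time one full (non-streamed) generation through Ollama's local API
time curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwen3.5:9b", "prompt": "Explain how TCP congestion control works.", "stream": false}' \
  > /dev/null

If the wall-clock time on a realistic prompt is anywhere near 60 seconds, OpenClaw's default timeout will cut off your longer responses.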

The 4 Fixes (Ranked by Effort)

Fix 1: Increase OpenClaw's LLM timeout
⏱ 2 minutes · Fixes most cases immediately

In your openclaw.json, add or update the timeout setting for your local model provider:

{
  "models": {
    "local": {
      "provider": "ollama",
      "model": "qwen3.5:9b",
      "requestTimeoutMs": 300000,
      "streamTimeoutMs": 600000
    }
  }
}

requestTimeoutMs: 300000 = 5 minutes. streamTimeoutMs: 600000 = 10 minutes. Adjust based on your hardware. CPU-only with a 30B+ model may need even higher values.
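
One way to pick values rather than guessing: estimate from your model's measured generation speed and the longest response you expect. The numbers below are illustrative, not measurements (Fix 3 below shows how to measure your own tokens/sec figure):

# Rule of thumb for sizing the timeouts:
#   longest expected response (tokens) / generation speed (tokens/sec), then x2 for safety
#   e.g. 1,500 tokens / 6 tokens/sec = ~250 s  ->  x2 = ~500 s  ->  requestTimeoutMs: 500000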

Fix 2: Pin Ollama's keep-alive to prevent model unloading
⏱ 5 minutes · Eliminates reload latency

By default, Ollama unloads models from memory after 5 minutes of inactivity. When OpenClaw's heartbeat fires 30 minutes later, Ollama has to reload the model, adding a 30–90 second cold-start delay before the first token. That alone can trigger the timeout.

Set Ollama's keep-alive to indefinite by adding an environment variable to your Ollama systemd service or startup command:

# For systemd: edit /etc/systemd/system/ollama.service
[Service]
Environment="OLLAMA_KEEP_ALIVE=-1"

# Then reload:
sudo systemctl daemon-reload && sudo systemctl restart ollama

OLLAMA_KEEP_ALIVE=-1 keeps the model loaded in memory until Ollama exits. Only do this if you have enough RAM. If memory is tight, use OLLAMA_KEEP_ALIVE=30m to extend to 30 minutes instead.
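
To confirm the setting took effect, check that the model stays resident after a request; with keep-alive at -1 the UNTIL column should read something like "Forever". Ollama also accepts a keep_alive field on individual API requests, which helps if you can't change the service environment (whether OpenClaw passes it through is another matter, so prefer the environment variable):

# The model should remain listed after requests complete
ollama ps

# keep_alive can also be set per request via Ollama's API
curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwen3.5:9b", "prompt": "ping", "stream": false, "keep_alive": -1}'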

Fix 3: Switch to a smaller or more aggressively quantized model
⏱ 15 minutes · Best ROI on CPU-only hardware

If you're running a 31B model on a machine without a GPU, you're asking for timeout trouble. CPU inference on a large model is simply too slow for OpenClaw's conversational use case.

Practical hardware-to-model guidance:

  • 8–16GB RAM, CPU-only: qwen2.5:7b Q4, qwen3.5:9b Q4, gemma3:9b Q4; these generate at 8–15 tokens/sec on modern CPUs
  • 16–32GB RAM, CPU-only: qwen2.5:14b Q4, qwen3:14b Q4; 4–8 tokens/sec, works but slow
  • GPU available (4–8GB VRAM): llama3.3:8b, gemma3:12b; fast enough for any timeout setting
  • 31B+ models on CPU: Not recommended for interactive use. Fine for overnight batch tasks, not for heartbeat-driven conversations.

# Pull a faster model
ollama pull qwen3.5:9b

# Update openclaw.json
"model": "qwen3.5:9b"

Fix 4: Use a hybrid routing setup (local for simple, API for complex)
⏱ 30 minutes · Best of both worlds

OpenClaw supports multiple model profiles. Route fast/simple tasks (heartbeat checks, short responses, memory reads) to your local model and complex reasoning to a cloud API. This cuts your API cost by 70–90% while keeping response quality and speed high where it matters.

{
  "models": {
    "default": "local",
    "local": {
      "provider": "ollama",
      "model": "qwen3.5:9b",
      "requestTimeoutMs": 300000
    },
    "powerful": {
      "provider": "anthropic",
      "model": "claude-sonnet-4-6"
    }
  }
}

In your SOUL.md or AGENTS.md, instruct the agent to use model: powerful only for complex multi-step tasks. Heartbeats and simple replies stay on local at $0.
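
As a sketch of what that instruction could look like (the exact wording is up to you; the point is a rule the agent can follow consistently):

Use the "local" model by default. Switch to "powerful" only for tasks that need
multi-step planning, code generation, or reasoning over a large amount of context.
Heartbeats, acknowledgements, and short replies always stay on "local".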

Diagnosing Which Timeout You're Hitting

Quick way to tell:

  • It fails only on the first request after OpenClaw has been idle for a while, but works if you retry immediately: that's the keep-alive reload (Fix 2).
  • It fails on long or complex prompts even when the model is already warm: OpenClaw's timeout is too short for your generation speed (Fix 1).
  • It fails on everything, including short prompts, with connect ECONNREFUSED: Ollama isn't running or isn't reachable; that's not a timeout at all.
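
If it's still ambiguous, watch Ollama while OpenClaw makes a request. On a systemd-based install:

# Follow Ollama's logs during a request
journalctl -u ollama -f

If the log shows Ollama still loading or still serving the model at the moment OpenClaw reports a timeout, the model side is fine and the limit you're hitting is OpenClaw's.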

Quick Reference: Recommended Settings by Hardware

# CPU-only (8–16GB RAM): qwen3.5:9b recommended
requestTimeoutMs: 180000    # 3 min
streamTimeoutMs:  360000    # 6 min
OLLAMA_KEEP_ALIVE: 60m

# CPU-only (16–32GB RAM): qwen2.5:14b
requestTimeoutMs: 300000    # 5 min
streamTimeoutMs:  600000    # 10 min
OLLAMA_KEEP_ALIVE: 30m

# GPU (4GB+ VRAM): gemma3:12b or llama3.3:8b
requestTimeoutMs: 60000     # 1 min (default is fine)
streamTimeoutMs:  120000    # 2 min
OLLAMA_KEEP_ALIVE: -1       # keep loaded

Start with Fix 1 (timeout increase) + Fix 2 (keep-alive). Those two changes solve the issue for ~80% of setups in under 10 minutes total.