Running OpenClaw with Ollama is a great setup: zero API cost, full privacy, works offline. But large local models (gemma4, qwen3, llama3.3, anything over ~13B) are slow, and OpenClaw's default timeout wasn't designed for slow hardware.
The result: OpenClaw sends a request to Ollama, waits, and gives up before the model finishes generating. You get an error like "LLM request timed out" or "Error: connect ECONNREFUSED", or just silence.
Most common scenario: it works fine on short prompts but fails on longer ones (complex reasoning, multi-step tasks, or anything with large context). That's the timeout in action: the model is still thinking, but OpenClaw has already given up.
What's Actually Happening
OpenClaw has a configurable LLM request timeout (default: 60 seconds). On fast hardware with a small quantized model, 60 seconds is plenty. On CPU-only hardware with a 31B model at Q4, a single complex response can take 3-8 minutes.
Two separate timeouts can hit you:
- OpenClaw's request timeout: the time it waits for the first token back from Ollama
- Ollama's own keep-alive timeout: if Ollama unloads the model between requests, the next one has to reload it from disk, adding 30-90 seconds of latency before the first token
Fix both and the problem goes away.
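Before changing anything, it helps to know how long your model actually takes. You can time a full generation against Ollama directly, bypassing OpenClaw entirely. This sketch assumes Ollama is listening on its default port (11434) and that the model name matches your config:

# Time one complete non-streaming generation (includes any model-load delay)
time curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwen3.5:9b", "prompt": "Explain TCP slow start in one paragraph.", "stream": false}' \
  > /dev/null

If the wall-clock time here exceeds your configured timeout, OpenClaw will always give up first. The JSON response also carries load_duration and eval_count fields if you want to separate reload lag from raw generation speed.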
The 4 Fixes (Ranked by Effort)
Fix 1: Increase OpenClaw's Timeouts
In your openclaw.json, add or update the timeout settings for your local model provider:
{
  "models": {
    "local": {
      "provider": "ollama",
      "model": "qwen3.5:9b",
      "requestTimeoutMs": 300000,
      "streamTimeoutMs": 600000
    }
  }
}
requestTimeoutMs: 300000 is 5 minutes; streamTimeoutMs: 600000 is 10 minutes. Adjust based on your hardware; CPU-only machines running a 30B+ model may need even higher values.
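If you'd rather script the change than hand-edit JSON, a jq one-liner can patch both values in place. The config path below is an assumption; point it at wherever your openclaw.json actually lives:

# Path is an assumption -- substitute your real openclaw.json location
CONFIG="$HOME/.openclaw/openclaw.json"
jq '.models.local.requestTimeoutMs = 300000 | .models.local.streamTimeoutMs = 600000' \
  "$CONFIG" > "$CONFIG.tmp" && mv "$CONFIG.tmp" "$CONFIG"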
Fix 2: Keep the Model Loaded in Ollama
By default, Ollama unloads models from memory after 5 minutes of inactivity. When OpenClaw's heartbeat fires 30 minutes later, Ollama has to reload the model, adding a 30-90 second cold-start delay before the first token. That alone can trigger the timeout.
Set Ollama's keep-alive to indefinite by adding an environment variable to your Ollama systemd service or startup command:
# For systemd: edit /etc/systemd/system/ollama.service
[Service]
Environment="OLLAMA_KEEP_ALIVE=-1"
# Then reload:
sudo systemctl daemon-reload && sudo systemctl restart ollama
OLLAMA_KEEP_ALIVE=-1 keeps the model loaded in memory until Ollama exits. Only do this if you have enough RAM. If memory is tight, use OLLAMA_KEEP_ALIVE=30m to extend to 30 minutes instead.
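To confirm the setting took effect, Ollama's CLI can show what's loaded and when it's due to be evicted:

# If you launch Ollama by hand instead of via systemd:
OLLAMA_KEEP_ALIVE=-1 ollama serve

# After sending one request, list loaded models.
# With keep-alive at -1, the UNTIL column shows the model staying loaded indefinitely.
ollama ps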
Fix 3: Switch to a Model Your Hardware Can Handle
If you're running a 31B model on a machine without a GPU, you're asking for timeout trouble. CPU inference on a large model is simply too slow for OpenClaw's conversational use case.
Practical hardware-to-model guidance (a quick way to benchmark your own machine follows the switch commands below):
- 8-16GB RAM, CPU-only: qwen2.5:7b Q4, qwen3.5:9b Q4, gemma3:9b Q4. These generate at 8-15 tokens/sec on modern CPUs.
- 16-32GB RAM, CPU-only: qwen2.5:14b Q4, qwen3:14b Q4. Expect 4-8 tokens/sec; workable but slow.
- GPU available (4-8GB VRAM): llama3.3:8b, gemma3:12b. Fast enough for any timeout setting.
- 31B+ models on CPU: not recommended for interactive use. Fine for overnight batch tasks, not for heartbeat-driven conversations.
# Pull a faster model
ollama pull qwen3.5:9b
# Update openclaw.json
"model": "qwen3.5:9b"
Fix 4: Route Complex Tasks to a Cloud Model
OpenClaw supports multiple model profiles. Route fast, simple tasks (heartbeat checks, short responses, memory reads) to your local model and complex reasoning to a cloud API. This cuts your API cost by 70-90% while keeping response quality and speed high where it matters.
{
  "models": {
    "default": "local",
    "local": {
      "provider": "ollama",
      "model": "qwen3.5:9b",
      "requestTimeoutMs": 300000
    },
    "powerful": {
      "provider": "anthropic",
      "model": "claude-sonnet-4-6"
    }
  }
}
In your SOUL.md or AGENTS.md, instruct the agent to use model: powerful only for complex multi-step tasks. Heartbeats and simple replies stay on local at $0.
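One way to phrase that instruction (the wording is illustrative, not a required OpenClaw syntax):

## Model routing
- Default to the local model for heartbeats, short replies, and memory reads.
- Switch to model: powerful only for multi-step reasoning, long code generation,
  or tasks that already failed on local.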
Diagnosing Which Timeout You're Hitting
Quick way to tell:
- Fails after exactly 60 seconds: OpenClaw timeout. Fix 1.
- Fails after 30-90 seconds on the first message, then works: Ollama model reload lag. Fix 2.
- Fails unpredictably on longer prompts but not short ones: Model too slow for your hardware. Fix 3.
- Intermittent failures with no clear pattern: memory pressure causing Ollama to swap. Check htop during a request (or see the snippet after this list); if RAM is maxed out, the model is paging to disk. Fix 3 (smaller model) or add swap space.
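If you suspect swapping but htop isn't installed, two lightweight checks while a long prompt runs:

# Sample memory once per second; if "available" approaches zero, the model is paging
watch -n 1 free -h

# On systemd installs, tail Ollama's logs for model load/eviction messages
journalctl -u ollama -f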
Quick Reference: Recommended Settings by Hardware
# CPU-only (8-16GB RAM): qwen3.5:9b recommended
requestTimeoutMs: 180000 # 3 min
streamTimeoutMs: 360000 # 6 min
OLLAMA_KEEP_ALIVE: 60m
# CPU-only (16-32GB RAM): qwen2.5:14b
requestTimeoutMs: 300000 # 5 min
streamTimeoutMs: 600000 # 10 min
OLLAMA_KEEP_ALIVE: 30m
# GPU (4GB+ VRAM): gemma3:12b or llama3.3:8b
requestTimeoutMs: 60000 # 1 min (default is fine)
streamTimeoutMs: 120000 # 2 min
OLLAMA_KEEP_ALIVE: -1 # keep loaded
Start with Fix 1 (timeout increase) + Fix 2 (keep-alive). Those two changes solve the issue for ~80% of setups in under 10 minutes total.