If you've been running OpenClaw with Claude or GPT-4 for a while, you've probably felt the API bill. Active agents doing real work can burn $50–$200/month or more in API costs — and that's a real barrier to leaving it running 24/7.
Ollama is the solution most people know about but few have actually set up correctly with OpenClaw. It runs open-source AI models locally on your own hardware — no API key, no per-token cost, no data leaving your machine.
Here's the full picture: how to set it up, which models actually work well with OpenClaw's tool-use requirements, and when local models fall short.
The Cost Case
You don't need to go fully local to cut the bill. The hybrid approach is the sweet spot: use local models for routine tasks (summarizing, drafting, heartbeat cycles, research) and route complex reasoning or tool-heavy work to cloud APIs. Most agents spend 70–85% of their tokens on tasks where a capable local model is good enough.
What You Need
- Hardware: Any machine with at least 8GB RAM (16GB+ recommended). GPU accelerates inference dramatically but isn't required — CPU-only works fine for smaller models.
- Ollama: Free, open-source model runtime. Works on Mac, Linux, Windows.
- OpenClaw: Any recent version (2026.x) — Ollama support is built in.
Context window note: OpenClaw needs a large context window; plan on at least 64k tokens for any local model you use with it. Most Ollama models default to 2k–8k — you must override this in your Modelfile or via the num_ctx API parameter. Skipping this is the #1 reason local models feel "dumb" in OpenClaw setups.
Step 1: Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama --version
# Should show: ollama version 0.x.x
On Mac, Ollama runs as a menu bar app after installation. On Linux, it runs as a systemd service. It serves a local API on port 11434 by default.
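To confirm the server is up, hit the local API directly; the version endpoint makes a quick smoke test:
curl http://127.0.0.1:11434/api/version
# Expect JSON like: {"version":"0.x.x"}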
Step 2: Pull a Model
Not all models work equally well with OpenClaw. The key requirement is strong tool/function calling support — OpenClaw relies heavily on structured tool calls that many smaller models handle poorly.
| Model | Size | Tool Use | Speed (CPU) | Best For |
|---|---|---|---|---|
| qwen2.5:7b | 4.7GB | ✅ Excellent | Fast | Everyday tasks, heartbeat, drafting |
| qwen3:8b | 5.2GB | ✅ Excellent | Fast | Best overall for most setups |
| llama3.1:8b | 4.7GB | ⚠️ Good | Fast | General use, slightly weaker tool calls |
| qwen2.5:14b | 9GB | ✅ Excellent | Moderate | Better reasoning, needs 16GB+ RAM |
| mistral:7b | 4.1GB | ⚠️ Moderate | Fast | Drafting/writing, weaker on tools |
| llama3.3:70b | 43GB | ✅ Excellent | Slow on CPU | Near-Claude quality, needs GPU/high RAM |
Our recommendation for most setups: Start with qwen3:8b. It's fast on CPU, has excellent tool calling, and works well with OpenClaw's tool use patterns out of the box.
ollama pull qwen3:8b
# Downloads ~5.2GB — takes a few minutes
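Before wiring it into OpenClaw, check that the model actually advertises tool support. On recent Ollama versions, ollama show lists each model's capabilities:
ollama show qwen3:8b
# Look for "tools" under Capabilities; models without it will struggle with OpenClaw's structured tool calls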
Step 3: Set the Context Window
This step is skipped by almost everyone and causes the most problems. Default context windows are far too small for OpenClaw's needs.
# Create the file
cat > ~/qwen3-openclaw.Modelfile << 'EOF'
FROM qwen3:8b
PARAMETER num_ctx 65536
EOF
# Build the custom model
ollama create qwen3-openclaw -f ~/qwen3-openclaw.Modelfile
Now you have a model named qwen3-openclaw that runs qwen3:8b with a 64k context window. Use this model name in your OpenClaw config.
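Two quick checks that the build worked and the parameter stuck:
ollama list
# qwen3-openclaw should appear alongside qwen3:8b
ollama show qwen3-openclaw
# The Parameters section should list: num_ctx 65536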
Step 4: Wire It Into OpenClaw
Open ~/.openclaw/openclaw.json and configure Ollama as a model provider:
{
"agents": {
"defaults": {
"model": {
"primary": "anthropic/claude-sonnet-4-6",
"fallbacks": [
"ollama/qwen3-openclaw"
]
}
}
},
"providers": {
"ollama": {
"baseUrl": "http://127.0.0.1:11434"
}
}
}
This sets Claude as your primary model and Ollama as a fallback. You can also flip this — set Ollama as primary and Claude as fallback for when local isn't sufficient.
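Flipped, the same model block looks like this (local-first, with Claude as the escape hatch for harder work):
"model": {
  "primary": "ollama/qwen3-openclaw",
  "fallbacks": [
    "anthropic/claude-sonnet-4-6"
  ]
}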
Hybrid routing tip: Set a per-agent model override for your heartbeat/background agent to use Ollama, and keep Claude for your main interactive agent. Most heartbeat tasks (scanning, logging, research) run fine on local models, saving Claude tokens for conversations that actually need it.
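As a rough sketch, a per-agent override could look like the snippet below. Only agents.defaults appears in the config above, so the heartbeat key and its exact shape here are illustrative; check your OpenClaw version's config reference for the real schema:
"agents": {
  "defaults": {
    "model": { "primary": "anthropic/claude-sonnet-4-6" }
  },
  "heartbeat": {
    "model": { "primary": "ollama/qwen3-openclaw" }
  }
}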
Step 5: Test It
# Force a specific model for one message
openclaw chat --model ollama/qwen3-openclaw "List 3 things you can help me with"
If it responds coherently with tool-aware output, you're set. If it returns garbled JSON or ignores tool schemas, recheck your Modelfile context size — that's almost always the culprit.
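If results still look off, take OpenClaw out of the loop and query Ollama directly. A minimal non-streaming request looks like this; the reply comes back in the response field of the returned JSON:
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "qwen3-openclaw",
  "prompt": "Reply with the single word: ok",
  "stream": false
}'
# If you skipped the custom Modelfile, you can set the context per request instead by adding
# "options": {"num_ctx": 65536} to the request body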
Making Ollama Survive Reboots
On Linux, Ollama installs as a systemd service automatically. Verify:
systemctl status ollama
# Should show: active (running)
# If not enabled:
sudo systemctl enable ollama
sudo systemctl start ollama
On Mac, Ollama runs in the menu bar and starts automatically on login.
When to Use Local vs. Cloud
Local models are great for:
- Heartbeat cycles and background tasks
- Drafting, summarizing, reformatting
- Research scanning and logging
- Routine tool calls (file ops, web searches)
- Any task where latency doesn't matter much
Stick with cloud APIs for:
- Complex multi-step reasoning chains
- Code generation that needs to actually work first try
- Tasks that need current knowledge (smaller local models often have older training cutoffs)
- Anything where you need the absolute best quality output
- Real-time conversations where response speed matters
Don't go 100% local for anything client-facing. Local models at the 7–14B scale are good — not great. For work that goes to clients, gets published, or makes decisions, keep a cloud model in the loop as primary or reviewer.
Performance Expectations (CPU-Only)
On a modern laptop or mini PC with no GPU:
- qwen3:8b / qwen2.5:7b — 8–15 tokens/sec. Perfectly usable for background tasks. Feels slow for live chat.
- 14B models — 4–7 tokens/sec. Workable for overnight/batch tasks, sluggish for interactive use.
- 70B models — 1–2 tokens/sec on CPU. Not practical without a GPU.
For interactive chat, you'll feel the slowness of CPU inference. For background agents (heartbeat tasks, research scans, log writing), it doesn't matter — the agent runs while you sleep.
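Rather than trusting these numbers, measure your own hardware: ollama run has a --verbose flag that prints timing stats after each response, and the eval rate line is your effective tokens/sec:
ollama run qwen3-openclaw --verbose "Summarize the plot of Hamlet in two sentences."
# Check the "eval rate" figure in the stats printed after the response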
🦞 Want This Set Up Properly?
Getting Ollama + OpenClaw hybrid routing configured correctly takes about 30–60 minutes the first time — and most people hit at least one snag with context windows or model routing. ClawReady sets this up as part of our standard setup packages.
Book a Free 15-Min Call →