Cost Optimization · April 2026

OpenClaw + Ollama: Run Local AI Models for Free

Cloud API bills are the biggest hidden cost in OpenClaw. Ollama lets you run capable AI models on your own hardware at zero per-token cost. Here's the full setup — with honest tradeoffs.

By ClawReady · 11 min read

If you've been running OpenClaw with Claude or GPT-4 for a while, you've probably felt the API bill. Active agents doing real work can burn $50–$200/month or more in API costs, and that's a real barrier to leaving them running 24/7.

Ollama is the solution most people know about but few have actually set up correctly with OpenClaw. It runs open-source AI models locally on your own hardware — no API key, no per-token cost, no data leaving your machine.

Here's the full picture: how to set it up, which models actually work well with OpenClaw's tool-use requirements, and when local models fall short.

The Cost Case

| Setup | Monthly cost | Notes |
|---|---|---|
| Cloud-only (Claude Sonnet) | $50–200 | for active agents |
| Hybrid (80% local / 20% cloud) | $5–20 | same agent capability |

The hybrid approach is the sweet spot: use local models for routine tasks (summarizing, drafting, heartbeat cycles, research), and route complex reasoning or tool-heavy tasks to cloud APIs. Most agents spend 70–85% of their tokens on tasks where a capable local model is good enough.

What You Need

Not much: a machine with at least 8GB of RAM (16GB+ for 14B-class models), roughly 5–10GB of free disk per model, and an existing OpenClaw install. No GPU required, though one helps.

Context window note: OpenClaw requires a large context window; plan on at least 64k tokens for local models. Most models default to 2k–8k, so you must override this in your Modelfile or via the API parameter. Skipping this is the #1 reason local models feel "dumb" in OpenClaw setups.
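
If you go the API-parameter route, Ollama's /api/generate and /api/chat endpoints accept num_ctx per request in the options field. A minimal sketch (the prompt is just a placeholder):

Set the context window per request (sketch)
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "qwen3:8b",
  "prompt": "Summarize why local models cut API costs.",
  "stream": false,
  "options": { "num_ctx": 65536 }
}'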

Step 1: Install Ollama

Install Ollama (Mac / Linux)
curl -fsSL https://ollama.com/install.sh | sh
Verify it's running
ollama --version
# Should show: ollama version 0.x.x

On Mac, Ollama runs as a menu bar app after installation. On Linux, it runs as a systemd service. It serves a local API on port 11434 by default.
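
A quick sanity check that the local API is actually listening: Ollama's /api/tags endpoint lists your installed models.

Check the local API
curl http://127.0.0.1:11434/api/tags
# Returns JSON of installed models; an empty "models" array is normal before your first pull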

Step 2: Pull a Model

Not all models work equally well with OpenClaw. The key requirement is strong tool/function calling support — OpenClaw relies heavily on structured tool calls that many smaller models handle poorly.

| Model | Size | Tool Use | Speed (CPU) | Best For |
|---|---|---|---|---|
| qwen2.5:7b | 4.7GB | ✅ Excellent | Fast | Everyday tasks, heartbeat, drafting |
| qwen3:8b | 5.2GB | ✅ Excellent | Fast | Best overall for most setups |
| llama3.1:8b | 4.7GB | ⚠️ Good | Fast | General use, slightly weaker tool calls |
| qwen2.5:14b | 9GB | ✅ Excellent | Moderate | Better reasoning, needs 16GB+ RAM |
| mistral:7b | 4.1GB | ⚠️ Moderate | Fast | Drafting/writing, weaker on tools |
| llama3.3:70b | 43GB | ✅ Excellent | Slow on CPU | Near-Claude quality, needs GPU/high RAM |

Our recommendation for most setups: start with qwen3:8b. It's fast on CPU, handles structured tool calls reliably, and works with OpenClaw out of the box.

Pull the recommended model
ollama pull qwen3:8b
# Downloads ~5.2GB — takes a few minutes
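
Once the download finishes, confirm the model is available locally:

Verify the download
ollama list
# qwen3:8b should appear with its size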

Step 3: Set the Context Window

This step is skipped by almost everyone and causes the most problems. Default context windows are far too small for OpenClaw's needs.

Create a custom Modelfile with 64k context
# Create the file
cat > ~/qwen3-openclaw.Modelfile << 'EOF'
FROM qwen3:8b
PARAMETER num_ctx 65536
EOF

# Build the custom model
ollama create qwen3-openclaw -f ~/qwen3-openclaw.Modelfile

Now you have a model named qwen3-openclaw that runs qwen3:8b with a 64k context window. Use this model name in your OpenClaw config.
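
To double-check that the override took, ollama show prints the model's configured parameters:

Confirm the context override
ollama show qwen3-openclaw --parameters
# Should include: num_ctx 65536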

Step 4: Wire It Into OpenClaw

Open ~/.openclaw/openclaw.json and configure Ollama as a model provider:

openclaw.json — add Ollama provider + set as fallback
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-6",
        "fallbacks": [
          "ollama/qwen3-openclaw"
        ]
      }
    }
  },
  "providers": {
    "ollama": {
      "baseUrl": "http://127.0.0.1:11434"
    }
  }
}

This sets Claude as your primary model and Ollama as a fallback. You can also flip this — set Ollama as primary and Claude as fallback for when local isn't sufficient.

Hybrid routing tip: Set a per-agent model override for your heartbeat/background agent to use Ollama, and keep Claude for your main interactive agent. Most heartbeat tasks (scanning, logging, research) run fine on local models, saving Claude tokens for conversations that actually need it.
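
As a sketch of that split, assuming OpenClaw accepts a model block per named agent just as it does under defaults (the "heartbeat" agent name and the per-agent key are assumptions; check your version's config docs for the real schema):

openclaw.json: hypothetical per-agent split (schema assumed)
{
  "agents": {
    "defaults": {
      "model": { "primary": "anthropic/claude-sonnet-4-6" }
    },
    "heartbeat": {
      "model": { "primary": "ollama/qwen3-openclaw" }
    }
  }
}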

Step 5: Test It

Quick test from the OpenClaw CLI
# Force a specific model for one message
openclaw chat --model ollama/qwen3-openclaw "List 3 things you can help me with"

If it responds coherently with tool-aware output, you're set. If it returns garbled JSON or ignores tool schemas, recheck your Modelfile context size — that's almost always the culprit.
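
You can also verify tool calling against Ollama directly, outside OpenClaw. This sketch sends a made-up get_weather tool to /api/chat; a model with solid tool support replies with a structured tool_calls entry instead of prose:

Test tool calling directly (get_weather is a dummy tool)
curl http://127.0.0.1:11434/api/chat -d '{
  "model": "qwen3-openclaw",
  "messages": [{ "role": "user", "content": "What is the weather in Paris?" }],
  "stream": false,
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": { "city": { "type": "string" } },
        "required": ["city"]
      }
    }
  }]
}'
# Look for "tool_calls" in the response "message"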

Making Ollama Survive Reboots

On Linux, Ollama installs as a systemd service automatically. Verify:

Verify Ollama service is enabled
systemctl status ollama
# Should show: active (running)

# If not enabled:
sudo systemctl enable ollama
sudo systemctl start ollama

On Mac, Ollama runs in the menu bar and starts automatically on login.

When to Use Local vs. Cloud

Local models are great for:

- Heartbeat and background cycles: scanning, logging, research sweeps
- Summarizing, drafting, and other routine text work
- High-volume tasks where per-token cost matters more than polish

Stick with cloud APIs for:

- Complex multi-step reasoning and heavy tool orchestration
- Anything client-facing, published, or feeding a real decision
- Conversations where response speed and quality both matter

Don't go 100% local for anything client-facing. Local models at the 7–14B scale are good — not great. For work that goes to clients, gets published, or makes decisions, keep a cloud model in the loop as primary or reviewer.

Performance Expectations (CPU-Only)

On a modern laptop or mini PC with no GPU, expect roughly 5–15 tokens per second from a 7–8B model and noticeably less from a 14B. Workable, but well below cloud-API speeds.

For interactive chat, you'll feel the slowness of CPU inference. For background agents (heartbeat tasks, research scans, log writing), it doesn't matter — the agent runs while you sleep.

🦞 Want This Set Up Properly?

Getting Ollama + OpenClaw hybrid routing configured correctly takes about 30–60 minutes the first time — and most people hit at least one snag with context windows or model routing. ClawReady sets this up as part of our standard setup packages.

Book a Free 15-Min Call →