Gemma 4 (9.6GB) on a CPU-only VPS is entirely doable, but OpenClaw's defaults are tuned for GPU inference or cloud APIs where responses arrive in seconds. On CPU, a single response can take 60-300 seconds. Without specific config changes, OpenClaw will time out, kill the request, and leave you staring at an error.

This guide covers every setting you need to change, in the right order.

Minimum viable hardware: 8GB RAM for Gemma 4 4-bit quantized (Q4_K_M). 16GB recommended. A 4-core VPS will work but expect 2-5 minute response times on complex prompts. This is a cost play, not a speed play.
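
Before installing anything, it's worth confirming the box actually has the memory you're counting on; a one-line check:

# Confirm available memory (look at the "available" column)
free -h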

Step 1 - Install Ollama and Pull Gemma 4

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull Gemma 4 (4-bit quantized - fits in 8GB RAM)
ollama pull gemma4:9b-instruct-q4_K_M

# Verify it loads (this will take a few minutes first run)
ollama run gemma4:9b-instruct-q4_K_M "say hello"

Wait for the model to fully load before touching OpenClaw config. If ollama run works, you know the model is functional on your hardware.
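
It also helps to time a raw generation against Ollama's API before moving on, so you know your hardware's real latency floor; the numbers feed directly into the timeout values in Step 2. This uses Ollama's standard /api/generate endpoint:

# Baseline: time a full non-streamed generation
time curl -s http://127.0.0.1:11434/api/generate \
  -d '{"model": "gemma4:9b-instruct-q4_K_M", "prompt": "say hello", "stream": false}'

The response JSON also includes a total_duration field (in nanoseconds), which separates model time from network overhead.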

Step 2 - The Critical Timeout Config

This is the #1 reason CPU setups fail. OpenClaw's default provider timeout is 60-120 seconds: fine for GPU, fatal for CPU inference on a 9.6GB model.

Edit ~/.openclaw/openclaw.json and add or update the model provider section:

{
  "models": {
    "default": "ollama/gemma4:9b-instruct-q4_K_M"
  },
  "providers": {
    "ollama": {
      "baseUrl": "http://127.0.0.1:11434",
      "timeout": 600,
      "firstTokenTimeout": 300,
      "streamTimeout": 600
    }
  }
}

Key values:

- timeout: 600 - overall request timeout (values here are in seconds; 600 = 10 minutes).
- firstTokenTimeout: 300 - how long OpenClaw waits for the first token (5 minutes). On CPU, all prompt processing happens before that token appears.
- streamTimeout: 600 - ceiling for the complete streamed response.

If you're seeing "LLM timeout" errors: it's almost always firstTokenTimeout that's too low. The model is thinking; OpenClaw just doesn't know that and gives up.
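
After hand-editing the config, do a quick syntax check; a malformed openclaw.json is easy to produce and painful to diagnose. Assuming jq is installed:

# Parses and pretty-prints the config, or prints a parse error
jq . ~/.openclaw/openclaw.json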

Step 3 - Disable Heartbeat or Set a Long Interval

A heartbeat that fires while Gemma 4 is still mid-inference will stack requests and exhaust your RAM. Either disable it or set a longer interval during initial testing:

{
  "heartbeat": {
    "enabled": true,
    "intervalMs": 1800000,
    "timeoutMs": 540000
  }
}

That's a 30-minute interval with a 9-minute timeout, giving Gemma 4 room to complete a response before the next heartbeat fires.
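
If you'd rather disable the heartbeat entirely during initial testing (the other option mentioned above), the same block presumably just flips the flag:

{
  "heartbeat": {
    "enabled": false
  }
}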

Step 4 - Tune Ollama for Your RAM

Add these environment variables to Ollama's systemd unit or your shell before starting it:

# In /etc/systemd/system/ollama.service under [Service]:
Environment="OLLAMA_NUM_PARALLEL=1"       # one request at a time - no stacking
Environment="OLLAMA_MAX_LOADED_MODELS=1"  # never hold two models in RAM
Environment="OLLAMA_KEEP_ALIVE=30m"       # keep the model loaded between requests

# Then apply the change:
sudo systemctl daemon-reload
sudo systemctl restart ollama
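
To confirm systemd actually picked up the variables after the restart:

# Print the environment systemd passes to the ollama unit
systemctl show ollama --property=Environment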

Step 5 - Reduce Context Window

Gemma 4 supports a large context window, but on CPU the full context is brutal on memory and inference time. Unless you need deep memory, cap it:

{
  "providers": {
    "ollama": {
      "options": {
        "num_ctx": 8192,
        "num_thread": 4
      }
    }
  }
}

Set num_thread to your VPS vCPU count. A 4-core VPS gets 4. This is one of the biggest performance levers available to you.
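
If you're not sure how many vCPUs your VPS actually exposes, check before setting the value:

# Report the number of CPUs available to this system
nproc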

Step 6 - Slim Down Your SOUL.md

Every heartbeat cycle loads your full workspace context into the prompt. On CPU with a capped context window, a bloated SOUL.md eats into the tokens available for actual responses. Keep SOUL.md under 500 words and trim memory files aggressively.
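
A quick way to keep yourself honest (the path below assumes the default workspace location; adjust to wherever your SOUL.md lives):

# Word count for SOUL.md - aim for under 500
wc -w ~/.openclaw/workspace/SOUL.md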

What to Expect on Different Hardware

VPS Spec        RAM          First Token   Full Response   Verdict
2 vCPU / 8GB    Tight        90-180s       3-8 min         Marginal - works with tuning
4 vCPU / 16GB   Comfortable  45-90s        2-5 min         Good - recommended minimum
8 vCPU / 32GB   Headroom     20-45s        1-3 min         Solid for daily use
Any GPU VPS     Fine         1-3s          5-30s           Much better - consider upgrading

When CPU Doesn't Make Sense

CPU-only Gemma 4 is a cost optimization: you're trading speed for ~$0 inference cost. It makes sense for:

- Background and scheduled work (heartbeat tasks, summaries, periodic automation) where nobody is waiting on the reply
- Personal agents where a few minutes of latency is acceptable
- Budget setups where zero inference cost matters more than responsiveness

It doesn't make sense if your agent handles real-time communications (Telegram, Discord) where a 3-minute response time will drive users away. For that, use a cloud API or at minimum a GPU VPS.

If you want this configured correctly without the trial-and-error, book a ClawReady setup call. We've run OpenClaw on CPU-only hardware and know exactly what combination of settings works.