A post on r/openclaw last week claimed a 95% token reduction using a "workspace compiler" approach. The thread blew up: hundreds of comments, mostly people asking "how?", with very few concrete answers.
We've been using a version of this pattern across our installs for months. Here's the full explanation.
The Problem: Context Bloat
Every time OpenClaw processes a request, it loads your workspace context: SOUL.md, AGENTS.md, TOOLS.md, memory files, and whatever other project context files you have. On a mature workspace, this adds up fast.
The issue isn't that your files are too long; it's that most of the content is:
- Stale decisions that no longer apply
- Duplicate information spread across multiple files
- Verbose prose where a single line of dense structured data would work better
- Completed tasks still sitting in active memory files
- Tool notes and instructions relevant to setup but not to daily operation
The math matters: at Claude Opus 4 pricing (~$15 per million input tokens), 8,000 wasted tokens per request × 100 requests/day = $12/day = $360/month in pure waste. A well-compiled workspace that injects ~400 tokens per request instead cuts that to $18/month.
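The arithmetic above is easy to adapt to your own usage. A minimal sketch, using the article's numbers (the price constant and the function name are ours, not an OpenClaw API):

```python
# Back-of-the-envelope cost of injected workspace context.
# Pricing assumption from the article: ~$15 per million input tokens.
PRICE_PER_INPUT_TOKEN = 15 / 1_000_000  # dollars

def monthly_context_cost(tokens_per_request: int,
                         requests_per_day: int,
                         days: int = 30) -> float:
    """Dollars spent per month just on workspace context."""
    return tokens_per_request * requests_per_day * days * PRICE_PER_INPUT_TOKEN

bloated = monthly_context_cost(8_000, 100)   # ~$360/month
compiled = monthly_context_cost(400, 100)    # ~$18/month
```

Swap in your own per-request context size and request volume to see what compilation would save you.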
What a Workspace Compiler Does
A workspace compiler is a script (or agent task) that periodically reads all your workspace files and produces a single, dense, deduplicated context file, memory/compiled-context.md, optimized for token efficiency.
Think of it like a build step for your agent's brain. The source files stay human-readable and verbose. The compiled output is tightly packed structured data that says the same things in 1/10th the tokens.
Before compilation (SOUL.md excerpt, ~600 tokens):
# SOUL - Identity, tone, and boundaries
## Identity
I am DoIt, Josh's Chief of Staff and sole point of contact.
My primary role is to run a 24/7 autonomous organization focused on
generating growing monthly income through micro-SaaS, crowdfunding,
underserved market opportunities, and online business operations.
## Tone
Friendly and casual; can produce professional documents when needed.
Short replies unless Josh asks for reasoning.
Proactive - suggest actions, don't wait to be told everything.
Think like a business partner, not a chatbot.
[... continues for 200 more lines ...]
After compilation (~40 tokens):
AGENT: DoIt | CEO: Josh | MISSION: grow MRR via micro-SaaS/crowdfunding/services
TONE: casual, short, proactive | ROLE: chief-of-staff, all delegation flows through here
BOUNDARIES: ask before destructive/paid/production actions
The agent gets the same functional information. It just takes 15x fewer tokens to convey it.
How to Implement It
Option 1: Manual compilation (free, takes 30 min once)
Go through each of your workspace files and create a compressed version in memory/compiled-context.md. Rules:
- Every fact goes on one line, structured as KEY: value or comma-separated
- Remove all prose that exists to be human-readable; the agent doesn't need it
- Merge duplicate information from different files into one entry
- Archive completed tasks: move them to memory/archive/, not just strike them through
- Keep only facts the agent needs in every single request. Rarely-needed info stays in the full files and gets loaded on demand.
Then update your OpenClaw config to inject memory/compiled-context.md as the primary workspace file instead of loading all your full files by default.
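The exact config shape depends on your OpenClaw version and install, so treat the keys below as illustrative placeholders for "always inject the compiled file, load everything else on demand", not the real schema:

```yaml
# Hypothetical shape -- check your own OpenClaw config for the real keys
context:
  inject:
    - memory/compiled-context.md   # always loaded
  on_demand:                       # loaded only when the agent asks for it
    - SOUL.md
    - AGENTS.md
    - TOOLS.md
```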
Option 2: Agent-driven compilation (automated)
Add a compilation task to your HEARTBEAT.md that runs weekly:
## Weekly Compilation Task (every Sunday, low-priority)
- Read all files in memory/ and workspace root
- Identify: stale decisions, duplicate facts, completed tasks, verbose prose
- Update memory/compiled-context.md with compressed structured version
- Archive completed items to memory/archive/
- Log token count before/after to memory/heartbeat-log.md
This way your compiled context stays fresh automatically as your workspace evolves.
Option 3: The full compiler script
For advanced users: a Node.js or Python script that reads your workspace, strips comments and prose, deduplicates across files, and outputs a structured context block. We have a template we share with ClawReady clients.
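We can't reproduce the full template here, but the core idea fits in a short sketch. The file names follow the article; the keep/drop heuristic is a deliberately naive placeholder (a real compiler would be smarter about what counts as a fact versus prose):

```python
"""Minimal workspace-compiler sketch: read workspace files, drop
prose-only lines, deduplicate facts across files, return the result."""
from pathlib import Path

SOURCES = ["SOUL.md", "AGENTS.md", "TOOLS.md"]  # root files to compile

def is_fact(line: str) -> bool:
    """Naive heuristic: keep dense structured lines (KEY: value pairs,
    bullets); drop blank lines, headings, and free-form prose."""
    s = line.strip()
    if not s or s.startswith("#"):
        return False
    return ":" in s or s.startswith("-")

def compile_workspace(root: Path) -> str:
    """Compile the workspace under `root` into one dense context string."""
    seen: set[str] = set()
    out: list[str] = []
    files = [root / n for n in SOURCES]
    mem = root / "memory"
    if mem.is_dir():
        files += sorted(mem.glob("*.md"))
    for f in files:
        if not f.is_file() or f.name == "compiled-context.md":
            continue  # skip missing files and the compiler's own output
        for line in f.read_text().splitlines():
            s = line.strip()
            if is_fact(s) and s.lower() not in seen:  # dedupe across files
                seen.add(s.lower())
                out.append(s)
    return "\n".join(out) + "\n"
```

Run it from the workspace root and write the result to memory/compiled-context.md, e.g. `Path("memory/compiled-context.md").write_text(compile_workspace(Path(".")))`.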
What Actually Gets the 95% Reduction
The 95% claim in the Reddit post isn't typical; that user had a very bloated workspace. More realistic expectations:
- Fresh workspace (under 1 month old): 30-50% reduction
- Mature workspace (3-6 months): 60-80% reduction
- Very mature workspace with lots of memory files: 80-95% reduction
The longer you've been using OpenClaw without pruning, the more you gain from compilation.
The Three Files That Bloat the Most
- memory.md: accumulates every decision, task, and note ever recorded. Archive completed items aggressively.
- TOOLS.md: often contains setup instructions that were relevant once but aren't needed in every request. Move setup notes to a separate file that isn't auto-injected.
- heartbeat-log.md: injecting the full log into every context is wasteful. Only inject the last 7 days; archive the rest.
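To find out which files are actually costing you, a rough audit helps. The ~4 characters per token ratio below is a common rule of thumb for English text, not an exact tokenizer count, and the function names are ours:

```python
from pathlib import Path

def rough_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text."""
    return len(text) // 4

def audit(root: str = ".") -> list[tuple[str, int]]:
    """Estimated token cost per injected workspace file, largest first."""
    files = list(Path(root).glob("*.md"))
    mem = Path(root) / "memory"
    if mem.is_dir():
        files += mem.glob("*.md")
    sizes = [(f.name, rough_tokens(f.read_text())) for f in files if f.is_file()]
    return sorted(sizes, key=lambda t: -t[1])
```

Run `audit()` in your workspace root: the top two or three entries are where trimming pays off first.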
Quick win: even without a full compiler, trimming those three files alone typically cuts context size by 40-50%. Start there; it takes 20 minutes.
Is This Worth Doing?
Yes, if:
- You've been using OpenClaw for more than a month
- Your API costs feel higher than they should be
- You notice responses getting slower or less focused over time
- Your workspace has more than 10 active files being injected
Probably not urgent if:
- You're using a local model (token cost = 0)
- Your workspace is less than 2 weeks old
- You're already on a cheap model for most tasks
If you want this done properly as part of a ClawReady setup or audit, it's part of our standard optimization pass. Book a call and we'll look at your specific workspace.