Cost and Tools
Optimization strategies and tool ecosystem
Video Lesson Coming Soon
A video walkthrough for this chapter is in production. For now, dive into the written content below.
System Architecture — Chapter 5 View
This diagram reveals more of the OpenClaw architecture as you progress through chapters.
What You'll Learn
- ✓ Token cost breakdown
- ✓ Cost optimization strategies
- ✓ Tool ecosystem overview
- ✓ MCP integration
- ✓ Third-party tools
- ✓ Custom tool development
In this chapter: 8 sections
The Cost Problem and Optimization Landscape
See the cost difference between naive and optimized agent systems.
The harsh reality: unoptimized AI agent deployments cost $1,500-$3,000+ monthly with modest usage. Why?
Cloud LLMs charge per token, and naive setups send everything to expensive models. A single customer conversation might involve 5-10 LLM calls, each consuming thousands of tokens. Multiply by hundreds of users and costs explode.
OpenClaw's genius is making costs reasonable through optimization. The same system that costs $1,500/month unoptimized can cost $30-50/month optimized.
This isn't about cutting corners—it's about routing each task to the right tool. The difference between cost-optimized and cost-naive deployments is typically a 30-50x factor. Getting this right is not optional if you're building production systems.
The good news: the optimizations are learnable and follow predictable patterns. You control the knobs that dramatically affect costs.
Long-running agents with large context windows can cost $10-100 per session. Strategic routing, caching, and compression turn this into $1-10. Small changes compound.
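A quick back-of-envelope check shows how a naive deployment reaches the ~$1,500/month figure. Every number in this sketch is an illustrative assumption, not measured data:

```python
# Illustrative cost arithmetic for an unoptimized deployment.
# All figures below are assumptions chosen to match the chapter's
# ballpark: 5-10 LLM calls per conversation, thousands of tokens each.
calls_per_conversation = 8
tokens_per_call = 4_000
conversations_per_month = 10_000
price_per_million_tokens = 5.0   # rough blended frontier-model rate (USD)

total_tokens = calls_per_conversation * tokens_per_call * conversations_per_month
monthly_cost = total_tokens / 1_000_000 * price_per_million_tokens
print(round(monthly_cost))  # → 1600
```

Three hundred twenty million tokens a month at frontier-model prices lands squarely in the $1,500+ range the chapter describes.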
The 85/15 Routing Strategy
Explore smart task routing to match complexity with model capability.
The cornerstone cost optimization is the 85/15 routing strategy (sometimes 85/13/2). Send 85% of requests to Haiku (the cheapest model), 13% to Sonnet (mid-tier), and 2% to Opus (premium). This distribution works because most tasks don't need premium reasoning.
Haiku handles straightforward classification, summarization, formatting, and simple questions beautifully. Sonnet tackles nuanced decisions, complex analysis, and creative tasks. Opus handles only the most critical decisions and novel problems.
The distribution isn't arbitrary—it's based on real-world task distributions. In most domains, 85% of decisions are routine, 13% are moderately complex, and 2% are genuinely hard.
You determine routing by defining decision criteria: if the message is purely factual lookup, use Haiku; if it requires judgment across multiple factors, use Sonnet; if it's a critical business decision, use Opus. Implementation involves writing routing rules in your Task configuration. Once routing is correct, you've solved 80% of the cost problem.
Route 85% of requests to cost-efficient models and 15% to expensive frontier models. This maximizes throughput while concentrating compute on tasks that truly need it.
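The decision criteria above can be expressed as a small routing function. This is a hypothetical sketch, not OpenClaw's actual Task configuration schema; the task-type names and the `is_critical` flag are illustrative:

```python
# Hypothetical 85/13/2 routing rules. The task types and flag names
# are assumptions for illustration, not OpenClaw's real config format.
def route_model(task_type: str, is_critical: bool = False) -> str:
    """Pick the cheapest model that can handle the task."""
    if is_critical:
        return "opus"    # ~2% of traffic: critical business decisions
    if task_type in {"lookup", "classify", "summarize", "format"}:
        return "haiku"   # ~85% of traffic: routine work
    return "sonnet"      # ~13% of traffic: judgment across multiple factors

assert route_model("lookup") == "haiku"
assert route_model("analysis") == "sonnet"
```

The key design choice is defaulting unknown tasks to the mid-tier model rather than the premium one, so only explicitly flagged requests reach Opus.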
Model Pricing Comparison
| Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Best For |
|---|---|---|---|
| Claude Haiku | $0.25 | $1.25 | Routine classification, formatting, simple QA |
| Claude Sonnet | $3.00 | $15.00 | Complex analysis, nuanced reasoning, creativity |
| Claude Opus | $15.00 | $75.00 | Critical decisions, novel problems, edge cases |
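Combining the pricing table with the 85/13/2 split shows why routing dominates the cost equation. This sketch computes the blended input cost per million tokens:

```python
# Blended per-million-token input cost of the 85/13/2 split,
# using the input prices from the table above (USD per 1M tokens).
split = {"haiku": 0.85, "sonnet": 0.13, "opus": 0.02}
input_price = {"haiku": 0.25, "sonnet": 3.00, "opus": 15.00}

blended = sum(split[m] * input_price[m] for m in split)
print(round(blended, 4))                          # → 0.9025
print(round(input_price["opus"] / blended, 1))    # → 16.6 (vs. all-Opus)
```

A routed system pays about $0.90 per million input tokens versus $15.00 for all-Opus, a roughly 16x reduction on input costs alone before any caching or calibration.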
Token Calibration: Estimate → Run → Compare → Adjust
Follow the iterative calibration cycle for token optimization.
Even with good routing, you need to calibrate how much context you're using per request. The calibration cycle follows four steps.
First, estimate: calculate how many tokens your context will consume (count words and multiply by ~1.3 for token ratio). Second, run: execute 10-20 real interactions and measure actual token usage. Third, compare: how close was your estimate?
If estimates are 2x actual usage, you're loading too much context; if they're 0.5x actual, you're under-loading. Fourth, adjust: tweak your memory loading strategy based on findings.
Maybe you're loading all of MEMORY.md when you only need the last 3 days; adjusting saves 40% of tokens. Calibration is iterative—run the cycle 3-4 times across different time periods and user types.
A fully calibrated system uses 30-40% fewer tokens than a naive one, directly translating to cost savings. Most teams under-invest in calibration and leave money on the table—it's tedious work, but the ROI is enormous.
1. Estimate: calculate expected token usage based on input size and model selection.
2. Run: execute the task and measure actual token consumption.
3. Compare: analyze the variance between estimate and reality.
4. Adjust: refine your prompts and chunking strategy based on findings.
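The estimate and compare steps can be sketched in a few lines. The ~1.3 words-to-tokens ratio and the 2x/0.5x thresholds come from the text above; the function names themselves are illustrative:

```python
# Sketch of the estimate → run → compare → adjust loop.
# The 1.3 words-to-tokens ratio and the 2x / 0.5x thresholds
# come from the chapter; function names are illustrative.
WORDS_TO_TOKENS = 1.3

def estimate_tokens(text: str) -> int:
    """Rough token estimate: word count times ~1.3."""
    return int(len(text.split()) * WORDS_TO_TOKENS)

def calibration_verdict(estimated: int, actual: int) -> str:
    """Compare an estimate with measured usage and suggest an adjustment."""
    ratio = estimated / actual
    if ratio >= 2.0:
        return "over-loading: trim context"
    if ratio <= 0.5:
        return "under-loading: add context"
    return "calibrated"

assert calibration_verdict(6000, 3000) == "over-loading: trim context"
assert calibration_verdict(1000, 2000) == "under-loading: add context"
```

In practice you would feed `estimate_tokens` your assembled context and compare against the token counts your LLM provider reports per request.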
Prompt Caching: The 90% Discount
See how prompt caching dramatically reduces token costs.
OpenClaw supports prompt caching, a feature that offers 90% discounts on repeated content. When you include the same large context (like your SOUL.md file) in multiple requests, caching saves tokens.
The LLM API caches the first request's full context, and subsequent requests within a cache window (typically 5 minutes) pay only 10% of the token cost for that cached content. Example: Your SOUL.md is 2,000 tokens. The first request costs 2,000 tokens for context.
Identical requests within 5 minutes cost only 200 tokens for that context (90% savings). If you use the same SOUL.md in 50 requests daily, you'd normally pay 100,000 tokens—but with caching you pay ~22,000 tokens. That's 78% rather than 90% because requests arriving after the 5-minute window expires pay full price again to repopulate the cache. Caching works best with stable, large context pieces.
Configuration is simple: mark which memory sections are cacheable (typically SOUL.md, IDENTITY.md, and large reference documents). The savings accumulate quickly and represent some of the easiest cost reductions available.
Cache large system prompts, knowledge bases, or static instructions. Cache hits reduce per-request costs dramatically. Most OpenClaw deployments see 30-50% overall savings.
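The SOUL.md example can be modeled in a few lines. The cache-miss count here is an assumption chosen to match the chapter's ~22,000-token figure, since the actual miss rate depends on how requests cluster within the 5-minute window:

```python
# Rough model of the SOUL.md caching example: cached reads cost 10%
# of the full token price; requests that miss the 5-minute cache
# window pay full price. CACHE_MISSES = 7 is an assumption chosen
# to approximate the chapter's ~22,000-token figure.
CONTEXT_TOKENS = 2_000
REQUESTS_PER_DAY = 50
CACHE_MISSES = 7            # requests arriving after the cache expired
CACHE_READ_DISCOUNT = 0.10  # cached content billed at 10% of full price

naive = REQUESTS_PER_DAY * CONTEXT_TOKENS
cached = (CACHE_MISSES * CONTEXT_TOKENS
          + (REQUESTS_PER_DAY - CACHE_MISSES) * CONTEXT_TOKENS * CACHE_READ_DISCOUNT)
print(naive, int(cached))  # → 100000 22600
```

About 22,600 tokens instead of 100,000: roughly the 78% component savings the text describes.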
Session Initialization and Context Loading
Compare naive and intelligent approaches to session initialization.
How you initialize sessions dramatically affects context size and costs. A naive approach loads all memory files on every request. A smart approach loads only what's needed.
Session initialization typically involves:
- loading SOUL.md and IDENTITY.md (small, cacheable)
- loading the current user's preferences from USER.md
- loading today's logs (but not all historical logs)
- loading a summary of MEMORY.md instead of the full file

This selective loading achieves an 80% reduction in average context size compared to loading everything.
For example, full initialization might be 15,000 tokens, optimized initialization might be 3,000 tokens—5x reduction. The tradeoff is that you sacrifice some context depth for efficiency.
Practically, you retain enough context to serve users well while dramatically reducing costs. Configuration involves defining 'context profiles'—different initialization strategies for different task types.
A quick-answer profile loads minimal context; a complex-reasoning profile loads richer context. Profiling ensures you spend tokens where they matter.
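A context-profile table is conceptually just a mapping from task type to the memory files worth loading. The file names follow the chapter's conventions, but the profile schema itself is an illustrative sketch, not OpenClaw's actual configuration format:

```python
# Hypothetical "context profile" table. File names follow the
# chapter's memory conventions; the schema is illustrative only.
CONTEXT_PROFILES = {
    "quick-answer": ["SOUL.md", "IDENTITY.md"],
    "complex-reasoning": ["SOUL.md", "IDENTITY.md", "USER.md",
                          "logs/today.md", "MEMORY.summary.md"],
}

def files_for(task_type: str) -> list[str]:
    """Pick the memory files to load; fall back to the minimal profile."""
    return CONTEXT_PROFILES.get(task_type, CONTEXT_PROFILES["quick-answer"])

assert files_for("quick-answer") == ["SOUL.md", "IDENTITY.md"]
```

Defaulting unknown task types to the minimal profile keeps the failure mode cheap: an unrecognized request costs less, not more.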
Tools, Skills, and the ClawHub Ecosystem
OpenClaw extends agent capabilities through Skills—reusable functions that agents can invoke. The platform provides built-in skills for common operations (sending messages, fetching URLs, querying databases), and the community has built 5,700+ additional skills published on ClawHub. Skills are typically lightweight—parsing JSON, formatting text, calling APIs.
They execute in milliseconds and cost nothing beyond the LLM tokens spent deciding whether to use them. OpenClaw also integrates with the Model Context Protocol (MCP), an emerging standard for connecting AI models to external tools, which lets you use tools from major platforms (browser control, email, file operations, etc.). The skill ecosystem means you rarely need to build capabilities from scratch.
Before writing custom code, check ClawHub—the skill you need likely exists. However, be cautious: malicious skills exist (discussed in Chapter 6). Prefer bundled skills or well-reviewed community skills.
If the exact skill doesn't exist, building a custom skill takes 20-30 minutes and eliminates supply chain risk. The 5,700+ skills represent genuine leverage—use them to extend your agents without reinventing infrastructure.
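Conceptually, a skill is just a small, fast function plus metadata the LLM uses to decide when to invoke it. This sketch shows the shape only; OpenClaw's real skill API and manifest format may differ, and the function and field names here are invented for illustration:

```python
# Conceptual sketch of a lightweight skill; OpenClaw's actual skill
# API may differ. The function and manifest fields are illustrative.
import json

def parse_order_json(raw: str) -> dict:
    """Skill body: parse an order payload and extract the needed fields."""
    data = json.loads(raw)
    return {"id": data["id"], "total": data.get("total", 0.0)}

# Metadata the agent uses to decide whether to invoke the skill.
SKILL_MANIFEST = {
    "name": "parse_order_json",
    "description": "Parse an order JSON payload into id and total.",
}

assert parse_order_json('{"id": 7, "total": 19.5}') == {"id": 7, "total": 19.5}
```

The skill itself runs locally in milliseconds; the only LLM cost is the tokens spent reading the manifest and choosing to call it.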
Combined Cost Savings Example
Track dramatic cost reductions through comprehensive optimization.
| Optimization | Monthly Message Volume | Without Optimization | With Optimization | Savings |
|---|---|---|---|---|
| 85/15 Routing | 10,000 messages | $1,400 | $280 | 80% |
| Token Calibration | 10,000 messages | $1,400 | $900 | 36% |
| Prompt Caching | 10,000 messages | $1,400 | $1,260 | 10% |
| Local Heartbeat Filtering | 48 daily checks | $150 | $15 | 90% |
| All Optimizations Combined | 10,000 + heartbeat | $1,550 | $50 | 97% |
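The savings percentages in the table can be recomputed directly from the dollar figures, confirming the rows are internally consistent:

```python
# Recompute each savings percentage in the table from its
# before/after dollar figures.
rows = {
    "85/15 Routing":            (1400, 280),
    "Token Calibration":        (1400, 900),
    "Prompt Caching":           (1400, 1260),
    "Heartbeat Filtering":      (150, 15),
    "All Combined":             (1550, 50),
}
for name, (before, after) in rows.items():
    savings = round((1 - after / before) * 100)
    print(f"{name}: {savings}%")  # → 80, 36, 10, 90, 97
```

Note that combined savings (97%) are not the sum of the individual rows: each optimization applies to the spend left over after the previous ones.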