Cost and Tools
Optimization strategies and tool ecosystem
Video Lesson Coming Soon
A video walkthrough for this chapter is in production. For now, dive into the written content below.
System Architecture — Chapter 5 View
This diagram reveals more of the OpenClaw architecture as you progress through chapters.
What You'll Learn
- ✓ Token cost breakdown
- ✓ Cost optimization strategies
- ✓ Tool ecosystem overview
- ✓ MCP integration
- ✓ Third-party tools
- ✓ Custom tool development
In this chapter: 8 sections
The Cost Problem and Optimization Landscape
See the cost difference between naive and optimized agent systems.
The harsh reality: unoptimized AI agent deployments cost $1,500-$3,000+ monthly with modest usage. Why?
Cloud LLMs charge per token, and naive setups send everything to expensive models. A single customer conversation might involve 5-10 LLM calls, each consuming thousands of tokens. Multiply by hundreds of users and costs explode.
OpenClaw's genius is making costs reasonable through optimization. The same system that costs $1,500/month unoptimized can cost $30-50/month optimized.
This isn't about cutting corners—it's about routing each task to the right tool. The difference between cost-optimized and cost-naive deployments is typically a 30-50x factor. Getting this right is not optional if you're building production systems.
The good news: the optimizations are learnable and follow predictable patterns. You control the knobs that dramatically affect costs.
Long-running agents with large context windows can cost $10-100 per session. Strategic routing, caching, and compression turn this into $1-10. Small changes compound.
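A quick back-of-envelope check shows how a naive deployment reaches the ~$1,500/month figure. Every number in this sketch is an illustrative assumption, not measured data:

```python
# Illustrative cost arithmetic for an unoptimized deployment.
# All figures below are assumptions chosen to match the chapter's
# ballpark: 5-10 LLM calls per conversation, thousands of tokens each.
calls_per_conversation = 8
tokens_per_call = 4_000
conversations_per_month = 10_000
price_per_million_tokens = 5.0   # rough blended frontier-model rate (USD)

total_tokens = calls_per_conversation * tokens_per_call * conversations_per_month
monthly_cost = total_tokens / 1_000_000 * price_per_million_tokens
print(round(monthly_cost))  # → 1600
```

Three hundred twenty million tokens a month at frontier-model prices lands squarely in the $1,500+ range the chapter describes.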
The 85/15 Routing Strategy
Explore smart task routing to match complexity with model capability.
The cornerstone cost optimization is the 85/15 routing strategy (sometimes 85/13/2). Send 85% of requests to Haiku (the cheapest model), 13% to Sonnet (mid-tier), and 2% to Opus (premium). This distribution works because most tasks don't need premium reasoning.
Haiku handles straightforward classification, summarization, formatting, and simple questions beautifully. Sonnet tackles nuanced decisions, complex analysis, and creative tasks. Opus handles only the most critical decisions and novel problems.
The distribution isn't arbitrary—it's based on real-world task distributions. In most domains, 85% of decisions are routine, 13% are moderately complex, and 2% are genuinely hard.
You determine routing by defining decision criteria: if the message is purely factual lookup, use Haiku; if it requires judgment across multiple factors, use Sonnet; if it's a critical business decision, use Opus. Implementation involves writing routing rules in your Task configuration. Once routing is correct, you've solved 80% of the cost problem.
Route 85% of requests to cost-efficient models and 15% to expensive frontier models. This maximizes throughput while concentrating compute on tasks that truly need it.
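The decision criteria above can be expressed as a small routing function. This is a hypothetical sketch, not OpenClaw's actual Task configuration schema; the task-type names and the `is_critical` flag are illustrative:

```python
# Hypothetical 85/13/2 routing rules. The task types and flag names
# are assumptions for illustration, not OpenClaw's real config format.
def route_model(task_type: str, is_critical: bool = False) -> str:
    """Pick the cheapest model that can handle the task."""
    if is_critical:
        return "opus"    # ~2% of traffic: critical business decisions
    if task_type in {"lookup", "classify", "summarize", "format"}:
        return "haiku"   # ~85% of traffic: routine work
    return "sonnet"      # ~13% of traffic: judgment across multiple factors

assert route_model("lookup") == "haiku"
assert route_model("analysis") == "sonnet"
```

The key design choice is defaulting unknown tasks to the mid-tier model rather than the premium one, so only explicitly flagged requests reach Opus.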
Model Pricing Comparison
| Model | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) | Best For |
|---|---|---|---|
| Claude Haiku | $0.25 | $1.25 | Routine classification, formatting, simple QA |
| Claude Sonnet | $3.00 | $15.00 | Complex analysis, nuanced reasoning, creativity |
| Claude Opus | $15.00 | $75.00 | Critical decisions, novel problems, edge cases |
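Combining the pricing table with the 85/13/2 split shows why routing dominates the cost equation. This sketch computes the blended input cost per million tokens:

```python
# Blended per-million-token input cost of the 85/13/2 split,
# using the input prices from the table above (USD per 1M tokens).
split = {"haiku": 0.85, "sonnet": 0.13, "opus": 0.02}
input_price = {"haiku": 0.25, "sonnet": 3.00, "opus": 15.00}

blended = sum(split[m] * input_price[m] for m in split)
print(round(blended, 4))                          # → 0.9025
print(round(input_price["opus"] / blended, 1))    # → 16.6 (vs. all-Opus)
```

A routed system pays about $0.90 per million input tokens versus $15.00 for all-Opus, a roughly 16x reduction on input costs alone before any caching or calibration.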
Token Calibration: Estimate → Run → Compare → Adjust
Follow the iterative calibration cycle for token optimization.
Even with good routing, you need to calibrate how much context you're using per request. The calibration cycle follows four steps.
First, estimate: calculate how many tokens your context will consume (count words and multiply by ~1.3 for token ratio). Second, run: execute 10-20 real interactions and measure actual token usage. Third, compare: how close was your estimate?
If estimates are 2x actual usage, you're loading too much context; if they're 0.5x actual, you're under-loading. Fourth, adjust: tweak your memory loading strategy based on findings.
Maybe you're loading all of MEMORY.md when you only need the last 3 days; adjusting saves 40% of tokens. Calibration is iterative—run the cycle 3-4 times across different time periods and user types.
A fully calibrated system uses 30-40% fewer tokens than a naive one, directly translating to cost savings. Most teams under-invest in calibration and leave money on the table—it's tedious work, but the ROI is enormous.
1. Estimate: calculate expected token usage based on input size and model selection.
2. Run: execute the task and measure actual token consumption.
3. Compare: analyze the variance between estimate and reality.
4. Adjust: refine your prompts and chunking strategy based on findings.
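The estimate and compare steps can be sketched in a few lines. The ~1.3 words-to-tokens ratio and the 2x/0.5x thresholds come from the text above; the function names themselves are illustrative:

```python
# Sketch of the estimate → run → compare → adjust loop.
# The 1.3 words-to-tokens ratio and the 2x / 0.5x thresholds
# come from the chapter; function names are illustrative.
WORDS_TO_TOKENS = 1.3

def estimate_tokens(text: str) -> int:
    """Rough token estimate: word count times ~1.3."""
    return int(len(text.split()) * WORDS_TO_TOKENS)

def calibration_verdict(estimated: int, actual: int) -> str:
    """Compare an estimate with measured usage and suggest an adjustment."""
    ratio = estimated / actual
    if ratio >= 2.0:
        return "over-loading: trim context"
    if ratio <= 0.5:
        return "under-loading: add context"
    return "calibrated"

assert calibration_verdict(6000, 3000) == "over-loading: trim context"
assert calibration_verdict(1000, 2000) == "under-loading: add context"
```

In practice you would feed `estimate_tokens` your assembled context and compare against the token counts your LLM provider reports per request.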
Prompt Caching: The 90% Discount
See how prompt caching dramatically reduces token costs.
OpenClaw supports prompt caching, a feature that offers 90% discounts on repeated content. When you include the same large context (like your SOUL.md file) in multiple requests, caching saves tokens.
The LLM API caches the first request's full context, and subsequent requests within a cache window (typically 5 minutes) pay only 10% of the token cost for that cached content. Example: Your SOUL.md is 2,000 tokens. The first request costs 2,000 tokens for context.
Identical requests within 5 minutes cost only 200 tokens for that context (90% savings). If you use the same SOUL.md in 50 requests daily, you'd normally pay 100,000 tokens—but with caching you pay ~22,000 tokens. That's 78% rather than 90% because requests arriving after the 5-minute window expires pay full price again to repopulate the cache. Caching works best with stable, large context pieces.
Configuration is simple: mark which memory sections are cacheable (typically SOUL.md, IDENTITY.md, and large reference documents). The savings accumulate quickly and represent some of the easiest cost reductions available.
Cache large system prompts, knowledge bases, or static instructions. Cache hits reduce per-request costs dramatically. Most OpenClaw deployments see 30-50% overall savings.
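The SOUL.md example can be modeled in a few lines. The cache-miss count here is an assumption chosen to match the chapter's ~22,000-token figure, since the actual miss rate depends on how requests cluster within the 5-minute window:

```python
# Rough model of the SOUL.md caching example: cached reads cost 10%
# of the full token price; requests that miss the 5-minute cache
# window pay full price. CACHE_MISSES = 7 is an assumption chosen
# to approximate the chapter's ~22,000-token figure.
CONTEXT_TOKENS = 2_000
REQUESTS_PER_DAY = 50
CACHE_MISSES = 7            # requests arriving after the cache expired
CACHE_READ_DISCOUNT = 0.10  # cached content billed at 10% of full price

naive = REQUESTS_PER_DAY * CONTEXT_TOKENS
cached = (CACHE_MISSES * CONTEXT_TOKENS
          + (REQUESTS_PER_DAY - CACHE_MISSES) * CONTEXT_TOKENS * CACHE_READ_DISCOUNT)
print(naive, int(cached))  # → 100000 22600
```

About 22,600 tokens instead of 100,000: roughly the 78% component savings the text describes.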
Session Initialization and Context Loading
Compare naive and intelligent approaches to session initialization.
How you initialize sessions dramatically affects context size and costs. A naive approach loads all memory files on every request. A smart approach loads only what's needed.
Session initialization typically involves:
- loading SOUL.md and IDENTITY.md (small, cacheable)
- loading the current user's preferences from USER.md
- loading today's logs (but not all historical logs)
- loading a summary of MEMORY.md instead of the full file

This selective loading achieves an 80% reduction in average context size compared to loading everything.
For example, full initialization might be 15,000 tokens, optimized initialization might be 3,000 tokens—5x reduction. The tradeoff is that you sacrifice some context depth for efficiency.
Practically, you retain enough context to serve users well while dramatically reducing costs. Configuration involves defining 'context profiles'—different initialization strategies for different task types.
A quick-answer profile loads minimal context; a complex-reasoning profile loads richer context. Profiling ensures you spend tokens where they matter.
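A context-profile table is conceptually just a mapping from task type to the memory files worth loading. The file names follow the chapter's conventions, but the profile schema itself is an illustrative sketch, not OpenClaw's actual configuration format:

```python
# Hypothetical "context profile" table. File names follow the
# chapter's memory conventions; the schema is illustrative only.
CONTEXT_PROFILES = {
    "quick-answer": ["SOUL.md", "IDENTITY.md"],
    "complex-reasoning": ["SOUL.md", "IDENTITY.md", "USER.md",
                          "logs/today.md", "MEMORY.summary.md"],
}

def files_for(task_type: str) -> list[str]:
    """Pick the memory files to load; fall back to the minimal profile."""
    return CONTEXT_PROFILES.get(task_type, CONTEXT_PROFILES["quick-answer"])

assert files_for("quick-answer") == ["SOUL.md", "IDENTITY.md"]
```

Defaulting unknown task types to the minimal profile keeps the failure mode cheap: an unrecognized request costs less, not more.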
Tools, Skills, and the ClawHub Ecosystem
OpenClaw extends agent capabilities through Skills—reusable functions that agents can invoke. The platform provides built-in skills for common operations (sending messages, fetching URLs, querying databases), and the community has built 5,700+ additional skills published on ClawHub. Skills are typically lightweight—parsing JSON, formatting text, calling APIs.
They execute in milliseconds and cost nothing beyond the LLM tokens spent deciding whether to use them. OpenClaw also integrates with the Model Context Protocol (MCP), an emerging standard for connecting AI models to external tools, which lets you use tools from major platforms (browser control, email, file operations, etc.). The skill ecosystem means you rarely need to build capabilities from scratch.
Before writing custom code, check ClawHub—the skill you need likely exists. However, be cautious: malicious skills exist (discussed in Chapter 6). Prefer bundled skills or well-reviewed community skills.
If the exact skill doesn't exist, building a custom skill takes 20-30 minutes and eliminates supply chain risk. The 5,700+ skills represent genuine leverage—use them to extend your agents without reinventing infrastructure.
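Conceptually, a skill is just a small, fast function plus metadata the LLM uses to decide when to invoke it. This sketch shows the shape only; OpenClaw's real skill API and manifest format may differ, and the function and field names here are invented for illustration:

```python
# Conceptual sketch of a lightweight skill; OpenClaw's actual skill
# API may differ. The function and manifest fields are illustrative.
import json

def parse_order_json(raw: str) -> dict:
    """Skill body: parse an order payload and extract the needed fields."""
    data = json.loads(raw)
    return {"id": data["id"], "total": data.get("total", 0.0)}

# Metadata the agent uses to decide whether to invoke the skill.
SKILL_MANIFEST = {
    "name": "parse_order_json",
    "description": "Parse an order JSON payload into id and total.",
}

assert parse_order_json('{"id": 7, "total": 19.5}') == {"id": 7, "total": 19.5}
```

The skill itself runs locally in milliseconds; the only LLM cost is the tokens spent reading the manifest and choosing to call it.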
Combined Cost Savings Example
Track dramatic cost reductions through comprehensive optimization.
| Optimization | Monthly Message Volume | Without Optimization | With Optimization | Savings |
|---|---|---|---|---|
| 85/15 Routing | 10,000 messages | $1,400 | $280 | 80% |
| Token Calibration | 10,000 messages | $1,400 | $900 | 36% |
| Prompt Caching | 10,000 messages | $1,400 | $1,260 | 10% |
| Local Heartbeat Filtering | 48 daily checks | $150 | $15 | 90% |
| All Optimizations Combined | 10,000 + heartbeat | $1,550 | $50 | 97% |
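The savings percentages in the table can be recomputed directly from the dollar figures, confirming the rows are internally consistent:

```python
# Recompute each savings percentage in the table from its
# before/after dollar figures.
rows = {
    "85/15 Routing":            (1400, 280),
    "Token Calibration":        (1400, 900),
    "Prompt Caching":           (1400, 1260),
    "Heartbeat Filtering":      (150, 15),
    "All Combined":             (1550, 50),
}
for name, (before, after) in rows.items():
    savings = round((1 - after / before) * 100)
    print(f"{name}: {savings}%")  # → 80, 36, 10, 90, 97
```

Note that combined savings (97%) are not the sum of the individual rows: each optimization applies to the spend left over after the previous ones.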