⚙️ Chapter 3 of 7 — Phase 1: Understanding

The Engine Room

Architecture, message flow, and memory


System Architecture — Chapter 3 View

This diagram reveals more of the OpenClaw architecture as you progress through chapters. Elements visible at this stage:

  • Mission Control Dashboard (localhost:3001, live): TRUST Framework skill vetting; ClawHavoc Defense (1,184 threats blocked); Project > Task > Model hierarchy routing 85% of calls to Haiku, 13% to Sonnet, and 2% to Opus.
  • OpenClaw Gateway (port 18789, Node.js v22, 190K+ GitHub stars): Message → Context → LLM → Tools → Response → Memory.
  • Messaging channels: WhatsApp, Telegram, Slack, Discord, Signal, iMessage.
  • Memory system: SOUL.md, USER.md, IDENTITY.md, MEMORY.md.
  • Heartbeat daemon: proactive monitoring every 30 minutes; can offload to Ollama ($0/mo).
  • 5-layer feedback loop: Execute → Observe → Remember → Analyze → Adapt.
  • Cron scheduler: at, every, and cron expressions.
  • Cost: $1,500 → $30/mo (a 97% reduction).
  • Ecosystem: ClawHub (5,700+ skills), MCP (tool protocol), custom skills (about 30 minutes to build).

Chapter 3 of 7 — 43% Architecture Revealed

What You'll Learn

  • Message flow architecture
  • Memory system deep dive
  • Context window management
  • Decision-making process
  • Configuration internals
  • Session state management
In this chapter: 6 sections

The Message Flow Pipeline

Trace the complete end-to-end message processing flow.

Complete Message Cycle: From Incoming Message to Response (2-5 seconds)

1. Context assembly: load SOUL.md, IDENTITY.md, user context, and session logs.
2. LLM call: the assembled context is sent to the chosen model with a reasoning prompt.
3. Response generation: the model outputs text, possibly with tool-call directives.
4. Tool execution: if tools are called, they run sequentially with results fed back.
5. Response formatting: the final text is adapted for the target channel.
6. Memory update: the interaction is logged; significant findings are saved to persistent memory.

Understanding OpenClaw's message flow reveals how it achieves both power and efficiency. When a message arrives (from any channel), the Gateway receives it and initiates the Agent Loop cycle.

First, context assembly: the system loads relevant memory files (SOUL.md, IDENTITY.md, user context, session logs). Second, LLM call: the assembled context goes to the chosen model with a prompt that guides reasoning.

Third, response generation: the model outputs text, which might include tool calls (special directives like [EXEC: send_email]). Fourth, tool execution: if tools were called, they run sequentially with results fed back for analysis. Fifth, response formatting: the final text gets adapted for the target channel.

Finally, memory update: the entire interaction gets logged, with significant findings saved to persistent memory. This complete cycle—from incoming message to outgoing response with learning—typically completes in 2-5 seconds.
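The six-step cycle above can be sketched in miniature. This is an illustrative Python sketch, not OpenClaw's actual code: every name here (StubModel, run_tool, agent_loop) is a stand-in for internals the chapter describes but does not expose.

```python
# Minimal, self-contained sketch of the six-step agent loop.
# All classes and functions are hypothetical stand-ins.

class StubModel:
    """Pretends to be an LLM: the first call emits a tool directive,
    the second call emits the final text."""
    def __init__(self):
        self.calls = 0

    def complete(self, context):
        self.calls += 1
        if self.calls == 1:
            return {"text": "", "tool_calls": [("send_email", "status report")]}
        return {"text": "Email sent.", "tool_calls": []}

def run_tool(name, arg):
    # Stand-in for a real tool (e.g. an [EXEC: send_email] directive).
    return f"[{name} ok: {arg}]"

def agent_loop(message, model, session_log):
    context = ["SOUL.md", "IDENTITY.md"] + session_log    # 1. context assembly
    reply = model.complete(context + [message])           # 2. LLM call
    while reply["tool_calls"]:                            # 3-4. generate + tools
        results = [run_tool(n, a) for n, a in reply["tool_calls"]]
        reply = model.complete(context + [message] + results)
    response = reply["text"].strip()                      # 5. formatting
    session_log.append((message, response))               # 6. memory update
    return response

print(agent_loop("email me a status report", StubModel(), []))  # → Email sent.
```

The tool loop is the key design point: results are fed back into the model until it produces plain text, which is why a single message can take several reasoning cycles.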

1. Receive: the Gateway accepts the message from a user or agent.
2. Route: determine the target agent and context requirements.
3. Enrich: load relevant memory, session state, and tool definitions.
4. Process: the agent reasons, calls tools, and generates a response.
5. Store: save memory updates and session state.
6. Return: send the response back through the original channel.

Memory System Deep Dive

Compare caching strategies to optimize memory and token usage.

🧠 Caching Strategy: What to Cache vs. Keep Fresh

  • ⚡ Cache aggressively (SOUL.md, IDENTITY.md): rarely change; reload only on startup. Saves 60-75% of tokens.
  • 🔄 Always fresh (USER.md, logs): change per interaction; must be read each cycle for accuracy.
  • 📊 Summarize (MEMORY.md): grows over time; periodically compress old entries.

The memory system's efficiency comes from understanding which files to cache and which to keep fresh. SOUL.md should be cached aggressively—it rarely changes, and including it in every context assembly is redundant. IDENTITY.md is similarly stable.

USER.md entries for the current user should always be fresh (to catch preference changes), but historical users' files can be summarized. Daily logs should be read fresh each cycle to capture the current session's context.

MEMORY.md grows over time and might accumulate outdated information, so it's worth periodically summarizing (rolling old entries into a compressed summary). The HEARTBEAT.md file is small and should be fresh.

Most OpenClaw deployments cache aggressively at the file level, re-reading SOUL.md and IDENTITY.md only on startup, while treating USER.md and logs as session-fresh. This strategy typically reduces token consumption by 60-75% compared to naïve approaches that load everything on each call.
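That file-level policy is simple enough to sketch. The MemoryLoader wrapper below is hypothetical, not an OpenClaw API: stable files are read once and cached until restart, while everything else is re-read on each cycle.

```python
# Sketch of the caching policy described above. Policy sets and the
# MemoryLoader class are illustrative, not OpenClaw's actual code.

CACHE_FOREVER = {"SOUL.md", "IDENTITY.md"}   # reload only on startup
# Everything else (USER.md, daily logs, HEARTBEAT.md) is read fresh.

class MemoryLoader:
    def __init__(self, read_file):
        self.read_file = read_file   # injected I/O function
        self.cache = {}

    def load(self, name):
        if name in CACHE_FOREVER:
            if name not in self.cache:
                self.cache[name] = self.read_file(name)
            return self.cache[name]
        return self.read_file(name)  # fresh read every cycle

# Demo with a counting reader instead of real disk I/O:
reads = {}
def fake_read(name):
    reads[name] = reads.get(name, 0) + 1
    return f"<contents of {name}>"

loader = MemoryLoader(fake_read)
for _ in range(3):                   # simulate three agent-loop cycles
    loader.load("SOUL.md")
    loader.load("USER.md")

print(reads)   # SOUL.md read once; USER.md read every cycle
```

Injecting the reader function keeps the policy testable and makes the cache boundary explicit: the savings come entirely from which set a file lands in.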

Memory Deep Dive

An exploration of how OpenClaw's memory layers interact: short-term context windows for active reasoning, long-term stores for facts and history, and sliding windows to manage token consumption efficiently.

Session Management and Context Windows

OpenClaw manages sessions with awareness of the context window problem. Each messaging platform maintains a session—a logical continuity within a channel. When you message your agent on Slack, you're in a Slack session.

When you message it on WhatsApp, that's a separate session with its own daily logs and context. The challenge emerges from how messaging platforms handle history.

Many platforms (notably WhatsApp, which resends the entire chat history on reconnection) can cause a "session history cost bomb": if your agent and user exchange 100 messages, and WhatsApp resends all 100 on the next check, you pay token costs to re-process every one of them.

OpenClaw mitigates this by batching platform requests (checking for new messages once per heartbeat instead of continuously) and by keeping daily logs separate from persistent memory (only summaries and important findings persist).

The assembled prompt sent to the model typically hovers between 2,000 and 8,000 tokens, depending on the model and session history length. Note that this is the size of the assembled prompt per interaction, not the model's maximum capacity.
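One complementary way to defuse the cost bomb, sketched below, is to deduplicate resent messages by ID so only genuinely new ones reach the reasoning loop. This is an illustration of the idea, not necessarily how OpenClaw implements its batching.

```python
# Sketch: filter a (possibly resent) batch down to unseen messages,
# so a platform that replays history doesn't trigger re-processing.
# Message shape ({"id": ..., "text": ...}) is illustrative.

def new_messages(incoming, seen_ids):
    """Return only messages whose IDs haven't been processed yet."""
    fresh = [m for m in incoming if m["id"] not in seen_ids]
    seen_ids.update(m["id"] for m in fresh)
    return fresh

seen = set()
batch1 = [{"id": 1, "text": "hi"}, {"id": 2, "text": "status?"}]
# The platform reconnects and resends history plus one new message:
batch2 = batch1 + [{"id": 3, "text": "thanks"}]

print(len(new_messages(batch1, seen)))   # 2: first delivery
print(len(new_messages(batch2, seen)))   # 1: only the new message
```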

⚠️ Context Window Limits

Larger context windows give better reasoning but cost more tokens. Monitor your token burn rate. Use summarization and chunking to stay within budget without losing critical context.
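A simple form of the chunking the warning recommends is a sliding window: keep the newest messages that fit a token budget and fold older ones into a summary stub. The sketch below is illustrative; real deployments would use a proper tokenizer and an LLM-generated summary rather than a word count and a placeholder.

```python
# Sketch of a sliding-window trim to stay within a token budget.
# rough_tokens is a word-count stand-in for a real tokenizer.

def rough_tokens(text):
    return len(text.split())

def trim_to_budget(messages, budget):
    kept, used = [], 0
    for msg in reversed(messages):      # walk newest-first
        cost = rough_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    dropped = len(messages) - len(kept)
    if dropped:
        # Stand-in for a compressed summary of the trimmed history.
        kept.insert(0, f"[summary of {dropped} earlier messages]")
    return kept

history = ["one two three", "four five", "six seven eight nine", "ten"]
print(trim_to_budget(history, budget=5))
```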

Kill Words and Session Clearing

Follow the session reset process triggered by kill words.

Kill Word Flow: What Happens When a Kill Word Is Detected

1. User says a kill word: 'forget everything', 'reset session', 'new conversation'.
2. Agent Loop detects it: matches against the configured kill-word list.
3. Session logs cleared: today's daily logs are truncated (1-2 seconds).
4. Persistent memory preserved: MEMORY.md and USER.md preferences remain intact.
5. Fresh session begins: the agent starts with a clean short-term context.

Sometimes you need to reset an agent's session—clear its short-term memory and start fresh. OpenClaw provides 'kill words': specific phrases that trigger session clearing.

You might configure 'forget everything', 'reset session', or 'new conversation' as kill words. When the Agent Loop detects a kill word, it clears the current session's daily logs but preserves persistent memory (MEMORY.md, USER.md preferences).

This is useful when conversations go off-track, when you want to test fresh behavior, or when a user explicitly requests a fresh start. Kill words are channel-specific—you can configure different ones for Slack vs WhatsApp. They're a graceful alternative to restarting the entire agent. Kill words typically execute within 1-2 seconds since they just truncate a file rather than running a new reasoning cycle.
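The whole mechanism fits in a few lines. The sketch below assumes a hypothetical configuration shape; OpenClaw's actual kill-word config may differ, but the behavior matches the flow above: matching clears the session log and leaves persistent memory untouched.

```python
# Sketch of kill-word handling: clear short-term session logs,
# preserve persistent memory. Configuration shape is illustrative.

KILL_WORDS = {"forget everything", "reset session", "new conversation"}

def handle_message(text, session_log, persistent_memory):
    if text.strip().lower() in KILL_WORDS:
        session_log.clear()          # truncate today's daily log
        return "Session cleared."    # persistent_memory untouched
    session_log.append(text)         # normal path: log and continue
    return None

log = ["earlier chat"]
memory = {"USER.md": "prefers short replies"}
print(handle_message("forget everything", log, memory))  # Session cleared.
print(log, memory)   # log emptied, memory intact
```

Because this is a file truncation rather than a reasoning cycle, it explains the 1-2 second execution time noted above.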

🧠 Clean Shutdown

Kill words trigger immediate session cleanup: short-term session logs are cleared while persistent memory and user preferences stay intact. Use them to end a conversation gracefully instead of restarting the entire agent.

In OpenClaw, the Gateway is your fortress. Everything passes through it. Everything is logged.

Security Basics and Gateway Binding

Verify your gateway security configuration meets critical standards.

🔒 Gateway Security Check: Is Your Gateway Secure?

Secure configuration:

  • Bind to 127.0.0.1 only
  • Use a reverse proxy (nginx)
  • Enable HTTPS via certbot
  • Set authentication tokens

Red flags:

  • Binding to 0.0.0.0
  • No reverse proxy

A critical security consideration: the Gateway must bind to localhost (127.0.0.1) exclusively. Binding to 0.0.0.0 (all interfaces) exposes your agent to the internet without authentication.

Kaspersky researchers found over 30,000 exposed OpenClaw instances—almost all were bound to 0.0.0.0. This remains a leading vulnerability.

Since v2026.1.29, OpenClaw requires authentication tokens for all Gateway access, but binding to 127.0.0.1 and using a reverse proxy (nginx or similar) for external access is still the secure pattern. The Gateway should only be directly accessible from the same machine or a trusted internal network. Any external access should flow through a proxy that enforces authentication. This simple principle—localhost binding + authenticated proxy—eliminates the vast majority of attack surface.
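A startup guard for the localhost-binding rule is easy to express. This is an illustrative check, not OpenClaw's actual validation logic or configuration format:

```python
# Sketch of a startup guard enforcing localhost-only binding.
# Messages and host sets are illustrative.

PRIVATE_HOSTS = {"127.0.0.1", "localhost", "::1"}

def check_gateway_binding(host):
    if host in PRIVATE_HOSTS:
        return "ok"
    if host == "0.0.0.0":
        return "refuse: binds every interface; put an authenticated proxy in front"
    return "warn: non-localhost bind; ensure this interface is trusted"

print(check_gateway_binding("127.0.0.1"))   # ok
print(check_gateway_binding("0.0.0.0"))     # refuse: ...
```

Refusing to start on 0.0.0.0 (rather than merely warning) is the design choice that would have prevented the exposed-instance problem described above.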

Performance Optimization Strategies

See how multiple optimizations compound to reduce operating costs.

💰 Cumulative Cost Impact: Watch Each Optimization Stack

Baseline: $1,500/mo

1. Batch tool calls (group into a single cycle): -20%
2. Prompt caching on stable context: -90%
3. Model routing (85% Haiku): -85%
4. Request debouncing: -15%
5. Archive old daily logs to summaries: -40%
6. Full optimization suite: ≈ $80/mo

The Engine Room demands efficiency to stay cost-effective. Beyond memory caching, several strategies optimize performance.

First, batch tool calls: instead of making three API requests in sequence (which requires three reasoning cycles), group them and execute once. Second, use prompt caching when the same context appears repeatedly (we'll cover this in Chapter 5).

Third, configure appropriate model selection per task—never use Opus when Haiku suffices. Fourth, implement request debouncing for high-frequency messages (don't process every keystroke-indicator as a new message).

Fifth, regularly archive old daily logs into summaries, preventing them from bloating context windows. A well-optimized OpenClaw instance can handle hundreds of concurrent conversations while spending under $50/month on model tokens. Inefficient setups with the same throughput might spend $1,500+.
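Model routing, the single biggest lever above, comes down to picking the cheapest model a task plausibly needs. The heuristics below are illustrative stand-ins, not OpenClaw's actual routing rules:

```python
# Sketch of per-task model routing toward the 85/13/2 Haiku/Sonnet/Opus
# split shown in the architecture diagram. Thresholds are illustrative.

def route_model(task):
    """Pick the cheapest model that plausibly covers the task."""
    if task.get("needs_deep_reasoning"):
        return "opus"        # rare: ~2% of traffic
    if task.get("multi_step") or len(task["text"]) > 2000:
        return "sonnet"      # moderate complexity: ~13%
    return "haiku"           # cheap default: ~85%

tasks = [
    {"text": "what's on my calendar today?"},
    {"text": "draft a rollout plan", "multi_step": True},
    {"text": "audit this contract", "needs_deep_reasoning": True},
]
print([route_model(t) for t in tasks])   # ['haiku', 'sonnet', 'opus']
```

The default-to-cheapest ordering matters: a router that falls through to the expensive model on ambiguity quietly erases the savings.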
