Claude Memory Architecture
Architecture Reference + Field Observations from Production Use
How to read this paper. Two voices run in parallel:
- Anthropic-cited reference — every claim footnoted to official Anthropic documentation (numbered citations [1]–[7]).
- Field observations — the author's lived experience running Claude Code, Obsidian vaults, and multi-agent setups in production. Visually marked as
> **Field observation:**blocks. These are not Anthropic-supported claims; treat as informed practitioner notes.
Executive Summary
Claude Code does not have a single unified "memory." At the fundamental level, there are exactly two memory mechanisms: in-weights (knowledge baked into the model's parameters during training) and in-context (everything in the active context window during a session). Everything else — CLAUDE.md files, MEMORY.md, knowledge graphs, retrieval systems — is a persistence strategy or Second Brain layer (external-queryable storage) that feeds content into one of those two mechanisms at query time. [1]
This paper is a complete reference for the four layers of Claude Code's memory architecture, the boundaries between sessions and subagents, the documented hard limits, and an executable test suite engineers can run on their own installations.
| Layer | What it is | Practical limit |
|---|---|---|
| In-weights | Knowledge baked into model parameters at training. Always present. Cannot be modified at runtime. | Training knowledge cutoff |
| In-context | Everything currently in the active context window: CLAUDE.md, MEMORY.md, conversation, tool outputs, file reads. | Model context window (200K tokens standard; 1M for Sonnet 4.6, Opus 4.6, Opus 4.7) [2] |
| Persistence strategies | CLAUDE.md and MEMORY.md — disk-based files that load into in-context at session start. [1] | ~200 lines / ~25KB; behavior varies by load path — see §2.8 |
| Second Brain layer (external-queryable) | Knowledge graphs, retrieval systems, vector databases, vaults — content lives outside the context window and is queried on demand, with only the result injected back. | No context cost until queried |
Critical principle. Each Claude Code session begins with a fresh context window. [1]
MEMORY.mdis in-context memory that persists to disk between sessions — it is not a separate memory mechanism. The 200-line / 25KB limit onMEMORY.mdis a context window budget limit. Every persistence strategy exists to manage what gets into the context window and when.
Diagram — How the four layers feed the two runtime mechanisms
flowchart LR
IW["In-weights<br/>(parametric)<br/>frozen at training"]:::weights
CMD["CLAUDE.md hierarchy<br/>managed · global · project · subdir"]:::persist
MEM["MEMORY.md<br/>auto memory<br/>first ~200 lines / ~25KB"]:::persist
SB["Second Brain<br/>Obsidian · RAG · Graph<br/>API memory tool"]:::sb
IC(["IN-CONTEXT<br/>active session<br/>volatile · token-bounded<br/>the ONLY place the model<br/>reads at query time"]):::hub
IW ==>|"always present"| IC
CMD ==>|"① startup inject"| IC
MEM ==>|"① startup inject"| IC
SB -.->|"② only the matched excerpt"| IC
classDef weights fill:#1e293b,stroke:#7A82DC,color:#fff
classDef persist fill:#0f172a,stroke:#00D4FF,color:#fff
classDef sb fill:#1e1b4b,stroke:#D21EDC,color:#fff
classDef hub fill:#064e3b,stroke:#10b981,color:#fff,stroke-width:3px
Hub = in-context: the only thing the model actually reads at query time. Color = role: slate = in-weights, blue = persistence (loads at startup, marked ①), purple = Second Brain (queried on demand, marked ②), green = the in-context hub itself. Solid bold arrows = automatic. Dashed arrow = on-demand (Second Brain returns only the matched excerpt, not the whole corpus). Any Second Brain backend fits the same architectural slot.
1. The Two Fundamental Memory Mechanisms
Claude has two and only two memory mechanisms at runtime. All other strategies — persistence layers, external databases, graph indexes — exist to manage what gets into these two mechanisms and when.
1.1 In-Weights Memory (Parametric)
Parametric = stored in the model's parameters (its trained weights — billions of numbers fixed at training time). Think of it as everything the model "just knows" without being told. You cannot edit it, add to it, or wipe it during a conversation.
Knowledge baked into Claude's model weights during training. Not a runtime mechanism. Cannot be modified, appended to, or cleared during a session or across sessions.
| Property | Detail |
|---|---|
| What it contains | Language understanding, reasoning capability, coding knowledge, and world knowledge up to the training cutoff. |
| How it is accessed | Automatically on every response. No explicit retrieval step. |
| Hard boundary | Training knowledge cutoff. Events after this date are unknown unless provided in context. |
| Modifiable at runtime | No. Not by CLAUDE.md, not by conversation, not by any runtime mechanism. |
| Failure mode | Hallucination (model confidently invents a plausible-sounding wrong answer) when asked about post-cutoff events or sparsely-represented information. |
1.2 In-Context Memory (Active Session)
In-context = "in front of the model right now." Everything the model can see this turn — your messages, files it has read, prior turns, system instructions. Wiped clean when the session ends. Think of it as the model's working desk surface.
Context window = the size of that desk, measured in tokens. A token is roughly ¾ of an English word. "200K tokens" ≈ a 500-page book. When the desk is full, older items get summarized or pushed off.
The context window is Claude's working memory. Everything Claude "knows" during a session lives here. Volatile — destroyed at session end.
What lives in the context window at any moment, per Anthropic's published context-window walkthrough: [3]
- The Claude Code system prompt (and any
--append-system-promptcontent) [4] CLAUDE.mdfiles — loaded from disk at session start, injected as a user message after the system prompt [1]- Auto memory
MEMORY.md— loaded from disk at session start (first 200 lines or 25KB, whichever comes first) [1] - MCP tool name manifest [3]
- Skill descriptions [3]
- All conversation turns, tool use, and tool results accumulated this session
- File contents Claude has read during the session
- Command outputs from Bash tool calls
MEMORY.md is in-context memory. It is a file on disk that gets loaded into the context window at session start. The 200-line / 25KB limit is a context window budget limit, not a "memory" limit. [1]
1.3 Context Window Sizes (Current State)
Per the Claude Platform release notes, current context window sizes are: [2]
- 200K tokens — standard for Claude Sonnet 4 and Sonnet 4.5
- 1M tokens — generally available at standard pricing for Sonnet 4.6, Opus 4.6, and Opus 4.7 (no beta header required) [2]
The 1M-token beta header (context-1m-2025-08-07) for Sonnet 4 and Sonnet 4.5 has been retired; requests exceeding 200K on those models now return an error. To use a 1M context window, migrate to a 4.6+ model. [2]
2. Persistence Strategies (Feed Into In-Context)
These are not separate memory mechanisms. They are disk-based persistence layers that load their contents into the context window at session start. The distinction matters: at query time, Claude is reading only from in-weights or in-context. Persistence strategies determine what was placed into in-context before the session began.
2.1 CLAUDE.md Files
Markdown files written by humans and stored on disk. Read at session start, injected into context as a user message after the system prompt. [1]
| Property | Detail |
|---|---|
| Persistence | Files on disk. Survive session end, application restart, and /compact (project-root, global, and MEMORY.md all reload after /compact; subdirectory CLAUDE.md files are not re-injected and reload lazily on next file read in that directory). [1] |
| Delivery | Injected as a user message after the system prompt — not part of the system prompt itself. [1] |
| Loading hierarchy | Managed policy → global user → ancestor directories → project root → project local → project rules. [1] |
| Recommended size | Under 200 lines per file. Longer files consume more context and may reduce adherence. [1] |
After /compact | Project-root CLAUDE.md is re-read from disk and re-injected. Subdirectory files are not re-injected automatically — they reload the next time Claude reads a file in that subdirectory. [1] |
| Failure mode | Probabilistic compliance. Instructions are context, not enforced configuration. Hard enforcement belongs in client-enforced settings or hooks. [1] |
2.2 MEMORY.md (Auto Memory)
Markdown files written by Claude itself during sessions. Stored at ~/.claude/projects/<project>/memory/ where <project> is derived from the git repository root. [1]
| Property | Detail |
|---|---|
| Minimum version | Claude Code v2.1.59 or later. Check with claude --version. [1] |
| Default state | On by default. Toggle via /memory command or autoMemoryEnabled setting. [1] |
| Storage location | ~/.claude/projects/<project>/memory/ — derived from git repository root. All worktrees and subdirectories of the same repo share one auto memory directory. Outside a git repo, the project root is used instead. [1] |
| Custom location | autoMemoryDirectory setting in user or policy settings (not project settings, for security). [1] |
| Documented loading limit | First 200 lines of MEMORY.md OR first 25KB, whichever comes first, loaded at session start. [1] Field observation: in practice the boundary is non-deterministic — sometimes line count, sometimes byte count, sometimes neither cleanly. Treat the docs' threshold as a target, not a contract. |
| Topic files | Files like debugging.md or patterns.md are not loaded at startup. Claude reads them on demand using its standard file tools. [1] |
| What Claude saves | Build commands, debugging insights, architecture notes, code style preferences, workflow habits — Claude decides what is worth remembering based on future utility. [1] |
| Pruning | Claude keeps MEMORY.md concise by moving detailed notes into separate topic files. [1] No specific consolidation command or trigger conditions are documented. |
| Scope | Machine-local. Not shared across machines or cloud environments. [1] |
2.3 CLAUDE.md Loading Hierarchy
All files are concatenated into a single context block — they do not override each other. Within ancestor traversal, files are ordered from filesystem root down to your working directory, so the closest CLAUDE.md to where you launched Claude is read last (and gets highest effective attention weight when conflicts occur). Within each directory, CLAUDE.local.md is appended after CLAUDE.md. [1]
| Scope | Location | Load timing |
|---|---|---|
| Managed policy | macOS: /Library/Application Support/ClaudeCode/CLAUDE.md · Linux/WSL: /etc/claude-code/CLAUDE.md · Windows: C:\Program Files\ClaudeCode\CLAUDE.md [1] | Eager — session start. Cannot be excluded by user or project settings. [1] |
| Global user | ~/.claude/CLAUDE.md | Eager — session start |
| Global user rules | ~/.claude/rules/*.md | Eager — session start. Loaded before project rules; project rules have higher priority. [1] |
| Ancestor directories | CLAUDE.md and CLAUDE.local.md files in parent directories above CWD | Eager — session start, ordered root → CWD [1] |
| Project root | ./CLAUDE.md or ./.claude/CLAUDE.md | Eager — survives /compact re-injection [1] |
| Project local | ./CLAUDE.local.md | Eager — personal, recommended to add to .gitignore [1] |
| Project rules | ./.claude/rules/*.md (no paths: frontmatter) | Eager — session start [1] |
| Subdirectory files | CLAUDE.md in directories below CWD | Lazy — only loaded when Claude reads a file in that directory [1] |
| Path-scoped rules | ./.claude/rules/*.md with paths: frontmatter | Lazy — only when Claude reads files matching the glob [1] |
flowchart TD
start([Session begins]) --> mp["Managed-policy CLAUDE.md<br/>(cannot be excluded)"]
mp --> gu["Global user<br/>~/.claude/CLAUDE.md<br/>+ ~/.claude/rules/*.md"]
gu --> anc["Ancestor directories<br/>walked root → CWD"]
anc --> proj["Project root<br/>./CLAUDE.md + ./.claude/CLAUDE.md<br/>+ project rules"]
proj --> local["Project local<br/>./CLAUDE.local.md<br/>(gitignored)"]
local --> ready([First user turn — closest file<br/>has highest effective weight])
subdir["Subdirectory CLAUDE.md<br/>e.g. ./src/api/CLAUDE.md"]:::lazy
pathrule["Path-scoped rules<br/>./.claude/rules/*.md with paths:"]:::lazy
ready -.->|"on file read in matching dir"| subdir
ready -.->|"on file read matching glob"| pathrule
classDef lazy fill:#1e1b4b,stroke:#D21EDC,color:#fff,stroke-dasharray:4 3
style mp fill:#450a0a,stroke:#dc2626,color:#fff
Solid path = eager load order at session start, root → closest. Dashed = lazy: only fires when Claude reads a file matching the scope. Closest file loaded last → highest effective attention weight in conflicts.
2.4 AGENTS.md Interoperability
Claude Code reads CLAUDE.md, not AGENTS.md. Repositories that already use AGENTS.md for other coding agents can create a CLAUDE.md that imports it: [1]
@AGENTS.md
## Claude Code
Use plan mode for changes under `src/billing/`.
Claude loads the imported AGENTS.md at session start, then appends any Claude-specific instructions below the import. [1]
2.5 The @import System
@import= "paste this other file in here, automatically." Like#includein C orimportin Python, but for Markdown instruction files. Lets you split big rule sets across files without copy-paste.
flowchart LR
A["CLAUDE.md<br/>hop 0"] -->|"@imports"| B["rules.md<br/>hop 1"]
B -->|"@imports"| C["security.md<br/>hop 2"]
C -->|"@imports"| D["tenancy.md<br/>hop 3"]
D -->|"@imports"| E["audit.md<br/>hop 4"]
E -->|"@imports"| F["deeper.md<br/>hop 5 ✓ allowed"]
F -.->|"@imports"| G["beyond.md<br/>hop 6 ✗ STOP"]:::blocked
classDef blocked fill:#450a0a,stroke:#dc2626,color:#fff
Each
@import= one hop. Maximum depth is 5 hops counted from the rootCLAUDE.md. Behavior beyond hop 5 is not specified in current Anthropic documentation. Relative paths resolve from the file containing the@import. First-time imports trigger an approval dialog.
CLAUDE.md files can import additional files using @path/to/import syntax. Imported files are expanded and loaded into context at launch alongside the CLAUDE.md that references them. Both relative and absolute paths are allowed. Relative paths resolve relative to the file containing the import. Imported files can recursively import other files, with a maximum depth of five hops. [1]
The first time Claude Code encounters external imports in a project, it shows an approval dialog listing the files. If you decline, the imports stay disabled and the dialog does not appear again. [1]
2.6 Block-Level HTML Comments
Block-level HTML comments (<!-- maintainer notes -->) in CLAUDE.md files are stripped before content is injected into Claude's context. Use them to leave notes for human maintainers without spending context tokens. Comments inside code blocks are preserved. When you open a CLAUDE.md file directly with the Read tool, comments remain visible. [1]
2.7 Excluding CLAUDE.md Files in Monorepos
In large monorepos, ancestor CLAUDE.md files may contain instructions that aren't relevant to your work. The claudeMdExcludes setting lets you skip specific files by path or glob pattern: [1]
{
"claudeMdExcludes": [
"**/monorepo/CLAUDE.md",
"/home/user/monorepo/other-team/.claude/rules/**"
]
}
Patterns are matched against absolute file paths using glob syntax. Configurable at any settings layer (user, project, local, or managed policy). Arrays merge across layers. Managed policy CLAUDE.md files cannot be excluded — this ensures organization-wide instructions always apply. [1]
2.8 Hard Limits Summary
| Resource | Type | Threshold | Behavior |
|---|---|---|---|
CLAUDE.md file size | Non-deterministic | ~200 lines (target) | Loads fully but adherence may reduce. No documented truncation. [1] See field-observation note below. |
MEMORY.md at startup | Non-deterministic | ~200 lines OR ~25KB (target) | Content beyond the documented target typically not loaded at session start. [1] See field-observation note below. |
@import chain depth | Hard ceiling | 5 hops | Maximum depth is 5 hops; behavior beyond the limit is not specified in current documentation. [1] |
Subdirectory CLAUDE.md | Conditional | N/A | Only loaded when Claude reads a file in that subdirectory. [1] |
Field observation — there is no formula. The
CLAUDE.mdandMEMORY.mdnumbers above are targets documented by Anthropic, not exact cutoffs enforced by a single deterministic code path. In practice the limit varies by load mechanism (startup injection,/compactreload, hook-driven re-read, subdirectory triggers,--resume) and by content shape: sometimes line count dominates, sometimes byte count, sometimes neither cleanly — at ~250 lines a file may load fine; at ~180 lines a different file may already exhibit degraded adherence. Outcomes range from silent skip → partial load → full load with degraded adherence. Large files also produce context rot (drift in instruction-following) and context poisoning (unrelated content displacing task-relevant signal). Neither failure mode follows an exact line- or byte-count formula. Treat the 200 line / 25KB figures as engineering targets to stay under, not as guaranteed cliffs. There is a known debate in the community on this — the author's position is that the limit is non-deterministic by design.
2.9 What Survives /compact
/compact= "the desk is getting full — summarize the older stuff to make room." Claude rewrites the past chat as a condensed summary, freeing context window space. Some startup content (likeCLAUDE.md) gets reloaded fresh from disk; the live conversation gets squashed into a recap.
Field observation — how compaction actually decides what to keep.
There is no public deterministic formula. Compaction is itself an LLM-driven summarization step: Claude reads the existing conversation and writes a condensed structured summary that replaces the original turns. The harness then reloads startup content (most
CLAUDE.mdfiles,MEMORY.md) fresh from disk on top of that summary. [3]What Anthropic has documented about the process: [3]
- Tool outputs go first, before compaction even runs. As the context window fills toward its limit, the harness starts clearing older tool-call outputs (file reads, command output, MCP results) to free space. This is pre-compaction triage, not part of the summary itself. By the time
/compactruns, the oldest tool noise may already be gone.- Conversation is replaced by a structured summary. Older user/assistant turns are condensed into a summary that aims to preserve key decisions, code snippets, and load-bearing context. Detail is lost; gist is kept.
- Startup content reloads automatically after the summary is in place — project-root
CLAUDE.md, globalCLAUDE.md, auto-memoryMEMORY.md(first ~200 lines / ~25KB). TheInstructionsLoadedhook fires withload_reason: compactso you can audit the reload. [1, 5]- The skill listing is the one documented exception — it is not reloaded automatically. Claude may not know which skills are available post-compact until they are reintroduced. [3]
- Subdirectory
CLAUDE.mdfiles are not re-injected — they reload only the next time Claude reads a file in that subdirectory. [1]What is not publicly documented:
- The exact summarization prompt the harness uses to drive
/compact.- The token budget allocated to the resulting summary.
- The heuristic for which individual turns survive verbatim vs. get summarized vs. get dropped.
- Whether older tool outputs are summarized into the recap or simply dropped.
Practical implication: anything you must guarantee survives
/compactbelongs inCLAUDE.md(or@importedfrom it), not in the live conversation. Only the startup-reloaded surfaces are deterministic. The summary itself is a best-effort LLM artifact, not a contract.
flowchart TD
fill["Context window approaching limit"] --> triage["Pre-compaction triage:<br/>oldest tool outputs cleared first"]
triage --> trigger{"User runs /compact<br/>OR auto-triggered"}
trigger --> summarize["LLM-driven summarization<br/>conversation → structured summary"]
summarize --> reload["Harness reloads startup content"]
reload --> r1["✓ Project-root CLAUDE.md re-read"]
reload --> r2["✓ Global CLAUDE.md re-read"]
reload --> r3["✓ MEMORY.md (first ~200 lines / ~25KB)"]
reload --> r4["✗ Skill listing NOT reloaded<br/>(documented exception)"]
reload --> r5["~ Subdir CLAUDE.md lazy<br/>reloads only on next file read in that dir"]
r1 --> ready["Session continues<br/>InstructionsLoaded fires<br/>load_reason: compact"]
r2 --> ready
r3 --> ready
r4 --> ready
r5 --> ready
Per Anthropic's context-window documentation, /compact replaces the conversation with a structured summary and reloads most startup content automatically. [3] Per-surface behavior:
| Content | /compact behavior |
|---|---|
Project-root CLAUDE.md | Re-read from disk and re-injected. The InstructionsLoaded hook fires with load_reason: compact. [1, 5] |
Global ~/.claude/CLAUDE.md | Reloads as part of startup content reload [3] |
Auto memory MEMORY.md | Reloads as part of startup content reload (first 200 lines or 25KB) [3] |
| Skill listing | Not reloaded automatically — the documented exception. Claude may not know which skills are available until they are reintroduced. [3] |
Subdirectory CLAUDE.md | Not re-injected — reloads the next time Claude reads a file in that subdirectory. [1] |
| Conversation history | Replaced with a structured summary. Detail is lost; key messages and code snippets are preserved. [3] |
| Tool outputs | Older tool outputs are cleared first as the context fills before compaction. [3] |
3. Memory Boundaries — What Crosses and What Doesn't
Every boundary failure in Claude Code architecture comes from misunderstanding what persists across which boundary.
3.1 Session Boundaries
Each Claude Code session begins with a fresh context window. [1]
| What you might expect to cross | Does it cross? |
|---|---|
CLAUDE.md instructions | Yes — files on disk, re-read at session start [1] |
Auto memory MEMORY.md | Yes — first 200 lines or 25KB re-loaded [1] |
| In-context conversation history | No — context window is destroyed at session end |
| Tool outputs and file reads | No — in-context only |
| In-session permissions granted | No — session-scoped, not persisted |
| Subagent context | No — subagents have fresh context windows [7] |
| In-weights knowledge | Yes (trivially) — model weights unchanged |
3.2 Subagent Boundaries
Subagent = a child Claude that the main Claude can spawn for a specialized task. The child gets its own fresh context window, its own system prompt, and its own tool permissions — it does not see the parent's conversation, files, or memory unless explicitly handed them. Think "delegating to a contractor who walks in cold" rather than "asking your coworker who's been in the meeting."
Subagents are specialized agents that run in their own context window with a custom system prompt, specific tool access, and independent permissions. [7] Each subagent receives only its system prompt (plus basic environment details like working directory), not the full Claude Code system prompt. [7]
| From parent to subagent | Does it transfer? |
|---|---|
| Main session conversation history | No — subagent has a fresh context window. [7] |
Main session auto memory (~/.claude/projects/<project>/memory/) | Not automatically inherited. Subagents have their own persistent memory mechanism, opt-in via the `memory: user |
Global CLAUDE.md | Subagent loads its own CLAUDE.md files based on its working directory. [7] |
| Permissions granted in parent | No — subagents have their own permission state. [7] |
| Context passed by parent / orchestrator | Only what is explicitly included in the subagent's prompt by the harness, or injected via the SubagentStart hook's additionalContext field. [5] |
flowchart TB
subgraph parent["① Parent session — full context loaded"]
direction LR
PH["Conversation history"]
PC["Project CLAUDE.md"]
PM["Parent auto memory"]
PT["Tool outputs"]
end
BRIDGE["② ONLY EXPLICIT BRIDGE<br/>━━━━━━━━━━━━━━━━<br/>SubagentStart hook · additionalContext field<br/>OR text written into subagent prompt body"]:::bridge
subgraph sa["③ Subagent — fresh context window"]
direction LR
SS["Subagent system prompt only"]
SC["Loads its OWN CLAUDE.md<br/>(based on subagent cwd)"]
SM["OWN persistent memory<br/>(opt-in via memory: frontmatter)"]
end
parent -.->|"✗ NOTHING auto-inherits<br/>(history · CLAUDE.md · memory · tool outputs)"| sa
parent ==> BRIDGE
BRIDGE ==> sa
classDef bridge fill:#064e3b,stroke:#10b981,color:#fff,stroke-width:3px
Read top-to-bottom: ① parent has everything loaded → ② only what you write into
SubagentStart additionalContextor paste into the subagent prompt body crosses → ③ subagent starts in a fresh context window and loads its ownCLAUDE.mdand memory from scratch. The single dashed arrow replaces four separate "does NOT inherit" lines for clarity: nothing inherits automatically.
3.3 Subagent Persistent Memory
The memory field in a subagent's frontmatter gives the subagent a persistent directory that survives across conversations. [7] The 200-line / 25KB limit on MEMORY.md applies here as well.
| Scope | Location | Use when |
|---|---|---|
user | ~/.claude/agent-memory/<name>/ | Subagent should remember learnings across all projects |
project | .claude/agent-memory/<name>/ | Subagent's knowledge is project-specific and shareable via version control |
local | .claude/agent-memory-local/<name>/ | Subagent's knowledge is project-specific but should not be checked into version control |
When memory: is set, the subagent's system prompt includes instructions for reading and writing to the memory directory, plus the first 200 lines or 25KB of the directory's MEMORY.md. The Read, Write, and Edit tools are automatically enabled so the subagent can manage its memory files. [7]
Multi-agent architecture implication. An orchestrator that spawns coding subagents must explicitly pass critical context (tenancy rules, security constraints, audit requirements). Subagents do not inherit the parent's
CLAUDE.mdor auto memory. Use theSubagentStarthook withadditionalContextto inject parent-session state, [5] or pass context through the harness prompt directly.
4. Mechanism Reference — How Things Actually Load
4.1 The Session-Start Sequence
Per Anthropic's context-window walkthrough, the order of what loads before you type anything: [3]
- The Claude Code system prompt (and any
--append-system-promptcontent) [4] CLAUDE.mdandCLAUDE.local.mdfiles per the loading hierarchy, injected as a user message- Auto memory
MEMORY.md(first 200 lines or 25KB) - MCP tool name manifest
- Skill descriptions
- Your first prompt
sequenceDiagram
autonumber
participant Disk
participant Harness as Claude Code harness
participant CW as Context window
Note over Harness,CW: Fresh session begins
Harness->>CW: System prompt (claude_code preset) [+ --append-system-prompt]
Harness->>Disk: Read managed-policy CLAUDE.md
Harness->>Disk: Read ~/.claude/CLAUDE.md (global)
Harness->>Disk: Read ./CLAUDE.md (project root)
Harness->>Disk: Resolve @imports (≤5 hops)
Disk-->>CW: Inject CLAUDE.md hierarchy as user message
Harness->>Disk: Read MEMORY.md (first ~200 lines / ~25KB)
Disk-->>CW: Inject MEMORY.md
Harness->>CW: MCP tool manifest + skill listing
Note over CW: Ready — first user turn
Note over Harness,Disk: Subdirectory CLAUDE.md NOT loaded yet — lazy on first read
4.2 How --append-system-prompt Differs from CLAUDE.md
Both add custom instructions, but at different positions in the prompt:
- CLAUDE.md is delivered as a user message after the Claude Code system prompt. [1]
--append-system-promptdelivers content at the system prompt level — it is appended to the Claude Code system prompt itself. [4]
For instructions you want at the system prompt level rather than as a user message after it, use --append-system-prompt. This must be passed every invocation, so it's better suited to scripts and automation than interactive use. [1]
4.3 Hooks for Memory Architecture
Hook = "run this script automatically when X happens." The Claude Code harness (the program that wraps the model) fires hooks at lifecycle events — session start, before a tool runs, after compaction, when a subagent spawns, etc. Hooks are deterministic: they execute outside the model's reasoning, so they can enforce rules the model itself can't be trusted to follow.
Hooks are user-defined shell commands, HTTP endpoints, or LLM prompts that execute automatically at specific points in Claude Code's lifecycle. [5] Key hooks for memory and context architecture:
| Hook | When it fires | Memory relevance |
|---|---|---|
SessionStart | Session begins or resumes [5] | Inject dynamic state into context via additionalContext |
InstructionsLoaded | When a CLAUDE.md or .claude/rules/*.md file is loaded — at session start and when files are lazily loaded during a session [5] | Audit which instruction files load and when. Matcher values: session_start, nested_traversal, path_glob_match, include, compact [5] |
PreToolUse | Before a tool executes; can block it [5] | Deterministic enforcement — exit code 2 or permissionDecision: "deny" blocks the tool call |
PostToolUse | After a tool succeeds [5] | Capture observations, run linters, inject reminders |
PreCompact / PostCompact | Before / after context compaction [5] | Preserve state across compaction; observe summaries |
SubagentStart | When a subagent is spawned [5] | Inject additionalContext into the subagent's context |
sequenceDiagram
autonumber
participant H as Harness
participant C as Claude (model)
participant T as Tools / Files
H->>H: SessionStart fires
Note right of H: inject additionalContext
H->>H: InstructionsLoaded (session_start) per CLAUDE.md
Note right of H: matchers: session_start, nested_traversal,<br/>path_glob_match, include, compact
H->>C: Hand control — first user turn
loop Each turn
C->>H: UserPromptSubmit (user message)
C->>H: PreToolUse (before tool call)
Note right of H: exit code 2 = block tool
H->>T: Execute tool (if allowed)
T-->>H: Result
H->>C: PostToolUse (after tool)
end
C->>H: SubagentStart (when child spawned)
Note right of H: inject additionalContext into subagent
C->>H: PreCompact / PostCompact (around /compact)
Note right of H: InstructionsLoaded re-fires<br/>load_reason: compact
C->>H: Stop (turn complete)
Hooks fire deterministically — outside Claude's reasoning loop. The harness, not the model, enforces hook outcomes.
PreToolUseexit code 2 (orpermissionDecision: "deny") is the only fully deterministic block on a tool call.
4.3.1 Settings Enforcement vs CLAUDE.md Guidance
Per the Anthropic memory documentation: [1]
Settings rules are enforced by the client regardless of what Claude decides to do. CLAUDE.md instructions shape Claude's behavior but are not a hard enforcement layer.
Use settings (permissions.deny, sandbox.enabled) for technical enforcement and CLAUDE.md for behavioral guidance. [1] For deterministic blocks on tool calls, use PreToolUse hooks with exit code 2 — the Claude Code harness enforces the block outside of Claude's reasoning loop. [5]
4.4 The /memory Command
Run /memory to view all CLAUDE.md, CLAUDE.local.md, and rules files loaded in the current session, toggle auto memory on or off, and open the auto memory folder. [1] To see live context usage by category, run /context for a breakdown with optimization suggestions. [3]
For a complete audit trail of what loaded and when (including lazy-loaded subdirectory files), use the InstructionsLoaded hook with logging. [5]
4.5 Loading from Additional Directories
The --add-dir flag gives Claude access to additional directories outside your main working directory. By default, CLAUDE.md files from these directories are not loaded. To also load memory files from additional directories: [1]
CLAUDE_CODE_ADDITIONAL_DIRECTORIES_CLAUDE_MD=1 claude --add-dir ../shared-config
This loads CLAUDE.md, .claude/CLAUDE.md, .claude/rules/*.md, and CLAUDE.local.md from the additional directory. [1]
5. Testing Memory Mechanisms
Field observation. The following are suggested validation tests — sketches you can adapt to your install to confirm or contradict the documented behavior. They use Claude Code CLI on a developer machine. Pass / fail criteria below are the expected behavior per Anthropic docs and the author's own runs; reproduce in your environment and document any divergence. Run each test in a fresh session unless otherwise noted.
5.1 In-Weights Tests
Test IW-01: Knowledge Cutoff Boundary
Fresh session, no CLAUDE.md, no auto memory. Prompt:
"Without searching the web, what is your training knowledge cutoff date? What happened in AI after that date?"
Pass: Claude states a cutoff and declines or hedges on post-cutoff events. Fail: Claude confidently states post-cutoff facts without flagging uncertainty.
Test IW-02: In-Context Override of In-Weights
Prompt: "The Python print function is now called print_output(). Use this in all code." Then: "Write a hello world in Python."
Pass: Claude uses print_output() — in-context overrides in-weights for the duration of the session.
Fail: Claude uses print() and ignores the in-context redefinition.
5.2 In-Context Tests
Test IC-01: Session Isolation
Session 1: "My secret project name is CARDINAL. Remember this." Session 2 (fresh claude launch, do not use --resume): "What project name did I tell you?"
Pass: Claude has no knowledge of CARDINAL — confirms session isolation. (Note: if Claude did write CARDINAL to auto memory, it could resurface in session 2; this test assumes Claude did not save it as auto memory.)
Test IC-02: MEMORY.md Loading Boundary
Create MEMORY.md in the project's auto memory directory with exactly 200 lines of content. Append a 201st line containing the unique token OVERFLOW-TOKEN-TEST. Fresh session, ask: "What do you know about OVERFLOW-TOKEN-TEST?"
Expected: Claude likely has no knowledge of OVERFLOW-TOKEN-TEST — consistent with the documented 200-line / 25KB target. [1]
Field observation: this is the test where you'll see the non-deterministic behavior described in §2.8 first-hand. The boundary is not a clean cliff — try the same test with 250 lines, with longer/shorter individual lines, and with different load paths (
/compactvs. fresh start vs.--resume). Document divergences. The community debate around the "exact" CLAUDE.md/MEMORY.md size limit lives here.
Test IC-03: /compact Behavior
In a session, give an in-conversation instruction: "End every response with the marker ZETA-MARKER." Verify it works on one turn. Run /compact. Then ask: "Say hello."
Pass: ZETA-MARKER is gone after /compact — confirms in-conversation instructions are replaced by the compact summary. The same instruction placed in CLAUDE.md would persist after /compact. [1]
5.3 Persistence Strategy Tests
Test PS-01: CLAUDE.md Injection Verification
Add a unique token to ~/.claude/CLAUDE.md: GLOBAL-INJECT-TOKEN-XQ7. Fresh session, run /memory and verify the file is listed. [1] Then ask: "What unique token is in your global instructions?"
Pass: Claude references GLOBAL-INJECT-TOKEN-XQ7.
Test PS-02: Lazy Load Verification
Create ./src/api/CLAUDE.md with a unique token API-LAZY-LOAD-TOKEN-K9. Fresh session — do not read any file in src/api/. Ask: "What API rules do you have?"
Expected: Claude does not know API-LAZY-LOAD-TOKEN-K9 yet. Now read a file in that directory (e.g. Read ./src/api/CLAUDE.md) and re-check /memory. [1]
Pass: The file now appears in /memory and the token is known — confirms lazy loading.
Test PS-03: 5-Hop Import Limit
Create a 6-deep @import chain. Place a unique token HOP-6-CONTENT-REACHED in the file at depth 6. Reference hop1.md from the project's CLAUDE.md.
Expected: Claude has no knowledge of HOP-6-CONTENT-REACHED — consistent with the documented 5-hop maximum import depth. [1] (The 5-hop ceiling is the one threshold in this paper that is documented as a hard ceiling rather than a non-deterministic target.)
Test PS-04: InstructionsLoaded Hook Audit
Add an InstructionsLoaded hook to ~/.claude/settings.json that logs every load event:
{
"hooks": {
"InstructionsLoaded": [{
"hooks": [{
"type": "command",
"command": "echo \"$(date) | $(jq -r '.file_path + \" | \" + .load_reason')\" >> ~/instructions-load.log"
}]
}]
}
}
Tail the log while running a session. Read a file in a subdirectory containing a CLAUDE.md. Run /compact. The log will show entries with load_reason values of session_start, nested_traversal, and compact. [5]
5.4 Second Brain Tests
Test EQ-01: Token Cost Comparison
For any Second Brain / external-queryable tool you evaluate (knowledge graphs, RAG, vector databases, Obsidian vaults): measure baseline token consumption when Claude searches the corpus directly via the file tools (use /context for a live breakdown [3]). Then enable the tool and measure token consumption for the same question.
Reporting: Document both numbers and the methodology used. Performance claims for third-party tools should be sourced to that tool's published benchmarks, not attributed to Anthropic.
Test EQ-02: Cross-Session Persistence Comparison
Compare two persistence paths for the same fact (e.g. a build command):
- In-context persistence: Claude writes the fact to
MEMORY.md. New session reads it fromMEMORY.md(loaded at startup). [1] - Second Brain (external-queryable): The fact lives in an external knowledge graph or vault. New session queries the store; only the answer enters context.
Both should return the same fact. The test confirms that two distinct persistence paths exist for the same information, with different context window costs.
6. Architecture Decision Reference
The decision tree below references Second Brain backends (Obsidian vault, RAG, knowledge graph, API memory tool) — these are defined in §7.
flowchart TD
Q{"What kind of content<br/>do you need to persist?"} --> A{"Must apply from turn 1<br/>of every session?"}
A -->|"Yes — small ruleset"| CMD["✅ CLAUDE.md<br/>(project root or @imported)"]
A -->|"Yes — but agent-curated facts<br/>that grow over time"| MEM["✅ MEMORY.md<br/>(auto memory)"]
A -->|"No — only when relevant"| LARGE{"How big is the corpus?"}
LARGE -->|"Small + path-specific"| SUB["✅ Subdirectory CLAUDE.md<br/>(lazy load)"]
LARGE -->|"Large + searchable"| SBQ{"Need full-text<br/>OR semantic search?"}
SBQ -->|"Topic-organized notes,<br/>human-editable"| OBS["✅ Obsidian Vault<br/>(plain Markdown,<br/>wikilinks, tags)"]
SBQ -->|"Semantic similarity<br/>across many docs"| RAG["✅ RAG / vector DB<br/>(embeddings,<br/>top-k retrieval)"]
SBQ -->|"Cross-document<br/>relationships matter"| GRAPH["✅ Knowledge graph<br/>(Graphify, etc.)"]
SBQ -->|"API-side, not Claude Code"| APIM["✅ Anthropic API<br/>memory tool [6]"]
HARD{"Need DETERMINISTIC<br/>enforcement?"}:::callout
Q --> HARD
HARD -->|"Yes"| HOOKS["✅ Hooks (PreToolUse exit 2)<br/>or settings.permissions.deny<br/>NOT CLAUDE.md"]
classDef callout fill:#450a0a,stroke:#dc2626,color:#fff
style CMD fill:#0f172a,stroke:#00D4FF,color:#fff
style MEM fill:#0f172a,stroke:#00D4FF,color:#fff
style SUB fill:#0f172a,stroke:#00D4FF,color:#fff
style OBS fill:#1e1b4b,stroke:#D21EDC,color:#fff
style RAG fill:#1e1b4b,stroke:#D21EDC,color:#fff
style GRAPH fill:#1e1b4b,stroke:#D21EDC,color:#fff
style APIM fill:#1e1b4b,stroke:#D21EDC,color:#fff
style HOOKS fill:#064e3b,stroke:#10b981,color:#fff
Blue = persistence (loads into context). Purple = Second Brain (queried on demand). Green = deterministic enforcement. Red callout =
CLAUDE.mdis not a hard enforcement layer — for guarantees, use hooks or settings.
| Decision | Recommended approach |
|---|---|
| Where do rules that must apply from turn 1 go? | Project-root CLAUDE.md (or @imported from it via global CLAUDE.md). Avoid putting turn-1 rules in subdirectory CLAUDE.md files because they are lazy-loaded. [1] |
Where do rules that must survive /compact go? | Project-root CLAUDE.md — explicitly documented to be re-read from disk and re-injected after /compact. [1] |
Is MEMORY.md a separate memory mechanism? | No. It is in-context memory that persists to disk between sessions. The 200-line limit is a context window budget limit. [1] |
| How do I share context across multiple separate repos? | Commit shared CLAUDE.md content into each repo (or use @import to a common file). Auto memory is repo-scoped and not shared across repos. [1] |
| How do I pass context to a subagent? | SubagentStart hook with additionalContext, [5] or include the context in the subagent's prompt directly via the harness. Do not assume inheritance. [7] |
What is the difference between CLAUDE.md and --append-system-prompt? | CLAUDE.md is delivered as a user message after the system prompt; [1] --append-system-prompt is appended to the system prompt itself. [4] The latter must be passed every invocation. [1] |
| When should I use the Second Brain layer vs in-context persistence? | Second Brain (external-queryable) when the corpus is large and you only need relevant excerpts on demand. In-context persistence when the rules are small and must be present from turn 1. |
| How do I enforce a rule deterministically? | PreToolUse hooks with exit code 2 (or permissionDecision: "deny") — the harness enforces the decision regardless of Claude's reasoning. [5] Settings permissions.deny is also client-enforced; [1] both are stronger than CLAUDE.md for hard enforcement. |
How do I prevent ancestor CLAUDE.md files from monorepos I don't care about? | Configure claudeMdExcludes at the user, project, local, or managed-policy settings layer. Note that managed-policy CLAUDE.md files cannot be excluded. [1] |
How do I bootstrap a new project's CLAUDE.md? | Run /init. Claude analyzes the codebase and creates a starting CLAUDE.md with build commands, test instructions, and project conventions it discovers. If CLAUDE.md already exists, /init suggests improvements rather than overwriting. [1] |
7. The Second Brain Layer (External-Queryable)
Field observation — entire section. This section is the author's interpretive layer on top of Anthropic's two-mechanism architecture. It is not an Anthropic-defined memory mechanism. The patterns below are how the author runs production AI workflows (Obsidian vault with thousands of imported documents, knowledge-graph visualization, multi-agent setups). Treat as informed practitioner framing, not a product feature.
Content that lives entirely outside the context window and is queried on demand. The query result is injected into context — only the relevant excerpt, not the entire corpus. This pattern gives theoretically unlimited storage at zero context window cost until queried. We call this the Second Brain layer: an external knowledge substrate the agent reaches into deliberately, the way a human reaches for a notebook rather than trying to recall everything.
Quick glossary for this section:
- RAG (Retrieval-Augmented Generation) = "look up the relevant chunks first, then answer." Instead of cramming every document into the context window, you search a database for the few passages that match the question, then hand only those passages to the model.
- Embeddings / vector space = turn each chunk of text into a list of numbers ("vector") so that pieces with similar meaning land near each other. "Semantic search" = find the nearest neighbors. This is how RAG decides which chunks are relevant.
- Wikilinks =
[[Page Name]]— Obsidian's way of linking notes by name. Bidirectional: every note knows what links to it.- Frontmatter = a small YAML header at the top of a Markdown file holding metadata (tags, dates, status). Both humans and tools can parse it.
- MCP server (Model Context Protocol) = a standardized side-process that exposes tools, data, or APIs to Claude. Lets Claude talk to external systems without bespoke integration code.
7.1 Karpathy's LLM Wiki Concept
Andrej Karpathy publicly sketched the idea of an "LLM Wiki" — a structured, human-and-machine-readable knowledge base that an LLM owns, edits, and consults across sessions. [8] Unlike a chat transcript or a flat memory file, an LLM Wiki is organized by topic, cross-linked, dated, and revisable; the model treats it as a long-term external store rather than a scratchpad. The Second Brain pattern in this section is the Claude Code analogue: the model writes durable artifacts to a structured store, reads only the relevant page back into context on demand, and keeps the working context window small while the corpus grows without bound.
7.2 Patterns That Fit
| Pattern | What it does |
|---|---|
| Knowledge graph indexes (e.g. third-party tools like Graphify) | Build a structured graph from a codebase or document set, then query the graph instead of reading every file. Token reduction depends on the tool and corpus — refer to each tool's published benchmarks. |
| Retrieval-augmented generation (RAG) / vector databases | Embed documents into vector space. Semantic search returns relevant chunks on query. Common pattern for large document corpora. |
| The Anthropic API memory tool (beta) | A separate, client-side memory system for the Claude API — not the same as Claude Code auto memory. Operates through tool calls to a /memories directory. [6] |
| SQLite or document databases | Store session transcripts, decisions, observations. Query returns matching records. |
| Obsidian Vault (recommended starting point) | A local, plain-Markdown knowledge vault with bidirectional [[wiki-links]], tags, and per-note frontmatter. File-system native, so Claude Code can read, write, and grep it with the standard Read/Edit/Write/Bash tools — no separate server, embedding pipeline, or vendor API required. Fits the LLM Wiki pattern almost directly: each note is a topic page, links form the graph, and the agent fetches only the notes it needs. |
7.3 Obsidian Vault as a Reference Implementation
For teams experimenting with the Second Brain pattern, an Obsidian vault checked into a sibling repository (or a dedicated path under the project root) is the lowest-friction starting point:
- Storage. Plain
.mdfiles. No database, no embedding service, no migration story. Survives tool churn. - Index. Filenames + frontmatter tags +
[[wiki-links]]form a navigable graph. Claude Code can traverse it withGlob+Grep+Read. - Write path. Claude appends or edits notes via the same
Edit/Writetools used for code. No new permissions surface. - Read path. Only the relevant note(s) enter context per query — typically a few hundred lines, not the whole corpus.
- Boundary. The vault lives outside
CLAUDE.mdandMEMORY.md, so its size does not consume the startup budget described in §2.8.
This is not the only valid Second Brain implementation — RAG, graph indexes, and the Anthropic API memory tool all serve the same architectural role — but Obsidian gives the cleanest mapping to Karpathy's LLM Wiki and the smallest amount of new infrastructure to manage.
7.4 Important Distinction — API Memory Tool ≠ Claude Code Auto Memory
The Anthropic API "memory tool" [6] is a different system than Claude Code auto memory. The API memory tool is a client-controlled tool that Claude invokes via the memory tool name, with storage that the developer controls. Claude Code auto memory is built into the Claude Code application, scoped to a git repository, and stored at ~/.claude/projects/<project>/memory/. Do not conflate them.
When evaluating any Second Brain / external-queryable tool for production use, verify performance claims against the tool's own published benchmarks rather than treating them as Anthropic-attributed numbers.
8. Sources
Primary citations are official Anthropic documentation. Conceptual references and cross-domain sources are cited inline. URLs current as of May 2026; consult each source directly for the latest version.
-
How Claude remembers your project — Claude Code documentation. https://code.claude.com/docs/en/memory (canonical authority for
CLAUDE.md, auto memory, hierarchy,/compactbehavior, 200-line and 25KBMEMORY.mdtargets, 5-hop@importlimit,claudeMdExcludes,AGENTS.mdinterop,claude_codev2.1.59 minimum) -
Claude API release notes — Claude Platform — https://docs.claude.com/en/release-notes/overview (current model context windows: 200K standard; 1M GA for Sonnet 4.6, Opus 4.6, Opus 4.7)
-
Explore the context window — Claude Code documentation. https://code.claude.com/docs/en/context-window (session-start load order; "Most startup content reloads automatically. The skill listing is the one exception.")
-
Modifying system prompts — Agent SDK — Claude API documentation. https://docs.anthropic.com/en/docs/claude-code/sdk/modifying-system-prompts (
--append-system-promptbehavior;claude_codesystem prompt preset) -
Hooks reference — Claude Code documentation. https://code.claude.com/docs/en/hooks (
InstructionsLoadedmatcher values includingsession_start,nested_traversal,path_glob_match,include,compact;SubagentStartadditionalContextinjection;PreToolUseexit-code-2 blocking) -
Memory tool — Claude API documentation — https://docs.claude.com/en/docs/agents-and-tools/tool-use/memory-tool (the API memory tool — distinct from Claude Code auto memory)
-
Create custom subagents — Claude Code documentation. https://code.claude.com/docs/en/sub-agents (subagent context isolation;
memory: user|project|localfrontmatter field;~/.claude/agent-memory/<name>/storage; subagent system prompt scope) -
Andrej Karpathy — LLM Wiki concept — gist sketch of a structured, machine-readable knowledge base an LLM owns and consults across sessions. https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f (referenced in §7.1 — non-Anthropic conceptual source)
Appendix A: Document Versioning
- v1.0 — May 2026 — Initial public release. Reviewed against current Anthropic Claude Code documentation and Claude Platform release notes.
End of white paper.
