A "session" in Claude Code:
- Starts when you run the `claude` command in your terminal
- Consists of multiple back-and-forth exchanges with the LLM
- Ends with the `/exit` command or by pressing Ctrl+C
Within that single session:
- System components (CLAUDE.md files, Tools) load once at start
- Conversation history accumulates with each exchange
- Context window fills up over time
- Cache reduces costs but doesn't free context space
When you `/exit` and start a new session, everything resets -- new context window, fresh cache, clean slate. For effective Claude Code usage, understanding both caching (cost optimization) and context space (capacity management) is crucial.
How does caching work in Claude Code with the Anthropic API backend?
Claude Code enables prompt caching by default, so you don't need to configure it manually. When you start a Claude Code session:
- Initial request: Claude Code sends the complete system prompt (including CLAUDE.md, tool definitions, and memory files) to the Anthropic API
- Anthropic caches this prefix: The API recognizes the stable beginning of your prompt and caches it
- Subsequent requests: Only the dynamic parts (your new messages, tool outputs) are sent; the cached prefix is reused
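Under the hood, this maps onto the `cache_control` markers the Anthropic Messages API accepts. Here is a sketch of the request body Claude Code effectively constructs for you; the model id and text values are illustrative placeholders, not what Claude Code actually sends:

```python
# Sketch of a Messages API request body with an explicit cache breakpoint.
# Claude Code builds an equivalent payload automatically; the strings here
# are illustrative placeholders.
request_body = {
    "model": "claude-sonnet-4-20250514",  # illustrative model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<system prompt + CLAUDE.md + tool definitions>",
            # Everything up to and including this block is cached; later
            # requests reuse it as long as the prefix is byte-identical.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        # Only this dynamic part changes between turns.
        {"role": "user", "content": "Refactor src/app.py"},
    ],
}
```

The stable prefix (everything before the `cache_control` breakpoint) is what gets cached; your new messages ride along after it.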
Aside: Prompt caching operates on a prefix-matching principle. The prefix must be byte-for-byte identical for a cache hit; if anything changes in the first part of the prompt, the cache is invalidated.
Notably, the following actions will invalidate your cache:
- Changing tool order between requests
- Adding/removing tools mid-session
- Modifying the system prompt (including adding timestamps)
- Switching models (cache is model-specific)
- Using different cache_control markers
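A toy model makes the byte-for-byte rule concrete. This is purely illustrative (the real cache lives server-side and is not a simple hash map), but it shows why each of the changes above causes a miss:

```python
import hashlib
import json

def cache_key(system_prompt: str, tools: list[str]) -> str:
    """Toy model: cache hits require an identical serialized prefix."""
    prefix = json.dumps({"system": system_prompt, "tools": tools})
    return hashlib.sha256(prefix.encode()).hexdigest()

base = cache_key("You are a coding assistant.", ["Bash", "Edit", "Read"])

# Identical prefix -> same key -> cache hit
assert cache_key("You are a coding assistant.", ["Bash", "Edit", "Read"]) == base

# Reordering tools changes the bytes -> cache miss
assert cache_key("You are a coding assistant.", ["Edit", "Bash", "Read"]) != base

# A timestamp in the system prompt changes the bytes -> cache miss
assert cache_key("You are a coding assistant. [2025-01-01]",
                 ["Bash", "Edit", "Read"]) != base
```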
So technically, your system components (system prompt, CLAUDE.md, tool definitions, etc.) are sent with every request, but after the first request you pay only for cache reads ($0.30 per million tokens at the time of writing) rather than full input processing ($3.00 per million tokens at the time of writing).
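A back-of-envelope calculation shows how much this matters. Using the per-million-token prices quoted above, and a hypothetical 20k-token system prefix (the first request also pays a one-time cache-write premium, which this sketch ignores):

```python
# Per-million-token prices quoted above, at the time of writing.
INPUT_PER_MTOK = 3.00
CACHE_READ_PER_MTOK = 0.30

def turn_cost(cached_tokens: int, fresh_tokens: int) -> float:
    """Input cost of one turn, in dollars."""
    return (cached_tokens * CACHE_READ_PER_MTOK
            + fresh_tokens * INPUT_PER_MTOK) / 1_000_000

# Hypothetical turn: 20k-token system prefix plus a 500-token new message.
without_cache = turn_cost(0, 20_500)       # everything billed as fresh input
with_cache = turn_cost(20_000, 500)        # prefix billed at cache-read rate
print(f"without cache: ${without_cache:.4f} per turn")
print(f"with cache:    ${with_cache:.4f} per turn")
```

With these assumed numbers the cached turn costs roughly an eighth of the uncached one, and the gap grows as the stable prefix gets larger.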
How is context space consumed by the different Claude Code components?
Running the `/context` command prints a nice visual representation where:
- Each block (⛁/⛶/⛝) represents 1% of your context window
- 20 blocks per row (so 2 rows = 40%, 5 rows = 100%)
- Different symbols indicate different categories:
  - ⛁ = Used tokens
  - ⛶ = Free space
  - ⛝ = Autocompact buffer (reserved space)
The groups that consume tokens in a Claude Code session are:
- System Prompt
- System Tools
- Memory Files
- Messages
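The grid described above is easy to reproduce, which helps internalize the "one symbol per 1%, 20 per row" layout. A minimal sketch (the percentages passed in are made up):

```python
def render_context_bar(used_pct: int, buffer_pct: int, width: int = 20) -> str:
    """Render a /context-style grid: one symbol per 1% of the window,
    `width` symbols per row. ⛁ = used, ⛶ = free, ⛝ = autocompact buffer."""
    free_pct = 100 - used_pct - buffer_pct
    cells = "⛁" * used_pct + "⛶" * free_pct + "⛝" * buffer_pct
    rows = [cells[i:i + width] for i in range(0, len(cells), width)]
    return "\n".join(rows)

# Hypothetical session: 37% used, 10% reserved for autocompact.
print(render_context_bar(used_pct=37, buffer_pct=10))
```

Five rows of twenty symbols cover the full 100% of the window, matching what `/context` prints.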
To manage context space effectively:
- Optimize system components (CLAUDE.md, tools) for size -- they occupy context space for the entire session
- Monitor MESSAGES growth -- this is what actually reduces free space over time
- Use `/compact` to compress messages -- this frees real context space by summarizing conversation history
- Start new sessions for unrelated tasks -- don't carry dead weight across tasks.
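The autocompact buffer exists so that compaction can happen before the window is truly full. You can apply the same idea manually with a simple rule of thumb; the 200k-token window and 80% threshold below are assumptions for illustration, not Claude Code's actual trigger:

```python
def should_compact(used_tokens: int,
                   window: int = 200_000,
                   threshold: float = 0.80) -> bool:
    """Suggest running /compact once usage crosses `threshold` of the window.
    The 200k window and 0.80 threshold are illustrative assumptions."""
    return used_tokens / window >= threshold

# Plenty of headroom -> keep going; near the ceiling -> compact now.
print(should_compact(100_000))  # False
print(should_compact(170_000))  # True
```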