A "session" in Claude Code:
- Starts when you run the `claude` command in your terminal
- Consists of multiple back-and-forth exchanges with the LLM
- Ends with the `/exit` command or by pressing Ctrl+C
Within that single session:
- System components (CLAUDE.md files, Tools) load once at start
- Conversation history accumulates with each exchange
- Context window fills up over time
- Cache reduces costs but doesn't free context space
When you `/exit` and start a new session, everything resets -- new context window, fresh cache, clean slate. For effective Claude Code usage, understanding both caching (cost optimization) and context space (capacity management) is crucial.
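The session lifecycle above can be sketched as a toy Python model (the class and field names here are illustrative, not anything from the Claude Code codebase): system components load once, history grows per exchange, and a restart resets everything.

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """Toy model of a Claude Code session's context state."""
    system_components: list = field(default_factory=list)  # loaded once at start
    history: list = field(default_factory=list)            # grows with each exchange
    cache_warm: bool = False                               # prompt-cache state

    def start(self, claude_md: str, tools: list) -> None:
        # System components (CLAUDE.md, tool definitions) load once per session.
        self.system_components = [claude_md, *tools]

    def exchange(self, user_msg: str, assistant_msg: str) -> None:
        # Each back-and-forth adds to history; after the first request,
        # the stable prompt prefix is cached on the API side.
        self.history += [user_msg, assistant_msg]
        self.cache_warm = True

def restart(claude_md: str, tools: list) -> Session:
    # /exit + `claude` again: fresh history, cold cache, clean slate.
    s = Session()
    s.start(claude_md, tools)
    return s
```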
Before we delve into caching and context space, let's take a look at how a representative session's context window is laid out, so we know what we should optimize and how:
How does caching work in Claude Code with the Anthropic API backend?
Claude Code enables prompt caching by default, so you don't need to configure it manually. When you start a Claude Code session:
- Initial request: Claude Code sends the complete system prompt (including CLAUDE.md, tool definitions, and memory files) to the Anthropic API
- Anthropic caches this prefix: The API recognizes the stable beginning of your prompt and caches it
- Subsequent requests: Only the dynamic parts (your new messages, tool outputs) are sent; the cached prefix is reused
Aside: Prompt caching operates on a prefix matching principle. The prefix must be byte-for-byte identical for caching to work. If anything changes in the first part of the prompt, the entire cache is invalidated.
Notably, the following actions will invalidate your cache:
- Changing tool order between requests
- Adding/removing tools mid-session
- Modifying the system prompt (including adding timestamps)
- Switching models (cache is model-specific)
- Using different cache_control markers
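This is not how Anthropic implements caching internally, but a toy sketch to build intuition: prefix caching keys on the exact bytes of the prompt prefix, so any change in the serialized prefix (such as reordering tools) produces a different key and a cache miss.

```python
import hashlib
import json

def prefix_key(system_prompt: str, tools: list) -> str:
    """Derive a cache key from the stable prompt prefix (system prompt +
    tool definitions), serialized deterministically. Byte-for-byte
    identity is what matters: any change yields a new key."""
    prefix = json.dumps({"system": system_prompt, "tools": tools})
    return hashlib.sha256(prefix.encode()).hexdigest()

cache: dict[str, bool] = {}

def lookup(system_prompt: str, tools: list) -> bool:
    """Return True on a cache hit; store the prefix on a miss."""
    key = prefix_key(system_prompt, tools)
    hit = key in cache
    cache[key] = True
    return hit

tools_a = ["Bash", "Read", "Edit"]
assert lookup("You are Claude Code.", tools_a) is False  # first request: miss
assert lookup("You are Claude Code.", tools_a) is True   # identical prefix: hit
# Reordering tools changes the serialized bytes -> cache invalidated.
assert lookup("You are Claude Code.", ["Read", "Bash", "Edit"]) is False
```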
So technically, your system components (system prompt, CLAUDE.md, tool definitions, etc.) are sent with every request, but after the first request you only pay for cache reads ($0.30 per million tokens at the time of writing) rather than full input processing ($3.00 per million tokens at the time of writing).
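The savings are easy to quantify with the rates above. A rough back-of-the-envelope calculation (the 20k-token prefix and 50-request session are illustrative numbers, and the small cache-write premium on the first request is ignored for simplicity):

```python
def prefix_cost_usd(prefix_tokens: int, requests: int, cached: bool) -> float:
    """Cost of processing the stable prompt prefix across a session's requests,
    at the rates quoted above ($0.30/MTok cache read, $3.00/MTok full input)."""
    rate = 0.30 if cached else 3.00  # dollars per million input tokens
    return prefix_tokens * requests * rate / 1_000_000

# e.g. a 20k-token prefix (system prompt + CLAUDE.md + tools) over 50 requests:
uncached = prefix_cost_usd(20_000, 50, cached=False)  # $3.00
cached = prefix_cost_usd(20_000, 50, cached=True)     # $0.30 -- a 10x saving
```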
How is context space managed? Computing the token consumption of different Claude Code components
Running `/context` outputs this with a nice visual representation where:
- Each block (⛁/⛶/⛝) represents 1% of your context window
- 20 blocks per row (so 2 rows = 40%, 5 rows = 100%)
- Different symbols indicate different categories:
- ⛁ = Used tokens
- ⛶ = Free space
- ⛝ = Autocompact buffer (reserved space)
- Here are the different groups you will see consuming tokens in a Claude Code session:
- System Prompt
- System Tools
- Memory Files
- Messages
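To make the block layout concrete, here is a small sketch that renders a `/context`-style bar from the percentages described above (the function name and the 35%/10% example figures are made up for illustration):

```python
def context_bar(used_pct: int, buffer_pct: int, width: int = 20) -> list[str]:
    """Render a /context-style bar: one glyph per 1% of the context window,
    20 blocks per row (5 rows = 100%).
    Glyphs: '\u26c1' = used, '\u26f6' = free, '\u26dd' = autocompact buffer."""
    free_pct = 100 - used_pct - buffer_pct
    glyphs = "⛁" * used_pct + "⛶" * free_pct + "⛝" * buffer_pct
    return [glyphs[i:i + width] for i in range(0, 100, width)]

# e.g. a window that is 35% used with a 10% autocompact buffer:
for row in context_bar(used_pct=35, buffer_pct=10):
    print(row)
```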
- Optimize system components (CLAUDE.md, tools) for size -- they permanently occupy context space for the entire session.
- Monitor MESSAGES growth frequently -- accumulating message history is what steadily eats into your free space, so watch it proactively for better performance. The `/context` command is your best friend here, presenting the data in a way that's easy to grok.
- Use `/compact` to compress messages -- this frees up actual context space by summarizing conversation history. It preserves cache and costs less long-term. It works by first reading the full conversation history and then replacing it with a condensed summary.
- Start new sessions for unrelated tasks -- don't carry dead weight across tasks. Use `/exit` and then restart Claude Code to begin afresh with a clean state and no baggage. Do not use the `/clear` command as the shortcut it is supposed to be; it has issues that remain unfixed as of this writing.
How to check what is currently loaded?
- /memory — Shows which CLAUDE.md and memory files are active.
- /context — Shows a breakdown of token usage, including which skills are currently occupying space in the context window (as detailed in the earlier section).
