A "session" in Claude Code:
- Starts when you run the `claude` command in your terminal
- Consists of multiple back-and-forth exchanges with the LLM
- Ends with the `/exit` command or by pressing Ctrl+C
Within that single session:
- System components (CLAUDE.md files, Tools) load once at start
- Conversation history accumulates with each exchange
- Context window fills up over time
- Cache reduces costs but doesn't free context space
When you `/exit` and start a new session, everything resets -- new context window, fresh cache, clean slate. For effective Claude Code usage, understanding both caching (cost optimization) and context space (capacity management) is crucial.
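The session lifecycle above can be sketched as a toy Python model (the class and field names here are illustrative, not anything from the Claude Code codebase): system components load once, history grows per exchange, and a restart resets everything.

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """Toy model of a Claude Code session's context state."""
    system_components: list = field(default_factory=list)  # loaded once at start
    history: list = field(default_factory=list)            # grows with each exchange
    cache_warm: bool = False                               # prompt-cache state

    def start(self, claude_md: str, tools: list) -> None:
        # System components (CLAUDE.md, tool definitions) load once per session.
        self.system_components = [claude_md, *tools]

    def exchange(self, user_msg: str, assistant_msg: str) -> None:
        # Each back-and-forth adds to history; after the first request,
        # the stable prompt prefix is cached on the API side.
        self.history += [user_msg, assistant_msg]
        self.cache_warm = True

def restart(claude_md: str, tools: list) -> Session:
    # /exit + `claude` again: fresh history, cold cache, clean slate.
    s = Session()
    s.start(claude_md, tools)
    return s
```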
Before we delve into caching and context space, let's take a look at how a representative session's context window is laid out, so we know what we should optimize and how:
How does caching work in Claude Code with the Anthropic API backend?
Claude Code enables prompt caching by default, so you don't need to configure it manually. When you start a Claude Code session:
- Initial request: Claude Code sends the complete system prompt (including CLAUDE.md, tool definitions, and memory files) to the Anthropic API
- Anthropic caches this prefix: The API recognizes the stable beginning of your prompt and caches it
- Subsequent requests: Only the dynamic parts (your new messages, tool outputs) are sent; the cached prefix is reused
Aside: Prompt caching operates on a prefix matching principle. The prefix must be byte-for-byte identical for caching to work. If anything changes in the first part of the prompt, the entire cache is invalidated.
Notably, the following actions will invalidate your cache:
- Changing tool order between requests
- Adding/removing tools mid-session
- Modifying the system prompt (including adding timestamps)
- Switching models (cache is model-specific)
- Using different cache_control markers
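This is not how Anthropic implements caching internally, but a toy sketch to build intuition: prefix caching keys on the exact bytes of the prompt prefix, so any change in the serialized prefix (such as reordering tools) produces a different key and a cache miss.

```python
import hashlib
import json

def prefix_key(system_prompt: str, tools: list) -> str:
    """Derive a cache key from the stable prompt prefix (system prompt +
    tool definitions), serialized deterministically. Byte-for-byte
    identity is what matters: any change yields a new key."""
    prefix = json.dumps({"system": system_prompt, "tools": tools})
    return hashlib.sha256(prefix.encode()).hexdigest()

cache: dict[str, bool] = {}

def lookup(system_prompt: str, tools: list) -> bool:
    """Return True on a cache hit; store the prefix on a miss."""
    key = prefix_key(system_prompt, tools)
    hit = key in cache
    cache[key] = True
    return hit

tools_a = ["Bash", "Read", "Edit"]
assert lookup("You are Claude Code.", tools_a) is False  # first request: miss
assert lookup("You are Claude Code.", tools_a) is True   # identical prefix: hit
# Reordering tools changes the serialized bytes -> cache invalidated.
assert lookup("You are Claude Code.", ["Read", "Bash", "Edit"]) is False
```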
So technically, your system components (system prompt, CLAUDE.md, tool definitions, etc.) are sent with every request, but after the first request you only pay for cache reads ($0.30 per million tokens at the time of writing) rather than full input processing ($3.00 per million tokens at the time of writing).
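The savings are easy to quantify with the rates above. A rough back-of-the-envelope calculation (the 20k-token prefix and 50-request session are illustrative numbers, and the small cache-write premium on the first request is ignored for simplicity):

```python
def prefix_cost_usd(prefix_tokens: int, requests: int, cached: bool) -> float:
    """Cost of processing the stable prompt prefix across a session's requests,
    at the rates quoted above ($0.30/MTok cache read, $3.00/MTok full input)."""
    rate = 0.30 if cached else 3.00  # dollars per million input tokens
    return prefix_tokens * requests * rate / 1_000_000

# e.g. a 20k-token prefix (system prompt + CLAUDE.md + tools) over 50 requests:
uncached = prefix_cost_usd(20_000, 50, cached=False)  # $3.00
cached = prefix_cost_usd(20_000, 50, cached=True)     # $0.30 -- a 10x saving
```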
How is context space managed? Computing the token consumption of different Claude Code components
Running `/context` outputs this with a nice visual representation where:
- Each block (⛁/⛶/⛝) represents 1% of your context window
- 20 blocks per row (so 2 rows = 40%, 5 rows = 100%)
- Different symbols indicate different categories:
- ⛁ = Used tokens
- ⛶ = Free space
- ⛝ = Autocompact buffer (reserved space)
- Here are the different groups you will see consuming tokens in a Claude Code session:
- System Prompt
- System Tools
- Memory Files
- Messages
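To make the block layout concrete, here is a small sketch that renders a `/context`-style bar from the percentages described above (the function name and the 35%/10% example figures are made up for illustration):

```python
def context_bar(used_pct: int, buffer_pct: int, width: int = 20) -> list[str]:
    """Render a /context-style bar: one glyph per 1% of the context window,
    20 blocks per row (5 rows = 100%).
    Glyphs: '\u26c1' = used, '\u26f6' = free, '\u26dd' = autocompact buffer."""
    free_pct = 100 - used_pct - buffer_pct
    glyphs = "⛁" * used_pct + "⛶" * free_pct + "⛝" * buffer_pct
    return [glyphs[i:i + width] for i in range(0, 100, width)]

# e.g. a window that is 35% used with a 10% autocompact buffer:
for row in context_bar(used_pct=35, buffer_pct=10):
    print(row)
```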
- Optimize system components (CLAUDE.md, tools) for size -- they permanently occupy context space for the entire session.
- Monitor MESSAGES growth frequently -- accumulating message history is what steadily eats into your free space, so watch it proactively for better performance. The `/context` command is your best friend here, presenting the data in a way that's easy to grok.
- Use `/compact` to compress messages -- this frees up actual context space by summarizing conversation history. It preserves cache and costs less long-term. It works by first reading the full conversation history and then replacing it with a condensed summary.
- Start new sessions for unrelated tasks -- don't carry dead weight across tasks. Use `/exit` and then restart Claude Code to begin afresh with a clean state and no baggage. Do not use the `/clear` command as the shortcut it is supposed to be; it has issues that remain unfixed as of this writing.
How to check what is currently loaded?
- /memory — Shows which CLAUDE.md and memory files are active.
- /context — Shows a breakdown of token usage, including which skills are currently occupying space in the context window (as detailed in the earlier section).
