LLM, context window, and why the hell it keeps forgetting what you told it

You spend an hour drilling project details into the model: your ideas, your vision, how you want things done. Files, dependencies, decisions. It nods along, responds on point. Then — BAM, message 40 and it suggests exactly what you rejected on message 5. Gotta keep a close eye on these LLMs, do ya!

Congratulations, you’ve hit the context window, which is everything the model “sees” while talking to you: your messages, its responses, system instructions, tool descriptions, contents of files it read. All of it one big block of text, measured in tokens.

But everything has a limit, even the context window. E.g. Sonnet has about a million tokens. Sounds like a lot, until you realize: reading one 2,000-line file is ~20k tokens. Five files…and you’ve spent 100k on context.

And LLM remembers nothing between calls. Every time you send a message, the client (Claude Code, Cursor, ChatGPT) collects the full history and sends it to the model in one shot. The model reads, generates a response, and forgets everything. What looks like “memory” is the client’s memory, not the model one. It’s just forwarding history.

sequenceDiagram
    participant Client as Client<br/>(Claude Code / Cursor / ChatGPT)
    participant LLM as LLM<br/>(stateless)
    Note over Client: Stores history
    rect rgb(197, 224, 203)
        Note left of Client: Call 1
        Client->>LLM: system prompt + tools + project files + msg1
        LLM-->>Client: resp1
        Note right of LLM: Forgets everything
    end
    Note over Client: History: msg1 + resp1
    rect rgb(197, 224, 203)
        Note left of Client: Call 2
        Client->>LLM: system prompt + tools + project files + msg1 + resp1 + msg2
        LLM-->>Client: resp2
        Note right of LLM: Forgets everything
    end
    Note over Client: History: msg1 + resp1 + msg2 + resp2
    rect rgb(197, 224, 203)
        Note left of Client: Call 3
        Client->>LLM: system prompt + tools + project files + msg1 + resp1 + msg2 + resp2 + msg3
        LLM-->>Client: resp3
        Note right of LLM: Forgets everything
    end
    Note over Client: Stores everything
    Note over LLM: Stores nothing

Okay, so the full history is being sent. Then why does the model still “forget”?

And it has 3 reasons for that:

  1. Attention dilution. The model spreads attention across all tokens. With 1,000 tokens, each one carries weight. With 500,000, the weight is smeared thin. An important instruction is physically there in the context, but the model just isn’t looking at it. Just like how you stop noticing details in a huge document.

  2. Lost in the middle. Models “remember” the beginning and end of context best. The middle is a blind spot. If your key decision landed at token 200,000 out of a million…geez good luck with that.

  3. Garbage accumulation. Every failed attempt, every file read, every long command output stays in context. After an hour of work, ~60% of the window is junk from failed tries. The model sees it all at once and can’t tell useful from trash.

When the window fills completely, one of two things happens: either the oldest messages get trimmed or compaction starts its work. In Claude Code, it’s compaction: a separate LLM call that compresses old history into a short summary. But details are lost for good: exact values, specific lines of code, reasons why certain approaches were rejected — all gone.

It can confidently hallucinate from fragments. And the worst part of all of it after compaction, the model doesn’t know what it doesn’t know.

how-llm-works grounding mcp agents