So… how does an LLM actually work?

An LLM doesn’t think. It doesn’t read your files. It doesn’t decide anything.

All it does is generate the next token (a token is a chunk of text; sometimes a word, sometimes part of a word, sometimes just punctuation). One token at a time. With some randomness built in.
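To make “one token at a time, with some randomness” concrete, here is a toy sketch of a single prediction step. The candidate tokens and their scores are made up for illustration; a real model scores tens of thousands of tokens with a neural network, and the `temperature` knob shown here is just one common way the randomness is tuned.

```python
import math
import random

# Hypothetical scores ("logits") a model might give to candidate next tokens.
# Both the tokens and the numbers are invented for this example.
TOY_LOGITS = {" file": 2.0, " book": 1.0, " mind": 0.5, ".": -1.0}

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sample_next_token(logits, temperature=1.0, rng=random):
    """Pick one token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# One prediction step: extend the text by a single sampled token.
text = "Read my"
text += sample_next_token(TOY_LOGITS)
print(text)
```

Run it a few times and the continuation changes. Generating a full answer is this same step repeated, with each new token appended to the text before the next prediction.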

It’s predicting. That’s it.

But what about that file you asked it to read the other day? It did read it!

Well… it didn’t. When you send a prompt, the LLM doesn’t just receive your message. It also gets a list of tools available to it: read a file, write a file, search the web, run code, etc.
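Roughly, what reaches the model looks like the sketch below. The tool names, the schema, and the `build_model_input` helper are all hypothetical; every product formats this differently. The point is only that the tool list and your message arrive together as one input.

```python
import json

# Hypothetical tool descriptions; real products use their own schemas.
TOOLS = [
    {"name": "read_file", "description": "Read a file from disk",
     "parameters": {"path": "string"}},
    {"name": "web_search", "description": "Search the web",
     "parameters": {"query": "string"}},
]

def build_model_input(user_message, tools):
    """Combine the tool list and the user's message into one text for the model."""
    lines = ["You can call these tools by replying with JSON:"]
    for tool in tools:
        lines.append(json.dumps(tool))
    lines.append("")
    lines.append(f"User: {user_message}")
    return "\n".join(lines)

model_input = build_model_input("read my file notes.txt", TOOLS)
print(model_input)
```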

So when you say “read my file”, the LLM doesn’t understand your request and decide to help. It predicts: the user said “read file” + the instructions say a “read file” tool exists, so statistically the next tokens should be a call to that tool. The app around the LLM spots that call, runs the tool, and feeds the result back as more text.
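Here is a minimal sketch of that loop. Everything in it is a stand-in: `fake_llm` is a hard-coded function pretending to be a model, `read_file` returns canned text instead of touching the disk, and the JSON call format is an assumption rather than any vendor’s actual protocol. What it shows is who does what: the model only emits text, and the surrounding program runs the tool.

```python
import json

def fake_llm(context):
    """Stand-in for a real model: it takes text in and returns text out.

    Hypothetical behavior: ask for the tool first, answer once a result is in context.
    """
    if "TOOL RESULT:" not in context:
        return json.dumps({"tool": "read_file", "args": {"path": "notes.txt"}})
    return "Your file says: " + context.split("TOOL RESULT:", 1)[1].strip()

def read_file(path):
    """Hypothetical tool; returns canned content instead of reading the disk."""
    return {"notes.txt": "buy milk"}.get(path, "")

TOOLS = {"read_file": read_file}

def run_turn(user_message, llm=fake_llm, max_steps=5):
    """Call the model, run any tool it asks for, and repeat until it returns text."""
    context = f"Tools: {list(TOOLS)}\nUser: {user_message}"
    for _ in range(max_steps):
        output = llm(context)
        try:
            call = json.loads(output)
        except json.JSONDecodeError:
            return output  # plain text: this is the reply to the user
        if not isinstance(call, dict) or call.get("tool") not in TOOLS:
            return output  # valid JSON, but not a call to a known tool
        # The program, not the model, executes the tool.
        result = TOOLS[call["tool"]](**call.get("args", {}))
        # The result goes back in as more text for the next prediction.
        context += f"\nTOOL RESULT: {result}"
    return "Stopped: too many tool calls."

print(run_turn("read my file notes.txt"))
```

The model never opens the file. It produces text that looks like a tool call, the program acts on it, and the file’s contents come back as part of the next input.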

Prediction + memory (whatever sits in its context) in play.

Next time you use ChatGPT or Claude and it does something that looks smart, remember: it didn’t choose to help you. It predicted that helping is what comes next.

graph TD
    A["Tools list"] --> C["LLM - Predicts next token"]
    B["Your prompt"] --> C
    C --> D{"Prediction:<br/>tool or text?"}
    D -- "Tool" --> E["Tool executes"]
    D -- "Text" --> F["Response to you"]
    E -- "Result back" --> C

    classDef prompt fill:#dee3c6,stroke:#758879,stroke-width:1.5px,color:#2d2d30
    classDef toolsList fill:#e3d7c6,stroke:#8b7e7a,stroke-width:1.5px,color:#2d2d30
    classDef llm fill:#7c838e,stroke:#5a6f8f,stroke-width:1.5px,color:#f5f5f0
    classDef decision fill:#ddc6e3,stroke:#7e7a8b,stroke-width:1.5px,color:#2d2d30
    classDef toolExec fill:#e3c6d2,stroke:#8b7a87,stroke-width:1.5px,color:#2d2d30
    classDef response fill:#c5e0cb,stroke:#758879,stroke-width:1.5px,color:#2d2d30

    class A toolsList
    class B prompt
    class C llm
    class D decision
    class E toolExec
    class F response