Context Window
Theory
The context window is the amount of text an LLM can "see" at once. It includes everything: the system prompt, the conversation history, and the response being generated. If the total exceeds the limit, something has to give; most clients either reject the request or truncate, and when they truncate, the earliest content is usually dropped first.
[ system prompt | history | user message | ... | response ]
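The drop-the-earliest behavior above can be sketched in a few lines. This is an illustration, not any particular API: `count_tokens` is a crude whitespace proxy (real tokenizers count differently), and `fit_to_window` is a hypothetical helper name.

```python
# Sketch: trim a conversation to a token budget by dropping
# the oldest history entries first. Assumes a crude
# whitespace-based token count, not a real tokenizer.
def count_tokens(text: str) -> int:
    return len(text.split())

def fit_to_window(system: str, history: list[str], user: str,
                  limit: int) -> list[str]:
    """Drop the earliest history entries until everything fits."""
    history = list(history)

    def total() -> int:
        return (count_tokens(system)
                + sum(count_tokens(turn) for turn in history)
                + count_tokens(user))

    while history and total() > limit:
        history.pop(0)  # earliest content is dropped first
    return [system, *history, user]

prompt = fit_to_window(
    system="You are helpful.",
    history=["old turn one two three", "recent turn"],
    user="new question",
    limit=10,
)
# The oldest history turn is gone; system prompt and user message survive.
```

Note that the system prompt and the latest user message are never dropped here; in practice most truncation strategies protect those two slots and sacrifice the middle.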
Storage
Persistent memory across calls: a model with storage would remember yesterday's conversation.
Working memory
A fixed-size buffer per request: whatever falls outside the window is invisible.
The context window is working memory, not storage.
Larger windows hold more history but cost more per request, and naive attention scales quadratically with sequence length. Context budgeting is a practical constraint, not a theoretical one.