The Two Numbers Every AI Conversation Runs On

By Carey Whitten— May 5, 2026

The Two Numbers Every AI Conversation Runs On

What a Token Actually Is

The model never sees text. It sees numbers: integer IDs that correspond to sub-word chunks called tokens. Before any text reaches the model, a tokenizer breaks it into these chunks according to a vocabulary the model was trained with. The chunks sit somewhere between characters and words, with boundaries determined by statistical frequency in the training corpus.

In English, the rough conversion is about four characters per token — but that's an average, not a rule. "Unbelievable" might tokenize into three pieces. "AI" is likely one. A word that appears constantly in training data gets its own token; a rare compound gets split. Languages with different scripts tokenize differently, often less efficiently, meaning the same semantic content costs more tokens in some languages than others.

When a vendor quotes you token counts, they are quoting the output of a specific tokenizer applied to specific text. The number is real and measurable, just not the number you'd intuitively estimate from word count or character count.

The Context Window: A Shared Budget

The context window is the total number of tokens a model can process in a single call. Everything the model sees in that call draws from the same pool: the instructions you give it, the conversation history, any documents you've included, and the response it generates. Input and output are not separate allocations. They are competing claims on one budget.

For scale: a short email runs roughly 100–200 tokens. A 20-page policy document might run 5,000–8,000 tokens. A model with a 128,000-token context window sounds enormous until you're feeding it a document corpus and expecting a detailed response, at which point the math gets tight quickly. (Context window sizes and per-token pricing shift as providers update their models and pricing tiers; treat any specific figure as a benchmark to verify, not a number to memorize.)

Cost scales with tokens consumed: the sum of input tokens and output tokens in a given call. Model quality can also degrade as the window fills. Research from multiple providers suggests that models attend better to content near the beginning and end of a long context than to content buried in the middle. A full context window produces a valid call; it may just be a quietly worse one.

What This Means When a CIO Asks "Can We Just Feed It Our Docs?"

You will hear some version of this question. An agency has a document corpus: policy files, acquisition guidance, technical standards. The assumption is that feeding it to a model is roughly analogous to giving someone a filing cabinet. One-time load, then ask questions.

The context window is why that assumption needs unpacking. Each model call is stateless and starts fresh. If you want the model to reference a 200-page document, those pages need to be in the context window of that specific call. If the document corpus is 500,000 tokens and the model's context window is 128,000, the corpus does not fit in a single call. Full stop. The solution to that problem: retrieval, chunking, document selection. That's real engineering work, not a configuration setting.

The quick estimation heuristic: multiply estimated page count by 250 (a rough words-per-page average for dense government documents), then multiply by 1.3 to convert words to tokens. A 40-page RFP is roughly 13,000 tokens. A 200-page technical standard is roughly 65,000. Those numbers tell you whether you're in single-call territory or whether the conversation needs to go somewhere more nuanced.

“

Okta Concept Mapping

The closest IDAM analogy is the session — a bounded, time-limited context in which an authenticated principal can act. The context window is a bounded, capacity-limited context in which a model can process and respond. Both are containers with edges.

The analogy breaks in three places. First, a session doesn't have a shared budget between what you bring in and what you get back. In a context window, every input token is an output token you can't have. Second, a session fails cleanly when it expires — you get an error, you reauthenticate. A context window that's nearly full doesn't throw an exception; it degrades quietly. Third, and most consequentially: sessions carry state across interactions. Context windows don't. Each model call starts with no memory of previous calls unless you explicitly pass prior context back in as tokens. An AE who maps "context window" to "session" will assume the model remembers the last conversation. It doesn't. That assumption, carried into a buyer conversation about agentic workflows, will cost you credibility at exactly the wrong moment.

Before any conversation about feeding agency documents to a model, run the page-count math. Know whether you're in single-call territory. If you're not, you're in a different conversation, one that involves architecture rather than configuration, and knowing that before the CIO asks is the difference between leading the discussion and catching up to it.