The Unit Underneath Everything: Tokens and the Context Window

By Carey Whitten, May 5, 2026

Before anything else, a disambiguation. In your world, a token is an OAuth bearer credential, a SAML artifact, a TOTP code: a unit of authentication. In the AI world, a token is a unit of text. Same word, completely different architecture. The collision matters because you'll hear both in the same meeting, and the speaker will assume you know which one they mean.

What a Token Is

A language model doesn't read text the way you do. It processes text that has been broken into sub-word chunks called tokens. In English, a token averages roughly four characters: not a word, not a letter, but something in between. Common short words ("the," "is," "a") are typically single tokens. Longer or rarer words get split at morpheme-like boundaries: "tokenization" becomes something like two or three tokens, and a rare technical term can run to four or five. The exact split depends on the model's tokenizer. Numbers, punctuation, and non-English text tokenize differently, often less efficiently.

The tokenization step happens before the model sees anything. Text goes in, gets converted to a sequence of integer IDs (one per token), and the model works entirely in that numeric representation. The output — the model's response — gets converted back to text on the way out. You never see the token IDs directly, but they're what's being processed and what's being billed.
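The round trip described above can be sketched with a toy example. This is a minimal illustration, not how any production tokenizer works: the vocabulary, the IDs, and the greedy longest-match rule are all invented here, whereas real tokenizers use learned vocabularies with tens of thousands of entries.

```python
# Toy vocabulary: hypothetical pieces and IDs, invented for illustration only.
TOY_VOCAB = {"token": 0, "ization": 1, "the": 2, " ": 3, "is": 4, "a": 5}
ID_TO_PIECE = {token_id: piece for piece, token_id in TOY_VOCAB.items()}


def encode(text: str) -> list[int]:
    """Split text into known pieces (longest match first) and return their IDs."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest vocabulary piece that matches at this position.
        for piece in sorted(TOY_VOCAB, key=len, reverse=True):
            if text.startswith(piece, pos):
                ids.append(TOY_VOCAB[piece])
                pos += len(piece)
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {pos}")
    return ids


def decode(ids: list[int]) -> str:
    """Convert a sequence of IDs back into text."""
    return "".join(ID_TO_PIECE[token_id] for token_id in ids)


ids = encode("the tokenization")
print(ids)          # [2, 3, 0, 1]
print(decode(ids))  # the tokenization
```

The model only ever sees the list of integers. Here a 16-character string becomes four tokens, which is also the number a vendor would count for billing.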

A rough calibration: 100 tokens is approximately 75 English words. A standard single-spaced page of text, at around 500 words, runs about 650 to 700 tokens. A 10-page policy document lands somewhere between 5,000 and 7,000 tokens, depending on formatting, tables, and whitespace.

The Context Window

Every call to a language model operates within a fixed token budget called the context window. This budget covers everything: the instructions you send, the documents you include, the conversation history, and the model's response. Input and output together, in a single call.

Current production models vary widely. Smaller, faster models might offer 8,000 to 32,000 tokens. Frontier models advertise context windows of 128,000 tokens or more. A handful of models now claim windows exceeding one million tokens — roughly 750,000 words, or about 1,500 pages of text.

When a call exceeds the context window, the model doesn't gracefully summarize what it can't fit. It truncates, or the API returns an error. Either way, the content that didn't fit wasn't processed. Context windows are hard limits, not soft suggestions.
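Because input and output share one budget, it can help to think of the response as getting whatever the input leaves behind. The sketch below shows that subtraction. The function name and the window sizes are hypothetical, chosen only to illustrate the point.

```python
def remaining_output_budget(context_window: int, input_tokens: int) -> int:
    """Tokens left for the response; a negative value means the input alone overflows."""
    return context_window - input_tokens


# Hypothetical sizes, for illustration only.
print(remaining_output_budget(32_000, 20_000))  # 12000
print(remaining_output_budget(8_000, 9_500))    # -1500
```

A negative result is the case described above: the call is either rejected or truncated, and the overflow is never processed.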

The Practical Ceiling

The one-million-token context window is real. Whether it is useful at that scale is a separate matter.

Research on long-context model behavior consistently shows quality degradation as context fills. Models attend well to content near the beginning and end of the context window; material buried in the middle gets processed but weighted less reliably. If you hand a model a 500-page document and ask a question whose answer is on page 247, you're betting on the model's ability to surface information from exactly the part of the context it handles least well.

Cost compounds this. Vendors price per token, and the pricing is linear: double the tokens, double the cost. Input pricing for frontier models has typically run from a fraction of a cent to a few cents per thousand tokens; output tokens are priced higher, often two to five times the input rate, because generation is computationally more expensive than processing. A call that sends 100,000 tokens of context and receives 2,000 tokens of output costs about 25 percent more in input charges than ten calls that each send 8,000 tokens of focused context and receive 200 tokens of output, for the same total output. The math favors precision over comprehensiveness.
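The comparison can be worked through with numbers. The rates below are assumptions for illustration, not any vendor's published pricing: $0.01 per 1,000 input tokens, and an output rate set at three times that.

```python
from decimal import Decimal

# Assumed, illustrative rates in dollars per 1,000 tokens (not real vendor pricing).
INPUT_RATE_PER_1K = Decimal("0.01")
OUTPUT_RATE_PER_1K = Decimal("0.03")  # assumed: three times the input rate


def call_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one call under linear per-token pricing."""
    input_cost = Decimal(input_tokens) / 1000 * INPUT_RATE_PER_1K
    output_cost = Decimal(output_tokens) / 1000 * OUTPUT_RATE_PER_1K
    return input_cost + output_cost


one_large_call = call_cost(100_000, 2_000)
ten_focused_calls = 10 * call_cost(8_000, 200)

print(f"One large call:    ${one_large_call:.2f}")     # $1.06
print(f"Ten focused calls: ${ten_focused_calls:.2f}")  # $0.86
```

Both scenarios produce 2,000 output tokens in total, so the whole difference comes from the extra 20,000 input tokens in the large call.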

Agencies and large organizations evaluating AI for document-intensive workflows — policy review, contract analysis, regulatory compliance — will hit context window economics quickly. A CIO asking "can it read our entire policy library at once?" is asking a question with a technically-yes, operationally-complicated answer.

Estimating Fit and Cost

You don't need a tokenizer to estimate. Use the 75-words-per-100-tokens ratio as a working approximation. Take your document's word count, divide by 75, multiply by 100. A 3,000-word briefing is roughly 4,000 tokens. A 15,000-word RFP is roughly 20,000 tokens. Add your expected response length. Compare to the model's published context window.
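The same estimate as a short sketch. The function name and the 1,000-token response allowance are hypothetical, and the 75-words-per-100-tokens ratio is the working approximation from the text, not a property of any particular tokenizer.

```python
import math

WORDS_PER_100_TOKENS = 75  # working approximation for English text


def estimate_tokens(word_count: int) -> int:
    """Estimate token count from word count: divide by 75, multiply by 100."""
    return math.ceil(word_count * 100 / WORDS_PER_100_TOKENS)


print(estimate_tokens(3_000))   # 4000
print(estimate_tokens(15_000))  # 20000

# Add an assumed response allowance before comparing to a published window.
total_for_rfp = estimate_tokens(15_000) + 1_000
print(total_for_rfp)  # 21000
```

Rounding up errs on the side of overestimating, which is the safer direction when the limit is hard.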

For cost estimation, take your total token count, divide by 1,000, and multiply by the vendor's published per-thousand-token rate. If a vendor charges $0.01 per 1,000 input tokens and you're sending 50,000 tokens per call, that's $0.50 per call. At 1,000 calls per day, that's $500 per day. The numbers are illustrative — actual rates vary by model, vendor, and contract — but the arithmetic is the right arithmetic.
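The arithmetic above, as a sketch. The function name is hypothetical, and the $0.01 rate is the illustrative figure from the text, not a real vendor's price.

```python
from decimal import Decimal


def cost_for_tokens(total_tokens: int, rate_per_1k: Decimal) -> Decimal:
    """Divide the token count by 1,000 and multiply by the per-thousand rate."""
    return Decimal(total_tokens) / 1000 * rate_per_1k


per_call = cost_for_tokens(50_000, Decimal("0.01"))  # illustrative rate
per_day = per_call * 1_000                           # 1,000 calls per day

print(f"Per call: ${per_call:.2f}")  # $0.50
print(f"Per day:  ${per_day:.2f}")   # $500.00
```

Decimal is used here instead of floating point so that small per-call amounts add up exactly when multiplied across thousands of calls.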

When a buyer asks about AI costs, this is the unit the answer will be denominated in.

Okta Concept Mapping

Closest IDAM analog: A session token — a bounded unit of exchange that carries meaning within a system and expires or gets consumed.

Where it holds: Both OAuth tokens and LLM tokens are the fundamental unit of value exchange in their respective architectures. Both have defined scopes and lifetimes. Both are what the system actually operates on.

Where it breaks: An OAuth bearer token is an opaque reference — it points to an authorization decision stored elsewhere, and its length has no bearing on what it costs to process. An LLM token IS the content, in processed form, and length is exactly what drives cost. A longer OAuth token doesn't cost more to validate. A longer LLM context costs linearly more to run. That structural difference is why "token budget" is a real operational constraint in AI in a way it simply isn't in IDAM.

The next time a CAIO mentions token costs or context limits, you have the mechanical picture behind the terminology. Not enough to architect the system — enough to ask the right questions and recognize when the answer is incomplete.

Everything else about how models work builds from here.