Understanding a Model Spec Sheet

By Leigh Garrity, May 8, 2026

Numbers in the sample spec below are illustrative, modeled on published frontier provider documentation as of early 2026. Provider specs change without notice — always verify against current source documentation before a customer conversation.

You've worked through five mental models. This document puts them to work on the artifact you'll actually encounter in a customer conversation: the model spec sheet. Every field below draws on one or more of those mental models, so the vocabulary now attaches to something concrete instead of floating in the abstract.

The Spec Sheet

This is a composite, representative spec — plausible structure and values, not attributed to any single provider.

Field	Value
Model	Apex-1 (composite frontier model)
Context Length	128,000 tokens
Input Pricing	$3.00 / 1M tokens
Output Pricing	$12.00 / 1M tokens
Supported Modalities	Text, image, document (PDF)
Reasoning Mode	Available (extended)
Knowledge Cutoff	March 2025
Architecture	Transformer, decoder-only, dense

Annotated Fields

Context Length: 128,000 tokens — The maximum amount of text the model can hold in view at once, measured in tokens. Everything the model "knows" during a single interaction lives here.

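When it comes up: A prospect asks whether the model can process a long policy document or a full conversation history. Context length is the answer, and the constraint. The tokens-as-currency model applies directly: this number is the budget, and on most APIs it is shared between the input you send and the output the model generates. When it's spent, the request fails or the output is cut off. The model doesn't decide what to forget, so applications that carry long conversations have to trim or summarize earlier content themselves to stay under the limit. Next-token prediction is why this matters mechanically: the model can only predict the next token from what's in the window. What's outside the window doesn't exist to the model.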

Don't confuse with: Session length or conversation history in an IDAM context. A security session has a timeout and a state store. The context window has neither — it's a fixed-size buffer, not a persistent record.

Input Pricing: $3.00 / 1M tokens | Output Pricing: $12.00 / 1M tokens — What you pay per million tokens sent to the model (input) and generated by the model (output). Output costs more because generating tokens is computationally heavier than reading them.

When it comes up: Any conversation about cost modeling, ROI, or "what does this actually cost to run?" The asymmetry between input and output pricing is the number your customer's finance team will ask about. Tokens-as-currency again: every token in the context window costs money, and every token the model generates costs more. A reasoning-mode call generates significantly more output tokens than a standard call, and those reasoning tokens are typically billed at the output rate, so the two effects compound.
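The arithmetic is simple enough to do live with a customer. A minimal sketch using the sample prices above; the token counts, the 8,000 reasoning tokens, and the 50,000 calls per month are hypothetical, and the sketch assumes reasoning tokens are billed at the output rate (check the provider's billing documentation).

```python
INPUT_PRICE_PER_M = 3.00    # $ per 1M input tokens (illustrative spec sheet)
OUTPUT_PRICE_PER_M = 12.00  # $ per 1M output tokens (illustrative spec sheet)


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call under simple per-million-token pricing."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


standard = call_cost(input_tokens=10_000, output_tokens=1_000)
# Same prompt and answer, plus an assumed 8,000 reasoning tokens billed as output.
reasoning = call_cost(input_tokens=10_000, output_tokens=1_000 + 8_000)

print(f"standard:  ${standard:.4f} per call")   # $0.0420
print(f"reasoning: ${reasoning:.4f} per call")  # $0.1380
print(f"at 50,000 calls/month: ${standard * 50_000:,.2f} vs ${reasoning * 50_000:,.2f}")
```

With these assumed numbers the reasoning call costs a little over three times as much, even though the input and the visible answer are identical. That gap comes entirely from extra output tokens.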

Don't confuse with: API rate limits or seat licensing. Token pricing is consumption-based, not seat-based. The cost model is fundamentally different from how your customer thinks about SaaS licensing.

Supported Modalities: Text, image, document (PDF) — The input types the model can process. A multimodal model converts each input type into the same underlying representation before reasoning over it.

When it comes up: A prospect asks whether the model can "read" a scanned contract or analyze a diagram. The answer lives here. Embeddings-as-meaning is the mechanism: images, PDFs, and text all get converted into vectors in the same mathematical space, which is why the model can reason across them. It's not reading the image the way you read it — it's converting the image into a representation that lives in the same space as words.
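What "the same mathematical space" buys you can be shown with cosine similarity. The vectors below are invented four-dimensional toys, not output from any real model (real embeddings have hundreds or thousands of dimensions and come from the model's own encoders), and the model's internal processing is far more involved than a single similarity score. The sketch only shows that once an image and a sentence are both vectors, they can be compared directly.

```python
import math

# Invented toy vectors for illustration only.
text_contract = [0.9, 0.1, 0.0, 0.3]    # text: "signed vendor contract"
image_contract = [0.8, 0.2, 0.1, 0.35]  # scanned image of a signed contract
image_network = [0.1, 0.9, 0.7, 0.0]    # image of a network diagram


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: near 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


print(round(cosine(text_contract, image_contract), 3))
print(round(cosine(text_contract, image_network), 3))
```

With these toy values the contract text and the contract scan score close to 1, while the network diagram scores far lower: the text and the matching image land near each other even though one started as pixels.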

Don't confuse with: File storage or document management. Modality support is about what the model can process as input, not what it can store or retrieve. The model doesn't retain the document once the interaction ends; a later request sees it only if the application sends it again.

Reasoning Mode: Available (extended) — An optional operating mode where the model generates intermediate reasoning steps before producing a final answer. Slower, more expensive, more accurate on complex tasks.

When it comes up: A prospect asks why one model is "smarter" on hard problems, or why a reasoning-capable model costs more. Reasoning-as-extra-tokens is the explanation: the model isn't running a different algorithm — it's generating more tokens. Those tokens are the work. Extended reasoning mode makes that process explicit and longer, which is why it costs more and takes more time.
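Because output tokens are produced one after another, extra reasoning tokens cost time as well as money. A rough sketch; the 80 tokens per second and the half-second first-token delay are hypothetical placeholders, and real figures vary by provider, load, and input length.

```python
def estimated_latency_s(output_tokens: int, tokens_per_second: float = 80.0,
                        first_token_s: float = 0.5) -> float:
    """Rough wall-clock estimate: output tokens are generated sequentially."""
    return first_token_s + output_tokens / tokens_per_second


standard_s = estimated_latency_s(1_000)           # assumed answer length
reasoning_s = estimated_latency_s(1_000 + 8_000)  # same answer plus assumed reasoning tokens
print(f"standard ~{standard_s:.1f}s, reasoning ~{reasoning_s:.1f}s")  # ~13.0s vs ~113.0s
```

The formula is deliberately crude, but the shape holds: latency grows with the number of generated tokens, so a mode that generates more tokens takes longer.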

Don't confuse with: Any IDAM concept. "Reasoning mode" has no analogue in identity infrastructure. When a customer uses this term, they mean the model is doing more deliberate, step-by-step generation — not that it's applying a policy engine or a rules framework.

Knowledge Cutoff: March 2025 — The date after which the model has no training data. Events, documents, and facts that postdate this cutoff are unknown to the model unless provided in the context window.

When it comes up: Any time a prospect assumes the model "knows" current information: recent CVEs, updated regulations, new agency guidance. This field is where grounding-not-trusting becomes a practical requirement, not a philosophical one. The model can still generate plausible-sounding text about post-cutoff events, because next-token prediction has no built-in stop at the cutoff. It keeps producing likely-sounding continuations, and that is where confabulation comes from. Grounding (injecting current, authoritative content into the context window) is the architectural answer to this field.
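A minimal sketch of grounding, assuming the retrieval step (finding the right sources) has already happened. `needs_grounding`, `build_grounded_prompt`, the source format, and the instruction wording are all hypothetical. Treating the whole cutoff month as "known" is also an assumption. Grounding reduces confabulation; it doesn't eliminate it.

```python
from datetime import date

CUTOFF = (2025, 3)  # "March 2025" from the illustrative spec sheet, as (year, month)


def needs_grounding(topic_date: date, cutoff: tuple[int, int] = CUTOFF) -> bool:
    """True when the topic postdates the cutoff, so the weights cannot know it."""
    return (topic_date.year, topic_date.month) > cutoff


def build_grounded_prompt(question: str, sources: list[dict]) -> str:
    """Put authoritative excerpts into the context window ahead of the question."""
    blocks = [f"[{s['id']}] ({s['published']}) {s['text']}" for s in sources]
    return (
        "Answer using only the sources below. "
        "If they do not cover the question, say so.\n\n"
        + "\n\n".join(blocks)
        + f"\n\nQuestion: {question}"
    )


sources = [{"id": "S1", "published": "2026-01-15",
            "text": "Placeholder excerpt from a current agency guidance document."}]
if needs_grounding(date(2026, 1, 15)):
    print(build_grounded_prompt("What changed in the latest guidance?", sources))
```

The design choice worth naming to a customer: the facts travel in the request, not in the model. Whoever controls which sources get retrieved controls what the model reasons from.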

Don't confuse with: A cache expiration or a stale token. The knowledge cutoff isn't a refresh problem with a TTL solution. The model's weights are fixed at training time. Updating them requires retraining or fine-tuning — not a configuration change.

Architecture: Transformer, decoder-only, dense — The structural family this model belongs to. Decoder-only means the model generates output sequentially, left to right, one token at a time. Dense means all parameters activate on every inference call.
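To make "decoder-only" concrete, here is a toy autoregressive loop. The lookup table is invented and far simpler than a real model, which scores every token in its vocabulary using the entire context. What the toy shares with a real decoder is the shape of the loop: predict one token, append it, repeat until an end token or an output cap.

```python
# Invented lookup table standing in for the model (greedy, last-token only).
NEXT = {"<start>": "the", "the": "budget", "budget": "is", "is": "spent",
        "spent": "<end>"}


def predict_next(tokens: list[str]) -> str:
    """Stand-in for one forward pass: return the single most likely next token."""
    return NEXT.get(tokens[-1], "<end>")


def generate(prompt: list[str], max_new_tokens: int) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        nxt = predict_next(tokens)  # conditioned only on what is already there
        if nxt == "<end>":
            break
        tokens.append(nxt)          # left to right, one token at a time
    return tokens


print(generate(["<start>"], max_new_tokens=10))  # ['<start>', 'the', 'budget', 'is', 'spent']
print(generate(["<start>"], max_new_tokens=2))   # ['<start>', 'the', 'budget']
```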

When it comes up: Rarely in a sales conversation, but occasionally in a technical deep-dive with a customer's AI team. "Decoder-only" is the architectural description of a model built to predict the next token; it's the next-token prediction model, stated in spec-sheet language. A sparse (mixture-of-experts) model, by contrast, routes each token through only a subset of its parameters. Dense vs. sparse affects cost and latency but not the fundamental behavior: both generate one token at a time.
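The dense vs. sparse distinction can be sketched as a routing step. This is a toy that assumes the "pick the top-k experts, then softmax their scores" gating used by some mixture-of-experts designs; real routers are learned layers operating on vectors, and the experts and router scores here are invented. The point it shows: only k of the n experts execute for a given token, which is where the cost and latency difference comes from. A dense layer would run all of them.

```python
import math
from typing import Callable


def top_k_route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]  # subtract max for stability
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: float, experts: list[Callable[[float], float]],
                router_logits: list[float], k: int = 2) -> float:
    """Sparse layer: only the k selected experts execute for this token."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Eight invented scalar "experts"; a dense layer would run every one for every token.
experts = [lambda x, s=s: x * s for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.2]  # hypothetical router output for one token

print(top_k_route(logits, k=2))  # selects experts 1 and 4 (the two highest logits)
print(moe_forward(1.0, experts, logits))
```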

Don't confuse with: Network architecture in the infrastructure sense. This isn't about topology or routing. It's a description of how the model's parameters are organized and activated.

If you remember nothing else from this section: the spec sheet is a budget document. It tells you the size of the working memory, the cost per unit of work, and the date after which the model is flying blind without help.

Vocabulary Collision Zones

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
Token	The smallest unit of text the model processes; also the billing unit	OAuth bearer token; session token	An LLM token is a text fragment (roughly ¾ of a word). An OAuth token is a credential. They share a name and nothing else. Using "token" without qualification in a mixed conversation will cause confusion every time.
Context	The full content currently in the model's active window — everything it can "see"	Security context; authentication context; session context	In IDAM, context is metadata about a principal's state. In LLMs, context is the raw content the model is processing. One is about who; the other is about what.
Grounding	Injecting authoritative, current content into the context window so the model reasons from facts rather than training data	Trust framework; trust anchor; root of trust	In IDAM, grounding means establishing a trust root. In LLMs, grounding means supplying facts. The word implies reliability in both cases, but the mechanism is completely different — one is cryptographic, the other is editorial.

If you remember nothing else from this section: when AI and IDAM share a word, the collision is the lesson. Name it out loud in the conversation before it creates confusion.

What the Spec Sheet Doesn't Tell You

The spec sheet tells you what the model can process, how much it costs, and where its knowledge ends.

It says nothing about how your customer's data gets in front of the model. How their internal tools connect to it, how access to those tools gets controlled, what happens when the model needs to act on a system the customer hasn't explicitly authorized — none of that is in here.

That's the next question. The spec sheet ends at the model's boundary. Everything on the other side of that boundary — the retrieval systems, the tool integrations, the authorization decisions — is where identity infrastructure enters the picture.

That's where we're going next.

Source Index

Concept	Source Article	Section
Next-token prediction	AI Foundations: How Language Models Actually Work	"The Mechanism"
Tokens-as-currency	AI Foundations: Tokens and Cost	"What You're Paying For"
Embeddings-as-meaning	AI Foundations: How Models Represent Meaning	"From Words to Vectors"
Grounding-not-trusting	AI Foundations: What Models Know and Don't Know	"The Cutoff Problem"
Reasoning-as-extra-tokens	AI Foundations: Reasoning Models	"What Extended Thinking Actually Does"
Context window mechanics	AI Foundations: How Language Models Actually Work	"The Window"
Modality processing	AI Foundations: Multimodal Models	"One Space, Many Inputs"