The Model Asks. The Harness Acts. What "Tool Calling" Actually Means.

By Leigh Garrity— May 8, 2026

The Model Asks. The Harness Acts. What "Tool Calling" Actually Means.

The Two Actors

Comparison approach: This is a trait-led parallel profile. Both actors are described across the same fields, in the same order. The comparison section maps specific dimensions — control, credentials, auditability, failure surface — to each actor. With two subjects, this is more useful than clustering, because the traits are exactly what a buyer's security team will ask about.

Subject 1: The Model Side

What it is: The actor that produces a structured request instead of prose, and stops there.

What it does: When a language model is configured with tools, it can respond to a prompt not by writing a paragraph but by emitting something like: call search_email, arguments: {query: "Q1 budget", date_range: "last 90 days"}. That's it. The model identified that a tool call was appropriate, selected the right tool from the available list, and specified the arguments. It produced an output. It did not run anything.

The model operates entirely within its inference context. It sees the prompt, the conversation history, the tool definitions it's been given, and whatever results previous tool calls have returned. It reasons over that context and produces the next output, which might be prose or might be another tool call request. Either way, it's output. The model has no network access, no credentials, no ability to reach outside its own inference process.

Who's behind it / where it comes from: The model is the AI provider's product. OpenAI's GPT-4o, Anthropic's Claude, Google's Gemini — these run on the provider's infrastructure, under the provider's terms of service, with the provider's safety systems applied. The deploying organization doesn't control the model's weights or its reasoning process. They control what tools the model is told about, what context it receives, and what it's asked to do. The model itself is not theirs.

What makes it distinct: The model has opinions, not keys. It can tell you it wants to search your email. It cannot search your email. No credentials, no network access, nothing except the ability to produce a very specific kind of output. That output looks like an instruction, which is why people assume the model is doing something. It isn't. It's asking.

“

IDAM Parallel: Context Window ≠ Session

The model's context window holds conversation history, tool results, and system instructions — it looks like session state. But it isn't. A session in the Okta sense carries authenticated identity, has a defined lifetime, and can be revoked. The model's context carries no authenticated identity of its own, has no native expiration mechanism, and cannot be revoked mid-inference. The analogy helps explain why context matters; it breaks the moment someone asks "so who is the model authenticated as?" The answer is: the model isn't authenticated as anyone. The harness is.

Subject 2: The Harness Side

What it is: The actor that receives the model's structured request, executes it with real credentials, and returns the result to context.

What it does: When the model emits a tool call request, something has to receive it. That something is the harness — the orchestration layer sitting between the model and the systems the model wants to interact with. The harness reads the model's request, validates it against whatever policy it's been configured with, calls the actual API or function using actual credentials, and feeds the result back into the model's context. Then the loop continues.

Execution lives in the harness. When an AI agent "searches your email," what actually happened is: the harness received a request from the model, authenticated to your email system using a service account or OAuth token, ran the search, and returned the results. The model saw the results. The model didn't do the search.

Who's behind it / where it comes from: The harness is the deploying organization's responsibility. It might be a framework (LangChain, LlamaIndex, AutoGen), a platform (a vendor's agent runtime), or custom code someone on the engineering team wrote. Wherever it comes from, the organization operating it owns its behavior. The harness runs in the deploying org's environment, holds the deploying org's credentials, and makes API calls that appear in the deploying org's audit logs. The AI provider has no visibility into what the harness does.

What makes it distinct: The harness is where the actual risk lives. It has credentials. It makes real API calls that create real audit log entries and real side effects. When a security team asks "what can this AI agent do to our systems?" — that's a harness question. The answer lives in the harness configuration: which tools are registered, what credentials they use, what scope those credentials carry, and whether there's any policy layer between the model's request and the harness's execution.

“

IDAM Parallel: The Harness as Machine Identity

The harness behaves like a service account: it authenticates to downstream systems, carries scoped credentials, and acts on behalf of something else (in this case, the model's requests). Okta's service account governance and machine identity features map directly here — the harness is exactly the kind of non-human actor those controls were designed for. The analogy holds well in a buyer conversation right up until someone asks about dynamic credential issuance mid-loop. Most harness implementations today use static credentials or OAuth tokens issued at startup, not per-request credentials. That gap is real and worth naming.

Comparison: What Each Actor Controls

The five dimensions that matter in a buyer conversation are: where execution lives, who controls it, where credentials live, what it produces, and what breaks when something goes wrong.

Where execution lives

The model produces requests. The harness executes them. The model's output is data. The harness's output is action. Conflating these two things is what leads buyers to ask "how do we prevent the AI from accessing our HR system?" when the actual question is "how do we configure the harness so it won't execute tool calls against our HR system?"

Who controls it

The model is controlled by the AI provider. The deploying organization controls the harness. For governance conversations, this matters: the organization cannot audit the model's reasoning process, cannot inspect its weights, and cannot modify its behavior beyond the inputs they provide. They can audit the harness completely — every tool call it executed, every credential it used, every result it returned. Governance attaches to the harness, not the model.

Where credentials live

The model has no credentials. None. It doesn't know what credentials the harness is using, and it can't request new ones. The harness holds all credentials — API keys, OAuth tokens, service account secrets — and uses them on behalf of the model's requests. Credential scope, rotation policy, and least-privilege configuration are entirely harness-side concerns. A model that's been told it can call a tool will always ask to call it; whether the harness actually has permission to do so is a separate question.

What it produces

The model produces a structured request: a tool name and arguments. The harness produces a real-world side effect: a database query, an email sent, a ticket created, a file modified. The model's output is reversible in the sense that you can ignore it. The harness's output often isn't.

Failure modes

Model failures look like: wrong tool selected, hallucinated argument values, tool call when prose was appropriate, infinite loop of requests. Harness failures look like: authentication error, rate limit exceeded, permission denied, network timeout, downstream API returning unexpected schema. These failure modes require different responses. Model failures are often addressed by improving the system prompt or tool definitions. Harness failures are infrastructure and identity problems — they require the same operational response as any other service account or API integration failure.

“

IDAM Parallel: Authorization Policy Lives at the Harness

In a zero trust architecture, policy enforcement happens at the resource, not at the requester. The harness is the enforcement point for AI tool calls — it's where you decide whether to execute what the model asked for, using what credentials, against what systems. Okta's Fine-Grained Authorization and similar policy engines can sit in front of harness execution to enforce least-privilege at the tool level. The model can request anything it's been told is available; the harness decides what actually runs. In a buyer conversation, that reframe moves the governance question from "how do we control the AI?" to "how do we configure our enforcement layer?" — a question the security team already knows how to answer.

The Loop, Plainly Stated

Model emits request. Harness executes with credentials. Result returns to context. Repeat.

That's the agentic AI loop. Almost every implementation of "AI agents" in production today reduces to some version of this. The sophistication is in the model's reasoning — how it decides which tool to call, how it interprets results, how it chains multiple calls together toward a goal. The mechanics are almost disappointingly straightforward: structured output, function call, result injection, next inference.

Every iteration of the loop is a credentialed action taken by the harness. A ten-step agent task is ten API calls, each one authenticated, each one auditable, each one subject to the access controls on the harness's credentials. The model's reasoning is opaque. The harness's actions are not.

How to Say This in the Field

Don't say	Do say	Why it matters
"The AI accessed your email"	"The harness called the email API using a service account"	Execution lives in the harness — that's where your audit trail is
"The model called the API"	"The model requested a tool call; the harness ran it"	Conflating the two actors obscures where credentials actually live
"The AI has access to your systems"	"The harness has credentials scoped to specific systems"	Access is a harness configuration property, not a model property
"The AI decided to do that"	"The model emitted a tool call request; the harness executed it"	Decision and execution are separate layers with separate governance
"The model is running in your environment"	"The harness runs in your environment; the model runs at the provider"	These two actors often live in different trust boundaries
"The AI agent did X"	"The harness executed X on behalf of the model's request"	"Agent" is ambiguous; naming the actor is more precise
"The model has credentials"	"The harness holds credentials; the model never sees them"	Credential scope is a harness configuration question, full stop
"You can't control what the AI does"	"You control what the harness is configured to execute"	Reframes AI governance as an infrastructure problem, which is solvable
"The model knows your data"	"The model sees what the harness returns to context"	The model's knowledge is bounded by what the harness returns
"The AI is making API calls"	"The harness is making API calls based on the model's requests"	Clarifies where rate limits, auth failures, and audit entries appear

What This Buys You

The buyer asking "how do we prevent the AI from doing something it shouldn't?" is asking a harness question. So is the buyer asking "how do we audit what the AI did?" and "whose credentials is the AI using?" Every security, compliance, and identity question in this space lands on the harness side of the architecture.

The model side matters for capability conversations — what the agent can reason about, how reliably it selects the right tool, how it handles ambiguous instructions. But the security team isn't there for capability conversations.

Sellers who walk in knowing this distinction aren't just sounding credible. They're asking the right questions before the security team does, which is a different kind of credible entirely.