Function Calling vs. XML Tool Calls: The Portability Argument You've Already Won Once

By Leigh Garrity— May 8, 2026

Function Calling vs. XML Tool Calls: The Portability Argument You've Already Won Once

When an AI agent needs to do something — look up a user record, check an entitlement, call an API — it can't just do it. It has to tell the software running it what it wants to do. How that communication happens is an architectural choice, and it's one your buyers are making right now, often without realizing the lock-in implications.

There are two approaches. Native function calling is a first-class API feature: you pass a schema of available tools when you make the model call, and the model returns a structured object when it decides to invoke one. XML-style tool calls are a harness convention: you describe the tools in the system prompt as text, and the model emits tool invocations as text in its output, which your code then parses. The first is cleaner in simple cases. The second is what practitioners have largely converged on for production agent systems. The reason is the same reason SAML exists.

What Each One Actually Does

Native function calling works at the API layer. When you call a model that supports it — OpenAI's GPT-4o, Anthropic's Claude, Google's Gemini — you include a tools parameter alongside your messages. That parameter contains a JSON schema describing each available tool: its name, what it does, and what arguments it accepts. The model reads that schema, decides whether to call a tool, and if so, returns a structured tool_call object in the response rather than (or alongside) a text response. Your harness reads the object, executes the function, passes the result back to the model, and the conversation continues.

The schema format is provider-specific. OpenAI's tools array looks different from Anthropic's tools block, which looks different from Gemini's functionDeclarations. They accomplish the same thing. They are not interchangeable.

XML-style tool calls work at the prompt layer. You describe the available tools in the system prompt — often using XML-like tags to structure the description — and you instruct the model to invoke tools by emitting a specific syntax in its text output. Something like:

<tool_call>
{"name": "get_user_entitlements", "arguments": {"user_id": "u-4821"}}
</tool_call>

Your harness watches the model's text stream, detects that pattern, extracts the JSON, executes the function, and injects the result back into the conversation. The model never interacts with a special API parameter. It just follows instructions, like any other instruction-following task.

This approach works on any model capable of following instructions. It doesn't require native tool support. It doesn't require the model to know it's being used in an agent framework. It requires only that the model can produce consistent structured output when asked to.

“

• Native function calling: The model receives a JSON schema of available tools via the API and returns a structured tool-call object when it decides to invoke one; the format is provider-specific and requires API-level support. • XML-style tool calls: The model is instructed via the system prompt to emit tool invocations as tagged text in its output; the harness parses that text to execute the call; works on any instruction-following model regardless of provider.

How the Mechanism Runs

Native function calling, step by step:

Your harness sends an API request to, say, OpenAI. The request includes the conversation messages and a tools array containing JSON schemas for three functions: get_user_profile, check_policy, revoke_session.
The model determines it needs to call get_user_profile before it can answer the user's question.
The API response comes back with finish_reason: "tool_calls" and a tool_calls array containing the function name and arguments the model wants to pass.
Your harness reads that structured response, executes get_user_profile with those arguments, gets a result.
Your harness sends another API request with the result appended as a tool role message.
The model continues.

The tool call is never visible as text. It lives in a structured field in the API response. If something goes wrong — wrong arguments, unexpected behavior — you're debugging a JSON object in an API response, not a text stream.

XML-style tool calls, step by step:

Your harness sends an API request. The system prompt includes natural-language descriptions of the same three functions, formatted with XML tags, plus an instruction: "When you need to call a tool, emit it in this format: <tool_call>{"name": "...", "arguments": {...}}</tool_call>."
The model determines it needs to call get_user_profile.
The model's text response includes the XML-tagged invocation inline with (or instead of) its reasoning.
Your harness detects the tag, extracts and executes the call, injects the result.
The model continues.

The tool call is visible in the text stream. You can log it, inspect it, replay it. When something goes wrong, you read the model's output like a transcript.

That debuggability difference is not trivial. In a multi-step agent task that touches identity systems, authorization checks, and external APIs in sequence, the ability to read what the model decided to do and in what order is the difference between a two-hour debugging session and a two-day one.

“

Okta Concept Mapping

The IDAM parallel: proprietary federation formats, pre-SAML.

Before SAML ratified a common assertion format, every major identity vendor shipped their own federation protocol. Netegrity had one. Oblix had one. Microsoft had one. If you built your SSO infrastructure on any of them, you were coupled to that vendor's format. Migrating meant rebuilding the integration layer from scratch.

Native function calling is the same pattern. OpenAI's tool schema format, Anthropic's tool schema format, and Gemini's function declaration format accomplish the same thing and are not interchangeable. An agent stack built around OpenAI's native function calling requires rework to run on Anthropic, and vice versa. XML-style tool calls decouple the harness from the provider: swap the model endpoint, keep the parsing logic.

Where the analogy holds: The portability argument is identical. Proprietary formats create switching costs that compound over time as the integration layer deepens. The vendors who bet on proprietary federation formats lost. The portability argument won.

Where it breaks: SAML is a ratified standard with a governance body, interoperability test suites, and formal conformance requirements. XML-style tool calls are a practitioner convention. There is no OASIS equivalent for agent tool syntax, no certification process, no guarantee that two implementations using "XML-style" tool calls will be compatible with each other. The portability is real, but it's informal portability — it holds because practitioners have converged on similar patterns, not because a standard enforces it. That's a meaningful difference if your buyer is a federal agency that procures against standards.

When This Comes Up in Your Accounts

The scenario where this matters most in public sector: a buyer evaluating AI infrastructure across multiple classification levels, or across multiple agency components with different approved model lists.

A civilian agency running unclassified workloads might have access to commercial frontier models. A component handling controlled unclassified information might be limited to a smaller set of approved options. A classified environment might be running a self-hosted open-weight model with no commercial API at all. If the agency wants a single agent framework that works across all three environments, native function calling is a problem. The tool-calling layer would need to be rebuilt for each provider's API format.

XML-style tool calls make the agent harness model-agnostic. The same parsing logic, the same tool descriptions in the system prompt, the same result-injection pattern. They work whether the model endpoint is GPT-4o, Claude 3.7, or a locally hosted Llama derivative. The only thing that changes is the API call itself.

You'll hear this framed as a "multi-model strategy" or "model-agnostic architecture." The technical substance underneath that framing is exactly this: where does the tool-calling logic live, and is it coupled to a specific provider's API format?

When a CIO or enterprise architect asks whether their AI investment will survive a model transition, and they will ask this because they watched their predecessors get locked into on-premises infrastructure they couldn't exit, this is the mechanism that determines the answer.

“

• Portability: XML-style tool calls work on any instruction-following model because they operate at the prompt layer, not the API layer; native function calling requires provider-specific schema formats that don't translate across vendors. • Debuggability: XML-style tool calls are visible in the model's text stream and can be logged and replayed directly; native tool calls live in structured API response fields, which makes tracing multi-step agent behavior more opaque.

The Practical State of Play

Practitioners haven't abandoned native function calling entirely. For simple, single-provider deployments where you control the model endpoint and don't anticipate migration, native function calling is cleaner: the schema validation is tighter, the structured response is easier to parse reliably, and you don't need to write a text parser that handles edge cases in the model's output format.

But in production agent systems with meaningful complexity — multiple tools, multi-step reasoning, environments where the model might change — the community has largely moved toward XML-style. Frameworks like LangChain and LlamaIndex support both, and their documentation increasingly treats XML-style as the more portable default. Anthropic's own published guidance on building agents with Claude emphasizes prompt-layer tool description patterns, even though their API supports native tool use.

The field is moving fast enough that "practitioner consensus" is a claim worth hedging. What's accurate as of mid-2026 is that the portability argument has won the architectural conversation, even if native function calling remains common in simpler implementations. The same way SAML won the federation argument without eliminating proprietary SSO entirely.

Your buyers who've been through an identity infrastructure migration understand this dynamic viscerally. They know what it costs to rebuild an integration layer that was designed for one vendor's format. The AI version of that conversation is happening now, before the lock-in is baked in. That's the moment to be useful.