Last week, within a day of each other, Anthropic released self-hosted sandboxes and MCP tunnels for Claude Managed Agents. OpenAI announced a partnership with Dell to bring Codex into on-premises enterprise infrastructure. Both respond to the same brute constraint: regulated organizations cannot send sensitive data outside their boundary.
Each vendor split the agent to accommodate that boundary. And each split is incomplete in a way that reveals where control actually lives.
Anthropic's architecture is the more legible of the two. Tool execution moves inside the customer perimeter. Code runs locally, files stay local, network egress stays under the customer's control. But the orchestration plane, the agent loop itself, stays on Anthropic's infrastructure. Context management, error recovery, the sequence of decisions the agent makes. All of that lives outside the building. The customer shapes what the agent is told to do. Anthropic hosts where the reasoning happens. Who bears accountability for a given decision sits somewhere in the gap between those two facts.
OpenAI's Dell partnership is harder to read. The announcement describes connecting Codex to Dell's on-premises AI Data Platform. Whether model inference actually runs on Dell hardware, or whether Dell serves as a data context layer that Codex still calls via cloud API, is not specified. The word "explore" appears in the description of deeper integration. No compliance certifications were disclosed. No GA date. One architecture shipped a defined split with a known gap. The other announced an intent without defining the split at all.
Both are incomplete. The specific shape of each incompleteness matters.
Splitting an agent across a trust boundary splits the audit trail. Anthropic's Claude Console logs decisions on its side. The customer's security tooling covers execution on theirs. Reconciling the two into a single accountable record requires three separate logging layers working in concert. For OpenAI's Codex, telemetry is opt-in and off by default. Whether the Dell configuration routes telemetry locally or to an OpenAI-controlled collector remains unspecified.
Neither architecture co-locates governance with reasoning. The controls and the consequential decisions live on opposite sides of the perimeter. So: an agent takes an action that needs to be stopped or reversed. The customer can see what happened on their execution plane. But the reasoning that led to that action, the context the agent was holding, the error-recovery logic that chose this path over another, lives on the vendor's infrastructure. The person who needs to intervene can cut the connection but has no way to inspect the decision or redirect it. Codex offers an approval-before-execution model that gates individual actions. Anthropic provides per-session controls and a trust hierarchy. Useful mechanisms, both of them. They still leave the override authority and the reasoning state on opposite sides of a network boundary.
Compliance frameworks covering healthcare data residency in 2026 now address where PHI is stored, processed, queried, and where AI models run inference. ITAR effectively requires air-gapped deployment. Regulators are now asking where the thinking happens.
Lock-in follows the same fault line. If orchestration stays with the vendor, switching vendors means migrating the reasoning layer, which is far harder to extract than the execution environment. Every workflow the agent learned belongs to whoever owns the orchestration plane. The data perimeter kept your data inside the building. It said nothing about your agent's accumulated judgment.
The data stayed local. Accountability walked out with the architecture.
Things to follow up on...
-
Anthropic's new compliance integrations: Anthropic announced 28 security and compliance integrations on May 25, piping Claude Compliance API data into tools like Microsoft Purview and CrowdStrike, though coverage still applies to the control plane rather than the reasoning plane.
-
Reliability lags capability gains: A recent Princeton-affiliated paper proposes twelve metrics for agent reliability and finds that across 14 models and 18 months of releases, reliability improvements trail capability progress by a significant margin, which compounds the audit problem when agents reason outside the customer boundary.
-
Agent governance on Gartner's radar: The 2026 Gartner Hype Cycle places agentic AI at Peak of Inflated Expectations, and notably surfaces governance, security, and FinOps for agentic AI as emerging technology profiles alongside core agent capabilities.
-
The telemetry gap compounds: For on-premises agent deployments specifically, analysts have flagged that neither vendor has published reference architectures documenting how telemetry and authentication will work when Codex connects to internal repositories through Dell infrastructure.

