Market Pulse

Market Pulse

The Data Perimeter Split Both Architectures in Half

Last week, Anthropic and OpenAI both responded to the same brute constraint: regulated enterprise data cannot leave the building. Each vendor split the agent to accommodate that boundary. Anthropic moved tool execution inside the customer perimeter but kept the reasoning loop on its own infrastructure. OpenAI announced an on-premises partnership with Dell without specifying where inference actually runs. Neither architecture co-locates governance with the consequential decisions. When something goes wrong, the person who needs to intervene can kill the connection but has no way to inspect or redirect the reasoning that got there.
The Data Perimeter Split Both Architectures in Half
Last week, Anthropic and OpenAI both responded to the same brute constraint: regulated enterprise data cannot leave the building. Each vendor split the agent to accommodate that boundary. Anthropic moved tool execution inside the customer perimeter but kept the reasoning loop on its own infrastructure. OpenAI announced an on-premises partnership with Dell without specifying where inference actually runs. Neither architecture co-locates governance with the consequential decisions. When something goes wrong, the person who needs to intervene can kill the connection but has no way to inspect or redirect the reasoning that got there.

Research Spotlight
Towards a Science of AI Agent Reliability
Structured tasks show moderate gains; open-ended tasks show almost none. Single success metrics hide the gap.
The Princeton team running the Science of AI Agent Evaluation program, with a related paper accepted to ICLR 2026.
Research Spotlight
Consistency Amplifies: How Behavioral Variance Shapes Agent Accuracy
Interpretation accuracy matters more than execution consistency. Running it again won't help if the reading was wrong from the start.
Only if the underlying assumption is correct. Otherwise you've just automated the same mistake at higher confidence.
Research Spotlight
When Agents Disagree With Themselves: Measuring Behavioral Consistency in LLM-Based Agents
69% of trajectory divergence occurs at step two. Early commitments cascade through the rest of execution.
No. Behavioral variance persists regardless, suggesting the instability runs deeper than sampling randomness.
Research Spotlight
ReliabilityBench
Pass-at-one scores can run more than double the actual multi-trial consistency rate.
A demo that works is not evidence of a system that works repeatedly. The gap is measurable and large.
Pricing Shift

Per-seat pricing dropped from 21% to 15% of SaaS companies in twelve months. Hybrid models surged to 41%. Everyone agrees the old model is dying. Nobody agrees what replaces it.
Intercom charges $0.99 per resolution. HubSpot, since April 2026, undercuts at $0.50. Both say "resolved conversation." Intercom means the customer stopped replying. HubSpot means no human handoff within 72 hours. Same noun, wildly different measurements, entirely different incentive structures.
Pricing has outrun the definitions it depends on.
Further Reading




Past Articles

A validation rule in Salesforce fires when a rep moves a deal to "Closed Lost" — won't save without a reason. Nobody cal...

A support platform prices its AI agents per automated resolution: no escalation, ticket not reopened within 72 hours. Cl...

Researchers found tens of thousands of exposed OpenClaw agent instances and over a million compromised API tokens. The e...

When AWS customers reportedly greeted OpenAI's arrival on Bedrock with a shrug, the obvious read was that OpenAI had los...
