Every spec has an implicit reader. The OpenTelemetry GenAI semantic conventions are worth reading closely, less for what they define than for who they assume is looking at the data.

Take agent identity. The spec defines gen_ai.agent.id, gen_ai.agent.name, and as of v1.40, gen_ai.agent.version. These tell you which software artifact ran. A developer tracing a failed workflow back to a specific agent build gets exactly what they need. An auditor looking at the same run is trying to answer a different set of questions: who authorized this agent to act, under what constraints, and whether those constraints held. The spec identifies the software; the delegation that put the software in motion goes unaddressed.
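A minimal sketch of that asymmetry, using plain dictionaries rather than an OpenTelemetry SDK. The three `gen_ai.agent.*` keys are the ones named above; every `hypothetical.*` key and all values are invented for illustration and are not part of the conventions.

```python
# Attribute keys the conventions define for agent identity (as named above).
SPEC_AGENT_KEYS = {"gen_ai.agent.id", "gen_ai.agent.name", "gen_ai.agent.version"}

# Hypothetical keys (NOT in the spec) standing in for the auditor's questions:
# who authorized the agent, under what constraints, and whether they held.
AUDIT_KEYS = {
    "hypothetical.delegation.authorized_by",
    "hypothetical.delegation.constraints",
    "hypothetical.delegation.constraints_held",
}


def unanswered_audit_keys(span_attributes: dict) -> list[str]:
    """Return the audit keys that the span does not carry."""
    return sorted(key for key in AUDIT_KEYS if key not in span_attributes)


# A span populated with everything the spec offers for agent identity.
# The values are made up.
span_attributes = {
    "gen_ai.agent.id": "agent-123",
    "gen_ai.agent.name": "invoice-reconciler",
    "gen_ai.agent.version": "2.4.1",
}

for key in unanswered_audit_keys(span_attributes):
    print("no attribute for:", key)
```

Even a fully populated span leaves all three audit questions open, which is the point: the developer's question is answered, the auditor's is not asked.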

Tool calls follow the same pattern. gen_ai.tool.call.arguments and gen_ai.tool.call.result capture what went into a tool and what came back. The spec even distinguishes tool types: function for client-side execution, extension for agent-side API calls, datastore for data access. That taxonomy is genuinely useful. It lets a developer reading a trace understand immediately whether a tool ran locally, called an external API, or queried a data store, which matters a lot when you're diagnosing where a workflow broke. There's no attribute, though, for whether the tool call was permitted by policy, approved by a person, or bounded within a stated permission scope. The spec records what happened. Permission and authorization live outside its current vocabulary.
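To make the contrast concrete, here is a rough sketch of a tool-execution span as a plain dictionary. The tool name, arguments, and the `hypothetical.authz.` prefix are assumptions for illustration; only the `gen_ai.*` keys and the three tool types come from the conventions.

```python
import json

# A tool-execution span as a developer might see it. Values are made up.
tool_span = {
    "gen_ai.operation.name": "execute_tool",
    "gen_ai.tool.name": "refund_payment",  # example tool, not from the spec
    "gen_ai.tool.type": "extension",
    "gen_ai.tool.call.arguments": json.dumps({"order_id": "A-1001", "amount": 40}),
    "gen_ai.tool.call.result": json.dumps({"status": "ok"}),
}

# The tool-type taxonomy described above.
TOOL_TYPE_MEANING = {
    "function": "client-side execution",
    "extension": "agent-side API call",
    "datastore": "data access",
}


def describe_execution(tool_type: str) -> str:
    """Answer the developer's question: where did this tool run?"""
    return TOOL_TYPE_MEANING.get(tool_type, "unknown")


def has_authorization_evidence(attrs: dict) -> bool:
    """Answer the auditor's question. The 'hypothetical.authz.' prefix is an
    invented placeholder; the spec defines no such attributes."""
    return any(key.startswith("hypothetical.authz.") for key in attrs)


print(describe_execution(tool_span["gen_ai.tool.type"]))
print(has_authorization_evidence(tool_span))
```

The first lookup succeeds from the span alone. The second has nothing to look at unless an organization layers its own attributes on top.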

Token usage is well covered: input tokens, output tokens, cache-read and cache-creation variants, plus a required duration metric. An operator can reconstruct what a workflow cost after the fact. What they can't do from the spec alone is set a per-task budget, detect when spending approaches a threshold, or find evidence that a cost limit was enforced. The attributes support cost observation; cost governance requires infrastructure the spec doesn't describe.
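The difference between observing and governing cost can be sketched in a few lines. The token-count keys below are the spec's usage attributes; the prices, the `TaskBudget` class, and its thresholds are all assumptions made up for this example.

```python
from dataclasses import dataclass

# Assumed prices in micro-USD per token (illustrative, not from any provider).
PRICE_MICRO_USD = {"input": 3, "output": 15}


def observed_cost(attrs: dict) -> int:
    """Cost observation: price a finished span from its recorded token counts."""
    return (
        attrs.get("gen_ai.usage.input_tokens", 0) * PRICE_MICRO_USD["input"]
        + attrs.get("gen_ai.usage.output_tokens", 0) * PRICE_MICRO_USD["output"]
    )


@dataclass
class TaskBudget:
    """Hypothetical per-task budget; nothing like this is defined by the spec."""
    limit: int              # micro-USD
    warn_percent: int = 80  # warn once this share of the limit is spent
    spent: int = 0

    def record(self, cost: int) -> str:
        """Add a cost and report where spending stands against the limit."""
        self.spent += cost
        if self.spent > self.limit:
            return "exceeded"
        if self.spent * 100 >= self.limit * self.warn_percent:
            return "warning"
        return "ok"


spans = [
    {"gen_ai.usage.input_tokens": 1000, "gen_ai.usage.output_tokens": 200},
    {"gen_ai.usage.input_tokens": 500, "gen_ai.usage.output_tokens": 100},
    {"gen_ai.usage.input_tokens": 1000, "gen_ai.usage.output_tokens": 200},
]
budget = TaskBudget(limit=10_000)
statuses = [budget.record(observed_cost(span)) for span in spans]
print(statuses)
```

`observed_cost` needs nothing beyond what the spec already records. `TaskBudget` is the part that has to be built elsewhere, and even this version only reports status after the spend. Actual enforcement would have to sit in front of the next model call and leave its own record.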

Evaluation events tell a similar story. gen_ai.evaluation.score.value and gen_ai.evaluation.score.label let you record quality assessments: relevance, correctness, pass/fail. These are developer-grade quality signals. A compliance team looking for evidence that human oversight was effectively exercised, as the EU AI Act requires for high-risk systems, would find no corresponding event type. No attribute for a human approval decision. No binding between an approval and the specific action it authorized.
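As an illustration of the missing binding, the sketch below puts an evaluation event next to a hypothetical approval record. The two `gen_ai.evaluation.*` keys are the ones named above and `gen_ai.tool.call.id` is the spec's tool call identifier; `ApprovalRecord`, its fields, and all values are invented for this example.

```python
from dataclasses import dataclass

# What the conventions let you record: a quality score and its label.
evaluation_event = {
    "gen_ai.evaluation.score.value": 0.92,
    "gen_ai.evaluation.score.label": "pass",
}


@dataclass(frozen=True)
class ApprovalRecord:
    """Hypothetical record of a human decision; not defined by the spec."""
    approver: str
    decision: str      # "approved" or "rejected"
    tool_call_id: str  # the specific action this decision applies to


def approval_covers(record: ApprovalRecord, tool_span: dict) -> bool:
    """True only if an approval is bound to this exact tool call."""
    return (
        record.decision == "approved"
        and record.tool_call_id == tool_span.get("gen_ai.tool.call.id")
    )


approved_call = {"gen_ai.tool.call.id": "call_1"}
other_call = {"gen_ai.tool.call.id": "call_2"}
record = ApprovalRecord(approver="alice", decision="approved", tool_call_id="call_1")

print(approval_covers(record, approved_call))
print(approval_covers(record, other_call))
```

The evaluation event says an output was good. It says nothing about whether a person signed off on the action, and an approval that is not tied to a specific call cannot show that oversight covered that call.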

Content capture is off by default, gated behind OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT. The spec offers three modes: not recorded, captured on span attributes, or stored externally with a reference URL. That is a reasonable privacy-first design for debugging. Accountability, where reconstructibility is the whole point, needs more than opt-in capture: a run can only be reconstructed if its content was recorded in the first place.

None of this makes the conventions wrong. Every attribute in the GenAI namespace is still marked "Development" with no public stabilization timeline. And the conventions are genuinely useful for the problem they were designed around: helping developers understand what their GenAI applications are doing. The implicit reader is someone with access to traces, looking for bugs.

The gap appears when that reader changes: when it's an auditor tracing a delegation chain, an operator enforcing cost boundaries, or a compliance team documenting that oversight actually happened. Production agent systems need traces that carry authorization records, approval evidence, and policy enforcement artifacts the spec doesn't yet reach. The distance between what the spec captures and what these readers need is where enterprise adoption friction quietly accumulates. Debugging and accountability draw on the same underlying data, but they need very different things from it.

Things to follow up on...

Agent identity without attestation: A recent paper on verifiable delegation found that a security scan of roughly 2,000 MCP servers showed every single one lacked authentication, highlighting the gap between having an agent ID attribute and proving who authorized the agent to act.
Governance as enforcement, not dashboards: A 2026 research paper on governance-aware agent telemetry found that a dashboard-only approach to governance achieved only 27.1% violation prevention, characterizing current OTel-based tools as focused on post-hoc analysis rather than closed-loop enforcement.
OWASP's agentic security frame: OWASP published its Top 10 for Agentic Applications in December 2025, shifting the security conversation from "can the model be tricked" to what authority and side effects exist when it is.
Non-human identity risks: The OWASP Non-Human Identities Top 10 catalogs risks like overprivileged credentials, long-lived secrets, and improper offboarding that represent real attack surfaces no observability spec currently detects or records.

The OTel GenAI Spec Has an Implicit Reader