Vision

The Quiet Part

In a controlled trial, experienced developers using AI tools estimated they'd gotten about 20% faster. Measured result: 19% slower. What lingers is that completing the work, experiencing the full arc of it, didn't correct their perception. Meanwhile, 74% of teams running agent systems in production rely on human reviewers as their primary quality gate. Both findings are solid. Together, they describe something uncomfortable about where organizational AI deployment is actually headed.

Lessons and Designs

What Aviation Built After It Noticed the Problem
Aviation addressed the automation-complacency problem decades ago with simulator checks, mandatory manual flying, and crew resource management. Those countermeasures rest on a foundation enterprise AI lacks: a stable, agreed-upon definition of what correct human performance looks like. A pilot's hand-flown approach can be graded against known parameters. An analyst's judgment on a market-entry recommendation cannot. That structural difference determines whether such countermeasures are even possible.

Monitoring the Monitors
Organizations are building increasingly sophisticated systems to monitor whether their AI agents perform well. Almost none are checking whether the humans evaluating those agents still can. Evidence from the few domains where measurement exists — developer productivity, medical diagnostics — reveals a durable gap between perceived and actual human capability. The instrument to detect oversight degradation needs to be built now, while the people who could design it still have the expertise to know what it should measure.

The Causal Evidence

Polish researchers studying AI-assisted colonoscopy weren't looking for deskilling. They found it anyway. After endoscopists gained routine access to AI polyp detection, their unassisted adenoma detection rate fell from 28.4% to 22.4% across four centres and nearly 1,500 procedures. The degradation only surfaced when the tool was taken away. With it, performance held steady.
From a completely different domain comes the same shape: experienced developers using AI coding tools took 19% longer on real tasks while reporting that they'd become 20% faster. The tool hides the loss from the very person experiencing it.
Two findings, two fields, one mechanism.
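To put the two findings on a rough common scale, here is a short worked comparison. The colonoscopy figures are the unassisted detection rates reported above. For the developers, this sketch assumes that "20% faster" means they believed tasks took 20% less time than without AI, and that "19% longer" is measured against the same no-AI baseline T; both readings are assumptions, not details taken from the study.

$$
\begin{aligned}
\text{Colonoscopy (unassisted ADR):}\quad
  & 28.4\% - 22.4\% = 6.0 \text{ percentage points}, \\
  & \tfrac{6.0}{28.4} \approx 0.211 \;\Rightarrow\; \text{roughly a 21\% relative decline}. \\[4pt]
\text{Developers (assumed baseline } T\text{):}\quad
  & T_{\text{perceived}} = (1 - 0.20)\,T = 0.80\,T, \\
  & T_{\text{measured}} = (1 + 0.19)\,T = 1.19\,T, \\
  & \tfrac{T_{\text{measured}}}{T_{\text{perceived}}} = \tfrac{1.19}{0.80} \approx 1.49.
\end{aligned}
$$

Under this reading, the work took roughly half again as long as the developers believed. If "20% faster" is instead read as a 1.2× speed-up (perceived time of about 0.83T), the ratio is about 1.43. Either way, the gap between perceived and measured time is large.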
Further Reading

Past Articles

While the apprenticeship pipeline thins, a different question is surfacing at the organizational level: what does work l...

Entry-level job postings have fallen 35 percent since early 2023. The tasks vanishing from junior roles—debugging, data ...

A year ago, the enterprise AI conversation was about capability — whether agents could navigate websites without halluci...

Experienced developers predicted AI would make them 24% faster. Actual result: 19% slower. After completing the work, with the...
