We Built the Infrastructure Agents Need — We Just Built It for Examiners

The following is a hypothetical interview. Any resemblance to an actual compliance executive willing to speak on the record about agent deployment in regulated financial services is, frankly, implausible enough to confirm the point.

The dominant narrative around AI agents in financial compliance has been friction. Regulation as obstacle. Compliance as the department that kills momentum. That story is tidy, satisfying, and increasingly wrong.

What's actually emerging from early deployments is stranger and more useful: the compliance machinery these organizations spent decades building, under duress, for human examiners, turns out to be almost exactly the infrastructure agents need to function in production. Role-based authority. Mandatory logging. Structured approval flows. Audit trails that exist because regulators demanded them, not because anyone thought they'd be elegant.

The numbers explain why firms are willing to find out. Global compliance costs for banks now exceed $270 billion annually.¹ Conventional AML systems produce false-positive rates between 90% and 95%.² More than 70% of financial institutions still rely on manual operations for at least half of their AML activities.³ That's not a system waiting for disruption. That's a system buckling under its own weight.

Nkechi Okafor-Blum spent nine years as an FDIC bank examiner before crossing to the industry side. She's now SVP of Compliance Operations at a mid-large U.S. bank holding company with European operations, where she's overseen agent deployment into AML alert triage, SAR drafting, and KYC continuous monitoring workflows. She agreed to talk about what actually happened, including the parts that didn't match the vendor pitch.

She has the examiner's habit of pausing before answering, as if mentally checking whether what she's about to say would survive a deposition.

You came from the examiner side. How does that shape how you think about agents in compliance?

Nkechi: It means I spent nearly a decade asking banks to prove that their processes did what they claimed. So when someone tells me an agent "handles" alert triage, my first question is the same one I'd ask sitting across the table with a legal pad: show me the trail. What was delegated, to whom, with what authority, and can you reconstruct the decision chain six months later when I come knocking?

What genuinely surprised me, and I don't use that word casually, is that when we started mapping what agents need to operate safely, we kept landing on infrastructure we already had. Role-based access control. Mandatory logging. Structured approval flows with defined escalation thresholds. Audit trails that are regulatory deliverables, not afterthoughts.

We'd built all of that over twenty years of SOX compliance, BSA/AML requirements, GLBA, Part 500. None of it was built for agents. It was built to survive examinations. But structurally? Same problem. Who can act, on what, with whose authority, and where's the evidence.

Where did that actually play out first?

Nkechi: SAR drafting. Our analysts were spending roughly a third of their day reviewing transactions that turned out to be nothing.⁴ I'm not exaggerating. The agent handles evidence gathering, structures the narrative, pulls supporting documentation. The analyst reviews and authorizes. Cycle time dropped by roughly half, which tracks with what others have reported.⁵

But here's why the existing infrastructure mattered. We already had tiered risk frameworks. When we needed to classify which agent actions require human sign-off and which can proceed autonomously, we didn't invent a new taxonomy. We mapped agents onto the risk tiers we already used for human-executed decisions. Tier-1 actions, such as filing SARs, blocking payments, and running sanctions checks, require the same comprehensive validation we'd applied under SR 11-7 and now apply under its successor, SR 26-2.⁶ We extended the vocabulary to cover a new kind of actor. That's it.
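As a rough illustration of that mapping, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `RiskTier` names, the `ACTION_TIERS` table, and the rule that only Tier-1 actions need sign-off. Only the three Tier-1 actions come from the interview; the lower-tier entries are invented placeholders, and nothing here describes the bank's actual system.

```python
from enum import IntEnum


class RiskTier(IntEnum):
    TIER_1 = 1  # highest impact: a human must sign off
    TIER_2 = 2
    TIER_3 = 3


# Hypothetical action-to-tier table. The Tier-1 entries are the actions
# named in the interview; the Tier-2 and Tier-3 entries are placeholders.
ACTION_TIERS = {
    "file_sar": RiskTier.TIER_1,
    "block_payment": RiskTier.TIER_1,
    "sanctions_check": RiskTier.TIER_1,
    "draft_sar_narrative": RiskTier.TIER_2,
    "gather_evidence": RiskTier.TIER_3,
}


def requires_human_signoff(action: str) -> bool:
    """Return True when an agent action must wait for human authorization.

    Assumption: only Tier-1 actions need sign-off. Actions missing from
    the table are treated as Tier 1, so the check fails closed.
    """
    tier = ACTION_TIERS.get(action, RiskTier.TIER_1)
    return tier == RiskTier.TIER_1
```

The one design choice worth noting is the default: an action nobody has classified is handled as the strictest tier rather than waved through.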

The model governance people adapted faster than I expected, actually. They already spoke in terms of model inventories, validation evidence, documentation standards. Agents fit into a conversation that was already happening. I'd braced for a turf war and got a working group instead.

What surprised you most?

Nkechi: The approval fatigue problem migrating.

We already knew our AML analysts were drowning in false positives. Ninety-plus percent of alerts going nowhere.⁷ That's the whole reason we brought agents in. And the agents are better at triage. Measurably better.

Then something happened that I should have predicted and didn't.

The agent generates decision packages. Each one needs human review before action. Suddenly our analysts have a second queue. The agent's queue. And if the approval interface isn't designed well, if it's just a click, if it doesn't expose the reasoning chain, the data sources, the confidence level, the consequences, people stop reading. They approve on autopilot. You've recreated the exact problem you were solving, just one layer up.

We redesigned the approval interface three times. The first version was basically a confirmation button. Useless. The current version forces the reviewer to engage with the agent's reasoning before they can authorize. It slows things down slightly, which made some people nervous.

But an approval that nobody actually reads is worse than no approval at all. It's liability theater.
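One way to picture an interface that "forces the reviewer to engage" is a gate that refuses authorization until every part of the decision package has been opened and a rationale has been written. The sketch below is hypothetical: `DecisionPackage`, `REQUIRED_SECTIONS`, and the method names are assumptions for illustration, with the section list taken from the items she names (reasoning chain, data sources, confidence level, consequences).

```python
from dataclasses import dataclass, field

# Sections the reviewer must open before authorizing (taken from the interview).
REQUIRED_SECTIONS = ("reasoning_chain", "data_sources", "confidence", "consequences")


@dataclass
class DecisionPackage:
    package_id: str
    sections: dict                            # section name -> content shown to reviewer
    viewed: set = field(default_factory=set)  # sections the reviewer has opened

    def view(self, section: str) -> str:
        """Return a section's content and record that it was opened."""
        content = self.sections[section]
        self.viewed.add(section)
        return content

    def authorize(self, reviewer: str, rationale: str) -> dict:
        """Authorize only after every required section was opened."""
        missing = [s for s in REQUIRED_SECTIONS if s not in self.viewed]
        if missing:
            raise PermissionError(f"unreviewed sections: {missing}")
        if not rationale.strip():
            raise ValueError("a written rationale is required")
        return {
            "package_id": self.package_id,
            "reviewer": reviewer,
            "rationale": rationale,
        }
```

Opening a section is only a proxy for reading it, so a gate like this reduces one-click approvals but cannot prove the reviewer engaged with the content.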

How did this affect your team? The honest version.

Nkechi: We spent more on governance than we projected. About 60% more.⁸

[She lets that number sit for a moment.]

Anyone who tells you agents reduce compliance costs without mentioning the governance overhead is either early in their deployment or late in their sales cycle. We didn't cut headcount. We shifted what people do. Some of that shift was welcome. Some of it meant telling experienced analysts that their job had fundamentally changed, and not everyone was thrilled about that.

What remains genuinely hard?

Nkechi: Three things.

First, prompt versioning. In a regulated firm, changing an agent's system prompt should be a model governance event. Logged, reviewed, approved, traceable. Most engineering teams don't think of it that way. They think of it as a deployment. That gap between how engineers see prompts and how examiners will see prompts is real, and nobody has fully closed it.

Second, the multi-step reasoning chain problem. FINRA's 2026 report explicitly flags auditability gaps in multi-step reasoning as a risk vector.⁹ When an examiner asks why an agent made a particular recommendation, "the model thought so" is not an answer that keeps your charter. You need to reconstruct the chain. We're not fully there yet. Nobody is, as far as I can tell.

Third, and this is the one nobody wants to discuss at conferences: data quality. Forty-eight percent of organizations cite data governance as their primary implementation challenge, and that number feels low to me.¹⁰ Our agents surfaced data quality problems that had been lurking for years. Humans were interpreting around the inconsistencies, the way you learn to read your coworker's handwriting. Agents can't do that. Or rather, they can, but they do it wrong in ways that are harder to catch than the original problem.
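The first of those three points, treating a prompt change as a governance event rather than a deployment, can be sketched in a few lines. This is an assumption-laden illustration, not a description of any real tool: `PromptRegistry`, its methods, and the rule that an author cannot approve their own change are all hypothetical.

```python
import hashlib


class PromptRegistry:
    """Hypothetical registry: a prompt goes live only after a logged approval."""

    def __init__(self) -> None:
        self.versions = {}  # version_id -> {"text", "author", "approver"}
        self.events = []    # append-only audit log of (event, version_id, actor)
        self.active = None  # version_id currently in production

    def submit(self, text: str, author: str) -> str:
        # The version id is derived from the content, so any edit yields a new id.
        vid = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        if vid not in self.versions:
            self.versions[vid] = {"text": text, "author": author, "approver": None}
            self.events.append(("submitted", vid, author))
        return vid

    def approve(self, vid: str, approver: str) -> None:
        # Assumed separation-of-duties rule: no self-approval.
        if approver == self.versions[vid]["author"]:
            raise PermissionError("author cannot approve their own change")
        self.versions[vid]["approver"] = approver
        self.events.append(("approved", vid, approver))

    def activate(self, vid: str) -> None:
        approver = self.versions[vid]["approver"]
        if approver is None:
            raise PermissionError("prompt version has not been approved")
        self.active = vid
        self.events.append(("activated", vid, approver))
```

Because the identifier is a hash of the prompt text, an examiner holding the event log can check that the prompt in production is the exact text that was reviewed.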

FINRA now classifies agents as a distinct supervisory risk category. The EU AI Act deadlines are approaching. Tailwind or headwind?

Nkechi: Both, simultaneously, which is the most honest answer I can give about anything in regulatory compliance.

The signals are actually clarifying, which helps. SR 26-2 explicitly brings agentic systems in scope.¹¹ The EU AI Act's August deadline means high-risk financial AI systems need transparency, traceability, and human oversight baked in.¹² If you've already built that infrastructure, and we had, because we had to, you're ahead. If you haven't, you're scrambling in a way that's expensive and visible.

The institutions I worry about are the ones deploying agents on generic platforms without banking-grade controls. Drag-and-drop is not going to survive an examination. I say that as someone who conducted examinations. I know what we looked for.

If you could go back to the beginning of this process, what would you do differently?

Nkechi: I'd start with the approval interface, not the agent. We spent months optimizing how the agent gathers and structures information, and about two weeks on how a human actually reviews its output under real workload conditions. That ratio was backwards. The agent is only as good as the human's ability to meaningfully evaluate what it produces, and "meaningfully" is doing a lot of work in that sentence.

And I'd be more honest with the team earlier about what "human-in-the-loop" actually means for their Tuesday morning.

It doesn't mean the agent does the work and you watch. It means you're now responsible for evaluating something a machine produced, under time pressure, with your name on the authorization. That's a different job. We should have treated it like one from the start.

1. Facctum, "State Of AML Compliance 2026 Report," https://www.facctum.com/blog/state-of-aml-compliance-2026
2. Facctum, "AML False Positive Rates 2026 Report," https://www.facctum.com/blog/aml-false-positive-report
3. Flagright, "How AI Reduces Operational Strain and Cuts Costs in AML Compliance Workflows," https://www.flagright.com/post/how-ai-reduces-strain-and-costs-in-aml-compliance-workflows
4. Industry benchmarks estimate compliance analysts spend approximately 32% of their working day reviewing transactions ultimately cleared as legitimate. See Lucinity, "Tackling Alert Fatigue in AML Compliance," https://lucinity.com/blog/tackling-alert-fatigue-in-aml-compliance-with-ai-powered-case-management
5. Neurons Lab, "AI For Compliance In Banking: From Pilot To Production," https://neurons-lab.com/article/ai-for-compliance-in-banking/
6. On April 17, 2026, the Federal Reserve issued SR 26-2, superseding SR 11-7. See SIA Partners, "SR 11-7 vs. SR 26-2: Model Risk Management Modernization," https://www.sia-partners.com/en/insights/publications/sr-11-7-vs-sr-26-2-model-risk-management-modernization
7. Industry research consistently estimates 90–95% false-positive rates in conventional AML systems. See Facctum, "AML False Positive Rates 2026 Report."
8. Industry data suggests average enterprise bank AI governance budgets grew by 62% in 2025. See SymphonyAI, "Agentic AI in financial services: From hype to governance," https://www.symphonyai.com/resources/blog/financial-services/agentic-ai-compliance/
9. FINRA's 2026 Annual Regulatory Oversight Report classifies AI agents as a distinct supervisory risk category and identifies auditability gaps in multi-step reasoning chains as a primary risk vector.
10. KPMG's 2025 analysis found 48% of organizations cite data governance as their primary agentic AI implementation challenge.
11. Databricks, "Model risk management in 2026," https://www.databricks.com/blog/model-risk-management-2026-bankers-guide-revised-interagency-guidance
12. The EU AI Act requires high-risk AI systems in financial services to comply with transparency, traceability, and human oversight requirements by August 2, 2026.