RAG vs. Agentic Search — Two Ways to Feed a Model Before It Thinks

Compares RAG and agentic search as retrieval strategies, maps each to buyer data environments, and provides field-ready conversation language for AEs.

By Leigh Garrity, May 9, 2026

The retrieval layer is where access governance actually lives in an AI system. When a buyer asks about "retrieval strategies" or whether your platform "supports RAG pipelines," they're asking how the system finds the right information before the model generates an answer, and that question is inseparable from who controls what the model sees. Two approaches dominate. RAG pre-indexes content into searchable chunks and retrieves them by similarity. Agentic search lets the model search live content on demand using tools like grep and file reads. The sentence that earns you a seat at the architectural table: "How are you controlling what the model sees before it reasons?"

RAG (Retrieval-Augmented Generation)

What it is: A pipeline that pre-processes documents into searchable chunks, stores them in an index, and retrieves the most relevant pieces to feed a model alongside the user's question.

What it does: Before the model generates anything, a retrieval step queries the index for content semantically similar to the user's prompt. Those chunks become the model's working context. The model reasons over your documents. Answers come back grounded in your corpus.
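
To make the pipeline concrete, here is a minimal sketch of the chunk, index, and retrieve steps. It assumes a toy bag-of-words vector in place of a real embedding model, and the document names, policy text, and helper functions are all made up for illustration. A production stack would swap in a trained embedding model and a vector store.

```python
import math
import re
from collections import Counter

def chunk(text, max_words=40):
    """Split a document into fixed-size word windows (a deliberately naive chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(docs):
    """Index step, done up front: chunk every document and store (source, chunk, vector)."""
    return [(name, c, embed(c)) for name, text in docs.items() for c in chunk(text)]

def retrieve(index, question, k=2):
    """Query step: rank stored chunks by similarity to the question and keep the top k."""
    q = embed(question)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(name, c) for name, c, _ in ranked[:k]]

# Hypothetical two-document corpus.
DOCS = {
    "leave-policy.txt": "Employees accrue annual leave monthly. Unused annual leave carries over for one year.",
    "travel-policy.txt": "Travel must be approved in advance. Economy fares are required for domestic travel.",
}
QUESTION = "How does annual leave carry over?"

index = build_index(DOCS)
top = retrieve(index, QUESTION, k=1)

# The retrieved chunks become the model's working context, tagged with their source.
context = "\n".join(f"[{name}] {text}" for name, text in top)
prompt = f"Answer using only the context below.\n{context}\n\nQuestion: {QUESTION}"
```

Note that the source tag travels with each chunk into the prompt. That tag is what makes an answer traceable back to a document.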

Who's behind it / where it comes from: Facebook AI Research formalized RAG in 2020. It has since become the default enterprise pattern for connecting LLMs to proprietary data. Every major cloud provider ships a RAG-capable stack: Microsoft's Azure AI Search, Google's Vertex AI Search, AWS's Bedrock Knowledge Bases. It's the incumbent.

What makes it distinct: RAG does its expensive work upfront. You index once (or on a schedule), and retrieval at query time is fast and cheap. The tradeoff is maintenance. The index drifts out of sync with source data. Retrieval quality depends on how well content was chunked and embedded. And the system can return semantically similar but functionally wrong results when embedding quality varies across domains and terminology.

Agentic Search

What it is: The model itself decides what to search for, executes search operations in real time (grep, glob, file reads), evaluates results, and iterates until it has enough context to answer.

What it does: The model uses tools to explore source material directly rather than querying a pre-built index. It reads directory structures, greps for patterns, opens files, scans contents, and decides whether it needs to keep looking. Retrieval and reasoning happen in the same loop. The model drives both.
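
The loop can be sketched in a few lines. This is an illustration under stated assumptions, not how Claude Code or any other agent is implemented: scripted_model is a hypothetical stand-in for the LLM with hard-coded decisions, and the tool names, file tree, and question are invented for the example.

```python
import re
import tempfile
from pathlib import Path

def list_files(root):
    """Tool: list every file under root, as paths relative to root."""
    return sorted(str(p.relative_to(root)) for p in Path(root).rglob("*") if p.is_file())

def grep(root, pattern):
    """Tool: return (relative path, line number, line) for every line matching the regex."""
    hits = []
    for rel in list_files(root):
        for n, line in enumerate((Path(root) / rel).read_text().splitlines(), start=1):
            if re.search(pattern, line):
                hits.append((rel, n, line.strip()))
    return hits

def read_file(root, rel):
    """Tool: return the full contents of one file."""
    return (Path(root) / rel).read_text()

def scripted_model(question, history):
    """Hypothetical stand-in for the LLM: picks the next tool call from what it has seen.

    The question is ignored here; a real model would reason over it."""
    if not history:
        return ("grep", r"def \w*login\w*")        # first guess at a name
    last_tool, _, last_result = history[-1]
    if last_tool == "grep" and not last_result:
        return ("grep", r"def \w*auth\w*")         # nothing found, try a related term
    if last_tool == "grep":
        return ("read_file", last_result[0][0])    # open the first file that matched
    return ("answer", last_result)                 # enough context, stop searching

def agentic_search(root, question, max_steps=5):
    """Retrieval and reasoning in one loop: act, observe, decide whether to keep looking."""
    tools = {"grep": grep, "read_file": read_file}
    history = []
    for _ in range(max_steps):
        action, arg = scripted_model(question, history)
        if action == "answer":
            return arg, history
        history.append((action, arg, tools[action](root, arg)))
    return None, history

with tempfile.TemporaryDirectory() as root:
    (Path(root) / "app").mkdir()
    (Path(root) / "app" / "auth.py").write_text("def authenticate_user(name, password):\n    return True\n")
    (Path(root) / "README.md").write_text("Demo project\n")
    context, trace = agentic_search(root, "Where do we check credentials?")

steps = [(tool, arg) for tool, arg, _ in trace]
```

Each entry in steps is a tool call the model paid for in tokens. The max_steps cap is one simple way to bound that cost.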

Who's behind it / where it comes from: The highest-profile implementation is Claude Code, Anthropic's coding agent. Boris Cherny's team at Anthropic built RAG first, tested it against agentic search, and switched. The approach has since spread to other coding agents and developer tools. It's newer as a deliberate architectural choice, though the underlying operations (grep, find, read) are as old as Unix.

What makes it distinct: Agentic search has no index to maintain, no embedding pipeline to tune, and no staleness problem. The model always reads current state. The tradeoff is cost and time: every search burns tokens, the model may need multiple rounds to find what it needs, and the token bill can grow unpredictably. There's also no semantic understanding in the search itself. Grep matches strings. If you remember "port conflict during deployment" but the actual text says "modified docker-compose port mapping," grep misses it. A function called authenticateUser and one called validateLogin might do the same thing, but grep will only find the one you explicitly name.
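
The string-matching limit is easy to reproduce. The sketch below uses an invented two-line log; only the docker-compose wording comes from the example above. The remembered phrase finds nothing, the literal text finds the line, and a broadened pattern (the kind of retry a model would attempt on a second pass) recovers it.

```python
import re

# Hypothetical deployment notes; only the first line's wording comes from the example above.
NOTES = [
    "modified docker-compose port mapping from 8080 to 8081",
    "rotated the staging database password",
]

def grep_lines(lines, pattern):
    """Return the lines that match the regex, ignoring case."""
    return [line for line in lines if re.search(pattern, line, flags=re.IGNORECASE)]

remembered = grep_lines(NOTES, r"port conflict during deployment")  # how you remember it
literal = grep_lines(NOTES, r"port mapping")                        # how it was written
broadened = grep_lines(NOTES, r"port|docker|deploy")                # a wider second attempt
```

The broadened pattern works here, but each retry is another round trip and another batch of tokens.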

Comparison by Data Environment

I'm using scenario mapping rather than a trait-by-trait table because the honest answer to "which is better" depends entirely on the buyer's data environment. A parallel table would imply these approaches compete on the same dimensions at the same scale. They don't. What matters is the corpus: its size, how fast it changes, and how access to it is governed.

Code and Fast-Moving Content

The Claude Code origin story, and the strongest case for agentic search.

Boris Cherny described the decision on the Latent Space podcast in May 2025. His team built RAG first, using Voyage embeddings and a local vector database. Then they tested agentic search. His assessment: it outperformed everything by a significant margin, and this surprised the team.

A caveat worth carrying into any buyer conversation where this story comes up: when asked what benchmark established the performance gap, Cherny acknowledged it was "mostly vibes" plus internal testing Anthropic hasn't published. The architectural reasons for the switch are more concrete. Cherny identified three: the index drifted out of sync with the codebase, the index itself was a security liability (it had to live somewhere, and that somewhere could be compromised), and agentic search was simply cleaner to deploy. Catherine Wu, Claude Code's PM, reinforced the enterprise angle in a later interview: hosting an index exposed more surface area and security risk for external enterprise adopters.

The Pragmatic Engineer's summary captures it cleanly:

"Plain glob and grep, driven by the model, beat everything."

The approach was inspired by how engineers at Instagram actually searched code when their IDE's click-to-definition broke. They grepped. It worked.

For codebases and fast-changing content, agentic search wins on freshness, deployment simplicity, and security surface area. Nothing to index, nothing to host, nothing to drift.

Large, Stable, Document-Heavy Corpora

Agentic search hits a structural wall here, and most public sector buyers live on this side of it.

The core problem: agentic search assumes a filesystem-shaped corpus. Most enterprise data isn't filesystem-shaped. Legal, regulatory, and policy corpora span distributed document management systems with matter-level access controls and ethical walls. There is no single filesystem to grep. The content lives across iManage, NetDocuments, SharePoint, email archives, and knowledge management platforms, each with its own permission model.

Firms doing legal AI work have converged on RAG-grounded retrieval for exactly this reason. Every answer must be traceable to a verified source within the firm's document corpus. Patent search operations run semantic retrieval across hundreds of millions of documents where attorneys won't act on results that aren't grounded in a real source document. Traceability is a prerequisite, full stop.

RAGFlow's year-end review put the boundary plainly: for enterprise multi-modal, unstructured, or semi-structured data like product manuals, meeting notes, and reports with tables and images, grep-based search "fails completely." Even Claude Code's own users have surfaced this limit. GitHub issues document requests for codebase indexing because agentic search on large repositories burns tokens at a rate that climbs steeply with repository size. The tool that proved agentic search works is also generating the evidence for where it stops working.

For large, stable corpora with complex access requirements, RAG wins on cost predictability, source traceability, and the ability to enforce document-level access control at the retrieval layer.

Okta Concept Mapping: Fine-Grained Authorization at the Retrieval Layer

Auth0 FGA enforces relationship-based access control at the point of document retrieval in a RAG pipeline, ensuring the model only sees data the authenticated user is authorized to access. Your RBAC intuition applies here: application-level access is too coarse, and the model needs data-level authorization. Where this matters in a buyer conversation: if they're building RAG over sensitive corpora, authorization has to live at the retrieval layer itself, before the model ever sees the data.
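
A minimal sketch of what filtering at the retrieval layer means. This does not use the Auth0 FGA SDK or its API. The relationship tuples, user names, document IDs, and scores below are hypothetical, and the check is a simplified in-memory stand-in for a real authorization service.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    score: float  # similarity score from whatever retriever produced the candidate

# Hypothetical relationship tuples: (subject, relation, object).
TUPLES = {
    ("user:ana", "member", "group:matter-1042"),
    ("group:matter-1042", "viewer", "doc:settlement-memo"),
    ("user:ana", "viewer", "doc:style-guide"),
    ("user:ben", "viewer", "doc:style-guide"),
}

def can_view(user, doc_id):
    """True if the user is a direct viewer or belongs to a group that is a viewer."""
    if (user, "viewer", doc_id) in TUPLES:
        return True
    groups = {obj for subj, rel, obj in TUPLES if subj == user and rel == "member"}
    return any((g, "viewer", doc_id) in TUPLES for g in groups)

def authorized_context(user, candidates, k=3):
    """Drop chunks the user cannot view, then keep the top k by score."""
    allowed = [c for c in candidates if can_view(user, c.doc_id)]
    return sorted(allowed, key=lambda c: c.score, reverse=True)[:k]

# Candidates as they might come back from a similarity search (made-up scores).
CANDIDATES = [
    Chunk("doc:settlement-memo", "Proposed settlement range ...", 0.91),
    Chunk("doc:style-guide", "Cite statutes in short form ...", 0.55),
]

ana_sees = [c.doc_id for c in authorized_context("user:ana", CANDIDATES)]
ben_sees = [c.doc_id for c in authorized_context("user:ben", CANDIDATES)]
```

Both users run the same application and ask the same question. The difference shows up only in which chunks reach the model, which is why application-level access is too coarse.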

The Hybrid Middle Ground Where 2026 Is Landing

Most production systems in 2026 are combining both approaches, and the market data is unambiguous about the direction.

VentureBeat's Q1 2026 enterprise survey found hybrid retrieval intent tripled across the quarter. The pattern: dense vector search plus sparse keyword search plus a reranking layer. Meanwhile, standalone vector database adoption declined. The market is combining retrieval strategies by data type, not picking a lane.

Microsoft has shipped the clearest example. Azure AI Search's "agentic retrieval" uses an LLM to plan queries, then executes them against both keyword (BM25) and vector indexes simultaneously. The model breaks compound questions into focused subqueries, runs hybrid search, and merges results. Microsoft's current recommendation for new RAG implementations: start with agentic retrieval.
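
The plan, search, and merge pattern can be sketched without any cloud service. This is not Azure AI Search's implementation or API. The planner below is a hypothetical stand-in that splits on " and " where a real system would ask an LLM, word overlap and character-trigram overlap stand in for BM25 and vector search, and reciprocal rank fusion is one common way to merge ranked lists, chosen here as an assumption.

```python
import re
from collections import defaultdict

# Hypothetical three-document corpus.
DOCS = {
    "d1": "Password rotation is required every 90 days for administrator accounts.",
    "d2": "Contractors must complete security training before receiving badge access.",
    "d3": "Badge access logs are retained for one year.",
}

def tokens(text):
    """Lowercased set of whole words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def trigrams(text):
    """Lowercased set of character trigrams."""
    s = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return {s[i:i + 3] for i in range(len(s) - 2)}

def keyword_rank(query):
    """Sparse channel stand-in: rank by count of shared whole words."""
    return sorted(DOCS, key=lambda d: len(tokens(query) & tokens(DOCS[d])), reverse=True)

def fuzzy_rank(query):
    """Dense channel stand-in: rank by character-trigram Jaccard overlap."""
    q = trigrams(query)
    def jaccard(d):
        t = trigrams(DOCS[d])
        return len(q & t) / len(q | t) if q | t else 0.0
    return sorted(DOCS, key=jaccard, reverse=True)

def plan_subqueries(question):
    """Hypothetical planner: a real system would ask an LLM; here we split on ' and '."""
    return [part.strip(" ?") for part in question.split(" and ") if part.strip(" ?")]

def rrf_merge(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) for each document across all rankings."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

question = "How often must administrator passwords rotate and how long are badge access logs retained?"
subqueries = plan_subqueries(question)

# Every subquery runs against both channels; all ranked lists are fused into one.
rankings = [rank(q) for q in subqueries for rank in (keyword_rank, fuzzy_rank)]
merged = rrf_merge(rankings)
```

The compound question touches two different documents. Splitting it first gives each half its own search, so neither document is crowded out by terms from the other half.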

The practical frame from Data Nucleus's enterprise guide:

"Agents orchestrate when and how to retrieve; RAG remains the grounding mechanism that keeps answers defensible."

For public sector buyers specifically, hybrid matters because regulatory and policy corpora contain both standardized terminology (where keyword search excels) and natural language descriptions (where semantic search excels). An arXiv reference architecture from March 2026 describes weighting between the two channels based on content type: BM25 for regulatory codes and clause numbers, dense retrieval for synonyms and paraphrases. That maps directly to how agencies actually store and reference policy.
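
Content-based weighting can be sketched as a query-dependent blend of two score sets. The weights, the clause-number pattern, and the scores below are illustrative assumptions, not values from the arXiv architecture. The sparse scores stand in for normalized BM25 output and the dense scores for normalized embedding similarity.

```python
import re

# Matches dotted clause numbers such as "4.2.1" or section marks such as "§ 12".
CLAUSE_PATTERN = re.compile(r"\b\d+(\.\d+)+\b|§\s*\d+")

def channel_weights(query):
    """Return (sparse_weight, dense_weight). The specific values are illustrative only."""
    if CLAUSE_PATTERN.search(query):
        return 0.8, 0.2   # exact identifiers: lean on keyword matching
    return 0.3, 0.7       # natural language: lean on semantic similarity

def blend(query, sparse_scores, dense_scores):
    """Combine per-document scores (each already normalized to 0..1) and rank the result."""
    w_sparse, w_dense = channel_weights(query)
    doc_ids = set(sparse_scores) | set(dense_scores)
    combined = {
        d: w_sparse * sparse_scores.get(d, 0.0) + w_dense * dense_scores.get(d, 0.0)
        for d in doc_ids
    }
    return sorted(combined, key=combined.get, reverse=True)

# Made-up scores for two candidate documents.
sparse = {"clause-4.2.1": 1.0, "leave-faq": 0.2}
dense = {"clause-4.2.1": 0.3, "leave-faq": 0.9}

by_code = blend("What does clause 4.2.1 require?", sparse, dense)
by_paraphrase = blend("Can I take time off to care for a relative?", sparse, dense)
```

With identical candidate scores, the clause-number query ranks the clause first and the paraphrased query ranks the FAQ first. Only the weighting changed.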

Okta Concept Mapping: The Credential Problem in Agentic Search

RAG has a governable artifact: the index, the chunk, the retrieval query. You can audit it, apply ACLs, enforce FGA against it. Agentic search reads live files with whatever credential the harness carries. Your OAuth intuition helps here: the agent authenticates as someone, and that someone's permissions define the search boundary. Where it stops helping: OAuth checks the retriever's identity, but the model can still surface retrieved data to unauthorized recipients after reasoning over it. If the agent outputs to a shared channel, authorized retrieval reaches unauthorized recipients. Okta documented four CVSS 9.3–9.4 vulnerabilities in 2025 across Anthropic, Microsoft, ServiceNow, and Salesforce with exactly this pattern.
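
The gap between authorized retrieval and authorized delivery can be shown as three separate checks. This is a sketch under stated assumptions: the scope string, user names, and document ID are hypothetical, and a real deployment would call an authorization service rather than compare sets in memory.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Retrieved:
    doc_id: str
    allowed_readers: frozenset  # who may read the source document

def agent_may_read(agent_scopes, required_scope):
    """Check 1: the agent's own credential carries the needed scope."""
    return required_scope in agent_scopes

def user_may_read(user, item):
    """Check 2: the requesting user is authorized for the document."""
    return user in item.allowed_readers

def recipients_may_read(recipients, items):
    """Check 3: every recipient of the output may read every source that informed it."""
    return all(r in item.allowed_readers for item in items for r in recipients)

def release(answer, requester, recipients, items, agent_scopes):
    """Return the answer only if all three checks pass; otherwise say which one failed."""
    if not agent_may_read(agent_scopes, "docs:read"):
        return "blocked: agent credential"
    if not all(user_may_read(requester, i) for i in items):
        return "blocked: requester"
    if not recipients_may_read(recipients, items):
        return "blocked: recipients"
    return answer

memo = Retrieved("doc:settlement-memo", frozenset({"ana"}))

# Same agent, same requester, same retrieval. Only the audience differs.
direct = release("summary", "ana", ["ana"], [memo], {"docs:read"})
shared = release("summary", "ana", ["ana", "ben"], [memo], {"docs:read"})
```

The first two checks pass in both calls. The shared-channel case fails only on the third, which is the check an identity-of-the-retriever model never runs.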

Okta Concept Mapping: Governing Agent Activity Across Both Patterns

Okta for AI Agents (GA April 30, 2026) logs agent activity including tool calls, authorization decisions, and access attempts, forwarding to your SIEM. Your privileged access monitoring intuition is useful here: you already know why session logging matters for admin accounts. It applies to agents for the same reason. Where it misleads: a privileged user session is one person doing sequential operations. An agent's tool calls can chain across systems and data sources in a single reasoning loop, generating access patterns no human session would produce. The audit trail needs to capture that full chain as a unit.
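
Capturing the chain as a unit comes down to giving every event in one reasoning loop a shared identifier. The sketch below is not Okta's log schema. The field names, agent ID, tool names, and targets are hypothetical, chosen only to show the shape of the record.

```python
import json
import uuid
from datetime import datetime, timezone

class AgentAuditLog:
    """Collects every tool call in one reasoning loop under a shared run_id."""

    def __init__(self, agent_id, on_behalf_of):
        self.run_id = str(uuid.uuid4())
        self.agent_id = agent_id
        self.on_behalf_of = on_behalf_of
        self.events = []

    def record(self, tool, target, decision):
        """Append one tool call with its authorization decision and position in the chain."""
        self.events.append({
            "run_id": self.run_id,
            "seq": len(self.events) + 1,
            "time": datetime.now(timezone.utc).isoformat(),
            "agent_id": self.agent_id,
            "on_behalf_of": self.on_behalf_of,
            "tool": tool,
            "target": target,
            "decision": decision,
        })

    def to_json_lines(self):
        """Serialize as one JSON object per line, a format most log collectors accept."""
        return "\n".join(json.dumps(e) for e in self.events)

    def systems_touched(self):
        """Distinct systems reached in this run, taken from the prefix before the first colon."""
        return sorted({e["target"].split(":", 1)[0] for e in self.events})

log = AgentAuditLog(agent_id="agent:contracts-bot", on_behalf_of="user:ana")
log.record("search", "sharepoint:/policies", "allow")
log.record("read_file", "imanage:matter-1042/memo.docx", "deny")
log.record("post_message", "slack:#legal-team", "allow")

lines = log.to_json_lines().splitlines()
```

Three systems in one run, in sequence, on behalf of one user. Grouping by run_id is what lets a reviewer see that as a single chain rather than three unrelated events.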

How to Say This in the Field

Don't say	Do say	Why it matters
"RAG is dead, everyone's moving to agentic search"	"Anthropic's Claude Code team tested both and found agentic search worked better for codebases. For large document corpora, the math is different."	Overgeneralizing the Claude Code story loses credibility with technical buyers who know their environment isn't a codebase.
"Agentic search outperformed RAG by a lot in benchmarks"	"Anthropic's team found agentic search outperformed RAG in their internal testing, though they haven't published the benchmarks. The architectural advantages are clearer: no stale index, no third-party hosting, simpler deployment."	The "vibes" basis of the performance claim is publicly documented; a technical buyer who's read the transcript will catch the overstatement.
"RAG is the standard enterprise approach"	"RAG is the incumbent pattern for large, stable corpora. The newer approach is agentic search, where the model searches live content on demand. Most production systems in 2026 are combining both."	Framing RAG as the only option signals you haven't tracked the last twelve months of the field.
"You need a vector database for AI search"	"A vector index is one retrieval strategy. The other is letting the model search directly. The right choice depends on how large and stable your corpus is, and how you need to govern access to it."	Standalone vector database adoption is declining; leading with it sounds like last year's architecture.
"The model just greps for what it needs"	"Agentic search means the model decides what to look for, runs the search, evaluates results, and iterates. The search tools are simple, but the model's reasoning drives the process."	"Just grep" undersells the approach and makes the buyer wonder why it's worth discussing.
"Access control is handled at the application layer"	"For RAG, access control lives at the retrieval layer. You need fine-grained authorization on what chunks the model can pull. For agentic search, the agent's credential defines the search boundary, and you still need to govern what happens to retrieved data after the model reasons over it."	Application-level RBAC is too coarse for either pattern; the buyer's security team will flag it.
"Fine-grained authorization enforces document-level permissions at the retrieval step, so the model only sees data the requesting user is authorized to access. That's the layer most RAG implementations are missing."	"Good RAG architecture enforces document-level permissions at the retrieval step, not just at the application boundary. The model should only see data the requesting user is authorized to access. That's the layer most implementations skip."	Describing what good looks like is more credible than leading with a product name; the buyer asks what product does it, and now you've earned the answer.
"Agentic search is more secure because there's no index to hack"	"Agentic search eliminates the index as an attack surface, but it introduces a different risk: the agent reads live data with its own credential, and the output can reach recipients with different permission levels."	The Claude Code team cited security as a reason to drop RAG, but the security tradeoffs are real on both sides.
"Which retrieval approach are you using?"	"What does your source content look like? Is it code and fast-changing docs, or large stable corpora like policy and regulatory text? That drives the retrieval architecture decision."	Starting from the buyer's data environment is more useful than asking them to self-identify with an architecture label.
"AI search is a separate problem from identity"	"The retrieval layer is where identity governance meets AI architecture. Who the agent authenticates as, what data it can pull, and who receives the output are three separate authorization checks. Most implementations are only doing the first one."	This frames your IDAM expertise as directly relevant to the buyer's AI architecture decision.

One thing to carry out of this piece. As models get smarter, agentic search improves automatically because the model is the retriever. RAG quality depends on continuous engineering effort in the chunking, embedding, and indexing pipeline. Neither approach is universally better, but the improvement curves are asymmetric. The buyer who understands that asymmetry will make a better architectural bet. The AE who can explain it will be the one they trust to help.

Things to follow up on...

Claude Code's memory architecture uses a three-layer system where a compact MEMORY.md index points to topic files loaded on demand, with raw session transcripts searched only via grep, as described by practitioners tracking the pattern.
Fragmentation fatigue in retrieval stacks: HyperFRAME Research's Steven Dickens told VentureBeat that managing separate vector stores, graph databases, and relational systems to power one agent has become a DevOps nightmare driving consolidation.
Okta's RAG authorization training: Auth0 has published a hands-on course on integrating Fine-Grained Authorization with the Auth0 AI SDK to enforce per-user document access at the retrieval layer.
Context rot compounds retrieval problems: as agent sessions lengthen, retrieved content accumulates alongside stale tool outputs and resolved errors, degrading reasoning quality in ways that are often misattributed to the model rather than the context.