When your buyer says "we're building RAG," the conversation turns to retrieval within about ninety seconds. How does the system find the right documents before the model generates an answer? Three mechanisms own that conversation: keyword search (BM25), vector search (semantic similarity), and hybrid search (both, combined with re-ranking). Each finds different things well. Each drops different things silently. What buys you credibility in a buyer conversation is knowing which does which, and where the identity layer breaks underneath all of them.
Keyword Search (BM25)
What it is: A scoring algorithm that ranks documents by how well they match the exact terms in your query.
What it does: BM25 counts how often your search terms appear in a document, weights the score by how rare those terms are across the full collection, and returns a ranked list. Search for "FAR 52.219-8" and documents containing that exact string score high because the match is precise and the string is rare. This is the math behind every enterprise search box you've used for two decades. Nothing new here except the name.
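For anyone who wants to see the arithmetic, here is a minimal sketch of BM25 scoring. It assumes naive whitespace tokenization and the commonly used defaults `k1=1.5` and `b=0.75`; the function name `bm25_scores` and the three sample documents are made up for illustration. Production engines use their own analyzers and tuning, so read this as the shape of the idea and nothing more.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a textbook BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no literal match, no contribution
            # Rare terms get a larger inverse-document-frequency weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Illustrative documents only (hypothetical text).
docs = [
    "subcontracting plan required under far 52.219-8 for this award",
    "general guidance on small business subcontracting goals",
    "travel reimbursement policy for field staff",
]
scores = bm25_scores("far 52.219-8", docs)
```

Only the first document contains the query terms, so it is the only one with a non-zero score. The second document is topically related but gets nothing, which is the literal behavior described above.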
Who's behind it: BM25 is an open standard embedded in Elasticsearch, OpenSearch, Azure AI Search, and essentially every search platform that predates the current AI wave. It's not a product. It's infrastructure.
What makes it distinct: BM25 never invents a match. If the term isn't in the document, the document doesn't come back. That sounds trivially obvious until you spend time with vector search, where "close enough" is the entire operating principle. For contract numbers, error codes, regulation citations, NIST control references, and policy clause identifiers, BM25 is the mechanism that stays literal when you need it to be literal.
Vector Search
What it is: A retrieval method that converts text into numerical representations called embeddings and finds documents whose meaning is mathematically close to the query's meaning.
What it does: An embedding model reads a chunk of text and produces a vector: a list of numbers representing that chunk's semantic content. A vector database stores millions of these and, at query time, finds the ones nearest to your query's vector. The practical result: a query about "stress management strategies" retrieves a document titled "Employee Wellness and Burnout Prevention" even though the two share zero keywords. The embedding captured the meaning. The words were irrelevant.
The inverse is equally true. Search for "SKU-B4920" and the vector store might return documents about product inventory management in general, because the embedding model encoded the semantic neighborhood rather than the character sequence. Practitioners have documented this pattern across part numbers, regulatory codes, contract references, and domain-specific jargon.
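A toy sketch of the nearest-neighbor step may help here. The three-number vectors below are hand-written stand-ins, not output from any real embedding model, and the titles are illustrative. The point is only that ranking happens on vector closeness (cosine similarity in this sketch), with no reference to the words themselves.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-written 3-dimensional stand-ins for embeddings (illustrative only).
# Real models produce hundreds or thousands of dimensions.
index = {
    "Employee Wellness and Burnout Prevention": [0.9, 0.1, 0.0],
    "Inventory Management Overview": [0.0, 0.2, 0.9],
    "Quarterly Travel Policy": [0.1, 0.9, 0.1],
}

# Pretend embedding of the query "stress management strategies".
query_vector = [0.8, 0.2, 0.1]

ranked = sorted(
    index,
    key=lambda title: cosine_similarity(query_vector, index[title]),
    reverse=True,
)
```

With these made-up numbers the wellness document ranks first despite sharing no keywords with the query. The same mechanism is why an identifier like "SKU-B4920" can land near generic inventory content: its vector sits in the inventory neighborhood, and nothing in the comparison looks at the exact characters.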
Who's behind it: Embedding models come from OpenAI, Cohere, Google, and others. Vector databases include Pinecone, Weaviate, Qdrant, and the vector capabilities built into Azure AI Search and OpenSearch. The ecosystem is fragmented and still sorting itself out. Treat any specific product reference here as a snapshot, not a recommendation.
What makes it distinct: The embedding process is lossy in ways that matter for enterprise search. The same property that lets vector search find documents you didn't know how to ask for also destroys exact-string fidelity. The spec calls this "semantic representation," which is a generous description of what happens to your contract number. The embedding model treats identifiers as semantically similar to their category rather than as unique strings. Feature and failure mode, fused together.
Vector stores have the same authentication-vs-authorization gap your AEs already know. A shared vector index can authenticate the query but return chunks from documents the user was never authorized to access. OWASP's LLM08 names this explicitly: similarity search returning documents based on embedding distance without verifying document-level permissions. The analogy to ABAC holds cleanly — you need attributes enforced at query time, not just at the perimeter. Where it breaks: in traditional IDAM, authorization is a gate. In vector retrieval, it's a metadata filter fused into a similarity search across millions of chunks, and if the developer didn't attach permission metadata at index time, there's nothing to filter on.
Hybrid Search
What it is: A retrieval architecture that runs keyword and vector search in parallel, merges the results, and re-ranks them using a more expensive model.
What it does: The query hits two systems simultaneously. BM25 returns its top results ranked by term-match score. Vector search returns its top results ranked by semantic similarity. A fusion algorithm, typically Reciprocal Rank Fusion (RRF), combines both lists using each document's position in each ranking rather than the raw scores (which are on incompatible scales). Then a re-ranker takes the top merged results and re-scores them using a deeper model. In Azure AI Search, that re-ranker is a semantic ranking model adapted from Bing.
Two layers, then. Initial retrieval (BM25 + vector, running in parallel) is fast and broad, producing a candidate set. Re-ranking is slow and expensive, so it is applied only to the top of the merged list (the top 50 results in Azure AI Search). You wouldn't run the expensive pass on every document in the index.
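The fusion step is simpler than it sounds. Below is a minimal sketch of Reciprocal Rank Fusion, assuming the commonly used smoothing constant `k=60`; the document IDs are hypothetical. Each document earns `1 / (k + rank)` from every list it appears in, so raw BM25 and vector scores never have to be compared directly.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs using only rank positions."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document's contribution shrinks the lower it sits in a list.
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical result lists from the two retrievers, best match first.
bm25_ranking = ["doc_far_clause", "doc_section_421", "doc_travel"]
vector_ranking = ["doc_retention_policy", "doc_section_421", "doc_far_clause"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

The two documents that appear in both lists rise to the top of the fused order, ahead of documents that only one retriever found. That is the behavior that lets hybrid search reward agreement between the two mechanisms.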
Who's behind it: Azure AI Search is the most documented production implementation, but Elasticsearch, OpenSearch, and several purpose-built platforms offer hybrid modes. The pattern is converging across vendors because the empirical case closed the argument.
What makes it distinct: Keyword and vector retrieval fail in opposite directions. BM25 misses the document about "transaction error recovery" when you search "payment failures." Vector search misses "FAR 52.219-8" when you search for that exact clause. Hybrid catches both. Microsoft's production benchmarks, measured as NDCG@3 (a standard relevance metric scored against customer query/document pairs), show hybrid search with semantic re-ranking at 48.4, versus 43.8 for vector-only and 40.6 for keyword-only. Microsoft's own data shows that even with powerful embedding models, initial retrieval alone "is not enough," and the semantic ranker "drastically improves relevance" over the merged results. A separate benchmark from Supermemory (a hybrid search vendor, so not a neutral source) tells the same directional story from a recall perspective: hybrid hits 91% recall@10 compared to 78% for dense vector and 65% for sparse keyword.
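If a buyer asks what NDCG@3 or recall@10 actually measures, this sketch shows the textbook definitions. It is a simplification: the "ideal" ordering here is computed from the returned list only, whereas a full evaluation would use every judged document for the query. The relevance grades and document IDs are invented for illustration, and published figures like the ones above appear to be reported on a 0 to 100 scale rather than 0 to 1.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: relevance counts less further down the list."""
    return sum(
        rel / math.log2(pos + 1)
        for pos, rel in enumerate(relevances[:k], start=1)
    )

def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the best possible ordering of the same items."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def recall_at_k(retrieved_ids, relevant_ids, k):
    """Share of the known-relevant documents that show up in the top k."""
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# Invented relevance grades for the top 3 results (2 = best, 0 = irrelevant).
perfect_order = ndcg_at_k([2, 1, 0], k=3)
buried_answer = ndcg_at_k([0, 2, 1], k=3)
half_found = recall_at_k(["a", "b", "c"], ["a", "z"], k=3)
```

NDCG rewards putting the best document first, so the same three documents score lower when the irrelevant one leads. Recall only asks whether the relevant documents made the cut at all, which is why the two metrics are worth citing together.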
The two-layer retrieval pattern maps to step-up auth. Initial retrieval (BM25 + vector) is the first authentication: fast, broad, produces a candidate set. Re-ranking is step-up: a more expensive, higher-confidence evaluation applied to a smaller population. You wouldn't run the re-ranker on every document any more than you'd run the expensive check on every user. Where the analogy breaks: step-up auth is triggered by risk signals. Re-ranking happens on every query. The cost model is different, but the architectural intuition transfers.
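The "expensive check on a smaller population" idea can be sketched in a few lines. Everything here is hypothetical: `rerank_top_k` is an illustrative helper, and the word-overlap scorer is a cheap stand-in for a real re-ranking model so the example runs on its own. The call counter is there to show that the costly function only ever sees the first `k` candidates.

```python
def rerank_top_k(candidates, expensive_score, k=50):
    """Re-score only the first k fused candidates; leave the tail in fused order."""
    head, tail = candidates[:k], candidates[k:]
    reranked_head = sorted(head, key=expensive_score, reverse=True)
    return reranked_head + tail

def make_overlap_scorer(query, texts):
    """Hypothetical stand-in for a re-ranking model: counts shared words."""
    query_terms = set(query.lower().split())

    def score(doc_id):
        score.calls += 1  # track how often the "expensive" model is invoked
        return len(query_terms & set(texts[doc_id].lower().split()))

    score.calls = 0
    return score

# Illustrative corpus and fused ordering from the first retrieval layer.
texts = {
    "a": "records schedule overview",
    "b": "section 4.2.1 data retention requirements",
    "c": "data center cooling",
    "d": "cafeteria menu",
}
fused_order = ["a", "c", "b", "d"]

scorer = make_overlap_scorer("section 4.2.1 data retention", texts)
final_order = rerank_top_k(fused_order, scorer, k=3)
```

With `k=3`, document "b" moves from third to first and the scorer runs exactly three times. Document "d" is never scored. In a real system the trade-off is the same: a larger `k` gives the re-ranker more chances to rescue a buried document, at a higher cost per query.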
What Wins Where
I'm organizing this by scenario because that's what's useful in a conversation. When your buyer describes their use case, you need to know which mechanism fits and which one will quietly fail them.
For quick reference, here are the benchmark numbers you can cite:
| Metric | Keyword (BM25) | Vector Only | Hybrid + Re-ranking |
|---|---|---|---|
| NDCG@3 (Microsoft production) | 40.6 | 43.8 | 48.4 |
| Recall@10 (Supermemory benchmark) | 65% | 78% | 91% |
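The "about 10%" figure used in conversation is a relative lift, not a point difference. A quick check against the NDCG@3 row of the table, using a hypothetical helper named `relative_lift`:

```python
# NDCG@3 figures copied from the table above.
ndcg_at_3 = {"keyword": 40.6, "vector": 43.8, "hybrid_rerank": 48.4}

def relative_lift(new, baseline):
    """Percentage improvement of `new` over `baseline`."""
    return (new - baseline) / baseline * 100

lift_vs_vector = relative_lift(ndcg_at_3["hybrid_rerank"], ndcg_at_3["vector"])
lift_vs_keyword = relative_lift(ndcg_at_3["hybrid_rerank"], ndcg_at_3["keyword"])

print(f"{lift_vs_vector:.1f}% over vector-only")
print(f"{lift_vs_keyword:.1f}% over keyword-only")
```

That works out to 10.5% over vector-only and 19.2% over keyword-only. In points, the gaps are 4.6 and 7.8, so be clear about which one you are quoting.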
Scenario 1: Exact-Match Queries
The user searches for a contract number, error code, regulation section, or policy clause ID.
Keyword search wins cleanly. BM25 treats "FAR 52.219-8" as a literal string and returns documents containing it. Vector search encodes that string into a semantic neighborhood ("federal acquisition regulations, small business provisions") and may return thematically related documents that don't contain the actual clause. Hybrid covers this because the BM25 component catches the exact match even when the vector component drifts. In public sector contexts where users routinely search by regulation number, contract identifier, or NIST control reference, vector-only retrieval is a documented failure pattern.
Scenario 2: Conceptual Queries
The user asks "how do we handle exceptions for contractors on regulated projects?" and the relevant document uses entirely different vocabulary.
Vector search wins. The embedding model captures the semantic relationship between the query and a document titled "Contractor Onboarding: Special Provisions for Classified Programs" even though the word "exceptions" never appears. BM25 returns nothing useful because the term overlap is too low. Hybrid covers this because the vector component finds the semantic match regardless of keyword presence.
One practitioner documented a production failure (a post-mortem on Medium's Towards Data Science) where a compliance team's question about contractor exceptions returned a confident, well-structured, wrong answer. The model performed fine. The relevant exception clause had been split at a chunk boundary during ingestion. The general rule ended up in one chunk, the qualification in another, and the retrieval layer surfaced the general rule without the exception. Technically a chunking failure compounding the retrieval problem, and it would bite hybrid search too. But it illustrates the broader point: retrieval failures are silent. The system returned a confident answer. Nobody knew it was wrong until the compliance team checked.
Scenario 3: Mixed Queries
The user asks "what does Section 4.2.1 say about data retention?", an exact reference combined with a semantic concept.
Neither keyword nor vector handles this well alone. BM25 finds documents containing "Section 4.2.1" but can't evaluate whether they're about data retention. Vector search finds documents about data retention but may miss the specific section reference. Hybrid handles this because RRF merges both result sets, and the re-ranker promotes documents that satisfy both criteria. This is the most common real-world query pattern, and it's the scenario where the benchmark gap becomes tangible. The 4.6-point NDCG gap between hybrid with re-ranking (48.4) and vector-only (43.8), roughly 10% in relative terms, is the difference between the system finding the right section about the right topic and the system finding one or the other.
Scenario 4: Permission-Sensitive Retrieval
The user asks a question whose answer lives in a document they're not authorized to see.
This is where retrieval mechanism choice becomes an identity governance question. In a vector-only system without document-level ACLs, the similarity search returns chunks based purely on embedding distance. The model then presents that content as an answer. The user receives information they couldn't access directly, laundered through a natural-language interface that obscures the permission violation.
OWASP documents the concrete version of this: in a shared vector store, a user from Company A asks about quarterly revenue projections, and the similarity search retrieves semantically related content from Company B's confidential financials because the vectors aren't isolated by tenant. A routine query. An architecture that couldn't distinguish tenants.
Every major vector database supports metadata filtering as a mechanism for access control. Enforcement is the gap. Developers build shared indexes without attaching permission metadata, queries run without filter clauses, and the architecture assumes that if the retriever can access it, the user should too. Azure AI Search has addressed this most explicitly, introducing native Entra ID-based document-level security in mid-2025. Before that, developers had to hand-code security trimming and maintain the logic whenever someone's role shifted.
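To make the enforcement gap concrete, here is a minimal sketch of permission filtering on retrieved chunks. The `Chunk` structure, the `allowed_groups` field, and the group names are all hypothetical, and the similarity scores are invented. It assumes permission metadata was attached at index time and that a chunk with no metadata is denied by default. A real vector database would apply this as a filter inside the search itself rather than after it.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    score: float  # similarity score from the vector search (invented here)
    allowed_groups: set = field(default_factory=set)  # set at index time

def authorized_top_k(chunks, user_groups, k=3):
    """Return the best-scoring chunks the user is actually permitted to see."""
    # Deny by default: a chunk with no permission metadata matches no one.
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:k]

# Hypothetical shared index holding content from two tenants.
chunks = [
    Chunk("Company A revenue forecast", 0.82, {"company_a_finance"}),
    Chunk("Company B confidential financials", 0.91, {"company_b_finance"}),
    Chunk("Unlabeled legacy chunk", 0.88),
]

results = authorized_top_k(chunks, user_groups={"company_a_finance"})
```

Without the filter, Company B's chunk wins on similarity and goes straight to the model. With it, the Company A user only ever sees their own content, and the unlabeled chunk stays out because there is nothing to authorize it against. That last case is the one most early builds get wrong in the permissive direction.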
In federation, the relying party trusts the identity provider's assertion about who the user is. In RAG, the model trusts the retriever's results about what the user should see. If the retriever doesn't enforce document-level permissions, the model has no way to know it's presenting unauthorized content. Same structural problem as a service provider accepting any SAML assertion without validating audience restriction. When a buyer says "we're building RAG," the identity question worth asking is whether the retrieval layer enforces authorization on every chunk it returns. Authentication to the chat interface doesn't cover it.
How to Say This in the Field
| Don't say | Do say | Why it matters |
|---|---|---|
| "Vector search is AI-powered search" | "Vector search finds documents by meaning similarity, not keyword match — great for conceptual questions, drops exact strings like contract numbers" | Shows you know the tradeoff, not just the buzzword |
| "Keyword search is legacy" | "BM25 is still the only retrieval method that reliably finds exact identifiers — regulation numbers, error codes, SKUs" | Names a wall the buyer will hit on vector-only |
| "Hybrid search is the best approach" | "Hybrid covers both failure modes — Microsoft's benchmarks show it scoring about 10% higher than vector-only on relevance" | Attaches a number to a directional claim the AE can defend |
| "Embeddings capture meaning" | "An embedding is a list of numbers representing what a chunk of text means — similar meanings end up close together, but exact strings get smeared in the process" | "Smeared" is the insight; without it, embeddings sound like magic |
| "Re-ranking improves results" | "Re-ranking is the second pass — a more expensive model re-scores the top results from initial retrieval, and Microsoft's own data says it drastically improves relevance over the merge alone" | Explains the two-layer architecture without jargon |
| "RAG handles security through the LLM" | "The model has no idea whether the user was authorized to see what the retriever returned — permissions have to be enforced at the retrieval layer, on every query" | The identity conversation your buyer probably hasn't had |
| "Vector databases support access control" | "Every major vector database supports metadata filtering for access control — the gap is enforcement at index time and query time, and most early builds skip both" | Names the enforcement gap, which is where deals live |
| "We need to discuss your AI search strategy" | "When your users search by regulation number or contract ID, does your retrieval layer handle exact matches or just semantic similarity?" | Opens a diagnostic conversation without sounding like a pitch |
| "You should use hybrid search" | "If your users search by both meaning and exact reference — and in government, they always do — you need both retrieval mechanisms running, not one" | Ties the recommendation to the buyer's actual query patterns |
| "NDCG is a relevance metric" | "NDCG measures whether the system found the right documents and ranked them correctly — hybrid with re-ranking scores about 10% higher than vector-only on Microsoft's production benchmarks" | Gives the buyer a number they can use without needing the math |
| "The architecture needs document-level security" | "OWASP already named this — their LLM Top 10 calls out vector stores returning documents based on similarity without checking whether the user is authorized to see them" | Third-party validation lands harder than your opinion |
The retrieval mechanism your buyer chooses determines two things: what their AI system can find, and what it silently leaks. One is a search quality problem. The other is an identity governance problem. You're the person in the room who can talk about both.
Things to follow up on...
- OWASP's vector store entry: OWASP LLM08 in the 2025 LLM Top 10 specifically names vector and embedding weaknesses, including unauthenticated embedding endpoints and missing tenant isolation, as a top-ten risk category worth reading in full before any RAG-related buyer conversation.
- Azure's document-level security: Microsoft shipped native Entra ID-based document-level ACLs for Azure AI Search in mid-2025, replacing the hand-coded security trimming workaround that most production deployments still rely on.
- Chunking as silent failure: A Towards Data Science practitioner post-mortem on chunk boundary failures in production RAG shows how retrieval quality depends on ingestion decisions made long before the query runs, a problem that no retrieval mechanism fixes on its own.
- Agentic retrieval momentum: Microsoft's May 2025 announcement of an agentic retrieval engine for Azure AI Search claims up to 40% better relevance on complex queries, signaling that the next generation of hybrid search may let the model itself decide how to decompose and route retrieval.

