Lesson 3: Embeddings and Vector Search

By Leigh Garrity, May 6, 2026

An embedding is a list of numbers that encodes meaning as geometry. More precisely: it's a dense numerical vector, typically hundreds to thousands of floating-point values, produced by a neural network trained so that semantically similar inputs generate geometrically proximate outputs. Two sentences that mean the same thing will produce vectors that are close together in this high-dimensional space. Two sentences about unrelated topics will produce vectors that are far apart.

That geometric relationship is not a metaphor. It's the actual mechanism. When an AI system retrieves relevant content, it is doing distance math.

How the Mechanism Works

Start with a document, such as a policy, a contract, or an incident report. You pass it through an embedding model, which outputs a vector: something like [0.23, -0.87, 0.41, ...], with 1,536 values in total. That vector is stored in a vector database alongside a pointer to the source document. In practice, long documents are usually split into smaller chunks first, and each chunk gets its own vector. Repeat this for every document in your corpus.
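The indexing step above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: toy_embed is a hypothetical stand-in for a real embedding model (it hashes words into buckets, so it captures word overlap rather than meaning), the 8-dimension size is chosen only for readability, and the document IDs and texts are invented for the example.

```python
import hashlib
import math

DIM = 8  # toy size for readability; real models use hundreds to thousands of values


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Hypothetical stand-in for an embedding model: a hashed bag of words.

    It reflects word overlap only, not meaning. A real system would call
    a trained embedding model here instead.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0  # count the token in one of `dim` buckets
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec  # scale to unit length


def build_index(documents: dict[str, str]) -> list[tuple[list[float], str]]:
    """Return one (vector, pointer-to-source) pair per document."""
    return [(toy_embed(text), doc_id) for doc_id, text in documents.items()]


# Invented example corpus: document ID -> text
corpus = {
    "policy-001": "records involving federal procurement must be preserved",
    "contract-014": "the vendor shall deliver quarterly status reports",
    "incident-203": "badge reader outage at the north entrance",
}
index = build_index(corpus)
```

Each entry in index pairs a vector with the ID of the document it came from, which is the "vector plus pointer" record the paragraph describes.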

At query time, the user's question goes through the same embedding model and produces its own vector. The vector database then computes the geometric distance between the query vector and every stored vector, or uses an approximate nearest-neighbor algorithm to do this efficiently at scale, and returns the k closest matches.
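A minimal sketch of the exhaustive version of that lookup follows. The two-dimensional vectors and document IDs are invented so the arithmetic is easy to check by hand, and the function name top_k is an assumption for illustration. A real vector database would typically replace the full scan with an approximate nearest-neighbor index.

```python
import heapq
import math


def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, index, k):
    """Exhaustive scan: measure the distance to every stored vector, keep the k closest."""
    scored = ((euclidean(query_vec, vec), doc_id) for vec, doc_id in index)
    return heapq.nsmallest(k, scored)


# Invented 2-D vectors; real embeddings have far more dimensions.
index = [
    ([0.9, 0.1], "retention-policy"),
    ([0.8, 0.3], "procurement-records"),
    ([-0.7, 0.6], "parking-memo"),
]
results = top_k([1.0, 0.2], index, k=2)
closest_ids = [doc_id for _, doc_id in results]
```

With these numbers, closest_ids is ["retention-policy", "procurement-records"]. The scan costs one distance computation per stored vector, which is the cost that approximate algorithms are designed to avoid at scale.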

Keyword search would miss many of the matches that vector search finds. The query "what are our data retention requirements for federal contractors?" and a document paragraph reading "records involving federal procurement must be preserved for a minimum of seven years" share almost no words. A keyword index would likely pass over the match. A vector search surfaces it, because the two inputs encode to nearby coordinates in semantic space. The model learned, during training, that "retention requirements" and "preserved for a minimum of" are semantically adjacent concepts.
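The "almost no words" claim is easy to check. The sketch below counts the words the query and the passage have in common, using a deliberately simple tokenizer (lowercase runs of letters) that is an assumption of this example rather than how any particular keyword engine works.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


query = "what are our data retention requirements for federal contractors?"
passage = (
    "records involving federal procurement must be preserved "
    "for a minimum of seven years"
)

shared = tokens(query) & tokens(passage)
# Jaccard overlap: shared words divided by all distinct words in either text
jaccard = len(shared) / len(tokens(query) | tokens(passage))
```

Here shared is {"federal", "for"} and jaccard is 0.1: two words in common out of twenty distinct words, and one of the two is a stopword. That is the lexical signal a keyword index has to work with.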

The dimensionality of the embedding space determines the resolution of the semantic representation. Current production models typically operate in ranges from 768 to 3,072 dimensions, though this varies by model and will continue to shift. More dimensions allow finer-grained distinctions. The tradeoff is storage and compute cost, which is why architecture decisions about embedding models carry budget weight alongside technical weight.
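The storage side of that tradeoff is simple arithmetic. The sketch below assumes 4-byte (32-bit) floats and a corpus of one million vectors, both chosen for illustration, and it counts only the raw vectors, not index structures or metadata.

```python
def index_size_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for the vectors alone, assuming fixed-width floats."""
    return num_vectors * dims * bytes_per_value


# Assumed corpus size of one million vectors
for dims in (768, 1536, 3072):
    gb = index_size_bytes(1_000_000, dims) / 1e9
    print(f"{dims} dims: {gb:.2f} GB")
```

Under those assumptions the raw vectors take about 3.07 GB at 768 dimensions, 6.14 GB at 1,536, and 12.29 GB at 3,072. Storage grows linearly with dimensionality, and so does the work of each distance computation.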

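Similarity is typically measured by cosine similarity (the cosine of the angle between two vectors, regardless of their magnitude) or Euclidean distance (the straight-line distance between two points). Cosine similarity is more common for text because it ignores vector magnitude, and in representations where magnitude grows with document length, that makes the comparison insensitive to length. A short paragraph and a long chapter can be meaningfully compared without the chapter's larger magnitude dominating the result. When vectors are normalized to unit length, as many embedding models do, the two measures rank results identically.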
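A small sketch makes the difference between the two measures concrete. The vectors are invented: chapter points in the same direction as paragraph but is ten times longer, standing in for a representation whose magnitude grows with document length.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b; ignores their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance between a and b; sensitive to magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


paragraph = [1.0, 2.0, 3.0]
chapter = [10.0, 20.0, 30.0]  # same direction, ten times the magnitude

cos = cosine_similarity(paragraph, chapter)
dist = euclidean_distance(paragraph, chapter)
```

Cosine similarity comes out at 1.0 (up to floating-point rounding), treating the two as identical in direction, while the Euclidean distance is about 33.7 because of the difference in length alone.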

Why the Vector Database Is on Every Architecture Diagram

Language models don't have your organization's data. They were trained on public corpora — web text, books, code — and that training is frozen. The model has no knowledge of your agency's policies, your contracts, your incident history, or anything else that postdates its training cutoff or was never public to begin with.

The vector database is the component that makes the model's knowledge specific to you. It holds the embeddings of your internal content, indexed for retrieval at query time. When a user asks a question, the system retrieves the most relevant chunks from your corpus and supplies them to the model as context. The model answers using both its general knowledge and the retrieved content.
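The "supplies them to the model as context" step usually amounts to assembling text. Here is a minimal sketch, assuming the chunks have already been retrieved. The function name build_prompt and the prompt wording are hypothetical, not taken from any particular product.

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved chunks and the user's question into one prompt string."""
    # Number each chunk so the model (and a reviewer) can refer back to it.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt(
    "What are our data retention requirements for federal contractors?",
    [
        "Records involving federal procurement must be preserved "
        "for a minimum of seven years.",
    ],
)
```

The model never queries the vector database itself in this arrangement. It only sees whatever the retrieval layer chose to put in the prompt, which is why retrieval quality bounds answer quality.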

This is why the box appears on every diagram. Without it, the AI system can only answer from its training data. With it, the system can answer from your data. A CIO asking why the architecture requires a vector database is asking why the system knows anything about their agency. The vector database is the answer.

The procurement implication follows directly. Evaluating an enterprise AI system means evaluating its retrieval layer: which embedding model it uses, how it chunks and indexes documents, what similarity threshold it applies before returning results, and how it handles documents that have changed since they were indexed. These choices determine whether the system surfaces accurate, current information or confidently retrieves something adjacent but wrong.
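One of those questions, how the system handles documents that have changed since indexing, can be illustrated with a content hash recorded at index time. This is one possible approach under stated assumptions, not a description of how any given product does it. The function names and the sample documents are invented.

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint of a document's text, recorded when it is indexed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale(indexed_hashes: dict[str, str], current_docs: dict[str, str]) -> list[str]:
    """Return IDs whose current text does not match the hash stored at index time.

    Documents that were never indexed are also returned, because they have
    no stored hash to match.
    """
    return sorted(
        doc_id
        for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )


# Hashes captured when the corpus was last indexed (invented data)
indexed = {
    "policy-1": content_hash("Retain records for five years."),
    "memo-2": content_hash("Lot B closes Friday."),
}
# The corpus as it stands today
current = {
    "policy-1": "Retain records for seven years.",  # edited since indexing
    "memo-2": "Lot B closes Friday.",               # unchanged
    "contract-3": "Vendor delivers quarterly.",     # never indexed
}
stale = find_stale(indexed, current)
```

Here stale is ["contract-3", "policy-1"]. Until those entries are re-embedded, a search can still return the old five-year text for policy-1 with full confidence.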

Okta Concept Mapping

The closest IDAM analogue to a vector database is a directory service. An LDAP directory stores identity attributes — names, roles, group memberships — indexed for runtime retrieval. A vector database stores embeddings, also indexed for runtime retrieval. Both are purpose-built stores queried at runtime rather than baked into the application; both return records the application uses to make a decision.

The analogy breaks at the query model. LDAP queries are typically exact-match: an entry either satisfies the filter (memberOf=contractors) or it doesn't. Vector search returns a ranking rather than a yes-or-no answer: a plain top-k query returns the k nearest neighbors regardless of whether they're actually close enough to be useful. There is no built-in threshold. The application decides what "similar enough" means, and that threshold is a policy decision that most architecture diagrams don't show and most procurement conversations don't reach. IDAM people will recognize the governance shape immediately: it's the difference between a binary access decision and a risk score, and it carries the same implications.
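The threshold described above is usually a single comparison in application code. The sketch below assumes the vector store has already returned (document ID, cosine similarity) pairs. The scores, the IDs, and the 0.70 cutoff are all invented for illustration, and choosing the real cutoff is the policy decision the paragraph refers to.

```python
def filter_by_threshold(
    neighbors: list[tuple[str, float]], min_similarity: float
) -> list[tuple[str, float]]:
    """Keep only neighbors whose similarity score meets the application's cutoff."""
    return [(doc_id, score) for doc_id, score in neighbors if score >= min_similarity]


# Top-3 results as a vector store might return them (invented scores)
neighbors = [
    ("retention-policy", 0.83),
    ("procurement-records", 0.71),
    ("parking-memo", 0.22),
]
kept = filter_by_threshold(neighbors, min_similarity=0.70)
kept_ids = [doc_id for doc_id, _ in kept]
```

Without the filter, the parking memo would reach the model as "relevant context" simply because it was the third-nearest vector. With it, kept_ids is ["retention-policy", "procurement-records"], and an empty result becomes a legitimate outcome that the application has to handle.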
