A language model can reason over what's in its context window. It cannot search. Those are different operations, and the difference is why every enterprise AI architecture diagram has a box labeled "vector database" sitting between the data layer and the model. That box is doing something specific. This piece explains what.
What an Embedding Actually Is
An embedding is a vector (a list of numbers) that represents the meaning of a piece of text. It is not the text itself, and it is not a hash or a fingerprint. It is a coordinate in a high-dimensional space whose geometry encodes semantic relationships.
The key property: semantic similarity maps to geometric distance. Two pieces of text that mean similar things will have coordinates that are close together in this space. Two pieces of text that mean different things will have coordinates that are far apart. "Password reset" and "I can't log in to my account" will land near each other. "Password reset" and "quarterly earnings guidance" will not.
The space is high-dimensional in a way that resists visualization but is straightforward to compute over. A typical production text embedding model outputs vectors with hundreds to thousands of dimensions — 1,536 is a representative number for current general-purpose text embedding models, though this is model-specific and changes as the field moves. Each dimension captures some aspect of meaning that the model learned during training. You don't interpret the dimensions individually. The meaning lives in the relationships between coordinates, not in any single number.
What you're working with, in practice, is a long list of decimal numbers: [0.023, -0.417, 0.891, 0.114, ...] continuing for however many dimensions the model produces. That list is the embedding. Store it alongside a pointer to the original text, and you have the basic unit of a vector database.
• Embedding: A numerical coordinate in a high-dimensional meaning-space, produced by passing text through a trained model. Semantic similarity between two pieces of text corresponds to geometric proximity between their embeddings.
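To make the "vector plus pointer" unit concrete, here is a minimal sketch. The class name EmbeddingRecord, the field names, and the source_id value are all hypothetical, and the four numbers are made up for readability rather than produced by any model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EmbeddingRecord:
    """One stored unit: a vector plus a pointer back to the source text."""
    vector: Tuple[float, ...]  # the embedding itself
    source_id: str             # hypothetical pointer to the original text


# Illustrative 4-dimensional values only. A real model emits hundreds to
# thousands of dimensions per vector.
record = EmbeddingRecord(
    vector=(0.023, -0.417, 0.891, 0.114),
    source_id="kb/password-reset.md#chunk-0",
)

print(len(record.vector), record.source_id)
```

Everything a vector database does downstream operates on collections of records shaped roughly like this one.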
How Vector Search Works
Start with a corpus — a set of documents, policy pages, support articles, whatever the model needs to be able to retrieve. Pass each document (or each chunk of a document) through an embedding model. The model returns a vector for each chunk. Store those vectors in a vector database, each one linked to its source text.
At query time: a user submits a question. That question goes through the same embedding model — critically, the same model, because the coordinate space only makes sense if everything in it was generated by the same model. You get back a query vector, then ask the vector database to find the stored vectors geometrically closest to it.
The database does that search and returns the nearest neighbors — the stored chunks whose embeddings are closest to the query embedding. Those chunks go into the context window. The model generates a response from them.
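The index-then-query flow described above can be sketched in a few lines. This is a toy under stated assumptions: toy_embed is a hypothetical stand-in that only counts words from a tiny fixed vocabulary, so it captures word overlap rather than meaning, and the search is a plain brute-force scan rather than anything a real vector database would run. The point is the shape of the pipeline, including the rule that documents and queries pass through the same embedding function.

```python
import math
import re

# Hypothetical stand-in for a real embedding model: counts of words from a
# tiny fixed vocabulary. It captures word overlap only, not meaning.
VOCAB = ["password", "reset", "login", "account", "leave", "sick", "earnings"]


def toy_embed(text):
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(chunks):
    # Index time: embed every chunk once and keep it linked to its text.
    return [(toy_embed(chunk), chunk) for chunk in chunks]


def search(index, question, k=2):
    # Query time: embed the question with the SAME function, then rank
    # stored chunks by similarity to the query vector.
    query_vec = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]


chunks = [
    "How to reset your password from the login page",
    "Sick leave policy and how to request leave",
    "Quarterly earnings guidance for investors",
]
index = build_index(chunks)
top = search(index, "I forgot my password and my account is locked", k=1)
print(top)
```

In a real deployment the returned chunks would be placed into the context window; here they are simply printed.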
The distance calculation is typically cosine similarity rather than straight Euclidean distance. Cosine similarity measures the angle between two vectors rather than the distance between their endpoints. This matters because the magnitude of a vector — how long it is — doesn't carry semantic information. The direction does. Two embeddings can be very different in magnitude and still point in nearly the same direction, meaning they represent semantically similar content. Cosine similarity captures that; raw distance does not.
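A small worked comparison shows the difference. The three vectors below are made-up values chosen to make the effect visible; they are not real embeddings.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def euclidean_distance(a, b):
    return math.dist(a, b)


short = [1.0, 2.0, 3.0]
long_same_direction = [10.0, 20.0, 30.0]  # same direction, 10x the magnitude
different_direction = [3.0, -1.0, 0.5]    # similar magnitude, other direction

# Cosine ranks the same-direction vector as the closer match...
print(cosine_similarity(short, long_same_direction))
print(cosine_similarity(short, different_direction))

# ...while Euclidean distance ranks it as the farther one.
print(euclidean_distance(short, long_same_direction))
print(euclidean_distance(short, different_direction))
```

If every vector is first normalized to unit length, squared Euclidean distance equals 2 minus twice the cosine similarity, so the two measures produce the same ranking. The choice matters most when stored vectors vary in magnitude.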
The search itself is where vector databases earn their place in the architecture. Comparing a query vector against millions of stored vectors by brute force means one distance calculation per stored vector on every query, so cost grows linearly with the size of the corpus. Vector databases use approximate nearest neighbor (ANN) algorithms: indexing structures that trade a small amount of accuracy for a large reduction in search time. The tradeoff is explicit and tunable. You can ask for exact nearest neighbors at high computational cost, or accept approximate nearest neighbors and get results quickly. Most production deployments accept the approximation.
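To illustrate the accuracy-for-speed trade, here is a minimal sketch of one family of ANN techniques, random-hyperplane locality-sensitive hashing. It is chosen for brevity and is not a claim about what any particular vector database uses. The class name HyperplaneIndex, its parameters, and the random test data are all assumptions made for this example.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


class HyperplaneIndex:
    """Toy ANN index: vectors with the same sign pattern against a set of
    random hyperplanes land in the same bucket."""

    def __init__(self, dim, n_planes, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets = defaultdict(list)

    def _signature(self, vec):
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in self.planes
        )

    def add(self, vec, label):
        self.buckets[self._signature(vec)].append((vec, label))

    def query(self, vec):
        # Only the query's own bucket is scanned. True neighbors that hashed
        # into a different bucket are missed: that is the accuracy cost.
        candidates = self.buckets.get(self._signature(vec), [])
        best = max(candidates, key=lambda item: cosine(vec, item[0]), default=None)
        return best, len(candidates)


rng = random.Random(1)
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(2000)]

index = HyperplaneIndex(dim=16, n_planes=6)
for i, vec in enumerate(vectors):
    index.add(vec, i)

(best_vec, best_label), scanned = index.query(vectors[0])
print(best_label, "found after scanning", scanned, "of", len(vectors))
```

Raising n_planes makes buckets smaller, so each query scans fewer candidates but is more likely to miss a true neighbor. That knob is the tunable tradeoff in miniature.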
• Vector search: The operation that finds stored embeddings geometrically closest to a query embedding. Results are the nearest neighbors in meaning-space — the content most semantically similar to the query.
IDAM Concept Mapping: The Directory Lookup, and Where It Stops Working
The closest analogy in your existing mental model is a directory lookup. You have a query, you hit a store, and you get back records that inform what happens next — the same basic shape as querying LDAP or an IdP's attribute store. The vector database is a lookup layer; it sits between the application and the model; it accepts a query and returns records. That much maps cleanly.
The break is this: a directory lookup is exact-match. uid=jsmith either resolves or it doesn't. The result is deterministic and unambiguous. Vector search is approximate-match on semantic similarity. There is no exact key. The results are the nearest neighbors (the geometrically closest embeddings), not a guaranteed correct answer. Against an unchanged index and embedding model, the same query returns the same results, so the computation is repeatable, but "nearest" does not mean "right." It means "closest in the embedding space the model learned." A chunk can be the top result and still be the wrong document for the question. If you're building a workflow where the retrieval needs to be auditable, explainable, or trusted for a consequential decision, that distinction is not a footnote. Your IdP does not return "the user most similar to jsmith." Your vector database returns exactly that kind of approximation, every time.
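The contrast can be shown side by side. In this sketch the directory is a plain dictionary and the store holds hand-made three-dimensional vectors standing in for real embeddings; every name and value here is hypothetical.

```python
import math

# Exact match: the key resolves or it does not.
directory = {"jsmith": {"cn": "Jane Smith", "dept": "Finance"}}
print(directory.get("jsmith"))
print(directory.get("jsmyth"))  # no such key, so nothing comes back


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hand-made 3-d vectors standing in for real embeddings (illustrative only).
store = [
    ([0.9, 0.1, 0.0], "Password reset procedure"),
    ([0.1, 0.9, 0.1], "Sick leave policy"),
    ([0.0, 0.2, 0.9], "Quarterly earnings guidance"),
]


def nearest(query_vec):
    # Nearest neighbor: something always comes back, along with a score.
    return max((cosine(query_vec, vec), text) for vec, text in store)


# A query that is not really about any stored document still gets a winner.
score, text = nearest([0.4, 0.4, 0.4])
print(round(score, 3), text)
```

The dictionary can say "no such entry." The nearest-neighbor call cannot; it can only report how close the best candidate was, which is why the score deserves attention in any workflow that has to be audited.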
When You'll Need This
A federal civilian agency is evaluating an AI assistant for its IT help desk. The CIO asks how the system knows which policy documents to surface when an employee asks about leave procedures. The agency has twelve thousand policy documents, HR guidance pages, and IT runbooks. You cannot put all of them in the context window; if you have been following this series, you already know why. Instead, every document has been pre-processed through an embedding model, and those embeddings live in a vector database. When the employee asks their question, the question gets embedded too, and the vector database finds the four or five documents that are semantically closest to the question. Those go into the context window. The model answers from them.
Semantic search is a distinct capability from keyword search, which is why procurement conversations increasingly treat them as separate line items. A keyword search for "leave" finds every document containing the word "leave" — which, in a federal agency's document corpus, is a large and mostly irrelevant set. A vector search for "I need to take time off for a medical procedure" finds the FMLA policy, the sick leave policy, and the reasonable accommodation guidance — without any of those documents necessarily containing the exact words the employee used. Retrieval is driven by meaning.
The same substrate powers recommendation systems and semantic deduplication, two use cases that show up in enterprise AI conversations well before anyone mentions a chatbot. A recommendation engine that suggests similar content is very often doing vector search. A system that flags near-duplicate records in an identity store may be doing the same thing. When a vendor says their product uses "AI-powered similarity matching," they almost certainly mean embeddings and vector search. Knowing the mechanism changes what you can ask them.
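Near-duplicate flagging reduces to a pairwise similarity check against a cutoff. The sketch below assumes hand-made vectors and a threshold of 0.98; both are illustrative choices, and a real system would take its vectors from an embedding model and tune the threshold against labeled examples.

```python
import math
from itertools import combinations


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Hand-made vectors standing in for embeddings of identity records
# (illustrative values only).
records = {
    "Jon Smith, Finance": [0.90, 0.10, 0.05],
    "Jonathan Smith, Finance Dept": [0.88, 0.12, 0.06],
    "Maria Lopez, Engineering": [0.05, 0.20, 0.95],
}


def near_duplicates(records, threshold=0.98):
    # Flag every pair whose similarity meets the (assumed) threshold.
    return [
        (name_a, name_b)
        for (name_a, vec_a), (name_b, vec_b) in combinations(records.items(), 2)
        if cosine(vec_a, vec_b) >= threshold
    ]


flagged = near_duplicates(records)
print(flagged)
```

Comparing every pair is fine for three records and impractical for millions, which is one reason this use case ends up on the same nearest-neighbor infrastructure as search.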
The vector database retrieves. It does not generate. That distinction matters when a buyer asks about accuracy, hallucination, or explainability — because those questions have different answers depending on whether they're aimed at the retrieval layer or the generation layer. Retrieval fails when the nearest neighbor wasn't actually relevant. Generation fails when the model reasons incorrectly from correct inputs. These are different failure modes, and conflating them is where a lot of enterprise AI conversations go sideways.
• Vector database: Infrastructure that stores embeddings and executes nearest-neighbor search at scale. It is the retrieval layer in AI architectures — it finds relevant content; it does not generate responses.
The next section covers how this retrieval layer connects to the generation layer — the full RAG pipeline. What you have now is the substrate: a coordinate system for meaning, a search operation that works on that coordinate system, and a database that makes that search fast enough to be useful. That's enough to follow the conversation. The rest is plumbing.

