Table of Contents
Vector Databases vs. Traditional Databases for LLM Document Retrieval
Retrieval Efficiency
Embedding Storage and Indexing
Hybrid Retrieval Approaches
Retrieval Efficiency
Vector databases are purpose-built for fast similarity search over high-dimensional embeddings, sustaining rapid retrieval even as collections grow to millions or billions of vectors. They rely on approximate nearest neighbor (ANN) indexes that dramatically outperform brute-force scans: benchmarks show a wide performance gap between exhaustive search and ANN-based search, with graph-based indexes like HNSW achieving state-of-the-art speedups (When Large Language Models Meet Vector Databases: A Survey). Specialized vector engines (e.g. FAISS, Milvus, Pinecone) treat vectors as first-class data, using custom data structures and optimizations to reach millisecond-level query times on large corpora (ICDE_PaperID_79.pdf).

In contrast, traditional relational databases (PostgreSQL, MySQL) and document stores (MongoDB) were not designed for high-dimensional similarity queries. Without a specialized index, a relational database must compare a query embedding against every stored vector (O(n) per query), which becomes infeasible at scale. Even recent extensions that add ANN indexing to relational systems carry a measurable penalty: one study found that a PostgreSQL-based vector extension delivers significantly slower query performance than a dedicated vector search library under identical conditions. The overhead of the general-purpose engine (transaction layers, row-oriented storage, and so on) means vector search in a traditional database can be orders of magnitude less efficient on large datasets. For instance, an attempt to index 15 million 768-dimensional text embeddings inside PostgreSQL led to system instability and excessive query times, underscoring scalability limits beyond small datasets (Bulgarian Academy of Sciences). By contrast, specialized vector systems, often leveraging GPU acceleration and memory-optimized indexes, have demonstrated responsive search on similarly massive corpora. In summary, when retrieving chunk embeddings for LLM augmentation, vector databases scale to far larger corpora with lower latency, whereas naive use of relational or document databases becomes a bottleneck as embedding count and dimensionality grow.
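To make the efficiency gap concrete, here is a minimal sketch comparing exhaustive search against an HNSW index in FAISS. It assumes faiss-cpu and numpy are installed, and uses synthetic data in place of real chunk embeddings; corpus size and index parameters are illustrative only.

```python
import time
import numpy as np
import faiss  # pip install faiss-cpu

d, n = 768, 100_000                      # embedding dimension, corpus size (synthetic)
rng = np.random.default_rng(0)
corpus = rng.random((n, d), dtype=np.float32)
queries = rng.random((100, d), dtype=np.float32)

# Exhaustive (brute-force) search: exact results, but O(n) work per query.
flat = faiss.IndexFlatL2(d)
flat.add(corpus)

# HNSW graph index: approximate, visits only a small fraction of vectors.
# Building the graph takes a moment; queries are then far cheaper.
hnsw = faiss.IndexHNSWFlat(d, 32)        # 32 = graph connectivity (M)
hnsw.add(corpus)

for name, index in [("flat", flat), ("hnsw", hnsw)]:
    t0 = time.perf_counter()
    distances, ids = index.search(queries, 10)   # top-10 neighbors per query
    print(f"{name}: {time.perf_counter() - t0:.3f}s for 100 queries")
```

On typical hardware the HNSW index answers in a small fraction of the brute-force time, and the gap widens as the corpus grows; exact numbers depend on hardware and index parameters.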
Embedding Storage and Indexing
Indexing techniques differ fundamentally between vector and relational databases. Vector databases typically store embeddings in compact binary or numeric form and organize them with dedicated index structures (graph-based, tree-based, or quantization-based) optimized for similarity queries. These indexes (e.g. HNSW graphs, or IVF inverted files with product quantization) prune the search space and compute distances on only a small fraction of candidates, greatly speeding up retrieval. Many vector databases offer a choice of index types to balance accuracy, query speed, and memory footprint; FAISS, for example, provides flat (exact), IVF+PQ (compressed), and HNSW (graph) indexes in its library.

Relational databases, on the other hand, traditionally lack a native vector data type or index. Embeddings are often stored as arrays or blobs in a table row, which a standard B-tree index cannot accelerate for nearest-neighbor search. Newer extensions have emerged to bridge this gap: PostgreSQL's pgvector (and Alibaba's PASE) plugin defines a vector column type and implements ANN indexes (HNSW and IVF) inside the database. This allows similarity queries via SQL, but the underlying engine must still manage these indexes through its buffer manager and tuple structure, and research shows that such integrated approaches carry non-trivial overhead. One case study found that a Postgres-based HNSW index was slower and more memory-intensive than the same index in a standalone vector library, especially as index parameters (such as graph connectivity) increased. The gap widened for more complex indexes, due to extra pointer chasing and tuple-access costs in the relational engine. In practice, specialized vector stores use low-level optimizations (contiguous memory layout, SIMD distance computations, GPU offloading) that general databases rarely exploit. While some document databases such as MongoDB have added vector search features (using an underlying Lucene ANN index for vectors of up to 2048 dimensions), these essentially embed a vector index inside a text-search engine. Overall, vector databases excel by storing embeddings in tailor-made indexes for fast similarity lookup, whereas relational and general-purpose databases must either forgo indexing (resorting to brute force) or bolt on limited ANN indexes that struggle to match the efficiency of purpose-built solutions.
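For reference, here is a minimal sketch of the pgvector workflow driven from Python via psycopg2. Table, column, and connection details are illustrative, and it assumes PostgreSQL with pgvector 0.5 or later (the version that added HNSW support):

```python
import numpy as np
import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect("dbname=rag user=postgres")  # connection string is a placeholder
cur = conn.cursor()

# One-time setup: enable the extension, create a table with a vector column,
# and build an HNSW index over it.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        id        bigserial PRIMARY KEY,
        content   text,
        embedding vector(768)
    )
""")
cur.execute(
    "CREATE INDEX IF NOT EXISTS chunks_hnsw "
    "ON chunks USING hnsw (embedding vector_l2_ops)"
)
conn.commit()

# Query: top-10 chunks nearest to a query embedding (L2 distance, <-> operator).
query_vec = np.random.rand(768).astype(np.float32)
cur.execute(
    "SELECT id, content FROM chunks ORDER BY embedding <-> %s::vector LIMIT 10",
    (str(query_vec.tolist()),),  # pgvector accepts the '[x, y, ...]' text format
)
rows = cur.fetchall()
```

The same SELECT can carry ordinary SQL predicates (a WHERE clause on author or date, say), which is exactly the mixed vector-relational case discussed in the next section.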
Hybrid Retrieval Approaches
To get the best of both worlds, modern systems explore hybrid approaches that combine vector search with traditional database filtering or storage. One strategy is to integrate vector indexes into a relational database engine (as in AnalyticDB-V or Postgres with pgvector) so that a single query can perform semantic embedding matching alongside structured filters (ICDE_PaperID_79.pdf). This enables, for example, an SQL query that finds the top-10 most similar document chunks (via an ANN index) constrained by a date or author field. The challenge is choosing the optimal query plan: scanning all candidate vectors versus using the ANN index. Recent research proposes adaptive execution based on filter selectivity (Efficient Data Access Paths for Mixed Vector-Relational Search). If a metadata filter (e.g. a specific document category) shrinks the candidate set sharply, a sequential scan over those few embeddings may be faster than engaging a global index. Conversely, for broad queries with low selectivity, the vector index avoids costly distance computations over the entire dataset. Sanca and Ailamaki (2024) show there is a crossover point, dependent on data dimensionality and hardware concurrency, where the engine should switch from brute-force to indexed search to minimize latency.

Another hybrid pattern keeps vector and traditional databases side by side: embeddings live in a vector database for fast similarity ranking, while the original documents and metadata reside in a relational or document store. In a retrieval-augmented generation pipeline, a query embedding fetches the top-K similar chunk IDs from the vector database, and those IDs are then used to retrieve the full text or records from the document database. This two-tier design leverages the strength of each system: high-dimensional search in the vector store and reliable storage and lookup in the document store.

Many vector databases now also support storing metadata alongside vectors and offer boolean filters or keyword search, effectively merging this two-tier approach into one system (When Large Language Models Meet Vector Databases: A Survey). For example, Weaviate and Qdrant allow hybrid queries that combine ANN similarity ranking with traditional term filters, using an internal full-text index alongside the vector index. Such solutions confirm that combining semantic vector search with classical filtering can greatly improve retrieval quality and flexibility without sacrificing performance. Ongoing research indicates that with careful system design, a unified hybrid approach can achieve near-specialized performance: there appear to be no fundamental barriers preventing a relational database from matching a vector database's speed, given sufficient engineering effort. In practice, organizations choose the hybrid architecture that balances the convenience of a one-stop system against the absolute performance gains of dedicated vector stores (Choosing Between Relational and Vector Databases - Zilliz blog), ensuring that LLMs can be fed relevant document chunks efficiently at scale.
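The selectivity-driven plan choice described above can be sketched in a few lines of Python. Everything here is hypothetical illustration rather than any system's actual API: the function names, the hard-coded selectivity_cutoff, and the assumption that ann_index is a FAISS-style index whose search method returns (distances, ids).

```python
import numpy as np

def exact_topk(query, embeddings, ids, k):
    """Brute-force L2 search over a (small) candidate subset."""
    dists = np.linalg.norm(embeddings - query, axis=1)
    return [ids[i] for i in np.argsort(dists)[:k]]

def hybrid_search(query, ann_index, embeddings, metadata, predicate, k=10,
                  selectivity_cutoff=0.01):  # crossover threshold: illustrative only
    """Pick an access path based on how selective the metadata filter is."""
    candidates = [i for i, meta in enumerate(metadata) if predicate(meta)]
    selectivity = len(candidates) / len(metadata)

    if selectivity <= selectivity_cutoff:
        # Highly selective filter: an exact scan over the few surviving vectors
        # beats walking a global ANN index.
        return exact_topk(query, embeddings[candidates], candidates, k)

    # Broad filter: query the ANN index, over-fetch, then post-filter by metadata.
    _, neighbors = ann_index.search(query[None, :], 4 * k)
    hits = [int(i) for i in neighbors[0] if i >= 0 and predicate(metadata[i])]
    return hits[:k]
```

In a real engine the crossover point would be measured rather than hard-coded, since, as the research above emphasizes, it shifts with data dimensionality and hardware concurrency.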