LLMs and RAG are Great, But Don’t Throw Away Your Inverted Index Yet

Daniel Tunkelang
3 min read · Mar 29, 2024

Vectors, embeddings, large language models (LLMs), and retrieval-augmented generation (RAG) represent the cutting edge of search architecture, and it is very tempting to believe we can dispense with the traditional inverted index architecture entirely. You should be excited about this brave new world, but you should also proceed with caution.

It is true that embedding-based retrieval addresses many pain points that challenge a traditional inverted index. Embeddings are less susceptible to polysemy (words having multiple meanings) and synonymy (multiple words having the same meaning). And embedding-based retrieval can be especially useful for handling long queries, particularly compared to traditional methods like query expansion and query relaxation.

These sound like great arguments in favor of embedding-based retrieval. So what is the catch? Why are most companies still using a traditional — or at least a hybrid — architecture? Here are some of the main reasons.

Embedding-based retrieval is powerful, but it gains that power at the price of explainability. Vectors from embeddings tend to be less explainable than token-based representations. While a bag of words may not be a perfect representation of content, it is at least simple and understandable. In contrast, embeddings are a black box, making it hard to understand how they affect retrieval and ranking, and even harder to debug.
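To make the contrast concrete, here is a minimal sketch in plain Python with toy data: a bag-of-words match comes with the exact terms that produced it, while a dense similarity score is a single number with no per-term attribution.

```python
import math

def bow_score(query, doc):
    """Token overlap: the score comes with the matching terms, so it explains itself."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    matches = q_tokens & d_tokens
    return len(matches), matches

def cosine(u, v):
    """Dense similarity: one opaque number, with no way to point at why it is high or low."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

score, why = bow_score("red running shoes", "nike running shoes in red")
# `why` is exactly the terms that matched: {'red', 'running', 'shoes'}
```

When a stakeholder asks why a result ranked where it did, the first representation answers the question directly; the second leaves you reverse-engineering a model.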

Embeddings also tend to be task-dependent. A single embedding model may not capture everything about a document or query. For example, in an e-commerce setting, an embedding might be more or less sensitive to variations in product type, brand, or size. Since embedding-based retrieval reduces relevance to a single similarity metric, there is a risk that a single vector representation will not address all search use cases. In contrast, token-based representations, despite being simplistic, are more flexible.

There are also computational challenges. Embeddings tend to be vectors with hundreds of densely populated dimensions. That is not necessarily a showstopper, especially if the documents they represent are large. Also, there are techniques to make the representations more compact. Still, index size matters, especially when vectors need to be kept in memory to minimize the latency of accessing them. Aside from scale concerns, exact nearest-neighbor search is not practical for most latency-sensitive applications, and even approximate nearest-neighbor (ANN) search is slower than performing simple set operations on an inverted index.
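A toy sketch of the two retrieval styles makes the cost difference visible. The corpus here is hypothetical, and the exact scan stands in for nearest-neighbor search without any ANN indexing: conjunctive retrieval over an inverted index is cheap set intersection, while the vector side must score every document.

```python
import math
from collections import defaultdict

# Hypothetical three-document corpus.
docs = {1: "red running shoes", 2: "blue running shorts", 3: "red dress shoes"}

# Inverted index: term -> set of doc ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def boolean_and(terms):
    """Conjunctive retrieval: intersect posting lists -- a simple set operation."""
    postings = [index[t] for t in terms]
    return set.intersection(*postings) if postings else set()

def exact_nn(query_vec, doc_vecs):
    """Exact nearest neighbor: scores every vector, O(N * dims) per query."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
    return max(doc_vecs, key=lambda d: cos(query_vec, doc_vecs[d]))

print(boolean_and(["red", "shoes"]))  # {1, 3}
```

ANN structures avoid the full scan, but they trade exactness for speed and still tend to cost more per query than posting-list intersection.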

And then there is ranking. It is not clear how to combine the query-dependent similarity score with other ranking factors, particularly query-independent desirability factors. Ranking is never easy, but embedding-based retrieval introduces additional complexity.
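One common approach, sketched here with illustrative weights rather than anything prescribed, is a linear blend of the query-dependent similarity with a query-independent desirability prior. In practice the weights would be tuned or learned, and getting them right is exactly the complexity the paragraph above describes.

```python
def blended_score(similarity, desirability, w_sim=0.7, w_des=0.3):
    """Linear blend of a query-dependent similarity (e.g. cosine, in [0, 1])
    with a query-independent desirability prior (e.g. normalized popularity).
    The weights here are illustrative, not recommended values."""
    return w_sim * similarity + w_des * desirability

# Two hypothetical candidates: one slightly more similar, one far more desirable.
candidates = {"a": (0.90, 0.10), "b": (0.85, 0.80)}
ranked = sorted(candidates, key=lambda c: blended_score(*candidates[c]), reverse=True)
# b: 0.7 * 0.85 + 0.3 * 0.80 = 0.835 beats a: 0.7 * 0.90 + 0.3 * 0.10 = 0.66
```

Even this toy version raises the hard questions: are the two scores on comparable scales, and does one set of weights serve every query?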

Finally, there is the challenge of any operations that depend on retrieval, including result counts, filters or facets, and explicit sorts. These are hard to implement well when we lack a principled way to manage the precision-recall tradeoff.
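Facet counts illustrate the problem. With an inverted index the retrieved set is exact, so counts over it are exact; with an approximate, truncated ANN candidate set, the same counts become estimates. A toy sketch, using a hypothetical catalog with a facetable brand attribute:

```python
from collections import Counter

# Hypothetical catalog; "brand" is the facetable attribute.
catalog = {
    1: {"text": "running shoes", "brand": "acme"},
    2: {"text": "trail running shoes", "brand": "acme"},
    3: {"text": "running shoes", "brand": "zenith"},
}

def retrieve(term):
    """Exact token retrieval returns every matching doc, so downstream counts are exact."""
    return [d for d, doc in catalog.items() if term in doc["text"].split()]

def facet_counts(result_ids, attr):
    """Count attribute values over the retrieved set."""
    return Counter(catalog[d][attr] for d in result_ids)

print(facet_counts(retrieve("shoes"), "brand"))  # Counter({'acme': 2, 'zenith': 1})
```

Run the same aggregation over a top-k ANN result and the counts depend on k and on the approximation, which is exactly why facets and sorts are awkward to build on embedding-based retrieval alone.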

These are serious challenges! So it is important to go into embedding-based retrieval cautiously, recognizing that, for many applications, the costs do not justify giving up the benefits of a traditional inverted index architecture. Or at least not yet.