Indexing by Intent

Daniel Tunkelang
2 min read · Dec 18, 2023

Search developers mostly focus on mapping queries to results. This perspective is natural, insofar as search starts with the user formulating a query and (hopefully) ends with finding a result that satisfies the query. But sometimes it is useful to invert this perspective, mapping results to queries that represent search intents.

In effect, we are just flipping a duality: instead of modeling each query as a bag of documents, we model each document as a bag of queries. We can then ask questions like which queries that match the document can actually retrieve it (which turns out to be a useful recall metric!).
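To make the inversion concrete, here is a minimal sketch. It assumes a toy log of (query, retrieved document) pairs; the log contents, the document ids, and the function name `bags_of_queries` are all made up for illustration and do not come from any real system.

```python
from collections import Counter, defaultdict

def bags_of_queries(log):
    """Group a log of (query, doc_id) pairs into one bag of queries per document.

    Each bag is a Counter, so a query that reaches the same document
    several times is counted several times.
    """
    bags = defaultdict(Counter)
    for query, doc_id in log:
        bags[doc_id][query] += 1
    return dict(bags)

# Hypothetical toy log: each entry says "this query retrieved this document".
log = [
    ("running shoes", "d1"),
    ("running shoes", "d2"),
    ("trail sneakers", "d2"),
    ("running shoes", "d2"),
    ("rain jacket", "d3"),
]

bags = bags_of_queries(log)
```

Here `bags["d2"]` holds "running shoes" twice and "trail sneakers" once, which is the document-side view of the same data that a query-side index would store as a list of documents per query.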

Modeling documents as bags of queries is an old idea. In 2010, Jeremy Pickens, Matthew Cooper, and Gene Golovchinsky proposed a “reverted index” similar to what I am proposing here. They in turn drew inspiration from Leif Azzopardi and Vishwa Vinay’s 2008 work on accessibility in information retrieval, which borrowed the concept of “accessibility” from land use and transportation planning. After all, search is a journey!

But sometimes an old idea needs new life. Here are some direct ways to apply the bag-of-queries model:

  • Measuring recall. The idea is to take a representative sample of documents, find queries for which each document is a relevant result, and see how often those queries actually retrieve it. More details in this post on “Using Retrievability to Measure Recall”.
  • Detecting spam / abuse. A document should be about something, and the bag-of-queries model gives us a searcher-centric representation of that aboutness. If the queries in the bag are spread out over too much of the intent space, that suggests that the document is trying to be too many things to too many people, which is usually a sign of spam or abuse.
  • Supply / demand alignment. In ecommerce, content, and talent marketplaces, suppliers often struggle to find their place in the competitive landscape. Modeling their offerings as bags of queries helps them understand what demand they are targeting, which in turn can help them pursue niches where demand exceeds supply.
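The first two applications can be sketched in a few lines. This is only an illustration under stated assumptions: `retrieve` stands in for whatever search function you have, the query-to-intent labels are hypothetical, and using Shannon entropy to measure how spread out a bag is over intents is my own choice of measure here, not something the prior work prescribes.

```python
import math
from collections import Counter

def retrievability_recall(relevant_queries, retrieve):
    """Fraction of (document, query) pairs where the query retrieves the document.

    relevant_queries: dict of doc_id -> queries for which that document is relevant.
    retrieve: callable taking a query and returning a collection of doc ids
              (an assumed stand-in for a real search call).
    """
    hits = total = 0
    for doc_id, queries in relevant_queries.items():
        for query in queries:
            total += 1
            if doc_id in retrieve(query):
                hits += 1
    return hits / total if total else 0.0

def intent_spread(queries, intent_of):
    """Shannon entropy (in bits) of the intent labels in a bag of queries.

    0.0 means every query shares one intent; larger values mean the bag
    is scattered over more intents.
    """
    counts = Counter(intent_of[q] for q in queries)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# Hypothetical toy index and relevance judgments.
index = {
    "running shoes": {"d1", "d2"},
    "trail sneakers": {"d2"},
    "rain jacket": {"d3"},
}
relevant = {
    "d1": ["running shoes", "trail sneakers"],
    "d2": ["running shoes"],
    "d3": ["rain jacket", "waterproof coat"],
}
recall = retrievability_recall(relevant, lambda q: index.get(q, set()))

# Hypothetical intent labels for each query.
intent_of = {
    "running shoes": "footwear",
    "trail sneakers": "footwear",
    "rain jacket": "outerwear",
    "phone case": "electronics",
    "usb cable": "electronics",
}
focused = intent_spread(["running shoes", "trail sneakers"], intent_of)
scattered = intent_spread(
    ["running shoes", "rain jacket", "phone case", "usb cable"], intent_of
)
```

In this toy data three of the five relevant (document, query) pairs are retrieved, so `recall` is 0.6. The focused bag has a spread of 0.0 bits and the scattered bag 1.5 bits. Where to set a threshold for flagging a document as suspicious would depend on the corpus, and a high spread is a signal to investigate, not a verdict.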

I expect that these applications only scratch the surface of what we can do with the bag-of-queries model. An important nuance is that queries are not the same as search intents. It is likely that, even if a document maps to a single intent, that intent will be expressed by a variety of different queries.

Regardless, this duality serves as a reminder that, just as queries target results, results effectively target queries. Search is about connecting intents to satisfaction of those intents, and connection is a two-way street.
