In Search of Recall

Daniel Tunkelang
Dec 7, 2020

Search developers tend to focus most of their efforts on the first page of results. As a result, they prioritize investment in ranking models, with the goal of improving quality and business metrics, such as relevance and conversion.

Precision and Recall

In information retrieval terms, this focus on the first page corresponds to an emphasis on precision, the fraction of results that are relevant. To be more precise — no pun intended — it corresponds to an emphasis on position-biased precision measures like discounted cumulative gain (DCG).
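To make the position bias concrete, here is a minimal sketch of DCG. It assumes the common formulation in which the result at rank i contributes its relevance grade divided by log2(i + 1); the function name and the relevance grades are illustrative, not taken from any particular search stack.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain for graded relevances listed in rank order."""
    # Rank 1 is divided by log2(2) = 1, rank 2 by log2(3), and so on,
    # so relevant results count for less the further down they appear.
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

# The same three results, in two different orders (hypothetical grades).
best_first = dcg([3, 1, 0])
best_last = dcg([0, 1, 3])
```

Under this formulation, `best_first` is larger than `best_last`: the measure rewards putting the most relevant result at the top of the page.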

But precision isn’t the only measure of search quality. There’s also recall, which measures the fraction of potentially good results that are retrieved. Recall is about “the whole truth”, while precision is about “nothing but the truth”.
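The two definitions can be sketched in a few lines, assuming we have a set of retrieved document IDs and a set of documents judged relevant. The function name and the document IDs are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and relevant IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: what fraction of what we returned is relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of what is relevant did we return?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5", "d6", "d7"],
)
# precision is 2/4 = 0.5 and recall is 2/5 = 0.4
```

Here half of the returned results are relevant, but three of the five relevant documents were never retrieved at all.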

There’s a tradeoff between precision and recall: efforts to improve one almost always come at the expense of the other. But search developers tend to invest less in recall than in precision, and their investments in recall are often crude. That’s a shame, since recall dramatically affects the search experience.

When Recall Really Matters

There are three search scenarios where recall is especially important:

  • Searches that return no results or only a few results. For these searches, even a small increase in recall can have a critical impact. When the number of results is low, the expected benefit of increasing recall tends to outweigh the expected cost of decreasing precision.
  • Searches for which the best results do not match the keywords. The matching failure may reflect a vocabulary gap, redundant query words, or spelling errors. It may not be obvious from the number of results that there’s a recall problem, but the missing results negatively impact metrics.
  • Searches where searchers override default ranking, e.g. sorting by price. Searcher-specified sorts generally do not promote relevant results — for example, the lowest-price results are unlikely to be relevant. Hence, search needs to manage a precision-recall tradeoff for the entire result set.

Recall affects more than just the search results. It also affects aggregates, like the total number of results and counts for facet values. These aggregates, which are especially useful for broad queries, can be sensitive to the precision-recall tradeoff for the entire result set.
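A toy sketch shows why sorts and aggregates depend on the whole result set rather than just the first page. Everything here is hypothetical: the products, the relevance scores, and the `min_score` cutoff stand in for whatever mechanism a real engine uses to decide what is retrieved.

```python
from collections import Counter

# Hypothetical results for the query "iphone":
# (title, category, price, relevance score).
RESULTS = [
    ("iPhone 12", "phones", 799.0, 0.95),
    ("iPhone SE", "phones", 399.0, 0.90),
    ("iPhone case", "accessories", 9.0, 0.40),
    ("iPhone charging cable", "accessories", 5.0, 0.35),
]

def retrieve(results, min_score):
    """Keep only results whose relevance score is at least min_score."""
    return [r for r in results if r[3] >= min_score]

def sort_by_price(results):
    """Sort ascending by price, ignoring relevance entirely."""
    return sorted(results, key=lambda r: r[2])

def facet_counts(results):
    """Count results per category, as a facet sidebar would."""
    return Counter(r[1] for r in results)

loose = retrieve(RESULTS, min_score=0.0)   # favors recall
strict = retrieve(RESULTS, min_score=0.5)  # favors precision

cheapest_loose = sort_by_price(loose)[0][0]    # "iPhone charging cable"
cheapest_strict = sort_by_price(strict)[0][0]  # "iPhone SE"
facets_loose = facet_counts(loose)    # phones: 2, accessories: 2
facets_strict = facet_counts(strict)  # phones: 2
```

With the loose cutoff, a price sort puts a cable at the top and the facet counts suggest as many accessories as phones. With the strict cutoff, the cheapest result is a phone, but anyone who actually wanted accessories now sees none. Neither setting is right in general, which is the tradeoff the sort and the aggregates both inherit.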

Improving Recall

There are three main ways to improve recall.

  • Query expansion: a reductionist approach that expands words or phrases using a dictionary. This approach is simple, but it does not respect context (e.g., wine glasses -> wine eyeglasses). It works best in conjunction with an approach that ensures precision, such as query categorization. It’s best to keep the dictionary size manageable, as well as to avoid conflating spelling and stemming word variations with semantic synonyms.
  • Query relaxation: instead of expanding words, this approach relaxes them, i.e., makes them optional. The challenge is choosing which words to relax. Relaxation can drastically affect the precision-recall tradeoff, so it’s important to choose words wisely. Relaxation aims to choose the least important words, but simple statistical measures like inverse document frequency (idf) may not recognize importance in context (e.g., desk and chair -> desk chair).
  • Whole-query expansion: in contrast to dictionary-based query expansion, this holistic strategy expands the query to include the results of similar queries. Whole-query expansion maps the query to an intent and then expands the results based on other queries with similar intent. This approach is powerful: unlike dictionary-based expansion, it doesn’t suffer from loss of context. But it requires having a mapping of queries to embeddings, as well as being able to obtain a query’s nearest neighbors.
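The three approaches can be sketched side by side. This is a deliberately simplified illustration, not a production design: the synonym dictionary, document frequencies, and two-dimensional query embeddings are all invented, relaxation uses plain idf (with the context-blindness noted above), and nearest neighbors are found by brute-force cosine similarity rather than a real index.

```python
import math

# Hypothetical synonym dictionary for dictionary-based expansion.
SYNONYMS = {"sofa": ["couch"], "tv": ["television"]}

def expand_query(tokens, synonyms=SYNONYMS):
    """Turn each token into an OR-group of the token and its synonyms."""
    return [[tok] + synonyms.get(tok, []) for tok in tokens]

def relax_query(tokens, doc_freq, num_docs):
    """Make the lowest-idf token optional; return (required, optional)."""
    idf = {tok: math.log(num_docs / (1 + doc_freq.get(tok, 0)))
           for tok in tokens}
    weakest = min(tokens, key=idf.get)
    return [tok for tok in tokens if tok != weakest], [weakest]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similar_queries(query, embeddings, k=1):
    """Return the k other queries whose embeddings are closest to query's."""
    target = embeddings[query]
    others = [q for q in embeddings if q != query]
    others.sort(key=lambda q: cosine(target, embeddings[q]), reverse=True)
    return others[:k]

expanded = expand_query(["leather", "sofa"])
# [["leather"], ["sofa", "couch"]]

required, optional = relax_query(
    ["red", "leather", "sofa"],
    doc_freq={"red": 500, "leather": 200, "sofa": 50},  # made-up counts
    num_docs=1000,
)
# "red" is the most common word, so it becomes optional.

# Made-up embeddings; a real system would learn these from query logs or text.
EMBEDDINGS = {"sofa": [0.9, 0.1], "couch": [0.85, 0.15], "tv stand": [0.1, 0.9]}
neighbors = similar_queries("sofa", EMBEDDINGS, k=1)
# ["couch"]
```

In the whole-query case, the results for the neighboring queries would then be merged into the results for the original query; how to merge and rank them is a separate design decision that this sketch leaves out.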

Summary

Precision is important for the happy path where searchers find great results on the first page using the default ranking. But recall matters too, and not just for queries that would otherwise return few results. Recall is critical for non-default sorts, as well as for computing useful aggregates like facet counts. There are at least three ways to improve recall: dictionary-based query expansion, query relaxation, and whole-query expansion. Search developers should invest in all three of these approaches, and not just in better ranking.
