Not All Recall is Created Equal

Daniel Tunkelang
2 min read · Feb 24, 2025

Search application developers constantly navigate tradeoffs, particularly between precision and recall. Precision measures the fraction of retrieved results that are relevant, while recall measures the fraction of relevant documents that are retrieved. In simple terms, precision ensures “nothing but the truth,” whereas recall strives for “the whole truth.”
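To make the two definitions concrete, here is a minimal sketch, using hypothetical document IDs and a hypothetical helper function rather than anything from a specific search stack.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a single query.

    retrieved: set of document IDs returned by the retrieval stage
    relevant:  set of document IDs judged relevant to the query
    """
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 3 of 4 retrieved results are relevant (precision = 0.75),
# but only 3 of 6 relevant documents were retrieved (recall = 0.5).
p, r = precision_recall({"d1", "d2", "d3", "d7"},
                        {"d1", "d2", "d3", "d4", "d5", "d6"})
```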

However, the standard definition of recall applies to retrieval, not ranking. A common approach prioritizes recall in retrieval while relying on ranking to surface the most relevant results. After all, if a document is excluded during retrieval, ranking cannot bring it back. That said, retrieval cannot ignore precision entirely — especially when searchers can re-sort results, such as by price or popularity.

Even this approach remains too simplistic for real-world applications. Consider e-commerce: failing to retrieve relevant best sellers is far more damaging — to both searchers and the business — than failing to retrieve products that fewer searchers want to buy.

Cumulative gain provides a useful framework to understand this nuance. While retrieval should aim to include all relevant results, some contribute more value than others. What we often care about is not just the fraction of relevant results retrieved, but the fraction of total utility captured. This perspective aligns with ranking metrics like discounted cumulative gain (DCG), which prioritizes surfacing the most desirable results.
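As a rough illustration of that distinction, the sketch below contrasts plain recall with a utility-weighted variant, where each relevant document carries an assumed gain (say, expected purchases). The gain values and function names are hypothetical, not part of any standard metric library.

```python
def recall(retrieved, gains):
    """Fraction of relevant documents retrieved, all weighted equally."""
    relevant = set(gains)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def weighted_recall(retrieved, gains):
    """Fraction of total utility captured: each document contributes its gain."""
    total = sum(gains.values())
    captured = sum(g for doc, g in gains.items() if doc in retrieved)
    return captured / total if total else 0.0

# Hypothetical gains (e.g., expected sales). Missing the best seller "d1"
# hurts utility far more than plain recall suggests.
gains = {"d1": 100, "d2": 5, "d3": 1, "d4": 1}
retrieved = {"d2", "d3", "d4"}
print(recall(retrieved, gains))           # 0.75
print(weighted_recall(retrieved, gains))  # 7 / 107, roughly 0.065
```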

The interplay between recall and ranking underscores the importance of evaluating recall-oriented retrieval changes within the broader search experience. If searchers never see the additional results — or if their experience doesn’t improve — the extra computational effort is wasted.

Measuring recall in absolute terms is notoriously difficult, as it requires exhaustive labeling of relevant documents. A more practical approach is to analyze how retrieval changes impact the results searchers actually see, particularly on the first page. Query log replays can help measure these shifts offline. While such analysis doesn’t determine whether changes are beneficial, it does establish an upper bound on their potential impact.
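One way to operationalize a replay, sketched here under assumed application-specific retrieval and ranking callables (retrieve_old, retrieve_new, rank are placeholders, not any particular library's API), is to run logged queries through both retrieval variants, rank each candidate set, and measure how much the first page actually changes.

```python
def first_page_churn(queries, retrieve_old, retrieve_new, rank, page_size=10):
    """Replay logged queries and measure the average fraction of first-page
    results that change when the retrieval stage changes."""
    churn = []
    for q in queries:
        old_page = rank(q, retrieve_old(q))[:page_size]
        new_page = rank(q, retrieve_new(q))[:page_size]
        changed = len(set(new_page) - set(old_page)) / page_size
        churn.append(changed)
    return sum(churn) / len(churn) if churn else 0.0
```

A churn of zero means searchers would never see the additional results, so the retrieval change cannot help; a high churn only bounds the potential impact, which still needs relevance judgments or an online test to confirm.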

This approach is inherently application-specific, as it ties recall to the ranking model. If ranking is weak, improved retrieval may offer little benefit — and may even degrade performance. Fortunately, ranking can be evaluated independently by retrieving the full corpus — or approximated by using highly recall-biased retrieval techniques.

Ultimately, we should not improve recall for its own sake, nor should we assume all recall is equally valuable. Every relevant result matters, but some matter more than others. We should not judge retrieval in isolation; we need to measure its contribution to the overall search experience.
