Putting Search Ranking in Perspective

Search is different.

A search engine elicits the searcher’s explicit intent, expressed as keywords, and this explicit intent is, by far, its most valuable input. Searchers, quite understandably, expect results that are relevant to their expressed intent. Ranking is still valuable, but it plays a smaller role in search than in other applications.

Foremost, ranking should respect query understanding.

Before a search engine retrieves and ranks results, query understanding maps the query to a representation of searcher intent. Good ranking depends on robust query understanding. Investing in sophisticated ranking models is premature if your search engine cannot understand the searcher’s query.
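To make "a representation of searcher intent" concrete, here is a minimal sketch of query understanding, assuming a toy dictionary-based approach that maps a keyword query to a category, a brand, and leftover keywords. The tables `CATEGORY_TERMS` and `BRAND_TERMS`, the `QueryIntent` type, and `understand_query` are hypothetical names for illustration; real systems typically learn these mappings from data.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical lookup tables for illustration only; a production system
# would typically learn these mappings from behavioral and catalog data.
CATEGORY_TERMS = {"sneakers": "shoes", "boots": "shoes", "laptop": "computers"}
BRAND_TERMS = {"nike", "adidas", "lenovo"}


@dataclass
class QueryIntent:
    """A structured representation of what the searcher is asking for."""
    category: Optional[str] = None
    brand: Optional[str] = None
    keywords: List[str] = field(default_factory=list)


def understand_query(query: str) -> QueryIntent:
    """Map a raw keyword query to a QueryIntent using dictionary lookups."""
    intent = QueryIntent()
    for token in query.lower().split():
        if intent.category is None and token in CATEGORY_TERMS:
            intent.category = CATEGORY_TERMS[token]
        elif intent.brand is None and token in BRAND_TERMS:
            intent.brand = token
        else:
            # Anything not recognized stays as a free-text keyword.
            intent.keywords.append(token)
    return intent


print(understand_query("Nike running sneakers"))
# QueryIntent(category='shoes', brand='nike', keywords=['running'])
```

Retrieval and ranking can then work from the structured intent (for example, restricting results to the `shoes` category) rather than from raw keywords alone.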

Ranking should focus primarily on query-independent signals.

Once query understanding establishes a robust representation of searcher intent, a relevance model should ensure that the retrieved results are all relevant. A relevance model is essentially a binary classifier. Ranking is not a substitute for a relevance model, as becomes evident when searchers override the default ranking, e.g., to sort by price: any irrelevant results that the ranking had pushed to the bottom can then surface at the top.
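A minimal sketch of this separation, assuming each result already carries a probability from some relevance classifier and a query-independent desirability score. The `Result` fields, the 0.5 threshold, and the function names are illustrative assumptions. Relevance acts as a filter that runs regardless of ordering, so a searcher who sorts by price still sees only relevant results.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Assumed cutoff on the relevance model's probability; in practice it would be
# tuned on human-labeled relevance judgments.
RELEVANCE_THRESHOLD = 0.5


@dataclass
class Result:
    title: str
    price: float
    relevance_prob: float  # output of a (hypothetical) binary relevance classifier
    desirability: float    # query-independent ranking score


def filter_relevant(results: List[Result]) -> List[Result]:
    """Binary decision: keep only results the relevance model accepts."""
    return [r for r in results if r.relevance_prob >= RELEVANCE_THRESHOLD]


def order_results(
    results: List[Result],
    sort_key: Optional[Callable[[Result], float]] = None,
) -> List[Result]:
    """Filter for relevance first, then apply either the default ranking
    or a searcher-chosen sort. The filter runs in both cases."""
    relevant = filter_relevant(results)
    if sort_key is None:
        return sorted(relevant, key=lambda r: r.desirability, reverse=True)
    return sorted(relevant, key=sort_key)


items = [
    Result("running shoe", 80.0, 0.9, 0.7),
    Result("shoe polish", 5.0, 0.2, 0.1),  # cheap and irrelevant
    Result("trail shoe", 60.0, 0.8, 0.4),
]
print([r.title for r in order_results(items)])
print([r.title for r in order_results(items, sort_key=lambda r: r.price)])
```

Here the default ranking happens to bury "shoe polish" anyway, but without the relevance filter a sort by price would put it first. That is why ranking alone cannot stand in for relevance.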

There are two reasons to use query-dependent signals for ranking:

  • Prototypicality. For example, a query can be associated with a category or price distribution of results. There can be query-dependent prototypicality signals reflecting how a result fits into these distributions.
  • Non-binary relevance. There may be gradations of relevance — particularly when a search engine uses query relaxation to increase recall. The relevance model score can serve as a query-dependent ranking signal.
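As a sketch of the first kind of signal, assume each query has a sample of prices from results associated with it (for example, results searchers engaged with). One plausible prototypicality feature, not necessarily the one the author has in mind, scores how close a result's price is to the query's typical price on a log scale. `price_prototypicality` and the Gaussian-kernel choice are illustrative assumptions.

```python
import math
from statistics import mean, stdev
from typing import List


def price_prototypicality(price: float, query_prices: List[float]) -> float:
    """Score in [0, 1] for how typical `price` is for a query.

    `query_prices` is assumed to be a sample of at least two prices for
    results associated with the query. Prices are compared on a log scale,
    and the score is a Gaussian kernel on the z-score: 1.0 at the typical
    price, decaying toward 0 far from it.
    """
    logs = [math.log(p) for p in query_prices]
    mu, sigma = mean(logs), stdev(logs)
    if sigma == 0:
        # Degenerate distribution: only the exact typical price is prototypical.
        return 1.0 if math.isclose(math.log(price), mu) else 0.0
    z = (math.log(price) - mu) / sigma
    return math.exp(-0.5 * z * z)


sample = [20.0, 35.0, 50.0, 80.0, 120.0]
for p in (50.0, 500.0):
    print(p, round(price_prototypicality(p, sample), 3))
```

The log scale reflects an assumption that prices are roughly log-normally distributed, so for a query whose log-mean price is $50, a $500 result is as atypical as a $5 one. The relevance model's score from the second bullet could sit alongside this as another query-dependent feature.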

A ranking model can only learn from signals that searchers can see or infer.

Ranking can draw on many sources for signals. Some content signals, such as result title or structured data fields (like brand or price), are clearly visible on the search results page. There are also signals that a searcher can often infer from the search results page, such as category or image quality. Finally, there are feedback signals, such as average rating and number of reviews, that are usually displayed on the search results page. All of these signals can influence search behavior, and thus they are fair game for training a ranking model.
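One way to apply this principle when assembling training data is to tag each candidate feature with how searchers encounter it, then exclude anything they never see. The sketch below assumes a hand-maintained registry; `Visibility`, `FEATURES`, and `trainable_features` are hypothetical names, and `internal_margin` is a made-up stand-in for any signal that is never surfaced to searchers.

```python
from enum import Enum
from typing import Dict, List


class Visibility(Enum):
    SHOWN = "shown"          # displayed on the results page
    INFERABLE = "inferable"  # searchers can infer it from the results page
    HIDDEN = "hidden"        # never surfaced to searchers


# Hypothetical feature registry mirroring the examples in the text, plus one
# made-up internal signal that searchers never see.
FEATURES: Dict[str, Visibility] = {
    "title": Visibility.SHOWN,
    "brand": Visibility.SHOWN,
    "price": Visibility.SHOWN,
    "category": Visibility.INFERABLE,
    "image_quality": Visibility.INFERABLE,
    "avg_rating": Visibility.SHOWN,
    "num_reviews": Visibility.SHOWN,
    "internal_margin": Visibility.HIDDEN,
}


def trainable_features(registry: Dict[str, Visibility]) -> List[str]:
    """Return the features that can plausibly influence searcher behavior."""
    return sorted(
        name for name, vis in registry.items() if vis is not Visibility.HIDDEN
    )


print(trainable_features(FEATURES))
```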

A/B-testing is the gold standard, but offline evaluation is possible.

The only way to know that a ranking change is an improvement is to run an A/B test on enough traffic to obtain a statistically significant outcome. For a change that only affects a small fraction of queries, you should scope the A/B test — and the analysis — to the affected search sessions.
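A minimal sketch of the scoped analysis, assuming the outcome metric is per-session conversion and that the logs record each session's arm, whether the change affected it, and whether it converted. The field names and functions are hypothetical. It uses a standard two-proportion z-test; other metrics (e.g., revenue per session) would call for different statistics.

```python
import math
from typing import Iterable, Mapping, Tuple


def scoped_counts(sessions: Iterable[Mapping], arm: str) -> Tuple[int, int]:
    """Count (conversions, sessions) for one arm, restricted to sessions the
    ranking change affects. The `affected` flag must be computed the same way
    in both arms (e.g., by running the new ranker in shadow mode for control)."""
    scoped = [s for s in sessions if s["arm"] == arm and s["affected"]]
    return sum(1 for s in scoped if s["converted"]), len(scoped)


def two_proportion_z_test(
    conv_a: int, n_a: int, conv_b: int, n_b: int
) -> Tuple[float, float]:
    """Two-sided z-test for a difference in conversion rates; returns (z, p)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both arms converted at 0% or both at 100%: no detectable difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


z, p = two_proportion_z_test(200, 1000, 260, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Scoping matters because unaffected sessions have the same expected outcome in both arms: including them shrinks the measured difference while adding variance, which makes a real effect harder to detect.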

Summary: keep ranking in perspective, and rank wisely.

Ranking matters for search, but it is no substitute for query understanding and a robust relevance model. Ranking should focus first on query-independent signals for desirability and second on user-dependent signals for personalization. Only then should it consider query-dependent signals for prototypicality or non-binary relevance. A ranking model can only learn from signals that searchers can see or infer. Finally, while online A/B-testing is the gold standard, offline evaluation using log replay is useful as a sanity check.
