Query understanding and relevance are key aspects of search, but they don’t tell the whole story. A holistic framework for search calls for a broader perspective. Search is about computers helping humans help themselves, a framework also known as human-computer information retrieval (HCIR).
In order to understand search as HCIR, we need to consider it from both the human and computational perspectives.
The Human Perspective
Search as a term is overloaded: it describes both the desire to find something and the process for doing so. Understanding search from a human perspective requires examination of both of these aspects.
Why People Search
People use search engines to satisfy an information-seeking intent. There are two main classes of information-seeking intent: known-item search and exploratory search.
Known-item search is exactly what it sounds like: the searcher knows the item that he or she is looking for. Examples of known-item search include looking for a specific document, product, or person. The defining characteristic of a known-item search is that the searcher knows the target of the search and will recognize it when he or she sees it.
Exploratory search is the opposite of known-item search: the searcher doesn’t have a particular item in mind, but instead wants to find items that satisfy one or more criteria. Exploratory search often involves discovery: by searching, the searcher learns about the landscape of items that satisfy the initial criteria and then refines the criteria based on that understanding. Examples of exploratory search include finding documents about a particular topic, products of a particular type, and people with specified attributes.
How People Search
Conventionally, people search by typing words into a search box. But this description is neither specific nor complete. People’s search queries tend to fall into a handful of classes, depending on why they are searching. In addition, people use search interface features like autocomplete and faceted refinement when search engines provide them.
To perform known-item search, people tend to search for the name or title of the desired item. Autocomplete can be particularly useful for these searches, since it helps searchers type in long titles correctly without too much effort. Some known-item searches specify the desired item indirectly, e.g., a search for “google ceo” instead of “sundar pichai”.
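The prefix-matching behavior behind autocomplete can be sketched with a sorted list of titles and a binary search. This is a minimal illustration, not a production design (real systems use tries or dedicated suggesters, and the titles here are made-up examples):

```python
import bisect

# A toy autocomplete index: completions come from a sorted list of known
# titles, and a prefix lookup returns the first few matches.
TITLES = sorted([
    "google ceo",
    "google cloud",
    "google maps",
    "sundar pichai",
])

def autocomplete(prefix, limit=3):
    """Return up to `limit` titles that start with `prefix`."""
    start = bisect.bisect_left(TITLES, prefix)  # first title >= prefix
    matches = []
    for title in TITLES[start:]:
        if not title.startswith(prefix):
            break  # sorted order: no later title can match
        matches.append(title)
        if len(matches) == limit:
            break
    return matches
```

For example, `autocomplete("sun")` returns `["sundar pichai"]`, sparing the searcher from typing the full name.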
The ideal search result for a known-item search is the desired item. When a known-item search returns more than one result, the searcher scans down the page, reformulates the query, or uses faceted refinement (or some other filtering mechanism) to narrow the query and filter out irrelevant results.
Exploratory search needs are more complex and varied than known-item search, and exploratory search behavior reflects this complexity. Exploratory searchers tend to search for categories, or for categories combined with aspects. For example, a person might search for “headphones”, “sony headphones”, or “wireless headphones”. Autocomplete can be useful for many of these queries, since they often come from a small set of head (frequent) queries.
Unlike known-item searchers, exploratory searchers are interested in more than just a single target result. The searcher is likely to scan deeply into the ranked result set, often going beyond the first page of results. Exploratory searchers are also likely to use faceted refinements, not only to narrow down their results, but also to discover the landscape of the results. For example, a person searching for headphones may use the refinements to discover the variety of styles, prices, brands, etc. Exploratory searchers are also likely to override the default relevance sort (e.g., sorting by price), if the search engine allows them to do so.
The Computational Perspective
Search engines have two main tasks: figure out what searchers want, and return results that are relevant to their intent. In addition, search engines can provide opportunities for searchers to clarify or refine their queries.
How Search Engines Understand
Understanding a search query starts with character filtering (which includes converting the query to lowercase) and tokenization to represent the query as a sequence of words. The words may then be stemmed or lemmatized in order to ignore inflection, particularly singular vs plural forms of nouns.
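The pipeline of character filtering, tokenization, and stemming can be sketched in a few lines. The stemmer below is deliberately naive (it just strips a trailing “s” to conflate singular and plural nouns); real engines use algorithms like Porter stemming or full lemmatization:

```python
import re

def normalize(query):
    """Character filtering: lowercase and replace punctuation with spaces."""
    return re.sub(r"[^a-z0-9\s]", " ", query.lower())

def tokenize(query):
    """Split the filtered query into a sequence of word tokens."""
    return normalize(query).split()

def stem(token):
    """Naive stemmer: strip a trailing 's' to ignore singular vs. plural.
    Stands in for Porter stemming or lemmatization."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def analyze(query):
    """Full query-analysis pipeline: filter, tokenize, stem."""
    return [stem(t) for t in tokenize(query)]
```

For example, `analyze("Sony Headphones!")` yields `["sony", "headphone"]`, so the query matches documents regardless of case, punctuation, or plural form.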
The search engine then rewrites the query to improve precision and recall. It improves precision by segmenting the query into semantic units, recognizing entities, and scoping the query to match each segment to the appropriate attribute. It improves recall through query expansion (particularly synonyms) and query relaxation (making some of the query words optional).
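Query expansion via synonyms might look like the following sketch. The synonym table is hypothetical; a production system would mine equivalences from query logs or a knowledge base:

```python
# Hypothetical synonym table (illustrative, not real mined data).
SYNONYMS = {
    "headphones": ["headphones", "earphones"],
    "wireless": ["wireless", "bluetooth"],
}

def expand(tokens):
    """Query expansion: map each term to the set of terms it may match,
    improving recall at some cost to precision."""
    return [SYNONYMS.get(t, [t]) for t in tokens]
```

For example, `expand(["wireless", "headphones"])` produces one alternative list per query term, ready to be assembled into the AND-of-ORs Boolean expression described below.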
In addition to these reductionist steps that split the query into components and determine what those components mean, search engines may take a holistic approach to query understanding. Holistic query understanding looks at the query as a whole, aiming to be broad rather than deep. It can include language identification, query categorization (e.g., into a category taxonomy), and establishing high-level intent (e.g., known-item search vs. exploratory search). The search engine may also convert the query into a semantic vector representation using word embeddings.
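A common way to build the semantic vector is to average the embeddings of the query’s words. The two-dimensional vectors below are invented for illustration; real systems use learned embeddings with hundreds of dimensions:

```python
# Toy 2-dimensional word embeddings (made up for illustration).
EMBEDDINGS = {
    "wireless":   [0.9, 0.1],
    "headphones": [0.2, 0.8],
}

def query_vector(tokens):
    """Represent a query as the average of its word embeddings,
    skipping out-of-vocabulary terms."""
    dims = len(next(iter(EMBEDDINGS.values())))
    total = [0.0] * dims
    count = 0
    for t in tokens:
        if t in EMBEDDINGS:
            for i, v in enumerate(EMBEDDINGS[t]):
                total[i] += v
            count += 1
    return [v / count for v in total] if count else total
```

The resulting vector lets the engine compare queries and documents by similarity rather than by exact term overlap.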
How Search Engines Respond
Understanding the query is only part of a search engine’s job. The search engine uses that understanding to retrieve, rank, and present results.
Retrieval typically uses an inverted index, which maps each term to a posting list of the documents that contain it, to identify the documents in the collection that contain every query term. It performs a set intersection on the posting lists corresponding to all of the query terms. Indexing and retrieval are a bit more complicated if the search engine supports operators, such as quotes to denote phrases or negation to exclude results containing particular words.
Query rewriting may also transform the query into a Boolean expression of the query terms: typically an AND of ORs, where each expanded query term becomes an OR clause, and all the OR clauses are connected by an AND. This expression is used to retrieve matching documents from the inverted index.
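Inverted-index retrieval with an AND-of-ORs expression can be sketched with Python sets standing in for posting lists. The documents are made-up examples, and real engines use compressed posting lists rather than in-memory sets:

```python
from collections import defaultdict

def build_index(docs):
    """Build an inverted index: each term maps to the set of ids of the
    documents containing it (the term's posting list)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index

def retrieve(index, and_of_ors):
    """Retrieve documents matching an AND of ORs: each inner list is an
    OR clause (e.g., a term plus its synonyms); a document must match at
    least one term from every clause."""
    result = None
    for clause in and_of_ors:
        clause_docs = set()
        for term in clause:
            clause_docs |= index.get(term, set())   # union within a clause
        result = clause_docs if result is None else result & clause_docs
    return result or set()

# Toy collection for illustration.
DOCS = {
    1: "sony wireless headphones",
    2: "bluetooth earphones",
    3: "sony camera",
}
INDEX = build_index(DOCS)
```

For the expanded query above, `retrieve(INDEX, [["wireless", "bluetooth"], ["headphones", "earphones"]])` matches both documents 1 and 2, which the unexpanded conjunctive query would not.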
The search engine then scores the matching documents using a function intended to reflect their relevance. A typical scoring function combines query-dependent and query-independent factors. Common query-dependent factors include which document field the query terms match (e.g., title vs. description) and whether two or more consecutive query terms occur in a matching document as a phrase, or at least near each other. Common query-independent factors include popularity / frequency of access and recency. The scoring function may be a hand-tuned composition of these factors, or it may be a machine-learned model in which these factors serve as features.
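A hand-tuned composition of such factors might look like the sketch below. The field weights and the log-scaled popularity boost are illustrative assumptions, not values from any real engine:

```python
import math

def score(doc, query_terms):
    """Toy scoring function combining query-dependent factors (title
    vs. description matches) with a query-independent popularity boost.
    Weights are illustrative."""
    title_terms = set(doc["title"].split())
    body_terms = set(doc["description"].split())
    s = 0.0
    for term in query_terms:
        if term in title_terms:
            s += 2.0   # title matches count more...
        elif term in body_terms:
            s += 1.0   # ...than description matches
    # Query-independent factor: log-scaled popularity.
    s += math.log1p(doc["popularity"])
    return s
```

In a machine-learned ranker, the same field-match and popularity signals would serve as input features rather than hand-weighted terms.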
The search engine then returns the results in descending order of score, typically as a paginated result set. Since most users don’t look past the first page of results, that is the primary focus of most work on search relevance. Sometimes the search engine overrides the scoring function to ensure some degree of diversification, which could be as simple as removing duplicates or as complicated as considering the diversity of the search results as part of a whole-page relevance score that it optimizes.
In addition to computing a ranked list of results, the search engine computes search result snippets, also known as search result summaries, to present with each result. The search engine can also compute faceted refinement options, which are particularly useful for exploratory searches.
How the search engine presents results matters for two reasons. The first is that an effective presentation allows searchers to quickly determine whether a particular search result is relevant to their needs. The second is that the presentation of results should reflect the nature of the query and, when the search engine can infer it, the underlying information-seeking task.
The search engine has to do more than retrieve the right results and rank them by relevance. It needs to present the results in a way that communicates information to the searcher effectively and efficiently.
Having considered search from both human and computational perspectives, we now turn to the question of how to evaluate search — that is, determining how well a search engine helps people address information-seeking needs.
Precision and Recall
Searchers expect search engines to tell the truth, the whole truth and nothing but the truth. That is, search engines are supposed to return all the results that are relevant, and only those results.
Two quantities measure how well a search engine meets this expectation. Precision is the fraction of returned results that are relevant, while recall is the fraction of relevant results that are returned.
A recall of 100% means the returned results include all of the relevant results (the whole truth), while a precision of 100% means the results include only relevant results (nothing but the truth).
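These two definitions translate directly into code. A minimal sketch, treating the returned and relevant results as sets of document ids:

```python
def precision(returned, relevant):
    """Fraction of returned results that are relevant."""
    returned, relevant = set(returned), set(relevant)
    return len(returned & relevant) / len(returned) if returned else 0.0

def recall(returned, relevant):
    """Fraction of relevant results that are returned."""
    returned, relevant = set(returned), set(relevant)
    return len(returned & relevant) / len(relevant) if relevant else 0.0
```

For example, if the engine returns results {1, 2, 3, 4} and the relevant results are {1, 2, 6}, precision is 2/4 and recall is 2/3.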
In practice, neither precision nor recall is equal to 100%, let alone both. Tuning a search engine is usually a trade-off: increasing precision decreases recall, and vice versa.
Moreover, since search engines return results as ranked lists, result position matters, particularly for precision. There are two simple ways to incorporate position into the evaluation metric. The first is to measure precision only for the top-ranked (e.g., the first 10) results. The second is to discount the importance of a result’s precision based on its position. The discount factor is typically logarithmic in the position.
Precision and recall are useful measures, but computing them requires knowing whether a result is relevant. How do we obtain this knowledge? The short answer: from people. The longer answer: using either explicit or implicit relevance judgments.
Explicit Relevance Judgments
Explicit relevance judgments take a direct approach: each task involves a human evaluator looking at a search query and a search result, and then deciding whether the result is relevant to the query. The judgment can be binary (i.e., relevant vs. not relevant), or it can allow for varying gradations of relevance (e.g., a scale from 1 to 4).
Collecting explicit judgments for a representative sample of query-result pairs makes it possible to estimate precision-based measures. For position-based precision measures, the sample is skewed towards top-ranked results.
Implicit Relevance Judgments
Since searchers use search engines to satisfy information-seeking needs, an alternative way to collect relevance judgments is to treat searcher behavior as a collection of implicit relevance judgments.
For example, clicks on results are a form of implicit relevance judgments, since searchers generally click on results they find relevant. Actions on those results, such as searchers buying products, serve as even stronger implicit relevance judgments. The dwell time on a result that follows a click can serve as an intermediate signal between clicks and actions.
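Aggregating a click log into per-query, per-result click-through rates is a simple way to turn this behavior into an implicit relevance signal. The log format below is a hypothetical simplification; a production pipeline would also weight dwell time and downstream actions:

```python
from collections import defaultdict

def click_through_rates(log):
    """Aggregate a click log into per-(query, result) click-through
    rates. `log` is a list of (query, result, clicked) tuples."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for query, result, was_clicked in log:
        shown[(query, result)] += 1
        if was_clicked:
            clicked[(query, result)] += 1
    return {key: clicked[key] / n for key, n in shown.items()}

# Toy click log for illustration.
LOG = [
    ("headphones", "doc1", True),
    ("headphones", "doc1", False),
    ("headphones", "doc2", False),
]
CTR = click_through_rates(LOG)
```

Here doc1’s click-through rate of 0.5 for the query “headphones” suggests it is more relevant than doc2, though, as discussed below, clicks are a noisy proxy for relevance.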
Trade-Offs Between Explicit and Implicit Relevance Judgments
Explicit relevance judgments have the benefit of being explicit. The human evaluator unambiguously provides a judgment as to whether he or she believes that a particular result is relevant for a particular query. When the question of relevance is objective and does not require specialized domain knowledge or personal context, explicit relevance judgments are very useful.
Explicit relevance judgments have two main drawbacks. The first drawback is that they aren’t free: judgments cost money to pay the evaluators and time for them to perform the evaluation. The second drawback is that an evaluator may not be able to infer the searcher’s intent from the query. Sometimes relevance requires knowledge or context beyond the query, and there’s only so much that an evaluator can guess.
In contrast, implicit relevance judgments are effectively free, since they come from the search engine’s data exhaust. However, they have the drawback of being implicit and thus not entirely reliable. Just because the searcher clicks on a result, there’s no guarantee that the result was relevant to the query. The searcher may simply have been curious, or unable to determine the relevance of the result without clicking through and looking at it. Moreover, the lack of a click or action may not mean that the searcher finds the result irrelevant.
The best practice for evaluation is to recognize these trade-offs and evaluate search using a combination of explicit and implicit relevance judgments.
Search is about computers helping humans help themselves. Searchers come to search engines with information-seeking needs; search engines have to figure out those needs and return relevant results. Through a combination of explicit and implicit relevance judgments, we measure the effectiveness of the search process and bring the whole story of search together.