If your business depends on search, it’s critical to track search effectiveness using an evaluation methodology. In my experience, most organizations rely on implicit relevance judgments derived from searcher behavior.
For ecommerce search, behavioral metrics usually come from two signals of search success: clicks and conversions (purchases). Many other domains use analogous signals. For example, in job search, there are clicks and job applications. This pattern is almost universal.
Almost, but not quite.
An exception that I recently encountered in my consulting work is a site where most search results are long documents. Searchers tend to make short queries (1 to 3 words), click on results based on titles or snippets, and then skim or search through the results in the hopes of satisfying their information needs.
While clicks still serve as a relevance signal, they are a weak one, since titles and snippets are usually too general for the searcher to establish relevance with confidence. Unfortunately, there’s no reliable conversion action to provide a stronger signal. Searchers can download documents, but they rarely do so. As a result, it’s difficult to establish from behavioral data whether a searcher found the information they were looking for.
The dwell time on a clicked result can serve as an intermediate signal between clicks and conversions. In this case, the dwell time roughly corresponds to the amount of time that a searcher spends reading a clicked document.
Unfortunately, dwell time is an ambiguous signal. A very short dwell time (e.g., less than 10 seconds) generally indicates that the searcher quickly dismissed the document as irrelevant. But it’s not clear how to interpret longer dwell times — especially for very long documents. Reading a document from start to finish could take 10 minutes or even an hour, but searchers rarely spend more than a few minutes reading a document online. It’s possible that they stop when they find what they’re looking for — but also possible that they stop when they give up looking.
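To make the ambiguity concrete, here’s a toy heuristic for bucketing dwell times. The thresholds and labels are my own illustrative assumptions, not empirically validated cut-offs — the point is that the longest bucket can’t distinguish success from abandonment:

```python
def classify_dwell(seconds):
    """Heuristically bucket dwell time on a clicked result.

    Thresholds are illustrative assumptions, not measured values.
    """
    if seconds < 10:
        return "bounce"      # likely dismissed the document as irrelevant
    if seconds < 180:
        return "engaged"     # read at least part of the document
    return "ambiguous"       # found the answer -- or gave up looking?
```

Note that the "ambiguous" bucket is a dead end on its own: resolving it requires the additional signals discussed below.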
There is no perfect substitute for a strong signal like conversion. But there are some strategies you can use to shed some light on search success.
One strategy is to build a richer representation of session behavior. For example, a searcher who clicks on other results from the same search, or who repeats the query, probably hasn’t learned much from the first clicked result. But a searcher who proceeds with a different query is likely either moving on to a different task or has at least learned something useful from the clicked result. Conversely, a searcher who “pogo-sticked” between the search results page and several documents may have finally been satisfied with the last one.
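A minimal sketch of this kind of session labeling, assuming a simple event log of queries and clicks. The labels and decision rules are illustrative assumptions, not a standard taxonomy:

```python
def interpret_session(events):
    """Heuristically label a search session from an ordered event list.

    events: list of ("query", text) and ("click", url) tuples, in time order.
    The labels and rules below are illustrative assumptions.
    """
    queries = [e[1] for e in events if e[0] == "query"]
    clicks = [e for e in events if e[0] == "click"]
    if len(queries) >= 2 and queries[-1] != queries[0]:
        return "reformulated"    # moved on, or learned enough to refine
    if len(queries) >= 2:
        return "repeated-query"  # first click probably didn't satisfy
    if len(clicks) > 1:
        return "pogo-sticking"   # possibly satisfied by the last click
    return "single-click"
```

In practice you’d want finer-grained rules (e.g., query similarity rather than exact match), but even this coarse labeling is more informative than clicks alone.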
Another strategy is to instrument the browser client to detect actions like scrolling and in-page search. Ideally, this instrumentation makes it possible to distinguish between time spent looking for content and time spent actually consuming it. In general, more detailed instrumentation of searcher behavior on the result page provides a richer picture than dwell time alone.
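On the analysis side, one crude way to use such events is to treat the time up to the last scroll or in-page-find event as “seeking” and the remainder of the dwell as “reading.” This is purely illustrative — the event kinds and the rule itself are assumptions about what the instrumentation would capture:

```python
def seeking_vs_reading(events, dwell):
    """Split dwell time into (seeking, reading) seconds.

    events: time-sorted (timestamp_seconds, kind) tuples from hypothetical
    client instrumentation, where kind is e.g. "scroll" or "find".
    The rule -- everything before the last scroll/find is seeking -- is
    a deliberately naive assumption.
    """
    seek_end = 0.0
    for t, kind in events:
        if kind in ("find", "scroll"):
            seek_end = t
    return seek_end, max(dwell - seek_end, 0.0)
```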
Finally, if you have sufficient traffic and patience — and if you have a way to identify users or at least track them across visits — then you can try to observe retention as a function of relevance. It’s reasonable to assume that users will search more frequently if they obtain better relevance. Unfortunately, retention metrics, such as the number of days between visits, are noisy and insensitive. Part of the problem is that it’s hard to change users’ initial impressions of search engine performance. But if you’re making changes to relevance, you can at least run A/B tests on new users and try to observe differences in retention.
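The bookkeeping for a days-between-visits metric is straightforward; the hard part, as noted, is its noisiness. A sketch, assuming visit dates are already grouped by experiment variant and user:

```python
from statistics import mean

def mean_return_gap(visits_by_variant):
    """Mean days between consecutive visits, per A/B variant.

    visits_by_variant: {variant: {user_id: [visit_day, ...]}}, where
    visit_day is an integer day number. The data layout is assumed.
    """
    out = {}
    for variant, users in visits_by_variant.items():
        gaps = []
        for days in users.values():
            days = sorted(days)
            gaps += [b - a for a, b in zip(days, days[1:])]
        out[variant] = mean(gaps) if gaps else None
    return out
```

With real traffic you’d also want confidence intervals and a guard against single-visit users dominating the sample (they contribute no gaps at all here, which is itself a form of survivorship bias).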
None of these strategies is as good as a robust conversion metric. You could put a button on the page for searchers to indicate whether or not the result was relevant, but it’s unlikely that searchers will invest much effort in providing feedback if doing so doesn’t offer them any value in return.
Alternatively, it may be more effective to invest effort into soliciting more specific queries (especially through autocomplete) and into improving snippets — in the hopes of strengthening clicks as a relevance signal. Also, if you have long documents and users are only searching for a small part of them, you might make everyone happier by breaking the documents up into smaller subdocuments and indexing each subdocument separately.
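The subdocument idea can be as simple as splitting on paragraph boundaries up to a size limit before indexing. A minimal sketch — the size limit is an arbitrary illustrative choice, and a real system would also carry over titles and section context:

```python
def split_into_subdocs(text, max_chars=2000):
    """Split a long document into paragraph-aligned subdocuments
    for separate indexing. max_chars is an illustrative limit.
    """
    paras = [p.strip() for p in text.split("\n\n") if p.strip()]
    subdocs, current = [], ""
    for p in paras:
        # Start a new subdocument when adding this paragraph would
        # exceed the size limit.
        if current and len(current) + len(p) + 2 > max_chars:
            subdocs.append(current)
            current = p
        else:
            current = (current + "\n\n" + p) if current else p
    if current:
        subdocs.append(current)
    return subdocs
```

Indexing each subdocument separately lets short queries land on the specific passage a searcher wants — which also makes clicks and dwell time easier to interpret.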
Sometimes, when you can’t come up with a great solution, the better strategy is to change the problem.