
Aug 28, 2020

Thanks, and sorry that slide was so cryptically brief. What I mean is that spelling correction shouldn't restrict itself to correcting each token separately. Many spelling mistakes cross token boundaries because the user either omitted a space or included one that wasn't necessary (for example, "bluetooth headphones" typed as "blue tooth headphones"). In addition, queries containing punctuation and special characters may not be tokenized the same way the indexed text was. But regardless of how tokenization works, it should be possible, and practical, to correct a misspelled head query by treating the whole query as a single string, rather than first tokenizing it and correcting one token at a time.
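Here is a minimal sketch of treating the query as a single string. It makes two assumptions that are not in the original. First, the head queries are available as a small in-memory list, the hypothetical `HEAD_QUERIES`. Second, plain Levenshtein distance over the whole normalized string stands in for whatever error model a production speller would actually use. Spaces and punctuation are ordinary characters here, so a missing space, an extra space, or a space typed in place of a hyphen each cost a single edit, and no tokenizer is involved.

```python
from typing import Iterable, Optional

# Hypothetical head queries, for illustration only.
HEAD_QUERIES = ["bluetooth headphones", "iphone case", "t-shirt"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character insertions,
    deletions, and substitutions that turn a into b. Spaces are ordinary
    characters, so a missing or extra space costs exactly 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def normalize(query: str) -> str:
    """Lowercase and collapse runs of whitespace. Deliberately no tokenization."""
    return " ".join(query.lower().split())


def correct_whole_query(query: str, head_queries: Iterable[str],
                        max_distance: int = 2) -> Optional[str]:
    """Return the head query whose whole string is closest to the whole query,
    or None if none is within max_distance. Ties go to the earlier head query."""
    q = normalize(query)
    best, best_dist = None, max_distance + 1
    for candidate in head_queries:
        d = edit_distance(q, normalize(candidate))
        if d < best_dist:
            best, best_dist = candidate, d
    return best


if __name__ == "__main__":
    print(correct_whole_query("blue tooth headphones", HEAD_QUERIES))  # bluetooth headphones
    print(correct_whole_query("iphonecase", HEAD_QUERIES))             # iphone case
    print(correct_whole_query("t shirt", HEAD_QUERIES))                # t-shirt
    print(correct_whole_query("garden hose", HEAD_QUERIES))            # None
```

A per-token corrector would leave "blue tooth headphones" unchanged, because "blue" and "tooth" are both valid words. The whole-string comparison sees it as one deleted space away from a known head query. The linear scan over every head query is only reasonable because the set of head queries is relatively small. A larger candidate set would likely call for an index, such as a BK-tree or an n-gram lookup, in place of the brute-force loop.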

Written by Daniel Tunkelang
