Thanks, Dmitry!
I've learned that annotators do best with simple, objective questions. In my experience, asking an annotator to assess whether a result is relevant to a query works reasonably well -- as long as the judgment does not depend on research or highly specialized knowledge. In contrast, assessing the similarity between results -- or between queries -- is neither simple nor objective. I have tried to tame this task with questions like "Is X a substitute for Y?", but with only limited success.
Thankfully, in this case, the similarity or distance function is computed directly from our content representation (i.e., the model we use for document embedding), so we only need relevance judgments.
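For concreteness, here is a minimal sketch (in Python with NumPy) of what I mean by a computed similarity: cosine similarity over the vectors the embedding model produces for each document. The vectors and names below are made up for illustration -- in practice they would come from whatever embedding model we actually use.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embedding vectors for two documents; in a real system
# these come out of the document-embedding model, not hand-written values.
doc_a = np.array([0.12, -0.54, 0.33, 0.80])
doc_b = np.array([0.10, -0.48, 0.40, 0.75])

print(cosine_similarity(doc_a, doc_b))  # close to 1.0 means very similar
```

The point is that no human ever has to answer "how similar are these two documents?" -- the representation answers it for us, and the annotators only answer the simpler relevance question.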
That said, I am interested in exploring the question of whether similarity is objective. Watch this space!