False positives suck, of course. But consider what the alternatives do in the true-positive case, which should be far more frequent than the false-positive case if you’ve done a decent job of bigram detection.
If you ignore the bigrams, then most of your results are irrelevant. Boost bigram matches a little bit? Then you still have a lot of irrelevant results. Boost bigram matches a lot? Then it’s not so different from enforcing them, except for the bad results at the bottom, which are only an issue if you report the number of results or have faceted refinement.
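To make the comparison concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the `rank` and `contains_bigram` helpers, the term-count base score, the boost values, and the three documents are all made up, and a real engine would score very differently. It only shows how ignoring, lightly boosting, heavily boosting, and enforcing a detected bigram change the ranking and the result count.

```python
def contains_bigram(tokens, bigram):
    """True if the two words of the bigram appear adjacent and in order."""
    return bigram in zip(tokens, tokens[1:])


def rank(docs, bigram, boost=None):
    """Rank docs that contain both words of the bigram (hypothetical scoring).

    boost=None  -> enforce the bigram: docs without the phrase are dropped.
    boost=float -> keep every doc with both words; add `boost` to phrase matches.
    """
    first, second = bigram
    scored = []
    for doc in docs:
        tokens = doc.lower().split()
        if first not in tokens or second not in tokens:
            continue  # doc does not even match the individual terms
        is_phrase = contains_bigram(tokens, bigram)
        if boost is None and not is_phrase:
            continue  # enforcement: phrase match is required
        # Assumed base score: how often the two query words occur.
        score = tokens.count(first) + tokens.count(second)
        if boost is not None and is_phrase:
            score += boost
        scored.append((score, doc))
    # Python's sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


RELEVANT = "machine learning engineer"
NOISY = "learning to fix a washing machine and a sewing machine"
ALSO_NOISY = "machine shop offers a learning program"
DOCS = [RELEVANT, NOISY, ALSO_NOISY]
BIGRAM = ("machine", "learning")

ignored = rank(DOCS, BIGRAM, boost=0.0)      # bigram ignored
small_boost = rank(DOCS, BIGRAM, boost=0.5)  # boosted a little
big_boost = rank(DOCS, BIGRAM, boost=10.0)   # boosted a lot
enforced = rank(DOCS, BIGRAM)                # bigram enforced
```

In this toy data the noisy washing-machine document stays on top when the bigram is ignored or only lightly boosted. With the large boost the relevant document moves to the top, but the two noisy documents are still in the list, so a reported result count or a facet count would include them. Enforcement returns only the relevant document.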
I feel we implemented this concept pretty well at LinkedIn, not so much for bigram enforcement but for query understanding generally. A large fraction of searches are automatically rewritten to be more precise, with an escape hatch to address the occasional false positive. Judging by the significant metrics improvements and the lack of complaints, I’d say users are pretty happy with the improved relevance.
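The rewrite-plus-escape-hatch idea can be sketched roughly as follows. This is not LinkedIn's implementation, just a minimal Python illustration under assumptions: the `KNOWN_BIGRAMS` set, the `RewrittenQuery` type, the `rewrite_query` function, the quoted-phrase syntax, and the "Search instead for" wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical output of a bigram detector; a real one would be learned from data.
KNOWN_BIGRAMS = {("machine", "learning"), ("product", "manager")}


@dataclass
class RewrittenQuery:
    query: str                   # the query that actually gets executed
    escape_hatch: Optional[str]  # offer to run the original query, if it was rewritten


def rewrite_query(query, verbatim=False):
    """Quote detected bigrams so they are enforced as phrases.

    verbatim=True is the escape hatch: the user asked for the original query,
    so no rewriting is applied.
    """
    if verbatim:
        return RewrittenQuery(query, None)
    tokens = query.split()
    parts = []
    changed = False
    i = 0
    while i < len(tokens):
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if len(pair) == 2 and pair in KNOWN_BIGRAMS:
            # Assumed phrase syntax: wrap the two words in double quotes.
            parts.append('"' + " ".join(tokens[i:i + 2]) + '"')
            changed = True
            i += 2
        else:
            parts.append(tokens[i])
            i += 1
    if not changed:
        return RewrittenQuery(query, None)
    return RewrittenQuery(" ".join(parts), "Search instead for: " + query)


rewritten = rewrite_query("machine learning engineer")
untouched = rewrite_query("sewing machine")
escaped = rewrite_query("machine learning engineer", verbatim=True)
```

Here `rewritten.query` carries the quoted phrase and `rewritten.escape_hatch` holds the message the interface could show, while `escaped` is what runs if the user takes the escape hatch.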