Fundamentally, machine learning means learning from data. Specifically, machine learning means learning generalizable patterns from data. After it has been trained on a collection of labelled examples, a machine learning model can assign labels to instances it has never seen before, such as distinguishing cat videos from dog videos. Not all machine learning follows this particular training protocol, but the principle of learning generalizable patterns from data is at the heart of all machine learning.
There’s more to learning than generalizable patterns from data.
But there is more to learning than learning generalizable patterns from data. If you think back to your early education, some of it probably involved rote memorization of facts, such as memorizing when your country was founded and who founded it. If you hear a person’s name today, your ability to recognize who that person is (e.g., a pop music artist vs. a football player) has nothing to do with general pattern recognition and everything to do with your knowledge of individual facts about the world.
Much as we humans use such knowledge of individual facts to complement our ability to learn and recognize general patterns, our machines need to apply a similar strategy in order to thrive in the real world.
An illustrative example: named entity recognition.
Consider a classic problem faced by search applications: named entity recognition (NER). Modern NER systems rely on machine learning — specifically, they are trained on collections of text in which entities have been annotated with their entity types (e.g., people, places, organizations).
NER models trained this way are able to learn general patterns. For example, if we see a sentence about the “ancient regions of Tlön”, we can infer that Tlön is a place, even if we have never seen the word before.
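To make the idea of a contextual pattern concrete, here is a minimal sketch in Python. It is not how a trained NER model works internally: the hand-written regular expression and the cue words in it are assumptions chosen for illustration, standing in for signals a model would learn from annotated text. The function name guess_places is likewise hypothetical.

```python
import re

# Hypothetical context cues. A trained NER model would learn signals like
# these from annotated text; this hand-written list is only a stand-in.
PLACE_CONTEXT = re.compile(
    r"\b(?:regions?|city|cities|province|kingdom|capital)\s+of\s+([A-Z]\w+)"
)


def guess_places(sentence):
    """Return capitalized tokens that follow a place-like context cue."""
    return PLACE_CONTEXT.findall(sentence)


if __name__ == "__main__":
    # The word itself is unfamiliar; only the surrounding context is used.
    print(guess_places("He wrote about the ancient regions of Tlön."))
```

The point of the sketch is that nothing about the token "Tlön" is stored anywhere. The guess comes entirely from the words around it, which is why the same rule would apply to any other unseen name in the same context.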
Such a model, however, cannot infer a person’s profession from that person’s name (though there are some patterns in CEO first names). Still, such patterns break down in the face of unique names, like mine.
Like people, machines can only hope to learn isolated, individual facts by memorizing them. Such explicit memorization goes against the spirit of generalization that underlies machine learning — though prominent AI researchers have characterized large language models (LLMs) as nothing more than “stochastic parrots”.
Regardless, it is important to distinguish what can be learned as general patterns from what can only be learned through explicit memorization. Early NER researchers knew this, and they relied on “gazetteer” approaches for the latter. In my experience, gazetteer approaches have fallen out of favor, replaced by modern machine learning approaches. I appreciate the appeal of machine learning, but I also see its limitations.
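A gazetteer is, at its simplest, a lookup table of memorized facts. The following Python sketch shows one way such a table might sit in front of a pattern-based model. Everything in it is an assumption made for illustration: the entries are invented placeholders rather than real data, and the names GAZETTEER, lookup_entity, and the fallback parameter are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical gazetteer: individual facts that can only be memorized.
# The entries are invented placeholders, not real people.
GAZETTEER = {
    "jane example": "pop music artist",
    "john sample": "football player",
}


def lookup_entity(
    name: str,
    fallback: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Check the memorized table first, then defer to a pattern-based model."""
    # Normalize case and whitespace so trivial variations still match.
    key = " ".join(name.lower().split())
    if key in GAZETTEER:
        return GAZETTEER[key]
    # No memorized fact: hand off to a learned model, if one was supplied.
    if fallback is not None:
        return fallback(name)
    return None


if __name__ == "__main__":
    print(lookup_entity("Jane Example"))
    print(lookup_entity("Someone Unlisted"))
```

Checking the table before the model is a design choice, not a requirement. It reflects the view that a memorized fact, when one exists, should take precedence over a guess from general patterns.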
What to do when machine learning hits a wall.
The last decade has demonstrated how machine learning can learn from data and apply that learning to countless real-world domains. But not all learning can be achieved by learning generalizable patterns from data. Some things must be learned through explicit memorization. When machine learning hits that wall, we need to be flexible. Our challenge is to recognize when machine learning hits that wall, and to act accordingly.