Random Thoughts on Interestingness
As a computer scientist and an information scientist, I have always been fascinated by fundamental questions of computational complexity and compressibility. These days, I am less concerned with big questions like Turing completeness and the P vs. NP problem, and more concerned with quantifying interestingness, especially when randomization is involved.
Let us consider a sequence of n musical notes. For simplicity, we restrict our attention to the 12 notes in a single octave of the chromatic scale, and we assume that they are all of the same duration (e.g., whole notes).
I hope we agree that the least interesting possible sequence of n notes is a sequence of identical notes, e.g., all Cs. Such a sequence, aside from being boring for listeners, is highly compressible. We can validate this with any standard compression algorithm, and we can see intuitively that producing a sequence of n identical notes does not require a program whose length is proportional to n. In fact, the only part of the program that grows with n is the representation of n itself, which takes log₂(n) bits.
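As a quick illustration (my own sketch, not part of the argument, using zlib and one byte per note), watch how little the compressed size grows as n grows by factors of ten:

```python
# Compress runs of n identical notes with an off-the-shelf compressor.
# The compressed output stays nearly constant while n grows.
import zlib

for n in (100, 1_000, 10_000, 100_000):
    identical = b"C" * n  # n copies of the same note, one byte per note
    compressed = zlib.compress(identical, level=9)
    print(f"n = {n:>7}: raw {n} bytes -> compressed {len(compressed)} bytes")
```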
Let us now consider the other extreme case: a sequence of n notes, each of which is independently chosen at random from among the 12 available notes. While Schönberg might have approved of such dodecaphony, I think that most of us would find such random “music” unlistenable and hardly more interesting than the previous sequence of identical notes. But such a sequence does not have high compressibility. With high probability, the sequence is incompressible: reproducing it would require a program of length proportional to n.
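Here is the companion sketch (again just an illustration, with each note encoded as one byte drawn uniformly from the 12 pitches):

```python
# Compress random note sequences of increasing length. Unlike the
# identical-note case, the compressed size keeps growing in proportion to n.
import random
import zlib

for n in (100, 1_000, 10_000, 100_000):
    # one byte per note, each drawn uniformly from the 12 chromatic pitches
    random_notes = bytes(random.randrange(12) for _ in range(n))
    compressed = zlib.compress(random_notes, level=9)
    print(f"n = {n:>7}: raw {n} bytes -> compressed {len(compressed)} bytes")
```

The compressor does squeeze each note to a bit under half a byte, since it cannot do better than about log₂(12) ≈ 3.6 bits per note, but the total still grows in proportion to n, whereas the identical-note sequence stayed essentially constant.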
So both extreme cases of low and high compressibility are uninteresting!
But is this random sequence of n notes really incompressible? After all, I just described it concisely as a random sequence of n notes.
A program that produces a random sequence of n notes is essentially the same length as a program that produces a sequence of n identical notes. The only difference is that the program to produce a sequence of n identical notes is deterministic, while the program to produce a sequence of n random notes is non-deterministic because it uses randomization.
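To make the comparison concrete, here are two toy generators (my own sketch; the note names and function names are purely illustrative). They are essentially the same length; only the call to the random number generator distinguishes them.

```python
import random

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def identical_notes(n):
    # deterministic: always produces the same output for a given n
    return ["C"] * n

def random_notes(n):
    # randomized: produces a (very likely) different output on every run
    return [random.choice(NOTES) for _ in range(n)]
```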
But is a program that produces a sequence of n random notes really a compressed version of its output? Not according to conventional information theory. But, to a human ear, all random noise seems interchangeable and equivalent. So what is happening here?
Also, there is an important philosophical distinction between a sequence of n notes generated at random and that same sequence of n notes chosen intentionally. It is the sort of thing about which Borges could have written an eloquent short story (like this one). But someone who only observes the sequence of notes cannot distinguish between these two scenarios.
Intuitively, I feel that there is some notion of interestingness that relates to the length of the shortest program that can produce a sequence, but only if we allow for non-determinism. According to this hypothetical measure, a sequence of identical notes and a sequence of random notes would be equally uninteresting. But how do we formalize such a measure?
I don’t know.
The question has bothered me for decades, but no one has ever offered a satisfactory answer or convinced me to give up the quest for one.
But perhaps someone reading this post will enlighten me.