If we ask whether a fact about a person identifies that person, it turns out that the answer isn’t simply yes or no. If all I know about a person is their ZIP code, I don’t know who they are. If all I know is their date of birth, I don’t know who they are. If all I know is their gender, I don’t know who they are. But it turns out that if I know these three things about a person, I could probably deduce their identity! Each of the facts is partially identifying.
There is a mathematical quantity which allows us to measure how close a fact comes to revealing somebody’s identity uniquely. That quantity is called entropy, and it’s often measured in bits. Intuitively, you can think of entropy as a generalization of the number of different possibilities there are for a random variable: if there are two equally likely possibilities, there is 1 bit of entropy; if there are four, there are 2 bits of entropy, and so on. Adding one more bit of entropy doubles the number of possibilities.1
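To make the doubling concrete, here is a minimal Python sketch. The function names `bits_for_equally_likely` and `shannon_entropy` are my own, chosen for illustration. The first covers the simple case where every possibility is equally likely; the second is the standard Shannon entropy formula, which is the "generalization" to possibilities that are not equally likely.

```python
import math


def bits_for_equally_likely(n_outcomes: int) -> float:
    """Entropy in bits of a variable with n equally likely outcomes."""
    return math.log2(n_outcomes)


def shannon_entropy(probabilities) -> float:
    """Shannon entropy in bits: -sum(p * log2(p)) over the outcomes.

    Outcomes with probability 0 contribute nothing and are skipped.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    # Each doubling of the number of possibilities adds one bit.
    for n in (2, 4, 8, 16):
        print(n, "possibilities:", bits_for_equally_likely(n), "bits")

    # A lopsided two-way split carries less than 1 bit.
    print("90/10 split:", shannon_entropy([0.9, 0.1]), "bits")
```

When the possibilities are equally likely the two functions agree; when one outcome is much more common than the others, the entropy drops below the simple count-based figure.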
Because there are around 7 billion humans on the planet, the identity of a random, unknown person contains just under 33 bits of entropy (two to the power of 33 is about 8.6 billion). When we learn a new fact about a person, that fact reduces the entropy of their identity by a certain amount. There is a formula to say how much.
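As a rough sketch of that calculation: the standard information-theoretic measure of how much a fact tells us is its surprisal, minus the base-2 logarithm of the probability that the fact is true of a random person. The code below assumes that is the formula meant here. The function names are hypothetical, and the birthday example assumes birthdays are spread uniformly over 365 days, which is only an approximation.

```python
import math

WORLD_POPULATION = 7_000_000_000  # the rough figure used in the text


def identity_bits(population: int) -> float:
    """Bits needed to single out one person from a population."""
    return math.log2(population)


def bits_revealed(probability: float) -> float:
    """Surprisal of a fact: -log2 of the chance it holds for a random person."""
    return -math.log2(probability)


if __name__ == "__main__":
    total = identity_bits(WORLD_POPULATION)  # about 32.7 bits

    # Illustrative assumption: knowing someone's birthday (day and month),
    # with birthdays treated as uniform over 365 days.
    birthday = bits_revealed(1 / 365)  # about 8.5 bits

    remaining = total - birthday
    print(f"identity of an unknown person: {total:.2f} bits")
    print(f"learned from a birthday:       {birthday:.2f} bits")
    print(f"still unknown:                 {remaining:.2f} bits")
    print(f"people still matching:         {2 ** remaining:,.0f}")
```

The rarer the fact, the more bits it reveals: a fact true of half the population is worth exactly 1 bit, while a fact true of one person in 365 is worth about 8.5.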