In my investigation of Bayes Theorem, I have learned a great many things. Many of these lessons come from listening to the lectures and debates of Dr. Richard Carrier, whose lecture on Bayes Theorem first made me realise the power of Bayes and showed me how to apply the theorem in a real world situation.
So What is Bayes Theorem?
Bayes Theorem is a mathematical relationship between the probabilities of conditional events. Conditional events differ from independent events in that the chance of a conditional event occurring depends on whether the event it is conditional upon has occurred.
So if A and B are conditional events, with B conditioned upon the occurrence of A, then the probability that B occurs depends on whether A has occurred, and vice versa. The chance of my getting into a road accident is conditional upon whether I am fully capable of driving my vehicle safely. Conditional probabilities are all around us, in every sphere of life, and are fairly straightforward to recognise. In math terms,
P(A and B) = P(B) * P(A|B) –> The probability that both A and B occur is equal to the probability that B will occur multiplied by the probability of A occurring given that B has already occurred.
By the same reasoning with the roles of A and B swapped, P(A and B) = P(A) * P(B|A).
It follows that P(B) * P(A|B) = P(A) * P(B|A)
Dividing both the sides of this equation by P(A),
we get Bayes Theorem –> P(B|A) = P(B) * P(A|B) / P(A)
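A quick numerical check can make the relationship feel less abstract. The sketch below uses made-up counts for 1000 car journeys (the numbers are purely hypothetical), with A = the driver was not fully capable and B = an accident happened. It computes P(B|A) directly from the counts and again through Bayes Theorem, and the two should agree.

```python
# Hypothetical counts for 1000 journeys (invented for illustration only).
# A = driver not fully capable, B = an accident happened.
counts = {
    ("A", "B"): 12,           # impaired and accident
    ("A", "not B"): 88,       # impaired, no accident
    ("not A", "B"): 9,        # capable, accident
    ("not A", "not B"): 891,  # capable, no accident
}
total = sum(counts.values())

p_a = (counts[("A", "B")] + counts[("A", "not B")]) / total
p_b = (counts[("A", "B")] + counts[("not A", "B")]) / total
p_a_and_b = counts[("A", "B")] / total

# Conditional probabilities straight from their definition.
p_b_given_a = p_a_and_b / p_a
p_a_given_b = p_a_and_b / p_b

# The same quantity obtained through Bayes Theorem.
p_b_given_a_bayes = p_b * p_a_given_b / p_a

print("P(B|A) from counts:", p_b_given_a)
print("P(B|A) from Bayes: ", p_b_given_a_bayes)
```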
If we now do the following….
1. Replace events A and B with more “real life” terms like a Hypothesis H
and some Evidence E,
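2. Expand the Denominator, P(E), and rewrite
P(E) = P(H) * P(E|H) PLUS P(~H) * P(E|~H), where ~H means "H is false". In words: the chance of seeing E if H were true, weighted by how probable H is, PLUS the chance of seeing E if H were false, weighted by how probable it is that H is false,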
we get a much easier to understand (albeit longer) version of Bayes Theorem that is given below.
It is this fashion of writing Bayes Theorem that I am most comfortable with (again, a representation I learned from watching Dr. Richard Carrier’s video). There are other representations of the theorem, but I find it easiest to understand and apply this particular variant:
P(H|E) = P(H) * P(E|H) / [ P(H) * P(E|H) + P(~H) * P(E|~H) ]
Written this way, Bayes Theorem reads as follows:
The probability that our hypothesis H is true, given that we have received evidence E, is equal to the Numerator divided by the Denominator, where:
Numerator = The Prior probability that H is true (before any new evidence was received) * the probability of seeing evidence E if our hypothesis H were indeed true
Denominator = Numerator PLUS The probability that H is false * the probability of seeing evidence E if H were false.
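For readers who prefer code to words, the expanded form can be sketched as a small Python function. The function name `posterior` and its parameter names are my own choices for illustration, and the inputs in the last line are made up just to exercise it.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) using the expanded form of Bayes Theorem."""
    # Numerator: prior belief in H times how well H predicts the evidence.
    numerator = prior * p_e_given_h
    # Denominator: the numerator plus the same product for "H is false".
    denominator = numerator + (1.0 - prior) * p_e_given_not_h
    if denominator == 0.0:
        raise ValueError("the evidence is impossible under both H and not-H")
    return numerator / denominator


# Made-up inputs: a 50/50 prior, and evidence four times likelier under H.
example = posterior(0.5, 0.8, 0.2)
print(example)
```

With a 50/50 prior the result depends only on the two likelihoods, which is a handy sanity check before trying more lopsided priors.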
Putting all this theory into a concrete example…..
Let’s make a hypothesis H = Santa Claus visited my house last night.
Now suppose I live alone, am a rational, sensible adult and have no reason to believe that Santa Claus actually exists. I think it would be reasonable to assume a Prior P(H) = 0.000001. Clearly, I don’t have much prior belief in the existence of Santa Claus and I give the chance that he visited my home last night a probability of 1 in a million. Now let some new evidence come in and let’s see how Bayes helps me to analyse this new evidence in a rational manner to update my beliefs about Santa Claus.
Suppose I woke up in the morning to find that the stockings I had left hanging over the fireplace to dry last night were stuffed full of gifts and chocolate. Extraordinary as it may sound, there it is: concrete evidence that somehow during the night, someone stuffed my stockings with lots of gifts.
Evidence E = my stockings have somehow been stuffed with gifts and chocolates
Now, Bayes Theorem is going to help me analyse this to update my beliefs on the existence of Santa Claus. Let’s look at the expanded version of Bayes Theorem again:
P(H|E) = P(H) * P(E|H) / [ P(H) * P(E|H) + P(~H) * P(E|~H) ]
Using the formula above,
Numerator = P(H) * P(E|H)
P(H) = 0.000001 (Prior Probability = One in million chance that Santa came home last night)
P(E|H) = I’m going to say this would be = 0.7 (If Santa did indeed come home last night, and from what little I know about him, there is quite a good chance, at least 0.7 that he would remember to put gifts into my stockings)
So the Numerator = 0.000001 * 0.7 = 0.0000007
Now P(~H) = The Probability that H is false. This is = 1 – P(H) = 0.999999
and finally P(E|~H) = The probability that someone other than Santa crept into my house last night and stuffed my stockings with gifts. Now I live alone, on top of an isolated hillock, I have a state-of-the-art security system, and if anyone had visited my house, I would have known about it!! I check my internal security camera, to see whether maybe I had woken up at night, maybe done some sleepwalking and stuffed the stockings myself, but no, the camera shows me fast asleep in bed, the whole night.
Since I can be quite certain that NO ONE entered my house last night and I was asleep in my bed, P(E|~H) has to be small. I’m going to be conservative and say about 0.0001. There is a one in ten thousand chance that someone climbed up to my isolated house on the hill, broke through my security system, and then stuffed my stockings full of presents.
Notice how I have considered that a person sneaking through my security system, P(E|~H), is a hundred times more likely than a visitation from Santa Claus, P(H).
Now let’s plug the values into Bayes equation and see how the new evidence has changed my beliefs about Santa Claus.
Numerator = 0.0000007
Denominator = Numerator + 0.999999 * 0.0001 = 0.0001007
So P(H|E) = 0.0000007 / 0.0001007 = 0.00695.
So this new evidence (gifts in my stockings), rationally analysed using Bayes Theorem, has increased my belief in Santa Claus. In fact, you can see from the figures that my belief has increased from one in a million to about 7 in a thousand. But I did need very extraordinary evidence to precipitate this change in my beliefs. If I had opened the front door and found reindeer tracks around my house, or if I had found soot-marked footprints leading from my chimney across the floor, this probability would probably have got even higher, but if I found my wallet missing, something that Santa would NEVER do, it would act as evidence to reduce my belief in Santa Claus.
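The arithmetic is easy to recheck in a few lines of Python. This sketch simply plugs in the inputs chosen in the text (a prior of 0.000001, P(E|H) = 0.7 and P(E|~H) = 0.0001); the variable names are my own.

```python
# Inputs taken from the Santa example in the text.
p_h = 0.000001           # prior: Santa visited last night
p_e_given_h = 0.7        # stockings get stuffed if Santa came
p_e_given_not_h = 0.0001 # stockings get stuffed some other way

numerator = p_h * p_e_given_h
denominator = numerator + (1 - p_h) * p_e_given_not_h
p_h_given_e = numerator / denominator

print(f"Numerator   = {numerator:.7f}")
print(f"Denominator = {denominator:.7f}")
print(f"P(H|E)      = {p_h_given_e:.5f}")  # a little under 0.007

# How many times larger the belief is after seeing the evidence.
print(f"Belief grew by a factor of about {p_h_given_e / p_h:.0f}")
```

The last line is the part I find most striking: the evidence multiplies the belief several thousand times over, yet the result is still well under one percent because the starting point was so low.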
As each new piece of evidence is perceived, it is fed into Bayes Theorem and the resulting probability (called the Posterior) serves as the new Prior in the subsequent step. If this is continued in a cycle, or a loop, the process is called Recursive Bayesian Reasoning and is one of the keys to making intelligent machines that can learn from the evidence they observe.
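This loop can be sketched in a few lines. Only the first row of evidence uses figures from the Santa example; the likelihoods for the reindeer tracks and the missing wallet are hypothetical values I made up to show the direction of the effect. The sketch also assumes each piece of evidence is independent of the others once we know whether H is true, which is a simplification.

```python
def update(prior, p_e_given_h, p_e_given_not_h):
    """One Bayesian update: returns the posterior P(H|E)."""
    numerator = prior * p_e_given_h
    return numerator / (numerator + (1 - prior) * p_e_given_not_h)


# (description, P(E|H), P(E|~H))
evidence = [
    ("gifts in stockings", 0.7, 0.0001),     # figures from the example
    ("reindeer tracks outside", 0.5, 0.001), # hypothetical figures
    ("wallet missing", 0.01, 0.3),           # hypothetical figures
]

belief = 0.000001  # starting prior from the example
history = []
for description, p_e_h, p_e_not_h in evidence:
    # The posterior from this step becomes the prior for the next one.
    belief = update(belief, p_e_h, p_e_not_h)
    history.append(belief)
    print(f"after '{description}': P(H) = {belief:.5f}")
```

Evidence that is more likely under H than under ~H pushes the belief up, and evidence that is less likely under H pulls it back down, exactly as described above.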
Here are some important lessons that I have learned about Bayes Theorem over the last few months:-
1. The Importance of the Prior. The Prior is very important. A very low Prior demands very strong evidence before we accept the hypothesis, and a very high Prior demands very strong evidence before we abandon it. This makes perfect sense, and it follows that extraordinary claims require extraordinary evidence for me to believe them. But if the Prior were = 1 or = 0, absolutely no amount of evidence would be able to change our beliefs. This means that we should never be absolutely certain of any fact: if we don’t leave a little room for new evidence to change our minds, we are essentially blocking out any possibility of fresh evidence affecting our belief.
2. The Weight of the Evidence. Extraordinary claims require extraordinary evidence. If P(H) is very small (an extraordinary claim), P(E|H) would need to be very large compared with P(E|~H) to significantly change the Prior. So when being asked to believe an extraordinary claim, always look to see if there is extraordinary evidence supporting that claim. Also, the Evidence must be weighed relative to the Hypothesis at hand. This is a critical aspect of rational analysis, and essential to a fair examination of the facts. For instance, it doesn’t matter how likely it is in general to wake up one morning and find gifts in my stockings; what matters instead is how likely it is that Santa is the cause of the gifts in my stockings.
3. Other Possible Hypotheses MUST be Included in our analysis. When examining evidence we MUST allow for rival explanations: just because our chosen hypothesis explains the evidence, it does not mean that no other hypothesis can also explain the evidence satisfactorily. Our theory need not necessarily be the ONLY possible way for a given piece of evidence to come about.
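Two of these lessons are easy to see in code. The sketch below first shows that a Prior of exactly 0 or 1 never moves, using the likelihoods from the Santa example. It then splits "H is false" into two rival hypotheses. The rival explanations and all of their priors and likelihoods are hypothetical numbers of my own, chosen only to show how a plausible rival soaks up the evidence.

```python
def update(prior, p_e_given_h, p_e_given_not_h):
    """One Bayesian update: returns the posterior P(H|E)."""
    numerator = prior * p_e_given_h
    return numerator / (numerator + (1 - prior) * p_e_given_not_h)


# Lesson 1: a Prior of exactly 0 or 1 is immune to evidence.
stuck_at_zero = update(0.0, 0.7, 0.0001)
stuck_at_one = update(1.0, 0.7, 0.0001)
print("Prior 0 ->", stuck_at_zero)
print("Prior 1 ->", stuck_at_one)

# Lesson 3: compare several hypotheses at once.
# Priors must sum to 1. All figures except Santa's are hypothetical.
priors = {
    "Santa": 0.000001,
    "friend with a spare key": 0.001,
    "something else": 1 - 0.000001 - 0.001,
}
# P(stuffed stockings | each hypothesis), again hypothetical except Santa's.
likelihoods = {
    "Santa": 0.7,
    "friend with a spare key": 0.5,
    "something else": 0.00005,
}

# Bayes Theorem for many hypotheses: weight each prior by its likelihood,
# then divide by the total so the posteriors sum to 1.
weighted = {h: priors[h] * likelihoods[h] for h in priors}
total = sum(weighted.values())
posteriors = {h: w / total for h, w in weighted.items()}

for hypothesis, p in posteriors.items():
    print(f"P({hypothesis} | gifts) = {p:.4f}")
```

Once a mundane rival with a modest prior is on the table, it ends up far more probable than Santa, even though Santa explains the gifts slightly better.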
Bayes Theorem has made me wiser.
It has empowered me and has made me less likely to fall prey to irrational claims and tainted evidence. If you are reading this blog, I urge you to investigate this further. It may take time (it has taken me six months of study just to get this far), but I am sure that, like me, you too will be rewarded for your efforts. Bayes Theorem is truly the most powerful force in the universe, because it illuminates truth.
Next step for me: more advanced applications of Bayes Theorem in Bayesian Particle Filters, the key to target tracking, motion estimation and artificial intelligence. I recently finished work on a primitive particle filter simulation for a roving robot….take a look.