Wednesday, January 16, 2013

The Monty Hall Problem and a lesson in statistics

This semester I am the teaching assistant for a graduate-level course on Statistical Mechanics. To me this represents a milestone. Just a little over two years ago I was in the department's office nearly in tears (okay, actually in tears). I was prepared to quit the program - I was just too stupid for graduate school (I'm still not sure I am smart enough, but that's a subject for another post). It was only my second week in grad school and already I was convinced that I would never pass Statistical Mechanics.

I don't shed tears over Stat Mech anymore; in fact, it's probably one of my favorite classes. Statistical Mechanics is the branch of chemistry (or physics, depending on who you ask) that applies large-number statistics to molecules. Since molecules are small and there are so many of them, the statistics work out nicely and we can accurately predict thermodynamic quantities (like pressure, energy, entropy, etc.).

While brushing up on the subject I was reminded of an interesting statistical problem that I thought I'd share. It's called The Monty Hall Problem, and it's based on the game show "Let's Make a Deal!"


Monty Hall was the original host of "Let's Make a Deal!" Most of the game show worked by giving someone in the audience a small prize and then offering them a deal: "Keep the small prize or trade it for whatever is behind door #1!" Sometimes door #1 got you a new car; other times it was something completely useless. So here's the Monty Hall problem:
Suppose Monty shows you three doors. You know that behind one of the doors is a new car. Behind each of the other doors is a goat. You choose a door at random (we'll say door #1). Then, Monty opens door #3 and reveals a goat. Monty then makes you an offer - You can switch and take what's behind door #2 instead of door #1. Should you switch or stick with your initial choice?
At first it may seem like switching will make no difference. When the game began your odds of winning a car were 1/3. Monty opens a door and reveals a goat, but that still leaves one goat and one car. The odds must be 50/50, right? The car must be equally likely to be behind either one of the doors. It's often the case, though, that your intuition will deceive you. Already this semester I have warned several students that they were trusting their own intuition a little too much.

The real answer to the Monty Hall problem is that by switching your choice you move from a 1/3 chance of winning a car to a 2/3 chance. It's important to note that Monty knows where the car is and will never open a door to reveal it (that would ruin the game). Below I outline three ways of convincing yourself that this is the answer. Choose your favorite.

Thinking it through: Making a table
In the beginning of any statistics class you'll get some very easy problems. For example:
"If I roll a 6-sided die[1], what are the chances that I roll a 6?"
These problems are usually pretty easy to answer by just thinking about it or in some cases writing down all the possible outcomes and counting them. Let's write out all the possible outcomes for the Monty Hall problem. This table assumes your choice was door #1 and that Monty will eliminate one of the other doors that has a goat.


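Behind Door #1 | Behind Door #2 | Behind Door #3 | Your prize (No switching) | Your prize (Switching)
---------------+----------------+----------------+---------------------------+-----------------------
Car            | Goat           | Goat           | Car                       | Goat
Goat           | Car            | Goat           | Goat                      | Car
Goat           | Goat           | Car            | Goat                      | Car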

Just by writing out all the possible outcomes you can see that switching gives you a 2/3 chance of winning a car while not switching leaves you with a 1/3 chance. It can be tedious to write out all the possible outcomes to a problem, especially when you have a large number of events. But that's why we have math.
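
If you'd rather let a computer do the counting, here is a minimal Python sketch of the same table. The function names (enumerate_outcomes, win_probability) are made up for this illustration, and the sketch assumes, like the table, that you picked door #1 and that Monty always opens a goat door other than yours.

```python
from fractions import Fraction

DOORS = (1, 2, 3)

def enumerate_outcomes(choice=1):
    """Return one (car_door, prize_if_stay, prize_if_switch) row per car position."""
    rows = []
    for car in DOORS:
        # Monty opens a goat door that is not the contestant's choice.
        # When the contestant already holds the car he has two options,
        # but both lead to the same prizes, so we simply take the first.
        opened = next(d for d in DOORS if d != choice and d != car)
        # Switching means taking the one door that is neither chosen nor opened.
        switched = next(d for d in DOORS if d != choice and d != opened)
        stay_prize = "Car" if choice == car else "Goat"
        switch_prize = "Car" if switched == car else "Goat"
        rows.append((car, stay_prize, switch_prize))
    return rows

def win_probability(rows, column):
    """Fraction of the equally likely rows whose prize in `column` is a car."""
    wins = sum(1 for row in rows if row[column] == "Car")
    return Fraction(wins, len(rows))

rows = enumerate_outcomes()
p_stay = win_probability(rows, 1)
p_switch = win_probability(rows, 2)
print("stay:", p_stay, "switch:", p_switch)
```

Each row is one equally likely position of the car, so counting the "Car" entries in a column and dividing by the number of rows gives the probability for that strategy.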

Mathematically: Bayes' Theorem
Bayes' Theorem is a statistical tool that lets us analyze the probability of one event happening given that another event has already occurred. In this case, what is the probability that the car is behind door #1, given that Monty reveals a goat behind door #3? The math behind Bayes' theorem is written as:
P(A|B) = \frac{P(B|A)P(A)}{P(B)}

Which is read "The probability that event A will happen given that B is true is equal to the probability that B will happen given that event A has happened multiplied by the probability of A divided by the probability of B." 

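For the Monty Hall problem we have three important variables: the door you choose (Dn), the door Monty opens (Mn), and the door that actually has the car (Cn). So the probability that the car is behind door #2 (C2), given that you chose door #1 (D1) and Monty opened door #3 (M3), is:

P(C_2|D_1,M_3) = \frac{P(M_3|C_2,D_1)P(C_2|D_1)}{P(M_3|D_1)}

Here P(M_3|C_2,D_1) = 1, because with the car behind door #2 and your pick on door #1, door #3 is the only door Monty can open. P(C_2|D_1) = 1/3, since your choice tells you nothing about where the car is. And P(M_3|D_1) = 1/2, assuming Monty picks at random whenever both of the other doors hide goats.

Which works out to be:

P(C_2|D_1,M_3) = \frac{1 \times \frac{1}{3}}{\frac{1}{2}} = \frac{2}{3}
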
So, if you choose door #1 and Monty reveals door #3, there is a 2/3 (about 66.7%) chance that the car is behind door #2 and only a 1/3 (about 33.3%) chance that it is behind door #1.
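
The Bayes calculation can also be checked with exact fractions. This is only a sketch: the helper monty_opens_3 is a hypothetical name, and it assumes that Monty picks at random between the two goat doors when you happen to be sitting on the car, which the problem statement does not spell out.

```python
from fractions import Fraction

def monty_opens_3(car, choice=1):
    """P(M3 | car door, your choice): the chance that Monty opens door #3."""
    if car == 3 or choice == 3:
        return Fraction(0)      # he never reveals the car or opens your door
    if car == choice:
        return Fraction(1, 2)   # assumption: he picks one of two goat doors at random
    return Fraction(1)          # door #3 is the only goat door he can open

# Prior: before anything is opened, the car is equally likely behind each door.
prior = {car: Fraction(1, 3) for car in (1, 2, 3)}
likelihood = {car: monty_opens_3(car) for car in prior}

# P(M3 | D1): total probability that Monty opens door #3.
evidence = sum(likelihood[car] * prior[car] for car in prior)

# Bayes' theorem applied to each possible car position.
posterior = {car: likelihood[car] * prior[car] / evidence for car in prior}

for car, p in posterior.items():
    print(f"P(C{car} | D1, M3) = {p}")
```

Using Fraction instead of floats keeps the result exact, so the thirds show up as thirds rather than as rounded decimals.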

Mathematically: Renormalization
This is my personal explanation of the math. As such it is not strictly correct (a mathematician would likely have my head), but the math works out and it is applicable to many other problems. For this reason I've chosen to include it.  

When we first started out door #1, door #2, and door #3 all had equal probability of containing the car (1/3). Then Monty opens up door #3 and reveals a goat. You haven't changed anything, so your probability is unchanged. You still have that 1/3 chance of getting a car if you stick with door #1. However, you do have a choice. You can switch to door #2. If you choose to switch, we have to renormalize the problem. Normalization basically just means that the total probability must be equal to 1. There are a few ways we can normalize. 
1. Realize that the remaining probability is equal to the total probability minus the original probability. In this case that means 1-1/3 = 2/3.
2. Divide by the "new probability". In other words our initial probability was 1/3. Now there are only 2 possibilities. We have to renormalize to reflect that change. The probability that the car is in a door other than door #1 is (1/3)/(1/2) = 2/3.
This solution is sure to get me in trouble with mathematicians (they don't like it when you play loose with their maths), but it does work. To convince you that it's not just true for this specific case, let's imagine there are N doors. The probability of choosing correctly is 1/N. Monty opens one door and allows us to switch. What is the probability that the car is in one of the other available doors (instead of door #1)? Solving it both ways from above:
1. The total must be one, and I know my original probability is 1/N. The remaining probability is 1-1/N = (N-1)/N.
2. Renormalize. (1/N)/(1/(N-1)) = (N-1)/N
Renormalization is a handy tool, but you have to be careful. Running about dividing probabilities willy-nilly is sure to get you a bunch of wrong answers. It's important that you know the physical meaning behind the division that you're doing.
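
One way to keep yourself honest is to check the N-door result against a quick simulation. The sketch below is an illustration under stated assumptions, not a proof: the function names are invented, doors are numbered from 0, and Monty is assumed to open one goat door at random that is not yours.

```python
import random

def car_behind_other_door(n_doors, rng):
    """Play one game; True if the car is behind a door that is neither yours nor opened."""
    if n_doors < 3:
        raise ValueError("need at least three doors")
    car = rng.randrange(n_doors)
    choice = rng.randrange(n_doors)
    # Monty opens one goat door that is not the contestant's choice.
    goats = [d for d in range(n_doors) if d != car and d != choice]
    opened = rng.choice(goats)
    # Doors still available to switch to.
    remaining = [d for d in range(n_doors) if d != choice and d != opened]
    return car in remaining

def estimate(n_doors, trials=50_000, seed=0):
    """Estimate the probability that the car is behind one of the other doors."""
    rng = random.Random(seed)
    hits = sum(car_behind_other_door(n_doors, rng) for _ in range(trials))
    return hits / trials

for n in (3, 4, 10):
    print(n, "doors: simulated", round(estimate(n), 3), "expected", round((n - 1) / n, 3))
```

With three doors there is only one door left to switch to, so this is exactly the chance of winning by switching. With more doors, the number is the chance that the car is behind any of the remaining doors taken together, which is the quantity computed above.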


Notes
[1] Nerds are very easy to spot. They're the ones that ask "How many sides?" when you talk about dice. Everyone else assumes you're talking about 6-sided dice. After all, isn't that the only kind?