I recently came across this old video from the MindYourDecisions Youtube channel by Presh Talwalkar (which I find to often be a good source for interesting problems). At first I was sure the video was wrong. Then I thought some more and realized he was right. Finally, I am convinced that he is wrong, but about something else. This post is me shaking my angry fist at the internet and since it's about rolling a die, I figured it was close enough to being about games to count.

First, let's state the problem of the video:

*You roll a fair dice until you get a 6. What is the expected number of rolls, including the roll of 6, conditioned on the event that all previous rolls were even numbers?*

There are two curious changes from the way the problem is stated in most other places that are linked in the video. First, most statements use the singular "die" and not "dice". This is perhaps leaning on the possible internet origin post of the problem (it is attributed as originally coming from Elchanan Mossel given to undergrad students during a class at UPenn in 2014-5) which uses "dice" in the singular, though this is changed to "die" in the follow-up post. I find this choice curious because while I do see some usage of dice in the singular (perhaps more in non-American contexts like Shut Up and Sit Down - though even they've corrected dice to die) I see no argument for discarding die as the singular. That is, I've never seen any assertion that the singular "die" is incorrect. Even the dictionary used in the video says the following:

*Historically, dice is the plural of die, but in modern English dice can be both the singular and the plural: throw the dice could mean a reference to either one or more than one dice*

Note the word "can" is used, not "is". This is even more clear in the phrasing on the US version instead of the UK version of the same website:

*Historically, dice is the plural of die, but in modern English dice is sometimes also used in the singular: throw the dice could mean a reference to either one or more than one dice*

This would leave me with little support for labeling the singular "dice" as wrong, but I still don't see why someone accustomed to the singular "die" would choose to abandon it and introduce ambiguity. I'm used to the singular "die", though, and anything else just sounds weird. If you learned "dice" and are used to it, I could see it being otherwise.

Second, and perhaps more relevantly, most statements of the problem say that *all* rolls are even numbers instead of all *previous* rolls. The video makes a big distinction between these and claims them to give different answers. In fact, it claims that the answer to the "all rolls" version of the problem is the oft-given erroneous answer of 3. This is wrong, as I will try to demonstrate here. I will focus on how the two problems are the same. There are other good sources for how to approach the main problem, like the one here.

The common wrong answer of 3 comes from the idea that if we condition on rolling even numbers then that is equivalent to rolling a three-sided die with faces labeled 2, 4, and 6. However, that is not true. Instead, we can look at the problem setup to construct all sequences that end in a 6 and exclude those that contain odd numbers. Since 6 is an even number, we will see that conditioning on rolling all even numbers is the same as rolling all even numbers before the 6.

To quickly reference why the average number of rolls if we rolled a three-sided die until we got a particular side, we can look to our previous work with Memoir '44, where we found that the expected number of rolls needed to achieve a roll with probability $p$ is $1/p$. In this case the probability is $1/3$, so the expected number of rolls, $\mathbb{E}N_3$ is $1/(1/3)=3$. (The notation $N_3$ uses the subscript 3 because there are three sides to the die here.) We can find this by starting by the probability of getting a roll of length $n$,

\begin{align}
P(N_3=n) = \left ( \frac{2}{3} \right ) ^{n-1} \cdot \frac{1}{3},
\end{align}

with an often-useful shortcut to computing the expected value for non-negative integer-valued random variables,

\begin{align}
\mathbb{E} N = \sum_{n=0}^\infty P(N > n).
\end{align}

There are more details in the previous post on the toughest unit in Memoir '44.

Let's start by looking at sequences of six-sided die rolls of length 1 that end in a 6. Actually, there's just one, as shown in Table 1. There I've annotated it with a * to indicate that it meets our conditioning criteria of containing only even numbers. All $1/1$ of the sequences are allowed.

\begin{align*}
\begin{array}{c}
6 *
\end{array}
\end{align*}

Table 1: Sequences of length 1

Now, let's look at sequences of length 2, where only $2/5$ are allowed.

\begin{align*}
\begin{array}{ccccc}
16 \phantom{*} &
26 * &
36\phantom{*} &
46 * &
56\phantom{*}
\end{array}
\end{align*}

Table 2: Sequences of length 2

Finally, let's look at sequences of length 3, where $4/25$ are allowed. We'll stop here lest the number of sequences gets too cumbersome.

\begin{align*}
\begin{array}{ccccc}
116 \phantom{*} & 216 \phantom{*} & 316 \phantom{*} & 416 \phantom{*} & 516 \phantom{*} \\
126 \phantom{*} & 226 * & 326 \phantom{*} & 426 * & 526 \phantom{*} \\
136 \phantom{*} & 236 \phantom{*} & 336 \phantom{*} & 436 \phantom{*} & 536 \phantom{*} \\
146 \phantom{*} & 246 * & 346 \phantom{*} & 446 * & 546 \phantom{*} \\
156 \phantom{*} & 256 \phantom{*} & 356 \phantom{*} & 456 \phantom{*} & 556 \phantom{*}
\end{array}
\end{align*}

Table 3: Sequences of length 3

It looks like for a sequence of length $n$, then $2^{n-1}$ of the $5^{n-1}$ sequences are allowed under the conditioning. This disproportionately removes long sequences, which is why the average is so short. You can think of it this way: the longer the sequence, the more likely it is to have at least one 1,3, or 5 before conditioning. The probability of a sequence being length $n$ is the probability that the first 6 was on roll $n$. Thus, all rolls before this must be something other than 6. The probability of this happening on each roll is $5/6$, while the probability of getting a 6 in round $n$ is $1/6$. Thus, if $N_6$ is a random variable equal to the length of the sequence of rolling a six-sided die, then

\begin{align}
P(N_6=n) = \left ( \frac{5}{6} \right) ^{n-1} \cdot \frac{1}{6}.
\end{align}

However, if we only allow sequences with even numbers, then there are only two valid choices for each roll before the last, thus the individual roll probability changes from $5/6$ to $2/3$ and the sequence probability changes accordingly. With only even rolls allowed, there are three options instead of six and two of them instead of five continue the sequence to be longer. (Heads up for skimmers: this part is wrong and will be corrected later.)

\begin{align}
P(N_6=n \, | \, \text{even rolls}) &= \left ( \frac{2}{3} \right) ^{n-1} \cdot \frac{1}{6} \qquad (1)
\end{align}

Note that this probability is half of the probability $P(N_3=n)$ from above.

\begin{align}
P(N_6=n \, | \, \text{even rolls}) &= \frac{1}{2} P(N_3=n)
\end{align}

Using the definition of expected value for discrete-valued random variables,

\begin{align}
\mathbb{E} N = \sum_n n \cdot P(N=n),
\end{align}

we can find the expected value of $N_6$ given all even rolls.

\begin{align}
\mathbb{E} \left [ N_6 \, | \, \text{even rolls} \right ] &= \sum_{n=1}^\infty n \cdot P(N_6=n \, | \, \text{even rolls}) \\
&=\sum_{n=1}^\infty n \cdot \frac{1}{2} \cdot P(N_3=n) \\
&=\frac{1}{2} \cdot \underbrace{\left (\sum_{n=1}^\infty n \cdot P(N_3=n) \right )}_{\mathbb{E} N_3}\\
&= \frac{1}{2} \cdot 3 \\
&= 1.5
\end{align}

That's a little hand-wavy though, so let's take this step slower. We're interested in conditional expectation, and thus conditional probability.

\begin{align}
\mathbb{E}\left [N \, | \, A \right ] = \sum_n n \cdot P(N=n \, | \, A)
\end{align}

Here $A$ is some generic event. Recall that conditional probability is given by the following,

\begin{align}
P(N=n \, | \, A) = \frac{P( N=n \cap A)}{P(A)},
\end{align}

where $\cap$ means *and*, that is that both the thing on its left and its right are true. For an individual roll that continues the sequence (the roll doesn't equal 6), which we'll label as event $C$ for continuing, the conditional probability is as follows.

\begin{align}
P(C \, | \, \text{even roll}) &= \frac{P(C \cap \text{even roll})}{P(\text{even roll})} \\
&=\frac{\;\cfrac{2}{6}\;}{\;\cfrac{3}{6}\;} \\
&=\frac{2}{3}
\end{align}

Now we apply this to the probability of the final roll. We'll say the probability of ending is $P(E)$.

\begin{align}
P(E \, | \, \text{even roll}) &= \frac{P(E \cap \text{even roll})}{P(\text{even roll})} \\
&=\frac{\;\cfrac{1}{6}\;}{\;\cfrac{3}{6}\;} \\
&=\frac{1}{3}
\end{align}

This is not what we want. I've brought us to the wrong answer of 3! (I haven't quite gotten there, but I think you can take the next few steps.)

At this point, I think it may be useful to acknowledge one of the commenters to the video for helping me to realize what was going on. I was convinced that the answer should be the same for the "previous" case as the all evens case and didn't believe that 1.5 was the correct answer. Marce Villarino posted some python code that quickly convinced me that 1.5 must be right, but also helped me see that the two cases are actually the same. The code follows.

##### Record successive thows of a six sides die, untill a 6 is thrown.

##### If the record does not contain even number, record it's length.

import random

import time

def calcOnlySixStops(howManyTimes):

result = list()

start_time = time.time()

for i in range(howManyTimes):

a = list()

a.append(random.randint(1,6))

while a[-1] != 6:

a.append(random.randint(1,6))

if not((1 in a) or (3 in a) or (5 in a)):

result.append(len(a))

print("--- %s seconds ---" % (time.time() - start_time))

return result

lolailo = calcOnlySixStops(1000000)

print(sum(lolailo)/len(lolailo))

####### about 1.5

####### it takes ca. 43 seconds on my laptop

##### Same, but throw a three sided die.

def calcThreeSidedDie(howManyTimes):

result = list()

start_time = time.time()

for i in range(howManyTimes):

a = list()

a.append(random.choice([2,4,6]))

while a[-1] != 6:

a.append(random.choice([2,4,6]))

result.append(len(a))

print("--- %s seconds ---" % (time.time() - start_time))

return result

lolailo = calcThreeSidedDie(1000000)

print(sum(lolailo)/len(lolailo))

####### abt 3.0

####### it takes ca. 19 s on my laptop.

As you can see from the annotated results, or if you run it yourself, it shows that the expected value is approximately 1.5. It uses a Monte Carlo approach with a million trials, which seems sufficient here. Also, if you look through the algorithm to condition on the previous rolls being even, you'll see that it's exactly the same as what you would do if you were to condition on all rolls being even.

The key error that we made is the same one to get the wrong answer most of the time, I think, just a little drawn out. Note that we conditioned on individual even rolls, not all rolls being even. That's the wrong event. If we want to take that approach, we need to condition the entire $P(N_6=n)$ on getting even rolls. (We could also use the individual pieces, but we still must condition on getting *all* even rolls, not just the one under consideration at that moment.)

\begin{align}
P(N_6=n \, | \, \text{even rolls}) &= \frac{P(N_6 =n \cap \text{even rolls})}{P(\text{even rolls})}
\end{align}

The numerator is relatively easy to figure out. Based on what we saw above, we find that

\begin{align}
P(N_6 =n \cap \text{even rolls}) = \left ( \frac{2}{6} \right)^{n-1} \cdot \frac{1}{6}
\end{align}

To get this we we took the unconditioned probability of getting a certain number of rolls and combined it with the fraction of the rolls that meet our all evens criteria. But what is the probability of even rolls? Note that this isn't the probability of getting even rolls given a length, it's the overall probability. Here it may be useful to turn things around.

\begin{align}
P(\text{even rolls}) &= \sum_n P(\text{even rolls} \, | \, N_6=n) \cdot P(N_6=n) \\
&= \sum_n P(\text{even rolls} \cap N_6=n)
\end{align}

Since we already know this joint probability from above, we can plug it in.

\begin{align}
P(\text{even rolls}) &= \sum_{n=1}^\infty \left ( \frac{2}{6} \right)^{n-1} \cdot \frac{1}{6}\\
&= \frac{1}{6} \cdot \sum_{n=1}^\infty \left ( \frac{1}{3} \right)^{n-1}
\end{align}

The above includes a geometric series. We can adjust the range and the exponent to make it look more familiar

\begin{align}
P(\text{even rolls}) &= \frac{1}{6} \cdot \underbrace{\sum_{n=0}^\infty \left ( \frac{1}{3} \right)^{n}}_{\cfrac{1}{1-\cfrac{1}{3}}} \\
&= \frac{1}{6} \cdot \frac{3}{2}\\
& = \frac{1}{4}
\end{align}

Does this number make sense? At first, it may seem like too few of the sequences. But consider that on each roll, we throw out half of the results. That is, every time we roll the die half of the results are excluded.

Now we have the pieces we need and we can find the average number of rolls.

\begin{align}
P(N_6=n \, | \, \text{even rolls}) &=\frac{P(N_6 =n \cap \text{even rolls})}{P(\text{even rolls})} \\
&= \frac{\left ( \cfrac{2}{6} \right)^{n-1} \cdot \cfrac{1}{6}}{\cfrac{1}{4}} \\
P(N_6=n \, | \, \text{even rolls}) &=\left(\frac{1}{3} \right )^{n-1} \cdot \frac{2}{3} \label{eq:myd_2} \qquad (2)\end{align}

Leveraging our previous work with Memoir '44, this has the same form as asking how long it takes to get the first result, when the result occurs with probability $p=2/3$. Thus,

\begin{align}
\mathbb{E} \left [ N_6 \, | \, \text{even rolls} \right ] &= \frac{1}{2/3}\\
&=1.5
\end{align}

Note that here we found that

\begin{align}
P(N_6=n \, | \, \text{even rolls}) &= \left(\frac{1}{3} \right )^{n-1} \cdot \frac{2}{3} \end{align}

which contradicts our earlier hand-wavy equation. Which one is right? Well, I generally trust more rigorous derivations, so I'm inclined towards the later. Interestingly, they both give the correct average. Let's go back to our simulation and look at not just the average, but also the distribution of lengths given all even rolls. By adding a calculation of the percentage of sequences of each length to the above simulation, we get the results in Table 4.

\begin{align*}
\begin{array}{cccc}
\text{Sequence length} & \text{Equation (1)} & \text{Equation (2)} & \text{Estimated probability} \\ \hline
1 & 0.1667 & 0.6667 & 0.6665 \\
2 & 0.1111 & 0.2222 & 0.2226 \\
3 & 0.0741 & 0.0741 & 0.0740 \\
4 & 0.0494 & 0.0247 & 0.0241 \\
5 & 0.0329 & 0.0082 & 0.0084 \\
6 & 0.0219 & 0.0027 & 0.0029 \\
7 & 0.0146 & 0.0009 & 0.0009 \\
8 & 0.0098 & 0.0003 & 0.0004 \\
9 & 0.0065 & 0.0001 & 0.0001
\end{array}
\end{align*}
Table 4: Simulated and theoretical sequence length conditional probabilities

As we can see, this lines up well with our second, revised, equation (2). Interestingly, equation (1) looks like it should have a higher expected number of rolls, as the probability of having sequences of length 1 and 2 is lower than simulated while the probability of sequences longer than 3 is higher. An observation that is useful to recall at this point is that equation (1) gave us that $P(N_6=n \, | \, \text{even rolls})$ is half of $P(N_3=n)$. However, further reflection indicates that that doesn't make sense as the sum of the probability over the whole sample space must be 1. The probability that there is some outcome is 1.

\begin{align}
\sum_n P(N=n) = 1
\end{align}

If this is true of some function $f(n)$ such that $P(N=n)=f(n)$ it cannot be true of some other function $g(n) = f(n)/2$. That is, we should find that our incorrect equation gives the sum to be 0.5.

\begin{align}
\sum_{n=1}^\infty \left ( \frac{2}{3} \right) ^{n-1} \cdot \frac{1}{6} & = \frac{1}{6} \cdot\underbrace{\sum_{n=0}^\infty \left ( \frac{2}{3} \right) ^n}_{\cfrac{1}{1-\cfrac{2}{3}}}\\
&= \frac{1}{6} \cdot 3 \\
& = \frac{1}{2}
\end{align}

Indeed, that is the case. This is confirmation that equation (1) is incorrect.

Again, note that there is no sequence that would be included or excluded differently if we were to emphasis that we conditioned on rolling all even numbers before the 6. The two descriptions are logically equivalent and have the same expected number of rolls.