Tuesday, August 8, 2023

Will I draw an epidemic card in Pandemic? (and more)

When an epidemic card is drawn in the cooperative game Pandemic, it ramps up the difficulty of the game. Knowing when this occurs, or when it could occur, helps players achieve victory. Here, one question ends up leading to additional questions. As we learn more about a problem, the horizon of our ignorance often expands rather than contracts. However, by taking what we learn in answering one question, we're often better positioned to answer the next one.


Pandemic setup and epidemic cards

Pandemic has an unusual setup procedure for the player deck. For standard difficulty with 4 players there are 48 city cards, 5 event cards, and 5 epidemic cards, summing to 58 total cards. However, $2\cdot 4=8$ of these cards are dealt initially to the players, 2 cards to each of the 4 players. This results in there being $48+5+5-2\cdot 4=50$ cards in the player deck at the start of the game. These are not all shuffled together, though. Each of the 5 epidemic cards is combined with an equal number of the other cards to form 5 piles of 10 cards each ($50/5=10$). Each pile is shuffled separately, and the piles are then stacked on top of each other.


There is a separate infection deck containing 48 city cards that will come to our attention later. Cards are drawn from the infection deck each turn to determine where disease spreads in the game.


When an epidemic card is drawn, three things occur. First, the number of cards drawn from the infection deck each turn may increase. Second, a new city is infected with three cubes by drawing from the bottom of the infection deck (guaranteeing it has not been drawn before). Third, the infection deck discard pile is shuffled and put on top of the deck. This means currently and previously infected cities get drawn again. The next step in the game is to draw from the infection deck, where any card drawn for a city already containing three disease cubes triggers an outbreak. This causes more disease cubes to be placed in neighboring cities. Among other loss conditions, players lose if there are too many cubes placed or too many outbreaks.


Will I draw an epidemic card in Pandemic?

On each turn, 2 player cards are drawn. The probability of drawing an epidemic card on any particular turn, without knowing anything else, is $1/5=0.2$. There are at least two ways to think about this, and it's instructive to consider both. The probability of any single card drawn being an epidemic card is $1/10=0.1$; see Fig. 1.


Figure 1: 10 cards from one pile


This would be the same even if we didn't make special preparations to the deck, though perhaps we'd think of the calculation as $5/50=0.1$, as 5 of the 50 cards in the initial deck are epidemic cards. Since players draw 2 cards per turn, each initial pile of 10 cards is drawn across exactly 5 turns, as depicted in Fig. 2.  (While the piles are combined at the beginning of the game, I'll refer to the current pile as the set of cards that were in a pile before they were stacked to make the whole deck.)


Figure 2: Drawing each 10-card pile over 5 turns


Exactly 1 of those 5 pairs of cards contains an epidemic card. Thus, the probability is $1/5=0.2$. However, we should be careful about thinking this way. It's tempting to think of this probability as the sum of the probabilities of drawing an epidemic card for each of the two cards drawn on the turn. This only works here because drawing an epidemic card as the first card in a turn and drawing one as the second card are disjoint, or mutually exclusive, events: with this setup, both cards of a turn cannot be epidemic cards. The probability of drawing an epidemic card on a given turn is therefore equal to the sum of the probability of drawing it as the first card and the probability of drawing it as the second card.  (Note that this is not always true in Pandemic. For some player counts and difficulty levels, there can be an uneven number of cards in some of the player deck piles at the beginning of the game. In such a case, if the last card of one pile and the first card of the next pile are both epidemic cards, then two epidemic cards can be drawn on a single turn.)

\begin{align} P(\text{E during turn}) = P(\text{E as first card}) + P(\text{E as second card}) \end{align}

(I've abbreviated "draw epidemic" as "E" to shorten the equations.)

From here, we could complete the computation, as we know the probability of any single card in isolation being an epidemic card is 0.1. However, we want to describe these events in a way that is easy to build up with basic probability rules and doesn't feel so hand-wavy, or that at least highlights the difference compared to when the events are not disjoint. The only way to draw an epidemic card as the second card is if we didn't draw it as the first; thus that event is equivalent to not drawing an epidemic card and then drawing an epidemic card. Similarly, the only way to draw an epidemic card as the first card is to draw an epidemic card and then not draw one.

\begin{align} P(\text{E during turn}) &= P(\text{E then not E}) + P(\text{not E then E}) \end{align}

Let's rewrite this in terms of a logical and, or the intersection of events.

\begin{align} P(\text{E then not E}) &= P(\text{first E} \cap \text{second no E}) \\ P(\text{not E then E}) &= P(\text{first no E} \cap \text{second E}) \end{align}

We can break these up into sequential pieces using the multiplication rule, $P(A \cap B) = P(A|B) \cdot P(B)$, though we'll turn the order around a little bit in terms of $A$ and $B$ to write them in the order of occurrence. $P(A|B)$ is a conditional probability: the probability of $A$ given that $B$ occurs. This allows us to chain events together, as shown below.

\begin{align} P(\text{first E} \cap \text{second no E}) &= P(\text{first E}) \cdot P( \text{second no E}|\text{first E}) \\ P(\text{first no E} \cap \text{second E}) &= P(\text{first no E}) \cdot P( \text{second E}|\text{first no E}) \end{align}

Now, let's plug in the numbers. For the first draw, the probabilities are $1/10$ and $9/10$, as either 1 or 9 of the 10 cards are relevant. For the second draw under these conditions there are only 9 cards remaining, and either 9 or 1 of them are of interest.\begin{align} P(\text{first E}) &= \frac{1}{10} \\ P( \text{second no E}|\text{first E}) &= \frac{9}{9} \\ P(\text{first no E}) &= \frac{9}{10} \\ P( \text{second E}|\text{first no E}) &= \frac{1}{9} \end{align}

We can combine our equations to get the following.

\begin{multline} P(\text{E during turn}) = P(\text{first E}) \cdot P( \text{second no E}|\text{first E}) \\ + P(\text{first no E}) \cdot P( \text{second E}|\text{first no E}) \end{multline}

Now we plug in the numbers.

\begin{align} P(\text{E during turn}) &= \frac{1}{10} \cdot \frac{9}{9} + \frac{9}{10}\cdot\frac{1}{9}\\ &= \frac{1}{10} + \frac{1}{10}\\ &= 0.2 \end{align}


We can contrast this with rolling dice to see the care we should take when adding probabilities. Consider the probability of rolling a 3 on 2d10. (Two 10-sided dice numbered 1 through 10 are here referred to as 2d10.)  This could mean either exactly one 3 or at least one 3. The probability of getting a 3 on a single d10 is $1/10=0.1$, similar to our Pandemic question. However, the rolls are independent, so the outcome of one roll does not affect the next. Thus, the probability of getting exactly one 3 is $\frac{1}{10}\cdot\frac{9}{10} + \frac{9}{10}\cdot\frac{1}{10} = 0.18$, which is somewhat less than the 0.2 for the Pandemic scenario, using a similar calculation form. The probability of getting at least one 3 is computed as $1-P(\text{no 3s}) = 1 - \left (\frac{9}{10}\right )^2 = 0.19$, a higher probability, but still less than 0.2.


It is close to 0.2, though. The equation we used above generalizes to $1 - (1-p)^2$, where $p$ is the probability of the event occurring in a single one of the two trials. This equation expands to,

\begin{align} 1 - (1-p)^2 = 1 - (1 - 2p + p^2) = 2p - p^2. \end{align}

When $p$ is small, $p^2 \ll 2p$, so $2p$ is a good approximation.


The distinction here between the behavior observed with dice and cards is ripe for reflection. (It's not so much dice vs. cards, but rather independent vs. dependent. If drawing cards with replacement, the same probabilities hold as with dice.)  For the probability of rolling at least one 3, note that the second roll only increases the probability if the first roll was not a 3. This means that some of the time, specifically with probability $\frac{1}{10} \cdot \frac{1}{10} = 0.01$, the second roll does no good in terms of counting as rolling a 3. This is an additional case that does not show up for our Pandemic question. In the scenario considered, it is not possible for both cards drawn on one turn to be epidemic cards, but it is possible to get two 3s when rolling 2d10.


Note also that the division into piles of 10 cards is important to the calculation. If instead the 5 epidemic cards were simply shuffled somewhere into the 50-card deck, then the probability of drawing an epidemic card as the first card in a turn would be unchanged, but the probability for the second card would be different, depending on whether the first card was an epidemic card or not. Here we'll use E$_{50}$ to denote drawing an epidemic card with a fully shuffled player deck.

\begin{align} P(\text{first E}_{50}) &= \frac{5}{50} \\ P( \text{second no E}_{50}|\text{first E}_{50}) &= \frac{45}{49} \\ P(\text{first no E}_{50}) &= \frac{45}{50} \\ P( \text{second E}_{50}|\text{first no E}_{50}) &= \frac{5}{49} \end{align}

Since here we can also get two epidemic cards, we have an additional probability of interest.

\begin{align} P( \text{second E}_{50}|\text{first E}_{50}) &= \frac{4}{49} \end{align}

Putting this together, we get the following.

\begin{multline} P(\text{E$_{50}$ during turn}) = P(\text{first E}_{50}) \cdot P( \text{second no E}_{50}|\text{first E}_{50}) \\ + P(\text{first E}_{50}) \cdot P( \text{second E}_{50}|\text{first E}_{50}) \\ + P(\text{first no E}_{50}) \cdot P( \text{second E}_{50}|\text{first no E}_{50}) \end{multline}

This simplifies, as it doesn't matter what we draw as a second card if the first is an epidemic card. Put another way,

\begin{align} P( \text{second no E}_{50}|\text{first E}_{50})+ P( \text{second E}_{50}|\text{first E}_{50}) = 1. \end{align}

Plugging that in, we get the following.

\begin{align} P(\text{E$_{50}$ during turn}) &= P(\text{first E}_{50}) + P(\text{first no E}_{50}) \cdot P( \text{second E}_{50}|\text{first no E}_{50}) \\ &=\frac{5}{50} + \frac{45}{50} \cdot \frac{5}{49} \\ &\approx 0.192 \end{align}
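Both numbers are easy to sanity-check by simulation. Here's a minimal Monte Carlo sketch in Python (my own addition, not part of the original derivation; the trial count is an arbitrary choice):

```python
import random

TRIALS = 100_000
piled = shuffled = 0

for _ in range(TRIALS):
    # Piled setup: the first turn draws 2 cards of a 10-card pile
    # containing exactly 1 epidemic card (True = epidemic).
    pile = [True] + [False] * 9
    random.shuffle(pile)
    piled += pile[0] or pile[1]

    # Fully shuffled deck: 5 epidemic cards anywhere in 50 cards.
    deck = [True] * 5 + [False] * 45
    random.shuffle(deck)
    shuffled += deck[0] or deck[1]

print(f"piled:    {piled / TRIALS:.3f}")     # ~0.200
print(f"shuffled: {shuffled / TRIALS:.3f}")  # ~0.192
```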

An alternative way of thinking of the earlier computation is that we are separating a pile of 10 cards into two groups: the 2 cards we draw this turn and the 8 cards we draw on the other four turns. Thus the probability of drawing the one epidemic card is 2 cards out of 10 total, or 0.2. We can think of this construction as choosing where to place the epidemic card. There are $\binom{2}{1}=2$ ways to put it in the 2 cards of interest and $\binom{10}{1}=10$ total ways to put it in a particular pile.


What if we're in the middle of the game?

The single probability that we calculated is limited, though, as it doesn't match the amount of information that we have for much of the game. Once we draw cards, we know more about some of the remaining turns. We can compute the conditional probabilities, the probabilities given the knowledge we have from either drawing or not drawing epidemic cards on previous turns. For example, after not drawing an epidemic card in the first turn, the probability of drawing an epidemic card on the second turn increases, as now there are only 8 cards in the current pile, or equivalently 4 turns until the current pile is exhausted. This is similar to when we looked at the length of a game of High Society, except here we repeat through multiple piles. The conditional probability can also drop to zero. If you draw an epidemic card on the first turn, you know that you won't for the next 4 turns.


The probability of drawing an epidemic card on the second turn given that we did not on the first turn is $2/8=0.25$. Of course, the probability of drawing an epidemic card on the second turn given that we did draw one on the first turn is 0. We can extend this for the third turn ($2/6=1/3$ and $0$), fourth turn ($2/4=0.5$ and $0$), and fifth turn ($2/2=1$ and $0$). On the sixth turn we draw from the second pile, so the previous set of probabilities repeats. We can write the probability, $p_\text{not}[n]$, of drawing an epidemic card on turn $n$ given that we have not drawn one from the current pile as,

\begin{align} p_\text{not}[n] = \frac{2}{10 - 2\cdot ( (n-1)\mod 5 )}, \end{align}

where $a \mod b$, or $a$ modulo $b$, is the remainder after dividing $a$ by $b$. This forms the desired repeating pattern. The above equation could be rewritten in a simplified form, but it is based on the number of cards, which are the physical counts that give rise to the probabilities. The $(n-1) \mod 5$ term represents the number of turns where cards were already drawn from the current pile. These probabilities are plotted in Fig. 3 along with the probability given we already drew the current epidemic card (if possible) as well as the unconditional probability.  (The probability is set to 0.2 for the first turn of each pile to emphasize that the probability here is always 0.2.)
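The formula is easy to tabulate as a double-check against the figure. A small sketch, with `p_not` mirroring the equation above:

```python
def p_not(n):
    """P(epidemic on turn n | none drawn yet from the current pile)."""
    cards_left = 10 - 2 * ((n - 1) % 5)  # cards remaining in the current pile
    return 2 / cards_left

for n in range(1, 11):
    print(n, p_not(n))  # 0.2, 0.25, 0.333..., 0.5, 1.0, then repeats
```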

Figure 3: Probability of drawing an epidemic card in Pandemic conditioned on whether one was drawn from the current pile

The plots make the periodic nature quite apparent, even aside from the logical necessity given the setup.


How many turns are there between epidemic cards?

This leads us to ask about the distribution of the number of turns between epidemic cards, as this can impact the game in notable ways. Do we get new heavily infected cities back to back, or a long gap to take care of the current situation? To find the distribution, we construct five random variables, $X_1$ through $X_5$, for the five turns on which the epidemic cards are drawn, and then four additional random variables, $Y_2$ through $Y_5$, that represent the differences between the first five. $Y_k$ is the number of turns between the $(k-1)$-th and the $k$-th epidemic cards.

\begin{align} Y_k = X_{k} - X_{k-1} \end{align}


The first five are uniform random variables, as each possible value occurs with equal probability. That is, we know the first epidemic card has a probability of 0.2 of being drawn on each of the first five turns. Thus, we can write the probability mass function (pmf) of the random variable $X_1$, the turn on which the first epidemic card is drawn, as follows.

\begin{align} p_{X_1}(x) = \begin{cases} 0.2 & x \in \{1, 2, 3, 4, 5\} \\ 0 & \text{else} \end{cases} \end{align}

The pmfs of the random variables for the other turns in which an epidemic card is drawn are similar.

\begin{align} p_{X_k}(x) = \begin{cases} 0.2 & x \in [5 (k-1)+1, 5 k] \\ 0 & \text{else} \end{cases} \end{align}

These pmfs are plotted in Fig. 4.

Figure 4: Probability mass functions of the turn in which each epidemic card is drawn


When discussing D&D ability scores, we introduced convolution as a mathematical operation to determine the pmf of the sum of two random variables from their pmfs.  (Convolutions have other uses as well.) We can use convolution here to find the pmfs of the differences $Y_k$. While $Y_k$ are equal to the difference of known random variables, rather than the sum, the same principle applies. We can see this by first transforming $X_k$ into negative versions, $X'_k = -X_k$. We know the pmfs.

\begin{align} p_{X'_k}(x) &= p_{X_k}(-x) \\ &= \begin{cases} 0.2 & x \in [-5 k, -5 (k-1)-1] \\ 0 & \text{else} \end{cases} \end{align}

With this, we can write $Y_k$ in terms of the sum of random variables with known pmfs.

\begin{align} Y_k = X_{k} + X'_{k-1} \end{align}

And thus state the pmfs in terms of convolution.

\begin{align} p_{Y_k}(y) = p_{X_{k}}(y) * p_{X'_{k-1}}(y) \end{align}

Since $X_k$ and $X'_k$ have uniform distributions that have a rectangular shape when considered over all $x \in \mathbb{Z}$ (the set of all integers), the differences $Y_k$ have a triangular distribution from convolution. This is the same as how the sum of 2d6 has a triangular distribution stemming from the rectangular pmfs of each individual d6. We could go through the motions, but this is a general truth when the rectangular pmfs have equal length, that is, when they are non-zero over ranges of the same size. Otherwise, there is a flat portion in the middle. It is worth considering something like 1d6+1d8 to see this effect. If we take the shape as given, then we can find the relevant constants.


We can argue from the situation the maximum and minimum possible values of $Y_k$. It has a minimum value of 1, which occurs when the epidemic card in one pile is drawn as late as possible, while the one in the next pile is drawn as early as possible. It has a maximum value equal to $10-1=9$. Here I'm looking at the latest possible turn the second epidemic card can be drawn (turn 10) and the earliest the first can be (turn 1). It is the same for all $Y_k$. If we're familiar with convolution, we'd also know that the length of the convolution of two finite sequences with lengths $n$ and $m$ is $n+m-1$, which in this case is $5+5-1=9$. This necessarily follows from the convolution definition.

\begin{align} p_{X_{k}}(y) * p_{X'_{k-1}}(y) &= \sum_{n=-\infty}^\infty p_{X_{k}}(y-n) \cdot p_{X'_{k-1}}(n) \\ &= \sum_{n=-\infty}^\infty p_{X_{k}}(n) \cdot p_{X'_{k-1}}(y-n) \label{eq:pandemic_convolve} \end{align}

There is some symmetry in that the two random variables being added have equal values over ranges of the same size. Combined with the symmetry of convolution itself, that $f(x) * g(x) = g(x) * f(x)$, the resulting convolution must be symmetric about some point. This must be the mid-point between 1 and 9, which is 5. This makes a lot of sense: since each pile spans five turns of card draws, it is most likely that consecutive epidemic cards are five turns apart. We could find the probabilities by combining the known shape with the fact that the pmf must sum to 1 to be valid, but it's easier to just use a computer to convolve some vectors of copies of 0.2, as sketched below.
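For instance, with numpy the whole pmf falls out of one call (a sketch; because the uniform pmf is symmetric, convolving it with itself gives the same shape as convolving with the flipped version, and the only bookkeeping is that output entry $i$ corresponds to a difference of $y = i+1$):

```python
import numpy as np

p_x = np.full(5, 0.2)        # uniform pmf of X_k over its 5 possible turns
p_y = np.convolve(p_x, p_x)  # triangular pmf of Y_k; entry i is y = i + 1

for y, p in enumerate(p_y, start=1):
    print(f"p_Y({y}) = {p:.2f}")
```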


We can use the visual method of computing convolutions, illustrated in Fig. 5 for $p_{Y_2}(y) = p_{X_{2}}(y) * p_{X'_{1}}(y)$. Following an earlier equation, we plot $p_{X_2}(n)$, a version of $p_{X'_1}(n)$ that is horizontally flipped and shifted by $y$, their product, and the running sum, across several values of $y$. To save space, not all values of $y$ are plotted.

Figure 5: Illustrating convolution to find pmf of $Y_k$

More numerically, we can look at the peak value of $p_{Y_k}(y)$. This occurs at $y=5$ and corresponds to when the flipped and shifted $p_{X_{k}}(n)$ is exactly on top of $p_{X'_{k-1}}(n)$. Thus, the convolution yields $p_{Y_k}(5) = 5 \cdot 0.2^2 = 0.2$. This is enough to construct the pmf shown in Fig. 6 with the arguments made earlier.

Figure 6: Probability mass function of the number of turns from one epidemic card to the next


While the $X_k$ are independent, as shuffling one pile does not affect the next, the $Y_k$ are not. The dependence is quite clear, as $Y_2 = X_2 - X_1$ and $Y_3 = X_3 - X_2$ both depend on $X_2$. If the second epidemic card comes early, the first interval is short while the second interval tends to be long. We could capture this dependence by looking at the joint distribution of all $Y_k$. While some pairs are independent, for example the first interval, $Y_2$, and the last, $Y_5$, they all connect via other intervals. However, this looks to be a tedious and uninspiring calculation. We know the main gist of the dependence and leave it at that for now.


What is the probability of drawing the newly infected city after an epidemic card?

One of the things that can happen after an epidemic card is drawn is that the newly infected city is drawn from the infection deck. The probability of this occurring is a function of how many cards are in the discard pile of the infection deck, which in turn depends on how many turns are between epidemic cards (or since the beginning of the game).


First, let's ask an easier question by looking at just after the first epidemic card. We start with 9 cards in the infection deck discard pile. Then we get 2 more per turn until the epidemic card is drawn. This means we get 0, 2, 4, 6, or 8 more infection cards (epidemic cards are in the player deck, which is drawn from before drawing from the infection deck). One more card is added from the epidemic card itself (from the bottom of the infection deck). All these cards get shuffled and placed atop the infection deck. Thus, there are 10 to 18 cards among which the newly infected city card is shuffled. After the first epidemic card, we're still drawing two infection cards per turn, so the probability of drawing the newly infected city is 2 divided by the number of cards that were shuffled and put on top. This repeats the same concept we used to find the probability of drawing an epidemic card in the first place: we're drawing from a deck without replacement. In the equations, I'll abbreviate "outbreak in newly infected city immediately after the first epidemic card" as "O from E1".

\begin{align} P(\text{O from E1} | \text{E on turn 1}) &= \frac{2}{10} \\ P(\text{O from E1} | \text{E on turn 2}) &= \frac{2}{12} \\ P(\text{O from E1} | \text{E on turn 3}) &= \frac{2}{14} \\ P(\text{O from E1} | \text{E on turn 4}) &= \frac{2}{16} \\ P(\text{O from E1} | \text{E on turn 5}) &= \frac{2}{18} \end{align}


As we found before, the 5 possibilities are equally likely, so the total probability is as follows.

\begin{align} P(\text{O from E1}) &= \frac{1}{5}\cdot\frac{2}{10} + \frac{1}{5}\cdot\frac{2}{12} + \frac{1}{5}\cdot\frac{2}{14} + \frac{1}{5}\cdot\frac{2}{16} + \frac{1}{5}\cdot\frac{2}{18} \\ &\approx 0.149 \end{align}

This is the probability of this happening before we know anything about how the game goes. Obviously, once the epidemic card is actually drawn, then the probability is $\frac{1}{10}$ or $\frac{1}{14}$ or one of the other possibilities depending on how many cards are actually in the infection deck discard pile. This probability speaks to how likely it is that the deck is stacked against us.


Now, let's look at the probability of this happening after the second epidemic card. Here, we can use the distribution of the number of turns from one epidemic card to the next, in this case $Y_2$. Through this time, we're still drawing 2 infection cards per turn, so the number of cards in the infection deck discard pile is $1+2Y_2$: one from the epidemic card, and two for each turn since the last epidemic card. Using the same logic as above, we find the probability of drawing the newly infected city conditioned on the value of $Y_2$.

\begin{align} P(\text{O from E2} | Y_2 = y) &= \frac{2}{1+2y} \end{align}

To get the total probability, we need to sum over all possible $Y_2$.

\begin{align} P(\text{O from E2}) &= \sum_{y=1}^9 \frac{2}{1+2y}\cdot p_{Y_2}(y) \\ &\approx 0.219 \end{align}
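Both of these values are quick to verify numerically. A sketch, reusing the convolution idea from earlier for the pmf of $Y_2$:

```python
import numpy as np

p_y = np.convolve(np.full(5, 0.2), np.full(5, 0.2))  # p_Y(y) for y = 1..9

# First epidemic: 9 setup cards + 2 per completed turn + 1 from the epidemic.
p_e1 = sum(0.2 * 2 / (10 + 2 * (t - 1)) for t in range(1, 6))

# Second epidemic: 1 + 2y discarded cards, weighted by the pmf of Y_2.
p_e2 = sum(2 / (1 + 2 * y) * p_y[y - 1] for y in range(1, 10))

print(round(p_e1, 3), round(p_e2, 3))  # 0.149 0.219
```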


Getting to the third epidemic card gets interesting, because now we start to draw more infection cards per turn. We'll define $r_k$ to be the rate of infection after the $k$-th epidemic card is drawn, with $r_0$ being the rate at the start of the game. These rates are shown in Table 1.

\begin{array}{ccccccc} r_0 & r_1  & r_2  & r_3  & r_4  & r_5  & r_6 \\ \hline 2 & 2 & 2 & 3 & 3 & 4 & 4 \end{array} Table 1: Infection rate, $r_k$, after the $k$-th epidemic card

Note that this includes $r_6$, which is only relevant for the heroic difficulty level. In the case of the third epidemic card, we were still drawing 2 infection cards per turn until the epidemic card was drawn, so the new rate of three infection cards per turn only affects the probability of drawing the newly infected city, not the number of cards it is shuffled among.

\begin{align} P(\text{O from E3}) &= \sum_{y=1}^9 \frac{3}{1+2y}\cdot p_{Y_3}(y) \\ &= \frac{3}{2} \cdot P(\text{O from E2})\\ &\approx 0.329 \end{align}

Note that the pmfs of all four differences are the same, so $p_{Y_2}(y) = p_{Y_3}(y)$.


For subsequent epidemic cards, though, the higher rate of infection card draw changes the number of cards in the infection card discard pile in which the newly infected city is shuffled. For example, for the fourth epidemic card, there are now $1+3Y_4$ cards, three for each of the $Y_4$ turns from the third epidemic card to the fourth. In general, there are $1+r_{k-1} \cdot Y_k$ cards in the infection deck discard pile after the $k$-th epidemic card. Thus, we can generalize our equation.

\begin{align} P(\text{O from E$k$}) &= \sum_{y=1}^9 \frac{r_k}{1+r_{k-1}\cdot y}\cdot p_{Y_k}(y) \qquad k \geq 2 \label{eq:p_outbreak_k} \end{align}

One thing that's worth checking is that we don't create nonsensical probabilities. In this case, I'm concerned that our generalized formula could erroneously give probabilities above 1 in the $ \frac{r_k}{1+r_{k-1}\cdot y}$ term for small $y$. However, since $y \geq 1$ and $r_k \leq r_{k-1} + 1$, then this correctly gives a probability of 1 when epidemic cards are drawn back to back and the infection rate increases by one.


\begin{array}{ccc} k & r_k & P(\text{O from E$k$}) \\ \hline 1 & 2 & 0.149 \\ 2 & 2 & 0.219\\ 3 & 3 & 0.329\\ 4 & 3 & 0.230\\ 5 & 4 & 0.307 \end{array} Table 2: Probability of an outbreak in the newly infected city after the $k$-th epidemic card


The computed approximate probabilities are listed in Table 2. It is interesting to see how these probabilities change over the course of the game. The first we would expect to be lower, as the beginning of the game is seeded with an additional 9 cards, while after the later epidemic cards it is possible to have only a few cards.  We also see the probabilities jump when the infection rate increases. This makes sense, as the number of cards drawn immediately after an epidemic card is relatively high. What may not be clear is why $P(\text{O from E$4$})$ is higher than $P(\text{O from E$2$})$ while $P(\text{O from E$5$})$ is lower than $P(\text{O from E$3$})$. The effect of more cards being drawn overall has a different effect when $r_k = r_{k-1}$ compared to when $r_k = r_{k-1}+1$. Looking at the denominator $1+r_{k-1}\cdot y$ again indicates that as $r_{k-1}$ increases, the constant 1 card from the epidemic card has a smaller effect, making the probability increase. That explains why $P(\text{O from E$4$})$ is higher than $P(\text{O from E$2$})$. In the other direction, as $r_{k-1}$ increases, having $r_k = r_{k-1} + 1$ in the numerator makes a smaller increase over $r_k = r_{k-1}$. In the limit there is no increase.

\begin{align} \lim_{r_{k-1} \to \infty} \frac{r_{k-1}+1}{1+r_{k-1} \cdot y} = \frac{1}{y} \end{align}

That explains why $P(\text{O from E$5$})$ is lower than $P(\text{O from E$3$})$.
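The whole of Table 2 can be reproduced from the generalized equation. Here's a sketch, with the first epidemic card handled as its own case since it involves the 9 seeded discard cards:

```python
import numpy as np

p_y = np.convolve(np.full(5, 0.2), np.full(5, 0.2))  # p_Y(y) for y = 1..9
r = [2, 2, 2, 3, 3, 4]                               # r_0 through r_5

probs = {1: sum(0.2 * 2 / (8 + 2 * x) for x in range(1, 6))}
for k in range(2, 6):
    probs[k] = sum(r[k] / (1 + r[k - 1] * y) * p_y[y - 1]
                   for y in range(1, 10))

for k, p in probs.items():
    print(f"P(O from E{k}) = {p:.3f}")  # 0.149, 0.219, 0.329, 0.230, 0.307
```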


Looking at the probabilities of getting an outbreak in the newly infected city after each epidemic card, $P(\text{O from E$k$})$, it is clear that the total probability of at least one such outbreak is not just the sum, as that would be well above 1, which is the maximum valid value for any probability. Simply adding would require the events to be mutually exclusive, which they are not. Nor are they independent, as they are based on the numbers of turns from one epidemic card to the next, which themselves are not independent. If we had the joint probability distribution then we could calculate the total probability. That is not simple to come by here, so we'll take two approaches to get what we want.


One approach is to set aside the probability of getting such an outbreak and instead look at the expected number of such outbreaks. This simplifies our calculations, because the expected value of a sum of random variables is equal to the sum of their expected values, even if the random variables are dependent. To see why, let's just consider two of our random variables, $Y_2$ and $Y_3$.

\begin{align} \mathbb{E} (Y_2 + Y_3) &= \sum_{y_2=-\infty}^\infty\sum_{y_3=-\infty}^\infty (y_2 + y_3) \cdot p_{Y_2,Y_3}(y_2, y_3) \end{align}

We can break this into two summations.

\begin{align} \mathbb{E} (Y_2 + Y_3) &= \sum_{y_2=-\infty}^\infty\sum_{y_3=-\infty}^\infty y_2 \cdot p_{Y_2,Y_3}(y_2, y_3) + \sum_{y_2=-\infty}^\infty\sum_{y_3=-\infty}^\infty y_3 \cdot p_{Y_2,Y_3}(y_2, y_3) \end{align}

Now we reorder the second double summation and pull out $y_2$ in first and $y_3$ in the second.

\begin{align} \mathbb{E} (Y_2 + Y_3)&= \sum_{y_2=-\infty}^\infty y_2 \cdot \sum_{y_3=-\infty}^\infty p_{Y_2,Y_3}(y_2, y_3) + \sum_{y_3=-\infty}^\infty y_3 \cdot \sum_{y_2=-\infty}^\infty p_{Y_2,Y_3}(y_2, y_3) \end{align}

The marginal pmfs, which are the individual pmfs, are useful here. These are obtained by summing the joint pmf over all possible values of one random variable, to get the distribution of a single one. For example,

\begin{align} p_{Y_2}(y_2) = \sum_{y_3=-\infty}^\infty p_{Y_2,Y_3}(y_2, y_3). \end{align}

Now we replace our equation for $\mathbb{E} (Y_2 + Y_3)$ with the marginal pmfs.

\begin{align} \mathbb{E} (Y_2 + Y_3)&= \sum_{y_2=-\infty}^\infty y_2 \cdot p_{Y_2}(y_2) + \sum_{y_3=-\infty}^\infty y_3 \cdot p_{Y_3}(y_3)\\ \mathbb{E} (Y_2 + Y_3) &= \mathbb{E}(Y_2) + \mathbb{E}(Y_3) \end{align}

This is not the expected value we're looking for, but it shows that it's valid to sum the expected values of random variables even when they are dependent.


We need to define more random variables to get the expected number of outbreaks of newly infected cities. The total number of such outbreaks is $O$, while the number after the $k$-th epidemic card is $O_k$. These are related by,

\begin{align} O = \sum_{k=1}^5 O_k \end{align}

Since after each epidemic card there can be either 0 or 1 outbreaks in the newly infected city, the expected number of outbreaks is equal to the probability of such an outbreak.

\begin{align} \mathbb{E}O_k &= 0 \cdot (1 - P(\text{O from E$k$})) + 1 \cdot P(\text{O from E$k$}) \\ \mathbb{E}O_k &=P(\text{O from E$k$}) \end{align}

Thus the total number of expected such outbreaks in an entire game is equal to the sum of these probabilities.

\begin{align} \mathbb{E}O&= \sum_{k=1}^5 P(\text{O from E$k$})\\ \mathbb{E}O&= 1.234 \end{align}

This sum seems surprisingly high. It doesn't feel like this happens that often in my games. What could explain the surprise? One possibility is that because it is apparent when this is more likely to happen, players choose to use event cards that can avoid such scenarios at the proper times. Another possibility is that I've made some computation error. We'll check this next.


Our second approach is to rewrite our probabilities in terms of $X_k$ instead of $Y_k$, because $X_k$ are independent. We'll use $Y_k = X_k - X_{k-1}$ to rewrite an earlier equation.

\begin{align} P(\text{O from E$k$}) &= \sum_{x_{k-1}} \sum_{x_{k}} \frac{r_k}{1+r_{k-1}\cdot (x_k-x_{k-1})}\cdot p_{X_k,X_{k-1}}(x_k,x_{k-1}) \qquad k \geq 2 \end{align}

Because $X_k$ are independent, the joint pmf is equal to the product of the pmfs.

\begin{align} p_{X_1,X_2,X_3,X_4,X_5} (x_1,x_2,x_3,x_4,x_5) &= p_{X_1}(x_1) p_{X_2}(x_2) p_{X_3}(x_3) p_{X_4}(x_4) p_{X_5}(x_5) \end{align}

This has a constant value when non-zero.

\begin{align} p_{X_1,X_2,X_3,X_4,X_5}&= \frac{1}{5} \cdot \frac{1}{5} \cdot \frac{1}{5} \cdot \frac{1}{5} \cdot \frac{1}{5} \\ &= \frac{1}{5^5} = \frac{1}{3125}\\ &=0.00032 \end{align}

The conditional probabilities are useful here.

\begin{align} P(\text{O from E$k$}|X_k=x_k,X_{k-1}=x_{k-1}) &= \frac{r_k}{1+r_{k-1}\cdot (x_k-x_{k-1})} \qquad k \geq 2 \end{align}

And the conditional pmfs of $O_k$.

\begin{align} p_{O_1|X_1=x_1}(o) &= \begin{cases} 1 - \cfrac{2}{8+2x_1} & o = 0\\ \cfrac{2}{8+2x_1} & o = 1 \end{cases}\\ p_{O_k|X_k=x_k,X_{k-1}=x_{k-1}}(o) &= \begin{cases} 1 - \cfrac{r_k}{1+r_{k-1}\cdot (x_k-x_{k-1})} & o = 0\\ \cfrac{r_k}{1+r_{k-1}\cdot (x_k-x_{k-1})} & o = 1 \end{cases} \end{align}

We combine these to get the total number of such outbreaks, $O$, conditioned on $X_1$ through $X_5$ and use convolution to get the resulting conditional pmf.

\begin{align} p_{O|\underline{X}=\underline{x}}(o) = p_{O_1|\underline{X}=\underline{x}}(o) * p_{O_2|\underline{X}=\underline{x}}(o) * p_{O_3|\underline{X}=\underline{x}}(o) * p_{O_4|\underline{X}=\underline{x}}(o) * p_{O_5|\underline{X}=\underline{x}}(o) \end{align}

Here $\underline{X}$ is a vector representing all of $X_1$ through $X_5$, and similarly for $\underline{x}$; both are used for brevity. Note that conditioning on $X_k$ that do not influence $O_k$ has no effect on the corresponding conditional pmf. Then we can use the law of total probability to sum across all $\underline{X}$.

\begin{align} p_O(o) &= \sum_{x_1=1}^5 \sum_{x_2=6}^{10} \sum_{x_3=11}^{15} \sum_{x_4=16}^{20} \sum_{x_5=21}^{25} p_{O|\underline{X}=\underline{x}}(o) \cdot p_{\underline{X}} (\underline{x}) \\ &= \sum_{x_1=1}^5 \sum_{x_2=6}^{10} \sum_{x_3=11}^{15} \sum_{x_4=16}^{20} \sum_{x_5=21}^{25} p_{O|\underline{X}=\underline{x}}(o) \cdot 0.00032 \end{align}

This is sort of halfway towards an exhaustive check, as we just sum over all $5^5=3125$ possible locations of the 5 epidemic cards. As such, this is obviously a calculation made to be done by a computer. Note also, though, that we are making use of the probability equations for drawing the newly infected city and don't have to count through all those cases too. The resulting distribution is plotted in Fig. 7. The expected value, $\mathbb{E}O \approx 1.234$, matches our earlier calculation, which adds to our confidence that we are correct.
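A sketch of that computation: loop over all 3125 placements, convolve the five conditional Bernoulli pmfs, and weight each placement equally:

```python
import itertools
import numpy as np

r = [2, 2, 2, 3, 3, 4]       # infection rates r_0 through r_5
p_O = np.zeros(6)            # O ranges from 0 to 5 such outbreaks

turn_ranges = [range(5 * k + 1, 5 * k + 6) for k in range(5)]
for xs in itertools.product(*turn_ranges):
    q = 2 / (8 + 2 * xs[0])               # P(O_1 = 1 | X_1 = xs[0])
    pmf = np.array([1 - q, q])
    for k in range(2, 6):                 # fold in O_2 through O_5
        q = r[k] / (1 + r[k - 1] * (xs[k - 1] - xs[k - 2]))
        pmf = np.convolve(pmf, [1 - q, q])
    p_O += pmf / 5**5                     # each placement has probability 1/3125

print(p_O)                   # pmf of O, as in Fig. 7
print(p_O @ np.arange(6))    # expected value, ~1.234
```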


Figure 7: Probability mass function of the number of outbreaks of newly infected cities immediately after an epidemic card


Further avenues for exploration

We started off with a relatively simple question with a relatively simple answer, that the probability of drawing an epidemic card on any particular turn is 0.2, but ended up looking at quite a bit more. We also ended up with an unexpected question and answer: it is more likely than not that there is at least one outbreak that players cannot avoid.  (Cannot avoid without special abilities or event cards, that is.)  Before getting into the conditional case, I had thought that autocorrelation and power spectral density would be useful concepts, but they ended up not really fitting in because of the finite, and rather small, number of turns in Pandemic. We could go on to compute the joint distribution of $Y_k$, but it wasn't necessary for what we were interested in, and with so many variables it's not immediately apparent how to visualize it. An interesting question I'll leave unanswered, but which is perhaps the next in a sequence to tackle, is this: what is the distribution of the total number of cards drawn from the infection deck?

Monday, May 9, 2022

How long will a siege last in War of the Ring?

For simplicity, we'll assume that no combat cards are played, as that complicates things too much for an initial look. 

Our first step is to compute how many units get removed in a round of combat. Without leaders, this is pretty clear: the number of hits follows a binomial distribution with $n$ equal to the number of combat dice (max 5) and $p = 2/6$ for the defender (hits on 5 and 6) or $p = 1/6$ for the attacker (hits only on 6). Let $X_n$ be a random variable equal to the number of hits generated by an attack with $n$ combat dice. 
\begin{align} P(X_n=x) = \binom{n}{x}\cdot p^x \cdot (1-p)^{(n-x)} \end{align} With as many leaders as units (or 5 leadership in the case of larger armies), we can use the same approach. Because any die that misses the combat roll will get rerolled in the leader reroll, we can compute, for each die, the combined probability, $p_l$, of hitting on either the combat roll or the leader reroll. 
\begin{align} p_l &= p + (1-p)\cdot p \\ &= 2\cdot p-p^2 \end{align} It may be handy to look at the probability of a miss in this case also. \begin{align} 1 - p_l &= 1 - 2\cdot p + p^2 \\ &= (1-p)^2 \end{align} Let $X_n'$ be a random variable equal to the number of hits generated by an attack with $n$ combat dice and at least $n$ leadership. This is the same idea as $X_n$, but with the per-die hit probability $p_l$. \begin{align} P(X_n' = x) &= \binom{n}{x}\cdot {p_l}^x \cdot (1-p_l)^{(n-x)} \\ &= \binom{n}{x}\cdot \left ( 2\cdot p-p^2 \right )^x \cdot (1-p)^{2(n-x)} \end{align} When the leadership and combat strength do not match, it gets more tricky. It's tempting to think that we can just add two cases. That is, it may seem that 5 combat strength with 3 leadership is equivalent to adding the hits generated by 3 combat strength with 3 leadership and 2 combat strength with 0 leadership. However, this is not the case, because in the split version, if the 3 dice with leadership hit and the 2 dice without leadership miss, then there is no leadership reroll. In the actual case of 5 units with 3 leadership rolling the same results, if any two dice miss then they get rerolled. Thus, we'd need to approach it differently. For simplicity, we'll set this case aside and focus on the extremes of no leadership and maximum leadership. 
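A minimal sketch of these two pmfs in Python (the function name and structure are mine, not from the game or any library):

```python
from math import comb

def hit_pmf(n, p, reroll=False):
    """pmf of hits with n combat dice; reroll=True assumes at least n
    leadership, so each die hits with p_l = 2p - p^2."""
    if reroll:
        p = 2 * p - p * p
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# Attacker besieging a fortified position hits only on a 6.
print(hit_pmf(5, 1 / 6))
print(hit_pmf(5, 1 / 6, reroll=True))
```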

Using the equations we derived, we can compute the probabilities of getting different numbers of hits for a variety of situations. There are three variables we'll consider: the number of combat dice rolled, whether leaders are present, and whether the units are attacking a fortified position, meaning either a stronghold or a city or fortification in the first round. The next several figures plot the probabilities in a couple different ways. 

Figure 1: PMF of hits in War of the Ring when attacking with 5 units


Figure 2: PMF of hits in War of the Ring when attacking a fortified position with leadership

Figure 3: PMF of hits in War of the Ring when attacking an unfortified position with leadership

Figure 4: PMF of hits in War of the Ring when attacking a fortified position without leadership

Figure 5: PMF of hits in War of the Ring when attacking an unfortified position without leadership

Our next step is to find how many rounds of combat a siege is expected to last. We'll use three techniques to quantify this and compare. Throughout, we'll use a few common assumptions. While the number of defending units may vary, being able to take a total of $d$ hits, we fix the number of attack dice at five, either with or without leadership. We thus remove the possibility that the siege goes well enough for the defender that the attackers are whittled down or eliminated. 

First, we'll estimate it based on how many individual dice it takes to eliminate the defending units, similar to when we looked at What's the toughest unit in Memoir '44?. The limitation here is that when moving from rolling one die at a time to five dice at a time there's different quantization occurring. Needing to roll one through five dice is all just one round of combat, but moving from five dice to six dice means a second round even though it's only one die. The averaging that occurs over dice and rounds can have a notable impact. We'll see when we compare. Each die eliminates one hit point of defending units with probability $p$ (or $p_l$ with leadership). The expected number of dice to generate this one hit is $1/p$. Thus, to generate $d$ hits we can estimate $d/p$ dice on average. Since five dice are rolled each round, this means there should be about $d/(5p)$ rounds on average. 

For the next two methods, we'll make use of a Markov chain, as we did when checking if it was possible an event was never drawn in a previous problem. We'll start with an initial state column vector $\mathbf{s_d}$ that indicates the number, $d$, of hit points of the defenders in the siege. The vector is zero everywhere, except in index $d$, where it has a value of 1 to indicate that the initial probability of there being $d$ defending hit points is 1. (I'll use indexes starting at 0, instead of 1 as commonly done, as it lines up nicely with programming as well as the meaning in this problem.)  For example, with 10 defending hit points, \begin{align} \mathbf{s_{10}}^T & = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} , \end{align} where $\mathbf{s_d}^T$ is the transpose of $\mathbf{s_d}$, shown to fit on the page more easily. Note the 1 would be in the bottom position of the column vector. Each position in the vector corresponds to the 11 total states: 0-10 defending hit points of units. 

We can multiply this by a transition matrix $\mathbf{P}$. \begin{align} \mathbf{P} = \begin{bmatrix} p_{0,0} & p_{0,1} & \dots & p_{0,10}\\ p_{1,0} & p_{1,1} \\ \vdots & & \ddots \\ p_{10,0} & p_{10,1} & \ldots & p_{10,10} \end{bmatrix} \end{align} An element of $\mathbf{P}$, $p_{i,j}$, is the probability of transitioning from state $j$ to state $i$, in line with matrix multiplication. This is a left stochastic matrix, where each column sums to one. You may find this less common than right stochastic matrices, where each row sums to one. This means that the matrices and equations here may need to be transposed to compare to other sources. The key is whether $p_{i,j}$ is as used here or is the probability of transitioning from state $i$ to state $j$. The left and right names have to do with whether the stochastic matrix should multiply state probability vectors from the left or the right. 

To fill in the values of this matrix, let's work with the probabilities of getting a certain number of hits by the attacker, which we assume always rolls 5 dice for simplicity. The probability $p_h$ is the probability of getting $h$ hits. Thus we have six probabilities of interest here: $p_0$, $p_1$, $p_2$, $p_3$, $p_4$, and $p_5$, where $p_h = P(X_5 = h)$ as found earlier. We construct $\mathbf{P}$ from these individual probabilities. Each column represents the probabilities of moving from one state to the others. Thus, the values of each column must sum to 1, as the probability of ending up in one of the states is 1 and all states are under consideration here. Further, since there is no way to add defending hit points during a battle, we know that many of the entries are zero. \begin{align} p_{i,j} = 0 \qquad i > j \end{align} We can't transition from state $j$, which has $j$ defending hit points, to a state $i$, which has $i$ defending hit points, where $i>j$. 

We'll go through the matrix one column at a time to build it up. The first column, representing what happens when there are already no defending units, is fairly simple, as we must stay in that state. Thus, the transition probabilities are either 0 or 1. \begin{align} p_{i,0} = \begin{cases} 1 &\qquad i = 0 \\ 0 &\qquad i \neq 0 \end{cases} \end{align} Specifically, the probability of staying in state 0 is 1 and the probability of transitioning to any other state is 0. 

In the next column, we start with one defending hit point and there are two possibilities: if no hits are rolled, then stay in the same state, but if any hits are rolled, then move to state 0. \begin{align} p_{i,1} = \begin{cases} \sum_{h=1}^5 p_h = (1-p_0) &\qquad i = 0 \\ p_0 & \qquad i = 1\\ 0 &\qquad i > 1 \end{cases} \end{align} The next several columns follow this pattern. \begin{align} p_{i,2} = \begin{cases} \sum_{h=2}^5 p_h &\qquad i = 0 \\ p_1 & \qquad i = 1\\ p_0 & \qquad i = 2\\ 0 &\qquad i > 2 \end{cases} \end{align} Eventually, no summation is necessary, and beyond that the probability of reaching state 0 in the single transition of one round of combat becomes 0, as it would require more than the maximum 5 hits. \begin{align} p_{i,6} = \begin{cases} 0 &\qquad i = 0 \\ p_5 & \qquad i = 1\\ p_4 & \qquad i = 2\\ p_3 & \qquad i = 3\\ p_2 & \qquad i = 4\\ p_1 & \qquad i = 5\\ p_0 & \qquad i = 6\\ 0 &\qquad i > 6 \end{cases} \end{align} We can also generalize this. \begin{align} p_{i,j} = \begin{cases} p_5 & \qquad i = j-5\\ p_4 & \qquad i = j-4\\ p_3 & \qquad i = j-3\\ p_2 & \qquad i = j-2\\ p_1 & \qquad i = j-1\\ p_0 & \qquad i = j\\ 0 &\qquad i < j-5, i > j \end{cases} \qquad j \geq 5 \end{align} Multiplying some probability state vector, such as $\mathbf{s_d}$, by $\mathbf{P}$ gives the updated probabilities of being in each state after one round of combat. \begin{align} \mathbf{s_{new}} = \mathbf{P} \times \mathbf{s_{old}} \end{align} We can repeat this to get the probabilities after multiple rounds of combat. \begin{align} \mathbf{s_{r=2}} &= \mathbf{P} \times \mathbf{s_{r=1}}\\ \mathbf{s_{r=2}} &= \mathbf{P} \times \left ( \mathbf{P} \times \mathbf{s_{r=0}} \right ) \\ \mathbf{s_{r=2}} &= \mathbf{P}^2 \times \mathbf{s_{r=0}} \end{align} Here $\mathbf{s_{r=2}}$ indicates the state probability vector after two rounds of combat. Matrix exponentiation is repeated matrix multiplication, just as exponentiation is repeated multiplication. \begin{align} \mathbf{P}^r &= \underbrace{\mathbf{P} \times \mathbf{P} \times \cdots \times \mathbf{P}}_{r \text{ times}} \end{align} Since it comes up later, we'll note that just as $x^0 = 1$, $\mathbf{P}^0 = \mathbf{I}$. That is, if we multiply by $\mathbf{P}$ zero times, we should keep what we have, which is accomplished by multiplying by the identity matrix. 
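As a sketch, building $\mathbf{P}$ and advancing the state vector looks like this (continuing the Python sketch from earlier and reusing `hit_pmf`):

```python
import numpy as np

def transition_matrix(p_hits, states=11):
    """Left stochastic matrix: P[i, j] = probability of moving from j
    defending hit points to i in one round of combat."""
    P = np.zeros((states, states))
    for j in range(states):
        for h, ph in enumerate(p_hits):
            P[max(j - h, 0), j] += ph  # excess hits all land in state 0
    return P

P = transition_matrix(hit_pmf(5, 1 / 6))  # hit_pmf from the earlier sketch
s = np.zeros(11)
s[10] = 1.0                               # start with 10 defending hit points
print(np.linalg.matrix_power(P, 5) @ s)   # state probabilities after 5 rounds
```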

The number of rounds, $R_{d}$, that it takes to remove all $d$ defending units has a probability mass function that can be extracted by grabbing the probability of being in state 0 beginning in round $r$, using a matrix $\mathbf{A}$. \begin{align} \mathbf{A} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \end{align} The point of this matrix is to take the first entry of the state probability vector, which is the probability of there being no defending hit points remaining that we're interested in. Combining $\mathbf{A}$, $\mathbf{P}$, and $\mathbf{s_{d}}$, we find the probability that there are no defending units in round $r$. \begin{align} P(0\text{ defenders in round } r) = \mathbf{A} \times \mathbf{P}^{r} \times \mathbf{s_{d}} \end{align} To get the pmf of $R_{d}$, we want to only see the probability that this came to be in round $r$. \begin{align} P(R_{d} = r) &= P(0\text{ defenders in round } r) - P(0\text{ defenders in round } r-1)\\ & = \mathbf{A} \times \mathbf{P}^{r} \times \mathbf{s_{d}} - \mathbf{A} \times \mathbf{P}^{r-1} \times \mathbf{s_{d}} \\ & = \mathbf{A} \times \left ( \mathbf{P}^{r} - \mathbf{P}^{r-1} \right ) \times \mathbf{s_{d}} \end{align} Here it's useful to make use of an identity matrix, $\mathbf{I}$, which is the matrix multiplication equivalent of multiplying by 1 in that multiplying by it leaves the same matrix. \begin{align} P(R_{d}=r) = \mathbf{A} \times \left(\mathbf{P}- \mathbf{I} \right) \times \mathbf{P}^{r-1} \times \mathbf{s_{d}} \end{align} For our second technique, let's take this and do enough of an infinite sum to estimate the expected value of $R$. Recall the definition of the expected value of a non-negative discrete random variable. \begin{align} \mathbb{E} R_d &= \sum_{r=0}^\infty r \cdot P(R_d = r) \\ &= \sum_{r=0}^\infty P(R_d > r) \end{align} We can turn these probability terms around by looking at their complements. \begin{align} P(R_d > r) &= 1 - P(R_d \leq r) \end{align} The probability that the combat ends in round $r$ or earlier, $P(R_d \leq r)$ is equivalent to there being no defenders in round $r$. They could have been removed in that round or an earlier one. We found this probability earlier. \begin{align} P(R_d \leq r) &= P(0\text{ defenders in round } r)\\ &= \mathbf{A} \times \mathbf{P}^{r} \times \mathbf{s_d} \end{align} Putting this together, we find the following infinite sum. \begin{align} \mathbb{E} R_d &= \sum_{r=0}^\infty \left ( 1 - \mathbf{A} \times \mathbf{P}^{r} \times \mathbf{s_d} \right ) \end{align} By computing this running sum, as shown in Figure 6 for 10 initial hit points, we observe that it appears to be approaching an asymptote well before 50 rounds. \begin{align} \widehat{\mathbb{E} R_d} &= \sum_{r=0}^{50} \left ( 1 - \mathbf{A} \times \mathbf{P}^{r} \times \mathbf{s_d} \right ) \end{align} The asymptote is approached faster with fewer hit points, which must require fewer rounds to eliminate with the same combat rolls. Thus, we can use the running sum after 50 rounds as a good estimate of the infinite sum. 
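Continuing the sketch, the truncated sum is only a few lines (with `P` from above; 50 rounds matches the cutoff used for Figure 6):

```python
import numpy as np

def expected_rounds(P, d, max_r=50):
    """Truncated sum of P(R_d > r) = 1 - P(0 defenders in round r)."""
    A = np.zeros(P.shape[0])
    A[0] = 1.0                  # picks out the probability of state 0
    state = np.zeros(P.shape[0])
    state[d] = 1.0
    total = 0.0
    for _ in range(max_r + 1):  # r = 0, 1, ..., max_r
        total += 1 - A @ state  # adds 1 - A x P^r x s_d
        state = P @ state       # advance one combat round
    return total

print(expected_rounds(P, 10))   # P from the previous sketch
```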

Figure 6: Running estimate of expected number of combat rounds complete siege of 10 defending hit points

For our third technique, I think we can do better than an estimate this time. We can solve for the actual expected values. We'll again look at the number of rounds $R_d$ based on starting with $d$ defending hit points. If we start with no defending units, the siege lasts 0 rounds. \begin{align} \mathbb{E}R_0 = 0 \end{align} If we start with 1 hit point of defenders, then it's just a question of how long until the attacker rolls one hit. The probability of one or more hits is $1-p_0$, and we've previously found that the expected number of tries to get a result that occurs with probability $p$ is $1/p$, so we can find the expected value of $R_1$. \begin{align} \mathbb{E}R_1 &= \frac{1}{1-p_0} \end{align} We can also leverage the columns of $\mathbf{P}$ to find this result. We know the probabilities of transitioning to each other state. Expectation is linear, so we can write the expected length of a siege starting in one state based on the expected durations from the other states and the transition probabilities. \begin{align} \mathbb{E}R_i = \sum_{j=0}^{10} p_{j,i} \cdot (1 + \mathbb{E} R_j) \end{align} In particular for state 1, we find the following. \begin{align} \mathbb{E}R_1 &= p_0 \cdot (1 + \mathbb{E} R_1) + (1- p_0) \cdot (1 + \underbrace{\mathbb{E} R_0}_{0}) \\ \mathbb{E}R_1 \cdot (1 - p_0) &= p_0 + 1 - p_0 \\ &= 1 \\ \mathbb{E}R_1 &= \frac{1}{1-p_0} \end{align} This matches what we had from before and was thus to some extent not necessary to do, but it should help build confidence in this method. We can extend this and work up to each new state. \begin{align} \mathbb{E}R_2 &= p_0 \cdot (1 + \mathbb{E} R_2) + p_1 \cdot (1 + \mathbb{E}R_1) + \left ( \sum_{j=2}^5 p_j \right ) \cdot (1 + \mathbb{E} R_0 ) \end{align} I'll let you go through the algebra to solve this yourself, but as you can imagine, it's not necessarily a very elegant equation. \begin{align} \mathbb{E}R_2 &= \frac{1 + \frac{p_1}{1-p_0}}{1-p_0} \end{align} This continues, with ever more fractions involving $1-p_0$ denominators. I think there's a way to look at things differently once we get past the bunching up and we're in the generic form that we came up with, but given how complicated this is getting, we're not going to get any closed-form equation from which we can glean much meaning. Rather, the only advantage is that we've come up with a way through which we can calculate the result of interest without approximating anything (beyond the limits inherent in whatever computation tool we use). 

However, there is an alternate way of solving these equations using linear algebra. First, let's put these expected values into a row vector. \begin{align} \mathbf{R} &= \begin{bmatrix} \mathbb{E}R_0 & \mathbb{E}R_1 & \dots & \mathbb{E}R_{10} \end{bmatrix} \end{align} Now, we can express this in terms of the transition matrix $\mathbf{P}$. We're taking the previous equation for the expected number of rounds to reach state 0 from each other state, \begin{align} \mathbb{E}R_i = \sum_{j=0}^{10} p_{j,i} \cdot (1 + \mathbb{E} R_j), \end{align} and rewriting them all at once with matrices. \begin{align} \mathbf{R} &= \left ( \mathbf{J_{1\times 11}} + \mathbf{R} \right ) \times \mathbf{P} \end{align} Here $\mathbf{J_{1\times 11}}$ is a one-by-eleven matrix of all ones, i.e., a row vector of all ones. (Note that while my indexing starts at 0, the dimension subscripts here still refer to the size.)  Next, we'll do some linear algebra to solve for $\mathbf{R}$. \begin{align} \mathbf{R} - \mathbf{R} \times \mathbf{P} &= \mathbf{J_{1\times 11}} \times \mathbf{P} \\ \mathbf{R} \times \left ( \mathbf{I} - \mathbf{P} \right ) &= \mathbf{J_{1\times 11}} \times \mathbf{P} \\ \mathbf{R} &= \mathbf{J_{1\times 11}} \times \mathbf{P} \times \left ( \mathbf{I} - \mathbf{P} \right )^{-1} \end{align} At this point we can simplify the equation. We do this because multiplying a row vector of ones by each column of $\mathbf{P}$ is equivalent to adding up each column of $\mathbf{P}$ and putting the results into a row vector. Since these columns represent the transition probabilities from a given state and we're considering all the states, each column must sum to one, as the probability of transitioning to some state is 1 (staying in the same state is considered a transition for simplicity). \begin{align} \mathbf{J_{1\times 11}} \times \mathbf{P} &= \mathbf{J_{1\times 11}} \\ \mathbf{R} &= \mathbf{J_{1\times 11}} \times \left ( \mathbf{I} - \mathbf{P} \right )^{-1} \end{align} This equation looks like it should work, but doesn't, because $\mathbf{I} - \mathbf{P}$ isn't invertible: it has a column of zeros. The left column of $\mathbf{P}$ is a one followed by all zeros, which is exactly the same as the left column of the identity matrix $\mathbf{I}$. 

We can, however, look at a submatrix, because we know $\mathbb{E}R_0=0$, so we can just eliminate that part of our set of equations. \begin{align} \mathbf{R} &= \begin{bmatrix} 0 & \mathbb{E}R_1 & \cdots & \mathbb{E}R_{10} \end{bmatrix}\\ &= \begin{bmatrix} 0 & \mathbf{R_{1\text{-}10}} \end{bmatrix} \end{align} We'll start by dividing $\mathbf{P}$ into submatrices. \begin{align} \mathbf{P} = \begin{bmatrix} 1 & \mathbf{T} \\ \mathbf{0} & \mathbf{Q} \\ \end{bmatrix} \end{align} Here, 1 is a single 1, representing that once we get to state 0 we stay there. As a corollary, in state 0 we don't transition to any other states, so the column vector $\mathbf{0}$ represents a set of zeros for those transition probabilities. Next, $\mathbf{Q}$ represents the one-step transition probabilities amongst the transient states 1-10. (Transient meaning that the system is only in those states temporarily as it transitions, at least with probability 1. There is the pathological case where it could stay there forever, which occurs with probability 0.) Finally, $\mathbf{T}$ represents the probabilities of transitioning in one step from the transient states 1-10 to the recurrent and absorbing state 0. (Recurrent states, once reached, recur infinitely many times with probability 1. Absorbing states can never be left.)  Note that $\mathbf{T}$ is often called $\mathbf{R}$, but we've already used that, so we had to go with a different letter here. I've often seen the matrix $\mathbf{P}$ organized a bit differently, with the recurrent states at the bottom.   (The calculation we're doing is often referred to as the hitting time.)  While convention is useful, we are not constrained by it. 

Next, we just plug in these divided $\mathbf{R}$ and $\mathbf{P}$. \begin{align} \begin{bmatrix}0 & \mathbf{R_{1\text{-}10}} \end{bmatrix} &= \left ( \mathbf{J_{1\times 11}} + \mathbf{R} \right ) \times \begin{bmatrix} 1 & \mathbf{T} \\ \mathbf{0} & \mathbf{Q} \\ \end{bmatrix} \end{align} Now we'll get rid of the left entry of both sides. Note that another way of looking at the issue with the full matrix $\mathbf{P}$ is that it yielded a nonsensical equation for $\mathbb{E}R_0$. \begin{align} \mathbb{E}R_0 &= 1 + \mathbb{E}R_0 \end{align} We could fix this equation so that it's not wrong, but including $\mathbb{E}R_0 = \mathbb{E}R_0$ still doesn't let us solve for $\mathbb{E}R_0$, which we already know anyways. 

To remove the left entry of the right side of our matrix equation, we remove the left column of $\mathbf{P}$. \begin{align} \mathbf{R_{1\text{-}10}} &= \left ( \mathbf{J_{1\times 11}} + \mathbf{R} \right ) \times \begin{bmatrix} \mathbf{T} \\ \mathbf{Q} \end{bmatrix} \end{align} Now we'll break up $\mathbf{J_{1\times 11}}$ and $\mathbf{R}$ on the right side, so that we can multiply out with the remains of $\mathbf{P}$, using the rules of matrix multiplication. \begin{align} \mathbf{R_{1\text{-}10}} &= \left ( \begin{bmatrix} \mathbf{J_{1\times 1}} & \mathbf{J_{1\times 10}} \end{bmatrix} + \begin{bmatrix} 0 & \mathbf{R_{1\text{-}10}} \end{bmatrix} \right ) \times \begin{bmatrix} \mathbf{T} \\ \mathbf{Q} \end{bmatrix} \\ &=\mathbf{J_{1\times 1}} \times \mathbf{T} + \left ( \mathbf{J_{1\times 10}} + \mathbf{R_{1\text{-}10}} \right ) \times \mathbf{Q} \end{align} Now we do some algebra so that we can solve for $\mathbf{R_{1\text{-}10}}$. \begin{align} \mathbf{R_{1\text{-}10}} \times \left ( \mathbf{I} - \mathbf{Q} \right ) &= \mathbf{J_{1\times 1}} \times \mathbf{T}+ \mathbf{J_{1\times 10}} \times \mathbf{Q} \\ \mathbf{R_{1\text{-}10}} &= \left ( \mathbf{J_{1\times 1}} \times \mathbf{T}+ \mathbf{J_{1\times 10}} \times \mathbf{Q} \right ) \times \left ( \mathbf{I} - \mathbf{Q} \right ) ^ {-1} \end{align} Recall our step earlier where we used the equation $\mathbf{J_{1\times 11}} \times \mathbf{P} = \mathbf{J_{1\times 11}}$ to simplify, as each column of $\mathbf{P}$ sums to one. This remains true of the submatrix. \begin{align} \begin{bmatrix} \mathbf{J_{1\times 1}} & \mathbf{J_{1\times 10}} \end{bmatrix} \times \begin{bmatrix}\mathbf{T} \\ \mathbf{Q} \end{bmatrix} &= \mathbf{J_{1\times 1}} \times \mathbf{T} + \mathbf{J_{1\times 10}} \times \mathbf{Q}\\ &= \mathbf{J_{1\times 10}} \end{align} Recall that $\mathbf{T}$ and $\mathbf{Q}$ have 10 columns, hence the 10 columns in the last equation, which we can use to find $\mathbf{R_{1\text{-}10}}$. \begin{align} \mathbf{R_{1\text{-}10}} &= \mathbf{J_{1\times 10}} \times \left ( \mathbf{I} - \mathbf{Q} \right ) ^ {-1} \end{align} The result of this computation for 1-10 initial defending hit points is plotted in Figure 7. While a fair amount of computation is elided by including a matrix inversion, which is not always trivial, in this case standard algorithms seem to have no issue, numerical or otherwise. 
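In code, the exact solution is a single linear solve (again continuing the earlier sketch; solving the transposed system avoids forming the inverse explicitly):

```python
import numpy as np

Q = P[1:, 1:]  # transient-state block of the transition matrix from earlier
# R_1-10 = J x (I - Q)^{-1} is a row vector; equivalently, solve
# (I - Q)^T x = 1 for x.
R = np.linalg.solve((np.eye(10) - Q).T, np.ones(10))
for d, er in enumerate(R, start=1):
    print(f"E[R_{d}] = {er:.3f}")
```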

Figure 7: Calculated expected siege length by number of defending hit points

The results from the three methods can be compared by inspecting the plots in Figures 8 and 9, which show the results with and without leadership/rerolls, respectively. The first rough estimate is somewhat lower than the two later techniques, which appear indistinguishable. Both show very similar trends, dominated by a linear relationship between initial defending hit points and the expected length of the siege. 

Figure 8: Comparison of methods to find expected length of siege with rerolls

Figure 9: Comparison of methods to find expected length of siege without rerolls

It may also be interesting to compare against the methods used when we addressed the question What is the best weapon in D&D? There we took a different approach to similar issues.

Monday, April 18, 2022

How many menus are there in Sushi Go Party!?

Sushi Go Party! includes a lot of options not available in the original Sushi Go. Players select what's on the menu (unfortunately, there's also a menu item titled Menu) for each game. A valid menu has a set format: Nigiri is always included, as well as one of three types of roll, three of eight types of appetizer, two of eight types of special, and one of three types of dessert. Each of these choices is independent, so we can multiply the number of ways to make each one together to get the total number of combinations. For appetizers and specials, where we're choosing more than one, we don't care about the order; what matters is which items are shuffled into the deck. Thus, we're looking for the number of combinations of each, where the number of combinations of selecting $k$ items from $n$ options is

\begin{align} \binom{n}{k} = \frac{n!}{k! \cdot (n-k)!}. \end{align}

If you want to be able to calculate this manually, without a tool that has a special function for it, it can be helpful to do some of the cancellation yourself. We can rewrite this while reducing the number of common terms in the numerator and the denominator.

\begin{align} \frac{n!}{k! \cdot (n-k)!} = \frac{\prod_{i=n-k+1}^n i}{k!} \end{align}

A concrete example may help.

\begin{align} \binom{8}{3} &= \frac{8!}{3! \cdot (8-3)!}\\ &= \require{cancel}\frac{8 \cdot 7 \cdot 6 \cdot \cancel{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}}{3 \cdot 2 \cdot 1 \cdot \cancel{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}} \\ &= 56 \end{align}
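In code, this reduced form might look like the following sketch (comb_reduced is a name I made up for illustration):

from math import factorial, prod

def comb_reduced(n, k):
    # Multiply the k largest factors of n!, then divide by k!.
    return prod(range(n - k + 1, n + 1)) // factorial(k)

print(comb_reduced(8, 3))  # 56, matching the worked example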

Applying this we get the number of combinations.

\begin{align} 1 \cdot 3 \cdot \binom{8}{3} \cdot \binom{8}{2} \cdot 3 = 14112 \end{align}

There are a couple of wrinkles. First, a couple of the menu items score a little differently depending on how many players there are, but I don't consider that a different menu.


Second, the above only applies for player counts of 3-6. When there are 2, 7, or 8 players there are some restrictions on the items included. Spoon (a special) and edamame (an appetizer) may not be used in two-player games, so such games have fewer menu combinations.

\begin{align} 1 \cdot 3 \cdot \binom{7}{3} \cdot \binom{7}{2} \cdot 3 = 6615 \end{align}

Similarly, in seven- and eight-player games neither menu nor special order (both of which are specials) is allowed.

\begin{align} 1 \cdot 3 \cdot \binom{8}{3} \cdot \binom{6}{2} \cdot 3 = 7560 \end{align}
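If you'd rather check the arithmetic with code, Python's math.comb makes quick work of all three counts:

from math import comb

# Menus: nigiri (always) x rolls x appetizers x specials x desserts
print(1 * 3 * comb(8, 3) * comb(8, 2) * 3)  # 14112 for 3-6 players
print(1 * 3 * comb(7, 3) * comb(7, 2) * 3)  #  6615 for 2 players
print(1 * 3 * comb(8, 3) * comb(6, 2) * 3)  #  7560 for 7-8 players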

Monday, April 11, 2022

Elchanan Mossel's Amazing Dice Paradox

I recently came across this old video from the MindYourDecisions YouTube channel by Presh Talwalkar (which I often find to be a good source of interesting problems). At first I was sure the video was wrong. Then I thought some more and realized he was right. Finally, I am convinced that he is wrong, but about something else. This post is me shaking my angry fist at the internet, and since it's about rolling a die, I figured it was close enough to being about games to count.

First, let's state the problem of the video:


You roll a fair dice until you get a 6. What is the expected number of rolls, including the roll of 6, conditioned on the event that all previous rolls were even numbers?

There are two curious changes from the way the problem is stated in most other places that are linked in the video. First, most statements use the singular "die" and not "dice". This perhaps leans on the possible internet origin post of the problem (it is attributed to Elchanan Mossel, who gave it to undergrad students during a class at UPenn in 2014-5), which uses "dice" in the singular, though this is changed to "die" in the follow-up post. I find this choice curious because while I do see some usage of "dice" in the singular (perhaps more in non-American contexts like Shut Up and Sit Down, though even they've corrected dice to die), I see no argument for discarding "die" as the singular. That is, I've never seen any assertion that the singular "die" is incorrect. Even the dictionary used in the video says the following:

Historically, dice is the plural of die, but in modern English dice can be both the singular and the plural: throw the dice could mean a reference to either one or more than one dice

Note the word "can" is used, not "is". This is even more clear in the phrasing on the US version instead of the UK version of the same website:

Historically, dice is the plural of die, but in modern English dice is sometimes also used in the singular: throw the dice could mean a reference to either one or more than one dice

This would leave me with little support for labeling the singular "dice" as wrong, but I still don't see why someone accustomed to the singular "die" would choose to abandon it and introduce ambiguity. I'm used to the singular "die", though, and anything else just sounds weird. If you learned "dice" and are used to it, I could see it being otherwise.

Second, and perhaps more relevantly, most statements of the problem say that all rolls are even numbers instead of all previous rolls. The video makes a big distinction between these and claims they give different answers. In fact, it claims that the answer to the "all rolls" version of the problem is the oft-given erroneous answer of 3. This is wrong, as I will try to demonstrate here. I will focus on how the two problems are the same. There are other good sources for how to approach the main problem, like the one here.


The common wrong answer of 3 comes from the idea that if we condition on rolling even numbers then that is equivalent to rolling a three-sided die with faces labeled 2, 4, and 6. However, that is not true. Instead, we can look at the problem setup to construct all sequences that end in a 6 and exclude those that contain odd numbers. Since 6 is an even number, we will see that conditioning on rolling all even numbers is the same as rolling all even numbers before the 6.


To quickly reference why the average number of rolls is 3 when rolling a three-sided die until we get a particular side, we can look to our previous work with Memoir '44, where we found that the expected number of rolls needed to achieve a roll that occurs with probability $p$ is $1/p$. In this case the probability is $1/3$, so the expected number of rolls, $\mathbb{E}N_3$, is $1/(1/3)=3$. (The notation $N_3$ uses the subscript 3 because there are three sides to the die here.) We can find this by starting with the probability of getting a roll sequence of length $n$,

\begin{align} P(N_3=n) = \left ( \frac{2}{3} \right ) ^{n-1} \cdot \frac{1}{3}, \end{align}

with an often-useful shortcut to computing the expected value for non-negative integer-valued random variables,

\begin{align} \mathbb{E} N = \sum_{n=0}^\infty P(N > n). \end{align}

There are more details in the previous post on the toughest unit in Memoir '44.
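As a quick check, applying this shortcut to $N_3$, where $P(N_3 > n) = (2/3)^n$ (the first $n$ rolls all miss), recovers the expected value of 3.

\begin{align} \mathbb{E} N_3 = \sum_{n=0}^\infty \left ( \frac{2}{3} \right )^n = \frac{1}{1-\cfrac{2}{3}} = 3 \end{align}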


Let's start by looking at sequences of six-sided die rolls of length 1 that end in a 6. Actually, there's just one, as shown in Table 1. There I've annotated it with a * to indicate that it meets our conditioning criterion of containing only even numbers. All $1/1$ of the sequences are allowed.

\begin{align*} \begin{array}{c} 6 * \end{array} \end{align*} 
Table 1: Sequences of length 1


Now, let's look at sequences of length 2, where only $2/5$ are allowed.

\begin{align*} \begin{array}{ccccc} 16 \phantom{*} & 26 * & 36\phantom{*} & 46 * & 56\phantom{*} \end{array} \end{align*} 
Table 2: Sequences of length 2


Finally, let's look at sequences of length 3, where $4/25$ are allowed. We'll stop here lest the number of sequences get too cumbersome.


\begin{align*} \begin{array}{ccccc} 116 \phantom{*} & 216 \phantom{*} & 316 \phantom{*} & 416 \phantom{*} & 516 \phantom{*} \\ 126 \phantom{*} & 226 * & 326 \phantom{*} & 426 * & 526 \phantom{*} \\ 136 \phantom{*} & 236 \phantom{*} & 336 \phantom{*} & 436 \phantom{*} & 536 \phantom{*} \\ 146 \phantom{*} & 246 * & 346 \phantom{*} & 446 * & 546 \phantom{*} \\ 156 \phantom{*} & 256 \phantom{*} & 356 \phantom{*} & 456 \phantom{*} & 556 \phantom{*} \end{array} \end{align*} 
Table 3: Sequences of length 3
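Before generalizing, a brute-force enumeration (a quick Python sketch) can confirm these counts for small lengths:

from itertools import product

# A sequence of length n is n-1 non-6 rolls followed by the first 6.
for n in range(1, 5):
    seqs = [s + (6,) for s in product(range(1, 6), repeat=n - 1)]
    all_even = [s for s in seqs if all(r % 2 == 0 for r in s)]
    print(f"length {n}: {len(all_even)} of {len(seqs)} allowed")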


It looks like, for a sequence of length $n$, $2^{n-1}$ of the $5^{n-1}$ sequences are allowed under the conditioning. This disproportionately removes long sequences, which is why the average is so short. You can think of it this way: the longer the sequence, the more likely it is to have at least one 1, 3, or 5 before conditioning. The probability of a sequence being length $n$ is the probability that the first 6 was on roll $n$. Thus, all rolls before this must be something other than 6. The probability of this happening on each roll is $5/6$, while the probability of getting a 6 on roll $n$ is $1/6$. Thus, if $N_6$ is a random variable equal to the length of the sequence of rolling a six-sided die, then

\begin{align} P(N_6=n) = \left ( \frac{5}{6} \right) ^{n-1} \cdot \frac{1}{6}. \end{align}

However, if we only allow sequences with even numbers, then there are only two valid choices for each roll before the last; thus the individual roll probability changes from $5/6$ to $2/3$, and the sequence probability changes accordingly. With only even rolls allowed, there are three options instead of six, and two of them instead of five continue the sequence.  (Heads up for skimmers: this part is wrong and will be corrected later.)

\begin{align} P(N_6=n \, | \, \text{even rolls}) &= \left ( \frac{2}{3} \right) ^{n-1} \cdot \frac{1}{6} \qquad (1) \end{align}

Note that this probability is half of the probability $P(N_3=n)$ from above.

\begin{align} P(N_6=n \, | \, \text{even rolls}) &= \frac{1}{2} P(N_3=n) \end{align}

Using the definition of expected value for discrete-valued random variables,

\begin{align} \mathbb{E} N = \sum_n n \cdot P(N=n), \end{align}

we can find the expected value of $N_6$ given all even rolls.

\begin{align} \mathbb{E} \left [ N_6 \, | \, \text{even rolls} \right ] &= \sum_{n=1}^\infty n \cdot P(N_6=n \, | \, \text{even rolls}) \\ &=\sum_{n=1}^\infty n \cdot \frac{1}{2} \cdot P(N_3=n) \\ &=\frac{1}{2} \cdot \underbrace{\left (\sum_{n=1}^\infty n \cdot P(N_3=n) \right )}_{\mathbb{E} N_3}\\ &= \frac{1}{2} \cdot 3 \\ &= 1.5 \end{align}


That's a little hand-wavy though, so let's take these steps more slowly. We're interested in conditional expectation, and thus conditional probability.

\begin{align} \mathbb{E}\left [N \, | \, A \right ] = \sum_n n \cdot P(N=n \, | \, A) \end{align}

Here $A$ is some generic event. Recall that conditional probability is given by the following,

\begin{align} P(N=n \, | \, A) = \frac{P( N=n \cap A)}{P(A)}, \end{align}

where $\cap$ means "and", that is, both the event on its left and the one on its right occur. For an individual roll that continues the sequence (the roll doesn't equal 6), which we'll label as event $C$ for continuing, the conditional probability is as follows.

\begin{align} P(C \, | \, \text{even roll}) &= \frac{P(C \cap \text{even roll})}{P(\text{even roll})} \\ &=\frac{\;\cfrac{2}{6}\;}{\;\cfrac{3}{6}\;} \\ &=\frac{2}{3} \end{align}

Now we apply this to the probability of the final roll. We'll say the probability of ending is $P(E)$.

\begin{align} P(E \, | \, \text{even roll}) &= \frac{P(E \cap \text{even roll})}{P(\text{even roll})} \\ &=\frac{\;\cfrac{1}{6}\;}{\;\cfrac{3}{6}\;} \\ &=\frac{1}{3} \end{align}

This is not what we want. I've brought us to the wrong answer of 3!  (I haven't quite gotten there, but I think you can take the next few steps.)


At this point, I think it may be useful to acknowledge one of the commenters on the video for helping me realize what was going on. I was convinced that the answer should be the same for the "previous" case as for the all-evens case and didn't believe that 1.5 was the correct answer.  Marce Villarino posted some Python code that quickly convinced me that 1.5 must be right, but also helped me see that the two cases are actually the same. The code follows.


##### Record successive throws of a six-sided die, until a 6 is thrown.
##### If the record does not contain an odd number, record its length.
import random
import time

def calcOnlySixStops(howManyTimes):
    result = list()
    start_time = time.time()
    for i in range(howManyTimes):
        a = list()
        a.append(random.randint(1, 6))
        while a[-1] != 6:  # keep rolling until the first 6
            a.append(random.randint(1, 6))
        if not ((1 in a) or (3 in a) or (5 in a)):  # all rolls even
            result.append(len(a))
    print("--- %s seconds ---" % (time.time() - start_time))
    return result

lolailo = calcOnlySixStops(1000000)
print(sum(lolailo) / len(lolailo))
####### about 1.5
####### it takes ca. 43 seconds on my laptop


##### Same, but throw a three-sided die.
def calcThreeSidedDie(howManyTimes):
    result = list()
    start_time = time.time()
    for i in range(howManyTimes):
        a = list()
        a.append(random.choice([2, 4, 6]))
        while a[-1] != 6:  # keep rolling until the first 6
            a.append(random.choice([2, 4, 6]))
        result.append(len(a))  # no conditioning needed here
    print("--- %s seconds ---" % (time.time() - start_time))
    return result

lolailo = calcThreeSidedDie(1000000)
print(sum(lolailo) / len(lolailo))
####### about 3.0
####### it takes ca. 19 seconds on my laptop


As you can see from the annotated results, or if you run it yourself, the expected value is approximately 1.5. It uses a Monte Carlo approach with a million trials, which seems sufficient here.  Also, if you look through the algorithm to condition on the previous rolls being even, you'll see that it's exactly the same as what you would do to condition on all rolls being even.


The key error that we made is, I think, the same one that usually leads to the wrong answer, just a little drawn out. Note that we conditioned on individual even rolls, not on all rolls being even. That's the wrong event. If we want to take that approach, we need to condition the entire $P(N_6=n)$ on getting even rolls.  (We could also use the individual pieces, but we still must condition on getting all even rolls, not just the one under consideration at that moment.)

\begin{align} P(N_6=n \, | \, \text{even rolls}) &= \frac{P(N_6 =n \cap \text{even rolls})}{P(\text{even rolls})} \end{align}

The numerator is relatively easy to figure out. Based on what we saw above, we find that

\begin{align} P(N_6 =n \cap \text{even rolls}) = \left ( \frac{2}{6} \right)^{n-1} \cdot \frac{1}{6} \end{align}

To get this we took the unconditioned probability of getting a certain number of rolls and combined it with the fraction of those sequences that meet our all-evens criterion.  But what is the probability of even rolls? Note that this isn't the probability of getting even rolls given a length; it's the overall probability. Here it may be useful to turn things around.

\begin{align} P(\text{even rolls}) &= \sum_n P(\text{even rolls} \, | \, N_6=n) \cdot P(N_6=n) \\ &= \sum_n P(\text{even rolls} \cap N_6=n) \end{align}

Since we already know this joint probability from above, we can plug it in.

\begin{align} P(\text{even rolls}) &= \sum_{n=1}^\infty \left ( \frac{2}{6} \right)^{n-1} \cdot \frac{1}{6}\\ &= \frac{1}{6} \cdot \sum_{n=1}^\infty \left ( \frac{1}{3} \right)^{n-1} \end{align}

The above includes a geometric series. We can adjust the range and the exponent to make it look more familiar.

\begin{align} P(\text{even rolls}) &= \frac{1}{6} \cdot \underbrace{\sum_{n=0}^\infty \left ( \frac{1}{3} \right)^{n}}_{\cfrac{1}{1-\cfrac{1}{3}}} \\ &= \frac{1}{6} \cdot \frac{3}{2}\\ & = \frac{1}{4} \end{align}
Does this number make sense? At first, it may seem like too few of the sequences. But consider that every time we roll the die, half of the results are excluded, so long sequences are heavily penalized.
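A quick Monte Carlo check of this value, a sketch along the same lines as the simulation above:

import random

# Estimate P(all rolls up to and including the first 6 are even).
trials = 1000000
count = 0
for _ in range(trials):
    while True:
        roll = random.randint(1, 6)
        if roll % 2 == 1:
            break  # an odd roll disqualifies the sequence
        if roll == 6:
            count += 1  # reached 6 with only even rolls
            break
print(count / trials)  # approximately 0.25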
 

Now we have the pieces we need and we can find the average number of rolls.

\begin{align} P(N_6=n \, | \, \text{even rolls}) &=\frac{P(N_6 =n \cap \text{even rolls})}{P(\text{even rolls})} \\ &= \frac{\left ( \cfrac{2}{6} \right)^{n-1} \cdot \cfrac{1}{6}}{\cfrac{1}{4}} \\ P(N_6=n \, | \, \text{even rolls}) &=\left(\frac{1}{3} \right )^{n-1} \cdot \frac{2}{3} \label{eq:myd_2} \qquad (2)\end{align}

Leveraging our previous work with Memoir '44, this has the same form as asking how long it takes to get the first result that occurs with probability $p=2/3$. Thus,

\begin{align} \mathbb{E} \left [ N_6 \, | \, \text{even rolls} \right ] &= \frac{1}{2/3}\\ &=1.5 \end{align}
Note that here we found that 
 \begin{align} P(N_6=n \, | \, \text{even rolls}) &= \left(\frac{1}{3} \right )^{n-1} \cdot \frac{2}{3}  \end{align}

which contradicts our earlier hand-wavy equation (1). Which one is right? Well, I generally trust more rigorous derivations, so I'm inclined towards the latter. Interestingly, both give the correct average. Let's go back to our simulation and look at not just the average, but also the distribution of lengths given all even rolls. By adding a calculation of the percentage of sequences of each length to the above simulation, we get the results in Table 4.
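That extra tabulation can be a few lines on top of calcOnlySixStops from above; a sketch:

from collections import Counter

lengths = calcOnlySixStops(1000000)
counts = Counter(lengths)
for n in range(1, 10):
    eq1 = (2 / 3) ** (n - 1) * (1 / 6)  # hand-wavy equation (1)
    eq2 = (1 / 3) ** (n - 1) * (2 / 3)  # corrected equation (2)
    print(n, round(eq1, 4), round(eq2, 4), round(counts[n] / len(lengths), 4))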


\begin{align*} \begin{array}{cccc} \text{Sequence length} & \text{Equation (1)} & \text{Equation (2)} & \text{Estimated probability} \\ \hline 1 & 0.1667 & 0.6667 & 0.6665 \\ 2 & 0.1111 & 0.2222 & 0.2226 \\ 3 & 0.0741 & 0.0741 & 0.0740 \\ 4 & 0.0494 & 0.0247 & 0.0241 \\ 5 & 0.0329 & 0.0082 & 0.0084 \\ 6 & 0.0219 & 0.0027 & 0.0029 \\ 7 & 0.0146 & 0.0009 & 0.0009 \\ 8 & 0.0098 & 0.0003 & 0.0004 \\ 9 & 0.0065 & 0.0001 & 0.0001 \end{array} \end{align*} Table 4: Simulated and theoretical sequence length conditional probabilities


As we can see, this lines up well with our second, revised equation (2). Interestingly, equation (1) looks like it should have a higher expected number of rolls, as its probability for sequences of lengths 1 and 2 is lower than simulated while its probability for sequences longer than 3 is higher. A useful observation at this point is that equation (1) said that $P(N_6=n \, | \, \text{even rolls})$ is half of $P(N_3=n)$. However, further reflection shows that this can't be right, as the sum of the probability over the whole sample space must be 1. The probability that there is some outcome is 1.

\begin{align} \sum_n P(N=n) = 1 \end{align}

If this is true of some function $f(n)$ such that $P(N=n)=f(n)$, it cannot be true of some other function $g(n) = f(n)/2$. That is, we should find that our incorrect equation sums to 0.5.

\begin{align} \sum_{n=1}^\infty \left ( \frac{2}{3} \right) ^{n-1} \cdot \frac{1}{6} & = \frac{1}{6} \cdot\underbrace{\sum_{n=0}^\infty \left ( \frac{2}{3} \right) ^n}_{\cfrac{1}{1-\cfrac{2}{3}}}\\ &= \frac{1}{6} \cdot 3 \\ & = \frac{1}{2} \end{align}

Indeed, that is the case. This is confirmation that equation (1) is incorrect.
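If you prefer a numeric check of that sum:

# Partial sum of the incorrect equation (1); it approaches 0.5, not 1.
print(sum((2 / 3) ** (n - 1) * (1 / 6) for n in range(1, 200)))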


Again, note that there is no sequence that would be included or excluded differently if we were to emphasize that we conditioned on rolling all even numbers before the 6. The two descriptions are logically equivalent and have the same expected number of rolls.