Monday, May 9, 2022

How long will a siege last in War of the Ring?

For simplicity, we'll assume that no combat cards are played, as that complicates things too much for an initial look. 

Our first step is to compute if there is a round of combat, how many units get removed without leaders, this is pretty clear: it's a binomial distribution with n = # of units (max 5) and $p = 2/6$ for defender (hit on 5 & 6), $p = 1/6$ for attacker (hit on only 6) Let $X_n$ be a random variable equal to the number of hits generated by an attack with $n$ combat dice. 
\begin{align} P(X_n=x) = \binom{n}{x}\cdot (1-p)^x \cdot p^{(n-x)} \end{align} With as many leaders as units (or 5 leadership in the case of larger armies), we can use the same approach. Because any die that misses the combat roll will get rerolled in the leader reroll, we can compute the combined probability, $p_l$, of hitting on either the combat roll or the leader reroll for each die namely. 
\begin{align} p_l &= p + (1-p)\cdot p \\ &= 2\cdot p-p^2 \end{align} It may be handy to look at the probability of a miss in this case also. \begin{align} 1 - p_l &= 1 - 2\cdot p + p^2 \\ &= (1-p)^2 \end{align} If we let $X_n'$ be a random variable equal to the number of hits generated by an attack with $n$ combat dice and at least $n$ leadership. This is the same idea as $X_n$, but with at least $n$ leadership. \begin{align} P(X_n' = x) &= \binom{n}{x}\cdot (1-p_l)^x \cdot {p_l}^{(n-x)} \\ &= \binom{n}{x}\cdot (1-p)^{2x} \cdot \left ( 2\cdot p-p^2 \right )^{(n-x)} \end{align} When the leadership and combat strength do no match, then it gets more tricky. It's tempting to think that we can just add two cases. That is, it may seem that 5 combat strength with 3 leadership is equivalent to adding the hits generated by 3 combat strength with 3 leadership and 2 combat strength with 0 leadership. However, this is not the case, because that way if the 3 units with leadership hit and the 2 units without leadership miss, then there is no leadership reroll. However, in the case with 5 units with 3 leadership roll the same as before, if any two units miss then their dice get rerolled. Thus, we'd need to approach it differently. For simplicity, we'll set this case aside and focus on the extremes of no leadership and maximum leadership. 

Using the equations we derived, we can compute the probabilities of getting different numbers of hits for a variety of situations. There are three variables we'll consider: the number of combat dice rolled, whether leaders are present, and whether the units are attacking a fortified position, meaning either a stronghold or a city or fortification in the first round. The next several figures plot the probabilities in a couple different ways. 

Figure 1: PMF of hits in War of the Ring when attacking with 5 units

Figure 2: PMF of hits in War of the Ring when attacking a fortified position with leadership

Figure 3: PMF of hits in War of the Ring when attacking an unfortified position with leadership

Figure 4: PMF of hits in War of the Ring when attacking a fortified position without leadership

Figure 5: PMF of hits in War of the Ring when attacking an unfortified position without leadership

Our next step is to find how many rounds of combat a siege is expected to last. We'll use three techniques to quantify this and compare. Throughout, we'll use a few common assumptions. While the number of defending units may vary, being able to take a total of $d$ hits, we fix the number of attack dice at five, either with or without leadership. We thus remove the possibility that the siege goes well enough for the defender that the attackers are whittled down or eliminated. 

First, we'll estimate it based on how many individual dice it takes to eliminate the defending units, similar to when we looked at What's the toughest unit in Memoir '44?. The limitation here is that when moving from rolling one die at a time to five dice at a time there's different quantization occurring. Needing to roll one through five dice is all just one round of combat, but moving from five dice to six dice means a second round even though it's only one die. The averaging that occurs over dice and rounds can have a notable impact. We'll see when we compare. Each die eliminates one hit point of defending units with probability $p$ (or $p_l$ with leadership). The expected number of dice to generate this one hit is $1/p$. Thus, to generate $d$ hits we can estimate $d/p$ dice on average. Since five dice are rolled each round, this means there should be about $d/(5p)$ rounds on average. 

For the next two methods, we'll make use of a Markov chain, as we did when checking if it was possible an event was never drawn in a previous problem. We'll start with an initial state column vector $\mathbf{s_d}$ that indicates the number, $d$, of hit points of the defenders in the siege. The vector is zero everywhere, except in index $d$, where it has a value of 1 to indicate that the initial probability of there being $d$ defending hit points is 1. (I'll use indexes starting at 0, instead of 1 as commonly done, as it lines up nicely with programming as well as the meaning in this problem.)  For example, with 10 defending hit points, \begin{align} \mathbf{s_{10}}^T & = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} , \end{align} where $\mathbf{s_d}^T$ is the transpose of $\mathbf{s_d}$, shown to fit on the page more easily. Note the 1 would be in the bottom position of the column vector. Each position in the vector corresponds to the 11 total states: 0-10 defending hit points of units. 

We can multiply this by a transition matrix $\mathbf{P}$. \begin{align} \mathbf{P} = \begin{bmatrix} p_{0,0} & p_{0,1} & \dots & p_{0,10}\\ p_{1,0} & p_{1,1} \\ \vdots & & \ddots \\ p_{10,0} & p_{10,1} & \ldots & p_{10,10} \end{bmatrix} \end{align} An element of $\mathbf{P}$, $p_{i,j}$ is the probability of transitioning from state $j$ to state $i$, in line with matrix multiplication. This is a left stochastic matrix, where each column adds to one. You may find this less common than right stochastic matrices, where each row sums to one. This means that the matrices and equations here may need to be transposed to compare to other sources. The core is whether $p_{i,j}$ is as used here or as the probability of transitioning from state $i$ to state $j$. The left and right names have to do with whether the stochastic matrix should multiply state probability vectors from the left or right. 

To fill in the values of this matrix, let's work with the probabilities of getting a certain number of hits by the attacker, which we assume always rolls 5 dice for simplicity. The probability $p_h$ is the probability of getting $h$ hits. Thus we have six probabilities of interest here: $p_0$, $p_1$, $p_2$, $p_3$, $p_4$, and $p_5$, where $p_h = P(X_5 = h)$ as found earlier. We construct $\mathbf{P}$ from these individual probabilities. Each column represents the probabilities of moving from one state to the others. Thus, the values of each column must sum to 1, as the probability of ending up in one of the states is 1 and all states are under consideration here. Further, since there is no way to add defending hit points during a battle, we know that many of the entries are zero. \begin{align} p_{i,j} = 0 \qquad i > j \end{align} We can't transition from state $j$, which has $j$ defending hit points, to a state $i$, which has $i$ defending hit points, where $i>j$. 

We'll go through the matrix one column at a time to build it up. The first column, representing what happens when there are already no defending units is fairly simple, as we must stay in that state. Thus, the transition probabilities are either 0 or 1. \begin{align} p_{i,0} = \begin{cases} 1 &\qquad i = 0 \\ 0 &\qquad i \neq 0 \end{cases} \end{align} Specifically, the probability of staying in state 0 is 1 and the probability of transitioning to any other state is 0. 

In the next column, we start with one defending hit point and there are two possibilities: if no hits are rolled, then stay in the same state, but if any hits are rolled, then move to state 0. \begin{align} p_{i,1} = \begin{cases} \sum_{h=1}^5 p_h = (1-p_0) &\qquad i = 0 \\ p_0 & \qquad i = 1\\ 0 &\qquad i > 1 \end{cases} \end{align} The next several columns follow this pattern \begin{align} p_{i,2} = \begin{cases} \sum_{h=2}^5 p_h &\qquad i = 0 \\ p_1 & \qquad i = 1\\ p_0 & \qquad i = 2\\ 0 &\qquad i > 2 \end{cases} \end{align} Eventually, no summation is necessary and next the probability of reaching state 0 in the single transition of one round of combat becomes 0, as it would require more than the maximum 5 hits. \begin{align} p_{i,6} = \begin{cases} 0 &\qquad i = 0 \\ p_5 & \qquad i = 1\\ p_4 & \qquad i = 2\\ p_3 & \qquad i = 3\\ p_2 & \qquad i = 4\\ p_1 & \qquad i = 5\\ p_0 & \qquad i = 6\\ 0 &\qquad i > 6 \end{cases} \end{align} We can also generalize this. \begin{align} p_{i,j} = \begin{cases} p_5 & \qquad i = j-5\\ p_4 & \qquad i = j-4\\ p_3 & \qquad i = j-3\\ p_2 & \qquad i = j-2\\ p_1 & \qquad i = j-1\\ p_0 & \qquad i = j\\ 0 &\qquad i < j-5, i > j \end{cases} \qquad j \geq 5 \end{align} Multiplying some probability state vector, such as $\mathbf{s_d}$, by $\mathbf{P}$ gives the updated probabilities of being in each state after one round of combat. \begin{align} \mathbf{s_{new}} = \mathbf{P} \times \mathbf{s_{old}} \end{align} We can repeat this to get the probabilities after multiple rounds of combat. \begin{align} \mathbf{s_{r=2}} &= \mathbf{P} \times \mathbf{s_{n=1}}\\ \mathbf{s_{r=2}} &= \mathbf{P} \times \left ( \mathbf{P} \times \mathbf{s_{n=0}} \right ) \\ \mathbf{s_{r=2}} &= \mathbf{P}^2 \times \mathbf{s_{n=0}} \end{align} Here $\mathbf{s_{r=2}}$ indicates the state probability vector after two rounds of combat. Matrix exponentiation is repeated matrix multiplication as exponentiation is repeated multiplication. \begin{align} \mathbf{P}^r &= \underbrace{\mathbf{P} \times \mathbf{P} \times \cdots \times \mathbf{P}}_{r \text{ times}} \end{align} Since it comes up later if you pay attention, so we'll note that just as $x^0 = 1$, $\mathbf{P}^0 = \mathbf{I}$. That is, if we multiply by $\mathbf{P}$ zero times, we should keep what we have, which is accomplished by multiplying by the identity matrix. 

The number of rounds, $R_{d}$, that it takes to remove all $d$ defending units has a probability mass function that can be extracted by grabbing the probability of being in state 0 beginning in round $r$, using a matrix $\mathbf{A}$. \begin{align} \mathbf{A} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \end{align} The point of this matrix is to take the first entry of the state probability vector, which is the probability of there being no defending hit points remaining that we're interested in. Combining $\mathbf{A}$, $\mathbf{P}$, and $\mathbf{s_{d}}$, we find the probability that there are no defending units in round $r$. \begin{align} P(0\text{ defenders in round } r) = \mathbf{A} \times \mathbf{P}^{r} \times \mathbf{s_{d}} \end{align} To get the pmf of $R_{d}$, we want to only see the probability that this came to be in round $r$. \begin{align} P(R_{d} = r) &= P(0\text{ defenders in round } r) - P(0\text{ defenders in round } r-1)\\ & = \mathbf{A} \times \mathbf{P}^{r} \times \mathbf{s_{d}} - \mathbf{A} \times \mathbf{P}^{r-1} \times \mathbf{s_{d}} \\ & = \mathbf{A} \times \left ( \mathbf{P}^{r} - \mathbf{P}^{r-1} \right ) \times \mathbf{s_{d}} \end{align} Here it's useful to make use of an identity matrix, $\mathbf{I}$, which is the matrix multiplication equivalent of multiplying by 1 in that multiplying by it leaves the same matrix. \begin{align} P(R_{d}=r) = \mathbf{A} \times \left(\mathbf{P}- \mathbf{I} \right) \times \mathbf{P}^{r-1} \times \mathbf{s_{d}} \end{align} For our second technique, let's take this and do enough of an infinite sum to estimate the expected value of $R$. Recall the definition of the expected value of a non-negative discrete random variable. \begin{align} \mathbb{E} R_d &= \sum_{r=0}^\infty r \cdot P(R_d = r) \\ &= \sum_{r=0}^\infty P(R_d > r) \end{align} We can turn these probability terms around by looking at their complements. \begin{align} P(R_d > r) &= 1 - P(R_d \leq r) \end{align} The probability that the combat ends in round $r$ or earlier, $P(R_d \leq r)$ is equivalent to there being no defenders in round $r$. They could have been removed in that round or an earlier one. We found this probability earlier. \begin{align} P(R_d \leq r) &= P(0\text{ defenders in round } r)\\ &= \mathbf{A} \times \mathbf{P}^{r} \times \mathbf{s_d} \end{align} Putting this together, we find the following infinite sum. \begin{align} \mathbb{E} R_d &= \sum_{r=0}^\infty \left ( 1 - \mathbf{A} \times \mathbf{P}^{r} \times \mathbf{s_d} \right ) \end{align} By computing this running sum, as shown in Figure 6 for 10 initial hit points, we observe that it appears to be approaching an asymptote well before 50 rounds. \begin{align} \widehat{\mathbb{E} R_d} &= \sum_{r=0}^{50} \left ( 1 - \mathbf{A} \times \mathbf{P}^{r} \times \mathbf{s_d} \right ) \end{align} The asymptote is approached faster with fewer hit points, which must require fewer rounds to eliminate with the same combat rolls. Thus, we can use the running sum after 50 rounds as a good estimate of the infinite sum. 

Figure 6: Running estimate of expected number of combat rounds complete siege of 10 defending hit points

For our third technique, I think we can do better than an estimate this time. We can solve for the actual expected values. We'll again look at the number of rounds $R_d$ based on starting with $d$ defending hit points. If we start with no defending units, the siege lasts 0 rounds. \begin{align} \mathbb{E}R_0 = 0 \end{align} If we start with 1 hit point of defenders, then it's just a question of how long until the attacker rolls one hit. The probability of one or more hits if $1-p_0$ and we've previously found that the expected number of tries to get a result that occurs with probability $p$ is $1/p$, so we can find the expected value of $R_1$. \begin{align} \mathbb{E}R_1 &= \frac{1}{1-p_0} \end{align} We can also leverage the columns of $\mathbf{P}$ to find this result. We know the probabilities of transitioning to each other state. Expectation is linear, so we can write the expectation of the length of a siege in one state based on the expected durations of other states and the transition probabilities. \begin{align} \mathbb{E}R_i = \sum_{j=0}^{10} p_{i,j} \cdot (1 + \mathbb{E} R_j) \end{align} In particular for state 1, we find the following. \begin{align} \mathbb{E}R_1 &= p_0 \cdot \mathbb{E} (1 + R_1) + (1- p_0) \cdot (1 + \underbrace{\mathbb{E} R_0}_{0}) \\ \mathbb{E}R_1 \cdot (1 - p_0) &= p_0 + 1 - p_0 \\ &= 1 \\ \mathbb{E}R_1 &= \frac{1}{1-p_0} \end{align} This matches what we had from before and was thus to some extent not necessary to do, but it should help build confidence in this method. We can extent this and work up to each new state. \begin{align} \mathbb{E}R_2 &= p_0 \cdot (1 + \mathbb{E} R_2) + p_1 \cdot (1 + \mathbb{E}R_1) + \left ( \sum_{j=2}^5 p_j \right ) \cdot (1 + \mathbb{E} R_0 ) \end{align} I'll let you go through the algebra to solve this yourself, but as you can imagine, it's not necessarily a very elegant equation. \begin{align} \mathbb{E}R_2 &= \frac{1 + \frac{p_1}{1-p_0}}{1-p_0} \end{align} This continues, with ever more fractions involving $1-p_0$ denominators. I think there's a way to look at things differently once we get past the bunching up and we're in the generic form that we came up with, but given how complicated this is getting we're not going to get any closed-form equation from which we can glean much meaning. Rather, the only advantage is that we've come up with a way through which we can calculate the result of interest without approximating anything (beyond the limits inherent in whatever computation tool we use). 

However, there is an alternate way of solving these equations using linear algebra. First, let's put these expected values into a row vector. \begin{align} \mathbf{R} &= \begin{bmatrix} \mathbb{E}R_0 & \mathbb{E}R_1 & \dots & \mathbb{E}R_{10} \end{bmatrix} \end{align} Now, we can express this in terms of the transition matrix $\mathbf{P}$. We're taking the previous equation for the expected number of rounds to reach state 0 from each other state, \begin{align} \mathbb{E}R_i = \sum_{j=0}^{10} p_{i,j} \cdot (1 + \mathbb{E} R_j), \end{align} and rewriting them all at once with matrices. \begin{align} \mathbf{R} &= \left ( \mathbf{J_{1\times 11}} + \mathbf{R} \right ) \times \mathbf{P} \end{align} Here $\mathbf{J_{1\times 11}}$ is an one by eleven matrix of all ones, a row vector of all ones. (Note that while my indexing starts at 0, the dimension subscripts here still refers to the size.)  Next, we'll do some linear algebra to solve for $\mathbf{R}$. \begin{align} \mathbf{R} - \mathbf{R} \times \mathbf{P} &= \mathbf{J_{1\times 11}} \times \mathbf{P} \\ \mathbf{R} \times \left ( \mathbf{I} - \mathbf{P} \right ) &= \mathbf{J_{1\times 11}} \times \mathbf{P} \\ \mathbf{R} &= \mathbf{J_{1\times 11}} \times \mathbf{P} \times \left ( \mathbf{I} - \mathbf{P} \right )^{-1} \end{align} At this point we can simplify the equation. We do this because multiplying a row vector of ones by each column of $\mathbf{P}$ is equivalent to adding up each column of $\mathbf{P}$ and putting the results into a row vector. Since these columns represents the transition probabilities from a given state and we we're considering all the states, each column must sum to one, as the probability of transitioning to some state is 1 (staying in the same state is considered a transition for simplicity). \begin{align} \mathbf{J_{1\times 11}} \times \mathbf{P} &= \mathbf{J_{1\times 11}} \\ \mathbf{R} &= \mathbf{J_{1\times 11}} \times \left ( \mathbf{I} - \mathbf{P} \right )^{-1} \end{align} This equation looks like it should work, but doesn't because $\mathbf{I} - \mathbf{P} $ isn't invertible, as it has a column of zeros. The left column of $\mathbf{P}$ is a one followed by all zeros, which is exactly the same as the left column of the identity matrix $\mathbf{I}$. 

We can, however, look at a submatrix, because we know $\mathbb{E}R=0$, so we can just eliminate that part of our set of equations. \begin{align} \mathbf{R} &= \begin{bmatrix} 0 & \mathbb{E}R_1 & \cdots & \mathbb{E}R_{10} \end{bmatrix}\\ &= \begin{bmatrix} 0 & \mathbf{R_{1\text{-}10}} \end{bmatrix} \end{align} We'll start by dividing $\mathbf{P}$ into submatrices. \begin{align} \mathbf{P} = \begin{bmatrix} 1 & \mathbf{T} \\ \mathbf{0} & \mathbf{Q} \\ \end{bmatrix} \end{align} Here, 1 is a single 1, representing that once we get to state 0 we stay there. As a corollary, in state 0 we don't transition to any other states, so the column vector $\mathbf{0}$ represents a set of zeros for those transition probabilities. Next, $\mathbf{Q}$ represents the one-step transition probabilities amongst the transient states 1-10. (Transient meaning that the system is only in those states temporarily as it transitions, at least with probability 1.  There is the pathological case where it could stay there forever which occurs with probability 0.) Finally, $\mathbf{T}$ represents the probabilities of transitioning in one step from the transient states 1-10 to the recurrent and absorbing state 0. (Recurrent states once reached, recur infinitely many times with probability 1. Absorbing states can never be left.)  Note that $\mathbf{T}$ is often called $\mathbf{R}$, but we've already used that, so we had to go with a different letter here. I've often seen the matrix $\mathbf{P}$ organized a bit differently, with the recurrent states at the bottom.   (The calculation we're doing is often referred to as the hitting time.)  While convention is useful, we are not constrained by it. 

Next, we just plug in these divided $\mathbf{R}$ and $\mathbf{P}$. \begin{align} \begin{bmatrix}0 & \mathbf{R_{1\text{-}10}} \end{bmatrix} &= \left ( \mathbf{J_{1\times 11}} + \mathbf{R} \right ) \times \begin{bmatrix} 1 & \mathbf{T} \\ \mathbf{0} & \mathbf{Q} \\ \end{bmatrix} \end{align} Now we'll get rid of the left entry of both sides. Note that another way of looking at the issue with the full matrix $\mathbf{P}$ is that it yielded a nonsensical equation for $\mathbb{E}R_0$. \begin{align} \mathbb{E}R_0 &= 1 + \mathbb{E}R_0 \end{align} We could fix this equation so that it's not wrong, but including $\mathbb{E}R_0 = \mathbb{E}R_0$ still doesn't let us solve for $\mathbb{E}R_0$, which we already know anyways. 

To remove the left entry of the right side of our matrix equation, we remove the left column of $\mathbf{P}$. \begin{align} \mathbf{R_{1\text{-}10}} &= \left ( \mathbf{J_{1\times 11}} + \mathbf{R} \right ) \times \begin{bmatrix} \mathbf{T} \\ \mathbf{Q} \end{bmatrix} \end{align} Now we'll break up $\mathbf{J_{1\times 11}}$ and $\mathbf{R}$ on the right side, so that we can multiply out with the remains of $\mathbf{P}$, using the rules of matrix multiplication. \begin{align} \mathbf{R_{1\text{-}10}} &= \left ( \begin{bmatrix} \mathbf{J_{1\times 1}} & \mathbf{J_{1\times 10}} \end{bmatrix} + \begin{bmatrix} 0 & \mathbf{R_{1\text{-}10}} \end{bmatrix} \right ) \times \begin{bmatrix} \mathbf{T} \\ \mathbf{Q} \end{bmatrix} \\ &=\mathbf{J_{1\times 1}} \times \mathbf{T} + \left ( \mathbf{J_{1\times 10}} + \mathbf{R_{1\text{-}10}} \right ) \times \mathbf{Q} \end{align} Now we do some algebra so that we can solve for $\mathbf{R_{1\text{-}10}}$. \begin{align} \mathbf{R_{1\text{-}10}} \times \left ( \mathbf{I} - \mathbf{Q} \right ) &= \mathbf{J_{1\times 1}} \times \mathbf{T}+ \mathbf{J_{1\times 10}} \times \mathbf{Q} \\ \mathbf{R_{1\text{-}10}} &= \left ( \mathbf{J_{1\times 1}} \times \mathbf{T}+ \mathbf{J_{1\times 10}} \times \mathbf{Q} \right ) \times \left ( \mathbf{I} - \mathbf{Q} \right ) ^ {-1} \end{align} Recall our step earlier where we used the equation $\mathbf{J_{1\times 11}} \times \mathbf{P} = \mathbf{J_{1\times 11}}$ to simplify, as each column of $\mathbf{P}$ sums to one. This remains true of the submatrix. \begin{align} \begin{bmatrix} \mathbf{J_{1\times 1}} & \mathbf{J_{1\times 10}} \end{bmatrix} \times \begin{bmatrix}\mathbf{T} \\ \mathbf{Q} \end{bmatrix} &= \mathbf{J_{1\times 1}} \times \mathbf{T} + \mathbf{J_{1\times 10}} \times \mathbf{Q}\\ &= \mathbf{J_{1\times 10}} \end{align} Recall that $\mathbf{T}$ and $\mathbf{Q}$ have 10 columns, hence the 10 columns in the last equation, which we can use to find $\mathbf{R_{1\text{-}10}}$. \begin{align} \mathbf{R_{1\text{-}10}} &= \mathbf{J_{1\times 10}} \times \left ( \mathbf{I} - \mathbf{Q} \right ) ^ {-1} \end{align} The result of this computation for 1-10 initial defending hit points is plotted in Figure 7. While a fair amount of computation is elided by including a matrix inversion, which is not always trivial, in this case standard algorithms seem to have no issue, numerical or otherwise. 

Figure 7: Calculated expected length siege by number of defending hits points

The results from the three methods can be compared by inspecting the plots in Figures 8 and 9, which show the results with and without leadership/rerolls, respectively. The first rough estimate is somewhat lower than the two later techniques, which appear indistinguishable. Both provide the very similar trends, dominated by a linear factor between initial defending hit points and expected length of the siege. 

Figure 8: Comparison of methods to find expected length of siege with rerolls

Figure 9: Comparison of methods to find expected length of siege without rerolls

It may also be interesting to compare against the methods used when we addressed the question What is the best weapon in D&D?. There we used a different approach when dealing with similar issues.

Monday, April 18, 2022

How many menus are there in Sushi Go Party!?

Sushi Go Party! includes a lot of options not available in the original Sushi Go. Players can select what on the menu\footnote{Unfortunately, there's also a menu item titled menu.} for each play. A valid menu has a set format: Nigiri is always included, as well as one of three types of roll, three out of eight types of appetizer, two out of eight types of specials, and one of three types of dessert. Each of these choices is independent, so we can multiply the number of ways to do each together to get the total number of combinations. For appetizers and specials where we're choosing more than one, we don't care about the order, what matters is what items are shuffled into the deck. Thus, we're looking for the number of combinations of each, where the number of combinations of selecting $k$ items from $n$ options, is

\begin{align} \binom{n}{k} = \frac{n!}{k! \cdot (n-k)!}. \end{align}

If you want to be able to calculate this manually without a tool with a special function, it can be helpful to do some of the cancellation yourself. We can rewrite this while reducing the amount of common terms in the numerator and the denominator.

\begin{align} \frac{n!}{k! \cdot (n-k)!} = \frac{\prod_{i=n-k+1}^n i}{k!} \end{align}

A concrete example may help.

\begin{align} \binom{8}{3} &= \frac{8!}{3! \cdot (8-3)!}\\ &= \require{cancel}\frac{8 \cdot 7 \cdot 6 \cdot \cancel{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}}{3 \cdot 2 \cdot 1 \cdot \cancel{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}} \\ &= 56 \end{align}

Applying this we get the number of combinations.

\begin{align} 1 \cdot 3 \cdot \binom{8}{3} \cdot \binom{8}{2} \cdot 3 = 14112 \end{align}

There are a couple of wrinkles. First, is that a couple of the menu items score a little bit differently depending on how many players there are, but I don't consider that a different menu.

Second, the above only applies for player counts of 3--6. When there are 2, 7, or 8 players there are some restrictions to the items included. Spoon (a special) and edamame (an appetizer) may not be used in two-player games. Thus in such cases there are fewer menu combinations.

\begin{align} 1 \cdot 3 \cdot \binom{7}{3} \cdot \binom{7}{2} \cdot 3 = 6615 \end{align}

Similarly, in seven- and eight-player games neither menu nor special order are allowed, both of which are specials.

\begin{align} 1 \cdot 3 \cdot \binom{8}{3} \cdot \binom{6}{2} \cdot 3 = 7560 \end{align}

Monday, April 11, 2022

Elchanan Mossel's Amazing Dice Paradox

I recently came across this old video from the MindYourDecisions Youtube channel by Presh Talwalkar (which I find to often be a good source for interesting problems). At first I was sure the video was wrong. Then I thought some more and realized he was right. Finally, I am convinced that he is wrong, but about something else. This post is me shaking my angry fist at the internet and since it's about rolling a die, I figured it was close enough to being about games to count.

First, let's state the problem of the video:

You roll a fair dice until you get a 6. What is the expected number of rolls, including the roll of 6, conditioned on the event that all previous rolls were even numbers?

There are two curious changes from the way the problem is stated in most other places that are linked in the video. First, most statements use the singular "die" and not "dice". This is perhaps leaning on the possible internet origin post of the problem (it is attributed as originally coming from Elchanan Mossel given to undergrad students during a class at UPenn in 2014-5) which uses "dice" in the singular, though this is changed to "die" in the follow-up post. I find this choice curious because while I do see some usage of dice in the singular (perhaps more in non-American contexts like Shut Up and Sit Down - though even they've corrected dice to die) I see no argument for discarding die as the singular. That is, I've never seen any assertion that the singular "die" is incorrect. Even the dictionary used in the video says the following:

Historically, dice is the plural of die, but in modern English dice can be both the singular and the plural: throw the dice could mean a reference to either one or more than one dice

Note the word "can" is used, not "is". This is even more clear in the phrasing on the US version instead of the UK version of the same website:

Historically, dice is the plural of die, but in modern English dice is sometimes also used in the singular: throw the dice could mean a reference to either one or more than one dice

This would leave me with little support for labeling the singular "dice" as wrong, but I still don't see why someone accustomed to the singular "die" would choose to abandon it and introduce ambiguity. I'm used to the singular "die", though, and anything else just sounds weird. If you learned "dice" and are used to it, I could see it being otherwise.

Second, and perhaps more relevantly, most statements of the problem say that all rolls are even numbers instead of all previous rolls. The video makes a big distinction between these and claims them to give different answers. In fact, it claims that the answer to the "all rolls" version of the problem is the oft-given erroneous answer of 3. This is wrong, as I will try to demonstrate here. I will focus on how the two problems are the same. There are other good sources for how to approach the main problem, like the one here.

The common wrong answer of 3 comes from the idea that if we condition on rolling even numbers then that is equivalent to rolling a three-sided die with faces labeled 2, 4, and 6. However, that is not true. Instead, we can look at the problem setup to construct all sequences that end in a 6 and exclude those that contain odd numbers. Since 6 is an even number, we will see that conditioning on rolling all even numbers is the same as rolling all even numbers before the 6.

To quickly reference why the average number of rolls if we rolled a three-sided die until we got a particular side, we can look to our previous work with Memoir '44, where we found that the expected number of rolls needed to achieve a roll with probability $p$ is $1/p$. In this case the probability is $1/3$, so the expected number of rolls, $\mathbb{E}N_3$ is $1/(1/3)=3$. (The notation $N_3$ uses the subscript 3 because there are three sides to the die here.) We can find this by starting by the probability of getting a roll of length $n$,

\begin{align} P(N_3=n) = \left ( \frac{2}{3} \right ) ^{n-1} \cdot \frac{1}{3}, \end{align}

with an often-useful shortcut to computing the expected value for non-negative integer-valued random variables,

\begin{align} \mathbb{E} N = \sum_{n=0}^\infty P(N > n). \end{align}

There are more details in the previous post on the toughest unit in Memoir '44.

Let's start by looking at sequences of six-sided die rolls of length 1 that end in a 6. Actually, there's just one, as shown in Table 1. There I've annotated it with a * to indicate that it meets our conditioning criteria of containing only even numbers. All $1/1$ of the sequences are allowed.

\begin{align*} \begin{array}{c} 6 * \end{array} \end{align*} 
Table 1: Sequences of length 1

Now, let's look at sequences of length 2, where only $2/5$ are allowed.

\begin{align*} \begin{array}{ccccc} 16 \phantom{*} & 26 * & 36\phantom{*} & 46 * & 56\phantom{*} \end{array} \end{align*} 
Table 2: Sequences of length 2

Finally, let's look at sequences of length 3, where $4/25$ are allowed. We'll stop here lest the number of sequences gets too cumbersome.

\begin{align*} \begin{array}{ccccc} 116 \phantom{*} & 216 \phantom{*} & 316 \phantom{*} & 416 \phantom{*} & 516 \phantom{*} \\ 126 \phantom{*} & 226 * & 326 \phantom{*} & 426 * & 526 \phantom{*} \\ 136 \phantom{*} & 236 \phantom{*} & 336 \phantom{*} & 436 \phantom{*} & 536 \phantom{*} \\ 146 \phantom{*} & 246 * & 346 \phantom{*} & 446 * & 546 \phantom{*} \\ 156 \phantom{*} & 256 \phantom{*} & 356 \phantom{*} & 456 \phantom{*} & 556 \phantom{*} \end{array} \end{align*} 
Table 3: Sequences of length 3

It looks like for a sequence of length $n$, then $2^{n-1}$ of the $5^{n-1}$ sequences are allowed under the conditioning. This disproportionately removes long sequences, which is why the average is so short. You can think of it this way: the longer the sequence, the more likely it is to have at least one 1,3, or 5 before conditioning. The probability of a sequence being length $n$ is the probability that the first 6 was on roll $n$. Thus, all rolls before this must be something other than 6. The probability of this happening on each roll is $5/6$, while the probability of getting a 6 in round $n$ is $1/6$. Thus, if $N_6$ is a random variable equal to the length of the sequence of rolling a six-sided die, then

\begin{align} P(N_6=n) = \left ( \frac{5}{6} \right) ^{n-1} \cdot \frac{1}{6}. \end{align}

However, if we only allow sequences with even numbers, then there are only two valid choices for each roll before the last, thus the individual roll probability changes from $5/6$ to $2/3$ and the sequence probability changes accordingly. With only even rolls allowed, there are three options instead of six and two of them instead of five continue the sequence to be longer.  (Heads up for skimmers: this part is wrong and will be corrected later.)

\begin{align} P(N_6=n \, | \, \text{even rolls}) &= \left ( \frac{2}{3} \right) ^{n-1} \cdot \frac{1}{6} \qquad (1) \end{align}

Note that this probability is half of the probability $P(N_3=n)$ from above.

\begin{align} P(N_6=n \, | \, \text{even rolls}) &= \frac{1}{2} P(N_3=n) \end{align}

Using the definition of expected value for discrete-valued random variables,

\begin{align} \mathbb{E} N = \sum_n n \cdot P(N=n), \end{align}

we can find the expected value of $N_6$ given all even rolls.

\begin{align} \mathbb{E} \left [ N_6 \, | \, \text{even rolls} \right ] &= \sum_{n=1}^\infty n \cdot P(N_6=n \, | \, \text{even rolls}) \\ &=\sum_{n=1}^\infty n \cdot \frac{1}{2} \cdot P(N_3=n) \\ &=\frac{1}{2} \cdot \underbrace{\left (\sum_{n=1}^\infty n \cdot P(N_3=n) \right )}_{\mathbb{E} N_3}\\ &= \frac{1}{2} \cdot 3 \\ &= 1.5 \end{align}

That's a little hand-wavy though, so let's take this step slower. We're interested in conditional expectation, and thus conditional probability.

\begin{align} \mathbb{E}\left [N \, | \, A \right ] = \sum_n n \cdot P(N=n \, | \, A) \end{align}

Here $A$ is some generic event. Recall that conditional probability is given by the following,

\begin{align} P(N=n \, | \, A) = \frac{P( N=n \cap A)}{P(A)}, \end{align}

where $\cap$ means and, that is that both the thing on its left and its right are true. For an individual roll that continues the sequence (the roll doesn't equal 6), which we'll label as event $C$ for continuing, the conditional probability is as follows.

\begin{align} P(C \, | \, \text{even roll}) &= \frac{P(C \cap \text{even roll})}{P(\text{even roll})} \\ &=\frac{\;\cfrac{2}{6}\;}{\;\cfrac{3}{6}\;} \\ &=\frac{2}{3} \end{align}

Now we apply this to the probability of the final roll. We'll say the probability of ending is $P(E)$.

\begin{align} P(E \, | \, \text{even roll}) &= \frac{P(E \cap \text{even roll})}{P(\text{even roll})} \\ &=\frac{\;\cfrac{1}{6}\;}{\;\cfrac{3}{6}\;} \\ &=\frac{1}{3} \end{align}

This is not what we want. I've brought us to the wrong answer of 3!  (I haven't quite gotten there, but I think you can take the next few steps.)

At this point, I think it may be useful to acknowledge one of the commenters to the video for helping me to realize what was going on. I was convinced that the answer should be the same for the "previous" case as the all evens case and didn't believe that 1.5 was the correct answer.  Marce Villarino posted some python code that quickly convinced me that 1.5 must be right, but also helped me see that the two cases are actually the same. The code follows.

##### Record successive thows of a six sides die, untill a 6 is thrown.

##### If the record does not contain even number, record it's length.

import random

import time

def calcOnlySixStops(howManyTimes):

result = list()

start_time = time.time()

for i in range(howManyTimes):

a = list()


while a[-1] != 6:


if not((1 in a) or (3 in a) or (5 in a)):


print("--- %s seconds ---" % (time.time() - start_time))

return result

lolailo = calcOnlySixStops(1000000)


####### about 1.5

####### it takes ca. 43 seconds on my laptop

##### Same, but throw a three sided die.

def calcThreeSidedDie(howManyTimes):

result = list()

start_time = time.time()

for i in range(howManyTimes):

a = list()


while a[-1] != 6:



print("--- %s seconds ---" % (time.time() - start_time))

return result

lolailo = calcThreeSidedDie(1000000)


####### abt 3.0

####### it takes ca. 19 s on my laptop.

As you can see from the annotated results, or if you run it yourself, it shows that the expected value is approximately 1.5. It uses a Monte Carlo approach with a million trials, which seems sufficient here.  Also, if you look through the algorithm to condition on the previous rolls being even, you'll see that it's exactly the same as what you would do if you were to condition on all rolls being even.

The key error that we made is the same one to get the wrong answer most of the time, I think, just a little drawn out. Note that we conditioned on individual even rolls, not all rolls being even. That's the wrong event. If we want to take that approach, we need to condition the entire $P(N_6=n)$ on getting even rolls.  (We could also use the individual pieces, but we still must condition on getting all even rolls, not just the one under consideration at that moment.)

\begin{align} P(N_6=n \, | \, \text{even rolls}) &= \frac{P(N_6 =n \cap \text{even rolls})}{P(\text{even rolls})} \end{align}

The numerator is relatively easy to figure out. Based on what we saw above, we find that

\begin{align} P(N_6 =n \cap \text{even rolls}) = \left ( \frac{2}{6} \right)^{n-1} \cdot \frac{1}{6} \end{align}

To get this we we took the unconditioned probability of getting a certain number of rolls and combined it with the fraction of the rolls that meet our all evens criteria.  But what is the probability of even rolls? Note that this isn't the probability of getting even rolls given a length, it's the overall probability. Here it may be useful to turn things around.

\begin{align} P(\text{even rolls}) &= \sum_n P(\text{even rolls} \, | \, N_6=n) \cdot P(N_6=n) \\ &= \sum_n P(\text{even rolls} \cap N_6=n) \end{align}

Since we already know this joint probability from above, we can plug it in.

\begin{align} P(\text{even rolls}) &= \sum_{n=1}^\infty \left ( \frac{2}{6} \right)^{n-1} \cdot \frac{1}{6}\\ &= \frac{1}{6} \cdot \sum_{n=1}^\infty \left ( \frac{1}{3} \right)^{n-1} \end{align}

The above includes a geometric series. We can adjust the range and the exponent to make it look more familiar

\begin{align} P(\text{even rolls}) &= \frac{1}{6} \cdot \underbrace{\sum_{n=0}^\infty \left ( \frac{1}{3} \right)^{n}}_{\cfrac{1}{1-\cfrac{1}{3}}} \\ &= \frac{1}{6} \cdot \frac{3}{2}\\ & = \frac{1}{4} \end{align}
Does this number make sense? At first, it may seem like too few of the sequences. But consider that on each roll, we throw out half of the results. That is, every time we roll the die half of the results are excluded.

Now we have the pieces we need and we can find the average number of rolls.

\begin{align} P(N_6=n \, | \, \text{even rolls}) &=\frac{P(N_6 =n \cap \text{even rolls})}{P(\text{even rolls})} \\ &= \frac{\left ( \cfrac{2}{6} \right)^{n-1} \cdot \cfrac{1}{6}}{\cfrac{1}{4}} \\ P(N_6=n \, | \, \text{even rolls}) &=\left(\frac{1}{3} \right )^{n-1} \cdot \frac{2}{3} \label{eq:myd_2} \qquad (2)\end{align}

Leveraging our previous work with Memoir '44, this has the same form as asking how long it takes to get the first result, when the result occurs with probability $p=2/3$. Thus,

\begin{align} \mathbb{E} \left [ N_6 \, | \, \text{even rolls} \right ] &= \frac{1}{2/3}\\ &=1.5 \end{align}
Note that here we found that 
 \begin{align} P(N_6=n \, | \, \text{even rolls}) &= \left(\frac{1}{3} \right )^{n-1} \cdot \frac{2}{3}  \end{align}

which contradicts our earlier hand-wavy equation. Which one is right? Well, I generally trust more rigorous derivations, so I'm inclined towards the later. Interestingly, they both give the correct average. Let's go back to our simulation and look at not just the average, but also the distribution of lengths given all even rolls. By adding a calculation of the percentage of sequences of each length to the above simulation, we get the results in Table 4.

\begin{align*} \begin{array}{cccc} \text{Sequence length} & \text{Equation (1)} & \text{Equation (2)} & \text{Estimated probability} \\ \hline 1 & 0.1667 & 0.6667 & 0.6665 \\ 2 & 0.1111 & 0.2222 & 0.2226 \\ 3 & 0.0741 & 0.0741 & 0.0740 \\ 4 & 0.0494 & 0.0247 & 0.0241 \\ 5 & 0.0329 & 0.0082 & 0.0084 \\ 6 & 0.0219 & 0.0027 & 0.0029 \\ 7 & 0.0146 & 0.0009 & 0.0009 \\ 8 & 0.0098 & 0.0003 & 0.0004 \\ 9 & 0.0065 & 0.0001 & 0.0001 \end{array} \end{align*} Table 4: Simulated and theoretical sequence length conditional probabilities

As we can see, this lines up well with our second, revised, equation (2). Interestingly, equation (1) looks like it should have a higher expected number of rolls, as the probability of having sequences of length 1 and 2 is lower than simulated while the probability of sequences longer than 3 is higher. An observation that is useful to recall at this point is that equation (1) gave us that $P(N_6=n \, | \, \text{even rolls})$ is half of $P(N_3=n)$. However, further reflection indicates that that doesn't make sense as the sum of the probability over the whole sample space must be 1. The probability that there is some outcome is 1.

\begin{align} \sum_n P(N=n) = 1 \end{align}

If this is true of some function $f(n)$ such that $P(N=n)=f(n)$ it cannot be true of some other function $g(n) = f(n)/2$. That is, we should find that our incorrect equation gives the sum to be 0.5.

\begin{align} \sum_{n=1}^\infty \left ( \frac{2}{3} \right) ^{n-1} \cdot \frac{1}{6} & = \frac{1}{6} \cdot\underbrace{\sum_{n=0}^\infty \left ( \frac{2}{3} \right) ^n}_{\cfrac{1}{1-\cfrac{2}{3}}}\\ &= \frac{1}{6} \cdot 3 \\ & = \frac{1}{2} \end{align}

Indeed, that is the case. This is confirmation that equation (1) is incorrect.

Again, note that there is no sequence that would be included or excluded differently if we were to emphasis that we conditioned on rolling all even numbers before the 6. The two descriptions are logically equivalent and have the same expected number of rolls.

Monday, April 4, 2022

How often does each number come up in Space Base?

 In Space Base, every turn the active player rolls two six-sided dice. Each player then chooses to treat these as separate numbers or first sum them together and get the corresponding rewards or reward, respectively. Thus, in some sense, the question is not answerable, because it depends on each player's choice. The rulebook does a decent job on laying this out and speaking somewhat about the probabilities.

We previously found the probability of rolling each result on 2d6 for Catan, so we won't revisit that. Is there a good way to layer on top of that the idea that we can take either the sum of the dice or the results individually? The rulebook makes one choice. A quick glance at the table there may be misleading, as it shows the distribution of a regular 2d6 roll with die faces, but then show the numbers for a combination of sum and individual results.

If we want to get a single probability for how often each number comes up, we have to make a choice of how to handle the case where we roll doubles. Should we count that as two occurrences of the single die value or just one? It depends on what we care about. If we're trying to figure out the average return then I think it makes sense to choose, as the rulebook did, to count that as two results. You can see that in the number of times that they list a 1 showing up: 12 out of the 36 possible results of rolling two 6-sided dice, as shown in Table 1. Note that one of these times is when both dice are rolled together. Thus, there are only 11 rolls that produce these 12 ones. While the double roll makes up for there being only 11 outcomes that include a 1 being rolled in the average reward over many rolls, there may be cases where we are more interested in how many rolls include a given number and don't benefit from rolling doubles. In Space Base there are some cards that gain a charge cube when rolled, but can only hold one charge cube. Thus, rolling doubles of that number can be worse overall, as we don't have access to rewards from a second number.

\begin{align*} \begin{array}{c|cccccc} & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline 1& {1},{1} & {1},2 & {1},3 & {1},4 & {1},5 & {1},6\\ 2& 2,{1} & 2,2 & 2,3 & 2,4 & 2,5 & 2,6\\ 3& 3,{1} & 3,2 & 3,3 & 3,4 & 3,5 & 3,6\\ 4& 4,{1} & 4,2 & 4,3 & 4,4 & 4,5 & 4,6\\ 5& 5,{1} & 5,2 & 5,3 & 5,4 & 5,5 & 5,6\\ 6& 6,{1} & 6,2 & 6,3 & 6,4 & 6,5 & 6,6\\ \end{array} \end{align*}

Table 1: Rolling ones on two 6-sided dice

We also need to choose how to handle the option to take the sum of the dice or the individual values. The rulebook includes both at full weight. That is, out of the 36 possible results of rolling two dice, it includes all the times that a 6 is rolled on individual dice as well as all the times that the sum of the two dice is 6. If this were the only number for which we received rewards, or if it were significantly better than our other rewards, then this makes sense, as we'd always choose 6 when it's an option. However, in many cases it could be that getting both a 2 and a 4, for example, is better than just getting the sum of 6. That may be an obvious choice or a difficult one. A way to handle this is to say that we don't know which way players are going to choose (though this may vary with strategy) and treat it as a random event with each option having equal probability. We can think of this as reducing the number of times each number comes up by a factor of 2.

Fortunately, we are spared some complexity by the fact that the sum and individual numbers are always different, since each die has a minimum value of 1. Otherwise, we'd have to consider what to do when the sum matches a single die. Actually, I think we'd always choose the single values, unless it was to use an ability to adjust the number that depends on using the sum. However, we'd still have to account for it in the math to avoid double counting.

There's also a choice of how to combine these two factors. I see two natural pairings. First, if looking at the average amount of return, I think it makes sense to assume that the sum and individual values are each used half the time. This gives something like a weighted probability that you get the rewards from a given number. (I say weighted, because it may sometimes involve double rewards, as discussed above.) Second, if looking at how often something is available, it makes sense to count double rolls as single, but include both individual values and the sum at full weight. This is essentially answering how many turns a given number is an option.

Those both give the middling option for the probabilities. Interestingly, the rulebook went with neither and instead prints the more optimistic combination of counting a double value as two result and including both individual numbers and sums. This does make sense if you're looking at the average reward of a number that is strictly better than everything else, but it ignores the counterfactual or opportunity cost of not choosing the other option. That is, if you're choosing between the sum 5 and the individual numbers 2 and 3, if you're looking at 5, the benefit of what's there has to be compared to the alternative of the rewards for both 2 and 3.

I suppose I should say something about the last of the four options: what does pairing no doubles with random sum or individual? It seems appropriate to use when you only get benefit from a single reward per turn, but it's a tossup as to whether the number would be chosen given the other choices.

\begin{align*} \begin{array}{c|cccc} \text{Method} & 1 & 2 & 3 & 4 \\ \hline P(\text{sum}) = P(\text{individual}) & 0.5 & 0.5 & 1 & 1 \\ \text{# single} & 11 & 12 & 11 & 12 \\ \hline 1 & \frac{5.5}{36} & \frac{6}{36} & \frac{11}{36} & \frac{12}{36}\\ 2 & \frac{6}{36} & \frac{6.5}{36} & \frac{12}{36} & \frac{13}{36}\\ 3 & \frac{6.5}{36} & \frac{7}{36} & \frac{13}{36} & \frac{14}{36}\\ 4 & \frac{7}{36} & \frac{7.5}{36} & \frac{14}{36} & \frac{15}{36}\\ 5 & \frac{7.5}{36} & \frac{8}{36} & \frac{15}{36} & \frac{16}{36}\\ 6 & \frac{8}{36} & \frac{8.5}{36} & \frac{16}{36} & \frac{17}{36}\\ 7 & \frac{3}{36} & \frac{3}{36} & \frac{6}{36} & \frac{6}{36}\\ 8 & \frac{2.5}{36} & \frac{2.5}{36} & \frac{5}{36} & \frac{5}{36}\\ 9 & \frac{2}{36} & \frac{2}{36} & \frac{4}{36} & \frac{4}{36}\\ 10 & \frac{1.5}{36} & \frac{1.5}{36} & \frac{3}{36} & \frac{3}{36}\\ 11 & \frac{1}{36} & \frac{1}{36} & \frac{2}{36} & \frac{2}{36}\\ 12 & \frac{0.5}{36} & \frac{0.5}{36} & \frac{1}{36} & \frac{1}{36}\\ \end{array} \end{align*}

Table 2: Probability of reward in Space Base

Friday, April 1, 2022

How many unique starting spice counts are there in Century: Spice Road?

Suppose you want all players to start with a unique setup in terms of the number of each of the four spices, with a total number of spices equal to 0 to the maximum 10 in a caravan. How many players could you have?

A first guess would be that each of the 10 caravan positions can have one of five possible values (one of the four spices or no spice). Thus, we might think that there are $5^{10}=9765625$ setups. However, as we don't distinguish, for example, one yellow (turmeric) and one red (saffron) from one red and one yellow, this dramatically over counts the number of unique setups.

One approach to dealing with this type of problem is to always order the spices in a certain way. Imagine the 10 caravan spaces laid out in a line, starting with brown (cinnamon), then green (cardamom), red (saffron), yellow (turmeric), and finally no spice. We can express each unique setup in terms of the number of each spice. Equivalently, we may note the transition points. For example, instead of focusing on 3 brown cubes, 2 green cubes, 1 red cube, 3 yellow cubes, and 1 blank space, we can look at the transitions after positions 3, 5, 6, and 9. In terms of numbers, this hasn't helped us much, though we have reduced down from five parameters to four (we could have done this other ways, since we know there are 10 total spots for cubes, so the initial numbers must sum to 10). However, if we look at the nine possible transition points between the ten caravan locations, we can see that each unique setup is a unique selection of four of those nine possible transition points. Selecting a subset from a larger set is a problem we know how to solve with our handy combination or binomial coefficient,

\begin{align} \binom{n}{k} = \frac{n!}{k! \cdot (n-k)!}, \end{align}
said as "n choose k".

However, that's not quite right. Let's consider what happens when we have none of one type of the spices. That means that two of our transition points overlap, which isn't accounted for when always choosing four transition points. We could handle this by adding up different scenarios, but that's ugly and boring. So while it could give us the answer, let's find something more elegant.

We're trying to count the number of ways to take 10 of something and allocate it into 5 buckets, where each bucket can hold 0 through 10. We can add one into each of the buckets and not change the situation. Since there are five buckets, that means adding five to the number of possible transition points. That is, the number of ways to take 15 of something and allocate it into 5 buckets where each bucket can hold 1 through 11 is the same. Thus, we can take $n+5-1$ choose $4$,

\begin{align} \binom{10+5-1}{4} = \binom{14}{4} = \frac{14!}{4! \cdot 10!} = 1001. \end{align}

As a check for this method, we can look at a simpler case, where it's easier to count or come up with the number in another way. Suppose there are only 2 spots on the caravan. Our formula above says there should be $\binom{2+5-1}{4} = \binom{5}{4} = 15$ setups. With only 2 caravan spots, there are only two types of pattern: two of one spice (including no spice) and one each of two different spices. For the two of one there are 5 ways to do this, and for the one of each, there are $\binom{5}{2} = 10$ ways of doing it. Summing these two ways we get 15. This isn't a perfect check, and it's probably more useful to use to think about the problem and how it works.