Monday, June 29, 2020

Which die is highest?

You roll a d6, d8, d12, and d20 all at the same time. What is the probability the d6 results in the highest number? d8? d12? d20?
First, let's clarify what we mean by "highest number". Does that mean that the d6 is greater than all the other dice, or just that no other die is greater? In other words, how do we count ties? Let's not count ties for now, and deal with them last. With four dice, they will be the trickiest part. (This is due to the number of cases involved. There could be 2, 3, or 4 dice tied. Or two sets of 2, for that matter!)

The faces of each of these dice are numbered 1 through the number of sides that it has. Each side has equal probability of occurring. If we label the results of the four dice as random variables $X_6$, $X_8$, $X_{12}$, and $X_{20}$, where the subscript refers to the number of sides of the die, then we can write the statement of equal probability in the following equation.
\begin{align}
P(X_N = x) = \begin{cases}
\frac{1}{N} &\qquad x \in \mathbb{W}, 1 \leq x \leq N \\
0 &\qquad \text{else}
\end{cases}
\end{align}
Here $\mathbb{W}$ is the set of all whole numbers (i.e. 0, 1, 2, 3, etc.). Recall that the probability of rolling less than a given number on such a die is proportional to one less than that number,
\begin{align}
P(X_N < x) = \frac{x-1}{N} &\qquad x \in \mathbb{W}, 1 \leq x \leq N .
\end{align}
But perhaps I'm putting the cart before the horse. How do we approach this problem? We can write the question as,
\begin{align}
P(X_6 > \max(X_8, X_{12}, X_{20})) = ?
\end{align}
This is not a great way to write it though. It tends to make us think that we should compute the distribution of the maximum term $\max(X_8, X_{12}, X_{20})$. We can instead write the events separately,
\begin{multline}
P(X_6 > \max(X_8, X_{12}, X_{20})) \\= P((X_6 > X_8) \cap (X_6 > X_{12}) \cap (X_6 > X_{20}))
\end{multline}
This means we are looking for the probability that $X_6 > X_8$ and $X_6 > X_{12}$ and $X_6 > X_{20}$. Unfortunately, these events are not independent, so we cannot break up the intersection easily. How do we know that they are not independent? The requirement for independence is that knowing one event does not affect the probability of the other; it holds that $P(A | B) = P(A)$ if $A$ and $B$ are independent. Consider the case when $X_6 > X_{20}$. If that holds, then $X_6$ is more likely to be high, which in turn makes $X_6 > X_8$ more likely.

Instead of using independence, let's consider that we roll the d6 first. All the rolls are independent, so this doesn't affect the probabilities. We can then consider the conditional probability,
\begin{multline}
P(( (X_6 > X_8) \cap (X_6 > X_{12}) \cap (X_6 > X_{20}) ) | X_6 = x) \\= P((x > X_8) \cap (x > X_{12}) \cap (x > X_{20})).
\end{multline}
Here, the three events are independent, since they depend on independent random variables, thus,
\begin{multline}
P((x > X_8) \cap (x > X_{12}) \cap (x > X_{20})) \\= \underbrace{P(x > X_8)}_{\frac{x-1}{8}} \cdot \underbrace{P(x > X_{12})}_{\frac{x-1}{12}} \cdot \underbrace{P(x > X_{20})}_{\frac{x-1}{20}}.
\end{multline}
We can then fill in the individual terms using a previous equation as shown above.
\begin{align}
P((x > X_8) \cap (x > X_{12}) \cap (x > X_{20})) = \frac{(x-1)^3}{8 \cdot 12 \cdot 20}.
\end{align}
We can then use the conditional probability to compute the total probability, using the following form,
\begin{align}
P(A) = \sum_{x \in \Omega} P(A | X=x) \cdot P(X=x),
\end{align}
where $A = (X_6 > X_8) \cap (X_6 > X_{12}) \cap (X_6 > X_{20})$ is an event and $X = X_6$ is a random variable. Note that this is an extension of the basic statement of conditional probability, namely that,
\begin{align}
P(A \cap B) = P(A|B) \cdot P(B),
\end{align}
where $A$ and $B$ are events. Consider a set of mutually exclusive events $B_i$, the union of which encompasses the entire sample space. That is,
\begin{align}
&B_i \cap B_j = \varnothing, \quad i \neq j \\
&\bigcup_i B_i = \Omega .
\end{align}
Suppose we take the intersection of our event $A$ with the whole sample space. The resulting event is equivalent to the event $A$,
\begin{align}
A = A \cap \Omega ,
\end{align}
and thus the probabilities are the same,
\begin{align}
P(A \cap \Omega) = P(A | \Omega) \cdot P(\Omega) = P(A) .
\end{align}
Now, by substituting the union of our events $B_i$ for $\Omega$, we find a new way to write the probability of $A$.
\begin{align}
P(A) & = P(A \cap \Omega) \\
&= P\left (A \cap \left (\bigcup_i B_i \right ) \right ) \\
&=P\left( \bigcup_i \left( A \cap B_i \right) \right)
\end{align}
The last line above distributes the intersection across all the terms in the union. This is analogous to the distributive property of multiplication. That is, treat intersection like multiplication and union like addition when rearranging. A simple statement of the idea is that
\begin{align}
A \cap (B \cup C) = (A \cap B) \cup (A \cap C).
\end{align}
We know that all the sets $A \cap B_i$ are disjoint, since the $B_i$ are disjoint by construction. The probabilities of disjoint events add when considering their union, thus,
\begin{align}
P(A) &=P\left( \bigcup_i \left( A \cap B_i \right) \right) \\
&=\sum_i P(A\cap B_i).
\end{align}
Now, we can use our earlier statement of conditional probability to rewrite this as,
\begin{align}
P(A) &=\sum_i P(A\cap B_i) \\
&=\sum_i P(A | B_i) \cdot P(B_i).
\end{align}
By defining the event $B_i$ as when $X = x$, where $x \in \mathbb{Z}$, for some mapping of $i$ onto $x$ (or should it be vice-versa?), we get the equation we are looking to prove. QED.

Now we can move on with the problem at hand, and compute the probability we are interested in.
\begin{align}
P((X_6 > X_8) \cap (X_6 > X_{12}) \cap (X_6 > X_{20})) &= \sum_{x=1}^6 \frac{(x-1)^3}{8 \cdot 12 \cdot 20}\cdot \frac{1}{6} \\
&\approx 0.0195
\end{align}
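We can check this arithmetic with a few lines of Python, evaluating the sum exactly with the standard library's fractions module. This is just a sanity-check sketch of the formula above.

```python
from fractions import Fraction

# P(d6 strictly highest): condition on the d6 roll x, then multiply
# the independent probabilities that each other die is below x.
p_d6_highest = sum(
    Fraction(1, 6) * Fraction(x - 1, 8) * Fraction(x - 1, 12) * Fraction(x - 1, 20)
    for x in range(1, 7)
)
print(p_d6_highest, float(p_d6_highest))  # 5/256 ≈ 0.0195
```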
We can similarly compute the other probabilities, but there is a slight wrinkle here. We tacitly benefited from the fact that the d6 has the lowest maximum. If we instead look at the probability that the d20 is highest, we have to consider that the d20 can roll a number not possible on the other dice. This shows up when we use the earlier equation for $P(X_N < x)$ outside the range we stated. We can extend it as follows.
\begin{align}
P(X_N < x) = \begin{cases}
\hfill 0 \hfill & \, x < 1 \\
\hfill \cfrac{x-1}{N} \hfill &\, x \in \mathbb{N}, 1 \leq x \leq N \\
\hfill 1 \hfill & \, x > N
\end{cases}
\end{align}
Focusing on the probability that the d8 is highest, there are thus two cases to consider: when the d8 rolls up to 6, and when it rolls above 6.
\begin{align}
P((x > X_6) \cap (x > X_{12}) \cap (x > X_{20})) = \begin{cases}
\cfrac{(x-1)^3}{6 \cdot 12 \cdot 20}&\, x \leq 6 \\ & \\
\cfrac{(x-1)^2}{12 \cdot 20} &\, x > 6
\end{cases}
\end{align}
From this we can compute the probability that the d8 is higher than all other dice using two summations.
\begin{multline}
P((X_8 > X_6) \cap (X_8 > X_{12}) \cap (X_8 > X_{20})) \\= \left (
\sum_{x=1}^6 \frac{(x-1)^3}{6 \cdot 12 \cdot 20}
+ \sum_{x=7}^8 \frac{(x-1)^2}{12 \cdot 20}
\right ) \cdot \frac{1}{8}
\end{multline}
\begin{align}
P((X_8 > X_6) \cap (X_8 > X_{12}) \cap (X_8 > X_{20}))&\approx 0.0638
\end{align}
Computing the probabilities for the d12 and d20 is similar, except that there are even more ranges.
\begin{multline}
P((X_{12} > X_6) \cap (X_{12} > X_{8}) \cap (X_{12} > X_{20})) \\= \left (
\sum_{x=1}^6 \frac{(x-1)^3}{6 \cdot 8 \cdot 20}
+ \sum_{x=7}^8 \frac{(x-1)^2}{8 \cdot 20}
+ \sum_{x=9}^{12} \frac{(x-1)}{20}
\right ) \cdot \frac{1}{12}
\end{multline}
\begin{align}
P((X_{12} > X_6) \cap (X_{12} > X_{8}) \cap (X_{12} > X_{20}))&\approx 0.222
\end{align}
\begin{multline}
P((X_{20} > X_6) \cap (X_{20} > X_{8}) \cap (X_{20} > X_{12})) \\= \left (
\sum_{x=1}^6 \frac{(x-1)^3}{6 \cdot 8 \cdot 12}
+ \sum_{x=7}^8 \frac{(x-1)^2}{8 \cdot 12}
+ \sum_{x=9}^{12} \frac{(x-1)}{12}
+ \sum_{x=13}^{20} 1
\right ) \cdot \frac{1}{20}
\end{multline}
\begin{align}
P((X_{20} > X_6) \cap (X_{20} > X_{8}) \cap (X_{20} > X_{12}))&\approx 0.622
\end{align}
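Since there are only $6 \cdot 8 \cdot 12 \cdot 20 = 11520$ equally likely outcomes, we can also verify all four results by brute-force enumeration. Here's a minimal sketch in Python:

```python
from itertools import product

dice = [6, 8, 12, 20]
outcomes = list(product(*(range(1, n + 1) for n in dice)))

for i, n in enumerate(dice):
    # Count outcomes where die i strictly exceeds every other die.
    wins = sum(1 for rolls in outcomes
               if all(rolls[i] > r for j, r in enumerate(rolls) if j != i))
    print(f"P(d{n} strictly highest) = {wins / len(outcomes):.4f}")
```

This prints approximately 0.0195, 0.0638, 0.2221, and 0.6221, matching the sums above.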
Now, let's consider the probability of a tie. If we just lump all ties together, then we can compute the probability relatively easily by looking at the complementary event: there being no tie.
\begin{align}
P(\text{tie}) = 1 - P(\text{no tie})
\end{align}
We can construct the case when there is no tie. Consider rolling the d6 first. Any face is allowed. Now consider rolling the d8. All but one of the faces are allowed; if we roll the same as on the d6 we have a tie. Similarly for the d12, all but two faces are allowed, since the d6 and d8 results are distinct. For the d20, all but three are allowed.
\begin{align}
P(\text{no tie}) &= \frac{6}{6} \cdot \frac{7}{8} \cdot \frac{10}{12} \cdot \frac{17}{20}\\
&\approx 0.620\\
P(\text{tie}) &\approx 1 - 0.620 \\
&\approx 0.380
\end{align}
Note that this is not equal to the probability that no die is strictly higher than all the others, which is what remains after subtracting the four probabilities computed above.
\begin{align}
P(\text{tie for highest}) &= 1 - P(\text{d6 strictly higher}) \\
&\qquad -P(\text{d8 strictly higher}) \\
&\qquad -P(\text{d12 strictly higher}) \\
&\qquad- P(\text{d20 strictly higher}) \\
&\approx 1 - 0.0195 - 0.0638 - 0.222 - 0.622 \\
&\approx 0.0727
\end{align}
This is because many, most actually, of the ties are between dice which are not amongst the highest. For example, the dice may roll 6, 6, 10, 15.
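The same kind of enumeration can confirm both tie probabilities. A short, self-contained sketch:

```python
from itertools import product

outcomes = list(product(range(1, 7), range(1, 9), range(1, 13), range(1, 21)))
total = len(outcomes)

# Any tie: fewer than four distinct values among the four dice.
any_tie = sum(1 for rolls in outcomes if len(set(rolls)) < 4)
# Tie for highest: the two largest values are equal.
tie_top = sum(1 for rolls in outcomes if sorted(rolls)[-1] == sorted(rolls)[-2])

print(f"P(tie)             = {any_tie / total:.3f}")   # ≈ 0.380
print(f"P(tie for highest) = {tie_top / total:.4f}")   # ≈ 0.0727
```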

(I originally found this question here on BGG.)

Tuesday, June 23, 2020

Errata: How replayable is Onitama using the 16 move cards in the base game?


In our earlier discussion of Onitama, there was an error in the calculation of the number of sets of five cards which have an asymmetric card without its mirror, which we labeled $n_3$, helpfully pointed out by a reader in the comments.  On reflection, the error is somewhat obvious, as we computed that there are more of these cases than the total number of ways of selecting five cards, $n_1$.
\begin{align}
n_1 &= 4368 \\
n_3 &= 8008 \\
n_3 &> n_1
\end{align}
We avoided spotting the error because we subtracted half of $n_3$ from $n_1$, and since $n_1 > n_3/2$, we still got a positive number.  However, a simple check of the scale of these two numbers would have indicated the error.

But what is the error?  It's that we double counted many cases (I really should be saying I here, as I made the error, and you're following along at home probably spotting my error!).  Recall the construction of $n_3$.  We selected one of the asymmetric cards, and then allowed the remaining cards to be any of the cards except the mirror of the first card.  However, that includes cases where there is an additional unmatched asymmetric card.  Those cases get counted again when we get to picking the unmatched card as the first card.

Correcting this gets messy.  I'd still like to largely follow the flow that I used originally, so let's just recompute $n_3$ correctly.  Unfortunately, the best way that I've been able to come up with involves summing over every possible number of unmatched asymmetric cards as well as looking at every possible number of matched asymmetric cards.  The crux is this: if we include an asymmetric card we must know whether or not its pair is included.  Given the number of unmatched asymmetric cards, $u$, and the number of matched pairs, $m$, we can find the number of ways to come up with a valid selection.  First, there are $\binom{8}{u}$ ways to select $u$ unmatched asymmetric cards out of the total of 8, with $u$ from 1 to 4.  The maximum is 4, because there are only 4 mirror pairs of asymmetric cards, so we can't get 5 unmatched ones.  A fifth card would have to match.  That leaves us with $16-u$ cards, but only $4-u$ pairs of matched asymmetric cards ($8-2 \cdot u$ cards in total).  This means there are $\binom{4-u}{m}$ ways to select the matched pairs.  If $u \geq 4$, then there's no space for matched pairs.  In general, we have to keep the total number of matched and unmatched cards to 5 or less, so
\begin{align}
u + 2\cdot m \leq 5 .
\end{align}
This limits $m$ to satisfy,
\begin{align}
m \leq \frac{5-u}{2}
\end{align}
From the binomial coefficient alone, we can see that $m \leq 4-u$, but the above requirement is more restrictive here, since $m$ must be an integer.
\begin{align}
\begin{array}{c|c|c}
u &4-u&\frac{5-u}{2}\\\hline
1&3&2\\
2&2&1.5\\
3&1&1\\
4&0&0.5\\
\end{array}
\end{align}
This leaves a remaining $5-u-2 \cdot m$ cards to be chosen from the remaining 8 symmetric cards, for which there are $\binom{8}{5-u-2 \cdot m}$ ways to select them.  Putting this together, we get the following when we sum over the ranges computed above.
\begin{align}
n_3 &= \sum_{u=1}^4 \sum_{m=0}^{\lfloor \frac{5-u}{2} \rfloor} \binom{8}{u} \cdot \binom{4-u}{m} \cdot \binom{8}{5-u-2 \cdot m} \\
n_3 &= 5456
\end{align}
Unfortunately, I've made another mistake, since the new $n_3$ calculation is still larger than $n_1$.

Perhaps you noticed that I did a poor job of selecting the number of ways to choose $u$ unmatched asymmetric cards.  It holds for $u=1$, but not above.  We can't freely choose any $u$ of the 8 cards, as we can choose at most one from each mirror pair.  Thus, when selecting each card, we need to choose which pairs to select from, of which there are $\binom{4}{u}$ ways, and then select which card to take from each pair.  For each unmatched asymmetric card there are 2 ways of doing that (since there are two cards in a pair), so in total there are $2^u$ ways of selecting the $u$ cards once the pairs are selected.  This provides us with a new equation for $n_3$.
\begin{align}
n_3 &= \sum_{u=1}^4 \sum_{m=0}^{\lfloor \frac{5-u}{2} \rfloor} \binom{4}{u} \cdot 2^u \cdot \binom{4-u}{m} \cdot \binom{8}{5-u-2 \cdot m} \\
n_3 &= 4040
\end{align}
Finally, we've gotten a number that's less than $n_1$.  It seems surprisingly large, in that almost all of the ways to select 5 cards out of 16 result in at least one unmatched asymmetric card.  Let's come up with a check that our methodology is correct.  If we add in the case that we get no unmatched asymmetric cards, that is $u=0$, then we should get $n_1$.  Let's try that.
\begin{align}
n_3 +  \sum_{m=0}^{\lfloor \frac{5}{2} \rfloor} \binom{4}{m} \cdot \binom{8}{5-2 m} &= \sum_{u=0}^4 \sum_{m=0}^{\lfloor \frac{5-u}{2} \rfloor} \binom{4}{u} \cdot 2^u \cdot \binom{4-u}{m} \cdot \binom{8}{5-u-2m} \\
 &= 4368 \\
&= \binom{16}{5} \\
&= n_1
\end{align}
This does a pretty good job of confirming that our methodology is correct, as our somewhat convoluted summation gives us the same value for $n_1$.  Now we can bring this all together and calculate the total number of initial positions considering isomorphs.  Recall that the number of ways to select 5 cards excluding isomorphs is as follows.
\begin{align}
n_4 &= n_1 - \frac{n_3}{2} \\
&= \binom{16}{5} - \frac{1}{2} \cdot  \sum_{u=1}^4 \sum_{m=0}^{\lfloor \frac{5-u}{2} \rfloor} \binom{4}{u} \cdot 2^u \cdot \binom{4-u}{m} \cdot \binom{8}{5-u-2m}\\
&= 4368 - \frac{4040}{2} \\
&= 2348
\end{align}
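As an independent check on both $n_3$ and $n_4$, we can simply enumerate all $\binom{16}{5} = 4368$ hands. The sketch below uses a hypothetical labeling where cards 0–7 are the asymmetric ones, with mirror pairs (0,1), (2,3), (4,5), (6,7), and cards 8–15 are symmetric.

```python
from itertools import combinations

def has_unmatched(hand):
    # Cards 0-7 are asymmetric; c ^ 1 maps a card to its mirror partner.
    return any(c < 8 and (c ^ 1) not in hand for c in hand)

hands = list(combinations(range(16), 5))
n3 = sum(1 for hand in hands if has_unmatched(set(hand)))
print(n3, len(hands) - n3 // 2)  # 4040, 2348
```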
To get the number of initial positions with isomorphs, we need to multiply by $\frac{5!}{2 \cdot 2}$ as before.
\begin{align}
n_5 &= n_4 \cdot \frac{5!}{2 \cdot 2} \\
&= 2348  \cdot \frac{5!}{2 \cdot 2} \\
&= 2348 \cdot 30 \\
&= 70440
\end{align}

Monday, June 22, 2020

Onitama errata forthcoming

I'm working on fixing the Onitama calculation error I made previously, but I've made some subsequent computation errors that I'm tracking down.  Expect an update soon.

Monday, June 15, 2020

Rolling ability scores in D&D

There are six main ability scores that have persisted through all editions of Dungeons & Dragons (although their order has changed). In the earlier editions, for each you would roll 3d6 to get a score from 3–18, which may then be modified by various choices. Newer editions and house rules are often more generous to the player. This 3d6 method is often said to generate a "bell curve" of possible values, referring to the shape of the normal, or Gaussian, distribution's probability density function. I'll briefly get ahead of myself to point out the difference between the two shown in Fig. 1. They're pretty similar, but don't quite match. This comes from the central limit theorem, which tells us that as we add up independent and identically distributed (i.i.d.) random variables, the distribution of the sum converges to the normal distribution as the number of addends (i.e. the number of terms in the sum) approaches infinity. Each of the 3d6 that we roll is i.i.d., and even with only three terms we're already looking quite a bit like a bell curve, even though the distribution is actually piecewise quadratic. So the bell curve descriptor is not baseless, but it's not precise.

Figure 1: PMF of 3d6 compared to a bell curve

Let's go back and derive the pmf we just saw.

At first we might attempt to extend the method we used to compute the pmf of 1d$N$+1d$M$ in our discussion related to the dots in Catan. This works up to a point. We know there are $6^3= 216$ possible outcomes from rolling 3d6. For each sum we can count the number of ways to generate that sum with three numbers valued 1–6 by looking at the number of ways to put two separators on the inside of that many pips. For example, for a sum of 6, we can see that there are 5 choose 2, or $\binom{5}{2} = 5! / (3! \cdot 2!) = 10$ ways to do it. The 5 slots are illustrated below.

* | * | * | * | * | *

Unfortunately, this breaks down as we get to larger sums. For example, let's consider a sum of 9. We might think that there are 8 choose 2 possible combinations. However, some of those are invalid, like the one below.

* | * * * * * * * | *

Here we've chosen the separators such that the three dice sum to 9 with the values 1, 7, and 1. However, 7 is not a valid result on 1d6. We've overcounted. We may be able to figure out how much we're overcounting and subtract that number out. However, it's going to be kludgey and probably not give us an equation from which we can glean anything intuitively. Thus, I'm not very interested in doing that. Let's find a better way.

To do that, I'll introduce the probability generating function (pgf). This is related to the moment generating function and the characteristic function of a random variable in ways that we won't get into here. The probability generating function $G_X(z)$ of a non-negative random variable $X$ with probability mass function $f_X(x)$ is defined as the expected value of $z^X$.
\begin{align}
G_X(z) &= \mathbb{E} z^X \\
&= \sum_{x=0}^\infty f_X(x) z^x
\end{align}
This looks very much like the $z$-transform of the probability mass function, the only difference being that we have a positive exponent of $z$, whereas the $z$-transform has a negative exponent. However, given that this function is only expected to converge for complex values of $z$ where $|z| \leq 1$, this difference is relatively minor. This gives us access to a bunch of tools and properties if we're familiar with the $z$-transform, which is related to the Fourier and Laplace transforms, but for discrete-time signals. We won't get into all of them, but I'll show you the one we'll use here.

Let's say that $Y$ is the sum of rolling 3d6, where each die has its own result given by $X_1$, $X_2$, and $X_3$. The pmf of any of $X_1$, $X_2$, and $X_3$ is given as follows.
\begin{align}
f_X(x) &= \begin{cases}
\dfrac{1}{6} & \qquad x \in \{1, 2, 3, 4, 5, 6\} \\
0 &\qquad \text{else}
\end{cases}
\end{align}
Thus, the pgf is given as follows.
\begin{align}
G_X(z) &= \frac{1}{6} \left ( z^1 + z^2 + z^3 + z^4 + z^5 + z^6 \right )
\end{align}
Looking at the pgf of $Y$, we find an interesting property.
\begin{align}
G_Y(z) &= \mathbb{E} z^Y \\
&= \mathbb{E} z^{X_1+X_2+X_3} \\
&= \mathbb{E} z^{X_1} \cdot z^{X_2} \cdot z^{X_3} \\
&= \mathbb{E} z^{X_1} \cdot \mathbb{E} z^{X_2} \cdot \mathbb{E} z^{X_3} \\
&=G_{X_1}(z) \cdot G_{X_2}(z) \cdot G_{X_3}(z) \\
&= G_X(z)^3
\end{align}
When we sum independent random variables, the pgf of the sum is the product of the pgfs of the addends. (Factoring the expectation of the product above relies on $X_1$, $X_2$, and $X_3$ being independent.) Let's do this first for the 2d6 case, where $Y_2 = X_1 + X_2$.
\begin{align}
G_{Y_2}(z) &= G_X(z)^2 \\
&= \left( \frac{1}{6} \left ( z^1 + z^2 + z^3 + z^4 + z^5 + z^6 \right ) \right)^2 \\
&= \frac{1}{6^2} \left ( z^1 + z^2 + z^3 + z^4 + z^5 + z^6 \right ) ^2
\end{align}
Now let's multiply this out.
\begin{align*}
G_{Y_2}(z) = \frac{1}{6^2} ( z^2 +& z^3 + &z^4 +& z^5 +& z^6 +& z^7 + &&&&& \\
&z^3 + &z^4 + &z^5 + &z^6 + &z^7 + &z^8 +&&&&\\
&& z^4 + &z^5 + &z^6 + &z^7 + &z^8 + &z^9+&&&\\
&& &z^5 + &z^6 + &z^7 + &z^8 + &z^9 +&z^{10}+&&\\
&& & &z^6 + &z^7 + &z^8 + &z^9 +&z^{10} +&z^{11}+&\\
&&& & &z^7 + &z^8 + &z^9 +&z^{10}+&z^{11}+&z^{12} ) \hfill\\
\end{align*}
For our simple case, I think it's pretty clear that $f_{Y_2}(y)$ is the coefficient of the $z^y$ term. By grouping the like terms, which I've aligned vertically, we can see the familiar triangular shape of the 2d6 pmf.
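If we don't feel like expanding these polynomials by hand, we can let the computer do the bookkeeping. Here's a small sketch using numpy's polynomial module, where the coefficient array index is the exponent of $z$:

```python
from numpy.polynomial import Polynomial

# pgf of 1d6: (z^1 + z^2 + ... + z^6) / 6; index = exponent of z.
G_X = Polynomial([0] + [1 / 6] * 6)
G_Y = G_X ** 3  # pgf of 3d6 is the cube of the 1d6 pgf

# f_Y(y) is the coefficient of z^y.
for y, p in enumerate(G_Y.coef):
    if p > 0:
        print(f"P(Y = {y:2d}) = {round(p * 216):3d}/216 = {p:.4f}")
```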

A more computationally efficient way of looking at this is to take advantage of a property of $z$-transforms. Namely, that multiplication in the $z$ domain is convolution in the $x$ domain. (This is often stated as multiplication in the frequency domain is convolution in the time domain, although those are not the types of variables we're looking at. The opposite is also true: multiplication in the time domain is convolution in the frequency domain.) The convolution of two functions of integers, $f()$ and $h()$, is written as follows.
\begin{align}
f(x) * h(x) &= \sum_{n=-\infty}^\infty f(n) \cdot h(x-n)
\end{align}
Convolution is symmetric, so we can write either function first.
\begin{align}
f(x) * h(x) &= \sum_{n=-\infty}^\infty f(x-n) \cdot h(n) \\
&= h(x) * f(x)
\end{align}
Let's understand what this is doing graphically a little bit. The following figures depict various values of the convolution computation. The first plot in each figure shows one pmf function. The second plot shows the second pmf, which is flipped horizontally and shifted by the value at which the convolution is being computed. The third plot shows the multiplication of these two functions for each value of $n$. The fourth plot shows the result of the convolution, which at each value of $x$ is equal to the sum of the third plot across all $n$ for that value of $x$. As $x$ changes, it moves the flipped pmf horizontally. When it overlaps with the first pmf the result of the convolution is nonzero. When the two pmfs line up the convolution is maximized. Since the pmfs are either $1/6$ or 0, the product term takes on the value $1/36$ for some set of $n$. As the overlap increases, the result of the convolution increases linearly. Note that for $x=8$ the overlap starts decreasing, and so does the result of the convolution.

Figure 2: Illustrating convolution to find 2d6 pmf
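Numerically, this convolution is a one-liner. A minimal sketch with numpy, which we can iterate to get the 3d6 pmf as well:

```python
import numpy as np

pmf_1d6 = np.full(6, 1 / 6)              # index 0 corresponds to a roll of 1
pmf_2d6 = np.convolve(pmf_1d6, pmf_1d6)  # index 0 corresponds to a sum of 2
pmf_3d6 = np.convolve(pmf_2d6, pmf_1d6)  # index 0 corresponds to a sum of 3

for y, p in enumerate(pmf_3d6, start=3):
    print(f"P(Y = {y:2d}) = {p:.4f}")
```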

We can repeat this process to find the pmf of 3d6 by convolving the previous result with the pmf of 1d6. In this case, the product term increases linearly once the overlap starts and continues through $x=8$. Up to that point, it's somewhat clear that the resulting pmf is given by,
\begin{align}
f_Y(y) &= \frac{1}{6} \sum_{n=2}^{y-1} f_{Y_2}(n) .
\end{align}
Since $f_{Y_2}(n)$ is linear in that range, we can easily substitute it in.
\begin{align}
f_Y(y) &= \sum_{n=2}^{y-1} \frac{n-1}{6^3} \\
&= \sum_{n=1}^{y-2} \frac{n}{6^3}
\end{align}
You may recall that the sum of integers 1 to $n$ is equal to $n\cdot(n+1)/2$. If not, we can quickly prove this by induction. Consider $n=2$, the sum,
\begin{align}
\sum_{k=1}^2 k &= 1 + 2 = 3.
\end{align}
And when we evaluate $n\cdot(n+1)/2$ for $n=2$ we get $2 \cdot (2+1) / 2 = 3$. So it holds here. Now, let's assume it's true for $n$ and check if it holds for $n+1$.
\begin{align}
\sum_{k=1}^{n+1} k&= \left ( \sum_{k=1}^{n} k \right )+ (n+1) \\
&= \frac{n \cdot (n+1)}{2} + (n+1) \\
&=\frac{n \cdot (n+1) + 2\cdot(n+1)}{2} \\
&=\frac{(n+2) \cdot (n+1)}{2} \\
&=\frac{(n+1) \cdot (n+2)}{2}
\end{align}
We see that it does, thus it holds for all positive integers $n$. Multiplying this out, we see clearly that this is a quadratic function.
\begin{align}
\sum_{k=1}^{n} k &= \frac{n \cdot (n+1)}{2} = \frac{1}{2}n^2 + \frac{1}{2} n
\end{align}
This holds for the other regions of this function as well. In general, if we calculate a running sum of a linear function, we'll get a quadratic function. If we do it piecewise, it holds piecewise. This extends to higher orders as well, as expressed in Faulhaber's formula.

Below we see the same plots showing how $f_Y(y)$ is calculated using the following convolution.
\begin{align}
f_Y(y) = f_{Y_2}(y) * f_X(y)
\end{align}


Figure 3: Illustrating convolution to find 3d6 pmf


This solution appears unsatisfying in some ways. We have a result we can express in an equation, but we can't look at it directly and glean much information. However, what we often care about is having an intuitive understanding of what the result is and an easy way to calculate it numerically. We have both of those. The central limit theorem and the properties of convolution we looked at here give us the intuitive understanding. We can also extend this by looking at specific critical values, such as the extreme and central values. Calculating it is also simple. Many tools have simple functions to compute this convolution, so that we can easily tabulate or plot any specific case that we're interested in.

Monday, June 8, 2020

Should I draw a face-up wild in Ticket to Ride?

The inevitably unsatisfying short answer is: it depends. (Of course, if a game offers a choice, but one option is always best, then it's not really a choice, but it's still taking up rules complexity.) Let me set up the condition I'm thinking about. It's your turn, you don't want to build, draw more tickets, or run a passenger (I have the Märklin edition), so you're going to draw cards. None of the face-up cards interest you except possibly a locomotive. So should you draw the face-up wild or top deck?

One approach is to look at the expected value of the number of useful cards you'll get on this turn. Drawing the face-up wild, it's clear that you'll always get one useful card (assuming you're still going to build something). But what if you draw off the top? You might get something good, you might not. Moreover, since you'll draw two cards there's even a middling outcome. For simplicity, let's consider cards to be either useful or useless for you. Thus, you could end up better or worse than drawing the face-up wild. The probability of each case depends on how many cards are useful to you. We'll assume that $c$ colors are useful to you. For simplicity, let's also assume that at least two cards of any useful color are useful. We've certainly put a lot of caveats and assumptions in, which is often necessary to get a first clean result.

Now let's look at the composition of the deck. In Ticket to Ride there are 110 cards: 12 of each of 8 colors plus 14 locomotives (wilds). (Most versions have this deck composition, but the Märklin edition is different.) In reality, you'll know a lot of information about cards in your hand, other cards available to draw, cards in the discard pile, maybe even some you remember that are in other players' hands. However, we're going to look at the general case where almost none of that is taken into account. This averages our result across all possibilities. Given the fact that the deck has over 100 cards and we're drawing only two, these effects don't often have large impacts. The number of useful cards, $n$, is the 14 locomotives, which are always useful, plus 12 for each useful color.
\begin{align}
n = 12 \cdot c + 14
\end{align}
We can simplify the answer by approximating the probabilities of the card draws as independent. Thus, for each of the two cards drawn, the probability that it's useful is approximately,
\begin{align}
P(\text{useful}) &= \frac{n}{110} \\
&= \frac{12\cdot c + 14}{110}
\end{align}
Let's consider the scale of the error some of our assumptions and approximations are introducing. We're ignoring the five cards face up and the first card drawn when considering the second. Thus, the true probability can fall within the following range.
\begin{align}
\frac{n-6}{110-6} \leq p \leq \frac{n}{110-6}
\end{align}
That's actually a bit loose. We just tacitly assumed that these six cards could be either useful or not. However, we know that's not true. Specifically, we know that of the face-up cards, the only useful ones are wild cards, of which there can be at most two (given the rules about clearing the display for three). Given the choice we're making, we know there's at least one face-up wild card. Taking this into account, we can update our approximate probability assuming one face-up wild card. We know five of the 110 cards, and that one of them is useful.
\begin{align}
P(\text{useful}) &= \frac{n-1}{110-5} \\
&= \frac{12\cdot c + 13}{105}
\end{align}
Note that we're not assuming that 105 cards are in the draw pile. Instead, we're saying that we don't know anything about where any of them are. If some are in the discard pile and some are in other players' hands, but we don't know what cards they are, the probability is the same. Taking this into account, we can see that the approximation error from assuming that the card draw probabilities are independent is small. The probability of the second card given the first would be one of the following two.
\begin{align}
P(\text{second useful} | \text{first useful}) &= \frac{12\cdot c + 13 -1 }{105 -1} \\
&= \frac{12 \cdot c + 12}{104} \\
P(\text{second useful} | \text{first useless}) &= \frac{12\cdot c + 13 }{105 -1} \\
&= \frac{12 \cdot c + 13}{104}
\end{align}
From these we can find the approximation errors.
\begin{align}
\epsilon_1 &= P(\text{useful}) - P(\text{second useful} | \text{first useful}) \\
&= \frac{12\cdot c + 13}{105} - \frac{12 \cdot c + 12}{104} \\
&= \frac{ 104 \cdot (12 c + 13) - 105 \cdot (12c + 12)}{105 \cdot 104} \\
&= \frac{ -12 c + 92}{10920} \\
&= \frac{ -3 c + 23}{2730} \\
&\approx -0.0011c + 0.0084
\end{align}
Since the number of useful colors can range from 1 to 7, this first error can range from about 0.0007 to 0.0073, less than 1% in either case. Similarly for the other case.
\begin{align}
\epsilon_2 &= P(\text{useful}) - P(\text{second useful} | \text{first useless}) \\
&= \frac{12\cdot c + 13}{105} - \frac{12 \cdot c + 13}{104} \\
&= \frac{ 104 \cdot (12 c + 13) - 105 \cdot (12c + 13)}{105 \cdot 104} \\
&= \frac{ -12 c -13}{10920} \\
&\approx -0.0011c - 0.0012
\end{align}
Here the error can be somewhat larger in magnitude, ranging from about $-0.0023$ to $-0.0089$, but still less than 1%.

I go through this exercise to make clear that we can make simplifications like this. I've probably used more words justifying it than I would have taking the full probabilities into consideration. To quickly check or bound the error for yourself is much faster.

Knowing the probability that each draw is useful, we can look at the distribution of the number of useful cards when drawing from the top here. This is another binomial distribution, although with two draws it's pretty transparent. If the number of useful cards we draw is the random variable $U$, then we can express the probabilities as follows.
\begin{align}
P(U = 0) & = \left ( 1 - P(\text{useful}) \right ) ^2 \\
&= \left ( 1 - \frac{12 \cdot c + 13}{105} \right ) ^2 \\
&= \left ( \frac{104 - (12c + 13)}{105} \right ) ^2 \\
&= \left ( \frac{92 - 12c}{105} \right ) ^2 \\
P(U = 1) & = 2 \cdot P(\text{useful}) \cdot \left ( 1 - P(\text{useful}) \right ) \\
&= 2 \cdot \frac{12 \cdot c + 13}{105} \cdot \left ( 1 - \frac{12 \cdot c + 13}{105} \right ) \\
&= 2 \cdot \frac{12 \cdot c + 13}{105} \cdot \frac{92 - 12c}{105} \\
P(U = 2) &= P(\text{useful}) ^2 \\
& = \left ( \frac{12 \cdot c + 13}{105} \right ) ^2
\end{align}
Each of these is plotted as a function of the number of useful colors in Fig. 1. As there are three possibilities, it's not necessarily clear what we should make of this. All three probabilities are second-order functions of $c$, which we can see in their parabolic shapes. Two alternative summaries are the probability of getting one or more useful cards and the expected value of the number of useful cards.

Figure 1: Probabilities when top-decking in Ticket to Ride

The probability of getting one or more useful cards is the sum of the probabilities of getting exactly one and exactly two.
\begin{align}
P(U \geq 1) &= P(U = 1) + P(U = 2) \\
&= 2 \cdot \frac{12 \cdot c + 13}{105} \cdot \frac{92 - 12c}{105} + \left ( \frac{12 \cdot c + 13}{105} \right ) ^2 \\
&= 2\cdot \frac{-144c^2+948c +1196}{105^2} + \frac{144c^2+312c+169}{105^2} \\
&= \frac{-288c^2+1896c +2392 + 144c^2+312c+169}{105^2} \\
&= \frac{-144c^2+2208c +2561}{105^2}
\end{align}
This function is also plotted in Fig. 1. This indicates that when two or more colors are useful, you're more likely than not to get at least one useful card. On the other hand, it's still quite likely that you'll get nothing useful.

Figure 2: Expected useful cards when top-decking in Ticket to Ride

Recall that the expected value is the weighted average of a random variable, given here by,
\begin{align}
\mathbb{E} U = \sum_u u \cdot P(U=u).
\end{align}
This ends up being the sum of just the two possible non-zero terms.
\begin{align}
\mathbb{E} U &= 1 \cdot P(U = 1) + 2 \cdot P(U = 2) \\
&=2 \cdot \frac{12 \cdot c + 13}{105} \cdot \frac{92 - 12c}{105} + 2 \cdot \left ( \frac{12 \cdot c + 13}{105} \right ) ^2 \\
&= 2\cdot \frac{-144c^2+948c +1196}{105^2} + 2\cdot\frac{144c^2+312c+169}{105^2} \\
&=2\cdot\frac{1260c+1365}{105^2} \\
&=2\cdot\frac{12c+13}{105}
\end{align}
Alternatively, we can do the algebra at a higher level of abstraction, where we're less prone to arithmetic mistakes.
\begin{align}
&= 2 \cdot P(\text{useful}) \cdot \left ( 1 - P(\text{useful}) \right ) + 2 \cdot P(\text{useful}) ^2 \\
&= 2 \cdot P(\text{useful}) - \require{cancel}\cancel{2 \cdot P(\text{useful}) \cdot P(\text{useful})} + \cancel{2 \cdot P(\text{useful})^2} \\
&= 2 \cdot \frac{12 \cdot c + 13}{105}\\
&= \frac{24 \cdot c + 26}{105}\\
&\approx 0.23 c + 0.25
\end{align}
We see that the quadratic terms canceled out and we're left with a linear equation, as shown in Fig. 2. This makes sense. Given that we approximated the two draws as independent, the expected value of the number of useful cards when drawing two cards should be twice that of drawing one card. Expectation is linear. The expected value for a single draw, $U_1$, should be proportional to the number of useful cards available in the deck.
\begin{align}
\mathbb{E} U_1 &= 0 \cdot P(U_1 = 0) + 1 \cdot P(U_1 = 1) \\
&= P(U_1 = 1) \\
&= P(\text{useful}) \\
&= \frac{12\cdot c + 13}{105} \\
 &\approx 0.11 c + 0.12
\end{align}
According to the plot, when four or more colors are useful we expect to get more than one useful card. Looking at the earlier result, we see that there's still a pretty substantial chance that you'll get nothing useful, $P(\text{0 useful cards} | c = 4) \approx 0.2$. Which is the correct way to think about it? They're both useful. The expected value essentially tells you what you'd get on average if you repeated this an infinite number of times. This is known from the strong law of large numbers, which states that the sample average converges almost surely to the mean as the sample size goes to infinity. Here almost surely has the specific meaning that it occurs with probability 1. Note that this is distinct from always happening and is perhaps best understood with continuous-valued random variables, which tend not to come up for us here. A nice property is that it takes into account the benefit of sometimes getting two useful cards. So as a general strategy, it makes sense to look at the expected value.

However, using the expected value breaks down in two ways here. First, it violates the approximating assumption that the card draws are independent. That's a reasonable approximation for two cards, but it becomes less so as more cards are drawn. So our approximate expected value is really only the average of a single two-card turn across many separate games. Second, the number of times that this comes up in a game isn't that large. For many values of $c$, the probabilities of getting either no useful cards or two useful cards are both substantial. Thus, you'd expect to see scenarios where you come up empty-handed and others where you do much better drawing from the top. On the other hand, drawing the wild card is more reliable. It's the low variance play.
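To put numbers on this trade-off, here's a short sketch tabulating, for each number of useful colors, the approximate probabilities and expected value derived above (the face-up wild always yields exactly one useful card for comparison):

```python
# Approximate per-draw probability of a useful card, given one face-up wild:
# p = (12c + 13) / 105, treating the two draws as independent.
for c in range(1, 8):
    p = (12 * c + 13) / 105
    p0 = (1 - p) ** 2      # no useful cards
    p2 = p ** 2            # two useful cards
    print(f"c={c}: P(U=0)={p0:.2f}  P(U>=1)={1 - p0:.2f}  E[U]={2 * p:.2f}")
```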

Monday, June 1, 2020

How far can I throw in a tunnel?

As pointed out in this video from LindyBeige, if you are inside, say, a dungeon, then low ceilings limit how far you can throw javelins and the like because they rely on you throwing them high to achieve distance. So the question is, if you know the ceiling height, how far can you throw?



Let's assume that we know how far you can throw outside, where you're not limited by ceilings. This aligns with the maximum range that may be listed in a game's rules. For example, in Dungeons and Dragons, a javelin has a long range of 120'. For simplicity, let's assume the ceiling is uniform in height. (Yes, this directly contradicts the first point in LindyBeige's video.)

When you throw, you can choose the angle at which you throw. Inside, you'll use a lower angle to avoid the ceiling. But let's assume that you throw with the same initial speed. Gravity pulls the javelin down and air pushes against the direction of movement. We'll account for the first, but ignore the second. Ignoring air resistance is par for the course when doing analyses such as this, often without regard for whether or not it's appropriate. Our first task is to deduce the initial speed based on the outside throwing range. Let $x_f$ be this range. We'll break up the initial velocity into its horizontal component, $v_{x0}$, and its vertical component, $v_{y0}$. Gravity provides a constant acceleration, the rate of change in velocity, so we know the velocity as a function of time and the gravitational acceleration $g = 9.8 \, \text{m}/\text{s}^2$.
\begin{align}
v_y(t) = v_{y0} -  g\cdot t
\end{align}
From this we can determine the elevation of the javelin throughout its flight, $y(t)$. Here we'll set zero height at the normal height of a person, assuming the target is at about the same height as the launch point, so $y(0) = y_0 = 0$. The change in height is determined by integrating the velocity.
\begin{align}
y(t) &= y_0 + \int_0^t v_y(\tau) \,\mathrm{d}\tau \\
&= \int_0^t \left ( v_{y0} -  g\cdot \tau \right ) \mathrm{d}\tau \\
&=  v_{y0} \cdot t -  \frac{1}{2} \cdot g\cdot t^2
\end{align}
Using this we can solve for the time at which the javelin reaches the target height of $y(t) = 0$, which occurs at the final time $t_f$.
\begin{align}
y(t_f) &= 0 \\
0 &= v_{y0} \cdot t_f -  \frac{1}{2} \cdot g\cdot t_f^2  \\
\end{align}
We're interested in the case when $y(t) = 0$ but $t \neq 0$, so we can divide by $t_f$ and then solve for it.
\begin{align}
0 &= v_{y0} -  \frac{1}{2} \cdot g\cdot t_f  \\
\frac{1}{2} \cdot g\cdot t_f & =v_{y0} \\
t_f &= \frac{v_{y0}}{\frac{1}{2} \cdot g} \\
t_f &= 2 \cdot \frac{v_{y0}}{g}
\end{align}
This tells us how much time the javelin spends in the air. Next, we can find how far it travels based on this. Since you're not throwing it straight up, there's some initial horizontal component to the velocity, $v_{x0}$. We'll specify the horizontal position, $x$, to have value $0$ at $t=0$ where you throw from, so the horizontal position is simply proportional to time and initial horizontal velocity. In fact, this position is equal to the product of these two.
\begin{align}
x(t) = v_{x0} \cdot t
\end{align}
Thus the distance thrown, $x_f = x(t_f)$ is as follows.
\begin{align}
x_f &= v_{x0} \cdot t_f \\
&= v_{x0} \cdot 2 \cdot \frac{v_{y0}}{g} \\
&=  \frac{2 \cdot v_{x0} \cdot v_{y0}}{g}
\end{align}
Next, we can find the optimal way to split the initial velocity $v_0$ between its vertical and horizontal components. The initial velocity is a vector, having both a magnitude, $|v_0|$, and an angle, $\angle v_0$. We can use trigonometric functions to resolve the initial velocity into its Cartesian components.
\begin{align}
v_{x0} &= |v_0| \cos  \angle v_0  \\
v_{y0} &= |v_0| \sin  \angle v_0
\end{align}
We don't ultimately care about the angle, so we can also just look at the magnitude, using the Pythagorean formula.
\begin{align}
|v_0|^2 &= v_{x0}^2 + v_{y0}^2
\end{align}
From this we can solve for $v_{y0}$.
\begin{align}
v_{y0} &=  \sqrt{|v_0|^2 - v_{x0}^2}
\end{align}
We can use this to go back and find the distance thrown in terms of $|v_0|$ and $v_{x0}$.
\begin{align}
x_f &=   \frac{2  \cdot v_{x0} \cdot \sqrt{|v_0|^2 - v_{x0}^2}}{g}
\end{align}
The final distance reached is a function of the initial horizontal velocity component. We anticipate a maximum, so let's take the derivative of the final position with respect to this velocity component and find where it's zero.
\begin{align}
0 & = \frac{ \mathrm{d} x_f } {  \mathrm{d} v_{x0}  } \\
0 &= \frac{ \mathrm{d} } {  \mathrm{d} v_{x0}  }   \frac{2  \cdot v_{x0} \cdot \sqrt{|v_0|^2 - v_{x0}^2}}{g}  \\
0 &=  \frac{2 }{g} \cdot \frac{ \mathrm{d} } {  \mathrm{d} v_{x0}  } \left ( v_{x0} \cdot \sqrt{|v_0|^2 - v_{x0}^2} \right )  \\
0 &=  \frac{2 }{g} \cdot \left ( \sqrt{|v_0|^2 - v_{x0}^2} + v_{x0} \cdot \frac{1}{2} { \left (|v_0|^2 - v_{x0}^2 \right )}^{-\frac{1}{2}}\cdot  \frac{ \mathrm{d} } {  \mathrm{d} v_{x0}  } \left ( |v_0|^2 - v_{x0}^2 \right ) \right ) \\
0 &=  \frac{2 }{g} \cdot \left ( \sqrt{|v_0|^2 - v_{x0}^2} + v_{x0} \cdot \require{cancel}\cancel{ \frac{1}{2}} { \left (|v_0|^2 - v_{x0}^2 \right )}^{-\frac{1}{2}}\cdot  \left ( -\cancel{2}\cdot v_{x0} \right ) \right ) \\
0 &=  \frac{2 }{g} \cdot \left ( \sqrt{|v_0|^2 - v_{x0}^2} -  v_{x0} \cdot { \left (|v_0|^2 - v_{x0}^2 \right )}^{-\frac{1}{2}}\cdot  v_{x0} \right ) \\
0 &=  \sqrt{|v_0|^2 - v_{x0}^2} -  \frac{v_{x0}^2}{\sqrt{|v_0|^2 - v_{x0}^2 }}
\end{align}
Now let's bring some stuff to both sides and rearrange.
\begin{align}
\frac{v_{x0}^2}{\sqrt{|v_0|^2 - v_{x0}^2 }} &=   \sqrt{|v_0|^2 - v_{x0}^2}  \\
\frac{v_{x0}^2}{\sqrt{|v_0|^2 - v_{x0}^2 }} \cdot  \sqrt{|v_0|^2 - v_{x0}^2}  &=   \sqrt{|v_0|^2 - v_{x0}^2} \cdot  \sqrt{|v_0|^2 - v_{x0}^2} \\
v_{x0}^2 & = |v_0|^2 - v_{x0}^2 \\
2 v_{x0}^2 & = |v_0|^2 \\
v_{x0}^2 & = \frac{1}{2} |v_0|^2
\end{align}
There's a solution to this where $v_{x0}$ and $|v_0|$ are both $0$, but we're not interested in that case.
\begin{align}
v_{x0} &= \frac{ |v_0| }{\sqrt{2}}
\end{align}
In this simple case, we can recognize that $v_{y0}= |v_0| / \sqrt{2}$ as well, so $\angle v_0 = 45^\circ$. We can now plug this back into our previous equation to find a relationship between $x_f$ and $v_0$.
\begin{align}
x_f &=  \frac{2  \cdot  \frac{ |v_0| }{\sqrt{2}} \cdot \sqrt{|v_0|^2 -\left(  \frac{ |v_0| }{\sqrt{2}} \right )^2}}{g}  \\
&=  \frac{2  \cdot  \frac{ |v_0| }{\sqrt{2}} \cdot \sqrt{\frac{|v_0|^2}{2}}}{g}  \\
&=  \frac{2  \cdot  \frac{ |v_0| }{\sqrt{2}} \cdot \frac{ |v_0| }{\sqrt{2}}}{g}  \\
&=  \frac{\require{cancel}\cancel{2}  \cdot  \frac{|v_0|^2}{\cancel{2}} }{g}  \\
x_f &=  \frac{|v_0|^2}{g}  \\
|v_0|^2 &= g \cdot x_f\\
|v_0| &= \sqrt{g \cdot x_f}
\end{align}
All this work has been done to give us the initial speed as a function of the distance thrown outside. Now we can turn our attention to how far we can throw given this initial velocity and a ceiling height. Let's set the maximum height of the javelin in flight to be $y_p$, where our $p$ is for peak. We'll say that this occurs at time $t_p$.
\begin{align}
y_p &= y(t_p)
\end{align}
To try to keep our variables from getting confused, we'll specify different vertical and horizontal components to the velocity now that you're throwing inside. Let these be $v_{yi}$ and $v_{xi}$, respectively.
\begin{align}
y_p &= v_{yi} \cdot t_p -  \frac{1}{2} \cdot g\cdot t_p^2
\end{align}
We'll again find where the derivative is zero, this time to find the time at which the maximum height occurs.
\begin{align}
0 &= \frac{ \mathrm{d} y(t) } {  \mathrm{d} t  } \\
0 &= \frac{ \mathrm{d} } {  \mathrm{d} t  }  \left ( v_{yi} \cdot t -  \frac{1}{2} \cdot g\cdot t^2 \right )\\
0 &=    v_{yi}  -    g\cdot t\\
g\cdot t &=  v_{yi}   \\
t &= \frac{v_{yi}}{g}\\
t_p &= \frac{v_{yi}}{g}
\end{align}
Now we can find the maximum height by plugging in this time.
\begin{align}
y_p &= v_{yi} \cdot \frac{v_{yi}}{g} -  \frac{1}{2} \cdot g\cdot \left ( \frac{v_{yi}}{g} \right ) ^2 \\
&= \frac{v_{yi}^2}{g} -  \frac{1}{2} \cdot g\cdot  \frac{v_{yi}^2}{g^2}  \\
&= \frac{v_{yi}^2}{g} -  \frac{1}{2} \cdot   \frac{v_{yi}^2}{g}  \\
y_p &=   \frac{1}{2} \cdot  \frac{v_{yi}^2}{g}   \\
v_{yi}^2 &= 2 \cdot g \cdot y_p \\
v_{yi} &= \sqrt{2 \cdot g \cdot y_p }
\end{align}
Based on the previous equation for the final distance, we can write the final inside distance, $x_{fi}$, in terms of the initial velocity components.
\begin{align}
x_{fi} &=  \frac{2 \cdot v_{xi} \cdot v_{yi}}{g} \\
&=  \frac{2 \cdot \sqrt{|v_0|^2 - v_{yi}^2} \cdot v_{yi}}{g}  \\
&=  \frac{2 \cdot \sqrt{|v_0|^2 - 2 \cdot g \cdot y_p } \cdot \sqrt{2 \cdot g \cdot y_p }}{g}  \\
&=  \frac{2 \cdot \sqrt{\left (  \sqrt{g \cdot x_f} \right)^2 - 2 \cdot g \cdot y_p } \cdot \sqrt{2 \cdot g \cdot y_p }}{g}  \\
&=  \frac{2 \cdot \sqrt{\require{cancel}\cancel{g} \cdot x_f- 2 \cdot \cancel{g} \cdot y_p } \cdot \sqrt{2 \cdot \cancel{ g} \cdot y_p }}{\cancel{g}}  \\
x_{fi} &=  2 \cdot \sqrt{x_f - 2  \cdot y_p } \cdot \sqrt{2  \cdot y_p }
\end{align}
Note that this only applies when $y_p$ is less than the maximum height outside. As an example with the javelin's range of 120 ft in a 10 ft room, the maximum thrown distance in ft becomes,
\begin{align}
2 \cdot \sqrt{120 - 2  \cdot 10 } \cdot \sqrt{2  \cdot 10 }  \approx 89.4 .
\end{align}
There's a little gotcha here. We assumed zero elevation at the height of the thrower. Let's just call this 5 ft for simplicity. Thus, we should take 5 ft off the height that we plug into the equation above.
\begin{align}
2 \cdot \sqrt{120 - 2  \cdot (10-5) } \cdot \sqrt{2  \cdot (10-5) }  \approx 66.3
\end{align}

Another thing we need to take care of is the range over which our equation is valid. Now, since we're dealing with real numbers in the real world, we know that we must meet the condition $x_f > 2 \cdot y_p$, as otherwise we'd have a negative number inside the square root. However, we need to consider the case when the outdoor maximum height, $y_{max}$, is less than or equal to the elevation of the ceiling, $y_p$. To check, we'll go back and find $y_{max}$. If $y_p \geq y_{max}$ then the indoor range is equal to the outdoor range. To do this, let's recall some earlier equations.
\begin{align}
|v_0| &= \sqrt{g \cdot x_f} \\
v_{y0} &= \sqrt{2 \cdot g \cdot y_{max} } \\
v_{y0}&=\frac{1}{ \sqrt{2}} \cdot |v_0|
\end{align}
We can combine these and do some algebra to find the outdoor maximum height reached by the javelin.
\begin{align}
\sqrt{2 \cdot  g \cdot y_{max} } & = \frac{1}{ \sqrt{2}} \cdot |v_0| \\
2 \cdot g \cdot y_{max}  & = \frac{1}{2} \cdot |v_0|^2 \\
y_{max}  & = \frac{1}{4g} \cdot |v_0|^2 \\
y_{max}  & = \frac{1}{4g} \cdot \left (  \sqrt{g \cdot x_f} \right )^2 \\
y_{max}  & =  \frac{ x_f}{4}
\end{align}
So this means the maximum range is 4 times the height reached during flight outside. Thus, if the ceiling is at least one quarter of the outdoor range, it has no impact. To check that the points coincide, we can evaluate our equation for the indoor range when the ceiling is equal to $y_{max} = \frac{x_f}{4}$.
\begin{align}
x_{fi} &=  2 \cdot \sqrt{x_f - 2  \cdot \frac{x_f}{4} } \cdot \sqrt{2 \cdot \frac{x_f}{4} }  \\
& =  2 \cdot \sqrt{x_f - \frac{x_f}{2} } \cdot \sqrt{ \frac{x_f}{2} }  \\
& =  2 \cdot \sqrt{ \frac{x_f}{2} } \cdot \sqrt{ \frac{x_f}{2} }  \\
& =  2 \cdot \frac{x_f}{2} \\
& =  x_f
\end{align}
Indeed they match.

Putting this together, we get the following equation.
\begin{align}
x_{fi} &= \begin{cases}
2 \cdot \sqrt{x_f - 2  \cdot y_p } \cdot \sqrt{2  \cdot y_p } & \text{if } y_p < \frac{x_f}{4} \\
x_f & \text{otherwise}
\end{cases}
\end{align}
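This piecewise equation is easy to wrap in a small function. Below is a sketch in Python; the `thrower_height` parameter is my own addition to handle the launch-height adjustment discussed above, with everything in feet.

```python
import math

def indoor_range(outdoor_range, ceiling_height, thrower_height=5.0):
    """Maximum throwing distance under a ceiling, ignoring air resistance."""
    y_p = ceiling_height - thrower_height  # usable peak height above launch
    if y_p <= 0:
        return 0.0
    if y_p >= outdoor_range / 4:  # ceiling is above the unconstrained peak
        return outdoor_range
    return 2 * math.sqrt(outdoor_range - 2 * y_p) * math.sqrt(2 * y_p)

print(indoor_range(120, 10))  # ≈ 66.3, matching the example above
```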
This is shown below as plots of indoor range vs. outdoor range by ceiling height, assuming a 5' tall thrower. Some example indoor ranges, rounded to the nearest 5', for an assortment of outdoor ranges and ceiling heights are listed in the table at the end.


Indoor range by ceiling height

\begin{align}
\begin{array}{c|c|c|c|c|c|c|c}
\text{Ceiling Height (ft)}&  15& 60& 100& 120& 320& 400& 600 \\ \hline
 5&   0&    0&    0&    0&    0&    0&    0\\
10&  15&   45&   60&   65&  110&  125&  155\\
15&  15&   55&   80&   90&  155&  175&  215\\
20&  15&   60&   90&  105&  185&  210&  260\\
25&  15&   60&  100&  115&  210&  240&  300\\
30&  15&   60&  100&  120&  230&  265&  330\\
35&  15&   60&  100&  120&  250&  285&  360\\
40&  15&   60&  100&  120&  265&  305&  385\\
45&  15&   60&  100&  120&  275&  320&  410\\
50&  15&   60&  100&  120&  290&  335&  430\\
\end{array}\\
\text{Indoor range (ft) by ceiling height (rows) and outdoor range (columns) for projectile weapons, rounded to the nearest 5'}\\
\end{align}