Our first step is to compute, given a round of combat, how many units get removed.
Without leaders, this is pretty clear: the number of hits follows a binomial distribution with $n$ equal to the number of units (at most 5)
and $p = 2/6$ for the defender (who hits on 5 & 6) or $p = 1/6$ for the attacker (who hits only on 6).
Let $X_n$ be a random variable equal to the number of hits generated by an attack with $n$ combat dice.

\begin{align}
P(X_n=x) = \binom{n}{x}\cdot p^x \cdot (1-p)^{(n-x)}
\end{align}
With as many leaders as units (or 5 leadership in the case of larger armies), we can use the same approach.
Because any die that misses the combat roll gets rerolled in the leader reroll, we can compute the combined probability, $p_l$, that a given die hits on either the combat roll or the leader reroll,
namely:

\begin{align}
p_l &= p + (1-p)\cdot p \\
&= 2\cdot p-p^2
\end{align}
It may also be handy to look at the probability of a miss in this case.
\begin{align}
1 - p_l &= 1 - 2\cdot p + p^2 \\
&= (1-p)^2
\end{align}
Let $X_n'$ be a random variable equal to the number of hits generated by an attack with $n$ combat dice and at least $n$ leadership. This is the same idea as $X_n$, but with every missed die rerolled.
\begin{align}
P(X_n' = x) &= \binom{n}{x}\cdot {p_l}^x \cdot (1-p_l)^{(n-x)} \\
&= \binom{n}{x}\cdot \left ( 2\cdot p-p^2 \right )^{x} \cdot (1-p)^{2(n-x)}
\end{align}
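As a quick check of these formulas, here's a minimal sketch in Python (the function names are mine, chosen for illustration):

```python
from math import comb

def pmf_hits(n, x, p):
    """P(X_n = x): exactly x hits on n combat dice with per-die hit chance p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def pmf_hits_rerolled(n, x, p):
    """P(X_n' = x): as above, but every miss is rerolled once (full leadership)."""
    return pmf_hits(n, x, 2*p - p**2)  # p_l = 2p - p^2

# Attacking a fortified position (hit only on 6), 5 dice:
print(pmf_hits(5, 0, 1/6))           # chance of no hits at all
print(pmf_hits_rerolled(5, 0, 1/6))  # much smaller once misses are rerolled
```

Summing either pmf over $x = 0$ to $5$ gives 1, which makes for an easy sanity check.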
When the leadership and combat strength do not match, things get trickier.
It's tempting to think that we can just add two cases. That is, it may seem that 5 combat strength with 3 leadership is equivalent to adding the hits generated by 3 combat strength with 3 leadership and 2 combat strength with 0 leadership. However, this is not the case: in the split version, if the 3 units with leadership hit and the 2 units without leadership miss, there is no leadership reroll. In the actual case of 5 units with 3 leadership rolling those same results, any two missed dice would get rerolled. Thus, we'd need to approach it differently. For simplicity, we'll set this case aside and focus on the extremes of no leadership and maximum leadership.

Using the equations we derived, we can compute the probabilities of getting different numbers of hits for a variety of situations. There are three variables we'll consider: the number of combat dice rolled, whether leaders are present, and whether the units are attacking a fortified position, meaning either a stronghold or a city or fortification in the first round. The next several figures plot the probabilities in a couple different ways.

Figure 1: PMF of hits in War of the Ring when attacking with 5 units

Figure 2: PMF of hits in War of the Ring when attacking a fortified position with leadership

Figure 3: PMF of hits in War of the Ring when attacking an unfortified position with leadership

Figure 4: PMF of hits in War of the Ring when attacking a fortified position without leadership

Figure 5: PMF of hits in War of the Ring when attacking an unfortified position without leadership

Our next step is to find how many rounds of combat a siege is expected to last. We'll use three techniques to quantify this and compare them. Throughout, we'll use a few common assumptions. While the number of defending units may vary, able to take a total of $d$ hits, we fix the number of attack dice at five, either with or without leadership. We thus remove the possibility that the siege goes well enough for the defender that the attackers are whittled down or eliminated.

First, we'll estimate it based on how many individual dice it takes to eliminate the defending units, similar to when we looked at What's the toughest unit in Memoir '44?. The limitation here is that moving from rolling one die at a time to five dice at a time introduces different quantization: needing anywhere from one through five dice is still just one round of combat, but going from five dice to six means a second round even though it's only one more die. The averaging that occurs over dice and rounds can have a notable impact, as we'll see when we compare. Each die eliminates one hit point of defending units with probability $p$ (or $p_l$ with leadership). The expected number of dice to generate this one hit is $1/p$. Thus, to generate $d$ hits we can estimate $d/p$ dice on average. Since five dice are rolled each round, this means there should be about $d/(5p)$ rounds on average.
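In code, this first estimate amounts to a one-liner per case. A sketch, assuming a fortified position where only 6s hit:

```python
# Rough estimate: each hit takes about 1/p dice, so d hits take about d/p dice,
# and at five dice per round that's d/(5*p) rounds.
p = 1/6             # per-die hit chance against a fortified position
p_l = 2*p - p**2    # per-die hit chance with full leadership rerolls
d = 10              # defending hit points

print(d / (5 * p))    # about 12 rounds without leadership
print(d / (5 * p_l))  # about 6.5 rounds with leadership
```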

For the next two methods, we'll make use of a Markov chain, as we did when checking if it was possible an event was never drawn in a previous problem. We'll start with an initial state column vector $\mathbf{s_d}$ that indicates the number, $d$, of hit points of the defenders in the siege. The vector is zero everywhere except in index $d$, where it has a value of 1 to indicate that the initial probability of there being $d$ defending hit points is 1. (I'll use indexes starting at 0, instead of at 1 as is commonly done, as 0-based indexing lines up nicely with programming as well as the meaning in this problem.) For example, with 10 defending hit points,
\begin{align}
\mathbf{s_{10}}^T & =
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
,
\end{align}
where $\mathbf{s_d}^T$ is the transpose of $\mathbf{s_d}$, shown to fit on the page more easily. Note the 1 would be in the bottom position of the column vector. Each position in the vector corresponds to the 11 total states: 0-10 defending hit points of units.

We can multiply this by a transition matrix $\mathbf{P}$.
\begin{align}
\mathbf{P} =
\begin{bmatrix}
p_{0,0} & p_{0,1} & \dots & p_{0,10}\\
p_{1,0} & p_{1,1} \\
\vdots & & \ddots \\
p_{10,0} & p_{10,1} & \ldots & p_{10,10}
\end{bmatrix}
\end{align}
An element of $\mathbf{P}$, $p_{i,j}$, is the probability of transitioning from state $j$ to state $i$, in line with matrix multiplication. This makes $\mathbf{P}$ a left stochastic matrix, where each column adds to one. You may find this less common than right stochastic matrices, where each row sums to one, which means that the matrices and equations here may need to be transposed to compare to other sources. The core question is whether $p_{i,j}$ is as used here or is instead the probability of transitioning from state $i$ to state $j$. The left and right names have to do with whether the stochastic matrix should multiply state probability vectors from the left or from the right.

To fill in the values of this matrix, let's work with the probabilities of getting a certain number of hits by the attacker, which we assume always rolls 5 dice for simplicity. The probability $p_h$ is the probability of getting $h$ hits. Thus we have six probabilities of interest here: $p_0$, $p_1$, $p_2$, $p_3$, $p_4$, and $p_5$, where $p_h = P(X_5 = h)$ as found earlier. We construct $\mathbf{P}$ from these individual probabilities. Each column represents the probabilities of moving from one state to the others. Thus, the values of each column must sum to 1, as the probability of ending up in one of the states is 1 and all states are under consideration here. Further, since there is no way to add defending hit points during a battle, we know that many of the entries are zero.
\begin{align}
p_{i,j} = 0 \qquad i > j
\end{align}
We can't transition from state $j$, which has $j$ defending hit points, to a state $i$, which has $i$ defending hit points, where $i>j$.

We'll go through the matrix one column at a time to build it up. The first column, representing what happens when there are already no defending units is fairly simple, as we must stay in that state. Thus, the transition probabilities are either 0 or 1.
\begin{align}
p_{i,0} =
\begin{cases}
1 &\qquad i = 0 \\
0 &\qquad i \neq 0
\end{cases}
\end{align}
Specifically, the probability of staying in state 0 is 1 and the probability of transitioning to any other state is 0.

In the next column, we start with one defending hit point and there are two possibilities: if no hits are rolled, then stay in the same state, but if any hits are rolled, then move to state 0.
\begin{align}
p_{i,1} =
\begin{cases}
\sum_{h=1}^5 p_h = (1-p_0) &\qquad i = 0 \\
p_0 & \qquad i = 1\\
0 &\qquad i > 1
\end{cases}
\end{align}
The next several columns follow this pattern.
\begin{align}
p_{i,2} =
\begin{cases}
\sum_{h=2}^5 p_h &\qquad i = 0 \\
p_1 & \qquad i = 1\\
p_0 & \qquad i = 2\\
0 &\qquad i > 2
\end{cases}
\end{align}
Eventually, no summation is necessary, and then the probability of reaching state 0 in the single transition of one round of combat becomes 0, as doing so would require more than the maximum of 5 hits.
\begin{align}
p_{i,6} =
\begin{cases}
0 &\qquad i = 0 \\
p_5 & \qquad i = 1\\
p_4 & \qquad i = 2\\
p_3 & \qquad i = 3\\
p_2 & \qquad i = 4\\
p_1 & \qquad i = 5\\
p_0 & \qquad i = 6\\
0 &\qquad i > 6
\end{cases}
\end{align}
We can also generalize this.
\begin{align}
p_{i,j} =
\begin{cases}
p_5 & \qquad i = j-5\\
p_4 & \qquad i = j-4\\
p_3 & \qquad i = j-3\\
p_2 & \qquad i = j-2\\
p_1 & \qquad i = j-1\\
p_0 & \qquad i = j\\
0 &\qquad i < j-5 \text{ or } i > j
\end{cases} \qquad j \geq 5
\end{align}
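This column-by-column construction translates directly into code. Here's a sketch with NumPy (helper names are mine); the key observation is that a roll of $h$ hits takes state $j$ to state $\max(j-h, 0)$, with excess hits wasted:

```python
import numpy as np
from math import comb

def hit_pmf(p, dice=5):
    """p_h for h = 0..dice hits, with per-die hit chance p."""
    return [comb(dice, h) * p**h * (1 - p)**(dice - h) for h in range(dice + 1)]

def transition_matrix(p, states=11, dice=5):
    """Left stochastic P with P[i, j] = probability of moving from j to i hit points."""
    ph = hit_pmf(p, dice)
    P = np.zeros((states, states))
    P[0, 0] = 1.0  # state 0 is absorbing: no defenders stays no defenders
    for j in range(1, states):
        for h, prob in enumerate(ph):
            P[max(j - h, 0), j] += prob  # hits beyond j all land in state 0
    return P

P = transition_matrix(1/6)
print(P.sum(axis=0))  # every column sums to 1
```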
Multiplying some probability state vector, such as $\mathbf{s_d}$, by $\mathbf{P}$ gives the updated probabilities of being in each state after one round of combat.
\begin{align}
\mathbf{s_{new}} = \mathbf{P} \times \mathbf{s_{old}}
\end{align}
We can repeat this to get the probabilities after multiple rounds of combat.
\begin{align}
\mathbf{s_{r=2}} &= \mathbf{P} \times \mathbf{s_{r=1}}\\
\mathbf{s_{r=2}} &= \mathbf{P} \times \left ( \mathbf{P} \times \mathbf{s_{r=0}} \right ) \\
\mathbf{s_{r=2}} &= \mathbf{P}^2 \times \mathbf{s_{r=0}}
\end{align}
Here $\mathbf{s_{r=2}}$ indicates the state probability vector after two rounds of combat.
Matrix exponentiation is repeated matrix multiplication, just as exponentiation is repeated multiplication.
\begin{align}
\mathbf{P}^r &= \underbrace{\mathbf{P} \times \mathbf{P} \times \cdots \times \mathbf{P}}_{r \text{ times}}
\end{align}
Since it comes up later, we'll note that just as $x^0 = 1$, we have $\mathbf{P}^0 = \mathbf{I}$. That is, if we multiply by $\mathbf{P}$ zero times, we should keep what we have, which is accomplished by multiplying by the identity matrix.

The number of rounds, $R_{d}$, that it takes to remove all $d$ defending hit points has a probability mass function that can be extracted from the probability of being in state 0 in round $r$, using a matrix $\mathbf{A}$.
\begin{align}
\mathbf{A} =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\end{align}
The point of this matrix is to extract the first entry of the state probability vector, which is the probability we're interested in: that there are no defending hit points remaining.
Combining $\mathbf{A}$, $\mathbf{P}$, and $\mathbf{s_{d}}$, we find the probability that there are no defending units in round $r$.
\begin{align}
P(0\text{ defenders in round } r) = \mathbf{A} \times \mathbf{P}^{r} \times \mathbf{s_{d}}
\end{align}
To get the pmf of $R_{d}$, we want only the probability that this first came to be in round $r$.
\begin{align}
P(R_{d} = r) &= P(0\text{ defenders in round } r) - P(0\text{ defenders in round } r-1)\\
& = \mathbf{A} \times \mathbf{P}^{r} \times \mathbf{s_{d}} - \mathbf{A} \times \mathbf{P}^{r-1} \times \mathbf{s_{d}} \\
& = \mathbf{A} \times \left ( \mathbf{P}^{r} - \mathbf{P}^{r-1} \right ) \times \mathbf{s_{d}}
\end{align}
Here it's useful to make use of an identity matrix, $\mathbf{I}$, which is the matrix multiplication equivalent of multiplying by 1 in that multiplying by it leaves the same matrix.
\begin{align}
P(R_{d}=r) = \mathbf{A} \times \left(\mathbf{P}- \mathbf{I} \right) \times \mathbf{P}^{r-1} \times \mathbf{s_{d}}
\end{align}
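Putting the pieces together, here's a sketch of this pmf computation in NumPy, with the transition matrix rebuilt inline so the snippet stands alone (a sketch for $p = 1/6$ and 10 defending hit points, not any official implementation):

```python
import numpy as np
from math import comb

# Rebuild the 11-state transition matrix for p = 1/6 (fortified, no leadership).
p, dice, states = 1/6, 5, 11
ph = [comb(dice, h) * p**h * (1 - p)**(dice - h) for h in range(dice + 1)]
P = np.zeros((states, states))
P[0, 0] = 1.0
for j in range(1, states):
    for h, prob in enumerate(ph):
        P[max(j - h, 0), j] += prob

A = np.zeros((1, states)); A[0, 0] = 1.0   # extracts the state-0 probability
s = np.zeros((states, 1)); s[10, 0] = 1.0  # s_10: start with 10 defending hit points

def pmf_rounds(r):
    """P(R_10 = r): the siege ends in exactly round r, via A (P - I) P^(r-1) s."""
    return (A @ (P - np.eye(states)) @ np.linalg.matrix_power(P, r - 1) @ s)[0, 0]

print(sum(pmf_rounds(r) for r in range(1, 100)))  # should be very close to 1
```

Note that $P(R_{10}=1)$ comes out as 0, as it should: one round of at most 5 hits can't remove 10 hit points.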
For our second technique, let's take this and do enough of an infinite sum to estimate the expected value of $R$. Recall the definition of the expected value of a non-negative discrete random variable.
\begin{align}
\mathbb{E} R_d &= \sum_{r=0}^\infty r \cdot P(R_d = r) \\
&= \sum_{r=0}^\infty P(R_d > r)
\end{align}
We can turn these probability terms around by looking at their complements.
\begin{align}
P(R_d > r) &= 1 - P(R_d \leq r)
\end{align}
The probability that the combat ends in round $r$ or earlier, $P(R_d \leq r)$, is equivalent to there being no defenders in round $r$: they could have been removed in that round or an earlier one. We found this probability earlier.
\begin{align}
P(R_d \leq r) &= P(0\text{ defenders in round } r)\\
&= \mathbf{A} \times \mathbf{P}^{r} \times \mathbf{s_d}
\end{align}
Putting this together, we find the following infinite sum.
\begin{align}
\mathbb{E} R_d &= \sum_{r=0}^\infty \left ( 1 - \mathbf{A} \times \mathbf{P}^{r} \times \mathbf{s_d} \right )
\end{align}
By computing this running sum, as shown in Figure 6 for 10 initial hit points, we observe that it appears to be approaching an asymptote well before 50 rounds.
\begin{align}
\widehat{\mathbb{E} R_d} &= \sum_{r=0}^{50} \left ( 1 - \mathbf{A} \times \mathbf{P}^{r} \times \mathbf{s_d} \right )
\end{align}
The asymptote is approached faster with fewer hit points, which require fewer rounds to eliminate given the same combat rolls. Thus, we can use the running sum after 50 rounds as a good estimate of the infinite sum.
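The truncated sum can be computed by repeatedly multiplying the state vector by $\mathbf{P}$ rather than forming each matrix power from scratch. A sketch under the same assumptions as before ($p = 1/6$, 11 states):

```python
import numpy as np
from math import comb

# Rebuild the transition matrix for p = 1/6 (fortified position, no leadership).
p, dice, states = 1/6, 5, 11
ph = [comb(dice, h) * p**h * (1 - p)**(dice - h) for h in range(dice + 1)]
P = np.zeros((states, states))
P[0, 0] = 1.0
for j in range(1, states):
    for h, prob in enumerate(ph):
        P[max(j - h, 0), j] += prob

def expected_rounds_estimate(d, limit=50):
    """Running sum of 1 - A P^r s_d for r = 0..limit, via repeated multiplication."""
    v = np.zeros((states, 1))
    v[d, 0] = 1.0  # s_d
    total = 0.0
    for _ in range(limit + 1):
        total += 1.0 - v[0, 0]  # 1 - P(0 defenders in round r)
        v = P @ v
    return total

print(expected_rounds_estimate(10))
```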

Figure 6: Running estimate of the expected number of combat rounds to complete a siege of 10 defending hit points

For our third technique, I think we can do better than an estimate this time. We can solve for the actual expected values.
We'll again look at the number of rounds $R_d$ based on starting with $d$ defending hit points. If we start with no defending units, the siege lasts 0 rounds.
\begin{align}
\mathbb{E}R_0 = 0
\end{align}
If we start with 1 hit point of defenders, then it's just a question of how long until the attacker rolls one hit. The probability of one or more hits is $1-p_0$, and we've previously found that the expected number of tries to get a result that occurs with probability $p$ is $1/p$, so we can find the expected value of $R_1$.
\begin{align}
\mathbb{E}R_1 &= \frac{1}{1-p_0}
\end{align}
We can also leverage the columns of $\mathbf{P}$ to find this result. We know the probabilities of transitioning to each other state. Expectation is linear, so we can write the expectation of the length of a siege in one state based on the expected durations of other states and the transition probabilities.
\begin{align}
\mathbb{E}R_i = \sum_{j=0}^{10} p_{j,i} \cdot (1 + \mathbb{E} R_j)
\end{align}
In particular for state 1, we find the following.
\begin{align}
\mathbb{E}R_1 &= p_0 \cdot (1 + \mathbb{E} R_1) + (1- p_0) \cdot (1 + \underbrace{\mathbb{E} R_0}_{0}) \\
\mathbb{E}R_1 \cdot (1 - p_0) &= p_0 + 1 - p_0 \\
&= 1 \\
\mathbb{E}R_1 &= \frac{1}{1-p_0}
\end{align}
This matches what we had before, so the derivation wasn't strictly necessary, but it should help build confidence in this method. We can extend this and work up to each new state.
\begin{align}
\mathbb{E}R_2 &= p_0 \cdot (1 + \mathbb{E} R_2) + p_1 \cdot (1 + \mathbb{E}R_1) + \left ( \sum_{j=2}^5 p_j \right ) \cdot (1 + \mathbb{E} R_0 )
\end{align}
I'll let you go through the algebra to solve this yourself, but as you can imagine, it's not necessarily a very elegant equation.
\begin{align}
\mathbb{E}R_2 &= \frac{1 + \frac{p_1}{1-p_0}}{1-p_0}
\end{align}
This continues, with ever more nested fractions involving $1-p_0$ denominators. There may be a way to look at things differently once we get past the initial bunching up and into the generic form we derived, but given how complicated this is getting, we're not going to get a closed-form equation from which we can glean much meaning. Rather, the advantage is that we've found a way to calculate the result of interest without approximating anything (beyond the limits inherent in whatever computation tool we use).
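Although the closed forms get unwieldy, the state-by-state recursion is easy to compute numerically. A sketch, solving each $\mathbb{E}R_i$ from the earlier states by isolating the $p_0$ self-transition term:

```python
from math import comb

# Hit pmf for 5 dice against a fortified position (p = 1/6).
p, dice = 1/6, 5
ph = [comb(dice, h) * p**h * (1 - p)**(dice - h) for h in range(dice + 1)]

E = [0.0]  # E[R_0] = 0: no defenders, no rounds
for i in range(1, 11):
    # E[R_i] = sum_h p_h * (1 + E[R_max(i-h,0)]); peel off the h = 0 term
    # (staying put) and divide through by 1 - p_0.
    rest = sum(ph[h] * E[max(i - h, 0)] for h in range(1, dice + 1))
    E.append((1 + rest) / (1 - ph[0]))

print(E[1], E[2])  # match the closed forms 1/(1-p_0) and (1 + p_1/(1-p_0))/(1-p_0)
```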

However, there is an alternate way of solving these equations using linear algebra. First, let's put these expected values into a row vector.
\begin{align}
\mathbf{R} &=
\begin{bmatrix}
\mathbb{E}R_0 &
\mathbb{E}R_1 &
\dots &
\mathbb{E}R_{10}
\end{bmatrix}
\end{align}
Now, we can express this in terms of the transition matrix $\mathbf{P}$. We're taking the previous equation for the expected number of rounds to reach state 0 from each other state,
\begin{align}
\mathbb{E}R_i = \sum_{j=0}^{10} p_{j,i} \cdot (1 + \mathbb{E} R_j),
\end{align}
and rewriting them all at once with matrices.
\begin{align}
\mathbf{R} &= \left ( \mathbf{J_{1\times 11}} + \mathbf{R} \right ) \times \mathbf{P}
\end{align}
Here $\mathbf{J_{1\times 11}}$ is a one by eleven matrix of all ones, i.e., a row vector of ones. (Note that while my indexing starts at 0, the dimension subscripts here still refer to the size.) Next, we'll do some linear algebra to solve for $\mathbf{R}$.
\begin{align}
\mathbf{R} - \mathbf{R} \times \mathbf{P} &= \mathbf{J_{1\times 11}} \times \mathbf{P} \\
\mathbf{R} \times \left ( \mathbf{I} - \mathbf{P} \right ) &= \mathbf{J_{1\times 11}} \times \mathbf{P} \\
\mathbf{R} &= \mathbf{J_{1\times 11}} \times \mathbf{P} \times \left ( \mathbf{I} - \mathbf{P} \right )^{-1}
\end{align}
At this point we can simplify the equation, because multiplying a row vector of ones by each column of $\mathbf{P}$ is equivalent to adding up each column of $\mathbf{P}$ and putting the results into a row vector. Since each column represents the transition probabilities from a given state and we're considering all the states, each column must sum to one, as the probability of transitioning to some state is 1 (staying in the same state counts as a transition for simplicity).
\begin{align}
\mathbf{J_{1\times 11}} \times \mathbf{P} &= \mathbf{J_{1\times 11}} \\
\mathbf{R} &= \mathbf{J_{1\times 11}} \times \left ( \mathbf{I} - \mathbf{P} \right )^{-1}
\end{align}
This equation looks like it should work, but doesn't because $\mathbf{I} - \mathbf{P} $ isn't invertible, as it has a column of zeros. The left column of $\mathbf{P}$ is a one followed by all zeros, which is exactly the same as the left column of the identity matrix $\mathbf{I}$.

We can, however, look at a submatrix, because we know $\mathbb{E}R_0=0$, so we can just eliminate that part of our set of equations.
\begin{align}
\mathbf{R} &=
\begin{bmatrix}
0 &
\mathbb{E}R_1 &
\cdots &
\mathbb{E}R_{10}
\end{bmatrix}\\
&=
\begin{bmatrix}
0 & \mathbf{R_{1\text{-}10}}
\end{bmatrix}
\end{align}
We'll start by dividing $\mathbf{P}$ into submatrices.
\begin{align}
\mathbf{P} =
\begin{bmatrix}
1 & \mathbf{T} \\
\mathbf{0} & \mathbf{Q} \\
\end{bmatrix}
\end{align}
Here, 1 is a single 1, representing that once we reach state 0 we stay there. As a corollary, in state 0 we don't transition to any other states, so the column vector $\mathbf{0}$ represents a set of zeros for those transition probabilities. Next, $\mathbf{Q}$ represents the one-step transition probabilities amongst the transient states 1-10. (Transient means that the system is only in those states temporarily as it transitions, at least with probability 1; there is the pathological case where it could stay there forever, which occurs with probability 0.) Finally, $\mathbf{T}$ represents the probabilities of transitioning in one step from the transient states 1-10 to the recurrent and absorbing state 0. (Recurrent states, once reached, recur infinitely many times with probability 1. Absorbing states can never be left.) Note that $\mathbf{T}$ is often called $\mathbf{R}$, but we've already used that, so we had to go with a different letter here. I've often seen the matrix $\mathbf{P}$ organized a bit differently, with the recurrent states at the bottom. (The calculation we're doing is often referred to as finding the hitting time.) While convention is useful, we are not constrained by it.

Next, we just plug in these divided $\mathbf{R}$ and $\mathbf{P}$.
\begin{align}
\begin{bmatrix}0 & \mathbf{R_{1\text{-}10}} \end{bmatrix}
&= \left ( \mathbf{J_{1\times 11}} + \mathbf{R} \right ) \times
\begin{bmatrix}
1 & \mathbf{T} \\
\mathbf{0} & \mathbf{Q} \\
\end{bmatrix}
\end{align}
Now we'll get rid of the left entry of both sides. Note that another way of looking at the issue with the full matrix $\mathbf{P}$ is that it yielded a nonsensical equation for $\mathbb{E}R_0$.
\begin{align}
\mathbb{E}R_0 &= 1 + \mathbb{E}R_0
\end{align}
We could fix this equation so that it's not wrong, but including $\mathbb{E}R_0 = \mathbb{E}R_0$ still doesn't let us solve for $\mathbb{E}R_0$, which we already know anyway.

To remove the left entry of the right side of our matrix equation, we remove the left column of $\mathbf{P}$.
\begin{align}
\mathbf{R_{1\text{-}10}} &= \left ( \mathbf{J_{1\times 11}} + \mathbf{R} \right ) \times \begin{bmatrix} \mathbf{T} \\ \mathbf{Q} \end{bmatrix}
\end{align}
Now we'll break up $\mathbf{J_{1\times 11}}$ and $\mathbf{R}$ on the right side, so that we can multiply out with the remains of $\mathbf{P}$, using the rules of matrix multiplication.
\begin{align}
\mathbf{R_{1\text{-}10}} &= \left (
\begin{bmatrix} \mathbf{J_{1\times 1}} & \mathbf{J_{1\times 10}} \end{bmatrix}
+ \begin{bmatrix} 0 & \mathbf{R_{1\text{-}10}} \end{bmatrix} \right )
\times
\begin{bmatrix} \mathbf{T} \\ \mathbf{Q} \end{bmatrix} \\
&=\mathbf{J_{1\times 1}} \times \mathbf{T} + \left ( \mathbf{J_{1\times 10}} + \mathbf{R_{1\text{-}10}} \right ) \times \mathbf{Q}
\end{align}
Now we do some algebra so that we can solve for $\mathbf{R_{1\text{-}10}}$.
\begin{align}
\mathbf{R_{1\text{-}10}} \times \left ( \mathbf{I} - \mathbf{Q} \right ) &= \mathbf{J_{1\times 1}} \times \mathbf{T}+ \mathbf{J_{1\times 10}} \times \mathbf{Q} \\
\mathbf{R_{1\text{-}10}} &= \left ( \mathbf{J_{1\times 1}} \times \mathbf{T}+ \mathbf{J_{1\times 10}} \times \mathbf{Q} \right ) \times \left ( \mathbf{I} - \mathbf{Q} \right ) ^ {-1}
\end{align}
Recall our step earlier where we used the equation $\mathbf{J_{1\times 11}} \times \mathbf{P} = \mathbf{J_{1\times 11}}$ to simplify, as each column of $\mathbf{P}$ sums to one. This remains true of the submatrix.
\begin{align}
\begin{bmatrix} \mathbf{J_{1\times 1}} & \mathbf{J_{1\times 10}} \end{bmatrix} \times
\begin{bmatrix}\mathbf{T} \\ \mathbf{Q} \end{bmatrix} &= \mathbf{J_{1\times 1}} \times \mathbf{T} + \mathbf{J_{1\times 10}} \times \mathbf{Q}\\
&= \mathbf{J_{1\times 10}}
\end{align}
Recall that $\mathbf{T}$ and $\mathbf{Q}$ have 10 columns, hence the 10 columns in the last equation, which we can use to find $\mathbf{R_{1\text{-}10}}$.
\begin{align}
\mathbf{R_{1\text{-}10}} &= \mathbf{J_{1\times 10}} \times \left ( \mathbf{I} - \mathbf{Q} \right ) ^ {-1}
\end{align}
The result of this computation for 1-10 initial defending hit points is plotted in Figure 7. While a fair amount of computation is hidden inside the matrix inversion, which is not always trivial, in this case standard algorithms seem to have no issues, numerical or otherwise.
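This final calculation can be sketched in a few lines of NumPy, building $\mathbf{P}$ as before and then slicing out $\mathbf{Q}$ (again for $p = 1/6$; the variable names are mine):

```python
import numpy as np
from math import comb

# Transition matrix for p = 1/6 (fortified position, no leadership).
p, dice, states = 1/6, 5, 11
ph = [comb(dice, h) * p**h * (1 - p)**(dice - h) for h in range(dice + 1)]
P = np.zeros((states, states))
P[0, 0] = 1.0
for j in range(1, states):
    for h, prob in enumerate(ph):
        P[max(j - h, 0), j] += prob

Q = P[1:, 1:]                 # transitions among the transient states 1-10
J = np.ones((1, states - 1))  # row vector of ones
R = J @ np.linalg.inv(np.eye(states - 1) - Q)
print(R[0])  # expected siege length for 1 through 10 defending hit points
```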

Figure 7: Calculated expected siege length by number of defending hit points

The results from the three methods can be compared by inspecting the plots in Figures 8 and 9, which show the results with and without leadership rerolls, respectively. The first rough estimate is somewhat lower than the two later techniques, which appear indistinguishable from each other. Both show very similar trends, dominated by a linear relationship between initial defending hit points and the expected length of the siege.

Figure 8: Comparison of methods to find expected length of siege with rerolls

Figure 9: Comparison of methods to find expected length of siege without rerolls

It may also be interesting to compare against the methods used when we addressed the question What is the best weapon in D&D?. There we used a different approach when dealing with similar issues.