optimization – Combing playing darts with dynamic programming

When you play darts, you can throw at 62 regions, z on the dartboard. Namely, the singles regions S1, …, S20, the double regions D1, …, D20, the treble regions T2, …, T20 and the single- and double bullseye, SB and DB. Every region has a numerical score, h. Lets say, that mu represents the location on the dartboard where the player aims at. In addition, we know the probability of hitting any z when aiming for mu.

Ignoring turns, a player wants to minimize the expected number of throws to get to zero. This can be formulated as a stochastic shortest path DP. The state space for this DP is s {1, 2, …, 501} and represents the players score. i denotes the number of darts thrown. The goal is to determine the aiming location that minimizes the expected number of throws.

When a player throws, there are three possible outcomes:

This can be formulated as:

enter image description here

In addition, we have a Bellman equation that satisfies

enter image description here

The terminal condition is equal to


We can solve the DP successively for s = 2, …, 501. All of the above can be found on page 13 of the following paper.

Just for my understanding, please let me know if I understand the methodology correctly:
If we have s = 2, there is only one legal action, D1. This the only z where where we consider all p(z; mu) for, correct? Because other possibilities of z results in busts. And then you have basically a circular reference where the value of V(2) is

V(f(2,0) = V(2) 

In addition, throwing D1 when aiming for mu gives the terminal condition

V(f(2, D1)) = 0

So, the Bellman equation will always be equal to 1, independent of mu? Although, the probability of throwing D1 is larger when the player aims for D1 than for DB, the product of the probabilities and the terminal condition is always 0 (and next we add 1).So, it doesn’t matter where you aim on the dartboard when s = 2, but this feels incorrect.
Please advice!