Dynamic programming – Probability, gambling and a die
April 17, 2011 Leave a comment
Let’s say we have a fair 6-sided die. Player A can earn money based on what shows up when the die is rolled. The rule of the game is as follows: Player A will roll a die upto 3 times. After each roll, Player A has two choices: (1) He can take money in dollars equal to the number showing on the upturned face of the die. (i.e. if 5 shows on the die, he can take $5.) In this case, the die is not rolled any more, and the game ends. (2) He can decide to discard this roll of the die, and decide to roll the die again. Note that the die can be rolled at most 3 times.
Question — What is the expected payoff for Player A?
Let’s say that instead of 3 rolls of the die, the game allowed only for 1 roll. What is the expected payoff for Player A in this case? It is clearly equal to the expected value of the number that shows up when the die is rolled. And since the die is fair, this expectation is equal to . And Player A has no choice but to accept whatever shows up on the die because there is only one roll possible. Hence, in this game, the expected payoff for Player A is $3.5.
Now, consider what happens if instead of 1 roll, we allow for upto 2 rolls.
Now, if the 1st roll is a 6, obviously Player A would pocket $6 and decide to end the game there itself. Also, if the 1st roll is a 1, Player A can always decide to roll again [because irrespective of what shows up on the 2nd roll, it cannot be less than $1, and hence Player A has nothing to lose by deciding to discard the 1st roll and opting to roll again.]
Now consider what happens if the 1st roll is a 3. Now, Player A is faced with a predicament. What should he do? Should he decide to stop there and go home with $3, or should he discard this 3 in the hope of higher returns in the 2nd and final roll?
This is where having learnt probability well in school helps.
We know that the expected value of the number that shows up when the die is rolled once is 3.5.
Hence, if in the 1st roll, Player A gets a number greater than 3.5 i.e. either 4, 5 or 6, he can decide to stop there and take the corresponding amount of money.
However, if he gets either 1, 2 or 3 in his 1st roll, he should discard that, and decide to roll again, because the expected value of the 2nd roll is 3.5.
Hence, the expected payoff for Player A from a game which allows for at most 2 rolls is: .
(Here, is the probability of getting a 1, 2 or 3 on the 1st roll, in which case the expected payoff becomes 3.5. )
Now, coming back to our original game, we can decide as to how we should proceed after the die is rolled once.
We know that the expected payoff from 2 rolls of the die is $4.25.
Hence the strategy of Player A should be the following:
– If the 1st roll of the die yields either a 5 or a 6, take the money, and end the game.
– Else, discard the 1st roll, and continue the game.
The expected payoff can be computed easily based on the preceding observations:
Expected payoff for Player A = .
This is a classic question using the principle of dynamic programming. Let’s say the the game allows for the die to be tossed upto times.
Let denote the expected payoff for Player A in a game that allows upto rolls of the dice.
The strategy for Player A after roll should be the following:
— If the number showing on the dice is greater than , take the money and stop the game.
— Else, discard that roll, and decide to continue on to the next roll.
The recurrence relation for computing the values can be derived in terms of as was done in the case of the 3-roll example that we saw earlier. Once we have done that, we can calculate the values in the order, , and so on uptil .