# Minimax

Minimax (sometimes minmax) is a method in decision theory for minimizing the maximum possible loss. Alternatively, it can be thought of as maximizing the minimum gain (maximin). It originated in two-player zero-sum game theory, covering both the case where players move alternately and the case where they move simultaneously. It has also been extended to more complex games and to general decision making in the presence of uncertainty.

A simple version of the algorithm deals with games such as tic-tac-toe, where each player can win, lose, or draw. If player A can win in one move, his best move is that winning move. If player B knows that one move will lead to the situation where player A can win in one move, while another move will lead to the situation where player A can, at best, draw, then player B's best move is the one leading to a draw. Late in the game, it is easy to see what the "best" move is. The minimax algorithm helps find the best move by working backwards from the end of the game. At each step it assumes that player A is trying to maximize the chances of A winning when he plays, while on the next turn player B is trying to minimize the chances of A winning (i.e., to maximize B's own chances of winning).

## Minimax criterion in statistical decision theory

In classical statistical decision theory, we have an estimator $\delta$ that is used to estimate a parameter $\theta \in \Theta$. We also assume a risk function $R(\theta,\delta)$, usually specified as the integral of a loss function. In this framework, $\tilde{\delta}$ is called minimax if it satisfies

$\sup_\theta R(\theta,\tilde{\delta}) = \inf_\delta \sup_\theta R(\theta,\delta)$.

An alternative criterion in the decision-theoretic framework is the Bayes estimator, which is defined with respect to a prior distribution $\Pi$. An estimator is Bayes if it minimizes the average risk

$\int_\Theta R(\theta,\delta)\,d\Pi(\theta)$.
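The difference between the two criteria can be seen on a small finite example. The following is a minimal sketch, assuming a finite parameter set and a finite set of candidate estimators; the risk values, the prior, and the names `RISK`, `PRIOR`, `minimax_estimator` and `bayes_estimator` are all hypothetical and chosen only for illustration.

```python
# Hypothetical risk table: RISK[estimator][theta] = R(theta, delta).
RISK = {
    "delta_1": {"theta_a": 1.0, "theta_b": 4.0},
    "delta_2": {"theta_a": 2.5, "theta_b": 2.5},
    "delta_3": {"theta_a": 3.0, "theta_b": 1.5},
}

# Hypothetical prior distribution Pi over the two parameter values.
PRIOR = {"theta_a": 0.8, "theta_b": 0.2}


def minimax_estimator(risk):
    """Estimator whose worst-case risk (sup over theta) is smallest."""
    return min(risk, key=lambda d: max(risk[d].values()))


def bayes_estimator(risk, prior):
    """Estimator whose prior-weighted average risk is smallest."""
    return min(risk, key=lambda d: sum(prior[t] * r for t, r in risk[d].items()))


print(minimax_estimator(RISK))
print(bayes_estimator(RISK, PRIOR))
```

With these made-up numbers the two criteria disagree: the minimax choice guards against the worst parameter value, while the Bayes choice favours the estimator that does well where the prior puts most of its weight.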

## Minimax algorithm with alternate moves

A minimax algorithm is a recursive algorithm for choosing the next move in a two-player game. A value is associated with each position or state of the game. This value is computed by means of a position evaluation function and it indicates how good it would be for a player to reach that position. The player then makes the move that maximises the minimum value of the position resulting from the opponent's possible following moves. If it is A's turn to move, A gives a value to each of his legal moves.

One allocation method is to assign a certain win for A as +1 and for B as −1. This leads to combinatorial game theory as developed by John Horton Conway.

An alternative is to use a rule that if the result of a move is an immediate win for A it is assigned positive infinity and, if it is an immediate win for B, negative infinity. The value to A of any other move is the minimum of the values resulting from each of B's possible replies. A is therefore called the maximizing player and B the minimizing player, hence the name minimax algorithm. This rule assigns a value of positive or negative infinity to every position, since the value of every position is the value of some final winning or losing position. In complicated games such as chess or go this is generally possible only towards the very end of the game, because it is not computationally feasible to look ahead as far as the completion of the game. Earlier positions are instead given finite values as estimates of the degree of belief that they will lead to a win for one player or the other.
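The rule can be sketched on a small explicit game tree. This is an illustration only, assuming a position is represented either as a number (a final position) or as a list of the positions reachable from it; the tree and the names `minimax` and `best_move_for_a` are hypothetical.

```python
import math

WIN_A = math.inf    # immediate win for A
WIN_B = -math.inf   # immediate win for B


def minimax(node, a_to_move):
    """Value to A of a position.

    A leaf is a number (a final position); an internal node is a list of
    the positions reachable by the legal moves of the player to move.
    """
    if not isinstance(node, list):
        return node
    values = [minimax(child, not a_to_move) for child in node]
    # A picks the largest value, B the smallest.
    return max(values) if a_to_move else min(values)


def best_move_for_a(node):
    """Index of the move whose value, after B's best reply, is largest."""
    return max(range(len(node)), key=lambda i: minimax(node[i], False))


# Hypothetical position: A has two moves, and B has two replies to each.
tree = [
    [WIN_A, WIN_B],  # move 0: B has a reply that wins for B
    [WIN_A, WIN_A],  # move 1: every reply by B still loses
]
print(minimax(tree, True), best_move_for_a(tree))
```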

This can be extended if we can supply a heuristic evaluation function which gives values to non-final game states without considering all possible following complete sequences. We can then limit the minimax algorithm to look only a certain number of moves ahead. This number is called the "look-ahead", measured in "plies". For example, the chess computer Deep Blue looked ahead 12 plies.
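A depth-limited search might look like the sketch below. It assumes a toy game chosen purely for illustration (players alternately take one or two stones from a pile, and whoever takes the last stone wins), and the `heuristic` function is a deliberately uninformative placeholder, not a real evaluation function.

```python
def moves(pile):
    """Legal moves in the toy game: take 1 or 2 stones."""
    return [take for take in (1, 2) if take <= pile]


def heuristic(pile, a_to_move):
    """Placeholder estimate for non-final states (assumption: no knowledge)."""
    return 0.0


def depth_limited_minimax(pile, a_to_move, depth):
    """Value to A, looking at most `depth` plies ahead."""
    if pile == 0:
        # The player who just moved took the last stone and won.
        return -1.0 if a_to_move else 1.0
    if depth == 0:
        # Look-ahead exhausted: fall back on the heuristic estimate.
        return heuristic(pile, a_to_move)
    values = [
        depth_limited_minimax(pile - take, not a_to_move, depth - 1)
        for take in moves(pile)
    ]
    return max(values) if a_to_move else min(values)


for depth in (1, 2, 3):
    print(depth, depth_limited_minimax(4, True, depth))
```

In this toy game a look-ahead of one or two plies from a pile of four cannot yet see that A wins, so the search returns the heuristic's neutral value; at three plies the forced win becomes visible.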

The algorithm can be thought of as exploring the nodes of a game tree. The effective branching factor of the tree is the average number of children of each node (i.e., the average number of legal moves in a position). The number of nodes to be explored usually increases exponentially with the number of plies (it is less than exponential if evaluating forced moves or repeated positions). The number of nodes to be explored for the analysis of a game is therefore approximately the branching factor raised to the power of the number of plies. It is therefore impractical to completely analyze games such as chess using the minimax algorithm.
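To give a feel for the growth, the estimate can be evaluated directly. The branching factor of 35 used here is an assumption (a commonly quoted rough figure for chess, not a value taken from this article), and `nodes_to_explore` is a hypothetical helper name.

```python
def nodes_to_explore(branching_factor, plies):
    """Approximate node count of a full-width search: b ** d."""
    return branching_factor ** plies


# 35 is an assumed rough branching factor for chess.
for plies in (2, 4, 8, 12):
    print(plies, nodes_to_explore(35, plies))
```

Under that assumption a 12-ply full-width search already involves on the order of 10^18 positions.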

The performance of the naïve minimax algorithm may be improved dramatically, without affecting the result, by the use of alpha-beta pruning. Other heuristic pruning methods can also be used, but not all of them are guaranteed to give the same result as the un-pruned search.
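A minimal sketch of alpha-beta pruning on an explicit tree is given below, alongside plain minimax for comparison. The tree representation (nested lists with numeric leaves), the leaf values, and the optional `visited` parameter used to record which leaves were evaluated are assumptions made for the example.

```python
import math


def minimax(node, a_to_move):
    """Plain minimax on a tree of nested lists with numeric leaves."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not a_to_move) for child in node]
    return max(values) if a_to_move else min(values)


def alphabeta(node, a_to_move, alpha=-math.inf, beta=math.inf, visited=None):
    """Same value as minimax, skipping branches that cannot change it.

    alpha: best value A can already guarantee; beta: best value B can
    already guarantee. `visited` optionally collects the leaves evaluated.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if a_to_move:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:   # B would never allow this line of play
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:       # A already has a better alternative
            break
    return value


tree = [[3, 5], [2, 9], [0, 7]]   # hypothetical leaf values, A to move
leaves = []
assert alphabeta(tree, True, visited=leaves) == minimax(tree, True)
print(leaves)   # leaves actually evaluated
```

In this example the pruned search returns the same value as plain minimax while skipping two of the six leaves; how much is saved in general depends on the order in which moves are examined.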

## Minimax theorem with simultaneous moves

The following example of a zero-sum game, where A and B make simultaneous moves, illustrates minimax strategies. If each player has three choices and the payoff matrix for A is:

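|              | B chooses B1 | B chooses B2 | B chooses B3 |
|--------------|--------------|--------------|--------------|
| A chooses A1 | +3           | -2           | +2           |
| A chooses A2 | -1           | 0            | +4           |
| A chooses A3 | -4           | -3           | +1           |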

and B has the same payoff matrix with the signs reversed (i.e. if the choices are A1 and B1 then B pays 3 to A) then the simple minimax choice for A is A2 since the worst possible result is then having to pay 1, while the simple minimax choice for B is B2 since the worst possible result is then no payment. However, this solution is not stable, since if B believes A will choose A2 then B will choose B1 to gain 1; then if A believes B will choose B1 then A will choose A1 to gain 3; and then B will choose B2; and eventually both players will realize the difficulty of making a choice. So a more stable strategy is needed.
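The simple (pure-strategy) minimax choices can be checked mechanically. The sketch below encodes the payoff matrix above as a list of rows; the function names are hypothetical, and indices are zero-based, so index 1 corresponds to A2 or B2.

```python
# Payoff to A: rows are A1..A3, columns are B1..B3.
PAYOFF = [
    [3, -2, 2],
    [-1, 0, 4],
    [-4, -3, 1],
]


def maximin_row(payoff):
    """Row whose worst-case payoff to A is largest, with that payoff."""
    worst = [min(row) for row in payoff]
    best = max(range(len(payoff)), key=lambda i: worst[i])
    return best, worst[best]


def minimax_column(payoff):
    """Column whose worst-case payment by B is smallest, with that payment."""
    columns = list(zip(*payoff))
    worst = [max(col) for col in columns]
    best = min(range(len(columns)), key=lambda j: worst[j])
    return best, worst[best]


print(maximin_row(PAYOFF))      # A's choice and guaranteed payoff
print(minimax_column(PAYOFF))   # B's choice and guaranteed maximum payment
```

The two guaranteed values differ (A can only guarantee -1, while B can only guarantee paying at most 0), which is another way of seeing that the pure-strategy solution is not stable.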

Some choices are dominated by others and can be eliminated: A will not choose A3 since either A1 or A2 will produce a better result, no matter what B chooses; B will not choose B3 since B2 will produce a better result, no matter what A chooses.
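The dominance check can be sketched as follows, assuming strict dominance by a single pure strategy (domination by a mixture of strategies is not tested). The function names are hypothetical and indices are zero-based, so index 2 corresponds to A3 or B3.

```python
# Payoff to A: rows are A1..A3, columns are B1..B3.
PAYOFF = [
    [3, -2, 2],
    [-1, 0, 4],
    [-4, -3, 1],
]


def dominated_rows(payoff):
    """Rows for which some other row pays A strictly more in every column."""
    rows = range(len(payoff))
    return [
        i for i in rows
        if any(
            k != i and all(x > y for x, y in zip(payoff[k], payoff[i]))
            for k in rows
        )
    ]


def dominated_columns(payoff):
    """Columns for which some other column costs B strictly less in every row."""
    columns = list(zip(*payoff))
    cols = range(len(columns))
    return [
        j for j in cols
        if any(
            k != j and all(x < y for x, y in zip(columns[k], columns[j]))
            for k in cols
        )
    ]


print(dominated_rows(PAYOFF), dominated_columns(PAYOFF))
```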

A can avoid having to make an expected payment of more than 1/3 by choosing A1 with probability 1/6 and A2 with probability 5/6, no matter what B chooses. B can ensure an expected gain of at least 1/3 by using a randomized strategy of choosing B1 with probability 1/3 and B2 with probability 2/3, no matter what A chooses. These mixed minimax strategies are now stable and cannot be improved.
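These figures can be verified with exact arithmetic. The sketch below computes A's expected payoff for the stated mixtures against every pure choice of the opponent; the variable names are hypothetical.

```python
from fractions import Fraction

# Payoff to A: rows are A1..A3, columns are B1..B3.
PAYOFF = [
    [3, -2, 2],
    [-1, 0, 4],
    [-4, -3, 1],
]

p = [Fraction(1, 6), Fraction(5, 6), Fraction(0)]  # A's mix over A1, A2, A3
q = [Fraction(1, 3), Fraction(2, 3), Fraction(0)]  # B's mix over B1, B2, B3

# Expected payoff to A when A mixes with p and B plays each pure column.
a_mix_vs_columns = [
    sum(p[i] * PAYOFF[i][j] for i in range(3)) for j in range(3)
]

# Expected payoff to A when A plays each pure row and B mixes with q.
rows_vs_b_mix = [
    sum(q[j] * PAYOFF[i][j] for j in range(3)) for i in range(3)
]

print(min(a_mix_vs_columns))  # the worst A can do with this mix
print(max(rows_vs_b_mix))     # the best A can do against B's mix
```

Both numbers come out as -1/3: A's mixture never yields less than -1/3 whatever B does, and B's mixture never lets A obtain more than -1/3, so neither player can improve by deviating.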

John von Neumann proved the Minimax theorem in 1928, stating that such strategies always exist in two-person zero-sum games and can be found by solving a set of simultaneous equations.
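For a 2×2 game the simultaneous equations can be solved in closed form. The following sketch applies this to the example above after A3 and B3 have been eliminated; it assumes the 2×2 game has no saddle point (so that both players genuinely mix), and `solve_2x2` is a hypothetical helper, not a general solver.

```python
from fractions import Fraction


def solve_2x2(a, b, c, d):
    """Mixed minimax solution of the zero-sum game [[a, b], [c, d]].

    Assumes the game has no saddle point, so both players mix.
    Solves the indifference equations
        a*p + c*(1 - p) = b*p + d*(1 - p)   (row player's probability p)
        a*q + b*(1 - q) = c*q + d*(1 - q)   (column player's probability q)
    and returns (p, q, value to the row player).
    """
    denom = Fraction(a - b - c + d)
    p = (d - c) / denom              # probability of the first row
    q = (d - b) / denom              # probability of the first column
    value = (a * d - b * c) / denom  # expected payoff to the row player
    return p, q, value


# Reduced game from the example: rows A1, A2 and columns B1, B2.
print(solve_2x2(3, -2, -1, 0))
```

Larger games generally require linear programming rather than a closed-form expression.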

## Minimax in the face of uncertainty

Minimax theory has been extended to decisions where there is no other player, but where the consequences of decisions depend on unknown facts. For example, deciding to prospect for minerals entails a cost which will be wasted if the minerals are not present, but will bring major rewards if they are. One approach is to treat this as a game against Nature and, in a mindset similar to Murphy's law, to take an approach which minimizes the maximum expected loss, using the same techniques as in two-person zero-sum games.

In addition, expectiminimax trees have been developed, for two-player games in which chance (for example, dice) is a factor.
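The idea can be sketched by adding chance nodes to a minimax tree, where a chance node takes the probability-weighted average of its children. The tuple-based tree encoding, the probabilities and the leaf values below are assumptions made for this illustration.

```python
def expectiminimax(node):
    """Value to the maximizing player of a tree with chance nodes.

    A node is a tuple: ("leaf", value), ("max", [children]),
    ("min", [children]) or ("chance", [(probability, child), ...]).
    """
    kind, content = node
    if kind == "leaf":
        return content
    if kind == "max":
        return max(expectiminimax(child) for child in content)
    if kind == "min":
        return min(expectiminimax(child) for child in content)
    if kind == "chance":
        # Expected value over the random outcomes (e.g. a die roll).
        return sum(prob * expectiminimax(child) for prob, child in content)
    raise ValueError(f"unknown node kind: {kind}")


# Hypothetical position: A picks a move, a fair coin is tossed, B may reply.
tree = ("max", [
    ("chance", [
        (0.5, ("min", [("leaf", 4), ("leaf", 2)])),
        (0.5, ("leaf", 0)),
    ]),
    ("chance", [
        (0.5, ("leaf", 3)),
        (0.5, ("min", [("leaf", 1), ("leaf", 5)])),
    ]),
])
print(expectiminimax(tree))
```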

## Minimax in non-zero-sum games

In games that are not zero-sum in terms of the payoffs between the players, minimax play can lead to apparently non-optimal outcomes. For example, in the prisoner's dilemma, the minimax strategy for each prisoner is to betray the other, even though they would each do better if neither confessed their guilt. So, for non-zero-sum games the best strategy is not necessarily minimax.
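The prisoner's dilemma case can be made concrete with a small sketch. The sentence lengths below are hypothetical numbers chosen to have the usual prisoner's dilemma structure, and the names `YEARS` and `minimax_choice` are illustrative.

```python
# Hypothetical sentences in years: YEARS[my_choice][other_choice] = my sentence.
YEARS = {
    "stay silent": {"stay silent": 1, "betray": 3},
    "betray": {"stay silent": 0, "betray": 2},
}


def minimax_choice(years):
    """Choice that minimizes my maximum possible sentence."""
    return min(years, key=lambda mine: max(years[mine].values()))


choice = minimax_choice(YEARS)  # the game is symmetric, so both pick this
minimax_outcome = YEARS[choice][choice]
cooperative_outcome = YEARS["stay silent"]["stay silent"]
print(choice, minimax_outcome, cooperative_outcome)
```

With these numbers each prisoner's minimax choice is to betray, which leaves both with a longer sentence than if both had stayed silent.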