A game like poker is one with imperfect information. Because each player has a set of starting cards that others can’t see, a player can bluff. Despite the added complexity of the game compared with perfect-information games like chess, in 2015 artificial intelligence (AI) researchers designed a game-winning strategy for Texas Hold ’em, a variation of poker with 10164 possible game states.
That number, however, is but a fraction of the 10535 possible states for the board game Stratego. As in capture the flag, each player guards their flag and tries to capture their opponent’s. Two players each control 40 pieces: A piece can capture one of lower rank, but the specific ranks of the opponent’s pieces are unknown. (Only during an interaction between pieces do their ranks become known.) For an AI algorithm to win, it must make a series of long-term strategic moves and analyze a staggering 1060 times as many starting arrangements as a two-player game of Texas Hold ’em.
The research laboratory DeepMind Technologies became famous in 2016 when its AlphaGo algorithm beat Go world champion Lee Sedol in a five-game match. Now a DeepMind team led by Julien Perolat, Bart De Vylder, and Karl Tuyls has developed an algorithm called DeepNash that plays Stratego at the level of a human expert.
DeepNash plays at a highly competitive level by finding a Nash equilibrium. In that approach, the algorithm finds a winning strategy by applying a set of tactical moves that can’t be exploited by the opponent.
Many games, including Stratego, can have multiple Nash equilibria, and oftentimes researchers will develop an opponent model that tracks all the possible game states that could form from various moves and the likelihood of the player making each of those moves on a particular turn.
But calculating all those possibilities in an imperfect-information game quickly becomes too computationally expensive. The researchers approached the challenge by building DeepNash as a deep-learning neural network, which improves the algorithm’s skill by having it play against itself, combined with a type of algorithm known as Regularized Nash Dynamics.
At its simplest, the Regularized Nash Dynamics algorithm is an iterative process consisting of three steps. In the first step, the game is transformed into a simplified decision-making situation in which two agents each take some action according to a regularization policy added to the game. Akin to an error-minimization technique, regularization discourages a newly learned set of actions from deviating too far from the policy assigned at the beginning of the iteration. Step two takes the simplified, transformed game and converges on a possible winning solution. In the third and final step, the previous transformation of the game is updated with the solution found in step two.
When playing Stratego, DeepNash made several humanlike tactical decisions. In Stratego, players can often gain an advantage by knowing more private information than their opponents, even if they have fewer pieces on the board. In some games, DeepNash purposefully chose not to move certain pieces, which left its opponent unsure of the rank of those pieces. Over the thousands of games played, DeepNash won 97% of the time against other AI bots and 84% of the time against human experts. (J. Perolat et al., Science 378, 990, 2022.)