
Saturday, May 10, 2025

Mastering Stratego, the classic game with imperfect information


Authors

Julien Perolat, Bart De Vylder, Daniel Hennes, Eugene Tarasov, Florian Strub and Karl Tuyls

DeepNash learns to play Stratego from scratch by combining game theory and model-free deep RL

Artificial intelligence (AI) systems for game-playing have reached new heights. Stratego, the classic board game that's more complex than chess and Go, and craftier than poker, has now been mastered. Published in Science, we present DeepNash, an AI agent that learned the game from scratch to a human expert level by playing against itself.

DeepNash uses a novel approach based on game theory and model-free deep reinforcement learning. Its play style converges to a Nash equilibrium, which means its play is very hard for an opponent to exploit. So hard, in fact, that DeepNash reached an all-time top-three ranking among human experts on the world's largest online Stratego platform, Gravon.

Board games have long been a measure of progress in AI, allowing us to study how humans and machines develop and execute strategies in a controlled environment. Unlike chess and Go, Stratego is a game of imperfect information: players cannot directly observe the identities of their opponent's pieces.

This complexity has meant that other AI-based Stratego systems have struggled to advance beyond amateur level. It also means that a highly successful AI technique called "game tree search", previously used to master many games of perfect information, is not sufficiently scalable for Stratego. For this reason, DeepNash goes far beyond game tree search.
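To make the contrast concrete, here is a minimal sketch of classic game tree search (plain minimax, not DeepNash's method), written against a hypothetical state interface. It works only because it can enumerate every successor of a fully visible position – exactly the premise that Stratego's hidden pieces deny.

```python
def minimax(state, maximizing):
    # Classic game tree search assumes the full game state is visible,
    # so every successor position can be enumerated and scored.
    # Stratego breaks this premise: the opponent's pieces are hidden,
    # so the true successors of a position aren't even known.
    if state.is_terminal():
        return state.value()
    values = (minimax(child, not maximizing) for child in state.successors())
    return max(values) if maximizing else min(values)
```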

The value of mastering Stratego goes beyond gaming. In pursuit of our mission of solving intelligence to advance science and benefit humanity, we need to build advanced AI systems that can operate in complex, real-world situations with limited information about other agents and people. Our paper shows how DeepNash can be applied in situations of uncertainty and successfully balance outcomes to help solve complex problems.

Getting to know Stratego

Stratego is a turn-based capture-the-flag game. It's a game of bluff and tactics, of information gathering and subtle maneuvering. And it's a zero-sum game, so any gain by one player represents a loss of the same magnitude for their opponent.

Stratego poses a challenge for AI in part because it's a game of imperfect information. Both players start by arranging their 40 playing pieces in any starting formation, initially hidden from one another. Since both players don't have access to the same knowledge, they need to balance all possible outcomes when making a decision – providing a challenging benchmark for studying strategic interactions. The types of pieces and their rankings are shown below.

Left: Piece rankings. In battles, higher-ranked pieces win, except the 10 (Marshal) loses when attacked by a Spy, and Bombs beat everything except Miners.
Middle: A possible starting formation. Notice how the Flag is tucked away safely at the back, flanked by protective Bombs. The two light blue areas are "lakes" and can never be entered.
Right: A game in play, showing Blue's Spy capturing Red's 10.
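The hidden starting formations alone are a major source of this difficulty. As a rough worked example (using the standard piece counts shown above; this figure isn't stated in the post itself), a few lines of Python count the distinct deployments – consistent with the roughly 10^66 joint deployments discussed in our paper:

```python
from math import factorial

# Standard Stratego piece counts per player (40 pieces on 40 squares):
# Flag, Bombs, Spy, Scouts, Miners, Sergeants, Lieutenants,
# Captains, Majors, Colonels, General, Marshal
counts = [1, 6, 1, 8, 5, 4, 4, 4, 3, 2, 1, 1]
assert sum(counts) == 40

denominator = 1
for c in counts:
    denominator *= factorial(c)  # identical pieces are interchangeable

per_player = factorial(40) // denominator
print(f"{per_player:.2e}")       # ~1.41e+33 formations per player
print(f"{per_player**2:.2e}")    # ~1.99e+66 for both players combined
```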

Information is hard to come by in Stratego. The identity of an opponent's piece is typically revealed only when it meets another piece on the battlefield. This is in stark contrast to games of perfect information, such as chess or Go, in which the location and identity of every piece are known to both players.

The machine learning approaches that work so well on perfect information games, such as DeepMind's AlphaZero, are not easily transferred to Stratego. The need to make decisions with imperfect information, and the potential to bluff, makes Stratego more akin to Texas Hold'em poker and requires a human capacity once noted by the American writer Jack London: "Life is not always a matter of holding good cards, but sometimes, playing a poor hand well."

However, the AI techniques that work so well in games like Texas Hold'em don't transfer to Stratego because of the sheer length of the game – often hundreds of moves before a player wins. Reasoning in Stratego must be done over a large number of sequential actions, with no obvious insight into how each action contributes to the final outcome.

Finally, the number of possible game states (expressed as "game tree complexity") is off the chart compared with chess, Go and poker – roughly 10^535 nodes, around 10^175 times more than Go – making it immensely difficult to solve. This is what excited us about Stratego, and why it has represented a decades-long challenge for the AI community.

The scale of differences between chess, poker, Go and Stratego.

Seeking an equilibrium

DeepNash uses a novel approach based on a combination of game theory and model-free deep reinforcement learning. "Model-free" means that DeepNash does not attempt to explicitly model its opponent's private game state during play. In the early stages of the game in particular, when DeepNash knows little about its opponent's pieces, such modeling would be ineffective, if not impossible.

And because the game tree complexity of Stratego is so vast, DeepNash cannot employ the stalwart approach of AI-based gaming – Monte Carlo tree search. Tree search has been a key ingredient of many landmark AI achievements for less complex board games, and for poker.

Instead, DeepNash is powered by a new game-theoretic algorithmic idea that we call Regularized Nash Dynamics (R-NaD). Working at an unparalleled scale, R-NaD steers DeepNash's learning behavior toward a so-called Nash equilibrium (dive into the technical details in our paper).
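At a high level, R-NaD alternates between regularizing the reward toward a frozen reference policy and restarting that reference at the resulting fixed point. The sketch below is a loose illustration of that loop, not the paper's exact formulation (which also includes a symmetric term for the opponent); all names and the value of eta are illustrative.

```python
import numpy as np

def kl_regularized_reward(reward, pi, pi_reg, action, eta=0.2):
    # Penalize deviation from the frozen regularization policy pi_reg:
    # the learner pays eta * log(pi(a)/pi_reg(a)) for drifting away from it.
    return reward - eta * np.log(pi[action] / pi_reg[action])

def rnad_outer_loop(solve_regularized_game, pi_init, num_iterations=10):
    # `solve_regularized_game` stands in for any model-free RL routine that
    # trains a policy to the fixed point of the game whose rewards have been
    # transformed by kl_regularized_reward.
    pi_reg = pi_init
    for _ in range(num_iterations):
        pi = solve_regularized_game(pi_reg)  # steps 1-2: transform and converge
        pi_reg = pi                          # step 3: restart the regularization
    return pi_reg
```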

Game behavior that results in a Nash equilibrium is unexploitable over time. If a person or machine played perfectly unexploitable Stratego, the worst win rate they could achieve would be 50%, and only if facing a similarly perfect opponent.
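To see what "unexploitable" means concretely, here is a minimal sketch in a toy zero-sum game (rock-paper-scissors rather than Stratego): the exploitability of a strategy is how much a best-responding opponent can gain against it, and it is zero exactly at the Nash equilibrium.

```python
import numpy as np

# Rock-paper-scissors payoffs for the row player: +1 win, 0 draw, -1 loss.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])

def exploitability(row_strategy):
    # A best-responding column player picks the column that is worst for
    # the row player; at a Nash strategy this best response gains nothing.
    return -np.min(row_strategy @ A)

print(exploitability(np.array([1/3, 1/3, 1/3])))  # 0.0 -> opponent can only break even
print(exploitability(np.array([1.0, 0.0, 0.0])))  # 1.0 -> "always rock" loses to paper
```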

In matches against the best Stratego bots – including several winners of the Computer Stratego World Championship – DeepNash's win rate topped 97%, and was frequently 100%. Against the top expert players on the Gravon games platform, DeepNash achieved a win rate of 84%, earning it an all-time top-three ranking.

Expect the unexpected

Stratego players strive to be unpredictable, so there is value in keeping information hidden. DeepNash demonstrates how it values information in quite striking ways. In the example below, against a human player, DeepNash (blue) sacrificed, among other pieces, a 7 (Major) and an 8 (Colonel) early in the game, and as a result was able to locate the opponent's 10 (Marshal), 9 (General), an 8, and two 7s.

In this early game situation, DeepNash (blue) has already located many of its opponent's most powerful pieces, while keeping its own key pieces secret.

These efforts left DeepNash at a significant material disadvantage; it lost a 7 and an 8, while its human opponent preserved all their pieces ranked 7 and above. Nevertheless, having solid intel on its opponent's top pieces, DeepNash evaluated its winning chances at 70% – and it won.

The art of bluffing

As in poker, a good Stratego player must sometimes represent strength, even when weak. DeepNash learned a variety of such bluffing tactics. In the example below, DeepNash uses a 2 (a weak Scout, unknown to its opponent) as if it were a high-ranking piece, pursuing its opponent's known 8. The human opponent decides the pursuer is most likely a 10, and so attempts to lure it into an ambush set by their Spy. This tactic by DeepNash, risking only a minor piece, succeeds in flushing out and eliminating its opponent's Spy, a critical piece.

The human player (red) is convinced the unknown piece chasing their 8 must be DeepNash's 10 (note: DeepNash had already lost its only 9).

See more by watching these four videos of full-length games played by DeepNash against (anonymized) human experts: Game 1, Game 2, Game 3, Game 4.

The level of play of DeepNash surprised me. I had never heard of an artificial Stratego player that came close to the level needed to win a match against an experienced human player. But after playing against DeepNash myself, I wasn't surprised by the top-three ranking it later achieved on the Gravon platform. I expect it would do very well if allowed to participate in the human World Championships.

Vincent de Boer, co-author of the paper and former Stratego World Champion

Future directions

While we developed DeepNash for the highly defined world of Stratego, our novel R-NaD method can be directly applied to other two-player zero-sum games of both perfect and imperfect information. R-NaD has the potential to generalize far beyond two-player gaming settings to address large-scale real-world problems, which are often characterized by imperfect information and astronomical state spaces.

We also hope R-NaD can help unlock new applications of AI in domains that feature a large number of human or AI participants with different goals who might not have information about the intentions of others or what's occurring in their environment – for example, in the large-scale optimization of traffic management to reduce driver journey times and the associated vehicle emissions.

In creating a generalizable AI system that's robust in the face of uncertainty, we hope to bring the problem-solving capabilities of AI further into our inherently unpredictable world.

Learn more about DeepNash by reading our paper in Science.

For researchers interested in trying out R-NaD or working with our newly proposed method, we've open-sourced our code.
