In game theory, generalists sometimes beat specialists

Share

Whether you’re playing poker against a single opponent or in a bidding war for a home purchase with another potential buyer, you’re operating under conditions of imperfect information. You know what cards you hold in a poker game, and you know how much above the asking price of your home you can afford, but you don’t know your opponent’s hand in a card game or how high the other homebuyer is willing to go.

AND paper co-authored by MIT researchers and presented in April at the International Conference on Learning Representations in Rio De Janeiro, won’t tell you what exactly to do in such situations. But it offers modern insight into so-called imperfect information games, in which two players face each other in a zero-sum competition in which one player’s gain means the other’s loss.

MIT researchers on the project include Sobhan Mohammadpour, a graduate student in the Department of Electrical Engineering and Computer Science (EECS) and the Laboratory for Information and Decision Systems (LIDS); and Gabriele Farina, assistant professor at EECS and principal investigator at LIDS. Additional co-authors include Max Rudolph of the University of Texas at Austin (UT), Nathan Lichtlé of the University of California at Berkeley (UCB), Alexandre Bayen of UCB, J. Zico Kolter of Carnegie Mellon University (CMU), Amy X. Zhang ’11, MNG ’12 of UT; Eugene Vinitsky of Recent York University; and Samuel Sokota of CMU.

The modern work focuses on algorithms that can be used to train neural networks to play games with imperfect information. The assumption, long used in the field, was that algorithms based on game theory principles in this context would clearly outperform a variety of general-purpose algorithms called policy gradient methods, which began to be used in decision-making in the 1990s. The term “policy” in this context generally means strategy, while “gradient” refers to the path leading towards the greatest change – for example, to the top (or bottom) of the hill. Policy gradient methods are used to train neural networks to make decisions that lead – in compact, sequential steps – to a specific goal (such as reaching a peak, metaphorically speaking), with constant adjustments and course corrections made along the way to bring the agent closer to its intended goal.

Although strategy games weren’t on the original agenda when policy gradient methods were developed in the early 1990s, the authors of the modern paper still wondered how this class of algorithms might fare in two-player games. According to Farina, these methods become more intricate to analyze in multi-agent settings. “There is still a direction you can take to improve your situation, but due to the other player’s actions, that direction may constantly change throughout the game. These changes can be rapid.”

“It was almost certain that specialized game theory-based algorithms would be the right approach in this case,” says Sokota. “Our study showed that policy gradient methods may perform better than these specialized algorithms, and that specialized algorithms may not perform as well as people thought, which raises an interesting sociological question: why has this gone unnoticed for so long. Part of the answer is that the field has not done the engineering required to rigorously evaluate the algorithms, so it has been difficult to tell what worked and what didn’t.”

Therefore, the main contribution of this work was to provide an unbiased way to evaluate various algorithms that can teach agents – i.e. neural networks – how to compete in games based on imperfect information. “We’re taking a different approach,” notes Rudolph. “Unlike many papers published in this field, we do not propose a new algorithm that can beat other algorithms. We propose a benchmark that can evaluate these algorithms.”

Simply put, a benchmark consists of software designed to evaluate the performance of algorithms. “We offer a testing ground, a playground where people can use their algorithms, train them for a specific task and see how they perform,” says Farina.

The group calculates a player’s performance based on a concept called exploitability, which measures how well a player performs against a “worst-case opponent,” Sokota explains. “In a game like poker, my opponent wouldn’t know what my hand was, but he would know how I would handle a given hand.” Achieving zero on this scale means perfect play, while a high utilization score means play is far from optimal.

The team’s experiments included five games: two versions of Phantom Tic-Tac-Toe in which players cannot see what their opponent did, two variants of the Hex board game with imperfect information, and another cheating game called Liar’s Dice.

The biggest challenge the researchers faced was adapting the usability measure to work in games of this size, which can span up to 30 billion states. “State” in this case is not only all possible positions on the board, but also includes the entire history of the game, including every step and mistake along the way.

“It’s like looking into a dark room filled with things you can’t see,” says Mohammadpour. “Somehow you have to find out where these objects are and how exactly they got there.” Mohammadpour adds that previous researchers have typically used game exploitation opportunities that are 100,000 times smaller than those analyzed in their study.

In experiments conducted on these five games, neural networks trained with policy gradient algorithms achieved better (lower) utilization scores compared to networks trained with game theory-based algorithms. In the next round’s head-to-head competition, the policy gradient-trained networks again outperformed their game-theoretic-trained opponents. “We are reassured by these results,” says Rudolph, “because they give us greater confidence in our benchmarking approach.”

The team has made their benchmarking software free and convenient to employ. “You don’t need a supercomputer,” says Mohammadpour. “You can run it on a regular laptop. Just add one line of code to a widely used collection of benchmarking software called OpenSpiel.”

Although their experiments involved fairly undiscovered games, Farina would like to place this work in a broader context. “Remember that the term ‘game’ really refers to any strategic interaction between multiple agents,” he says. “The conclusions we draw from this research are by no means limited to recreational games.”

Vinitsky agrees. “Hidden information is a very important property of the world,” he says. “This applies to a range of things – including military operations, trade scenarios and negotiations – and all of this takes place under conditions of hidden information. The idea that we can improve these games suggests that we can also perform better under other conditions.”

Ian Gemp, a computer scientist and game theory expert at Google DeepMind who was not involved in this study, finds these results encouraging. “This work is a compelling reminder,” he says, “that modernizing classic tools [like policy gradient methods] remains a highly productive path to solving complex strategic problems.”

Latest Posts

More News