Wednesday, March 18, 2026

How game theory can make artificial intelligence more reliable


A much bigger challenge for AI researchers was the game of Diplomacy – a favorite of politicians such as John F. Kennedy and Henry Kissinger. Instead of two opponents, the game features seven players, whose motives can be hard to read. To win, a player must negotiate and strike cooperation agreements that anyone can break at any time. Diplomacy is so sophisticated that the Meta team was pleased when, in 2022, its AI program Cicero achieved “human-level gameplay” over the course of 40 games. While it didn’t defeat the world champion, Cicero performed well enough to place in the top 10 percent of human participants.

During the project, Jacob – a member of the Meta team – was struck by the fact that Cicero relied on a language model to generate dialogue with other players. He sensed untapped potential. The team’s goal, he said, “was to build the best language model we could for playing this game.” But what if they instead focused on designing the best possible game to improve the performance of large language models?

Consensual interactions

In 2023, Jacob began exploring this question at MIT, working with Yikang Shen, Gabriele Farina, and his advisor, Jacob Andreas, on what would become the consensus game. The main idea came from imagining a conversation between two people as a cooperative game in which success comes when the listener understands what the speaker is trying to convey. Specifically, the consensus game aims to bring two systems of the language model into agreement – the generator, which handles generative questions, and the discriminator, which handles discriminative ones.

After several months of stops and starts, the team turned this idea into a full game. First, the generator receives a question. It can come from a human or from an existing list. For example: “Where was Barack Obama born?” The generator then receives several candidate answers – say, Honolulu, Chicago, and Nairobi. Again, these options can come from a human, a list, or a search performed by the language model itself.

But before answering, the generator is also told whether it should answer the question correctly or incorrectly, depending on the result of a fair coin toss.

If the result is heads, the generator attempts to give the correct answer. It sends the original question along with its chosen answer to the discriminator. If the discriminator determines that the generator deliberately sent the correct answer, each receives one point as an incentive.

If the coin lands on tails, the generator sends an answer it believes is wrong. If the discriminator decides it was given a wrong answer on purpose, both again get a point. The idea is to encourage understanding. “It’s like teaching a dog a trick,” Jacob explained. “Give them a treat when they do the right thing.”
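The round described above can be sketched in a few lines of Python. This is a minimal illustration of the scoring rule, not the paper’s implementation: the strategy functions (`pick_correct`, `pick_wrong`, `judge`) are hypothetical placeholders standing in for the generator’s and discriminator’s language-model queries.

```python
import random

def play_round(question, candidates, pick_correct, pick_wrong, judge):
    """One round of the consensus game as described in the text.

    pick_correct / pick_wrong: the generator's choice of an answer it
        believes is right / wrong for the question (placeholders).
    judge(question, answer): the discriminator's verdict, "correct" or
        "incorrect", on the answer it receives.
    Both players earn a point when the discriminator's verdict matches
    the generator's coin-assigned intent.
    """
    intent = "correct" if random.random() < 0.5 else "incorrect"  # fair coin toss
    if intent == "correct":
        answer = pick_correct(question, candidates)
    else:
        answer = pick_wrong(question, candidates)
    verdict = judge(question, answer)
    reward = 1 if verdict == intent else 0  # shared by both players
    return answer, intent, verdict, reward
```

With a discriminator that reliably recognizes the true answer and a generator that plays its assigned role honestly, every round scores a point for both sides, which is exactly the cooperative incentive the game is built around.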

Both the generator and the discriminator start with some initial “beliefs”. These take the form of a probability distribution over the different choices. For example, the generator might believe, based on information gathered from the Internet, that there is an 80 percent chance Obama was born in Honolulu, a 10 percent chance he was born in Chicago, a 5 percent chance for Nairobi, and a 5 percent chance for somewhere else. The discriminator may start with a different distribution. While both “players” are still rewarded for reaching an agreement, they also have points deducted for deviating too far from their original beliefs. This arrangement encourages players to incorporate their knowledge of the world – again, drawn from the Internet – into their answers, which should increase the accuracy of the model. Without something like this, they could agree on a completely wrong answer, like Delhi, and still score points.
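The trade-off in the previous paragraph – reward for agreement, penalty for straying from initial beliefs – can be made concrete. One standard way to formalize "points deducted for deviating too much" is a KL-divergence penalty between a player's current answer distribution and its starting distribution; this is an illustrative assumption, not necessarily the exact term the researchers use.

```python
import math

def regularized_payoff(agreement_reward, policy, initial_beliefs, lam=0.1):
    """Sketch of a belief-regularized payoff (hypothetical formulation).

    agreement_reward: points earned from agreeing with the other player.
    policy: the player's current distribution over candidate answers.
    initial_beliefs: the distribution the player started the game with.
    lam: weight of the deviation penalty.
    The KL divergence is zero when policy == initial_beliefs and grows
    as the player drifts away from what it originally believed.
    """
    kl = sum(p * math.log(p / initial_beliefs[a])
             for a, p in policy.items() if p > 0)
    return agreement_reward - lam * kl
```

Under this payoff, a player that stays at its initial beliefs keeps the full agreement reward, while one that abandons them (say, collapsing all its probability onto an answer it originally considered unlikely) pays a cost – which is what stops both players from colluding on a confidently wrong answer.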
