Emergent Bartering Behavior in Multi-Agent Reinforcement Learning


In our latest paper, we investigate how populations of deep reinforcement learning (deep RL) agents can learn microeconomic behaviors such as the production, consumption, and trade of goods. We find that artificial agents learn to make economically rational decisions about production, consumption, and prices, and to respond appropriately to changes in supply and demand. The population converges to local prices that reflect nearby resource abundances, and some agents learn to transport goods between these areas to “buy low and sell high.” This work advances the broader agenda of multi-agent reinforcement learning research by introducing new social challenges that agents must learn to solve.

While the goal of multi-agent reinforcement learning research is ultimately to produce agents that perform with the full range and complexity of human social intelligence, the set of domains considered to date is woefully incomplete. Key domains where human intelligence excels, and where humans spend significant amounts of time and energy, are still missing. Economics is one such domain. Our goal in this work is to establish environments based on the themes of trading and negotiation for use by researchers in multi-agent reinforcement learning.

Economics uses agent-based models to simulate the behavior of economies. These agent-based models often rely on economic assumptions about how agents should act. In this work, we introduce a simulated multi-agent world in which agents can learn economic behavior from scratch, in a way that is familiar to any student of Microeconomics 101: decisions about production, consumption, and pricing. But our agents must also make other choices that result from a more physically embodied way of thinking. They must navigate the physical environment, find trees to pick fruit from, and partners to trade with. Recent advances in deep RL techniques now make it possible to create agents that can learn these behaviors on their own, without requiring a programmer to encode domain knowledge.

Our environment, called Fruit Market, is a multi-agent environment where agents produce and consume two types of fruit: apples and bananas. Each agent is skilled at producing one type of fruit but prefers the other, so if agents learn to trade and exchange goods, both parties will be better off.
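The gains from trade described above can be illustrated with a toy calculation. This is only a minimal sketch and not the Fruit Market environment itself: the Agent class, the barter function, and the reward values (8 for a preferred fruit, 1 for the other) are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical reward values, chosen only for illustration.
PREFERRED_REWARD = 8.0
OTHER_REWARD = 1.0


@dataclass
class Agent:
    produces: str    # fruit this agent is skilled at producing
    prefers: str     # fruit this agent gets more reward from eating
    inventory: dict  # fruit name -> number of units held

    def eat_all(self) -> float:
        """Consume the whole inventory and return the total reward."""
        total = 0.0
        for fruit, count in self.inventory.items():
            value = PREFERRED_REWARD if fruit == self.prefers else OTHER_REWARD
            total += value * count
        self.inventory = {fruit: 0 for fruit in self.inventory}
        return total


def barter(a: Agent, b: Agent, give_a: str, give_b: str, amount: int) -> None:
    """One-for-one swap: a hands `amount` of give_a to b, b hands give_b to a."""
    if a.inventory[give_a] < amount or b.inventory[give_b] < amount:
        raise ValueError("not enough fruit to complete the swap")
    a.inventory[give_a] -= amount
    b.inventory[give_a] += amount
    b.inventory[give_b] -= amount
    a.inventory[give_b] += amount


def make_pair():
    """An apple producer who prefers bananas, and the mirror-image agent."""
    apple_farmer = Agent("apple", "banana", {"apple": 4, "banana": 0})
    banana_farmer = Agent("banana", "apple", {"apple": 0, "banana": 4})
    return apple_farmer, banana_farmer


def rewards_without_trade():
    a, b = make_pair()
    return a.eat_all(), b.eat_all()


def rewards_with_trade(amount: int = 4):
    a, b = make_pair()
    barter(a, b, "apple", "banana", amount)
    return a.eat_all(), b.eat_all()


if __name__ == "__main__":
    print("no trade:  ", rewards_without_trade())
    print("with trade:", rewards_with_trade())
```

Under these assumed numbers, each agent earns more by swapping its harvest than by eating what it grows. In the environment described here, agents have to discover this from experience rather than being told.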

In our experiments, we show that current deep RL agents can learn to trade, and that their behavior in response to changes in supply and demand is consistent with the predictions of microeconomic theory. We then extend this work to present scenarios that would be very challenging to solve using analytical models, but that are straightforward for our deep RL agents. For example, in environments where each type of fruit grows in a different area, we observe the emergence of different price regions associated with local fruit abundance, as well as the subsequent learning of arbitrage behavior by some agents who become specialized in transporting fruit between these regions.
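The arbitrage behavior mentioned above can be sketched with a small worked example. The pricing rule below (the apple price in bananas equals the local ratio of bananas to apples) is an assumption made purely for illustration, as are the function names and stock levels. The sketch also ignores travel costs and the effect of the trade on local prices.

```python
def local_price(apples: float, bananas: float) -> float:
    """Bananas paid per apple in one region.

    Assumed rule for illustration only: apples are cheap where they are
    abundant relative to bananas, and expensive where they are scarce.
    """
    return bananas / apples


def arbitrage_round_trip(start_bananas: float, cheap_region, dear_region) -> float:
    """Buy apples where they are cheap, carry them, sell where they are dear.

    Each region is an (apples, bananas) tuple of local stock.
    Returns the number of bananas held after the round trip.
    """
    buy_price = local_price(*cheap_region)
    sell_price = local_price(*dear_region)
    apples_bought = start_bananas / buy_price
    return apples_bought * sell_price


if __name__ == "__main__":
    apple_orchard = (40.0, 10.0)  # many apples, few bananas
    banana_grove = (10.0, 40.0)   # few apples, many bananas
    print(local_price(*apple_orchard))
    print(local_price(*banana_grove))
    print(arbitrage_round_trip(2.0, apple_orchard, banana_grove))
```

The price gap between the two regions is what makes carrying fruit worthwhile. In a fuller model, repeated arbitrage would push the two local prices toward each other.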

The field of agent-based computational economics uses similar simulations for economic research. In this work, we also show that state-of-the-art deep RL techniques can flexibly learn to operate in these environments from their own experience, without requiring any built-in economic knowledge. This highlights the recent progress of the reinforcement learning community in multi-agent RL and deep RL, and shows the potential of multi-agent techniques as tools for advancing simulated economic research.

On the path to artificial general intelligence (AGI), multi-agent reinforcement learning research should cover all the critical domains of social intelligence. However, traditional economic phenomena such as trade, negotiation, specialization, consumption, and production have not been included in this research so far. This paper fills this gap and provides a platform for further research. To facilitate future research in this area, the Fruit Market environment will be included in the next release of the Melting Pot suite of environments.
