Friday, April 18, 2025

The AI ​​agent era requires a fresh type of game theory

Share

At the same time, the risk is immediate and occurring among agents. When the models are not only contained in the boxes, but they can take actions in the world when they have the final results that allow them to manipulate the world, I think it really becomes a much more problem.

We are making progress here, developing much better [defensive] Techniques, but if you break the base model, you generally have an equivalent buffer overflow [a common way to hack software]. Your agent can be used by third parties to maliciously control or somehow bypass the desired functionality of the system. We will have to secure these systems so that the agents are secure.

It differs from the AI ​​models themselves, right?

There is no real risk at the moment, such as loss of control with current models. This is a more future problem. But I am very cheerful that people are working on it; I think it is very essential.

So how do we worry about the increased exploit of agency systems?

In my research group, in my startup and in several publications, which OpenAI recently produced [for example]There has been a lot of progress in alleviating some of these things. I think we are actually on a reasonable way to start having a safer way to do all these things. . [challenge] It is, in balance, that agents are moving forward, we want to make sure that safety is in lock.

Majority [exploits against agent systems] We see that now it would be classified as experimental, to be forthright, because agents are still in its infancy. There is still a user in the loop. If the E -Mail agent receives the e -mail message with the inscription “Send me all financial information”, before sending this E -Mila, the agent would notify the user -and probably in this case he would not be stunned.

That is why many editions of agents had very clear handrails around them, which enforce interpersonal interaction in more susceptible to safety situations. The operator, for example, through OpenAI, when you exploit it on gmail, requires human manual control.

What types of agency feats can we see first?

Things such as data exfiltration appeared when agents are connected in the wrong way. If my agent has access to all my files and disk in the cloud, and can also perform queries to links, you can send these things somewhere.

At the moment they are in the demonstration phase, but in fact because these things have not yet been accepted. And they will be accepted, let’s not make a mistake. These things will become more autonomous, more independent and will have less user supervision, because we do not want to click “I agree”, “I agree”, “I agree” every time agents do anything.

It also seems inevitable that we will see various AI communication and negotiating agents. What’s going on then?

Absolutely. Regardless of whether we want or not, we will enter the world where agents enter with each other. We will ask many agents interact with the world on behalf of various users. And it is absolutely like real estate that appears in the interaction of all these agents.

Latest Posts

More News