AI agents like OpenClaw have recently surged in popularity precisely because they can take control of your digital life. Whether you need a personalized digest of the morning news, a proxy that can fight your cable company's customer service, or a to-do list auditor to handle some tasks for you and nudge you to finish others, agent assistants are designed to access your digital accounts and do your bidding. This is helpful, but it has also caused a lot of chaos. The bots have mass-deleted email, gone rogue on the tasks they were instructed to perform, and even conducted phishing attacks on their owners.
Watching the pandemonium unfold in recent weeks, longtime engineer and security researcher Niels Provos decided to try something different. Today he is launching a secure, open-source AI assistant called IronCurtain, designed to add a critical layer of control. Instead of interacting directly with systems and user accounts, the agent runs in an isolated virtual machine. And its ability to take any action depends on the policy (you could even call it a constitution) that the owner writes to govern the system. Most importantly, IronCurtain is designed to receive these overarching policies in plain English and then run them through a multi-step process that uses a large language model (LLM) to transform the natural language into an enforceable security policy.
“Services like OpenClaw are getting the most attention right now, but I hope there’s an opportunity to say, ‘Well, we probably don’t want to do it this way,’” Provos says. “Instead, let’s develop something that still has very high usability, but doesn’t go down these completely unexplored, sometimes destructive paths.”
Provos says IronCurtain's ability to take intuitive, straightforward statements and turn them into enforceable, deterministic, predictable red lines is crucial because LLMs are inherently stochastic, or probabilistic. In other words, they don't necessarily generate the same content or provide the same information in response to the same prompt. This creates challenges for AI guardrails, because AI systems may drift over time in a way that changes how controls or constraints are interpreted, which could result in unintended actions.
Provos says IronCurtain’s policy can be as straightforward as: “The agent can read all my emails. Can email my contacts without asking. For others, ask me first. Never permanently delete anything.”
IronCurtain processes these instructions, transforms them into an enforceable policy, and then acts as an intermediary between the assistant agent in the virtual machine and the so-called Model Context Protocol server, which provides the LLM with access to data and other digital services it needs to perform tasks. The ability to restrict an agent in this way adds a crucial element of access control that online platforms such as email providers do not currently offer, because they were not built for a scenario in which a human owner and an AI agent share a single account.
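The gatekeeping step described above can be sketched as a deterministic rule check sitting between the agent and each tool call. This is a hypothetical, simplified illustration in the spirit of what the article describes; the rule names, record fields, and decision values are assumptions, not IronCurtain's actual API:

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative decision outcomes for a policy gate (assumed names).
ALLOW, ASK_OWNER, DENY = "allow", "ask_owner", "deny"

@dataclass
class Rule:
    action: str                       # e.g. "email.send", "email.delete"
    decision: str                     # ALLOW, ASK_OWNER, or DENY
    predicate: Callable[[dict], bool] = lambda request: True

def evaluate(rules, request):
    """Return the decision for a requested tool call; first match wins."""
    for rule in rules:
        if rule.action == request["action"] and rule.predicate(request):
            return rule.decision
    return ASK_OWNER  # anything the policy doesn't cover goes to the owner

# Rules as they might be compiled (by an LLM, per the article) from the
# plain-English policy: "Can email my contacts without asking. For
# others, ask me first. Never permanently delete anything."
CONTACTS = {"alice@example.com"}
rules = [
    Rule("email.delete", DENY),
    Rule("email.send", ALLOW, lambda r: r["to"] in CONTACTS),
    Rule("email.send", ASK_OWNER),
]

print(evaluate(rules, {"action": "email.send", "to": "alice@example.com"}))  # allow
print(evaluate(rules, {"action": "email.delete", "id": "msg-42"}))           # deny
```

Because the compiled rules are ordinary deterministic code rather than a prompt, the same request always yields the same decision, which is the predictability Provos is after.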
Provos notes that IronCurtain is designed to refine and improve each user’s “constitution” over time as the system encounters edge cases and asks for human feedback on how to proceed. The system, which is model agnostic and can be used with any LLM, has also been designed to maintain an audit trail of all policy decisions over time.
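An audit trail like the one described above could, in its most minimal form, be an append-only log of every policy decision. This sketch assumes a JSON-lines file and illustrative record fields; the article does not specify IronCurtain's actual log format:

```python
import json
import time

# Minimal append-only audit log sketch (assumed format, one JSON
# object per line so the history is easy to scan and hard to rewrite
# without leaving gaps).
def log_decision(path, action, decision, detail=""):
    record = {
        "ts": time.time(),       # when the decision was made
        "action": action,        # the tool call the agent requested
        "decision": decision,    # e.g. "allow", "ask_owner", "deny"
        "detail": detail,        # e.g. recipient or matched rule
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

log_decision("audit.jsonl", "email.send", "allow", "to=alice@example.com")
```

A log like this is what would let an owner reconstruct, after the fact, exactly which constitution clause permitted or blocked a given action.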
IronCurtain is a research prototype, not a consumer product, and Provos hopes people will contribute to the project by researching it and helping it evolve. Dino Dai Zovi, a renowned cybersecurity researcher who experimented with early versions of IronCurtain, says the project’s conceptual approach is consistent with his own intuition about how to limit agent-based AI.
