Improving the speed and energy efficiency of AI agents

Agentic workflows are artificial intelligence-based software systems that combine multiple models and external tools to solve complex tasks, such as analyzing a video and answering questions about it.

However, the way these highly fragmented systems are designed and implemented often creates inefficiencies that can lead to wasted processing power, energy and costs.

To improve efficiency, researchers from MIT and Microsoft have developed a system that streamlines the process of designing agentic workflows and automatically optimizes how they are implemented.

With this new method, developers can describe what they expect from an agentic workflow in plain language, without having to specify all the details of the application in advance.

The system automatically determines the best models and tools to use, as well as the ideal hardware configuration and computational resource allocation when the workflow is executed by the cloud provider.

It adjusts these configurations on the fly based on each user’s priorities, such as minimizing costs or maximizing speed.

When tested on several agentic workloads, the new system reduced the number of compute units needed for deployment, significantly reducing power requirements and costs compared to traditional approaches, without compromising performance.

“Agent workflows are becoming very complex and are quickly becoming the backbone of cloud providers’ businesses. Energy consumption is a huge issue, so we have to be very careful about the efficiency of these workflows. It’s very easy to overspend on resources, wasting energy and money. Enabling the cloud provider to intelligently make these workflows more resource-optimal is a win-win for all parties involved,” says Gohar Chaudhry, an electrical engineering and computer science (EECS) graduate student and lead author of the paper on this system.

He is joined on the paper by Adam Belay, EECS associate professor and member of MIT’s Computer Science and Artificial Intelligence Laboratory; senior author Ricardo Bianchini, technical fellow and corporate vice president at Microsoft Azure; and others at Microsoft Azure. The paper will be presented at the USENIX Symposium on Operating Systems Design and Implementation.

Configuration puzzle

An agentic workflow is a system composed of several autonomous AI agents that collectively use different models and tools, such as databases or Python programs, to dynamically perform a multi-step task, such as data processing or code generation.

These workflows can serve as behind-the-scenes processes that power user-facing applications.

Typically, developers must code all technical choices in advance. They must determine which AI agents, models and tools to use and in what order to use them. They also need to determine the hardware that supports the workflow and how to balance tradeoffs such as speed and cost.

This is particularly challenging because agentic workflows combine multiple black-box models and a variety of tools, each with its own configuration options, and these components may be offered by different companies.

If a new AI model is released that improves the accuracy or performance of an application, the developer will have to start from scratch to implement it.

“Even if you want to do it all manually, it’s unlikely you’ll be able to optimally configure the workflow because the space of possible configurations is so large,” says Chaudhry.

Additionally, the cloud data center that deploys the application to customers lacks the visibility into the workflow that it would need to allocate hardware resources efficiently as user requests arrive.

With the new system, called Murakkab (an Urdu word meaning “putting things together”), the researchers sought to optimize the entire agentic workflow process.

Dynamic decision making

First, Murakkab enables developers to create an agentic workflow by describing their intentions for the application in high-level terms, rather than detailing how the multiple components of that workflow should be connected.

For example, a developer might describe a video Q&A application that extracts key frames, generates a transcript, and then answers users’ questions about the video.

“There are many ways to do this, and all these different models and tools influence how quickly the application can complete the task,” he says.

Murakkab takes the developer’s high-level specification and automatically identifies the best existing models and tools that can be applied to the workflow.

It also determines which components must run sequentially and which can be run in parallel to improve performance.
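As a rough illustration of that sequencing decision (a sketch only, not Murakkab's actual implementation), the video Q&A example can be written as a dependency graph and grouped into stages whose steps could run side by side. The step names and the `parallel_stages` helper are hypothetical; the grouping relies on Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical step names for the video Q&A example.
# Each step maps to the set of steps that must finish before it can start.
WORKFLOW = {
    "extract_key_frames": set(),
    "generate_transcript": set(),
    "answer_question": {"extract_key_frames", "generate_transcript"},
}


def parallel_stages(dependencies):
    """Group steps into ordered stages; steps within a stage are independent."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        # Every step whose prerequisites are all done is ready right now.
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = parallel_stages(WORKFLOW)
for number, steps in enumerate(stages, start=1):
    print(f"stage {number}: {', '.join(steps)}")
```

Here frame extraction and transcription land in the same stage because neither depends on the other, while the answering step waits for both.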

“The platform makes configuration decisions dynamically over time, so if a new model or graphics accelerator comes out tomorrow, the developer doesn’t have to worry about it,” he says.

When a cloud service provider deploys this application to a customer, Murakkab optimizes the workflow by configuring its components to meet user constraints, such as prioritizing accuracy while meeting latency requirements.
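One simple way to picture this kind of constraint-driven choice is to filter candidate configurations by a latency limit and then pick the most accurate one that remains, breaking ties by cost. The sketch below does only that. The configuration names, the numbers and the `pick_config` helper are invented for illustration and are not taken from the paper or from Murakkab itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    accuracy: float   # fraction of questions answered correctly (made up)
    latency_s: float  # seconds per request (made up)
    cost_usd: float   # dollars per request (made up)


# Illustrative candidates only; these are not measured values.
CANDIDATES = [
    Config("small-model-cpu", 0.80, 4.0, 0.01),
    Config("small-model-gpu", 0.80, 1.5, 0.03),
    Config("large-model-gpu", 0.92, 3.0, 0.10),
    Config("large-model-multi-gpu", 0.92, 1.2, 0.25),
]


def pick_config(candidates, max_latency_s):
    """Return the most accurate config within the latency limit.

    Ties on accuracy go to the cheaper config. Returns None when no
    candidate meets the limit.
    """
    feasible = [c for c in candidates if c.latency_s <= max_latency_s]
    if not feasible:
        return None
    return max(feasible, key=lambda c: (c.accuracy, -c.cost_usd))


relaxed = pick_config(CANDIDATES, max_latency_s=3.5)
strict = pick_config(CANDIDATES, max_latency_s=2.0)
print(relaxed.name, strict.name)
```

With a relaxed limit the cheaper single-GPU option is enough; tightening the limit forces the more expensive configuration, which is the kind of trade-off a user's priorities would steer.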

Murakkab adaptively identifies the ideal hardware allocation and deployment schedule to maximize performance in real time, then generates a workflow that is ready for execution by the cloud provider.

“Our system also gives cloud service providers visibility across multiple workloads so they can divide compute resources in the most efficient way while meeting user constraints,” he says.

Tested on various agentic video Q&A and code generation workflows, Murakkab met user requirements while using only about 35 percent of the computations required by other methods. It also used only about 27 percent of the energy, at less than 25 percent of the cost.

The dynamic nature of Murakkab also allows users to balance trade-offs. In one case, the system reduced energy consumption in an agentic workflow by more than an order of magnitude, with only about a 2 percent decrease in accuracy for the client.

The system also identified an unexpected but ideal configuration for the video frame selection model, optimizing video Q&A performance. Chaudhry says this type of optimization would be almost impossible for a programmer to do manually.

Next, the researchers plan to extend their system to more complex workflows and larger computing clusters, while also exploring opportunities to optimize new agentic applications.

“There is a lot of potential to make these workflows more resource-optimal, so they use much less energy, but we need to think about it at the scale of the major cloud platforms,” Chaudhry says.

This research was supported in part by the Semiconductor Research Corporation and the United States Defense Advanced Research Projects Agency.
