MIT researchers develop an efficient way to train more reliable AI agents

Fields ranging from robotics to medicine to political science are trying to train artificial intelligence systems to make meaningful decisions of all kinds. For example, using an AI system to intelligently control traffic in a congested city could help drivers reach their destinations faster, while improving safety and sustainability.

Unfortunately, teaching an AI system to make good decisions is not a straightforward task.

The reinforcement learning models that underpin these AI decision-making systems still often fail when faced with even small variations in the tasks they are trained to perform. In the case of traffic, a model might struggle to control a set of intersections with different speed limits, numbers of lanes, or traffic patterns.

To make reinforcement learning models more reliable for complex tasks with variability, MIT researchers have introduced a more efficient algorithm for training them.

The algorithm strategically selects the best tasks on which to train an AI agent so that it can effectively perform all tasks in a collection of related tasks. In the case of traffic signal control, each task could be one intersection in a task space that includes all intersections in the city.

By focusing on a smaller number of intersections that contribute the most to the algorithm's overall effectiveness, this method maximizes performance while keeping training costs low.

The researchers found that their technique was between five and 50 times more efficient than standard approaches on a range of simulated tasks. This gain in efficiency helps the algorithm learn a better solution faster, which ultimately improves the performance of the AI agent.

“By thinking outside the box, we were able to see incredible performance improvements with a very simple algorithm. An algorithm that is not very complex stands a better chance of being adopted by the community because it is easier to implement and easier for others to understand,” says senior author Cathy Wu, the Thomas D. and Virginia W. Cabot Career Development Associate Professor in Civil and Environmental Engineering (CEE) and the Institute for Data, Systems, and Society (IDSS), and a member of the Laboratory for Information and Decision Systems (LIDS).

She is joined on the paper by lead author Jung-Hoon Cho, a CEE graduate student; Vindula Jayawardana, a graduate student in the Department of Electrical Engineering and Computer Science (EECS); and Sirui Li, an IDSS graduate student. The research will be presented at the Conference on Neural Information Processing Systems.

Finding the golden mean

To train an algorithm to control traffic lights at multiple intersections in a city, an engineer typically chooses between two main approaches. They can train one algorithm independently for each intersection using only that intersection’s data, or train a larger algorithm using data from all intersections and then apply it to each of them.

But each approach has its drawbacks. Training a separate algorithm for each task (e.g. a given intersection) is a time-consuming process that requires an enormous amount of data and computation, while training one algorithm for all tasks often leads to subpar performance.

Wu and her colleagues looked for a middle ground between these two approaches.

In their method, they select a subset of tasks and train one algorithm independently for each task. Importantly, they strategically select individual tasks that are most likely to improve the overall performance of the algorithm across all tasks.

They use a common reinforcement learning technique called zero-shot transfer learning, in which an already trained model is applied to a new task without further training. With transfer learning, the model often performs remarkably well on the new, neighboring task.
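To make zero-shot transfer concrete, here is a toy sketch in Python. It is not the researchers' code: it assumes, purely for illustration, that each intersection is described by a single number (a hypothetical speed limit), that "training" simply produces a policy tuned to that number, and that the reward shrinks linearly as the task drifts away from the one the policy was trained on. The functions `train_policy` and `evaluate` are hypothetical.

```python
# Toy illustration of zero-shot transfer (hypothetical setup, not the MIT implementation).
# A "task" is one context value, e.g. an intersection's speed limit in km/h.
# A "policy" is represented by the context value it was tuned for.

def train_policy(task_context: float) -> float:
    """Hypothetical training step: the resulting policy fits its own task exactly."""
    return task_context

def evaluate(policy: float, task_context: float) -> float:
    """Hypothetical reward in [0, 1]: 1.0 on the training task, falling off
    linearly as the target task's context moves away (0 at a 50 km/h mismatch)."""
    return max(0.0, 1.0 - abs(policy - task_context) / 50.0)

# Train once, on a 50 km/h intersection...
policy = train_policy(50.0)

# ...then apply the same policy, with no further training, to other intersections.
for target in [50.0, 55.0, 80.0]:
    print(f"speed limit {target:.0f} km/h -> reward {evaluate(policy, target):.2f}")
```

In this toy setting the transferred policy does well on the nearby 55 km/h task and noticeably worse on the distant 80 km/h one, which mirrors the intuition that transfer works best between neighboring tasks.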

“We know that ideally we would train on all tasks, but we wondered if we could get away with training on only a subset of those tasks, apply the results to all tasks, and still see an increase in performance,” Wu says.

To determine which tasks they should choose to maximize expected performance, the researchers developed an algorithm called model-based transfer learning (MBTL).

The MBTL algorithm consists of two parts. First, it models how well each algorithm would perform if it were trained independently on a single task. It then models how much each algorithm’s performance would degrade if it were transferred to any other task, a concept known as generalization performance.
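The two parts of the model can be sketched in a few lines. This is an illustrative simplification, not MBTL's actual estimator: it assumes each task is described by one scalar context, that the training performance of each candidate task is already known (the values below are made up), and that the generalization gap grows linearly with the distance between source and target contexts at a hypothetical rate `slope`. The function name `predicted_transfer` is likewise hypothetical.

```python
# Illustrative two-part performance model (assumed form, made-up numbers).
from typing import Dict

def predicted_transfer(
    train_perf: Dict[float, float],  # part 1: estimated performance when trained on each task
    source: float,                   # context of the task the model is trained on
    target: float,                   # context of the task it is applied to zero-shot
    slope: float,                    # hypothetical degradation per unit of context distance
) -> float:
    """Predicted zero-shot performance on `target` of a model trained on `source`."""
    # Part 2: the generalization gap, assumed here to be linear in context distance.
    gap = slope * abs(source - target)
    return train_perf[source] - gap

# Hypothetical task space: three intersections labeled by speed limit (km/h).
train_perf = {30.0: 0.90, 50.0: 0.95, 70.0: 0.85}

# Estimated value of training on the 50 km/h task, seen from every task in the space.
for target in train_perf:
    print(target, round(predicted_transfer(train_perf, 50.0, target, slope=0.01), 3))
```

A linear gap is simply the most basic way to encode "farther tasks transfer worse"; a practical system would estimate both parts from data rather than fixing them by hand.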

Explicitly modeling generalization performance allows MBTL to estimate the value of training on a new task.

MBTL does this sequentially by first selecting the task that leads to the greatest performance gain and then selecting additional tasks that provide the greatest subsequent marginal improvement in overall performance.

Because MBTL focuses only on the most promising tasks, it can dramatically improve the efficiency of the training process.
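The sequential selection can be sketched as a greedy loop. The objective used here (the sum over all tasks of the best predicted performance from any already-selected training task), the function names `total_performance`, `greedy_select`, and `predict`, and the linear-decay predictor are all assumptions for illustration; the sketch is not the published MBTL implementation.

```python
# Greedy sequential task selection: a simplified, hypothetical sketch in the
# spirit of the description above, not the published MBTL code.
from typing import Callable, List, Sequence

Predictor = Callable[[float, float], float]  # (source task, target task) -> predicted performance

def total_performance(selected: Sequence[float], tasks: Sequence[float],
                      predict: Predictor) -> float:
    """Sum over all tasks of the best predicted zero-shot performance
    achievable from any of the selected training tasks."""
    if not selected:
        return 0.0
    return sum(max(predict(s, t) for s in selected) for t in tasks)

def greedy_select(tasks: Sequence[float], budget: int,
                  predict: Predictor) -> List[float]:
    """Choose up to `budget` training tasks one at a time, each time adding the
    candidate with the largest marginal gain in total predicted performance."""
    selected: List[float] = []
    for _ in range(budget):
        candidates = [c for c in tasks if c not in selected]
        if not candidates:
            break
        # The current total is fixed within this step, so maximizing the new
        # total is the same as maximizing the marginal gain.
        best = max(candidates,
                   key=lambda c: total_performance(selected + [c], tasks, predict))
        selected.append(best)
    return selected

def predict(source: float, target: float) -> float:
    """Hypothetical predictor: 1.0 on the trained task, losing 0.01 per unit of
    context distance, never below 0."""
    return max(0.0, 1.0 - 0.01 * abs(source - target))

tasks = [float(x) for x in range(0, 101, 10)]  # 11 hypothetical tasks: contexts 0, 10, ..., 100
print(greedy_select(tasks, budget=2, predict=predict))
```

With these made-up numbers, the middle task (context 50) is picked first because it sits closest, on average, to every other task; each later pick is whichever remaining task adds the most on top of what the earlier picks already cover.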

Reducing training costs

When the researchers tested the technique on simulated tasks, including controlling traffic signals, managing real-time speed advisories, and performing several classic control tasks, it was found to be five to 50 times more efficient than other methods.

This means they could arrive at the same solution by training on far less data. For example, with a 50x efficiency boost, the MBTL algorithm could train on just two tasks and achieve the same performance as a standard method that uses data from 100 tasks.
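Read as a ratio of training tasks, which is how this example frames it, the arithmetic works out as follows. Here N_standard and N_MBTL are our own labels for the number of tasks whose data each approach trains on.

```latex
\text{efficiency gain} = \frac{N_{\text{standard}}}{N_{\text{MBTL}}} = \frac{100}{2} = 50,
\qquad N_{\text{standard}} - N_{\text{MBTL}} = 100 - 2 = 98
```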

“From the perspective of the two main approaches, this means that data from the remaining 98 tasks was not needed, or that training on all 100 tasks misleads the algorithm, resulting in worse performance than ours,” Wu says.

With MBTL, adding even a small amount of additional training time can lead to significantly better results.

In the future, the researchers plan to design MBTL algorithms that can be extended to more complex problems, such as high-dimensional task spaces. They are also interested in applying their approach to real-world problems, especially in next-generation mobility systems.

The research is funded in part by a National Science Foundation CAREER Award, the Kwanjeong Educational Foundation Doctoral Fellowship Program, and an Amazon Robotics Doctoral Fellowship.
