Sunday, December 22, 2024

MIT engineers are on a mission to find failures


From collision avoidance in vehicles to flight planning systems to power grids, many of the services we rely on are managed by computers. As these autonomous systems grow more complex and pervasive, the ways they can fail also multiply.

Now, MIT engineers have developed an approach that can be paired with any autonomous system to quickly identify a range of potential failures in that system before it is deployed in the real world. The approach can also pinpoint the causes of those failures and suggest repairs to avoid them.

The team demonstrated that the approach can root out failures in a variety of simulated autonomous systems, including small and large power grids, an aircraft collision-avoidance system, a team of rescue drones, and a robotic manipulator. In each system, the approach, in the form of an automated sampling algorithm, quickly identified a range of likely failures along with repairs to avoid them.

The new algorithm takes a different tack from other automated searches, which are designed to spot only the most severe failures in a system. Those approaches, the team says, could miss subtler but significant vulnerabilities that the new algorithm can catch.

“In fact, for more complex systems, there can be a whole host of messiness,” says Charles Dawson, a graduate student in MIT’s Department of Aeronautics and Astronautics. “If we want to trust these systems to drive us around, fly our planes, or manage the power grid, it is very important to know their limitations and where they can fail.”

Dawson and Chuchu Fan, assistant professor of aeronautics and astronautics at MIT, presented their work this week at a conference on robot learning.

Adversarial vulnerability

In 2021, Fan and Dawson began thinking about failure-finding in the wake of a major system breakdown in Texas. In February of that year, winter storms rolled through the state, bringing unexpectedly low temperatures that set off failures across the power grid. The crisis left more than 4.5 million homes and businesses without power for days, making it the worst energy crisis in Texas history.

“It was a pretty big failure, and it made me wonder if we could have predicted it earlier,” Dawson says. “Could we use our knowledge of the physics of the power grid to understand where its weak spots might be, and then develop software updates and patches to harden those weak spots before something catastrophic happens?”

Dawson and Fan’s work focuses on robotic systems and ways to make them more resilient to failures in their environment. Motivated in part by the Texas energy crisis, they decided to widen their scope to spot and fix failures in other, more complex, large-scale autonomous systems. To do so, they realized they would need to rethink the conventional approach to finding bugs.

Designers often test the safety of autonomous systems by identifying their most likely and most severe failures. They start with a computer simulation of the system that captures its underlying physics and all the variables that might affect its behavior. They then run the simulation with an algorithm that carries out “adversarial optimization,” an approach that automatically optimizes for the worst-case scenario by making small changes to the system, over and over again, until it homes in on the changes associated with the most severe failures.
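To make the contrast concrete, here is a minimal sketch of what such an adversarial-optimization loop can look like, assuming a hypothetical simulate function that scores the severity of failure under a given disturbance; it illustrates the general technique, not the researchers’ implementation.

```python
# A rough sketch of the adversarial-optimization loop described above.
# `simulate` is a hypothetical stand-in for a physics simulation that
# scores how severe the system's failure is under a disturbance vector.
import numpy as np

def adversarial_search(simulate, x0, step=0.01, iters=1000):
    """Hill-climb toward a single worst-case disturbance."""
    x, worst = x0.copy(), simulate(x0)
    for _ in range(iters):
        candidate = x + step * np.random.randn(*x.shape)  # small change
        severity = simulate(candidate)
        if severity > worst:                # keep only changes that make
            x, worst = candidate, severity  # the failure more severe
    return x, worst                         # one worst case, nothing else
```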

“By condensing all of these changes into the most severe or most likely failure, we lose a lot of the complexity of the behaviors that can be seen,” Dawson notes. “Instead, we wanted to prioritize identifying a variety of failures.”

To this end, the team took a more “sensitive” approach. They developed an algorithm that automatically generates random changes to a system and assesses the system’s sensitivity, or potential for failure, in response to those changes. The more sensitive the system is to a given change, the more likely that change is associated with a possible failure.
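One way such a sensitivity-driven sampler might be sketched is below; this reflects a reading of the description above, with simulate, the disturbance dimension, and the finite-difference sensitivity estimate all assumed for the example rather than drawn from the team’s code.

```python
# A sketch of sensitivity-based sampling: random changes are scored by
# how sharply the simulated outcome responds to a small nudge, and the
# most sensitive ones are kept as a diverse set of likely failures.
import numpy as np

def sensitivity_sampling(simulate, dim, n_samples=5000, eps=1e-3, top_k=20):
    scored = []
    for _ in range(n_samples):
        x = np.random.randn(dim)          # random change to the system
        base = simulate(x)
        nudge = eps * np.random.randn(dim)
        sensitivity = abs(simulate(x + nudge) - base) / eps  # finite difference
        scored.append((sensitivity, x))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in scored[:top_k]]  # a range of candidates, not one
```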

This approach lets the team flush out a wider range of possible failures. Through the same method, the algorithm also lets researchers identify fixes, by working backwards through the chain of changes that led to a particular failure.
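The repair step might then look something like the following sketch, which walks a failing disturbance back down the severity landscape until the failure no longer occurs; again, simulate, the safety threshold, and the gradient estimate are illustrative assumptions, not the team’s actual method.

```python
# Sketch of "working backwards" from a failure to a repair: descend the
# severity landscape from a failing configuration until it is safe again.
import numpy as np

def repair(simulate, x_fail, lr=0.05, iters=500, safe_level=0.0, h=1e-3):
    x = x_fail.copy()
    for _ in range(iters):
        if simulate(x) <= safe_level:      # failure no longer triggered
            return x                       # candidate repair found
        grad = np.zeros_like(x)            # finite-difference gradient of
        for i in range(len(x)):            # severity w.r.t. each change
            d = np.zeros_like(x)
            d[i] = h
            grad[i] = (simulate(x + d) - simulate(x - d)) / (2 * h)
        x -= lr * grad                     # retrace the chain of changes
    return None                            # no repair within the budget
```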

“We know there’s really a duality to the problem,” Fan says. “There are two sides of the coin. If you can predict a failure, you should be able to predict what to do to avoid it. Our method now closes this loop.”

Hidden failures

The team tested the new approach on a variety of simulated autonomous systems, including small and large power grids. In those cases, the researchers paired their algorithm with a simulation of a generalized, regional-scale power grid. They showed that while conventional approaches zeroed in on a single power line as the most failure-prone, the team’s algorithm found that if a second line also failed, a complete blackout could follow.

“Our method uncovers hidden correlations in the system,” says Dawson. “As we get better at exploring the failure space, we can find all kinds of failures, which sometimes include even more severe failures than existing methods can detect.”

The researchers found a similarly diverse range of failures in other simulated autonomous systems, including one for avoiding aircraft collisions and another for coordinating rescue drones. To check whether failures predicted in simulation would hold up in reality, they also demonstrated the approach on a robotic manipulator, a robotic arm designed to push and lift objects.

The team first ran their algorithm on a simulation of a robot instructed to push a bottle out of the way without knocking it over. When they ran the same scenario with a real robot in the lab, it failed in the ways the algorithm had predicted, for example by knocking the bottle over or failing to reach it. When they applied the fix suggested by the algorithm, the robot successfully pushed the bottle away.

“This shows that in fact the system fails when we predict it will, and succeeds when we expect it to,” Dawson says.

In principle, the team’s approach could find and fix failures in any autonomous system, as long as its behavior can be accurately simulated. Dawson envisions that one day the approach could be turned into an app that designers and engineers download and apply to tune and harden their own systems before testing them in the real world.

“I think as we increase our reliance on these automated decision-making systems, the flavor of failure is going to shift,” Dawson says. “Rather than mechanical failures within a system, we will see more failures caused by the interaction of automated decision-making with the physical world. We are trying to account for that shift by identifying different types of failures and addressing them now.”

This research is supported in part by NASA, the National Science Foundation, and the U.S. Air Force Office of Scientific Research.
