Friday, May 23, 2025

Learning to predict infrequent types of failure


How did a weather system trigger such a widespread failure? Researchers at MIT have examined this widely reported breakdown, the Southwest Airlines scheduling crisis, as an example of cases in which systems that work smoothly most of the time suddenly break down and cause a domino effect of failures. They have now developed a computational system that combines sparse data on a rare failure event with much more extensive data on normal operations, in order to work backward, pinpoint the root causes of the failure, and, they hope, find ways to adjust the systems to prevent such failures in the future.

The findings were presented at the International Conference on Learning Representations (ICLR), which took place in Singapore on April 24-28, by PhD student Charles Dawson, professor of aeronautics and astronautics Chuchu Fan, and colleagues from Harvard University and the University of Michigan.

“The motivation of this work is that it is really frustrating when we have to interact with these complex systems, in which it is really difficult to understand what is happening behind the scenes that create the problems or failures that we observe,” says Dawson.

The new work builds on previous research from Fan’s laboratory, which looked at hypothetical failure-prediction problems, Fan says, for example in groups of robots cooperating on a task or in complex systems such as the power grid, looking for ways to predict how such systems may fail. “The goal of this project,” says Fan, “was really to turn that into a diagnostic tool that we could use on real-world systems.”

The idea was to provide a way in which someone could “give us data from a time when this real-world system had a problem or failure,” says Dawson, “and we can try to diagnose the root causes and provide a little look behind the veil of this complexity.”

The intent is for the methods they developed “to work in a fairly general class of cyber-physical problems,” he says. These are problems in which “you have an automated decision-making component interacting with the messiness of the real world,” he explains. Tools are available for testing software systems that operate on their own, but the complexity arises when that software has to interact with physical entities going about their activities in a real physical environment, whether it is the scheduling of aircraft, the movement of autonomous vehicles, the interactions of a team of robots, or the control of inputs and outputs on an electric grid. In such systems, it often happens that “the software can make a decision at the beginning, but then there are all these domino, knock-on effects that make everything messier and much more uncertain.”

One key difference, however, is that in systems such as robot teams, unlike airline scheduling, “we have access to a model in the robotics world,” says Fan, who is a principal investigator in MIT’s Laboratory for Information and Decision Systems (LIDS). “We understand the physics of robotics well, and we have ways of creating a model” that represents their actions with reasonable accuracy. But airline scheduling involves processes and systems that are proprietary business information, so the researchers had to find ways to infer what lay behind the decisions, using only the relatively sparse publicly available information, which essentially consisted of just the actual arrival and departure times of each plane.

“We gathered all this flight data, but there is an entire scheduling system behind it, and we don’t know how that system works,” says Fan. And the data on the actual failure covers only a few days, compared with years of data on normal flight operations.

The impact of the weather events in Denver during the week of Southwest’s scheduling crisis showed up clearly in the flight data, in the form of longer-than-normal turnaround times between landing and takeoff at the Denver airport. But the way that impact cascaded through the system was less obvious and required more analysis. The key turned out to involve the concept of reserve aircraft.

Airlines typically keep some planes in reserve at various airports, so that if problems are found with a plane scheduled for a flight, another aircraft can be quickly substituted. Southwest uses only a single type of aircraft, so they are all interchangeable, which makes such substitutions easier. But most airlines operate on a hub-and-spoke system, with a few designated hub airports where most of the reserve aircraft may be kept, whereas Southwest does not use hubs, so its reserve planes are more scattered throughout the network. And the way those planes were deployed turned out to play a major role in the unfolding crisis.

“The challenge is that there is no public data available on where the aircraft are stationed throughout the Southwest network,” says Dawson. “What we are able to do with our method is, by looking at the public data on arrivals, departures, and delays, back out the hidden parameters of those aircraft reserves that explain the observations we were seeing.”

They found that the way the reserves were deployed was a “leading indicator” of the problems that cascaded into a nationwide crisis. Some parts of the network that were directly affected by the weather were able to recover quickly and get back on schedule. “But when we looked at other areas in the network, we saw that these reserves were simply not available, and things just kept getting worse.”

For example, the data showed that Denver’s reserves were rapidly dwindling because of the weather delays, but then “it also allowed us to trace this failure from Denver to Las Vegas,” he says. Although there was no severe weather there, “our method was still showing a steady decline in the number of aircraft that were able to serve flights out of Las Vegas.”

He says that “we found that there were circulations of aircraft within the Southwest network, where a plane might start the day in California, then fly to Denver, and then end the day in Las Vegas.” What happened in the case of this storm was that the cycle was interrupted. As a result, “this one storm in Denver breaks the cycle, and suddenly the reserves in Las Vegas, which is not affected by the weather, start to deteriorate.”
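The mechanism Dawson describes can be illustrated with a toy simulation. This is a minimal sketch, not the researchers’ model: the function name `simulate_reserves` and every number in it (initial reserve, daily arrivals and departures) are hypothetical, chosen only to show how a storm at an upstream airport can drain reserves at a downstream airport that has clear weather.

```python
# Toy illustration only: all names and numbers are hypothetical assumptions,
# not values from the Southwest data or the researchers' model.

def simulate_reserves(days, storm_days, initial_reserve=5,
                      normal_arrivals=10, storm_arrivals=6,
                      scheduled_departures=10):
    """Track end-of-day reserve aircraft at a downstream airport (e.g. Las Vegas).

    Aircraft arrive from an upstream airport (e.g. Denver). On storm days,
    fewer aircraft complete the rotation, so fewer arrive downstream.
    """
    reserve = initial_reserve
    history = []
    for day in range(days):
        arrivals = storm_arrivals if day in storm_days else normal_arrivals
        available = reserve + arrivals
        # Fly as many scheduled departures as there are aircraft on hand.
        flown = min(scheduled_departures, available)
        reserve = available - flown
        history.append(reserve)
    return history


print(simulate_reserves(4, storm_days=set()))   # steady reserves
print(simulate_reserves(4, storm_days={1, 2}))  # reserves drain and stay low
```

In this sketch the reserve does not rebuild once the storm has passed, because normal arrivals only just cover scheduled departures. That is one simple way to see why restoring balance may require repositioning aircraft rather than waiting.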

Ultimately, Southwest was forced to take a drastic measure to resolve the problem: it had to do a “hard reset” of its entire system, canceling all flights and flying empty planes around the country to rebalance its reserves.

Working with experts in air transportation systems, the researchers developed a model of how the scheduling system operates. Then, “what our method does is, we are essentially trying to run the model backward.” Looking at the observed outcomes, the model allows them to work back and see what kinds of initial conditions could have produced those outcomes.

While the data on actual failures were sparse, the extensive data on typical operations helped teach the computational model “what is possible, what is the realm of physical possibility here,” says Dawson. “This gives us the domain knowledge to then say, in this extreme event, given the space of what is possible, what is the most likely explanation.”
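The idea of running a model backward, with plentiful normal-operations data constraining what is plausible, can be sketched as a small inverse problem. The example below is a deliberately simplified stand-in and not the researchers’ method: it uses a grid search for the most probable hidden reserve count, and the forward model, the Gaussian prior, the noise level, and all function names are assumptions made for illustration.

```python
# Simplified stand-in for inverse inference. The forward model, prior, and
# parameters are hypothetical; this is not the researchers' algorithm.

def forward_delay(reserve, disrupted_flights, base_delay=10.0, penalty=15.0):
    """Predicted average delay (minutes) given a hidden reserve count."""
    shortfall = max(0, disrupted_flights - reserve)
    return base_delay + penalty * shortfall


def log_posterior(reserve, observations, prior_mean, prior_std, noise_std=5.0):
    """Unnormalized log posterior: Gaussian prior plus Gaussian likelihood."""
    # Prior: what normal operations suggest the reserve level usually is.
    score = -0.5 * ((reserve - prior_mean) / prior_std) ** 2
    # Likelihood: how well this reserve level explains the observed delays.
    for disrupted_flights, observed_delay in observations:
        residual = observed_delay - forward_delay(reserve, disrupted_flights)
        score += -0.5 * (residual / noise_std) ** 2
    return score


def infer_reserve(observations, prior_mean, prior_std, max_reserve=20):
    """Return the reserve count with the highest posterior score."""
    return max(range(max_reserve + 1),
               key=lambda r: log_posterior(r, observations, prior_mean, prior_std))


# Three (disrupted flights, observed delay) pairs from a hypothetical failure.
failure_observations = [(4, 40.0), (6, 70.0), (1, 10.0)]
print(infer_reserve(failure_observations, prior_mean=6, prior_std=3))
```

With no failure observations at all, the estimate simply falls back to the prior learned from normal operations; a handful of failure observations is enough to pull it toward the value that explains the delays.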

He says that this could lead to a real-time monitoring system, in which data on normal operations are constantly compared with current data to determine what the trend looks like. “Are we trending toward normal, or are we trending toward extreme events?” Seeing signs of impending problems could allow preemptive measures, such as redeploying reserve aircraft in advance to areas of anticipated problems.
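A very rough sketch of what such monitoring could look like is given below, assuming the monitored quantity is turnaround time in minutes. The rolling z-score approach, the threshold, and the function names are illustrative assumptions, not a description of any system the researchers have built.

```python
from statistics import mean, stdev

# Illustrative assumption: compare a rolling average of current turnaround
# times against a baseline from normal operations, using a z-score.

def drift_scores(baseline, current, window=3):
    """Z-score of each rolling-window mean of `current` against `baseline`."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    scores = []
    for end in range(window, len(current) + 1):
        window_mean = mean(current[end - window:end])
        scores.append((window_mean - mu) / sigma)
    return scores


def trending_toward_extreme(scores, threshold=2.0):
    """True if the latest score is above the threshold and still rising."""
    return len(scores) >= 2 and scores[-1] > threshold and scores[-1] > scores[-2]


# Hypothetical turnaround times in minutes.
normal_turnarounds = [30, 32, 28, 31, 29, 30, 33, 27]
todays_turnarounds = [30, 31, 35, 40, 46]

scores = drift_scores(normal_turnarounds, todays_turnarounds)
print(scores, trending_toward_extreme(scores))
```

A real system would need to account for seasonality, airport-specific baselines, and the hidden reserve levels discussed above; the sketch only shows the basic comparison of current data against normal operations.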

Fan says that work on developing such systems is ongoing in her laboratory. In the meantime, the team has produced an open-source tool for analyzing failure systems, called CalNF, which is available for anyone to use. Meanwhile, Dawson, who earned his doctorate last year, is working as a postdoc to apply the methods developed in this work to understanding failures in power networks.

The research team also included Max Li from the University of Michigan and Van Tran from Harvard University. The work was supported by NASA, the Air Force Office of Scientific Research, and the MIT-DSTA program.
