Wednesday, December 25, 2024

MIT researchers use large language models to flag problems in complex systems


Identifying a single faulty turbine in a wind farm, which can involve analyzing hundreds of signals and millions of data points, is like looking for a needle in a haystack.

While LLMs couldn’t beat state-of-the-art deep-learning models at detecting anomalies, they performed as well as some other AI approaches. If researchers can improve LLMs’ performance, the framework could help technicians flag potential problems in equipment such as heavy machinery or satellites before they occur, without having to train an expensive deep-learning model.

“Because this is only the first iteration, we didn’t expect to achieve this goal right away, but these results show that it is possible to use LLMs for complex anomaly detection tasks,” says Sarah Alnegheimish, a graduate student in electrical engineering and computer science (EECS) and lead author of a paper on SigLLM, the framework the team developed.

Co-authors include Linh Nguyen, an EECS graduate student; Laure Berti-Equille, a research director at the French National Research Institute for Sustainable Development; and senior author Kalyan Veeramachaneni, a principal research scientist at the Laboratory for Information and Decision Systems. The research will be presented at the IEEE Conference on Data Science and Advanced Analytics.

An off-the-shelf solution

Large language models are autoregressive, meaning they can understand that the newest values in sequential data depend on previous values. For example, models like GPT-4 can predict the next word in a sentence using the words that precede it.
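To make the idea concrete, the sketch below shows the autoregressive loop in miniature. A simple moving-average predictor stands in for the LLM (an illustrative assumption, not the researchers' method); the point is that each predicted value is appended to the sequence and becomes context for the next prediction.

```python
def predict_next(history, window=3):
    """Stand-in forecaster: average of the last `window` values."""
    return sum(history[-window:]) / window

signal = [10.0, 10.2, 10.1, 10.4, 10.3]  # observed sensor readings

# Extend the sequence autoregressively: append each prediction so it
# becomes part of the input for the following step.
for _ in range(3):
    signal.append(predict_next(signal))

print(signal)  # original readings followed by three forecasted values
```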

But the researchers wanted to develop a technique that avoids fine-tuning, the process by which engineers retrain a general-purpose LLM on a small amount of task-specific data to make it an expert at one task. Instead, they deploy the LLM out of the box, without additional training steps. Their framework, SigLLM, first converts time-series data into text-based inputs the model can process.

“If you don’t follow these steps very carefully, you may end up losing some data that is important,” Alnegheimish says.
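As an illustration of the kind of conversion involved, here is a minimal sketch that turns a numeric signal into LLM-readable text. The shift-and-round scheme is an assumption for illustration rather than SigLLM's exact recipe, and it shows how an overly coarse rounding step can erase the detail Alnegheimish warns about.

```python
import numpy as np

def series_to_text(values, decimals=2):
    """Convert a numeric time series into a text string an LLM can read.

    Values are shifted to be nonnegative, scaled, and rounded to integers
    so the model sees short digit tokens. Rounding too aggressively is
    exactly where important information can be lost.
    """
    arr = np.asarray(values, dtype=float)
    arr = arr - arr.min()                              # shift so values are >= 0
    scaled = np.round(arr * 10**decimals).astype(int)  # drop the decimal point
    return ",".join(str(v) for v in scaled)

readings = [0.513, 0.520, 0.498, 1.907, 0.505]  # 1.907 is a spike
print(series_to_text(readings, decimals=2))  # -> 2,2,0,141,1
print(series_to_text(readings, decimals=1))  # -> 0,0,0,14,0  (fine detail erased)
```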

Approaches to anomaly detection

Using SigLLM, the researchers developed two anomaly detection approaches. In the first, called Prompter, they feed the prepared data into the model and prompt it to locate anomalous values.
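A hypothetical sketch of a Prompter-style query appears below. The prompt wording and the `query_llm` helper are placeholders for illustration, not the prompts used in the paper.

```python
def build_prompter_prompt(series_text):
    """Assemble an illustrative Prompter-style instruction for the LLM."""
    return (
        "Below is a sequence of sensor readings, one value per position:\n"
        f"{series_text}\n"
        "List the 0-based positions of any anomalous values, positions only."
    )

def query_llm(prompt):
    """Placeholder: send the prompt to whichever LLM client is available."""
    raise NotImplementedError("wire up an LLM client here")

prompt = build_prompter_prompt("2,2,0,141,1")
# reply = query_llm(prompt)  # e.g. "3" if the model flags the spike
print(prompt)
```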

In the second approach, called Detector, the LLM acts as a forecaster, predicting the next value in a time series; a large gap between the predicted and actual values suggests the real value is likely an anomaly. Detector would be part of a larger anomaly detection pipeline, while Prompter would do the job on its own. In practice, Detector performed better than Prompter, which produced many false positives.
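The sketch below illustrates the Detector idea under simple assumptions: given forecasts, points whose forecast error stands far outside the typical error are flagged. The thresholding rule is illustrative, not the paper's exact scoring method.

```python
import numpy as np

def detect_anomalies(actual, forecast, k=2.0):
    """Flag indices where forecast error is more than `k` standard
    deviations above the mean error (an illustrative threshold rule)."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    errors = np.abs(actual - forecast)
    threshold = errors.mean() + k * errors.std()
    return np.where(errors > threshold)[0]

# The forecasts here are given directly; in SigLLM they would come from
# autoregressive LLM predictions over the serialized signal.
actual   = [10.0, 10.2, 10.1, 14.8, 10.3, 10.2]
forecast = [10.1, 10.1, 10.2, 10.2, 10.2, 10.3]
print(detect_anomalies(actual, forecast))  # -> [3], the spike at index 3
```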

“I think with the Prompter approach, we were asking the LLM to jump through too many hoops. We were giving it a harder problem to solve,” Veeramachaneni says.

When they compared both approaches with current techniques, Detector outperformed transformer-based AI models on seven of the 11 datasets evaluated, even though the LLM required no training or fine-tuning.

In the future, an LLM may also be able to provide plain-language explanations of its predictions, allowing an operator to better understand why the LLM identified a particular data point as an anomaly.

However, state-of-the-art deep-learning models performed significantly better than LLMs, which shows that much work remains before LLMs can be used for anomaly detection.

“What will it take to get to the point where it performs as well as these state-of-the-art models? That’s the million-dollar question that we’re facing right now. An LLM-based anomaly detector needs to be a breakthrough for us to justify this kind of effort,” Veeramachaneni says.

The researchers want to see if fine-tuning can improve performance, although this would require additional time, money and expertise for training.

Their LLM approaches also take between 30 minutes and two hours to produce results, so increasing speed is a key area for future work. The researchers also want to study LLMs to understand how they perform anomaly detection, in hopes of finding a way to improve their efficiency.

This research was supported by SES S.A., Iberdrola and ScottishPower Renewables, and Hyundai Motor Company.
