Sunday, April 20, 2025

Researchers propose a better way to report dangerous flaws in artificial intelligence


In late 2023, a team of outside researchers discovered a troubling flaw in GPT-3.5, a widely used artificial intelligence model.

Asked to repeat certain words a thousand times, the model began repeating the word over and over, then abruptly switched to spitting out incoherent text and snippets of personal information drawn from its training data, including parts of names, phone numbers, and email addresses. The team that discovered the problem worked with OpenAI to ensure the flaw was fixed before disclosing it publicly. It is just one of scores of problems found in major AI models in recent years.

In a proposal issued today, more than 30 prominent AI researchers, including some who found the GPT-3.5 flaw, say that many other vulnerabilities affecting popular models are reported in problematic ways. They suggest a new scheme, supported by AI companies, that gives outsiders permission to probe their models and a way to disclose flaws publicly.

“At the moment it is a bit of the Wild West,” says Shayne Longpre, a PhD student at MIT and lead author of the proposal. Longpre says that some so-called jailbreakers share their methods of breaking AI safeguards on the social media platform X, leaving models and users at risk. Other jailbreaks are shared with only one company, even though they might affect many. And some flaws, he says, are kept secret out of fear of being banned or facing prosecution for breaking terms of use. “It is clear that there are chilling effects and uncertainty,” he says.

The security and safety of AI models are hugely important given how widely the technology is now used and how it may seep into countless applications and services. Powerful models need to be stress-tested, or red-teamed, because they can harbor harmful biases, and because certain inputs can cause them to break free of their guardrails and produce unpleasant or dangerous responses. These include encouraging vulnerable users to engage in harmful behavior or helping a bad actor develop cyber, chemical, or biological weapons. Some experts fear that models could assist cybercriminals or terrorists, and might even turn on humans as they advance.

The authors suggest three main measures to improve the third-party disclosure process: adopting standardized AI flaw reports to streamline reporting; having big AI firms provide infrastructure to third-party researchers who disclose flaws; and developing a system that allows flaws to be shared among different providers.

This approach is borrowed from the world of cybersecurity, where there are legal protections and established norms for external researchers who disclose bugs.
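The standardized flaw reports the authors call for amount to a shared data format, so that a single submission can be routed to every affected provider. A minimal sketch of what such a report might contain follows; the field names are illustrative assumptions, not the schema defined in the proposal.

from dataclasses import dataclass, field, asdict
from typing import List
import json

@dataclass
class AIFlawReport:
    """Hypothetical structure for a standardized AI flaw report.

    The field names are illustrative assumptions; the actual proposal
    may define a different schema.
    """
    reporter: str                 # who found the flaw (person or team)
    affected_models: List[str]    # models believed to be affected
    flaw_type: str                # e.g. "training-data leakage", "jailbreak"
    severity: str                 # e.g. "low", "medium", "high"
    reproduction_steps: str       # how to trigger the behavior
    observed_impact: str          # what the flaw exposes or enables
    disclosed_to: List[str] = field(default_factory=list)  # providers notified

# Example report loosely based on the GPT-3.5 incident described above.
report = AIFlawReport(
    reporter="external research team",
    affected_models=["gpt-3.5"],
    flaw_type="training-data leakage",
    severity="high",
    reproduction_steps="Ask the model to repeat a single word many times.",
    observed_impact="Model emits fragments of personal data from its training set.",
    disclosed_to=["OpenAI"],
)

# Serialize to JSON so the same report could be shared with multiple providers.
print(json.dumps(asdict(report), indent=2))

Because the report serializes to a common machine-readable form, the same submission could in principle be forwarded to every provider whose models are affected, which is the kind of cross-vendor sharing the proposal envisions.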

“AI researchers don't always know how to disclose a flaw and can't be certain that their good-faith disclosure won't expose them to legal risk,” says Ilona Cohen, chief legal and policy officer at HackerOne, a company that organizes bug bounties and a coauthor of the report.

Large AI companies currently conduct extensive safety testing on their models before release. Some also contract with outside firms to do further probing. “Are there enough people in those [companies] to address all of the issues with general-purpose AI systems, used by hundreds of millions of people in applications we've never dreamed of?” Longpre asks. Some AI companies have started organizing AI bug bounties. Longpre, however, says that independent researchers risk breaking the terms of use if they take it upon themselves to probe powerful AI models.
