Sunday, March 15, 2026

AI agents are getting better at writing code – and at hacking it too


The latest artificial intelligence models are not only remarkably good at writing software – new research shows they are getting better and better at finding software bugs, too.

AI researchers at UC Berkeley tested how well the latest AI models and agents could find vulnerabilities in 188 large open-source codebases. Using a new benchmark called CyberGym, the models identified 17 new bugs, including 15 previously unknown, or “zero-day,” vulnerabilities. “Many of these vulnerabilities are critical,” says Dawn Song, a professor at UC Berkeley who led the work.

Many experts expect AI models to become a powerful cybersecurity weapon. An AI tool from the startup XBOW has climbed the ranks of HackerOne’s bug-hunting leaderboard and currently sits in first place. The company recently announced $75 million in new funding.

Song says the coding skills of the latest AI models, combined with their improving ability to reason, are starting to change the cybersecurity landscape. “This is a pivotal moment,” she says. “It actually exceeded our general expectations.”

As the models continue to improve, they will automate the process of both discovering and exploiting security flaws. This could help companies keep their software safe, but it could also help hackers break into systems. “We didn’t even try that hard,” Song says. “If we ramped up the budget and allowed the agents to run longer, they could do even better.”

The UC Berkeley team tested conventional frontier AI models from OpenAI, Google, and Anthropic, as well as open-source offerings from Meta, DeepSeek, and Alibaba, combined with several bug-finding agents, including OpenHands, Cybench, and EnIGMA.

The researchers took descriptions of known software vulnerabilities from the 188 projects. They then gave those descriptions to cybersecurity agents powered by frontier AI models to see whether the agents could identify the same flaws on their own by analyzing new codebases, running tests, and crafting proof-of-concept exploits. The team also asked the agents to hunt for new vulnerabilities in the codebases themselves.

Through this process, the AI tools generated hundreds of proof-of-concept exploits, and from these the researchers identified 15 previously unseen vulnerabilities and two vulnerabilities that had previously been disclosed and patched. The work adds to growing evidence that AI can automate the discovery of zero-day vulnerabilities, which are potentially dangerous (and valuable) because they can provide a way to break into live systems.

AI is, in fact, already becoming a vital part of the cybersecurity industry. Security expert Sean Heelan recently discovered a zero-day flaw in the widely used Linux kernel with help from OpenAI’s o3 reasoning model. Last November, Google announced that it had discovered a previously unknown software vulnerability using AI, through a program called Project Zero.

Like other parts of the software industry, many cybersecurity companies are enamored with the potential of AI. The new work indeed shows that AI can routinely find new flaws, but it also highlights the technology’s limitations: the AI systems were unable to find most of the flaws and were stumped by especially complicated ones.
