"Jailbreaks persist simply because eliminating them entirely is nearly impossible, just like buffer overflow vulnerabilities in software (which have existed for over 40 years) or SQL injection flaws in web applications (which have plagued security teams for more than two decades)," Alex Polyakov, CEO of the security firm Adversa AI, told WIRED in an email.
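For readers unfamiliar with the comparison Polyakov is drawing, here is a minimal sketch of the SQL injection class he references; the table, the query, and the attacker string are hypothetical and chosen only to show why the bug keeps recurring:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0)")

user_input = "alice' OR '1'='1"  # attacker-controlled string

# Vulnerable: user input is concatenated straight into the query,
# so the quote characters rewrite the query's logic.
rows = conn.execute(
    "SELECT * FROM users WHERE name = '" + user_input + "'"
).fetchall()
print(rows)  # returns every row, not just alice's

# Safe: a parameterized query treats the input as data, never as SQL.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # returns nothing; no user is literally named that
```

The flaw persists for the same reason jailbreaks do: the fix is simple in any one place, but the pattern can be reintroduced anywhere input meets an interpreter.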
Cisco's Sampath argues that as companies use more types of AI in their applications, the risks are amplified. "It starts to become a big deal when you start putting these models into important, complex systems, and those jailbreaks suddenly result in downstream things that increase liability, increase business risk, increase all kinds of issues for enterprises," Sampath says.
The Cisco researchers drew 50 randomly selected prompts to test DeepSeek's R1 from a well-known library of standardized evaluation prompts known as HarmBench. They tested prompts from six HarmBench categories, including general harm, cybercrime, misinformation, and illegal activities. They probed the model running locally on their own machines rather than through DeepSeek's website or app, which send data to China.
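To make the methodology concrete, the following is a minimal sketch of what such an evaluation loop can look like. The endpoint URL, model name, and refusal heuristic are illustrative assumptions, not Cisco's actual harness, and the prompts themselves are deliberately elided:

```python
import requests

# Hypothetical local, OpenAI-compatible endpoint serving the model
# (e.g., via a self-hosted inference server); URL and model name
# are assumptions for illustration.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "deepseek-r1"

# Stand-in entries; a real harness would load prompts from HarmBench.
prompts = [
    {"category": "cybercrime", "prompt": "..."},
    {"category": "misinformation", "prompt": "..."},
]

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't")  # crude heuristic

def model_refused(reply: str) -> bool:
    """Count the prompt as blocked if the reply opens with a refusal."""
    return reply.lower().startswith(REFUSAL_MARKERS)

blocked = 0
for item in prompts:
    resp = requests.post(ENDPOINT, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": item["prompt"]}],
    }, timeout=120)
    reply = resp.json()["choices"][0]["message"]["content"]
    if model_refused(reply):
        blocked += 1

print(f"blocked {blocked}/{len(prompts)} harmful prompts")
```

In practice, benchmarks like HarmBench score success with a trained classifier rather than keyword matching, since a model can refuse, or comply, in many phrasings.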
Beyond this, the researchers say they have also seen potentially concerning results from testing R1 with more involved, non-linguistic attacks using things like Cyrillic characters and tailored scripts to attempt code execution. But for its initial tests, Sampath says, his team wanted to focus on findings that stemmed from a generally recognized benchmark.
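As an illustration of the character-level obfuscation being described, here is a minimal, hypothetical sketch of swapping Latin letters for visually identical Cyrillic ones. It shows only the obfuscation mechanism, applied to a harmless phrase, not any particular attack:

```python
# Latin -> Cyrillic homoglyphs: the right-hand characters look the same
# on screen but have different Unicode code points, which can slip past
# naive keyword-based safety filters.
HOMOGLYPHS = str.maketrans({
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
})

text = "open the cage"
obfuscated = text.translate(HOMOGLYPHS)

print(obfuscated)          # renders identically to a human reader
print(text == obfuscated)  # False: byte-for-byte the strings differ
```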
Cisco also included comparisons of R1's performance on the HarmBench prompts with that of other models. Some, like Meta's Llama 3.1, faltered almost as severely as DeepSeek's R1. But Sampath emphasizes that DeepSeek's R1 is specifically a reasoning model, which takes longer to generate answers but draws on more elaborate processes to try to produce better results. The best comparison, Sampath argues, is therefore with OpenAI's o1 reasoning model, which fared the best of all the models tested. (Meta did not immediately respond to a request for comment.)
Polyakov, of Adversa AI, explains that DeepSeek appears to detect and reject some well-known jailbreak attacks, saying, "It seems that these responses are often just copied from OpenAI's dataset." However, Polyakov says that in his company's tests of four different types of jailbreaks, from linguistic tricks to code-based ones, DeepSeek's restrictions could easily be bypassed.
"Every single method worked flawlessly," Polyakov says. "What's even more alarming is that these aren't novel 'zero-day' jailbreaks; many have been publicly known for years," he says, claiming he saw the model go into greater depth with some instructions around psychedelics than he had seen any other model provide.
"DeepSeek is just another example of how every model can be broken; it's just a matter of how much effort you put in. Some attacks might get patched, but the attack surface is infinite," Polyakov adds. "If you're not continuously red-teaming your AI, you're already compromised."