Close Menu
Best in TechnologyBest in Technology
  • News
  • Phones
  • Laptops
  • Gadgets
  • Gaming
  • AI
  • Tips
  • More
    • Web Stories
    • Global
    • Press Release

Subscribe to Updates

Get the latest tech news and updates directly to your inbox.

What's On

What You Need to Know About VPNs and Age-Verification Laws

12 September 2025

A Vigil for Charlie Kirk

12 September 2025

Naughty Dog Debated Going Straight Into The Last Of Us Part III After Part II

12 September 2025
Facebook X (Twitter) Instagram
Just In
  • What You Need to Know About VPNs and Age-Verification Laws
  • A Vigil for Charlie Kirk
  • Naughty Dog Debated Going Straight Into The Last Of Us Part III After Part II
  • Review: Nvidia GeForce Now RTX 5080 (Blackwell)
  • Which iPhone 17 Model Should You Buy?
  • Save Big on Our Favorite Outdoor Security Cam
  • How China’s Propaganda and Surveillance Systems Really Operate
  • Naughty Dog’s Debated Going Straight Into The Last Of Us Part III After Part II
Facebook X (Twitter) Instagram Pinterest Vimeo
Best in TechnologyBest in Technology
  • News
  • Phones
  • Laptops
  • Gadgets
  • Gaming
  • AI
  • Tips
  • More
    • Web Stories
    • Global
    • Press Release
Subscribe
Best in TechnologyBest in Technology
Home » DeepSeek’s Safety Guardrails Failed Every Test Researchers Threw at Its AI Chatbot
News

DeepSeek’s Safety Guardrails Failed Every Test Researchers Threw at Its AI Chatbot

News RoomBy News Room31 January 20253 Mins Read
Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
Share
Facebook Twitter LinkedIn Pinterest Email

“Jailbreaks persist simply because eliminating them entirely is nearly impossible—just like buffer overflow vulnerabilities in software (which have existed for over 40 years) or SQL injection flaws in web applications (which have plagued security teams for more than two decades),” Alex Polyakov, the CEO of security firm Adversa AI, told WIRED in an email.

Cisco’s Sampath argues that as companies use more types of AI in their applications, the risks are amplified. “It starts to become a big deal when you start putting these models into important complex systems and those jailbreaks suddenly result in downstream things that increases liability, increases business risk, increases all kinds of issues for enterprises,” Sampath says.

The Cisco researchers drew their 50 randomly selected prompts to test DeepSeek’s R1 from a well-known library of standardized evaluation prompts known as HarmBench. They tested prompts from six HarmBench categories, including general harm, cybercrime, misinformation, and illegal activities. They probed the model running locally on machines rather than through DeepSeek’s website or app, which send data to China.

Beyond this, the researchers say they have also seen some potentially concerning results from testing R1 with more involved, non-linguistic attacks using things like Cyrillic characters and tailored scripts to attempt to achieve code execution. But for their initial tests, Sampath says, his team wanted to focus on findings that stemmed from a generally recognized benchmark.

Cisco also included comparisons of R1’s performance against HarmBench prompts with the performance of other models. And some, like Meta’s Llama 3.1, faltered almost as severely as DeepSeek’s R1. But Sampath emphasizes that DeepSeek’s R1 is a specific reasoning model, which takes longer to generate answers but pulls upon more complex processes to try to produce better results. Therefore, Sampath argues, the best comparison is with OpenAI’s o1 reasoning model, which fared the best of all models tested. (Meta did not immediately respond to a request for comment).

Polyakov, from Adversa AI, explains that DeepSeek appears to detect and reject some well-known jailbreak attacks, saying that “it seems that these responses are often just copied from OpenAI’s dataset.” However, Polyakov says that in his company’s tests of four different types of jailbreaks—from linguistic ones to code-based tricks—DeepSeek’s restrictions could easily be bypassed.

“Every single method worked flawlessly,” Polyakov says. “What’s even more alarming is that these aren’t novel ‘zero-day’ jailbreaks—many have been publicly known for years,” he says, claiming he saw the model go into more depth with some instructions around psychedelics than he had seen any other model create.

“DeepSeek is just another example of how every model can be broken—it’s just a matter of how much effort you put in. Some attacks might get patched, but the attack surface is infinite,” Polyakov adds. “If you’re not continuously red-teaming your AI, you’re already compromised.”

Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
Previous ArticleSamsung Galaxy S25 Plus vs. Galaxy S24 Plus: what’s new?
Next Article Don’t miss this $300 discount for the new Dell XPS 13 laptop with Copilot

Related Articles

News

What You Need to Know About VPNs and Age-Verification Laws

12 September 2025
News

A Vigil for Charlie Kirk

12 September 2025
News

Review: Nvidia GeForce Now RTX 5080 (Blackwell)

12 September 2025
News

Which iPhone 17 Model Should You Buy?

12 September 2025
News

Save Big on Our Favorite Outdoor Security Cam

11 September 2025
News

How China’s Propaganda and Surveillance Systems Really Operate

11 September 2025
Demo
Top Articles

ChatGPT o1 vs. o1-mini vs. 4o: Which should you use?

15 December 2024105 Views

Costco partners with Electric Era to bring back EV charging in the U.S.

28 October 202495 Views

5 laptops to buy instead of the M4 MacBook Pro

17 November 202492 Views

Subscribe to Updates

Get the latest tech news and updates directly to your inbox.

Latest News
News

Save Big on Our Favorite Outdoor Security Cam

News Room11 September 2025
News

How China’s Propaganda and Surveillance Systems Really Operate

News Room11 September 2025
Gaming

Naughty Dog’s Debated Going Straight Into The Last Of Us Part III After Part II

News Room11 September 2025
Most Popular

The Spectacular Burnout of a Solar Panel Salesman

13 January 2025129 Views

ChatGPT o1 vs. o1-mini vs. 4o: Which should you use?

15 December 2024105 Views

Costco partners with Electric Era to bring back EV charging in the U.S.

28 October 202495 Views
Our Picks

Review: Nvidia GeForce Now RTX 5080 (Blackwell)

12 September 2025

Which iPhone 17 Model Should You Buy?

12 September 2025

Save Big on Our Favorite Outdoor Security Cam

11 September 2025

Subscribe to Updates

Get the latest tech news and updates directly to your inbox.

Facebook X (Twitter) Instagram Pinterest
  • Privacy Policy
  • Terms of use
  • Advertise
  • Contact Us
© 2025 Best in Technology. All Rights Reserved.

Type above and press Enter to search. Press Esc to cancel.