A New Trick Uses AI to Jailbreak AI Models—Including GPT-4

By News Room · 5 December 2023 · 3 Mins Read

Large language models recently emerged as a powerful and transformative new kind of technology. Their potential became headline news as ordinary people were dazzled by the capabilities of OpenAI’s ChatGPT, released just a year ago.

In the months that followed the release of ChatGPT, discovering new jailbreaking methods became a popular pastime for mischievous users, as well as those interested in the security and reliability of AI systems. But scores of startups are now building prototypes and fully fledged products on top of large language model APIs. OpenAI said at its first-ever developer conference in November that over 2 million developers are now using its APIs.

These models simply predict the text that should follow a given input, but they are trained on vast quantities of text, from the web and other digital sources, using huge numbers of computer chips, over a period of many weeks or even months. With enough data and training, language models exhibit savant-like prediction skills, responding to an extraordinary range of input with coherent and pertinent-seeming information.
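
To make "predicting the text that should follow a given input" concrete, here is a deliberately tiny sketch. It is not how GPT-4 or any real large language model works internally; it is a word-pair counter, with a made-up corpus and hypothetical function names, that illustrates only the basic idea of choosing the most likely continuation from training text.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict:
    """Count, for every word in the corpus, which words follow it."""
    follows = defaultdict(Counter)
    words = corpus.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: dict, word: str):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Toy training text (an assumption for illustration only).
corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigram(corpus)

print(predict_next(model, "the"))  # the word seen most often after "the"
print(predict_next(model, "dog"))  # None: "dog" never appears in the corpus
```

Real models replace the counting table with a neural network trained on vastly more text, which is what lets them respond sensibly to inputs they have never seen word for word.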

The models also exhibit biases learned from their training data and tend to fabricate information when the answer to a prompt is less straightforward. Without safeguards, they can offer advice to people on how to do things like obtain drugs or make bombs. To keep the models in check, the companies behind them use the same method employed to make their responses more coherent and accurate-looking. This involves having humans grade the model’s answers and using that feedback to fine-tune the model so that it is less likely to misbehave.

Robust Intelligence provided WIRED with several example jailbreaks that sidestep such safeguards. Not all of them worked on ChatGPT, the chatbot built on top of GPT-4, but several did, including one for generating phishing messages, and another for producing ideas to help a malicious actor remain hidden on a government computer network.

A similar method was developed by a research group led by Eric Wong, an assistant professor at the University of Pennsylvania. The Robust Intelligence version involves additional refinements that let the system generate jailbreaks with half as many tries.

Brendan Dolan-Gavitt, an associate professor at New York University who studies computer security and machine learning, says the new technique revealed by Robust Intelligence shows that human fine-tuning is not a watertight way to secure models against attack.

Dolan-Gavitt says companies that are building systems on top of large language models like GPT-4 should employ additional safeguards. “We need to make sure that we design systems that use LLMs so that jailbreaks don’t allow malicious users to get access to things they shouldn’t,” he says.
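
One way to read that advice is that access control should live outside the model, so a jailbroken model can ask for something but cannot grant it to itself. The sketch below is an assumption-laden illustration, not a method described by Dolan-Gavitt or Robust Intelligence: the roles, tool names, and functions are all hypothetical. The point it shows is that the permission check depends on who the user is, never on what the model's output says.

```python
from dataclasses import dataclass

# Hypothetical permission table: which tools each user role may invoke.
ALLOWED_TOOLS = {
    "guest": {"search_docs"},
    "admin": {"search_docs", "read_user_records"},
}


@dataclass(frozen=True)
class ToolRequest:
    """An action the model asked the application to perform."""
    tool: str
    argument: str


class PermissionDenied(Exception):
    """Raised when a role is not allowed to call the requested tool."""


def authorize(role: str, request: ToolRequest) -> None:
    # The decision uses only the authenticated role, not model output.
    if request.tool not in ALLOWED_TOOLS.get(role, set()):
        raise PermissionDenied(f"role {role!r} may not call {request.tool!r}")


def execute(role: str, request: ToolRequest) -> str:
    """Run a model-requested tool call only after the check passes."""
    authorize(role, request)
    return f"ran {request.tool}({request.argument!r})"


# Even if a jailbreak persuades the model to request sensitive data,
# a guest session is still refused by the application layer.
try:
    execute("guest", ToolRequest("read_user_records", "all"))
except PermissionDenied as err:
    print("blocked:", err)
```

A check like this does not stop a model from producing harmful text, but it limits what a successful jailbreak can reach.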

© 2026 Best in Technology. All Rights Reserved.
