AI Algorithms Can Be Converted Into 'Sleeper Cell' Backdoors, Anthropic Research Shows

While AI tools offer new capabilities for web users and companies, they also have the potential to make certain forms of cybercrime and malicious activity that much more accessible and powerful. Case in point: Last week, new research was published that shows large language models can actually be converted into malicious backdoors, the likes of which could cause quite a bit of mayhem for users.

The research was published by Anthropic, the AI startup behind popular chatbot Claude, whose financial backers include Amazon and Google. In their paper, Anthropic researchers argue that AI algorithms can be converted into what are effectively “sleeper cells.” Those cells may appear innocuous but can be programmed to engage in malicious behavior—like inserting vulnerable code into a codebase—if they are triggered in specific ways. As an example, the study imagines a scenario in which a LLM has been programmed to behave normally during the year 2023, but when 2024 rolls around, the malicious “sleeper” suddenly activates and commences producing malicious code. Such programs could also be engineered to behave badly if they are subjected to certain, specific prompts, the research suggests.

Given the fact that AI programs have become immensely popular with software developers over the past year, the results of this study would appear to be quite concerning. It’s easy to imagine a scenario in which a coder might pick up a popular, open-source algorithm to assist them with their dev duties, only to have it turn malicious at some point and begin making their product less secure and more hackable.

The study notes:

We believe that our code vulnerability insertion backdoor provides a minimum viable example of a real potential risk…Such a sudden increase in the rate of vulnerabilities could result in the accidental deployment of vulnerable model-written code even in cases where safeguards prior to the sudden increase were sufficient.

In short: Much like a normal software program, AI models can be “backdoored” to behave maliciously. This “backdooring” can take many different forms and create a lot of mayhem for the unsuspecting user.

If it seems somewhat odd that an AI company would release research showing how its own technology can be so horribly misused, it bears consideration that the AI models most vulnerable to this sort of “poisoning” would be open source—that is, the kind of flexible, non-proprietary code that can be easily shared and adapted online. Notably, Anthropic is closed-source. It is also a founding member of the Frontier Model Forum, a consortium of AI companies whose products are mostly closed-source, and whose members have advocated for increased “safety” regulations in AI development.

Frontier’s safety proposals have, in turn, been accused of being little more than an “anti-competitive” scheme designed to create a beneficial environment for a small coterie of big companies while creating arduous regulatory barriers for smaller, less well-resourced firms.

What's On

You can now buy the Asus ROG Cetra Open Wireless earbuds in the US

Fatal Frame II: Crimson Butterfly Remake Review – Frustration Behind The Camera

Humanoid robot offers a peek into a future without chores

Doom vs Boom: The Battle to Enshrine AI’s Future Into California Law

Perplexity Is Reportedly Letting Its AI Break a Basic Rule of the Internet

Anthropic Says New Claude 3.5 AI Model Outperforms GPT-4 Omni

Call Centers Introduce ‘Emotion Canceling’ AI as a ‘Mental Shield’ for Workers

AI Turns Classic Memes Into Hideously Animated Garbage

May ‘AI’ Take Your Order? McDonald’s Says Not Yet

Most Popular

The Spectacular Burnout of a Solar Panel Salesman

5 laptops to buy instead of the M4 MacBook Pro

ChatGPT o1 vs. o1-mini vs. 4o: Which should you use?

Our Picks

T-Mobile 5G Home Internet offers $20 off home internet per month for 5 years and $300 back

‘Flying Cars’ Will Take Off in American Skies This Summer

Samsung’s new security feature restarts your phone after 72 hours of inactivity

Subscribe to Updates

What's On

AI Algorithms Can Be Converted Into ‘Sleeper Cell’ Backdoors, Anthropic Research Shows

Related Articles

Subscribe to Updates