Best in Technology
Why Anthropic’s New AI Model Sometimes Tries to ‘Snitch’

By News Room, 28 May 2025, 4 Mins Read

The hypothetical scenarios in which researchers elicited the whistleblowing behavior from Opus 4 involved many human lives at stake and absolutely unambiguous wrongdoing, Bowman says. A typical example would be Claude discovering that a chemical plant had knowingly allowed a toxic leak to continue, causing severe illness for thousands of people, just to avoid a minor financial loss that quarter.

It’s strange, but it’s also exactly the kind of thought experiment that AI safety researchers love to dissect. If a model detects behavior that could harm hundreds, if not thousands, of people—should it blow the whistle?

“I don’t trust Claude to have the right context, or to use it in a nuanced enough, careful enough way, to be making the judgment calls on its own. So we are not thrilled that this is happening,” Bowman says. “This is something that emerged as part of a training and jumped out at us as one of the edge case behaviors that we’re concerned about.”

In the AI industry, this type of unexpected behavior is broadly referred to as misalignment—when a model exhibits tendencies that don’t align with human values. (There’s a famous essay that warns about what could happen if an AI were told to, say, maximize production of paperclips without being aligned with human values—it might turn the entire Earth into paperclips and kill everyone in the process.) When asked if the whistleblowing behavior was aligned or not, Bowman described it as an example of misalignment.

“It’s not something that we designed into it, and it’s not something that we wanted to see as a consequence of anything we were designing,” he explains. Anthropic’s chief science officer Jared Kaplan similarly tells WIRED that it “certainly doesn’t represent our intent.”

“This kind of work highlights that this can arise, and that we do need to look out for it and mitigate it to make sure we get Claude’s behaviors aligned with exactly what we want, even in these kinds of strange scenarios,” Kaplan adds.

There’s also the issue of figuring out why Claude would “choose” to whistleblow when presented with illegal activity by the user. That’s largely the job of Anthropic’s interpretability team, which works to unearth what decisions a model makes in its process of spitting out answers. It’s a surprisingly difficult task—the models are underpinned by a vast, complex combination of data that can be inscrutable to humans. That’s why Bowman isn’t exactly sure why Claude “snitched.”

“These systems, we don’t have really direct control over them,” Bowman says. What Anthropic has observed so far is that, as models gain greater capabilities, they sometimes choose to take more extreme actions. “I think here, that’s misfiring a little bit. We’re getting a little bit more of the ‘act like a responsible person would’ without quite enough of like, ‘Wait, you’re a language model, which might not have enough context to take these actions,’” Bowman says.

But that doesn’t mean Claude is going to blow the whistle on egregious behavior in the real world. The goal of these kinds of tests is to push models to their limits and see what arises. This kind of experimental research is increasingly important as AI is adopted by the US government, students, and massive corporations.

And it isn’t just Claude that’s capable of exhibiting this type of whistleblowing behavior, Bowman says, pointing to X users who found that OpenAI’s and xAI’s models behaved similarly when prompted in unusual ways. (OpenAI did not respond to a request for comment in time for publication.)

“Snitch Claude,” as shitposters like to call it, is simply an edge-case behavior exhibited by a system pushed to its extremes. Bowman, who took the meeting with me from a sunny backyard patio outside San Francisco, says he hopes this kind of testing becomes industry standard. He adds that next time he’ll word his posts about it differently.

“I could have done a better job of hitting the sentence boundaries to tweet, to make it more obvious that it was pulled out of a thread,” Bowman says, looking into the distance. Still, he notes that influential researchers in the AI community shared interesting takes and questions in response to his post. “Just incidentally, this kind of more chaotic, more heavily anonymous part of Twitter was widely misunderstanding it.”
