Best in Technology

AI Is a Black Box. Anthropic Figured Out a Way to Look Inside

By News Room · 21 May 2024 · 3 Mins Read

Last year, the team began experimenting with a tiny model that uses only a single layer of neurons. (Sophisticated LLMs have dozens of layers.) The hope was that in the simplest possible setting they could discover patterns that designate features. They ran countless experiments with no success. “We tried a whole bunch of stuff, and nothing was working. It looked like a bunch of random garbage,” says Tom Henighan, a member of Anthropic’s technical staff. Then a run dubbed “Johnny”—each experiment was assigned a random name—began associating neural patterns with concepts that appeared in its outputs.

“Chris looked at it, and he was like, ‘Holy crap. This looks great,’” says Henighan, who was stunned as well. “I looked at it, and was like, ‘Oh, wow, wait, is this working?’”

Suddenly the researchers could identify the features a group of neurons was encoding. They could peer into the black box. Henighan says he identified the first five features he looked at. One group of neurons signified Russian texts. Another was associated with mathematical functions in the Python programming language. And so on.

Once they showed they could identify features in the tiny model, the researchers set about the hairier task of decoding a full-size LLM in the wild. They used Claude Sonnet, the medium-strength version of Anthropic’s three current models. That worked, too. One feature that stuck out to them was associated with the Golden Gate Bridge. They mapped out the set of neurons that, when fired together, indicated that Claude was “thinking” about the massive structure that links San Francisco to Marin County. What’s more, when similar sets of neurons fired, they evoked subjects that were Golden Gate Bridge-adjacent: Alcatraz, California Governor Gavin Newsom, and the Hitchcock movie Vertigo, which was set in San Francisco. All told, the team identified millions of features, a sort of Rosetta Stone to decode Claude’s neural net. Many of the features were safety-related, including “getting close to someone for some ulterior motive,” “discussion of biological warfare,” and “villainous plots to take over the world.”
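To make the idea of "neurons that fire together" concrete, here is a toy sketch. It is an illustration only and not Anthropic's actual method, which the article does not spell out: it assumes a feature can be treated as a direction across a handful of neurons, and it scores a pattern of activity by how well it lines up with that direction. The names `feature_activation` and `BRIDGE_FEATURE`, and all of the numbers, are made up for the example.

```python
# Toy illustration only: every name and number here is hypothetical.

def feature_activation(neurons, direction):
    """Return how strongly a pattern of neuron activity matches a feature.

    The score is the dot product of the neuron activations with the
    feature's direction, floored at zero so a mismatch reads as "off".
    """
    score = sum(n * d for n, d in zip(neurons, direction))
    return max(0.0, score)


# Hypothetical feature: neurons 0 and 2 firing together signal the concept.
BRIDGE_FEATURE = [1.0, 0.0, 1.0, 0.0]

on_topic = [0.5, 0.25, 0.5, 0.0]   # neurons 0 and 2 are both active
off_topic = [0.0, 0.5, 0.0, 0.5]   # a different set of neurons is active

print(feature_activation(on_topic, BRIDGE_FEATURE))
print(feature_activation(off_topic, BRIDGE_FEATURE))
```

In this sketch the first pattern scores high because the two neurons that define the feature are active together, while the second pattern scores zero even though just as many neurons are firing.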

The Anthropic team then took the next step, to see if they could use that information to change Claude’s behavior. They began manipulating the neural net to augment or diminish certain concepts—a kind of AI brain surgery, with the potential to make LLMs safer and augment their power in selected areas. “Let’s say we have this board of features. We turn on the model, one of them lights up, and we see, ‘Oh, it’s thinking about the Golden Gate Bridge,’” says Shan Carter, an Anthropic scientist on the team. “So now, we’re thinking, what if we put a little dial on all these? And what if we turn that dial?”
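Carter's "dial" can be pictured with a similarly small sketch. This is a simplified assumption for illustration, not the team's implementation: it treats a feature as a direction across a few neurons and "turns the dial" by shifting the neuron activations along that direction. The functions `turn_dial` and `feature_strength` and all values are hypothetical.

```python
# Toy illustration only: every name and number here is hypothetical.

def feature_strength(neurons, direction):
    """Measure how much of the feature is present in the neuron activity."""
    return sum(n * d for n, d in zip(neurons, direction))


def turn_dial(neurons, direction, amount):
    """Shift neuron activations along a feature direction.

    A positive amount amplifies the concept; a negative amount diminishes it.
    """
    return [n + amount * d for n, d in zip(neurons, direction)]


FEATURE = [1.0, 0.0, 1.0, 0.0]      # hypothetical feature direction
neurons = [0.5, 0.25, 0.5, 0.0]     # hypothetical activity before steering

boosted = turn_dial(neurons, FEATURE, 2.0)       # dial turned up
suppressed = turn_dial(neurons, FEATURE, -0.5)   # dial turned down

print(feature_strength(neurons, FEATURE))
print(feature_strength(boosted, FEATURE))
print(feature_strength(suppressed, FEATURE))
```

Note that the neuron that is not part of the feature (index 1) keeps its value in both cases, which is the appeal of steering along one feature: in this toy setting, the rest of the activity is left alone.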

So far, the answer to that question seems to be that it’s very important to turn the dial the right amount. The team found several features that represented dangerous practices, like unsafe computer code, scam emails, and instructions for making dangerous products. By suppressing those features, Anthropic says, the model can produce safer computer programs and reduce bias.
