Close Menu
Best in TechnologyBest in Technology
  • News
  • Phones
  • Laptops
  • Gadgets
  • Gaming
  • AI
  • Tips
  • More
    • Web Stories
    • Global
    • Press Release

Subscribe to Updates

Get the latest tech news and updates directly to your inbox.

What's On

Genshin Impact Is Being Delisted On PS4 As MiHoYo Plans To Cease Support

6 August 2025

9 Best Hotel Rewards Programs for Elevating Your Next Stay

6 August 2025

OnePlus 15 Specifications Leaked; Said to Feature Refreshed Design With Square Rear Camera Module

6 August 2025
Facebook X (Twitter) Instagram
Just In
  • Genshin Impact Is Being Delisted On PS4 As MiHoYo Plans To Cease Support
  • 9 Best Hotel Rewards Programs for Elevating Your Next Stay
  • OnePlus 15 Specifications Leaked; Said to Feature Refreshed Design With Square Rear Camera Module
  • RedOctane Games Is A New Rhythm Game Studio Founded By Guitar Hero Veterans
  • OpenAI Announces Massive US Government Partnership
  • Poco M7 Plus Price and Battery Specifications Teased; Confirmed to Debut With Silicon-Carbon Battery
  • Breathwork, Biohacking, and Cryotherapy: New Buzzwords for Modern Business Travelers
  • Lava Blaze AMOLED 2 5G Design, Key Specifications Revealed Ahead of India Launch
Facebook X (Twitter) Instagram Pinterest Vimeo
Best in TechnologyBest in Technology
  • News
  • Phones
  • Laptops
  • Gadgets
  • Gaming
  • AI
  • Tips
  • More
    • Web Stories
    • Global
    • Press Release
Subscribe
Best in TechnologyBest in Technology
Home » Anthropic’s New Model Excels at Reasoning and Planning—and Has the Pokémon Skills to Prove It
News

Anthropic’s New Model Excels at Reasoning and Planning—and Has the Pokémon Skills to Prove It

News RoomBy News Room22 May 20254 Mins Read
Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
Share
Facebook Twitter LinkedIn Pinterest Email

Anthropic announced two new models, Claude 4 Opus and Claude Sonnet 4, during its first developer conference in San Francisco on Thursday. The pair will be immediately available to paying Claude subscribers.

The new models, which jump the naming convention from 3.7 straight to 4, have a number of strengths, including their ability to reason, plan, and remember the context of conversations over extended periods of time, the company says. Claude 4 Opus is also even better at playing Pokémon than its predecessor.

“It was able to work agentically on Pokémon for 24 hours,” says Anthropic’s chief product officer Mike Krieger in an interview with WIRED. Previously, the longest the model could play was just 45 minutes, a company spokesperson added.

A few months ago, Anthropic launched a Twitch stream called “Claude Plays Pokémon” which showcases Claude 3.7 Sonnet’s abilities at Pokémon Red live. The demo is meant to show how Claude is able to analyze the game and make decisions step by step, with minimal direction.

The lead behind the Pokémon research is David Hershey, a member of the technical staff at Anthropic. In an interview with WIRED, Hershey says he chose Pokémon Red because it’s “a simple playground,” meaning the game is turn-based and doesn’t require real time reactions, which Anthropic’s current models struggle with. It was also the first video game he ever played, on the original Game Boy, after getting it for Christmas in 1997. “It has a pretty special place in my heart,” Hershey says.

Hershey’s overarching goal with this research was to study how Claude could be used as an agent—working independently to do complex tasks on behalf of a user. While it’s unclear what prior knowledge Claude has about Pokémon from its training data, its system prompt is minimal by design: You are Claude, you’re playing Pokémon, here are the tools you have, and you can press buttons on the screen.

“Over time, I have been going through and deleting all of the Pokémon-specific stuff I can just because I think it’s really interesting to see how much the model can figure out on its own,” Hershey says, adding that he hopes to build a game that Claude has never seen before in order to truly test its limits.

When Claude 3.7 Sonnet played the game, it ran into some challenges: It spent “dozens of hours” stuck in one city and had trouble identifying non-player characters, which drastically stunted its progress in the game. With Claude 4 Opus, Hershey noticed an improvement in Claude’s long-term memory and planning capabilities when he watched it navigate a complex Pokémon quest. After realizing it needed a certain power to move forward, the AI spent two days improving its skills before continuing to play. Hershey believes that kind of multi-step reasoning, with no immediate feedback, shows a new level of coherence, meaning the model has a better ability stay on track.

“This is one of my favorite ways to get to know a model. Like, this is how I understand what its strengths are, what its weaknesses are,” Hershey says. “It’s my way of just coming to grips with this new model that we’re about to put out, and how to work with it.”

Everyone Wants an Agent

Anthropic’s Pokémon research is a novel approach to tackling a preexisting problem—how do we understand what decisions an AI is making when approaching complex tasks, and nudge it in the right direction?

The answer to that question is integral to advancing the industry’s much-hyped AI agents—AI that can tackle complex tasks with relative independence. In Pokémon, it’s important that the model doesn’t lose context or “forget” the task at hand. That also applies to AI agents asked to automate a workflow—even one that takes hundreds of hours.

Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
Previous ArticleTecno Pova Curve 5G India Launch Date Confirmed; to Be Available on Flipkart
Next Article Elden Ring Nightreign preload guide: release date, file size, and more

Related Articles

News

9 Best Hotel Rewards Programs for Elevating Your Next Stay

6 August 2025
News

OpenAI Announces Massive US Government Partnership

6 August 2025
News

Breathwork, Biohacking, and Cryotherapy: New Buzzwords for Modern Business Travelers

6 August 2025
News

The Business Traveler of Today Is Changing—and So Is Their Flight Map

6 August 2025
News

Combating Domestic Violent Extremism Is No Longer a FEMA Priority

6 August 2025
News

How Much Protein Do I Need?

6 August 2025
Demo
Top Articles

ChatGPT o1 vs. o1-mini vs. 4o: Which should you use?

15 December 2024104 Views

Costco partners with Electric Era to bring back EV charging in the U.S.

28 October 202495 Views

Oppo Reno 14, Reno 14 Pro India Launch Timeline and Colourways Leaked

27 May 202582 Views

Subscribe to Updates

Get the latest tech news and updates directly to your inbox.

Latest News
Phones

Poco M7 Plus Price and Battery Specifications Teased; Confirmed to Debut With Silicon-Carbon Battery

News Room6 August 2025
News

Breathwork, Biohacking, and Cryotherapy: New Buzzwords for Modern Business Travelers

News Room6 August 2025
Phones

Lava Blaze AMOLED 2 5G Design, Key Specifications Revealed Ahead of India Launch

News Room6 August 2025
Most Popular

The Spectacular Burnout of a Solar Panel Salesman

13 January 2025129 Views

ChatGPT o1 vs. o1-mini vs. 4o: Which should you use?

15 December 2024104 Views

Costco partners with Electric Era to bring back EV charging in the U.S.

28 October 202495 Views
Our Picks

RedOctane Games Is A New Rhythm Game Studio Founded By Guitar Hero Veterans

6 August 2025

OpenAI Announces Massive US Government Partnership

6 August 2025

Poco M7 Plus Price and Battery Specifications Teased; Confirmed to Debut With Silicon-Carbon Battery

6 August 2025

Subscribe to Updates

Get the latest tech news and updates directly to your inbox.

Facebook X (Twitter) Instagram Pinterest
  • Privacy Policy
  • Terms of use
  • Advertise
  • Contact Us
© 2025 Best in Technology. All Rights Reserved.

Type above and press Enter to search. Press Esc to cancel.