Best in Technology

Meet The AI Agent With Multiple Personalities

By News Room, 16 April 2025, 3 min read

In the coming years, AI agents are widely expected to take over more and more chores on behalf of humans, including operating computers and smartphones. For now, though, they are too error-prone to be of much use.

A new agent called S2, created by the startup Simular AI, combines frontier models with models specialized for using computers. The agent achieves state-of-the-art performance on tasks such as using apps and manipulating files, and its results suggest that switching between different models in different situations may help agents advance.

“Computer-using agents are different from large language models and different from coding,” says Ang Li, cofounder and CEO of Simular. “It’s a different type of problem.”

In Simular’s approach, a powerful general-purpose AI model, such as OpenAI’s GPT-4o or Anthropic’s Claude 3.7 Sonnet, reasons about how best to complete the task at hand, while smaller open source models step in for narrower jobs such as interpreting web pages.

Li, who was a researcher at Google DeepMind before founding Simular in 2023, explains that large language models excel at planning but aren’t as good at recognizing the elements of a graphical user interface.
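To make that division of labor concrete, here is a minimal Python sketch of a planner-plus-grounder loop. It is an illustration under stated assumptions, not Simular’s code: `toy_planner` stands in for a general-purpose model and simply returns a hard-coded plan, `toy_grounder` stands in for a specialist GUI model and uses naive word matching, and every name here (`UIElement`, `Step`, `run_task`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class UIElement:
    """One element visible on screen (hypothetical representation)."""
    label: str  # visible text or accessibility label
    x: int      # screen coordinates of the element's center
    y: int


@dataclass(frozen=True)
class Step:
    """One high-level step produced by the planner."""
    action: str     # "click" or "type"
    target: str     # natural-language description of the element to act on
    text: str = ""  # text to enter when action == "type"


# Hypothetical interfaces: a planner turns a task into steps,
# a grounder finds the on-screen element matching a description.
Planner = Callable[[str], List[Step]]
Grounder = Callable[[str, List[UIElement]], Optional[UIElement]]


def run_task(task: str, screen: List[UIElement], plan: Planner, ground: Grounder) -> List[str]:
    """Send planning to one model and grounding to another; return low-level actions."""
    actions: List[str] = []
    for step in plan(task):
        if step.action not in ("click", "type"):
            raise ValueError(f"unsupported action {step.action!r}")
        element = ground(step.target, screen)
        if element is None:
            raise LookupError(f"could not locate {step.target!r} on screen")
        actions.append(f"click({element.x}, {element.y})")  # focus or press the element
        if step.action == "type":
            actions.append(f"type({step.text!r})")
    return actions


def toy_planner(task: str) -> List[Step]:
    """Stand-in for a frontier model: a hard-coded plan for one example task."""
    if task == "search for headphones":
        return [Step("type", "search box", "headphones"), Step("click", "search button")]
    return []


def toy_grounder(description: str, screen: List[UIElement]) -> Optional[UIElement]:
    """Stand-in for a GUI-grounding model: pick the label sharing the most words."""
    wanted = set(description.lower().split())

    def overlap(element: UIElement) -> int:
        return len(wanted & set(element.label.lower().split()))

    best = max(screen, key=overlap, default=None)
    return best if best is not None and overlap(best) > 0 else None


screen = [UIElement("Search box", 120, 40), UIElement("Search button", 300, 40), UIElement("Cart", 500, 40)]
print(run_task("search for headphones", screen, toy_planner, toy_grounder))
# ['click(120, 40)', "type('headphones')", 'click(300, 40)']
```

Because the loop depends only on the two function signatures, either component could in principle be replaced by a different model without changing the routing logic.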

S2 is designed to learn from experience: an external memory module records its actions along with user feedback, and the agent draws on those records to improve its future actions.
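A memory of this kind can be sketched in a few lines. The Python below is a hypothetical illustration, not S2’s design: it assumes feedback arrives as a simple success flag per run, and it uses word overlap (Jaccard similarity) as a crude stand-in for whatever retrieval the real module uses, recalling the closest past run that the user approved.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Episode:
    """One recorded run: the task, the actions taken, and the user's verdict."""
    task: str
    actions: List[str]
    succeeded: bool  # hypothetical feedback format: a simple thumbs up/down


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (a crude stand-in for embeddings)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


@dataclass
class ExperienceMemory:
    """External store of past episodes that the agent consults before acting."""
    episodes: List[Episode] = field(default_factory=list)

    def record(self, task: str, actions: List[str], succeeded: bool) -> None:
        self.episodes.append(Episode(task, list(actions), succeeded))

    def recall(self, task: str, min_similarity: float = 0.5) -> Optional[Episode]:
        """Return the most similar past episode that the user marked as successful."""
        approved = [e for e in self.episodes if e.succeeded]
        if not approved:
            return None
        best = max(approved, key=lambda e: similarity(task, e.task))
        return best if similarity(task, best.task) >= min_similarity else None


memory = ExperienceMemory()
memory.record("book a flight to Boston", ["open airline site", "search BOS"], succeeded=False)
memory.record("book a flight to Boston", ["open travel site", "search BOS", "sort by price"], succeeded=True)

hint = memory.recall("book a flight to Boston on Friday")
print(hint.actions if hint else None)
# ['open travel site', 'search BOS', 'sort by price']
```

In a real agent, recalled actions would presumably be handed to the planner as a hint rather than replayed blindly, since the screen may have changed since the earlier run.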

On particularly complex tasks, S2 performs better than any other agent on OSWorld, a benchmark that measures an agent’s ability to use a computer operating system.

For example, S2 can complete 34.5 percent of tasks that involve 50 steps, beating OpenAI’s Operator, which can complete 32 percent. Similarly, S2 scores 50 percent on AndroidWorld, a benchmark for smartphone-using agents, while the next best agent scores 46 percent.

Victor Zhong, a computer scientist at the University of Waterloo in Canada and one of the creators of OSWorld, believes that future big AI models may incorporate training data that helps them understand the visual world and make sense of graphical user interfaces.

“This will help agents navigate GUIs with much higher precision,” Zhong says. “I think in the meantime, before such fundamental breakthroughs, state-of-the-art systems will resemble Simular in that they combine multiple models to patch the limitations of single models.”

To prepare for this column, I used Simular to book flights and scour Amazon for deals, and it seemed better than some of the open source agents I tried last year, including AutoGen and vimGPT.

But even the smartest AI agents are, it seems, still troubled by edge cases and occasionally exhibit odd behavior. In one instance, when I asked S2 to help find contact information for the researchers behind OSWorld, the agent got stuck in a loop hopping between the project page and the login for OSWorld’s Discord.

OSWorld’s benchmarks show why agents remain more hype than reality for now. While humans can complete 72 percent of OSWorld tasks, agents are foiled 38 percent of the time on complex tasks. That said, when the benchmark was introduced in April 2024, the best agent could complete only 12 percent of the tasks.
