ChatGPT Answers Programming Questions Incorrectly 52% of the Time: Study

Artificial intelligence chatbots like OpenAI’s ChatGPT are being sold as revolutionary tools that can help workers become more efficient at their jobs, and perhaps replace those workers entirely in the future. But a stunning new study has found that ChatGPT answers computer programming questions incorrectly 52% of the time.

The research from Purdue University, first spotted by news outlet Futurism, was presented earlier this month at the Computer-Human Interaction Conference in Hawaii and looked at 517 programming questions on Stack Overflow that were then fed to ChatGPT.

“Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose,” the new study explained. “Nonetheless, our user study participants still preferred ChatGPT answers 35% of the time due to their comprehensiveness and well-articulated language style.”

Disturbingly, programmers in the study didn’t always catch the mistakes being produced by the AI chatbot.

“However, they also overlooked the misinformation in the ChatGPT answers 39% of the time,” according to the study. “This implies the need to counter misinformation in ChatGPT answers to programming questions and raise awareness of the risks associated with seemingly correct answers.”

Obviously, this is just one study, which is available to read online, but it points to issues that anyone who’s been using these tools can relate to. Large tech companies are pouring billions of dollars into AI right now in an effort to deliver the most reliable chatbots. Meta, Microsoft, and Google are all in a race to dominate an emerging space that has the potential to radically reshape our relationship with the internet. But there are a number of hurdles standing in the way.

Chief among those hurdles is that AI is frequently unreliable, especially when a user asks a truly unique question. Google’s new AI-powered Search is constantly spouting garbage that’s often scraped from unreliable sources. In fact, there have been multiple times this week when Google Search has presented satirical articles from The Onion as dependable information.

For its part, Google defends itself by insisting wrong answers are anomalies.

“The examples we’ve seen are generally very uncommon queries, and aren’t representative of most people’s experiences,” a Google spokesperson told Gizmodo over email earlier this week. “The vast majority of AI Overviews provide high-quality information, with links to dig deeper on the web.”

But that defense, that “uncommon queries” are showing wrong answers, is frankly laughable. Are users only supposed to ask these chatbots the most mundane questions? How is that acceptable, when the promise is that these tools are supposed to be revolutionary?

OpenAI didn’t immediately respond to a request for comment on Friday about the new study on ChatGPT answers. Gizmodo will update this post if we hear back.
