Best in Technology

Synthetic Data Is a Dangerous Teacher

By News Room | 8 January 2024 | 4 Mins Read

In April 2022, when DALL-E 2, a text-to-image visio-linguistic model, was released, it purportedly attracted over a million users within the first three months. This was followed by ChatGPT, which by January 2023 had apparently reached 100 million monthly active users, just two months after launch. Both mark notable moments in the development of generative AI, which in turn has brought an explosion of AI-generated content onto the web. The bad news is that, in 2024, this means we will also see an explosion of fabricated, nonsensical information, mis- and disinformation, and the exacerbation of the negative social stereotypes encoded in these AI models.

The AI revolution wasn’t spurred by any recent theoretical breakthrough (indeed, most of the foundational work underlying artificial neural networks has been around for decades) but by the “availability” of massive data sets. Ideally, an AI model captures a given phenomenon, be it human language, cognition, or the visual world, in a way that represents the real phenomenon as closely as possible.

For example, for a large language model (LLM) to generate humanlike text, it is important that the model be fed huge volumes of data that somehow represent human language, interaction, and communication. The belief is that the larger the data set, the better it captures human affairs, in all their inherent beauty, ugliness, and even cruelty. We are in an era marked by an obsession with scaling up models, data sets, and GPUs. Current LLMs, for instance, now reach a trillion parameters, which means that they require data sets numbering in the billions of items. Where can we find such data? On the web.

This web-sourced data is assumed to capture “ground truth” for human communication and interaction, a proxy from which language can be modeled. Although various researchers have now shown that online data sets are often of poor quality, tend to exacerbate negative stereotypes, and contain problematic content such as racial slurs and hateful speech, often directed at marginalized groups, this hasn’t stopped the big AI companies from using such data in the race to scale up.

With generative AI, this problem is about to get a lot worse. Rather than representing the social world from input data in an objective way, these models encode and amplify social stereotypes. Indeed, recent work shows that generative models encode and reproduce racist and discriminatory attitudes toward historically marginalized identities, cultures, and languages.

It is difficult, if not impossible, even with state-of-the-art detection tools, to know for sure how much text, image, audio, and video data is currently being generated, and at what pace. Stanford University researchers Hans Hanley and Zakir Durumeric estimate a 68 percent increase in the number of synthetic articles posted to Reddit and a 131 percent increase in misinformation news articles between January 1, 2022, and March 31, 2023. Boomy, an online music generator company, claims to have generated 14.5 million songs (or 14 percent of recorded music) so far. In 2021, Nvidia predicted that, by 2030, there will be more synthetic data than real data in AI models. One thing is for sure: The web is being deluged by synthetically generated data.

The worrying thing is that these vast quantities of generative AI outputs will, in turn, be used as training material for future generative AI models. As a result, in 2024, a very significant part of the training material for generative models will be synthetic data produced by generative models. Soon, we will be trapped in a recursive loop in which we train AI models using only synthetic data produced by AI models. Most of this data will be contaminated with stereotypes that will continue to amplify historical and societal inequities. Unfortunately, this will also be the data that we use to train generative models applied to high-stakes sectors including medicine, therapy, education, and law. We have yet to grapple with the disastrous consequences of this. By 2024, the generative AI explosion of content that we find so fascinating now will instead become a massive toxic dump that will come back to bite us.

© 2026 Best in Technology. All Rights Reserved.
