Web crawling, the act of indexing information across the internet, has been around for decades. It has primarily been used by search engines like Google and nonprofits like Internet Archive and Common Crawl to catalog the contents of the open internet and make it searchable. Until recently, web crawling was rarely seen as controversial, since websites depended on it to help people find their content. But now crawling tech has been subsumed by the great AI-ening of everything and is being used by companies like Google and Perplexity AI to absorb whole articles that are fed into their summarizing machines.
This week on Gadget Lab, WIRED senior writer Kate Knibbs joins the show to talk about web crawling and the controversy over Common Crawl. Then we talk with Forbes’ chief content officer and editor Randall Lane about how Perplexity.AI repurposed a Forbes article and presented it as its own story, without first asking permission or properly citing the source.
Show Notes
Read Kate’s story about how publishers are going after Common Crawl over AI training data. Read Randall’s story about how Perplexity.AI copied the work of two Forbes reporters.
Recommendations
Randall recommends his new horse racing league, the National Thoroughbred League. Kate recommends the book Victim by Andrew Boryga. Lauren recommends the show Hacks on Max.
Randall Lane can be found on social media @RandallLane. Kate Knibbs is @Knibbs. Lauren Goode is @LaurenGoode. Michael Calore is @snackfight. Bling the main hotline at @GadgetLab. The show is produced by Boone Ashworth (@booneashworth). Our theme music is by Solar Keys.
How to Listen
You can always listen to this week’s podcast through the audio player on this page, but if you want to subscribe for free to get every episode, here’s how:
If you’re on an iPhone or iPad, open the app called Podcasts, or just tap this link. You can also download an app like Overcast or Pocket Casts, and search for Gadget Lab. If you use Android, you can find us in the Google Podcasts app just by tapping here. We’re on Spotify too. And in case you really need it, here’s the RSS feed.