Spatial audio is having a moment. While the goal of offering a more immersive, 3D-like listening experience may have been born in movie theaters, much of the conversation around spatial audio has switched to music — specifically, the relatively new availability of Dolby Atmos Music tracks via music streaming services.

Spatial audio’s appeal is no mystery. Combine one of the first novel ways of listening to music since stereo with Apple’s prodigious marketing might, and you get a lot of people wanting to try it.

Something of a mystery, however, is whether there’s a difference between spatial audio from one streaming service to another. Say, on Apple Music versus Amazon Music. And what about your headphones — do they affect how spatial audio sounds?

The answers are yes and yes, but maybe not for the reasons you think. To explain, let’s take a deeper look at what’s going on behind the scenes when you listen to spatial audio using headphones.

Before continuing, here’s a spatial audio primer that explains what it is and the various ways you can experience it.

A room full of speakers inside your head

Spatial audio formats like Dolby Atmos are extensions of multichannel surround sound (think Dolby Digital), designed for a movie theater listening experience via speakers placed around a room. This theoretical room has a front, a rear, two sides, and a ceiling.

Music that’s created in Dolby Atmos starts with a “bed” of 9.1 channels, usually configured in a 7.1.2 layout that corresponds to speakers in the front (left, center, right), the sides (surround left/right), the rear (left/right), the ceiling (height left/right), plus a low-frequency effects (LFE) channel sent to a subwoofer. In addition to these nine channels, which can produce varying amounts of sound, Dolby Atmos adds up to 118 sound “objects” which can move freely anywhere in the hemisphere covered by those nine speakers.
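The channel arithmetic above can be sketched in a few lines of Python. The numbers come straight from the description; the names (BED_CHANNELS, MAX_OBJECTS, describe_layout) are invented for this illustration and are not taken from any Dolby tool.

```python
# Illustrative sketch only: these names are assumptions, not Dolby identifiers.
BED_CHANNELS = {
    "front": ["left", "center", "right"],
    "side": ["surround left", "surround right"],
    "rear": ["rear left", "rear right"],
    "height": ["height left", "height right"],
    "lfe": ["low-frequency effects"],
}
MAX_OBJECTS = 118  # upper limit on movable sound objects, per the text


def describe_layout(bed):
    """Return the 'ear-level.LFE.height' layout name and the bed channel count."""
    ear_level = len(bed["front"]) + len(bed["side"]) + len(bed["rear"])
    lfe = len(bed["lfe"])
    height = len(bed["height"])
    return f"{ear_level}.{lfe}.{height}", ear_level + lfe + height


layout, bed_count = describe_layout(BED_CHANNELS)
full_range = bed_count - len(BED_CHANNELS["lfe"])  # the "9" in 9.1
total = bed_count + MAX_OBJECTS

print(layout, full_range, bed_count, total)  # 7.1.2 9 10 128
```

Seen this way, "9.1" and "7.1.2" describe the same ten bed channels: nine full-range speakers plus the LFE channel.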

When you listen to spatial audio via headphones, you hear the same 9.1-channel, 118-object soundtrack, which seems like a paradox. How can two small speakers attached to your head do the same thing as nine speakers arranged all around you?

Tricking your brain

The answer can be found in psychoacoustics, the field of science that studies how the brain interprets and reacts to sound information. This includes a process known as sound localization — how the brain uses audible cues to figure out which direction a sound is coming from, and how near or far the source of the sound may be.

We localize sound by combining cues such as pitch and loudness. But the biggest clue is the way sound reaches each of our ears. We are extremely sensitive to even the slightest differences in timing. If a sound were to reach our left ear just a fraction of a millisecond before it reached our right ear, our brain would know and react accordingly.
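To get a feel for how small these timing differences are, here is a rough estimate using Woodworth's spherical-head approximation. It is a textbook model, not something any particular renderer is known to use, and the head radius and speed of sound below are assumed typical values.

```python
import math

HEAD_RADIUS_M = 0.0875       # assumed average head radius (8.75 cm)
SPEED_OF_SOUND_M_S = 343.0   # assumed: air at roughly room temperature


def interaural_time_difference(azimuth_deg):
    """Woodworth estimate of the arrival-time gap between ears, in seconds.

    azimuth_deg: 0 is straight ahead, 90 is directly to one side.
    The approximation is intended for distant sources between 0 and 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


for azimuth in (0, 30, 90):
    itd_ms = interaural_time_difference(azimuth) * 1000
    print(f"{azimuth:>2} degrees: {itd_ms:.2f} ms")
```

Under these assumptions, the largest gap, for a sound directly to one side, comes out to roughly two-thirds of a millisecond.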

Using psychoacoustic models (and a set of stereo headphones), we can simulate the direction and distance of real-world sounds by carefully controlling how these sounds reach each ear.

Binaural rendering

The process of taking a spatial audio format like Dolby Atmos and transforming it using the principles of psychoacoustics into a set of sounds that can be delivered via headphones is known as binaural rendering.

If you’ve ever listened to Dolby Atmos, DTS:X, or Sony 360 Reality Audio (360RA) using headphones, at some point in the playback chain, a binaural rendering software algorithm was used to create that experience. The same is true for video games with 5.1 or 7.1 soundtracks — these can be binaurally rendered by technologies like THX Spatial Audio or Immerse Gaming Hive.

The exciting part about binaural rendering is that it works on any set of stereo headphones or earbuds. Whether wired or wireless, and whether you spent $10 or $1,000, all stereo headsets are compatible with binaurally rendered spatial audio. A set of headphones may specifically advertise that they “work with spatial audio,” but that’s kind of like saying that a set of four car tires “work with paved roads” — they all do.

Spatial audio: it’s all outside your head?

So now that I’ve just explained that binaural rendering can trick your brain into thinking it’s listening to a full 7.1.2-channel sound system using any old set of headphones — in other words, it’s all in your head — I’m going to contradict myself. Partially.

The way each of us interprets sound localization cues has a lot to do with the shape of our heads. Specifically, the shape and placement of our ears. The physiology of our heads creates a unique fingerprint (audioprint?) on the sounds that reach our eardrums — no two are alike. From early infancy, as our brain develops our ability to localize sound, it uses this audioprint as a template.

When described mathematically and used to filter incoming sounds to each ear, this audioprint is known as a “head-related transfer function” (HRTF).

HRTFs are the key

For binaural rendering to sound as lifelike as possible, spatial audio is processed using an HRTF profile.
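In the time domain, applying an HRTF generally comes down to convolving the source signal with a pair of head-related impulse responses (HRIRs), one per ear. The sketch below shows that idea in plain Python. The two tiny impulse responses are invented placeholders rather than measured data, and the function names are assumptions made for illustration.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Return (left, right) headphone signals for a single mono source."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Placeholder HRIRs for a source on the listener's left (NOT measured data):
# the left ear hears it immediately at full level, the right ear hears it
# two samples later at half level.
HRIR_LEFT = [1.0, 0.0, 0.0]
HRIR_RIGHT = [0.0, 0.0, 0.5]

click = [1.0, 0.0, 0.0, 0.0]  # a single-sample click
left, right = render_binaural(click, HRIR_LEFT, HRIR_RIGHT)

print(left)   # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(right)  # [0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
```

A real renderer would use much longer impulse responses, one pair for every direction it needs to simulate, and would typically do the convolution with faster frequency-domain methods.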

As you’ve probably guessed, we all have unique HRTF profiles. In an ideal world, we’d get our heads and upper torsos 3D-scanned and upload the resulting HRTF profile into Apple Music or Amazon Music (or any other app that supports spatial audio). Each app’s binaural rendering algorithm would then use that HRTF profile to create a set of sounds that our brain interprets with a high degree of realism.

We’re not quite there yet. In the absence of uploadable, personalized HRTFs, each spatial audio app uses a generic HRTF. As the name suggests, these generic HRTFs are compiled from hundreds of individual HRTFs to create an approximation of how sounds enter our ears. The closer your personal HRTF matches this average HRTF, the more realistic spatial audio will sound.
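As a toy illustration of what "compiled from many individual HRTFs" could mean, the sketch below averages a few made-up frequency responses and measures how far a given listener sits from that average. Real generic HRTFs are built from far more data and with methods this article does not describe, so every number and name here is hypothetical, and averaging gains in decibels is a deliberate simplification.

```python
import math


def average_profile(profiles):
    """Element-wise mean of equally sized lists of gains (in dB)."""
    count = len(profiles)
    return [sum(values) / count for values in zip(*profiles)]


def rms_distance(profile_a, profile_b):
    """Root-mean-square difference between two profiles, in dB."""
    squared = [(a - b) ** 2 for a, b in zip(profile_a, profile_b)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical gains (dB) at three frequencies for three measured listeners.
measured = [
    [0.0, 3.0, -6.0],
    [2.0, 5.0, -4.0],
    [4.0, 1.0, -8.0],
]
generic = average_profile(measured)

close_listener = [2.0, 3.0, -6.0]  # happens to match the average exactly
far_listener = [8.0, -3.0, 0.0]    # differs by 6 dB at every frequency

print(generic)                                # [2.0, 3.0, -6.0]
print(rms_distance(close_listener, generic))  # 0.0
print(rms_distance(far_listener, generic))    # 6.0
```

In this simplified picture, the first listener would hear the generic profile as close to lifelike, while the second would be more likely to find the effect unconvincing.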

Generic HRTFs are also used to spatialize stereo content or improve head-tracked spatial audio. If your music app, wireless headphones, or wireless earbuds have a spatial sound mode, it can be used to give stereo sound extra depth. And if your headphones have built-in sensors to track your head movements, they can generate head-tracked spatial audio for an even more realistic, room-like listening experience.
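The core of head tracking can be sketched as a single subtraction: the renderer keeps each virtual speaker fixed in the room and recomputes its angle relative to wherever your nose is pointing. This is a minimal sketch that assumes only left-right head rotation (yaw) matters; the function name and sign convention are assumptions for illustration, not any vendor's API.

```python
def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Angle of the source relative to the listener's nose, in [-180, 180).

    Assumed convention: positive angles are to the listener's right.
    """
    return (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


# A virtual speaker fixed 30 degrees to the right of the room's front.
speaker = 30.0
for yaw in (0.0, 30.0, 90.0):
    print(f"head yaw {yaw:>4}: speaker at {relative_azimuth(speaker, yaw):>5}")
```

Facing forward, the speaker is 30 degrees to the right. Turn 30 degrees toward it and it sits dead ahead. Turn 90 degrees to the right and it ends up 60 degrees to the left, which is what makes the virtual room feel like it stays put while your head moves.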

Who’s got the best HRTF?

Curiously, while every binaural renderer uses a generic HRTF, they don’t all use the same generic HRTF. Some, like Amazon Music and Tidal, use a generic HRTF provided by Dolby — it’s embedded in the Dolby Atmos binaural rendering engine included in these apps — whereas Apple Music uses a proprietary generic HRTF developed by Apple.

By definition, every generic HRTF will be a closer match for some people than others, in the same way that a set of wireless earbuds will fit some people better than others. Whether Apple’s HRTF sounds better to you than Dolby’s will depend on how closely you match them. The only way to know is to try both.

A step closer to reality: personalized HRTFs

While full 3D anatomical scans are the holy grail of customized HRTFs, some companies have figured out an in-between step that gives us an easy way to get beyond generic HRTFs. Apple calls its version “personalized spatial audio.” If you have an iPhone X or newer (not including SE models), running iOS 16 or later, you can use the phone’s built-in TrueDepth selfie camera to take 3D photos of the front of your face and each ear. It’s the same technology that Apple uses for scanning your face when using Face ID to unlock your phone.

Unfortunately, the personalized HRTF this creates can only be used in conjunction with select Apple AirPods or Beats wireless headphones and earbuds — it won’t affect how you hear spatial audio when using any other devices.

Sony does something similar inside the Sony Headphones app. If you buy a set of 360RA-compatible Sony headphones or earbuds, you can take photos of each ear and upload them into the app.

The photos are evaluated and used to create a personalized HRTF, which is transferred to music apps on your phone that stream Sony 360RA tracks. As of March 2024, this includes Amazon Music, Tidal, Nugs.net, and PeerTracks.

Creating a virtual spatial audio studio

As cool as it is to use binaural rendering as a way of listening to spatial audio with headphones, for many musicians and other creators, it has become an essential part of making spatial audio.

As noted in the section “A room full of speakers inside your head,” spatial audio formats like Dolby Atmos are created for loudspeaker listening. But creating a 7.1.2 or better studio, complete with appropriate acoustic treatments to eradicate echoes and other unwanted effects, can cost thousands of dollars.

If you’re an up-and-coming artist or someone who wants to experiment with spatial audio as a hobby, this can be a prohibitive investment. But thanks to binaural rendering, all you need is a decent set of headphones and the right software, and you’ve got a virtual studio right on your computer.

An example of virtual studio software is Embody’s Immerse Virtual Studio Signature Edition. It works with any digital audio workstation (DAW), such as Pro Tools, or as a standalone way to experience binaurally rendered spatial audio from a variety of other sources.

Immerse lets you simulate what it’s like to mix spatial audio inside some of the most prestigious professional Dolby Atmos studios, including Alan Meyerson’s 7.1.6 studio (where Hans Zimmer has mastered many of his iconic film scores) and Lurssen Mastering, a Grammy- and Oscar-winning 7.1.4 studio.

The key to hearing these recording spaces the way you would if you were physically working inside them is the combination of Immerse’s personalized HRTF — which you can create using almost any smartphone — with dedicated headphone profiles for dozens of popular consumer and professional wired and wireless headphones and earbuds.

These elements give artists an optimized environment for developing spatial audio content. However, as discussed earlier, most folks don’t have optimized environments for listening to spatial audio. Embody’s software lets you switch to different binaural renderers — with and without personalized HRTFs — so you can hear your recordings the way average listeners might. The software includes Apple Music’s proprietary binaural renderer and can also be used to monitor Dolby binaural with the same generic HRTF used in Tidal and Amazon Music.

Going for the gold

Generally speaking, when a music label provides a track in Dolby Atmos to a streaming service like Apple Music or Tidal, it’s just a single version. This creates a dilemma for artists.

That version will likely have been mastered in a physical studio with an Atmos speaker configuration or by using software that virtualizes a similar space. Yet, as we discussed above, variables like HRTFs and the specific binaural renderers used can profoundly affect how these tracks sound when you listen to them on different platforms.

An artist might be tempted to tweak their mix so that it sounds best when streamed via Amazon Music and binaurally rendered with a generic HRTF — especially if they believe that’s how most of their audience will end up listening.

But that would compromise how it sounds on a full 7.1.4 Dolby Atmos sound system or even on Apple Music with a personalized HRTF.

Since most artists don’t have the time or the money to go back into the studio to remaster their tracks once they’ve been released, they need to make a decision. One option is to create a version that’s optimized for the best possible 7.1.4 listening experience and trust that, as companies like Apple and Amazon improve their binaural renderers and support for personalized HRTFs, the headphone experience will keep getting better. The other is to create a mix optimized for today’s headphone listeners, accepting that it falls short of what the track could sound like on a full speaker system.

Obviously, this decision will be entirely up to the artist and/or their label. However, I worry that programs like Apple’s spatial audio bounty will create an incentive for everyone in the music business to rush their spatial mixes simply to get the promised financial reward.

Still, we’re at the very start of an exciting era in audio. It will redefine how music gets made and how it sounds when we listen — with or without the use of headphones.
