Podcasts - Season 2, Episode 2
Spatial Audio: Hear, There, Everywhere
Google’s new advancements in spatial audio
The development of spatial audio

Ever wanted to experience surround sound without the need for bulky speakers? With Google’s Pixel Buds Pro and the Spatializer Algorithm, you can. In a recent interview for episode 2 of the Made by Google Podcast, Lu Silverstein, a senior product manager at Google, discusses the development of spatial audio for Pixel Buds Pro.

AI in audio

According to Silverstein, dynamic spatial audio recreates the room you’re in and models your head and its interaction with 3D sounds for a more realistic experience. When you hit play on a movie, the audio track is decoded, uncompressed, and customized on your phone. The sound is then transmitted via Bluetooth to your Pixel Buds Pro, and your head position is detected to ensure the sound is spatialized correctly.
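The playback chain described above (decode and uncompress the 5.1 track, spatialize it on the phone, send it to the earbuds, and account for head position) can be sketched in a few lines of Python. This is an illustrative toy, not Google's Spatializer Algorithm: the stage names, the standard 5.1 speaker azimuths, and the crude level panning are all assumptions.

```python
import numpy as np

# Hypothetical stage names for the chain: decode -> spatialize -> transmit.
def decode(track):
    """Uncompress the 5.1 soundtrack into six PCM channels (stubbed)."""
    return [np.asarray(ch, dtype=float) for ch in track]

def spatialize(channels, head_yaw_deg=0.0):
    """Mix six channels down to a binaural stereo pair, compensating for
    the head orientation reported by the earbuds. A real spatializer uses
    HRTF filtering; this uses crude level panning just to show the flow."""
    azimuths = [-30, 30, 0, 0, -110, 110]  # L, R, C, LFE, Ls, Rs in degrees
    left = np.zeros_like(channels[0])
    right = np.zeros_like(channels[0])
    for ch, az in zip(channels, azimuths):
        theta = np.deg2rad(az - head_yaw_deg)  # keep sources fixed in the room
        gain_right = (np.sin(theta) + 1.0) / 2.0
        left += ch * (1.0 - gain_right)
        right += ch * gain_right
    return left, right

def transmit(left, right):
    """Stand-in for the Bluetooth link to the earbuds."""
    return np.stack([left, right])

def play(track, head_yaw_deg=0.0):
    return transmit(*spatialize(decode(track), head_yaw_deg))
```

With a symmetric 5.1 layout and the head facing forward, both ears receive equal energy; turning the head shifts the mix, which is the head-tracking compensation discussed later in the interview.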

Silverstein also delineates between AI and the more fixed algorithms behind the feature. While Pixel Buds Pro rely on a set of fixed, specialized algorithms rather than generative AI, those algorithms still do a lot of work decoding, spatializing, and placing the sound in real-time locations. Pixel Buds Pro recreate how sounds reach our ears and bounce around in our heads, providing an immersive audio experience.1

Learn more about spatial audio

Developing the dynamic spatial audio feature for Pixel Buds Pro required extensive coordination among multiple teams and some complex technical processes. Specially trained models help make the spatialized sound more realistic.

Tune in to the second episode of season 2 of the Made by Google Podcast to learn more about how Pixel Buds Pro and the Spatializer Algorithm deliver high-quality surround sound without the need for bulky speakers. 

Transcript

Rachid Finge (00:00): Lu, welcome to the Made by Google podcast. I could see in our internal directory you joined Google about two years ago. Tell us a little bit about your role and how you ended up there.

Lu Silverstein (00:10): Yeah, thanks, and thanks for having me on. So there's how I came to Google and why I came here. Previously, I was working at a streaming TV company, probably a lot of people know it, Roku, and it was a fun company to work at, but we weren't large enough to do much research and development. And so I was really interested in doing a little bit more of bringing research and development innovation into the product space. So when Google came to me, I was really excited to have that opportunity, and I moved over.

Rachid Finge (00:48): All right. And now you work on spatial audio. Maybe just for starters, explain to us what that is before I ask you, you know, if you have like, any special connection with the topic.

Lu Silverstein (00:58): Yeah. So I guess first of all, my role here is what's called a product manager for Pixel audio experiences. And so I cover everything that has to do with Pixel sound, which means anything that plays: it could be media, like music or movies, right? It could be ringtones, notifications. It also covers anything to do with the microphone, anything that comes in.

Rachid Finge (01:22): Oh, and is it only like the technical part, or are you also involved in, you know, maybe I hear a certain ringtone, were you involved in creating that ringtone or having people on the team create those, for example?

Lu Silverstein (01:33): Yeah, it's all of the above. So, you know, in the product management role, it's around working on anything from the sort of business aspects to the design, to the development, all the way through to making sure that it's launched and out there and doing well.

Rachid Finge (01:49): Perfect. Well, I have you to thank then, because I love the alarm sound on my Pixel phone. So I guess that's from your team when I wake up every morning. So thank you for that. But that's a clear description of your job as a senior product manager. And then today's episode is specifically about spatial audio, which sounds very fancy, and it actually is. But can you explain how you would define what that is?

Lu Silverstein (02:15): Yeah, sure. Spatial audio, I think the best way to think about it, the simple way, is that it's like surround sound for headphones, right? It's sort of what the name implies: spatial means it creates the sound in space, so it gives it more depth, kind of like in the real world. If you're standing outside and there is, you know, a bird chirping in one location and a train goes by in another, those are in space, and they're a certain distance from you in a location. So with spatial audio, we take a lot of what happens in real life and we actually simulate it using algorithms, to make it sound like those sounds are actually in those locations in space. But you've got your headphones on.

Rachid Finge (03:00): Yeah. Because you gotta apply some sort of magic, right? Because I know a lot of people spend maybe thousands and thousands of dollars putting multiple speakers in their living room so they can have this home cinema surround set. Yet we sort of create the same thing using just two earpieces. So could you tell us a little bit about how you do that? Because it seems impossible to me.

Lu Silverstein (03:27): There is a bit of magic, but it's all math magic in the end. Yeah. So the way that we do it, it all starts with recreating how we actually hear sounds. The way we hear sounds, it starts with a sound coming in and reflecting off of your outer ear, then going to the inner ear, and then your brain processes it. But there's a lot happening there, because you have two ears, right? And so when a sound comes toward your head, it actually reaches your left and right ears at different times. And so it bounces around differently. This sounds a little crazy, but some of the sound actually transfers through your head from one ear to the other.

Rachid Finge (04:18): Oh wow. It does?

Lu Silverstein (04:19): Yeah. Your head, it's kind of like, you know, you can hear sounds through a wall; well, there are sounds that go through your head. And so we recreate all of that using these algorithms to sort of fool your brain into thinking that that sound is coming from the outside and has actually reflected off of your outer ear and your inner ear. And then we play it back like that. And for that we have a special algorithm that Google has developed, called the Spatializer Algorithm, which runs on the phone. When you listen to a movie, it has to have a soundtrack with a certain number of channels. You hear about 5.1 surround sound; well, we take all of that 5.1 sound from a movie, we process it on the phone, and we play it back through your earbuds so it sounds like it's actually coming from the outside world, with all those things I mentioned earlier taken into consideration, and we have special models that we've trained to do that.
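The ear-by-ear effects Silverstein describes are commonly reproduced by convolving each channel with a head-related impulse response (HRIR), one per ear. A minimal sketch, assuming toy three-sample filters in place of real measured HRIRs:

```python
import numpy as np

def binaural_render(channel, hrir_left, hrir_right):
    """Convolve one speaker channel with the head-related impulse
    response (HRIR) for each ear. A real spatializer uses measured
    HRIRs per direction; these toy filters just delay and attenuate."""
    return np.convolve(channel, hrir_left), np.convolve(channel, hrir_right)

# Toy HRIRs for a source on the listener's left: the sound reaches the
# left ear first and at full level, then the right ear two samples later
# and quieter. Those interaural time and level differences are exactly
# the cues the brain uses to locate sounds.
hrir_left = np.array([1.0, 0.0, 0.0])
hrir_right = np.array([0.0, 0.0, 0.6])

left_ear, right_ear = binaural_render(np.array([1.0, 0.5]), hrir_left, hrir_right)
```

Playing `left_ear` and `right_ear` through the two earbuds is what makes a mono signal appear to come from a point in space.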

Rachid Finge (05:16): Right. So if I go back to my home cinema example: 5.1, if I'm not mistaken, means someone bought five speakers to put around the room, one in the middle, above the TV perhaps, and the .1 is the subwoofer. So if there's a bird coming from behind in the movie, I can hear it from behind because I actually placed a speaker behind me. But with the Pixel Buds and the Spatializer, you use all sorts of math to trick my brain into thinking that bird is behind me, even if it's not.

Lu Silverstein (05:50): Yeah, that's right. That's exactly it.

Rachid Finge (05:52): That's still crazy. But it's all math, so that's really cool. So now Lu, when I'm watching that home cinema movie, I can easily, I guess, move my head from left to right or turn my neck a little bit. And of course the sound will change, since the positions of my ears are changing. But how does that work when you have your Pixel Buds in? Because as I move my head, I move the Pixel Buds equally, right?

Lu Silverstein (06:20): Yeah. So now we're going to peel the onion back a little bit.

Rachid Finge (06:24): Right, yeah.

Lu Silverstein (06:25): There are essentially two different types of spatial audio available on Pixel. And what you're asking about is a version, and this is sort of the full-blown immersive experience, where the Pixel Buds Pro have these little sensors in them called accelerometers that measure the movement of your head. So, sorry, let me back up. The two different types of spatial audio are what we call dynamic, which tracks the movement of your head, and then we have another one called static, which is sort of a subset, a little bit simpler version of spatial audio. So let's start with the dynamic one, since you asked about that, and then I'll explain how the static one works. The dynamic one takes those little sensors in the Pixel Buds Pro, and it knows the position of your head when you're moving left and right. And so in the real world, let's keep that bird example. Let's say it's a hummingbird that's actually floating right there in front of you. Okay. When you move your head left and right, the hummingbird sound stays where it is in space.

Rachid Finge (07:29): Sure.

Lu Silverstein (07:30): So we have to recreate all of that, and we have to know when you're moving your head left and right, to literally sort of do the negative and keep the bird in the space where it is. And so that's what the Pixel Buds Pro can actually do with these little accelerometers. A lot of headphones don't have those accelerometers; you know, it's obviously an additional cost, and there are other things that you have to do from a software perspective. Most standard headphones don't have those accelerometers. So they have that simpler version, which is called static spatial audio. And so what happens, let's say again the bird is floating in one spot: if you don't have those little accelerometers, we don't know where your head is when you move left and right, so the bird actually gets stuck to your head and moves left and right with it. Not so much like the real world. So it's a little bit less immersive and lifelike, but it does still have the distance from you to the bird; that's the spatial part. And so if you don't move your head very much, you get a pretty good experience even in the static version, but the Pixel Buds Pro give you that extra kind of oomph with a more realistic, immersive experience.
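Silverstein's "do the negative" idea, subtracting the head rotation so a source stays put in the room, can be illustrated with a one-axis (yaw-only) toy function. Real head tracking works in three dimensions, and the function name here is hypothetical:

```python
def world_azimuth(source_az_deg, head_yaw_deg):
    """Dynamic spatial audio: subtract the head yaw reported by the
    earbuds' accelerometers so the rendered source stays fixed in the
    room. Static spatial audio skips this step, so sources turn with
    you. The result is wrapped into the range [-180, 180) degrees."""
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

# Hummingbird straight ahead (0 degrees). Turn your head 40 degrees to
# the right: dynamic rendering now places the bird 40 degrees to your
# left, while static rendering would leave it glued straight ahead.
dynamic = world_azimuth(0.0, 40.0)  # -40.0
static = 0.0
```

The renderer then feeds the compensated azimuth, rather than the raw one, into the per-ear filtering stage.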

Rachid Finge (08:38): I guess from a product management perspective, that must make it extra cool to work at Google, because, you know, you can work on all the algorithms, but you only get to dynamic spatial audio if you can work with the hardware team and tell them, hey, I need those sensors in the earbuds, otherwise we cannot make this happen. And it's maybe one of the few places where you can create that software and hardware combination.

Lu Silverstein (09:02): Yeah, that's right. So I probably should explain: there are a number of pieces in the chain, and it's exciting working here because we have to actually tie all those things together. If you just start from the earbuds, you then have to make those work with the phone, but then the phone has to also work with the content in the movie. So we have to work with the content providers as well. And this project was really complex in a lot of ways, in that we had to work with all the different parties involved, including the operating system on the phone. So there were many, many different groups that we had involved to pull those pieces together to make that simple experience for the user, where they just tap play on a movie and they have their Pixel Buds Pro in and it just works. There was actually a lot of work in the background to not only tie the groups together from an organizational, coordination standpoint, but technically we had to thread the needle of all the content. So the way for a user to think about this is that when they find a movie they want and they see that it says either Dolby on it or 5.1, and they click play, what happens at that point is the movie itself has a video track and an audio track. We take the audio track, and we see that it's that 5.1 that you mentioned earlier. So there are actually six channels, five plus one, and we have to decode it, because it's compressed to go over the network. We have to decode it and uncompress it, and then we have to spatialize it, and this is all being done on the phone. Then we have to send it, typically over Bluetooth, to the headset and play it back, and we have to figure out where the person's head position is. So there's a lot of communication going on back and forth between the phone and the earbuds at that point.

Rachid Finge (10:55): Now, I can't possibly ask you, you know, how you trick our ears because that's just well, rocket science I guess. But I am wondering, you know, there's so many things we do with AI at Google. What's the role of AI in, in creating the dynamic spatial audio? Is there a role for AI to play in there?

Lu Silverstein (11:14): Well, you know, there's a difference, I guess; we kind of delineate between AI and algorithms that are more fixed. And the difference is how you train or create those algorithms. AI is more generative, where you train it and then it can kind of go off and do things on its own. This is more of a case where we program it, but it's a pretty fixed set of algorithms that are in there. And so this is more of a specialized algorithm, but it still does a lot of work at the time the movie is playing. It still has to decode a lot of those things, like I was saying; it has to spatialize them and put them in these different locations in real time.

Rachid Finge (12:00): Okay. So we have the accelerometers in Pixel Buds Pro, what else is there hardware-wise to make sure that spatial audio, you know, sounds as intended?

Lu Silverstein (12:10): So there are a few things to make it sound like it's, you know, in a certain place in space, and actually a lot of it is done on the software side of things, on the earbuds. So what we do in software is we actually recreate the room that you're in. We make a lot of assumptions there, but you can imagine, if you're in a stadium and you yell, right, the sound is very different than if you're in a very, very small room. There are a lot of reflections in a small room, and so when you speak, it'll reflect off the left wall, the right wall. So we recreate a lot of that, and the reverberation that happens, to make it sound like the person is in a specific location. And typically in a movie there are different scenes, right? There are scenes that are inside, there are scenes that are outside, and so that helps recreate it through these algorithms that we've built into the Spatializer. The other thing that we do is we actually model essentially your head and how it interacts with three-dimensional sounds on the headphones. It's a sort of nerdy term; we call it the head-related transfer function, or HRTF, which is also built into the algorithm to make it sound more like 3D sound.
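The room modelling Silverstein mentions, with small rooms reflecting early and strongly, can be approximated by mixing delayed, attenuated copies of the dry signal back in. The reflection times and gains below are made-up values for illustration, and real spatializers add a diffuse reverberation tail on top:

```python
import numpy as np

def add_room(signal, reflections, sr=48000):
    """Tiny room model: add one delayed, attenuated copy of the dry
    signal per (delay_seconds, gain) reflection, e.g. one reflection
    per nearby wall. The output is padded so late reflections fit."""
    max_delay = max(int(round(d * sr)) for d, _ in reflections)
    dry = np.asarray(signal, dtype=float)
    out = np.concatenate([dry, np.zeros(max_delay)])
    for delay_s, gain in reflections:
        d = int(round(delay_s * sr))
        out[d:d + len(dry)] += gain * dry
    return out

# Two strong early reflections, as off the walls of a small room; a
# stadium would instead have much later, weaker reflections.
small_room = [(0.005, 0.6), (0.012, 0.4)]
impulse = np.zeros(100)
impulse[0] = 1.0
wet = add_room(impulse, small_room)
```

Running the same source through different reflection sets is what makes a line of dialogue sound "indoors" in one scene and "outdoors" in the next.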

Rachid Finge (13:29): All right. So you already mentioned 5.1 movies. What kind of content other than that might work with spatial audio, If I have my Pixel phone and Pixel Buds Pro?

Lu Silverstein (13:38): Yeah. So any kind of 5.1 content that is played back should work. So for example, games: if a game has 5.1, it would work with that. We focus primarily on the movie experience, though, because that's where you get the most immersion from that spatial audio effect. And we've done a lot of studies around it and seen how people really do find they can identify sounds even better with spatial audio, you know, specific sounds and where they're located, and they feel more immersed in it. But there is some other content that it would work on. We are working to expand that to additional content types in the coming years.

Rachid Finge (14:18): I was just also wondering when it comes to spatial audio, you mentioned those sensors in the earbuds but our help center page also says that you need to keep your phone steady. Why is that?

Lu Silverstein (14:32): When we do the tracking of positions, if the phone moves around a lot, we assume that we may need to lock the head position so that you don't feel disoriented. And so it helps to keep the phone steady, because the phone has the same kind of sensors; there are accelerometers that measure movement.
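A hypothetical sketch of that heuristic: read the phone's accelerometer, and if the phone itself is jostling around, lock head tracking so the sound field doesn't swing. The function name, threshold, and decision rule are illustrative guesses, not Google's actual logic:

```python
import numpy as np

def should_lock_head_tracking(phone_accel_samples, threshold=1.5):
    """Return True when the phone's accelerometer readings (an N x 3
    array in m/s^2) vary enough to suggest the phone is moving around,
    in which case the renderer would freeze the head position to avoid
    disorienting the listener. The threshold is an illustrative value."""
    magnitudes = np.linalg.norm(phone_accel_samples, axis=1)
    return bool(np.std(magnitudes) > threshold)

# A phone lying still reads only gravity, so tracking stays unlocked.
steady = np.tile([0.0, 0.0, 9.8], (50, 1))
assert not should_lock_head_tracking(steady)
```

When the check trips, the renderer falls back to something like the static mode described earlier until the phone settles.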

Rachid Finge (14:56): Yeah, those usually make sure that when you tilt the phone right from portrait to landscape that the screen rotates I guess, right?

Lu Silverstein (15:02): Yeah, exactly. Yep. And so we want to make sure that you have the best experience, and that if you're moving around a lot, you're also not having orientation issues with the earbuds.

Rachid Finge (15:14): Right, okay. So that's basically to temporarily pause the tricking of the ear, in order to make sure that, you know, I don't get disoriented, or maybe a little bit of nausea or something like that.

Lu Silverstein (15:26): Yeah, exactly.

Rachid Finge (15:27): Okay. Well thanks for that. I'm really happy you thought about that. So what's next for spatial audio on Pixel or maybe even more widely when it comes to our products?

Lu Silverstein (15:38): This is the fun part about working here: we can imagine a lot of things that could be done with it. These are not definitive plans that we have, but I can tell you, just from a personal perspective, what possibilities I'm excited about. In the real world, when you think about sounds in space, it helps you get oriented. So you can think of things like even what we're doing right now, even a podcast: you could hear people in different locations, especially if you have multiple people, or even things like newscasts where there's a panel of people and you can tell where people are. Some people do a lot of conference calls, and so you can imagine hearing people in different locations; it feels much more realistic, and you can kind of identify people and who's speaking. So there's a lot of different directions we can go. You know, we're working, like I said, on figuring out where to expand next. We just launched spatial audio at the beginning of this year, and we're really waiting to hear from users and from our customers what kind of feedback they have, what they really enjoy, and what they're interested in as well.

Rachid Finge (17:01): Alright. Sounds like: stay tuned and give feedback. So that will help you and your team decide where to go next with spatial audio. Wonderful. So we always close every episode of the Made by Google Podcast with a top tip from our guest for our listeners. So Lu, give us a tip on something that our listeners and users of Pixel devices can do to get the most out of spatial audio.

Lu Silverstein (17:26): Yeah, this is a good one. One of the things that's complicated with spatial audio, I think, for our customers is sometimes knowing whether you have everything you need. I mentioned all those different pieces of the chain that you have to have: you have to have the right piece of content, you have to have the phone working right, you have to have the earbuds. Here's a simple way to check that you're getting all of that. If you have your Pixel Buds Pro in and you have your Pixel 6 Pro, 7, or 7 Pro, and you want to know if you're getting spatial audio, the dynamic kind: when people are talking in dialogue in the movie, you can move your head left and right and see if their voices stay centered even though you're moving your head back and forth. That's a really quick, simple way to check that you're getting the spatial audio and that it's working properly. Because it's sort of like the analogy of when you go into a retail store where the TVs are: you can see the difference between the TVs on the wall there, but when you get home you really don't know what your TV looks like compared to the other ones. This is the quick check: is everything working? Do I have a difference between spatial audio on and spatial audio off?

Rachid Finge (18:34): That's a good one, and well, I have no plans tonight, so I'm definitely going to try that and listen to some content in 5.1 with my Pixel Buds Pro. Lu, thank you so much for giving us an explainer on what spatial audio is, and looking forward to seeing what you come up with next.

Lu Silverstein (18:50): Yeah. Thank you. Thanks for having me on.

  1. To prevent possible hearing damage, avoid listening at high volume for prolonged periods of time.