When Research Team Manager Ravish Mehra started the audio team at FRL four years ago, he envisioned creating a VR world where the virtual audio is perceptually indistinguishable from real-world audio. He knew the first-order research problems that he would have to tackle to achieve this future would be high-quality spatial audio and efficient room acoustics. Over the next few years, he started a large research effort to solve the spatial audio problem while looking for the right person to join his team to solve the room acoustics problem.
“Solving the room acoustics problem is extremely computationally expensive, and I knew that accurately simulating the acoustics of the environment would not be enough,” Mehra says. “Any approach we came up with would need to satisfy the tight computational and memory constraints imposed by real-time VR applications.”
A unique opportunity arose in summer 2017 when Carl Schissler finished his doctorate at the University of North Carolina at Chapel Hill on that very topic. Schissler had completed two summer internships on the FRL audio team with Mehra as his intern mentor and was a perfect fit for the open lead researcher role in room acoustics.
“When I started at Facebook Reality Labs last year, I was given the task of creating a system that could simulate all of these complex acoustics in real time,” Schissler explains. “I’ve wanted to create better audio for games since I was very young. Back then, I would modify my favorite games by adding reverb to the sound effects in an effort to make them more atmospheric. Now, years later, I’m thrilled to have the opportunity to work on this technology that has the potential to have a huge impact on sound quality in VR.”
The FRL audio team’s psychoacoustics group led by Research Science Manager Philip Robinson also played a key role in the project. Postdoctoral Research Scientist Sebastià V. Amengual performed experiments to determine which aspects of the acoustic simulation were most important to simulate accurately. With a solid psychoacoustic foundation, the FRL audio team was able to do perceptual evaluation of new audio technologies to inform future development.
Computational challenges
The biggest obstacle to realistic simulation of acoustics is the computational complexity involved. There are a number of existing simulation techniques based on numerical wave solvers or geometric algorithms, but none of them is efficient enough to run in real time on current hardware. A fast multicore CPU or GPU would be required to make previous approaches run fast enough, and even then they would only be able to simulate a handful of sources at a time. Add in a game engine doing all kinds of graphics, physics, AI, and scripting at the same time, and you can see how difficult it is to get the necessary amount of resources.
The typical way to sidestep this problem is to do a long precomputation to simulate the acoustic responses for every pair of listener and source locations. At runtime, the response for each source can be interpolated from that data and used to filter the source’s audio. In practice, this adds up to a huge amount of data for non-trivial scenes. Another drawback is that, since all of the acoustic responses are precomputed, there cannot be any dynamic scene elements that change the sound. This means that shutting a door won’t stop you from hearing a sound source, and destructible or user-created environments are totally out of the question.
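To make the precomputation trade-off concrete, here is a minimal Python sketch of the idea. It is an illustration under stated assumptions, not the method used by any particular engine: the function names, the simple linear blend between two stored responses, and the example numbers (1,000 sample positions, one-second responses at 48 kHz, 4 bytes per sample) are all hypothetical.

```python
def table_size_bytes(num_positions, response_samples, bytes_per_sample=4):
    # One stored response per (listener, source) pair, so storage grows
    # with the square of the number of sampled positions.
    return num_positions * num_positions * response_samples * bytes_per_sample


def interpolate_response(resp_a, resp_b, t):
    # Linear blend of two precomputed responses (an assumed, simplistic scheme).
    # t = 0 returns resp_a, t = 1 returns resp_b.
    return [(1.0 - t) * a + t * b for a, b in zip(resp_a, resp_b)]


def convolve(signal, response):
    # Direct convolution: filters the dry source audio with the response.
    out = [0.0] * (len(signal) + len(response) - 1)
    for i, s in enumerate(signal):
        for j, r in enumerate(response):
            out[i + j] += s * r
    return out


# Hypothetical budget: 1,000 positions, 48,000-sample responses, 4-byte samples.
size = table_size_bytes(1000, 48000)  # 192,000,000,000 bytes, about 192 GB

# Toy runtime step: blend two stored responses, then filter a short signal.
near = [1.0, 0.5, 0.0]
far = [0.0, 0.5, 1.0]
blended = interpolate_response(near, far, 0.25)
wet = convolve([1.0, 0.0], blended)
```

Because every response in the table is computed ahead of time, nothing in this lookup can react to a door closing or a wall being destroyed, which is the limitation described above.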
At FRL, our challenge was to develop an approach that was able to render high-quality audio for complex scenes while using as few compute and memory resources as possible. The bar was high: typical games may have hundreds of concurrent sound sources that need to be simulated, and the compute budget is extremely tight. Furthermore, the simulation needed to be dynamic so that it could enable the widest range of immersive audio experiences, unburdened by long precomputation times.
Sound innovations
To solve this challenge, Schissler spent almost a year perfecting the simulation engine. “I had to leverage every trick and optimization I could think of to build a system with the required capabilities,” he notes.
To efficiently compute the propagation of sound within a 3D environment, the researchers made use of an advanced ray tracing algorithm. Traditional acoustic ray tracing would require tracing many millions of rays per second, necessitating a large amount of computation. Optimizations developed by Schissler allowed the number of rays to be greatly reduced while maintaining high quality and enabling dynamic scene elements. The largest issue when using stochastic ray tracing is the presence of noise that can lead to audio artifacts. In order to deal with this, the researchers developed clever noise reduction algorithms to filter out the noise in the simulation results.
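The article does not describe the noise reduction algorithms themselves, so the following Python sketch only illustrates the general problem. It assumes a toy stand-in for a ray tracer, in which each ray reaches the listener with a fixed probability, and it uses an exponential moving average across frames as a generic smoothing technique. Both the function names and the smoothing choice are assumptions made for illustration, not the researchers' approach.

```python
import random
import statistics


def estimate_energy(num_rays, rng, hit_probability=0.3):
    # Toy stand-in for a ray tracer: each ray independently reaches the
    # listener with a fixed probability; the estimate is the fraction that do.
    hits = sum(1 for _ in range(num_rays) if rng.random() < hit_probability)
    return hits / num_rays


def smooth(estimates, alpha=0.2):
    # Exponential moving average over frames; a smaller alpha smooths more.
    smoothed = [estimates[0]]
    for value in estimates[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


rng = random.Random(0)  # fixed seed so the run is repeatable
raw = [estimate_energy(50, rng) for _ in range(200)]  # few rays, noisy frames
filtered = smooth(raw)

raw_var = statistics.pvariance(raw)
filtered_var = statistics.pvariance(filtered)
print(f"raw variance {raw_var:.5f}, smoothed variance {filtered_var:.5f}")
```

With only 50 rays per frame the raw estimates jump around from frame to frame, and averaging over time reduces that jitter. The cost of this simple filter is lag: after a sudden change in the scene, the smoothed value takes several frames to catch up.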
Another big problem came about when the number of sound sources in the scene grew large. In a naive implementation, the computation time would increase in proportion to the number of sources. One of the key developments that makes the new technology feasible is a perceptually driven, dynamic prioritization and source clustering system. By developing smart heuristics that are able to cluster unimportant or distant sources together, the researchers have been able to dramatically reduce the computation time in very complex scenes.
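As a rough illustration of prioritization and clustering, the Python sketch below keeps the highest-priority sources as individual emitters and merges the rest by location. The heuristics here are assumptions, since the article does not specify them: priority is taken to be gain divided by distance, and low-priority sources are merged when they fall in the same grid cell. The function name, parameters, and example scene are hypothetical.

```python
import math
from collections import defaultdict


def cluster_sources(sources, listener, max_individual=2, cell_size=10.0):
    """sources: list of (name, (x, y), gain) tuples."""

    def priority(src):
        _, (x, y), gain = src
        dist = math.hypot(x - listener[0], y - listener[1])
        return gain / max(dist, 1.0)  # louder and closer means more important

    ranked = sorted(sources, key=priority, reverse=True)
    individual = ranked[:max_individual]  # simulated one by one

    # Group the remaining sources by the grid cell they fall into.
    cells = defaultdict(list)
    for src in ranked[max_individual:]:
        _, (x, y), _ = src
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append(src)

    # Replace each group with one emitter at its gain-weighted centroid.
    clusters = []
    for members in cells.values():
        total = sum(gain for _, _, gain in members)
        cx = sum(pos[0] * gain for _, pos, gain in members) / total
        cy = sum(pos[1] * gain for _, pos, gain in members) / total
        clusters.append(([name for name, _, _ in members], (cx, cy), total))
    return individual, clusters


scene = [
    ("voice", (1.0, 0.0), 1.0),
    ("door", (3.0, 4.0), 1.0),
    ("bird_a", (52.0, 51.0), 0.5),
    ("bird_b", (54.0, 53.0), 0.5),
    ("fan", (-75.0, 2.0), 0.25),
]
individual, clusters = cluster_sources(scene, listener=(0.0, 0.0))
emitters_simulated = len(individual) + len(clusters)
```

In this toy scene the two nearby sources stay separate, the two distant birds collapse into a single emitter, and the fan forms its own cluster, so five sources are simulated as four emitters. In a scene with hundreds of sources, many of them far from the listener, the reduction would be correspondingly larger.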
When you hear it
Using the innovations developed at FRL, the researchers were able to meet the initial goals of the project and deliver a working prototype to the Oculus Audio SDK team led by Spatial Audio Tech Lead Robert Heitkamp. At OC5, Audio Design Manager Tom Smurdon and Software Engineer Pete Stirling presented this system in action. During their talk, Smurdon, a veteran of the game audio industry, gave his opinion on the prototype: “You will know when you’re next to a wall without even seeing anything. You can feel everything. It’s amazing. I’m super enthused and happy about where they are at right now.”
“When you hear a realistic audio simulation for the first time in VR, it’s almost uncanny how much it enhances immersion,” adds Schissler. “Authentic audio rendering can even synergistically help make visuals seem better.”