    Here’s the graphic that is typically shown to illustrate the Uncanny Valley concept. The idea is this: human physical attributes can be endearing. We like human qualities when we see them attached to nonhuman things like robots; those qualities make the robots cute and relatable. However, as such things become almost, but not quite, human in appearance, the cuteness fades and the skin-crawling creepiness begins.

    The idea of an audio equivalent for the Uncanny Valley was suggested by Francis Rumsey during a presentation he gave in May 2014 at the Audio Engineering Society Chicago Section Meeting, which took place at Shure Incorporated in Niles, Illinois. Francis Rumsey holds a PhD in Audio Engineering from the University of Surrey and is currently the chair of the Technical Council of the Audio Engineering Society. His talk was entitled “Spatial Audio - Reconstructing Reality or Creating Illusion?”

    In his excellent 90-minute presentation (available for viewing in its entirety by AES members), Francis Rumsey explores the history of spatial audio in detail, examining the long-term effort to reach perfect simulations of natural acoustic spaces. He contrasts the divergent philosophies of top audio engineers, who approach the problem from a creative/artistic point of view, and acousticians, who want to solve it mathematically by means of a perfect wave field synthesis technique. Along the way, he asks whether spatial audio is really meant to recreate the best version of reality or instead to conjure up an entertaining artistic illusion. This leads him to the main thesis of his talk:

    Rumsey suggests that as spatial audio approaches the top-most levels of realism, it begins to stimulate a more critical part of the brain. Why does it do this? Because human listeners react very strongly to a quality we call “naturalness.” We have a great depth of experience in the way environmental sound behaves in the world. We know how it reflects and reverberates, how objects may obstruct the sound or change its perceived timbre. As a simulated aural environment approaches perfect spatial realism and timbral fidelity, our brains begin to compare the simulation to our own remembered experiences of real audio environments, and we start to react negatively to subtle defects in an otherwise perfect simulation. “It sounds almost real,” we think, “but something about it is strange. It’s just wrong, it doesn’t add up.”

    Take as an example this Oculus VR video demonstrating GenAudio’s AstoundSound 3D RTI positional 3D audio plugin. While the audio positioning is impressive, the demo does not incorporate any obstruction or occlusion effects (as the plugin makers readily admit). This makes the demo useful for examining the effects of subtle imperfections in an otherwise convincing 3D aural environment. The imperfections become especially pronounced when the gamer walks into the Tuscan house and the sound of the outdoor fountain continues without any of the muffling one would expect to hear once walls stand between listener and source.
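
    To make the missing effect concrete, here is a minimal sketch of one common way to approximate occlusion: roll off the high frequencies with a low-pass filter and reduce the level as more material sits between the listener and the source. This is not how AstoundSound or any particular engine does it. The function names, the 20 kHz and 800 Hz cutoffs, and the gain curve are all assumptions chosen for illustration.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate=48000):
    """Smooth a list of samples with a simple one-pole low-pass filter."""
    # Common approximation for the one-pole coefficient; a lower cutoff
    # gives a smaller alpha and therefore heavier smoothing (more muffling).
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

def apply_occlusion(samples, occlusion, sample_rate=48000):
    """Muffle a sound. occlusion: 0.0 = open air, 1.0 = fully behind a wall.

    The mapping below is hypothetical, not taken from any real plugin.
    """
    occlusion = min(max(occlusion, 0.0), 1.0)
    open_cutoff, closed_cutoff = 20000.0, 800.0  # assumed values, in Hz
    # Interpolate the cutoff on a logarithmic scale between the two extremes.
    cutoff = open_cutoff * (closed_cutoff / open_cutoff) ** occlusion
    gain = 1.0 - 0.5 * occlusion  # assumed: up to roughly 6 dB quieter
    filtered = one_pole_lowpass(samples, cutoff, sample_rate)
    return [gain * s for s in filtered]

def sine(freq_hz, n, sample_rate=48000):
    """Generate n samples of a unit-amplitude sine tone."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

if __name__ == "__main__":
    low, high = sine(200, 4800), sine(8000, 4800)
    for name, tone in (("200 Hz", low), ("8 kHz", high)):
        ratio = rms(apply_occlusion(tone, 1.0)) / rms(tone)
        print(f"{name}: level behind the wall is {ratio:.2f} of the original")
```

    Run as a script, this prints how much of a low tone and a high tone survives full occlusion. The high tone loses far more level than the low one, which is the muffled quality the fountain ought to take on once the listener steps indoors.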

    Rumsey concluded his talk with the observation that near-accurate may be worse than not particularly accurate. In other words, if it’s supposed to sound real, then it had better sound perfectly real. Otherwise, it might be better to opt for a stylized audio environment that exaggerates and heightens the world rather than faithfully reproducing it.

