When sound waves reach the inner ear, the neurons there pick up the vibrations and alert the brain. Their signals encode a wealth of information that allows us to follow a conversation, recognize familiar voices, appreciate music, and quickly locate a ringing phone or a crying baby.
Neurons send signals by firing impulses, or action potentials – brief changes in voltage that travel along nerve fibers. Remarkably, auditory neurons can fire hundreds of these impulses per second and time them with extreme precision to match the oscillations of incoming sound waves.
Using powerful new models of human hearing, researchers at MIT’s McGovern Institute for Brain Research have determined that this precise timing is vital for some of the most important ways we make sense of auditory information, including recognizing voices and locating sounds.
Results of the open-access study, published December 4, show how machine learning can help neuroscientists understand how the brain uses auditory information in the real world. MIT professor and McGovern investigator Josh McDermott, who led the research, explains that his team’s models better equip researchers to study the consequences of different types of hearing loss and to develop more effective interventions.
The science of sound
The auditory nervous system’s signals are timed so precisely that researchers have long suspected this timing is essential to our perception of sound. Sound waves oscillate at rates that determine their pitch: low-pitched sounds travel in slow waves, while high-pitched sounds oscillate more frequently. The auditory nerve, which transmits information from the sound-detecting hair cells to the brain, generates electrical impulses that correspond to the frequency of these oscillations. “Action potentials in the auditory nerve fire at very specific times relative to the peaks of the stimulus wave,” explains McDermott, who is also associate chair of MIT’s Department of Brain and Cognitive Sciences.
This relationship, known as phase locking, requires neurons to time their spikes with sub-millisecond precision. However, scientists have not really known how much information these fleeting temporal patterns carry to the brain. McDermott says the question is not only scientifically intriguing but also has important clinical implications: “If you want to design a prosthesis that delivers electrical signals to the brain to replicate the function of the ear, it’s probably quite important to know what kinds of information carried by a normal ear actually matter,” he says.
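To make the idea of phase locking concrete, the sketch below simulates it in the simplest possible way: a nerve fiber that tends to fire near the peaks of a pure tone, with each spike jittered by a fraction of a millisecond. The tone frequency, firing probability, and jitter are illustrative assumptions, not parameters from the study.

```python
import numpy as np

# Minimal sketch of phase locking (illustrative only; not the study's auditory
# nerve model). A fiber tends to fire near the peaks of a pure tone, and each
# spike is jittered by a fraction of a millisecond.
rng = np.random.default_rng(0)

freq_hz = 440.0       # tone frequency (assumed for illustration)
duration_s = 0.05     # 50 ms of sound
jitter_s = 0.0002     # ~0.2 ms timing precision (assumed)
firing_prob = 0.3     # chance the fiber fires on any given cycle (assumed)

# One candidate spike per cycle, aligned to the waveform peaks.
peak_times = np.arange(0.0, duration_s, 1.0 / freq_hz)
fires = rng.random(peak_times.size) < firing_prob
spike_times = peak_times[fires] + rng.normal(0.0, jitter_s, fires.sum())

print(f"{spike_times.size} spikes, phase-locked to the {freq_hz:.0f} Hz cycle")
```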
The question was difficult to study experimentally: animal models cannot reveal much about how the human brain extracts structure from language or music, and the auditory nerve cannot be measured directly in humans. So McDermott and graduate student Mark Saddler PhD ’24 turned to artificial neural networks.
Artificial hearing
Neuroscientists have long used computational models to study how the brain might decode sensory information, but until recent advances in computing power and machine learning methods, these models were limited to simulating simple tasks. “One of the problems with previous models is that they are often much too good,” says Saddler, who now works at the Technical University of Denmark. For example, a computational model asked to identify the higher pitch in a pair of simple tones is likely to perform better than people given the same task. “It’s not something we do every day when we hear,” Saddler emphasizes. “The brain is not optimized to solve this very artificial task.” This mismatch limited the conclusions that could be drawn from the previous generation of models.
To learn more about the brain, Saddler and McDermott wanted to challenge a hearing model to do the things people use their hearing for in the real world, such as recognizing words and voices. This meant developing an artificial neural network to simulate the parts of the brain that receive input from the ear. The network was fed input from approximately 32,000 simulated sound-detecting sensory neurons and then optimized for a variety of real-world tasks.
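In rough outline, the setup can be pictured as a network that receives a simulated auditory nerve response, a grid of roughly 32,000 fibers by time, and learns to answer real-world questions about it, such as which word was spoken and by whom. The sketch below is hypothetical: the layer sizes, strides, and class counts are assumptions for illustration, not the published architecture.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the overall setup; not the published architecture.
N_FIBERS = 32_000   # simulated sound-detecting sensory neurons (from the article)
N_TIMEBINS = 200    # time bins of nerve activity (assumed)
N_WORDS = 800       # word-recognition classes (assumed)
N_VOICES = 400      # voice-recognition classes (assumed)

class AuditoryTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Treat the (fibers x time) nerve response like a one-channel image.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(64, 7), stride=(32, 4)), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=(5, 5), stride=(4, 4)), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten(),
        )
        self.word_head = nn.Linear(32 * 8 * 8, N_WORDS)    # "which word was spoken?"
        self.voice_head = nn.Linear(32 * 8 * 8, N_VOICES)  # "who spoke it?"

    def forward(self, nerve_response):
        h = self.features(nerve_response)
        return self.word_head(h), self.voice_head(h)

# One simulated nerve response: batch of 1, one channel, fibers x time.
x = torch.rand(1, 1, N_FIBERS, N_TIMEBINS)
word_logits, voice_logits = AuditoryTaskNet()(x)
print(word_logits.shape, voice_logits.shape)
```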
The researchers showed that their model closely replicated human hearing, better than any previous model of auditory behavior, McDermott says. In one test, the artificial neural network was asked to recognize words and voices in dozens of types of background noise, from the hum of an airplane cabin to enthusiastic applause. In every condition, the model behaved very much like humans.
However, when the team reduced the temporal precision of the spikes in the simulated ear, their model could no longer match humans’ ability to recognize voices or identify the locations of sounds. For example, although McDermott’s team had previously shown that people use pitch to help them recognize voices, the model revealed that this ability is lost without precisely timed signals. “You need the timing of the spikes to be quite precise to account for human behavior and to perform the tasks well,” Saddler says. This suggests that the brain uses precisely timed auditory signals because they support the practical aspects of hearing.
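One way to picture what degrading spike timing does (a sketch under assumed parameters, not the team’s actual procedure) is to smear simulated spike times by increasing amounts and track how quickly phase locking disappears, using vector strength, a standard measure of how tightly spikes align to the stimulus cycle.

```python
import numpy as np

# Illustrative sketch, not the study's manipulation: smear spike times and
# measure how much phase locking survives. Vector strength ranges from
# 1 (perfectly locked to the stimulus cycle) to 0 (no locking).
rng = np.random.default_rng(1)
freq_hz = 440.0                              # stimulus frequency (assumed)
peaks = np.arange(0.0, 0.5, 1.0 / freq_hz)   # one spike per cycle over 0.5 s

def vector_strength(spike_times, freq):
    phases = 2.0 * np.pi * freq * spike_times
    return np.abs(np.mean(np.exp(1j * phases)))

for smear_ms in [0.2, 1.0, 5.0]:
    degraded = peaks + rng.normal(0.0, smear_ms / 1000.0, peaks.size)
    print(f"spike timing smeared by {smear_ms} ms -> "
          f"vector strength {vector_strength(degraded, freq_hz):.2f}")
```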
The team’s findings show how artificial neural networks can help neuroscientists understand how information from the ear shapes our perception of the world, both when hearing is intact and when it is impaired. “The ability to relate auditory nerve firing patterns to behavior opens many doors,” McDermott says.
“Now that we have models linking neural responses in the ear to auditory behavior, we can ask, ‘If we simulate different types of hearing loss, what effect will that have on our auditory abilities?’” McDermott says. “That will help us better diagnose hearing loss, and we think there are also extensions that will help us design better hearing aids or cochlear implants.” For example, he says: “The cochlear implant has various limitations: it can do some things and not others. What is the best way to configure a cochlear implant to support hearing behavior? The models can basically tell you that.”