From Neural Signals to Speech

author: Mara Krause, 30.05.2026

What if an illness doesn’t have to determine what you can do and what you can’t? What if the connection of humans and technology does not just seem scary but can actually improve many lives?

In June 2025, a research group published the paper “An instantaneous voice-synthesis neuroprosthesis”. While the title might not sound too exciting, the paper shows the potential of science and new technology, which makes me very excited for the future and eager to be a part of it. This article is about their achievement: restoring communication for a man who has lost the ability to speak intelligibly.

 

What exactly was the goal of the research group?

The research group aimed to combine technology with the human body to restore speech for people who have lost it, for example through severe dysarthria (a motor speech disorder in which the muscles used for speaking no longer work properly). They emphasise that speech plays a huge role in expressing opinions and feelings and in participating in society. Their project focused on restoring speech for one participant with dysarthria, so that he could take part in everyday life more independently and with a better quality of life.

But what was their plan to restore speech?

The keyword here is brain-computer interface (BCI): a direct link between brain and computer. A computer measures brain activity and, in this case, translates it into audible speech. But that is easier said than done.

 

What exactly did they do?

The research group worked with a participant who is called T15 in the paper. At the time of the study, he was a 45-year-old man who had been living with amyotrophic lateral sclerosis (ALS) for five years. ALS is an incurable disease that attacks the nerve cells controlling voluntary muscle movements. Cognitive function usually stays intact, but patients lose the ability to make most movements. T15 also had severe dysarthria, which means that he could still vocalise but could not produce intelligible speech.

They surgically placed electrodes in specific areas of T15’s brain and then developed a computer system that translates the recorded brain activity into speech. These are basically the two steps of how a BCI works: measuring brain activity (through electrodes) and translating it (through software).

Afterwards, they tested the BCI with T15 in several research sessions that ran from day 25 to day 489 of the experiment.

 

Placing electrodes in the brain

Specifically, they placed 256 microelectrodes in three areas of the brain:

  • primary motor cortex: drives the physical movements of the tongue, lips, and jaw and sends signals directly to muscles
  • ventral premotor cortex: is involved in the preparation and planning of speech
  • middle precentral gyrus: organises and plans the rapid movements of the vocal tract and strings sounds together

 

The researchers chose those areas based on scans they did before placing the electrodes. In their paper, they wrote that “functional magnetic resonance imaging confirmed that T15 was left-hemisphere dominant for language”.

 

The electrodes are made of silicon, and their tips are coated with iridium oxide, a material that is well suited to recording the electrical activity of neurons.

The brain has over 80 billion neurons, all connected to different parts of the body and brain. They send messages via electrical signals (called action potentials) and thereby make every body part work together. Every time T15 wants to talk, the neurons in these specific regions fire electrical signals, but due to his illness, the signals never reach the muscles of the mouth. The electrodes, however, pick those signals up before they are lost.

 

The computer system to translate

Once the electrodes have recorded those millions of electrical signals, the next step is to send the data to a computer. This happens every millisecond.

To translate those millions of electrical signals into speech, the research group developed a complex pipeline, which they call a “real-time neural-decoding pipeline”:

The neural activity is sent to a Transformer-based decoder, which does the translation using artificial intelligence (AI). The AI program is similar to an LLM like ChatGPT because it runs on the same type of neural network. But instead of predicting the next word in a sentence, it predicts the next sound based on brain activity. This is why the system is so fast: it doesn’t have to wait for the user to finish the whole sentence but looks at the recent past (specifically the last 600 ms) to predict what T15 wants to say at that moment. It doesn’t try to predict whole words either, only acoustic features: for example, how high or low the voice should be, or what shape the mouth, tongue and throat would take. I found this connection particularly fascinating because the decoder uses the same AI architecture that many people use every day, but for a completely different purpose.
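To make the idea of “looking at the last 600 ms” more concrete, here is a minimal Python sketch of a sliding window over incoming neural feature frames. It is not the authors’ code: the 10 ms step between frames, the two fake channels, and the names SlidingWindow and toy_decoder are assumptions made for illustration, and toy_decoder simply averages the window where the real system runs a Transformer.

```python
from collections import deque

WINDOW_MS = 600  # history length mentioned in the article
STEP_MS = 10     # assumption: one new feature frame every 10 ms


class SlidingWindow:
    """Keeps only the most recent WINDOW_MS worth of feature frames."""

    def __init__(self, window_ms=WINDOW_MS, step_ms=STEP_MS):
        # A deque with maxlen silently drops the oldest frame when full.
        self.frames = deque(maxlen=window_ms // step_ms)

    def push(self, frame):
        """Add the newest frame and return the current window as a list."""
        self.frames.append(frame)
        return list(self.frames)


def toy_decoder(window):
    """Hypothetical stand-in for the Transformer: per-channel mean over the window."""
    n_channels = len(window[0])
    return [sum(frame[c] for frame in window) / len(window) for c in range(n_channels)]


window = SlidingWindow()
for t in range(100):
    frame = [float(t), float(2 * t)]  # two fake channels instead of 256 electrodes
    features = toy_decoder(window.push(frame))  # one output per incoming frame

print(len(window.frames), features)
```

The point of the sketch is that an output is produced for every new frame, using only data that has already arrived, so nothing has to wait for the end of a word or sentence.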

The final step is the vocoder. It takes the acoustic features and turns them into a human-sounding voice that is played through a speaker.

 

Research sessions to test BCI

After they had successfully implanted the electrodes and set up the computer system, the research group wanted to test T15’s new skills. They gave him many tasks, including open conversations, made-up pseudo-words and even singing short melodies.

 

What are the results?

T15 was able to speak again (through a computer). This is a huge achievement, and he said himself that it “made me feel happy and it felt like my real voice”.

But speech is complex: it involves a wide vocabulary and different speech melodies to emphasise words and ask questions. For example, if T15 were asked what he liked and answered “I like baking my family and my friends” in a monotone, you might be surprised or scared. But with pauses and emphasis, “I like baking, my family, and my friends” takes on a whole new meaning.

What this research group managed to do is give T15 great freedom of speech: he could emphasise words, ask questions, vary his speech melody and even sing. The authors write: “We instructed T15 to use the brain-to-voice BCI to say made-up pseudo-words and interjections (for example, ‘aah’, ‘eww’, ‘ooh’, ‘hmm’ and ‘shoo’)”. Besides handling pseudo-words, the BCI even successfully filtered out coughing, throat clearing, yawning and people talking in the background.

 

But how did they measure the success of their experiment aside from saying it worked well or not? Science works on comparable measurements, not subjective experiences.

To measure the relationship between the synthesised voice and the intended sound, they calculated the so-called Pearson correlation coefficient. A coefficient of 1 indicates a perfect positive linear relationship between intended and synthesised sound, while 0 indicates no linear relationship at all. The interjections (“hmm”…), for example, reached a Pearson correlation coefficient of 0.79 ± 0.08, which indicates a strong relationship. It means the BCI could mostly capture what T15 tried to interject.
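For readers who want to see how such a coefficient is calculated, here is a small Python sketch of the standard Pearson formula. The number lists are made-up toy values, not data from the paper, and the function name pearson is my own choice.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long number lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance term (not yet divided by n; the factor cancels out below).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / (spread_x * spread_y)


intended = [1.0, 2.0, 3.0, 4.0]  # toy values standing in for the intended sound

r_same_shape = pearson(intended, [2.0, 4.0, 6.0, 8.0])   # rises together
r_opposite = pearson(intended, [8.0, 6.0, 4.0, 2.0])     # moves the other way
r_unrelated = pearson(intended, [1.0, -1.0, -1.0, 1.0])  # no linear trend

print(r_same_shape, r_opposite, r_unrelated)
```

The three toy cases land at roughly 1, -1 and 0, which gives a feeling for where a value like 0.79 sits on that scale.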

To measure the BCI error rate in conversations, they asked several people to listen to T15 or to have conversations with him. For example, they ran transcript-matching tasks in which listeners were given six possible transcripts and had to choose the one that matched what T15 said. On average, the listeners matched the synthesised sentences with an accuracy of 100%.

 

Another measurement is the word error rate (WER): the listeners had to write down every word T15 said. The BCI improved T15’s WER from 96.43% to 43.75% in open conversations. A WER of 43.75% might not sound too exciting, but human conversation relies heavily on context, which the listeners were not given; they just had to write down what they understood.
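The word error rate is usually computed as the smallest number of word substitutions, deletions and insertions needed to turn the listener’s transcript into the reference sentence, divided by the number of words in the reference. The Python sketch below shows that common definition; the example sentences are my own, and the paper’s exact scoring procedure may differ in details such as text normalisation.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edits needed to turn the first j hypothesis words
    # into the reference words processed so far.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in transcript
            insertion = cur[j - 1] + 1   # extra word in transcript
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


said = "I like baking my family and my friends"
heard = "I like baking my family and friends"  # listener missed one word

wer = word_error_rate(said, heard)
print(f"{wer:.2%}")
```

Here one of eight reference words is missing, so the WER is 12.5%. A WER can also exceed 100% if the listener writes down many extra words.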

T15 even turned statements into questions with 90.5% accuracy and emphasised specific words with 95.7% accuracy.

 

What were the challenges?

There were several technical challenges they had to overcome.

  1. One major challenge was training the computer system, specifically the decoder, with ground-truth data. As mentioned above, the decoder is an AI model, and such a model needs ground-truth data (known correct outputs) to learn how to predict an outcome. But because T15 wasn’t able to speak intelligibly, there were no recordings of his intended speech to refer to. To solve this, the researchers used, among other things, text-to-speech (TTS): they showed T15 text on a screen, which he tried to say, so that his brain activity could be paired with synthetic speech of the same text.
  2. The research group had to retrain the decoder regularly because T15’s neural activity changed over time.
  3. In their paper they admitted themselves: “We note that participant T15, who had been severely dysarthric for several years at the time of this study, reported that he found it difficult to try to precisely modulate the tone, pitch and amplitude of his attempted speech. Therefore, we propose that using discrete classifiers to generate real-time modulated voice (which provides feedback to the participant that helps them to mentally hone in on how to modulate their voice) can provide an intermediate set of training data useful for training a single unified decoder capable of continuous control of phonemic and paralinguistic vocal features.”
  4. Another point is that T15 spoke significantly more slowly than healthy speakers. The delay between thought and sound was less than 10 ms, faster than a human blink, but his speech tempo still differs noticeably from that of healthy speakers. Additionally, a WER of 43.75% of course leaves room for improvement. The authors also noted that T15’s engagement and energy level influenced the quality of the synthesis.

 

What does the future hold?

A major goal is to see if people with different causes of speech loss could also regain speech with a BCI. The research group concludes that “it remains to be seen whether similar brain-to-voice performance will be replicated in additional participants, including those with other aetiologies of speech loss or people with late-stage ALS with complete paralysis and who are in a locked-in state.”

 

Another milestone in BCI would be implanting more electrodes to measure more brain activity. Improving the AI algorithm and technical setup would definitely also be an option for better results in the future.

“We predict that accuracy improvement is possible with further algorithm refinement and by increasing the number of electrodes”

 

All in all, this paper demonstrates the huge potential of BCIs to transform the lives of people with incurable illnesses, and I am sure the future will bring even better results, maybe one day with an accuracy near 100%.

 

What I am still wondering after reading the paper several times:
  • The paper says they conducted experiments from day 25 to 489, but what happened afterwards? Does T15 still have the electrodes, and is he learning to live with this new technology?
  • They said that T15 is left-hemisphere dominant for language. What would it look like if he wasn’t? Also, I wish they had gone deeper into how they chose the brain regions for the electrodes and how they implanted them. The actual brain scans and the reasons for choosing those three regions would provide a better picture of the whole process.
  • Are there current plans to test more patients and improve the experiment even further?
  • How much did it cost? BCI sounds like a process that could really help disabled people in the future, but only if it is accessible. Of course, the costs might be very high at the moment, but is it likely that BCI will become a common treatment in the future?

 

 

References

Wairagkar, M., Card, N.S., Singer-Clark, T. et al. An instantaneous voice-synthesis neuroprosthesis. Nature 644, 145–152 (2025). https://doi.org/10.1038/s41586-025-09127-3
