Technologies That Transform: Reimagining the Future of Music #2

Singing in Someone Else’s Voice

February 7, 2024

Wouldn’t it be amazing to sing in the voice of your favorite vocalist? With “TransVox,” Yamaha’s AI voice transformation technology, such a dream could soon become a reality.

This is #2 of a three-part series.

#1 Exploring New Horizons for Guitars

TransVox converts vocal input into synthesized vocalizations that sound just like a specific person. This magical technology is Yamaha’s playful attempt to stimulate the creativity of users and provide an extraordinary singing experience.

Voice-Changing Technology Developed Over Decades

Yamaha has been developing singing voice synthesis technology for over 20 years. The best-known example of its application is VOCALOID™, software used for music production. Keijiro Saino, who is involved in the development of TransVox, explains that the technology stemmed from Yamaha’s VOCALOID research. “One of the unanticipated byproducts that came out of our research was the technology to instantly analyze singing voices. We thought it would be interesting to combine this with our existing singing voice synthesis technology.”

Keijiro Saino of the Research & Development Division.

While the typical voice changer adds effects to the user’s voice through signal processing, TransVox “transforms” voices through a completely different mechanism. First, the system analyzes the singer’s voice. Then, it strips the sound down to just the “linguistic content,” eliminating the unique qualities of the user’s voice such as pronunciation or intonation. The AI then uses this bare “linguistic content” and reproduces it in another person’s voice, which it learned beforehand. Saino elaborates. “Rather than processing the sound, it eliminates the characteristics of a person’s voice and simultaneously replaces it with that of someone else.”
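The three steps described above (analyze, reduce to linguistic content, resynthesize in a learned voice) can be pictured with a toy sketch. This is only a conceptual illustration, not Yamaha’s implementation: the names `VoiceFrame`, `ContentFrame`, `analyze`, `resynthesize`, and `transform` are hypothetical, the “timbre” field is a plain text label standing in for everything a trained model would learn about a voice, and passing pitch through unchanged is an assumption based on the article’s later remark that the singer’s pitch is reflected in the output.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VoiceFrame:
    """One short slice of a sung voice (hypothetical toy representation)."""
    phoneme: str     # what is being sung
    pitch_hz: float  # the melody the singer performs
    timbre: str      # label standing in for speaker-specific voice qualities


@dataclass(frozen=True)
class ContentFrame:
    """What remains after the singer's own voice qualities are removed."""
    phoneme: str
    pitch_hz: float  # assumption: pitch is carried through to the output


def analyze(frames: List[VoiceFrame]) -> List[ContentFrame]:
    # Steps 1 and 2: analyze the input and drop the singer's own timbre.
    return [ContentFrame(f.phoneme, f.pitch_hz) for f in frames]


def resynthesize(content: List[ContentFrame], target_timbre: str) -> List[VoiceFrame]:
    # Step 3: render the remaining content in the previously learned target voice.
    return [VoiceFrame(c.phoneme, c.pitch_hz, target_timbre) for c in content]


def transform(frames: List[VoiceFrame], target_timbre: str) -> List[VoiceFrame]:
    return resynthesize(analyze(frames), target_timbre)


user_input = [VoiceFrame("la", 220.0, "user"), VoiceFrame("la", 247.0, "user")]
output = transform(user_input, "target_singer")
print([(f.phoneme, f.pitch_hz, f.timbre) for f in output])
```

Note that nothing in the sketch filters or adds effects to the original sound. The input voice is discarded after analysis and a new one is generated, which is the distinction Saino draws from a conventional voice changer.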

Saino has been involved in speech synthesis research for more than 15 years, starting in his university days. When choosing a laboratory to work in, he saw a demonstration by a speech synthesis team. “You type in ‘Hello’ and the machine says ‘Hello’ without anyone’s vocal input. It was mesmerizing to think that they were developing a piece of technology that would be essential for human-machine interaction,” he explains. Many years later, Saino found himself at Yamaha, developing singing voice synthesis and voice transformation technologies such as TransVox.

The Mic That Won People’s Hearts

While TransVox has unlimited potential for practical applications, it was in 2022 that it got its first opportunity to shine. “A member of the marketing department, who saw our TransVox presentation, took a strong interest and started thinking about how we could turn this technology into an appealing service. That’s when we came up with the idea to make a karaoke microphone that lets you transform into a different person,” Saino says. The project quickly took shape, and Yamaha linked up with the well-known Japanese band Every Little Thing. The team started the development of the “Narikiri Microphone,” a mic that converts the user’s voice into that of the group’s vocalist, Kaori Mochida.

Because TransVox was still in the experimental stage, the team needed to overcome many hurdles to make it a viable service. One of the biggest challenges was the sound delay. At the time, it would take several seconds for the system to convert the sound. “The lag was just not conducive to a karaoke experience,” Saino explains. “We would try to eliminate the delay, but it was extremely difficult to do that without altering the quality or stability of the sound. It was quite a challenge to minimize the lag while also ensuring that the converted voice actually sounded like Ms. Mochida.”

Another issue was dealing with the noises surrounding the singer. Saino elaborates. “There are so many sounds inside a karaoke room. The mic was picking up all of the noise, including the background music and the people talking.” Contextual factors such as where the singer is standing in relation to the speaker can also greatly affect the quality of the experience. Saino and his team needed to find a way to make the microphones function properly in this noisier, more acoustically varied environment.

“It made us realize how conditioned we were to the ideal soundscape,” Saino confesses. Karaoke rooms are completely different from the perfectly soundproof studios in which the team usually conducts experiments. Of course, every member of the team knows that real life is more complex than the lab. “But no matter how much I understood it theoretically, it was the first time I really felt the difference firsthand,” he says. He and his team made countless visits to karaoke rooms to make technical adjustments to the system.

After the team overcame these challenges, the “Narikiri Microphone” finally took shape and was released in 2022. It was widely covered by the media and generated a lot of buzz on social media during the two months it was available for the public to experience.

YouTube video introducing Narikiri Microphone (Japanese)

Changing Voices With Unchanging Passion

What did Saino and his team gain from the development process of the Narikiri Microphone? He says he learned to always keep the user in mind. “No matter how interesting a technology is, it’s meaningless if it’s not exciting to use in practice,” he explains. “For us, the question was ‘how can we make the user feel like they’re really singing in Ms. Mochida’s voice?’ It’s not something quantifiable, so we needed to test it ourselves and trust our sensibilities.”

Saino adds that igniting the user’s creativity is also crucial to making the excitement last. “Some people think that this mic will magically make you a great singer, but that’s a misconception,” he explains. Because the tone and pitch of the user’s voice are reflected in the output, there’s a lot of room for users to adjust their singing so that they sound more like Ms. Mochida. In that sense, the Narikiri Microphone gives users space to make a creative effort and express the image of Ms. Mochida they have in their minds.

This unique experience strongly reflects Yamaha’s values about AI technology. “We believe that AI should exist to support human creativity,” Saino says. “What’s the point of trying to express yourself if AI does all the work for you?”

Rather than turning you into Ms. Mochida, the Narikiri Microphone gives you the chance to become her yourself. This is precisely why the mic has narikiri in its name — a Japanese word that means “to pretend to be exactly like someone else.”

Saino says that while TransVox has provided a completely new and intriguing experience for its users, this is only the beginning for the technology. “We still want to widen the range of vocal expression, as well as find many more ways to apply this technology to experiences and services.”

In January 2023, Yamaha participated in the YOXO FESTIVAL held in Yokohama, Japan, where the theme of the year was “Future Experiences.” Yamaha’s booth featured a new TransVox experience, allowing visitors to see what it would be like to be a professional voice actor. The technology was used to convert the visitors’ voices into emotional anime-like voices. Seeing their astonished faces gave Saino the confidence that, while Yamaha’s main focus is music, TransVox’s potential is not limited to singing experiences.

As we’ve seen, TransVox changes a person’s voice into that of a completely different person in real time. Meanwhile, Yamaha’s new project, Upcycling Guitars, which are made of unused materials, explores new possibilities for the guitar as an instrument. Both projects use technology to transform one thing into another, but they have more in common than meets the eye. In the next and final article of this series, we will delve into the “Key” that ties these stories together. Stay tuned.

(Interview date: July 2023)

KEIJIRO SAINO

Saino is a member of the Research & Development Division. Delving into the world of MIDI and DTM (desktop music, the Japanese term for computer-based music production) in high school inspired him to pursue research in sound theory. He went on to study speech synthesis in college, and gained practical experience with VOCALOID during his internship at Yamaha. He has been involved in research and development of voice synthesis technology since joining the company.

*Bio as of the time of the interview

Three-Part Series: Technologies That Transform: Reimagining the Future of Music

#1 Exploring New Horizons for Guitars

#2 Singing in Someone Else’s Voice

#3 “Seeds” of Technology Shaping a New Era of Music