Here's how Microsoft's Azure AI creates realistic digital voices

AI can create realistic digital voices, but Microsoft discusses why it's important to use the technology responsibly.

What you need to know

Microsoft Azure AI technology is being used to create realistic voices for chatbots and digital experiences.
The tech uses recordings of real voices and deep learning to create realistic digital voices.
Microsoft discusses the importance of using the technology responsibly in its blog post.

Microsoft Azure AI technology is being used in new ways to help people interact with characters and chatbots. A blog post from Microsoft highlights how Azure tech is being used to create experiences ranging from interacting with Bugs Bunny in an AT&T Experience Store to conversing with the Flo chatbot from Progressive Insurance. In addition to sharing some examples of the tech in action, the post announces the general availability of Custom Neural Voice, part of the Text to Speech service in Azure Cognitive Services.

Many digital voices sound robotic and janky. Microsoft is trying to make this a thing of the past with neural text-to-speech technology. The technology uses recorded phrases and deep learning to create realistic digital voices.

Xuedong Huang, a Microsoft technical fellow and the chief technology officer of Azure AI Cognitive Services, explains how the process works:

The real technology breakthrough is the efficient use of deep learning to process the text to make sure the prosody and pronunciation is accurate. The prosody is what the tone and duration of each phoneme should be. We combine those in a seamless way so they can reproduce the voice that sounds like the original person.

If all of this sounds a bit familiar, you may have seen coverage of Microsoft's patent for similar technology. The patent made the news because the technology described in it could be used to create chatbots of dead people.

Microsoft is aware that technology like this could be used in creepy or dishonest ways, and it addresses transparency in its blog post. Access to the technology is limited, and customers must disclose how they will use it. Microsoft explains:

A conversation with Bugs Bunny might feel real, but everyone knows that it isn't – because Bugs is a fictional character. That's an important distinction, and one that Microsoft is careful to protect in every application of the technology. That's a key reason Custom Neural Voice is limited access, meaning interested customers must apply and be approved by Microsoft to use the technology. In this case, general availability means it is ready for production and available in more Azure cloud regions, not that it is available to the general public.

While many uses for Custom Neural Voice involve a fictional character, sometimes a customer wants the voice to be a real person, such as an author reading their own book. Even in those cases, it is important that people know the voice is synthetic, which is why Microsoft includes a disclosure requirement in its contract.

Another section of the blog post covers Microsoft's "commitment to responsibility" in regard to the technology:

As creators of this technology, we have an obligation to make sure it's used responsibly. We take responsible AI very seriously; it's one of our core tenets. And we're careful with the partners we work with in making sure they follow the guidelines.

Luzon Viral