Having machines turn text into speech is nothing new.
Professor Stephen Hawking communicated with a computerized voice for many years, and by now, we’re used to our GPS devices or smart speakers asking questions and responding to our queries.
What is different these days is that the quality of synthesized speech is improving, thanks to several companies using AI to create voice skins that give enterprises and content creators more options for turning text into speech.
LOVO, an AI voice and synthetic speech startup, offers a voiceover API that turns text into speech in real time, drawing on a “voice library” of 200+ human-like voices in 33 languages. Users can also clone their own voices to create their own skins, simply by reading 15 minutes of a script.
LOVO recently announced the close of a $4.5 million pre-Series A round, led by South Korea’s Kakao Entertainment. Here is my full conversation with Tom Lee, co-founder and COO of LOVO, including a demo.
What Is AI Speech Synthesis?
Speech synthesis is simply the computer-generated production of audible human words.
Traditional text-to-speech robotic voices you hear on software or hardware products like Amazon Echo, Google Home, your GPS, or your ebook reader are fast and cheap for companies to create, but they can also be unoriginal and unrealistic.
Artificial intelligence or AI voice operates a little differently. AI voice uses deep learning to create higher-quality synthetic speech that more accurately mimics the pitch, tone, and pace of a real human voice.
For example, if you want to use LOVO AI to generate synthetic speech, you upload the script that you want to turn into audio content, then choose one of the voices in the library based on language, style, and character. With the click of a button, LOVO turns your script into audio that sounds pretty lifelike.
You can also clone your own voice by reading a short script, and LOVO will generate a custom voice skin you can use over and over again for videos, audiobooks, or anything else that requires voiceover.
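The script-to-audio workflow described above (pick a voice by language, style, and character, then submit a script) can be sketched roughly as follows. This is only an illustration: the `Voice` fields, the sample library, and the request payload are hypothetical and are not LOVO's actual API or data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    """One entry in a hypothetical voice library (not LOVO's real schema)."""
    voice_id: str
    language: str
    style: str
    character: str


# Hypothetical sample library; a real one would come from the vendor.
VOICE_LIBRARY = [
    Voice("v1", "en", "narration", "warm"),
    Voice("v2", "en", "advertising", "energetic"),
    Voice("v3", "ko", "narration", "calm"),
]


def find_voices(library, language=None, style=None, character=None):
    """Return the voices that match every filter that was supplied."""
    wanted = {"language": language, "style": style, "character": character}
    return [
        voice for voice in library
        if all(value is None or getattr(voice, key) == value
               for key, value in wanted.items())
    ]


def build_request(script, voice):
    """Assemble a hypothetical text-to-speech request payload."""
    if not script.strip():
        raise ValueError("script must not be empty")
    return {"text": script, "voice_id": voice.voice_id, "language": voice.language}


if __name__ == "__main__":
    matches = find_voices(VOICE_LIBRARY, language="en", style="narration")
    print(build_request("Welcome to the show.", matches[0]))
```

In a real integration, the payload would be sent to the provider's text-to-speech endpoint and the response would contain the generated audio; those details depend on the vendor and are left out here.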
Here’s a side-by-side comparison of original voices and voice clones:
Will AI voice technology replace voiceover professionals? Tom Lee, Co-founder and COO of LOVO, says no.
“I believe that isn’t going to happen. If you think about how humans and how AIs work, we can complement each other. As a voice actor, you can only do 6 or 7 hours of work a day. You can’t work 24/7, and you want to focus your energy on the most important gigs, or maybe you want to have a day job, and then you want your AI voice to make money while you sleep. You can record once with us, then take the revenue shares. One of our most famous voices is raking in a couple of grand a month without doing any work.”
The Many Potential Uses of Synthetic Speech
AI voice has a myriad of use cases, including:
Translation: Papercup is using AI voice to translate videos by generating voices that sound like the original speaker.
Video or audio ads: You can upload a script and create an ad without the added expense and time involved in hiring a voiceover artist. Descript has a collaborative audio/video editor that works just like a regular Word document.
E-learning (for kids, or for corporate training): Teachers and trainers will be able to make written materials more accessible for different types of learners with the help of AI voice automation.
Augmented reality and virtual reality: With the AR and VR markets exploding right now, there is a huge need for realistic, authentic human voices for apps and websites.
The global text-to-speech (TTS) market is estimated to reach $5.0 billion by 2026, according to marketsandmarkets.com – so the sky’s the limit for this exciting new technology.