Virtual assistants turn 16 this year, and you don't have to look too hard, or speak too loudly, to find them. In fact, there will be around 8 billion voice-based devices by 2023, more than the world's population today. From Amazon's Echo and Google Assistant to Apple's Siri, Samsung's Bixby and Microsoft's Cortana, billions of people around the world use their voices every day to schedule appointments, get directions, play music or get quick answers: all things that once required us to tediously type or write. Even Twitter recently announced that users can now post audio tweets of their inner musings.
And yet, despite the widespread adoption of voice-based devices in our personal lives, voice applications are nowhere near as pervasive at work as they are at home. One could argue that consumer technology leads the way in changing human behavior, and that the consumerization of the enterprise is always driven first by an expectation that work tools should be as convenient as personal technology. Take, for example, AOL Instant Messenger and Yahoo! Messenger, which launched in the late '90s. Only after they reached peak consumer adoption did people start expecting communication with coworkers to feel just like chatting with their friends; roughly fifteen years later, Slack was born.
Voice will be no different. As voice gradually becomes the standard way we interact with technology in our personal lives, we will see growing demand for similar technologies in our professional lives. Steve McLendon, news product lead for voice at Google, argues that "we see voice as the ubiquitous 'always with you' platform that allows you to do things in the real world." As a data point, Gartner predicts that 25 percent of workers will use some voice-based technology daily by 2021. That's not surprising given that the percentage of CIOs already using, or immediately planning to use, virtual customer assistants rose 10 points to 31 percent between 2018 and 2019 [Source: Gartner, Market Guide for Virtual Customer Assistants, Brian Manusama, Bern Elliot, Magnus Revang, Anthony Mullen, 11 July, 2019]. And while virtual customer assistants can process both text and voice, some technologists forecast that speech will become the default way people interact with their computers.
Recent technical advances also make this precisely the right time for enterprise voice applications to take off. Historically, most voice-based technologies have fallen short of expectations. That is because the typical way to understand voice has been to first collect data, then transcribe and label it to give it structure, and finally build a machine learning model on top. The hard part is acquiring a large enough volume of labeled data, which often becomes the bottleneck.
Most recently, there has been remarkable progress in natural language understanding. Language models from Google, Facebook and OpenAI are starting to outperform humans on a variety of basic benchmark tasks. These new "pre-trained" models, many of them open source, let users "fine-tune" them for their own needs using much smaller amounts of labeled data. OpenAI's latest model, GPT-3, represents material progress toward eliminating the need for fine-tuning entirely, since it can often perform a new task from just a few examples supplied in its prompt. While this research is still in its infancy, it has the potential to remove some of the rigidity of today's voice applications and open new ways of understanding and using voice.
Unlocking New Opportunities for Voice Applications with AI
Using voice as a platform for business applications is not a new idea, but efforts to date have been quite limited. Early applications focused mostly on recording and transcription, such as doctors dictating notes into electronic medical records (EMRs).
However, with recent advances in AI, voice analysis is moving from a nice-to-have feature to a system of record that can unlock revenue growth, cost savings and talent retention, to name just a few opportunities.
Here are some specific areas where the combination of voice and AI is proving to be a powerful duo.
- Recording & Analyzing Sales Calls. Much of today's selling happens over the phone. Gong, for example, has best-in-class AI that structures and mines this treasure trove of call data to surface business insights, not only about particular deals but also about market trends and competitive offerings.
- Real-time feedback in Customer Support Calls. While customer support calls have been "recorded for quality assurance" for decades, newer companies like LevelAI and ObserveAI use advanced AI to give customer support reps real-time guidance on resolving issues quickly.
- Tuning out Background Noise. Companies like KrispAI use AI to mute background noise in real time across communication apps. This is particularly handy when many people are working in the same space or when taking a call on the go.
- Remote team collaboration. It happens to the best of us: we read an email or a comment on a Notion doc and assume our colleague was being passive-aggressive. Imagine for a minute that instead of guessing your colleague's tone, you could simply listen to a voice note they left on the doc. Though the technology is still nascent, companies like Walkie let you leave digital voice notes for your colleagues across applications.
- Security via voice biometrics. As online security and identity authentication become increasingly important, some companies are turning to voice as a way to uniquely verify employees. Companies like AI Secure and VoiceKey provide platforms for authenticating into applications with your voice.
- Intelligent voice assistants for Industry. Consider Datch, which focuses on the manufacturing and energy industries. Its software lets workers capture their knowledge conversationally, instead of spending hours hunched over a desk staring at a screen and entering data by hand.
- Field Service Technicians. Deskless workers must rely on their voices to communicate; pulling out your phone to text on a construction site is neither safe nor efficient. Hardline, for example, is creating the "Slack for Voice" for field technicians. Using AI, general contractors could eventually gauge how much of a project is complete and predict delays based on these conversations.
- Healthcare Applications. From mental health to neurodegenerative disorders, understanding and analyzing voice can be a powerful tool for healthcare. For example, Voiceitt lets someone with a severe speech disorder talk into their phone, and the app reads out a transcript of what the person said. The company's speech recognition software is trained on the speaker's voice, so it gets better at understanding their particular speech irregularities and pronunciations. WinterLight Labs uses short speech samples to identify people at risk of dementia.
It's almost paradoxical that while we have a myriad of communication tools like email, Slack, WhatsApp, Facebook Messenger, SMS and others, the most efficient way to transfer information is still by voice. Take, for example, explaining a complicated issue or having a difficult conversation: speaking live is almost always the best medium. Language is, in fact, the original and most important tool setting our species apart from all others. It empowers our most complex capabilities and underlies the achievements of civilization. Now we are finally poised to unleash it in a form that machines can understand and act on, allowing entrepreneurs to create unimagined new solutions to our society's most pressing problems.