Interactive voice assistants: Are we there yet?
We take a look at the current iterations of voice assistants. Which ones are leading the charge on the way to truly conversational experiences?
Long foreseen in film and fiction, the voice-to-voice conversational bot is drawing near. Siri, Alexa, and Google Assistant have all been in our lives for years. While they may be good at performing tasks and answering questions, they have not quite become the conversational companions science fiction promised. The introduction of generative AI may finally give voice assistants the push they need to realize their full potential.
Still, as with everything related to AI, there are risks involved and outstanding questions about whether it can truly deliver on the hype. We explore which voice assistants are closest to becoming conversational and what hurdles remain.
What would a conversational voice assistant sound like?
For much of their history, voice assistants have only been able to answer simple questions (if you’re lucky) and complete simple tasks like setting a timer or playing a song. Even these tasks required users to phrase requests precisely, and questions often just surfaced a list of links rather than an answer. Want to ask a follow-up question to get more information in context? Forget it.
To become more conversational — and, by extension, useful — voice assistants will need a few key capabilities: memory, contextual awareness, and fast response times. For most of their history, voice assistants have treated each question separately rather than drawing on the context provided by previous questions.
Speaking of context, contextual awareness is also crucial: it lets users speak naturally to voice assistants without having to be hyper-specific about what they need the assistant to do. And, importantly, voice assistants must do all of this quickly.
The next question is whether any of the major tech companies are close to achieving this level of conversationality.
Which companies are close to having a conversational voice assistant?
Earlier this year, Apple announced it would bring Apple Intelligence to Siri, using AI and large language models (LLMs) to give the assistant the ability to understand context. What does that look like in the real world? The Verge explains, “Where the current Siri needs explicit instructions on what to do and how to do it, Apple promises that this new version will let you say something like, ‘Siri, what time does Mom’s flight land?’ and the assistant will know to look through your Mail and Messages and pull out the information.”
Then in December, with 2024 nearly in the rearview mirror, Siri received a second round of enhancements in the iOS 18.2 update, which further improved the voice assistant and integrated it with ChatGPT. While the update offers new features such as visual intelligence and the ability to create “genmojis,” conversationality still seems a ways off.
Nevertheless, IBM sees reason to believe that other voice assistants are taking a major leap forward: “Google’s recent launch of Gemini Live for Android users marks a significant milestone in this AI marathon, closely following OpenAI’s development of ChatGPT’s Advanced Voice Mode. These next-generation voice assistants represent a leap forward from their predecessors like Apple’s Siri and Amazon’s Alexa.” Google has also added the ability to ask Google Lens questions out loud about what you are seeing.
OpenAI comes close
Thus far, it seems like OpenAI is on track to be the first to fulfill the expectations science fiction set for voice assistants. The International News Media Association (INMA) says that OpenAI’s voice assistant — you can choose from a number of voices with different names — hits the mark with an interactive voice that allows you to interrupt and react to what is being said. In an impressive, though rather whimsical, demo, INMA showed off the voice assistant’s ability to change emotions, speak in a whisper, or engage in multiple languages.
Privacy is still a concern for AI assistants
Voice assistants enabled by AI and LLMs may be making big strides, but as with generative AI on the whole, many questions remain unanswered. Stephen Kowski, Field CTO at SlashNext Email Security+, told IBM, “As AI voice assistants become more integrated, concerns arise around data collection, storage and potential misuse of personal information. There are also ethical considerations regarding consent, transparency about AI interactions and the potential for manipulation or misinformation.”
In fact, Apple’s well-known concerns over privacy may be part of what is slowing the company down. As The Verge reports, many Apple devices like the HomePod and Apple Watch “likely don’t have enough processing power to run generative models, many of which Apple wants to operate locally for privacy purposes….” So, while other companies are adding AI-driven features to old devices, Apple’s latest assistant updates are only available on newer smartphones.
Meanwhile, generative AI itself is still working out the kinks. As TechCrunch points out, it still has a habit of “hallucinating.” Just as with cybersecurity and gaming, the jury is still out on whether AI is delivering on its promises.