Verticalized Voice AI: An Exciting Opportunity in 2024


Krishna K. Gupta | January 2, 2024

by Krishna Gupta & Cavin Mozarmi

We’ve been investing in applied AI voice companies since 2009, when we wrote the first check into Ginger (which has since merged with Headspace). As we approach 2024, we’re increasingly bullish on a particular applied AI opportunity: building conversational AI agents – particularly leveraging the voice channel – for specific verticalized use cases. In Ginger’s day, analysis of voice conversations was the key advance that could be leveraged; today, generative voice conversations are enabling an entirely different opportunity set. We want to invest in and partner with every such company.

We have proprietary expertise with this opportunity given my experience with Presto Automation (Nasdaq: PRST), one of the first and most immediately scalable applications of verticalized Voice AI. As Chairman and former CEO of Presto, I have seen the voice AI play develop with all its challenges and massive opportunity over the past 3 years, an opportunity that comes from the convergence of technological progress and macro-economic realities. Presto’s verticalized Voice AI application for drive-thru restaurants has shed light on the kinds of conversational AI opportunities that may exist in nearly any vertical that relies on customer engagement for transactions and/or support. It has also shed light on what is possible and what, despite all the hype, is not yet possible.

Below are some of our musings as we eagerly search for another 5 such applications to invest in, partner with, and help build over the next 2-3 years.


(1) AI can automate critical customer interactions in partnership with humans.

Customer interactions represent significant cost and drain on resources and yet are existentially valuable to get right. Most complex customer interactions cannot be fully automated anytime soon, but a set of applications is becoming possible thanks to advances in LLM technology. A combination of AI and human-in-the-loop (HITL) can now remove repetitive customer interactions from the enterprise payroll in a way that was previously impossible, thanks to the rapid development of language models, richer datasets, decreasing latency in voice agents, and better processing technology.

The level of HITL that’s required correlates with the complexity of the use case: scheduling customer appointments would be a simple use case, as the inputs and outputs are standardized, which lends itself well to automation. Medium-complexity applications include such examples as home service companies fielding requests where it’s necessary to iterate between the agent and customer to gather more complete, issue-specific information, or banks resolving financial and tax inquiries from their customers. These more complex scenarios will be more human-intensive until LLMs are sufficiently trained.

The most difficult use cases to automate, and thus the ones most likely to remain human-intensive for some time, are the bespoke conversations happening between two businesses, such as between providers and payers in healthcare or suppliers and contractors in industrial settings. Companies will still want to retain some human customer touchpoints, but they will be able to remove the more mechanical ones with the help of AI-powered applications. In the restaurant drive-thru application, the analogy is to remove the order-taker but keep the human who hands consumers their final orders and has a brief conversation that enables the customer to leave with positive sentiment and greater connection with the brand.

(2) AI can drive both material labor savings and revenue expansion.

The ROI for customers using verticalized conversational AI agents begins at labor savings, especially in an environment where wages are rising. To determine the potential value of using a conversational AI agent to automate a function, it’s necessary to look at the communication channels a business uses for a given use case as well as the average revenue per interaction. The higher the communication volume, the more total work there is to automate, and the more important that channel is to the business. Average revenue per interaction is also significant: if it’s high, as it is in home services, then the cost of every missed call or text (because a human isn’t always available to answer) is even more painful.
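To make that arithmetic concrete, here is a rough sketch of how one might size a single channel. It is a back-of-the-envelope illustration, not a model we or Presto actually use: the `Channel` fields, the `automation_rate` parameter, and every number below are hypothetical assumptions chosen only to show the calculation.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """One communication channel for a given use case (all fields are illustrative)."""
    name: str
    monthly_interactions: int            # inbound calls or texts per month
    missed_rate: float                   # fraction no human answers (0.0 to 1.0)
    avg_revenue_per_interaction: float   # average revenue when an interaction is handled
    labor_cost_per_interaction: float    # cost of a human handling one interaction


def missed_revenue(ch: Channel) -> float:
    """Revenue lost each month to interactions nobody answered."""
    return ch.monthly_interactions * ch.missed_rate * ch.avg_revenue_per_interaction


def labor_savings(ch: Channel, automation_rate: float) -> float:
    """Monthly labor cost removed if a share of answered interactions is automated."""
    answered = ch.monthly_interactions * (1 - ch.missed_rate)
    return answered * automation_rate * ch.labor_cost_per_interaction


# Hypothetical home services phone line; the figures are made up for illustration.
phone = Channel(
    name="phone",
    monthly_interactions=1000,
    missed_rate=0.25,
    avg_revenue_per_interaction=200.0,
    labor_cost_per_interaction=4.0,
)

print(missed_revenue(phone))      # 1000 * 0.25 * 200 = 50000.0
print(labor_savings(phone, 0.5))  # 750 * 0.5 * 4 = 1500.0
```

With these made-up inputs, the revenue lost to missed calls dwarfs the labor savings, which is the pattern to expect whenever average revenue per interaction is high.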

But the ROI extends beyond that, as the revenue upsell opportunity these agents offer can be quite meaningful. The delta between the status quo (often, minimum wage employees with little to no incentive to upsell or cross-sell) and the potential outcome with an intelligent, trained conversational agent system can be massive. A Voice AI conversation can optimize this process, as it can deliver personalized recommendations in real time and in a more empathetic manner. That is certainly the case in the restaurant drive-thru application. Further, beyond revenue upsell, the data a conversational AI agent captures while interacting with customers can often surface insights or opportunities the enterprise could not previously tap.

(3) Industry expertise is a competitive advantage for AI integration.

Integrating a proprietary conversational agent within a customer’s operations enables direct access to some of the most important parts of the backend, and this serves as an important source of defensibility. Voice conversational agents in healthcare often need to integrate with EHRs and likely multiple disparate legacy systems. In financial services, they must integrate with banking cores to automate more in-depth workflows. With Presto as an example in the restaurant industry, these elements range from a POS system to a digitalized “menu” of offerings to a customer CRM or loyalty system, etc. Getting these integrations right and enabling a conversation and/or transaction to take place in a smooth manner is completely non-trivial. Add in the go-to-market specialization required to integrate something as transformational as conversational AI, and one realizes how challenging it can be for a generalized company to enter a particular vertical against a vertical-focused competitor.

(4) The best AI today is augmented by humans.

The dance between machine and humans, which we view as the holistic AI solution, is critical to accuracy and thus customer adoption. As I’ve learned, state-of-the-art tech is *not* good enough to have a latency-free, edge case-covering, accurate conversation right away while also syncing into all the backend hooks required for particular verticalized use cases. As in most leading AI applications, human-in-the-loop is initially required to fine-tune the AI models, but it’s tricky to get right. Over time, the conversational agent will become less reliant on humans, even as customers continue to care almost entirely about the output. The companies that succeed here will win customers quickly, which then enables them to train their AI systems, which in turn allows them to scale their businesses in a way that improves margins. I’m keen to find companies that understand this balance and lean into it.

We’ve already made several verticalized voice AI investments, and we’re actively looking for more. As the AI transformation journey continues into 2024 and beyond, we look forward to building companies at the cutting edge of voice AI technology to generate real value across industries.