From robotic to real: brief overview of AI voiceover technology
Businesses across various industries are increasingly embracing AI technology. For instance, integrating AI chatbots into customer service has revolutionized how companies interact with customers, significantly boosting customer response rates.
But can AI voiceovers drive business growth in the same way? Despite their efficiency, they can lack the warmth and emotion of a human voice, which makes for a poor viewer experience.
Most AI voice generators are trained on extensive datasets of recordings from professional voice actors, learning speech patterns like intonation, pace, and accents. Using text-to-speech (TTS) technology, AI then generates the desired voice, whether it's an energized female US commentator or a UK male with a deep soothing voice.
But why do some AI voices sound great while others fall flat? The issue is that TTS alone can make voices sound flat and robotic. That's where natural language processing (NLP) comes in. NLP enhances AI's ability to mimic human tone, rhythm, and voice fluctuations, making the voice sound highly (and sometimes unbelievably) natural.
Unlocking the potential of AI voiceovers: an in-depth analysis by the ELEKS team
Voice generators, AI voiceovers, synthetic voices: whatever you call them, you probably hear them every day in narrations for commercials and corporate videos. Many companies also use AI voices for instructional content like e-learning courses and video tutorials.
But despite their efficiency, AI voices can undermine a brand. If viewers detect a robotic tone, they might instantly dismiss the voiced content as generic or untrustworthy.
So are there any AI voices on the market today that are advanced enough to avoid these pitfalls? The ELEKS Information Development Office conducted an in-depth analysis of over 100 voices across 20+ popular AI voiceover tools.
During our research, we categorised these tools into three groups based on how human-like their voices sounded.
AI voice quality | Score | Description |
---|---|---|
Low-quality AI voices—6 tools | 1–4 out of 10 | These tools attempted to add emphasis and differentiate between sentences, questions, and exclamations. However, the robotic tone was unmistakable. Phrases like “I’m having an awesome day” or “I need to go to the hospital” lacked the emotional cadence naturally present in a human voice. |
Medium-quality AI voices—12 tools | 5–7 out of 10 | These tools successfully differentiated between sentences, questions, and exclamations and attempted to emphasise words naturally. But their tone was too even, lacking the dynamic highs and lows needed to convey genuine emotion or curiosity. |
High-quality AI voices—3 tools | 8–10 out of 10 | These tools skilfully identified emphatic words such as “really,” “never,” and “great,” effectively conveying excitement in one sentence and fear in the next. They sounded almost human-like, though not completely indistinguishable, a point we'll explore further. We also considered the reliability of these tools, which proved to be a crucial factor, as discussed in the next section. |
Which AI tools offer the best voice quality?
In our research, we focused on AI voices for promotional or explainer videos, selecting tools that allow commercial use. We excluded text-to-speech tools designed for personal use, like those for reading web content or books aloud, as well as AI voices intended for TV broadcasts, audiobooks, or meditation videos. Additionally, since we had a pre-written script, we did not explore any script generation tools.
To streamline our analysis, we concentrated on male English voices from the US and UK, selecting from a pool of over 100 options.
Below is a list of the AI tools we evaluated, including our top-performing voices from each tool.
Quality tier | Tool | Top-performing voices |
---|---|---|
High | ElevenLabs | Brian, Jeremy, Liam, Paul, Tyler Kyrk |
High | NaturalReader (Commercial version) | Arnold, Jeremy, Jon |
High | Play.ht | William |
Medium | Aidocmaker.com | Good Normal |
Medium | Audiate | Conversational Masculine – Davis Chat, Guy Default, Ryan, Tony Friendly |
Medium | Cohesive | Adam |
Medium | Descript | Malcolm |
Medium | Lovo | Marcus, Shawn |
Medium | Murf | Finn, Freddie |
Medium | Music Radio Creative | Echo, Mike |
Medium | Speechify | Evan, Guy Friendly, Liam, Nate |
Medium | Revoicer | Andrew, Caleb, Grayson |
Medium | Synthesys | Ian |
Medium | Synthesia | Newscaster, Professional |
Medium | Voicebooking | David |
Low | Amazon Polly | |
Low | Listnr | |
Low | Respeecher | |
Low | Speechelo | |
Low | WellSaid Labs | |
Low | Vidnoz | |
Curious to see for yourself? Here’s a compilation of all AI voices we rated as high-quality ones.
In our ranking, we placed ElevenLabs' voices at the top due to their unmatched reliability. The Jeremy and Liam voices maintain consistent quality in both the free and paid versions and have demonstrated this stability for over six months.
At first, we assumed all tools would offer this level of dependability. However, we were disappointed by another tool that also offers the Jeremy voice. While the preview sounded really good, the paid version failed to match the quality offered by ElevenLabs.
How close can AI get to sounding like a voice actor?
As technology advances, the line between AI-generated and human voices is blurring. This development holds significant promise for video creators.
Despite the progress, AI voiceovers, while no longer flat or unemotional, still often fall short compared to their human counterparts. With a human voiceover, you can almost "hear" the speaker's body language – the raised eyebrows, hand gestures, and overall energy. Without a physical presence, AI voices just can't capture that same depth and richness.
The AI voice did a fantastic job—and at lightning speed for less than a dollar (for the record, we used ElevenLabs' Jeremy voice).
However, there's a noticeable "agreeableness" to it. While it's far from being flat or monotonous, its smoothness can feel somewhat unnatural. To draw a poetic comparison, a human voice is akin to a river with its natural twists, shallows, and dynamic rhythm, whereas an AI voiceover flows like an artificial canal—direct and unvarying.
The difference is evident for those with sharp ears, so keep reading for some thoughts on when it's worth investing in a human voice.
What's the cost of a good AI voiceover?
With the quality of synthetic speech catching up to, and in some ways surpassing, human voiceover, many find that the balance has tipped in AI's favour. Human voices can be more engaging, but this comes at a huge, often prohibitive, cost.
For example, when it comes to e-learning, GVAA (Global Voice Acting Academy) advises voice actors to charge around $0.2–0.35 per word, which translates to about $35–50 per minute.
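As a rough sanity check on that conversion, the sketch below turns a per-word rate into a per-minute rate. The narration pace of 150 words per minute is an assumption made here for illustration, not a GVAA figure, and `per_minute_rate` is a hypothetical helper name.

```python
def per_minute_rate(rate_per_word: float, words_per_minute: float = 150.0) -> float:
    """Convert a per-word voiceover rate into an approximate per-minute rate.

    words_per_minute is an assumed narration pace, not a figure from the article.
    """
    return rate_per_word * words_per_minute


# Per-word range quoted above for e-learning work.
for rate in (0.20, 0.35):
    print(f"${rate:.2f}/word ~ ${per_minute_rate(rate):.2f}/minute")
```

At the assumed pace, the result lands in the same ballpark as the per-minute range quoted above. A faster or slower read shifts it accordingly.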
In the video that we shared above:
- The voiceover sourced from Fiverr would cost around $15 per minute. Alternatively, posting a voiceover request on Upwork with a budget of $2–$3 per minute for an ongoing collaboration can attract a decent number of applicants. Here's an example of a video on philosophy that could use such a voiceover.
- The AI-generated voiceover in the second part of the video—and that is the best one we found out of over 100 voices in 20 different tools—would cost $0.2–3.5 per minute. The exact cost depends on the tool you choose and the subscription cost, which in turn depends on the number of minutes you need to record per month.
Our top-rated tools | Free version limits | Monthly pricing (at the time of publication) |
---|---|---|
ElevenLabs. Of our 3 top picks, this is the one we've tested thoroughly and adore the most. Fair price, at least 5 very good voices. | 10K characters per month (~10 mins) | $5/month: 30,000 characters (~30 mins); $22/month: 100,000 characters (~120 mins); $99/month: 500,000 characters (~600 mins); $330/month: 2,000,000 characters (~2400 mins) |
NaturalReader. Seems to be OK as well, based on the trial version (we didn't test the paid one). 4 very good voices and the best free version limits, but a small monthly limit (300,000 characters). | 5K characters daily (~5 mins) | $99/month: 300,000 characters (~360 mins) |
Play.ht. Seems to be OK as well, based on the trial version (we didn't test the paid one). 1 very good voice, also good pricing. | 12.5K characters per month (~12.5 mins) | $39/month: 250,000 characters (~330 mins); $99/month: unlimited |
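To compare these plans on a per-minute basis, here is a minimal sketch that uses only the prices and approximate minute allowances from the table above. The `Plan` class and the helper names are hypothetical. The per-minute figures are best-case values that assume you use the full monthly allowance. The Play.ht unlimited plan is left out because it has no minute figure to divide by.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    tool: str
    monthly_price_usd: float
    approx_minutes: int  # approximate monthly allowance, as listed in the table

    @property
    def cost_per_minute(self) -> float:
        """Best-case cost per minute if the whole allowance is used."""
        return self.monthly_price_usd / self.approx_minutes

    def effective_cost(self, minutes_used: float) -> float:
        """Cost per minute actually recorded (must fit within the allowance)."""
        if not 0 < minutes_used <= self.approx_minutes:
            raise ValueError("minutes_used must be within the plan allowance")
        return self.monthly_price_usd / minutes_used


PLANS = [
    Plan("ElevenLabs", 5, 30),
    Plan("ElevenLabs", 22, 120),
    Plan("ElevenLabs", 99, 600),
    Plan("ElevenLabs", 330, 2400),
    Plan("NaturalReader", 99, 360),
    Plan("Play.ht", 39, 330),
]


def cheapest_plan_for(minutes_needed: float, plans=PLANS) -> Optional[Plan]:
    """Return the lowest-priced plan whose allowance covers minutes_needed."""
    eligible = [p for p in plans if p.approx_minutes >= minutes_needed]
    return min(eligible, key=lambda p: p.monthly_price_usd, default=None)


for plan in PLANS:
    print(f"{plan.tool} ${plan.monthly_price_usd}/month: "
          f"${plan.cost_per_minute:.2f} per minute at full use")
```

The effective cost rises quickly when you record less than the allowance, which is why the per-minute price you end up paying depends on your monthly volume.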
How much time can AI voiceover save?
Imagine you've successfully found, vetted, and hired a voice actor for your project. Now comes the crucial part: timing. How soon can they start voicing your script? Once you receive the initial version, how much time will revisions take? And then there's the post-processing of the final version. What's the timeline for that?
From our experience and insights from several industry discussions, each stage—drafting, revisions, and post-processing—typically takes around 24–48 hours. After a few rounds, the entire process of recording a 5-minute voiceover usually balances out to about 72 hours in total (or approximately 3 business days).
But what about using AI voiceovers? While post-processing isn't typically needed, some time must be allocated for fine-tuning:
- Low- to medium-quality voices: These often require considerable effort to add emphasis in the right places. A common issue is that they tend to read questions ("Curious about our secret?") and exclamations ("Get ready to see the difference!") in the same flat tone as regular sentences. This can be frustrating, as adding extra punctuation either has no effect or results in too much excitement.
- High-quality voices: These are a different story. They usually get the tone right 95% of the time and place emphasis correctly. For example, a high-quality AI voice might say, "You're probably already aware that human and AI are VERY different creatures," without any special fine-tuning. In our experience, you may need to adjust no more than 1–2 out of every 10 sentences, using quotation marks or ALL CAPS to slightly shift the emphasis.
With a high-quality AI voice, a 5-minute voiceover of a well-written script takes about 40–60 minutes to produce, as long as you don't need to do any major rewrites.
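The ALL CAPS adjustment mentioned above can be applied to a script before it is pasted into a tool. The sketch below is one way to do that. `emphasize` is a hypothetical helper, and whether capitalisation actually shifts the emphasis depends on the voice and tool you use.

```python
import re


def emphasize(script: str, words: list[str]) -> str:
    """Upper-case whole-word matches of `words` (case-insensitive) in `script`."""
    if not words:
        return script
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        flags=re.IGNORECASE,
    )
    # Replace each match with its upper-case form; other text is left untouched.
    return pattern.sub(lambda m: m.group(0).upper(), script)


line = "You're probably already aware that human and AI are very different creatures."
print(emphasize(line, ["very"]))
```

Matching on word boundaries keeps the change limited to the chosen words, so "every" is not affected when you emphasise "very".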
Evaluating AI vs human voiceovers: pros and cons
With the capability to generate AI voiceovers in mere minutes—often at no cost—it's clear why they have become a popular choice. However, are there significant drawbacks to consider?
While we've already discussed the quality and cost implications in depth, our experience suggests that other factors can be pivotal when deciding between a human or AI voiceover.
Comparison criteria | AI voice | Human voice |
---|---|---|
Budget constraints | Usually cheaper. You can find a reasonably good AI voice at $0.2–3.5 per minute. | Often more expensive. While some voice actors may be ready to jump in at $2–5 per minute, our research suggests that most professional talent values their time at $15+ per minute. |
Time efficiency | Fast, almost instant. AI voiceovers can be generated almost instantly and are accessible 24/7 without the need for scheduling. | Time-consuming. It may take a while to find, audition, and hire the right talent. And then you have to wait at least 24–48 hours for the recording. Plus, things happen, and a person might not be available when you need them. |
Editing flexibility | Easy to edit. It's super-easy to make as many adjustments and updates as you need. | Might be tedious. If you want to edit the script in iterations and search for the best-sounding words on the go, all the iterations may bump up the total cost. |
Consistency for repeated use | Very consistent. AI voices maintain the same tone and quality every time. | Can vary. Human voices can vary due to factors like health or time of day. |
Impactful messaging | Basic tone control. AI voices often sound generic and lack the nuanced performance a human can deliver. | Highly authentic. Humans have a much greater potential to sound authentic and engaging, and to evoke a powerful emotion. Human voices can adapt rhythms, intonations, and pitches to deliver a high-quality performance. |
Uniqueness | Usually not unique. AI voices might be used in other ads or contexts, reducing uniqueness. | Unique voice. You can most likely negotiate terms to ensure a unique voice for your brand. |
Translation needs | Simpler, with caveats. Definitely easier (but you'll probably still need a professional to check for any mistakes). Plus, you have to make sure your AI tool offers a high-quality AI voice in the language you need and allows setting pronunciation preferences for your brand/product name. Still, your slogan may end up sounding totally different in various languages (you won't be able to customise tone and intonation). | Depends on talent. Hiring a different voice actor for every language will bump up the total cost. On the other hand, humans can easily adapt their pronunciation to your needs and get your brand/product name and slogan just right! |
Key takeaways
Numerous AI tools provide voiceovers of varying quality. AI is particularly useful for projects that need fast turnaround and consistent output. But while AI-generated voiceovers excel at prompt, dependable delivery, human voice artists retain a unique edge: their ability to infuse scripts with depth, emotion, and genuineness fosters a deeper connection with the audience.
When choosing between AI and human voiceovers, consider your specific needs, budget limitations, and the message you want to convey. By carefully evaluating these factors, you can make a well-informed decision that aligns with your brand and communication goals. If you're looking to implement AI solutions to automate or optimise processes beyond voiceovers, ELEKS' intelligent automation experts are ready to assist.
FAQs
AI voiceovers are cost-effective, instantly available, and highly consistent, making them ideal for quick and budget-friendly projects. On the other hand, human voiceovers offer authenticity, emotional depth, and a unique voice that can resonate more deeply with audiences. However, they come at a higher cost and longer production time.
AI voiceovers use advanced machine learning algorithms and NLP to mimic human speech. They are often trained on large datasets of recordings from professional voice actors, capturing speech patterns such as intonation, pace, and accent to produce a more natural-sounding voice.
Consider your project's specific needs, budget, and the message you want to convey. AI voiceovers are suitable for information-based videos and fast productions, while human voiceovers are better for content requiring emotional richness and a unique brand voice. Evaluate the cost, time efficiency, editing flexibility, and the need for impactful messaging to make an informed decision.