Recreate real voices with total authenticity

Familiar voices bring real value, but scheduling studio time can be a challenge. With consent-driven cloned custom voice models, you can create AI-generated content that sounds just like the real thing at a fraction of the time and cost.

Features & Pricing

Bring truly lifelike AI voice to your projects

Custom Voice

Starts at

$9k per voice

Start with a custom synthetic voice clone, securely built, and create personalized content in multiple languages using verified AI voices that sound human. Great for everything from film to podcasts.

Clone voices using text-to-speech or speech-to-speech
Create custom models securely and with consent
Manage end-to-end voice needs and licensing in one place
Protect your model with inaudible watermarks and traceability
Generate new audio clips seamlessly once your model is built
Translate content into multiple languages to reach new audiences
Create your own lexicon for custom terminology recognition
Monetize custom voices for your podcast via Veritone Voice Network

How it works

Input audio, build a model, and create content

Step 1: Secure consent

As ethical cloning pioneers, we never build a voice model without approval. The individual whose voice will be used must provide explicit consent. If the talent is deceased or the voice is in the public domain, the estate or IP owner must sign off.

Step 2: Input pre-existing or newly recorded audio content

Next, we need about three hours of high-fidelity, isolated audio recordings, which we’ll use to train the model. We can use pre-existing audio or provide scripts for new recordings. Content should model the desired output style, and multiple models can be built to accommodate different styles and languages.
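Before submitting pre-existing audio, it can help to check how much material you actually have. The following is a minimal sketch, not part of any Veritone tooling: it assumes the recordings are uncompressed WAV files sitting in one folder, and the function names and the `TARGET_HOURS` constant are hypothetical, chosen only to mirror the "about three hours" figure above. It measures length only and says nothing about fidelity or isolation.

```python
import wave
from pathlib import Path

# Assumption: mirrors the "about three hours" guideline in the step above.
TARGET_HOURS = 3.0


def wav_duration_seconds(path):
    """Return the length of one WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_hours(folder):
    """Sum the durations of all .wav files directly inside folder, in hours."""
    files = sorted(Path(folder).glob("*.wav"))
    seconds = sum(wav_duration_seconds(p) for p in files)
    return seconds / 3600


def has_enough_audio(folder, target_hours=TARGET_HOURS):
    """True if the folder holds at least target_hours of WAV audio."""
    return total_hours(folder) >= target_hours
```

Calling `has_enough_audio("recordings")` on a hypothetical `recordings` folder returns `True` once the WAV files in it add up to three hours or more.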

Step 3: Customize voice content

Once the model is built, you can use the self-serve app for both text-to-speech and speech-to-speech content creation in near real time, or work with our experts to manage your output needs. Additional models in new languages can be built in about two days.

Real-world AI voice success

The Veritone Voice Network allows us to not only expand into new markets, but to authentically engage with our audience and build out those communities in ways that were not previously possible.

Doug Ellin, Emmy Award-winning writer, producer, and creator of the hit HBO series Entourage

Veritone Voice has opened a whole new door for us. We have an answer to our core challenge—how can we get this content in front of a global audience at scale and with minimal cost in both time and resources? Veritone removes the barrier of language fluency to maximize the reach of my voice and message, and build communities outside of English-speaking markets.

David Meltzer, The Playbook podcast host and public speaker

There’s only so much time I can devote to endorsements in my role, and my brand recognition is at its highest demand during hockey season, when I have the least amount of time to support local businesses and charities due to my schedule. Veritone Voice provides me with such a wide range of possibilities for my personal brand and endorsements because of its ease of use, minimal time commitment, and control over the final voice file.

Randy Hahn, NHL Sports play-by-play commentator and on-air personality

Veritone Voice Custom Voice FAQ

How are AI voices made?

Once consent is received, we need about three hours of high-fidelity, isolated audio recordings, which we’ll use to train the model. We can use pre-existing audio or provide scripts for new recordings. Content should model the desired output style, and multiple models can be built to accommodate different styles and languages.

How long does it take to create an AI voice?

A custom voice model takes about two weeks once all of the required training data is received.

How do I create the most realistic sounding AI voice?

For a lifelike voice model, make sure the training data matches the intended use case. For example, if you or the talent will be using the model for advertisements, the training data should be relevant ad reads. Next, make sure the training data meets all audio requirements, which gives the best possible output. From there, you can use the Veritone Voice application to adjust tone, pitch, style, speed, intonation, and more. Text-to-speech can produce a very realistic voice, and speech-to-speech is also available when you need output that sounds indistinguishable from the talent. For broadcast-quality productions, an audio engineer may assist but is not always required.

What audio samples can be used to generate an AI voice?

As long as it meets the audio requirements, we can use pre-existing audio, for example from an approved film or podcast, or we can arrange studio time to record the samples. For studio recordings, the talent is welcome to use our scripts or their own.