Features & Pricing
Bring truly lifelike AI voice to your projects
Custom Voice
Starts at
$9k per voice
Start with a securely built custom synthetic voice clone, then create personalized content in multiple languages using verified AI voices that sound human. Great for everything from film to podcasts.
- Clone voices using text-to-speech or speech-to-speech
- Create custom models securely and with consent
- Manage end-to-end voice needs and licensing in one place
- Protect your model with inaudible watermarks and traceability
- Generate new audio clips seamlessly once your model is built
- Translate content into multiple languages to reach new audiences
- Create your own lexicon for custom terminology recognition
- Monetize custom voices for your podcast via Veritone Voice Network
How it works
Input audio, build a model, and create content
Step 1: Secure consent
As ethical cloning pioneers, we never build a voice model without approval. The individual whose voice will be used must provide their explicit consent. If the talent is deceased or in the public domain, the estate or IP owner must sign off.
Step 2: Input pre-existing or newly recorded audio content
Next, we need about three hours of high-fidelity, isolated audio recordings, which we use to train the model. We can use pre-existing audio or provide scripts for new recordings. The content should match the desired output style, and multiple models can be built to accommodate different styles and languages.
Step 3: Customize voice content
Once the model is built, you can use the self-serve app to create both text-to-speech and speech-to-speech content in near real time, or work with our experts to manage your output needs. Additional models in new languages can be built in about two days.
Veritone Voice Custom Voice FAQ
How are AI voices made?
Once consent is received, we need about three hours of high-fidelity, isolated audio recordings, which we use to train the model. We can use pre-existing audio or provide scripts for new recordings. The content should match the desired output style, and multiple models can be built to accommodate different styles and languages.
How long does it take to create an AI voice?
A custom voice model takes about two weeks once all of the required training data is received.
How do I create the most realistic sounding AI voice?
For a lifelike voice model, make sure the training data matches the model's intended use case. For example, if you or the talent will use the model for advertisements, the training data should be relevant ad reads. Next, make sure the training data meets all audio requirements; this produces the best output. From there, you can use the Veritone Voice application to adjust tone, pitch, style, speed, intonation, and more. Text-to-speech can produce a very realistic voice, though speech-to-speech is also an option when you need output that sounds indistinguishable from the talent. For broadcast-quality productions, an audio engineer may assist, but one is not always required.
What audio samples can be used to generate an AI voice?
As long as it meets the audio requirements, we can use pre-existing audio, for example from an approved film or podcast, or we can arrange studio time to record new samples. In the latter case, the talent is welcome to use our scripts or their own.