
Overview

Voice cloning allows you to create custom voices from audio samples. Upload a recording or record directly in the browser to generate a voice that can be used for text-to-speech generation.

Requirements

Voice creation requires an active subscription. Without an active plan, the API returns SUBSCRIPTION_REQUIRED (403).

Audio Requirements

file
File
required
Audio file containing the voice sample
  • Maximum size: 20 MB
  • Minimum duration: 10 seconds
  • Supported formats: Any audio format the server can parse (WAV, MP3, M4A, FLAC, etc.)
  • Content-Type: Must match actual audio format
Audio files shorter than 10 seconds are rejected with the error: “Audio too short (X.Xs). Minimum duration is 10 seconds.”
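The size and duration limits above can also be checked on the client before uploading, which avoids sending a file that would be rejected anyway. The following is a minimal sketch, not the server's implementation: `checkAudioSample` and the constant names are hypothetical, and it assumes the caller has already measured the duration (for example from an audio element's `duration` property). It treats a file of exactly 20 MB as acceptable, mirroring the dropzone `maxSize` setting shown later on this page.

```typescript
// Limits taken from the Audio Requirements section.
const MAX_SAMPLE_BYTES = 20 * 1024 * 1024; // 20 MB
const MIN_SAMPLE_SECONDS = 10;

// Hypothetical helper: returns an error message, or null when the sample passes.
function checkAudioSample(sizeBytes: number, durationSeconds: number): string | null {
  if (sizeBytes > MAX_SAMPLE_BYTES) {
    return "Audio file exceeds the 20 MB size limit.";
  }
  if (durationSeconds < MIN_SAMPLE_SECONDS) {
    // Same wording as the server-side rejection message quoted above.
    return `Audio too short (${durationSeconds.toFixed(1)}s). Minimum duration is 10 seconds.`;
  }
  return null;
}
```

A form could call `checkAudioSample(file.size, duration)` before enabling its submit button. The server still performs its own checks, so this is only an early warning for the user.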

Voice Metadata

name
string
required
Display name for the voice
  • Minimum length: 1 character
  • This name will appear in voice selectors
category
enum
required
Voice category for organization. Available categories:
  • AUDIOBOOK - Audiobook narration
  • CONVERSATIONAL - Natural conversation
  • CUSTOMER_SERVICE - Support and service
  • GENERAL - General purpose (default)
  • NARRATIVE - Storytelling
  • CHARACTERS - Character voices
  • MEDITATION - Calm and soothing
  • MOTIVATIONAL - Energetic and inspiring
  • PODCAST - Podcast hosting
  • ADVERTISING - Marketing and ads
  • VOICEOVER - Professional voiceover
  • CORPORATE - Business presentations
language
string
required
Language code for the voice
  • Format: BCP 47 language tag (e.g., en-US, es-MX, fr-FR)
  • Default: en-US
  • Accepts any locale from the locale-codes package that includes a region (e.g., en-US, not just en)
description
string
Optional description of the voice characteristics
  • Helps identify and organize voices
  • Searchable in voice library
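The metadata rules above can be expressed as a small client-side validator. This is a sketch under stated assumptions rather than the API's actual schema: `validateVoiceMetadata` is a hypothetical name, and the language check is a simplified `language-REGION` pattern that does not cover every BCP 47 tag (script subtags such as zh-Hans-CN are rejected by it).

```typescript
// Category values copied from the list above.
const VOICE_CATEGORIES = [
  "AUDIOBOOK", "CONVERSATIONAL", "CUSTOMER_SERVICE", "GENERAL",
  "NARRATIVE", "CHARACTERS", "MEDITATION", "MOTIVATIONAL",
  "PODCAST", "ADVERTISING", "VOICEOVER", "CORPORATE",
] as const;

// Simplified assumption: two or three letter language, dash, two letter region.
const LANGUAGE_WITH_REGION = /^[a-z]{2,3}-[A-Z]{2}$/;

interface VoiceMetadataInput {
  name: string;
  category: string;
  language: string;
  description?: string;
}

// Hypothetical helper: returns a list of problems; an empty list means valid.
function validateVoiceMetadata(input: VoiceMetadataInput): string[] {
  const problems: string[] = [];
  if (input.name.length < 1) {
    problems.push("name must be at least 1 character");
  }
  if (!(VOICE_CATEGORIES as readonly string[]).includes(input.category)) {
    problems.push(`category must be one of: ${VOICE_CATEGORIES.join(", ")}`);
  }
  if (!LANGUAGE_WITH_REGION.test(input.language)) {
    problems.push("language must include a region, e.g. en-US");
  }
  return problems;
}
```

Returning every problem at once, instead of stopping at the first, lets a form highlight all invalid fields in one pass.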

Creation Methods

Upload an existing audio file from your device.
1. Select upload method

Click the “Upload” tab in the voice creation dialog.

2. Choose file

  • Click “Upload file” or drag and drop
  • File must be under 20 MB
  • Preview and playback available after upload

3. Fill metadata

  • Enter voice name
  • Select category
  • Choose language
  • Add optional description

4. Create voice

Click “Create Voice” to process and upload the audio.

// File dropzone configuration
const dropzone = useDropzone({
  accept: { "audio/*": [] },
  maxSize: 20 * 1024 * 1024, // 20 MB
  multiple: false,
  onDrop: (acceptedFiles) => {
    if (acceptedFiles.length > 0) {
      onFileChange(acceptedFiles[0]);
    }
  },
});

API Workflow

Voice creation uses a REST API endpoint (not tRPC) due to file upload requirements:

Endpoint

POST /api/voices/create

Request Format

// Query parameters
const params = new URLSearchParams({
  name: "My Voice",
  category: "CONVERSATIONAL",
  language: "en-US",
  description: "A friendly conversational voice", // optional
});

// Request body: raw audio file
const response = await fetch(`/api/voices/create?${params.toString()}`, {
  method: "POST",
  headers: { 
    "Content-Type": file.type, // e.g., "audio/wav"
  },
  body: file, // File object
});

Validation Steps

The API performs several validation checks:
  1. Authentication: Verify the user is authenticated (Clerk auth)
  2. Subscription check: Confirm an active subscription exists in Polar
  3. Input validation: Validate that query parameters match the schema
  4. File size check: Ensure the file is under the 20 MB limit
  5. Audio format validation: Parse audio metadata using music-metadata
  6. Duration check: Verify the audio is at least 10 seconds long
  7. Database creation: Create a voice record with variant CUSTOM
  8. Cloud upload: Upload audio to R2 storage at voices/orgs/{orgId}/{voiceId}
  9. Update record: Store the R2 object key in the database
  10. Usage tracking: Send a voice_creation event to Polar (fire-and-forget)
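Steps 7 to 9 write to two systems, so a failure partway through could leave a database record with no audio behind it. The sketch below shows one way such a sequence might be ordered with best-effort cleanup. It is an illustration only: the `VoiceRecords` and `ObjectStorage` interfaces and the `persistVoice` function are hypothetical stand-ins, since the real database and R2 client code is not shown in this document.

```typescript
// Hypothetical minimal interfaces standing in for the database and R2 clients.
interface VoiceRecords {
  create(data: { orgId: string; name: string; variant: "CUSTOM" }): Promise<{ id: string }>;
  setObjectKey(id: string, key: string): Promise<void>;
  remove(id: string): Promise<void>;
}

interface ObjectStorage {
  put(key: string, body: Uint8Array, contentType: string): Promise<void>;
}

interface PersistVoiceInput {
  orgId: string;
  name: string;
  audio: Uint8Array;
  contentType: string;
}

// Creates the record, uploads the audio, then stores the object key.
// If the upload or the key update fails, the record is removed again.
async function persistVoice(
  records: VoiceRecords,
  storage: ObjectStorage,
  input: PersistVoiceInput,
): Promise<string> {
  const voice = await records.create({
    orgId: input.orgId,
    name: input.name,
    variant: "CUSTOM",
  });
  try {
    const key = `voices/orgs/${input.orgId}/${voice.id}`;
    await storage.put(key, input.audio, input.contentType);
    await records.setObjectKey(voice.id, key);
    return voice.id;
  } catch (error) {
    // Best-effort cleanup of the record; deleting an already uploaded
    // object is omitted from this sketch.
    await records.remove(voice.id).catch(() => undefined);
    throw error;
  }
}
```

Creating the record first means the voice ID exists before the upload, which is what allows the object key to contain it.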

Response

Success (201):
{
  "name": "My Voice",
  "message": "Voice created successfully"
}
Error Responses:
  • User is not authenticated or organization context is missing.
  • No active subscription found for the organization (returned as SUBSCRIPTION_REQUIRED, 403).
  • Query parameters failed validation (missing name, invalid category, etc.).
  • Request must include a Content-Type header matching the audio format.
  • Audio file exceeds the 20 MB size limit.
  • File is not a valid audio file or cannot be parsed.
  • Audio duration is less than the 10-second minimum.
  • Voice creation or upload failed. Partial data is automatically cleaned up.
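On the client, these failures need to be turned into a message for the user. The sketch below assumes error bodies are JSON objects with a string `error` field, which is what the example implementation on this page reads; the exact error schema is not documented here, so treat that shape as an assumption. `errorMessageFromBody` is a hypothetical helper name.

```typescript
// Hypothetical helper: extracts a readable message from a raw error body.
// Assumes a shape like { "error": "..." }; falls back for anything else.
function errorMessageFromBody(rawBody: string, fallback = "Failed to create voice"): string {
  try {
    const parsed: unknown = JSON.parse(rawBody);
    if (typeof parsed === "object" && parsed !== null && "error" in parsed) {
      const value = (parsed as { error: unknown }).error;
      if (typeof value === "string" && value.length > 0) {
        return value;
      }
    }
  } catch {
    // Body was not JSON (for example an HTML error page); use the fallback.
  }
  return fallback;
}
```

With `fetch`, this could be used as `throw new Error(errorMessageFromBody(await response.text()))` whenever `response.ok` is false. Reading the body as text first avoids a second exception when the server returns something other than JSON.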

Storage Structure

Custom voices are stored with the following structure:
voices/orgs/{orgId}/{voiceId}
  • Organization scoped: Each org’s voices are isolated
  • Unique IDs: Voice IDs are generated using CUID
  • Variant: Custom voices have variant: "CUSTOM"
  • Ownership: Linked to creating organization via orgId
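The key layout above is simple enough to build with a pair of helpers. The following sketch uses hypothetical function names and adds one assumption of its own: IDs are rejected if they are empty or contain a slash, so that one organization's key can never resolve into another organization's prefix.

```typescript
// Hypothetical guard: IDs must be non-empty and must not contain "/".
function assertSafeId(label: string, value: string): void {
  if (value.length === 0 || value.includes("/")) {
    throw new Error(`${label} must be non-empty and must not contain "/"`);
  }
}

// Prefix shared by every custom voice of one organization.
function orgVoicePrefix(orgId: string): string {
  assertSafeId("orgId", orgId);
  return `voices/orgs/${orgId}/`;
}

// Full object key for one voice, following voices/orgs/{orgId}/{voiceId}.
function voiceObjectKey(orgId: string, voiceId: string): string {
  assertSafeId("voiceId", voiceId);
  return `${orgVoicePrefix(orgId)}${voiceId}`;
}
```

Keeping the prefix in its own function makes it easy to list or delete all of an organization's voices by prefix, if the storage client supports that.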

Usage Tracking

Voice creation is metered for billing:
polar.events.ingest({
  events: [{
    name: "voice_creation",
    externalCustomerId: orgId,
    metadata: {},
    timestamp: new Date(),
  }],
});
Usage tracking is fire-and-forget. Tracking failures won’t block voice creation.
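Fire-and-forget means the request handler starts the tracking call and moves on without awaiting it, while still making sure a failure cannot surface as an unhandled rejection. A minimal sketch of that pattern follows. `fireAndForget` is a hypothetical helper, not part of the Polar SDK, and the document does not show how the real handler swallows tracking errors.

```typescript
// Hypothetical helper: runs a task without awaiting it and routes any
// failure (synchronous throw or rejected promise) to onError.
function fireAndForget(
  task: () => Promise<unknown>,
  onError: (error: unknown) => void = () => {},
): void {
  try {
    task().catch(onError);
  } catch (error) {
    // The task threw before returning a promise.
    onError(error);
  }
}
```

A handler could wrap the ingest call shown above as `fireAndForget(() => polar.events.ingest({ ... }), (e) => console.error(e))`, so that voice creation returns its response regardless of whether tracking succeeds.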

Best Practices

  1. Audio Quality: Use high-quality audio samples with minimal background noise
  2. Duration: Longer samples (30+ seconds) typically produce better clones
  3. Consistency: Single speaker, consistent volume and tone
  4. Language: Ensure audio matches selected language code
  5. Categories: Choose appropriate category for better organization

Example Implementation

import { useMutation, useQueryClient } from '@tanstack/react-query';
import { toast } from 'sonner';

// Call these hooks inside a React component or custom hook.
const queryClient = useQueryClient();

const createVoiceMutation = useMutation({
  mutationFn: async ({
    name,
    file,
    category,
    language,
    description,
  }: {
    name: string;
    file: File;
    category: string;
    language: string;
    description?: string;
  }) => {
    const params = new URLSearchParams({
      name,
      category,
      language,
    });
    if (description) {
      params.set("description", description);
    }

    const response = await fetch(`/api/voices/create?${params.toString()}`, {
      method: "POST",
      headers: { "Content-Type": file.type },
      body: file,
    });

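    if (!response.ok) {
      // Error bodies may not be JSON (e.g., a proxy error page), so parse defensively.
      const body = await response.json().catch(() => ({}));
      throw new Error(body.error ?? "Failed to create voice");
    }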

    return response.json();
  },
  onSuccess: () => {
    toast.success("Voice created successfully!");
    queryClient.invalidateQueries({ queryKey: ['voices'] });
  },
  onError: (error) => {
    toast.error(error.message);
  },
});
