Voice Cloning

Overview

Voice cloning allows you to create custom voices from audio samples. Upload a recording or record directly in the browser to generate a voice that can be used for text-to-speech generation.

Requirements

Voice creation requires an active subscription. Without an active plan, the API returns 403 SUBSCRIPTION_REQUIRED.

Audio Requirements

file (File, required)

Audio file containing the voice sample

Maximum size: 20 MB
Minimum duration: 10 seconds
Supported formats: All audio formats (WAV, MP3, M4A, FLAC, etc.)
Content-Type: Must match actual audio format

Audio files shorter than 10 seconds are rejected with a 422 error: "Audio too short (X.Xs). Minimum duration is 10 seconds.", where X.X is the duration of the submitted audio.
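
A client can run the same size and duration checks before uploading, which avoids a round trip for files that would be rejected anyway. The following is a minimal sketch, not the project's implementation: the function name checkAudioSample and its return shape are hypothetical, and the duration is assumed to have been measured already (for example from an audio element's duration property).

```typescript
// Limits taken from the requirements above.
const MAX_SIZE_BYTES = 20 * 1024 * 1024; // 20 MB
const MIN_DURATION_SECONDS = 10;

// Hypothetical helper: returns an error message, or null when the sample passes.
function checkAudioSample(sizeBytes: number, durationSeconds: number): string | null {
  if (sizeBytes > MAX_SIZE_BYTES) {
    return "File exceeds the 20 MB size limit.";
  }
  if (durationSeconds < MIN_DURATION_SECONDS) {
    // Mirrors the server's message format, with the duration to one decimal place.
    return `Audio too short (${durationSeconds.toFixed(1)}s). Minimum duration is 10 seconds.`;
  }
  return null;
}
```

A client-side check like this only improves feedback speed. The server still validates every upload.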

Voice Metadata

name (string, required)

Display name for the voice

Minimum length: 1 character
This name will appear in voice selectors

Creation Methods

Two creation methods are available: Upload File and Record Audio.

Upload File

Upload an existing audio file from your device.

Select upload method

Click the “Upload” tab in the voice creation dialog.

Choose file

Click “Upload file” or drag and drop
File must be under 20 MB
Preview and playback available after upload

Fill in metadata

Enter voice name
Select category
Choose language
Add optional description

Create voice

Click “Create Voice” to process and upload the audio.

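// File dropzone configuration (useDropzone comes from react-dropzone)
import { useDropzone } from "react-dropzone";

// Call inside a React component; onFileChange is the component's own handler.
const dropzone = useDropzone({
  accept: { "audio/*": [] }, // any audio MIME type
  maxSize: 20 * 1024 * 1024, // 20 MB
  multiple: false, // one sample per voice
  onDrop: (acceptedFiles) => {
    if (acceptedFiles.length > 0) {
      onFileChange(acceptedFiles[0]);
    }
  },
});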

Record Audio

Record audio directly in the browser using your microphone.

Select record method

Click the “Record” tab in the voice creation dialog.

Grant microphone access

Allow the browser to access your microphone when prompted.

Record voice sample

Click “Record” to start
Speak clearly for at least 10 seconds
Real-time waveform visualization shown
Click “Stop” when finished

Review recording

Play back the recording
Re-record if needed
See duration and file size

Complete metadata

Fill in voice name, category, language, and description.

Create voice

Click “Create Voice” to upload the recording.

Recordings are saved as WAV files with the filename recording.wav.
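
This page does not show how the browser recording becomes a WAV file. As a rough illustration of what such a file contains, the sketch below encodes mono Float32 PCM samples (the sample format the Web Audio API works with) as 16-bit PCM WAV. The function name encodeWav and the default sample rate are assumptions for illustration, not the project's actual recorder code.

```typescript
// Hypothetical helper: encode mono Float32 samples in [-1, 1] as 16-bit PCM WAV.
function encodeWav(samples: Float32Array, sampleRate = 44100): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte canonical header
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample and scale it to the signed 16-bit range.
  samples.forEach((sample, i) => {
    const s = Math.max(-1, Math.min(1, sample));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  });
  return buffer;
}
```

In a browser, the resulting buffer could be wrapped as new File([buffer], "recording.wav", { type: "audio/wav" }) so that it flows through the same upload path as a selected file.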

API Workflow

Voice creation uses a REST API endpoint rather than tRPC, because the request body is a raw file upload.

Endpoint

POST /api/voices/create

Request Format

// Query parameters
const params = new URLSearchParams({
  name: "My Voice",
  category: "CONVERSATIONAL",
  language: "en-US",
  description: "A friendly conversational voice", // optional
});

// Request body: the raw audio file.
// `file` is a File object, e.g. from the dropzone or the recorder.
const response = await fetch(`/api/voices/create?${params.toString()}`, {
  method: "POST",
  headers: {
    "Content-Type": file.type, // e.g., "audio/wav"
  },
  body: file,
});

Validation Steps

The API runs the following checks and processing steps in order:

Authentication

Verify user is authenticated (Clerk auth)

Subscription check

Confirm active subscription exists in Polar

Input validation

Validate query parameters match schema

File size check

Ensure the file is within the 20 MB limit

Audio format validation

Parse audio metadata using music-metadata

Duration check

Verify audio is at least 10 seconds long

Database creation

Create voice record with variant CUSTOM

Cloud upload

Upload audio to R2 storage at voices/orgs/{orgId}/{voiceId}

Update record

Store R2 object key in database

Usage tracking

Send voice_creation event to Polar (fire-and-forget)
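
To make the order of the checks that run before anything is stored more concrete, the sketch below walks a simplified request through them and returns the first failure. It is an illustration only: the IncomingRequest shape and the firstFailure function are hypothetical, the status codes are the ones listed under Response, and the real handler derives these values from Clerk, Polar, and music-metadata rather than from plain fields. The position of the Content-Type check is an assumption, since the steps above do not list it.

```typescript
// Hypothetical, simplified view of what the handler knows about a request.
interface IncomingRequest {
  authenticated: boolean;
  hasActiveSubscription: boolean;
  paramsValid: boolean;
  contentType: string | null;
  sizeBytes: number;
  durationSeconds: number | null; // null when the audio could not be parsed
}

interface Failure {
  status: number;
  reason: string;
}

// Returns the first failing check, or null when the request may proceed to storage.
function firstFailure(req: IncomingRequest): Failure | null {
  if (!req.authenticated) return { status: 401, reason: "Unauthorized" };
  if (!req.hasActiveSubscription) return { status: 403, reason: "SUBSCRIPTION_REQUIRED" };
  if (!req.paramsValid) return { status: 400, reason: "Invalid Input" };
  // Assumed position: the documented steps do not say where this check runs.
  if (!req.contentType) return { status: 400, reason: "Missing Content-Type" };
  if (req.sizeBytes > 20 * 1024 * 1024) return { status: 413, reason: "File Too Large" };
  if (req.durationSeconds === null) return { status: 422, reason: "Invalid Audio File" };
  if (req.durationSeconds < 10) return { status: 422, reason: "Audio Too Short" };
  return null;
}
```

Because the checks short-circuit, a request that fails several of them only reports the earliest one. An unauthenticated request without a subscription, for example, yields 401 rather than 403.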

Response

Success (201):

{
  "name": "My Voice",
  "message": "Voice created successfully"
}

Error Responses:

401 Unauthorized

User is not authenticated or organization context is missing.

403 SUBSCRIPTION_REQUIRED

No active subscription found for the organization.

400 Invalid Input

Query parameters failed validation (missing name, invalid category, etc.).

400 Missing Content-Type

Request must include Content-Type header matching audio format.

413 File Too Large

Audio file exceeds the 20 MB size limit.

422 Invalid Audio File

File is not a valid audio file or cannot be parsed.

422 Audio Too Short

Audio duration is less than 10 seconds minimum.

500 Creation Failed

Voice creation or upload failed. Partial data is automatically cleaned up.
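
The 500 case states that partial data is cleaned up but not how. One common way to get that behavior is to delete the database record when a later step fails. The sketch below shows that pattern with injected dependencies. The VoiceStore interface and all of its method names are hypothetical stand-ins, not the project's database or R2 client.

```typescript
// Hypothetical storage interface standing in for the database and R2 clients.
interface VoiceStore {
  createRecord(name: string): Promise<string>; // resolves to the new voice ID
  uploadAudio(key: string, audio: Uint8Array): Promise<void>;
  saveObjectKey(voiceId: string, key: string): Promise<void>;
  deleteRecord(voiceId: string): Promise<void>;
}

async function createVoiceWithCleanup(
  store: VoiceStore,
  orgId: string,
  name: string,
  audio: Uint8Array,
): Promise<string> {
  const voiceId = await store.createRecord(name);
  const key = `voices/orgs/${orgId}/${voiceId}`;
  try {
    await store.uploadAudio(key, audio);
    await store.saveObjectKey(voiceId, key);
    return voiceId;
  } catch (error) {
    // Remove the half-created record so no voice exists without usable audio.
    // A failure of the cleanup itself is swallowed so the original error wins.
    await store.deleteRecord(voiceId).catch(() => {});
    throw error;
  }
}
```

This sketch only rolls back the database record. If the upload succeeds and the final update fails, the uploaded object would also need to be deleted, which is omitted here for brevity.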

Storage Structure

Custom voices are stored with the following structure:

voices/orgs/{orgId}/{voiceId}

Organization scoped: Each org’s voices are isolated
Unique IDs: Voice IDs are generated using CUID
Variant: Custom voices have variant: "CUSTOM"
Ownership: Linked to creating organization via orgId
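
As a small illustration of the key layout, the sketch below builds an object key from an organization ID and a voice ID and checks whether a key falls under a given organization's prefix. Both function names are hypothetical, and the IDs used in any call are placeholders.

```typescript
// Hypothetical helper: build the storage key for a custom voice.
function voiceObjectKey(orgId: string, voiceId: string): string {
  return `voices/orgs/${orgId}/${voiceId}`;
}

// Hypothetical helper: true when the key sits under the organization's prefix.
// The trailing slash keeps "org_12" from matching keys that belong to "org_123".
function belongsToOrg(key: string, orgId: string): boolean {
  return key.startsWith(`voices/orgs/${orgId}/`);
}
```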

Usage Tracking

Voice creation is metered for billing:

polar.events.ingest({
  events: [{
    name: "voice_creation",
    externalCustomerId: orgId,
    metadata: {},
    timestamp: new Date(),
  }],
});

Usage tracking is fire-and-forget. Tracking failures won’t block voice creation.
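
Fire-and-forget here means the tracking call is started but not awaited, and any failure is handled on the side. A minimal sketch of that pattern follows. The fireAndForget helper is hypothetical and not part of the Polar SDK or of this project.

```typescript
// Hypothetical helper: start an async task without awaiting it, and route any
// failure to a handler instead of letting it surface as an unhandled rejection.
function fireAndForget(
  task: () => Promise<unknown>,
  onError: (error: unknown) => void = (error) => console.warn("Tracking failed:", error),
): void {
  try {
    task().catch(onError);
  } catch (error) {
    // The task threw synchronously, before it returned a promise.
    onError(error);
  }
}
```

Wrapping the ingest call as fireAndForget(() => polar.events.ingest({ ... })) would let the request handler return its response without waiting for the tracking call to finish.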

Best Practices

Audio Quality: Use high-quality audio samples with minimal background noise
Duration: Longer samples (30+ seconds) typically produce better clones
Consistency: Single speaker, consistent volume and tone
Language: Ensure audio matches selected language code
Categories: Choose appropriate category for better organization

Example Implementation

import { useMutation, useQueryClient } from '@tanstack/react-query';
import { toast } from 'sonner';

// Call both hooks inside a React component or custom hook.
const queryClient = useQueryClient();

const createVoiceMutation = useMutation({
  mutationFn: async ({
    name,
    file,
    category,
    language,
    description,
  }: {
    name: string;
    file: File;
    category: string;
    language: string;
    description?: string;
  }) => {
    const params = new URLSearchParams({
      name,
      category,
      language,
    });
    if (description) {
      params.set("description", description);
    }

    const response = await fetch(`/api/voices/create?${params.toString()}`, {
      method: "POST",
      headers: { "Content-Type": file.type },
      body: file,
    });

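    if (!response.ok) {
      // Error bodies may not be JSON (e.g. a proxy error page), so fall back to an empty object.
      const body = await response.json().catch(() => ({}));
      throw new Error(body.error ?? "Failed to create voice");
    }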

    return response.json();
  },
  onSuccess: () => {
    toast.success("Voice created successfully!");
    queryClient.invalidateQueries({ queryKey: ['voices'] });
  },
  onError: (error) => {
    toast.error(error.message);
  },
});
