Endpoint
POST https://api.cencori.com/api/ai/audio/transcriptions
Transcribe audio files to text using OpenAI Whisper. Supports multiple audio formats and languages.
Authentication
Requires API key in Authorization header or CENCORI_API_KEY header.
This endpoint expects multipart/form-data (file upload).
Request Parameters
file (required): Audio file to transcribe. Supported formats:
- mp3, mp4, m4a, mpeg, mpga
- wav, webm
- ogg, flac
Maximum file size: 25MB
model: Transcription model. Default: whisper-1. Currently only whisper-1 is supported.
language: Language code (ISO-639-1). Examples: en, es, fr, de, ja, zh. If not specified, the language is auto-detected.
prompt: Optional text to guide the model's style or continue a previous segment. Helps with:
- Correct spelling of names/terms
- Maintaining context
- Improving punctuation
response_format: Format of the transcript. Options:
json (default) - JSON with text
text - Plain text
srt - SubRip subtitle format
verbose_json - JSON with word-level timestamps
vtt - WebVTT subtitle format
temperature: Sampling temperature (0-1). Default: 0. Higher values increase randomness.
Response
JSON Format (default)
{
"text": "Transcribed text content"
}
Verbose JSON Format
{
"task": "transcribe",
"language": "english",
"duration": 8.5,
"text": "Hello, this is a test.",
"words": [
{
"word": "Hello",
"start": 0.0,
"end": 0.5
},
{
"word": "this",
"start": 0.5,
"end": 0.8
}
// ...
]
}
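The words array in a verbose_json response can be post-processed client-side, for example to compute per-word durations. A minimal sketch (wordDurations is an illustrative helper, not part of the API):

```javascript
// Compute the duration (in seconds) of each word entry from a
// verbose_json transcription response.
function wordDurations(words) {
  return words.map(({ word, start, end }) => ({
    word,
    duration: Number((end - start).toFixed(2)),
  }));
}

// Example using the response shown above:
const words = [
  { word: 'Hello', start: 0.0, end: 0.5 },
  { word: 'this', start: 0.5, end: 0.8 },
];
console.log(wordDurations(words));
// [ { word: 'Hello', duration: 0.5 }, { word: 'this', duration: 0.3 } ]
```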
Text Format
Plain text response (Content-Type: text/plain)
SRT Format
1
00:00:00,000 --> 00:00:02,000
Hello, this is a test.
2
00:00:02,000 --> 00:00:05,000
This is the second subtitle.
VTT Format
WEBVTT
00:00:00.000 --> 00:00:02.000
Hello, this is a test.
00:00:02.000 --> 00:00:05.000
This is the second subtitle.
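The two subtitle formats above differ mainly in the leading WEBVTT header, the numeric cue counters, and the timestamp decimal separator (comma vs. dot), so a rough client-side conversion is possible. A sketch, not an official utility:

```javascript
// Convert an SRT transcript to WebVTT: add the WEBVTT header,
// drop numeric cue counters, and switch comma decimals to dots.
function srtToVtt(srt) {
  const body = srt
    .split(/\r?\n/)
    .filter((line) => !/^\d+$/.test(line.trim())) // drop cue numbers
    .map((line) =>
      line.replace(
        /(\d{2}:\d{2}:\d{2}),(\d{3})/g, // 00:00:02,000 -> 00:00:02.000
        '$1.$2'
      )
    )
    .join('\n');
  return `WEBVTT\n\n${body}`;
}
```

Requesting response_format=vtt directly avoids the conversion; this helper is only useful when an SRT file already exists.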
Examples
Basic Transcription
curl https://api.cencori.com/api/ai/audio/transcriptions \
-H "Authorization: Bearer $CENCORI_API_KEY" \
-F [email protected] \
-F model=whisper-1
{
"text": "Hello and welcome to Cencori. This is a demo of the audio transcription API."
}
With Language Specification
curl https://api.cencori.com/api/ai/audio/transcriptions \
-H "Authorization: Bearer $CENCORI_API_KEY" \
-F [email protected] \
-F model=whisper-1 \
-F language=en
With Prompt for Context
curl https://api.cencori.com/api/ai/audio/transcriptions \
-H "Authorization: Bearer $CENCORI_API_KEY" \
-F [email protected] \
-F model=whisper-1 \
-F prompt="This is a recording about Cencori API and AI technology."
Generate Subtitles (SRT)
curl https://api.cencori.com/api/ai/audio/transcriptions \
-H "Authorization: Bearer $CENCORI_API_KEY" \
-F [email protected] \
-F model=whisper-1 \
-F response_format=srt > subtitles.srt
Verbose JSON with Timestamps
curl https://api.cencori.com/api/ai/audio/transcriptions \
-H "Authorization: Bearer $CENCORI_API_KEY" \
-F [email protected] \
-F model=whisper-1 \
-F response_format=verbose_json
Using OpenAI SDK
import OpenAI from 'openai';
import fs from 'fs';
const client = new OpenAI({
apiKey: process.env.CENCORI_API_KEY,
baseURL: 'https://api.cencori.com/v1'
});
const transcription = await client.audio.transcriptions.create({
file: fs.createReadStream('audio.mp3'),
model: 'whisper-1',
language: 'en',
response_format: 'verbose_json'
});
console.log(transcription.text);
Transcribe with Progress Tracking
import fs from 'fs';
import FormData from 'form-data';

// Track bytes read from disk as a rough proxy for upload progress
const fileStream = fs.createReadStream('large-audio.mp3');
let bytesRead = 0;
fileStream.on('data', (chunk) => {
  bytesRead += chunk.length;
  console.log(`Read ${bytesRead} bytes`);
});

const form = new FormData();
form.append('file', fileStream);
form.append('model', 'whisper-1');

const response = await fetch('https://api.cencori.com/api/ai/audio/transcriptions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.CENCORI_API_KEY}`,
    ...form.getHeaders()
  },
  body: form
});

const result = await response.json();
console.log(result.text);
Supported Languages
Whisper supports 99+ languages including:
- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- Dutch (nl)
- Russian (ru)
- Japanese (ja)
- Korean (ko)
- Chinese (zh)
- Arabic (ar)
- Hindi (hi)
- And many more…
Full list: https://platform.openai.com/docs/guides/speech-to-text/supported-languages
Use Cases
Meeting Transcription
const transcription = await client.audio.transcriptions.create({
file: fs.createReadStream('meeting.m4a'),
model: 'whisper-1',
response_format: 'verbose_json'
});
// Extract speaker segments, timestamps, action items
const summary = await client.chat.completions.create({
model: 'gpt-4o',
messages: [{
role: 'user',
content: `Summarize this meeting and extract action items:\n${transcription.text}`
}]
});
Podcast to Blog Post
const transcription = await client.audio.transcriptions.create({
file: fs.createReadStream('podcast.mp3'),
model: 'whisper-1'
});
const blogPost = await client.chat.completions.create({
model: 'gpt-4o',
messages: [{
role: 'user',
content: `Convert this podcast transcript to an engaging blog post:\n${transcription.text}`
}]
});
Video Subtitles
import fs from 'fs';
import ffmpeg from 'fluent-ffmpeg';

// Extract audio from video, then transcribe it to SRT subtitles
ffmpeg('video.mp4')
  .output('audio.mp3')
  .on('end', async () => {
    const transcription = await client.audio.transcriptions.create({
      file: fs.createReadStream('audio.mp3'),
      model: 'whisper-1',
      response_format: 'srt'
    });
    // With response_format 'srt', the response body is the raw SRT text
    fs.writeFileSync('subtitles.srt', transcription);
  })
  .run();
Voice Notes App
const transcribeVoiceNote = async (audioBuffer: Buffer) => {
const tempFile = `/tmp/${Date.now()}.webm`;
fs.writeFileSync(tempFile, audioBuffer);
const transcription = await client.audio.transcriptions.create({
file: fs.createReadStream(tempFile),
model: 'whisper-1',
prompt: 'User voice note'
});
fs.unlinkSync(tempFile);
return transcription.text;
};
Best Practices
- Prepare audio files
  - Use clear audio with minimal background noise
  - Convert to a supported format if needed
  - Keep files under 25MB (split longer recordings)
- Specify the language
  - Improves accuracy when the language is known
  - Faster processing
  - Better handling of accents
- Use prompts effectively
  - Include domain-specific terms
  - Provide context for better accuracy
  - Help with proper nouns and technical terms
- Choose the right format
  - Use json for simple transcripts
  - Use verbose_json for timestamps
  - Use srt/vtt for video subtitles
- Handle errors
  - Validate file size before upload
  - Check supported formats
  - Implement retry logic for network issues
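The file-size and format checks above can be performed client-side before uploading. A minimal sketch (validateAudioFile is a hypothetical helper; the limits mirror the 25MB maximum and format list documented here):

```javascript
const SUPPORTED = ['mp3', 'mp4', 'm4a', 'mpeg', 'mpga', 'wav', 'webm', 'ogg', 'flac'];
const MAX_BYTES = 25 * 1024 * 1024; // 25MB limit

// Validate an audio file before sending it to the API.
// Returns a list of problems; an empty list means the file looks uploadable.
function validateAudioFile(filename, sizeBytes) {
  const problems = [];
  const ext = filename.split('.').pop().toLowerCase();
  if (!SUPPORTED.includes(ext)) {
    problems.push(`Unsupported format: ${ext}`);
  }
  if (sizeBytes > MAX_BYTES) {
    problems.push('File exceeds 25MB; split it into smaller chunks');
  }
  return problems;
}
```

Failing fast on these checks avoids wasting an upload that the API would reject with a 400 anyway.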
Limitations
- Maximum file size: 25MB
- Files larger than 25MB must be split
- Optimal for audio between 10 seconds and 1 hour
- Background noise affects accuracy
- Multiple simultaneous speakers may reduce accuracy
Error Responses
File Too Large
{
"error": "bad_request",
"message": "File size exceeds maximum of 25MB"
}
HTTP Status: 400
Solution: Split audio file into smaller chunks.
Unsupported Format
{
"error": "bad_request",
"message": "Unsupported audio format. Supported: mp3, wav, webm, mp4, m4a, ogg, flac"
}
HTTP Status: 400
Solution: Convert audio to supported format.
Missing File
{
"error": "bad_request",
"message": "Audio file is required"
}
HTTP Status: 400
Solution: Include file in multipart form data.
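The retry advice under Best Practices applies to transient failures only; the 400 errors above will not succeed on retry. A sketch of a backoff wrapper (withRetry is a hypothetical helper, not part of any SDK):

```javascript
// Retry a request up to `maxRetries` times with exponential backoff.
// Only retries on network errors / 5xx responses; 4xx errors (bad
// format, file too large) are not retried since the request itself
// must change.
async function withRetry(fn, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retriable = !err.status || err.status >= 500;
      if (!retriable || attempt === maxRetries) throw err;
      await new Promise((r) => setTimeout(r, 2 ** attempt * 500));
    }
  }
}
```

Usage: `await withRetry(() => transcribe(file))`, where transcribe is your upload call that attaches the HTTP status to thrown errors.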
Rate Limits
Transcription requests count toward monthly quota:
- Free: 1,000 requests/month
- Pro: 50,000 requests/month
- Enterprise: Custom limits
Pricing
Pricing based on audio duration (estimated from file size):
- Whisper-1: $0.006/minute (provider cost)
- Cencori charge: $0.0072/minute (20% markup)
Estimation: ~1 minute per 1MB for MP3 files
Example: 10-minute audio (10MB) = $0.072
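The estimation rule above (~1 minute per 1MB of MP3 at $0.0072/minute) can be turned into a quick cost estimator. A sketch; actual billing is based on measured audio duration:

```javascript
const RATE_PER_MINUTE = 0.0072; // Cencori rate from the pricing section above

// Estimate transcription cost from MP3 file size, using the
// ~1 minute per 1MB rule of thumb.
function estimateCostUSD(fileSizeMB) {
  const estimatedMinutes = fileSizeMB; // ~1 min per MB for MP3
  return Number((estimatedMinutes * RATE_PER_MINUTE).toFixed(4));
}

console.log(estimateCostUSD(10)); // 10MB (~10 min) file -> 0.072
```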
Endpoint Information
A GET request to the same URL returns supported models and formats:
curl https://api.cencori.com/api/ai/audio/transcriptions \
-H "Authorization: Bearer $CENCORI_API_KEY"
{
"models": ["whisper-1"],
"supported_formats": ["mp3", "mp4", "m4a", "mpeg", "mpga", "wav", "webm", "ogg", "flac"],
"response_formats": ["json", "text", "srt", "verbose_json", "vtt"],
"max_file_size": "25MB"
}