
Endpoint

POST https://api.cencori.com/api/ai/audio/transcriptions
Transcribe audio files to text using OpenAI Whisper. Supports multiple audio formats and languages.

Authentication

Requires API key in Authorization header or CENCORI_API_KEY header.

Request Format

This endpoint expects a multipart/form-data request (file upload).

file
file · required
Audio file to transcribe. Supported formats:
  • mp3, mp4, m4a, mpeg, mpga
  • wav, webm
  • ogg, flac
Maximum file size: 25MB

model
string
Transcription model. Default: whisper-1. Currently only whisper-1 is supported.

language
string
Language code (ISO-639-1). Examples: en, es, fr, de, ja, zh. If not specified, the language is auto-detected.

prompt
string
Optional text to guide the model's style or continue a previous segment. Helps with:
  • Correct spelling of names/terms
  • Maintaining context
  • Improving punctuation

response_format
string
Format of the transcript. Options:
  • json (default) - JSON with text
  • text - Plain text
  • srt - SubRip subtitle format
  • verbose_json - JSON with word-level timestamps
  • vtt - WebVTT subtitle format

temperature
number
Sampling temperature (0-1). Default: 0. Higher values increase randomness.

Response

JSON Format (default)

{
  "text": "Transcribed text content"
}

Verbose JSON Format

{
  "task": "transcribe",
  "language": "english",
  "duration": 8.5,
  "text": "Hello, this is a test.",
  "words": [
    {
      "word": "Hello",
      "start": 0.0,
      "end": 0.5
    },
    {
      "word": "this",
      "start": 0.5,
      "end": 0.8
    }
    // ...
  ]
}
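The words array can be post-processed client-side. A minimal sketch, where the Word shape simply mirrors the fields in the example response above:

```typescript
// Turn the `words` array from a verbose_json response into
// readable per-word timestamp lines.
interface Word {
  word: string;
  start: number;
  end: number;
}

// Sample data matching the response shown above.
const words: Word[] = [
  { word: "Hello", start: 0.0, end: 0.5 },
  { word: "this", start: 0.5, end: 0.8 },
];

const lines = words.map(
  (w) => `${w.start.toFixed(2)}->${w.end.toFixed(2)}  ${w.word}`
);
console.log(lines.join("\n"));
```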

Text Format

Plain text response (Content-Type: text/plain)

SRT Format

1
00:00:00,000 --> 00:00:02,000
Hello, this is a test.

2
00:00:02,000 --> 00:00:05,000
This is the second subtitle.

VTT Format

WEBVTT

00:00:00.000 --> 00:00:02.000
Hello, this is a test.

00:00:02.000 --> 00:00:05.000
This is the second subtitle.

Examples

Basic Transcription

curl https://api.cencori.com/api/ai/audio/transcriptions \
  -H "Authorization: Bearer $CENCORI_API_KEY" \
  -F [email protected] \
  -F model=whisper-1
{
  "text": "Hello and welcome to Cencori. This is a demo of the audio transcription API."
}

With Language Specification

curl https://api.cencori.com/api/ai/audio/transcriptions \
  -H "Authorization: Bearer $CENCORI_API_KEY" \
  -F [email protected] \
  -F model=whisper-1 \
  -F language=en

With Prompt for Context

curl https://api.cencori.com/api/ai/audio/transcriptions \
  -H "Authorization: Bearer $CENCORI_API_KEY" \
  -F [email protected] \
  -F model=whisper-1 \
  -F prompt="This is a recording about Cencori API and AI technology."

Generate Subtitles (SRT)

curl https://api.cencori.com/api/ai/audio/transcriptions \
  -H "Authorization: Bearer $CENCORI_API_KEY" \
  -F [email protected] \
  -F model=whisper-1 \
  -F response_format=srt > subtitles.srt

Verbose JSON with Timestamps

curl https://api.cencori.com/api/ai/audio/transcriptions \
  -H "Authorization: Bearer $CENCORI_API_KEY" \
  -F [email protected] \
  -F model=whisper-1 \
  -F response_format=verbose_json

Using OpenAI SDK

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI({
  apiKey: process.env.CENCORI_API_KEY,
  baseURL: 'https://api.cencori.com/v1'
});

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.mp3'),
  model: 'whisper-1',
  language: 'en',
  response_format: 'verbose_json'
});

console.log(transcription.text);

Transcribe Using fetch and form-data

import fs from 'fs';
import FormData from 'form-data';

const form = new FormData();
form.append('file', fs.createReadStream('large-audio.mp3'));
form.append('model', 'whisper-1');

const response = await fetch('https://api.cencori.com/api/ai/audio/transcriptions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.CENCORI_API_KEY}`,
    ...form.getHeaders()
  },
  body: form
});

const result = await response.json();
console.log(result.text);

Supported Languages

Whisper supports 99+ languages including:
  • English (en)
  • Spanish (es)
  • French (fr)
  • German (de)
  • Italian (it)
  • Portuguese (pt)
  • Dutch (nl)
  • Russian (ru)
  • Japanese (ja)
  • Korean (ko)
  • Chinese (zh)
  • Arabic (ar)
  • Hindi (hi)
  • And many more…
Full list: https://platform.openai.com/docs/guides/speech-to-text/supported-languages

Use Cases

Meeting Transcription

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('meeting.m4a'),
  model: 'whisper-1',
  response_format: 'verbose_json'
});

// Extract speaker segments, timestamps, action items
const summary = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{
    role: 'user',
    content: `Summarize this meeting and extract action items:\n${transcription.text}`
  }]
});

Podcast to Blog Post

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('podcast.mp3'),
  model: 'whisper-1'
});

const blogPost = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{
    role: 'user',
    content: `Convert this podcast transcript to an engaging blog post:\n${transcription.text}`
  }]
});

Video Subtitles

import ffmpeg from 'fluent-ffmpeg';
import fs from 'fs';

// Extract audio from video, then transcribe it to SRT subtitles
ffmpeg('video.mp4')
  .output('audio.mp3')
  .on('end', async () => {
    const transcription = await client.audio.transcriptions.create({
      file: fs.createReadStream('audio.mp3'),
      model: 'whisper-1',
      response_format: 'srt'
    });

    // With srt/vtt/text formats the API returns the raw transcript string
    fs.writeFileSync('subtitles.srt', String(transcription));
  })
  .run();

Voice Notes App

const transcribeVoiceNote = async (audioBuffer: Buffer) => {
  const tempFile = `/tmp/${Date.now()}.webm`;
  fs.writeFileSync(tempFile, audioBuffer);
  
  const transcription = await client.audio.transcriptions.create({
    file: fs.createReadStream(tempFile),
    model: 'whisper-1',
    prompt: 'User voice note'
  });
  
  fs.unlinkSync(tempFile);
  return transcription.text;
};

Best Practices

  1. Prepare audio files
    • Use clear audio with minimal background noise
    • Convert to supported format if needed
    • Keep files under 25MB (split longer recordings)
  2. Specify language
    • Improves accuracy when language is known
    • Faster processing
    • Better handling of accents
  3. Use prompts effectively
    • Include domain-specific terms
    • Provide context for better accuracy
    • Help with proper nouns and technical terms
  4. Choose right format
    • Use json for simple transcripts
    • Use verbose_json for timestamps
    • Use srt/vtt for video subtitles
  5. Handle errors
    • Validate file size before upload
    • Check supported formats
    • Implement retry logic for network issues
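Points 1 and 5 can be combined into a small pre-upload check. A sketch using only the limits documented on this page (validateAudio is a hypothetical helper name, not part of any SDK):

```typescript
const MAX_BYTES = 25 * 1024 * 1024; // 25MB limit from this page
const SUPPORTED = ['mp3', 'mp4', 'm4a', 'mpeg', 'mpga', 'wav', 'webm', 'ogg', 'flac'];

// Throws before wasting an upload on a request the API would reject anyway.
function validateAudio(filename: string, sizeBytes: number): void {
  const ext = filename.split('.').pop()?.toLowerCase() ?? '';
  if (!SUPPORTED.includes(ext)) {
    throw new Error(`Unsupported audio format: ${ext}`);
  }
  if (sizeBytes > MAX_BYTES) {
    throw new Error('File size exceeds maximum of 25MB');
  }
}
```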

Limitations

  • Maximum file size: 25MB
  • Files larger than 25MB must be split
  • Optimal for audio between 10 seconds and 1 hour
  • Background noise affects accuracy
  • Multiple simultaneous speakers may reduce accuracy

Error Responses

File Too Large

{
  "error": "bad_request",
  "message": "File size exceeds maximum of 25MB"
}
HTTP Status: 400
Solution: Split audio file into smaller chunks.

Unsupported Format

{
  "error": "bad_request",
  "message": "Unsupported audio format. Supported: mp3, wav, webm, mp4, m4a, ogg, flac"
}
HTTP Status: 400
Solution: Convert audio to supported format.

Missing File

{
  "error": "bad_request",
  "message": "Audio file is required"
}
HTTP Status: 400
Solution: Include file in multipart form data.
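The 400 responses above are permanent failures: retrying the same request will not help. One possible shape for the retry logic recommended under Best Practices, which retries only transient errors (withRetry is a hypothetical helper, not part of any SDK):

```typescript
// Retry transient failures with exponential backoff;
// surface 4xx errors (like those documented above) immediately.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err: any) {
      lastError = err;
      // 4xx means the request itself is bad (file too large, wrong format);
      // retrying cannot fix it.
      if (err?.status && err.status >= 400 && err.status < 500) throw err;
      await new Promise((r) => setTimeout(r, 2 ** i * 1000)); // backoff
    }
  }
  throw lastError;
}
```

Wrap the transcription call itself, e.g. `withRetry(() => client.audio.transcriptions.create({ ... }))`.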

Rate Limits

Transcription requests count toward monthly quota:
  • Free: 1,000 requests/month
  • Pro: 50,000 requests/month
  • Enterprise: Custom limits

Pricing

Pricing based on audio duration (estimated from file size):
  • Whisper-1: $0.006/minute (provider cost)
  • Cencori charge: $0.0072/minute (20% markup)
Estimation: ~1 minute per 1MB for MP3 files.
Example: 10-minute audio (10MB) = $0.072
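The worked example can be reproduced with a small estimator (estimateCostUSD is a hypothetical helper applying the ~1MB-per-minute MP3 heuristic and the per-minute rate quoted above):

```typescript
const CENCORI_RATE = 0.0072; // $/minute, from the pricing above

// Estimate cost from MP3 file size using the ~1MB-per-minute heuristic.
function estimateCostUSD(fileSizeMB: number): number {
  const minutes = fileSizeMB; // ~1 minute per 1MB for MP3
  return minutes * CENCORI_RATE;
}

console.log(estimateCostUSD(10)); // ~$0.072 for a 10MB / ~10-minute file
```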

List Supported Formats

curl https://api.cencori.com/api/ai/audio/transcriptions \
  -H "Authorization: Bearer $CENCORI_API_KEY"
{
  "models": ["whisper-1"],
  "supported_formats": ["mp3", "mp4", "m4a", "mpeg", "mpga", "wav", "webm", "ogg", "flac"],
  "response_formats": ["json", "text", "srt", "verbose_json", "vtt"],
  "max_file_size": "25MB"
}
