Endpoint
POST https://api.cencori.com/api/ai/audio/transcriptions
Transcribe audio files to text using OpenAI Whisper. Supports multiple audio formats and languages.
Authentication
Requires API key in Authorization header or CENCORI_API_KEY header.
This endpoint expects multipart/form-data (file upload).
Request Parameters
file (required): Audio file to transcribe. Supported formats:
- mp3, mp4, m4a, mpeg, mpga
- wav, webm
- ogg, flac
Maximum file size: 25MB
model: Transcription model. Default: whisper-1. Currently only whisper-1 is supported.
language: Language code (ISO-639-1). Examples: en, es, fr, de, ja, zh. If not specified, the language is auto-detected.
prompt: Optional text to guide the model's style or continue a previous segment. Helps with:
- Correct spelling of names/terms
- Maintaining context
- Improving punctuation
response_format: Format of the transcript. Options:
json (default) - JSON with text
text - Plain text
srt - SubRip subtitle format
verbose_json - JSON with word-level timestamps
vtt - WebVTT subtitle format
temperature: Sampling temperature (0-1). Default: 0. Higher values increase randomness.
Response
JSON Format (default)
{
"text": "Transcribed text content"
}
Verbose JSON Format
{
"task": "transcribe",
"language": "english",
"duration": 8.5,
"text": "Hello, this is a test.",
"words": [
{
"word": "Hello",
"start": 0.0,
"end": 0.5
},
{
"word": "this",
"start": 0.5,
"end": 0.8
}
// ...
]
}
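The words array in a verbose_json response can be post-processed client-side, for example to compute per-word durations. A minimal sketch (wordDurations is an illustrative helper, not part of the API):

```javascript
// Compute the duration (in seconds) of each word entry from a
// verbose_json transcription response.
function wordDurations(words) {
  return words.map(({ word, start, end }) => ({
    word,
    duration: Number((end - start).toFixed(2)),
  }));
}

// Example using the response shown above:
const words = [
  { word: 'Hello', start: 0.0, end: 0.5 },
  { word: 'this', start: 0.5, end: 0.8 },
];
console.log(wordDurations(words));
// [ { word: 'Hello', duration: 0.5 }, { word: 'this', duration: 0.3 } ]
```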
Text Format
Plain text response (Content-Type: text/plain)
SRT Format
1
00:00:00,000 --> 00:00:02,000
Hello, this is a test.
2
00:00:02,000 --> 00:00:05,000
This is the second subtitle.
VTT Format
WEBVTT
00:00:00.000 --> 00:00:02.000
Hello, this is a test.
00:00:02.000 --> 00:00:05.000
This is the second subtitle.
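The two subtitle formats above differ mainly in the leading WEBVTT header, the numeric cue counters, and the timestamp decimal separator (comma vs. dot), so a rough client-side conversion is possible. A sketch, not an official utility:

```javascript
// Convert an SRT transcript to WebVTT: add the WEBVTT header,
// drop numeric cue counters, and switch comma decimals to dots.
function srtToVtt(srt) {
  const body = srt
    .split(/\r?\n/)
    .filter((line) => !/^\d+$/.test(line.trim())) // drop cue numbers
    .map((line) =>
      line.replace(
        /(\d{2}:\d{2}:\d{2}),(\d{3})/g, // 00:00:02,000 -> 00:00:02.000
        '$1.$2'
      )
    )
    .join('\n');
  return `WEBVTT\n\n${body}`;
}
```

Requesting response_format=vtt directly avoids the conversion; this helper is only useful when an SRT file already exists.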
Examples
Basic Transcription
curl https://api.cencori.com/api/ai/audio/transcriptions \
-H "Authorization: Bearer $CENCORI_API_KEY" \
-F [email protected] \
-F model=whisper-1
{
"text": "Hello and welcome to Cencori. This is a demo of the audio transcription API."
}
With Language Specification
curl https://api.cencori.com/api/ai/audio/transcriptions \
-H "Authorization: Bearer $CENCORI_API_KEY" \
-F [email protected] \
-F model=whisper-1 \
-F language=en
With Prompt for Context
curl https://api.cencori.com/api/ai/audio/transcriptions \
-H "Authorization: Bearer $CENCORI_API_KEY" \
-F [email protected] \
-F model=whisper-1 \
-F prompt="This is a recording about Cencori API and AI technology."
Generate Subtitles (SRT)
curl https://api.cencori.com/api/ai/audio/transcriptions \
-H "Authorization: Bearer $CENCORI_API_KEY" \
-F [email protected] \
-F model=whisper-1 \
-F response_format=srt > subtitles.srt
Verbose JSON with Timestamps
curl https://api.cencori.com/api/ai/audio/transcriptions \
-H "Authorization: Bearer $CENCORI_API_KEY" \
-F [email protected] \
-F model=whisper-1 \
-F response_format=verbose_json
Using OpenAI SDK
import OpenAI from 'openai';
import fs from 'fs';
const client = new OpenAI({
apiKey: process.env.CENCORI_API_KEY,
baseURL: 'https://api.cencori.com/v1'
});
const transcription = await client.audio.transcriptions.create({
file: fs.createReadStream('audio.mp3'),
model: 'whisper-1',
language: 'en',
response_format: 'verbose_json'
});
console.log(transcription.text);
Transcribe with Progress Tracking
import fs from 'fs';
import FormData from 'form-data';

// Track bytes read from disk as a rough proxy for upload progress
const fileStream = fs.createReadStream('large-audio.mp3');
let bytesRead = 0;
fileStream.on('data', (chunk) => {
  bytesRead += chunk.length;
  console.log(`Read ${bytesRead} bytes`);
});

const form = new FormData();
form.append('file', fileStream);
form.append('model', 'whisper-1');

const response = await fetch('https://api.cencori.com/api/ai/audio/transcriptions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.CENCORI_API_KEY}`,
    ...form.getHeaders()
  },
  body: form
});

const result = await response.json();
console.log(result.text);
Supported Languages
Whisper supports 99+ languages including:
- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Italian (it)
- Portuguese (pt)
- Dutch (nl)
- Russian (ru)
- Japanese (ja)
- Korean (ko)
- Chinese (zh)
- Arabic (ar)
- Hindi (hi)
- And many more…
Full list: https://platform.openai.com/docs/guides/speech-to-text/supported-languages
Use Cases
Meeting Transcription
const transcription = await client.audio.transcriptions.create({
file: fs.createReadStream('meeting.m4a'),
model: 'whisper-1',
response_format: 'verbose_json'
});
// Extract speaker segments, timestamps, action items
const summary = await client.chat.completions.create({
model: 'gpt-4o',
messages: [{
role: 'user',
content: `Summarize this meeting and extract action items:\n${transcription.text}`
}]
});
Podcast to Blog Post
const transcription = await client.audio.transcriptions.create({
file: fs.createReadStream('podcast.mp3'),
model: 'whisper-1'
});
const blogPost = await client.chat.completions.create({
model: 'gpt-4o',
messages: [{
role: 'user',
content: `Convert this podcast transcript to an engaging blog post:\n${transcription.text}`
}]
});
Video Subtitles
import fs from 'fs';
import ffmpeg from 'fluent-ffmpeg';

// Extract audio from video, then transcribe it to SRT subtitles
ffmpeg('video.mp4')
  .output('audio.mp3')
  .on('end', async () => {
    const transcription = await client.audio.transcriptions.create({
      file: fs.createReadStream('audio.mp3'),
      model: 'whisper-1',
      response_format: 'srt'
    });
    // With response_format 'srt', the response body is the raw SRT text
    fs.writeFileSync('subtitles.srt', transcription);
  })
  .run();
Voice Notes App
const transcribeVoiceNote = async (audioBuffer: Buffer) => {
const tempFile = `/tmp/${Date.now()}.webm`;
fs.writeFileSync(tempFile, audioBuffer);
const transcription = await client.audio.transcriptions.create({
file: fs.createReadStream(tempFile),
model: 'whisper-1',
prompt: 'User voice note'
});
fs.unlinkSync(tempFile);
return transcription.text;
};
Best Practices
- Prepare audio files
  - Use clear audio with minimal background noise
  - Convert to a supported format if needed
  - Keep files under 25MB (split longer recordings)
- Specify the language
  - Improves accuracy when the language is known
  - Faster processing
  - Better handling of accents
- Use prompts effectively
  - Include domain-specific terms
  - Provide context for better accuracy
  - Help with proper nouns and technical terms
- Choose the right format
  - Use json for simple transcripts
  - Use verbose_json for timestamps
  - Use srt/vtt for video subtitles
- Handle errors
  - Validate file size before upload
  - Check supported formats
  - Implement retry logic for network issues
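The file-size and format checks above can be performed client-side before uploading. A minimal sketch (validateAudioFile is a hypothetical helper; the limits mirror the 25MB maximum and format list documented here):

```javascript
const SUPPORTED = ['mp3', 'mp4', 'm4a', 'mpeg', 'mpga', 'wav', 'webm', 'ogg', 'flac'];
const MAX_BYTES = 25 * 1024 * 1024; // 25MB limit

// Validate an audio file before sending it to the API.
// Returns a list of problems; an empty list means the file looks uploadable.
function validateAudioFile(filename, sizeBytes) {
  const problems = [];
  const ext = filename.split('.').pop().toLowerCase();
  if (!SUPPORTED.includes(ext)) {
    problems.push(`Unsupported format: ${ext}`);
  }
  if (sizeBytes > MAX_BYTES) {
    problems.push('File exceeds 25MB; split it into smaller chunks');
  }
  return problems;
}
```

Failing fast on these checks avoids wasting an upload that the API would reject with a 400 anyway.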
Limitations
- Maximum file size: 25MB
- Files larger than 25MB must be split
- Optimal for audio between 10 seconds and 1 hour
- Background noise affects accuracy
- Multiple simultaneous speakers may reduce accuracy
Error Responses
File Too Large
{
"error": "bad_request",
"message": "File size exceeds maximum of 25MB"
}
HTTP Status: 400
Solution: Split audio file into smaller chunks.
Unsupported Format
{
"error": "bad_request",
"message": "Unsupported audio format. Supported: mp3, wav, webm, mp4, m4a, ogg, flac"
}
HTTP Status: 400
Solution: Convert audio to supported format.
Missing File
{
"error": "bad_request",
"message": "Audio file is required"
}
HTTP Status: 400
Solution: Include file in multipart form data.
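The retry advice under Best Practices applies to transient failures only; the 400 errors above will not succeed on retry. A sketch of a backoff wrapper (withRetry is a hypothetical helper, not part of any SDK):

```javascript
// Retry a request up to `maxRetries` times with exponential backoff.
// Only retries on network errors / 5xx responses; 4xx errors (bad
// format, file too large) are not retried since the request itself
// must change.
async function withRetry(fn, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retriable = !err.status || err.status >= 500;
      if (!retriable || attempt === maxRetries) throw err;
      await new Promise((r) => setTimeout(r, 2 ** attempt * 500));
    }
  }
}
```

Usage: `await withRetry(() => transcribe(file))`, where transcribe is your upload call that attaches the HTTP status to thrown errors.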
Rate Limits
Transcription requests count toward monthly quota:
- Free: 1,000 requests/month
- Pro: 50,000 requests/month
- Enterprise: Custom limits
Pricing
Pricing based on audio duration (estimated from file size):
- Whisper-1: $0.006/minute (provider cost)
- Cencori charge: $0.0072/minute (20% markup)
Estimation: ~1 minute per 1MB for MP3 files
Example: 10-minute audio (10MB) = $0.072
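The estimation rule above (~1 minute per 1MB of MP3 at $0.0072/minute) can be turned into a quick cost estimator. A sketch; actual billing is based on measured audio duration:

```javascript
const RATE_PER_MINUTE = 0.0072; // Cencori rate from the pricing section above

// Estimate transcription cost from MP3 file size, using the
// ~1 minute per 1MB rule of thumb.
function estimateCostUSD(fileSizeMB) {
  const estimatedMinutes = fileSizeMB; // ~1 min per MB for MP3
  return Number((estimatedMinutes * RATE_PER_MINUTE).toFixed(4));
}

console.log(estimateCostUSD(10)); // 10MB (~10 min) file -> 0.072
```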
Endpoint Information
A GET request to the same URL returns supported models and formats:
curl https://api.cencori.com/api/ai/audio/transcriptions \
-H "Authorization: Bearer $CENCORI_API_KEY"
{
"models": ["whisper-1"],
"supported_formats": ["mp3", "mp4", "m4a", "mpeg", "mpga", "wav", "webm", "ogg", "flac"],
"response_formats": ["json", "text", "srt", "verbose_json", "vtt"],
"max_file_size": "25MB"
}