File Formats and Uploads - NagaAI Documentation

Audio transcription and translation use multipart uploads. Text-to-speech does not upload a file, but it does let you choose an output audio format.

Transcription and translation uploads

For POST /v1/audio/transcriptions and POST /v1/audio/translations, the request body is multipart/form-data.

Required and optional fields

Field	Required	Notes
`model`	Yes	The transcription or translation model
`file`	Yes	Binary audio file upload
`prompt`	No	Short hint for formatting or name preservation
`language`	No	Language hint, for example `en`

Example upload

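from pathlib import Path
import requests

# Open the audio in binary mode so requests can stream it as the `file` part.
with Path("sample.mp3").open("rb") as audio_file:
    response = requests.post(
        "https://api.naga.ac/v1/audio/transcriptions",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        # Non-file form fields; `language` is optional.
        data={"model": "whisper-1", "language": "en"},
        # Passing `files` makes requests encode the body as multipart/form-data.
        files={"file": audio_file},
    )

# Raise on 4xx/5xx responses before reading the transcript.
response.raise_for_status()
print(response.json()["text"])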
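
The field table above maps directly onto the form data of a multipart request. The sketch below is one possible way to assemble those fields so that optional values are sent only when they are set, and to reuse the same upload code for both endpoints. `build_form_fields` and `upload_audio` are hypothetical helper names, the 120-second timeout is an arbitrary choice, and using `whisper-1` with the translations endpoint is an assumption that this page does not confirm.

```python
from pathlib import Path
from typing import Optional

BASE_URL = "https://api.naga.ac/v1/audio"


def build_form_fields(model: str, prompt: Optional[str] = None,
                      language: Optional[str] = None) -> dict:
    """Return the non-file form fields, omitting optional ones left unset."""
    fields = {"model": model}
    if prompt:
        fields["prompt"] = prompt
    if language:
        fields["language"] = language
    return fields


def upload_audio(path: str, task: str, api_key: str, **fields) -> dict:
    """POST an audio file to the transcriptions or translations endpoint."""
    if task not in ("transcriptions", "translations"):
        raise ValueError(f"unknown task: {task!r}")
    # Imported lazily so build_form_fields stays usable without requests installed.
    import requests

    with Path(path).open("rb") as audio_file:
        response = requests.post(
            f"{BASE_URL}/{task}",
            headers={"Authorization": f"Bearer {api_key}"},
            data=build_form_fields(**fields),
            files={"file": audio_file},
            timeout=120,  # assumed value; tune for your file sizes
        )
    response.raise_for_status()
    return response.json()


# Example call (not executed here):
# upload_audio("sample.mp3", "translations", "YOUR_API_KEY", model="whisper-1")
```

Keeping the field assembly separate from the network call makes it easy to check what will be sent without issuing a request.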

Validation and decoding

The gateway attempts to decode the uploaded audio before processing it. If the file cannot be decoded, the API returns an invalid_request_error. The OpenAPI contract requires a binary file field but does not publish a strict file-extension allowlist. In practice, upload standard, decodable audio files and validate them in your own pipeline first.
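
As an illustration of validating files in your own pipeline, the sketch below runs a cheap local check before upload. It uses only the Python standard library, so it can parse WAV files but nothing else; other containers pass through unchecked and would need a real decoder. `looks_uploadable` is a hypothetical helper, not part of the API, and it does not reproduce the gateway's own decoding rules.

```python
import wave
from pathlib import Path


def looks_uploadable(path: str) -> bool:
    """Cheap local check: the file exists, is non-empty, and WAV files parse."""
    file = Path(path)
    if not file.is_file() or file.stat().st_size == 0:
        return False
    if file.suffix.lower() == ".wav":
        try:
            # wave.open raises wave.Error or EOFError on malformed headers.
            with wave.open(str(file), "rb") as wav:
                return wav.getnframes() > 0
        except (wave.Error, EOFError):
            return False
    # Other formats are not inspected here; the gateway makes the final call.
    return True
```

A check like this only filters out obviously broken files early. A file that passes can still be rejected by the gateway with an invalid_request_error.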

Text-to-speech output formats

For POST /v1/audio/speech, the response_format field supports:

mp3
opus
aac
flac
wav
pcm
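
A text-to-speech request might look like the sketch below, which rejects any response_format outside the list above before sending anything. Only the endpoint path and the response_format values come from this page. The `input` and `voice` field names, the `tts-1` and `alloy` defaults, and the JSON request body are assumptions based on OpenAI-style APIs, and `build_speech_payload` and `synthesize` are hypothetical helper names.

```python
from pathlib import Path

SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_payload(text: str, response_format: str = "mp3",
                         model: str = "tts-1", voice: str = "alloy") -> dict:
    """Build the request body, rejecting formats not in the list above."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    # `input`, `voice`, and the default values are assumptions, not confirmed here.
    return {"model": model, "input": text, "voice": voice,
            "response_format": response_format}


def synthesize(text: str, out_path: str, api_key: str, **options) -> None:
    """Request speech audio and write the returned bytes to out_path."""
    payload = build_speech_payload(text, **options)
    # Imported lazily so build_speech_payload works without requests installed.
    import requests

    response = requests.post(
        "https://api.naga.ac/v1/audio/speech",
        headers={"Authorization": f"Bearer {api_key}"},
        json=payload,
        timeout=120,  # assumed value
    )
    response.raise_for_status()
    Path(out_path).write_bytes(response.content)


# Example call (not executed here):
# synthesize("Hello there", "hello.wav", "YOUR_API_KEY", response_format="wav")
```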

Practical advice

use clean source files whenever possible
prefer standard, decodable audio containers and codecs
keep prompts short and specific when you need name preservation or formatting hints
