Audio transcription and translation use multipart uploads. Text-to-speech does not upload a file, but it does let you choose an output audio format.

Transcription and translation uploads

For POST /v1/audio/transcriptions and POST /v1/audio/translations, the request body is multipart/form-data.

Required and optional fields

| Field | Required | Notes |
|---|---|---|
| model | Yes | The transcription or translation model |
| file | Yes | Binary audio file upload |
| prompt | No | Short hint for formatting or name preservation |
| language | No | Optional language hint |

Example upload

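from pathlib import Path
import requests

# Open the audio in binary mode so requests can stream it as the "file" part.
with Path("sample.mp3").open("rb") as audio_file:
    response = requests.post(
        "https://api.naga.ac/v1/audio/transcriptions",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        # Plain form fields go in data; requests builds the multipart body.
        data={"model": "whisper-1", "language": "en"},
        files={"file": audio_file},
        timeout=60,  # avoid hanging indefinitely on a stalled upload
    )

# Raise on 4xx/5xx responses before reading the transcript.
response.raise_for_status()
print(response.json()["text"])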

Validation and decoding

The gateway attempts to decode the uploaded audio before processing it. If the file cannot be decoded, the API returns an invalid_request_error. The OpenAPI contract requires a binary file field, but it does not publish a strict file-extension allowlist here. In practice, use standard decodable audio files and validate them in your own pipeline before upload.
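
As one way to validate files before upload, the sketch below checks that a byte string parses as WAV using Python's standard wave module, and inspects an error payload for the invalid_request_error type. It covers WAV only; other containers would need a different decoder. The helper names are hypothetical, and the error body shape {"error": {"type": ...}} is an assumption that this page does not confirm.

```python
import io
import wave


def is_decodable_wav(data: bytes) -> bool:
    """Return True if data parses as a WAV file with at least one audio frame."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            return wav.getnframes() > 0
    except (wave.Error, EOFError):
        # wave.Error: not a valid RIFF/WAVE stream; EOFError: truncated or empty input.
        return False


def is_invalid_request(body: dict) -> bool:
    """Check a parsed error payload for the invalid_request_error type.

    Assumes a body shaped like {"error": {"type": "..."}}; this shape is an
    assumption, not something the page specifies.
    """
    error = body.get("error")
    return isinstance(error, dict) and error.get("type") == "invalid_request_error"


def make_silent_wav(n_frames: int = 800) -> bytes:
    """Build a short mono 16-bit silent WAV in memory, for trying the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(8000)
        wav.writeframes(b"\x00\x00" * n_frames)
    return buffer.getvalue()
```

Running is_decodable_wav on the bytes read from a file before calling the API lets a pipeline reject undecodable input locally instead of waiting for the gateway to refuse it.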

Text-to-speech output formats

For POST /v1/audio/speech, the response_format field supports:
  • mp3
  • opus
  • aac
  • flac
  • wav
  • pcm
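
A minimal sketch of choosing an output format for a speech request follows. Only the response_format field and its values come from the list above. The model, input, and voice field names are assumptions, as is the expectation that the response body contains raw audio bytes; the function names are hypothetical.

```python
SUPPORTED_FORMATS = ("mp3", "opus", "aac", "flac", "wav", "pcm")


def build_speech_payload(text: str, model: str, voice: str,
                         response_format: str = "mp3") -> dict:
    """Build a JSON body for POST /v1/audio/speech.

    The model, input, and voice field names are assumptions; only
    response_format is documented on this page.
    """
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }


def synthesize(text: str, model: str, voice: str, api_key: str,
               response_format: str = "mp3") -> str:
    """Request speech audio and save it; returns the output path."""
    # Imported here so build_speech_payload works without requests installed.
    import requests

    response = requests.post(
        "https://api.naga.ac/v1/audio/speech",
        headers={"Authorization": f"Bearer {api_key}"},
        json=build_speech_payload(text, model, voice, response_format),
        timeout=60,
    )
    response.raise_for_status()
    # Assumes the response body is the encoded audio itself.
    out_path = f"speech.{response_format}"
    with open(out_path, "wb") as out_file:
        out_file.write(response.content)
    return out_path
```

Validating response_format locally catches a typo such as "ogg" before any request is sent.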

Practical advice

  • use clean source files whenever possible
  • prefer standard, decodable audio containers and codecs
  • keep prompts short and specific when you need name preservation or formatting hints
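
The advice on prompts can be applied when assembling the form fields for an upload. The helper below is a hypothetical sketch: it collapses stray whitespace in the prompt and leaves optional fields out of the form entirely when they are not set. The function name and the whitespace handling are illustrative choices, not part of the API.

```python
from typing import Optional


def build_transcription_fields(model: str,
                               prompt: Optional[str] = None,
                               language: Optional[str] = None) -> dict:
    """Assemble the non-file form fields for a transcription or translation upload."""
    fields = {"model": model}
    if prompt:
        # Collapse runs of whitespace and newlines so the hint stays compact.
        cleaned = " ".join(prompt.split())
        if cleaned:
            fields["prompt"] = cleaned
    if language and language.strip():
        fields["language"] = language.strip()
    return fields
```

The returned dictionary can be passed as the data argument of requests.post alongside files={"file": audio_file}.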