The Chat Completions API accepts message content as either a plain string or an array of typed content blocks. Use this surface when you already depend on chat-style messages[] but still need images, PDFs, or audio in the same request.

Supported Content Blocks

text
  Shape: { "type": "text", "text": "..." }
  Notes: plain text

image_url
  Shape: { "type": "image_url", "image_url": { "url": "...", "detail": "auto" } }
  Notes: url accepts http, https, or data URLs; detail can be low, high, or auto

file
  Shape: { "type": "file", "file": { "filename": "...", "file_data": "..." } }
  Notes: file_data accepts http, https, or data URLs

input_audio
  Shape: { "type": "input_audio", "input_audio": { "data": "...", "format": "wav" } }
  Notes: data must be raw base64, not a data URI; common formats include wav and mp3

When to use this surface

  • you already have chat-based client code and want to keep it
  • you need multimodal inputs but do not want to migrate to Responses yet
If you are starting from scratch, the Responses API is usually the cleaner multimodal path.

Image Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image briefly."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/receipt.png",
                        "detail": "auto",
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
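Because image_url.url also accepts data URLs, a local image can be inlined as base64 instead of hosted. A minimal sketch of building that block; the bytes below are a placeholder standing in for a real PNG file:

```python
import base64

# Placeholder bytes standing in for a real image;
# in practice: png_bytes = open("receipt.png", "rb").read()
png_bytes = b"\x89PNG\r\n\x1a\n"

# Build a data URL: "data:<mime-type>;base64,<payload>"
b64 = base64.b64encode(png_bytes).decode("ascii")
data_url = f"data:image/png;base64,{b64}"

# Same image_url block shape as above, with a data URL in place of https
image_block = {
    "type": "image_url",
    "image_url": {"url": data_url, "detail": "auto"},
}
```

The block then drops into the content array exactly where the https version sat.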

File Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",
)

completion = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this PDF."},
                {
                    "type": "file",
                    "file": {
                        "filename": "report.pdf",
                        "file_data": "https://example.com/report.pdf",
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
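Since file_data also accepts data URLs, a local PDF can be inlined rather than hosted. A hedged sketch; the bytes are a placeholder for real PDF content:

```python
import base64

# Placeholder standing in for real PDF bytes;
# in practice: pdf_bytes = open("report.pdf", "rb").read()
pdf_bytes = b"%PDF-1.4 placeholder"

# file_data as a data URL: "data:application/pdf;base64,<payload>"
file_block = {
    "type": "file",
    "file": {
        "filename": "report.pdf",
        "file_data": "data:application/pdf;base64,"
        + base64.b64encode(pdf_bytes).decode("ascii"),
    },
}
```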

Audio Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",
)

completion = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Tell me what is said in this audio."},
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "BASE64_AUDIO",
                        "format": "wav",
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
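input_audio.data is the one field that takes raw base64 with no data: prefix. A sketch of preparing it; the bytes are a placeholder standing in for a real WAV file:

```python
import base64

# Placeholder standing in for real WAV bytes;
# in practice: wav_bytes = open("clip.wav", "rb").read()
wav_bytes = b"RIFF....WAVEfmt "

# Raw base64 only -- do NOT prepend "data:audio/wav;base64,"
audio_block = {
    "type": "input_audio",
    "input_audio": {
        "data": base64.b64encode(wav_bytes).decode("ascii"),
        "format": "wav",
    },
}
```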

Caveats

  • input_audio.data must be raw base64, not a data: URL
  • multimodal support still depends on the selected model
  • use the direct Audio API for transcription, translation, and text-to-speech
  • use the direct Images API for primary image generation and image edit workflows

Common mistakes

  • mixing chat content-block syntax with Responses input-part syntax
  • assuming every model supports every multimodal block type
  • using this surface for dedicated audio or image workflows that belong on the direct APIs
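To make the first mistake concrete, here is the same text-plus-image input in both syntaxes. The Responses shapes reflect my understanding of that API (input_text / input_image parts, with image_url as a plain string) and should be checked against its reference:

```python
# Chat Completions content blocks (this surface)
chat_content = [
    {"type": "text", "text": "Describe this image."},
    {"type": "image_url", "image_url": {"url": "https://example.com/receipt.png"}},
]

# Responses API input parts: different type names, and image_url
# is a string field rather than a nested object
responses_content = [
    {"type": "input_text", "text": "Describe this image."},
    {"type": "input_image", "image_url": "https://example.com/receipt.png"},
]

# The two sets of block types do not overlap; mixing them in one
# request is what triggers validation errors
chat_types = {block["type"] for block in chat_content}
responses_types = {part["type"] for part in responses_content}
```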