Multimodal Inputs - NagaAI Documentation

The Responses API accepts either a plain string or an array of typed input items. For multimodal requests, send message items whose `content` array contains typed parts. Use this page when your prompt needs more than plain text, such as screenshots, PDFs, or audio clips.
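The two accepted input shapes can be sketched as plain Python values. This is a minimal sketch: the variable names and the `part_types` helper are hypothetical, and only the `type`, `role`, and `content` keys come from this page. Either value would be passed as `input=` to `client.responses.create`.

```python
# Shape 1: a plain string input, enough for text-only prompts.
text_input = "List three things a receipt usually shows."

# Shape 2: an array of typed input items. A message item carries a
# content array of typed parts, which is what multimodal prompts need.
multimodal_input = [
    {
        "type": "message",
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Summarize this receipt."},
            {"type": "input_image", "image_url": "https://example.com/receipt.png"},
        ],
    }
]


def part_types(items):
    """Hypothetical helper: list the part types used (empty for plain strings)."""
    if isinstance(items, str):
        return []
    return [part["type"] for item in items for part in item.get("content", [])]


print(part_types(text_input))        # []
print(part_types(multimodal_input))  # ['input_text', 'input_image']
```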

Supported Input Parts

Part type	Main fields	Notes
`input_text`	`text`	Plain text input
`input_image`	`image_url`, optional `detail`	`image_url` accepts `http`, `https`, or `data` URLs
`input_audio`	`input_audio` object	A common payload uses base64 `data` plus `format`
`input_file`	`filename`, `file_data`, `file_url`, or nested `input_file`	Send inline URLs or `data:` URL payloads

Image Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",
)

response = client.responses.create(
    model="gpt-4.1",
    input=[
        {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Summarize this receipt."},
                {
                    "type": "input_image",
                    "image_url": "https://example.com/receipt.png",
                    "detail": "high",
                },
            ],
        }
    ],
)

print(response.output_text)
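Because `image_url` also accepts `data` URLs, a local image can be sent without hosting it. This is a minimal sketch using only the standard library; `image_data_url` is a hypothetical helper, and the MIME type is guessed from the file extension, which assumes the extension is accurate.

```python
import base64
import mimetypes


def image_data_url(raw: bytes, filename: str) -> str:
    """Encode image bytes as a data: URL usable as an input_image image_url."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"


# In practice, raw would come from open("receipt.png", "rb").read().
# Here only the 8-byte PNG signature is used as placeholder data.
part = {
    "type": "input_image",
    "image_url": image_data_url(b"\x89PNG\r\n\x1a\n", "receipt.png"),
    "detail": "high",
}
print(part["image_url"])  # data:image/png;base64,iVBORw0KGgo=
```

The resulting dict drops into the `content` array in place of the URL-based `input_image` part above.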

File And PDF Inputs

Send files inline, either as a URL or as a base64 `data:` URL, so the gateway can forward them through its chat-style pipeline.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",
)

response = client.responses.create(
    model="gpt-4.1",
    input=[
        {
            "type": "message",
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "Read this PDF and list the main obligations.",
                },
                {
                    "type": "input_file",
                    "filename": "policy.pdf",
                    "file_data": "https://example.com/policy.pdf",
                },
            ],
        }
    ],
)

print(response.output_text)

Supported file patterns include:

{"type":"input_file","filename":"policy.pdf","file_data":"https://example.com/policy.pdf"}
{"type":"input_file","filename":"policy.pdf","file_data":"data:application/pdf;base64,..."}
{"type":"input_file","input_file":{"filename":"policy.pdf","file_data":"https://example.com/policy.pdf"}}
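For a PDF on local disk, the `data:application/pdf;base64,...` pattern can be produced with the standard library. This sketch assumes the PDF fits comfortably in memory; `pdf_file_part` and its `nested` flag are hypothetical names that switch between the flat and nested shapes listed above.

```python
import base64


def pdf_file_part(filename: str, pdf_bytes: bytes, nested: bool = False) -> dict:
    """Build an input_file part carrying a PDF as a base64 data: URL."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    payload = {
        "filename": filename,
        "file_data": "data:application/pdf;base64," + encoded,
    }
    if nested:
        # Nested shape: the fields sit under an inner "input_file" object.
        return {"type": "input_file", "input_file": payload}
    # Flat shape: filename and file_data sit directly on the part.
    return {"type": "input_file", **payload}


# In practice, pdf_bytes would come from open("policy.pdf", "rb").read().
flat = pdf_file_part("policy.pdf", b"%PDF-1.7")
nested = pdf_file_part("policy.pdf", b"%PDF-1.7", nested=True)
```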

`file_id` references are not supported on this public gateway path; send the file inline instead.
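Since `file_id` references are rejected on this path, a client can scan its input before sending. This is a hypothetical pre-flight check, not a gateway feature; it looks at both the flat and the nested `input_file` shapes and reports where a `file_id` appears.

```python
def find_file_id_parts(input_items) -> list:
    """Return (item_index, part_index) pairs for input_file parts that use file_id."""
    hits = []
    if isinstance(input_items, str):
        return hits  # plain string input has no parts
    for i, item in enumerate(input_items):
        for j, part in enumerate(item.get("content", [])):
            if part.get("type") != "input_file":
                continue
            # Check both the flat shape and the nested input_file shape.
            nested = part.get("input_file") or {}
            if "file_id" in part or "file_id" in nested:
                hits.append((i, j))
    return hits
```

Raising an error when the returned list is non-empty keeps the failure local and points at the exact part to replace with an inline payload.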

Audio Inputs

For multimodal audio understanding, send an `input_audio` part. A common payload shape is:

{
  "type": "input_audio",
  "input_audio": {
    "data": "UklGRiQAAABXQVZFZm10IBAAAAABAAEAIlYAAESsAAACABAAZGF0YQAAAAA=",
    "format": "wav"
  }
}
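The `data` field above is a base64-encoded WAV file. The sketch below shows one way to produce it with the standard library; `silent_wav` only generates placeholder silence so the example runs without an audio file, and both helper names are hypothetical. The header check assumes a RIFF/WAVE container, matching `"format": "wav"`.

```python
import base64
import io
import wave


def silent_wav(num_frames: int = 2205, rate: int = 22050) -> bytes:
    """Generate mono 16-bit PCM silence in memory (stand-in for a real clip)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * num_frames)
    return buf.getvalue()


def audio_part_from_wav(wav_bytes: bytes) -> dict:
    """Base64-encode WAV bytes into an input_audio part with format "wav"."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("expected a RIFF/WAVE file")
    return {
        "type": "input_audio",
        "input_audio": {
            "data": base64.b64encode(wav_bytes).decode("ascii"),
            "format": "wav",
        },
    }


# In practice, wav_bytes would come from open("clip.wav", "rb").read().
part = audio_part_from_wav(silent_wav())
```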

Use the direct Audio API instead when the job is transcription, translation, or text-to-speech.

Input Validation

Model and capability checks happen centrally: if the chosen model does not support one of the input types you send, the request can fail before generation begins.
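A client can mirror this check locally to fail earlier with a clearer message. The capability table below is entirely hypothetical (the model names are placeholders, not gateway data) and would need to be filled in from the models you actually use; unlisted models are treated as supporting nothing, so every part type is flagged.

```python
# Hypothetical capability table; these entries are illustrative only.
MODEL_INPUT_TYPES = {
    "example-text-model": {"input_text"},
    "example-vision-model": {"input_text", "input_image", "input_file"},
}


def unsupported_part_types(model: str, input_items) -> set:
    """Return part types in input_items that the model is not listed as supporting."""
    allowed = MODEL_INPUT_TYPES.get(model, set())
    if isinstance(input_items, str):
        used = {"input_text"}  # a plain string counts as text input
    else:
        used = {part["type"] for item in input_items for part in item.get("content", [])}
    return used - allowed
```

Calling this before `client.responses.create` and refusing to send when the result is non-empty catches a model mismatch without a round trip; the gateway's own validation remains the authority.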

Common mistakes

using a plain string input when the request actually needs typed multimodal parts
trying to send file_id references on this gateway path
choosing a model that does not support the input type you are sending
using Responses multimodal input for workflows that should really use the dedicated Audio API or Images API
