The Responses API accepts either a plain string input or an array of typed input items. For multimodal requests, use message items whose content array contains typed parts. Use this page when your prompt needs more than plain text, for example screenshots, PDFs, or audio clips.

Supported Input Parts

Part type    Main fields                                          Notes
input_text   text                                                 Plain text input
input_image  image_url, optional detail                           image_url accepts http, https, or data URLs
input_audio  input_audio object                                   Common payload uses data plus format
input_file   filename, file_data, file_url, or nested input_file  Use inline URL or data payloads

Image Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",
)

response = client.responses.create(
    model="gpt-4.1",
    input=[
        {
            "type": "message",
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Summarize this receipt."},
                {
                    "type": "input_image",
                    "image_url": "https://example.com/receipt.png",
                    "detail": "high",
                },
            ],
        }
    ],
)

print(response.output_text)
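
When the image lives on disk rather than at a public URL, the table above says image_url also accepts data URLs. The following is a minimal sketch of building such a part; the helper name image_part_from_bytes is hypothetical, and it assumes the gateway accepts the standard data:<mime>;base64,<payload> form.

```python
import base64
import mimetypes
from typing import Optional


def image_part_from_bytes(data: bytes, filename: str, detail: Optional[str] = None) -> dict:
    """Build an input_image part whose image_url is a base64 data URL.

    Hypothetical helper for illustration; not part of any SDK.
    """
    # Infer the MIME type from the file extension (e.g. .png -> image/png).
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")

    encoded = base64.b64encode(data).decode("ascii")
    part = {"type": "input_image", "image_url": f"data:{mime};base64,{encoded}"}

    # detail is optional, so only include it when the caller sets it.
    if detail is not None:
        part["detail"] = detail
    return part
```

The returned dict can be placed in the content array in place of the URL-based input_image part shown in the example above.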

File And PDF Inputs

Use inline file payloads that the gateway can forward through its chat-style pipeline.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",
)

response = client.responses.create(
    model="gpt-4.1",
    input=[
        {
            "type": "message",
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": "Read this PDF and list the main obligations.",
                },
                {
                    "type": "input_file",
                    "filename": "policy.pdf",
                    "file_data": "https://example.com/policy.pdf",
                },
            ],
        }
    ],
)

print(response.output_text)

Supported file patterns include:
  • {"type":"input_file","filename":"policy.pdf","file_data":"https://example.com/policy.pdf"}
  • {"type":"input_file","filename":"policy.pdf","file_data":"data:application/pdf;base64,..."}
  • {"type":"input_file","input_file":{"filename":"policy.pdf","file_data":"https://example.com/policy.pdf"}}
file_id is not supported on this public gateway path.
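
Because file_id references are unavailable here, a local PDF has to travel inline. The sketch below builds the second pattern from the list above (a data:application/pdf;base64,... payload) from raw bytes; pdf_part_from_bytes is a hypothetical helper name, not an SDK function.

```python
import base64


def pdf_part_from_bytes(data: bytes, filename: str) -> dict:
    """Build an input_file part that carries a PDF inline as a data URL.

    Hypothetical helper for illustration; not part of any SDK.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "input_file",
        "filename": filename,
        "file_data": f"data:application/pdf;base64,{encoded}",
    }


# Usage with a local file (the path is an assumption for illustration):
# with open("policy.pdf", "rb") as fh:
#     part = pdf_part_from_bytes(fh.read(), "policy.pdf")
```

Base64 encoding grows the payload by roughly a third, so for large documents the URL pattern may be the more practical choice.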

Audio Inputs

For multimodal audio understanding, send an input_audio part. A common payload shape is:

{
  "type": "input_audio",
  "input_audio": {
    "data": "UklGRiQAAABXQVZFZm10IBAAAAABAAEAIlYAAESsAAACABAAZGF0YQAAAAA=",
    "format": "wav"
  }
}

Use the direct Audio API instead when the job is transcription, translation, or text-to-speech.
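
The data field in the payload above is the base64 encoding of the raw audio bytes. As a rough sketch, the part can be produced from WAV bytes as follows; both function names are hypothetical, and silent_wav exists only so the example runs without an audio file on disk.

```python
import base64
import io
import wave


def silent_wav(seconds: float = 0.1, rate: int = 16000) -> bytes:
    """Generate a short mono 16-bit silent WAV clip in memory (demo input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


def audio_part_from_bytes(data: bytes, fmt: str = "wav") -> dict:
    """Wrap raw audio bytes in the input_audio shape shown above.

    Hypothetical helper for illustration; not part of any SDK.
    """
    return {
        "type": "input_audio",
        "input_audio": {
            "data": base64.b64encode(data).decode("ascii"),
            "format": fmt,
        },
    }


part = audio_part_from_bytes(silent_wav())
```

Note that data holds bare base64 here, not a data URL, matching the payload shape shown above.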

Input Validation

Model and capability validation happens centrally. If a chosen model does not support one of your requested input types, the request can fail before generation begins.

Common Mistakes

  • using a plain string input when the request actually needs typed multimodal parts
  • trying to send file_id references on this gateway path
  • choosing a model that does not support the input type you are sending
  • using Responses multimodal input for workflows that should really use the dedicated Audio API or Images API
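
Some of the mistakes above can be caught locally before a request is sent. The following is a minimal client-side sketch, not the gateway's actual validation: find_input_problems is a hypothetical function, and the set of part types is taken from the table on this page. It cannot tell whether a given model supports a given input type; that check still happens on the server.

```python
from typing import List, Union

# Part types listed in the "Supported Input Parts" table on this page.
KNOWN_PART_TYPES = {"input_text", "input_image", "input_audio", "input_file"}


def find_input_problems(input_value: Union[str, list]) -> List[str]:
    """Return human-readable problems found in a Responses input value.

    Hypothetical pre-flight check for illustration only.
    """
    problems: List[str] = []
    if isinstance(input_value, str):
        return problems  # a plain string is valid for text-only prompts

    for i, item in enumerate(input_value):
        content = item.get("content", [])
        if isinstance(content, str):
            continue  # assumption: string content needs no part-level checks
        for j, part in enumerate(content):
            where = f"item {i}, part {j}"
            ptype = part.get("type")
            if ptype not in KNOWN_PART_TYPES:
                problems.append(f"{where}: unknown part type {ptype!r}")
            # file_id may appear at the top level or inside a nested input_file.
            nested = part.get("input_file") if ptype == "input_file" else None
            if "file_id" in part or (isinstance(nested, dict) and "file_id" in nested):
                problems.append(f"{where}: file_id is not supported on this gateway path")
    return problems
```

Calling it on the input list just before client.responses.create and raising if the result is non-empty gives earlier, more specific feedback than waiting for a failed request.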