The Chat Completions API accepts message content as either a plain string or an array of typed content blocks. Use this surface when you already depend on chat-style messages[] but still need images, PDFs, or audio in the same request.

Supported Content Blocks

text
  Shape: { "type": "text", "text": "..." }
  Notes: plain text

image_url
  Shape: { "type": "image_url", "image_url": { "url": "...", "detail": "auto" } }
  Notes: url accepts http, https, or data URLs; detail can be low, high, or auto

file
  Shape: { "type": "file", "file": { "filename": "...", "file_data": "..." } }
  Notes: file_data accepts http, https, or data URLs

input_audio
  Shape: { "type": "input_audio", "input_audio": { "data": "...", "format": "wav" } }
  Notes: data must be raw base64, not a data URI; common formats include wav and mp3

When to use this surface

  • you already have chat-based client code and want to keep it
  • you need multimodal inputs but do not want to migrate to Responses yet
If you are starting from scratch, the Responses API is usually the cleaner multimodal path.

Image Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image briefly."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/receipt.png",
                        "detail": "auto",
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
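Because image_url.url also accepts data URLs, a local image can be inlined as base64 instead of hosted. A minimal sketch of building that block; the bytes below are a placeholder standing in for a real PNG file:

```python
import base64

# Placeholder bytes standing in for a real image;
# in practice: png_bytes = open("receipt.png", "rb").read()
png_bytes = b"\x89PNG\r\n\x1a\n"

# Build a data URL: "data:<mime-type>;base64,<payload>"
b64 = base64.b64encode(png_bytes).decode("ascii")
data_url = f"data:image/png;base64,{b64}"

# Same image_url block shape as above, with a data URL in place of https
image_block = {
    "type": "image_url",
    "image_url": {"url": data_url, "detail": "auto"},
}
```

The block then drops into the content array exactly where the https version sat.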

File Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",
)

completion = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this PDF."},
                {
                    "type": "file",
                    "file": {
                        "filename": "report.pdf",
                        "file_data": "https://example.com/report.pdf",
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
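Since file_data also accepts data URLs, a local PDF can be inlined rather than hosted. A hedged sketch; the bytes are a placeholder for real PDF content:

```python
import base64

# Placeholder standing in for real PDF bytes;
# in practice: pdf_bytes = open("report.pdf", "rb").read()
pdf_bytes = b"%PDF-1.4 placeholder"

# file_data as a data URL: "data:application/pdf;base64,<payload>"
file_block = {
    "type": "file",
    "file": {
        "filename": "report.pdf",
        "file_data": "data:application/pdf;base64,"
        + base64.b64encode(pdf_bytes).decode("ascii"),
    },
}
```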

Audio Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.naga.ac/v1",
    api_key="YOUR_API_KEY",
)

completion = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Tell me what is said in this audio."},
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "BASE64_AUDIO",
                        "format": "wav",
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
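input_audio.data is the one field that takes raw base64 with no data: prefix. A sketch of preparing it; the bytes are a placeholder standing in for a real WAV file:

```python
import base64

# Placeholder standing in for real WAV bytes;
# in practice: wav_bytes = open("clip.wav", "rb").read()
wav_bytes = b"RIFF....WAVEfmt "

# Raw base64 only -- do NOT prepend "data:audio/wav;base64,"
audio_block = {
    "type": "input_audio",
    "input_audio": {
        "data": base64.b64encode(wav_bytes).decode("ascii"),
        "format": "wav",
    },
}
```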

Caveats

  • input_audio.data must be raw base64, not a data: URL
  • multimodal support still depends on the selected model
  • use the direct Audio API for transcription, translation, and text-to-speech
  • use the direct Images API for primary image generation and image edit workflows

Common mistakes

  • mixing chat content-block syntax with Responses input-part syntax
  • assuming every model supports every multimodal block type
  • using this surface for dedicated audio or image workflows that belong on the direct APIs
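To make the first mistake concrete, here is the same text-plus-image input in both syntaxes. The Responses shapes reflect my understanding of that API (input_text / input_image parts, with image_url as a plain string) and should be checked against its reference:

```python
# Chat Completions content blocks (this surface)
chat_content = [
    {"type": "text", "text": "Describe this image."},
    {"type": "image_url", "image_url": {"url": "https://example.com/receipt.png"}},
]

# Responses API input parts: different type names, and image_url
# is a string field rather than a nested object
responses_content = [
    {"type": "input_text", "text": "Describe this image."},
    {"type": "input_image", "image_url": "https://example.com/receipt.png"},
]

# The two sets of block types do not overlap; mixing them in one
# request is what triggers validation errors
chat_types = {block["type"] for block in chat_content}
responses_types = {part["type"] for part in responses_content}
```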