The Chat Completions API accepts message content either as a plain string or as an array of typed content blocks. Use the array form when you already depend on chat-style `messages[]` but still need images, PDFs, or audio in the same request.
## Supported Content Blocks
| Block type | Shape | Notes |
|---|---|---|
| `text` | `{ "type": "text", "text": "..." }` | Plain text |
| `image_url` | `{ "type": "image_url", "image_url": { "url": "...", "detail": "auto" } }` | `url` accepts `http`, `https`, or `data:` URIs; `detail` can be `low`, `high`, or `auto` |
| `file` | `{ "type": "file", "file": { "filename": "...", "file_data": "..." } }` | `file_data` accepts `http`, `https`, or `data:` URIs |
| `input_audio` | `{ "type": "input_audio", "input_audio": { "data": "...", "format": "wav" } }` | `data` must be raw base64, not a data URI; common formats include `wav` and `mp3` |
## When to use this surface
- you already have chat-based client code and want to keep it
- you need multimodal inputs but do not want to migrate to Responses yet
The Responses API is usually the cleaner path for new multimodal code.
## Image Example
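A minimal sketch of a message carrying an `image_url` block alongside text. The URL and prompt text are illustrative; the payload shape follows the table above.

```python
# One user message mixing a text block and an image_url block.
# The URL and question are placeholders, not real resources.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/photo.jpg",  # http(s) or data: URI
                    "detail": "auto",  # "low", "high", or "auto"
                },
            },
        ],
    }
]
```

A `data:` URI with base64-encoded image bytes works in the same `url` field when the image is local.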
## File Example
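A sketch of a `file` block whose `file_data` is a `data:` URI built from local PDF bytes. The filename and bytes are placeholders; in practice you would read the file from disk.

```python
import base64

# Encode PDF bytes as a data: URI for the file block's file_data field.
pdf_bytes = b"%PDF-1.4 placeholder"  # normally: open("report.pdf", "rb").read()
file_data = "data:application/pdf;base64," + base64.b64encode(pdf_bytes).decode()

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize this document."},
            {
                "type": "file",
                "file": {"filename": "report.pdf", "file_data": file_data},
            },
        ],
    }
]
```

Unlike `input_audio`, this field takes a full URI, so the `data:application/pdf;base64,` prefix belongs in the string.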
## Audio Example
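A sketch of an `input_audio` block. Note the asymmetry with the other blocks: `data` is the raw base64 string only, with no `data:` prefix. The audio bytes here are placeholders.

```python
import base64

# input_audio.data is raw base64 of the audio bytes -- no "data:" URI prefix.
wav_bytes = b"RIFF placeholder WAVE"  # normally: open("speech.wav", "rb").read()
audio_b64 = base64.b64encode(wav_bytes).decode()

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe this clip."},
            {
                "type": "input_audio",
                "input_audio": {"data": audio_b64, "format": "wav"},
            },
        ],
    }
]
```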
## Caveats
- `input_audio.data` must be raw base64, not a `data:` URL
- multimodal support still depends on the selected model
- use the direct Audio API for transcription, translation, and text-to-speech
- use the direct Images API for primary image generation and image edit workflows
## Common mistakes
- mixing chat content-block syntax with Responses input-part syntax
- assuming every model supports every multimodal block type
- using this surface for dedicated audio or image workflows that belong on the direct APIs
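To make the first mistake concrete, here is the same image attachment in both shapes. The Responses-side field names (`input_image` with a flat `image_url` string) are stated as an assumption about that API's input-part schema; the point is that the two shapes are not interchangeable.

```python
# Chat Completions: a nested image_url object inside messages[].content.
chat_part = {
    "type": "image_url",
    "image_url": {"url": "https://example.com/photo.jpg", "detail": "auto"},
}

# Responses (assumed schema): a flat input_image part inside input[],
# where image_url is a plain string rather than an object.
responses_part = {
    "type": "input_image",
    "image_url": "https://example.com/photo.jpg",
}

# Same underlying URL, different envelope -- sending one API the other's
# shape fails request validation.
assert chat_part["image_url"]["url"] == responses_part["image_url"]
```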