Support Matrix
| API | Images | Files / PDFs | Audio input | Notes |
|---|---|---|---|---|
| Responses | input_image parts | input_file parts | input_audio parts | Best starting point for new multimodal LLM work |
| Chat Completions | image_url blocks | file blocks | input_audio blocks on supported models | Use for existing OpenAI-style chat clients |
| Messages | Anthropic image blocks | Anthropic document blocks | model- and provider-dependent | Use for Anthropic-style content-block tooling |
Shape At A Glance
Responses
  input[]
    message
      content[]
        input_text
        input_image
        input_file
        input_audio
Chat Completions
  messages[]
    content[]
      text
      image_url
      file
      input_audio
Messages
  messages[]
    content[]
      text
      image
      document
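The three shapes above can be sketched side by side as plain request payloads. This is an illustrative sketch, not live API calls: the question text and image URL are placeholders, and only the part/block type names come from the matrix above.

```python
# Illustrative payloads for the three API shapes. Plain dicts only, no network calls.
IMAGE_URL = "https://example.com/receipt.png"  # placeholder URL
QUESTION = "What is the total on this receipt?"

# Responses: input[] -> message -> content[] of typed input_* parts
responses_input = [
    {
        "role": "user",
        "content": [
            {"type": "input_text", "text": QUESTION},
            {"type": "input_image", "image_url": IMAGE_URL},
        ],
    }
]

# Chat Completions: messages[] -> content[] blocks (image_url is a nested object)
chat_messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": QUESTION},
            {"type": "image_url", "image_url": {"url": IMAGE_URL}},
        ],
    }
]

# Messages (Anthropic-style): messages[] -> content[] blocks with a source object
anthropic_messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": QUESTION},
            {"type": "image", "source": {"type": "url", "url": IMAGE_URL}},
        ],
    }
]

# Each API names the image part for the same request differently.
part_types = [
    m[0]["content"][1]["type"]
    for m in (responses_input, chat_messages, anthropic_messages)
]
print(part_types)  # ['input_image', 'image_url', 'image']
```

The structural difference to notice: Chat Completions wraps the URL in a nested `image_url` object, while Responses takes it as a plain string and Messages uses a `source` object.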
When To Use It
- image or PDF analysis inside an LLM workflow
- ask-and-answer over screenshots, receipts, or scanned documents
- audio understanding inside a conversational model flow
Recommended Example
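A minimal sketch of a multimodal Responses request, assuming the official OpenAI Python SDK (`pip install openai`). The model name, prompt, and image URL are placeholder assumptions; the payload is built first so the shape can be inspected even without credentials, and the API is only called when an API key is configured.

```python
import os

# Build the Responses-shaped request: input[] -> message -> content[] parts.
request = {
    "model": "gpt-4o-mini",  # assumed vision-capable model; substitute your own
    "input": [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Summarize this scanned document."},
                {"type": "input_image", "image_url": "https://example.com/scan.png"},  # placeholder
            ],
        }
    ],
}

# Only make the live call when credentials are configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    response = client.responses.create(**request)
    print(response.output_text)
```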
Important Boundaries
- Responses accepts typed multimodal parts, but support still depends on the selected model
- use the direct Images API for image generation and image edits
- use the direct Audio API for transcription, translation, and text-to-speech
Common Pitfalls
- assuming endpoint support automatically means model support
- using a direct generation API when you actually need multimodal understanding inside an LLM turn
- relying on opaque file references when a page documents inline URLs or data payloads instead
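The last pitfall has a simple workaround: when a page documents inline data payloads, image bytes can be base64-encoded into a data URL and passed where an image part or block expects a URL. A minimal sketch; the placeholder bytes stand in for a real image read from disk, and PNG is an assumed MIME type.

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL usable in image parts/blocks."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# In practice the bytes would come from disk, e.g. open("scan.png", "rb").read().
fake_png = b"\x89PNG\r\n\x1a\n"  # placeholder PNG signature bytes, not a real image
url = to_data_url(fake_png)
print(url[:22])  # "data:image/png;base64,"
```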