Responses API accepts either a plain string input or an array of typed input items. For multimodal requests, use message items whose content array contains typed parts.
Use this page when your prompt needs more than plain text, such as screenshots, PDFs, or audio clips.
Supported Input Parts
| Part type | Main fields | Notes |
|---|---|---|
input_text | text | Plain text input |
input_image | image_url, optional detail | image_url accepts http, https, or data URLs |
input_audio | input_audio object | Common payload uses data plus format |
input_file | filename, file_data, file_url, or nested input_file | Use inline URL or data payloads |
Image Example
File And PDF Inputs
Use inline file payloads that the gateway can forward through its chat-style pipeline.{"type":"input_file","filename":"policy.pdf","file_data":"https://example.com/policy.pdf"}{"type":"input_file","filename":"policy.pdf","file_data":"data:application/pdf;base64,..."}{"type":"input_file","input_file":{"filename":"policy.pdf","file_data":"https://example.com/policy.pdf"}}
file_id is not supported on this public gateway path.
Audio Inputs
For multimodal audio understanding, send aninput_audio part. A common payload shape is:
Input Validation
Model and capability validation happens centrally. If a chosen model does not support one of your requested input types, the request can fail before generation begins.Common mistakes
- using a plain string
inputwhen the request actually needs typed multimodal parts - trying to send
file_idreferences on this gateway path - choosing a model that does not support the input type you are sending
- using
Responsesmultimodal input for workflows that should really use the dedicatedAudio APIorImages API