Multimodal inputs let a model read more than plain text. NagaAI supports images, files, and audio across multiple generation APIs, but each public surface represents those inputs differently.Documentation Index
Fetch the complete documentation index at: https://docs.naga.ac/llms.txt
Use this file to discover all available pages before exploring further.
Support Matrix
| API | Images | Files / PDFs | Audio input | Notes |
|---|---|---|---|---|
Responses | input_image parts | input_file parts | input_audio parts | Best starting point for new multimodal LLM work |
Chat Completions | image_url blocks | file blocks | input_audio blocks on supported models | Use for existing OpenAI-style chat clients |
Messages | Anthropic image blocks | Anthropic document blocks | model- and provider-dependent | Use for Anthropic-style content-block tooling |
Shape At A Glance
Responses
input[]
message
content[]
input_text
input_image
input_file
input_audio
Chat Completions
messages[]
content[]
text
image_url
file
input_audio
Messages
messages[]
content[]
text
image
document
When To Use It
- image or PDF analysis inside an LLM workflow
- ask-and-answer over screenshots, receipts, or scanned documents
- audio understanding inside a conversational model flow
Recommended Example
Important Boundaries
Responsesaccepts typed multimodal parts, but support still depends on the selected model- use the direct Images API for image generation and image edits
- use the direct Audio API for transcription, translation, and text-to-speech
Common Pitfalls
- assuming endpoint support automatically means model support
- using a direct generation API when you actually need multimodal understanding inside an LLM turn
- relying on opaque file references when a page documents inline URLs or data payloads instead