Example
When this is useful
- moderating social posts with both text and an image
- screening user prompts that reference an uploaded image
- applying one safety check before a multimodal workflow continues
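The use cases above share one pattern: build a single input that holds both the text and the image, run one check, and let the workflow continue only if that check passes. The sketch below illustrates this under stated assumptions. The typed-part field names (`type`, `text`, `image_url`) and the `flagged` result field are assumed, not taken from this page. `moderate` is a hypothetical stand-in for whatever moderation client you use, and `fake_moderate` is a stub so the example runs offline.

```python
from typing import Callable


def build_combined_input(text: str, image_url: str) -> list[dict]:
    """Return one input list holding both the text part and the image part."""
    # Assumed typed-part shape; check your API's reference for exact field names.
    return [
        {"type": "text", "text": text},
        {"type": "image_url", "image_url": {"url": image_url}},
    ]


def gate_submission(
    text: str,
    image_url: str,
    moderate: Callable[[list[dict]], dict],
) -> bool:
    """Run one check on the combined submission; True means the workflow may continue."""
    result = moderate(build_combined_input(text, image_url))
    # One result covers the whole request, so there is a single flag to read.
    return not result["flagged"]


def fake_moderate(parts: list[dict]) -> dict:
    """Hypothetical offline stub: flags any submission whose text contains 'banned'."""
    text = " ".join(p["text"] for p in parts if p["type"] == "text")
    return {"flagged": "banned" in text}
```

In real use you would pass a function that wraps your moderation client in place of `fake_moderate`; the gate itself does not change.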
Common mistakes
- sending separate moderation calls when your product logic really cares about the combined submission
- expecting separate per-item result objects instead of one result for the whole request
- mixing up the typed input shapes, for example putting an image URL in a text part or a text string in an image part
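The last mistake is easy to catch before a request is sent. The following is a minimal sketch of a local shape check, assuming each part is a dict with a `type` of either `text` or `image_url`, where text parts carry a string `text` and image parts carry an `image_url` object with a string `url`. `validate_parts` is a hypothetical helper, and the field names are assumptions to confirm against your API's reference.

```python
def validate_parts(parts: list[dict]) -> None:
    """Raise ValueError if any part is malformed or mixes text and image shapes."""
    for i, part in enumerate(parts):
        kind = part.get("type")
        if kind == "text":
            # A text part carries a string and must not carry image fields.
            if not isinstance(part.get("text"), str) or "image_url" in part:
                raise ValueError(f"part {i}: text part needs a string 'text' and no 'image_url'")
        elif kind == "image_url":
            # An image part carries {"url": "..."} and must not carry text fields.
            image = part.get("image_url")
            if not isinstance(image, dict) or not isinstance(image.get("url"), str) or "text" in part:
                raise ValueError(f"part {i}: image part needs 'image_url' with a string 'url' and no 'text'")
        else:
            raise ValueError(f"part {i}: unknown type {kind!r}")
```

Calling `validate_parts` on the input list just before the moderation call turns a confusing server-side error into an immediate, specific local one.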