Input format for vision reasoning with the large vision model. The model supports only one instance at a time.
prompt
string
The text prompt that guides the response in question answering (QA).
If a mask is provided, text responses are generated from the masked area.
Image
mimeType
string
Optional. The MIME type of the image content. Only the following MIME types are supported:
- image/jpeg
- image/png
data
Union type
The image bytes or Cloud Storage URI to make the prediction on.
The data union can be only one of the following:
bytesBase64Encoded
string
Base64-encoded string representing the image bytes.
gcsUri
string
Cloud Storage URI representing the image in the user's project.
JSON representation
{
  "mimeType": string,
  // data (union type): set only one of the following
  "bytesBase64Encoded": string,
  "gcsUri": string
}
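The rules above (mimeType is optional but limited to image/jpeg or image/png, and exactly one member of the data union may be set) can be checked client-side before sending a request. The sketch below is a minimal illustration under those assumptions; the helper name build_image is hypothetical and not part of any Google client library.

```python
import base64
from typing import Optional

# MIME types listed above as supported for the image's mimeType field.
SUPPORTED_IMAGE_MIME_TYPES = {"image/jpeg", "image/png"}


def build_image(
    image_bytes: Optional[bytes] = None,
    gcs_uri: Optional[str] = None,
    mime_type: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build an Image object with exactly one `data` member set."""
    # The data union allows only one of bytesBase64Encoded / gcsUri.
    if (image_bytes is None) == (gcs_uri is None):
        raise ValueError("set exactly one of image_bytes or gcs_uri")
    image: dict = {}
    if mime_type is not None:  # mimeType is optional
        if mime_type not in SUPPORTED_IMAGE_MIME_TYPES:
            raise ValueError(f"unsupported MIME type: {mime_type}")
        image["mimeType"] = mime_type
    if image_bytes is not None:
        # bytesBase64Encoded carries the raw image bytes as a Base64 string.
        image["bytesBase64Encoded"] = base64.b64encode(image_bytes).decode("ascii")
    else:
        image["gcsUri"] = gcs_uri
    return image
```

For example, build_image(gcs_uri="gs://example-bucket/cat.jpg") (a hypothetical bucket) yields an object containing only gcsUri, while passing both bytes and a URI raises ValueError instead of producing an ambiguous union.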
Video
data
Union type
The video bytes (as a Base64-encoded string) or Cloud Storage URI to make the prediction on.
The data union can be only one of the following:
bytesBase64Encoded
string
Base64-encoded string representing the video bytes.
gcsUri
string
Cloud Storage URI representing the video in the user's project.
JSON representation
{
  // data (union type): set only one of the following
  "bytesBase64Encoded": string,
  "gcsUri": string
}
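Because the model supports only one instance at a time, a request carries a single instance combining the prompt with the media. The sketch below assumes, without confirmation from this reference, that the instance holds the video under a key named "video" and that instances are sent in an "instances" list; build_video and build_request are hypothetical helper names, and the bucket URI is illustrative only.

```python
import base64
import json
from typing import Optional


def build_video(video_bytes: Optional[bytes] = None, gcs_uri: Optional[str] = None) -> dict:
    """Hypothetical helper: build a Video object with exactly one `data` member set."""
    if (video_bytes is None) == (gcs_uri is None):
        raise ValueError("set exactly one of video_bytes or gcs_uri")
    if video_bytes is not None:
        # bytesBase64Encoded carries the raw video bytes as a Base64 string.
        return {"bytesBase64Encoded": base64.b64encode(video_bytes).decode("ascii")}
    return {"gcsUri": gcs_uri}


def build_request(prompt: str, video: dict) -> dict:
    """Wrap one instance in a request body.

    ASSUMPTION: the "video" and "instances" key names are illustrative and
    not confirmed by this reference. The list holds exactly one element
    because the model supports only one instance at a time.
    """
    instance = {"prompt": prompt, "video": video}
    return {"instances": [instance]}


body = build_request(
    "What happens in this clip?",
    build_video(gcs_uri="gs://example-bucket/clip.mp4"),  # hypothetical URI
)
print(json.dumps(body, indent=2))
```

Keeping the single-instance constraint inside build_request means callers cannot accidentally batch several prompts into one request.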