Index
- TextEmbeddingPredictionInstance (message)
- TextEmbeddingPredictionInstance.TaskType (enum)
- VideoGenerationModelInstance (message)
- VideoGenerationModelInstance.Image (message)
- VideoGenerationModelInstance.Mask (message)
- VideoGenerationModelInstance.ReferenceImage (message)
- VideoGenerationModelInstance.Video (message)
- VirtualTryOnModelInstance (message)
- VirtualTryOnModelInstance.Image (message)
- VirtualTryOnModelInstance.PersonImage (message)
- VirtualTryOnModelInstance.ProductImage (message)
- VirtualTryOnModelInstance.ProductImageConfig (message)
- VirtualTryOnModelInstance.ProductImageConfig.MaskMode (enum)
- VisionEmbeddingModelInstance (message)
- VisionEmbeddingModelInstance.Image (message)
- VisionEmbeddingModelInstance.Video (message)
- VisionEmbeddingModelInstance.Video.VideoSegmentConfig (message)
- VisionGenerativeModelInstance (message)
- VisionGenerativeModelInstance.ControlImageConfig (message)
- VisionGenerativeModelInstance.ControlImageConfig.ControlType (enum)
- VisionGenerativeModelInstance.Image (message)
- VisionGenerativeModelInstance.Mask (message)
- VisionGenerativeModelInstance.Mask.BoundingPolyList (message)
- VisionGenerativeModelInstance.MaskImageConfig (message)
- VisionGenerativeModelInstance.MaskImageConfig.MaskMode (enum)
- VisionGenerativeModelInstance.ReferenceImage (message)
- VisionGenerativeModelInstance.ReferenceImage.ReferenceType (enum)
- VisionGenerativeModelInstance.StyleImageConfig (message)
- VisionGenerativeModelInstance.SubjectImageConfig (message)
- VisionGenerativeModelInstance.SubjectImageConfig.SubjectType (enum)
- VisionReasoningModelInstance (message)
- VisionReasoningModelInstance.Image (message)
- VisionReasoningModelInstance.Video (message)
TextEmbeddingPredictionInstance
Prediction input format for Text Embedding.
| Fields | |
|---|---|
| content | The main text content to embed. |
| title | Optional identifier of the text content. |
| task_type | Optional downstream task the embeddings will be used for. |
TaskType
Represents a downstream task the embeddings will be used for.
| Enums | |
|---|---|
| DEFAULT | Unset value, which will default to one of the other enum values. | 
| RETRIEVAL_QUERY | Specifies that the given text is a query in a search or retrieval setting. |
| RETRIEVAL_DOCUMENT | Specifies that the given text is a document from the corpus being searched. |
| SEMANTIC_SIMILARITY | Specifies that the given text will be used for semantic textual similarity (STS). |
| CLASSIFICATION | Specifies that the given text will be classified. | 
| CLUSTERING | Specifies that the embeddings will be used for clustering. | 
| QUESTION_ANSWERING | Specifies that the embeddings will be used for question answering. | 
| FACT_VERIFICATION | Specifies that the embeddings will be used for fact verification. | 
| CODE_RETRIEVAL_QUERY | Specifies that the embeddings will be used for code retrieval. | 
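A minimal sketch of building a TextEmbeddingPredictionInstance as a dict, assuming the request body accepts the proto field names listed above (proto3 JSON parsers accept both the original field name and its lowerCamelCase form). The helper name make_text_embedding_instance is hypothetical.

```python
from enum import Enum
from typing import Optional, Union


class TaskType(str, Enum):
    # Mirrors the TaskType enum values documented above.
    DEFAULT = "DEFAULT"
    RETRIEVAL_QUERY = "RETRIEVAL_QUERY"
    RETRIEVAL_DOCUMENT = "RETRIEVAL_DOCUMENT"
    SEMANTIC_SIMILARITY = "SEMANTIC_SIMILARITY"
    CLASSIFICATION = "CLASSIFICATION"
    CLUSTERING = "CLUSTERING"
    QUESTION_ANSWERING = "QUESTION_ANSWERING"
    FACT_VERIFICATION = "FACT_VERIFICATION"
    CODE_RETRIEVAL_QUERY = "CODE_RETRIEVAL_QUERY"


def make_text_embedding_instance(
    content: str,
    title: Optional[str] = None,
    task_type: Optional[Union[TaskType, str]] = None,
) -> dict:
    """Hypothetical helper: builds one instance using the documented field names."""
    if not content:
        raise ValueError("content is required")
    instance = {"content": content}
    # title and task_type are optional and omitted when unset.
    if title is not None:
        instance["title"] = title
    if task_type is not None:
        # TaskType(...) raises ValueError for values outside the enum.
        instance["task_type"] = TaskType(task_type).value
    return instance
```

In a retrieval setting, following the enum descriptions, corpus documents would be embedded with RETRIEVAL_DOCUMENT and incoming queries with RETRIEVAL_QUERY.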
VideoGenerationModelInstance
Input format for the video generation model.
| Fields | |
|---|---|
| prompt | The text prompt for generating the videos. |
| image | An image to use as the first frame of the generated video. If an input image is provided, an input video is not supported. |
| video | An input video. If this field is provided, an input image is not supported. If a mask is provided along with the video, the video is edited using the mask; otherwise, the video is extended by the given duration. |
| last_frame | An image to use as the last frame of the generated videos. An input image must also be provided. |
| camera_control | Camera motion to use in the generated videos. An input image must also be provided. Valid values are: fixed, pan_left, pan_right, tilt_up, tilt_down, truck_left, truck_right, pedestal_up, pedestal_down, push_in, pull_out. |
| mask | Mask to use in the generated videos. |
| reference_images[] | The images to use as references for generating the videos. If this field is provided, the prompt field must also be provided, and the image, video, and last_frame fields are not supported. Each image must be associated with a type. Veo 2 supports up to 3 asset images or 1 style image. |
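The cross-field rules above can be checked on the client before a request is sent. This is a minimal sketch that treats an instance as a plain dict keyed by the documented field names; the function name is hypothetical, the service remains the authority, and the sketch does not check reference image types or counts.

```python
from typing import List

# Camera motions listed for camera_control above.
CAMERA_MOTIONS = {
    "fixed", "pan_left", "pan_right", "tilt_up", "tilt_down", "truck_left",
    "truck_right", "pedestal_up", "pedestal_down", "push_in", "pull_out",
}


def check_video_generation_instance(instance: dict) -> List[str]:
    """Hypothetical client-side check mirroring the documented field rules.

    Returns a list of problems; an empty list means none of the rules
    encoded here is violated.
    """
    def has(field: str) -> bool:
        return instance.get(field) is not None

    problems = []
    # image and video are mutually exclusive.
    if has("image") and has("video"):
        problems.append("image and video cannot both be set")
    # last_frame and camera_control both require an input image.
    for field in ("last_frame", "camera_control"):
        if has(field) and not has("image"):
            problems.append(f"{field} requires image")
    if has("camera_control") and instance["camera_control"] not in CAMERA_MOTIONS:
        problems.append(f"unknown camera_control: {instance['camera_control']}")
    # reference_images needs a prompt and excludes image, video, and last_frame.
    if instance.get("reference_images"):
        if not instance.get("prompt"):
            problems.append("reference_images requires prompt")
        for field in ("image", "video", "last_frame"):
            if has(field):
                problems.append(f"{field} is not supported with reference_images")
    return problems
```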
Image
Image input format for the prediction.
| Fields | |
|---|---|
| mime_type | The MIME type of the image content. Only the following MIME types are supported: image/jpeg, image/png. |
| Union field data. The image data. Only one of the following can be set: | |
| bytes_base64_encoded | Base64-encoded bytes string representing the image. |
| gcs_uri | The Google Cloud Storage location of the image. |
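The data union means an Image carries either inline base64 bytes or a Cloud Storage URI, never both. The sketch below builds such a dict from a local file or a gs:// URI; the helper name and its keyword arguments are assumptions, while the field names follow this reference.

```python
import base64
from pathlib import Path
from typing import Optional

SUPPORTED_IMAGE_MIME_TYPES = {"image/jpeg", "image/png"}


def make_image(
    mime_type: str,
    *,
    path: Optional[str] = None,
    gcs_uri: Optional[str] = None,
) -> dict:
    """Hypothetical helper: builds an Image with exactly one member of the data union."""
    if mime_type not in SUPPORTED_IMAGE_MIME_TYPES:
        raise ValueError(f"unsupported MIME type: {mime_type}")
    # Exactly one source must be given, mirroring the union.
    if (path is None) == (gcs_uri is None):
        raise ValueError("set exactly one of path or gcs_uri")
    image = {"mime_type": mime_type}
    if path is not None:
        # Inline the file contents as a base64 string.
        raw = Path(path).read_bytes()
        image["bytes_base64_encoded"] = base64.b64encode(raw).decode("ascii")
    else:
        image["gcs_uri"] = gcs_uri
    return image
```

For example, make_image("image/jpeg", gcs_uri="gs://bucket/cat.jpg") references a placeholder object in Cloud Storage instead of uploading bytes inline.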
Mask
Mask input format for the prediction.
| Fields | |
|---|---|
| mime_type | Valid values: image/png, image/jpeg, image/webp, video/mov, video/mpeg, video/mp4, video/mpg, video/avi, video/wmv, video/mpegps, video/flv. |
| mask_mode | Describes how the mask is used. Inpainting masks must match the aspect ratio of the input video. Outpainting masks can be either 9:16 or 16:9. Available options are: insert: the image mask contains a masked rectangular region that is applied to the first frame of the input video; the object described in the prompt is inserted into this region and appears in subsequent frames. remove: the image mask identifies an object in the first video frame to track; this object is removed from the video. remove_static: the image mask identifies a region of the video; objects in this region are removed. outpaint: the image mask contains a masked rectangular region where the input video is placed, and the remaining area is generated; video masks are not supported. |
| Union field data. The mask data. Only one of the following can be set: | |
| bytes_base64_encoded | Base64-encoded bytes string representing the mask. |
| gcs_uri | The Google Cloud Storage location of the mask. |
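A sketch of client-side checks for a video Mask, assuming plain-dict instances. It reads "video masks are not supported" as applying to outpaint mode, which is an interpretation of the text above, and expresses the 9:16 or 16:9 outpainting requirement as an integer cross-multiplication so no floating-point rounding is involved. Function names are hypothetical.

```python
from typing import List

IMAGE_MASK_TYPES = {"image/png", "image/jpeg", "image/webp"}
VIDEO_MASK_TYPES = {
    "video/mov", "video/mpeg", "video/mp4", "video/mpg",
    "video/avi", "video/wmv", "video/mpegps", "video/flv",
}
MASK_MODES = {"insert", "remove", "remove_static", "outpaint"}


def check_video_mask(mask: dict) -> List[str]:
    """Hypothetical check of a Mask dict against the documented options."""
    problems = []
    mime_type = mask.get("mime_type")
    mode = mask.get("mask_mode")
    if mime_type not in IMAGE_MASK_TYPES | VIDEO_MASK_TYPES:
        problems.append(f"unsupported mime_type: {mime_type}")
    if mode not in MASK_MODES:
        problems.append(f"unknown mask_mode: {mode}")
    # Interpretation: outpainting accepts image masks only.
    if mode == "outpaint" and mime_type in VIDEO_MASK_TYPES:
        problems.append("outpaint does not support video masks")
    # Exactly one member of the data union.
    if ("bytes_base64_encoded" in mask) == ("gcs_uri" in mask):
        problems.append("set exactly one of bytes_base64_encoded or gcs_uri")
    return problems


def outpaint_aspect_ok(width: int, height: int) -> bool:
    # True for 9:16 (portrait) or 16:9 (landscape) masks.
    return width * 16 == height * 9 or width * 9 == height * 16
```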
ReferenceImage
Reference image input format for the prediction. A ReferenceImage is an image that is used to provide additional context for the video generation.
| Fields | |
|---|---|
| image | The image data to be used as the reference image. | 
| reference_type | The type of the reference image, which defines how the reference image is used to generate the video. Supported types are: asset: the reference image provides assets to the generated video, such as the scene, an object, or a character. style: the aesthetics of the reference image, including colors, lighting, and texture, are used as the style of the generated video, such as 'anime', 'photography', or 'origami'. |
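The sketch below checks a reference_images list against the Veo 2 limits stated for VideoGenerationModelInstance. It reads "up to 3 asset images or 1 style image" as meaning the two types are not combined in one request; that reading and the function name are assumptions.

```python
from collections import Counter
from typing import List


def check_veo2_reference_images(reference_images: List[dict]) -> List[str]:
    """Hypothetical check of reference image types and Veo 2 counts."""
    problems = []
    counts = Counter(ref.get("reference_type") for ref in reference_images)
    # Every reference image must carry one of the supported types.
    unknown = set(counts) - {"asset", "style"}
    if unknown:
        problems.append(f"unsupported reference_type values: {sorted(map(str, unknown))}")
    # Assumption: "3 asset images or 1 style image" means no mixing.
    if counts["asset"] and counts["style"]:
        problems.append("asset and style references cannot be mixed")
    if counts["asset"] > 3:
        problems.append("at most 3 asset images")
    if counts["style"] > 1:
        problems.append("at most 1 style image")
    return problems
```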
Video
Video input format for the prediction.
| Fields | |
|---|---|
| mime_type | The MIME type of the video content. Only the following MIME types are supported: video/mov, video/mpeg, video/mp4, video/mpg, video/avi, video/wmv, video/mpegps, video/flv. |
| Union field data. The video data. Only one of the following can be set: | |
| gcs_uri | The Google Cloud Storage location of the video on which to perform the prediction. |
| bytes_base64_encoded | Base64-encoded bytes string representing the video. |
VirtualTryOnModelInstance
Media generation input format for the Virtual Try On model.
| Fields | |
|---|---|
| prompt | The text prompt for generating the images. This is required for both editing and generation. |
| product_images[] | The images of the products to be worn by the person. |
| person_image | The image of the person to be edited with the product images. |
Image
Input image and metadata.
| Fields | |
|---|---|
| mime_type | The MIME type of the image content. Only the following MIME types are supported: image/jpeg, image/png. |
| Union field data. The image bytes or Cloud Storage URI to make the prediction on. Only one of the following can be set: | |
| bytes_base64_encoded | Base64-encoded bytes string representing the image. |
| gcs_uri | The Cloud Storage URI of the image. |
PersonImage
A PersonImage is used to provide the person image and its associated configuration options for Virtual Try On.
| Fields | |
|---|---|
| image | The image bytes or Cloud Storage URI of the person or subject that will be edited using the product images. | 
ProductImage
A ProductImage is used to provide the product image and its associated configuration options for Virtual Try On.
| Fields | |
|---|---|
| image | The actual image data of the product image. |
| mask_image | The mask image associated with this product. If provided, the mask image will be used to guide the image editing. | 
| product_image_config | A config for the product image. | 
ProductImageConfig
Config for the product image.
| Fields | |
|---|---|
| mask_mode | Mode used to control the segmentation logic. | 
| dilation | Dilation to be used with this mask. |
| product_description | Description of the product. |
MaskMode
Mode used to generate the mask if a mask is not provided.
| Enums | |
|---|---|
| MASK_MODE_DEFAULT | Default value for mask mode. | 
| MASK_MODE_USER_PROVIDED | User provided mask. No segmentation needed. | 
| MASK_MODE_DETECTION_BOX | Mask from detected bounding boxes. | 
| MASK_MODE_CLOTHING_AREA | Masks from segmenting the clothing area with open-vocabulary segmentation. |
| MASK_MODE_PARSED_PERSON | Masks from segmenting the person's body and clothing using the person-parsing model. |
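A minimal sketch of assembling a Virtual Try On instance from the messages above, assuming dict-shaped instances keyed by the documented field names. The helper names are hypothetical, and requiring mask_image when MASK_MODE_USER_PROVIDED is selected is an inference from that enum's description rather than a documented rule.

```python
from enum import Enum
from typing import List, Optional, Union


class MaskMode(str, Enum):
    # Mirrors VirtualTryOnModelInstance.ProductImageConfig.MaskMode.
    MASK_MODE_DEFAULT = "MASK_MODE_DEFAULT"
    MASK_MODE_USER_PROVIDED = "MASK_MODE_USER_PROVIDED"
    MASK_MODE_DETECTION_BOX = "MASK_MODE_DETECTION_BOX"
    MASK_MODE_CLOTHING_AREA = "MASK_MODE_CLOTHING_AREA"
    MASK_MODE_PARSED_PERSON = "MASK_MODE_PARSED_PERSON"


def make_product_image(
    image: dict,
    mask_mode: Union[MaskMode, str] = MaskMode.MASK_MODE_DEFAULT,
    mask_image: Optional[dict] = None,
    description: Optional[str] = None,
) -> dict:
    """Hypothetical helper: builds a ProductImage with its ProductImageConfig."""
    mode = MaskMode(mask_mode)
    # Assumption: a user-provided mask mode is only meaningful with a mask image.
    if mode is MaskMode.MASK_MODE_USER_PROVIDED and mask_image is None:
        raise ValueError("MASK_MODE_USER_PROVIDED requires mask_image")
    config = {"mask_mode": mode.value}
    if description:
        config["product_description"] = description
    product = {"image": image, "product_image_config": config}
    if mask_image is not None:
        product["mask_image"] = mask_image
    return product


def make_try_on_instance(prompt: str, person_image: dict, products: List[dict]) -> dict:
    """Hypothetical helper: wraps the person image in a PersonImage message."""
    return {
        "prompt": prompt,
        "person_image": {"image": person_image},
        "product_images": products,
    }
```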
VisionEmbeddingModelInstance
Media embedding input format for the large vision model embedding API.
| Fields | |
|---|---|
| image | The image bytes or Cloud Storage URI to generate the image embedding. |
| text | The text for generating the text embedding. |
| video | The video bytes or Cloud Storage URI to generate the video embedding. |
Image
The image bytes or Cloud Storage URI to make the prediction on.
| Fields | |
|---|---|
| mime_type | The MIME type of the image content. Only the following MIME types are supported: image/jpeg, image/png. |
| Union field. Only one of the following can be set: | |
| bytes_base64_encoded | Base64-encoded bytes string representing the image. |
| gcs_uri | The Cloud Storage URI of the image. |
Video
The video bytes or Cloud Storage URI to make the prediction on.
| Fields | |
|---|---|
| video_segment_config | Video configurations. | 
| Union field. Only one of the following can be set: | |
| bytes_base64_encoded | Base64-encoded bytes string representing the video. |
| gcs_uri | The Cloud Storage URI of the video. |
VideoSegmentConfig
Video segment configurations.
| Fields | |
|---|---|
| start_offset_sec | The start offset of the video segment, in seconds. |
| end_offset_sec | The end offset of the video segment, in seconds. |
| interval_sec | The interval of the video for which the embedding will be generated. The minimum value is 4; if the interval is less than 4, an InvalidArgumentError is returned. There is no limit on the maximum value. However, an interval larger than min(video length, 120 s) affects the quality of the generated embeddings. |
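The interval_sec rules above translate into one hard check and one soft warning. A minimal sketch, assuming the caller already knows the video length in seconds; raising ValueError stands in for the service's InvalidArgumentError, the offset ordering check is an added assumption, and the helper name is hypothetical.

```python
import warnings


def make_video_segment_config(
    start_offset_sec: int,
    end_offset_sec: int,
    interval_sec: int,
    video_length_sec: float,
) -> dict:
    """Hypothetical helper applying the documented interval_sec rules."""
    if interval_sec < 4:
        # The service is documented to return InvalidArgumentError here.
        raise ValueError("interval_sec must be at least 4")
    # Assumption: offsets should describe a non-empty segment.
    if not 0 <= start_offset_sec < end_offset_sec:
        raise ValueError("expected 0 <= start_offset_sec < end_offset_sec")
    if interval_sec > min(video_length_sec, 120):
        # Allowed, but documented to affect embedding quality.
        warnings.warn("interval_sec exceeds min(video length, 120 s)")
    return {
        "start_offset_sec": start_offset_sec,
        "end_offset_sec": end_offset_sec,
        "interval_sec": interval_sec,
    }
```

For a 30-second clip, make_video_segment_config(0, 30, 40, 30.0) is accepted but warns, because 40 > min(30, 120) = 30.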
VisionGenerativeModelInstance
Media generation input format for large vision model.
| Fields | |
|---|---|
| image | The image bytes or Cloud Storage URI to make the prediction on. Required for editing; not needed for generation. The presence of this field determines whether the call is an editing or a generation request. |
| prompt | The text prompt for generating the images. This is required for both editing and generation. |
| mask | The masked region is edited based on the provided text prompt. The mask can be either an image or a polygon. It must not be provided without an image. Optional field for editing images. |
| reference_images[] | The reference images to be used for editing and customization capabilities. Imagen 3 Capability adds support for multiple reference images, each of which can be a mask, control, style, or subject image. Depending on the reference type, the reference_config field will be populated with the corresponding config. | 
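Because the presence of image decides whether a call is editing or generation, a client can classify and sanity-check instances before sending them. A minimal sketch over dict-shaped instances; the function names are hypothetical.

```python
from typing import List


def request_kind(instance: dict) -> str:
    """Hypothetical helper: the image field decides editing vs. generation."""
    return "editing" if instance.get("image") else "generation"


def check_vision_generative_instance(instance: dict) -> List[str]:
    """Hypothetical check of the documented prompt and mask rules."""
    problems = []
    # The prompt is required in both modes.
    if not instance.get("prompt"):
        problems.append("prompt is required for both editing and generation")
    # A mask only makes sense when there is an image to edit.
    if instance.get("mask") and not instance.get("image"):
        problems.append("mask must not be provided without an image")
    return problems
```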
ControlImageConfig
Config for control image used for editing.
| Fields | |
|---|---|
| control_type | Type of control image. | 
| enable_control_image_computation | Whether to compute the control image for the request. |
| superpixel_region_size | Region size of the superpixel control image. |
| superpixel_ruler | Ruler of the superpixel control image. |
ControlType
Type of control image.
| Enums | |
|---|---|
| CONTROL_TYPE_DEFAULT | Default value for control image. | 
| CONTROL_TYPE_CANNY | Canny sketch control image. | 
| CONTROL_TYPE_SCRIBBLE | Scribble sketch control image using HED model. | 
| CONTROL_TYPE_FACE_MESH | Control mode for face mesh style editing. |
| CONTROL_TYPE_COLOR_SUPERPIXEL | Color superpixel control image. | 
Image
| Fields | |
|---|---|
| mime_type | The MIME type of the image content. Only the following MIME types are supported: image/jpeg, image/png. |
| Union field data. The image bytes or Cloud Storage URI to make the prediction on. Only one of the following can be set: | |
| bytes_base64_encoded | Base64-encoded bytes string representing the image. |
| gcs_uri | The Cloud Storage URI of the image. |
Mask
| Fields | |
|---|---|
| Union field. Only one of the following can be set: | |
| image | The mask as an image. |
| polygon_list | The mask as a list of polygons. |
BoundingPolyList
| Fields | |
|---|---|
| polygons[] | |
MaskImageConfig
Config for masked image editing using Imagen 3 Capability.
| Fields | |
|---|---|
| mask_mode | Mode used to generate the mask if mask is not provided. |
| dilation | Dilation to be used with this mask. This value is used to dilate the mask before the edit mode is applied. |
| mask_classes[] | The segmentation classes used in MASK_MODE_SEMANTIC mode. |
MaskMode
Mode used to generate the mask if a mask is not provided.
| Enums | |
|---|---|
| MASK_MODE_DEFAULT | Default value for mask mode. | 
| MASK_MODE_USER_PROVIDED | User provided mask. No generation needed. | 
| MASK_MODE_BACKGROUND | Background mask. All elements detected as background will be masked. | 
| MASK_MODE_FOREGROUND | Foreground mask. All elements detected as foreground will be masked. | 
| MASK_MODE_SEMANTIC | Semantic mask. Objects identified as one of the classes defined in mask_classes will be masked. | 
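mask_classes only has an effect in MASK_MODE_SEMANTIC. The sketch below builds a MaskImageConfig and rejects mismatched combinations early; treating mask_classes as integer class IDs, requiring at least one class in semantic mode, and rejecting classes in other modes are assumptions rather than documented behavior, and the helper name is hypothetical.

```python
from typing import Optional, Sequence

MASK_MODES = {
    "MASK_MODE_DEFAULT", "MASK_MODE_USER_PROVIDED", "MASK_MODE_BACKGROUND",
    "MASK_MODE_FOREGROUND", "MASK_MODE_SEMANTIC",
}


def make_mask_image_config(
    mask_mode: str,
    dilation: Optional[float] = None,
    mask_classes: Optional[Sequence[int]] = None,
) -> dict:
    """Hypothetical helper; mask_classes are assumed to be integer class IDs."""
    if mask_mode not in MASK_MODES:
        raise ValueError(f"unknown mask_mode: {mask_mode}")
    semantic = mask_mode == "MASK_MODE_SEMANTIC"
    # Assumption: semantic mode needs classes, other modes must not carry them.
    if semantic and not mask_classes:
        raise ValueError("MASK_MODE_SEMANTIC needs at least one mask class")
    if not semantic and mask_classes:
        raise ValueError("mask_classes is only used with MASK_MODE_SEMANTIC")
    config = {"mask_mode": mask_mode}
    if dilation is not None:
        config["dilation"] = dilation
    if mask_classes:
        config["mask_classes"] = list(mask_classes)
    return config
```

The class IDs in any call are placeholders; the valid segmentation classes are not listed in this reference.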
ReferenceImage
A ReferenceImage is an image that is used to provide additional context for the image generation or editing.
| Fields | |
|---|---|
| reference_image | The actual image data of the reference image. | 
| reference_id | The ID of the reference image. This must be unique within the request. |
| reference_type | The type of the reference image. |
| Union field reference_config. A config describing the reference image. Only one of the following can be set: | |
| mask_image_config | A config for a mask image. | 
| control_image_config | A config for a control image. | 
| style_image_config | A config for a style image. | 
| subject_image_config | A config for a subject image. | 
ReferenceType
The type of the reference image.
| Enums | |
|---|---|
| REFERENCE_TYPE_DEFAULT | Default value for the reference image type. |
| REFERENCE_TYPE_RAW | A normal RGB image. | 
| REFERENCE_TYPE_MASK | A mask image. | 
| REFERENCE_TYPE_CONTROL | A control (line sketch) image. | 
| REFERENCE_TYPE_STYLE | A style image. | 
| REFERENCE_TYPE_SUBJECT | A subject image. | 
| REFERENCE_TYPE_CONTENT | A content image for R2I. | 
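Each ReferenceImage sets at most one member of the reference_config union, and reference_id must be unique within a request. The sketch below checks both and also flags a config member that does not match reference_type; that type-to-config mapping is an assumption drawn from the names above (RAW and CONTENT are assumed to take no config), and the function name is hypothetical.

```python
from typing import List

# Assumed mapping from reference_type to its reference_config member.
CONFIG_FOR_TYPE = {
    "REFERENCE_TYPE_MASK": "mask_image_config",
    "REFERENCE_TYPE_CONTROL": "control_image_config",
    "REFERENCE_TYPE_STYLE": "style_image_config",
    "REFERENCE_TYPE_SUBJECT": "subject_image_config",
}
CONFIG_FIELDS = set(CONFIG_FOR_TYPE.values())


def check_reference_images(reference_images: List[dict]) -> List[str]:
    """Hypothetical check of reference_id uniqueness and the config union."""
    problems = []
    seen_ids = set()
    for ref in reference_images:
        ref_id = ref.get("reference_id")
        if ref_id in seen_ids:
            problems.append(f"duplicate reference_id {ref_id}")
        seen_ids.add(ref_id)
        # Iterating a dict yields its keys, so this finds the set config members.
        present = CONFIG_FIELDS.intersection(ref)
        if len(present) > 1:
            problems.append(f"reference {ref_id}: more than one reference_config member set")
        expected = CONFIG_FOR_TYPE.get(ref.get("reference_type"))
        if expected and present and expected not in present:
            problems.append(f"reference {ref_id}: {sorted(present)} does not match {ref['reference_type']}")
    return problems
```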
StyleImageConfig
Config for style image used for editing.
| Fields | |
|---|---|
| style_description | Description of the style image. |
SubjectImageConfig
Config for subject image used for editing.
| Fields | |
|---|---|
| subject_description | Description of the subject image. |
| subject_type | Type of subject image. |
SubjectType
Type of subject image.
| Enums | |
|---|---|
| SUBJECT_TYPE_DEFAULT | Default value for subject image. | 
| SUBJECT_TYPE_PERSON | The subject of the image is a person. | 
| SUBJECT_TYPE_ANIMAL | The subject of the image is an animal. | 
| SUBJECT_TYPE_PRODUCT | The subject of the image is a product/object. | 
VisionReasoningModelInstance
Vision reasoning input format for the large vision model. The model supports only one instance at a time.
| Fields | |
|---|---|
| prompt | The text prompt for guiding the response in QA. |
| mask | If a mask is provided, text responses are generated from the masked area. |
| Union field. Only one of the following can be set: | |
| image | The image bytes or Cloud Storage URI to make the prediction on. |
| video | The video bytes or Cloud Storage URI to make the prediction on. |
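Since the model handles one instance at a time and image and video form a union, a request body would carry a single instance with exactly one of them. A minimal sketch, assuming the Vertex AI predict request shape with a top-level instances list; the helper name is hypothetical.

```python
from typing import Optional


def make_reasoning_request(
    prompt: str,
    *,
    image: Optional[dict] = None,
    video: Optional[dict] = None,
    mask: Optional[dict] = None,
) -> dict:
    """Hypothetical helper: one VisionReasoningModelInstance per request."""
    # image and video belong to one union, so exactly one must be set.
    if (image is None) == (video is None):
        raise ValueError("set exactly one of image or video")
    instance = {"prompt": prompt}
    if image is not None:
        instance["image"] = image
    else:
        instance["video"] = video
    if mask is not None:
        # Restricts the generated text to the masked area.
        instance["mask"] = mask
    # The model supports only one instance at a time.
    return {"instances": [instance]}
```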
Image
| Fields | |
|---|---|
| mime_type | Optional. The MIME type of the image content. Only the following MIME types are supported: image/jpeg, image/png. |
| Union field data. The image bytes or Cloud Storage URI to make the prediction on. Only one of the following can be set: | |
| bytes_base64_encoded | Base64-encoded bytes string representing the image. |
| gcs_uri | Cloud Storage URI representing the image in the user project. |
Video
| Fields | |
|---|---|
| Union field data. The video bytes or Cloud Storage URI to make the prediction on. Only one of the following can be set: | |
| bytes_base64_encoded | Base64-encoded bytes string representing the video. |
| gcs_uri | Cloud Storage URI representing the video in the user project. |