Index
- TextEmbeddingPredictionInstance (message)
- TextEmbeddingPredictionInstance.TaskType (enum)
- VideoGenerationModelInstance (message)
- VideoGenerationModelInstance.Image (message)
- VideoGenerationModelInstance.Mask (message)
- VideoGenerationModelInstance.ReferenceImage (message)
- VideoGenerationModelInstance.Video (message)
- VirtualTryOnModelInstance (message)
- VirtualTryOnModelInstance.Image (message)
- VirtualTryOnModelInstance.PersonImage (message)
- VirtualTryOnModelInstance.ProductImage (message)
- VirtualTryOnModelInstance.ProductImageConfig (message)
- VirtualTryOnModelInstance.ProductImageConfig.MaskMode (enum)
- VisionEmbeddingModelInstance (message)
- VisionEmbeddingModelInstance.Image (message)
- VisionEmbeddingModelInstance.Video (message)
- VisionEmbeddingModelInstance.Video.VideoSegmentConfig (message)
- VisionGenerativeModelInstance (message)
- VisionGenerativeModelInstance.ControlImageConfig (message)
- VisionGenerativeModelInstance.ControlImageConfig.ControlType (enum)
- VisionGenerativeModelInstance.Image (message)
- VisionGenerativeModelInstance.Mask (message)
- VisionGenerativeModelInstance.Mask.BoundingPolyList (message)
- VisionGenerativeModelInstance.MaskImageConfig (message)
- VisionGenerativeModelInstance.MaskImageConfig.MaskMode (enum)
- VisionGenerativeModelInstance.ReferenceImage (message)
- VisionGenerativeModelInstance.ReferenceImage.ReferenceType (enum)
- VisionGenerativeModelInstance.StyleImageConfig (message)
- VisionGenerativeModelInstance.SubjectImageConfig (message)
- VisionGenerativeModelInstance.SubjectImageConfig.SubjectType (enum)
- VisionReasoningModelInstance (message)
- VisionReasoningModelInstance.Image (message)
- VisionReasoningModelInstance.Video (message)

TextEmbeddingPredictionInstance
Prediction input format for Text Embedding.
| Fields | |
|---|---|
| content | The main text content to embed. |
| title | Optional identifier of the text content. |
| task_type | Optional downstream task the embeddings will be used for. |

TaskType
Represents a downstream task the embeddings will be used for.
| Enums | |
|---|---|
| DEFAULT | Unset value, which will default to one of the other enum values. |
| RETRIEVAL_QUERY | Specifies that the given text is a query in a search or retrieval setting. |
| RETRIEVAL_DOCUMENT | Specifies that the given text is a document from the corpus being searched. |
| SEMANTIC_SIMILARITY | Specifies that the given text will be used for semantic textual similarity (STS). |
| CLASSIFICATION | Specifies that the given text will be classified. |
| CLUSTERING | Specifies that the embeddings will be used for clustering. |
| QUESTION_ANSWERING | Specifies that the embeddings will be used for question answering. |
| FACT_VERIFICATION | Specifies that the embeddings will be used for fact verification. |
| CODE_RETRIEVAL_QUERY | Specifies that the embeddings will be used for code retrieval. |

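
A minimal sketch of assembling TextEmbeddingPredictionInstance values as plain Python dicts before serializing them to JSON. The helper name make_text_embedding_instance is hypothetical, the "instances" wrapper assumes the usual Vertex AI prediction request body shape, and enum values are sent by name because the proto3 JSON mapping represents enums that way.

```python
import json
from enum import Enum
from typing import Optional


class TaskType(Enum):
    # Enum names copied from the TaskType table above.
    DEFAULT = "DEFAULT"
    RETRIEVAL_QUERY = "RETRIEVAL_QUERY"
    RETRIEVAL_DOCUMENT = "RETRIEVAL_DOCUMENT"
    SEMANTIC_SIMILARITY = "SEMANTIC_SIMILARITY"
    CLASSIFICATION = "CLASSIFICATION"
    CLUSTERING = "CLUSTERING"
    QUESTION_ANSWERING = "QUESTION_ANSWERING"
    FACT_VERIFICATION = "FACT_VERIFICATION"
    CODE_RETRIEVAL_QUERY = "CODE_RETRIEVAL_QUERY"


def make_text_embedding_instance(
    content: str,
    title: Optional[str] = None,
    task_type: Optional[TaskType] = None,
) -> dict:
    """Hypothetical helper: builds one instance, omitting optional fields that are unset."""
    if not content:
        raise ValueError("content is required")
    instance = {"content": content}
    if title is not None:
        instance["title"] = title
    if task_type is not None:
        # Proto3 JSON represents enum values by their names.
        instance["task_type"] = task_type.name
    return instance


# Queries and the documents they are matched against use different task types.
query = make_text_embedding_instance(
    "how do I rotate a log file?", task_type=TaskType.RETRIEVAL_QUERY
)
doc = make_text_embedding_instance(
    "logrotate rotates, compresses, and removes old log files.",
    title="logrotate(8)",
    task_type=TaskType.RETRIEVAL_DOCUMENT,
)
print(json.dumps({"instances": [query, doc]}, indent=2))
```

Leaving task_type unset keeps the request minimal and lets the service fall back to its default behavior.
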
VideoGenerationModelInstance
An instance for a video generation prediction request.
| Fields | |
|---|---|
| prompt | A text description of the video you want to generate. The prompt should specify the subject, style, and any specific elements or actions that should appear in the video. |
| image | An optional image to use as a starting point for video generation, used as the first frame of the generated video. If |
| video | An optional input video to use as a starting point for video generation. If |
| last_frame | Image to use as the last frame of the generated video. This field can only be used if |
| camera_control | The camera motion to apply to the generated video. This field can only be used if |
| mask | An optional mask to apply to the input |
| reference_images[] | Optional reference images to guide video generation. If |

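
The field names above are the proto (snake_case) names. A minimal sketch, assuming the request is sent as JSON: the proto3 JSON mapping prints each field under a lowerCamelCase name (for example, bytes_base64_encoded becomes bytesBase64Encoded), and standard parsers accept either form. The prompt, bucket, and object path below are hypothetical placeholders.

```python
import json


def to_json_name(field: str) -> str:
    """Default proto3 JSON name: drop each underscore and upper-case the next letter."""
    out, upper_next = [], False
    for ch in field:
        if ch == "_":
            upper_next = True
        elif upper_next:
            out.append(ch.upper())
            upper_next = False
        else:
            out.append(ch)
    return "".join(out)


def camelize(value):
    """Recursively renames dict keys; values such as MIME types and URIs are left alone."""
    if isinstance(value, dict):
        return {to_json_name(k): camelize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [camelize(v) for v in value]
    return value


# Hypothetical prompt and Cloud Storage path, for illustration only.
instance = {
    "prompt": "A paper boat drifting down a rain-soaked street at dusk",
    "image": {  # used as the first frame of the generated video
        "gcs_uri": "gs://example-bucket/first_frame.png",
        "mime_type": "image/png",
    },
}
print(json.dumps({"instances": [camelize(instance)]}, indent=2))
```
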
Image
Defines the input image format.
| Fields | |
|---|---|
| mime_type | The MIME type of the image. Supported MIME types: image/jpeg, image/png. |
| Union field data | The image data, provided as either base64-encoded bytes or a Google Cloud Storage URI. data can be only one of the following: |
| bytes_base64_encoded | The image bytes encoded in base64. |
| gcs_uri | A Google Cloud Storage URI pointing to the image file. |

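
Because data is a union, an Image carries either inline bytes or a Cloud Storage URI, never both. A minimal sketch of a builder that enforces this and the supported MIME types listed above; the helper name make_image and the example paths are hypothetical.

```python
import base64
from typing import Optional

SUPPORTED_IMAGE_MIME_TYPES = {"image/jpeg", "image/png"}  # from the Image table above


def make_image(
    mime_type: str,
    *,
    raw_bytes: Optional[bytes] = None,
    gcs_uri: Optional[str] = None,
) -> dict:
    """Hypothetical helper enforcing the data union: exactly one member must be set."""
    if mime_type not in SUPPORTED_IMAGE_MIME_TYPES:
        raise ValueError(f"unsupported MIME type: {mime_type}")
    if (raw_bytes is None) == (gcs_uri is None):
        raise ValueError("provide exactly one of raw_bytes or gcs_uri")
    image = {"mime_type": mime_type}
    if raw_bytes is not None:
        # The field holds base64 text, so raw bytes are encoded first.
        image["bytes_base64_encoded"] = base64.b64encode(raw_bytes).decode("ascii")
    else:
        image["gcs_uri"] = gcs_uri
    return image


inline = make_image("image/png", raw_bytes=b"\x89PNG\r\n\x1a\n")  # PNG signature only
remote = make_image("image/jpeg", gcs_uri="gs://example-bucket/frame.jpg")
```

Inline bytes avoid a storage round trip for small images, while a gcs_uri keeps large inputs out of the request body.
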
Mask
Defines the input mask format for video editing. A mask specifies regions of a video or image to modify or preserve.
| Fields | |
|---|---|
| mime_type | The MIME type of the mask. Supported MIME types: image/png, image/jpeg, image/webp, video/mov, video/mpeg, video/mp4, video/mpg, video/avi, video/wmv, video/mpegps, video/flv. |
| mask_mode | Specifies how the mask is applied to the input video for editing. For |
| Union field data | The mask data, provided as either base64-encoded bytes or a Google Cloud Storage URI. data can be only one of the following: |
| bytes_base64_encoded | The mask bytes encoded in base64. |
| gcs_uri | A Google Cloud Storage URI pointing to the mask file. |

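
A mask may be a still image or a video, and its MIME type tells the two apart. A minimal sketch that classifies a mask MIME type using only the types listed above; the helper names mask_kind and make_mask are hypothetical, and mask_mode is deliberately left unset.

```python
# Supported mask MIME types, split by kind, from the Mask table above.
IMAGE_MASK_TYPES = {"image/png", "image/jpeg", "image/webp"}
VIDEO_MASK_TYPES = {
    "video/mov", "video/mpeg", "video/mp4", "video/mpg",
    "video/avi", "video/wmv", "video/mpegps", "video/flv",
}


def mask_kind(mime_type: str) -> str:
    """Hypothetical helper: reports whether a mask MIME type is an image or a video mask."""
    if mime_type in IMAGE_MASK_TYPES:
        return "image"
    if mime_type in VIDEO_MASK_TYPES:
        return "video"
    raise ValueError(f"unsupported mask MIME type: {mime_type}")


def make_mask(mime_type: str, gcs_uri: str) -> dict:
    """Builds a Mask that references a Cloud Storage object, after validating its type."""
    mask_kind(mime_type)  # raises ValueError for unsupported types
    return {"mime_type": mime_type, "gcs_uri": gcs_uri}


# Hypothetical Cloud Storage path, for illustration only.
mask = make_mask("video/mp4", "gs://example-bucket/mask.mp4")
```
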
ReferenceImage
Defines the input reference image format. A reference image provides additional context to guide video generation, such as style or assets.
| Fields | |
|---|---|
| image | The image data for the reference image. |
| reference_type | The type of reference image, which defines how it influences video generation. Supported values: |

Video
Defines the input video format.
| Fields | |
|---|---|
| mime_type | The MIME type of the video. Supported MIME types: video/mov, video/mpeg, video/mp4, video/mpg, video/avi, video/wmv, video/mpegps, video/flv. |
| Union field data | The video data, provided as either base64-encoded bytes or a Google Cloud Storage URI. data can be only one of the following: |
| gcs_uri | A Google Cloud Storage URI pointing to the video file. |
| bytes_base64_encoded | The video bytes encoded in base64. |

VirtualTryOnModelInstance
Media generation input format for the Virtual Try On model.
| Fields | |
|---|---|
| prompt | The text prompt for generating the images. This is required for both editing and generation. |
| product_images[] | The images of the products to be worn by the person. |
| person_image | The image of the person to be edited with the product images. |

Image
Input image and metadata.
| Fields | |
|---|---|
| mime_type | The MIME type of the image content. Only the following MIME types are supported: image/jpeg, image/png. |
| Union field data | The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following: |
| bytes_base64_encoded | Base64-encoded bytes string representing the image. |
| gcs_uri | The Cloud Storage URI of the image. |

PersonImage
A PersonImage is used to provide the person image and its associated configuration options for Virtual Try On.
| Fields | |
|---|---|
| image | The image bytes or Cloud Storage URI of the person or subject that will be edited using the product images. |

ProductImage
A ProductImage is used to provide the product image and its associated configuration options for Virtual Try On.
| Fields | |
|---|---|
| image | The actual image data of the product image. |
| mask_image | The mask image associated with this product. If provided, the mask image is used to guide the image editing. |
| product_image_config | A config for the product image. |

ProductImageConfig
Config for the product image.
| Fields | |
|---|---|
| mask_mode | Mode used to control the segmentation logic. |
| dilation | Dilation to apply to this mask. |
| product_description | Description of the product. |

MaskMode
Mode used to generate the mask if a mask is not provided.
| Enums | |
|---|---|
| MASK_MODE_DEFAULT | Default value for mask mode. |
| MASK_MODE_USER_PROVIDED | User-provided mask. No segmentation is needed. |
| MASK_MODE_DETECTION_BOX | Mask generated from detected bounding boxes. |
| MASK_MODE_CLOTHING_AREA | Mask generated by segmenting the clothing area with open-vocabulary segmentation. |
| MASK_MODE_PARSED_PERSON | Mask generated by segmenting the person's body and clothing with the person-parsing model. |

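
A minimal sketch of a complete VirtualTryOnModelInstance built as nested dicts, showing how PersonImage, ProductImage, and ProductImageConfig wrap the Image message. The helper names, prompt, and Cloud Storage paths are hypothetical, and two points are assumptions rather than documented rules: that mask_image uses the same Image shape, and that MASK_MODE_USER_PROVIDED needs a mask_image because it skips segmentation.

```python
import json
from typing import Optional

# MaskMode names from the table above.
VALID_MASK_MODES = {
    "MASK_MODE_DEFAULT",
    "MASK_MODE_USER_PROVIDED",
    "MASK_MODE_DETECTION_BOX",
    "MASK_MODE_CLOTHING_AREA",
    "MASK_MODE_PARSED_PERSON",
}


def gcs_image(uri: str, mime_type: str = "image/jpeg") -> dict:
    """Builds an Image message that points at a Cloud Storage object."""
    return {"gcs_uri": uri, "mime_type": mime_type}


def product_image(
    uri: str,
    mask_mode: str = "MASK_MODE_DEFAULT",
    mask_uri: Optional[str] = None,
    description: Optional[str] = None,
) -> dict:
    """Hypothetical helper that builds one ProductImage entry."""
    if mask_mode not in VALID_MASK_MODES:
        raise ValueError(f"unknown mask_mode: {mask_mode}")
    # Assumption: a user-provided mask mode skips segmentation, so a mask must be supplied.
    if mask_mode == "MASK_MODE_USER_PROVIDED" and mask_uri is None:
        raise ValueError("MASK_MODE_USER_PROVIDED requires a mask image")
    config = {"mask_mode": mask_mode}
    if description is not None:
        config["product_description"] = description
    entry = {"image": gcs_image(uri), "product_image_config": config}
    if mask_uri is not None:
        entry["mask_image"] = gcs_image(mask_uri, "image/png")
    return entry


# Hypothetical prompt and paths, for illustration only.
instance = {
    "prompt": "The person wearing the denim jacket",
    "person_image": {"image": gcs_image("gs://example-bucket/person.jpg")},
    "product_images": [
        product_image(
            "gs://example-bucket/jacket.jpg",
            mask_mode="MASK_MODE_CLOTHING_AREA",
            description="denim jacket",
        ),
    ],
}
print(json.dumps({"instances": [instance]}, indent=2))
```
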
VisionEmbeddingModelInstance
Input format for requesting embeddings from vision models. An embedding is a list of numbers that represents the semantic meaning of text, an image, or a video. Embeddings can be used for many applications, such as searching for similar images or generating recommendations. Each instance must specify exactly one of the text, image, or video fields.
| Fields | |
|---|---|
| image | An image to generate embeddings for. |
| text | Text to generate embeddings for. |
| video | A video to generate embeddings for. |

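
The rule that each instance specifies exactly one of text, image, or video is easy to violate when instances are assembled programmatically. A minimal sketch of a builder that enforces it; make_embedding_instance and the example values are hypothetical.

```python
from typing import Optional

ONEOF_FIELDS = ("text", "image", "video")


def make_embedding_instance(
    *,
    text: Optional[str] = None,
    image: Optional[dict] = None,
    video: Optional[dict] = None,
) -> dict:
    """Hypothetical helper: enforces the exactly-one-of text, image, or video rule."""
    provided = {
        name: value
        for name, value in zip(ONEOF_FIELDS, (text, image, video))
        if value is not None
    }
    if len(provided) != 1:
        got = sorted(provided) or "none"
        raise ValueError(f"expected exactly one of {ONEOF_FIELDS}, got {got}")
    return provided


text_instance = make_embedding_instance(text="a red bicycle leaning on a wall")
image_instance = make_embedding_instance(image={"gcs_uri": "gs://example-bucket/bike.png"})
```

To embed text and an image together, this sketch would send them as two separate instances.
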
Image
Represents an image input for embedding generation.
| Fields | |
|---|---|
| mime_type | The MIME type of the image. The supported MIME types are: |
| Union field | |
| bytes_base64_encoded | Base64-encoded bytes of the image. |
| gcs_uri | A Cloud Storage URI pointing to the image file. Format: |

Video
Represents a video input for embedding generation.
| Fields | |
|---|---|
| video_segment_config | Configuration for processing a video segment. If specified, embeddings are generated for the segment. If not specified, embeddings are generated for the entire video. |
| Union field | |
| bytes_base64_encoded | Base64-encoded bytes of the video. |
| gcs_uri | A Cloud Storage URI pointing to the video file. Format: |

VideoSegmentConfig
Configuration for processing a segment of a video.
| Fields | |
|---|---|
| start_offset_sec | The start offset of the video segment in seconds. |
| end_offset_sec | The end offset of the video segment in seconds. |
| interval_sec | The interval of the video for which the embedding will be generated. The minimum value for |

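
A rough way to anticipate how many embeddings a segment yields, assuming (not confirmed by this reference) that one embedding is produced per interval_sec window between the two offsets, with a shorter final window when the span does not divide evenly. The helper segment_windows is hypothetical and does not enforce any minimum interval.

```python
import math
from typing import List, Tuple


def segment_windows(
    start_offset_sec: float, end_offset_sec: float, interval_sec: float
) -> List[Tuple[float, float]]:
    """Hypothetical helper: splits [start, end) into consecutive interval_sec windows."""
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    if end_offset_sec <= start_offset_sec:
        raise ValueError("end_offset_sec must be greater than start_offset_sec")
    count = math.ceil((end_offset_sec - start_offset_sec) / interval_sec)
    return [
        (
            start_offset_sec + i * interval_sec,
            # The last window is clipped to the segment end.
            min(start_offset_sec + (i + 1) * interval_sec, end_offset_sec),
        )
        for i in range(count)
    ]


windows = segment_windows(0, 30, 16)
```
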
VisionGenerativeModelInstance
Media generation input format for the large vision model.
| Fields | |
|---|---|
| image | The image bytes or Cloud Storage URI to make the prediction on. Required for editing; not needed for generation. The presence of this field determines whether the call is treated as editing or generation. |
| prompt | The text prompt for generating the images. This is required for both editing and generation. |
| mask | Optional mask for editing the images. The masked region is edited based on the text prompt. The mask can be either an image or a polygon. It should not be provided without an image. |
| reference_images[] | The reference images to be used for editing and customization capabilities. Imagen 3 Capability adds support for multiple reference images, each of which can be a mask, control, style, or subject image. Depending on the reference type, the reference_config field is populated with the corresponding config. |

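
The presence of image is what distinguishes an editing call from a generation call, and a mask only makes sense alongside an image. A minimal sketch of a pre-flight check that encodes those two rules and the required prompt; request_kind and the example values are hypothetical.

```python
def request_kind(instance: dict) -> str:
    """Hypothetical helper: classifies a VisionGenerativeModelInstance dict."""
    if not instance.get("prompt"):
        raise ValueError("prompt is required for both editing and generation")
    has_image = "image" in instance
    if "mask" in instance and not has_image:
        raise ValueError("mask should not be provided without an image")
    return "editing" if has_image else "generation"


generate = {"prompt": "a lighthouse at dawn, watercolor"}
edit = {
    "prompt": "replace the sky with a starry night",
    "image": {"gcs_uri": "gs://example-bucket/photo.png", "mime_type": "image/png"},
}
```
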
ControlImageConfig
Config for control image used for editing.
| Fields | |
|---|---|
| control_type | Type of control image. |
| enable_control_image_computation | Whether to compute the control image for the request. |
| superpixel_region_size | Region size of the superpixel control image. |
| superpixel_ruler | Ruler of the superpixel control image. |

ControlType
Type of control image.
| Enums | |
|---|---|
| CONTROL_TYPE_DEFAULT | Default value for control image. |
| CONTROL_TYPE_CANNY | Canny sketch control image. |
| CONTROL_TYPE_SCRIBBLE | Scribble sketch control image using the HED model. |
| CONTROL_TYPE_FACE_MESH | Control mode for face mesh style editing. |
| CONTROL_TYPE_COLOR_SUPERPIXEL | Color superpixel control image. |

Image
| Fields | |
|---|---|
| mime_type | The MIME type of the image content. Only the following MIME types are supported: image/jpeg, image/png. |
| Union field data | The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following: |
| bytes_base64_encoded | Base64-encoded bytes string representing the image. |
| gcs_uri | |

Mask
| Fields | |
|---|---|
| Union field | |
| image | |
| polygon_list | |

BoundingPolyList
| Fields | |
|---|---|
| polygons[] | |

MaskImageConfig
Config for masked image editing using Imagen 3 Capability.
| Fields | |
|---|---|
| mask_mode | Mode used to generate the mask if a mask is not provided. |
| dilation | Dilation to apply to this mask. The mask is dilated by this value before the edit mode is applied. |
| mask_classes[] | The segmentation classes used in MASK_MODE_SEMANTIC mode. |

MaskMode
Mode used to generate the mask if a mask is not provided.
| Enums | |
|---|---|
| MASK_MODE_DEFAULT | Default value for mask mode. |
| MASK_MODE_USER_PROVIDED | User-provided mask. No mask generation is needed. |
| MASK_MODE_BACKGROUND | Background mask. All elements detected as background are masked. |
| MASK_MODE_FOREGROUND | Foreground mask. All elements detected as foreground are masked. |
| MASK_MODE_SEMANTIC | Semantic mask. Objects identified as one of the classes defined in mask_classes are masked. |

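
mask_classes is only meaningful in MASK_MODE_SEMANTIC mode. A minimal sketch of a MaskImageConfig builder that rejects classes in other modes and, as an added assumption, also rejects semantic mode without any classes (since nothing would be masked). The helper name and the class ID 7 are hypothetical placeholders, not real segmentation class identifiers.

```python
from typing import Optional, Sequence

# MaskMode names from the MaskImageConfig.MaskMode table above.
MASK_MODES = {
    "MASK_MODE_DEFAULT",
    "MASK_MODE_USER_PROVIDED",
    "MASK_MODE_BACKGROUND",
    "MASK_MODE_FOREGROUND",
    "MASK_MODE_SEMANTIC",
}


def make_mask_image_config(
    mask_mode: str,
    dilation: Optional[float] = None,
    mask_classes: Sequence[int] = (),
) -> dict:
    """Hypothetical helper that builds a MaskImageConfig dict."""
    if mask_mode not in MASK_MODES:
        raise ValueError(f"unknown mask_mode: {mask_mode}")
    if mask_classes and mask_mode != "MASK_MODE_SEMANTIC":
        raise ValueError("mask_classes is only used in MASK_MODE_SEMANTIC mode")
    # Assumption: semantic mode with no classes would mask nothing, so reject it.
    if mask_mode == "MASK_MODE_SEMANTIC" and not mask_classes:
        raise ValueError("MASK_MODE_SEMANTIC needs at least one mask class")
    config = {"mask_mode": mask_mode}
    if dilation is not None:
        config["dilation"] = dilation
    if mask_classes:
        config["mask_classes"] = list(mask_classes)
    return config


background = make_mask_image_config("MASK_MODE_BACKGROUND", dilation=0.01)
semantic = make_mask_image_config("MASK_MODE_SEMANTIC", mask_classes=[7])
```
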
ReferenceImage
A ReferenceImage is an image that is used to provide additional context for the image generation or editing.
| Fields | |
|---|---|
| reference_image | The actual image data of the reference image. |
| reference_id | The ID of the reference image. It must be unique within the request. |
| reference_type | The type of the reference image. |
| Union field reference_config | A config describing the reference image. reference_config can be only one of the following: |
| mask_image_config | A config for a mask image. |
| control_image_config | A config for a control image. |
| style_image_config | A config for a style image. |
| subject_image_config | A config for a subject image. |

ReferenceType
The type of the reference image.
| Enums | |
|---|---|
| REFERENCE_TYPE_DEFAULT | Default value for the reference type. |
| REFERENCE_TYPE_RAW | A normal RGB image. |
| REFERENCE_TYPE_MASK | A mask image. |
| REFERENCE_TYPE_CONTROL | A control (line sketch) image. |
| REFERENCE_TYPE_STYLE | A style image. |
| REFERENCE_TYPE_SUBJECT | A subject image. |
| REFERENCE_TYPE_CONTENT | A content image for R2I. |

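
A sketch of a pre-flight check for reference_images[]: reference_id values must be unique, reference_config is a union so at most one config may be set, and the config should match the reference type. The type-to-config pairing below follows the reference_images[] description (the config corresponds to the reference type); the helper name and example values are hypothetical.

```python
from typing import List

# Assumed pairing of reference types with reference_config union members.
CONFIG_FOR_TYPE = {
    "REFERENCE_TYPE_MASK": "mask_image_config",
    "REFERENCE_TYPE_CONTROL": "control_image_config",
    "REFERENCE_TYPE_STYLE": "style_image_config",
    "REFERENCE_TYPE_SUBJECT": "subject_image_config",
}
CONFIG_FIELDS = set(CONFIG_FOR_TYPE.values())


def check_reference_images(refs: List[dict]) -> None:
    """Hypothetical validator; raises ValueError on the first problem it finds."""
    seen_ids = set()
    for ref in refs:
        ref_id = ref.get("reference_id")
        if ref_id in seen_ids:
            raise ValueError(f"duplicate reference_id: {ref_id}")
        seen_ids.add(ref_id)
        configs = [name for name in CONFIG_FIELDS if name in ref]
        if len(configs) > 1:
            raise ValueError("reference_config is a union; set at most one config")
        # Types without a dedicated config (RAW, CONTENT, DEFAULT) map to None.
        expected = CONFIG_FOR_TYPE.get(ref.get("reference_type"))
        if configs and configs[0] != expected:
            raise ValueError(f"{configs[0]} does not match {ref.get('reference_type')}")


refs = [
    {"reference_id": 1, "reference_type": "REFERENCE_TYPE_RAW"},
    {
        "reference_id": 2,
        "reference_type": "REFERENCE_TYPE_STYLE",
        "style_image_config": {"style_description": "watercolor"},
    },
]
check_reference_images(refs)  # passes silently
```
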
StyleImageConfig
Config for style image used for editing.
| Fields | |
|---|---|
| style_description | Description of the style image. |

SubjectImageConfig
Config for subject image used for editing.
| Fields | |
|---|---|
| subject_description | Description of the subject image. |
| subject_type | Type of subject image. |

SubjectType
Type of subject image.
| Enums | |
|---|---|
| SUBJECT_TYPE_DEFAULT | Default value for subject image. |
| SUBJECT_TYPE_PERSON | The subject of the image is a person. |
| SUBJECT_TYPE_ANIMAL | The subject of the image is an animal. |
| SUBJECT_TYPE_PRODUCT | The subject of the image is a product or object. |

VisionReasoningModelInstance
Vision reasoning input format for the large vision model. The model supports only one instance at a time.
| Fields | |
|---|---|
| prompt | The text prompt that guides the response in question answering (QA). |
| mask | If a mask is provided, text responses are generated from the masked area. |
| Union field | |
| image | The image bytes or Cloud Storage URI to make the prediction on. |
| video | The video bytes or Cloud Storage URI to make the prediction on. |

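
Since the model accepts only one instance at a time and image and video belong to the same union, a request body for this model holds a single instance with at most one of the two. A minimal sketch, assuming the usual "instances" request body shape; make_reasoning_request and the example values are hypothetical.

```python
from typing import Optional


def make_reasoning_request(
    prompt: str,
    *,
    image: Optional[dict] = None,
    video: Optional[dict] = None,
    mask: Optional[dict] = None,
) -> dict:
    """Hypothetical helper: builds a single-instance request body."""
    if image is not None and video is not None:
        raise ValueError("image and video belong to one union; set at most one")
    instance = {"prompt": prompt}
    for name, value in (("image", image), ("video", video), ("mask", mask)):
        if value is not None:
            instance[name] = value
    # The model supports only one instance at a time, so the list has one element.
    return {"instances": [instance]}


body = make_reasoning_request(
    "What is the dog holding?",
    image={"gcs_uri": "gs://example-bucket/dog.jpg", "mime_type": "image/jpeg"},
)
```

To ask about several images, this sketch would send one request per image rather than batching instances.
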
Image
| Fields | |
|---|---|
| mime_type | Optional. The MIME type of the image content. Only the following MIME types are supported: image/jpeg, image/png. |
| Union field data | The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following: |
| bytes_base64_encoded | Base64-encoded bytes string representing the image. |
| gcs_uri | Cloud Storage URI representing the image in the user's project. |

Video
| Fields | |
|---|---|
| Union field data | The video bytes or Cloud Storage URI to make the prediction on. data can be only one of the following: |
| bytes_base64_encoded | Base64-encoded bytes string representing the video. |
| gcs_uri | |