Package google.cloud.aiplatform.v1beta1.schema.predict.instance

Index

TextEmbeddingPredictionInstance

Prediction input format for Text Embedding.

Fields
content

string

The main text content to embed.

title

string

Optional identifier of the text content.

task_type

TaskType

Optional downstream task the embeddings will be used for.

TaskType

Represents a downstream task the embeddings will be used for.

Enums
DEFAULT Unset value, which will default to one of the other enum values.
RETRIEVAL_QUERY Specifies the given text is a query in a search/retrieval setting.
RETRIEVAL_DOCUMENT Specifies the given text is a document from the corpus being searched.
SEMANTIC_SIMILARITY Specifies the given text will be used for semantic textual similarity (STS).
CLASSIFICATION Specifies that the given text will be classified.
CLUSTERING Specifies that the embeddings will be used for clustering.
QUESTION_ANSWERING Specifies that the embeddings will be used for question answering.
FACT_VERIFICATION Specifies that the embeddings will be used for fact verification.
CODE_RETRIEVAL_QUERY Specifies that the embeddings will be used for code retrieval.
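
For illustration, the fields above can be assembled into a JSON-style instance dict for a predict request. The helper below is a hypothetical sketch (the function name and sample values are not part of the API); only the field names and TaskType values come from the schema:

```python
# Hypothetical helper: build a TextEmbeddingPredictionInstance payload as a dict.
# Field names (content, title, task_type) follow the schema above.

VALID_TASK_TYPES = {
    "DEFAULT", "RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY",
    "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING", "FACT_VERIFICATION",
    "CODE_RETRIEVAL_QUERY",
}

def make_text_embedding_instance(content, task_type="DEFAULT", title=None):
    if task_type not in VALID_TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type}")
    instance = {"content": content, "task_type": task_type}
    if title is not None:
        instance["title"] = title  # optional identifier of the text content
    return instance

instance = make_text_embedding_instance(
    "Embeddings map text to vectors.",
    task_type="RETRIEVAL_DOCUMENT",
    title="intro",
)
```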

VideoGenerationModelInstance

An instance for a video generation prediction request.

Fields
prompt

string

A text description of the video you want to generate. The prompt should specify the subject, style, and any specific elements or actions that should appear in the video.

image

Image

An optional image to use as a starting point for video generation, used as the first frame of the generated video. If image is provided, video and reference_images cannot be used.

video

Video

An optional input video to use as a starting point for video generation. If video is provided, image and reference_images cannot be used. If mask is also provided, the input video will be edited based on the mask. If mask is not provided, the input video will be extended in duration.

last_frame

Image

Image to use as the last frame of the generated video. This field can only be used if image is also provided.

camera_control

string

The camera motion to apply to the generated video. This field can only be used if image is also provided. Supported values: - fixed: No camera motion. - pan_left: Pan camera to the left. - pan_right: Pan camera to the right. - tilt_up: Tilt camera up. - tilt_down: Tilt camera down. - truck_left: Move camera to the left. - truck_right: Move camera to the right. - pedestal_up: Move camera up. - pedestal_down: Move camera down. - push_in: Move camera closer to the subject. - pull_out: Move camera away from the subject.

mask

Mask

An optional mask to apply to the input video for video editing tasks like inserting or removing objects, or outpainting.

reference_images[]

ReferenceImage

Optional reference images to guide video generation. If reference_images are provided, prompt must also be provided, and image, video, and last_frame cannot be used. You can provide up to 3 asset images or 1 style image.
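
The mutual-exclusivity rules above (image vs. video vs. reference_images, and last_frame/camera_control requiring image) can be captured in a small validator. This is a hypothetical sketch, not an official client; field names mirror the schema's snake_case, and the GCS paths are placeholders:

```python
# Hypothetical validator for a VideoGenerationModelInstance dict.

def make_video_generation_instance(prompt, image=None, video=None,
                                   reference_images=None, last_frame=None,
                                   camera_control=None, mask=None):
    # image, video, and reference_images are mutually exclusive per the schema.
    if sum(x is not None for x in (image, video, reference_images)) > 1:
        raise ValueError("set at most one of image, video, reference_images")
    # last_frame and camera_control can only be used together with image.
    if (last_frame is not None or camera_control is not None) and image is None:
        raise ValueError("last_frame and camera_control require image")
    instance = {"prompt": prompt}
    for key, value in (("image", image), ("video", video),
                       ("reference_images", reference_images),
                       ("last_frame", last_frame),
                       ("camera_control", camera_control), ("mask", mask)):
        if value is not None:
            instance[key] = value
    return instance

ok = make_video_generation_instance(
    "A paper boat drifting down a rainy street",
    image={"gcs_uri": "gs://my-bucket/first-frame.png", "mime_type": "image/png"},
    camera_control="push_in",
)
```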

Image

Defines the input image format.

Fields
mime_type

string

The MIME type of the image. Supported MIME types: - image/jpeg - image/png

Union field data. The image data. The image can be provided as either base64 encoded bytes or a Google Cloud Storage URI. data can be only one of the following:
bytes_base64_encoded

string

The image bytes encoded in base64.

gcs_uri

string

A Google Cloud Storage URI pointing to the image file.

Mask

Defines the input mask format for video editing. A mask specifies regions of a video or image to modify or preserve.

Fields
mime_type

string

The MIME type of the mask. Supported MIME types: - image/png - image/jpeg - image/webp - video/mov - video/mpeg - video/mp4 - video/mpg - video/avi - video/wmv - video/mpegps - video/flv

mask_mode

string

Specifies how the mask is applied to the input video for editing. For insert, remove, and remove_static modes, the mask must have the same aspect ratio as the input video. For outpaint mode, the mask can be 9:16 or 16:9. Supported values: - insert: The mask defines a rectangular region in the first frame of the input video. The object described in the prompt is inserted into this region and appears in subsequent frames. - remove: The mask identifies an object in the first frame to track and remove from the video. - remove_static: The mask identifies a static region in the video from which to remove objects. - outpaint: The mask defines a region where the input video is placed. The area outside this region is generated based on the prompt. Video masks are not supported for outpainting.

Union field data. The mask data. The mask can be provided as either base64 encoded bytes or a Google Cloud Storage URI. data can be only one of the following:
bytes_base64_encoded

string

The mask bytes encoded in base64.

gcs_uri

string

A Google Cloud Storage URI pointing to the mask file.
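
The mask_mode rules described above can be sketched as a check. This is a hypothetical illustration: the aspect ratios are passed in by the caller as strings, and nothing here inspects real media.

```python
# Hypothetical check of the mask_mode constraints for video editing masks.

INSERT_LIKE = {"insert", "remove", "remove_static"}

def validate_mask_mode(mask_mode, mask_aspect, video_aspect):
    if mask_mode in INSERT_LIKE:
        # These modes require the mask to match the input video's aspect ratio.
        if mask_aspect != video_aspect:
            raise ValueError(f"{mask_mode} requires matching aspect ratios")
    elif mask_mode == "outpaint":
        # Outpainting masks must be 9:16 or 16:9.
        if mask_aspect not in {"9:16", "16:9"}:
            raise ValueError("outpaint mask must be 9:16 or 16:9")
    else:
        raise ValueError(f"unknown mask_mode: {mask_mode}")

validate_mask_mode("insert", "16:9", "16:9")    # matching ratios: ok
validate_mask_mode("outpaint", "9:16", "16:9")  # outpaint allows 9:16 or 16:9
```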

ReferenceImage

Defines the input reference image format. A reference image provides additional context to guide video generation, such as style or assets.

Fields
image

Image

The image data for the reference image.

reference_type

string

The type of reference image, which defines how it influences video generation. Supported values: - asset: The reference image provides assets to the generated video, such as the scene, an object, or a character. - style: The aesthetics of the reference image (e.g., colors, lighting, texture) are used to define the style of the generated video, such as 'anime', 'photography', or 'origami'.

Video

Defines the input video format.

Fields
mime_type

string

The MIME type of the video. Supported MIME types: - video/mov - video/mpeg - video/mp4 - video/mpg - video/avi - video/wmv - video/mpegps - video/flv

Union field data. The video data. The video can be provided as either base64 encoded bytes or a Google Cloud Storage URI. data can be only one of the following:
gcs_uri

string

A Google Cloud Storage URI pointing to the video file.

bytes_base64_encoded

string

The video bytes encoded in base64.

VirtualTryOnModelInstance

Media generation input format for the Virtual Try On model.

Fields
prompt

string

The text prompt for generating the images. This is required for both editing and generation.

product_images[]

ProductImage

The image of the products to wear on the person.

person_image

PersonImage

The image of the person to be edited with the product images.
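
Putting the three fields together, a VirtualTryOnModelInstance might be built as below. A hypothetical sketch; the helper name and GCS paths are placeholders, while the field names follow the schema:

```python
# Hypothetical builder for a VirtualTryOnModelInstance payload.

def make_virtual_try_on_instance(prompt, person_image, product_images):
    if not product_images:
        raise ValueError("at least one product image is required")
    return {
        "prompt": prompt,
        "person_image": {"image": person_image},
        "product_images": [{"image": img} for img in product_images],
    }

instance = make_virtual_try_on_instance(
    "Person wearing the jacket",
    person_image={"gcs_uri": "gs://my-bucket/person.png", "mime_type": "image/png"},
    product_images=[
        {"gcs_uri": "gs://my-bucket/jacket.png", "mime_type": "image/png"},
    ],
)
```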

Image

Input image and metadata.

Fields
mime_type

string

The MIME type of the image. Supported MIME types: - image/jpeg - image/png

Union field data. The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following:
bytes_base64_encoded

string

Base64 encoded bytes string representing the image.

gcs_uri

string

The Cloud Storage URI of the image.

PersonImage

A PersonImage is used to provide the person image and its associated configuration options for Virtual Try On.

Fields
image

Image

The image bytes or Cloud Storage URI of the person or subject that will be edited using the product images.

ProductImage

A ProductImage is used to provide the product image and its associated configuration options for Virtual Try On.

Fields
image

Image

The actual image data of the reference image.

mask_image

Image

The mask image associated with this product. If provided, the mask image will be used to guide the image editing.

product_image_config

ProductImageConfig

A config for the product image.

ProductImageConfig

Config for the product image.

Fields
mask_mode

MaskMode

Mode used to control the segmentation logic.

dilation

float

Dilation to be used with this Mask.

product_description

string

Description of the product.

MaskMode

Mode used to generate the mask if mask is not provided.

Enums
MASK_MODE_DEFAULT Default value for mask mode.
MASK_MODE_USER_PROVIDED User provided mask. No segmentation needed.
MASK_MODE_DETECTION_BOX Mask from detected bounding boxes.
MASK_MODE_CLOTHING_AREA Masks from segmenting the clothing area with open-vocab segmentation.
MASK_MODE_PARSED_PERSON Masks from segmenting the person body and clothing using the person-parsing model.

VisionEmbeddingModelInstance

Input format for requesting embeddings from vision models. An embedding is a list of numbers that represents the semantic meaning of text, an image, or a video. Embeddings can be used for many applications, like searching for similar images or getting recommendations. Each instance must specify exactly one of text, image, or video field.

Fields
image

Image

An image to generate embeddings for.

text

string

Text to generate embeddings for.

video

Video

A video to generate embeddings for.
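
The "exactly one of text, image, or video" rule can be enforced when building the instance. A hypothetical sketch:

```python
# Hypothetical builder for a VisionEmbeddingModelInstance: exactly one of
# text, image, or video must be set.

def make_vision_embedding_instance(text=None, image=None, video=None):
    provided = {k: v for k, v in (("text", text), ("image", image),
                                  ("video", video)) if v is not None}
    if len(provided) != 1:
        raise ValueError("exactly one of text, image, video must be set")
    return provided

inst = make_vision_embedding_instance(text="a photo of a red bicycle")
```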

Image

Represents an image input for embedding generation.

Fields
mime_type

string

The MIME type of the image.

The supported MIME types are:

  • image/jpeg
  • image/png

Union field data.

data can be only one of the following:

bytes_base64_encoded

string

Base64-encoded bytes of the image.

gcs_uri

string

A Cloud Storage URI pointing to the image file. Format: gs://bucket/object

Video

Represents a video input for embedding generation.

Fields
video_segment_config

VideoSegmentConfig

Configuration for processing a video segment. If specified, embeddings are generated for the segment. If not specified, embeddings are generated for the entire video.

Union field data.

data can be only one of the following:

bytes_base64_encoded

string

Base64-encoded bytes of the video.

gcs_uri

string

A Cloud Storage URI pointing to the video file. Format: gs://bucket/object

VideoSegmentConfig

Configuration for processing a segment of a video.

Fields
start_offset_sec

int32

The start offset of the video segment in seconds.

end_offset_sec

int32

The end offset of the video segment in seconds.

interval_sec

int32

The interval of the video for which the embedding will be generated. The minimum value for interval_sec is 4. If the interval is less than 4, an InvalidArgumentError is returned. There are no limitations on the maximum value of the interval. However, if the interval is larger than min(video_length, 120s), it might affect the quality of the generated embeddings.
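
The interval_sec constraints above translate into a simple client-side check. A hypothetical sketch: the hard minimum of 4 raises an error, while the quality caveat is surfaced as a warning (the schema's min(video_length, 120s) is approximated here with the segment length, since the builder does not know the full video length):

```python
import warnings

# Hypothetical validation of a VideoSegmentConfig per the constraints above.

def make_video_segment_config(start_offset_sec, end_offset_sec, interval_sec):
    if interval_sec < 4:
        raise ValueError("interval_sec must be >= 4")
    segment_len = end_offset_sec - start_offset_sec
    if interval_sec > min(segment_len, 120):
        # Not an error, but the schema notes embedding quality may degrade.
        warnings.warn("interval_sec exceeds min(segment length, 120s); "
                      "embedding quality may suffer")
    return {"start_offset_sec": start_offset_sec,
            "end_offset_sec": end_offset_sec,
            "interval_sec": interval_sec}

cfg = make_video_segment_config(0, 60, 10)
```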

VisionGenerativeModelInstance

Media generation input format for the large vision model.

Fields
image

Image

The image bytes or Cloud Storage URI to make the prediction on. It is required for editing. Not needed for generation. This field will be used to determine whether the call is editing or generation.

prompt

string

The text prompt for generating the images. This is required for both editing and generation.

mask

Mask

An optional mask for editing the image. The masked region is edited based on the provided text prompt. The mask can be either an image or a polygon list, and must not be provided without image.

reference_images[]

ReferenceImage

The reference images to be used for editing and customization capabilities. Imagen 3 Capability adds support for multiple reference images, each of which can be a mask, control, style, or subject image. Depending on the reference type, the reference_config field will be populated with the corresponding config.
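
As the image field description notes, the presence of image distinguishes an editing call from a generation call, and mask must not appear without it. A hypothetical sketch of that logic:

```python
# Hypothetical builder for a VisionGenerativeModelInstance. The presence of
# `image` flips the call from generation to editing; `mask` requires `image`.

def make_vision_generative_instance(prompt, image=None, mask=None,
                                    reference_images=None):
    if mask is not None and image is None:
        raise ValueError("mask must not be provided without image")
    instance = {"prompt": prompt}
    if image is not None:
        instance["image"] = image  # presence of image => editing call
    if mask is not None:
        instance["mask"] = mask
    if reference_images is not None:
        instance["reference_images"] = reference_images
    return instance

generation = make_vision_generative_instance("a watercolor fox")
editing = make_vision_generative_instance(
    "replace the sky with a sunset",
    image={"gcs_uri": "gs://my-bucket/photo.png", "mime_type": "image/png"},
)
```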

ControlImageConfig

Config for control image used for editing.

Fields
control_type

ControlType

Type of control image.

enable_control_image_computation

bool

Whether to compute the control image for the request.

superpixel_region_size

int32

Region size of the superpixel control image.

superpixel_ruler

float

Ruler of the superpixel control image.

ControlType

Type of control image.

Enums
CONTROL_TYPE_DEFAULT Default value for control image.
CONTROL_TYPE_CANNY Canny sketch control image.
CONTROL_TYPE_SCRIBBLE Scribble sketch control image using HED model.
CONTROL_TYPE_FACE_MESH Face mesh control image, used for face style editing.
CONTROL_TYPE_COLOR_SUPERPIXEL Color superpixel control image.

Image

Fields
mime_type

string

The MIME type of the image. Supported MIME types: - image/jpeg - image/png

Union field data. The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following:
bytes_base64_encoded

string

Base64 encoded bytes string representing the image.

gcs_uri

string

The Cloud Storage URI of the image.
Mask

Fields

Union field data.

data can be only one of the following:

image

Image

polygon_list

BoundingPolyList

BoundingPolyList

Fields
polygons[]

BoundingPoly

MaskImageConfig

Config for masked image editing using the Imagen 3 capability.

Fields
mask_mode

MaskMode

Mode used to generate the mask if mask is not provided.

dilation

float

Dilation to be used with this Mask. This value is used to dilate the mask before applying the edit mode.

mask_classes[]

int32

The segmentation classes which are used in the MASK_MODE_SEMANTIC mode.

MaskMode

Mode used to generate the mask if mask is not provided.

Enums
MASK_MODE_DEFAULT Default value for mask mode.
MASK_MODE_USER_PROVIDED User provided mask. No generation needed.
MASK_MODE_BACKGROUND Background mask. All elements detected as background will be masked.
MASK_MODE_FOREGROUND Foreground mask. All elements detected as foreground will be masked.
MASK_MODE_SEMANTIC Semantic mask. Objects identified as one of the classes defined in mask_classes will be masked.

ReferenceImage

A ReferenceImage is an image that is used to provide additional context for the image generation or editing.

Fields
reference_image

Image

The actual image data of the reference image.

reference_id

int32

The id of the reference image. This must be unique within the request.

reference_type

ReferenceType

The type of the reference image.

Union field reference_config. A config describing the reference image. reference_config can be only one of the following:
mask_image_config

MaskImageConfig

A config for a mask image.

control_image_config

ControlImageConfig

A config for a control image.

style_image_config

StyleImageConfig

A config for a style image.

subject_image_config

SubjectImageConfig

A config for a subject image.

ReferenceType

The type of the reference image.

Enums
REFERENCE_TYPE_DEFAULT Default value for reference image.
REFERENCE_TYPE_RAW A normal RGB image.
REFERENCE_TYPE_MASK A mask image.
REFERENCE_TYPE_CONTROL A control (line sketch) image.
REFERENCE_TYPE_STYLE A style image.
REFERENCE_TYPE_SUBJECT A subject image.
REFERENCE_TYPE_CONTENT A content image for R2I.
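
Since reference_config is a union, a ReferenceImage carries at most one config variant, and the variant naturally corresponds to the reference_type. A hypothetical sketch (the type-to-config pairing is this sketch's assumption, inferred from the field descriptions rather than stated as a hard requirement):

```python
# Hypothetical builder pairing reference_type with its reference_config variant.

CONFIG_KEY_FOR_TYPE = {
    "REFERENCE_TYPE_MASK": "mask_image_config",
    "REFERENCE_TYPE_CONTROL": "control_image_config",
    "REFERENCE_TYPE_STYLE": "style_image_config",
    "REFERENCE_TYPE_SUBJECT": "subject_image_config",
}

def make_reference_image(reference_id, reference_type, image, config=None):
    instance = {"reference_id": reference_id,      # unique within the request
                "reference_type": reference_type,
                "reference_image": image}
    if config is not None:
        key = CONFIG_KEY_FOR_TYPE.get(reference_type)
        if key is None:
            raise ValueError(f"{reference_type} takes no reference_config")
        instance[key] = config  # reference_config union: exactly one variant
    return instance

ref = make_reference_image(
    1, "REFERENCE_TYPE_STYLE",
    {"gcs_uri": "gs://my-bucket/style.png", "mime_type": "image/png"},
    config={"style_description": "hand-drawn origami"},
)
```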

StyleImageConfig

Config for style image used for editing.

Fields
style_description

string

Description of the style image.

SubjectImageConfig

Config for subject image used for editing.

Fields
subject_description

string

Description of the subject image.

subject_type

SubjectType

Type of subject image.

SubjectType

Type of subject image.

Enums
SUBJECT_TYPE_DEFAULT Default value for subject image.
SUBJECT_TYPE_PERSON The subject of the image is a person.
SUBJECT_TYPE_ANIMAL The subject of the image is an animal.
SUBJECT_TYPE_PRODUCT The subject of the image is a product/object.

VisionReasoningModelInstance

Vision reasoning input format for the large vision model. The model supports only one instance at a time.

Fields
prompt

string

The text prompt for guiding the response in QA.

mask

Image

Text responses will be generated from the masked area if mask is provided.

Union field content.

content can be only one of the following:

image

Image

The image bytes or Cloud Storage URI to make the prediction on.

video

Video

The video bytes or Cloud storage URI to make the prediction on.
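
Since content is a union, an instance carries exactly one of image or video, while mask optionally narrows the response to a region. A hypothetical sketch:

```python
# Hypothetical builder for a VisionReasoningModelInstance: the content union
# holds exactly one of image or video; mask optionally restricts the answer
# to the masked area.

def make_vision_reasoning_instance(prompt, image=None, video=None, mask=None):
    if (image is None) == (video is None):
        raise ValueError("exactly one of image or video must be set")
    instance = {"prompt": prompt}
    if image is not None:
        instance["image"] = image
    else:
        instance["video"] = video
    if mask is not None:
        instance["mask"] = mask
    return instance

inst = make_vision_reasoning_instance(
    "What is in the highlighted area?",
    image={"gcs_uri": "gs://my-bucket/scene.jpg", "mime_type": "image/jpeg"},
    mask={"gcs_uri": "gs://my-bucket/region.png", "mime_type": "image/png"},
)
```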

Image

Fields
mime_type

string

Optional. The MIME type of the image. Supported MIME types: - image/jpeg - image/png

Union field data. The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following:
bytes_base64_encoded

string

Base64 encoded bytes string representing the image.

gcs_uri

string

Cloud Storage URI representing the image in the user's project.

Video

Fields
Union field data. The video bytes or Cloud Storage URI to make the prediction on. data can be only one of the following:
bytes_base64_encoded

string

Base64 encoded bytes string representing the video.

gcs_uri

string