Package google.cloud.aiplatform.v1beta1.schema.predict.instance

Index

TextEmbeddingPredictionInstance

Prediction input format for Text Embedding.

Fields
content

string

The main text content to embed.

title

string

Optional identifier of the text content.

task_type

TaskType

Optional downstream task the embeddings will be used for.

TaskType

Represents a downstream task the embeddings will be used for.

Enums
DEFAULT Unset value, which will default to one of the other enum values.
RETRIEVAL_QUERY Specifies the given text is a query in a search/retrieval setting.
RETRIEVAL_DOCUMENT Specifies the given text is a document from the corpus being searched.
SEMANTIC_SIMILARITY Specifies the given text will be used for semantic textual similarity (STS).
CLASSIFICATION Specifies that the given text will be classified.
CLUSTERING Specifies that the embeddings will be used for clustering.
QUESTION_ANSWERING Specifies that the embeddings will be used for question answering.
FACT_VERIFICATION Specifies that the embeddings will be used for fact verification.
CODE_RETRIEVAL_QUERY Specifies that the embeddings will be used for code retrieval.
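
For illustration, the fields above can be assembled into a JSON-style instance dict for a predict request. The helper below is a hypothetical sketch (the function name and sample values are not part of the API); only the field names and TaskType values come from the schema:

```python
# Hypothetical helper: build a TextEmbeddingPredictionInstance payload as a dict.
# Field names (content, title, task_type) follow the schema above.

VALID_TASK_TYPES = {
    "DEFAULT", "RETRIEVAL_QUERY", "RETRIEVAL_DOCUMENT", "SEMANTIC_SIMILARITY",
    "CLASSIFICATION", "CLUSTERING", "QUESTION_ANSWERING", "FACT_VERIFICATION",
    "CODE_RETRIEVAL_QUERY",
}

def make_text_embedding_instance(content, task_type="DEFAULT", title=None):
    if task_type not in VALID_TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type}")
    instance = {"content": content, "task_type": task_type}
    if title is not None:
        instance["title"] = title  # optional identifier of the text content
    return instance

instance = make_text_embedding_instance(
    "Embeddings map text to vectors.",
    task_type="RETRIEVAL_DOCUMENT",
    title="intro",
)
```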

VideoGenerationModelInstance

An instance for a video generation prediction request.

Fields
prompt

string

A text description of the video you want to generate. The prompt should specify the subject, style, and any specific elements or actions that should appear in the video.

image

Image

An optional image to use as a starting point for video generation, used as the first frame of the generated video. If image is provided, video and reference_images cannot be used.

video

Video

An optional input video to use as a starting point for video generation. If video is provided, image and reference_images cannot be used. If mask is also provided, the input video will be edited based on the mask. If mask is not provided, the input video will be extended in duration.

last_frame

Image

Image to use as the last frame of the generated video. This field can only be used if image is also provided.

camera_control

string

The camera motion to apply to the generated video. This field can only be used if image is also provided. Supported values: - fixed: No camera motion. - pan_left: Pan camera to the left. - pan_right: Pan camera to the right. - tilt_up: Tilt camera up. - tilt_down: Tilt camera down. - truck_left: Move camera to the left. - truck_right: Move camera to the right. - pedestal_up: Move camera up. - pedestal_down: Move camera down. - push_in: Move camera closer to the subject. - pull_out: Move camera away from the subject.

mask

Mask

An optional mask to apply to the input video for video editing tasks like inserting or removing objects, or outpainting.

reference_images[]

ReferenceImage

Optional reference images to guide video generation. If reference_images are provided, prompt must also be provided, and image, video, and last_frame cannot be used. You can provide up to 3 asset images or 1 style image.
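
The mutual-exclusivity rules above (image vs. video vs. reference_images, and last_frame/camera_control requiring image) can be captured in a small validator. This is a hypothetical sketch, not an official client; field names mirror the schema's snake_case, and the GCS paths are placeholders:

```python
# Hypothetical validator for a VideoGenerationModelInstance dict.

def make_video_generation_instance(prompt, image=None, video=None,
                                   reference_images=None, last_frame=None,
                                   camera_control=None, mask=None):
    # image, video, and reference_images are mutually exclusive per the schema.
    if sum(x is not None for x in (image, video, reference_images)) > 1:
        raise ValueError("set at most one of image, video, reference_images")
    # last_frame and camera_control can only be used together with image.
    if (last_frame is not None or camera_control is not None) and image is None:
        raise ValueError("last_frame and camera_control require image")
    instance = {"prompt": prompt}
    for key, value in (("image", image), ("video", video),
                       ("reference_images", reference_images),
                       ("last_frame", last_frame),
                       ("camera_control", camera_control), ("mask", mask)):
        if value is not None:
            instance[key] = value
    return instance

ok = make_video_generation_instance(
    "A paper boat drifting down a rainy street",
    image={"gcs_uri": "gs://my-bucket/first-frame.png", "mime_type": "image/png"},
    camera_control="push_in",
)
```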

Image

Defines the input image format.

Fields
mime_type

string

The MIME type of the image. Supported MIME types: - image/jpeg - image/png

Union field data. The image data. The image can be provided as either base64 encoded bytes or a Google Cloud Storage URI. data can be only one of the following:
bytes_base64_encoded

string

The image bytes encoded in base64.

gcs_uri

string

A Google Cloud Storage URI pointing to the image file.

Mask

Defines the input mask format for video editing. A mask specifies regions of a video or image to modify or preserve.

Fields
mime_type

string

The MIME type of the mask. Supported MIME types: - image/png - image/jpeg - image/webp - video/mov - video/mpeg - video/mp4 - video/mpg - video/avi - video/wmv - video/mpegps - video/flv

mask_mode

string

Specifies how the mask is applied to the input video for editing. For insert, remove, and remove_static modes, the mask must have the same aspect ratio as the input video. For outpaint mode, the mask can be 9:16 or 16:9. Supported values: - insert: The mask defines a rectangular region in the first frame of the input video. The object described in the prompt is inserted into this region and appears in subsequent frames. - remove: The mask identifies an object in the first frame to track and remove from the video. - remove_static: The mask identifies a static region in the video from which to remove objects. - outpaint: The mask defines a region where the input video is placed. The area outside this region is generated based on the prompt. Video masks are not supported for outpainting.

Union field data. The mask data. The mask can be provided as either base64 encoded bytes or a Google Cloud Storage URI. data can be only one of the following:
bytes_base64_encoded

string

The mask bytes encoded in base64.

gcs_uri

string

A Google Cloud Storage URI pointing to the mask file.
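
The mask_mode rules described above can be sketched as a check. This is a hypothetical illustration: the aspect ratios are passed in by the caller as strings, and nothing here inspects real media.

```python
# Hypothetical check of the mask_mode constraints for video editing masks.

INSERT_LIKE = {"insert", "remove", "remove_static"}

def validate_mask_mode(mask_mode, mask_aspect, video_aspect):
    if mask_mode in INSERT_LIKE:
        # These modes require the mask to match the input video's aspect ratio.
        if mask_aspect != video_aspect:
            raise ValueError(f"{mask_mode} requires matching aspect ratios")
    elif mask_mode == "outpaint":
        # Outpainting masks must be 9:16 or 16:9.
        if mask_aspect not in {"9:16", "16:9"}:
            raise ValueError("outpaint mask must be 9:16 or 16:9")
    else:
        raise ValueError(f"unknown mask_mode: {mask_mode}")

validate_mask_mode("insert", "16:9", "16:9")    # matching ratios: ok
validate_mask_mode("outpaint", "9:16", "16:9")  # outpaint allows 9:16 or 16:9
```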

ReferenceImage

Defines the input reference image format. A reference image provides additional context to guide video generation, such as style or assets.

Fields
image

Image

The image data for the reference image.

reference_type

string

The type of reference image, which defines how it influences video generation. Supported values: - asset: The reference image provides assets to the generated video, such as the scene, an object, or a character. - style: The aesthetics of the reference image (e.g., colors, lighting, texture) are used to define the style of the generated video, such as 'anime', 'photography', or 'origami'.

Video

Defines the input video format.

Fields
mime_type

string

The MIME type of the video. Supported MIME types: - video/mov - video/mpeg - video/mp4 - video/mpg - video/avi - video/wmv - video/mpegps - video/flv

Union field data. The video data. The video can be provided as either base64 encoded bytes or a Google Cloud Storage URI. data can be only one of the following:
gcs_uri

string

A Google Cloud Storage URI pointing to the video file.

bytes_base64_encoded

string

The video bytes encoded in base64.

VirtualTryOnModelInstance

Media generation input format for the Virtual Try On model.

Fields
prompt

string

The text prompt for generating the images. This is required for both editing and generation.

product_images[]

ProductImage

The image of the products to wear on the person.

person_image

PersonImage

The image of the person to be edited with the product images.
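
Putting the three fields together, a VirtualTryOnModelInstance might be built as below. A hypothetical sketch; the helper name and GCS paths are placeholders, while the field names follow the schema:

```python
# Hypothetical builder for a VirtualTryOnModelInstance payload.

def make_virtual_try_on_instance(prompt, person_image, product_images):
    if not product_images:
        raise ValueError("at least one product image is required")
    return {
        "prompt": prompt,
        "person_image": {"image": person_image},
        "product_images": [{"image": img} for img in product_images],
    }

instance = make_virtual_try_on_instance(
    "Person wearing the jacket",
    person_image={"gcs_uri": "gs://my-bucket/person.png", "mime_type": "image/png"},
    product_images=[
        {"gcs_uri": "gs://my-bucket/jacket.png", "mime_type": "image/png"},
    ],
)
```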

Image

Input image and metadata.

Fields
mime_type

string

The MIME type of the image. Supported MIME types: - image/jpeg - image/png

Union field data. The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following:
bytes_base64_encoded

string

Base64 encoded bytes string representing the image.

gcs_uri

string

The Cloud Storage URI of the image.

PersonImage

A PersonImage is used to provide the person image and its associated configuration options for Virtual Try On.

Fields
image

Image

The image bytes or Cloud Storage URI of the person or subject that will be edited using the product images.

ProductImage

A ProductImage is used to provide the product image and its associated configuration options for Virtual Try On.

Fields
image

Image

The actual image data of the reference image.

mask_image

Image

The mask image associated with this product. If provided, the mask image will be used to guide the image editing.

product_image_config

ProductImageConfig

A config for the product image.

ProductImageConfig

Config for the product image.

Fields
mask_mode

MaskMode

Mode used to control the segmentation logic.

dilation

float

Dilation to be used with this Mask.

product_description

string

Description of the product.

MaskMode

Mode used to generate the mask if mask is not provided.

Enums
MASK_MODE_DEFAULT Default value for mask mode.
MASK_MODE_USER_PROVIDED User provided mask. No segmentation needed.
MASK_MODE_DETECTION_BOX Mask from detected bounding boxes.
MASK_MODE_CLOTHING_AREA Masks from segmenting the clothing area with open-vocab segmentation.
MASK_MODE_PARSED_PERSON Masks from segmenting the person body and clothing using the person-parsing model.

VisionEmbeddingModelInstance

Input format for requesting embeddings from vision models. An embedding is a list of numbers that represents the semantic meaning of text, an image, or a video. Embeddings can be used for many applications, like searching for similar images or getting recommendations. Each instance must specify exactly one of text, image, or video field.

Fields
image

Image

An image to generate embeddings for.

text

string

Text to generate embeddings for.

video

Video

A video to generate embeddings for.
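
The "exactly one of text, image, or video" rule can be enforced when building the instance. A hypothetical sketch:

```python
# Hypothetical builder for a VisionEmbeddingModelInstance: exactly one of
# text, image, or video must be set.

def make_vision_embedding_instance(text=None, image=None, video=None):
    provided = {k: v for k, v in (("text", text), ("image", image),
                                  ("video", video)) if v is not None}
    if len(provided) != 1:
        raise ValueError("exactly one of text, image, video must be set")
    return provided

inst = make_vision_embedding_instance(text="a photo of a red bicycle")
```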

Image

Represents an image input for embedding generation.

Fields
mime_type

string

The MIME type of the image.

The supported MIME types are:

  • image/jpeg
  • image/png

Union field data.

data can be only one of the following:

bytes_base64_encoded

string

Base64-encoded bytes of the image.

gcs_uri

string

A Cloud Storage URI pointing to the image file. Format: gs://bucket/object

Video

Represents a video input for embedding generation.

Fields
video_segment_config

VideoSegmentConfig

Configuration for processing a video segment. If specified, embeddings are generated for the segment. If not specified, embeddings are generated for the entire video.

Union field data.

data can be only one of the following:

bytes_base64_encoded

string

Base64-encoded bytes of the video.

gcs_uri

string

A Cloud Storage URI pointing to the video file. Format: gs://bucket/object

VideoSegmentConfig

Configuration for processing a segment of a video.

Fields
start_offset_sec

int32

The start offset of the video segment in seconds.

end_offset_sec

int32

The end offset of the video segment in seconds.

interval_sec

int32

The interval of the video for which the embedding will be generated. The minimum value for interval_sec is 4. If the interval is less than 4, an InvalidArgumentError is returned. There are no limitations on the maximum value of the interval. However, if the interval is larger than min(video_length, 120s), it might affect the quality of the generated embeddings.
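
The interval_sec constraints above translate into a simple client-side check. A hypothetical sketch: the hard minimum of 4 raises an error, while the quality caveat is surfaced as a warning (the schema's min(video_length, 120s) is approximated here with the segment length, since the builder does not know the full video length):

```python
import warnings

# Hypothetical validation of a VideoSegmentConfig per the constraints above.

def make_video_segment_config(start_offset_sec, end_offset_sec, interval_sec):
    if interval_sec < 4:
        raise ValueError("interval_sec must be >= 4")
    segment_len = end_offset_sec - start_offset_sec
    if interval_sec > min(segment_len, 120):
        # Not an error, but the schema notes embedding quality may degrade.
        warnings.warn("interval_sec exceeds min(segment length, 120s); "
                      "embedding quality may suffer")
    return {"start_offset_sec": start_offset_sec,
            "end_offset_sec": end_offset_sec,
            "interval_sec": interval_sec}

cfg = make_video_segment_config(0, 60, 10)
```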

VisionGenerativeModelInstance

Media generation input format for the large vision model.

Fields
image

Image

The image bytes or Cloud Storage URI to make the prediction on. It is required for editing. Not needed for generation. This field will be used to determine whether the call is editing or generation.

prompt

string

The text prompt for generating the images. This is required for both editing and generation.

mask

Mask

An optional mask for editing the image. The masked region is edited based on the provided text prompt. The mask can be either an image or a polygon list, and must not be provided without image.

reference_images[]

ReferenceImage

The reference images to be used for editing and customization capabilities. Imagen 3 Capability adds support for multiple reference images, each of which can be a mask, control, style, or subject image. Depending on the reference type, the reference_config field will be populated with the corresponding config.
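
As the image field description notes, the presence of image distinguishes an editing call from a generation call, and mask must not appear without it. A hypothetical sketch of that logic:

```python
# Hypothetical builder for a VisionGenerativeModelInstance. The presence of
# `image` flips the call from generation to editing; `mask` requires `image`.

def make_vision_generative_instance(prompt, image=None, mask=None,
                                    reference_images=None):
    if mask is not None and image is None:
        raise ValueError("mask must not be provided without image")
    instance = {"prompt": prompt}
    if image is not None:
        instance["image"] = image  # presence of image => editing call
    if mask is not None:
        instance["mask"] = mask
    if reference_images is not None:
        instance["reference_images"] = reference_images
    return instance

generation = make_vision_generative_instance("a watercolor fox")
editing = make_vision_generative_instance(
    "replace the sky with a sunset",
    image={"gcs_uri": "gs://my-bucket/photo.png", "mime_type": "image/png"},
)
```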

ControlImageConfig

Config for control image used for editing.

Fields
control_type

ControlType

Type of control image.

enable_control_image_computation

bool

Whether to compute the control image for the request.

superpixel_region_size

int32

Region size of the superpixel control image.

superpixel_ruler

float

Ruler of the superpixel control image.

ControlType

Type of control image.

Enums
CONTROL_TYPE_DEFAULT Default value for control image.
CONTROL_TYPE_CANNY Canny sketch control image.
CONTROL_TYPE_SCRIBBLE Scribble sketch control image using HED model.
CONTROL_TYPE_FACE_MESH Face mesh control image, used for face style editing.
CONTROL_TYPE_COLOR_SUPERPIXEL Color superpixel control image.

Image

Fields
mime_type

string

The MIME type of the image. Supported MIME types: - image/jpeg - image/png

Union field data. The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following:
bytes_base64_encoded

string

Base64 encoded bytes string representing the image.

gcs_uri

string

The Cloud Storage URI of the image.
Mask

Fields

Union field data.

data can be only one of the following:

image

Image

polygon_list

BoundingPolyList

BoundingPolyList

Fields
polygons[]

BoundingPoly

MaskImageConfig

Config for masked image editing using the Imagen 3 capability.

Fields
mask_mode

MaskMode

Mode used to generate the mask if mask is not provided.

dilation

float

Dilation to be used with this Mask. This value is used to dilate the mask before applying the edit mode.

mask_classes[]

int32

The segmentation classes which are used in the MASK_MODE_SEMANTIC mode.

MaskMode

Mode used to generate the mask if mask is not provided.

Enums
MASK_MODE_DEFAULT Default value for mask mode.
MASK_MODE_USER_PROVIDED User provided mask. No generation needed.
MASK_MODE_BACKGROUND Background mask. All elements detected as background will be masked.
MASK_MODE_FOREGROUND Foreground mask. All elements detected as foreground will be masked.
MASK_MODE_SEMANTIC Semantic mask. Objects identified as one of the classes defined in mask_classes will be masked.

ReferenceImage

A ReferenceImage is an image that is used to provide additional context for the image generation or editing.

Fields
reference_image

Image

The actual image data of the reference image.

reference_id

int32

The id of the reference image. This must be unique within the request.

reference_type

ReferenceType

The type of the reference image.

Union field reference_config. A config describing the reference image. reference_config can be only one of the following:
mask_image_config

MaskImageConfig

A config for a mask image.

control_image_config

ControlImageConfig

A config for a control image.

style_image_config

StyleImageConfig

A config for a style image.

subject_image_config

SubjectImageConfig

A config for a subject image.

ReferenceType

The type of the reference image.

Enums
REFERENCE_TYPE_DEFAULT Default value for reference image.
REFERENCE_TYPE_RAW A normal RGB image.
REFERENCE_TYPE_MASK A mask image.
REFERENCE_TYPE_CONTROL A control (line sketch) image.
REFERENCE_TYPE_STYLE A style image.
REFERENCE_TYPE_SUBJECT A subject image.
REFERENCE_TYPE_CONTENT A content image for R2I.
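
Since reference_config is a union, a ReferenceImage carries at most one config variant, and the variant naturally corresponds to the reference_type. A hypothetical sketch (the type-to-config pairing is this sketch's assumption, inferred from the field descriptions rather than stated as a hard requirement):

```python
# Hypothetical builder pairing reference_type with its reference_config variant.

CONFIG_KEY_FOR_TYPE = {
    "REFERENCE_TYPE_MASK": "mask_image_config",
    "REFERENCE_TYPE_CONTROL": "control_image_config",
    "REFERENCE_TYPE_STYLE": "style_image_config",
    "REFERENCE_TYPE_SUBJECT": "subject_image_config",
}

def make_reference_image(reference_id, reference_type, image, config=None):
    instance = {"reference_id": reference_id,      # unique within the request
                "reference_type": reference_type,
                "reference_image": image}
    if config is not None:
        key = CONFIG_KEY_FOR_TYPE.get(reference_type)
        if key is None:
            raise ValueError(f"{reference_type} takes no reference_config")
        instance[key] = config  # reference_config union: exactly one variant
    return instance

ref = make_reference_image(
    1, "REFERENCE_TYPE_STYLE",
    {"gcs_uri": "gs://my-bucket/style.png", "mime_type": "image/png"},
    config={"style_description": "hand-drawn origami"},
)
```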

StyleImageConfig

Config for style image used for editing.

Fields
style_description

string

Description of the style image.

SubjectImageConfig

Config for subject image used for editing.

Fields
subject_description

string

Description of the subject image.

subject_type

SubjectType

Type of subject image.

SubjectType

Type of subject image.

Enums
SUBJECT_TYPE_DEFAULT Default value for subject image.
SUBJECT_TYPE_PERSON The subject of the image is a person.
SUBJECT_TYPE_ANIMAL The subject of the image is an animal.
SUBJECT_TYPE_PRODUCT The subject of the image is a product/object.

VisionReasoningModelInstance

Vision reasoning input format for the large vision model. The model supports only one instance at a time.

Fields
prompt

string

The text prompt for guiding the response in QA.

mask

Image

Text responses will be generated from the masked area if mask is provided.

Union field content.

content can be only one of the following:

image

Image

The image bytes or Cloud Storage URI to make the prediction on.

video

Video

The video bytes or Cloud storage URI to make the prediction on.
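
Since content is a union, an instance carries exactly one of image or video, while mask optionally narrows the response to a region. A hypothetical sketch:

```python
# Hypothetical builder for a VisionReasoningModelInstance: the content union
# holds exactly one of image or video; mask optionally restricts the answer
# to the masked area.

def make_vision_reasoning_instance(prompt, image=None, video=None, mask=None):
    if (image is None) == (video is None):
        raise ValueError("exactly one of image or video must be set")
    instance = {"prompt": prompt}
    if image is not None:
        instance["image"] = image
    else:
        instance["video"] = video
    if mask is not None:
        instance["mask"] = mask
    return instance

inst = make_vision_reasoning_instance(
    "What is in the highlighted area?",
    image={"gcs_uri": "gs://my-bucket/scene.jpg", "mime_type": "image/jpeg"},
    mask={"gcs_uri": "gs://my-bucket/region.png", "mime_type": "image/png"},
)
```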

Image

Fields
mime_type

string

Optional. The MIME type of the image. Supported MIME types: - image/jpeg - image/png

Union field data. The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following:
bytes_base64_encoded

string

Base64 encoded bytes string representing the image.

gcs_uri

string

Cloud Storage URI representing the image in the user's project.

Video

Fields
Union field data. The video bytes or Cloud Storage URI to make the prediction on. data can be only one of the following:
bytes_base64_encoded

string

Base64 encoded bytes string representing the video.

gcs_uri

string