Index
- NamedBoundingBox (message)
- SafetyAttributes (message)
- SafetyAttributes.DetectedLabels (message)
- SafetyAttributes.DetectedLabels.BoundingBox (message)
- SafetyAttributes.DetectedLabels.Entity (message)
- SemanticFilterResponse (message)
- TextEmbedding (message)
- TextEmbedding.Statistics (message)
- TextEmbeddingPredictionResult (message)
- VideoGenerationModelResult (message)
- VirtualTryOnModelResultProto (message)
- VirtualTryOnModelResultProto.Image (message)
- VisionEmbeddingModelResult (message)
- VisionEmbeddingModelResult.VideoEmbedding (message)
- VisionGenerativeModelResult (message)
- VisionGenerativeModelResult.Image (message)
- VisionReasoningModelResult (message)
NamedBoundingBox
NamedBoundingBox tracks an annotated bounding box.
| Fields | |
|---|---|
| classes[] | Annotated classes. |
| entities[] | Annotated entities. |
| scores[] | Annotated scores, normalized to the range [0, 1]. |
| x1 | The unnormalized X coordinate of the top-left corner. |
| x2 | The unnormalized X coordinate of the bottom-right corner. |
| y1 | The unnormalized Y coordinate of the top-left corner. |
| y2 | The unnormalized Y coordinate of the bottom-right corner. |
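As a sketch of how these fields fit together, the snippet below reads a NamedBoundingBox decoded from a JSON prediction response. It assumes the classes[], entities[], and scores[] lists are parallel, with index i of each describing the same annotation, and that JSON keys match the field names above (a REST response may use camelCase instead); the values are invented for illustration.

```python
# Hypothetical NamedBoundingBox decoded from a JSON prediction response.
named_box = {
    "classes": ["person"],
    "entities": ["/m/01g317"],
    "scores": [0.92],
    "x1": 34.0, "y1": 51.0,    # top-left corner, unnormalized
    "x2": 412.0, "y2": 630.0,  # bottom-right corner, unnormalized
}

# Assumption: the three lists are parallel, so index i of each
# describes the same annotation.
for cls, entity, score in zip(named_box["classes"],
                              named_box["entities"],
                              named_box["scores"]):
    print(f"{cls} ({entity}): score={score:.2f}")

print(f"box size: {named_box['x2'] - named_box['x1']:.0f} x "
      f"{named_box['y2'] - named_box['y1']:.0f}")
```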
SafetyAttributes
| Fields | |
|---|---|
| categories[] | List of RAI categories. |
| scores[] | List of RAI scores. |
| detected_labels[] | List of detected labels. |
DetectedLabels
Filters that return labels with confidence scores.
| Fields | |
|---|---|
| entities[] | The list of detected entities for the RAI signal. |
| rai_category | The RAI category for the detected labels. |
BoundingBox
An integer bounding box in the original image's pixel coordinates for the detected labels.
| Fields | |
|---|---|
| x1 | The X coordinate of the top-left corner, in pixels. |
| y1 | The Y coordinate of the top-left corner, in pixels. |
| x2 | The X coordinate of the bottom-right corner, in pixels. |
| y2 | The Y coordinate of the bottom-right corner, in pixels. |
Entity
The properties of a detected entity from the RAI signal.
| Fields | |
|---|---|
| mid | MID of the label. |
| description | Description of the label. |
| score | Confidence score of the label. |
| bounding_box | Bounding box of the label. |
| iou_score | The intersection ratio between the detection bounding box and the mask. |
SemanticFilterResponse
SemanticFilterResponse tracks the semantic filtering results when the user turns on semantic filtering in the LVM image editing editConfig.
| Fields | |
|---|---|
| named_bounding_boxes[] | If the semantic filter is not passed, this list of named bounding boxes is populated to report the detected objects that failed semantic filtering. |
| passed_semantic_filter | Whether the semantic filter is passed. |
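A minimal sketch of consuming a SemanticFilterResponse, assuming it has been decoded from JSON with keys matching the field names above (a REST response may use camelCase); the payload is invented for illustration.

```python
def report_semantic_filter(response: dict) -> None:
    """Report objects that failed the semantic filter, if any."""
    if response.get("passed_semantic_filter", False):
        print("Semantic filter passed.")
        return
    for box in response.get("named_bounding_boxes", []):
        # Each entry is a NamedBoundingBox describing a failing object.
        labels = ", ".join(box.get("classes", []))
        print(f"failed: {labels} at ({box['x1']}, {box['y1']})-"
              f"({box['x2']}, {box['y2']})")

# Invented example payload for illustration.
report_semantic_filter({
    "passed_semantic_filter": False,
    "named_bounding_boxes": [
        {"classes": ["face"], "scores": [0.88],
         "x1": 10, "y1": 20, "x2": 60, "y2": 90},
    ],
})
```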
TextEmbedding
| Fields | |
|---|---|
| values[] | The embedding values. |
| statistics | The statistics computed from the input text. |
Statistics
| Fields | |
|---|---|
| token_count | Number of tokens in the input text. |
| truncated | Indicates whether the input text was longer than the maximum allowed number of tokens and was truncated. |
TextEmbeddingPredictionResult
Prediction output format for Text Embedding.
| Fields | |
|---|---|
| embeddings | The embeddings generated from the input text. |
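As an illustrative sketch, the snippet below unpacks a TextEmbeddingPredictionResult decoded from JSON, using the field names documented above (an actual REST response may use camelCase); the numbers are placeholders.

```python
# Hypothetical payload shaped like TextEmbeddingPredictionResult.
prediction = {
    "embeddings": {
        "values": [0.011, -0.204, 0.095],  # real vectors are much longer
        "statistics": {"token_count": 6, "truncated": False},
    }
}

embedding = prediction["embeddings"]
stats = embedding["statistics"]
if stats["truncated"]:
    print("warning: input exceeded the token limit and was truncated")
print(f"{stats['token_count']} tokens -> "
      f"{len(embedding['values'])}-dimensional embedding")
```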
VideoGenerationModelResult
Prediction result from a video generation model. When you request a prediction from a video generation model, the model generates videos based on your input and returns URIs to these videos in Google Cloud Storage.
| Fields | |
|---|---|
| gcs_uris[] | A list of Google Cloud Storage URIs for generated videos. For each input instance in your prediction request, the model may generate one or more videos. This field provides the Google Cloud Storage URIs for each of these videos. |
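A short sketch of collecting the generated-video URIs from such a result; the bucket and object names are hypothetical placeholders.

```python
# Hypothetical VideoGenerationModelResult decoded from JSON.
result = {
    "gcs_uris": [
        "gs://example-bucket/videos/sample_0.mp4",  # placeholder URIs
        "gs://example-bucket/videos/sample_1.mp4",
    ]
}

# Each URI points at a video object in Cloud Storage; download it with
# the Cloud Storage client library or the gcloud storage CLI.
for uri in result["gcs_uris"]:
    print("generated video:", uri)
```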
VirtualTryOnModelResultProto
Prediction format for the Virtual Try On model.
| Fields | |
|---|---|
| images[] | List of image bytes or Cloud Storage URIs of the generated images. |
Image
The generated image and metadata.
| Fields | |
|---|---|
| mime_type | The MIME type of the content of the image. Only the following MIME types are supported: image/jpeg, image/png. |
| Union field data. The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following: | |
| bytes_base64_encoded | Base64-encoded bytes string representing the image. |
| gcs_uri | The Cloud Storage URI of the image. |
| rai_filtered_reason | The reason the generated images were filtered. |
VisionEmbeddingModelResult
The prediction result for a large vision model embedding request. An embedding is a vectorized representation of data such as an image, text, or video. The embeddings produced by this model can be used for tasks such as image retrieval, similarity comparison, and classification. The embedding vectors have 1024 dimensions.
| Fields | |
|---|---|
| image_embedding | The embedding generated from the input image. This field is populated if the prediction request contained an image. |
| text_embedding | The embedding generated from the input text. This field is populated if the prediction request contained text. |
| video_embeddings[] | The embeddings generated from the input video. This field is populated if the prediction request contained a video. The video is divided into 1-second segments, and an embedding is generated for each segment. |
VideoEmbedding
Contains embedding data for a specific time segment of a video.
| Fields | |
|---|---|
| start_offset_sec | The start time of the video segment that this embedding represents, measured in seconds from the beginning of the video. |
| end_offset_sec | The end time of the video segment that this embedding represents, measured in seconds from the beginning of the video. |
| embedding | The 1024-dimension embedding vector for this video segment. |
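A common use of these embeddings is ranking video segments against a text query by cosine similarity. The sketch below assumes a result dict shaped like VisionEmbeddingModelResult, with keys matching the documented field names, and that the text and video embeddings live in a shared space suitable for cross-modal comparison; the vectors are shortened placeholders (real ones have 1024 dimensions).

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical result with 3-dimension placeholder vectors.
result = {
    "text_embedding": [0.12, 0.30, 0.21],
    "video_embeddings": [
        {"start_offset_sec": 0, "end_offset_sec": 1,
         "embedding": [0.10, 0.28, 0.25]},
        {"start_offset_sec": 1, "end_offset_sec": 2,
         "embedding": [0.40, 0.05, -0.10]},
    ],
}

# Rank the 1-second video segments by similarity to the text query.
query = result["text_embedding"]
for seg in sorted(result["video_embeddings"],
                  key=lambda s: cosine_similarity(query, s["embedding"]),
                  reverse=True):
    sim = cosine_similarity(query, seg["embedding"])
    print(f"[{seg['start_offset_sec']}, {seg['end_offset_sec']})s: {sim:.3f}")
```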
VisionGenerativeModelResult
| Fields | |
|---|---|
| images[] | List of image bytes or Cloud Storage URIs of the generated images. |
Image
| Fields | |
|---|---|
| mime_type | The MIME type of the content of the image. Only the following MIME types are supported: image/jpeg, image/gif, image/png, image/webp, image/bmp, image/tiff, image/vnd.microsoft.icon. |
| prompt | The rewritten prompt used for the image generation. |
| Union field data. The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following: | |
| bytes_base64_encoded | Base64-encoded bytes string representing the image. |
| gcs_uri | The Cloud Storage URI of the image. |
| rai_filtered_reason | The reason the generated images were filtered. |
| content_type | The content type of the input object. |
| semantic_filter_response | Semantic filter results, reported when the semantic filter is turned on in editConfig and used for image inpainting. |
| safety_attributes | Safety attribute scores of the content. |
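As a sketch, the snippet below saves each generated image, handling both members of the data union as well as RAI-filtered entries. Key names follow the fields above (a REST response may use camelCase), and the payload values are placeholders, not real image data.

```python
import base64

# Hypothetical VisionGenerativeModelResult decoded from JSON.
result = {
    "images": [
        {"mime_type": "image/png",
         "bytes_base64_encoded": "cGxhY2Vob2xkZXI="},  # placeholder bytes
        {"mime_type": "image/png",
         "gcs_uri": "gs://example-bucket/out/img_1.png"},
        {"rai_filtered_reason": "placeholder filter reason"},
    ]
}

for i, image in enumerate(result["images"]):
    if "rai_filtered_reason" in image:
        print(f"image {i} filtered: {image['rai_filtered_reason']}")
    elif "bytes_base64_encoded" in image:
        # Inline image bytes: decode and write to disk.
        ext = image["mime_type"].split("/")[-1]
        with open(f"image_{i}.{ext}", "wb") as f:
            f.write(base64.b64decode(image["bytes_base64_encoded"]))
    elif "gcs_uri" in image:
        # Image stored in Cloud Storage: fetch it separately.
        print(f"image {i} stored at {image['gcs_uri']}")
```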
VisionReasoningModelResult
The response format for LVM image and video captioning is as follows:

1. Image captioning: From the LVM image2text (PaLi) model, the responses are descriptions of the same image.
2. Video captioning: From the LVM video2text (Penguin) model, the responses are different segments within the same video. The response also contains the start and end offsets of the video segment. Video captioning response format: "[start_offset, end_offset) - text_response".
| Fields | |
|---|---|
| text_responses[] | List of text responses in the given text language. |
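The snippet below is a best-effort sketch of parsing video-captioning responses in the "[start_offset, end_offset) - text_response" format described above; the captions are invented.

```python
import re

# Matches "[start_offset, end_offset) - text_response".
SEGMENT_RE = re.compile(r"^\[([\d.]+), ([\d.]+)\) - (.*)$")

text_responses = [
    "[0, 5) - A dog runs across a field.",     # hypothetical captions
    "[5, 12) - The dog catches a frisbee.",
]

for response in text_responses:
    match = SEGMENT_RE.match(response)
    if match:
        start, end, caption = match.groups()
        print(f"{float(start):5.1f}s-{float(end):5.1f}s: {caption}")
    else:
        # Image-captioning responses carry no offsets; keep them as-is.
        print(response)
```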