Index

- NamedBoundingBox (message)
- SafetyAttributes (message)
- SafetyAttributes.DetectedLabels (message)
- SafetyAttributes.DetectedLabels.BoundingBox (message)
- SafetyAttributes.DetectedLabels.Entity (message)
- SemanticFilterResponse (message)
- TextEmbedding (message)
- TextEmbedding.Statistics (message)
- TextEmbeddingPredictionResult (message)
- VideoGenerationModelResult (message)
- VirtualTryOnModelResultProto (message)
- VirtualTryOnModelResultProto.Image (message)
- VisionEmbeddingModelResult (message)
- VisionEmbeddingModelResult.VideoEmbedding (message)
- VisionGenerativeModelResult (message)
- VisionGenerativeModelResult.Image (message)
- VisionReasoningModelResult (message)
NamedBoundingBox
NamedBoundingBox tracks an annotated bounding box.

| Field | Description |
| --- | --- |
| classes[] | Annotated classes. |
| entities[] | Annotated entities. |
| scores[] | Annotated scores, normalized to the range [0, 1]. |
| x1 | The unnormalized X coordinate of the top-left corner. |
| y1 | The unnormalized Y coordinate of the top-left corner. |
| x2 | The unnormalized X coordinate of the bottom-right corner. |
| y2 | The unnormalized Y coordinate of the bottom-right corner. |
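As a sketch of how these fields fit together, the snippet below reads a NamedBoundingBox from a JSON-style prediction payload. The sample values are illustrative, not a real API response.

```python
# Illustrative NamedBoundingBox payload; the field names match the message
# above, but the values are made up for the example.
box = {
    "classes": [3],
    "entities": ["dog"],
    "scores": [0.91],
    "x1": 40.0, "y1": 60.0,    # top-left corner (unnormalized pixels)
    "x2": 220.0, "y2": 200.0,  # bottom-right corner (unnormalized pixels)
}

width = box["x2"] - box["x1"]
height = box["y2"] - box["y1"]

# Pair each annotated entity with its normalized score.
labeled = list(zip(box["entities"], box["scores"]))
print(width, height, labeled)
```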
SafetyAttributes

| Field | Description |
| --- | --- |
| categories[] | List of RAI categories. |
| scores[] | List of RAI scores. |
| detected_labels[] | List of detected labels. |
DetectedLabels
Filters that return labels with confidence scores.

| Field | Description |
| --- | --- |
| entities[] | The list of detected entities for the RAI signal. |
| rai_category | The RAI category for the detected labels. |
BoundingBox
An integer bounding box, in original pixel coordinates, for the detected labels.

| Field | Description |
| --- | --- |
| x1 | The X coordinate of the top-left corner, in pixels. |
| y1 | The Y coordinate of the top-left corner, in pixels. |
| x2 | The X coordinate of the bottom-right corner, in pixels. |
| y2 | The Y coordinate of the bottom-right corner, in pixels. |
Entity
The properties of a detected entity from the RAI signal.

| Field | Description |
| --- | --- |
| mid | MID of the label. |
| description | Description of the label. |
| score | Confidence score of the label. |
| bounding_box | Bounding box of the label. |
| iou_score | The intersection ratio between the detection bounding box and the mask. |
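The nesting of SafetyAttributes, DetectedLabels, Entity, and BoundingBox can be seen in the walk below. The payload is a hypothetical example, not a real response.

```python
# Hypothetical SafetyAttributes payload illustrating the nested
# DetectedLabels -> Entity -> BoundingBox structure described above.
safety = {
    "categories": ["Violence", "Hate"],
    "scores": [0.02, 0.01],
    "detected_labels": [
        {
            "rai_category": "Face",
            "entities": [
                {
                    "mid": "/m/0dzct",
                    "description": "Human face",
                    "score": 0.88,
                    "iou_score": 0.75,
                    "bounding_box": {"x1": 10, "y1": 20, "x2": 110, "y2": 140},
                }
            ],
        }
    ],
}

# Collect (category, description, pixel area) for every detected entity.
rows = []
for label in safety["detected_labels"]:
    for ent in label["entities"]:
        bb = ent["bounding_box"]
        area = (bb["x2"] - bb["x1"]) * (bb["y2"] - bb["y1"])
        rows.append((label["rai_category"], ent["description"], area))
print(rows)
```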
SemanticFilterResponse
SemanticFilterResponse tracks the semantic filtering results when the user turns on semantic filtering in the LVM image editing editConfig.

| Field | Description |
| --- | --- |
| named_bounding_boxes[] | If semantic filtering does not pass, a list of named bounding boxes is populated to report to users the detected objects that failed semantic filtering. |
| passed_semantic_filter | Whether semantic filtering passed. |
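A minimal sketch of handling a SemanticFilterResponse, assuming a JSON-style payload (the sample values are made up):

```python
# Illustrative SemanticFilterResponse payload.
response = {
    "passed_semantic_filter": False,
    "named_bounding_boxes": [
        {"entities": ["hand"], "scores": [0.8],
         "x1": 5.0, "y1": 5.0, "x2": 50.0, "y2": 60.0},
    ],
}

if response["passed_semantic_filter"]:
    failed = []
else:
    # Each named bounding box reports an object that failed the filter.
    failed = [box["entities"] for box in response["named_bounding_boxes"]]
print(failed)
```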
TextEmbedding

| Field | Description |
| --- | --- |
| values[] | The embedding values generated from the input text. |
| statistics | The statistics computed from the input text. |
Statistics

| Field | Description |
| --- | --- |
| token_count | Number of tokens in the input text. |
| truncated | Indicates whether the input text was longer than the maximum allowed number of tokens and was truncated. |
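The snippet below reads a TextEmbedding and checks its statistics. The payload is illustrative; real embeddings contain many more values.

```python
# Illustrative TextEmbedding payload; real embeddings have far more values.
embedding = {
    "values": [0.12, -0.07, 0.33],
    "statistics": {"token_count": 6, "truncated": False},
}

stats = embedding["statistics"]
if stats["truncated"]:
    # The input exceeded the maximum allowed token count and was cut off.
    print("warning: input text was truncated")

dims = len(embedding["values"])
print(dims, stats["token_count"])
```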
TextEmbeddingPredictionResult
Prediction output format for Text Embedding.

| Field | Description |
| --- | --- |
| embeddings | The embeddings generated from the input text. |
VideoGenerationModelResult
Prediction format for the Phenaki model.

| Field | Description |
| --- | --- |
| gcs_uris[] | List of Cloud Storage URIs of the generated videos. |
VirtualTryOnModelResultProto
Prediction format for the Virtual Try-On model.

| Field | Description |
| --- | --- |
| images[] | List of image bytes or Cloud Storage URIs of the generated images. |

Image
The generated image and metadata.

| Field | Description |
| --- | --- |
| mime_type | The MIME type of the image content. Only the following MIME types are supported: image/jpeg, image/png. |
| Union field data. The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following: | |
| bytes_base64_encoded | Base64-encoded byte string representing the image. |
| gcs_uri | The Cloud Storage URI of the image. |
| rai_filtered_reason | The reason the generated images were filtered. |
VisionEmbeddingModelResult
Prediction format for the large vision model embedding API.

| Field | Description |
| --- | --- |
| image_embedding | The 1024-dimension image embedding result from the provided image. |
| text_embedding | The 1024-dimension text embedding result from the provided text. |
| video_embeddings[] | Video embeddings. |

VideoEmbedding
The video embedding message.

| Field | Description |
| --- | --- |
| start_offset_sec | The start offset of the video, in seconds. |
| end_offset_sec | The end offset of the video, in seconds. |
| embedding | The 1024-dimension video embedding result from the provided video. |
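Because the image and text embeddings share the same 1024-dimension vector space, a common use is scoring image-text relevance by cosine similarity. The sketch below uses short demo vectors in place of real 1024-dimension embeddings.

```python
import math

# Cosine similarity between two embedding vectors of equal dimension.
# Real embeddings from this API are 1024-dimensional; these short
# vectors are just a demo.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

image_embedding = [1.0, 0.0, 0.0]
text_embedding = [1.0, 0.0, 0.0]
score = cosine_similarity(image_embedding, text_embedding)
print(score)  # identical vectors -> 1.0
```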
VisionGenerativeModelResult

| Field | Description |
| --- | --- |
| images[] | List of image bytes or Cloud Storage URIs of the generated images. |

Image

| Field | Description |
| --- | --- |
| mime_type | The MIME type of the image content. Only the following MIME types are supported: image/jpeg, image/gif, image/png, image/webp, image/bmp, image/tiff, image/vnd.microsoft.icon. |
| prompt | The rewritten prompt used for the image generation. |
| Union field data. The image bytes or Cloud Storage URI to make the prediction on. data can be only one of the following: | |
| bytes_base64_encoded | Base64-encoded byte string representing the image. |
| gcs_uri | The Cloud Storage URI of the image. |
| rai_filtered_reason | The reason the generated images were filtered. |
| content_type | Input object content type. |
| semantic_filter_response | Semantic filter results. Reported to users when the semantic filter is turned on in editConfig and used for image inpainting. |
| safety_attributes | Safety attribute scores of the content. |
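Since the data union field carries exactly one of bytes_base64_encoded or gcs_uri, client code typically branches on which member is present. A minimal sketch, with a made-up payload:

```python
import base64

# Illustrative generated-Image payload; exactly one branch of the `data`
# union (bytes_base64_encoded or gcs_uri) is set at a time.
image = {
    "mime_type": "image/png",
    "bytes_base64_encoded": base64.b64encode(b"\x89PNG...").decode("ascii"),
}

if "bytes_base64_encoded" in image:
    raw = base64.b64decode(image["bytes_base64_encoded"])
    source = "inline"
elif "gcs_uri" in image:
    raw = None  # would be fetched from Cloud Storage
    source = image["gcs_uri"]
else:
    # Neither branch of the union is set; the image may have been filtered.
    raw = None
    source = image.get("rai_filtered_reason", "unknown")
print(source)
```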
VisionReasoningModelResult
The response format for LVM image and video captioning is as follows:

1. Image captioning: from the LVM image2text (PaLI) model, the responses are descriptions of the same image.
2. Video captioning: from the LVM video2text (Penguin) model, the responses are different segments within the same video. The response also contains the start and end offsets of the video segment, in the format "[start_offset, end_offset) - text_response".

| Field | Description |
| --- | --- |
| text_responses[] | List of text responses in the given text language. |
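The video-captioning format described above can be parsed with a small regular expression. A sketch, using a made-up response string:

```python
import re

# Parse the video-captioning format described above:
# "[start_offset, end_offset) - text_response".
pattern = re.compile(r"^\[(?P<start>[\d.]+), (?P<end>[\d.]+)\) - (?P<text>.*)$")

response = "[0.0, 4.5) - A person walks along a beach."
match = pattern.match(response)
segment = (float(match["start"]), float(match["end"]), match["text"])
print(segment)
```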