Media generation input format for large vision model.
The image bytes or Cloud Storage URI to make the prediction on. Required for editing; not needed for generation. This field is used to determine whether the call is an editing or a generation request.
prompt
string
The text prompt for generating the images. This is required for both editing and generation.
The masked region will be edited based on the text prompt provided. The mask can be either an image or a polygon. It must not be provided without an image. Optional field for editing images.
The reference images to be used for editing and customization capabilities. Imagen 3 Capability adds support for multiple reference images, each of which can be a mask, control, style, or subject image. Depending on the reference type, the reference_config field will be populated with the corresponding config.
JSON representation |
---|
{ "image": { object ( |
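As a rough illustration of how the `prompt` and `image` fields described above interact, the following Python sketch builds an instance dictionary and classifies the call. The helper names `build_instance` and `call_kind` are hypothetical and not part of any client library. Only the `prompt` and `image` keys are taken from this reference, and treating the presence of `image` as the editing signal is an assumption based on the field description.

```python
from typing import Any, Dict, Optional


def build_instance(prompt: str, image: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build an instance dict (hypothetical helper).

    `prompt` is required for both editing and generation.
    """
    if not prompt:
        raise ValueError("prompt is required for both editing and generation")
    instance: Dict[str, Any] = {"prompt": prompt}
    if image is not None:
        # Assumption: supplying an image marks the call as an editing request.
        instance["image"] = image
    return instance


def call_kind(instance: Dict[str, Any]) -> str:
    """Classify the call from the presence of the `image` field (assumption)."""
    return "editing" if "image" in instance else "generation"
```

For example, `call_kind(build_instance("a red bicycle"))` evaluates to `"generation"`, because no image was supplied.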
Image
mimeType
string
The MIME type of the content of the image. Only the following MIME types are supported:
- image/jpeg
- image/png
data
Union type
data can be only one of the following:
bytesBase64Encoded
string
Base64 encoded bytes string representing the image.
gcsUri
string
JSON representation |
---|
{
  "mimeType": string,

  // data
  "bytesBase64Encoded": string,
  "gcsUri": string
  // Union type
} |
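The `data` union and the MIME type restriction above can be sketched in Python as follows. This is a minimal client-side illustration, not the service's own validation: `make_image` and `SUPPORTED_MIME_TYPES` are hypothetical names, and always passing `mimeType` is an assumption, since this reference does not say whether the field is required.

```python
import base64
from typing import Dict, Optional

# MIME types listed as supported in the Image description.
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png"}


def make_image(
    mime_type: str,
    raw_bytes: Optional[bytes] = None,
    gcs_uri: Optional[str] = None,
) -> Dict[str, str]:
    """Build an Image dict with exactly one member of the `data` union set."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported MIME type: {mime_type}")
    # The union allows only one of bytesBase64Encoded or gcsUri.
    if (raw_bytes is None) == (gcs_uri is None):
        raise ValueError("provide exactly one of raw_bytes or gcs_uri")
    image: Dict[str, str] = {"mimeType": mime_type}
    if raw_bytes is not None:
        # Raw bytes are sent as a Base64-encoded string.
        image["bytesBase64Encoded"] = base64.b64encode(raw_bytes).decode("ascii")
    else:
        image["gcsUri"] = gcs_uri
    return image
```

Rejecting the case where both members are supplied keeps the payload unambiguous before it is sent.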
Mask
data
Union type
JSON representation |
---|
{ // data "image": { object ( |
BoundingPolyList
JSON representation |
---|
{
"polygons": [
{
object ( |
ReferenceImage
A ReferenceImage is an image that is used to provide additional context for the image generation or editing.
The actual image data of the reference image.
referenceId
integer
The id of the reference image. This must be unique within the request.
The type of the reference image.
reference_config
Union type
reference_config can be only one of the following:
A config for a mask image.
A config for a control image.
A config for a style image.
A config for a subject image.
JSON representation |
---|
{ "referenceImage": { object ( |
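Because `referenceId` must be unique within a request, a client may want to check this before sending. The sketch below is one possible pre-flight check. The function name `check_unique_reference_ids` is hypothetical, and how the service itself reacts to duplicate IDs is not described in this reference.

```python
from typing import Any, Dict, List


def check_unique_reference_ids(reference_images: List[Dict[str, Any]]) -> None:
    """Raise ValueError if two reference images share a referenceId."""
    seen = set()
    for ref in reference_images:
        ref_id = ref["referenceId"]
        if ref_id in seen:
            raise ValueError(f"duplicate referenceId: {ref_id}")
        seen.add(ref_id)
```

The check only inspects the `referenceId` key, so it works regardless of which reference type or config each entry carries.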
MaskImageConfig
Config for masked image editing using Imagen 3 Capability.
maskMode
enum (MaskMode)
Mode used to generate the mask if mask is not provided.
dilation
number
Dilation to be used with this Mask. This value is used to dilate the mask before applying the edit mode.
maskClasses[]
integer
The segmentation classes which are used in the MASK_MODE_SEMANTIC mode.
JSON representation |
---|
{
"maskMode": enum ( |
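The relationship between `maskMode` and `maskClasses` can be illustrated with a small Python builder. This is a sketch under stated assumptions: `make_mask_config` is a hypothetical helper, the enum is assumed to be passed as its string name, and rejecting `maskClasses` outside `MASK_MODE_SEMANTIC` is a client-side choice rather than documented service behavior. The class IDs in the usage note are placeholders.

```python
from typing import Any, Dict, List, Optional


def make_mask_config(
    mask_mode: str,
    dilation: Optional[float] = None,
    mask_classes: Optional[List[int]] = None,
) -> Dict[str, Any]:
    """Build a MaskImageConfig dict, omitting fields that were not provided."""
    config: Dict[str, Any] = {"maskMode": mask_mode}
    if dilation is not None:
        config["dilation"] = dilation
    if mask_classes is not None:
        # Segmentation classes are described only for the semantic mode.
        if mask_mode != "MASK_MODE_SEMANTIC":
            raise ValueError("maskClasses only applies to MASK_MODE_SEMANTIC")
        config["maskClasses"] = list(mask_classes)
    return config
```

A call such as `make_mask_config("MASK_MODE_SEMANTIC", mask_classes=[1, 2])` yields a dictionary with only the `maskMode` and `maskClasses` keys.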
ControlImageConfig
Config for control image used for editing.
The type of control image.
enableControlImageComputation
boolean
Whether to compute the control image for the request.
superpixelRegionSize
integer
Region size of the superpixel control image.
superpixelRuler
number
Ruler of the superpixel control image.
JSON representation |
---|
{
"controlType": enum ( |
StyleImageConfig
Config for style image used for editing.
styleDescription
string
Description of the style image.
JSON representation |
---|
{ "styleDescription": string } |
SubjectImageConfig
Config for subject image used for editing.
subjectDescription
string
Description of the subject image.
The type of subject image.
JSON representation |
---|
{
"subjectDescription": string,
"subjectType": enum ( |