Générer du texte à partir d'une requête multimodale

Cet exemple montre comment générer du texte à partir d'un prompt multimodal à l'aide du modèle Gemini. Le prompt se compose de trois images et de deux prompts textuels. Le modèle génère une réponse textuelle décrivant les images et les requêtes textuelles.

Exemple de code

Go

Avant d'essayer cet exemple, suivez les instructions de configuration pour Go décrites dans le guide de démarrage rapide de Vertex AI sur l'utilisation des bibliothèques clientes. Pour en savoir plus, consultez la documentation de référence de l'API Vertex AI pour Go.

Pour vous authentifier auprès de Vertex AI, configurez le service Identifiants par défaut de l'application. Pour en savoir plus, consultez Configurer l'authentification pour un environnement de développement local.

import (
	"context"
	"fmt"
	"io"
	"os"

	genai "google.golang.org/genai"
)

// generateWithMultiLocalImages shows how to generate text using multiple local image inputs.
func generateWithMultiLocalImages(w io.Writer) error {
	ctx := context.Background()

	client, err := genai.NewClient(ctx, &genai.ClientConfig{
		HTTPOptions: genai.HTTPOptions{APIVersion: "v1"},
	})
	if err != nil {
		return fmt.Errorf("failed to create genai client: %w", err)
	}

	// Read local image files
	image1, err := os.ReadFile("latte.jpg")
	if err != nil {
		return fmt.Errorf("failed to read image1: %w", err)
	}
	image2, err := os.ReadFile("scones.jpg")
	if err != nil {
		return fmt.Errorf("failed to read image2: %w", err)
	}

	modelName := "gemini-2.5-flash"
	contents := []*genai.Content{
		{
			Role: "user",
			Parts: []*genai.Part{
				{Text: "Generate a list of all the objects contained in both images."},
				{InlineData: &genai.Blob{
					MIMEType: "image/jpeg",
					Data:     image1,
				}},
				{InlineData: &genai.Blob{
					MIMEType: "image/jpeg",
					Data:     image2,
				}},
			},
		},
	}

	// Call the model
	resp, err := client.Models.GenerateContent(ctx, modelName, contents, nil)
	if err != nil {
		return fmt.Errorf("failed to generate content: %w", err)
	}

	fmt.Fprintln(w, resp.Text())

	// Example response:
	// Here is a list of all the distinct objects found in both images:
	// 1.  **Coffee** (in mugs/cups; one is clearly a latte with heart art, others are also coffee/latte)
	// 2.  **Mug(s)/Cup(s)** (yellow in the top image, white in the bottom image)
	// 3.  **Cake** (sliced, in the top image)
	// 4.  **Plate** (white, under the cake slice in the top image)
	// 5.  **Fork** (partially visible on the plate in the top image)
	// 6.  **Scones/Biscuits** (blueberry, in the bottom image)
	// 7.  **Blueberries** (scattered and in a bowl in the bottom image)
	// 8.  **Bowl** (small, dark, holding blueberries in the bottom image)
	// 9.  **Spoon** (silver, with "LET'S JAM" inscription, in the bottom image)
	// 10. **Flowers** (peonies, in the bottom image)
	// 11. **Leaves** (green, possibly mint, in the bottom image)
	// 12. **Paper** (parchment or wax paper, in the bottom image)
	// 13. **Table/Surface** (wooden in the top image, textured/painted in the bottom image)
	// ...

	return nil
}

Java

Avant d'essayer cet exemple, suivez les instructions de configuration pour Java décrites dans le guide de démarrage rapide de Vertex AI sur l'utilisation des bibliothèques clientes. Pour en savoir plus, consultez la documentation de référence de l'API Vertex AI pour Java.


import com.google.genai.Client;
import com.google.genai.types.Content;
import com.google.genai.types.GenerateContentResponse;
import com.google.genai.types.HttpOptions;
import com.google.genai.types.Part;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TextGenerationWithMultiLocalImage {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String modelId = "gemini-2.5-flash";
    String localImageFilePath1 = "your/local/img1.jpg";
    String localImageFilePath2 = "your/local/img2.jpg";
    generateContent(modelId, localImageFilePath1, localImageFilePath2);
  }

  // Generates text using multiple local images
  public static String generateContent(
      String modelId, String localImageFilePath1, String localImageFilePath2) throws IOException {
    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (Client client =
        Client.builder()
            .location("global")
            .vertexAI(true)
            .httpOptions(HttpOptions.builder().apiVersion("v1").build())
            .build()) {

      // Read content from local files.
      byte[] localFileImg1Bytes = Files.readAllBytes(Paths.get(localImageFilePath1));
      byte[] localFileImg2Bytes = Files.readAllBytes(Paths.get(localImageFilePath2));

      GenerateContentResponse response =
          client.models.generateContent(
              modelId,
              Content.fromParts(
                  Part.fromBytes(localFileImg1Bytes, "image/jpeg"),
                  Part.fromBytes(localFileImg2Bytes, "image/jpeg"),
                  Part.fromText("Generate a list of all the objects contained in both images")),
              null);

      System.out.print(response.text());
      // Example response:
      // Based on both images, here are the objects contained in both:
      //
      // 1.  **Coffee cups (or mugs)**: Both images feature one or more cups containing a beverage.
      // 2.  **Coffee (or a similar beverage)**: Both images contain a liquid beverage in the cups,
      // appearing to be coffee or a coffee-like drink.
      // 3.  **Table (or a flat surface)**: Both compositions are set on a flat surface, likely a
      // table or countertop.
      return response.text();
    }
  }
}

Node.js

Avant d'essayer cet exemple, suivez les instructions de configuration pour Node.js décrites dans le guide de démarrage rapide de Vertex AI sur l'utilisation des bibliothèques clientes. Pour en savoir plus, consultez la documentation de référence de l'API Vertex AI pour Node.js.

const {GoogleGenAI} = require('@google/genai');
const fs = require('fs');

const GOOGLE_CLOUD_PROJECT = process.env.GOOGLE_CLOUD_PROJECT;
const GOOGLE_CLOUD_LOCATION = process.env.GOOGLE_CLOUD_LOCATION || 'global';

function loadImageAsBase64(path) {
  const bytes = fs.readFileSync(path);
  return bytes.toString('base64');
}

async function generateContent(
  projectId = GOOGLE_CLOUD_PROJECT,
  location = GOOGLE_CLOUD_LOCATION,
  imagePath1,
  imagePath2
) {
  const client = new GoogleGenAI({
    vertexai: true,
    project: projectId,
    location: location,
  });

  // TODO(Developer): Update the below file paths to your images
  const image1 = loadImageAsBase64(imagePath1);
  const image2 = loadImageAsBase64(imagePath2);

  const response = await client.models.generateContent({
    model: 'gemini-2.5-flash',
    contents: [
      {
        role: 'user',
        parts: [
          {
            text: 'Generate a list of all the objects contained in both images.',
          },
          {
            inlineData: {
              data: image1,
              mimeType: 'image/jpeg',
            },
          },
          {
            inlineData: {
              data: image2,
              mimeType: 'image/jpeg',
            },
          },
        ],
      },
    ],
  });

  console.log(response.text);

  // Example response:
  //  Okay, here's a jingle combining the elements of both sets of images, focusing on ...
  //  ...

  return response.text;
}

Python

Avant d'essayer cet exemple, suivez les instructions de configuration pour Python décrites dans le guide de démarrage rapide de Vertex AI sur l'utilisation des bibliothèques clientes. Pour en savoir plus, consultez la documentation de référence de l'API Vertex AI pour Python.

from google import genai
from google.genai.types import HttpOptions, Part

client = genai.Client(http_options=HttpOptions(api_version="v1"))
# TODO(Developer): Update the below file paths to your images
# image_path_1 = "path/to/your/image1.jpg"
# image_path_2 = "path/to/your/image2.jpg"
with open(image_path_1, "rb") as f:
    image_1_bytes = f.read()
with open(image_path_2, "rb") as f:
    image_2_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        "Generate a list of all the objects contained in both images.",
        Part.from_bytes(data=image_1_bytes, mime_type="image/jpeg"),
        Part.from_bytes(data=image_2_bytes, mime_type="image/jpeg"),
    ],
)
print(response.text)
# Example response:
# Okay, here's a jingle combining the elements of both sets of images, focusing on ...
# ...

Étape suivante

Pour rechercher et filtrer des exemples de code pour d'autres produits Google Cloud , consultez l'explorateur d'exemplesGoogle Cloud .

Générer du texte à partir d'une requête multimodale Restez organisé à l'aide des collections Enregistrez et classez les contenus selon vos préférences.