Design your compute infrastructure with Gemini

This document explains how to plan and design your compute infrastructure by prompting Gemini.

You can use Gemini in the Google Cloud console as an AI-powered interface to evaluate hardware options, estimate deployment costs, and view recommended configurations for your Compute Engine instances. To tailor its recommendations, Gemini evaluates your Google Cloud project by checking your quota limits, existing reservations, committed use discounts (CUDs), default region and zone, and any resource location constraints. By using Gemini to help with your planning, you can reach an optimal configuration for your workload before you create or modify a compute instance.

To learn more about the components that you must configure before or when you create a compute instance, see Overview of creating Compute Engine instances.

Limitations

You can't create, modify, or delete resources by prompting Gemini in the Google Cloud console.

Before you begin

When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.

Required roles

To get the permissions that you need to access and prompt Gemini, ask your administrator to grant you the Compute Viewer (roles/compute.viewer) IAM role on the project. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to access and prompt Gemini. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to access and prompt Gemini:

  • To view a list of instances: compute.instances.list

You might also be able to get these permissions with custom roles or other predefined roles.

Access Gemini in the Google Cloud console

To access Gemini in the Google Cloud console, complete the following steps:

  1. In the Google Cloud console, go to the Overview page.

    Go to Overview

  2. In the Design your infrastructure with Compute Advisor section, you can view the following:

    A screenshot of the Compute Advisor page and the UI elements that compose it.

    The UI elements that are displayed in the preceding screenshot are the following:

    • Quick-action prompt cards: a set of cards that each contain a sample prompt. If you click a card, then the Google Cloud console automatically populates the prompt box with the sample prompt.

    • Prompt box: this field lets you enter and submit prompts. To submit a prompt, click Submit prompt.

    • View previous conversations: this feature lets you view the details of a past conversation and resume it, or delete conversations if you no longer need them.

Prompt Gemini

After you submit a prompt, Gemini starts to generate a response. A pane appears, and the Google Cloud console displays the response to your prompt in that pane, as shown in the following screenshot:

A screenshot of the Compute Advisor page after you submit a prompt.

Based on your prompt, the response pane includes the following elements:

  • Contextual grounding: to tailor its recommendations, Gemini automatically evaluates your project context, including quota limits, existing reservations, CUDs, your default region and zone, and any resource location constraints.

  • Interactive code snippets: Gemini generates gcloud commands, REST API methods, or Terraform resources. You can copy and paste these code snippets or run them in Cloud Shell.

  • Visual canvas: Gemini organizes recommendations into structured tables and side-by-side comparisons. This view helps you evaluate product features and architectural approaches. It also provides an implementation plan for your use case.

The following sections outline best practices for writing prompts and provide example prompts that you can use before you create or modify a compute instance.

Best practices for prompting

To get the most accurate and actionable recommendations from Gemini, we recommend that you structure your prompts in the same way that you would structure a code block. This approach guides Gemini by using clear parameter declarations, role definitions, specific instructions, and explicit output formats.

When you prompt Gemini, consider the following best practices:

  • Focus on design and planning: we recommend that you don't prompt Gemini to troubleshoot compute instance errors. To resolve these errors, see Troubleshoot creating, updating, and deleting compute instances instead.

  • Specify a persona or role: declare a target role or persona, such as an IT administrator, AI researcher, or platform engineer, for Gemini to adopt. This approach guides the tone, depth, and expertise level of the resulting recommendations.

  • Provide explicit, numbered instructions: break your objective down into concrete, step-by-step questions or tasks. This approach structures Gemini's reasoning process and helps ensure that Gemini addresses all of your requirements.

  • Define a specific output format: explicitly state how you want the recommendation to be formatted, such as a walkthrough explanation, a Markdown comparison table, or a ready-to-use gcloud code block.

  • Leverage automatic context grounding: you don't need to include your default region or zone, available quotas, CUDs, or resource location constraints in your prompt. Gemini can access this information in your Google Cloud project.

  • Iteratively refine your designs: you can modify or expand the response that Gemini generated by sending new prompts. For example, you can ask Gemini to add networking recommendations to your deployment plan or modify the storage requirements without starting a new conversation.
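The practices above (a declared role, numbered instructions, and an explicit output format) can be illustrated with a small script that assembles prompt text. This is only a sketch: the `AdvisorPrompt` class and its field names are hypothetical and aren't part of any Google Cloud API or client library. The script only builds a string that you can paste into the prompt box; it doesn't send anything to Gemini.

```python
from dataclasses import dataclass, field


@dataclass
class AdvisorPrompt:
    """Hypothetical helper that collects the parts of a structured prompt."""

    role: str                     # persona for Gemini to adopt
    objective: str                # one or two sentences describing the goal
    tasks: list[str] = field(default_factory=list)  # explicit instructions
    output_format: str = ""       # how the response should be formatted

    def render(self) -> str:
        """Return the prompt as plain text with numbered instructions."""
        lines = [f"Act as {self.role}. {self.objective}", ""]
        if self.tasks:
            lines.append("Please provide the following:")
            lines.extend(
                f"{number}. {task}"
                for number, task in enumerate(self.tasks, start=1)
            )
            lines.append("")
        if self.output_format:
            lines.append(self.output_format)
        return "\n".join(lines).rstrip()


# Region, zone, quota, and CUD details are intentionally left out, because
# Gemini reads that context from the project.
prompt = AdvisorPrompt(
    role="a cloud architect",
    objective=(
        "I need to design a compute instance topology for a distributed "
        "database that balances multi-zone resilience with sub-millisecond "
        "latency."
    ),
    tasks=[
        "A side-by-side comparison of regional MIGs against zonal MIGs.",
        "An explanation of whether compact placement policies work regionally.",
        "The optimal autoscaling configuration for this workload.",
    ],
    output_format=(
        "Format the comparison as a Markdown table, and provide the "
        "deployment steps as ready-to-use gcloud code blocks."
    ),
)

print(prompt.render())
```

Keeping the role, tasks, and output format as separate fields can make it easier to reuse one part, such as the output format, across several prompts while you refine a design.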

Example prompts

The following are examples of prompts that you can use to help you design and optimize your compute infrastructure:

  • Compute instance topology and placement strategy: to determine the optimal deployment model and placement policy for a high-availability workload, use a prompt like the following:

    Act as a cloud architect. I need to design a compute instance topology for a
    distributed database that balances multi-zone resilience with
    sub-millisecond latency.
    
    Please provide the following:
    1. A side-by-side comparison of regional MIGs against zonal MIGs.
    2. An explanation of whether compact placement policies work regionally.
    3. The optimal autoscaling configuration for this workload.
    
    Format the comparison as a Markdown table, and provide the deployment steps
    as ready-to-use gcloud code blocks.
    
  • Provisioning model and cost optimization: to evaluate provisioning models and reduce batch processing costs, use a prompt like the following:

    Act as a platform engineer. I need to find the cheapest way to run large,
    interruptible analytics jobs on our cloud servers without risking data loss.
    
    Please provide the following:
    1. A cost and reliability comparison of standard discounted servers against
       queue-based servers.
    2. An explanation of how to boot all our compute power at the exact same
       time.
    3. A deployment script that gives our jobs a two-minute warning before a
       server gets reclaimed.
    
    Format the comparison as a Markdown table, and provide the script as a
    ready-to-use code block.
    

What's next