Class GkeInferenceQuickstartClient (0.1.0)

GitHub Repository | Product Reference

Service Description: The GKE Inference Quickstart (GIQ) service provides profiles with performance metrics for popular models and model servers across multiple accelerators. These profiles help generate optimized best practices for running inference on GKE.

This class provides the ability to make remote calls to the backing service through method calls that map to API methods. Sample code to get started:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   GenerateOptimizedManifestRequest request =
       GenerateOptimizedManifestRequest.newBuilder()
           .setModelServerInfo(ModelServerInfo.newBuilder().build())
           .setAcceleratorType("acceleratorType-82462651")
           .setKubernetesNamespace("kubernetesNamespace-1862862667")
           .setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
           .setStorageConfig(StorageConfig.newBuilder().build())
           .build();
   GenerateOptimizedManifestResponse response =
       gkeInferenceQuickstartClient.generateOptimizedManifest(request);
 }
 

Note: close() needs to be called on the GkeInferenceQuickstartClient object to clean up resources such as threads. In the example above, try-with-resources is used, which automatically calls close().
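Try-with-resources is equivalent to calling close() in a finally block. The sketch below uses a stand-in AutoCloseable (not the real client, so it runs without any dependencies) to show that both forms release the resource:

```java
// Stand-in resource: GkeInferenceQuickstartClient implements AutoCloseable
// in the same way, so both patterns below apply to it unchanged.
class StubClient implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true; // the real client stops background threads here
    }
}

public class CloseDemo {
    // Pattern 1: try-with-resources calls close() automatically.
    static StubClient withTryWithResources() {
        StubClient client = new StubClient();
        try (StubClient c = client) {
            // ... make calls on the client ...
        }
        return client;
    }

    // Pattern 2: explicit close() in finally -- equivalent to pattern 1.
    static StubClient withExplicitClose() {
        StubClient client = new StubClient();
        try {
            // ... make calls on the client ...
        } finally {
            client.close();
        }
        return client;
    }

    public static void main(String[] args) {
        System.out.println(withTryWithResources().closed); // true
        System.out.println(withExplicitClose().closed);    // true
    }
}
```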

Methods

FetchModels

Fetches available models. Open-source models follow the Hugging Face Hub owner/model_name format.

Request object method variants only take one parameter, a request object, which must be constructed before the call.

  • fetchModels(FetchModelsRequest request)

Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service.

  • fetchModelsPagedCallable()

  • fetchModelsCallable()

FetchModelServers

Fetches available model servers. Open-source model servers use simplified, lowercase names (e.g., vllm).

Request object method variants only take one parameter, a request object, which must be constructed before the call.

  • fetchModelServers(FetchModelServersRequest request)

Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service.

  • fetchModelServersPagedCallable()

  • fetchModelServersCallable()

FetchModelServerVersions

Fetches available model server versions. Open-source servers use their own versioning schemas (e.g., vllm uses semver like v1.0.0).

Some model servers have different versioning schemas depending on the accelerator. For example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. All available versions will be returned when different schemas are present.

Request object method variants only take one parameter, a request object, which must be constructed before the call.

  • fetchModelServerVersions(FetchModelServerVersionsRequest request)

Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service.

  • fetchModelServerVersionsPagedCallable()

  • fetchModelServerVersionsCallable()

FetchProfiles

Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.

Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See Run best practice inference with GKE Inference Quickstart recipes for details.

Request object method variants only take one parameter, a request object, which must be constructed before the call.

  • fetchProfiles(FetchProfilesRequest request)

Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service.

  • fetchProfilesPagedCallable()

  • fetchProfilesCallable()

GenerateOptimizedManifest

Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See Run best practice inference with GKE Inference Quickstart recipes for deployment details.

Request object method variants only take one parameter, a request object, which must be constructed before the call.

  • generateOptimizedManifest(GenerateOptimizedManifestRequest request)

Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service.

  • generateOptimizedManifestCallable()

FetchBenchmarkingData

Fetches all of the benchmarking data available for a profile. Benchmarking data includes every performance metric available for a given model server setup on a given instance type.

Request object method variants only take one parameter, a request object, which must be constructed before the call.

  • fetchBenchmarkingData(FetchBenchmarkingDataRequest request)

Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service.

  • fetchBenchmarkingDataCallable()

See the individual methods for example code.

Many parameters require resource names to be formatted in a particular way. To assist with these names, this class includes a format method for each type of name, and additionally a parse method to extract the individual identifiers contained within names that are returned.
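As an illustration of that format/parse pattern, here is a minimal stand-in (the class name and path segments below are hypothetical placeholders, not actual helpers shipped in this package):

```java
// Hypothetical stand-in for a generated resource-name helper.
// Real helpers follow the same shape: a format method that builds the
// resource name string, and a parse method that splits it back apart.
public class ExampleResourceName {
    private final String project;
    private final String location;

    private ExampleResourceName(String project, String location) {
        this.project = project;
        this.location = location;
    }

    /** Builds "projects/{project}/locations/{location}". */
    public static String format(String project, String location) {
        return "projects/" + project + "/locations/" + location;
    }

    /** Extracts the individual identifiers from a formatted name. */
    public static ExampleResourceName parse(String formatted) {
        String[] parts = formatted.split("/");
        if (parts.length != 4 || !parts[0].equals("projects") || !parts[2].equals("locations")) {
            throw new IllegalArgumentException("Unexpected name: " + formatted);
        }
        return new ExampleResourceName(parts[1], parts[3]);
    }

    public String getProject() { return project; }
    public String getLocation() { return location; }

    public static void main(String[] args) {
        String name = format("my-project", "us-central1");
        System.out.println(name); // projects/my-project/locations/us-central1
        System.out.println(parse(name).getLocation()); // us-central1
    }
}
```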

This class can be customized by passing in a custom instance of GkeInferenceQuickstartSettings to create(). For example:

To customize credentials:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 GkeInferenceQuickstartSettings gkeInferenceQuickstartSettings =
     GkeInferenceQuickstartSettings.newBuilder()
         .setCredentialsProvider(FixedCredentialsProvider.create(myCredentials))
         .build();
 GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create(gkeInferenceQuickstartSettings);
 

To customize the endpoint:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 GkeInferenceQuickstartSettings gkeInferenceQuickstartSettings =
     GkeInferenceQuickstartSettings.newBuilder().setEndpoint(myEndpoint).build();
 GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create(gkeInferenceQuickstartSettings);
 

To use REST (HTTP/1.1 with JSON) transport instead of gRPC for sending and receiving requests over the wire:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 GkeInferenceQuickstartSettings gkeInferenceQuickstartSettings =
     GkeInferenceQuickstartSettings.newHttpJsonBuilder().build();
 GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create(gkeInferenceQuickstartSettings);
 

Please refer to the GitHub repository's samples for more quickstart code snippets.

Inheritance

java.lang.Object > GkeInferenceQuickstartClient

Static Methods

create()

public static final GkeInferenceQuickstartClient create()

Constructs an instance of GkeInferenceQuickstartClient with default settings.

Returns
Type Description
GkeInferenceQuickstartClient
Exceptions
Type Description
IOException

create(GkeInferenceQuickstartSettings settings)

public static final GkeInferenceQuickstartClient create(GkeInferenceQuickstartSettings settings)

Constructs an instance of GkeInferenceQuickstartClient, using the given settings. The channels are created based on the settings passed in, or defaults for any settings that are not set.

Parameter
Name Description
settings GkeInferenceQuickstartSettings
Returns
Type Description
GkeInferenceQuickstartClient
Exceptions
Type Description
IOException

create(GkeInferenceQuickstartStub stub)

public static final GkeInferenceQuickstartClient create(GkeInferenceQuickstartStub stub)

Constructs an instance of GkeInferenceQuickstartClient, using the given stub for making calls. This is for advanced usage; prefer using create(GkeInferenceQuickstartSettings).

Parameter
Name Description
stub GkeInferenceQuickstartStub
Returns
Type Description
GkeInferenceQuickstartClient

Constructors

GkeInferenceQuickstartClient(GkeInferenceQuickstartSettings settings)

protected GkeInferenceQuickstartClient(GkeInferenceQuickstartSettings settings)

Constructs an instance of GkeInferenceQuickstartClient, using the given settings. This is protected so that it is easy to make a subclass, but otherwise, the static factory methods should be preferred.

Parameter
Name Description
settings GkeInferenceQuickstartSettings

GkeInferenceQuickstartClient(GkeInferenceQuickstartStub stub)

protected GkeInferenceQuickstartClient(GkeInferenceQuickstartStub stub)
Parameter
Name Description
stub GkeInferenceQuickstartStub

Methods

awaitTermination(long duration, TimeUnit unit)

public boolean awaitTermination(long duration, TimeUnit unit)
Parameters
Name Description
duration long
unit TimeUnit
Returns
Type Description
boolean
Exceptions
Type Description
InterruptedException
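awaitTermination follows the standard java.util.concurrent shutdown contract: it blocks until background resources have stopped or the timeout elapses, and returns whether termination completed. The sketch below shows the same close-then-await sequence with a plain ExecutorService standing in for the client:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownDemo {
    // Mirrors the client shutdown sequence:
    //   client.close();                               // stop accepting work
    //   client.awaitTermination(5, TimeUnit.SECONDS); // wait for threads
    static boolean shutDown(ExecutorService pool) {
        pool.shutdown(); // analogous to close(): begin releasing resources
        try {
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> System.out.println("background work"));
        boolean terminated = shutDown(pool);
        System.out.println(terminated); // true once all threads have exited
    }
}
```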

close()

public final void close()

fetchBenchmarkingData(FetchBenchmarkingDataRequest request)

public final FetchBenchmarkingDataResponse fetchBenchmarkingData(FetchBenchmarkingDataRequest request)

Fetches all of the benchmarking data available for a profile. Benchmarking data includes every performance metric available for a given model server setup on a given instance type.

Sample code:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   FetchBenchmarkingDataRequest request =
       FetchBenchmarkingDataRequest.newBuilder()
           .setModelServerInfo(ModelServerInfo.newBuilder().build())
           .setInstanceType("instanceType-737655441")
           .setPricingModel("pricingModel1050892035")
           .build();
   FetchBenchmarkingDataResponse response =
       gkeInferenceQuickstartClient.fetchBenchmarkingData(request);
 }
 
Parameter
Name Description
request FetchBenchmarkingDataRequest

The request object containing all of the parameters for the API call.

Returns
Type Description
FetchBenchmarkingDataResponse

fetchBenchmarkingDataCallable()

public final UnaryCallable<FetchBenchmarkingDataRequest,FetchBenchmarkingDataResponse> fetchBenchmarkingDataCallable()

Fetches all of the benchmarking data available for a profile. Benchmarking data includes every performance metric available for a given model server setup on a given instance type.

Sample code:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   FetchBenchmarkingDataRequest request =
       FetchBenchmarkingDataRequest.newBuilder()
           .setModelServerInfo(ModelServerInfo.newBuilder().build())
           .setInstanceType("instanceType-737655441")
           .setPricingModel("pricingModel1050892035")
           .build();
   ApiFuture<FetchBenchmarkingDataResponse> future =
       gkeInferenceQuickstartClient.fetchBenchmarkingDataCallable().futureCall(request);
   // Do something.
   FetchBenchmarkingDataResponse response = future.get();
 }
 
Returns
Type Description
UnaryCallable<FetchBenchmarkingDataRequest,FetchBenchmarkingDataResponse>

fetchModelServerVersions(FetchModelServerVersionsRequest request)

public final GkeInferenceQuickstartClient.FetchModelServerVersionsPagedResponse fetchModelServerVersions(FetchModelServerVersionsRequest request)

Fetches available model server versions. Open-source servers use their own versioning schemas (e.g., vllm uses semver like v1.0.0).

Some model servers have different versioning schemas depending on the accelerator. For example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. All available versions will be returned when different schemas are present.

Sample code:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   FetchModelServerVersionsRequest request =
       FetchModelServerVersionsRequest.newBuilder()
           .setModel("model104069929")
           .setModelServer("modelServer475157452")
           .setPageSize(883849137)
           .setPageToken("pageToken873572522")
           .build();
   for (String element :
       gkeInferenceQuickstartClient.fetchModelServerVersions(request).iterateAll()) {
     // doThingsWith(element);
   }
 }
 
Parameter
Name Description
request FetchModelServerVersionsRequest

The request object containing all of the parameters for the API call.

Returns
Type Description
GkeInferenceQuickstartClient.FetchModelServerVersionsPagedResponse

fetchModelServerVersionsCallable()

public final UnaryCallable<FetchModelServerVersionsRequest,FetchModelServerVersionsResponse> fetchModelServerVersionsCallable()

Fetches available model server versions. Open-source servers use their own versioning schemas (e.g., vllm uses semver like v1.0.0).

Some model servers have different versioning schemas depending on the accelerator. For example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. All available versions will be returned when different schemas are present.

Sample code:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   FetchModelServerVersionsRequest request =
       FetchModelServerVersionsRequest.newBuilder()
           .setModel("model104069929")
           .setModelServer("modelServer475157452")
           .setPageSize(883849137)
           .setPageToken("pageToken873572522")
           .build();
   while (true) {
     FetchModelServerVersionsResponse response =
         gkeInferenceQuickstartClient.fetchModelServerVersionsCallable().call(request);
     for (String element : response.getModelServerVersionsList()) {
       // doThingsWith(element);
     }
     String nextPageToken = response.getNextPageToken();
     if (!Strings.isNullOrEmpty(nextPageToken)) {
       request = request.toBuilder().setPageToken(nextPageToken).build();
     } else {
       break;
     }
   }
 }
 
Returns
Type Description
UnaryCallable<FetchModelServerVersionsRequest,FetchModelServerVersionsResponse>

fetchModelServerVersionsPagedCallable()

public final UnaryCallable<FetchModelServerVersionsRequest,GkeInferenceQuickstartClient.FetchModelServerVersionsPagedResponse> fetchModelServerVersionsPagedCallable()

Fetches available model server versions. Open-source servers use their own versioning schemas (e.g., vllm uses semver like v1.0.0).

Some model servers have different versioning schemas depending on the accelerator. For example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. All available versions will be returned when different schemas are present.

Sample code:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   FetchModelServerVersionsRequest request =
       FetchModelServerVersionsRequest.newBuilder()
           .setModel("model104069929")
           .setModelServer("modelServer475157452")
           .setPageSize(883849137)
           .setPageToken("pageToken873572522")
           .build();
   ApiFuture<String> future =
       gkeInferenceQuickstartClient.fetchModelServerVersionsPagedCallable().futureCall(request);
   // Do something.
   for (String element : future.get().iterateAll()) {
     // doThingsWith(element);
   }
 }
 
Returns
Type Description
UnaryCallable<FetchModelServerVersionsRequest,FetchModelServerVersionsPagedResponse>

fetchModelServers(FetchModelServersRequest request)

public final GkeInferenceQuickstartClient.FetchModelServersPagedResponse fetchModelServers(FetchModelServersRequest request)

Fetches available model servers. Open-source model servers use simplified, lowercase names (e.g., vllm).

Sample code:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   FetchModelServersRequest request =
       FetchModelServersRequest.newBuilder()
           .setModel("model104069929")
           .setPageSize(883849137)
           .setPageToken("pageToken873572522")
           .build();
   for (String element : gkeInferenceQuickstartClient.fetchModelServers(request).iterateAll()) {
     // doThingsWith(element);
   }
 }
 
Parameter
Name Description
request FetchModelServersRequest

The request object containing all of the parameters for the API call.

Returns
Type Description
GkeInferenceQuickstartClient.FetchModelServersPagedResponse

fetchModelServersCallable()

public final UnaryCallable<FetchModelServersRequest,FetchModelServersResponse> fetchModelServersCallable()

Fetches available model servers. Open-source model servers use simplified, lowercase names (e.g., vllm).

Sample code:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   FetchModelServersRequest request =
       FetchModelServersRequest.newBuilder()
           .setModel("model104069929")
           .setPageSize(883849137)
           .setPageToken("pageToken873572522")
           .build();
   while (true) {
     FetchModelServersResponse response =
         gkeInferenceQuickstartClient.fetchModelServersCallable().call(request);
     for (String element : response.getModelServersList()) {
       // doThingsWith(element);
     }
     String nextPageToken = response.getNextPageToken();
     if (!Strings.isNullOrEmpty(nextPageToken)) {
       request = request.toBuilder().setPageToken(nextPageToken).build();
     } else {
       break;
     }
   }
 }
 
Returns
Type Description
UnaryCallable<FetchModelServersRequest,FetchModelServersResponse>

fetchModelServersPagedCallable()

public final UnaryCallable<FetchModelServersRequest,GkeInferenceQuickstartClient.FetchModelServersPagedResponse> fetchModelServersPagedCallable()

Fetches available model servers. Open-source model servers use simplified, lowercase names (e.g., vllm).

Sample code:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   FetchModelServersRequest request =
       FetchModelServersRequest.newBuilder()
           .setModel("model104069929")
           .setPageSize(883849137)
           .setPageToken("pageToken873572522")
           .build();
   ApiFuture<String> future =
       gkeInferenceQuickstartClient.fetchModelServersPagedCallable().futureCall(request);
   // Do something.
   for (String element : future.get().iterateAll()) {
     // doThingsWith(element);
   }
 }
 
Returns
Type Description
UnaryCallable<FetchModelServersRequest,FetchModelServersPagedResponse>

fetchModels(FetchModelsRequest request)

public final GkeInferenceQuickstartClient.FetchModelsPagedResponse fetchModels(FetchModelsRequest request)

Fetches available models. Open-source models follow the Hugging Face Hub owner/model_name format.

Sample code:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   FetchModelsRequest request =
       FetchModelsRequest.newBuilder()
           .setPageSize(883849137)
           .setPageToken("pageToken873572522")
           .build();
   for (String element : gkeInferenceQuickstartClient.fetchModels(request).iterateAll()) {
     // doThingsWith(element);
   }
 }
 
Parameter
Name Description
request FetchModelsRequest

The request object containing all of the parameters for the API call.

Returns
Type Description
GkeInferenceQuickstartClient.FetchModelsPagedResponse

fetchModelsCallable()

public final UnaryCallable<FetchModelsRequest,FetchModelsResponse> fetchModelsCallable()

Fetches available models. Open-source models follow the Hugging Face Hub owner/model_name format.

Sample code:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   FetchModelsRequest request =
       FetchModelsRequest.newBuilder()
           .setPageSize(883849137)
           .setPageToken("pageToken873572522")
           .build();
   while (true) {
     FetchModelsResponse response =
         gkeInferenceQuickstartClient.fetchModelsCallable().call(request);
     for (String element : response.getModelsList()) {
       // doThingsWith(element);
     }
     String nextPageToken = response.getNextPageToken();
     if (!Strings.isNullOrEmpty(nextPageToken)) {
       request = request.toBuilder().setPageToken(nextPageToken).build();
     } else {
       break;
     }
   }
 }
 
Returns
Type Description
UnaryCallable<FetchModelsRequest,FetchModelsResponse>

fetchModelsPagedCallable()

public final UnaryCallable<FetchModelsRequest,GkeInferenceQuickstartClient.FetchModelsPagedResponse> fetchModelsPagedCallable()

Fetches available models. Open-source models follow the Hugging Face Hub owner/model_name format.

Sample code:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   FetchModelsRequest request =
       FetchModelsRequest.newBuilder()
           .setPageSize(883849137)
           .setPageToken("pageToken873572522")
           .build();
   ApiFuture<String> future =
       gkeInferenceQuickstartClient.fetchModelsPagedCallable().futureCall(request);
   // Do something.
   for (String element : future.get().iterateAll()) {
     // doThingsWith(element);
   }
 }
 
Returns
Type Description
UnaryCallable<FetchModelsRequest,FetchModelsPagedResponse>

fetchProfiles(FetchProfilesRequest request)

public final GkeInferenceQuickstartClient.FetchProfilesPagedResponse fetchProfiles(FetchProfilesRequest request)

Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.

Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See Run best practice inference with GKE Inference Quickstart recipes for details.

Sample code:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   FetchProfilesRequest request =
       FetchProfilesRequest.newBuilder()
           .setModel("model104069929")
           .setModelServer("modelServer475157452")
           .setModelServerVersion("modelServerVersion77054828")
           .setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
           .setPageSize(883849137)
           .setPageToken("pageToken873572522")
           .build();
   for (Profile element : gkeInferenceQuickstartClient.fetchProfiles(request).iterateAll()) {
     // doThingsWith(element);
   }
 }
 
Parameter
Name Description
request FetchProfilesRequest

The request object containing all of the parameters for the API call.

Returns
Type Description
GkeInferenceQuickstartClient.FetchProfilesPagedResponse

fetchProfilesCallable()

public final UnaryCallable<FetchProfilesRequest,FetchProfilesResponse> fetchProfilesCallable()

Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.

Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See Run best practice inference with GKE Inference Quickstart recipes for details.

Sample code:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   FetchProfilesRequest request =
       FetchProfilesRequest.newBuilder()
           .setModel("model104069929")
           .setModelServer("modelServer475157452")
           .setModelServerVersion("modelServerVersion77054828")
           .setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
           .setPageSize(883849137)
           .setPageToken("pageToken873572522")
           .build();
   while (true) {
     FetchProfilesResponse response =
         gkeInferenceQuickstartClient.fetchProfilesCallable().call(request);
     for (Profile element : response.getProfileList()) {
       // doThingsWith(element);
     }
     String nextPageToken = response.getNextPageToken();
     if (!Strings.isNullOrEmpty(nextPageToken)) {
       request = request.toBuilder().setPageToken(nextPageToken).build();
     } else {
       break;
     }
   }
 }
 
Returns
Type Description
UnaryCallable<FetchProfilesRequest,FetchProfilesResponse>

fetchProfilesPagedCallable()

public final UnaryCallable<FetchProfilesRequest,GkeInferenceQuickstartClient.FetchProfilesPagedResponse> fetchProfilesPagedCallable()

Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.

Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See Run best practice inference with GKE Inference Quickstart recipes for details.

Sample code:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   FetchProfilesRequest request =
       FetchProfilesRequest.newBuilder()
           .setModel("model104069929")
           .setModelServer("modelServer475157452")
           .setModelServerVersion("modelServerVersion77054828")
           .setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
           .setPageSize(883849137)
           .setPageToken("pageToken873572522")
           .build();
   ApiFuture<Profile> future =
       gkeInferenceQuickstartClient.fetchProfilesPagedCallable().futureCall(request);
   // Do something.
   for (Profile element : future.get().iterateAll()) {
     // doThingsWith(element);
   }
 }
 
Returns
Type Description
UnaryCallable<FetchProfilesRequest,FetchProfilesPagedResponse>
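For most callers the manual pagination loop shown for the raw callable is unnecessary. Below is a hedged sketch, assuming this client also exposes the standard generated paged method fetchProfiles(request) (the usual GAPIC companion of a *PagedCallable), whose iterateAll() fetches subsequent pages transparently:

```java
// Sketch only: assumes the standard generated paged method
// fetchProfiles(request) exists on this client, as it does for
// other GAPIC paged RPCs.
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
    GkeInferenceQuickstartClient.create()) {
  FetchProfilesRequest request =
      FetchProfilesRequest.newBuilder()
          .setModel("model104069929")
          .setModelServer("modelServer475157452")
          .build();
  // iterateAll() lazily requests additional pages as the loop advances,
  // so no explicit page-token handling is needed.
  for (Profile element : gkeInferenceQuickstartClient.fetchProfiles(request).iterateAll()) {
    // doThingsWith(element);
  }
}
```

Prefer the raw callable loop only when you need direct control over page tokens, for example to checkpoint progress between pages.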

generateOptimizedManifest(GenerateOptimizedManifestRequest request)

public final GenerateOptimizedManifestResponse generateOptimizedManifest(GenerateOptimizedManifestRequest request)

Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See Run best practice inference with GKE Inference Quickstart recipes for deployment details.

Sample code:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   GenerateOptimizedManifestRequest request =
       GenerateOptimizedManifestRequest.newBuilder()
           .setModelServerInfo(ModelServerInfo.newBuilder().build())
           .setAcceleratorType("acceleratorType-82462651")
           .setKubernetesNamespace("kubernetesNamespace-1862862667")
           .setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
           .setStorageConfig(StorageConfig.newBuilder().build())
           .build();
   GenerateOptimizedManifestResponse response =
       gkeInferenceQuickstartClient.generateOptimizedManifest(request);
 }
 
Parameter
Name Description
request GenerateOptimizedManifestRequest

The request object containing all of the parameters for the API call.

Returns
Type Description
GenerateOptimizedManifestResponse

generateOptimizedManifestCallable()

public final UnaryCallable<GenerateOptimizedManifestRequest,GenerateOptimizedManifestResponse> generateOptimizedManifestCallable()

Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See Run best practice inference with GKE Inference Quickstart recipes for deployment details.

Sample code:


 // This snippet has been automatically generated and should be regarded as a code template only.
 // It will require modifications to work:
 // - It may require correct/in-range values for request initialization.
 // - It may require specifying regional endpoints when creating the service client as shown in
 // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
 try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
     GkeInferenceQuickstartClient.create()) {
   GenerateOptimizedManifestRequest request =
       GenerateOptimizedManifestRequest.newBuilder()
           .setModelServerInfo(ModelServerInfo.newBuilder().build())
           .setAcceleratorType("acceleratorType-82462651")
           .setKubernetesNamespace("kubernetesNamespace-1862862667")
           .setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
           .setStorageConfig(StorageConfig.newBuilder().build())
           .build();
   ApiFuture<GenerateOptimizedManifestResponse> future =
       gkeInferenceQuickstartClient.generateOptimizedManifestCallable().futureCall(request);
   // Do something.
   GenerateOptimizedManifestResponse response = future.get();
 }
 
Returns
Type Description
UnaryCallable<GenerateOptimizedManifestRequest,GenerateOptimizedManifestResponse>

getSettings()

public final GkeInferenceQuickstartSettings getSettings()
Returns
Type Description
GkeInferenceQuickstartSettings
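The returned settings object is read-only; configuration happens at client creation time. A hedged sketch of the standard GAX settings-builder pattern, assuming GkeInferenceQuickstartSettings follows the usual generated surface (a per-method generateOptimizedManifestSettings() builder and java.time-based retry settings):

```java
// Sketch only: the standard GAX pattern for customizing per-method
// timeouts before creating a client. Method-level settings names are
// assumed to follow the usual generated convention.
GkeInferenceQuickstartSettings.Builder settingsBuilder =
    GkeInferenceQuickstartSettings.newBuilder();
settingsBuilder
    .generateOptimizedManifestSettings()
    .setRetrySettings(
        settingsBuilder
            .generateOptimizedManifestSettings()
            .getRetrySettings()
            .toBuilder()
            .setTotalTimeout(java.time.Duration.ofSeconds(60))
            .build());
GkeInferenceQuickstartSettings settings = settingsBuilder.build();
try (GkeInferenceQuickstartClient client = GkeInferenceQuickstartClient.create(settings)) {
  // Calls made through this client use the customized timeout.
}
```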

getStub()

public GkeInferenceQuickstartStub getStub()
Returns
Type Description
GkeInferenceQuickstartStub

isShutdown()

public boolean isShutdown()
Returns
Type Description
boolean

isTerminated()

public boolean isTerminated()
Returns
Type Description
boolean

shutdown()

public void shutdown()

Initiates an orderly shutdown in which previously submitted calls continue to execute, but no new calls are accepted.

shutdownNow()

public void shutdownNow()

Attempts to stop all actively executing calls and halts the processing of waiting calls.
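When a client cannot be scoped to a try-with-resources block, shut it down explicitly. A hedged sketch of a graceful shutdown, assuming the awaitTermination(long, TimeUnit) method inherited from the GAX BackgroundResource interface that generated clients implement:

```java
// Sketch: graceful shutdown when try-with-resources is not practical.
// awaitTermination comes from com.google.api.gax.core.BackgroundResource.
GkeInferenceQuickstartClient client = GkeInferenceQuickstartClient.create();
try {
  // ... issue calls ...
} finally {
  client.shutdown(); // stop accepting new calls, let in-flight calls finish
  try {
    if (!client.awaitTermination(30, java.util.concurrent.TimeUnit.SECONDS)) {
      client.shutdownNow(); // force termination if graceful shutdown timed out
    }
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore the interrupt flag
    client.shutdownNow();
  }
}
```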