| GitHub Repository | Product Reference |
Service Description: The GKE Inference Quickstart (GIQ) service provides profiles with performance metrics for popular models and model servers across multiple accelerators. These profiles help generate optimized, best-practice configurations for running inference on GKE.
This class provides the ability to make remote calls to the backing service through method calls that map to API methods. Sample code to get started:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
GenerateOptimizedManifestRequest request =
GenerateOptimizedManifestRequest.newBuilder()
.setModelServerInfo(ModelServerInfo.newBuilder().build())
.setAcceleratorType("acceleratorType-82462651")
.setKubernetesNamespace("kubernetesNamespace-1862862667")
.setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
.setStorageConfig(StorageConfig.newBuilder().build())
.build();
GenerateOptimizedManifestResponse response =
gkeInferenceQuickstartClient.generateOptimizedManifest(request);
}
Note: close() needs to be called on the GkeInferenceQuickstartClient object to clean up resources such as threads. In the example above, try-with-resources is used, which automatically calls close().
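When try-with-resources is not convenient (for example, a long-lived client held as a field), the lifecycle methods documented below can be used directly. This is a sketch, not from the generated reference; it assumes the GIQ client library is on the classpath and combines shutdown(), awaitTermination(long, TimeUnit), and shutdownNow() as described in this page:

```java
import java.util.concurrent.TimeUnit;

// Sketch: manual cleanup of a long-lived client without try-with-resources.
GkeInferenceQuickstartClient client = GkeInferenceQuickstartClient.create();
try {
  // ... issue calls, e.g. client.fetchModels(request) ...
} finally {
  // Begin an orderly shutdown, then wait for in-flight work to finish.
  client.shutdown();
  if (!client.awaitTermination(30, TimeUnit.SECONDS)) {
    // Fall back to a forced shutdown if work did not finish in time.
    client.shutdownNow();
  }
}
```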
| Method | Description | Method Variants |
|---|---|---|
| FetchModels | Fetches available models. Open-source models follow the Huggingface Hub owner/model_name format. | Request object method variants only take one parameter, a request object, which must be constructed before the call. Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service. |
| FetchModelServers | Fetches available model servers. Open-source model servers use simplified, lowercase names (e.g., vllm). | Request object method variants only take one parameter, a request object, which must be constructed before the call. Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service. |
| FetchModelServerVersions | Fetches available model server versions. Open-source servers use their own versioning schemas (e.g., vllm uses semver like v1.0.0). Some model servers have different versioning schemas depending on the accelerator. For example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. All available versions will be returned when different schemas are present. | Request object method variants only take one parameter, a request object, which must be constructed before the call. Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service. |
| FetchProfiles | Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned. Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See Run best practice inference with GKE Inference Quickstart recipes for details. | Request object method variants only take one parameter, a request object, which must be constructed before the call. Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service. |
| GenerateOptimizedManifest | Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See Run best practice inference with GKE Inference Quickstart recipes for deployment details. | Request object method variants only take one parameter, a request object, which must be constructed before the call. Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service. |
| FetchBenchmarkingData | Fetches all of the benchmarking data available for a profile. Benchmarking data returns all of the performance metrics available for a given model server setup on a given instance type. | Request object method variants only take one parameter, a request object, which must be constructed before the call. Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service. |
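The variant styles in the table can be seen side by side. The following sketch is assembled from the samples elsewhere on this page, using FetchModels as the example; it assumes an already-created client named client:

```java
// Sketch: the three invocation styles for a paged method such as FetchModels.
FetchModelsRequest request = FetchModelsRequest.newBuilder().build();

// 1. Request-object variant: synchronous; paging is handled for you.
for (String model : client.fetchModels(request).iterateAll()) {
  // doThingsWith(model);
}

// 2. Paged callable variant: asynchronous; paging still handled for you.
var pagedFuture = client.fetchModelsPagedCallable().futureCall(request);
for (String model : pagedFuture.get().iterateAll()) {
  // doThingsWith(model);
}

// 3. Raw callable variant: one RPC per call; you manage page tokens yourself.
FetchModelsResponse page = client.fetchModelsCallable().call(request);
String nextPageToken = page.getNextPageToken();
```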
See the individual methods for example code.
Many parameters require resource names to be formatted in a particular way. To assist with these names, this class includes a format method for each type of name, and additionally a parse method to extract the individual identifiers contained within names that are returned.
This class can be customized by passing in a custom instance of GkeInferenceQuickstartSettings to create(). For example:
To customize credentials:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
GkeInferenceQuickstartSettings gkeInferenceQuickstartSettings =
GkeInferenceQuickstartSettings.newBuilder()
.setCredentialsProvider(FixedCredentialsProvider.create(myCredentials))
.build();
GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create(gkeInferenceQuickstartSettings);
To customize the endpoint:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
GkeInferenceQuickstartSettings gkeInferenceQuickstartSettings =
GkeInferenceQuickstartSettings.newBuilder().setEndpoint(myEndpoint).build();
GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create(gkeInferenceQuickstartSettings);
To use REST (HTTP1.1/JSON) transport (instead of gRPC) for sending and receiving requests over the wire:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
GkeInferenceQuickstartSettings gkeInferenceQuickstartSettings =
GkeInferenceQuickstartSettings.newHttpJsonBuilder().build();
GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create(gkeInferenceQuickstartSettings);
Please refer to the GitHub repository's samples for more quickstart code snippets.
Static Methods
create()
public static final GkeInferenceQuickstartClient create()

Constructs an instance of GkeInferenceQuickstartClient with default settings.
| Returns | |
|---|---|
| Type | Description |
| GkeInferenceQuickstartClient | |

| Exceptions | |
|---|---|
| Type | Description |
| IOException | |
create(GkeInferenceQuickstartSettings settings)
public static final GkeInferenceQuickstartClient create(GkeInferenceQuickstartSettings settings)

Constructs an instance of GkeInferenceQuickstartClient, using the given settings. The channels are created based on the settings passed in, or defaults for any settings that are not set.
| Parameter | |
|---|---|
| Name | Description |
| settings | GkeInferenceQuickstartSettings |

| Returns | |
|---|---|
| Type | Description |
| GkeInferenceQuickstartClient | |

| Exceptions | |
|---|---|
| Type | Description |
| IOException | |
create(GkeInferenceQuickstartStub stub)
public static final GkeInferenceQuickstartClient create(GkeInferenceQuickstartStub stub)

Constructs an instance of GkeInferenceQuickstartClient, using the given stub for making calls. This is for advanced usage; prefer using create(GkeInferenceQuickstartSettings).
| Parameter | |
|---|---|
| Name | Description |
| stub | GkeInferenceQuickstartStub |

| Returns | |
|---|---|
| Type | Description |
| GkeInferenceQuickstartClient | |
Constructors
GkeInferenceQuickstartClient(GkeInferenceQuickstartSettings settings)
protected GkeInferenceQuickstartClient(GkeInferenceQuickstartSettings settings)

Constructs an instance of GkeInferenceQuickstartClient, using the given settings. This is protected so that it is easy to make a subclass, but otherwise, the static factory methods should be preferred.
| Parameter | |
|---|---|
| Name | Description |
| settings | GkeInferenceQuickstartSettings |
GkeInferenceQuickstartClient(GkeInferenceQuickstartStub stub)
protected GkeInferenceQuickstartClient(GkeInferenceQuickstartStub stub)

| Parameter | |
|---|---|
| Name | Description |
| stub | GkeInferenceQuickstartStub |
Methods
awaitTermination(long duration, TimeUnit unit)
public boolean awaitTermination(long duration, TimeUnit unit)

| Parameters | |
|---|---|
| Name | Description |
| duration | long |
| unit | TimeUnit |

| Returns | |
|---|---|
| Type | Description |
| boolean | |

| Exceptions | |
|---|---|
| Type | Description |
| InterruptedException | |
close()
public final void close()

fetchBenchmarkingData(FetchBenchmarkingDataRequest request)

public final FetchBenchmarkingDataResponse fetchBenchmarkingData(FetchBenchmarkingDataRequest request)

Fetches all of the benchmarking data available for a profile. Benchmarking data returns all of the performance metrics available for a given model server setup on a given instance type.
Sample code:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
FetchBenchmarkingDataRequest request =
FetchBenchmarkingDataRequest.newBuilder()
.setModelServerInfo(ModelServerInfo.newBuilder().build())
.setInstanceType("instanceType-737655441")
.setPricingModel("pricingModel1050892035")
.build();
FetchBenchmarkingDataResponse response =
gkeInferenceQuickstartClient.fetchBenchmarkingData(request);
}
| Parameter | |
|---|---|
| Name | Description |
| request | FetchBenchmarkingDataRequest. The request object containing all of the parameters for the API call. |

| Returns | |
|---|---|
| Type | Description |
| FetchBenchmarkingDataResponse | |
fetchBenchmarkingDataCallable()
public final UnaryCallable<FetchBenchmarkingDataRequest,FetchBenchmarkingDataResponse> fetchBenchmarkingDataCallable()

Fetches all of the benchmarking data available for a profile. Benchmarking data returns all of the performance metrics available for a given model server setup on a given instance type.
Sample code:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
FetchBenchmarkingDataRequest request =
FetchBenchmarkingDataRequest.newBuilder()
.setModelServerInfo(ModelServerInfo.newBuilder().build())
.setInstanceType("instanceType-737655441")
.setPricingModel("pricingModel1050892035")
.build();
ApiFuture<FetchBenchmarkingDataResponse> future =
gkeInferenceQuickstartClient.fetchBenchmarkingDataCallable().futureCall(request);
// Do something.
FetchBenchmarkingDataResponse response = future.get();
}
| Returns | |
|---|---|
| Type | Description |
| UnaryCallable<FetchBenchmarkingDataRequest,FetchBenchmarkingDataResponse> | |
fetchModelServerVersions(FetchModelServerVersionsRequest request)
public final GkeInferenceQuickstartClient.FetchModelServerVersionsPagedResponse fetchModelServerVersions(FetchModelServerVersionsRequest request)

Fetches available model server versions. Open-source servers use their own versioning schemas
(e.g., vllm uses semver like v1.0.0).
Some model servers have different versioning schemas depending on the accelerator. For
example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. All available
versions will be returned when different schemas are present.
Sample code:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
FetchModelServerVersionsRequest request =
FetchModelServerVersionsRequest.newBuilder()
.setModel("model104069929")
.setModelServer("modelServer475157452")
.setPageSize(883849137)
.setPageToken("pageToken873572522")
.build();
for (String element :
gkeInferenceQuickstartClient.fetchModelServerVersions(request).iterateAll()) {
// doThingsWith(element);
}
}
| Parameter | |
|---|---|
| Name | Description |
| request | FetchModelServerVersionsRequest. The request object containing all of the parameters for the API call. |

| Returns | |
|---|---|
| Type | Description |
| GkeInferenceQuickstartClient.FetchModelServerVersionsPagedResponse | |
fetchModelServerVersionsCallable()
public final UnaryCallable<FetchModelServerVersionsRequest,FetchModelServerVersionsResponse> fetchModelServerVersionsCallable()

Fetches available model server versions. Open-source servers use their own versioning schemas
(e.g., vllm uses semver like v1.0.0).
Some model servers have different versioning schemas depending on the accelerator. For
example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. All available
versions will be returned when different schemas are present.
Sample code:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
FetchModelServerVersionsRequest request =
FetchModelServerVersionsRequest.newBuilder()
.setModel("model104069929")
.setModelServer("modelServer475157452")
.setPageSize(883849137)
.setPageToken("pageToken873572522")
.build();
while (true) {
FetchModelServerVersionsResponse response =
gkeInferenceQuickstartClient.fetchModelServerVersionsCallable().call(request);
for (String element : response.getModelServerVersionsList()) {
// doThingsWith(element);
}
String nextPageToken = response.getNextPageToken();
if (!Strings.isNullOrEmpty(nextPageToken)) {
request = request.toBuilder().setPageToken(nextPageToken).build();
} else {
break;
}
}
}
| Returns | |
|---|---|
| Type | Description |
| UnaryCallable<FetchModelServerVersionsRequest,FetchModelServerVersionsResponse> | |
fetchModelServerVersionsPagedCallable()
public final UnaryCallable<FetchModelServerVersionsRequest,GkeInferenceQuickstartClient.FetchModelServerVersionsPagedResponse> fetchModelServerVersionsPagedCallable()

Fetches available model server versions. Open-source servers use their own versioning schemas
(e.g., vllm uses semver like v1.0.0).
Some model servers have different versioning schemas depending on the accelerator. For
example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. All available
versions will be returned when different schemas are present.
Sample code:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
FetchModelServerVersionsRequest request =
FetchModelServerVersionsRequest.newBuilder()
.setModel("model104069929")
.setModelServer("modelServer475157452")
.setPageSize(883849137)
.setPageToken("pageToken873572522")
.build();
ApiFuture<String> future =
gkeInferenceQuickstartClient.fetchModelServerVersionsPagedCallable().futureCall(request);
// Do something.
for (String element : future.get().iterateAll()) {
// doThingsWith(element);
}
}
| Returns | |
|---|---|
| Type | Description |
| UnaryCallable<FetchModelServerVersionsRequest,FetchModelServerVersionsPagedResponse> | |
fetchModelServers(FetchModelServersRequest request)
public final GkeInferenceQuickstartClient.FetchModelServersPagedResponse fetchModelServers(FetchModelServersRequest request)

Fetches available model servers. Open-source model servers use simplified, lowercase names
(e.g., vllm).
Sample code:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
FetchModelServersRequest request =
FetchModelServersRequest.newBuilder()
.setModel("model104069929")
.setPageSize(883849137)
.setPageToken("pageToken873572522")
.build();
for (String element : gkeInferenceQuickstartClient.fetchModelServers(request).iterateAll()) {
// doThingsWith(element);
}
}
| Parameter | |
|---|---|
| Name | Description |
| request | FetchModelServersRequest. The request object containing all of the parameters for the API call. |

| Returns | |
|---|---|
| Type | Description |
| GkeInferenceQuickstartClient.FetchModelServersPagedResponse | |
fetchModelServersCallable()
public final UnaryCallable<FetchModelServersRequest,FetchModelServersResponse> fetchModelServersCallable()

Fetches available model servers. Open-source model servers use simplified, lowercase names
(e.g., vllm).
Sample code:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
FetchModelServersRequest request =
FetchModelServersRequest.newBuilder()
.setModel("model104069929")
.setPageSize(883849137)
.setPageToken("pageToken873572522")
.build();
while (true) {
FetchModelServersResponse response =
gkeInferenceQuickstartClient.fetchModelServersCallable().call(request);
for (String element : response.getModelServersList()) {
// doThingsWith(element);
}
String nextPageToken = response.getNextPageToken();
if (!Strings.isNullOrEmpty(nextPageToken)) {
request = request.toBuilder().setPageToken(nextPageToken).build();
} else {
break;
}
}
}
| Returns | |
|---|---|
| Type | Description |
| UnaryCallable<FetchModelServersRequest,FetchModelServersResponse> | |
fetchModelServersPagedCallable()
public final UnaryCallable<FetchModelServersRequest,GkeInferenceQuickstartClient.FetchModelServersPagedResponse> fetchModelServersPagedCallable()

Fetches available model servers. Open-source model servers use simplified, lowercase names
(e.g., vllm).
Sample code:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
FetchModelServersRequest request =
FetchModelServersRequest.newBuilder()
.setModel("model104069929")
.setPageSize(883849137)
.setPageToken("pageToken873572522")
.build();
ApiFuture<String> future =
gkeInferenceQuickstartClient.fetchModelServersPagedCallable().futureCall(request);
// Do something.
for (String element : future.get().iterateAll()) {
// doThingsWith(element);
}
}
| Returns | |
|---|---|
| Type | Description |
| UnaryCallable<FetchModelServersRequest,FetchModelServersPagedResponse> | |
fetchModels(FetchModelsRequest request)
public final GkeInferenceQuickstartClient.FetchModelsPagedResponse fetchModels(FetchModelsRequest request)

Fetches available models. Open-source models follow the Huggingface Hub owner/model_name
format.
Sample code:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
FetchModelsRequest request =
FetchModelsRequest.newBuilder()
.setPageSize(883849137)
.setPageToken("pageToken873572522")
.build();
for (String element : gkeInferenceQuickstartClient.fetchModels(request).iterateAll()) {
// doThingsWith(element);
}
}
| Parameter | |
|---|---|
| Name | Description |
| request | FetchModelsRequest. The request object containing all of the parameters for the API call. |

| Returns | |
|---|---|
| Type | Description |
| GkeInferenceQuickstartClient.FetchModelsPagedResponse | |
fetchModelsCallable()
public final UnaryCallable<FetchModelsRequest,FetchModelsResponse> fetchModelsCallable()

Fetches available models. Open-source models follow the Huggingface Hub owner/model_name
format.
Sample code:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
FetchModelsRequest request =
FetchModelsRequest.newBuilder()
.setPageSize(883849137)
.setPageToken("pageToken873572522")
.build();
while (true) {
FetchModelsResponse response =
gkeInferenceQuickstartClient.fetchModelsCallable().call(request);
for (String element : response.getModelsList()) {
// doThingsWith(element);
}
String nextPageToken = response.getNextPageToken();
if (!Strings.isNullOrEmpty(nextPageToken)) {
request = request.toBuilder().setPageToken(nextPageToken).build();
} else {
break;
}
}
}
| Returns | |
|---|---|
| Type | Description |
| UnaryCallable<FetchModelsRequest,FetchModelsResponse> | |
fetchModelsPagedCallable()
public final UnaryCallable<FetchModelsRequest,GkeInferenceQuickstartClient.FetchModelsPagedResponse> fetchModelsPagedCallable()

Fetches available models. Open-source models follow the Huggingface Hub owner/model_name
format.
Sample code:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
FetchModelsRequest request =
FetchModelsRequest.newBuilder()
.setPageSize(883849137)
.setPageToken("pageToken873572522")
.build();
ApiFuture<String> future =
gkeInferenceQuickstartClient.fetchModelsPagedCallable().futureCall(request);
// Do something.
for (String element : future.get().iterateAll()) {
// doThingsWith(element);
}
}
| Returns | |
|---|---|
| Type | Description |
| UnaryCallable<FetchModelsRequest,FetchModelsPagedResponse> | |
fetchProfiles(FetchProfilesRequest request)
public final GkeInferenceQuickstartClient.FetchProfilesPagedResponse fetchProfiles(FetchProfilesRequest request)

Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.
Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See Run best practice inference with GKE Inference Quickstart recipes for details.
Sample code:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
FetchProfilesRequest request =
FetchProfilesRequest.newBuilder()
.setModel("model104069929")
.setModelServer("modelServer475157452")
.setModelServerVersion("modelServerVersion77054828")
.setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
.setPageSize(883849137)
.setPageToken("pageToken873572522")
.build();
for (Profile element : gkeInferenceQuickstartClient.fetchProfiles(request).iterateAll()) {
// doThingsWith(element);
}
}
| Parameter | |
|---|---|
| Name | Description |
| request | FetchProfilesRequest. The request object containing all of the parameters for the API call. |

| Returns | |
|---|---|
| Type | Description |
| GkeInferenceQuickstartClient.FetchProfilesPagedResponse | |
fetchProfilesCallable()
public final UnaryCallable<FetchProfilesRequest,FetchProfilesResponse> fetchProfilesCallable()

Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.
Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See Run best practice inference with GKE Inference Quickstart recipes for details.
Sample code:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
FetchProfilesRequest request =
FetchProfilesRequest.newBuilder()
.setModel("model104069929")
.setModelServer("modelServer475157452")
.setModelServerVersion("modelServerVersion77054828")
.setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
.setPageSize(883849137)
.setPageToken("pageToken873572522")
.build();
while (true) {
FetchProfilesResponse response =
gkeInferenceQuickstartClient.fetchProfilesCallable().call(request);
for (Profile element : response.getProfileList()) {
// doThingsWith(element);
}
String nextPageToken = response.getNextPageToken();
if (!Strings.isNullOrEmpty(nextPageToken)) {
request = request.toBuilder().setPageToken(nextPageToken).build();
} else {
break;
}
}
}
| Returns | |
|---|---|
| Type | Description |
| UnaryCallable<FetchProfilesRequest,FetchProfilesResponse> | |
fetchProfilesPagedCallable()
public final UnaryCallable<FetchProfilesRequest,GkeInferenceQuickstartClient.FetchProfilesPagedResponse> fetchProfilesPagedCallable()

Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.
Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See Run best practice inference with GKE Inference Quickstart recipes for details.
Sample code:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
FetchProfilesRequest request =
FetchProfilesRequest.newBuilder()
.setModel("model104069929")
.setModelServer("modelServer475157452")
.setModelServerVersion("modelServerVersion77054828")
.setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
.setPageSize(883849137)
.setPageToken("pageToken873572522")
.build();
ApiFuture<Profile> future =
gkeInferenceQuickstartClient.fetchProfilesPagedCallable().futureCall(request);
// Do something.
for (Profile element : future.get().iterateAll()) {
// doThingsWith(element);
}
}
| Returns | |
|---|---|
| Type | Description |
| UnaryCallable<FetchProfilesRequest,FetchProfilesPagedResponse> | |
generateOptimizedManifest(GenerateOptimizedManifestRequest request)
public final GenerateOptimizedManifestResponse generateOptimizedManifest(GenerateOptimizedManifestRequest request)

Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See Run best practice inference with GKE Inference Quickstart recipes for deployment details.
Sample code:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
GenerateOptimizedManifestRequest request =
GenerateOptimizedManifestRequest.newBuilder()
.setModelServerInfo(ModelServerInfo.newBuilder().build())
.setAcceleratorType("acceleratorType-82462651")
.setKubernetesNamespace("kubernetesNamespace-1862862667")
.setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
.setStorageConfig(StorageConfig.newBuilder().build())
.build();
GenerateOptimizedManifestResponse response =
gkeInferenceQuickstartClient.generateOptimizedManifest(request);
}
| Parameter | |
|---|---|
| Name | Description |
| request | GenerateOptimizedManifestRequest. The request object containing all of the parameters for the API call. |

| Returns | |
|---|---|
| Type | Description |
| GenerateOptimizedManifestResponse | |
generateOptimizedManifestCallable()
public final UnaryCallable<GenerateOptimizedManifestRequest,GenerateOptimizedManifestResponse> generateOptimizedManifestCallable()

Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See Run best practice inference with GKE Inference Quickstart recipes for deployment details.
Sample code:
// This snippet has been automatically generated and should be regarded as a code template only.
// It will require modifications to work:
// - It may require correct/in-range values for request initialization.
// - It may require specifying regional endpoints when creating the service client as shown in
// https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
GkeInferenceQuickstartClient.create()) {
GenerateOptimizedManifestRequest request =
GenerateOptimizedManifestRequest.newBuilder()
.setModelServerInfo(ModelServerInfo.newBuilder().build())
.setAcceleratorType("acceleratorType-82462651")
.setKubernetesNamespace("kubernetesNamespace-1862862667")
.setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
.setStorageConfig(StorageConfig.newBuilder().build())
.build();
ApiFuture<GenerateOptimizedManifestResponse> future =
gkeInferenceQuickstartClient.generateOptimizedManifestCallable().futureCall(request);
// Do something.
GenerateOptimizedManifestResponse response = future.get();
}
| Returns | |
|---|---|
| Type | Description |
| UnaryCallable<GenerateOptimizedManifestRequest,GenerateOptimizedManifestResponse> | |
getSettings()
public final GkeInferenceQuickstartSettings getSettings()

| Returns | |
|---|---|
| Type | Description |
| GkeInferenceQuickstartSettings | |
getStub()
public GkeInferenceQuickstartStub getStub()

| Returns | |
|---|---|
| Type | Description |
| GkeInferenceQuickstartStub | |
isShutdown()
public boolean isShutdown()

| Returns | |
|---|---|
| Type | Description |
| boolean | |
isTerminated()
public boolean isTerminated()

| Returns | |
|---|---|
| Type | Description |
| boolean | |
shutdown()
public void shutdown()

shutdownNow()
public void shutdownNow()