GKE Recommender v1 API - Class GkeInferenceQuickstartClientImpl (1.0.0-beta01)

public sealed class GkeInferenceQuickstartClientImpl : GkeInferenceQuickstartClient

Reference documentation and code samples for the GKE Recommender v1 API class GkeInferenceQuickstartClientImpl.

GkeInferenceQuickstart client wrapper implementation, for convenient use.

Inheritance

object > GkeInferenceQuickstartClient > GkeInferenceQuickstartClientImpl

Inherited Members

GkeInferenceQuickstartClient.DefaultEndpoint

GkeInferenceQuickstartClient.DefaultScopes

GkeInferenceQuickstartClient.ServiceMetadata

GkeInferenceQuickstartClient.CreateAsync(CancellationToken)

GkeInferenceQuickstartClient.Create()

GkeInferenceQuickstartClient.ShutdownDefaultChannelsAsync()

GkeInferenceQuickstartClient.GenerateOptimizedManifestAsync(GenerateOptimizedManifestRequest, CancellationToken)

GkeInferenceQuickstartClient.FetchBenchmarkingDataAsync(FetchBenchmarkingDataRequest, CancellationToken)

object.GetHashCode()

object.GetType()

object.ToString()

Namespace

Google.Cloud.GkeRecommender.V1

Assembly

Google.Cloud.GkeRecommender.V1.dll

Remarks

The GKE Inference Quickstart (GIQ) service provides profiles with performance metrics for popular models and model servers across multiple accelerators. These profiles help generate optimized, best-practice configurations for running inference on GKE.

Constructors

GkeInferenceQuickstartClientImpl(GkeInferenceQuickstartClient, GkeInferenceQuickstartSettings, ILogger)

public GkeInferenceQuickstartClientImpl(GkeInferenceQuickstart.GkeInferenceQuickstartClient grpcClient, GkeInferenceQuickstartSettings settings, ILogger logger)

Constructs a client wrapper for the GkeInferenceQuickstart service, with the specified gRPC client and settings.

Parameters
Name	Description
`grpcClient`	`GkeInferenceQuickstart.GkeInferenceQuickstartClient` The underlying gRPC client.
`settings`	`GkeInferenceQuickstartSettings` The base GkeInferenceQuickstartSettings used within this client.
`logger`	`ILogger` Optional ILogger to use within this client.
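
Application code usually does not need to call this constructor directly, because the inherited `Create()` and `CreateAsync(CancellationToken)` factory methods listed above return a ready-to-use client. The following is a minimal sketch of that path; it assumes Application Default Credentials are configured in the environment, which this page does not state.

```csharp
using System;
using Google.Cloud.GkeRecommender.V1;

public static class CreateClientExample
{
    public static void Main()
    {
        // Create() builds a client using the default endpoint and scopes.
        // Assumption: Application Default Credentials are available.
        GkeInferenceQuickstartClient client = GkeInferenceQuickstartClient.Create();

        // The variable is typed as the abstract base class; print the
        // concrete wrapper type and the default endpoint for reference.
        Console.WriteLine(client.GetType().FullName);
        Console.WriteLine(GkeInferenceQuickstartClient.DefaultEndpoint);
    }
}
```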

Properties

GrpcClient

public override GkeInferenceQuickstart.GkeInferenceQuickstartClient GrpcClient { get; }

The underlying gRPC GkeInferenceQuickstart client.

Property Value
Type	Description
`GkeInferenceQuickstart.GkeInferenceQuickstartClient`

Overrides

GkeInferenceQuickstartClient.GrpcClient

Methods

FetchBenchmarkingData(FetchBenchmarkingDataRequest, CallSettings)

public override FetchBenchmarkingDataResponse FetchBenchmarkingData(FetchBenchmarkingDataRequest request, CallSettings callSettings = null)

Fetches all of the benchmarking data available for a profile. The benchmarking data includes all of the performance metrics available for a given model server setup on a given instance type.

Parameters
Name	Description
`request`	`FetchBenchmarkingDataRequest` The request object containing all of the parameters for the API call.
`callSettings`	`CallSettings` If not null, applies overrides to this RPC call.

Returns
Type	Description
`FetchBenchmarkingDataResponse`	The RPC response.

Overrides

GkeInferenceQuickstartClient.FetchBenchmarkingData(FetchBenchmarkingDataRequest, CallSettings)
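
The `callSettings` parameter applies per-call overrides. As a hedged sketch, the example below sets a deadline for a single call using `CallSettings.FromExpiration` from Google.Api.Gax.Grpc. The 30-second timeout is an arbitrary choice, and the request is left unpopulated because this page does not list the fields of `FetchBenchmarkingDataRequest`; the service may reject an empty request.

```csharp
using System;
using Google.Api.Gax;
using Google.Api.Gax.Grpc;
using Google.Cloud.GkeRecommender.V1;

public static class FetchBenchmarkingDataExample
{
    public static void Main()
    {
        GkeInferenceQuickstartClient client = GkeInferenceQuickstartClient.Create();

        // Populate the request as described in the FetchBenchmarkingDataRequest
        // reference. No fields are set here because this page does not list them.
        var request = new FetchBenchmarkingDataRequest();

        // Override the deadline for this call only (30 seconds is arbitrary).
        CallSettings callSettings = CallSettings.FromExpiration(
            Expiration.FromTimeout(TimeSpan.FromSeconds(30)));

        FetchBenchmarkingDataResponse response =
            client.FetchBenchmarkingData(request, callSettings);

        // Protobuf messages render as JSON-like text via ToString().
        Console.WriteLine(response);
    }
}
```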

FetchBenchmarkingDataAsync(FetchBenchmarkingDataRequest, CallSettings)

public override Task<FetchBenchmarkingDataResponse> FetchBenchmarkingDataAsync(FetchBenchmarkingDataRequest request, CallSettings callSettings = null)

Fetches all of the benchmarking data available for a profile. The benchmarking data includes all of the performance metrics available for a given model server setup on a given instance type.

Parameters
Name	Description
`request`	`FetchBenchmarkingDataRequest` The request object containing all of the parameters for the API call.
`callSettings`	`CallSettings` If not null, applies overrides to this RPC call.

Returns
Type	Description
`Task<FetchBenchmarkingDataResponse>`	A Task containing the RPC response.

Overrides

GkeInferenceQuickstartClient.FetchBenchmarkingDataAsync(FetchBenchmarkingDataRequest, CallSettings)

FetchModelServerVersions(FetchModelServerVersionsRequest, CallSettings)

public override PagedEnumerable<FetchModelServerVersionsResponse, string> FetchModelServerVersions(FetchModelServerVersionsRequest request, CallSettings callSettings = null)

Fetches available model server versions. Open-source servers use their own versioning schemas (e.g., vllm uses semver like v1.0.0).

Some model servers have different versioning schemas depending on the accelerator. For example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. All available versions will be returned when different schemas are present.

Parameters
Name	Description
`request`	`FetchModelServerVersionsRequest` The request object containing all of the parameters for the API call.
`callSettings`	`CallSettings` If not null, applies overrides to this RPC call.

Returns
Type	Description
`PagedEnumerable<FetchModelServerVersionsResponse, string>`	A pageable sequence of string resources.

Overrides

GkeInferenceQuickstartClient.FetchModelServerVersions(FetchModelServerVersionsRequest, CallSettings)
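
The returned `PagedEnumerable` can be iterated directly, in which case further pages are requested lazily as the loop advances. The sketch below shows that pattern; the request is left unpopulated because its fields are not listed on this page, so fill it in per the `FetchModelServerVersionsRequest` reference before running.

```csharp
using System;
using Google.Api.Gax;
using Google.Cloud.GkeRecommender.V1;

public static class FetchModelServerVersionsExample
{
    public static void Main()
    {
        GkeInferenceQuickstartClient client = GkeInferenceQuickstartClient.Create();

        // Assumption: the request needs fields identifying the model server;
        // they are not shown on this page, so none are set here.
        var request = new FetchModelServerVersionsRequest();

        PagedEnumerable<FetchModelServerVersionsResponse, string> versions =
            client.FetchModelServerVersions(request);

        // Each item is one version string; page boundaries are handled
        // by the enumerable, not by the caller.
        foreach (string version in versions)
        {
            Console.WriteLine(version);
        }
    }
}
```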

FetchModelServerVersionsAsync(FetchModelServerVersionsRequest, CallSettings)

public override PagedAsyncEnumerable<FetchModelServerVersionsResponse, string> FetchModelServerVersionsAsync(FetchModelServerVersionsRequest request, CallSettings callSettings = null)

Fetches available model server versions. Open-source servers use their own versioning schemas (e.g., vllm uses semver like v1.0.0).

Parameters
Name	Description
`request`	`FetchModelServerVersionsRequest` The request object containing all of the parameters for the API call.
`callSettings`	`CallSettings` If not null, applies overrides to this RPC call.

Returns
Type	Description
`PagedAsyncEnumerable<FetchModelServerVersionsResponse, string>`	A pageable asynchronous sequence of string resources.

Overrides

GkeInferenceQuickstartClient.FetchModelServerVersionsAsync(FetchModelServerVersionsRequest, CallSettings)

FetchModelServers(FetchModelServersRequest, CallSettings)

public override PagedEnumerable<FetchModelServersResponse, string> FetchModelServers(FetchModelServersRequest request, CallSettings callSettings = null)

Fetches available model servers. Open-source model servers use simplified, lowercase names (e.g., vllm).

Parameters
Name	Description
`request`	`FetchModelServersRequest` The request object containing all of the parameters for the API call.
`callSettings`	`CallSettings` If not null, applies overrides to this RPC call.

Returns
Type	Description
`PagedEnumerable<FetchModelServersResponse, string>`	A pageable sequence of string resources.

Overrides

GkeInferenceQuickstartClient.FetchModelServers(FetchModelServersRequest, CallSettings)

FetchModelServersAsync(FetchModelServersRequest, CallSettings)

public override PagedAsyncEnumerable<FetchModelServersResponse, string> FetchModelServersAsync(FetchModelServersRequest request, CallSettings callSettings = null)

Fetches available model servers. Open-source model servers use simplified, lowercase names (e.g., vllm).

Parameters
Name	Description
`request`	`FetchModelServersRequest` The request object containing all of the parameters for the API call.
`callSettings`	`CallSettings` If not null, applies overrides to this RPC call.

Returns
Type	Description
`PagedAsyncEnumerable<FetchModelServersResponse, string>`	A pageable asynchronous sequence of string resources.

Overrides

GkeInferenceQuickstartClient.FetchModelServersAsync(FetchModelServersRequest, CallSettings)
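
The asynchronous variants return a `PagedAsyncEnumerable`, which can be consumed with `await foreach` (C# 8 or later). This is a sketch rather than a definitive usage; the request is left unpopulated because this page does not list the fields of `FetchModelServersRequest`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Google.Api.Gax;
using Google.Cloud.GkeRecommender.V1;

public static class FetchModelServersAsyncExample
{
    public static async Task Main()
    {
        GkeInferenceQuickstartClient client =
            await GkeInferenceQuickstartClient.CreateAsync(CancellationToken.None);

        // Assumption: the request needs a field naming the model;
        // it is not shown on this page, so nothing is set here.
        var request = new FetchModelServersRequest();

        PagedAsyncEnumerable<FetchModelServersResponse, string> servers =
            client.FetchModelServersAsync(request);

        // Items arrive one at a time; pages are fetched as needed.
        await foreach (string server in servers)
        {
            Console.WriteLine(server);
        }
    }
}
```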

FetchModels(FetchModelsRequest, CallSettings)

public override PagedEnumerable<FetchModelsResponse, string> FetchModels(FetchModelsRequest request, CallSettings callSettings = null)

Fetches available models. Open-source models follow the Hugging Face Hub owner/model_name format.

Parameters
Name	Description
`request`	`FetchModelsRequest` The request object containing all of the parameters for the API call.
`callSettings`	`CallSettings` If not null, applies overrides to this RPC call.

Returns
Type	Description
`PagedEnumerable<FetchModelsResponse, string>`	A pageable sequence of string resources.

Overrides

GkeInferenceQuickstartClient.FetchModels(FetchModelsRequest, CallSettings)
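
When only a fixed number of results is needed, for example to fill one screen of a UI, `ReadPage` on the returned `PagedEnumerable` gathers a single page of a chosen size. The sketch below assumes an empty `FetchModelsRequest` is acceptable to the service; the page size of 10 is arbitrary.

```csharp
using System;
using Google.Api.Gax;
using Google.Cloud.GkeRecommender.V1;

public static class FetchModelsSinglePageExample
{
    public static void Main()
    {
        GkeInferenceQuickstartClient client = GkeInferenceQuickstartClient.Create();

        // Assumption: no request fields are required to list models.
        var request = new FetchModelsRequest();

        // ReadPage returns up to 10 items, fewer only if the results run out.
        Page<string> page = client.FetchModels(request).ReadPage(10);

        foreach (string model in page)
        {
            Console.WriteLine(model);
        }

        // A non-empty token indicates that more results are available.
        if (!string.IsNullOrEmpty(page.NextPageToken))
        {
            Console.WriteLine("More models are available.");
        }
    }
}
```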

FetchModelsAsync(FetchModelsRequest, CallSettings)

public override PagedAsyncEnumerable<FetchModelsResponse, string> FetchModelsAsync(FetchModelsRequest request, CallSettings callSettings = null)

Fetches available models. Open-source models follow the Hugging Face Hub owner/model_name format.

Parameters
Name	Description
`request`	`FetchModelsRequest` The request object containing all of the parameters for the API call.
`callSettings`	`CallSettings` If not null, applies overrides to this RPC call.

Returns
Type	Description
`PagedAsyncEnumerable<FetchModelsResponse, string>`	A pageable asynchronous sequence of string resources.

Overrides

GkeInferenceQuickstartClient.FetchModelsAsync(FetchModelsRequest, CallSettings)

FetchProfiles(FetchProfilesRequest, CallSettings)

public override PagedEnumerable<FetchProfilesResponse, Profile> FetchProfiles(FetchProfilesRequest request, CallSettings callSettings = null)

Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.

Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See "Run best practice inference with GKE Inference Quickstart recipes" for details.

Parameters
Name	Description
`request`	`FetchProfilesRequest` The request object containing all of the parameters for the API call.
`callSettings`	`CallSettings` If not null, applies overrides to this RPC call.

Returns
Type	Description
`PagedEnumerable<FetchProfilesResponse, Profile>`	A pageable sequence of Profile resources.

Overrides

GkeInferenceQuickstartClient.FetchProfiles(FetchProfilesRequest, CallSettings)
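
Because all profiles are returned when no filters are provided, an unfiltered call can span several server responses. As a hedged sketch, `AsRawResponses()` exposes those responses one at a time, which is useful when the page boundaries matter. The example relies only on the response being enumerable as `Profile` items, which the `PagedEnumerable<FetchProfilesResponse, Profile>` return type implies.

```csharp
using System;
using System.Linq;
using Google.Cloud.GkeRecommender.V1;

public static class FetchProfilesByPageExample
{
    public static void Main()
    {
        GkeInferenceQuickstartClient client = GkeInferenceQuickstartClient.Create();

        // No filters set: per the description above, all profiles are returned.
        var request = new FetchProfilesRequest();

        // Each raw response corresponds to one page returned by the service.
        foreach (FetchProfilesResponse response in
                 client.FetchProfiles(request).AsRawResponses())
        {
            Console.WriteLine($"Page with {response.Count()} profile(s)");

            foreach (Profile profile in response)
            {
                // Protobuf messages render as JSON-like text via ToString().
                Console.WriteLine(profile);
            }
        }
    }
}
```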

FetchProfilesAsync(FetchProfilesRequest, CallSettings)

public override PagedAsyncEnumerable<FetchProfilesResponse, Profile> FetchProfilesAsync(FetchProfilesRequest request, CallSettings callSettings = null)

Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.

Parameters
Name	Description
`request`	`FetchProfilesRequest` The request object containing all of the parameters for the API call.
`callSettings`	`CallSettings` If not null, applies overrides to this RPC call.

Returns
Type	Description
`PagedAsyncEnumerable<FetchProfilesResponse, Profile>`	A pageable asynchronous sequence of Profile resources.

Overrides

GkeInferenceQuickstartClient.FetchProfilesAsync(FetchProfilesRequest, CallSettings)

GenerateOptimizedManifest(GenerateOptimizedManifestRequest, CallSettings)

public override GenerateOptimizedManifestResponse GenerateOptimizedManifest(GenerateOptimizedManifestRequest request, CallSettings callSettings = null)

Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See "Run best practice inference with GKE Inference Quickstart recipes" for deployment details.

Parameters
Name	Description
`request`	`GenerateOptimizedManifestRequest` The request object containing all of the parameters for the API call.
`callSettings`	`CallSettings` If not null, applies overrides to this RPC call.

Returns
Type	Description
`GenerateOptimizedManifestResponse`	The RPC response.

Overrides

GkeInferenceQuickstartClient.GenerateOptimizedManifest(GenerateOptimizedManifestRequest, CallSettings)

GenerateOptimizedManifestAsync(GenerateOptimizedManifestRequest, CallSettings)

public override Task<GenerateOptimizedManifestResponse> GenerateOptimizedManifestAsync(GenerateOptimizedManifestRequest request, CallSettings callSettings = null)

Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations.

Parameters
Name	Description
`request`	`GenerateOptimizedManifestRequest` The request object containing all of the parameters for the API call.
`callSettings`	`CallSettings` If not null, applies overrides to this RPC call.

Returns
Type	Description
`Task<GenerateOptimizedManifestResponse>`	A Task containing the RPC response.

Overrides

GkeInferenceQuickstartClient.GenerateOptimizedManifestAsync(GenerateOptimizedManifestRequest, CallSettings)
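
For the asynchronous call, cancellation can be passed through `callSettings` using `CallSettings.FromCancellationToken` from Google.Api.Gax.Grpc. The sketch below cancels the call if it runs longer than 60 seconds, an arbitrary limit. The request is left unpopulated because this page does not list the fields of `GenerateOptimizedManifestRequest`, so the service may reject it as written.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Google.Api.Gax.Grpc;
using Google.Cloud.GkeRecommender.V1;

public static class GenerateOptimizedManifestAsyncExample
{
    public static async Task Main()
    {
        GkeInferenceQuickstartClient client =
            await GkeInferenceQuickstartClient.CreateAsync(CancellationToken.None);

        // Populate the model, model server, accelerator, and performance
        // targets per the GenerateOptimizedManifestRequest reference.
        // Field names are not shown on this page, so none are set here.
        var request = new GenerateOptimizedManifestRequest();

        // Cancel automatically after 60 seconds (arbitrary choice).
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(60));
        CallSettings callSettings = CallSettings.FromCancellationToken(cts.Token);

        GenerateOptimizedManifestResponse response =
            await client.GenerateOptimizedManifestAsync(request, callSettings);

        // Protobuf messages render as JSON-like text via ToString().
        Console.WriteLine(response);
    }
}
```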
