GKE Recommender v1 API - Class GkeInferenceQuickstartClient (1.0.0-beta01)

public abstract class GkeInferenceQuickstartClient

Reference documentation and code samples for the GKE Recommender v1 API class GkeInferenceQuickstartClient.

GkeInferenceQuickstart client wrapper, for convenient use.

Inheritance

object > GkeInferenceQuickstartClient

Namespace

Google.Cloud.GkeRecommender.V1

Assembly

Google.Cloud.GkeRecommender.V1.dll

Remarks

GKE Inference Quickstart (GIQ) service provides profiles with performance metrics for popular models and model servers across multiple accelerators. These profiles help generate optimized best practices for running inference on GKE.
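
The per-method examples later on this page show each RPC in isolation. As a rough, hedged sketch (not generated documentation) of how those calls can be composed, the discovery methods can feed the profile and manifest methods. The owner/model_name and vllm strings below are placeholders, and ModelServerInfo is left empty because its fields are documented on its own reference page.

// Create client
GkeInferenceQuickstartClient client = GkeInferenceQuickstartClient.Create();

// 1. Discover models (Huggingface Hub owner/model_name identifiers).
foreach (string model in client.FetchModels(new FetchModelsRequest()))
{
    Console.WriteLine(model);
}

// 2. Discover model servers and versions for a chosen model (placeholder names).
foreach (string server in client.FetchModelServers(new FetchModelServersRequest { Model = "owner/model_name" }))
{
    Console.WriteLine(server);
}
foreach (string version in client.FetchModelServerVersions(new FetchModelServerVersionsRequest
{
    Model = "owner/model_name",
    ModelServer = "vllm",
}))
{
    Console.WriteLine(version);
}

// 3. Fetch profiles (performance metrics and cost) for that setup.
foreach (Profile profile in client.FetchProfiles(new FetchProfilesRequest
{
    Model = "owner/model_name",
    ModelServer = "vllm",
}))
{
    Console.WriteLine(profile);
}

// 4. Generate an optimized deployment manifest for a selected setup.
GenerateOptimizedManifestResponse manifest = client.GenerateOptimizedManifest(new GenerateOptimizedManifestRequest
{
    // Populate ModelServerInfo and AcceleratorType from the profile you selected;
    // their fields are described on their own reference pages.
    ModelServerInfo = new ModelServerInfo(),
    AcceleratorType = "",
});
Console.WriteLine(manifest);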

Properties

DefaultEndpoint

public static string DefaultEndpoint { get; }

The default endpoint for the GkeInferenceQuickstart service, which is a host of "gkerecommender.googleapis.com" and a port of 443.

Property Value
Type Description
string
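
If the service needs to be reached at a non-default endpoint, the GkeInferenceQuickstartClientBuilder referenced below exposes an Endpoint property inherited from the common client builder base. A minimal sketch, where the value shown is simply the documented default host and port:

// Sketch: overriding the endpoint via the client builder.
GkeInferenceQuickstartClient client = new GkeInferenceQuickstartClientBuilder
{
    Endpoint = "gkerecommender.googleapis.com:443",
}.Build();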

DefaultScopes

public static IReadOnlyList<string> DefaultScopes { get; }

The default GkeInferenceQuickstart scopes.

Property Value
Type Description
IReadOnlyList<string>

Remarks

The default GkeInferenceQuickstart scopes are:

https://www.googleapis.com/auth/cloud-platform

GrpcClient

public virtual GkeInferenceQuickstart.GkeInferenceQuickstartClient GrpcClient { get; }

The underlying gRPC GkeInferenceQuickstart client.

Property Value
Type Description
GkeInferenceQuickstart.GkeInferenceQuickstartClient

ServiceMetadata

public static ServiceMetadata ServiceMetadata { get; }

The service metadata associated with this client type.

Property Value
Type Description
ServiceMetadata

Methods

Create()

public static GkeInferenceQuickstartClient Create()

Synchronously creates a GkeInferenceQuickstartClient using the default credentials, endpoint and settings. To specify custom credentials or other settings, use GkeInferenceQuickstartClientBuilder.

Returns
Type Description
GkeInferenceQuickstartClient

The created GkeInferenceQuickstartClient.
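
As a hedged illustration of the builder route mentioned above: the CredentialsPath property comes from the common client builder base rather than this page, and the path shown is a placeholder.

// Sketch: creating a client with explicit service-account credentials
// instead of the default credentials used by Create().
GkeInferenceQuickstartClient client = new GkeInferenceQuickstartClientBuilder
{
    CredentialsPath = "/path/to/service-account-key.json",
}.Build();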

CreateAsync(CancellationToken)

public static Task<GkeInferenceQuickstartClient> CreateAsync(CancellationToken cancellationToken = default)

Asynchronously creates a GkeInferenceQuickstartClient using the default credentials, endpoint and settings. To specify custom credentials or other settings, use GkeInferenceQuickstartClientBuilder.

Parameter
Name Description
cancellationToken CancellationToken

The CancellationToken to use while creating the client.

Returns
Type Description
Task<GkeInferenceQuickstartClient>

The task representing the created GkeInferenceQuickstartClient.

FetchBenchmarkingData(FetchBenchmarkingDataRequest, CallSettings)

public virtual FetchBenchmarkingDataResponse FetchBenchmarkingData(FetchBenchmarkingDataRequest request, CallSettings callSettings = null)

Fetches all of the benchmarking data available for a profile. Benchmarking data includes all of the performance metrics available for a given model server setup on a given instance type.

Parameters
Name Description
request FetchBenchmarkingDataRequest

The request object containing all of the parameters for the API call.

callSettings CallSettings

If not null, applies overrides to this RPC call.

Returns
Type Description
FetchBenchmarkingDataResponse

The RPC response.

Example
// Create client
GkeInferenceQuickstartClient gkeInferenceQuickstartClient = GkeInferenceQuickstartClient.Create();
// Initialize request argument(s)
FetchBenchmarkingDataRequest request = new FetchBenchmarkingDataRequest
{
    ModelServerInfo = new ModelServerInfo(),
    InstanceType = "",
    PricingModel = "",
};
// Make the request
FetchBenchmarkingDataResponse response = gkeInferenceQuickstartClient.FetchBenchmarkingData(request);

FetchBenchmarkingDataAsync(FetchBenchmarkingDataRequest, CallSettings)

public virtual Task<FetchBenchmarkingDataResponse> FetchBenchmarkingDataAsync(FetchBenchmarkingDataRequest request, CallSettings callSettings = null)

Fetches all of the benchmarking data available for a profile. Benchmarking data includes all of the performance metrics available for a given model server setup on a given instance type.

Parameters
Name Description
request FetchBenchmarkingDataRequest

The request object containing all of the parameters for the API call.

callSettings CallSettings

If not null, applies overrides to this RPC call.

Returns
Type Description
Task<FetchBenchmarkingDataResponse>

A Task containing the RPC response.

Example
// Create client
GkeInferenceQuickstartClient gkeInferenceQuickstartClient = await GkeInferenceQuickstartClient.CreateAsync();
// Initialize request argument(s)
FetchBenchmarkingDataRequest request = new FetchBenchmarkingDataRequest
{
    ModelServerInfo = new ModelServerInfo(),
    InstanceType = "",
    PricingModel = "",
};
// Make the request
FetchBenchmarkingDataResponse response = await gkeInferenceQuickstartClient.FetchBenchmarkingDataAsync(request);

FetchBenchmarkingDataAsync(FetchBenchmarkingDataRequest, CancellationToken)

public virtual Task<FetchBenchmarkingDataResponse> FetchBenchmarkingDataAsync(FetchBenchmarkingDataRequest request, CancellationToken cancellationToken)

Fetches all of the benchmarking data available for a profile. Benchmarking data includes all of the performance metrics available for a given model server setup on a given instance type.

Parameters
Name Description
request FetchBenchmarkingDataRequest

The request object containing all of the parameters for the API call.

cancellationToken CancellationToken

A CancellationToken to use for this RPC.

Returns
Type Description
Task<FetchBenchmarkingDataResponse>

A Task containing the RPC response.

Example
// Create client
GkeInferenceQuickstartClient gkeInferenceQuickstartClient = await GkeInferenceQuickstartClient.CreateAsync();
// Initialize request argument(s)
FetchBenchmarkingDataRequest request = new FetchBenchmarkingDataRequest
{
    ModelServerInfo = new ModelServerInfo(),
    InstanceType = "",
    PricingModel = "",
};
// Make the request
FetchBenchmarkingDataResponse response = await gkeInferenceQuickstartClient.FetchBenchmarkingDataAsync(request);

FetchModelServerVersions(FetchModelServerVersionsRequest, CallSettings)

public virtual PagedEnumerable<FetchModelServerVersionsResponse, string> FetchModelServerVersions(FetchModelServerVersionsRequest request, CallSettings callSettings = null)

Fetches available model server versions. Open-source servers use their own versioning schemas (e.g., vllm uses semver like v1.0.0).

Some model servers have different versioning schemas depending on the accelerator. For example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. All available versions will be returned when different schemas are present.

Parameters
Name Description
request FetchModelServerVersionsRequest

The request object containing all of the parameters for the API call.

callSettings CallSettings

If not null, applies overrides to this RPC call.

Returns
Type Description
PagedEnumerable<FetchModelServerVersionsResponse, string>

A pageable sequence of string resources.

Example
// Create client
GkeInferenceQuickstartClient gkeInferenceQuickstartClient = GkeInferenceQuickstartClient.Create();
// Initialize request argument(s)
FetchModelServerVersionsRequest request = new FetchModelServerVersionsRequest
{
    Model = "",
    ModelServer = "",
};
// Make the request
PagedEnumerable<FetchModelServerVersionsResponse, string> response = gkeInferenceQuickstartClient.FetchModelServerVersions(request);

// Iterate over all response items, lazily performing RPCs as required
foreach (string item in response)
{
    // Do something with each item
    Console.WriteLine(item);
}

// Or iterate over pages (of server-defined size), performing one RPC per page
foreach (FetchModelServerVersionsResponse page in response.AsRawResponses())
{
    // Do something with each page of items
    Console.WriteLine("A page of results:");
    foreach (string item in page)
    {
        // Do something with each item
        Console.WriteLine(item);
    }
}

// Or retrieve a single page of known size (unless it's the final page), performing as many RPCs as required
int pageSize = 10;
Page<string> singlePage = response.ReadPage(pageSize);
// Do something with the page of items
Console.WriteLine($"A page of {pageSize} results (unless it's the final page):");
foreach (string item in singlePage)
{
    // Do something with each item
    Console.WriteLine(item);
}
// Store the pageToken, for when the next page is required.
string nextPageToken = singlePage.NextPageToken;

FetchModelServerVersionsAsync(FetchModelServerVersionsRequest, CallSettings)

public virtual PagedAsyncEnumerable<FetchModelServerVersionsResponse, string> FetchModelServerVersionsAsync(FetchModelServerVersionsRequest request, CallSettings callSettings = null)

Fetches available model server versions. Open-source servers use their own versioning schemas (e.g., vllm uses semver like v1.0.0).

Some model servers have different versioning schemas depending on the accelerator. For example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. All available versions will be returned when different schemas are present.

Parameters
Name Description
request FetchModelServerVersionsRequest

The request object containing all of the parameters for the API call.

callSettings CallSettings

If not null, applies overrides to this RPC call.

Returns
Type Description
PagedAsyncEnumerable<FetchModelServerVersionsResponse, string>

A pageable asynchronous sequence of string resources.

Example
// Create client
GkeInferenceQuickstartClient gkeInferenceQuickstartClient = await GkeInferenceQuickstartClient.CreateAsync();
// Initialize request argument(s)
FetchModelServerVersionsRequest request = new FetchModelServerVersionsRequest
{
    Model = "",
    ModelServer = "",
};
// Make the request
PagedAsyncEnumerable<FetchModelServerVersionsResponse, string> response = gkeInferenceQuickstartClient.FetchModelServerVersionsAsync(request);

// Iterate over all response items, lazily performing RPCs as required
await response.ForEachAsync((string item) =>
{
    // Do something with each item
    Console.WriteLine(item);
});

// Or iterate over pages (of server-defined size), performing one RPC per page
await response.AsRawResponses().ForEachAsync((FetchModelServerVersionsResponse page) =>
{
    // Do something with each page of items
    Console.WriteLine("A page of results:");
    foreach (string item in page)
    {
        // Do something with each item
        Console.WriteLine(item);
    }
});

// Or retrieve a single page of known size (unless it's the final page), performing as many RPCs as required
int pageSize = 10;
Page<string> singlePage = await response.ReadPageAsync(pageSize);
// Do something with the page of items
Console.WriteLine($"A page of {pageSize} results (unless it's the final page):");
foreach (string item in singlePage)
{
    // Do something with each item
    Console.WriteLine(item);
}
// Store the pageToken, for when the next page is required.
string nextPageToken = singlePage.NextPageToken;

FetchModelServers(FetchModelServersRequest, CallSettings)

public virtual PagedEnumerable<FetchModelServersResponse, string> FetchModelServers(FetchModelServersRequest request, CallSettings callSettings = null)

Fetches available model servers. Open-source model servers use simplified, lowercase names (e.g., vllm).

Parameters
Name Description
request FetchModelServersRequest

The request object containing all of the parameters for the API call.

callSettings CallSettings

If not null, applies overrides to this RPC call.

Returns
Type Description
PagedEnumerable<FetchModelServersResponse, string>

A pageable sequence of string resources.

Example
// Create client
GkeInferenceQuickstartClient gkeInferenceQuickstartClient = GkeInferenceQuickstartClient.Create();
// Initialize request argument(s)
FetchModelServersRequest request = new FetchModelServersRequest { Model = "", };
// Make the request
PagedEnumerable<FetchModelServersResponse, string> response = gkeInferenceQuickstartClient.FetchModelServers(request);

// Iterate over all response items, lazily performing RPCs as required
foreach (string item in response)
{
    // Do something with each item
    Console.WriteLine(item);
}

// Or iterate over pages (of server-defined size), performing one RPC per page
foreach (FetchModelServersResponse page in response.AsRawResponses())
{
    // Do something with each page of items
    Console.WriteLine("A page of results:");
    foreach (string item in page)
    {
        // Do something with each item
        Console.WriteLine(item);
    }
}

// Or retrieve a single page of known size (unless it's the final page), performing as many RPCs as required
int pageSize = 10;
Page<string> singlePage = response.ReadPage(pageSize);
// Do something with the page of items
Console.WriteLine($"A page of {pageSize} results (unless it's the final page):");
foreach (string item in singlePage)
{
    // Do something with each item
    Console.WriteLine(item);
}
// Store the pageToken, for when the next page is required.
string nextPageToken = singlePage.NextPageToken;

FetchModelServersAsync(FetchModelServersRequest, CallSettings)

public virtual PagedAsyncEnumerable<FetchModelServersResponse, string> FetchModelServersAsync(FetchModelServersRequest request, CallSettings callSettings = null)

Fetches available model servers. Open-source model servers use simplified, lowercase names (e.g., vllm).

Parameters
Name Description
request FetchModelServersRequest

The request object containing all of the parameters for the API call.

callSettings CallSettings

If not null, applies overrides to this RPC call.

Returns
Type Description
PagedAsyncEnumerable<FetchModelServersResponse, string>

A pageable asynchronous sequence of string resources.

Example
// Create client
GkeInferenceQuickstartClient gkeInferenceQuickstartClient = await GkeInferenceQuickstartClient.CreateAsync();
// Initialize request argument(s)
FetchModelServersRequest request = new FetchModelServersRequest { Model = "", };
// Make the request
PagedAsyncEnumerable<FetchModelServersResponse, string> response = gkeInferenceQuickstartClient.FetchModelServersAsync(request);

// Iterate over all response items, lazily performing RPCs as required
await response.ForEachAsync((string item) =>
{
    // Do something with each item
    Console.WriteLine(item);
});

// Or iterate over pages (of server-defined size), performing one RPC per page
await response.AsRawResponses().ForEachAsync((FetchModelServersResponse page) =>
{
    // Do something with each page of items
    Console.WriteLine("A page of results:");
    foreach (string item in page)
    {
        // Do something with each item
        Console.WriteLine(item);
    }
});

// Or retrieve a single page of known size (unless it's the final page), performing as many RPCs as required
int pageSize = 10;
Page<string> singlePage = await response.ReadPageAsync(pageSize);
// Do something with the page of items
Console.WriteLine($"A page of {pageSize} results (unless it's the final page):");
foreach (string item in singlePage)
{
    // Do something with each item
    Console.WriteLine(item);
}
// Store the pageToken, for when the next page is required.
string nextPageToken = singlePage.NextPageToken;

FetchModels(FetchModelsRequest, CallSettings)

public virtual PagedEnumerable<FetchModelsResponse, string> FetchModels(FetchModelsRequest request, CallSettings callSettings = null)

Fetches available models. Open-source models follow the Huggingface Hub owner/model_name format.

Parameters
Name Description
request FetchModelsRequest

The request object containing all of the parameters for the API call.

callSettings CallSettings

If not null, applies overrides to this RPC call.

Returns
Type Description
PagedEnumerable<FetchModelsResponse, string>

A pageable sequence of string resources.

Example
// Create client
GkeInferenceQuickstartClient gkeInferenceQuickstartClient = GkeInferenceQuickstartClient.Create();
// Initialize request argument(s)
FetchModelsRequest request = new FetchModelsRequest { };
// Make the request
PagedEnumerable<FetchModelsResponse, string> response = gkeInferenceQuickstartClient.FetchModels(request);

// Iterate over all response items, lazily performing RPCs as required
foreach (string item in response)
{
    // Do something with each item
    Console.WriteLine(item);
}

// Or iterate over pages (of server-defined size), performing one RPC per page
foreach (FetchModelsResponse page in response.AsRawResponses())
{
    // Do something with each page of items
    Console.WriteLine("A page of results:");
    foreach (string item in page)
    {
        // Do something with each item
        Console.WriteLine(item);
    }
}

// Or retrieve a single page of known size (unless it's the final page), performing as many RPCs as required
int pageSize = 10;
Page<string> singlePage = response.ReadPage(pageSize);
// Do something with the page of items
Console.WriteLine($"A page of {pageSize} results (unless it's the final page):");
foreach (string item in singlePage)
{
    // Do something with each item
    Console.WriteLine(item);
}
// Store the pageToken, for when the next page is required.
string nextPageToken = singlePage.NextPageToken;

FetchModelsAsync(FetchModelsRequest, CallSettings)

public virtual PagedAsyncEnumerable<FetchModelsResponse, string> FetchModelsAsync(FetchModelsRequest request, CallSettings callSettings = null)

Fetches available models. Open-source models follow the Huggingface Hub owner/model_name format.

Parameters
Name Description
request FetchModelsRequest

The request object containing all of the parameters for the API call.

callSettings CallSettings

If not null, applies overrides to this RPC call.

Returns
Type Description
PagedAsyncEnumerable<FetchModelsResponse, string>

A pageable asynchronous sequence of string resources.

Example
// Create client
GkeInferenceQuickstartClient gkeInferenceQuickstartClient = await GkeInferenceQuickstartClient.CreateAsync();
// Initialize request argument(s)
FetchModelsRequest request = new FetchModelsRequest { };
// Make the request
PagedAsyncEnumerable<FetchModelsResponse, string> response = gkeInferenceQuickstartClient.FetchModelsAsync(request);

// Iterate over all response items, lazily performing RPCs as required
await response.ForEachAsync((string item) =>
{
    // Do something with each item
    Console.WriteLine(item);
});

// Or iterate over pages (of server-defined size), performing one RPC per page
await response.AsRawResponses().ForEachAsync((FetchModelsResponse page) =>
{
    // Do something with each page of items
    Console.WriteLine("A page of results:");
    foreach (string item in page)
    {
        // Do something with each item
        Console.WriteLine(item);
    }
});

// Or retrieve a single page of known size (unless it's the final page), performing as many RPCs as required
int pageSize = 10;
Page<string> singlePage = await response.ReadPageAsync(pageSize);
// Do something with the page of items
Console.WriteLine($"A page of {pageSize} results (unless it's the final page):");
foreach (string item in singlePage)
{
    // Do something with each item
    Console.WriteLine(item);
}
// Store the pageToken, for when the next page is required.
string nextPageToken = singlePage.NextPageToken;

FetchProfiles(FetchProfilesRequest, CallSettings)

public virtual PagedEnumerable<FetchProfilesResponse, Profile> FetchProfiles(FetchProfilesRequest request, CallSettings callSettings = null)

Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.

Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See Run best practice inference with GKE Inference Quickstart recipes for details.

Parameters
Name Description
request FetchProfilesRequest

The request object containing all of the parameters for the API call.

callSettings CallSettings

If not null, applies overrides to this RPC call.

Returns
Type Description
PagedEnumerable<FetchProfilesResponse, Profile>

A pageable sequence of Profile resources.

Example
// Create client
GkeInferenceQuickstartClient gkeInferenceQuickstartClient = GkeInferenceQuickstartClient.Create();
// Initialize request argument(s)
FetchProfilesRequest request = new FetchProfilesRequest
{
    Model = "",
    ModelServer = "",
    ModelServerVersion = "",
    PerformanceRequirements = new PerformanceRequirements(),
};
// Make the request
PagedEnumerable<FetchProfilesResponse, Profile> response = gkeInferenceQuickstartClient.FetchProfiles(request);

// Iterate over all response items, lazily performing RPCs as required
foreach (Profile item in response)
{
    // Do something with each item
    Console.WriteLine(item);
}

// Or iterate over pages (of server-defined size), performing one RPC per page
foreach (FetchProfilesResponse page in response.AsRawResponses())
{
    // Do something with each page of items
    Console.WriteLine("A page of results:");
    foreach (Profile item in page)
    {
        // Do something with each item
        Console.WriteLine(item);
    }
}

// Or retrieve a single page of known size (unless it's the final page), performing as many RPCs as required
int pageSize = 10;
Page<Profile> singlePage = response.ReadPage(pageSize);
// Do something with the page of items
Console.WriteLine($"A page of {pageSize} results (unless it's the final page):");
foreach (Profile item in singlePage)
{
    // Do something with each item
    Console.WriteLine(item);
}
// Store the pageToken, for when the next page is required.
string nextPageToken = singlePage.NextPageToken;

FetchProfilesAsync(FetchProfilesRequest, CallSettings)

public virtual PagedAsyncEnumerable<FetchProfilesResponse, Profile> FetchProfilesAsync(FetchProfilesRequest request, CallSettings callSettings = null)

Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.

Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See Run best practice inference with GKE Inference Quickstart recipes for details.

Parameters
Name Description
request FetchProfilesRequest

The request object containing all of the parameters for the API call.

callSettings CallSettings

If not null, applies overrides to this RPC call.

Returns
Type Description
PagedAsyncEnumerable<FetchProfilesResponse, Profile>

A pageable asynchronous sequence of Profile resources.

Example
// Create client
GkeInferenceQuickstartClient gkeInferenceQuickstartClient = await GkeInferenceQuickstartClient.CreateAsync();
// Initialize request argument(s)
FetchProfilesRequest request = new FetchProfilesRequest
{
    Model = "",
    ModelServer = "",
    ModelServerVersion = "",
    PerformanceRequirements = new PerformanceRequirements(),
};
// Make the request
PagedAsyncEnumerable<FetchProfilesResponse, Profile> response = gkeInferenceQuickstartClient.FetchProfilesAsync(request);

// Iterate over all response items, lazily performing RPCs as required
await response.ForEachAsync((Profile item) =>
{
    // Do something with each item
    Console.WriteLine(item);
});

// Or iterate over pages (of server-defined size), performing one RPC per page
await response.AsRawResponses().ForEachAsync((FetchProfilesResponse page) =>
{
    // Do something with each page of items
    Console.WriteLine("A page of results:");
    foreach (Profile item in page)
    {
        // Do something with each item
        Console.WriteLine(item);
    }
});

// Or retrieve a single page of known size (unless it's the final page), performing as many RPCs as required
int pageSize = 10;
Page<Profile> singlePage = await response.ReadPageAsync(pageSize);
// Do something with the page of items
Console.WriteLine($"A page of {pageSize} results (unless it's the final page):");
foreach (Profile item in singlePage)
{
    // Do something with each item
    Console.WriteLine(item);
}
// Store the pageToken, for when the next page is required.
string nextPageToken = singlePage.NextPageToken;

GenerateOptimizedManifest(GenerateOptimizedManifestRequest, CallSettings)

public virtual GenerateOptimizedManifestResponse GenerateOptimizedManifest(GenerateOptimizedManifestRequest request, CallSettings callSettings = null)

Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See Run best practice inference with GKE Inference Quickstart recipes for deployment details.

Parameters
Name Description
request GenerateOptimizedManifestRequest

The request object containing all of the parameters for the API call.

callSettings CallSettings

If not null, applies overrides to this RPC call.

Returns
Type Description
GenerateOptimizedManifestResponse

The RPC response.

Example
// Create client
GkeInferenceQuickstartClient gkeInferenceQuickstartClient = GkeInferenceQuickstartClient.Create();
// Initialize request argument(s)
GenerateOptimizedManifestRequest request = new GenerateOptimizedManifestRequest
{
    ModelServerInfo = new ModelServerInfo(),
    AcceleratorType = "",
    KubernetesNamespace = "",
    PerformanceRequirements = new PerformanceRequirements(),
    StorageConfig = new StorageConfig(),
};
// Make the request
GenerateOptimizedManifestResponse response = gkeInferenceQuickstartClient.GenerateOptimizedManifest(request);

GenerateOptimizedManifestAsync(GenerateOptimizedManifestRequest, CallSettings)

public virtual Task<GenerateOptimizedManifestResponse> GenerateOptimizedManifestAsync(GenerateOptimizedManifestRequest request, CallSettings callSettings = null)

Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See Run best practice inference with GKE Inference Quickstart recipes for deployment details.

Parameters
Name Description
request GenerateOptimizedManifestRequest

The request object containing all of the parameters for the API call.

callSettings CallSettings

If not null, applies overrides to this RPC call.

Returns
Type Description
Task<GenerateOptimizedManifestResponse>

A Task containing the RPC response.

Example
// Create client
GkeInferenceQuickstartClient gkeInferenceQuickstartClient = await GkeInferenceQuickstartClient.CreateAsync();
// Initialize request argument(s)
GenerateOptimizedManifestRequest request = new GenerateOptimizedManifestRequest
{
    ModelServerInfo = new ModelServerInfo(),
    AcceleratorType = "",
    KubernetesNamespace = "",
    PerformanceRequirements = new PerformanceRequirements(),
    StorageConfig = new StorageConfig(),
};
// Make the request
GenerateOptimizedManifestResponse response = await gkeInferenceQuickstartClient.GenerateOptimizedManifestAsync(request);

GenerateOptimizedManifestAsync(GenerateOptimizedManifestRequest, CancellationToken)

public virtual Task<GenerateOptimizedManifestResponse> GenerateOptimizedManifestAsync(GenerateOptimizedManifestRequest request, CancellationToken cancellationToken)

Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See Run best practice inference with GKE Inference Quickstart recipes for deployment details.

Parameters
Name Description
request GenerateOptimizedManifestRequest

The request object containing all of the parameters for the API call.

cancellationToken CancellationToken

A CancellationToken to use for this RPC.

Returns
Type Description
Task<GenerateOptimizedManifestResponse>

A Task containing the RPC response.

Example
// Create client
GkeInferenceQuickstartClient gkeInferenceQuickstartClient = await GkeInferenceQuickstartClient.CreateAsync();
// Initialize request argument(s)
GenerateOptimizedManifestRequest request = new GenerateOptimizedManifestRequest
{
    ModelServerInfo = new ModelServerInfo(),
    AcceleratorType = "",
    KubernetesNamespace = "",
    PerformanceRequirements = new PerformanceRequirements(),
    StorageConfig = new StorageConfig(),
};
// Make the request
GenerateOptimizedManifestResponse response = await gkeInferenceQuickstartClient.GenerateOptimizedManifestAsync(request);

ShutdownDefaultChannelsAsync()

public static Task ShutdownDefaultChannelsAsync()

Shuts down any channels automatically created by Create() and CreateAsync(CancellationToken). Channels which weren't automatically created are not affected.

Returns
Type Description
Task

A task representing the asynchronous shutdown operation.

Remarks

After calling this method, further calls to Create() and CreateAsync(CancellationToken) will create new channels, which could in turn be shut down by another call to this method.
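
A brief usage sketch, assuming the application is finished with the API and only used clients obtained from Create() or CreateAsync():

// Make calls with a client created via Create()/CreateAsync() ...
GkeInferenceQuickstartClient client = GkeInferenceQuickstartClient.Create();
// ... then shut down the automatically created channels, e.g. on application exit.
await GkeInferenceQuickstartClient.ShutdownDefaultChannelsAsync();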