[BindServiceMethod(typeof(GkeInferenceQuickstart), "BindService")]
public abstract class GkeInferenceQuickstart.GkeInferenceQuickstartBase
Reference documentation and code samples for the GKE Recommender v1 API class GkeInferenceQuickstart.GkeInferenceQuickstartBase.
Base class for server-side implementations of GkeInferenceQuickstart.
Namespace
Google.Cloud.GkeRecommender.V1
Assembly
Google.Cloud.GkeRecommender.V1.dll
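As a rough illustration of how a base class like this is typically used, the sketch below derives a service, overrides one of the virtual methods listed under Methods, and turns the instance into a service definition through the BindService method named in the attribute above. It assumes the Google.Cloud.GkeRecommender.V1 and Grpc.Core.Api packages are referenced. The names StubQuickstartService and ServiceRegistration are hypothetical, and the empty response is a placeholder rather than a realistic payload.

```csharp
using System.Threading.Tasks;
using Google.Cloud.GkeRecommender.V1;
using Grpc.Core;

// Hypothetical subclass; only the base class and its virtual methods come from the API.
public sealed class StubQuickstartService : GkeInferenceQuickstart.GkeInferenceQuickstartBase
{
    // Override only the calls this server should handle.
    public override Task<FetchModelsResponse> FetchModels(
        FetchModelsRequest request, ServerCallContext context)
    {
        // An empty message keeps the sketch free of assumptions about response fields.
        return Task.FromResult(new FetchModelsResponse());
    }
}

public static class ServiceRegistration
{
    // BindService is the static method referenced by the BindServiceMethod attribute.
    public static ServerServiceDefinition Build() =>
        GkeInferenceQuickstart.BindService(new StubQuickstartService());
}
```

In standard generated gRPC C# code, methods that are not overridden usually report an Unimplemented status to the caller, so a partial implementation like this one is a common starting point for tests and fakes.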
Methods
FetchBenchmarkingData(FetchBenchmarkingDataRequest, ServerCallContext)
public virtual Task<FetchBenchmarkingDataResponse> FetchBenchmarkingData(FetchBenchmarkingDataRequest request, ServerCallContext context)
Fetches all of the benchmarking data available for a profile. The benchmarking data includes every performance metric available for a given model server setup on a given instance type.
| Parameters | |
|---|---|
| Name | Description |
| request | FetchBenchmarkingDataRequest: The request received from the client. |
| context | ServerCallContext: The context of the server-side call handler being invoked. |
| Returns | |
|---|---|
| Type | Description |
| Task<FetchBenchmarkingDataResponse> | The response to send back to the client (wrapped by a task). |
FetchModelServerVersions(FetchModelServerVersionsRequest, ServerCallContext)
public virtual Task<FetchModelServerVersionsResponse> FetchModelServerVersions(FetchModelServerVersionsRequest request, ServerCallContext context)
Fetches available model server versions. Open-source servers use their own versioning schemas (e.g., vllm uses semver like v1.0.0).
Some model servers use different versioning schemas depending on the accelerator. For example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. When a model server uses more than one schema, all available versions are returned.
| Parameters | |
|---|---|
| Name | Description |
| request | FetchModelServerVersionsRequest: The request received from the client. |
| context | ServerCallContext: The context of the server-side call handler being invoked. |
| Returns | |
|---|---|
| Type | Description |
| Task<FetchModelServerVersionsResponse> | The response to send back to the client (wrapped by a task). |
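Because FetchModelServerVersions may return versions in more than one schema, code that consumes the list may want to separate semver-style strings from everything else before comparing them. The following is a minimal sketch of that idea, not part of the API: the class ModelServerVersionSketch and its methods are hypothetical, the pattern only approximates semver with an optional leading "v", and any string that does not match (such as a nightly build tag) is simply treated as "other".

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; not part of Google.Cloud.GkeRecommender.V1.
public static class ModelServerVersionSketch
{
    // Approximates MAJOR.MINOR.PATCH with an optional leading "v" and optional
    // pre-release or build suffixes, for example "v1.0.0".
    private static readonly Regex SemverLike = new Regex(
        @"^v?[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$");

    public static bool LooksLikeSemver(string version) =>
        !string.IsNullOrEmpty(version) && SemverLike.IsMatch(version);

    // Splits a mixed list into semver-style versions and all remaining tags,
    // preserving the original order within each group.
    public static (List<string> Semver, List<string> Other) Partition(
        IEnumerable<string> versions)
    {
        var semver = new List<string>();
        var other = new List<string>();
        foreach (string version in versions)
        {
            if (LooksLikeSemver(version)) semver.Add(version);
            else other.Add(version);
        }
        return (semver, other);
    }
}
```

For instance, Partition(new[] { "v1.0.0", "nightly-example" }) places "v1.0.0" in the Semver list and the made-up tag "nightly-example" in the Other list.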
FetchModelServers(FetchModelServersRequest, ServerCallContext)
public virtual Task<FetchModelServersResponse> FetchModelServers(FetchModelServersRequest request, ServerCallContext context)
Fetches available model servers. Open-source model servers use simplified, lowercase names (e.g., vllm).
| Parameters | |
|---|---|
| Name | Description |
| request | FetchModelServersRequest: The request received from the client. |
| context | ServerCallContext: The context of the server-side call handler being invoked. |
| Returns | |
|---|---|
| Type | Description |
| Task<FetchModelServersResponse> | The response to send back to the client (wrapped by a task). |
FetchModels(FetchModelsRequest, ServerCallContext)
public virtual Task<FetchModelsResponse> FetchModels(FetchModelsRequest request, ServerCallContext context)
Fetches available models. Open-source models follow the Hugging Face Hub owner/model_name format.
| Parameters | |
|---|---|
| Name | Description |
| request | FetchModelsRequest: The request received from the client. |
| context | ServerCallContext: The context of the server-side call handler being invoked. |
| Returns | |
|---|---|
| Type | Description |
| Task<FetchModelsResponse> | The response to send back to the client (wrapped by a task). |
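Since open-source model names returned by FetchModels follow the owner/model_name format, a caller may want to split a name into its two parts. The sketch below shows one way to do that under the assumption that a well-formed name contains exactly one slash with non-empty text on both sides. ModelIdSketch and TrySplit are hypothetical names, not part of the API.

```csharp
using System;

// Hypothetical helper; not part of Google.Cloud.GkeRecommender.V1.
public static class ModelIdSketch
{
    // Splits "owner/model_name" into its two parts.
    // Returns false (with empty outputs) for any other shape.
    public static bool TrySplit(string modelId, out string owner, out string modelName)
    {
        owner = string.Empty;
        modelName = string.Empty;
        if (string.IsNullOrWhiteSpace(modelId)) return false;

        string[] parts = modelId.Split('/');
        if (parts.Length != 2 || parts[0].Length == 0 || parts[1].Length == 0)
        {
            return false;
        }

        owner = parts[0];
        modelName = parts[1];
        return true;
    }
}
```

With the made-up identifier "example-org/example-model", TrySplit returns true and yields "example-org" as the owner and "example-model" as the model name. A string with no slash, or with more than one, is rejected.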
FetchProfiles(FetchProfilesRequest, ServerCallContext)
public virtual Task<FetchProfilesResponse> FetchProfiles(FetchProfilesRequest request, ServerCallContext context)
Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned.
Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See "Run best practice inference with GKE Inference Quickstart recipes" for details.
| Parameters | |
|---|---|
| Name | Description |
| request | FetchProfilesRequest: The request received from the client. |
| context | ServerCallContext: The context of the server-side call handler being invoked. |
| Returns | |
|---|---|
| Type | Description |
| Task<FetchProfilesResponse> | The response to send back to the client (wrapped by a task). |
GenerateOptimizedManifest(GenerateOptimizedManifestRequest, ServerCallContext)
public virtual Task<GenerateOptimizedManifestResponse> GenerateOptimizedManifest(GenerateOptimizedManifestRequest request, ServerCallContext context)
Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See "Run best practice inference with GKE Inference Quickstart recipes" for deployment details.
| Parameters | |
|---|---|
| Name | Description |
| request | GenerateOptimizedManifestRequest: The request received from the client. |
| context | ServerCallContext: The context of the server-side call handler being invoked. |
| Returns | |
|---|---|
| Type | Description |
| Task<GenerateOptimizedManifestResponse> | The response to send back to the client (wrapped by a task). |