public static final class GkeInferenceQuickstartGrpc.GkeInferenceQuickstartStub extends AbstractAsyncStub<GkeInferenceQuickstartGrpc.GkeInferenceQuickstartStub>

A stub that allows clients to make asynchronous RPC calls to the GkeInferenceQuickstart service.
GKE Inference Quickstart (GIQ) service provides profiles with performance metrics for popular models and model servers across multiple accelerators. These profiles help generate optimized best practices for running inference on GKE.
Inheritance

java.lang.Object > io.grpc.stub.AbstractStub > io.grpc.stub.AbstractAsyncStub > GkeInferenceQuickstartGrpc.GkeInferenceQuickstartStub

Methods
build(Channel channel, CallOptions callOptions)
protected GkeInferenceQuickstartGrpc.GkeInferenceQuickstartStub build(Channel channel, CallOptions callOptions)

Parameters

| Name | Description |
|---|---|
| channel | io.grpc.Channel |
| callOptions | io.grpc.CallOptions |

Returns

| Type | Description |
|---|---|
| GkeInferenceQuickstartGrpc.GkeInferenceQuickstartStub | |
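Clients do not call build directly: GkeInferenceQuickstartGrpc.newStub(Channel) constructs the stub, and the with* configuration methods inherited from AbstractStub call build to derive reconfigured copies. A minimal setup sketch follows; the endpoint is a placeholder assumption, since this reference does not state the service address:

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

import java.util.concurrent.TimeUnit;

public class GkeInferenceQuickstartStubSetup {

  public static void main(String[] args) {
    // Placeholder endpoint (assumption); substitute the service's real address.
    ManagedChannel channel =
        ManagedChannelBuilder.forTarget("example.googleapis.com:443").build();

    // newStub(channel) creates the async stub; build() is called internally.
    GkeInferenceQuickstartGrpc.GkeInferenceQuickstartStub stub =
        GkeInferenceQuickstartGrpc.newStub(channel);

    // Each with* method (deadline, compression, ...) derives a new stub by
    // invoking build(channel, callOptions) with the updated CallOptions.
    GkeInferenceQuickstartGrpc.GkeInferenceQuickstartStub stubWithDeadline =
        stub.withDeadlineAfter(30, TimeUnit.SECONDS);

    channel.shutdown();
  }
}
```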
fetchBenchmarkingData(FetchBenchmarkingDataRequest request, StreamObserver<FetchBenchmarkingDataResponse> responseObserver)
public void fetchBenchmarkingData(FetchBenchmarkingDataRequest request, StreamObserver<FetchBenchmarkingDataResponse> responseObserver)

Fetches all of the benchmarking data available for a profile. Benchmarking data includes all of the performance metrics available for a given model server setup on a given instance type.
Parameters

| Name | Description |
|---|---|
| request | FetchBenchmarkingDataRequest |
| responseObserver | io.grpc.stub.StreamObserver<FetchBenchmarkingDataResponse> |
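A minimal sketch of an asynchronous call, assuming the stub from the setup sketch above; the empty request is an assumption, since a real request would identify the profile whose benchmarks to fetch:

```java
stub.fetchBenchmarkingData(
    FetchBenchmarkingDataRequest.newBuilder().build(), // request fields omitted (assumption)
    new io.grpc.stub.StreamObserver<FetchBenchmarkingDataResponse>() {
      @Override
      public void onNext(FetchBenchmarkingDataResponse response) {
        // Delivers the performance metrics for the requested profile.
        System.out.println(response);
      }

      @Override
      public void onError(Throwable t) {
        t.printStackTrace();
      }

      @Override
      public void onCompleted() {
        // The call has finished.
      }
    });
```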
fetchModelServerVersions(FetchModelServerVersionsRequest request, StreamObserver<FetchModelServerVersionsResponse> responseObserver)
public void fetchModelServerVersions(FetchModelServerVersionsRequest request, StreamObserver<FetchModelServerVersionsResponse> responseObserver)

Fetches available model server versions. Open-source servers use their own versioning schemas (e.g., vllm uses semver like v1.0.0). Some model servers have different versioning schemas depending on the accelerator. For example, vllm uses semver on GPUs, but returns nightly build tags on TPUs. All available versions will be returned when different schemas are present.
Parameters

| Name | Description |
|---|---|
| request | FetchModelServerVersionsRequest |
| responseObserver | io.grpc.stub.StreamObserver<FetchModelServerVersionsResponse> |
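The remaining methods follow the same calling pattern. For brevity, the sketches below use a hypothetical printingObserver helper (not part of the generated API) that prints each response and reports errors:

```java
// Hypothetical helper, not part of the generated API.
static <T> io.grpc.stub.StreamObserver<T> printingObserver() {
  return new io.grpc.stub.StreamObserver<T>() {
    @Override public void onNext(T value) { System.out.println(value); }
    @Override public void onError(Throwable t) { t.printStackTrace(); }
    @Override public void onCompleted() { }
  };
}

// Returned versions may mix schemas (e.g., semver on GPUs, nightly tags on TPUs for vllm).
stub.fetchModelServerVersions(
    FetchModelServerVersionsRequest.newBuilder().build(), // filters omitted (assumption)
    printingObserver());
```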
fetchModelServers(FetchModelServersRequest request, StreamObserver<FetchModelServersResponse> responseObserver)
public void fetchModelServers(FetchModelServersRequest request, StreamObserver<FetchModelServersResponse> responseObserver)

Fetches available model servers. Open-source model servers use simplified, lowercase names (e.g., vllm).
Parameters

| Name | Description |
|---|---|
| request | FetchModelServersRequest |
| responseObserver | io.grpc.stub.StreamObserver<FetchModelServersResponse> |
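Using the same stub and hypothetical helper:

```java
// Responses list model servers under simplified, lowercase names (e.g., vllm).
stub.fetchModelServers(
    FetchModelServersRequest.newBuilder().build(), // filters omitted (assumption)
    printingObserver());
```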
fetchModels(FetchModelsRequest request, StreamObserver<FetchModelsResponse> responseObserver)
public void fetchModels(FetchModelsRequest request, StreamObserver<FetchModelsResponse> responseObserver)

Fetches available models. Open-source models follow the Hugging Face Hub owner/model_name format.
Parameters

| Name | Description |
|---|---|
| request | FetchModelsRequest |
| responseObserver | io.grpc.stub.StreamObserver<FetchModelsResponse> |
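Likewise, assuming the stub and helper above:

```java
// Open-source model names follow the Hugging Face Hub owner/model_name format.
stub.fetchModels(
    FetchModelsRequest.newBuilder().build(), // filters omitted (assumption)
    printingObserver());
```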
fetchProfiles(FetchProfilesRequest request, StreamObserver<FetchProfilesResponse> responseObserver)
public void fetchProfiles(FetchProfilesRequest request, StreamObserver<FetchProfilesResponse> responseObserver)

Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters; if no filters are provided, all profiles are returned. Profiles display a single value per performance metric based on the provided performance requirements; if no requirements are given, the metrics represent the inflection point. See Run best practice inference with GKE Inference Quickstart recipes for details.
Parameters

| Name | Description |
|---|---|
| request | FetchProfilesRequest |
| responseObserver | io.grpc.stub.StreamObserver<FetchProfilesResponse> |
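A sketch under the same assumptions; leaving the request empty exercises the documented no-filter behavior:

```java
// With no filters and no performance requirements set, all profiles are
// returned and the metrics represent the inflection point.
stub.fetchProfiles(
    FetchProfilesRequest.newBuilder().build(),
    printingObserver());
```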
generateOptimizedManifest(GenerateOptimizedManifestRequest request, StreamObserver<GenerateOptimizedManifestResponse> responseObserver)
public void generateOptimizedManifest(GenerateOptimizedManifestRequest request, StreamObserver<GenerateOptimizedManifestResponse> responseObserver)

Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See Run best practice inference with GKE Inference Quickstart recipes for deployment details.
Parameters

| Name | Description |
|---|---|
| request | GenerateOptimizedManifestRequest |
| responseObserver | io.grpc.stub.StreamObserver<GenerateOptimizedManifestResponse> |
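A final sketch under the same assumptions; the request would carry the model, model server, accelerator, and performance targets, but their field names are not listed in this reference, so the builder is left empty here:

```java
stub.generateOptimizedManifest(
    GenerateOptimizedManifestRequest.newBuilder().build(), // model/server/accelerator fields omitted
    printingObserver());
```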