Class GkeInferenceQuickstartGrpc.GkeInferenceQuickstartBlockingV2Stub (0.1.0)
A stub that lets clients make synchronous RPC calls to the GkeInferenceQuickstart service.
GKE Inference Quickstart (GIQ) service provides profiles with performance
metrics for popular models and model servers across multiple accelerators.
These profiles help generate optimized best practices for running inference
on GKE.
Fetches all of the benchmarking data available for a profile. Benchmarking
data returns all of the performance metrics available for a given model
server setup on a given instance type.
Fetches available model server versions. Open-source servers use their own
versioning schemas (e.g., vllm uses semver like v1.0.0).
Some model servers have different versioning schemas depending on the
accelerator. For example, vllm uses semver on GPUs, but returns nightly
build tags on TPUs. All available versions will be returned when different
schemas are present.
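Because a single response can mix versioning schemas, a caller may want to separate semver-style entries from other tags before comparing or sorting them. The following is a minimal sketch of that idea, not part of the generated stub: `VersionSchemas`, `isSemver`, and `semverOnly` are hypothetical helper names, and the sketch assumes the versions have already been extracted from the response as plain strings. It treats anything that does not match a `MAJOR.MINOR.PATCH` pattern as "not semver" and makes no assumption about what a nightly build tag looks like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical client-side helper; not part of the generated gRPC stub. */
final class VersionSchemas {
    // Matches MAJOR.MINOR.PATCH with an optional leading "v" and optional
    // pre-release ("-rc1") and build ("+build.5") suffixes.
    private static final Pattern SEMVER = Pattern.compile(
        "^v?\\d+\\.\\d+\\.\\d+(?:-[0-9A-Za-z.-]+)?(?:\\+[0-9A-Za-z.-]+)?$");

    private VersionSchemas() {}

    /** Returns true if the version string looks like semver, e.g. "v1.0.0". */
    static boolean isSemver(String version) {
        return SEMVER.matcher(version).matches();
    }

    /** Returns only the semver-style entries, preserving their original order. */
    static List<String> semverOnly(List<String> versions) {
        List<String> out = new ArrayList<>();
        for (String v : versions) {
            if (isSemver(v)) {
                out.add(v);
            }
        }
        return out;
    }
}
```

For example, `VersionSchemas.isSemver("v1.0.0")` returns `true`, while a tag that does not follow the `MAJOR.MINOR.PATCH` shape is left out of the list returned by `semverOnly`.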
Fetches available profiles. A profile contains performance metrics and
cost information for a specific model server setup. Profiles can be
filtered by parameters. If no filters are provided, all profiles are
returned.
Profiles display a single value per performance metric based on the
provided performance requirements. If no requirements are given, the
metrics represent the inflection point. See "Run best practice inference
with GKE Inference Quickstart recipes" for details.
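The filtering rule for profiles (every supplied filter narrows the result, and no filters means all profiles are returned) can be illustrated locally. The following is a rough sketch under stated assumptions and does not call the service: `ProfileSketch` and `ProfileFilterSketch` are hypothetical stand-ins, and the `model`, `modelServer`, and `accelerator` fields are illustrative rather than the actual request or response fields.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/** Hypothetical local stand-in for a profile; not the service's message type. */
record ProfileSketch(String model, String modelServer, String accelerator) {}

/** Hypothetical helper that mimics "filter if given, otherwise return all". */
final class ProfileFilterSketch {
    private ProfileFilterSketch() {}

    /** An empty Optional means no filter is applied on that field. */
    static List<ProfileSketch> filter(
            List<ProfileSketch> all,
            Optional<String> model,
            Optional<String> modelServer) {
        return all.stream()
            // Keep the profile if no model filter is set or the model matches.
            .filter(p -> model.map(m -> m.equals(p.model())).orElse(true))
            // Same rule for the model server filter.
            .filter(p -> modelServer.map(s -> s.equals(p.modelServer())).orElse(true))
            .collect(Collectors.toList());
    }
}
```

Calling `ProfileFilterSketch.filter(all, Optional.empty(), Optional.empty())` returns every entry in `all`, which mirrors the documented behavior when no filters are provided.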
Last updated 2025-12-17 UTC.