`PerformanceRange(mapping=None, *, ignore_unknown_fields=False, **kwargs)`

Performance range for a model deployment.
Attributes

| Name | Type | Description |
|---|---|---|
| `throughput_output_range` | `google.cloud.gkerecommender_v1.types.TokensPerSecondRange` | Output only. The range of throughput, in output tokens per second. Throughput is measured as `total_output_tokens_generated_by_server / elapsed_time_in_seconds`. |
| `ttft_range` | `google.cloud.gkerecommender_v1.types.MillisecondRange` | Output only. The range of TTFT (Time To First Token), in milliseconds. TTFT is the time taken to generate the first token of a request. |
| `ntpot_range` | `google.cloud.gkerecommender_v1.types.MillisecondRange` | Output only. The range of NTPOT (Normalized Time Per Output Token), in milliseconds. NTPOT is the request latency normalized by the number of output tokens, measured as `request_latency / total_output_tokens`. |
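The following is a minimal sketch of how the three metrics described above relate to raw request timings. It does not call the gkerecommender client library. `RequestTiming`, its fields, and all helper functions are hypothetical. Two further points are assumptions rather than documented behavior: throughput's elapsed time is taken as the window from the first request start to the last request end, and a "range" is taken as the (min, max) of observed values.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RequestTiming:
    """Hypothetical per-request timing record (not part of the gkerecommender API)."""
    start_s: float        # time the request was sent, in seconds
    first_token_s: float  # time the first output token arrived, in seconds
    end_s: float          # time the last output token arrived, in seconds
    output_tokens: int    # number of output tokens generated for this request


def throughput_tokens_per_second(requests: list[RequestTiming]) -> float:
    """Total output tokens generated by the server / elapsed time in seconds.

    Assumption: elapsed time spans the first request start to the last request end.
    """
    total_tokens = sum(r.output_tokens for r in requests)
    elapsed_s = max(r.end_s for r in requests) - min(r.start_s for r in requests)
    return total_tokens / elapsed_s


def ttft_ms(r: RequestTiming) -> float:
    """Time To First Token for one request, in milliseconds."""
    return (r.first_token_s - r.start_s) * 1000.0


def ntpot_ms(r: RequestTiming) -> float:
    """Request latency divided by the number of output tokens, in milliseconds."""
    return (r.end_s - r.start_s) * 1000.0 / r.output_tokens


def value_range(values: list[float]) -> tuple[float, float]:
    """(min, max) of observed values; an assumed way to summarize a range."""
    return min(values), max(values)


# Two hypothetical requests served concurrently.
sample = [
    RequestTiming(start_s=0.0, first_token_s=0.25, end_s=2.0, output_tokens=100),
    RequestTiming(start_s=0.5, first_token_s=1.0, end_s=5.0, output_tokens=160),
]

print(throughput_tokens_per_second(sample))           # 260 tokens / 5.0 s = 52.0
print(value_range([ttft_ms(r) for r in sample]))      # (250.0, 500.0)
print(value_range([ntpot_ms(r) for r in sample]))     # (20.0, 28.125)
```

NTPOT divides each request's full latency (including time to first token) by its own output length. This makes requests with different output lengths comparable, whereas throughput is a single server-wide figure over the whole measurement window.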