public sealed class PerformanceStats : IMessage<PerformanceStats>, IEquatable<PerformanceStats>, IDeepCloneable<PerformanceStats>, IBufferMessage, IMessage
Reference documentation and code samples for the GKE Recommender v1 API class PerformanceStats.
Performance statistics for a model deployment.
Implements
IMessage<PerformanceStats>, IEquatable<PerformanceStats>, IDeepCloneable<PerformanceStats>, IBufferMessage, IMessage
Namespace
Google.Cloud.GkeRecommender.V1
Assembly
Google.Cloud.GkeRecommender.V1.dll
Constructors
PerformanceStats()
public PerformanceStats()
PerformanceStats(PerformanceStats)
public PerformanceStats(PerformanceStats other)
| Parameter | |
|---|---|
| Name | Description |
| other | PerformanceStats |
Properties
Cost
public RepeatedField<Cost> Cost { get; }
Output only. The cost of running the model deployment.
| Property Value | |
|---|---|
| Type | Description |
| RepeatedField<Cost> | |
NtpotMilliseconds
public int NtpotMilliseconds { get; set; }
Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds. This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.
| Property Value | |
|---|---|
| Type | Description |
| int | |
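The NTPOT formula above can be illustrated with a minimal sketch. The class `NtpotExample`, its method, and its parameter names are hypothetical and are not part of Google.Cloud.GkeRecommender.V1. Rounding to the nearest millisecond is also an assumption, since the reference does not say how the service produces the integer value.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the client library.
public static class NtpotExample
{
    // Computes request_latency / total_output_tokens in milliseconds.
    // Assumption: the result is rounded to the nearest whole millisecond.
    public static int NtpotMilliseconds(double requestLatencyMs, int totalOutputTokens)
    {
        if (totalOutputTokens <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(totalOutputTokens));
        }

        return (int)Math.Round(requestLatencyMs / totalOutputTokens,
                               MidpointRounding.AwayFromZero);
    }
}
```

For instance, a request that takes 2000 ms and produces 100 output tokens gives `NtpotExample.NtpotMilliseconds(2000, 100) == 20`.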
OutputTokensPerSecond
public int OutputTokensPerSecond { get; set; }
Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.
| Property Value | |
|---|---|
| Type | Description |
| int | |
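The throughput formula above can be sketched the same way. `ThroughputExample` and its parameter names are hypothetical, and rounding to the nearest whole token per second is an assumption rather than documented service behavior.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the client library.
public static class ThroughputExample
{
    // Computes total_output_tokens_generated_by_server / elapsed_time_in_seconds.
    // Assumption: the result is rounded to the nearest whole token per second.
    public static int OutputTokensPerSecond(long totalOutputTokens, double elapsedSeconds)
    {
        if (elapsedSeconds <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(elapsedSeconds));
        }

        return (int)Math.Round(totalOutputTokens / elapsedSeconds,
                               MidpointRounding.AwayFromZero);
    }
}
```

For instance, a server that generates 12000 output tokens over 60 seconds gives `ThroughputExample.OutputTokensPerSecond(12000, 60) == 200`.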
QueriesPerSecond
public float QueriesPerSecond { get; set; }
Output only. The number of queries per second. Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.
| Property Value | |
|---|---|
| Type | Description |
| float | |
TtftMilliseconds
public int TtftMilliseconds { get; set; }
Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.
| Property Value | |
|---|---|
| Type | Description |
| int | |