Optional. The Google Cloud Storage bucket URI to load the model from. This
URI must point to the directory containing the model's config file
(config.json) and model weights. A tuned GCSFuse setup can improve
LLM Pod startup time by more than 7x. Expected format:
gs://<bucket-name>/<path-to-model>.
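
As an illustration of the expected format, a client could check a URI before submitting it. The following is a minimal sketch, not part of the API: the helper name `parse_model_bucket_uri` and the `GcsUri` type are hypothetical, and the check is deliberately loose (it does not enforce Cloud Storage bucket naming rules or confirm that config.json exists at the path).

```python
import re
from typing import NamedTuple


class GcsUri(NamedTuple):
    """Hypothetical container for the two parts of a gs:// URI."""
    bucket: str
    path: str


# Loose pattern for gs://<bucket-name>/<path-to-model>; this is an
# assumption for illustration and does not validate bucket naming rules.
_GCS_URI = re.compile(r"^gs://(?P<bucket>[^/]+)/(?P<path>.+)$")


def parse_model_bucket_uri(uri: str) -> GcsUri:
    """Split a model bucket URI into bucket name and model directory path."""
    match = _GCS_URI.match(uri)
    if match is None:
        raise ValueError(f"expected gs://<bucket-name>/<path-to-model>, got {uri!r}")
    # Drop trailing slashes so "models/llama/" and "models/llama" compare equal.
    path = match.group("path").rstrip("/")
    if not path:
        raise ValueError(f"missing model path in {uri!r}")
    return GcsUri(match.group("bucket"), path)
```

For example, `parse_model_bucket_uri("gs://my-bucket/models/llama")` yields the bucket `my-bucket` and the path `models/llama`, while a URI with no path component raises `ValueError`.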
Optional. The URI for the GCS bucket containing the XLA compilation cache.
If using TPUs, the XLA cache will be written to the same path as
model_bucket_uri. This can speed up vLLM model preparation for repeated
deployments.
Last updated 2025-12-17 UTC.