str
Optional. The Google Cloud Storage bucket URI to load the
model from. This URI must point to the directory containing
the model's config file (config.json) and model weights.
A tuned Cloud Storage FUSE (GCSFuse) setup can speed up LLM Pod
startup by more than 7x. Expected format:
gs://.
xla_cache_bucket_uri
str
Optional. The URI for the GCS bucket containing the XLA
compilation cache. If using TPUs, the XLA cache will be
written to the same path as model_bucket_uri. This can
speed up vLLM model preparation for repeated deployments.
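As a rough illustration of how the two optional URI fields might be checked before they are passed along, here is a minimal Python sketch. The field names model_bucket_uri and xla_cache_bucket_uri come from this reference. The helper functions, the bucket names, and the assumption that a valid value is a gs:// URI with a non-empty bucket name are hypothetical and not part of the documented API.

```python
from urllib.parse import urlparse


def check_gcs_uri(uri: str) -> str:
    """Return uri unchanged if it has the gs scheme and a bucket name, else raise ValueError."""
    parsed = urlparse(uri)
    # Assumption: a usable value has scheme "gs" and a non-empty bucket (netloc).
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"expected a gs:// URI with a bucket name, got {uri!r}")
    return uri


def build_storage_options(model_bucket_uri=None, xla_cache_bucket_uri=None):
    """Collect the two optional URIs into a dict, skipping any that are unset."""
    options = {}
    if model_bucket_uri is not None:
        options["model_bucket_uri"] = check_gcs_uri(model_bucket_uri)
    if xla_cache_bucket_uri is not None:
        options["xla_cache_bucket_uri"] = check_gcs_uri(xla_cache_bucket_uri)
    return options


# Hypothetical bucket names, for illustration only.
options = build_storage_options(
    model_bucket_uri="gs://example-model-bucket/models/example-llm",
    xla_cache_bucket_uri="gs://example-xla-cache-bucket",
)
```

Both fields are optional, so the sketch simply omits any that are left unset rather than substituting a default.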
Last updated 2025-10-27 UTC.