Deploy a Ray Serve application with a Stable Diffusion model on Google Kubernetes Engine (GKE)

This guide shows, by example, how to deploy and serve a Stable Diffusion model on Google Kubernetes Engine (GKE) using Ray Serve and the Ray Operator add-on.

About Ray and Ray Serve

Ray is an open source, scalable compute framework for AI/ML applications. Ray Serve is a model serving library for Ray that scales and serves models in a distributed environment. For more information, see Ray Serve in the Ray documentation.

You can use either a RayCluster or a RayService resource to deploy your Ray Serve applications. In production, use a RayService resource for the following reasons:

In-place updates for RayService applications
Zero-downtime upgrades for RayCluster resources
Highly available Ray Serve applications

Prepare your environment

To prepare your environment, follow these steps:

Launch a Cloud Shell session from the Google Cloud console by clicking Activate Cloud Shell. The session opens in the bottom pane of the Google Cloud console.

Set the environment variables:

export PROJECT_ID=PROJECT_ID
export CLUSTER_NAME=rayserve-cluster
export COMPUTE_REGION=us-central1
export COMPUTE_ZONE=us-central1-c
export CLUSTER_VERSION=CLUSTER_VERSION
export TUTORIAL_HOME=`pwd`

Replace the following:

PROJECT_ID: your Google Cloud project ID.
CLUSTER_VERSION: the GKE version to use. Must be 1.30.1 or later.
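
The version requirement above can be checked before creating the cluster. The following is a minimal sketch, not part of the sample repository: the helper names `parse_gke_version` and `meets_minimum` are hypothetical, and the `-gke.N` suffixes in the example strings are illustrative values rather than real release numbers.

```python
import re

MIN_GKE_VERSION = (1, 30, 1)  # minimum version stated in this guide


def parse_gke_version(version: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '1.30.1-gke.1187000'."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized GKE version: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version: str, minimum: tuple = MIN_GKE_VERSION) -> bool:
    """Return True when the version is at least the required minimum."""
    return parse_gke_version(version) >= minimum


# The suffixes below are made-up examples of the '-gke.N' pattern.
print(meets_minimum("1.30.1-gke.1187000"))
print(meets_minimum("1.29.8-gke.1031000"))
```

Comparing integer tuples avoids the pitfall of comparing version strings lexically, where "1.9.0" would sort after "1.30.1".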

Clone the GitHub repository:

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples

Change to the working directory:

cd kubernetes-engine-samples/ai-ml/gke-ray/rayserve/stable-diffusion

Create a Python virtual environment:
venv
```
python -m venv myenv && \
source myenv/bin/activate
```
Conda
1. Install Conda.
2. Run the following commands:
  conda create -c conda-forge python=3.9.19 -n myenv && \
  conda activate myenv

When you deploy a Serve application with serve run, Ray expects the Python version of the local client to match the version used in the Ray cluster. The rayproject/ray:2.37.0 image uses Python 3.9. If your client runs a different Python version, select the matching Ray image.
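
The version constraint described above can be checked locally before running serve run. This is a minimal sketch that assumes comparing the major and minor version is sufficient; the guide itself pins Python 3.9.19 in the Conda example, and the helper name `same_minor_version` is hypothetical.

```python
import sys

# Python version of the rayproject/ray:2.37.0 image, as stated in this guide.
CLUSTER_PYTHON = (3, 9)


def same_minor_version(local: tuple, cluster: tuple = CLUSTER_PYTHON) -> bool:
    """Compare only the major and minor components, for example (3, 9)."""
    return tuple(local[:2]) == tuple(cluster[:2])


local = tuple(sys.version_info[:3])
if same_minor_version(local):
    print(f"Local Python {local} matches the cluster image.")
else:
    print(f"Local Python {local} differs from cluster Python {CLUSTER_PYTHON}.")
```

If the check reports a mismatch, either recreate the virtual environment with Python 3.9 or pick a Ray image built for your local Python version.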

Install the dependencies required to run the Serve application:

pip install ray[serve]==2.37.0
pip install torch
pip install requests

Create a cluster with a GPU node pool

Create an Autopilot or Standard GKE cluster with a GPU node pool:

Autopilot

Create an Autopilot cluster:

gcloud container clusters create-auto ${CLUSTER_NAME}  \
    --enable-ray-operator \
    --cluster-version=${CLUSTER_VERSION} \
    --location=${COMPUTE_REGION}

Standard

Create a Standard cluster:

gcloud container clusters create ${CLUSTER_NAME} \
    --addons=RayOperator \
    --cluster-version=${CLUSTER_VERSION}  \
    --machine-type=c3d-standard-8 \
    --location=${COMPUTE_ZONE} \
    --num-nodes=1

Create a GPU node pool:

gcloud container node-pools create gpu-pool \
    --cluster=${CLUSTER_NAME} \
    --machine-type=g2-standard-8 \
    --location=${COMPUTE_ZONE} \
    --num-nodes=1 \
    --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest

Deploy a RayCluster resource

To deploy a RayCluster resource, follow these steps:

Review the following manifest:

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: stable-diffusion-cluster
spec:
  rayVersion: '2.37.0'
  headGroupSpec:
    rayStartParams:
      dashboard-host: '0.0.0.0'
    template:
      metadata:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.37.0
          ports:
          - containerPort: 6379
            name: gcs
          - containerPort: 8265
            name: dashboard
          - containerPort: 10001
            name: client
          - containerPort: 8000
            name: serve
          resources:
            limits:
              cpu: "2"
              ephemeral-storage: "15Gi"
              memory: "8Gi"
            requests:
              cpu: "2"
              ephemeral-storage: "15Gi"
              memory: "8Gi"
        nodeSelector:
          cloud.google.com/machine-family: c3d
  workerGroupSpecs:
  - replicas: 1
    minReplicas: 1
    maxReplicas: 4
    groupName: gpu-group
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.37.0-gpu
          resources:
            limits:
              cpu: 4
              memory: "16Gi"
              nvidia.com/gpu: 1
            requests:
              cpu: 3
              memory: "16Gi"
              nvidia.com/gpu: 1
        nodeSelector:
          cloud.google.com/gke-accelerator: nvidia-l4

This manifest describes a RayCluster resource.
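
To get a feel for what the manifest asks of the cluster, the resource requests can be added up for different worker counts. The sketch below copies the request values from the manifest by hand rather than parsing the YAML; the function name `total_requests` is hypothetical, and it only illustrates the arithmetic, not how GKE schedules the Pods.

```python
# Resource requests copied from the RayCluster manifest above.
HEAD = {"cpu": 2, "memory_gib": 8, "gpu": 0}
WORKER = {"cpu": 3, "memory_gib": 16, "gpu": 1}
MIN_REPLICAS, MAX_REPLICAS = 1, 4  # minReplicas and maxReplicas of gpu-group


def total_requests(worker_replicas: int) -> dict:
    """Sum the head and worker requests for a given number of worker Pods."""
    return {key: HEAD[key] + worker_replicas * WORKER[key] for key in HEAD}


for replicas in (MIN_REPLICAS, MAX_REPLICAS):
    print(replicas, total_requests(replicas))
```

Because each worker requests one nvidia.com/gpu, every additional replica in gpu-group needs another nvidia-l4 GPU to be available in the cluster.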

Apply the manifest to your cluster:
```
kubectl apply -f ray-cluster.yaml
```

Verify that the RayCluster resource is ready:

kubectl get raycluster

The output is similar to the following:

NAME                       DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
stable-diffusion-cluster   2                 2                   6      20Gi     0      ready    33s

In this output, ready in the STATUS column indicates that the RayCluster resource is ready.
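
The STATUS check described above can also be scripted. The following sketch parses the sample table shown in this guide; it assumes that columns are separated by at least two spaces and that no cell is empty, which holds for this sample but is not guaranteed for all kubectl output. The helper names are hypothetical.

```python
import re

# Sample output copied from this guide.
SAMPLE_OUTPUT = """\
NAME                       DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
stable-diffusion-cluster   2                 2                   6      20Gi     0      ready    33s
"""


def parse_table(text: str) -> list:
    """Split a kubectl table on runs of two or more spaces, one dict per row."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = re.split(r"\s{2,}", lines[0].strip())
    return [dict(zip(header, re.split(r"\s{2,}", line.strip()))) for line in lines[1:]]


def is_ready(text: str, name: str) -> bool:
    """Return True when the named RayCluster reports STATUS 'ready'."""
    return any(
        row.get("NAME") == name and row.get("STATUS") == "ready"
        for row in parse_table(text)
    )


print(is_ready(SAMPLE_OUTPUT, "stable-diffusion-cluster"))
```

Splitting on two or more spaces keeps multi-word headers such as DESIRED WORKERS intact. For automation beyond a quick check, structured output from kubectl is likely a sturdier input than the human-readable table.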

Connect to the RayCluster resource

To connect to the RayCluster resource, follow these steps:

Verify that GKE created the RayCluster service:

kubectl get svc stable-diffusion-cluster-head-svc

The output is similar to the following:

NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                AGE
stable-diffusion-cluster-head-svc   ClusterIP   34.118.238.247   <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP   109s

Establish port-forwarding sessions to the Ray head:

kubectl port-forward svc/stable-diffusion-cluster-head-svc 8265:8265 2>&1 >/dev/null &
kubectl port-forward svc/stable-diffusion-cluster-head-svc 10001:10001 2>&1 >/dev/null &

Verify that the Ray client can connect to the Ray cluster through localhost:

ray list nodes --address http://localhost:8265

The output is similar to the following:

======== List: 2024-06-19 15:15:15.707336 ========
Stats:
------------------------------
Total: 3

Table:
------------------------------
    NODE_ID                                                   NODE_IP     IS_HEAD_NODE    STATE    NODE_NAME    RESOURCES_TOTAL                 LABELS
0  1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2  10.28.1.21  False           ALIVE    10.28.1.21   CPU: 2.0                        ray.io/node_id: 1d07447d7d124db641052a3443ed882f913510dbe866719ac36667d2
# Several lines of output omitted
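
If ray list nodes cannot connect, a quick first check is whether the forwarded ports accept TCP connections at all. The sketch below uses only the Python standard library; the helper name `port_is_open` is hypothetical. An open port shows only that kubectl port-forward is listening locally, not that the Ray head behind it is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports forwarded in this section: 8265 (dashboard) and 10001 (Ray client).
for port in (8265, 10001):
    state = "open" if port_is_open("localhost", port) else "closed"
    print(f"localhost:{port} is {state}")
```

A closed port usually means the background kubectl port-forward process has exited and needs to be restarted.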

Run a Ray Serve application

To run a Ray Serve application, follow these steps:

Run the Stable Diffusion Ray Serve application:

serve run stable_diffusion:entrypoint --working-dir=. --runtime-env-json='{"pip": ["torch", "torchvision", "diffusers==0.12.1", "huggingface_hub==0.25.2", "transformers", "fastapi==0.113.0"], "excludes": ["myenv"]}' --address ray://localhost:10001

The output is similar to the following:

2024-06-19 18:20:58,444 INFO scripts.py:499 -- Running import path: 'stable_diffusion:entrypoint'.
2024-06-19 18:20:59,730 INFO packaging.py:530 -- Creating a file package for local directory '.'.
2024-06-19 18:21:04,833 INFO handle.py:126 -- Created DeploymentHandle 'hyil6u9f' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,834 INFO handle.py:126 -- Created DeploymentHandle 'xo25rl4k' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle '57x9u4fp' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'xr6kt85t' for Deployment(name='StableDiffusionV2', app='default').
2024-06-19 18:21:04,836 INFO handle.py:126 -- Created DeploymentHandle 'g54qagbz' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:19,139 INFO handle.py:126 -- Created DeploymentHandle 'iwuz00mv' for Deployment(name='APIIngress', app='default').
2024-06-19 18:21:19,139 INFO api.py:583 -- Deployed app 'default' successfully.
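
The --runtime-env-json argument in the serve run command is easier to read and edit as a Python dictionary. The following sketch rebuilds the same command from this guide with the standard library; it only prints the command and does not run it.

```python
import json
import shlex

# Same content as the --runtime-env-json argument in the command above.
runtime_env = {
    "pip": [
        "torch",
        "torchvision",
        "diffusers==0.12.1",
        "huggingface_hub==0.25.2",
        "transformers",
        "fastapi==0.113.0",
    ],
    # Paths under the working directory that are not uploaded to the cluster,
    # here the local virtual environment.
    "excludes": ["myenv"],
}

runtime_env_json = json.dumps(runtime_env)

command = [
    "serve", "run", "stable_diffusion:entrypoint",
    "--working-dir=.",
    f"--runtime-env-json={runtime_env_json}",
    "--address", "ray://localhost:10001",
]

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(command))
```

Generating the JSON with json.dumps avoids hand-written quoting mistakes when you add or pin packages in the pip list.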

Establish a port-forwarding session to the Ray Serve port (8000):

kubectl port-forward svc/stable-diffusion-cluster-head-svc 8000:8000 2>&1 >/dev/null &

Run the Python script:
```
python generate_image.py
```
The script generates an image and saves it to a file named output.png.
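
The contents of generate_image.py are not reproduced in this guide. As a rough idea of what such a client does, the following hypothetical sketch sends a prompt to the forwarded Serve port and writes the response to output.png. The /imagine route and the prompt query parameter are assumptions; check stable_diffusion.py and generate_image.py in the repository for the actual route and parameters.

```python
import urllib.parse
import urllib.request

SERVE_URL = "http://localhost:8000"  # port forwarded in the previous step
ENDPOINT = "/imagine"  # assumed route, not confirmed by this guide


def build_url(prompt: str, base: str = SERVE_URL, endpoint: str = ENDPOINT) -> str:
    """Encode the prompt as a query parameter (the parameter name is an assumption)."""
    query = urllib.parse.urlencode({"prompt": prompt})
    return f"{base}{endpoint}?{query}"


def generate_image(prompt: str, path: str = "output.png", timeout: float = 300.0) -> str:
    """Request an image from the Serve application and write the bytes to disk."""
    with urllib.request.urlopen(build_url(prompt), timeout=timeout) as response:
        data = response.read()
    with open(path, "wb") as handle:
        handle.write(data)
    return path


# Only builds the URL; call generate_image(...) once the port forward is active.
print(build_url("a cute cat on a sofa"))
```

The generous timeout is a guess: the first request may be slow if the model is still loading on the GPU worker.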

Deploy a RayService

The RayService custom resource manages the lifecycle of a RayCluster resource and of the Ray Serve application.

For more information about RayService, see Deploy Ray Serve Applications and Production Guide in the Ray documentation.

To deploy a RayService resource, follow these steps:

Review the following manifest:

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: stable-diffusion
spec:
  serveConfigV2: |
    applications:
      - name: stable_diffusion
        import_path: ai-ml.gke-ray.rayserve.stable-diffusion.stable_diffusion:entrypoint
        runtime_env:
          working_dir: "https://github.com/GoogleCloudPlatform/kubernetes-engine-samples/archive/main.zip"
          pip: ["diffusers==0.12.1", "torch", "torchvision", "huggingface_hub==0.25.2", "transformers"]
  rayClusterConfig:
    rayVersion: '2.37.0'
    headGroupSpec:
      rayStartParams:
        dashboard-host: '0.0.0.0'
      template:
        spec:
          containers:
          - name: ray-head
            image:  rayproject/ray:2.37.0
            ports:
            - containerPort: 6379
              name: gcs
            - containerPort: 8265
              name: dashboard
            - containerPort: 10001
              name: client
            - containerPort: 8000
              name: serve
            resources:
              limits:
                cpu: "2"
                ephemeral-storage: "15Gi"
                memory: "8Gi"
              requests:
                cpu: "2"
                ephemeral-storage: "15Gi"
                memory: "8Gi"
          nodeSelector:
            cloud.google.com/machine-family: c3d
    workerGroupSpecs:
    - replicas: 1
      minReplicas: 1
      maxReplicas: 4
      groupName: gpu-group
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-worker
            image: rayproject/ray:2.37.0-gpu
            resources:
              limits:
                cpu: 4
                memory: "16Gi"
                nvidia.com/gpu: 1
              requests:
                cpu: 3
                memory: "16Gi"
                nvidia.com/gpu: 1
          nodeSelector:
            cloud.google.com/gke-accelerator: nvidia-l4

This manifest describes a RayService custom resource.

Apply the manifest to your cluster:
```
kubectl apply -f ray-service.yaml
```

Verify that the Service is ready:

kubectl get svc stable-diffusion-serve-svc

The output is similar to the following:

NAME                         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
stable-diffusion-serve-svc   ClusterIP   34.118.236.0   <none>        8000/TCP   31m

Set up port forwarding to the Ray Serve Service:

kubectl port-forward svc/stable-diffusion-serve-svc 8000:8000 2>&1 >/dev/null &

Run the Python script from the previous section:
```
python generate_image.py
```
The script generates an image similar to the one generated in the previous section.
