Halaman ini menjelaskan cara menjalankan pengujian NCCL/gIB pada cluster yang disediakan yang menggunakan GPUDirect RDMA. Bagian ini menjelaskan pengujian untuk skenario berikut:
- Jika Anda memiliki node yang disediakan dengan flex-start (Pratinjau), gunakan pengujian dasar pada dua node.
- Jika Anda memiliki sejumlah besar node yang tidak disediakan dengan flex-start, gunakan pengujian NCCL dengan Penjadwalan yang Mendukung Topologi.
Pengujian pada dua node
Jalankan pengujian dua node:
A4
Untuk men-deploy workload pengujian NCCL dari dua Pod pengujian yang berjalan di dua node A4, terapkan salah satu manifes berikut:
Untuk cluster Autopilot:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/refs/heads/master/gpudirect-rdma/nccl-test-a4-autopilot.yamlUntuk cluster Standard:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/refs/heads/master/gpudirect-rdma/nccl-test-a4.yaml
Periksa apakah Pod dijadwalkan dan berjalan di beberapa node:
kubectl get pods nccl-test-host-1 nccl-test-host-2Jika kedua Pod memiliki status
Running, Anda dapat melanjutkan ke langkah berikutnya. Untuk node yang disediakan oleh flex-start, mungkin perlu waktu beberapa menit sebelum node dibuat dan Pod dijadwalkan di node tersebut.Memicu pengujian pengumpulan semua NCCL untuk node:
kubectl exec nccl-test-host-1 -it -- /usr/local/gib/scripts/run_nccl_tests.sh -t all_gather -b 1K -e 8G nccl-host-1 nccl-host-2Outputnya akan mirip dengan berikut ini:
# size count type redop root time algbw busbw #wrong time algbw busbw #wrong # (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s) 1024 16 float none -1 48.17 0.02 0.02 0 47.21 0.02 0.02 0 2048 32 float none -1 47.23 0.04 0.04 0 47.17 0.04 0.04 0 4096 64 float none -1 47.43 0.09 0.08 0 47.48 0.09 0.08 0 8192 128 float none -1 47.93 0.17 0.16 0 47.98 0.17 0.16 0 16384 256 float none -1 48.90 0.34 0.31 0 48.75 0.34 0.32 0 32768 512 float none -1 50.10 0.65 0.61 0 49.59 0.66 0.62 0 65536 1024 float none -1 51.70 1.27 1.19 0 51.66 1.27 1.19 0 131072 2048 float none -1 52.23 2.51 2.35 0 55.60 2.36 2.21 0 262144 4096 float none -1 53.89 4.86 4.56 0 53.39 4.91 4.60 0 524288 8192 float none -1 56.80 9.23 8.65 0 57.66 9.09 8.52 0 1048576 16384 float none -1 87.85 11.94 11.19 0 88.47 11.85 11.11 0 2097152 32768 float none -1 92.52 22.67 21.25 0 93.22 22.50 21.09 0 4194304 65536 float none -1 97.41 43.06 40.37 0 96.15 43.62 40.90 0 8388608 131072 float none -1 110.0 76.27 71.51 0 110.9 75.66 70.93 0 16777216 262144 float none -1 141.3 118.77 111.35 0 140.7 119.27 111.81 0 33554432 524288 float none -1 203.2 165.14 154.82 0 202.3 165.90 155.53 0 67108864 1048576 float none -1 303.3 221.25 207.42 0 301.9 222.27 208.38 0 134217728 2097152 float none -1 513.2 261.56 245.21 0 509.3 263.56 247.08 0 268435456 4194304 float none -1 842.4 318.64 298.72 0 832.3 322.54 302.38 0 536870912 8388608 float none -1 1511.8 355.12 332.92 0 1502.5 357.31 334.98 0 1073741824 16777216 float none -1 2976.7 360.72 338.17 0 2923.2 367.32 344.36 0 2147483648 33554432 float none -1 5888.9 364.66 341.87 0 5766.2 372.43 349.15 0 4294967296 67108864 float none -1 11722 366.39 343.49 0 11457 374.88 351.45 0 8589934592 134217728 float none -1 23379 367.43 344.46 0 22818 376.45 352.92 0 # Out of bounds values : 0 OK # Avg bus bandwidth : 120.845
A3 Ultra
Untuk men-deploy workload pengujian NCCL dari dua Pod pengujian yang berjalan di dua node A3 Ultra, terapkan salah satu manifes berikut:
Untuk cluster Autopilot:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/refs/heads/master/gpudirect-rdma/nccl-test-autopilot.yamlUntuk cluster Standard:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/refs/heads/master/gpudirect-rdma/nccl-test.yaml
Periksa apakah Pod dijadwalkan dan berjalan di beberapa node:
kubectl get pods nccl-test-host-1 nccl-test-host-2Jika kedua Pod memiliki status
Running, Anda dapat melanjutkan ke langkah berikutnya. Untuk node yang disediakan oleh flex-start, mungkin perlu waktu beberapa menit sebelum node dibuat dan Pod dijadwalkan di node tersebut.Memicu pengujian pengumpulan semua NCCL untuk node:
kubectl exec nccl-test-host-1 -it -- /usr/local/gib/scripts/run_nccl_tests.sh -t all_gather -b 1K -e 8G nccl-host-1 nccl-host-2Outputnya akan mirip dengan berikut ini:
# size count type redop root time algbw busbw #wrong time algbw busbw #wrong # (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s) 1024 16 float none -1 56.00 0.02 0.02 0 55.59 0.02 0.02 0 2048 32 float none -1 55.79 0.04 0.03 0 55.57 0.04 0.03 0 4096 64 float none -1 56.29 0.07 0.07 0 57.35 0.07 0.07 0 8192 128 float none -1 56.44 0.15 0.14 0 56.32 0.15 0.14 0 16384 256 float none -1 57.57 0.28 0.27 0 57.60 0.28 0.27 0 32768 512 float none -1 57.92 0.57 0.53 0 59.35 0.55 0.52 0 65536 1024 float none -1 59.92 1.09 1.03 0 60.15 1.09 1.02 0 131072 2048 float none -1 59.21 2.21 2.08 0 61.82 2.12 1.99 0 262144 4096 float none -1 63.58 4.12 3.87 0 63.34 4.14 3.88 0 524288 8192 float none -1 64.89 8.08 7.57 0 65.09 8.06 7.55 0 1048576 16384 float none -1 80.90 12.96 12.15 0 77.49 13.53 12.69 0 2097152 32768 float none -1 80.22 26.14 24.51 0 79.88 26.25 24.61 0 4194304 65536 float none -1 82.86 50.62 47.45 0 82.47 50.86 47.68 0 8388608 131072 float none -1 95.83 87.53 82.06 0 93.27 89.94 84.32 0 16777216 262144 float none -1 122.8 136.58 128.04 0 121.7 137.86 129.24 0 33554432 524288 float none -1 180.6 185.75 174.14 0 179.2 187.19 175.49 0 67108864 1048576 float none -1 279.7 239.90 224.90 0 277.0 242.26 227.12 0 134217728 2097152 float none -1 507.5 264.46 247.93 0 485.1 276.66 259.37 0 268435456 4194304 float none -1 866.3 309.88 290.51 0 864.0 310.70 291.28 0 536870912 8388608 float none -1 1576.1 340.62 319.33 0 1558.2 344.54 323.01 0 1073741824 16777216 float none -1 3096.6 346.75 325.08 0 3047.5 352.33 330.31 0 2147483648 33554432 float none -1 6148.0 349.30 327.47 0 6034.3 355.88 333.64 0 4294967296 67108864 float none -1 12226 351.29 329.33 0 12000 357.92 335.55 0 8589934592 134217728 float none -1 24391 352.17 330.16 0 23920 359.11 336.67 0 # Out of bounds values : 0 OK # Avg bus bandwidth : 120.94
Menguji dengan Topology Aware Scheduling (TAS)
Jika Anda memiliki lebih dari dua node, sebaiknya gunakan pengujian berikut, yang menggunakan TAS. Ikuti langkah-langkah di bagian berikutnya untuk menyiapkan dan menjalankan pengujian di cluster Anda.
Menyiapkan cluster dengan Jobset dan plugin TAS
Instal plugin TAS:
Clone repositori git
container-engine-accelerators:cd ~ git clone https://github.com/GoogleCloudPlatform/container-engine-accelerators.gitTerapkan plugin TAS:
cd container-engine-accelerators/gke-topology-scheduler kubectl create configmap topology-scheduler-scripts --namespace kube-system --from-file=schedule-daemon.py=schedule-daemon.py --from-file=label-nodes-daemon.py=label-nodes-daemon.py kubectl apply -f service-account.yaml kubectl apply -f schedule-daemon.yaml kubectl apply -f label-nodes-daemon.yaml
Men-deploy workload pengujian NCCL dengan TAS
A4
Buat manifes
nccl-jobset-test.yamlberikut:apiVersion: jobset.x-k8s.io/v1alpha2 kind: JobSet metadata: # The name `nccl-ag` is used for an NCCL all-gather test. name: nccl-ag spec: ttlSecondsAfterFinished: 1200 suspend: False network: enableDNSHostnames: true replicatedJobs: - name: worker template: spec: parallelism: NUM_NODES completions: NUM_NODES template: metadata: annotations: networking.gke.io/default-interface: 'eth0' networking.gke.io/interfaces: | [ {"interfaceName":"eth0","network":"default"}, {"interfaceName":"eth2","network":"rdma-0"}, {"interfaceName":"eth3","network":"rdma-1"}, {"interfaceName":"eth4","network":"rdma-2"}, {"interfaceName":"eth5","network":"rdma-3"}, {"interfaceName":"eth6","network":"rdma-4"}, {"interfaceName":"eth7","network":"rdma-5"}, {"interfaceName":"eth8","network":"rdma-6"}, {"interfaceName":"eth9","network":"rdma-7"} ] spec: activeDeadlineSeconds: 3600 restartPolicy: Never nodeSelector: cloud.google.com/gke-accelerator: nvidia-b200 tolerations: - key: cloud.google.com/gke-queued effect: NoSchedule value: "true" - key: "nvidia.com/gpu" operator: "Exists" effect: "NoSchedule" setHostnameAsFQDN: true volumes: - name: gib hostPath: path: /home/kubernetes/bin/gib - name: nvidia hostPath: path: /home/kubernetes/bin/nvidia - name: lib64 hostPath: path: /lib64 - name: shared-memory emptyDir: medium: "Memory" sizeLimit: 250Gi schedulingGates: - name: "gke.io/topology-aware-auto-nccl-test" containers: - name: nccl-test stdin: true tty: true image: us-docker.pkg.dev/gce-ai-infra/gpudirect-gib/nccl-plugin-gib-diagnostic:v1.0.6 env: - name: MY_NODE_NAME valueFrom: fieldRef: fieldPath: spec.nodeName - name: OMPI_ALLOW_RUN_AS_ROOT value: "1" - name: OMPI_ALLOW_RUN_AS_ROOT_CONFIRM value: "1" - name: N_NODES value: "NUM_NODES" - name: LD_LIBRARY_PATH value: /usr/local/nvidia/lib64 command: - bash - -c - | set -x echo "Starting workload container on ${MY_NODE_NAME} for $N_NODES benchmark" # Install ping apt update -y apt install -y iputils-ping # Start sshd /scripts/container_entry.sh daemon & # Get helper variables to form all hostnames export POSTFIX=$(hostname | cut -d . -f 2-) export WORKERS_BASENAME=$(hostname | cut -d . -f 1 | rev | cut -d - -f 2- | rev ) export NODE_RANK=$JOB_COMPLETION_INDEX # For every worker, wait till online and add to hostfile for i in `seq 0 $(($N_NODES-1))`; do OTHER=${WORKERS_BASENAME}-${i}.${POSTFIX} until ssh -p 222 -o StrictHostKeyChecking=no $OTHER hostname; do echo Waiting for ${OTHER}... sleep 10 done echo ${OTHER} port=222 slots=8 | tee -a /tmp/hostfile; done cat /tmp/hostfile # Launch from head node if [[ "${NODE_RANK}" -eq "0" ]]; then # World Level = 0x0, Rail Aligned = 0x7 export NCCL_TESTS_SPLIT_MASK="0x0"; # Force use of libnccl-gib export NCCL_NET=gIB # Set all the correct libnccl-gib environment variables source /usr/local/gib/scripts/set_nccl_env.sh # Get all relevant NCCL / env vars to pass to all workers ENV_VARS=$(echo ${!NCCL*} ${!OMPI*} LD_LIBRARY_PATH PATH | sed 's/ / -x /g') mpirun --hostfile /tmp/hostfile \ -x $ENV_VARS \ -mca plm_rsh_no_tree_spawn 1 \ --mca mtl ^ofi \ --mca orte_keep_fqdn_hostnames 1 \ --mca btl self,tcp \ --mca btl_tcp_if_include eth0 \ --bind-to none \ --mca plm_rsh_agent "ssh -q -o LogLevel=ERROR -o StrictHostKeyChecking=no -p 222" \ /third_party/nccl-tests/build/all_gather_perf -b 1K -e 8G -f 2 -g 1 -w 5 --iters 100 -c 1 else while ping -c 1 ${WORKERS_BASENAME}-0.${POSTFIX}; do sleep 5 done fi exit 0 volumeMounts: - name: nvidia mountPath: /usr/local/nvidia - name: gib mountPath: /usr/local/gib - name: shared-memory mountPath: /dev/shm resources: limits: nvidia.com/gpu: 8 requests: nvidia.com/gpu: 8 restartPolicy: NeverGanti
NUM_NODESdengan jumlah node di node pool.Pastikan Anda memahami hal berikut tentang manifes ini:
- JobSet adalah Layanan headless dengan nama yang sama dengan nama JobSet, dalam hal ini,
nccl-ag. - Gerbang penjadwalan
gke.io/topology-aware-auto-nccl-testdigunakan untuk memverifikasi bahwa Pod dijadwalkan untuk kolokasi. - Kolom
parallelismdancompletionsditetapkan ke jumlah node yang ingin Anda gunakan untuk menjalankan pengujian NCCL.
- JobSet adalah Layanan headless dengan nama yang sama dengan nama JobSet, dalam hal ini,
Terapkan manifes:
kubectl apply -f nccl-jobset-test.yamlPastikan beban kerja diizinkan:
kubectl get jobsetsOutputnya mirip dengan hal berikut ini:
NAME RESTARTS COMPLETED AGE nccl-ag 3sPastikan beban kerja berada dalam status
Completed:kubectl get podsOutputnya mirip dengan hal berikut ini:
NAME READY STATUS RESTARTS AGE nccl-ag-worker-0-0-n9s6j 0/1 Completed 0 9m34s nccl-ag-worker-0-1-rsf7r 0/1 Completed 0 9m34s ...Log Pod dengan pola
nccl-ag-worker-0-0-.*berisi hasil pengujian.Ambil log untuk Pod ini:
kubectl logs $(kubectl get pods -o go-template='{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' | grep nccl-ag-worker-0-0)Outputnya akan mirip dengan berikut ini:
# size count type redop root time algbw busbw #wrong time algbw busbw #wrong # (B) (elements) (us) (GB/s) (GB/s) (us) ∂ç (GB/s) (GB/s) 1024 16 float none -1 54.07 0.02 0.02 0 55.80 0.02 0.02 0 2048 32 float none -1 55.46 0.04 0.03 0 55.31 0.04 0.03 0 4096 64 float none -1 55.59 0.07 0.07 0 55.38 0.07 0.07 0 8192 128 float none -1 56.05 0.15 0.14 0 55.92 0.15 0.14 0 16384 256 float none -1 57.08 0.29 0.27 0 57.75 0.28 0.27 0 32768 512 float none -1 57.49 0.57 0.53 0 57.22 0.57 0.54 0 65536 1024 float none -1 59.20 1.11 1.04 0 59.20 1.11 1.04 0 131072 2048 float none -1 59.58 2.20 2.06 0 63.57 2.06 1.93 0 262144 4096 float none -1 63.87 4.10 3.85 0 63.61 4.12 3.86 0 524288 8192 float none -1 64.83 8.09 7.58 0 64.40 8.14 7.63 0 1048576 16384 float none -1 79.74 13.15 12.33 0 76.66 13.68 12.82 0 2097152 32768 float none -1 78.41 26.74 25.07 0 79.05 26.53 24.87 0 4194304 65536 float none -1 83.21 50.41 47.26 0 81.25 51.62 48.39 0 8388608 131072 float none -1 94.35 88.91 83.35 0 99.07 84.68 79.38 0 16777216 262144 float none -1 122.9 136.55 128.02 0 121.7 137.83 129.21 0 33554432 524288 float none -1 184.2 182.19 170.80 0 178.1 188.38 176.60 0 67108864 1048576 float none -1 294.7 227.75 213.51 0 277.7 241.62 226.52 0 134217728 2097152 float none -1 495.4 270.94 254.00 0 488.8 274.60 257.43 0 268435456 4194304 float none -1 877.5 305.92 286.80 0 861.3 311.65 292.17 0 536870912 8388608 float none -1 1589.8 337.71 316.60 0 1576.2 340.61 319.33 0 1073741824 16777216 float none -1 3105.7 345.74 324.13 0 3069.2 349.85 327.98 0 2147483648 33554432 float none -1 6161.7 348.52 326.74 0 6070.7 353.75 331.64 0 4294967296 67108864 float none -1 12305 349.03 327.22 0 12053 356.35 334.08 0 8589934592 134217728 float none -1 24489 350.77 328.85 0 23991 358.05 335.67 0 # Out of bounds values : 0 OK # Avg bus bandwidth : 120.248
A3 Ultra
Buat manifes
nccl-jobset-test.yamlberikut:apiVersion: jobset.x-k8s.io/v1alpha2 kind: JobSet metadata: # The name `nccl-ag` is used for an NCCL all-gather test. name: nccl-ag spec: ttlSecondsAfterFinished: 1200 suspend: False network: enableDNSHostnames: true replicatedJobs: - name: worker template: spec: parallelism: NUM_NODES completions: NUM_NODES template: metadata: annotations: networking.gke.io/default-interface: 'eth0' networking.gke.io/interfaces: | [ {"interfaceName":"eth0","network":"default"}, {"interfaceName":"eth2","network":"rdma-0"}, {"interfaceName":"eth3","network":"rdma-1"}, {"interfaceName":"eth4","network":"rdma-2"}, {"interfaceName":"eth5","network":"rdma-3"}, {"interfaceName":"eth6","network":"rdma-4"}, {"interfaceName":"eth7","network":"rdma-5"}, {"interfaceName":"eth8","network":"rdma-6"}, {"interfaceName":"eth9","network":"rdma-7"} ] spec: activeDeadlineSeconds: 3600 restartPolicy: Never nodeSelector: cloud.google.com/gke-accelerator: nvidia-h200-141gb tolerations: - key: cloud.google.com/gke-queued effect: NoSchedule value: "true" - key: "nvidia.com/gpu" operator: "Exists" effect: "NoSchedule" setHostnameAsFQDN: true volumes: - name: gib hostPath: path: /home/kubernetes/bin/gib - name: nvidia hostPath: path: /home/kubernetes/bin/nvidia - name: lib64 hostPath: path: /lib64 - name: shared-memory emptyDir: medium: "Memory" sizeLimit: 250Gi schedulingGates: - name: "gke.io/topology-aware-auto-nccl-test" containers: - name: nccl-test stdin: true tty: true image: us-docker.pkg.dev/gce-ai-infra/gpudirect-gib/nccl-plugin-gib-diagnostic:v1.0.6 securityContext: privileged: true env: - name: MY_NODE_NAME valueFrom: fieldRef: fieldPath: spec.nodeName - name: OMPI_ALLOW_RUN_AS_ROOT value: "1" - name: OMPI_ALLOW_RUN_AS_ROOT_CONFIRM value: "1" - name: N_NODES value: "NUM_NODES" - name: LD_LIBRARY_PATH value: /usr/local/nvidia/lib64 command: - bash - -c - | set -x echo "Starting workload container on ${MY_NODE_NAME} for $N_NODES benchmark" # Install ping apt update -y apt install -y iputils-ping # Start sshd /scripts/container_entry.sh daemon & # Get helper variables to form all hostnames export POSTFIX=$(hostname | cut -d . -f 2-) export WORKERS_BASENAME=$(hostname | cut -d . -f 1 | rev | cut -d - -f 2- | rev ) export NODE_RANK=$JOB_COMPLETION_INDEX # For every worker, wait till online and add to hostfile for i in `seq 0 $(($N_NODES-1))`; do OTHER=${WORKERS_BASENAME}-${i}.${POSTFIX} until ssh -p 222 -o StrictHostKeyChecking=no $OTHER hostname; do echo Waiting for ${OTHER}... sleep 10 done echo ${OTHER} port=222 slots=8 | tee -a /tmp/hostfile; done cat /tmp/hostfile # Launch from head node if [[ "${NODE_RANK}" -eq "0" ]]; then # World Level = 0x0, Rail Aligned = 0x7 export NCCL_TESTS_SPLIT_MASK="0x0"; # Force use of libnccl-gib export NCCL_NET=gIB # Set all the correct libnccl-gib environment variables source /usr/local/gib/scripts/set_nccl_env.sh # Get all relevant NCCL / env vars to pass to all workers ENV_VARS=$(echo ${!NCCL*} ${!OMPI*} LD_LIBRARY_PATH PATH | sed 's/ / -x /g') mpirun --hostfile /tmp/hostfile \ -x $ENV_VARS \ -mca plm_rsh_no_tree_spawn 1 \ --mca orte_keep_fqdn_hostnames 1 \ --mca btl self,tcp \ --mca btl_tcp_if_include eth0 \ --bind-to none \ --mca plm_rsh_agent "ssh -q -o LogLevel=ERROR -o StrictHostKeyChecking=no -p 222" \ /third_party/nccl-tests/build/all_gather_perf -b 1K -e 8G -f 2 -g 1 -w 5 --iters 100 -c 1 else while ping -c 1 ${WORKERS_BASENAME}-0.${POSTFIX}; do sleep 5 done fi exit 0 volumeMounts: - name: nvidia mountPath: /usr/local/nvidia - name: gib mountPath: /usr/local/gib - name: shared-memory mountPath: /dev/shm resources: limits: nvidia.com/gpu: 8 requests: nvidia.com/gpu: 8 restartPolicy: NeverGanti
NUM_NODESdengan jumlah node di node pool.Pastikan Anda memahami hal berikut tentang manifes ini:
- JobSet adalah Layanan Headless dengan nama yang sama dengan nama JobSet, dalam hal ini,
nccl-ag. - Gerbang penjadwalan
gke.io/topology-aware-auto-nccl-testdigunakan untuk memverifikasi bahwa Pod dijadwalkan untuk kolokasi. - Kolom
parallelismdancompletionskeduanya ditetapkan ke jumlah node yang ingin Anda gunakan untuk menjalankan pengujian NCCL.
- JobSet adalah Layanan Headless dengan nama yang sama dengan nama JobSet, dalam hal ini,
Terapkan manifes:
kubectl apply -f nccl-jobset-test.yamlPastikan beban kerja diizinkan:
kubectl get jobsetsOutputnya mirip dengan hal berikut ini:
NAME RESTARTS COMPLETED AGE nccl-ag 3sPastikan beban kerja berada dalam status
Completed:kubectl get podsOutputnya mirip dengan hal berikut ini:
NAME READY STATUS RESTARTS AGE nccl-ag-worker-0-0-n9s6j 0/1 Completed 0 9m34s nccl-ag-worker-0-1-rsf7r 0/1 Completed 0 9m34s ...Log Pod dengan pola
nccl-ag-worker-0-0-.*berisi hasil pengujian.Ambil log untuk Pod ini:
kubectl logs $(kubectl get pods -o go-template='{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' | grep nccl-ag-worker-0-0)Outputnya akan mirip dengan berikut ini:
# size count type redop root time algbw busbw #wrong time algbw busbw #wrong # (B) (elements) (us) (GB/s) (GB/s) (us) ∂ç (GB/s) (GB/s) 1024 16 float none -1 54.07 0.02 0.02 0 55.80 0.02 0.02 0 2048 32 float none -1 55.46 0.04 0.03 0 55.31 0.04 0.03 0 4096 64 float none -1 55.59 0.07 0.07 0 55.38 0.07 0.07 0 8192 128 float none -1 56.05 0.15 0.14 0 55.92 0.15 0.14 0 16384 256 float none -1 57.08 0.29 0.27 0 57.75 0.28 0.27 0 32768 512 float none -1 57.49 0.57 0.53 0 57.22 0.57 0.54 0 65536 1024 float none -1 59.20 1.11 1.04 0 59.20 1.11 1.04 0 131072 2048 float none -1 59.58 2.20 2.06 0 63.57 2.06 1.93 0 262144 4096 float none -1 63.87 4.10 3.85 0 63.61 4.12 3.86 0 524288 8192 float none -1 64.83 8.09 7.58 0 64.40 8.14 7.63 0 1048576 16384 float none -1 79.74 13.15 12.33 0 76.66 13.68 12.82 0 2097152 32768 float none -1 78.41 26.74 25.07 0 79.05 26.53 24.87 0 4194304 65536 float none -1 83.21 50.41 47.26 0 81.25 51.62 48.39 0 8388608 131072 float none -1 94.35 88.91 83.35 0 99.07 84.68 79.38 0 16777216 262144 float none -1 122.9 136.55 128.02 0 121.7 137.83 129.21 0 33554432 524288 float none -1 184.2 182.19 170.80 0 178.1 188.38 176.60 0 67108864 1048576 float none -1 294.7 227.75 213.51 0 277.7 241.62 226.52 0 134217728 2097152 float none -1 495.4 270.94 254.00 0 488.8 274.60 257.43 0 268435456 4194304 float none -1 877.5 305.92 286.80 0 861.3 311.65 292.17 0 536870912 8388608 float none -1 1589.8 337.71 316.60 0 1576.2 340.61 319.33 0 1073741824 16777216 float none -1 3105.7 345.74 324.13 0 3069.2 349.85 327.98 0 2147483648 33554432 float none -1 6161.7 348.52 326.74 0 6070.7 353.75 331.64 0 4294967296 67108864 float none -1 12305 349.03 327.22 0 12053 356.35 334.08 0 8589934592 134217728 float none -1 24489 350.77 328.85 0 23991 358.05 335.67 0 # Out of bounds values : 0 OK # Avg bus bandwidth : 120.248 ```
Langkah berikutnya
- Mengumpulkan dan Memahami Log NCCL untuk Pemecahan Masalah guna memahami output pengujian dan memecahkan masalah.
- Pelajari cara memecahkan masalah performa lambat.