Profile-based configuration for AI/ML workloads

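This document describes how to use profile-based configuration to make Cloud Storage FUSE easier to adopt for artificial intelligence and machine learning (AI/ML) workloads, and to improve their performance.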

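To simplify configuring Cloud Storage FUSE for serving, checkpointing, or training workloads, use the profile field or the --profile option to apply a preconfigured profile for your workload type. Each profile applies a predefined set of Cloud Storage FUSE settings for caching, threading, and buffer sizes that is optimized for that workload type, so your workloads can reach high performance with minimal tuning. The profile values are aiml-training, aiml-checkpointing, and aiml-serving, respectively.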

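The following Python sketch shows both ways of selecting a profile: as a gcsfuse command-line option and as a configuration-file line. It only builds the strings and does not run gcsfuse. The helper names, bucket name, and mount point are hypothetical; the --profile=VALUE syntax, the profile field, and the three profile values come from this page, and placing options before the bucket and mount point follows the usual gcsfuse [options] BUCKET MOUNT_POINT form.

```python
# Hypothetical helpers: they build, but do not run, the gcsfuse command line
# and the matching config-file line for one of the three AI/ML profiles.
VALID_PROFILES = ("aiml-training", "aiml-checkpointing", "aiml-serving")


def _check(profile: str) -> None:
    # Reject anything other than the three documented profile values.
    if profile not in VALID_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")


def profile_mount_args(bucket: str, mount_point: str, profile: str) -> list[str]:
    """Return a gcsfuse argv that selects `profile` at mount time."""
    _check(profile)
    # Options come before the positional bucket and mount-point arguments.
    return ["gcsfuse", f"--profile={profile}", bucket, mount_point]


def profile_config_line(profile: str) -> str:
    """Return the config-file line that selects the same profile."""
    _check(profile)
    return f"profile: {profile}"


if __name__ == "__main__":
    # Hypothetical bucket name and mount point.
    args = profile_mount_args("my-training-bucket", "/mnt/training-data", "aiml-training")
    print(" ".join(args))
    print(profile_config_line("aiml-training"))
```

On a host where gcsfuse is installed, you could pass the returned list to subprocess.run(args, check=True) to perform the mount.
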
Considerations

You can set the --profile option or the profile field only when you mount a bucket. To change the --profile option or the profile field, you must remount the Cloud Storage FUSE bucket.
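With profile-based configuration, Cloud Storage FUSE sets the metadata cache capacity and time to live (TTL) to unlimited, which means that entries are never evicted from the metadata cache. If the virtual machine runs low on memory, this can cause out-of-memory (OOM) errors, so review your machine's memory capacity before you apply profile-based configuration. OOM errors are more likely on machines with less than 1 TiB of memory.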
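When a Cloud Storage FUSE parameter is configured in more than one way, the following order of precedence applies, from highest to lowest:
1. Values set directly in the gcsfuse command or in the Cloud Storage FUSE configuration file.
2. Values set by the profile. You specify a profile with the --profile option of the gcsfuse command or with the profile field in the Cloud Storage FUSE configuration file.
3. Default values that Cloud Storage FUSE applies automatically when it detects a high-performance machine type. For details, see Automatic configuration values for high-performance machine types.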
Cloud Storage FUSE CSI volumes in Google Kubernetes Engine (GKE) Pods don't support the profile field or the --profile option.
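You can't enable file caching with profile-based configuration, because file caching requires Cloud Storage FUSE configuration fields and CLI options that can't be generalized, such as the cache directory location. To enable file caching for serving, training, or checkpointing workloads, configure the file caching options or fields explicitly.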
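
The precedence rules above behave like a layered lookup: for each setting, the highest-priority layer that sets it wins. This Python sketch models that with collections.ChainMap. The profile layer uses the aiml-training values listed later on this page; the machine-type defaults are made-up placeholders, not the values Cloud Storage FUSE actually applies, and resolve_settings is a hypothetical helper, not part of Cloud Storage FUSE. The explicit value caps the stat cache size, which is one way you might bound the memory use described in the OOM consideration.

```python
from collections import ChainMap


def resolve_settings(explicit: dict, profile: dict, machine_defaults: dict) -> dict:
    """Hypothetical helper: merge three layers, highest priority first.

    ChainMap looks a key up in each mapping from left to right and returns
    the first hit, mirroring: explicit values > profile > machine-type defaults.
    """
    return dict(ChainMap(explicit, profile, machine_defaults))


# Lowest priority. Placeholder numbers for illustration only; these are NOT
# the values Cloud Storage FUSE uses on high-performance machine types.
machine_defaults = {
    "metadata-cache:ttl-secs": 120,
    "metadata-cache:stat-cache-max-size-mb": 1024,
}

# Middle priority: the aiml-training profile as listed on this page.
training_profile = {
    "implicit-dirs": True,
    "metadata-cache:negative-ttl-secs": 0,
    "metadata-cache:ttl-secs": -1,
    "metadata-cache:stat-cache-max-size-mb": -1,
    "metadata-cache:type-cache-max-size-mb": -1,
}

# Highest priority: a value set directly on the command line or in the config
# file (hypothetical choice: cap the stat cache at 4096 MiB to bound memory).
explicit = {"metadata-cache:stat-cache-max-size-mb": 4096}

effective = resolve_settings(explicit, training_profile, machine_defaults)
print(effective["metadata-cache:stat-cache-max-size-mb"])  # 4096 (explicit wins)
print(effective["metadata-cache:ttl-secs"])                # -1 (profile beats default)
```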

Apply profile-based configuration to training workloads

The training profile optimizes high-throughput read performance on large datasets so that Cloud GPU and Cloud TPU hardware doesn't sit idle waiting for data.

To apply the training profile, specify profile: aiml-training in the Cloud Storage FUSE configuration file, or pass --profile=aiml-training to the Cloud Storage FUSE CLI. The profile then applies the following configuration:

  # Infer directories from object name prefixes, even without placeholder objects:
  - implicit-dirs
  # Disable caching for lookups of files or directories that don't exist:
  - metadata-cache:negative-ttl-secs:0
  # Never expire cached metadata (file attributes and types):
  - metadata-cache:ttl-secs:-1
  # Allow unlimited size for the file attribute (stat) cache:
  - metadata-cache:stat-cache-max-size-mb:-1
  # Allow unlimited size for the file/directory type cache:
  - metadata-cache:type-cache-max-size-mb:-1

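This page lists each setting in a compact section:key:value notation. If you want to pin one of these values yourself in a configuration file (where an explicit value takes precedence over the profile), it helps to see the nested layout. This Python sketch expands the notation into YAML-style text. The helper to_config_yaml is hypothetical, and mapping a bare flag such as implicit-dirs to a top-level "true" value is an assumption about how the notation translates to the configuration file.

```python
def to_config_yaml(entries: list[str]) -> str:
    """Hypothetical helper: expand 'section:key:value' entries into nested YAML text.

    A bare flag such as 'implicit-dirs' is assumed to become a top-level boolean.
    """
    top_level: list[str] = []
    sections: dict[str, list[str]] = {}
    for entry in entries:
        parts = entry.split(":")
        if len(parts) == 1:
            top_level.append(f"{parts[0]}: true")
        elif len(parts) == 3:
            section, key, value = parts
            # Group keys under their section, preserving listing order.
            sections.setdefault(section, []).append(f"  {key}: {value}")
        else:
            raise ValueError(f"unexpected entry: {entry!r}")
    lines = list(top_level)
    for section, keys in sections.items():
        lines.append(f"{section}:")
        lines.extend(keys)
    return "\n".join(lines)


# The aiml-training listing from this page.
TRAINING_PROFILE = [
    "implicit-dirs",
    "metadata-cache:negative-ttl-secs:0",
    "metadata-cache:ttl-secs:-1",
    "metadata-cache:stat-cache-max-size-mb:-1",
    "metadata-cache:type-cache-max-size-mb:-1",
]

if __name__ == "__main__":
    print(to_config_yaml(TRAINING_PROFILE))
```

For the training listing, this prints implicit-dirs: true followed by a metadata-cache: section containing the four keys, indented by two spaces.
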
Apply profile-based configuration to checkpointing workloads

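The checkpointing profile optimizes high-throughput write performance for large files. It substantially reduces the time it takes to save multi-gigabyte checkpoints, which minimizes pauses in training.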

To apply the checkpointing profile, specify profile: aiml-checkpointing in the Cloud Storage FUSE configuration file, or pass --profile=aiml-checkpointing to the Cloud Storage FUSE CLI. The profile then applies the following configuration:

  # Infer directories from object name prefixes, even without placeholder objects:
  - implicit-dirs
  # Disable caching for lookups of files or directories that don't exist:
  - metadata-cache:negative-ttl-secs:0
  # Never expire cached metadata (file attributes and types):
  - metadata-cache:ttl-secs:-1
  # Allow unlimited size for the file attribute (stat) cache:
  - metadata-cache:stat-cache-max-size-mb:-1
  # Allow unlimited size for the file/directory type cache:
  - metadata-cache:type-cache-max-size-mb:-1
  # Download and cache the whole file even when the first read starts at a
  # non-zero offset (takes effect only when file caching is enabled):
  - file-cache:cache-file-for-range-read:true
  # Allow renaming directories that contain many files in buckets without hierarchical namespace (non-HNS):
  - file-system:rename-dir-limit:200000

Apply profile-based configuration to serving workloads

The serving profile optimizes the performance of serving workloads by improving data access and caching.

To apply the serving profile, specify profile: aiml-serving in the Cloud Storage FUSE configuration file, or pass --profile=aiml-serving to the Cloud Storage FUSE CLI. The profile then applies the following configuration:

  # Infer directories from object name prefixes, even without placeholder objects:
  - implicit-dirs
  # Disable caching for lookups of files or directories that don't exist:
  - metadata-cache:negative-ttl-secs:0
  # Never expire cached metadata (file attributes and types):
  - metadata-cache:ttl-secs:-1
  # Allow unlimited size for the file attribute (stat) cache:
  - metadata-cache:stat-cache-max-size-mb:-1
  # Allow unlimited size for the file/directory type cache:
  - metadata-cache:type-cache-max-size-mb:-1
  # Download and cache the whole file even when the first read starts at a
  # non-zero offset (takes effect only when file caching is enabled):
  - file-cache:cache-file-for-range-read:true
  # Cache directory listings in the kernel with no expiry, which suits a read-only file hierarchy:
  - file-system:kernel-list-cache-ttl-secs:-1

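The three profiles share most of their settings. This Python sketch copies the three listings from this page verbatim and uses set operations to separate the settings common to all profiles from the ones each profile adds. The helper split_shared is hypothetical and exists only for this comparison.

```python
# The three profile listings from this page, copied verbatim.
PROFILES = {
    "aiml-training": [
        "implicit-dirs",
        "metadata-cache:negative-ttl-secs:0",
        "metadata-cache:ttl-secs:-1",
        "metadata-cache:stat-cache-max-size-mb:-1",
        "metadata-cache:type-cache-max-size-mb:-1",
    ],
    "aiml-checkpointing": [
        "implicit-dirs",
        "metadata-cache:negative-ttl-secs:0",
        "metadata-cache:ttl-secs:-1",
        "metadata-cache:stat-cache-max-size-mb:-1",
        "metadata-cache:type-cache-max-size-mb:-1",
        "file-cache:cache-file-for-range-read:true",
        "file-system:rename-dir-limit:200000",
    ],
    "aiml-serving": [
        "implicit-dirs",
        "metadata-cache:negative-ttl-secs:0",
        "metadata-cache:ttl-secs:-1",
        "metadata-cache:stat-cache-max-size-mb:-1",
        "metadata-cache:type-cache-max-size-mb:-1",
        "file-cache:cache-file-for-range-read:true",
        "file-system:kernel-list-cache-ttl-secs:-1",
    ],
}


def split_shared(profiles: dict[str, list[str]]) -> tuple[set[str], dict[str, list[str]]]:
    """Hypothetical helper: return (settings in every profile, sorted extras per profile)."""
    as_sets = {name: set(entries) for name, entries in profiles.items()}
    # Settings that appear in every profile.
    shared = set.intersection(*as_sets.values())
    extras = {name: sorted(entries - shared) for name, entries in as_sets.items()}
    return shared, extras


if __name__ == "__main__":
    shared, extras = split_shared(PROFILES)
    print("shared:", sorted(shared))
    for name, extra in extras.items():
        print(f"{name} adds:", extra)
```

With these listings, aiml-training consists only of the shared implicit-dirs and metadata-cache settings; aiml-checkpointing and aiml-serving both add cache-file-for-range-read, and they differ only in rename-dir-limit versus kernel-list-cache-ttl-secs.
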
What's next

Learn about automatic configuration values for high-performance machine types.
Learn how to optimize performance with preconfigured GKE YAML files.
