"Managed Service for Apache Spark" is the new name for the product formerly known as "Dataproc on Compute Engine" (cluster deployment) and "Google Cloud Serverless for Apache Spark" (serverless deployment).


Create a cluster

Managed Service for Apache Spark does not allow you to create clusters with image versions earlier than 1.3.95, 1.4.77, 1.5.53, and 2.0.27, which are affected by Apache Log4j security vulnerabilities. It also does not allow you to create clusters with image versions 0.x, 1.0.x, 1.1.x, and 1.2.x. Where possible, create Managed Service for Apache Spark clusters with the latest sub-minor image version.

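Image version	log4j version	Customer guidance
2.0.29, 1.5.55, and 1.4.79 or later	log4j.2.17.1	Recommended
2.0.28, 1.5.54, and 1.4.78	log4j.2.17.0	Recommended
2.0.27, 1.5.53, and 1.4.77	log4j.2.16.0	Strongly recommended
2.0.26, 1.5.52, and 1.4.76 or earlier	Older version	Discontinue use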

For information about specific image and log4j updates, see the Managed Service for Apache Spark release notes.

Creating a cluster

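Requirements:

Name: The cluster name must start with a lowercase letter, followed by up to 51 lowercase letters, numbers, and hyphens. It cannot end with a hyphen.
Cluster region: You must specify a Compute Engine region for the cluster, such as us-east1 or europe-west1, to isolate cluster resources, such as VM instances and cluster metadata stored in Cloud Storage, within the region.
- For more information about Compute Engine regions, see Cluster regions.
- For help choosing a region, see Available regions and zones. You can also run the gcloud compute regions list command to list the available regions.
Connectivity: The Compute Engine virtual machine instances (VMs) in a Managed Service for Apache Spark cluster, which consist of master and worker VMs, require full internal IP networking cross connectivity. The default VPC network provides this connectivity (see Managed Service for Apache Spark cluster network configuration).
Machine type (recommended): Specifying machine types is optional, but we recommend explicitly choosing the machine types for the master and worker VMs in the cluster. If you don't specify machine types, Managed Service for Apache Spark selects them dynamically based on resource availability. This dynamic selection can cause both cost and performance to vary.
- For more information about choosing a machine type, see Supported machine types.
- To reduce the chance that resources are unavailable, consider using flexible VMs, which let you specify a list of acceptable machine types.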
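
The naming rule above can be checked locally before a create request is sent. The following is a minimal sketch, not part of the service's tooling: the function name `is_valid_cluster_name` is hypothetical, and the pattern encodes only the rule as stated here (a lowercase letter, then up to 51 lowercase letters, digits, or hyphens, not ending in a hyphen). The service may apply additional checks.

```python
import re

# Pattern derived from the stated rule only (an assumption, not an official regex):
# one lowercase letter, then at most 51 more characters drawn from lowercase
# letters, digits, and hyphens, where the last character is not a hyphen.
_CLUSTER_NAME_RE = re.compile(r"[a-z](?:[a-z0-9-]{0,50}[a-z0-9])?")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rule described above."""
    return _CLUSTER_NAME_RE.fullmatch(name) is not None


for candidate in ("my-cluster-1", "My-Cluster", "cluster-"):
    print(candidate, is_valid_cluster_name(candidate))
```

Running a check like this first can catch a malformed name before the request is made.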

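Console

Open the [Create a cluster] page in the Google Cloud console to view the default cluster settings. Review or change the displayed defaults, and click [Additional configuration] to customize the cluster further.

Click [Create cluster] to create the cluster. The cluster name appears on the [Clusters] page, and its status changes to Running once the cluster is provisioned. Click the cluster name to open the cluster details page, where you can review the cluster's jobs, instances, and configuration settings, and connect to the web interfaces running on the cluster.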

gcloud

To create a Managed Service for Apache Spark cluster from the command line, run the gcloud dataproc clusters create command in a local terminal window or in Cloud Shell.

gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION \
    --master-machine-type=MASTER_MACHINE_TYPE \
    --worker-machine-type=WORKER_MACHINE_TYPE

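This command creates a cluster. The master and worker machine types are optional, but we recommend setting them explicitly with the --master-machine-type and --worker-machine-type flags (for example, n4-standard-4) to keep cost and performance consistent. If you don't specify machine types, default machine types are selected dynamically based on resource availability. For information about customizing cluster settings with command-line flags, see the gcloud dataproc clusters create command.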
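
When the command is generated from a script, the optional machine-type flags can be added only when a value is supplied. The sketch below is illustrative: the helper name `build_create_command` and the sample values are assumptions, and only the flags shown in the command above are used.

```python
import shlex
from typing import List, Optional


def build_create_command(
    cluster_name: str,
    region: str,
    master_machine_type: Optional[str] = None,
    worker_machine_type: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `gcloud dataproc clusters create`."""
    args = ["gcloud", "dataproc", "clusters", "create", cluster_name, f"--region={region}"]
    # Machine types are optional; add each flag only when a value is given.
    if master_machine_type:
        args.append(f"--master-machine-type={master_machine_type}")
    if worker_machine_type:
        args.append(f"--worker-machine-type={worker_machine_type}")
    return args


# Hypothetical sample values for illustration.
cmd = build_create_command("example-cluster", "us-east1", "n4-standard-4", "n4-standard-4")
print(shlex.join(cmd))
```

To run the command instead of printing it, the list can be passed to `subprocess.run(cmd, check=True)`, assuming the gcloud CLI is installed and signed in.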

Create a cluster using a YAML file

Run the following gcloud command to export the configuration of an existing Managed Service for Apache Spark cluster to a cluster.yaml file:
```
gcloud dataproc clusters export EXISTING_CLUSTER_NAME \
    --region=REGION \
    --destination=cluster.yaml
```

Create a new cluster by importing the YAML file configuration:

gcloud dataproc clusters import NEW_CLUSTER_NAME \
    --region=REGION \
    --source=cluster.yaml

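**Note:** During the export operation, cluster-specific fields (such as the cluster name), output-only fields, and automatically applied labels are filtered out. These fields are not allowed in a YAML file that is imported to create a cluster.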

REST

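This section shows how to create a cluster. Specifying machine types is optional, but to keep cost and performance consistent, we recommend explicitly including machine_type_uri in master_config and worker_config (for example, n4-standard-4). If you don't specify machine types, default machine types are selected dynamically based on resource availability.

Before using any of the request data, make the following replacements:

CLUSTER_NAME: The cluster name.
PROJECT: The Google Cloud project ID.
REGION: An available Compute Engine region in which to create the cluster.
ZONE: (Optional) A zone within the selected region in which to create the cluster.
MASTER_MACHINE_TYPE: (Recommended) The machine type for the master node (for example, n4-standard-4).
WORKER_MACHINE_TYPE: (Recommended) The machine type for the worker nodes (for example, n4-standard-4).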

HTTP method and URL:

POST https://dataproc.googleapis.com/v1/projects/PROJECT/regions/REGION/clusters

Request JSON body:

{
  "project_id":"PROJECT",
  "cluster_name":"CLUSTER_NAME",
  "config":{
    "master_config":{
      "num_instances":1,
      "machine_type_uri":"MASTER_MACHINE_TYPE",
      "image_uri":""
    },
    "softwareConfig": {
      "imageVersion": "",
      "properties": {},
      "optionalComponents": []
    },
    "worker_config":{
      "num_instances":2,
      "machine_type_uri":"WORKER_MACHINE_TYPE",
      "image_uri":""
    },
    "gce_cluster_config":{
      "zone_uri":"ZONE"
    }
  }
}
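
The request body can also be generated programmatically, which makes it easy to leave out optional fields. The following is a minimal sketch that reuses the field names from the JSON above. The helper name `build_create_request` and the sample values are hypothetical, and the sketch assumes that optional fields left out of the body (zone, image, and software settings) fall back to service defaults.

```python
import json
from typing import Any, Dict, Optional


def build_create_request(
    project: str,
    cluster_name: str,
    master_machine_type: Optional[str] = None,
    worker_machine_type: Optional[str] = None,
    zone: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a request body with the same field names as the JSON above."""
    master: Dict[str, Any] = {"num_instances": 1}
    worker: Dict[str, Any] = {"num_instances": 2}
    # Include machine_type_uri only when a machine type is given.
    if master_machine_type:
        master["machine_type_uri"] = master_machine_type
    if worker_machine_type:
        worker["machine_type_uri"] = worker_machine_type

    config: Dict[str, Any] = {"master_config": master, "worker_config": worker}
    # The zone is optional; include gce_cluster_config only when it is set.
    if zone:
        config["gce_cluster_config"] = {"zone_uri": zone}

    return {"project_id": project, "cluster_name": cluster_name, "config": config}


# Hypothetical sample values for illustration.
body = build_create_request(
    "example-project", "example-cluster", "n4-standard-4", "n4-standard-4"
)
request_json = json.dumps(body, indent=2)
print(request_json)
```

The printed text can be saved as request.json for use with the commands that follow.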

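To send your request, use one of the following options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have signed in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or that you are using Cloud Shell, which signs you in to the gcloud CLI automatically. To check the currently active account, run gcloud auth list.

Save the request body in a file named request.json, and then run the following command: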

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://dataproc.googleapis.com/v1/projects/PROJECT/regions/REGION/clusters"

PowerShell (Windows)

Note: The following command assumes that you have signed in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. To check the currently active account, run gcloud auth list.

Save the request body in a file named request.json, and then run the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://dataproc.googleapis.com/v1/projects/PROJECT/regions/REGION/clusters" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT/regions/REGION/operations/b5706e31......",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.dataproc.v1.ClusterOperationMetadata",
    "clusterName": "CLUSTER_NAME",
    "clusterUuid": "5fe882b2-...",
    "status": {
      "state": "PENDING",
      "innerState": "PENDING",
      "stateStartTime": "2019-11-21T00:37:56.220Z"
    },
    "operationType": "CREATE",
    "description": "Create cluster with 2 workers",
    "warnings": [
      "For PD-Standard without local SSDs, we strongly recommend provisioning 1TB ..."
    ]
  }
}
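
The response describes a create operation that is still in progress (its state is PENDING), not a finished cluster. A script that sends the request might pull out the operation ID, the state, and any warnings. The sketch below is one way to do that, under stated assumptions: the function name `summarize_operation` is hypothetical, and the sample response contains placeholder identifiers in the same shape as the response above.

```python
import json

# Sample response in the shape shown above; the identifiers are placeholders.
SAMPLE_RESPONSE = """
{
  "name": "projects/example-project/regions/us-east1/operations/example-operation-id",
  "metadata": {
    "clusterName": "example-cluster",
    "status": {"state": "PENDING"},
    "operationType": "CREATE",
    "warnings": []
  }
}
"""


def summarize_operation(response_text: str) -> dict:
    """Extract the operation ID, cluster name, state, and warnings."""
    response = json.loads(response_text)
    metadata = response.get("metadata", {})
    return {
        # The operation ID is the last path segment of the "name" field.
        "operation_id": response["name"].rsplit("/", 1)[-1],
        "cluster_name": metadata.get("clusterName"),
        "state": metadata.get("status", {}).get("state"),
        "warnings": metadata.get("warnings", []),
    }


summary = summarize_operation(SAMPLE_RESPONSE)
print(summary)
```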

Go

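Install the client library.
Set up Application Default Credentials.

Run the code.

Note: Specifying machine types is optional, but to keep cost and performance consistent, we recommend explicitly setting the master and worker machine types in the cluster configuration (for example, n4-standard-4). If you omit them, default machine types are selected dynamically based on resource availability.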

import (
	"context"
	"fmt"
	"io"

	dataproc "cloud.google.com/go/dataproc/apiv1"
	"cloud.google.com/go/dataproc/apiv1/dataprocpb"
	"google.golang.org/api/option"
)

func createCluster(w io.Writer, projectID, region, clusterName string) error {
	// projectID := "your-project-id"
	// region := "us-central1"
	// clusterName := "your-cluster"
	ctx := context.Background()

	// Create the cluster client.
	endpoint := region + "-dataproc.googleapis.com:443"
	clusterClient, err := dataproc.NewClusterControllerClient(ctx, option.WithEndpoint(endpoint))
	if err != nil {
		return fmt.Errorf("dataproc.NewClusterControllerClient: %w", err)
	}
	defer clusterClient.Close()

	// Create the cluster config.
	req := &dataprocpb.CreateClusterRequest{
		ProjectId: projectID,
		Region:    region,
		Cluster: &dataprocpb.Cluster{
			ProjectId:   projectID,
			ClusterName: clusterName,
			Config: &dataprocpb.ClusterConfig{
				MasterConfig: &dataprocpb.InstanceGroupConfig{
					NumInstances:   1,
					MachineTypeUri: "n1-standard-2",
				},
				WorkerConfig: &dataprocpb.InstanceGroupConfig{
					NumInstances:   2,
					MachineTypeUri: "n1-standard-2",
				},
			},
		},
	}

	// Create the cluster.
	op, err := clusterClient.CreateCluster(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateCluster: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("CreateCluster.Wait: %w", err)
	}

	// Output a success message.
	fmt.Fprintf(w, "Cluster created successfully: %s", resp.ClusterName)
	return nil
}

Java

Install the client library.
Set up Application Default Credentials.

Run the code.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.dataproc.v1.Cluster;
import com.google.cloud.dataproc.v1.ClusterConfig;
import com.google.cloud.dataproc.v1.ClusterControllerClient;
import com.google.cloud.dataproc.v1.ClusterControllerSettings;
import com.google.cloud.dataproc.v1.ClusterOperationMetadata;
import com.google.cloud.dataproc.v1.InstanceGroupConfig;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class CreateCluster {

  public static void createCluster() throws IOException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String region = "your-project-region";
    String clusterName = "your-cluster-name";
    createCluster(projectId, region, clusterName);
  }

  public static void createCluster(String projectId, String region, String clusterName)
      throws IOException, InterruptedException {
    String myEndpoint = String.format("%s-dataproc.googleapis.com:443", region);

    // Configure the settings for the cluster controller client.
    ClusterControllerSettings clusterControllerSettings =
        ClusterControllerSettings.newBuilder().setEndpoint(myEndpoint).build();

    // Create a cluster controller client with the configured settings. The client only needs to be
    // created once and can be reused for multiple requests. Using a try-with-resources
    // closes the client, but this can also be done manually with the .close() method.
    try (ClusterControllerClient clusterControllerClient =
        ClusterControllerClient.create(clusterControllerSettings)) {
      // Configure the settings for our cluster.
      InstanceGroupConfig masterConfig =
          InstanceGroupConfig.newBuilder()
              .setMachineTypeUri("n1-standard-2")
              .setNumInstances(1)
              .build();
      InstanceGroupConfig workerConfig =
          InstanceGroupConfig.newBuilder()
              .setMachineTypeUri("n1-standard-2")
              .setNumInstances(2)
              .build();
      ClusterConfig clusterConfig =
          ClusterConfig.newBuilder()
              .setMasterConfig(masterConfig)
              .setWorkerConfig(workerConfig)
              .build();
      // Create the cluster object with the desired cluster config.
      Cluster cluster =
          Cluster.newBuilder().setClusterName(clusterName).setConfig(clusterConfig).build();

      // Create the Cloud Dataproc cluster.
      OperationFuture<Cluster, ClusterOperationMetadata> createClusterAsyncRequest =
          clusterControllerClient.createClusterAsync(projectId, region, cluster);
      Cluster response = createClusterAsyncRequest.get();

      // Print out a success message.
      System.out.printf("Cluster created successfully: %s", response.getClusterName());

    } catch (ExecutionException e) {
      System.err.println(String.format("Error executing createCluster: %s ", e.getMessage()));
    }
  }
}

Node.js

Install the client library.
Set up Application Default Credentials.

Run the code.

const dataproc = require('@google-cloud/dataproc');

// TODO(developer): Uncomment and set the following variables
// projectId = 'YOUR_PROJECT_ID'
// region = 'YOUR_CLUSTER_REGION'
// clusterName = 'YOUR_CLUSTER_NAME'

// Create a client with the endpoint set to the desired cluster region
const client = new dataproc.v1.ClusterControllerClient({
  apiEndpoint: `${region}-dataproc.googleapis.com`,
  projectId: projectId,
});

async function createCluster() {
  // Create the cluster config
  const request = {
    projectId: projectId,
    region: region,
    cluster: {
      clusterName: clusterName,
      config: {
        masterConfig: {
          numInstances: 1,
          machineTypeUri: 'n1-standard-2',
        },
        workerConfig: {
          numInstances: 2,
          machineTypeUri: 'n1-standard-2',
        },
      },
    },
  };

  // Create the cluster
  const [operation] = await client.createCluster(request);
  const [response] = await operation.promise();

  // Output a success message
  console.log(`Cluster created successfully: ${response.clusterName}`);
}

createCluster();

Python

Install the client library.
Set up Application Default Credentials.

Run the code.

from google.cloud import dataproc_v1 as dataproc


def create_cluster(project_id, region, cluster_name):
    """This sample walks a user through creating a Cloud Dataproc cluster
    using the Python client library.

    Args:
        project_id (string): Project to use for creating resources.
        region (string): Region where the resources should live.
        cluster_name (string): Name to use for creating a cluster.
    """

    # Create a client with the endpoint set to the desired cluster region.
    cluster_client = dataproc.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    # Create the cluster config.
    cluster = {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
        },
    }

    # Create the cluster.
    operation = cluster_client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    )
    result = operation.result()

    # Output a success message.
    print(f"Cluster created successfully: {result.cluster_name}")
