"Managed Service for Apache Spark" is the new name for the products previously called "Dataproc on Compute Engine" (cluster deployment) and "Google Cloud Serverless for Apache Spark" (serverless deployment).

Create a cluster

Managed Service for Apache Spark does not allow you to create clusters with image versions earlier than 1.3.95, 1.4.77, 1.5.53, and 2.0.27, which are affected by the Apache Log4j security vulnerabilities. It also does not allow you to create clusters with image versions 0.x, 1.0.x, 1.1.x, and 1.2.x. Where possible, we recommend creating Managed Service for Apache Spark clusters with the latest sub-minor image version.

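Image version	log4j version	Customer guidance
2.0.29, 1.5.55, and 1.4.79 or later	log4j.2.17.1	Recommended
2.0.28, 1.5.54, and 1.4.78	log4j.2.17.0	Recommended
2.0.27, 1.5.53, and 1.4.77	log4j.2.16.0	Strongly recommended
2.0.26, 1.5.52, and 1.4.76 or earlier	Older versions	Stop using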

For information on specific images and log4j updates, see the Managed Service for Apache Spark release notes.
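
The version rule above can be sketched as a small pre-flight check. This is a minimal illustration, not part of the service or its client libraries: the function name `is_image_version_allowed` is hypothetical, the handling of an OS suffix such as `-debian10` is an assumption, and image tracks that the text does not mention are reported as `None` (not covered) rather than guessed.

```python
from typing import Optional

# Minimum sub-minor version per image track, taken from the text above.
MIN_SUBMINOR = {(1, 3): 95, (1, 4): 77, (1, 5): 53, (2, 0): 27}
# Tracks that can no longer be used at all (0.x is handled separately).
BLOCKED_TRACKS = {(1, 0), (1, 1), (1, 2)}


def is_image_version_allowed(version: str) -> Optional[bool]:
    """Return True/False for tracks covered by the rule, None otherwise."""
    # Assumption: an OS suffix such as "-debian10" may follow the number.
    numeric = version.split("-", 1)[0]
    major, minor, subminor = (int(part) for part in numeric.split("."))
    track = (major, minor)
    if major == 0 or track in BLOCKED_TRACKS:
        return False
    if track in MIN_SUBMINOR:
        return subminor >= MIN_SUBMINOR[track]
    # The text says nothing about other tracks, so do not guess.
    return None
```

A check like this only mirrors the documented thresholds; the release notes remain the authoritative source for which images are currently accepted.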

Create a Managed Service for Apache Spark cluster

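Requirements:

Name: The cluster name must start with a lowercase letter, followed by up to 51 lowercase letters, numbers, and hyphens. It cannot end with a hyphen.
Cluster region: You must specify a Compute Engine region for the cluster, such as us-east1 or europe-west1, to isolate cluster resources, such as VM instances and cluster metadata stored in Cloud Storage, within the region.
- For more information about Compute Engine regions, see Cluster regions.
- For information on selecting a region, see Available regions and zones. You can also run the gcloud compute regions list command to display a list of available regions.
Connectivity: The Compute Engine virtual machine instances (VMs) in a Managed Service for Apache Spark cluster, which consist of master and worker VMs, require full internal IP networking cross connectivity. The default VPC network provides this connectivity (see Managed Service for Apache Spark cluster network configuration).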
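
The naming requirement above can be checked locally before a create request is sent. The following is a minimal sketch; `is_valid_cluster_name` is a hypothetical helper, and it reads the rule as "one lowercase letter plus up to 51 further characters", which gives a maximum length of 52.

```python
import re

# First character: lowercase letter. Then up to 51 more characters drawn
# from lowercase letters, digits, and hyphens, the last of which must not
# be a hyphen (50 free characters plus 1 non-hyphen character).
_CLUSTER_NAME = re.compile(r"[a-z](?:[a-z0-9-]{0,50}[a-z0-9])?")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if name satisfies the naming rule described above."""
    return _CLUSTER_NAME.fullmatch(name) is not None
```

`fullmatch` is used instead of `match` so that a trailing newline or extra characters cannot slip past the pattern.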

gcloud

To create a Managed Service for Apache Spark cluster on the command line, run the gcloud dataproc clusters create command locally in a terminal window or in Cloud Shell.

gcloud dataproc clusters create CLUSTER_NAME \
    --region=REGION

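Running the command creates a cluster with the default Managed Service for Apache Spark service settings. The default settings specify the master and worker virtual machine instances, the disk sizes and types, the network type, the region and zone where the cluster is deployed, and other cluster settings. For information on using command-line flags to customize cluster settings, see the gcloud dataproc clusters create command.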

Create a cluster with a YAML file

Run the following gcloud command to export the configuration of an existing Managed Service for Apache Spark cluster into a cluster.yaml file.
```
gcloud dataproc clusters export EXISTING_CLUSTER_NAME \
    --region=REGION \
    --destination=cluster.yaml
```

Create a new cluster by importing the YAML file configuration.

gcloud dataproc clusters import NEW_CLUSTER_NAME \
    --region=REGION \
    --source=cluster.yaml

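Note: During the export operation, cluster-specific fields (such as the cluster name), output-only fields, and automatically applied labels are filtered out. These fields are not allowed in the YAML file that is imported to create a cluster.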
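
To illustrate the kind of filtering the note describes, the sketch below strips such fields from a cluster configuration held as a Python dictionary, for example one obtained from a source other than the export command. The field names and the `goog-dataproc-` label prefix are assumptions chosen for illustration; the text above does not enumerate the filtered fields, and `strip_for_import` is a hypothetical helper, not a gcloud or client library function.

```python
import copy

# Assumed, illustrative field names; the source does not list them.
CLUSTER_SPECIFIC_FIELDS = {"clusterName", "projectId"}
OUTPUT_ONLY_FIELDS = {"clusterUuid", "status", "statusHistory", "metrics"}
# Assumed prefix of labels that the service applies automatically.
AUTO_LABEL_PREFIX = "goog-dataproc-"


def strip_for_import(cluster: dict) -> dict:
    """Return a copy of cluster without fields an import would reject."""
    dropped = CLUSTER_SPECIFIC_FIELDS | OUTPUT_ONLY_FIELDS
    cleaned = {
        key: copy.deepcopy(value)
        for key, value in cluster.items()
        if key not in dropped
    }
    # Keep only user-defined labels; remove the key if none remain.
    labels = {
        key: value
        for key, value in cleaned.get("labels", {}).items()
        if not key.startswith(AUTO_LABEL_PREFIX)
    }
    if labels:
        cleaned["labels"] = labels
    else:
        cleaned.pop("labels", None)
    return cleaned
```

The function returns a new dictionary and leaves its input untouched, so the original configuration can still be inspected after filtering.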

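Note: If you click the [Equivalent REST or command line] links at the bottom of the left panel on the Managed Service for Apache Spark [Create a cluster] page in the console, the console constructs an equivalent API REST request or gcloud tool command, which you can use in your code or from the command line to create a cluster.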

REST

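This section shows how to create a cluster with required values and the default configuration (1 master, 2 workers).

Before using any of the request data, make the following replacements:

CLUSTER_NAME: the cluster name
PROJECT: the Google Cloud project ID
REGION: an available Compute Engine region where the cluster will be created
ZONE: an optional zone within the selected region where the cluster will be created

HTTP method and URL: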

POST https://dataproc.googleapis.com/v1/projects/PROJECT/regions/REGION/clusters

Request JSON body:

{
  "project_id":"PROJECT",
  "cluster_name":"CLUSTER_NAME",
  "config":{
    "master_config":{
      "num_instances":1,
      "machine_type_uri":"n1-standard-2",
      "image_uri":""
    },
    "softwareConfig": {
      "imageVersion": "",
      "properties": {},
      "optionalComponents": []
    },
    "worker_config":{
      "num_instances":2,
      "machine_type_uri":"n1-standard-2",
      "image_uri":""
    },
    "gce_cluster_config":{
      "zone_uri":"ZONE"
    }
  }
}
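
As a convenience, the request URL and body above can be generated from the placeholder values instead of being edited by hand. The following is a minimal sketch using only the Python standard library; `clusters_url` and `build_create_request` are hypothetical helper names, and the body simply mirrors the JSON shown above.

```python
import json

API_ROOT = "https://dataproc.googleapis.com/v1"


def clusters_url(project: str, region: str) -> str:
    """Build the clusters collection URL used for the POST request."""
    return f"{API_ROOT}/projects/{project}/regions/{region}/clusters"


def build_create_request(project: str, cluster_name: str, zone: str = "") -> dict:
    """Build a request body with 1 master and 2 workers, as shown above."""
    return {
        "project_id": project,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {
                "num_instances": 1,
                "machine_type_uri": "n1-standard-2",
                "image_uri": "",
            },
            "softwareConfig": {
                "imageVersion": "",
                "properties": {},
                "optionalComponents": [],
            },
            "worker_config": {
                "num_instances": 2,
                "machine_type_uri": "n1-standard-2",
                "image_uri": "",
            },
            "gce_cluster_config": {"zone_uri": zone},
        },
    }


def to_request_json(body: dict) -> str:
    """Serialize the body; the result can be saved as request.json."""
    return json.dumps(body, indent=2)
```

For example, `to_request_json(build_create_request("PROJECT", "CLUSTER_NAME", "ZONE"))` produces text that can be saved as the request.json file used by the commands that follow.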

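To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you in to the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and run the following command: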

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://dataproc.googleapis.com/v1/projects/PROJECT/regions/REGION/clusters"

PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and run the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://dataproc.googleapis.com/v1/projects/PROJECT/regions/REGION/clusters" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT/regions/REGION/operations/b5706e31......",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.dataproc.v1.ClusterOperationMetadata",
    "clusterName": "CLUSTER_NAME",
    "clusterUuid": "5fe882b2-...",
    "status": {
      "state": "PENDING",
      "innerState": "PENDING",
      "stateStartTime": "2019-11-21T00:37:56.220Z"
    },
    "operationType": "CREATE",
    "description": "Create cluster with 2 workers",
    "warnings": [
      "For PD-Standard without local SSDs, we strongly recommend provisioning 1TB ..."
    ]
  }
}
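
A response of this shape can be reduced to the few fields that are usually of interest when scripting. The following is a minimal sketch using the standard library; `summarize_operation` is a hypothetical helper, and the sample text reuses the placeholder values from the response above with the warnings omitted.

```python
import json

# Trimmed sample based on the response shown above (placeholders kept).
SAMPLE_RESPONSE = """
{
  "name": "projects/PROJECT/regions/REGION/operations/b5706e31......",
  "metadata": {
    "clusterName": "CLUSTER_NAME",
    "status": {"state": "PENDING", "innerState": "PENDING"},
    "operationType": "CREATE",
    "description": "Create cluster with 2 workers"
  }
}
"""


def summarize_operation(response_text: str) -> dict:
    """Extract the operation name, cluster, state, and warnings."""
    operation = json.loads(response_text)
    metadata = operation.get("metadata", {})
    return {
        "operation": operation.get("name"),
        "cluster": metadata.get("clusterName"),
        "state": metadata.get("status", {}).get("state"),
        # Warnings are optional, so default to an empty list.
        "warnings": metadata.get("warnings", []),
    }
```

The lookups use `.get` with defaults so that a response without a `warnings` or `status` entry still produces a summary instead of raising `KeyError`.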

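Console

Open the Managed Service for Apache Spark [Create a cluster] page in the Google Cloud console in your browser, then click [Create] in the [Cluster on Compute Engine] row on the [Create a Dataproc cluster on Compute Engine] page. The [Set up cluster] panel is selected, with fields filled in with default values. You can select each panel and confirm or change the default values to customize your cluster.

Click [Create] to create the cluster. The cluster name appears on the [Clusters] page, and its status is updated to [Running] after the cluster is provisioned. Click the cluster name to open the cluster details page, where you can examine the cluster's jobs, instances, and configuration settings, and connect to the web interfaces running on the cluster.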

Go

Install the client library.

Set up application default credentials.

Run the code. See Setting up your development environment.

import (
	"context"
	"fmt"
	"io"

	dataproc "cloud.google.com/go/dataproc/apiv1"
	"cloud.google.com/go/dataproc/apiv1/dataprocpb"
	"google.golang.org/api/option"
)

func createCluster(w io.Writer, projectID, region, clusterName string) error {
	// projectID := "your-project-id"
	// region := "us-central1"
	// clusterName := "your-cluster"
	ctx := context.Background()

	// Create the cluster client.
	endpoint := region + "-dataproc.googleapis.com:443"
	clusterClient, err := dataproc.NewClusterControllerClient(ctx, option.WithEndpoint(endpoint))
	if err != nil {
		return fmt.Errorf("dataproc.NewClusterControllerClient: %w", err)
	}
	defer clusterClient.Close()

	// Create the cluster config.
	req := &dataprocpb.CreateClusterRequest{
		ProjectId: projectID,
		Region:    region,
		Cluster: &dataprocpb.Cluster{
			ProjectId:   projectID,
			ClusterName: clusterName,
			Config: &dataprocpb.ClusterConfig{
				MasterConfig: &dataprocpb.InstanceGroupConfig{
					NumInstances:   1,
					MachineTypeUri: "n1-standard-2",
				},
				WorkerConfig: &dataprocpb.InstanceGroupConfig{
					NumInstances:   2,
					MachineTypeUri: "n1-standard-2",
				},
			},
		},
	}

	// Create the cluster.
	op, err := clusterClient.CreateCluster(ctx, req)
	if err != nil {
		return fmt.Errorf("CreateCluster: %w", err)
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		return fmt.Errorf("CreateCluster.Wait: %w", err)
	}

	// Output a success message.
	fmt.Fprintf(w, "Cluster created successfully: %s", resp.ClusterName)
	return nil
}

Java

Install the client library.
Set up application default credentials.

Run the code.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.dataproc.v1.Cluster;
import com.google.cloud.dataproc.v1.ClusterConfig;
import com.google.cloud.dataproc.v1.ClusterControllerClient;
import com.google.cloud.dataproc.v1.ClusterControllerSettings;
import com.google.cloud.dataproc.v1.ClusterOperationMetadata;
import com.google.cloud.dataproc.v1.InstanceGroupConfig;
import java.io.IOException;
import java.util.concurrent.ExecutionException;

public class CreateCluster {

  public static void createCluster() throws IOException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String region = "your-project-region";
    String clusterName = "your-cluster-name";
    createCluster(projectId, region, clusterName);
  }

  public static void createCluster(String projectId, String region, String clusterName)
      throws IOException, InterruptedException {
    String myEndpoint = String.format("%s-dataproc.googleapis.com:443", region);

    // Configure the settings for the cluster controller client.
    ClusterControllerSettings clusterControllerSettings =
        ClusterControllerSettings.newBuilder().setEndpoint(myEndpoint).build();

    // Create a cluster controller client with the configured settings. The client only needs to be
    // created once and can be reused for multiple requests. Using a try-with-resources
    // closes the client, but this can also be done manually with the .close() method.
    try (ClusterControllerClient clusterControllerClient =
        ClusterControllerClient.create(clusterControllerSettings)) {
      // Configure the settings for our cluster.
      InstanceGroupConfig masterConfig =
          InstanceGroupConfig.newBuilder()
              .setMachineTypeUri("n1-standard-2")
              .setNumInstances(1)
              .build();
      InstanceGroupConfig workerConfig =
          InstanceGroupConfig.newBuilder()
              .setMachineTypeUri("n1-standard-2")
              .setNumInstances(2)
              .build();
      ClusterConfig clusterConfig =
          ClusterConfig.newBuilder()
              .setMasterConfig(masterConfig)
              .setWorkerConfig(workerConfig)
              .build();
      // Create the cluster object with the desired cluster config.
      Cluster cluster =
          Cluster.newBuilder().setClusterName(clusterName).setConfig(clusterConfig).build();

      // Create the Cloud Dataproc cluster.
      OperationFuture<Cluster, ClusterOperationMetadata> createClusterAsyncRequest =
          clusterControllerClient.createClusterAsync(projectId, region, cluster);
      Cluster response = createClusterAsyncRequest.get();

      // Print out a success message.
      System.out.printf("Cluster created successfully: %s", response.getClusterName());

    } catch (ExecutionException e) {
      System.err.println(String.format("Error executing createCluster: %s ", e.getMessage()));
    }
  }
}

Node.js

Install the client library.
Set up application default credentials.

Run the code.

const dataproc = require('@google-cloud/dataproc');

// TODO(developer): Uncomment and set the following variables
// projectId = 'YOUR_PROJECT_ID'
// region = 'YOUR_CLUSTER_REGION'
// clusterName = 'YOUR_CLUSTER_NAME'

// Create a client with the endpoint set to the desired cluster region
const client = new dataproc.v1.ClusterControllerClient({
  apiEndpoint: `${region}-dataproc.googleapis.com`,
  projectId: projectId,
});

async function createCluster() {
  // Create the cluster config
  const request = {
    projectId: projectId,
    region: region,
    cluster: {
      clusterName: clusterName,
      config: {
        masterConfig: {
          numInstances: 1,
          machineTypeUri: 'n1-standard-2',
        },
        workerConfig: {
          numInstances: 2,
          machineTypeUri: 'n1-standard-2',
        },
      },
    },
  };

  // Create the cluster
  const [operation] = await client.createCluster(request);
  const [response] = await operation.promise();

  // Output a success message
  console.log(`Cluster created successfully: ${response.clusterName}`);
}

createCluster();

Python

Install the client library.

Set up application default credentials.

Run the code.

from google.cloud import dataproc_v1 as dataproc


def create_cluster(project_id, region, cluster_name):
    """This sample walks a user through creating a Cloud Dataproc cluster
    using the Python client library.

    Args:
        project_id (string): Project to use for creating resources.
        region (string): Region where the resources should live.
        cluster_name (string): Name to use for creating a cluster.
    """

    # Create a client with the endpoint set to the desired cluster region.
    cluster_client = dataproc.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    # Create the cluster config.
    cluster = {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
        },
    }

    # Create the cluster.
    operation = cluster_client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    )
    result = operation.result()

    # Output a success message.
    print(f"Cluster created successfully: {result.cluster_name}")
