Create a search data store

To create a data store and ingest data for search, go to the section for the source that you plan to use:

Create a data store using website content
Import from BigQuery
Import from Cloud Storage
Sync from Google Drive
Sync from Gmail (Public preview)
Sync from Google Sites (GA)
Sync from Google Calendar (Public preview)
Sync from Google Groups (Public preview)
Import from Cloud SQL
Import from Spanner (Public preview)
Import from Firestore
Import from Bigtable (Public preview)
Import from AlloyDB for PostgreSQL (Public preview)
Upload structured JSON data with the API
Create a data store using Terraform

To sync data from a third-party data source instead, see Connect a third-party data source.

For troubleshooting information, see Troubleshoot data ingestion.

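Create a data store using website content

Use the following procedure to create a data store and index websites.

To use a website data store after you create it, you must attach it to an app that has Enterprise features turned on. You can turn on Enterprise edition when you create the app. This incurs additional costs. See Create a search app and About advanced features.

Before you begin

If your website uses a robots.txt file, update it. For more information, see how to prepare your website's robots.txt file.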
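
Before you update robots.txt, it can help to check locally which URLs a given crawler is allowed to fetch. The following is a minimal sketch that uses Python's standard `urllib.robotparser`. The robots.txt content and the `ExampleIndexerBot` user-agent token are hypothetical placeholders, because this page does not name the crawler's user agent; the robots.txt preparation guide is the place to find the real token. Python's parser applies the first matching rule within an entry, which may differ from how a particular crawler resolves conflicting rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; replace it with your site's actual file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleIndexerBot
Allow: /docs/
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleIndexerBot", "https://example.com/docs/page.html"),
        ("ExampleIndexerBot", "https://example.com/blog/post"),
        ("OtherBot", "https://example.com/private/report"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

If a URL that you intend to index comes back as disallowed for the crawler's user agent, adjust the rules before you create the data store.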

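Procedure

Console

To use the Google Cloud console to create a data store and index websites, follow these steps:

In the Google Cloud console, go to the [AI Applications] page.
In the navigation menu, click [Data Stores].
Click [Create data store].
On the [Source] page, select [Website Content].
Choose whether to turn on advanced website indexing for this data store. If you turn on advanced website indexing now, you can't turn it off later.

Advanced website indexing provides additional features such as search summarization, search with follow-ups, and extractive answers. It incurs additional cost and requires that you verify ownership of the domain of every website that you index. For more information, see Advanced website indexing and Pricing.
In the [Sites to include] field, enter the URL patterns that match the websites that you want to include in your data store. Enter one URL pattern per line, without comma separators. Example: example.com/docs/*
Optional: In the [Sites to exclude] field, enter the URL patterns that you want to exclude from your data store.

Excluded sites take priority over included sites. So if you include example.com/docs/* but exclude example.com, no websites are indexed. For more information, see Website data.
Click [Continue].
Select a location for your data store.
- When you create a basic website search data store, the location is always set to [global (Global)].
- When you create a data store with advanced website indexing, you can select a location. Because indexed websites must be public, Google strongly recommends that you select [global (Global)]. This ensures maximum availability of all search and answering services and avoids the limitations of regional data stores.
Enter a name for your data store.
Click [Create]. Vertex AI Search creates your data store and displays it on the [Data Stores] page.
To view information about your data store, click its name in the [Name] column. The data store page appears.
- If you turned on [Advanced website indexing], a warning appears that prompts you to verify the domains in your data store.
- If you have a quota shortfall (the number of pages in the websites that you specified exceeds the "Number of documents per project" quota for your project), an additional warning appears that prompts you to upgrade your quota.
To verify the domains for the URL patterns in your data store, follow the instructions in Verify website domains.
To upgrade your quota, follow these steps:
1. Click [Upgrade quota]. The [IAM and Admin] page of the Google Cloud console appears.
2. Follow the instructions in Request a quota adjustment in the Google Cloud documentation. The quota to increase is the number of documents in the Discovery Engine API service.
3. After you submit your request for a higher quota limit, go back to the [AI Applications] page and click [Data Stores] in the navigation menu.
4. Click the name of your data store in the [Name] column. The [Status] column indicates that indexing is in progress for the websites that exceeded the quota. When the [Status] column for a URL shows [Indexed], advanced website indexing features are available for that URL or URL pattern.
For more information, see Quota for web page indexing on the "Quotas and limits" page.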
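
The precedence rule from the [Sites to exclude] step above can be sketched in Python. The matching logic here is an approximation chosen for illustration: a pattern that contains `*` is treated as a glob, and a pattern without `*` is assumed to cover that site or path and everything beneath it. The exact matching that Vertex AI Search applies is not specified on this page, and `matches` and `is_indexed` are hypothetical helper names.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def matches(pattern: str, url: str) -> bool:
    """Approximate URL-pattern matching (an assumption, not the service's matcher)."""
    if "*" in pattern:
        # Glob-style match; "*" also matches "/" here.
        return fnmatchcase(url, pattern)
    # No wildcard: the pattern covers itself and everything under it.
    return url == pattern or url.startswith(pattern.rstrip("/") + "/")


def is_indexed(url: str, include: Iterable[str], exclude: Iterable[str]) -> bool:
    """Excluded patterns take priority over included patterns."""
    if any(matches(p, url) for p in exclude):
        return False
    return any(matches(p, url) for p in include)


if __name__ == "__main__":
    include = ["example.com/docs/*"]
    # Excluding the whole site overrides the narrower include pattern.
    print(is_indexed("example.com/docs/intro", include, ["example.com"]))  # False
    # Excluding only a subtree leaves the rest of /docs/ indexed.
    print(is_indexed("example.com/docs/intro", include, ["example.com/docs/internal/*"]))  # True
```

Checking the exclude list first is what makes an overly broad exclusion such as example.com remove every included page.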

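Python

For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store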


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import websites

from google.api_core.client_options import ClientOptions

from google.cloud import discoveryengine_v1 as discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# NOTE: Do not include http or https protocol in the URI pattern
# uri_pattern = "cloud.google.com/generative-ai-app-builder/docs/*"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.SiteSearchEngineServiceClient(
    client_options=client_options
)

# The full resource name of the data store
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}
site_search_engine = client.site_search_engine_path(
    project=project_id, location=location, data_store=data_store_id
)

# Target Site to index
target_site = discoveryengine.TargetSite(
    provided_uri_pattern=uri_pattern,
    # Options: INCLUDE, EXCLUDE
    type_=discoveryengine.TargetSite.Type.INCLUDE,
    exact_match=False,
)

# Make the request
operation = client.create_target_site(
    parent=site_search_engine,
    target_site=target_site,
)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.CreateTargetSiteMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
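
The sample's note says not to include the http or https protocol in the URI pattern. If your patterns come from a list of full URLs, a small helper can strip the scheme before you pass the value as `uri_pattern`. This is a sketch; `to_uri_pattern` is a hypothetical helper name and is not part of the client library.

```python
def to_uri_pattern(url: str) -> str:
    """Drop a leading http:// or https:// so the value can be used as a URI pattern."""
    for scheme in ("https://", "http://"):
        # Compare case-insensitively, but slice the original string.
        if url.lower().startswith(scheme):
            return url[len(scheme):]
    return url


if __name__ == "__main__":
    print(to_uri_pattern("https://cloud.google.com/generative-ai-app-builder/docs/*"))
    # cloud.google.com/generative-ai-app-builder/docs/*
```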

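Next steps

- To attach your website data store to an app, create an app with Enterprise features turned on and select your data store by following the steps in Create a search app.
- If you turned on advanced website indexing, you can use structured data to update your schema.
- To preview how your search results appear after your app and data store are set up, see Get search results.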

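Import from BigQuery

Vertex AI Search supports searching over BigQuery data.

You can create data stores from BigQuery tables in two ways:

- One-time ingestion: You import data from a BigQuery table into a data store. The data in the data store doesn't change unless you manually refresh it.
- Periodic ingestion: You import data from one or more BigQuery tables and set a sync frequency. The frequency determines how often the data stores are updated with the most recent data from the BigQuery dataset.

The following table compares the two ways to import BigQuery data into Vertex AI Search data stores.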

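| One-time ingestion | Periodic ingestion |
| --- | --- |
| Generally available (GA). | Public preview. |
| Data must be refreshed manually. | Data updates automatically every one, three, or five days. Data cannot be refreshed manually. |
| Vertex AI Search creates a single data store from one table in BigQuery. | Vertex AI Search creates a data connector for a BigQuery dataset and a data store (called an entity data store) for each table that you specify. For each data connector, the tables must have the same data type (for example, structured) and be in the same BigQuery dataset. |
| To combine data from multiple tables in one data store, first ingest data from one table, and then ingest data from another source or BigQuery table. | Because manual data import is not supported, the data in an entity data store can come from only one BigQuery table. |
| Data source access control is supported. | Data source access control is not supported. The imported data can contain access controls, but these controls are not applied. |
| You can create a data store by using either the Google Cloud console or the API. | You must use the console to create data connectors and their entity data stores. |
| CMEK-compliant. | CMEK-compliant. |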
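
When you combine several tables through successive one-time imports, the result depends on the reconciliation mode, which this page describes with the REST import parameters: `INCREMENTAL` upserts documents, while `FULL` also removes documents that are not in the source. The following toy model uses plain dictionaries keyed by document ID to illustrate the difference. It is a sketch of the documented behavior, not the service's implementation, and `apply_import` is a hypothetical name.

```python
def apply_import(store: dict, source: dict, mode: str = "INCREMENTAL") -> dict:
    """Return the data store contents after importing `source`.

    Both arguments map document ID -> document content.
    """
    if mode == "INCREMENTAL":
        # Upsert: keep existing documents, add new ones, overwrite matching IDs.
        return {**store, **source}
    if mode == "FULL":
        # Rebase: the result mirrors the source; documents absent from it are dropped.
        return dict(source)
    raise ValueError(f"unknown reconciliation mode: {mode}")


if __name__ == "__main__":
    store = {"a": "v1", "b": "v1"}   # documents already in the data store
    table = {"b": "v2", "c": "v1"}   # documents in the table being imported
    print(apply_import(store, table, "INCREMENTAL"))  # {'a': 'v1', 'b': 'v2', 'c': 'v1'}
    print(apply_import(store, table, "FULL"))         # {'b': 'v2', 'c': 'v1'}
```

Under this model, importing a second table with `FULL` would discard the documents that came from the first table, so `INCREMENTAL` is the mode that fits the "combine multiple tables" case.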

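Import once from BigQuery

To ingest data from a BigQuery table, use the following steps to create a data store and ingest the data by using either the Google Cloud console or the API.

Before you import your data, prepare the data for ingestion.

Console

To use the Google Cloud console to ingest data from BigQuery, follow these steps:

In the Google Cloud console, go to the [AI Applications] page.
Go to the [Data Stores] page.
Click [Create data store].
On the [Source] page, select [BigQuery].
In the [What kind of data are you importing] section, select the type of data that you are importing.
In the [Synchronization frequency] section, select [One time].
In the [BigQuery path] field, click [Browse], select the table that you prepared for ingestion, and then click [Select]. Alternatively, enter the table location directly in the [BigQuery path] field.
Click [Continue].
If you are doing a one-time import of structured data:
1. Map fields to key properties.
2. If important fields are missing from the schema, use [Add new field] to add them.
  
  For more information, see About auto-detect and edit.
3. Click [Continue].
Choose a region for your data store.
Enter a name for your data store.
Click [Create].
To check the status of your ingestion, go to the [Data Stores] page and click your data store name to see its details on the [Data] page. When the status column on the [Activity] tab changes from "In progress" to "Import completed", the ingestion is complete.

Depending on the size of your data, ingestion can take from several minutes to several hours.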

REST

To use the command line to create a data store and import data from BigQuery, follow these steps.

Create a data store.
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DATA_STORE_DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
```
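Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.

Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
- DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.
Optional: If you are uploading unstructured data and want to configure document parsing or turn on document chunking for RAG, specify the documentProcessingConfig object and include it in your data store creation request. If you are ingesting scanned PDFs, configuring an OCR parser for PDFs is recommended. For how to configure parsing or chunking options, see Parse and chunk documents.
Import data from BigQuery.

If you defined a schema, make sure that the data conforms to that schema.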
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
-d '{
  "bigquerySource": {
    "projectId": "PROJECT_ID",
    "datasetId":"DATASET_ID",
    "tableId": "TABLE_ID",
    "dataSchema": "DATA_SCHEMA",
    "aclEnabled": "BOOLEAN"
  },
  "reconciliationMode": "RECONCILIATION_MODE",
  "autoGenerateIds": "AUTO_GENERATE_IDS",
  "idField": "ID_FIELD",
  "errorConfig": {
    "gcsPrefix": "ERROR_DIRECTORY"
  }
}'
```
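Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATA_STORE_ID: the ID of the Vertex AI Search data store.
- DATASET_ID: the ID of the BigQuery dataset.
- TABLE_ID: the ID of the BigQuery table.
  - If the BigQuery table is not under PROJECT_ID, you must grant the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com the "BigQuery Data Viewer" permission on the BigQuery table. For example, if you are importing a BigQuery table from source project "123" into destination project "456", grant service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permissions on the BigQuery table under project "123".
- DATA_SCHEMA: optional. Values are document and custom. The default is document.
  - document: the BigQuery table that you use must conform to the default BigQuery schema provided in Prepare data for ingesting. You can define the ID of each document yourself, while wrapping all the data in the jsonData string.
  - custom: any BigQuery table schema is accepted, and Vertex AI Search automatically generates the ID of each imported document.
- ERROR_DIRECTORY: optional. A Cloud Storage directory for error information about the import, for example gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty so that Vertex AI Search creates a temporary directory automatically.
- RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from BigQuery to your data store. This is an upsert operation: new documents are added, and existing documents are replaced by updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. That is, new and updated documents are added to your data store, and documents that are not in BigQuery are removed from your data store. FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: optional. Specifies whether to generate document IDs automatically. If set to true, document IDs are generated based on a hash of the payload. Generated document IDs might not remain consistent across multiple imports. If you auto-generate IDs across multiple imports, Google strongly recommends setting reconciliationMode to FULL to keep document IDs consistent.
  
  Specify autoGenerateIds only when bigquerySource.dataSchema is set to custom. Otherwise, an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or you set it to false, you must specify idField. Otherwise, the documents fail to import.
- ID_FIELD: optional. Specifies which field contains the document IDs. For BigQuery sources, idField is the name of the column in the BigQuery table that contains the document IDs.
  
  Specify idField only when (1) bigquerySource.dataSchema is set to custom, and (2) autoGenerateIds is set to false or is unspecified. Otherwise, an INVALID_ARGUMENT error is returned.
  
  The values in the BigQuery column must be of string type, must be between 1 and 63 characters long, and must conform to RFC-1034. Otherwise, the documents fail to import.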
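
The rules that tie dataSchema, autoGenerateIds, and idField together can be checked locally before you send the request. The following sketch builds the JSON body for the `documents:import` call shown above and raises `ValueError` for the combinations that this page says produce INVALID_ARGUMENT or a failed import. `build_import_body` is a hypothetical helper and the project, dataset, and table names are placeholders. It omits aclEnabled, and the service remains the authority on what is valid.

```python
import json
from typing import Optional


def build_import_body(
    project_id: str,
    dataset_id: str,
    table_id: str,
    *,
    data_schema: str = "document",
    reconciliation_mode: str = "INCREMENTAL",
    auto_generate_ids: bool = False,
    id_field: Optional[str] = None,
    error_directory: Optional[str] = None,
) -> dict:
    """Build a documents:import request body for a BigQuery source."""
    if data_schema not in ("document", "custom"):
        raise ValueError("dataSchema must be 'document' or 'custom'")
    if reconciliation_mode not in ("INCREMENTAL", "FULL"):
        raise ValueError("reconciliationMode must be 'INCREMENTAL' or 'FULL'")
    # autoGenerateIds and idField are only allowed with the custom schema.
    if data_schema != "custom" and (auto_generate_ids or id_field):
        raise ValueError("autoGenerateIds and idField require dataSchema 'custom'")
    # idField is only allowed when autoGenerateIds is false or unset.
    if auto_generate_ids and id_field:
        raise ValueError("idField is only valid when autoGenerateIds is false or unset")
    # With the custom schema, one of the two must supply the document IDs.
    if data_schema == "custom" and not auto_generate_ids and not id_field:
        raise ValueError("with dataSchema 'custom', set autoGenerateIds or idField")

    body = {
        "bigquerySource": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
            "dataSchema": data_schema,
        },
        "reconciliationMode": reconciliation_mode,
    }
    if auto_generate_ids:
        body["autoGenerateIds"] = True
    if id_field:
        body["idField"] = id_field
    if error_directory:
        body["errorConfig"] = {"gcsPrefix": error_directory}
    return body


if __name__ == "__main__":
    body = build_import_body(
        "my-project", "my_dataset", "my_table",
        data_schema="custom", id_field="doc_id",
    )
    print(json.dumps(body, indent=2))
```

The printed JSON can be passed to the curl command's `-d` option in place of the placeholder body.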

C#

For more information, see the Vertex AI Search C# API reference documentation.

Create a data store

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataStoreServiceClientSnippets
{
    /// <summary>Snippet for CreateDataStore</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataStoreRequestObject()
    {
        // Create client
        DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.Create();
        // Initialize request argument(s)
        CreateDataStoreRequest request = new CreateDataStoreRequest
        {
            ParentAsCollectionName = CollectionName.FromProjectLocationCollection("[PROJECT]", "[LOCATION]", "[COLLECTION]"),
            DataStore = new DataStore(),
            DataStoreId = "",
            CreateAdvancedSiteSearch = false,
            CmekConfigNameAsCmekConfigName = CmekConfigName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            SkipDefaultSchemaCreation = false,
        };
        // Make the request
        Operation<DataStore, CreateDataStoreMetadata> response = dataStoreServiceClient.CreateDataStore(request);

        // Poll until the returned long-running operation is complete
        Operation<DataStore, CreateDataStoreMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataStore result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataStore, CreateDataStoreMetadata> retrievedResponse = dataStoreServiceClient.PollOnceCreateDataStore(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataStore retrievedResult = retrievedResponse.Result;
        }
    }
}

Import documents

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDocumentServiceClientSnippets
{
    /// <summary>Snippet for ImportDocuments</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ImportDocumentsRequestObject()
    {
        // Create client
        DocumentServiceClient documentServiceClient = DocumentServiceClient.Create();
        // Initialize request argument(s)
        ImportDocumentsRequest request = new ImportDocumentsRequest
        {
            ParentAsBranchName = BranchName.FromProjectLocationDataStoreBranch("[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]"),
            InlineSource = new ImportDocumentsRequest.Types.InlineSource(),
            ErrorConfig = new ImportErrorConfig(),
            ReconciliationMode = ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,
            UpdateMask = new FieldMask(),
            AutoGenerateIds = false,
            IdField = "",
            ForceRefreshContent = false,
        };
        // Make the request
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> response = documentServiceClient.ImportDocuments(request);

        // Poll until the returned long-running operation is complete
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        ImportDocumentsResponse result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> retrievedResponse = documentServiceClient.PollOnceImportDocuments(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            ImportDocumentsResponse retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

For more information, see the Vertex AI Search Go API reference documentation.

Create a data store


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDataStoreClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.CreateDataStoreRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#CreateDataStoreRequest.
	}
	op, err := c.CreateDataStore(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Import documents


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDocumentClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.ImportDocumentsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#ImportDocumentsRequest.
	}
	op, err := c.ImportDocuments(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

For more information, see the Vertex AI Search Java API reference documentation.

Create a data store

import com.google.cloud.discoveryengine.v1.CollectionName;
import com.google.cloud.discoveryengine.v1.CreateDataStoreRequest;
import com.google.cloud.discoveryengine.v1.DataStore;
import com.google.cloud.discoveryengine.v1.DataStoreServiceClient;

public class SyncCreateDataStore {

  public static void main(String[] args) throws Exception {
    syncCreateDataStore();
  }

  public static void syncCreateDataStore() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.create()) {
      CreateDataStoreRequest request =
          CreateDataStoreRequest.newBuilder()
              .setParent(CollectionName.of("[PROJECT]", "[LOCATION]", "[COLLECTION]").toString())
              .setDataStore(DataStore.newBuilder().build())
              .setDataStoreId("dataStoreId929489618")
              .setCreateAdvancedSiteSearch(true)
              .setSkipDefaultSchemaCreation(true)
              .build();
      DataStore response = dataStoreServiceClient.createDataStoreAsync(request).get();
    }
  }
}

Import documents

import com.google.cloud.discoveryengine.v1.BranchName;
import com.google.cloud.discoveryengine.v1.DocumentServiceClient;
import com.google.cloud.discoveryengine.v1.ImportDocumentsRequest;
import com.google.cloud.discoveryengine.v1.ImportDocumentsResponse;
import com.google.cloud.discoveryengine.v1.ImportErrorConfig;
import com.google.protobuf.FieldMask;

public class SyncImportDocuments {

  public static void main(String[] args) throws Exception {
    syncImportDocuments();
  }

  public static void syncImportDocuments() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DocumentServiceClient documentServiceClient = DocumentServiceClient.create()) {
      ImportDocumentsRequest request =
          ImportDocumentsRequest.newBuilder()
              .setParent(
                  BranchName.ofProjectLocationDataStoreBranchName(
                          "[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]")
                      .toString())
              .setErrorConfig(ImportErrorConfig.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setAutoGenerateIds(true)
              .setIdField("idField1629396127")
              .setForceRefreshContent(true)
              .build();
      ImportDocumentsResponse response = documentServiceClient.importDocumentsAsync(request).get();
    }
  }
}

Node.js

For more information, see the Vertex AI Search Node.js API reference documentation.

Create a data store

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Resource name of the CmekConfig to use for protecting this DataStore.
 */
// const cmekConfigName = 'abc123'
/**
 *  DataStore without CMEK protections. If a default CmekConfig is set for
 *  the project, setting this field will override the default CmekConfig as
 *  well.
 */
// const disableCmek = true
/**
 *  Required. The parent resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}`.
 */
// const parent = 'abc123'
/**
 *  Required. The DataStore google.cloud.discoveryengine.v1.DataStore  to
 *  create.
 */
// const dataStore = {}
/**
 *  Required. The ID to use for the
 *  DataStore google.cloud.discoveryengine.v1.DataStore, which will become
 *  the final component of the
 *  DataStore google.cloud.discoveryengine.v1.DataStore's resource name.
 *  This field must conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  standard with a length limit of 63 characters. Otherwise, an
 *  INVALID_ARGUMENT error is returned.
 */
// const dataStoreId = 'abc123'
/**
 *  A boolean flag indicating whether user want to directly create an advanced
 *  data store for site search.
 *  If the data store is not configured as site
 *  search (GENERIC vertical and PUBLIC_WEBSITE content_config), this flag will
 *  be ignored.
 */
// const createAdvancedSiteSearch = true
/**
 *  A boolean flag indicating whether to skip the default schema creation for
 *  the data store. Only enable this flag if you are certain that the default
 *  schema is incompatible with your use case.
 *  If set to true, you must manually create a schema for the data store before
 *  any documents can be ingested.
 *  This flag cannot be specified if `data_store.starting_schema` is specified.
 */
// const skipDefaultSchemaCreation = true

// Imports the Discoveryengine library
const {DataStoreServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DataStoreServiceClient();

async function callCreateDataStore() {
  // Construct request
  const request = {
    parent,
    dataStore,
    dataStoreId,
  };

  // Run request
  const [operation] = await discoveryengineClient.createDataStore(request);
  const [response] = await operation.promise();
  console.log(response);
}

callCreateDataStore();

Import documents

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  The Inline source for the input content for documents.
 */
// const inlineSource = {}
/**
 *  Cloud Storage location for the input content.
 */
// const gcsSource = {}
/**
 *  BigQuery input source.
 */
// const bigquerySource = {}
/**
 *  FhirStore input source.
 */
// const fhirStoreSource = {}
/**
 *  Spanner input source.
 */
// const spannerSource = {}
/**
 *  Cloud SQL input source.
 */
// const cloudSqlSource = {}
/**
 *  Firestore input source.
 */
// const firestoreSource = {}
/**
 *  AlloyDB input source.
 */
// const alloyDbSource = {}
/**
 *  Cloud Bigtable input source.
 */
// const bigtableSource = {}
/**
 *  Required. The parent branch resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`.
 *  Requires create/update permission.
 */
// const parent = 'abc123'
/**
 *  The desired location of errors incurred during the Import.
 */
// const errorConfig = {}
/**
 *  The mode of reconciliation between existing documents and the documents to
 *  be imported. Defaults to
 *  ReconciliationMode.INCREMENTAL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL.
 */
// const reconciliationMode = {}
/**
 *  Indicates which fields in the provided imported documents to update. If
 *  not set, the default is to update all fields.
 */
// const updateMask = {}
/**
 *  Whether to automatically generate IDs for the documents if absent.
 *  If set to `true`,
 *  Document.id google.cloud.discoveryengine.v1.Document.id s are
 *  automatically generated based on the hash of the payload, where IDs may not
 *  be consistent during multiple imports. In which case
 *  ReconciliationMode.FULL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.FULL 
 *  is highly recommended to avoid duplicate contents. If unset or set to
 *  `false`, Document.id google.cloud.discoveryengine.v1.Document.id s have
 *  to be specified using
 *  id_field google.cloud.discoveryengine.v1.ImportDocumentsRequest.id_field,
 *  otherwise, documents without IDs fail to be imported.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const autoGenerateIds = true
/**
 *  The field indicates the ID field or column to be used as unique IDs of
 *  the documents.
 *  For GcsSource google.cloud.discoveryengine.v1.GcsSource  it is the key of
 *  the JSON field. For instance, `my_id` for JSON `{"my_id": "some_uuid"}`.
 *  For others, it may be the column name of the table where the unique ids are
 *  stored.
 *  The values of the JSON field or the table column are used as the
 *  Document.id google.cloud.discoveryengine.v1.Document.id s. The JSON field
 *  or the table column must be of string type, and the values must be set as
 *  valid strings conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  with 1-63 characters. Otherwise, documents without valid IDs fail to be
 *  imported.
 *  Only set this field when
 *  auto_generate_ids google.cloud.discoveryengine.v1.ImportDocumentsRequest.auto_generate_ids 
 *  is unset or set as `false`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  If it is unset, a default value `_id` is used when importing from the
 *  allowed data sources.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const idField = 'abc123'
/**
 *  Optional. Whether to force refresh the unstructured content of the
 *  documents.
 *  If set to `true`, the content part of the documents will be refreshed
 *  regardless of the update status of the referencing content.
 */
// const forceRefreshContent = true

// Imports the Discoveryengine library
const {DocumentServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DocumentServiceClient();

async function callImportDocuments() {
  // Construct request
  const request = {
    parent,
  };

  // Run request
  const [operation] = await discoveryengineClient.importDocuments(request);
  const [response] = await operation.promise();
  console.log(response);
}

callImportDocuments();

Python

For more information, see the Vertex AI Search Python API reference documentation.

Create a data store


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name
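
The sample above chooses the API endpoint from `location` and builds the parent collection name before calling the service. The following standalone sketch mirrors only that string logic so it can be checked without credentials. The helper names `api_endpoint_for` and `collection_parent` are hypothetical, and `"eu"` is used purely as an example of a non-global location.

```python
from typing import Optional


def api_endpoint_for(location: str) -> Optional[str]:
    """Return the location-specific endpoint host, or None for the default endpoint."""
    # The sample passes no client options at all when the location is "global".
    if location == "global":
        return None
    return f"{location}-discoveryengine.googleapis.com"


def collection_parent(
    project_id: str, location: str, collection: str = "default_collection"
) -> str:
    """Build the parent resource name in the format shown in the sample's comment."""
    return f"projects/{project_id}/locations/{location}/collections/{collection}"


if __name__ == "__main__":
    print(api_endpoint_for("eu"))
    print(collection_parent("my-project", "global"))
```

In real code, prefer `client.collection_path(...)` as in the sample; the sketch is only meant to make the resulting strings visible.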

Import documents


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# bigquery_dataset = "YOUR_BIGQUERY_DATASET"
# bigquery_table = "YOUR_BIGQUERY_TABLE"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    bigquery_source=discoveryengine.BigQuerySource(
        project_id=project_id,
        dataset_id=bigquery_dataset,
        table_id=bigquery_table,
        data_schema="custom",
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
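
The `reconciliation_mode` set in the request above controls how imported documents are merged with documents already in the data store. As a rough mental model only (an in-memory simulation, not how the service is implemented), `INCREMENTAL` behaves like an upsert keyed on document ID, while `FULL` also drops documents that are absent from the source. The `reconcile` function and the dictionaries below are hypothetical.

```python
from typing import Dict


def reconcile(
    store: Dict[str, dict], source: Dict[str, dict], mode: str = "INCREMENTAL"
) -> Dict[str, dict]:
    """Return the simulated data store contents after importing `source`."""
    if mode == "INCREMENTAL":
        # Upsert: add new documents and replace existing ones that share an ID.
        merged = dict(store)
        merged.update(source)
        return merged
    if mode == "FULL":
        # Full rebase: only documents present in the source remain.
        return dict(source)
    raise ValueError(f"unknown reconciliation mode: {mode}")


store = {"a": {"title": "old A"}, "b": {"title": "B"}}
source = {"a": {"title": "new A"}, "c": {"title": "C"}}

print(sorted(reconcile(store, source, "INCREMENTAL")))  # ['a', 'b', 'c']
print(sorted(reconcile(store, source, "FULL")))  # ['a', 'c']
```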

Ruby

For more information, see the Vertex AI Search Ruby API reference documentation.

Create a data store

require "google/cloud/discovery_engine/v1"

##
# Snippet for the create_data_store call in the DataStoreService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client#create_data_store.
#
def create_data_store
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::CreateDataStoreRequest.new

  # Call the create_data_store method.
  result = client.create_data_store request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Import documents

require "google/cloud/discovery_engine/v1"

##
# Snippet for the import_documents call in the DocumentService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DocumentService::Client#import_documents.
#
def import_documents
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DocumentService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new

  # Call the import_documents method.
  result = client.import_documents request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

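Connect to BigQuery with periodic syncing

Note: This feature is a Preview offering, subject to the "Pre-GA Offerings Terms" of the GCP Service Specific Terms. Pre-GA products and features may have limited support, and changes to pre-GA products and features may not be compatible with other pre-GA versions. For more information, see the launch stage descriptions. Further, by using this feature, you agree to the Generative AI Preview terms and conditions ("Preview Terms"). For this feature, you can process personal data as outlined in the Cloud Data Processing Addendum, subject to applicable restrictions and obligations in the Agreement (as defined in the Preview Terms).

Before importing your data, see Prepare data for ingesting.

The following procedure describes how to create a data connector that associates a BigQuery dataset with a Vertex AI Search data connector, and how to specify a table in the dataset for each data store that you want to create. Data stores that are children of a data connector are called entity data stores.

Data from the dataset is synced periodically to the entity data stores. You can set the sync frequency to daily, every three days, or every five days.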

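Console

To use the Google Cloud console to create a connector that periodically syncs data from a BigQuery dataset to Vertex AI Search, follow these steps:

In the Google Cloud console, go to the [AI Applications] page.

AI Applications
In the navigation menu, click [Data Stores].
Click [Create data store].
On the [Source] page, select [BigQuery].
Select the kind of data that you are importing.
Click [Periodic].
For [Sync frequency], select how often the Vertex AI Search connector syncs with the BigQuery dataset. You can change the frequency later.
In the [BigQuery dataset path] field, click [Browse] and select the dataset that contains the tables that you prepared for ingesting. Alternatively, enter the table location directly in the [BigQuery path] field. The format for the path is projectname.datasetname.
In the [Tables to sync] field, click [Browse] and select a table that contains the data that you want for your data store.
Note:
Make sure that the data in the table matches the kind of data that you selected earlier (step 5).
If there is a mismatch, you won't find out until one of the following happens:
- The connector tries to import the data and returns an error.
- You see unexpected results. This happens when the selected type is structured but should have been unstructured or structured with metadata. The data is imported, but the content URLs or metadata aren't recognized and are treated as strings.
If the dataset contains other tables that you want to use for data stores, click [Add table] and specify those tables as well.
Click [Continue].
Choose a region for your data store, enter a name for your data connector, and click [Create].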

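You have now created a data connector, which periodically syncs data with the BigQuery dataset. You have also created one or more entity data stores. The data stores have the same names as the BigQuery tables.
To check the status of your ingestion, go to the [Data Stores] page and click your data connector name to see details on its [Data] page > [Data ingestion activity] tab. When the status column on the [Activity] tab changes from [In progress] to [Succeeded], the first ingestion is complete.

Depending on the size of your data, ingestion can take from several minutes to several hours.

After you set up your data source and import data for the first time, the data store syncs data from that source at the frequency that you selected during setup. The first sync occurs about one hour after the data connector is created. The next sync then occurs about 24, 72, or 120 hours later, depending on the frequency.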
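
The timing described above can be turned into a rough schedule estimate. The following is a minimal sketch, assuming that each interval is measured from the previous sync and that the "about" figures are exact; actual sync times will vary. The function name `estimate_sync_times` and the frequency keys are hypothetical and are not part of any Vertex AI Search API.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# Hypothetical keys for the three frequencies offered in the console.
SYNC_INTERVAL_HOURS = {"daily": 24, "every_3_days": 72, "every_5_days": 120}


def estimate_sync_times(
    created_at: datetime, frequency: str, count: int = 3
) -> List[datetime]:
    """Estimate the first `count` sync times for a data connector."""
    first_sync = created_at + timedelta(hours=1)  # about 1 hour after creation
    interval = timedelta(hours=SYNC_INTERVAL_HOURS[frequency])
    return [first_sync + i * interval for i in range(count)]


created = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
for sync_time in estimate_sync_times(created, "every_3_days"):
    print(sync_time.isoformat())
```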

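Next steps

To attach your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.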

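Import from Cloud Storage

You can create data stores from Cloud Storage data in two ways:

One-time ingestion: You import data from a Cloud Storage folder or file into a data store. The data in the data store doesn't change unless you manually refresh it.
Periodic ingestion: You import data from a Cloud Storage folder or file and set a sync frequency. The frequency determines how often the data store is updated with the most recent data from that Cloud Storage location.

The following table compares the two ways to import Cloud Storage data into Vertex AI Search data stores.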

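One-time ingestion	Periodic ingestion
Generally available (GA).	Public preview.
Data must be refreshed manually.	Data updates automatically every one, three, or five days. Data can't be refreshed manually.
Vertex AI Search creates a single data store from one folder or file in Cloud Storage.	Vertex AI Search creates a data connector and associates a data store (called an entity data store) with the specified file or folder. Each Cloud Storage data connector can have a single entity data store.
To combine data from multiple files, folders, and buckets in one data store, first ingest data from one Cloud Storage location and then ingest data from another location.	Manual data import is not supported. The data in an entity data store can come from only one Cloud Storage file or folder.
Data source access control is supported. For more information, see Data source access control.	Data source access control is not supported. The imported data can contain access controls, but these controls are not enforced.
You can create a data store by using either the Google Cloud console or the API.	You must use the console to create data connectors and their entity data stores.
CMEK-compliant.	CMEK-compliant.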

Import once from Cloud Storage

To ingest data from Cloud Storage, use the following steps to create a data store and ingest data by using either the Google Cloud console or the API.

Before importing your data, see Prepare data for ingesting.

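Console

To use the console to ingest data from a Cloud Storage bucket, follow these steps:

In the Google Cloud console, go to the [AI Applications] page.

AI Applications
Go to the [Data Stores] page.
Click [Create data store].
On the [Source] page, select [Cloud Storage].
In the [Select a folder or a file you want to import] section, select [Folder] or [File].
Click [Browse], choose the data that you prepared for ingesting, and then click [Select]. Alternatively, enter the location directly in the [gs://] field.
Select the kind of data that you are importing.
Click [Continue].
If you are doing a one-time import of structured data:
1. Map the fields to key properties.
2. If important fields are missing from the schema, use [Add new field] to add them.
  
  For more information, see About auto-detect and edit.
3. Click [Continue].
Choose a region for your data store.
Enter a name for your data store.
Optional: If you selected unstructured documents, you can select parsing and chunking options for your documents. To compare parsers, see Parse documents. For information about chunking, see Chunk documents for RAG.

The OCR parser and the layout parser can incur additional costs. See Document AI feature pricing.

To select a parser, expand [Document processing options] and specify the parser options that you want to use.
Click [Create].
To check the status of your ingestion, go to the [Data Stores] page and click your data store name to see details on its [Data] page. When the status column on the [Activity] tab changes from "In progress" to "Import completed", the ingestion is complete.

Depending on the size of your data, ingestion can take from several minutes to several hours.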

REST

To use the command line to create a data store and ingest data from Cloud Storage, follow these steps.

Create a data store.
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DATA_STORE_DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
```
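Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.

Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
- DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.
Optional: If you are uploading unstructured data and want to configure document parsing or turn on document chunking for RAG, specify the documentProcessingConfig object and include it in your data store creation request. If you are ingesting scanned PDFs, configuring an OCR parser for PDFs is recommended. For how to configure parsing or chunking options, see Parse and chunk documents.
Import data from Cloud Storage.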
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "gcsSource": {
      "inputUris": ["INPUT_FILE_PATTERN_1", "INPUT_FILE_PATTERN_2"],
      "dataSchema": "DATA_SCHEMA"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD",
    "errorConfig": {
      "gcsPrefix": "ERROR_DIRECTORY"
    }
  }'
```
Replace the following:
- PROJECT_ID: the ID of your Google Cloud project.
- DATA_STORE_ID: the ID of the Vertex AI Search data store.
- INPUT_FILE_PATTERN: a file pattern in Cloud Storage that contains your documents.
  
  For structured data, or for unstructured data with metadata, an example of an input file pattern is gs://<your-gcs-bucket>/directory/object.json. An example of a pattern that matches one or more files is gs://<your-gcs-bucket>/directory/*.json.
  
  For unstructured documents, an example is gs://<your-gcs-bucket>/directory/*.pdf. Each file that matches the pattern becomes a document.
  
  If <your-gcs-bucket> is not under PROJECT_ID, you need to grant the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com "Storage Object Viewer" permissions on the Cloud Storage bucket. For example, if you are importing a Cloud Storage bucket from source project "123" into destination project "456", grant service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permissions on the Cloud Storage bucket in project "123".
- DATA_SCHEMA: optional. Values are document, custom, csv, and content. The default is document.
  - document: Upload unstructured data together with metadata for the unstructured documents. Each line of the file must follow one of the following formats. You can define the ID of each document.
    - { "id": "<your-id>", "jsonData": "<JSON string>", "content": { "mimeType": "<application/pdf or text/html>", "uri": "gs://<your-gcs-bucket>/directory/filename.pdf" } }
    - { "id": "<your-id>", "structData": <JSON object>, "content": { "mimeType": "<application/pdf or text/html>", "uri": "gs://<your-gcs-bucket>/directory/filename.pdf" } }
  - custom: Upload JSON for structured documents. The data is organized according to a schema. You can specify the schema; otherwise it is auto-detected. You can put the JSON string of the document directly on each line in a consistent format. Vertex AI Search automatically generates the ID of each imported document.
  - content: Upload unstructured documents (PDF, HTML, DOC, TXT, PPTX). The ID of each document is automatically generated as the first 128 bits of SHA256(GCS_URI), encoded as a hex string. You can specify multiple input file patterns as long as the matched files don't exceed the limit of 100,000 files.
  - csv: Include a header row in your CSV file, with each header mapped to a document field. Specify the path to the CSV file by using the inputUris field.
- ERROR_DIRECTORY: optional. A Cloud Storage directory for error information about the import, for example gs://<your-gcs-bucket>/directory/import_errors. Leaving this field empty is recommended so that Vertex AI Search automatically creates a temporary directory.
- RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Cloud Storage to your data store. This performs an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Cloud Storage are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, setting reconciliationMode to FULL is strongly recommended to keep document IDs consistent.
  
  Specify autoGenerateIds only when gcsSource.dataSchema is set to custom or csv. Otherwise, an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or you set it to false, you must specify idField. Otherwise, the documents fail to import.
- ID_FIELD: optional. Specifies which field holds the document ID. For Cloud Storage source documents, idField specifies the name of the JSON field that is the document ID. For example, if {"my_id":"some_uuid"} is the document ID field in one of your documents, specify "idField":"my_id". This identifies all JSON fields named "my_id" as document IDs.
  
  Specify this field only when (1) gcsSource.dataSchema is set to custom or csv, and (2) auto_generate_ids is set to false or is unspecified. Otherwise, an INVALID_ARGUMENT error is returned.
  
  The value of the Cloud Storage JSON field must be of string type, must be between 1 and 63 characters long, and must conform to RFC-1034. Otherwise, the documents fail to import.
  
  The JSON field name specified by id_field must be of string type, must be between 1 and 63 characters long, and must conform to RFC-1034. Otherwise, the documents fail to import.
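
The parameter rules above can be checked locally before sending the import request. The following is a minimal sketch, not an official client: `build_import_body` and `content_document_id` are hypothetical helper names, the checks cover only the INVALID_ARGUMENT conditions stated above, and hashing the UTF-8 bytes of the exact `gs://` URI is an assumption about how SHA256(GCS_URI) is computed.

```python
import hashlib
import json
from typing import List, Optional

DATA_SCHEMAS = {"document", "custom", "csv", "content"}
ID_OPTION_SCHEMAS = {"custom", "csv"}  # schemas that accept autoGenerateIds / idField


def content_document_id(gcs_uri: str) -> str:
    """First 128 bits of SHA-256(gcs_uri) as hex (32 characters). UTF-8 is assumed."""
    return hashlib.sha256(gcs_uri.encode("utf-8")).hexdigest()[:32]


def build_import_body(
    input_uris: List[str],
    data_schema: str = "document",
    reconciliation_mode: str = "INCREMENTAL",
    auto_generate_ids: Optional[bool] = None,
    id_field: Optional[str] = None,
) -> dict:
    """Build the JSON body for documents:import, rejecting invalid combinations."""
    if data_schema not in DATA_SCHEMAS:
        raise ValueError(f"unsupported dataSchema: {data_schema}")
    if reconciliation_mode not in {"FULL", "INCREMENTAL"}:
        raise ValueError(f"unsupported reconciliationMode: {reconciliation_mode}")
    uses_id_options = auto_generate_ids is not None or id_field is not None
    if uses_id_options and data_schema not in ID_OPTION_SCHEMAS:
        raise ValueError("autoGenerateIds and idField require dataSchema custom or csv")
    if auto_generate_ids and id_field is not None:
        raise ValueError("idField is allowed only when autoGenerateIds is false or unset")

    body = {
        "gcsSource": {"inputUris": list(input_uris), "dataSchema": data_schema},
        "reconciliationMode": reconciliation_mode,
    }
    if auto_generate_ids is not None:
        body["autoGenerateIds"] = auto_generate_ids
    if id_field is not None:
        body["idField"] = id_field
    return body


if __name__ == "__main__":
    body = build_import_body(
        ["gs://my-bucket/directory/*.json"], data_schema="custom", id_field="my_id"
    )
    print(json.dumps(body, indent=2))
    print(content_document_id("gs://my-bucket/directory/filename.pdf"))
```

The printed JSON can be passed to `curl` with `-d`, which avoids hand-editing mistakes such as a trailing comma in the request body.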

C#

For more information, see the Vertex AI Search C# API reference documentation.

Create a data store

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataStoreServiceClientSnippets
{
    /// <summary>Snippet for CreateDataStore</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataStoreRequestObject()
    {
        // Create client
        DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.Create();
        // Initialize request argument(s)
        CreateDataStoreRequest request = new CreateDataStoreRequest
        {
            ParentAsCollectionName = CollectionName.FromProjectLocationCollection("[PROJECT]", "[LOCATION]", "[COLLECTION]"),
            DataStore = new DataStore(),
            DataStoreId = "",
            CreateAdvancedSiteSearch = false,
            CmekConfigNameAsCmekConfigName = CmekConfigName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            SkipDefaultSchemaCreation = false,
        };
        // Make the request
        Operation<DataStore, CreateDataStoreMetadata> response = dataStoreServiceClient.CreateDataStore(request);

        // Poll until the returned long-running operation is complete
        Operation<DataStore, CreateDataStoreMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataStore result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataStore, CreateDataStoreMetadata> retrievedResponse = dataStoreServiceClient.PollOnceCreateDataStore(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataStore retrievedResult = retrievedResponse.Result;
        }
    }
}

Import documents

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDocumentServiceClientSnippets
{
    /// <summary>Snippet for ImportDocuments</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ImportDocumentsRequestObject()
    {
        // Create client
        DocumentServiceClient documentServiceClient = DocumentServiceClient.Create();
        // Initialize request argument(s)
        ImportDocumentsRequest request = new ImportDocumentsRequest
        {
            ParentAsBranchName = BranchName.FromProjectLocationDataStoreBranch("[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]"),
            InlineSource = new ImportDocumentsRequest.Types.InlineSource(),
            ErrorConfig = new ImportErrorConfig(),
            ReconciliationMode = ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,
            UpdateMask = new FieldMask(),
            AutoGenerateIds = false,
            IdField = "",
            ForceRefreshContent = false,
        };
        // Make the request
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> response = documentServiceClient.ImportDocuments(request);

        // Poll until the returned long-running operation is complete
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        ImportDocumentsResponse result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> retrievedResponse = documentServiceClient.PollOnceImportDocuments(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            ImportDocumentsResponse retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

For more information, see the Vertex AI Search Go API reference documentation.

Create a data store


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDataStoreClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.CreateDataStoreRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#CreateDataStoreRequest.
	}
	op, err := c.CreateDataStore(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Import documents


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDocumentClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.ImportDocumentsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#ImportDocumentsRequest.
	}
	op, err := c.ImportDocuments(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

For more information, see the Vertex AI Search Java API reference documentation.

Create a data store

import com.google.cloud.discoveryengine.v1.CollectionName;
import com.google.cloud.discoveryengine.v1.CreateDataStoreRequest;
import com.google.cloud.discoveryengine.v1.DataStore;
import com.google.cloud.discoveryengine.v1.DataStoreServiceClient;

public class SyncCreateDataStore {

  public static void main(String[] args) throws Exception {
    syncCreateDataStore();
  }

  public static void syncCreateDataStore() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.create()) {
      CreateDataStoreRequest request =
          CreateDataStoreRequest.newBuilder()
              .setParent(CollectionName.of("[PROJECT]", "[LOCATION]", "[COLLECTION]").toString())
              .setDataStore(DataStore.newBuilder().build())
              .setDataStoreId("dataStoreId929489618")
              .setCreateAdvancedSiteSearch(true)
              .setSkipDefaultSchemaCreation(true)
              .build();
      DataStore response = dataStoreServiceClient.createDataStoreAsync(request).get();
    }
  }
}

Import documents

import com.google.cloud.discoveryengine.v1.BranchName;
import com.google.cloud.discoveryengine.v1.DocumentServiceClient;
import com.google.cloud.discoveryengine.v1.ImportDocumentsRequest;
import com.google.cloud.discoveryengine.v1.ImportDocumentsResponse;
import com.google.cloud.discoveryengine.v1.ImportErrorConfig;
import com.google.protobuf.FieldMask;

public class SyncImportDocuments {

  public static void main(String[] args) throws Exception {
    syncImportDocuments();
  }

  public static void syncImportDocuments() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DocumentServiceClient documentServiceClient = DocumentServiceClient.create()) {
      ImportDocumentsRequest request =
          ImportDocumentsRequest.newBuilder()
              .setParent(
                  BranchName.ofProjectLocationDataStoreBranchName(
                          "[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]")
                      .toString())
              .setErrorConfig(ImportErrorConfig.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
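              // Set idField only when autoGenerateIds is unset or false; sending both
              // returns an INVALID_ARGUMENT error.
              .setAutoGenerateIds(true)
              // .setIdField("idField1629396127")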
              .setForceRefreshContent(true)
              .build();
      ImportDocumentsResponse response = documentServiceClient.importDocumentsAsync(request).get();
    }
  }
}

Node.js

For more information, see the Vertex AI Search Node.js API reference documentation.

Create a data store

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Resource name of the CmekConfig to use for protecting this DataStore.
 */
// const cmekConfigName = 'abc123'
/**
 *  DataStore without CMEK protections. If a default CmekConfig is set for
 *  the project, setting this field will override the default CmekConfig as
 *  well.
 */
// const disableCmek = true
/**
 *  Required. The parent resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}`.
 */
// const parent = 'abc123'
/**
 *  Required. The DataStore google.cloud.discoveryengine.v1.DataStore  to
 *  create.
 */
// const dataStore = {}
/**
 *  Required. The ID to use for the
 *  DataStore google.cloud.discoveryengine.v1.DataStore, which will become
 *  the final component of the
 *  DataStore google.cloud.discoveryengine.v1.DataStore's resource name.
 *  This field must conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  standard with a length limit of 63 characters. Otherwise, an
 *  INVALID_ARGUMENT error is returned.
 */
// const dataStoreId = 'abc123'
/**
 *  A boolean flag indicating whether user want to directly create an advanced
 *  data store for site search.
 *  If the data store is not configured as site
 *  search (GENERIC vertical and PUBLIC_WEBSITE content_config), this flag will
 *  be ignored.
 */
// const createAdvancedSiteSearch = true
/**
 *  A boolean flag indicating whether to skip the default schema creation for
 *  the data store. Only enable this flag if you are certain that the default
 *  schema is incompatible with your use case.
 *  If set to true, you must manually create a schema for the data store before
 *  any documents can be ingested.
 *  This flag cannot be specified if `data_store.starting_schema` is specified.
 */
// const skipDefaultSchemaCreation = true

// Imports the Discoveryengine library
const {DataStoreServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DataStoreServiceClient();

async function callCreateDataStore() {
  // Construct request
  const request = {
    parent,
    dataStore,
    dataStoreId,
  };

  // Run request
  const [operation] = await discoveryengineClient.createDataStore(request);
  const [response] = await operation.promise();
  console.log(response);
}

callCreateDataStore();

Import documents

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  The Inline source for the input content for documents.
 */
// const inlineSource = {}
/**
 *  Cloud Storage location for the input content.
 */
// const gcsSource = {}
/**
 *  BigQuery input source.
 */
// const bigquerySource = {}
/**
 *  FhirStore input source.
 */
// const fhirStoreSource = {}
/**
 *  Spanner input source.
 */
// const spannerSource = {}
/**
 *  Cloud SQL input source.
 */
// const cloudSqlSource = {}
/**
 *  Firestore input source.
 */
// const firestoreSource = {}
/**
 *  AlloyDB input source.
 */
// const alloyDbSource = {}
/**
 *  Cloud Bigtable input source.
 */
// const bigtableSource = {}
/**
 *  Required. The parent branch resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`.
 *  Requires create/update permission.
 */
// const parent = 'abc123'
/**
 *  The desired location of errors incurred during the Import.
 */
// const errorConfig = {}
/**
 *  The mode of reconciliation between existing documents and the documents to
 *  be imported. Defaults to
 *  ReconciliationMode.INCREMENTAL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL.
 */
// const reconciliationMode = {}
/**
 *  Indicates which fields in the provided imported documents to update. If
 *  not set, the default is to update all fields.
 */
// const updateMask = {}
/**
 *  Whether to automatically generate IDs for the documents if absent.
 *  If set to `true`, `Document.id` values are automatically generated
 *  based on the hash of the payload, where IDs may not be consistent
 *  during multiple imports. In which case `ReconciliationMode.FULL` is
 *  highly recommended to avoid duplicate contents. If unset or set to
 *  `false`, `Document.id` values have to be specified using `id_field`,
 *  otherwise, documents without IDs fail to be imported.
 *  Supported data sources:
 *  * `GcsSource`. `GcsSource.data_schema` must be `custom` or `csv`.
 *    Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * `BigQuerySource`. `BigQuerySource.data_schema` must be `custom` or
 *    `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * `SpannerSource`.
 *  * `CloudSqlSource`.
 *  * `FirestoreSource`.
 *  * `BigtableSource`.
 */
// const autoGenerateIds = true
/**
 *  The field indicates the ID field or column to be used as unique IDs of
 *  the documents.
 *  For `GcsSource` it is the key of the JSON field. For instance, `my_id`
 *  for JSON `{"my_id": "some_uuid"}`.
 *  For others, it may be the column name of the table where the unique ids are
 *  stored.
 *  The values of the JSON field or the table column are used as the
 *  `Document.id` values. The JSON field or the table column must be of
 *  string type, and the values must be set as valid strings that conform to
 *  RFC-1034 (https://tools.ietf.org/html/rfc1034) with 1-63 characters.
 *  Otherwise, documents without valid IDs fail to be imported.
 *  Only set this field when `auto_generate_ids` is unset or set as `false`.
 *  Otherwise, an INVALID_ARGUMENT error is thrown.
 *  If it is unset, a default value `_id` is used when importing from the
 *  allowed data sources.
 *  Supported data sources:
 *  * `GcsSource`. `GcsSource.data_schema` must be `custom` or `csv`.
 *    Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * `BigQuerySource`. `BigQuerySource.data_schema` must be `custom` or
 *    `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * `SpannerSource`.
 *  * `CloudSqlSource`.
 *  * `FirestoreSource`.
 *  * `BigtableSource`.
 */
// const idField = 'abc123'
/**
 *  Optional. Whether to force refresh the unstructured content of the
 *  documents.
 *  If set to `true`, the content part of the documents will be refreshed
 *  regardless of the update status of the referencing content.
 */
// const forceRefreshContent = true

// Imports the Discoveryengine library
const {DocumentServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DocumentServiceClient();

async function callImportDocuments() {
  // Construct request
  const request = {
    parent,
  };

  // Run request
  const [operation] = await discoveryengineClient.importDocuments(request);
  const [response] = await operation.promise();
  console.log(response);
}

callImportDocuments();

Python

For more information, see the Vertex AI Search Python API reference documentation.

Create a data store


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"

# Examples:
# - Unstructured documents
#   - `gs://bucket/directory/file.pdf`
#   - `gs://bucket/directory/*.pdf`
# - Unstructured documents with JSONL Metadata
#   - `gs://bucket/directory/file.json`
# - Unstructured documents with CSV Metadata
#   - `gs://bucket/directory/file.csv`
# gcs_uri = "YOUR_GCS_PATH"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    gcs_source=discoveryengine.GcsSource(
        # Multiple URIs are supported
        input_uris=[gcs_uri],
        # Options:
        # - `content` - Unstructured documents (PDF, HTML, DOC, TXT, PPTX)
        # - `custom` - Unstructured documents with custom JSONL metadata
        # - `document` - Structured documents in the discoveryengine.Document format.
        # - `csv` - Unstructured documents with CSV metadata
        data_schema="content",
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
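
The endpoint selection and the parent resource name used in the sample above can be sketched without the client library. This is a minimal illustration, not part of the official sample: the helper names `api_endpoint` and `branch_path` are hypothetical, and the resource name uses the `collections/default_collection` form that appears in the Node.js comment on this page.

```python
def api_endpoint(location: str) -> str:
    """Return the API host for a data store location (hypothetical helper)."""
    if location == "global":
        return "discoveryengine.googleapis.com"
    # Non-global locations use a location-prefixed host, as in the sample above.
    return f"{location}-discoveryengine.googleapis.com"


def branch_path(
    project: str,
    location: str,
    data_store: str,
    branch: str = "default_branch",
    collection: str = "default_collection",
) -> str:
    """Format a branch resource name by hand (hypothetical helper)."""
    return (
        f"projects/{project}/locations/{location}"
        f"/collections/{collection}/dataStores/{data_store}/branches/{branch}"
    )


print(api_endpoint("eu"))
print(branch_path("my-project", "global", "my-data-store"))
```

In real code, prefer the client's own `branch_path` method as shown in the sample; the hand-written version only makes the structure of the name visible.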

Ruby

For more information, see the Vertex AI Search Ruby API reference documentation.

Create a data store

require "google/cloud/discovery_engine/v1"

##
# Snippet for the create_data_store call in the DataStoreService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client#create_data_store.
#
def create_data_store
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::CreateDataStoreRequest.new

  # Call the create_data_store method.
  result = client.create_data_store request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Import documents

require "google/cloud/discovery_engine/v1"

##
# Snippet for the import_documents call in the DocumentService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DocumentService::Client#import_documents.
#
def import_documents
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DocumentService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new

  # Call the import_documents method.
  result = client.import_documents request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

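Connect to Cloud Storage with periodic syncing

Before importing your data, prepare the data for ingestion.

The following procedure describes how to create a data connector that associates a Cloud Storage location with a Vertex AI Search data connector, and how to specify a folder or file in that location for the data store that you want to create. Data stores that are children of data connectors are called entity data stores.

Data is synced periodically to the entity data store. You can set the sync frequency to daily, every three days, or every five days.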

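Console

In the Google Cloud console, go to the AI Applications page.
Go to the [Data Stores] page.
Click [Create data store].
On the [Source] page, select [Cloud Storage].
Select the type of data that you are importing.
Click [Periodic].
Under [Synchronization frequency], select how often the Vertex AI Search connector syncs with the Cloud Storage location. You can change the frequency later.
In the [Select a folder or file you want to import] section, select [Folder] or [File].
Click [Browse], choose the data that you prepared for ingestion, and then click [Select]. Alternatively, enter the location directly in the [gs://] field.
Click [Continue].
Choose a region for your data connector.
Enter a name for your data connector.
Optional: If you selected unstructured documents, you can select parsing and chunking options for your documents. To compare parsers, see Parse documents. For information about chunking, see Chunk documents for RAG.

To select a parser, expand [Document processing options] and specify the parser options that you want to use.

The OCR parser and layout parser can incur additional costs. See Document AI feature pricing.
Click [Create].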

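You have now created a data connector that periodically syncs data with the Cloud Storage location. You have also created an entity data store named gcs_store.
To check the status of your ingestion, go to the [Data Stores] page and click your data connector name to see details about it on its [Data] page.

On the [Data ingestion activity] tab, the first ingestion is complete when the status column changes from "In progress" to "Succeeded".

Depending on the size of your data, ingestion can take from several minutes to several hours.

After you set up your data source and import data for the first time, data is synced from that source at the frequency that you selected during setup. The first sync occurs about one hour after you create the data connector. The next sync then occurs about 24 hours, 72 hours, or 120 hours later, depending on the selected frequency.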
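
The sync timing described above can be estimated with a few lines of date arithmetic. The following sketch assumes that later syncs are spaced at the selected interval starting from the first sync. The frequency keys and the function name are hypothetical, and actual sync times are only approximate.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical keys for the three frequencies offered in the console.
SYNC_INTERVAL_HOURS = {"daily": 24, "every_3_days": 72, "every_5_days": 120}


def estimated_sync_times(created_at: datetime, frequency: str, count: int = 3):
    """Estimate the first `count` sync times for a data connector."""
    interval = timedelta(hours=SYNC_INTERVAL_HOURS[frequency])
    # The first sync happens roughly one hour after the connector is created.
    first_sync = created_at + timedelta(hours=1)
    return [first_sync + i * interval for i in range(count)]


created = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
for when in estimated_sync_times(created, "every_3_days"):
    print(when.isoformat())
```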

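Next steps

To connect your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.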

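Connect to Google Drive

Vertex AI Search can search data from Google Drive by using data federation, which retrieves information directly from the specified data source. Because data isn't copied into the Vertex AI Search index, you don't need to worry about data storage.

Before you begin

You must be signed in to the Google Cloud console with the same account that you use for the Google Drive instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Google Drive.

To enforce data source access control and secure data in Vertex AI Search, make sure that you have configured your identity provider.

Make sure that all the documents are accessible, either by placing them in a shared drive that is owned by the domain or by assigning their ownership to a user in the domain.
To connect Google Drive data to Vertex AI Search, turn on Google Workspace smart features in other Google products. For more information, see Turn Google Workspace smart features on or off.

If you use security controls, be aware of their limitations related to data in Google Drive, as described in the following table:

Security control	Note the following
Data residency (DRZ)	Vertex AI Search guarantees data residency only in Google Cloud. For information about data residency and Google Drive, see the Google Workspace compliance guidance and documentation, for example, Choose the region where data is stored and Digital sovereignty.
Customer-managed encryption keys (CMEK)	Your keys encrypt data only within Google Cloud. Cloud Key Management Service controls don't apply to data stored in Google Drive.
Access Transparency	Access Transparency logs record actions taken by Google personnel on the Google Cloud project. You also need to review the Access Transparency logs created by Google Workspace. For more information, see Access Transparency log events in the Google Workspace Admin Help documentation.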

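Create a Google Drive data store

Console

To use the console to make Google Drive data searchable, follow these steps:

In the Google Cloud console, go to the AI Applications page.
In the navigation menu, click [Data Stores].
Click [Create data store].
On the [Select a data source] page, select [Google Drive].
Specify the drive source for your data store.
- All: Add your entire drive to the data store.
- Specific shared drive(s): Add the shared drive's folder ID.
- Specific shared folder(s): Add the shared folder's ID.
To find the shared drive's folder ID or a specific folder's ID, navigate to the shared drive or folder and copy the ID from the URL. The URL has the format https://drive.google.com/corp/drive/folders/ID.

For example: https://drive.google.com/corp/drive/folders/123456789012345678901
Click [Continue].
Choose a region for your data store.
Enter a name for your data store.
Optional: To exclude the data in this data store from being used for generative AI content when you query data using the app, click [Generative AI Options] and select [Exclude from generative AI features].
Click [Create].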
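
Copying the folder ID out of the URL, as described in the steps above, can also be done programmatically. The following sketch assumes the URL format `https://drive.google.com/corp/drive/folders/ID` given above; the function name `drive_folder_id` is hypothetical.

```python
from urllib.parse import urlparse


def drive_folder_id(url: str) -> str:
    """Return the path segment that follows 'folders' in a Drive URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "folders" not in segments:
        raise ValueError(f"not a Drive folder URL: {url!r}")
    index = segments.index("folders")
    if index + 1 >= len(segments):
        raise ValueError(f"no folder ID in URL: {url!r}")
    return segments[index + 1]


print(drive_folder_id(
    "https://drive.google.com/corp/drive/folders/123456789012345678901"
))
```

Because the ID is read from the URL path, any query string on the URL is ignored.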

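Error messages

The following table describes error messages that you might encounter when working with this Google data source, along with HTTP error codes and suggested troubleshooting steps.

Error code	Error message	Description	Troubleshooting
403 (Permission Denied)	Search using service account credentials is not supported for Google Workspace data stores.	The engine being searched has Google Workspace data stores, and the credentials passed are those of a service account. Searching with service account credentials on Google Workspace data stores isn't supported.	Call search using user credentials, or remove the Google Workspace data stores from the engine.
403 (Permission Denied)	Consumer accounts are not supported for Google Workspace data stores.	Search is called with credentials for a consumer account (@gmail.com), which isn't supported for Google Workspace data stores.	Remove the Google Workspace data stores from the engine, or use a managed Google Account.
403 (Permission Denied)	Customer ID mismatch for data store.	Search is allowed only for users who belong to the same organization as the Google Workspace data stores.	If the user and the Google Workspace data stores belong to different organizations, remove the Google Workspace data stores from the engine or contact support.
403 (Permission Denied)	Workspace access for Agentspace has been disabled by the organization administrator.	A Google Workspace administrator has disabled access to Google Workspace data for Vertex AI Search.	Contact your Google Workspace administrator to enable access.
400 (Invalid Argument)	An engine cannot contain both default and shared Google Drive data stores.	You can't connect a data store that has all of your drives (default) and a data store that has specific shared drives to the same app.	To connect a new Google Drive data source to your app, first unlink the unneeded data store, and then add the new data store that you want to use.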

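Troubleshooting

If searching doesn't return the file that you're looking for, the cause might be one of the following limitations of the Google Drive search index:

To make a file searchable, Google Drive extracts only up to 1 MB of text and formatting data. Keywords that fall beyond the 1 MB index limit can't be searched.
For most file types, the file size limit is 10 MB, with the following exceptions:
- XLSX files (.xlsx) are limited to 20 MB.
- PDF files (.pdf) are limited to 30 MB.
- Text files (.txt) are limited to 100 MB.
Note: Files that exceed these size limits can't be searched and don't appear in search results.
Optical character recognition (OCR) in PDF files is limited to 80 pages. In addition, PDFs that are larger than 50 MB or longer than 80 pages aren't indexed by Google Drive.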
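
The per-type size limits listed above can be turned into a quick pre-check for files that are unlikely to be searchable. This is a rough sketch under stated assumptions: it treats 1 MB as 1,048,576 bytes, uses only the file extension to decide the type, and covers only the size limits, not the page or text-extraction limits. The names below are hypothetical.

```python
from pathlib import PurePath

MB = 1024 * 1024  # Assumption: the limits are measured in binary megabytes.

# Size limits from the list above, keyed by lowercase file extension.
SIZE_LIMITS = {".xlsx": 20 * MB, ".pdf": 30 * MB, ".txt": 100 * MB}
DEFAULT_LIMIT = 10 * MB


def exceeds_drive_search_limit(filename: str, size_bytes: int) -> bool:
    """Return True if the file is larger than the limit for its type."""
    suffix = PurePath(filename).suffix.lower()
    return size_bytes > SIZE_LIMITS.get(suffix, DEFAULT_LIMIT)


print(exceeds_drive_search_limit("report.docx", 11 * MB))
print(exceeds_drive_search_limit("data.xlsx", 15 * MB))
```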

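Next steps

To connect your data store to an app, create an app and select your data store by following the steps in Create a search app.
To get search results after your app and data store are set up, see Get search results.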

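Connect to Gmail

To create a data store that connects to Gmail in the Google Cloud console, follow the steps in this section. After you create the data store, you can attach it to your search app and search your Gmail data.

Before you begin

You must be signed in to the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Gmail.

To enforce data source access control and secure data in Vertex AI Search, make sure that you have configured your identity provider.

Limitations

If you use security controls, be aware of their limitations related to data in Gmail, as described in the following table:

Security control	Note the following
Data residency (DRZ)	Vertex AI Search guarantees data residency only in Google Cloud. For information about data residency and Gmail, see the Google Workspace compliance guidance and documentation, for example, Choose the region where data is stored and Digital sovereignty.
Customer-managed encryption keys (CMEK)	Your keys encrypt data only within Google Cloud. Cloud Key Management Service controls don't apply to data stored in Gmail.
Access Transparency	Access Transparency logs record actions taken by Google personnel on the Google Cloud project. You also need to review the Access Transparency logs created by Google Workspace. For more information, see Access Transparency log events in the Google Workspace Admin Help documentation.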

Create a Gmail data store

Console

To use the console to make Gmail data searchable, follow these steps:

In the Google Cloud console, go to the AI Applications page.
In the navigation menu, click [Data Stores].
Click [Create data store].
On the [Select a data source] page, select [Google Gmail].
Choose a region for your data store.
Enter a name for your data store.
Click [Create].
Follow the steps in Create a search app to attach the data store that you created to a Vertex AI Search app.

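Error messages

Error code	Error message	Description	Troubleshooting
403 (Permission Denied)	Search using service account credentials is not supported for Google Workspace data stores.	The engine being searched has Google Workspace data stores, and the credentials passed are those of a service account. Searching with service account credentials on Google Workspace data stores isn't supported.	Call search using user credentials, or remove the Google Workspace data stores from the engine.
403 (Permission Denied)	Consumer accounts are not supported for Google Workspace data stores.	Search is called with credentials for a consumer account (@gmail.com), which isn't supported for Google Workspace data stores.	Remove the Google Workspace data stores from the engine, or use a managed Google Account.
403 (Permission Denied)	Customer ID mismatch for data store.	Search is allowed only for users who belong to the same organization as the Google Workspace data stores.	If the user and the Google Workspace data stores belong to different organizations, remove the Google Workspace data stores from the engine or contact support.
403 (Permission Denied)	Workspace access for Agentspace has been disabled by the organization administrator.	A Google Workspace administrator has disabled access to Google Workspace data for Vertex AI Search.	Contact your Google Workspace administrator to enable access.
400 (Invalid Argument)	An engine cannot contain both default and shared Google Drive data stores.	You can't connect a data store that has all of your drives (default) and a data store that has specific shared drives to the same app.	To connect a new Google Drive data source to your app, first unlink the unneeded data store, and then add the new data store that you want to use.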

Next steps

To preview how your search results appear after your app and data store are set up, see Preview search results.

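Connect to Google Sites

To search data from Google Sites, use the following steps to create a connector by using the Google Cloud console.

Before you begin:

You must be signed in to the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Google Sites.
To enforce data source access control and secure data in Vertex AI Search, make sure that you have configured your identity provider.

If you use security controls, be aware of their limitations related to data in Google Sites, as described in the following table:

Security control	Note the following
Data residency (DRZ)	Vertex AI Search guarantees data residency only in Google Cloud. For information about data residency and Google Sites, see the Google Workspace compliance guidance and documentation, for example, Choose the region where data is stored and Digital sovereignty.
Customer-managed encryption keys (CMEK)	Your keys encrypt data only within Google Cloud. Cloud Key Management Service controls don't apply to data stored in Google Sites.
Access Transparency	Access Transparency logs record actions taken by Google personnel on the Google Cloud project. You also need to review the Access Transparency logs created by Google Workspace. For more information, see Access Transparency log events in the Google Workspace Admin Help documentation.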

Console

To use the console to make Google Sites data searchable, follow these steps:

In the Google Cloud console, go to the AI Applications page.
Go to the [Data Stores] page.
Click [New data store].
On the [Source] page, select [Google Sites].
Choose a region for your data store.
Enter a name for your data store.
Click [Create].

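Next steps

To connect your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.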

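Connect to Google Calendar

Note: This feature is a Preview offering, subject to the "Pre-GA Offerings Terms" of the GCP Service Specific Terms. Pre-GA products and features might have limited support, and changes to pre-GA products and features might not be compatible with other pre-GA versions. For more information, see the launch stage descriptions. Further, by using this feature, you agree to the Generative AI Preview terms and conditions ("Preview Terms"). For this feature, you can process personal data as outlined in the Cloud Data Processing Addendum, subject to the applicable restrictions and obligations in the Agreement (as defined in the Preview Terms).

To search data from Google Calendar, use the following steps to create a data store by using the Google Cloud console.

Before you begin

You must be signed in to the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Google Calendar.

To enforce data source access control and secure data in Vertex AI Search, make sure that you have configured your identity provider.

If you use security controls, be aware of their limitations related to data in Google Calendar, as described in the following table:

Security control	Note the following
Data residency (DRZ)	Vertex AI Search guarantees data residency only in Google Cloud. For information about data residency and Google Calendar, see the Google Workspace compliance guidance and documentation, for example, Choose the region where data is stored and Digital sovereignty.
Customer-managed encryption keys (CMEK)	Your keys encrypt data only within Google Cloud. Cloud Key Management Service controls don't apply to data stored in Google Calendar.
Access Transparency	Access Transparency logs record actions taken by Google personnel on the Google Cloud project. You also need to review the Access Transparency logs created by Google Workspace. For more information, see Access Transparency log events in the Google Workspace Admin Help documentation.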

Create a Google Calendar data store

To use the console to make Google Calendar data searchable, follow these steps:

In the Google Cloud console, go to the AI Applications page.
In the navigation menu, click [Data Stores].
Click [Create data store].
On the [Select a data source] page, select [Google Calendar].
Choose a region for your data store.
Enter a name for your data store.
Click [Create].

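Error messages

Error code	Error message	Description	Troubleshooting
403 (Permission Denied)	Search using service account credentials is not supported for Google Workspace data stores.	The engine being searched has Google Workspace data stores, and the credentials passed are those of a service account. Searching with service account credentials on Google Workspace data stores isn't supported.	Call search using user credentials, or remove the Google Workspace data stores from the engine.
403 (Permission Denied)	Consumer accounts are not supported for Google Workspace data stores.	Search is called with credentials for a consumer account (@gmail.com), which isn't supported for Google Workspace data stores.	Remove the Google Workspace data stores from the engine, or use a managed Google Account.
403 (Permission Denied)	Customer ID mismatch for data store.	Search is allowed only for users who belong to the same organization as the Google Workspace data stores.	If the user and the Google Workspace data stores belong to different organizations, remove the Google Workspace data stores from the engine or contact support.
403 (Permission Denied)	Workspace access for Agentspace has been disabled by the organization administrator.	A Google Workspace administrator has disabled access to Google Workspace data for Vertex AI Search.	Contact your Google Workspace administrator to enable access.
400 (Invalid Argument)	An engine cannot contain both default and shared Google Drive data stores.	You can't connect a data store that has all of your drives (default) and a data store that has specific shared drives to the same app.	To connect a new Google Drive data source to your app, first unlink the unneeded data store, and then add the new data store that you want to use.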

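Next steps

To connect your data store to an app, create an app and select your data store by following the steps in Create a search app.
To get search results after your app and data store are set up, see Get search results.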

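Connect to Google Groups

To search data from Google Groups, use the following steps to create a connector by using the Google Cloud console.

Before you begin:

You must be signed in to the Google Cloud console with the same account that you use for the Google Workspace instance that you plan to connect. Vertex AI Search uses your Google Workspace customer ID to connect to Google Groups.
To enforce data source access control and secure data in Vertex AI Search, make sure that you have configured your identity provider.

If you use security controls, be aware of their limitations related to data in Google Groups, as described in the following table:

Security control	Note the following
Data residency (DRZ)	Vertex AI Search guarantees data residency only in Google Cloud. For information about data residency and Google Groups, see the Google Workspace compliance guidance and documentation, for example, Choose the region where data is stored and Digital sovereignty.
Customer-managed encryption keys (CMEK)	Your keys encrypt data only within Google Cloud. Cloud Key Management Service controls don't apply to data stored in Google Groups.
Access Transparency	Access Transparency logs record actions taken by Google personnel on the Google Cloud project. You also need to review the Access Transparency logs created by Google Workspace. For more information, see Access Transparency log events in the Google Workspace Admin Help documentation.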

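Console

To use the console to make Google Groups data searchable, follow these steps:

In the Google Cloud console, go to the AI Applications page.
Go to the [Data Stores] page.
Click [New data store].
On the [Source] page, select [Google Groups].
Choose a region for your data store.
Enter a name for your data store.
Click [Create]. Depending on the size of your data, ingestion can take from several minutes to several hours. Wait at least one hour before you use your data store for searching.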

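Next steps

To connect your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.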

Import from Cloud SQL

To ingest data from Cloud SQL, use the following steps to set up Cloud SQL access, create a data store, and ingest data.

Set up staging bucket access for Cloud SQL instances

When you ingest data from Cloud SQL, the data is first staged to a Cloud Storage bucket. To give a Cloud SQL instance access to Cloud Storage buckets, follow these steps:

In the Google Cloud console, go to the SQL page.
Click the Cloud SQL instance that you plan to import from.
Copy the identifier for the instance's service account. The identifier looks like an email address, for example, p9876-abcd33f@gcp-sa-cloud-sql.iam.gserviceaccount.com.
Go to the IAM & Admin page.
Click [Grant access].
For [New principals], enter the instance's service account identifier and select the [Cloud Storage] > [Storage Admin] role.
Click [Save].

Next:

If your Cloud SQL data is in the same project as Vertex AI Search: Go to Import data from Cloud SQL.
If your Cloud SQL data is in a different project than your Vertex AI Search project: Go to Set up Cloud SQL access from a different project.

Set up Cloud SQL access from a different project

To give Vertex AI Search access to Cloud SQL data that's in a different project, follow these steps:

Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of the code block. This is your Vertex AI Search service account identifier:
```
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
```
Go to the IAM & Admin page.
On the IAM & Admin page, switch to your Cloud SQL project and click [Grant access].
For [New principals], enter the identifier for the service account and select the [Cloud SQL] > [Cloud SQL Viewer] role.
Click [Save].
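
The service account identifier used in the steps above follows a fixed pattern, so it can be derived from the project number. The following is a small sketch of that substitution; the function name is hypothetical, and the project number in the usage line is a made-up placeholder.

```python
def discovery_engine_service_account(project_number) -> str:
    """Fill PROJECT_NUMBER into the Vertex AI Search service account pattern."""
    number = str(project_number)
    # Project numbers are numeric; reject project IDs passed by mistake.
    if not (number.isascii() and number.isdigit()):
        raise ValueError(f"expected a numeric project number, got {number!r}")
    return f"service-{number}@gcp-sa-discoveryengine.iam.gserviceaccount.com"


print(discovery_engine_service_account(123456789012))
```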

Next, go to Import data from Cloud SQL.

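Import data from Cloud SQL

Console

To use the console to ingest data from Cloud SQL, follow these steps:

In the Google Cloud console, go to the AI Applications page.
Go to the [Data Stores] page.
Click [New data store].
On the [Source] page, select [Cloud SQL].
Specify the project ID, instance ID, database ID, and table ID of the data that you plan to import.
Click [Browse], choose a temporary Cloud Storage location to export your data to, and then click [Select]. Alternatively, enter the location directly in the [gs://] field.
Select whether to turn on serverless export. Serverless export incurs additional cost. For information about serverless export, see Minimize the performance impact of exports in the Cloud SQL documentation.
Click [Continue].
Choose a region for your data store.
Enter a name for your data store.
Click [Create].
To check the status of your ingestion, go to the [Data Stores] page and click your data store name to see details about it on its [Data] page. When the status column on the [Activity] tab changes from "In progress" to "Import completed", the ingestion is complete.

Depending on the size of your data, ingestion can take from several minutes to several hours.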

REST

To use the command line to create a data store and ingest data from Cloud SQL, follow these steps:

Create a data store.
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
```
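Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- DISPLAY_NAME: the display name of the data store. This might be displayed in the Google Cloud console.
Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.
Import data from Cloud SQL.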
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "cloudSqlSource": {
      "projectId": "SQL_PROJECT_ID",
      "instanceId": "INSTANCE_ID",
      "databaseId": "DATABASE_ID",
      "tableId": "TABLE_ID",
      "gcsStagingDir": "STAGING_DIRECTORY"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
```
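Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- SQL_PROJECT_ID: the ID of your Cloud SQL project.
- INSTANCE_ID: the ID of your Cloud SQL instance.
- DATABASE_ID: the ID of your Cloud SQL database.
- TABLE_ID: the ID of your Cloud SQL table.
- STAGING_DIRECTORY: optional. A Cloud Storage directory, for example, gs://<your-gcs-bucket>/directory/import_errors.
- RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. The default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Cloud SQL to your data store. This performs an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that aren't in Cloud SQL are removed from your data store. FULL mode is helpful if you want to automatically delete documents that you no longer need.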

Python

For more information, see the Vertex AI Search Python API reference documentation.

Create a data store


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# sql_project_id = "YOUR_SQL_PROJECT_ID"
# sql_instance_id = "YOUR_SQL_INSTANCE_ID"
# sql_database_id = "YOUR_SQL_DATABASE_ID"
# sql_table_id = "YOUR_SQL_TABLE_ID"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    cloud_sql_source=discoveryengine.CloudSqlSource(
        project_id=sql_project_id,
        instance_id=sql_instance_id,
        database_id=sql_database_id,
        table_id=sql_table_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

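Next steps

To connect your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.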

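Import from Spanner

To ingest data from Spanner, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

Set up Spanner access from a different project

If your Spanner data is in the same project as Vertex AI Search, go to Import data from Spanner.

To give Vertex AI Search access to Spanner data that is in a different project, follow these steps:

Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of this code block. This is your Vertex AI Search service account identifier: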
```
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
```
Go to the [IAM & Admin] page.

IAM & Admin
On the [IAM & Admin] page, switch to your Spanner project, and then click [Grant access].
For [New principals], enter the service account identifier and select one of the following:
- If you won't use Data Boost during import, select the [Cloud Spanner] > [Cloud Spanner Database Reader] role.
- If you plan to use Data Boost during import, select the [Cloud Spanner] > [Cloud Spanner Database Admin] role, or a custom role that has the permissions of Cloud Spanner Database Reader plus spanner.databases.useDataBoost. For information about Data Boost, see Data Boost overview in the Spanner documentation.
Click [Save].
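
If you grant access from a script rather than by hand, the service account identifier from the first step can be assembled programmatically. The following is a minimal sketch; the function name `discovery_engine_service_agent` and the sample project number are hypothetical and not part of any Google library.

```python
def discovery_engine_service_agent(project_number: str) -> str:
    """Return the Vertex AI Search service account identifier for a project number."""
    # The identifier uses the numeric project number, so reject a project ID
    # (for example "my-project") passed by mistake.
    if not project_number.isdigit():
        raise ValueError("expected a numeric project number, not a project ID")
    return f"service-{project_number}@gcp-sa-discoveryengine.iam.gserviceaccount.com"


# "123456789012" is a made-up project number used only for illustration.
print(discovery_engine_service_agent("123456789012"))
```

The returned string is the value to enter in the [New principals] field.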

Next, go to Import data from Spanner.

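Import data from Spanner

Console

To use the console to ingest data from Spanner, follow these steps:

In the Google Cloud console, go to the [AI Applications] page.

AI Applications
Go to the [Data Stores] page.
Click [New data store].
On the [Source] page, select [Cloud Spanner].
Specify the project ID, instance ID, database ID, and table ID of the data that you plan to import.
Select whether to turn on Data Boost. For information about Data Boost, see Data Boost overview in the Spanner documentation.
Click [Continue].
Choose a region for your data store.
Enter a name for your data store.
Click [Create].
To check the status of your ingestion, go to the [Data Stores] page and click your data store name to see details about it on its [Data] page. When the status column on the [Activity] tab changes from "In progress" to "Import completed", the ingestion is complete.

Depending on the size of your data, ingestion can take several minutes to several hours.

REST

To use the command line to create a data store and import data from Spanner, follow these steps:

Create a data store.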
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"],
  "contentConfig": "CONTENT_REQUIRED"
}'
```
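Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- DISPLAY_NAME: the display name of the data store. This might be displayed in the Google Cloud console.
Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.
Import data from Spanner.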
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "spannerSource": {
      "projectId": "SPANNER_PROJECT_ID",
      "instanceId": "INSTANCE_ID",
      "databaseId": "DATABASE_ID",
      "tableId": "TABLE_ID",
      "enableDataBoost": "DATA_BOOST_BOOLEAN"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
```
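Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store.
- SPANNER_PROJECT_ID: the ID of your Spanner project.
- INSTANCE_ID: the ID of your Spanner instance.
- DATABASE_ID: the ID of your Spanner database.
- TABLE_ID: the ID of your Spanner table.
- DATA_BOOST_BOOLEAN: Optional. Whether to turn on Data Boost. For information about Data Boost, see Data Boost overview in the Spanner documentation.
- RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Spanner to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Spanner are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, setting reconciliationMode to FULL is strongly recommended to keep document IDs consistent.
- ID_FIELD: Optional. Specifies which field holds the document ID.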
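
The difference between the two reconciliation modes can be pictured with plain dictionaries keyed by document ID. This is only a conceptual sketch of the behavior described above, not how the service is implemented; the function `reconcile` and the sample documents are hypothetical.

```python
def reconcile(store: dict, source: dict, mode: str = "INCREMENTAL") -> dict:
    """Return the data store contents after importing `source` in the given mode."""
    if mode == "INCREMENTAL":
        # Upsert: add new documents, replace documents that share an ID,
        # and leave documents that are missing from the source untouched.
        return {**store, **source}
    if mode == "FULL":
        # Full rebase: the data store ends up mirroring the source, so
        # documents that are no longer in the source are removed.
        return dict(source)
    raise ValueError(f"unknown reconciliation mode: {mode}")


store = {"doc-1": "old title", "doc-2": "to be retired"}
source = {"doc-1": "new title", "doc-3": "brand new"}

print(reconcile(store, source, "INCREMENTAL"))
print(reconcile(store, source, "FULL"))
```

In the INCREMENTAL result `doc-2` survives even though it is absent from the source, while the FULL result drops it.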

Python

For more information, see the Vertex AI Search Python API reference documentation.

Create a data store


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# spanner_project_id = "YOUR_SPANNER_PROJECT_ID"
# spanner_instance_id = "YOUR_SPANNER_INSTANCE_ID"
# spanner_database_id = "YOUR_SPANNER_DATABASE_ID"
# spanner_table_id = "YOUR_SPANNER_TABLE_ID"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    spanner_source=discoveryengine.SpannerSource(
        project_id=spanner_project_id,
        instance_id=spanner_instance_id,
        database_id=spanner_database_id,
        table_id=spanner_table_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

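Next steps

To connect your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.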

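Import from Firestore

To ingest data from Firestore, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

If your Firestore data is in the same project as Vertex AI Search, go to Import data from Firestore.

If your Firestore data is in a different project than your Vertex AI Search project, go to Set up Firestore access.

Set up Firestore access from a different project

To give Vertex AI Search access to Firestore data that is in a different project, follow these steps:

Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of this code block. This is your Vertex AI Search service account identifier: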
```
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
```
Go to the [IAM & Admin] page.

IAM & Admin
On the [IAM & Admin] page, switch to your Firestore project, and then click [Grant access].
For [New principals], enter the instance's service account identifier and select the [Datastore] > [Cloud Datastore Import Export Admin] role.
Click [Save].
Switch back to your Vertex AI Search project.

Next, go to Import data from Firestore.

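Import data from Firestore

Console

To use the console to ingest data from Firestore, follow these steps:

In the Google Cloud console, go to the [AI Applications] page.

AI Applications
Go to the [Data Stores] page.
Click [New data store].
On the [Source] page, select [Firestore].
Specify the project ID, database ID, and collection ID of the data that you plan to import.
Click [Continue].
Choose a region for your data store.
Enter a name for your data store.
Click [Create].
To check the status of your ingestion, go to the [Data Stores] page and click your data store name to see details about it on its [Data] page. When the status column on the [Activity] tab changes from "In progress" to "Import completed", the ingestion is complete.

Depending on the size of your data, ingestion can take several minutes to several hours.

REST

To use the command line to create a data store and ingest data from Firestore, follow these steps:

Create a data store.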
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
```
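Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- DISPLAY_NAME: the display name of the data store. This might be displayed in the Google Cloud console.
Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.
Import data from Firestore.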
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "firestoreSource": {
      "projectId": "FIRESTORE_PROJECT_ID",
      "databaseId": "DATABASE_ID",
      "collectionId": "COLLECTION_ID"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
```
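Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- FIRESTORE_PROJECT_ID: the ID of your Firestore project.
- DATABASE_ID: the ID of your Firestore database.
- COLLECTION_ID: the ID of your Firestore collection.
- RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Firestore to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Firestore are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, setting reconciliationMode to FULL is strongly recommended to keep document IDs consistent.
- ID_FIELD: Optional. Specifies which field holds the document ID.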
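
Hand-edited JSON request bodies are easy to get wrong, for example by leaving a trailing comma or quoting a boolean. As a hedged sketch, the body for the Firestore import call can instead be generated with the standard `json` module. The helper name `build_firestore_import_body` and the sample values are hypothetical; the field names are the ones shown in the request above.

```python
import json
from typing import Optional


def build_firestore_import_body(
    firestore_project_id: str,
    database_id: str,
    collection_id: str,
    reconciliation_mode: str = "INCREMENTAL",
    auto_generate_ids: Optional[bool] = None,
    id_field: Optional[str] = None,
) -> str:
    """Serialize a documents:import request body for a Firestore source."""
    if reconciliation_mode not in ("FULL", "INCREMENTAL"):
        raise ValueError("reconciliation_mode must be FULL or INCREMENTAL")
    body = {
        "firestoreSource": {
            "projectId": firestore_project_id,
            "databaseId": database_id,
            "collectionId": collection_id,
        },
        "reconciliationMode": reconciliation_mode,
    }
    # Optional fields are left out entirely unless the caller sets them.
    if auto_generate_ids is not None:
        body["autoGenerateIds"] = auto_generate_ids  # serialized as true/false
    if id_field is not None:
        body["idField"] = id_field
    return json.dumps(body, indent=2)


# All values below are placeholders for illustration.
print(
    build_firestore_import_body(
        "my-firestore-project", "(default)", "articles", id_field="article_id"
    )
)
```

The printed string can be saved to a file and passed to curl with `-d @FILE` in place of the inline body.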

Python

For more information, see the Vertex AI Search Python API reference documentation.

Create a data store


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# firestore_project_id = "YOUR_FIRESTORE_PROJECT_ID"
# firestore_database_id = "YOUR_FIRESTORE_DATABASE_ID"
# firestore_collection_id = "YOUR_FIRESTORE_COLLECTION_ID"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    firestore_source=discoveryengine.FirestoreSource(
        project_id=firestore_project_id,
        database_id=firestore_database_id,
        collection_id=firestore_collection_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

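Next steps

To connect your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.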

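Import from Bigtable

To ingest data from Bigtable, use the following steps to create a data store and ingest data using the API.

Set up Bigtable access

To give Vertex AI Search access to Bigtable data that is in a different project, follow these steps:

Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of this code block. This is your Vertex AI Search service account identifier: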
```
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
```
Go to the [IAM & Admin] page.

IAM & Admin
On the [IAM & Admin] page, switch to your Bigtable project, and then click [Grant access].
For [New principals], enter the instance's service account identifier and select the [Bigtable] > [Bigtable Reader] role.
Click [Save].
Switch back to your Vertex AI Search project.

Next, go to Import data from Bigtable.

Import data from Bigtable

REST

To use the command line to create a data store and ingest data from Bigtable, follow these steps:

Create a data store.
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
```
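Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- DISPLAY_NAME: the display name of the data store. This might be displayed in the Google Cloud console.
Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.
Import data from Bigtable.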
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "bigtableSource": {
      "projectId": "BIGTABLE_PROJECT_ID",
      "instanceId": "INSTANCE_ID",
      "tableId": "TABLE_ID",
      "bigtableOptions": {
        "keyFieldName": "KEY_FIELD_NAME",
        "families": {
          "key": "KEY",
          "value": {
            "fieldName": "FIELD_NAME",
            "encoding": "ENCODING",
            "type": "TYPE",
            "columns": [
              {
                "qualifier": "QUALIFIER",
                "fieldName": "FIELD_NAME",
                "encoding": "COLUMN_ENCODING",
                "type": "COLUMN_VALUES_TYPE"
              }
            ]
          }
         }
         ...
      }
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
```
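Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- BIGTABLE_PROJECT_ID: the ID of your Bigtable project.
- INSTANCE_ID: the ID of your Bigtable instance.
- TABLE_ID: the ID of your Bigtable table.
- KEY_FIELD_NAME: Optional but recommended. The field name to use for the row key value after ingesting to Vertex AI Search.
- KEY: Required. A string value for the column family key.
- ENCODING: Optional. The encoding mode of the values when the type is not STRING. This can be overridden for a specific column by listing that column in columns and specifying an encoding for it.
- TYPE: Optional. The type of values in this column family.
- QUALIFIER: Required. The qualifier of the column.
- FIELD_NAME: Optional but recommended. The field name to use for this column after ingesting to Vertex AI Search.
- COLUMN_ENCODING: Optional. The encoding mode of the values for a specific column when the type is not STRING.
- RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from Bigtable to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Bigtable are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, setting reconciliationMode to FULL is strongly recommended to keep document IDs consistent.
  
  Specify autoGenerateIds only when bigquerySource.dataSchema is set to custom. Otherwise, an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or you set it to false, you must specify idField. Otherwise, the documents fail to import.
- ID_FIELD: Optional. Specifies which field holds the document ID.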
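
The `families` entry in the request above is shown in key/value form. As a rough sketch, and assuming the standard protobuf JSON mapping (a map field becomes a JSON object keyed by column family name, and a bytes field such as the qualifier is base64-encoded), the same options as in the Python sample might be written out as follows. The helper `column` and the `row_key` field name are hypothetical; check the result against the API reference before relying on it.

```python
import base64
import json


def column(qualifier: str, field_name: str) -> dict:
    """Describe one Bigtable column; the qualifier is base64-encoded for JSON."""
    return {
        "qualifier": base64.b64encode(qualifier.encode("utf-8")).decode("ascii"),
        "fieldName": field_name,
    }


bigtable_options = {
    "keyFieldName": "row_key",  # hypothetical field name for the row key
    "families": {
        # One entry per column family, keyed by the family name (KEY).
        "family_name_1": {
            "type": "STRING",
            "encoding": "TEXT",
            "columns": [column("qualifier_1", "field_name_1")],
        },
        "family_name_2": {"type": "INTEGER", "encoding": "BINARY"},
    },
}

print(json.dumps({"bigtableOptions": bigtable_options}, indent=2))
```

The family names, types, and encodings mirror the values used in the Python import sample for Bigtable.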

Python

For more information, see the Vertex AI Search Python API reference documentation.

Create a data store


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# bigtable_project_id = "YOUR_BIGTABLE_PROJECT_ID"
# bigtable_instance_id = "YOUR_BIGTABLE_INSTANCE_ID"
# bigtable_table_id = "YOUR_BIGTABLE_TABLE_ID"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

bigtable_options = discoveryengine.BigtableOptions(
    families={
        "family_name_1": discoveryengine.BigtableOptions.BigtableColumnFamily(
            type_=discoveryengine.BigtableOptions.Type.STRING,
            encoding=discoveryengine.BigtableOptions.Encoding.TEXT,
            columns=[
                discoveryengine.BigtableOptions.BigtableColumn(
                    qualifier="qualifier_1".encode("utf-8"),
                    field_name="field_name_1",
                ),
            ],
        ),
        "family_name_2": discoveryengine.BigtableOptions.BigtableColumnFamily(
            type_=discoveryengine.BigtableOptions.Type.INTEGER,
            encoding=discoveryengine.BigtableOptions.Encoding.BINARY,
        ),
    }
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    bigtable_source=discoveryengine.BigtableSource(
        project_id=bigtable_project_id,
        instance_id=bigtable_instance_id,
        table_id=bigtable_table_id,
        bigtable_options=bigtable_options,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

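Next steps

To connect your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.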

Import from AlloyDB for PostgreSQL

To ingest data from AlloyDB for PostgreSQL, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.

If your AlloyDB for PostgreSQL data is in the same project as your Vertex AI Search project, go to Import data from AlloyDB for PostgreSQL.

If your AlloyDB for PostgreSQL data is in a different project than your Vertex AI Search project, go to Set up AlloyDB for PostgreSQL access.

Set up AlloyDB for PostgreSQL access from a different project

To give Vertex AI Search access to AlloyDB for PostgreSQL data that is in a different project, follow these steps:

Replace the following PROJECT_NUMBER variable with your Vertex AI Search project number, and then copy the contents of this code block. This is your Vertex AI Search service account identifier:
```
service-PROJECT_NUMBER@gcp-sa-discoveryengine.iam.gserviceaccount.com
```
Switch to the Google Cloud project where your AlloyDB for PostgreSQL data resides.
Go to the IAM page.

IAM
Click [Grant access].
For [New principals], enter the Vertex AI Search service account identifier and select the [Cloud AlloyDB] > [Cloud AlloyDB Admin] role.
Click [Save].
Switch back to your Vertex AI Search project.

Next, go to Import data from AlloyDB for PostgreSQL.

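Import data from AlloyDB for PostgreSQL

Console

To use the console to ingest data from AlloyDB for PostgreSQL, follow these steps:

In the Google Cloud console, go to the [AI Applications] page.

AI Applications
In the navigation menu, click [Data Stores].
Click [Create data store].
On the [Source] page, select [AlloyDB].
Specify the project ID, location ID, cluster ID, database ID, and table ID of the data that you plan to import.
Click [Continue].
Choose a region for your data store.
Enter a name for your data store.
Click [Create].
To check the status of your ingestion, go to the [Data Stores] page and click your data store name to see details about it on its [Data] page. When the status column on the [Activity] tab changes from "In progress" to "Import completed", the ingestion is complete.

Depending on the size of your data, ingestion can take several minutes to several hours.

REST

To use the command line to create a data store and ingest data from AlloyDB for PostgreSQL, follow these steps:

Create a data store.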
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
```
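Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- DISPLAY_NAME: the display name of the data store. This might be displayed in the Google Cloud console.
Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.
Import data from AlloyDB for PostgreSQL.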
```
  curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "alloyDbSource": {
      "projectId": "ALLOYDB_PROJECT_ID",
      "locationId": "LOCATION_ID",
      "clusterId": "CLUSTER_ID",
      "databaseId": "DATABASE_ID",
      "tableId": "TABLE_ID"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD"
  }'
```
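Replace the following:
- PROJECT_ID: the ID of your Vertex AI Search project.
- DATA_STORE_ID: the ID of the data store. The ID can contain only lowercase letters, digits, underscores, and hyphens.
- ALLOYDB_PROJECT_ID: the ID of your AlloyDB for PostgreSQL project.
- LOCATION_ID: the ID of your AlloyDB for PostgreSQL location.
- CLUSTER_ID: the ID of your AlloyDB for PostgreSQL cluster.
- DATABASE_ID: the ID of your AlloyDB for PostgreSQL database.
- TABLE_ID: the ID of your AlloyDB for PostgreSQL table.
- RECONCILIATION_MODE: Optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from AlloyDB for PostgreSQL to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in AlloyDB for PostgreSQL are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: Optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, setting reconciliationMode to FULL is strongly recommended to keep document IDs consistent.
- ID_FIELD: Optional. Specifies which field holds the document ID.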
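
Before sending the import request, it can help to validate the data store ID and build the endpoint URL in one place. The following is a minimal sketch; the helper name `import_url` is hypothetical, the character rule is the one stated for DATA_STORE_ID, and the regional host name follows the pattern used by the Python samples on this page. Any length limit on the ID is not checked here.

```python
import re

# Lowercase letters, digits, underscores, and hyphens only.
_DATA_STORE_ID = re.compile(r"[a-z0-9_-]+")


def import_url(project_id: str, data_store_id: str, location: str = "global") -> str:
    """Build the documents:import endpoint URL for a data store."""
    if not _DATA_STORE_ID.fullmatch(data_store_id):
        raise ValueError(
            "data store ID may contain only lowercase letters, digits, "
            "underscores, and hyphens"
        )
    # Non-global locations use a location-prefixed host name.
    host = (
        "discoveryengine.googleapis.com"
        if location == "global"
        else f"{location}-discoveryengine.googleapis.com"
    )
    return (
        f"https://{host}/v1/projects/{project_id}/locations/{location}"
        f"/collections/default_collection/dataStores/{data_store_id}"
        "/branches/0/documents:import"
    )


# "my-project" and "my_store-1" are placeholders for illustration.
print(import_url("my-project", "my_store-1"))
```

Rejecting an invalid ID locally gives a clearer error than a failed API call.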

Python

For more information, see the Vertex AI Search Python API reference documentation.

Create a data store


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name
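
The endpoint selection and the resource-name format used in the sample can be checked without calling the API. The following sketch mirrors that logic in plain Python. The helper names `api_endpoint_for` and `collection_parent` are assumptions made for illustration and are not part of the client library; "eu" is used only as an example of a non-global location value.

```python
from typing import Optional


def api_endpoint_for(location: str) -> Optional[str]:
    """Mirror the sample: use a location-prefixed endpoint unless the location is "global"."""
    if location == "global":
        # With no endpoint override, the client library uses its default endpoint.
        return None
    return f"{location}-discoveryengine.googleapis.com"


def collection_parent(project_id: str, location: str) -> str:
    """Build the parent name in the format shown in the sample's comment."""
    return (
        f"projects/{project_id}/locations/{location}"
        "/collections/default_collection"
    )


print(api_endpoint_for("global"))
print(api_endpoint_for("eu"))
print(collection_parent("my-project", "global"))
```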

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine_v1 as discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# alloy_db_project_id = "YOUR_ALLOY_DB_PROJECT_ID"
# alloy_db_location_id = "YOUR_ALLOY_DB_LOCATION_ID"
# alloy_db_cluster_id = "YOUR_ALLOY_DB_CLUSTER_ID"
# alloy_db_database_id = "YOUR_ALLOY_DB_DATABASE_ID"
# alloy_db_table_id = "YOUR_ALLOY_DB_TABLE_ID"

# For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    alloy_db_source=discoveryengine.AlloyDbSource(
        project_id=alloy_db_project_id,
        location_id=alloy_db_location_id,
        cluster_id=alloy_db_cluster_id,
        database_id=alloy_db_database_id,
        table_id=alloy_db_table_id,
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

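Next steps

To attach your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.

Upload structured JSON data with the API

To upload a JSON document or object directly by using the API, follow these steps.

Before importing your data, prepare your data for ingestion.

REST

To use the command line to create a data store and import structured JSON data, follow these steps.

Create a data store.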
```
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-H "X-Goog-User-Project: PROJECT_ID" \
"https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
-d '{
  "displayName": "DATA_STORE_DISPLAY_NAME",
  "industryVertical": "GENERIC",
  "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
}'
```
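Replace the following:
- PROJECT_ID: your Google Cloud project ID.
- DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
- DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.
Note: The GENERIC industry vertical is used to create structured, unstructured, and website data stores for custom search apps.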
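
Because a rejected ID only shows up after the request is sent, it can help to check the ID locally first. The following is a minimal sketch that encodes only the character rule stated for DATA_STORE_ID. The function name is hypothetical, no length limit is enforced because none is stated here, and the API performs its own validation regardless.

```python
import re

# Character set stated for DATA_STORE_ID: lowercase letters, digits, underscores, hyphens.
_DATA_STORE_ID_PATTERN = re.compile(r"[a-z0-9_-]+")


def is_valid_data_store_id(data_store_id: str) -> bool:
    """Return True if the ID uses only the allowed characters and is not empty."""
    return _DATA_STORE_ID_PATTERN.fullmatch(data_store_id) is not None


print(is_valid_data_store_id("my-data_store-01"))
print(is_valid_data_store_id("My Store"))
```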

Import structured data.

There are several ways to upload data, including the following:

Upload a JSON document.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
-d '{
  "jsonData": "JSON_DOCUMENT_STRING"
}'

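Replace the following:

DOCUMENT_ID: a unique ID for the document. This ID can be up to 63 characters long and can contain only lowercase letters, digits, underscores, and hyphens.
JSON_DOCUMENT_STRING: the JSON document as a single string. It must conform to the JSON schema that you provided in the previous step, for example: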
```
{ \"title\": \"test title\", \"categories\": [\"cat_1\", \"cat_2\"], \"uri\": \"test uri\"}
```
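
Escaping the quotes by hand is error-prone. As a sketch of one way to produce the escaped string, the following Python snippet serializes the document twice with the standard `json` module: once to turn the document into a single string, and once more to embed that string as the `jsonData` value. The variable names are illustrative.

```python
import json

document = {
    "title": "test title",
    "categories": ["cat_1", "cat_2"],
    "uri": "test uri",
}

# First serialization: the document becomes a single JSON string.
json_document_string = json.dumps(document)

# Second serialization: embedding that string as a value escapes its quotes.
request_body = json.dumps({"jsonData": json_document_string})
print(request_body)
```

The printed `request_body` can be passed as the `-d` payload of the curl command.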

Upload a JSON object.

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
-d '{
  "structData": JSON_DOCUMENT_OBJECT
}'

Replace JSON_DOCUMENT_OBJECT with the JSON document as a JSON object. It must conform to the JSON schema that you provided in the previous step, for example:

 {
   "title": "test title",
   "categories": [
     "cat_1",
     "cat_2"
   ],
   "uri": "test uri"
 }
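
In contrast to `jsonData`, the `structData` field takes the document as a nested JSON object, so a single serialization is enough. The following sketch builds such a request body and also checks the document ID against the rule given for DOCUMENT_ID (63 characters or fewer; lowercase letters, digits, underscores, hyphens). The function name is hypothetical, and the minimum length of one character is an assumption.

```python
import json
import re

# DOCUMENT_ID rule from this page; the lower bound of 1 character is an assumption.
_DOCUMENT_ID_PATTERN = re.compile(r"[a-z0-9_-]{1,63}")


def build_struct_data_body(document_id: str, document: dict) -> str:
    """Validate the document ID, then serialize the document as structData."""
    if _DOCUMENT_ID_PATTERN.fullmatch(document_id) is None:
        raise ValueError(f"Invalid document ID: {document_id!r}")
    # The document is embedded as an object, not as an escaped string.
    return json.dumps({"structData": document})


body = build_struct_data_body(
    "doc-1",
    {"title": "test title", "categories": ["cat_1", "cat_2"], "uri": "test uri"},
)
print(body)
```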

Update with a JSON document.

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
-d '{
  "jsonData": "JSON_DOCUMENT_STRING"
}'

Update with a JSON object.

curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
-d '{
  "structData": JSON_DOCUMENT_OBJECT
}'
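
The upload and update commands differ only in the HTTP method and in where the document ID appears: uploads send POST with a `documentId` query parameter, while updates send PATCH with the ID in the path. The following sketch captures that difference. The helper names are hypothetical, and the URL layout is copied from the curl commands in this section.

```python
from typing import Tuple
from urllib.parse import quote, urlencode

BASE_URL = "https://discoveryengine.googleapis.com/v1beta"


def documents_url(project_id: str, data_store_id: str) -> str:
    """URL of the documents collection on branch 0, as used in the curl commands."""
    return (
        f"{BASE_URL}/projects/{project_id}/locations/global"
        f"/collections/default_collection/dataStores/{data_store_id}"
        "/branches/0/documents"
    )


def create_request(project_id: str, data_store_id: str, document_id: str) -> Tuple[str, str]:
    """Upload: POST to the collection, passing the ID as a query parameter."""
    query = urlencode({"documentId": document_id})
    return "POST", f"{documents_url(project_id, data_store_id)}?{query}"


def update_request(project_id: str, data_store_id: str, document_id: str) -> Tuple[str, str]:
    """Update: PATCH the document resource, with the ID in the path."""
    return "PATCH", f"{documents_url(project_id, data_store_id)}/{quote(document_id, safe='')}"


print(create_request("my-project", "my-store", "doc-1"))
print(update_request("my-project", "my-store", "doc-1"))
```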

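Next steps

To attach your data store to an app, create an app and select your data store by following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Get search results.

Troubleshoot data ingestion

If you have problems with data ingestion, review these tips:

If you're using customer-managed encryption keys and data import fails (with the error message The caller does not have permission), make sure that the CryptoKey Encrypter/Decrypter IAM role (roles/cloudkms.cryptoKeyEncrypterDecrypter) on the key has been granted to the Cloud Storage service agent. For more information, see Before you begin in "Customer-managed encryption keys".
If you're using advanced website indexing and the document usage for the data store is much lower than you expect, review the URL patterns that you specified for indexing, make sure that they cover the pages that you want to index, and expand them if needed. For example, if you used *.en.example.com/*, you might need to add *.example.com/* to the sites that you want indexed.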
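
To see why a narrow pattern can leave pages out, the following sketch tests sample URLs against the two patterns from the tip. It uses Python's `fnmatch` glob matching as a rough stand-in; this is an assumption for illustration, and the matching rules that Vertex AI Search actually applies may differ. The sample URLs and the `covered` helper are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def covered(url: str, patterns: List[str]) -> bool:
    """Return True if any glob-style pattern matches the URL."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


narrow = ["*.en.example.com/*"]
broad = narrow + ["*.example.com/*"]

for url in ["docs.en.example.com/guide", "docs.de.example.com/guide"]:
    print(url, covered(url, narrow), covered(url, broad))
```

Under this approximation, the German-language host is matched only after the broader pattern is added.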

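Create a data store using Terraform

You can use Terraform to create an empty data store. After the empty data store is created, you can ingest data into it by using the Google Cloud console or API commands.

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

To create an empty data store using Terraform, see google_discovery_engine_data_store.

Connect a third-party data source

Connecting third-party data sources to Vertex AI Search is a feature that is available only to allowlisted users.

If you are on the allowlist for this feature, see the Gemini Enterprise documentation for instructions on connecting third-party data sources. The steps are the same whether you create the connector in Vertex AI Search or in Gemini Enterprise.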
