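This page describes how to create and manage AML AI datasets. A dataset is used as input to the engine configuration, training, backtesting, and prediction pipelines. An AML AI dataset contains references to BigQuery tables in your Google Cloud project that match the AML AI input data model.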
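Before you begin
- To get the permissions that you need to create and manage datasets, ask your administrator to grant you the Financial Services Admin (financialservices.admin) IAM role on your project. For more information about granting roles, see Manage access to projects, folders, and organizations.
- Create an instance
- Some API methods return a long-running operation (LRO). These methods run asynchronously and return an Operation object; for details, see the REST reference. The operation might not be complete when the method returns its response. For these methods, send the request and then check the result. In general, all POST, PUT, PATCH, and DELETE operations are long-running.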
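Create a dataset
To create a dataset, send a create request and then check the result of the LRO.
Send the request
To create a dataset, use the projects.locations.instances.datasets.create method.
Before using any of the request data, make the following replacements:
- PROJECT_ID: your Google Cloud project ID listed in the IAM Settings
- LOCATION: the location of the instance; use one of the supported regions: us-central1, us-east1, asia-south1, europe-west1, europe-west2, europe-west4, northamerica-northeast1, southamerica-east1, australia-southeast1
- INSTANCE_ID: the user-defined identifier for the instance
- DATASET_ID: a user-defined identifier for the AML AI dataset; it can contain only lowercase letters, digits, dashes, and underscores (for example, train_jan2018_apr2020)
- BQ_INPUT_DATASET_NAME: the BigQuery input dataset name
- PARTY_TABLE: the Party table in the BigQuery input dataset
- ACCOUNT_PARTY_LINK_TABLE: the AccountPartyLink table in the BigQuery input dataset
- TRANSACTION_TABLE: the Transaction table in the BigQuery input dataset
- RISK_CASE_EVENT_TABLE: the RiskCaseEvent table in the BigQuery input dataset
- PARTY_SUPPLEMENTARY_DATA: the PartySupplementaryData table in the BigQuery input dataset; this table is optional and can be removed from the request JSON
- DATA_START_DATE: the start date and time of the data to use in the dataset; use RFC3339 UTC "Zulu" format (for example, 2014-10-02T15:01:23Z)
- DATA_END_DATE: the end date and time of the data to use in the dataset; use RFC3339 UTC "Zulu" format (for example, 2014-10-02T15:01:23Z)
Request JSON body: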
{
"tableSpecs": {
"party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE",
"account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE",
"transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE",
"risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE",
"party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA"
},
"dateRange": {
"startTime": "DATA_START_DATE",
"endTime": "DATA_END_DATE"
},
"timeZone": {
"id": "UTC"
}
}
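The replacements above can also be filled in programmatically. The following Python sketch builds the request URL and JSON body for the create call using only the standard library. The helper names (rfc3339_zulu, build_create_request) are hypothetical and not part of any Google client library, the identifier check only mirrors the rule stated for DATASET_ID, and the start-before-end check is an added sanity check rather than a documented API rule.

```python
import re
from datetime import datetime, timezone

# Assumption: mirrors the DATASET_ID rule stated above (lowercase letters,
# digits, dashes, underscores). The API may enforce further limits.
_DATASET_ID_RE = re.compile(r"^[a-z0-9_-]+$")


def rfc3339_zulu(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC "Zulu" string."""
    if moment.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_create_request(project_id, location, instance_id, dataset_id,
                         bq_dataset, tables, start, end):
    """Return (url, body) for a datasets.create call.

    `tables` maps tableSpecs keys (for example "party") to BigQuery table
    names inside `bq_dataset`.
    """
    if not _DATASET_ID_RE.match(dataset_id):
        raise ValueError(f"invalid dataset ID: {dataset_id!r}")
    # Format first so that naive datetimes are rejected before comparison.
    start_time = rfc3339_zulu(start)
    end_time = rfc3339_zulu(end)
    # Assumption: a local sanity check, not a documented API constraint.
    if start >= end:
        raise ValueError("start must be earlier than end")
    url = (
        "https://financialservices.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"instances/{instance_id}/datasets?dataset_id={dataset_id}"
    )
    body = {
        "tableSpecs": {
            key: f"bq://{project_id}.{bq_dataset}.{name}"
            for key, name in tables.items()
        },
        "dateRange": {"startTime": start_time, "endTime": end_time},
        "timeZone": {"id": "UTC"},
    }
    return url, body
```

Passing json.dumps(body) as the request payload gives the same content as the request.json file used in the curl and PowerShell options. To leave out the optional party_supplementary_data table, omit that key from `tables`.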
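To send your request, choose one of these options:
curl
Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory: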
cat > request.json << 'EOF'
{
"tableSpecs": {
"party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE",
"account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE",
"transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE",
"risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE",
"party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA"
},
"dateRange": {
"startTime": "DATA_START_DATE",
"endTime": "DATA_END_DATE"
},
"timeZone": {
"id": "UTC"
}
}
EOF
Then run the following command to send your REST request:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets?dataset_id=DATASET_ID"
PowerShell
Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:
@'
{
"tableSpecs": {
"party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE",
"account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE",
"transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE",
"risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE",
"party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA"
},
"dateRange": {
"startTime": "DATA_START_DATE",
"endTime": "DATA_END_DATE"
},
"timeZone": {
"id": "UTC"
}
}
'@ | Out-File -FilePath request.json -Encoding utf8
Then run the following command to send your REST request:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets?dataset_id=DATASET_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
"name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
"metadata": {
"@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata",
"createTime": CREATE_TIME,
"target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID",
"verb": "create",
"requestedCancellation": false,
"apiVersion": "v1"
},
"done": false
}
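Copy the returned OPERATION_ID to use in the next section.
Check the result
Use the projects.locations.operations.get method to check whether the dataset has been created. If the response contains "done": false, repeat the command until the response contains "done": true. These operations can take from a few minutes to several hours to complete.
Before using any of the request data, make the following replacements:
- PROJECT_ID: your Google Cloud project ID listed in the IAM Settings
- LOCATION: the location of the instance; use one of the supported regions: us-central1, us-east1, asia-south1, europe-west1, europe-west2, europe-west4, northamerica-northeast1, southamerica-east1, australia-southeast1
- OPERATION_ID: the identifier for the operation
To send your request, choose one of these options:
curl
Run the following command: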
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID"
PowerShell
Run the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
"name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
"metadata": {
"@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata",
"createTime": "2023-03-14T15:52:55.358979323Z",
"endTime": "2023-03-14T16:52:55.358979323Z",
"target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID",
"verb": "create",
"requestedCancellation": false,
"apiVersion": "v1"
},
"done": true,
"response": {
"@type": "type.googleapis.com/google.cloud.financialservices.v1.Dataset",
"name": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID",
"createTime": CREATE_TIME,
"updateTime": UPDATE_TIME,
"tableSpecs": {
"party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE",
"account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE",
"transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE",
"risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE",
"party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA"
},
"state": "ACTIVE",
"dateRange": {
"start_time": "DATA_START_DATE",
"end_time": "DATA_END_DATE"
},
"timeZone": {
"id": "UTC"
}
}
}
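Repeating the get request by hand can be replaced with a small polling loop. The following Python sketch is one way to do that with the standard library. The function names, the 60-second interval, and the attempt limit are illustrative assumptions, and the "error" check assumes the standard Operation shape in which a finished operation carries either "response" or "error".

```python
import json
import time
import urllib.request


def fetch_operation(operation_name, access_token):
    """GET the Operation resource and return it as a dict.

    `operation_name` is the "name" value from the create response, for
    example "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID".
    """
    request = urllib.request.Request(
        f"https://financialservices.googleapis.com/v1/{operation_name}",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_operation(fetch, interval_seconds=60.0, max_attempts=120,
                       sleep=time.sleep):
    """Call `fetch()` until the operation reports "done": true.

    Returns the operation's "response" dict, raises RuntimeError if the
    finished operation contains "error", and raises TimeoutError if the
    attempt budget runs out first.
    """
    for attempt in range(max_attempts):
        operation = fetch()
        if operation.get("done"):
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation.get("response", {})
        # Do not sleep after the final attempt.
        if attempt + 1 < max_attempts:
            sleep(interval_seconds)
    raise TimeoutError("operation did not finish within the polling budget")
```

A typical call would be wait_for_operation(lambda: fetch_operation(name, token)), where token is the output of gcloud auth print-access-token. Because these operations can run for hours and access tokens expire, a real script would likely need to refresh the token between attempts.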
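Get a dataset
To get a dataset, use the projects.locations.instances.datasets.get method.
Before using any of the request data, make the following replacements:
- PROJECT_ID: your Google Cloud project ID listed in the IAM Settings
- LOCATION: the location of the instance; use one of the supported regions: us-central1, us-east1, asia-south1, europe-west1, europe-west2, europe-west4, northamerica-northeast1, southamerica-east1, australia-southeast1
- INSTANCE_ID: the user-defined identifier for the instance
- DATASET_ID: the user-defined identifier for the dataset
To send your request, choose one of these options:
curl
Run the following command: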
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID"
PowerShell
Run the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
"name": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID",
"createTime": CREATE_TIME,
"updateTime": UPDATE_TIME,
"tableSpecs": {
"party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE",
"account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE",
"transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE",
"risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE",
"party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA"
},
"state": "ACTIVE",
"dateRange": {
"start_time": "DATA_START_DATE",
"end_time": "DATA_END_DATE"
},
"timeZone": {
"id": "UTC"
}
}
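Update a dataset
To update a dataset, use the projects.locations.instances.datasets.patch method.
The labels field is the only dataset field that you can update in AML AI. The following example updates the key-value user labels associated with the dataset.
Before using any of the request data, make the following replacements:
- PROJECT_ID: your Google Cloud project ID listed in the IAM Settings
- LOCATION: the location of the instance; use one of the supported regions: us-central1, us-east1, asia-south1, europe-west1, europe-west2, europe-west4, northamerica-northeast1, southamerica-east1, australia-southeast1
- INSTANCE_ID: the user-defined identifier for the instance
- DATASET_ID: the user-defined identifier for the dataset
- KEY: the key in a key-value pair used to organize datasets. For more information, see labels.
- VALUE: the value in a key-value pair used to organize datasets. For more information, see labels.
Request JSON body: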
{
"labels": {
"KEY": "VALUE"
}
}
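To send your request, choose one of these options:
curl
Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory: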
cat > request.json << 'EOF'
{
"labels": {
"KEY": "VALUE"
}
}
EOF
Then run the following command to send your REST request:
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID?updateMask=labels"
PowerShell
Save the request body in a file named request.json. Run the following command in the terminal to create or overwrite this file in the current directory:
@'
{
"labels": {
"KEY": "VALUE"
}
}
'@ | Out-File -FilePath request.json -Encoding utf8
Then run the following command to send your REST request:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method PATCH `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID?updateMask=labels" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
"name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
"metadata": {
"@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata",
"createTime": CREATE_TIME,
"target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID",
"verb": "update",
"requestedCancellation": false,
"apiVersion": "v1"
},
"done": false
}
For more information about how to get the result of the long-running operation (LRO), see Check the result.
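The label update shown above can also be prepared in Python. The following sketch builds a PATCH request with the updateMask=labels query parameter using only the standard library. The function name build_label_patch is hypothetical, and the request object is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_label_patch(dataset_name, labels, access_token):
    """Return a urllib Request that updates only the labels of a dataset.

    `dataset_name` is the full resource name, for example
    "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID".
    """
    query = urllib.parse.urlencode({"updateMask": "labels"})
    url = f"https://financialservices.googleapis.com/v1/{dataset_name}?{query}"
    body = json.dumps({"labels": dict(labels)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )
```

Passing the returned object to urllib.request.urlopen sends the request. The response is an Operation like the one shown above, so its completion is checked the same way as for a create request.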
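List the datasets
To list the datasets for a given instance, use the projects.locations.instances.datasets.list method.
Before using any of the request data, make the following replacements:
- PROJECT_ID: your Google Cloud project ID listed in the IAM Settings
- LOCATION: the location of the instance; use one of the supported regions: us-central1, us-east1, asia-south1, europe-west1, europe-west2, europe-west4, northamerica-northeast1, southamerica-east1, australia-southeast1
- INSTANCE_ID: the user-defined identifier for the instance
To send your request, choose one of these options:
curl
Run the following command: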
curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets"
PowerShell
Run the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
"datasets": [
{
"name": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID",
"createTime": CREATE_TIME,
"updateTime": UPDATE_TIME,
"tableSpecs": {
"party": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_TABLE",
"account_party_link": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.ACCOUNT_PARTY_LINK_TABLE",
"transaction": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.TRANSACTION_TABLE",
"risk_case_event": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.RISK_CASE_EVENT_TABLE",
"party_supplementary_data": "bq://PROJECT_ID.BQ_INPUT_DATASET_NAME.PARTY_SUPPLEMENTARY_DATA"
},
"state": "ACTIVE",
"dateRange": {
"start_time": "DATA_START_DATE",
"end_time": "DATA_END_DATE"
},
"timeZone": {
"id": "UTC"
}
}
]
}
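When an instance holds many datasets, a short summary is easier to scan than the raw JSON. The following Python sketch extracts each dataset's ID and state from a parsed list response shaped like the one above. The function name summarize_datasets and the "UNKNOWN" fallback are illustrative assumptions, not API values.

```python
def summarize_datasets(list_response):
    """Return (dataset_id, state) pairs from a parsed datasets.list response."""
    summary = []
    for dataset in list_response.get("datasets", []):
        # The dataset ID is the last segment of the resource name.
        dataset_id = dataset["name"].rsplit("/", 1)[-1]
        # "UNKNOWN" is a local placeholder used when "state" is absent.
        summary.append((dataset_id, dataset.get("state", "UNKNOWN")))
    return summary
```

For the sample response above, the result is a single pair containing DATASET_ID and ACTIVE.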
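Delete a dataset
To delete a dataset, use the projects.locations.instances.datasets.delete method.
Before using any of the request data, make the following replacements:
- PROJECT_ID: your Google Cloud project ID listed in the IAM Settings
- LOCATION: the location of the instance; use one of the supported regions: us-central1, us-east1, asia-south1, europe-west1, europe-west2, europe-west4, northamerica-northeast1, southamerica-east1, australia-southeast1
- INSTANCE_ID: the user-defined identifier for the instance
- DATASET_ID: the user-defined identifier for the dataset
To send your request, choose one of these options:
curl
Run the following command: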
curl -X DELETE \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID"
PowerShell
Run the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method DELETE `
-Headers $headers `
-Uri "https://financialservices.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{
"name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID",
"metadata": {
"@type": "type.googleapis.com/google.cloud.financialservices.v1.OperationMetadata",
"createTime": CREATE_TIME,
"target": "projects/PROJECT_ID/locations/LOCATION/instances/INSTANCE_ID/datasets/DATASET_ID",
"verb": "delete",
"requestedCancellation": false,
"apiVersion": "v1"
},
"done": false
}
For more information about how to get the result of the long-running operation (LRO), see Check the result.