This template is deprecated and will be removed in Q3 2023. Please migrate to the Cloud Storage Text to Firestore template.
The Cloud Storage Text to Datastore template is a batch pipeline that reads text files stored in Cloud Storage and writes JSON-encoded entities to Datastore. Each line in the input text files must be in the specified JSON format.
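As a rough illustration of the one-entity-per-line requirement, the following sketch serializes a record into a single line of JSON. It assumes that the specified format is the Datastore REST API (v1) Entity shape, which this page does not spell out; the helper name toEntityLine, the kind Task, the project ID, and the property names are all hypothetical.

```javascript
// Sketch only: build one line of the input file.
// Assumption: the template expects the Datastore REST API (v1) Entity shape.
// toEntityLine, "Task", "my-project", and "description" are hypothetical.
function toEntityLine(projectId, kind, name, description) {
  const entity = {
    key: {
      partitionId: { projectId: projectId },
      path: [{ kind: kind, name: name }],
    },
    properties: {
      description: { stringValue: description },
    },
  };
  // JSON.stringify without an indent argument emits no raw line breaks,
  // so the entity stays on exactly one line.
  return JSON.stringify(entity);
}

const line = toEntityLine("my-project", "Task", "task-1", "Buy milk");
console.log(line);
```

Writing one such line per entity to a file under the textReadPattern location would give the pipeline its input, provided the assumed shape matches the format that the template actually expects.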
Pipeline requirements
- Datastore must be enabled in the destination project.
Template parameters
Required parameters
- textReadPattern: A Cloud Storage path pattern that specifies the location of your text data files. For example, gs://mybucket/somepath/*.json.
- datastoreWriteProjectId: The ID of the Google Cloud project to write the Datastore entities to.
- errorWritePath: The error log output file to use for write failures that occur during processing. For example, gs://your-bucket/errors/.
Optional parameters
- javascriptTextTransformGcsPath: The Cloud Storage URI of the .js file that defines the JavaScript user-defined function (UDF) to use. For example, gs://my-bucket/my-udfs/my_file.js.
- javascriptTextTransformFunctionName: The name of the JavaScript user-defined function (UDF) to use. For example, if your JavaScript function code is myTransform(inJson) { /*...do stuff...*/ }, then the function name is myTransform. For sample JavaScript UDFs, see UDF Examples (https://github.com/GoogleCloudPlatform/DataflowTemplates#udf-examples).
- datastoreHintNumWorkers: Hint for the expected number of workers in the Datastore ramp-up throttling step. Defaults to 500.
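To make the two UDF parameters more concrete, here is a minimal sketch of a file that javascriptTextTransformGcsPath could point to. It assumes that the template passes each input line to the function as a JSON string and uses the returned string as the transformed line, which matches the myTransform(inJson) signature above but is not stated on this page. The property name importSource and its value are hypothetical, and the Datastore REST Entity shape (a properties map of typed values) is assumed. The code sticks to var and plain functions as a conservative choice, in case the JavaScript engine that runs UDFs supports only older ECMAScript syntax.

```javascript
/**
 * Sketch of a UDF, not the template's reference implementation.
 * Assumptions: inJson is one input line as a JSON string, and the
 * returned string replaces that line before it is written to Datastore.
 */
function myTransform(inJson) {
  var entity = JSON.parse(inJson);

  // Make sure there is a properties map to add to.
  entity.properties = entity.properties || {};

  // Hypothetical property: tag every entity with where it came from.
  entity.properties.importSource = { stringValue: "gcs-text-import" };

  return JSON.stringify(entity);
}
```

With this file uploaded to Cloud Storage, javascriptTextTransformFunctionName would be set to myTransform.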
Run the template
Console
- Go to the Dataflow Create job from template page.
- In the Job name field, enter a unique job name.
- Optional: For Regional endpoint, select a value from the drop-down menu. The default region is us-central1. For a list of regions where you can run a Dataflow job, see Dataflow locations.
- From the Dataflow template drop-down menu, select the Text Files on Cloud Storage to Datastore template.
- In the provided parameter fields, enter your parameter values.
- Click Run job.
gcloud
In your shell or terminal, run the template:
gcloud dataflow jobs run JOB_NAME \
    --gcs-location gs://dataflow-templates-REGION_NAME/VERSION/GCS_Text_to_Datastore \
    --region REGION_NAME \
    --parameters \
textReadPattern=PATH_TO_INPUT_TEXT_FILES,\
javascriptTextTransformGcsPath=PATH_TO_JAVASCRIPT_UDF_FILE,\
javascriptTextTransformFunctionName=JAVASCRIPT_FUNCTION,\
datastoreWriteProjectId=PROJECT_ID,\
errorWritePath=ERROR_FILE_WRITE_PATH
Replace the following:
- JOB_NAME: a unique job name of your choice
- VERSION: the version of the template that you want to use. You can use the following values:
  - latest, to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/latest/
  - the version name, like 2023-09-12-00_RC00, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/
- REGION_NAME: the region where you want to deploy your Dataflow job, for example us-central1
- PATH_TO_INPUT_TEXT_FILES: the input files pattern on Cloud Storage
- JAVASCRIPT_FUNCTION: the name of the JavaScript user-defined function (UDF) that you want to use. For example, if your JavaScript function code is myTransform(inJson) { /*...do stuff...*/ }, then the function name is myTransform. For sample JavaScript UDFs, see UDF Examples.
- PATH_TO_JAVASCRIPT_UDF_FILE: the Cloud Storage URI of the .js file that defines the JavaScript user-defined function (UDF) that you want to use, for example gs://my-bucket/my-udfs/my_file.js
- ERROR_FILE_WRITE_PATH: the Cloud Storage path where you want the error file to be written
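The value passed to --parameters is a single comma-separated list of name=value pairs with no spaces between them. The sketch below shows one way to assemble that string from an object; the helper name buildParametersFlag and all bucket and project values are hypothetical, and the helper does not handle values that themselves contain commas.

```javascript
// Sketch only: join template parameters into the single string that
// follows --parameters. buildParametersFlag is a hypothetical helper.
function buildParametersFlag(params) {
  return Object.keys(params)
    .map(function (name) {
      return name + "=" + params[name];
    })
    .join(",");
}

// Hypothetical values for the three required parameters.
const parametersFlag = buildParametersFlag({
  textReadPattern: "gs://my-bucket/input/*.json",
  datastoreWriteProjectId: "my-project",
  errorWritePath: "gs://my-bucket/errors/",
});

console.log("--parameters " + parametersFlag);
```

The optional UDF parameters can be added to the same object when a transform is needed.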
API
To run the template using the REST API, send an HTTP POST request. For more information on the API and its authorization scopes, see projects.templates.launch.
POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/templates:launch?gcsPath=gs://dataflow-templates-LOCATION/VERSION/GCS_Text_to_Datastore
{
   "jobName": "JOB_NAME",
   "parameters": {
       "textReadPattern": "PATH_TO_INPUT_TEXT_FILES",
       "javascriptTextTransformGcsPath": "PATH_TO_JAVASCRIPT_UDF_FILE",
       "javascriptTextTransformFunctionName": "JAVASCRIPT_FUNCTION",
       "datastoreWriteProjectId": "PROJECT_ID",
       "errorWritePath": "ERROR_FILE_WRITE_PATH"
   },
   "environment": { "zone": "us-central1-f" }
}
Replace the following:
- PROJECT_ID: the Google Cloud project ID where you want to run the Dataflow job
- JOB_NAME: a unique job name of your choice
- VERSION: the version of the template that you want to use. You can use the following values:
  - latest, to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-LOCATION/latest/
  - the version name, like 2023-09-12-00_RC00, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket: gs://dataflow-templates-LOCATION/
- LOCATION: the region where you want to deploy your Dataflow job, for example us-central1
- PATH_TO_INPUT_TEXT_FILES: the input files pattern on Cloud Storage
- JAVASCRIPT_FUNCTION: the name of the JavaScript user-defined function (UDF) that you want to use. For example, if your JavaScript function code is myTransform(inJson) { /*...do stuff...*/ }, then the function name is myTransform. For sample JavaScript UDFs, see UDF Examples.
- PATH_TO_JAVASCRIPT_UDF_FILE: the Cloud Storage URI of the .js file that defines the JavaScript user-defined function (UDF) that you want to use, for example gs://my-bucket/my-udfs/my_file.js
- ERROR_FILE_WRITE_PATH: the Cloud Storage path where you want the error file to be written
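The following sketch shows how the URL and JSON body of the launch request fit together once the placeholders are filled in. It only builds the request and does not send it or handle authorization. The helper name buildLaunchRequest and every sample value are hypothetical. The sketch percent-encodes the gcsPath query value, which is an assumption about good practice rather than something this page requires, and it leaves out the optional environment block.

```javascript
// Sketch only: assemble the launch URL and body without sending them.
// buildLaunchRequest and all sample values below are hypothetical.
function buildLaunchRequest(opts) {
  const gcsPath =
    "gs://dataflow-templates-" + opts.location + "/" + opts.version +
    "/GCS_Text_to_Datastore";

  const url =
    "https://dataflow.googleapis.com/v1b3/projects/" +
    encodeURIComponent(opts.projectId) +
    "/locations/" + encodeURIComponent(opts.location) +
    "/templates:launch?gcsPath=" + encodeURIComponent(gcsPath);

  const body = {
    jobName: opts.jobName,
    parameters: {
      textReadPattern: opts.textReadPattern,
      datastoreWriteProjectId: opts.projectId,
      errorWritePath: opts.errorWritePath,
    },
  };

  return { method: "POST", url: url, body: JSON.stringify(body) };
}

const request = buildLaunchRequest({
  projectId: "my-project",
  location: "us-central1",
  version: "latest",
  jobName: "text-to-datastore-job",
  textReadPattern: "gs://my-bucket/input/*.json",
  errorWritePath: "gs://my-bucket/errors/",
});

console.log(request.method, request.url);
console.log(request.body);
```

Like the request template above, the sketch uses the same project ID for the job and for datastoreWriteProjectId; the two could differ if the entities are written to another project.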
What's next
- Learn about Dataflow templates.
- See the list of Google-provided templates.