The Firestore to Cloud Storage Text template is a batch pipeline that reads Firestore entities and writes them to Cloud Storage as text files. You can provide a function to process each entity as a JSON string. If you don't provide such a function, every line in the output file will be a JSON-serialized entity.
Pipeline requirements
Firestore must be set up in the project before running the pipeline.
Template parameters
Required parameters
- firestoreReadGqlQuery: A GQL (https://cloud.google.com/datastore/docs/reference/gql_reference) query that specifies which entities to retrieve. For example, SELECT * FROM MyKind.
- firestoreReadProjectId: The ID of the Google Cloud project that contains the Firestore instance that you want to read data from.
- textWritePrefix: The Cloud Storage path prefix that specifies where the data is written. For example, gs://mybucket/somefolder/.
Optional parameters
- firestoreReadNamespace: The namespace of the requested entities. To use the default namespace, leave this parameter blank.
- javascriptTextTransformGcsPath: The Cloud Storage URI of the .js file that defines the JavaScript user-defined function (UDF) to use. For example, gs://my-bucket/my-udfs/my_file.js.
- javascriptTextTransformFunctionName: The name of the JavaScript user-defined function (UDF) to use. For example, if your JavaScript function code is myTransform(inJson) { /*...do stuff...*/ }, then the function name is myTransform. For sample JavaScript UDFs, see UDF Examples (https://github.com/GoogleCloudPlatform/DataflowTemplates#udf-examples).
User-defined function
Optionally, you can extend this template by writing a user-defined function (UDF). The template calls the UDF for each input element. Element payloads are serialized as JSON strings. For more information, see Create user-defined functions for Dataflow templates.
Function specification
The UDF has the following specification:
- Input: a Firestore entity, serialized as a JSON string.
- Output: the string value to write to Cloud Storage.
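As an illustration of this specification, the following is a minimal sketch of a UDF, not a definitive implementation. It assumes that the serialized entity carries a key.path array and a properties map. That layout is an assumption here, so inspect the template's default output to confirm the actual shape of your entities. The output field names (kind, id, propertyNames) are hypothetical.

```javascript
/**
 * Hypothetical UDF: reduces one entity to a compact JSON summary line.
 * Assumption: the entity JSON has `key.path` (array of path elements)
 * and `properties` (map of property name to value). Verify this against
 * your own data before relying on it.
 *
 * @param {string} inJson a Firestore entity, serialized as a JSON string
 * @return {string} the string value to write to Cloud Storage
 */
function myTransform(inJson) {
  var entity = JSON.parse(inJson);

  // The last path element identifies the entity itself.
  var path = (entity.key && entity.key.path) || [];
  var last = path.length > 0 ? path[path.length - 1] : {};

  var summary = {
    kind: last.kind || null,
    id: last.name || last.id || null,
    propertyNames: Object.keys(entity.properties || {}).sort()
  };

  // The returned string becomes one line of the output file.
  return JSON.stringify(summary);
}
```

To use a function like this, upload the .js file to Cloud Storage, set javascriptTextTransformGcsPath to its URI, and set javascriptTextTransformFunctionName to myTransform.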
Run the template
Console
- Go to the Dataflow Create job from template page.
- In the Job name field, enter a unique job name.
- Optional: For Regional endpoint, select a value from the drop-down menu. The default region is us-central1. For a list of regions where you can run a Dataflow job, see Dataflow locations.
- From the Dataflow template drop-down menu, select the Firestore to Text Files on Cloud Storage template.
- In the provided parameter fields, enter your parameter values.
- Click Run job.
gcloud
In your shell or terminal, run the template:
gcloud dataflow jobs run JOB_NAME \
    --gcs-location gs://dataflow-templates-REGION_NAME/VERSION/Firestore_to_GCS_Text \
    --region REGION_NAME \
    --parameters \
firestoreReadGqlQuery="SELECT * FROM FIRESTORE_KIND",\
firestoreReadProjectId=FIRESTORE_PROJECT_ID,\
firestoreReadNamespace=FIRESTORE_NAMESPACE,\
javascriptTextTransformGcsPath=PATH_TO_JAVASCRIPT_UDF_FILE,\
javascriptTextTransformFunctionName=JAVASCRIPT_FUNCTION,\
textWritePrefix=gs://BUCKET_NAME/output/
Replace the following:
- JOB_NAME: a unique job name of your choice
- REGION_NAME: the region where you want to deploy your Dataflow job, for example, us-central1
- VERSION: the version of the template that you want to use. You can use the following values:
    - latest to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/latest/
    - the version name, like 2023-09-12-00_RC00, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket: gs://dataflow-templates-REGION_NAME/
- BUCKET_NAME: the name of your Cloud Storage bucket
- FIRESTORE_PROJECT_ID: the Google Cloud project ID where the Firestore instance exists
- FIRESTORE_KIND: the type of your Firestore entities
- FIRESTORE_NAMESPACE: the namespace of your Firestore entities
- JAVASCRIPT_FUNCTION: the name of the JavaScript user-defined function (UDF) that you want to use. For example, if your JavaScript function code is myTransform(inJson) { /*...do stuff...*/ }, then the function name is myTransform. For sample JavaScript UDFs, see UDF Examples.
- PATH_TO_JAVASCRIPT_UDF_FILE: the Cloud Storage URI of the .js file that defines the JavaScript user-defined function (UDF) you want to use, for example, gs://my-bucket/my-udfs/my_file.js
API
To run the template using the REST API, send an HTTP POST request. For more information on the API and its authorization scopes, see projects.templates.launch.
POST https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/LOCATION/templates:launch?gcsPath=gs://dataflow-templates-LOCATION/VERSION/Firestore_to_GCS_Text
{
   "jobName": "JOB_NAME",
   "parameters": {
       "firestoreReadGqlQuery": "SELECT * FROM FIRESTORE_KIND",
       "firestoreReadProjectId": "FIRESTORE_PROJECT_ID",
       "firestoreReadNamespace": "FIRESTORE_NAMESPACE",
       "javascriptTextTransformGcsPath": "PATH_TO_JAVASCRIPT_UDF_FILE",
       "javascriptTextTransformFunctionName": "JAVASCRIPT_FUNCTION",
       "textWritePrefix": "gs://BUCKET_NAME/output/"
   },
   "environment": { "zone": "us-central1-f" }
}
Replace the following:
- PROJECT_ID: the Google Cloud project ID where you want to run the Dataflow job
- JOB_NAME: a unique job name of your choice
- LOCATION: the region where you want to deploy your Dataflow job, for example, us-central1
- VERSION: the version of the template that you want to use. You can use the following values:
    - latest to use the latest version of the template, which is available in the non-dated parent folder in the bucket: gs://dataflow-templates-LOCATION/latest/
    - the version name, like 2023-09-12-00_RC00, to use a specific version of the template, which can be found nested in the respective dated parent folder in the bucket: gs://dataflow-templates-LOCATION/
- BUCKET_NAME: the name of your Cloud Storage bucket
- FIRESTORE_PROJECT_ID: the Google Cloud project ID where the Firestore instance exists
- FIRESTORE_KIND: the type of your Firestore entities
- FIRESTORE_NAMESPACE: the namespace of your Firestore entities
- JAVASCRIPT_FUNCTION: the name of the JavaScript user-defined function (UDF) that you want to use. For example, if your JavaScript function code is myTransform(inJson) { /*...do stuff...*/ }, then the function name is myTransform. For sample JavaScript UDFs, see UDF Examples.
- PATH_TO_JAVASCRIPT_UDF_FILE: the Cloud Storage URI of the .js file that defines the JavaScript user-defined function (UDF) you want to use, for example, gs://my-bucket/my-udfs/my_file.js
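The placeholder substitution described above can be sketched as a small helper that assembles the request URL and JSON body. This is an illustrative sketch only: buildLaunchRequest and its option names are hypothetical and are not part of any Google client library. It leaves out the optional environment block, and it does not send the request, which would also need an OAuth access token.

```javascript
/**
 * Hypothetical helper: builds the templates:launch request for the
 * Firestore_to_GCS_Text template. All option names are assumptions
 * chosen for this sketch.
 */
function buildLaunchRequest(opts) {
  // Template location, following the gs://dataflow-templates-LOCATION/VERSION/ pattern.
  var gcsPath = "gs://dataflow-templates-" + opts.location + "/" +
      opts.version + "/Firestore_to_GCS_Text";

  var url = "https://dataflow.googleapis.com/v1b3/projects/" + opts.projectId +
      "/locations/" + opts.location + "/templates:launch?gcsPath=" + gcsPath;

  // Required template parameters.
  var parameters = {
    firestoreReadGqlQuery: opts.gqlQuery,
    firestoreReadProjectId: opts.firestoreProjectId,
    textWritePrefix: opts.textWritePrefix
  };

  // Optional template parameters are included only when a value is given.
  var optional = {
    firestoreReadNamespace: opts.namespace,
    javascriptTextTransformGcsPath: opts.udfGcsPath,
    javascriptTextTransformFunctionName: opts.udfFunctionName
  };
  Object.keys(optional).forEach(function (name) {
    if (optional[name]) {
      parameters[name] = optional[name];
    }
  });

  return {
    method: "POST",
    url: url,
    body: JSON.stringify({ jobName: opts.jobName, parameters: parameters }, null, 2)
  };
}

// Example usage with made-up values.
var request = buildLaunchRequest({
  projectId: "my-project",
  location: "us-central1",
  version: "latest",
  jobName: "firestore-export-1",
  gqlQuery: "SELECT * FROM MyKind",
  firestoreProjectId: "my-project",
  textWritePrefix: "gs://mybucket/somefolder/"
});
console.log(request.url);
console.log(request.body);
```

Building the body with JSON.stringify rather than by hand avoids malformed JSON such as a missing comma between parameters.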
What's next
- Learn about Dataflow templates.
- See the list of Google-provided templates.