Processing Landsat satellite images with GPUs

This tutorial shows you how to use GPUs on Dataflow to process Landsat 8 satellite images and render them as JPEG files. The tutorial is based on the Processing Landsat satellite images with GPUs sample in the python-docs-samples repository.

Prepare your working environment

Download the starter files, and then create your Artifact Registry repository.

Download the starter files

Download the starter files and then change directories.

  1. Clone the python-docs-samples repository.

    git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
    
  2. Navigate to the sample code directory.

    cd python-docs-samples/dataflow/gpu-examples/tensorflow-landsat
    
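The directory contains the files that the rest of this tutorial references. To confirm that you are in the right place, you can list the directory contents. The file names in the comments below are the ones this tutorial uses; the sample might contain additional files.

ls
# Expect the listing to include:
#   Dockerfile   the container definition for the worker image
#   build.yaml   the Cloud Build config that builds the image
#   run.yaml     the Cloud Build config that launches the pipeline
#   main.py      the Dataflow pipeline that renders the JPEG files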

Configure Artifact Registry

Create an Artifact Registry repository so that you can upload artifacts. Each repository can contain artifacts for a single supported format.

All repository content is encrypted by using either Google-owned and Google-managed encryption keys or customer-managed encryption keys. Artifact Registry uses Google-owned and Google-managed encryption keys by default, and no configuration is required for this option.

You must have at least Artifact Registry Writer access to the repository.

Run the following command to create a new repository. The command uses the --async flag and returns immediately, without waiting for the operation in progress to complete.

gcloud artifacts repositories create REPOSITORY \
    --repository-format=docker \
    --location=LOCATION \
    --async

Replace the following:

  • REPOSITORY: a name for your repository. For each repository location in a project, repository names must be unique.
  • LOCATION: the location for your repository, such as us-central1.
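
Because the command runs asynchronously, the repository might not be ready immediately. Before you continue, you can optionally verify that the repository was created. This check is an addition to the tutorial's required steps.

gcloud artifacts repositories describe REPOSITORY \
    --location=LOCATION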

Before you can push or pull images, configure Docker to authenticate requests to Artifact Registry. To set up authentication to Docker repositories, run the following command:

gcloud auth configure-docker LOCATION-docker.pkg.dev

Replace LOCATION with the location of the repository that you created. The command updates your Docker configuration so that you can connect to Artifact Registry in your Google Cloud project to push images.
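
If you want to confirm the change, inspect your Docker configuration file. The command adds a credential helper entry for the Artifact Registry host; with us-central1 as the location, the entry looks similar to the comment below. Paths and existing entries vary by environment.

cat ~/.docker/config.json
# Expect an entry similar to:
# {
#   "credHelpers": {
#     "us-central1-docker.pkg.dev": "gcloud"
#   }
# }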

Build the Docker image

Cloud Build lets you build a Docker image by using a Dockerfile and save it in Artifact Registry, where the image is accessible to other Google Cloud products.

Build the container image by using the build.yaml config file.

gcloud builds submit --config build.yaml
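
When the build finishes, you can optionally confirm that the image was pushed to your repository. This verification step is an addition to the tutorial; replace PROJECT_ID with your Google Cloud project ID, and use the same LOCATION and REPOSITORY values as before.

gcloud artifacts docker images list \
    LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY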

Run the Dataflow job with GPUs

The following commands launch the Dataflow pipeline with GPUs. The pipeline is launched through a Cloud Build job that uses the run.yaml config file.

export PROJECT=PROJECT_NAME
export BUCKET=BUCKET_NAME

export JOB_NAME="satellite-images-$(date +%Y%m%d-%H%M%S)"
export OUTPUT_PATH="gs://$BUCKET/samples/dataflow/landsat/output-images/"
export REGION="us-central1"
export GPU_TYPE="nvidia-tesla-t4"

gcloud builds submit \
    --config run.yaml \
    --substitutions _JOB_NAME=$JOB_NAME,_OUTPUT_PATH=$OUTPUT_PATH,_REGION=$REGION,_GPU_TYPE=$GPU_TYPE \
    --no-source

Replace the following:

  • PROJECT_NAME: the Google Cloud project name
  • BUCKET_NAME: the Cloud Storage bucket name (without the gs:// prefix)

After you run this pipeline, wait for the command to finish. If you exit your shell, you might lose the environment variables that you've set.
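
While the job runs, you can check its status with the Dataflow commands in the gcloud CLI. This optional check is an addition to the tutorial; it reuses the environment variables set earlier, so run it in the same shell.

# List Dataflow jobs in the region, filtered by the job name set above.
gcloud dataflow jobs list \
    --region=$REGION \
    --filter="name:$JOB_NAME"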

To avoid sharing the GPU between multiple worker processes, this sample uses a machine type with 1 vCPU. A custom machine type with 13 GB of extended memory meets the pipeline's memory requirements. For more information, read GPUs and worker parallelism.
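
For reference, the following sketch shows the kind of pipeline options that run.yaml passes when launching the job, based on the settings described above: a 1 vCPU custom machine type with 13 GB (13312 MB) of extended memory, one T4 GPU with the driver installed, and the container image built earlier. The option name --output-path, the image path, and the exact machine type name are assumptions here; the authoritative invocation is in the sample's run.yaml.

# A sketch of the launch options implied by the settings above,
# not the sample's exact command.
python main.py \
    --runner=DataflowRunner \
    --project=$PROJECT \
    --region=$REGION \
    --output-path=$OUTPUT_PATH \
    --sdk_container_image=LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE \
    --machine_type=custom-1-13312-ext \
    --dataflow_service_options="worker_accelerator=type:$GPU_TYPE;count:1;install-nvidia-driver"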

View your results

The pipeline in tensorflow-landsat/main.py processes Landsat 8 satellite images and renders them as JPEG files. Use the following steps to view these files.

  1. List the output JPEG files with details by using the Google Cloud CLI.

    gcloud storage ls "gs://$BUCKET/samples/dataflow/landsat/output-images/" --long --readable-sizes
    
  2. Copy the files into your local directory.

    mkdir outputs
    gcloud storage cp "gs://$BUCKET/samples/dataflow/landsat/output-images/*" outputs/
    
  3. Open these image files with the image viewer of your choice.
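
For example, on a Linux desktop you can open the first downloaded image from the command line. The xdg-open utility is an assumption about your environment; on macOS, use open instead.

# Open the first file in the outputs directory with the default viewer.
xdg-open "outputs/$(ls outputs | head -n 1)"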