To validate data integrity and detect changes, Cloud Storage encourages you to use checksums when transferring data to and from your buckets. This page provides information about how checksums are used within Cloud Storage and how to specify checksums when sending requests.
Prevent data corruption by using checksums
Data can sometimes get corrupted while being transferred to or from the cloud because of software or hardware bugs, memory or router errors, electrical disturbances, or changes to the source data during file uploads that run over an extended period.
To help protect you against data corruption, Cloud Storage supports the use of CRC32C and MD5 checksums for verifying the integrity of your data and detecting changes in your data.
CRC32C is the recommended validation method for performing integrity checks. Validation using MD5 hashes is supported for single-file uploads but isn't supported for objects that are uploaded in chunks, such as composite objects and objects uploaded using an XML API multipart upload.
Checksums for data writes
For object writes, the client calculates the checksum of the local file and
attaches it to the HTTP headers of the object upload request. The server
receives the data payload, calculates its own checksum, and
validates the data by comparing both checksums after the upload completes.
If the checksums match, the object is stored in Cloud Storage along
with its checksums. If the checksums don't match, the write request is rejected
with a BadRequestException: 400 error.
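The client-side half of this exchange can be sketched in Python. The standard library has no CRC32C, so this sketch includes a small table-driven implementation that is slow and meant for illustration only. The helper names `crc32c` and `goog_hash_value` are hypothetical. Cloud Storage expects CRC32C as the base64 encoding of the 4-byte big-endian value and MD5 as the base64 encoding of the 16-byte digest.

```python
import base64
import hashlib
import struct


def _make_table():
    poly = 0x82F63B78  # reflected Castagnoli polynomial used by CRC32C
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes, crc: int = 0) -> int:
    """Return the CRC32C of data; pass a previous result as crc to continue."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def goog_hash_value(data: bytes) -> str:
    """Format both checksums as a comma-separated X-Goog-Hash header value."""
    crc_b64 = base64.b64encode(struct.pack(">I", crc32c(data))).decode("ascii")
    md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
    return f"crc32c={crc_b64},md5={md5_b64}"


if __name__ == "__main__":
    print(goog_hash_value(b"This is a text file."))
```

For real workloads, a hardware-accelerated CRC32C library is likely a better choice than this pure-Python loop.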
Server-side validation for data writes
Cloud Storage performs server-side validation in the following cases:
- When you supply an object's MD5 or CRC32C hash in an object upload request. To learn about types of object uploads, see Object uploads.
- When you perform a copy or rewrite request within Cloud Storage. For object copy and rewrite requests, Cloud Storage automatically performs server-side validation based on a non-editable checksum stored with the source object.
JSON API single-request (media) uploads
For JSON API media uploads, you can specify
checksums in the X-Goog-Hash header of the request. For example:
curl -X POST --data-binary @Desktop/dog-pic.jpeg \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: image/jpeg" \
-H "X-Goog-Hash: crc32c=n03x6A==" \
"https://storage.googleapis.com/upload/storage/v1/b/my-bucket/o?uploadType=media&name=dog-pic.jpeg"
JSON API multipart uploads
For JSON API multipart uploads, you can specify checksums as part of the request container, either in the object metadata section or under a third boundary string. For details on the JSON structure and valid keys of an object, see the Objects resource representation.
The following example specifies a CRC32C checksum in the object metadata portion of a request container:
--separator_string
Content-Type: application/json; charset=UTF-8
{
"name":"my-document.txt",
"crc32c": "n03x6A=="
}
--separator_string
Content-Type: text/plain
This is a text file.
--separator_string--
The following example specifies a CRC32C checksum in the third boundary string of a request container:
--separator_string
Content-Type: application/json; charset=UTF-8
{
"name":"my-document.txt"
}
--separator_string
Content-Type: text/plain
This is a text file.
--separator_string
Content-Type: application/json; charset=UTF-8
{ "crc32c": "n03x6A==" }
--separator_string--
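As a minimal sketch, the first request container above (checksum in the object metadata) could be assembled in Python as follows. The function name `build_multipart_body` and its parameters are hypothetical, and the base64 CRC32C value is assumed to have been computed beforehand. The request that carries this body would also need a `Content-Type: multipart/related; boundary=separator_string` header.

```python
import json


def build_multipart_body(name: str, content: bytes, content_type: str,
                         crc32c_b64: str,
                         boundary: str = "separator_string") -> bytes:
    """Build a multipart/related body with the checksum in the metadata part."""
    metadata = json.dumps({"name": name, "crc32c": crc32c_b64})
    head_lines = [
        f"--{boundary}",
        "Content-Type: application/json; charset=UTF-8",
        "",  # blank line separates part headers from the part body
        metadata,
        f"--{boundary}",
        f"Content-Type: {content_type}",
        "",
    ]
    head = "\r\n".join(head_lines).encode("utf-8") + b"\r\n"
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail


if __name__ == "__main__":
    body = build_multipart_body(
        "my-document.txt", b"This is a text file.", "text/plain", "n03x6A==")
    print(body.decode("utf-8"))
```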
JSON API resumable uploads
For JSON API resumable uploads, you can specify checksums in the X-Goog-Hash
header of the final request that completes the upload. For example:
curl -i -X PUT --data-binary @Desktop/dog-pic.jpeg \
-H "Content-Length: 2000000" \
-H "X-Goog-Hash: crc32c=n03x6A==" \
"SESSION_URI"
The checksum specified in the final request is calculated from the whole object, not just the object data in the final request.
XML API single-request uploads
For XML API single-request uploads, you can specify checksums in the
x-goog-hash header of the request.
For example:
curl -X PUT --data-binary @Desktop/dog-pic.jpeg \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: image/jpeg" \
-H "x-goog-hash: crc32c=n03x6A==" \
"https://storage.googleapis.com/my-bucket/dog-pic.jpeg"
XML API single-request uploads also accept the standard HTTP Content-MD5 header. For details, refer to the Content-MD5 specification.
XML API multipart uploads
For XML API multipart uploads, you can specify a CRC32C checksum for the entire
object or an individual checksum for each upload part. To specify an individual
checksum for an upload part, include the x-goog-hash header in the
request for that specific part.
For example:
PUT /dog-pic.jpeg?partNumber=1&uploadId=ABgVH8 HTTP/1.1
Host: my-bucket.storage.googleapis.com
Content-Length: 1000000
x-goog-hash: crc32c=n03x6A==
Only CRC32C checksums can be used to verify the integrity of XML API multipart uploads. MD5 checksums aren't supported.
gRPC uploads
When uploading objects using gRPC, you can specify object-level checksums
in the first or last WriteObject message of any upload request, whether it's a
single-shot upload or a resumable upload.
Additionally, gRPC supports per-message checksums. Each WriteObject
message contains data chunks of up to 2 MiB, and each chunk can include its own
checksum. You can specify per-message checksums in place of or alongside an
object-level checksum.
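The relationship between per-message and object-level checksums can be sketched in Python. This is an illustration only: the `(chunk, checksum)` tuples stand in for WriteObject messages and are not the actual gRPC message types, and `plan_messages` and `crc32c` are hypothetical helper names. The object-level value is kept as a running CRC32C that is updated with each chunk, so it covers the whole object.

```python
def _make_table():
    poly = 0x82F63B78  # reflected Castagnoli polynomial used by CRC32C
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes, crc: int = 0) -> int:
    """Return the CRC32C of data; pass a previous result as crc to continue."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


MAX_CHUNK = 2 * 1024 * 1024  # 2 MiB per message, as described above


def plan_messages(data: bytes, chunk_size: int = MAX_CHUNK):
    """Split data into chunks, each paired with its own CRC32C.

    Returns (messages, object_crc), where object_crc covers all of data.
    """
    messages = []
    object_crc = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        messages.append((chunk, crc32c(chunk)))   # per-message checksum
        object_crc = crc32c(chunk, object_crc)    # running object-level checksum
    return messages, object_crc


if __name__ == "__main__":
    msgs, total = plan_messages(b"example payload" * 1000, chunk_size=4096)
    print(len(msgs), hex(total))
```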
Parallel composite uploads
In the case of parallel composite uploads, you should perform an integrity check for each component upload and then use preconditions with the upload compose request to protect against race conditions. Compose requests don't get server-side validation, so you should perform client-side validation on the new composite object if you want an end-to-end integrity check.
Google Cloud CLI copies and rewrites
In the gcloud CLI, data copied to or from a
Cloud Storage bucket gets automatically validated. For cp, mv, and
rsync commands, the gcloud CLI uses MD5 or CRC32C checksums to
determine if there is a difference between the version of an object found at the
source and the version found at the destination. If the checksum of the
source data doesn't match the checksum of the destination data, the
gcloud CLI deletes the invalid copy and prints a warning message.
This happens very rarely. If it does, retry the operation.
This automatic validation occurs after the object is finalized, so
invalid objects are visible for 1-3 seconds before they're identified and
deleted. Additionally, the gcloud CLI might be
interrupted after the upload completes but before it performs the validation,
leaving the invalid object in place. These issues can be avoided when uploading
single files to Cloud Storage by using server-side validation,
which occurs when you use the --content-md5 flag to specify an MD5 hash.
The Google Cloud CLI ignores the --content-md5 flag for objects that don't have
an MD5 hash.
Change detection for rsync
The gcloud storage rsync command compares checksums in the following
scenarios to determine whether to skip a transfer:
- The source and destination are both Cloud Storage buckets and the object has an MD5 or CRC32C checksum in both buckets.
- The object does not have a file modification time (mtime) in either the source or destination.
In cases where an object has an mtime value in both the source and
destination, such as when the source and destination are file systems, the
rsync command compares the objects' size and mtime value instead of using
checksums. Similarly, if the source is a bucket and the destination is a
local file system, the rsync command uses the time created for the source
object as a substitute for mtime, and the command does not use checksums.
If neither mtime nor checksums are available, rsync only compares file sizes
when determining if there is a change between the source version of an object
and the destination version. For example, neither mtime nor checksums are
available when comparing composite objects with objects at a cloud provider
that doesn't support CRC32C, because composite objects don't have MD5 checksums.
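The rules above can be summarized as a small decision function. The following Python sketch is one reading of this section, not the gcloud CLI's implementation. The `Entry` type, its fields, and the function name are all hypothetical, and the sketch assumes that the source object's creation time is substituted only when the source has no mtime.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical description of one side of an rsync comparison."""
    size: int
    is_cloud: bool                        # True for a bucket, False for a file system
    mtime: Optional[float] = None
    time_created: Optional[float] = None  # only meaningful for bucket objects
    checksum: Optional[str] = None        # MD5 or CRC32C, if available


def comparison_method(src: Entry, dst: Entry) -> str:
    """Return which attributes would be compared to detect a change."""
    # Bucket to bucket with checksums on both sides: compare checksums.
    if src.is_cloud and dst.is_cloud and src.checksum and dst.checksum:
        return "checksum"
    # Bucket to local file system: creation time stands in for mtime (assumed
    # here to apply only when the source has no mtime of its own).
    src_mtime = src.mtime
    if src_mtime is None and src.is_cloud and not dst.is_cloud:
        src_mtime = src.time_created
    if src_mtime is not None and dst.mtime is not None:
        return "size+mtime"
    # No usable mtime: fall back to checksums when both sides have one.
    if src.checksum and dst.checksum:
        return "checksum"
    # Neither mtime nor checksums are available.
    return "size"


if __name__ == "__main__":
    local = Entry(size=10, is_cloud=False, mtime=1700000000.0)
    print(comparison_method(local, local))
```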
Client-side validation for data writes
You can perform client-side validation of your uploads by issuing a request for the uploaded object's metadata, comparing the uploaded object's hash value to the expected value, and deleting the object in case of a mismatch. This method is useful if the object's MD5 or CRC32C hash isn't known at the start of the upload.
The following table shows the clients in Cloud Storage that support calculating checksums for object writes by default, including the client versions that support checksums.
| Client | Versions that support checksums |
|---|---|
| Cloud Storage C++ client library | 2.46 and later |
| Cloud Storage Go client library | 1.60.0 and later |
| Cloud Storage Java client library | 2.62 and later |
| Cloud Storage Node.js client library | 7.19.0 and later |
| Cloud Storage PHP client library | 1.51.0 and later |
| Cloud Storage Python client library | 3.7.0 and later |
| Cloud Storage Ruby client library | 1.60.0 |
| Cloud Storage connector | |
| Cloud Storage FUSE | 3.8.0 and later |
| Google Cloud CLI | |
Checksums for data reads
For object downloads, the server sends the object along with its stored checksum in the response. The client calculates its own checksum of the downloaded file based on the bytes it received and compares the two checksums to verify data integrity.
Some client libraries don't automatically perform checksum validation on downloaded objects. Your application might need to independently calculate the checksum of the downloaded file using the received bytes and compare it against the server-supplied hash to verify data integrity.
Client-side validation for reads
To perform an integrity check for downloaded data, calculate the checksum as the data is received and compare your results to the server-supplied checksum.
Server-side checksums are based on the complete object as it's stored in Cloud Storage, which means that the following types of downloads can't be validated against server-supplied checksums:
- Downloads that undergo decompressive transcoding: the server-supplied checksum represents the object in its compressed state, while the served data has compression removed and consequently has a different checksum value.
- A response that contains only a portion of the object data: this type of response occurs for Range requests.
  gRPC ranged reads are an exception to this bullet and support end-to-end validation. In gRPC ranged reads, Cloud Storage validates data by including a unique CRC32C checksum inside every individual response chunk of a stream, which lets your client immediately verify that the specific block of data wasn't corrupted in transit. For broader validation, the stream also provides the entire object's full checksum, which advanced clients can use to calculate a rolling total and verify the integrity of the larger file.
If your application needs to read object ranges instead of full objects at once, we recommend using gRPC. Otherwise, we recommend using ranged requests only for restarting the download of a full object after the last received offset, where you can calculate and validate the checksum after the full download completes.
When validating your download, a mismatch between your calculated checksum and the server-supplied checksum indicates that the data was corrupted in transit. In these cases, you should discard the corrupted data and use the recommended retry logic to retry the request.
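A minimal Python sketch of this check follows. It assumes the server-supplied checksum arrives as an x-goog-hash style value such as `crc32c=n03x6A==,md5=...`. The helper names are hypothetical, and the table-driven CRC32C is included only because the standard library doesn't provide one. The checksum is updated as each chunk is received and is compared once the full object has arrived.

```python
import base64
import struct


def _make_table():
    poly = 0x82F63B78  # reflected Castagnoli polynomial used by CRC32C
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes, crc: int = 0) -> int:
    """Return the CRC32C of data; pass a previous result as crc to continue."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def parse_goog_hash(value: str) -> dict:
    """Parse 'crc32c=...,md5=...' into a dict; base64 padding '=' is preserved."""
    hashes = {}
    for item in value.split(","):
        name, _, digest = item.strip().partition("=")
        hashes[name] = digest
    return hashes


def verify_download(chunks, goog_hash_value: str) -> bytes:
    """Collect chunks, checking the running CRC32C against the server value."""
    running = 0
    received = bytearray()
    for chunk in chunks:
        running = crc32c(chunk, running)
        received.extend(chunk)
    actual = base64.b64encode(struct.pack(">I", running)).decode("ascii")
    expected = parse_goog_hash(goog_hash_value).get("crc32c")
    if actual != expected:
        # Discard the data; the caller is expected to retry the request.
        raise ValueError(f"CRC32C mismatch: got {actual}, expected {expected}")
    return bytes(received)
```

Raising an exception on mismatch keeps corrupted bytes from reaching the caller, which can then apply its retry logic.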
What's next
- Learn about object uploads and object downloads in Cloud Storage.
- Learn about retry strategies for Cloud Storage.