You can configure Sensitive Data Protection to detect specific metadata labels in your content. The metadata can be automatically extracted from supported file types or provided by your application in the inspection request. If Sensitive Data Protection finds content that matches your metadata criteria, it generates a finding.
To scan for metadata labels, create a custom metadata label infoType. Then, configure your inspection or discovery scan to search for that infoType.
Benefits and use cases
This feature lets you use your existing classification taxonomies for inspection and policy enforcement. If you use a custom or third-party classification system that applies metadata labels to your documents, you can configure Sensitive Data Protection to detect these metadata labels during your inspection or discovery operations.
Example use cases include the following:
- Scan files for the presence of Microsoft sensitivity labels that contain specific key-value pairs.
- Combine metadata label detection with standard infoType detection for a multi-layered approach.
- Scan metadata that is passed alongside the content by your application, even if the metadata isn't embedded in the file.
Supported file types
- DOCX
- PPTX
- XLSX
Supported metadata formats
This feature can detect Microsoft Purview Information Protection metadata and client-provided metadata.
Microsoft Purview Information Protection metadata
This feature can detect Microsoft Purview Information Protection metadata that has the following name format:
MSIP_Label_GUID_ATTRIBUTE
Replace the following:
- GUID: The globally unique identifier of the metadata.
- ATTRIBUTE: The Microsoft Information Protection attribute of the metadata. Accepted values:
  - ActionId
  - ContentBits
  - Enabled
  - Method
  - Name
  - SetDate
  - SiteId
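The name format above can be checked locally before you write a detector. The following is a minimal sketch, not part of the product: `parse_msip_key` is a hypothetical helper, and the pattern assumes the GUID uses the canonical 8-4-4-4-12 hexadecimal layout.

```python
import re

# Attribute names taken from the accepted values listed above.
MSIP_ATTRIBUTES = (
    "ActionId", "ContentBits", "Enabled", "Method", "Name", "SetDate", "SiteId",
)

# Assumption: the GUID is written in the canonical 8-4-4-4-12 hex layout.
MSIP_KEY_PATTERN = re.compile(
    r"MSIP_Label_"
    r"(?P<guid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
    r"_(?P<attribute>" + "|".join(MSIP_ATTRIBUTES) + r")"
)


def parse_msip_key(key):
    """Return (guid, attribute) for a well-formed MSIP key, else None."""
    match = MSIP_KEY_PATTERN.fullmatch(key)
    if match is None:
        return None
    return match.group("guid"), match.group("attribute")


print(parse_msip_key("MSIP_Label_174b6716-c2ea-4041-b631-5633733fbe46_Name"))
print(parse_msip_key("Author"))
```

A check like this can help catch typos in label keys before they end up in a `key_regex`.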
Client-provided metadata
You can provide custom metadata directly in an InspectContent request. Client-provided metadata is a list of key-value pairs that are passed in the ContentMetadata field of the ContentItem.
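As a rough illustration of that shape, the sketch below turns an application-side mapping into a list of key-value pairs. The `properties`, `key`, and `value` field names follow the example request later on this page. The `to_content_metadata` helper and the `classification` key are hypothetical, and the string-only check is an assumption rather than a documented rule.

```python
def to_content_metadata(labels):
    """Convert a {key: value} mapping into a content_metadata dict."""
    properties = []
    for key, value in labels.items():
        # Assumption: keys and values are plain strings.
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("metadata keys and values must be strings")
        properties.append({"key": key, "value": value})
    return {"properties": properties}


# A text item that carries one application-supplied label.
item = {
    "value": "Quarterly forecast draft",
    "content_metadata": to_content_metadata({"classification": "Internal Use"}),
}
print(item)
```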
Limitations
Custom infoTypes of type
MetadataKeyValueExpression
aren't supported in the following:
Create a metadata label custom infoType detector
To create a metadata label custom infoType detector, define a
CustomInfoType of type
MetadataKeyValueExpression
within an InspectConfig object. The
CustomInfoType object has the following properties:
{
  "inspect_config": {
    "custom_info_types": [
      {
        "info_type": {
          "name": "CUSTOM_METADATA_LABEL_NAME"
        },
        "likelihood": "LIKELIHOOD",
        "sensitivity_score": {
          "score": "SENSITIVITY_SCORE"
        },
        "metadata_key_value_expression": {
          "key_regex": "KEY_REGULAR_EXPRESSION",
          "value_regex": "VALUE_REGULAR_EXPRESSION"
        }
      }
    ]
  }
}
Replace the following:
- CUSTOM_METADATA_LABEL_NAME: The name to assign to the custom infoType detector.
- LIKELIHOOD: (Optional) The Likelihood value to assign to all findings that match this custom infoType. If you omit this field, the default likelihood level is VERY_LIKELY.
- SENSITIVITY_SCORE: (Optional) The SensitivityScore to assign to all findings that match this custom infoType. If you omit this field, the default sensitivity score is HIGH.
  Sensitivity scores are used in data profiles. When profiling your data, Sensitive Data Protection uses the sensitivity scores of the infoTypes to calculate the sensitivity level.
- KEY_REGULAR_EXPRESSION: A regular expression to search for in the keys of metadata labels.
- VALUE_REGULAR_EXPRESSION: A regular expression to search for in the values of metadata labels.
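If you assemble this configuration in code, a small builder can omit the optional fields so that the defaults described above apply. The following is a sketch using plain dictionaries only. `metadata_label_info_type` is a hypothetical helper, and the `re.compile` call is just a rough local syntax check, because Python's regular expression dialect may differ from the one the service uses.

```python
import re


def metadata_label_info_type(name, key_regex, value_regex,
                             likelihood=None, sensitivity_score=None):
    """Build one custom_info_types entry as a plain dict."""
    # Rough local syntax check; raises re.error on a malformed pattern.
    for pattern in (key_regex, value_regex):
        re.compile(pattern)

    entry = {
        "info_type": {"name": name},
        "metadata_key_value_expression": {
            "key_regex": key_regex,
            "value_regex": value_regex,
        },
    }
    # Optional fields are left out when unset so the defaults apply.
    if likelihood is not None:
        entry["likelihood"] = likelihood
    if sensitivity_score is not None:
        entry["sensitivity_score"] = {"score": sensitivity_score}
    return entry


detector = metadata_label_info_type(
    "CUSTOM_METADATA_LABEL_NAME",
    key_regex="KEY_REGULAR_EXPRESSION",
    value_regex="VALUE_REGULAR_EXPRESSION",
)
inspect_config = {"custom_info_types": [detector]}
print(inspect_config)
```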
Example detector for a Microsoft sensitivity label
This inspect_config example defines a custom infoType named
CUSTOM_MIP_HIGHLY_CONFIDENTIAL. This custom infoType detects a Microsoft
Purview Information Protection label that contains the GUID
12345678-9012-3456-7890-123456789012 and is enabled:
{
  "inspect_config": {
    "custom_info_types": [
      {
        "info_type": {
          "name": "CUSTOM_MIP_HIGHLY_CONFIDENTIAL"
        },
        "likelihood": "VERY_LIKELY",
        "metadata_key_value_expression": {
          "key_regex": "MSIP_Label_12345678-9012-3456-7890-123456789012_Enabled",
          "value_regex": "true"
        }
      }
    ],
    "min_likelihood": "POSSIBLE"
  }
}
When you use this configuration in an inspection job,
Sensitive Data Protection generates a CUSTOM_MIP_HIGHLY_CONFIDENTIAL
finding if it finds content where the metadata key
MSIP_Label_12345678-9012-3456-7890-123456789012_Enabled has the value true.
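To reason about which labels a key and value expression pair would select, you can approximate the matching locally. The sketch below is not the service's implementation: `matching_labels` is a hypothetical helper, it uses Python's `re.search` (the service's regular expression dialect and its full-match versus partial-match behavior are assumptions here), and the sample metadata is invented for illustration.

```python
import re


def matching_labels(metadata, key_regex, value_regex):
    """Return the (key, value) pairs where both expressions match."""
    key_pattern = re.compile(key_regex)
    value_pattern = re.compile(value_regex)
    return [
        (key, value)
        for key, value in metadata.items()
        # Assumption: a partial match anywhere in the string counts.
        if key_pattern.search(key) and value_pattern.search(value)
    ]


# Invented sample metadata for one document.
metadata = {
    "MSIP_Label_12345678-9012-3456-7890-123456789012_Enabled": "true",
    "MSIP_Label_12345678-9012-3456-7890-123456789012_Name": "Highly Confidential",
    "Author": "example",
}

hits = matching_labels(
    metadata,
    key_regex="MSIP_Label_12345678-9012-3456-7890-123456789012_Enabled",
    value_regex="true",
)
print(hits)
```

Only the `_Enabled` entry is selected here, because a pair counts only when the key and the value match on the same label.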
Scan for client-provided metadata
To scan for client-provided metadata labels, follow these steps:
- Create a custom metadata label infoType detector.
- Include the metadata that you want to scan in the ContentMetadata field of your ContentItem.
Example request for scanning client-provided metadata
The following example shows an InspectContent request that includes both a
PDF file and client-provided metadata. The request uses a custom infoType
named CUSTOM_MIP_CONFIDENTIAL_INTERNAL_USE to scan both the file and the
provided metadata for files that are marked as "Confidential" or "Internal Use".
{
  "inspect_config": {
    "custom_info_types": [
      {
        "info_type": {
          "name": "CUSTOM_MIP_CONFIDENTIAL_INTERNAL_USE"
        },
        "likelihood": "VERY_LIKELY",
        "metadata_key_value_expression": {
          "key_regex": "MSIP_Label_.*_Name",
          "value_regex": "Confidential|Internal Use"
        }
      }
    ]
  },
  "item": {
    "byte_item": {
      "type": "PDF",
      "data": "BASE64_ENCODED_PDF"
    },
    "content_metadata": {
      "properties": [
        {
          "key": "MSIP_Label_174b6716-c2ea-4041-b631-5633733fbe46_Name",
          "value": "Confidential"
        }
      ]
    }
  }
}
Replace BASE64_ENCODED_PDF with a base64-encoded
file to scan.
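One way to produce the base64 string and assemble the request body is sketched below with the Python standard library. `build_inspect_content_body` is a hypothetical helper, and the placeholder bytes stand in for a real file that you would read from disk. Sending the request and authenticating are outside the scope of this sketch.

```python
import base64
import json


def build_inspect_content_body(file_bytes, file_type, properties, inspect_config):
    """Assemble a request body dict with the file base64-encoded."""
    return {
        "inspect_config": inspect_config,
        "item": {
            "byte_item": {
                "type": file_type,
                # JSON cannot carry raw bytes, so encode them as base64 text.
                "data": base64.b64encode(file_bytes).decode("ascii"),
            },
            "content_metadata": {"properties": properties},
        },
    }


body = build_inspect_content_body(
    file_bytes=b"%PDF-1.4 placeholder bytes",  # stand-in for real file content
    file_type="PDF",
    properties=[
        {
            "key": "MSIP_Label_174b6716-c2ea-4041-b631-5633733fbe46_Name",
            "value": "Confidential",
        }
    ],
    inspect_config={
        "custom_info_types": [
            {
                "info_type": {"name": "CUSTOM_MIP_CONFIDENTIAL_INTERNAL_USE"},
                "likelihood": "VERY_LIKELY",
                "metadata_key_value_expression": {
                    "key_regex": "MSIP_Label_.*_Name",
                    "value_regex": "Confidential|Internal Use",
                },
            }
        ]
    },
)
payload = json.dumps(body, indent=2)
print(payload)
```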
The MetadataType in a finding's MetadataLocation indicates where the match was found. If Sensitive Data Protection finds a match in the client-provided metadata, the value is CLIENT_PROVIDED_METADATA. If the match is in file-extracted metadata, such as an MSIP label, the value is CONTENT_METADATA.