Create a custom metadata label detector

You can configure Sensitive Data Protection to detect specific metadata labels in your content. If Sensitive Data Protection finds content that matches your metadata criteria, it generates a finding.

To scan for metadata labels, create a custom metadata label infoType. Then, configure your inspection or discovery scan to search for that infoType.

Benefits and use cases

This feature lets you use your existing classification taxonomies for inspection and policy enforcement. If you use a custom or third-party classification system that applies metadata labels to your documents, you can configure Sensitive Data Protection to detect these metadata labels during your inspection or discovery operations.

Example use cases include the following:

  • Scan files for the presence of Microsoft sensitivity labels that contain specific key-value pairs.
  • Block or allow files in Gemini Enterprise based on their metadata labels.
  • Combine metadata label detection with standard infoType detection for a multi-layered approach.

Supported file types

  • DOCX
  • PDF
  • PPTX
  • XLSX

Supported metadata formats

This feature can detect Microsoft Purview Information Protection metadata labels whose names have the following format:

MSIP_Label_GUID_ATTRIBUTE

Replace the following:

  • GUID: The globally unique identifier of the metadata.
  • ATTRIBUTE: The Microsoft Information Protection attribute of the metadata. Accepted values:

    • ActionId
    • ContentBits
    • Enabled
    • Method
    • Name
    • SetDate
    • SiteId
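
As an illustration of the naming format above, the following Python sketch checks whether a metadata key follows MSIP_Label_GUID_ATTRIBUTE and splits it into its parts. It is not part of Sensitive Data Protection: the names MSIP_KEY_PATTERN and parse_msip_key are hypothetical, and the sketch assumes that the GUID uses the common 8-4-4-4-12 hexadecimal layout.

```python
import re

# Attributes listed above as accepted values.
MSIP_ATTRIBUTES = (
    "ActionId", "ContentBits", "Enabled", "Method", "Name", "SetDate", "SiteId",
)

# Assumption: the GUID uses the common 8-4-4-4-12 hexadecimal layout.
GUID_PATTERN = (
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

# Hypothetical helper pattern for keys of the form MSIP_Label_GUID_ATTRIBUTE.
MSIP_KEY_PATTERN = re.compile(
    r"MSIP_Label_(?P<guid>" + GUID_PATTERN + r")_"
    r"(?P<attribute>" + "|".join(MSIP_ATTRIBUTES) + r")"
)


def parse_msip_key(key):
    """Return (guid, attribute) if the whole key follows the format, else None."""
    match = MSIP_KEY_PATTERN.fullmatch(key)
    if match is None:
        return None
    return match.group("guid"), match.group("attribute")
```

A check like this can help catch a mistyped GUID or attribute name before you use the key in a detector.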

Limitations

Custom infoTypes of type MetadataKeyValueExpression aren't supported in the following:

Create a metadata label custom infoType detector

To create a metadata label custom infoType detector, define a CustomInfoType of type MetadataKeyValueExpression within an InspectConfig object. The CustomInfoType object has the following structure:

{
  "inspect_config": {
    "custom_info_types": [
      {
        "info_type": {
          "name": "CUSTOM_METADATA_LABEL_NAME"
        },
        "likelihood": "LIKELIHOOD",
        "sensitivity_score": {
          "score": "SENSITIVITY_SCORE"
        },
        "metadata_key_value_expression": {
          "key_regex": "KEY_REGULAR_EXPRESSION",
          "value_regex": "VALUE_REGULAR_EXPRESSION"
        }
      }
    ]
  }
}

Replace the following:

  • CUSTOM_METADATA_LABEL_NAME: The name to assign to the custom infoType detector.
  • LIKELIHOOD: (Optional) The Likelihood value to assign to all findings that match this custom infoType. If you omit this field, the default likelihood level is VERY_LIKELY.
  • SENSITIVITY_SCORE: (Optional) The SensitivityScore to assign to all findings that match this custom infoType. If you omit this field, the default sensitivity score is HIGH.

    Sensitivity scores are used in data profiles. When profiling your data, Sensitive Data Protection uses the sensitivity scores of the infoTypes to calculate the sensitivity level.

  • KEY_REGULAR_EXPRESSION: A regular expression to search for in the keys of metadata labels.

  • VALUE_REGULAR_EXPRESSION: A regular expression to search for in the values of metadata labels.
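
If you assemble the request body in code, the template above can be filled in programmatically. The following Python sketch shows one way to do that with plain dictionaries. The function name build_metadata_label_inspect_config is hypothetical, and the sketch only builds and prints the JSON. It doesn't call the Sensitive Data Protection API.

```python
import json


def build_metadata_label_inspect_config(
    name, key_regex, value_regex, likelihood=None, sensitivity_score=None
):
    """Build an inspect_config dict with one metadata label custom infoType."""
    custom_info_type = {
        "info_type": {"name": name},
        "metadata_key_value_expression": {
            "key_regex": key_regex,
            "value_regex": value_regex,
        },
    }
    # Optional fields are omitted when not set, so the service defaults
    # described above apply.
    if likelihood is not None:
        custom_info_type["likelihood"] = likelihood
    if sensitivity_score is not None:
        # snake_case key, matching the other field names in the template.
        custom_info_type["sensitivity_score"] = {"score": sensitivity_score}
    return {"inspect_config": {"custom_info_types": [custom_info_type]}}


# Example with placeholder values taken from the template above.
config = build_metadata_label_inspect_config(
    name="CUSTOM_METADATA_LABEL_NAME",
    key_regex="KEY_REGULAR_EXPRESSION",
    value_regex="VALUE_REGULAR_EXPRESSION",
)
print(json.dumps(config, indent=2))
```

Leaving out likelihood and sensitivity_score keeps the generated JSON minimal and relies on the defaults described in the list above.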

Example detector for a Microsoft sensitivity label

This inspect_config example defines a custom infoType named CUSTOM_MIP_HIGHLY_CONFIDENTIAL. This custom infoType detects a Microsoft Purview Information Protection label that contains the GUID 12345678-9012-3456-7890-123456789012 and is enabled:

{
  "inspect_config": {
    "custom_info_types": [
      {
        "info_type": {
          "name": "CUSTOM_MIP_HIGHLY_CONFIDENTIAL"
        },
        "likelihood": "VERY_LIKELY",
        "metadata_key_value_expression": {
          "key_regex": "MSIP_Label_12345678-9012-3456-7890-123456789012_Enabled",
          "value_regex": "true"
        }
      }
    ],
    "min_likelihood": "POSSIBLE"
  }
}

When you use this configuration in an inspection job, Sensitive Data Protection generates a CUSTOM_MIP_HIGHLY_CONFIDENTIAL finding if it finds content where the metadata key MSIP_Label_12345678-9012-3456-7890-123456789012_Enabled has the value true.
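
Before you run an inspection job, you might want to try the key and value expressions against sample metadata locally. The following Python sketch is a rough approximation of that check, not the service's implementation. The function matching_labels and the sample metadata values are hypothetical. The sketch assumes that a partial match is sufficient, and Python's re syntax might differ from the regular expression syntax that Sensitive Data Protection accepts.

```python
import re

KEY_REGEX = "MSIP_Label_12345678-9012-3456-7890-123456789012_Enabled"
VALUE_REGEX = "true"


def matching_labels(metadata, key_regex, value_regex):
    """Return the key-value pairs whose key and value both match their regex."""
    key_pattern = re.compile(key_regex)
    value_pattern = re.compile(value_regex)
    return {
        key: value
        for key, value in metadata.items()
        # Assumption: a partial match counts, so re.search is used.
        if key_pattern.search(key) and value_pattern.search(value)
    }


# Hypothetical metadata, as it might be extracted from a labeled document.
sample_metadata = {
    "MSIP_Label_12345678-9012-3456-7890-123456789012_Enabled": "true",
    "MSIP_Label_12345678-9012-3456-7890-123456789012_Name": "Highly Confidential",
}

matches = matching_labels(sample_metadata, KEY_REGEX, VALUE_REGEX)
print(matches)
```

Only the Enabled entry satisfies both expressions in this sample, which corresponds to the condition under which the example configuration is described as producing a finding.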