Sensitive Data Protection's built-in infoType detectors are effective at finding common types of sensitive data. Custom infoType detectors enable you to fully customize your own sensitive data detector. Inspection rules help refine the scan results that Sensitive Data Protection returns by modifying the detection mechanism of a given infoType detector.
If you want to exclude or include more values from the results that are returned by a built-in infoType detector, you can create a new custom infoType from scratch and define all the criteria that Sensitive Data Protection should look for. Alternatively, you can refine the findings that Sensitive Data Protection's built-in or custom detectors return according to criteria that you specify. You can do this by adding inspection rules that can help reduce noise, increase precision and recall, or adjust the likelihood of scan findings.
This topic discusses how to use inspection rules to exclude certain findings, detect additional findings, or adjust the likelihood of findings, all according to custom criteria that you specify. Presented in this topic are several scenarios in which you might want to alter an existing infoType detector.
The following types of inspection rules are available:
- Exclusion rules, which help exclude false or unwanted findings.
- Hotword rules, which help detect additional findings in text content.
- Adjustment rules, which help adjust the likelihood of findings based on the context in which they appear.
Target and context infoTypes
This document uses the following terms to refer to infoTypes in a ruleset.
- Target infoType: an infoType that Sensitive Data Protection excludes or adjusts when the conditions defined in the ruleset are met.
- Context infoType: an infoType that, if detected, provides context about a target infoType. Sensitive Data Protection uses context infoTypes to determine whether it must exclude or adjust a target infoType.
Rule ordering and chaining
Sensitive Data Protection applies the rules in the order you specify them in the ruleset. Therefore, the order of your rules can affect the results of the Sensitive Data Protection operation. For an example, see Boost the likelihood of person name in a health document and exclude the health document in this document.
Exclusion rules
Exclusion rules are useful in situations like the following:
- You want to exclude duplicate scan matches in results that are caused by overlapping infoType detectors. For example, you're scanning for email addresses and phone numbers, but you are receiving two hits for email addresses with phone numbers in them, such as "206-555-0764@example.org."
- You're experiencing noise in your scan results. For example, you're seeing the same dummy email address (such as "example@example.com") or domain (such as "example.com") returned an inordinate number of times by a scan for legitimate email addresses.
- You have a list of terms, phrases, or combination of characters that you want to exclude from results.
- You want to exclude an entire column of data from results.
- You want to exclude findings that are near a string that matches a regular expression.
- You want to exclude findings in images based on their spatial relationship with other detected findings in the image.
Exclusion rules API overview
Sensitive Data Protection defines an exclusion rule in the ExclusionRule object. Within ExclusionRule, you specify one of the following:
- A Dictionary object, which contains a list of strings to exclude from the results.
- A Regex object, which defines a regular expression pattern. Strings that match the pattern are excluded from the results.
- An ExcludeInfoTypes object, which contains an array of infoType detectors. If a finding is matched by any of the infoType detectors listed here, the finding is excluded from the results.
- An ExcludeByHotword object, which contains the following:
  - A regular expression that defines the hotword.
  - A proximity value that defines how near the hotword must be to the finding.
  If the finding is within the set proximity, that finding is excluded from the results. For tables, this exclusion rule type lets you exclude an entire column of data from the results.
- An ExcludeByImageFindings object, which contains the following:
  - A list of context infoTypes, which are used to determine whether to exclude a target infoType finding.
  - An ImageContainmentType object that specifies the required spatial relationship between the bounding boxes of the target and context findings. If this requirement isn't met, Sensitive Data Protection doesn't exclude the target infoType finding.
  Sensitive Data Protection excludes a target infoType finding if the bounding box of a context infoType has the specified relationship with the target infoType finding. When using ExcludeByImageFindings, you must set the matchingType field to MATCHING_TYPE_RULE_SPECIFIC.
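As a rough sketch of how these objects fit together in a REST request body, the following Python helpers build the corresponding dictionaries. The helper names (`dictionary_exclusion`, `regex_exclusion`, `info_type_exclusion`, `hotword_exclusion`, `rule_set`) are hypothetical and not part of any client library; the JSON field names are copied from the example scenarios in this document. A builder for ExcludeByImageFindings is omitted for brevity.

```python
import json


def _names(info_type_names):
    """Wrap infoType names in the {"name": ...} objects used in the JSON."""
    return [{"name": name} for name in info_type_names]


def dictionary_exclusion(words, matching_type):
    """Exclusion rule based on a list of strings."""
    return {"exclusionRule": {
        "dictionary": {"wordList": {"words": list(words)}},
        "matchingType": matching_type,
    }}


def regex_exclusion(pattern, matching_type):
    """Exclusion rule based on a regular expression pattern."""
    return {"exclusionRule": {
        "regex": {"pattern": pattern},
        "matchingType": matching_type,
    }}


def info_type_exclusion(info_type_names, matching_type):
    """Exclusion rule based on findings of other infoType detectors."""
    return {"exclusionRule": {
        "excludeInfoTypes": {"infoTypes": _names(info_type_names)},
        "matchingType": matching_type,
    }}


def hotword_exclusion(hotword_pattern, matching_type,
                      window_before=None, window_after=None):
    """Exclusion rule based on a hotword near the finding."""
    proximity = {}
    if window_before is not None:
        proximity["windowBefore"] = window_before
    if window_after is not None:
        proximity["windowAfter"] = window_after
    return {"exclusionRule": {
        "excludeByHotword": {
            "hotwordRegex": {"pattern": hotword_pattern},
            "proximity": proximity,
        },
        "matchingType": matching_type,
    }}


def rule_set(target_info_type_names, rules):
    """Group rules with the infoType detectors they apply to."""
    return {"infoTypes": _names(target_info_type_names), "rules": list(rules)}


inspect_config = {
    "infoTypes": _names(["EMAIL_ADDRESS"]),
    "ruleSet": [
        rule_set(["EMAIL_ADDRESS"], [
            dictionary_exclusion(["example@example.com"],
                                 "MATCHING_TYPE_FULL_MATCH"),
        ]),
    ],
}
print(json.dumps(inspect_config, indent=2))
```

Keeping each rule type in its own small function makes it harder to set two rule types on one ExclusionRule by accident, since only one of them can be specified.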
Exclusion rule example scenarios
Each of the following JSON snippets illustrates how to configure Sensitive Data Protection for the given scenario.
Omit specific email address from EMAIL_ADDRESS detector scan
The following JSON snippet illustrates how to indicate to Sensitive Data Protection, using an InspectConfig, that it should avoid matching on "example@example.com" in a scan that uses the infoType detector EMAIL_ADDRESS:
To learn how to install and use the client library for Sensitive Data Protection, see Sensitive Data Protection client libraries.
To authenticate to Sensitive Data Protection, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
REST
See the JSON quickstart for more information about using the DLP API with JSON.
...
"inspectConfig": {
  "infoTypes": [
    {"name": "EMAIL_ADDRESS"}
  ],
  "ruleSet": [
    {
      "infoTypes": [
        {"name": "EMAIL_ADDRESS"}
      ],
      "rules": [
        {
          "exclusionRule": {
            "dictionary": {
              "wordList": {
                "words": ["example@example.com"]
              }
            },
            "matchingType": "MATCHING_TYPE_FULL_MATCH"
          }
        }
      ]
    }
  ]
}
...
Omit email addresses ending with a specific domain from EMAIL_ADDRESS detector scan
The following JSON snippet illustrates how to indicate to Sensitive Data Protection, using an InspectConfig, that it should avoid matching on any email addresses that end with "@example.com" in a scan that uses the infoType detector EMAIL_ADDRESS:
REST
See the JSON quickstart for more information about using the DLP API with JSON.
...
"inspectConfig": {
  "infoTypes": [
    {"name": "EMAIL_ADDRESS"}
  ],
  "ruleSet": [
    {
      "infoTypes": [
        {"name": "EMAIL_ADDRESS"}
      ],
      "rules": [
        {
          "exclusionRule": {
            "regex": {
              "pattern": ".+@example.com"
            },
            "matchingType": "MATCHING_TYPE_FULL_MATCH"
          }
        }
      ]
    }
  ]
}
...
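One detail of the pattern above is easy to miss: the "." in "example.com" is unescaped, so it stands for any character rather than only a literal dot. The following sketch uses Python's `re` module to show the difference. It is only an approximation, because the service evaluates regular expressions with its own engine, and the stricter `STRICT` pattern is a suggested variant rather than something this document prescribes.

```python
import re

LOOSE = re.compile(r".+@example.com")    # pattern from the snippet above
STRICT = re.compile(r".+@example\.com")  # hypothetical stricter variant


def fully_matches(pattern, text):
    """Return True when the pattern covers the whole string."""
    return pattern.fullmatch(text) is not None


samples = ["alice@example.com", "alice@exampleXcom", "alice@example.org"]
for sample in samples:
    print(sample, fully_matches(LOOSE, sample), fully_matches(STRICT, sample))
```

If you adopt the escaped form in a JSON request body, the backslash itself must be escaped, as in ".+@example\\.com".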
Omit scan matches that include the substring "TEST"
The following JSON snippet illustrates how to indicate to Sensitive Data Protection, using an InspectConfig, that it should exclude any findings that include the token "TEST" from the specified list of infoTypes.
Note that this matches on "TEST" as a token, not a substring, so that although something like "TEST@email.com" will match, "TESTER@email.com" will not. If matching on a substring is desired, use a regex in the exclusion rule instead of a dictionary.
REST
See the JSON quickstart for more information about using the DLP API with JSON.
...
"inspectConfig": {
  "infoTypes": [
    {"name": "EMAIL_ADDRESS"},
    {"name": "DOMAIN_NAME"},
    {"name": "PHONE_NUMBER"},
    {"name": "PERSON_NAME"}
  ],
  "ruleSet": [
    {
      "infoTypes": [
        {"name": "EMAIL_ADDRESS"},
        {"name": "DOMAIN_NAME"},
        {"name": "PHONE_NUMBER"},
        {"name": "PERSON_NAME"}
      ],
      "rules": [
        {
          "exclusionRule": {
            "dictionary": {
              "wordList": {
                "words": ["TEST"]
              }
            },
            "matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
          }
        }
      ]
    }
  ]
}
...
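The token-versus-substring distinction in this scenario can be approximated locally. In the following sketch, tokens are assumed to be runs of letters and digits; the service's actual tokenization isn't specified in this document, so `tokens`, `contains_token`, and `contains_substring` are illustrative names only.

```python
import re


def tokens(text):
    """Split text into alphanumeric tokens (an assumed tokenization)."""
    return re.findall(r"[A-Za-z0-9]+", text)


def contains_token(quote, word):
    """Dictionary-style check: the word must appear as a whole token."""
    return word in tokens(quote)


def contains_substring(quote, word):
    """Regex-style check: the word may appear anywhere in the quote."""
    return re.search(re.escape(word), quote) is not None


for quote in ["TEST@email.com", "TESTER@email.com"]:
    print(quote, contains_token(quote, "TEST"), contains_substring(quote, "TEST"))
```

Under these assumptions, "TEST@email.com" satisfies both checks, while "TESTER@email.com" satisfies only the substring check.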
Omit scan matches that include the substring "Jimmy" from a custom infoType detector scan
The following JSON snippet illustrates how to indicate to Sensitive Data Protection, using an InspectConfig, that it should avoid matching on the name "Jimmy" in a scan that uses the specified custom regex detector:
REST
See the JSON quickstart for more information about using the DLP API with JSON.
...
"inspectConfig": {
  "customInfoTypes": [
    {
      "infoType": {"name": "CUSTOM_NAME_DETECTOR"},
      "regex": {
        "pattern": "[A-Z][a-z]{1,15}, [A-Z][a-z]{1,15}"
      }
    }
  ],
  "ruleSet": [
    {
      "infoTypes": [
        {"name": "CUSTOM_NAME_DETECTOR"}
      ],
      "rules": [
        {
          "exclusionRule": {
            "dictionary": {
              "wordList": {
                "words": ["Jimmy"]
              }
            },
            "matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
          }
        }
      ]
    }
  ]
}
...
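To see what the custom regex detector in this scenario picks up, the pattern can be tried with Python's `re` module. This is a local approximation only: the sample text is made up, the service uses its own regex engine, and the word-level filter on "Jimmy" is an assumed stand-in for the dictionary rule with MATCHING_TYPE_PARTIAL_MATCH.

```python
import re

# Pattern copied from the custom infoType in the snippet above.
NAME_PATTERN = re.compile(r"[A-Z][a-z]{1,15}, [A-Z][a-z]{1,15}")

# Made-up sample text for illustration.
text = "Attendees: Smith, Jimmy; Garcia, Quinn; and williams, dana."

# Every "Capitalized, Capitalized" pair is a candidate finding.
matches = [m.group(0) for m in NAME_PATTERN.finditer(text)]

# Drop candidates that contain the excluded word as a whole word.
kept = [m for m in matches if "Jimmy" not in re.findall(r"[A-Za-z]+", m)]

print(matches)
print(kept)
```

The lowercase "williams, dana" is never a candidate, because the pattern requires each word to start with an uppercase letter.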
Omit scan matches from a PERSON_NAME detector scan that overlap with a custom detector
In this scenario, the user doesn't want a PERSON_NAME finding to be returned if the same text would also be matched by the custom regex detector defined in the first part of the snippet.
The following JSON snippet specifies both a custom regex detector and an exclusion rule in the InspectConfig. The custom regex detector specifies the names to exclude from results. The exclusion rule specifies that if any results returned from a scan for PERSON_NAME would also be matched by the custom regex detector, they are omitted. Note that VIP_DETECTOR in this case is marked as EXCLUSION_TYPE_EXCLUDE, so it doesn't produce results itself. It only affects results produced by the PERSON_NAME detector.
REST
See the JSON quickstart for more information about using the DLP API with JSON.
...
"inspectConfig": {
  "infoTypes": [
    {"name": "PERSON_NAME"}
  ],
  "customInfoTypes": [
    {
      "infoType": {"name": "VIP_DETECTOR"},
      "regex": {
        "pattern": "Dana Williams|Quinn Garcia"
      },
      "exclusionType": "EXCLUSION_TYPE_EXCLUDE"
    }
  ],
  "ruleSet": [
    {
      "infoTypes": [
        {"name": "PERSON_NAME"}
      ],
      "rules": [
        {
          "exclusionRule": {
            "excludeInfoTypes": {
              "infoTypes": [
                {"name": "VIP_DETECTOR"}
              ]
            },
            "matchingType": "MATCHING_TYPE_FULL_MATCH"
          }
        }
      ]
    }
  ]
}
...
Omit matches on PERSON_NAME detector if also matched by EMAIL_ADDRESS detector
The following JSON snippet illustrates how to indicate to Sensitive Data Protection, using an InspectConfig, that it should return only one match when matches for the PERSON_NAME detector overlap with matches for the EMAIL_ADDRESS detector. This avoids the situation where an email address such as "james@example.com" matches on both the PERSON_NAME and EMAIL_ADDRESS detectors.
REST
See the JSON quickstart for more information about using the DLP API with JSON.
...
"inspectConfig": {
  "infoTypes": [
    {"name": "PERSON_NAME"},
    {"name": "EMAIL_ADDRESS"}
  ],
  "ruleSet": [
    {
      "infoTypes": [
        {"name": "PERSON_NAME"}
      ],
      "rules": [
        {
          "exclusionRule": {
            "excludeInfoTypes": {
              "infoTypes": [
                {"name": "EMAIL_ADDRESS"}
              ]
            },
            "matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
          }
        }
      ]
    }
  ]
}
...
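One way to picture overlapping findings is as character spans in the inspected text. The following sketch is a conceptual model, not the service's implementation: it assumes that a partial match means the target span shares at least one character with a context span, and the `Finding` class, the function names, and the sample offsets are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    info_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def overlaps(a, b):
    """Return True when the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def exclude_overlapping(findings, target, context):
    """Drop `target` findings that overlap any `context` finding."""
    context_findings = [f for f in findings if f.info_type == context]
    return [
        f for f in findings
        if not (f.info_type == target
                and any(overlaps(f, c) for c in context_findings))
    ]


text = "Contact james@example.com or ask for Maria."
findings = [
    Finding("PERSON_NAME", 8, 13),    # "james"
    Finding("EMAIL_ADDRESS", 8, 25),  # "james@example.com"
    Finding("PERSON_NAME", 37, 42),   # "Maria"
]

kept = exclude_overlapping(findings, "PERSON_NAME", "EMAIL_ADDRESS")
for finding in kept:
    print(finding.info_type, text[finding.start:finding.end])
```

In this model, the PERSON_NAME finding for "james" is dropped because it sits inside the email address, while the standalone name "Maria" is kept.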
Omit matches on domain names that are part of email addresses in a DOMAIN_NAME detector scan
The following JSON snippet illustrates how to indicate to Sensitive Data Protection, using an InspectConfig, that it should return matches for a DOMAIN_NAME detector scan only if the match doesn't overlap with a match in an EMAIL_ADDRESS detector scan. In this scenario, the main scan is a DOMAIN_NAME detector scan. The user doesn't want a domain name match returned in findings if the domain name is used in an email address:
REST
See the JSON quickstart for more information about using the DLP API with JSON.
...
"inspectConfig": {
  "infoTypes": [
    {"name": "DOMAIN_NAME"},
    {"name": "EMAIL_ADDRESS"}
  ],
  "customInfoTypes": [
    {
      "infoType": {"name": "EMAIL_ADDRESS"},
      "exclusionType": "EXCLUSION_TYPE_EXCLUDE"
    }
  ],
  "ruleSet": [
    {
      "infoTypes": [
        {"name": "DOMAIN_NAME"}
      ],
      "rules": [
        {
          "exclusionRule": {
            "excludeInfoTypes": {
              "infoTypes": [
                {"name": "EMAIL_ADDRESS"}
              ]
            },
            "matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
          }
        }
      ]
    }
  ]
}
...
Omit matches if they are located near a string
The following example illustrates how to exclude matches on the
US_SOCIAL_SECURITY_NUMBER infoType detector if the word "SKU" is within 10
characters before or 10 characters after the finding.
Because of the exclusion rule, this example doesn't classify 222-22-2222 as a
possible US Social Security number.
{
  "item": {
    "value": "The customer sent the product SKU 222-22-2222"
  },
  "inspectConfig": {
    "infoTypes": [
      {"name": "US_SOCIAL_SECURITY_NUMBER"}
    ],
    "ruleSet": [
      {
        "infoTypes": [
          {"name": "US_SOCIAL_SECURITY_NUMBER"}
        ],
        "rules": [
          {
            "exclusionRule": {
              "excludeByHotword": {
                "hotwordRegex": {
                  "pattern": "(SKU)"
                },
                "proximity": {
                  "windowBefore": 10,
                  "windowAfter": 10
                }
              },
              "matchingType": "MATCHING_TYPE_FULL_MATCH"
            }
          }
        ]
      }
    ],
    "includeQuote": true
  }
}
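The proximity window in this example can be pictured with a small local sketch. It assumes that windowBefore counts characters immediately before the start of the finding, that windowAfter counts characters immediately after its end, and that the hotword must fall entirely inside one of those windows; the function `hotword_nearby` and the second sample string are hypothetical.

```python
import re


def hotword_nearby(text, start, end, hotword_pattern,
                   window_before=0, window_after=0):
    """Check whether the hotword occurs in the window around text[start:end]."""
    before = text[max(0, start - window_before):start]
    after = text[end:end + window_after]
    return bool(re.search(hotword_pattern, before)
                or re.search(hotword_pattern, after))


candidate = "222-22-2222"

near = "The customer sent the product SKU 222-22-2222"
near_start = near.index(candidate)
near_result = hotword_nearby(near, near_start, near_start + len(candidate),
                             r"(SKU)", window_before=10, window_after=10)

far = "SKU list attached. The customer number is 222-22-2222"
far_start = far.index(candidate)
far_result = hotword_nearby(far, far_start, far_start + len(candidate),
                            r"(SKU)", window_before=10, window_after=10)

print(near_result, far_result)
```

In the first string, "SKU" falls inside the 10 characters before the candidate, so the finding would be excluded; in the second string, it is too far away.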
Omit findings in an entire column of data
The following example illustrates how to exclude findings in an entire column of
tabular data if the name of that column matches a regular expression. Here, any
finding that matches the US_SOCIAL_SECURITY_NUMBER infoType detector is
excluded from results if that finding is in the Fake Social Security Number
column.
This example returns only 222-22-2222, because 111-11-1111 is in the Fake
Social Security Number column.
{
  "item": {
    "table": {
      "headers": [
        {"name": "Fake Social Security Number"},
        {"name": "Real Social Security Number"}
      ],
      "rows": [
        {
          "values": [
            {"stringValue": "111-11-1111"},
            {"stringValue": "222-22-2222"}
          ]
        }
      ]
    }
  },
  "inspectConfig": {
    "infoTypes": [
      {"name": "US_SOCIAL_SECURITY_NUMBER"}
    ],
    "includeQuote": true,
    "ruleSet": [
      {
        "infoTypes": [
          {"name": "US_SOCIAL_SECURITY_NUMBER"}
        ],
        "rules": [
          {
            "exclusionRule": {
              "excludeByHotword": {
                "hotwordRegex": {
                  "pattern": "(Fake Social Security Number)"
                },
                "proximity": {
                  "windowBefore": 1
                }
              },
              "matchingType": "MATCHING_TYPE_FULL_MATCH"
            }
          }
        ]
      }
    ],
    "minLikelihood": "POSSIBLE"
  }
}
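The effect of this rule on a table can be sketched locally as a column filter. The sketch models only the outcome described above (values under a matching header are dropped); it doesn't reproduce how the service evaluates proximity in tabular data, and the function name `values_outside_excluded_columns` is hypothetical.

```python
import re

# Simplified view of the table from the request above.
table = {
    "headers": ["Fake Social Security Number", "Real Social Security Number"],
    "rows": [["111-11-1111", "222-22-2222"]],
}


def values_outside_excluded_columns(table, header_pattern):
    """Return cell values whose column header doesn't match the pattern."""
    excluded = {
        index for index, header in enumerate(table["headers"])
        if re.search(header_pattern, header)
    }
    return [
        value
        for row in table["rows"]
        for index, value in enumerate(row)
        if index not in excluded
    ]


remaining = values_outside_excluded_columns(table, r"(Fake Social Security Number)")
print(remaining)
```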
Omit findings in images based on spatial relationships
The following JSON snippets illustrate how to configure Sensitive Data Protection to exclude findings in images based on their spatial relationship with other detected objects.
Exclude a person finding if it is part of a passport
This rule excludes person findings (OBJECT_TYPE/PERSON) when they are
contained within a passport finding (OBJECT_TYPE/PERSON/PASSPORT).
{
  "inspectConfig": {
    "infoTypes": [
      {"name": "OBJECT_TYPE/PERSON"},
      {"name": "OBJECT_TYPE/PERSON/PASSPORT"}
    ],
    "ruleSet": [
      {
        "infoTypes": [
          {"name": "OBJECT_TYPE/PERSON"}
        ],
        "rules": [
          {
            "exclusionRule": {
              "excludeByImageFindings": {
                "infoTypes": [
                  {"name": "OBJECT_TYPE/PERSON/PASSPORT"}
                ],
                "imageContainmentType": {
                  "encloses": {}
                }
              },
              "matchingType": "MATCHING_TYPE_RULE_SPECIFIC"
            }
          }
        ]
      }
    ]
  }
}
Exclude a license plate finding if it contains a VIN
This rule excludes a finding of a license plate (OBJECT_TYPE/LICENSE_PLATE) if
a VEHICLE_IDENTIFICATION_NUMBER finding is fully inside it.
{
  "inspectConfig": {
    "infoTypes": [
      {"name": "OBJECT_TYPE/LICENSE_PLATE"},
      {"name": "VEHICLE_IDENTIFICATION_NUMBER"}
    ],
    "ruleSet": [
      {
        "infoTypes": [
          {"name": "OBJECT_TYPE/LICENSE_PLATE"}
        ],
        "rules": [
          {
            "exclusionRule": {
              "excludeByImageFindings": {
                "infoTypes": [
                  {"name": "VEHICLE_IDENTIFICATION_NUMBER"}
                ],
                "imageContainmentType": {
                  "fullyInside": {}
                }
              },
              "matchingType": "MATCHING_TYPE_RULE_SPECIFIC"
            }
          }
        ]
      }
    ]
  }
}
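The two spatial relationships used in these image scenarios can be pictured with axis-aligned bounding boxes. The following sketch is an interpretation based on the scenario descriptions: "encloses" is read as the context box containing the target box, and "fullyInside" as the context box lying within the target box. The `Box` class, the function names, the sample coordinates, and the treatment of touching edges are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self):
        return self.left + self.width

    @property
    def bottom(self):
        return self.top + self.height


def contains(outer, inner):
    """Return True when `inner` lies entirely within `outer` (edges may touch)."""
    return (outer.left <= inner.left and outer.top <= inner.top
            and inner.right <= outer.right and inner.bottom <= outer.bottom)


def context_encloses_target(target, context):
    """Assumed reading of "encloses": the context box contains the target."""
    return contains(context, target)


def context_fully_inside_target(target, context):
    """Assumed reading of "fullyInside": the context box is within the target."""
    return contains(target, context)


passport = Box(left=10, top=10, width=300, height=200)  # context finding
person = Box(left=30, top=40, width=80, height=100)     # target finding
print(context_encloses_target(person, passport))

plate = Box(left=0, top=0, width=120, height=40)        # target finding
vin = Box(left=10, top=10, width=100, height=20)        # context finding
print(context_fully_inside_target(plate, vin))
```

Under this reading, both sample targets would be excluded: the person box sits within the passport box, and the VIN box sits within the license plate box.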
Hotword rules
Hotword rules are useful in situations like the following:
- You want to change likelihood values assigned to scan matches based on the match's proximity to a hotword. For example, you want to set the likelihood value higher for matches on patient names depending on the names' proximity to the word "patient."
- When inspecting structured, tabular data, you want to change likelihood values assigned to matches based on a column header name. For example, you want to set the likelihood value higher for US_SOCIAL_SECURITY_NUMBER when found in a column with header ACCOUNT_ID.
Hotword rules API overview
Within Sensitive Data Protection's InspectionRule object, you specify a HotwordRule object, which adjusts the likelihood of findings within a certain proximity of hotwords.
InspectionRule objects are grouped as a "rule set" in an InspectionRuleSet
object, along with a list of infoType detectors the rule set applies to. Rules
within a rule set are applied in the order specified.
Hotword rule example scenarios
The following code snippet illustrates how to configure a hotword rule.
Increase the likelihood of a PERSON_NAME match if there is the hotword "patient" nearby
The following JSON snippet illustrates using the InspectConfig property to scan a medical database for patient names. You can use Sensitive Data Protection's built-in PERSON_NAME infoType detector, but that causes Sensitive Data Protection to match on all names of people, not just names of patients. To fix this, you can include a hotword rule that looks for the word "patient" within a certain character proximity from the first character of potential matches. You can then assign findings that match this pattern a likelihood of "very likely," since they correspond to your special criteria. Setting the minimum Likelihood to VERY_LIKELY within InspectConfig ensures that only matches to this configuration are returned in findings.
REST
See the JSON quickstart for more information about using the DLP API with JSON.
...
"inspectConfig": {
  "infoTypes": [
    {"name": "PERSON_NAME"}
  ],
  "ruleSet": [
    {
      "infoTypes": [
        {"name": "PERSON_NAME"}
      ],
      "rules": [
        {
          "hotwordRule": {
            "hotwordRegex": {
              "pattern": "patient"
            },
            "proximity": {
              "windowBefore": 50
            },
            "likelihoodAdjustment": {
              "fixedLikelihood": "VERY_LIKELY"
            }
          }
        }
      ]
    }
  ],
  "minLikelihood": "VERY_LIKELY"
}
...
For more detailed information about hotwords, see Customizing match likelihood.
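As a minimal sketch, the REST fragment above can also be assembled programmatically. The helper names `hotword_rule` and `build_inspect_config` are hypothetical and introduced only for illustration. The field names are copied from the JSON snippet, and the code only builds and serializes the configuration. It does not call the API.

```python
import json


def hotword_rule(pattern, window_before, fixed_likelihood):
    """Build one hotwordRule entry (hypothetical helper mirroring the REST JSON)."""
    return {
        "hotwordRule": {
            "hotwordRegex": {"pattern": pattern},
            "proximity": {"windowBefore": window_before},
            "likelihoodAdjustment": {"fixedLikelihood": fixed_likelihood},
        }
    }


def build_inspect_config():
    """Return the inspectConfig from the REST example as a plain dict."""
    return {
        "infoTypes": [{"name": "PERSON_NAME"}],
        "ruleSet": [
            {
                # The rule applies only to findings of these infoTypes.
                "infoTypes": [{"name": "PERSON_NAME"}],
                "rules": [hotword_rule("patient", 50, "VERY_LIKELY")],
            }
        ],
        "minLikelihood": "VERY_LIKELY",
    }


if __name__ == "__main__":
    print(json.dumps({"inspectConfig": build_inspect_config()}, indent=2))
```

Keeping the rule in a small helper makes it easier to vary the pattern, the proximity window, or the likelihood while testing a configuration.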
Adjustment rules
Adjustment rules can help you refine detection accuracy by increasing (also called boosting) or decreasing the likelihood values of findings based on the context in which they appear.
Adjustment rules are useful in situations like the following:
- You want to change the likelihood value of a target infoType finding if the finding overlaps with a context infoType finding. For example, increase the likelihood values of PERSON_NAME findings that are detected in documents that match DOCUMENT_TYPE/CONTEXT/HEALTH.
- You want to change the likelihood value of a target infoType finding based on its spatial relationship with other detected findings in the image. For example, increase the likelihood values of GENERIC_ID findings that are detected within images that match OBJECT_TYPE/PERSON/PHOTO_ID_CARD.
Adjustment rules API overview
Sensitive Data Protection defines an adjustment rule in the AdjustmentRule object. Within AdjustmentRule, you specify the following:
- One of the following:
  - An AdjustByMatchingInfoTypes object, which adjusts the likelihood value of a target infoType finding if that finding overlaps a finding of a context infoType. This object contains the following:
    - A list of context infoTypes, which are used to determine whether to adjust the target infoType finding.
    - A min_likelihood value for the context infoTypes. If the likelihood value of any detected context finding is lower than this value, Sensitive Data Protection doesn't adjust the likelihood of the target finding.
    - A matching_type value, which must be set to MATCHING_TYPE_PARTIAL_MATCH.
  - An AdjustByImageFindings object, which adjusts the likelihood value of a target infoType finding if the bounding box of a context infoType has the specified relationship with the target infoType finding. This object contains the following:
    - A list of context infoTypes, which are used to determine whether to adjust the target infoType finding.
    - An ImageContainmentType object, which specifies the required spatial relationship between the bounding boxes of the target and context findings. If this requirement isn't met, Sensitive Data Protection doesn't adjust the likelihood of the target infoType finding.
    - A min_likelihood value for the context infoTypes. If the likelihood value of any detected context finding is lower than this value, Sensitive Data Protection doesn't adjust the likelihood of the target finding.
- A likelihood_adjustment object, which specifies the new likelihood as either a fixed value or a relative adjustment.
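The structure described above can be sketched as a small builder that enforces the "one of the following" constraint before producing the JSON shape. The function names are hypothetical. The JSON field names are the camelCase forms used in the REST examples in this document, and the validation is an illustrative local check, not something the service is claimed to perform in this way.

```python
def matching_info_types(context_info_types, min_likelihood):
    """Build an adjustByMatchingInfoTypes object (hypothetical helper)."""
    return {
        "infoTypes": [{"name": name} for name in context_info_types],
        "minLikelihood": min_likelihood,
        # Per the overview above, this is the value to set here.
        "matchingType": "MATCHING_TYPE_PARTIAL_MATCH",
    }


def image_findings(context_info_types, containment, min_likelihood):
    """Build an adjustByImageFindings object (hypothetical helper).

    `containment` is the key used inside imageContainmentType,
    for example "encloses" or "fullyInside".
    """
    return {
        "infoTypes": [{"name": name} for name in context_info_types],
        "imageContainmentType": {containment: {}},
        "minLikelihood": min_likelihood,
    }


def adjustment_rule(likelihood_adjustment, by_matching_info_types=None,
                    by_image_findings=None):
    """Combine exactly one matching condition with a likelihood adjustment."""
    if (by_matching_info_types is None) == (by_image_findings is None):
        raise ValueError(
            "specify exactly one of by_matching_info_types or by_image_findings"
        )
    rule = {"likelihoodAdjustment": likelihood_adjustment}
    if by_matching_info_types is not None:
        rule["adjustByMatchingInfoTypes"] = by_matching_info_types
    else:
        rule["adjustByImageFindings"] = by_image_findings
    return {"adjustmentRule": rule}


if __name__ == "__main__":
    print(adjustment_rule(
        {"fixedLikelihood": "VERY_LIKELY"},
        by_matching_info_types=matching_info_types(
            ["DOCUMENT_TYPE/CONTEXT/HEALTH"], "POSSIBLE"),
    ))
```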
Adjustment rule example scenarios
The following JSON snippets illustrate how to configure adjustment rules for various scenarios.
Boost the likelihood of person name when used in health document
The following example boosts the likelihood value of a PERSON_NAME finding to
VERY_LIKELY if the finding overlaps with a DOCUMENT_TYPE/CONTEXT/HEALTH
finding.
{
  "parent": "projects/PROJECT_ID",
  "inspectConfig": {
    "infoTypes": [
      {
        "name": "PERSON_NAME"
      },
      {
        "name": "DOCUMENT_TYPE/CONTEXT/HEALTH"
      }
    ],
    "ruleSet": [
      {
        "infoTypes": [
          {
            "name": "PERSON_NAME"
          }
        ],
        "rules": [
          {
            "adjustmentRule": {
              "adjustByMatchingInfoTypes": {
                "infoTypes": [
                  {
                    "name": "DOCUMENT_TYPE/CONTEXT/HEALTH"
                  }
                ],
                "minLikelihood": "POSSIBLE",
                "matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
              },
              "likelihoodAdjustment": {
                "fixedLikelihood": "VERY_LIKELY"
              }
            }
          }
        ]
      }
    ]
  },
  "item": {
    "value": "My name is John and my arm is broken."
  }
}
Replace PROJECT_ID with the ID of the project associated
with your request.
Boost the likelihood of person name in a health document and exclude the health document
The following example boosts the likelihood value of a PERSON_NAME finding to
VERY_LIKELY if the finding overlaps with a DOCUMENT_TYPE/CONTEXT/HEALTH
finding. To reduce noise, this example specifies a second rule, which excludes
DOCUMENT_TYPE/CONTEXT/HEALTH findings from inspection results.
The order in which the rules in a ruleset are specified is important. In this
example, the adjustment rule is specified before the exclusion rule, so that the
DOCUMENT_TYPE/CONTEXT/HEALTH findings can be used to provide context to the
adjustment rule. If you specify the exclusion rule first, then the
DOCUMENT_TYPE/CONTEXT/HEALTH findings are excluded from the result set before
they can be used to provide context to the adjustment rule.
{
  "parent": "projects/PROJECT_ID",
  "inspectConfig": {
    "infoTypes": [{
      "name": "PERSON_NAME"
    }, {
      "name": "DOCUMENT_TYPE/CONTEXT/HEALTH"
    }],
    "minLikelihood": "VERY_UNLIKELY",
    "ruleSet": [{
      "infoTypes": [{
        "name": "PERSON_NAME"
      }],
      "rules": [{
        "adjustmentRule": {
          "adjustByMatchingInfoTypes": {
            "infoTypes": [{
              "name": "DOCUMENT_TYPE/CONTEXT/HEALTH"
            }],
            "minLikelihood": "VERY_UNLIKELY",
            "matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
          },
          "likelihoodAdjustment": {
            "fixedLikelihood": "VERY_LIKELY"
          }
        }
      }]
    }, {
      "infoTypes": [{
        "name": "DOCUMENT_TYPE/CONTEXT/HEALTH"
      }],
      "rules": [{
        "exclusionRule": {
          "excludeInfoTypes": {
            "infoTypes": [{
              "name": "DOCUMENT_TYPE/CONTEXT/HEALTH"
            }]
          },
          "matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
        }
      }]
    }]
  },
  "item": {
    "value": "My name is John and my arm is broken."
  }
}
Replace PROJECT_ID with the ID of the project associated
with your request.
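The effect of rule ordering described in this scenario can be illustrated with a toy local model. This is a simplified sketch, not the service's implementation: the `Finding` class, the character offsets, the starting `POSSIBLE` likelihoods, and the helper functions are all assumptions made for illustration, and the model ignores `minLikelihood` and the grouping of rules by target infoType.

```python
from dataclasses import dataclass, replace

HEALTH = "DOCUMENT_TYPE/CONTEXT/HEALTH"


@dataclass(frozen=True)
class Finding:
    info_type: str
    start: int       # inclusive character offset
    end: int         # exclusive character offset
    likelihood: str


def overlaps(a, b):
    """True if the two findings share at least one character."""
    return a.start < b.end and b.start < a.end


def adjust(findings, target, context, new_likelihood):
    """Toy adjustment rule: set the likelihood of target findings that
    overlap any context finding still present in the result set."""
    contexts = [f for f in findings if f.info_type == context]
    return [
        replace(f, likelihood=new_likelihood)
        if f.info_type == target and any(overlaps(f, c) for c in contexts)
        else f
        for f in findings
    ]


def exclude(findings, info_type):
    """Toy exclusion rule: drop every finding of the given infoType."""
    return [f for f in findings if f.info_type != info_type]


def run(findings, rules):
    """Apply the rules in the order given."""
    for rule in rules:
        findings = rule(findings)
    return findings


# Hypothetical findings for "My name is John and my arm is broken."
FINDINGS = [
    Finding("PERSON_NAME", 11, 15, "POSSIBLE"),
    Finding(HEALTH, 0, 37, "POSSIBLE"),
]


def boost(fs):
    return adjust(fs, "PERSON_NAME", HEALTH, "VERY_LIKELY")


def drop(fs):
    return exclude(fs, HEALTH)


if __name__ == "__main__":
    print(run(FINDINGS, [boost, drop]))  # adjustment first: name is boosted
    print(run(FINDINGS, [drop, boost]))  # exclusion first: no context remains
```

In this model, running the exclusion first leaves the adjustment rule with no context finding to match, so the PERSON_NAME finding keeps its original likelihood.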
Boost the likelihood of a person on a photo ID card
The following example boosts the likelihood value of an OBJECT_TYPE/PERSON
finding to VERY_LIKELY if it appears within the bounding box of an
OBJECT_TYPE/PERSON/PHOTO_ID_CARD finding.
...
"inspectConfig": {
  "infoTypes": [
    {
      "name": "OBJECT_TYPE/PERSON"
    },
    {
      "name": "OBJECT_TYPE/PERSON/PHOTO_ID_CARD"
    }
  ],
  "ruleSet": [
    {
      "infoTypes": [
        {
          "name": "OBJECT_TYPE/PERSON"
        }
      ],
      "rules": [
        {
          "adjustmentRule": {
            "adjustByImageFindings": {
              "infoTypes": [
                {
                  "name": "OBJECT_TYPE/PERSON/PHOTO_ID_CARD"
                }
              ],
              "imageContainmentType": {
                "encloses": {}
              },
              "minLikelihood": "POSSIBLE"
            },
            "likelihoodAdjustment": {
              "fixedLikelihood": "VERY_LIKELY"
            }
          }
        }
      ]
    }
  ]
}
...
Boost the likelihood of a passport that contains personal identifiers
The following example boosts the likelihood value of an
OBJECT_TYPE/PERSON/PASSPORT finding to VERY_LIKELY if a PERSON_NAME or
DATE_OF_BIRTH finding is fully inside it.
...
"inspectConfig": {
  "infoTypes": [
    {
      "name": "OBJECT_TYPE/PERSON/PASSPORT"
    },
    {
      "name": "PERSON_NAME"
    },
    {
      "name": "DATE_OF_BIRTH"
    }
  ],
  "ruleSet": [
    {
      "infoTypes": [
        {
          "name": "OBJECT_TYPE/PERSON/PASSPORT"
        }
      ],
      "rules": [
        {
          "adjustmentRule": {
            "adjustByImageFindings": {
              "infoTypes": [
                {
                  "name": "PERSON_NAME"
                },
                {
                  "name": "DATE_OF_BIRTH"
                }
              ],
              "imageContainmentType": {
                "fullyInside": {}
              },
              "minLikelihood": "POSSIBLE"
            },
            "likelihoodAdjustment": {
              "fixedLikelihood": "VERY_LIKELY"
            }
          }
        }
      ]
    }
  ]
}
...
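The two image scenarios use different containment types. Reading the prose of each example, `encloses` is met when the context finding's bounding box surrounds the target, and `fullyInside` is met when the context finding lies inside the target. The sketch below models that reading with simple rectangles. The `Box` class, its fields, and the treatment of touching edges are assumptions for illustration, and the service's exact geometric rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box in pixel coordinates."""
    left: int
    top: int
    width: int
    height: int


def contains(outer, inner):
    """True if inner lies entirely within outer (edges may touch)."""
    return (
        outer.left <= inner.left
        and outer.top <= inner.top
        and inner.left + inner.width <= outer.left + outer.width
        and inner.top + inner.height <= outer.top + outer.height
    )


def condition_met(containment, target, context):
    """Evaluate the assumed meaning of an imageContainmentType key."""
    if containment == "encloses":
        # The context box surrounds the target (photo ID card example).
        return contains(context, target)
    if containment == "fullyInside":
        # The context box sits inside the target (passport example).
        return contains(target, context)
    raise ValueError(f"unsupported containment type: {containment}")


if __name__ == "__main__":
    id_card = Box(left=0, top=0, width=400, height=250)
    person = Box(left=20, top=40, width=100, height=120)
    print(condition_met("encloses", target=person, context=id_card))
```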
Multiple inspection rules scenario
The following InspectConfig JSON snippet illustrates applying both exclusion and hotword rules. The snippet's rule set includes two hotword rules, a dictionary exclusion rule, and a regex exclusion rule. Notice that the four rules are specified in an array within the rules element.
REST
See the JSON quickstart for more information about using the DLP API with JSON.
...
"inspectConfig":{
  "infoTypes":[
    {
      "name":"PERSON_NAME"
    }
  ],
  "ruleSet":[
    {
      "infoTypes":[
        {
          "name":"PERSON_NAME"
        }
      ],
      "rules":[
        {
          "hotwordRule":{
            "hotwordRegex":{
              "pattern":"patient"
            },
            "proximity":{
              "windowBefore":10
            },
            "likelihoodAdjustment":{
              "fixedLikelihood":"VERY_LIKELY"
            }
          }
        },
        {
          "hotwordRule":{
            "hotwordRegex":{
              "pattern":"doctor"
            },
            "proximity":{
              "windowBefore":10
            },
            "likelihoodAdjustment":{
              "fixedLikelihood":"UNLIKELY"
            }
          }
        },
        {
          "exclusionRule":{
            "dictionary":{
              "wordList":{
                "words":[
                  "Quasimodo"
                ]
              }
            },
            "matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
          }
        },
        {
          "exclusionRule":{
            "regex":{
              "pattern":"REDACTED"
            },
            "matchingType": "MATCHING_TYPE_PARTIAL_MATCH"
          }
        }
      ]
    }
  ]
}
...
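The same four-rule set can be composed from small builder functions, which keeps the order of the rules explicit. This is a sketch: the function names are hypothetical, the field names come from the JSON above, and the code only builds the dictionary without sending a request.

```python
import json

PARTIAL = "MATCHING_TYPE_PARTIAL_MATCH"


def hotword_rule(pattern, window_before, fixed_likelihood):
    """Hypothetical helper: one hotwordRule entry."""
    return {
        "hotwordRule": {
            "hotwordRegex": {"pattern": pattern},
            "proximity": {"windowBefore": window_before},
            "likelihoodAdjustment": {"fixedLikelihood": fixed_likelihood},
        }
    }


def dictionary_exclusion(words):
    """Hypothetical helper: exclusion rule backed by a word list."""
    return {
        "exclusionRule": {
            "dictionary": {"wordList": {"words": list(words)}},
            "matchingType": PARTIAL,
        }
    }


def regex_exclusion(pattern):
    """Hypothetical helper: exclusion rule backed by a regular expression."""
    return {
        "exclusionRule": {
            "regex": {"pattern": pattern},
            "matchingType": PARTIAL,
        }
    }


def build_inspect_config():
    """Return the inspectConfig from the REST example as a plain dict."""
    rules = [
        hotword_rule("patient", 10, "VERY_LIKELY"),
        hotword_rule("doctor", 10, "UNLIKELY"),
        dictionary_exclusion(["Quasimodo"]),
        regex_exclusion("REDACTED"),
    ]
    return {
        "infoTypes": [{"name": "PERSON_NAME"}],
        "ruleSet": [
            {"infoTypes": [{"name": "PERSON_NAME"}], "rules": rules}
        ],
    }


if __name__ == "__main__":
    print(json.dumps({"inspectConfig": build_inspect_config()}, indent=2))
```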
Overlapping infoType detectors
You can define a custom infoType detector that has the same name as a built-in infoType detector. As shown in the example in the "Hotword rule example scenarios" section, when you create a custom infoType detector with the same name as a built-in infoType, any findings detected by the new infoType detector are added to those detected by the built-in detector. This is true only if the built-in infoType is also specified in the list of infoTypes in the InspectConfig object.
When creating new custom infoType detectors, test them thoroughly on example content to ensure they work as you intend.
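As a rough sketch of the configuration shape this section describes, the following builds an InspectConfig in which a custom dictionary detector reuses the name PERSON_NAME while the built-in PERSON_NAME stays in the infoTypes list. The `customInfoTypes` layout is written from memory of the REST API and should be checked against the API reference. The word list and the helper names are hypothetical.

```python
def build_inspect_config(extra_names):
    """Return a config where a custom detector shares a built-in's name."""
    return {
        # Built-in detector: listed so that its findings are still returned.
        "infoTypes": [{"name": "PERSON_NAME"}],
        # Custom detector with the same name; its findings are added.
        "customInfoTypes": [
            {
                "infoType": {"name": "PERSON_NAME"},
                "dictionary": {"wordList": {"words": list(extra_names)}},
            }
        ],
    }


def unlisted_builtin_overlaps(config):
    """Return custom infoType names that are missing from infoTypes.

    A non-empty result means the same-named built-in detector, if one
    exists, would not contribute findings under the rule stated above.
    """
    listed = {t["name"] for t in config.get("infoTypes", [])}
    custom = [c["infoType"]["name"] for c in config.get("customInfoTypes", [])]
    return [name for name in custom if name not in listed]


if __name__ == "__main__":
    config = build_inspect_config(["Zaphod"])  # hypothetical extra name
    print(unlisted_builtin_overlaps(config))
```

A check like this can be part of the testing that the last paragraph recommends, alongside running the detector on sample content.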