Filter with natural-language understanding

This page explains how to use natural-language understanding to automatically create filters from search queries and thereby improve the quality of the results returned.

You can use this feature with search apps that are connected to structured data stores.

About natural-language query understanding

If you have a custom search app with structured data, your users' natural-language queries can be reformatted as filtered queries. This can produce higher-quality search results than matching the words in the query string.

For example, a natural-language query such as "Find a coffee shop serving banana bread" might be reformulated as a query and a filter: "query": "banana bread", "filter": "type: ANY(\"cafe\")".

Using natural-language query understanding is easier and more flexible than writing your own filter expressions. For information about writing filter expressions, see Filter custom search for structured or unstructured data.

Hard and soft filters

There are two kinds of filters that you can apply for natural-language query understanding: hard and soft.

  • Hard. By default, extracted filters are applied as mandatory criteria that a result must satisfy to be returned.

    Behavior is similar to the filter field in the SearchRequest message.

  • Soft. An alternative to the hard filter is to apply a boost to the search results. Boosted results are more likely to be returned but results that don't meet the boost criterion can also be returned.

    Behavior is similar to the boost_spec field in the SearchRequest message.

You can experiment with both types of filter. If searches are not returning enough results, try the soft filter instead of the hard one.

For details on how to apply a soft filter, see Search with the soft filter below.

Examples

The following examples show how the feature works.

Example: Field extraction from queries (hard filter)

This example shows field extraction with a hard filter, using a hotel search.

Take the following query made to a structured data store for a hotel site: "Find me a family-friendly hotel with at least four stars that costs less than 300 a night, lets me bring my dog, and has free Wi-Fi."

Without natural-language query understanding, the search app looks for documents that contain the words in the query.

With natural-language query understanding and appropriately structured data, the search is made more effective by replacing some of the natural language in the query with filters. If the structured data has fields for star_rating (numbers), price (numbers), and amenities (strings), then the query can be formulated to include the following filters:

   {
       "star_rating": >=4,
       "price": <=300,
       "amenities": "Wifi", "Pets Allowed"
   }
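
To make the effect of a hard filter concrete, the following Python sketch applies the three extracted conditions to a few in-memory records. The records and the matches_hard_filter helper are hypothetical, and the amenities condition is assumed to require both amenities. The sketch only illustrates the semantics; it isn't how Vertex AI Search evaluates filters internally.

```python
# Hypothetical hotel records; field names follow the example above.
HOTELS = [
    {"title": "A", "star_rating": 4.5, "price": 250, "amenities": ["Wifi", "Pets Allowed"]},
    {"title": "B", "star_rating": 3.0, "price": 120, "amenities": ["Wifi", "Pets Allowed"]},
    {"title": "C", "star_rating": 5.0, "price": 280, "amenities": ["Wifi"]},
]


def matches_hard_filter(hotel):
    """Return True only if the record satisfies every extracted condition."""
    return (
        hotel["star_rating"] >= 4
        and hotel["price"] <= 300
        # Assumption: both amenities are required.
        and {"Wifi", "Pets Allowed"} <= set(hotel["amenities"])
    )


# A hard filter drops every record that fails any condition.
results = [h["title"] for h in HOTELS if matches_hard_filter(h)]
print(results)
```

Only hotel A passes all three conditions. Hotel B fails the star rating and hotel C lacks "Pets Allowed", so a hard filter removes both from the results.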

Example: With a geolocation filter (hard filter)

This example is similar to the preceding one except that it includes a geolocation filter, which is a special kind of extracted filter. Vertex AI Search can recognize locations in a query and create proximity filters for them.

Take the following query made to a state-wide business site: "Find me a chic and stylish hotel with at least 4 stars that is in San Francisco."

With natural-language query understanding and the geolocation filter, the search is reformulated to include the following filter for a hotel with at least a 4-star rating and within a 10 km radius of San Francisco:

   {
       "star_rating": >=4,
       "location": GEO_DISTANCE(\"San Francisco, CA\", 10000)
   }

In this example, the location passed to GEO_DISTANCE is an address. In other queries, it might be written as a latitude and longitude, even if the original query contained an address.
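
As a rough illustration of what a 10 km proximity filter means, the following Python sketch computes great-circle distance with the haversine formula and checks it against a 10,000 m radius. The coordinates and the within_radius helper are assumptions for illustration. How Vertex AI Search geocodes addresses and measures distance isn't described on this page.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(center, point, radius_m=10_000):
    """Return True if point lies within radius_m meters of center."""
    return haversine_m(*center, *point) <= radius_m


# Approximate coordinates, used only for illustration.
SAN_FRANCISCO = (37.7749, -122.4194)
FISHERMANS_WHARF = (37.8080, -122.4177)  # a few kilometers from the city center
OAKLAND = (37.8044, -122.2712)           # across the bay, more than 10 km away

print(within_radius(SAN_FRANCISCO, FISHERMANS_WHARF))
print(within_radius(SAN_FRANCISCO, OAKLAND))
```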

Example: Field extraction from queries (soft filter)

This example uses a similar hotel search, but with a soft filter, so some results that don't meet all the criteria can still be returned.

Take the following query made to a travel site: "Find me a family-friendly hotel with at least four stars that costs less than 300 a night, and lets me bring my dog."

With natural-language query understanding and appropriately structured data, the search is made more effective by replacing some of the natural language in the query with soft filters. If the structured data has fields for star_rating (numbers), price (numbers), and amenities (strings), then the following boost condition can be extracted from the query:

{
  "boostSpec": {
    "conditionBoostSpecs": [
      {
        "condition": "(star_rating >= 4) AND (price < 300) AND (amenities: ANY(\"Pets Allowed\"))",
        "boost": 0.7
      }
    ]
  }
}

In this case, some lower-rated hotels or hotels that aren't pet-friendly might also be returned.
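
The difference from a hard filter can be sketched with a toy ranking in Python. The scoring rule here (multiplying a base relevance score by 1 + boost when the condition holds) is purely an assumption for illustration, as are the records and relevance values. The actual ranking formula that Vertex AI Search uses isn't described on this page.

```python
BOOST = 0.7  # boost value from the example above

# Hypothetical records with made-up base relevance scores.
HOTELS = [
    {"title": "A", "relevance": 0.60, "star_rating": 4.5, "price": 180, "amenities": ["Pets Allowed"]},
    {"title": "B", "relevance": 0.80, "star_rating": 3.5, "price": 150, "amenities": []},
    {"title": "C", "relevance": 0.50, "star_rating": 4.0, "price": 320, "amenities": ["Pets Allowed"]},
]


def meets_condition(hotel):
    """Criteria from the user's query: four stars or more, under 300, pets allowed."""
    return (
        hotel["star_rating"] >= 4
        and hotel["price"] < 300
        and "Pets Allowed" in hotel["amenities"]
    )


def boosted_score(hotel):
    # Toy rule (an assumption): matching records get their score scaled up.
    return hotel["relevance"] * (1 + BOOST) if meets_condition(hotel) else hotel["relevance"]


# Nothing is filtered out; matching records simply move up the ranking.
ranked = sorted(HOTELS, key=boosted_score, reverse=True)
print([h["title"] for h in ranked])
```

All three hotels stay in the result list. Hotel A moves ahead of hotel B because it meets the condition, whereas a hard filter would have returned hotel A only.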

Limitations

The following limitations apply to natural-language query understanding:

  • Natural-language query understanding can't be applied to blended search apps. You get an error if you try to use natural-language query understanding with a blended search app.

  • Natural-language query understanding works only for custom search apps that use structured data stores.

  • Using natural-language query understanding increases latency, so you might choose not to use it if latency is a problem.

  • For geolocation, the location must be explicitly described. You can't use locations such as "near me" or "home".

  • The radius for geolocation is 10 km and isn't configurable.

  • Boolean fields can't be used in filters. For example, if the query is "Find me a non-smoking hotel room", then a boolean field like "non_smoking": true isn't useful but a string field like "non_smoking": "YES" can be part of the filter.
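
One way to work around the boolean limitation is to convert boolean values to strings before importing the data. The following Python sketch shows the idea. The booleans_to_strings helper and the "NO" value are assumptions for illustration, and the corresponding schema field would also need to be a string field.

```python
def booleans_to_strings(record):
    """Return a copy of the record with top-level booleans replaced by "YES" or "NO"."""
    return {
        key: ("YES" if value else "NO") if isinstance(value, bool) else value
        for key, value in record.items()
    }


# Hypothetical record for illustration.
room = {"title": "Queen room", "non_smoking": True, "price": 120}
converted = booleans_to_strings(room)
print(converted)
```

Numbers and strings pass through unchanged; only values of type bool are rewritten.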

Before you begin

Before you start using natural-language query understanding, you have to enable it for the structured data stores connected to the apps that you plan to use.

To enable natural-language query understanding, follow these steps:

REST

  1. Find your data store ID. If you already have your data store ID, skip to the next step.

    1. In the Google Cloud console, go to the AI Applications page and in the navigation menu, click Data Stores.


    2. Click the name of your data store.

    3. On the Data page for your data store, get the data store ID.

  2. Run the following curl command:

    curl -X PATCH \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID?update_mask=natural_language_query_understanding_config.mode" \
    -d '{
          "naturalLanguageQueryUnderstandingConfig": {
            "mode": "ENABLED"
          }
        }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store.
  3. Repeat steps 1 and 2 for each data store.

  4. Wait approximately 24 hours.

    If you try to use natural-language query understanding before the data store is ready, the response is the same as if filterExtractionCondition were set to DISABLED.
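
If you have several data stores, repeating the PATCH call can be scripted. The following Python sketch only builds the URL and JSON body for each data store, mirroring the curl command above; it doesn't send anything. The project and data store IDs and the enable_request helper are hypothetical. To send the requests, you would add the same authorization headers that the curl command uses.

```python
import json

API_ROOT = "https://discoveryengine.googleapis.com/v1beta"
UPDATE_MASK = "natural_language_query_understanding_config.mode"


def enable_request(project_id, data_store_id):
    """Return the (url, json_body) pair for the PATCH call shown above."""
    url = (
        f"{API_ROOT}/projects/{project_id}/locations/global"
        f"/collections/default_collection/dataStores/{data_store_id}"
        f"?update_mask={UPDATE_MASK}"
    )
    body = json.dumps({"naturalLanguageQueryUnderstandingConfig": {"mode": "ENABLED"}})
    return url, body


# Hypothetical IDs for illustration.
requests_to_send = [enable_request("my-project", ds) for ds in ["hotels-ds", "cafes-ds"]]
for url, body in requests_to_send:
    print("PATCH", url)
    print(body)
```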

Search, converting natural-language queries into filters

To run a natural-language query and have filters extracted from it automatically, do the following:

REST

  1. Run the following curl command, which calls the search method:

    curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID/servingConfigs/default_search:search" \
      -d '{
            "query": "QUERY",
            "naturalLanguageQueryUnderstandingSpec": {
              "filterExtractionCondition": "ENABLED"
            }
          }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • APP_ID: the ID of the Vertex AI Search app that you want to query. The app must be connected to a data store that contains structured data. The app can't be a blended search app.
    • QUERY: the query written in natural language.
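
The same request can be built from Python with the standard library. The following sketch constructs the request object but doesn't send it. The build_search_request helper and the sample values are hypothetical, and the access token is assumed to come from gcloud auth print-access-token, as in the curl command.

```python
import json
import urllib.request


def build_search_request(project_id, app_id, query, access_token):
    """Build (but don't send) the POST request shown in the curl command above."""
    url = (
        "https://discoveryengine.googleapis.com/v1"
        f"/projects/{project_id}/locations/global/collections/default_collection"
        f"/engines/{app_id}/servingConfigs/default_search:search"
    )
    body = {
        "query": query,
        "naturalLanguageQueryUnderstandingSpec": {"filterExtractionCondition": "ENABLED"},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values for illustration; "TOKEN" stands in for a real access token.
request = build_search_request("my-project", "hotel-app", "hotels with free Wi-Fi", "TOKEN")
print(request.get_method(), request.full_url)
# To send the request: urllib.request.urlopen(request)
```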

Search, converting locations in queries to geolocation filters

To run a natural-language query and have locations in it converted to proximity filters, do the following:

REST

  1. Run the following curl command, which calls the search method:

    curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID/servingConfigs/default_search:search" \
      -d '{
            "query": "QUERY",
            "naturalLanguageQueryUnderstandingSpec": {
              "filterExtractionCondition": "ENABLED",
              "geoSearchQueryDetectionFieldNames": ["GEO_FIELD_NAME_1", "GEO_FIELD_NAME_N"]
            }
          }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • APP_ID: the ID of the Vertex AI Search app that you want to query. The app must be connected to a data store that contains structured data. The app can't be a blended search app.
    • QUERY: the query written in natural language.
    • GEO_FIELD_NAME_1, GEO_FIELD_NAME_N: a list of names of fields of type geolocation. If a field's type isn't geolocation, then that field is ignored.

Search with the soft filter

To apply a soft filter, do the following:

REST

  1. Find your app ID. If you already have your app ID, skip to the next step.

    1. In the Google Cloud console, go to the AI Applications page.


    2. On the Apps page, find the name of your app and get the app's ID from the ID column.

  2. Run the following curl command, which calls the search method:

    curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID/servingConfigs/default_search:search" \
      -d '{
            "query": "QUERY",
            "naturalLanguageQueryUnderstandingSpec": {
              "filterExtractionCondition": "ENABLED",
              "extractedFilterBehavior": "SOFT_BOOST"
            }
          }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • APP_ID: the ID of your search app. The app must be connected to a data store that contains structured data. The app can't be a blended search app.
    • QUERY: your query in natural language.

Specify fields for natural-language queries

For a field to be used as a filter in natural-language query understanding, it must be marked as indexable in the schema. (For general information about viewing and editing a schema, see Update a schema.)

Vertex AI Search determines which of the indexable fields in the schema make sense to use in natural-language query understanding filters. However, if it uses fields that you don't want it to use, you can create an allowlist to specify which fields can be used.

Consider a hotel booking site, where there are fields such as amenities, id, price_per_night, rating, and room_types. Of these, if the id is a string of characters and numbers, Vertex AI Search is likely to exclude it from the fields used for natural-language query understanding.

However, if you observe that Vertex AI Search is returning poor-quality results because it isn't excluding fields that it should, then you need to specify which fields can be used. For example, if the hotel schema has a field for renovation_status that isn't useful to customers and might be embarrassing to the hotel chain, then you can leave it out of the list of allowed fields.

The following is an example of a record from the structured data store of hotel data:

{
  "title": "Miller-Jones",
  "rating": 1.7,
  "price_per_night": 115.16,
  "id": 2902,
  "amenities": [
    "Spa",
    "Parking",
    "Restaurant"
  ],
  "renovation_status": "Restaurant and spa renovation planned for 2027"
}

An appropriate allowlist for fields in this example would be ["amenities", "price_per_night", "rating", "title"]. The list deliberately omits renovation_status.
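
An allowlist like this can be derived by starting from the record's field names and removing the ones to keep out of filters. The following Python sketch shows the idea. The EXCLUDED set is a choice made for this example, and the remaining fields are assumed to be marked as indexable in the schema.

```python
# Top-level field names from the example record above.
RECORD_FIELDS = ["title", "rating", "price_per_night", "id", "amenities", "renovation_status"]

# Fields to keep out of natural-language filters (a choice for this example).
EXCLUDED = {"id", "renovation_status"}

allowed_field_names = sorted(f for f in RECORD_FIELDS if f not in EXCLUDED)
print(allowed_field_names)
```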

To specify an allowlist of fields that can be used for natural-language query understanding in a search, do the following:

REST

  1. Find your app ID. If you already have your app ID, skip to the next step.

    1. In the Google Cloud console, go to the AI Applications page.


    2. On the Apps page, find the name of your app and get the app's ID from the ID column.

  2. Run the following curl command, which calls the search method:

    curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID/servingConfigs/default_search:search" \
      -d '{
            "query": "QUERY",
            "naturalLanguageQueryUnderstandingSpec": {
              "filterExtractionCondition": "ENABLED",
              "allowedFieldNames": ["FIELD_1", "FIELD_2"]
            }
          }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • APP_ID: the ID of your search app. The app must be connected to a data store that contains structured data. The app can't be a blended search app.
    • QUERY: your query in natural language.
    • FIELD_1, FIELD_2: indexable fields in the schema that can be used for natural-language query understanding.
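
The request options described on this page can be assembled with one small helper. The following Python sketch builds a search request body from a query plus an optional allowlist, optional geolocation field names, and an optional soft filter. The nlq_search_body helper and the sample values are hypothetical, and whether every combination of options is accepted in a single request is an assumption that you should verify against the API reference.

```python
import json


def nlq_search_body(query, allowed_fields=None, geo_fields=None, soft=False):
    """Assemble a search request body from the options described on this page."""
    spec = {"filterExtractionCondition": "ENABLED"}
    if allowed_fields:
        spec["allowedFieldNames"] = list(allowed_fields)
    if geo_fields:
        spec["geoSearchQueryDetectionFieldNames"] = list(geo_fields)
    if soft:
        # Omitting this key leaves the default behavior, which is the hard filter.
        spec["extractedFilterBehavior"] = "SOFT_BOOST"
    return {"query": query, "naturalLanguageQueryUnderstandingSpec": spec}


# Hypothetical query and allowlist for illustration.
body = nlq_search_body(
    "pet-friendly hotel under 300 a night",
    allowed_fields=["amenities", "price_per_night", "rating", "title"],
    soft=True,
)
print(json.dumps(body, indent=2))
```

Serializing with json.dumps also guards against the hand-editing mistakes that are easy to make in inline curl payloads, such as trailing commas or stray quotation marks.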