Filter with natural-language understanding
This page explains how to use natural-language understanding to automatically
create filters from search queries and so improve the quality of the results
returned.
You can use this feature with search apps that are connected to structured data
stores.
About natural-language query understanding
If you have a custom search app with structured data,
your users' natural-language queries can be reformatted as filtered
queries. This can lead to better quality search results than searching for words
in the query string.
For example, a natural-language query such as "Find a coffee shop serving
banana bread" might be reformulated as a query and a filter:
"query": "banana bread", "filter": "type: ANY(\"cafe\")".
Using natural-language query understanding is easier and more flexible than writing your own filter
expressions. For information about writing filter expressions, see
Filter custom search for structured or unstructured data.
Hard and soft filters
There are two kinds of filters that you can apply for natural-language query understanding: hard
and soft.
Hard. By default, extracted filters are applied as
mandatory criteria that a result must satisfy to be returned.
Behavior is similar to the filter field in the SearchRequest message.
Soft. An alternative to the hard filter is to apply a boost to the
search results. Boosted results are more likely to be returned, but
results that don't meet the boost criterion can also be returned.
Behavior is similar to the boost_spec field in the SearchRequest message.
You can experiment with both types of filter. If searches are not returning
enough results, try the soft filter instead of the hard one.
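The difference between the two behaviors can be illustrated with a small local simulation. This is only a sketch and not how Vertex AI Search ranks results: the `hotels` records, their base scores, the `meets_criteria` condition, and the boost amount are all hypothetical.

```python
# Toy illustration of hard versus soft filtering; not the service's ranking logic.
hotels = [
    {"name": "A", "star_rating": 5, "score": 0.60},
    {"name": "B", "star_rating": 3, "score": 0.75},
    {"name": "C", "star_rating": 4, "score": 0.50},
]


def meets_criteria(hotel):
    """Hypothetical extracted condition: at least four stars."""
    return hotel["star_rating"] >= 4


def hard_filter(results):
    """Keep only results that satisfy the condition, ordered by score."""
    kept = [h for h in results if meets_criteria(h)]
    return sorted(kept, key=lambda h: h["score"], reverse=True)


def soft_filter(results, boost=0.2):
    """Keep every result, but raise the score of those that satisfy the condition."""
    def boosted(hotel):
        return hotel["score"] + (boost if meets_criteria(hotel) else 0.0)
    return sorted(results, key=boosted, reverse=True)


hard_names = [h["name"] for h in hard_filter(hotels)]
soft_names = [h["name"] for h in soft_filter(hotels)]
```

With the hard filter, hotel B is dropped. With the soft filter, B is still returned but is ranked below A, which it would otherwise outrank.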
Example: Field extraction from queries (hard filter)
This natural-language query understanding feature is explained through the example of searching for a
hotel.
Take the following query made to a structured data store for a hotel site:
"Find me a family-friendly hotel with at least four stars that costs less
than 300 a night, lets me bring my dog, and has free Wi-Fi."
Without natural-language query understanding, the search app looks for documents that contain the words
in the query.
With natural-language query understanding and appropriately structured data, the search is made more
effective by replacing some of the natural language in the query with filters.
If the structured data has fields for star_rating (numbers), price
(numbers), and amenities (strings), then the query can be formulated to include
the following filters:
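As a rough illustration, the conditions in this query might be written in filter expression syntax as in the following sketch. It is not the exact output of the feature. It assumes the `IN` range syntax (with `*` for an open bound and an `e` suffix for an exclusive bound) and the `ANY` function from the filter expression documentation. The helper names and the amenity strings are hypothetical, and the strings would have to match the values in the data store exactly.

```python
def at_least(field, lower):
    """Numeric range with an inclusive lower bound and no upper bound."""
    return f"{field}: IN({lower}, *)"


def less_than(field, upper):
    """Numeric range with no lower bound and an exclusive upper bound."""
    return f"{field}: IN(*, {upper}e)"


def has_value(field, value):
    """Exact match of a string field against one literal."""
    return f'{field}: ANY("{value}")'


# Each amenity gets its own clause because ANY("a", "b") matches either value,
# whereas the query asks for all of them.
hotel_filter = " AND ".join([
    at_least("star_rating", 4),
    less_than("price", 300),
    has_value("amenities", "family-friendly"),
    has_value("amenities", "pet-friendly"),
    has_value("amenities", "free Wi-Fi"),
])
```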
Example: Geolocation filter
This example is similar to the preceding one except that it includes a
geolocation filter, which is a special kind of extracted filter.
Vertex AI Search can recognize locations in a query and
create proximity filters for the locations.
Take the following query made to a state-wide business site:
"Find me a chic and stylish hotel with at least 4 stars that is in San
Francisco."
With natural-language query understanding and the geolocation filter, the search is reformulated to
include the following filter for a hotel with at least a 4-star
rating and within a 10 km radius of San Francisco:
In this example, the location in GEO_DISTANCE is given as an address, but in
other queries it might be written as a latitude and longitude, even though the
original query contained an address.
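The two forms of the proximity filter can be sketched as follows. The `GEO_DISTANCE` argument order (address or latitude and longitude, followed by a distance in meters) is an assumption taken from the filter expression syntax and should be verified there. The field name `location`, the helper names, and the approximate coordinates for San Francisco are hypothetical.

```python
RADIUS_METERS = 10_000  # the fixed 10 km radius, expressed in meters


def geo_by_address(field, address, radius_m=RADIUS_METERS):
    """Proximity filter whose center is given as an address string."""
    return f'{field}: GEO_DISTANCE("{address}", {radius_m})'


def geo_by_lat_lng(field, lat, lng, radius_m=RADIUS_METERS):
    """Proximity filter whose center is given as latitude and longitude."""
    return f"{field}: GEO_DISTANCE({lat}, {lng}, {radius_m})"


rating_clause = "star_rating: IN(4, *)"
address_filter = rating_clause + " AND " + geo_by_address("location", "San Francisco, CA")
lat_lng_filter = rating_clause + " AND " + geo_by_lat_lng("location", 37.7749, -122.4194)
```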
Example: Field extraction from queries (soft filter)
This natural-language query understanding feature is explained through the example of searching
for a hotel but showing some results that don't meet all criteria.
Take the following query made to a travel site:
"Find me a family-friendly hotel with at least four stars that costs less
than 300 a night, and lets me bring my dog."
With natural-language query understanding and appropriately structured data, the search is made more
effective by replacing some of the natural language in the query with soft
filters. If the structured data has fields for star_rating (numbers), price
(numbers), and amenities (strings), then the query can be rewritten as the
following boost:
Boost condition extracted from the natural-language query:
In this case, some lower-rated hotels or hotels that aren't pet-friendly might
also be returned.
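Because the soft behavior is described as similar to the boost_spec field, a manually written boost that approximates it might look like the following sketch. The `conditionBoostSpecs` structure is an assumption based on the `SearchRequest` boost specification, the boost value of 0.5 is arbitrary, and the amenity strings are hypothetical.

```python
# Sketch of a manually written boost that approximates the soft filter.
# The condition reuses filter expression syntax; values are illustrative only.
condition = " AND ".join([
    "star_rating: IN(4, *)",
    "price: IN(*, 300e)",
    'amenities: ANY("family-friendly")',
    'amenities: ANY("pet-friendly")',
])

boost_spec = {
    "conditionBoostSpecs": [
        # Assumed valid boost range is -1 to 1; positive values promote matches.
        {"condition": condition, "boost": 0.5},
    ]
}
```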
Limitations
The following limitations apply to natural-language query understanding:
Natural-language query understanding can't be applied to blended search apps. You
get an error if you try to use natural-language query understanding with a blended search app.
Natural-language query understanding works only for custom search apps that use structured data
stores.
Using natural-language query understanding increases latency, so you might choose not to use it
if latency is a problem.
For geolocation, the location must be explicitly described. You
can't use locations such as "near me" or "home".
The radius for geolocation is 10 km and isn't configurable.
Boolean fields can't be used in filters. For example, if the query is "Find
me a non-smoking hotel room", then a boolean field like
"non_smoking": true isn't useful but a string field like
"non_smoking": "YES" can be part of the filter.
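One way to work around the boolean limitation is to convert boolean values to strings before the records are imported. The following is a minimal sketch under that assumption. The function name, the sample record, and the "YES"/"NO" convention are hypothetical, and only top-level fields are converted.

```python
def booleans_to_strings(record):
    """Return a copy of the record with top-level bool values replaced by "YES" or "NO"."""
    return {
        key: ("YES" if value else "NO") if isinstance(value, bool) else value
        for key, value in record.items()
    }


room = {"title": "Deluxe double", "non_smoking": True, "price": 120}
converted = booleans_to_strings(room)
```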
Before you begin
Before you start using natural-language query understanding, you have to enable it for the structured
data stores connected to the apps that you plan to use.
To enable natural-language query understanding, follow these steps:
REST
Find your data store ID. If you already have your data store
ID, skip to the next step.
In the Google Cloud console, go to the AI Applications page and
in the navigation menu, click Data Stores.
If you try to use natural-language query understanding before the data store is ready, the response
you get is the same as if filterExtractionCondition was set to DISABLED.
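The enabling request might be assembled as in the following sketch, which only builds the request and doesn't send it. The `v1alpha` endpoint path, the `naturalLanguageQueryUnderstandingConfig` field, and the update mask are assumptions based on the Discovery Engine REST API and should be checked against the API reference. The function name is hypothetical, and `PROJECT_ID` and `DATA_STORE_ID` are placeholders.

```python
import json


def build_enable_request(project_id, data_store_id, api_version="v1alpha"):
    """Return the HTTP method, URL, and JSON body for enabling the feature on a data store."""
    url = (
        f"https://discoveryengine.googleapis.com/{api_version}/projects/{project_id}"
        f"/locations/global/collections/default_collection/dataStores/{data_store_id}"
        "?update_mask=natural_language_query_understanding_config.mode"
    )
    body = {"naturalLanguageQueryUnderstandingConfig": {"mode": "ENABLED"}}
    return "PATCH", url, json.dumps(body)


method, url, payload = build_enable_request("PROJECT_ID", "DATA_STORE_ID")
```

Sending the request would also need an authorization header with an access token, which is omitted here.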
Search, converting natural-language queries into filters
To search on a query in natural language and get results that are optimized for
natural-language queries, do the following:
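A request body for such a search might be assembled as in this sketch. Only `filterExtractionCondition` appears on this page. The `naturalLanguageQueryUnderstandingSpec` and `extractedFilterBehavior` names and the `SOFT_BOOST` value are assumptions based on the Discovery Engine REST API, and the helper name is hypothetical.

```python
def build_search_body(query, soft=False):
    """Build a search request body that asks for filters to be extracted from the query."""
    spec = {"filterExtractionCondition": "ENABLED"}
    if soft:
        # Assumed field and value for applying extracted filters as a boost.
        spec["extractedFilterBehavior"] = "SOFT_BOOST"
    return {"query": query, "naturalLanguageQueryUnderstandingSpec": spec}


hard_body = build_search_body("family-friendly hotel with at least four stars")
soft_body = build_search_body("family-friendly hotel with at least four stars", soft=True)
```

The body would be sent in a POST request to the search method of the app's serving configuration.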
Search, converting locations in queries to geolocation filters
To search on a query in natural language and get results that are optimized for
natural-language queries including proximity to locations, do the following:
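For a geolocation search, the request body might additionally name the fields that hold location data, as in this sketch. The `geoSearchQueryDetectionFieldNames` parameter is an assumption based on the Discovery Engine REST API and should be verified there. The field name `location` is hypothetical.

```python
import json

search_body = {
    "query": "Find me a chic and stylish hotel with at least 4 stars that is in San Francisco.",
    "naturalLanguageQueryUnderstandingSpec": {
        "filterExtractionCondition": "ENABLED",
        # Assumed parameter: the geolocation fields to build proximity filters on.
        "geoSearchQueryDetectionFieldNames": ["location"],
    },
}

# Serialize the body as it would be sent in the POST request.
request_json = json.dumps(search_body, indent=2)
```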
Specify fields for natural-language query understanding
For a field to be used as a filter in natural-language query understanding, it must be marked as
indexable in the schema. (For general information about viewing and editing a
schema, see Update a schema.)
Vertex AI Search determines which of the indexable fields in the schema
make sense to use in natural-language query understanding filters. But, if
fields are included that you don't want, then you need to create an allowlist to
specify which fields can be used.
Consider a hotel booking site, where there are fields such as
amenities, id, price_per_night, rating, and room_types. Of these, if
the id is a string of characters and numbers, Vertex AI Search is likely to
exclude it from the fields used for natural-language query understanding.
However, if you observe that Vertex AI Search is returning poor quality
query results because it's not excluding fields that it should, then you need
to specify which fields can be used. For example, if the hotel schema has a
field for renovation_status that isn't useful to customers and might be
embarrassing to the hotel chain, then you can exclude it from the list of
allowed fields.
Example of a record from the structured data store of hotel data.
{"title":"Miller-Jones","rating":1.7,"price_per_night":115.16,"id":2902,…],"amenities":["Spa","Parking","Restaurant"…],"renovation_status":"Restaurant and spa renovation planned for 2027"}
An appropriate allowlist for fields in this example would be ["amenities",
"price_per_night", "rating", "title"].
Missing from the list is renovation_status.
To specify an allowlist of fields that can be used for natural-language query
understanding in a search, do the following:
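The allowlist for the hotel example can be derived and placed in a request body as in the following sketch. The `allowedFieldNames` parameter is an assumption based on the Discovery Engine REST API and should be verified there. The record keeps only the fields that are visible in the example above, and the query string is hypothetical.

```python
# Fields visible in the example record; content that is elided in the source is left out.
record = {
    "title": "Miller-Jones",
    "rating": 1.7,
    "price_per_night": 115.16,
    "id": 2902,
    "amenities": ["Spa", "Parking", "Restaurant"],  # list is truncated in the source
    "renovation_status": "Restaurant and spa renovation planned for 2027",
}

# Fields that should never be used for extracted filters.
excluded = {"id", "renovation_status"}
allowed = sorted(name for name in record if name not in excluded)

search_body = {
    "query": "hotel with a spa under 150 a night",
    "naturalLanguageQueryUnderstandingSpec": {
        "filterExtractionCondition": "ENABLED",
        "allowedFieldNames": allowed,  # assumed parameter name
    },
}
```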
Last updated 2025-12-03 UTC.