Search and recommendations both support the following world languages.
Search for commerce
| Supported languages | |
|---|---|
| Albanian | Korean |
| Arabic | Latvian |
| Armenian | Lithuanian |
| Assamese | Macedonian |
| Azerbaijani | Malay |
| Basque | Marathi |
| Bengali (Bangla) | Mongolian |
| Bulgarian | Nepali |
| Burmese | Norwegian |
| Catalan | Odia |
| Chinese (simplified) | Persian |
| Chinese (traditional) | Polish |
| Croatian | Portuguese (Europe) |
| Czech | Portuguese (Brazil) |
| Danish | Punjabi |
| Dutch | Romanian |
| English | Russian |
| Estonian | Serbian |
| Finnish | Serbian (Cyrillic) |
| French (Europe) | Sinhala |
| French (Canada) | Slovak |
| Georgian | Slovenian |
| German | Spanish (Europe) |
| Greek | Spanish (Latin America) |
| Gujarati | Swahili |
| Hebrew | Swedish |
| Hindi | Tamil |
| Hungarian | Telugu |
| Icelandic | Thai |
| Indonesian | Turkish |
| Italian | Ukrainian |
| Japanese | Urdu (India) |
| Kannada | Urdu (Pakistan) |
| Kazakh | Uzbek |
| Khmer | Vietnamese |
You set the language when you upload your catalog for your Vertex AI Search for commerce project. The catalog should be in one language only and search queries should be sent in the same language. Having multiple languages in the catalog degrades LLM performance.
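As a pre-upload sanity check for the one-language rule, the following minimal Python sketch counts the language codes in a list of product records. The record layout and the `languageCode` field name are assumptions made for illustration; adapt them to however your catalog export actually stores the language.

```python
from collections import Counter

def catalog_languages(products):
    """Count products per language code; items without a code are grouped under None."""
    return Counter(p.get("languageCode") for p in products)

def is_single_language(products):
    """Return True when every product carries the same, non-empty language code."""
    counts = catalog_languages(products)
    return len(counts) == 1 and None not in counts

# Hypothetical product records; only the fields needed for the check are shown.
products = [
    {"id": "sku-1", "title": "Running shoes", "languageCode": "en"},
    {"id": "sku-2", "title": "Trail jacket", "languageCode": "en"},
    {"id": "sku-3", "title": "Zapatillas de correr", "languageCode": "es"},
]

if not is_single_language(products):
    # Report the mix so the catalog can be split or cleaned before upload.
    print("Mixed-language catalog:", dict(catalog_languages(products)))
```

Running a check like this before upload makes it easier to split a mixed catalog into per-language catalogs rather than discovering degraded results later.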
Recommendations
Most languages are supported. The model automatically detects the text language.
| Language Name | Script Name |
|---|---|
| Afrikaans | Latin |
| Amharic | Ethiopic |
| Arabic | Arabic |
| Bulgarian | Cyrillic |
| Bulgarian | Latin |
| Bangla | Bangla |
| Bosnian | Latin |
| Catalan | Latin |
| Cebuano | Latin |
| Corsican | Latin |
| Czech | Latin |
| Welsh | Latin |
| Danish | Latin |
| German | Latin |
| Greek | Greek |
| Greek | Latin |
| English | Latin |
| Esperanto | Latin |
| Spanish | Latin |
| Estonian | Latin |
| Basque | Latin |
| Persian | Arabic |
| Finnish | Latin |
| Filipino | Latin |
| French | Latin |
| Western Frisian | Latin |
| Irish | Latin |
| Scottish Gaelic | Latin |
| Galician | Latin |
| Gujarati | Gujarati |
| Hausa | Latin |
| Hawaiian | Latin |
| Hindi | Devanagari |
| Hindi | Latin |
| Hmong | Latin |
| Croatian | Latin |
| Haitian Creole | Latin |
| Hungarian | Latin |
| Armenian | Armenian |
| Indonesian | Latin |
| Igbo | Latin |
| Icelandic | Latin |
| Italian | Latin |
| Hebrew | Hebrew |
| Japanese | Japanese |
| Japanese | Latin |
| Javanese | Latin |
| Georgian | Georgian |
| Kazakh | Cyrillic |
| Khmer | Khmer |
| Kannada | Kannada |
| Korean | Korean |
| Kurdish | Latin |
| Kyrgyz | Cyrillic |
| Latin | Latin |
| Luxembourgish | Latin |
| Lao | Lao |
| Lithuanian | Latin |
| Latvian | Latin |
| Malagasy | Latin |
| Maori | Latin |
| Macedonian | Cyrillic |
| Malayalam | Malayalam |
| Mongolian | Cyrillic |
| Marathi | Devanagari |
| Malay | Latin |
| Maltese | Latin |
| Burmese | Myanmar |
| Nepali | Devanagari |
| Dutch | Latin |
| Norwegian | Latin |
| Nyanja | Latin |
| Punjabi | Gurmukhi |
| Polish | Latin |
| Pashto | Arabic |
| Portuguese | Latin |
| Romanian | Latin |
| Russian | Cyrillic |
| Russian | Latin |
| Sindhi | Arabic |
| Sinhala | Sinhala |
| Slovak | Latin |
| Slovenian | Latin |
| Samoan | Latin |
| Shona | Latin |
| Somali | Latin |
| Albanian | Latin |
| Serbian | Cyrillic |
| Southern Sotho | Latin |
| Sundanese | Latin |
| Swedish | Latin |
| Swahili | Latin |
| Tamil | Tamil |
| Telugu | Telugu |
| Tajik | Cyrillic |
| Thai | Thai |
| Turkish | Latin |
| Ukrainian | Cyrillic |
| Urdu | Arabic |
| Uzbek | Latin |
| Vietnamese | Latin |
| Xhosa | Latin |
| Yiddish | Hebrew |
| Yoruba | Latin |
| Chinese | Han (including Simplified and Traditional) |
| Chinese | Latin |
| Zulu | Latin |
For a list of all languages that can be automatically detected, see the Compact Language Detector GitHub README.
Language normalization and tokenization
The Vertex AI Search for commerce engine has built-in processing of Chinese or Japanese characters without spaces and normalizes European diacritics. This eliminates the need to build proprietary pre-processing translation layers into your search applications.
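To make the idea of diacritic normalization concrete, here is a minimal Python sketch of accent folding using the standard `unicodedata` module. It illustrates the general technique only and is not the engine's actual implementation; the function names and the small `GERMAN_EXPANSIONS` table are hypothetical.

```python
import unicodedata

# Illustrative table for the "ä to ae" style of mapping (an assumption, not an exhaustive list).
GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def fold_diacritics(text):
    """Decompose characters and drop combining marks, so 'é' becomes 'e' and 'ä' becomes 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def expand_umlauts(text):
    """Replace precomposed umlauts with their two-letter spellings ('ä' becomes 'ae')."""
    return "".join(GERMAN_EXPANSIONS.get(ch, ch) for ch in text)

def search_keys(text):
    """Return the normalized forms that a query or a title could be matched on."""
    lowered = unicodedata.normalize("NFC", text).lower()
    return {fold_diacritics(lowered), fold_diacritics(expand_umlauts(lowered))}

# A query and a title match when their key sets overlap.
print(search_keys("cafe") & search_keys("café"))
print(search_keys("Maedchen") & search_keys("Mädchen"))
```

Because the engine performs this kind of normalization for you, a sketch like this is useful mainly for reasoning about which query spellings should be expected to match.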
- Non-English character normalization: The search engine provides built-in support for UTF-8 and automatically normalizes diacritics and umlauts during indexing and querying (such as mapping `ä` to `a` or `ae`, and `é` to `e`). This allows users to search for cafe and seamlessly find café.
- CJK tokenization (Kanji and Katakana): For Chinese, Japanese, and Korean (CJK) languages, the engine doesn't rely on spaces for tokenization. It uses dictionary-based segmenters and morphological analyzers to break strings of Kanji, Hiragana, Katakana, or Han characters into logical, searchable tokens.
- Strict single-language rule: Your catalog and your search queries must be in the same language. The AI doesn't translate search queries (for example, a Spanish query won't match against an English catalog). Mixing languages heavily degrades the model's performance.
- Multilingual workaround: If a catalog must support mixed-language queries, use the `twowaySynonymsAction` or `onewaySynonymsAction` controls to manually map custom query terms (such as Spanish synonyms) to the default catalog language (such as English).
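The multilingual workaround can be sketched as a small Python helper that builds synonym rule fragments. Only the action names `twowaySynonymsAction` and `onewaySynonymsAction` come from this page; the nested field names (`synonyms`, `queryTerms`) and the helper functions are assumptions for illustration, so check them against the controls reference before use.

```python
import json

def twoway_synonyms_rule(terms):
    """Build a rule fragment in which every term is treated as equivalent to the others."""
    terms = list(terms)
    if len(terms) < 2:
        raise ValueError("a two-way synonym group needs at least two terms")
    return {"twowaySynonymsAction": {"synonyms": terms}}

def oneway_synonyms_rule(query_terms, synonyms):
    """Build a rule fragment that expands the query terms to the synonyms, not the reverse."""
    return {
        "onewaySynonymsAction": {
            "queryTerms": list(query_terms),  # assumed field name
            "synonyms": list(synonyms),       # assumed field name
        }
    }

# Map a Spanish query term to the English wording used in the catalog.
rule = oneway_synonyms_rule(["zapatos"], ["shoes"])
print(json.dumps(rule, indent=2, ensure_ascii=False))
```

A one-way mapping fits this case because Spanish queries should reach English catalog terms, while English queries do not need to be expanded to Spanish.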
For more about language settings, see About catalogs and products.