The ML.DISTANCE function
This document describes the ML.DISTANCE scalar function, which lets you
compute the distance between two vectors.
Syntax
ML.DISTANCE(vector1, vector2 [, type])
Arguments
ML.DISTANCE has the following arguments:
- vector1: an ARRAY value that represents the first vector, in one of the following forms:

  - ARRAY<Numerical type>
  - ARRAY<STRUCT<STRING, Numerical type>>
  - ARRAY<STRUCT<INT64, Numerical type>>

  where Numerical type is BIGNUMERIC, FLOAT64, INT64, or NUMERIC. For example, ARRAY<STRUCT<INT64, BIGNUMERIC>>.

  When a vector is expressed as ARRAY<Numerical type>, each element of the array denotes one dimension of the vector. An example of a four-dimensional vector is [0.0, 1.0, 1.0, 0.0].

  When a vector is expressed as ARRAY<STRUCT<STRING, Numerical type>> or ARRAY<STRUCT<INT64, Numerical type>>, each STRUCT array item denotes one dimension of the vector. An example of a three-dimensional vector is [("a", 0.0), ("b", 1.0), ("c", 1.0)].

  The initial INT64 or STRING value in the STRUCT is used as an identifier to match the STRUCT values in vector2. The ordering of data in the array doesn't matter; the values are matched by the identifier rather than by their position in the array. If either vector has any STRUCT values with duplicate identifiers, running this function returns an error.

- vector2: an ARRAY value that represents the second vector. vector2 must have the same type as vector1.

  For example, if vector1 is an ARRAY<STRUCT<STRING, FLOAT64>> column with three elements, like [("a", 0.0), ("b", 1.0), ("c", 1.0)], then vector2 must also be an ARRAY<STRUCT<STRING, FLOAT64>> column.

  When vector1 and vector2 are ARRAY<Numerical type> columns, they must have the same array length.

- type: a STRING value that specifies the type of distance to calculate. Valid values are EUCLIDEAN, MANHATTAN, and COSINE. If this argument isn't specified, the default value is EUCLIDEAN.
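To make the argument rules concrete, the following Python sketch mimics the arithmetic behind the three distance types for both vector forms. It is an illustration only, not the BigQuery implementation. The function names dense_distance and sparse_distance are hypothetical, COSINE is assumed to follow the textbook definition (1 minus the cosine similarity), and an identifier that appears in only one STRUCT vector is assumed to contribute 0.0 for the other vector. This document doesn't state either of those two behaviors.

```python
import math


def dense_distance(v1, v2, kind="EUCLIDEAN"):
    """Distance between two equal-length lists of numbers (hypothetical helper)."""
    # Mirrors the documented rule: ARRAY<Numerical type> inputs must have the same length.
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same array length")
    if kind == "EUCLIDEAN":
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))
    if kind == "MANHATTAN":
        return sum(abs(a - b) for a, b in zip(v1, v2))
    if kind == "COSINE":
        # Assumption: cosine distance = 1 - cosine similarity.
        # A zero vector raises ZeroDivisionError here; the source doesn't cover that case.
        dot = sum(a * b for a, b in zip(v1, v2))
        norm1 = math.sqrt(sum(a * a for a in v1))
        norm2 = math.sqrt(sum(b * b for b in v2))
        return 1.0 - dot / (norm1 * norm2)
    raise ValueError("type must be EUCLIDEAN, MANHATTAN, or COSINE")


def _to_mapping(vector):
    """Turn [(identifier, value), ...] into a dict, rejecting duplicate identifiers."""
    mapping = {}
    for identifier, value in vector:
        if identifier in mapping:
            # Mirrors the documented rule: duplicate identifiers are an error.
            raise ValueError(f"duplicate identifier: {identifier!r}")
        mapping[identifier] = value
    return mapping


def sparse_distance(v1, v2, kind="EUCLIDEAN"):
    """Distance between two lists of (identifier, value) pairs (hypothetical helper)."""
    m1, m2 = _to_mapping(v1), _to_mapping(v2)
    # Values are matched by identifier, so their position in the list is irrelevant.
    identifiers = sorted(set(m1) | set(m2))
    # Assumption: an identifier missing from one vector counts as 0.0.
    aligned1 = [m1.get(i, 0.0) for i in identifiers]
    aligned2 = [m2.get(i, 0.0) for i in identifiers]
    return dense_distance(aligned1, aligned2, kind)


if __name__ == "__main__":
    a = [("a", 0.0), ("b", 1.0), ("c", 1.0)]
    b = [("c", 1.0), ("a", 0.0), ("b", 1.0)]  # same vector, different order
    print(sparse_distance(a, b))  # identical vectors, so the distance is zero
    print(dense_distance([1.0, 2.0], [3.0, 4.0], "MANHATTAN"))
```

Aligning the STRUCT form by identifier and then reusing the dense calculation keeps the order-independence rule in one place.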
Output
ML.DISTANCE returns a FLOAT64 value that represents the distance between
the vectors. It returns NULL if either vector1 or vector2 is NULL.
Example
Get the Euclidean distance between two vectors of ARRAY<FLOAT64> values:

1. Create the table t1:

   CREATE TABLE mydataset.t1 (
     v1 ARRAY<FLOAT64>,
     v2 ARRAY<FLOAT64>
   )

2. Populate t1:

   INSERT mydataset.t1 (v1, v2)
   VALUES ([4.1, 0.5, 1.0], [3.0, 0.0, 2.5])

3. Calculate the Euclidean distance between v1 and v2:

   SELECT v1, v2, ML.DISTANCE(v1, v2, 'EUCLIDEAN') AS output
   FROM mydataset.t1
This query produces the following output:
+---------------+---------------+-------------------+
| v1            | v2            | output            |
+---------------+---------------+-------------------+
| [4.1,0.5,1.0] | [3.0,0.0,2.5] | 1.926136028425822 |
+---------------+---------------+-------------------+
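As a sanity check on the output value, the same Euclidean distance can be worked out by hand. The short Python sketch below only reproduces the arithmetic for the two example vectors and doesn't call BigQuery. The variable names are illustrative.

```python
import math

v1 = [4.1, 0.5, 1.0]
v2 = [3.0, 0.0, 2.5]

# Per-dimension differences: roughly 1.1, 0.5, and -1.5.
differences = [a - b for a, b in zip(v1, v2)]

# Sum of squares: 1.21 + 0.25 + 2.25, which is about 3.71.
sum_of_squares = sum(d * d for d in differences)

# The Euclidean distance is the square root of that sum.
euclidean = math.sqrt(sum_of_squares)
print(euclidean)  # approximately 1.926136, in line with the query output
```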
What's next
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.