Module cluster (2.29.0)

Clustering models. This module is styled after scikit-learn's cluster module: https://scikit-learn.org/stable/modules/clustering.html.

Classes

KMeans

KMeans(
    n_clusters: int = 8,
    *,
    init: typing.Literal["kmeans++", "random", "custom"] = "kmeans++",
    init_col: typing.Optional[str] = None,
    distance_type: typing.Literal["euclidean", "cosine"] = "euclidean",
    max_iter: int = 20,
    tol: float = 0.01,
    warm_start: bool = False
)

K-Means clustering.
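The constructor parameters follow the usual k-means recipe. First, `n_clusters` centroids are seeded according to `init`. Then two steps alternate: each row is assigned to its nearest centroid under `distance_type`, and each centroid is recomputed from its rows. This continues for at most `max_iter` rounds, or until progress falls below `tol`. The following is a minimal local NumPy sketch of that loop, not the BigQuery ML implementation. `kmeans_sketch`, `init_mask` and `seed` are hypothetical names. Two behaviors are assumptions: `tol` is treated as a minimum relative decrease in the loss (as with BigQuery ML's MIN_REL_PROGRESS option), and `init_mask` stands in for the boolean rows that `init_col` would flag as initial centroids.

```python
import numpy as np


def kmeans_sketch(X, n_clusters=8, *, init="kmeans++", init_mask=None,
                  distance_type="euclidean", max_iter=20, tol=0.01, seed=0):
    """Local Lloyd's k-means sketch; NOT the BigQuery ML implementation.

    Hypothetical parameters: `init_mask` (boolean rows used as seeds when
    init="custom", an assumed analogue of `init_col`) and `seed`.
    Assumption: `tol` is a minimum relative decrease in the loss.
    Returns (centers, 0-based labels, iterations run); assumes max_iter >= 1.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    def dist(A, C):
        # Pairwise distances between rows of A and rows of C.
        if distance_type == "cosine":
            # Cosine distance is undefined for all-zero rows.
            An = A / np.linalg.norm(A, axis=1, keepdims=True)
            Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
            return 1.0 - An @ Cn.T
        return np.sqrt(((A[:, None, :] - C[None, :, :]) ** 2).sum(axis=2))

    if init == "custom":
        # Rows flagged True become the initial centroids.
        centers = X[np.asarray(init_mask, dtype=bool)].copy()
    elif init == "random":
        centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    else:  # "kmeans++": each new seed is drawn with probability ~ D(x)^2
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, n_clusters):
            d2 = dist(X, np.array(centers)).min(axis=1) ** 2
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)

    prev_loss = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each row joins its nearest centroid.
        labels = dist(X, centers).argmin(axis=1)
        # Update step: move each centroid to the mean of its rows
        # (plain mean for both distance types; empty clusters keep their center).
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)
        loss = (dist(X, centers)[np.arange(len(X)), labels] ** 2).sum()
        # Stop once the relative improvement drops to tol or below.
        if prev_loss is not None and prev_loss - loss <= tol * prev_loss:
            break
        prev_loss = loss

    return centers, dist(X, centers).argmin(axis=1), iteration


X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
mask = [True, False, False, True, False, False]  # seed rows (1, 2) and (10, 2)
centers, labels, n_iter = kmeans_sketch(X, 2, init="custom", init_mask=mask)
print(centers.tolist(), labels.tolist())  # [[1.0, 2.0], [10.0, 2.0]] [0, 0, 0, 1, 1, 1]
```

On the toy data from the example below, seeding from rows (1, 2) and (10, 2) converges to centroids (1, 2) and (10, 2). Seeding from (1, 2) and (1, 4) instead stops at the local optimum (5.5, 1.0) and (5.5, 4.0). k-means only guarantees a local optimum, so the result depends on initialization. Labels in this sketch are 0-based, whereas `CENTROID_ID` values in the example output start at 1.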

Examples:

>>> import bigframes.pandas as bpd
>>> from bigframes.ml.cluster import KMeans

>>> X = bpd.DataFrame({"feat0": [1, 1, 1, 10, 10, 10], "feat1": [2, 4, 0, 2, 4, 0]})
>>> kmeans = KMeans(n_clusters=2).fit(X)
>>> kmeans.predict(bpd.DataFrame({"feat0": [0, 12], "feat1": [0, 3]}))["CENTROID_ID"] # doctest:+SKIP
0    1
1    2
Name: CENTROID_ID, dtype: Int64

>>> kmeans.cluster_centers_ # doctest:+SKIP
   centroid_id feature  numerical_value categorical_value
0            1   feat0              5.5                []
1            1   feat1              1.0                []
2            2   feat0              5.5                []
3            2   feat1              4.0                []

[4 rows x 4 columns]
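`cluster_centers_` is returned in long format, with one row per (centroid, feature) pair. The following minimal pandas sketch pivots the values shown above into one row per centroid and assigns new points to the nearest centroid by Euclidean distance. `nearest_centroid` is a hypothetical helper. The sketch assumes distances are computed on the raw feature values and ignores any feature scaling the service may apply internally. For the two points passed to `predict` above, it yields centroid IDs 1 and 2, which matches that output.

```python
import numpy as np
import pandas as pd

# Values copied from the cluster_centers_ output above. With a fitted model,
# kmeans.cluster_centers_.to_pandas() returns a frame in the same long format.
centers_long = pd.DataFrame({
    "centroid_id": [1, 1, 2, 2],
    "feature": ["feat0", "feat1", "feat0", "feat1"],
    "numerical_value": [5.5, 1.0, 5.5, 4.0],
})

# Reshape to one row per centroid and one column per feature.
centers = centers_long.pivot(index="centroid_id", columns="feature",
                             values="numerical_value")


def nearest_centroid(points: pd.DataFrame, centers: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: centroid_id with the smallest Euclidean distance per row."""
    P = points[centers.columns].to_numpy(dtype=float)
    C = centers.to_numpy(dtype=float)
    # Squared distance from every point to every centroid.
    d2 = ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return pd.Series(centers.index[d2.argmin(axis=1)], index=points.index,
                     name="CENTROID_ID")


new_points = pd.DataFrame({"feat0": [0, 12], "feat1": [0, 3]})
print(nearest_centroid(new_points, centers).tolist())  # [1, 2]
```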