Class KMeans (2.29.0)

KMeans(
    n_clusters: int = 8,
    *,
    init: typing.Literal["kmeans++", "random", "custom"] = "kmeans++",
    init_col: typing.Optional[str] = None,
    distance_type: typing.Literal["euclidean", "cosine"] = "euclidean",
    max_iter: int = 20,
    tol: float = 0.01,
    warm_start: bool = False
)

K-Means clustering.

Examples:

>>> import bigframes.pandas as bpd
>>> from bigframes.ml.cluster import KMeans

>>> X = bpd.DataFrame({"feat0": [1, 1, 1, 10, 10, 10], "feat1": [2, 4, 0, 2, 4, 0]})
>>> kmeans = KMeans(n_clusters=2).fit(X)
>>> kmeans.predict(bpd.DataFrame({"feat0": [0, 12], "feat1": [0, 3]}))["CENTROID_ID"] # doctest:+SKIP
0    1
1    2
Name: CENTROID_ID, dtype: Int64

>>> kmeans.cluster_centers_ # doctest:+SKIP
centroid_id feature  numerical_value categorical_value
0            1   feat0              5.5                []
1            1   feat1              1.0                []
2            2   feat0              5.5                []
3            2   feat1              4.0                []

[4 rows x 4 columns]

Properties

cluster_centers_

Information of cluster centers.

Returns
Type Description
bigframes.dataframe.DataFrame DataFrame of cluster centers, containing following columns: centroid_id: An integer that identifies the centroid. feature: The column name that contains the feature. numerical_value: If feature is numeric, the value of feature for the centroid that centroid_id identifies. If feature is not numeric, the value is NULL. categorical_value: An list of mappings containing information about categorical features. Each mapping contains the following fields: categorical_value.category: The name of each category. categorical_value.value: The value of categorical_value.category for the centroid that centroid_id identifies. The output contains one row per feature per centroid.

Methods

__repr__

__repr__()

Print the estimator's constructor with all non-default parameter values.

detect_anomalies

detect_anomalies(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    *,
    contamination: float = 0.1
) -> bigframes.dataframe.DataFrame

Detect the anomaly data points of the input.

Returns
Type Description
bigframes.dataframe.DataFrame detected DataFrame.

fit

fit(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y: typing.Optional[
        typing.Union[
            bigframes.dataframe.DataFrame,
            bigframes.series.Series,
            pandas.core.frame.DataFrame,
            pandas.core.series.Series,
        ]
    ] = None,
) -> bigframes.ml.base._T

Compute k-means clustering.

Parameters
Name Description
X bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series

DataFrame of shape (n_samples, n_features). Training data.

y default None

Not used, present here for API consistency by convention.

Returns
Type Description
KMeans Fitted estimator.

get_params

get_params(deep: bool = True) -> typing.Dict[str, typing.Any]

Get parameters for this estimator.

Parameter
Name Description
deep bool, default True

Default True. If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
Type Description
Dictionary A dictionary of parameter names mapped to their values.

predict

predict(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
) -> bigframes.dataframe.DataFrame

Predict the closest cluster each sample in X belongs to.

Returns
Type Description
bigframes.dataframe.DataFrame DataFrame of shape (n_samples, n_input_columns + n_prediction_columns). Returns predicted labels.

register

register(vertex_ai_model_id: typing.Optional[str] = None) -> bigframes.ml.base._T

Register the model to Vertex AI.

After register, go to the Google Cloud console (https://console.cloud.google.com/vertex-ai/models) to manage the model registries. Refer to https://cloud.google.com/vertex-ai/docs/model-registry/introduction for more options.

Parameter
Name Description
vertex_ai_model_id Optional[str], default None

Optional string id as model id in Vertex. If not set, will default to 'bigframes_{bq_model_id}'. Vertex Ai model id will be truncated to 63 characters due to its limitation.

score

score(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y=None,
) -> bigframes.dataframe.DataFrame

Calculate evaluation metrics of the model.

Returns
Type Description
bigframes.dataframe.DataFrame DataFrame of the metrics.

to_gbq

to_gbq(model_name: str, replace: bool = False) -> bigframes.ml.cluster.KMeans

Save the model to BigQuery.

Returns
Type Description
KMeans Saved model.