Class KMeans (2.29.0)

KMeans(
    n_clusters: int = 8,
    *,
    init: typing.Literal["kmeans++", "random", "custom"] = "kmeans++",
    init_col: typing.Optional[str] = None,
    distance_type: typing.Literal["euclidean", "cosine"] = "euclidean",
    max_iter: int = 20,
    tol: float = 0.01,
    warm_start: bool = False
)

K-Means clustering.

Examples:

>>> import bigframes.pandas as bpd
>>> from bigframes.ml.cluster import KMeans

>>> X = bpd.DataFrame({"feat0": [1, 1, 1, 10, 10, 10], "feat1": [2, 4, 0, 2, 4, 0]})
>>> kmeans = KMeans(n_clusters=2).fit(X)
>>> kmeans.predict(bpd.DataFrame({"feat0": [0, 12], "feat1": [0, 3]}))["CENTROID_ID"] # doctest:+SKIP
0    1
1    2
Name: CENTROID_ID, dtype: Int64

>>> kmeans.cluster_centers_ # doctest:+SKIP
centroid_id feature  numerical_value categorical_value
0            1   feat0              5.5                []
1            1   feat1              1.0                []
2            2   feat0              5.5                []
3            2   feat1              4.0                []

[4 rows x 4 columns]

Properties

cluster_centers_

Information about the cluster centers.

Returns
Type	Description
`bigframes.dataframe.DataFrame`	DataFrame of cluster centers, containing one row per feature per centroid.

The DataFrame contains the following columns:

centroid_id: An integer that identifies the centroid.
feature: The name of the column that contains the feature.
numerical_value: If feature is numeric, the value of feature for the centroid that centroid_id identifies. If feature is not numeric, the value is NULL.
categorical_value: A list of mappings containing information about categorical features. Each mapping contains the following fields:
    categorical_value.category: The name of each category.
    categorical_value.value: The value of categorical_value.category for the centroid that centroid_id identifies.
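Because the output holds one row per feature per centroid, it is often easier to read after grouping the rows by centroid. The following is a minimal sketch in plain Python, assuming the rows have already been fetched into a list of dictionaries. The `rows` data and the `centers_to_wide` helper are hypothetical and are not part of the bigframes API.

```python
from collections import defaultdict

# Hypothetical rows shaped like the cluster_centers_ output described above.
# "color" is an illustrative categorical feature, so its numerical_value is None (NULL).
rows = [
    {"centroid_id": 1, "feature": "feat0", "numerical_value": 1.0, "categorical_value": []},
    {"centroid_id": 1, "feature": "color", "numerical_value": None,
     "categorical_value": [{"category": "red", "value": 0.75},
                           {"category": "blue", "value": 0.25}]},
    {"centroid_id": 2, "feature": "feat0", "numerical_value": 10.0, "categorical_value": []},
    {"centroid_id": 2, "feature": "color", "numerical_value": None,
     "categorical_value": [{"category": "red", "value": 0.5},
                           {"category": "blue", "value": 0.5}]},
]


def centers_to_wide(rows):
    """Group long-format rows into {centroid_id: {feature: value}}."""
    wide = defaultdict(dict)
    for row in rows:
        if row["numerical_value"] is not None:
            # Numeric feature: the centroid coordinate is stored directly.
            value = row["numerical_value"]
        else:
            # Categorical feature: flatten the list of mappings into one dict.
            value = {m["category"]: m["value"] for m in row["categorical_value"]}
        wide[row["centroid_id"]][row["feature"]] = value
    return dict(wide)


centers = centers_to_wide(rows)
print(centers[1])
```
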

Methods

__repr__

__repr__()

Print the estimator's constructor with all non-default parameter values.

detect_anomalies

detect_anomalies(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    *,
    contamination: float = 0.1
) -> bigframes.dataframe.DataFrame

Detect anomalous data points in the input.

Returns
Type	Description
`bigframes.dataframe.DataFrame`	DataFrame containing the anomaly detection results.

fit

fit(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y: typing.Optional[
        typing.Union[
            bigframes.dataframe.DataFrame,
            bigframes.series.Series,
            pandas.core.frame.DataFrame,
            pandas.core.series.Series,
        ]
    ] = None,
) -> bigframes.ml.base._T

Compute k-means clustering.

Parameters
Name	Description
`X`	`bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series` DataFrame of shape (n_samples, n_features). Training data.
`y`	`default None` Not used; present for API consistency by convention.

Returns
Type	Description
`KMeans`	Fitted estimator.
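The clustering itself runs in BigQuery, and this page does not describe the algorithm. As a conceptual illustration only, the sketch below shows the standard Lloyd iteration in plain Python. It assumes that `max_iter` caps the number of iterations and that `tol` is a minimum relative improvement in loss; both readings are assumptions, and `lloyd_kmeans` is a hypothetical helper, not a bigframes function. The initial centroids are passed in explicitly to keep the result deterministic.

```python
import math


def lloyd_kmeans(points, init_centroids, max_iter=20, tol=0.01):
    """Conceptual Lloyd iteration; not the BigQuery ML implementation."""
    centroids = [tuple(c) for c in init_centroids]
    prev_loss = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance) and accumulate the loss.
        clusters = [[] for _ in centroids]
        loss = 0.0
        for p in points:
            dists = [math.dist(p, c) ** 2 for c in centroids]
            best = min(range(len(centroids)), key=dists.__getitem__)
            clusters[best].append(p)
            loss += dists[best]

        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]

        # Stop once the relative loss improvement falls below tol (assumed meaning).
        if prev_loss is not None:
            if prev_loss == 0 or (prev_loss - loss) / prev_loss < tol:
                break
        prev_loss = loss
    return centroids


# Same feature values as the example at the top of this page.
X = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0)]
final_centroids = lloyd_kmeans(X, init_centroids=[(0, 0), (12, 3)])
print(final_centroids)
```
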

get_params

get_params(deep: bool = True) -> typing.Dict[str, typing.Any]

Get parameters for this estimator.

Parameter
Name	Description
`deep`	`bool, default True` If True, returns the parameters for this estimator and for contained subobjects that are estimators.

Returns
Type	Description
`Dictionary`	A dictionary of parameter names mapped to their values.

predict

predict(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
) -> bigframes.dataframe.DataFrame

Predict the closest cluster each sample in X belongs to.

Returns
Type	Description
`bigframes.dataframe.DataFrame`	DataFrame of shape (n_samples, n_input_columns + n_prediction_columns). Returns predicted labels.
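To make "closest cluster" concrete, here is a minimal plain-Python sketch of nearest-centroid assignment under the two `distance_type` options listed in the constructor. The centroid coordinates and the helper names (`cosine_distance`, `closest_centroid`) are hypothetical; this illustrates the idea and is not how bigframes or BigQuery ML computes predictions.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.dist(a, b)


def cosine_distance(a, b):
    """One minus cosine similarity; undefined for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def closest_centroid(sample, centroids, distance=euclidean_distance):
    """Return the id of the centroid with the smallest distance to sample."""
    return min(centroids, key=lambda cid: distance(sample, centroids[cid]))


# Hypothetical centroids keyed by centroid id.
centroids = {1: (1.0, 2.0), 2: (10.0, 2.0)}

labels = [closest_centroid(s, centroids) for s in [(0, 0), (12, 3)]]
cosine_label = closest_centroid((1, 3), centroids, distance=cosine_distance)
print(labels, cosine_label)
```
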

register

register(vertex_ai_model_id: typing.Optional[str] = None) -> bigframes.ml.base._T

Register the model to Vertex AI. After registering the model, go to the Google Cloud console (https://console.cloud.google.com/vertex-ai/models) to manage the model registry. Refer to https://cloud.google.com/vertex-ai/docs/model-registry/introduction for more options.

Parameter
Name	Description
`vertex_ai_model_id`	`Optional[str], default None` Optional string to use as the model ID in Vertex AI. If not set, defaults to 'bigframes_{bq_model_id}'. The Vertex AI model ID is truncated to 63 characters because of a Vertex AI length limit.
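The defaulting and truncation rule can be sketched as follows. `resolve_vertex_model_id` is a hypothetical helper written for illustration, and it assumes that truncation keeps the first 63 characters, which this page does not state.

```python
def resolve_vertex_model_id(bq_model_id, vertex_ai_model_id=None, max_len=63):
    """Hypothetical helper mirroring the documented default and length limit."""
    if vertex_ai_model_id is None:
        # Documented default when no id is supplied.
        vertex_ai_model_id = f"bigframes_{bq_model_id}"
    # Assumption: truncation keeps the leading characters.
    return vertex_ai_model_id[:max_len]


short_id = resolve_vertex_model_id("my_kmeans_model")
long_id = resolve_vertex_model_id("m" * 100)
print(short_id, len(long_id))
```
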

score

score(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y=None,
) -> bigframes.dataframe.DataFrame

Calculate evaluation metrics of the model.

Returns
Type	Description
`bigframes.dataframe.DataFrame`	DataFrame of the metrics.

to_gbq

to_gbq(model_name: str, replace: bool = False) -> bigframes.ml.cluster.KMeans

Save the model to BigQuery.

Returns
Type	Description
`KMeans`	Saved model.