KMeans(
n_clusters: int = 8,
*,
init: typing.Literal["kmeans++", "random", "custom"] = "kmeans++",
init_col: typing.Optional[str] = None,
distance_type: typing.Literal["euclidean", "cosine"] = "euclidean",
max_iter: int = 20,
tol: float = 0.01,
warm_start: bool = False
)
K-Means clustering.
Examples:
>>> import bigframes.pandas as bpd
>>> from bigframes.ml.cluster import KMeans
>>> X = bpd.DataFrame({"feat0": [1, 1, 1, 10, 10, 10], "feat1": [2, 4, 0, 2, 4, 0]})
>>> kmeans = KMeans(n_clusters=2).fit(X)
>>> kmeans.predict(bpd.DataFrame({"feat0": [0, 12], "feat1": [0, 3]}))["CENTROID_ID"] # doctest:+SKIP
0 1
1 2
Name: CENTROID_ID, dtype: Int64
>>> kmeans.cluster_centers_ # doctest:+SKIP
centroid_id feature numerical_value categorical_value
0 1 feat0 5.5 []
1 1 feat1 1.0 []
2 2 feat0 5.5 []
3 2 feat1 4.0 []
[4 rows x 4 columns]
Properties
cluster_centers_
Information about the cluster centers.

Returns

| Type | Description |
|---|---|
| bigframes.dataframe.DataFrame | DataFrame of cluster centers, containing the following columns: centroid_id: an integer that identifies the centroid. feature: the name of the column that contains the feature. numerical_value: if the feature is numeric, the value of the feature for the centroid identified by centroid_id; if the feature is not numeric, the value is NULL. categorical_value: a list of mappings containing information about categorical features. Each mapping contains the following fields: categorical_value.category, the name of each category; and categorical_value.value, the value of categorical_value.category for the centroid identified by centroid_id. The output contains one row per feature per centroid. |
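Because cluster_centers_ is in long format (one row per feature per centroid), it is often convenient to pivot it to one row per centroid. The sketch below rebuilds the numerical part of the example output as a local pandas DataFrame and pivots it. With a fitted model you would presumably start from kmeans.cluster_centers_.to_pandas() instead; that step is an assumption and is not exercised here.

```python
import pandas as pd

# Local copy of the numerical columns from the cluster_centers_ example output.
centers_long = pd.DataFrame(
    {
        "centroid_id": [1, 1, 2, 2],
        "feature": ["feat0", "feat1", "feat0", "feat1"],
        "numerical_value": [5.5, 1.0, 5.5, 4.0],
    }
)

# Reshape to one row per centroid and one column per numeric feature.
centers_wide = centers_long.pivot(
    index="centroid_id", columns="feature", values="numerical_value"
)
print(centers_wide)
```

Categorical features would need separate handling, since their values live in the categorical_value lists rather than in numerical_value.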
Methods
__repr__
__repr__()
Print the estimator's constructor with all non-default parameter values.
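The behavior described for __repr__ can be illustrated in plain Python by comparing each current parameter value with its constructor default. ToyKMeans is a hypothetical stand-in, not the bigframes class, and its simplified get_params takes no deep argument.

```python
import inspect


class ToyKMeans:
    """Hypothetical estimator used only to illustrate the __repr__ behavior."""

    def __init__(self, n_clusters=8, *, max_iter=20, tol=0.01):
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.tol = tol

    def _init_params(self):
        # Constructor parameters (excluding self) with their default values.
        return {
            name: p.default
            for name, p in inspect.signature(type(self).__init__).parameters.items()
            if name != "self"
        }

    def get_params(self):
        # Read each constructor argument back from the attribute of the same name.
        return {name: getattr(self, name) for name in self._init_params()}

    def __repr__(self):
        defaults = self._init_params()
        # Keep only the parameters whose current value differs from the default.
        changed = {k: v for k, v in self.get_params().items() if v != defaults[k]}
        args = ", ".join(f"{k}={v!r}" for k, v in changed.items())
        return f"{type(self).__name__}({args})"


print(repr(ToyKMeans(n_clusters=3, tol=0.001)))  # ToyKMeans(n_clusters=3, tol=0.001)
```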
detect_anomalies
detect_anomalies(
X: typing.Union[
bigframes.dataframe.DataFrame,
bigframes.series.Series,
pandas.core.frame.DataFrame,
pandas.core.series.Series,
],
*,
contamination: float = 0.1
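) -> bigframes.dataframe.DataFrame
Detect anomalous data points in the input. The contamination parameter is the proportion of anomalies assumed to be present in the training data.

Returns

| Type | Description |
|---|---|
| bigframes.dataframe.DataFrame | DataFrame containing the anomaly detection results. |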
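As a rough illustration of distance-based anomaly detection with k-means, the sketch below flags a point when its distance to the nearest centroid exceeds the (1 - contamination) quantile of the training points' distances. This thresholding rule is an assumption for illustration only; BigQuery ML's exact computation (for example, how distances are normalized) may differ. The function names are hypothetical, and the training data and centroids are taken from the class example.

```python
import numpy as np


def nearest_centroid_distance(X, centroids):
    # Euclidean distance from each row of X to its closest centroid.
    diffs = X[:, None, :] - centroids[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)


def detect_anomalies_sketch(X_train, X_new, centroids, contamination=0.1):
    # Assumed rule: roughly `contamination` of the training points lie beyond the threshold.
    train_dist = nearest_centroid_distance(X_train, centroids)
    threshold = np.quantile(train_dist, 1 - contamination)
    return nearest_centroid_distance(X_new, centroids) > threshold


X_train = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)
centroids = np.array([[5.5, 1.0], [5.5, 4.0]])  # values from the cluster_centers_ example
X_new = np.array([[5.5, 2.0], [40.0, 40.0]])
print(detect_anomalies_sketch(X_train, X_new, centroids))
```

A larger contamination lowers the threshold, so more points are flagged.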
fit
fit(
X: typing.Union[
bigframes.dataframe.DataFrame,
bigframes.series.Series,
pandas.core.frame.DataFrame,
pandas.core.series.Series,
],
y: typing.Optional[
typing.Union[
bigframes.dataframe.DataFrame,
bigframes.series.Series,
pandas.core.frame.DataFrame,
pandas.core.series.Series,
]
] = None,
) -> bigframes.ml.base._T
Compute k-means clustering.

Parameters

| Name | Description |
|---|---|
| X | bigframes.dataframe.DataFrame or bigframes.series.Series or pandas.core.frame.DataFrame or pandas.core.series.Series. Training data, a DataFrame of shape (n_samples, n_features). |
| y | default None. Not used; present for API consistency by convention. |

Returns

| Type | Description |
|---|---|
| KMeans | Fitted estimator. |
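For intuition about what fit computes, here is a minimal local sketch of Lloyd's k-means algorithm with k-means++ seeding (the idea behind init="kmeans++"), written with NumPy. It is not BigQuery ML's implementation: it uses only Euclidean distance, and the stopping rule (stop once the relative decrease in total squared distance is at most tol, or after max_iter iterations) is an assumption about how those parameters behave. kmeans_sketch and its helpers are hypothetical names.

```python
import numpy as np


def squared_distances(X, centroids):
    # Squared Euclidean distances, shape (n_samples, n_clusters).
    return ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)


def kmeans_plus_plus_init(X, k, rng):
    # First centroid: a uniformly random point. Each later centroid: a point drawn
    # with probability proportional to its squared distance to the nearest chosen one.
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = squared_distances(X, np.array(centroids)).min(axis=1)
        total = d2.sum()
        p = d2 / total if total > 0 else None  # uniform if every point is already a centroid
        centroids.append(X[rng.choice(len(X), p=p)])
    return np.array(centroids, dtype=float)


def kmeans_sketch(X, n_clusters=8, max_iter=20, tol=0.01, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = kmeans_plus_plus_init(X, n_clusters, rng)
    prev_loss = np.inf
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        d2 = squared_distances(X, centroids)
        labels = d2.argmin(axis=1)
        loss = d2[np.arange(len(X)), labels].sum()
        # Assumed stopping rule: relative loss improvement no larger than tol.
        if np.isfinite(prev_loss) and prev_loss - loss <= tol * prev_loss:
            break
        prev_loss = loss
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(n_clusters):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    # Final assignment against the final centroids.
    labels = squared_distances(X, centroids).argmin(axis=1)
    return centroids, labels


centroids, labels = kmeans_sketch([[0, 0], [0, 1], [10, 10], [10, 11]], n_clusters=2)
print(centroids, labels)
```

Empty clusters simply keep their previous centroid in this sketch; production implementations handle that case in various ways.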
get_params
get_params(deep: bool = True) -> typing.Dict[str, typing.Any]
Get parameters for this estimator.

Parameter

| Name | Description |
|---|---|
| deep | bool, default True. If True, also return the parameters of contained subobjects that are estimators. |

Returns

| Type | Description |
|---|---|
| Dictionary | A dictionary of parameter names mapped to their values. |
predict
predict(
X: typing.Union[
bigframes.dataframe.DataFrame,
bigframes.series.Series,
pandas.core.frame.DataFrame,
pandas.core.series.Series,
],
) -> bigframes.dataframe.DataFrame
Predict the closest cluster that each sample in X belongs to.

Returns

| Type | Description |
|---|---|
| bigframes.dataframe.DataFrame | DataFrame of shape (n_samples, n_input_columns + n_prediction_columns) containing the predicted labels. |
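The prediction rule can be sketched as nearest-centroid assignment. Using the centroids from the cluster_centers_ example, the hypothetical predict_sketch below reproduces the CENTROID_ID values from the class example. It assumes plain Euclidean distance on the raw feature values; BigQuery ML may scale features before computing distances, so in general results can differ from predict.

```python
import numpy as np

# Centroids from the cluster_centers_ example: rows are centroid_id 1 and 2,
# columns are feat0 and feat1.
centroid_ids = np.array([1, 2])
centroids = np.array([[5.5, 1.0], [5.5, 4.0]])


def predict_sketch(X, centroids, centroid_ids):
    # Squared Euclidean distance from each sample to each centroid; pick the closest.
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroid_ids[d2.argmin(axis=1)]


print(predict_sketch([[0, 0], [12, 3]], centroids, centroid_ids))  # [1 2]
```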
register
register(vertex_ai_model_id: typing.Optional[str] = None) -> bigframes.ml.base._T
Register the model to Vertex AI.
After registering, go to the Google Cloud console (https://console.cloud.google.com/vertex-ai/models) to manage the model registries. Refer to https://cloud.google.com/vertex-ai/docs/model-registry/introduction for more options.

Parameter

| Name | Description |
|---|---|
| vertex_ai_model_id | Optional[str], default None. Optional string ID to use as the model ID in Vertex AI. If not set, it defaults to 'bigframes_{bq_model_id}'. The Vertex AI model ID is truncated to 63 characters because of Vertex AI's length limit. |
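The default naming rule for vertex_ai_model_id can be expressed as a small helper. resolve_vertex_model_id is hypothetical (not part of bigframes), and applying the 63-character truncation to user-supplied IDs as well as to the default is an assumption based on the parameter description.

```python
from typing import Optional

# Length limit stated in the vertex_ai_model_id description.
VERTEX_ID_MAX_LEN = 63


def resolve_vertex_model_id(bq_model_id: str, vertex_ai_model_id: Optional[str] = None) -> str:
    # Hypothetical helper: fall back to 'bigframes_{bq_model_id}' when no ID is given,
    # then truncate to the Vertex AI length limit.
    if vertex_ai_model_id is None:
        model_id = f"bigframes_{bq_model_id}"
    else:
        model_id = vertex_ai_model_id
    return model_id[:VERTEX_ID_MAX_LEN]


print(resolve_vertex_model_id("my_model"))  # bigframes_my_model
```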
score
score(
X: typing.Union[
bigframes.dataframe.DataFrame,
bigframes.series.Series,
pandas.core.frame.DataFrame,
pandas.core.series.Series,
],
y=None,
) -> bigframes.dataframe.DataFrame
Calculate evaluation metrics of the model.

Returns

| Type | Description |
|---|---|
| bigframes.dataframe.DataFrame | DataFrame of the metrics. |
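The metrics returned for k-means models are expected to include the Davies-Bouldin index and the mean squared distance (an assumption based on BigQuery ML's k-means evaluation output). The hypothetical kmeans_metrics below computes both locally for given centroids so their meaning is concrete. It assumes Euclidean distance on raw feature values and that every centroid has at least one assigned point, so its numbers may not match score exactly.

```python
import numpy as np


def kmeans_metrics(X, centroids):
    X = np.asarray(X, dtype=float)
    dist = np.sqrt(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2))
    labels = dist.argmin(axis=1)
    nearest = dist[np.arange(len(X)), labels]

    # Mean squared distance from each point to its assigned centroid.
    mean_squared_distance = float((nearest ** 2).mean())

    # Davies-Bouldin: average over clusters of the worst ratio (s_i + s_j) / d(c_i, c_j),
    # where s_i is the mean distance of cluster i's points to its centroid.
    k = len(centroids)
    s = np.array([nearest[labels == i].mean() for i in range(k)])
    worst = [
        max(
            (s[i] + s[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    davies_bouldin = float(np.mean(worst))
    return {
        "davies_bouldin_index": davies_bouldin,
        "mean_squared_distance": mean_squared_distance,
    }


X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]  # training data from the class example
centroids = np.array([[5.5, 1.0], [5.5, 4.0]])  # centroids from the cluster_centers_ example
print(kmeans_metrics(X, centroids))
```

Lower values of both metrics indicate tighter, better-separated clusters.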
to_gbq
to_gbq(model_name: str, replace: bool = False) -> bigframes.ml.cluster.KMeans
Save the model to BigQuery.

Returns

| Type | Description |
|---|---|
| KMeans | Saved model. |