OneHotEncoder(
    drop: typing.Optional[typing.Literal["most_frequent"]] = None,
    min_frequency: typing.Optional[int] = None,
    max_categories: typing.Optional[int] = None,
)

Encode categorical features as a one-hot format.

The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are encoded using a one-hot (aka 'one-of-K' or 'dummy') encoding scheme.

Note that this method deviates from Scikit-Learn; instead of producing sparse binary columns, the encoding is a single column of STRUCT<index INT64, value DOUBLE>.
Examples:
Given a dataset with two features, we let the encoder find the unique
values per feature and transform the data to a binary one-hot encoding.
>>> from bigframes.ml.preprocessing import OneHotEncoder
>>> import bigframes.pandas as bpd
>>> enc = OneHotEncoder()
>>> X = bpd.DataFrame({"a": ["Male", "Female", "Female"], "b": ["1", "3", "2"]})
>>> enc.fit(X)
OneHotEncoder()
>>> print(enc.transform(bpd.DataFrame({"a": ["Female", "Male"], "b": ["1", "4"]})))
                onehotencoded_a               onehotencoded_b
0  [{'index': 1, 'value': 1.0}]  [{'index': 1, 'value': 1.0}]
1  [{'index': 2, 'value': 1.0}]  [{'index': 0, 'value': 1.0}]
<BLANKLINE>
[2 rows x 2 columns]
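The constructor parameters listed above can also be set explicitly. The following is an illustrative sketch only; the parameter values are arbitrary, and the resulting encoding depends on the fitted data:

>>> enc = OneHotEncoder(drop="most_frequent", min_frequency=2, max_categories=10)
>>> _ = enc.fit(X)  # fit with rare-category handling configured via min_frequency / max_categories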
Methods
__repr__
__repr__()

Print the estimator's constructor with all non-default parameter values.
fit
fit(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y=None,
) -> bigframes.ml.preprocessing.OneHotEncoder

Fit OneHotEncoder to X.

| Returns | |
|---|---|
| Type | Description |
| OneHotEncoder | Fitted encoder. |
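Since fit returns the fitted encoder, construction and fitting can be chained. A minimal sketch, reusing the DataFrame X from the example above:

>>> enc = OneHotEncoder().fit(X)  # learns the categories of each column in X and returns the encoder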
fit_transform
fit_transform(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y: typing.Optional[
        typing.Union[
            bigframes.dataframe.DataFrame,
            bigframes.series.Series,
            pandas.core.frame.DataFrame,
            pandas.core.series.Series,
        ]
    ] = None,
) -> bigframes.dataframe.DataFrame

Fit the encoder to X, then transform X and return the encoded DataFrame.
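A minimal sketch, reusing X from the example above; fit_transform combines fitting and transforming into a single call:

>>> encoded = OneHotEncoder().fit_transform(X)  # DataFrame with onehotencoded_a and onehotencoded_b columns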
get_params
get_params(deep: bool = True) -> typing.Dict[str, typing.Any]

Get parameters for this estimator.

| Parameter | |
|---|---|
| Name | Description |
| deep | bool, default True |

| Returns | |
|---|---|
| Type | Description |
| Dictionary | A dictionary of parameter names mapped to their values. |
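A short sketch of typical usage; the returned dictionary's keys are the estimator's parameter names, e.g. the constructor arguments shown above:

>>> params = OneHotEncoder(max_categories=10).get_params()
>>> # params maps parameter names such as 'max_categories' to their current values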
to_gbq
to_gbq(model_name: str, replace: bool = False) -> bigframes.ml.base._T

Save the transformer as a BigQuery model.

| Parameters | |
|---|---|
| Name | Description |
| model_name | str. The name of the model. |
| replace | bool, default False. Whether to replace the model if it already exists. |
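A sketch of saving a fitted encoder; the dataset and model names below are placeholders, and the call assumes a session that can write to that dataset:

>>> enc = OneHotEncoder().fit(X)
>>> _ = enc.to_gbq("my_dataset.my_onehot_encoder", replace=True)  # overwrite the model if it already exists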
transform
transform(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
) -> bigframes.dataframe.DataFrame

Transform X using one-hot encoding.

| Returns | |
|---|---|
| Type | Description |
| bigframes.dataframe.DataFrame | The one-hot encoded DataFrame. Each encoded column contains structs of the form {index, value}, where index is the position of the observed category and value is 0 or 1. |
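Per the signature above, transform also accepts a single Series. An illustrative sketch with a fitted encoder (the column name and values are hypothetical):

>>> enc = OneHotEncoder()
>>> _ = enc.fit(bpd.DataFrame({"a": ["Male", "Female", "Female"]}))
>>> result = enc.transform(bpd.Series(["Female", "Male"], name="a"))  # returns a DataFrame with one encoded column of structs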