Class OneHotEncoder (2.29.0)

OneHotEncoder(
    drop: typing.Optional[typing.Literal["most_frequent"]] = None,
    min_frequency: typing.Optional[int] = None,
    max_categories: typing.Optional[int] = None,
)

Encode categorical features in a one-hot format.

The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are encoded using a one-hot (aka 'one-of-K' or 'dummy') encoding scheme.

Note that this transformer deviates from scikit-learn. Instead of producing a set of sparse binary columns per feature, it encodes each input feature as a single column of ARRAY<STRUCT<index INT64, value DOUBLE>> values, as the example below shows.
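Because each encoded row lists only its non-zero positions, getting scikit-learn-style dense 0/1 columns back means scattering each {index, value} entry into a vector. The following is a minimal local sketch in plain Python. It assumes the encoded rows have already been pulled to the client as lists of dicts (the shape they print in below), and that the caller knows a vector width larger than every index. The function name `struct_array_to_dense` and its `width` parameter are hypothetical and not part of BigFrames.

```python
from typing import Dict, List, Union

# One encoded entry, shaped like {'index': 1, 'value': 1.0}.
Entry = Dict[str, Union[int, float]]


def struct_array_to_dense(entries: List[Entry], width: int) -> List[float]:
    """Scatter sparse {'index': i, 'value': v} entries into a dense vector.

    `width` is supplied by the caller (an assumption of this sketch) and must
    exceed every index that can appear in `entries`.
    """
    dense = [0.0] * width
    for entry in entries:
        idx = int(entry["index"])
        if not 0 <= idx < width:
            raise ValueError(f"index {idx} out of range for width {width}")
        dense[idx] = float(entry["value"])
    return dense


# A row shaped like the encoded values printed in the example.
row = [{"index": 1, "value": 1.0}]
print(struct_array_to_dense(row, width=3))  # [0.0, 1.0, 0.0]
```

Keeping the sparse form on the BigQuery side avoids materializing one column per category. The dense expansion is only worth doing client-side, for small vocabularies.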

Examples:

Given a dataset with two features, we let the encoder find the unique
values per feature and transform the data to a binary one-hot encoding.

>>> from bigframes.ml.preprocessing import OneHotEncoder
>>> import bigframes.pandas as bpd

>>> enc = OneHotEncoder()
>>> X = bpd.DataFrame({"a": ["Male", "Female", "Female"], "b": ["1", "3", "2"]})
>>> enc.fit(X)
OneHotEncoder()

>>> print(enc.transform(bpd.DataFrame({"a": ["Female", "Male"], "b": ["1", "4"]})))
                onehotencoded_a               onehotencoded_b
0  [{'index': 1, 'value': 1.0}]  [{'index': 1, 'value': 1.0}]
1  [{'index': 2, 'value': 1.0}]  [{'index': 0, 'value': 1.0}]
<BLANKLINE>
[2 rows x 2 columns]
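The indices in this output can be reproduced locally under two assumptions. Both are consistent with the example, but the reference does not confirm either one. First, categories seen during fit are numbered from 1 in sorted order, so "Female" gets 1 and "Male" gets 2. Second, any value not seen during fit, such as "4" in column b, maps to index 0. The helpers `fit_vocabulary` and `encode` are hypothetical illustrations, not how BigQuery ML computes the encoding.

```python
from typing import Dict, List, Sequence, Union


def fit_vocabulary(values: Sequence[str]) -> Dict[str, int]:
    # Assumption: known categories get indices 1..K in sorted order; 0 is reserved.
    return {cat: i for i, cat in enumerate(sorted(set(values)), start=1)}


def encode(value: str, vocab: Dict[str, int]) -> List[Dict[str, Union[int, float]]]:
    # Assumption: values missing from the vocabulary fall back to index 0.
    return [{"index": vocab.get(value, 0), "value": 1.0}]


fit_data = {"a": ["Male", "Female", "Female"], "b": ["1", "3", "2"]}
new_data = {"a": ["Female", "Male"], "b": ["1", "4"]}

vocabs = {col: fit_vocabulary(vals) for col, vals in fit_data.items()}
for col, vals in new_data.items():
    # Column a encodes to indices 1 and 2; column b to indices 1 and 0.
    print(f"onehotencoded_{col}", [encode(v, vocabs[col]) for v in vals])
```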

Methods

__repr__

__repr__()

Print the estimator's constructor with all non-default parameter values.

fit

fit(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y=None,
) -> bigframes.ml.preprocessing.OneHotEncoder

Fit OneHotEncoder to X.

Returns
Type	Description
`OneHotEncoder`	Fitted encoder.
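Fitting amounts to learning a per-column vocabulary. This reference does not document how `min_frequency` and `max_categories` shape that vocabulary. The sketch below assumes semantics loosely modeled on scikit-learn's parameters of the same name: categories seen fewer than `min_frequency` times are dropped, then at most `max_categories` of the most frequent are kept, and everything else falls into the reserved index 0. `drop` is not modeled. The function `fit_restricted_vocabulary` is hypothetical, and BigFrames or BigQuery ML may behave differently.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def fit_restricted_vocabulary(
    values: Sequence[str],
    min_frequency: Optional[int] = None,
    max_categories: Optional[int] = None,
) -> Dict[str, int]:
    """Hypothetical fit step: map kept categories to 1..K, leaving 0 for the rest."""
    counts = Counter(values)
    # Assumption: categories seen fewer than min_frequency times are dropped.
    kept = [c for c, n in counts.items() if min_frequency is None or n >= min_frequency]
    # Assumption: keep only the max_categories most frequent (ties broken by name).
    kept.sort(key=lambda c: (-counts[c], c))
    if max_categories is not None:
        kept = kept[:max_categories]
    # Number the surviving categories in sorted order, starting at 1.
    return {c: i for i, c in enumerate(sorted(kept), start=1)}


colors = ["red", "red", "red", "blue", "blue", "green"]
print(fit_restricted_vocabulary(colors))                    # {'blue': 1, 'green': 2, 'red': 3}
print(fit_restricted_vocabulary(colors, min_frequency=2))   # {'blue': 1, 'red': 2}
print(fit_restricted_vocabulary(colors, max_categories=1))  # {'red': 1}
```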

fit_transform

fit_transform(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
    y: typing.Optional[
        typing.Union[
            bigframes.dataframe.DataFrame,
            bigframes.series.Series,
            pandas.core.frame.DataFrame,
            pandas.core.series.Series,
        ]
    ] = None,
) -> bigframes.dataframe.DataFrame

Fit the encoder to X, then transform X using one-hot encoding.

get_params

get_params(deep: bool = True) -> typing.Dict[str, typing.Any]

Get parameters for this estimator.

Parameter
Name	Description
`deep`	`bool, default True` If True, return the parameters for this estimator and for any contained subobjects that are estimators.

Returns
Type	Description
`Dictionary`	A dictionary of parameter names mapped to their values.

to_gbq

to_gbq(model_name: str, replace: bool = False) -> bigframes.ml.base._T

Save the transformer as a BigQuery model.

Parameters
Name	Description
`model_name`	`str` The name of the model.
`replace`	`bool, default False` Whether to replace the model if one with the same name already exists.

transform

transform(
    X: typing.Union[
        bigframes.dataframe.DataFrame,
        bigframes.series.Series,
        pandas.core.frame.DataFrame,
        pandas.core.series.Series,
    ],
) -> bigframes.dataframe.DataFrame

Transform X using one-hot encoding.

Returns
Type	Description
`bigframes.dataframe.DataFrame`	One column per input feature, where each row holds an array of {index, value} structs: index is the position of the category in the vocabulary learned during fit, and value is 0 or 1.
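Turning a transformed row back into a category label only needs the inverse of the vocabulary learned during fit. The sketch below assumes that vocabulary is available locally as a dict, which is hypothetical because this reference describes no method that exposes it. It also assumes that index 0 denotes a value outside the vocabulary, as the example output suggests. The `decode` helper is an illustration, not a BigFrames API.

```python
from typing import Dict, List, Optional, Union

# One encoded entry, shaped like {'index': 2, 'value': 1.0}.
Entry = Dict[str, Union[int, float]]


def decode(entries: List[Entry], vocab: Dict[str, int]) -> Optional[str]:
    """Return the category whose entry has value 1, or None if there is none."""
    inverse = {idx: cat for cat, idx in vocab.items()}
    for entry in entries:
        if entry["value"] == 1.0:
            # Index 0 is assumed to mean "not in the fitted vocabulary",
            # so it has no inverse entry and decodes to None.
            return inverse.get(int(entry["index"]))
    return None


vocab_a = {"Female": 1, "Male": 2}
print(decode([{"index": 2, "value": 1.0}], vocab_a))  # Male
print(decode([{"index": 0, "value": 1.0}], vocab_a))  # None
```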
