SQLScalarColumnTransformer(sql: str, target_column: str = "transformed_{0}")
Wrapper for plain SQL code contained in a ColumnTransformer.
Create a single-column transformer in plain SQL. This transformer can only be used inside a ColumnTransformer.
When creating an instance, '{0}' can be used as a placeholder for the column to transform:
SQLScalarColumnTransformer("{0}+1")
By default, the target column is named after the source column with the prefix 'transformed_', but the name can be changed when creating an instance:
SQLScalarColumnTransformer("{0}+1", "inc_{0}")
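As a rough illustration of how the two templates are expanded, the sketch below assumes that '{0}' follows Python's str.format positional-field rules and that one SQL expression is produced per column. The helper expand_sql is hypothetical and not part of bigframes; the real implementation may also quote or escape column identifiers.

```python
def expand_sql(sql_template, target_template="transformed_{0}", columns=()):
    """Hypothetical helper: fill '{0}' with each column name.

    Assumes str.format semantics for the placeholder; this is not
    the bigframes implementation.
    """
    return [
        f"{sql_template.format(col)} AS {target_template.format(col)}"
        for col in columns
    ]


# Custom target column template, as in SQLScalarColumnTransformer("{0}+1", "inc_{0}")
print(expand_sql("{0}+1", "inc_{0}", ["col1", "col2"]))
# ['col1+1 AS inc_col1', 'col2+1 AS inc_col2']

# Default target column template
print(expand_sql("{0}+1", columns=["col1"]))
# ['col1+1 AS transformed_col1']
```

Because the same template is applied to every listed column, a single instance can cover several columns that need the same expression.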
Examples:
>>> from bigframes.ml.compose import ColumnTransformer, SQLScalarColumnTransformer
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'name': ["James", None, "Mary"], 'city': ["New York", "Boston", None]})
>>> col_trans = ColumnTransformer([
...     ("strlen",
...      SQLScalarColumnTransformer("CASE WHEN {0} IS NULL THEN 15 ELSE LENGTH({0}) END"),
...      ['name', 'city']),
... ])
>>> col_trans = col_trans.fit(df)
>>> df_transformed = col_trans.transform(df)
>>> df_transformed
   transformed_name  transformed_city
0                 5                 8
1                15                 6
2                 4                15
<BLANKLINE>
[3 rows x 2 columns]
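The values in this output can be checked by hand. The sketch below emulates the CASE expression in plain Python, assuming that LENGTH counts characters and that None stands in for SQL NULL. The function strlen_or_default is a hypothetical helper, not a bigframes function.

```python
def strlen_or_default(value, default=15):
    # Mirrors: CASE WHEN {0} IS NULL THEN 15 ELSE LENGTH({0}) END
    return default if value is None else len(value)


names = ["James", None, "Mary"]
cities = ["New York", "Boston", None]

transformed_name = [strlen_or_default(v) for v in names]
transformed_city = [strlen_or_default(v) for v in cities]

print(transformed_name)  # [5, 15, 4]
print(transformed_city)  # [8, 6, 15]
```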
SQLScalarColumnTransformer can be combined with other transformers, like StandardScaler:
>>> from bigframes.ml import preprocessing
>>> col_trans = ColumnTransformer([
...     ("identity", SQLScalarColumnTransformer("{0}", target_column="{0}"), ["col1", "col5"]),
...     ("increment", SQLScalarColumnTransformer("{0}+1", target_column="inc_{0}"), "col2"),
...     ("stdscale", preprocessing.StandardScaler(), "col3"),
...     # ...
... ])