Module linear_model (2.30.0)

Linear models. This module is styled after scikit-learn's linear_model module: https://scikit-learn.org/stable/modules/linear_model.html.

Classes

LinearRegression

LinearRegression(
    *,
    optimize_strategy: typing.Literal[
        "auto_strategy", "batch_gradient_descent", "normal_equation"
    ] = "auto_strategy",
    fit_intercept: bool = True,
    l1_reg: typing.Optional[float] = None,
    l2_reg: float = 0.0,
    max_iterations: int = 20,
    warm_start: bool = False,
    learning_rate: typing.Optional[float] = None,
    learning_rate_strategy: typing.Literal["line_search", "constant"] = "line_search",
    tol: float = 0.01,
    ls_init_learning_rate: typing.Optional[float] = None,
    calculate_p_values: bool = False,
    enable_global_explain: bool = False
)

Ordinary least squares Linear Regression.

LinearRegression fits a linear model with coefficients w = (w1, ..., wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation.
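For intuition, the closed-form least-squares solution that the "normal_equation" value of optimize_strategy corresponds to can be sketched in plain NumPy. This is an illustrative sketch of the underlying math, not the BigQuery ML implementation; the toy data is made up.

```python
import numpy as np

# Toy design matrix (4 samples, 2 features) and targets, chosen so that
# y = 1 * feature0 + 2 * feature1 exactly.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([5.0, 4.0, 11.0, 10.0])

# Prepend an intercept column, mirroring fit_intercept=True.
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equation: w = (X^T X)^{-1} X^T y, solved via lstsq for
# numerical stability instead of an explicit matrix inverse.
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
intercept, coefs = w[0], w[1:]
```

Gradient-based strategies such as "batch_gradient_descent" minimize the same residual sum of squares iteratively rather than in closed form.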

Examples:

>>> from bigframes.ml.linear_model import LinearRegression
>>> import bigframes.pandas as bpd
>>> X = bpd.DataFrame({
...     "feature0": [20, 21, 19, 18],
...     "feature1": [0, 1, 1, 0],
...     "feature2": [0.2, 0.3, 0.4, 0.5]})
>>> y = bpd.DataFrame({"outcome": [0, 0, 1, 1]})
>>> # Create the linear model
>>> model = LinearRegression()
>>> model.fit(X, y)
LinearRegression()

>>> # Score the model
>>> score = model.score(X, y)
>>> print(score) # doctest:+SKIP
   mean_absolute_error  mean_squared_error  mean_squared_log_error  \
0             0.022812            0.000602                 0.00035

   median_absolute_error  r2_score  explained_variance
0               0.015077  0.997591            0.997591
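The metrics reported by score follow standard definitions. As a hedged illustration (with made-up targets and predictions, not output from the model above), a few of them can be computed by hand:

```python
import numpy as np

# Illustrative observed targets and predictions.
y_true = np.array([0.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.02, 0.03, 0.98, 0.99])

mae = np.mean(np.abs(y_true - y_pred))      # mean_absolute_error
mse = np.mean((y_true - y_pred) ** 2)       # mean_squared_error

# r2_score: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
```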

LogisticRegression

LogisticRegression(
    *,
    optimize_strategy: typing.Literal[
        "auto_strategy", "batch_gradient_descent"
    ] = "auto_strategy",
    fit_intercept: bool = True,
    l1_reg: typing.Optional[float] = None,
    l2_reg: float = 0.0,
    max_iterations: int = 20,
    warm_start: bool = False,
    learning_rate: typing.Optional[float] = None,
    learning_rate_strategy: typing.Literal["line_search", "constant"] = "line_search",
    tol: float = 0.01,
    ls_init_learning_rate: typing.Optional[float] = None,
    calculate_p_values: bool = False,
    enable_global_explain: bool = False,
    class_weight: typing.Optional[
        typing.Union[typing.Literal["balanced"], typing.Dict[str, float]]
    ] = None
)

Logistic Regression (aka logit, MaxEnt) classifier.
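For intuition about the "batch_gradient_descent" optimize_strategy, logistic regression can be fit by repeatedly stepping along the gradient of the mean log-loss over the full batch. The sketch below is illustrative only (toy one-feature data, a fixed constant learning rate in the spirit of learning_rate_strategy="constant"), not the BigQuery ML implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy linearly separable data: label flips between x=1 and x=2.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

# Prepend an intercept column, mirroring fit_intercept=True.
Xb = np.hstack([np.ones((X.shape[0], 1)), X])
w = np.zeros(Xb.shape[1])

# Batch gradient descent on the mean log-loss with a constant step size.
lr = 0.5
for _ in range(2000):
    p = sigmoid(Xb @ w)
    grad = Xb.T @ (p - y) / len(y)  # gradient of mean log-loss w.r.t. w
    w -= lr * grad

# Classify at the conventional 0.5 probability threshold.
preds = (sigmoid(Xb @ w) >= 0.5).astype(int)
```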

Examples:

>>> from bigframes.ml.linear_model import LogisticRegression
>>> import bigframes.pandas as bpd
>>> X = bpd.DataFrame({
...     "feature0": [20, 21, 19, 18],
...     "feature1": [0, 1, 1, 0],
...     "feature2": [0.2, 0.3, 0.4, 0.5]})
>>> y = bpd.DataFrame({"outcome": [0, 0, 1, 1]})
>>> # Create the LogisticRegression
>>> model = LogisticRegression()
>>> model.fit(X, y)
LogisticRegression()
>>> model.predict(X) # doctest:+SKIP
   predicted_outcome                            predicted_outcome_probs  \
0                  0  [{'label': 1, 'prob': 3.1895929877221615e-07} ...
1                  0  [{'label': 1, 'prob': 5.662891265051953e-06} ...
2                  1  [{'label': 1, 'prob': 0.9999917826885262} {'l...
3                  1  [{'label': 1, 'prob': 0.9999999993659574} {'l...

   feature0  feature1  feature2
0        20         0       0.2
1        21         1       0.3
2        19         1       0.4
3        18         0       0.5

[4 rows x 5 columns in total]

>>> # Score the model
>>> score = model.score(X, y)
>>> score  # doctest:+SKIP
   precision  recall  accuracy  f1_score  log_loss  roc_auc
0        1.0     1.0       1.0       1.0  0.000004      1.0

[1 rows x 6 columns in total]
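Of the classification metrics above, log_loss and accuracy can be illustrated with standard formulas. The labels and probabilities below are made up for the sketch and are not output from the model above:

```python
import numpy as np

# Illustrative true labels and predicted probabilities of the positive class.
y_true = np.array([0, 0, 1, 1])
p = np.array([1e-6, 1e-5, 0.99999, 0.999999])

# log_loss: mean negative log-likelihood of the true labels,
# with clipping to avoid log(0).
eps = 1e-15
p = np.clip(p, eps, 1 - eps)
log_loss = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# accuracy at a 0.5 probability threshold.
accuracy = np.mean((p >= 0.5) == y_true)
```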