scikit-learn
Cross-validation

Lecture 15

Dr. Colin Rundel

Pipelines

From last time

We will now look at another flavor of regression model that involves preprocessing and a hyperparameter: polynomial regression.
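(The code on these slides assumes the imports set up last time are still in scope; roughly the following, though the exact list is an assumption:)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error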

df = pd.read_csv("data/gp.csv")
sns.relplot(data=df, x="x", y="y")

Pipelines

You may have noticed that PolynomialFeatures takes a model matrix as input and returns a new model matrix as output, which is then used as the input for LinearRegression. This is not an accident: sklearn is structured so that these steps can be chained together, into what sklearn calls a pipeline.

from sklearn.pipeline import make_pipeline

p = make_pipeline(
  PolynomialFeatures(degree=4),
  LinearRegression()
)
p
Pipeline(steps=[('polynomialfeatures', PolynomialFeatures(degree=4)),
                ('linearregression', LinearRegression())])

Using Pipelines

Once constructed, this object can be used just like our previous LinearRegression model (i.e., fit to our data and then used for prediction).

p = p.fit(X = df[["x"]], y = df.y)
p.predict(X = df[["x"]])
array([ 1.62957,  1.65735,  1.66105,  1.6778 ,  1.69667,  1.70475,  1.7528 ,  1.78471,  1.7905 ,  1.8269 ,  1.82966,  1.83376,  1.84494,  1.86003,  1.86228,  1.86619,  1.86838,  1.87065,  1.88418,
        1.8844 ,  1.88527,  1.88577,  1.88544,  1.86891,  1.86365,  1.86253,  1.86047,  1.85378,  1.84938,  1.83755,  1.82623,  1.82024,  1.818  ,  1.79768,  1.77255,  1.77034,  1.76574,  1.75371,
        1.7439 ,  1.73804,  1.73357,  1.65528,  1.64812,  1.61868,  1.60413,  1.59604,  1.56081,  1.55036,  1.54004,  1.50904,  1.45097,  1.4359 ,  1.41886,  1.39423,  1.36181,  1.23073,  1.21355,
        1.11776,  1.11522,  1.09595,  1.0645 ,  1.04672,  1.03663,  1.01407,  0.98209,  0.98082,  0.96177,  0.87491,  0.87118,  0.84223,  0.84171,  0.82875,  0.80851,  0.79166,  0.78167,  0.78078,
        0.73538,  0.71815,  0.70047,  0.67234,  0.67229,  0.64783,  0.64051,  0.63727,  0.63526,  0.62323,  0.61965,  0.61706,  0.61414,  0.60978,  0.60348,  0.59093,  0.56662,  0.50906,  0.44706,
        0.44178,  0.43291,  0.40958,  0.3848 ,  0.38289,  0.38068,  0.37915,  0.3761 ,  0.36933,  0.36493,  0.35807,  0.34757,  0.34668,  0.33333,  0.30718,  0.3007 ,  0.29676,  0.29338,  0.29333,
        0.27632,  0.26899,  0.26761,  0.26726,  0.26716,  0.26242,  0.25405,  0.25335,  0.25323,  0.25323,  0.25411,  0.25622,  0.25808,  0.2585 ,  0.2603 ,  0.26043,  0.2632 ,  0.26467,  0.26481,
        0.26486,  0.26489,  0.28177,  0.28525,  0.28861,  0.28918,  0.29004,  0.29445,  0.2956 ,  0.30233,  0.30622,  0.31322,  0.31798,  0.32105,  0.327  ,  0.32823,  0.32927,  0.33266,  0.33397,
        0.33711,  0.34111,  0.34141,  0.34707,  0.35926,  0.37678,  0.37775,  0.38885,  0.39078,  0.39518,  0.40743,  0.41041,  0.42033,  0.43577,  0.46158,  0.46668,  0.47145,  0.47197,  0.47425,
        0.4751 ,  0.47762,  0.48382,  0.48474,  0.49067,  0.50203,  0.50448,  0.50675,  0.5096 ,  0.51457,  0.51694,  0.51848,  0.52576,  0.53293,  0.53568,  0.53602,  0.53791,  0.53879,  0.53876,
        0.53839,  0.53823,  0.53757,  0.53749,  0.5365 ,  0.53481,  0.53372,  0.53274,  0.52872,  0.52378,  0.52346,  0.52314,  0.52287,  0.49656,  0.49553,  0.47579,  0.46694,  0.43758,  0.3861 ,
        0.38104,  0.31132,  0.29845,  0.28774,  0.27189,  0.2524 ,  0.23846,  0.22915,  0.17792,  0.17355,  0.09983,  0.09881,  0.09413,  0.09002,  0.08447,  0.01787, -0.00849, -0.03052, -0.06842,
       -0.09117, -0.10696, -0.13889, -0.20218, -0.22105, -0.23335, -0.39046, -0.46281, -0.47156, -0.48247, -0.56971, -0.57972, -0.68978, -0.81352, -0.83478, -0.88303, -0.91522, -0.96938, -0.99388,
       -1.16341, -1.19337, -1.21549])

plt.figure(layout="constrained")
sns.scatterplot(data=df, x="x", y="y")
sns.lineplot(x=df.x, y=p.predict(X = df[["x"]]), color="k")
plt.show()

Model coefficients (or other attributes)

The attributes of pipeline steps are not directly accessible from the pipeline object, but can be accessed via the steps or named_steps attributes:

p.coef_
Error: AttributeError: 'Pipeline' object has no attribute 'coef_'
p.steps
[('polynomialfeatures', PolynomialFeatures(degree=4)), ('linearregression', LinearRegression())]
p.steps[1][1].coef_
array([  0.     ,   7.39051, -57.67175, 102.72227, -55.38181])
p.named_steps["linearregression"].intercept_
1.6136636604768615

Other useful bits

p.steps[0][1].get_feature_names_out()
array(['1', 'x', 'x^2', 'x^3', 'x^4'], dtype=object)
p.steps[1][1].get_params()
{'copy_X': True, 'fit_intercept': True, 'n_jobs': None, 'positive': False}

Anyone notice a problem?

p.steps[1][1].rank_
4
p.steps[1][1].n_features_in_
5

The model matrix has 5 columns but only rank 4: the bias column added by PolynomialFeatures is redundant with the intercept fit by LinearRegression (note its coefficient of 0 above).

What about step parameters?

By accessing each step we can adjust its parameters (via set_params()):

p.named_steps["linearregression"].get_params()
{'copy_X': True, 'fit_intercept': True, 'n_jobs': None, 'positive': False}
p.named_steps["linearregression"].set_params(
  fit_intercept=False
)
LinearRegression(fit_intercept=False)
p.fit(X = df[["x"]], y = df.y)
Pipeline(steps=[('polynomialfeatures', PolynomialFeatures(degree=4)),
                ('linearregression', LinearRegression(fit_intercept=False))])
p.named_steps["linearregression"].intercept_
0.0
p.named_steps["linearregression"].coef_
array([  1.61366,   7.39051, -57.67175, 102.72227, -55.38181])

Pipeline parameter names

These parameters can also be accessed and set directly at the pipeline level; names are constructed as the step name + __ + the parameter name:

p.get_params()
{'memory': None, 'steps': [('polynomialfeatures', PolynomialFeatures(degree=4)), ('linearregression', LinearRegression(fit_intercept=False))], 'verbose': False, 'polynomialfeatures': PolynomialFeatures(degree=4), 'linearregression': LinearRegression(fit_intercept=False), 'polynomialfeatures__degree': 4, 'polynomialfeatures__include_bias': True, 'polynomialfeatures__interaction_only': False, 'polynomialfeatures__order': 'C', 'linearregression__copy_X': True, 'linearregression__fit_intercept': False, 'linearregression__n_jobs': None, 'linearregression__positive': False}
p.set_params(
  linearregression__fit_intercept=True, 
  polynomialfeatures__include_bias=False
)
Pipeline(steps=[('polynomialfeatures',
                 PolynomialFeatures(degree=4, include_bias=False)),
                ('linearregression', LinearRegression())])

p.fit(X = df[["x"]], y = df.y)
Pipeline(steps=[('polynomialfeatures',
                 PolynomialFeatures(degree=4, include_bias=False)),
                ('linearregression', LinearRegression())])
p.named_steps["polynomialfeatures"].get_feature_names_out()
array(['x', 'x^2', 'x^3', 'x^4'], dtype=object)
p.named_steps["linearregression"].intercept_
1.6136636604768375
p.named_steps["linearregression"].coef_
array([  7.39051, -57.67175, 102.72227, -55.38181])

Column Transformers

Column Transformers

Column transformers are a tool for selectively applying transformer(s) to specific column(s) of an array or DataFrame. They function in a way that is similar to a pipeline and similarly have a make_column_transformer() helper function.
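(The books data frame carries over from a previous lecture and is not redefined here; a minimal hypothetical stand-in with the same columns, using made-up values for illustration only, would look like:)

books = pd.DataFrame({
  "volume": [412, 953, 705],   # hypothetical values
  "weight": [250, 700, 450],   # hypothetical values
  "cover":  ["hb", "hb", "pb"],
})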

from sklearn.compose import make_column_transformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
ct = make_column_transformer(
  (StandardScaler(), ["volume"]),
  (OneHotEncoder(), ["cover"]),
).fit(
  books
)
ct.get_feature_names_out()
array(['standardscaler__volume', 'onehotencoder__cover_hb', 'onehotencoder__cover_pb'], dtype=object)
ct.transform(books)
array([[ 0.12101,  1.     ,  0.     ],
       [ 0.51997,  1.     ,  0.     ],
       [ 0.85192,  1.     ,  0.     ],
       [-1.84637,  1.     ,  0.     ],
       [-0.43936,  1.     ,  0.     ],
       [-0.62209,  1.     ,  0.     ],
       [ 1.16561,  1.     ,  0.     ],
       [-1.31951,  0.     ,  1.     ],
       [ 0.3281 ,  0.     ,  1.     ],
       [ 0.25501,  0.     ,  1.     ],
       [ 1.96962,  0.     ,  1.     ],
       [-1.29819,  0.     ,  1.     ],
       [ 0.50169,  0.     ,  1.     ],
       [-0.76218,  0.     ,  1.     ],
       [ 0.57478,  0.     ,  1.     ]])

Keeping or dropping other columns

One additional important argument is remainder, which determines what happens to unspecified columns. The default is "drop", which is why weight was removed; the alternative is "passthrough", which retains untransformed columns.

ct = make_column_transformer(
  (StandardScaler(), ["volume"]),
  (OneHotEncoder(), ["cover"]),
  remainder = "passthrough"
).fit(
  books
)
ct.get_feature_names_out()
array(['standardscaler__volume', 'onehotencoder__cover_hb', 'onehotencoder__cover_pb', 'remainder__weight'], dtype=object)
ct.transform(books)
array([[   0.12101,    1.     ,    0.     ,  800.     ],
       [   0.51997,    1.     ,    0.     ,  950.     ],
       [   0.85192,    1.     ,    0.     , 1050.     ],
       [  -1.84637,    1.     ,    0.     ,  350.     ],
       [  -0.43936,    1.     ,    0.     ,  750.     ],
       [  -0.62209,    1.     ,    0.     ,  600.     ],
       [   1.16561,    1.     ,    0.     , 1075.     ],
       [  -1.31951,    0.     ,    1.     ,  250.     ],
       [   0.3281 ,    0.     ,    1.     ,  700.     ],
       [   0.25501,    0.     ,    1.     ,  650.     ],
       [   1.96962,    0.     ,    1.     ,  975.     ],
       [  -1.29819,    0.     ,    1.     ,  350.     ],
       [   0.50169,    0.     ,    1.     ,  950.     ],
       [  -0.76218,    0.     ,    1.     ,  425.     ],
       [   0.57478,    0.     ,    1.     ,  725.     ]])

Column selection

One lingering issue with the above approach is that we’ve had to hard code the column names (or use indexes). Often we want to select columns based on their dtype (e.g. categorical vs. numerical); this can be done via sklearn or pandas:

from sklearn.compose import make_column_selector
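# selecting columns by dtype using sklearn's make_column_selector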
ct = make_column_transformer(
  ( StandardScaler(), 
    make_column_selector(
      dtype_include=np.number
    )
  ),
  ( OneHotEncoder(), 
    make_column_selector(
      dtype_include=[object, bool]
    )
  )
)
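# equivalently, picking the columns via pandas' select_dtypes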
ct = make_column_transformer(
  ( StandardScaler(), 
    books.select_dtypes(
      include=['number']
    ).columns
  ),
  ( OneHotEncoder(), 
    books.select_dtypes(
      include=['object']
    ).columns
  )
)

ct.fit_transform(books)
array([[ 0.12101,  0.35936,  1.     ,  0.     ],
       [ 0.51997,  0.9369 ,  1.     ,  0.     ],
       [ 0.85192,  1.32193,  1.     ,  0.     ],
       [-1.84637, -1.37326,  1.     ,  0.     ],
       [-0.43936,  0.16685,  1.     ,  0.     ],
       [-0.62209, -0.4107 ,  1.     ,  0.     ],
       [ 1.16561,  1.41818,  1.     ,  0.     ],
       [-1.31951, -1.75829,  0.     ,  1.     ],
       [ 0.3281 , -0.02567,  0.     ,  1.     ],
       [ 0.25501, -0.21818,  0.     ,  1.     ],
       [ 1.96962,  1.03316,  0.     ,  1.     ],
       [-1.29819, -1.37326,  0.     ,  1.     ],
       [ 0.50169,  0.9369 ,  0.     ,  1.     ],
       [-0.76218, -1.08449,  0.     ,  1.     ],
       [ 0.57478,  0.07059,  0.     ,  1.     ]])
ct.get_feature_names_out()
array(['standardscaler__volume', 'standardscaler__weight', 'onehotencoder__cover_hb', 'onehotencoder__cover_pb'], dtype=object)

Demo 1 - Putting it together
Interaction model

Cross validation & hyperparameter tuning

Ridge regression

One way to expand on the idea of least squares regression is to modify the loss function. One such approach is known as Ridge regression, which adds a scaled penalty for the sum of the squares of the \(\beta\)s to the least squares loss.

\[ \underset{\boldsymbol{\beta}}{\text{argmin}} \; \lVert \boldsymbol{y} - \boldsymbol{X} \boldsymbol{\beta} \rVert^2 + \lambda (\boldsymbol{\beta}^T\boldsymbol{\beta}) \]
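For a fixed \(\lambda\) this loss has a closed-form minimizer (a standard result, included here for reference),

\[ \hat{\boldsymbol{\beta}} = \left(\boldsymbol{X}^T\boldsymbol{X} + \lambda \boldsymbol{I}\right)^{-1} \boldsymbol{X}^T \boldsymbol{y} \]

so larger values of \(\lambda\) shrink the estimated coefficients toward zero.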

d = pd.read_csv("data/ridge.csv")
d
            y        x1        x2        x3        x4 x5
0   -0.151710  0.353658  1.633932  0.553257  1.415731  A
1    3.579895  1.311354  1.457500  0.072879  0.330330  B
2    0.768329 -0.744034  0.710362 -0.246941  0.008825  B
3    7.788646  0.806624 -0.228695  0.408348 -2.481624  B
4    1.394327  0.837430 -1.091535 -0.860979 -0.810492  A
..        ...       ...       ...       ...       ... ..
495 -0.204932 -0.385814 -0.130371 -0.046242  0.004914  A
496  0.541988  0.845885  0.045291  0.171596  0.332869  A
497 -1.402627 -1.071672 -1.716487 -0.319496 -1.163740  C
498 -0.043645  1.744800 -0.010161  0.422594  0.772606  A
499 -1.550276  0.910775 -1.675396  1.921238 -0.232189  B

[500 rows x 6 columns]

dummy coding

d = pd.get_dummies(d)
d
            y        x1        x2        x3        x4  x5_A  x5_B  x5_C  x5_D
0   -0.151710  0.353658  1.633932  0.553257  1.415731     1     0     0     0
1    3.579895  1.311354  1.457500  0.072879  0.330330     0     1     0     0
2    0.768329 -0.744034  0.710362 -0.246941  0.008825     0     1     0     0
3    7.788646  0.806624 -0.228695  0.408348 -2.481624     0     1     0     0
4    1.394327  0.837430 -1.091535 -0.860979 -0.810492     1     0     0     0
..        ...       ...       ...       ...       ...   ...   ...   ...   ...
495 -0.204932 -0.385814 -0.130371 -0.046242  0.004914     1     0     0     0
496  0.541988  0.845885  0.045291  0.171596  0.332869     1     0     0     0
497 -1.402627 -1.071672 -1.716487 -0.319496 -1.163740     0     0     1     0
498 -0.043645  1.744800 -0.010161  0.422594  0.772606     1     0     0     0
499 -1.550276  0.910775 -1.675396  1.921238 -0.232189     0     1     0     0

[500 rows x 9 columns]

Fitting a ridge regression model

The linear_model submodule also contains the Ridge model, which can be used to fit a ridge regression. Usage is identical to LinearRegression() except that Ridge() takes the parameter alpha to specify the regularization penalty \(\lambda\).

from sklearn.linear_model import Ridge, LinearRegression

X, y = d.drop(["y"], axis=1), d.y

rg = Ridge(fit_intercept=False, alpha=10).fit(X, y)
lm = LinearRegression(fit_intercept=False).fit(X, y)
rg.coef_
array([ 0.97809,  1.96215,  0.00172, -2.94457,  0.45558,  0.09001, -0.28193,  0.79781])
lm.coef_
array([ 0.99505,  2.00762,  0.00232, -3.00088,  0.49329,  0.10193, -0.29413,  1.00856])
mean_squared_error(y, rg.predict(X))
0.019101431349883385
mean_squared_error(y, lm.predict(X))
0.009872435924102045

Test-Train split

The most basic form of CV is to split the data into a testing and a training set; this can be achieved using train_test_split from the model_selection submodule.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)
X.shape
(500, 8)
X_train.shape
(400, 8)
X_test.shape
(100, 8)
y.shape
(500,)
y_train.shape
(400,)
y_test.shape
(100,)

X_train
           x1        x2        x3        x4  x5_A  x5_B  x5_C  x5_D
296 -0.261142 -0.887193 -0.441300  0.053902     0     0     1     0
220  0.155596  0.551363  0.749117  0.875181     0     0     1     0
0    0.353658  1.633932  0.553257  1.415731     1     0     0     0
255 -1.206309 -0.073534 -1.920777 -0.554861     1     0     0     0
335 -0.380790 -0.117404 -0.037709  0.202757     0     1     0     0
..        ...       ...       ...       ...   ...   ...   ...   ...
204 -2.646094  1.170804 -0.185098  0.165830     0     1     0     0
53  -0.483511  0.452531  0.223226 -0.753872     0     1     0     0
294 -1.424818 -0.396870 -0.595927 -1.114747     1     0     0     0
211 -1.000845 -0.842665  0.407765  0.375650     0     1     0     0
303  1.037404 -0.961266  0.433180  0.890055     0     1     0     0

[400 rows x 8 columns]
y_train
296   -2.462944
220   -1.760134
0     -0.151710
255    0.668016
335   -1.178652
         ...   
204   -0.657622
53     2.831201
294    1.566109
211   -3.711740
303   -3.552971
Name: y, Length: 400, dtype: float64

Train vs Test rmse

alpha = np.logspace(-2,1, 100)
train_rmse = []
test_rmse = []

for a in alpha:
    rg = Ridge(alpha=a).fit(X_train, y_train)
    
    train_rmse.append( 
      mean_squared_error(
        y_train, rg.predict(X_train), squared=False
      ) 
    )
    test_rmse.append( 
      mean_squared_error(
        y_test, rg.predict(X_test), squared=False
      ) 
    )

res = pd.DataFrame(
  data = {"alpha": alpha, 
          "train": train_rmse, 
          "test": test_rmse}
)
res
        alpha     train      test
0    0.010000  0.097568  0.106985
1    0.010723  0.097568  0.106984
2    0.011498  0.097568  0.106984
3    0.012328  0.097568  0.106983
4    0.013219  0.097568  0.106983
..        ...       ...       ...
95   7.564633  0.126990  0.129414
96   8.111308  0.130591  0.132458
97   8.697490  0.134568  0.135838
98   9.326033  0.138950  0.139581
99  10.000000  0.143764  0.143715

[100 rows x 3 columns]

sns.relplot(
  x="alpha", y="rmse", hue="variable", data = pd.melt(res, id_vars=["alpha"],value_name="rmse")
).set(
  xscale="log"
)

Best alpha?

min_i = np.argmin(res.train)
min_i
0
res.iloc[[min_i],:]
   alpha     train      test
0   0.01  0.097568  0.106985
min_i = np.argmin(res.test)
min_i
58
res.iloc[[min_i],:]
       alpha     train    test
58  0.572237  0.097787  0.1068

k-fold cross validation

The previous approach was relatively straightforward, but it required a fair bit of bookkeeping to implement and we only examined a single test/train split. If we would like to perform k-fold cross-validation we can use cross_val_score from the model_selection submodule.

from sklearn.model_selection import cross_val_score

cross_val_score(
  Ridge(alpha=0.59, fit_intercept=False), 
  X, y,
  cv=5, 
  scoring="neg_root_mean_squared_error"
)
array([-0.09364, -0.09995, -0.10474, -0.10273, -0.10597])

Controlling k-fold behavior

Rather than providing cv as an integer, it is better to specify a cross-validation scheme directly (with additional options). Here we will use the KFold class from the model_selection submodule.

from sklearn.model_selection import KFold

cross_val_score(
  Ridge(alpha=0.59, fit_intercept=False), 
  X, y, 
  cv = KFold(n_splits=5, shuffle=True, random_state=1234), 
  scoring="neg_root_mean_squared_error"
)
array([-0.10658, -0.104  , -0.1037 , -0.10125, -0.09228])

KFold object

KFold() returns an object whose split() method is a generator that yields a tuple containing the indexes of the training and testing rows for each fold, given a model matrix X:

ex = pd.DataFrame(data = list(range(10)), columns=["x"])
cv = KFold(5)
for train, test in cv.split(ex):
  print(f'Train: {train} | test: {test}')
Train: [2 3 4 5 6 7 8 9] | test: [0 1]
Train: [0 1 4 5 6 7 8 9] | test: [2 3]
Train: [0 1 2 3 6 7 8 9] | test: [4 5]
Train: [0 1 2 3 4 5 8 9] | test: [6 7]
Train: [0 1 2 3 4 5 6 7] | test: [8 9]
cv = KFold(5, shuffle=True, random_state=1234)
for train, test in cv.split(ex):
  print(f'Train: {train} | test: {test}')
Train: [0 1 3 4 5 6 8 9] | test: [2 7]
Train: [0 2 3 4 5 6 7 8] | test: [1 9]
Train: [1 2 3 4 5 6 7 9] | test: [0 8]
Train: [0 1 2 3 6 7 8 9] | test: [4 5]
Train: [0 1 2 4 5 7 8 9] | test: [3 6]

Train vs Test rmse (again)

alpha = np.logspace(-2,1, 30)
test_mean_rmse = []
test_rmse = []
cv = KFold(n_splits=5, shuffle=True, random_state=1234)

for a in alpha:
    rg = Ridge(fit_intercept=False, alpha=a).fit(X_train, y_train)
    
    scores = -1 * cross_val_score(
      rg, X, y, 
      cv = cv, 
      scoring="neg_root_mean_squared_error"
    )
    test_mean_rmse.append(np.mean(scores))
    test_rmse.append(scores)

res = pd.DataFrame(
    data = np.c_[alpha, test_mean_rmse, test_rmse],
    columns = ["alpha", "mean_rmse"] + ["fold" + str(i) for i in range(1,6) ]
)

res
        alpha  mean_rmse     fold1     fold2     fold3     fold4     fold5
0    0.010000   0.101257  0.106979  0.103691  0.102288  0.101130  0.092195
1    0.012690   0.101257  0.106976  0.103692  0.102292  0.101129  0.092194
2    0.016103   0.101256  0.106971  0.103692  0.102298  0.101126  0.092194
3    0.020434   0.101256  0.106966  0.103693  0.102306  0.101123  0.092193
4    0.025929   0.101256  0.106959  0.103694  0.102316  0.101120  0.092191
5    0.032903   0.101256  0.106951  0.103696  0.102328  0.101116  0.092190
6    0.041753   0.101256  0.106940  0.103698  0.102344  0.101110  0.092188
7    0.052983   0.101256  0.106927  0.103701  0.102365  0.101104  0.092186
8    0.067234   0.101257  0.106911  0.103704  0.102391  0.101096  0.092184
9    0.085317   0.101259  0.106890  0.103709  0.102426  0.101088  0.092181
10   0.108264   0.101262  0.106865  0.103716  0.102471  0.101078  0.092178
11   0.137382   0.101267  0.106835  0.103725  0.102529  0.101069  0.092176
12   0.174333   0.101276  0.106800  0.103739  0.102607  0.101060  0.092174
13   0.221222   0.101291  0.106758  0.103758  0.102710  0.101055  0.092175
14   0.280722   0.101317  0.106712  0.103786  0.102848  0.101059  0.092180
15   0.356225   0.101360  0.106663  0.103828  0.103036  0.101078  0.092193
16   0.452035   0.101430  0.106617  0.103890  0.103293  0.101128  0.092221
17   0.573615   0.101544  0.106584  0.103984  0.103650  0.101229  0.092273
18   0.727895   0.101729  0.106580  0.104128  0.104149  0.101420  0.092367
19   0.923671   0.102026  0.106639  0.104348  0.104856  0.101757  0.092530
20   1.172102   0.102501  0.106809  0.104690  0.105864  0.102334  0.092805
21   1.487352   0.103253  0.107174  0.105220  0.107314  0.103295  0.093263
22   1.887392   0.104436  0.107863  0.106045  0.109403  0.104858  0.094011
23   2.395027   0.106274  0.109072  0.107328  0.112413  0.107341  0.095216
24   3.039195   0.109091  0.111089  0.109319  0.116727  0.111193  0.097129
25   3.856620   0.113335  0.114315  0.112385  0.122847  0.117013  0.100112
26   4.893901   0.119591  0.119283  0.117056  0.131401  0.125546  0.104671
27   6.210169   0.128590  0.126646  0.124055  0.143126  0.137655  0.111470
28   7.880463   0.141185  0.137154  0.134319  0.158849  0.154271  0.121333
29  10.000000   0.158324  0.151609  0.148984  0.179465  0.176352  0.135209

sns.relplot(
  x="alpha", y="rmse", hue="variable", data=res.melt(id_vars=["alpha"], value_name="rmse"), 
  marker="o", kind="line"
).set(
  xscale="log"
)

Best alpha? (again)

i = res.drop(
  ["alpha"], axis=1
).agg(
  np.argmin
).to_numpy()

i = np.sort(np.unique(i))

res.iloc[ i, : ]
       alpha  mean_rmse     fold1     fold2     fold3     fold4     fold5
0   0.010000   0.101257  0.106979  0.103691  0.102288  0.101130  0.092195
5   0.032903   0.101256  0.106951  0.103696  0.102328  0.101116  0.092190
12  0.174333   0.101276  0.106800  0.103739  0.102607  0.101060  0.092174
13  0.221222   0.101291  0.106758  0.103758  0.102710  0.101055  0.092175
18  0.727895   0.101729  0.106580  0.104128  0.104149  0.101420  0.092367

Aside - Available metrics

For most of the cross-validation functions we pass in a string instead of a scoring function from the metrics submodule. If you are interested in seeing the names of the possible metrics, these are available via the sklearn.metrics.SCORERS dictionary (newer versions of sklearn expose the same information via sklearn.metrics.get_scorer_names()):

np.array( sorted(
  sklearn.metrics.SCORERS.keys()
) )
array(['accuracy', 'adjusted_mutual_info_score', 'adjusted_rand_score', 'average_precision', 'balanced_accuracy', 'completeness_score', 'explained_variance', 'f1', 'f1_macro', 'f1_micro',
       'f1_samples', 'f1_weighted', 'fowlkes_mallows_score', 'homogeneity_score', 'jaccard', 'jaccard_macro', 'jaccard_micro', 'jaccard_samples', 'jaccard_weighted', 'matthews_corrcoef', 'max_error',
       'mutual_info_score', 'neg_brier_score', 'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_absolute_percentage_error', 'neg_mean_gamma_deviance', 'neg_mean_poisson_deviance',
       'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_median_absolute_error', 'neg_negative_likelihood_ratio', 'neg_root_mean_squared_error', 'normalized_mutual_info_score',
       'positive_likelihood_ratio', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'rand_score', 'recall', 'recall_macro', 'recall_micro',
       'recall_samples', 'recall_weighted', 'roc_auc', 'roc_auc_ovo', 'roc_auc_ovo_weighted', 'roc_auc_ovr', 'roc_auc_ovr_weighted', 'top_k_accuracy', 'v_measure_score'], dtype='<U34')
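GridSearchCV

The gs object used below comes from a slide that is not reproduced here; a minimal sketch of how it was presumably constructed, reusing the alpha grid and KFold scheme from above (the exact arguments are an assumption):

from sklearn.model_selection import GridSearchCV

gs = GridSearchCV(
  Ridge(fit_intercept=False),
  param_grid = {"alpha": np.logspace(-2, 1, 30)},
  cv = KFold(n_splits=5, shuffle=True, random_state=1234),
  scoring = "neg_root_mean_squared_error"
).fit(
  X, y
)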

gs.best_index_
5
gs.best_params_
{'alpha': 0.03290344562312668}
gs.best_score_
-0.10125611767453653

best_estimator_ attribute

If refit = True (the default) with GridSearchCV(), then the best_estimator_ attribute will be available, which gives direct access to the “best” model or pipeline object. This model is constructed by using the parameter(s) that achieved the maximum score and refitting the model to the complete data set.

gs.best_estimator_
Ridge(alpha=0.03290344562312668, fit_intercept=False)
gs.best_estimator_.coef_
array([ 0.99499,  2.00747,  0.00231, -3.0007 ,  0.49316,  0.10189, -0.29408,  1.00767])
gs.best_estimator_.predict(X)
array([ -0.12179,   3.34151,   0.76055,   7.89292,   1.56523,  -5.33575,  -4.37469,   3.13003,  -0.16859,  -1.60087,  -1.89073,   1.44596,   3.99773,   4.70003,  -6.45959,   4.11085,   3.60426,
        -1.96548,   2.99039,   0.56796,  -5.26672,   5.4966 ,   3.47247,  -2.66117,   3.35011,   0.64221,  -1.50238,   2.41562,   3.11665,   1.11236,  -2.11839,   1.36006,  -0.53666,  -2.78112,
         0.76008,   5.49779,   2.6521 ,  -0.83127,   0.04167,  -1.92585,  -2.48865,   2.29127,   3.62514,  -2.01226,  -0.69725,  -1.94514,  -0.47559,  -7.36557,  -3.20766,   2.9218 ,  -0.8213 ,
        -2.78598, -12.55143,   2.79189,  -1.89763,  -5.1769 ,   1.87484,   2.18345,  -6.45358,   0.91006,   0.94792,   2.91799,   6.12323,  -1.87654,   3.63259,  -0.53797,  -3.23506,  -2.23885,
         1.04564,  -1.54843,   0.76161,  -1.65495,   0.22378,  -0.68221,   0.12976,   2.58875,   2.54421,  -3.69056,   3.73479,  -0.90278,   1.22394,  -3.22614,   7.16719,  -5.6168 ,   3.3433 ,
         0.36935,   0.87397,   9.22348,  -1.29078,   1.74347,  -1.55169,  -0.69398,  -1.40445,   0.23072,   1.06277,   2.84797,   2.35596,  -1.93292,   8.35129,  -2.98221,  -6.35071,  -5.15138,
         1.70208,   7.15821,   3.96172,   5.75363,  -4.50718,  -5.81785,  -2.47424,   1.19276,   2.57431,  -2.57053,  -0.53682,  -1.65955,   1.99839,  -6.19607,  -1.73962,  -2.11993,  -2.29362,
         2.65413,  -0.67486,  -3.01324,   0.34118,  -3.83856,   0.33096,  -3.59485,  -1.55578,   0.96765,   3.50934,  -0.31935,  -4.18323,   2.87843,  -1.64857,  -3.68181,   2.24423,  -1.00244,
        -2.65588,  -5.77111,  -1.20292,   2.66903,  -1.11387,   3.05231,   6.34596,  -1.42886,  -2.29709,  -1.4573 ,  -2.46733,   1.69685,   4.21699,   1.21569,   9.06269,  -3.62209,   1.94704,
         1.14603,  -3.35087,  -5.91052,  -1.23355,   2.8308 ,  -3.21438,   4.09019,  -5.95969,  -0.98044,   2.06976,   0.58541,   1.83006,   8.11251,  -0.18073,  -4.80287,   1.59881,   0.13323,
         2.67859,   2.45406,  -2.28901,   1.1609 ,  -1.50239,  -5.51199,   2.67089,   2.39878,   6.65249,   0.5551 ,   9.36975,   6.90333,   0.48633,  -0.51877,   1.44203,  -5.95008,   5.99042,
        -0.85644,   1.90162,  -1.23686,   3.22403,   5.31725,   0.31415,   0.17128,  -1.53623,   1.73354,  -1.93645,   4.68716,  -3.62658,   0.22032, -10.94667,   2.83953,  -8.13513,   4.30062,
        -0.67864,  -0.67348,   4.22499,   3.34704,  -1.44927,  -6.3229 ,   4.83881,  -3.71184,   6.32207,   3.69622,  -1.02501, -12.91691,   1.85435,  -0.43171,   4.77516,  -1.53529,  -1.65685,
         5.69233,   6.28949,   5.37201,  -0.63177,   2.88795,   4.01781,   7.03453,   1.76797,   5.86793,   1.57465,   3.03172,   0.96769,  -3.0659 ,  -1.51918,  -2.89632,  -1.28436,   2.67186,
        -0.92299,  -4.85603,   4.18714,  -3.60775,  -2.31532,   1.27459,   0.37238,  -1.21   ,   2.44074,  -1.52466,  -2.59175,  -1.83419,  -0.8865 ,   0.89346,   2.70453,  -3.15098,  -4.43793,
         0.8058 ,   0.23748,   1.13615,   0.63385,  -0.2395 ,   6.07024,   0.85521,   0.18951,   3.27772,  -0.8963 ,  -5.84285,   0.68905,  -0.30427,  -2.87087,  10.51629,  -3.96115,  -5.09138,
       -10.86754,  -9.25489,   7.0615 ,   0.01263,   3.93274,   3.40325,  -1.57858,  -4.94508,  -2.69779,   1.07372,  -3.95091,  -3.80321,  -1.91214,   0.14772,   3.70995,   5.04094,  -0.02024,
        -0.03725,  -1.15642,   8.92035,   2.63769,  -1.39664,   1.62241,  -4.87487,  -2.49769,   1.39569,  -1.39193,   4.52569,   2.29201,   1.57898,   0.69253,  -3.4654 ,   3.71004,   6.10037,
        -4.41299,  -4.79775,  -3.79204,  -3.61711,  -2.92489,   7.15104,  -3.24195,   3.03705,  -4.01473,  -1.99391,  -4.64601,   4.40534,  -3.12028,  -0.1754 ,   2.52698,   0.49637,  -1.0263 ,
        10.77554,  -1.64465,  -2.13624,  -2.16392,   1.92049,  -2.47602,  -4.34462,  -2.09427,  -0.32466,   2.56876,  -5.7397 ,  -2.94306,  -1.12118,   4.16147,   2.5303 ,   3.38768,   7.96277,
        -3.28827,  -5.73513,   4.76249,  -1.24714,   0.08253,  -1.71446,   1.3742 ,   1.85738,  -6.37864,  -0.0773 ,   0.73072,  -1.64713,  -3.65246,   1.57344,  -2.56019,  -1.09033,  -1.05099,
        -4.48298,  -0.28666,  -4.92509,   2.6523 ,  -4.59622,   3.09283,   3.50353,  -6.1787 ,  -2.08203,  -2.72838,  -8.55473,   4.14717,   0.03483,  -2.07173,  -1.22492,  -2.1331 ,  -3.24188,
        -3.23348,  -1.43328,   3.09365,   2.85666,   3.1452 ,  -0.60436,  -3.08445,   2.39221,   1.26373,   4.77618,  -1.78471,  -6.19369,  -3.24321,  -0.76221,  -1.56433,   1.39877,   2.28802,
         4.46115,  -3.25751,  -2.51097,   1.19593,   1.12214,   2.0177 ,  -2.9301 ,  -5.70471,   2.94404,  -9.62989,  -4.13055,  -0.30686,   5.41388,   3.36441,  -1.68838,   3.18239,  -1.97929,
         3.84279,   0.59629,   4.23805,  -8.3217 ,   4.71925,   0.32863,   2.20721,   3.46358,   3.38237,  -2.65319,   2.32341,   0.31199,   5.29292,   0.798  ,   2.17796,   5.74332,  -7.68979,
         0.33166,  -1.84974,   4.73811,   0.51179,  -1.18062,  -1.08818,   6.30818,  -2.88198,  -1.68064,   1.76754,  -3.80955,  -5.03755,   3.41809,  -2.62689,   4.09036,  -4.51406,   0.95089,
        -1.0706 ,  -1.51755,  -1.83065,  -5.33533,  -2.15694,  -5.43987,  -5.04878,  -5.62245,  -1.46875,  -0.60701,   0.20797,  -3.21649,   3.93528,   1.14442,   1.93545,  -4.11887,  -0.39968,
        -4.07461,   2.32534,  -0.26627,  -2.45467,  -1.08026,   2.35466,   0.92026,  -1.41122,  -1.21825, -10.48345,   3.18599,   0.08117,   4.24776,   4.47563,   6.52936,   4.06496,   0.61928,
        -4.96605,  -1.23884,  -3.06521,   2.4295 ,  -3.13812,  -0.51459,  -2.9222 ,   0.72806,   4.4886 ,  -1.04944, -11.67098,   1.12496,   3.81906,  -6.76879,  -3.90709,  -1.75508,   1.57104,
         2.2711 ,   7.69569,  -0.16729,   0.42729,  -1.31489,  -0.10855,  -1.65403])

cv_results_ attribute

Other useful details about the grid search process are stored in the cv_results_ attribute (a dictionary), which includes things like average test scores, fold-level test scores, test ranks, test runtimes, etc.

gs.cv_results_.keys()
dict_keys(['mean_fit_time', 'std_fit_time', 'mean_score_time', 'std_score_time', 'param_alpha', 'params', 'split0_test_score', 'split1_test_score', 'split2_test_score', 'split3_test_score', 'split4_test_score', 'mean_test_score', 'std_test_score', 'rank_test_score'])
gs.cv_results_["param_alpha"]
masked_array(data=[0.01, 0.01268961003167922, 0.01610262027560939, 0.020433597178569417, 0.02592943797404667, 0.03290344562312668, 0.041753189365604, 0.05298316906283707, 0.06723357536499334,
                   0.08531678524172806, 0.10826367338740546, 0.1373823795883263, 0.17433288221999882, 0.2212216291070449, 0.2807216203941177, 0.3562247890262442, 0.4520353656360243,
                   0.5736152510448679, 0.727895384398315, 0.9236708571873861, 1.1721022975334805, 1.4873521072935119, 1.8873918221350976, 2.395026619987486, 3.039195382313198, 3.856620421163472,
                   4.893900918477494, 6.2101694189156165, 7.880462815669913, 10.0],
             mask=[False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False,
                   False, False, False, False, False],
       fill_value='?',
            dtype=object)
gs.cv_results_["mean_test_score"]
array([-0.10126, -0.10126, -0.10126, -0.10126, -0.10126, -0.10126, -0.10126, -0.10126, -0.10126, -0.10126, -0.10126, -0.10127, -0.10128, -0.10129, -0.10132, -0.10136, -0.10143, -0.10154, -0.10173,
       -0.10203, -0.1025 , -0.10325, -0.10444, -0.10627, -0.10909, -0.11333, -0.11959, -0.12859, -0.14119, -0.15832])

alpha = np.array(gs.cv_results_["param_alpha"],dtype="float64")
score = -gs.cv_results_["mean_test_score"]
score_std = gs.cv_results_["std_test_score"]
n_folds = gs.cv.get_n_splits()

plt.figure(layout="constrained")

ax = sns.lineplot(x=alpha, y=score)
ax.set_xscale("log")

plt.fill_between(
  x = alpha,
  y1 = score + 1.96*score_std / np.sqrt(n_folds),
  y2 = score - 1.96*score_std / np.sqrt(n_folds),
  alpha = 0.2
)

plt.show()

Ridge traceplot

alpha = np.logspace(-1,5, 100)
betas = []

for a in alpha:
    rg = Ridge(alpha=a).fit(X, y)
    
    betas.append(rg.coef_)

res = pd.DataFrame(
  data = betas, columns = rg.feature_names_in_
).assign(
  alpha = alpha  
)

g = sns.relplot(
  data = res.melt(id_vars="alpha", value_name="coef values", var_name="feature"),
  x = "alpha", y = "coef values", hue = "feature",
  kind = "line", aspect=2
)
g.set(xscale="log")