Most machine learning algorithms have a number of hyperparameters that we can tune to improve the model’s performance.

A hyperparameter controls the behavior of the algorithm and can have an enormous impact on the results. Its value cannot be estimated from the data and has to be set before training the model.

An example of hyperparameter tuning might be choosing the number of neurons in a neural network or setting the learning rate for stochastic gradient descent.

In this article, we’ll discuss grid search and randomized search, two of the most widely used techniques for hyperparameter tuning.

**GRID SEARCH:**

Grid search performs an exhaustive search to find the best hyperparameters.

It examines all combinations of the specified parameter values when fitting the model. Each combination of hyperparameters is evaluated using k-fold cross-validation.
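To see what this evaluation step looks like for a single combination, here is a minimal sketch using scikit-learn’s cross_val_score; the toy dataset and the candidate value C=1.0 are illustrative, not part of the example that follows:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# A small toy dataset; grid search repeats this evaluation
# once for every hyperparameter combination in the grid.
X, y = make_classification(n_samples=1000, random_state=0)

# Evaluate one candidate combination (C=1.0) with 5-fold cross-validation.
scores = cross_val_score(LogisticRegression(C=1.0, max_iter=1000), X, y, cv=5)
print(scores.mean())  # average accuracy across the 5 folds
```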

Let’s see an example to understand hyperparameter tuning in scikit-learn.

We can use the GridSearchCV class from scikit-learn to find the best hyperparameters.

Let’s start by importing the necessary libraries.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, GridSearchCV

np.random.seed(1)
```

Now let’s create a dataset using scikit-learn’s make_classification and split it into train and test set.

```python
X, y = make_classification(n_samples=10000, n_classes=2, random_state=43)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=43)
```

First, we need to specify the grid of parameters that we want the classifier to test. The parameter grid is a dictionary that maps each hyperparameter’s name to the values we would like to try for it.

```python
parameter_grid = {'C': [0.001, 0.01, 0.1, 1, 10],
                  'penalty': ['l1', 'l2']}
```

Once we have defined our parameter grid, we create an instance of the GridSearchCV class:

```python
lr = LogisticRegression(random_state=43)
estimator = GridSearchCV(estimator=lr, param_grid=parameter_grid,
                         scoring='accuracy', cv=10, n_jobs=-1)
```

We have passed the base estimator we would like to tune (logistic regression), the parameter grid, the metric for measuring performance (accuracy), and the number of folds for the k-fold cross-validation.

The grid contains 5 × 2 = 10 combinations to be evaluated. Since tenfold cross-validation is used, each combination is fitted 10 times, for a total of 10 × 10 = 100 model fits.

Once the instance of the class is created, we can use the fit method:

```python
estimator.fit(X_train, y_train)
```

We can retrieve the best parameters using the best_params_ attribute. To get the best estimator, we can use the best_estimator_ attribute, and to see the score of the best estimator, we can use the best_score_ attribute.

```python
print(estimator.best_params_)
print(estimator.best_estimator_)
print(estimator.best_score_)
```

```
OUTPUT:
{'C': 0.01, 'penalty': 'l1'}
LogisticRegression(C=0.01, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, max_iter=100, multi_class='warn',
                   n_jobs=None, penalty='l1', random_state=None, solver='warn',
                   tol=0.0001, verbose=0, warm_start=False)
0.936
```

Now we can use these values to train the final model:

```python
from sklearn.metrics import accuracy_score

best_penalty = estimator.best_params_['penalty']
best_C = estimator.best_params_['C']

clf_lr = LogisticRegression(penalty=best_penalty, C=best_C)
clf_lr.fit(X_train, y_train)
predictions = clf_lr.predict(X_test)

print('Accuracy', accuracy_score(y_test, predictions))
```

```
OUTPUT:
Accuracy 0.9404
```

GridSearchCV is a simple way of tuning our models’ parameters to get the best possible outcome. However, if the machine learning algorithm has many hyperparameters, it can become inefficient.

The computational cost of grid search grows exponentially with the number of hyperparameters. For instance, the example above evaluates 10 combinations (100 model fits). Adding another hyperparameter that can take three values increases this to 5 × 2 × 3 = 30 combinations, or 300 fits.
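This growth can be verified with a quick count over the candidate grids; the third hyperparameter and its values below are made up purely for illustration:

```python
from itertools import product

# The grid from the example above.
grid_2 = {'C': [0.001, 0.01, 0.1, 1, 10],
          'penalty': ['l1', 'l2']}

# A hypothetical third hyperparameter with three values.
grid_3 = {**grid_2, 'solver': ['liblinear', 'saga', 'lbfgs']}

n_2 = len(list(product(*grid_2.values())))  # 5 * 2 = 10 combinations
n_3 = len(list(product(*grid_3.values())))  # 5 * 2 * 3 = 30 combinations

# With 10-fold cross-validation, each combination costs 10 model fits.
print(n_2 * 10, n_3 * 10)
```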

So an alternative is to use RandomizedSearchCV.

**RANDOMIZED SEARCH:**

Unlike grid search, it doesn’t perform an exhaustive search over the possible combinations of values for each tuning parameter; instead, it samples a fixed number of parameter settings at random to find a good solution.

Let’s see an example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split

np.random.seed(1)

X, y = make_classification(n_samples=10000, n_classes=2, random_state=43)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=43)
```

scikit-learn provides the RandomizedSearchCV class to perform a random search over hyperparameter values.

```python
parameter_grid = {'C': np.logspace(-2, 1, 100),
                  'penalty': ['l1', 'l2']}

lr = LogisticRegression(random_state=43)
random_search = RandomizedSearchCV(estimator=lr, param_distributions=parameter_grid,
                                   n_iter=7, scoring='accuracy', cv=10, n_jobs=-1)
```

The param_distributions argument is similar to param_grid in GridSearchCV. However, it accepts not only lists of values but also distributions to sample from.

For example, instead of C: [0.01, 0.1, 1, 10], you can assign a range of values such as C: np.logspace(-2, 1).
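RandomizedSearchCV also accepts scipy.stats distribution objects, which sample C continuously rather than from a fixed list. A minimal sketch, assuming scipy is installed; the dataset, ranges, and search settings here are illustrative only:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=1000, random_state=43)

# Sample C continuously on a log scale between 0.01 and 10.
param_distributions = {'C': loguniform(1e-2, 1e1)}

search = RandomizedSearchCV(LogisticRegression(max_iter=1000),
                            param_distributions=param_distributions,
                            n_iter=5, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_)
```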

The n_iter parameter is the number of parameter settings that are sampled at random.

```python
random_search.fit(X_train, y_train)

best_penalty = random_search.best_params_['penalty']
best_C = random_search.best_params_['C']

print('Best Penalty:', best_penalty)
print('Best C:', best_C)
```

```
OUTPUT:
Best Penalty: l1
Best C: 0.03274549162877728
```

**SUMMARY:**

In this tutorial, we discussed two important techniques for hyperparameter tuning: grid search and random search.

We first discussed grid search, which exhaustively examines all combinations of the specified parameter values when fitting the model.

However, for ML algorithms with a large number of hyperparameters, grid search becomes computationally intractable, so we discussed an alternative: randomized search.

Unlike grid search, random search doesn’t exhaustively search over the possible combinations of values for each hyperparameter; instead, it samples a fixed number of parameter settings at random to find a good solution.