
📅  Last modified: 2023-12-03 15:01:04.676000  🧑  Author: Mango

GridSearchCV in Python

Introduction

GridSearchCV is a class in scikit-learn for hyperparameter tuning. It searches for the best hyperparameters of a machine learning model by exhaustively trying every combination of values from a user-defined grid. Each combination is scored with cross-validation, so the selected set is the one with the best average performance on held-out folds, not simply the best fit to the training data.

How it works

GridSearchCV takes a machine learning model (an estimator) and a dictionary of hyperparameters, where the keys are the hyperparameter names and the values are lists of candidate values. It then evaluates the model on every combination of hyperparameters using cross-validation, typically k-fold cross-validation, where k is set through the cv parameter.
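
To make the procedure concrete, here is a minimal sketch of the same idea written by hand with ParameterGrid and cross_val_score. It illustrates the concept and is not how GridSearchCV is implemented internally; the grid values and the use of the full iris dataset are assumptions chosen for the sketch.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Illustrative grid; these values are assumptions made for this sketch.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

best_score, best_params = -1.0, None
for params in ParameterGrid(param_grid):  # every combination of values
    model = SVC(**params)
    # Mean accuracy over 5 cross-validation folds for this combination
    score = cross_val_score(model, X, y, cv=5).mean()
    if score > best_score:
        best_score, best_params = score, params

print("Best hyperparameters:", best_params)
print("Best mean CV score:", round(best_score, 3))
```

GridSearchCV wraps this loop and additionally stores per-combination results and can refit the best model on the full dataset.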

Calling fit returns the fitted GridSearchCV object, which exposes the best hyperparameters found, the best model (refit on the full dataset by default), and the cross-validation results for every combination of hyperparameters.

Example

Let's consider an example of using GridSearchCV to tune the hyperparameters of an SVM model for a classification task on the well-known iris dataset.

First, we import the necessary libraries and load the dataset:

from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

iris = load_iris()
X = iris.data[:, :2]  # keep only the first two features (sepal length, sepal width)
y = iris.target       # class labels: 0, 1, 2

Next, we define the hyperparameters to be tuned:

param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [0.01, 0.1, 1, 10]}
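
Before running a search it can help to check how large it is. The following sketch uses ParameterGrid to enumerate the grid above; the variable n_folds is a name introduced here for illustration and assumes 5-fold cross-validation.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [0.01, 0.1, 1, 10]}

# Expand the grid into a list of dicts, one per combination
combinations = list(ParameterGrid(param_grid))
n_folds = 5  # assumed number of cross-validation folds

print('Combinations:', len(combinations))                        # 4 * 4 = 16
print('Cross-validation fits:', len(combinations) * n_folds)     # 16 * 5 = 80
print('First combination:', combinations[0])
```

The number of fits is the product of the list lengths times the number of folds, so adding values or hyperparameters makes the search grow quickly.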

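We create an SVM model (SVC uses an RBF kernel by default, which is why gamma is worth tuning) and a GridSearchCV object with 5-fold cross-validation: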

model = SVC()
grid_search = GridSearchCV(model, param_grid, cv=5)

We fit the GridSearchCV object to the data:

grid_search.fit(X, y)

The best hyperparameters and the best model can be accessed through the best_params_ and best_estimator_ attributes, respectively:

print('Best hyperparameters:', grid_search.best_params_)
print('Best model:', grid_search.best_estimator_)
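
As a complement, the sketch below repeats the setup so it runs on its own, then reads best_score_ and makes a prediction with the fitted search object. The measurement in new_flower is a hypothetical value made up for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = load_iris()
X = iris.data[:, :2]  # first two features only
y = iris.target

param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [0.01, 0.1, 1, 10]}
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X, y)

# Mean cross-validated accuracy of the best combination
print('Best CV score:', grid_search.best_score_)

# predict() delegates to best_estimator_, which was refit on all of X and y
new_flower = [[5.0, 3.5]]  # hypothetical sepal length and width in cm
predicted = grid_search.predict(new_flower)
print('Predicted class:', iris.target_names[predicted[0]])
```

Because refit=True is the default, the search object can be used directly as a trained model without fitting best_estimator_ again.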

We can also access the mean cross-validation score of each combination of hyperparameters through the cv_results_ attribute:

print('Mean cross-validation score of all hyperparameter combinations:')
print(grid_search.cv_results_['mean_test_score'])
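
The raw score array is hard to read without the matching hyperparameters. Here is a small sketch that pairs each score with its combination and lists the top five by rank; it repeats the setup so it runs on its own, and the variable names results and order are introduced here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

iris = load_iris()
X = iris.data[:, :2]  # first two features only
y = iris.target

param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [0.01, 0.1, 1, 10]}
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X, y)

results = grid_search.cv_results_

# Indices of the combinations, ordered from best rank (1) to worst
order = sorted(range(len(results['params'])),
               key=lambda i: results['rank_test_score'][i])

for i in order[:5]:
    print(results['params'][i],
          'mean:', round(results['mean_test_score'][i], 3),
          'std:', round(results['std_test_score'][i], 3))
```

Looking at the standard deviation alongside the mean helps judge whether the top combinations are meaningfully different or within noise of each other.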

Conclusion

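GridSearchCV is a powerful tool for hyperparameter tuning in machine learning. It systematically finds the best set of hyperparameters within the specified grid for a given model and dataset. Because the search is exhaustive, its cost grows with the product of the number of values per hyperparameter, so the grid should be kept reasonably small. By using GridSearchCV, we can save the time and effort that would have been spent on manual tuning, and potentially improve the performance of our model.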