📜  sklearn - Python (1)

📅  Last modified: 2023-12-03 15:05:13.204000             🧑  Author: Mango

Scikit-learn, also known as sklearn, is a popular machine learning library for Python. It is built on top of NumPy, SciPy, and matplotlib. It provides simple and efficient tools for data mining and data analysis.

Installation

You can install scikit-learn using pip:

pip install scikit-learn

Importing

To use scikit-learn, you need to import it in your Python code:

import sklearn

Data

Scikit-learn provides a few datasets that you can use to practice machine learning:

from sklearn.datasets import load_diabetes  # regression dataset (load_boston was removed in scikit-learn 1.2)
from sklearn.datasets import load_iris
from sklearn.datasets import load_digits
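
As a minimal sketch of what these loaders return, the snippet below loads two of the bundled datasets and inspects their shapes. It assumes the `return_X_y=True` option, which returns the feature matrix and target vector directly instead of a dataset object.

```python
from sklearn.datasets import load_digits, load_iris

# Each loader returns a feature matrix X and a target vector y.
X_iris, y_iris = load_iris(return_X_y=True)
X_digits, y_digits = load_digits(return_X_y=True)

print(X_iris.shape, y_iris.shape)      # (150, 4) (150,)
print(X_digits.shape, y_digits.shape)  # (1797, 64) (1797,)
```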

You can also load your own data using NumPy or pandas.
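
One possible way to do this with NumPy is sketched below. The CSV content is made up for illustration and is read from an in-memory string so the example runs as-is; with a real file you would pass its path to `np.loadtxt` instead. The layout (feature columns first, label column last) is an assumption.

```python
import io

import numpy as np

# Hypothetical CSV content: two feature columns followed by a label column.
csv_text = """5.1,3.5,0
4.9,3.0,0
6.2,2.9,1
5.9,3.0,1
"""

data = np.loadtxt(io.StringIO(csv_text), delimiter=",")
X = data[:, :-1]             # every column except the last is a feature
y = data[:, -1].astype(int)  # the last column holds the label

print(X.shape)  # (4, 2)
print(y)        # [0 0 1 1]
```

Scikit-learn estimators accept arrays in this form: a 2-D `X` with one row per sample and a 1-D `y` with one entry per sample.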

Preprocessing

Before you can train your machine learning models, you need to preprocess your data. Scikit-learn provides many preprocessing tools, such as:

  • StandardScaler
  • MinMaxScaler
  • RobustScaler
  • Normalizer
  • Binarizer
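
To give a feel for how some of these tools differ, here is a small sketch on a made-up 3×2 array. The data and the `threshold=0.0` setting are chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, MinMaxScaler, Normalizer

X = np.array([[1.0, -2.0],
              [3.0, 0.0],
              [5.0, 2.0]])

# MinMaxScaler rescales each column (feature) to the range [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)
print(X_minmax)  # rows: [0, 0], [0.5, 0.5], [1, 1]

# Binarizer maps values strictly above the threshold to 1, the rest to 0.
X_binary = Binarizer(threshold=0.0).fit_transform(X)
print(X_binary)  # rows: [1, 0], [1, 0], [1, 1]

# Normalizer rescales each row (sample) to unit L2 norm.
X_unit = Normalizer().fit_transform(X)
print(np.linalg.norm(X_unit, axis=1))  # every row norm is 1
```

Note that the scalers work column by column, while Normalizer works row by row.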

Here is an example of how to use the StandardScaler:

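from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # example data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on the training data only
X_test = scaler.transform(X_test)  # reuse the training mean and variance

Model selection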

Scikit-learn provides many machine learning models, such as:

  • Regression
    • LinearRegression
    • Ridge
    • Lasso
    • ElasticNet
  • Classification
    • LogisticRegression
    • KNeighborsClassifier
    • DecisionTreeClassifier
    • RandomForestClassifier
    • GradientBoostingClassifier
  • Clustering
    • KMeans
    • DBSCAN
  • Dimensionality reduction
    • PCA
    • TSNE
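
The estimators in this list share the same `fit` interface. The sketch below illustrates that with one regression, one clustering, and one dimensionality reduction model on tiny made-up arrays; the data and parameter values are assumptions chosen so the results are easy to check by eye.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Regression: points that lie exactly on the line y = 2x + 1.
X_reg = np.array([[0.0], [1.0], [2.0], [3.0]])
y_reg = np.array([1.0, 3.0, 5.0, 7.0])
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.coef_, reg.intercept_)  # close to [2.] and 1.0

# Clustering: two well-separated groups of three points each.
X_clu = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_clu)
labels = kmeans.labels_  # one cluster index per sample

# Dimensionality reduction: project the 2-D points onto one component.
X_1d = PCA(n_components=1).fit_transform(X_clu)
print(X_1d.shape)  # (6, 1)
```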

Here is an example of how to use the RandomForestClassifier:

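from sklearn.ensemble import RandomForestClassifier

# Assumes X_train, X_test, y_train, y_test come from a train/test split of your data.
model = RandomForestClassifier(random_state=0)  # fixed seed for reproducible results
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # mean accuracy on the test set

Model evaluation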

Scikit-learn provides many metrics for evaluating machine learning models:

  • Regression
    • mean_squared_error
    • mean_absolute_error
    • r2_score
  • Classification
    • accuracy_score
    • precision_score
    • recall_score
    • f1_score
    • roc_curve
  • Clustering
    • silhouette_score
  • Dimensionality reduction
    • explained_variance_ratio_ (an attribute of a fitted PCA estimator, not a function in sklearn.metrics)
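
As a rough illustration of the regression and classification metrics listed above, the sketch below applies several of them to small hand-made arrays. The values are invented so the results can be verified by hand.

```python
from sklearn.metrics import (f1_score, mean_absolute_error,
                             mean_squared_error, precision_score,
                             r2_score, recall_score)

# Regression: the prediction errors are -1, 0, +1, 0.
y_true_reg = [3.0, 5.0, 7.0, 9.0]
y_pred_reg = [2.0, 5.0, 8.0, 9.0]
mse = mean_squared_error(y_true_reg, y_pred_reg)   # (1 + 0 + 1 + 0) / 4 = 0.5
mae = mean_absolute_error(y_true_reg, y_pred_reg)  # (1 + 0 + 1 + 0) / 4 = 0.5
r2 = r2_score(y_true_reg, y_pred_reg)              # 1 - 2 / 20 = 0.9

# Binary classification: 2 true positives, 1 false positive, 1 false negative.
y_true_clf = [1, 1, 1, 0, 0, 0]
y_pred_clf = [1, 1, 0, 1, 0, 0]
precision = precision_score(y_true_clf, y_pred_clf)  # 2 / (2 + 1)
recall = recall_score(y_true_clf, y_pred_clf)        # 2 / (2 + 1)
f1 = f1_score(y_true_clf, y_pred_clf)                # harmonic mean of the two

print(mse, mae, r2)
print(precision, recall, f1)
```

All of these functions take the true values first and the predicted values second.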

Here is an example of how to calculate the accuracy of a classification model:

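from sklearn.metrics import accuracy_score

# Assumes `model` is a fitted classifier and X_test, y_test are held-out data.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)  # fraction of correct predictions

Conclusion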

Scikit-learn is a powerful machine learning library for Python. It provides many tools for data preprocessing, model selection, and model evaluation. With scikit-learn, you can easily build and evaluate machine learning models for your data science projects.