Predicting Probabilities with XGBoost - Python

Last modified: 2023-12-03 15:36:34.991000 | Author: Mango

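Introduction

XGBoost, short for eXtreme Gradient Boosting, is an open-source library that implements gradient-boosted decision trees. Its combination of predictive accuracy and training speed has made it one of the most widely used tools among data scientists. This article shows how to use XGBoost in Python to predict class probabilities rather than only class labels.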

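Prerequisites

Before writing any code, install the required libraries from the command line. The scikit-learn package is installed under the name scikit-learn, even though it is imported as sklearn:

pip install xgboost
pip install numpy
pip install pandas
pip install scikit-learn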

Implementation

Importing the libraries

First, import the required libraries:

import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

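Preparing the data

This tutorial uses the Iris dataset as sample data. It contains 150 samples with 4 numeric features each, labeled with one of 3 classes. It can be loaded with the following code:

iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
# A 1-D Series is the conventional shape for the target labels
Y = pd.Series(iris.target)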

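Next, split the dataset into a training set and a test set. Here test_size holds out 33% of the samples for testing, and random_state makes the split reproducible: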

seed = 7
test_size = 0.33
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
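
With only 150 samples, a purely random split can leave the three classes unevenly represented in the test set. The following is a minimal sketch of a stratified split, assuming the `stratify` argument of scikit-learn's `train_test_split`; the lowercase variable names are illustrative and not part of the tutorial's own code:

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, y = iris.data, iris.target

# stratify=y keeps the class proportions the same in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=7, stratify=y
)

# Count how many samples of each class ended up in each subset
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("train class counts:", train_counts)
print("test class counts:", test_counts)
```

Whether stratification matters depends on the dataset; it tends to help most when classes are small or imbalanced.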

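Training the model

XGBoost's native training API (xgb.train) takes its data as DMatrix objects, an internal data structure optimized for memory efficiency and training speed. The scikit-learn wrapper used in the rest of this tutorial, XGBClassifier, builds this structure internally, so creating a DMatrix by hand is only needed when calling the native API directly. For reference, a DMatrix is created as follows:

import xgboost as xgb  # the imports above only bring in XGBClassifier

dtrain = xgb.DMatrix(X_train, label=Y_train)
dtest = xgb.DMatrix(X_test, label=Y_test)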

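Next, define the XGBoost classifier and set some hyperparameters. Because the goal is to obtain probabilities, the objective is set to multi:softprob, which outputs one probability per class, whereas multi:softmax outputs only the predicted class label:

model = XGBClassifier(
    learning_rate=0.1,           # shrinkage applied to each new tree
    n_estimators=1000,           # number of boosting rounds
    max_depth=5,                 # maximum depth of each tree
    min_child_weight=3,          # minimum sum of instance weights in a child
    gamma=0.2,                   # minimum loss reduction required to split
    subsample=0.7,               # fraction of rows sampled per tree
    colsample_bytree=0.7,        # fraction of columns sampled per tree
    objective='multi:softprob',  # output per-class probabilities
    n_jobs=4,                    # number of parallel threads
    random_state=27              # seed for reproducibility
)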
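
The difference between the two multiclass objectives can be sketched in plain NumPy: one reports the softmax of the per-class raw scores, the other reports only the index of the largest score. The raw scores below are made-up numbers for illustration, not output from a trained model, and `softmax` is a helper written for this example:

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical raw scores (margins) for 2 samples and 3 classes
raw_scores = np.array([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])

probabilities = softmax(raw_scores)    # the kind of output multi:softprob gives
labels = probabilities.argmax(axis=1)  # the kind of output multi:softmax gives

print(probabilities.round(3))
print(labels)
```

Taking the argmax discards how confident the model was, which is why a probability-oriented workflow keeps the full rows.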

Once the classifier is defined, train it with the fit() method:

model.fit(X_train, Y_train)

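Making predictions

Once the model is trained, the predict() method returns the predicted class label for each sample. To obtain the probability that each sample belongs to each class, use predict_proba() instead. It returns an array with one row per sample and one column per class, and each row sums to 1: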

pred_proba = model.predict_proba(X_test)

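Evaluating the model

Finally, the following code evaluates the model's accuracy on the test set. predict() already returns integer class labels, so they can be passed to accuracy_score directly:

y_pred = model.predict(X_test)
accuracy = accuracy_score(Y_test, y_pred)
print("Accuracy: %.2f%%" % (accuracy * 100.0))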
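
Accuracy only checks the top-ranked class, so it cannot tell apart two models that pick the same labels with very different confidence. When the probabilities themselves matter, a probability-aware metric such as log loss is a common complement. Here is a minimal sketch using scikit-learn's `log_loss`; the labels and probability rows are made up for illustration:

```python
import numpy as np
from sklearn.metrics import accuracy_score, log_loss

y_true = np.array([0, 1, 2, 2])

# Two hypothetical models that predict the same classes
confident = np.array([[0.9, 0.05, 0.05],
                      [0.05, 0.9, 0.05],
                      [0.05, 0.05, 0.9],
                      [0.05, 0.05, 0.9]])
hesitant = np.array([[0.4, 0.3, 0.3],
                     [0.3, 0.4, 0.3],
                     [0.3, 0.3, 0.4],
                     [0.3, 0.3, 0.4]])

results = {}
for name, proba in [("confident", confident), ("hesitant", hesitant)]:
    acc = accuracy_score(y_true, proba.argmax(axis=1))
    loss = log_loss(y_true, proba, labels=[0, 1, 2])
    results[name] = (acc, loss)
    print(f"{name}: accuracy={acc:.2f}, log loss={loss:.3f}")
```

Both hypothetical models reach the same accuracy, but the lower log loss goes to the one that assigns more probability to the correct class.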

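Conclusion

This article showed how to use XGBoost to predict class probabilities in Python: load the data, split it into training and test sets, train an XGBClassifier, and call predict_proba(). XGBoost often delivers strong accuracy on tabular data and is designed to train efficiently on large datasets.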