Differences Between Correlation and Regression (1)

Last modified: 2023-12-03 15:11:22.598000. Author: Mango

Correlation

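Correlation measures the strength of association between two variables. Commonly used coefficients include the Pearson, Spearman, and Kendall correlation coefficients. The Pearson coefficient measures the linear association between two continuous variables. The Spearman and Kendall coefficients are rank-based and suit ordinal variables.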

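The Pearson coefficient lies in the range [-1, 1]. The closer the value is to 1, the stronger the positive correlation; the closer it is to -1, the stronger the negative correlation; the closer it is to 0, the weaker the linear correlation between the two variables.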
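As a rough illustration of that range, the sketch below computes Pearson's r directly from its definition using NumPy. The helper name `pearson_r` and the three small data sets are made up for this example; they are not part of any library.

```python
import numpy as np

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # points on an increasing line
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # points on a decreasing line
r_none = pearson_r(x, [1, 4, 5, 4, 1])   # symmetric hump, no linear trend

print(r_up, r_down, r_none)
```

The three values should come out close to 1, -1, and 0 respectively, matching the two ends and the middle of the range.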

In Python, the corr method in pandas computes the correlation coefficient, as shown below:

import pandas as pd

# Two columns where y is exactly 2 * x
data = {'x': [1, 2, 3, 4, 5], 'y': [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)
# Series.corr uses the Pearson coefficient by default
corr = df['x'].corr(df['y'])
print(corr)

Output:

1.0
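The rank-based coefficients behave differently from Pearson when a relationship is monotonic but not linear. The following is a minimal sketch on hypothetical data (y = x ** 3), computing Spearman as the Pearson coefficient of the ranks, which is how Spearman's coefficient is defined.

```python
import pandas as pd

# Hypothetical data: y increases with x, but along a curve (y = x ** 3)
df = pd.DataFrame({'x': [1, 2, 3, 4, 5], 'y': [1, 8, 27, 64, 125]})

# Pearson measures linear association, so it falls short of 1 here
pearson = df['x'].corr(df['y'])

# Spearman: replace each value by its rank, then take the Pearson coefficient
spearman = df['x'].rank().corr(df['y'].rank())

print('Pearson:', pearson)
print('Spearman:', spearman)
```

pandas also accepts `method='spearman'` or `method='kendall'` in `corr`; depending on the pandas version, those options may rely on SciPy being installed.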

Regression

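Regression models the functional relationship between one variable and one or more other variables, so that the first can be predicted from the rest. The simplest case, linear regression, fits a straight line to the data and uses that line to describe how the variables relate.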
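To make "fitting a straight line" concrete, here is a minimal sketch of the ordinary least squares formulas for one predictor, written with NumPy. The noisy data values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical data that roughly follows y = 2 * x
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

dx = x - x.mean()
dy = y - y.mean()

# Least squares slope: co-variation of x and y divided by the variation of x
slope = (dx * dy).sum() / (dx ** 2).sum()
# The fitted line always passes through the point (mean of x, mean of y)
intercept = y.mean() - slope * x.mean()

print('slope:', slope)
print('intercept:', intercept)
```

Unlike a correlation coefficient, the result is a pair of parameters that define a line, so it can be evaluated at any x.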

In Python, LinearRegression from scikit-learn fits a linear regression model, as shown below:

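import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# scikit-learn expects the predictors as a 2-D array: one row per sample
x = np.array([1, 2, 3, 4, 5]).reshape((-1, 1))
y = np.array([2, 4, 6, 8, 10])

model = LinearRegression()
model.fit(x, y)

# score returns the coefficient of determination (R squared)
r_sq = model.score(x, y)
print('coefficient of determination:', r_sq)

# Same values as model.predict(x), kept in the (5, 1) column shape of x
y_pred = model.intercept_ + model.coef_ * x
print('predicted response:', y_pred, sep='\n')

# Plot the data points and the fitted line
plt.scatter(x, y)
plt.plot(x, y_pred, color='red')
plt.show()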

Output:

coefficient of determination: 1.0
predicted response:
[[ 2.]
 [ 4.]
 [ 6.]
 [ 8.]
 [10.]]
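The practical difference between the two tools shows up once the model is used on new input. The sketch below, on hypothetical noisy data, predicts y for an unseen x and compares the model's R squared with the square of the Pearson coefficient; for a linear regression with a single predictor and an intercept the two agree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noisy data, not taken from the example above
x = np.array([1, 2, 3, 4, 5]).reshape((-1, 1))
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

model = LinearRegression().fit(x, y)

# Regression: the fitted line gives a prediction for an x that was not observed
y_new = model.predict(np.array([[6]]))
print('prediction at x = 6:', y_new[0])

# Correlation: a single number describing the strength of the linear association
r = np.corrcoef(x.ravel(), y)[0, 1]
r_sq = model.score(x, y)
print('r squared:', r ** 2)
print('R squared:', r_sq)
```

Correlation is symmetric in x and y and produces no equation, while regression treats one variable as the response and yields a line that can be evaluated at new points.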