Last modified: 2023-12-03 15:34:03.958000 | Author: Mango

Python R Squared

Introduction

In statistics, R-squared (the coefficient of determination) measures the proportion of the variance in a dependent variable that is explained by one or more independent variables in a regression model.
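
Written out, the definition used in the rest of this article is the following. Here $y_i$ are the observed values, $\hat{y}_i$ the model's predictions, and $\bar{y}$ the mean of the observed values; these symbol names are a notational choice made for this sketch, not taken from the original text.

```latex
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}, \qquad
SS_{\mathrm{res}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
SS_{\mathrm{tot}} = \sum_{i=1}^{n} (y_i - \bar{y})^2
```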

In this article, we will explore how to calculate R-squared in Python, using different approaches and libraries.

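Calculating R-squared with NumPy

import numpy as np

def r_squared(y_true, y_pred):
    # Residual sum of squares: squared error of the predictions
    residual = np.sum((y_true - y_pred) ** 2)
    # Total sum of squares: squared deviation of y_true from its mean
    total = np.sum((y_true - np.mean(y_true)) ** 2)
    # Undefined (division by zero) if y_true is constant
    r2 = 1 - (residual / total)
    return r2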

This function takes two NumPy arrays, y_true and y_pred, holding the true and predicted values of the dependent variable. It computes the residual sum of squares (the squared differences between true and predicted values) and the total sum of squares (the squared deviations of the true values from their mean, which is proportional to the variance of the dependent variable). R-squared is then 1 minus the ratio of the residual sum of squares to the total sum of squares.
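
To get a feel for the range of values this formula produces, here is a small sketch with three hand-checkable cases. The function is redefined so the block runs on its own; the np.asarray conversion and the sample values are assumptions added for illustration, not part of the original function.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    # Assumption: convert inputs so plain Python lists also work
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Predictions match the data exactly: SS_res = 0
perfect = r_squared(y_true, y_true)                        # 1.0
# Always predicting the mean: SS_res equals SS_tot
baseline = r_squared(y_true, np.full(4, y_true.mean()))    # 0.0
# Predictions in reverse order: SS_res = 20, SS_tot = 5
worse = r_squared(y_true, y_true[::-1])                    # -3.0

print(perfect, baseline, worse)
```

A value of 1 means a perfect fit, 0 means the predictions do no better than the mean of the data, and negative values mean they do worse than the mean.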

For example, suppose we have the following data for the dependent variable y and independent variable x:

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 4, 6])

We can calculate the predicted values of y using a linear regression model as follows:

from sklearn.linear_model import LinearRegression

reg = LinearRegression().fit(x.reshape(-1, 1), y)
y_pred = reg.predict(x.reshape(-1, 1))

Then we can calculate the R-squared using our r_squared function:

r2 = r_squared(y, y_pred)
print(r2)

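Output (the last digits may differ slightly because of floating-point rounding):

0.81

This can be checked by hand: the fitted line is y = 1.3 + 0.9x, so the residual sum of squares is 1.9 and the total sum of squares is 10, giving 1 - 1.9 / 10 = 0.81.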
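
scikit-learn also provides its own R-squared implementations, which can serve as a cross-check on a hand-written function. The sketch below uses the same data as above with sklearn.metrics.r2_score and the score method of LinearRegression; the variable names r2_metric and r2_from_model are chosen here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 4, 6])
X = x.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix

reg = LinearRegression().fit(X, y)
y_pred = reg.predict(X)

# r2_score takes the true values first, then the predictions
r2_metric = r2_score(y, y_pred)
# score() predicts from X internally and returns R-squared against y
r2_from_model = reg.score(X, y)

print(r2_metric, r2_from_model)  # both approximately 0.81
```

Note that r2_score is not symmetric in its arguments, so swapping y and y_pred generally gives a different number.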

Calculating R-squared with Statsmodels

import numpy as np
import statsmodels.api as sm

def r_squared(y_true, y_pred):
    # Same formula as before: 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

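This function takes the same arguments and performs the same computation as the previous one; it does not call statsmodels itself. What changes is how the predictions are produced: here the statsmodels library fits the linear regression model and supplies the predicted values of y.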

For example, using the same data as before, we can calculate the R-squared as follows:

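# Add an intercept column of ones; sm.OLS does not add one by default.
# A new name (X) is used so the original x is not overwritten.
X = sm.add_constant(x)
model = sm.OLS(y, X).fit()
y_pred = model.predict(X)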

r2 = r_squared(y, y_pred)
print(r2)

Output (again approximately, up to floating-point rounding):

0.81

Conclusion

In this article, we have shown how to calculate R-squared in Python with a small NumPy function, applied to predictions from scikit-learn and Statsmodels regression models. Other libraries and methods exist, but these are among the most common and versatile. R-squared is a useful tool for evaluating the goodness of fit of a regression model and for understanding how much of the variance in the dependent variable is explained by the independent variables.