On the Misconception of R^2 for (r)^2 in a Regression Model

International Journal of Research and Scientific Innovation (IJRSI) | Volume VI, Issue XII, December 2019 | ISSN 2321–2705

Ijomah, Maxwell Azubuike

Dept. of Maths/Statistics, University of Port Harcourt, Nigeria

Abstract:-The coefficient of determination (R^2) is perhaps the most widely used measure of goodness of fit for regression models; it measures the proportion of variation in the dependent variable explained by the predictors included in the model. It is, however, widely mistaken for the square of the correlation coefficient, and this has led to poor interpretation of regression results in research reports. In this paper, we investigate the controversy over treating the coefficient of determination as the square of the correlation coefficient in statistical analysis. Differences between the two statistics are illustrated using examples from simple and multiple regression models.

Keywords: linear regression; coefficient of determination; correlation coefficient; multiple correlation; regression coefficient.

I. INTRODUCTION

The classical linear regression model is the standard procedure for extracting statistical information from data by determining the relationship between the study variable and the explanatory variables. In the course of model estimation, it is common practice to assess how adequately the fitted model explains the variation in the data set. A popular tool for assessing the adequacy of the fitted model is the coefficient of determination. The coefficient of determination is a measure used in statistical analysis that assesses how well a model explains and predicts future outcomes. It indicates the level of explained variability in the data set, that is, the variation in the dependent variable explained by the independent variable(s). It provides a summary measure of the goodness of fit of any linear regression model and is based on the proportion of variability of the study variable that can be explained through knowledge of a given set of explanatory variables. These definitions are found in both econometrics and statistics handbooks and are widely accepted among quantitative scholars.
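To make the "proportion of explained variability" definition concrete, the following is a minimal sketch in Python, not taken from the paper. It assumes an ordinary least squares fit with an intercept and computes R^2 as one minus the ratio of the residual sum of squares to the total sum of squares. The function name r_squared, the simulated data, and the coefficient values are hypothetical and chosen only for illustration. The script also prints the squared pairwise correlation between the response and a single predictor, so the two quantities can be inspected side by side for a multiple regression.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit with intercept: 1 - SS_res / SS_tot.

    Hypothetical helper for illustration; not part of the original paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    # Prepend a column of ones so the fitted model includes an intercept.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    ss_res = np.sum((y - fitted) ** 2)    # variation left unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation about the mean
    return 1.0 - ss_res / ss_tot


# Simulated data (assumed values, for illustration only).
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=50)

# R^2 of the regression of y on both predictors.
r2_multiple = r_squared(np.column_stack([x1, x2]), y)

# Pearson correlation between y and the first predictor alone.
r_x1 = np.corrcoef(x1, y)[0, 1]

print("R^2 (y on x1 and x2):", round(r2_multiple, 4))
print("r^2 (y with x1 only):", round(r_x1 ** 2, 4))
```

The intercept column is added explicitly because the sum-of-squares decomposition behind this formula is stated about the sample mean of the response; a fit without an intercept would need separate treatment.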