Anonymous

# In statistics, when is R squared negative? Can the coefficient of determination be negative?

When it is computed as the square of the correlation coefficient r, the coefficient of determination can never be negative. It is bounded by 0 ≤ r² ≤ 1, because r itself is bounded by -1 ≤ r ≤ 1.

The correlation coefficient, r, is a measure of the linear relationship between two variables. If the relationship is non-linear, r can be badly misleading, because it only captures the linear part of the association.

r takes on values between -1 and 1. Negative values indicate that the relationship between the variables is inverse, i.e., on a scatter plot the data tend to have a negative slope. Positive values of r indicate that the data tend to have a positive slope. If r = 0, we say the variables are uncorrelated.

The closer the absolute value of r is to 1, the stronger the linear association between the two variables.

There are several equivalent formulas for calculating r. Let xbar and ybar be the means of the two data sets, sx and sy their sample standard deviations, and n the number of (x, y) pairs. Then:

r = 1/(n - 1) * Σ( ((xi - xbar)/sx) * ((yi - ybar)/sy)) with the sum going from i = 1 to n

r = Covariance(X,Y) / [√(Var(X)) √(Var(Y))]

r = Σ(xi - xbar)(yi - ybar) / [ √(Σ(xi - xbar)²) √(Σ(yi - ybar)²)]

The second equation shows that the correlation coefficient is the ratio of the covariance of the two variables (how they vary together) to the product of their standard deviations (how much each varies on its own).
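As a quick check that the three formulas above really give the same number, here is a minimal Python sketch. The helper names (`mean`, `sample_sd`, `r_standardized`, `r_covariance`, `r_sums`) and the five data points are made up for illustration; only the formulas come from the answer.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation (divisor n - 1), matching sx and sy above.
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def r_standardized(x, y):
    # First formula: sum of products of z-scores, divided by n - 1.
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sx, sy = sample_sd(x), sample_sd(y)
    total = sum(((xi - xbar) / sx) * ((yi - ybar) / sy) for xi, yi in zip(x, y))
    return total / (n - 1)

def r_covariance(x, y):
    # Second formula: covariance over the product of standard deviations.
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    return cov / (sample_sd(x) * sample_sd(y))

def r_sums(x, y):
    # Third formula: raw sums of deviations; the (n - 1) factors cancel out.
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

# Illustrative data only (not from the original answer).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r1 = r_standardized(x, y)
r2 = r_covariance(x, y)
r3 = r_sums(x, y)
print(r1, r2, r3)  # the three values agree up to floating-point rounding
```

Which form you use is mostly a matter of convenience: the third one avoids computing the standard deviations separately.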

r is unitless.

r is not affected by multiplying each data set by a positive constant, adding a constant to each data set, or interchanging x and y.
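These invariance properties are easy to check numerically. The sketch below uses a hypothetical helper `pearson_r` and made-up data. It also shows one caveat worth knowing: multiplying just one of the variables by a negative constant flips the sign of r, although its magnitude stays the same.

```python
import math

def pearson_r(x, y):
    # Pearson correlation from sums of deviations about the means.
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data only.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [3.0, 1.0, 6.0, 5.0, 9.0]

r_base = pearson_r(x, y)
r_scaled = pearson_r([3.0 * v for v in x], [0.5 * v for v in y])      # positive rescaling
r_shifted = pearson_r([v + 100.0 for v in x], [v - 40.0 for v in y])  # added constants
r_swapped = pearson_r(y, x)                                           # x and y interchanged
r_negated = pearson_r([-2.0 * v for v in x], y)                       # negative multiplier on x only

print(r_base, r_scaled, r_shifted, r_swapped, r_negated)
```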

r is sensitive to outliers.
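To get a feel for how much a single outlier can matter, here is a small sketch with made-up data: eight points lying close to a straight line, and then the same points plus one wild value. The helper `pearson_r` is hypothetical.

```python
import math

def pearson_r(x, y):
    # Pearson correlation from sums of deviations about the means.
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data: nearly linear, y roughly equal to x.
x_clean = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_clean = [1.2, 1.9, 3.1, 4.0, 5.2, 5.9, 7.1, 8.0]

# Same data with one extreme point appended at (9, -10).
x_outlier = x_clean + [9.0]
y_outlier = y_clean + [-10.0]

r_clean = pearson_r(x_clean, y_clean)
r_outlier = pearson_r(x_outlier, y_outlier)
print(r_clean, r_outlier)  # r_outlier is far below r_clean
```

With these particular numbers the one extra point is enough to turn a strong positive correlation into a negative one, which is why it is worth looking at a scatter plot before trusting r.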

r² is called the coefficient of determination. It is a measure of the proportion of variance in y explained by regression.
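The "proportion of variance explained" reading can be checked directly. The sketch below fits a least-squares line with an intercept, computes 1 - SS_res/SS_tot (residual sum of squares over total sum of squares), and compares it with r². The function names and the data are assumptions for illustration. The equality is only checked here for this kind of fit; other models, such as a line forced through the origin, are not covered by this sketch.

```python
import math

def fit_line(x, y):
    # Ordinary least-squares line y = m*x + b (with an intercept).
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    m = sxy / sxx
    b = ybar - m * xbar
    return m, b

def r_squared_from_residuals(x, y, m, b):
    # 1 - SS_res / SS_tot: share of the variation in y accounted for by the line.
    ybar = sum(y) / len(y)
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def pearson_r(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 2.9, 4.5, 4.8, 6.9, 7.1]

m, b = fit_line(x, y)
r2_resid = r_squared_from_residuals(x, y, m, b)
r2_corr = pearson_r(x, y) ** 2
print(r2_resid, r2_corr)  # equal up to rounding for this kind of fit
```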

Also note that correlation is not causation. Here is an example: the shoe size of grade school students and the students' vocabulary are highly correlated. In other words, the larger the shoe size, the larger the vocabulary the student has. It is easy to see that shoe size and vocabulary do not cause one another, yet they are highly correlated. The reason is that there is a confounding factor, age: the older the grade school student, the larger the shoe size and the larger the vocabulary.

You cannot compare models just by comparing their r values. This is a long discussion, a full-day lecture in the prob/stat courses I've taught. Model comparison is a topic usually saved for upper-level undergraduate or graduate courses.

Good sites with information about correlation are:

http://mathworld.wolfram.com/CorrelationCoefficien...

http://mathworld.wolfram.com/LeastSquaresFitting.h...

• Anonymous
5 years ago

A scatter diagram has many (x, y) data points, and a typical objective is to get the best possible trend line (called a "regression line") that runs through the data for modeling purposes. Commonly, the "method of least squares", which yields a "y = mx + b" line, is used. This method uses n (the number of data points), ∑x, ∑x^2, ∑y, ∑y^2, and ∑xy. The objective is to minimize the sum of the squares (hence "least squares") of the differences between actual and calculated y values.

If the slope of the least squares line is positive, the correlation is positive; if the slope is negative, the correlation is negative. So positive correlation means x and y increase together, while negative correlation means that as one variable increases, the other decreases. There is no connection to a 45° line. (Example: y = 2, 4, 6, 8, 10 for x = 1, 2, 3, 4, 5 has perfect correlation and a line at about 63° to the x-axis.)

The correlation coefficient r lies between -1 and +1. The higher the absolute value of r, the better the regression line "fits" the data. Correlation coefficients of +0.80 and -0.80 are equally good; the only difference is that the slopes are positive and negative. A correlation of -0.80 is much stronger than +0.50.

Often people want to know whether the correlation is "significant," or whether it's just random variation in the data. As a first gauge, the "coefficient of determination" may be used (a formal significance test also depends on the sample size n). The coefficient of determination equals r^2, the square of the correlation coefficient, and since it's a square, it ranges from 0 to 1 -- never negative. The coefficient of determination tells how much of the variation in y is "explained" by the regression line, the remainder being attributed to random variation in the data.
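The least-squares recipe described in this answer can be sketched in a few lines of Python using only the running sums it mentions (n, ∑x, ∑x^2, ∑y, ∑y^2, ∑xy). The function name `least_squares_from_sums` is made up for illustration; the data are the x = 1..5, y = 2..10 example from the answer.

```python
import math

def least_squares_from_sums(x, y):
    # Accumulate the six statistics the method needs.
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_yy = sum(yi * yi for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))

    # Slope and intercept of the least-squares line y = m*x + b.
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = (sum_y - m * sum_x) / n

    # Correlation coefficient from the same sums.
    r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2)
    )
    return m, b, r

# The example from the answer: y = 2x exactly.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

m, b, r = least_squares_from_sums(x, y)
angle_degrees = math.degrees(math.atan(m))  # angle of the line to the x-axis

print(m, b, r, r ** 2, angle_degrees)
```

For this data the slope comes out as 2 and the intercept as 0, with r = 1, and atan(2) is a little over 63°, which matches the "no connection to a 45° line" remark.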

• r squared (the coefficient of determination) can never be negative. It is literally the square of r, the correlation coefficient. Therefore, it could only be negative if r were imaginary, which it never is. However, r can be negative. r is positive for a positive association (x and y both go up together; the line of best fit has a positive slope). r is negative for a negative association (as x increases, y decreases; the line of best fit has a negative slope).

But remember, r squared is never negative: it is zero or positive.
