# How do I interpret correlation, R?

Cost: 2.42, 1.14, 1.34, 1.24, 1.59, 1.18, 3.11, 2.16, 2.76, 2.31

Content: 560, 1560, 1740, 1600, 1520, 1570, 690, 640, 610, 660

r = ???

It's all so very confusing to me!

• 9 years ago

http://mathworld.wolfram.com/CorrelationCoefficien...

http://en.wikipedia.org/wiki/Pearson_product-momen...

First compute these three things called

ss_(xy)

ss_(xx)

ss_(yy)

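I think "ss" means sum of squares here (ss_(xy) is the matching sum of cross products).

Then do (ss_(xy))^2 / (ss_(xx) ss_(yy)) = r^2

Then take the square root of it to get the size of r, and give r the same sign as ss_(xy). The square root by itself is never negative, so it loses the sign. You get the same thing in one step from r = ss_(xy) / sqrt(ss_(xx) ss_(yy)).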

Count how many points there are. There are 10 Cost numbers and 10 Content numbers. It doesn't even matter which ones are the x and which ones are the y to compute the correlation. Let x1, x2, ... x10 be the Cost numbers.

Find the average: add up all the Costs and divide by 10. Call that X. Then add up the squares of the differences from X:

ssxx = (x1 - X)^2 + (x2 - X)^2 + ... + (x10-X)^2.

Then do the same process with the 10 Content numbers - find the mean, call it Y, then do the sum of squares of the differences from the mean. That gives you ssyy.

Next do the cross terms ssxy. To do this you have to keep both lists in their original order, so that each Cost stays paired with its own Content. If one list has gotten shuffled relative to the other, it will give the wrong answer.

It will look like this

ssxy = (x1 - X)(y1 - Y) + (x2 - X)(y2 - Y) + (x3 - X)(y3 - Y) + ... + (x10 - X)(y10 - Y)
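The three sums can also be sketched in a few lines of Python. This is only a minimal illustration using the numbers from the question; the function names `sums_of_squares` and `pearson_r` are made up for this example and are not from any library.

```python
import math

# Data copied from the question, kept in the original paired order.
cost = [2.42, 1.14, 1.34, 1.24, 1.59, 1.18, 3.11, 2.16, 2.76, 2.31]
content = [560, 1560, 1740, 1600, 1520, 1570, 690, 640, 610, 660]


def sums_of_squares(xs, ys):
    """Return (ss_xx, ss_yy, ss_xy) for two equal-length lists."""
    if len(xs) != len(ys):
        raise ValueError("both lists need the same number of points")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    ss_xx = sum((x - x_mean) ** 2 for x in xs)
    ss_yy = sum((y - y_mean) ** 2 for y in ys)
    # zip keeps each x paired with the y at the same position.
    ss_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return ss_xx, ss_yy, ss_xy


def pearson_r(xs, ys):
    """Correlation as ss_xy / sqrt(ss_xx * ss_yy), which keeps the sign."""
    ss_xx, ss_yy, ss_xy = sums_of_squares(xs, ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


ss_xx, ss_yy, ss_xy = sums_of_squares(cost, content)
r = pearson_r(cost, content)
print(f"ss_xx = {ss_xx:.5f}, ss_yy = {ss_yy:.1f}, ss_xy = {ss_xy:.3f}")
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

For this data the cross-term sum comes out negative, so r is negative too (roughly -0.9). Calling `pearson_r(content, cost)` gives the same value, since swapping x and y does not change the correlation.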

If you get bored too fast doing this yourself on paper with a calculator, it might be sweeter to find a calculator that does statistics. Find the manual and figure out how to enter (x, y) data points and how to use the stats button. Or get a spreadsheet program like OpenOffice Calc or Microsoft Excel. But if you just want to use regular lined paper, then you would set up 7 columns of numbers:

xi yi xi-X yi-Y (xi-X)^2 (yi-Y)^2 (xi-X)(yi-Y)

Then at the end of the xi column you put the sum of the xi and their mean X, and at the end of the yi column you put the sum of the yi and their mean Y.

If you are doing it without machinery, you can round off the means to 2 decimal places (3 or more is safer if you need the result to be very accurate). Hopefully you have wide paper or you can write small.

The three answers you need are the totals of the 5th, 6th, and 7th columns. Those totals are ssxx, ssyy, and ssxy.
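If you would rather let a program fill in the paper worksheet, here is a rough Python sketch that builds the same seven columns and their totals. The helper name `worksheet` and the column widths are just choices made for this example.

```python
# Data copied from the question, kept in the original paired order.
cost = [2.42, 1.14, 1.34, 1.24, 1.59, 1.18, 3.11, 2.16, 2.76, 2.31]
content = [560, 1560, 1740, 1600, 1520, 1570, 690, 640, 610, 660]


def worksheet(xs, ys):
    """Build the seven-column table, one tuple per data point."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    rows = []
    for x, y in zip(xs, ys):
        dx, dy = x - x_mean, y - y_mean
        rows.append((x, y, dx, dy, dx * dx, dy * dy, dx * dy))
    return rows


rows = worksheet(cost, content)
header = ("xi", "yi", "xi-X", "yi-Y", "(xi-X)^2", "(yi-Y)^2", "(xi-X)(yi-Y)")
print("".join(f"{name:>14}" for name in header))
for row in rows:
    print("".join(f"{value:>14.4f}" for value in row))

# Column totals; the last three are ss_xx, ss_yy and ss_xy.
totals = [sum(column) for column in zip(*rows)]
print("".join(f"{total:>14.4f}" for total in totals))
```

A handy check on the arithmetic: the totals of the xi-X and yi-Y columns should both be zero, apart from rounding.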

Looking at these numbers, I think the correlation should be less than zero: when the cost is low, like the 1.something values, the content is high, in the 1500s to 1700s, but when the cost is higher, like 2.something or 3.something, the content is lower, in the 500s or 600s.
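That eyeball check can be made a little more concrete by splitting the points into a low-cost group and a high-cost group and comparing the average content of each. The cut-off of 2.0 below is only an assumption picked for this example, since the costs happen to fall on either side of it.

```python
# Data copied from the question, kept in the original paired order.
cost = [2.42, 1.14, 1.34, 1.24, 1.59, 1.18, 3.11, 2.16, 2.76, 2.31]
content = [560, 1560, 1740, 1600, 1520, 1570, 690, 640, 610, 660]

threshold = 2.0  # assumed cut-off between "low" and "high" cost

# Collect the content values that go with each cost group.
low = [y for x, y in zip(cost, content) if x < threshold]
high = [y for x, y in zip(cost, content) if x >= threshold]

low_mean = sum(low) / len(low)
high_mean = sum(high) / len(high)
print(f"cost <  {threshold}: {len(low)} points, mean content {low_mean:.0f}")
print(f"cost >= {threshold}: {len(high)} points, mean content {high_mean:.0f}")
```

The low-cost group has the much higher average content, which is the pattern a negative correlation describes.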

Maybe that is what you are asking and I didn't have to go to all that trouble to talk about how it is computed :-)