How do I measure the relationship between two variables?


Covariance and correlation are used to measure the relationship between two random variables. Both are measures of linear dependence. 

Suppose X and Y are two random variables, observed as n pairs \((x_i, y_i)\) with sample means \(\bar{x}\) and \(\bar{y}\). The sample covariance is calculated as follows:

\[Cov(X,Y) = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{n-1 }\]
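As a rough illustration, the formula above can be computed directly in a few lines of Python. This is only a sketch: the function name `sample_covariance` and the small data set are made up for this example and do not come from any particular library.

```python
def sample_covariance(xs, ys):
    """Sample covariance of paired observations, using the n - 1 denominator."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n  # sample mean of x
    y_bar = sum(ys) / n  # sample mean of y
    # Sum of products of deviations from the means, divided by n - 1.
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data, chosen only so the arithmetic is easy to check by hand.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_xy = sample_covariance(xs, ys)
print(cov_xy)  # 1.5
```

Here the means are 3 and 4, the products of the deviations sum to 6, and 6 / (5 - 1) = 1.5.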

Covariance is hard to interpret on its own because its magnitude depends on the units of X and Y. Correlation resolves this problem by standardizing the covariance so that it is unitless. Correlation is calculated as follows:


\[Corr(X,Y) = \frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \sum_{i=1}^{n}(y_i-\bar{y})^2}}\]
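The right-hand side of the correlation formula can be sketched in Python as follows. The function name `sample_correlation` and the data are assumptions made for this example. The second call rescales x by a factor of 100 (as if changing meters to centimeters) to show that the correlation does not depend on the units.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation of paired observations (the n - 1 factors cancel)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # Correlation is undefined when either variable is constant.
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r = sample_correlation(xs, ys)

# Rescaling x multiplies the covariance by 100 but leaves the correlation alone.
xs_rescaled = [100.0 * x for x in xs]
r_rescaled = sample_correlation(xs_rescaled, ys)

print(r, r_rescaled)  # both roughly 0.77
```

For this data the numerator is 6 and the two sums of squares are 10 and 6, so the correlation is 6 / sqrt(60).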


Correlation is always a value between -1 and 1, inclusive. Its magnitude represents the strength of the linear relationship, and its sign represents the direction. A value of -1 means there is perfect negative correlation, and a value of 1 means there is perfect positive correlation. If the correlation is 0, then X and Y are uncorrelated, meaning there is no linear relationship between them. In the three scatter plots described below, the leftmost plot has \(Corr(Y_1, Y_2) \approx 0\), the middle plot has \(Corr(Y_1, Y_2) = 1\), and the rightmost plot has \(Corr(Y_1, Y_2) = -1\).

The three scatter plots below show what correlations of zero, one, and negative one look like. A correlation of zero appears as a cluster of points with no discernible pattern, so there is no linear relationship between the variables on the x and y axes. In the plot with a correlation of one, all the points fall exactly on the line y = x, which has positive slope, hence a perfect positive correlation. In the plot with a correlation of negative one, the points again fall exactly on a diagonal line, but now the line is y = -x, which has negative slope, hence a perfect negative correlation.
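The three cases described above can also be checked numerically. The sketch below builds three made-up data sets (random noise, points on y = x, and points on y = -x) and computes the correlation for each. The helper `corr`, the sample size, and the random seed are all assumptions chosen for this example.

```python
import math
import random


def corr(xs, ys):
    """Sample correlation of paired observations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)  # fixed seed so the noise is reproducible

y1 = [i / 10 for i in range(1, 1001)]       # values on the horizontal axis
noise = [random.gauss(0, 1) for _ in y1]    # generated independently of y1
on_line_up = list(y1)                       # points on the line y = x
on_line_down = [-v for v in y1]             # points on the line y = -x

r_none = corr(y1, noise)        # close to 0, but not exactly 0 in a finite sample
r_pos = corr(y1, on_line_up)    # 1, up to floating-point rounding
r_neg = corr(y1, on_line_down)  # -1, up to floating-point rounding

print(r_none, r_pos, r_neg)
```

The noise case illustrates why the leftmost plot is described with an approximate value: independent data rarely gives a sample correlation of exactly zero.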



  • Last Updated Apr 23, 2021
  • Answered By Dorian Frampton
