How do I measure the relationship between two variables?
Answer
Covariance and correlation are used to measure the relationship between two random variables. Both are measures of linear dependence.
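Suppose X and Y are two random variables. Given n paired observations \((x_i, y_i)\) with sample means \(\bar{x}\) and \(\bar{y}\), the sample covariance is calculated as follows: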
\[Cov(X,Y) = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{n-1 }\]
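As a rough illustration, the covariance formula above can be computed directly in plain Python. The function name `sample_covariance` and the height/weight numbers are made up for this sketch; they are not from any real dataset.

```python
def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 denominator, as in the formula above."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of paired deviations from the means.
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / (n - 1)


# Hypothetical paired observations (illustrative values only).
heights_cm = [150.0, 160.0, 170.0, 180.0]
weights_kg = [52.0, 57.0, 68.0, 73.0]

cov = sample_covariance(heights_cm, weights_kg)
print(cov)  # positive: taller observations tend to be heavier in this toy data
```

The result is expressed in cm·kg, so its size depends on the units chosen for each variable.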
Covariance is hard to interpret on its own because its magnitude depends on the units of X and Y. Correlation resolves this problem by standardizing the covariance so that it is unitless. Correlation is calculated as follows:
\[Corr(X,Y) = \frac{Cov(X,Y)}{\sqrt{Var(X)Var(Y)}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\sum_{i=1}^{n}(y_i-\bar{y})^2}}\]
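To see the effect of standardizing, here is a minimal sketch of the correlation formula in plain Python. The function name `sample_correlation` and the data are hypothetical. The same heights are supplied in centimeters and in meters to check that the change of units leaves the correlation essentially unchanged.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation, using the right-hand form of the formula above."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # The formula divides by zero when either variable is constant.
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations (illustrative values only).
heights_cm = [150.0, 160.0, 170.0, 180.0]
weights_kg = [52.0, 57.0, 68.0, 73.0]
heights_m = [h / 100.0 for h in heights_cm]

r_cm = sample_correlation(heights_cm, weights_kg)
r_m = sample_correlation(heights_m, weights_kg)
print(r_cm, r_m)  # the two values agree up to floating-point rounding
```

The \(n-1\) factors in the covariance and the two variances cancel, which is why the code works with the raw sums.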
Correlation always lies between -1 and 1, inclusive. The magnitude represents the strength of the linear relationship, and the sign represents its direction. A value of -1 means there is perfect negative correlation, and a value of 1 means there is perfect positive correlation. If the correlation is 0, then X and Y are uncorrelated, meaning there is no linear relationship between them; they may still be related in a nonlinear way. In the accompanying plots, the leftmost has \(Corr(Y_1, Y_2) \approx 0\), the middle has \(Corr(Y_1, Y_2) = 1\), and the rightmost has \(Corr(Y_1, Y_2) = -1\).
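The three cases can also be reproduced with small synthetic data. The sketch below is only an illustration: the helper `correlation` and the chosen lines and curve are made up for this example. It uses an increasing line, a decreasing line, and a symmetric parabola, whose correlation with x is zero even though y is fully determined by x.

```python
import math


def correlation(xs, ys):
    """Sample correlation of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

r_up = correlation(xs, [2 * x + 1 for x in xs])      # increasing line
r_down = correlation(xs, [-3 * x + 4 for x in xs])   # decreasing line
r_curve = correlation(xs, [x ** 2 for x in xs])      # symmetric parabola

print(r_up, r_down, r_curve)
```

The parabola case is a reminder that a correlation of 0 rules out a linear relationship only, not every kind of dependence.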