## Friday 8 February 2019

### Difference between Covariance and Correlation in Machine Learning

Covariance and Correlation are two important concepts in Mathematics (especially in Probability and Statistics) that are heavily used in Machine Learning, mainly for Data Analysis and Data Wrangling.

Dimensionality Reduction in Machine Learning often relies on the Covariance and Correlation among the different variables or features in the dataset. For example, the PCA (Principal Component Analysis) algorithm extracts features from the Covariance matrix of the data, or from the Correlation matrix when the features are standardized first.
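A minimal PCA sketch may make this concrete. It assumes NumPy, and the function name `pca` and its `use_correlation` flag are hypothetical, chosen only for illustration. It eigen-decomposes the Covariance matrix of the centered data; with `use_correlation=True` it standardizes each feature first, so the matrix being decomposed is the Correlation matrix. The data are random toy values.

```python
import numpy as np

def pca(X, n_components, use_correlation=False):
    """Hypothetical minimal PCA: eigen-decompose the covariance (or correlation) matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # center each feature (column)
    if use_correlation:
        Xc = Xc / Xc.std(axis=0, ddof=1)      # standardize: covariance of Xc == correlation of X
    C = np.cov(Xc, rowvar=False)              # features are columns, samples are rows
    eigvals, eigvecs = np.linalg.eigh(C)      # eigh: C is symmetric; eigenvalues come back ascending
    order = np.argsort(eigvals)[::-1]         # sort so the largest-variance direction comes first
    components = eigvecs[:, order[:n_components]]
    return Xc @ components, eigvals[order]    # projected data and sorted eigenvalues

# Toy data: feature 1 is almost a multiple of feature 0, feature 2 is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a, 2 * a + rng.normal(scale=0.1, size=200), rng.normal(size=200)])

scores, eigvals = pca(X, 2)                               # PCA on the covariance matrix
scores_std, eig_corr = pca(X, 2, use_correlation=True)    # PCA on the correlation matrix
print(scores.shape, eigvals, eig_corr)
```

Because the trace of a Correlation matrix equals the number of features, the eigenvalues from the standardized run sum to 3 here; the Covariance-based eigenvalues instead depend on the scale of each feature.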

Covariance and Correlation describe the relationship and inter-dependence between two variables. Both measure how a change in one variable is associated with a change in the other. The relationship between two variables or features can be positive (they tend to increase together), negative (one tends to decrease as the other increases), or there could be no linear relationship at all.
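The three kinds of relationship can be checked numerically. This is a small sketch assuming NumPy and made-up toy values; `np.cov(x, y)` returns a 2x2 matrix whose off-diagonal entry `[0, 1]` is the sample Covariance of `x` and `y`.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # toy values, symmetric around 0

# Sign of the sample covariance for three toy relationships
cov_positive = np.cov(x, 2 * x)[0, 1]    # y rises as x rises      -> positive
cov_negative = np.cov(x, -2 * x)[0, 1]   # y falls as x rises      -> negative
cov_none = np.cov(x, x ** 2)[0, 1]       # symmetric curve, no linear trend -> zero

print(cov_positive, cov_negative, cov_none)
```

The last case shows why "no relationship" should be read as "no linear relationship": `x ** 2` is completely determined by `x`, yet the Covariance is zero because the pattern is symmetric.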

#### Difference between Covariance and Correlation

1. Correlation between two variables is a normalized version of their Covariance.

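To calculate the Correlation between random variables X and Y, divide the Covariance of X and Y by the product of the Standard Deviation of X and the Standard Deviation of Y:

$$\operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}, \qquad \operatorname{Cov}(X, Y) = \mathbb{E}\big[(X - \mu_X)(Y - \mu_Y)\big]$$

Because standard deviations are never negative (and are strictly positive for any non-constant variable), the Correlation always has the same sign as the Covariance: a positive Covariance results in a positive Correlation, a negative Covariance results in a negative Correlation, and a zero Covariance gives a zero Correlation.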
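The normalization can be sketched directly from the formula. This assumes NumPy; `correlation` is a hypothetical helper name and the data are toy values. The same `ddof` is used for the Covariance and both Standard Deviations so the normalization is consistent, and the result is compared with NumPy's built-in `np.corrcoef`.

```python
import numpy as np

def correlation(x, y):
    """Hypothetical helper: Pearson correlation as Cov(X, Y) / (std(X) * std(Y))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov_xy = np.cov(x, y)[0, 1]     # sample covariance (np.cov uses ddof=1 by default)
    std_x = np.std(x, ddof=1)       # matching ddof so the normalization is consistent
    std_y = np.std(y, ddof=1)
    return cov_xy / (std_x * std_y)

# Toy data with a roughly linear, increasing trend
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.9]

r = correlation(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]
r_flipped = correlation(x, [-v for v in y])   # negating y flips the sign of Cov and Corr
print(r, r_numpy, r_flipped)
```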

2. Covariance can take any value from negative infinity to positive infinity, while Correlation always lies between -1 and 1, where +1 and -1 indicate a perfect positive or negative linear relationship. A Correlation of 0.85, for example, indicates a strong positive linear relationship: when one variable increases, the other tends to increase as well, so the two variables are said to be strongly (positively) correlated.
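A short check of these ranges, assuming NumPy and toy data: perfect linear relationships reach the Correlation limits +1 and -1, while the Covariance of the very same pairs is not confined to that interval.

```python
import numpy as np

x = np.arange(10, dtype=float)   # toy data: 0, 1, ..., 9
y_up = 3 * x + 2                 # perfect positive linear relationship
y_down = -0.5 * x + 7            # perfect negative linear relationship

# Correlation is bounded: perfect linear relationships hit the limits +1 and -1
r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]

# Covariance is not bounded: its size depends on the spread of the data (about 27.5 for y_up)
cov_up = np.cov(x, y_up)[0, 1]
cov_down = np.cov(x, y_down)[0, 1]

# Any pair of samples gives a correlation inside [-1, 1]
rng = np.random.default_rng(42)
a, b = rng.normal(size=100), rng.normal(size=100)
r_random = np.corrcoef(a, b)[0, 1]

print(r_up, r_down, cov_up, cov_down, r_random)
```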

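3. Covariance is Unit Dependent while Correlation is Unit Independent. The unit of the Covariance is the product of the units of the two variables (for example, cm × kg), whereas in the Correlation these units cancel out, so Correlation is dimensionless.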

4. Covariance is Scale Dependent while Correlation is Scale Independent. Rescaling a variable by a positive factor (for example, converting it to different units) multiplies the Covariance by that same factor but leaves the Correlation unchanged. For example, Height vs Weight (in kg) and Height vs Weight (in pounds) have different Covariance values but the same Correlation value.
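The Height vs Weight example can be reproduced with made-up numbers. This sketch assumes NumPy; the heights, weights and the approximate conversion factor `KG_TO_LB` are illustrative values, not real measurements.

```python
import numpy as np

KG_TO_LB = 2.20462  # approximate kilograms-to-pounds conversion factor

# Toy measurements for six hypothetical people
height_cm = np.array([150.0, 160.0, 165.0, 172.0, 180.0, 188.0])
weight_kg = np.array([50.0, 58.0, 61.0, 70.0, 77.0, 85.0])
weight_lb = weight_kg * KG_TO_LB          # same weights, different unit

# Covariance changes with the unit: it is multiplied by the conversion factor
cov_kg = np.cov(height_cm, weight_kg)[0, 1]
cov_lb = np.cov(height_cm, weight_lb)[0, 1]

# Correlation does not change: the factor cancels in Cov / (std * std)
corr_kg = np.corrcoef(height_cm, weight_kg)[0, 1]
corr_lb = np.corrcoef(height_cm, weight_lb)[0, 1]

print(cov_kg, cov_lb, corr_kg, corr_lb)
```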