## Friday, 5 April 2019

### What is Skewness? How to visualize it with a histogram and how to remove it?

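Skewness is a measure of the asymmetry of a variable's distribution. It can be positive (right skewed, with a long tail on the right), negative (left skewed, with a long tail on the left), or zero (symmetric). Ideally, a variable should have skewness close to zero. The larger the magnitude of the skewness, the longer the tail on one side and the more extreme values (outliers) the variable tends to contain.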
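To put a number on skewness, a common estimator is the standardized third central moment, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments of the sample. For small samples it is usually adjusted to G1 = g1 * sqrt(n(n - 1)) / (n - 2). The sketch below computes G1 by hand for a small made-up sample and compares it with pandas `Series.skew()`, which uses this adjusted estimator. The helper name `sample_skewness` is hypothetical.

```python
import numpy as np
import pandas as pd

def sample_skewness(values):
    """Hypothetical helper: adjusted Fisher-Pearson skewness (G1)."""
    x = np.asarray(values, dtype=float)
    n = x.size
    dev = x - x.mean()
    m2 = np.mean(dev ** 2)        # second central moment (divides by n)
    m3 = np.mean(dev ** 3)        # third central moment (divides by n)
    g1 = m3 / m2 ** 1.5           # standardized third moment
    return g1 * np.sqrt(n * (n - 1)) / (n - 2)  # small-sample adjustment

# Made-up sample: mostly small values plus one very large one (long right tail)
values = [10, 12, 13, 15, 16, 18, 20, 95]
manual = sample_skewness(values)
print(f"manual G1     : {manual:.4f}")
print(f"pandas .skew(): {pd.Series(values).skew():.4f}")
```

A positive result means a longer right tail; a perfectly symmetric sample such as `[1, 2, 3, 4, 5]` gives 0.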

How to remove skewness from variables?

Our aim should be near-zero skewness for the variables in the dataset. For a right-skewed (positively skewed) variable, taking the logarithm usually reduces the skewness considerably. Let's see how to do that.
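A quick way to see the effect of the log is to compare the skewness before and after the transform on a sample that is right skewed by construction. The sketch below uses synthetic log-normal data (an assumption standing in for a real variable), so its log is normal by design; real variables will usually improve less cleanly. Note that the log only helps with right (positive) skew and needs strictly positive values.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed data (assumption: stands in for a real variable).
# A log-normal sample becomes normal once the log is taken.
rng = np.random.default_rng(42)
amounts = pd.Series(rng.lognormal(mean=5.0, sigma=0.6, size=1000))

skew_before = amounts.skew()
skew_after = np.log(amounts).skew()   # natural log; requires values > 0

print(f"skewness before log: {skew_before:.3f}")
print(f"skewness after log : {skew_after:.3f}")
```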

Consider a Loan Prediction dataset. We will analyze the skewness of its LoanAmount variable.

Step 1: Import the required libraries

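```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # pyplot, not the top-level matplotlib package
# Jupyter/IPython magic: show plots inside the notebook (remove in a plain script)
%matplotlib inline
import seaborn as sns
```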
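Step 2: Load the dataset into a DataFrame named `dataset`

The original post does not show this step. With the Loan Prediction CSV you would read the file with `pd.read_csv` and your own file path. So that the later steps can be tried without that file, here is a minimal sketch that builds a synthetic, right-skewed stand-in; the column name matches the post, but the values are invented and are not the real data.

```python
import numpy as np
import pandas as pd

# Real data (hypothetical path; point it at your own Loan Prediction CSV):
# dataset = pd.read_csv("loan_prediction_train.csv")

# Synthetic stand-in so the remaining steps run without the file.
# Log-normal values are positive and right skewed, like loan amounts.
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    "LoanAmount": rng.lognormal(mean=4.8, sigma=0.5, size=600),
})

print(dataset["LoanAmount"].describe())
print("skewness:", round(dataset["LoanAmount"].skew(), 3))
```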

Step 3: Draw a histogram of the LoanAmount variable with 20 bins

```python
dataset['LoanAmount'].hist(bins=20)
```
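A histogram shows the shape, but it helps to print the skewness next to it. The sketch below rebuilds a synthetic stand-in for `dataset['LoanAmount']` (invented values, not the real data), draws the same 20-bin histogram, puts `Series.skew()` in the title, and marks the mean and median. For a right-skewed variable the mean typically lies to the right of the median.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic right-skewed stand-in for dataset['LoanAmount'] (assumption).
rng = np.random.default_rng(0)
dataset = pd.DataFrame({"LoanAmount": rng.lognormal(4.8, 0.5, size=600)})

col = dataset["LoanAmount"]
skew = col.skew()

# Same 20-bin histogram, with the skewness in the title and the mean/median
# marked; with right skew the mean is typically pulled to the right.
ax = col.hist(bins=20)
ax.axvline(col.mean(), color="red", linestyle="--", label="mean")
ax.axvline(col.median(), color="black", linestyle=":", label="median")
ax.set_title(f"LoanAmount (skewness = {skew:.2f})")
ax.set_xlabel("LoanAmount")
ax.set_ylabel("count")
ax.legend()
plt.show()
```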

Step 4: Create a new variable by taking the log of the LoanAmount variable

```python
# Natural log; values must be > 0 (missing values stay NaN)
dataset['LoanAmount_Log'] = np.log(dataset['LoanAmount'])
```
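`np.log` works here because loan amounts are positive, and missing values simply stay `NaN`. For variables that contain zeros, `np.log` returns `-inf` (with a runtime warning), and negative values give `NaN`. One common workaround for zeros is `np.log1p`, which computes log(1 + x). The sketch below wraps both cases in a hypothetical helper `log_transform`; switching to `log1p` only when zeros are present is an assumption of this sketch, not part of the original method.

```python
import numpy as np
import pandas as pd

def log_transform(series):
    """Hypothetical helper: log for strictly positive data, log1p if zeros occur.

    Raises on negative values, where neither transform is meaningful here.
    NaNs (e.g. missing loan amounts) are passed through as NaN.
    """
    s = pd.Series(series, dtype=float)
    observed = s.dropna()
    if (observed < 0).any():
        raise ValueError("log transform needs non-negative values")
    if (observed == 0).any():
        return np.log1p(s)   # log(1 + x): maps 0 to 0 instead of -inf
    return np.log(s)

amounts = pd.Series([66.0, 120.0, np.nan, 141.0, 267.0, 700.0])
print(log_transform(amounts))

with_zero = pd.Series([0.0, 1.0, 10.0, 100.0])
print(log_transform(with_zero))   # uses log1p because of the zero
```

Keep in mind that `log1p` and `log` give different scales, so pick one per variable and apply it consistently to training and new data.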

Step 5: Draw a histogram of the newly created variable

```python
dataset['LoanAmount_Log'].hist(bins=20)
```
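To compare the two distributions directly, you can draw both histograms side by side with their skewness values in the titles. This is a minimal sketch on synthetic log-normal data (an assumption replacing the real LoanAmount column), using `plt.subplots` and the `ax` argument of `Series.hist`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the loan data (assumption, not the real dataset).
rng = np.random.default_rng(0)
dataset = pd.DataFrame({"LoanAmount": rng.lognormal(4.8, 0.5, size=600)})
dataset["LoanAmount_Log"] = np.log(dataset["LoanAmount"])

# Original and log-transformed histograms next to each other,
# each titled with its skewness.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, col in zip(axes, ["LoanAmount", "LoanAmount_Log"]):
    dataset[col].hist(bins=20, ax=ax)
    ax.set_title(f"{col} (skewness = {dataset[col].skew():.2f})")
fig.tight_layout()
plt.show()
```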

We can see that the distribution of the LoanAmount_Log variable is close to normal and roughly symmetric, and its skewness is near zero. In the same way, you should check the skewness of every variable and reduce it where it is large.
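To apply this to a whole DataFrame, one option is to compute `DataFrame.skew()` for every numeric column and log-transform only the strongly right-skewed, strictly positive ones. The cut-off of 1 for skewness is a common rule of thumb rather than a fixed rule, and the helper name `log_skewed_columns` and the synthetic columns below are hypothetical.

```python
import numpy as np
import pandas as pd

def log_skewed_columns(df, threshold=1.0):
    """Hypothetical helper: add a <col>_Log column for every numeric column
    whose skewness exceeds `threshold` and whose observed values are all > 0."""
    out = df.copy()
    skews = df.select_dtypes(include="number").skew()
    for col, value in skews.items():
        observed = df[col].dropna()
        # Only right (positive) skew is addressed by a log transform.
        if value > threshold and (observed > 0).all():
            out[f"{col}_Log"] = np.log(df[col])
    return out, skews

# Synthetic, made-up columns (assumption, not the real Loan Prediction data)
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "LoanAmount": rng.lognormal(4.8, 0.6, size=500),   # right skewed
    "Income": rng.lognormal(8.3, 0.8, size=500),       # right skewed
    "Term": rng.normal(360, 20, size=500),             # roughly symmetric
})

transformed, skews = log_skewed_columns(df)
print(skews.round(2))
print(list(transformed.columns))
```

Left-skewed columns are skipped on purpose, since a log would make their skew worse rather than better.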