The Professionals Point: What is Skewness? How to visualize it with a Histogram and how to remove it?

Friday, 5 April 2019

What is Skewness? How to visualize it with a Histogram and how to remove it?

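Skewness is a measure of the asymmetry of a variable's distribution. It can be positive (right skewed, with a longer tail on the right), negative (left skewed, with a longer tail on the left), or zero (symmetric). Many statistical and machine learning methods work best when variables are roughly symmetric, so skewness close to zero is usually preferred. The larger the magnitude of the skewness, the longer the tail on one side, and the more extreme values (potential outliers) the variable tends to contain.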
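To make the three cases concrete, here is a minimal sketch that measures skewness with the pandas Series.skew() method, which returns a bias-corrected sample skewness. The data is synthetic and generated only for illustration; it is not the loan dataset used later in this post.

```python
import numpy as np
import pandas as pd

# Synthetic samples, generated only for illustration
rng = np.random.default_rng(0)
symmetric = pd.Series(rng.normal(size=10_000))          # bell-shaped
right_skewed = pd.Series(rng.exponential(size=10_000))  # long right tail
left_skewed = -right_skewed                             # mirror image: long left tail

skew_values = {
    "symmetric": symmetric.skew(),
    "right_skewed": right_skewed.skew(),
    "left_skewed": left_skewed.skew(),
}

for name, value in skew_values.items():
    print(f"{name}: {value:.2f}")
```

The symmetric sample should give a value close to zero, the right-skewed sample a clearly positive value, and the mirrored sample the same value with a negative sign.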

How to remove skewness from variables?

Our aim should be near-zero skewness for the variables in the dataset. For a right-skewed variable with positive values, taking the log often reduces the skewness considerably. Let's see how to do that.
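The reason the log helps is that it compresses large values much more than small ones, which pulls in a long right tail. A small sketch with made-up numbers (not taken from any dataset) shows the effect: values that are spaced further and further apart become evenly spaced after the log.

```python
import numpy as np

# Made-up values, each 10 times larger than the previous one
values = np.array([10.0, 100.0, 1000.0, 10000.0])

raw_gaps = np.diff(values)          # gaps grow: 90, 900, 9000
log_gaps = np.diff(np.log(values))  # gaps are all equal to ln(10)

print("gaps before log:", raw_gaps)
print("gaps after log: ", log_gaps)
```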

Consider a Loan Prediction dataset. We will analyze the skewness of the LoanAmount variable.

Step 1: Import the required libraries

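import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # plotting interface (not the top-level matplotlib package)
import seaborn as sns
# Jupyter notebook magic: show plots inline (omit when running as a plain script)
%matplotlib inline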

Step 2: Load the dataset

dataset = pd.read_csv("C:/train_loan_prediction.csv")

Step 3: Draw histogram of LoanAmount variable with 20 bins

dataset['LoanAmount'].hist(bins=20)

Step 4: Create a new variable by taking log of LoanAmount variable

dataset['LoanAmount_Log'] = np.log(dataset['LoanAmount'])
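One thing to watch for in this step: np.log returns -inf for a zero and leaves missing values as NaN. Whether LoanAmount actually contains zeros or missing values is not stated here, so the sketch below uses a small made-up Series to show the behaviour, along with np.log1p (which computes log(1 + x)) as one possible alternative when zeros are present. This is a swapped-in variant for illustration, not the transformation used in Step 4.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration: includes a zero and a missing value
amounts = pd.Series([0.0, 50.0, 120.0, np.nan, 700.0], name="LoanAmount")

# Plain log: 0 becomes -inf, NaN stays NaN
with np.errstate(divide="ignore"):  # silence the divide-by-zero warning
    plain_log = np.log(amounts)

# log(1 + x): 0 maps to 0, NaN stays NaN
shifted_log = np.log1p(amounts)

print(pd.DataFrame({"amount": amounts, "log": plain_log, "log1p": shifted_log}))
```

Missing values still need separate handling (for example, filling or dropping them) before modelling, since both transformations leave them as NaN.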

Step 5: Draw histogram of newly created variable

dataset['LoanAmount_Log'].hist(bins=20)

The histogram of LoanAmount_Log should look much closer to a symmetric, bell-shaped distribution than the histogram of LoanAmount, with skewness nearer to zero. In the same way, check the skewness of the other numeric variables in the dataset and transform the ones that are strongly skewed.
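A histogram gives a visual impression, but the skewness can also be checked numerically for every numeric column at once. The following is a minimal sketch: skew_report is a hypothetical helper name, and the DataFrame is synthetic (only the LoanAmount column name comes from this post; the other columns and all values are made up).

```python
import numpy as np
import pandas as pd

def skew_report(frame: pd.DataFrame) -> pd.Series:
    """Skewness of each numeric column, largest magnitude first."""
    skews = frame.select_dtypes(include="number").skew()
    order = skews.abs().sort_values(ascending=False).index
    return skews.reindex(order)

# Synthetic stand-in data, generated only for illustration
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "LoanAmount": rng.lognormal(mean=5.0, sigma=0.6, size=5_000),        # right skewed
    "SomeSymmetricColumn": rng.normal(loc=360, scale=30, size=5_000),    # roughly symmetric
    "SomeTextColumn": rng.choice(["a", "b"], size=5_000),                # non-numeric, ignored
})
demo["LoanAmount_Log"] = np.log(demo["LoanAmount"])

report = skew_report(demo)
print(report)
```

On this synthetic data, LoanAmount should appear at the top of the report with a clearly positive skewness, while LoanAmount_Log should be close to zero.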

Related: Log Transforming the Skewed Data to get Normal Distribution
