Let's draw a distribution plot for each numeric variable and examine its skewness.
Consider the Ames Housing dataset.
Step 1: Load the required libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Step 2: Load the dataset
dataset = pd.read_csv("C:/datasets/train.csv")
Step 3: Create a histogram for each numeric variable
First, separate the numeric variables from the rest of the dataset and drop the Id column, since it is only a row identifier. Then draw the distribution plots.
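# Keep numeric columns only and skip Id, which is a row identifier rather than a feature
num_vars = [f for f in dataset.columns if dataset.dtypes[f] != 'object' and f != 'Id']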
nd = pd.melt(dataset, value_vars=num_vars)
n1 = sns.FacetGrid(nd, col='variable', col_wrap=4, sharex=False, sharey=False)
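# sns.distplot is deprecated since seaborn 0.11; histplot with kde=True is the closest replacement
n1 = n1.map(sns.histplot, 'value', kde=True)
plt.show()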
This draws 37 plots, one per numeric variable, each showing how skewed that variable's distribution is. Examine each plot closely and look for outliers that may need to be removed. One way to reduce the skewness of a variable is a log transformation. I have written a detailed article on log transformation in this post.
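To back up the visual check with numbers, here is a minimal sketch of measuring skewness and applying a log transformation. It uses a small synthetic DataFrame rather than the housing data, so the column names (right_skewed, symmetric) are made up for illustration. The choice of np.log1p over a plain log is an assumption that the variable may contain zeros.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing data (hypothetical column names)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "right_skewed": rng.lognormal(mean=0.0, sigma=1.0, size=1000),
    "symmetric": rng.normal(loc=50.0, scale=5.0, size=1000),
})

# Sample skewness per column; values far from 0 point to a long tail
skew_before = demo.skew()

# log1p computes log(1 + x), so zeros are handled; it needs x > -1
demo["right_skewed_log"] = np.log1p(demo["right_skewed"])
skew_after = demo["right_skewed_log"].skew()

print(skew_before.sort_values(ascending=False))
print("right_skewed after log1p:", skew_after)
```

On the real data, the same idea would be to call dataset[num_vars].skew() and transform only the columns whose skewness is large. Where to set that threshold is a judgment call.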
Related: What are Outliers? How to find and remove outliers using JointPlot in Seaborn Library?