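Outliers can distort summary statistics and model fits, so they should be identified and, where appropriate, removed from a dataset. In my last post, we saw how to visualize outliers in numeric variables. In this post, we will use box plots to visualize the outliers in the target variable for each level of the categorical variables.
Consider the Ames Housing dataset.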
Step 1: Load the required libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Step 2: Load the dataset
dataset = pd.read_csv("C:/datasets/train.csv")
Step 3: Create box plots for all the categorical variables
First separate out all the categorical variables from the dataset, and then draw a box plot of SalePrice for each variable.
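Before running this on the full dataset, it may help to try the two steps (picking the non-numeric columns, then reshaping to long format) on a tiny frame. This is only a sketch: the toy values are invented for illustration, and the `is_numeric_dtype` check is a variant of the dtype comparison used in this post, not the post's own code.

```python
import pandas as pd

# Toy frame, invented for illustration; column names mimic the Ames data
toy = pd.DataFrame({
    "Street": ["Pave", "Grvl", "Pave"],
    "LotShape": ["Reg", "IR1", "Reg"],
    "LotArea": [8450, 9600, 11250],
    "SalePrice": [208500, 181500, 223500],
})

# Treat every non-numeric column as categorical (assumption for this sketch)
cat_vars = [c for c in toy.columns if not pd.api.types.is_numeric_dtype(toy[c])]

# Long format: one row per (SalePrice, variable name, category value)
long_form = pd.melt(toy, id_vars="SalePrice", value_vars=cat_vars)
print(long_form)
```

The long format is what lets FacetGrid draw one panel per entry in the `variable` column.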
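def boxplot(x, y, **kwargs):
    # Draw one box per category and rotate the tick labels so they stay readable
    sns.boxplot(x=x, y=y)
    plt.xticks(rotation=90)

# Categorical columns are the ones stored with the object dtype
cat_vars = [f for f in dataset.columns if dataset.dtypes[f] == 'object']
# Reshape to long format: one row per (SalePrice, variable, value)
p = pd.melt(dataset, id_vars='SalePrice', value_vars=cat_vars)
# seaborn 0.9 renamed the 'size' argument to 'height'
g = sns.FacetGrid(p, col='variable', col_wrap=2, sharex=False, sharey=False, height=5)
g = g.map(boxplot, 'value', 'SalePrice')
plt.show()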
This draws 43 plots, one per categorical variable. In each plot, the points beyond the whiskers are the potential outliers in SalePrice for that category. Examine each plot carefully and decide which of these outliers to remove.
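Reading 43 plots by eye can be backed up with a programmatic check. The sketch below flags rows that fall outside the whiskers of their category, assuming the common 1.5 × IQR rule that box plots use by default. The helper name `flag_outliers` and the toy data are hypothetical and are not part of the original workflow.

```python
import pandas as pd

def flag_outliers(df, cat_col, value_col, whis=1.5):
    """Mark rows whose value lies outside Q1 - whis*IQR .. Q3 + whis*IQR of their category."""
    grouped = df.groupby(cat_col)[value_col]
    # Broadcast each category's quartiles back onto its rows
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    return (df[value_col] < q1 - whis * iqr) | (df[value_col] > q3 + whis * iqr)

# Toy data, invented for illustration only (not taken from the Ames dataset)
toy = pd.DataFrame({
    "Street": ["Pave"] * 6 + ["Grvl"] * 4,
    "SalePrice": [100, 110, 120, 130, 140, 900, 50, 55, 60, 65],
})
toy["is_outlier"] = flag_outliers(toy, "Street", "SalePrice")

# Keep only the rows that were not flagged
cleaned = toy[~toy["is_outlier"]]
print(cleaned)
```

On the real data, the same call could be made with `dataset`, one categorical column at a time. Whether a flagged row should actually be dropped remains a judgment call.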
Related: What are Outliers? How to find and remove outliers using JointPlot in Seaborn Library?