Outliers can distort an analysis, so they should be identified and dealt with before modeling. In my last post, we saw how to visualize outliers in numeric variables. In this post, we will use boxplots to visualize the outliers in the categorical variables.

Consider the Ames Housing dataset.

**Step 1: Load the required libraries**

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

**Step 2: Load the dataset**

    dataset = pd.read_csv("C:/datasets/train.csv")

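**Step 3: Create boxplots for all the categorical variables**

First separate out all the categorical variables from the dataset, then draw a boxplot of SalePrice for each variable.

    # Helper passed to FacetGrid.map: one boxplot per facet, with rotated x tick labels
    def boxplot(x, y, **kwargs):
        sns.boxplot(x=x, y=y)
        plt.xticks(rotation=90)

    # Categorical variables are the columns with dtype 'object'
    cat_vars = [f for f in dataset.columns if dataset.dtypes[f] == 'object']

    # Reshape to long format: one row per (SalePrice, variable, value)
    p = pd.melt(dataset, id_vars='SalePrice', value_vars=cat_vars)

    # 'height' replaces the 'size' argument, which newer seaborn versions no longer accept
    g = sns.FacetGrid(p, col='variable', col_wrap=2, sharex=False, sharey=False, height=5)
    g = g.map(boxplot, 'value', 'SalePrice')
    plt.show()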

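This draws 43 plots, one per categorical variable, each showing how SalePrice is distributed across that variable's categories. Examine each plot carefully: the points drawn beyond the whiskers are the outliers, and they are candidates for removal.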
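Once a plot shows suspicious points, one possible way to flag them in code is the 1.5 × IQR rule, which is also what seaborn's boxplot uses by default to place its whiskers. The sketch below is only an illustration, not the method from this post: the helper name `flag_iqr_outliers` is hypothetical, and the toy values are made up to stand in for one categorical column and SalePrice.

```python
import pandas as pd

def flag_iqr_outliers(df, category_col, value_col, whis=1.5):
    """Return a boolean Series that is True where value_col falls
    outside [Q1 - whis*IQR, Q3 + whis*IQR] of its own category."""
    grouped = df.groupby(category_col)[value_col]
    # Broadcast each category's quartiles back onto its rows
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    lower = q1 - whis * iqr
    upper = q3 + whis * iqr
    return (df[value_col] < lower) | (df[value_col] > upper)

# Made-up toy data (assumption): not taken from train.csv
toy = pd.DataFrame({
    "Street": ["Pave"] * 6 + ["Grvl"] * 4,
    "SalePrice": [100, 110, 120, 130, 140, 900, 50, 55, 60, 65],
})

mask = flag_iqr_outliers(toy, "Street", "SalePrice")
cleaned = toy[~mask]   # keep only the rows inside the whiskers
print(toy[mask])       # rows flagged as outliers
```

On the real data, the same call could be made with `dataset` and any column from `cat_vars`. Whether a flagged row should actually be dropped is still a judgment call, so it is worth checking against the plot first.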

**Related**: What are Outliers? How to find and remove outliers using JointPlot in Seaborn Library?