Monday, 15 April 2019

Data Visualization using Box Plot (Seaborn Library)

Let's visualize our data with a Box Plot, which is available in the Seaborn library. Box Plots are very useful for spotting outliers in a variable. A Box Plot can also be combined with a Swarm Plot.

We can pass various parameters to boxplot, such as hue, order, orient, palette and color.

Let's explore Box Plot using the Tips dataset.

Step 1: Import required libraries

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

Step 2: Load Tips dataset

tips=sns.load_dataset('tips')
tips.head()

Step 3: Explore data using Box Plot

A Box Plot can be univariate or bivariate. Let's first analyze a single variable and then use two variables.

Visualizing one variable using Box Plot

# Pass the column directly as a Series
sns.boxplot(x=tips['tip'])

sns.boxplot(x=tips['total_bill'])

# Equivalent call: column name plus the DataFrame
sns.boxplot(x='total_bill', data=tips)
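
The points drawn beyond the whiskers are what the Box Plot flags as outliers. As a rough sketch of the rule behind it (assuming the default whisker length of 1.5 times the interquartile range), the same values can be listed with pandas. The helper name iqr_outliers and the sample values are made up for illustration; with the real data you would pass tips['total_bill'] instead.

```python
import pandas as pd

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Made-up sample standing in for a column such as tips['total_bill']
sample = pd.Series([10.0, 12.0, 13.0, 14.0, 15.0, 16.0, 18.0, 50.0])
outliers = iqr_outliers(sample)
print(outliers.tolist())  # [50.0]
```

Raising k makes the rule stricter, so fewer points are reported as outliers.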

Visualizing two variables using Box Plot

sns.boxplot(x='sex', y='total_bill', data=tips)

sns.boxplot(x='day', y='total_bill', data=tips)

Add hue parameter

sns.boxplot(x='day', y='total_bill', data=tips, hue='sex')

sns.boxplot(x='day', y='total_bill', data=tips, hue='sex', palette='husl')

sns.boxplot(x='day', y='total_bill', data=tips, hue='smoker', palette='coolwarm')

sns.boxplot(x='day', y='total_bill', data=tips, hue='time', palette='coolwarm') 

Note: If you run the above line, you will find that "Sat" and "Sun" each show only one box (for "Dinner"), as there is no "Lunch" data for "Sat" and "Sun".
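
One way to confirm which day and time combinations actually have data is to count the rows per group. The sketch below uses pd.crosstab on a small made-up sample shaped like the Tips data (the rows are illustrative, not taken from the dataset); with the real data you would pass tips['day'] and tips['time'] instead.

```python
import pandas as pd

# Illustrative rows only; replace with the real tips DataFrame
sample = pd.DataFrame({
    'day':  ['Thur', 'Thur', 'Thur', 'Fri', 'Fri', 'Sat', 'Sat', 'Sun'],
    'time': ['Lunch', 'Lunch', 'Dinner', 'Lunch', 'Dinner',
             'Dinner', 'Dinner', 'Dinner'],
})

# Rows are days, columns are meal times, cells are row counts
counts = pd.crosstab(sample['day'], sample['time'])
print(counts)

# Day/time pairs with a count of 0 get no box in the plot
missing = [(day, time)
           for day in counts.index
           for time in counts.columns
           if counts.loc[day, time] == 0]
print(missing)  # [('Sat', 'Lunch'), ('Sun', 'Lunch')]
```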

sns.boxplot(x='day', y='total_bill', data=tips, order=['Sat', 'Sun', 'Thur', 'Fri'])
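
The order list does not have to be typed by hand. As a sketch, the categories can be sorted by their median so that the boxes appear from lowest to highest. The helper order_by_median and the sample rows are hypothetical and only illustrate the idea.

```python
import pandas as pd

def order_by_median(df, category, value):
    """List the categories sorted by the median of `value`, lowest first."""
    medians = df.groupby(category, observed=True)[value].median()
    return medians.sort_values().index.tolist()

# Made-up rows standing in for the tips DataFrame
sample = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Fri', 'Sat', 'Sat'],
    'total_bill': [12.0, 16.0, 20.0, 30.0, 8.0, 10.0],
})
day_order = order_by_median(sample, 'day', 'total_bill')
print(day_order)  # ['Sat', 'Thur', 'Fri']

# With the real data:
# sns.boxplot(x='day', y='total_bill', data=tips,
#             order=order_by_median(tips, 'day', 'total_bill'))
```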

Change orientation of box plot

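# Wide-form input: one box per numeric column (total_bill, tip, size)
sns.boxplot(data=tips)

# Horizontal boxes; both spellings give the same plot
sns.boxplot(data=tips, orient='horizontal')
sns.boxplot(data=tips, orient='h')

# Vertical boxes; both spellings give the same plot
sns.boxplot(data=tips, orient='vertical')
sns.boxplot(data=tips, orient='v')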

Combining Box Plot and Swarm Plot

sns.boxplot(x='day', y='total_bill', data=tips, palette='husl')
sns.swarmplot(x='day', y='total_bill', data=tips, color='black')

# color='0.35' is a gray level between black ('0') and white ('1')
sns.boxplot(x='day', y='total_bill', data=tips, palette='husl')
sns.swarmplot(x='day', y='total_bill', data=tips, color='0.35')
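
When the Swarm Plot is drawn on top, every observation is already visible, so the Box Plot's own outlier markers end up drawn a second time. A possible variation, sketched here on a few made-up rows, hides them with showfliers=False (a Matplotlib box plot option that Seaborn passes through) and uses the color parameter for a neutral box fill. The Agg backend line is only an assumption for running outside a notebook and is not needed in Jupyter.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive run; not needed in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows standing in for the tips DataFrame
sample = pd.DataFrame({
    'day': ['Sat'] * 6 + ['Sun'] * 6,
    'total_bill': [10.0, 12.0, 13.0, 15.0, 16.0, 48.0,
                   11.0, 14.0, 17.0, 18.0, 20.0, 22.0],
})

fig, ax = plt.subplots()
# Box plot without its own outlier markers, in a single neutral color
sns.boxplot(x='day', y='total_bill', data=sample,
            color='lightgray', showfliers=False, ax=ax)
# Every observation, including the extreme ones, drawn as a point on top
sns.swarmplot(x='day', y='total_bill', data=sample, color='black', ax=ax)
fig.canvas.draw()  # render the figure; a notebook displays it automatically
```

With the real data, replace sample with tips in both calls.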

You can download my Jupyter notebook from here. I also recommend trying the above code with the Iris dataset.

Related:
What is Boxplot? How is it used to find outliers in a dataset?
Boxplot Grouping: Visualizing one variable based on another variable using boxplot
