Let's visualize our data with a box plot, which is available in the Seaborn library. Box plots are very useful for finding outliers in a variable. We can also combine a box plot with a swarm plot.
We can pass various parameters to boxplot, such as hue, order, orient, palette, and color.
Let's explore box plots using the Tips dataset.
Step 1: Import required libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
Step 2: Load Tips dataset
tips=sns.load_dataset('tips')
tips.head()
Step 3: Explore data using Box Plot
A box plot can be univariate or bivariate. Let's analyze one variable first and then use two variables.
Visualizing one variable using Box Plot
sns.boxplot(x=tips['tip'])
sns.boxplot(x=tips['total_bill'])
sns.boxplot(x='total_bill', data=tips)
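The points drawn beyond the whiskers in the plots above are the outliers. As a minimal sketch, assuming the default whisker setting of 1.5 times the interquartile range (IQR), the same rule can be applied by hand. The helper name iqr_outliers and the sample values are made up for illustration and are not part of Seaborn or the Tips dataset.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, whis: float = 1.5) -> pd.Series:
    """Return the values lying more than whis * IQR beyond the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whis * iqr   # lower fence
    upper = q3 + whis * iqr   # upper fence
    return values[(values < lower) | (values > upper)]

# Small made-up sample, used only to illustrate the rule
sample = pd.Series([10, 12, 13, 14, 15, 16, 18, 50])
outliers = iqr_outliers(sample)
print(outliers.tolist())
```

With the Tips dataset loaded, the same helper could be called as iqr_outliers(tips['total_bill']) to list the bills that the box plot marks as outliers.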
Visualizing two variables using Box Plot
sns.boxplot(x='sex', y='total_bill', data=tips)
sns.boxplot(x='day', y='total_bill', data=tips)
Add hue parameter
sns.boxplot(x='day', y='total_bill', data=tips, hue='sex')
sns.boxplot(x='day', y='total_bill', data=tips, hue='sex', palette='husl')
sns.boxplot(x='day', y='total_bill', data=tips, hue='smoker', palette='coolwarm')
sns.boxplot(x='day', y='total_bill', data=tips, hue='time', palette='coolwarm')
Note: If you run the above line, you will find that no "Lunch" box is drawn for "Sat" and "Sun", because the dataset has no lunch records for those days.
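One way to confirm which combinations are empty is to count the rows per pair of categories. The sketch below uses pandas crosstab on a tiny made-up frame that only mimics the day and time columns; it is not the real Tips data.

```python
import pandas as pd

# Tiny made-up frame with the same column names as tips (not real data)
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sun"],
    "time": ["Lunch", "Dinner", "Lunch", "Dinner", "Dinner", "Dinner"],
})

# Rows are days, columns are meal times, cells are row counts
counts = pd.crosstab(df["day"], df["time"])
print(counts)

# Combinations with a zero count get no box in the hue-split plot
missing = [
    (day, time)
    for day in counts.index
    for time in counts.columns
    if counts.loc[day, time] == 0
]
print(missing)
```

On the real dataset, pd.crosstab(tips['day'], tips['time']) gives the corresponding table.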
Set order of categories
sns.boxplot(x='day', y='total_bill', data=tips, order=['Sat', 'Sun', 'Thur', 'Fri'])
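The order list does not have to be typed by hand. As a rough sketch, it can be derived from the data, for example by sorting the categories by their median value. The helper name order_by_median and the sample frame are hypothetical and only serve the illustration.

```python
import pandas as pd

def order_by_median(df, category, value):
    """Return the category labels sorted by descending median of value."""
    # observed=True skips unused levels when the column is categorical
    medians = df.groupby(category, observed=True)[value].median()
    return medians.sort_values(ascending=False).index.tolist()

# Made-up sample with the same column names as tips
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10.0, 12.0, 20.0, 22.0, 15.0, 17.0],
})

day_order = order_by_median(df, "day", "total_bill")
print(day_order)
```

The resulting list could then be passed as order=order_by_median(tips, 'day', 'total_bill') in the boxplot call.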
Change orientation of box plot
sns.boxplot(data=tips)
sns.boxplot(data=tips, orient='horizontal')
sns.boxplot(data=tips, orient='h')
sns.boxplot(data=tips, orient='vertical')
sns.boxplot(data=tips, orient='v')
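The calls above pass the whole DataFrame, so Seaborn draws one box per numeric column. With named columns, another way to get horizontal boxes is to put the numeric variable on x and the categorical variable on y. The sketch below assumes a small made-up frame in place of the Tips data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up sample with the same column names as tips
df = pd.DataFrame({
    "day": ["Thur"] * 4 + ["Fri"] * 4,
    "total_bill": [10.0, 12.5, 14.0, 19.0, 11.0, 13.0, 16.0, 21.0],
})

fig, ax = plt.subplots()
# Numeric variable on x and categories on y gives horizontal boxes
sns.boxplot(x="total_bill", y="day", data=df, ax=ax)
```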
Combining Box Plot and Swarm Plot
sns.boxplot(x='day', y='total_bill', data=tips, palette='husl')
sns.swarmplot(x='day', y='total_bill', data=tips, color='black')
sns.boxplot(x='day', y='total_bill', data=tips, palette='husl')
sns.swarmplot(x='day', y='total_bill', data=tips, color='0.35')
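Both calls draw on the current Axes, which is why the swarm appears on top of the boxes. A slightly more explicit sketch passes the same Axes to both functions. It also sets showfliers=False, a matplotlib box plot option that Seaborn forwards, so that outliers are not drawn twice (once by the box plot and once by the swarm). The sample frame is made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up sample with the same column names as tips
df = pd.DataFrame({
    "day": ["Thur"] * 5 + ["Fri"] * 5,
    "total_bill": [10.0, 12.5, 14.0, 15.5, 30.0, 11.0, 13.0, 16.0, 18.5, 21.0],
})

fig, ax = plt.subplots()
# Boxes without outlier markers; the swarm below already shows every point
sns.boxplot(x="day", y="total_bill", data=df, color="lightgray",
            showfliers=False, ax=ax)
# Individual observations drawn on the same Axes
sns.swarmplot(x="day", y="total_bill", data=df, color="0.35", ax=ax)
```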
You can download my Jupyter notebook from here. I recommend also trying the above code with the Iris dataset.
Related:
What is Boxplot? How is it used to find outliers in a dataset?
Boxplot Grouping: Visualizing one variable based on another variable using boxplot