Wednesday, 3 April 2019

What is a Boxplot? How is it used to find outliers in a dataset?

A boxplot is a graph that shows how the values in a dataset are spread out. Boxplots are used to visualize the distribution of the data based on the following five parameters:

1. minimum
2. first quartile (Q1)
3. median
4. third quartile (Q3)
5. maximum

Each of these parameters, along with a few related terms, is described below:

median (Q2/50th Percentile): the middle value of the dataset.

first quartile (Q1/25th Percentile): the middle number between the smallest number (not the “minimum”) and the median of the dataset.

third quartile (Q3/75th Percentile): the middle value between the median and the highest value (not the “maximum”) of the dataset.

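interquartile range (IQR): the range from the 25th to the 75th percentile, i.e. IQR = Q3 - Q1. It covers the middle 50% of the data and is the height (or width) of the box.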
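The quartile definitions above can be sketched in a few lines of Python. This is only an illustration: the function name quartiles and the sample data are made up for this example, and it uses the "median of each half" convention described above. Libraries such as NumPy or pandas interpolate percentiles differently by default, so their Q1 and Q3 may differ slightly on small datasets.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Assumes at least two values. For an odd number of values, the
    median itself is excluded from both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values before the median position
    upper = data[half + n % 2:]   # values after the median position
    return median(lower), median(data), median(upper)

sample = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # made-up data
q1, q2, q3 = quartiles(sample)
iqr = q3 - q1                            # interquartile range
print(q1, q2, q3, iqr)
```

For this sample the lower half is [1, 3, 4, 6] and the upper half is [8, 10, 12, 15], so Q1 = 3.5, the median is 7, Q3 = 11 and the IQR is 7.5.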

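whiskers (shown in blue): the lines that extend from the box towards the "minimum" and "maximum" defined below.
outliers (shown as green circles): the data points that lie beyond the whiskers.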

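maximum: Q3 + 1.5*IQR
minimum: Q1 - 1.5*IQR

Note that the "maximum" and "minimum" here are the whisker limits, not the largest and smallest values in the dataset. Any value greater than the "maximum" or less than the "minimum" is treated as an outlier. This is how a boxplot is used to find outliers.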
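The two limits above can be turned into a simple outlier check. The following is a minimal sketch, not a definitive implementation: the function names iqr_limits and find_outliers and the sample data are made up for illustration, and the quartiles come from Python's statistics.quantiles with method="inclusive", which is just one of several quartile conventions.

```python
from statistics import quantiles

def iqr_limits(values, k=1.5):
    """Return (minimum, maximum) limits: Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Return the values that fall outside the limits."""
    low, high = iqr_limits(values, k)
    return [v for v in values if v < low or v > high]

sample = [2, 4, 5, 6, 7, 8, 9, 10, 40]  # made-up data with one extreme value
print(iqr_limits(sample))
print(find_outliers(sample))
```

For this sample, Q1 = 5 and Q3 = 9, so IQR = 4, the limits are -1 and 15, and the only outlier is 40.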

Advantages of Boxplots

1. Used to find out the skewness of a variable.
2. Used to find outliers in a variable.
3. Used to find out whether the data is symmetrical and how tightly it is grouped.
