Friday, 5 April 2019

Difference between Mean, Median and Mode (Mean vs Median vs Mode)

Mean, median and mode are three kinds of "averages". They are very useful in the data wrangling step, when we have to impute the missing values in numeric variables.

Mean: the sum of all the values in the list of numbers, divided by the number of values in the list.

Median: the middle value in the sorted list of numbers. To find the median, the list must first be sorted. If the list has an even number of values, the median is the mean of the two middle values.

Mode: the value that occurs most often. If no number in the list is repeated, then there is no mode for the list. If two or more values tie for the highest count, the list has more than one mode.

Example

Let's say we have a list of 7 numbers as follows:

72, 2, 6, 32, 55, 15, 9

Mean: (72+2+6+32+55+15+9) / 7 = 191 / 7 ≈ 27.2857

Median: Arranging the above list in ascending order, we get:

2, 6, 9, 15, 32, 55, 72

The middle element of this ordered list (the 4th of the 7 values) is 15. Therefore, 15 is the median of the list.
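The two calculations above can be reproduced with a short Python sketch. The helper names `mean_of` and `median_of` are made up for this illustration and are not from any library; the standard `statistics` module is used only as a cross-check.

```python
import statistics


def mean_of(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median_of(values):
    """Middle value of the sorted list.

    For an even number of values, the mean of the two middle values.
    """
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


numbers = [72, 2, 6, 32, 55, 15, 9]

print(round(mean_of(numbers), 4))  # 27.2857
print(median_of(numbers))          # 15

# Cross-check the hand-written median against the standard library.
assert median_of(numbers) == statistics.median(numbers)
```

Sorting happens inside `median_of`, so the input list can be passed in any order.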

Difference between Mean and Median

If I change the last number in the ordered list above from 72 to 99999, the mean becomes:

(2+6+9+15+32+55+99999) / 7 = 100118 / 7 ≈ 14302.5714

but the median of this list stays the same, because in the ordered list

2, 6, 9, 15, 32, 55, 99999

the middle element is still 15. A single extreme value (an outlier) can pull the mean a long way, while the median does not move.
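As a quick check of this comparison, here is a minimal sketch using Python's standard `statistics` module. The variable names are only illustrative.

```python
import statistics

original = [2, 6, 9, 15, 32, 55, 72]
with_outlier = [2, 6, 9, 15, 32, 55, 99999]  # last value replaced by 99999

for label, data in (("original", original), ("with outlier", with_outlier)):
    mean_value = round(statistics.mean(data), 4)
    median_value = statistics.median(data)
    print(label, "mean:", mean_value, "median:", median_value)
```

The mean jumps from about 27.2857 to about 14302.5714, while the median is 15 in both cases.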

Relation between Mean and Median

If we have a set where the numbers are evenly spaced, so that each one differs from its predecessor by the same amount (an arithmetic progression), like:

3, 6, 9, 12, 15

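then the mean

(3+6+9+12+15) / 5 = 45 / 5 = 9

and the median (the middle element, 9) are equal. More generally, the mean and the median coincide whenever the values are spread symmetrically around the centre.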
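The equality for evenly spaced numbers can be checked with a small sketch using the standard `statistics` module. The extra lists `even_count` and `skewed` are not from the example above; they are added here only for contrast.

```python
import statistics

progression = list(range(3, 16, 3))  # [3, 6, 9, 12, 15]
print(statistics.mean(progression) == statistics.median(progression))  # True

# Evenly spaced with an even number of values: the median is the
# mean of the two middle values (6 and 9), which is again the mean.
even_count = [3, 6, 9, 12]
print(statistics.mean(even_count) == statistics.median(even_count))  # True

# Breaking the even spacing with one large value separates the two.
skewed = [3, 6, 9, 12, 150]
print(statistics.mean(skewed) == statistics.median(skewed))  # False
```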

Mode: In the list above, there is no mode, as all the numbers are unique (no number is repeated). But consider the following list:

2, 5, 3, 2, 2, 8, 7, 2, 5, 2

In the above list, the number 2 occurs five times, more often than any other value. So, the mode of this list is 2.
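Counting occurrences is easy to sketch with `collections.Counter` from the Python standard library. The helper `modes_of` is a hypothetical name for this illustration. It follows the convention used in this post: a list with no repeated number has no mode.

```python
from collections import Counter


def modes_of(values):
    """Return the most frequent value(s); an empty list if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []  # empty input: nothing to count
    highest = max(counts.values())
    if highest == 1:
        return []  # every value is unique, so there is no mode
    return [value for value, count in counts.items() if count == highest]


numbers = [2, 5, 3, 2, 2, 8, 7, 2, 5, 2]

print(Counter(numbers).most_common(2))  # [(2, 5), (5, 2)]
print(modes_of(numbers))                # [2]
print(modes_of([3, 6, 9, 12, 15]))      # []
```

Returning a list rather than a single value also covers the case where two or more values tie for the highest count.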

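So, during data wrangling, we can use any of the above three methods to impute missing values, depending on how the values of a particular numeric variable are distributed. For example, the median is usually the safer choice when the variable contains outliers, since the mean is pulled towards extreme values.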
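To make the imputation step concrete, here is a minimal sketch in plain Python. It assumes missing values are represented as `None`; the function `impute`, its `strategy` parameter and the sample list `ages` are all hypothetical names invented for this illustration, not part of any library.

```python
import statistics


def impute(values, strategy="median"):
    """Replace None entries with the mean, median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Made-up sample data with two missing entries and one extreme value (90).
ages = [25, None, 31, 25, None, 90]

print(impute(ages, "mean"))    # gaps filled with 42.75
print(impute(ages, "median"))  # gaps filled with 28.0
print(impute(ages, "mode"))    # gaps filled with 25
```

In this sample the extreme value 90 drags the mean up to 42.75, well above most of the observed values, while the median and the mode stay close to the bulk of the data.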
