Mean, median and mode are three types of "average". They are very useful in the data wrangling step, when we have to impute missing values in numeric variables.
Mean: obtained by adding up all the values in the list of numbers and then dividing the resulting sum by the number of values in the list.
Median: the middle value in the list of numbers once the list has been sorted. If the list has an even number of values, the median is the mean of the two middle values.
Mode: the value that occurs most often. If no number in the list is repeated, then there is no mode for the list.
Example
Let's say we have the following list of 7 numbers:
72, 2, 6, 32, 55, 15, 9
Mean: (72+2+6+32+55+15+9) / 7 = 27.2857
Median: Arranging the above list in ascending order, we get:
2, 6, 9, 15, 32, 55, 72
The middle element of this ordered list (the 4th of 7) is the number 15. Therefore, 15 is the median of the list.
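The two calculations above can be sketched in a few lines of Python. This is only a minimal illustration of the definitions, not a library implementation; the function names `mean` and `median` are chosen here for the example, and both assume a non-empty list.

```python
def mean(values):
    # Sum of all values divided by how many values there are.
    return sum(values) / len(values)


def median(values):
    # The list must be sorted before the middle value is picked.
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: a single middle element exists.
        return ordered[mid]
    # Even count: average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


numbers = [72, 2, 6, 32, 55, 15, 9]
print(round(mean(numbers), 4))  # 27.2857
print(median(numbers))          # 15
```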
Difference between Mean and Median
In the ordered list above, if we change the last number, 72, to 99999, the mean becomes:
(2+6+9+15+32+55+99999) / 7 = 14302.5714
but the median of this list stays the same, because in the ordered list
2, 6, 9, 15, 32, 55, 99999
the middle element is still 15.
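The same comparison can be checked with Python's standard `statistics` module. The variable names below are chosen just for this sketch; the two lists are the ones used in the text.

```python
import statistics

original = [2, 6, 9, 15, 32, 55, 72]
with_outlier = [2, 6, 9, 15, 32, 55, 99999]

# The mean moves a long way when a single extreme value is introduced...
mean_before = statistics.mean(original)
mean_after = statistics.mean(with_outlier)

# ...while the median only looks at the middle position, so it does not move.
median_before = statistics.median(original)
median_after = statistics.median(with_outlier)

print(round(mean_before, 4), round(mean_after, 4))  # 27.2857 14302.5714
print(median_before, median_after)                  # 15 15
```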
Relation between Mean and Median
If we have a list where every number is the same distance from its predecessor (or successor), that is, the numbers are evenly spaced, like:
3, 6, 9, 12, 15
then the mean
(3+6+9+12+15) / 5 = 9
and the median (the middle element, 9) are equal.
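A quick check of this relation, as a sketch: the list from the text is built with `range`, and a second evenly spaced list (made up here purely for illustration) shows that it also holds when the number of values is even.

```python
import statistics

# The evenly spaced list from the text: 3, 6, 9, 12, 15
evenly_spaced = list(range(3, 16, 3))
mean_value = statistics.mean(evenly_spaced)
median_value = statistics.median(evenly_spaced)
print(mean_value, median_value)  # 9 9

# A hypothetical evenly spaced list with an even number of values.
other = [10, 20, 30, 40]
print(statistics.mean(other) == statistics.median(other))  # True
```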
Mode: None of the lists above has a mode, because all of their numbers are unique (no number is repeated). But consider the following list:
2, 5, 3, 2, 2, 8, 7, 2, 5, 2
In the above list, the number 2 occurs five times, more often than any other value. So, the mode of this list is 2.
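Counting occurrences is easy to sketch with `collections.Counter`. The helper name `mode_or_none` is hypothetical; it follows the convention used in this post that a list with no repeated number has no mode, and it assumes a non-empty list. If several values tie for the highest count, it simply returns the one that appears first.

```python
from collections import Counter


def mode_or_none(values):
    # Count how many times each value occurs.
    counts = Counter(values)
    # most_common(1) gives the (value, count) pair with the highest count.
    value, count = counts.most_common(1)[0]
    # If even the most frequent value occurs only once, nothing is repeated.
    return value if count > 1 else None


print(mode_or_none([2, 5, 3, 2, 2, 8, 7, 2, 5, 2]))  # 2
print(mode_or_none([72, 2, 6, 32, 55, 15, 9]))       # None
```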
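So, during data wrangling, we can impute missing values with any of the above three measures, depending on how the values of a particular numeric variable are distributed. The mean works well when the values are roughly symmetric with no extreme outliers, while the median is the safer choice when the values are skewed or contain outliers, as the 99999 example shows.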
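As a rough sketch of what such an imputation could look like in plain Python, the snippet below fills missing entries (represented here as `None`) with the chosen measure. The `impute` function, its `strategy` parameter and the `ages` data are all made up for this illustration; in practice a data library would normally do this job.

```python
import statistics


def impute(values, strategy="median"):
    # Compute the fill value from the observed (non-missing) entries only.
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Replace each missing entry with the fill value.
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing values and one outlier (95).
ages = [23, None, 31, 27, None, 95]
print(impute(ages, "mean"))    # [23, 44, 31, 27, 44, 95]
print(impute(ages, "median"))  # [23, 29.0, 31, 27, 29.0, 95]
```

Here the outlier pulls the mean up to 44, while the median fill of 29 stays close to the bulk of the observed values.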