Binning is a **quantization** technique in Machine Learning for handling continuous variables. It is one of the important steps in Data Wrangling. Binning transforms continuous variables into groups, ranges or intervals called bins.

For example, consider a dataset containing a variable that stores people's ages. Age is a continuous variable that can range from 1 to 100+. Analyzing so many distinct values directly is difficult. Using binning, we can convert all the values in this variable into a small number of ranges.

**Types of Binning**

There are two types of binning techniques:

**1. Fixed-Width Binning**
**2. Adaptive Binning**

Let's discuss them one by one:

**1. Fixed-Width Binning**

We manually create fixed-width bins based on rules and domain knowledge. Suppose we have the following 15 values in the age column:

age = [12, 15, 13, 78, 65, 42, 98, 24, 26, 38, 27, 32, 22, 45, 27]

Now, let's create bins of fixed width (say 10):

bins = [0 {0-9}, 1 {10-19}, 2 {20-29}, 3 {30-39}, 4 {40-49}, 5 {50-59}, 6 {60-69}, 7 {70-79}, 8 {80-89}, 9 {90-99}]

After binning, our age variable looks like this:

age = [1, 1, 1, 7, 6, 4, 9, 2, 2, 3, 2, 3, 2, 4, 2]

In this way, all 15 values fit into the 10 ranges / bins above. Now think of a dataset containing thousands of values in the age column instead of just 15: binning would be even more useful there.
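The fixed-width mapping above can be reproduced in a few lines of Python. This is only a minimal sketch: the names `BIN_WIDTH` and `binned_age` are chosen here for illustration, and it assumes every bin starts at a multiple of the width, as in the example.

```python
# Ages from the example above
age = [12, 15, 13, 78, 65, 42, 98, 24, 26, 38, 27, 32, 22, 45, 27]

BIN_WIDTH = 10  # assumed width, matching the example

# Integer division maps each value to the index of its bin:
# 0-9 -> 0, 10-19 -> 1, ..., 90-99 -> 9
binned_age = [a // BIN_WIDTH for a in age]

print(binned_age)
# [1, 1, 1, 7, 6, 4, 9, 2, 2, 3, 2, 3, 2, 4, 2]
```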

**2. Adaptive Binning**

In Fixed-Width Binning, bin ranges are decided manually. So we usually end up with irregular bins that are not uniform in the number of data points falling into each. Some bins might be densely populated, while others might be sparsely populated or even empty.

For example, bins 0, 5 and 8 are empty in our case.
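One way to check how unevenly the fixed-width bins are filled is to count the values in each bin. The sketch below recomputes the bin indices with integer division (an assumption that matches the width-10 example); `occupancy` and `empty_bins` are illustrative names.

```python
from collections import Counter

age = [12, 15, 13, 78, 65, 42, 98, 24, 26, 38, 27, 32, 22, 45, 27]

# Same fixed-width rule as in the example: bin index = age // 10
binned_age = [a // 10 for a in age]
counts = Counter(binned_age)

# Number of values in each of the 10 bins, including bins with no values
occupancy = [counts.get(b, 0) for b in range(10)]
empty_bins = [b for b, c in enumerate(occupancy) if c == 0]

print(occupancy)   # [0, 3, 5, 2, 2, 0, 1, 1, 0, 1]
print(empty_bins)  # [0, 5, 8]
```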

In Adaptive Binning, the data distribution itself decides the bin ranges. No manual intervention is required. So, the bins that are created are uniform in terms of the number of data points they contain.

**Quantile**-based binning is a good strategy for adaptive binning. Quantiles are specific values or cut-points that partition the continuous-valued distribution of a numeric field into discrete contiguous bins or intervals. Thus, **q-Quantiles** partition a numeric attribute into q partitions holding roughly equal numbers of values.

Popular examples of quantiles include the **2-Quantile**, known as the **median**, which divides the data distribution into two equal bins; the **4-Quantiles**, known as the **quartiles**, which divide the data into 4 equal bins; and the **10-Quantiles**, also known as the **deciles**, which create 10 equally populated bins (not equal-width ones).

**Advantage of Binning**: It exposes patterns in continuous variables that are easy to analyze and interpret.

**Disadvantage of Binning**: Binning leads to loss of information, because the original values are replaced by the bins they fall into.
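To make the quantile-based binning described above concrete, here is a minimal sketch that applies quartiles to the same 15 ages using only the Python standard library. The choice of `Q = 4`, the variable names, and the convention that a value equal to a cut-point goes into the lower bin are all assumptions made for illustration. Other tools may use different interpolation rules and produce slightly different cut-points.

```python
import bisect
import statistics
from collections import Counter

age = [12, 15, 13, 78, 65, 42, 98, 24, 26, 38, 27, 32, 22, 45, 27]

Q = 4  # assumed number of bins (quartiles)

# statistics.quantiles (Python 3.8+) returns the Q - 1 cut-points that
# split the data into Q groups of roughly equal size.
cut_points = statistics.quantiles(age, n=Q)

# bisect_left puts a value equal to a cut-point into the lower bin,
# so each bin is closed on the right.
binned_age = [bisect.bisect_left(cut_points, a) for a in age]

bin_counts = Counter(binned_age)

print(cut_points)                  # [22.0, 27.0, 45.0]
print(binned_age)                  # [0, 0, 0, 3, 3, 2, 3, 1, 1, 2, 1, 2, 0, 2, 1]
print(sorted(bin_counts.items()))  # [(0, 4), (1, 4), (2, 4), (3, 3)]
```

With these ages, the four bins hold 4, 4, 4 and 3 values and none is empty, in contrast to the fixed-width bins, where bins 0, 5 and 8 received no values.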