Dimensionality reduction is an important step in machine learning. Below are its main advantages:

**1. Reduction in Computation Time:** Fewer dimensions mean less computation and training time, which speeds up the algorithm.

**2. Improves Algorithm Performance:** Some algorithms do not perform well when the dataset has many dimensions. By reducing the number of dimensions, we can improve their performance.

**3. Removes Multicollinearity and Correlated Variables:** Multicollinearity occurs when the independent variables in a model are correlated with one another. This correlation is a problem because independent variables should be independent. Dimensionality reduction takes care of multicollinearity by removing redundant features. For example, suppose you have two variables: 'time spent on treadmill in minutes' and 'calories burnt'. These variables are highly correlated, since the more time you spend running on a treadmill, the more calories you burn. Hence, there is no point in storing both, as just one of them carries the information you need.
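The treadmill example above can be sketched in a few lines of NumPy. The data here is synthetic (made-up minutes and calories values for illustration), and the 0.9 correlation threshold is an arbitrary choice, not a standard:

```python
import numpy as np

# Hypothetical fitness data: calories burnt grows roughly
# linearly with minutes on the treadmill, plus a little noise.
rng = np.random.default_rng(0)
minutes = rng.uniform(10, 60, size=200)
calories = 10 * minutes + rng.normal(0, 5, size=200)

# Pearson correlation between the two features
corr = np.corrcoef(minutes, calories)[0, 1]
print(f"correlation: {corr:.3f}")  # very close to 1.0

# If two features are this strongly correlated, keeping one is
# enough: drop 'calories' and keep 'minutes'.
X = np.column_stack([minutes, calories])
threshold = 0.9  # arbitrary cutoff for this sketch
if abs(corr) > threshold:
    X = X[:, [0]]  # keep only the 'minutes' column

print(X.shape)  # (200, 1)
```

The same idea scales to many features: compute the pairwise correlation matrix and drop one feature from each highly correlated pair.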

**4. Reduces Overfitting:** Overfitting often occurs when there are too many variables or features in a dataset. Dimensionality reduction brings the number of features down to a reasonable level and so reduces overfitting.

**5. Better Data Visualization:** It helps in visualizing the data. Data in higher dimensions is very difficult to visualize, so reducing the space to 2D or 3D lets us plot it and observe patterns more clearly. Consider a situation where we have 50 features (p = 50). There can be

**p(p-1)/2** scatter plots, i.e. 1225 possible plots, to analyze the variable relationships. It would be a tedious job to perform exploratory analysis on this data, so we reduce the dimensions to visualize it easily.

**6. Less Storage Required:** The space required to store the data is reduced as the number of dimensions comes down.

**Related**: Dimensionality Reduction: Feature Selection and Feature Extraction
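The visualization point above can be made concrete. The sketch below verifies the p(p-1)/2 count for p = 50 and then projects a made-up 50-feature dataset down to two dimensions with PCA, implemented directly via NumPy's SVD so the example is self-contained (the data and dimensions are assumptions for illustration):

```python
import numpy as np

# Number of pairwise scatter plots for p = 50 features
p = 50
n_pairs = p * (p - 1) // 2
print(n_pairs)  # 1225

# Hypothetical dataset with 300 samples and 50 features
rng = np.random.default_rng(42)
X = rng.normal(size=(300, p))

# PCA via SVD: center the data, then project onto the
# top-2 principal components (rows of Vt).
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_2d = X_centered @ Vt[:2].T

print(X_2d.shape)  # (300, 2) -- now plottable as a single 2D scatter
```

Instead of 1225 pairwise plots, the reduced data fits in one 2D scatter plot.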