Dimensionality Reduction is an important step in Machine Learning. Below are its main advantages:
1. Reduction in Computation Time: Fewer dimensions mean less computation and shorter training time, which makes the algorithm faster to run.
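As a rough illustration, here is a minimal sketch (not from the original post) that times the same scikit-learn model on the raw data and on a PCA-reduced copy. The dataset, the 500/20 dimension counts, and the choice of model are all illustrative assumptions:

```python
# Minimal sketch: fit the same model on raw vs. PCA-reduced data and
# compare wall-clock training time. All sizes here are illustrative.
import time
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 500))    # 5,000 samples, 500 features
y = rng.integers(0, 2, size=5000)   # synthetic binary labels

start = time.perf_counter()
LogisticRegression(max_iter=1000).fit(X, y)
print(f"raw (500 dims): {time.perf_counter() - start:.2f}s")

# The one-off PCA preprocessing cost is excluded from the timing below.
X_reduced = PCA(n_components=20).fit_transform(X)
start = time.perf_counter()
LogisticRegression(max_iter=1000).fit(X_reduced, y)
print(f"reduced (20 dims): {time.perf_counter() - start:.2f}s")
```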
2. Improves Algorithm Performance: Some algorithms do not perform well on high-dimensional data, so reducing the number of dimensions can improve their accuracy.
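Distance-based methods such as k-nearest neighbours are a classic example, since uninformative dimensions dilute the distances they rely on. The sketch below uses synthetic data (make_classification with mostly noise features); whether the reduced model actually scores higher on real data depends on the dataset:

```python
# Sketch: k-NN accuracy on 200 noisy dimensions vs. 10 PCA components.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=200,
                           n_informative=10, n_redundant=0,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier().fit(X_tr, y_tr)
print("raw 200-dim accuracy:", knn.score(X_te, y_te))

pca = PCA(n_components=10).fit(X_tr)   # fit on training data only
knn = KNeighborsClassifier().fit(pca.transform(X_tr), y_tr)
print("PCA 10-dim accuracy:", knn.score(pca.transform(X_te), y_te))
```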
3. Removes Multicollinearity and Correlated Variables: Multicollinearity occurs when the independent variables in a model are correlated with one another. This is a problem because, in a regression setting, the independent variables are assumed to be independent of one another. Dimensionality reduction takes care of multicollinearity by removing redundant features.
For example, suppose you have two variables: 'time spent on treadmill in minutes' and 'calories burnt'. They are highly correlated, since the more time you spend running on a treadmill, the more calories you burn. There is little point in storing both, as one of them carries nearly all the information you need.
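One common way to act on this is a simple correlation filter: compute the correlation matrix and drop one feature from each highly correlated pair. The sketch below invents data for the treadmill example; the column names, the synthetic relationship, and the 0.9 threshold are all assumptions for illustration:

```python
# Sketch: drop one feature from each highly correlated pair.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
minutes = rng.uniform(10, 60, size=200)
df = pd.DataFrame({
    "treadmill_minutes": minutes,
    # Calories scale with minutes plus a little noise -> near-perfect correlation.
    "calories_burnt": 10 * minutes + rng.normal(0, 5, size=200),
    "resting_heart_rate": rng.normal(65, 8, size=200),
})

corr = df.corr().abs()
# Keep only the upper triangle so each pair is considered once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
print("dropping:", to_drop)            # expect ['calories_burnt']
df_reduced = df.drop(columns=to_drop)
```

Feature-extraction methods such as PCA reach the same end differently: the components they produce are uncorrelated by construction.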
4. Reduces Overfitting: Overfitting often occurs when a dataset has too many features relative to the number of observations. Dimensionality reduction brings the feature count down to a reasonable level and hence reduces the risk of overfitting.
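As a hedged illustration, the sketch below fits an unpruned decision tree on synthetic data with many noise features, where it can memorize the training set, and then again on a PCA-reduced version. The dataset sizes and the choice of 5 components are assumptions; on real data the size of the train/test gap will vary:

```python
# Sketch: train/test gap of a flexible model, with and without PCA.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=300,
                           n_informative=5, n_redundant=0,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

tree = DecisionTreeClassifier(random_state=1).fit(X_tr, y_tr)
print("raw:     train", tree.score(X_tr, y_tr),
      "test", tree.score(X_te, y_te))

pca = PCA(n_components=5).fit(X_tr)
tree = DecisionTreeClassifier(random_state=1).fit(pca.transform(X_tr), y_tr)
print("reduced: train", tree.score(pca.transform(X_tr), y_tr),
      "test", tree.score(pca.transform(X_te), y_te))
```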
5. Better Data Visualization: It is very difficult to visualize data in high dimensions, so reducing the space to 2D or 3D lets us plot the data and observe patterns more clearly.
Consider a situation where we have 50 features (p = 50). There are p(p-1)/2 = 50 × 49 / 2 = 1,225 possible pairwise scatter plots for analyzing variable relationships. Performing exploratory analysis at that scale would be tedious, so reducing the dimensions makes the data much easier to visualize.
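For instance, the four-feature Iris dataset cannot be drawn in a single scatter plot, but its two leading principal components can. A minimal sketch:

```python
# Sketch: project a 4-feature dataset onto 2 principal components and plot it.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)           # 150 samples, 4 features
X_2d = PCA(n_components=2).fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap="viridis")
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("Iris projected onto 2 principal components")
plt.show()
```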
6. Less Storage Required: Space required to store the data is reduced as the number of dimensions comes down.
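A quick back-of-the-envelope sketch of the saving (the 500-to-20 reduction is an illustrative assumption, and keep in mind that reconstructing the original data from the reduced representation is lossy):

```python
# Sketch: memory footprint before and after keeping 20 of 500 dimensions.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(10_000, 500))
X_reduced = PCA(n_components=20).fit_transform(X)

print(f"original: {X.nbytes / 1e6:.1f} MB")         # 10,000 x 500 x 8 bytes = 40.0 MB
print(f"reduced:  {X_reduced.nbytes / 1e6:.1f} MB") # 10,000 x 20 x 8 bytes = 1.6 MB
```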
Related: Dimensionality Reduction: Feature Selection and Feature Extraction