PCA (Principal Component Analysis) and t-SNE (t-Distributed Stochastic Neighbor Embedding) are both dimensionality reduction techniques in Machine Learning and effective tools for data exploration and visualization. In this article, we will compare PCA and t-SNE, looking at the advantages and disadvantages (limitations) of t-SNE relative to PCA.
Advantages of t-SNE
1. Handles Non-Linear Data Efficiently: PCA is a linear algorithm: it creates principal components that are linear combinations of the existing features, so it cannot capture non-linear relationships between features. If the relationships between variables are non-linear, PCA performs poorly. t-SNE, on the other hand, is a very effective non-linear dimensionality reduction algorithm and works well on such data.
PCA tries to place dissimilar data points far apart in the lower-dimensional representation. But to represent high-dimensional data lying on a non-linear manifold in a low-dimensional space, it is essential that similar data points be placed close together, which is not what PCA does. t-SNE does exactly this, so it can capture the structure of trickier manifolds in the dataset, as the sketch below illustrates.
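Here is a minimal sketch, assuming scikit-learn and matplotlib are available, that contrasts the two methods on an S-shaped 3-D manifold; the sample size and seed are illustrative choices, not prescriptions.

# Contrast PCA and t-SNE on a non-linear manifold.
import matplotlib.pyplot as plt
from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# An S-shaped 3-D manifold: the interesting structure is non-linear.
X, color = make_s_curve(n_samples=1000, random_state=42)

# PCA: linear projection onto the directions of maximal variance.
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: non-linear embedding that keeps similar points close together.
X_tsne = TSNE(n_components=2, random_state=42).fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(X_pca[:, 0], X_pca[:, 1], c=color, s=5)
axes[0].set_title("PCA (linear)")
axes[1].scatter(X_tsne[:, 0], X_tsne[:, 1], c=color, s=5)
axes[1].set_title("t-SNE (non-linear)")
plt.show()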
2. Preserves Local and Global Structure: t-SNE preserves the local structure of the data well, and to some extent its global structure. This means, roughly, that points which are close to one another in the high-dimensional dataset will tend to be close to one another in the low-dimensional embedding. PCA, in contrast, finds new dimensions that explain most of the variance in the data, so it cares relatively little about local neighbors, unlike t-SNE. One way to quantify this neighborhood preservation is shown below.
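As a rough sketch of measuring neighborhood preservation, scikit-learn provides a trustworthiness score (1.0 means local neighborhoods are perfectly preserved); the dataset and n_neighbors value here are illustrative.

# Compare how well PCA and t-SNE preserve local neighborhoods.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

X, _ = load_digits(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

# Fraction of each point's nearest neighbors that survive in 2-D.
print("PCA   trustworthiness:", trustworthiness(X, X_pca, n_neighbors=10))
print("t-SNE trustworthiness:", trustworthiness(X, X_tsne, n_neighbors=10))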
Disadvantages of t-SNE
1. Computationally Expensive: t-SNE is computation-heavy because it computes pairwise conditional probabilities for every data point and then iteratively minimizes the divergence between these probabilities in the high- and low-dimensional spaces.
As Laurens van der Maaten, one of the creators of t-SNE, puts it: "Since t-SNE scales quadratically in the number of objects N, its applicability is limited to data sets with only a few thousand input objects; beyond that, learning becomes too slow to be practical (and the memory requirements become too large)."
t-SNE has quadratic time and space complexity in the number of data points, which makes it slow and resource-hungry on datasets comprising more than about 10,000 observations. (Approximate variants such as Barnes-Hut t-SNE, the default in scikit-learn, bring the time complexity down to roughly O(N log N), but the method remains expensive compared to PCA.)
Use both PCA and t-SNE: One solution to the above problem is to use PCA and t-SNE in conjunction. If you have thousands of features in a dataset, don't use t-SNE for dimensionality reduction as the first step. First use PCA to reduce the dimensions to a reasonable number of features, then run t-SNE on the result to reduce the dimensionality further, as in the sketch below.
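A minimal sketch of this PCA-then-t-SNE workflow, assuming scikit-learn; the synthetic data and the choice of 50 components are illustrative (50 is a common rule of thumb, not a fixed rule).

import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Illustrative high-dimensional data: 5,000 points, 1,000 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 1000))

# Step 1: PCA down to a modest number of components.
X_reduced = PCA(n_components=50).fit_transform(X)

# Step 2: t-SNE on the PCA output, far cheaper than on the raw data.
X_embedded = TSNE(n_components=2, random_state=0).fit_transform(X_reduced)
print(X_embedded.shape)  # (5000, 2)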
2. Non-Deterministic: Different runs with the same hyperparameters may produce different results, because the embedding is initialized randomly and the cost function is non-convex. So you won't get exactly the same output each time you run it, although the results are likely to be similar. Fixing the random seed, as shown below, makes runs reproducible.
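A small sketch, assuming scikit-learn, of pinning t-SNE's random seed; note that even seeded results can still vary across library versions and platforms.

from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

X, _ = load_iris(return_X_y=True)

# Two runs with the same seed give the same embedding...
a = TSNE(n_components=2, random_state=42).fit_transform(X)
b = TSNE(n_components=2, random_state=42).fit_transform(X)
print((a == b).all())  # True

# ...whereas unseeded runs will generally differ from each other.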
3. Requires Hyperparameter Tuning: t-SNE has hyperparameters that must be tuned (most importantly the perplexity, along with the learning rate and number of iterations), unlike PCA, which has essentially nothing to tune beyond the number of components. Setting these hyperparameters poorly can lead to misleading results. A perplexity sweep like the one below is a common sanity check.
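A hedged sketch of sweeping the perplexity and inspecting each embedding by eye, assuming scikit-learn and matplotlib; the grid of values is illustrative (the typical useful range is roughly 5 to 50).

import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)

perplexities = [5, 30, 50, 100]  # illustrative grid
fig, axes = plt.subplots(1, len(perplexities), figsize=(16, 4))
for ax, p in zip(axes, perplexities):
    emb = TSNE(n_components=2, perplexity=p, random_state=0).fit_transform(X)
    ax.scatter(emb[:, 0], emb[:, 1], c=y, s=5)
    ax.set_title(f"perplexity={p}")
plt.show()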
4. Noisy Patterns: t-SNE can find apparent patterns even in random noise, so multiple runs of the algorithm with different hyperparameter settings should be checked before concluding that a pattern really exists in the data. The sketch below demonstrates this pitfall.
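As a quick illustrative check, run t-SNE on pure random noise: any apparent "clusters" in the plot are artifacts, since the input has no structure at all. All values here are illustrative.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
noise = rng.uniform(size=(500, 50))  # 500 points of structureless noise

emb = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(noise)
plt.scatter(emb[:, 0], emb[:, 1], s=5)
plt.title("t-SNE on pure noise: apparent structure is spurious")
plt.show()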
Related:
What is t-SNE? How does it work using t-Distribution?
Advantages and Disadvantages of Principal Component Analysis
Why is Dimensionality Reduction required in Machine Learning?