Example: Let's take the example of a password. You create an account on a website, and your password is encoded and stored in the database. When you later try to log in, the stored password is fetched, decoded and matched with the password you provided. (Real websites actually store one-way hashes rather than reversible encodings, but the encode-then-decode flow is the analogy that matters here.)
Components of an Autoencoder
An autoencoder consists of four main parts:
1. Encoder: The part of the network that learns to reduce the input dimensions and compress the input data into a latent-space (encoded) representation.
2. Bottleneck / Code: The layer that contains the compressed representation of the input data. This is the lowest-dimensional representation of the input; it determines which aspects of the data are relevant and which can be thrown away.
3. Decoder: The part of the network that learns to reconstruct the data from the encoded representation, as close to the original input as possible. The decoded output is a lossy reconstruction of the original input.
4. Reconstruction Loss: The function that measures how well the decoder is performing, i.e. how close the output is to the original input.
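The four components above can be sketched as a tiny fully-connected autoencoder in NumPy. The layer sizes, the tanh activation and the random initialisation here are illustrative choices for the sketch, not anything prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: an 8-dimensional input compressed to a 2-dimensional code.
n_input, n_code = 8, 2

# Encoder and decoder weights (randomly initialised for this sketch).
W_enc = rng.normal(0, 0.1, (n_input, n_code))
W_dec = rng.normal(0, 0.1, (n_code, n_input))

def encode(x):
    # Encoder: compress the input into the bottleneck / code.
    return np.tanh(x @ W_enc)

def decode(code):
    # Decoder: reconstruct the input from the code (lossy).
    return code @ W_dec

def reconstruction_loss(x, x_hat):
    # Mean squared error between the input and its reconstruction.
    return np.mean((x - x_hat) ** 2)

x = rng.normal(size=(4, n_input))   # a batch of 4 inputs
code = encode(x)                    # bottleneck representation
x_hat = decode(code)                # lossy reconstruction
loss = reconstruction_loss(x, x_hat)
```

With untrained weights the reconstruction is poor; training (covered below) minimizes this loss.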
Properties of an Autoencoder
1. Unsupervised: Autoencoders are considered an unsupervised learning technique because they don't need explicit labels to train on.
2. Data-specific: Autoencoders are only able to compress and decompress data similar to what they have been trained on. For example, an autoencoder that has been trained on human faces would not perform well on images of buildings.
3. Lossy: The output of the autoencoder will not be exactly the same as the input; it will be a close but degraded representation.
How does an Autoencoder work?
Autoencoders compress the input into a latent-space representation and then reconstruct the output from this representation. We calculate the loss by comparing the input and output; this difference is called the reconstruction loss. The main objective of an autoencoder is to minimize this reconstruction loss so that the output is similar to the input. To reduce it, we backpropagate through the network and update the weights using the gradient descent algorithm.
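A minimal sketch of this training loop, using a linear autoencoder so the gradients can be written out by hand. The dataset, layer sizes, learning rate and iteration count are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))        # toy dataset: 100 samples, 6 features
W_enc = rng.normal(0, 0.1, (6, 2))   # encoder weights (6 -> 2)
W_dec = rng.normal(0, 0.1, (2, 6))   # decoder weights (2 -> 6)
lr = 0.05                            # learning rate (hypothetical)

def loss_of(We, Wd):
    X_hat = X @ We @ Wd
    return np.mean((X - X_hat) ** 2)

initial_loss = loss_of(W_enc, W_dec)
for _ in range(200):
    Z = X @ W_enc                    # code (latent representation)
    X_hat = Z @ W_dec                # reconstruction
    E = X_hat - X                    # reconstruction error
    # Gradients of the mean-squared reconstruction loss.
    grad_dec = Z.T @ E * (2 / X.size)
    grad_enc = X.T @ (E @ W_dec.T) * (2 / X.size)
    W_dec -= lr * grad_dec           # gradient descent updates
    W_enc -= lr * grad_enc
final_loss = loss_of(W_enc, W_dec)
```

Each iteration moves the weights a small step down the gradient of the reconstruction loss, so the loss decreases over training; in practice a framework's autograd would compute these gradients for deeper, non-linear networks.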
Autoencoders should have generalization capabilities: As a rule of thumb, an autoencoder should be sensitive enough to the input to recreate the original observation, but insensitive enough to the training data that it learns a generalization rather than memorizing. All the main types of autoencoders (undercomplete, sparse, convolutional and denoising) use some mechanism to achieve this.
How to increase the generalization capabilities of an autoencoder?
1. Keep the code layer small so that the data is compressed more. The greater the compression, the more the model is forced to generalize.
2. Limit the number of nodes in the hidden layers of the network (undercomplete autoencoders).
3. Use regularization penalties such as L1 on the activations (sparse autoencoders).
4. Add random noise to the inputs and let the autoencoder recover the original noise-free data (denoising autoencoders).
Types of an Autoencoder
1. Undercomplete autoencoder: In this type of autoencoder, we limit the number of nodes present in the hidden layers of the network. This limits the amount of information that can flow through the network, which forces our model to learn only the most important attributes of the input data.
By limiting the number of nodes in the hidden layers, we can make sure that our model does not memorize the training data and have some generalization capabilities.
Note that no explicit regularization penalty is used to train this model; regularization comes purely from limiting the number of nodes in the hidden layers.
2. Sparse autoencoder: Instead of limiting the number of nodes in the hidden layers as undercomplete autoencoders do, we add a regularization term to the loss function that penalizes activations (rather than weights), so that only a small number of neurons in a given hidden layer are active at once.
Which nodes of a trained model activate is data-dependent: different inputs will activate different nodes through the network. For the penalty, we can use L1 regularization or KL-divergence.
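A sketch of how an L1 penalty on the activations enters the sparse autoencoder's loss. The overcomplete hidden size and the penalty strength lam are hypothetical values for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 1e-3                            # sparsity strength (hypothetical)

x = rng.normal(size=(4, 8))           # a batch of 4 inputs
W_enc = rng.normal(0, 0.1, (8, 16))   # overcomplete hidden layer (16 > 8)
W_dec = rng.normal(0, 0.1, (16, 8))

h = np.tanh(x @ W_enc)                # hidden activations
x_hat = h @ W_dec                     # reconstruction

reconstruction = np.mean((x - x_hat) ** 2)
# L1 penalty on the *activations* (not the weights) pushes most of h
# toward zero, so only a few neurons fire for any given input.
sparsity_penalty = lam * np.sum(np.abs(h))
total_loss = reconstruction + sparsity_penalty
```

Minimizing total_loss trades reconstruction quality against sparsity; larger lam means fewer active neurons.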
3. Denoising autoencoder: Another approach towards developing a generalizable model is to slightly corrupt the input data (add some random noise) but still maintain the uncorrupted data as our target output. With this approach, our model isn't able to simply develop a mapping which memorizes the training data because our input and target output are no longer the same. In this way, we train the autoencoder to reconstruct the input from a corrupted version of it.
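The corruption step can be sketched as follows; the noise type (Gaussian) and the noise level are illustrative choices, and masking noise is another common option:

```python
import numpy as np

rng = np.random.default_rng(3)
x_clean = rng.uniform(0, 1, size=(4, 8))   # original, noise-free inputs

# Corrupt the input with Gaussian noise; the *clean* data stays the target.
noise_std = 0.2                            # noise level (hypothetical)
x_noisy = x_clean + rng.normal(0, noise_std, x_clean.shape)

# Alternative: masking noise, which randomly zeroes a fraction of inputs.
mask = rng.uniform(size=x_clean.shape) > 0.3
x_masked = x_clean * mask

# Training pairs for a denoising autoencoder:
#   input  -> x_noisy (or x_masked)
#   target -> x_clean
# loss = mean((decode(encode(x_noisy)) - x_clean) ** 2)
```

Because the input and target differ, the network cannot simply copy its input and must learn structure that lets it undo the corruption.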
4. Convolutional autoencoder: It uses convolutional layers to compress images with the help of kernels (filters), then uses max pooling to further down-sample the image. For a full introduction to CNNs, you can visit this post on CNNs.
Applications of an Autoencoder
1. Signal Denoising: Denoising or noise reduction is the process of removing noise from a signal. The signal can be an image, audio or a scanned document.
2. Dimensionality Reduction for Visualization: The lower the dimension, the easier the visualization. Autoencoders can outperform PCA here because they work well with non-linear data, while PCA captures only linear structure.
3. Anomaly and outlier detection: Autoencoders learn to generalize the dominant patterns in the data, so anything outside those patterns is easy to detect: for an anomaly, the reconstruction loss is much higher than for regular data.
4. Image colorization: Autoencoders can also be trained to colorize grayscale images.
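The anomaly-detection application above can be sketched as a threshold on per-sample reconstruction error. Here synthetic error values stand in for the output of a trained autoencoder, and the mean-plus-3-standard-deviations rule is one common but hypothetical choice of threshold:

```python
import numpy as np

rng = np.random.default_rng(4)

def reconstruction_error(x, x_hat):
    # One error score per sample (mean squared error over features).
    return np.mean((x - x_hat) ** 2, axis=1)

# Stand-in errors: regular data reconstructs well, the anomaly does not.
errors_normal = rng.uniform(0.01, 0.05, size=100)
error_anomaly = 0.9

# Flag samples whose error exceeds mean + 3 std of the errors
# observed on regular (training) data.
threshold = errors_normal.mean() + 3 * errors_normal.std()
is_anomaly = error_anomaly > threshold
```

In practice the threshold is tuned on held-out normal data, since it controls the trade-off between false alarms and missed anomalies.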
Hyperparameters of an Autoencoder
1. Code Size: The number of nodes in the middle (bottleneck) layer. A smaller size results in more compression.
2. Number of Layers: An autoencoder can be as deep as we want. We can have as many layers as we like in both the encoder and the decoder.
3. Number of nodes per layer: The number of nodes per layer typically decreases with each subsequent layer in the encoder and increases back in the decoder.
4. Loss Function: You can use any suitable loss function, such as mean squared error or binary cross-entropy. If the input values are in the range 0 to 1, we typically use cross-entropy; otherwise we use mean squared error.
Loss Function is usually composed of two parts:
1. Reconstruction Loss: measures the difference between the original data and the reconstructed data.
2. Regularization Penalty: adds a penalty so that the model learns to generalize.
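The two loss choices mentioned above can be computed directly for a toy input in [0, 1] and a candidate reconstruction (both arrays here are made-up example values):

```python
import numpy as np

x = np.array([0.0, 0.25, 0.5, 1.0])      # inputs in the range [0, 1]
x_hat = np.array([0.1, 0.2, 0.6, 0.9])   # a hypothetical reconstruction

# Mean squared error.
mse = np.mean((x - x_hat) ** 2)

# Binary cross-entropy, with clipping to avoid log(0).
eps = 1e-7
x_hat_c = np.clip(x_hat, eps, 1 - eps)
bce = -np.mean(x * np.log(x_hat_c) + (1 - x) * np.log(1 - x_hat_c))
```

Both go to zero as the reconstruction approaches the input; cross-entropy only makes sense when inputs and outputs can be read as probabilities in [0, 1], which is why it pairs with a sigmoid output layer.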
Restricted Boltzmann Machines (RBM)
Restricted Boltzmann Machines are shallow, two-layer (visible and hidden) neural networks. Unlike autoencoders, there is no output layer.
The nodes or neurons are connected to each other across the layers, but no two nodes of the same layer are linked. This restriction is what makes them Restricted Boltzmann Machines rather than plain Boltzmann Machines.
It is a probabilistic, unsupervised, generative deep learning algorithm. An RBM's objective is to learn the joint probability distribution that maximizes the log-likelihood function.
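A sketch of one training step for a binary RBM using CD-1 (single-step contrastive divergence), which is the standard approximation to the log-likelihood gradient. The layer sizes, learning rate and random toy data are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
n_visible, n_hidden = 6, 3

W = rng.normal(0, 0.1, (n_visible, n_hidden))  # visible-hidden weights
b_v = np.zeros(n_visible)                      # visible biases
b_h = np.zeros(n_hidden)                       # hidden biases

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sample(p):
    # Bernoulli sampling: each unit fires with its activation probability.
    return (rng.uniform(size=p.shape) < p).astype(float)

v0 = sample(np.full((10, n_visible), 0.5))     # toy binary data, 10 samples

# One step of contrastive divergence (CD-1):
p_h0 = sigmoid(v0 @ W + b_h)   # hidden probabilities given data
h0 = sample(p_h0)
p_v1 = sigmoid(h0 @ W.T + b_v) # visible reconstruction given hidden
v1 = sample(p_v1)
p_h1 = sigmoid(v1 @ W + b_h)   # hidden probabilities given reconstruction

lr = 0.1
# Gradient approximation: positive phase minus negative phase.
W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / len(v0)
b_v += lr * (v0 - v1).mean(axis=0)
b_h += lr * (p_h0 - p_h1).mean(axis=0)
```

Repeating this step over real data nudges the model's joint distribution toward the data distribution; note there is no output layer and no labels anywhere, consistent with the unsupervised description above.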
RBMs are used for:
1. Dimensionality reduction
2. Collaborative filtering for recommender systems
3. Feature learning
4. Topic modelling
5. Pre-training to improve the efficiency of supervised learning
For more details on RBM, please go through this and this article.