1. L1 and L2 Regularization
L1 and L2 are the most common regularization techniques in both machine learning and deep learning. Both modify the general cost function by adding an extra term known as the regularization penalty: L1 penalizes the sum of the absolute values of the weights, while L2 penalizes the sum of their squares.
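To make the idea of "adding a penalty term to the cost" concrete, here is a minimal sketch in plain Python. The function names, the `lam` strength parameter, and the example numbers are illustrative assumptions, not taken from any particular library. Scaling conventions also vary in practice (some formulations divide the penalty by 2 or by the number of samples), and this sketch uses the simplest form.

```python
def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weight values."""
    return lam * sum(w * w for w in weights)


def regularized_cost(data_loss, weights, lam, kind="l2"):
    """Add the chosen regularization penalty to the unregularized loss."""
    if kind == "l1":
        return data_loss + l1_penalty(weights, lam)
    if kind == "l2":
        return data_loss + l2_penalty(weights, lam)
    raise ValueError(f"unknown penalty kind: {kind!r}")


# Hypothetical values: an unregularized loss of 1.0 and four weights.
weights = [0.5, -2.0, 0.0, 1.0]
print(regularized_cost(1.0, weights, lam=0.1, kind="l1"))
print(regularized_cost(1.0, weights, lam=0.1, kind="l2"))
```

Because the penalty grows with the size of the weights, minimizing the combined cost pushes the model toward smaller weights. A larger `lam` makes that pressure stronger.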
For more details, please see my article on this topic.
2. Dropout
Dropout temporarily deactivates (ignores) randomly chosen neurons in the hidden layers of a network during training. Probabilistically dropping out nodes in the network is a simple and effective regularization method. The switched-off neurons neither contribute information nor learn anything on that training step, so the responsibility for reducing the error falls on the neurons that remain active.
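The mechanism can be sketched in a few lines of plain Python. This is an illustrative sketch, not the implementation of any specific framework: the function name, the `drop_prob` parameter, and the sample activations are assumptions. It uses the common "inverted dropout" variant, which rescales the surviving activations during training so that nothing needs to change at inference time.

```python
import random


def dropout(activations, drop_prob, training=True, rng=random):
    """Apply inverted dropout to a list of activations."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    # At inference time (or with drop_prob == 0) every neuron stays active.
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    out = []
    for a in activations:
        if rng.random() < keep_prob:
            # Kept neuron: rescale so the expected activation is unchanged.
            out.append(a / keep_prob)
        else:
            # Dropped neuron: contributes nothing on this training step.
            out.append(0.0)
    return out


# Hypothetical hidden-layer activations.
hidden = [0.2, 1.5, -0.7, 0.9, 0.3, -1.1]
print(dropout(hidden, drop_prob=0.5, rng=random.Random(0)))
print(dropout(hidden, drop_prob=0.5, training=False))
```

A fresh random mask is drawn on every call, so a different subset of neurons is switched off on each training step.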
For more details on dropout, please see my post on the topic.
3. Data Augmentation
Creating new data by making reasonable modifications to the existing data is called data augmentation. Let's take the MNIST dataset (handwritten digits) as an example. We can easily generate thousands of new, similar images by rotating, flipping, scaling, shifting, zooming in and out, cropping, or varying the color of the existing images. The modification must preserve the label: for digits, small shifts and rotations are safe, whereas a flip or a large rotation can turn a 6 into a 9.
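As a rough illustration of one such modification, the sketch below generates shifted copies of an image in plain Python. Everything in it is an assumption made for the example: the `shift` and `augment` helper names, the tiny 4x4 stand-in image (real MNIST images are 28x28), and the label value. In practice an image library or a framework's augmentation utilities would normally be used instead.

```python
import random


def shift(image, dx, dy, fill=0):
    """Shift a 2D image (list of rows) by dx columns and dy rows.

    Pixels moved outside the frame are discarded; vacated pixels get `fill`.
    """
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy
            if 0 <= new_x < width and 0 <= new_y < height:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def augment(image, label, n_copies, max_shift=1, rng=random):
    """Create n_copies randomly shifted variants that keep the original label."""
    samples = []
    for _ in range(n_copies):
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        samples.append((shift(image, dx, dy), label))
    return samples


# A tiny 4x4 stand-in for a digit image, with a hypothetical label.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
for new_image, label in augment(image, label=7, n_copies=3, rng=random.Random(0)):
    print(label, new_image)
```

Each generated sample reuses the original label, which is why only label-preserving transformations should be applied.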
Data augmentation is useful when the model is overfitting because too little training data is available.
In many deep learning settings, increasing the amount of data this way is not difficult, as the MNIST example above shows. In classical machine learning it is usually harder, because we need new labeled data, which is not easily available.
4. Early Stopping
When training a neural network, there comes a point at which the model stops generalizing and starts learning the noise in the training dataset. This leads to overfitting.
One approach to solving this problem is to treat the number of training epochs as a hyperparameter: train the model multiple times with different values, then select the number of epochs that results in the best performance.
The downside of this approach is that it requires multiple models to be trained and discarded. This can be computationally inefficient and time-consuming.
Another approach is early stopping. The model is evaluated on a validation dataset after each epoch. If its performance on the validation dataset starts to degrade (e.g. the loss begins to increase or the accuracy begins to decrease), the training process is stopped. The model as it stands when training is stopped is then used, and it is expected to generalize well.
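The stopping rule can be sketched as a small Python function that scans the per-epoch validation losses. This is a minimal sketch under stated assumptions: the function name, the `patience` and `min_delta` parameters, and the loss values are hypothetical. With `patience=1` it matches the rule described above (stop as soon as the validation loss fails to improve). Larger values tolerate a few noisy epochs, which is a common extension rather than something this article prescribes. In a real training loop the same check would run after each epoch instead of over a recorded list.

```python
def early_stopping_epoch(val_losses, patience=1, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once `patience` consecutive epochs fail to improve the
    best validation loss by more than `min_delta`. Epochs are 0-indexed.
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Validation loss never degraded for long enough: ran to the last epoch.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses recorded after each epoch.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.56, 0.60]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=2)
print(f"stopped at epoch {stop_epoch}, best validation loss at epoch {best_epoch}")
```

The sketch also reports the epoch with the best validation loss. If a checkpoint is saved at each improvement (an assumption beyond what is described above), the weights from that epoch can be restored instead of using the model from the final epoch.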