Overfitting in neural networks
Large neural networks trained on relatively small datasets can overfit the training data. An overfitted network performs poorly when the model is evaluated on new data. Dropout is an efficient way to mitigate this overfitting problem in neural networks.
What happens in dropout?
Dropout can be seen as temporarily deactivating, or ignoring, neurons in the hidden layers of a network. Probabilistically dropping out nodes in the network is a simple and effective regularization method. When some neurons in a layer are switched off, they neither contribute nor learn any information during that iteration, so the responsibility falls on the remaining active neurons to work harder and reduce the error.
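The idea of switching off neurons can be sketched with a random binary mask over a layer's activations. A minimal NumPy illustration (the activation values and the keep probability of 0.8 are made up for the example, not taken from a real network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Activations of one hidden layer for a batch of 4 examples, 6 units each
# (illustrative values, not from a trained network).
a = rng.standard_normal((4, 6))

keep_prob = 0.8                          # probability that a unit stays active
mask = rng.random(a.shape) < keep_prob   # True = keep, False = drop

a_dropped = a * mask                     # dropped units output exactly 0
```

Units where the mask is False output zero and receive no gradient on this iteration, which is exactly the "temporarily deactivated" behavior described above.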
Points to note about dropout
1. Dropout is implemented per layer in a neural network. It can be applied to hidden layers and the input layer, but not to the output layer.
We can use a different dropout probability for each layer. As mentioned previously, dropout should not be applied to the output layer, so the output layer always has keep_prob = 1, and the input layer typically has a high keep_prob such as 0.9 or 1.
If a hidden layer has keep_prob = 0.8, then on each iteration each unit has an 80% probability of being included and a 20% probability of being dropped out.
This probability is a hyperparameter, so we should carefully decide how many neurons we want to deactivate in a given hidden layer.
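Per-layer keep probabilities can be sketched as follows (the layer names and the specific values are assumptions chosen for illustration; over a large layer, the realized fraction of kept units is close to keep_prob):

```python
import numpy as np

rng = np.random.default_rng(1)

# One keep probability per layer: no dropout on the output layer,
# only light dropout on the input layer (values chosen for illustration).
keep_probs = {"input": 0.9, "hidden1": 0.8, "hidden2": 0.8, "output": 1.0}

def dropout_mask(shape, keep_prob, rng):
    """Bernoulli mask: each unit is kept with probability keep_prob."""
    return (rng.random(shape) < keep_prob).astype(float)

# For a batch of 32 examples with 10 units, the kept fraction is near 0.8.
x = rng.standard_normal((32, 10))
mask = dropout_mask(x.shape, keep_probs["hidden1"], rng)
kept_fraction = mask.mean()
```

With keep_prob = 1.0, as on the output layer, the mask keeps every unit, so that layer is effectively dropout-free.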
2. Dropout can be used with many types of layers, such as dense fully connected layers, convolutional layers, and recurrent layers such as the long short-term memory network (LSTM) layers.
3. Dropout should be applied only during the training phase, not at test time.
4. Dropout can be compared to ensemble techniques such as bagging. In bagging, each model in the ensemble is trained on a different subset of the data rather than all of it. Similarly, with dropout, each training iteration updates a different thinned subnetwork rather than the full network, so the network behaves like an ensemble of many smaller networks.
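The ensemble view above can be made concrete: each iteration samples a fresh mask, so, with overwhelming probability, two consecutive iterations train different thinned subnetworks (a toy sketch; the layer size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)

n_units, keep_prob = 100, 0.8

# Two consecutive training iterations draw independent masks, so they
# almost surely select different subsets of units, i.e. different
# "thinned" subnetworks of the same full network.
mask_iter1 = rng.random(n_units) < keep_prob
mask_iter2 = rng.random(n_units) < keep_prob
```

At test time the full network is used, which loosely corresponds to averaging the predictions of this implicit ensemble.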
Advantages of dropout
1. Reduces overfitting and hence improves the model's performance on unseen data
2. Improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark datasets.
3. Computationally cheap compared to other regularization methods.
Disadvantages of dropout
1. Introduces sparsity: if dropout is applied aggressively (a low keep_prob), the activations inside the hidden layers may become sparse, similar to what happens in sparse autoencoders.
2. Makes the training process noisy, since it forces nodes within a layer to probabilistically take on more or less responsibility for the inputs.