When we train deep neural networks, one of the biggest problems is overfitting. A neural network overfits when it performs very well on the training data but poorly on unseen data.
One way to reduce overfitting is to inject randomness into the network during training.
That’s what dropout does. Dropout is a regularization technique that adds randomness to the neural network during training.
The key idea here is to randomly drop neurons along with their connections from the neural network during training.
The dropout layer has a hyperparameter called the dropout rate p, which ranges between 0 and 1 (both included). It is the probability that any given neuron is dropped.
If n is the number of neurons in the hidden layer and p is the dropout rate, then on average only (1 - p) * n neurons are active in each training step.
Each neuron in the hidden layer is dropped independently at random with probability p.
Assume the dropout rate is p = 0.5 and there are 256 neurons in our hidden layer. Then in each training step roughly half the neurons are active: on average, (1 - p) * n = 0.5 * 256 = 128.
For each iteration, a different set of neurons will be dropped out.
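A minimal sketch of the training-time forward pass in plain Python, with no framework assumed. Here p is the dropout rate, i.e. the probability that a neuron is dropped; the helper names `dropout_mask` and `dropout_forward_train` are hypothetical, chosen only for illustration. Each call draws a fresh random mask, so a different set of neurons is dropped on every iteration, and the number of active neurons equals (1 - p) * n only on average.

```python
import random

def dropout_mask(n, p, rng):
    """Return a 0/1 mask of length n; each entry is 0 (dropped) with probability p."""
    return [0 if rng.random() < p else 1 for _ in range(n)]

def dropout_forward_train(activations, p, rng):
    """Apply dropout to a list of hidden-layer activations during training.

    Returns the masked activations and the mask (the mask is needed again
    during backpropagation).
    """
    mask = dropout_mask(len(activations), p, rng)
    out = [a * m for a, m in zip(activations, mask)]
    return out, mask

rng = random.Random(0)
hidden = [1.0] * 256  # hypothetical activations of a 256-neuron hidden layer
for step in range(3):
    # A new mask is sampled on every iteration.
    out, mask = dropout_forward_train(hidden, p=0.5, rng=rng)
    print(f"iteration {step}: {sum(mask)} of {len(mask)} neurons active")
```

With p = 0.5 and 256 neurons, each printed count should land near 128 but vary from one iteration to the next. Real frameworks do the same thing with vectorized tensor operations rather than Python lists.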
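Now, what happens to the dropped neurons during backpropagation?
It’s simple: gradients flow only through the neurons that were active during the forward pass. A dropped neuron’s output was set to zero, so the weights connected to it receive zero gradient, and only the weights of the active neurons are updated in that iteration.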
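To see why dropped neurons get no update, here is a hand-written sketch of one plain SGD step on a hypothetical toy network: one input, a hidden layer of linear neurons, one output, and a squared-error loss. The function `train_step` and all the numbers are illustrative assumptions, not a framework API. Because the dropout mask multiplies each hidden neuron’s contribution in the forward pass, the same mask multiplies that neuron’s gradients in the backward pass.

```python
def train_step(w, v, x, target, mask, lr=0.1):
    """One SGD step on a toy 1-input, n-hidden, 1-output linear network with dropout.

    w[j]: weight from the input to hidden neuron j
    v[j]: weight from hidden neuron j to the output
    mask[j]: 1 if neuron j is active in this iteration, 0 if it was dropped
    """
    h = [wj * x for wj in w]                                 # hidden activations
    y = sum(vj * mj * hj for vj, mj, hj in zip(v, mask, h))  # only active neurons contribute
    err = y - target                                         # dL/dy for L = 0.5 * (y - target) ** 2
    # The mask appears in every gradient that passes through neuron j,
    # so a dropped neuron (mask[j] == 0) gets zero gradient for both of its weights.
    grad_v = [err * mj * hj for mj, hj in zip(mask, h)]
    grad_w = [err * vj * mj * x for vj, mj in zip(v, mask)]
    new_w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    new_v = [vj - lr * gj for vj, gj in zip(v, grad_v)]
    return new_w, new_v

w = [0.5, -0.3, 0.8, 0.1]
v = [0.2, 0.4, -0.6, 0.9]
mask = [1, 0, 1, 0]  # neurons 1 and 3 were dropped in the forward pass
new_w, new_v = train_step(w, v, x=1.0, target=1.0, mask=mask)
print(new_w)
print(new_v)
```

In the printed lists, the entries for neurons 1 and 3 are unchanged while the others move. With optimizers that use momentum or weight decay, a dropped neuron’s weights can still change slightly in that step, because those updates do not depend only on the current gradient.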
However, at test time we pass all the features of the query point through the full network.
At that point we do not randomly switch off any neurons, because we want to use the full representational power of the model and get deterministic predictions.
So, the idea at test time is straightforward.
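During training, a neuron is present with probability 1 - p, where p is the dropout rate. Assume that at the end of training its outgoing weight is w. At test time, all the neurons are always present, but their outgoing weights are scaled to (1 - p) * w.
Why multiply w by (1 - p)? During training, that neuron was present only a fraction 1 - p of the time, so on average the next layer received only (1 - p) times its full contribution. Scaling the weight keeps the expected input to the next layer at test time equal to what it was during training.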
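The scaling argument can be checked numerically with a small Monte Carlo sketch, assuming a single output unit that sums weighted inputs from a dropout layer. Here p is the dropout rate (the probability of dropping a neuron), so each neuron is kept with probability 1 - p. The activations, weights and function names are hypothetical. Averaged over many random masks, the training-time output should match the test-time output computed with every neuron present and weights scaled to (1 - p) * w.

```python
import random

def layer_output_train(h, w, p, rng):
    """Training-time output of one unit: each input neuron is kept with probability 1 - p."""
    return sum(wi * hi for wi, hi in zip(w, h) if rng.random() >= p)

def layer_output_test(h, w, p):
    """Test-time output: every neuron is present, with weights scaled by (1 - p)."""
    return sum((1 - p) * wi * hi for wi, hi in zip(w, h))

rng = random.Random(42)
h = [0.9, 0.1, 0.5, 0.7, 0.3]   # hypothetical hidden activations
w = [0.4, -0.2, 0.6, 0.1, 0.8]  # hypothetical trained weights
p = 0.5

trials = 50_000
# Average the random training-time output over many independent dropout masks.
avg_train = sum(layer_output_train(h, w, p, rng) for _ in range(trials)) / trials
print(f"average training-time output: {avg_train:.3f}")
print(f"test-time output with (1 - p) * w: {layer_output_test(h, w, p):.3f}")
```

Both printed values should be close to 0.475 here. Many frameworks implement the equivalent "inverted dropout" instead: PyTorch’s `torch.nn.Dropout`, for example, scales the kept activations by 1 / (1 - p) during training, so no rescaling is needed at test time.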
Dropout acts as regularization and makes the network less prone to overfitting.
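Larger, deeper networks, which are more prone to overfitting, can often benefit from a higher dropout rate, since a higher rate means stronger regularization. A rate that is too high, however, removes so much information during training that the network can underfit.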
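I hope this post has given you a basic understanding of how dropout works. If you want to delve deeper into this topic, I suggest reading the original paper, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting" by Srivastava et al. (2014). Note that the paper uses p for the probability of keeping a unit rather than dropping it.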