One of the most common problems of training a deep neural network is that it overfits.
Overfitting occurs when the network learns specific patterns in the training data and is unable to generalize well over new observations.
In this article, we’ll discuss some regularization techniques for deep learning that are specifically designed to control overfitting and help our model work better on unseen data.
The first technique we are going to discuss is early stopping. It is perhaps the simplest regularization strategy.
As the name suggests, in early stopping we stop the training early, before the model has a chance to overfit.
For instance, our model might keep reducing its loss in the training data and keep increasing its loss in the validation data. This is a sign of overfitting.
If you don’t know what a callback is: callbacks are simply functions executed during the training process that return information from the training algorithm. I have written articles explaining some of the commonly used ones; to learn more, you can read my articles Keras Callbacks and Keras Custom Callbacks.
Below is the signature of the early stopping callback:
keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=0)
- monitor: quantity to be monitored
- min_delta: minimum change in the monitored quantity to qualify as an improvement
- patience: number of epochs with no improvement in the monitored quantity after which training will be stopped
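As a quick sketch of how the arguments above fit together (the metric and patience values here are illustrative, not from the article):

```python
from keras.callbacks import EarlyStopping

# Stop training once val_loss has failed to improve by at least
# 0.001 for 5 consecutive epochs (values chosen for illustration).
early_stop = EarlyStopping(monitor='val_loss', min_delta=0.001, patience=5)

# The callback is then passed to fit, e.g.:
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=100, callbacks=[early_stop])
```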
Another common regularization technique is to inject Gaussian noise into the network.
A common approach is to add noise to the input data of the network during the training procedure.
However, in some situations adding noise to the hidden units or to the network weights also leads to improved generalization performance, thus reducing the effect of overfitting.
In Keras, Gaussian noise can be introduced into the network with the GaussianNoise layer.
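A minimal sketch of using Keras's GaussianNoise layer, both on the inputs and on a hidden layer (the noise standard deviation of 0.1 and the layer sizes are illustrative choices, not from the article):

```python
from keras.models import Sequential
from keras.layers import Dense, GaussianNoise

# GaussianNoise adds zero-centred noise with the given standard
# deviation; it is only active during training, not at inference.
model = Sequential()
model.add(GaussianNoise(stddev=0.1, input_shape=(20,)))  # noise on the inputs
model.add(Dense(64, activation='relu'))
model.add(GaussianNoise(stddev=0.1))                     # noise on hidden activations
model.add(Dense(1, activation='sigmoid'))
```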
Dropout is an effective way of regularizing neural networks; it can be applied to the output of some of the network layers to avoid overfitting.
The key idea here is to randomly drop some proportion of neurons along with their connections from the neural network during training.
The dropout layer has a parameter called the dropout rate (p), which ranges between 0 and 1 (both included).
If n is the number of neurons in the hidden layer and p is the dropout rate, then on average p * n neurons will be dropped, leaving (1 - p) * n neurons active at any given time.
We randomly drop the neurons in the hidden layer based on the dropout rate.
Assume the dropout rate p = 0.5 and 256 neurons in our hidden layer. This means at any given time, on average half the neurons will be dropped and half will remain active, that is, (1 - p) * n = 0.5 * 256 = 128.
For each iteration, a different set of neurons will be dropped out.
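The worked example above can be sketched in Keras as follows (the input shape and output layer are illustrative, not from the article):

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

# A hidden layer of 256 units followed by dropout with rate p = 0.5,
# so on average 128 units are dropped at each training step.
# Dropout is automatically disabled at inference time.
model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(784,)))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
```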
Neural networks require a lot of data to train, and our model might start to overfit if the training set is too small.
Data augmentation is a regularization technique that aims to combat this by increasing the size of the training set artificially.
Data augmentation depends on the type of data: for some types, such as images, artificial examples are easy to create, while for others, such as text, it is not as easy.
In the case of image data, we can artificially create new images by slightly rotating, resizing, or skewing the image, flipping it horizontally or vertically, and so on.
In this article, we’ll see how to augment image data by using Keras. Keras makes it very easy for us to do image augmentation using the ImageDataGenerator class.
keras.preprocessing.image.ImageDataGenerator(featurewise_center=False, samplewise_center=False,
    featurewise_std_normalization=False, samplewise_std_normalization=False,
    zca_whitening=False, zca_epsilon=1e-06, rotation_range=0,
    width_shift_range=0.0, height_shift_range=0.0, brightness_range=None,
    shear_range=0.0, zoom_range=0.0, channel_shift_range=0.0,
    fill_mode='nearest', cval=0.0, horizontal_flip=False, vertical_flip=False,
    rescale=None, preprocessing_function=None, data_format='channels_last',
    validation_split=0.0, dtype=None)
To learn more about these arguments visit Keras Documentation.
L1 AND L2 REGULARIZATION:
These are, by far, the most common regularization techniques. The basic idea is that, during training, we impose certain constraints on the model weights to control how much they can grow or shrink.
We do this by adding another term, called the regularization term, to the cost function.
λ is a regularization parameter that adjusts how much weight we give to the regularization term. It is a hyperparameter whose value needs to be tuned for better results.
L1 regularizer minimizes the sum of absolute values of the weights.
The L1 regularizer drives many weights very close to zero, making the network depend only on the essential inputs and not on noisy ones.
The L2 regularizer is also known as weight decay as it forces the weights of the network to decay towards zero, but not precisely zero like the L1 regularizer.
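Written out explicitly (a standard formulation, with λ the regularization parameter from above, J(w) the original cost function, and w_i the network weights), the two penalized cost functions are:

```latex
J_{L1}(w) = J(w) + \lambda \sum_i |w_i|
\qquad
J_{L2}(w) = J(w) + \lambda \sum_i w_i^2
```

The absolute-value penalty is what pushes L1 weights all the way to zero, while the squared penalty only shrinks L2 weights towards zero.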
The snippet of code below demonstrates how we can add L2 regularization to our network.
from keras import regularizers
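Building on that import, a minimal sketch of attaching the regularizer to a layer (the penalty value 0.01 and the layer sizes are illustrative choices):

```python
from keras import regularizers
from keras.models import Sequential
from keras.layers import Dense

# kernel_regularizer adds lambda * sum(w^2) over this layer's weights
# to the loss; 0.01 here is an illustrative value of lambda.
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(20,),
                kernel_regularizer=regularizers.l2(0.01)))
model.add(Dense(1, activation='sigmoid'))
```

Swapping in regularizers.l1(0.01) gives L1 regularization instead.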
While training deep neural networks, the distribution of each layer’s inputs can change as the parameters of the previous layers change. This phenomenon is known as internal covariate shift.
We can mitigate this problem by normalizing our data in mini-batches, using the batch mean and variance.
Batch normalization normalizes the output from a layer to zero mean and unit variance. In doing so, the per-batch input distribution of the data has less effect on the network.
Batch normalization also acts as a regularizer that helps prevent the model from overfitting. For this reason, it is sometimes used in place of dropout layers.
To create a batch normalization layer in Keras, you can use the BatchNormalization layer.
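A minimal sketch of placing BatchNormalization in a model (the layer sizes and placement between the linear output and the activation are illustrative, common-practice choices):

```python
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Activation

# BatchNormalization is typically placed between a layer's linear
# output and its activation function.
model = Sequential()
model.add(Dense(64, input_shape=(20,)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dense(10, activation='softmax'))
```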
Now let’s see an example of using some of these regularization techniques.
Let’s get started.
We’ll start by importing all the necessary modules.
import numpy as np
import keras.backend as K
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Dropout
from keras.layers import Conv2D, BatchNormalization, MaxPooling2D, ReLU
from keras.datasets import fashion_mnist
from keras.optimizers import Adam
Next, let’s build our CNN model.
batch_size = 128
num_classes = 10
epochs = 50
img_rows, img_cols = 28, 28
#load the data and normalize the inputs
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()
if K.image_data_format() == 'channels_first':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# convert class vectors to binary class matrices
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(2, 2), padding='same', activation='relu', input_shape=input_shape))
model.add(Conv2D(filters=64, kernel_size=(2, 2), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=0.0001, decay=1e-5), metrics=['accuracy'])
Now for image augmentation let’s define the image data generator.
datagen = ImageDataGenerator(width_shift_range=0.1, height_shift_range=0.1,
                             rotation_range=40, zoom_range=0.2)
history = model.fit_generator(datagen.flow(X_train, y_train, batch_size=batch_size),
                              steps_per_epoch=len(X_train) // batch_size,
                              epochs=epochs, validation_data=(X_test, y_test))
score = model.evaluate(X_test, y_test, batch_size=batch_size)
Regularization is a technique that prevents overfitting and helps our model to work better on unseen data.
In this tutorial, we have discussed various regularization techniques for deep learning.
EARLY STOPPING: As the name suggests, we stop the training early, before the model has a chance to overfit.
INJECT NOISE: In this technique, we’ll add Gaussian noise to the network.
DROPOUT: The key idea here is to randomly drop some proportion of neurons along with their connections from the neural network during training.
DATA AUGMENTATION: In this technique, we increase the size of the training set artificially.
L1 AND L2 REGULARIZER: Impose certain constraints on the model weights to control how much they can grow or shrink during training.
BATCH NORMALIZATION: Normalizing the output from a layer with zero mean and unit variance.
Complete code for this tutorial can be found in this Github Repo.