Saturday 6 July 2019

Fine-tune VGG16 model for image classification in Keras

Keras provides many pre-trained, general-purpose deep learning models that we can fine-tune to our requirements, so we don't need to build a complex model from scratch. In my last article, we built a CNN model from scratch for image classification. Instead, we can fine-tune an existing, well-trained and widely accepted CNN model, which saves a lot of effort, time and money.

VGG16 is a proven model for image classification, trained to recognize 1000 classes of images. Keras already includes this model. We will import it and fine-tune it to classify images of dogs and cats (only 2 classes instead of 1000).

You can download my Jupyter notebook containing below code from here.

Step 1: Import the required libraries

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

Step 2: Create directory structure to contain images

We will create a directory structure which will contain the images of dogs and cats.

I have created a directory "cats_and_dogs". Under this directory, I have created 3 other directories "test", "train" and "valid". All these 3 directories contain "cat" and "dog" directories. 

1. The "cat" and "dog" directories under "test" contain 5 images each, for a total of 10 images for testing.

2. The "cat" and "dog" directories under "train" contain 20 images each, for a total of 40 images for training.

3. The "cat" and "dog" directories under "valid" contain 8 images each, for a total of 16 images for validation.
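The layout described above can be created in a few lines. This is only a minimal sketch: the helper name make_dataset_dirs is made up for illustration, and it works in a temporary directory rather than in "C:/cats_and_dogs" so that it does not touch real data.

```python
import tempfile
from pathlib import Path


def make_dataset_dirs(root):
    """Create <root>/{train,valid,test}/{cat,dog} and return the created paths."""
    created = []
    for split in ("train", "valid", "test"):
        for label in ("cat", "dog"):
            path = Path(root) / split / label
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created


# A temporary directory stands in for the real dataset location.
with tempfile.TemporaryDirectory() as tmp:
    dirs = make_dataset_dirs(Path(tmp) / "cats_and_dogs")
    relative = sorted(d.relative_to(tmp).as_posix() for d in dirs)
    all_exist = all(d.is_dir() for d in dirs)
    print(relative)
```

After creating the directories, copy the cat and dog images into the matching folders.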

Step 3: Data Preparation

train_path = 'C:/cats_and_dogs/train'
valid_path = 'C:/cats_and_dogs/valid'
test_path = 'C:/cats_and_dogs/test'

train_batches = ImageDataGenerator().flow_from_directory(train_path, target_size=(224,224), classes=['dog','cat'], batch_size=10)

valid_batches = ImageDataGenerator().flow_from_directory(valid_path, target_size=(224,224), classes=['dog','cat'], batch_size=4)

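# shuffle=False keeps the test images in a fixed order, so the labels and predictions line up in step 9
test_batches = ImageDataGenerator().flow_from_directory(test_path, target_size=(224,224), classes=['dog','cat'], batch_size=10, shuffle=False)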

Found 40 images belonging to 2 classes.
Found 16 images belonging to 2 classes.
Found 10 images belonging to 2 classes.

In the above code, each image is resized to 224x224 pixels and labelled as "dog" or "cat" based on the directory it sits in. The output confirms that we have 40 images for training, 16 images for validation and 10 images for testing, as mentioned in step 2.

Step 4: Print the images

Let's display some of the images which we prepared in step 3. The following is the standard code to plot the images (copied from the Keras documentation):

def plots(ims, figsize=(12,6), rows=1, interp=False, titles=None):
    if type(ims[0]) is np.ndarray:
        ims = np.array(ims).astype(np.uint8)
        if (ims.shape[-1] != 3):
            ims = ims.transpose((0,2,3,1))
    f = plt.figure(figsize=figsize)
    cols = len(ims)//rows if len(ims) % 2 == 0 else len(ims)//rows + 1
    for i in range(len(ims)):
        sp = f.add_subplot(rows, cols, i+1)
        if titles is not None:
            sp.set_title(titles[i], fontsize=16)
        plt.imshow(ims[i], interpolation=None if interp else 'none')

Now, let's plot the first batch of training images:

imgs, labels = next(train_batches)
plots(imgs, titles=labels)


We can see the resized images of 10 cats and dogs. If you run the above code again, it will fetch the next 10 images from the training dataset, as we are using a batch size of 10 for training images.

Step 5: Load and analyze VGG16 model

vgg16_model = keras.applications.vgg16.VGG16()
vgg16_model.summary()
type(vgg16_model)

In the above code, the first line loads the VGG16 model, which may take some time. The second line prints a summary of the existing model, which has many convolutional, pooling and dense layers. The third line shows that this model is of type "Model". In the next step, we will create a model of type "Sequential".

Step 6: Fine-tune VGG16 model

Following are the steps involved in fine-tuning a model:

1. Copy all the hidden layers in a new model
2. Remove output layer
3. Freeze the hidden layers
4. Add custom output layer

For more details on fine-tuning a model, please visit this post.

Let's perform all the above steps.

model = Sequential()
for layer in vgg16_model.layers[:-1]:
    model.add(layer)

In the above code, we created a new sequential model and copied every layer of the VGG16 model except the last one, which is the output layer. We do this because we want a custom output layer with only two nodes, as our image classification problem has only two classes (cats and dogs).

Now, if we execute the following statement, we will see a replica of the existing VGG16 model, minus the output layer.

model.summary()


Now, let's freeze the hidden layers, as we don't want to change any of the weights or biases associated with them. We want to use these layers as they are, since they are already well trained for image classification.

for layer in model.layers:
    layer.trainable = False

Now, add a custom output layer with only two nodes and softmax as activation function.

model.add(Dense(2, activation='softmax'))

Now our new fine-tuned model is ready. Let's train it with new data and then predict from it.
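To get a feel for how little is left to train after freezing, we can count the parameters of the new output layer by hand. This is a rough sketch that assumes the layer feeding the new output is VGG16's 4096-unit fully connected layer; the helper name dense_param_count is made up for illustration.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


# Assumption: the last hidden layer of VGG16 has 4096 units.
last_hidden_units = 4096

new_head_params = dense_param_count(last_hidden_units, 2)          # our 2-class output
original_head_params = dense_param_count(last_hidden_units, 1000)  # the 1000-class output we dropped

print(new_head_params, original_head_params)
```

Under that assumption, only the weights and biases of the two-node layer are updated during training, and everything copied from VGG16 stays fixed.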

Step 7: Compile the model

model.compile(Adam(lr=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])

We use Adam as the optimizer and categorical cross-entropy as the loss function.
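As a rough illustration of what this loss measures, here is a plain-Python sketch of categorical cross-entropy for a single one-hot label. It is not the Keras implementation, and the example probabilities are made up.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    # eps guards against log(0) when a predicted probability is exactly zero.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# Label [1, 0] with a confident correct and a confident wrong prediction.
loss_when_right = categorical_crossentropy([1.0, 0.0], [0.9, 0.1])
loss_when_wrong = categorical_crossentropy([1.0, 0.0], [0.1, 0.9])

print(round(loss_when_right, 4), round(loss_when_wrong, 4))
```

The loss is small when the model puts high probability on the correct class and grows quickly when it is confidently wrong.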

Step 8: Train the model

model.fit_generator(train_batches, steps_per_epoch=4, validation_data=valid_batches, validation_steps=4, epochs=5, verbose=2)

Executing this step will take some time as we are using 5 epochs.
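The values of steps_per_epoch and validation_steps follow from the dataset and batch sizes used in step 3. A small sketch of the arithmetic (the helper name is made up for illustration):

```python
import math


def steps_for_one_pass(n_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(n_images / batch_size)


train_steps = steps_for_one_pass(40, 10)  # 40 training images, batch size 10
valid_steps = steps_for_one_pass(16, 4)   # 16 validation images, batch size 4

print(train_steps, valid_steps)
```

If you change the number of images or the batch size, these two arguments should change with them.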

Step 9: Predict from the model

Let's plot the first batch of test images.

test_imgs, test_labels = next(test_batches)
plots(test_imgs, titles=test_labels)

From the output, we can see that the labels are shown in one-hot form, such as [0. 1.] and [1. 0.]. Let's reduce each label to a single value, 0 or 1.

test_labels = test_labels[:,0]
test_labels
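Because we passed classes=['dog','cat'], "dog" gets index 0 and "cat" gets index 1, so taking the first column gives 1 for a dog and 0 for a cat. Here is a small plain-Python sketch of that mapping, with a made-up batch of class names:

```python
classes = ["dog", "cat"]  # same order as passed to flow_from_directory


def one_hot(name):
    """One-hot vector for a class name, following the order in `classes`."""
    return [1.0 if name == c else 0.0 for c in classes]


batch = ["dog", "cat", "cat", "dog"]  # hypothetical batch of images
labels = [one_hot(name) for name in batch]
first_column = [row[0] for row in labels]

print(labels)
print(first_column)
```

The same column is used for the predictions in the next lines, so both arrays end up meaning "probability of dog".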

Finally, make the predictions.

predictions = model.predict_generator(test_batches, steps=1, verbose=0)

The predictions are returned as probabilities. Let's round them off.

rounded_predictions = np.round(predictions[:,0])

Step 10: Check the accuracy

confusionMatrix = confusion_matrix(test_labels, rounded_predictions)
accuracyScore = accuracy_score(test_labels, rounded_predictions)
classificationReport = classification_report(test_labels, rounded_predictions)
print(accuracyScore * 100)
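To see what the confusion matrix and accuracy score contain, here is a hand-rolled sketch for binary labels. The labels and predictions are made up, and the helper names are not part of scikit-learn.

```python
def confusion_counts(y_true, y_pred):
    """Return a 2x2 matrix: rows are true labels, columns are predicted labels."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[int(t)][int(p)] += 1
    return matrix


def accuracy(matrix):
    """Share of predictions on the diagonal of the matrix."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels and rounded predictions for 10 test images.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

matrix = confusion_counts(y_true, y_pred)
acc = accuracy(matrix)
print(matrix, acc * 100)
```

The diagonal holds the correct predictions, and the off-diagonal cells show which class was confused with which.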

Please note that we won't get good accuracy with such a small dataset. We need thousands of images to train the model well. We can use data augmentation to increase the amount of data, and you can download thousands of images of cats and dogs from Kaggle to train this model.

Building a CNN model in Keras using MNIST dataset

We will implement a CNN in Keras using the MNIST dataset. To know more about CNNs, you can visit this post. We can download the MNIST dataset through Keras. It contains images of handwritten digits from 0 to 9 and is divided into 60,000 training images and 10,000 test images.

I would recommend building a simple neural network before jumping to a CNN. You can visit this post to build a simple neural network with Keras. You can download my Jupyter notebook containing the following CNN code from here.

Step 1: Import required libraries

from keras.datasets import mnist
from keras.utils import to_categorical

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.optimizers import Adam

import matplotlib.pyplot as plt
%matplotlib inline

Step 2: Load MNIST dataset from Keras

(X_train, y_train), (X_test, y_test) = mnist.load_data()

The above line downloads the MNIST dataset. Now, let's print the shape of the data.

X_train.shape, X_test.shape, y_train.shape, y_test.shape

Output: ((60000, 28, 28), (10000, 28, 28), (60000,), (10000,))

It is clear from the above output that each image in the MNIST dataset is 28 x 28 pixels, so the shape of X_train is (60000, 28, 28), where 60,000 is the number of samples.

We can visualize the images using the matplotlib library. Let's see the first image.

plt.imshow(X_train[0], cmap='gray')

Step 3: Reshape the dataset

We have to reshape X_train and X_test from 3 dimensions to 4 dimensions, because the convolutional layers in Keras expect a four-dimensional input of the form (samples, height, width, channels).

X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)

The value of "X_train.shape[0]" is 60,000, which is the number of images in the training data. The two 28s are the image height and width, and 1 is the number of channels.

The number of channels is 1 for a grayscale image and 3 for an RGB image. Here the last number is 1, which signifies that the images are grayscale.
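To make the extra dimension concrete, here is a plain-Python sketch using nested lists instead of NumPy arrays. The two tiny 3x3 "images" are made up; the reshape call above does the equivalent on the real data.

```python
def add_channel_axis(images):
    """Turn (samples, height, width) nested lists into (samples, height, width, 1)."""
    return [[[[pixel] for pixel in row] for row in image] for image in images]


def shape(nested):
    """Shape of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Two hypothetical 3x3 grayscale images.
images = [
    [[0, 128, 255], [0, 128, 255], [0, 128, 255]],
    [[10, 20, 30], [10, 20, 30], [10, 20, 30]],
]
reshaped = add_channel_axis(images)

print(shape(images), shape(reshaped))
```

Every pixel value is unchanged; it is only wrapped in one more level so that each pixel carries a list of channel values.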

Now, let's print the shape of the data again.

X_train.shape, X_test.shape, y_train.shape, y_test.shape

Output: ((60000, 28, 28, 1), (10000, 28, 28, 1), (60000,), (10000,))

Step 4: Convert the image pixels in the range between 0 and 1

Let's print the first image:

X_train[0]

You will notice that it contains values ranging from 0 to 255. Neural networks generally train better when the inputs are small and on a similar scale, so we will scale this data to the range 0 to 1.

First convert X_train and X_test to float data type.
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

Now, divide all the values by 255.
X_train /= 255.0
X_test /= 255.0

Print the first image again:

X_train[0]

You will notice that all the pixel values now range from 0 to 1, which is well suited to our neural network.
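The scaling itself is a single division per pixel. A minimal sketch on a made-up row of pixel values:

```python
def scale_pixels(pixels, max_value=255.0):
    """Map pixel intensities from [0, max_value] to [0, 1]."""
    return [p / max_value for p in pixels]


row = [0, 51, 128, 255]  # hypothetical pixel intensities
scaled = scale_pixels(row)

print(scaled)
```

Black (0) stays 0, white (255) becomes 1, and everything else lands in between.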

Step 5: Convert labels into categorical variables (one-hot encoding)

Our labels range from 0 to 9. We need to one-hot encode them, so that each label becomes a vector of ten values in which the position of the digit is 1 and all other positions are 0.

y_train, y_test


y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

y_train[0]

array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=float32)

Note how the label of the first image, "5", is transformed into one-hot encoded format. You can try the same with other images and look at their one-hot encoded representations.
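One-hot encoding is easy to reproduce by hand. The function below is only an illustrative stand-in for what to_categorical does to a single label, not the Keras implementation.

```python
def to_one_hot(label, num_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


encoded = to_one_hot(5)
decoded = encoded.index(1.0)  # position of the 1 gives the digit back

print(encoded, decoded)
```

The position of the 1 is the digit itself, which is why the sixth entry is set for the label "5".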

Step 6: Create a CNN model

model = Sequential()
model.add(Conv2D(32, kernel_size=(5,5), input_shape=(28,28,1), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(5,5), padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())  # flatten the 7x7x64 feature maps into a 3136-value vector
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.2))  # drop 20% of the dense layer's outputs during training
model.add(Dense(10, activation='softmax'))

Explanation of above CNN model:

1. We have created a sequential model which is an in-built model in Keras. We just have to add the layers in this model as per our requirements. In this case, we have added 2 convolutional layers, 2 pooling layers, 1 flatten layer, 2 dense layers and 1 dropout layer. 

2. We have used 32 filters with size 5X5 each in first convolutional layer and then 64 filters in the second convolutional layer.

3. We are using zero padding in each convolutional layer.

4. After each convolutional layer, we add a pooling layer with a pool size of 2X2.

5. We use the ReLU activation function in all hidden layers and softmax in the output layer. To know more about activation functions, please visit this and this post.

6. We can also specify the strides attribute for convolutional and pooling layers. By default it is (1,1) for convolutional layers, while pooling layers default to a stride equal to the pool size.

7. We have a flatten layer just before the first dense layer. It converts the stack of 2D feature maps into a 1D vector that the fully connected layers can work with.

8. After that, we use a fully connected layer with 1024 neurons.

9. Then we use a regularization layer called Dropout. It is configured to randomly exclude 20% of the neurons in the layer during training. Because the network cannot rely on any single neuron, it has to spread what it learns across many neurons, which reduces overfitting. To know more about dropout, please visit this post.

10. We add a dense layer at the end, which is used for class prediction (0–9). That is why it has 10 neurons. It is also called the output layer, and it uses the softmax activation function instead of ReLU.
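To illustrate the dropout idea from the list above, here is a plain-Python sketch of "inverted" dropout as applied during training: each value is zeroed with probability 0.2 and the survivors are scaled up by 1/0.8 so the expected sum stays the same. The function name and the all-ones input are made up for illustration.

```python
import random


def dropout(activations, rate, rng):
    """Zero each value with probability `rate` and rescale the values that are kept."""
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 1000
dropped = dropout(activations, rate=0.2, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)
print(round(zero_fraction, 2))
```

Roughly a fifth of the values come out as zero on each pass, and a different fifth is chosen every time. At prediction time no values are dropped.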

Step 7: Model Summary

model.summary()


Layer (type)                 Output Shape              Param #  
conv2d_1 (Conv2D)            (None, 28, 28, 32)        832      
max_pooling2d_1 (MaxPooling2 (None, 14, 14, 32)        0        
conv2d_2 (Conv2D)            (None, 14, 14, 64)        51264    
max_pooling2d_2 (MaxPooling2 (None, 7, 7, 64)          0        
flatten_1 (Flatten)          (None, 3136)              0        
dense_1 (Dense)              (None, 1024)              3212288  
dropout_1 (Dropout)          (None, 1024)              0        
dense_2 (Dense)              (None, 10)                10250    
Total params: 3,274,634
Trainable params: 3,274,634
Non-trainable params: 0
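The numbers in the summary can be checked by hand. The sketch below recomputes the parameter counts from the layer settings in step 6; the helper names are made up for illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """One kernel per (input channel, filter) pair, plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(n_inputs, n_units):
    """One weight per (input, unit) pair, plus one bias per unit."""
    return n_inputs * n_units + n_units


size = 28    # 'same' padding keeps the convolution output at 28x28
size //= 2   # first 2x2 pooling: 28 -> 14
size //= 2   # second 2x2 pooling: 14 -> 7
flat = size * size * 64  # length of the flattened vector

counts = [
    conv2d_params(5, 5, 1, 32),   # conv2d_1
    conv2d_params(5, 5, 32, 64),  # conv2d_2
    dense_params(flat, 1024),     # dense_1
    dense_params(1024, 10),       # dense_2
]
total = sum(counts)

print(flat, counts, total)
```

Pooling, flatten and dropout layers have no weights, which is why they show 0 parameters in the summary. Almost all of the parameters sit in the first dense layer.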

Step 8: Compile the model

model.compile(Adam(lr=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])

We are using the Adam optimizer with a learning rate of 0.0001 and categorical cross-entropy as the loss function. If you omit the learning rate, Keras uses Adam's default value of 0.001.

Step 9: Train the model

history_cnn = model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=128, epochs=5, verbose=2)

We are using batch size of 128 and 5 epochs.

Train on 60000 samples, validate on 10000 samples
Epoch 1/5
 - 252s - loss: 0.4103 - acc: 0.8930 - val_loss: 0.1250 - val_acc: 0.9622
Epoch 2/5
 - 253s - loss: 0.1059 - acc: 0.9691 - val_loss: 0.0725 - val_acc: 0.9778
Epoch 3/5
 - 276s - loss: 0.0704 - acc: 0.9785 - val_loss: 0.0507 - val_acc: 0.9836
Epoch 4/5
 - 310s - loss: 0.0542 - acc: 0.9834 - val_loss: 0.0403 - val_acc: 0.9870
Epoch 5/5
 - 371s - loss: 0.0452 - acc: 0.9864 - val_loss: 0.0352 - val_acc: 0.9887

Step 10: Print plots

Accuracy Plot

Validation Accuracy Plot

Loss Plot

Step 11: Evaluate the model

score = model.evaluate(X_test, y_test, verbose=0)
print('Loss:', score[0])
print('Accuracy:', score[1])

[0.03515956207765266, 0.9887]
Loss: 0.03515956207765266
Accuracy: 0.9887

I got an accuracy of about 98.87%. You can play around with different hyperparameters like learning rate, batch size and number of epochs, or try adding more convolutional and pooling layers, changing the number and size of filters, changing the size of strides etc.

Friday 5 July 2019

All about Keras Framework in Deep Learning

Keras is a widely used framework for implementing neural networks in deep learning. It is easy to use and understand and has strong community support. The points below illustrate some strengths and limitations of the Keras framework:

1. High Level Framework: Keras is an open source and high level neural network framework, written in Python.

2. Supports Multiple Backends: Keras uses TensorFlow as backend by default but you can also configure it to use Theano or CNTK as backend.

3. Cross Platform and Easy Model Deployment: Keras can run on all major operating systems. It supports a lot of devices and platforms, so we can deploy Keras models on iOS with CoreML, Android with TensorFlow Android, web browsers with .js support, cloud engines, Raspberry Pi etc.

4. Multiple CPU and GPU compatible: Keras has built-in support for data parallelism, so it can process large volumes of data and reduce the time needed to train a model.

5. Easy to use and understand: Keras is easy to use and understand. You can implement complex neural networks with a few lines of code. You don't need to understand the low level details, as Keras is already a wrapper around complex low level frameworks like TensorFlow, Theano or CNTK. So, it is a boon for beginners.

Related links
Create a simple sequential model in Keras
Create a CNN model in Keras

6. Pre-trained models: Keras contains a lot of pre-trained neural network models for our general purpose requirements. For example, for image classification, we don't need to create a CNN model from scratch. We can fine-tune an existing and well trained model called VGG16 for this purpose. Similarly, there are a lot of other models available with Keras like InceptionV3, ResNet, MobileNet, Xception, InceptionResNetV2 etc. which we just need to fine-tune as per our needs.

Related links:
What is fine-tuning?
Fine-tuning VGG16 model

7. Great community: As mentioned earlier, Keras has great community support. You can easily find a lot of tutorials, detailed articles on various concepts, solved examples and a lot more. Keras is also very well documented.

Limitations of Keras

As stated in points 1 and 2, Keras is only a high level API which uses other frameworks like TensorFlow, Theano and CNTK to perform low level tasks. If you want to do research or write your own custom algorithm in a deep learning project, you should use TensorFlow directly instead.

Tuesday 2 July 2019

Building a simple sequential neural network with dense layers in Keras

Let's understand how we can create a simple neural network in Keras. We will create a simple sequential model with dense layers (fully connected layers). We will use relu as the activation function in the hidden layers, softmax in the output layer and Adam as the optimizer.

You can download my Jupyter notebook containing below code from here.

Step 1: Import required libraries

import numpy as np
from random import randint
from sklearn.preprocessing import MinMaxScaler

import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

Step 2: Create training and test dataset

We will create hypothetical medical data and try to predict whether a drug has a side effect on people in different age groups.

People are divided into two age groups: 
1. 13 years to 64 years and 
2. 65 years to 100 years. 

Label equal to 1 means that drug has side effect and 0 means no side effect. 

We will create 2100 training observations. One array contains ages, which act as samples, and the other array contains 0s and 1s, which act as labels.

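train_samples = []
train_labels = []
# Assumed labelling (the text does not spell it out): a small minority goes against the trend
for i in range(50):
    random_younger = randint(13,64)
    random_older = randint(65,100)
    train_samples += [random_younger, random_older]
    train_labels += [1, 0]  # younger with side effect, older without
# Assumed labelling for the majority: older people have the side effect
for i in range(1000):
    random_younger = randint(13,64)
    random_older = randint(65,100)
    train_samples += [random_younger, random_older]
    train_labels += [0, 1]  # younger without side effect, older with

Convert the above lists into NumPy arrays, as Keras expects samples and labels in the form of NumPy arrays.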

train_samples = np.array(train_samples)
train_labels = np.array(train_labels)

Similarly, create a test dataset.

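test_samples = []
test_labels = []
# Assumed labelling (the text does not spell it out): minority first, then majority
for i in range(10):
    random_younger = randint(13,64)
    random_older = randint(65,100)
    test_samples += [random_younger, random_older]
    test_labels += [1, 0]  # younger with side effect, older without
for i in range(200):
    random_younger = randint(13,64)
    random_older = randint(65,100)
    test_samples += [random_younger, random_older]
    test_labels += [0, 1]  # younger without side effect, older with
test_samples = np.array(test_samples)
test_labels = np.array(test_labels)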

Step 3: Scale the training and test data

scaler = MinMaxScaler(feature_range=(0,1))
scaled_train_samples = scaler.fit_transform((train_samples).reshape(-1,1))
scaled_test_samples = scaler.transform((test_samples).reshape(-1,1))  # reuse the scaling learned from the training data

This is a preprocessing step. We need to scale our sample data to the range 0 to 1. This is called feature scaling. For more details on feature scaling, you can go through this post.
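Min-max scaling is simple enough to write out by hand. The sketch below uses made-up ages and hypothetical helper names; it learns the minimum and maximum from the training data only and then applies the same mapping to the test data.

```python
def fit_min_max(values):
    """Learn the range of the training data."""
    return min(values), max(values)


def min_max_scale(values, lo, hi):
    """Map values so that lo becomes 0 and hi becomes 1."""
    return [(v - lo) / (hi - lo) for v in values]


train_ages = [13, 40, 64, 65, 100]  # hypothetical training ages
test_ages = [20, 70]                # hypothetical test ages

lo, hi = fit_min_max(train_ages)
scaled_train = min_max_scale(train_ages, lo, hi)
scaled_test = min_max_scale(test_ages, lo, hi)

print(scaled_train)
print(scaled_test)
```

Using the training range for both sets means that a given age always maps to the same scaled value.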

Step 4: Create a model

We will create a sequential model which is a linear stack of layers. We can create a sequential model by passing a list of layer instances to the constructor like this:

model = Sequential([
    Dense(16, input_shape=(1,), activation='relu'),
    Dense(32, activation='relu'),
    Dense(2, activation='softmax'),
])

We can also simply add layers using .add() method:

model = Sequential()
model.add(Dense(16, input_shape=(1,), activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(2, activation='softmax'))

We are using dense layers in the above Keras code which denote fully connected layers in a neural network.

For the hidden layers, we use the relu activation function, and for the output layer, we use the softmax activation function. To know the difference between relu and softmax activation functions, please see this post.
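As a quick illustration of the two activation functions, here is a plain-Python sketch with made-up input values. It is not how Keras implements them internally.

```python
import math


def relu(x):
    """Pass positive values through and clamp negative values to zero."""
    return max(0.0, x)


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


hidden = [relu(x) for x in [-2.0, 0.5, 3.0]]
probs = softmax([1.0, 3.0])

print(hidden)
print([round(p, 4) for p in probs])
```

relu works on each value independently, while softmax looks at all the outputs together, which is why it suits the final layer where we want one probability per class.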

Step 5: Model Summary

model.summary()


It will show the description of all the layers and parameters.

Step 6: Compile a model

model.compile(Adam(lr=0.0001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

We need to pass the optimizer we want to use along with its learning rate, the loss function and the metrics. We are using Adam as the optimizer, which is a variant of SGD (Stochastic Gradient Descent). There are a lot of other optimizers. For details, you can visit this post.
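We use the "sparse" version of the loss here because our labels are plain integers (0 or 1) rather than one-hot vectors. A rough plain-Python sketch of the idea, with made-up probabilities:

```python
import math


def sparse_categorical_crossentropy(label, probs, eps=1e-7):
    """Loss for an integer class label and a list of predicted probabilities."""
    # Only the probability assigned to the true class matters.
    return -math.log(max(probs[label], eps))


predicted = [0.2, 0.8]  # hypothetical softmax output: 80% "side effect"

loss_if_side_effect = sparse_categorical_crossentropy(1, predicted)
loss_if_no_side_effect = sparse_categorical_crossentropy(0, predicted)

print(round(loss_if_side_effect, 4), round(loss_if_no_side_effect, 4))
```

The result is the same as one-hot encoding the label first and using categorical cross-entropy; the sparse form just skips the encoding step.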

Step 7: Train a model

model.fit(scaled_train_samples, train_labels, validation_split=0.1, batch_size=10, epochs=20, shuffle=True, verbose=2)

We need to pass the training samples and labels, the validation split, batch size, number of epochs, shuffle and verbose parameters. The validation set helps us detect overfitting and judge how well the network generalizes. The shuffle parameter defaults to True. Settings such as batch size and number of epochs are hyperparameters that we need to tune. You can try different batch sizes and epochs and observe the change in the results.
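One detail worth knowing about validation_split: Keras takes the validation data from the last samples of the arrays passed to fit, before any shuffling. The sketch below mimics that split on a stand-in list of 2100 items; the function name is made up for illustration.

```python
def split_for_validation(samples, validation_split):
    """Keep the first part for training and the tail for validation."""
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]


samples = list(range(2100))  # stand-in for the 2100 scaled training samples
train_part, valid_part = split_for_validation(samples, 0.1)

print(len(train_part), len(valid_part), valid_part[0])
```

With validation_split=0.1, the last 210 observations are held out and the remaining 1890 are used for training. If the data is ordered in some way, it can help to shuffle the arrays yourself before calling fit.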

Step 8: Predict from the model

predictions = model.predict(scaled_test_samples, batch_size=10, verbose=0)
for i in predictions:
    print(i)

The above code gives us the predictions in the form of probabilities. If we need the predicted class instead, we use the predict_classes function instead of predict.

rounded_predictions = model.predict_classes(scaled_test_samples, batch_size=10, verbose=0)
for i in rounded_predictions:
    print(i)
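For a softmax output, picking the predicted class amounts to taking the index of the largest probability. A plain-Python sketch with made-up probabilities:

```python
def predict_class(probabilities):
    """Index of the largest probability (the first one wins on a tie)."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


# Hypothetical softmax outputs for three people: [no side effect, side effect]
example_predictions = [[0.9, 0.1], [0.3, 0.7], [0.5, 0.5]]
example_classes = [predict_class(p) for p in example_predictions]

print(example_classes)
```

This is handy if your Keras version no longer provides predict_classes: take the argmax of the output of predict instead.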

Step 9: Check accuracy

We are going to use confusion matrix, accuracy score and classification report to check the accuracy of our neural network.

confusionMatrix = confusion_matrix(test_labels, rounded_predictions)
accuracyScore = accuracy_score(test_labels, rounded_predictions)
classificationReport = classification_report(test_labels, rounded_predictions)
print(accuracyScore * 100)

Hyperparameter Tuning: In steps 6, 7 and 8, we use a lot of hyperparameters. The network does not learn these by itself, so we need to tune them explicitly in order to improve its performance and accuracy. For more information on hyperparameters, you can go through this post.

Related: Build a CNN model using Keras framework
