Saturday, 6 July 2019

Building a CNN model in Keras using MNIST dataset

We will implement a CNN in Keras using the MNIST dataset, which can be downloaded through Keras. To know more about CNNs, you can visit this post. The MNIST dataset contains images of handwritten digits from 0 to 9, divided into 60,000 training images and 10,000 testing images.

I recommend building a simple neural network before jumping to a CNN; you can visit this post to build one with Keras. You can download my Jupyter notebook containing the following CNN code from here.

Step 1: Import required libraries

import keras
from keras.datasets import mnist
from keras.utils import to_categorical

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.optimizers import Adam

Step 2: Load MNIST dataset from Keras

(x_train, y_train), (x_test, y_test) = mnist.load_data()

The above line downloads the MNIST dataset. Now, let's print the shape of the data.

x_train.shape, x_test.shape, y_train.shape, y_test.shape

Output: ((60000, 28, 28), (10000, 28, 28), (60000,), (10000,))

The output shows that each image in the MNIST dataset is 28 x 28 pixels. The shape of x_train is (60000, 28, 28), where 60,000 is the number of samples.

Step 3: Reshape the dataset

We have to reshape x_train and x_test from 3 dimensions to 4 dimensions because the Conv2D layer in Keras expects a four-dimensional tensor of shape (samples, height, width, channels).

x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)

The value of "x_train.shape[0]" is 60,000, the number of images in the training data. The two 28s are the image height and width, and the final 1 is the number of channels.

The number of channels is 1 for grayscale images and 3 for RGB images. MNIST images are grayscale, so the last dimension is 1.

Now, let's print the shape of the data again.
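To see what the reshape does without downloading the dataset, here is a minimal sketch on a dummy array. The array `batch` and its size of 5 images are made up for illustration; only NumPy is assumed.

```python
import numpy as np

# Hypothetical stand-in for x_train: 5 blank grayscale images of 28 x 28 pixels.
batch = np.zeros((5, 28, 28), dtype=np.uint8)

# Add a trailing channel axis of size 1, as done for x_train and x_test above.
batch_4d = batch.reshape(batch.shape[0], 28, 28, 1)

print(batch.shape)     # (5, 28, 28)
print(batch_4d.shape)  # (5, 28, 28, 1)

# The pixel values are untouched; only the shape changes.
same_pixels = np.array_equal(batch, batch_4d[..., 0])
print(same_pixels)     # True
```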

x_train.shape, x_test.shape, y_train.shape, y_test.shape

Output: ((60000, 28, 28, 1), (10000, 28, 28, 1), (60000,), (10000,))

Step 4: Convert labels into categorical variables (one-hot encoding)

Our labels range from 0 to 9. We one-hot encode them so that each label becomes a vector of length 10 with a 1 at the position of the digit and 0 everywhere else.

y_train, y_test

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

y_train, y_test
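
As an illustration of what one-hot encoding produces, here is a pure-Python sketch. The helper `one_hot` is hypothetical and written only for this example; it is not how Keras implements `to_categorical`.

```python
def one_hot(labels, num_classes):
    """Return one row per label with a 1.0 at the label's index and 0.0 elsewhere."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


# Three example digit labels, encoded over the 10 MNIST classes.
encoded = one_hot([5, 0, 9], num_classes=10)
for row in encoded:
    print(row)
```

After the real encoding, y_train has shape (60000, 10) and y_test has shape (10000, 10).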

Step 4: Create a CNN model

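model = Sequential()
model.add(Conv2D(32, kernel_size=(5,5), activation='relu', input_shape=(28,28,1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())  # flatten the feature maps into a 1D vector for the dense layers
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.2))  # randomly drop 20% of the dense layer's outputs during training
model.add(Dense(10, activation='softmax'))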

Explanation of above CNN model:

1. We have created a Sequential model, a built-in Keras model to which layers are added one after another. In this case, we have added 2 convolutional layers, 2 pooling layers, 1 flatten layer, 2 dense layers and 1 dropout layer.

2. We have used 32 filters of size 5x5 in the first convolutional layer and 64 filters of size 5x5 in the second convolutional layer.

3. After each convolutional layer, we add a max pooling layer with a pool size of 2x2.

4. We use the ReLU activation function in all hidden layers and softmax in the output layer. To know more about activation functions, please visit this and this post.

5. We can also specify the strides attribute for convolutional and pooling layers. By default it is (1,1) for convolutional layers; for pooling layers it defaults to the pool size.

6. We have a flatten layer just before the first dense layer. It converts the stack of 2D feature maps into a 1D vector that the fully connected layers can take as input.

7. After that, we use a fully connected layer with 1024 neurons.

8. Then we use a regularization layer called Dropout. It is configured to randomly exclude 20% of the neurons in the layer during training. Because the network cannot rely on any single neuron, it is pushed to learn redundant features, which reduces overfitting. To know more about dropout, please visit this post.

9. We add a dense layer at the end, which is used for class prediction (0–9). That is why it has 10 neurons. It is also called the output layer, and it uses the softmax activation function instead of ReLU.
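
The effect of dropout described in point 8 can be sketched in plain Python. This is a simplified illustration of the common "inverted dropout" formulation, in which the surviving activations are scaled up by 1 / (1 - rate) during training. The function `dropout` and the sample values are assumptions made for this example, not Keras internals.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale the rest by 1 / (1 - rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]


rng = random.Random(0)       # fixed seed so the sketch is repeatable
activations = [1.0] * 1000   # hypothetical outputs of a dense layer
dropped = dropout(activations, rate=0.2, rng=rng)

zeroed = sum(1 for a in dropped if a == 0.0)
print(zeroed / len(dropped))        # fraction switched off, close to 0.2
print(sum(dropped) / len(dropped))  # mean stays close to 1.0 thanks to the scaling
```

At test time no neurons are dropped, so the scaling during training keeps the expected size of the activations the same in both modes.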

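Step 5: Model Summary

model.summary()

This prints each layer of the model along with its output shape and number of parameters.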
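
The output shapes and parameter counts that a summary reports can be checked by hand. The sketch below does the arithmetic in plain Python for the layers described above, assuming 'valid' padding and a stride of 1 in the convolutions (the Keras defaults) and a flatten layer before the first dense layer. The helper names are made up for this example.

```python
def conv_output(size, kernel, stride=1):
    """Spatial size after a convolution with 'valid' padding."""
    return (size - kernel) // stride + 1


def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


size = conv_output(28, 5)    # first 5x5 convolution: 28 -> 24
size //= 2                   # 2x2 max pooling: 24 -> 12
size = conv_output(size, 5)  # second 5x5 convolution: 12 -> 8
size //= 2                   # 2x2 max pooling: 8 -> 4
flat = size * size * 64      # flatten 4 x 4 x 64 feature maps

total = (conv_params(5, 1, 32) + conv_params(5, 32, 64)
         + dense_params(flat, 1024) + dense_params(1024, 10))

print(flat)   # 1024
print(total)  # 1111946
```

Most of the parameters sit in the first dense layer (1,049,600 of them), not in the convolutional layers.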


Step 6: Compile the model

model.compile(optimizer=Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])

We are using the Adam optimizer with a learning rate of 0.0001 and categorical cross entropy as the loss function.
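
For a single image, categorical cross entropy is the negative log of the probability the model assigns to the correct class. A minimal pure-Python sketch, with made-up probability values and a hypothetical helper function:

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: -sum(true_i * log(pred_i)), clipped to avoid log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# One-hot label for the digit 3 and two hypothetical softmax outputs.
label = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
confident = [0.01, 0.01, 0.01, 0.91, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
unsure = [0.1] * 10

print(categorical_crossentropy(label, confident))  # about 0.094
print(categorical_crossentropy(label, unsure))     # about 2.303
```

The loss is small when the model is confident and correct, and grows as the probability of the true class falls.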

Step 7: Train the model

model.fit(x_train, y_train, validation_data=(x_test, y_test), batch_size=128, epochs=5, verbose=2)

We train with a batch size of 128 for 5 epochs, using the test set as validation data.
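
With these settings, the number of weight updates follows directly from the dataset size. A quick check in Python, assuming all 60,000 training images are used in every epoch and the last, smaller batch is kept:

```python
import math

train_samples = 60000
batch_size = 128
epochs = 5

steps_per_epoch = math.ceil(train_samples / batch_size)
last_batch = train_samples - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)           # 469
print(last_batch)                # 96
print(steps_per_epoch * epochs)  # 2345
```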

Step 8: Evaluate the model

score = model.evaluate(x_test, y_test, verbose=0)
print('Loss:', score[0])
print('Accuracy:', score[1])

Output:
Loss: 0.050816776573400606
Accuracy: 0.9856
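
The accuracy value is the fraction of test images whose highest-probability class matches the true label. A small sketch with made-up predictions over three classes (the helpers `argmax` and `accuracy` are written for this example only):

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the one-hot label's class."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Three classes and four hypothetical samples; the last prediction is wrong.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_pred = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]

print(accuracy(y_true, y_pred))  # 0.75
```

By the same definition, an accuracy of 0.9856 on the 10,000 test images corresponds to 9,856 correct predictions.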

I got an accuracy of about 98.56%. You can experiment with different hyperparameters such as the learning rate, batch size and number of epochs, or change the architecture by adding more convolutional and pooling layers or by changing the number and size of filters and the size of the strides.
