Learning an Image Classification Model from Scratch¶
Introduction¶
The fashion_mnist dataset consists of 70,000 images of clothing items across 10 categories.
Luckily for us, this dataset is available in a convenient format through Keras, so we will load it and take a look.
But first, let's get the usual technical preliminaries out of the way.
As we did previously, we will first import the following packages and set the seed for the random number generator.
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# initialize the seeds of different random number generators so that the
# results will be the same every time the notebook is run
# keras.utils.set_random_seed(42)
tf.random.set_seed(42)
With the technical preliminaries out of the way, let's load the dataset and take a look.
# load data into x_train, y_train, x_test, y_test
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
print(x_train.shape, y_train.shape)
There are 60,000 images in the training set, each of which is a 28x28 matrix.
print(x_test.shape, y_test.shape)
The remaining 10,000 images are in the test set.
OK, let's look at the first 10 rows of the dependent variable 𝑦 .
y_train[:10]
What do these numbers mean?
According to the fashion_mnist Github site, this is what each number 0-9 corresponds to:
Label | Description |
---|---|
0 | T-shirt/top |
1 | Trouser |
2 | Pullover |
3 | Dress |
4 | Coat |
5 | Sandal |
6 | Shirt |
7 | Sneaker |
8 | Bag |
9 | Ankle boot |
Create a little Python list so that we can go from numbers to descriptions easily.
# Call the list "labels"
labels = [
"T-shirt/top",
"Trouser",
"Pullover",
"Dress",
"Coat",
"Sandal",
"Shirt",
"Sneaker",
"Bag",
"Ankle boot",
]
Given a number, the description is now a simple look-up. Let's see what the very first training example is about.
labels[y_train[0]]
The very first image is an "Ankle boot"!
Let's take a look at the raw data for the image.
x_train[0]
Let's look at the first 25 images using the handy plt.imshow()
command
# You'll create two variables namely "fig" and "ax" as shown in the screencast.
fig = plt.figure(figsize=(30, 10))
for i in range(25):
ax = fig.add_subplot(5, 5, i + 1, xticks=[], yticks=[])
ax.set_title(f"{labels[y_train[i]]}")
ax.imshow(x_train[i], cmap="gray")
The images are a bit small but they will do for now.
A NN Model - First Attempt¶
Our first NN will be a simple one with a single hidden layer.
Data Prep¶
Tip: NNs learn best when each independent variable is in a small range. So, standardize them by either
- subtracting the mean and dividing by the standard deviation or
- if they are in a guaranteed range, just divide by the max value.
The inputs here range from 0 to 255. Let's normalize to the 0-1 range by dividing everything by 255.
# Standardize x_train and x_test
x_train = x_train / 255.0
x_test = x_test / 255.0
Define Model in Keras¶
As we saw in the previous module, creating an NN is usually just a few lines of Keras code.
- The input will be 28 x 28 matrices of numbers. These will have to be flattened into a long vector and then fed to the hidden layer.
- We will start with a single hidden layer of 256 ReLU neurons.
- Since this is a multi-class classification problem (e.g., we need to predict one of 10 clothing categories), the output layer has to produce a 10-element vector of probabilities that sum up to 1.0 => we will use the softmax layer that we learned about in the previous lecture.
# define the input layer
input = keras.Input(shape=(28, 28))
# convert the 28 x 28 matrix of numbers into a long vector
h = keras.layers.Flatten()(input)
# feed the long vector to the hidden layer
h = keras.layers.Dense(256, activation="relu", name="Hidden")(h)
# feed the output of the hidden layer to the output layer
output = keras.layers.Dense(10, activation="softmax", name="Output")(h)
# tell Keras that this (input,output) pair is your model
model = keras.Model(input, output)
model.summary()
Let's hand-calculate the number of parameters to verify.
# calculate the number of parameters and set the output to "parameters"
parameters = (784 * 256 + 256) + (256 * 10 + 10)
print(parameters)
Set Optimization Parameters¶
Now that the model is defined, we need to tell Keras three things:
- What loss function to use
- Which optimizer to use - we will again use Adam which is an excellent set-and-forget choice
- What metrics you want Keras to report out - in classification problems like this one, Accuracy is usually the metric you want to see.
Since our output variable is categorical with 10 levels, we will select the sparse_categorical_crossentropy
loss function.
# Compile your model
model.compile(
loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
)
Train the Model¶
- The batch size: 32 or 64 are commonly used
- The number of epochs i.e., how many passes through the training data: start with 10-20.
OK, let's train the model using the model.fit
function!
# fit your model first try with a batch size of 32 and 10 epochs
batch_size = 64
epochs = 10
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)
Evaluate the Model¶
You can see from the above that our model achieves over 91% accuracy on the train set but, as we know, doing well on the training set isn't all that impressive due to the possibility of overfitting. So the real question is how well does it do on the test set?
model.evaluate
is a very handy function to calculate the performance of your model on any dataset.
# Evaluate model on test data set
model.evaluate(x_test, y_test)
Did the NNs we create take advantage of the fact that the input data is images?
A Convolutional Neural Network¶
Convolutional Layers:
Convolutional (typically abbreviated to "conv") layers were the key breakthrough that led to all the exciting advances in AI for Computer Vision problems like Image Classification, Image Recognition etc. They were designed to specifically work with images.
Conv layers are the reason why your iPhone can recognize your face!
We will follow the same sequence of steps as we did above:
- Data Prep
- Define Model
- Set Optimization Parameters
- Train Model
- Evaluate Model
Data Prep¶
The data has already been normalized so that the numbers are between 0 and 1. We don't need to do it again.
x_train.shape
For reasons that will become clear when you work with color images, we also need to add another dimension to each example so that it goes from 28x28 to 28x28x1
# add another dimension to x_train and x_test
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
x_train.shape
Define Model¶
OK, we are ready to create our very first Convolutional Neural Network (CNN)!
input = keras.Input(shape=x_train.shape[1:])
# first convolutional block
x = keras.layers.Conv2D(32, kernel_size=(2, 2), activation="relu", name="Conv_1")(
input
) # convolutional layer
x = keras.layers.MaxPool2D()(x) # pooling layer
# second convolutional block
x = keras.layers.Conv2D(32, kernel_size=(2, 2), activation="relu", name="Conv_2")(
x
) # convolutional layer
x = keras.layers.MaxPool2D()(x) # pooling layer
# Flatten the layers
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(256, activation="relu")(x)
# create an output called "output"
output = keras.layers.Dense(10, activation="softmax")(x)
model = keras.Model(input, output)
model.summary()
Set Optimization Parameters¶
# Compile model using sparse_categorical_crossentropy
# and adam, and accuracy as a metric
model.compile(
loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"]
)
Train the Model¶
DISCLAIMER: This will take some time to complete
# Train the model with either 32 or 64 as the batch size and using 10 epochs
model.fit(x_train, y_train, batch_size=64, epochs=10)
Evaluate the Model¶
# Get the score of the model
score = model.evaluate(x_test, y_test)
print("Test accuracy:", score[1])
Back to Fashion MNIST. Let's see what the state of the art (SOTA) accuracy is.
It is 96.91%!
Challenge: Can you get to SOTA by playing around with the architecture of the network?
Conclusion¶
We have built a Deep Learning model that can classify grayscale images of clothing items with over 90% accuracy!!