In this task, we will learn how to tune the hyperparameters of a neural network. We will use the MNIST dataset, a handwritten digit recognition dataset. More on MNIST: http://yann.lecun.com/exdb/mnist/
import matplotlib.pyplot as plt
%matplotlib inline
import random
import tensorflow as tf
from keras.datasets import mnist
import pandas as pd
import numpy as np
#from keras.utils import to_categorical
from tensorflow.keras.utils import to_categorical
from keras.layers import Dense
from keras.models import Sequential
from numpy.random import seed
MNIST handwritten digit recognition dataset
If from keras.utils import to_categorical does not work, use from tensorflow.keras.utils import to_categorical instead.
# Fortunately, Keras already has this dataset on its server. Let's load it:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train.shape
x_test.shape
from sklearn.model_selection import train_test_split
x_train, x_valid, y_train, y_valid = train_test_split(x_train, y_train, test_size=0.20, random_state=3037)
# Do NOT change the random_state variable
x_train.shape
x_valid.shape
d = pd.DataFrame()
d["y"] = y_train
# Let's analyze how the distribution of the class variable looks:
d.y.value_counts()
The images we have are 28 * 28 pixels. For a simple shallow neural network, we do not have to treat the height and width of the images separately. That means we can simply flatten each image into a long vector of 28 * 28 = 784 values and work with that.
# Flattening the images
flattened_image = x_train.shape[1]*x_train.shape[2]
x_train = x_train.reshape(x_train.shape[0], flattened_image)
x_valid = x_valid.reshape(x_valid.shape[0], flattened_image)
x_test = x_test.reshape(x_test.shape[0], flattened_image)
x_train.shape
x_valid.shape
x_test.shape
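As a quick sanity check, here is a numpy sketch (with a toy 2x3 "image" rather than a real 28x28 one) showing that reshape flattens row by row without losing any pixel values:

```python
import numpy as np

img = np.arange(6).reshape(2, 3)   # toy 2x3 "image"
flat = img.reshape(-1)             # row-major flatten, like reshape(n_samples, 784) does per image
print(flat)                        # every pixel survives, in row order
```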
# converting the classes to categorical variable
classes = np.unique(y_train)
classes
num_classes = len(classes)
num_classes
Because the specific loss function we will use is "categorical cross-entropy", it is important to convert the target classes into one-hot-encoded form.
y_train = to_categorical(y_train)
y_train
y_valid = to_categorical(y_valid)
y_test = to_categorical(y_test)
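To see what one-hot encoding does, here is a numpy sketch that produces the same result as to_categorical for small integer labels (shown with hypothetical labels, not the MNIST targets):

```python
import numpy as np

labels = np.array([0, 2, 1, 2])              # hypothetical class labels
one_hot = np.eye(labels.max() + 1)[labels]   # row i of the identity matrix encodes class i
print(one_hot)
# each row has a single 1 in the column of its class
```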
# Helper function 1:
# Create the neural network
def create_model(num_layers, num_units, flattened_size, activation_function, class_size):
    '''
    We create a neural network in this function. You can choose the number of layers, the units per layer,
    the activation function, and the number of classes. This function creates the basic architecture
    required to handle the tasks.
    '''
    model = Sequential()
    model.add(Dense(units=num_units, activation=activation_function, input_shape=(flattened_size,)))
    for i in range(num_layers - 1):
        model.add(Dense(units=num_units, activation=activation_function))
    model.add(Dense(units=class_size, activation='softmax'))
    return model
def evaluate(FCmodel, Learning_Rate, batch_sz, epochs, add_visualization=True):
    '''
    This function will run and evaluate the neural network.
    *** Important: Do NOT change the seed values. This ensures the reproducibility of the experiments.
    '''
    seed(37)
    random.seed(37)
    tf.random.set_seed(37)
    opt = tf.keras.optimizers.SGD(learning_rate=Learning_Rate)
    FCmodel.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    history = FCmodel.fit(x_train, y_train, batch_size=batch_sz, epochs=epochs,
                          verbose=add_visualization, validation_data=(x_valid, y_valid))
    loss, accuracy = FCmodel.evaluate(x_valid, y_valid, verbose=False)
    if add_visualization:
        print(f'validation loss: {loss:.4}')
        print(f'validation accuracy: {accuracy:.4}')
        plt.plot(history.history['accuracy'])
        plt.plot(history.history['val_accuracy'])
        plt.title('model performance')
        plt.ylabel('accuracy')
        plt.xlabel('epoch')
        plt.legend(['training', 'validation'], loc='best')
        plt.show()
    return loss, accuracy
Let's check how our model performs. Run the following cell and make sure you get the EXACT same output values.
special_model = create_model(num_layers=4, num_units=16, flattened_size=x_test.shape[1],
activation_function='sigmoid', class_size=num_classes)
loss, acc = evaluate(FCmodel=special_model, Learning_Rate=0.1, batch_sz=256, epochs=100)
special_model.summary()
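As a sanity check on the summary, each Dense layer has inputs * units + units parameters (the weight matrix plus one bias per unit). A sketch of the count, assuming the 4-layer, 16-unit architecture built above on 784-dimensional inputs with 10 output classes:

```python
# Dense layer parameter count: weights (n_in * n_out) plus biases (n_out)
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

# 784 -> 16, then three 16 -> 16 hidden layers, then 16 -> 10 softmax output
total = dense_params(784, 16) + 3 * dense_params(16, 16) + dense_params(16, 10)
print(total)  # should match the "Total params" line of summary()
```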
The first one is done for you
# Below are the five classes of hyperparameters
LAYERS = [2, 3, 4]
UNITS = [4, 8, 16, 32, 64, 128, 256]
BATCH = [8, 16, 32, 64, 128]
LEARNING_RATE = [0.1, 0.01, 0.001, 0.0001, 0.00001]
EPOCHS = [10, 20, 30, 40, 100]
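Note that a full grid over these five lists would be 3 * 7 * 5 * 5 * 5 = 2625 combinations, which is why the questions below restrict each search to a small subset. A sketch of enumerating the full grid with itertools.product:

```python
from itertools import product

# The five hyperparameter lists from above
LAYERS = [2, 3, 4]
UNITS = [4, 8, 16, 32, 64, 128, 256]
BATCH = [8, 16, 32, 64, 128]
LEARNING_RATE = [0.1, 0.01, 0.001, 0.0001, 0.00001]
EPOCHS = [10, 20, 30, 40, 100]

grid = list(product(LAYERS, UNITS, BATCH, LEARNING_RATE, EPOCHS))
print(len(grid))   # size of the full grid
print(grid[0])     # one (layers, units, batch, lr, epochs) setting
```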
# Question 1: create a Layer-vs-accuracy graph for the following setting:
# units = [32, 128], Batch_size = [16, 64], Learning_rate = [0.1], Epochs = [10, 20]
# That means there will be 2*2*1*2 = 8 graphs
%%time
for B in [16, 64]:
    for E in [10, 20]:
        for U in [32, 128]:
            ACC = []
            for L in LAYERS:
                special_model = create_model(num_layers=L, num_units=U, flattened_size=x_test.shape[1],
                                             activation_function='sigmoid', class_size=num_classes)
                loss, acc = evaluate(FCmodel=special_model, Learning_Rate=0.1, batch_sz=B, epochs=E, add_visualization=False)
                ACC.append(acc)
            plt.plot(LAYERS, ACC)
            plt.xticks(LAYERS)
            plt.xlabel("Layer_size")
            plt.ylabel("Accuracy")
            TITLE = "Layer_size vs Acc for Batch size {}, Epoch {}, Unit size {}".format(B, E, U)
            plt.title(TITLE)
            plt.show()
# Question 2: create a Unit size-vs-accuracy graph for the following setting:
# Layer size = [2], Batch_size = [16, 64], Learning_rate = [0.1], Epochs = [10, 20]
# That means there will be 1*2*1*2 = 4 graphs
## code starts here
# Question 3: create a Batch size-vs-accuracy graph for the following setting:
# Layer size = [2], Unit_size = [16, 32], Learning_rate = [0.1], Epochs = [10, 20]
# That means there will be 1*2*1*2 = 4 graphs
## code starts here
# Question 4: create a Learning rate-vs-accuracy graph for the following setting:
# Layer size = [2], Unit_size = [16, 32], Batch_size = [16], Epochs = [10, 20]
# That means there will be 1*2*1*2 = 4 graphs
## code starts here
# Question 5: create an Epoch size-vs-accuracy graph for the following setting:
# Layer size = [2], Unit_size = [32], Batch_size = [8, 16], Learning_rate = [0.1]
# That means there will be 1*1*2*1 = 2 graphs
## code starts here
Grid search over the hyperparameters to find the best set of hyperparameters.
Rank the hyperparameter sets among the following settings.
Also comment on the three best and the three worst hyperparameter settings.
def hyperparameter_ranking(LAYERS, UNITS, BATCH_SIZE, LEARNING_RATE, EPOCHS) -> pd.DataFrame:
    '''
    Your code should return a DataFrame whose columns are LAYERS, UNITS, BATCH_SIZE, LEARNING_RATE, EPOCHS
    and the corresponding accuracy.
    The DataFrame MUST be sorted in descending order by accuracy, like the following cell. There will be 270 rows.
    '''
## Your DataFrame should be of the following form. Please note: the accuracy values shown here are just for
# demonstration purposes. The accuracy must be in descending order.
# | LAYERS | UNITS | BATCH_SIZE | LEARNING_RATE | EPOCHS | Accuracy
# 0 3 16 32 0.1 10 0.71
# 1 4 32 16 0.001 20 0.68
# ............................................................
# ............................................................
# ............................................................
# 269 4 128 16 0.1 10 0.43
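Building and sorting such a table is just standard pandas; here is a minimal sketch with made-up accuracy values (the real rows come from training each configuration):

```python
import pandas as pd

# Hypothetical accuracy results, for demonstration only
rows = [
    {"LAYERS": 3, "UNITS": 16, "BATCH_SIZE": 32, "LEARNING_RATE": 0.1,   "EPOCHS": 10, "Accuracy": 0.71},
    {"LAYERS": 4, "UNITS": 32, "BATCH_SIZE": 16, "LEARNING_RATE": 0.001, "EPOCHS": 20, "Accuracy": 0.68},
    {"LAYERS": 2, "UNITS": 64, "BATCH_SIZE": 16, "LEARNING_RATE": 0.01,  "EPOCHS": 20, "Accuracy": 0.90},
]
df = pd.DataFrame(rows).sort_values("Accuracy", ascending=False).reset_index(drop=True)
print(df)  # best setting is now row 0
```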
In this task, you have to tune some advanced hyperparameters.
Special note: You will investigate different settings for these hyperparameters as well. For example, ADAM has beta_1 and beta_2 values; you can assume their default values (0.9 and 0.999). For the other optimizers and kernel initializers, you can also assume the default values for this subtask.
Follow the steps below:
step 1: Choose the best set of parameters from the subtask2. This will be the first entry from the dataframe table.
step 2: Modify the function create_model into a new function create_model_subtask3 that additionally considers the dropout rate and the kernel initializer. For dropout this can be done as follows:
model.add.....................
model.add(Dropout(0))
step 3: Modify the function evaluate into evaluate_subtask3 to account for the optimizers.
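As a side note, the dropout mechanism itself is easy to sketch in numpy. Keras uses "inverted" dropout: during training, each unit is zeroed with probability equal to the rate, and the surviving units are scaled up by 1 / (1 - rate) so the expected activation is unchanged. A sketch with a hypothetical rate of 0.5:

```python
import numpy as np

rng = np.random.default_rng(37)
rate = 0.5                                        # hypothetical dropout rate
x = np.ones(8)                                    # activations of one layer
mask = rng.random(8) >= rate                      # keep each unit with probability 1 - rate
dropped = np.where(mask, x / (1.0 - rate), 0.0)   # inverted dropout: scale the kept units
print(dropped)                                    # kept units become 2.0, dropped units 0.0
```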
def advanced_hyperparameter_search(OPTIMIZER, DROPOUT_RATE, KERNEL_INITIALIZER) -> pd.DataFrame:
    '''
    In the same way as subtask 2, create a table with columns for
    OPTIMIZER, DROPOUT_RATE, KERNEL_INITIALIZER and accuracy.
    The DataFrame MUST be sorted in descending order by accuracy, like the following cell. There will be 48 rows.
    For the other parameters, use the best ones you found with the subtask 2 function hyperparameter_ranking.
    '''
df.head(10)  # prints the first 10 rows of the table
df.tail(10)  # prints the last 10 rows of the table