Hyperparameter Tuning

In this task, we will learn how to tune the hyperparameters of a neural network. We will use the MNIST dataset, a handwritten digit recognition dataset. More on MNIST: http://yann.lecun.com/exdb/mnist/

MNIST handwritten digit recognition dataset

If from keras.utils import to_categorical does not work, use from tensorflow.keras.utils import to_categorical instead.

The images are 28 * 28 pixels. For a simple shallow neural network, we do not have to treat the height and width of the images separately. That means we can simply flatten each image into a long vector of 28 * 28 = 784 dimensions and work with that.
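The flattening step can be sketched as follows. A small random array stands in for the real MNIST images here; only the shapes matter:

```python
import numpy as np

# Stand-in for a batch of MNIST images: 1000 images of 28 x 28 pixels.
x_train = np.random.rand(1000, 28, 28)

# Flatten each image into a single 784-dimensional vector.
x_train_flat = x_train.reshape(x_train.shape[0], 28 * 28)

print(x_train_flat.shape)  # (1000, 784)
```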

Because we will use the "categorical cross entropy" loss function, it is important to convert the target classes into one-hot-encoded form.
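One-hot encoding can be illustrated with plain NumPy; this mirrors what to_categorical(labels, num_classes) returns:

```python
import numpy as np

labels = np.array([0, 2, 1, 2])  # example class labels
num_classes = 3

# Row i is all zeros except a 1 in column labels[i].
one_hot = np.eye(num_classes)[labels]

print(one_hot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```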

Let's check how our model performs. Run the following cell and make sure you get the EXACT same values as output.

Subtask 1: Tune each class of Hyperparameters and observe the performance with a graph

The first one is done for you.

Subtask 2:

Perform a grid search over the hyperparameters to find the best set.

Rank the hyperparameter sets among the following settings.

Also comment on the three best and three worst settings of hyperparameters.

The code can take a long time to run.

Tips to get around this:

  1. Create some helper functions (even one will help) first and combine them. For instance, assume the following function:
     hyperparameter_helper(Layer_size, UNITS, BATCH_SIZE, LEARNING_RATE, EPOCHS) -> pd.DataFrame
     Using a fixed layer size will result in a dataframe of size 90 for each layer size, significantly reducing your experiments. You can use this helper function inside hyperparameter_ranking.
  2. Parallelization: a bit tricky in Google Colab.
  3. Using a GPU: Runtime -> Change Runtime Type -> Hardware accelerator -> GPU.
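The grid-search loop itself can be sketched with itertools.product. The hyperparameter values and the score_model stub below are placeholders; substitute the grid from the assignment and your real training/evaluation code:

```python
from itertools import product

# Hypothetical hyperparameter grid -- substitute the assignment's values.
grid = {
    "num_layers": [2, 4],
    "num_units": [16, 64],
    "learning_rate": [0.1, 0.01],
    "batch_size": [128, 256],
}

def score_model(num_layers, num_units, learning_rate, batch_size):
    # Placeholder: the real version would build, train, and
    # evaluate a model, returning its test accuracy.
    return 0.0

results = []
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    acc = score_model(**params)
    results.append({**params, "accuracy": acc})

# Rank configurations best-first by accuracy.
results.sort(key=lambda r: r["accuracy"], reverse=True)
print(len(results))  # 16 combinations for this 2x2x2x2 grid
```

A list of dicts like this converts directly to a pandas DataFrame with pd.DataFrame(results), which is handy for ranking and for df.head()/df.tail().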

Subtask 3: Advanced Hyperparameter Optimization

In this task, you have to find some advanced hyperparameters:

Special note: You will investigate different settings for these hyperparameters as well. For example, ADAM has beta_1 and beta_2 values; you can assume their default values (0.9 and 0.999). For the other optimizers and kernel initializers, you can likewise assume the default values for this subtask.

Follow the steps below:

Step 1: Choose the best set of parameters from subtask 2. This will be the first entry in the dataframe table.

Step 2: Modify the function create_model into a new function create_model_subtask3 that additionally considers the dropout rate and the kernel initializer. For dropout, this can be done as follows:

model.add(Dense(num_units, activation=activation_function))  # existing hidden layer
model.add(Dropout(dropout_rate))  # e.g. dropout_rate = 0 disables dropout

Step 3: Modify the function evaluate into evaluate_subtask3 to account for the different optimizers.

A few notes:

df.head(10)  # prints the first 10 rows of the table
df.tail(10)  # prints the last 10 rows of the table
special_model = create_model(num_layers=4, num_units=16, flattened_size=x_test.shape[1],
                             activation_function='sigmoid', class_size=num_classes)
loss, acc = evaluate(FCmodel=special_model, Learning_Rate=0.1, batch_sz=256, epochs=100)

If, after these lines, you write

loss, acc = evaluate(FCmodel=special_model, Learning_Rate=0.1, batch_sz=256, epochs=100)

you will probably expect the model to be trained again from scratch. But it won't be: the model object keeps its weights between calls, so training continues from where it left off. Therefore, every time you need to retrain a model or train a new one, re-run both the create_model line and the evaluate line to be safe.