In this task, we will learn how to tune the hyperparameters of a neural network. We will use the MNIST dataset, a handwritten digit recognition dataset. More on MNIST: http://yann.lecun.com/exdb/mnist/
import matplotlib.pyplot as plt
%matplotlib inline
import random
import tensorflow as tf
from keras.datasets import mnist
import pandas as pd
import numpy as np
#from keras.utils import to_categorical
from tensorflow.keras.utils import to_categorical
from keras.layers import Dense
from keras.models import Sequential
from numpy.random import seed
MNIST handwritten digit recognition dataset
If "from keras.utils import to_categorical" does not work, use "from tensorflow.keras.utils import to_categorical" instead.
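A version-agnostic way to handle this, shown here as a small sketch, is to try one import path and fall back to the other:

# Sketch: fall back to the tensorflow.keras path if the plain keras path is unavailable
try:
    from keras.utils import to_categorical
except ImportError:
    from tensorflow.keras.utils import to_categorical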
# Fortunately, Keras already has this dataset on its server. Let's load the dataset.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
x_train.shape
(60000, 28, 28)
x_test.shape
(10000, 28, 28)
from sklearn.model_selection import train_test_split
x_train, x_valid, y_train, y_valid = train_test_split(x_train, y_train, test_size=0.20, random_state=3037)
# DO NOT change the random_state variable
x_train.shape
(48000, 28, 28)
x_valid.shape
(12000, 28, 28)
d = pd.DataFrame()
d["y"] = y_train
# Let's analyze what the distribution of the class variable looks like:
d.y.value_counts()
1    5381
7    5004
3    4876
2    4797
0    4785
9    4774
6    4743
4    4671
8    4610
5    4359
Name: y, dtype: int64
The images we have are of dimension 28 * 28. For a simple shallow neural network, we do not have to treat the height and width of the images separately. That means we can simply flatten each image into a long vector of 28 * 28 = 784 dimensions and work with that.
# Flattening the images
flattened_image = x_train.shape[1]*x_train.shape[2]
x_train = x_train.reshape(x_train.shape[0], flattened_image)
x_valid = x_valid.reshape(x_valid.shape[0], flattened_image)
x_test = x_test.reshape(x_test.shape[0], flattened_image)
x_train.shape
(48000, 784)
x_valid.shape
(12000, 784)
x_test.shape
(10000, 784)
# Converting the classes to a categorical variable
classes = np.unique(y_train)
classes
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8)
num_classes = len(classes)
num_classes
10
Because the specific loss function we will use is "categorical cross-entropy", it is important to convert the target classes into one-hot-encoded form.
y_train = to_categorical(y_train)
y_train
array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 1., 0., 0.],
       ...,
       [0., 0., 1., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.]], dtype=float32)
y_valid = to_categorical(y_valid)
y_test = to_categorical(y_test)
# Helper function 1:
# Create the neural network.
def create_model(num_layers, num_units, flattened_size, activation_function, class_size):
    '''
    We create a neural network in this function. You can choose the number of layers, the units per layer,
    the activation function, and the number of classes. This function creates the basic architecture
    required to handle the tasks.
    '''
    seed(37)
    random.seed(37)
    tf.random.set_seed(37)
    model = Sequential()
    model.add(Dense(units=num_units, activation=activation_function, input_shape=(flattened_size,)))
    for i in range(num_layers - 1):
        model.add(Dense(units=num_units, activation=activation_function))
    # Output layer: one unit per class (use the class_size parameter, not the global num_classes)
    model.add(Dense(units=class_size, activation='softmax'))
    return model
def evaluate(FCmodel, Learning_Rate, batch_sz, epochs, add_visualization=True):
    '''
    This function compiles, trains, and evaluates the neural network.
    *** Important: DO NOT change the seed values. This ensures the reproducibility of the experiments.
    '''
    seed(37)
    random.seed(37)
    tf.random.set_seed(37)
    opt = tf.keras.optimizers.SGD(learning_rate=Learning_Rate)
    FCmodel.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
    history = FCmodel.fit(x_train, y_train, batch_size=batch_sz, epochs=epochs,
                          verbose=add_visualization, validation_data=(x_valid, y_valid))
    loss, accuracy = FCmodel.evaluate(x_valid, y_valid, verbose=False)
    if add_visualization:
        print(f'validation loss: {loss:.4}')
        print(f'validation accuracy: {accuracy:.4}')
        plt.plot(history.history['accuracy'])
        plt.plot(history.history['val_accuracy'])
        plt.title('model performance')
        plt.ylabel('accuracy')
        plt.xlabel('epoch')
        plt.legend(['training', 'validation'], loc='best')
        plt.show()
    return loss, accuracy
Let's check how our model performs. Run the following cell and make sure you get the EXACT same values as output.
special_model = create_model(num_layers=4, num_units=16, flattened_size=x_test.shape[1],
activation_function='sigmoid', class_size=num_classes)
loss, acc = evaluate(FCmodel=special_model, Learning_Rate=0.1, batch_sz=256, epochs=100)
Epoch 1/100
188/188 [==============================] - 2s 5ms/step - loss: 2.3016 - accuracy: 0.1125 - val_loss: 2.2961 - val_accuracy: 0.1134
...
Epoch 100/100
188/188 [==============================] - 1s 4ms/step - loss: 0.5469 - accuracy: 0.8522 - val_loss: 0.5519 - val_accuracy: 0.8515
validation loss: 0.5519
validation accuracy: 0.8515
special_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 16)                12560
_________________________________________________________________
dense_1 (Dense)              (None, 16)                272
_________________________________________________________________
dense_2 (Dense)              (None, 16)                272
_________________________________________________________________
dense_3 (Dense)              (None, 16)                272
_________________________________________________________________
dense_4 (Dense)              (None, 10)                170
=================================================================
Total params: 13,546
Trainable params: 13,546
Non-trainable params: 0
_________________________________________________________________
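As a quick sanity check on these parameter counts: a Dense layer has (inputs * units) weights plus units biases. So the first layer has 784 * 16 + 16 = 12560 parameters, each of the three hidden layers has 16 * 16 + 16 = 272, and the output layer has 16 * 10 + 10 = 170, for 13546 in total.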
The first one is done for you
# Below are the five classes of hyperparameters
LAYERS = [2, 3, 4]
UNITS = [4, 8, 16, 32, 64, 128, 256]
BATCH = [8, 16, 32, 64, 128]
LEARNING_RATE = [0.1, 0.01, 0.001, 0.0001, 0.00001]
EPOCHS = [10, 20, 30, 40, 100]
# Question 1: create a Layer-vs-accuracy graph for the following setting:
# units = [32, 128], Batch_size = [16, 64], Learning_rate = [0.1], Epochs = [10, 20]
# That means there will be 2*2*1*2 = 8 graphs
%%time
for B in [16, 64]:
    for E in [10, 20]:
        for U in [32, 128]:
            ACC = []
            for L in LAYERS:
                special_model = create_model(num_layers=L, num_units=U, flattened_size=x_test.shape[1],
                                             activation_function='sigmoid', class_size=num_classes)
                loss, acc = evaluate(FCmodel=special_model, Learning_Rate=0.1, batch_sz=B, epochs=E,
                                     add_visualization=False)
                ACC.append(acc)
            plt.plot(LAYERS, ACC)
            plt.xticks(LAYERS)
            plt.xlabel("Layer_size")
            plt.ylabel("Accuracy")
            TITLE = "Layer_size vs Acc for Batch size {}, Epoch {}, Unit size {}".format(B, E, U)
            plt.title(TITLE)
            plt.show()
CPU times: user 24min 34s, sys: 1min 56s, total: 26min 30s Wall time: 23min 12s
# Question 2: create a Unit size-vs-accuracy graph for the following setting:
# Layer size = [2], Batch_size = [16, 64], Learning_rate = [0.1], Epochs = [10, 20]
# That means there will be 1*2*1*2 = 4 graphs
## code starts here
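One possible sketch for Question 2, mirroring the Question 1 loop; sweeping the full UNITS list on the x-axis is an assumption here. Questions 3-5 follow the same pattern, swapping which hyperparameter is swept and which are held fixed.

for B in [16, 64]:
    for E in [10, 20]:
        ACC = []
        for U in UNITS:
            # Rebuild the model for every setting so each run trains from scratch
            special_model = create_model(num_layers=2, num_units=U, flattened_size=x_test.shape[1],
                                         activation_function='sigmoid', class_size=num_classes)
            loss, acc = evaluate(FCmodel=special_model, Learning_Rate=0.1, batch_sz=B, epochs=E,
                                 add_visualization=False)
            ACC.append(acc)
        plt.plot(UNITS, ACC)
        plt.xticks(UNITS)
        plt.xlabel("Unit_size")
        plt.ylabel("Accuracy")
        plt.title("Unit_size vs Acc for Batch size {}, Epoch {}, Layer size 2".format(B, E))
        plt.show()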
# Question 3: create a Batch size-vs-accuracy graph for the following setting:
# Layer size = [2], Unit_size = [16, 32], Learning_rate = [0.1], Epochs = [10, 20]
# That means there will be 1*2*1*2 = 4 graphs
## code starts here
# Question 4: create a Learning rate-vs-accuracy graph for the following setting:
# Layer size = [2], Unit_size = [16, 32], Batch_size = [16], Epochs = [10, 20]
# That means there will be 1*2*1*2 = 4 graphs
## code starts here
# Question 5: create an Epochs-vs-accuracy graph for the following setting:
# Layer size = [2], Unit_size = [32], Batch_size = [8, 16], Learning_rate = [0.1]
# That means there will be 1*1*2*1 = 2 graphs
## code starts here
Grid search over the hyperparameters to find the best set of hyperparameters.
Rank the hyperparameter settings among the following configurations.
Also comment on the three best and the three worst hyperparameter settings.
def hyperparameter_ranking(LAYERS, UNITS, BATCH_SIZE, LEARNING_RATE, EPOCHS) -> pd.DataFrame:
    '''
    Your code should return a DataFrame whose columns are LAYERS, UNITS, BATCH_SIZE, LEARNING_RATE,
    EPOCHS, and the corresponding accuracy. The DataFrame MUST be sorted in descending order of
    accuracy, as in the following cell. There will be 270 rows.
    '''
The code takes a long time to run.
Tips to get around this:
hyperparameter_helper(Layer_size, UNITS, BATCH_SIZE, LEARNING_RATE, EPOCHS) -> pd.DataFrame
# Using a fixed layer size results in a DataFrame of 90 rows per layer size, significantly reducing
# the number of experiments per call. You can use this helper function inside hyperparameter_ranking.
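A minimal sketch of how these two functions could fit together; the column names follow the table spec shown below, and the row counts depend on the hyperparameter lists you pass in:

def hyperparameter_helper(Layer_size, UNITS, BATCH_SIZE, LEARNING_RATE, EPOCHS) -> pd.DataFrame:
    # Sweep every combination for one fixed layer size and record the validation accuracy
    rows = []
    for U in UNITS:
        for B in BATCH_SIZE:
            for LR in LEARNING_RATE:
                for E in EPOCHS:
                    model = create_model(num_layers=Layer_size, num_units=U,
                                         flattened_size=x_train.shape[1],
                                         activation_function='sigmoid', class_size=num_classes)
                    loss, acc = evaluate(FCmodel=model, Learning_Rate=LR, batch_sz=B,
                                         epochs=E, add_visualization=False)
                    rows.append([Layer_size, U, B, LR, E, acc])
    return pd.DataFrame(rows, columns=['LAYERS', 'UNITS', 'BATCH_SIZE',
                                       'LEARNING_RATE', 'EPOCHS', 'Accuracy'])

def hyperparameter_ranking(LAYERS, UNITS, BATCH_SIZE, LEARNING_RATE, EPOCHS) -> pd.DataFrame:
    # One DataFrame per layer size, concatenated, then sorted by accuracy (descending)
    parts = [hyperparameter_helper(L, UNITS, BATCH_SIZE, LEARNING_RATE, EPOCHS) for L in LAYERS]
    df = pd.concat(parts, ignore_index=True)
    return df.sort_values('Accuracy', ascending=False).reset_index(drop=True)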
## Your DataFrame should have the following form. Please note: the accuracy values shown here are just
# for demonstration purposes. The accuracies should be in descending order.
# | LAYERS | UNITS | BATCH_SIZE | LEARNING_RATE | EPOCHS | Accuracy
# 0 3 16 32 0.1 10 0.71
# 1 4 32 16 0.001 20 0.68
# ............................................................
# ............................................................
# ............................................................
# 269 4 128 16 0.1 10 0.43
In this task, you have to find some advanced hyperparameters:
Special Note: You will investigate different settings for these hyperparameters as well. For example, ADAM has beta_1 and beta_2 values; you can assume they take their default values (0.9 and 0.999). For the other optimizers and the kernel initializers, you can also assume the default values for this subtask.
Follow the steps below:
Step 1: Choose the best set of parameters from subtask 2. This will be the first row of the DataFrame table.
Step 2: Modify the function create_model by creating a new function create_model_subtask3 that additionally considers the Dropout rate and the Kernel Initializer. For dropout, this can be done as follows:
model.add(...)           # a Dense layer
model.add(Dropout(0))    # followed by a Dropout layer
Step 3: Modify the function evaluate into evaluate_subtask3 to account for the different optimizers.
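A minimal sketch of what the two modified functions could look like. The parameter names (dropout_rate, kernel_init, optimizer_name) are assumptions, and the optimizer is built from its string name with tf.keras.optimizers.get, which returns each optimizer with its default settings:

from keras.layers import Dropout

def create_model_subtask3(num_layers, num_units, flattened_size, activation_function,
                          class_size, dropout_rate, kernel_init):
    seed(37)
    random.seed(37)
    tf.random.set_seed(37)
    model = Sequential()
    model.add(Dense(units=num_units, activation=activation_function,
                    kernel_initializer=kernel_init, input_shape=(flattened_size,)))
    model.add(Dropout(dropout_rate))
    for i in range(num_layers - 1):
        model.add(Dense(units=num_units, activation=activation_function,
                        kernel_initializer=kernel_init))
        model.add(Dropout(dropout_rate))
    model.add(Dense(units=class_size, activation='softmax'))
    return model

def evaluate_subtask3(FCmodel, optimizer_name, batch_sz, epochs):
    seed(37)
    random.seed(37)
    tf.random.set_seed(37)
    # e.g. optimizer_name = 'adam' gives ADAM with beta_1=0.9, beta_2=0.999
    FCmodel.compile(optimizer=tf.keras.optimizers.get(optimizer_name),
                    loss='categorical_crossentropy', metrics=['accuracy'])
    FCmodel.fit(x_train, y_train, batch_size=batch_sz, epochs=epochs,
                verbose=False, validation_data=(x_valid, y_valid))
    return FCmodel.evaluate(x_valid, y_valid, verbose=False)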
def advanced_hyperparameter_search(OPTIMIZER, DROPOUT_RATE, KERNEL_INITIALIZER) -> pd.DataFrame:
    '''
    In the same way as subtask 2, create a table with columns for
    OPTIMIZER, DROPOUT_RATE, KERNEL_INITIALIZER, and accuracy.
    The DataFrame MUST be sorted in descending order of accuracy, as in the following cell.
    There will be 48 rows.
    For the other parameters, use the best ones you found with the subtask 2 function hyperparameter_ranking.
    '''
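A sketch of the search body, assuming best_layers, best_units, best_batch, and best_epochs are placeholders holding the top row of the subtask 2 table, and keeping the same 'sigmoid' activation used earlier:

    rows = []
    for opt in OPTIMIZER:
        for dr in DROPOUT_RATE:
            for ki in KERNEL_INITIALIZER:
                model = create_model_subtask3(num_layers=best_layers, num_units=best_units,
                                              flattened_size=x_train.shape[1],
                                              activation_function='sigmoid', class_size=num_classes,
                                              dropout_rate=dr, kernel_init=ki)
                loss, acc = evaluate_subtask3(model, opt, best_batch, best_epochs)
                rows.append([opt, dr, ki, acc])
    df = pd.DataFrame(rows, columns=['OPTIMIZER', 'DROPOUT_RATE', 'KERNEL_INITIALIZER', 'Accuracy'])
    return df.sort_values('Accuracy', ascending=False).reset_index(drop=True)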
df.head(10) #it will print first 10 rows of the table
df.tail(10) #it will print last 10 rows of the table
special_model = create_model(num_layers=4, num_units=16, flattened_size=x_test.shape[1],
activation_function='sigmoid', class_size=num_classes)
loss, acc = evaluate(FCmodel=special_model, Learning_Rate=0.1, batch_sz=256, epochs=100)
After these lines, if you write
loss, acc = evaluate(FCmodel=special_model, Learning_Rate=0.1, batch_sz=256, epochs=100)
you would probably expect the model to be trained again from scratch. But it won't be: the model will continue training from where it left off, because the weights of special_model persist across calls to fit. So every time you need to retrain your model or train a new one, run both lines (create_model and evaluate) to be safe.
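To make that concrete (a restatement of the cells above, not new behavior):

# This alone would CONTINUE training special_model for another 100 epochs:
loss, acc = evaluate(FCmodel=special_model, Learning_Rate=0.1, batch_sz=256, epochs=100)

# Safe pattern: rebuild the model first so training starts from scratch:
special_model = create_model(num_layers=4, num_units=16, flattened_size=x_test.shape[1],
                             activation_function='sigmoid', class_size=num_classes)
loss, acc = evaluate(FCmodel=special_model, Learning_Rate=0.1, batch_sz=256, epochs=100)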