Author: Brian A. Ree
1: Class Template
Let's start by coming up with a general outline for our neural network class. From our experience with the previous
tutorial we can define three general functions our neural network class should support.
- init: Setup our nodes in the three layers we talked about in the previous tutorial.
- train: Refine our connection weights after being fed input from a training set of data and compare results to an expected outcome.
- query: Give an answer from the output nodes after being given an input.
Again, these are general topics our class must support, but there may be more things it'll need to do; we'll cross that bridge when we come to it.
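Here is a minimal sketch of the class skeleton we'll be filling in over the rest of this tutorial. The class name NeuralNetwork is just a placeholder assumption; the method names and parameters match the code we'll write below, and we'll need numpy and scipy for the math.
import numpy
import scipy.special
class NeuralNetwork:
    # Initialize a simple 3 layer neural network.
    def __init__(self, inputnodes, hiddennodes, outputnodes, learningrate):
        pass
    # Refine the link weights from a training example and its expected answers.
    def train(self, inputs_list, answers_list):
        pass
    # Feed an input through the network and return the output signals.
    def query(self, inputs_list):
        pass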
2: Initializing the Network
Our simple little network has three layers: input, hidden, and output. The hidden layer is really just a middle layer, but since all of its inputs are fed from the
input layer we consider it hidden, because we can't get at its values as easily as we can with the input and output layers.
We don't want to hard code anything in our network, so we'll take passed in parameters to set our layer sizes. We have to be careful and make sure that our input
parameters make sense, so we should check that our input, hidden, and output node counts are sensible values. We also need to set a learning rate.
Remember, our learning rate controls how we step toward the lowest point on our error curve. If this rate is very high we'll completely overshoot our minimum
and never find it, or we could end up bouncing back and forth around some local minimum of our error curve. We should choose a small learning rate, something less than 1, so that we can make sure
the adjustments to our weights are small.
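To see why, here's a tiny, hypothetical illustration, completely separate from our network class: gradient descent on the simple error curve f(x) = x * x, whose minimum sits at x = 0. With a small learning rate we creep toward the minimum; with a large one we overshoot it and bounce farther away on every step.
def descend(learning_rate, steps=5, x=1.0):
    # Take a few gradient descent steps on f(x) = x * x; the gradient is 2 * x.
    for _ in range(steps):
        x = x - learning_rate * 2.0 * x
    return x
print(descend(0.1))  # roughly 0.33, creeping toward the minimum at 0
print(descend(1.5))  # -32.0, overshooting and bouncing farther away each step
With that in mind, here's our initialization method.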
# Initialize a simple 3 layer neural network.
def __init__(self, inputnodes, hiddennodes, outputnodes, learningrate):
    # Set the number of nodes in each layer.
    self.inodes = inputnodes
    self.hnodes = hiddennodes
    self.onodes = outputnodes
    if self.inodes < 1 or self.hnodes < 1 or self.onodes < 1:
        print("Error: You must provide node counts that are greater than zero.")
        print("Unexpected results may occur.")
    # eif
    # Set the learning rate.
    self.lr = learningrate
    if self.lr > 1.0:
        print("Error: You must provide a learning rate that is less than or equal to one.")
        print("Unexpected results may occur.")
    # eif
    # Set the link weight matrices, wih and who.
    self.wih = numpy.random.normal(0.0, pow(self.hnodes, -0.5), (self.hnodes, self.inodes))
    self.who = numpy.random.normal(0.0, pow(self.onodes, -0.5), (self.onodes, self.hnodes))
    # Set the activation function for our neurons.
    self.activation_function = lambda x: scipy.special.expit(x)
# edef
We assign our input parameters to instance variables so we can access them in our other methods, and check that the node counts make sense.
Next up we store the learning rate in an instance variable and then do a quick check to make sure the learning rate isn't too large.
For now we'll just print an error message if we detect something wrong with our input variables.
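As a quick sanity check, here's how the initializer might be called, assuming the class is named NeuralNetwork as in the skeleton sketched earlier; the node counts and learning rate are just example values.
# A 3 input, 3 hidden, 3 output node network with a learning rate of 0.3.
input_nodes = 3
hidden_nodes = 3
output_nodes = 3
learning_rate = 0.3
n = NeuralNetwork(input_nodes, hidden_nodes, output_nodes, learning_rate)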
3: Initializing the Weights
The next step is to create a network of neural nodes and links. The most important part of the network is the link weights.
They're used to calculate the signal being fed forward and the back propagated error. It is the link weights that are refined
in an attempt to improve the network.
Weights can be represented as a matrix, so we can define them as follows:
- A matrix for the weights for the links between the input and hidden layers, w_input_hidden of size self.hnodes by self.inodes or hidden_nodes by input_nodes.
- A matrix for the weights for the links between the hidden and output layers, w_hidden_output of size self.onodes by self.hnodes or output_nodes by hidden_nodes.
Common practice is to set initial link weight values to small random numbers. The following numpy function
generates an array of values selected randomly between 0 and 1; the size is rows X columns.
numpy.random.rand(rows, columns)
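For example, a quick look at what that call produces (the exact values will differ on every run, since they're random):
import numpy
# A 3 X 3 array of random values between 0 and 1.
print(numpy.random.rand(3, 3))
We'll actually go one step further below and sample from a normal distribution centered on zero, which gives us both positive and negative starting weights.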
Let's set up our weight matrices.
# Set the link weight matrices, wih and who.
self.wih = numpy.random.normal(0.0, pow(self.hnodes, -0.5), (self.hnodes, self.inodes))
self.who = numpy.random.normal(0.0, pow(self.onodes, -0.5), (self.onodes, self.hnodes))
This is a subtle but important step in the neural network design process, initializing the link weight matrices.
Rather than using numpy.random.rand directly, we sample the weights from a normal distribution centered on 0.0 with a small standard deviation, pow(self.hnodes, -0.5) and pow(self.onodes, -0.5), which keeps the initial weights small and lets them be negative as well as positive.
We initialize a matrix of shape self.hnodes X self.inodes for our wih class variable, and a matrix of shape self.onodes X self.hnodes for our who class variable.
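To make the shapes concrete, here's a small standalone sketch using a hypothetical 3 input, 3 hidden, 3 output node network; the actual values will vary from run to run.
import numpy
hnodes, inodes, onodes = 3, 3, 3
wih = numpy.random.normal(0.0, pow(hnodes, -0.5), (hnodes, inodes))
who = numpy.random.normal(0.0, pow(onodes, -0.5), (onodes, hnodes))
print(wih.shape)  # (3, 3) -> hidden_nodes rows by input_nodes columns
print(who.shape)  # (3, 3) -> output_nodes rows by hidden_nodes columns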
4: Querying the Network
By querying the network we are asking the network to infer an answer based on some input values. The inferred output is generated
by feeding the input signals forward through our neural network; the weights of the connections between our artificial neurons moderate those signals and
create an output signal. Our class method, query, takes the input parameters, feeds them into our neural network, and returns the
network's output.
def query(self, inputs_list):
    # Convert inputs to a two dimensional matrix
    inputs = numpy.array(inputs_list, ndmin=2).T
    # Calculate the signals from the input layer into the hidden layer
    hidden_inputs = numpy.dot(self.wih, inputs)
    # Calculate the signals emerging from the hidden layer
    hidden_outputs = self.activation_function(hidden_inputs)
    # Calculate signals into final output layer
    final_inputs = numpy.dot(self.who, hidden_outputs)
    # Calculate the signals emerging from the final output layer
    final_outputs = self.activation_function(final_inputs)
    return final_outputs
# edef
To perform this task we need to pass the input signals from the input layer of nodes, through the hidden layer, and out
of the final output layer. Remember also that we use the link weights to moderate the signals as they feed into any given hidden or
output node, and we also use the sigmoid activation function to alter the signals coming out of the respective network nodes.
Before we can move forward we need to format the input data so that it has the proper shape.
We're planning to use this matrix in future calculations, so we have to make sure we can use it in a
matrix dot product.
# Convert inputs to a two dimensional matrix
inputs = numpy.array(inputs_list, ndmin=2).T
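A quick illustration of what that conversion does, using a made up three value input list:
import numpy
inputs_list = [1.0, 0.5, -1.5]
inputs = numpy.array(inputs_list, ndmin=2).T
print(inputs.shape)  # (3, 1) -> a column vector we can use in a matrix dot product
print(inputs)        # [[ 1. ]
                     #  [ 0.5]
                     #  [-1.5]]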
Using matrices we can express the weights for the links between the input layer and the hidden layer, and combine them with the matrix
of inputs to generate the signals that are fed into the hidden layer nodes.
x_hidden = w_input_hidden * inputs (where * is the matrix dot product)
Now look how easy it is to express this in python.
# Calculate the signals from the input layer to the hidden layer
hidden_inputs = numpy.dot(self.wih, inputs)
To get the signals emerging from the hidden nodes, we apply the sigmoid activation function to each emerging signal.
Remember the activation function is the expit function from the scipy.special library. We created a small lambda for the
activation function in our initialization method, self.activation_function = lambda x: scipy.special.expit(x).
o_hidden = sigmoid(x_hidden)
Now to express this in python.
# Calculate the signals emerging from the hidden layer
hidden_outputs = self.activation_function(hidden_inputs)
This step stores the signals emerging from the hidden layer nodes in the matrix called hidden_outputs. The process used
for the signals between the hidden and output nodes is similar.
# Calculate signals into final output layer
final_inputs = numpy.dot(self.who, hidden_outputs)
# Calculate the signals emerging from the final output layer
final_outputs = self.activation_function(final_inputs)
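Putting it together, here's how query might be called on the hypothetical network n we constructed earlier; the exact output values depend on the random initial weights.
# Ask the network for an answer to a made up three value input.
outputs = n.query([1.0, 0.5, -1.5])
print(outputs)        # a column of values between 0 and 1, one per output node
print(outputs.shape)  # (3, 1)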
The next method we'll flesh out is the train method. Remember there are two phases to training: the first is calculating
the output just as the query method does, and the second is back propagating the errors to inform the network of how
the link weights should be refined.
5: Training the Network
Now that we have our init and query methods defined we have to complete the train method.
The train method will run data through our network and adjust the network weights based on a comparison between the
expected output and the generated output. There are two parts to the training step.
- 1: Working out the output for a given training example. This is the same functionality as the query method.
- 2: Working out the network weight adjustments by taking the error, the expected outcome compared to the generated outcome, and back propagating
it through the network.
def train(self, inputs_list, answers_list):
    # Convert inputs and expected answers to two dimensional matrices
    inputs = numpy.array(inputs_list, ndmin=2).T
    answers = numpy.array(answers_list, ndmin=2).T
    # Calculate the signals from the input layer into the hidden layer
    hidden_inputs = numpy.dot(self.wih, inputs)
    # Calculate the signals emerging from the hidden layer
    hidden_outputs = self.activation_function(hidden_inputs)
    # Calculate signals into final output layer
    final_inputs = numpy.dot(self.who, hidden_outputs)
    # Calculate the signals emerging from the final output layer
    final_outputs = self.activation_function(final_inputs)
    # Output layer error is the (answer - guess)
    output_errors = answers - final_outputs
    # Hidden layer error is the output_errors, split by weights, recombined at hidden nodes
    hidden_errors = numpy.dot(self.who.T, output_errors)
    # Update the weights for the links between the hidden and output layers
    self.who += self.lr * numpy.dot((output_errors * final_outputs * (1.0 - final_outputs)), numpy.transpose(hidden_outputs))
    # Update the weights for the links between the input and hidden layers
    self.wih += self.lr * numpy.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), numpy.transpose(inputs))
# edef
This code is almost exactly the same as that in the query method, because we're feeding the signal forward from the input layer to the final
output layer in exactly the same way. The only difference thus far is that we have an additional parameter, answers_list, defined in the train
function, because you can't train the network without an expected output.
The inputs_list and answers_list are converted into numpy.array data types. We're getting closer to the back propagation step and the weight
refinement based on the error. First let's calculate the error.
# Output layer error is the (answer - guess)
output_errors = answers - final_outputs
Next we need to calculate the back-propagated errors for the hidden layer nodes. The matrix form of this calculation is as follows.
errors_hidden = weights_T_hidden_output * errors_output
Where weights_T_hidden_output is the matrix transpose of the weights_hidden_output matrix.
Again we are altering the shape of the matrix so that we can use it properly in our matrix expressions.
This is expressed in python as follows.
# Hidden layer error is the output_errors, split by weights, recombined at hidden nodes
hidden_errors = numpy.dot(self.who.T, output_errors)
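As a quick sanity check on the shapes, here's a small standalone sketch using made up sizes and error values (2 output nodes and 3 hidden nodes, so the shape logic is easy to see):
import numpy
who = numpy.random.normal(0.0, 0.5, (2, 3))      # (output_nodes, hidden_nodes)
output_errors = numpy.array([[0.8], [0.1]])      # (output_nodes, 1), made up error values
hidden_errors = numpy.dot(who.T, output_errors)  # who.T has shape (hidden_nodes, output_nodes)
print(hidden_errors.shape)                       # (3, 1) -> one recombined error value per hidden node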
We now have what we need to refine the weights between each layer. For the weights between the hidden and output layers, we
use the output_errors variable. For the weights between the input and the hidden layers, we use the hidden_errors list we
just calculated.
The expression for updating the weights for the link between a node j and a node k in the next layer is a matrix of the
following form.
DELTA W_jk = ALPHA . E_k . sigmoid(O_k) . (1 - sigmoid(O_k)) * O_T_j
The alpha is the learning rate, and the sigmoid is the node activation function we saw before. Remember that the . is
element by element multiplication and * is the matrix dot product. The python code for this expression is as follows.
# Update the weights for the links between the hidden and output layers
self.who += self.lr * numpy.dot((output_errors * final_outputs * (1.0 - final_outputs)), numpy.transpose(hidden_outputs))
The code for the other weights between the input and hidden layers will be very similar.
# Update the weights for the links between the input and hidden layers
self.wih += self.lr * numpy.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), numpy.transpose(inputs))
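Finally, here's a sketch of how train might be called, reusing the hypothetical network n from earlier; the input and answer values are made up purely for illustration.
# One made up training example: three input signals and the answers we expect
# from the three output nodes. Real training would loop over a whole data set.
example_inputs = [0.9, 0.1, 0.8]
example_answers = [0.99, 0.01, 0.01]
for _ in range(100):
    n.train(example_inputs, example_answers)
# After repeated training passes the network's guess should drift toward the answers.
print(n.query(example_inputs))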
In the next tutorial we'll use our code to recognize handwritten numbers!! What?!?!