Neural Network from Scratch

6 minute read

This project consists of a simple implementation of a neural network from scratch. This means we are not going to use any framework or advanced libraries to help us.

The idea behind this is simply to understand better how neural networks work. So, let’s begin.

Importing libs

import numpy as np
import math
import random

Simple Multi Layer Perceptron From Scratch

Activation Functions

def step_func(x):
    # Step activation: fires (returns 1) when the weighted sum is non-negative
    return 1 if x >= 0 else 0

def sigmoid(x):
    # Smooth, differentiable approximation of the step function
    return 1 / (1 + math.exp(-x))
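
A property we will rely on later, in backpropagation, is that the derivative of the sigmoid can be written using only its output: sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)). A minimal check of that identity (the sigmoid_derivative helper is just for illustration):

def sigmoid_derivative(x):
    # d/dx sigmoid(x), expressed in terms of the sigmoid output itself
    s = sigmoid(x)
    return s * (1 - s)

# numerical check against a central finite difference
h = 1e-6
print(sigmoid_derivative(0.5), (sigmoid(0.5 + h) - sigmoid(0.5 - h)) / (2 * h))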

Output function

def neuron_output(weights, x):
    # A neuron's output: sigmoid of the dot product between its weights and the input
    # (the last entry of x is expected to be the bias input, 1)
    return sigmoid(np.dot(weights, x))

Feed-Forward

It receives a neural network, represented as a list of layers, where each layer is a list of neurons and each neuron is a list of weights, and returns the outputs of every layer for the given input.

def feed_forward(neural_network, input_vec):
    outputs = []

    # Process one layer at a time
    for layer in neural_network:
        input_with_bias = list(input_vec) + [1]  # append the bias input
        output = [neuron_output(neuron, input_with_bias) for neuron in layer]
        outputs.append(output)

        # the output of this layer becomes the input of the next one
        input_vec = output

    return outputs

Feed-Forward network example

Neural network simulating an XOR gate

The weights of the XOR network were picked precisely to implement the exclusive-or (XOR) function. This function is not linearly separable, so a single perceptron cannot fit it. The weights are chosen so that the first layer is composed of an AND neuron and an OR neuron, and the second layer is composed of a single neuron computing “second input and not the first” (a quick check of these neurons follows the list below).

  • Weights AND: [k, k, -1.5k]
  • Weights OR: [k, k, -0.5k]
  • Weights “second input and not the first”: [-k, k, -0.5k]
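
As a quick sanity check of these weight choices (a minimal sketch reusing neuron_output from above; the and_neuron and or_neuron names are just illustrative), we can evaluate the first-layer neurons on the four possible inputs:

and_neuron = [20, 20, -30]  # ~ [k, k, -1.5k] with k = 20
or_neuron = [20, 20, -10]   # ~ [k, k, -0.5k] with k = 20

for x in [0, 1]:
    for y in [0, 1]:
        # the trailing 1 is the bias input
        print(x, y,
              round(neuron_output(and_neuron, [x, y, 1])),  # 1 only when x = y = 1
              round(neuron_output(or_neuron, [x, y, 1])))   # 1 when x = 1 or y = 1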

XOR NN architecture

xor_network = [# hidden layer
               [[20, 20, -30],   # AND
                [20, 20, -10]],  # OR
               # output layer
               [[-60, 60, -30]]] # "second input and not the first"

for x in [0, 1]:
    for y in [0, 1]:
        print(x, y, feed_forward(xor_network, [x, y])[-1])
0 0 [9.38314668300676e-14]
0 1 [0.9999999999999059]
1 0 [0.9999999999999059]
1 1 [9.383146683006828e-14]

Backpropagation

Backpropagation Step by Step:

  • Run the feed_forward for an input and save the outputs of each neuron
  • Calculate the error for each neuron (target - output)
  • Calculate the gradient for this error as a function of the weights of the neural network
  • Adjust the weights in the direction that the error decreases
  • Propagate output errors to hidden layers
  • Calculate the gradients of these errors and adjust the weights of all hidden layers

def backpropagate(network, input_vec, targets):

    # feed_forward for the input, keeping the outputs of both layers
    hidden_outputs, outputs = feed_forward(network, input_vec)

    # Calculates the delta for each output neuron:
    # output * (1 - output) is the sigmoid derivative, (output - target) is the error
    output_deltas = [output * (1 - output) * (output - target)
                     for output, target in zip(outputs, targets)]

    # Adjusts the weights for the output layer, one neuron at a time
    for i, output_neuron in enumerate(network[-1]):
        # -1 to focus on the last layer (the output layer)
        for j, hidden_output in enumerate(hidden_outputs + [1]):
            # adjusts the j-th weight of this neuron
            output_neuron[j] -= output_deltas[i] * hidden_output

    # Propagates the output deltas back to the hidden layer
    hidden_deltas = [hidden_output * (1 - hidden_output) *
                     np.dot(output_deltas, [n[i] for n in network[-1]])
                     for i, hidden_output in enumerate(hidden_outputs)]

    # Adjusts the weights for the hidden layer, one neuron at a time
    for i, hidden_neuron in enumerate(network[0]):
        for j, input_value in enumerate(list(input_vec) + [1]):
            hidden_neuron[j] -= hidden_deltas[i] * input_value
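
As a quick sanity check of the whole routine (a minimal sketch reusing the functions above; the tiny_net name and the 2-2-1 shape are only illustrative), we can run a few backpropagation steps on a single example and watch the output move toward the target:

random.seed(0)
tiny_net = [[[random.random() for _ in range(3)] for _ in range(2)],  # hidden layer: 2 neurons, 2 inputs + bias
            [[random.random() for _ in range(3)]]]                    # output layer: 1 neuron, 2 hidden outputs + bias

print(feed_forward(tiny_net, [1, 0])[-1])  # output before any update
for _ in range(100):
    backpropagate(tiny_net, [1, 0], [1])
print(feed_forward(tiny_net, [1, 0])[-1])  # closer to the target 1 after the updates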

Neural Network vs Captcha

Let’s train a neural network to try to defeat a simple captcha. Each captcha digit will be interpreted as a 5x5 binary matrix, flattened into a list of 25 values.

zero = [1, 1, 1, 1, 1,
        1, 0, 0, 0, 1,
        1, 0, 0, 0, 1,
        1, 0, 0, 0, 1,
        1, 1, 1, 1, 1]
one = [0, 0, 1, 0, 0,
       0, 0, 1, 0, 0,
       0, 0, 1, 0, 0,
       0, 0, 1, 0, 0,
       0, 0, 1, 0, 0]
two = [1, 1, 1, 1, 1,
       0, 0, 0, 0, 1,
       1, 1, 1, 1, 1,
       1, 0, 0, 0, 0,
       1, 1, 1, 1, 1]
three = [1, 1, 1, 1, 1,
         0, 0, 0, 0, 1,
         1, 1, 1, 1, 1,
         0, 0, 0, 0, 1,
         1, 1, 1, 1, 1]
four = [1, 0, 0, 0, 1,
        1, 0, 0, 0, 1,
        1, 1, 1, 1, 1,
        0, 0, 0, 0, 1,
        0, 0, 0, 0, 1]
five = [1, 1, 1, 1, 1,
        1, 0, 0, 0, 0,
        1, 1, 1, 1, 1,
        0, 0, 0, 0, 1,
        1, 1, 1, 1, 1]
six = [1, 1, 1, 1, 1,
       1, 0, 0, 0, 0,
       1, 1, 1, 1, 1,
       1, 0, 0, 0, 1,
       1, 1, 1, 1, 1]
seven = [1, 1, 1, 1, 1,
         0, 0, 0, 0, 1,
         0, 0, 0, 0, 1,
         0, 0, 0, 0, 1,
         0, 0, 0, 0, 1]
eight = [1, 1, 1, 1, 1,
         1, 0, 0, 0, 1,
         1, 1, 1, 1, 1,
         1, 0, 0, 0, 1,
         1, 1, 1, 1, 1]
nine = [1, 1, 1, 1, 1,
        1, 0, 0, 0, 1,
        1, 1, 1, 1, 1,
        0, 0, 0, 0, 1,
        1, 1, 1, 1, 1]

Now, we gather all the digits into a single list of inputs.

inputs = [zero, one, two, three, four, five, six, seven, eight, nine]
print(np.array(inputs).shape)
(10, 25)

For the targets, we can use a 10x10 identity matrix: each digit’s target is a one-hot encoded vector.

targets = [[1 if i == j else 0 for i in range(10)] for j in range(10)]
print(np.array(targets).shape)
(10, 10)
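
For instance, the target vector for the digit three has a 1 only at index 3:

print(targets[3])
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]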

Now, we build the neural network. The input size is 25 (5x5), 5 hidden neurons are sufficient for this task, and the last layer has 10 neurons, one per class.

All the initial weights are initialized randomly.

random.seed(42)
input_size = 25
num_hidden = 5
num_class = 10

# each neuron has one weight per input plus one bias weight
hidden_layer = [[random.random() for _ in range(input_size + 1)]
                for _ in range(num_hidden)]

output_layer = [[random.random() for _ in range(num_hidden + 1)]
                for _ in range(num_class)]

network = [hidden_layer, output_layer]

The train function is pretty straightforward: it just runs backpropagation over all examples for a number of epochs.

def train(network, epochs, inputs, targets):

    for _ in range(epochs):
        for input_vec, target_vec in zip(inputs, targets):
            backpropagate(network, input_vec, target_vec)

10,000 epochs are more than enough, taking about 14 seconds on my computer.

%%time
train(network, 10000, inputs, targets)
CPU times: user 12.5 s, sys: 27.9 ms, total: 12.5 s
Wall time: 12.5 s

The prediction is the output of the last layer; the argmax function selects the index of the largest value on that layer, which is the predicted class.

def predict(network, x):
    return np.argmax(feed_forward(network, x)[-1])
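
As a first check (a small sketch reusing the trained network), we can predict the training examples themselves; if training converged, each digit should map back to its own index:

for i, input_vec in enumerate(inputs):
    # expected to print matching pairs (0 0, 1 1, ...) if training converged
    print(i, predict(network, input_vec))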

Testing the Network

Let’s test the network by trying to classify data that is not in the training set. For this, we will use digits drawn in a slightly different way.

input examples

# Testing with a different 3
predict(network, [0, 1, 1, 1, 1,
                  0, 0, 0, 1, 1,
                  0, 1, 1, 1, 1,
                  0, 0, 0, 1, 1,
                  0, 1, 1, 1, 1])
1
# Testing with a different 7
predict(network, [1, 1, 1, 1, 1,
                  0, 0, 0, 1, 0,
                  0, 0, 1, 0, 0,
                  0, 1, 0, 0, 0,
                  1, 0, 0, 0, 0])
1
# Testing with a different 8
predict(network, [0, 1, 1, 1, 0,
                  1, 0, 0, 0, 1,
                  1, 1, 1, 1, 1,
                  1, 0, 0, 0, 1,
                  0, 1, 1, 1, 0])
1

Trying to visualize what the neural network does

We can easily check how the weights are distributed by plotting them as a 5x5 grid: larger (absolute) weights appear darker and smaller ones appear lighter.

import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline

def show_weights(weights):
    # use absolute values so strongly negative weights also show up as dark
    abs_weights = list(map(abs, weights))

    # transform the first 25 weights into a 5x5 grid (the bias weight is left out)
    grid = [abs_weights[row:(row + 5)] for row in range(0, 25, 5)]

    ax = plt.gca()

    ax.imshow(grid,
              cmap='binary', # uses black and white
              interpolation='none')

show_weights(network[0][0])

layer visualize
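
To compare all the hidden neurons at once, a small sketch that reuses the show_weights helper and simply draws one figure per neuron:

for i, neuron in enumerate(network[0]):
    plt.figure()
    plt.title('Hidden neuron %d' % i)
    show_weights(neuron)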