Artificial Neural Networks

15 January 2010 | Tagged as: neural-networks

What are Neural Networks?

Traditionally, computing has focused on creating programmes that perform certain tasks to an exact specification and in a procedural way. Computers have become extremely fast at tasks such as adding numbers and can complete them in a fraction of the time that a human would need.

However, other tasks such as speech recognition and visual identification, which are performed easily by humans, are not executed as effectively by procedural computer programmes.

To understand why humans are better than computers at these tasks, we need to understand how computing is tackled in biological systems. The centre for computing in a human being is the brain. The architecture of the brain, rather than being serial like a computer, has a parallel design. This allows it to process many different pieces of information at the same time, all of which interact to produce a solution.

In addition to its parallel nature, one of the most important features of the brain is its ability to learn. Rather than having specific instructions for each step, the brain can pick things up based on previous experience.

By using these principles we can design computer programmes known as Neural Networks, which attempt to apply the brain's solutions to these tasks.

The Biological Neuron

A human brain contains about 10^10 processing units called neurons, each of which is connected to about 10^4 other neurons. The figure below shows two connected neurons.

Neurons

The main components of a neuron are as follows:

- The Soma: this is the body of the neuron and contains the cell nucleus.
- The Dendrites: these are fine nerve fibres which receive all input into the neuron.
- The Axon: this nerve fibre is used for the output of the neuron.
- The Synapses: these are special connectors that join the axon of one cell to the dendrite of another cell.

Neurons receive input from the synapses of other neurons in the form of chemicals known as neurotransmitters. This causes an electrical charge to build up in the receiving neuron. If this charge exceeds a threshold limit, an electrical pulse is sent down the axon to be transmitted to other neurons via the synapses.

Learning is thought to occur when modifications are made to the synaptic junctions, allowing more or less neurotransmitter to be released.

The Artificial Neuron

The Artificial Neuron was first proposed by Warren S. McCulloch and Walter Pitts in 1943. The basic features of this model are:

- The neuron receives a series of weighted inputs.
- The neuron calculates the summation of its inputs.
- The summation of the inputs is passed through an activation function, which outputs either a 0 or a 1 depending on whether the sum is below or above a certain threshold.
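The three features above can be sketched in a few lines of Python. This is a minimal illustration, not the original 1943 formulation: the function name `threshold_neuron` and the weights and threshold chosen for logical AND are hypothetical values picked for the example.

```python
def threshold_neuron(inputs, weights, threshold):
    """Weighted sum of the inputs followed by a hard threshold (outputs 0 or 1)."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


# Hypothetical weights and threshold chosen so the unit behaves like logical AND.
and_weights = [1.0, 1.0]
and_threshold = 1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", threshold_neuron([a, b], and_weights, and_threshold))
```

Only the input pair (1, 1) produces a sum of 2.0, which is the only one that reaches the threshold of 1.5.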

The figure below shows a representation of an artificial neuron.

Artificial Neuron

A bias input, which usually has a constant output of -1, can be added to the neuron. Because it is weighted like any other input, it acts as an adjustable threshold and pushes the activation towards 0.

The activation function can take many forms. At its simplest, a step or Heaviside function can be used. Once the summation of the inputs reaches the threshold, the output becomes 1. An alternative to the step function is the sigmoid function. The sigmoid function has the following equation:

Sigmoid

Where e is Euler's number and k is the slope parameter. The slope parameter controls the gradient of the sigmoid and therefore its sensitivity: the lower the number, the more sensitive the function. The benefit of using the sigmoid function as opposed to the step function is that its output changes gradually rather than jumping abruptly from 0 to 1. This allows the network to accept large inputs and still remain sensitive to small changes.
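To make the comparison concrete, here is a small Python sketch of both activation functions. The exact form of the sigmoid is an assumption: 1 / (1 + e^(-x / k)) is used because it matches the statement that a lower k gives a more sensitive function, but another parameterisation may have been intended.

```python
import math


def step(x, threshold=0.0):
    """Heaviside step: 0 below the threshold, 1 at or above it."""
    return 1.0 if x >= threshold else 0.0


def sigmoid(x, k=1.0):
    # Assumed form: 1 / (1 + e^(-x / k)); a smaller k gives a steeper curve.
    return 1.0 / (1.0 + math.exp(-x / k))


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, step(x), round(sigmoid(x, k=1.0), 3), round(sigmoid(x, k=0.25), 3))
```

With the smaller slope parameter the sigmoid moves away from 0.5 much faster around x = 0, so it behaves more like the step function.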

The figure below shows the difference between the sigmoid and step functions.

Sigmoid Step

Learning in single artificial neurons, or Perceptrons as they are known, is achieved by altering the weights of the connections between neurons to increase or decrease their strength. Several algorithms set out how the weights should be adjusted, the most common of which is the Widrow-Hoff delta rule. This rule calculates the difference between the target output of the neuron and its actual output, then adjusts the weights in proportion to this error in order to minimise it.
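A minimal sketch of this kind of error-driven update is shown below, trained on logical OR. It makes some simplifying assumptions: the error is taken on the thresholded output (the perceptron form of the rule), the bias is a constant -1 input, and the learning rate `eta` and epoch count are arbitrary illustrative values.

```python
def predict(weights, inputs):
    # Step activation on the weighted sum; inputs[0] is the constant bias input of -1.
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= 0 else 0


def train_delta_rule(samples, eta=0.1, epochs=50):
    weights = [0.0, 0.0, 0.0]  # [bias weight, w1, w2]
    for _ in range(epochs):
        for (x1, x2), target in samples:
            inputs = [-1.0, x1, x2]
            error = target - predict(weights, inputs)
            # Adjust each weight in proportion to the error and to its own input.
            weights = [w + eta * error * x for w, x in zip(weights, inputs)]
    return weights


or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights = train_delta_rule(or_samples)

for (x1, x2), target in or_samples:
    print((x1, x2), "target", target, "output", predict(weights, [-1.0, x1, x2]))
```

Because OR is linearly separable, the weights stop changing once every sample is classified correctly.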

The Multilayer Perceptron and the Back Propagation Learning Algorithm

The fundamental problem with single Perceptrons is their inability to solve even simple linearly inseparable problems such as XOR. However, by arranging Perceptrons in layers it is possible to overcome this problem. This new model is known as the Multilayer Perceptron (MLP), an example of which can be seen in the figure below. This MLP has 3 input units. The output of these input units is fed, together with a bias, to the 5 hidden units. These hidden units in turn, together with a bias, feed the 2 output units.
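To see why layering helps, the sketch below solves XOR with two hidden threshold units and one output unit. The weights are hand-picked for illustration rather than learned, and the network is smaller than the 3-5-2 example described above.

```python
def step(total):
    return 1 if total >= 0 else 0


def unit(inputs, weights, bias_weight):
    # The bias input is fixed at -1, so bias_weight acts as the unit's threshold.
    return step(sum(w * x for w, x in zip(weights, inputs)) - bias_weight)


def xor_mlp(a, b):
    h_or = unit([a, b], [1.0, 1.0], 0.5)    # fires if a OR b
    h_and = unit([a, b], [1.0, 1.0], 1.5)   # fires only if a AND b
    return unit([h_or, h_and], [1.0, -1.0], 0.5)  # OR but not AND


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_mlp(a, b))
```

Each hidden unit draws one straight line through the input space; the output unit combines the two regions, which no single unit can do.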

MLP

Unfortunately, as we don't know the target output of the Perceptrons in the hidden layer of the MLP, we cannot calculate their error directly and so cannot use the Widrow-Hoff learning algorithm.

A new learning algorithm is therefore required. Originally proposed by P. Werbos and later 'rediscovered' by Rumelhart et al., the Generalised Delta Rule or Back-Propagation Rule tackles this problem by calculating the error for a particular input and then back-propagating it to the previous layers.

The Back-Propagation algorithm consists of two phases known as the forward pass and the backward pass. At the outset, all the weights and thresholds of the MLP are set to small random values. During the forward pass, inputs are presented to the input neurons. These inputs are then transferred to each of the hidden neurons, each of which computes the weighted sum of its inputs using the following function:

Sum Inputs

Where w is the weight of input i and x is the value of input i. This sum is then passed through the threshold function. The hidden neurons produce an output based on the threshold function which is then transferred to the output neurons. The output neurons do their own summation of inputs and weights and this is again passed through the threshold function. The output of the output neurons is calculated and this is compared to the target output, to produce an error. The error is computed as follows:
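The forward pass can be sketched as follows for a network with 3 inputs, 5 hidden units and 2 outputs. Several details are assumptions made for the example: each neuron's sum is taken as the sum over i of w_i * x_i with a bias input of -1, a sigmoid stands in for the threshold function, the error is taken as half the sum of squared differences between target and output, and the input and target values are made up.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights):
    # Each row holds one neuron's weights; the last entry weights a bias input of -1.
    extended = inputs + [-1.0]
    return [sigmoid(sum(w * x for w, x in zip(row, extended))) for row in weights]


def random_weights(n_neurons, n_inputs, rng):
    # Small random starting values, one extra weight per neuron for the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_neurons)]


rng = random.Random(0)
hidden_weights = random_weights(5, 3, rng)
output_weights = random_weights(2, 5, rng)

inputs = [0.2, 0.7, 0.1]   # illustrative input pattern
targets = [1.0, 0.0]       # illustrative target outputs

hidden = layer_forward(inputs, hidden_weights)
outputs = layer_forward(hidden, output_weights)
error = 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))

print("hidden:", hidden)
print("outputs:", outputs)
print("error:", error)
```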

Errors

Where t is the target output and o is the actual output. The next stage is the backward pass. This involves adjusting the weights of the connections between the hidden and output neurons and input and hidden neurons to minimise the error. The output of a neuron is based on its threshold function, so the derivative of its threshold function can be used to pass the error back to the previous layer. The function used to calculate the new weights is as follows:

Calculate Weights

o_pi: the output of the input neuron. δ_pj: the error term.

For output units:

Adjust Outputs

And for hidden units:

Adjust Hidden

Where the sum is over the n neurons in the layer above neuron j
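For reference, the generalised delta rule for sigmoid units is commonly written as below, using the symbols defined in this section. This is the standard textbook form; the index n for the update step, the index k for neurons in the layer above, and the use of the sigmoid derivative o(1 - o) are assumptions, so the equations may differ in detail from those originally shown.

```latex
\begin{align}
\Delta w_{ji}(n+1) &= \eta \, \delta_{pj} \, o_{pi} + \alpha \, \Delta w_{ji}(n) \\
\delta_{pj} &= (t_{pj} - o_{pj}) \, o_{pj} \, (1 - o_{pj}) && \text{for output units} \\
\delta_{pj} &= o_{pj} \, (1 - o_{pj}) \sum_{k} \delta_{pk} \, w_{kj} && \text{for hidden units}
\end{align}
```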

η: the Learning Rate. This is a number between 0 and 1 which controls the size of the adjustments made to the weights. A small number means that the network could take a long time to learn. A large number will allow the network to learn faster, but could cause it to overshoot or potentially push the network into a local minimum.

α: Momentum. This is also a number between 0 and 1 which, when used in conjunction with a small learning rate, can help to speed up convergence.
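Putting the two passes together, the sketch below performs repeated back-propagation steps with a learning rate and momentum on a tiny network with 2 inputs, 2 hidden units and 1 output. It assumes sigmoid units, a bias input of -1 and the standard delta terms; the starting weights, the training pattern and the values of `eta` and `alpha` are illustrative only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [-1.0]))) for row in w_hidden]
    output = [sigmoid(sum(w * v for w, v in zip(row, hidden + [-1.0]))) for row in w_out]
    return hidden, output


def backprop_step(x, target, w_hidden, w_out, prev_hidden, prev_out, eta=0.5, alpha=0.5):
    """One forward and backward pass; returns the error measured before the update."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error terms for output units: (t - o) * o * (1 - o)
    delta_out = [(t - o) * o * (1 - o) for t, o in zip(target, output)]
    # Error terms for hidden units: o * (1 - o) * sum over k of delta_k * w_kj
    delta_hidden = [
        h * (1 - h) * sum(delta_out[k] * w_out[k][j] for k in range(len(w_out)))
        for j, h in enumerate(hidden)
    ]
    # Weight change = eta * delta * input + alpha * previous change
    for k, row in enumerate(w_out):
        for j, v in enumerate(hidden + [-1.0]):
            prev_out[k][j] = eta * delta_out[k] * v + alpha * prev_out[k][j]
            row[j] += prev_out[k][j]
    for j, row in enumerate(w_hidden):
        for i, v in enumerate(x + [-1.0]):
            prev_hidden[j][i] = eta * delta_hidden[j] * v + alpha * prev_hidden[j][i]
            row[i] += prev_hidden[j][i]
    return 0.5 * sum((t - o) ** 2 for t, o in zip(target, output))


# Illustrative starting weights: 2 hidden units and 1 output unit, each with a bias weight.
w_hidden = [[0.1, -0.2, 0.05], [-0.1, 0.3, -0.05]]
w_out = [[0.2, -0.1, 0.1]]
prev_hidden = [[0.0] * 3 for _ in w_hidden]
prev_out = [[0.0] * 3 for _ in w_out]

errors = [
    backprop_step([1.0, 0.0], [1.0], w_hidden, w_out, prev_hidden, prev_out)
    for _ in range(100)
]
print("first error:", errors[0], "last error:", errors[-1])
```

The hidden error terms are computed before any weights are changed, so the error is passed back through the same weights that produced the output.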

As the data set that we are using consists of a time series, we need some way of representing time across our network. We will pre-process time by performing a temporal to spatial transformation. The spatial information passed to the network then contains a representation of time.

To achieve this we pass a ‘sliding window’ across our data. At any given time t, for n inputs and time step τ, the inputs will be the values at times t, t - τ, t - 2τ, …, t - (n - 1)τ. The target output will be the value at time t + τ.

At each time step the values of the nodes are shifted down and the one at the end drops out. A new value is then inserted at the first node. See the figure below.
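The sliding window can be sketched in Python as below. The function name `sliding_windows` and the sample series are hypothetical; the newest value is placed first in each window, matching the description of new values being inserted at the first node.

```python
def sliding_windows(series, n_inputs, step=1):
    """Return (inputs, target) pairs; inputs are ordered newest first."""
    pairs = []
    first_t = (n_inputs - 1) * step
    for t in range(first_t, len(series) - step):
        window = [series[t - i * step] for i in range(n_inputs)]
        pairs.append((window, series[t + step]))
    return pairs


series = [10, 11, 12, 13, 14, 15]
for window, target in sliding_windows(series, n_inputs=3):
    print(window, "->", target)
# [12, 11, 10] -> 13
# [13, 12, 11] -> 14
# [14, 13, 12] -> 15
```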

Time Space

The Time Delay Neural Network

The main problem with representing time as discussed above is the inability of the network to recognise the same pattern shifted in time.

If we consider the binary pattern 0011100100 at time t, the corresponding pattern at a later time might be 0000111001, which is the same pattern shifted two places to the right. The MLP with temporal to spatial transformation would see these as two completely different patterns.

The Time Delay Neural Network (TDNN) can capture this translational invariance. The figure below shows an example of this type of network.

TDNN

Each input and hidden node in the TDNN stores a fixed number of previous values, which depends on the delay length.

The TDNN uses the standard Back Propagation learning algorithm, with a slight variation: All values for each node are used to feed the nodes in the next layer. Weight changes are calculated for each value of each node and an average is taken. All weights for each node are then updated to the same value to achieve invariance.
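The effect of giving every delayed value the same weights can be illustrated with a small sketch. It is not a full TDNN: it only applies one hypothetical set of three weights at every time offset of the two binary patterns used earlier, to show that a shifted pattern produces a shifted but otherwise identical response.

```python
def shared_weight_response(pattern, weights):
    """Apply one set of weights at every time offset of the pattern."""
    bits = [int(c) for c in pattern]
    width = len(weights)
    return [
        sum(w * b for w, b in zip(weights, bits[start:start + width]))
        for start in range(len(bits) - width + 1)
    ]


weights = [1, 1, 1]  # hypothetical detector for three consecutive ones
early = shared_weight_response("0011100100", weights)
late = shared_weight_response("0000111001", weights)

print(early)  # [1, 2, 3, 2, 1, 1, 1, 1]
print(late)   # [0, 0, 1, 2, 3, 2, 1, 1]
```

The peak response of 3 appears in both cases, two positions later for the shifted pattern, which is the kind of invariance that tying the weights together is meant to achieve.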

A ‘sliding window’ will also be used in conjunction with the network to feed in the data.

The Recurrent Neural Network

Using a fixed number of delays, set in advance, can however impose a limit on the length of patterns that can be learnt. Additionally, each delay can be viewed, to a certain extent, as an independent node. If large numbers of delays are used, this could dramatically increase the length of time required for training. The Recurrent Neural Network (RNN) is an alternative to the TDNN which works on the principle that time and memory are highly task dependent, so networks should be allowed to achieve their own representation of them instead of having them fixed.

These networks have an extra set of nodes called Context Neurons. At each time step the current value of the hidden units is copied to corresponding context units. A proportion of the previous value of the context unit is also used to provide its new value. The size of this proportion is known as the Memory Depth. The following equation is used to calculate this new value:

Context

Where C is the context unit, H is the hidden unit and α is the proportion. This value is then fed to the network at the next time step together with the new inputs from a sliding window. The back propagation learning algorithm can also be used for weight adjustment. The figure below shows a similar layout to the RNN that is used in this project.
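A minimal sketch of the context update is given below. The exact form is an assumption based on the description: the new context value is taken as the current hidden value plus α times the previous context value. The function name `update_context`, the value of `alpha` and the hidden activations are illustrative.

```python
def update_context(context, hidden, alpha):
    # Assumed form: new context = hidden value + alpha * previous context value.
    return [h + alpha * c for c, h in zip(context, hidden)]


context = [0.0, 0.0]
alpha = 0.5  # hypothetical memory depth

for hidden in ([1.0, 0.0], [0.0, 0.0], [0.0, 1.0]):
    context = update_context(context, hidden, alpha)
    print(context)
# [1.0, 0.0]
# [0.5, 0.0]
# [0.25, 1.0]
```

The first context unit keeps a fading trace of the activation it saw at the first step, which is how the network retains a memory of earlier inputs.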

RNN
