An Artificial Neural Network is an approach to modelling the
structure and function of the brain. It is an attempt to simulate, with
specialised hardware or software, the simple information processing capabilities
of neurones connected in multiple layers. It is closely aligned with fields such
as connectionism, parallel distributed processing, and neuro-processing.
This paper is intended as an overview of the
subject, and does not delve too deeply into the mathematical or technical realms
of Artificial Neural Networks.
The first step towards artificial neural
networks is generally taken to be a 1943 paper on how neurones might work by
neurophysiologist Warren McCulloch and mathematician Walter Pitts, who also
constructed a simple neural network with electrical circuits. Donald Hebb
reinforced this work with the 1949 book "The Organization of Behavior", which
suggested that neural pathways are strengthened each time they are used.
The introduction of early electronic computers in the
1950s allowed these theories to be tested: IBM's Nathaniel Rochester led the
first computer simulation of a neural network. At this time
traditional serial computing began to steal the limelight from artificial
intelligence and neural network studies, although academic seminars such as the
1956 Dartmouth Summer Research Project on Artificial Intelligence provided some
momentum to AI and neural processing.
In 1957, John von Neumann
suggested using telegraph relays or thermionic valves to simulate simple neurone
functions. In the same year, Frank Rosenblatt began work on the Perceptron, a
machine modelled on the eye of a fly. The Perceptron was a hardware device with
a single layer of processing. It worked by computing the weighted sum of its
inputs, subtracting a threshold level, and outputting one of two possible values.
Perceptrons are mainly used for pattern recognition.
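The computation just described (a weighted sum, minus a threshold, producing one of two possible values) can be sketched in a few lines of Python; the weights and threshold below are illustrative, not taken from Rosenblatt's machine:

```python
def perceptron(inputs, weights, threshold):
    """Single-layer perceptron: weighted sum of inputs, minus a
    threshold, passed out as one of two possible values."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum - threshold >= 0 else 0

# Illustrative weights that make the unit fire only when both
# binary inputs are active (a logical AND).
print(perceptron([1, 1], [0.6, 0.6], 1.0))  # fires: 1
print(perceptron([1, 0], [0.6, 0.6], 1.0))  # does not fire: 0
```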
In 1959, Bernard Widrow and Marcian Hoff of Stanford developed the first neural
network models to be applied to a real-world problem. Their models, ADALINE and
MADALINE, were adaptive filters built to eliminate echo on telephone lines.
Research in neural networks went through a dark age until 1982 when John
Hopfield of Caltech presented a paper to the National Academy of Sciences which
showed through mathematical analysis what could and could not be achieved by
neural networks. The last 20 years have seen neural networks become a "hot"
research topic, embraced by commercial interests and applied to numerous fields.
Biological vs Artificial Neurones
Biological neurones consist of a cell body, or soma, containing a
nucleus; branch-like dendrites that transfer information via synapses from
surrounding cells to the soma; and an axon that carries the nerve impulse from
the soma to its target structure.
The artificial neurone is a grossly
simplified model of the biological specimen, containing the four basic elements:
synapses, dendrites, soma, and axon.
The synapses and dendrites of the artificial neurone are the inputs
to the processing element (soma). Each input has an associated
connection weight which simulates the strength of a particular synaptic
connection. The processing element multiplies each input by its connection
weight and usually sums these products; the sum is then passed to the transfer
function, which generates a result that is transmitted via the output path. The
transfer function dictates the firing of the neurone. It could be based on a
fixed threshold level, a linear function, or a sigmoid function where the
threshold for output varies. Neurones can be classed as excitatory or inhibitory
depending on the effect their output has on the output of a target neurone.
An approximation of
the 3-dimensional interconnectedness of biological neurones is achieved in
artificial neural networks by the use of layers.
The three types of
layer involved are input, output, and hidden. The input layer accepts some kind
of real-world stimulus. This is transmitted to one or more hidden layers, which
process the input information with regard to their connections with the input
layer and the weights of those connections. The results of these
transformations are then passed to the output layer, where they are again
processed with regard to connections and weights, and the result is
communicated to the user or the outside world.
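As a rough sketch, this flow through the layers might look like the following in Python, assuming a single hidden layer and a sigmoid transfer function; the weight values here are arbitrary illustrations:

```python
import math

def sigmoid(x):
    """Sigmoid transfer function: output varies smoothly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weight_matrix):
    """Each neurone in the layer: weighted sum of its inputs,
    passed through the transfer function."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)))
            for row in weight_matrix]

# Input layer -> hidden layer -> output layer
stimulus = [0.5, 0.9]                        # real-world input
hidden_weights = [[0.4, -0.2], [0.3, 0.8]]   # two hidden neurones
output_weights = [[1.0, -1.0]]               # one output neurone
hidden = layer(stimulus, hidden_weights)
output = layer(hidden, output_weights)
```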
Layer connection types
A degree of complexity is achieved in the artificial neural network through the
application of different systems of layer connection. Inter-layer connections
(between layers) can consist of the following systems:
Fully Connected Each neurone of the first layer is connected to each neurone of
the second layer.
Partially Connected Not all neurones in the first layer are
connected to the second layer.
Feed Forward Information from the
first layer flows to the second layer without any feedback from the second
layer.
Bi-directional Pathways exist to allow the output of the second layer to become
the input of the first layer.
Hierarchical Where neurones of one level
may only communicate with neurones of the adjacent level.
Resonance The use of bi-directional connections that continue to pass messages
between the layers until a target condition is reached.
The complexity of layer connections can be further increased by the
use of two types of intra-layer (within-layer) systems.
Neural networks are trained towards specific outputs by
imposing a learning scheme. Learning occurs when a network alters the weights
of its component connections so as to bring it closer to a desired output or
problem solution. Three learning schemes are used:
Unsupervised learning The hidden layers organise the weights of their
connections without influence from outside the network.
Reinforcement/supervised learning Where
the weights of the connections in the hidden layers are randomised and the
resultant output is graded by an instructor or target data set according to how
close it is to the desired output.
Back propagation A highly successful
method of training multi-level networks where not only feedback on proximity to
target outputs is returned to the hidden layers, but also information on the
errors themselves.
Off-line In off-line methods of learning, once the network is
in operation its connection weights are fixed. Most networks are off-line.
On-line On-line or real-time learning is where the system continues
to learn whilst being used as a decision tool, such as in a decision support
system. This type of learning method requires a complex architecture.
Learning laws are mathematical algorithms that dictate how the connection
weights of a neural network will be altered during learning. This is again a
crude approximation of biological function, as our knowledge of biological
learning systems is incomplete. Some of the major laws are:
Hebb's Rule If a neurone
receives input from another neurone, and both are highly active, the weight
between the neurones should be strengthened.
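Hebb's Rule reduces to a single multiplicative update; a minimal sketch in Python, where the learning rate and activity values are illustrative assumptions:

```python
def hebb_update(weight, pre_activity, post_activity, learning_rate=0.1):
    """Strengthen the connection in proportion to the joint
    activity of the sending and receiving neurones."""
    return weight + learning_rate * pre_activity * post_activity

w = 0.5
w = hebb_update(w, pre_activity=1.0, post_activity=1.0)  # both active: weight grows to 0.6
w = hebb_update(w, pre_activity=0.0, post_activity=1.0)  # sender inactive: weight unchanged
```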
Hopfield Law If the
desired output and input are both active or both inactive, increment the
connection weight by the learning rate (usually a positive number between zero
and one).
The Delta Rule / Least Mean Squared Rule / Widrow-Hoff Rule
A rule where the input connection weights are continuously modified to
reduce the difference (delta) between the desired output and the actual output
of the neurone. This rule aims to minimise the mean squared error of the
network, and error data is back-propagated through the layers in sequence until
the first layer is reached.
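For a single linear neurone such as ADALINE, one Delta Rule step can be sketched as follows; the inputs, target, and learning rate are illustrative:

```python
def delta_rule_step(weights, inputs, target, learning_rate=0.5):
    """One Widrow-Hoff update: nudge each weight in proportion to
    the error (target - output) times its input."""
    output = sum(w * x for w, x in zip(weights, inputs))  # linear output, as in ADALINE
    error = target - output
    return [w + learning_rate * error * x for w, x in zip(weights, inputs)]

weights = [0.0, 0.0]
for _ in range(20):  # repeated steps shrink the error towards zero
    weights = delta_rule_step(weights, inputs=[1.0, 0.5], target=1.0)
```

Each pass multiplies the remaining error by a constant factor less than one, so the neurone's output converges on the target.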
Kohonen's Learning Law A law where
neurones compete for the opportunity to change their connection weights. The
neurone with the largest output is given the power to inhibit its competitors
and excite its neighbours.
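A minimal winner-take-all sketch of this competition in Python (the network size, inputs, and learning rate are illustrative, and the neighbour-excitation part of the law is omitted for brevity):

```python
def kohonen_step(weight_rows, inputs, learning_rate=0.2):
    """Winner-take-all: only the neurone with the largest output is
    allowed to adapt, moving its weights towards the input."""
    outputs = [sum(w * x for w, x in zip(row, inputs)) for row in weight_rows]
    winner = outputs.index(max(outputs))
    weight_rows[winner] = [w + learning_rate * (x - w)
                           for w, x in zip(weight_rows[winner], inputs)]
    return winner

weights = [[0.9, 0.1], [0.1, 0.9]]        # two competing neurones
winner = kohonen_step(weights, [1.0, 0.0])  # neurone 0 wins and moves towards the input
```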
The applications of neural networks can be categorised by five functions:
Prediction This is the use of some input values to predict an
output. Examples of this can be seen in stock market prediction, cardiovascular
disease prediction, and airline booking systems.
Classification The use of input values to classify objects. Pattern recognition
is a major application area of neural networks. This can also include tasks like
handwriting recognition, text-to-speech conversion, diagnosis of disease,
optical quality control, image processing, and chemical analysis.
Data association Similar to the classification of objects, but including
feedback on errors in the system. This could be found in fault-tolerant systems.
Data conceptualisation Analyse inputs so that grouping relationships can be
inferred. This finds use in the creation of market demographics and in data
mining.
Data filtering The use of the neural network to reduce
errors or noise in an input. This capability is found in image processing and in
improving the signal-to-noise ratio in communication systems.
Neural networks and parallel systems in general are becoming a major force in
high-end computing. Industry has taken over what was once an obscure research
field. IBM is now producing large parallel systems, and at the other end of the
market, low-cost supercomputers called Beowulf clusters, consisting of scavenged
PCs, are being built in basements and garages around the world. Despite this,
the intelligence of HAL, the paranoid onboard computer in Stanley Kubrick's
2001: A Space Odyssey, still seems to be science fiction for the moment.