In this article we will discuss: 1. Introduction to Learning in Neural Networks 2. Learning Rules of Neurons in Neural Networks.
Introduction to Learning in Neural Networks:
The property which is of primary significance for a neural network is the ability of the network to learn from its environment, and to improve its performance through learning. The improvement in performance takes place over time in accordance with some prescribed measure.
A neural network learns about its environment through an interactive process of adjustments applied to its synaptic weights and bias levels. Ideally, the network becomes more knowledgeable about its environment after each iteration of the learning process.
There are too many activities associated with the notion of learning. Moreover, the process of learning is a matter of viewpoint, which makes it all the more difficult to agree on a precise definition of the term. For example, learning as viewed by a psychologist is quite different from learning in a classroom sense. Recognising that our particular interest is in neural networks, we use a definition of learning which is adapted from Mendel and McClaren (1970).
We define learning in the context of neural networks as:
Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place.
This definition of the learning process implies the following sequence of events:
1. The neural network is stimulated by an environment.
2. The neural network undergoes changes in its free parameters as a result of this stimulation.
3. The neural network responds in a new way to the environment because of the changes which have occurred in its internal structure.
A prescribed set of well-defined rules for the solution of a learning problem is called a learning algorithm. There is no unique learning algorithm for the design of neural networks. Rather, we have a kit of tools represented by a diverse variety of learning algorithms, each of which offers advantages of its own. Basically, learning algorithms differ from each other in the way in which the adjustment to a synaptic weight of a neuron is formulated.
Another factor to be considered is the manner in which a neural network (learning machine), made up of a set of interconnected neurons, reacts to its environment. In this latter context we speak of a learning paradigm which refers to a model of the environment in which the neural network operates.
The following five learning rules are basic to the design of neural networks:
1. Error-correction learning,
2. Memory-based learning,
3. Hebbian learning,
4. Competitive learning and
5. Boltzmann learning.
Some of these algorithms require the use of a teacher and some do not; they are called supervised and unsupervised learning, respectively.
In the study of supervised learning, a key provision is a ‘teacher’ capable of supplying exact corrections to the network outputs when an error occurs. Such a method is not possible in biological organisms, which have neither the exact reciprocal nervous connections needed for the back propagation of error corrections nor the nervous means for the imposition of behaviour from outside.
Nevertheless, supervised learning has established itself as a powerful paradigm for the design of artificial neural networks. In contrast, self-organised (unsupervised) learning is motivated by neurobiological considerations.
Learning Rules of Neurons in Neural Networks:
The five basic learning rules of neurons are:
1. Error-correction learning,
2. Memory-based learning,
3. Hebbian learning,
4. Competitive learning and
5. Boltzmann learning.
Error-correction learning is rooted in optimum filtering. Memory-based learning and competitive learning are both inspired by neurobiological considerations, whereas Boltzmann learning is different, being based on ideas borrowed from statistical mechanics. Also discussed are two learning paradigms, learning with a teacher and learning without a teacher, including the credit-assignment problem, which is basic to the learning process.
1. Error-Correction Learning:
To illustrate our first learning rule, consider the simple case of a neuron k constituting the only computational node in the output layer of a feedforward neural network, as depicted in Fig. 11.21. Neuron k is driven by a signal vector x(n) produced by one or more layers of hidden neurons, which are themselves driven by an input vector (stimulus) applied to the source nodes (i.e., input layer) of the neural network.
The argument n denotes discrete time, or more precisely, the time step of an iterative process involved in adjusting the synaptic weights of neuron k. The output signal of neuron k is denoted by yk(n). This output signal, representing the only output of the neural network, is compared with a desired response or target output, denoted by dk(n). Consequently, an error signal, denoted by ek(n), is produced. By definition, we thus have

ek(n) = dk(n) − yk(n)
The error signal ek(n) actuates a control mechanism, the purpose of which is to apply a sequence of corrective adjustments to the synaptic weights of neuron k. The corrective adjustments are designed to make the output signal yk(n) come closer to the desired response dk(n) in a step-by-step manner.
This objective is achieved by minimising a cost function, or index of performance, ε(n), defined in terms of the error signal ek(n) as:

ε(n) = (1/2) e²k(n)

That is, ε(n) is the instantaneous value of the error energy. The step-by-step adjustments to the synaptic weights of neuron k are continued until the system reaches a steady state (i.e., the synaptic weights are essentially stabilised). At that point the learning process is terminated.
The learning process described herein is referred to as error-correction learning. In particular, minimisation of the cost function ε(n) leads to a learning rule commonly referred to as the delta rule or Widrow-Hoff rule, named in honor of its originators. Let ωkj(n) denote the value of the synaptic weight ωkj of neuron k excited by element xj(n) of the signal vector x(n) at time step n. According to the delta rule, the adjustment Δωkj(n) applied to the synaptic weight ωkj at time step n is defined by

Δωkj(n) = η ek(n) xj(n)
where η is a positive constant which determines the rate of learning as we proceed from one step in the learning process to another. It is therefore natural that we refer to η as the learning-rate parameter.
In other words, the delta rule may be stated as:
The adjustment made to a synaptic weight of a neuron is proportional to the product of the error signal and the input signal of the synapse in question.
The delta rule, as stated herein, presumes that the error signal is directly measurable. For this measurement to be feasible, we clearly need a supply of the desired response from some external source that is directly accessible to neuron k.
In other words, neuron k is visible to the outside world, as depicted in Fig. 11.21(a). From this figure we also observe that error-correction learning is in fact local in nature. This amounts to saying that the synaptic adjustments made by the delta rule are localised around neuron k.
Having computed the synaptic adjustment Δωkj(n), the updated value of the synaptic weight ωkj is given by equation 11.26:

ωkj(n + 1) = ωkj(n) + Δωkj(n) ….(11.26)
Effectively, ωkj(n) and ωkj(n + 1) may be viewed as the old and new values of synaptic weight ωkj, respectively.
In computational terms we may also write:

ωkj(n) = z−1[ωkj(n + 1)]

where z−1 is the unit-delay operator. That is, z−1 represents a storage element.
Fig. 11.21(b) shows a signal-flow graph representation of the error-correction learning process, with regard to neuron k. The input signal xj and the induced local field vk of neuron k are referred to as the presynaptic and postsynaptic signals of the jth synapse of neuron k, respectively. The figure also shows that error-correction learning is an example of a closed-loop feedback system.
But from control theory we know that the stability of such a system is determined by those parameters which constitute the feedback loops of the system. In this case there is a single feedback loop, and the parameter of interest is η, the learning rate. To ensure the stability and convergence of the iterative learning process, η must therefore be selected judiciously.
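To make the delta rule concrete, here is a minimal Python sketch of error-correction learning for a single neuron; the linear activation, the function name, and the toy data are illustrative assumptions rather than part of the original formulation.

```python
import numpy as np

def delta_rule_step(w_k, x, d_k, eta=0.1):
    """One error-correction (delta rule) update for a single neuron k.

    w_k : synaptic weight vector of neuron k at time step n
    x   : input signal vector x(n)
    d_k : desired response d_k(n)
    eta : learning-rate parameter (to be chosen judiciously for stability)
    """
    y_k = np.dot(w_k, x)           # output y_k(n); a linear neuron is assumed here
    e_k = d_k - y_k                # error signal e_k(n) = d_k(n) - y_k(n)
    w_next = w_k + eta * e_k * x   # delta rule: w_kj(n+1) = w_kj(n) + eta*e_k(n)*x_j(n)
    return w_next, e_k

# Example: repeat the corrective adjustments until the error energy
# (1/2)*e_k^2 is essentially zero (steady state).
w = np.zeros(3)
x, d = np.array([1.0, 0.5, -0.2]), 0.8
for n in range(100):
    w, e = delta_rule_step(w, x, d)
print(w, 0.5 * e**2)
```

Because each adjustment is proportional to η, choosing η too large destabilises the feedback loop, which is exactly the stability concern noted above.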
2. Memory-Based Learning:
In memory-based learning, all (or most) of the past experiences are explicitly stored in a large memory of correctly classified input-output examples {(xi, di)}, i = 1, 2, …, N, where xi denotes an input vector and di denotes the corresponding desired response. Without loss of generality, we have restricted the desired response to be a scalar.
For example, in a binary pattern classification problem there are two classes (hypotheses), denoted by C1 and C2, to be considered. In this example, the desired response di takes the value 0 (or −1) for class C1 and the value 1 for class C2. When classification of a test vector xtest (not seen before) is required, the algorithm responds by retrieving and analysing the training data in a “local neighbourhood” of xtest.
All memory-based learning algorithms involve two essential ingredients:
a. Criterion used for defining the local neighbourhood of the test vector xtest.
b. Learning rule applied to the training examples in the local neighbourhood of xtest.
The algorithms differ from each other in the way in which these two ingredients are defined.
In a simple yet effective type of memory-based learning known as the nearest neighbour rule, the local neighbourhood is defined as the training example which lies in the immediate neighbourhood of the test vector xtest. In particular, the vector x′N ∈ {x1, x2, …, xN} is said to be the nearest neighbour of xtest if

min over i of d(xi, xtest) = d(x′N, xtest)

where d(xi, xtest) is the Euclidean distance between the vectors xi and xtest. The class associated with the minimum distance, that is, with the vector x′N, is reported as the classification of xtest. This rule is independent of the underlying distribution responsible for generating the training examples.
Cover and Hart (1967) have formally studied the nearest neighbour rule as a tool for pattern classification.
The analysis is based on two assumptions:
a. The classified examples (xi, di) are independently and identically distributed (iid), according to the joint probability distribution of the example (x, d).
b. The sample size N is infinitely large.
Under these two assumptions, it is shown that the probability of classification error incurred by the nearest neighbour rule is bounded above by twice the Bayes probability of error, that is, the minimum probability of error over all decision rules. In this sense, it may be said that half the classification information in a training set of infinite size is contained in the nearest neighbour, which is a surprising result.
A variant of the nearest neighbour classifier is the k-nearest neighbour classifier, which proceeds as follows:
a. Identify the k classified patterns which lie nearest to the test vector xtest for some integer k.
b. Assign xtest to the class (hypothesis) which is most frequently represented in the k nearest neighbours to xtest (i.e., use a majority vote to make the classification).
Thus, the k-nearest neighbour classifier acts like an averaging device.
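As an illustration, the following Python sketch implements the k-nearest neighbour classifier just described; setting k = 1 recovers the plain nearest neighbour rule. The function name and the toy memory are assumptions for demonstration only.

```python
import numpy as np

def knn_classify(x_test, X_train, d_train, k=1):
    """Classify x_test by majority vote among its k nearest stored examples."""
    # Euclidean distances d(x_i, x_test) to every stored example
    dists = np.linalg.norm(X_train - x_test, axis=1)
    # indices of the k training vectors nearest to x_test
    nearest = np.argsort(dists)[:k]
    # majority vote over the classes of those k neighbours
    labels, counts = np.unique(d_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy two-class memory {(x_i, d_i)} and a previously unseen test vector
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
d = np.array([0, 0, 1, 1])
print(knn_classify(np.array([0.95, 0.9]), X, d, k=3))   # -> 1
```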
3. Hebbian Learning:
Hebb’s postulate of learning is the oldest and the most famous of all learning rules; it is named in honor of the neuropsychologist Hebb (1949).
When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.
Hebb proposed this change as a basis of associative learning (at the cellular level), which would result in an enduring modification in the activity pattern of a spatially distributed “assembly of nerve cells”.
This statement is made in a neurobiological context. We may expand and rephrase it as a two-part rule:
a. If two neurons on either side of a synapse are activated simultaneously (i.e., synchronously), then the strength of that synapse is selectively increased.
b. If two neurons on either side of a synapse are activated asynchronously, then that synapse is selectively weakened or eliminated.
Such a synapse is called a Hebbian synapse. More precisely, we define a Hebbian synapse as a synapse which uses a time-dependent, highly local, and strongly interactive mechanism to increase synaptic efficiency as a function of the correlation between the presynaptic and postsynaptic activities.
From this definition we may deduce the following four key properties which characterise a Hebbian synapse:
i. Time-Dependent Mechanism:
This mechanism refers to the fact that the modifications in a Hebbian synapse depend on the exact time of occurrence of the presynaptic and postsynaptic signals.
ii. Local Mechanism:
By its very nature, a synapse is the transmission site where information-bearing signals (representing ongoing activity in the presynaptic and postsynaptic units) are in spatiotemporal contiguity. This locally available information is used by a Hebbian synapse to produce a local synaptic modification which is input-specific.
iii. Interactive Mechanism:
The occurrence of a change in a Hebbian synapse depends on signals on both sides of the synapse. That is, a Hebbian form of learning depends on a “true interaction” between presynaptic and postsynaptic signals in the sense that we cannot make a prediction from either one of these two activities by itself.
iv. Conjunctional or Correlation Mechanism:
One interpretation of Hebb’s postulate of learning is that the condition for a change in synaptic efficiency is the conjunction of presynaptic and postsynaptic signals. Thus, according to this interpretation, the co-occurrence of presynaptic and postsynaptic signals (within a short interval of time) is sufficient to produce the synaptic modification. It is for this reason that a Hebbian synapse is sometimes referred to as a conjunctional synapse or correlational synapse.
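The postulate above is qualitative; a common quantitative reading (an assumption here, not stated in the text) is the activity-product rule Δωkj = η yk xj, in which correlated presynaptic and postsynaptic signals strengthen the synapse and anticorrelated signals weaken it. A minimal Python sketch:

```python
import numpy as np

def hebbian_update(w_k, x, y_k, eta=0.01):
    """Activity-product (Hebbian) update: delta_w_kj = eta * y_k * x_j.

    Pre- and postsynaptic signals of the same sign strengthen the synapse;
    signals of opposite sign weaken it, as in the two-part rule above.
    """
    return w_k + eta * y_k * x

# Example: presynaptic vector x and postsynaptic response y_k = w . x
w = np.array([0.2, 0.1])
x = np.array([1.0, -1.0])
for _ in range(10):
    y = np.dot(w, x)             # postsynaptic signal
    w = hebbian_update(w, x, y)  # correlation-driven weight change
print(w)                         # weights grow along the correlated direction
```

Left unchecked, this simple form lets the weights grow without bound, which is why practical variants add a forgetting or normalisation term.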
4. Competitive Learning (Unsupervised Learning):
In competitive learning, as the name implies, the output neurons of a neural network compete among themselves to become active. Whereas in a neural network based on Hebbian learning several output neurons may be active simultaneously, in competitive learning only a single output neuron is active at any one time. It is this feature which makes competitive learning highly suited to discovering statistically salient features which may be used to classify a set of input patterns.
There are three basic elements to a competitive learning rule:
i. A set of neurons which are all the same except for some randomly distributed synaptic weights, and which therefore respond differently to a given set of input patterns.
ii. A limit imposed on the ‘strength’ of each neuron.
iii. A mechanism which permits the neurons to compete for the right to respond to a given subset of inputs, such that only one output neuron or only one neuron per group, is active (i.e., ‘on’) at a time. The neuron which wins the competition is called a winner-takes-all neuron.
Accordingly, the individual neurons of the network learn to specialise on ensembles of similar patterns; in so doing they become feature detectors for different classes of input patterns.
In the simplest form of competitive learning, the neural network has a single layer of output neurons, each of which is fully connected to the input nodes. The network may include feedback connections among the neurons, as indicated in Fig. 11.22. In the network architecture described herein, the feedback connections perform lateral inhibition, with each neuron tending to inhibit the neuron to which it is laterally connected. In contrast, the feedforward synaptic connections in the network of Fig. 11.15 are all excitatory.
For a neuron k to be the winning neuron, its induced local field vk for a specified input pattern x must be the largest among all the neurons in the network. The output signal yk of winning neuron k is set equal to one; the output signals of all the neurons which lose the competition are set equal to zero.
We thus write:

yk = 1 if vk > vj for all j, j ≠ k; yk = 0 otherwise

where the induced local field vk represents the combined action of all the forward and feedback inputs to neuron k.
Let ωkj denote the synaptic weight connecting input node j to neuron k. Suppose that each neuron is allotted a fixed amount of synaptic weight (i.e., all synaptic weights are positive), which is distributed among its input nodes; that is, for all k

Σj ωkj = 1
A neuron then learns by shifting synaptic weights from its inactive to active input nodes. If a neuron does not respond to a particular input pattern, no learning takes place in that neuron.
If a particular neuron wins the competition, each input node of that neuron relinquishes some proportion of its synaptic weight, and the weight relinquished is then distributed equally among the active input nodes. According to the standard competitive learning rule, the change Δωkj applied to synaptic weight ωkj is defined by

Δωkj = η(xj − ωkj) if neuron k wins the competition
Δωkj = 0 if neuron k loses the competition

where η is the learning-rate parameter. This rule has the overall effect of moving the synaptic weight vector ωk of the winning neuron k towards the input pattern x.
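A minimal Python sketch of this standard rule follows; the inputs are normalised so that their components sum to one, which keeps the constraint Σj ωkj = 1 satisfied after every update. All names and data are illustrative.

```python
import numpy as np

def competitive_step(W, x, eta=0.1):
    """One winner-takes-all competitive learning step.

    W : weight matrix with one row w_k per output neuron; each row is
        non-negative and sums to one (the fixed synaptic 'budget')
    x : input pattern, normalised so that sum_j x_j = 1
    """
    v = W @ x                      # induced local fields v_k
    k = int(np.argmax(v))          # winning neuron: the largest v_k
    W[k] += eta * (x - W[k])       # move w_k towards x; losers do not learn
    return W, k

# Example: three output neurons competing for 2-D input patterns.
rng = np.random.default_rng(0)
W = rng.random((3, 2))
W /= W.sum(axis=1, keepdims=True)  # enforce sum_j w_kj = 1 for all k
for x in rng.random((200, 2)):
    W, winner = competitive_step(W, x / x.sum())
print(W)                           # each row has drifted towards a cluster of inputs
```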
5. Boltzmann Learning:
The Boltzmann learning rule, named in honor of Ludwig Boltzmann, is a stochastic learning algorithm derived from ideas rooted in statistical mechanics. A neural network designed on the basis of the Boltzmann learning rule is called a Boltzmann machine.
In a Boltzmann machine the neurons constitute a recurrent structure, and they operate in a binary manner: for example, they are either in an ‘on’ state denoted by +1 or in an ‘off’ state denoted by −1. The machine is characterised by an energy function E, the value of which is determined by the particular states occupied by the individual neurons of the machine, as shown by

E = −(1/2) Σk Σj ωkj xk xj, j ≠ k

where xj is the state of neuron j and ωkj is the synaptic weight connecting neuron j to neuron k. The fact that j ≠ k means simply that none of the neurons in the machine has self-feedback. The machine operates by choosing a neuron at random, say neuron k, at some step of the learning process, and then flipping the state of neuron k from xk to −xk at some temperature T with probability

P(xk → −xk) = 1/(1 + exp(−ΔEk/T))
where ΔEk is the energy change (i.e., the change in the energy function of the machine) resulting from such a flip. We may note that T is not a physical temperature, but rather a pseudotemperature, as explained under the stochastic model of a neuron. If this rule is applied repeatedly, the machine will reach thermal equilibrium.
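A minimal Python sketch of this stochastic update is given below. Sign conventions for ΔEk differ across references; here ΔEk is computed as the energy decrease produced by the flip, so that with acceptance probability 1/(1 + exp(−ΔEk/T)) energy-lowering flips are the likely ones. The helper names and toy weights are assumptions.

```python
import numpy as np

def boltzmann_flip(x, W, T, rng):
    """Attempt one stochastic state flip of a randomly chosen neuron k.

    x : vector of binary states in {-1, +1}
    W : symmetric weight matrix with zero diagonal (no self-feedback)
    T : pseudotemperature
    """
    k = rng.integers(len(x))
    v_k = np.dot(W[k], x)          # net input to neuron k
    # Energy decrease E(old) - E(new) under E = -(1/2) sum_{j != k} w_kj x_k x_j
    dE = -2.0 * x[k] * v_k
    # Accept the flip with probability 1 / (1 + exp(-dE/T)); flips which
    # lower the energy (dE > 0) are accepted with high probability.
    if rng.random() < 1.0 / (1.0 + np.exp(-dE / T)):
        x[k] = -x[k]
    return x

# Repeated application drives the machine towards thermal equilibrium.
rng = np.random.default_rng(0)
W = rng.standard_normal((5, 5))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
x = rng.choice([-1, 1], size=5).astype(float)
for _ in range(1000):
    x = boltzmann_flip(x, W, T=1.0, rng=rng)
print(x)
```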
The neurons of a Boltzmann machine partition into two functional groups:
a. Visible and
b. Hidden.
The visible neurons provide an interface between the network and the environment in which it operates, whereas the hidden neurons always operate freely.
There are two modes of operation to be considered:
I. Clamped condition, in which the visible neurons are all clamped onto specific states determined by the environment.
II. Free-running condition, in which all the neurons (visible and hidden) are allowed to operate freely.
Let ρ+kj denote the correlation between the states of neurons j and k with the network in its clamped condition, and ρ−kj the correlation between the states of neurons j and k with the network in its free-running condition. Both correlations are averaged over all possible states of the machine when it is in thermal equilibrium.
Then, according to the Boltzmann learning rule, the change Δωkj applied to the synaptic weight ωkj from neuron j to neuron k is defined by:
Δωkj = η(ρ+kj − ρ−kj), j ≠ k ….(11.35)
where η is the learning-rate parameter. Moreover, both ρ+kj and ρ−kj range in value from −1 to +1.
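As a sketch of how equation (11.35) might be applied, the Python fragment below forms the weight adjustment from the two correlation matrices; in practice ρ+kj and ρ−kj would be estimated by averaging products of neuron states sampled at thermal equilibrium in the clamped and free-running phases. All names and the toy samples are illustrative.

```python
import numpy as np

def estimate_correlations(X):
    """Estimate rho_kj = <x_k x_j> from sampled state vectors (rows of X).

    States are +/-1, so every entry of the estimate lies in [-1, +1].
    """
    X = np.asarray(X, dtype=float)
    return X.T @ X / len(X)

def boltzmann_update(rho_plus, rho_minus, eta=0.01):
    """Boltzmann learning rule: delta_w_kj = eta * (rho+_kj - rho-_kj), j != k."""
    dW = eta * (rho_plus - rho_minus)
    np.fill_diagonal(dW, 0.0)      # j != k: no self-feedback weights
    return dW

# Toy usage with made-up equilibrium samples from the two phases:
clamped = [[1, -1, 1], [1, -1, 1], [1, 1, -1]]   # visible neurons clamped
free    = [[1, 1, 1], [-1, -1, 1], [1, -1, -1]]  # free-running condition
dW = boltzmann_update(estimate_correlations(clamped),
                      estimate_correlations(free))
print(dW)
```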