In this article we will discuss: 1. Introduction to the Models of a Neuron 2. Activation Functions of a Neuron 3. Stochastic Model of a Neuron.
Introduction to the Models of a Neuron:
A neuron is an information-processing unit which is fundamental to the operation of a neural network. The block diagram of Fig. 11.4 shows the model of a neuron, which forms the modern basis for designing (artificial) neural networks.
Here we identify three basic elements of the neuronal model:
1. A set of synapses or connecting links, each of which is characterised by a weight or strength of its own. Specifically, a signal xj at the input of synapse j connected to neuron k is multiplied by the synaptic weight wkj; the first subscript refers to the neuron in question and the second subscript refers to the input end of the synapse to which the weight refers. Unlike a synapse in the brain, the synaptic weight of an artificial neuron may lie in a range which includes negative as well as positive values.
2. An adder for summing the input signals, weighted by the respective synapses of the neuron; the operations described here constitute a linear combiner.
3. An activation function for limiting the amplitude of the output of a neuron. The activation function is also referred to as a squashing function in that it squashes (limits) the permissible amplitude range of the output signal to some finite value.
A network with all the inputs connected directly to the outputs is called a single-layer neural network or a perceptron network, since “each output unit is independent of the others; each weight affects only one of the outputs.”
Typically, the normalised amplitude range of the output of a neuron is written as the closed unit interval [0, 1] or, alternatively, [-1, 1].
The basic neuronal model of Fig. 11.4 also includes an externally applied bias, denoted by bk. The bias bk has the effect of increasing or lowering the net input of the activation function, depending on whether it is positive or negative, respectively.
In mathematical terms, we may describe a neuron k by writing the following pair of equations:

uk = Σ(j=1 to m) wkj xj ….(11.1)

and

yk = ϕ(uk + bk) ….(11.2)

where x1, x2, …, xm are the input signals; wk1, wk2, …, wkm are the synaptic weights of neuron k; uk is the linear combiner output due to the input signals; bk is the bias; ϕ(.) is the activation function; and yk is the output signal of the neuron. The use of the bias bk has the effect of applying an affine transformation to the output uk of the linear combiner in the model of Fig. 11.4, as shown by
vk = uk + bk ….(11.3)
In particular, depending on whether the bias bk is positive or negative, the relationship between the induced local field or activation potential vk of neuron k and the linear combiner output uk is modified in the manner illustrated in Fig. 11.5; the graph of vk versus uk no longer passes through the origin as a result of the affine transformation.
Equivalently, we may formulate the combination of Equations (11.1) to (11.3) as follows:

vk = Σ(j=0 to m) wkj xj ….(11.4)

and

yk = ϕ(vk) ….(11.5)

In Eq. (11.4) we have added a new synapse. Its input is

x0 = +1 ….(11.6)

and its weight is

wk0 = bk ….(11.7)
We may therefore reformulate the model of neuron k as in Fig. 11.6.
In this figure, the effect of the bias is accounted for by doing two things:
(1) Adding a new input signal fixed at +1, and
(2) Adding a new synaptic weight equal to the bias bk.
Although the models of Figs. 11.4 and 11.6 are different in appearance, they are mathematically equivalent.
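To make the model concrete, here is a minimal NumPy sketch of a single neuron k as described by Eqs. (11.1) to (11.7); the function names and example values are illustrative only, not part of the original model.

```python
import numpy as np

def neuron_output(x, w, b, phi):
    """y_k = phi(u_k + b_k): linear combiner followed by an activation (Fig. 11.4)."""
    u = np.dot(w, x)          # linear combiner output u_k, Eq. (11.1)
    v = u + b                 # induced local field v_k, Eq. (11.3)
    return phi(v)             # output y_k, Eq. (11.2)

def neuron_output_folded(x, w, b, phi):
    """Equivalent formulation of Fig. 11.6: the bias is treated as the weight
    w_k0 = b_k of an extra input fixed at x_0 = +1 (Eqs. 11.4 to 11.7)."""
    x_aug = np.concatenate(([1.0], x))    # new input signal fixed at +1
    w_aug = np.concatenate(([b], w))      # new synaptic weight equal to b_k
    return phi(np.dot(w_aug, x_aug))      # y_k = phi(v_k), Eq. (11.5)

# Both formulations give the same output for any activation function phi:
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.3, 0.8, -0.2])
step = lambda v: 1.0 if v >= 0 else 0.0
assert neuron_output(x, w, b=0.1, phi=step) == neuron_output_folded(x, w, b=0.1, phi=step)
```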
Activation Functions of a Neuron:
The purpose of a non-linear activation function is to ensure that the neuron’s response is bounded, i.e., the actual response of the neuron is conditioned or damped as a result of large and small activating stimuli and is thus controllable. Further, in order to achieve the advantages of multilayer nets compared with the limited capabilities of single-layer networks, non-linear functions are required.
The activation function, denoted by ϕ(v), defines the output of a neuron in terms of the induced local field v. It is designed to meet two considerations. First, we want the unit to be ‘active’ (near +1) when the ‘right’ inputs are given and ‘inactive’ (near 0) when the ‘wrong’ inputs are given. Second, the activation function needs to be non-linear, otherwise the entire neural network collapses into a simple linear function.
Here we identify three basic types of activation functions:
1. Threshold Function:
For this type of activation function, shown in Fig. 11.7a, we have

ϕ(v) = 1 if v ≥ 0
ϕ(v) = 0 if v < 0 ….(11.8)
In engineering literature, this form of a threshold function is commonly referred to as a Heaviside function.
Correspondingly, the output of neuron k employing such a threshold function is expressed as:

yk = 1 if vk ≥ 0
yk = 0 if vk < 0 ….(11.9)

where vk is the induced local field of the neuron; that is,

vk = Σ(j=1 to m) wkj xj + bk ….(11.10)
where the amplification factor inside the linear region of operation is assumed to be unity. This form of activation function may be viewed as an approximation to a non-linear amplifier.
Such a neuron is referred to as the McCulloch-Pitts model, in recognition of the pioneering work done by McCulloch and Pitts (1943). In this model the output of a neuron takes on the value of 1 if the induced local field of that neuron is non-negative, and 0 otherwise. This statement describes the all-or-none property of the McCulloch-Pitts model.
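A one-line NumPy sketch of the threshold (Heaviside) activation of Eq. (11.8); the vectorised form is an implementation convenience, not part of the McCulloch-Pitts definition.

```python
import numpy as np

def threshold(v):
    """Heaviside / threshold activation, Eq. (11.8): 1 if v >= 0, else 0.
    This is the all-or-none rule of the McCulloch-Pitts neuron."""
    return np.where(v >= 0.0, 1.0, 0.0)
```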
2. Piecewise-Linear Function:
Piecewise-linear functions are combinations of various linear functions, where the choice of the linear function depends on the relevant region of the input space. Step and ramp functions are special cases of piecewise-linear functions, which consist of a finite number of linear segments and are thus differentiable almost everywhere, with the second derivative equal to zero wherever it exists.
The following two situations may be viewed as special forms of the piecewise-linear function (see the sketch after this list):
I. A linear combiner arises if the linear region of operation is maintained without running into saturation. In that case the threshold perceptron is called a linear separator or identity function, as shown in Fig. 11.8(a).
Fig. 11.8(a). Linear function.
II. The piecewise-linear function reduces to a threshold function if the amplification factor of the linear region is made infinitely large.
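The text does not reproduce Eq. (11.11) itself, so the sketch below assumes the common convention of a unit-slope ramp saturating at 0 and 1 with breakpoints at v = -1/2 and v = +1/2; both special cases listed above follow from it.

```python
import numpy as np

def piecewise_linear(v):
    """Ramp activation with unit amplification in the linear region:
    0 for v <= -1/2, v + 1/2 for -1/2 < v < +1/2, and 1 for v >= +1/2.
    The breakpoints at +/- 1/2 are an assumed convention."""
    return np.clip(v + 0.5, 0.0, 1.0)

def piecewise_linear_gain(v, gain):
    """Increasing the amplification factor steepens the linear region;
    as gain -> infinity this approaches the threshold function."""
    return np.clip(gain * v + 0.5, 0.0, 1.0)
```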
3. Sigmoid Function:
The sigmoid function, whose graph is S-shaped, is by far the most common form of activation function used in the construction of artificial neural networks. It is defined as a strictly increasing function which exhibits a graceful balance between linear and non-linear behavior.
An example of the sigmoid function is the logistic function, defined by:

ϕ(v) = 1/(1 + exp(-av)) ….(11.12)
where a is the slope parameter of the sigmoid function. By varying the parameter a, we obtain sigmoid functions of different slopes, as illustrated in Fig. 11.7c. In fact, the slope at the origin equals a/4. In the limit, as the slope parameter approaches infinity, the sigmoid function becomes simply a threshold function.
Whereas a threshold function assumes the value 0 or 1, a sigmoid function assumes a continuous range of values from 0 to 1. Moreover, the sigmoid function is differentiable, whereas the threshold function is not. (Differentiability is an important feature of neural network theory, because weight-learning algorithms rely on it.)
All of these functions have their threshold at zero; the bias weight sets the actual threshold for the unit, in the sense that the unit is activated when the weighted sum of the real inputs exceeds -bk.
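A short NumPy sketch of the logistic function of Eq. (11.12) with slope parameter a, together with its derivative, which is what weight-learning algorithms exploit; the derivative formula is standard calculus, not quoted from the article.

```python
import numpy as np

def logistic(v, a=1.0):
    """Logistic sigmoid, Eq. (11.12): phi(v) = 1 / (1 + exp(-a*v)).
    The slope at the origin is a/4; as a -> infinity the graph approaches the threshold function."""
    return 1.0 / (1.0 + np.exp(-a * v))

def logistic_derivative(v, a=1.0):
    """phi'(v) = a * phi(v) * (1 - phi(v)), defined for every v, unlike the threshold function."""
    y = logistic(v, a)
    return a * y * (1.0 - y)
```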
The activation functions defined in Eqs. (11.8), (11.11) and (11.12) range from 0 to +1. It is sometimes desirable to have the activation function range from -1 to +1 in which case the activation function assumes an anti-symmetric form with respect to the origin; that is, the activation function is an odd function of the induced local field.
Specifically, the threshold function of Eq. (11.8) is now defined as:

ϕ(v) = 1 if v > 0
ϕ(v) = 0 if v = 0
ϕ(v) = -1 if v < 0 ….(11.13)

which is commonly referred to as the signum function. For the corresponding form of a sigmoid function (the bipolar sigmoid) we may use the hyperbolic tangent function, defined by

ϕ(v) = tanh(v) ….(11.14)
Allowing an activation function of the sigmoid type to assume negative values as prescribed by Eq. (11.14) has analytic benefits.
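A sketch of the two bipolar forms just described, assuming the hyperbolic tangent is used in its plain form tanh(v); both range over -1 to +1 rather than 0 to 1.

```python
import numpy as np

def signum(v):
    """Signum function, Eq. (11.13): +1 for v > 0, 0 for v = 0, -1 for v < 0."""
    return np.sign(v)

def bipolar_sigmoid(v):
    """Hyperbolic-tangent sigmoid, Eq. (11.14): smooth, odd and differentiable,
    with outputs in the open interval (-1, +1)."""
    return np.tanh(v)
```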
Gaussian Functions:
Bell-shaped curves such as the one shown in Fig. 11.8(b) have come to be known as Gaussian or radial basis functions.
Fig. 11.8(b). Gaussian node function; ϕ(v) asymptotically approaches 0 (or some constant) for large magnitudes of v, and ϕ(v) has a single maximum at v = h.
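A minimal sketch of a Gaussian (radial basis) node function matching Fig. 11.8(b); the centre h and width sigma are illustrative parameters assumed here, since the article does not give the formula.

```python
import numpy as np

def gaussian(v, h=0.0, sigma=1.0):
    """Bell-shaped node function: a single maximum at v = h,
    asymptotically approaching 0 for large |v - h|."""
    return np.exp(-((v - h) ** 2) / (2.0 * sigma ** 2))
```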
Stochastic Model of a Neuron:
The neuronal model described in Fig. 11.6 is deterministic in that its input-output behavior is precisely defined for all inputs. For some applications of neural networks, it is desirable to base the analysis on a stochastic neuronal model. In an analytically tractable approach, the activation function of the McCulloch-Pitts model is given a probabilistic interpretation.
Specifically, a neuron is permitted to reside in only one of two states: +1 or -1, say. The decision for a neuron to fire (i.e., switch its state from ‘off’ to ‘on’) is probabilistic. Let x denote the state of the neuron and P(v) denote the probability of firing, where v is the induced local field of the neuron.
We may then write:

x = +1 with probability P(v)
x = -1 with probability 1 - P(v)

with P(v) given by the sigmoid-shaped function

P(v) = 1/(1 + exp(-v/T)) ….(11.15)
where T is a pseudo-temperature which is used to control the noise level and therefore the uncertainty in firing. It is important to realise, however, that T is not the physical temperature of a neural network, be it a biological or an artificial neural network.
Rather, we should think of T merely as a parameter which controls the thermal fluctuations representing the effects of synaptic noise; when T → 0, the stochastic neuron described by Eq. (11.15) reduces to a noiseless (i.e., deterministic) form, namely the McCulloch-Pitts model.
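To illustrate Eq. (11.15), here is a small sketch of the stochastic two-state neuron; the random-number generator and the function name are implementation choices, not part of the model.

```python
import numpy as np

def stochastic_state(v, T, rng=None):
    """Stochastic McCulloch-Pitts neuron, Eq. (11.15): the state is +1 with
    probability P(v) = 1 / (1 + exp(-v / T)) and -1 otherwise.
    T is the pseudo-temperature controlling the synaptic-noise level;
    as T -> 0 the neuron reduces to the deterministic McCulloch-Pitts model."""
    rng = rng if rng is not None else np.random.default_rng()
    p_fire = 1.0 / (1.0 + np.exp(-v / T))
    return 1 if rng.random() < p_fire else -1
```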