The artificial neuron transfer function should not be confused with a linear system's transfer function.
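For a given artificial neuron $k$, let there be $m + 1$ inputs with signals $x_0$ through $x_m$ and weights $w_{k0}$ through $w_{km}$. Usually, the $x_0$ input is assigned the value $+1$, which makes it a bias input with $w_{k0} = b_k$. This leaves only $m$ actual inputs to the neuron: from $x_1$ to $x_m$.
The output of the $k$-th neuron is:
$$y_k = \varphi\left(\sum_{j=0}^{m} w_{kj} x_j\right)$$
where $\varphi$ (phi) is the transfer function.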
The output is analogous to the axon of a biological neuron, and its value propagates to the input of the next layer through a synapse. It may also exit the system, possibly as part of an output vector.
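The neuron output described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `neuron_output`, the convention that `weights[0]` holds the bias weight paired with a constant input of +1, and the numeric values are all assumptions made for the example.

```python
def neuron_output(weights, inputs, phi):
    """Return phi(sum_j w_j * x_j) for a single neuron.

    weights[0] is treated as the bias weight (an assumption of this
    sketch); a constant +1 is prepended to the inputs so the bias is
    handled like any other weight.
    """
    x = [1.0] + list(inputs)  # x_0 = +1 is the bias input
    if len(x) != len(weights):
        raise ValueError("expected one weight per input plus a bias weight")
    u = sum(w * xj for w, xj in zip(weights, x))  # weighted sum
    return phi(u)


# Hypothetical values: bias 0.5, two actual inputs, identity transfer function.
y = neuron_output([0.5, 2.0, -1.0], [1.0, 3.0], lambda u: u)
```

Passing the transfer function as a parameter keeps the weighted sum separate from the choice of transfer function, which is how the rest of this article treats them.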
The first artificial neuron was the Threshold Logic Unit, first proposed by Warren McCulloch and Walter Pitts in 1943. As a transfer function, it employed a threshold, equivalent to using the Heaviside step function. Initially, only a simple model was considered, with binary inputs and outputs, some restrictions on the possible weights, and a more flexible threshold value. From the outset it was noted that any Boolean function could be implemented by networks of such devices, which is easily seen from the fact that one can implement the AND and OR functions and use them in the disjunctive or the conjunctive normal form.
Researchers also soon realized that cyclic networks, with feedback through neurons, could define dynamical systems with memory, but most research concentrated (and still does) on strictly feed-forward networks because they are less difficult to work with.
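As an illustration of the Boolean-function claim, the sketch below builds AND and OR from threshold units and combines them in disjunctive normal form to obtain XOR. The weights and thresholds are hypothetical choices, and the `NOT` unit (a single negative weight) is an assumption of this sketch, since the text above only names AND and OR.

```python
def tlu(weights, threshold, inputs):
    """Return 1 when the weighted sum of binary inputs exceeds the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


def AND(a, b):
    # Both inputs must be 1 for the sum (2) to exceed 1.5.
    return tlu([1, 1], 1.5, [a, b])


def OR(a, b):
    # A single active input (sum 1) already exceeds 0.5.
    return tlu([1, 1], 0.5, [a, b])


def NOT(a):
    # Negation through a negative weight; an assumption of this sketch.
    return tlu([-1], -0.5, [a])


def XOR(a, b):
    # Disjunctive normal form: (a AND NOT b) OR (NOT a AND b)
    return OR(AND(a, NOT(b)), AND(NOT(a), b))
```

XOR cannot be computed by a single threshold unit, so it is a convenient example of a function that needs a small network of them.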
One important and pioneering artificial neural network that used the linear threshold function was the perceptron, developed by Frank Rosenblatt. This model already considered more flexible weight values in the neurons, and was used in machines with adaptive capabilities. The representation of the threshold values as a bias term was introduced by Widrow in 1960.
In the late 1980s, when research on neural networks regained strength, neurons with more continuous shapes started to be considered. A differentiable activation function allows the direct use of gradient descent and other optimization algorithms to adjust the weights. Neural networks also started to be used as a general function approximation model.
The transfer function of a neuron is chosen to have a number of properties which either enhance or simplify the network containing the neuron. Crucially, for instance, any multilayer perceptron using a linear transfer function has an equivalent single-layer network; a non-linear function is therefore necessary to gain the advantages of a multi-layer network.
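The collapse of a linear multilayer network into a single layer can be checked numerically. The sketch below is an illustration under simplifying assumptions: the weight matrices `W1` and `W2` are made-up values, bias terms are omitted, and the helper functions `matvec` and `matmul` are written out by hand so the example has no dependencies.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


# Hypothetical weights: 2 inputs -> 3 hidden linear neurons -> 2 outputs.
W1 = [[1, 2], [0, -1], [3, 1]]
W2 = [[1, 0, 2], [-1, 1, 0]]
x = [4, 5]

two_layer = matvec(W2, matvec(W1, x))     # apply the layers one after another
single_layer = matvec(matmul(W2, W1), x)  # apply the single combined matrix
```

Both computations give the same vector because applying two linear maps in sequence is itself a linear map, represented by the product of the two matrices.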
Below, $u$ refers in all cases to the weighted sum of all the inputs to the neuron, i.e. for $n$ inputs,
$$u = \sum_{i=1}^{n} w_i x_i$$
where $w$ is a vector of synaptic weights and $x$ is a vector of inputs.
The output $y$ of the step transfer function is binary, depending on whether the input meets a specified threshold, $\theta$. The "signal" is sent, i.e. the output is set to one, if the activation meets the threshold:
$$y = \begin{cases} 1 & \text{if } u \ge \theta \\ 0 & \text{if } u < \theta \end{cases}$$
This function is used in perceptrons and often shows up in many other models. It divides the space of inputs by a hyperplane. It is especially useful in the last layer of a network intended to perform binary classification of the inputs. It can be approximated by other sigmoidal functions by assigning large values to the weights.
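The step function and its approximation by a steep sigmoid can be sketched as follows. The comparison `u >= theta` reflects the wording "meets the threshold"; the `gain` parameter, which stands in for "large values of the weights", and its default of 50 are assumptions of this example.

```python
import math


def step(u, theta=0.0):
    """Binary output: 1 when the activation u meets the threshold theta."""
    return 1 if u >= theta else 0


def logistic(u):
    """Standard logistic function 1 / (1 + e^(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def steep_logistic(u, theta=0.0, gain=50.0):
    """Logistic with its input scaled by a large gain.

    As the gain grows, the curve approaches the step function everywhere
    except exactly at u == theta, where it stays at 0.5.
    """
    return logistic(gain * (u - theta))
```

For inputs a moderate distance from the threshold, `steep_logistic` is already very close to 0 or 1, which is the sense in which large weights turn a sigmoid into a step.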
With a linear combination as the transfer function, the output unit is simply the weighted sum of its inputs plus a bias term. A number of such linear neurons perform a linear transformation of the input vector. This is usually more useful in the first layers of a network. A number of analysis tools exist based on linear models, such as harmonic analysis, and they can all be used in neural networks with this linear neuron. The bias term makes it possible to apply affine transformations to the data.
See: Linear Transformation, Harmonic Analysis, Linear Filter, Wavelets, Principal Component Analysis, Independent Component Analysis, Deconvolution.
A fairly simple non-linear function, the logistic function also has an easily calculated derivative, which can be important when calculating the weight updates in the network. It thus makes the network more easily manipulable mathematically, and was attractive to early computer scientists who needed to minimize the computational load of their simulations. It is commonly seen in multilayer perceptrons using a backpropagation algorithm.
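The "easily calculated derivative" refers to the identity σ′(u) = σ(u)(1 − σ(u)) for the logistic function σ(u) = 1 / (1 + e^(−u)): once the output is known, the derivative costs one extra multiplication. The sketch below illustrates this; the function names are chosen for the example.

```python
import math


def logistic(u):
    """Logistic function 1 / (1 + e^(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def logistic_derivative(u):
    """Derivative of the logistic function, reusing its own output."""
    s = logistic(u)
    return s * (1.0 - s)


def numerical_derivative(f, u, h=1e-6):
    """Central-difference estimate, used here only as a sanity check."""
    return (f(u + h) - f(u - h)) / (2.0 * h)
```

In a backpropagation pass the value `s` is already available from the forward computation, so no additional exponential needs to be evaluated.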
See: Sigmoid function
class TLU defined as:
    data member threshold : number
    data member weights : list of numbers of size X

    function member fire(inputs : list of booleans of size X) : boolean defined as:
        variable T : number
        T ← 0
        for each i in 1 to X :
            if inputs(i) is true :
                T ← T + weights(i)
            end if
        end for each
        if T > threshold :
            return true
        else:
            return false
        end if
    end function
end class
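The pseudocode above translates almost line for line into Python. The version below is one possible rendering, not a canonical one: the type hints and the length check are additions for the example, while the strict comparison `total > threshold` is kept from the pseudocode.

```python
from typing import Sequence


class TLU:
    """Threshold logic unit with boolean inputs and a boolean output."""

    def __init__(self, threshold: float, weights: Sequence[float]) -> None:
        self.threshold = threshold
        self.weights = list(weights)

    def fire(self, inputs: Sequence[bool]) -> bool:
        # Length check added for this sketch; the pseudocode assumes size X.
        if len(inputs) != len(self.weights):
            raise ValueError("inputs and weights must have the same length")
        # Sum the weights of the active inputs only.
        total = sum(w for w, active in zip(self.weights, inputs) if active)
        return total > self.threshold


# Hypothetical unit: two inputs with unit weights and threshold 1.5 (an AND gate).
and_gate = TLU(threshold=1.5, weights=[1.0, 1.0])
```

Calling `and_gate.fire([True, True])` returns `True`, and any other combination of two boolean inputs returns `False`.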
| Threshold (TH) | Learning rate (LR) | Sensor value X1 | Sensor value X2 | Desired output (Z) | Initial weight w1 | Initial weight w2 | Calculated C1 | Calculated C2 | Sum (S) | Network output (N) | Error (E) | Correction (R) | Final weight W1 | Final weight W2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | | | | | | | X1 x w1 | X2 x w2 | C1+C2 | IF(S>TH,1,0) | Z-N | LR x E | R+w1 | R+w2 |
| 0.5 | 0.2 | 0 | 0 | 0 | 0.1 | 0.3 | 0 | 0 | 0 | 0 | 0 | 0 | 0.1 | 0.3 |
| 0.5 | 0.2 | 0 | 1 | 1 | 0.1 | 0.3 | 0 | 0.3 | 0.3 | 0 | 1 | 0.2 | 0.3 | 0.5 |
| 0.5 | 0.2 | 1 | 0 | 1 | 0.3 | 0.5 | 0.3 | 0 | 0.3 | 0 | 1 | 0.2 | 0.5 | 0.7 |
| 0.5 | 0.2 | 1 | 1 | 1 | 0.5 | 0.7 | 0.5 | 0.7 | 1.2 | 1 | 0 | 0 | 0.5 | 0.7 |
| 0.5 | 0.2 | 0 | 0 | 0 | 0.5 | 0.7 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5 | 0.7 |
| 0.5 | 0.2 | 0 | 1 | 1 | 0.5 | 0.7 | 0 | 0.7 | 0.7 | 1 | 0 | 0 | 0.5 | 0.7 |
| 0.5 | 0.2 | 1 | 0 | 1 | 0.5 | 0.7 | 0.5 | 0 | 0.5 | 0 | 1 | 0.2 | 0.7 | 0.9 |
| 0.5 | 0.2 | 1 | 1 | 1 | 0.7 | 0.9 | 0.7 | 0.9 | 1.6 | 1 | 0 | 0 | 0.7 | 0.9 |
| 0.5 | 0.2 | 0 | 0 | 0 | 0.7 | 0.9 | 0 | 0 | 0 | 0 | 0 | 0 | 0.7 | 0.9 |
| 0.5 | 0.2 | 0 | 1 | 1 | 0.7 | 0.9 | 0 | 0.9 | 0.9 | 1 | 0 | 0 | 0.7 | 0.9 |
| 0.5 | 0.2 | 1 | 0 | 1 | 0.7 | 0.9 | 0.7 | 0 | 0.7 | 1 | 0 | 0 | 0.7 | 0.9 |
| 0.5 | 0.2 | 1 | 1 | 1 | 0.7 | 0.9 | 0.7 | 0.9 | 1.6 | 1 | 0 | 0 | 0.7 | 0.9 |
Supervised neural network training for an OR gate.
Note: Initial weight equals final weight of previous iteration.
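The table can be reproduced with a short training loop that applies the column formulas row by row. This is a sketch under stated assumptions: the function name `train_or_gate` and the `history` list are introduced for the example, and `fractions.Fraction` is used instead of floats so that the comparison S > TH is exact when S equals the threshold (as it does in the seventh data row). The update rule is the one given in the table, where the same correction R is added to both weights.

```python
from fractions import Fraction


def train_or_gate(threshold, learning_rate, w1, w2, epochs):
    """Apply the table's update rule over repeated passes through the OR data."""
    samples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]  # (X1, X2, Z)
    history = []
    for _ in range(epochs):
        for x1, x2, z in samples:
            s = x1 * w1 + x2 * w2        # S = C1 + C2
            n = 1 if s > threshold else 0  # N = IF(S>TH,1,0)
            e = z - n                    # E = Z - N
            r = learning_rate * e        # R = LR x E
            w1, w2 = w1 + r, w2 + r      # W1 = R + w1, W2 = R + w2
            history.append((x1, x2, z, n, e, w1, w2))
    return w1, w2, history


# Starting values taken from the first row of the table; three passes
# correspond to its twelve data rows.
w1, w2, history = train_or_gate(
    Fraction("0.5"), Fraction("0.2"), Fraction("0.1"), Fraction("0.3"), epochs=3
)
```

After the three passes the weights are 7/10 and 9/10, and every row of the final pass has zero error, matching the last four rows of the table.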