Neural Networks - An Introduction

To quickly define neurons, neural networks, and the back propagation algorithm.
1. Introduction

A Neural Network (NN) is computer software (and possibly hardware) that simulates a simple model of neural cells in animals and humans. The purpose of this simulation is to acquire the intelligent features of these cells. In this document, when terms like neuron, neural network, learning, or experience are mentioned, it should be understood that we are using them only in the context of a NN as a computer system.

NNs have the ability to learn by example; e.g., a NN can be trained to recognize the image of a car by showing it many examples of a car. We will discuss neurons, NNs in general, and Back Propagation networks. Back Propagation networks are a popular type of network that can be trained to recognize different patterns, including images, signals, and text. This article does not try to prove the usefulness of NNs, explain when they should be used, or explain why they work. It is a high-level summary with emphasis on how Back Propagation networks work.

2. History
3. Sigmoid Function

The function

s(x) = 1 / (1 + e^{-a * x})

is called a Sigmoid function. The coefficient a is a real number constant. Usually in NN applications, a is chosen between 0.5 and 2. As a starting point, you could use a = 1 and modify it later when you are fine-tuning the network. Note that s(0) = 0.5, s(∞) = 1, and s(-∞) = 0 (the symbol ∞ means infinity). Think of the sigmoid function, in layman's terms, as a function that squashes large negative values toward 0 and large positive values toward 1. The Sigmoid function is used on the output of neurons, as will be explained next.

4. Neuron

In a NN context, a neuron is a model of a neural cell in animals and humans. This model is simplistic, but as it turned out, it is very practical. Think of the neuron as a program (or a class, if you like) that has one or more inputs and produces one output. The inputs simulate the stimuli/signals that a neuron gets, while the output simulates the response/signal which the neuron generates. The output is calculated by multiplying each input by a different number (called a weight), adding them all together, then scaling the total to a number between 0 and 1. A simple neuron has n inputs x_{1} … x_{n}, a weight w_{i} for each input x_{i}, and one output z.
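As a quick illustration, the sigmoid function above translates directly into code. A minimal Python sketch (the function name and test values are ours, not from the article):

```python
import math

def sigmoid(x, a=1.0):
    """s(x) = 1 / (1 + e^(-a*x)); the constant a is usually between 0.5 and 2."""
    return 1.0 / (1.0 + math.exp(-a * x))

print(sigmoid(0.0))    # 0.5
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```

Note how the output is always strictly between 0 and 1, which is exactly why it is applied to a neuron's output.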
In a more general fashion, for n inputs (∑ means "the sum of"):

d = ∑ x_{i} * w_{i}   ... for i = 1 to n

Let θ be a real number which we will call the Threshold. Experiments have shown that the best values for θ are between 0.25 and 1. Again, in a programmer's context, θ is just a variable of type float/real that is initialized to any number between 0.25 and 1.

z = s(d + θ)   ... apply the sigmoid to get a number between 0 and 1

This says that the output z is the result of applying the sigmoid function on (d + θ). In NN applications, the challenge is to find the right values for the weights and the threshold.

5. Neural Networks

A neural network is a group of neurons connected together. Connecting neurons to form a NN can be done in various ways, for example by feeding the output of each neuron in one layer into the inputs of the neurons in the next layer. One of the popular NNs is called the Back Propagation network, which will be discussed next.

6. Back Propagation Networks

A Back Propagation NN consists of three layers: an input layer, a hidden layer, and an output layer.
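The two formulas for a single neuron can be combined into a short function. A hedged sketch (the name `neuron_output` and the sample numbers are our own):

```python
import math

def sigmoid(x, a=1.0):
    return 1.0 / (1.0 + math.exp(-a * x))

def neuron_output(inputs, weights, theta):
    """Compute z = s(d + theta), where d is the weighted sum of the inputs."""
    d = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(d + theta)

# Two inputs, two weights, a threshold of 0.5:
z = neuron_output([1.0, 0.5], [0.8, 0.2], theta=0.5)
print(z)  # about 0.80 (d = 0.9, so z = s(1.4))
```

Whatever the inputs are, z always lands between 0 and 1 thanks to the sigmoid.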
For example, suppose we have a bank credit application with ten questions whose answers will determine the credit amount and the interest rate. To use a Back Propagation NN, the network will have ten neurons in the input layer and two neurons in the output layer.

6.1 Supervised Training

The Back Propagation NN works in two modes: a supervised training mode and a production mode. The training can be summarized as follows. Start by initializing the input weights for all neurons to some random numbers between 0 and 1, then:

1. Apply the inputs of a training example to the network.
2. Calculate the output.
3. Compare the calculated output with the desired (correct) output for that example; the difference is the error.
4. Modify the weights and thresholds of the neurons to reduce the error.
5. Repeat with the next training example, and keep iterating over the examples until the error becomes acceptably small.
The challenge is to find a good algorithm for updating the weights and thresholds in each iteration (step 4) to minimize the error. Changing the weights and thresholds of neurons in the output layer is done differently from the hidden layers. Note that for the input layer, weights remain constant at 1 for each input neuron weight. Before we explain the training, let's define the following:

- z: the output calculated by a neuron.
- y: the desired (correct) output for an output neuron.
- e: the error term calculated for a neuron.
- LR: the learning rate, a small positive constant that controls how much the weights change in each iteration.
6.2 Output Layer Training
For an output neuron with calculated output z and desired output y, the error term is:

e = z * (1 - z) * (y - z)

In other words, for each output neuron, calculate its error e, and then modify its threshold and weights. With LR denoting the learning rate, the standard update rules are:

w_{i} = w_{i} + LR * e * x_{i}   ... for each input weight w_{i}
θ = θ + LR * e

6.3 Hidden Layer Training

Consider a hidden layer neuron whose output z feeds r neurons in the following layer through weights m_{1} … m_{r}, where those neurons have error terms e_{1} … e_{r}:
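One training step for a single output neuron can be sketched as follows (the function name, the sample numbers, and the default learning rate `lr` are our assumptions, not prescribed by the article):

```python
def train_output_neuron(inputs, weights, theta, z, y, lr=0.5):
    """Update one output neuron given its calculated output z and desired output y."""
    e = z * (1 - z) * (y - z)                          # error term for an output neuron
    weights = [w + lr * e * x for w, x in zip(weights, inputs)]
    theta = theta + lr * e                             # threshold updated like a weight on a constant input of 1
    return weights, theta, e

# One neuron with a single input, output 0.6 where 1.0 was desired:
w, t, e = train_output_neuron([1.0], [0.5], 0.2, z=0.6, y=1.0)
print(e)  # 0.096 = 0.6 * 0.4 * 0.4
```

Because (y - z) keeps its sign, the weights and threshold move in whichever direction shrinks the error.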
For such a hidden neuron, first calculate:

g = ∑ m_{i} * e_{i}   ... for i = 1 to r

and then the hidden neuron's error term:

e = z * (1 - z) * g

Notice that in calculating g, we used the weights m_{i} and errors e_{i} from the following layer, which means that the errors and weights in this following layer must already have been calculated. This implies that during a training iteration of a Back Propagation NN, we start by modifying the weights at the output layer, and then we proceed backwards through the hidden layers one by one until we reach the input layer. It is this method of proceeding backwards that gives this network its name: Back Propagation.

7. Conclusion

NNs are being used in many businesses and applications. Their ability to learn by example is very attractive in environments where the business rules are either not well defined or are hard to enumerate and define. You may wonder about the formulas and the constant values: why did we do this or choose that? Those are good questions, but beyond the scope of this article. Check some of the resources for more details.

8. Resources
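The backward order of the whole procedure can be seen in a tiny end-to-end sketch: a 2-2-1 network trained on a single example. Everything here (network size, random seed, learning rate, iteration count) is an arbitrary choice of ours for illustration:

```python
import math
import random

def s(x, a=1.0):
    return 1.0 / (1.0 + math.exp(-a * x))

random.seed(0)
lr = 0.5

# A tiny 2-2-1 network: 2 inputs, 2 hidden neurons, 1 output neuron.
hw = [[random.random(), random.random()] for _ in range(2)]  # hidden weights
ht = [random.random(), random.random()]                      # hidden thresholds
ow = [random.random(), random.random()]                      # output weights
ot = random.random()                                         # output threshold

x, y = [1.0, 0.0], 1.0                                       # one training pair

for _ in range(1000):
    # Forward pass: hidden layer first, then the output layer.
    h = [s(sum(xi * wi for xi, wi in zip(x, hw[j])) + ht[j]) for j in range(2)]
    z = s(sum(hj * wj for hj, wj in zip(h, ow)) + ot)

    # Backward pass: the output layer's error first...
    e_out = z * (1 - z) * (y - z)
    # ...then each hidden neuron's error e = z*(1-z)*g, where g uses the
    # (pre-update) weights m_i and error e_i of the following layer.
    e_hid = [h[j] * (1 - h[j]) * (ow[j] * e_out) for j in range(2)]

    # Update the output neuron, then the hidden neurons.
    ow = [w + lr * e_out * hj for w, hj in zip(ow, h)]
    ot += lr * e_out
    for j in range(2):
        hw[j] = [w + lr * e_hid[j] * xi for w, xi in zip(hw[j], x)]
        ht[j] += lr * e_hid[j]

print(z)  # after many iterations, z is close to the desired output y = 1.0
```

The key point mirrors the text: within each iteration, the output layer's error is computed before the hidden layer's, because the hidden error depends on it.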
Acknowledgement: Doug Estep provided valuable feedback reviewing this article.