
Thursday, 30 May 2019

A journey from a simple Perceptron (Artificial Neuron) to complex Neural Networks

A perceptron is an artificial neuron and the fundamental unit of a neural network in deep learning. It is also called a single-layer neural network or a single-layer binary linear classifier.

A perceptron takes inputs, which can be real-valued or boolean, assigns (initially random) weights to the inputs along with a bias, takes their weighted sum, passes it through a threshold function that decides whether to fire depending upon some threshold value, and thereby performs linear binary classification. This threshold function is usually a step function.

The mathematical representation of a perceptron looks like an if-else condition: if the weighted sum of the inputs is greater than a threshold value, the output is 1, else the output is 0.
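
To make this concrete, here is a minimal sketch in Python. The weights, bias and inputs are illustrative values picked by hand, not part of any standard API:

# Minimal perceptron sketch: weighted sum followed by a step (threshold) function.
# The names and numbers here are illustrative, not tied to any library.

def step(z, threshold=0.0):
    # Fire (1) if the weighted sum crosses the threshold, else 0.
    return 1 if z > threshold else 0

def perceptron(inputs, weights, bias):
    # Weighted sum of inputs plus bias, passed through the step function.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)

# Example: two boolean inputs with hand-picked weights.
print(perceptron([1, 0], weights=[0.6, 0.6], bias=-0.5))  # 1
print(perceptron([0, 0], weights=[0.6, 0.6], bias=-0.5))  # 0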

Weights

The accuracy of the model depends mainly on assigning the right weights, and finding those weights is exactly what gradient descent does during back-propagation.

Let's understand weights in layman's terms:

Consider a movie. Whether a person will go to see it depends upon different factors (features) like genre (comedy, horror, romance, etc.), actor, director and so on. Some people give more weight to the genre and less to the actor and director; others give more weight to the actor and less to the genre.

Consider a cell phone. Generally, the relationship between the price of a phone and the likelihood of buying it is inverse (except for a few fanboys). Someone who is an iPhone fan will be more likely to buy the next version of the phone irrespective of its price, whereas an ordinary consumer may give more importance to budget offerings from other brands.

The point here is that not all inputs have equal importance in the decision making, and the weights for these features depend on the data and the task at hand.

Applications of Perceptrons

Perceptrons can be used to solve any problem with a linearly separable set of inputs. For example, we can implement logic gates like OR and AND because their outputs are linearly separable.
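
As a rough sketch, the classic perceptron learning rule can find suitable weights for the AND gate. The learning rate and number of epochs below are arbitrary illustrative choices:

# Sketch of the perceptron learning rule on the AND gate (linearly separable).
# Learning rate and number of epochs are arbitrary illustrative choices.

data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]  # AND truth table
w = [0.0, 0.0]
b = 0.0
lr = 0.1

for epoch in range(20):
    for x, target in data:
        z = w[0] * x[0] + w[1] * x[1] + b
        y = 1 if z > 0 else 0
        error = target - y
        # Perceptron update rule: nudge weights towards the correct output.
        w[0] += lr * error * x[0]
        w[1] += lr * error * x[1]
        b += lr * error

# After training, the learned weights should reproduce all four AND outputs.
print([(x, 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0) for x, _ in data])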

Limitation of Perceptrons

A perceptron can only learn linearly separable functions; it cannot handle non-linearly separable inputs. For example, it cannot implement the XOR gate, because XOR's outputs cannot be split by a single linear separator.

Neural Network

To address the above limitation of perceptrons, we need a multi-layer perceptron, also known as a feed-forward neural network. A neural network is a composition of perceptrons, connected in different ways and operating on different activation functions.

1. All the layers (input layer, hidden layers and output layer) are interconnected.

2. Each input is multiplied by a weight, a bias is added per neuron, and the result is passed to the activation function.

3. First forward propagate the weighted sums, calculate the error, backward propagate it and update the weights using gradient descent, and keep repeating until a satisfactory result is achieved.
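
To see why stacking perceptrons helps, here is a small sketch of a two-layer network computing XOR. The weights are hand-picked for illustration rather than learned; the training loop at the end of this post shows how such weights can be learned with back-propagation:

# Sketch of a tiny feed-forward network with one hidden layer computing XOR.
# Weights are hand-picked for illustration (not learned via back-propagation).

def step(z):
    return 1 if z > 0 else 0

def neuron(inputs, weights, bias):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_network(x1, x2):
    h1 = neuron([x1, x2], [1, 1], -0.5)      # hidden unit behaving like OR
    h2 = neuron([x1, x2], [1, 1], -1.5)      # hidden unit behaving like AND
    return neuron([h1, h2], [1, -2], -0.5)   # output: OR and not AND, i.e. XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_network(a, b))  # prints the XOR truth table: 0, 1, 1, 0

Each hidden unit carves out one linear boundary, and the output neuron combines them, which is exactly what a single perceptron cannot do.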

Types of Neural Networks

1. Feed Forward Neural Network: This is the simplest neural network. Data flows only in the forward direction, from the input layer through the hidden layer to the output layer. It may or may not have a hidden layer, and it contains at most one. All nodes are fully connected. The back-propagation method is used to train these kinds of networks.

2. Deep Feed Forward Neural Network: Same as a feed forward neural network, except that it has many more hidden layers. The back-propagation method is used to train these kinds of networks.

3. Radial Basis Function Neural Network: RBF neural networks are a type of feed forward neural network that uses a radial basis function as the activation function instead of the logistic function. Rather than mapping a weighted sum to a value between 0 and 1 (as the logistic function does), a radial basis function considers the distance of a point from a centre (see the small sketch below).

4. CNN (Convolutional Neural Network)

5. Capsule Neural Networks

6. RNN (Recurrent Neural Network)

7. LSTM (Long Short-Term Memory Networks)

8. Autoencoders

For more types of neural networks, please visit this article.
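
Coming back to point 3 above, here is a minimal sketch of a Gaussian radial basis function in Python; the centre and width are arbitrary illustrative values:

import math

def rbf(x, centre, width=1.0):
    # Gaussian radial basis function: the output depends on the distance of x
    # from the centre, not on a weighted sum crossing a threshold.
    distance_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-distance_sq / (2 * width ** 2))

print(rbf([0.0, 0.0], centre=[0.0, 0.0]))  # 1.0 at the centre
print(rbf([2.0, 0.0], centre=[0.0, 0.0]))  # ~0.135 farther away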

Hidden Layers

The hidden layer is where the network stores its internal abstract representation of the training data, similar to the way the human brain has an internal representation of the real world.

Feature extraction happens at the hidden layers. Increasing the number of hidden layers can improve accuracy, but it should be noted that increasing it beyond a certain point may lead the model to overfit.

Comparison of Deep and Shallow Neural Networks

Shallow neural networks have only one hidden layer as opposed to deep neural networks which have several hidden layers.

Advantages of Deep Neural Networks

1. Deep neural networks are better at learning and extracting features at various levels of abstraction than shallow neural networks.

2. Deep neural networks have better generalization capabilities.

Disadvantages of Deep Neural Networks

1. Vanishing Gradients: As we add more and more hidden layers, back-propagation becomes less and less effective at passing information to the lower layers. As the error is passed back, the gradients begin to vanish and become small relative to the weights of the network (a small sketch follows this list).

2. Overfitting: As we keep adding more layers to a neural network, the chances of overfitting increase. So we should keep the number of hidden layers reasonable.

3. Computational Complexity: As we keep adding more layers to a neural network, the computational complexity increases. So, again, we should keep the number of hidden layers reasonable.
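
To illustrate point 1 (vanishing gradients), here is a small sketch: the derivative of the sigmoid is at most 0.25, so a gradient that picks up one such factor per layer shrinks rapidly as layers are added. The depth and the choice of z = 0 below are illustrative:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)   # never larger than 0.25

# Back-propagated gradients pick up one derivative factor per layer.
# With sigmoid activations each factor is at most 0.25, so the product
# shrinks quickly with depth (weights taken as 1.0 for simplicity).
gradient = 1.0
for layer in range(10):
    gradient *= sigmoid_derivative(0.0)   # 0.25 at z = 0
print(gradient)  # 0.25 ** 10, roughly 9.5e-07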

Activation Functions

Activation functions (like sigmoid, hyperbolic tangent, threshold and ReLU) are what make a neural network exhibit non-linear behaviour; without them, the network would still be linear. For more details on activation functions, you can look at this post of mine.
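
For reference, here is a minimal sketch of these activation functions in Python (plain, unoptimised definitions rather than any library's implementation):

import math

# Common activation functions; each one introduces non-linearity, whereas
# stacking layers without them would stay purely linear.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))   # squashes to (0, 1)

def tanh(z):
    return math.tanh(z)                 # squashes to (-1, 1)

def relu(z):
    return max(0.0, z)                  # 0 for negatives, identity otherwise

def threshold(z):
    return 1.0 if z > 0 else 0.0        # hard step used by the perceptron

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z), threshold(z))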

Training Perceptrons using Back Propagation

The most common deep learning algorithm for supervised training of multi-layer perceptrons is back-propagation. The basic steps are as follows:

1. A training sample is presented and propagated forward through the network.

2. The output error is calculated, typically the mean squared error or root mean square error.

3. Weights are updated using Gradient Descent algorithm.

4. The above steps are repeated until a satisfactory result or accuracy is obtained.
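
Putting the four steps together, here is a hedged sketch of back-propagation on a tiny 2-4-1 network learning XOR with NumPy. The layer sizes, learning rate, epoch count and random seed are arbitrary illustrative choices, and convergence is typical rather than guaranteed:

import numpy as np

# Sketch of the four steps above: a tiny 2-4-1 network learning XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(10000):
    # 1. Forward propagate the training samples through the network.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # 2. Calculate the output error (here, the mean squared error).
    error = out - y

    # 3. Backward propagate: gradients of the error w.r.t. each weight.
    d_out = error * out * (1 - out)       # sigmoid derivative at the output
    d_h = (d_out @ W2.T) * h * (1 - h)    # sigmoid derivative at the hidden layer

    # 4. Update the weights with gradient descent and repeat.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(2))  # typically ends up close to [[0], [1], [1], [0]]

Note that the out * (1 - out) and h * (1 - h) terms come from the sigmoid's derivative; swapping in a different activation function would change those lines.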
