Feedforward Neural Network Explained

Feb 14, 2022 · 5 min read

- By Lakshay Wadhwa, CoffeeBeans Consulting

Feedforward neural networks are the most generic form of neural network: information flows in one direction only, from the inputs to the outputs, with no loops.

Let’s start with the perceptron.

The perceptron is a single-layer neural network and a linear (binary) classifier used in supervised learning. A neural network works the same way a perceptron does, so if one wants to understand how a neural network works, one needs to understand how a perceptron works.

It has 4 parts:

  1. Input Layer
  2. Weights and Bias
  3. Weighted Sum
  4. Activation Function

The weighted sum is calculated from the inputs and their weights, the bias is added to it, and the result goes through a step function or an activation function to produce the output. The main objective of the single-layer perceptron model is to classify linearly separable objects with binary outcomes.
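To make this concrete, here is a minimal sketch of a perceptron in Python; the weights, bias, and AND-gate example are illustrative assumptions, not values from the text.

```python
import numpy as np

def perceptron_output(x, w, b):
    """Weighted sum of inputs plus bias, passed through a step function."""
    weighted_sum = np.dot(w, x) + b
    return 1 if weighted_sum > 0 else 0

# Illustrative weights and bias that happen to implement a logical AND gate
w = np.array([0.5, 0.5])
b = -0.7
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(np.array(x), w, b))  # only (1, 1) outputs 1
```

The AND gate is a classic example of a linearly separable problem, which is exactly the kind of task a single perceptron can handle.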

Why do we need neural networks?

A multi-layer perceptron is called a neural network. A neural network can be thought of as a function that approximates the relationship between given inputs and the output. In supervised learning, we try to learn the function that maps inputs to the output, and later we use that function to predict the output for new inputs. If the relationship between input and output is linear, then we could simply use linear regression; but where input and output have a complex, non-linear relationship, linear regression will not be very useful.

A typical neural network

A feedforward neural network, or a simple neural network, is a collection of neurons arranged in layers, and each of these layers has a specific name. The first layer is called the input layer; it holds the features of the input data, together called the input vector. The intermediate layers are called hidden layers. We can have multiple hidden layers, and if the number of hidden layers is greater than 1, we call it a deep network.

A deep network is a network with a number of hidden layers greater than 1.

The final layer is called the output layer where we get the output.

y1, …, yk are the outputs corresponding to each defined class.

n doesn’t need to be the same as k, and each layer can have a different size.

Each element in the network is an artificial neuron. Every neuron in the first hidden layer receives inputs from all the entries in the input layer.

Each input to a neuron has an associated weight; the inputs are first multiplied by their weights, and a bias term is added, which together form the weighted sum. The weighted sum then goes through the activation function. Networks where every neuron in one layer is connected to every neuron in the next are known as fully connected networks.
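Since each layer is just weighted sums plus biases followed by an activation, a fully connected layer can be sketched as a single matrix product; the shapes and the ReLU activation below are assumptions for illustration.

```python
import numpy as np

def dense_layer(x, W, b, activation):
    """Fully connected layer: every neuron receives every input."""
    return activation(W @ x + b)  # weighted sums plus biases, then activation

relu = lambda z: np.maximum(0.0, z)

x = np.array([0.5, -1.2, 3.0])  # 3 input features (illustrative values)
W = np.random.randn(4, 3)       # one row of weights per hidden neuron
b = np.random.randn(4)          # one bias per neuron
h = dense_layer(x, W, b, relu)
print(h.shape)                  # (4,) -- one activation per hidden neuron
```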

Why do we need the activation function?

A neural network without an activation function is just a linear regression model. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
Sigmoid and softmax are two of the many activation functions available and are used frequently.

softmax function for multi-class classification
sigmoid function for binary classification
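Both are standard definitions; here is a minimal NumPy sketch of the two.

```python
import numpy as np

def sigmoid(z):
    """Maps any real number into (0, 1); suits binary classification."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Maps a vector of scores to probabilities that sum to 1; suits multi-class."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

print(sigmoid(0.0))                        # 0.5
print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.66, 0.24, 0.10]
```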

Importance of non-linearity

When we combine two linear objects, we simply get another linear formation; the proportions don’t change anything in terms of linearity. But when we combine two non-linear curves, we get a more complex non-linear curve.
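To see why linear combinations stay linear, note that stacking layers composes functions. Composing two generic linear functions f(x) = a*x + b and g(x) = c*x + d gives

g(f(x)) = c*(a*x + b) + d = (c*a)*x + (c*b + d)

which is still of the form slope*x + intercept, so no amount of stacking linear layers escapes linearity without a non-linear activation in between.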

We introduce non-linearity in each layer through activation functions. Hence, to capture a very complex relationship between inputs and output, we need a more complex architecture with more hidden layers and more neurons; a less complex problem can be solved with a few hidden layers and fewer neurons.

This is how a neural network computes the data, in the following simple steps:

  1. Multiply the weights and the inputs:

(x1*w1), (x2*w2), … , (xn*wn)

  2. Add the bias to form the weighted sum (one bias term per neuron):

weighted_sum = (x1*w1) + (x2*w2) + … + (xn*wn) + b

  3. Output:
Finally, the weighted sum is converted into the output by feeding it into the activation function.

Let’s see one example:

  1. Given the inputs and weights, calculate the weighted sum.

  2. Add the bias to the weighted sum for each neuron, then apply the activation function.

  3. The output is the weighted sum of the hidden-layer activations and the hidden-to-output weights, with the bias added and the activation function applied to it.
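Here is a minimal numeric sketch of these three steps for a tiny network with 2 inputs, 2 hidden neurons, and 1 output; all input, weight, and bias values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x  = np.array([1.0, 0.5])   # inputs (illustrative)
W1 = np.array([[0.2, 0.8],  # weights from the inputs to the 2 hidden neurons
               [0.4, 0.6]])
b1 = np.array([0.1, 0.1])   # one bias per hidden neuron
W2 = np.array([[0.5, 0.9]]) # weights from the hidden layer to the output
b2 = np.array([0.2])        # output bias

h = sigmoid(W1 @ x + b1)    # steps 1 and 2, then the activation (hidden layer)
y = sigmoid(W2 @ h + b2)    # step 3: the same computation at the output layer
print(y)                    # ~[0.76]
```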

Optimization

Gradient descent is one of the optimization techniques used for feedforward neural networks. The gradient measures how much the error changes when the weights change, and gradient descent uses it to calculate updated weights that reduce the error. The gradient can also be seen as the slope of the loss function: the steeper the slope, the faster the model can learn.
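As a minimal sketch, here is gradient descent on a one-dimensional loss; the function and the learning rate are illustrative assumptions.

```python
def loss(w):
    return (w - 3.0) ** 2   # a simple loss with its minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)  # the slope of the loss at w

w, lr = 0.0, 0.1            # start far from the minimum (illustrative values)
for step in range(100):
    w -= lr * grad(w)       # step against the slope to reduce the loss
print(round(w, 4))          # approaches 3.0
```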

Backpropagation

The predicted value of the network is compared to the expected value, and an error is calculated. This error is then backpropagated through the network one layer at a time, and the weights are updated along the way. This process is generally done for all the data points in the training data set; one round of updating the network over the entire dataset is called an epoch, and one can train the network for as many epochs as needed to reduce the loss.
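As a self-contained sketch of this loop, here is a single sigmoid neuron trained with backpropagation on a toy AND dataset; the data, learning rate, and epoch count are all assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # toy inputs
y = np.array([0., 0., 0., 1.])                          # AND targets

rng = np.random.default_rng(0)
w, b, lr = rng.normal(size=2), 0.0, 0.5

for epoch in range(1000):                     # one epoch = one pass over the data
    for xi, yi in zip(X, y):
        pred = sigmoid(w @ xi + b)            # forward pass: predicted value
        error = pred - yi                     # compare to the expected value
        grad_z = error * pred * (1.0 - pred)  # backpropagate through the sigmoid
        w -= lr * grad_z * xi                 # update the weights
        b -= lr * grad_z                      # update the bias

print(np.round(sigmoid(X @ w + b), 2))        # close to [0, 0, 0, 1]
```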

Thank you for being here. I will upload more blogs on related topics soon and will cover optimization and backpropagation in more depth in separate posts. If you find any mistake, please feel free to mention it and the blog will be corrected.
