Saturday, 25 May 2019

Basic introduction of RNN (Recurrent Neural Network) in Deep Learning

RNN stands for Recurrent Neural Network. It is a type of neural network that contains memory and is best suited to sequential data. RNNs are used by Apple's Siri and Google's Voice Search. Let's discuss some basic concepts of RNNs:

Best suited for sequential data 

RNN is best suited for sequential data. It can handle arbitrary input / output lengths. RNN uses its internal memory to process arbitrary sequences of inputs. 

This makes RNNs well suited to predicting what comes next in a sequence of words. As in human conversation, more weight is given to recent information when anticipating the rest of a sentence.

An RNN that is trained to translate text might learn that "dog" should be translated differently if preceded by the word "hot".

RNN has internal memory

An RNN has memory capabilities: it remembers previous data. While making a decision, it takes into consideration the current input and also what it has learned from the inputs it received previously. The output from the previous step is fed as input to the current step, creating a feedback loop.

So, it calculates its current state from the current input and the previous state. In this way, the information cycles through a loop.

In a nutshell, we can say that an RNN has two inputs: the present and the recent past. This is important because the sequence of data contains crucial information about what is coming next, which is why an RNN can do things many other algorithms can’t.
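The feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a production RNN: it uses a single scalar state, a tanh activation, and made-up weights (`w_x`, `w_h`, `b` are assumptions chosen only for the example).

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One step of a scalar RNN: the new state depends on the
    current input x_t and the previous state h_prev."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the loop, carrying the state forward."""
    h = 0.0  # initial state: nothing remembered yet
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states are still non-zero:
# the state carries a fading memory of the earlier input.
states_with_signal = run_rnn([1.0, 0.0, 0.0])
states_all_zero = run_rnn([0.0, 0.0, 0.0])
print(states_with_signal)
print(states_all_zero)
```

Comparing the two runs shows the "two inputs" idea: at the second and third steps the current input is identical (zero) in both sequences, and only the remembered state differs.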

Types of RNN

1. One to One: It maps one input to one output. It is also known as a Vanilla Neural Network and is used to solve regular machine learning problems.

2. One to Many: It maps one input to many outputs. Example: Image Captioning. An image is fed into the RNN system, which produces a caption by considering the various objects in the image.

Caption: "A dog catching a ball in mid air"

3. Many to One: It maps a sequence of inputs to one output. Example: Sentiment Analysis. A sequence of words is provided as input, and the RNN decides whether the sentiment is positive or negative.

4. Many to Many: It maps a sequence of inputs to a sequence of outputs. Example: Machine Translation. A sentence in one language is translated into another language.

Forward and Backward Propagation

Forward Propagation: We do forward propagation to get the output of the model, compare it with the expected output, and compute the error.

Backward Propagation: Once forward propagation is complete and the error has been calculated, the error is back-propagated through the network to update the weights.

We go backward through the neural network to find the partial derivatives of the error (loss function) with respect to the weights. Each partial derivative is multiplied by the learning rate to get the step size. This step size is subtracted from the original weight to get the new weight, so that the weights move in the direction that reduces the error. That is how a neural network learns during the training process.
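The weight update described above can be sketched as follows. It assumes the standard gradient-descent convention, in which the step is taken against the gradient so that the error goes down. The one-weight loss function and the learning rate are toy choices made only for this illustration.

```python
def loss(w):
    """Toy error function with its minimum at w = 3."""
    return (w - 3.0) ** 2

def loss_gradient(w):
    """Derivative of the toy error with respect to the weight."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, steps=50):
    for _ in range(steps):
        step_size = learning_rate * loss_gradient(w)
        w = w - step_size  # move against the gradient to reduce the error
    return w

w_start = 0.0
w_final = gradient_descent(w_start)
print(w_final, loss(w_final))
```

Each iteration shrinks the distance to the minimum by a constant factor here, so after 50 steps the weight is very close to 3 and the error is close to zero.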

Vanishing and Exploding Gradients

Let's first understand what a gradient is.

Gradient: As discussed above in the back-propagation section, a gradient is a partial derivative of a function with respect to its inputs. A gradient measures how much the output of a function changes if you change the inputs a little bit.

You can also think of a gradient as the slope of a function. The higher the gradient, the steeper the slope and the faster a model can learn. If the slope is almost zero, the model stops learning. In training, the gradient measures how much the error changes with regard to a change in each weight.

Gradient issues in RNN

While training an RNN, the gradient can sometimes become too small or too large, which makes training very difficult. This leads to the following issues:

1. Poor Performance
2. Low Accuracy 
3. Long Training Period 

Exploding Gradient: This issue occurs when the weights are given too much importance (they become too large). The values of the gradient then become too large and tend to grow exponentially as the error is propagated back through the time steps. This can be addressed using the following methods:

1. Identity Initialization
2. Truncated Back-propagation
3. Gradient Clipping
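Of the methods listed above, gradient clipping is the easiest to show in code. The sketch below is one common variant (clipping by the overall norm). The function name and the threshold are assumptions for illustration, not part of any particular library.

```python
import math

def clip_by_norm(gradients, max_norm):
    """Rescale the gradient vector so its length never exceeds max_norm.
    The direction is preserved; only the magnitude is capped."""
    norm = math.sqrt(sum(g * g for g in gradients))
    if norm <= max_norm:
        return list(gradients)  # small enough: leave unchanged
    scale = max_norm / norm
    return [g * scale for g in gradients]

exploding = [30.0, 40.0]  # norm is 50
clipped = clip_by_norm(exploding, max_norm=5.0)
untouched = clip_by_norm([0.3, 0.4], max_norm=5.0)
print(clipped, untouched)
```

Because the whole vector is scaled by the same factor, the update still points in the same direction but can no longer take a huge step.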

Vanishing Gradient: This issue occurs when the values of the gradient are too small, so the model stops learning or takes far too long to learn. This can be addressed using the following methods:

1. Weight Initialization
2. Choosing the right Activation Function
3. LSTM (Long Short-Term Memory)
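A rough way to see why both problems appear in RNNs: back-propagating through many time steps multiplies the gradient by a similar factor at every step. The sketch below uses a deliberately simplified picture in which that factor is a constant (an assumption for illustration; in a real network it varies from step to step).

```python
def gradient_after(steps, factor, initial=1.0):
    """Multiply the gradient by the same factor once per time step."""
    g = initial
    for _ in range(steps):
        g *= factor
    return g

vanishing = gradient_after(steps=50, factor=0.9)  # factor below 1 shrinks
exploding = gradient_after(steps=50, factor=1.1)  # factor above 1 grows
print(vanishing, exploding)
```

With a factor slightly below 1 the gradient reaching the earliest steps is tiny, and with a factor slightly above 1 it is enormous, even though the two factors differ only a little.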

A widely used way to address the vanishing gradient issue is LSTM (Long Short-Term Memory).

A usual RNN has a short-term memory, so it is not able to handle long-term dependencies. Using LSTM, it can also have a long-term memory. LSTM is an extension of RNN that extends its memory. LSTMs enable RNNs to remember their inputs over a long period of time, so that RNNs become capable of learning long-term dependencies.

In this way, LSTM mitigates the vanishing gradient issue in RNNs. It keeps the gradients steep enough, which keeps training relatively short and accuracy high.

Gated Cells in LSTM

An LSTM is composed of memory blocks called cells, and manipulations in these cells are done using gates. LSTMs store information in these gated cells. Data can be stored in, deleted from and read from these cells, much like data in a computer’s memory. The gates open and close based on decisions the network learns during training.

These gates are analog gates (instead of digital gates) and their outputs range from 0 to 1. Analog has the advantage over digital of being differentiable, and therefore suitable for back-propagation.

We have following types of gates in LSTM:

1. Forget Gate: It decides what information it needs to forget or throw away. It outputs a number between 0 and 1. A 1 represents “completely keep this” while a 0 represents “completely forget this.” 

2. Input Gate: The input gate is responsible for adding information to the cell state. It ensures that only information that is important and not redundant is added to the cell state.

3. Output Gate: Its job is to select useful information from the current cell state and present it as the output.
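The three gates can be put together in a small sketch of a single LSTM step. It follows the commonly used LSTM formulation but is reduced to scalars for readability. The weights in `params` are hypothetical values picked so that the forget gate stays close to 1.

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a scalar LSTM cell. `params` maps each name to
    hypothetical (input weight, recurrent weight, bias) values."""
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre("forget"))   # forget gate: how much old cell state to keep
    i = sigmoid(pre("input"))    # input gate: how much new information to add
    o = sigmoid(pre("output"))   # output gate: how much of the cell to expose
    candidate = math.tanh(pre("candidate"))  # proposed new content in (-1, 1)

    c_t = f * c_prev + i * candidate  # updated cell state (long-term memory)
    h_t = o * math.tanh(c_t)          # output / hidden state for this step
    return h_t, c_t, (f, i, o)

params = {
    "forget": (0.0, 0.0, 4.0),     # large bias keeps the gate near 1 ("keep")
    "input": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 0.0
history = []
for x_t in [1.0, 0.0, 0.0]:
    h, c, gates = lstm_step(x_t, h, c, params)
    history.append((h, c, gates))
    print(x_t, round(h, 4), round(c, 4))
```

With the forget gate near 1, the cell state written at the first step is still largely intact two steps later, which is the long-term memory behaviour described above.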

Squashing / Activation Functions in LSTM

1. Logistic (sigmoid): Outputs range from 0 to 1.

2. Hyperbolic Tangent (tanh): Outputs range from -1 to 1.

Bidirectional RNN

Bidirectional RNNs take an input sequence and train two RNNs on it. One of them is trained on the regular input sequence, while the other is trained on the reversed sequence. The outputs from both RNNs are then concatenated, or otherwise combined.
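As a rough sketch of the idea, the snippet below runs a tiny scalar RNN once over the sequence and once over its reverse, then pairs the two outputs for each position. The weights are arbitrary values assumed for the example, and pairing stands in for concatenation.

```python
import math

def run_simple_rnn(inputs, w_x, w_h):
    """Scalar RNN: returns the state after each input."""
    h, states = 0.0, []
    for x_t in inputs:
        h = math.tanh(w_x * x_t + w_h * h)
        states.append(h)
    return states

def bidirectional(inputs, fwd_weights=(0.5, 0.8), bwd_weights=(0.5, 0.8)):
    forward = run_simple_rnn(inputs, *fwd_weights)
    # Run on the reversed sequence, then flip back so that index t in
    # both lists refers to the same position of the original input.
    backward = run_simple_rnn(inputs[::-1], *bwd_weights)[::-1]
    return list(zip(forward, backward))  # "concatenate" per position

outputs = bidirectional([1.0, 0.0, 0.0])
for position, (past_context, future_context) in enumerate(outputs):
    print(position, round(past_context, 4), round(future_context, 4))
```

At the last position the forward half still remembers the first input, while the backward half has seen nothing but zeros, so each half contributes a different view of the sequence.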

Applications of RNN

1. Natural Language Processing (Text mining, Sentiment analysis, Text and Speech analysis, Audio and Video analysis)

2. Machine Translation (Translate a language to other languages)

3. Time Series Prediction (Stock market prediction, Algorithmic trading, Weather prediction, Understanding DNA sequence etc.)

4. Image Captioning

Friday, 10 May 2019

Online Machine Learning Quiz (Objective Questions)

Machine Learning is a revolutionary technology that has changed our lives to a great extent. Machines are learning from data, much as humans do. Many scientists and researchers are exploring opportunities in this field, and businesses are profiting greatly from it.

Keeping that in mind, I have created an online quiz in Machine Learning which will help you sharpen your ML skills. This ML quiz contains a lot of multiple choice questions (objective questions) on Machine Learning.

This ML quiz contains objective questions on the following Machine Learning concepts:

1. Data Exploration and Visualization: Hypothesis Generation, Seaborn, Matplotlib, Bar Plot, Box Plot, Histogram, Heatmap, Scatter Plot, Regression Plot, Joint Plot, Distribution Plot, Strip Plot, Violin Plot, KDE, Pair Plot, Pair Grid, Facet Grid etc.

2. Data Wrangling: Missing values, Invalid and corrupted values, Outliers, Skewed data, Feature Scaling, Standardization, Normalization, Binning, Feature Encoding, Label Encoder, One Hot Encoder etc.

3. Dimensionality Reduction: Finding correlation, Feature Selection and Feature Extraction, PCA, t-SNE, SVD, LDA, MDS, ICA etc.

4. Algorithms: Supervised and Unsupervised Learning, Linear Regression, Logistic Regression, KNN, SVM, Naive Bayes, Decision Tree, K-Means Clustering etc.

5. Overfitting: Overfitting, Underfitting, Bias, Variance, Cross-validation etc.

6. Ensemble Learning: Bagging, Boosting, Random Forest, Adaboost, GBM (Gradient Boosting Machine), XGBoost (Extreme Gradient Boosting) etc.

7. Regularization: Ridge Regression (L2 Regularization), Lasso Regression (L1 Regularization), Elastic Net Regression etc.

8. Accuracy Measurement: Confusion Matrix, Classification Report, Accuracy Score, F1 Score, Mean Absolute Error, Mean Square Error, Root Mean Square Error etc.

9. Python: Basic data structures, libraries like Scikit-Learn, Pandas, NumPy, SciPy, Seaborn, Matplotlib etc.

Rules and Guidelines

1. All questions are objective type questions with 4 options. Only one option is correct.

2. 20 seconds are allotted for each question.

3. A correct answer gives you 4 marks and a wrong answer takes away 1 mark (25% negative marking).

4. We will take short breaks during the quiz after every 10 questions.

5. The passing score is 75%. The quiz contains very simple Machine Learning objective questions, so I think 75% can be easily scored.

6. Please don't refresh the page or click any other link during the quiz.

7. Please don't use Internet Explorer to run this quiz.


There are 4 helplines given in this quiz:

1. Weed Out

2. Blink

3. Magic Wand

4. Hands Up

You can use one helpline per question except "Hands Up". Below is the description of all these helplines:

1. Weed Out

The "Weed Out" helpline weeds out two incorrect options, so you only have to choose between the 2 remaining options, one of which is the right answer.

2. Blink

Keep your eyes wide open while using the "Blink" helpline. It first lights the bulb against the right option and then, a fraction of a second (100 milliseconds) later, lights the bulbs against the wrong options. So you have to identify which option's bulb was lit first.

3. Magic Wand

This is the most flexible helpline in which you have nothing to do. Just click on the "Magic Wand" and you get the right answer magically.

4. Hands Up

By using the "Hands Up" helpline, you do not add to your score, but you save quiz time. You can use it as many times as you want. I suggest using this helpline when you have exhausted all your other helplines. If you find a question whose answer is not clear to you and you don’t have any helpline left, please don’t waste time on that question; just raise your hands to save time.

Quit Quiz

The quiz contains a lot of objective questions on Machine Learning, which will take a lot of time and patience to complete. If you feel tired at any point and don't want to continue, you can just quit the quiz and your results will be displayed based on the number of questions you went through.

Quiz Results

At the end of the quiz, you will get your score and the time taken to complete the quiz. You can take a screenshot of the result for future reference.


Please email me more Machine Learning questions which can be included in this quiz.
Please email me your feedback and suggestions to improve this quiz.

Difference between Decision Tree and Random Forest in Machine Learning

Random Forest is a collection of Decision Trees. A Decision Tree makes its final decision based on the output of one tree, whereas a Random Forest combines the output of a large number of small trees when making its final prediction. The following is a detailed list of differences between Decision Tree and Random Forest:

1. Random Forest is an Ensemble Learning (Bagging) technique, unlike Decision Tree: In a Decision Tree, only one tree is grown using all the features and observations. In a Random Forest, the features and observations are split into multiple random subsets, and a lot of small trees (instead of one big tree) are grown on these subsets. So, instead of one full tree, Random Forest uses multiple trees. The larger the number of trees, the better the accuracy and generalization capability tend to be. But beyond some point, adding more trees no longer improves accuracy, so one should stop growing trees at that point.

2. Random Forest uses a voting system, unlike Decision Tree: Each tree grown in a Random Forest acts as an individual learner (often loosely called a weak learner) and casts a vote as per its prediction. The class which gets the maximum votes is taken as the final prediction (for regression, the trees' predictions are averaged instead). You can think of it as a democratic system. On the other hand, there is no voting in a Decision Tree: only one tree predicts the outcome. No democracy at all!

3. Random Forest rarely overfits, unlike Decision Tree: A Decision Tree is very prone to overfitting, as only one tree is responsible for predicting the outcome. If there is a lot of noise in the dataset, the tree starts fitting the noise while building the model, which leads to very low bias (or no bias at all) on the training data. Due to this, it shows a lot of variance in its predictions on real-world data. This scenario is called overfitting. In a Random Forest, noise plays a much smaller role in spoiling the model, because there are many trees and the noise is unlikely to affect all of them in the same way.

4. Random Forest reduces variance instead of bias: Random Forest reduces the variance part of the error rather than the bias part, so on a given training dataset a Decision Tree may be more accurate than a Random Forest. But on an unseen validation dataset, Random Forest usually wins in terms of accuracy.

5. Performance: The downside of Random Forest is that it can be slow when run as a single process, but training can be parallelized because the trees are grown independently of each other.

6. Decision Tree is easier to understand and interpret: A Decision Tree is simple and easy to interpret. You know which variable, and which value of that variable, is used to split the data and predict the outcome. On the other hand, a Random Forest is like a black box. You can specify the number of trees you want in your forest (n_estimators) and the maximum number of features to be used in each tree. But you cannot control the randomness: you cannot control which feature is part of which tree in the forest, or which data point is part of which tree.
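Two of the ideas in the list above, bagging and voting, are simple enough to sketch directly. This is an illustrative fragment rather than a full Random Forest: no trees are trained, the per-tree predictions are hard-coded, and the helper names are assumptions made for the example.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Bagging: draw len(rows) observations with replacement, so each
    tree sees a slightly different version of the training data."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)  # fixed seed so the example is repeatable
rows = [("row", n) for n in range(10)]
sample = bootstrap_sample(rows, rng)

# Pretend five trees have each predicted a class for one data point.
tree_predictions = ["spam", "ham", "spam", "spam", "ham"]
final_prediction = majority_vote(tree_predictions)
print(final_prediction)
```

A single Decision Tree would correspond to using just one of the five predictions, which is why one noisy tree matters much less inside a forest.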

Thursday, 9 May 2019

Advantages and Disadvantages of Linear Regression in Machine Learning

Linear Regression is a supervised machine learning algorithm which is very easy to learn and implement. Following are the advantages and disadvantages of Linear Regression:

Advantages of Linear Regression

1. Linear Regression performs well when the relationship between the dependent and independent variables is approximately linear. We can use it to find the nature of the relationship among the variables.

2. Linear Regression is easy to implement and interpret, and very efficient to train.

3. Linear Regression is prone to over-fitting, but this can be mitigated using dimensionality reduction techniques, regularization (L1 and L2) and cross-validation.

Disadvantages of Linear Regression

1. The main limitation of Linear Regression is the assumption of linearity between the dependent variable and the independent variables. In the real world, relationships in data are rarely linear. The model assumes a straight-line relationship between the dependent and independent variables, which is often incorrect.

2. Prone to noise and overfitting: If the number of observations is smaller than the number of features, Linear Regression should not be used; otherwise it may overfit, because it starts fitting noise while building the model.

3. Prone to outliers: Linear regression is very sensitive to outliers (anomalies). So, outliers should be analyzed and removed before applying Linear Regression to the dataset.

4. Prone to multicollinearity: Before applying Linear regression, multicollinearity should be removed (using dimensionality reduction techniques) because it assumes that there is no relationship among independent variables.
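The sensitivity to outliers mentioned above is easy to demonstrate with a small least-squares fit. The sketch below uses the standard closed-form solution for a single feature. The data points are made up for the illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_clean = [2.0, 4.0, 6.0, 8.0, 10.0]      # exactly y = 2x
ys_outlier = [2.0, 4.0, 6.0, 8.0, 30.0]    # same data, last point corrupted

clean_fit = fit_line(xs, ys_clean)
outlier_fit = fit_line(xs, ys_outlier)
print(clean_fit)    # slope 2, intercept 0
print(outlier_fit)  # slope 6, intercept -8
```

A single corrupted observation out of five is enough to triple the estimated slope, which is why outliers should be examined before fitting.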

In summary, Linear Regression is a great tool for analyzing the relationships among variables, but it isn’t recommended for most practical applications because it over-simplifies real-world problems by assuming a linear relationship among the variables.