I have created a list of basic Machine Learning Interview Questions and Answers. These questions are categorized into 8 groups: Introduction, Data Wrangling, Dimensionality Reduction, Algorithms, Performance Measurement, Performance Improvement, Python and Practical Implementations.

These Machine Learning Interview Questions are common, simple and straightforward. I will keep adding more questions to this list in the future. For the time being, I have listed only the interview questions; later on, I will also add answers to them.

These Machine Learning Interview Questions cover a general introduction to Machine Learning; Data Analysis and Data Wrangling techniques; Dimensionality Reduction techniques like PCA (Principal Component Analysis), SVD (Singular Value Decomposition), LDA (Linear Discriminant Analysis), MDS (Multi-Dimensional Scaling), t-SNE (t-Distributed Stochastic Neighbor Embedding) and ICA (Independent Component Analysis); popular Supervised and Unsupervised Learning algorithms like K-Nearest Neighbors (KNN), Naive Bayes, Decision Trees, Random Forest, Support Vector Machines (SVM), Linear Regression, Logistic Regression and K-Means Clustering; Time Series Analysis and Sentiment Analysis; Bias and Variance, Overfitting and Underfitting, Cross-validation, Regularization, and Ridge and Lasso Regression; and Boosting techniques like AdaBoost and Gradient Boosting Machine (GBM). There are also some questions on the Python libraries that are frequently used to implement Machine Learning algorithms.

**Introduction (7 Questions)**

1. What is Machine Learning? What are its various applications? Why is Machine Learning gaining so much attention nowadays?

2. What is the difference between **Artificial Intelligence, Machine Learning and Deep Learning**?

3. What are the various types of Machine Learning? What are **Supervised Learning, Unsupervised Learning, Semi-supervised Learning and Reinforcement Learning**? Give some examples of these types of Machine Learning.

4. Explain **Deep Learning** and **Neural Networks**.

5. What is the difference between **Data Mining** and **Machine Learning**?

6. What are the various steps involved in the Machine Learning process?

7. What is the difference between **Inductive and Deductive Machine Learning**?

**Data Wrangling (10 Questions)**

1. What is **Data Wrangling**? What are the various things to consider in Data Wrangling?

2. What is the difference between **Data Processing**, **Data Preprocessing** and **Data Wrangling**?

3. Why should you **standardize** and **transform** dataset features/variables before applying any Machine Learning algorithm? How will you calculate the **Mean, Variance and Standard Deviation** of a feature/variable in a given dataset? What is the formula?

4. What are the various ways to handle **missing data in a dataset**?

5. What do you mean by **Noise** in a dataset? What are the various ways to remove it?

6. What do you mean by an **Imbalanced Dataset**? How will you handle it?

7. What is the difference between a **"Training"** dataset and a **"Test"** dataset? What are the common ratios we generally maintain between them?

8. What is the difference between a **Validation** set and a **Test** set?

9. What is the difference between **Labeled and Unlabeled** data?

10. What do you understand by **Fourier Transform**? How is it used in Machine Learning?

**Dimensionality Reduction (8 Questions)**

1. What is the difference between **Covariance and Correlation**? How are these terms related to each other?

2. What do you mean by the **Curse of Dimensionality**? How do you deal with it? What is **Dimension Reduction** in Machine Learning? What is the difference between **Feature Selection** and **Feature Extraction**? What are the various **Dimensionality Reduction Techniques**?

3. What is **Principal Component Analysis (PCA)**? How do we find **Principal Components** through **Projections** and **Rotations**? How will you find your first Principal Component (**PC1**) using **SVD**? What is a **Singular Vector or Eigenvector**? What do you mean by **Eigenvalue and Singular Value**? How will you calculate them? What do you mean by **Loading Score**? A Principal Component is a **linear combination** of existing features. Illustrate this statement. How will you find your second Principal Component (PC2) once you have discovered your first Principal Component (PC1)? How will you calculate the variation for each Principal Component? What is a **Scree Plot**? How is it useful? How many Principal Components can you draw for a given sample dataset? Why is **PC1 more important than PC2** and so on?

4. What is **SVD (Singular Value Decomposition)**?

5. What is **LDA (Linear Discriminant Analysis)**? How does LDA create a new axis by maximizing the distance between means and minimizing the scatter? What is the formula? What are the **similarities and differences between LDA and PCA (Principal Component Analysis)**?

6. What is **Multi-Dimensional Scaling**? What is the difference between **"Metric" and "Non-metric" MDS**? What is **PCoA (Principal Coordinate Analysis)**? Why should we not use Euclidean Distance in MDS to calculate the distance between variables? How is **Log Fold Change** used to calculate the distance between two variables in MDS? What are the **similarities and differences between MDS and PCA (Principal Component Analysis)**? How is it helpful in Dimensionality Reduction?

7. What is **t-SNE (t-Distributed Stochastic Neighbor Embedding)**? How does it help in Dimensionality Reduction? What are the **similarities and differences between t-SNE and PCA (Principal Component Analysis)**?

8. What is **ICA (Independent Component Analysis)**?

**Algorithms (51 Questions)**

1. Name various algorithms used in **Supervised Learning, Unsupervised Learning and Reinforcement Learning**.

2. What are the various **Supervised Learning Techniques**? What is the difference between **Classification and Regression algorithms**? Name various Classification and Regression algorithms.

3. What are the various **Unsupervised Learning Techniques**? What is the difference between **Clustering and Association algorithms**? Name various Clustering and Association algorithms.

4. What is the difference between **Linear Regression and Logistic Regression**?

5. When should we use a Classification algorithm and when should we use a Regression algorithm? Explain with examples.

6. **KNN:** What is "K" in the KNN algorithm?

7. **KNN:** How do we decide the value of "K" in the KNN algorithm?

8. **KNN:** Why is an odd value of "K" preferable in the KNN algorithm?

9. **KNN:** What is the difference between **Euclidean Distance and Manhattan Distance**? What are the formulas for Euclidean distance and Manhattan distance?

10. **KNN:** Why is the KNN algorithm called a **Lazy Learner**?

11. **KNN:** Why is KNN best suited for smaller datasets?

12. **Naive Bayes:** What is the Bayes Theorem? What is its formula? How do we derive this formula?

13. **Naive Bayes:** Why is the word "Naive" used in "Naive Bayes"?

14. **Naive Bayes:** What is the difference between **Probability and Likelihood**?

15. **Naive Bayes:** How do we calculate **Frequency and Likelihood** tables in Naive Bayes?

16. **Naive Bayes:** What are the various types of models used in Naive Bayes? Explain the difference between **Gaussian, Multinomial and Bernoulli** models.

17. **Decision Tree:** What do you mean by **CART (Classification and Regression Trees)** algorithms?

18. **Decision Tree:** What is **Entropy**? What is its formula?

19. **Decision Tree:** What is **Information Gain**? What is its formula?

20. **Decision Tree:** What is the **GINI Index**? What is the difference between **GINI Index and Entropy**?

21. **Decision Tree:** On what parameters do we decide the node in a Decision Tree?

22. **Decision Tree:** What are the common problems faced in Decision Trees? What are **Over-fitting and Under-fitting**? How can you reduce them?

23. **Decision Tree:** What is **Pruning** in a Decision Tree? What do you understand by **Bottom-up and Top-down pruning**? Explain the difference between **Reduced Error Pruning and Cost Complexity Pruning**.

24. **Random Forest:** What is Random Forest? How does it make decisions?

25. **Random Forest:** How does it handle the over-fitting problem of decision trees?

26. **SVM:** What do you mean by **Support Vectors** in SVM?

27. **SVM:** What are the parameters which need to be considered while plotting a **hyperplane** in SVM?

28. **SVM:** What are **Kernel Functions and Tricks** in SVM? How are these useful while handling non-linear data?

29. **SVM:** Can SVM be used to solve regression problems? What is **SVR** (**Support Vector Regression**)?

30. **Linear Regression:** What are **dependent and independent variables**? What are **continuous and categorical variables**?

31. **Linear Regression:** What are the various types of Linear Regression? What is the difference between **Simple, Multiple and Polynomial Linear Regression**?

32. **Linear Regression:** How do we draw the line of regression using the **Least Square Method**? What is the equation of the line? How do we calculate slope and coefficient of a line using the Least Square Method?

33. **Linear Regression:** Explain the **Gradient Descent Method**.

34. **Linear Regression:** What are the various ways to check the goodness of an algorithm? What do you know about the **R Square method, Adjusted R Square method, Mean Square Error method, Root Mean Square Error method, Sum of Squared Error method and Sum of Absolute Error method**?

35. **Linear Regression:** What is the difference between the **R Square method and Adjusted R Square method**?

36. **Linear Regression:** What is the difference between the **Mean Squared Error method and Root Mean Squared Error method**?

37. **Logistic Regression:** What is the equation of Logistic Regression? How do we derive it?

38. **Logistic Regression:** How do we calculate the **Threshold value**?

39. **K-Means Clustering:** What are the various types of Clustering? How will you differentiate between **Hierarchical (Agglomerative and Divisive)** and **Partitional (K-Means, Fuzzy C-Means) Clustering**?

40. **K-Means Clustering:** How do you decide the value of "K" in the K-Means Clustering algorithm? What is the **Elbow method**? What is **WSS (Within Sum of Squares)**? How do we calculate WSS? How is the Elbow method used to calculate the value of "K" in the K-Means Clustering algorithm?

41. **K-Means Clustering:** How do we find centroids and reposition them in a cluster? How many times do we need to reposition the centroids? What do you mean by convergence of clusters?

42. **K-Means Clustering:** What is the difference between the **KNN and K-Means Clustering** algorithms?

43. **Time Series Analysis:** What are the various components of Time Series Analysis? What do you mean by **Trend, Seasonality, Irregularity and Cyclicity**?

44. **Time Series Analysis:** Why should data be stationary to perform Time Series Analysis? How will you know that your data is stationary? What are the various tests you will perform to check whether the data is stationary or not? How will you achieve stationarity in the data?

45. **Time Series Analysis:** How will you use the **Rolling Statistics (Rolling Mean and Standard Deviation) method and the ADF (Augmented Dickey-Fuller)** test to measure stationarity in the data?

46. **Time Series Analysis:** What are the ways to achieve stationarity in Time Series data?

47. **Time Series Analysis:** What is the **ARIMA** model? How is it used to perform Time Series Analysis?

48. **Time Series Analysis:** When should you not use Time Series Analysis?

49. **Sentiment Analysis:** What do you mean by Sentiment Analysis? How do you identify Positive, Negative and Neutral sentiments? What are **Polarity** and **Subjectivity** in Sentiment Analysis?

50. What's the difference between a **Generative** and a **Discriminative** model? What is the difference between a **Joint Probability Distribution** and a **Conditional Probability Distribution**? Name some Generative and Discriminative models.

51. Why is the **Naive Bayes Algorithm** considered a **Generative Model** although it appears that it calculates a **Conditional Probability Distribution**?

**Performance Measurement (8 Questions)**

1. What is a **Confusion Matrix**? What do you mean by **True Positive**, **True Negative**, **False Positive** and **False Negative** in a Confusion Matrix?

2. How do we manually calculate the **Accuracy Score** from a Confusion Matrix?

3. What are **Sensitivity (True Positive Rate)** and **Specificity (True Negative Rate)**? How will you calculate them from a Confusion Matrix? What are their formulas?

4. What is the difference between **Precision** and **Recall**? How will you calculate them from a Confusion Matrix? What are their formulas?

5. What do you mean by the **ROC (Receiver Operating Characteristic) curve** and **AUC (Area Under the ROC Curve)**? How is this curve used to measure the performance of a classification model?

6. What is a **Classification Report**? Describe its various attributes like **Precision, Recall, F1 Score and Support**.

7. What is the difference between **F1 Score and Accuracy Score**?

8. What do you understand by **Type I vs Type II error**? What is the difference between them?

**Performance Improvement (6 Questions)**

1. What is the difference between **Bias and Variance**? What's the trade-off between Bias and Variance?

2. What is the general cause of **Overfitting** and **Underfitting**? What steps will you take to avoid Overfitting and Underfitting? **Hint:** You should explain the **Cross-validation, Regularization and Pruning concepts**.

3. What is **Cross Validation**? How is it useful? What is the difference between **K-Fold Cross Validation** and **LOOCV (Leave One Out Cross Validation)**?

4. What is **Regularization**? When should one use Regularization in Machine Learning? How is it helpful in reducing the Overfitting problem? What is the difference between **L1 and L2 Regularization**? How will you differentiate between **Lasso and Ridge Regularization**? Which one provides better results? Which one should you use and when? What is **Elastic Net Regression**?

5. What do you mean by **Ensemble Learning**? What are the various **Ensemble Learning Methods**? What is the difference between **Bagging (Bootstrap Aggregating) and Boosting**? What are the various **Bagging and Boosting Algorithms**? Differentiate between **Random Forest**, **AdaBoost** and **Gradient Boosting Machine (GBM)**.

6. What do you know about the **AdaBoost** algorithm? What are **Stumps**? Why are stumps called **Weak Learners**? How do we calculate the **order of stumps** (which stump should be the first one, which should be the second, and so on)? How do we calculate the **Error** and **Amount of Say** of each stump? What is the mathematical formula? What is the **difference between Random Forest and AdaBoost**?

**Python (12 Questions)**

1. What do you know about the **Anaconda Distribution**?

2. When should I use Python and when should I use R?

3. What do you mean by **mutable** and **immutable** objects in Python?

4. What are the data structures you have used in Python?

5. What are the commonly used libraries in Python for Machine Learning? What is the use of the **pandas, numpy, sklearn, matplotlib and seaborn** libraries?

6. What are the **magic functions in IPython**?

7. What is the purpose of writing "**inline**" with "**%matplotlib**" (**%matplotlib inline**)?

8. What is **StandardScaler**? Why is it required? How does it transform various features/variables in the dataset?

9. What is the use of **LabelEncoder** and **OneHotEncoder**? What is the **difference between LabelEncoder and OneHotEncoder**?

10. What are the various metrics present in the **sklearn** library to measure the accuracy of an algorithm? Describe **classification_report, confusion_matrix, accuracy_score, f1_score, r2_score, score** and other metrics you know to measure the accuracy of an algorithm.

11. What is a **Heatmap**? How is it useful? Which Python library contains heatmap?

12. Which IDE do you prefer for Python: **Jupyter Notebook, PyCharm** or any other? Why?

**Practical Implementations (5 Questions)**

1. Write pseudocode for a given algorithm.

2. What are the parameters on which we decide which algorithm to use for a given situation?

3. How will you design a Chess Game, Spam Filter, Recommendation Engine etc.?

4. How can you use Machine Learning Algorithms to increase revenue of a company?

5. How will you design a promotion campaign for a business using Machine Learning?
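
Several of the Performance Measurement questions above ask how to calculate Accuracy, Precision, Recall (Sensitivity), Specificity and F1 Score by hand from a Confusion Matrix. Below is a minimal sketch in plain Python of how that calculation might look for a binary classifier. The function names `confusion_counts` and `classification_metrics` and the sample labels are my own illustrative assumptions; they are not part of any library, and returning 0.0 when a denominator is zero is just one possible convention.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative (Type I error)
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive (Type II error)
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Derive common metrics from the four confusion-matrix cells."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # Sensitivity / TPR
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # TNR
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical labels, used only to illustrate the arithmetic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts)   # (TP, TN, FP, FN)
print(metrics)
```

In practice, the `confusion_matrix` and `classification_report` functions in `sklearn.metrics` cover the same ground; writing the arithmetic out once makes it easier to explain the formulas in an interview.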
