Sunday, 27 January 2019

100+ Basic Machine Learning Interview Questions and Answers

I have created a list of basic Machine Learning Interview Questions and Answers. These questions are categorized into 8 groups: Introduction, Data Processing, Dimensionality Reduction, Algorithms, Accuracy Measurement, Performance Improvement, Python and Practical Implementations.

These Machine Learning Interview Questions are common, simple and straightforward. I will keep adding more questions to this list in future. For the time being, I have listed only the interview questions; later on, I will also add answers to them.

These Machine Learning Interview Questions cover a general introduction to Machine Learning, Data Analysis and Data Wrangling techniques, and Dimensionality Reduction techniques like PCA (Principal Component Analysis), SVD (Singular Value Decomposition), LDA (Linear Discriminant Analysis), MDS (Multi-Dimensional Scaling), t-SNE (t-Distributed Stochastic Neighbor Embedding) and ICA (Independent Component Analysis). They also cover popular Supervised and Unsupervised Learning algorithms like K-Nearest Neighbors (KNN), Naive Bayes, Decision Trees, Random Forest, Support Vector Machines (SVM), Linear Regression, Logistic Regression and K-Means Clustering, along with Time Series Analysis, Sentiment Analysis, Bias and Variance, Overfitting and Underfitting, Cross-validation, Regularization, Ridge and Lasso Regression, and Boosting techniques like AdaBoost and Gradient Boosting Machine (GBM). There are also some questions on Python libraries which are frequently used while implementing Machine Learning Algorithms.

Introduction (7 Questions)

1. What is Machine Learning? What are its various applications? Why is Machine Learning gaining so much attention nowadays?

2. What is the difference between Artificial Intelligence, Machine Learning and Deep Learning?

3. What are various types of Machine Learning? What is Supervised Learning, Unsupervised Learning, Semi-supervised Learning and Reinforcement Learning? Give some examples of these types of Machine Learning.

4. Explain Deep Learning and Neural Networks.

5. What is the difference between Data Mining and Machine learning?

6. What is the difference between Inductive and Deductive Machine Learning?

7. What are the various steps involved in a Machine Learning Process?

Data Processing (14 Questions)

1. What is the difference between Data Processing, Data Preprocessing and Data Wrangling?

2. What is Data Wrangling? What are the various things to consider in Data Wrangling?

3. Why should you scale (standardize / normalize) the dataset features / variables before applying a Machine Learning algorithm? Answer

4. Which Machine Learning Algorithms require Data Scaling / Normalization and which not? Answer

5. How will you calculate Mean, Standard Deviation and Variance of a feature / variable in a given dataset? What is the formula? 

6. What are the various ways to handle missing data in a dataset?

7. What do you mean by Noise in the dataset? What are the various ways to remove it?

8. What do you mean by Imbalanced Dataset? How will you handle it?

9. What is the difference between "Training" dataset and "Test" dataset? What are the common ratios we generally maintain between them?

10. What is the difference between Validation set and Test set?

11. What is the difference between Labeled and Unlabeled data?

12. What do you mean by features and labels in the dataset? 

13. What are the independent and dependent variables? What is the difference between continuous and categorical / discrete variables?

14. What do you understand by Fourier Transform? How is it used in Machine Learning?

Dimensionality Reduction (8 Questions)

1. What is the difference between Covariance and Correlation? How are these terms related with each other? Answer

2. Feature Selection and Feature Extraction

  • What do you mean by Curse of Dimensionality? How to deal with it? 
  • What is Dimension Reduction in Machine Learning? Answer
  • What is the difference between Feature Selection and Feature Extraction?
  • What are the various Dimensionality Reduction Techniques? Answer

3. Principal Component Analysis

  • What is Principal Component Analysis (PCA)?
  • How do we find Principal Components through Projections and Rotations?
  • How will you find your first Principal Component (PC1) using SVD?
  • What is Singular Vector or Eigenvector? What do you mean by Eigenvalue and Singular Value? How will you calculate it? 
  • What do you mean by Loading Score? How will you calculate it?
  • "Principal Component is a linear combination of existing features." Illustrate this statement. 
  • How will you find your second Principal Component (PC2) once you have discovered your first Principal Component (PC1)? 
  • How will you calculate the variation for each Principal Component? 
  • What is Scree Plot? How is it useful? 
  • How many Principal Components can you draw for a given sample dataset? 
  • Why is PC1 more important than PC2 and so on?
  • What are the advantages and disadvantages of PCA? Answer

4. What is SVD (Singular Value Decomposition)?

5. Linear Discriminant Analysis

  • What is LDA (Linear Discriminant Analysis)?
  • How does LDA create a new axis by maximizing the distance between means and minimizing the scatter? What is the formula? 
  • What are the similarities and differences between LDA and PCA (Principal Component Analysis)?

6. Multi-Dimensional Scaling

  • What is Multi-Dimensional Scaling?
  • What is the difference between "Metric" and "Non-metric" MDS?
  • What is PCoA (Principal Coordinate Analysis)?
  • Why should we not use Euclidean Distance in MDS to calculate the distance between variables? 
  • How is Log Fold Change used to calculate the distance between two variables in MDS? 
  • What are the similarities and differences between MDS and PCA (Principal Component Analysis)? 
  • How is it helpful in Dimensionality Reduction?

7. t-SNE (t-Distributed Stochastic Neighbor Embedding)

  • What is t-SNE (t-Distributed Stochastic Neighbor Embedding)? Answer
  • Define the terms: Normal Distribution, t-Distribution, Similarity Score, Perplexity
  • Why is it called t-SNE instead of simple SNE? Why is t-Distribution used instead of normal distribution in lower dimension?
  • Why should t-SNE not be used in larger datasets containing thousands of features? When should we use combination of both PCA and t-SNE?
  • What are the advantages and disadvantages of t-SNE over PCA? Answer

8. What is ICA (Independent Component Analysis)?

Algorithms (52 Questions)

1. Name various algorithms for Supervised Learning, Unsupervised Learning and Reinforcement Learning.

2. What are various Supervised Learning Techniques? What is the difference between Classification and Regression algorithms? Name various Classification and Regression algorithms. 

3. What are various Unsupervised Learning Techniques? What is the difference between Clustering and Association algorithms? Name various Clustering and Association algorithms. 

4. When should we use Classification algorithm and when should we use Regression algorithm? Explain with examples.

5. KNN: What is “K” in KNN algorithm? How to choose optimal value of K? Answer

6. KNN: Why the odd value of “K” is preferable in KNN algorithm? Answer

7. KNN: Why is KNN algorithm called Lazy Learner? Answer

8. KNN: Why should we not use KNN algorithm for large datasets? Answer

9. KNN: What are the advantages and disadvantages of KNN algorithm? Answer

10. KNN: What is the difference between Euclidean Distance and Manhattan Distance? What is the formula of Euclidean distance and Manhattan distance? Answer

11. Naive Bayes: What is the difference between Conditional Probability and Joint Probability?

12. Naive Bayes: What is the formula of Bayes' theorem, on which the "Naive Bayes" algorithm is based? How will you derive it?

13. Naive Bayes: Why is the word “Naïve” used in the “Naïve Bayes” algorithm?

14. Naïve Bayes: What is the difference between Probability and Likelihood?

15. Naive Bayes: How do we calculate Frequency and Likelihood tables for a given dataset in the “Naïve Bayes” algorithm?

16. Naïve Bayes: What are the various type of models used in "Naïve Bayes" algorithm? Explain the difference between Gaussian, Multinomial and Bernoulli models.

17. Naïve Bayes: What are the advantages and disadvantages of "Naive Bayes" algorithm? Answer

18. Naïve Bayes: What’s the difference between Generative and Discriminative models? What is the difference between Joint Probability Distribution and Conditional Probability Distribution? Name some Generative and Discriminative models. 

19. Naïve Bayes: Why is Naive Bayes Algorithm considered as Generative Model although it appears that it calculates Conditional Probability Distribution? 

20. SVM: Define the terms: Support Vectors and Hyperplanes

21. SVM: What are Kernel Functions and Tricks in SVM? What are the various types of Kernels in SVM? What is the difference between Linear, Polynomial, Gaussian and Sigmoid Kernels? How are these used for transformation of non-linear data into linear data?

22. SVM: Can SVM be used to solve regression problems? What is SVR (Support Vector Regression)?

23. SVM: What are the advantages and disadvantages of SVM? Answer

24. Decision Tree: Define the terms: GINI Index, Entropy and Information Gain. How will you calculate these terms from a given dataset to select the nodes of the tree?

25. Decision Tree: What is Pruning in a Decision Tree? Define the terms: Bottom-Up Pruning, Top-Down Pruning, Reduced Error Pruning and Cost Complexity Pruning.

26. Decision Tree: What are the advantages and disadvantages of a Decision Tree? Answer

27. Decision Tree: How is Decision Tree used to solve the regression problems?

28. Random Forest: What is Random Forest? How does it reduce the over-fitting problem in decision trees? Answer

29. Random Forest: What are the advantages and disadvantages of Random Forest algorithm? Answer

30. Random Forest: How to choose optimal number of trees in a Random Forest? Answer

32. Linear Regression: How do we draw the line of linear regression using the Least Squares Method? What is the equation of the line? How do we calculate the slope and intercept of the line using the Least Squares Method?

33. Linear Regression: Explain Gradient Descent. How does it optimize the Line of Linear Regression? Answer

34. Linear Regression: What are the various types of Linear Regression? What is the difference between Simple, Multiple and Polynomial Linear Regression?

35. Linear Regression: What are the various metrics used to check the accuracy of the Linear Regression? Answer

36. Linear Regression: What are the advantages and disadvantages of Linear Regression?

37. Logistic Regression: What is the equation of Logistic Regression? How will you derive this equation from Linear Regression (Equation of a Straight Line)?

38. Logistic Regression: How do we calculate optimal Threshold value in Logistic Regression?

39. Logistic Regression: What are the advantages and disadvantages of Logistic Regression? Answer

40. Logistic Regression: What is the difference between Linear Regression and Logistic Regression? Answer

41. Compare SVM, Decision Tree and Logistic Regression.

42. K-Means Clustering: What are the various types of Clustering? How will you differentiate between Hierarchical (Agglomerative and Divisive) and Partitional (K-Means, Fuzzy C-Means) Clustering?

43. K-Means Clustering: How do you decide the value of "K" in the K-Means Clustering Algorithm? What is the Elbow method? What is WSS (Within Sum of Squares)? How do we calculate WSS? How is the Elbow method used to calculate the value of "K" in the K-Means Clustering Algorithm?

44. K-Means Clustering: How do we find centroids and reposition them in a cluster? How many times do we need to reposition the centroids? What do you mean by convergence of clusters?

45. K-Means Clustering: What is the difference between KNN and K-Means Clustering algorithms?

46. Time Series Analysis: What are various components of Time Series Analysis? What do you mean by Trend, Seasonality, Irregularity and Cyclicity?

47. Time Series Analysis: Why should the data be stationary to perform Time Series Analysis? How will you know that your data is stationary? What are the various tests you will perform to check whether the data is stationary or not? How will you achieve stationarity in the data?

48. Time Series Analysis: How will you use the Rolling Statistics (Rolling Mean and Standard Deviation) method and the ADF (Augmented Dickey-Fuller) test to measure stationarity in the data?

49. Time Series Analysis: What are the ways to achieve stationarity in the Time Series data?

50. Time Series Analysis: What is ARIMA model? How is it used to perform Time Series Analysis?

51. Time Series Analysis: When not to use Time Series Analysis?

52. Sentiment Analysis: What do you mean by Sentiment Analysis? How to identify Positive, Negative and Neutral sentiments? What is Polarity and Subjectivity in Sentiment Analysis?

Accuracy Measurement (10 Questions)

1. Name some metrics which we use to measure the accuracy of the classification and regression algorithms.

Hint: 
Classification metrics: Confusion Matrix, Classification Report, Accuracy Score etc.
Regression metrics: MAE, MSE, RMSE Answer

2. What is Confusion Matrix? What do you mean by True Positive, True Negative, False Positive and False Negative in Confusion Matrix?

3. How do we manually calculate Accuracy Score from Confusion Matrix?

4. What is Sensitivity (True Positive Rate) and Specificity (True Negative Rate)? How will you calculate it from Confusion Matrix? What is its formula? 

5. What is the difference between Precision and Recall? How will you calculate it from Confusion Matrix? What is its formula?

6. What do you mean by ROC (Receiver Operating Characteristic) curve and AUC (Area Under the ROC Curve)? How is this curve used to measure the performance of a classification model?

7. What do you understand by Type I vs Type II error ? What is the difference between them?

8. What is Classification Report? Describe its various attributes like Precision, Recall, F1 Score and Support.

9. What is the difference between F1 Score and Accuracy Score?

10. What do you mean by Loss Function? Name some commonly used Loss Functions. Define Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, Sum of Absolute Error, Sum of Squared Error, R Square Method, Adjusted R Square Method. Answer
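
Several of the metrics named in this section (Confusion Matrix counts, Accuracy, Precision, Recall, F1 Score, MAE, MSE, RMSE) can be computed by hand in a few lines. The following is only a minimal sketch in plain Python for a binary classification problem; the function names are hypothetical and not taken from any library. In practice the sklearn metrics mentioned in the Python section would normally be used instead.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative (Type I error)
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive (Type II error)
            else:
                tn += 1  # predicted negative, actually negative
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 derived from the confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def regression_metrics(y_true, y_pred):
    """MAE, MSE and RMSE for a list of actual and predicted values."""
    errors = [actual - predicted for actual, predicted in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}


if __name__ == "__main__":
    print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                                 [1, 1, 0, 0, 0, 1, 0, 1]))
    print(regression_metrics([3, 5, 7], [2, 5, 9]))
```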

Performance Improvement (8 Questions)

1. What is the difference between Bias and Variance? What’s the trade-off between Bias and Variance?

2. What is the general cause of Overfitting and Underfitting? What steps will you take to avoid Overfitting and Underfitting? Answer

Hint: You should explain Cross-validation, Regularization, Decision Tree Pruning and Ensemble Techniques.

3. Cross Validation

  • What is Cross Validation? What is the difference between K-Fold Cross Validation and LOOCV (Leave One Out Cross Validation)?
  • What are Hyperparameters? How does Cross Validation help in Hyperparameter Tuning? Answer
  • What are the advantages and disadvantages of Cross Validation? Answer

4. Regularization

  • What is Regularization?
  • When should one use Regularization in Machine Learning? 
  • How is it helpful in reducing Overfitting problem? 
  • What is the difference between L1 and L2 Regularization?
  • How will you differentiate between Lasso and Ridge Regularization? Which one provides better results? Which one to use and when? 
  • What is Elastic Net Regression?

5. Ensemble Learning

  • What do you mean by Ensemble Learning?
  • What are the various Ensemble Learning Methods?
  • What is the difference between Bagging (Bootstrap Aggregating) and Boosting? Answer
  • What are the various Bagging and Boosting Algorithms?
  • Differentiate between Random Forest, AdaBoost, Gradient Boosting Machine (GBM) and XGBoost. Answer 1, Answer 2, Answer 3

6. AdaBoost 

  • What do you know about AdaBoost Algorithm? 
  • What are Stumps? Why are the stumps called Weak Learners?
  • How do we calculate order of stumps (which stump should be the first one and which should be the second and so on)? 
  • How do we calculate Error and Amount of Say of each stump? What is the mathematical formula? 
  • What is the difference between Random Forest and AdaBoost? Answer

7. GBM (Gradient Boosting Machine)

  • What is GBM (Gradient Boosting Machine)?
  • What is Gradient Descent? Why is it so named? 
  • How will you calculate the Step Size and Learning Rate in Gradient Descent?
  • When to stop descending the gradient? 
  • What is Stochastic Gradient Descent?
  • What is the difference between the AdaBoost and GBM? Answer

8. XGBoost 

  • What is XGBoost Algorithm?
  • How is XGBoost more efficient than GBM (Gradient Boosting Machine)? Answer
  • What is the difference between GBM and XGBoost? Answer

Python (14 Questions)

1. What do you know about Anaconda Distribution?

2. When should I use Python and when should I use R?

3. What are the data structures you have used in Python?

4. What do you mean by mutable and immutable objects in Python?

5. What are the commonly used libraries in Python for Machine Learning? What is the use of pandas, numpy, sklearn, matplotlib and seaborn libraries?

6. What are the magic functions in IPython?

7. What is the purpose of writing "inline" with "%matplotlib" (%matplotlib inline)?

8. What is StandardScaler? Why is it required? How does it transform various features / variables in the dataset? Answer

9. What is the use of LabelEncoder and OneHotEncoder? What is the difference between LabelEncoder and OneHotEncoder?

10. Implement KNN algorithm in Python using Scikit Learn library through cross validation (cross_val_score) technique. (This question can be asked for any algorithm)

11. What is the random_state (seed) parameter in train_test_split?

12. What are the various metrics present in sklearn library to measure the accuracy of the algorithm? Describe classification_report, confusion_matrix, accuracy_score, f1_score, r2_score, score and other metrics you know to measure the accuracy of an algorithm.

13. What is Heatmap? How is it useful? Which Python library contains Heatmap?

14. Which IDE do you prefer for Python: Jupyter Notebook, PyCharm or any other? Why?

Practical Implementations (5 Questions)

1. Write a pseudo code for a given algorithm.

2. What are the parameters on which we decide which algorithm to use for a given situation?

3. How will you design a Chess Game, Spam Filter, Recommendation Engine etc.?

4. How can you use Machine Learning Algorithms to increase revenue of a company?

5. How will you design a promotion campaign for a business using Machine Learning?

Friday, 25 January 2019

KNN Algorithm in Machine Learning: Interview Questions and Answers

KNN is one of the simplest classification algorithms in supervised machine learning. It stands for K Nearest Neighbors. I have listed 7 interview questions and answers about the KNN algorithm. I have given only brief answers to the questions. If you want to dive deeper into any of these KNN interview questions, you can search online for detailed answers.

1. What is “K” in KNN algorithm?

2. How do we decide the value of "K" in KNN algorithm?

3. Why is the odd value of “K” preferable in KNN algorithm?

4. What is the difference between Euclidean Distance and Manhattan distance? What is the formula of Euclidean distance and Manhattan distance?

5. Why is KNN algorithm called Lazy Learner?

6. Why should we not use KNN algorithm for large datasets?

7. What are the advantages and disadvantages of KNN algorithm?

Let's explore the answers to the above-mentioned KNN algorithm interview questions.

1. What is “K” in KNN algorithm?

K = Number of nearest neighbors you want to select to predict the class of a given item

2. How do we decide the value of "K" in KNN algorithm?

If K is small, the results might not be reliable because noise will have a higher influence on the result. If K is large, there is more processing for every prediction, which may adversely impact the performance of the algorithm, and more distant points start to influence the vote. So, the following rules of thumb are commonly considered while choosing the value of K:

a. K should be close to the square root of n (the number of data points in the training dataset).
b. K should be odd so that there are no ties. If the square root is even, then add or subtract 1.
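
The two rules above can be sketched in a few lines of Python. This is only an illustration of the heuristic, with a hypothetical helper name; the choice to add 1 (rather than subtract 1) when the square root is even is an arbitrary assumption, and the result is best treated as a starting point for K rather than a final answer.

```python
import math


def rule_of_thumb_k(n_samples):
    """Return an odd K close to the square root of the number of training points."""
    # Round the square root to the nearest whole number, but never go below 1.
    k = max(1, round(math.sqrt(n_samples)))
    if k % 2 == 0:
        # The rule allows adding or subtracting 1; adding is an arbitrary choice here.
        k += 1
    return k


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, rule_of_thumb_k(n))
```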


3. Why is the odd value of “K” preferable in KNN algorithm?

K should be odd so that there are no ties in the voting. This guarantee holds when there are two classes; with more classes, ties are still possible. If the square root of the number of data points is even, then add or subtract 1 to make it odd.

4. What is the difference between Euclidean Distance and Manhattan distance? What is the formula of Euclidean distance and Manhattan distance?

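Both are used to find the distance between two points. Euclidean distance is the straight-line distance between the points, while Manhattan distance is the sum of the absolute differences along each coordinate axis.

Figure: Euclidean Distance and Manhattan Distance Formula (image taken from stackexchange)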
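
For two points p = (p1, ..., pn) and q = (q1, ..., qn), the standard formulas are: Euclidean distance = sqrt((p1 - q1)^2 + ... + (pn - qn)^2) and Manhattan distance = |p1 - q1| + ... + |pn - qn|. A minimal Python sketch of both follows; the function names are chosen here only for illustration.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan_distance(p, q):
    """Grid distance: sum of the absolute differences along each axis."""
    return sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    origin, point = (0, 0), (3, 4)
    print(euclidean_distance(origin, point))  # the classic 3-4-5 triangle
    print(manhattan_distance(origin, point))  # 3 steps across plus 4 steps up
```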

5. Why is KNN algorithm called Lazy Learner?

When KNN gets the training data, it does not learn a model; it just stores the data. It does not derive any discriminative function from the training data. It uses the training data only when it actually needs to make a prediction. So, KNN does not immediately learn a model but delays the work until prediction time, which is why it is called a lazy learner.
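
The lazy behaviour described above can be seen in a small from-scratch sketch. The class name and structure below are hypothetical and kept deliberately simple (Euclidean distance, majority vote); this is not how any particular library implements KNN.

```python
import math
from collections import Counter


class LazyKNNClassifier:
    """A tiny KNN classifier that shows why KNN is called a lazy learner."""

    def __init__(self, k=3):
        self.k = k
        self.points = []
        self.labels = []

    def fit(self, points, labels):
        # "Training" only stores the data; no model is derived here.
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # All the work happens at prediction time: the distance from the
        # query to every stored point is computed, then the K nearest vote.
        distances = sorted(
            ((math.dist(point, query), label)
             for point, label in zip(self.points, self.labels)),
            key=lambda pair: pair[0],
        )
        nearest_labels = [label for _, label in distances[: self.k]]
        return Counter(nearest_labels).most_common(1)[0][0]


if __name__ == "__main__":
    model = LazyKNNClassifier(k=3).fit(
        [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)],
        ["a", "a", "a", "b", "b", "b"],
    )
    print(model.predict((2, 2)))
    print(model.predict((9, 9)))
```

Because predict scans every stored point, the cost of a single prediction grows with the size of the training data.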

6. Why should we not use KNN algorithm for large datasets?

KNN works well with smaller datasets because it is a lazy learner: it needs to store all the data and makes decisions only at run time. For every prediction, it needs to calculate the distance from the given point to all other points. So if the dataset is large, there will be a lot of processing, which may adversely impact the performance of the algorithm.

KNN is also very sensitive to noise in the dataset. If the dataset is large, there is a greater chance of noise in the dataset, which adversely affects the performance of the KNN algorithm.

7. What are the advantages and disadvantages of KNN algorithm?

Answer

Basic and Introductory Machine Learning Interview Questions and Answers

Machine Learning has gained a lot of popularity by now. I will be writing a lot of interview questions and answers which you should definitely know about Machine Learning. So, I am going to start with very simple, basic and introductory Machine Learning interview questions and answers. In my future posts, I will cover interview questions regarding various Machine Learning Algorithms and Python. 

So, for now, I have a list of following 9 interview questions about Machine Learning. I am going to write just brief answers. You can google around to find out the details.

1. What is Machine Learning? What are its various applications?

2. What is the difference between Artificial Intelligence, Machine Learning and Deep Learning?

3. What are various types of Machine Learning? What is Supervised Learning, Unsupervised Learning and Reinforcement Learning?

4. What is Deep Learning?

5. Explain Neural Networks.

6. What is the difference between Data Mining and Machine learning?

7. Why is Machine Learning gaining so much attention nowadays?

8. Which programming languages should I know to learn Machine Learning? (Not an interview question, just added for the sake of knowledge)

9. What mathematics concepts should I know to learn Machine Learning? (Not an interview question, just added for the sake of knowledge)

Let's handle all these Machine Learning interview questions one by one and try to answer them.

1. What is Machine Learning? What are its various applications?

Machine Learning enables machines to learn and make predictions based on some experience (previous data). It deals with the extraction of patterns from a dataset. It uses statistical methods to enable machines to improve with experience. ML lets machines make data-driven decisions rather than being explicitly programmed.

Applications of Machine Learning:

1. Google Maps (Predicts traffic patterns, fastest route, traffic jam and delays based on current and historic data of the route) 

2. Facebook (Provides friend tagging suggestions, face recognition, image recognition using deep learning (DeepFace Algorithm))

3. Uber Eats (Estimates delivery time accurately)

4. Apple (Face recognition)

5. Tesla Self Driving Cars (Unsupervised Learning)

6. Recommendation Engine (Netflix, YouTube, Amazon, Google Ads)

7. Robots: Moley (The robotic chef), KUKA (Industrial robot), Sophia

8. Google Translate (Just scan the signboard in local language with camera)

9. Chess Playing Computer

10. Apple Siri, Amazon Alexa, Google Assistant

11. Document classification, Image and Video recognition, Speech recognition, Biometric recognition, Weather forecast, Handwriting detection, Spam detection, Fraud detection, Unusual patterns detection, News categorization, Medical diagnosis and much more...

2. What is the difference between Artificial Intelligence, Machine Learning and Deep Learning?

1. ML is a subset of AI and, further, Deep Learning is a subset of ML.

2. AI enables machines to mimic human behavior.

3. ML enables machines to learn and make predictions based on some experience (previous data). It deals with extraction of patterns from dataset. It uses statistical methods to enable machines to improve with experience.

4. ML lets machines make data-driven decisions rather than being explicitly programmed.

5. Deep Learning: A subset of ML which is inspired by the functionality of human brain cells called neurons, which led to the concept of the artificial neural network. Deep Learning requires more sample data than Machine Learning and its learning phase is also longer, but the execution time is far less as compared to Machine Learning.

3. What are various types of Machine Learning? What is Supervised Learning, Unsupervised Learning and Reinforcement Learning?

Supervised Learning

Machine is trained and supervised using the training data for some time. Afterwards, real data is provided and it makes prediction on it using its learning from the training data. If the accuracy of the prediction is acceptable, the algorithm is accepted. Otherwise the process of training is repeated again and again until we get satisfactory accuracy level in its prediction.

Mathematically: Input data and expected output are known beforehand.
y = f(x)
Create a mapping between input data (x) and output data (y) to predict accurately with minimum scope for error.
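
As one concrete illustration of learning a mapping y = f(x) from known inputs and outputs, here is a minimal sketch that fits a straight line with the least squares method. A straight-line model is only an assumption made for this example, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Learn slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned mapping f(x) to a new input."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Training data: input x and expected output y are both known.
    model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(model)
    print(predict(model, 5))  # prediction for an input not seen in training
```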

Unsupervised Learning

There is no training dataset and no expected outcome. No past knowledge and experience of data. Analyze the data on the go. No prior training is given unlike Supervised Learning.

It creates clusters (groups / classification) of related / similar data.

Example: Going to an unknown party, first time watching any football or cricket match, Recommendation System

So, here we don't know the proper input data and the machine does not know about that data beforehand. So, the machine is not trained on that data. In this case it will not break but will try to provide a reasonable output.

Mathematically: We only have input data (x) but no corresponding expected output data (y).
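
To make the idea of grouping similar data without any expected output concrete, here is a minimal sketch of K-Means style clustering on plain numbers. The starting centroids are passed in explicitly, which is an assumption made to keep the example deterministic; the function name is hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group numbers around centroids; only inputs (x) are used, no labels (y)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


if __name__ == "__main__":
    centroids, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centroids=[0, 5])
    print(centroids)
    print(clusters)
```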

Reinforcement Learning

Uses the hit and trial (trial and error) method. The machine is given a reward for a hit and a penalty for a miss.

Mechanisms: Exploration (hit and trial) and Exploitation (whether it is a hit or a miss, it learns and remembers the result and uses it in future).

Examples

1. You are left on an island, you have to survive anyhow, you will do hit and trial, get rewarded and penalized accordingly, you will first explore and then start exploiting the island.

2. Whether a given image is an apple or not? If in reality it is an apple and machine figures it out as an orange, machine is penalized, if it predicts it as an apple, machine is rewarded.
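
The reward and penalty idea, together with exploration and exploitation, can be sketched with a simple epsilon-greedy loop based on the apple example above. This is a simplified illustration under stated assumptions: the reward values (+1 and -1), the epsilon value and the function name are all made up for this sketch.

```python
import random


def learn_by_trial(true_label="apple", actions=("orange", "apple"),
                   steps=200, epsilon=0.1, seed=0):
    """Learn which answer earns a reward by hit and trial."""
    rng = random.Random(seed)
    value = {action: 0.0 for action in actions}  # estimated reward per action
    count = {action: 0 for action in actions}
    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: occasionally try a random answer.
            action = rng.choice(actions)
        else:
            # Exploitation: use the answer that has worked best so far.
            action = max(actions, key=lambda a: value[a])
        # Reward for a hit, penalty for a miss.
        reward = 1.0 if action == true_label else -1.0
        count[action] += 1
        # Keep a running average of the rewards seen for this action.
        value[action] += (reward - value[action]) / count[action]
    return value


if __name__ == "__main__":
    learned = learn_by_trial()
    print(learned)
    print(max(learned, key=learned.get))
```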

4. What is Deep Learning?

Deep Learning is a subset of Machine Learning which is inspired by the functionality of human brain cells called neurons, which led to the concept of the artificial neural network. Deep Learning requires more sample data than Machine Learning and its learning phase is also longer, but the execution time is far less as compared to Machine Learning.

5. Explain Neural Networks.

A neural network is one group of algorithms used for machine learning that models the data using graphs of artificial neurons. Those neurons are a mathematical model that “mimics approximately how a neuron in the brain works”.
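
A single artificial neuron can be sketched as a weighted sum of its inputs plus a bias, passed through an activation function. The sigmoid activation used below is an assumption chosen for illustration (other activations are common), and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, then activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


if __name__ == "__main__":
    # With a weighted sum of exactly zero, the sigmoid output sits at its midpoint.
    print(neuron([1.0, 1.0], [2.0, -2.0], 0.0))
    # A large positive weighted sum pushes the output towards 1.
    print(neuron([1.0, 1.0], [10.0, 10.0], 0.0))
```

A neural network connects many such neurons in layers, with the outputs of one layer used as the inputs of the next.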

6. What is the difference between Data Mining and Machine learning?

Data Mining is about using statistics as well as other programming methods to find patterns hidden in the data so that you can explain some phenomenon. Data Mining builds intuition about what is really happening in some data. It leans a little more towards math than programming, but uses both.

Machine Learning uses Data Mining techniques and other learning algorithms to build models of what is happening behind some data so that it can predict future outcomes. Math is the basis for many of the algorithms, but this is more towards programming.

7. Why is Machine Learning gaining so much attention nowadays?

Machine Learning is an old concept, but it is getting popular now because earlier there was not that much data for machines to analyze and predict from. Now that the amount of data has increased, predictions can be more accurate, and machines can learn from this huge amount of data itself. In general, the more data there is, the more accurate the prediction (with minimum errors).

8. Which programming languages should I know to learn Machine Learning?

Must: Data Structures, Python
Optional: R, C++, Hadoop (Java based)

9. What mathematics concepts should I know to learn Machine Learning?

Matrix, Vector, Differentiation, Integration, Logs, Probability, Statistics