
Thursday, 7 March 2019

Difference between Random Forest and AdaBoost in Machine Learning

Both Random Forest and AdaBoost (Adaptive Boosting) are ensemble learning techniques. Let's discuss some of the differences between Random Forest and AdaBoost.

1. Random Forest is based on the bagging technique, while AdaBoost is based on the boosting technique. See the difference between bagging and boosting here.

2. In Random Forest, a certain number of full-sized trees are grown on different bootstrap samples of the training dataset. AdaBoost uses stumps (decision trees with only one split), so an AdaBoost model is essentially a forest of stumps. These stumps are called weak learners; weak learners have high bias and low variance.

3. Each tree in a Random Forest can end up splitting on many of the features in the dataset (a random subset of features is considered at each split), while a stump uses only one feature at a time.

4. In Random Forest, each decision tree is built independently of the others, so the order in which the trees are built does not matter. In AdaBoost, the order of the stumps does matter: the errors the first stump makes influence how the second stump is built, and so on. After each round, the training samples that the current stump misclassified are given higher weights, so the next stump concentrates on the examples the previous stumps got wrong, and it in turn passes updated weights on to the stump after it. Essentially, each new stump focuses on correcting the mistakes of the stumps before it (the reweighting sketch after this list walks through this step by step).

5. Random Forest uses parallel ensembling while AdaBoost uses sequential ensembling. Because the trees in a Random Forest are independent, they can be trained in parallel on a multiprocessor machine. AdaBoost, by contrast, has to train its stumps one after another, since each stump depends on the previous one (the scikit-learn comparison after this list shows both).

6. Each tree in a Random Forest has an equal amount of say in the final decision, while in AdaBoost different stumps have different amounts of say. A stump that makes fewer errors in its (weighted) predictions gets a larger say than a stump that makes more errors.

7. Random Forest aims to decrease variance, not bias, while AdaBoost aims to decrease bias, not variance.

8. Random Forest rarely overfits, while AdaBoost is more prone to overfitting, especially on noisy data.
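
The contrast in points 2, 5 and 6 can be seen directly in scikit-learn. The sketch below is only illustrative (the synthetic dataset and parameter values are assumptions, not recommendations): it grows a forest of full-sized trees in parallel, then an AdaBoost ensemble of stumps sequentially, and prints the per-stump weights that give each stump its amount of say.

# A minimal sketch, assuming scikit-learn is installed; the dataset and
# hyperparameters are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Bagging: full-sized trees grown independently on bootstrap samples,
# so training can run in parallel (n_jobs=-1 uses all available cores).
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
rf.fit(X, y)

# Boosting: stumps (max_depth=1) trained one after another on reweighted samples.
# (Older scikit-learn versions take base_estimator= instead of estimator=.)
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # a stump
    n_estimators=100,
    random_state=42,
)
ada.fit(X, y)

# Every tree in the forest gets an equal vote; each stump has its own weight.
print(ada.estimator_weights_[:5])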
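
Point 4 can also be made concrete with a toy, from-scratch version of AdaBoost's reweighting loop. This is a simplified sketch under the assumption of binary labels coded as -1/+1; it is not the library implementation, just an illustration of how each stump's errors shape the weights the next stump trains on, and how the amount of say is computed from the weighted error.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_stumps(X, y, n_rounds=10):
    # y is assumed to contain labels -1 and +1
    n = len(y)
    w = np.full(n, 1.0 / n)                  # start with equal sample weights
    stumps, says = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)     # order matters: w reflects earlier stumps' mistakes
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)  # weighted error
        say = 0.5 * np.log((1 - err) / err)  # low error -> large amount of say
        w *= np.exp(-say * y * pred)         # increase weights of misclassified samples
        w /= w.sum()                         # renormalise so weights sum to 1
        stumps.append(stump)
        says.append(say)
    return stumps, np.array(says)

def adaboost_predict(stumps, says, X):
    # Weighted vote: each stump contributes in proportion to its say.
    scores = sum(say * stump.predict(X) for stump, say in zip(stumps, says))
    return np.sign(scores)

If the weight update line were removed, every round would train on the same equally weighted data and produce the same stump, which is exactly why the order of the learners matters in boosting but not in bagging.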

Related: Difference between GBM (Gradient Boosting Machine) and XGBoost (Extreme Gradient Boosting)
