Ensemble Techniques: Ensemble Learning in Machine Learning Explained

Pradeep Dhote
6 min read · Jul 30, 2020

Ensemble learning is one of the most popular techniques in machine learning.

Ensemble learning is an approach in which you combine different algorithms, or the same algorithm multiple times, to form a more powerful prediction model.

Ensemble learning improves machine learning results by combining several models, which generally yields better predictive performance than any single model. That is why ensemble methods have placed first in many prestigious machine learning competitions.

Let's take a real-life example: we regularly come across game shows on television, and you must have noticed the "Audience Poll" option. Most of the time a contestant goes with the option that received the highest vote from the audience, and most of the time they win. We can generalize this to real life as well, where taking the opinion of a majority of people is usually preferred over the opinion of a single person.

Ensemble techniques have a similar underlying idea: we aggregate predictions from a group of predictors.

There are a few very popular ensemble techniques that combine several machine learning models into one predictive model in order to decrease variance (bagging), decrease bias (boosting), or improve predictions (stacking).

Most ensemble methods use a single base learning algorithm to produce homogeneous base learners, i.e. learners of the same type, leading to homogeneous ensembles.

There are also methods that use heterogeneous learners, i.e. learners of different types (different algorithms combined in one model), leading to heterogeneous ensembles. For an ensemble to be more accurate than any of its individual members, the base learners have to be as accurate as possible and as diverse as possible.
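As a concrete illustration of a heterogeneous ensemble, here is a minimal sketch that combines three different algorithms with a majority vote. It assumes scikit-learn and a synthetic dataset, both of which are my own illustrative choices rather than anything from this article:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

# Synthetic data purely for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three different (heterogeneous) base learners combined by majority vote
voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=5)),
        ("svc", SVC()),
    ],
    voting="hard",  # majority vote on the predicted class labels
)
voting_clf.fit(X_train, y_train)
print("Voting ensemble accuracy:", voting_clf.score(X_test, y_test))
```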

Ensemble methods can be divided into two groups:

  • Sequential ensemble methods, where the base learners are generated sequentially (e.g. AdaBoost). The basic motivation of sequential methods is to exploit the dependence between the base learners: overall performance can be boosted by giving previously mislabeled examples a higher weight.
  • Parallel ensemble methods, where the base learners are generated in parallel (e.g. Random Forest). The basic motivation of parallel methods is to exploit the independence between the base learners, since the error can be reduced dramatically by averaging (see the short sketch after this list).
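To see why averaging independent learners helps, here is a tiny numerical sketch (my own toy illustration, not from the article): the variance of the average of independent, equally noisy estimates shrinks roughly in proportion to the number of estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 1.0
n_estimators, n_trials = 25, 10_000

# Each "learner" is a noisy estimate of the true value
single = true_value + rng.normal(0, 1, size=n_trials)
ensemble = (true_value + rng.normal(0, 1, size=(n_trials, n_estimators))).mean(axis=1)

print("Variance of a single learner   :", single.var())    # ~1.0
print("Variance of the averaged output:", ensemble.var())  # ~1/25
```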

Bagging (Bootstrap Aggregation)

Bagging is an ensemble technique in which a single training algorithm is used on different subsets of the training data, where the subsets are sampled with replacement (bootstrap). Once the algorithm has been trained on all the subsets, bagging makes a prediction by aggregating all the predictions made on the different subsets.

In the case of regression, the bagging prediction is simply the mean of all the predictions; in the case of classification, it is the most frequent prediction (majority vote) among all the predictions.

Bagging uses bootstrap sampling to obtain the data subsets for training the base learners. For aggregating the outputs of base learners, bagging uses voting for classification and averaging for regression.

Bootstrapping is a technique of sampling different sets of data from a given training set with replacement. After bootstrapping the training dataset, we train a model on each of the different sets and aggregate the results.

Bagging is also known as a parallel method, since we can train all the models in parallel and combine their results at the end.
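A minimal bagging sketch using scikit-learn's BaggingClassifier (the dataset and parameter values are my own illustrative choices; by default the base learners are decision trees):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bagging: each of the 100 base learners is trained on a bootstrap sample
# (sampling with replacement) of the training set; predictions are
# combined by majority vote.
bagging = BaggingClassifier(
    n_estimators=100,   # number of base learners (decision trees by default)
    bootstrap=True,     # sample the training data with replacement
    random_state=42,
)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))
```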

Advantages of a Bagging Model

1) Bagging significantly decreases variance without increasing bias.

2) Bagging methods work well because of the diversity introduced into the training data by bootstrap sampling.

3) If the training set is very large, bagging can save computational time by training each model on a relatively small subset while still improving the accuracy of the model.

4) It works well with small datasets as well.

Algorithms based on Bagging

Bagging meta-estimator

Random Forest

Pasting

Pasting is an ensemble technique similar to bagging, with the only difference being that the samples are drawn from the training dataset without replacement.

"Pasting takes the samples without replacement (no overlapping): if a sample has been taken once, it will not be taken again."

This causes less diversity in the sampled datasets, and the data ends up being correlated. That is why bagging is usually preferred over pasting in real scenarios.
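In scikit-learn, pasting can be obtained with the same BaggingClassifier by turning off replacement; a minimal sketch (the subset size and other values are my own illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Pasting: each base learner sees 70% of the training set,
# sampled WITHOUT replacement (bootstrap=False).
pasting = BaggingClassifier(
    n_estimators=100,
    max_samples=0.7,
    bootstrap=False,    # no replacement -> pasting instead of bagging
    random_state=42,
)
pasting.fit(X_train, y_train)
print("Pasting accuracy:", pasting.score(X_test, y_test))
```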

Boosting

Boosting is also an ensemble technique; it converts weak learners into strong learners.

The main principle of boosting is to fit a sequence of weak learners (models that are only slightly better than random guessing, such as small decision trees) to weighted versions of the data, where more weight is given to examples that were misclassified by earlier rounds.

In other words, boosting fits a sequence of weak learners to the data. If a weak learner misclassifies some points, the next weak learner in the sequence pays more attention to those points: the weights of the misclassified samples are increased and the weights of the correctly classified samples are decreased. This is repeated for a number of rounds, each round focusing more on the samples that are still misclassified.
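A minimal boosting sketch using scikit-learn's AdaBoostClassifier (the dataset and parameters are my own illustrative choices; by default the weak learners are decision stumps):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# AdaBoost trains weak learners sequentially; after each round the
# misclassified samples receive a higher weight so that the next
# learner focuses on them.
boosting = AdaBoostClassifier(n_estimators=100, random_state=42)
boosting.fit(X_train, y_train)
print("AdaBoost accuracy:", boosting.score(X_test, y_test))
```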

Algorithms based on Boosting

AdaBoost
Gradient Boosting Machine (GBM)
XGBoost
LightGBM
CatBoost

Stacking (Stacked Generalization)

Stacking is an ensemble technique that combines the predictions of two or more models (the base models) and uses the combination as the input for a new model (the meta-model), i.e. a new model is trained on the predictions of the base models.

The base level often consists of different learning algorithms, so stacking ensembles are often heterogeneous.
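Before walking through the manual steps, here is a minimal stacking sketch using scikit-learn's StackingClassifier (the base models, meta-model, and dataset are my own illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Base models whose predictions become the features of the meta-model
base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ("svc", SVC(probability=True, random_state=42)),
]

# The meta-model (logistic regression) is trained on out-of-fold
# predictions of the base models (cv=5 handles the splitting internally).
stacking = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stacking.fit(X_train, y_train)
print("Stacking accuracy:", stacking.score(X_test, y_test))
```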

Let’s understand more by looking at the steps involved in stacking (a rough code sketch follows the list):

  1. Split the dataset into a training set and a hold-out set. (K-fold validation can also be used to select different validation sets.) Generally, we do a 50–50 split: training set = x1, y1; hold-out set = x2, y2.
  2. Split the training set again into training and test datasets, e.g. x1_train, y1_train, x1_test, y1_test.
  3. Train all the base models on the training set x1_train, y1_train.
  4. After training is done, get the predictions of all the base models on the hold-out set x2.
  5. Stack all these predictions together (you can also take an average of the predictions or of the predicted probabilities); this will be used as the input features for the meta-model.
  6. Similarly, get the predictions of all the base models on the test set x1_test.
  7. Stack these predictions together as well; this will be used as the prediction dataset for the meta-model.
  8. Use the stacked data from step 5 as the input features and the hold-out targets y2 as the target variable, and train the meta-model on this data.
  9. Once training is done, check the accuracy of the meta-model by predicting on the stacked data from step 7 and evaluating against y1_test.
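A rough sketch of these steps in code (assuming scikit-learn; the specific base models, meta-model, and dataset are my own illustrative choices, and in practice k-fold out-of-fold predictions are often preferred over a single hold-out split):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Step 1: split into a training set (x1, y1) and a hold-out set (x2, y2), 50-50
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
x1, x2, y1, y2 = train_test_split(X, y, test_size=0.5, random_state=42)

# Step 2: split the training set again into train and test parts
x1_train, x1_test, y1_train, y1_test = train_test_split(
    x1, y1, test_size=0.5, random_state=42
)

# Step 3: train the base models on x1_train, y1_train
base_models = [
    RandomForestClassifier(n_estimators=100, random_state=42),
    SVC(probability=True, random_state=42),
]
for model in base_models:
    model.fit(x1_train, y1_train)

# Steps 4-5: predictions of the base models on the hold-out set x2,
# stacked column-wise -> input features for the meta-model
meta_features_train = np.column_stack([m.predict(x2) for m in base_models])

# Steps 6-7: predictions of the base models on x1_test,
# stacked column-wise -> prediction dataset for the meta-model
meta_features_test = np.column_stack([m.predict(x1_test) for m in base_models])

# Step 8: train the meta-model on the stacked hold-out predictions, with y2 as target
meta_model = LogisticRegression()
meta_model.fit(meta_features_train, y2)

# Step 9: evaluate the meta-model on the stacked x1_test predictions against y1_test
y_pred = meta_model.predict(meta_features_test)
print("Stacking (manual) accuracy:", accuracy_score(y1_test, y_pred))
```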
