Decision Tree: Classification and Regression Algorithms (CART, ID3)

Pradeep Dhote
Jul 26, 2020

Decision Tree Explained in Detail

CART: Classification and Regression Tree

  • It is used for generating both classification trees and regression trees.
  • It uses the Gini index as the metric/cost function to evaluate splits during feature selection.
  • It produces binary splits, so every node has at most two children.
  • It uses least squares (variance reduction) as the metric to select features in a regression tree.

Calculate Gini Index for sub-nodes

Gini impurity = 1 − (sum of squared class probabilities) = 1 − (p² + q²), where p and q are the probabilities of success and failure.

The feature whose split gives the lowest weighted Gini impurity is selected as the root node.
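As a quick sanity check of this formula, here is a minimal Python sketch (the gini_impurity helper and the example counts are ours, just for illustration):

```python
def gini_impurity(counts):
    """Gini impurity of a node: 1 minus the sum of squared class probabilities."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

# A node with 2 'Yes' and 3 'No' examples:
print(round(gini_impurity([2, 3]), 2))  # 0.48
```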

Let's start with the weather dataset, which is quite famous for explaining decision tree algorithms. The target is to predict whether to play or not (Yes or No) based on the weather conditions.

The features of the data are outlook, temperature, humidity, and wind.

So, let's start building the tree.
Outlook is a nominal feature; it can take three values: sunny, overcast, and rain. Let's summarize the final decisions for the outlook feature.

The Gini index is calculated by subtracting the sum of squared class probabilities from one.

Gini index(outlook=sunny) = 1-(2/5)²-(3/5)² = 1- 0.16 - 0.36 = 0.48

Gini index(outlook=overcast)= 1- (4/4)²-(0/4)² = 1- 1- 0 = 0

Gini index(outlook=rain) = 1- (3/5)² - (2/5)² = 1- 0.36 - 0.16 = 0.48

Now, we will calculate the weighted sum of the Gini index for the outlook feature:

Gini(outlook) = (5/14)*0.48 + (4/14)*0 + (5/14)*0.48 = 0.342
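Here is a small sketch that reproduces this weighted-sum calculation from the per-value Yes/No counts implied above (sunny: 2 Yes / 3 No, overcast: 4 / 0, rain: 3 / 2); the helper names are ours:

```python
def gini_impurity(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_gini(splits):
    """Weighted Gini of a feature: each branch's impurity weighted by its share of rows."""
    n = sum(sum(branch) for branch in splits)
    return sum(sum(branch) / n * gini_impurity(branch) for branch in splits)

# outlook: sunny (2 Yes, 3 No), overcast (4 Yes, 0 No), rain (3 Yes, 2 No)
print(round(weighted_gini([[2, 3], [4, 0], [3, 2]]), 3))  # 0.343 (0.342 above comes from rounding each term first)
```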

Similarly, temperature is also a nominal feature; it can take three values: hot, cool, and mild. Let's summarize the final decisions for the temperature feature.

Gini(temperature=hot) = 1-(2/4)²-(2/4)² = 0.5

Gini(temperature=cool) = 1-(3/4)²-(1/4)² = 0.375

Gini(temperature=mild) = 1-(4/6)²-(2/6)² = 0.445

Now, the weighted sum of Gini index for temperature features can be calculated as,

Gini(temperature)= (4/14) *0.5 + (4/14) *0.375 + (6/14) *0.445 =0.439

Similarly for Humidity

Humidity is a binary feature; it can take two values: high and normal.

Gini(humidity=high) = 1-(3/7)²-(4/7)² = 0.489

Gini(humidity=normal) = 1-(6/7)²-(1/7)² = 0.244

Now, the weighted sum of Gini index for humidity features can be calculated as,

Gini(humidity) = (7/14) *0.489 + (7/14) *0.244=0.367

Similarly for Wind

Wind is a binary feature; it can take two values: weak and strong.

Gini(wind=weak)= 1-(6/8)²-(2/8)² = 0.375

Gini(wind=strong)= 1-(3/6)²-(3/6)²= 0.5

Now, the weighted sum of Gini index for wind features can be calculated as,

Gini(wind) = (8/14) *0.375 + (6/14) *0.5=0.428

Decision for root node

So, the final decision over all the features:

Gini(outlook) = 0.342, Gini(temperature) = 0.439, Gini(humidity) = 0.367, Gini(wind) = 0.428

The Gini index of the outlook feature is the lowest, so outlook becomes our root node.
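If you want to cross-check this root-node choice with a library, a sketch using scikit-learn follows. The 14-row table below is the classic “play tennis” weather dataset, whose per-value counts match the numbers used above; the one-hot encoding via pandas.get_dummies is just one convenient choice, not part of the original article:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumed reconstruction of the 14-row weather ("play tennis") dataset described above.
data = pd.DataFrame({
    "outlook":     ["sunny","sunny","overcast","rain","rain","rain","overcast",
                    "sunny","sunny","rain","sunny","overcast","overcast","rain"],
    "temperature": ["hot","hot","hot","mild","cool","cool","cool",
                    "mild","cool","mild","mild","mild","hot","mild"],
    "humidity":    ["high","high","high","high","normal","normal","normal",
                    "high","normal","normal","normal","high","normal","high"],
    "wind":        ["weak","strong","weak","weak","weak","strong","strong",
                    "weak","weak","weak","strong","strong","weak","strong"],
    "play":        ["No","No","Yes","Yes","Yes","No","Yes",
                    "No","Yes","Yes","Yes","Yes","Yes","No"],
})

X = pd.get_dummies(data.drop(columns="play"))  # one-hot encode the nominal features
y = data["play"]

tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```

Note that, because the categories are one-hot encoded, the library makes binary splits on individual category columns (for example outlook_overcast), so the printed tree will look slightly different from the hand-built one even though it uses the same Gini criterion.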

Now, let's focus on the sub-data where outlook is sunny. We need to find the Gini index for the temperature, humidity, and wind features on this subset.

Gini index for temperature on sunny outlook

Gini(outlook=sunny & temperature=hot) = 1-(0/2)²-(2/2)² = 0

Gini(outlook=sunny & temperature=cool) = 1-(1/1)²-(0/1)² = 0

Gini(outlook=sunny & temperature=mild) = 1-(1/2)²-(1/2)² = 0.5

Now, the weighted sum of Gini index for temperature on sunny outlook features can be calculated as,

Gini(outlook=sunny & temperature)= (2/5) *0 + (1/5) *0+ (2/5) *0.5 =0.2

Gini Index for humidity on sunny outlook

Gini(outlook=sunny & humidity=high) = 1-(0/3)²-(3/3)² = 0

Gini(outlook=sunny & humidity=normal) = 1-(2/2)²-(0/2)² = 0

Now, the weighted sum of Gini index for humidity on sunny outlook features can be calculated as,

Gini(outlook = sunny & humidity) = (3/5) *0 + (2/5) *0=0

Gini Index for wind on sunny outlook

Gini(outlook=sunny & wind=weak) = 1-(1/3)²-(2/3)² = 0.44

Gini(outlook=sunny & wind=strong) = 1-(1/2)²-(1/2)² = 0.5

Now, the weighted sum of Gini index for wind on sunny outlook features can be calculated as,

Gini(outlook = sunny and wind) = (3/5) *0.44 + (2/5) *0.5=0.266+0.2= 0.466

Decision on sunny outlook factor

We have calculated the Gini index of all the features when the outlook is sunny. Humidity has the lowest value (0, versus 0.2 for temperature and 0.466 for wind), so the next node under the sunny branch will be humidity.

Now, let's focus on the sub-data for the overcast and rain branches and keep splitting in the same manner until every subset is pure.

Following the same procedure for the remaining branches gives the final decision tree.

ID3 (Iterative Dichotomiser 3)

ID3 (Iterative Dichotomiser 3): this algorithm uses entropy and information gain as metrics to form a better decision tree. The attribute with the highest information gain is used as the root node, and a similar approach is followed after that.

Entropy varies from 0 to 1: it is 0 if all the data belong to a single class and 1 (for a binary target) if the classes are equally distributed.

  • In this way, entropy will give a measure of impurity in the dataset.

Steps to decide which attribute to split on (see the sketch after this list):

  1. Compute the entropy for the whole dataset.
  2. For every attribute:
     2.1 Calculate the entropy for each of its categorical values.
     2.2 Take the weighted average of these entropies (the attribute's average information).
     2.3 Calculate the information gain for the attribute.
  3. Pick the attribute with the highest information gain.
  4. Repeat on each branch until we get the desired tree.
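These steps translate almost directly into code. Below is a minimal Python sketch of steps 1-3 (the function names entropy and information_gain are ours):

```python
import math
from collections import Counter

def entropy(labels):
    """Step 1: entropy of a list of class labels (base-2 logarithm)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(values, labels):
    """Steps 2-3: dataset entropy minus the weighted entropy of each attribute value's subset."""
    total = len(labels)
    weighted = 0.0
    for v in set(values):
        subset = [label for value, label in zip(values, labels) if value == v]
        weighted += len(subset) / total * entropy(subset)
    return entropy(labels) - weighted
```

Calling information_gain on the outlook column of the weather table reproduces the roughly 0.247 value derived below.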

We will take the same weather dataset that we used for explaining the CART algorithm above.

Number of observations = 14

Number of observations having Decision ‘Yes’ = 9

Probability of ‘Yes’, p(Yes) = 9/14

Number of observations having Decision ‘No’ = 5

Probability of ‘No’, p(No) = 5/14

Entropy(Decision) = -(9/14)*log₂(9/14) - (5/14)*log₂(5/14) = 0.940 (all logarithms here are base 2)

We have four attributes: outlook, temperature, humidity, and wind.

Information Gain for the outlook attribute

Number of instances with sunny outlook = 5

prob(Decision = ‘Yes’ | outlook = sunny) = 2/5

prob(Decision = ‘No’ | outlook = sunny) = 3/5

Entropy(Decision | outlook = sunny) = -(2/5)*log₂(2/5) - (3/5)*log₂(3/5) = 0.97

Entropy(Decision | outlook = overcast) = -(4/4)*log₂(4/4) = 0

Entropy(Decision | outlook = rain) = -(3/5)*log₂(3/5) - (2/5)*log₂(2/5) = 0.97

Gain(Decision, outlook) = Entropy(Decision) - P(outlook = sunny)*Entropy(Decision | outlook = sunny) - P(outlook = overcast)*Entropy(Decision | outlook = overcast) - P(outlook = rain)*Entropy(Decision | outlook = rain)

Gain(Decision, outlook) = 0.940 - (5/14)*0.97 - (4/14)*0 - (5/14)*0.97 = 0.247
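As a quick numeric check of this calculation, here is a small sketch (the entropy2 helper is ours):

```python
import math

def entropy2(p, n):
    """Binary entropy (base 2) from counts of 'Yes' (p) and 'No' (n)."""
    total = p + n
    result = 0.0
    for c in (p, n):
        if c:
            result -= (c / total) * math.log2(c / total)
    return result

e_all      = entropy2(9, 5)   # ~0.940, entropy of the full dataset
e_sunny    = entropy2(2, 3)   # ~0.971
e_overcast = entropy2(4, 0)   # 0.0
e_rain     = entropy2(3, 2)   # ~0.971

gain_outlook = e_all - (5/14)*e_sunny - (4/14)*e_overcast - (5/14)*e_rain
print(round(gain_outlook, 3))  # 0.247
```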

Summary of Information Gain for all the attributes

Gain(Decision, outlook) = 0.247

Gain(Decision, wind ) = 0.048

Gain(Decision, temperature) = 0.029

Gain(Decision, humidity) = 0.151

So, outlook has the highest information gain and is selected as the first node (root node).

Information Gain on Temperature under Sunny outlook factor

Entropy(sunny | temp = hot) = -(0/2)*log₂(0/2) - (2/2)*log₂(2/2) = 0

Entropy(sunny | temp = cool) = -(1/1)*log₂(1/1) - (0/1)*log₂(0/1) = 0

Entropy(sunny | temp = mild) = -(1/2)*log₂(1/2) - (1/2)*log₂(1/2) = 1

(taking 0*log₂(0) = 0 by convention)

Gain(sunny, temp) = 0.97 - (2/5)*0 - (1/5)*0 - (2/5)*1 = 0.57

Summary of information gain for all attributes under the sunny outlook factor

Gain(sunny, temp) = 0.57

Gain(sunny, humidity) = 0.97

Gain(sunny, wind) =0.019

We can see that the information gain of the ‘Humidity’ attribute is higher than the others, so it is the next node under the sunny outlook factor.

Information Gain on the humidity factor (under sunny outlook)

Humidity takes two values, normal and high.

From the sunny sub-data, you can infer that whenever humidity is high the decision is ‘No’, and whenever humidity is normal the decision is ‘Yes’, so both branches under humidity are pure.

Now, let's focus on the remaining sub-data (the other outlook branches) and keep splitting in the same manner until the whole dataset is partitioned.

C4.5:

It is an extension of the ID3 algorithm, and better than ID3 because it handles both continuous and discrete values. It is also used for classification.

The algorithm can create more generalized models, work with continuous data, and handle missing values.

Decision rules are found based on the entropy and information gain ratio of each feature. At each level of the decision tree, the feature with the maximum gain ratio becomes the decision rule.
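To make the gain-ratio idea concrete, here is a minimal, self-contained sketch (the entropy and gain_ratio helpers are ours, not library functions); it computes the gain ratio of the outlook attribute on the same weather data:

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain divided by split information (the entropy of the split itself)."""
    total = len(labels)
    weighted = split_info = 0.0
    for v, count in Counter(values).items():
        subset = [label for value, label in zip(values, labels) if value == v]
        weighted += count / total * entropy(subset)
        split_info -= (count / total) * math.log2(count / total)
    gain = entropy(labels) - weighted
    return gain / split_info if split_info else 0.0

# outlook vs. the play decision from the weather data above
outlook = ["sunny"]*5 + ["overcast"]*4 + ["rain"]*5
play    = ["Yes","Yes","No","No","No"] + ["Yes"]*4 + ["Yes","Yes","Yes","No","No"]
print(round(gain_ratio(outlook, play), 3))  # ~0.156
```

The split information term penalizes attributes with many distinct values, which is the main practical difference from the plain information gain used in ID3.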
