Decision Tree: Classification and Regression Algorithms (CART, ID3)
Decision Tree Explained in Detail
CART: Classification and Regression Tree
- It is used for generating both classification trees and regression trees.
- It uses the Gini index as the metric/cost function to evaluate splits during feature selection.
- It builds binary trees: every split divides a node into exactly two child nodes.
- It uses least squares as the metric to select splits in the case of a regression tree.
Calculate the Gini index for sub-nodes
The Gini index is one minus the sum of the squared probabilities of success and failure: Gini = 1 - (p² + q²).
The feature whose split gives the lowest weighted Gini impurity is selected at each node, starting with the root node.
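To make this concrete, here is a minimal Python sketch of the Gini impurity calculation (the helper name gini_impurity is just illustrative):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity = 1 - sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# Example: a node holding 2 'Yes' and 3 'No' observations
print(gini_impurity(["Yes", "Yes", "No", "No", "No"]))  # ≈ 0.48
```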
Let's start with the weather dataset, which is widely used to explain decision tree algorithms. The target is to predict whether to play or not (Yes or No) based on the weather conditions.
From the data, outlook, temperature, humidity, and wind are the features.
So, let's start building the tree.
Outlook is a nominal feature; it can take three values: sunny, overcast, and rain. Let's summarize the final decisions for the outlook feature.
The Gini index is calculated by subtracting the sum of the squared probabilities of each class from one.
Gini(outlook=sunny) = 1 - (2/5)² - (3/5)² = 1 - 0.16 - 0.36 = 0.48
Gini(outlook=overcast) = 1 - (4/4)² - (0/4)² = 1 - 1 - 0 = 0
Gini(outlook=rain) = 1 - (3/5)² - (2/5)² = 1 - 0.36 - 0.16 = 0.48
Now, we will calculate the weighted sum of the Gini index for the outlook feature:
Gini(outlook) = (5/14)*0.48 + (4/14)*0 + (5/14)*0.48 = 0.342
Similarly, temperature is also a nominal feature; it can take three values: hot, cool, and mild. Let's summarize the final decisions for the temperature feature.
Gini(temperature=hot) = 1 - (2/4)² - (2/4)² = 0.5
Gini(temperature=cool) = 1 - (3/4)² - (1/4)² = 0.375
Gini(temperature=mild) = 1 - (4/6)² - (2/6)² = 0.445
Now, the weighted sum of the Gini index for the temperature feature can be calculated as:
Gini(temperature) = (4/14)*0.5 + (4/14)*0.375 + (6/14)*0.445 = 0.439
Similarly for humidity:
Humidity is a binary feature; it can take two values, high and normal.
Gini(humidity=high) = 1 - (3/7)² - (4/7)² = 0.489
Gini(humidity=normal) = 1 - (6/7)² - (1/7)² = 0.244
Now, the weighted sum of the Gini index for the humidity feature can be calculated as:
Gini(humidity) = (7/14)*0.489 + (7/14)*0.244 = 0.367
Similarly for wind:
Wind is a binary feature; it can take two values, weak and strong.
Gini(wind=weak) = 1 - (6/8)² - (2/8)² = 0.375
Gini(wind=strong) = 1 - (3/6)² - (3/6)² = 0.5
Now, the weighted sum of the Gini index for the wind feature can be calculated as:
Gini(wind) = (8/14)*0.375 + (6/14)*0.5 = 0.428
Decision for the root node
So, the final Gini indices for all the features are:
Gini(outlook) = 0.342
Gini(temperature) = 0.439
Gini(humidity) = 0.367
Gini(wind) = 0.428
The Gini index for the outlook feature is the lowest, so outlook becomes our root node.
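If you want to verify these numbers programmatically, the sketch below computes the weighted Gini index for every feature. It assumes the standard 14-row play-tennis table (the exact rows are my assumption about the dataset used here; small differences from the hand-rounded values above are just rounding):

```python
from collections import Counter

# Assumed standard play-tennis rows: (outlook, temperature, humidity, wind, decision)
data = [
    ("sunny", "hot", "high", "weak", "No"),          ("sunny", "hot", "high", "strong", "No"),
    ("overcast", "hot", "high", "weak", "Yes"),      ("rain", "mild", "high", "weak", "Yes"),
    ("rain", "cool", "normal", "weak", "Yes"),       ("rain", "cool", "normal", "strong", "No"),
    ("overcast", "cool", "normal", "strong", "Yes"), ("sunny", "mild", "high", "weak", "No"),
    ("sunny", "cool", "normal", "weak", "Yes"),      ("rain", "mild", "normal", "weak", "Yes"),
    ("sunny", "mild", "normal", "strong", "Yes"),    ("overcast", "mild", "high", "strong", "Yes"),
    ("overcast", "hot", "normal", "weak", "Yes"),    ("rain", "mild", "high", "strong", "No"),
]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(rows, col):
    """Weighted Gini index of the decision after splitting on column `col`."""
    total = len(rows)
    score = 0.0
    for value in {r[col] for r in rows}:
        branch = [r[-1] for r in rows if r[col] == value]
        score += len(branch) / total * gini(branch)
    return score

for name, col in [("outlook", 0), ("temperature", 1), ("humidity", 2), ("wind", 3)]:
    print(name, round(weighted_gini(data, col), 3))
# outlook 0.343, temperature 0.44, humidity 0.367, wind 0.429
```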
Now, let's focus on the sub-data where the outlook is sunny. We need to find the Gini index for the temperature, humidity, and wind features respectively.
Gini index for temperature on sunny outlook
Gini(outlook=sunny & temperature=hot) = 1 - (0/2)² - (2/2)² = 0
Gini(outlook=sunny & temperature=cool) = 1 - (1/1)² - (0/1)² = 0
Gini(outlook=sunny & temperature=mild) = 1 - (1/2)² - (1/2)² = 0.5
Now, the weighted sum of the Gini index for temperature under the sunny outlook can be calculated as:
Gini(outlook=sunny & temperature) = (2/5)*0 + (1/5)*0 + (2/5)*0.5 = 0.2
Gini Index for humidity on sunny outlook
Gini(outlook=sunny & humidity=high) = 1 - (0/3)² - (3/3)² = 0
Gini(outlook=sunny & humidity=normal) = 1 - (2/2)² - (0/2)² = 0
Now, the weighted sum of the Gini index for humidity under the sunny outlook can be calculated as:
Gini(outlook=sunny & humidity) = (3/5)*0 + (2/5)*0 = 0
Gini Index for wind on sunny outlook
Gini(outlook=sunny & wind=weak) = 1 - (1/3)² - (2/3)² = 0.44
Gini(outlook=sunny & wind=strong) = 1 - (1/2)² - (1/2)² = 0.5
Now, the weighted sum of the Gini index for wind under the sunny outlook can be calculated as:
Gini(outlook=sunny & wind) = (3/5)*0.44 + (2/5)*0.5 = 0.266 + 0.2 = 0.466
Decision on sunny outlook factor
We have calculated the Gini index of all the features when the outlook is sunny. You can see that humidity has the lowest value, so the next node under the sunny branch will be humidity.
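As a quick check of this step, the sketch below repeats the calculation on just the five assumed sunny-outlook rows:

```python
from collections import Counter

# The five assumed sunny-outlook rows: (temperature, humidity, wind, decision)
sunny = [
    ("hot", "high", "weak", "No"),
    ("hot", "high", "strong", "No"),
    ("mild", "high", "weak", "No"),
    ("cool", "normal", "weak", "Yes"),
    ("mild", "normal", "strong", "Yes"),
]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

for name, col in [("temperature", 0), ("humidity", 1), ("wind", 2)]:
    score = sum(
        len([r for r in sunny if r[col] == v]) / len(sunny)
        * gini([r[-1] for r in sunny if r[col] == v])
        for v in {r[col] for r in sunny}
    )
    print(name, round(score, 3))
# temperature 0.2, humidity 0.0, wind 0.467 -> humidity gives the purest split
```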
Now, let's focus on the sub-data for the overcast and rain outlook branches, and keep splitting in the same manner until every subset is pure.
So the final decision tree looks like the tree above: outlook at the root, humidity under the sunny branch, a pure ‘Yes’ leaf for overcast, and wind under the rain branch.
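For a practical counterpart, scikit-learn's DecisionTreeClassifier builds a CART-style tree using Gini impurity. The sketch below one-hot encodes the nominal features with pandas and prints the learned tree; the 14 rows are again the assumed standard play-tennis table, so treat it as an illustration rather than the exact tree drawn above:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumed standard play-tennis rows (same data as the worked example above)
df = pd.DataFrame({
    "outlook":     ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
                    "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"],
    "temperature": ["hot", "hot", "hot", "mild", "cool", "cool", "cool",
                    "mild", "cool", "mild", "mild", "mild", "hot", "mild"],
    "humidity":    ["high", "high", "high", "high", "normal", "normal", "normal",
                    "high", "normal", "normal", "normal", "high", "normal", "high"],
    "wind":        ["weak", "strong", "weak", "weak", "weak", "strong", "strong",
                    "weak", "weak", "weak", "strong", "strong", "weak", "strong"],
    "play":        ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                    "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

X = pd.get_dummies(df.drop(columns="play"))  # one-hot encode the nominal features
y = df["play"]

tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```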
ID3 (Iterative Dichotomiser 3)
ID3 (Iterative Dichotomiser 3): This algorithm uses entropy and information gain as the metrics to build the decision tree. The attribute with the highest information gain is used as the root node, and the same approach is followed recursively after that.
Entropy varies from 0 to 1 (for a two-class problem): it is 0 if all the data belong to a single class and 1 if the classes are evenly distributed.
- In this way, entropy will give a measure of impurity in the dataset.
Steps to decide which attribute to split on:
1. Compute the entropy for the dataset.
2. For every attribute:
   2.1 Calculate the entropy for all of its categorical values.
   2.2 Take the average (weighted) information entropy for the attribute.
   2.3 Calculate the information gain for the attribute.
3. Pick the attribute with the highest information gain.
4. Repeat until we get the desired tree.
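A minimal sketch of steps 1-3 in Python (the helper names entropy and information_gain are mine, and rows are assumed to be tuples with the decision in the last column):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, col):
    """Entropy of the decision (last column) minus the weighted entropy after splitting on `col`."""
    decisions = [r[-1] for r in rows]
    remainder = sum(
        len([r for r in rows if r[col] == v]) / len(rows)
        * entropy([r[-1] for r in rows if r[col] == v])
        for v in {r[col] for r in rows}
    )
    return entropy(decisions) - remainder

# Example: 9 'Yes' and 5 'No' observations give entropy ≈ 0.940
print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))  # 0.94
```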
We will use the same weather dataset that we used to explain the CART algorithm above.
Number of observations = 14
Number of observations having Decision ‘Yes’ = 9
Probability of ‘Yes’, p(Yes) = 9/14
Number of observations having Decision ‘No’ = 5
Probability of ‘No’, p(No) = 5/14
Entropy(Decision) = -(9/14)*log₂(9/14) - (5/14)*log₂(5/14) = 0.940
We have four attributes: outlook, temperature, humidity, and wind.
Information Gain for the outlook attribute
Number of instances with sunny outlook = 5
P(Decision = ‘Yes’ | outlook = sunny) = 2/5
P(Decision = ‘No’ | outlook = sunny) = 3/5
Entropy(Decision | outlook = sunny) = -(2/5)*log₂(2/5) - (3/5)*log₂(3/5) = 0.97
Entropy(Decision | outlook = overcast) = -(4/4)*log₂(4/4) = 0
Entropy(Decision | outlook = rain) = -(3/5)*log₂(3/5) - (2/5)*log₂(2/5) = 0.97
Gain(Decision, outlook) = Entropy(Decision) - P(outlook=sunny)*Entropy(Decision | outlook=sunny) - P(outlook=overcast)*Entropy(Decision | outlook=overcast) - P(outlook=rain)*Entropy(Decision | outlook=rain)
Gain(Decision, outlook) = 0.940 - (5/14)*0.97 - (4/14)*0 - (5/14)*0.97 = 0.247
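As a quick arithmetic check of these numbers (using base-2 logarithms):

```python
import math

def H(*probs):
    """Entropy in bits; terms with probability 0 are skipped (0*log2(0) is taken as 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

entropy_decision = H(9/14, 5/14)                 # ≈ 0.940
gain_outlook = (entropy_decision
                - (5/14) * H(2/5, 3/5)           # sunny branch, entropy ≈ 0.971
                - (4/14) * H(4/4)                # overcast branch, entropy = 0
                - (5/14) * H(3/5, 2/5))          # rain branch, entropy ≈ 0.971
print(round(entropy_decision, 3), round(gain_outlook, 3))  # 0.94 0.247
```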
Summary of Information Gain for all the attributes
Gain(Decision, outlook) = 0.247
Gain(Decision, wind ) = 0.048
Gain(Decision, temperature) = 0.029
Gain(Decision, humidity) = 0.151
So, outlook has the highest information gain, and it is selected as the first node (root node).
Information Gain on Temperature under Sunny outlook factor
Entropy(sunny | temp=hot) = -(0/2)*log₂(0/2) - (2/2)*log₂(2/2) = 0 (taking 0*log₂0 = 0)
Entropy(sunny | temp=cool) = -(1/1)*log₂(1/1) - (0/1)*log₂(0/1) = 0
Entropy(sunny | temp=mild) = -(1/2)*log₂(1/2) - (1/2)*log₂(1/2) = 1
Gain(sunny, temp) = 0.97 - (2/5)*0 - (1/5)*0 - (2/5)*1 = 0.57
Summary of information gain for all attributes under the sunny outlook factor
Gain(sunny, temp) = 0.57
Gain(sunny, humidity) = 0.97
Gain(sunny, wind) =0.019
We can see that the information gain of the ‘Humidity’ attribute is higher than the others, so it is the next node under the sunny outlook factor.
Information Gain on humidity factor
Humidity takes two values, normal and high.
From both tables, you can infer that whenever humidity is high, the decision is ‘No’,
and when humidity is normal, the decision is ‘Yes’.
Now, let's focus on the remaining sub-data (the rain branch) and keep splitting in the same manner until the whole dataset is partitioned.
C4.5:
It is an extension of the ID3 algorithm and improves on it by handling both continuous and discrete values. It is also used for classification purposes.
The algorithm can create more generalized models, including for continuous data, and can handle missing values.
Decision rules are found based on the entropy and information gain ratio of each feature. At each level of the decision tree, the feature with the maximum gain ratio becomes the decision rule.
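A minimal sketch of the gain ratio idea, assuming nominal features stored as tuples with the decision in the last column (the helper names are mine):

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(rows, col):
    """C4.5-style gain ratio: information gain on `col` divided by the split information of `col`."""
    decisions = [r[-1] for r in rows]
    remainder = sum(
        len([r for r in rows if r[col] == v]) / len(rows)
        * entropy([r[-1] for r in rows if r[col] == v])
        for v in {r[col] for r in rows}
    )
    gain = entropy(decisions) - remainder
    split_info = entropy([r[col] for r in rows])  # entropy of the attribute's own value distribution
    return gain / split_info if split_info > 0 else 0.0
```

The division by split information penalizes attributes with many distinct values, which plain information gain tends to favour.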