Exercise 11: Solution - Decision tree

Given the obtained data, and the fact that the outcome of a match may also depend on the effort Federera spends on it, we build the following training set with an additional attribute, Effort, which takes the value 1 if Federera used his full strength in the match and 0 otherwise. (The court-type names did not survive in this copy; we write C1-C4 for the four values. Five samples, all of the form (Afternoon, Grand slam, 1, F), are restored from the class counts used in the computations below.)

Time       Match type  Court  Effort  Class
Morning    Master      C1     1       F
Night      Friendly    C3     0       F
Afternoon  Friendly    C4     0       N
Afternoon  Master      C2     1       N
Afternoon  Grand slam  C1     1       F
Morning    Master      C1     1       F
Afternoon  Grand slam  C2     1       N
Night      Friendly    C3     0       F
Night      Master      C4     1       N
Afternoon  Master      C2     1       N
Afternoon  Master      C1     1       F
Afternoon  Grand slam  C2     1       F
Afternoon  Grand slam  C2     1       F
Afternoon  Grand slam  C3     1       F
Afternoon  Grand slam  C3     1       F
Afternoon  Grand slam  C3     1       F

Class P = Federera wins (label F); class N = Nadale wins (label N).

The information content of a node with p samples of class P and n samples of class N is
I(p,n) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n)),
and the expected entropy after splitting on an attribute A with subsets S1, S2, ... is
E(A) = sum_i (p_i + n_i)/(p + n) * I(p_i, n_i).
Note that I(x,0) = I(0,x) = 0 for all x, and I(x,x) = 1 for x > 0.

1) Create the root of the decision tree

At this stage: I(p,n) = I(11,5) = 0.896.

Split by attribute A1 = Time:
  S1 = Morning:   p1 = 2, n1 = 0, I(p1,n1) = I(2,0) = 0
  S2 = Afternoon: p2 = 7, n2 = 4, I(p2,n2) = I(7,4) = 0.946
  S3 = Night:     p3 = 2, n3 = 1, I(p3,n3) = I(2,1) = 0.918
Thus E(A1) = 2/16*I(2,0) + 11/16*I(7,4) + 3/16*I(2,1) = 0.822.
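As a numeric check, here is a minimal Python sketch of the information function I(p,n) defined above (the function name info is our own):

    from math import log2

    def info(p, n):
        """I(p,n): information content of a node with p positive, n negative samples."""
        if p == 0 or n == 0:               # I(x,0) = I(0,x) = 0
            return 0.0
        fp, fn = p / (p + n), n / (p + n)
        return -fp * log2(fp) - fn * log2(fn)

    print(round(info(11, 5), 3))           # 0.896: the entropy I(11,5) at the root
    print(round(2/16 * info(2, 0) + 11/16 * info(7, 4) + 3/16 * info(2, 1), 3))  # E(Time) = 0.822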
Split by attribute A2 = Match type:
  S1 = Master:     p1 = 3, n1 = 3, I(p1,n1) = I(3,3) = 1
  S2 = Grand slam: p2 = 6, n2 = 1, I(p2,n2) = I(6,1) = 0.591
  S3 = Friendly:   p3 = 2, n3 = 1, I(p3,n3) = I(2,1) = 0.918
Thus E(A2) = 6/16*I(3,3) + 7/16*I(6,1) + 3/16*I(2,1) = 0.806.

Split by attribute A3 = Court:
  S1 = C1: p1 = 4, n1 = 0, I(p1,n1) = I(4,0) = 0
  S2 = C2: p2 = 2, n2 = 3, I(p2,n2) = I(2,3) = 0.97
  S3 = C3: p3 = 5, n3 = 0, I(p3,n3) = I(5,0) = 0
  S4 = C4: p4 = 0, n4 = 2, I(p4,n4) = I(0,2) = 0
Thus E(A3) = 5/16*I(2,3) = 0.30.

Split by attribute A4 = Effort:
  S1 = 1: p1 = 9, n1 = 4, I(p1,n1) = I(9,4) = 0.89
  S2 = 0: p2 = 2, n2 = 1, I(p2,n2) = I(2,1) = 0.918
Thus E(A4) = 13/16*I(9,4) + 3/16*I(2,1) = 0.895.

Since E(A3) is the smallest, splitting on A3 yields the maximal information gain. Thus we use the attribute A3 = Court to split at the root of the decision tree.
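The whole root-level comparison can be reproduced from the reconstructed training set. A sketch in Python, assuming the C1-C4 court placeholders and the names DATA, ATTRS and expected_entropy, which are all our own:

    from collections import Counter
    from math import log2

    # Reconstructed training set: (time, match, court, effort, class).
    DATA = [
        ("Morning", "Master", "C1", 1, "F"), ("Night", "Friendly", "C3", 0, "F"),
        ("Afternoon", "Friendly", "C4", 0, "N"), ("Afternoon", "Master", "C2", 1, "N"),
        ("Afternoon", "Grand slam", "C1", 1, "F"), ("Morning", "Master", "C1", 1, "F"),
        ("Afternoon", "Grand slam", "C2", 1, "N"), ("Night", "Friendly", "C3", 0, "F"),
        ("Night", "Master", "C4", 1, "N"), ("Afternoon", "Master", "C2", 1, "N"),
        ("Afternoon", "Master", "C1", 1, "F"),
        # The five samples restored from the class counts:
        ("Afternoon", "Grand slam", "C2", 1, "F"), ("Afternoon", "Grand slam", "C2", 1, "F"),
        ("Afternoon", "Grand slam", "C3", 1, "F"), ("Afternoon", "Grand slam", "C3", 1, "F"),
        ("Afternoon", "Grand slam", "C3", 1, "F"),
    ]
    ATTRS = {"Time": 0, "Match type": 1, "Court": 2, "Effort": 3}

    def info(p, n):
        if p == 0 or n == 0:
            return 0.0
        fp, fn = p / (p + n), n / (p + n)
        return -fp * log2(fp) - fn * log2(fn)

    def expected_entropy(rows, attr_idx):
        """E(A): subset-size-weighted entropy after splitting rows on one attribute."""
        e = 0.0
        for value in {r[attr_idx] for r in rows}:
            subset = [r for r in rows if r[attr_idx] == value]
            c = Counter(r[-1] for r in subset)
            e += len(subset) / len(rows) * info(c["F"], c["N"])
        return e

    for name, idx in ATTRS.items():
        print(f"E({name}) = {expected_entropy(DATA, idx):.3f}")
    # E(Time) = 0.822, E(Match type) = 0.806, E(Court) = 0.303,
    # E(Effort) = 0.896 -> Court gives the smallest E, hence the largest gain.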
The current decision tree is:

(Figure: root node Court with four branches C1, C2, C3 and C4, none expanded yet.)

2) Split the branch Court = C1

Training data:
  Morning    Master      1  F
  Afternoon  Grand slam  1  F
  Morning    Master      1  F
  Afternoon  Master      1  F

Every sample in this branch has class F, whichever attribute is chosen to split on, so the branch becomes a leaf with label F. We further draw the decision tree as:

(Figure: branch Court = C1 ends in leaf F; branches C2, C3 and C4 not yet expanded.)

3) Split the branch Court = C3

Training data:
  Night      Friendly    0  F
  Night      Friendly    0  F
  Afternoon  Grand slam  1  F
  Afternoon  Grand slam  1  F
  Afternoon  Grand slam  1  F

Every sample in this branch has class F as well, so this branch also becomes a leaf with label F. We further draw the decision tree as:
(Figure: branches Court = C1 and Court = C3 end in leaves F; branches C2 and C4 not yet expanded.)

4) Split the branch Court = C4

Training data:
  Afternoon  Friendly  0  N
  Night      Master    1  N

Every sample in this branch has class N, whichever attribute is chosen to split on, so the branch becomes a leaf with label N. The decision tree is further drawn as:

(Figure: branches C1 and C3 end in leaves F, branch C4 ends in leaf N; branch C2 not yet expanded.)
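Steps 2)-4) all stop for the same reason: every sample in the branch carries the same class. A self-contained sketch of this stopping rule, shown on the branch Court = C1 with the same row layout as above:

    # Branch Court = C1, rows laid out as (time, match, court, effort, class):
    c1_branch = [
        ("Morning", "Master", "C1", 1, "F"),
        ("Afternoon", "Grand slam", "C1", 1, "F"),
        ("Morning", "Master", "C1", 1, "F"),
        ("Afternoon", "Master", "C1", 1, "F"),
    ]

    def pure_class(rows):
        """Return the single class label if all rows agree, else None."""
        labels = {row[-1] for row in rows}
        return labels.pop() if len(labels) == 1 else None

    print(pure_class(c1_branch))   # 'F': the branch becomes a leaf labeled F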
5) Split the branch Court = C2

Training data:
  Afternoon  Master      1  N
  Afternoon  Grand slam  1  N
  Afternoon  Master      1  N
  Afternoon  Grand slam  1  F
  Afternoon  Grand slam  1  F

At this stage: I(p,n) = I(2,3) = 0.97, with p + n = 5.

Split by attribute A1 = Time:
  S1 = Morning:   p1 = 0, n1 = 0, I(p1,n1) = I(0,0) = 0
  S2 = Afternoon: p2 = 2, n2 = 3, I(p2,n2) = I(2,3) = 0.97
  S3 = Night:     p3 = 0, n3 = 0, I(p3,n3) = I(0,0) = 0
Thus E(A1) = 5/5*I(2,3) = 0.97.

Split by attribute A2 = Match type:
  S1 = Master:     p1 = 0, n1 = 2, I(p1,n1) = I(0,2) = 0
  S2 = Grand slam: p2 = 2, n2 = 1, I(p2,n2) = I(2,1) = 0.918
  S3 = Friendly:   p3 = 0, n3 = 0, I(p3,n3) = I(0,0) = 0
Thus E(A2) = 3/5*I(2,1) = 0.55.

Split by attribute A4 = Effort:
  S1 = 1: p1 = 2, n1 = 3, I(p1,n1) = I(2,3) = 0.97
  S2 = 0: p2 = 0, n2 = 0, I(p2,n2) = I(0,0) = 0
Thus E(A4) = 5/5*I(2,3) = 0.97.

Since E(A2) is the lowest, we split this branch using attribute A2 = Match type.
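Continuing the root-level sketch (same DATA, ATTRS and expected_entropy as before), the same computation restricted to the Court = C2 subset reproduces these numbers:

    c2 = [r for r in DATA if r[2] == "C2"]
    for name, idx in ATTRS.items():
        if name != "Court":                  # Court is already used on this path
            print(f"E({name}) = {expected_entropy(c2, idx):.3f}")
    # E(Time) = 0.971, E(Match type) = 0.551, E(Effort) = 0.971 -> split on Match type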
Extending the decision tree by this split gives:

(Figure: branch Court = C2 leads to a Match type node with branches Friendly, Master and Grand slam.)

We don't have training data for friendly matches in this branch, so the decision for the case (Court = C2, Match type = Friendly) is unknown (the winner can be either Nadale or Federera, each with probability 0.5). For matches of type Master, all samples show that Nadale is the winner, so we create a leaf with label N for this branch.

6) Split the branch Match type = Grand slam

Training data:
  Afternoon  Grand slam  1  N
  Afternoon  Grand slam  1  F
  Afternoon  Grand slam  1  F

For matches of type Grand slam, Federera wins 2 out of 3 matches in the training data set. We continue splitting this node using the remaining attribute Time (Effort is always 1 in this branch, so splitting on it gains nothing).
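The 2-out-of-3 figure can be read off the class counts at this node (continuing the earlier sketch, which defines DATA):

    from collections import Counter

    grand_slam = [r for r in DATA if r[2] == "C2" and r[1] == "Grand slam"]
    print(Counter(r[-1] for r in grand_slam))   # Counter({'F': 2, 'N': 1}) -> P(F) = 2/3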
The final decision tree is:

(Figure: final decision tree. Root Court: C1 -> leaf F; C3 -> leaf F; C4 -> leaf N; C2 -> Match type: Friendly -> unknown, Master -> leaf N, Grand slam -> Time: Afternoon -> leaf F, Morning and Night -> unknown.)

The next match between Federera and Nadale is represented by the case (Court = C2, Match type = Grand slam, Time = Afternoon, Effort = 1). Using the above decision tree, we can decide that Federera is more likely to win his next match.
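To make the prediction step concrete, here is a sketch of the final tree; the nested-tuple encoding, the classify helper and the leaf probabilities (estimated from the training counts above) are all our own:

    # An internal node is a pair (attribute, {value: child}); a leaf is a pair
    # (label, P_F), where P_F is the estimated probability that Federera wins.
    TREE = ("Court", {
        "C1": ("F", 1.0),
        "C3": ("F", 1.0),
        "C4": ("N", 0.0),
        "C2": ("Match type", {
            "Friendly":   ("?", 0.5),        # no training data: unknown winner
            "Master":     ("N", 0.0),        # Nadale wins all Master samples here
            "Grand slam": ("Time", {
                "Afternoon": ("F", 2 / 3),   # Federera wins 2 of the 3 samples
                "Morning":   ("?", 0.5),     # no training data
                "Night":     ("?", 0.5),     # no training data
            }),
        }),
    })

    def classify(node, case):
        """Follow the case's attribute values down to a leaf."""
        while isinstance(node[1], dict):     # internal node: (attribute, children)
            attribute, children = node
            node = children[case[attribute]]
        return node                          # leaf: (label, P_F)

    case = {"Court": "C2", "Match type": "Grand slam", "Time": "Afternoon", "Effort": 1}
    print(classify(TREE, case))              # ('F', 0.666...): Federera more likely to win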
Discussion: We cannot eliminate all samples related to friendly matches, since we probably want to predict the outcome of these types of matches as well. We know that for all friendly matches, the results are due to the fact that Federera doesn't use his full strength. Thus, the explicit modeling of Federera's effort (with the additional attribute Effort) is more generic in describing the outcome of matches. Had we known that for some friendly matches Federera also used his full strength, we would have to treat the samples of friendly matches as noisy samples. In that case, the construction of the decision tree would be different:
o Firstly, we build the full decision tree as if no samples were noisy.
o Secondly, we prune the parts of the tree related to friendly matches:
  - For a leaf node arising from noisy samples, we label it with class C, where C is the majority class in the node, and indicate its corresponding error (see below).
  - For each non-leaf node arising from noisy samples, e.g., the nodes built from samples with Match type = Friendly, we eliminate the subtree below that (erroneous) node. This non-leaf node then becomes a leaf with label C and an error for the estimation of that class C.
Determining the error of each node and the condition for pruning a subtree is out of the scope of the lecture; those who want to know more can consult the reference at the end of the lecture note. A demonstration example is at:
http://www.cse.unsw.edu.au/~billw/cs9414/notes/ml/06prop/id3/id3.html
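Although the error estimates are out of scope, the subtree-replacement step described above can be sketched as follows; the helper prune_to_leaf and the use of the training-error fraction as a stand-in for a proper error estimate are our own:

    from collections import Counter

    def prune_to_leaf(rows):
        """Collapse a (sub)tree into a majority-class leaf, as described above.
        Returns (C, error), where C is the majority class of the training rows
        reaching the node and error is the fraction of rows disagreeing with C
        (a simple stand-in for the proper error estimate, which is out of scope)."""
        counts = Counter(row[-1] for row in rows)
        label, majority = counts.most_common(1)[0]
        return label, 1 - majority / len(rows)

    # E.g., collapsing a node that holds the three friendly-match samples (2 F, 1 N):
    friendly = [("Night", "Friendly", "C3", 0, "F"),
                ("Night", "Friendly", "C3", 0, "F"),
                ("Afternoon", "Friendly", "C4", 0, "N")]
    print(prune_to_leaf(friendly))   # ('F', 0.333...): leaf labeled F with error 1/3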