Exercise 11: Solution - Decision tree


Given the obtained data, and the fact that the outcome of a match might also depend on the effort Federera spent on it, we build the following training data set with an additional attribute "Effort" taking value 1 if Federera used full strength in the match and 0 otherwise.

Time      | Match type | Court type | Effort | Class
----------|------------|------------|--------|------
Morning   | Master     |            | 1      | F
Night     | Friendly   |            | 0      | F
Afternoon | Friendly   |            | 0      | N
Afternoon | Master     |            | 1      | N
Afternoon | Grand slam |            | 1      | F
Morning   | Master     |            | 1      | F
Afternoon | Grand slam |            | 1      | N
Night     | Friendly   |            | 0      | F
Night     | Master     |            | 1      | N
Afternoon | Master     |            | 1      | N
Afternoon | Master     |            | 1      | F

Class P = "Federera wins" = F; class N = "Nadale wins" = N. Note that I(x,0) = I(0,x) = 0 for all x, and I(x,x) = 1 for x > 0.

1) Create the root of the decision tree

At this stage: I(p,n) = I(11,5) = 0.896

Split by attribute A1 = Time:
S1 = Morning: p1 = 2, n1 = 0, I(p1,n1) = I(2,0) = 0
S2 = Afternoon: p2 = 7, n2 = 4, I(p2,n2) = I(7,4) = 0.946
S3 = Night: p3 = 2, n3 = 1, I(p3,n3) = I(2,1) = 0.918

Thus, E(A1) = 2/16*I(2,0) + 11/16*I(7,4) + 3/16*I(2,1) = 0.822
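The entropy values above can be verified with a short sketch; the function names I (node entropy) and E (expected entropy after a split) follow the notation of the exercise:

```python
import math

def I(p, n):
    """Entropy of a node with p positive (F) and n negative (N) samples."""
    if p == 0 or n == 0:
        return 0.0          # I(x,0) = I(0,x) = 0
    t = p + n
    return -(p / t) * math.log2(p / t) - (n / t) * math.log2(n / t)

def E(subsets, total):
    """Expected entropy after a split; subsets is a list of (p_i, n_i) pairs."""
    return sum((p + n) / total * I(p, n) for p, n in subsets)

print(round(I(11, 5), 3))                          # 0.896
print(round(E([(2, 0), (7, 4), (2, 1)], 16), 3))   # E(A1) = 0.822
```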

Split by attribute A2 = Match type:
S1 = Master: p1 = 3, n1 = 3, I(p1,n1) = I(3,3) = 1
S2 = Grand slam: p2 = 6, n2 = 1, I(p2,n2) = I(6,1) = 0.591
S3 = Friendly: p3 = 2, n3 = 1, I(p3,n3) = I(2,1) = 0.918

Thus, E(A2) = 6/16*I(3,3) + 7/16*I(6,1) + 3/16*I(2,1) = 0.806

Split by attribute A3 = Court type:
S1: p1 = 4, n1 = 0, I(p1,n1) = I(4,0) = 0
S2: p2 = 2, n2 = 3, I(p2,n2) = I(2,3) = 0.97
S3: p3 = 5, n3 = 0, I(p3,n3) = I(5,0) = 0
S4: p4 = 0, n4 = 2, I(p4,n4) = I(0,2) = 0

Thus, E(A3) = 5/16*I(2,3) = 0.30

Split by attribute A4 = Effort:
S1 = 1: p1 = 9, n1 = 4, I(p1,n1) = I(9,4) = 0.89
S2 = 0: p2 = 2, n2 = 1, I(p2,n2) = I(2,1) = 0.918

Thus, E(A4) = 13/16*I(9,4) + 3/16*I(2,1) = 0.895

Since E(A3) is the smallest, the information gain of splitting by A3 is the largest. Thus we use the attribute A3 = Court type to split at the root of the decision tree. The current decision tree is:
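The comparison of the four candidate splits can be checked in one place. This sketch is self-contained (it re-defines the entropy helpers); the per-subset (p, n) counts are the ones computed in the solution above:

```python
import math

def I(p, n):
    """Entropy of a node with p positive and n negative samples."""
    if p == 0 or n == 0:
        return 0.0
    t = p + n
    return -(p / t) * math.log2(p / t) - (n / t) * math.log2(n / t)

def E(subsets, total):
    """Expected entropy after a split over the given (p_i, n_i) subsets."""
    return sum((p + n) / total * I(p, n) for p, n in subsets)

# (p_i, n_i) per subset, taken from the solution
splits = {
    "A1 Time":       [(2, 0), (7, 4), (2, 1)],
    "A2 Match type": [(3, 3), (6, 1), (2, 1)],
    "A3 Court":      [(4, 0), (2, 3), (5, 0), (0, 2)],
    "A4 Effort":     [(9, 4), (2, 1)],
}
for name, subs in splits.items():
    print(name, round(E(subs, 16), 3))

best = min(splits, key=lambda a: E(splits[a], 16))
print("best split:", best)   # A3 Court (smallest E, hence largest gain)
```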

[Tree figure: root node "Court"]

2) Split the first branch of Court

Training data:
Morning   Master     1  F
Afternoon Grand slam 1  F
Morning   Master     1  F
Afternoon Master     1  F

All samples in this branch belong to class F, so it leads to outcome F whichever attribute is chosen to split. Thus this branch becomes a leaf labelled F, and we further draw the decision tree as:

[Tree figure: root "Court" with one branch ending in leaf F]

3) Split the next branch of Court

Training data:
Night Friendly 0  F
Night Friendly 0  F

Again all samples belong to class F, so this branch becomes a leaf labelled F, and we further draw the decision tree as:

[Tree figure: root "Court" with two branches ending in leaves F]

4) Split the next branch of Court

Training data:
Afternoon Friendly 0  N
Night     Master   1  N

All samples in this branch belong to class N, so this branch becomes a leaf labelled N, and the decision tree is further drawn as:

[Tree figure: root "Court" with three leaf branches: F, F, N]

5) Split the remaining branch of Court

Training data:
Afternoon Master     1  N
Afternoon Grand slam 1  N
Afternoon Master     1  N

At this stage: I(p,n) = I(2,3) = 0.97, with p + n = 5.

Split by attribute A1 = Time:
S1 = Morning: p1 = 0, n1 = 0, I(p1,n1) = I(0,0) = 0
S2 = Afternoon: p2 = 2, n2 = 3, I(p2,n2) = I(2,3) = 0.97
S3 = Night: p3 = 0, n3 = 0, I(p3,n3) = 0

Thus, E(A1) = 5/5*I(2,3) = 0.97

Split by attribute A2 = Match type:
S1 = Master: p1 = 0, n1 = 2, I(p1,n1) = I(0,2) = 0
S2 = Grand slam: p2 = 2, n2 = 1, I(p2,n2) = I(2,1) = 0.918
S3 = Friendly: p3 = 0, n3 = 0, I(p3,n3) = I(0,0) = 0

Thus, E(A2) = 3/5*I(2,1) = 0.55

Split by attribute A4 = Effort:
S1 = 1: p1 = 2, n1 = 3, I(p1,n1) = I(2,3) = 0.97
S2 = 0: p2 = 0, n2 = 0, I(p2,n2) = I(0,0) = 0

Thus, E(A4) = 5/5*I(2,3) = 0.97

Since E(A2) is the lowest, we split this branch using attribute A2 = Match type, extending the decision tree as in the following figure:
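The same computation applied to the five samples of this branch can be checked with a self-contained sketch (subset counts from step 5):

```python
import math

def I(p, n):
    """Entropy of a node with p positive and n negative samples."""
    if p == 0 or n == 0:
        return 0.0
    t = p + n
    return -(p / t) * math.log2(p / t) - (n / t) * math.log2(n / t)

E_A1 = (5 / 5) * I(2, 3)                       # all five samples are Afternoon
E_A2 = (2 / 5) * I(0, 2) + (3 / 5) * I(2, 1)   # Master vs Grand slam subsets
E_A4 = (5 / 5) * I(2, 3)                       # all five samples have Effort = 1
print(round(E_A1, 2), round(E_A2, 2), round(E_A4, 2))   # 0.97 0.55 0.97
```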

[Tree figure: root "Court"; the remaining branch splits on "Match type" into Friendly, Master and Grand slam]

We don't have training data for friendly matches, so the decision for the case (this court type, Match type = Friendly) is unknown (the winner can be either Nadale or Federera, each with probability 0.5). For matches of type Master, all samples show that Nadale is the winner, so we create a leaf with label N for this branch.

6) Split the branch Match type = Grand slam

Training data:
Afternoon Grand slam 1  N

For matches of type Grand slam, Federera wins 2 out of 3 matches in the training data set. We continue splitting this node using the remaining attribute Time (Effort is always 1 in this branch). The final decision tree is:

[Tree figure: root "Court"; one branch splits on "Match type" (Friendly -> unknown, Master -> N); the Grand slam branch splits on "Time" (Afternoon -> F, Morning and Night -> unknown)]

The next match between Federera and Nadale is represented by the case (Court = this court type, Match type = Grand slam, Time = Afternoon, Effort = 1). Using the above decision tree, we decide that Federera is more likely to win his next match.

Discussion: We cannot simply eliminate all samples related to friendly matches, since we probably want to predict the outcome of these types of matches as well. We know that for all friendly matches, the results are due to the fact that Federera does not use his full strength. Thus, the explicit modelling of Federera's effort (with the additional attribute Effort) is more generic in describing the outcome of matches. Had we known that for some friendly matches Federera also used his full strength, we would have to treat the samples of friendly matches as noisy samples. In that case, the construction of the decision tree would be different:

o Firstly, we build the full decision tree as if no samples were noisy.
o Secondly, we prune the part of the tree related to friendly matches: for a leaf node arising from noisy samples, we label it with class C, where C is the majority class in this node, and indicate the corresponding error (see below).

o For each non-leaf node arising from noisy samples, e.g. nodes built from samples with Match type = Friendly, we eliminate the subtree below that (erroneous) node. This non-leaf node then becomes a leaf with label C and an error estimate for that class C.

Determining the error for each node and the condition for pruning a subtree are beyond the scope of the lecture; those who want to know more can consult the reference at the end of the lecture notes. A demonstration example is at: http://www.cse.unsw.edu.au/~billw/cs9414/notes/ml/06prop/id3/id3.html
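As a final sketch, the decision tree built in steps 1-6 can be written as nested dicts and the next match classified by walking it. The court-type value names are not given in the text, so "court_1" through "court_4" are hypothetical placeholders (court_2 stands for the court value of the branch split in steps 5-6; the other three are the leaves F, F and N):

```python
# Hypothetical court-value names: the real names are not stated in the text.
tree = {
    "attr": "Court",
    "court_1": "F",
    "court_3": "F",
    "court_4": "N",
    "court_2": {
        "attr": "Match type",
        "Friendly": "unknown",       # no training data in this branch
        "Master": "N",
        "Grand slam": {
            "attr": "Time",
            "Morning": "unknown",
            "Afternoon": "F",        # Federera wins 2 of the 3 samples
            "Night": "unknown",
        },
    },
}

def classify(node, case):
    """Walk the tree, following the case's attribute values, to a leaf label."""
    while isinstance(node, dict):
        node = node[case[node["attr"]]]
    return node

next_match = {"Court": "court_2", "Match type": "Grand slam", "Time": "Afternoon"}
print(classify(tree, next_match))   # F -> Federera is more likely to win
```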