Naïve Bayes. Robot Image Credit: Viktoriya Sukhanova 123RF.com


Naïve Bayes
These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton's slides and grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.
Robot Image Credit: Viktoriya Sukhanova 123RF.com

Essential Probability Concepts

Marginalization: P(B) = Σ_{v ∈ values(A)} P(B ∧ A = v)
Conditional probability: P(A | B) = P(A ∧ B) / P(B)
Bayes' rule: P(A | B) = P(B | A) P(A) / P(B)
Independence: A ⊥ B ⟺ P(A ∧ B) = P(A) P(B) ⟺ P(A | B) = P(A)
Conditional independence: A ⊥ B | C ⟺ P(A ∧ B | C) = P(A | C) P(B | C)

Density Estimation

Recall the Joint Distribution...

                        alarm                      ¬alarm
             earthquake   ¬earthquake   earthquake   ¬earthquake
burglary        0.01         0.08         0.001         0.009
¬burglary       0.01         0.09         0.01          0.79

How Can We Obtain a Joint Distribution?
Option 1: Elicit it from an expert human
Option 2: Build it up from simpler probabilistic facts
  e.g., if we knew P(a) = 0.7, P(b | a) = 0.2, and P(b | ¬a) = 0.1, then we could compute P(a ∧ b) = P(a) P(b | a) = 0.14
Option 3: Learn it from data...

Based on slide by Andrew Moore
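Option 2 can be sketched in a few lines. This is a minimal illustration (not from the slides), assuming the two binary variables a and b and the three probabilities given above; the chain rule P(a, b) = P(a) P(b | a) fills in all four joint entries:

```python
# Build the full joint over (a, b) from P(a), P(b|a), and P(b|~a)
# via the chain rule P(a, b) = P(a) * P(b | a).
p_a = 0.7
p_b_given_a = 0.2
p_b_given_not_a = 0.1

joint = {
    (True, True):   p_a * p_b_given_a,                   # P(a ∧ b)
    (True, False):  p_a * (1 - p_b_given_a),             # P(a ∧ ¬b)
    (False, True):  (1 - p_a) * p_b_given_not_a,         # P(¬a ∧ b)
    (False, False): (1 - p_a) * (1 - p_b_given_not_a),   # P(¬a ∧ ¬b)
}
print(joint[(True, True)])   # P(a ∧ b) = 0.14, up to floating-point precision
print(sum(joint.values()))   # the four entries sum to 1
```
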

Learning a Joint Distribution

Step 1: Build a JD table for your attributes in which the probabilities are unspecified:

A  B  C  Prob
0  0  0  ?
0  0  1  ?
0  1  0  ?
0  1  1  ?
1  0  0  ?
1  0  1  ?
1  1  0  ?
1  1  1  ?

Step 2: Then, fill in each row with

P̂(row) = (number of records matching row) / (total number of records)

A  B  C  Prob
0  0  0  0.30
0  0  1  0.05
0  1  0  0.10
0  1  1  0.05
1  0  0  0.05
1  0  1  0.10
1  1  0  0.25
1  1  1  0.10

e.g., the (1, 1, 0) row holds the fraction of all records in which A and B are true but C is false.

Slide by Andrew Moore
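The two steps above amount to counting. A minimal sketch (not the lecture's code), assuming records are tuples of attribute values and using a toy dataset:

```python
from collections import Counter

# Learn a joint distribution by counting:
# P_hat(row) = (records matching row) / (total records)
def learn_joint(records):
    counts = Counter(tuple(r) for r in records)
    n = len(records)
    return {row: c / n for row, c in counts.items()}

# Toy dataset over binary attributes (A, B, C)
data = [(0, 0, 0), (0, 0, 0), (0, 0, 0), (1, 1, 0), (1, 1, 0),
        (1, 1, 1), (0, 1, 0), (1, 0, 1), (0, 0, 1), (0, 1, 1)]
joint = learn_joint(data)
print(joint[(0, 0, 0)])  # 0.3: three of the ten records match this row
```

Rows that never appear in the data get probability zero, which is exactly the overfitting problem discussed later.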

Density Estimation
Our joint distribution learner is an example of something called Density Estimation. A Density Estimator learns a mapping from a set of attributes to a probability:

  Input Attributes → Density Estimator → Probability

Slide by Andrew Moore

Density Estimation
Compare it against the two other major kinds of models:

  Input Attributes → Classifier         → Prediction of categorical output
  Input Attributes → Density Estimator  → Probability
  Input Attributes → Regressor          → Prediction of real-valued output

Slide by Andrew Moore

Evaluating Density Estimation
Test-set criterion for estimating performance on future data:

  Input Attributes → Classifier         → Prediction of categorical output   (test set: Accuracy)
  Input Attributes → Density Estimator  → Probability                        (test set: ?)
  Input Attributes → Regressor          → Prediction of real-valued output   (test set: Accuracy)

Slide by Andrew Moore

Evaluating a Density Estimator
Given a record x, a density estimator M can tell you how likely the record is: P̂(x | M)
The density estimator can also tell you how likely the dataset is:

  P̂(x_1 ∧ x_2 ∧ ... ∧ x_n | M) = ∏_{i=1}^{n} P̂(x_i | M)

under the assumption that all records were independently generated from the density estimator's JD (that is, i.i.d.).

Slide by Andrew Moore

Example: Small Dataset: Miles Per Gallon
From the UCI repository (thanks to Ross Quinlan); 192 records in the training set.

mpg    modelyear  maker
good   75to78     asia
bad    70to74     america
bad    75to78     europe
bad    70to74     america
bad    70to74     america
bad    70to74     asia
bad    70to74     asia
bad    75to78     america
:      :          :
bad    70to74     america
good   79to83     america
bad    75to78     america
good   79to83     america
bad    75to78     america
good   79to83     america
good   79to83     america
bad    70to74     america
good   75to78     europe
bad    75to78     europe

Slide by Andrew Moore

Example: Small Dataset: Miles Per Gallon (continued)
On the same 192-record training set:

  P̂(dataset | M) = ∏_{i=1}^{n} P̂(x_i | M) = 3.4 × 10⁻²⁰³ (in this case)

Slide by Andrew Moore

Log Probabilities
For decent-sized data sets, this product will underflow:

  P̂(dataset | M) = ∏_{i=1}^{n} P̂(x_i | M)

Therefore, since probabilities of datasets get so small, we usually use log probabilities:

  log P̂(dataset | M) = log ∏_{i=1}^{n} P̂(x_i | M) = Σ_{i=1}^{n} log P̂(x_i | M)

Based on slide by Andrew Moore
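The underflow is easy to demonstrate. A small sketch (my own illustration, not from the slides), assuming 200 records each assigned probability 10⁻³ by the model:

```python
import math

# Multiplying many small per-record probabilities underflows to 0.0
# in double precision, while summing their logs does not.
probs = [1e-3] * 200  # 200 records, each with probability 1e-3 under M

product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0: the true value 1e-600 is far below the smallest double

log_prob = sum(math.log(p) for p in probs)
print(log_prob)  # 200 * log(1e-3), a perfectly representable number
```
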

Example: Small Dataset: Miles Per Gallon (continued)
On the same 192-record training set:

  log P̂(dataset | M) = Σ_{i=1}^{n} log P̂(x_i | M) = −466.19 (in this case)

Slide by Andrew Moore

Pros/Cons of the Joint Density Estimator
The Good News:
  We can learn a Density Estimator from data.
  Density estimators can do many good things:
    Can sort the records by probability, and thus spot weird records (anomaly detection)
    Can do inference
    Ingredient for Bayes Classifiers (coming very soon...)
The Bad News:
  Density estimation by directly learning the joint is impractical, and may result in adverse behavior.

Slide by Andrew Moore

Curse of Dimensionality

Slide by Christopher Bishop

The Joint Density Estimator on a Test Set
An independent test set with 196 cars has a much worse log-likelihood. Actually, it's a billion quintillion quintillion quintillion quintillion times less likely.
Density estimators can overfit... and the full joint density estimator is the overfittiest of them all!

Slide by Andrew Moore

Overfitting Density Estimators
If this ever happens, the joint PDE learns there are certain combinations that are impossible:

  log P̂(dataset | M) = Σ_{i=1}^{n} log P̂(x_i | M) = −∞ if, for any i, P̂(x_i | M) = 0

Slide by Andrew Moore

The Naïve Bayes Classifier

Bayes Rule
Recall Bayes' rule:

  P(hypothesis | evidence) = P(evidence | hypothesis) P(hypothesis) / P(evidence)

Equivalently, we can write:

  P(Y = y_k | X = x_i) = P(Y = y_k) P(X = x_i | Y = y_k) / P(X = x_i)

where X is a random variable representing the evidence and Y is a random variable for the label (our hypothesis). This is actually short for:

  P(Y = y_k | X = x_i) = P(Y = y_k) P(X_1 = x_{i,1} ∧ ... ∧ X_d = x_{i,d} | Y = y_k) / P(X_1 = x_{i,1} ∧ ... ∧ X_d = x_{i,d})

where X_j denotes the random variable for the j-th feature.

Naïve Bayes Classifier
Idea: Use the training data to estimate P(X | Y) and P(Y). Then, use Bayes' rule to infer P(Y | X_new) for new data:

  P(Y = y_k | X = x_i) = P(Y = y_k) P(X_1 = x_{i,1} ∧ ... ∧ X_d = x_{i,d} | Y = y_k) / P(X_1 = x_{i,1} ∧ ... ∧ X_d = x_{i,d})

The prior P(Y = y_k) is easy to estimate from data. The class-conditional joint in the numerator looks necessary but is impractical; the denominator, as it turns out, is unnecessary. Estimating the joint probability distribution P(X_1, X_2, ..., X_d | Y) is not practical.

Naïve Bayes Classifier
Problem: estimating the joint PD or CPD isn't practical; it severely overfits, as we saw before.
However, if we make the assumption that the attributes are independent given the class label, estimation is easy!

  P(X_1, X_2, ..., X_d | Y) = ∏_{j=1}^{d} P(X_j | Y)

In other words, we assume all attributes are conditionally independent given Y. Often this assumption is violated in practice, but more on that later.
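A quick way to see why the factorization helps is to count parameters. The numbers below are my own illustration (d = 20 binary features, one class): the full class-conditional joint needs one probability per configuration, while the factored form needs one per feature.

```python
# Parameters needed to specify P(X_1, ..., X_d | Y = y_k) for binary features.
d = 20
full_joint_params = 2 ** d - 1   # one probability per joint configuration (minus 1 for normalization)
naive_bayes_params = d           # one Bernoulli parameter per feature under the NB assumption
print(full_joint_params)   # 1048575
print(naive_bayes_params)  # 20
```
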

Training Naïve Bayes
Estimate P(X_j | Y) and P(Y) directly from the training data by counting!

Sky    Temp  Humid   Wind    Water  Forecast  Play?
sunny  warm  normal  strong  warm   same      yes
sunny  warm  high    strong  warm   same      yes
rainy  cold  high    strong  warm   change    no
sunny  warm  high    strong  cool   change    yes

P(play) = ?                      P(¬play) = ?
P(Sky = sunny | play) = ?        P(Sky = sunny | ¬play) = ?
P(Humid = high | play) = ?       P(Humid = high | ¬play) = ?
...

Counting the four records above yields:

P(play) = 3/4                    P(¬play) = 1/4
P(Sky = sunny | play) = 1        P(Sky = sunny | ¬play) = 0
P(Humid = high | play) = 2/3     P(Humid = high | ¬play) = 1
...

Laplace Smoothing
Notice that some probabilities estimated by counting might be zero: possible overfitting!
Fix by using Laplace smoothing, which adds 1 to each count:

  P(X_j = v | Y = y_k) = (c_v + 1) / (Σ_{v' ∈ values(X_j)} c_{v'} + |values(X_j)|)

where c_v is the count of training instances with a value of v for attribute j and class label y_k, and |values(X_j)| is the number of values X_j can take on.
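The formula above can be sketched directly. A minimal illustration (the function name and toy data are mine), applied to the Sky attribute among the three play = yes records from the training table:

```python
from collections import Counter

# Laplace-smoothed estimate:
# P(X_j = v | Y = y_k) = (c_v + 1) / (sum_v' c_v' + |values(X_j)|)
def laplace_estimate(feature_values, value, domain):
    """feature_values: the j-th attribute of all training records with label y_k."""
    counts = Counter(feature_values)
    total = len(feature_values)
    return (counts[value] + 1) / (total + len(domain))

# Sky values among the three play=yes records: all sunny
sky_given_play = ["sunny", "sunny", "sunny"]
domain = ["sunny", "rainy"]
print(laplace_estimate(sky_given_play, "sunny", domain))  # (3+1)/(3+2) = 0.8
print(laplace_estimate(sky_given_play, "rainy", domain))  # (0+1)/(3+2) = 0.2
```

Note that the never-observed value "rainy" now gets a small nonzero probability instead of 0, which is the point of the fix.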

Training Naïve Bayes with Laplace Smoothing
Estimate P(X_j | Y) and P(Y) directly from the training data by counting with Laplace smoothing:

Sky    Temp  Humid   Wind    Water  Forecast  Play?
sunny  warm  normal  strong  warm   same      yes
sunny  warm  high    strong  warm   same      yes
rainy  cold  high    strong  warm   change    no
sunny  warm  high    strong  cool   change    yes

P(play) = 3/4                    P(¬play) = 1/4
P(Sky = sunny | play) = 4/5      P(Sky = sunny | ¬play) = 1/3
P(Humid = high | play) = 3/5     P(Humid = high | ¬play) = 2/3
...

Using the Naïve Bayes Classifier
Now, we have

  P(Y = y_k | X = x_i) = P(Y = y_k) ∏_{j=1}^{d} P(X_j = x_{i,j} | Y = y_k) / P(X = x_i)

The denominator P(X = x_i) is constant for a given instance, and so irrelevant to our prediction. To classify a new point x (where x_j is the j-th attribute value of x):

  h(x) = argmax_{y_k} P(Y = y_k) ∏_{j=1}^{d} P(X_j = x_j | Y = y_k)
       = argmax_{y_k} log P(Y = y_k) + Σ_{j=1}^{d} log P(X_j = x_j | Y = y_k)

In practice, we use log probabilities to prevent underflow.

The Naïve Bayes Classifier Algorithm
For each class label y_k:
  Estimate P(Y = y_k) from the data
  For each value x_{i,j} of each attribute X_i:
    Estimate P(X_i = x_{i,j} | Y = y_k)
Classify a new point via:

  h(x) = argmax_{y_k} log P(Y = y_k) + Σ_{j=1}^{d} log P(X_j = x_j | Y = y_k)

In practice, the independence assumption doesn't often hold true, but Naïve Bayes performs very well despite this.
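The whole algorithm fits in a short sketch. This is a minimal implementation of the steps above under stated assumptions (categorical features, Laplace smoothing on the per-feature conditionals, unsmoothed class priors); the class name and structure are mine, and the toy data is the Sky/Temp/Humid table from the slides:

```python
import math
from collections import Counter

class NaiveBayes:
    def fit(self, X, y):
        n = len(y)
        self.classes = sorted(set(y))
        self.class_n = Counter(y)                       # counts per class label
        self.log_prior = {c: math.log(self.class_n[c] / n) for c in self.classes}
        self.domains = [set(col) for col in zip(*X)]    # values each attribute can take
        # counts[c][j][v] = number of records with label c whose j-th feature equals v
        self.counts = {c: [Counter() for _ in self.domains] for c in self.classes}
        for xi, c in zip(X, y):
            for j, v in enumerate(xi):
                self.counts[c][j][v] += 1
        return self

    def log_cond(self, j, v, c):
        # Laplace-smoothed log P(X_j = v | Y = c)
        return math.log((self.counts[c][j][v] + 1)
                        / (self.class_n[c] + len(self.domains[j])))

    def predict(self, x):
        # argmax over classes of log P(Y) + sum_j log P(X_j = x_j | Y)
        def score(c):
            return self.log_prior[c] + sum(self.log_cond(j, v, c)
                                           for j, v in enumerate(x))
        return max(self.classes, key=score)

X = [("sunny", "warm", "normal"),
     ("sunny", "warm", "high"),
     ("rainy", "cold", "high"),
     ("sunny", "warm", "high")]
y = ["yes", "yes", "no", "yes"]
nb = NaiveBayes().fit(X, y)
print(nb.predict(("rainy", "warm", "normal")))  # prints "yes", matching the example slide
```
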

The Naïve Bayes Graphical Model

  Y (label / hypothesis), with P(Y)
  X_1 ... X_i ... X_d (attributes / evidence), each a child of Y, with P(X_i | Y)

Nodes denote random variables. Edges denote dependency. Each node has an associated conditional probability table (CPT), conditioned upon its parents.

Example NB Graphical Model
Data:

Sky    Temp  Humid   Play?
sunny  warm  normal  yes
sunny  warm  high    yes
rainy  cold  high    no
sunny  warm  high    yes

Graph: Play is the parent of Sky, Temp, and Humid. Filling in the CPTs (with Laplace smoothing of the conditionals) gives:

Play?  P(Play)
yes    3/4
no     1/4

Sky    Play?  P(Sky | Play)
sunny  yes    4/5
rainy  yes    1/5
sunny  no     1/3
rainy  no     2/3

Temp  Play?  P(Temp | Play)
warm  yes    4/5
cold  yes    1/5
warm  no     1/3
cold  no     2/3

Humid  Play?  P(Humid | Play)
high   yes   3/5
norm   yes   2/5
high   no    2/3
norm   no    1/3

Example: Using NB for Classification
Using the model learned above (P(Play) and the CPTs for Sky, Temp, and Humid):

  h(x) = argmax_{y_k} log P(Y = y_k) + Σ_{j=1}^{d} log P(X_j = x_j | Y = y_k)

Goal: Predict the label for x = (rainy, warm, normal)

Example: Using NB for Classification
Predict the label for x = (rainy, warm, normal). Plugging in the CPT entries:

  score(play)  = log(3/4) + log(1/5) + log(4/5) + log(2/5)   [product: 0.048]
  score(¬play) = log(1/4) + log(2/3) + log(1/3) + log(1/3)   [product: ≈ 0.019]

Since score(play) > score(¬play), predict PLAY.

Naïve Bayes Summary
Advantages:
  Fast to train (single scan through the data)
  Fast to classify
  Not sensitive to irrelevant features
  Handles real and discrete data
  Handles streaming data well
Disadvantages:
  Assumes independence of features

Slide by Eamonn Keogh