Naïve Bayes

These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric. Robot Image Credit: Viktoriya Sukhanova 123RF.com

Essential Probability Concepts

Marginalization: $P(B) = \sum_{v \in \text{values}(A)} P(B \wedge A = v)$

Conditional probability: $P(A \mid B) = \frac{P(A \wedge B)}{P(B)}$

Bayes' rule: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$

Independence: $A \perp B \iff P(A \wedge B) = P(A)\,P(B) \iff P(A \mid B) = P(A)$

Conditional independence: $A \perp B \mid C \iff P(A \wedge B \mid C) = P(A \mid C)\,P(B \mid C)$

Density Estimation

Recall the Joint Distribution...

                 alarm                    ¬alarm
           earthquake  ¬earthquake  earthquake  ¬earthquake
burglary      0.01        0.08        0.001       0.009
¬burglary     0.01        0.09        0.01        0.79
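As a quick illustration (a sketch of my own, not from the slides), the joint table above can be stored as a dictionary and used to compute the marginalization and conditional-probability formulas from the previous slide:

```python
# Joint distribution over (burglary, alarm, earthquake), from the table above.
joint = {
    (True,  True,  True):  0.01,  (True,  True,  False): 0.08,
    (True,  False, True):  0.001, (True,  False, False): 0.009,
    (False, True,  True):  0.01,  (False, True,  False): 0.09,
    (False, False, True):  0.01,  (False, False, False): 0.79,
}

def marginal(i, value):
    """Marginalization: P(X_i = value) = sum of all matching joint entries."""
    return sum(p for a, p in joint.items() if a[i] == value)

def conditional(i, v, j, w):
    """Conditional probability: P(X_i = v | X_j = w) = P(X_i = v ^ X_j = w) / P(X_j = w)."""
    joint_prob = sum(p for a, p in joint.items() if a[i] == v and a[j] == w)
    return joint_prob / marginal(j, w)

print(marginal(1, True))              # P(alarm) = 0.01 + 0.08 + 0.01 + 0.09 = 0.19
print(conditional(0, True, 1, True))  # P(burglary | alarm) = 0.09 / 0.19 ≈ 0.47
```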

How Can We Obtain a Joint Distribution?

Option 1: Elicit it from an expert human

Option 2: Build it up from simpler probabilistic facts. For example, if we knew P(a) = 0.7, P(b | a) = 0.2, and P(b | ¬a) = 0.1, then we could compute P(a ∧ b) = P(b | a) P(a) = 0.2 × 0.7 = 0.14

Option 3: Learn it from data...

Based on slide by Andrew Moore

Learning a Joint Distribution

Step 1: Build a JD table for your attributes, in which the probabilities are unspecified.
Step 2: Then, fill in each row with $\hat P(\text{row}) = \frac{\text{records matching row}}{\text{total number of records}}$

A B C  Prob          A B C  Prob
0 0 0  ?             0 0 0  0.30
0 0 1  ?             0 0 1  0.05
0 1 0  ?             0 1 0  0.10
0 1 1  ?      →      0 1 1  0.05
1 0 0  ?             1 0 0  0.05
1 0 1  ?             1 0 1  0.10
1 1 0  ?             1 1 0  0.25
1 1 1  ?             1 1 1  0.10

For example, 0.25 is the fraction of all records in which A and B are true but C is false.

Slide by Andrew Moore
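To make Step 2 concrete, here is a minimal sketch of learning a joint distribution by counting (the ten records are invented for illustration):

```python
from collections import Counter

# Hypothetical records over three binary attributes (A, B, C).
records = [
    (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 1, 0),
    (1, 1, 0), (1, 1, 0), (1, 1, 1),
    (1, 0, 1), (0, 0, 1), (0, 1, 1),
]

# P_hat(row) = (# records matching row) / (total # records)
joint = {row: count / len(records) for row, count in Counter(records).items()}

print(joint[(1, 1, 0)])  # fraction with A and B true but C false: 2/10 = 0.2
```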

Example of Learning a Joint PD

This joint PD was obtained by learning from three attributes in the UCI Adult Census Database [Kohavi 1995].

Slide by Andrew Moore

Density Estimation

Our joint distribution learner is an example of something called Density Estimation. A Density Estimator learns a mapping from a set of attributes to a probability:

Input Attributes → Density Estimator → Probability

Slide by Andrew Moore

Density Estimation

Compare it against the two other major kinds of models:

Classifier: Input Attributes → Prediction of categorical output
Density Estimator: Input Attributes → Probability
Regressor: Input Attributes → Prediction of real-valued output

Slide by Andrew Moore

Evaluating Density Estimation

Test-set criterion for estimating performance on future data:

Classifier: Input Attributes → Prediction of categorical output; evaluated by test-set accuracy
Density Estimator: Input Attributes → Probability; evaluated by... ?
Regressor: Input Attributes → Prediction of real-valued output; evaluated by test-set accuracy

Slide by Andrew Moore

Evaluating a Density Estimator

Given a record x, a density estimator M can tell you how likely the record is: $\hat P(x \mid M)$

The density estimator can also tell you how likely the dataset is, under the assumption that all records were independently generated from the density estimator's joint distribution (that is, i.i.d.):

$\hat P(\text{dataset} \mid M) = \hat P(x_1 \wedge x_2 \wedge \ldots \wedge x_n \mid M) = \prod_{i=1}^{n} \hat P(x_i \mid M)$

Slide by Andrew Moore

Example Small Dataset: Miles Per Gallon

From the UCI repository (thanks to Ross Quinlan). 192 records in the training set.

mpg   modelyear  maker
good  75to78     asia
bad   70to74     america
bad   75to78     europe
bad   70to74     america
bad   70to74     america
bad   70to74     asia
bad   70to74     asia
bad   75to78     america
...   ...        ...
bad   70to74     america
good  79to83     america
bad   75to78     america
good  79to83     america
bad   75to78     america
good  79to83     america
good  79to83     america
bad   70to74     america
good  75to78     europe
bad   75to78     europe

$\hat P(\text{dataset} \mid M) = \prod_{i=1}^{n} \hat P(x_i \mid M) = 3.4 \times 10^{-203}$ (in this case)

Slide by Andrew Moore

Log Probabilities

For decent-sized data sets, this product will underflow:

$\hat P(\text{dataset} \mid M) = \prod_{i=1}^{n} \hat P(x_i \mid M)$

Therefore, since probabilities of datasets get so small, we usually use log probabilities:

$\log \hat P(\text{dataset} \mid M) = \log \prod_{i=1}^{n} \hat P(x_i \mid M) = \sum_{i=1}^{n} \log \hat P(x_i \mid M)$

Based on slide by Andrew Moore
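The following sketch (mine, with made-up per-record probabilities) shows the underflow problem and the log-space fix:

```python
import math

# 400 hypothetical per-record probabilities P_hat(x_i | M), each 0.1.
probs = [0.1] * 400

product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0 -- the true value, 1e-400, underflows double precision

# Summing logs instead of multiplying probabilities stays well in range.
log_likelihood = sum(math.log(p) for p in probs)
print(log_likelihood)  # -921.03... = 400 * ln(0.1)
```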

Example Small Dataset: Miles Per Gallon

On the same 192-record MPG training set:

$\log \hat P(\text{dataset} \mid M) = \sum_{i=1}^{n} \log \hat P(x_i \mid M) = -466.19$ (in this case)

Slide by Andrew Moore

Pros/Cons of the Joint Density Estimator

The Good News: We can learn a Density Estimator from data, and density estimators can do many good things:
- Can sort the records by probability, and thus spot weird records (anomaly detection)
- Can do inference
- Ingredient for Bayes Classifiers (coming very soon...)

The Bad News: Density estimation by directly learning the joint is impractical and may result in adverse behavior

Slide by Andrew Moore

Curse of Dimensionality

Slide by Christopher Bishop

The Joint Density Estimator on a Test Set

An independent test set with 196 cars has a much worse log-likelihood. Actually, it's a billion quintillion quintillion quintillion quintillion times less likely. Density estimators can overfit... and the full joint density estimator is the overfittiest of them all!

Slide by Andrew Moore

Overfitting Density Estimators

If for any i, $\hat P(x_i \mid M) = 0$, then

$\log \hat P(\text{dataset} \mid M) = \sum_{i=1}^{n} \log \hat P(x_i \mid M) = -\infty$

If this ever happens, the joint PDE has learned that certain combinations are impossible.

Slide by Andrew Moore

The Joint Density Estimator on a Test Set

The only reason that the test set didn't score $-\infty$ is that the code was hard-wired to always predict a probability of at least $1/10^{20}$.

Slide by Andrew Moore

The Naïve Bayes Classifier

Bayes' Rule

Recall Bayes' rule: $P(\text{hypothesis} \mid \text{evidence}) = \frac{P(\text{evidence} \mid \text{hypothesis})\,P(\text{hypothesis})}{P(\text{evidence})}$

Equivalently, we can write

$P(Y = y_k \mid X = x_i) = \frac{P(Y = y_k)\,P(X = x_i \mid Y = y_k)}{P(X = x_i)}$

where X is a random variable representing the evidence and Y is a random variable for the label. This is actually short for

$P(Y = y_k \mid X = x_i) = \frac{P(Y = y_k)\,P(X_1 = x_{i,1} \wedge \ldots \wedge X_d = x_{i,d} \mid Y = y_k)}{P(X_1 = x_{i,1} \wedge \ldots \wedge X_d = x_{i,d})}$

where $X_j$ denotes the random variable for the j-th feature.

Naïve Bayes Classifier

Idea: Use the training data to estimate $P(X \mid Y)$ and $P(Y)$. Then, use Bayes' rule to infer $P(Y \mid X_{\text{new}})$ for new data:

$P(Y = y_k \mid X = x_i) = \frac{P(Y = y_k)\,P(X_1 = x_{i,1} \wedge \ldots \wedge X_d = x_{i,d} \mid Y = y_k)}{P(X_1 = x_{i,1} \wedge \ldots \wedge X_d = x_{i,d})}$

In this expression, the prior $P(Y = y_k)$ is easy to estimate from data; the class-conditional joint in the numerator is impractical to estimate (though seemingly necessary); and the denominator is unnecessary, as it turns out. Estimating the joint probability distribution $P(X_1, X_2, \ldots, X_d \mid Y)$ is not practical.

Naïve Bayes Classifier

Problem: Estimating the joint PD or CPD isn't practical; it severely overfits, as we saw before.

However, if we make the assumption that the attributes are independent given the class label, estimation is easy!

$P(X_1, X_2, \ldots, X_d \mid Y) = \prod_{j=1}^{d} P(X_j \mid Y)$

In other words, we assume all attributes are conditionally independent given Y. Often this assumption is violated in practice, but more on that later.

Training Naïve Bayes

Estimate $P(X_j \mid Y)$ and $P(Y)$ directly from the training data by counting!

Sky    Temp  Humid   Wind    Water  Forecast  Play?
sunny  warm  normal  strong  warm   same      yes
sunny  warm  high    strong  warm   same      yes
rainy  cold  high    strong  warm   change    no
sunny  warm  high    strong  cool   change    yes

P(play) = 3/4                     P(¬play) = 1/4
P(Sky = sunny | play) = 1         P(Sky = sunny | ¬play) = 0
P(Humid = high | play) = 2/3      P(Humid = high | ¬play) = 1
...
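A minimal sketch of this counting procedure (my own illustration, not the original course code), using the four-record table above:

```python
from collections import Counter

data = [
    # ((Sky, Temp, Humid, Wind, Water, Forecast), Play?)
    (("sunny", "warm", "normal", "strong", "warm", "same"),   "yes"),
    (("sunny", "warm", "high",   "strong", "warm", "same"),   "yes"),
    (("rainy", "cold", "high",   "strong", "warm", "change"), "no"),
    (("sunny", "warm", "high",   "strong", "cool", "change"), "yes"),
]

# P(Y = y): fraction of records with each label.
label_counts = Counter(y for _, y in data)
prior = {y: c / len(data) for y, c in label_counts.items()}

# cond_counts[(j, v, y)]: # records with attribute j equal to v and label y.
cond_counts = Counter()
for x, y in data:
    for j, v in enumerate(x):
        cond_counts[(j, v, y)] += 1

def p_cond(j, v, y):
    """P(X_j = v | Y = y), estimated by counting (no smoothing yet)."""
    return cond_counts[(j, v, y)] / label_counts[y]

print(prior["yes"])               # P(play) = 3/4
print(p_cond(0, "sunny", "yes"))  # P(Sky = sunny | play) = 1.0
print(p_cond(0, "sunny", "no"))   # P(Sky = sunny | ¬play) = 0.0  <- the zero problem
```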

Laplace Smoothing

Notice that some probabilities estimated by counting might be zero. Possible overfitting! Fix by using Laplace smoothing, which adds 1 to each count:

$P(X_j = v \mid Y = y_k) = \frac{c_v + 1}{\sum_{v' \in \text{values}(X_j)} c_{v'} + |\text{values}(X_j)|}$

where $c_v$ is the count of training instances with a value of $v$ for attribute $j$ and class label $y_k$, and $|\text{values}(X_j)|$ is the number of values $X_j$ can take on.

Training Naïve Bayes with Laplace Smoothing

Estimate $P(X_j \mid Y)$ and $P(Y)$ directly from the training data by counting, with Laplace smoothing:

Sky    Temp  Humid   Wind    Water  Forecast  Play?
sunny  warm  normal  strong  warm   same      yes
sunny  warm  high    strong  warm   same      yes
rainy  cold  high    strong  warm   change    no
sunny  warm  high    strong  cool   change    yes

P(play) = 3/4                       P(¬play) = 1/4
P(Sky = sunny | play) = 4/5         P(Sky = sunny | ¬play) = 1/3
P(Humid = high | play) = 3/5        P(Humid = high | ¬play) = 2/3
...
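Continuing the sketch above (this reuses cond_counts and label_counts from the earlier code), a Laplace-smoothed version of the estimator; the per-attribute value counts are read off the four-record table:

```python
# |values(X_j)| for each attribute: Sky, Temp, Humid, Wind, Water, Forecast.
n_values = [2, 2, 2, 1, 2, 2]  # e.g., Sky takes 2 values; Wind is always "strong"

def p_cond_smoothed(j, v, y):
    """P(X_j = v | Y = y) = (c_v + 1) / (sum_v' c_v' + |values(X_j)|)."""
    # The sum of c_v' over all values of X_j within class y is label_counts[y].
    return (cond_counts[(j, v, y)] + 1) / (label_counts[y] + n_values[j])

print(p_cond_smoothed(0, "sunny", "yes"))  # (3+1)/(3+2) = 4/5
print(p_cond_smoothed(0, "sunny", "no"))   # (0+1)/(1+2) = 1/3, no longer zero
print(p_cond_smoothed(2, "high", "yes"))   # (2+1)/(3+2) = 3/5
```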

Using the Naïve Bayes Classifier

Now we have

$P(Y = y_k \mid X = x_i) = \frac{P(Y = y_k)\,\prod_{j=1}^{d} P(X_j = x_{i,j} \mid Y = y_k)}{P(X = x_i)}$

The denominator is constant for a given instance, and so is irrelevant to our prediction. To classify a new point x (where $x_j$ is the j-th attribute value of x):

$h(x) = \arg\max_{y_k} P(Y = y_k) \prod_{j=1}^{d} P(X_j = x_j \mid Y = y_k) = \arg\max_{y_k} \log P(Y = y_k) + \sum_{j=1}^{d} \log P(X_j = x_j \mid Y = y_k)$

In practice, we use log-probabilities to prevent underflow.

The Naïve Bayes Classifier Algorithm

For each class label $y_k$:
  Estimate $P(Y = y_k)$ from the data.
  For each value $x_{i,j}$ of each attribute $X_i$:
    Estimate $P(X_i = x_{i,j} \mid Y = y_k)$.

Classify a new point via:

$h(x) = \arg\max_{y_k} \log P(Y = y_k) + \sum_{j=1}^{d} \log P(X_j = x_j \mid Y = y_k)$

In practice, the independence assumption doesn't often hold true, but Naïve Bayes performs very well despite this.
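A sketch of this classification rule in log space, continuing the code above (it assumes prior and p_cond_smoothed from the earlier sketches):

```python
import math

def classify(x, labels=("yes", "no")):
    """h(x) = argmax_y [ log P(Y=y) + sum_j log P(X_j = x_j | Y = y) ]."""
    def log_score(y):
        return math.log(prior[y]) + sum(
            math.log(p_cond_smoothed(j, v, y)) for j, v in enumerate(x))
    return max(labels, key=log_score)

print(classify(("sunny", "warm", "normal", "strong", "warm", "same")))  # "yes"
```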

Computing Probabilities (Not Just Predicting Labels)

The NB classifier gives predictions, not probabilities, because we ignore P(X) (the denominator in Bayes' rule). We can produce probabilities as follows. For each possible class label $y_k$, compute the unnormalized quantity

$\tilde P(Y = y_k \mid X = x) = P(Y = y_k) \prod_{j=1}^{d} P(X_j = x_j \mid Y = y_k)$

This is the numerator of Bayes' rule, and is therefore off from the true probability by a factor $\alpha$ that makes the probabilities sum to 1:

$\alpha = \frac{1}{\sum_{k=1}^{\#\text{classes}} \tilde P(Y = y_k \mid X = x)}$

The class probability is then given by $P(Y = y_k \mid X = x) = \alpha\,\tilde P(Y = y_k \mid X = x)$.
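Continuing the sketch, the probabilities can be recovered by normalizing in log space; subtracting the max log-score before exponentiating keeps the arithmetic stable, and the normalization plays the role of $\alpha$:

```python
import math

def predict_proba(x, labels=("yes", "no")):
    """Return P(Y = y | X = x) by normalizing the Bayes-rule numerators."""
    log_scores = {y: math.log(prior[y]) + sum(
        math.log(p_cond_smoothed(j, v, y)) for j, v in enumerate(x))
        for y in labels}
    m = max(log_scores.values())                  # shift for numerical stability
    unnorm = {y: math.exp(s - m) for y, s in log_scores.items()}
    z = sum(unnorm.values())                      # normalizer (alpha, up to the shift)
    return {y: u / z for y, u in unnorm.items()}

print(predict_proba(("sunny", "warm", "normal", "strong", "warm", "same")))
# {'yes': ~0.97, 'no': ~0.03} with the smoothed estimates above
```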

Naïve Bayes Applications

Text classification: Which e-mails are spam? Which e-mails are meeting notices? Which author wrote a document?

Classifying mental states: learning P(BrainActivity | WordCategory), with a pairwise classification accuracy of 85% for distinguishing people words from animal words.

The Naïve Bayes Graphical Model

Label (hypothesis): node Y, with P(Y)
Attributes (evidence): nodes $X_1, \ldots, X_i, \ldots, X_d$, each a child of Y, with $P(X_1 \mid Y), \ldots, P(X_i \mid Y), \ldots, P(X_d \mid Y)$

Nodes denote random variables. Edges denote dependency. Each node has an associated conditional probability table (CPT), conditioned upon its parents.

Example NB Graphical Model

Data:

Sky    Temp  Humid   Play?
sunny  warm  normal  yes
sunny  warm  high    yes
rainy  cold  high    no
sunny  warm  high    yes

Graph structure: Play is the parent of Sky, Temp, and Humid. Filling in the CPTs from the data (with Laplace smoothing for the attributes):

Play?  P(Play)
yes    3/4
no     1/4

Sky    Play?  P(Sky|Play)
sunny  yes    4/5
rainy  yes    1/5
sunny  no     1/3
rainy  no     2/3

Temp  Play?  P(Temp|Play)
warm  yes    4/5
cold  yes    1/5
warm  no     1/3
cold  no     2/3

Humid  Play?  P(Humid|Play)
high   yes    3/5
norm   yes    2/5
high   no     2/3
norm   no     1/3

Example Using NB for Classification

Goal: Predict the label for x = (rainy, warm, normal) using

$h(x) = \arg\max_{y_k} \log P(Y = y_k) + \sum_{j=1}^{d} \log P(X_j = x_j \mid Y = y_k)$

Using the CPTs above:

yes: 3/4 × 1/5 × 4/5 × 2/5 = 0.048
no:  1/4 × 2/3 × 1/3 × 1/3 ≈ 0.019

Since 0.048 > 0.019, we predict PLAY.

Naïve Bayes Summary

Advantages:
- Fast to train (single scan through data)
- Fast to classify
- Not sensitive to irrelevant features
- Handles real and discrete data
- Handles streaming data well

Disadvantages:
- Assumes independence of features

Slide by Eamonn Keogh