CS145: INTRODUCTION TO DATA MINING

Lecture 3: Vector Data: Logistic Regression
Instructor: Yizhou Sun (yzsun@cs.ucla.edu)
October 9, 2017

Methods to Learn
  Vector Data: Classification: Logistic Regression; Decision Tree; KNN; SVM; NN. Clustering: K-means; hierarchical clustering; DBSCAN; Mixture Models. Prediction: Linear Regression; GLM*.
  Text Data: Classification: Naïve Bayes for Text. Clustering: PLSA.
  Set Data: Frequent Pattern Mining: Apriori; FP growth.
  Sequence Data: Frequent Pattern Mining: GSP; PrefixSpan. Similarity Search: DTW.

Vector Data: Logistic Regression (outline)
  Classification: Basic Concepts
  Logistic Regression Model
  Generalized Linear Model*
  Summary

Supervised vs. Unsupervised Learning
  Supervised learning (classification): supervision means the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation; new data are classified based on the training set.
  Unsupervised learning (clustering): the class labels of the training data are unknown; given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.

Prediction Problems: Classification vs. Numeric Prediction
  Classification: predicts categorical class labels; constructs a model from the training set and the values (class labels) of a classifying attribute, and uses it to classify new data.
  Numeric prediction: models continuous-valued functions, i.e., predicts unknown or missing values.
  Typical applications: credit/loan approval; medical diagnosis (is a tumor cancerous or benign?); fraud detection (is a transaction fraudulent?); web page categorization (which category does a page belong to?).

Classification: A Two-Step Process (1)
Model construction: describing a set of predetermined classes.
  Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute.
  For data point i: <x_i, y_i>, where x_i is the feature vector and y_i is the class label.
  The model is represented as classification rules, decision trees, or mathematical formulae; it is also called a classifier.
  The set of tuples used for model construction is the training set.

Classification: A Two-Step Process (2)
Model usage: classifying future or unknown objects.
  Estimate the accuracy of the model: the known label of each test sample is compared with the result predicted by the model. The test set must be independent of the training set (otherwise the estimate is overly optimistic due to overfitting). The accuracy rate is the percentage of test set samples correctly classified by the model; it is most often used for binary classes.
  If the accuracy is acceptable, use the model to classify new data.
  Note: if the test set is used to select models, it is called a validation (test) set.
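As a concrete illustration of the two steps, here is a hedged Python sketch using scikit-learn on synthetic data; the dataset, the choice of logistic regression, and the 70/30 split are illustrative assumptions, not part of the slides:

    # Step 1: construct the model on a training set.
    # Step 2: estimate accuracy on an independent test set, then classify new data.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=5, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    clf = LogisticRegression().fit(X_train, y_train)       # model construction
    acc = accuracy_score(y_test, clf.predict(X_test))      # model usage: accuracy on the test set
    print(f"test accuracy: {acc:.2f}")
    new_label = clf.predict(X_test[:1])                    # classify a new/unseen data point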

Process (1): Model Construction
Training data (input to the classification algorithm):

    NAME  RANK            YEARS  TENURED
    Mike  Assistant Prof  3      no
    Mary  Assistant Prof  7      yes
    Bill  Professor       2      yes
    Jim   Associate Prof  7      yes
    Dave  Assistant Prof  6      no
    Anne  Associate Prof  3      no

Learned classifier (model): IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Process (2): Using the Model in Prediction
Testing data (labels known, used to estimate accuracy):

    NAME     RANK            YEARS  TENURED
    Tom      Assistant Prof  2      no
    Merlisa  Associate Prof  7      no
    George   Professor       5      yes
    Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4) -> Tenured?
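As a small illustration of the model-usage step, the following Python sketch applies the learned rule to the testing data and to the unseen record; the helper name and data encoding are illustrative, not from the slides:

    def predict_tenured(rank, years):
        # Learned rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
        return "yes" if rank == "Professor" or years > 6 else "no"

    testing = [("Tom", "Assistant Prof", 2, "no"),
               ("Merlisa", "Associate Prof", 7, "no"),
               ("George", "Professor", 5, "yes"),
               ("Joseph", "Assistant Prof", 7, "yes")]

    correct = sum(predict_tenured(r, yrs) == label for _, r, yrs, label in testing)
    print(f"accuracy on the test set: {correct}/{len(testing)}")   # the rule misclassifies Merlisa

    print("Jeff:", predict_tenured("Professor", 4))                # unseen data -> 'yes'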

Vector Data: Logistic Regression (outline)
  Classification: Basic Concepts
  Logistic Regression Model
  Generalized Linear Model*
  Summary

Linear Regression vs. Logistic Regression
  Linear regression (prediction):
    Y is a continuous value in (-∞, +∞)
    Y = x^T β = β_0 + x_1 β_1 + x_2 β_2 + ... + x_p β_p
    Y | x, β ~ N(x^T β, σ^2)
  Logistic regression (classification):
    Y is a discrete value from m classes
    p(Y = C_j) ∈ [0, 1] and Σ_j p(Y = C_j) = 1

Logistic Function
  Logistic function / sigmoid function: σ(x) = 1 / (1 + e^{-x})
  Note: σ'(x) = σ(x)(1 - σ(x))
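A tiny numpy sketch of the sigmoid and of the derivative identity above, checked against a numerical derivative; purely illustrative:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_prime(x):
        # Identity from the slide: sigma'(x) = sigma(x) * (1 - sigma(x))
        return sigmoid(x) * (1.0 - sigmoid(x))

    x = np.linspace(-5, 5, 11)
    h = 1e-6
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)    # central differences
    print(np.max(np.abs(numeric - sigmoid_prime(x))))        # close to 0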

Modeling Probabilities of Two Classes
  P(Y = 1 | X, β) = σ(X^T β) = 1 / (1 + exp(-X^T β)) = exp(X^T β) / (1 + exp(X^T β))
  P(Y = 0 | X, β) = 1 - σ(X^T β) = exp(-X^T β) / (1 + exp(-X^T β)) = 1 / (1 + exp(X^T β))
  where β = (β_0, β_1, ..., β_p)^T.
  In other words, Y | X, β ~ Bernoulli(σ(X^T β)).

The 1-d Situation
  P(Y = 1 | x, β_0, β_1) = σ(β_1 x + β_0)
  [Figure: Prob(Y = 1 | x, β_0, β_1) plotted against x for (β_0, β_1) = (0, 1), (0, 2), (5, 2); β_1 controls the steepness of the curve and β_0 shifts it along the x-axis.]
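For readers who want to reproduce a plot like the one on this slide, a hedged matplotlib sketch follows; the (β_0, β_1) values mirror the slide's legend as far as it is readable:

    import numpy as np
    import matplotlib.pyplot as plt

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.linspace(-10, 10, 400)
    for b0, b1 in [(0, 1), (0, 2), (5, 2)]:
        plt.plot(x, sigmoid(b1 * x + b0), label=f"b0={b0}, b1={b1}")

    plt.xlabel("x")
    plt.ylabel("Prob(Y=1 | x, b0, b1)")
    plt.legend()
    plt.show()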

Example
  Q: For the fitted curve shown on the slide, what is β_0?

Parameter Estimation (MLE)
  Given a dataset D with n data points:
  For a single data object with attributes x_i and class label y_i, let p_i = p(Y = 1 | x_i, β), the probability that object i is in class 1.
  The probability of observing y_i would be: if y_i = 1, then p_i; if y_i = 0, then 1 - p_i.
  Combining the two cases: p_i^{y_i} (1 - p_i)^{1 - y_i}
  The likelihood of the whole dataset:
  L = Π_i p_i^{y_i} (1 - p_i)^{1 - y_i} = Π_i [exp(x_i^T β) / (1 + exp(x_i^T β))]^{y_i} [1 / (1 + exp(x_i^T β))]^{1 - y_i}

Optimization
  Maximizing the likelihood is equivalent to maximizing the log-likelihood:
  l(β) = Σ_i [ y_i x_i^T β - log(1 + exp(x_i^T β)) ]
  Gradient ascent update: β_new = β_old + η ∂l(β)/∂β, where η is the step size.
  Newton-Raphson update: β_new = β_old - (∂²l(β) / ∂β ∂β^T)^{-1} ∂l(β)/∂β, where the derivatives are evaluated at β_old.
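A minimal numpy sketch of maximum-likelihood fitting by gradient ascent, assuming the design matrix already contains a leading column of ones for β_0; the synthetic data, step size, and iteration count are illustrative choices:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def log_likelihood(X, y, beta):
        # l(beta) = sum_i [ y_i x_i^T beta - log(1 + exp(x_i^T beta)) ]
        z = X @ beta
        return np.sum(y * z - np.logaddexp(0.0, z))

    def fit_gradient_ascent(X, y, eta=0.5, n_iters=2000):
        beta = np.zeros(X.shape[1])
        for _ in range(n_iters):
            p = sigmoid(X @ beta)               # p(x_i; beta)
            grad = X.T @ (y - p)                # dl/dbeta_j = sum_i x_ij (y_i - p(x_i; beta))
            beta = beta + eta * grad / len(y)   # gradient ascent step
        return beta

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(300), rng.normal(size=(300, 2))])
    true_beta = np.array([-1.0, 2.0, 0.5])
    y = (rng.random(300) < sigmoid(X @ true_beta)).astype(float)
    beta_hat = fit_gradient_ascent(X, y)
    print(beta_hat, log_likelihood(X, y, beta_hat))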

First Derivative
  ∂l(β)/∂β_j = Σ_i x_{ij} ( y_i - p(x_i; β) ), for j = 0, 1, ..., p
  where x_{i0} = 1 and p(x_i; β) = P(Y = 1 | x_i, β).

Second Derivative
  The second derivative is a (p+1) × (p+1) matrix (the Hessian matrix), with the entry in the jth row and nth column:
  ∂²l(β) / ∂β_j ∂β_n = - Σ_i x_{ij} x_{in} p(x_i; β) (1 - p(x_i; β))
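Combining the first and second derivatives, the Newton-Raphson update can be sketched in numpy as follows; this is an illustrative implementation assuming X already carries a leading column of ones for β_0, not reference code from the course:

    import numpy as np

    def fit_newton_raphson(X, y, n_iters=20):
        beta = np.zeros(X.shape[1])
        for _ in range(n_iters):
            p = 1.0 / (1.0 + np.exp(-(X @ beta)))            # p(x_i; beta)
            grad = X.T @ (y - p)                             # first derivative (gradient)
            W = p * (1.0 - p)                                # p(x_i; beta) * (1 - p(x_i; beta))
            hessian = -(X.T * W) @ X                         # second derivative, (p+1) x (p+1)
            beta = beta - np.linalg.solve(hessian, grad)     # beta_new = beta_old - H^{-1} grad
        return beta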

What about Multiclass Classification?
  It is easy to handle under logistic regression. Say there are M classes:
  P(Y = j | X) = exp(X^T β_j) / (1 + Σ_{m=1}^{M-1} exp(X^T β_m)), for j = 1, ..., M-1
  P(Y = M | X) = 1 / (1 + Σ_{m=1}^{M-1} exp(X^T β_m))
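A small numpy sketch of these class probabilities, treating class M as the reference class whose score is implicitly zero; the function name and the shape convention for the coefficients are illustrative assumptions:

    import numpy as np

    def multiclass_probs(x, betas):
        # betas has shape (M-1, p+1): one coefficient vector per non-reference class.
        scores = np.exp(betas @ x)                    # exp(x^T beta_m) for m = 1..M-1
        denom = 1.0 + scores.sum()
        return np.append(scores, 1.0) / denom         # P(Y=1|x), ..., P(Y=M-1|x), P(Y=M|x)

The returned probabilities are nonnegative and sum to 1, matching the constraint on p(Y = C_j) from earlier.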

Vector Data: Logistic Regression (outline)
  Classification: Basic Concepts
  Logistic Regression Model
  Generalized Linear Model*
  Summary

Recall Linear Regression and Logistic Regression
  Linear regression: y | x, β ~ N(x^T β, σ^2)
  Logistic regression: y | x, β ~ Bernoulli(σ(x^T β))
  How about other distributions? Yes, as long as they belong to the exponential family.

Exponential Family: Canonical Form
  p(y; η) = b(y) exp( η^T T(y) - a(η) )
  η: natural parameter
  T(y): sufficient statistic
  a(η): log partition function (for normalization)
  b(y): a function that depends only on y

Examples of Exponential Family
  Many distributions can be written as p(y; η) = b(y) exp( η^T T(y) - a(η) ): Gaussian, Bernoulli, Poisson, beta, Dirichlet, categorical, ...
  For a Gaussian with mean μ (σ is not of interest and fixed to 1):
    p(y; μ) = (1/√(2π)) exp(-(y - μ)²/2) = (1/√(2π)) exp(-y²/2) exp(μ y - μ²/2)
    so η = μ, T(y) = y, a(η) = η²/2, b(y) = (1/√(2π)) exp(-y²/2).
  For a Bernoulli with mean φ:
    p(y; φ) = φ^y (1 - φ)^{1-y} = exp( y log(φ/(1-φ)) + log(1-φ) )
    so η = log(φ/(1-φ)), T(y) = y, a(η) = log(1 + e^η), b(y) = 1.

Recipe of GLMs
  Determine a distribution for y, e.g., Gaussian, Bernoulli, Poisson.
  Form the linear predictor for η: η = x^T β.
  Determine a link function, μ = g^{-1}(η), that connects the linear predictor to the mean of the distribution: e.g., μ = η for Gaussian, μ = σ(η) for Bernoulli, μ = exp(η) for Poisson.
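To make the recipe concrete, here is a brief Python sketch of the link-function step for the three example distributions; the coefficient values and dictionary layout are illustrative assumptions, not course code:

    import numpy as np

    # Inverse link functions mu = g^{-1}(eta) for three common GLMs.
    inverse_links = {
        "gaussian":  lambda eta: eta,                          # identity: mu = eta
        "bernoulli": lambda eta: 1.0 / (1.0 + np.exp(-eta)),   # sigmoid: mu = sigma(eta)
        "poisson":   lambda eta: np.exp(eta),                  # exponential: mu = exp(eta)
    }

    beta = np.array([0.5, -1.2])        # hypothetical coefficients
    x = np.array([1.0, 2.0])            # leading 1 for the intercept
    eta = x @ beta                       # linear predictor eta = x^T beta
    for family, g_inv in inverse_links.items():
        print(family, g_inv(eta))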

Vector Data: Logistic Regression (outline)
  Classification: Basic Concepts
  Logistic Regression Model
  Generalized Linear Model*
  Summary

Summary
  What is classification: supervised learning vs. unsupervised learning; classification vs. (numeric) prediction.
  Logistic regression: sigmoid function, multiclass classification.
  Generalized linear model*: exponential family, link function.