CS249: ADVANCED DATA MINING

Size: px
Start display at page:

Download "CS249: ADVANCED DATA MINING"

Transcription

1 CS249: ADVANCED DATA MINING Linear Regression, Logistic Regression, and GLMs Instructor: Yizhou Sun April 24, 2017

2 About WWW2017 Conference 2

3 Turing Award Winner Sir Tim Berners-Lee 3

4 Course Project Announcements Team formation due this Wednesday PTE will be decided by this weekend Office hour for TA (Jae LEE): 3-5 PM on

5 Methods to Learn Classification Clustering Vector Data Text Data Recommender System Decision Tree; Naïve Bayes; Logistic Regression SVM; NN K-means; hierarchical clustering; DBSCAN; Mixture Models; kernel k-means PLSA; LDA Matrix Factorization Graph & Network Label Propagation SCAN; Spectral Clustering Prediction Ranking Linear Regression GLM Collaborative Filtering PageRank Feature Representation Word embedding Network embedding 5

6 How to learn these algorithms? Three levels When it is applicable? Input, output, strengths, weaknesses, time complexity How it works? Pseudo-code, work flows, major steps Can work out a toy problem by pen and paper Why it works? Intuition, philosophy, objective, derivation, proof 6

7 Vector Data: Regression Vector Data Linear Regression Model Logistic Regression Model Generalized Linear Model Summary 7

8 Example A matrix of n pp: n data objects / points p attributes / dimensions x 11 x i1 x n1 x 1f x if x nf x 1p x ip x np 8

9 Numerical E.g., height, income Attribute Type Categorical / discrete E.g., Sex, Race 9

10 Categorical Attribute Types Nominal: categories, states, or names of things Hair_color = {auburn, black, blond, brown, grey, red, white} marital status, occupation, ID numbers, zip codes Binary Nominal attribute with only 2 states (0 and 1) Symmetric binary: both outcomes equally important e.g., gender Asymmetric binary: outcomes not equally important. e.g., medical test (positive vs. negative) Convention: assign 1 to most important outcome (e.g., HIV positive) Ordinal Values have a meaningful order (ranking) but magnitude between successive values is not known. Size = {small, medium, large}, grades, army rankings 10

11 Vector Data: Regression Vector Data Linear Regression Model Logistic Regression Model Generalized Linear Model Summary 11

12 Linear Regression Ordinary Least Square Regression Closed form solution Gradient descent Linear Regression with Probabilistic Interpretation 12

13 The Linear Regression Problem Any Attributes to Continuous Value: x y {age; major ; gender; race} GPA {income; credit score; profession} loan {college; major ; GPA} future income 13

14 Example of House Price Living Area (sqft) # of Beds Price (1000$) x y 14

15 Illustration 15

16 Formalization Data: n independent data objects yy ii, i = 1,, nn xx ii = xx iii, xx iii,, xx iiii T, i = 1,, nn A constant factor is added to model the bias term, i. e., xx iii = 1 T New x: xx ii = xx iii, xx iii, xx iii,, xx iiii Model: yy: dependent variable xx: explanatory variables ββ = ββ 0, ββ 1,, ββ pp TT : wwwwwwwwwww vvvvvvvvvvvv yy = xx TT ββ = ββ 0 + xx 1 ββ 1 + xx 2 ββ xx pp ββ pp 16

17 Model Construction A 3-step Process Use training data to find the best parameter ββ, denoted as ββ Model Selection Use validation data to select the best model E.g., Feature selection Model Usage Apply the model to the unseen data (test data): yy = xx TT ββ 17

18 Least Square Estimation Cost function (Total Square Error): JJ ββ = 1 2 ii xx ii TT ββ yy ii 2 Matrix form: JJ ββ = Xββ yy TT (XXββ yy)/2 1, 1, 1, x 11 x i1 x n1 or Xββ yy 2 /2 x 1f x if x nf x 1p x ip x np XX: nn pp + 11 matrix yy 1 yy ii yy nn y: nn 11 vvvvvvvvvvvv 18

19 Ordinary Least Squares (OLS) Goal: find ββ that minimizes JJ ββ JJ ββ = 1 2 Xββ yy TT XXββ yy = 1 2 (ββtt XX TT XXββ yy TT XXββ ββ TT XX TT yy + yy TT yy) Ordinary least squares Set first derivative of JJ ββ as 0 ββ = ββtt X TT X yy TT XX = 0 ββ = XX TT XX 1 XX TT yy 19

20 Gradient Descent Minimize the cost function by moving down in the steepest direction 20

21 Batch Gradient Descent Move in the direction of steepest descend Repeat until converge { } ββ (tt+1) :=ββ (t) ηη ββ ββ=ββ (t), e.g., ηη = 0.01 Where JJ ββ = 1 2 ii xx TT 2 ii ββ yy ii = ii JJ ii (ββ) and ββ = JJ ii ββ = xx ii (xx TT ii ββ yy ii ) ii ii 21

22 Stochastic Gradient Descent When a new observation, i, comes in, update weight immediately (extremely useful for largescale datasets): Repeat { for i=1:n { ββ (tt+1) :=ββ (t) + ηη(yy ii xx ii TT ββ (tt) )xx ii } } If the prediction for object i is smaller than the real value, ββ should move forward to the direction of xx ii 22

23 Other Practical Issues What if XX TT XX is not invertible? Add a small portion of identity matrix, λii, to it (ridge regression* ) What if some attributes are categorical? Set dummy variables E.g., xx = 1, iiii ssssss = FF; xx = 0, iiii ssssss = MM Nominal variable with multiple values? Create more dummy variables for one variable What if non-linear correlation exists? Transform features, say, xx to xx 2 23

24 Probabilistic Interpretation Review of normal distribution X~NN μμ, σσ 2 ff XX = xx = 1 2ππσσ 2 ee xx μμ 2 2σσ 2 24

25 Probabilistic Interpretation Model: yy ii = xx ii TT ββ + ε ii ε ii ~NN(0, σσ 2 ) yy ii xx ii, ββ~nn(xx ii TT ββ, σσ 2 ) EE yy ii xx ii, ββ = xx ii TT ββ Likelihood: LL ββ = ii pp yy ii xx ii, ββ) = ii 1 exp{ yy TT 2 ii xx ii ββ 2ππσσ 2 2σσ 2 } Maximum Likelihood Estimation find ββ that maximizes L ββ arg max LL = arg min JJ, Equivalent to OLS! 25

26 Vector Data: Regression Vector Data Linear Regression Model Logistic Regression Model Generalized Linear Model Summary 26

27 Linear Regression VS. Logistic Regression Linear Regression (prediction) Y: cccccccccccccccccccc vvvvvvvvvv, + Y = xx TT ββ = ββ 0 + xx 1 ββ 1 + xx 2 ββ xx pp ββ pp Y xx, ββ~nn(xx TT ββ, σσ 2 ) Logistic Regression (classification) Y: dddddddddddddddd vvvvvvvvvv ffffffff mm cccccccccccccc pp YY = CC jj [0,1] aaaaaa jj pp YY = CC jj = 1 27

28 Logistic Function Logistic Function / sigmoid function: σσ xx = 1 1+ee xx NNNNNNNN: σσ xx = σσ(xx)(1 σσ(xx)) 28

29 Modeling Probabilities of Two Classes PP YY = 1 XX, ββ = σσ XX TT ββ = 1 1+exp{ XX TT ββ} = exp{xxtt ββ} 1+exp{XX TT ββ} PP YY = 0 XX, ββ = 1 σσ XX TT ββ = exp{ XXTT ββ} 1+exp{ XX TT ββ} = 1 1+exp{XX TT ββ} ββ = ββ 0 ββ 1 ββ pp In other words YY X, ββ~bbeeeeeeeeeeeeeeee(σσ XX TT ββ ) 29

30 The 1-d Situation PP YY = 1 xx, ββ 0, ββ 1 = σσ ββ 1 xx + ββ , =0, =1 0 1 =0, =2 0 1 ) 1 =5, = Prob(Y=1 x, x 30

31 Example Q: What is ββ 0 here? 31

32 MLE estimation Parameter Estimation Given a dataset DD, wwwwwww nn dddddddd pppppppppppp For a single data object with attributes xx ii, class label yy ii Let pp ii = pp YY = 1 xx ii, ββ, ttttt pppppppp. oooo ii iiii cccccccccc 1 The probability of observing yy ii would be If yy ii = 1, ttttttt pp ii If yy ii = 0, ttttttt 1 pp ii yy Combing the two cases: pp ii ii 1 1 yy ppii ii LL = ii pp ii yy ii 1 ppii 1 yy ii = ii exp XX TT ββ 1+exp XX TT ββ yy ii 1 1+exp XX TT ββ 1 yy ii 32

33 Optimization Equivalent to maximize log likelihood LL = ii yy ii xx ii TT ββ log 1 + exp xx ii TT ββ Gradient ascent update: ββ nnnnnn = ββ oooooo + ηη (ββ) Newton-Raphson update Step size where derivatives at evaluated at ββ old 33

34 First Derivative pp(xx ii ; ββ) j = 0, 1,, p 34

35 Second Derivative It is a (p+1) by (p+1) matrix, Hessian Matrix, with jth row and nth column as 35

36 What about Multiclass Classification? It is easy to handle under logistic regression, say M classes PP YY = jj XX = 1,, MM 1 PP YY = MM XX = 1+ mm=1 exp{xx TT ββ jj } MM 1 exp{xx TT ββ mm }, for j = 1 MM 1 exp{xx TT ββ mm } 1+ mm=1 36

37 Vector Data: Regression Vector Data Linear Regression Model Logistic Regression Model Generalized Linear Model Summary 37

38 Recall Linear Regression and Logistic Regression Linear Regression y xx, ββ~nn(xx TT ββ, σσ 2 ) Logistic Regression y xx, ββ~bbbbbbbbbbbbbbbbbb(σσ xx TT ββ ) How about other distributions? Yes, as long as they belong to exponential family 38

39 Canonical Form Exponential Family pp yy; ηη = bb yy exp ηη TT TT yy aa ηη ηη: natural parameter TT yy : sufficient statistic aa ηη : log partition function for normalization bb yy : function that only dependent on yy 39

40 Examples of Exponential Family Many: pp yy; ηη = bb yy exp ηη TT TT yy aa ηη Gaussian, Bernoulli, Poisson, beta, Dirichlet, categorical, For Gaussian (not interested in σσ) For Bernoulli ηη 40

41 Recipe of GLMs Determines a distribution for y E.g., Gaussian, Bernoulli, Poisson Form the linear predictor for ηη ηη = xx TT ββ Determines a link function: μμ = gg 1 (ηη) Connects the linear predictor to the mean of the distribution E.g., μμ = ηη for Gaussian, μμ = σσ ηη for Bernoulli, μμ = eeeeee ηη for Poisson 41

42 Vector Data: Regression Vector Data Linear Regression Model Logistic Regression Model Generalized Linear Model Summary 42

43 What is vector data? Attribute types Linear regression Summary OLS, Probabilistic interpretation Logistic regression Sigmoid function, multiclass classification Generalized linear model Exponential family, link function 43

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING 3: Vector Data: Logistic Regression Instructor: Yizhou Sun yzsun@cs.ucla.edu October 9, 2017 Methods to Learn Vector Data Set Data Sequence Data Text Data Classification

More information

Special Topics: Data Science

Special Topics: Data Science Special Topics: Data Science L Linear Methods for Prediction Dr. Vidhyasaharan Sethu School of Electrical Engineering & Telecommunications University of New South Wales Sydney, Australia V. Sethu 1 Topics

More information

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate Mixture Models & EM Nicholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looked at -means and hierarchical clustering as mechanisms for unsupervised learning

More information

Bayesian Methods: Naïve Bayes

Bayesian Methods: Naïve Bayes Bayesian Methods: Naïve Bayes Nicholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior

More information

Logistic Regression. Hongning Wang

Logistic Regression. Hongning Wang Logistic Regression Hongning Wang CS@UVa Today s lecture Logistic regression model A discriminative classification model Two different perspectives to derive the model Parameter estimation CS@UVa CS 6501:

More information

Decision Trees. Nicholas Ruozzi University of Texas at Dallas. Based on the slides of Vibhav Gogate and David Sontag

Decision Trees. Nicholas Ruozzi University of Texas at Dallas. Based on the slides of Vibhav Gogate and David Sontag Decision Trees Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Announcements Course TA: Hao Xiong Office hours: Friday 2pm-4pm in ECSS2.104A1 First homework

More information

knn & Naïve Bayes Hongning Wang

knn & Naïve Bayes Hongning Wang knn & Naïve Bayes Hongning Wang CS@UVa Today s lecture Instance-based classifiers k nearest neighbors Non-parametric learning algorithm Model-based classifiers Naïve Bayes classifier A generative model

More information

ECO 745: Theory of International Economics. Jack Rossbach Fall Lecture 6

ECO 745: Theory of International Economics. Jack Rossbach Fall Lecture 6 ECO 745: Theory of International Economics Jack Rossbach Fall 2015 - Lecture 6 Review We ve covered several models of trade, but the empirics have been mixed Difficulties identifying goods with a technological

More information

Lecture 5. Optimisation. Regularisation

Lecture 5. Optimisation. Regularisation Lecture 5. Optimisation. Regularisation COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne Iterative optimisation Loss functions Coordinate

More information

The Simple Linear Regression Model ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD

The Simple Linear Regression Model ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD The Simple Linear Regression Model ECONOMETRICS (ECON 360) BEN VAN KAMMEN, PHD Outline Definition. Deriving the Estimates. Properties of the Estimates. Units of Measurement and Functional Form. Expected

More information

Evaluating and Classifying NBA Free Agents

Evaluating and Classifying NBA Free Agents Evaluating and Classifying NBA Free Agents Shanwei Yan In this project, I applied machine learning techniques to perform multiclass classification on free agents by using game statistics, which is useful

More information

Midterm Exam 1, section 2. Thursday, September hour, 15 minutes

Midterm Exam 1, section 2. Thursday, September hour, 15 minutes San Francisco State University Michael Bar ECON 312 Fall 2018 Midterm Exam 1, section 2 Thursday, September 27 1 hour, 15 minutes Name: Instructions 1. This is closed book, closed notes exam. 2. You can

More information

Neural Networks II. Chen Gao. Virginia Tech Spring 2019 ECE-5424G / CS-5824

Neural Networks II. Chen Gao. Virginia Tech Spring 2019 ECE-5424G / CS-5824 Neural Networks II Chen Gao ECE-5424G / CS-5824 Virginia Tech Spring 2019 Neural Networks Origins: Algorithms that try to mimic the brain. What is this? A single neuron in the brain Input Output Slide

More information

Attacking and defending neural networks. HU Xiaolin ( 胡晓林 ) Department of Computer Science and Technology Tsinghua University, Beijing, China

Attacking and defending neural networks. HU Xiaolin ( 胡晓林 ) Department of Computer Science and Technology Tsinghua University, Beijing, China Attacking and defending neural networks HU Xiaolin ( 胡晓林 ) Department of Computer Science and Technology Tsinghua University, Beijing, China Outline Background Attacking methods Defending methods 2 AI

More information

ISyE 6414 Regression Analysis

ISyE 6414 Regression Analysis ISyE 6414 Regression Analysis Lecture 2: More Simple linear Regression: R-squared (coefficient of variation/determination) Correlation analysis: Pearson s correlation Spearman s rank correlation Variable

More information

Lecture 10. Support Vector Machines (cont.)

Lecture 10. Support Vector Machines (cont.) Lecture 10. Support Vector Machines (cont.) COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Andrey Kan Copyright: University of Melbourne This lecture Soft margin SVM Intuition and problem

More information

Course 495: Advanced Statistical Machine Learning/Pattern Recognition

Course 495: Advanced Statistical Machine Learning/Pattern Recognition Course 495: Advanced Statistical Machine Learning/Pattern Recognition Lectures: Stefanos Zafeiriou Goal (Lectures): To present modern statistical machine learning/pattern recognition algorithms. The course

More information

Machine Learning Application in Aviation Safety

Machine Learning Application in Aviation Safety Machine Learning Application in Aviation Safety Surface Safety Metric MOR Classification Presented to: By: Date: ART Firdu Bati, PhD, FAA September, 2018 Agenda Surface Safety Metric (SSM) development

More information

Combining Experimental and Non-Experimental Design in Causal Inference

Combining Experimental and Non-Experimental Design in Causal Inference Combining Experimental and Non-Experimental Design in Causal Inference Kari Lock Morgan Department of Statistics Penn State University Rao Prize Conference May 12 th, 2017 A Tribute to Don Design trumps

More information

2 When Some or All Labels are Missing: The EM Algorithm

2 When Some or All Labels are Missing: The EM Algorithm CS769 Spring Advanced Natural Language Processing The EM Algorithm Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Given labeled examples (x, y ),..., (x l, y l ), one can build a classifier. If in addition

More information

Functions of Random Variables & Expectation, Mean and Variance

Functions of Random Variables & Expectation, Mean and Variance Functions of Random Variables & Expectation, Mean and Variance Kuan-Yu Chen ( 陳冠宇 ) @ TR-409, NTUST Functions of Random Variables 1 Given a random variables XX, one may generate other random variables

More information

Operational Risk Management: Preventive vs. Corrective Control

Operational Risk Management: Preventive vs. Corrective Control Operational Risk Management: Preventive vs. Corrective Control Yuqian Xu (UIUC) July 2018 Joint Work with Lingjiong Zhu and Michael Pinedo 1 Research Questions How to manage operational risk? How does

More information

Jasmin Smajic 1, Christian Hafner 2, Jürg Leuthold 2, March 16, 2015 Introduction to Finite Element Method (FEM) Part 1 (2-D FEM)

Jasmin Smajic 1, Christian Hafner 2, Jürg Leuthold 2, March 16, 2015 Introduction to Finite Element Method (FEM) Part 1 (2-D FEM) Jasmin Smajic 1, Christian Hafner 2, Jürg Leuthold 2, March 16, 2015 Introduction to Finite Element Method (FEM) Part 1 (2-D FEM) 1 HSR - University of Applied Sciences of Eastern Switzerland Institute

More information

BASKETBALL PREDICTION ANALYSIS OF MARCH MADNESS GAMES CHRIS TSENG YIBO WANG

BASKETBALL PREDICTION ANALYSIS OF MARCH MADNESS GAMES CHRIS TSENG YIBO WANG BASKETBALL PREDICTION ANALYSIS OF MARCH MADNESS GAMES CHRIS TSENG YIBO WANG GOAL OF PROJECT The goal is to predict the winners between college men s basketball teams competing in the 2018 (NCAA) s March

More information

Predicting NBA Shots

Predicting NBA Shots Predicting NBA Shots Brett Meehan Stanford University https://github.com/brettmeehan/cs229 Final Project bmeehan2@stanford.edu Abstract This paper examines the application of various machine learning algorithms

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 256 Introduction This procedure computes summary statistics and common non-parametric, single-sample runs tests for a series of n numeric, binary, or categorical data values. For numeric data,

More information

Predicting the Total Number of Points Scored in NFL Games

Predicting the Total Number of Points Scored in NFL Games Predicting the Total Number of Points Scored in NFL Games Max Flores (mflores7@stanford.edu), Ajay Sohmshetty (ajay14@stanford.edu) CS 229 Fall 2014 1 Introduction Predicting the outcome of National Football

More information

Imperfectly Shared Randomness in Communication

Imperfectly Shared Randomness in Communication Imperfectly Shared Randomness in Communication Madhu Sudan Harvard Joint work with Clément Canonne (Columbia), Venkatesan Guruswami (CMU) and Raghu Meka (UCLA). 11/16/2016 UofT: ISR in Communication 1

More information

A Novel Approach to Predicting the Results of NBA Matches

A Novel Approach to Predicting the Results of NBA Matches A Novel Approach to Predicting the Results of NBA Matches Omid Aryan Stanford University aryano@stanford.edu Ali Reza Sharafat Stanford University sharafat@stanford.edu Abstract The current paper presents

More information

PREDICTING the outcomes of sporting events

PREDICTING the outcomes of sporting events CS 229 FINAL PROJECT, AUTUMN 2014 1 Predicting National Basketball Association Winners Jasper Lin, Logan Short, and Vishnu Sundaresan Abstract We used National Basketball Associations box scores from 1991-1998

More information

This file is part of the following reference:

This file is part of the following reference: This file is part of the following reference: Hancock, Timothy Peter (2006) Multivariate consensus trees: tree-based clustering and profiling for mixed data types. PhD thesis, James Cook University. Access

More information

Physical Design of CMOS Integrated Circuits

Physical Design of CMOS Integrated Circuits Physical Design of CMOS Integrated Circuits Dae Hyun Kim EECS Washington State University References John P. Uyemura, Introduction to VLSI Circuits and Systems, 2002. Chapter 5 Goal Understand how to physically

More information

Support Vector Machines: Optimization of Decision Making. Christopher Katinas March 10, 2016

Support Vector Machines: Optimization of Decision Making. Christopher Katinas March 10, 2016 Support Vector Machines: Optimization of Decision Making Christopher Katinas March 10, 2016 Overview Background of Support Vector Machines Segregation Functions/Problem Statement Methodology Training/Testing

More information

Navigate to the golf data folder and make it your working directory. Load the data by typing

Navigate to the golf data folder and make it your working directory. Load the data by typing Golf Analysis 1.1 Introduction In a round, golfers have a number of choices to make. For a particular shot, is it better to use the longest club available to try to reach the green, or would it be better

More information

EE582 Physical Design Automation of VLSI Circuits and Systems

EE582 Physical Design Automation of VLSI Circuits and Systems EE Prof. Dae Hyun Kim School of Electrical Engineering and Computer Science Washington State University Routing Grid Routing Grid Routing Grid Routing Grid Routing Grid Routing Lee s algorithm (Maze routing)

More information

Estimating the Probability of Winning an NFL Game Using Random Forests

Estimating the Probability of Winning an NFL Game Using Random Forests Estimating the Probability of Winning an NFL Game Using Random Forests Dale Zimmerman February 17, 2017 2 Brian Burke s NFL win probability metric May be found at www.advancednflstats.com, but the site

More information

Distancei = BrandAi + 2 BrandBi + 3 BrandCi + i

Distancei = BrandAi + 2 BrandBi + 3 BrandCi + i . Suppose that the United States Golf Associate (USGA) wants to compare the mean distances traveled by four brands of golf balls when struck by a driver. A completely randomized design is employed with

More information

CS 7641 A (Machine Learning) Sethuraman K, Parameswaran Raman, Vijay Ramakrishnan

CS 7641 A (Machine Learning) Sethuraman K, Parameswaran Raman, Vijay Ramakrishnan CS 7641 A (Machine Learning) Sethuraman K, Parameswaran Raman, Vijay Ramakrishnan Scenario 1: Team 1 scored 200 runs from their 50 overs, and then Team 2 reaches 146 for the loss of two wickets from their

More information

Communication Amid Uncertainty

Communication Amid Uncertainty Communication Amid Uncertainty Madhu Sudan Harvard University Based on joint works with Brendan Juba, Oded Goldreich, Adam Kalai, Sanjeev Khanna, Elad Haramaty, Jacob Leshno, Clement Canonne, Venkatesan

More information

Fun Neural Net Demo Site. CS 188: Artificial Intelligence. N-Layer Neural Network. Multi-class Softmax Σ >0? Deep Learning II

Fun Neural Net Demo Site. CS 188: Artificial Intelligence. N-Layer Neural Network. Multi-class Softmax Σ >0? Deep Learning II Fun Neural Net Demo Site CS 188: Artificial Intelligence Demo-site: http://playground.tensorflow.org/ Deep Learning II Instructors: Pieter Abbeel & Anca Dragan --- University of California, Berkeley [These

More information

Projecting Three-Point Percentages for the NBA Draft

Projecting Three-Point Percentages for the NBA Draft Projecting Three-Point Percentages for the NBA Draft Hilary Sun hsun3@stanford.edu Jerold Yu jeroldyu@stanford.edu December 16, 2017 Roland Centeno rcenteno@stanford.edu 1 Introduction As NBA teams have

More information

Conservation of Energy. Chapter 7 of Essential University Physics, Richard Wolfson, 3 rd Edition

Conservation of Energy. Chapter 7 of Essential University Physics, Richard Wolfson, 3 rd Edition Conservation of Energy Chapter 7 of Essential University Physics, Richard Wolfson, 3 rd Edition 1 Different Types of Force, regarding the Work they do. gravity friction 2 Conservative Forces BB WW cccccccc

More information

Do New Bike Share Stations Increase Member Use?: A Quasi-Experimental Study

Do New Bike Share Stations Increase Member Use?: A Quasi-Experimental Study Do New Bike Share Stations Increase Member Use?: A Quasi-Experimental Study Jueyu Wang & Greg Lindsey Humphrey School of Public Affairs University of Minnesota Acknowledgement: NSF Sustainable Research

More information

Naïve Bayes. Robot Image Credit: Viktoriya Sukhanova 123RF.com

Naïve Bayes. Robot Image Credit: Viktoriya Sukhanova 123RF.com Naïve Bayes These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton s slides and grateful acknowledgement to the many others who made their course materials freely available

More information

Naïve Bayes. Robot Image Credit: Viktoriya Sukhanova 123RF.com

Naïve Bayes. Robot Image Credit: Viktoriya Sukhanova 123RF.com Naïve Bayes These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides

More information

Chapter 10 Aggregate Demand I: Building the IS LM Model

Chapter 10 Aggregate Demand I: Building the IS LM Model Chapter 10 Aggregate Demand I: Building the IS LM Model Zhengyu Cai Ph.D. Institute of Development Southwestern University of Finance and Economics All rights reserved http://www.escience.cn/people/zhengyucai/index.html

More information

B. AA228/CS238 Component

B. AA228/CS238 Component Abstract Two supervised learning methods, one employing logistic classification and another employing an artificial neural network, are used to predict the outcome of baseball postseason series, given

More information

A Class of Regression Estimator with Cum-Dual Ratio Estimator as Intercept

A Class of Regression Estimator with Cum-Dual Ratio Estimator as Intercept International Journal of Probability and Statistics 015, 4(): 4-50 DOI: 10.593/j.ijps.015040.0 A Class of Regression Estimator with Cum-Dual Ratio Estimator as Intercept F. B. Adebola 1, N. A. Adegoke

More information

Communication Amid Uncertainty

Communication Amid Uncertainty Communication Amid Uncertainty Madhu Sudan Harvard University Based on joint works with Brendan Juba, Oded Goldreich, Adam Kalai, Sanjeev Khanna, Elad Haramaty, Jacob Leshno, Clement Canonne, Venkatesan

More information

Title: Modeling Crossing Behavior of Drivers and Pedestrians at Uncontrolled Intersections and Mid-block Crossings

Title: Modeling Crossing Behavior of Drivers and Pedestrians at Uncontrolled Intersections and Mid-block Crossings Title: Modeling Crossing Behavior of Drivers and Pedestrians at Uncontrolled Intersections and Mid-block Crossings Objectives The goal of this study is to advance the state of the art in understanding

More information

Ivan Suarez Robles, Joseph Wu 1

Ivan Suarez Robles, Joseph Wu 1 Ivan Suarez Robles, Joseph Wu 1 Section 1: Introduction Mixed Martial Arts (MMA) is the fastest growing competitive sport in the world. Because the fighters engage in distinct martial art disciplines (boxing,

More information

Title: 4-Way-Stop Wait-Time Prediction Group members (1): David Held

Title: 4-Way-Stop Wait-Time Prediction Group members (1): David Held Title: 4-Way-Stop Wait-Time Prediction Group members (1): David Held As part of my research in Sebastian Thrun's autonomous driving team, my goal is to predict the wait-time for a car at a 4-way intersection.

More information

Modelling Exposure at Default Without Conversion Factors for Revolving Facilities

Modelling Exposure at Default Without Conversion Factors for Revolving Facilities Modelling Exposure at Default Without Conversion Factors for Revolving Facilities Mark Thackham Credit Scoring and Credit Control XV, Edinburgh, August 2017 1 / 27 Objective The objective of this presentation

More information

CS 221 PROJECT FINAL

CS 221 PROJECT FINAL CS 221 PROJECT FINAL STUART SY AND YUSHI HOMMA 1. INTRODUCTION OF TASK ESPN fantasy baseball is a common pastime for many Americans, which, coincidentally, defines a problem whose solution could potentially

More information

PREDICTING THE NCAA BASKETBALL TOURNAMENT WITH MACHINE LEARNING. The Ringer/Getty Images

PREDICTING THE NCAA BASKETBALL TOURNAMENT WITH MACHINE LEARNING. The Ringer/Getty Images PREDICTING THE NCAA BASKETBALL TOURNAMENT WITH MACHINE LEARNING A N D R E W L E V A N D O S K I A N D J O N A T H A N L O B O The Ringer/Getty Images THE TOURNAMENT MARCH MADNESS 68 teams (4 play-in games)

More information

Lesson 14: Modeling Relationships with a Line

Lesson 14: Modeling Relationships with a Line Exploratory Activity: Line of Best Fit Revisited 1. Use the link http://illuminations.nctm.org/activity.aspx?id=4186 to explore how the line of best fit changes depending on your data set. A. Enter any

More information

The Effect of Public Sporting Expenditures on Medal Share at the Summer Olympic Games: A Study of the Differential Impact by Sport and Gender

The Effect of Public Sporting Expenditures on Medal Share at the Summer Olympic Games: A Study of the Differential Impact by Sport and Gender Dickinson College Dickinson Scholar Student Honors Theses By Year Student Honors Theses 5-21-2017 The Effect of Public Sporting Expenditures on Medal Share at the Summer Olympic Games: A Study of the Differential

More information

Pre-Kindergarten 2017 Summer Packet. Robert F Woodall Elementary

Pre-Kindergarten 2017 Summer Packet. Robert F Woodall Elementary Pre-Kindergarten 2017 Summer Packet Robert F Woodall Elementary In the fall, on your child s testing day, please bring this packet back for a special reward that will be awarded to your child for completion

More information

Minimum Mean-Square Error (MMSE) and Linear MMSE (LMMSE) Estimation

Minimum Mean-Square Error (MMSE) and Linear MMSE (LMMSE) Estimation Minimum Mean-Square Error (MMSE) and Linear MMSE (LMMSE) Estimation Outline: MMSE estimation, Linear MMSE (LMMSE) estimation, Geometric formulation of LMMSE estimation and orthogonality principle. Reading:

More information

Legendre et al Appendices and Supplements, p. 1

Legendre et al Appendices and Supplements, p. 1 Legendre et al. 2010 Appendices and Supplements, p. 1 Appendices and Supplement to: Legendre, P., M. De Cáceres, and D. Borcard. 2010. Community surveys through space and time: testing the space-time interaction

More information

Trouble With The Curve: Improving MLB Pitch Classification

Trouble With The Curve: Improving MLB Pitch Classification Trouble With The Curve: Improving MLB Pitch Classification Michael A. Pane Samuel L. Ventura Rebecca C. Steorts A.C. Thomas arxiv:134.1756v1 [stat.ap] 5 Apr 213 April 8, 213 Abstract The PITCHf/x database

More information

ISyE 6414: Regression Analysis

ISyE 6414: Regression Analysis ISyE 6414: Regression Analysis Lectures: MWF 8:00-10:30, MRDC #2404 Early five-week session; May 14- June 15 (8:00-9:10; 10-min break; 9:20-10:30) Instructor: Dr. Yajun Mei ( YA_JUNE MAY ) Email: ymei@isye.gatech.edu;

More information

Introduction to Genetics

Introduction to Genetics Name: Introduction to Genetics Keystone Assessment Anchor: BIO.B.2.1.1: Describe and/or predict observed patterns of inheritance (i.e. dominant, recessive, co-dominance, incomplete dominance, sex-linked,

More information

Predicting Horse Racing Results with Machine Learning

Predicting Horse Racing Results with Machine Learning Predicting Horse Racing Results with Machine Learning LYU 1703 LIU YIDE 1155062194 Supervisor: Professor Michael R. Lyu Outline Recap of last semester Object of this semester Data Preparation Set to sequence

More information

Machine Learning an American Pastime

Machine Learning an American Pastime Nikhil Bhargava, Andy Fang, Peter Tseng CS 229 Paper Machine Learning an American Pastime I. Introduction Baseball has been a popular American sport that has steadily gained worldwide appreciation in the

More information

Announcements. Lecture 19: Inference for SLR & Transformations. Online quiz 7 - commonly missed questions

Announcements. Lecture 19: Inference for SLR & Transformations. Online quiz 7 - commonly missed questions Announcements Announcements Lecture 19: Inference for SLR & Statistics 101 Mine Çetinkaya-Rundel April 3, 2012 HW 7 due Thursday. Correlation guessing game - ends on April 12 at noon. Winner will be announced

More information

Mathematics of Pari-Mutuel Wagering

Mathematics of Pari-Mutuel Wagering Millersville University of Pennsylvania April 17, 2014 Project Objectives Model the horse racing process to predict the outcome of a race. Use the win and exacta betting pools to estimate probabilities

More information

Introduction to Machine Learning NPFL 054

Introduction to Machine Learning NPFL 054 Introduction to Machine Learning NPFL 054 http://ufal.mff.cuni.cz/course/npfl054 Barbora Hladká hladka@ufal.mff.cuni.cz Martin Holub holub@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and

More information

Lecture 22: Multiple Regression (Ordinary Least Squares -- OLS)

Lecture 22: Multiple Regression (Ordinary Least Squares -- OLS) Statistics 22_multiple_regression.pdf Michael Hallstone, Ph.D. hallston@hawaii.edu Lecture 22: Multiple Regression (Ordinary Least Squares -- OLS) Some Common Sense Assumptions for Multiple Regression

More information

Tie Breaking Procedure

Tie Breaking Procedure Ohio Youth Basketball Tie Breaking Procedure The higher seeded team when two teams have the same record after completion of pool play will be determined by the winner of their head to head competition.

More information

Pairwise Comparison Models: A Two-Tiered Approach to Predicting Wins and Losses for NBA Games

Pairwise Comparison Models: A Two-Tiered Approach to Predicting Wins and Losses for NBA Games Pairwise Comparison Models: A Two-Tiered Approach to Predicting Wins and Losses for NBA Games Tony Liu Introduction The broad aim of this project is to use the Bradley Terry pairwise comparison model as

More information

CT4510: Computer Graphics. Transformation BOCHANG MOON

CT4510: Computer Graphics. Transformation BOCHANG MOON CT4510: Computer Graphics Transformation BOCHANG MOON 2D Translation Transformations such as rotation and scale can be represented using a matrix M ee. gg., MM = SSSS xx = mm 11 xx + mm 12 yy yy = mm 21

More information

Use of Auxiliary Variables and Asymptotically Optimum Estimators in Double Sampling

Use of Auxiliary Variables and Asymptotically Optimum Estimators in Double Sampling International Journal of Statistics and Probability; Vol. 5, No. 3; May 2016 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education Use of Auxiliary Variables and Asymptotically

More information

Building an NFL performance metric

Building an NFL performance metric Building an NFL performance metric Seonghyun Paik (spaik1@stanford.edu) December 16, 2016 I. Introduction In current pro sports, many statistical methods are applied to evaluate player s performance and

More information

New Numerical Schemes for the Solution of Slightly Stiff Second Order Ordinary Differential Equations

New Numerical Schemes for the Solution of Slightly Stiff Second Order Ordinary Differential Equations American Journal of Computational and Applied Mathematics 24, 4(6): 239-246 DOI:.5923/j.ajcam.2446.7 New Numerical Schemes for the Solution of Slightly Stiff Second Order Ordinary Differential Equations

More information

Simulating Major League Baseball Games

Simulating Major League Baseball Games ABSTRACT Paper 2875-2018 Simulating Major League Baseball Games Justin Long, Slippery Rock University; Brad Schweitzer, Slippery Rock University; Christy Crute Ph.D, Slippery Rock University The game of

More information

New Class of Almost Unbiased Modified Ratio Cum Product Estimators with Knownparameters of Auxiliary Variables

New Class of Almost Unbiased Modified Ratio Cum Product Estimators with Knownparameters of Auxiliary Variables Journal of Mathematics and System Science 7 (017) 48-60 doi: 10.1765/159-591/017.09.00 D DAVID PUBLISHING New Class of Almost Unbiased Modified Ratio Cum Product Estimators with Knownparameters of Auxiliary

More information

Marathon Performance Prediction of Amateur Runners based on Training Session Data

Marathon Performance Prediction of Amateur Runners based on Training Session Data Marathon Performance Prediction of Amateur Runners based on Training Session Data Daniel Ruiz-Mayo, Estrella Pulido, and Gonzalo Martínez-Muñoz Escuela Politécnica Superior, Universidad Autónoma de Madrid,

More information

An agent-based methodological approach for modeling travelers behavior on modal shift: The case of inland transport in Denmark.

An agent-based methodological approach for modeling travelers behavior on modal shift: The case of inland transport in Denmark. An agent-based methodological approach for modeling travelers behavior on modal shift: The case of inland transport in Denmark ETSAP Workshop Zurich, 11 th -12 th Dec 2017 ABMoS-DK Mohammad Ahanchian PostDoc

More information

Peer Effect in Sports: A Case Study in Speed. Skating and Swimming

Peer Effect in Sports: A Case Study in Speed. Skating and Swimming Peer Effect in Sports: A Case Study in Speed Skating and Swimming Tue Nguyen Advisor: Takao Kato 1 Abstract: Peer effect is studied for its implication in the workplace, schools and sports. Sports provide

More information

A microscopic view. Solid rigid body. Liquid. Fluid. Incompressible. Gas. Fluid. compressible

A microscopic view. Solid rigid body. Liquid. Fluid. Incompressible. Gas. Fluid. compressible Hello! I m Chris Blake, your lecturer for the rest of semester We ll cover: fluid motion, thermal physics, electricity, revision MASH centre in AMDC 503-09.30-16.30 daily My consultation hours: Tues 10.30-12.30

More information

Predicting the use of the Sacrifice Bunt in Major League Baseball. Charlie Gallagher Brian Gilbert Neelay Mehta Chao Rao

Predicting the use of the Sacrifice Bunt in Major League Baseball. Charlie Gallagher Brian Gilbert Neelay Mehta Chao Rao Predicting the use of the Sacrifice Bunt in Major League Baseball Charlie Gallagher Brian Gilbert Neelay Mehta Chao Rao Understanding the Data Data from the St. Louis Cardinals Sig Mejdal, Senior Quantitative

More information

DISMAS Evaluation: Dr. Elizabeth C. McMullan. Grambling State University

DISMAS Evaluation: Dr. Elizabeth C. McMullan. Grambling State University DISMAS Evaluation 1 Running head: Project Dismas Evaluation DISMAS Evaluation: 2007 2008 Dr. Elizabeth C. McMullan Grambling State University DISMAS Evaluation 2 Abstract An offender notification project

More information

Announcements. Unit 7: Multiple Linear Regression Lecture 3: Case Study. From last lab. Predicting income

Announcements. Unit 7: Multiple Linear Regression Lecture 3: Case Study. From last lab. Predicting income Announcements Announcements Unit 7: Multiple Linear Regression Lecture 3: Case Study Statistics 101 Mine Çetinkaya-Rundel April 18, 2013 OH: Sunday: Virtual OH, 3-4pm - you ll receive an email invitation

More information

Product Decomposition in Supply Chain Planning

Product Decomposition in Supply Chain Planning Mario R. Eden, Marianthi Ierapetritou and Gavin P. Towler (Editors) Proceedings of the 13 th International Symposium on Process Systems Engineering PSE 2018 July 1-5, 2018, San Diego, California, USA 2018

More information

How To Win The Ashes A Statistician s Guide

How To Win The Ashes A Statistician s Guide How To Win The Ashes A Statistician s Guide Kevin Wang Sydney Universtiy Mathematics Society Last Modified: August 21, 2015 Motivation How come SUMS doesn t do statistics talk very often? Motivation How

More information

Parsimonious Linear Fingerprinting for Time Series

Parsimonious Linear Fingerprinting for Time Series Parsimonious Linear Fingerprinting for Time Series Lei Li, B. Aditya Prakash, Christos Faloutsos School of Computer Science Carnegie Mellon University VLDB 2010 1 L. Li, 2010 VLDB2010, 36 th International

More information

Machine Learning Methods for Climbing Route Classification

Machine Learning Methods for Climbing Route Classification Machine Learning Methods for Climbing Route Classification Alejandro Dobles Mathematics adobles@stanford.edu Juan Carlos Sarmiento Management Science & Engineering jcs10@stanford.edu Abstract Peter Satterthwaite

More information

Real-time Analysis of Industrial Gases with Online Near- Infrared Spectroscopy

Real-time Analysis of Industrial Gases with Online Near- Infrared Spectroscopy Real-time Analysis of Industrial Gases with Online Near- Infrared Branch 5 Petrochemical, Biofuels Keywords Process, NIR, Gases, Gas Cell, Ethylene, Carbon Dioxide, Propane, Butane, Acetylene Summary Near-infrared

More information

Modeling the NCAA Tournament Through Bayesian Logistic Regression

Modeling the NCAA Tournament Through Bayesian Logistic Regression Duquesne University Duquesne Scholarship Collection Electronic Theses and Dissertations 0 Modeling the NCAA Tournament Through Bayesian Logistic Regression Bryan Nelson Follow this and additional works

More information

CS 4649/7649 Robot Intelligence: Planning

CS 4649/7649 Robot Intelligence: Planning CS 4649/7649 Robot Intelligence: Planning Differential Kinematics, Probabilistic Roadmaps Sungmoon Joo School of Interactive Computing College of Computing Georgia Institute of Technology S. Joo (sungmoon.joo@cc.gatech.edu)

More information

Practical Approach to Evacuation Planning Via Network Flow and Deep Learning

Practical Approach to Evacuation Planning Via Network Flow and Deep Learning Practical Approach to Evacuation Planning Via Network Flow and Deep Learning Akira Tanaka Nozomi Hata Nariaki Tateiwa Katsuki Fujisawa Graduate School of Mathematics, Kyushu University Institute of Mathematics

More information

CS 528 Mobile and Ubiquitous Computing Lecture 7a: Applications of Activity Recognition + Machine Learning for Ubiquitous Computing.

CS 528 Mobile and Ubiquitous Computing Lecture 7a: Applications of Activity Recognition + Machine Learning for Ubiquitous Computing. CS 528 Mobile and Ubiquitous Computing Lecture 7a: Applications of Activity Recognition + Machine Learning for Ubiquitous Computing Emmanuel Agu Applications of Activity Recognition Recall: Activity Recognition

More information

Using Spatio-Temporal Data To Create A Shot Probability Model

Using Spatio-Temporal Data To Create A Shot Probability Model Using Spatio-Temporal Data To Create A Shot Probability Model Eli Shayer, Ankit Goyal, Younes Bensouda Mourri June 2, 2016 1 Introduction Basketball is an invasion sport, which means that players move

More information

Using Markov Chains to Analyze a Volleyball Rally

Using Markov Chains to Analyze a Volleyball Rally 1 Introduction Using Markov Chains to Analyze a Volleyball Rally Spencer Best Carthage College sbest@carthage.edu November 3, 212 Abstract We examine a volleyball rally between two volleyball teams. Using

More information

Predicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007

Predicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007 Predicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007 Group 6 Charles Gallagher Brian Gilbert Neelay Mehta Chao Rao Executive Summary Background When a runner is on-base

More information

Running head: DATA ANALYSIS AND INTERPRETATION 1

Running head: DATA ANALYSIS AND INTERPRETATION 1 Running head: DATA ANALYSIS AND INTERPRETATION 1 Data Analysis and Interpretation Final Project Vernon Tilly Jr. University of Central Oklahoma DATA ANALYSIS AND INTERPRETATION 2 Owners of the various

More information

Multilevel Models for Other Non-Normal Outcomes in Mplus v. 7.11

Multilevel Models for Other Non-Normal Outcomes in Mplus v. 7.11 Multilevel Models for Other Non-Normal Outcomes in Mplus v. 7.11 Study Overview: These data come from a daily diary study that followed 41 male and female college students over a six-week period to examine

More information

Lab 11: Introduction to Linear Regression

Lab 11: Introduction to Linear Regression Lab 11: Introduction to Linear Regression Batter up The movie Moneyball focuses on the quest for the secret of success in baseball. It follows a low-budget team, the Oakland Athletics, who believed that

More information

Predicting the NCAA Men s Basketball Tournament with Machine Learning

Predicting the NCAA Men s Basketball Tournament with Machine Learning Predicting the NCAA Men s Basketball Tournament with Machine Learning Andrew Levandoski and Jonathan Lobo CS 2750: Machine Learning Dr. Kovashka 25 April 2017 Abstract As the popularity of the NCAA Men

More information