CS249: ADVANCED DATA MINING
- Jasmine Susanna Taylor
Transcription
1 CS249: ADVANCED DATA MINING Linear Regression, Logistic Regression, and GLMs Instructor: Yizhou Sun April 24, 2017
2 About WWW2017 Conference
3 Turing Award Winner Sir Tim Berners-Lee
4 Course Project Announcements Team formation due this Wednesday PTE will be decided by this weekend Office hour for TA (Jae LEE): 3-5 PM on
5 Methods to Learn (by task and data type)
- Classification. Vector Data: Decision Tree; Naïve Bayes; Logistic Regression; SVM; NN. Graph & Network: Label Propagation.
- Clustering. Vector Data: K-means; hierarchical clustering; DBSCAN; Mixture Models; kernel k-means. Text Data: PLSA; LDA. Recommender System: Matrix Factorization. Graph & Network: SCAN; Spectral Clustering.
- Prediction. Vector Data: Linear Regression; GLM. Recommender System: Collaborative Filtering.
- Ranking. Graph & Network: PageRank.
- Feature Representation. Text Data: Word embedding. Graph & Network: Network embedding.
6 How to learn these algorithms? Three levels:
- When is it applicable? Input, output, strengths, weaknesses, time complexity.
- How does it work? Pseudo-code, workflows, major steps; you can work out a toy problem with pen and paper.
- Why does it work? Intuition, philosophy, objective, derivation, proof.
7 Vector Data: Regression. Outline: Vector Data; Linear Regression Model; Logistic Regression Model; Generalized Linear Model; Summary
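8 Example: a data matrix of size $n \times p$, with $n$ data objects (points) and $p$ attributes (dimensions):
$\begin{pmatrix} x_{11} & \cdots & x_{1f} & \cdots & x_{1p} \\ \vdots & & \vdots & & \vdots \\ x_{i1} & \cdots & x_{if} & \cdots & x_{ip} \\ \vdots & & \vdots & & \vdots \\ x_{n1} & \cdots & x_{nf} & \cdots & x_{np} \end{pmatrix}$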
9 Attribute Type
- Numerical: e.g., height, income
- Categorical / discrete: e.g., sex, race
10 Categorical Attribute Types
- Nominal: categories, states, or names of things. E.g., Hair_color = {auburn, black, blond, brown, grey, red, white}; marital status, occupation, ID numbers, zip codes.
- Binary: a nominal attribute with only 2 states (0 and 1). Symmetric binary: both outcomes equally important, e.g., gender. Asymmetric binary: outcomes not equally important, e.g., medical test (positive vs. negative). Convention: assign 1 to the most important outcome (e.g., HIV positive).
- Ordinal: values have a meaningful order (ranking), but the magnitude between successive values is not known. E.g., Size = {small, medium, large}, grades, army rankings.
11 Vector Data: Regression. Outline: Vector Data; Linear Regression Model; Logistic Regression Model; Generalized Linear Model; Summary
12 Linear Regression
- Ordinary Least Squares regression: closed-form solution; gradient descent
- Linear regression with probabilistic interpretation
13 The Linear Regression Problem. Any attributes to a continuous value: $x \rightarrow y$
- {age; major; gender; race} → GPA
- {income; credit score; profession} → loan
- {college; major; GPA} → future income
14 Example of House Price. Table columns: Living Area (sqft) and # of Beds (the inputs $x$); Price (1000$) (the output $y$).
15 Illustration [figure]
16 Formalization
- Data: $n$ independent data objects, with $y_i$, $i = 1, \dots, n$, and $x_i = (x_{i1}, x_{i2}, \dots, x_{ip})^T$, $i = 1, \dots, n$.
- A constant factor is added to model the bias term, i.e., $x_{i0} = 1$. New $x$: $x_i = (x_{i0}, x_{i1}, x_{i2}, \dots, x_{ip})^T$.
- Model: $y$ is the dependent variable, $x$ the explanatory variables, and $\beta = (\beta_0, \beta_1, \dots, \beta_p)^T$ the weight vector:
$y = x^T \beta = \beta_0 + x_1 \beta_1 + x_2 \beta_2 + \cdots + x_p \beta_p$
17 A 3-step Process
- Model Construction: use training data to find the best parameter $\beta$, denoted as $\hat{\beta}$.
- Model Selection: use validation data to select the best model, e.g., feature selection.
- Model Usage: apply the model to the unseen data (test data): $\hat{y} = x^T \hat{\beta}$.
18 Least Square Estimation
- Cost function (total square error): $J(\beta) = \frac{1}{2} \sum_i (x_i^T \beta - y_i)^2$
- Matrix form: $J(\beta) = (X\beta - y)^T (X\beta - y)/2$, or $\|X\beta - y\|^2 / 2$
- $X$: $n \times (p+1)$ matrix whose $i$-th row is $(1, x_{i1}, \dots, x_{if}, \dots, x_{ip})$
- $y$: $n \times 1$ vector $(y_1, \dots, y_i, \dots, y_n)^T$
19 Ordinary Least Squares (OLS)
- Goal: find $\hat{\beta}$ that minimizes $J(\beta)$, where
$J(\beta) = \frac{1}{2} (X\beta - y)^T (X\beta - y) = \frac{1}{2} (\beta^T X^T X \beta - y^T X \beta - \beta^T X^T y + y^T y)$
- Ordinary least squares: set the first derivative of $J(\beta)$ to 0:
$\frac{\partial J}{\partial \beta} = \beta^T X^T X - y^T X = 0 \;\Rightarrow\; \hat{\beta} = (X^T X)^{-1} X^T y$
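The closed-form solution on slide 19 can be checked numerically. The following is a minimal NumPy sketch, not part of the original slides: the function name `fit_ols` and the toy data (generated without noise from y = 1 + 2*x1 + 3*x2) are assumptions for illustration. It solves the normal equations rather than forming the inverse explicitly, which yields the same estimate whenever X^T X is invertible.

```python
import numpy as np

def fit_ols(X, y):
    """Return beta_hat solving (X^T X) beta = X^T y, after adding a bias column."""
    Xb = np.column_stack([np.ones(len(X)), X])   # bias column x_{i0} = 1
    # Solving the linear system is equivalent to (X^T X)^{-1} X^T y.
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Hypothetical toy data: y = 1 + 2*x1 + 3*x2, no noise.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

beta_hat = fit_ols(X, y)
print(beta_hat)   # estimates of (beta_0, beta_1, beta_2)
```

Because the toy data are noise-free, the recovered coefficients should match the generating ones up to floating-point error.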
20 Gradient Descent: minimize the cost function by moving down in the steepest direction.
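21 Batch Gradient Descent
- Move in the direction of steepest descent.
- Repeat until convergence { $\beta^{(t+1)} := \beta^{(t)} - \eta \left. \frac{\partial J}{\partial \beta} \right|_{\beta = \beta^{(t)}}$ }, e.g., $\eta = 0.01$
- where $J(\beta) = \frac{1}{2} \sum_i (x_i^T \beta - y_i)^2 = \sum_i J_i(\beta)$ and $\frac{\partial J}{\partial \beta} = \sum_i \frac{\partial J_i}{\partial \beta} = \sum_i x_i (x_i^T \beta - y_i)$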
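The batch update on slide 21 can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the course's implementation: the function name `batch_gradient_descent`, the fixed iteration count (used here instead of a convergence test), and the noise-free toy data are all assumptions.

```python
import numpy as np

def batch_gradient_descent(Xb, y, eta=0.01, n_iters=5000):
    """Minimize J(beta) = 1/2 * sum_i (x_i^T beta - y_i)^2 with a fixed step size."""
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        grad = Xb.T @ (Xb @ beta - y)   # sum_i x_i (x_i^T beta - y_i)
        beta = beta - eta * grad        # step against the gradient
    return beta

# Hypothetical toy data with an explicit bias column: y = 1 + 2*x1 + 3*x2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
Xb = np.column_stack([np.ones(len(X)), X])
y = Xb @ np.array([1.0, 2.0, 3.0])

beta_gd = batch_gradient_descent(Xb, y)
print(beta_gd)
```

The step size matters: if eta is too large relative to the largest eigenvalue of X^T X, the iterates diverge, so eta = 0.01 is a choice that suits this small example rather than a general recommendation.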
22 Stochastic Gradient Descent
- When a new observation, $i$, comes in, update the weight immediately (extremely useful for large-scale datasets):
Repeat { for i = 1:n { $\beta^{(t+1)} := \beta^{(t)} + \eta (y_i - x_i^T \beta^{(t)}) x_i$ } }
- If the prediction for object $i$ is smaller than the real value, $\beta$ should move forward in the direction of $x_i$.
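The per-observation update on slide 22 might look as follows in NumPy. This is a sketch under stated assumptions: the function name `sgd`, the fixed number of passes over the data, the in-order visiting of observations, and the noise-free toy data are all hypothetical choices, not taken from the slides.

```python
import numpy as np

def sgd(Xb, y, eta=0.01, n_epochs=3000):
    """Update beta after every single observation instead of after a full pass."""
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_epochs):
        for x_i, y_i in zip(Xb, y):
            # If the prediction x_i^T beta is below y_i, beta moves toward x_i.
            beta = beta + eta * (y_i - x_i @ beta) * x_i
    return beta

# Hypothetical toy data with an explicit bias column: y = 1 + 2*x1 + 3*x2.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
Xb = np.column_stack([np.ones(len(X)), X])
y = Xb @ np.array([1.0, 2.0, 3.0])

beta_sgd = sgd(Xb, y)
print(beta_sgd)
```

With noisy data a constant step size would leave the iterates fluctuating around the minimizer; in practice the observations are usually shuffled and the step size decayed, which this sketch omits.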
23 Other Practical Issues
- What if $X^T X$ is not invertible? Add a small portion of the identity matrix, $\lambda I$, to it (ridge regression*).
- What if some attributes are categorical? Set dummy variables, e.g., $x = 1$ if sex = F; $x = 0$ if sex = M.
- Nominal variable with multiple values? Create more dummy variables for one variable.
- What if non-linear correlation exists? Transform features, say, $x$ to $x^2$.
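Two of the fixes on slide 23 can be illustrated concretely. The sketch below is an assumption-laden example, not the course's code: `fit_ridge` and `dummy_code` are hypothetical helper names, the ridge term is added to every diagonal entry (including the bias) exactly as the slide's $X^T X + \lambda I$ suggests, and the data are made up so that two columns are perfectly collinear.

```python
import numpy as np

def fit_ridge(Xb, y, lam=1e-3):
    """Solve (X^T X + lam * I) beta = X^T y; works even if X^T X is singular."""
    n_cols = Xb.shape[1]
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(n_cols), Xb.T @ y)

def dummy_code(values, categories):
    """One 0/1 column per category except the first, which is the reference."""
    return np.array([[1.0 if v == c else 0.0 for c in categories[1:]]
                     for v in values])

# Hypothetical collinear design: the third column is twice the second.
x = np.arange(5.0)
Xb = np.column_stack([np.ones(5), x, 2.0 * x])
y = 1.0 + 5.0 * x
rank = np.linalg.matrix_rank(Xb.T @ Xb)      # smaller than 3, so no inverse
beta_ridge = fit_ridge(Xb, y)

sex = dummy_code(["F", "M", "F"], ["M", "F"])              # x = 1 if F, 0 if M
hair = dummy_code(["red", "black", "blond"], ["black", "blond", "red"])
print(rank, beta_ridge, sex.ravel(), hair)
```

Dropping one category as the reference keeps the dummy columns from summing to the bias column, which would itself make $X^T X$ singular.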
24 Probabilistic Interpretation. Review of the normal distribution: $X \sim N(\mu, \sigma^2)$, with density
$f(X = x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}$
25 Probabilistic Interpretation
- Model: $y_i = x_i^T \beta + \varepsilon_i$, with $\varepsilon_i \sim N(0, \sigma^2)$, so $y_i \mid x_i, \beta \sim N(x_i^T \beta, \sigma^2)$ and $E(y_i \mid x_i, \beta) = x_i^T \beta$.
- Likelihood: $L(\beta) = \prod_i p(y_i \mid x_i, \beta) = \prod_i \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i - x_i^T \beta)^2}{2\sigma^2} \right\}$
- Maximum Likelihood Estimation: find $\hat{\beta}$ that maximizes $L(\beta)$. Since $\arg\max L = \arg\min J$, this is equivalent to OLS.
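The step from the Gaussian likelihood to the OLS cost on slide 25 can be written out explicitly. The derivation below uses only the model stated on that slide and the cost $J(\beta) = \frac{1}{2} \sum_i (x_i^T \beta - y_i)^2$ from slide 18; $n$ is the number of data objects.

```latex
\begin{aligned}
\log L(\beta)
  &= \sum_{i=1}^{n} \left[ -\tfrac{1}{2} \log(2\pi\sigma^2)
       - \frac{(y_i - x_i^T \beta)^2}{2\sigma^2} \right] \\
  &= -\tfrac{n}{2} \log(2\pi\sigma^2) - \frac{1}{\sigma^2} J(\beta)
\end{aligned}
```

The first term and the factor $1/\sigma^2$ do not depend on $\beta$, so maximizing $\log L(\beta)$ is the same as minimizing $J(\beta)$.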
26 Vector Data: Regression. Outline: Vector Data; Linear Regression Model; Logistic Regression Model; Generalized Linear Model; Summary
27 Linear Regression vs. Logistic Regression
- Linear regression (prediction): $Y$ is a continuous value in $(-\infty, +\infty)$; $Y = x^T \beta = \beta_0 + x_1 \beta_1 + x_2 \beta_2 + \cdots + x_p \beta_p$; $Y \mid x, \beta \sim N(x^T \beta, \sigma^2)$.
- Logistic regression (classification): $Y$ is a discrete value from $m$ classes; $p(Y = C_j) \in [0, 1]$ and $\sum_j p(Y = C_j) = 1$.
28 Logistic Function. Logistic function / sigmoid function: $\sigma(x) = \frac{1}{1 + e^{-x}}$. Note: $\sigma'(x) = \sigma(x)(1 - \sigma(x))$.
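The derivative identity noted on slide 28 is easy to confirm numerically. The following sketch compares a central finite difference with $\sigma(x)(1 - \sigma(x))$; the grid of test points and the step `h` are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

xs = np.linspace(-5.0, 5.0, 11)
h = 1e-6

# Central finite-difference approximation of the derivative.
numeric = (sigmoid(xs + h) - sigmoid(xs - h)) / (2.0 * h)
# Closed form from the slide: sigma'(x) = sigma(x) * (1 - sigma(x)).
analytic = sigmoid(xs) * (1.0 - sigmoid(xs))

print(np.max(np.abs(numeric - analytic)))
```

This identity is what makes the gradient of the logistic log likelihood collapse to a simple residual form.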
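29 Modeling Probabilities of Two Classes
- $P(Y = 1 \mid X, \beta) = \sigma(X^T \beta) = \frac{1}{1 + \exp\{-X^T \beta\}} = \frac{\exp\{X^T \beta\}}{1 + \exp\{X^T \beta\}}$
- $P(Y = 0 \mid X, \beta) = 1 - \sigma(X^T \beta) = \frac{\exp\{-X^T \beta\}}{1 + \exp\{-X^T \beta\}} = \frac{1}{1 + \exp\{X^T \beta\}}$
- $\beta = (\beta_0, \beta_1, \dots, \beta_p)^T$
- In other words, $Y \mid X, \beta \sim \text{Bernoulli}(\sigma(X^T \beta))$.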
30 The 1-d Situation: $P(Y = 1 \mid x, \beta_0, \beta_1) = \sigma(\beta_1 x + \beta_0)$. [Figure: Prob(Y = 1 | x) plotted against x for several settings of $(\beta_0, \beta_1)$.]
31 Example. Q: What is $\beta_0$ here?
32 MLE estimation. Parameter Estimation
- Given a dataset $D$ with $n$ data points.
- For a single data object with attributes $x_i$ and class label $y_i$, let $p_i = p(Y = 1 \mid x_i, \beta)$, the probability of $i$ being in class 1.
- The probability of observing $y_i$ would be: $p_i$ if $y_i = 1$; $1 - p_i$ if $y_i = 0$.
- Combining the two cases: $p_i^{y_i} (1 - p_i)^{1 - y_i}$
- $L = \prod_i p_i^{y_i} (1 - p_i)^{1 - y_i} = \prod_i \left( \frac{\exp\{x_i^T \beta\}}{1 + \exp\{x_i^T \beta\}} \right)^{y_i} \left( \frac{1}{1 + \exp\{x_i^T \beta\}} \right)^{1 - y_i}$
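33 Optimization
- Equivalent to maximizing the log likelihood: $L = \sum_i \left[ y_i x_i^T \beta - \log(1 + \exp\{x_i^T \beta\}) \right]$
- Gradient ascent update: $\beta^{new} = \beta^{old} + \eta \frac{\partial L(\beta)}{\partial \beta}$, where $\eta$ is the step size.
- Newton-Raphson update: $\beta^{new} = \beta^{old} - \left( \frac{\partial^2 L(\beta)}{\partial \beta \, \partial \beta^T} \right)^{-1} \frac{\partial L(\beta)}{\partial \beta}$, where the derivatives are evaluated at $\beta^{old}$.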
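The gradient ascent update on slide 33 can be tried on a small example. The sketch below is illustrative only: the helper names, the step size 0.1, the iteration count, and the six-point toy dataset (chosen so that the classes overlap and a finite maximizer exists) are assumptions. The gradient used, $X^T (y - p)$ with $p_i = \sigma(x_i^T \beta)$, follows from differentiating the log likelihood stated on the slide.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(Xb, y, beta):
    z = Xb @ beta
    # sum_i [ y_i x_i^T beta - log(1 + exp(x_i^T beta)) ], computed stably
    return np.sum(y * z - np.logaddexp(0.0, z))

def gradient(Xb, y, beta):
    # sum_i x_i (y_i - p_i), with p_i = sigmoid(x_i^T beta)
    return Xb.T @ (y - sigmoid(Xb @ beta))

def gradient_ascent(Xb, y, step=0.1, n_iters=5000):
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        beta = beta + step * gradient(Xb, y, beta)
    return beta

# Hypothetical 1-d data with overlapping classes, plus a bias column.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
Xb = np.column_stack([np.ones_like(x), x])

beta_hat = gradient_ascent(Xb, y)
print(beta_hat, log_likelihood(Xb, y, beta_hat))
```

If the two classes were perfectly separable, the log likelihood would have no finite maximizer and the weights would keep growing, which is why the toy labels overlap.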
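34 First Derivative: $\frac{\partial L(\beta)}{\partial \beta_j} = \sum_i x_{ij} \left( y_i - p(x_i; \beta) \right)$, for $j = 0, 1, \dots, p$, where $p(x_i; \beta) = p(Y = 1 \mid x_i, \beta)$.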
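35 Second Derivative: it is a $(p+1)$ by $(p+1)$ matrix, the Hessian matrix, with the entry in the $j$th row and $n$th column given by $\frac{\partial^2 L(\beta)}{\partial \beta_j \, \partial \beta_n} = -\sum_i x_{ij} x_{in} \, p(x_i; \beta) \left( 1 - p(x_i; \beta) \right)$.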
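A Newton-Raphson iteration for logistic regression can be sketched from the first and second derivatives of the log likelihood. This is a hedged illustration, not the lecture's code: it assumes the standard forms gradient $= X^T (y - p)$ and Hessian $= -X^T W X$ with $W = \text{diag}(p_i (1 - p_i))$, and the function name, iteration count, and toy data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(Xb, y, n_iters=25):
    """Newton-Raphson: beta_new = beta_old - H^{-1} g, both evaluated at beta_old."""
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        p = sigmoid(Xb @ beta)
        grad = Xb.T @ (y - p)                   # first derivative, length p+1
        hess = -(Xb.T * (p * (1.0 - p))) @ Xb   # (p+1) x (p+1) Hessian, -X^T W X
        beta = beta - np.linalg.solve(hess, grad)
    return beta

# Hypothetical 1-d data with overlapping classes, plus a bias column.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
Xb = np.column_stack([np.ones_like(x), x])

beta_newton = newton_logistic(Xb, y)
final_grad = Xb.T @ (y - sigmoid(Xb @ beta_newton))
print(beta_newton, final_grad)
```

Newton-Raphson typically needs far fewer iterations than gradient ascent because it uses curvature, at the cost of solving a $(p+1) \times (p+1)$ linear system per step.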
36 What about Multiclass Classification? It is easy to handle under logistic regression, say with $M$ classes:
- $P(Y = j \mid X) = \frac{\exp\{X^T \beta_j\}}{1 + \sum_{m=1}^{M-1} \exp\{X^T \beta_m\}}$, for $j = 1, \dots, M-1$
- $P(Y = M \mid X) = \frac{1}{1 + \sum_{m=1}^{M-1} \exp\{X^T \beta_m\}}$
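The multiclass probabilities on slide 36 can be computed directly. In the sketch below, `multiclass_probs` is a hypothetical helper, the input vector is assumed to already contain the constant 1 for the bias, and the weight values are made up; class M acts as the reference class with no weight vector of its own.

```python
import numpy as np

def multiclass_probs(x, betas):
    """betas has shape (M-1, p+1): one weight vector per non-reference class."""
    scores = np.exp(betas @ x)          # exp(x^T beta_m) for m = 1..M-1
    denom = 1.0 + scores.sum()
    # Classes 1..M-1 first, then the reference class M.
    return np.append(scores / denom, 1.0 / denom)

x = np.array([1.0, 0.5, -1.0])                    # (bias, x_1, x_2)
betas = np.array([[0.2, 1.0, -0.5],
                  [-0.3, 0.4, 0.8]])              # hypothetical weights, M = 3

probs = multiclass_probs(x, betas)
uniform = multiclass_probs(x, np.zeros((2, 3)))   # all-zero weights
print(probs, uniform)
```

With all weights at zero every class receives probability 1/M, and with M = 2 the formula reduces to the two-class sigmoid model.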
37 Vector Data: Regression. Outline: Vector Data; Linear Regression Model; Logistic Regression Model; Generalized Linear Model; Summary
38 Recall Linear Regression and Logistic Regression
- Linear regression: $y \mid x, \beta \sim N(x^T \beta, \sigma^2)$
- Logistic regression: $y \mid x, \beta \sim \text{Bernoulli}(\sigma(x^T \beta))$
- How about other distributions? Yes, as long as they belong to the exponential family.
39 Exponential Family. Canonical form: $p(y; \eta) = b(y) \exp\left( \eta^T T(y) - a(\eta) \right)$
- $\eta$: natural parameter
- $T(y)$: sufficient statistic
- $a(\eta)$: log partition function, for normalization
- $b(y)$: a function that depends only on $y$
40 Examples of Exponential Family. Many distributions can be written as $p(y; \eta) = b(y) \exp\left( \eta^T T(y) - a(\eta) \right)$: Gaussian, Bernoulli, Poisson, beta, Dirichlet, categorical, ... Worked cases: Gaussian (not interested in $\sigma$) and Bernoulli.
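The two worked cases named on slide 40 can be put into canonical form as follows. This is a standard derivation supplied for illustration rather than copied from the slide; the symbol $\phi$ for the Bernoulli mean and the choice $\sigma^2 = 1$ for the Gaussian (matching "not interested in $\sigma$") are assumptions.

```latex
\begin{aligned}
\text{Bernoulli: } p(y; \phi) &= \phi^{y} (1 - \phi)^{1 - y}
   = \exp\left( y \log\frac{\phi}{1 - \phi} + \log(1 - \phi) \right), \\
 &\quad \eta = \log\frac{\phi}{1 - \phi}, \quad T(y) = y, \quad
   a(\eta) = \log(1 + e^{\eta}), \quad b(y) = 1, \\[4pt]
\text{Gaussian } (\sigma^2 = 1)\text{: } p(y; \mu)
  &= \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{(y - \mu)^2}{2} \right)
   = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{y^2}{2} \right)
     \exp\left( \mu y - \frac{\mu^2}{2} \right), \\
 &\quad \eta = \mu, \quad T(y) = y, \quad a(\eta) = \frac{\eta^2}{2}, \quad
   b(y) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{y^2}{2} \right).
\end{aligned}
```

Inverting the Bernoulli natural parameter gives $\phi = 1 / (1 + e^{-\eta}) = \sigma(\eta)$, which is where the sigmoid in logistic regression comes from.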
41 Recipe of GLMs
- Determine a distribution for $y$, e.g., Gaussian, Bernoulli, Poisson.
- Form the linear predictor for $\eta$: $\eta = x^T \beta$.
- Determine a link function: $\mu = g^{-1}(\eta)$. It connects the linear predictor to the mean of the distribution, e.g., $\mu = \eta$ for Gaussian, $\mu = \sigma(\eta)$ for Bernoulli, $\mu = \exp(\eta)$ for Poisson.
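The recipe on slide 41 can be sketched as one fitting routine that differs only in the inverse link. The sketch below is an illustration under stated assumptions: it relies on the fact that, for these three canonical links (with unit variance in the Gaussian case), the gradient of the log likelihood is $X^T (y - \mu)$; the function name `fit_glm`, the step size, the iteration count, and the toy counts are hypothetical. The step size is called `step` to avoid confusion with the natural parameter $\eta$.

```python
import numpy as np

def fit_glm(Xb, y, inv_link, step=0.01, n_iters=5000):
    """Gradient ascent on the log likelihood of a canonical-link GLM."""
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        mu = inv_link(Xb @ beta)              # mu = g^{-1}(eta), eta = x^T beta
        beta = beta + step * (Xb.T @ (y - mu))
    return beta

identity = lambda eta: eta                          # Gaussian
sigmoid = lambda eta: 1.0 / (1.0 + np.exp(-eta))    # Bernoulli
poisson_mean = np.exp                               # Poisson

x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
Xb = np.column_stack([np.ones_like(x), x])

# Gaussian case on noise-free data y = 1 + 2x: recovers the OLS solution.
beta_gauss = fit_glm(Xb, 1.0 + 2.0 * x, identity)

# Poisson case on hypothetical count data.
counts = np.array([1.0, 2.0, 2.0, 4.0, 7.0])
beta_pois = fit_glm(Xb, counts, poisson_mean)
score = Xb.T @ (counts - np.exp(Xb @ beta_pois))    # should be close to zero
print(beta_gauss, beta_pois, score)
```

Passing `sigmoid` with 0/1 labels gives logistic regression from the same routine, which is the point of the recipe: the distribution changes the inverse link, not the structure of the update.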
42 Vector Data: Regression. Outline: Vector Data; Linear Regression Model; Logistic Regression Model; Generalized Linear Model; Summary
43 Summary
- What is vector data? Attribute types
- Linear regression: OLS, probabilistic interpretation
- Logistic regression: sigmoid function, multiclass classification
- Generalized linear model: exponential family, link function
More informationReal-time Analysis of Industrial Gases with Online Near- Infrared Spectroscopy
Real-time Analysis of Industrial Gases with Online Near- Infrared Branch 5 Petrochemical, Biofuels Keywords Process, NIR, Gases, Gas Cell, Ethylene, Carbon Dioxide, Propane, Butane, Acetylene Summary Near-infrared
More informationModeling the NCAA Tournament Through Bayesian Logistic Regression
Duquesne University Duquesne Scholarship Collection Electronic Theses and Dissertations 0 Modeling the NCAA Tournament Through Bayesian Logistic Regression Bryan Nelson Follow this and additional works
More informationCS 4649/7649 Robot Intelligence: Planning
CS 4649/7649 Robot Intelligence: Planning Differential Kinematics, Probabilistic Roadmaps Sungmoon Joo School of Interactive Computing College of Computing Georgia Institute of Technology S. Joo (sungmoon.joo@cc.gatech.edu)
More informationPractical Approach to Evacuation Planning Via Network Flow and Deep Learning
Practical Approach to Evacuation Planning Via Network Flow and Deep Learning Akira Tanaka Nozomi Hata Nariaki Tateiwa Katsuki Fujisawa Graduate School of Mathematics, Kyushu University Institute of Mathematics
More informationCS 528 Mobile and Ubiquitous Computing Lecture 7a: Applications of Activity Recognition + Machine Learning for Ubiquitous Computing.
CS 528 Mobile and Ubiquitous Computing Lecture 7a: Applications of Activity Recognition + Machine Learning for Ubiquitous Computing Emmanuel Agu Applications of Activity Recognition Recall: Activity Recognition
More informationUsing Spatio-Temporal Data To Create A Shot Probability Model
Using Spatio-Temporal Data To Create A Shot Probability Model Eli Shayer, Ankit Goyal, Younes Bensouda Mourri June 2, 2016 1 Introduction Basketball is an invasion sport, which means that players move
More informationUsing Markov Chains to Analyze a Volleyball Rally
1 Introduction Using Markov Chains to Analyze a Volleyball Rally Spencer Best Carthage College sbest@carthage.edu November 3, 212 Abstract We examine a volleyball rally between two volleyball teams. Using
More informationPredicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007
Predicting the use of the sacrifice bunt in Major League Baseball BUDT 714 May 10, 2007 Group 6 Charles Gallagher Brian Gilbert Neelay Mehta Chao Rao Executive Summary Background When a runner is on-base
More informationRunning head: DATA ANALYSIS AND INTERPRETATION 1
Running head: DATA ANALYSIS AND INTERPRETATION 1 Data Analysis and Interpretation Final Project Vernon Tilly Jr. University of Central Oklahoma DATA ANALYSIS AND INTERPRETATION 2 Owners of the various
More informationMultilevel Models for Other Non-Normal Outcomes in Mplus v. 7.11
Multilevel Models for Other Non-Normal Outcomes in Mplus v. 7.11 Study Overview: These data come from a daily diary study that followed 41 male and female college students over a six-week period to examine
More informationLab 11: Introduction to Linear Regression
Lab 11: Introduction to Linear Regression Batter up The movie Moneyball focuses on the quest for the secret of success in baseball. It follows a low-budget team, the Oakland Athletics, who believed that
More informationPredicting the NCAA Men s Basketball Tournament with Machine Learning
Predicting the NCAA Men s Basketball Tournament with Machine Learning Andrew Levandoski and Jonathan Lobo CS 2750: Machine Learning Dr. Kovashka 25 April 2017 Abstract As the popularity of the NCAA Men
More information