A SPATIAL ANALYSIS AND COMPARISON OF NBA PLAYERS

KELLIN RUMSEY

Introduction

The 2016 National Basketball Association championship featured two of the league's biggest names. The Golden State Warriors' Stephen Curry, who won the league's Most Valuable Player award in 2015 and 2016, found himself matched up against the Cleveland Cavaliers and LeBron James, a superstar who has won the MVP award four times in his illustrious career. One of the most intriguing aspects of the matchup was the drastically different playstyles of the two players. James is nearly half a foot taller and 50 pounds heavier than Curry, so their preferred scoring methods contrast sharply. In this paper, we explore this difference in shot selection and efficiency through the use of spatial statistics. Using point pattern analysis, we fit Non-Homogeneous Poisson Processes (NHPP) to compare shot intensity, and we use spatial logistic regression for the point-referenced data to fit a predictive surface for the efficiency of each player.

Intensity Analysis

The event intensity of a spatial point pattern at a location x is the mean number of events that occur over a unit area centered at x. If the process is homogeneous, we denote by λ the constant intensity across all locations. If the process is non-homogeneous, we let the intensity vary from location $x_i$ to $x_j$:

\[ \lambda(x) = \sum_i \frac{1}{h^2}\, \kappa\!\left( \frac{x - x_i}{h} \right) \Big/ q(\|x\|), \tag{1} \]

where κ is a kernel function, h is the bandwidth, and q is an edge-correction term.

First we should verify that the point process is indeed non-homogeneous. Figure 1 provides a side-by-side comparison of 2958 shots for each player during the 2016 NBA season. Although we are not technically treating the points as marked (it was attempted, but the results were uninteresting), the green points are shots that were made and the black points are shots that were missed. From Figure 1, we see a couple of immediate differences. As expected, Curry shoots from farther away more frequently than James, and James has a higher intensity of shots near the basket.
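The kernel estimate in equation (1) can be sketched in a few lines. This is only an illustration under stated assumptions: the kernel κ is taken to be a standard bivariate Gaussian, the edge correction q is set to 1 (that is, ignored), and the shot coordinates and bandwidth are made up rather than taken from the paper's data.

```python
import math

def gaussian_kernel(dx, dy):
    """Standard bivariate Gaussian kernel evaluated at the offset (dx, dy)."""
    return math.exp(-0.5 * (dx * dx + dy * dy)) / (2.0 * math.pi)

def kernel_intensity(x, y, shots, h):
    """Uncorrected kernel estimate: sum_i h^-2 * kappa((x - x_i) / h).

    Assumption: the edge-correction term q is taken to be 1.
    """
    total = 0.0
    for sx, sy in shots:
        total += gaussian_kernel((x - sx) / h, (y - sy) / h) / (h * h)
    return total

# Hypothetical shot locations (basket at the origin); not the paper's data.
shots = [(0.0, 1.0), (1.0, 0.5), (-0.5, 0.8), (20.0, 12.0), (-18.0, 15.0)]
near_basket = kernel_intensity(0.0, 1.0, shots, h=2.0)
mid_range = kernel_intensity(10.0, 10.0, shots, h=2.0)
print(near_basket, mid_range)
```

With three of the five hypothetical shots close to the basket, the estimate near the origin is far larger than the estimate in the empty mid-range region, which is the kind of contrast the shot charts display.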
Figure 1. Shot Charts

The process is clearly clustered if the window is taken to be the entire court. A more interesting case is to consider the polygon that traces the outline of the bulk of the shots. This polygon contains approximately 94% of the points for Curry and 99% of the points for James. The K-functions, G-functions and F-functions for the points contained in this region are plotted in Figures 2, 3 and 4. The evidence of clustering is far weaker than when we consider the entire court, but it exists nonetheless.

Figure 2. Ripley's K-Functions
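To make the clustering diagnostic concrete, here is a minimal sketch of a naive Ripley's K estimate compared against its value under complete spatial randomness, K(r) = πr². It assumes no edge correction and a known window area, and the point coordinates are hypothetical; it is not the routine used to produce Figure 2.

```python
import math

def ripley_k(points, r, area):
    """Naive estimate of Ripley's K at radius r, with no edge correction.

    K(r) = area / (n * (n - 1)) * number of ordered pairs (i, j), i != j,
    whose distance is at most r.
    """
    n = len(points)
    close_pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                close_pairs += 1
    return area * close_pairs / (n * (n - 1))

def csr_k(r):
    """K-function of a homogeneous Poisson process (complete spatial randomness)."""
    return math.pi * r * r

# Hypothetical tight cluster inside a 10 x 10 window; not the paper's data.
cluster = [(5.0, 5.0), (5.2, 5.1), (4.9, 5.3), (5.1, 4.8)]
k_hat = ripley_k(cluster, r=1.0, area=100.0)
print(k_hat, csr_k(1.0))
```

An estimate well above πr² at a given radius points to clustering at that scale, while an estimate below it points to regularity.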

Figure 3. G-Functions

Figure 4. F-Functions

It is interesting to note that the clustering appears to be more evident in the case of LeBron James. A possible explanation is that, due to his size, he is able to shoot close to the basket more often. Curry is much smaller and must shoot whenever he has open space, leading to a slightly more homogeneous pattern.

Now we are ready to fit the NHPPs for each player. The densities of these fits are plotted in Figure 5, overlaid with the points and contour plots.

Figure 5. Non-Homogeneous Point Pattern Densities

From Figure 5, we can make a couple of interesting observations. First, we see that the maximum intensity value for James is almost double Curry's. While both players have their highest shot intensity at the origin, James shoots there far more frequently. Curry's intensity is also higher at the three-point line than James's, with non-zero intensity significantly farther out. We also notice that James appears to prefer to shoot from the left side of the court as he moves farther out. We have also identified a small sweet spot for Curry on the right side inside the three-point line, where the contour level is 2. With the exception of this one contour, the intensity of shots for both players appears to decrease monotonically with distance from the basket.

Next we fit predictive models and analyze the spatial component of the residuals. We try the following models for λ(x, y).

(1) Linear: $\lambda(x, y) = \exp(\beta_0 + \beta_1 x + \beta_2 y)$
(2) Linear with interaction: $\lambda(x, y) = \exp(\beta_0 + \beta_1 x + \beta_2 y + \beta_3 xy)$
(3) Quadratic: $\lambda(x, y) = \exp(\beta_0 + \beta_1 x + \beta_2 y + \beta_3 xy + \beta_4 x^2 + \beta_5 y^2)$
(4) Distance from basket: $\lambda(x, y) = \exp\left(\beta_0 + \beta_1 \sqrt{x^2 + y^2}\right)$

It turns out, however, that model (2) is not very interesting and looks essentially the same as model (1). Hence Figures 6 and 7 provide predictive surfaces for the linear, quadratic and distance-based models.
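The four candidate models differ only in which terms enter the exponent, which a short sketch can make explicit. The function names and coefficient values below are hypothetical, chosen only to show the shape of each model; the distance model is written with an intercept and a single slope on the distance, matching the form of the fitted equation for James later in the paper.

```python
import math

def linear_terms(x, y):
    return [1.0, x, y]

def interaction_terms(x, y):
    return [1.0, x, y, x * y]

def quadratic_terms(x, y):
    return [1.0, x, y, x * y, x * x, y * y]

def distance_terms(x, y):
    # Distance from the basket, which sits at the origin.
    return [1.0, math.hypot(x, y)]

def intensity(beta, terms):
    """Log-linear intensity: lambda = exp(sum_k beta_k * term_k)."""
    return math.exp(sum(b * t for b, t in zip(beta, terms)))

# Hypothetical coefficients; not fitted values from the paper.
lam_linear = intensity([1.0, 0.01, -0.05], linear_terms(3.0, 4.0))
lam_distance = intensity([1.0, -0.1], distance_terms(3.0, 4.0))
print(lam_linear, lam_distance)
```

Because every model is log-linear in its terms, fitting any of them amounts to estimating the coefficient vector for the corresponding list of terms.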

Figure 6. Stephen Curry - Predictive Surfaces

Figure 7. LeBron James - Predictive Surfaces

It is clear that the linear model is not very effective, although it does seem to imply (consistent with our previous comments) that Stephen Curry has a slight preference for the right side and LeBron James has a slight preference for the left side of the court. The distance-based model and the quadratic model look fairly similar, but at first glance it appears that the quadratic model is performing slightly better. Here we turn to the standard diagnostic of checking residuals.

Figure 8. Stephen Curry - Smoothed Residuals

Figure 9. LeBron James - Smoothed Residuals

Abandoning the linear model, let us compare the distance-based and quadratic models. For Curry, it appears that the quadratic model fits slightly better. The magnitudes of the residuals have been reduced, and they appear somewhat less correlated. We must remember, of course, that this model contains more parameters than the distance model (six against two). To help us choose between the two, we turn to the AIC values, which reward goodness of fit but penalize the number of parameters. We see that the AIC suggests, ever so slightly, that we should stick with the reduced distance model.

Turning to James, we first notice that the NHPP seems to fit better in general, although we are drastically underestimating the number of shots that he takes near the basket. Again it appears that the quadratic model is the better fit, but this time the difference is less obvious. Indeed, by AIC the distance-based model is the clear winner.

Table 1. AIC Values

                 Linear Model   Distance Model   Quadratic Model
Stephen Curry       3,571.800        2,886.400         2,904.500
LeBron James        1,598.400         -539.400           512.900

This is actually an important result. For James, we can conclude that the best spatial model is the one that simply considers distance from the basket. For Curry, however, there appears to be more to the story, as the higher-parameter model explains more than the reduced model. In a sense, this is intuitive: Curry is the better shooter and, since he is smaller, he must shoot from everywhere on the court. This is consistent with the earlier observation that the process is more homogeneous for Curry. Upon closer examination, each parameter in the quadratic model is statistically significant (although the interaction term is barely significant), so despite the AIC value, we keep the quadratic model for Stephen Curry but use the reduced distance model for LeBron James. Changing the notation slightly, we provide the following NHPP predictive models for each player.
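The AIC comparison can be reproduced from the reported values. The sketch below uses the standard definition AIC = 2k - 2 log L and the numbers in Table 1, reading the distance-model entry for James as -539.4 (an assumption about a sign in the flattened table); the helper names are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# AIC values as reported in Table 1 (James's distance entry read as negative).
table_1 = {
    "Stephen Curry": {"Linear": 3571.8, "Distance": 2886.4, "Quadratic": 2904.5},
    "LeBron James": {"Linear": 1598.4, "Distance": -539.4, "Quadratic": 512.9},
}

def best_model(aic_by_model):
    """Return the model name with the smallest AIC."""
    return min(aic_by_model, key=aic_by_model.get)

def aic_gap(aic_by_model, model_a, model_b):
    """AIC of model_a minus AIC of model_b (positive favors model_b)."""
    return aic_by_model[model_a] - aic_by_model[model_b]

preferred = {player: best_model(values) for player, values in table_1.items()}
curry_gap = aic_gap(table_1["Stephen Curry"], "Quadratic", "Distance")
james_gap = aic_gap(table_1["LeBron James"], "Quadratic", "Distance")
print(preferred, curry_gap, james_gap)
```

On these numbers the distance model has the lower AIC for both players, but the gap is about 18 for Curry against about 1,052 for James, which is why the choice is a close call for Curry and clear-cut for James.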
\[ \log \lambda_{SC}(x, y) = 1.9 - (1.5 \times 10^{-2})x - (6.1 \times 10^{-2})y - (1.1 \times 10^{-3})xy - (4.3 \times 10^{-3})x^2 - (5.0 \times 10^{-4})y^2 \tag{2} \]

\[ \log \lambda_{LJ} = 2.7 - 0.14r, \qquad r = \sqrt{x^2 + y^2} \tag{3} \]

Of course, the values of λ in (2) and (3) are meaningful only relative to each other. Ideally, we should normalize them over the number of games in the dataset to obtain a physical meaning for λ. In that case λ would represent the predicted number of shots per game that each player would take at a given location.

The spatial analysis of shot intensity has provided some insight into the difference between the two players' playstyles. In summary, James's shot selection can be more easily characterized through purely spatial analysis. Curry's shot selection appears to be less dependent on distance, and more homogeneous.
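Since the fitted intensities are meaningful only relative to each other, a natural use of equations (2) and (3) is to evaluate their ratio at a location. The sketch below does this under a loud assumption: every term after the intercept is taken to be negative and the powers of ten are taken to be negative exponents, since the operators are not legible in the extracted equations. The function names are hypothetical.

```python
import math

def log_intensity_curry(x, y):
    # Equation (2). ASSUMPTION: all terms after the intercept are subtracted.
    return (1.9 - 1.5e-2 * x - 6.1e-2 * y - 1.1e-3 * x * y
            - 4.3e-3 * x * x - 5.0e-4 * y * y)

def log_intensity_james(x, y):
    # Equation (3), with r the distance from the basket at the origin.
    # ASSUMPTION: the slope on r is negative.
    r = math.hypot(x, y)
    return 2.7 - 0.14 * r

def relative_intensity(x, y):
    """Ratio lambda_LJ / lambda_SC at (x, y)."""
    return math.exp(log_intensity_james(x, y) - log_intensity_curry(x, y))

at_basket = relative_intensity(0.0, 0.0)
print(at_basket)
```

At the origin the ratio depends only on the two intercepts and equals exp(2.7 - 1.9) = exp(0.8), roughly 2.2, regardless of the sign assumption.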

1. Efficiency Analysis

In this section, we fit a spatial logistic regression model that allows us to predict the probability of making a shot at location x. Again, we are interested in comparing the two players. We will see, however, that several issues arise when we try to do this. First, the method is fairly slow, and our datasets are fairly large. Second, the spatial correlation of the binary data is not overwhelmingly clear, especially for LeBron James. We begin with empirical and parametric semivariograms for each player.

Figure 10. Stephen Curry - Semivariograms

Figure 11. LeBron James - Semivariograms

The semivariograms are reasonable for Stephen Curry, and we may choose the exponential model as it provides a reasonable fit. The data for LeBron James, however, are less encouraging. The empirical semivariogram does not display the expected pattern, hence the semivariogram fits are not very good either. In this case, the Gaussian semivariogram is the obvious, but still disappointing, choice.

Using the spBayes package, we can attempt to fit a spatial logistic regression model. The variograms make it clear that, especially in the second case, there are some counterintuitive spatial relationships. Bayesian inference on this model was costly, limiting both the number of draws we were able to take and the size of the subset that we sampled from. We attempted to fit the model using 500 shots. The trace plots and density curves are not great by any means, but after trying many combinations of priors we came to the realization that the parameter estimation was not going to be fixed by a simple choice of prior. The trace plot for β actually looked quite reasonable, and the chain for φ would have been reasonable had we applied an appropriate burn-in and thinning. The chain for σ² never converged, however, and would not have been helped by any amount of thinning. In the end, the trace plots should be very concerning and cause us to question the results from here on out. For lack of time, however, we continue with the analysis.

Figure 12. Trace Plots and Posteriors

Using a separate set of 500 shots for each player, the spPredict function was used to form a rough predictive plot, based on the five-number summary of the values produced by taking the mean of the posterior predictive distribution at each location. Again the results are important and insightful for comparing the two players. Whatever lack of confidence we have in our model due to Figure 12, the model seems to be capturing some of the truth.
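The burn-in, thinning and posterior predictive mean mentioned above can be illustrated generically. This sketch does not call spBayes or reproduce its output: the chain is simulated, the helper names are hypothetical, and the linear predictor is assumed to be on the logit scale, as in a logistic regression.

```python
import math
import random

def thin_chain(draws, burn_in, thin):
    """Drop the first `burn_in` draws, then keep every `thin`-th one."""
    return draws[burn_in::thin]

def inverse_logit(eta):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def posterior_mean_probability(eta_draws):
    """Average the make probability over posterior draws of the linear predictor."""
    probs = [inverse_logit(eta) for eta in eta_draws]
    return sum(probs) / len(probs)

# Hypothetical chain for the linear predictor at one location. The first 200
# draws are shifted to mimic a chain that has not yet settled; simulated data,
# not taken from the paper's fit.
rng = random.Random(0)
chain = [rng.gauss(0.0, 1.0) + (5.0 if i < 200 else 0.3) for i in range(2000)]
kept = thin_chain(chain, burn_in=500, thin=10)
p_make = posterior_mean_probability(kept)
print(len(kept), p_make)
```

Discarding the early draws removes the transient part of the chain, and thinning reduces autocorrelation among the retained draws; neither helps a chain that never converges, which is the situation described for σ².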

Figure 13. Efficiency Predictions

For instance, it correctly determines that LeBron James will make a very good proportion of his shots near the basket, but will make far fewer as he moves farther away from the basket. Stephen Curry, on the other hand, does not make as many shots near the basket, but he remains effective from essentially anywhere on the court. The fact that the model captures these details suggests that we may be close to the truth.

In the end, there were some problems with using the spBayes package here. With more time, we could afford to use the entire data set instead of such a small subset. Another point of interest would be to predict at a grid of points across the entire surface rather than at select points.

2. Conclusion

Overall, the results of this spatial comparison were satisfactory and intuitive to somebody who understands the players and the difference in their playstyles. In the intensity section, we showed that even in the dense regions there was evidence of clustering, but these clusters occurred differently for each player. We showed that Stephen Curry's shot selection is somewhat more homogeneous, and that he is far more likely to shoot from farther out. LeBron's high intensity close to the basket is consistent with his high efficiency according to Figure 13. He shoots often close to the basket, and is highly effective in the same range. This provides an intuitive explanation for his tremendous success in the league.

We compared two of the best players in the world, and found statistical evidence to back the intuition that was already in place for their very different styles of play.