Automatic Identification and Analysis of Basketball Plays: NBA On-Ball Screens

Thesis Proposal
Master of Computer Information Science
Graduate College of Electrical Engineering and Computer Science
Cleveland State University

Andrew Yu
January 27, 2017

Approved by Advisory Committee Members:
[Name of Chair], Thesis Committee Chair    Date
[Name of Supervisor], Supervisor    Date
[Name of Committee Member], Thesis Committee Member    Date
[Name of Committee Member], Thesis Committee Member    Date
1. Introduction

1.1. Background

Sports analytics is the management and analysis of data collected from sports games to quantify past results and predict future outcomes. Simple box scores have been collected since the rise of professional sports in the 19th century [1], and as the means of data collection and storage improve, so do the depth and impact of sports analytics. Today, one would be hard-pressed to find a professional sports team without its own analytics department.

In the National Basketball Association (NBA), the home team of each game employs a group of scorekeepers who record basic statistics such as points, rebounds, and assists. These statistics are easily accessible and are common knowledge to all teams, players, and fans. Teams looking for more detailed data must rely on their own analytics departments. Traditionally, analysts were assigned the grueling task of watching video recordings of games to collect data manually, but the recent development of sophisticated computer tools for data collection has allowed them to focus more on analysis. Because of the competitive nature of sports, analytics departments operate behind the scenes and rarely, if ever, divulge their findings to the public. When the playoffs start and games become more important, however, every decision on the court is more deliberate, and the influence of analytics shines through.

1.2. Motivation

In the 2016 NBA Finals, crucial moments of the championship-deciding Game 7 featured both teams repeatedly using an offensive play called the on-ball screen to force an advantageous switch. This play involves four players: two on offense, one of whom is holding the ball, and the two defenders guarding them. First, the offensive player without the ball (the screener) stands still and sets a screen, making himself an obstacle for anyone trying to move through his area.
Then, the offensive player with the ball (the ball-handler) dribbles past the screener, brushing by him. The defender trying to stay close to the ball-handler has his path blocked by the screener, which leaves him poorly positioned to defend the ball-handler. While that defender struggles to recover by going around the screen, another defender (usually the one guarding the screener) must step up to guard the ball-handler. In the end, a different defender is guarding the ball-handler: a switch has occurred. The goal of the play is to create a mismatch, forcing a weaker defender to guard a stronger offensive player. The on-ball screen is a simple play that is fundamental to any NBA offense, and a variety of other plays, such as the pick-and-roll, arise from it. To defend against it, a team must define a set of rules to follow when facing an on-ball screen, allowing defenders to coordinate properly and
respond quickly. These rules often define the philosophy of a defensive-minded coach, and a great defender follows them instinctively.

This thesis explores methodologies for building a framework that can automatically identify and analyze on-ball screens that result in switches, based on the motion of the players on the court. Such a framework may allow defensive teams to quickly develop sophisticated, highly individualized rules, ranging from discouraging opposing teams in the next game to disrupting individual players on the next play. It may also allow offensive teams to identify teams and players that are vulnerable to the on-ball screen.

1.3. Data

Three types of data will be collected and used to build the framework:

1. SportVU Player Motion-Tracking Data
SportVU is a proprietary technology from Stats LLC that, through a partnership with the NBA since 2013, provides full-featured motion-tracking capabilities and data feeds to all 30 NBA arenas [7]. SportVU uses a six-camera system installed in each arena to track the real-time positions of all 10 players and the ball 25 times per second [5]. The SportVU motion-tracking data (henceforth "SportVU data") will be collected and used for this work.

2. Play-By-Plays
The play-by-play, compiled by the scorekeepers at each NBA game, records events such as scoring plays, rebounds, turnovers, fouls, timeouts, and substitutions. The play-by-play charts are necessary to build higher-level basketball concepts, such as possessions.

3. NBA Game Tape
Television broadcast video of the games is needed to clarify the SportVU data when the tracking technology produces errors and artifacts. It can also be used to build a testing set for the automatic identification of on-ball screens.

2. Objective

2.1. Problem Statement
In the past, on-ball screens were identified manually, in painstaking fashion, by watching video recordings of games. Because on-ball screens are highly dynamic actions involving multiple players and varying lengths of time, automatic identification became possible only with the advent of SportVU data. An on-ball screen typically unfolds in less than three seconds, so the high frame rate and high-resolution positioning of the SportVU data are required to identify screens properly. An automated solution can save analysts enormous amounts of time and effort.

Unfortunately, the SportVU data in its raw form is not suitable for querying. Because it is a semi-structured format intended for exchange over the internet, considerable preprocessing is required to extract player positions for each frame. In addition, attributes such as teams, game date, and players are encoded with numeric identifiers and repeated throughout the raw files, producing a hierarchical structure in which each frame carries all of the information about the game, even attributes that never change. This redundancy is useful for partial transfer, since any portion of a file can be interpreted on its own, but it comes at the cost of bloated files and extra preprocessing.

Automatic analysis is an additional layer on top of automatic identification. Once an on-ball screen is identified, it would be helpful to obtain additional information about it, such as its direction, the resulting action, and the defense's reaction, in order to organize screens into groups for more individualized analysis. Analysis of on-ball screens might identify which players are good or bad at executing or defending them, but such information must be translated into a basketball language that NBA players and coaches can understand and exploit.
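The preprocessing problem described above can be sketched as a function that stores game-level attributes once and flattens the nested frames into a compact table of positions. This is a minimal sketch under stated assumptions, not the schema of the actual files: the file is assumed to be JSON, and the key names (`gameid`, `gamedate`, `events`, `moments`), the moment layout `[quarter, timestamp_ms, game_clock, shot_clock, unused, positions]`, the position layout `[team_id, player_id, x, y, z]`, and the helper names `flatten_game` and `load_game` are all hypothetical.

```python
import json
from typing import Any, NamedTuple


class PositionRow(NamedTuple):
    """One tracked object (a player or the ball) in one frame."""
    quarter: int
    timestamp_ms: int   # wall-clock time of the frame in milliseconds
    game_clock: float   # seconds remaining in the quarter
    team_id: int        # ASSUMPTION: -1 marks the ball
    player_id: int      # ASSUMPTION: -1 marks the ball
    x: float            # court coordinate along the length, in feet
    y: float            # court coordinate along the width, in feet


def flatten_game(raw: dict[str, Any]) -> tuple[dict[str, Any], list[PositionRow]]:
    """Split a nested game record into metadata (kept once) and flat rows.

    ASSUMPTION: the keys and list layouts below are a hypothetical schema;
    adjust them to match the real files before use.
    """
    # Game-level attributes that never change are stored a single time.
    meta = {"game_id": raw["gameid"], "game_date": raw["gamedate"]}
    seen: set[tuple[int, int]] = set()
    rows: list[PositionRow] = []
    for event in raw["events"]:
        for quarter, ts, game_clock, _shot_clock, _unused, positions in event["moments"]:
            # _z (height) is dropped in this sketch; it matters mainly for the ball.
            for team_id, player_id, x, y, _z in positions:
                key = (ts, player_id)
                if key in seen:  # skip a frame already emitted by an earlier event
                    continue
                seen.add(key)
                rows.append(PositionRow(quarter, ts, game_clock, team_id, player_id, x, y))
    return meta, rows


def load_game(path: str) -> tuple[dict[str, Any], list[PositionRow]]:
    """Read one game file (assumed to be JSON) and flatten it."""
    with open(path, encoding="utf-8") as f:
        return flatten_game(json.load(f))
```

Once flattened, the rows can be bulk-loaded into a relational table or data frame indexed by timestamp and player, so that queries no longer need to walk the nested structure.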
Visual recreations of the court showing player positions, movements, and recommended actions would convey the message more effectively than a chart full of numbers.

2.2. Objective

The purpose of this thesis is to develop a framework in which on-ball screens can be automatically identified and analyzed using SportVU, play-by-play, and game tape data, and then translated into offensive and defensive strategies. The framework will identify patterns in opponents' on-ball screens and point to strategies that exploit weaknesses in their defensive schemes. The analysis will weigh the outcomes of past on-ball screens as well as trends and tendencies in more recent games. The framework will also provide an interface to the available data that enables more efficient querying and visualization.

2.3. Related Work
The MIT Sloan Sports Analytics Conference [2] (henceforth "Sloan Conference") has emerged as the premier sports analytics conference, where industry-leading ideas are shared. Founded in 2006, the Sloan Conference has featured some of the most influential figures in sports analytics, including NBA general managers, as well as former players, coaches, and journalists from all four major professional leagues in the US (NBA, NFL, MLB, NHL).

At the 2014 Sloan Conference, McQueen et al. [4] presented an algorithm for recognizing on-ball screens that achieved a sensitivity of 82% and a positive predictive value of 80%. However, it was limited by a lack of available data: only 252 minutes of play from 14 games. At the 2016 Sloan Conference, McIntyre et al. [3] followed up on the 2014 paper with a more robust dataset for a deeper look at the identification of on-ball screens, subdividing them into categories based on the defense's reaction to the screen. With such a large dataset, the study was able to individualize the results for each player and the role he played in the screens. Based on these findings, the study produced a simple defensive strategy for a theoretical scenario involving different offensive and defensive players.

3. Methodology

3.1. Data Collection and Preprocessing

3.1.1 SportVU Player Motion-Tracking Data

To identify on-ball screens automatically, raw SportVU data will be collected. Some of this data was disseminated to the public through the NBA's official stats page [6], though it is no longer available there. This thesis uses only the disseminated data, which was distributed in a semi-structured format intended for data exchange over the internet. The available SportVU data covers 631 games from the first half of the 2015-16 NBA season (10/27/2015 to 1/23/2016). Each file is around 100MB of plaintext and contains the motion-tracking data for all players in one complete game.
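As a sanity check on each game file, a short sketch can count the frames, convert the count to minutes at 25 fps, and count the frames in which the game clock did not advance from the previous frame. The input layout (a list of `(timestamp_ms, game_clock)` pairs) and the names `summarize_frames` and `GameSummary` are hypothetical, and the sketch assumes that a frame recorded while the clock is stopped repeats the previous frame's game-clock value exactly.

```python
from dataclasses import dataclass

FPS = 25  # SportVU sampling rate: 25 frames per second


@dataclass
class GameSummary:
    frames: int
    minutes_tracked: float
    stopped_clock_frames: int


def summarize_frames(frames: list[tuple[int, float]]) -> GameSummary:
    """Summarize one game's frames, given as (timestamp_ms, game_clock) pairs.

    ASSUMPTION: a frame recorded while the clock is stopped repeats the
    previous frame's game-clock value exactly.
    """
    ordered = sorted(frames)  # chronological order by wall-clock timestamp
    # Count consecutive frame pairs whose game-clock value did not change.
    stopped = sum(
        1
        for (_, prev_clock), (_, clock) in zip(ordered, ordered[1:])
        if clock == prev_clock
    )
    minutes = len(ordered) / FPS / 60
    return GameSummary(len(ordered), minutes, stopped)
```

At 25 fps, 75,000 frames correspond to 75,000 / 25 / 60 = 50 minutes of tracking, and 2 minutes of stopped clock correspond to 2 × 60 × 25 = 3,000 frames.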
On average, each game contains about 75,000 frames at 25 frames per second (fps), equal to roughly 50 minutes of high-speed motion tracking, with player positions reported to approximately 1/100,000 of a foot. Because an NBA game consists of 48 minutes of game time, the roughly 2 extra minutes correspond to frames in which the game clock is stopped.

3.1.2 Play-By-Plays
To supplement the SportVU data, the play-by-play for each game will be retrieved from the NBA's official stats page. The play-by-play records events such as scoring plays, rebounds, turnovers, fouls, timeouts, and substitutions, which are necessary to build higher-level basketball concepts such as possessions. Each NBA game contains about 500 play-by-play events on average, each accompanied by the game clock, measured to the second, and the score for each team after the event. This one-second resolution is far coarser than the 25 fps SportVU data, but because on-ball screens do not typically coincide with play-by-play events, the low resolution should not be an issue.

3.1.3 NBA Game Tape

It is helpful to corroborate findings from the SportVU data with a recording of the game, because the recording captures details and nuances that are lost when players are reduced to single points on the court, namely players' facing direction and body positioning. Some errors and artifacts in the SportVU data will also require a second source to confirm a player's position. Full game tapes, in the form of television broadcasts digitally encoded at 30 frames per second and 3,000 KB/s, each around 150 minutes long, are available to subscribers of NBA League Pass, the NBA's online video streaming service. Synchronizing the game tape with the SportVU data is possible because the SportVU data records the actual time of each frame to the millisecond.

3.2. Identification and Analysis

The automatic identification of on-ball screens in the 631 games will involve three steps:

1. Identify candidates for on-ball screens
A rule-based algorithm will group the large dataset of single frames into groups of frames that have the features of an on-ball screen, based on the SportVU player motion-tracking data.
The rules are made intentionally inclusive to maximize the sensitivity of the algorithm, and to include false positives that will be useful in the next step.

2. Manually label the candidates
A sample of the events identified by the rule-based algorithm is manually labeled using visualizations of the SportVU data. For edge cases, game tape is used to confirm or reject an event as an on-ball screen.

3. Build a machine learning algorithm
A classifier will be trained using the labeled data as training and validation sets. Different samples of the SportVU data will be used to test and refine the algorithm in order to find the one that recognizes on-ball screens most consistently. The final algorithm will then be applied to the unlabeled data to predict on-ball screens across all 631 games and to identify each player and his role in each screen.

The analysis of on-ball screens relies on the results of the identification step. Analysis involves metadata related to each on-ball screen, such as the players involved, subsequent events, points scored, and the reaction of the defense. Patterns in this metadata will reveal how the effectiveness of on-ball screens depends on these variables, and the effectiveness of hypothetical on-ball screens with any given combination of variables will be predicted.

3.3. Data Visualization

The framework will provide a visualization tool for the SportVU data to supplement the data and corroborate the findings. Representing data points as a recreation of the physical basketball court allows for an intuitive understanding of the movement of players in the context of a basketball game, enabling more efficient analysis. A sophisticated visualization tool is needed to show the movement of players and the ball at high speed and resolution without losing any details of the SportVU data. Such a visualization should augment, not replace, the SportVU data.

4. Timeline
Weekly meetings with the thesis advisor will take place throughout the semester to give updates on the progress of the project.

Fall Semester 2016
o NBA Basketball Analytics Hackathon: September 2016
o Topic Exploration: October 2016
Winter 2016
o Data Collection and Sample Visualization: December 2016
o Data Consolidation and Non-AI Analysis: January 2017
Spring 2017
o AI Analysis and Solution Building: February 2017
o Write Thesis: April 2017
o Oral Defense: May 2017

5. References

[1] Paul Dickson. The Joy of Keeping Score. New York: Walker. ISBN 0-15-600516-6.
[2] Nick Fasulo. MIT Sloan Sports Analytics Conference, Day 1: Explaining The 10,000 Hour Rule, And More. SBNation.com. Retrieved Mar. 4, 2013.
[3] Avery McIntyre, Joel Brooks, John Guttag, and Jenna Wiens. Recognizing and Analyzing Ball Screen Defense in the NBA. In 2016 MIT Sloan Sports Analytics Conference, 2016.
[4] Armand McQueen, Jenna Wiens, and John Guttag. Automatically Recognizing On-Ball Screens. In 2014 MIT Sloan Sports Analytics Conference, 2014.
[5] Basketball Data Feed: Basketball Player Tracking, SportVU. http://www.stats.com/sportvubasketball-media/. Retrieved Dec. 2016.
[6] NBA.com/Stats FAQ. http://stats.nba.com/help/faq/. Retrieved Dec. 2016.
[7] NBA partners with Stats LLC for tracking technology. http://www.nba.com/2013/news/09/05/nba-stats-llc-player-tracking-technology/. Sep. 5, 2013. Retrieved Dec. 2016.